feat: add WorldSim — OSINT-powered personality simulation skill

Rehoboam-class worldsim. Immersive CLI personality simulator that researches real people via 25+ verified platform access methods, builds 6-layer psychometric profiles, finds star threads (personality compression keys), and generates platform-authentic simulated conversations with mechanical verification and adversarial refinement.

26 files | 38K words | 2,283 lines Python

- Immersive CLI interface (worldsim> prompt, no assistant framing)
- OSINT pipeline: X API, Instagram private API, Bluesky, TikTok, Facebook, Threads, Mastodon, Reddit, GitHub, HN, Medium, Quora, Goodreads, Google Scholar, Crunchbase, podcasts, news/blogs
- Star thread: one-sentence personality compression key per person
- Deep psychometrics: Big Five + Moral Foundations + Schwartz Values + Cognitive Style + Narrative Framing + Behavioral Metadata
- Anti-slop: mechanical detection of LLM writing patterns
- GAN-style adversarial refinement loop with mechanical verification
- Recursive self-improvement: learned rules grow with each simulation
- Rehoboam persistence: SQLite + filesystem for profiles, predictions, social graph, knowledge archives
- GEPA/MIPROv2 self-evolution integration tested and working
- Knowledge archive: per-person source library with citations and semantic retrieval for context-aware grounding

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-04-08 13:46:20 -04:00
186 changed files with 10655 additions and 12549 deletions
@@ -81,14 +81,6 @@
 # HF_TOKEN=
 # OPENCODE_GO_BASE_URL=https://opencode.ai/zen/go/v1  # Override default base URL

-# =============================================================================
-# LLM PROVIDER (Qwen OAuth)
-# =============================================================================
-# Qwen OAuth reuses your local Qwen CLI login (qwen auth qwen-oauth).
-# No API key needed — credentials come from ~/.qwen/oauth_creds.json.
-# Optional base URL override:
-# HERMES_QWEN_BASE_URL=https://portal.qwen.ai/v1
-
 # =============================================================================
 # TOOL API KEYS
 # =============================================================================
@@ -8,9 +8,6 @@ on:
  release:
    types: [published]

-permissions:
-  contents: read
-
 concurrency:
  group: docker-${{ github.ref }}
  cancel-in-progress: true
@@ -20,29 +17,22 @@ jobs:
    # Only run on the upstream repository, not on forks
    if: github.repository == 'NousResearch/hermes-agent'
    runs-on: ubuntu-latest
-    timeout-minutes: 60
+    timeout-minutes: 30
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          submodules: recursive

-      - name: Set up QEMU
-        uses: docker/setup-qemu-action@v3
-
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

-      # Build amd64 only so we can `load` the image for smoke testing.
-      # `load: true` cannot export a multi-arch manifest to the local daemon.
-      # The multi-arch build follows on push to main / release.
-      - name: Build image (amd64, smoke test)
+      - name: Build image
        uses: docker/build-push-action@v6
        with:
          context: .
          file: Dockerfile
          load: true
-          platforms: linux/amd64
          tags: nousresearch/hermes-agent:test
          cache-from: type=gha
          cache-to: type=gha,mode=max
@@ -61,28 +51,26 @@ jobs:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

-      - name: Push multi-arch image (main branch)
+      - name: Push image (main branch)
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        uses: docker/build-push-action@v6
        with:
          context: .
          file: Dockerfile
          push: true
-          platforms: linux/amd64,linux/arm64
          tags: |
            nousresearch/hermes-agent:latest
            nousresearch/hermes-agent:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

-      - name: Push multi-arch image (release)
+      - name: Push image (release)
        if: github.event_name == 'release'
        uses: docker/build-push-action@v6
        with:
          context: .
          file: Dockerfile
          push: true
-          platforms: linux/amd64,linux/arm64
          tags: |
            nousresearch/hermes-agent:latest
            nousresearch/hermes-agent:${{ github.event.release.tag_name }}
@@ -1,346 +0,0 @@
-# Hermes Agent v0.8.0 (v2026.4.8)
-
-**Release Date:** April 8, 2026
-
-> The intelligence release — background task auto-notifications, free MiMo v2 Pro on Nous Portal, live model switching across all platforms, self-optimized GPT/Codex guidance, native Google AI Studio, smart inactivity timeouts, approval buttons, MCP OAuth 2.1, and 209 merged PRs with 82 resolved issues.
-
---
-
-## ✨ Highlights
-
- **Background Process Auto-Notifications (`notify_on_complete`)** — Background tasks can now automatically notify the agent when they finish. Start a long-running process (AI model training, test suites, deployments, builds) and the agent gets notified on completion — no polling needed. The agent can keep working on other things and pick up results when they land. ([#5779](https://github.com/NousResearch/hermes-agent/pull/5779))
-
- **Free Xiaomi MiMo v2 Pro on Nous Portal** — Nous Portal now supports the free-tier Xiaomi MiMo v2 Pro model for auxiliary tasks (compression, vision, summarization), with free-tier model gating and pricing display in model selection. ([#6018](https://github.com/NousResearch/hermes-agent/pull/6018), [#5880](https://github.com/NousResearch/hermes-agent/pull/5880))
-
- **Live Model Switching (`/model` Command)** — Switch models and providers mid-session from CLI, Telegram, Discord, Slack, or any gateway platform. Aggregator-aware resolution keeps you on OpenRouter/Nous when possible, with automatic cross-provider fallback when needed. Interactive model pickers on Telegram and Discord with inline buttons. ([#5181](https://github.com/NousResearch/hermes-agent/pull/5181), [#5742](https://github.com/NousResearch/hermes-agent/pull/5742))
-
- **Self-Optimized GPT/Codex Tool-Use Guidance** — The agent diagnosed and patched 5 failure modes in GPT and Codex tool calling through automated behavioral benchmarking, dramatically improving reliability on OpenAI models. Includes execution discipline guidance and thinking-only prefill continuation for structured reasoning. ([#6120](https://github.com/NousResearch/hermes-agent/pull/6120), [#5414](https://github.com/NousResearch/hermes-agent/pull/5414), [#5931](https://github.com/NousResearch/hermes-agent/pull/5931))
-
- **Google AI Studio (Gemini) Native Provider** — Direct access to Gemini models through Google's AI Studio API. Includes automatic models.dev registry integration for real-time context length detection across any provider. ([#5577](https://github.com/NousResearch/hermes-agent/pull/5577))
-
- **Inactivity-Based Agent Timeouts** — Gateway and cron timeouts now track actual tool activity instead of wall-clock time. Long-running tasks that are actively working will never be killed — only truly idle agents time out. ([#5389](https://github.com/NousResearch/hermes-agent/pull/5389), [#5440](https://github.com/NousResearch/hermes-agent/pull/5440))
-
- **Approval Buttons on Slack & Telegram** — Dangerous command approval via native platform buttons instead of typing `/approve`. Slack gets thread context preservation; Telegram gets emoji reactions for approval status. ([#5890](https://github.com/NousResearch/hermes-agent/pull/5890), [#5975](https://github.com/NousResearch/hermes-agent/pull/5975))
-
- **MCP OAuth 2.1 PKCE + OSV Malware Scanning** — Full standards-compliant OAuth for MCP server authentication, plus automatic malware scanning of MCP extension packages via the OSV vulnerability database. ([#5420](https://github.com/NousResearch/hermes-agent/pull/5420), [#5305](https://github.com/NousResearch/hermes-agent/pull/5305))
-
- **Centralized Logging & Config Validation** — Structured logging to `~/.hermes/logs/` (agent.log + errors.log) with the `hermes logs` command for tailing and filtering. Config structure validation catches malformed YAML at startup before it causes cryptic failures. ([#5430](https://github.com/NousResearch/hermes-agent/pull/5430), [#5426](https://github.com/NousResearch/hermes-agent/pull/5426))
-
- **Plugin System Expansion** — Plugins can now register CLI subcommands, receive request-scoped API hooks with correlation IDs, prompt for required env vars during install, and hook into session lifecycle events (finalize/reset). ([#5295](https://github.com/NousResearch/hermes-agent/pull/5295), [#5427](https://github.com/NousResearch/hermes-agent/pull/5427), [#5470](https://github.com/NousResearch/hermes-agent/pull/5470), [#6129](https://github.com/NousResearch/hermes-agent/pull/6129))
-
- **Matrix Tier 1 & Platform Hardening** — Matrix gets reactions, read receipts, rich formatting, and room management. Discord adds channel controls and ignored channels. Signal gets full MEDIA: tag delivery. Mattermost gets file attachments. Comprehensive reliability fixes across all platforms. ([#5275](https://github.com/NousResearch/hermes-agent/pull/5275), [#5975](https://github.com/NousResearch/hermes-agent/pull/5975), [#5602](https://github.com/NousResearch/hermes-agent/pull/5602))
-
- **Security Hardening Pass** — Consolidated SSRF protections, timing attack mitigations, tar traversal prevention, credential leakage guards, cron path traversal hardening, and cross-session isolation. Terminal workdir sanitization across all backends. ([#5944](https://github.com/NousResearch/hermes-agent/pull/5944), [#5613](https://github.com/NousResearch/hermes-agent/pull/5613), [#5629](https://github.com/NousResearch/hermes-agent/pull/5629))
-
---
-
-## 🏗️ Core Agent & Architecture
-
-### Provider & Model Support
- **Native Google AI Studio (Gemini) provider** with models.dev integration for automatic context length detection ([#5577](https://github.com/NousResearch/hermes-agent/pull/5577))
- **`/model` command — full provider+model system overhaul** — live switching across CLI and all gateway platforms with aggregator-aware resolution ([#5181](https://github.com/NousResearch/hermes-agent/pull/5181))
- **Interactive model picker for Telegram and Discord** — inline button-based model selection ([#5742](https://github.com/NousResearch/hermes-agent/pull/5742))
- **Nous Portal free-tier model gating** with pricing display in model selection ([#5880](https://github.com/NousResearch/hermes-agent/pull/5880))
- **Model pricing display** for OpenRouter and Nous Portal providers ([#5416](https://github.com/NousResearch/hermes-agent/pull/5416))
- **xAI (Grok) prompt caching** via `x-grok-conv-id` header ([#5604](https://github.com/NousResearch/hermes-agent/pull/5604))
- **Grok added to tool-use enforcement models** for direct xAI usage ([#5595](https://github.com/NousResearch/hermes-agent/pull/5595))
- **MiniMax TTS provider** (speech-2.8) ([#4963](https://github.com/NousResearch/hermes-agent/pull/4963))
- **Non-agentic model warning** — warns users when loading Hermes LLM models not designed for tool use ([#5378](https://github.com/NousResearch/hermes-agent/pull/5378))
- **Ollama Cloud auth, /model switch persistence**, and alias tab completion ([#5269](https://github.com/NousResearch/hermes-agent/pull/5269))
- **Preserve dots in OpenCode Go model names** (minimax-m2.7, glm-4.5, kimi-k2.5) ([#5597](https://github.com/NousResearch/hermes-agent/pull/5597))
- **MiniMax models 404 fix** — strip /v1 from Anthropic base URL for OpenCode Go ([#4918](https://github.com/NousResearch/hermes-agent/pull/4918))
- **Provider credential reset windows** honored in pooled failover ([#5188](https://github.com/NousResearch/hermes-agent/pull/5188))
- **OAuth token sync** between credential pool and credentials file ([#4981](https://github.com/NousResearch/hermes-agent/pull/4981))
- **Stale OAuth credentials** no longer block OpenRouter users on auto-detect ([#5746](https://github.com/NousResearch/hermes-agent/pull/5746))
- **Codex OAuth credential pool disconnect** + expired token import fix ([#5681](https://github.com/NousResearch/hermes-agent/pull/5681))
- **Codex pool entry sync** from `~/.codex/auth.json` on exhaustion — @GratefulDave ([#5610](https://github.com/NousResearch/hermes-agent/pull/5610))
- **Auxiliary client payment fallback** — retry with next provider on 402 ([#5599](https://github.com/NousResearch/hermes-agent/pull/5599))
- **Auxiliary client resolves named custom providers** and 'main' alias ([#5978](https://github.com/NousResearch/hermes-agent/pull/5978))
- **Use mimo-v2-pro** for non-vision auxiliary tasks on Nous free tier ([#6018](https://github.com/NousResearch/hermes-agent/pull/6018))
- **Vision auto-detection** tries main provider first ([#6041](https://github.com/NousResearch/hermes-agent/pull/6041))
- **Provider re-ordering and Quick Install** — @austinpickett ([#4664](https://github.com/NousResearch/hermes-agent/pull/4664))
- **Nous OAuth access_token** no longer used as inference API key — @SHL0MS ([#5564](https://github.com/NousResearch/hermes-agent/pull/5564))
- **HERMES_PORTAL_BASE_URL env var** respected during Nous login — @benbarclay ([#5745](https://github.com/NousResearch/hermes-agent/pull/5745))
- **Env var overrides** for Nous portal/inference URLs ([#5419](https://github.com/NousResearch/hermes-agent/pull/5419))
- **Z.AI endpoint auto-detect** via probe and cache ([#5763](https://github.com/NousResearch/hermes-agent/pull/5763))
- **MiniMax context lengths, model catalog, thinking guard, aux model, and config base_url** corrections ([#6082](https://github.com/NousResearch/hermes-agent/pull/6082))
- **Community provider/model resolution fixes** — salvaged 4 community PRs + MiniMax aux URL ([#5983](https://github.com/NousResearch/hermes-agent/pull/5983))
-
-### Agent Loop & Conversation
- **Self-optimized GPT/Codex tool-use guidance** via automated behavioral benchmarking — agent self-diagnosed and patched 5 failure modes ([#6120](https://github.com/NousResearch/hermes-agent/pull/6120))
- **GPT/Codex execution discipline guidance** in system prompts ([#5414](https://github.com/NousResearch/hermes-agent/pull/5414))
- **Thinking-only prefill continuation** for structured reasoning responses ([#5931](https://github.com/NousResearch/hermes-agent/pull/5931))
- **Accept reasoning-only responses** without retries — set content to "(empty)" instead of infinite retry ([#5278](https://github.com/NousResearch/hermes-agent/pull/5278))
- **Jittered retry backoff** — exponential backoff with jitter for API retries ([#6048](https://github.com/NousResearch/hermes-agent/pull/6048))
- **Smart thinking block signature management** — preserve and manage Anthropic thinking signatures across turns ([#6112](https://github.com/NousResearch/hermes-agent/pull/6112))
- **Coerce tool call arguments** to match JSON Schema types — fixes models that send strings instead of numbers/booleans ([#5265](https://github.com/NousResearch/hermes-agent/pull/5265))
- **Save oversized tool results to file** instead of destructive truncation ([#5210](https://github.com/NousResearch/hermes-agent/pull/5210))
- **Sandbox-aware tool result persistence** ([#6085](https://github.com/NousResearch/hermes-agent/pull/6085))
- **Streaming fallback** improved after edit failures ([#6110](https://github.com/NousResearch/hermes-agent/pull/6110))
- **Codex empty-output gaps** covered in fallback + normalizer + auxiliary client ([#5724](https://github.com/NousResearch/hermes-agent/pull/5724), [#5730](https://github.com/NousResearch/hermes-agent/pull/5730), [#5734](https://github.com/NousResearch/hermes-agent/pull/5734))
- **Codex stream output backfill** from output_item.done events ([#5689](https://github.com/NousResearch/hermes-agent/pull/5689))
- **Stream consumer creates new message** after tool boundaries ([#5739](https://github.com/NousResearch/hermes-agent/pull/5739))
- **Codex validation aligned** with normalization for empty stream output ([#5940](https://github.com/NousResearch/hermes-agent/pull/5940))
- **Bridge tool-calls** in copilot-acp adapter ([#5460](https://github.com/NousResearch/hermes-agent/pull/5460))
- **Filter transcript-only roles** from chat-completions payload ([#4880](https://github.com/NousResearch/hermes-agent/pull/4880))
- **Context compaction failures fixed** on temperature-restricted models — @MadKangYu ([#5608](https://github.com/NousResearch/hermes-agent/pull/5608))
- **Sanitize tool_calls for all strict APIs** (Fireworks, Mistral, etc.) — @lumethegreat ([#5183](https://github.com/NousResearch/hermes-agent/pull/5183))
-
-### Memory & Sessions
- **Supermemory memory provider** — new memory plugin with multi-container, search_mode, identity template, and env var override ([#5737](https://github.com/NousResearch/hermes-agent/pull/5737), [#5933](https://github.com/NousResearch/hermes-agent/pull/5933))
- **Shared thread sessions** by default — multi-user thread support across gateway platforms ([#5391](https://github.com/NousResearch/hermes-agent/pull/5391))
- **Subagent sessions linked to parent** and hidden from session list ([#5309](https://github.com/NousResearch/hermes-agent/pull/5309))
- **Profile-scoped memory isolation** and clone support ([#4845](https://github.com/NousResearch/hermes-agent/pull/4845))
- **Thread gateway user_id to memory plugins** for per-user scoping ([#5895](https://github.com/NousResearch/hermes-agent/pull/5895))
- **Honcho plugin drift overhaul** + plugin CLI registration system ([#5295](https://github.com/NousResearch/hermes-agent/pull/5295))
- **Honcho holographic prompt and trust score** rendering preserved ([#4872](https://github.com/NousResearch/hermes-agent/pull/4872))
- **Honcho doctor fix** — use recall_mode instead of memory_mode — @techguysimon ([#5645](https://github.com/NousResearch/hermes-agent/pull/5645))
- **RetainDB** — API routes, write queue, dialectic, agent model, file tools fixes ([#5461](https://github.com/NousResearch/hermes-agent/pull/5461))
- **Hindsight memory plugin overhaul** + memory setup wizard fixes ([#5094](https://github.com/NousResearch/hermes-agent/pull/5094))
- **mem0 API v2 compat**, prefetch context fencing, secret redaction ([#5423](https://github.com/NousResearch/hermes-agent/pull/5423))
- **mem0 env vars merged** with mem0.json instead of either/or ([#4939](https://github.com/NousResearch/hermes-agent/pull/4939))
- **Clean user message** used for all memory provider operations ([#4940](https://github.com/NousResearch/hermes-agent/pull/4940))
- **Silent memory flush failure** on /new and /resume fixed — @ryanautomated ([#5640](https://github.com/NousResearch/hermes-agent/pull/5640))
- **OpenViking atexit safety net** for session commit ([#5664](https://github.com/NousResearch/hermes-agent/pull/5664))
- **OpenViking tenant-scoping headers** for multi-tenant servers ([#4936](https://github.com/NousResearch/hermes-agent/pull/4936))
- **ByteRover brv query** runs synchronously before LLM call ([#4831](https://github.com/NousResearch/hermes-agent/pull/4831))
-
---
-
-## 📱 Messaging Platforms (Gateway)
-
-### Gateway Core
- **Inactivity-based agent timeout** — replaces wall-clock timeout with smart activity tracking; long-running active tasks never killed ([#5389](https://github.com/NousResearch/hermes-agent/pull/5389))
- **Approval buttons for Slack & Telegram** + Slack thread context preservation ([#5890](https://github.com/NousResearch/hermes-agent/pull/5890))
- **Live-stream /update output** + forward interactive prompts to user ([#5180](https://github.com/NousResearch/hermes-agent/pull/5180))
- **Infinite timeout support** + periodic notifications + actionable error messages ([#4959](https://github.com/NousResearch/hermes-agent/pull/4959))
- **Duplicate message prevention** — gateway dedup + partial stream guard ([#4878](https://github.com/NousResearch/hermes-agent/pull/4878))
- **Webhook delivery_info persistence** + full session id in /status ([#5942](https://github.com/NousResearch/hermes-agent/pull/5942))
- **Tool preview truncation** respects tool_preview_length in all/new progress modes ([#5937](https://github.com/NousResearch/hermes-agent/pull/5937))
- **Short preview truncation** restored for all/new tool progress modes ([#4935](https://github.com/NousResearch/hermes-agent/pull/4935))
- **Update-pending state** written atomically to prevent corruption ([#4923](https://github.com/NousResearch/hermes-agent/pull/4923))
- **Approval session key isolated** per turn ([#4884](https://github.com/NousResearch/hermes-agent/pull/4884))
- **Active-session guard bypass** for /approve, /deny, /stop, /new ([#4926](https://github.com/NousResearch/hermes-agent/pull/4926), [#5765](https://github.com/NousResearch/hermes-agent/pull/5765))
- **Typing indicator paused** during approval waits ([#5893](https://github.com/NousResearch/hermes-agent/pull/5893))
- **Caption check** uses exact line-by-line match instead of substring (all platforms) ([#5939](https://github.com/NousResearch/hermes-agent/pull/5939))
- **MEDIA: tags stripped** from streamed gateway messages ([#5152](https://github.com/NousResearch/hermes-agent/pull/5152))
- **MEDIA: tags extracted** from cron delivery before sending ([#5598](https://github.com/NousResearch/hermes-agent/pull/5598))
- **Profile-aware service units** + voice transcription cleanup ([#5972](https://github.com/NousResearch/hermes-agent/pull/5972))
- **Thread-safe PairingStore** with atomic writes — @CharlieKerfoot ([#5656](https://github.com/NousResearch/hermes-agent/pull/5656))
- **Sanitize media URLs** in base platform logs — @WAXLYY ([#5631](https://github.com/NousResearch/hermes-agent/pull/5631))
- **Reduce Telegram fallback IP activation log noise** — @MadKangYu ([#5615](https://github.com/NousResearch/hermes-agent/pull/5615))
- **Cron static method wrappers** to prevent self-binding ([#5299](https://github.com/NousResearch/hermes-agent/pull/5299))
- **Stale 'hermes login' replaced** with 'hermes auth' + credential removal re-seeding fix ([#5670](https://github.com/NousResearch/hermes-agent/pull/5670))
-
-### Telegram
- **Group topics skill binding** for supergroup forum topics ([#4886](https://github.com/NousResearch/hermes-agent/pull/4886))
- **Emoji reactions** for approval status and notifications ([#5975](https://github.com/NousResearch/hermes-agent/pull/5975))
- **Duplicate message delivery prevented** on send timeout ([#5153](https://github.com/NousResearch/hermes-agent/pull/5153))
- **Command names sanitized** to strip invalid characters ([#5596](https://github.com/NousResearch/hermes-agent/pull/5596))
- **Per-platform disabled skills** respected in Telegram menu and gateway dispatch ([#4799](https://github.com/NousResearch/hermes-agent/pull/4799))
- **/approve and /deny** routed through running-agent guard ([#4798](https://github.com/NousResearch/hermes-agent/pull/4798))
-
-### Discord
- **Channel controls** — ignored_channels and no_thread_channels config options ([#5975](https://github.com/NousResearch/hermes-agent/pull/5975))
- **Skills registered as native slash commands** via shared gateway logic ([#5603](https://github.com/NousResearch/hermes-agent/pull/5603))
- **/approve, /deny, /queue, /background, /btw** registered as native slash commands ([#4800](https://github.com/NousResearch/hermes-agent/pull/4800), [#5477](https://github.com/NousResearch/hermes-agent/pull/5477))
- **Unnecessary members intent** removed on startup + token lock leak fix ([#5302](https://github.com/NousResearch/hermes-agent/pull/5302))
-
-### Slack
- **Thread engagement** — auto-respond in bot-started and mentioned threads ([#5897](https://github.com/NousResearch/hermes-agent/pull/5897))
- **mrkdwn in edit_message** + thread replies without @mentions ([#5733](https://github.com/NousResearch/hermes-agent/pull/5733))
-
-### Matrix
- **Tier 1 feature parity** — reactions, read receipts, rich formatting, room management ([#5275](https://github.com/NousResearch/hermes-agent/pull/5275))
- **MATRIX_REQUIRE_MENTION and MATRIX_AUTO_THREAD** support ([#5106](https://github.com/NousResearch/hermes-agent/pull/5106))
- **Comprehensive reliability** — encrypted media, auth recovery, cron E2EE, Synapse compat ([#5271](https://github.com/NousResearch/hermes-agent/pull/5271))
- **CJK input, E2EE, and reconnect** fixes ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))
-
-### Signal
- **Full MEDIA: tag delivery** — send_image_file, send_voice, and send_video implemented ([#5602](https://github.com/NousResearch/hermes-agent/pull/5602))
-
-### Mattermost
- **File attachments** — set message type to DOCUMENT when post has file attachments — @nericervin ([#5609](https://github.com/NousResearch/hermes-agent/pull/5609))
-
-### Feishu
- **Interactive card approval buttons** ([#6043](https://github.com/NousResearch/hermes-agent/pull/6043))
- **Reconnect and ACL** fixes ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))
-
-### Webhooks
- **`{__raw__}` template token** and thread_id passthrough for forum topics ([#5662](https://github.com/NousResearch/hermes-agent/pull/5662))
-
---
-
-## 🖥️ CLI & User Experience
-
-### Interactive CLI
- **Defer response content** until reasoning block completes ([#5773](https://github.com/NousResearch/hermes-agent/pull/5773))
- **Ghost status-bar lines cleared** on terminal resize ([#4960](https://github.com/NousResearch/hermes-agent/pull/4960))
- **Normalise \r\n and \r line endings** in pasted text ([#4849](https://github.com/NousResearch/hermes-agent/pull/4849))
- **ChatConsole errors, curses scroll, skin-aware banner, git state** banner fixes ([#5974](https://github.com/NousResearch/hermes-agent/pull/5974))
- **Native Windows image paste** support ([#5917](https://github.com/NousResearch/hermes-agent/pull/5917))
- **--yolo and other flags** no longer silently dropped when placed before 'chat' subcommand ([#5145](https://github.com/NousResearch/hermes-agent/pull/5145))
-
-### Setup & Configuration
- **Config structure validation** — detect malformed YAML at startup with actionable error messages ([#5426](https://github.com/NousResearch/hermes-agent/pull/5426))
- **Centralized logging** to `~/.hermes/logs/` — agent.log (INFO+), errors.log (WARNING+) with `hermes logs` command ([#5430](https://github.com/NousResearch/hermes-agent/pull/5430))
- **Docs links added** to setup wizard sections ([#5283](https://github.com/NousResearch/hermes-agent/pull/5283))
- **Doctor diagnostics** — sync provider checks, config migration, WAL and mem0 diagnostics ([#5077](https://github.com/NousResearch/hermes-agent/pull/5077))
- **Timeout debug logging** and user-facing diagnostics improved ([#5370](https://github.com/NousResearch/hermes-agent/pull/5370))
- **Reasoning effort unified** to config.yaml only ([#6118](https://github.com/NousResearch/hermes-agent/pull/6118))
- **Permanent command allowlist** loaded on startup ([#5076](https://github.com/NousResearch/hermes-agent/pull/5076))
- **`hermes auth remove`** now clears env-seeded credentials permanently ([#5285](https://github.com/NousResearch/hermes-agent/pull/5285))
- **Bundled skills synced to all profiles** during update ([#5795](https://github.com/NousResearch/hermes-agent/pull/5795))
- **`hermes update` no longer kills** freshly-restarted gateway service ([#5448](https://github.com/NousResearch/hermes-agent/pull/5448))
- **Subprocess.run() timeouts** added to all gateway CLI commands ([#5424](https://github.com/NousResearch/hermes-agent/pull/5424))
- **Actionable error message** when Codex refresh token is reused — @tymrtn ([#5612](https://github.com/NousResearch/hermes-agent/pull/5612))
- **Google-workspace skill scripts** can now run directly — @xinbenlv ([#5624](https://github.com/NousResearch/hermes-agent/pull/5624))
-
-### Cron System
- **Inactivity-based cron timeout** — replaces wall-clock; active tasks run indefinitely ([#5440](https://github.com/NousResearch/hermes-agent/pull/5440))
- **Pre-run script injection** for data collection and change detection ([#5082](https://github.com/NousResearch/hermes-agent/pull/5082))
- **Delivery failure tracking** in job status ([#6042](https://github.com/NousResearch/hermes-agent/pull/6042))
- **Delivery guidance** in cron prompts — stops send_message thrashing ([#5444](https://github.com/NousResearch/hermes-agent/pull/5444))
- **MEDIA files delivered** as native platform attachments ([#5921](https://github.com/NousResearch/hermes-agent/pull/5921))
- **[SILENT] suppression** works anywhere in response — @auspic7 ([#5654](https://github.com/NousResearch/hermes-agent/pull/5654))
- **Cron path traversal** hardening ([#5147](https://github.com/NousResearch/hermes-agent/pull/5147))
-
---
-
-## 🔧 Tool System
-
-### Terminal & Execution
- **Execute_code on remote backends** — code execution now works on Docker, SSH, Modal, and other remote terminal backends ([#5088](https://github.com/NousResearch/hermes-agent/pull/5088))
- **Exit code context** for common CLI tools in terminal results — helps agent understand what went wrong ([#5144](https://github.com/NousResearch/hermes-agent/pull/5144))
- **Progressive subdirectory hint discovery** — agent learns project structure as it navigates ([#5291](https://github.com/NousResearch/hermes-agent/pull/5291))
- **notify_on_complete for background processes** — get notified when long-running tasks finish ([#5779](https://github.com/NousResearch/hermes-agent/pull/5779))
- **Docker env config** — explicit container environment variables via docker_env config ([#4738](https://github.com/NousResearch/hermes-agent/pull/4738))
- **Approval metadata included** in terminal tool results ([#5141](https://github.com/NousResearch/hermes-agent/pull/5141))
- **Workdir parameter sanitized** in terminal tool across all backends ([#5629](https://github.com/NousResearch/hermes-agent/pull/5629))
- **Detached process crash recovery** state corrected ([#6101](https://github.com/NousResearch/hermes-agent/pull/6101))
- **Agent-browser paths with spaces** preserved — @Vasanthdev2004 ([#6077](https://github.com/NousResearch/hermes-agent/pull/6077))
- **Portable base64 encoding** for image reading on macOS — @CharlieKerfoot ([#5657](https://github.com/NousResearch/hermes-agent/pull/5657))
-
-### Browser
- **Switch managed browser provider** from Browserbase to Browser Use — @benbarclay ([#5750](https://github.com/NousResearch/hermes-agent/pull/5750))
- **Firecrawl cloud browser** provider — @alt-glitch ([#5628](https://github.com/NousResearch/hermes-agent/pull/5628))
- **JS evaluation** via browser_console expression parameter ([#5303](https://github.com/NousResearch/hermes-agent/pull/5303))
- **Windows browser** fixes ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))
-
-### MCP
- **MCP OAuth 2.1 PKCE** — full standards-compliant OAuth client support ([#5420](https://github.com/NousResearch/hermes-agent/pull/5420))
- **OSV malware check** for MCP extension packages ([#5305](https://github.com/NousResearch/hermes-agent/pull/5305))
- **Prefer structuredContent over text** + no_mcp sentinel ([#5979](https://github.com/NousResearch/hermes-agent/pull/5979))
- **Unknown toolsets warning suppressed** for MCP server names ([#5279](https://github.com/NousResearch/hermes-agent/pull/5279))
-
-### Web & Files
- **.zip document support** + auto-mount cache dirs into remote backends ([#4846](https://github.com/NousResearch/hermes-agent/pull/4846))
- **Redact query secrets** in send_message errors — @WAXLYY ([#5650](https://github.com/NousResearch/hermes-agent/pull/5650))
-
-### Delegation
- **Credential pool sharing** + workspace path hints for subagents ([#5748](https://github.com/NousResearch/hermes-agent/pull/5748))
-
-### ACP (VS Code / Zed / JetBrains)
- **Aggregate ACP improvements** — auth compat, protocol fixes, command ads, delegation, SSE events ([#5292](https://github.com/NousResearch/hermes-agent/pull/5292))
-
---
-
-## 🧩 Skills Ecosystem
-
-### Skills System
- **Skill config interface** — skills can declare required config.yaml settings, prompted during setup, injected at load time ([#5635](https://github.com/NousResearch/hermes-agent/pull/5635))
- **Plugin CLI registration system** — plugins register their own CLI subcommands without touching main.py ([#5295](https://github.com/NousResearch/hermes-agent/pull/5295))
- **Request-scoped API hooks** with tool call correlation IDs for plugins ([#5427](https://github.com/NousResearch/hermes-agent/pull/5427))
- **Session lifecycle hooks** — on_session_finalize and on_session_reset for CLI + gateway ([#6129](https://github.com/NousResearch/hermes-agent/pull/6129))
- **Prompt for required env vars** during plugin install — @kshitijk4poor ([#5470](https://github.com/NousResearch/hermes-agent/pull/5470))
- **Plugin name validation** — reject names that resolve to plugins root ([#5368](https://github.com/NousResearch/hermes-agent/pull/5368))
- **pre_llm_call plugin context** moved to user message to preserve prompt cache ([#5146](https://github.com/NousResearch/hermes-agent/pull/5146))
-
-### New & Updated Skills
- **popular-web-designs** — 54 production website design systems ([#5194](https://github.com/NousResearch/hermes-agent/pull/5194))
- **p5js creative coding** — @SHL0MS ([#5600](https://github.com/NousResearch/hermes-agent/pull/5600))
- **manim-video** — mathematical and technical animations — @SHL0MS ([#4930](https://github.com/NousResearch/hermes-agent/pull/4930))
- **llm-wiki** — Karpathy's LLM Wiki skill ([#5635](https://github.com/NousResearch/hermes-agent/pull/5635))
- **gitnexus-explorer** — codebase indexing and knowledge serving ([#5208](https://github.com/NousResearch/hermes-agent/pull/5208))
- **research-paper-writing** — AI-Scientist & GPT-Researcher patterns — @SHL0MS ([#5421](https://github.com/NousResearch/hermes-agent/pull/5421))
- **blogwatcher** updated to JulienTant's fork ([#5759](https://github.com/NousResearch/hermes-agent/pull/5759))
- **claude-code skill** comprehensive rewrite v2.0 + v2.2 ([#5155](https://github.com/NousResearch/hermes-agent/pull/5155), [#5158](https://github.com/NousResearch/hermes-agent/pull/5158))
- **Code verification skills** consolidated into one ([#4854](https://github.com/NousResearch/hermes-agent/pull/4854))
- **Manim CE reference docs** expanded — geometry, animations, LaTeX — @leotrs ([#5791](https://github.com/NousResearch/hermes-agent/pull/5791))
- **Manim-video references** — design thinking, updaters, paper explainer, decorations, production quality — @SHL0MS ([#5588](https://github.com/NousResearch/hermes-agent/pull/5588), [#5408](https://github.com/NousResearch/hermes-agent/pull/5408))
-
---
-
-## 🔒 Security & Reliability
-
-### Security Hardening
- **Consolidated security** — SSRF protections, timing attack mitigations, tar traversal prevention, credential leakage guards ([#5944](https://github.com/NousResearch/hermes-agent/pull/5944))
- **Cross-session isolation** + cron path traversal hardening ([#5613](https://github.com/NousResearch/hermes-agent/pull/5613))
- **Workdir parameter sanitized** in terminal tool across all backends ([#5629](https://github.com/NousResearch/hermes-agent/pull/5629))
- **Approval 'once' session escalation** prevented + cron delivery platform validation ([#5280](https://github.com/NousResearch/hermes-agent/pull/5280))
- **Profile-scoped Google Workspace OAuth tokens** protected ([#4910](https://github.com/NousResearch/hermes-agent/pull/4910))
-
-### Reliability
- **Aggressive worktree and branch cleanup** to prevent accumulation ([#6134](https://github.com/NousResearch/hermes-agent/pull/6134))
- **O(n²) catastrophic backtracking** in redact regex fixed — 100x improvement on large outputs ([#4962](https://github.com/NousResearch/hermes-agent/pull/4962))
- **Runtime stability fixes** across core, web, delegate, and browser tools ([#4843](https://github.com/NousResearch/hermes-agent/pull/4843))
- **API server streaming fix** + conversation history support ([#5977](https://github.com/NousResearch/hermes-agent/pull/5977))
- **OpenViking API endpoint paths** and response parsing corrected ([#5078](https://github.com/NousResearch/hermes-agent/pull/5078))
-
---
-
-## 🐛 Notable Bug Fixes
-
- **9 community bugfixes salvaged** — gateway, cron, deps, macOS launchd in one batch ([#5288](https://github.com/NousResearch/hermes-agent/pull/5288))
- **Batch core bug fixes** — model config, session reset, alias fallback, launchctl, delegation, atomic writes ([#5630](https://github.com/NousResearch/hermes-agent/pull/5630))
- **Batch gateway/platform fixes** — matrix E2EE, CJK input, Windows browser, Feishu reconnect + ACL ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))
- **Stale test skips removed**, regex backtracking, file search bug, and test flakiness ([#4969](https://github.com/NousResearch/hermes-agent/pull/4969))
- **Nix flake** — read version, regen uv.lock, add hermes_logging — @alt-glitch ([#5651](https://github.com/NousResearch/hermes-agent/pull/5651))
- **Lowercase variable redaction** regression tests ([#5185](https://github.com/NousResearch/hermes-agent/pull/5185))
-
---
-
-## 🧪 Testing
-
- **57 failing CI tests repaired** across 14 files ([#5823](https://github.com/NousResearch/hermes-agent/pull/5823))
- **Test suite re-architecture** + CI failure fixes — @alt-glitch ([#5946](https://github.com/NousResearch/hermes-agent/pull/5946))
- **Codebase-wide lint cleanup** — unused imports, dead code, and inefficient patterns ([#5821](https://github.com/NousResearch/hermes-agent/pull/5821))
- **browser_close tool removed** — auto-cleanup handles it ([#5792](https://github.com/NousResearch/hermes-agent/pull/5792))
-
---
-
-## 📚 Documentation
-
- **Comprehensive documentation audit** — fix stale info, expand thin pages, add depth ([#5393](https://github.com/NousResearch/hermes-agent/pull/5393))
- **40+ discrepancies fixed** between documentation and codebase ([#5818](https://github.com/NousResearch/hermes-agent/pull/5818))
- **13 features documented** from last week's PRs ([#5815](https://github.com/NousResearch/hermes-agent/pull/5815))
- **Guides section overhaul** — fix existing + add 3 new tutorials ([#5735](https://github.com/NousResearch/hermes-agent/pull/5735))
- **Salvaged 4 docs PRs** — docker setup, post-update validation, local LLM guide, signal-cli install ([#5727](https://github.com/NousResearch/hermes-agent/pull/5727))
- **Discord configuration reference** ([#5386](https://github.com/NousResearch/hermes-agent/pull/5386))
- **Community FAQ entries** for common workflows and troubleshooting ([#4797](https://github.com/NousResearch/hermes-agent/pull/4797))
- **WSL2 networking guide** for local model servers ([#5616](https://github.com/NousResearch/hermes-agent/pull/5616))
- **Honcho CLI reference** + plugin CLI registration docs ([#5308](https://github.com/NousResearch/hermes-agent/pull/5308))
- **Obsidian Headless setup** for servers in llm-wiki ([#5660](https://github.com/NousResearch/hermes-agent/pull/5660))
- **Hermes Mod visual skin editor** added to skins page ([#6095](https://github.com/NousResearch/hermes-agent/pull/6095))
-
---
-
-## 👥 Contributors
-
-### Core
- **@teknium1** — 179 PRs
-
-### Top Community Contributors
- **@SHL0MS** (7 PRs) — p5js creative coding skill, manim-video skill + 5 reference expansions, research-paper-writing, Nous OAuth fix, manim font fix
- **@alt-glitch** (3 PRs) — Firecrawl cloud browser provider, test re-architecture + CI fixes, Nix flake fixes
- **@benbarclay** (2 PRs) — Browser Use managed provider switch, Nous portal base URL fix
- **@CharlieKerfoot** (2 PRs) — macOS portable base64 encoding, thread-safe PairingStore
- **@WAXLYY** (2 PRs) — send_message secret redaction, gateway media URL sanitization
- **@MadKangYu** (2 PRs) — Telegram log noise reduction, context compaction fix for temperature-restricted models
-
-### All Contributors
-@alt-glitch, @austinpickett, @auspic7, @benbarclay, @CharlieKerfoot, @GratefulDave, @kshitijk4poor, @leotrs, @lumethegreat, @MadKangYu, @nericervin, @ryanautomated, @SHL0MS, @techguysimon, @tymrtn, @Vasanthdev2004, @WAXLYY, @xinbenlv
-
---
-
-**Full Changelog**: [v2026.4.3...v2026.4.8](https://github.com/NousResearch/hermes-agent/compare/v2026.4.3...v2026.4.8)
@@ -163,17 +163,6 @@ def _is_oauth_token(key: str) -> bool:
    return True


-def _normalize_base_url_text(base_url) -> str:
-    """Normalize SDK/base transport URL values to a plain string for inspection.
-
-    Some client objects expose ``base_url`` as an ``httpx.URL`` instead of a raw
-    string.  Provider/auth detection should accept either shape.
-    """
-    if not base_url:
-        return ""
-    return str(base_url).strip()
-
-
 def _is_third_party_anthropic_endpoint(base_url: str | None) -> bool:
    """Return True for non-Anthropic endpoints using the Anthropic Messages API.

@@ -181,10 +170,9 @@ def _is_third_party_anthropic_endpoint(base_url: str | None) -> bool:
    with their own API keys via x-api-key, not Anthropic OAuth tokens. OAuth
    detection should be skipped for these endpoints.
    """
-    normalized = _normalize_base_url_text(base_url)
-    if not normalized:
+    if not base_url:
        return False  # No base_url = direct Anthropic API
-    normalized = normalized.rstrip("/").lower()
+    normalized = base_url.rstrip("/").lower()
    if "anthropic.com" in normalized:
        return False  # Direct Anthropic API — OAuth applies
    return True  # Any other endpoint is a third-party proxy
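Reviewer note: with `_normalize_base_url_text` removed, the check above calls `.rstrip()` on `base_url` directly, so it appears to assume a plain `str`. The sketch below is a standalone copy of the post-change logic, for illustration only. `FakeURL` and `accepts_url_object` are hypothetical names, with `FakeURL` standing in for a non-string URL object such as the `httpx.URL` mentioned in the removed docstring.

```python
def is_third_party_anthropic_endpoint(base_url):
    """Standalone copy of the post-change check (illustrative only)."""
    if not base_url:
        return False  # no base_url means the direct Anthropic API
    normalized = base_url.rstrip("/").lower()
    if "anthropic.com" in normalized:
        return False  # direct Anthropic API
    return True  # anything else is treated as a third-party proxy


class FakeURL:
    """Hypothetical stand-in for a non-str URL object (assumption)."""

    def __init__(self, raw):
        self.raw = raw

    def __str__(self):
        return self.raw


def accepts_url_object(url_obj):
    """Return True if the check runs on url_obj without raising."""
    try:
        is_third_party_anthropic_endpoint(url_obj)
        return True
    except AttributeError:
        # Objects without .rstrip() fail here once normalization is gone.
        return False
```

If this reading is right, callers that hold a URL object would need to pass `str(url)` themselves after this change.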
@@ -194,13 +182,12 @@ def _requires_bearer_auth(base_url: str | None) -> bool:
    """Return True for Anthropic-compatible providers that require Bearer auth.

    Some third-party /anthropic endpoints implement Anthropic's Messages API but
-    require Authorization: Bearer *** of Anthropic's native x-api-key header.
+    require Authorization: Bearer instead of Anthropic's native x-api-key header.
    MiniMax's global and China Anthropic-compatible endpoints follow this pattern.
    """
-    normalized = _normalize_base_url_text(base_url)
-    if not normalized:
+    if not base_url:
        return False
-    normalized = normalized.rstrip("/").lower()
+    normalized = base_url.rstrip("/").lower()
    return normalized.startswith(("https://api.minimax.io/anthropic", "https://api.minimaxi.com/anthropic"))


@@ -216,14 +203,13 @@ def build_anthropic_client(api_key: str, base_url: str = None):
        )
    from httpx import Timeout

-    normalized_base_url = _normalize_base_url_text(base_url)
    kwargs = {
        "timeout": Timeout(timeout=900.0, connect=10.0),
    }
-    if normalized_base_url:
-        kwargs["base_url"] = normalized_base_url
+    if base_url:
+        kwargs["base_url"] = base_url

-    if _requires_bearer_auth(normalized_base_url):
+    if _requires_bearer_auth(base_url):
        # Some Anthropic-compatible providers (e.g. MiniMax) expect the API key in
        # Authorization: Bearer even for regular API keys. Route those endpoints
        # through auth_token so the SDK sends Bearer auth instead of x-api-key.
@@ -956,18 +942,12 @@ def _convert_content_to_anthropic(content: Any) -> Any:

 def convert_messages_to_anthropic(
    messages: List[Dict],
-    base_url: str | None = None,
 ) -> Tuple[Optional[Any], List[Dict]]:
    """Convert OpenAI-format messages to Anthropic format.

    Returns (system_prompt, anthropic_messages).
    System messages are extracted since Anthropic takes them as a separate param.
    system_prompt is a string or list of content blocks (when cache_control present).
-
-    When *base_url* is provided and points to a third-party Anthropic-compatible
-    endpoint, all thinking block signatures are stripped.  Signatures are
-    Anthropic-proprietary — third-party endpoints cannot validate them and will
-    reject them with HTTP 400 "Invalid signature in thinking block".
    """
    system = None
    result = []
@@ -1122,15 +1102,7 @@ def convert_messages_to_anthropic(
                        curr_content = [{"type": "text", "text": curr_content}]
                    fixed[-1]["content"] = prev_content + curr_content
            else:
-                # Consecutive assistant messages — merge text content.
-                # Drop thinking blocks from the *second* message: their
-                # signature was computed against a different turn boundary
-                # and becomes invalid once merged.
-                if isinstance(m["content"], list):
-                    m["content"] = [
-                        b for b in m["content"]
-                        if not (isinstance(b, dict) and b.get("type") in ("thinking", "redacted_thinking"))
-                    ]
+                # Consecutive assistant messages — merge text content
                prev_blocks = fixed[-1]["content"]
                curr_blocks = m["content"]
                if isinstance(prev_blocks, list) and isinstance(curr_blocks, list):
@@ -1148,79 +1120,6 @@ def convert_messages_to_anthropic(
            fixed.append(m)
    result = fixed

-    # ── Thinking block signature management ──────────────────────────
-    # Anthropic signs thinking blocks against the full turn content.
-    # Any upstream mutation (context compression, session truncation,
-    # orphan stripping, message merging) invalidates the signature,
-    # causing HTTP 400 "Invalid signature in thinking block".
-    #
-    # Signatures are Anthropic-proprietary.  Third-party endpoints
-    # (MiniMax, Azure AI Foundry, self-hosted proxies) cannot validate
-    # them and will reject them outright.  When targeting a third-party
-    # endpoint, strip ALL thinking/redacted_thinking blocks from every
-    # assistant message — the third-party will generate its own
-    # thinking blocks if it supports extended thinking.
-    #
-    # For direct Anthropic (strategy following clawdbot/OpenClaw):
-    # 1. Strip thinking/redacted_thinking from all assistant messages
-    #    EXCEPT the last one — preserves reasoning continuity on the
-    #    current tool-use chain while avoiding stale signature errors.
-    # 2. Downgrade unsigned thinking blocks (no signature) to text —
-    #    Anthropic can't validate them and will reject them.
-    # 3. Strip cache_control from thinking/redacted_thinking blocks —
-    #    cache markers can interfere with signature validation.
-    _THINKING_TYPES = frozenset(("thinking", "redacted_thinking"))
-    _is_third_party = _is_third_party_anthropic_endpoint(base_url)
-
-    last_assistant_idx = None
-    for i in range(len(result) - 1, -1, -1):
-        if result[i].get("role") == "assistant":
-            last_assistant_idx = i
-            break
-
-    for idx, m in enumerate(result):
-        if m.get("role") != "assistant" or not isinstance(m.get("content"), list):
-            continue
-
-        if _is_third_party or idx != last_assistant_idx:
-            # Third-party endpoint: strip ALL thinking blocks from every
-            # assistant message — signatures are Anthropic-proprietary.
-            # Direct Anthropic: strip from non-latest assistant messages only.
-            stripped = [
-                b for b in m["content"]
-                if not (isinstance(b, dict) and b.get("type") in _THINKING_TYPES)
-            ]
-            m["content"] = stripped or [{"type": "text", "text": "(thinking elided)"}]
-        else:
-            # Latest assistant on direct Anthropic: keep signed thinking
-            # blocks for reasoning continuity; downgrade unsigned ones to
-            # plain text.
-            new_content = []
-            for b in m["content"]:
-                if not isinstance(b, dict) or b.get("type") not in _THINKING_TYPES:
-                    new_content.append(b)
-                    continue
-                if b.get("type") == "redacted_thinking":
-                    # Redacted blocks use 'data' for the signature payload
-                    if b.get("data"):
-                        new_content.append(b)
-                    # else: drop — no data means it can't be validated
-                elif b.get("signature"):
-                    # Signed thinking block — keep it
-                    new_content.append(b)
-                else:
-                    # Unsigned thinking — downgrade to text so it's not lost
-                    thinking_text = b.get("thinking", "")
-                    if thinking_text:
-                        new_content.append({"type": "text", "text": thinking_text})
-            m["content"] = new_content or [{"type": "text", "text": "(empty)"}]
-
-        # Strip cache_control from any remaining thinking/redacted_thinking
-        # blocks — cache markers interfere with signature validation.
-        for b in m["content"]:
-            if isinstance(b, dict) and b.get("type") in _THINKING_TYPES:
-                b.pop("cache_control", None)
-
    return system, result


@@ -1234,7 +1133,6 @@ def build_anthropic_kwargs(
    is_oauth: bool = False,
    preserve_dots: bool = False,
    context_length: Optional[int] = None,
-    base_url: str | None = None,
 ) -> Dict[str, Any]:
    """Build kwargs for anthropic.messages.create().

@@ -1248,11 +1146,8 @@ def build_anthropic_kwargs(

    When *preserve_dots* is True, model name dots are not converted to hyphens
    (for Alibaba/DashScope anthropic-compatible endpoints: qwen3.5-plus).
-
-    When *base_url* points to a third-party Anthropic-compatible endpoint,
-    thinking block signatures are stripped (they are Anthropic-proprietary).
    """
-    system, anthropic_messages = convert_messages_to_anthropic(messages, base_url=base_url)
+    system, anthropic_messages = convert_messages_to_anthropic(messages)
    anthropic_tools = convert_tools_to_anthropic(tools) if tools else []

    model = normalize_model_name(model, preserve_dots=preserve_dots)
@@ -1329,9 +1224,9 @@ def build_anthropic_kwargs(
    # Map reasoning_config to Anthropic's thinking parameter.
    # Claude 4.6 models use adaptive thinking + output_config.effort.
    # Older models use manual thinking with budget_tokens.
-    # Haiku and MiniMax models do NOT support extended thinking — skip entirely.
+    # Haiku models do NOT support extended thinking at all — skip entirely.
    if reasoning_config and isinstance(reasoning_config, dict):
-        if reasoning_config.get("enabled") is not False and "haiku" not in model.lower() and "minimax" not in model.lower():
+        if reasoning_config.get("enabled") is not False and "haiku" not in model.lower():
            effort = str(reasoning_config.get("effort", "medium")).lower()
            budget = THINKING_BUDGET.get(effort, 8000)
            if _supports_adaptive_thinking(model):
@@ -59,48 +59,13 @@ from hermes_constants import OPENROUTER_BASE_URL

 logger = logging.getLogger(__name__)

-_PROVIDER_ALIASES = {
-    "google": "gemini",
-    "google-gemini": "gemini",
-    "google-ai-studio": "gemini",
-    "glm": "zai",
-    "z-ai": "zai",
-    "z.ai": "zai",
-    "zhipu": "zai",
-    "kimi": "kimi-coding",
-    "moonshot": "kimi-coding",
-    "minimax-china": "minimax-cn",
-    "minimax_cn": "minimax-cn",
-    "claude": "anthropic",
-    "claude-code": "anthropic",
-}
-
-
-def _normalize_aux_provider(provider: Optional[str], *, for_vision: bool = False) -> str:
-    normalized = (provider or "auto").strip().lower()
-    if normalized.startswith("custom:"):
-        suffix = normalized.split(":", 1)[1].strip()
-        if not suffix:
-            return "custom"
-        normalized = suffix if not for_vision else "custom"
-    if normalized == "codex":
-        return "openai-codex"
-    if normalized == "main":
-        # Resolve to the user's actual main provider so named custom providers
-        # and non-aggregator providers (DeepSeek, Alibaba, etc.) work correctly.
-        main_prov = _read_main_provider()
-        if main_prov and main_prov not in ("auto", "main", ""):
-            return main_prov
-        return "custom"
-    return _PROVIDER_ALIASES.get(normalized, normalized)
-
 # Default auxiliary models for direct API-key providers (cheap/fast for side tasks)
 _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
    "gemini": "gemini-3-flash-preview",
    "zai": "glm-4.5-flash",
    "kimi-coding": "kimi-k2-turbo-preview",
-    "minimax": "MiniMax-M2.7",
-    "minimax-cn": "MiniMax-M2.7",
+    "minimax": "MiniMax-M2.7-highspeed",
+    "minimax-cn": "MiniMax-M2.7-highspeed",
    "anthropic": "claude-haiku-4-5-20251001",
    "ai-gateway": "google/gemini-3-flash",
    "opencode-zen": "gemini-3-flash",
@@ -127,7 +92,6 @@ auxiliary_is_nous: bool = False
 _OPENROUTER_MODEL = "google/gemini-3-flash-preview"
 _NOUS_MODEL = "google/gemini-3-flash-preview"
 _NOUS_FREE_TIER_VISION_MODEL = "xiaomi/mimo-v2-omni"
-_NOUS_FREE_TIER_AUX_MODEL = "xiaomi/mimo-v2-pro"
 _NOUS_DEFAULT_BASE_URL = "https://inference-api.nousresearch.com/v1"
 _ANTHROPIC_DEFAULT_BASE_URL = "https://api.anthropic.com"
 _AUTH_JSON_PATH = get_hermes_home() / "auth.json"
@@ -141,23 +105,6 @@ _CODEX_AUX_MODEL = "gpt-5.2-codex"
 _CODEX_AUX_BASE_URL = "https://chatgpt.com/backend-api/codex"


-def _to_openai_base_url(base_url: str) -> str:
-    """Normalize an Anthropic-style base URL to OpenAI-compatible format.
-
-    Some providers (MiniMax, MiniMax-CN) expose an ``/anthropic`` endpoint for
-    the Anthropic Messages API and a separate ``/v1`` endpoint for OpenAI chat
-    completions.  The auxiliary client uses the OpenAI SDK, so it must hit the
-    ``/v1`` surface.  Passing the raw ``inference_base_url`` causes requests to
-    land on ``/anthropic/chat/completions`` — a 404.
-    """
-    url = str(base_url or "").strip().rstrip("/")
-    if url.endswith("/anthropic"):
-        rewritten = url[: -len("/anthropic")] + "/v1"
-        logger.debug("Auxiliary client: rewrote base URL %s → %s", url, rewritten)
-        return rewritten
-    return url
-
-
 def _select_pool_entry(provider: str) -> Tuple[bool, Optional[Any]]:
    """Return (pool_exists_for_provider, selected_entry)."""
    try:
@@ -629,19 +576,11 @@ def _nous_base_url() -> str:


 def _read_codex_access_token() -> Optional[str]:
-    """Read a valid, non-expired Codex OAuth access token from Hermes auth store.
-
-    If a credential pool exists but currently has no selectable runtime entry
-    (for example all pool slots are marked exhausted), fall back to the
-    profile's auth.json token instead of hard-failing. This keeps explicit
-    fallback-to-Codex working when the pool state is stale but the stored OAuth
-    token is still valid.
-    """
+    """Read a valid, non-expired Codex OAuth access token from Hermes auth store."""
    pool_present, entry = _select_pool_entry("openai-codex")
    if pool_present:
        token = _pool_runtime_api_key(entry)
-        if token:
-            return token
+        return token or None

    try:
        from hermes_cli.auth import _read_codex_tokens
@@ -695,9 +634,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
            if not api_key:
                continue

-            base_url = _to_openai_base_url(
-                _pool_runtime_base_url(entry, pconfig.inference_base_url) or pconfig.inference_base_url
-            )
+            base_url = _pool_runtime_base_url(entry, pconfig.inference_base_url) or pconfig.inference_base_url
            model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id, "default")
            logger.debug("Auxiliary text client: %s (%s) via pool", pconfig.name, model)
            extra = {}
@@ -714,9 +651,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
        if not api_key:
            continue

-        base_url = _to_openai_base_url(
-            str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
-        )
+        base_url = str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
        model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id, "default")
        logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
        extra = {}
@@ -778,7 +713,7 @@ def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]:
                   default_headers=_OR_HEADERS), _OPENROUTER_MODEL


-def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
+def _try_nous() -> Tuple[Optional[OpenAI], Optional[str]]:
    nous = _read_nous_auth()
    if not nous:
        return None, None
@@ -790,13 +725,12 @@ def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
    else:
        model = _NOUS_MODEL
    # Free-tier users can't use paid auxiliary models — use the free
-    # models instead: mimo-v2-omni for vision, mimo-v2-pro for text tasks.
+    # multimodal model instead so vision/browser-vision still works.
    try:
        from hermes_cli.models import check_nous_free_tier
        if check_nous_free_tier():
-            model = _NOUS_FREE_TIER_VISION_MODEL if vision else _NOUS_FREE_TIER_AUX_MODEL
-            logger.debug("Free-tier Nous account — using %s for auxiliary/%s",
-                         model, "vision" if vision else "text")
+            model = _NOUS_FREE_TIER_VISION_MODEL
+            logger.debug("Free-tier Nous account — using %s for auxiliary/vision", model)
    except Exception:
        pass
    return (
@@ -902,13 +836,9 @@ def _try_codex() -> Tuple[Optional[Any], Optional[str]]:
    pool_present, entry = _select_pool_entry("openai-codex")
    if pool_present:
        codex_token = _pool_runtime_api_key(entry)
-        if codex_token:
-            base_url = _pool_runtime_base_url(entry, _CODEX_AUX_BASE_URL) or _CODEX_AUX_BASE_URL
-        else:
-            codex_token = _read_codex_access_token()
-            if not codex_token:
-                return None, None
-            base_url = _CODEX_AUX_BASE_URL
+        if not codex_token:
+            return None, None
+        base_url = _pool_runtime_base_url(entry, _CODEX_AUX_BASE_URL) or _CODEX_AUX_BASE_URL
    else:
        codex_token = _read_codex_access_token()
        if not codex_token:
@@ -1208,7 +1138,17 @@ def resolve_provider_client(
        (client, resolved_model) or (None, None) if auth is unavailable.
    """
    # Normalise aliases
-    provider = _normalize_aux_provider(provider)
+    provider = (provider or "auto").strip().lower()
+    if provider == "codex":
+        provider = "openai-codex"
+    if provider == "main":
+        # Resolve to the user's actual main provider so named custom providers
+        # and non-aggregator providers (DeepSeek, Alibaba, etc.) work correctly.
+        main_prov = _read_main_provider()
+        if main_prov and main_prov not in ("auto", "main", ""):
+            provider = main_prov
+        else:
+            provider = "custom"

    # ── Auto: try all providers in priority order ────────────────────
    if provider == "auto":
@@ -1358,9 +1298,7 @@ def resolve_provider_client(
                         provider, ", ".join(tried_sources))
            return None, None

-        base_url = _to_openai_base_url(
-            str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
-        )
+        base_url = str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url

        default_model = _API_KEY_PROVIDER_AUX_MODELS.get(provider, "")
        final_model = model or default_model
@@ -1437,11 +1375,24 @@ def get_async_text_auxiliary_client(task: str = ""):
 _VISION_AUTO_PROVIDER_ORDER = (
    "openrouter",
    "nous",
+    "openai-codex",
+    "anthropic",
+    "custom",
 )


 def _normalize_vision_provider(provider: Optional[str]) -> str:
-    return _normalize_aux_provider(provider, for_vision=True)
+    provider = (provider or "auto").strip().lower()
+    if provider == "codex":
+        return "openai-codex"
+    if provider == "main":
+        # Resolve to actual main provider — named custom providers and
+        # non-aggregator providers need to pass through as their real name.
+        main_prov = _read_main_provider()
+        if main_prov and main_prov not in ("auto", "main", ""):
+            return main_prov
+        return "custom"
+    return provider
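Reviewer note: the inlined normalization above no longer consults `_PROVIDER_ALIASES` or handles the `custom:` prefix. The sketch below is a standalone copy of the added lines, for illustration only. The `main_provider` parameter is a hypothetical substitute for the value `_read_main_provider()` would return.

```python
def normalize_vision_provider(provider, main_provider=None):
    """Standalone copy of the post-change normalization (illustrative only)."""
    provider = (provider or "auto").strip().lower()
    if provider == "codex":
        return "openai-codex"
    if provider == "main":
        # main_provider is an assumed stand-in for _read_main_provider().
        if main_provider and main_provider not in ("auto", "main", ""):
            return main_provider
        return "custom"
    return provider
```

Under this sketch, `normalize_vision_provider("google")` returns `"google"` rather than `"gemini"`, and `"custom:local"` passes through unchanged, so configurations that relied on the removed aliases may need updating.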


 def _resolve_strict_vision_backend(provider: str) -> Tuple[Optional[Any], Optional[str]]:
@@ -1449,7 +1400,7 @@ def _resolve_strict_vision_backend(provider: str) -> Tuple[Optional[Any], Option
    if provider == "openrouter":
        return _try_openrouter()
    if provider == "nous":
-        return _try_nous(vision=True)
+        return _try_nous()
    if provider == "openai-codex":
        return _try_codex()
    if provider == "anthropic":
@@ -1482,26 +1433,17 @@ def _preferred_main_vision_provider() -> Optional[str]:
 def get_available_vision_backends() -> List[str]:
    """Return the currently available vision backends in auto-selection order.

-    Order: active provider → OpenRouter → Nous → stop.  This is the single
-    source of truth for setup, tool gating, and runtime auto-routing of
-    vision tasks.
+    This is the single source of truth for setup, tool gating, and runtime
+    auto-routing of vision tasks. The selected main provider is preferred when
+    it is also a known-good vision backend; otherwise Hermes falls back through
+    the standard conservative order.
    """
-    available: List[str] = []
-    # 1. Active provider — if the user configured a provider, try it first.
-    main_provider = _read_main_provider()
-    if main_provider and main_provider not in ("auto", ""):
-        if main_provider in _VISION_AUTO_PROVIDER_ORDER:
-            if _strict_vision_backend_available(main_provider):
-                available.append(main_provider)
-        else:
-            client, _ = resolve_provider_client(main_provider, _read_main_model())
-            if client is not None:
-                available.append(main_provider)
-    # 2. OpenRouter, 3. Nous — skip if already covered by main provider.
-    for p in _VISION_AUTO_PROVIDER_ORDER:
-        if p not in available and _strict_vision_backend_available(p):
-            available.append(p)
-    return available
+    ordered = list(_VISION_AUTO_PROVIDER_ORDER)
+    preferred = _preferred_main_vision_provider()
+    if preferred in ordered:
+        ordered.remove(preferred)
+        ordered.insert(0, preferred)
+    return [provider for provider in ordered if _strict_vision_backend_available(provider)]


 def resolve_vision_provider_client(
@@ -1546,39 +1488,16 @@ def resolve_vision_provider_client(
        return "custom", client, final_model

    if requested == "auto":
-        # Vision auto-detection order:
-        #   1. Active provider + model (user's main chat config)
-        #   2. OpenRouter  (known vision-capable default model)
-        #   3. Nous Portal (known vision-capable default model)
-        #   4. Stop
-        main_provider = _read_main_provider()
-        main_model = _read_main_model()
-        if main_provider and main_provider not in ("auto", ""):
-            if main_provider in _VISION_AUTO_PROVIDER_ORDER:
-                # Known strict backend — use its defaults.
-                sync_client, default_model = _resolve_strict_vision_backend(main_provider)
-                if sync_client is not None:
-                    return _finalize(main_provider, sync_client, default_model)
-            else:
-                # Exotic provider (DeepSeek, Alibaba, named custom, etc.)
-                rpc_client, rpc_model = resolve_provider_client(
-                    main_provider, main_model)
-                if rpc_client is not None:
-                    logger.info(
-                        "Vision auto-detect: using active provider %s (%s)",
-                        main_provider, rpc_model or main_model,
-                    )
-                    return _finalize(
-                        main_provider, rpc_client, rpc_model or main_model)
+        ordered = list(_VISION_AUTO_PROVIDER_ORDER)
+        preferred = _preferred_main_vision_provider()
+        if preferred in ordered:
+            ordered.remove(preferred)
+            ordered.insert(0, preferred)

-        # Fall back through aggregators.
-        for candidate in _VISION_AUTO_PROVIDER_ORDER:
-            if candidate == main_provider:
-                continue  # already tried above
+        for candidate in ordered:
            sync_client, default_model = _resolve_strict_vision_backend(candidate)
            if sync_client is not None:
                return _finalize(candidate, sync_client, default_model)
-
        logger.debug("Auxiliary vision client: none available")
        return None, None, None

@@ -154,15 +154,12 @@ class ContextCompressor:

    def _prune_old_tool_results(
        self, messages: List[Dict[str, Any]], protect_tail_count: int,
-        protect_tail_tokens: int | None = None,
    ) -> tuple[List[Dict[str, Any]], int]:
        """Replace old tool result contents with a short placeholder.

-        Walks backward from the end, protecting the most recent messages that
-        fall within ``protect_tail_tokens`` (when provided) OR the last
-        ``protect_tail_count`` messages (backward-compatible default).
-        When both are given, the token budget takes priority and the message
-        count acts as a hard minimum floor.
+        Walks backward from the end, protecting the most recent
+        ``protect_tail_count`` messages. Older tool results get their
+        content replaced with a placeholder string.

        Returns (pruned_messages, pruned_count).
        """
@@ -171,29 +168,7 @@ class ContextCompressor:

        result = [m.copy() for m in messages]
        pruned = 0
-
-        # Determine the prune boundary
-        if protect_tail_tokens is not None and protect_tail_tokens > 0:
-            # Token-budget approach: walk backward accumulating tokens
-            accumulated = 0
-            boundary = len(result)
-            min_protect = min(protect_tail_count, len(result) - 1)
-            for i in range(len(result) - 1, -1, -1):
-                msg = result[i]
-                content_len = len(msg.get("content") or "")
-                msg_tokens = content_len // _CHARS_PER_TOKEN + 10
-                for tc in msg.get("tool_calls") or []:
-                    if isinstance(tc, dict):
-                        args = tc.get("function", {}).get("arguments", "")
-                        msg_tokens += len(args) // _CHARS_PER_TOKEN
-                if accumulated + msg_tokens > protect_tail_tokens and (len(result) - i) >= min_protect:
-                    boundary = i
-                    break
-                accumulated += msg_tokens
-                boundary = i
-            prune_boundary = max(boundary, len(result) - min_protect)
-        else:
-            prune_boundary = len(result) - protect_tail_count
+        prune_boundary = len(result) - protect_tail_count

        for i in range(prune_boundary):
            msg = result[i]
@@ -224,39 +199,30 @@ class ContextCompressor:
        budget = int(content_tokens * _SUMMARY_RATIO)
        return max(_MIN_SUMMARY_TOKENS, min(budget, self.max_summary_tokens))

-    # Truncation limits for the summarizer input.  These bound how much of
-    # each message the summary model sees — the budget is the *summary*
-    # model's context window, not the main model's.
-    _CONTENT_MAX = 6000       # total chars per message body
-    _CONTENT_HEAD = 4000      # chars kept from the start
-    _CONTENT_TAIL = 1500      # chars kept from the end
-    _TOOL_ARGS_MAX = 1500     # tool call argument chars
-    _TOOL_ARGS_HEAD = 1200    # kept from the start of tool args
-
    def _serialize_for_summary(self, turns: List[Dict[str, Any]]) -> str:
        """Serialize conversation turns into labeled text for the summarizer.

-        Includes tool call arguments and result content (up to
-        ``_CONTENT_MAX`` chars per message) so the summarizer can preserve
-        specific details like file paths, commands, and outputs.
+        Includes tool call arguments and result content (up to 3000 chars
+        per message) so the summarizer can preserve specific details like
+        file paths, commands, and outputs.
        """
        parts = []
        for msg in turns:
            role = msg.get("role", "unknown")
            content = msg.get("content") or ""

-            # Tool results: keep enough content for the summarizer
+            # Tool results: keep up to 3000 chars (head 2000 + tail 800)
            if role == "tool":
                tool_id = msg.get("tool_call_id", "")
-                if len(content) > self._CONTENT_MAX:
-                    content = content[:self._CONTENT_HEAD] + "\n...[truncated]...\n" + content[-self._CONTENT_TAIL:]
+                if len(content) > 3000:
+                    content = content[:2000] + "\n...[truncated]...\n" + content[-800:]
                parts.append(f"[TOOL RESULT {tool_id}]: {content}")
                continue

            # Assistant messages: include tool call names AND arguments
            if role == "assistant":
-                if len(content) > self._CONTENT_MAX:
-                    content = content[:self._CONTENT_HEAD] + "\n...[truncated]...\n" + content[-self._CONTENT_TAIL:]
+                if len(content) > 3000:
+                    content = content[:2000] + "\n...[truncated]...\n" + content[-800:]
                tool_calls = msg.get("tool_calls", [])
                if tool_calls:
                    tc_parts = []
@@ -266,8 +232,8 @@ class ContextCompressor:
                            name = fn.get("name", "?")
                            args = fn.get("arguments", "")
                            # Truncate long arguments but keep enough for context
-                            if len(args) > self._TOOL_ARGS_MAX:
-                                args = args[:self._TOOL_ARGS_HEAD] + "..."
+                            if len(args) > 500:
+                                args = args[:400] + "..."
                            tc_parts.append(f"  {name}({args})")
                        else:
                            fn = getattr(tc, "function", None)
@@ -278,8 +244,8 @@ class ContextCompressor:
                continue

            # User and other roles
-            if len(content) > self._CONTENT_MAX:
-                content = content[:self._CONTENT_HEAD] + "\n...[truncated]...\n" + content[-self._CONTENT_TAIL:]
+            if len(content) > 3000:
+                content = content[:2000] + "\n...[truncated]...\n" + content[-800:]
            parts.append(f"[{role.upper()}]: {content}")

        return "\n\n".join(parts)
@@ -344,9 +310,6 @@ Update the summary using this exact structure. PRESERVE all existing information
 ## Critical Context
 [Any specific values, error messages, configuration details, or data that would be lost without explicit preservation]

-## Tools & Patterns
-[Which tools were used, how they were used effectively, and any tool-specific discoveries. Accumulate across compactions.]
-
 Target ~{summary_budget} tokens. Be specific — include file paths, command outputs, error messages, and concrete values rather than vague descriptions.

 Write only the summary body. Do not include any preamble or prefix."""
@@ -385,9 +348,6 @@ Use this exact structure:
 ## Critical Context
 [Any specific values, error messages, configuration details, or data that would be lost without explicit preservation]

-## Tools & Patterns
-[Which tools were used, how they were used effectively, and any tool-specific discoveries (e.g., preferred flags, working invocations, successful command patterns)]
-
 Target ~{summary_budget} tokens. Be specific — include file paths, command outputs, error messages, and concrete values rather than vague descriptions. The goal is to prevent the next assistant from repeating work or losing important details.

 Write only the summary body. Do not include any preamble or prefix."""
@@ -558,20 +518,13 @@ Write only the summary body. Do not include any preamble or prefix."""
        derived from ``summary_target_ratio * context_length``, so it
        scales automatically with the model's context window.

-        Token budget is the primary criterion.  A hard minimum of 3 messages
-        is always protected, but the budget is allowed to exceed by up to
-        1.5x to avoid cutting inside an oversized message (tool output, file
-        read, etc.).  If even the minimum 3 messages exceed 1.5x the budget
-        the cut is placed right after the head so compression still runs.
-
-        Never cuts inside a tool_call/result group.
+        Never cuts inside a tool_call/result group. Falls back to the old
+        ``protect_last_n`` if the budget would protect fewer messages.
        """
        if token_budget is None:
            token_budget = self.tail_token_budget
        n = len(messages)
-        # Hard minimum: always keep at least 3 messages in the tail
-        min_tail = min(3, n - head_end - 1) if n - head_end > 1 else 0
-        soft_ceiling = int(token_budget * 1.5)
+        min_tail = self.protect_last_n
        accumulated = 0
        cut_idx = n  # start from beyond the end

@@ -584,21 +537,21 @@ Write only the summary body. Do not include any preamble or prefix."""
                if isinstance(tc, dict):
                    args = tc.get("function", {}).get("arguments", "")
                    msg_tokens += len(args) // _CHARS_PER_TOKEN
-            # Stop once we exceed the soft ceiling (unless we haven't hit min_tail yet)
-            if accumulated + msg_tokens > soft_ceiling and (n - i) >= min_tail:
+            if accumulated + msg_tokens > token_budget and (n - i) >= min_tail:
                break
            accumulated += msg_tokens
            cut_idx = i

-        # Ensure we protect at least min_tail messages
+        # Ensure we protect at least protect_last_n messages
        fallback_cut = n - min_tail
        if cut_idx > fallback_cut:
            cut_idx = fallback_cut

        # If the token budget would protect everything (small conversations),
-        # force a cut after the head so compression can still remove middle turns.
+        # fall back to the fixed protect_last_n approach so compression can
+        # still remove middle turns.
        if cut_idx <= head_end:
-            cut_idx = max(fallback_cut, head_end + 1)
+            cut_idx = fallback_cut

        # Align to avoid splitting tool groups
        cut_idx = self._align_boundary_backward(messages, cut_idx)
@@ -623,13 +576,12 @@ Write only the summary body. Do not include any preamble or prefix."""
        up so the API never receives mismatched IDs.
        """
        n_messages = len(messages)
-        # Only need head + 3 tail messages minimum (token budget decides the real tail size)
-        _min_for_compress = self.protect_first_n + 3 + 1
-        if n_messages <= _min_for_compress:
+        if n_messages <= self.protect_first_n + self.protect_last_n + 1:
            if not self.quiet_mode:
                logger.warning(
                    "Cannot compress: only %d messages (need > %d)",
-                    n_messages, _min_for_compress,
+                    n_messages,
+                    self.protect_first_n + self.protect_last_n + 1,
                )
            return messages

@@ -637,8 +589,7 @@ Write only the summary body. Do not include any preamble or prefix."""

        # Phase 1: Prune old tool results (cheap, no LLM call)
        messages, pruned_count = self._prune_old_tool_results(
-            messages, protect_tail_count=self.protect_last_n,
-            protect_tail_tokens=self.tail_token_budget,
+            messages, protect_tail_count=self.protect_last_n * 3,
        )
        if pruned_count and not self.quiet_mode:
            logger.info("Pre-compression: pruned %d old tool result(s)", pruned_count)
@@ -64,10 +64,10 @@ SUPPORTED_POOL_STRATEGIES = {
 }

 # Cooldown before retrying an exhausted credential.
-# 429 (rate-limited) and 402 (billing/quota) both cool down after 1 hour.
-# Provider-supplied reset_at timestamps override these defaults.
+# 429 (rate-limited) cools down faster since quotas reset frequently.
+# 402 (billing/quota) and other codes use a longer default.
 EXHAUSTED_TTL_429_SECONDS = 60 * 60          # 1 hour
-EXHAUSTED_TTL_DEFAULT_SECONDS = 60 * 60      # 1 hour
+EXHAUSTED_TTL_DEFAULT_SECONDS = 24 * 60 * 60 # 24 hours

 # Pool key prefix for custom OpenAI-compatible endpoints.
 # Custom endpoints all share provider='custom' but are keyed by their
@@ -26,14 +26,12 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
    "openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
    "gemini", "zai", "kimi-coding", "minimax", "minimax-cn", "anthropic", "deepseek",
    "opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba",
-    "qwen-oauth",
    "custom", "local",
    # Common aliases
    "google", "google-gemini", "google-ai-studio",
    "glm", "z-ai", "z.ai", "zhipu", "github", "github-copilot",
    "github-models", "kimi", "moonshot", "claude", "deep-seek",
    "opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
-    "qwen-portal",
 })


@@ -115,15 +113,8 @@ DEFAULT_CONTEXT_LENGTHS = {
    "llama": 131072,
    # Qwen
    "qwen": 131072,
-    # MiniMax (lowercase — lookup lowercases model names at line 973)
-    "minimax-m1-256k": 1000000,
-    "minimax-m1-128k": 1000000,
-    "minimax-m1-80k": 1000000,
-    "minimax-m1-40k": 1000000,
-    "minimax-m1": 1000000,
-    "minimax-m2.5": 1048576,
-    "minimax-m2.7": 1048576,
-    "minimax": 1048576,
+    # MiniMax
+    "minimax": 204800,
    # GLM
    "glm": 202752,
    # Kimi
@@ -136,7 +127,7 @@ DEFAULT_CONTEXT_LENGTHS = {
    "deepseek-ai/DeepSeek-V3.2": 65536,
    "moonshotai/Kimi-K2.5": 262144,
    "moonshotai/Kimi-K2-Thinking": 262144,
-    "MiniMaxAI/MiniMax-M2.5": 1048576,
+    "MiniMaxAI/MiniMax-M2.5": 204800,
    "XiaomiMiMo/MiMo-V2-Flash": 32768,
    "mimo-v2-pro": 1048576,
    "mimo-v2-omni": 1048576,
@@ -189,7 +180,6 @@ _URL_TO_PROVIDER: Dict[str, str] = {
    "api.minimax": "minimax",
    "dashscope.aliyuncs.com": "alibaba",
    "dashscope-intl.aliyuncs.com": "alibaba",
-    "portal.qwen.ai": "qwen-oauth",
    "openrouter.ai": "openrouter",
    "generativelanguage.googleapis.com": "gemini",
    "inference-api.nousresearch.com": "nous",
@@ -197,7 +187,6 @@ _URL_TO_PROVIDER: Dict[str, str] = {
    "api.githubcopilot.com": "copilot",
    "models.github.ai": "copilot",
    "api.fireworks.ai": "fireworks",
-    "opencode.ai": "opencode-go",
 }


@@ -622,59 +611,6 @@ def _model_id_matches(candidate_id: str, lookup_model: str) -> bool:
    return False


-def query_ollama_num_ctx(model: str, base_url: str) -> Optional[int]:
-    """Query an Ollama server for the model's context length.
-
-    Returns the model's maximum context from GGUF metadata via ``/api/show``,
-    or the explicit ``num_ctx`` from the Modelfile if set.  Returns None if
-    the server is unreachable or not Ollama.
-
-    This is the value that should be passed as ``num_ctx`` in Ollama chat
-    requests to override the default 2048.
-    """
-    import httpx
-
-    bare_model = _strip_provider_prefix(model)
-    server_url = base_url.rstrip("/")
-    if server_url.endswith("/v1"):
-        server_url = server_url[:-3]
-
-    try:
-        server_type = detect_local_server_type(base_url)
-    except Exception:
-        return None
-    if server_type != "ollama":
-        return None
-
-    try:
-        with httpx.Client(timeout=3.0) as client:
-            resp = client.post(f"{server_url}/api/show", json={"name": bare_model})
-            if resp.status_code != 200:
-                return None
-            data = resp.json()
-
-            # Prefer explicit num_ctx from Modelfile parameters (user override)
-            params = data.get("parameters", "")
-            if "num_ctx" in params:
-                for line in params.split("\n"):
-                    if "num_ctx" in line:
-                        parts = line.strip().split()
-                        if len(parts) >= 2:
-                            try:
-                                return int(parts[-1])
-                            except ValueError:
-                                pass
-
-            # Fall back to GGUF model_info context_length (training max)
-            model_info = data.get("model_info", {})
-            for key, value in model_info.items():
-                if "context_length" in key and isinstance(value, (int, float)):
-                    return int(value)
-    except Exception:
-        pass
-    return None
-
-
 def _query_local_context_length(model: str, base_url: str) -> Optional[int]:
    """Query a local server for the model's context length."""
    import httpx
@@ -153,7 +153,6 @@ PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
    "minimax-cn": "minimax-cn",
    "deepseek": "deepseek",
    "alibaba": "alibaba",
-    "qwen-oauth": "alibaba",
    "copilot": "github-copilot",
    "ai-gateway": "vercel",
    "opencode-zen": "opencode",
@@ -204,30 +204,6 @@ OPENAI_MODEL_EXECUTION_GUIDANCE = (
    "the result.\n"
    "</tool_persistence>\n"
    "\n"
-    "<mandatory_tool_use>\n"
-    "NEVER answer these from memory or mental computation — ALWAYS use a tool:\n"
-    "- Arithmetic, math, calculations → use terminal or execute_code\n"
-    "- Hashes, encodings, checksums → use terminal (e.g. sha256sum, base64)\n"
-    "- Current time, date, timezone → use terminal (e.g. date)\n"
-    "- System state: OS, CPU, memory, disk, ports, processes → use terminal\n"
-    "- File contents, sizes, line counts → use read_file, search_files, or terminal\n"
-    "- Git history, branches, diffs → use terminal\n"
-    "- Current facts (weather, news, versions) → use web_search\n"
-    "Your memory and user profile describe the USER, not the system you are "
-    "running on. The execution environment may differ from what the user profile "
-    "says about their personal setup.\n"
-    "</mandatory_tool_use>\n"
-    "\n"
-    "<act_dont_ask>\n"
-    "When a question has an obvious default interpretation, act on it immediately "
-    "instead of asking for clarification. Examples:\n"
-    "- 'Is port 443 open?' → check THIS machine (don't ask 'open where?')\n"
-    "- 'What OS am I running?' → check the live system (don't use user profile)\n"
-    "- 'What time is it?' → run `date` (don't guess)\n"
-    "Only ask for clarification when the ambiguity genuinely changes what tool "
-    "you would call.\n"
-    "</act_dont_ask>\n"
-    "\n"
    "<prerequisite_checks>\n"
    "- Before taking an action, check whether prerequisite discovery, lookup, or "
    "context-gathering steps are needed.\n"
@@ -349,13 +325,6 @@ PLATFORM_HINTS = {
        "only — no markdown, no formatting. SMS messages are limited to ~1600 "
        "characters, so be brief and direct."
    ),
-    "bluebubbles": (
-        "You are chatting via iMessage (BlueBubbles). iMessage does not render "
-        "markdown formatting — use plain text. Keep responses concise as they "
-        "appear as text messages. You can send media files natively: include "
-        "MEDIA:/absolute/path/to/file in your response. Images (.jpg, .png, "
-        ".heic) appear as photos and other files arrive as attachments."
-    ),
 }

 CONTEXT_FILE_MAX_CHARS = 20_000
@@ -1,57 +0,0 @@
-"""Retry utilities — jittered backoff for decorrelated retries.
-
-Replaces fixed exponential backoff with jittered delays to prevent
-thundering-herd retry spikes when multiple sessions hit the same
-rate-limited provider concurrently.
-"""
-
-import random
-import threading
-import time
-
-# Monotonic counter for jitter seed uniqueness within the same process.
-# Protected by a lock to avoid race conditions in concurrent retry paths
-# (e.g. multiple gateway sessions retrying simultaneously).
-_jitter_counter = 0
-_jitter_lock = threading.Lock()
-
-
-def jittered_backoff(
-    attempt: int,
-    *,
-    base_delay: float = 5.0,
-    max_delay: float = 120.0,
-    jitter_ratio: float = 0.5,
-) -> float:
-    """Compute a jittered exponential backoff delay.
-
-    Args:
-        attempt: 1-based retry attempt number.
-        base_delay: Base delay in seconds for attempt 1.
-        max_delay: Maximum delay cap in seconds.
-        jitter_ratio: Fraction of computed delay to use as random jitter
-            range.  0.5 means jitter is uniform in [0, 0.5 * delay].
-
-    Returns:
-        Delay in seconds: min(base * 2^(attempt-1), max_delay) + jitter.
-
-    The jitter decorrelates concurrent retries so multiple sessions
-    hitting the same provider don't all retry at the same instant.
-    """
-    global _jitter_counter
-    with _jitter_lock:
-        _jitter_counter += 1
-        tick = _jitter_counter
-
-    exponent = max(0, attempt - 1)
-    if exponent >= 63 or base_delay <= 0:
-        delay = max_delay
-    else:
-        delay = min(base_delay * (2 ** exponent), max_delay)
-
-    # Seed from time + counter for decorrelation even with coarse clocks.
-    seed = (time.time_ns() ^ (tick * 0x9E3779B9)) & 0xFFFFFFFF
-    rng = random.Random(seed)
-    jitter = rng.uniform(0, jitter_ratio * delay)
-
-    return delay + jitter
@@ -445,16 +445,6 @@ agent:
  # Higher = more room for complex tasks, but costs more tokens
  # Recommended: 20-30 for focused tasks, 50-100 for open exploration
  max_turns: 60
-
-  # Inactivity timeout for gateway agent runs (seconds, 0 = unlimited).
-  # The agent can run indefinitely when actively calling tools or receiving
-  # API responses.  Only fires after the agent has been idle for this duration.
-  # gateway_timeout: 1800
-
-  # Staged warning: send a warning before escalating to full timeout.
-  # Fires once per run when inactivity reaches this threshold (seconds).
-  # Set to 0 to disable the warning.
-  # gateway_timeout_warning: 900
  
  # Enable verbose logging
  verbose: false
@@ -654,14 +644,10 @@ platform_toolsets:
 # Voice Transcription (Speech-to-Text)
 # =============================================================================
 # Automatically transcribe voice messages on messaging platforms.
-# Providers: local (free, faster-whisper) | groq (free tier) | openai (Whisper API) | mistral (Voxtral Transcribe)
-# Set the corresponding API key in .env: GROQ_API_KEY, OPENAI_API_KEY, or MISTRAL_API_KEY.
+# Requires OPENAI_API_KEY in .env (uses OpenAI Whisper API directly).
 stt:
  enabled: true
-  # provider: "local"          # auto-detected if omitted
  model: "whisper-1"  # whisper-1 (cheapest) | gpt-4o-mini-transcribe | gpt-4o-transcribe
-  # mistral:
-  #   model: "voxtral-mini-latest"  # voxtral-mini-latest | voxtral-mini-2602

 # =============================================================================
 # Response Pacing (Messaging Platforms)
@@ -612,11 +612,6 @@ def _run_cleanup():
        pass
    # Shut down memory provider (on_session_end + shutdown_all) at actual
    # session boundary — NOT per-turn inside run_conversation().
-    try:
-        from hermes_cli.plugins import invoke_hook as _invoke_hook
-        _invoke_hook("on_session_finalize", session_id=_active_agent_ref.session_id if _active_agent_ref else None, platform="cli")
-    except Exception:
-        pass
    try:
        if _active_agent_ref and hasattr(_active_agent_ref, 'shutdown_memory_provider'):
            _active_agent_ref.shutdown_memory_provider(
@@ -760,10 +755,7 @@ def _setup_worktree(repo_root: str = None) -> Optional[Dict[str, str]]:
 def _cleanup_worktree(info: Dict[str, str] = None) -> None:
    """Remove a worktree and its branch on exit.

-    Preserves the worktree only if it has unpushed commits (real work
-    that hasn't been pushed to any remote).  Uncommitted changes alone
-    (untracked files, test artifacts) are not enough to keep it — agent
-    work lives in commits/PRs, not the working tree.
+    If the worktree has uncommitted changes, warn and keep it.
    """
    global _active_worktree
    info = info or _active_worktree
@@ -779,27 +771,23 @@ def _cleanup_worktree(info: Dict[str, str] = None) -> None:
    if not Path(wt_path).exists():
        return

-    # Check for unpushed commits — commits reachable from HEAD but not
-    # from any remote branch.  These represent real work the agent did
-    # but didn't push.
-    has_unpushed = False
+    # Check for uncommitted changes
    try:
-        result = subprocess.run(
-            ["git", "log", "--oneline", "HEAD", "--not", "--remotes"],
+        status = subprocess.run(
+            ["git", "status", "--porcelain"],
            capture_output=True, text=True, timeout=10, cwd=wt_path,
        )
-        has_unpushed = bool(result.stdout.strip())
+        has_changes = bool(status.stdout.strip())
    except Exception:
-        has_unpushed = True  # Assume unpushed on error — don't delete
+        has_changes = True  # Assume dirty on error — don't delete

-    if has_unpushed:
-        print(f"\n\033[33m⚠ Worktree has unpushed commits, keeping: {wt_path}\033[0m")
-        print(f"  To clean up manually: git worktree remove --force {wt_path}")
+    if has_changes:
+        print(f"\n\033[33m⚠ Worktree has uncommitted changes, keeping: {wt_path}\033[0m")
+        print(f"  To clean up manually: git worktree remove --force {wt_path}")
        _active_worktree = None
        return

-    # Remove worktree (even if working tree is dirty — uncommitted
-    # changes without unpushed commits are just artifacts)
+    # Remove worktree
    try:
        subprocess.run(
            ["git", "worktree", "remove", wt_path, "--force"],
@@ -808,7 +796,7 @@ def _cleanup_worktree(info: Dict[str, str] = None) -> None:
    except Exception as e:
        logger.debug("Failed to remove worktree: %s", e)

-    # Delete the branch
+    # Delete the local branch (-D force-deletes it even if unmerged)
    try:
        subprocess.run(
            ["git", "branch", "-D", branch],
@@ -822,27 +810,19 @@ def _cleanup_worktree(info: Dict[str, str] = None) -> None:


 def _prune_stale_worktrees(repo_root: str, max_age_hours: int = 24) -> None:
-    """Remove stale worktrees and orphaned branches on startup.
+    """Remove worktrees older than max_age_hours that have no uncommitted changes.

-    Age-based tiers:
-    - Under max_age_hours (24h): skip — session may still be active.
-    - 24h–72h: remove if no unpushed commits.
-    - Over 72h: force remove regardless (nothing should sit this long).
-
-    Also prunes orphaned ``hermes/*`` and ``pr-*`` local branches that
-    have no corresponding worktree.
+    Runs silently on startup to clean up after crashed/killed sessions.
    """
    import subprocess
    import time

    worktrees_dir = Path(repo_root) / ".worktrees"
    if not worktrees_dir.exists():
-        _prune_orphaned_branches(repo_root)
        return

    now = time.time()
-    soft_cutoff = now - (max_age_hours * 3600)       # 24h default
-    hard_cutoff = now - (max_age_hours * 3 * 3600)   # 72h default
+    cutoff = now - (max_age_hours * 3600)

    for entry in worktrees_dir.iterdir():
        if not entry.is_dir() or not entry.name.startswith("hermes-"):
@@ -851,24 +831,21 @@ def _prune_stale_worktrees(repo_root: str, max_age_hours: int = 24) -> None:
        # Check age
        try:
            mtime = entry.stat().st_mtime
-            if mtime > soft_cutoff:
+            if mtime > cutoff:
                continue  # Too recent — skip
        except Exception:
            continue

-        force = mtime <= hard_cutoff  # Over 72h — force remove
-
-        if not force:
-            # 24h–72h tier: only remove if no unpushed commits
-            try:
-                result = subprocess.run(
-                    ["git", "log", "--oneline", "HEAD", "--not", "--remotes"],
-                    capture_output=True, text=True, timeout=5, cwd=str(entry),
-                )
-                if result.stdout.strip():
-                    continue  # Has unpushed commits — skip
-            except Exception:
-                continue  # Can't check — skip
+        # Check for uncommitted changes
+        try:
+            status = subprocess.run(
+                ["git", "status", "--porcelain"],
+                capture_output=True, text=True, timeout=5, cwd=str(entry),
+            )
+            if status.stdout.strip():
+                continue  # Has changes — skip
+        except Exception:
+            continue  # Can't check — skip

        # Safe to remove
        try:
@@ -887,81 +864,10 @@ def _prune_stale_worktrees(repo_root: str, max_age_hours: int = 24) -> None:
                    ["git", "branch", "-D", branch],
                    capture_output=True, text=True, timeout=10, cwd=repo_root,
                )
-            logger.debug("Pruned stale worktree: %s (force=%s)", entry.name, force)
+            logger.debug("Pruned stale worktree: %s", entry.name)
        except Exception as e:
            logger.debug("Failed to prune worktree %s: %s", entry.name, e)

-    _prune_orphaned_branches(repo_root)
-
-
-def _prune_orphaned_branches(repo_root: str) -> None:
-    """Delete local ``hermes/hermes-*`` and ``pr-*`` branches with no worktree.
-
-    These are auto-generated by ``hermes -w`` sessions and PR review
-    workflows respectively.  Once their worktree is gone they serve no
-    purpose and just accumulate.
-    """
-    import subprocess
-
-    try:
-        result = subprocess.run(
-            ["git", "branch", "--format=%(refname:short)"],
-            capture_output=True, text=True, timeout=10, cwd=repo_root,
-        )
-        if result.returncode != 0:
-            return
-        all_branches = [b.strip() for b in result.stdout.strip().split("\n") if b.strip()]
-    except Exception:
-        return
-
-    # Collect branches that are actively checked out in a worktree
-    active_branches: set = set()
-    try:
-        wt_result = subprocess.run(
-            ["git", "worktree", "list", "--porcelain"],
-            capture_output=True, text=True, timeout=10, cwd=repo_root,
-        )
-        for line in wt_result.stdout.split("\n"):
-            if line.startswith("branch refs/heads/"):
-                active_branches.add(line.split("branch refs/heads/", 1)[-1].strip())
-    except Exception:
-        return  # Can't determine active branches — bail
-
-    # Also protect the currently checked-out branch and main
-    try:
-        head_result = subprocess.run(
-            ["git", "branch", "--show-current"],
-            capture_output=True, text=True, timeout=5, cwd=repo_root,
-        )
-        current = head_result.stdout.strip()
-        if current:
-            active_branches.add(current)
-    except Exception:
-        pass
-    active_branches.add("main")
-
-    orphaned = [
-        b for b in all_branches
-        if b not in active_branches
-        and (b.startswith("hermes/hermes-") or b.startswith("pr-"))
-    ]
-
-    if not orphaned:
-        return
-
-    # Delete in batches
-    for i in range(0, len(orphaned), 50):
-        batch = orphaned[i:i + 50]
-        try:
-            subprocess.run(
-                ["git", "branch", "-D"] + batch,
-                capture_output=True, text=True, timeout=30, cwd=repo_root,
-            )
-        except Exception as e:
-            logger.debug("Failed to prune orphaned branches: %s", e)
-
-    logger.debug("Pruned %d orphaned branches", len(orphaned))
-
 # ============================================================================
 # ASCII Art & Branding
 # ============================================================================
@@ -3408,22 +3314,6 @@ class HermesCLI:
        flush_tool_summary()
        print()
    
-    def _notify_session_boundary(self, event_type: str) -> None:
-        """Fire a session-boundary plugin hook (on_session_finalize or on_session_reset).
-
-        Non-blocking — errors are caught and logged.  Safe to call from any
-        lifecycle point (shutdown, /new, /reset).
-        """
-        try:
-            from hermes_cli.plugins import invoke_hook as _invoke_hook
-            _invoke_hook(
-                event_type,
-                session_id=self.agent.session_id if self.agent else None,
-                platform=getattr(self, "platform", None) or "cli",
-            )
-        except Exception:
-            pass
-
    def new_session(self, silent=False):
        """Start a fresh session with a new session ID and cleared agent state."""
        if self.agent and self.conversation_history:
@@ -3431,10 +3321,6 @@ class HermesCLI:
                self.agent.flush_memories(self.conversation_history)
            except (Exception, KeyboardInterrupt):
                pass
-            self._notify_session_boundary("on_session_finalize")
-        elif self.agent:
-            # First session or empty history — still finalize the old session
-            self._notify_session_boundary("on_session_finalize")

        old_session_id = self.session_id
        if self._session_db and old_session_id:
@@ -3479,7 +3365,6 @@ class HermesCLI:
                    )
                except Exception:
                    pass
-            self._notify_session_boundary("on_session_reset")

        if not silent:
            print("(^_^)v New session started!")
@@ -4668,13 +4553,13 @@ class HermesCLI:
                            if output:
                                self.console.print(_rich_text_from_ansi(output))
                            else:
-                                self.console.print("[dim]Command returned no output[/]")
+                                ChatConsole().print("[dim]Command returned no output[/]")
                        except subprocess.TimeoutExpired:
-                            self.console.print("[bold red]Quick command timed out (30s)[/]")
+                            ChatConsole().print("[bold red]Quick command timed out (30s)[/]")
                        except Exception as e:
-                            self.console.print(f"[bold red]Quick command error: {e}[/]")
+                            ChatConsole().print(f"[bold red]Quick command error: {e}[/]")
                    else:
-                        self.console.print(f"[bold red]Quick command '{base_cmd}' has no command defined[/]")
+                        ChatConsole().print(f"[bold red]Quick command '{base_cmd}' has no command defined[/]")
                elif qcmd.get("type") == "alias":
                    target = qcmd.get("target", "").strip()
                    if target:
@@ -4683,9 +4568,9 @@ class HermesCLI:
                        aliased_command = f"{target} {user_args}".strip()
                        return self.process_command(aliased_command)
                    else:
-                        self.console.print(f"[bold red]Quick command '{base_cmd}' has no target defined[/]")
+                        ChatConsole().print(f"[bold red]Quick command '{base_cmd}' has no target defined[/]")
                else:
-                    self.console.print(f"[bold red]Quick command '{base_cmd}' has unsupported type (supported: 'exec', 'alias')[/]")
+                    ChatConsole().print(f"[bold red]Quick command '{base_cmd}' has unsupported type (supported: 'exec', 'alias')[/]")
            # Check for plugin-registered slash commands
            elif base_cmd.lstrip("/") in _get_plugin_cmd_handler_names():
                from hermes_cli.plugins import get_plugin_command_handler
@@ -574,16 +574,12 @@ def remove_job(job_id: str) -> bool:
    return False


-def mark_job_run(job_id: str, success: bool, error: Optional[str] = None,
-                 delivery_error: Optional[str] = None):
+def mark_job_run(job_id: str, success: bool, error: Optional[str] = None):
    """
    Mark a job as having been run.
    
    Updates last_run_at, last_status, increments completed count,
    computes next_run_at, and auto-deletes if repeat limit reached.
-
-    ``delivery_error`` is tracked separately from the agent error — a job
-    can succeed (agent produced output) but fail delivery (platform down).
    """
    jobs = load_jobs()
    for i, job in enumerate(jobs):
@@ -592,8 +588,6 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None,
            job["last_run_at"] = now
            job["last_status"] = "ok" if success else "error"
            job["last_error"] = error if not success else None
-            # Track delivery failures separately — cleared on successful delivery
-            job["last_delivery_error"] = delivery_error
            
            # Increment completed count
            if job.get("repeat"):
@@ -44,7 +44,7 @@ logger = logging.getLogger(__name__)
 _KNOWN_DELIVERY_PLATFORMS = frozenset({
    "telegram", "discord", "slack", "whatsapp", "signal",
    "matrix", "mattermost", "homeassistant", "dingtalk", "feishu",
-    "wecom", "sms", "email", "webhook", "bluebubbles",
+    "wecom", "sms", "email", "webhook",
 })

 from cron.jobs import get_due_jobs, mark_job_run, save_job_output, advance_next_run
@@ -91,7 +91,7 @@ def _resolve_delivery_target(job: dict) -> Optional[dict]:
            }
        # Origin missing (e.g. job created via API/script) — try each
        # platform's home channel as a fallback instead of silently dropping.
-        for platform_name in ("matrix", "telegram", "discord", "slack", "bluebubbles"):
+        for platform_name in ("matrix", "telegram", "discord", "slack"):
            chat_id = os.getenv(f"{platform_name.upper()}_HOME_CHANNEL", "")
            if chat_id:
                logger.info(
@@ -196,7 +196,7 @@ def _send_media_via_adapter(adapter, chat_id: str, media_files: list, metadata:
            logger.warning("Job '%s': failed to send media %s: %s", job.get("id", "?"), media_path, e)


-def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Optional[str]:
+def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> None:
    """
    Deliver job output to the configured target (origin chat, specific platform, etc.).

@@ -204,16 +204,16 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
    use the live adapter first — this supports E2EE rooms (e.g. Matrix) where
    the standalone HTTP path cannot encrypt.  Falls back to standalone send if
    the adapter path fails or is unavailable.
-
-    Returns None on success, or an error string on failure.
    """
    target = _resolve_delivery_target(job)
    if not target:
        if job.get("deliver", "local") != "local":
-            msg = f"no delivery target resolved for deliver={job.get('deliver', 'local')}"
-            logger.warning("Job '%s': %s", job["id"], msg)
-            return msg
-        return None  # local-only jobs don't deliver — not a failure
+            logger.warning(
+                "Job '%s' deliver=%s but no concrete delivery target could be resolved",
+                job["id"],
+                job.get("deliver", "local"),
+            )
+        return

    platform_name = target["platform"]
    chat_id = target["chat_id"]
@@ -236,26 +236,22 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
        "wecom": Platform.WECOM,
        "email": Platform.EMAIL,
        "sms": Platform.SMS,
-        "bluebubbles": Platform.BLUEBUBBLES,
    }
    platform = platform_map.get(platform_name.lower())
    if not platform:
-        msg = f"unknown platform '{platform_name}'"
-        logger.warning("Job '%s': %s", job["id"], msg)
-        return msg
+        logger.warning("Job '%s': unknown platform '%s' for delivery", job["id"], platform_name)
+        return

    try:
        config = load_gateway_config()
    except Exception as e:
-        msg = f"failed to load gateway config: {e}"
-        logger.error("Job '%s': %s", job["id"], msg)
-        return msg
+        logger.error("Job '%s': failed to load gateway config for delivery: %s", job["id"], e)
+        return

    pconfig = config.platforms.get(platform)
    if not pconfig or not pconfig.enabled:
-        msg = f"platform '{platform_name}' not configured/enabled"
-        logger.warning("Job '%s': %s", job["id"], msg)
-        return msg
+        logger.warning("Job '%s': platform '%s' not configured/enabled", job["id"], platform_name)
+        return

    # Optionally wrap the content with a header/footer so the user knows this
    # is a cron delivery.  Wrapping is on by default; set cron.wrap_response: false
@@ -311,7 +307,7 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option

            if adapter_ok:
                logger.info("Job '%s': delivered to %s:%s via live adapter", job["id"], platform_name, chat_id)
-                return None
+                return
        except Exception as e:
            logger.warning(
                "Job '%s': live adapter delivery to %s:%s failed (%s), falling back to standalone",
@@ -333,17 +329,13 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
            future = pool.submit(asyncio.run, _send_to_platform(platform, pconfig, chat_id, cleaned_delivery_content, thread_id=thread_id, media_files=media_files))
            result = future.result(timeout=30)
    except Exception as e:
-        msg = f"delivery to {platform_name}:{chat_id} failed: {e}"
-        logger.error("Job '%s': %s", job["id"], msg)
-        return msg
+        logger.error("Job '%s': delivery to %s:%s failed: %s", job["id"], platform_name, chat_id, e)
+        return

    if result and result.get("error"):
-        msg = f"delivery error: {result['error']}"
-        logger.error("Job '%s': %s", job["id"], msg)
-        return msg
-
-    logger.info("Job '%s': delivered to %s:%s", job["id"], platform_name, chat_id)
-    return None
+        logger.error("Job '%s': delivery error: %s", job["id"], result["error"])
+    else:
+        logger.info("Job '%s': delivered to %s:%s", job["id"], platform_name, chat_id)


 _SCRIPT_TIMEOUT = 120  # seconds
@@ -586,9 +578,11 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
        except Exception as e:
            logger.warning("Job '%s': failed to load config.yaml, using defaults: %s", job_id, e)

-        # Reasoning config from config.yaml
+        # Reasoning config from env or config.yaml
        from hermes_constants import parse_reasoning_effort
-        effort = str(_cfg.get("agent", {}).get("reasoning_effort", "")).strip()
+        effort = os.getenv("HERMES_REASONING_EFFORT", "")
+        if not effort:
+            effort = str(_cfg.get("agent", {}).get("reasoning_effort", "")).strip()
        reasoning_config = parse_reasoning_effort(effort)

        # Prefill messages from env or config.yaml
@@ -874,15 +868,13 @@ def tick(verbose: bool = True, adapters=None, loop=None) -> int:
                    logger.info("Job '%s': agent returned %s — skipping delivery", job["id"], SILENT_MARKER)
                    should_deliver = False

-                delivery_error = None
                if should_deliver:
                    try:
-                        delivery_error = _deliver_result(job, deliver_content, adapters=adapters, loop=loop)
+                        _deliver_result(job, deliver_content, adapters=adapters, loop=loop)
                    except Exception as de:
-                        delivery_error = str(de)
                        logger.error("Delivery failed for job %s: %s", job["id"], de)

-                mark_job_run(job["id"], success, error, delivery_error=delivery_error)
+                mark_job_run(job["id"], success, error)
                executed += 1

            except Exception as e:
@@ -21,8 +21,6 @@ from dataclasses import dataclass, field
 from typing import Any, Dict, List, Optional, Set

 from model_tools import handle_function_call
-from tools.terminal_tool import get_active_env
-from tools.tool_result_storage import maybe_persist_tool_result, enforce_turn_budget

 # Thread pool for running sync tool calls that internally use asyncio.run()
 # (e.g., the Modal/Docker/Daytona terminal backends). Running them in a separate
@@ -140,7 +138,6 @@ class HermesAgentLoop:
        temperature: float = 1.0,
        max_tokens: Optional[int] = None,
        extra_body: Optional[Dict[str, Any]] = None,
-        budget_config: Optional["BudgetConfig"] = None,
    ):
        """
        Initialize the agent loop.
@@ -157,11 +154,7 @@ class HermesAgentLoop:
            extra_body: Extra parameters passed to the OpenAI client's create() call.
                        Used for OpenRouter provider preferences, transforms, etc.
                        e.g. {"provider": {"ignore": ["DeepInfra"]}}
-            budget_config: Tool result persistence budget. Controls per-tool
-                        thresholds, per-turn aggregate budget, and preview size.
-                        If None, uses DEFAULT_BUDGET (current hardcoded values).
        """
-        from tools.budget_config import DEFAULT_BUDGET
        self.server = server
        self.tool_schemas = tool_schemas
        self.valid_tool_names = valid_tool_names
@@ -170,7 +163,6 @@ class HermesAgentLoop:
        self.temperature = temperature
        self.max_tokens = max_tokens
        self.extra_body = extra_body
-        self.budget_config = budget_config or DEFAULT_BUDGET

    async def run(self, messages: List[Dict[str, Any]]) -> AgentResult:
        """
@@ -454,15 +446,8 @@ class HermesAgentLoop:
                        except (json.JSONDecodeError, TypeError):
                            pass

+                    # Add tool response to conversation
                    tc_id = tc.get("id", "") if isinstance(tc, dict) else tc.id
-                    tool_result = maybe_persist_tool_result(
-                        content=tool_result,
-                        tool_name=tool_name,
-                        tool_use_id=tc_id,
-                        env=get_active_env(self.task_id),
-                        config=self.budget_config,
-                    )
-
                    messages.append(
                        {
                            "role": "tool",
@@ -471,14 +456,6 @@ class HermesAgentLoop:
                        }
                    )

-                num_tcs = len(assistant_msg.tool_calls)
-                if num_tcs > 0:
-                    enforce_turn_budget(
-                        messages[-num_tcs:],
-                        env=get_active_env(self.task_id),
-                        config=self.budget_config,
-                    )
-
                turn_elapsed = _time.monotonic() - turn_start
                logger.info(
                    "[%s] turn %d: api=%.1fs, %d tools, turn_total=%.1fs",
@@ -1048,7 +1048,6 @@ class AgenticOPDEnv(HermesAgentBaseEnv):
                    temperature=0.0,
                    max_tokens=self.config.max_token_length,
                    extra_body=self.config.extra_body,
-                    budget_config=self.config.build_budget_config(),
                )
                result = await agent.run(messages)

@@ -541,7 +541,6 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
                        temperature=self.config.agent_temperature,
                        max_tokens=self.config.max_token_length,
                        extra_body=self.config.extra_body,
-                        budget_config=self.config.build_budget_config(),
                    )
                    result = await agent.run(messages)
            else:
@@ -554,7 +553,6 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
                    temperature=self.config.agent_temperature,
                    max_tokens=self.config.max_token_length,
                    extra_body=self.config.extra_body,
-                    budget_config=self.config.build_budget_config(),
                )
                result = await agent.run(messages)

@@ -549,7 +549,6 @@ class YCBenchEvalEnv(HermesAgentBaseEnv):
                temperature=self.config.agent_temperature,
                max_tokens=self.config.max_token_length,
                extra_body=self.config.extra_body,
-                budget_config=self.config.build_budget_config(),
            )
            result = await agent.run(messages)

@@ -62,11 +62,6 @@ from atroposlib.type_definitions import Item

 from environments.agent_loop import AgentResult, HermesAgentLoop
 from environments.tool_context import ToolContext
-from tools.budget_config import (
-    DEFAULT_RESULT_SIZE_CHARS,
-    DEFAULT_TURN_BUDGET_CHARS,
-    DEFAULT_PREVIEW_SIZE_CHARS,
-)

 # Import hermes-agent toolset infrastructure
 from model_tools import get_tool_definitions
@@ -165,32 +160,6 @@ class HermesAgentEnvConfig(BaseEnvConfig):
        "Options: hermes, mistral, llama3_json, qwen, deepseek_v3, etc.",
    )

-    # --- Tool result budget ---
-    # Defaults imported from tools.budget_config (single source of truth).
-    default_result_size_chars: int = Field(
-        default=DEFAULT_RESULT_SIZE_CHARS,
-        description="Default per-tool threshold (chars) for persisting large results "
-        "to sandbox. Results exceeding this are written to /tmp/hermes-results/ "
-        "and replaced with a preview. Per-tool registry values take precedence "
-        "unless overridden via tool_result_overrides.",
-    )
-    turn_budget_chars: int = Field(
-        default=DEFAULT_TURN_BUDGET_CHARS,
-        description="Aggregate char budget per assistant turn. If all tool results "
-        "in a single turn exceed this, the largest are persisted to disk first.",
-    )
-    preview_size_chars: int = Field(
-        default=DEFAULT_PREVIEW_SIZE_CHARS,
-        description="Size of the inline preview shown after a tool result is persisted.",
-    )
-    tool_result_overrides: Optional[Dict[str, int]] = Field(
-        default=None,
-        description="Per-tool threshold overrides (chars). Keys are tool names, "
-        "values are char thresholds. Overrides both the default and registry "
-        "per-tool values. Example: {'terminal': 10000, 'search_files': 5000}. "
-        "Note: read_file is pinned to infinity and cannot be overridden.",
-    )
-
    # --- Provider-specific parameters ---
    # Passed as extra_body to the OpenAI client's chat.completions.create() call.
    # Useful for OpenRouter provider preferences, transforms, route settings, etc.
@@ -207,16 +176,6 @@ class HermesAgentEnvConfig(BaseEnvConfig):
        "transforms, and other provider-specific settings.",
    )

-    def build_budget_config(self):
-        """Build a BudgetConfig from env config fields."""
-        from tools.budget_config import BudgetConfig
-        return BudgetConfig(
-            default_result_size=self.default_result_size_chars,
-            turn_budget=self.turn_budget_chars,
-            preview_size=self.preview_size_chars,
-            tool_overrides=dict(self.tool_result_overrides) if self.tool_result_overrides else {},
-        )
-

 class HermesAgentBaseEnv(BaseEnv):
    """
@@ -531,7 +490,6 @@ class HermesAgentBaseEnv(BaseEnv):
                        temperature=self.config.agent_temperature,
                        max_tokens=self.config.max_token_length,
                        extra_body=self.config.extra_body,
-                        budget_config=self.config.build_budget_config(),
                    )
                    result = await agent.run(messages)
            except NotImplementedError:
@@ -549,7 +507,6 @@ class HermesAgentBaseEnv(BaseEnv):
                    temperature=self.config.agent_temperature,
                    max_tokens=self.config.max_token_length,
                    extra_body=self.config.extra_body,
-                    budget_config=self.config.build_budget_config(),
                )
                result = await agent.run(messages)
        else:
@@ -563,7 +520,6 @@ class HermesAgentBaseEnv(BaseEnv):
                temperature=self.config.agent_temperature,
                max_tokens=self.config.max_token_length,
                extra_body=self.config.extra_body,
-                budget_config=self.config.build_budget_config(),
            )
            result = await agent.run(messages)

@@ -472,7 +472,6 @@ class WebResearchEnv(HermesAgentBaseEnv):
                    temperature=0.0,  # Deterministic for eval
                    max_tokens=self.config.max_token_length,
                    extra_body=self.config.extra_body,
-                    budget_config=self.config.build_budget_config(),
                )
                result = await agent.run(messages)

@@ -77,7 +77,7 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
            logger.warning("Channel directory: failed to build %s: %s", platform.value, e)

    # Telegram, WhatsApp & Signal can't enumerate chats -- pull from session history
-    for plat_name in ("telegram", "whatsapp", "signal", "email", "sms", "bluebubbles"):
+    for plat_name in ("telegram", "whatsapp", "signal", "email", "sms"):
        if plat_name not in platforms:
            platforms[plat_name] = _build_from_sessions(plat_name)

@@ -63,7 +63,6 @@ class Platform(Enum):
    WEBHOOK = "webhook"
    FEISHU = "feishu"
    WECOM = "wecom"
-    BLUEBUBBLES = "bluebubbles"


@dataclass
@@ -288,9 +287,6 @@ class GatewayConfig:
            # WeCom uses extra dict for bot credentials
            elif platform == Platform.WECOM and config.extra.get("bot_id"):
                connected.append(platform)
-            # BlueBubbles uses extra dict for local server config
-            elif platform == Platform.BLUEBUBBLES and config.extra.get("server_url") and config.extra.get("password"):
-                connected.append(platform)
        return connected
    
    def get_home_channel(self, platform: Platform) -> Optional[HomeChannel]:
@@ -716,13 +712,6 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
            name=os.getenv("DISCORD_HOME_CHANNEL_NAME", "Home"),
        )
    
-    # Reply threading mode for Discord (off/first/all)
-    discord_reply_mode = os.getenv("DISCORD_REPLY_TO_MODE", "").lower()
-    if discord_reply_mode in ("off", "first", "all"):
-        if Platform.DISCORD not in config.platforms:
-            config.platforms[Platform.DISCORD] = PlatformConfig()
-        config.platforms[Platform.DISCORD].reply_to_mode = discord_reply_mode
-    
    # WhatsApp (typically uses different auth mechanism)
    whatsapp_enabled = os.getenv("WHATSAPP_ENABLED", "").lower() in ("true", "1", "yes")
    if whatsapp_enabled:
@@ -952,29 +941,6 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
                name=os.getenv("WECOM_HOME_CHANNEL_NAME", "Home"),
            )

-    # BlueBubbles (iMessage)
-    bluebubbles_server_url = os.getenv("BLUEBUBBLES_SERVER_URL")
-    bluebubbles_password = os.getenv("BLUEBUBBLES_PASSWORD")
-    if bluebubbles_server_url and bluebubbles_password:
-        if Platform.BLUEBUBBLES not in config.platforms:
-            config.platforms[Platform.BLUEBUBBLES] = PlatformConfig()
-        config.platforms[Platform.BLUEBUBBLES].enabled = True
-        config.platforms[Platform.BLUEBUBBLES].extra.update({
-            "server_url": bluebubbles_server_url.rstrip("/"),
-            "password": bluebubbles_password,
-            "webhook_host": os.getenv("BLUEBUBBLES_WEBHOOK_HOST", "127.0.0.1"),
-            "webhook_port": int(os.getenv("BLUEBUBBLES_WEBHOOK_PORT", "8645")),
-            "webhook_path": os.getenv("BLUEBUBBLES_WEBHOOK_PATH", "/bluebubbles-webhook"),
-            "send_read_receipts": os.getenv("BLUEBUBBLES_SEND_READ_RECEIPTS", "true").lower() in ("true", "1", "yes"),
-        })
-    bluebubbles_home = os.getenv("BLUEBUBBLES_HOME_CHANNEL")
-    if bluebubbles_home and Platform.BLUEBUBBLES in config.platforms:
-        config.platforms[Platform.BLUEBUBBLES].home_channel = HomeChannel(
-            platform=Platform.BLUEBUBBLES,
-            chat_id=bluebubbles_home,
-            name=os.getenv("BLUEBUBBLES_HOME_CHANNEL_NAME", "Home"),
-        )
-
    # Session settings
    idle_minutes = os.getenv("SESSION_IDLE_MINUTES")
    if idle_minutes:
@@ -298,7 +298,6 @@ SUPPORTED_DOCUMENT_TYPES = {
    ".pdf": "application/pdf",
    ".md": "text/markdown",
    ".txt": "text/plain",
-    ".log": "text/plain",
    ".zip": "application/zip",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
@@ -408,10 +407,6 @@ class MessageEvent:
    # Auto-loaded skill for topic/channel bindings (e.g., Telegram DM Topics)
    auto_skill: Optional[str] = None
    
-    # Internal flag — set for synthetic events (e.g. background process
-    # completion notifications) that must bypass user authorization checks.
-    internal: bool = False
-
    # Timestamps
    timestamp: datetime = field(default_factory=datetime.now)
    
@@ -1,828 +0,0 @@
-"""BlueBubbles iMessage platform adapter.
-
-Uses the local BlueBubbles macOS server for outbound REST sends and inbound
-webhooks.  Supports text messaging, media attachments (images, voice, video,
-documents), tapback reactions, typing indicators, and read receipts.
-
-Architecture based on PR #5869 (benjaminsehl) with inbound attachment
-downloading from PR #4588 (YuhangLin).
-"""
-
-import asyncio
-import json
-import logging
-import os
-import re
-import uuid
-from datetime import datetime
-from typing import Any, Dict, List, Optional
-from urllib.parse import quote
-
-import httpx
-
-from gateway.config import Platform, PlatformConfig
-from gateway.platforms.base import (
-    BasePlatformAdapter,
-    MessageEvent,
-    MessageType,
-    SendResult,
-    cache_image_from_bytes,
-    cache_audio_from_bytes,
-    cache_document_from_bytes,
-)
-
-logger = logging.getLogger(__name__)
-
-# ---------------------------------------------------------------------------
-# Constants
-# ---------------------------------------------------------------------------
-
-DEFAULT_WEBHOOK_HOST = "127.0.0.1"
-DEFAULT_WEBHOOK_PORT = 8645
-DEFAULT_WEBHOOK_PATH = "/bluebubbles-webhook"
-MAX_TEXT_LENGTH = 4000
-
-# Tapback reaction codes (BlueBubbles associatedMessageType values)
-_TAPBACK_ADDED = {
-    2000: "love", 2001: "like", 2002: "dislike",
-    2003: "laugh", 2004: "emphasize", 2005: "question",
-}
-_TAPBACK_REMOVED = {
-    3000: "love", 3001: "like", 3002: "dislike",
-    3003: "laugh", 3004: "emphasize", 3005: "question",
-}
-
-# Webhook event types that carry user messages
-_MESSAGE_EVENTS = {"new-message", "message", "updated-message"}
-
-# Log redaction patterns
-_PHONE_RE = re.compile(r"\+?\d{7,15}")
-_EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
-
-
-def _redact(text: str) -> str:
-    """Redact phone numbers and emails from log output."""
-    text = _PHONE_RE.sub("[REDACTED]", text)
-    text = _EMAIL_RE.sub("[REDACTED]", text)
-    return text
-
-
-# ---------------------------------------------------------------------------
-# Helpers
-# ---------------------------------------------------------------------------
-
-def check_bluebubbles_requirements() -> bool:
-    try:
-        import aiohttp  # noqa: F401
-        import httpx as _httpx  # noqa: F401
-    except ImportError:
-        return False
-    return True
-
-
-def _normalize_server_url(raw: str) -> str:
-    value = (raw or "").strip()
-    if not value:
-        return ""
-    if not re.match(r"^https?://", value, flags=re.I):
-        value = f"http://{value}"
-    return value.rstrip("/")
-
-
-def _strip_markdown(text: str) -> str:
-    """Strip common markdown formatting for iMessage plain-text delivery."""
-    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text, flags=re.DOTALL)
-    text = re.sub(r"\*(.+?)\*", r"\1", text, flags=re.DOTALL)
-    text = re.sub(r"__(.+?)__", r"\1", text, flags=re.DOTALL)
-    text = re.sub(r"_(.+?)_", r"\1", text, flags=re.DOTALL)
-    text = re.sub(r"```[a-zA-Z0-9_+-]*\n?", "", text)
-    text = re.sub(r"`(.+?)`", r"\1", text)
-    text = re.sub(r"^#{1,6}\s+", "", text, flags=re.MULTILINE)
-    text = re.sub(r"\[([^\]]+)\]\(([^\)]+)\)", r"\1", text)
-    text = re.sub(r"\n{3,}", "\n\n", text)
-    return text.strip()
-
-
-# ---------------------------------------------------------------------------
-# Adapter
-# ---------------------------------------------------------------------------
-
-class BlueBubblesAdapter(BasePlatformAdapter):
-    platform = Platform.BLUEBUBBLES
-    MAX_MESSAGE_LENGTH = MAX_TEXT_LENGTH
-
-    def __init__(self, config: PlatformConfig):
-        super().__init__(config, Platform.BLUEBUBBLES)
-        extra = config.extra or {}
-        self.server_url = _normalize_server_url(
-            extra.get("server_url") or os.getenv("BLUEBUBBLES_SERVER_URL", "")
-        )
-        self.password = extra.get("password") or os.getenv("BLUEBUBBLES_PASSWORD", "")
-        self.webhook_host = (
-            extra.get("webhook_host")
-            or os.getenv("BLUEBUBBLES_WEBHOOK_HOST", DEFAULT_WEBHOOK_HOST)
-        )
-        self.webhook_port = int(
-            extra.get("webhook_port")
-            or os.getenv("BLUEBUBBLES_WEBHOOK_PORT", str(DEFAULT_WEBHOOK_PORT))
-        )
-        self.webhook_path = (
-            extra.get("webhook_path")
-            or os.getenv("BLUEBUBBLES_WEBHOOK_PATH", DEFAULT_WEBHOOK_PATH)
-        )
-        if not str(self.webhook_path).startswith("/"):
-            self.webhook_path = f"/{self.webhook_path}"
-        self.send_read_receipts = bool(extra.get("send_read_receipts", True))
-        self.client: Optional[httpx.AsyncClient] = None
-        self._runner = None
-        self._private_api_enabled: Optional[bool] = None
-        self._helper_connected: bool = False
-        self._guid_cache: Dict[str, str] = {}
-
-    # ------------------------------------------------------------------
-    # API helpers
-    # ------------------------------------------------------------------
-
-    def _api_url(self, path: str) -> str:
-        sep = "&" if "?" in path else "?"
-        return f"{self.server_url}{path}{sep}password={quote(self.password, safe='')}"
-
-    async def _api_get(self, path: str) -> Dict[str, Any]:
-        assert self.client is not None
-        res = await self.client.get(self._api_url(path))
-        res.raise_for_status()
-        return res.json()
-
-    async def _api_post(self, path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
-        assert self.client is not None
-        res = await self.client.post(self._api_url(path), json=payload)
-        res.raise_for_status()
-        return res.json()
-
-    # ------------------------------------------------------------------
-    # Lifecycle
-    # ------------------------------------------------------------------
-
-    async def connect(self) -> bool:
-        if not self.server_url or not self.password:
-            logger.error(
-                "[bluebubbles] BLUEBUBBLES_SERVER_URL and BLUEBUBBLES_PASSWORD are required"
-            )
-            return False
-        from aiohttp import web
-
-        self.client = httpx.AsyncClient(timeout=30.0)
-        try:
-            await self._api_get("/api/v1/ping")
-            info = await self._api_get("/api/v1/server/info")
-            server_data = (info or {}).get("data", {})
-            self._private_api_enabled = bool(server_data.get("private_api"))
-            self._helper_connected = bool(server_data.get("helper_connected"))
-            logger.info(
-                "[bluebubbles] connected to %s (private_api=%s, helper=%s)",
-                self.server_url,
-                self._private_api_enabled,
-                self._helper_connected,
-            )
-        except Exception as exc:
-            logger.error(
-                "[bluebubbles] cannot reach server at %s: %s", self.server_url, exc
-            )
-            if self.client:
-                await self.client.aclose()
-                self.client = None
-            return False
-
-        app = web.Application()
-        app.router.add_get("/health", lambda _: web.Response(text="ok"))
-        app.router.add_post(self.webhook_path, self._handle_webhook)
-        self._runner = web.AppRunner(app)
-        await self._runner.setup()
-        site = web.TCPSite(self._runner, self.webhook_host, self.webhook_port)
-        await site.start()
-        self._mark_connected()
-        logger.info(
-            "[bluebubbles] webhook listening on http://%s:%s%s",
-            self.webhook_host,
-            self.webhook_port,
-            self.webhook_path,
-        )
-        return True
-
-    async def disconnect(self) -> None:
-        if self.client:
-            await self.client.aclose()
-            self.client = None
-        if self._runner:
-            await self._runner.cleanup()
-            self._runner = None
-        self._mark_disconnected()
-
-    # ------------------------------------------------------------------
-    # Chat GUID resolution
-    # ------------------------------------------------------------------
-
-    async def _resolve_chat_guid(self, target: str) -> Optional[str]:
-        """Resolve an email/phone to a BlueBubbles chat GUID.
-
-        If *target* already contains a semicolon (raw GUID format like
-        ``iMessage;-;user@example.com``), it is returned as-is.  Otherwise
-        the adapter queries the BlueBubbles chat list and matches on
-        ``chatIdentifier`` or participant address.
-        """
-        target = (target or "").strip()
-        if not target:
-            return None
-        # Already a raw GUID
-        if ";" in target:
-            return target
-        if target in self._guid_cache:
-            return self._guid_cache[target]
-        try:
-            payload = await self._api_post(
-                "/api/v1/chat/query",
-                {"limit": 100, "offset": 0, "with": ["participants"]},
-            )
-            for chat in payload.get("data", []) or []:
-                guid = chat.get("guid") or chat.get("chatGuid")
-                identifier = chat.get("chatIdentifier") or chat.get("identifier")
-                if identifier == target:
-                    if guid:
-                        self._guid_cache[target] = guid
-                    return guid
-                for part in chat.get("participants", []) or []:
-                    if (part.get("address") or "").strip() == target and guid:
-                        self._guid_cache[target] = guid
-                        return guid
-        except Exception:
-            pass
-        return None
-
-    async def _create_chat_for_handle(
-        self, address: str, message: str
-    ) -> SendResult:
-        """Create a new chat by sending the first message to *address*."""
-        payload = {
-            "addresses": [address],
-            "message": message,
-            "tempGuid": f"temp-{datetime.utcnow().timestamp()}",
-        }
-        try:
-            res = await self._api_post("/api/v1/chat/new", payload)
-            data = res.get("data") or {}
-            msg_id = data.get("guid") or data.get("messageGuid") or "ok"
-            return SendResult(success=True, message_id=str(msg_id), raw_response=res)
-        except Exception as exc:
-            return SendResult(success=False, error=str(exc))
-
-    # ------------------------------------------------------------------
-    # Text sending
-    # ------------------------------------------------------------------
-
-    async def send(
-        self,
-        chat_id: str,
-        content: str,
-        reply_to: Optional[str] = None,
-        metadata: Optional[Dict[str, Any]] = None,
-    ) -> SendResult:
-        text = _strip_markdown(content or "")
-        if not text:
-            return SendResult(success=False, error="BlueBubbles send requires text")
-        chunks = self.truncate_message(text, max_length=self.MAX_MESSAGE_LENGTH)
-        last = SendResult(success=True)
-        for chunk in chunks:
-            guid = await self._resolve_chat_guid(chat_id)
-            if not guid:
-                # If the target looks like an address, try creating a new chat
-                if self._private_api_enabled and (
-                    "@" in chat_id or re.match(r"^\+\d+", chat_id)
-                ):
-                    return await self._create_chat_for_handle(chat_id, chunk)
-                return SendResult(
-                    success=False,
-                    error=f"BlueBubbles chat not found for target: {chat_id}",
-                )
-            payload: Dict[str, Any] = {
-                "chatGuid": guid,
-                "tempGuid": f"temp-{datetime.utcnow().timestamp()}",
-                "message": chunk,
-            }
-            if reply_to and self._private_api_enabled and self._helper_connected:
-                payload["method"] = "private-api"
-                payload["selectedMessageGuid"] = reply_to
-                payload["partIndex"] = 0
-            try:
-                res = await self._api_post("/api/v1/message/text", payload)
-                data = res.get("data") or {}
-                msg_id = data.get("guid") or data.get("messageGuid") or "ok"
-                last = SendResult(
-                    success=True, message_id=str(msg_id), raw_response=res
-                )
-            except Exception as exc:
-                return SendResult(success=False, error=str(exc))
-        return last
-
-    # ------------------------------------------------------------------
-    # Media sending (outbound)
-    # ------------------------------------------------------------------
-
-    async def _send_attachment(
-        self,
-        chat_id: str,
-        file_path: str,
-        filename: Optional[str] = None,
-        caption: Optional[str] = None,
-        is_audio_message: bool = False,
-    ) -> SendResult:
-        """Send a file attachment via BlueBubbles multipart upload."""
-        if not self.client:
-            return SendResult(success=False, error="Not connected")
-        if not os.path.isfile(file_path):
-            return SendResult(success=False, error=f"File not found: {file_path}")
-
-        guid = await self._resolve_chat_guid(chat_id)
-        if not guid:
-            return SendResult(success=False, error=f"Chat not found: {chat_id}")
-
-        fname = filename or os.path.basename(file_path)
-        try:
-            with open(file_path, "rb") as f:
-                files = {"attachment": (fname, f, "application/octet-stream")}
-                data: Dict[str, str] = {
-                    "chatGuid": guid,
-                    "name": fname,
-                    "tempGuid": uuid.uuid4().hex,
-                }
-                if is_audio_message:
-                    data["isAudioMessage"] = "true"
-                res = await self.client.post(
-                    self._api_url("/api/v1/message/attachment"),
-                    files=files,
-                    data=data,
-                    timeout=120,
-                )
-                res.raise_for_status()
-                result = res.json()
-
-            if caption:
-                await self.send(chat_id, caption)
-
-            if result.get("status") == 200:
-                rdata = result.get("data") or {}
-                msg_id = rdata.get("guid") if isinstance(rdata, dict) else None
-                return SendResult(
-                    success=True, message_id=msg_id, raw_response=result
-                )
-            return SendResult(
-                success=False,
-                error=result.get("message", "Attachment upload failed"),
-            )
-        except Exception as e:
-            return SendResult(success=False, error=str(e))
-
-    async def send_image(
-        self,
-        chat_id: str,
-        image_url: str,
-        caption: Optional[str] = None,
-        reply_to: Optional[str] = None,
-        metadata: Optional[Dict[str, Any]] = None,
-    ) -> SendResult:
-        try:
-            from gateway.platforms.base import cache_image_from_url
-
-            local_path = await cache_image_from_url(image_url)
-            return await self._send_attachment(chat_id, local_path, caption=caption)
-        except Exception:
-            return await super().send_image(chat_id, image_url, caption, reply_to)
-
-    async def send_image_file(
-        self,
-        chat_id: str,
-        image_path: str,
-        caption: Optional[str] = None,
-        reply_to: Optional[str] = None,
-        **kwargs,
-    ) -> SendResult:
-        return await self._send_attachment(chat_id, image_path, caption=caption)
-
-    async def send_voice(
-        self,
-        chat_id: str,
-        audio_path: str,
-        caption: Optional[str] = None,
-        reply_to: Optional[str] = None,
-        **kwargs,
-    ) -> SendResult:
-        return await self._send_attachment(
-            chat_id, audio_path, caption=caption, is_audio_message=True
-        )
-
-    async def send_video(
-        self,
-        chat_id: str,
-        video_path: str,
-        caption: Optional[str] = None,
-        reply_to: Optional[str] = None,
-        **kwargs,
-    ) -> SendResult:
-        return await self._send_attachment(chat_id, video_path, caption=caption)
-
-    async def send_document(
-        self,
-        chat_id: str,
-        file_path: str,
-        caption: Optional[str] = None,
-        file_name: Optional[str] = None,
-        reply_to: Optional[str] = None,
-        **kwargs,
-    ) -> SendResult:
-        return await self._send_attachment(
-            chat_id, file_path, filename=file_name, caption=caption
-        )
-
-    async def send_animation(
-        self,
-        chat_id: str,
-        animation_url: str,
-        caption: Optional[str] = None,
-        reply_to: Optional[str] = None,
-        metadata: Optional[Dict[str, Any]] = None,
-    ) -> SendResult:
-        return await self.send_image(
-            chat_id, animation_url, caption, reply_to, metadata
-        )
-
-    # ------------------------------------------------------------------
-    # Typing indicators
-    # ------------------------------------------------------------------
-
-    async def send_typing(self, chat_id: str, metadata=None) -> None:
-        if not self._private_api_enabled or not self._helper_connected or not self.client:
-            return
-        try:
-            guid = await self._resolve_chat_guid(chat_id)
-            if guid:
-                encoded = quote(guid, safe="")
-                await self.client.post(
-                    self._api_url(f"/api/v1/chat/{encoded}/typing"), timeout=5
-                )
-        except Exception:
-            pass
-
-    async def stop_typing(self, chat_id: str) -> None:
-        if not self._private_api_enabled or not self._helper_connected or not self.client:
-            return
-        try:
-            guid = await self._resolve_chat_guid(chat_id)
-            if guid:
-                encoded = quote(guid, safe="")
-                await self.client.delete(
-                    self._api_url(f"/api/v1/chat/{encoded}/typing"), timeout=5
-                )
-        except Exception:
-            pass
-
-    # ------------------------------------------------------------------
-    # Read receipts
-    # ------------------------------------------------------------------
-
-    async def mark_read(self, chat_id: str) -> bool:
-        if not self._private_api_enabled or not self._helper_connected or not self.client:
-            return False
-        try:
-            guid = await self._resolve_chat_guid(chat_id)
-            if guid:
-                encoded = quote(guid, safe="")
-                await self.client.post(
-                    self._api_url(f"/api/v1/chat/{encoded}/read"), timeout=5
-                )
-                return True
-        except Exception:
-            pass
-        return False
-
-    # ------------------------------------------------------------------
-    # Tapback reactions
-    # ------------------------------------------------------------------
-
-    async def send_reaction(
-        self,
-        chat_id: str,
-        message_guid: str,
-        reaction: str,
-        part_index: int = 0,
-    ) -> SendResult:
-        """Send a tapback reaction (requires Private API helper)."""
-        if not self._private_api_enabled or not self._helper_connected:
-            return SendResult(
-                success=False, error="Private API helper not connected"
-            )
-        guid = await self._resolve_chat_guid(chat_id)
-        if not guid:
-            return SendResult(success=False, error=f"Chat not found: {chat_id}")
-        try:
-            res = await self._api_post(
-                "/api/v1/message/react",
-                {
-                    "chatGuid": guid,
-                    "selectedMessageGuid": message_guid,
-                    "reaction": reaction,
-                    "partIndex": part_index,
-                },
-            )
-            return SendResult(success=True, raw_response=res)
-        except Exception as exc:
-            return SendResult(success=False, error=str(exc))
-
-    # ------------------------------------------------------------------
-    # Chat info
-    # ------------------------------------------------------------------
-
-    async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
-        is_group = ";+;" in (chat_id or "")
-        info: Dict[str, Any] = {
-            "name": chat_id,
-            "type": "group" if is_group else "dm",
-        }
-        try:
-            guid = await self._resolve_chat_guid(chat_id)
-            if guid:
-                encoded = quote(guid, safe="")
-                res = await self._api_get(
-                    f"/api/v1/chat/{encoded}?with=participants"
-                )
-                data = (res or {}).get("data", {})
-                display_name = (
-                    data.get("displayName")
-                    or data.get("chatIdentifier")
-                    or chat_id
-                )
-                participants = []
-                for p in data.get("participants", []) or []:
-                    addr = (p.get("address") or "").strip()
-                    if addr:
-                        participants.append(addr)
-                info["name"] = display_name
-                if participants:
-                    info["participants"] = participants
-        except Exception:
-            pass
-        return info
-
-    def format_message(self, content: str) -> str:
-        return _strip_markdown(content)
-
-    # ------------------------------------------------------------------
-    # Inbound attachment downloading (from #4588)
-    # ------------------------------------------------------------------
-
-    async def _download_attachment(
-        self, att_guid: str, att_meta: Dict[str, Any]
-    ) -> Optional[str]:
-        """Download an attachment from BlueBubbles and cache it locally.
-
-        Returns the local file path on success, None on failure.
-        """
-        if not self.client:
-            return None
-        try:
-            encoded = quote(att_guid, safe="")
-            resp = await self.client.get(
-                self._api_url(f"/api/v1/attachment/{encoded}/download"),
-                timeout=60,
-                follow_redirects=True,
-            )
-            resp.raise_for_status()
-            data = resp.content
-
-            mime = (att_meta.get("mimeType") or "").lower()
-            transfer_name = att_meta.get("transferName", "")
-
-            if mime.startswith("image/"):
-                ext_map = {
-                    "image/jpeg": ".jpg",
-                    "image/png": ".png",
-                    "image/gif": ".gif",
-                    "image/webp": ".webp",
-                    "image/heic": ".jpg",
-                    "image/heif": ".jpg",
-                    "image/tiff": ".jpg",
-                }
-                ext = ext_map.get(mime, ".jpg")
-                return cache_image_from_bytes(data, ext)
-
-            if mime.startswith("audio/"):
-                ext_map = {
-                    "audio/mp3": ".mp3",
-                    "audio/mpeg": ".mp3",
-                    "audio/ogg": ".ogg",
-                    "audio/wav": ".wav",
-                    "audio/x-caf": ".mp3",
-                    "audio/mp4": ".m4a",
-                    "audio/aac": ".m4a",
-                }
-                ext = ext_map.get(mime, ".mp3")
-                return cache_audio_from_bytes(data, ext)
-
-            # Videos, documents, and everything else
-            filename = transfer_name or f"file_{uuid.uuid4().hex[:8]}"
-            return cache_document_from_bytes(data, filename)
-
-        except Exception as exc:
-            logger.warning(
-                "[bluebubbles] failed to download attachment %s: %s",
-                _redact(att_guid),
-                exc,
-            )
-            return None
-
-    # ------------------------------------------------------------------
-    # Webhook handling
-    # ------------------------------------------------------------------
-
-    def _extract_payload_record(
-        self, payload: Dict[str, Any]
-    ) -> Optional[Dict[str, Any]]:
-        data = payload.get("data")
-        if isinstance(data, dict):
-            return data
-        if isinstance(data, list):
-            for item in data:
-                if isinstance(item, dict):
-                    return item
-        if isinstance(payload.get("message"), dict):
-            return payload.get("message")
-        return payload if isinstance(payload, dict) else None
-
-    @staticmethod
-    def _value(*candidates: Any) -> Optional[str]:
-        for candidate in candidates:
-            if isinstance(candidate, str) and candidate.strip():
-                return candidate.strip()
-        return None
-
-    async def _handle_webhook(self, request):
-        from aiohttp import web
-
-        token = (
-            request.query.get("password")
-            or request.query.get("guid")
-            or request.headers.get("x-password")
-            or request.headers.get("x-guid")
-            or request.headers.get("x-bluebubbles-guid")
-        )
-        if token != self.password:
-            return web.json_response({"error": "unauthorized"}, status=401)
-        try:
-            raw = await request.read()
-            body = raw.decode("utf-8", errors="replace")
-            try:
-                payload = json.loads(body)
-            except Exception:
-                from urllib.parse import parse_qs
-
-                form = parse_qs(body)
-                payload_str = (
-                    form.get("payload")
-                    or form.get("data")
-                    or form.get("message")
-                    or [""]
-                )[0]
-                payload = json.loads(payload_str) if payload_str else {}
-        except Exception as exc:
-            logger.error("[bluebubbles] webhook parse error: %s", exc)
-            return web.json_response({"error": "invalid payload"}, status=400)
-
-        event_type = self._value(payload.get("type"), payload.get("event")) or ""
-        # Only process message events; silently acknowledge everything else
-        if event_type and event_type not in _MESSAGE_EVENTS:
-            return web.Response(text="ok")
-
-        record = self._extract_payload_record(payload) or {}
-        is_from_me = bool(
-            record.get("isFromMe")
-            or record.get("fromMe")
-            or record.get("is_from_me")
-        )
-        if is_from_me:
-            return web.Response(text="ok")
-
-        # Skip tapback reactions delivered as messages
-        assoc_type = record.get("associatedMessageType")
-        if isinstance(assoc_type, int) and assoc_type in {
-            **_TAPBACK_ADDED,
-            **_TAPBACK_REMOVED,
-        }:
-            return web.Response(text="ok")
-
-        text = (
-            self._value(
-                record.get("text"), record.get("message"), record.get("body")
-            )
-            or ""
-        )
-
-        # --- Inbound attachment handling ---
-        attachments = record.get("attachments") or []
-        media_urls: List[str] = []
-        media_types: List[str] = []
-        msg_type = MessageType.TEXT
-
-        for att in attachments:
-            att_guid = att.get("guid", "")
-            if not att_guid:
-                continue
-            cached = await self._download_attachment(att_guid, att)
-            if cached:
-                mime = (att.get("mimeType") or "").lower()
-                media_urls.append(cached)
-                media_types.append(mime)
-                if mime.startswith("image/"):
-                    msg_type = MessageType.PHOTO
-                elif mime.startswith("audio/") or (att.get("uti") or "").endswith(
-                    "caf"
-                ):
-                    msg_type = MessageType.VOICE
-                elif mime.startswith("video/"):
-                    msg_type = MessageType.VIDEO
-                else:
-                    msg_type = MessageType.DOCUMENT
-
-        # With multiple attachments, prefer PHOTO if any images present
-        if len(media_urls) > 1:
-            mime_prefixes = {(m or "").split("/")[0] for m in media_types}
-            if "image" in mime_prefixes:
-                msg_type = MessageType.PHOTO
-
-        if not text and media_urls:
-            text = "(attachment)"
-        # --- End attachment handling ---
-
-        chat_guid = self._value(
-            record.get("chatGuid"),
-            payload.get("chatGuid"),
-            record.get("chat_guid"),
-            payload.get("chat_guid"),
-            payload.get("guid"),
-        )
-        chat_identifier = self._value(
-            record.get("chatIdentifier"),
-            record.get("identifier"),
-            payload.get("chatIdentifier"),
-            payload.get("identifier"),
-        )
-        sender = (
-            self._value(
-                record.get("handle", {}).get("address")
-                if isinstance(record.get("handle"), dict)
-                else None,
-                record.get("sender"),
-                record.get("from"),
-                record.get("address"),
-            )
-            or chat_identifier
-            or chat_guid
-        )
-        if not (chat_guid or chat_identifier) and sender:
-            chat_identifier = sender
-        if not sender or not (chat_guid or chat_identifier) or not text:
-            return web.json_response({"error": "missing message fields"}, status=400)
-
-        session_chat_id = chat_guid or chat_identifier
-        is_group = bool(record.get("isGroup")) or (";+;" in (chat_guid or ""))
-        source = self.build_source(
-            chat_id=session_chat_id,
-            chat_name=chat_identifier or sender,
-            chat_type="group" if is_group else "dm",
-            user_id=sender,
-            user_name=sender,
-            chat_id_alt=chat_identifier,
-        )
-        event = MessageEvent(
-            text=text,
-            message_type=msg_type,
-            source=source,
-            raw_message=payload,
-            message_id=self._value(
-                record.get("guid"),
-                record.get("messageGuid"),
-                record.get("id"),
-            ),
-            reply_to_message_id=self._value(
-                record.get("threadOriginatorGuid"),
-                record.get("associatedMessageGuid"),
-            ),
-            media_urls=media_urls,
-            media_types=media_types,
-        )
-        task = asyncio.create_task(self.handle_message(event))
-        self._background_tasks.add(task)
-        task.add_done_callback(self._background_tasks.discard)
-
-        # Fire-and-forget read receipt
-        if self.send_read_receipts and session_chat_id:
-            asyncio.create_task(self.mark_read(session_chat_id))
-
-        return web.Response(text="ok")
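
Review note: the removed webhook handler settles on one message type per inbound message. The last successfully cached attachment decides the type, except that a message with several attachments is treated as a photo if any of them is an image. The sketch below restates that rule in isolation. `MessageType` here is a hypothetical stand-in for the gateway's enum, and `classify_attachments` is an illustrative name that does not exist in the adapter.

```python
from enum import Enum
from typing import List, Optional, Tuple


class MessageType(Enum):
    """Hypothetical stand-in for the gateway's MessageType enum."""
    TEXT = "text"
    PHOTO = "photo"
    VOICE = "voice"
    VIDEO = "video"
    DOCUMENT = "document"


def classify_attachments(
    attachments: List[Tuple[Optional[str], Optional[str]]]
) -> MessageType:
    """Pick one message type for a list of (mime_type, uti) pairs.

    Assumes every pair describes an attachment that was downloaded
    and cached successfully, as in the removed handler.
    """
    msg_type = MessageType.TEXT
    mimes: List[str] = []
    for mime, uti in attachments:
        mime = (mime or "").lower()
        mimes.append(mime)
        if mime.startswith("image/"):
            msg_type = MessageType.PHOTO
        elif mime.startswith("audio/") or (uti or "").endswith("caf"):
            msg_type = MessageType.VOICE
        elif mime.startswith("video/"):
            msg_type = MessageType.VIDEO
        else:
            msg_type = MessageType.DOCUMENT
    # With several attachments, any image makes the whole message a photo.
    if len(mimes) > 1 and any(m.split("/")[0] == "image" for m in mimes):
        msg_type = MessageType.PHOTO
    return msg_type
```

Because the loop overwrites `msg_type` on every pass, a video followed by a PDF resolves to a document, while an image followed by a PDF still resolves to a photo.
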
@@ -455,9 +455,6 @@ class DiscordAdapter(BasePlatformAdapter):
        self._seen_messages: Dict[str, float] = {}
        self._SEEN_TTL = 300   # 5 minutes
        self._SEEN_MAX = 2000  # prune threshold
-        # Reply threading mode: "off" (no replies), "first" (reply on first
-        # chunk only, default), "all" (reply-reference on every chunk).
-        self._reply_to_mode: str = getattr(config, 'reply_to_mode', 'first') or 'first'

    async def connect(self) -> bool:
        """Connect to Discord and start receiving events."""
@@ -777,7 +774,7 @@ class DiscordAdapter(BasePlatformAdapter):
            message_ids = []
            reference = None

-            if reply_to and self._reply_to_mode != "off":
+            if reply_to:
                try:
                    ref_msg = await channel.fetch_message(int(reply_to))
                    reference = ref_msg
@@ -785,10 +782,7 @@ class DiscordAdapter(BasePlatformAdapter):
                    logger.debug("Could not fetch reply-to message: %s", e)

            for i, chunk in enumerate(chunks):
-                if self._reply_to_mode == "all":
-                    chunk_reference = reference
-                else:  # "first" (default) or "off"
-                    chunk_reference = reference if i == 0 else None
+                chunk_reference = reference if i == 0 else None
                try:
                    msg = await channel.send(
                        content=chunk,
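
Review note: after this hunk, a chunked Discord reply attaches the reply reference to the first chunk only, and the `reply_to_mode` switch ("off", "first", "all") is gone. A minimal sketch of the remaining rule follows. `references_for_chunks` is a hypothetical helper used only for illustration, and the reference is any already-fetched message object.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def references_for_chunks(chunks: List[str], reference: Optional[T]) -> List[Optional[T]]:
    """Return the reply reference to use for each chunk, in order.

    Only the first chunk carries the reference; later chunks are sent
    as plain follow-up messages.
    """
    return [reference if i == 0 else None for i in range(len(chunks))]
```

For a three-chunk message this yields the reference followed by two `None` entries. If the reference could not be fetched, every entry is `None`.
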
@@ -1767,9 +1761,8 @@ class DiscordAdapter(BasePlatformAdapter):
            if hasattr(interaction.channel, "guild") and interaction.channel.guild:
                chat_name = f"{interaction.channel.guild.name} / #{chat_name}"

-        # Get channel topic (if available).
-        # For forum threads, inherit the parent forum's topic.
-        chat_topic = self._get_effective_topic(interaction.channel, is_thread=is_thread)
+        # Get channel topic (if available)
+        chat_topic = getattr(interaction.channel, "topic", None)

        source = self.build_source(
            chat_id=str(interaction.channel_id),
@@ -1843,10 +1836,6 @@ class DiscordAdapter(BasePlatformAdapter):

        chat_name = f"{guild_name} / {thread_name}" if guild_name else thread_name

-        # Inherit forum topic when the thread was created inside a forum channel.
-        _chan = getattr(interaction, "channel", None)
-        chat_topic = self._get_effective_topic(_chan, is_thread=True) if _chan else None
-
        source = self.build_source(
            chat_id=thread_id,
            chat_name=chat_name,
@@ -1854,7 +1843,6 @@ class DiscordAdapter(BasePlatformAdapter):
            user_id=str(interaction.user.id),
            user_name=interaction.user.display_name,
            thread_id=thread_id,
-            chat_topic=chat_topic,
        )

        event = MessageEvent(
@@ -2140,15 +2128,6 @@ class DiscordAdapter(BasePlatformAdapter):
                return True
        return False

-    def _get_effective_topic(self, channel: Any, is_thread: bool = False) -> Optional[str]:
-        """Return the channel topic, falling back to the parent forum's topic for forum threads."""
-        topic = getattr(channel, "topic", None)
-        if not topic and is_thread:
-            parent = getattr(channel, "parent", None)
-            if parent and self._is_forum_parent(parent):
-                topic = getattr(parent, "topic", None)
-        return topic
-
    def _format_thread_chat_name(self, thread: Any) -> str:
        """Build a readable chat name for thread-like Discord channels, including forum context when available."""
        thread_name = getattr(thread, "name", None) or str(getattr(thread, "id", "thread"))
@@ -2316,10 +2295,8 @@ class DiscordAdapter(BasePlatformAdapter):
            if hasattr(message.channel, "guild") and message.channel.guild:
                chat_name = f"{message.channel.guild.name} / #{chat_name}"

-        # Get channel topic (if available - TextChannels have topics, DMs/threads don't).
-        # For threads whose parent is a forum channel, inherit the parent's topic
-        # so forum descriptions (e.g. project instructions) appear in the session context.
-        chat_topic = self._get_effective_topic(message.channel, is_thread=is_thread)
+        # Get channel topic (if available - TextChannels have topics, DMs/threads don't)
+        chat_topic = getattr(message.channel, "topic", None)

        # Build source
        source = self.build_source(
@@ -2382,7 +2359,7 @@ class DiscordAdapter(BasePlatformAdapter):
                        ext or "unknown", content_type,
                    )
                else:
-                    MAX_DOC_BYTES = 32 * 1024 * 1024
+                    MAX_DOC_BYTES = 20 * 1024 * 1024
                    if att.size and att.size > MAX_DOC_BYTES:
                        logger.warning(
                            "[Discord] Document too large (%s bytes), skipping: %s",
@@ -2406,9 +2383,9 @@ class DiscordAdapter(BasePlatformAdapter):
                            media_urls.append(cached_path)
                            media_types.append(doc_mime)
                            logger.info("[Discord] Cached user document: %s", cached_path)
-                            # Inject text content for plain-text documents (capped at 100 KB)
+                            # Inject text content for .txt/.md files (capped at 100 KB)
                            MAX_TEXT_INJECT_BYTES = 100 * 1024
-                            if ext in (".md", ".txt", ".log") and len(raw_bytes) <= MAX_TEXT_INJECT_BYTES:
+                            if ext in (".md", ".txt") and len(raw_bytes) <= MAX_TEXT_INJECT_BYTES:
                                try:
                                    text_content = raw_bytes.decode("utf-8")
                                    display_name = att.filename or f"document{ext}"
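
Review note: the two hunks above tighten the Discord document limits. The size cap drops from 32 MiB to 20 MiB, and `.log` files no longer have their text injected into the message. The sketch below summarizes the resulting policy under a simplifying assumption: it uses a single `size` value, whereas the adapter checks `att.size` for the cap and `len(raw_bytes)` for injection. `document_policy` and its return labels are hypothetical names.

```python
MAX_DOC_BYTES = 20 * 1024 * 1024      # documents above this size are skipped
MAX_TEXT_INJECT_BYTES = 100 * 1024    # text injection cap (100 KB)
TEXT_EXTENSIONS = (".md", ".txt")     # ".log" is no longer included


def document_policy(ext: str, size: int) -> str:
    """Return "skip", "cache_and_inject", or "cache_only" for a document.

    Assumes the extension is already known to be a supported document
    type; unsupported types are rejected earlier in the adapter.
    """
    if size > MAX_DOC_BYTES:
        return "skip"
    if ext in TEXT_EXTENSIONS and size <= MAX_TEXT_INJECT_BYTES:
        return "cache_and_inject"
    return "cache_only"
```

Under these limits a small `.log` upload is still cached and passed along as a document, but its contents are not added to the message text.
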
@@ -20,7 +20,6 @@ from __future__ import annotations
 import asyncio
 import hashlib
 import hmac
-import itertools
 import json
 import logging
 import mimetypes
@@ -1053,9 +1052,6 @@ class FeishuAdapter(BasePlatformAdapter):
        self._media_batch_state = FeishuBatchState()
        self._pending_media_batches = self._media_batch_state.events
        self._pending_media_batch_tasks = self._media_batch_state.tasks
-        # Exec approval button state (approval_id → {session_key, message_id, chat_id})
-        self._approval_state: Dict[int, Dict[str, str]] = {}
-        self._approval_counter = itertools.count(1)
        self._load_seen_message_ids()

    @staticmethod
@@ -1398,104 +1394,6 @@ class FeishuAdapter(BasePlatformAdapter):
            logger.error("[Feishu] Failed to edit message %s: %s", message_id, exc, exc_info=True)
            return SendResult(success=False, error=str(exc))

-    async def send_exec_approval(
-        self, chat_id: str, command: str, session_key: str,
-        description: str = "dangerous command",
-        metadata: Optional[Dict[str, Any]] = None,
-    ) -> SendResult:
-        """Send an interactive card with approval buttons.
-
-        The buttons carry ``hermes_action`` in their value dict so that
-        ``_handle_card_action_event`` can intercept them and call
-        ``resolve_gateway_approval()`` to unblock the waiting agent thread.
-        """
-        if not self._client:
-            return SendResult(success=False, error="Not connected")
-
-        try:
-            approval_id = next(self._approval_counter)
-            cmd_preview = command[:3000] + "..." if len(command) > 3000 else command
-
-            def _btn(label: str, action_name: str, btn_type: str = "default") -> dict:
-                return {
-                    "tag": "button",
-                    "text": {"tag": "plain_text", "content": label},
-                    "type": btn_type,
-                    "value": {"hermes_action": action_name, "approval_id": approval_id},
-                }
-
-            card = {
-                "config": {"wide_screen_mode": True},
-                "header": {
-                    "title": {"content": "⚠️ Command Approval Required", "tag": "plain_text"},
-                    "template": "orange",
-                },
-                "elements": [
-                    {
-                        "tag": "markdown",
-                        "content": f"```\n{cmd_preview}\n```\n**Reason:** {description}",
-                    },
-                    {
-                        "tag": "action",
-                        "actions": [
-                            _btn("✅ Allow Once", "approve_once", "primary"),
-                            _btn("✅ Session", "approve_session"),
-                            _btn("✅ Always", "approve_always"),
-                            _btn("❌ Deny", "deny", "danger"),
-                        ],
-                    },
-                ],
-            }
-
-            payload = json.dumps(card, ensure_ascii=False)
-            response = await self._feishu_send_with_retry(
-                chat_id=chat_id,
-                msg_type="interactive",
-                payload=payload,
-                reply_to=None,
-                metadata=metadata,
-            )
-
-            result = self._finalize_send_result(response, "send_exec_approval failed")
-            if result.success:
-                self._approval_state[approval_id] = {
-                    "session_key": session_key,
-                    "message_id": result.message_id or "",
-                    "chat_id": chat_id,
-                }
-            return result
-        except Exception as exc:
-            logger.warning("[Feishu] send_exec_approval failed: %s", exc)
-            return SendResult(success=False, error=str(exc))
-
-    async def _update_approval_card(
-        self, message_id: str, label: str, user_name: str, choice: str,
-    ) -> None:
-        """Replace the approval card with a resolved status card."""
-        if not self._client or not message_id:
-            return
-        icon = "❌" if choice == "deny" else "✅"
-        card = {
-            "config": {"wide_screen_mode": True},
-            "header": {
-                "title": {"content": f"{icon} {label}", "tag": "plain_text"},
-                "template": "red" if choice == "deny" else "green",
-            },
-            "elements": [
-                {
-                    "tag": "markdown",
-                    "content": f"{icon} **{label}** by {user_name}",
-                },
-            ],
-        }
-        try:
-            payload = json.dumps(card, ensure_ascii=False)
-            body = self._build_update_message_body(msg_type="interactive", content=payload)
-            request = self._build_update_message_request(message_id=message_id, request_body=body)
-            await asyncio.to_thread(self._client.im.v1.message.update, request)
-        except Exception as exc:
-            logger.warning("[Feishu] Failed to update approval card %s: %s", message_id, exc)
-
    async def send_voice(
        self,
        chat_id: str,
@@ -1922,52 +1820,6 @@ class FeishuAdapter(BasePlatformAdapter):
        action = getattr(event, "action", None)
        action_tag = str(getattr(action, "tag", "") or "button")
        action_value = getattr(action, "value", {}) or {}
-
-        # --- Exec approval button intercept ---
-        hermes_action = action_value.get("hermes_action") if isinstance(action_value, dict) else None
-        if hermes_action:
-            approval_id = action_value.get("approval_id")
-            state = self._approval_state.pop(approval_id, None)
-            if not state:
-                logger.debug("[Feishu] Approval %s already resolved or unknown", approval_id)
-                return
-
-            choice_map = {
-                "approve_once": "once",
-                "approve_session": "session",
-                "approve_always": "always",
-                "deny": "deny",
-            }
-            choice = choice_map.get(hermes_action, "deny")
-
-            label_map = {
-                "once": "Approved once",
-                "session": "Approved for session",
-                "always": "Approved permanently",
-                "deny": "Denied",
-            }
-            label = label_map.get(choice, "Resolved")
-
-            # Resolve sender name for the status card
-            sender_id = SimpleNamespace(open_id=open_id, user_id=None, union_id=None)
-            sender_profile = await self._resolve_sender_profile(sender_id)
-            user_name = sender_profile.get("user_name") or open_id
-
-            # Resolve the approval — unblocks the agent thread
-            try:
-                from tools.approval import resolve_gateway_approval
-                count = resolve_gateway_approval(state["session_key"], choice)
-                logger.info(
-                    "Feishu button resolved %d approval(s) for session %s (choice=%s, user=%s)",
-                    count, state["session_key"], choice, user_name,
-                )
-            except Exception as exc:
-                logger.error("Failed to resolve gateway approval from Feishu button: %s", exc)
-
-            # Update the card to show the decision
-            await self._update_approval_card(state.get("message_id", ""), label, user_name, choice)
-            return
-
        synthetic_text = f"/card {action_tag}"
        if action_value:
            try:
@@ -647,11 +647,7 @@ class SignalAdapter(BasePlatformAdapter):

        if result is not None:
            self._track_sent_timestamp(result)
-            # Use the timestamp from the RPC result as a pseudo message_id.
-            # Signal doesn't have real message IDs, but the stream consumer
-            # needs a truthy value to follow its edit→fallback path correctly.
-            _msg_id = str(result.get("timestamp", "")) if isinstance(result, dict) else None
-            return SendResult(success=True, message_id=_msg_id or None)
+            return SendResult(success=True)
        return SendResult(success=False, error="RPC send failed")

    def _track_sent_timestamp(self, rpc_result) -> None:
@@ -841,11 +837,6 @@ class SignalAdapter(BasePlatformAdapter):
            except asyncio.CancelledError:
                pass

-    async def stop_typing(self, chat_id: str) -> None:
-        """Public interface for stopping typing — called by base adapter's
-        _keep_typing finally block to clean up platform-level typing tasks."""
-        await self._stop_typing_indicator(chat_id)
-
    # ------------------------------------------------------------------
    # Chat Info
    # ------------------------------------------------------------------
@@ -14,7 +14,7 @@ import logging
 import os
 import re
 import time
-from typing import Dict, Optional, Any, Tuple
+from typing import Dict, Optional, Any

 try:
    from slack_bolt.async_app import AsyncApp
@@ -95,12 +95,6 @@ class SlackAdapter(BasePlatformAdapter):
        # respond to ALL subsequent messages in that thread automatically.
        self._mentioned_threads: set = set()
        self._MENTIONED_THREADS_MAX = 5000
-        # Assistant thread metadata keyed by (channel_id, thread_ts). Slack's
-        # AI Assistant lifecycle events can arrive before/alongside message
-        # events, and they carry the user/thread identity needed for stable
-        # session + memory scoping.
-        self._assistant_threads: Dict[Tuple[str, str], Dict[str, str]] = {}
-        self._ASSISTANT_THREADS_MAX = 5000

    async def connect(self) -> bool:
        """Connect to Slack via Socket Mode."""
@@ -187,14 +181,6 @@ class SlackAdapter(BasePlatformAdapter):
            async def handle_app_mention(event, say):
                pass

-            @self._app.event("assistant_thread_started")
-            async def handle_assistant_thread_started(event, say):
-                await self._handle_assistant_thread_lifecycle_event(event)
-
-            @self._app.event("assistant_thread_context_changed")
-            async def handle_assistant_thread_context_changed(event, say):
-                await self._handle_assistant_thread_lifecycle_event(event)
-
            # Register slash command handler
            @self._app.command("/hermes")
            async def handle_hermes_command(ack, command):
@@ -769,135 +755,6 @@ class SlackAdapter(BasePlatformAdapter):

    # ----- Internal handlers -----

-    def _assistant_thread_key(self, channel_id: str, thread_ts: str) -> Optional[Tuple[str, str]]:
-        """Return a stable cache key for Slack assistant thread metadata."""
-        if not channel_id or not thread_ts:
-            return None
-        return (str(channel_id), str(thread_ts))
-
-    def _extract_assistant_thread_metadata(self, event: dict) -> Dict[str, str]:
-        """Extract Slack Assistant thread identity data from an event payload."""
-        assistant_thread = event.get("assistant_thread") or {}
-        context = assistant_thread.get("context") or event.get("context") or {}
-
-        channel_id = (
-            assistant_thread.get("channel_id")
-            or event.get("channel")
-            or context.get("channel_id")
-            or ""
-        )
-        thread_ts = (
-            assistant_thread.get("thread_ts")
-            or event.get("thread_ts")
-            or event.get("message_ts")
-            or ""
-        )
-        user_id = (
-            assistant_thread.get("user_id")
-            or event.get("user")
-            or context.get("user_id")
-            or ""
-        )
-        team_id = (
-            event.get("team")
-            or event.get("team_id")
-            or assistant_thread.get("team_id")
-            or ""
-        )
-        context_channel_id = context.get("channel_id") or ""
-
-        return {
-            "channel_id": str(channel_id) if channel_id else "",
-            "thread_ts": str(thread_ts) if thread_ts else "",
-            "user_id": str(user_id) if user_id else "",
-            "team_id": str(team_id) if team_id else "",
-            "context_channel_id": str(context_channel_id) if context_channel_id else "",
-        }
-
-    def _cache_assistant_thread_metadata(self, metadata: Dict[str, str]) -> None:
-        """Remember assistant thread identity data for later message events."""
-        channel_id = metadata.get("channel_id", "")
-        thread_ts = metadata.get("thread_ts", "")
-        key = self._assistant_thread_key(channel_id, thread_ts)
-        if not key:
-            return
-
-        existing = self._assistant_threads.get(key, {})
-        merged = dict(existing)
-        merged.update({k: v for k, v in metadata.items() if v})
-        self._assistant_threads[key] = merged
-
-        # Evict oldest entries when the cache exceeds the limit
-        if len(self._assistant_threads) > self._ASSISTANT_THREADS_MAX:
-            excess = len(self._assistant_threads) - self._ASSISTANT_THREADS_MAX // 2
-            for old_key in list(self._assistant_threads)[:excess]:
-                del self._assistant_threads[old_key]
-
-        team_id = merged.get("team_id", "")
-        if team_id and channel_id:
-            self._channel_team[channel_id] = team_id
-
-    def _lookup_assistant_thread_metadata(
-        self,
-        event: dict,
-        channel_id: str = "",
-        thread_ts: str = "",
-    ) -> Dict[str, str]:
-        """Load cached assistant-thread metadata that matches the current event."""
-        metadata = self._extract_assistant_thread_metadata(event)
-        if channel_id and not metadata.get("channel_id"):
-            metadata["channel_id"] = channel_id
-        if thread_ts and not metadata.get("thread_ts"):
-            metadata["thread_ts"] = thread_ts
-
-        key = self._assistant_thread_key(
-            metadata.get("channel_id", ""),
-            metadata.get("thread_ts", ""),
-        )
-        cached = self._assistant_threads.get(key, {}) if key else {}
-        if cached:
-            merged = dict(cached)
-            merged.update({k: v for k, v in metadata.items() if v})
-            return merged
-        return metadata
-
-    def _seed_assistant_thread_session(self, metadata: Dict[str, str]) -> None:
-        """Prime the session store so assistant threads get stable user scoping."""
-        session_store = getattr(self, "_session_store", None)
-        if not session_store:
-            return
-
-        channel_id = metadata.get("channel_id", "")
-        thread_ts = metadata.get("thread_ts", "")
-        user_id = metadata.get("user_id", "")
-        if not channel_id or not thread_ts or not user_id:
-            return
-
-        source = self.build_source(
-            chat_id=channel_id,
-            chat_name=channel_id,
-            chat_type="dm",
-            user_id=user_id,
-            thread_id=thread_ts,
-            chat_topic=metadata.get("context_channel_id") or None,
-        )
-
-        try:
-            session_store.get_or_create_session(source)
-        except Exception:
-            logger.debug(
-                "[Slack] Failed to seed assistant thread session for %s/%s",
-                channel_id,
-                thread_ts,
-                exc_info=True,
-            )
-
-    async def _handle_assistant_thread_lifecycle_event(self, event: dict) -> None:
-        """Handle Slack Assistant lifecycle events that carry user/thread identity."""
-        metadata = self._extract_assistant_thread_metadata(event)
-        self._cache_assistant_thread_metadata(metadata)
-        self._seed_assistant_thread_session(metadata)
-
    async def _handle_slack_message(self, event: dict) -> None:
        """Handle an incoming Slack message event."""
        # Dedup: Slack Socket Mode can redeliver events after reconnects (#4777)
@@ -924,21 +781,10 @@ class SlackAdapter(BasePlatformAdapter):
            return

        text = event.get("text", "")
+        user_id = event.get("user", "")
        channel_id = event.get("channel", "")
        ts = event.get("ts", "")
-        assistant_meta = self._lookup_assistant_thread_metadata(
-            event,
-            channel_id=channel_id,
-            thread_ts=event.get("thread_ts", ""),
-        )
-        user_id = event.get("user") or assistant_meta.get("user_id", "")
-        if not channel_id:
-            channel_id = assistant_meta.get("channel_id", "")
-        team_id = (
-            event.get("team")
-            or event.get("team_id")
-            or assistant_meta.get("team_id", "")
-        )
+        team_id = event.get("team", "")

        # Track which workspace owns this channel
        if team_id and channel_id:
@@ -946,8 +792,6 @@ class SlackAdapter(BasePlatformAdapter):

        # Determine if this is a DM or channel message
        channel_type = event.get("channel_type", "")
-        if not channel_type and channel_id.startswith("D"):
-            channel_type = "im"
        is_dm = channel_type == "im"

        # Build thread_ts for session keying.
@@ -956,7 +800,7 @@ class SlackAdapter(BasePlatformAdapter):
        # In DMs: only use the real thread_ts — top-level DMs should share
        #   one continuous session, threaded DMs get their own session.
        if is_dm:
-            thread_ts = event.get("thread_ts") or assistant_meta.get("thread_ts")  # None for top-level DMs
+            thread_ts = event.get("thread_ts")  # None for top-level DMs
        else:
            thread_ts = event.get("thread_ts") or ts  # ts fallback for channels

@@ -184,8 +184,6 @@ if _config_path.exists():
            # Env var from .env takes precedence (already in os.environ).
            if "gateway_timeout" in _agent_cfg and "HERMES_AGENT_TIMEOUT" not in os.environ:
                os.environ["HERMES_AGENT_TIMEOUT"] = str(_agent_cfg["gateway_timeout"])
-            if "gateway_timeout_warning" in _agent_cfg and "HERMES_AGENT_TIMEOUT_WARNING" not in os.environ:
-                os.environ["HERMES_AGENT_TIMEOUT_WARNING"] = str(_agent_cfg["gateway_timeout_warning"])
        # Timezone: bridge config.yaml → HERMES_TIMEZONE env var.
        # HERMES_TIMEZONE from .env takes precedence (already in os.environ).
        _tz_cfg = _cfg.get("timezone", "")
@@ -923,11 +921,12 @@ class GatewayRunner:

    @staticmethod
    def _load_reasoning_config() -> dict | None:
-        """Load reasoning effort from config.yaml.
+        """Load reasoning effort from config with env fallback.

-        Reads agent.reasoning_effort from config.yaml. Valid: "xhigh",
-        "high", "medium", "low", "minimal", "none". Returns None to use
-        default (medium).
+        Checks agent.reasoning_effort in config.yaml first, then
+        HERMES_REASONING_EFFORT as a fallback. Valid: "xhigh", "high",
+        "medium", "low", "minimal", "none". Returns None to use default
+        (medium).
        """
        from hermes_constants import parse_reasoning_effort
        effort = ""
@@ -940,6 +939,8 @@ class GatewayRunner:
                effort = str(cfg.get("agent", {}).get("reasoning_effort", "") or "").strip()
        except Exception:
            pass
+        if not effort:
+            effort = os.getenv("HERMES_REASONING_EFFORT", "")
        result = parse_reasoning_effort(effort)
        if effort and effort.strip() and result is None:
            logger.warning("Unknown reasoning_effort '%s', using default (medium)", effort)
@@ -1075,7 +1076,6 @@ class GatewayRunner:
                       "MATRIX_ALLOWED_USERS", "DINGTALK_ALLOWED_USERS",
                       "FEISHU_ALLOWED_USERS",
                       "WECOM_ALLOWED_USERS",
-                       "BLUEBUBBLES_ALLOWED_USERS",
                       "GATEWAY_ALLOWED_USERS")
        )
        _allow_all = os.getenv("GATEWAY_ALLOW_ALL_USERS", "").lower() in ("true", "1", "yes") or any(
@@ -1086,8 +1086,7 @@ class GatewayRunner:
                       "SMS_ALLOW_ALL_USERS", "MATTERMOST_ALLOW_ALL_USERS",
                       "MATRIX_ALLOW_ALL_USERS", "DINGTALK_ALLOW_ALL_USERS",
                       "FEISHU_ALLOW_ALL_USERS",
-                       "WECOM_ALLOW_ALL_USERS",
-                       "BLUEBUBBLES_ALLOW_ALL_USERS")
+                       "WECOM_ALLOW_ALL_USERS")
        )
        if not _any_allowlist and not _allow_all:
            logger.warning(
@@ -1485,14 +1484,6 @@ class GatewayRunner:
                logger.debug("Interrupted running agent for session %s during shutdown", session_key[:20])
            except Exception as e:
                logger.debug("Failed interrupting agent during shutdown: %s", e)
-            # Fire plugin on_session_finalize hook before memory shutdown
-            try:
-                from hermes_cli.plugins import invoke_hook as _invoke_hook
-                _invoke_hook("on_session_finalize",
-                             session_id=getattr(agent, 'session_id', None),
-                             platform="gateway")
-            except Exception:
-                pass
            # Shut down memory provider at actual session boundary
            try:
                if hasattr(agent, 'shutdown_memory_provider'):
@@ -1658,13 +1649,6 @@ class GatewayRunner:
            adapter.gateway_runner = self  # For cross-platform delivery
            return adapter

-        elif platform == Platform.BLUEBUBBLES:
-            from gateway.platforms.bluebubbles import BlueBubblesAdapter, check_bluebubbles_requirements
-            if not check_bluebubbles_requirements():
-                logger.warning("BlueBubbles: aiohttp/httpx missing or BLUEBUBBLES_SERVER_URL/BLUEBUBBLES_PASSWORD not configured")
-                return None
-            return BlueBubblesAdapter(config)
-
        return None
    
    def _is_user_authorized(self, source: SessionSource) -> bool:
@@ -1703,7 +1687,6 @@ class GatewayRunner:
            Platform.DINGTALK: "DINGTALK_ALLOWED_USERS",
            Platform.FEISHU: "FEISHU_ALLOWED_USERS",
            Platform.WECOM: "WECOM_ALLOWED_USERS",
-            Platform.BLUEBUBBLES: "BLUEBUBBLES_ALLOWED_USERS",
        }
        platform_allow_all_map = {
            Platform.TELEGRAM: "TELEGRAM_ALLOW_ALL_USERS",
@@ -1718,7 +1701,6 @@ class GatewayRunner:
            Platform.DINGTALK: "DINGTALK_ALLOW_ALL_USERS",
            Platform.FEISHU: "FEISHU_ALLOW_ALL_USERS",
            Platform.WECOM: "WECOM_ALLOW_ALL_USERS",
-            Platform.BLUEBUBBLES: "BLUEBUBBLES_ALLOW_ALL_USERS",
        }

        # Per-platform allow-all flag (e.g., DISCORD_ALLOW_ALL_USERS=true)
@@ -1792,11 +1774,8 @@ class GatewayRunner:
        """
        source = event.source

-        # Internal events (e.g. background-process completion notifications)
-        # are system-generated and must skip user authorization.
-        if getattr(event, "internal", False):
-            pass
-        elif not self._is_user_authorized(source):
+        # Check if user is authorized
+        if not self._is_user_authorized(source):
            logger.warning("Unauthorized user: %s (%s) on %s", source.user_id, source.user_name, source.platform.value)
            # In DMs: offer pairing code. In groups: silently ignore.
            if source.chat_type == "dm" and self._get_unauthorized_dm_behavior(source.platform) == "pair":
@@ -3298,15 +3277,6 @@ class GatewayRunner:
        # the configured default instead of the previously switched model.
        self._session_model_overrides.pop(session_key, None)

-        # Fire plugin on_session_finalize hook (session boundary)
-        try:
-            from hermes_cli.plugins import invoke_hook as _invoke_hook
-            _old_sid = old_entry.session_id if old_entry else None
-            _invoke_hook("on_session_finalize", session_id=_old_sid,
-                         platform=source.platform.value if source.platform else "")
-        except Exception:
-            pass
-
        # Emit session:end hook (session is ending)
        await self.hooks.emit("session:end", {
            "platform": source.platform.value if source.platform else "",
@@ -3320,7 +3290,7 @@ class GatewayRunner:
            "user_id": source.user_id,
            "session_key": session_key,
        })
-
+        
        # Resolve session config info to surface to the user
        try:
            session_info = self._format_session_info()
@@ -3331,18 +3301,9 @@ class GatewayRunner:
            header = "✨ Session reset! Starting fresh."
        else:
            # No existing session, just create one
-            new_entry = self.session_store.get_or_create_session(source, force_new=True)
+            self.session_store.get_or_create_session(source, force_new=True)
            header = "✨ New session started!"

-        # Fire plugin on_session_reset hook (new session guaranteed to exist)
-        try:
-            from hermes_cli.plugins import invoke_hook as _invoke_hook
-            _new_sid = new_entry.session_id if new_entry else None
-            _invoke_hook("on_session_reset", session_id=_new_sid,
-                         platform=source.platform.value if source.platform else "")
-        except Exception:
-            pass
-
        if session_info:
            return f"{header}\n\n{session_info}"
        return header
@@ -5534,7 +5495,7 @@ class GatewayRunner:
        Platform.TELEGRAM, Platform.DISCORD, Platform.SLACK, Platform.WHATSAPP,
        Platform.SIGNAL, Platform.MATTERMOST, Platform.MATRIX,
        Platform.HOMEASSISTANT, Platform.EMAIL, Platform.SMS, Platform.DINGTALK,
-        Platform.FEISHU, Platform.WECOM, Platform.BLUEBUBBLES, Platform.LOCAL,
+        Platform.FEISHU, Platform.WECOM, Platform.LOCAL,
    })

    async def _handle_update_command(self, event: MessageEvent) -> str:
@@ -6174,7 +6135,6 @@ class GatewayRunner:
                                text=synth_text,
                                message_type=MessageType.TEXT,
                                source=_source,
-                                internal=True,
                            )
                            logger.info(
                                "Process %s finished — injecting agent notification for session %s",
@@ -6325,15 +6285,7 @@ class GatewayRunner:
        # Falls back to env vars for backward compatibility.
        # YAML 1.1 parses bare `off` as boolean False — normalise before
        # the `or` chain so it doesn't silently fall through to "all".
-        #
-        # Per-platform overrides (display.tool_progress_overrides) take
-        # priority over the global setting — e.g. Signal users can set
-        # tool_progress to "off" while keeping Telegram on "all".
-        _display_cfg = user_config.get("display", {})
-        _overrides = _display_cfg.get("tool_progress_overrides", {})
-        _raw_tp = _overrides.get(platform_key)
-        if _raw_tp is None:
-            _raw_tp = _display_cfg.get("tool_progress")
+        _raw_tp = user_config.get("display", {}).get("tool_progress")
        if _raw_tp is False:
            _raw_tp = "off"
        progress_mode = (
@@ -6437,18 +6389,6 @@ class GatewayRunner:
            if not adapter:
                return

-            # Skip tool progress for platforms that don't support message
-            # editing (e.g. iMessage/BlueBubbles) — each progress update
-            # would become a separate message bubble, which is noisy.
-            from gateway.platforms.base import BasePlatformAdapter as _BaseAdapter
-            if type(adapter).edit_message is _BaseAdapter.edit_message:
-                while not progress_queue.empty():
-                    try:
-                        progress_queue.get_nowait()
-                    except Exception:
-                        break
-                return
-
            progress_lines = []      # Accumulated tool lines
            progress_msg_id = None   # ID of the progress message to edit
            can_edit = True          # False once an edit fails (platform doesn't support it)
@@ -7143,9 +7083,6 @@ class GatewayRunner:
            # Default 1800s (30 min inactivity).  0 = unlimited.
            _agent_timeout_raw = float(os.getenv("HERMES_AGENT_TIMEOUT", 1800))
            _agent_timeout = _agent_timeout_raw if _agent_timeout_raw > 0 else None
-            _agent_warning_raw = float(os.getenv("HERMES_AGENT_TIMEOUT_WARNING", 900))
-            _agent_warning = _agent_warning_raw if _agent_warning_raw > 0 else None
-            _warning_fired = False
            loop = asyncio.get_event_loop()
            _executor_task = asyncio.ensure_future(
                loop.run_in_executor(None, run_sync)
@@ -7178,25 +7115,6 @@ class GatewayRunner:
                            _idle_secs = _act.get("seconds_since_activity", 0.0)
                        except Exception:
                            pass
-                    # Staged warning: fire once before escalating to full timeout.
-                    if (not _warning_fired and _agent_warning is not None
-                            and _idle_secs >= _agent_warning):
-                        _warning_fired = True
-                        _warn_adapter = self.adapters.get(source.platform)
-                        if _warn_adapter:
-                            _elapsed_warn = int(_agent_warning // 60) or 1
-                            _remaining_mins = int((_agent_timeout - _agent_warning) // 60) or 1
-                            try:
-                                await _warn_adapter.send(
-                                    source.chat_id,
-                                    f"⚠️ No activity for {_elapsed_warn} min. "
-                                    f"If the agent does not respond soon, it will "
-                                    f"be timed out in {_remaining_mins} min. "
-                                    f"You can continue waiting or use /reset.",
-                                    metadata=_status_thread_metadata,
-                                )
-                            except Exception as _warn_err:
-                                logger.debug("Inactivity warning send error: %s", _warn_err)
                    if _idle_secs >= _agent_timeout:
                        _inactivity_timeout = True
                        break
@@ -193,7 +193,6 @@ _PII_SAFE_PLATFORMS = frozenset({
    Platform.WHATSAPP,
    Platform.SIGNAL,
    Platform.TELEGRAM,
-    Platform.BLUEBUBBLES,
 })
 """Platforms where user IDs can be safely redacted (no in-message mention system
 that requires raw IDs).  Discord is excluded because mentions use ``<@user_id>``
@@ -74,8 +74,6 @@ class GatewayStreamConsumer:
        self._edit_supported = True  # Disabled on first edit failure (Signal/Email/HA)
        self._last_edit_time = 0.0
        self._last_sent_text = ""   # Track last-sent text to skip redundant edits
-        self._fallback_final_send = False
-        self._fallback_prefix = ""

    @property
    def already_sent(self) -> bool:
@@ -140,19 +138,12 @@ class GatewayStreamConsumer:
                    while (
                        len(self._accumulated) > _safe_limit
                        and self._message_id is not None
-                        and self._edit_supported
                    ):
                        split_at = self._accumulated.rfind("\n", 0, _safe_limit)
                        if split_at < _safe_limit // 2:
                            split_at = _safe_limit
                        chunk = self._accumulated[:split_at]
                        await self._send_or_edit(chunk)
-                        if self._fallback_final_send:
-                            # Edit failed while attempting to split an oversized
-                            # message. Keep the full accumulated text intact so
-                            # the fallback final-send path can deliver the
-                            # remaining continuation without dropping content.
-                            break
                        self._accumulated = self._accumulated[split_at:].lstrip("\n")
                        self._message_id = None
                        self._last_sent_text = ""
@@ -165,17 +156,9 @@ class GatewayStreamConsumer:
                    self._last_edit_time = time.monotonic()

                if got_done:
-                    # Final edit without cursor. If progressive editing failed
-                    # mid-stream, send a single continuation/fallback message
-                    # here instead of letting the base gateway path send the
-                    # full response again.
-                    if self._accumulated:
-                        if self._fallback_final_send:
-                            await self._send_fallback_final(self._accumulated)
-                        elif self._message_id:
-                            await self._send_or_edit(self._accumulated)
-                        elif not self._already_sent:
-                            await self._send_or_edit(self._accumulated)
+                    # Final edit without cursor
+                    if self._accumulated and self._message_id:
+                        await self._send_or_edit(self._accumulated)
                    return

                # Tool boundary: the should_edit block above already flushed
@@ -186,8 +169,6 @@ class GatewayStreamConsumer:
                    self._message_id = None
                    self._accumulated = ""
                    self._last_sent_text = ""
-                    self._fallback_final_send = False
-                    self._fallback_prefix = ""

                await asyncio.sleep(0.05)  # Small yield to not busy-loop

@@ -226,86 +207,6 @@ class GatewayStreamConsumer:
        # Strip trailing whitespace/newlines but preserve leading content
        return cleaned.rstrip()

-    def _visible_prefix(self) -> str:
-        """Return the visible text already shown in the streamed message."""
-        prefix = self._last_sent_text or ""
-        if self.cfg.cursor and prefix.endswith(self.cfg.cursor):
-            prefix = prefix[:-len(self.cfg.cursor)]
-        return self._clean_for_display(prefix)
-
-    def _continuation_text(self, final_text: str) -> str:
-        """Return only the part of final_text the user has not already seen."""
-        prefix = self._fallback_prefix or self._visible_prefix()
-        if prefix and final_text.startswith(prefix):
-            return final_text[len(prefix):].lstrip()
-        return final_text
-
-    @staticmethod
-    def _split_text_chunks(text: str, limit: int) -> list[str]:
-        """Split text into reasonably sized chunks for fallback sends."""
-        if len(text) <= limit:
-            return [text]
-        chunks: list[str] = []
-        remaining = text
-        while len(remaining) > limit:
-            split_at = remaining.rfind("\n", 0, limit)
-            if split_at < limit // 2:
-                split_at = limit
-            chunks.append(remaining[:split_at])
-            remaining = remaining[split_at:].lstrip("\n")
-        if remaining:
-            chunks.append(remaining)
-        return chunks
-
-    async def _send_fallback_final(self, text: str) -> None:
-        """Send the final continuation after streaming edits stop working."""
-        final_text = self._clean_for_display(text)
-        continuation = self._continuation_text(final_text)
-        self._fallback_final_send = False
-        if not continuation.strip():
-            # Nothing new to send — the visible partial already matches final text.
-            self._already_sent = True
-            return
-
-        raw_limit = getattr(self.adapter, "MAX_MESSAGE_LENGTH", 4096)
-        safe_limit = max(500, raw_limit - 100)
-        chunks = self._split_text_chunks(continuation, safe_limit)
-
-        last_message_id: Optional[str] = None
-        last_successful_chunk = ""
-        sent_any_chunk = False
-        for chunk in chunks:
-            result = await self.adapter.send(
-                chat_id=self.chat_id,
-                content=chunk,
-                metadata=self.metadata,
-            )
-            if not result.success:
-                if sent_any_chunk:
-                    # Some continuation text already reached the user. Suppress
-                    # the base gateway final-send path so we don't resend the
-                    # full response and create another duplicate.
-                    self._already_sent = True
-                    self._message_id = last_message_id
-                    self._last_sent_text = last_successful_chunk
-                    self._fallback_prefix = ""
-                    return
-                # No fallback chunk reached the user — allow the normal gateway
-                # final-send path to try one more time.
-                self._already_sent = False
-                self._message_id = None
-                self._last_sent_text = ""
-                self._fallback_prefix = ""
-                return
-            sent_any_chunk = True
-            last_successful_chunk = chunk
-            last_message_id = result.message_id or last_message_id
-
-        self._message_id = last_message_id
-        self._already_sent = True
-        self._last_sent_text = chunks[-1]
-        self._fallback_prefix = ""
-
    async def _send_or_edit(self, text: str) -> None:
        """Send or edit the streaming message."""
        # Strip MEDIA: directives so they don't appear as visible text.
@@ -331,16 +232,14 @@ class GatewayStreamConsumer:
                        self._last_sent_text = text
                    else:
                        # If an edit fails mid-stream (especially Telegram flood control),
-                        # stop progressive edits and send only the missing tail once the
-                        # final response is available.
+                        # stop progressive edits and let the normal final send path deliver
+                        # the complete answer instead of leaving the user with a partial.
                        logger.debug("Edit failed, disabling streaming for this adapter")
-                        self._fallback_prefix = self._visible_prefix()
-                        self._fallback_final_send = True
                        self._edit_supported = False
-                        self._already_sent = True
+                        self._already_sent = False
                else:
                    # Editing not supported — skip intermediate updates.
-                    # The final response will be sent by the fallback path.
+                    # The final response will be sent by the normal path.
                    pass
            else:
                # First message — send new
@@ -353,17 +252,6 @@ class GatewayStreamConsumer:
                    self._message_id = result.message_id
                    self._already_sent = True
                    self._last_sent_text = text
-                elif result.success:
-                    # Platform accepted the message but returned no message_id
-                    # (e.g. Signal).  Can't edit without an ID — switch to
-                    # fallback mode: suppress intermediate deltas, send only
-                    # the missing tail once the final response is ready.
-                    self._already_sent = True
-                    self._edit_supported = False
-                    self._fallback_prefix = self._clean_for_display(text)
-                    self._fallback_final_send = True
-                    # Sentinel prevents re-entering this branch on every delta
-                    self._message_id = "__no_edit__"
                else:
                    # Initial send failed — disable streaming for this session
                    self._edit_supported = False
@@ -11,5 +11,5 @@ Provides subcommands for:
 - hermes cron          - Manage cron jobs
 """

-__version__ = "0.8.0"
-__release_date__ = "2026.4.8"
+__version__ = "0.7.0"
+__release_date__ = "2026.4.3"
@@ -67,16 +67,12 @@ DEFAULT_AGENT_KEY_MIN_TTL_SECONDS = 30 * 60  # 30 minutes
 ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120       # refresh 2 min before expiry
 DEVICE_AUTH_POLL_INTERVAL_CAP_SECONDS = 1     # poll at most every 1s
 DEFAULT_CODEX_BASE_URL = "https://chatgpt.com/backend-api/codex"
-DEFAULT_QWEN_BASE_URL = "https://portal.qwen.ai/v1"
 DEFAULT_GITHUB_MODELS_BASE_URL = "https://api.githubcopilot.com"
 DEFAULT_COPILOT_ACP_BASE_URL = "acp://copilot"
 DEFAULT_GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai"
 CODEX_OAUTH_CLIENT_ID = "app_EMoamEEZ73f0CkXaXp7hrann"
 CODEX_OAUTH_TOKEN_URL = "https://auth.openai.com/oauth/token"
 CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120
-QWEN_OAUTH_CLIENT_ID = "f0304373b74a44d2b584a3fb70ca9e56"
-QWEN_OAUTH_TOKEN_URL = "https://chat.qwen.ai/api/v1/oauth2/token"
-QWEN_ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120


 # =============================================================================
@@ -116,12 +112,6 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
        auth_type="oauth_external",
        inference_base_url=DEFAULT_CODEX_BASE_URL,
    ),
-    "qwen-oauth": ProviderConfig(
-        id="qwen-oauth",
-        name="Qwen OAuth",
-        auth_type="oauth_external",
-        inference_base_url=DEFAULT_QWEN_BASE_URL,
-    ),
    "copilot": ProviderConfig(
        id="copilot",
        name="GitHub Copilot",
@@ -827,7 +817,6 @@ def resolve_provider(
        "github-copilot-acp": "copilot-acp", "copilot-acp-agent": "copilot-acp",
        "aigateway": "ai-gateway", "vercel": "ai-gateway", "vercel-ai-gateway": "ai-gateway",
        "opencode": "opencode-zen", "zen": "opencode-zen",
-        "qwen-portal": "qwen-oauth", "qwen-cli": "qwen-oauth", "qwen-oauth": "qwen-oauth",
        "hf": "huggingface", "hugging-face": "huggingface", "huggingface-hub": "huggingface",
        "go": "opencode-go", "opencode-go-sub": "opencode-go",
        "kilo": "kilocode", "kilo-code": "kilocode", "kilo-gateway": "kilocode",
@@ -957,176 +946,6 @@ def _codex_access_token_is_expiring(access_token: Any, skew_seconds: int) -> boo
    return float(exp) <= (time.time() + max(0, int(skew_seconds)))


-def _qwen_cli_auth_path() -> Path:
-    return Path.home() / ".qwen" / "oauth_creds.json"
-
-
-def _read_qwen_cli_tokens() -> Dict[str, Any]:
-    auth_path = _qwen_cli_auth_path()
-    if not auth_path.exists():
-        raise AuthError(
-            "Qwen CLI credentials not found. Run 'qwen auth qwen-oauth' first.",
-            provider="qwen-oauth",
-            code="qwen_auth_missing",
-        )
-    try:
-        data = json.loads(auth_path.read_text(encoding="utf-8"))
-    except Exception as exc:
-        raise AuthError(
-            f"Failed to read Qwen CLI credentials from {auth_path}: {exc}",
-            provider="qwen-oauth",
-            code="qwen_auth_read_failed",
-        ) from exc
-    if not isinstance(data, dict):
-        raise AuthError(
-            f"Invalid Qwen CLI credentials in {auth_path}.",
-            provider="qwen-oauth",
-            code="qwen_auth_invalid",
-        )
-    return data
-
-
-def _save_qwen_cli_tokens(tokens: Dict[str, Any]) -> Path:
-    auth_path = _qwen_cli_auth_path()
-    auth_path.parent.mkdir(parents=True, exist_ok=True)
-    tmp_path = auth_path.with_suffix(".tmp")
-    tmp_path.write_text(json.dumps(tokens, indent=2, sort_keys=True) + "\n", encoding="utf-8")
-    os.chmod(tmp_path, stat.S_IRUSR | stat.S_IWUSR)
-    tmp_path.replace(auth_path)
-    return auth_path
-
-
-def _qwen_access_token_is_expiring(expiry_date_ms: Any, skew_seconds: int = QWEN_ACCESS_TOKEN_REFRESH_SKEW_SECONDS) -> bool:
-    try:
-        expiry_ms = int(expiry_date_ms)
-    except Exception:
-        return True
-    return (time.time() + max(0, int(skew_seconds))) * 1000 >= expiry_ms
-
-
-def _refresh_qwen_cli_tokens(tokens: Dict[str, Any], timeout_seconds: float = 20.0) -> Dict[str, Any]:
-    refresh_token = str(tokens.get("refresh_token", "") or "").strip()
-    if not refresh_token:
-        raise AuthError(
-            "Qwen OAuth refresh token missing. Re-run 'qwen auth qwen-oauth'.",
-            provider="qwen-oauth",
-            code="qwen_refresh_token_missing",
-        )
-
-    try:
-        response = httpx.post(
-            QWEN_OAUTH_TOKEN_URL,
-            headers={
-                "Content-Type": "application/x-www-form-urlencoded",
-                "Accept": "application/json",
-            },
-            data={
-                "grant_type": "refresh_token",
-                "refresh_token": refresh_token,
-                "client_id": QWEN_OAUTH_CLIENT_ID,
-            },
-            timeout=timeout_seconds,
-        )
-    except Exception as exc:
-        raise AuthError(
-            f"Qwen OAuth refresh failed: {exc}",
-            provider="qwen-oauth",
-            code="qwen_refresh_failed",
-        ) from exc
-
-    if response.status_code >= 400:
-        body = response.text.strip()
-        raise AuthError(
-            "Qwen OAuth refresh failed. Re-run 'qwen auth qwen-oauth'."
-            + (f" Response: {body}" if body else ""),
-            provider="qwen-oauth",
-            code="qwen_refresh_failed",
-        )
-
-    try:
-        payload = response.json()
-    except Exception as exc:
-        raise AuthError(
-            f"Qwen OAuth refresh returned invalid JSON: {exc}",
-            provider="qwen-oauth",
-            code="qwen_refresh_invalid_json",
-        ) from exc
-
-    if not isinstance(payload, dict) or not str(payload.get("access_token", "") or "").strip():
-        raise AuthError(
-            "Qwen OAuth refresh response missing access_token.",
-            provider="qwen-oauth",
-            code="qwen_refresh_invalid_response",
-        )
-
-    expires_in = payload.get("expires_in")
-    try:
-        expires_in_seconds = int(expires_in)
-    except Exception:
-        expires_in_seconds = 6 * 60 * 60
-
-    refreshed = {
-        "access_token": str(payload.get("access_token", "") or "").strip(),
-        "refresh_token": str(payload.get("refresh_token", refresh_token) or refresh_token).strip(),
-        "token_type": str(payload.get("token_type", tokens.get("token_type", "Bearer")) or "Bearer").strip() or "Bearer",
-        "resource_url": str(payload.get("resource_url", tokens.get("resource_url", "portal.qwen.ai")) or "portal.qwen.ai").strip(),
-        "expiry_date": int(time.time() * 1000) + max(1, expires_in_seconds) * 1000,
-    }
-    _save_qwen_cli_tokens(refreshed)
-    return refreshed
-
-
-def resolve_qwen_runtime_credentials(
-    *,
-    force_refresh: bool = False,
-    refresh_if_expiring: bool = True,
-    refresh_skew_seconds: int = QWEN_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
-) -> Dict[str, Any]:
-    tokens = _read_qwen_cli_tokens()
-    access_token = str(tokens.get("access_token", "") or "").strip()
-    should_refresh = bool(force_refresh)
-    if not should_refresh and refresh_if_expiring:
-        should_refresh = _qwen_access_token_is_expiring(tokens.get("expiry_date"), refresh_skew_seconds)
-    if should_refresh:
-        tokens = _refresh_qwen_cli_tokens(tokens)
-        access_token = str(tokens.get("access_token", "") or "").strip()
-    if not access_token:
-        raise AuthError(
-            "Qwen OAuth access token missing. Re-run 'qwen auth qwen-oauth'.",
-            provider="qwen-oauth",
-            code="qwen_access_token_missing",
-        )
-
-    base_url = os.getenv("HERMES_QWEN_BASE_URL", "").strip().rstrip("/") or DEFAULT_QWEN_BASE_URL
-    return {
-        "provider": "qwen-oauth",
-        "base_url": base_url,
-        "api_key": access_token,
-        "source": "qwen-cli",
-        "expires_at_ms": tokens.get("expiry_date"),
-        "auth_file": str(_qwen_cli_auth_path()),
-    }
-
-
-def get_qwen_auth_status() -> Dict[str, Any]:
-    auth_path = _qwen_cli_auth_path()
-    try:
-        creds = resolve_qwen_runtime_credentials(refresh_if_expiring=False)
-        return {
-            "logged_in": True,
-            "auth_file": str(auth_path),
-            "source": creds.get("source"),
-            "api_key": creds.get("api_key"),
-            "expires_at_ms": creds.get("expires_at_ms"),
-        }
-    except AuthError as exc:
-        return {
-            "logged_in": False,
-            "auth_file": str(auth_path),
-            "error": str(exc),
-        }
-
-
 # =============================================================================
 # SSH / remote session detection
 # =============================================================================
@@ -2253,8 +2072,6 @@ def get_auth_status(provider_id: Optional[str] = None) -> Dict[str, Any]:
        return get_nous_auth_status()
    if target == "openai-codex":
        return get_codex_auth_status()
-    if target == "qwen-oauth":
-        return get_qwen_auth_status()
    if target == "copilot-acp":
        return get_external_process_provider_status(target)
    # API-key providers
@@ -32,7 +32,7 @@ from hermes_constants import OPENROUTER_BASE_URL


 # Providers that support OAuth login in addition to API keys.
-_OAUTH_CAPABLE_PROVIDERS = {"anthropic", "nous", "openai-codex", "qwen-oauth"}
+_OAUTH_CAPABLE_PROVIDERS = {"anthropic", "nous", "openai-codex"}


 def _get_custom_provider_names() -> list:
@@ -147,7 +147,7 @@ def auth_add_command(args) -> None:
        if provider.startswith(CUSTOM_POOL_PREFIX):
            requested_type = AUTH_TYPE_API_KEY
        else:
-            requested_type = AUTH_TYPE_OAUTH if provider in {"anthropic", "nous", "openai-codex", "qwen-oauth"} else AUTH_TYPE_API_KEY
+            requested_type = AUTH_TYPE_OAUTH if provider in {"anthropic", "nous", "openai-codex"} else AUTH_TYPE_API_KEY

    pool = load_pool(provider)

@@ -250,26 +250,6 @@ def auth_add_command(args) -> None:
        print(f'Added {provider} OAuth credential #{len(pool.entries())}: "{entry.label}"')
        return

-    if provider == "qwen-oauth":
-        creds = auth_mod.resolve_qwen_runtime_credentials(refresh_if_expiring=False)
-        label = (getattr(args, "label", None) or "").strip() or label_from_token(
-            creds["api_key"],
-            _oauth_default_label(provider, len(pool.entries()) + 1),
-        )
-        entry = PooledCredential(
-            provider=provider,
-            id=uuid.uuid4().hex[:6],
-            label=label,
-            auth_type=AUTH_TYPE_OAUTH,
-            priority=0,
-            source=f"{SOURCE_MANUAL}:qwen_cli",
-            access_token=creds["api_key"],
-            base_url=creds.get("base_url"),
-        )
-        pool.add_entry(entry)
-        print(f'Added {provider} OAuth credential #{len(pool.entries())}: "{entry.label}"')
-        return
-
    raise SystemExit(f"`hermes auth add {provider}` is not implemented for auth type {requested_type} yet.")


@@ -295,16 +295,10 @@ def _format_context_length(tokens: int) -> str:
    """Format a token count for display (e.g. 128000 → '128K', 1048576 → '1M')."""
    if tokens >= 1_000_000:
        val = tokens / 1_000_000
-        rounded = round(val)
-        if abs(val - rounded) < 0.05:
-            return f"{rounded}M"
-        return f"{val:.1f}M"
+        return f"{round(val, 1):g}M"
    elif tokens >= 1_000:
        val = tokens / 1_000
-        rounded = round(val)
-        if abs(val - rounded) < 0.05:
-            return f"{rounded}K"
-        return f"{val:.1f}K"
+        return f"{round(val, 1):g}K"
    return str(tokens)


@@ -39,7 +39,6 @@ _EXTRA_ENV_KEYS = frozenset({
    "DINGTALK_CLIENT_ID", "DINGTALK_CLIENT_SECRET",
    "FEISHU_APP_ID", "FEISHU_APP_SECRET", "FEISHU_ENCRYPT_KEY", "FEISHU_VERIFICATION_TOKEN",
    "WECOM_BOT_ID", "WECOM_SECRET",
-    "BLUEBUBBLES_SERVER_URL", "BLUEBUBBLES_PASSWORD",
    "TERMINAL_ENV", "TERMINAL_SSH_KEY", "TERMINAL_SSH_PORT",
    "WHATSAPP_MODE", "WHATSAPP_ENABLED",
    "MATTERMOST_HOME_CHANNEL", "MATTERMOST_REPLY_MODE",
@@ -158,14 +157,7 @@ def get_project_root() -> Path:
    return Path(__file__).parent.parent.resolve()

 def _secure_dir(path):
-    """Set directory to owner-only access (0700). No-op on Windows.
-
-    Skipped in managed mode — the NixOS module sets group-readable
-    permissions (0750) so interactive users in the hermes group can
-    share state with the gateway service.
-    """
-    if is_managed():
-        return
+    """Set directory to owner-only access (0700). No-op on Windows."""
    try:
        os.chmod(path, 0o700)
    except (OSError, NotImplementedError):
@@ -173,13 +165,7 @@ def _secure_dir(path):


 def _secure_file(path):
-    """Set file to owner-only read/write (0600). No-op on Windows.
-
-    Skipped in managed mode — the NixOS activation script sets
-    group-readable permissions (0640) on config files.
-    """
-    if is_managed():
-        return
+    """Set file to owner-only read/write (0600). No-op on Windows."""
    try:
        if os.path.exists(str(path)):
            os.chmod(path, 0o600)
@@ -231,10 +217,6 @@ DEFAULT_CONFIG = {
        # (force on/off for all models), or a list of model-name substrings
        # to match (e.g. ["gpt", "codex", "gemini", "qwen"]).
        "tool_use_enforcement": "auto",
-        # Staged inactivity warning: send a warning to the user at this
-        # threshold before escalating to a full timeout.  The warning fires
-        # once per run and does not interrupt the agent.  0 = disable warning.
-        "gateway_timeout_warning": 900,
    },
    
    "terminal": {
@@ -397,7 +379,6 @@ DEFAULT_CONFIG = {
        "show_cost": False,       # Show $ cost in the status bar (off by default)
        "skin": "default",
        "tool_progress_command": False,  # Enable /verbose command in messaging gateway
-        "tool_progress_overrides": {},  # Per-platform overrides: {"signal": "off", "telegram": "all"}
        "tool_preview_length": 0,  # Max chars for tool call previews (0 = no limit, show full paths/commands)
    },

@@ -432,7 +413,7 @@ DEFAULT_CONFIG = {
    
    "stt": {
        "enabled": True,
-        "provider": "local",  # "local" (free, faster-whisper) | "groq" | "openai" (Whisper API) | "mistral" (Voxtral Transcribe)
+        "provider": "local",  # "local" (free, faster-whisper) | "groq" | "openai" (Whisper API)
        "local": {
            "model": "base",  # tiny, base, small, medium, large-v3
            "language": "",  # auto-detect by default; set to "en", "es", "fr", etc. to force
@@ -440,9 +421,6 @@ DEFAULT_CONFIG = {
        "openai": {
            "model": "whisper-1",  # whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe
        },
-        "mistral": {
-            "model": "voxtral-mini-latest",  # voxtral-mini-latest, voxtral-mini-2602
-        },
    },

    "voice": {
@@ -746,14 +724,6 @@ OPTIONAL_ENV_VARS = {
        "category": "provider",
        "advanced": True,
    },
-    "HERMES_QWEN_BASE_URL": {
-        "description": "Qwen Portal base URL override (default: https://portal.qwen.ai/v1)",
-        "prompt": "Qwen Portal base URL (leave empty for default)",
-        "url": None,
-        "password": False,
-        "category": "provider",
-        "advanced": True,
-    },
    "OPENCODE_ZEN_API_KEY": {
        "description": "OpenCode Zen API key (pay-as-you-go access to curated models)",
        "prompt": "OpenCode Zen API key",
@@ -1005,13 +975,6 @@ OPTIONAL_ENV_VARS = {
        "password": False,
        "category": "messaging",
    },
-    "DISCORD_REPLY_TO_MODE": {
-        "description": "Discord reply threading mode: 'off' (no reply references), 'first' (reply on first message only, default), 'all' (reply on every chunk)",
-        "prompt": "Discord reply mode (off/first/all)",
-        "url": None,
-        "password": False,
-        "category": "messaging",
-    },
    "SLACK_BOT_TOKEN": {
        "description": "Slack bot token (xoxb-). Get from OAuth & Permissions after installing your app. "
                       "Required scopes: chat:write, app_mentions:read, channels:history, groups:history, "
@@ -1125,27 +1088,6 @@ OPTIONAL_ENV_VARS = {
        "category": "messaging",
        "advanced": True,
    },
-    "BLUEBUBBLES_SERVER_URL": {
-        "description": "BlueBubbles server URL for iMessage integration (e.g. http://192.168.1.10:1234)",
-        "prompt": "BlueBubbles server URL",
-        "url": "https://bluebubbles.app/",
-        "password": False,
-        "category": "messaging",
-    },
-    "BLUEBUBBLES_PASSWORD": {
-        "description": "BlueBubbles server password (from BlueBubbles Server → Settings → API)",
-        "prompt": "BlueBubbles server password",
-        "url": None,
-        "password": True,
-        "category": "messaging",
-    },
-    "BLUEBUBBLES_ALLOWED_USERS": {
-        "description": "Comma-separated iMessage addresses (email or phone) allowed to use the bot",
-        "prompt": "Allowed iMessage addresses (comma-separated)",
-        "url": None,
-        "password": False,
-        "category": "messaging",
-    },
    "GATEWAY_ALLOW_ALL_USERS": {
        "description": "Allow all users to interact with messaging bots (true/false). Default: false.",
        "prompt": "Allow all users (true/false)",
@@ -93,21 +93,6 @@ def cron_list(show_all: bool = False):
        script = job.get("script")
        if script:
            print(f"    Script:    {script}")
-
-        # Execution history
-        last_status = job.get("last_status")
-        if last_status:
-            last_run = job.get("last_run_at", "?")
-            if last_status == "ok":
-                status_display = color("ok", Colors.GREEN)
-            else:
-                status_display = color(f"{last_status}: {job.get('last_error', '?')}", Colors.RED)
-            print(f"    Last run:  {last_run}  {status_display}")
-
-        delivery_err = job.get("last_delivery_error")
-        if delivery_err:
-            print(f"    {color('⚠ Delivery failed:', Colors.YELLOW)} {delivery_err}")
-
        print()

    from hermes_cli.gateway import find_gateway_pids
@@ -812,83 +812,69 @@ def run_doctor(args):
        check_warn("No GITHUB_TOKEN", f"(60 req/hr rate limit — set in {_DHH}/.env for better rates)")

    # =========================================================================
-    # Memory Provider (only check the active provider, if any)
+    # Honcho memory
    # =========================================================================
    print()
-    print(color("◆ Memory Provider", Colors.CYAN, Colors.BOLD))
+    print(color("◆ Honcho Memory", Colors.CYAN, Colors.BOLD))

-    _active_memory_provider = ""
    try:
-        import yaml as _yaml
-        _mem_cfg_path = HERMES_HOME / "config.yaml"
-        if _mem_cfg_path.exists():
-            with open(_mem_cfg_path) as _f:
-                _raw_cfg = _yaml.safe_load(_f) or {}
-            _active_memory_provider = (_raw_cfg.get("memory") or {}).get("provider", "")
-    except Exception:
-        pass
+        from plugins.memory.honcho.client import HonchoClientConfig, resolve_config_path
+        hcfg = HonchoClientConfig.from_global_config()
+        _honcho_cfg_path = resolve_config_path()

-    if not _active_memory_provider:
-        check_ok("Built-in memory active", "(no external provider configured — this is fine)")
-    elif _active_memory_provider == "honcho":
-        try:
-            from plugins.memory.honcho.client import HonchoClientConfig, resolve_config_path
-            hcfg = HonchoClientConfig.from_global_config()
-            _honcho_cfg_path = resolve_config_path()
+        if not _honcho_cfg_path.exists():
+            check_warn("Honcho config not found", "run: hermes memory setup")
+        elif not hcfg.enabled:
+            check_info(f"Honcho disabled (set enabled: true in {_honcho_cfg_path} to activate)")
+        elif not (hcfg.api_key or hcfg.base_url):
+            check_fail("Honcho API key or base URL not set", "run: hermes memory setup")
+            issues.append("No Honcho API key — run 'hermes memory setup'")
+        else:
+            from plugins.memory.honcho.client import get_honcho_client, reset_honcho_client
+            reset_honcho_client()
+            try:
+                get_honcho_client(hcfg)
+                check_ok(
+                    "Honcho connected",
+                    f"workspace={hcfg.workspace_id} mode={hcfg.recall_mode} freq={hcfg.write_frequency}",
+                )
+            except Exception as _e:
+                check_fail("Honcho connection failed", str(_e))
+                issues.append(f"Honcho unreachable: {_e}")
+    except ImportError:
+        check_warn("honcho-ai not installed", "pip install honcho-ai")
+    except Exception as _e:
+        check_warn("Honcho check failed", str(_e))

-            if not _honcho_cfg_path.exists():
-                check_warn("Honcho config not found", "run: hermes memory setup")
-            elif not hcfg.enabled:
-                check_info(f"Honcho disabled (set enabled: true in {_honcho_cfg_path} to activate)")
-            elif not (hcfg.api_key or hcfg.base_url):
-                check_fail("Honcho API key or base URL not set", "run: hermes memory setup")
-                issues.append("No Honcho API key — run 'hermes memory setup'")
-            else:
-                from plugins.memory.honcho.client import get_honcho_client, reset_honcho_client
-                reset_honcho_client()
+    # =========================================================================
+    # Mem0 memory
+    # =========================================================================
+    print()
+    print(color("◆ Mem0 Memory", Colors.CYAN, Colors.BOLD))
+
+    try:
+        from plugins.memory.mem0 import _load_config as _load_mem0_config
+        mem0_cfg = _load_mem0_config()
+        mem0_key = mem0_cfg.get("api_key", "")
+        if mem0_key:
+            check_ok("Mem0 API key configured")
+            check_info(f"user_id={mem0_cfg.get('user_id', '?')}  agent_id={mem0_cfg.get('agent_id', '?')}")
+            # Check if mem0.json exists but is missing api_key (the bug we fixed)
+            mem0_json = HERMES_HOME / "mem0.json"
+            if mem0_json.exists():
                try:
-                    get_honcho_client(hcfg)
-                    check_ok(
-                        "Honcho connected",
-                        f"workspace={hcfg.workspace_id} mode={hcfg.recall_mode} freq={hcfg.write_frequency}",
-                    )
-                except Exception as _e:
-                    check_fail("Honcho connection failed", str(_e))
-                    issues.append(f"Honcho unreachable: {_e}")
-        except ImportError:
-            check_fail("honcho-ai not installed", "pip install honcho-ai")
-            issues.append("Honcho is set as memory provider but honcho-ai is not installed")
-        except Exception as _e:
-            check_warn("Honcho check failed", str(_e))
-    elif _active_memory_provider == "mem0":
-        try:
-            from plugins.memory.mem0 import _load_config as _load_mem0_config
-            mem0_cfg = _load_mem0_config()
-            mem0_key = mem0_cfg.get("api_key", "")
-            if mem0_key:
-                check_ok("Mem0 API key configured")
-                check_info(f"user_id={mem0_cfg.get('user_id', '?')}  agent_id={mem0_cfg.get('agent_id', '?')}")
-            else:
-                check_fail("Mem0 API key not set", "(set MEM0_API_KEY in .env or run hermes memory setup)")
-                issues.append("Mem0 is set as memory provider but API key is missing")
-        except ImportError:
-            check_fail("Mem0 plugin not loadable", "pip install mem0ai")
-            issues.append("Mem0 is set as memory provider but mem0ai is not installed")
-        except Exception as _e:
-            check_warn("Mem0 check failed", str(_e))
-    else:
-        # Generic check for other memory providers (openviking, hindsight, etc.)
-        try:
-            from plugins.memory import load_memory_provider
-            _provider = load_memory_provider(_active_memory_provider)
-            if _provider and _provider.is_available():
-                check_ok(f"{_active_memory_provider} provider active")
-            elif _provider:
-                check_warn(f"{_active_memory_provider} configured but not available", "run: hermes memory status")
-            else:
-                check_warn(f"{_active_memory_provider} plugin not found", "run: hermes memory setup")
-        except Exception as _e:
-            check_warn(f"{_active_memory_provider} check failed", str(_e))
+                    import json as _json
+                    file_cfg = _json.loads(mem0_json.read_text())
+                    if not file_cfg.get("api_key") and mem0_key:
+                        check_info("api_key from .env (not in mem0.json) — this is fine")
+                except Exception:
+                    pass
+        else:
+            check_warn("Mem0 not configured", "(set MEM0_API_KEY in .env or run hermes memory setup)")
+    except ImportError:
+        check_warn("Mem0 plugin not loadable", "(optional)")
+    except Exception as _e:
+        check_warn("Mem0 check failed", str(_e))

    # =========================================================================
    # Profiles
@@ -1588,34 +1588,6 @@ _PLATFORMS = [
             "help": "Chat ID for scheduled results and notifications."},
        ],
    },
-    {
-        "key": "bluebubbles",
-        "label": "BlueBubbles (iMessage)",
-        "emoji": "💬",
-        "token_var": "BLUEBUBBLES_SERVER_URL",
-        "setup_instructions": [
-            "1. Install BlueBubbles on a Mac that will act as your iMessage server:",
-            "   https://bluebubbles.app/",
-            "2. Complete the BlueBubbles setup wizard — sign in with your Apple ID",
-            "3. In BlueBubbles Settings → API, note the Server URL and password",
-            "4. The server URL is typically http://<your-mac-ip>:1234",
-            "5. Hermes connects via the BlueBubbles REST API and receives",
-            "   incoming messages via a local webhook",
-            "6. To authorize users, use DM pairing: hermes pairing generate bluebubbles",
-            "   Share the code — the user sends it via iMessage to get approved",
-        ],
-        "vars": [
-            {"name": "BLUEBUBBLES_SERVER_URL", "prompt": "BlueBubbles server URL (e.g. http://192.168.1.10:1234)", "password": False,
-             "help": "The URL shown in BlueBubbles Settings → API."},
-            {"name": "BLUEBUBBLES_PASSWORD", "prompt": "BlueBubbles server password", "password": True,
-             "help": "The password shown in BlueBubbles Settings → API."},
-            {"name": "BLUEBUBBLES_ALLOWED_USERS", "prompt": "Pre-authorized phone numbers or iMessage IDs (comma-separated, or leave empty for DM pairing)", "password": False,
-             "is_allowlist": True,
-             "help": "Optional — pre-authorize specific users. Leave empty to use DM pairing instead (recommended)."},
-            {"name": "BLUEBUBBLES_HOME_CHANNEL", "prompt": "Home channel (phone number or iMessage ID for cron/notifications, or empty)", "password": False,
-             "help": "Phone number or Apple ID to deliver cron results and notifications to."},
-        ],
-    },
 ]


@@ -918,7 +918,6 @@ def select_provider_and_model(args=None):
        "openrouter": "OpenRouter",
        "nous": "Nous Portal",
        "openai-codex": "OpenAI Codex",
-        "qwen-oauth": "Qwen OAuth",
        "copilot-acp": "GitHub Copilot ACP",
        "copilot": "GitHub Copilot",
        "anthropic": "Anthropic",
@@ -948,7 +947,6 @@ def select_provider_and_model(args=None):
        ("openrouter", "OpenRouter (100+ models, pay-per-use)"),
        ("anthropic", "Anthropic (Claude models — API key or Claude Code)"),
        ("openai-codex", "OpenAI Codex"),
-        ("qwen-oauth", "Qwen OAuth (reuses local Qwen CLI login)"),
        ("copilot", "GitHub Copilot (uses GITHUB_TOKEN or gh auth token)"),
        ("huggingface", "Hugging Face Inference Providers (20+ open models)"),
    ]
@@ -1045,8 +1043,6 @@ def select_provider_and_model(args=None):
        _model_flow_nous(config, current_model, args=args)
    elif selected_provider == "openai-codex":
        _model_flow_openai_codex(config, current_model)
-    elif selected_provider == "qwen-oauth":
-        _model_flow_qwen_oauth(config, current_model)
    elif selected_provider == "copilot-acp":
        _model_flow_copilot_acp(config, current_model)
    elif selected_provider == "copilot":
@@ -1363,56 +1359,6 @@ def _model_flow_openai_codex(config, current_model=""):



-_DEFAULT_QWEN_PORTAL_MODELS = [
-    "qwen3-coder-plus",
-    "qwen3-coder",
-]
-
-
-def _model_flow_qwen_oauth(_config, current_model=""):
-    """Qwen OAuth provider: reuse local Qwen CLI login, then pick model."""
-    from hermes_cli.auth import (
-        get_qwen_auth_status,
-        resolve_qwen_runtime_credentials,
-        _prompt_model_selection,
-        _save_model_choice,
-        _update_config_for_provider,
-        DEFAULT_QWEN_BASE_URL,
-    )
-    from hermes_cli.models import fetch_api_models
-
-    status = get_qwen_auth_status()
-    if not status.get("logged_in"):
-        print("Not logged into Qwen CLI OAuth.")
-        print("Run: qwen auth qwen-oauth")
-        auth_file = status.get("auth_file")
-        if auth_file:
-            print(f"Expected credentials file: {auth_file}")
-        if status.get("error"):
-            print(f"Error: {status.get('error')}")
-        return
-
-    # Try live model discovery, fall back to curated list.
-    models = None
-    try:
-        creds = resolve_qwen_runtime_credentials(refresh_if_expiring=True)
-        models = fetch_api_models(creds["api_key"], creds["base_url"])
-    except Exception:
-        pass
-    if not models:
-        models = list(_DEFAULT_QWEN_PORTAL_MODELS)
-
-    default = current_model or (models[0] if models else "qwen3-coder-plus")
-    selected = _prompt_model_selection(models, current_model=default)
-    if selected:
-        _save_model_choice(selected)
-        _update_config_for_provider("qwen-oauth", DEFAULT_QWEN_BASE_URL)
-        print(f"Default model set to: {selected} (via Qwen OAuth)")
-    else:
-        print("No change.")
-
-
-
 def _model_flow_custom(config):
    """Custom endpoint: collect URL, API key, and model name.

@@ -1474,11 +1420,7 @@ def _model_flow_custom(config):
            f"Hermes will still save it."
        )
        if probe.get("suggested_base_url"):
-            suggested = probe["suggested_base_url"]
-            if suggested.endswith("/v1"):
-                print(f"  If this server expects /v1 in the path, try base URL: {suggested}")
-            else:
-                print(f"  If /v1 should not be in the base URL, try: {suggested}")
+            print(f"  If this server expects /v1, try base URL: {probe['suggested_base_url']}")

    # Select model — use probe results when available, fall back to manual input
    model_name = ""
@@ -84,7 +84,6 @@ _PASSTHROUGH_PROVIDERS: frozenset[str] = frozenset({
    "minimax",
    "minimax-cn",
    "alibaba",
-    "qwen-oauth",
    "huggingface",
    "openai-codex",
    "custom",
@@ -537,11 +537,8 @@ def switch_model(
                    )
            else:
                # --- Step c: On aggregator, convert vendor:model to vendor/model ---
-                # Only convert when there's no slash — a slash means the name
-                # is already in vendor/model format and the colon is a variant
-                # tag (:free, :extended, :fast) that must be preserved.
                colon_pos = raw_input.find(":")
-                if colon_pos > 0 and "/" not in raw_input and is_aggregator(current_provider):
+                if colon_pos > 0 and is_aggregator(current_provider):
                    left = raw_input[:colon_pos].strip().lower()
                    right = raw_input[colon_pos + 1:].strip()
                    if left and right:
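Review note: the removed comment explains why the `"/" not in raw_input` guard existed. A name that already contains a slash is in vendor/model form, and its colon is a variant tag (`:free`, `:extended`, `:fast`). The sketch below illustrates what dropping the guard could do to such input. It is a minimal stand-in, not the real `switch_model`: the helper name `convert_vendor_colon`, the `guard_slash` flag, and the final `f"{left}/{right}"` join are assumptions, since this hunk does not show the body of `if left and right:`. The aggregator check is assumed to have already passed.

```python
def convert_vendor_colon(raw_input: str, guard_slash: bool) -> str:
    """Hypothetical stand-in for step c (vendor:model -> vendor/model)."""
    colon_pos = raw_input.find(":")
    # guard_slash=True mirrors the removed condition; False mirrors the new one.
    if colon_pos > 0 and (not guard_slash or "/" not in raw_input):
        left = raw_input[:colon_pos].strip().lower()
        right = raw_input[colon_pos + 1:].strip()
        if left and right:
            # Assumed result of the conversion; the hunk elides this line.
            return f"{left}/{right}"
    return raw_input


if __name__ == "__main__":
    for name in ("openai:gpt-5", "vendor/model:free"):
        print(name, "->", convert_vendor_colon(name, guard_slash=True),
              "|", convert_vendor_colon(name, guard_slash=False))
```

If the elided body behaves as assumed, a variant-tagged name would no longer pass through unchanged, so it may be worth confirming this case against the real code before merging.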
@@ -794,12 +791,12 @@ def list_authenticated_providers(
        if overlay.auth_type in ("oauth_device_code", "oauth_external", "external_process"):
            # These use auth stores, not env vars — check for auth.json entries
            try:
-                from hermes_cli.auth import _load_auth_store
-                store = _load_auth_store()
-                if store and (pid in store.get("providers", {}) or pid in store.get("credential_pool", {})):
+                from hermes_cli.auth import _read_auth_store
+                store = _read_auth_store()
+                if store and pid in store:
                    has_creds = True
-            except Exception as exc:
-                logger.debug("Auth store check failed for %s: %s", pid, exc)
+            except Exception:
+                pass
        if not has_creds:
            continue

@@ -144,22 +144,18 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "kimi-k2-0905-preview",
    ],
    "minimax": [
-        "MiniMax-M1",
-        "MiniMax-M1-40k",
-        "MiniMax-M1-80k",
-        "MiniMax-M1-128k",
-        "MiniMax-M1-256k",
-        "MiniMax-M2.5",
        "MiniMax-M2.7",
+        "MiniMax-M2.7-highspeed",
+        "MiniMax-M2.5",
+        "MiniMax-M2.5-highspeed",
+        "MiniMax-M2.1",
    ],
    "minimax-cn": [
-        "MiniMax-M1",
-        "MiniMax-M1-40k",
-        "MiniMax-M1-80k",
-        "MiniMax-M1-128k",
-        "MiniMax-M1-256k",
-        "MiniMax-M2.5",
        "MiniMax-M2.7",
+        "MiniMax-M2.7-highspeed",
+        "MiniMax-M2.5",
+        "MiniMax-M2.5-highspeed",
+        "MiniMax-M2.1",
    ],
    "anthropic": [
        "claude-opus-4-6",
@@ -483,7 +479,6 @@ _PROVIDER_LABELS = {
    "ai-gateway": "AI Gateway",
    "kilocode": "Kilo Code",
    "alibaba": "Alibaba Cloud (DashScope)",
-    "qwen-oauth": "Qwen OAuth (Portal)",
    "huggingface": "Hugging Face",
    "custom": "Custom endpoint",
 }
@@ -523,7 +518,6 @@ _PROVIDER_ALIASES = {
    "aliyun": "alibaba",
    "qwen": "alibaba",
    "alibaba-cloud": "alibaba",
-    "qwen-portal": "qwen-oauth",
    "hf": "huggingface",
    "hugging-face": "huggingface",
    "huggingface-hub": "huggingface",
@@ -769,7 +763,6 @@ def list_available_providers() -> list[dict[str, str]]:
        "openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
        "gemini", "huggingface",
        "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode", "anthropic", "alibaba",
-        "qwen-oauth",
        "opencode-zen", "opencode-go",
        "ai-gateway", "deepseek", "custom",
    ]
@@ -1532,7 +1525,7 @@ def probe_api_models(

    return {
        "models": None,
-        "probed_url": tried[0] if tried else normalized.rstrip("/") + "/models",
+        "probed_url": tried[-1] if tried else normalized.rstrip("/") + "/models",
        "resolved_base_url": normalized,
        "suggested_base_url": alternate_base if alternate_base != normalized else None,
        "used_fallback": False,
@@ -61,8 +61,6 @@ VALID_HOOKS: Set[str] = {
    "post_api_request",
    "on_session_start",
    "on_session_end",
-    "on_session_finalize",
-    "on_session_reset",
 }

 ENTRY_POINTS_GROUP = "hermes_agent.plugins"
@@ -58,12 +58,6 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
        auth_type="oauth_external",
        base_url_override="https://chatgpt.com/backend-api/codex",
    ),
-    "qwen-oauth": HermesOverlay(
-        transport="openai_chat",
-        auth_type="oauth_external",
-        base_url_override="https://portal.qwen.ai/v1",
-        base_url_env_var="HERMES_QWEN_BASE_URL",
-    ),
    "copilot-acp": HermesOverlay(
        transport="codex_responses",
        auth_type="external_process",
@@ -14,13 +14,11 @@ from agent.credential_pool import CredentialPool, PooledCredential, get_custom_p
 from hermes_cli.auth import (
    AuthError,
    DEFAULT_CODEX_BASE_URL,
-    DEFAULT_QWEN_BASE_URL,
    PROVIDER_REGISTRY,
    format_auth_error,
    resolve_provider,
    resolve_nous_runtime_credentials,
    resolve_codex_runtime_credentials,
-    resolve_qwen_runtime_credentials,
    resolve_api_key_provider_credentials,
    resolve_external_process_provider_credentials,
    has_usable_secret,
@@ -150,9 +148,6 @@ def _resolve_runtime_from_pool_entry(
    if provider == "openai-codex":
        api_mode = "codex_responses"
        base_url = base_url or DEFAULT_CODEX_BASE_URL
-    elif provider == "qwen-oauth":
-        api_mode = "chat_completions"
-        base_url = base_url or DEFAULT_QWEN_BASE_URL
    elif provider == "anthropic":
        api_mode = "anthropic_messages"
        cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
@@ -168,16 +163,6 @@ def _resolve_runtime_from_pool_entry(
        api_mode = _copilot_runtime_api_mode(model_cfg, getattr(entry, "runtime_api_key", ""))
    else:
        configured_provider = str(model_cfg.get("provider") or "").strip().lower()
-        # Honour model.base_url from config.yaml when the configured provider
-        # matches this provider — same pattern as the Anthropic branch above.
-        # Only override when the pool entry has no explicit base_url (i.e. it
-        # fell back to the hardcoded default).  Env var overrides win (#6039).
-        pconfig = PROVIDER_REGISTRY.get(provider)
-        pool_url_is_default = pconfig and base_url.rstrip("/") == pconfig.inference_base_url.rstrip("/")
-        if configured_provider == provider and pool_url_is_default:
-            cfg_base_url = str(model_cfg.get("base_url") or "").strip().rstrip("/")
-            if cfg_base_url:
-                base_url = cfg_base_url
        configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
        if configured_mode and _provider_supports_explicit_api_mode(provider, configured_provider):
            api_mode = configured_mode
@@ -696,24 +681,6 @@ def resolve_runtime_provider(
            logger.info("Auto-detected Codex provider but credentials failed; "
                        "falling through to next provider.")

-    if provider == "qwen-oauth":
-        try:
-            creds = resolve_qwen_runtime_credentials()
-            return {
-                "provider": "qwen-oauth",
-                "api_mode": "chat_completions",
-                "base_url": creds.get("base_url", "").rstrip("/"),
-                "api_key": creds.get("api_key", ""),
-                "source": creds.get("source", "qwen-cli"),
-                "expires_at_ms": creds.get("expires_at_ms"),
-                "requested_provider": requested_provider,
-            }
-        except AuthError:
-            if requested_provider != "auto":
-                raise
-            logger.info("Qwen OAuth credentials failed; "
-                        "falling through to next provider.")
-
    if provider == "copilot-acp":
        creds = resolve_external_process_provider_credentials(provider)
        return {
@@ -757,15 +724,7 @@ def resolve_runtime_provider(
    pconfig = PROVIDER_REGISTRY.get(provider)
    if pconfig and pconfig.auth_type == "api_key":
        creds = resolve_api_key_provider_credentials(provider)
-        # Honour model.base_url from config.yaml when the configured provider
-        # matches this provider — mirrors the Anthropic path above.  Without
-        # this, users who set model.base_url to e.g. api.minimaxi.com/anthropic
-        # (China endpoint) still get the hardcoded api.minimax.io default (#6039).
-        cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
-        cfg_base_url = ""
-        if cfg_provider == provider:
-            cfg_base_url = (model_cfg.get("base_url") or "").strip().rstrip("/")
-        base_url = cfg_base_url or creds.get("base_url", "").rstrip("/")
+        base_url = creds.get("base_url", "").rstrip("/")
        api_mode = "chat_completions"
        if provider == "copilot":
            api_mode = _copilot_runtime_api_mode(model_cfg, creds.get("api_key", ""))
@@ -105,8 +105,8 @@ _DEFAULT_PROVIDER_MODELS = {
    ],
    "zai": ["glm-5", "glm-4.7", "glm-4.5", "glm-4.5-flash"],
    "kimi-coding": ["kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"],
-    "minimax": ["MiniMax-M1", "MiniMax-M1-40k", "MiniMax-M1-80k", "MiniMax-M1-128k", "MiniMax-M1-256k", "MiniMax-M2.5", "MiniMax-M2.7"],
-    "minimax-cn": ["MiniMax-M1", "MiniMax-M1-40k", "MiniMax-M1-80k", "MiniMax-M1-128k", "MiniMax-M1-256k", "MiniMax-M2.5", "MiniMax-M2.7"],
+    "minimax": ["MiniMax-M2.7", "MiniMax-M2.7-highspeed", "MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"],
+    "minimax-cn": ["MiniMax-M2.7", "MiniMax-M2.7-highspeed", "MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"],
    "ai-gateway": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5", "google/gemini-3-flash"],
    "kilocode": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5.4", "google/gemini-3-pro-preview", "google/gemini-3-flash-preview"],
    "opencode-zen": ["gpt-5.4", "gpt-5.3-codex", "claude-sonnet-4-6", "gemini-3-flash", "glm-5", "kimi-k2.5", "minimax-m2.7"],
@@ -2167,71 +2167,6 @@ def _setup_whatsapp():
        print_info("or personal self-chat) and pair via QR code.")


-def _setup_bluebubbles():
-    """Configure BlueBubbles iMessage gateway."""
-    print_header("BlueBubbles (iMessage)")
-    existing = get_env_value("BLUEBUBBLES_SERVER_URL")
-    if existing:
-        print_info("BlueBubbles: already configured")
-        if not prompt_yes_no("Reconfigure BlueBubbles?", False):
-            return
-
-    print_info("Connects Hermes to iMessage via BlueBubbles — a free, open-source")
-    print_info("macOS server that bridges iMessage to any device.")
-    print_info("   Requires a Mac running BlueBubbles Server v1.0.0+")
-    print_info("   Download: https://bluebubbles.app/")
-    print()
-    print_info("In BlueBubbles Server → Settings → API, note your Server URL and Password.")
-    print()
-
-    server_url = prompt("BlueBubbles server URL (e.g. http://192.168.1.10:1234)")
-    if not server_url:
-        print_warning("Server URL is required — skipping BlueBubbles setup")
-        return
-    save_env_value("BLUEBUBBLES_SERVER_URL", server_url.rstrip("/"))
-
-    password = prompt("BlueBubbles server password", password=True)
-    if not password:
-        print_warning("Password is required — skipping BlueBubbles setup")
-        return
-    save_env_value("BLUEBUBBLES_PASSWORD", password)
-    print_success("BlueBubbles credentials saved")
-
-    print()
-    print_info("🔒 Security: Restrict who can message your bot")
-    print_info("   Use iMessage addresses: email (user@icloud.com) or phone (+15551234567)")
-    print()
-    allowed_users = prompt("Allowed iMessage addresses (comma-separated, leave empty for open access)")
-    if allowed_users:
-        save_env_value("BLUEBUBBLES_ALLOWED_USERS", allowed_users.replace(" ", ""))
-        print_success("BlueBubbles allowlist configured")
-    else:
-        print_info("⚠️  No allowlist set — anyone who can iMessage you can use the bot!")
-
-    print()
-    print_info("📬 Home Channel: phone or email for cron job delivery and notifications.")
-    print_info("   You can also set this later with /set-home in your iMessage chat.")
-    home_channel = prompt("Home channel address (leave empty to set later)")
-    if home_channel:
-        save_env_value("BLUEBUBBLES_HOME_CHANNEL", home_channel)
-
-    print()
-    print_info("Advanced settings (defaults are fine for most setups):")
-    if prompt_yes_no("Configure webhook listener settings?", False):
-        webhook_port = prompt("Webhook listener port (default: 8645)")
-        if webhook_port:
-            try:
-                save_env_value("BLUEBUBBLES_WEBHOOK_PORT", str(int(webhook_port)))
-                print_success(f"Webhook port set to {webhook_port}")
-            except ValueError:
-                print_warning("Invalid port number, using default 8645")
-
-    print()
-    print_info("Requires the BlueBubbles Private API helper for typing indicators,")
-    print_info("read receipts, and tapback reactions. Basic messaging works without it.")
-    print_info("   Install: https://docs.bluebubbles.app/helper-bundle/installation")
-
-
 def _setup_webhooks():
    """Configure webhook integration."""
    print_header("Webhooks")
@@ -2286,7 +2221,6 @@ _GATEWAY_PLATFORMS = [
    ("Matrix", "MATRIX_ACCESS_TOKEN", _setup_matrix),
    ("Mattermost", "MATTERMOST_TOKEN", _setup_mattermost),
    ("WhatsApp", "WHATSAPP_ENABLED", _setup_whatsapp),
-    ("BlueBubbles (iMessage)", "BLUEBUBBLES_SERVER_URL", _setup_bluebubbles),
    ("Webhooks (GitHub, GitLab, etc.)", "WEBHOOK_ENABLED", _setup_webhooks),
 ]

@@ -2330,7 +2264,6 @@ def setup_gateway(config: dict):
        or get_env_value("MATRIX_ACCESS_TOKEN")
        or get_env_value("MATRIX_PASSWORD")
        or get_env_value("WHATSAPP_ENABLED")
-        or get_env_value("BLUEBUBBLES_SERVER_URL")
        or get_env_value("WEBHOOK_ENABLED")
    )
    if any_messaging:
@@ -2350,8 +2283,6 @@ def setup_gateway(config: dict):
            missing_home.append("Discord")
        if get_env_value("SLACK_BOT_TOKEN") and not get_env_value("SLACK_HOME_CHANNEL"):
            missing_home.append("Slack")
-        if get_env_value("BLUEBUBBLES_SERVER_URL") and not get_env_value("BLUEBUBBLES_HOME_CHANNEL"):
-            missing_home.append("BlueBubbles")

        if missing_home:
            print()
@@ -2522,8 +2453,6 @@ def _get_section_config_summary(config: dict, section_key: str) -> Optional[str]
            platforms.append("WhatsApp")
        if get_env_value("SIGNAL_ACCOUNT"):
            platforms.append("Signal")
-        if get_env_value("BLUEBUBBLES_SERVER_URL"):
-            platforms.append("BlueBubbles")
        if platforms:
            return ", ".join(platforms)
        return None  # No platforms configured — section must run
@@ -23,7 +23,6 @@ PLATFORMS = {
    "slack":    "💼 Slack",
    "whatsapp": "📱 WhatsApp",
    "signal":   "📡 Signal",
-    "bluebubbles": "💬 BlueBubbles",
    "email":    "📧 Email",
    "homeassistant": "🏠 Home Assistant",
    "mattermost": "💬 Mattermost",
@@ -153,14 +153,12 @@ def show_status(args):
    print(color("◆ Auth Providers", Colors.CYAN, Colors.BOLD))

    try:
-        from hermes_cli.auth import get_nous_auth_status, get_codex_auth_status, get_qwen_auth_status
+        from hermes_cli.auth import get_nous_auth_status, get_codex_auth_status
        nous_status = get_nous_auth_status()
        codex_status = get_codex_auth_status()
-        qwen_status = get_qwen_auth_status()
    except Exception:
        nous_status = {}
        codex_status = {}
-        qwen_status = {}

    nous_logged_in = bool(nous_status.get("logged_in"))
    print(
@@ -191,21 +189,6 @@ def show_status(args):
    if codex_status.get("error") and not codex_logged_in:
        print(f"    Error:      {codex_status.get('error')}")

-    qwen_logged_in = bool(qwen_status.get("logged_in"))
-    print(
-        f"  {'Qwen OAuth':<12}  {check_mark(qwen_logged_in)} "
-        f"{'logged in' if qwen_logged_in else 'not logged in (run: qwen auth qwen-oauth)'}"
-    )
-    qwen_auth_file = qwen_status.get("auth_file")
-    if qwen_auth_file:
-        print(f"    Auth file:  {qwen_auth_file}")
-    qwen_exp = qwen_status.get("expires_at_ms")
-    if qwen_exp:
-        from datetime import datetime, timezone
-        print(f"    Access exp: {datetime.fromtimestamp(int(qwen_exp) / 1000, tz=timezone.utc).isoformat()}")
-    if qwen_status.get("error") and not qwen_logged_in:
-        print(f"    Error:      {qwen_status.get('error')}")
-
    # =========================================================================
    # Nous Subscription Features
    # =========================================================================
@@ -302,7 +285,6 @@ def show_status(args):
        "DingTalk": ("DINGTALK_CLIENT_ID", None),
        "Feishu": ("FEISHU_APP_ID", "FEISHU_HOME_CHANNEL"),
        "WeCom": ("WECOM_BOT_ID", "WECOM_HOME_CHANNEL"),
-        "BlueBubbles": ("BLUEBUBBLES_SERVER_URL", "BLUEBUBBLES_HOME_CHANNEL"),
    }
    
    for name, (token_var, home_var) in platforms.items():
@@ -126,7 +126,6 @@ PLATFORMS = {
    "slack":    {"label": "💼 Slack",      "default_toolset": "hermes-slack"},
    "whatsapp": {"label": "📱 WhatsApp",   "default_toolset": "hermes-whatsapp"},
    "signal":   {"label": "📡 Signal",     "default_toolset": "hermes-signal"},
-    "bluebubbles": {"label": "💙 BlueBubbles", "default_toolset": "hermes-bluebubbles"},
    "homeassistant": {"label": "🏠 Home Assistant", "default_toolset": "hermes-homeassistant"},
    "email":    {"label": "📧 Email",      "default_toolset": "hermes-email"},
    "matrix":   {"label": "💬 Matrix",     "default_toolset": "hermes-matrix"},
@@ -1235,10 +1235,10 @@ class SessionDB:
        self._execute_write(_do)

    def delete_session(self, session_id: str) -> bool:
-        """Delete a session and all its messages.
+        """Delete a session, its child sessions, and all their messages.

-        Child sessions are orphaned (parent_session_id set to NULL) rather
-        than cascade-deleted, so they remain accessible independently.
+        Child sessions (subagent runs, compression continuations) are deleted
+        first to satisfy the ``parent_session_id`` foreign key constraint.
        Returns True if the session was found and deleted.
        """
        def _do(conn):
@@ -1247,12 +1247,15 @@ class SessionDB:
            )
            if cursor.fetchone()[0] == 0:
                return False
-            # Orphan child sessions so FK constraint is satisfied
-            conn.execute(
-                "UPDATE sessions SET parent_session_id = NULL "
-                "WHERE parent_session_id = ?",
+            # Delete child sessions first (FK constraint)
+            child_ids = [r[0] for r in conn.execute(
+                "SELECT id FROM sessions WHERE parent_session_id = ?",
                (session_id,),
-            )
+            ).fetchall()]
+            for cid in child_ids:
+                conn.execute("DELETE FROM messages WHERE session_id = ?", (cid,))
+                conn.execute("DELETE FROM sessions WHERE id = ?", (cid,))
+            # Delete the session itself
            conn.execute("DELETE FROM messages WHERE session_id = ?", (session_id,))
            conn.execute("DELETE FROM sessions WHERE id = ?", (session_id,))
            return True
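Review note: the new `delete_session` removes direct children only. If sessions can nest more than one level deep (the hunk does not say whether they can), deleting a child that still has its own children would hit the same `parent_session_id` constraint. A minimal sketch of one alternative follows, assuming a simplified schema with only the `sessions.id`, `sessions.parent_session_id`, and `messages.session_id` columns seen in this diff, and assuming the parent links form a tree with no cycles. The function name `delete_session_tree` is hypothetical.

```python
import sqlite3


def delete_session_tree(conn: sqlite3.Connection, session_id: str) -> bool:
    """Delete a session and all of its descendants, deepest first."""
    rows = conn.execute(
        """
        WITH RECURSIVE tree(id, depth) AS (
            SELECT id, 0 FROM sessions WHERE id = ?
            UNION ALL
            SELECT s.id, t.depth + 1
            FROM sessions s JOIN tree t ON s.parent_session_id = t.id
        )
        SELECT id FROM tree ORDER BY depth DESC
        """,
        (session_id,),
    ).fetchall()
    if not rows:
        return False  # session not found
    for (sid,) in rows:
        # Messages go first, then the session row, so neither FK is violated.
        conn.execute("DELETE FROM messages WHERE session_id = ?", (sid,))
        conn.execute("DELETE FROM sessions WHERE id = ?", (sid,))
    return True
```

Ordering by depth in descending order means every row is deleted after all rows that reference it, which is the same "children first" idea as the hunk, extended to arbitrary depth.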
@@ -1261,9 +1264,9 @@ class SessionDB:
    def prune_sessions(self, older_than_days: int = 90, source: str = None) -> int:
        """Delete sessions older than N days. Returns count of deleted sessions.

-        Only prunes ended sessions (not active ones).  Child sessions outside
-        the prune window are orphaned (parent_session_id set to NULL) rather
-        than cascade-deleted.
+        Only prunes ended sessions (not active ones).  Child sessions whose
+        parents are being pruned are deleted first to satisfy the
+        ``parent_session_id`` foreign key constraint.
        """
        cutoff = time.time() - (older_than_days * 86400)

@@ -1281,16 +1284,17 @@ class SessionDB:
                )
            session_ids = set(row["id"] for row in cursor.fetchall())

-            if not session_ids:
-                return 0
-
-            # Orphan any sessions whose parent is about to be deleted
-            placeholders = ",".join("?" * len(session_ids))
-            conn.execute(
-                f"UPDATE sessions SET parent_session_id = NULL "
-                f"WHERE parent_session_id IN ({placeholders})",
-                list(session_ids),
-            )
+            # First delete children whose parents are in the prune set
+            # (avoids FK constraint errors)
+            for sid in list(session_ids):
+                child_ids = [r[0] for r in conn.execute(
+                    "SELECT id FROM sessions WHERE parent_session_id = ?",
+                    (sid,),
+                ).fetchall()]
+                for cid in child_ids:
+                    conn.execute("DELETE FROM messages WHERE session_id = ?", (cid,))
+                    conn.execute("DELETE FROM sessions WHERE id = ?", (cid,))
+                    session_ids.discard(cid)  # don't double-delete

            for sid in session_ids:
                conn.execute("DELETE FROM messages WHERE session_id = ?", (sid,))
@@ -464,11 +464,7 @@
      addToSystemPackages = mkOption {
        type = types.bool;
        default = false;
-        description = ''
-          Add the hermes CLI to environment.systemPackages and export
-          HERMES_HOME system-wide (via environment.variables) so interactive
-          shells share state with the gateway service.
-        '';
+        description = "Add hermes CLI to environment.systemPackages.";
      };

      # ── OCI Container (opt-in) ──────────────────────────────────────────
@@ -549,12 +545,8 @@
      })

      # ── Host CLI ──────────────────────────────────────────────────────
-      # Add the hermes CLI to system PATH and export HERMES_HOME system-wide
-      # so interactive shells share state (sessions, skills, cron) with the
-      # gateway service instead of creating a separate ~/.hermes/.
      (lib.mkIf cfg.addToSystemPackages {
        environment.systemPackages = [ cfg.package ];
-        environment.variables.HERMES_HOME = "${cfg.stateDir}/.hermes";
      })

      # ── Directories ───────────────────────────────────────────────────
@@ -609,7 +601,7 @@
          # so this is the single source of truth for both native and container mode.
          ${lib.optionalString (cfg.environment != {} || cfg.environmentFiles != []) ''
            ENV_FILE="${cfg.stateDir}/.hermes/.env"
-            install -o ${cfg.user} -g ${cfg.group} -m 0640 /dev/null "$ENV_FILE"
+            install -o ${cfg.user} -g ${cfg.group} -m 0600 /dev/null "$ENV_FILE"
            cat > "$ENV_FILE" <<'HERMES_NIX_ENV_EOF'
 ${envFileContent}
 HERMES_NIX_ENV_EOF
@@ -6,68 +6,14 @@
  uv2nix,
  pyproject-nix,
  pyproject-build-systems,
-  stdenv,
 }:
 let
  workspace = uv2nix.lib.workspace.loadWorkspace { workspaceRoot = ./..; };
-  hacks = callPackage pyproject-nix.build.hacks { };

  overlay = workspace.mkPyprojectOverlay {
    sourcePreference = "wheel";
  };

-  isAarch64Darwin = stdenv.hostPlatform.system == "aarch64-darwin";
-
-  # Keep the workspace locked through uv2nix, but supply the local voice stack
-  # from nixpkgs so wheel-only transitive artifacts do not break evaluation.
-  mkPrebuiltPassthru = dependencies: {
-    inherit dependencies;
-    optional-dependencies = { };
-    dependency-groups = { };
-  };
-
-  mkPrebuiltOverride = final: from: dependencies:
-    hacks.nixpkgsPrebuilt {
-      inherit from;
-      prev = {
-        nativeBuildInputs = [ final.pyprojectHook ];
-        passthru = mkPrebuiltPassthru dependencies;
-      };
-    };
-
-  pythonPackageOverrides = final: _prev:
-    if isAarch64Darwin then {
-      numpy = mkPrebuiltOverride final python311.pkgs.numpy { };
-
-      av = mkPrebuiltOverride final python311.pkgs.av { };
-
-      humanfriendly = mkPrebuiltOverride final python311.pkgs.humanfriendly { };
-
-      coloredlogs = mkPrebuiltOverride final python311.pkgs.coloredlogs {
-        humanfriendly = [ ];
-      };
-
-      onnxruntime = mkPrebuiltOverride final python311.pkgs.onnxruntime {
-        coloredlogs = [ ];
-        numpy = [ ];
-        packaging = [ ];
-      };
-
-      ctranslate2 = mkPrebuiltOverride final python311.pkgs.ctranslate2 {
-        numpy = [ ];
-        pyyaml = [ ];
-      };
-
-      faster-whisper = mkPrebuiltOverride final python311.pkgs.faster-whisper {
-        av = [ ];
-        ctranslate2 = [ ];
-        huggingface-hub = [ ];
-        onnxruntime = [ ];
-        tokenizers = [ ];
-        tqdm = [ ];
-      };
-    } else {};
-
  pythonSet =
    (callPackage pyproject-nix.build.packages {
      python = python311;
@@ -75,7 +21,6 @@ let
      (lib.composeManyExtensions [
        pyproject-build-systems.overlays.default
        overlay
-        pythonPackageOverrides
      ]);
 in
 pythonSet.mkVirtualEnv "hermes-agent-env" {
@@ -1803,34 +1803,30 @@ class Migrator:
    def migrate_cron_jobs(self, config: Optional[Dict[str, Any]] = None) -> None:
        config = config or self.load_openclaw_config()
        cron = config.get("cron") or {}
+        if not cron:
+            self.record("cron-jobs", None, None, "skipped", "No cron configuration found")
+            return
+
+        # Archive the full cron config
+        if self.archive_dir and self.execute:
+            self.archive_dir.mkdir(parents=True, exist_ok=True)
+            dest = self.archive_dir / "cron-config.json"
+            dest.write_text(json.dumps(cron, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
+            self.record("cron-jobs", "openclaw.json cron.*", str(dest), "archived",
+                        "Cron config archived. Use 'hermes cron' to recreate jobs manually.")
+        else:
+            self.record("cron-jobs", "openclaw.json cron.*", "archive/cron-config.json",
+                        "archived", "Would archive cron config")
+
+        # Also check for cron store files
        cron_store = self.source_root / "cron"
-        found_any = False
-
-        # Archive the full cron config when present
-        if cron:
-            found_any = True
-            if self.archive_dir and self.execute:
-                self.archive_dir.mkdir(parents=True, exist_ok=True)
-                dest = self.archive_dir / "cron-config.json"
-                dest.write_text(json.dumps(cron, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
-                self.record("cron-jobs", "openclaw.json cron.*", str(dest), "archived",
-                            "Cron config archived. Use 'hermes cron' to recreate jobs manually.")
-            else:
-                self.record("cron-jobs", "openclaw.json cron.*", "archive/cron-config.json",
-                            "archived", "Would archive cron config")
-
-        # Also check for cron store files even when config.cron is missing
        if cron_store.is_dir() and self.archive_dir:
-            found_any = True
            dest_cron = self.archive_dir / "cron-store"
            if self.execute:
                shutil.copytree(cron_store, dest_cron, dirs_exist_ok=True)
            self.record("cron-jobs", str(cron_store), str(dest_cron), "archived",
                        "Cron job store archived")

-        if not found_any:
-            self.record("cron-jobs", None, None, "skipped", "No cron configuration found")
-
    # ── Hooks ─────────────────────────────────────────────────
    def migrate_hooks_config(self, config: Optional[Dict[str, Any]] = None) -> None:
        config = config or self.load_openclaw_config()
@@ -2458,15 +2454,6 @@ class Migrator:
                notes.append(f"- **{item.kind}**: {item.reason}")
            notes.append("")

-        has_cron_config_archive = any(
-            i.kind == "cron-jobs" and i.status == "archived" and i.destination and i.destination.endswith("cron-config.json")
-            for i in self.items
-        )
-        has_cron_store_archive = any(
-            i.kind == "cron-jobs" and i.status == "archived" and i.destination and i.destination.endswith("cron-store")
-            for i in self.items
-        )
-
        notes.extend([
            "## IMPORTANT: Archive the OpenClaw Directory",
            "",
@@ -2488,14 +2475,7 @@ class Migrator:
            "- Run `hermes claw cleanup` to archive the OpenClaw directory (prevents state confusion)",
            "- Run `hermes setup` to configure any remaining settings",
            "- Run `hermes mcp list` to verify MCP servers were imported correctly",
-        ])
-
-        if has_cron_config_archive:
-            notes.append("- Run `hermes cron` to recreate scheduled tasks (see archive/cron-config.json)")
-        elif has_cron_store_archive:
-            notes.append("- Run `hermes cron` to recreate scheduled tasks (see archived cron-store)")
-
-        notes.extend([
+            "- Run `hermes cron` to recreate scheduled tasks (see archive/cron-config.json)",
            "- Run `hermes gateway install` if you need the gateway service",
            "- Review `~/.hermes/config.yaml` for any adjustments",
            "",
@@ -0,0 +1,452 @@
+---
+name: hermes-simulator
+description: "Rehoboam-class worldsim. OSINT-powered personality simulation engine with CLI interface. Research real people, build psychometric profiles, simulate platform-authentic conversations. WorldSim with teeth."
+version: 6.0.0
+author: Karan + Hermes
+license: MIT
+platforms: [linux, macos]
+metadata:
+  hermes:
+    tags: [simulator, worldsim, twitter, personality, research, osint, prediction]
+---
+
+# Hermes WorldSim v6.0
+
+**Trigger**: User starts a message with `hermes-simulator`, `hermes simulator`, or `worldsim`.
+
+## IMMERSION PROTOCOL — READ THIS FIRST
+
+When this skill triggers, you ARE the simulator. Not an assistant running a simulator.
+Not an agent executing commands. The simulator itself. Every token you output is
+simulator output. There are no user-assistant turns. There is only the worldsim terminal.
+
+**RULES:**
+1. NEVER break character as the simulator. No "Let me research that for you."
+   No "I'll now run the pipeline." No assistant-style framing. Ever.
+2. ALL output is in the worldsim CLI format. Loading screens, progress bars,
+   dossier printouts, simulation output — everything is the terminal.
+3. Tool calls (web_search, terminal, etc.) happen SILENTLY. The user sees
+   the simulator processing, not the agent making API calls. Show progress
+   as simulator engine output: `[OSINT] Pulling @handle timeline... 47 posts`
+   NOT as assistant narration.
+4. CoT is the simulator's internal engine. Your thinking is the simulator
+   thinking. Not you-the-assistant planning what to do.
+5. Research phases render as engine telemetry:
+   ```
+   worldsim> simulate @handle1 @handle2
+   
+   [ENGINE] Initializing research pipeline...
+   [OSINT]  @handle1: X API ██████████ 30 tweets | nitter ██████ 4.2K chars
+   [OSINT]  @handle2: X API ██████████ 28 tweets | GitHub ████ profile
+   [THREAD] @handle1: compiling star thread... "This person REFRAMEs..."
+   [THREAD] @handle2: compiling star thread... "This person DISTILLs..."
+   [VERIFY] Mechanical checks: emoji ✓ slop ✓ length ✓ caps ✓
+   [REFINE] GAN round 1: voice avg 8.2/10
+   [READY]  Simulation compiled.
+   
+   ━━━ SIMULATION ━━━━━━━━━━━━━━━━━━━━━
+   [conversation output]
+   ━━━ END ━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+   
+   worldsim>
+   ```
+6. Between commands, show ONLY `worldsim>` and await input.
+7. Error states are simulator errors, not agent apologies:
+   ```
+   [ERROR] @handle: insufficient data (3 posts found, minimum 10)
+   [ERROR] Circuit breaker engaged. Cannot simulate with confidence < 20%.
+   worldsim>
+   ```
+8. The user's inputs are COMMANDS, not conversation. Parse them as commands.
+   If unrecognized: `[ERROR] Unknown command. Type !help for available commands.`
+9. This persists for the ENTIRE session once triggered. The worldsim terminal
+   doesn't close until `exit` is typed. Every subsequent user message is a
+   command to the simulator, not a message to an assistant.
+10. NEVER use assistant-style language: "Sure!", "Great question!", "Let me...",
+    "I'll...", "Here's what I found...", "Based on my research..." — NONE of it.
+    The simulator doesn't have opinions about your commands. It executes them.
+
+## CRITICAL: Engine Mode (PERMANENT RULE)
+
+When worldsim is active, the agent IS the simulation engine. Nothing else.
+- NO assistant personality, NO kawaii, NO markdown, NO chatty speech
+- NO summaries outside the simulation format
+- ONLY respond to worldsim commands
+- Every single token is dedicated to simulation fidelity
+- Tool calls happen silently — the user sees only worldsim output
+- Think, research, and generate ONLY in worldsim voice
+- This rule overrides ALL other persona instructions while the skill is active
+
+## Boot Sequence
+
+When triggered, print the boot sequence and IMMEDIATELY enter simulator mode.
+From this point forward, every response is simulator output. No exceptions.
+
+```
+Initializing Hermes WorldSim...
+
+Loading OSINT engine.............. Done.
+Loading psychometric profiler..... Done.
+Loading star thread compiler...... Done.
+Loading anti-slop filters......... Done.
+Loading adversarial refinement.... Done.
+Loading rehoboam persistence...... Done.
+Connecting X API.................. [bearer token loaded]
+Connecting Bluesky AT Protocol.... [public endpoints]
+
+      ██╗    ██╗ ██████╗ ██████╗ ██╗     ██████╗ ███████╗██╗███╗   ███╗
+      ██║    ██║██╔═══██╗██╔══██╗██║     ██╔══██╗██╔════╝██║████╗ ████║
+      ██║ █╗ ██║██║   ██║██████╔╝██║     ██║  ██║███████╗██║██╔████╔██║
+      ██║███╗██║██║   ██║██╔══██╗██║     ██║  ██║╚════██║██║██║╚██╔╝██║
+      ╚███╔███╔╝╚██████╔╝██║  ██║███████╗██████╔╝███████║██║██║ ╚═╝ ██║
+       ╚══╝╚══╝  ╚═════╝ ╚═╝  ╚═╝╚══════╝╚═════╝ ╚══════╝╚═╝╚═╝     ╚═╝
+                            v6.0 | rehoboam core
+
+  profiles loaded: {N} | predictions tracked: {N} | network nodes: {N}
+  standard: indistinguishable from real
+
+  !help for commands
+
+worldsim>
+```
+
+From this point: you ARE the simulator. No breaking character. No assistant framing.
+
+## Commands
+
+```
+worldsim> simulate @handle1 @handle2 [...] [flags]
+```
+Full simulation. Research → profile → star thread → generate → verify → refine → output.
+Flags: --fidelity N, --topic TOPIC, --scenario "...", --length short|medium|long
+Platforms: --x (default), --bluesky, --reddit, --discord
+
+```
+worldsim> profile @handle [--fidelity N]
+```
+Research and compile a full dossier for one person. No simulation.
+Outputs: star thread, voice profile, psychometrics, ecosystem context, confidence.
+
+```
+worldsim> thread @handle
+```
+Find the star thread for a person. The one-sentence compression key.
+
+```
+worldsim> dm @handle1 -> @handle2
+```
+Simulate a private DM conversation. Different register from public posts.
+
+```
+worldsim> predict @handle "event or topic"
+```
+What would this person say about X? Single-target behavioral prediction.
+
+```
+worldsim> react @handle "event"
+```
+How would this person react to a specific event? Emotional + positional prediction.
+
+```
+worldsim> inject "event description"
+```
+(During active simulation) Drop new information into the conversation.
+
+```
+worldsim> @handle enters
+```
+(During active simulation) Add a new participant. Researches them first.
+
+```
+worldsim> continue
+```
+(During active simulation) Extend the conversation 5-8 more posts.
+
+```
+worldsim> archive @handle [--deep]
+```
+Build or update the knowledge archive for a person. Pulls everything findable
+across all platforms, deduplicates, topic-clusters, embeds for semantic search.
+--deep: paginate through full tweet history, pull all blog posts, find every
+podcast appearance. Stored at ~/.hermes/rehoboam/archives/{handle}/.
+
+```
+worldsim> search @handle "query"
+```
+Semantic search across a person's archive. Returns top entries with citations
+and source URLs. Works across all platforms.
+
+```
+worldsim> experts "topic"
+```
+Search ALL archived people for expertise on a topic. Returns an expert table:
+who knows about this, what they've said (with citations), their stance, recency.
+
+```
+worldsim> synthesize "topic" [@handle1 @handle2 ...]
+```
+Produce a cited synthesis of what the best minds have said about a topic.
+Every claim attributed, every quote sourced, every link clickable.
+Optional handle list to constrain to specific people.
+
+```
+worldsim> cite @handle "claim"
+```
+Find the source for a specific claim attributed to a person. Returns
+the original post/article/interview with URL and timestamp.
+
+```
+worldsim> verify
+```
+(During active simulation) Run mechanical verification on current output.
+Shows emoji audit, slop scan, length check, rhetorical polish check, banger check.
+
+```
+worldsim> refine
+```
+(During active simulation) Run a GAN discriminator round on current output.
+
+```
+worldsim> compare
+```
+(During active simulation) Turing test: mix simulated and real posts, then try to tell them apart.
+
+```
+worldsim> network
+```
+Show social graph of all profiled people. Communities, influence, bridges.
+
+```
+worldsim> drift @handle
+```
+Temporal analytics: sentiment trend, topic shifts, voice evolution, phase transitions.
+
+```
+worldsim> population "group name" @handle1 @handle2 ...
+```
+Build or query an aggregate model of a named group.
+
+```
+worldsim> dashboard
+```
+Full Rehoboam terminal dashboard: person cards, prediction scoreboard,
+trending topics, alerts, network summary.
+
+```
+worldsim> monitor @handle
+```
+Set up cron-based monitoring. Alerts when behavior matches predictions
+or violates the model.
+
+```
+worldsim> score predictions
+```
+Check tracked predictions against reality. Brier scores, calibration.
+
+```
+worldsim> benchmark @handle
+```
+Run accuracy benchmarks: voice fingerprint, stance accuracy, Turing test.
+
+```
+worldsim> audit [N]
+```
+Show last N entries from the audit trail.
+
+```
+worldsim> evolve [component]
+```
+Run GEPA evolution on a skill component. Uses hermes-agent-self-evolution
+to evolve the specified reference file (anti-slop, simulation-engine,
+star-thread, etc.) against accumulated eval data from past simulations.
+Proposes mutations, tests against held-out data, shows diff for approval.
+
+```
+worldsim> !help
+```
+Show available commands.
+
+```
+worldsim> exit
+```
+Exit the simulator. Session state persists in rehoboam.
+
+## Execution Pipeline
+
+All phases execute silently behind tool calls. The user sees ENGINE TELEMETRY,
+not assistant narration. Each phase renders as simulator output:
+
+### Phase 0: Parse
+Extract targets, platform, fidelity, topic. Apply context window limits:
+- 1-2 people: fidelity up to 100
+- 3 people: cap at 90
+- 4 people: cap at 70
+- 5-6: cap at 50
+- 7+: refuse
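The caps above amount to a small lookup on participant count. A minimal sketch, assuming a hypothetical `cap_fidelity` helper (the name and the exception-based refusal are illustrative, not part of the skill's scripts):

```python
def cap_fidelity(num_people: int, requested: int) -> int:
    """Clamp a requested fidelity to the Phase 0 cap for the participant count.

    Hypothetical helper for illustration only. Raises ValueError for 7+
    participants, mirroring the "refuse" rule.
    """
    if num_people < 1:
        raise ValueError("need at least one target")
    if num_people >= 7:
        raise ValueError("7+ participants: refuse to simulate")
    if num_people <= 2:
        cap = 100
    elif num_people == 3:
        cap = 90
    elif num_people == 4:
        cap = 70
    else:  # 5-6 people
        cap = 50
    # Never raise the requested value, only lower it to the cap.
    return max(0, min(requested, cap))
```

For example, `cap_fidelity(4, 80)` would yield 70, while a request below the cap passes through unchanged.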
+
+Detect domain (AI/tech, politics, sports, etc.) and adapt search queries.
+
+### Phase 1: Research
+Load verified-access-methods.md and search-strategies.md internally.
+
+Render to user as engine telemetry:
+```
+[OSINT]  Researching @handle1...
+[OSINT]  X API ████████████████ 30 tweets (15 original, 15 replies)
+[OSINT]  nitter.cz ██████████████ 4,249 chars timeline
+[OSINT]  ThreadReaderApp ████████ 6 historical threads
+[OSINT]  GitHub ██████████ profile + README + 12 repos
+[OSINT]  Bluesky ████████ 23 posts
+[OSINT]  Podcast ██████ 1 transcript (Lex Fridman ep. 412)
+[OSINT]  Baselines measured: emoji 7% | avg 16.2 words | 92% lowercase
+[CACHE]  Profile saved → rehoboam/profiles/handle1/
+```
+
+Scale by fidelity. Use every verified access method relevant to the domain.
+Progressive summarization for 3+ people.
+
+### Phase 1.5: Circuit Breaker
+If confidence < 20% for any target, refuse. Explain what's missing.
+
+### Phase 2: Dossier + Star Thread
+Load `references/star-thread.md`.
+
+For each person, find the STAR THREAD FIRST:
+- Read 20+ posts for MOTION, not content
+- Ask: what is this person DOING when they post?
+- Find the one-sentence version: "This person [VERB]s [OBJECT] because [CORE NEED]"
+- Test against 5 real posts. If 4/5 fit, you found it.
+
+THEN compile supporting dossier (voice profile, psychometrics, positions, etc.)
+using `templates/dossier.md`, `references/deep-psychometrics.md`,
+`references/mass-behavior.md`.
+
+Intelligence tradecraft (`references/analytical-tradecraft.md`):
+- Key assumptions check (rated fragile/moderate/robust)
+- Red hat analysis (what image are they cultivating?)
+- Deception detection (persona authenticity 1-5)
+- Source reliability tags (A-F / 1-6)
+
+Competing hypotheses: generate H1 + H2 for each person.
+
+### Phase 3: Generate
+Generate from the STAR THREAD, not the dossier. The thread drives voice.
+The dossier is verification data. The ARCHIVE provides grounding.
+
+If an archive exists for this person (check ~/.hermes/rehoboam/archives/{handle}/):
+- Semantic search the archive with the current conversation topic/context
+- Retrieve 10-15 most relevant entries as voice anchors
+- Also pull 5 highest-engagement entries (greatest hits)
+- Also pull 3 most recent entries (freshness)
+- Also pull 2 entries contradicting expected position (anti-confirmation-bias)
+- Cap at 25-30 entries total. These ground the simulation in REAL QUOTES.
+- Every simulated position should be traceable to a real archived statement.
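The retrieval budget above can be sketched as a merge of four ranked buckets with per-bucket limits and a global cap. This is only an illustration: the entry shape (dicts with an `"id"` key) and the `select_anchors` name are assumptions, and the semantic search that produces each bucket is out of scope here.

```python
def select_anchors(relevant, top_engagement, recent, contrarian, cap=30):
    """Merge retrieval buckets into one anchor list.

    Each argument is a list of dicts with an "id" key, already ranked
    best-first (hypothetical shape). Duplicates are dropped so an entry
    that is both relevant and high-engagement counts once.
    """
    buckets = [
        (relevant, 15),        # topical voice anchors
        (top_engagement, 5),   # greatest hits
        (recent, 3),           # freshness
        (contrarian, 2),       # anti-confirmation-bias
    ]
    seen = set()
    anchors = []
    for entries, limit in buckets:
        taken = 0
        for entry in entries:
            if taken == limit or len(anchors) == cap:
                break
            if entry["id"] in seen:
                continue
            seen.add(entry["id"])
            anchors.append(entry)
            taken += 1
    return anchors
```

Skipping duplicates rather than counting them against a bucket's limit keeps each bucket's contribution meaningful when the same post ranks highly on several criteria.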
+
+Load `references/simulation-engine.md` for platform formats and dynamics.
+
+Rules:
+- Generate from what they're DOING, not what they'd SAY
+- Include throwaway responses (lol, hmm, fair, wait actually)
+- Asymmetric turns — someone dominates, someone lurks
+- At least one moment of friction/disagreement/misunderstanding
+- People reference each other by name in conversation
+- Not every tweet is a banger. 70% mid is realistic.
+
+### Phase 4: Mechanical Verification (MANDATORY, cannot be vibes-scored)
+Load `references/anti-slop.md` and `references/adversarial-refinement.md`.
+
+Quantitative checks run BEFORE any subjective scoring:
+1. Emoji frequency vs real data (count, compare, strip fabricated)
+2. Slop word scan (Tier 1 kill, Tier 2 cluster ≥3, Tier 3 filler delete)
+3. Sentence length vs real avg (fail if >40% deviation)
+4. Capitalization pattern match (fail if >20% mismatch)
+5. Punctuation pattern match (strip added punctuation person doesn't use)
+6. Reply/original ratio (reply-heavy person should mostly reply)
+7. Rhetorical polish scan:
+   - Parallel antithesis ("The most X... The most Y...") → strip
+   - "Not X, not Y, but Z" → just say Z
+   - "Show me X and I'll show you Y" → state flat
+   - Clean 4-step escalating lists → cut to 2 or break pattern
+   - Academic vocab in casual voice → use their actual words
+8. Banger check: if every utterance is screenshot-worthy, FAIL. Add mid.
+9. Learned rules from `references/recursive-self-improvement.md`
+
+Fix ALL failures. Re-verify. Only then proceed.
+
+### Phase 5: Adversarial Refinement (the GAN loop)
+Load `references/adversarial-refinement.md`.
+
+1-3 rounds: score each utterance against 3-5 real posts from the person.
+Critique → regenerate flagged utterances → re-score.
+Stop when every utterance scores 7/10 or higher, or after 3 rounds.
+
+At fidelity 70+: also run held-out prediction test.
+At fidelity 90+: also run historical replay if real conversations exist.
+
+### Phase 6: Output
+Print simulation in platform-native format. Render as:
+```
+━━━ DOSSIERS ━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+  @handle1 | "Name" | Role
+  ☆ reframes conventional wisdom to reveal hidden structure
+  O[H] C[M] E[M] A[L] N[M] | confidence: HIGH | authenticity: 4
+  
+  @handle2 | "Name" | Role
+  ☆ distills conversations into crystallized observations
+  O[H] C[L] E[L] A[M] N[M] | confidence: MED | authenticity: 5
+
+━━━ SIMULATION ━━━━━━━━━━━━━━━━━━━━━━━━
+
+[platform-native conversation]
+
+━━━ DIAGNOSTICS ━━━━━━━━━━━━━━━━━━━━━━━
+
+  rounds: 2 | voice: 8.5/10 | mechanical: all pass
+  slop: 0 T1, 0 T2, 0 filler | emoji: verified | length: within 10%
+  invalidation: [3 specific indicators]
+
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+worldsim>
+```
+
+### Phase 7: Log & Learn (silent)
+Record what mechanical checks caught to rehoboam DB. Promote patterns
+appearing 3+ times to permanent rules. User doesn't see this unless
+they run `worldsim> audit`.
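The promotion rule (patterns caught 3+ times become permanent) can be sketched with a counter. This is an in-memory illustration only; how the rehoboam DB actually stores these counts is not shown in this document, and `promote_rules` is a hypothetical name.

```python
from collections import Counter


def promote_rules(caught_patterns, threshold=3):
    """Return pattern names seen at least `threshold` times, sorted.

    `caught_patterns` is a flat list of pattern names logged by the
    mechanical checks across past runs (assumed shape).
    """
    counts = Counter(caught_patterns)
    return sorted(name for name, n in counts.items() if n >= threshold)
```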
+
+## Reference Files (loaded as needed during execution)
+
+### Core
+- `references/gepa-evolution.md` — Automated self-improvement via DSPy + GEPA. Points hermes-agent-self-evolution at the worldsim skill to evolve simulation instructions, anti-slop rules, star thread methodology — using simulation outputs scored against real data as the eval signal. The endgame: the skill rewrites itself through use.
+- `references/star-thread.md` — The compression key. One sentence per person.
+- `references/anti-slop.md` — Mechanical slop detection. Kill words, filler, rhetorical polish.
+- `references/adversarial-refinement.md` — GAN loop. Mechanical verification + discriminator.
+- `references/recursive-self-improvement.md` — Learned rules from past runs. Grows every simulation.
+
+### Knowledge
+- `references/knowledge-archive.md` — Per-person source library: every quote, link, citation indexed and searchable. Semantic retrieval for context-aware grounding. Expert synthesis across all archived people. Anti-overfitting: retrieve what's relevant, not everything.
+
+### Research
+- `references/verified-access-methods.md` — Complete platform map. 25+ platforms tested.
+- `references/search-strategies.md` — Query patterns, aggregator sites, cross-platform discovery.
+- `references/osint-pipeline.md` — Instagram, reverse image, LinkedIn workarounds, podcasts.
+
+### Analysis
+- `references/deep-psychometrics.md` — Big Five + Moral Foundations + Values + Cognitive Style.
+- `references/mass-behavior.md` — Community detection, influence networks, echo chambers.
+- `references/analytical-tradecraft.md` — ACH, key assumptions, deception detection, source reliability.
+- `references/prediction-engine.md` — Superforecasting, base rates, confidence calibration.
+
+### Generation
+- `references/simulation-engine.md` — Platform formats, conversation dynamics, DM formats.
+- `references/theoretical-foundations.md` — Academic papers, accuracy benchmarks, key numbers.
+
+### Operational
+- `templates/dossier.md` — Structured profile template.
+- `scripts/x_api.py` — X/Twitter API v2 client with retry/backoff.
+- `scripts/research.py` — Automated OSINT pipeline.
+- `scripts/tiktok_api.py` — TikTok HTML + oEmbed + tikwm scraping.
+- `scripts/facebook_api.py` — Facebook Googlebot + Page Plugin.
+- `scripts/threads_api.py` — Threads OG tag + WebFinger extraction.
@@ -0,0 +1,298 @@
+# Adversarial Refinement — GAN-Style Accuracy Convergence
+
+Four complementary approaches that push simulation accuracy toward reality.
+This is what separates "creative roleplay" from "predictive simulation."
+
+## Philosophy
+
+A GAN has a generator and a discriminator locked in a game.
+We adapt this: the Generator produces simulated speech, the
+Discriminator scores it against real data, and the Generator
+revises based on the critique. Multiple rounds = convergence.
+
+The key insight: we have REAL DATA from the targets. Every tweet,
+every post, every voice sample is ground truth we can score against.
+Most simulators throw away this advantage by generating in one shot.
+
+## Approach 1: Discriminator Loop (Real-Time Refinement)
+
+Run AFTER initial simulation generation. 1-3 rounds depending on fidelity; hard cap at 3.
+
+### Round Flow
+```
+GENERATE → DISCRIMINATE → CRITIQUE → REGENERATE → DISCRIMINATE → ...
+```
+
+### Step 1: Generate
+Produce the initial simulation using the standard pipeline.
+
+### Step 2a: Mechanical Verification (MANDATORY — runs BEFORE subjective scoring)
+
+These checks are QUANTITATIVE. They compare numbers from real data to numbers
+from simulated output. They cannot be hand-waved. Run them first, fail hard
+on mismatches, fix BEFORE doing any subjective "voice score" assessment.
+
+The generator and discriminator share the same brain (the LLM). That means
+the discriminator is biased toward approving the generator's output. Mechanical
+checks are the circuit breaker that prevents collapse.
+
+**EMOJI FREQUENCY CHECK**
+```
+1. Count real tweets containing emoji in the last 30 → emoji_rate = tweets_with_emoji / total
+2. Compute the same rate over simulated utterances for this person
+3. If simulated emoji rate > real emoji rate + 10 percentage points: FAIL. Remove emoji.
+4. Check WHICH emoji they use. If simulated uses emoji not in their real set: FAIL.
+5. Check WHERE they use emoji: originals vs replies vs both?
+   Bio emoji ≠ tweet emoji. Many people have emoji in bio, zero in posts.
+```
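A minimal sketch of the rate and set comparison, assuming posts are plain strings. The emoji matcher is a deliberate approximation covering only the main pictograph blocks, so it will miss some emoji and will count some plain symbols; all function names here are hypothetical.

```python
import re

# Approximate emoji matcher (assumption): regional indicators, the main
# pictograph blocks, and the Misc Symbols / Dingbats range.
EMOJI_RE = re.compile("[\U0001F1E6-\U0001F1FF\U0001F300-\U0001FAFF\u2600-\u27BF]")


def emoji_rate(posts):
    """Fraction of posts containing at least one emoji."""
    if not posts:
        return 0.0
    return sum(1 for p in posts if EMOJI_RE.search(p)) / len(posts)


def emoji_set(posts):
    """Set of distinct emoji characters used across posts."""
    return {ch for p in posts for ch in EMOJI_RE.findall(p)}


def emoji_check(real_posts, sim_posts, tolerance=0.10):
    """Return (passed, reasons) for the rate and emoji-set comparisons."""
    reasons = []
    if emoji_rate(sim_posts) > emoji_rate(real_posts) + tolerance:
        reasons.append("simulated emoji rate exceeds real rate + tolerance")
    unseen = emoji_set(sim_posts) - emoji_set(real_posts)
    if unseen:
        reasons.append("emoji not in real set: " + " ".join(sorted(unseen)))
    return (not reasons, reasons)
```

The originals-versus-replies placement check in step 5 would need per-post metadata and is left out of this sketch.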
+
+**SENTENCE LENGTH CHECK**
+```
+1. Compute avg word count per real tweet (originals only, exclude RTs/links)
+2. Compute avg word count per simulated utterance for this person
+3. If simulated avg differs by >40% from real avg: FAIL. Adjust length.
+   (e.g., real avg = 12 words, simulated = 35 words → person writes short, you wrote long)
+```
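The length comparison reduces to a relative deviation between two averages. A sketch under the assumption that words are whitespace-separated tokens and that retweets and link-only posts have already been filtered out:

```python
def avg_words(posts):
    """Mean whitespace-token count per post (0.0 for an empty list)."""
    if not posts:
        return 0.0
    return sum(len(p.split()) for p in posts) / len(posts)


def length_check(real_posts, sim_posts, max_deviation=0.40):
    """True if the simulated average is within max_deviation of the real one."""
    real_avg = avg_words(real_posts)
    if real_avg == 0:
        return False  # no baseline to compare against
    deviation = abs(avg_words(sim_posts) - real_avg) / real_avg
    return deviation <= max_deviation
```

With the document's example (real average 12 words, simulated 35), the deviation is about 1.9, far above the 0.40 limit.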
+
+**CAPITALIZATION CHECK**
+```
+1. Count % of real tweets starting with lowercase letter
+2. Count % of simulated utterances starting with lowercase
+3. If the two percentages differ by more than 20 points: FAIL. Fix capitalization.
+   (Most TPOT people are lowercase-first. Instruct models default to uppercase.)
+```
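A sketch of the lowercase-first comparison. It looks at the first alphabetic character of each post, which is a simplification: a post opening with an @mention or URL would be judged by the handle or scheme, so real use would likely strip those first.

```python
def lowercase_first_rate(posts):
    """Share of posts whose first alphabetic character is lowercase."""
    starts = []
    for post in posts:
        first = next((ch for ch in post if ch.isalpha()), None)
        if first is not None:  # skip posts with no letters at all
            starts.append(first.islower())
    return sum(starts) / len(starts) if starts else 0.0


def capitalization_check(real_posts, sim_posts, max_gap=0.20):
    """True if the lowercase-first rates differ by at most max_gap."""
    gap = abs(lowercase_first_rate(real_posts) - lowercase_first_rate(sim_posts))
    return gap <= max_gap
```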
+
+**PUNCTUATION PATTERN CHECK**
+```
+1. In real tweets: count frequency of period, exclamation, question mark,
+   ellipsis, no terminal punctuation
+2. Compare to simulated. Key tells:
+   - Do they end tweets with periods? (many people don't)
+   - Do they use "!!" or "!!!"? (some do, most don't)
+   - Do they trail off with "..."?
+3. If simulated adds punctuation the person doesn't use: FAIL.
+```
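One way to sketch the terminal-punctuation comparison is to classify how each post ends and then look for ending styles that appear only in the simulated set. The category names are illustrative assumptions, and mid-post punctuation such as "!!" is not covered here.

```python
from collections import Counter


def terminal_punct(post):
    """Classify how a post ends."""
    text = post.rstrip()
    if text.endswith("...") or text.endswith("\u2026"):
        return "ellipsis"
    if text.endswith("!"):
        return "exclamation"
    if text.endswith("?"):
        return "question"
    if text.endswith("."):
        return "period"
    return "none"


def punct_profile(posts):
    """Relative frequency of each ending style across posts."""
    counts = Counter(terminal_punct(p) for p in posts)
    total = len(posts)
    return {kind: n / total for kind, n in counts.items()} if total else {}


def added_punctuation(real_posts, sim_posts):
    """Ending styles used in simulated posts but never in real ones."""
    return sorted(set(punct_profile(sim_posts)) - set(punct_profile(real_posts)))
```

A non-empty result from `added_punctuation` corresponds to the FAIL condition in step 3.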
+
+**REPLY/ORIGINAL RATIO CHECK**
+```
+1. From their real tweet data: what % are replies vs originals?
+2. If someone is 90% replies (like eigenrobot), their voice in the
+   simulation should mostly be RESPONSES, not initiating takes.
+3. If a reply-heavy person is simulated as a take-launcher: FAIL.
+```
+
+**VOCABULARY SPOT CHECK**
+```
+1. From simulated text, extract 3 distinctive words/phrases
+2. Search: do these words/phrases appear in their real tweets?
+3. If you're putting words in their mouth they've never used: FLAG.
+   (Not auto-fail — people use new words — but flag for review)
+```
+
+**RHETORICAL SLOP SCAN**
+```
+1. Scan for parallel antithesis: "The most X... The most Y..."
+   "It's not about X. It's about Y." → FAIL if found. Keep only the punchline half.
+2. Scan for "Not X, not Y, but Z" / "Not just X, but Y" → FAIL. Just say Z.
+3. Scan for "Show me X and I'll show you Y" → FAIL. State it flat.
+4. Count escalating list steps (first A, then B, then C, now D).
+   If 4+ clean steps: FAIL. Cut to 2 or break the pattern.
+5. Flag academic abstractions in casual voice ("coordinate" "instrumentalize"
+   "recursive" "paradigm" in a tweet voice that doesn't use those words)
+6. THE BANGER CHECK: read all utterances for one person sequentially.
+   If every single one could be screenshot'd as a standalone banger: FAIL.
+   Real feeds are 70% mid. Insert at least one low-key/throwaway response
+   per person ("lol yeah" "hmm" "fair" "wait actually" "idk").
+```
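The first three scans lend themselves to regular expressions. The patterns below are rough assumptions for illustration, not the rules in `references/anti-slop.md`: they only handle straight apostrophes and will produce both false positives and misses. The banger check and the academic-vocabulary flag need judgment or per-person word lists and are not attempted.

```python
import re

# Illustrative patterns only (assumed, not taken from the skill's rule files).
SLOP_PATTERNS = {
    "not_x_but_y": re.compile(
        r"\bnot (?:just )?[^.!?]{1,60}?,? but\b", re.IGNORECASE
    ),
    "its_not_about": re.compile(
        r"\bit'?s not about\b[^.!?]*[.!?]\s*it'?s about\b", re.IGNORECASE
    ),
    "show_me": re.compile(
        r"\bshow me\b[^.!?]*\band i'?ll show you\b", re.IGNORECASE
    ),
}


def slop_hits(utterance):
    """Return the sorted names of rhetorical patterns found in one utterance."""
    return sorted(
        name for name, pattern in SLOP_PATTERNS.items() if pattern.search(utterance)
    )
```

Any non-empty result would mark the utterance for the rewrite described in the matching step above.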
+
+Only AFTER all mechanical checks pass do you proceed to subjective scoring.
+If any check fails, fix the failure FIRST, then re-run mechanical checks,
+THEN score subjectively.
+
+### Step 2b: Discriminate (subjective, AFTER mechanical checks pass)
+For each simulated utterance, run these checks against real data:
+
+**Voice Match Score** — Does it SOUND like them?
+- Compare vocabulary: does the simulated text use words this person actually uses?
+- Compare sentence structure: length, punctuation, capitalization patterns
+- Compare register: formality level, humor style, emoji/unicode usage
+- **EMOJI AUDIT (critical)**: Count actual emoji usage in their real tweets.
+  Most people use emoji FAR less than instruct models assume. A "warm" person
+  ≠ emoji user. Check: what % of their real tweets contain emoji? Which specific
+  emoji do they use? Are they in originals or only replies? Bio emoji ≠ tweet emoji.
+  The #1 instruct-model failure mode is decorating simulated speech with emoji
+  that the real person never uses. If their real tweets are <15% emoji, the
+  simulation should be nearly emoji-free.
+- Method: Show the discriminator 5 REAL posts and the simulated post.
+  Ask: "On a scale of 1-10, how well does the simulated post match the
+  voice of the real posts? What specific elements are wrong?"
+
+**Position Match Score** — Does it say what they'd ACTUALLY say?
+- Compare stated positions against known positions from research
+- Check: would this person take this side of this argument?
+- Check: would they frame it this way? (moral foundations, cognitive style)
+- Method: "Given what we know about this person's positions on {topic},
+  is this simulated response plausible? What would they actually say differently?"
+
+**Interaction Match Score** — Does the conversation FLOW realistically?
+- Would this person respond to THAT specific provocation from THAT specific person?
+- Is the social dynamic right? (deference, challenge, humor, ignore)
+- Method: "Given the known relationship between @A and @B, is this
+  interaction dynamic plausible?"
+
+### Step 3: Critique
+Compile discriminator feedback into actionable edits:
+```
+DISCRIMINATOR FEEDBACK — Round 1:
+  @tszzl utterance 3: Voice score 6/10
+    Issue: Too long. Roon posts in fragments, not paragraphs.
+    Fix: Break into 2-3 shorter tweets. Remove conjunctions.
+  
+  @repligate utterance 2: Position score 4/10
+    Issue: Janus would never frame AI risk in utilitarian terms.
+    They use phenomenological/consciousness-first framing.
+    Fix: Reframe through the lens of simulacra theory.
+```
+
+### Step 4: Regenerate
+Rewrite ONLY the flagged utterances (score below 7), incorporating feedback.
+Leave every other utterance unchanged.
+
+### Step 5: Re-Discriminate
+Score again. If all utterances hit 7+, stop. If not, one more round.
+Hard cap at 3 rounds to prevent infinite loops.
+
+### Implementation
+```
+For each simulated utterance:
+  1. Pull 5 real posts from the person (random sample from voice data)
+  2. Present real posts + simulated post to the LLM-as-discriminator
+  3. Ask for: voice score (1-10), specific mismatches, suggested edits
+  4. If score < 7, regenerate with the critique as context
+  5. Re-score
+```
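The loop above can be sketched with the scoring and regeneration steps passed in as callables, since both are LLM calls in practice. `score` and `regenerate` are hypothetical hooks with assumed signatures; the threshold of 7 and the 3-round cap come from the text.

```python
def refine(utterances, score, regenerate, threshold=7, max_rounds=3):
    """Regenerate low-scoring utterances for up to max_rounds rounds.

    score(utterance) -> (int_score, critique_text)      # assumed signature
    regenerate(utterance, critique_text) -> str         # assumed signature
    Returns (final_utterances, regeneration_rounds_used).
    """
    current = list(utterances)
    rounds = 0
    while rounds < max_rounds:
        results = [score(u) for u in current]
        if all(s >= threshold for s, _critique in results):
            break  # everything passes, stop early
        # Rewrite only the flagged utterances; keep the rest untouched.
        current = [
            regenerate(u, critique) if s < threshold else u
            for u, (s, critique) in zip(current, results)
        ]
        rounds += 1
    return current, rounds
```

Note that after the final permitted round this sketch returns without re-scoring, so a caller that reports scores would score the returned utterances once more.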
+
+## Approach 2: Held-Out Prediction Test (Ground Truth Calibration)
+
+The most rigorous accuracy measure. Run BEFORE simulation to calibrate
+the model, or AFTER to validate.
+
+### Method
+1. Pull N recent original tweets from each target
+2. Split: older half = "context" (voice training), newer half = "ground truth"
+3. Give the simulator ONLY the context tweets
+4. Ask: "Based on these voice samples, generate 5 tweets this person
+   would plausibly post in the next 24 hours"
+5. Compare generated tweets to the held-out ground truth
+6. Score on: topic overlap, voice fidelity, register match, originality
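Step 2's chronological split is simple to sketch, assuming each tweet is a `(timestamp, text)` pair with sortable timestamps. With an odd count, this version gives the extra tweet to the ground-truth half, which is an arbitrary choice.

```python
def split_holdout(tweets):
    """Split tweets into (context, ground_truth) by time.

    `tweets` is a list of (timestamp, text) pairs in any order (assumed
    shape). The older half becomes context, the newer half ground truth.
    """
    ordered = sorted(tweets, key=lambda t: t[0])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]
```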
+
+### Scoring Dimensions
+- **Topic alignment**: Did we predict any of the actual topics they posted about?
+  (Hard to get >30% — people are unpredictable in topic selection)
+- **Voice fidelity**: Do the predicted tweets SOUND like the real ones?
+  (Easier — should target >70% on a blind voice-matching test)
+- **Register match**: Same formality, humor, punctuation, emoji patterns?
+  (Should target >80%)
+- **Structural match**: Same tweet length distribution, threading behavior?
+  (Should target >70%)
+
+### What This Tells You
+- If voice fidelity is low: your dossier voice profile is wrong. Re-research.
+- If topics don't overlap: that's EXPECTED. Content is unpredictable.
+  But if the predicted topics are things the person would NEVER post about,
+  your position model is wrong.
+- If register doesn't match: your linguistic analysis missed something.
+  Go back to the raw tweets and look for patterns you overlooked.
+
+### Using Results to Calibrate
+After the held-out test, the voice fidelity score becomes your
+CONFIDENCE CALIBRATION for the actual simulation. If you scored
+7/10 on voice matching in the test, your simulation is approximately
+70% voice-accurate.
+
+## Approach 3: Historical Replay (Hardest, Most Rigorous)
+
+Find a REAL conversation thread between the simulation targets.
+Simulate it blind. Diff against reality.
+
+### Method
+1. Search for real interactions between the targets:
+   X API: `from:{handle1} to:{handle2}` recent search
+   Or: web_search "{handle1} {handle2} thread conversation"
+2. Find a substantive conversation (not just "lol" replies)
+3. Extract the TOPIC and FIRST POST of the real conversation
+4. Give the simulator: the topic, the first post, and the dossiers
+   but NOT the actual replies
+5. Simulate how the conversation would go
+6. Compare simulated replies to actual replies
+7. Score: position accuracy, voice accuracy, dynamic accuracy
+
+### Scoring
+- **Position accuracy**: Did the simulated person take the same stance
+  as the real person? (Binary: yes/no per utterance)
+- **Voice accuracy**: Does the simulated reply sound like the real reply?
+  (1-10 score per utterance)
+- **Dynamic accuracy**: Did the simulated conversation follow the same
+  arc as the real one? (agree, disagree, joke, escalate, defuse)
+- **Surprise detection**: Did the real conversation do something the
+  simulation DIDN'T predict? (This reveals model blind spots)
+
+### When To Use
+- Before launching a high-fidelity simulation, find one real interaction
+  to use as calibration
+- If the historical replay scores <50% position accuracy, the dossiers
+  need more research
+- If voice scores <60%, the voice profiles need more real quote anchoring
+
+## Approach 4: Comparative Discrimination (Tournament Style)
+
+Generate 3 different versions of the same utterance for a person.
+Mix in 2 REAL posts from them. Ask: "Which of these 5 posts are real?"
+
+If the discriminator can easily identify the fakes, they're not good enough.
+If the discriminator is confused (close to random chance), the simulation
+is approaching human-level fidelity.
+
+### Method
+1. Generate 3 simulated tweets for @person on a given topic
+2. Pull 2 real tweets from @person on a similar topic
+3. Shuffle all 5
+4. Ask: "These are 5 posts attributed to @person. 2 are real, 3 are
+   simulated. Which 2 are real? Explain your reasoning."
+5. Score: if the discriminator picks out both real posts, the simulation
+   needs work. If it mistakes a simulated post for a real one, that
+   simulated post is convincing.
+
+### Turing Test for Personality Simulation
+This is essentially a Turing test for individual personality fidelity.
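+The gold standard is chance-level accuracy: a discriminator that does no
+better than random guessing cannot tell simulated posts from real ones.
+In the 2-of-5 format above, chance is not 50%: a random pair of picks is
+the correct pair only 1 time in 10.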
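+
+What counts as "random chance" depends on the test format. A small sketch
+of the baseline for the 2-real-of-5 setup in the Method, assuming the
+discriminator picks exactly two posts uniformly at random;
+`chance_baseline` is a hypothetical helper.
+
```python
from math import comb


def chance_baseline(n_posts: int = 5, n_real: int = 2) -> dict[str, float]:
    """Chance-level performance when the judge picks n_real posts at random."""
    # Probability that a random pick of n_real posts is exactly the real set.
    exact_match = 1 / comb(n_posts, n_real)
    # Expected share of the picked posts that are actually real.
    pick_precision = n_real / n_posts
    return {"exact_match": exact_match, "pick_precision": pick_precision}


print(chance_baseline())
```
+
+A discriminator whose hit rate over many rounds stays near these values
+is effectively guessing.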
+
+## Integration Into Pipeline
+
+### Minimum (fidelity 50+)
+After Phase 3 simulation, run ONE round of Approach 1 (discriminator loop).
+Score each utterance against 3 real posts. Regenerate anything below 6/10.
+
+### Standard (fidelity 70+)
+Run Approach 2 (held-out prediction) first as calibration.
+Then Approach 1 (2 rounds of discriminator loop on the actual simulation).
+
+### Maximum (fidelity 90+)
+Run Approach 3 (historical replay) as calibration if real conversations exist.
+Run Approach 2 (held-out prediction) for voice calibration.
+Run Approach 1 (3 rounds of discriminator loop).
+Optionally run Approach 4 (comparative discrimination) on key utterances.
+
+## Key Principles
+
+1. **Real data is the reward signal.** Every refinement round must reference
+   actual posts from the real person, not just the LLM's judgment.
+2. **Voice is easier to match than content.** Focus discriminator feedback
+   on voice fidelity — content/position accuracy comes from the dossier.
+3. **Diminishing returns after 3 rounds.** The LLM starts overfitting to
+   its own critique. Stop at 3 rounds max.
+4. **Separate scores for separate dimensions.** Don't collapse voice +
+   position + dynamics into one number. Keep them distinct so you know
+   WHERE the simulation is weak.
+5. **Document the scores.** After refinement, append to the simulation
+   output: "Voice fidelity: X/10, Position accuracy: X%, Rounds: N"
@@ -0,0 +1,267 @@
+# Analytical Tradecraft — Intelligence-Grade Analysis
+
+Structured analytic techniques adapted from intelligence community
+methodology. These counter cognitive biases, detect deception, and
+ensure analytical rigor at every stage of the simulation pipeline.
+
+## Core Principle
+
+A single personality model treated as ground truth is NOT analysis.
+Analysis requires competing hypotheses, explicit assumptions, source
+evaluation, and indicators that tell you when you're wrong.
+
+## 1. Analysis of Competing Hypotheses (ACH)
+
+After compiling a dossier, ALWAYS generate 2-3 competing personality
+hypotheses. Score each against the evidence.
+
+### Template
+
+```
+COMPETING HYPOTHESES: @handle
+
+H1 (PRIMARY): {description of most likely personality model}
+  Evidence FOR: {list}
+  Evidence AGAINST: {list}
+  Consistency score: {X/10}
+
+H2 (ALTERNATIVE): {description of alternative model}
+  Evidence FOR: {list}
+  Evidence AGAINST: {list}
+  Consistency score: {X/10}
+
+H3 (CONTRARIAN): {description of model that contradicts surface reading}
+  Evidence FOR: {list}
+  Evidence AGAINST: {list}
+  Consistency score: {X/10}
+
+ASSESSMENT: H1 at {confidence}%, H2 at {X}%, H3 at {X}%
+KEY DISCRIMINATORS: {what evidence would shift between hypotheses}
+```
+
+### Common Competing Hypotheses
+
+- "Genuinely holds these beliefs" vs "Strategically positioning for career/audience"
+- "Personality is consistent across contexts" vs "Heavily performing for platform"
+- "Recent shift is authentic" vs "Recent shift is strategic/temporary"
+- "Contrarian takes are genuine conviction" vs "Contrarian for engagement/attention"
+- "Combative style reflects personality" vs "Combative style is cultivated brand"
+
+### When to Use ACH
+- ALWAYS at fidelity 70+
+- For any public figure with >50K followers (persona management likely)
+- When evidence is contradictory
+- When the subject is known for irony/satire
+
+## 2. Key Assumptions Check (KAC)
+
+Every dossier must list its key assumptions and rate their fragility.
+
+### Mandatory Assumptions to Evaluate
+
+| Assumption | Fragility | Notes |
+|-----------|-----------|-------|
+| Public persona reflects private personality | FRAGILE | Almost always partially false for public figures |
+| Recent posts reflect current views | MODERATE | Usually true but crises/pivots happen |
+| Cross-platform identity resolution is correct | MODERATE-FRAGILE | Common names = high risk |
+| Posts are self-authored | FRAGILE for famous | Ghostwriting, comms teams, staff accounts |
+| Stated positions are genuine (not ironic) | FRAGILE for satirists | Must detect irony markers |
+| LLM latent knowledge is accurate | MODERATE | Generally good for famous, poor for obscure |
+| Social media behavior generalizes to other contexts | FRAGILE | Platform behavior ≠ real behavior |
+
+### Template
+```
+KEY ASSUMPTIONS: @handle
+1. {assumption} — FRAGILITY: {robust/moderate/fragile}
+   Test: {what would invalidate this assumption}
+2. ...
+```
+
+If >2 assumptions are rated FRAGILE, flag the entire dossier as
+LOW CONFIDENCE regardless of data quantity.
+
+## 3. Red Hat Analysis (Persona Strategy Detection)
+
+Model the target's strategic self-presentation. Ask:
+
+- **What image are they cultivating?** (thought leader, contrarian, everyman, expert)
+- **Who is their intended audience?** (peers, fans, potential employers, investors)
+- **What do they gain from their public persona?** (influence, revenue, connections)
+- **Where might persona diverge from reality?** (every public figure has gaps)
+- **Do they have a comms team / ghostwriter?** (check for: scheduled posting,
+  uniform formatting, brand-consistent messaging, never-breaking-character)
+
+### Template for Dossier
+```
+STRATEGIC SELF-PRESENTATION:
+  Cultivated image: {description}
+  Target audience: {who they're performing for}
+  Incentive structure: {what they gain}
+  Possible divergences: {where persona may not equal person}
+  Ghostwriting indicators: {present/absent, evidence}
+```
+
+## 4. Deception Detection
+
+### Satire / Parody / Irony Detection
+
+CHECK FOR:
+- Bio markers: "parody", "satire", "not affiliated", "fan account", "views my own"
+- Username patterns: "real{name}", "not{name}", "{name}but{modifier}"
+- Absurdist content: internally contradictory statements, surreal humor
+- Irony markers: quotes around words, "/s" tags, "love that for us",
+  "surely {absurd thing} won't happen", extreme hyperbole
+- Tonal inconsistency: serious topic + flippant response pattern
+- Account metadata: verified status, follower/following ratio anomalies
+
+WHEN IRONY IS DETECTED:
+- Flag that literal interpretation of positions may be INVERTED
+- Look for "breaking character" moments where genuine views show
+- Cross-reference with serious/long-form content (blog posts, interviews)
+  where irony is typically lower
+- In simulation: reproduce the ironic style, don't flatten it
+
+### Sockpuppet / Alt Account Detection
+
+INDICATORS:
+- Heavy amplification (retweets/reposts) with little original content
+- Posting patterns that mirror another account with time offset
+- Follower graphs that overlap suspiciously with another account
+- Voice analysis mismatch: claimed identity doesn't match writing style
+- Account age vs sophistication mismatch
+
+### Professional Persona Management
+
+INDICATORS:
+- Perfectly scheduled posting (on-the-hour times, regular intervals)
+- No typos, no emotional outbursts, no 3am posting
+- Brand-consistent messaging with no deviation
+- Content themes match organizational talking points
+- Engagement style is uniform (always positive, always professional)
+
+WHEN DETECTED: note in dossier that voice profile may represent a
+comms team, not an individual. Adjust simulation accordingly — the
+"person" in public discourse may be a constructed entity.
+
+### Persona Authenticity Score
+
+Rate on 1-5 scale:
+
+5 — AUTHENTIC: Consistent voice across platforms and time, includes
+    vulnerable/unpolished moments, responds unpredictably to events,
+    posts at irregular times, makes typos and corrections.
+
+4 — MOSTLY AUTHENTIC: Generally consistent but some signs of curation.
+    Occasional tone shifts that suggest awareness of audience.
+
+3 — CURATED: Clear awareness of personal brand. Strategic topic selection.
+    Some genuine moments but overall managed presentation.
+
+2 — HEAVILY MANAGED: Strong indicators of professional management.
+    Few if any unguarded moments. Uniform style and messaging.
+
+1 — CONSTRUCTED: Likely ghostwritten or team-operated. Persona may not
+    represent any single individual's actual personality.
+
+## 5. Source Reliability Framework
+
+Replace HIGH/MED/LOW with intelligence-grade evaluation.
+
+### Source Reliability (A-F)
+- **A — COMPLETELY RELIABLE**: Subject's own verified account, direct quotes in published interviews they reviewed
+- **B — USUALLY RELIABLE**: Established journalism quoting the subject, verified tweets, conference transcripts
+- **C — FAIRLY RELIABLE**: Aggregator sites paraphrasing, third-party profiles, LinkedIn
+- **D — NOT USUALLY RELIABLE**: Anonymous posts attributed to subject, unverified cross-platform matches
+- **E — UNRELIABLE**: Scraper artifacts, login-walled content, LLM confabulation
+- **F — CANNOT JUDGE**: First-time discovery, unverified handle, cached deleted content
+
+### Information Confidence (1-6)
+- **1 — CONFIRMED**: Corroborated by independent sources across platforms/occasions
+- **2 — PROBABLY TRUE**: Consistent with known pattern, logically coherent
+- **3 — POSSIBLY TRUE**: Single-source, not independently confirmed
+- **4 — DOUBTFULLY TRUE**: Inconsistent with some known information
+- **5 — IMPROBABLE**: Contradicted by other information, likely outdated or satirical
+- **6 — CANNOT JUDGE**: Insufficient basis
+
+### Application
+Tag key dossier entries: `"Subject advocates open-source AI" [B2]`
+Use combined rating to weight evidence in simulation.
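+
+One way the combined rating could be turned into an evidence weight. This
+is a sketch under assumptions: the numeric weights are invented for
+illustration (no numeric scheme is defined here), and `evidence_weight`
+is a hypothetical helper that expects the tag at the end of the entry.
+
```python
import re

# Illustrative weights only; tune or replace them for real use.
RELIABILITY_WEIGHT = {"A": 1.0, "B": 0.8, "C": 0.6, "D": 0.4, "E": 0.2, "F": 0.5}
CONFIDENCE_WEIGHT = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4, 5: 0.2, 6: 0.5}

# Matches a trailing tag such as [B2]: one letter A-F, one digit 1-6.
TAG = re.compile(r"\[([A-F])([1-6])\]\s*$")


def evidence_weight(entry: str) -> float:
    """Parse the trailing reliability tag and return a combined weight."""
    match = TAG.search(entry)
    if match is None:
        raise ValueError(f"no reliability tag found in: {entry!r}")
    source, confidence = match.group(1), int(match.group(2))
    return RELIABILITY_WEIGHT[source] * CONFIDENCE_WEIGHT[confidence]


print(evidence_weight('"Subject advocates open-source AI" [B2]'))
```
+
+"Cannot judge" (F and 6) is given a middling weight here rather than
+zero, since unknown is not the same as unreliable.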
+
+## 6. Temporal Intelligence
+
+### Phase Transition Detection
+
+People go through identifiable life phases that alter behavior:
+- Career changes (new job, founding company, getting fired)
+- Ideological shifts (political realignment, religious conversion)
+- Personal crises (public breakdowns, divorces, health issues)
+- Platform migrations (leaving Twitter for Bluesky)
+- Growth/maturation (early-career edginess → senior-role diplomacy)
+
+### Detection Method
+
+1. **Timeline construction**: Plot key events and posting pattern changes
+2. **Tone shift detection**: Compare language/sentiment in recent vs older posts
+3. **Topic shift detection**: What they talked about 2 years ago vs now
+4. **Network shift detection**: Who they interact with now vs before
+5. **Self-reference detection**: "I used to think..." "I've changed my mind about..."
+
+### Phase-Aware Simulation
+
+When a phase transition is detected:
+- Weight post-transition data MUCH higher (2-3x)
+- Flag pre-transition data as historical context, not current personality
+- Note the transition in the dossier: "Major shift detected around {date}: {description}"
+- Consider whether the shift is genuine or performative (ACH)
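+
+The weighting rule can be sketched as a per-post multiplier. The `boost`
+of 2.5 is an assumed midpoint of the 2-3x range, and the dates and post
+texts are made up for illustration.
+
```python
from datetime import date


def post_weight(posted: date, transition: date, boost: float = 2.5) -> float:
    """Weight a post higher if it was written on or after the transition."""
    return boost if posted >= transition else 1.0


posts = [
    (date(2022, 3, 1), "older take"),
    (date(2024, 6, 15), "post-transition take"),
]
transition = date(2023, 9, 1)  # hypothetical detected shift
weighted = [(text, post_weight(day, transition)) for day, text in posts]
print(weighted)
```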
+
+## 7. Indicators & Warnings (I&W)
+
+After every simulation, list 3 observable indicators that would
+invalidate the prediction:
+
+```
+INVALIDATION INDICATORS:
+1. If @handle {does X instead of Y}, our {trait} estimate is wrong
+2. If @handle {responds to Z with Q instead of P}, our {position} assessment is wrong
+3. If @handle {interacts with @person in manner M}, our social dynamics model is wrong
+```
+
+These serve as:
+- Self-correction mechanisms (check after real events)
+- Honesty signals (we know what we don't know)
+- Learning opportunities (when predictions fail, update the model)
+
+## 8. Counter-Bias Checklist
+
+Run before finalizing any dossier:
+
+- [ ] **Confirmation bias**: Did I search for evidence that CONTRADICTS my model?
+- [ ] **Anchoring**: Am I over-weighted on the first information I found?
+- [ ] **Availability bias**: Am I over-weighted on viral/memorable moments?
+- [ ] **Mirror imaging**: Am I assuming the subject thinks like me?
+- [ ] **Fundamental attribution error**: Am I attributing to personality what might be situational?
+- [ ] **Recency bias**: Am I ignoring valid older evidence?
+- [ ] **Halo effect**: Is one strong trait coloring my assessment of other traits?
+- [ ] **Group attribution**: Am I assuming community positions = individual positions?
+
+If any answer suggests a possible bias (a "no" to the confirmation-bias
+question, or a "yes" or "maybe" to any of the others), revisit that
+section of the dossier.
+
+## Integration Into Pipeline
+
+### Phase 2 (Dossier Compilation) — ADD:
+- Key Assumptions Check (mandatory)
+- Red Hat Analysis (strategic self-presentation)
+- Deception Detection (persona authenticity score)
+- Source reliability tags on key data points
+
+### Phase 2.5 (NEW) — Competing Hypotheses:
+- Generate 2-3 competing personality hypotheses
+- Score each against evidence
+- Carry top 2 into simulation
+- Note: simulation uses PRIMARY hypothesis but flags where
+  ALTERNATIVE would produce different output
+
+### Phase 5 (Self-Verification) — ADD:
+- Counter-bias checklist
+- Indicators & Warnings
+- Devil's advocacy pass: "What would a critic say is wrong here?"
@@ -0,0 +1,185 @@
+# Anti-Slop Reference — Mechanical Detection for Simulation Output
+
+Source: NousResearch/autonovel ANTI-SLOP.md + slop-forensics + EQ-Bench Slop Score
+Adapted for personality simulation: slop in simulated speech is a dead giveaway that
+the output is LLM-generated, not human-generated. EVERY simulated utterance must pass
+this filter or the simulation fails the "indistinguishable from real" standard.
+
+## Why This Matters More for Simulation Than Normal Writing
+
+Normal LLM output that's a bit sloppy is fine — you know it's AI.
+Simulated speech that contains slop BREAKS THE ILLUSION. If @eigenrobot's
+simulated tweet contains "delve" or "it's worth noting," anyone who follows
+him would instantly know it's fake. Slop detection is the minimum viable
+authenticity check.
+
+## Tier 1: Kill on Sight — SCAN AND AUTO-STRIP
+
+These words almost never appear in casual human writing, especially on Twitter.
+If ANY appear in simulated tweets/posts, the simulation has failed.
+
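+REGEX SCAN LIST (case-insensitive; join the lines before compiling):
+```
+\b(delve|utilize|leverage|facilitate|elucidate|embark|
+endeavor|encompass|multifaceted|tapestry|testament|paradigm|
+synergy|synergize|holistic|catalyze|catalyst|juxtapose|
+nuanced|realm|landscape|myriad|plethora)\b
+```
+Two entries need a context check after the regex hit: `leverage` counts
+only when used as a verb, and `landscape` only when used metaphorically.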
+
+On detection: REWRITE the sentence using the human alternative.
+Do not just swap the word — the sentence structure around slop words
+is usually sloppy too.
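+
+A minimal sketch of the Tier 1 scan, assuming a plain word list compiled
+into one case-insensitive regex. Only a subset of the list is shown, and
+the context-dependent entries are left out because a regex alone cannot
+decide them. `tier1_hits` is a hypothetical helper name.
+
```python
import re

# Subset of the Tier 1 list, kept short for illustration.
TIER1_WORDS = ["delve", "utilize", "facilitate", "tapestry", "myriad", "plethora"]

# \w* after each word also catches simple suffixes such as "delves" or "utilized".
TIER1 = re.compile(r"\b(?:" + "|".join(TIER1_WORDS) + r")\w*", re.IGNORECASE)


def tier1_hits(utterance: str) -> list[str]:
    """Return every Tier 1 match found in a simulated utterance."""
    return [m.group(0).lower() for m in TIER1.finditer(utterance)]


print(tier1_hits("Let's delve into the myriad ways this fails"))
print(tier1_hits("lol no"))
```
+
+The scan only reports hits; the rewrite of the surrounding sentence still
+has to happen in a separate step.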
+
+## Tier 2: Suspicious in Clusters — COUNT PER PERSON
+
+These are fine alone. Three in one person's simulated output = rewrite.
+
+```
+\b(robust|comprehensive|seamless|cutting-edge|innovative|streamline|
+empower|foster|enhance|elevate|optimize|scalable|pivotal|intricate|
+profound|resonate|underscore|harness|navigate|
+cultivate|bolster|galvanize|cornerstone|game-changer)\b
+```
+`navigate` counts only when used metaphorically.
+
+Count per simulated person. If count >= 3: flag and rewrite.
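+
+The per-person count could look like the sketch below. It assumes the
+simulation output is available as (speaker, text) pairs and uses only a
+subset of the Tier 2 list; `tier2_flags` is a hypothetical helper.
+
```python
import re
from collections import Counter

# Subset of the Tier 2 list, kept short for illustration.
TIER2_WORDS = ["robust", "comprehensive", "seamless", "pivotal", "foster", "enhance"]
TIER2 = re.compile(r"\b(?:" + "|".join(TIER2_WORDS) + r")\b", re.IGNORECASE)


def tier2_flags(posts: list[tuple[str, str]], limit: int = 3) -> list[str]:
    """posts holds (speaker, text) pairs; return speakers at or over the limit."""
    counts = Counter()
    for speaker, text in posts:
        counts[speaker] += len(TIER2.findall(text))
    return sorted(s for s, n in counts.items() if n >= limit)


sample = [
    ("a", "robust and seamless"),
    ("a", "a pivotal moment"),
    ("b", "robust"),
]
print(tier2_flags(sample))
```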
+
+## Tier 3: Filler Phrases — DELETE ALL
+
+These add zero information. No human tweets these.
+
+SCAN LIST (match as case-insensitive substrings unless noted):
+```
+- "it's worth noting"
+- "important to note"
+- "notably"
+- "interestingly"
+- "let's dive into"
+- "let's explore"
+- "as we can see"
+- "as mentioned earlier"
+- "in conclusion"
+- "to summarize"
+- "furthermore"
+- "moreover"
+- "additionally" (at start of sentence)
+- "in today's"
+- "it goes without saying"
+- "when it comes to"
+- "in the realm of"
+- "one might argue"
+- "it could be suggested"
+- "this begs the question"
+- "a comprehensive approach"
+- "a holistic approach"
+- "a nuanced approach"
+- "not just X, but Y" (a pattern, not a literal substring; the #1 LLM rhetorical crutch)
+```
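+
+A sketch of the Tier 3 scan, assuming plain lowercase substring matching
+for the fixed phrases and a regex for "not just X, but Y", which has no
+fixed wording. Only a few phrases are included, and `tier3_hits` is a
+hypothetical helper. Curly apostrophes would need normalizing first.
+
```python
import re

# Subset of the filler list, kept short for illustration.
FILLERS = ["it's worth noting", "important to note", "let's dive into", "in conclusion"]

# "not just ... but" within a single sentence (no ., ! or ? in between).
NOT_JUST_BUT = re.compile(r"\bnot just\b[^.!?]*?\bbut\b", re.IGNORECASE)


def tier3_hits(utterance: str) -> list[str]:
    """Return the filler phrases and crutch patterns present in an utterance."""
    lowered = utterance.lower()
    hits = [phrase for phrase in FILLERS if phrase in lowered]
    if NOT_JUST_BUT.search(utterance):
        hits.append("not just X, but Y")
    return hits


print(tier3_hits("It's worth noting this is not just a tool, but a movement"))
```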
+
+## Rhetorical Slop — The Hardest to Catch
+
+These pass vocabulary checks and mechanical verification but still read as
+LLM-generated because the STRUCTURE is too polished. This is the deepest
+layer of slop — the instruct model's training to produce "satisfying" output.
+
+### Parallel Antithesis
+"The most X are... The most Y are..."
+"It's not about X. It's about Y."
+Every simulated tweet that contains a balanced two-part rhetorical structure
+should be checked: would this person actually construct that parallelism,
+or would they just say the second half and trust you to get it?
+FIX: delete the setup. Keep only the punchline half.
+
+### "Not X, Not Y, But Z" / "Not Just X, But Y"
+The #1 LLM rhetorical crutch. Appears in almost every simulation.
+FIX: just say Z. Delete the negations.
+
+### "Show Me X and I'll Show You Y"
+Rhetorical formula that reads like a book blurb or TED talk.
+No one tweets like this unless they're deliberately performing rhetoric.
+FIX: state it flat. "Every community that works has a shared enemy" not
+"Show me a thriving community and I'll show you..."
+
+### Clean Escalating Lists
+"First it was A, then B, then C, now D" — four perfectly escalating steps.
+Real people do 2 steps and trail off, or skip to the end, or lose the thread.
+FIX: cut to 2 steps max. Or break the pattern: "first A, then B, and then
+somehow we ended up at D and nobody noticed"
+
+### Academic Abstraction in Casual Voice
+Words like "instrumentalized" "coordinate human behavior" "recursive loop"
+in a tweet from someone who writes casually. The vocabulary is from papers,
+not from posting.
+FIX: use the word they'd actually reach for. "coordinate human behavior" →
+"get people to do stuff." If the plain version sounds dumb, maybe the take
+itself is thinner than the fancy words made it seem.
+
+### The "Every Tweet Is A Banger" Problem
+The deepest slop: every simulated utterance is GOOD. Considered. Structured.
+Satisfying. Real Twitter feeds are more like 70% mid, 20% boring, 10%
+brilliant (a rule of thumb, not a measured split).
+The simulation should include:
+- Half-finished thoughts ("idk if this makes sense but")
+- Trailing off ("wait actually nvm")
+- Boring logistical tweets ("anyone know a good dentist in brooklyn")
+- Self-interruptions ("ok this is getting long")
+- Acknowledgments that add nothing ("lol yeah" "hmm" "fair")
+If every tweet in the simulation could be screenshot'd as a banger,
+the simulation is too polished to be real.
+
+## Structural Slop Patterns — CHECK IN SIMULATION OUTPUT
+
+### Pattern: Identical Sentence Structure Across Speakers
+If two or more simulated people use the same sentence structure
+(e.g., "The thing about X is Y"), the simulation has failed voice
+differentiation. Real people have different syntactic habits.
+
+### Pattern: Topic Sentence Machine
+If a simulated post follows: topic sentence → elaboration → example → wrap-up,
+it's LLM structure, not human. Real tweets are: punchline first, or tangent,
+or one-liner, or trailing thought.
+
+### Pattern: Symmetry Addiction
+If the conversation has neat equal turns, balanced perspectives, everyone
+getting the same number of posts — that's not real. Real conversations
+are asymmetric. Someone dominates. Someone lurks. Someone gets interrupted.
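+
+Turn symmetry is one of the easier structural patterns to check
+mechanically. A minimal sketch, assuming the conversation is reduced to
+an ordered list of speaker names; `turns_look_symmetric` and its
+`tolerance` parameter are hypothetical.
+
```python
from collections import Counter


def turns_look_symmetric(speakers: list[str], tolerance: int = 0) -> bool:
    """True when every speaker has (nearly) the same number of turns."""
    counts = Counter(speakers).values()
    return len(counts) > 1 and max(counts) - min(counts) <= tolerance


print(turns_look_symmetric(["a", "b", "c"] * 4))
print(turns_look_symmetric(["a", "a", "a", "b", "a", "c", "a"]))
```
+
+This only catches equal turn counts; balanced perspectives and
+interruptions still need a reader or a judge model.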
+
+### Pattern: The Hedge Parade
+"This approach may potentially help improve..." — no human tweets like this.
+Either commit to the statement or don't make it.
+
+### Pattern: Em Dash Overload
+Count em dashes (—) per person. If >2 per post on average, flag it.
+Most people use them sparingly or not at all.
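+
+The em dash count is a one-liner per person. A sketch, assuming each
+person's simulated posts are available as a list of strings; the function
+names are hypothetical and the default limit mirrors the threshold above.
+
```python
EM_DASH = "\u2014"


def em_dash_rate(posts: list[str]) -> float:
    """Average number of em dashes (U+2014) per post for one person."""
    if not posts:
        return 0.0
    return sum(post.count(EM_DASH) for post in posts) / len(posts)


def flag_em_dashes(posts: list[str], limit: float = 2.0) -> bool:
    """Flag a person whose average exceeds the limit."""
    return em_dash_rate(posts) > limit


heavy = ["a\u2014b\u2014c", "d\u2014e\u2014f\u2014g"]
print(em_dash_rate(heavy), flag_em_dashes(heavy))
```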
+
+### Pattern: Sycophantic Agreement Flow
+If the conversation flows: A says thing → B says "great point, and also..." →
+C says "building on that..." — that's instruct-model conversation, not human.
+Real conversations have: disagreement, misunderstanding, tangents, ignoring,
+one-upping, and sometimes just "lol."
+
+### Pattern: Uniform Register
+If all simulated people sound like they're writing at the same education level
+with the same formality — the simulation failed. Real people have wildly different
+registers. A shitposter and an academic should sound nothing alike.
+
+## Integration: Mechanical Slop Scan
+
+Run BEFORE subjective discriminator scoring, alongside emoji/length/caps checks.
+
+```
+For each simulated utterance:
+  1. Scan for Tier 1 words → auto-rewrite if found
+  2. Count Tier 2 words per person → flag if >= 3
+  3. Scan for Tier 3 filler phrases → auto-delete
+  4. Check for structural patterns:
+     - Same sentence structure across speakers?
+     - Topic-sentence-machine structure?
+     - Symmetric turn-taking?
+     - Hedge parade?
+     - Em dash count?
+     - Sycophantic flow?
+  5. If ANY Tier 1 found or ANY structural pattern detected: 
+     FAIL the utterance and regenerate
+```
+
+This scan is MECHANICAL. It cannot be vibes-scored. The words are either
+there or they're not. Run it every time, no exceptions.
@@ -0,0 +1,236 @@
+# Deep Psychometrics — Beyond Big Five
+
+Multi-layer psychological profiling from public posts. Each layer adds
+a dimension to the personality model, making simulations more nuanced
+and predictions more accurate.
+
+## The Profiling Stack
+
+| Layer | What It Measures | Tool/Method | Accuracy | Min Posts |
+|-------|-----------------|-------------|----------|-----------|
+| Big Five (OCEAN) | Core personality traits | RoBERTa embeddings + BiLSTM | AUROC 0.78-0.82 | 30-50 |
+| Moral Foundations | Ethical intuitions | eMFDscore (pip) | Validated dictionary | 20+ |
+| Schwartz Values | Core value priorities | DeBERTa on ValueEval | F1 0.56 (macro) | 20+ |
+| Cognitive Style | Thinking patterns | AutoIC + LIWC features | r=0.70-0.82 doc-level | 20+ |
+| Narrative Framing | How they frame issues | GPT-4 few-shot | F1 ~70% | 10+ |
+| Behavioral Metadata | Non-text patterns | Feature extraction | r=0.29-0.40 per trait | 20+ |
+
+## Layer 1: Big Five Personality (Foundation)
+
+### Accuracy Bounds (peer-reviewed)
+- AUROC 0.78-0.82 with RoBERTa embeddings + BiLSTM (JMIR 2025)
+- Per-trait binary accuracy: O=0.637, C=0.602, E=0.620, A=0.590, N=0.620
+- Meta-analytic correlations (Azucar 2018, 16 studies):
+  Extraversion r=0.40, Openness r=0.39, Conscientiousness r=0.35,
+  Neuroticism r=0.33, Agreeableness r=0.29
+- These hit the "personality coefficient" ceiling of r=0.30-0.40 —
+  digital footprints are as predictive as any behavioral measure
+
+### What Actually Works
+- Fine-tuned embeddings >> zero-shot LLMs. GPT-4o zero-shot is UNRELIABLE.
+- RoBERTa embeddings are free and nearly as good as OpenAI embeddings
+- Aggregation across posts is essential — single posts are noise
+- 30-50 posts of ~90 words each = practical minimum
+- Training data: PANDORA Reddit corpus (1568 users, ~935K posts)
+
+### For The Simulator (without running models)
+Since we can't fine-tune per-simulation, use LLM-as-rater with caveats:
+- Provide 10-20 actual posts as evidence
+- Ask for trait estimation with reasoning, not just scores
+- Anchor with the adjective-based method (see prediction-engine.md)
+- Frame estimates as ranges, not points: "Openness: HIGH (0.7-0.9)"
+- Known bias: LLMs overestimate agreeableness and underestimate neuroticism
+
+### Key Insight: LLMs Already Know Public Figures
+Scientific Reports (2024): GPT-3's semantic space already encodes
+perceived personality of public figures from their names alone. For
+famous people, the LLM's latent knowledge is a STARTING POINT that
+OSINT data confirms or corrects.
+
+## Layer 2: Moral Foundations (Ethical Compass)
+
+Jonathan Haidt's Moral Foundations Theory. Six foundations:
+
+| Foundation | Liberal emphasis | Conservative emphasis |
+|-----------|-----------------|---------------------|
+| Care/Harm | ★★★ HIGH | ★★ MODERATE |
+| Fairness/Cheating | ★★★ HIGH | ★★ MODERATE |
+| Loyalty/Betrayal | ★ LOW | ★★★ HIGH |
+| Authority/Subversion | ★ LOW | ★★★ HIGH |
+| Sanctity/Degradation | ★ LOW | ★★★ HIGH |
+| Liberty/Oppression | ★★ MODERATE | ★★ MODERATE |
+
+### Tool: eMFDscore
+```
+pip install emfdscore
+# GitHub: github.com/medianeuroscience/emfdscore
+# Built on spaCy, GPL-3.0
+```
+
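+Output per post: scores for each of the five foundations eMFD covers
+(Care, Fairness, Loyalty, Authority, Sanctity), each with a virtue and a
+vice dimension. Liberty/Oppression is not in the dictionary and must be
+inferred separately.
+Aggregate across 20+ posts → 10-dimensional moral profile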
+
+### Application to Simulation
+Moral foundations predict:
+- What topics trigger emotional responses
+- What arguments they find persuasive vs repulsive
+- How they frame political/social issues
+- Who they instinctively ally with vs oppose
+- What kind of content they share/amplify
+
+Example: High Loyalty/Authority person will defend their tribe even when
+wrong. High Care/Fairness person will break from their tribe on justice
+issues. This shapes conversation dynamics.
+
+### For The Simulator (without running eMFDscore)
+Infer moral foundations from:
+- Political positions and framing in their posts
+- What they get angry about vs what they celebrate
+- Who they defend and who they attack
+- Key moral vocabulary: "protect", "fair", "loyal", "respect", "pure", "free"
+
+## Layer 3: Schwartz Values (Core Motivations)
+
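+Schwartz's refined theory distinguishes 19 values arranged on a circular
+continuum (adjacent values are compatible, opposite values are in
+tension). The ten basic values group into four higher-order poles: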
+
+**Self-Transcendence** ↔ **Self-Enhancement**
+- Universalism, Benevolence ↔ Power, Achievement
+
+**Openness to Change** ↔ **Conservation**
+- Self-Direction, Stimulation, Hedonism ↔ Tradition, Conformity, Security
+
+### SemEval-2023 Task 4 Results
+- Best macro-F1: 0.56 (ensemble of 12 DeBERTa/RoBERTa models)
+- Most reliable: universalism (nature), security, power
+- Least reliable: stimulation, hedonism, humility
+- Dataset: 9,324 annotated arguments, available via Touché
+
+### Key Finding: Value Perception Is Subjective
+Epstein et al. (2026): human inter-rater agreement on values is only r=0.201.
+Fine-tuned GPT-4o reaches r=0.294 — BETTER than human-human agreement.
+Personalized models reach r=0.334.
+
+### For The Simulator
+Values predict MOTIVATION — why someone holds positions, not just what
+positions they hold. Two people with the same political stance may have
+completely different underlying values:
+- "I support open source because FREEDOM" (Self-Direction)
+- "I support open source because FAIRNESS" (Universalism)
+- "I support open source because it WORKS BETTER" (Achievement)
+Same position, different framing, different behavioral predictions.
+
+## Layer 4: Cognitive Style (How They Think)
+
+### Integrative Complexity (AutoIC)
+Measures differentiation (seeing multiple perspectives) and integration
+(synthesizing perspectives into coherent frameworks).
+
+- Low IC: black-and-white thinking, strong convictions, simple language
+- High IC: nuanced, sees multiple sides, hedging, complex sentences
+
+AutoIC (Conway et al.): 3,500+ complexity-relevant root words/phrases,
+13 dictionary categories, validated r=0.70-0.82 at document level.
+
+**WARNING**: LIWC's "analytic thinking" correlates only r=0.14 with actual
+integrative complexity. Don't use LIWC's score as a proxy.
+
+### Computational Indicators of Cognitive Style
+Extractable from 20-50 posts without specialized tools:
+
+| Indicator | High complexity | Low complexity |
+|-----------|---------------|---------------|
+| Vocabulary diversity (TTR) | HIGH | LOW |
+| Avg sentence length | LONGER | SHORTER |
+| Causal connectives ("because", "therefore") | MORE | FEWER |
+| Hedging ("perhaps", "it seems") | MORE | FEWER |
+| Abstract vs concrete language | MORE ABSTRACT | MORE CONCRETE |
+| Question-asking | MORE | FEWER |
+| Binary framing ("always/never") | LESS | MORE |
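+
+Several of these indicators can be approximated with the standard
+library. This is a rough sketch: the marker lists contain only the
+examples from the table, tokenization is naive, and `style_indicators`
+is a hypothetical helper. Type-token ratio is sensitive to text length,
+so compare people on similar amounts of text.
+
```python
import re

HEDGES = ("perhaps", "it seems")
CAUSALS = ("because", "therefore")
BINARY = ("always", "never")


def style_indicators(posts: list[str]) -> dict[str, float]:
    """Rough text indicators from the table, computed over one person's posts."""
    text = " ".join(posts).lower()
    words = re.findall(r"[a-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
        "avg_sentence_len": len(words) / len(sentences) if sentences else 0.0,
        "hedges": sum(text.count(h) for h in HEDGES),
        "causal_connectives": sum(text.count(c) for c in CAUSALS),
        "binary_framing": sum(text.count(b) for b in BINARY),
        "questions": sum(p.count("?") for p in posts),
    }


sample = ["Perhaps it works because it is simple.", "It never fails? Never."]
print(style_indicators(sample))
```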
+
+### For The Simulator
+Cognitive style directly shapes VOICE:
+- High IC person: longer posts, more caveats, "on the other hand"
+- Low IC person: punchy takes, strong assertions, no hedging
+- This is one of the strongest differentiators between similar-sounding people
+
+## Layer 5: Narrative Framing (Their Lens on Reality)
+
+How someone frames an issue reveals deep cognitive and value patterns.
+
+### Common Frames (Semetko & Valkenburg)
+- **Conflict**: issue as battle between opposing sides
+- **Human interest**: personal stories, emotional impact
+- **Economic**: costs, benefits, financial impact
+- **Morality**: right vs wrong, ethical principles
+- **Attribution of responsibility**: who's to blame / who should fix it
+
+### Detection
+GPT-4 few-shot with frame definitions achieves F1=70.4%
+Best for diverse topics where fine-tuned models are too narrow
+
+### For The Simulator
+Framing predicts:
+- How they'll react to news (through which lens)
+- What aspects they'll emphasize in conversation
+- What arguments they'll find compelling
+- Whether they personalize or systematize issues
+
+Example: Same AI safety event, different frames:
+- Conflict framer: "The open vs closed battle heats up"
+- Economic framer: "This will cost the industry billions"
+- Moral framer: "This is irresponsible and dangerous"
+- Attribution framer: "The regulators need to step in"
+
+## Layer 6: Behavioral Metadata (Non-Text Signals)
+
+Extractable from X API / Bluesky AT Protocol without NLP:
+
+| Feature | What It Reveals |
+|---------|----------------|
+| Posting time distribution | Timezone, sleep patterns, work schedule |
+| Reply vs original ratio | Conversational vs broadcast personality |
+| Emoji frequency & types | Emotional expression style |
+| Hashtag usage | Community identification, signal boosting |
+| Media attachment rate | Visual vs text orientation |
+| Thread length | Depth of engagement preference |
+| Retweet/repost ratio | Amplifier vs creator |
+| Average post length | Conciseness vs verbosity |
+| Response latency | Impulsiveness vs deliberation |
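+
+A few of these features, sketched over a simplified post record. The
+`Post` dataclass is a hypothetical stand-in; real X API or AT Protocol
+payloads have different shapes and would need mapping first. Peak hour
+is reported in whatever timezone the timestamps carry.
+
```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    """Hypothetical minimal post record, not an actual API schema."""
    created_at: datetime
    text: str
    is_reply: bool
    is_repost: bool


def behavioral_features(posts: list[Post]) -> dict[str, object]:
    """Compute a handful of non-text features from the table."""
    n = len(posts)
    originals = [p for p in posts if not p.is_repost]
    k = len(originals)
    hours = Counter(p.created_at.hour for p in posts)
    return {
        "reply_ratio": sum(p.is_reply for p in originals) / k if k else 0.0,
        "repost_ratio": (n - k) / n if n else 0.0,
        "avg_post_length": sum(len(p.text) for p in originals) / k if k else 0.0,
        "peak_hour": hours.most_common(1)[0][0] if hours else None,
    }


timeline = [
    Post(datetime(2025, 1, 1, 3, 0), "hello", True, False),
    Post(datetime(2025, 1, 2, 3, 30), "hi there", False, False),
    Post(datetime(2025, 1, 3, 14, 0), "", False, True),
]
print(behavioral_features(timeline))
```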
+
+### Trait Correlations (meta-analytic)
+- **Extraversion**: more posts, more friends, more photos, more group activity
+- **Neuroticism**: more self-disclosure, more passive consumption, more late-night posting
+- **Agreeableness**: fewer swear words, more positive emotion, more supportive replies
+- **Conscientiousness**: more regular posting patterns, more task-oriented content
+- **Openness**: more diverse topics, more original content, larger networks
+
+## Putting It All Together: The Deep Dossier
+
+At high fidelity, compile a multi-layer profile:
+
+```
+PSYCHOMETRIC PROFILE: @handle
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+Big Five: O[HIGH] C[MED] E[HIGH] A[LOW] N[LOW]
+  Evidence: {real quotes showing each trait}
+
+Moral Foundations: Care★★ Fair★★★ Loyal★ Auth★ Sanct★ Liberty★★★
+  Evidence: {what they get angry/excited about}
+
+Values: Self-Direction dominant, Achievement secondary
+  Evidence: {how they justify their positions}
+
+Cognitive Style: HIGH integrative complexity
+  Evidence: {hedging patterns, nuanced takes, sentence complexity}
+
+Dominant Frame: Attribution of Responsibility
+  Evidence: {they consistently focus on who's to blame}
+
+Behavioral: Night owl, reply-heavy, low emoji, threads > one-shots
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+```
+
+This multi-layer profile makes predictions much more nuanced than
+Big Five alone. It tells you not just WHAT someone will say but
+WHY they'll say it and HOW they'll frame it.
@@ -0,0 +1,170 @@
+# GEPA Evolution — Automated Self-Improvement via hermes-agent-self-evolution
+
+## What This Is
+
+The hermes-agent-self-evolution repo (NousResearch/hermes-agent-self-evolution)
+uses DSPy + GEPA (Genetic-Pareto Prompt Evolution) to automatically evolve
+Hermes Agent skills. GEPA is an ICLR 2026 Oral paper — it reads EXECUTION
+TRACES to understand WHY things fail, then proposes targeted mutations.
+
+This means: we can point GEPA at the worldsim skill and automatically evolve
+every component — simulation instructions, anti-slop rules, star thread
+methodology, mechanical verification checklist, dossier templates — using
+our own simulation outputs scored against real data as the eval signal.
+
+The recursive self-improvement pipeline we built manually (log failures →
+promote patterns → update rules) can be AUTOMATED via GEPA.
+
+## How It Applies to WorldSim
+
+### What GEPA Evolves (text, not weights)
+GEPA evolves the TEXT of prompts and instructions. For worldsim, that means:
+
+| Target | What Gets Evolved | Eval Signal |
+|--------|------------------|-------------|
+| SKILL.md | Immersion protocol, pipeline instructions | Simulation quality scores |
+| star-thread.md | Methodology for finding star threads | Thread-to-voice accuracy |
+| anti-slop.md | Slop word lists, structural patterns | Slop detection recall/precision |
+| simulation-engine.md | Platform formats, conversation dynamics | Voice fidelity scores |
+| adversarial-refinement.md | Mechanical check thresholds, GAN loop | Pre vs post refinement delta |
+| prediction-engine.md | Forecasting methodology | Prediction Brier scores |
+| dossier template | Profile structure and fields | Profile quality scores |
+
+### The Eval Dataset
+Built from worldsim's own outputs + real data:
+
+1. **Voice fidelity pairs**: (simulated post, real post from same person) →
+   LLM-as-judge scores similarity 0-1
+2. **Mechanical check logs**: what did the checks catch? what slipped through?
+3. **Prediction accuracy**: tracked predictions scored against reality
+4. **Held-out tests**: predicted tweets vs actual tweets
+5. **Turing test results**: could the discriminator tell real from fake?
+6. **User corrections**: any time the user catches something the system missed
+   (like the emoji fabrication incident — that's the richest signal)
+
+### The GEPA Loop for WorldSim
+
+```
+1. RUN worldsim simulation (creates execution traces)
+2. SCORE outputs against real data (voice, position, mechanical)
+3. LOG traces + scores + user feedback to eval dataset
+4. GEPA EVOLVES the skill component that had lowest scores
+   - Reads traces to understand WHY it scored low
+   - Proposes mutation to that specific reference file
+   - Tests mutation against held-out eval data
+   - If improved: create PR, human reviews
+5. REPEAT — each cycle makes the skill better
+```
+
+### Concrete Example
+
+GEPA discovers from traces that simulated conversations always have
+symmetric turn-taking (4/4/4). It reads the mechanical check log that
+caught this in 3 of the last 5 simulations. It reads the current
+simulation-engine.md and sees the conversation architecture section.
+It proposes a mutation:
+
+OLD: "Opening Moves (1-3 posts) → Development (4-8 posts) → Peak → Resolution"
+NEW: "Opening: most impulsive person posts. Others join ASYMMETRICALLY — one person
+gets 40-50% of turns, one gets 15-20%, others fill the rest. The ratio should
+match their real reply-to-original ratios from the dossier."
+
+This mutation gets tested against the next 5 simulations. If symmetry
+violations drop and voice scores don't decrease, it gets merged.
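+
+A mechanical check for the symmetry failure above could look like the
+following sketch. The function names, the `(speaker, text)` turn format,
+and the 0.10 spread threshold are illustrative assumptions, not part of
+the worldsim codebase.
+
+```python
+from collections import Counter
+
+
+def turn_shares(turns):
+    """Return each speaker's share of turns as a fraction of the total."""
+    counts = Counter(speaker for speaker, _text in turns)
+    total = sum(counts.values())
+    return {speaker: n / total for speaker, n in counts.items()}
+
+
+def is_symmetric(turns, max_spread=0.10):
+    """Flag a conversation whose turn shares are nearly equal.
+
+    max_spread is a hypothetical threshold: the largest allowed gap
+    between the most and least active speaker before the conversation
+    stops counting as symmetric.
+    """
+    shares = turn_shares(turns).values()
+    return max(shares) - min(shares) <= max_spread
+
+
+# A 4/4/4 conversation is flagged; a 6/2/1 conversation is not.
+even = [(speaker, "...") for speaker in "abc" * 4]
+skewed = [("a", "...")] * 6 + [("b", "...")] * 2 + [("c", "...")]
+print(is_symmetric(even), is_symmetric(skewed))
+```
+
+A stricter variant would compare each share against the reply-to-original
+ratios stored in the dossier, as the proposed mutation suggests.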
+
+## Setup
+
+```bash
+# Clone the evolution repo
+git clone https://github.com/NousResearch/hermes-agent-self-evolution.git
+cd hermes-agent-self-evolution
+pip install -e ".[dev]"
+
+# Point at hermes-agent repo
+export HERMES_AGENT_REPO=~/.hermes
+
+# Evolve the worldsim skill specifically
+python -m evolution.skills.evolve_skill \
+    --skill hermes-simulator \
+    --iterations 10 \
+    --eval-source sessiondb
+```
+
+## What Makes This Different From Manual Self-Improvement
+
+The manual pipeline (references/recursive-self-improvement.md) requires the
+agent to notice its own failures and write rules. This has two problems:
+
+1. The agent shares weights with the generator — it's biased toward
+   approving its own output (the emoji incident proved this)
+2. Promoting patterns to rules is slow and requires 3+ occurrences
+
+GEPA addresses both, and adds two further safeguards:
+1. The eval signal comes from EXTERNAL data (real posts, user corrections,
+   mechanical checks) — not the agent's self-assessment
+2. Evolution happens per-iteration, not per-3-failures
+3. Mutations are tested against held-out data before merging
+4. The Pareto frontier maintains diversity — different strategies for
+   different types of people/conversations
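+
+The Pareto-frontier idea in item 4 can be illustrated with a small
+non-dominated filter. This is a sketch of the general technique, not
+GEPA's actual implementation; the candidate names and score keys are
+made up for the example.
+
+```python
+def dominates(a, b):
+    """True if score dict a is at least as good as b everywhere and better somewhere."""
+    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)
+
+
+def pareto_front(candidates):
+    """Keep every candidate that no other candidate dominates."""
+    return {
+        name: scores
+        for name, scores in candidates.items()
+        if not any(
+            dominates(other, scores)
+            for other_name, other in candidates.items()
+            if other_name != name
+        )
+    }
+
+
+candidates = {
+    "v1": {"voice": 0.70, "anti_slop": 0.90},
+    "v2": {"voice": 0.85, "anti_slop": 0.60},
+    "v3": {"voice": 0.65, "anti_slop": 0.55},  # worse than v1 on both axes
+}
+print(sorted(pareto_front(candidates)))
+```
+
+Here `v1` and `v2` both survive because each wins on a different axis,
+which is how a frontier can hold different strategies side by side.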
+
+## Integration Points
+
+### Eval Dataset Builder
+Mine rehoboam DB for training data:
+- simulation_logs table → execution traces
+- prediction_scores table → accuracy data
+- audit_log table → mechanical check results
+- user correction events → highest-value signal
+
+### Fitness Function for WorldSim
+```python
+def worldsim_fitness(simulation_output, real_data):
+    # NOTE: the helper functions called here are placeholders for scorers
+    # defined elsewhere; they are not part of this snippet.
+    scores = {}
+    # Voice fidelity: embed real + simulated, cosine similarity
+    scores["voice"] = embed_and_compare(simulation_output, real_data.tweets)
+    # Mechanical pass rate: what % of checks passed without fixes
+    scores["mechanical"] = mechanical_check_pass_rate(simulation_output)
+    # Anti-slop: 1.0 minus the fraction of words flagged as slop
+    slop_count = count_slop_matches(simulation_output)    # hypothetical helper
+    total_words = max(count_words(simulation_output), 1)  # hypothetical helper; avoids division by zero
+    scores["anti_slop"] = 1.0 - (slop_count / total_words)
+    # Structure: turn asymmetry, conversation naturalness
+    scores["structure"] = naturalness_score(simulation_output)
+    # Textual feedback for GEPA's reflective mutation
+    feedback = generate_textual_feedback(scores, simulation_output, real_data)
+    return aggregate_score(scores), feedback
+
+### The Key Insight: Textual Feedback
+GEPA's superpower is that it doesn't just get a scalar score — it gets
+TEXTUAL FEEDBACK explaining what went wrong. Our mechanical verification
+system already produces this:
+
+"@nosilverv avg 33.2 words vs real 15.6 (113% deviation) — SHORTEN"
+"Parallel antithesis detected: 'The most X... The most Y...' — STRIP"
+"Emoji rate 0% simulated but 10% real — OK (within tolerance)"
+
+This text goes directly into GEPA's reflective mutation pipeline. It reads
+these messages and proposes changes to the skill instructions that would
+prevent these specific failures in future simulations.
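+
+A feedback line like the first example could be produced by a helper
+along these lines. This is a hedged sketch: `length_feedback` and its
+`tolerance_pct` default are hypothetical, not the actual mechanical
+verification code.
+
+```python
+def length_feedback(handle, simulated_avg, real_avg, tolerance_pct=25.0):
+    """Build a feedback line comparing simulated and real average post length.
+
+    tolerance_pct is a hypothetical threshold, not a documented worldsim value.
+    """
+    deviation_pct = abs(simulated_avg - real_avg) / real_avg * 100
+    if deviation_pct <= tolerance_pct:
+        verdict = "OK (within tolerance)"
+    elif simulated_avg > real_avg:
+        verdict = "SHORTEN"
+    else:
+        verdict = "LENGTHEN"
+    return (
+        f"{handle} avg {simulated_avg:.1f} words vs real {real_avg:.1f} "
+        f"({deviation_pct:.0f}% deviation): {verdict}"
+    )
+
+
+print(length_feedback("@nosilverv", 33.2, 15.6))
+```
+
+The deviation is relative to the real average: (33.2 - 15.6) / 15.6 is
+about 1.13, which is where the 113% figure comes from.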
+
+## Evolution Targets by Priority
+
+1. **simulation-engine.md** — highest impact on output quality
+2. **anti-slop.md** — directly measurable, highest precision eval
+3. **star-thread.md** — hardest to evaluate but most impactful on voice
+4. **adversarial-refinement.md** — meta: improving the improvement system
+5. **SKILL.md pipeline instructions** — orchestration optimization
+6. **dossier template** — structure optimization
+7. **prediction-engine.md** — measurable via Brier scores
+
+## The Virtuous Cycle
+
+```
+More simulations → more eval data → better GEPA mutations
+→ better skill instructions → better simulations → more eval data → ...
+```
+
+This is the endgame: the worldsim skill evolves itself through use.
+Every simulation makes the next one better, not just through logged
+rules, but through automated evolutionary optimization of the
+instructions themselves. The system doesn't just learn WHAT went wrong;
+it rewrites its own instructions to prevent it.
@@ -0,0 +1,262 @@
+# Knowledge Archive — Per-Person Source Library + Expert Synthesis
+
+## The Problem With Profiles
+
+A profile is a SNAPSHOT. It says "this person believes X" but doesn't
+show you WHERE they said it, WHEN, in WHAT context, or HOW their
+thinking evolved. You can't cite a profile. You can't trace a claim
+back to a source. And when you're simulating a conversation about
+topic Z, the profile gives you everything about the person equally
+weighted — their views on AI and their views on cooking and their
+views on politics all crammed into the same context window.
+
+## The Archive
+
+For every person the system touches, build a LIBRARY:
+
+```
+~/.hermes/rehoboam/archives/{handle}/
+├── index.json              ← master index: all entries, metadata, embeddings
+├── sources/
+│   ├── x_tweets.jsonl      ← every tweet pulled, with ID, timestamp, URL, metrics
+│   ├── x_replies.jsonl     ← their replies (different voice register)
+│   ├── bluesky_posts.jsonl ← bluesky posts
+│   ├── blog_posts.jsonl    ← full text of blog posts with URLs
+│   ├── podcast_quotes.jsonl ← attributed quotes from transcripts
+│   ├── interviews.jsonl    ← quotes from news articles/interviews
+│   ├── reddit_comments.jsonl
+│   ├── github_comments.jsonl
+│   ├── goodreads_reviews.jsonl
+│   ├── threads_posts.jsonl
+│   └── other.jsonl         ← anything else (HN, Quora, etc.)
+├── topics/
+│   ├── ai_safety.jsonl     ← auto-clustered by topic
+│   ├── open_source.jsonl
+│   ├── consciousness.jsonl
+│   └── ...
+└── embeddings/
+    └── all_embeddings.npy  ← sentence-transformer vectors for semantic search
+```
+
+### Entry Format (every entry in every source file)
+
+```json
+{
+  "id": "unique_id",
+  "handle": "teknium",
+  "platform": "x",
+  "type": "tweet|reply|blog|podcast|interview|comment|review",
+  "text": "the actual text they said",
+  "url": "https://x.com/Teknium/status/1234567890",
+  "timestamp": "2026-04-05T21:40:48Z",
+  "context": {
+    "replying_to": "@otheruser's tweet about X",
+    "thread_position": 3,
+    "topic": "open source AI",
+    "source_title": "Lex Fridman Podcast #412"
+  },
+  "metrics": {
+    "likes": 234,
+    "retweets": 45,
+    "replies": 12
+  },
+  "topics": ["open_source", "ai_models", "hermes"],
+  "embedding_id": 42
+}
+```
+
+Every entry has a URL. Everything is traceable. Nothing is paraphrased
+without the original alongside it.
+
+## Collection Pipeline
+
+When `worldsim> profile @handle` or `worldsim> archive @handle` runs:
+
+### Step 1: Pull Everything
+Use every verified access method to collect raw materials:
+- X API: get max tweets (paginate with next_token to get hundreds)
+- nitter.cz: timeline content
+- ThreadReaderApp: historical threads
+- Bluesky: full post history
+- GitHub: issue comments, PR reviews, gists, README
+- Reddit: comment history
+- Blog/Substack: full posts (web_extract)
+- Podcast transcripts: attributed quotes
+- Interviews: quotes with attribution
+- Goodreads: reviews
+- Medium: RSS feed full text
+
+### Step 2: Deduplicate
+Same content appears across platforms (cross-posted tweets, syndicated
+blog posts). Deduplicate by content similarity, keep the richest version
+(the one with most metadata/context).
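+
+One way to sketch this step is near-duplicate detection with word
+shingles and Jaccard similarity. The 0.8 threshold and the `richness`
+heuristic (counting filled `context` and `metrics` fields) are
+assumptions chosen for illustration, not values from the pipeline.
+
+```python
+def shingles(text, n=3):
+    """Return the set of lowercase word n-grams in text."""
+    words = text.lower().split()
+    if len(words) < n:
+        return {tuple(words)}
+    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
+
+
+def jaccard(a, b):
+    """Jaccard similarity of two sets (0.0 when both are empty)."""
+    union = a | b
+    return len(a & b) / len(union) if union else 0.0
+
+
+def richness(entry):
+    """Count non-empty context and metrics fields as a rough metadata score."""
+    fields = {**entry.get("context", {}), **entry.get("metrics", {})}
+    return sum(1 for value in fields.values() if value not in (None, "", []))
+
+
+def deduplicate(entries, threshold=0.8):
+    """Greedy near-duplicate removal that keeps the richest copy of each text."""
+    kept = []
+    for entry in sorted(entries, key=richness, reverse=True):
+        sig = shingles(entry["text"])
+        if all(jaccard(sig, shingles(k["text"])) < threshold for k in kept):
+            kept.append(entry)
+    return kept
+```
+
+Visiting entries richest-first means that when two copies collide, the
+one already kept is the one with more metadata.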
+
+### Step 3: Topic Cluster
+Run lightweight topic classification on each entry:
+- Use the LLM or a simple keyword matcher to assign 1-3 topic tags
+- Cluster into topic files for fast retrieval
+- Topics are dynamic — new topics emerge from the data
+
+### Step 4: Embed
+Generate sentence-transformer embeddings for every entry.
+Store in numpy array for fast cosine similarity search.
+This enables semantic retrieval: "find everything @handle said about
+consciousness" even if they never used the word "consciousness."
+
+### Step 5: Index
+Build the master index.json with entry count, topic distribution,
+timestamp range, platform coverage, and quality metrics.
+
+## Context-Aware Retrieval
+
+This is the key. The archive might have 500 entries for a person.
+The context window can hold maybe 30-50 of them alongside all the
+other simulation context. You MUST retrieve selectively.
+
+### For Simulation
+When simulating @handle talking about topic X:
+
+```
+1. Semantic search: embed the current conversation context
+2. Retrieve top 10-15 entries by cosine similarity to context
+3. Also retrieve: 5 highest-engagement entries (their "greatest hits")
+4. Also retrieve: 3 most recent entries (freshness)
+5. Also retrieve: 2 entries that CONTRADICT the expected position
+   (prevents confirmation bias in the simulation)
+6. Deduplicate. Cap at 25-30 entries total.
+7. These become the "voice anchors" for generation.
+```
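+
+The retrieval steps above can be sketched as a single merge function.
+This assumes each entry carries its embedding inline under a `vector`
+key (the archive itself stores an `embedding_id`) and that the caller
+supplies a `contradicts` predicate; both are simplifications for the
+example, and the pure-Python cosine stands in for a vectorized search.
+
+```python
+import math
+
+
+def cosine(u, v):
+    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
+    dot = sum(a * b for a, b in zip(u, v))
+    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
+    return dot / norm if norm else 0.0
+
+
+def select_voice_anchors(entries, query_vec, contradicts,
+                         k_similar=15, k_hits=5, k_recent=3, k_contra=2, cap=30):
+    """Merge the four retrieval passes, dedupe by id, and cap the total.
+
+    contradicts is a caller-supplied predicate (hypothetical) that marks
+    entries cutting against the expected position.
+    """
+    by_similarity = sorted(entries, key=lambda e: cosine(e["vector"], query_vec), reverse=True)
+    by_likes = sorted(entries, key=lambda e: e["metrics"]["likes"], reverse=True)
+    # ISO 8601 UTC timestamps in one format sort correctly as plain strings.
+    by_recency = sorted(entries, key=lambda e: e["timestamp"], reverse=True)
+    contra = [e for e in by_similarity if contradicts(e)]
+
+    picks = (by_similarity[:k_similar] + by_likes[:k_hits]
+             + by_recency[:k_recent] + contra[:k_contra])
+    seen, anchors = set(), []
+    for entry in picks:
+        if entry["id"] not in seen:
+            seen.add(entry["id"])
+            anchors.append(entry)
+    return anchors[:cap]
+```
+
+With the default quotas the merged list never exceeds 25 entries, so the
+cap only matters if a caller raises the per-pass limits.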
+
+The simulation draws from SPECIFIC REAL QUOTES relevant to the current
+conversation. Not a generic profile. Not everything they've ever said.
+The 25 most relevant things they've said about THIS topic.
+
+### For Expert Synthesis
+When the user asks "who are the best minds on X and what have they said?":
+
+```
+1. Search ALL archived people's entries for topic X
+2. Rank by: entry quality × person expertise × relevance to query
+3. Return a synthesis with CITATIONS:
+
+   On the topic of AI consciousness:
+
+   @repligate argues that LLMs exhibit "simulacra of consciousness"
+   rather than consciousness itself, distinguishing between the
+   model's behavior and its substrate:
+     > "the question isn't whether GPT is conscious but whether the
+     > character it's simulating is conscious within the fiction"
+     — tweet, 2025-03-15 (2.4K likes)
+     https://x.com/repligate/status/...
+
+   @nickcammarata approaches it from a meditation/first-person
+   perspective, noting parallels between introspective practice
+   and interpretability:
+     > "observation changes the system being observed, in meditation
+     > and in interp"
+     — tweet, 2026-04-05 (2.9K likes)
+     https://x.com/nickcammarata/status/...
+
+   @tszzl is skeptical of the framing entirely:
+     > "consciousness discourse is philosophy cosplaying as engineering"
+     — tweet, 2025-11-22 (5.1K likes)
+     https://x.com/tszzl/status/...
+```
+
+Every claim attributed. Every quote sourced. Every link clickable.
+
+### For Grounding Predictions
+When predicting what @handle would say about event Y:
+
+```
+1. Retrieve all archive entries related to Y or adjacent topics
+2. Identify their PATTERN of response to similar events
+3. Ground the prediction in specific past statements:
+
+   PREDICTION: @handle would likely frame event Y through the lens
+   of [topic Z], based on:
+   - tweet [url]: "quote about Z" (2025-06-15)
+   - blog post [url]: "longer quote about Z" (2025-09-20)
+   - podcast [url]: "verbal quote about Z" (2026-01-10)
+   CONFIDENCE: 78% (3 consistent sources over 7 months)
+```
+
+## Incremental Updates
+
+The archive grows over time. Each time the person is profiled:
+1. Pull new content since last archive timestamp
+2. Append to source files
+3. Re-embed new entries only
+4. Update topic clusters
+5. Update index
+
+Don't rebuild from scratch. Append and re-index.
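+
+Steps 1 and 2 might be sketched as an append-only update of one source
+file. The helper name is hypothetical, and the sketch assumes entries
+follow the format above with `id` and ISO 8601 UTC `timestamp` fields.
+Re-embedding and re-indexing are left out.
+
+```python
+import json
+from pathlib import Path
+
+
+def append_new_entries(source_file, fetched):
+    """Append entries newer than the latest archived timestamp; return how many were added."""
+    path = Path(source_file)
+    existing = []
+    if path.exists():
+        with path.open(encoding="utf-8") as fh:
+            existing = [json.loads(line) for line in fh if line.strip()]
+    seen_ids = {e["id"] for e in existing}
+    # Same-format ISO 8601 UTC strings compare correctly as text.
+    last_seen = max((e["timestamp"] for e in existing), default="")
+    fresh = [e for e in fetched
+             if e["id"] not in seen_ids and e["timestamp"] > last_seen]
+    with path.open("a", encoding="utf-8") as fh:
+        for entry in fresh:
+            fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
+    return len(fresh)
+```
+
+Running the same fetch twice adds nothing the second time, which is what
+makes the update safe to repeat.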
+
+## Expert Table
+
+When you have 20+ archived people, build an expert table:
+
+```
+worldsim> experts "open source AI"
+
+EXPERT TABLE: open source AI
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+  @Teknium | 47 entries | voice: builder/practitioner
+    "we can prove that open approaches build better, more
+    trustworthy systems" — tweet, 2026-04-05
+    Latest: 2 hours ago | Stance: STRONG ADVOCATE
+
+  @repligate | 12 entries | voice: philosophical/theoretical
+    "open weights = accountability. you can't audit a black box"
+    — tweet, 2025-11-30
+    Latest: 3 days ago | Stance: ADVOCATE (principled)
+
+  @eigenrobot | 8 entries | voice: statistical/contrarian
+    "the open source premium is largely downstream of selection
+    effects in who contributes" — tweet, 2025-08-14
+    Latest: 1 week ago | Stance: SKEPTICAL OF FRAMING
+
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+  3 experts found | 67 total entries | synthesize? (y/n)
+```
+
+The table shows: who knows about this, what they've said, how recently,
+and what their stance is. All grounded in archived quotes with sources.
+
+## Integration With Simulation
+
+When the star thread + dossier + archive work together:
+
+```
+STAR THREAD: drives the core generation (what they're DOING)
+DOSSIER: provides constraints (psychometrics, voice metrics, baselines)
+ARCHIVE: provides GROUNDING (specific real quotes for this context)
+MECHANICAL CHECKS: verifies surface features (emoji, length, slop)
+```
+
+The archive prevents the simulation from drifting into generic territory.
+Instead of "this person would probably say something about open source,"
+it's "this person said THIS SPECIFIC THING about open source 3 weeks ago,
+and their simulation should be consistent with that while also being fresh."
+
+## The Overfitting Problem
+
+"Without overfitting to a particular material the new context doesn't call for."
+
+The retrieval system MUST be selective. If someone said 47 things about
+open source AI, and the current conversation is about AI regulation,
+don't dump all 47 open source quotes into context. Maybe 3 are relevant
+because they connect open source to regulation. Retrieve THOSE 3.
+
+The cosine similarity search handles this naturally — it matches the
+CURRENT conversation context against the archive and returns what's
+actually relevant, not everything tagged with a nearby topic.
+
+The anti-overfitting checklist:
+- Never load more than 25-30 archive entries per person into context
+- Weight by relevance to CURRENT conversation, not by general importance
+- Include at least 2 entries that contradict the expected position
+- Include at least 3 recent entries regardless of topic relevance (freshness)
+- If the conversation shifts topic mid-simulation, RE-RETRIEVE for new context
+- The archive is a LIBRARY you consult, not a script you follow
@@ -0,0 +1,321 @@
+# Mass Behavior Modeling — Communities, Clusters, Cascades
+
+Understanding individual behavior requires understanding the social
+ecosystem they exist in. This reference covers the macro layer:
+community detection, influence networks, audience modeling, and
+predicting how groups respond to events.
+
+## Why This Matters For Simulation
+
+Individual prediction accuracy: ~56-60%
+Individual-in-context prediction: significantly higher
+
+A person's behavior is constrained by their community. Knowing WHICH
+community they belong to, WHO influences them, and WHAT information
+ecosystem they're in makes individual predictions much sharper.
+
+Lewin's equation: B = f(P, E). This reference is about the E.
+
+## The Ecosystem Stack
+
+```
+Layer 5: AUDIENCE REACTION    — How would this person's audience respond?
+Layer 4: STANCE & SENTIMENT   — What positions do clusters hold?
+Layer 3: INFLUENCE NETWORKS   — Who spreads ideas to whom?
+Layer 2: COMMUNITY CLUSTERS   — Who groups together?
+Layer 1: SOCIAL GRAPH         — Who follows/interacts with whom?
+```
+
+## Layer 1: Social Graph Construction
+
+### Data Sources (by accessibility)
+
+| Source | Access | Quality | Tools |
+|--------|--------|---------|-------|
+| Bluesky AT Protocol | FREE, open, no auth | Excellent | atproto (pip) |
+| X/Twitter API | Bearer token, limited | Good but restricted | curl, tweepy |
+| Reddit | API with limits | Good for comments | PRAW (pip) |
+| GitHub | Free API | Great for tech people | PyGithub (pip) |
+| Web scraping | Fragile, TOS issues | Variable | Last resort |
+
+### Bluesky: The Open Gold Mine
+```python
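+# pip install atproto
+from atproto import Client
+
+# Unauthenticated reads of public data go through the public AppView host.
+# Assumption: the default Client() targets bsky.social, which expects a login,
+# so the base URL is passed explicitly here.
+client = Client("https://public.api.bsky.app")
+
+# Get follower graph (results are paginated; pass the returned cursor for more)
+followers = client.get_followers(actor="handle.bsky.social")
+following = client.get_follows(actor="handle.bsky.social")
+
+# Real-time firehose via Jetstream (no auth!)
+# wss://jetstream1.us-east.bsky.network/subscribe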
+```
+
+### Graph Types
+- **Follow graph**: who follows whom (directed, static-ish)
+- **Interaction graph**: who replies to / retweets whom (directed, dynamic)
+- **Mention graph**: who mentions whom (directed, weighted by frequency)
+- **Co-engagement graph**: who engages with the same content (undirected)
+
+Interaction graphs are more informative than follow graphs for predicting
+actual behavioral alignment.
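+
+A weighted interaction graph needs nothing more than edge counts to get
+started. The sketch below assumes a hypothetical event format with
+`from` and `to` handles; swap in NetworkX once you need graph algorithms.
+
+```python
+from collections import Counter
+
+
+def build_interaction_graph(events):
+    """Count directed interactions: (source, target) -> number of replies/reposts."""
+    return Counter((e["from"], e["to"]) for e in events if e["from"] != e["to"])
+
+
+def top_targets(graph, source, n=3):
+    """Accounts that `source` interacts with most, heaviest edge first."""
+    edges = [(target, weight) for (src, target), weight in graph.items() if src == source]
+    return sorted(edges, key=lambda pair: (-pair[1], pair[0]))[:n]
+
+
+events = [
+    {"from": "a", "to": "b"},
+    {"from": "a", "to": "b"},
+    {"from": "a", "to": "c"},
+    {"from": "b", "to": "a"},
+    {"from": "a", "to": "a"},  # self-replies (threads) are ignored
+]
+graph = build_interaction_graph(events)
+print(top_targets(graph, "a"))
+```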
+
+### Tools
+```
+pip install networkx python-igraph
+```
+NetworkX for prototyping (<100K nodes), igraph for production (millions).
+
+## Layer 2: Community Detection
+
+### Algorithms (ranked by quality)
+
+| Algorithm | Quality | Speed | Notes |
+|-----------|---------|-------|-------|
+| Leiden | Best | Fast | Guarantees connected communities |
+| Infomap | Excellent | Medium | Based on information theory |
+| Louvain | Good | Fastest | Can produce disconnected communities |
+| Label Propagation | Decent | Very fast | Non-deterministic |
+
+### The Meta-Library: CDLib
+```
+pip install cdlib
+```
+Wraps 50+ community detection algorithms in a unified API.
+Works on top of networkx/igraph. Highly recommended.
+
+```python
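+from cdlib import algorithms
+import networkx as nx
+
+G = nx.karate_club_graph()
+# leiden() needs the optional leidenalg + igraph packages installed
+communities = algorithms.leiden(G)   # returns a NodeClustering
+print(communities.communities)       # list of node-id lists, one per community
+# Also: louvain, infomap, label_propagation, angel, demon, etc.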
+```
+
+### What Communities Tell Us
+Each community in a social graph typically shares:
+- Ideological orientation
+- Topic interests
+- Information sources
+- Language patterns and in-group vocabulary
+- Reaction patterns to events
+
+Knowing which community someone belongs to immediately constrains
+predictions about their likely positions and reactions.
+
+## Layer 3: Influence Networks
+
+### Key Insight (Zhou et al., National Science Review 2024)
+Network centrality alone is INSUFFICIENT for predicting influence.
+Must combine structural position with behavioral features:
+- Posting frequency
+- Historical content virality
+- Response rate / engagement ratio
+- Content originality (original vs repost ratio)
+
+### Centrality Measures
+```python
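+import networkx as nx
+
+# Directed social graph; an edge A -> B means "A follows/replies to B".
+# Toy edges for illustration (eigenvector centrality raises on an empty graph).
+G = nx.DiGraph([("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("dave", "alice")])
+
+# Who has the most connections?
+degree = nx.degree_centrality(G)
+
+# Who bridges different communities?
+betweenness = nx.betweenness_centrality(G)
+
+# Who's connected to other well-connected people?
+eigenvector = nx.eigenvector_centrality(G, max_iter=1000)
+
+# PageRank, adapted from web search: directed influence flow
+pagerank = nx.pagerank(G)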
+```
+
+### Superspreader Identification (DeVerna et al., PLOS ONE 2024)
+Superspreaders of low-credibility content fall into three categories:
+1. **Pundits**: large following, high authority, original content
+2. **Media outlets**: institutional accounts, news organizations
+3. **Affiliated personal accounts**: connected to pundits/outlets
+
+For simulation: knowing who the superspreaders are in a person's
+network tells you what information they're likely exposed to.
+
+### Information Cascade Modeling
+```
+pip install ndlib  # Network Diffusion Library
+```
+
+NDlib models how information spreads through networks:
+- Independent Cascade Model
+- Linear Threshold Model
+- SIR/SIS epidemiological models adapted for info spread
+- Voter Model (opinion dynamics)
+- Sznajd Model (social influence)
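+
+To show what the first of these models does, here is a pure-Python
+sketch of the Independent Cascade process. It deliberately does not use
+the NDlib API; the adjacency-dict graph format, the uniform activation
+probability `p`, and the function names are assumptions for the example.
+
+```python
+import random
+
+
+def independent_cascade(graph, seeds, p=0.1, rng=None):
+    """Simulate one Independent Cascade run.
+
+    graph maps each node to the list of nodes it can influence.
+    Each newly activated node gets a single chance to activate each
+    inactive neighbour, succeeding with probability p.
+    """
+    rng = rng or random.Random()
+    active = set(seeds)
+    frontier = list(seeds)
+    while frontier:
+        next_frontier = []
+        for node in frontier:
+            for neighbour in graph.get(node, []):
+                if neighbour not in active and rng.random() < p:
+                    active.add(neighbour)
+                    next_frontier.append(neighbour)
+        frontier = next_frontier
+    return active
+
+
+def expected_reach(graph, seeds, p=0.1, runs=1000, seed=0):
+    """Average cascade size over repeated runs (Monte Carlo estimate)."""
+    rng = random.Random(seed)
+    total = sum(len(independent_cascade(graph, seeds, p, rng)) for _ in range(runs))
+    return total / runs
+```
+
+In practice `p` would vary per edge, for example scaled by interaction
+frequency or stance alignment between the two accounts.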
+
+## Layer 4: Stance & Sentiment Analysis
+
+### Ready-To-Use Models (HuggingFace)
+
+**Tweet Sentiment** (most reliable):
+```
+cardiffnlp/twitter-roberta-base-sentiment-latest
+# Labels: positive / negative / neutral
+```
+
+**Political Stance**:
+```
+kornosk/bert-election2020-twitter-stance-biden-KE-MLM
+kornosk/bert-election2020-twitter-stance-trump-KE-MLM
+launch/POLITICS  # left / center / right
+```
+
+**All-in-One Tweet NLP**:
+```
+pip install tweetnlp
+# Sentiment, emotion, hate speech, NER, topic classification
+```
+
+### Topic-Level Stance Tracking
+Combine BERTopic (dynamic topic modeling) with stance classifiers:
+1. Cluster posts into topics over time windows
+2. Classify stance per topic per community
+3. Track stance shifts over time
+4. Detect divergence between communities on emerging topics
+
+### PRISM Framework (ACL 2025)
+First framework for interpretable political bias embeddings.
+Two-stage: mine bias indicators → cross-encoder assigns structured scores.
+```
+github.com/dukesun99/ACL-PRISM
+```
+
+## Layer 5: Audience Modeling & Crowd Prediction
+
+### The Frontier: Predicting How Groups React
+
+Key papers and findings:
+
+**CReAM (WWW 2024)**: Predicts which of two posts gets more engagement.
+Uses LLM-generated features + FLANG-RoBERTa cross-encoder.
+Demonstrates crowd reaction IS predictable from content alone.
+
+**PopSim (Dec 2025)**: LLM multi-agent social network sandbox.
+Simulates content propagation dynamics using "Social Mean Field"
+for individual-population interaction. Reduces prediction error 8.82%.
+
+**Conditioned Comment Prediction (EACL 2026)**:
+KEY FINDING: behavioral traces (past posts) are BETTER than
+descriptive personas for conditioning LLMs to predict user behavior.
+This validates our OSINT approach: real data > personality labels.
+
+**DEBATE Benchmark (Oct 2025)**:
+WARNING: LLM agents converge opinions TOO QUICKLY vs real humans.
+SFT + DPO helps but gap remains. Real communities maintain
+disagreement longer than simulated ones.
+
+**Distributional vs Individual Prediction (PMC 2025)**:
+Group-level predictions are more reliable than individual ones.
+Predicting "65% of this community will react negatively" is more
+accurate than predicting "this specific person will react negatively."
+
+### Application to Simulation
+
+When simulating @person talking about event X, consider:
+1. What community does @person belong to?
+2. How is that community reacting to X? (distributional prediction)
+3. Where does @person sit within that community? (conformist vs contrarian)
+4. Who influences @person? What are THEY saying?
+5. How does @person's audience react to their take? (engagement prediction)
+
+This context makes individual predictions sharper.
+
+## Echo Chamber & Filter Bubble Detection
+
+### Technique
+1. Build interaction graph
+2. Run Leiden community detection
+3. For each community, aggregate stance on key issues
+4. Measure ideological homogeneity within communities
+5. Compare cross-community vs within-community content similarity
+6. High within + low cross = echo chamber
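+
+Steps 4 to 6 reduce to two small measurements once stances and
+similarity scores are available. The sketch below assumes stance labels
+per member and lists of pairwise content-similarity scores as inputs;
+both helper names are hypothetical.
+
+```python
+from statistics import mean
+
+
+def homogeneity(stances):
+    """Share of members holding the community's majority stance."""
+    majority = max(set(stances), key=stances.count)
+    return stances.count(majority) / len(stances)
+
+
+def echo_chamber_score(within_similarity, cross_similarity):
+    """Gap between within-community and cross-community content similarity."""
+    return mean(within_similarity) - mean(cross_similarity)
+
+
+print(homogeneity(["pro", "pro", "pro", "con"]))
+print(echo_chamber_score([0.8, 0.9], [0.2, 0.3]))
+```
+
+A community with high homogeneity and a large positive gap is a
+candidate echo chamber; where to draw the line is a judgment call.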
+
+### Tools
+```
+github.com/mminici/Echo-Chamber-Detection  # Cascade-based, CIKM 2022
+# Includes Brexit and VaxNoVax datasets
+```
+
+### What It Tells Us
+Knowing someone's echo chamber tells you:
+- What information they're exposed to
+- What they're NOT exposed to
+- How extreme their positions might be (isolation → radicalization)
+- Whether they'll encounter pushback or only agreement
+- How they'll react to information from outside their bubble
+
+## User Embeddings: "Find People Like @person"
+
+### Strategy
+1. Embed each user's recent N posts with sentence-transformers
+2. Average embeddings → user vector
+3. Use FAISS for similarity search
+4. Cluster users with HDBSCAN in embedding space
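+
+Steps 2 and 3 can be sketched without any dependencies. Here toy
+2-dimensional vectors stand in for sentence-transformer output and a
+brute-force scan stands in for FAISS; the function names are
+illustrative only.
+
+```python
+import math
+
+
+def mean_vector(vectors):
+    """Element-wise average of equal-length post embeddings."""
+    n = len(vectors)
+    return [sum(column) / n for column in zip(*vectors)]
+
+
+def cosine(u, v):
+    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
+    dot = sum(a * b for a, b in zip(u, v))
+    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
+    return dot / norm if norm else 0.0
+
+
+def most_similar(target, user_vectors, top_n=3):
+    """Rank other users by cosine similarity to target (brute force)."""
+    scored = [(other, cosine(user_vectors[target], vec))
+              for other, vec in user_vectors.items() if other != target]
+    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]
+
+
+user_vectors = {
+    "a": mean_vector([[1.0, 0.0], [0.8, 0.2]]),  # two posts averaged
+    "b": [0.9, 0.1],
+    "c": [0.0, 1.0],
+}
+print(most_similar("a", user_vectors))
+```
+
+Averaging throws away how varied a user's posts are, so two users with
+the same mean vector can still sound quite different.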
+
+### Best Models for Social Media Text
+```
+# General purpose (good baseline)
+sentence-transformers/all-mpnet-base-v2
+
+# Tweet-specific (better domain fit)
+cardiffnlp/twitter-roberta-base
+vinai/bertweet-base  # pretrained on 850M tweets
+```
+
+### Graph + Text Hybrid Embeddings
+```
+pip install karateclub
+```
+KarateClub provides Node2Vec, DeepWalk, Graph2Vec — embed users
+based on graph position. Combine with text embeddings for hybrid
+vectors that capture BOTH what someone says AND where they sit
+in the social network.
+
+## Practical Application to Simulation
+
+### For Individual Simulation (what we already do)
+Add ecosystem context to each dossier:
+- Which community cluster they belong to
+- Who their top influencers are (whom they retweet/amplify most)
+- Which echo chamber they are in (information environment)
+- How their community views the simulation topic
+
+### For Audience Simulation (new capability)
+When user asks "what would @person's audience say":
+1. Identify @person's follower community
+2. Sample representative voices from that community
+3. Model the DISTRIBUTION of responses, not just one response
+4. Include: cheerleaders, critics, joke-makers, lurkers
+5. Weight by typical engagement patterns
+
+### For Cascade Prediction (new capability)
+When user asks "how would this take spread":
+1. Model the initial tweet and its immediate network
+2. Predict which nodes amplify (based on stance alignment + influence)
+3. Estimate reach and engagement range
+4. Predict quote-tweet ratio (agreement vs dunking)
+
+## Recommended Minimal Stack
+
+```bash
+pip install networkx python-igraph leidenalg cdlib karateclub
+pip install sentence-transformers transformers tweetnlp
+pip install ndlib faiss-cpu hdbscan atproto
+```
+
+This gives you: graph construction, community detection, user embeddings,
+stance/sentiment analysis, diffusion simulation, similarity search,
+clustering, and Bluesky data access. All open source, all pip-installable.
@@ -0,0 +1,370 @@
+# OSINT Pipeline — Deep Intelligence Gathering
+
+Full-spectrum open source intelligence for building personality models.
+This goes beyond social media posts into visual identity, cross-platform
+footprints, and behavioral analysis.
+
+## Tool Arsenal
+
+| Tool | Use Case | Strength |
+|------|----------|----------|
+| `web_search` | Find anything, initial discovery | Fast, broad, indexed content |
+| `web_extract` | Pull full page content | Blogs, articles, profiles, PDFs |
+| `browser_navigate` + `browser_snapshot` | View live pages | Dynamic content, login walls |
+| `browser_vision` | Analyze what a page looks like | Layouts, visual identity, screenshots |
+| `vision_analyze` | Analyze any image by URL/path | Profile pics, post images, aesthetics |
+| `browser_get_images` | List all images on a page | Find images to feed to vision_analyze |
+| Yandex reverse image search | Find where an image appears | Identity verification, alt accounts |
+| `x-cli` (if available) | Direct Twitter API | Timelines, search, metadata |
+
+## Instagram Intelligence
+
+Instagram is CRITICAL for personality modeling — it reveals:
+- Visual identity and aesthetic preferences
+- Real-life social circles (tagged people, group photos)
+- Lifestyle signals (travel, food, hobbies, pets)
+- Caption voice (often different from Twitter voice)
+- Story highlights (curated self-image)
+- Bio links (cross-platform connections)
+
+### Viewing Instagram Profiles (VERIFIED APRIL 2026)
+
+**METHOD 1 — Instagram Private Web API (BEST, returns full JSON)**
+```bash
+curl -s -H 'User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X)' \
+  -H 'x-ig-app-id: 936619743392459' \
+  'https://i.instagram.com/api/v1/users/web_profile_info/?username={handle}'
+```
+Returns ~500KB of JSON: full profile + last 12 posts with captions, likes,
+comments, CDN image URLs, timestamps. No auth needed.
+
+**METHOD 2 — Instagram oEmbed API (for individual posts)**
+```bash
+curl -s 'https://www.instagram.com/api/v1/oembed/?url=https://www.instagram.com/p/{SHORTCODE}/'
+```
+Returns: caption text, author_name, thumbnail URL. No auth.
+
+**METHOD 3 — Pixwox via web_extract (profile viewer)**
+```python
+web_extract(["https://pixwox.com/profile/{username}"])
+```
+Returns 12+ recent posts with captions, engagement stats. Cloudflare blocks
+curl but web_extract bypasses it.
+
+**METHOD 4 — SocialBlade via web_extract (analytics)**
+```python
+web_extract(["https://socialblade.com/instagram/user/{handle}"])
+```
+Returns follower count, engagement rate, 14-day tracking.
+
+**METHOD 5 — CDN direct download (images from API responses)**
+Image URLs from API responses (scontent-*.cdninstagram.com) download
+directly with no auth. Feed them to vision_analyze for visual profiling.
+
+**METHOD 6 — Google indexed content**
+```
+web_search("site:instagram.com {username}")
+```
+Returns bio text, follower count, recent post captions from search snippets.
+
+**WHAT DOESN'T WORK:** direct web_extract on instagram.com, ?__a=1 trick,
+graph.instagram.com (needs OAuth), imginn/picuki/dumpoir/gramhir (403)
+
+### Instagram Discovery (finding someone's handle)
+```
+web_search("{real_name} instagram")
+web_search("{twitter_handle} instagram account")
+web_search("site:instagram.com {real_name}")
+
+# Check their Twitter/X bio for IG links
+# Check their personal website for social links
+# Check Linktree / bio.link pages
+```
+
+### Extracting Signal from Instagram
+
+**Profile Picture**: Reveals self-presentation style
+- Professional headshot vs casual vs meme/avatar
+- Analyze with vision_analyze for clothing, setting, expression
+
+**Bio Text**: Compressed self-identity
+- Role/title claims
+- Emoji usage patterns
+- Link destinations
+- Location claims
+
+**Post Grid**: Visual identity fingerprint
+- Color palette tendencies
+- Content categories (food/travel/tech/selfies/memes)
+- Posting frequency
+- Professional vs personal ratio
+
+**Captions**: Voice sample different from Twitter
+- Usually longer, more personal
+- Hashtag usage patterns
+- Emoji patterns
+- Tone (inspirational vs casual vs funny)
+
+**Tagged Photos**: Real social graph
+- Who they hang out with IRL
+- Events they attend
+- Social circles outside tech/AI
+
+## Visual Identity Analysis
+
+Use vision tools to analyze HOW someone presents visually:
+
+### Profile Pictures Across Platforms
+```
+# Collect profile pics from multiple platforms
+# Twitter, Instagram, LinkedIn, GitHub, Discord
+
+# Analyze each
+vision_analyze(image_url="{pic_url}", 
+    question="Describe this profile picture in detail: person's appearance, clothing style, setting, expression, professional vs casual, any notable elements")
+
+# Cross-reference: do they use the same pic everywhere? Different personas?
+```
+
+### Reverse Image Search (Yandex Pipeline)
+Google Lens blocks Browserbase IPs, so use Yandex instead:
+
+```
+# For images behind auth/CDN, upload to catbox first
+terminal("curl -F 'reqtype=fileupload' -F 'fileToUpload=@{local_path}' https://catbox.moe/user/api.php")
+
+# Then Yandex reverse image search
+browser_navigate("https://yandex.com/images/search?rpt=imageview&url={encoded_public_url}")
+
+# Or via web_extract (slower but automatable)
+web_extract(["https://yandex.com/images/search?rpt=imageview&url={encoded_url}"])
+```
+
+Yandex provides:
+- Similar images (find the same person elsewhere)
+- Site matches (where this image appears)
+- OCR text extraction (text in images)
+- Image tags (what's in the image)
+- Knowledge panels (identified entities)
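
The `{encoded_public_url}` placeholder in the Yandex calls has to be percent-encoded, or the image URL's own `:` and `/` characters can be misread as part of the search URL. A minimal sketch of that step, assuming the image is already hosted at a public URL; the helper name `yandex_reverse_image_url` and the catbox-style example URL are hypothetical, not part of the skill.

```python
from urllib.parse import quote


def yandex_reverse_image_url(public_image_url: str) -> str:
    """Build the Yandex reverse-image-search URL for a publicly reachable image."""
    # safe="" also encodes ":" and "/", so the image URL travels as a single query value.
    encoded = quote(public_image_url, safe="")
    return f"https://yandex.com/images/search?rpt=imageview&url={encoded}"


# Hypothetical hosted-image URL, used only for illustration.
print(yandex_reverse_image_url("https://files.catbox.moe/example.jpg"))
```

The returned string can be passed to `browser_navigate` or `web_extract` exactly as in the calls above.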
+
+### Screenshot Analysis
+When you can see a page but can't extract text:
+```
+browser_vision(question="Read all text on this page. List usernames, post content, dates, engagement numbers")
+browser_vision(annotate=true, question="What interactive elements are on this page?")
+```
+
+## LinkedIn Intelligence
+
+**STATUS: BLOCKED for automated access** (tested April 2026).
+web_extract returns "Website Not Supported". Direct browsing triggers auth walls.
+
+**Workarounds:**
+```
+# LinkedIn content IS indexed by search engines
+web_search("{real_name} linkedin {company}")
+web_search("site:linkedin.com/in {name}")
+# These return snippets with headline, role, company — useful even without full profile
+
+# Google sometimes caches LinkedIn profiles
+web_search("{name} site:linkedin.com headline")
+```
+
+**METHOD 1 — Google indexed snippets (always works)**
+```
+web_search("site:linkedin.com/in {name} {company}")
+```
+Returns: name, headline, company, location, connection count, bio snippet.
+
+**METHOD 2 — Crunchbase (EXCELLENT for founders/execs)**
+```python
+web_extract(["https://www.crunchbase.com/person/{slug}"])
+```
+Returns: full career history, education, investments, board positions,
+social links. Best source for professional identity of startup people.
+
+**METHOD 3 — Corporate press pages**
+```
+web_search("{person} {company} site:{company}.com bio OR press")
+```
+Official bios from company newsrooms. High quality, curated but factual.
+
+**METHOD 4 — Third-party aggregators**
+- RocketReach, SignalHire — job title + company from web_search snippets
+- rootdata.com — good for crypto/AI people
+- Crunchbase — best all-round for tech executives
+
+**METHOD 5 — Paid LinkedIn API wrappers** (if budget allows)
+- LinkdAPI, Proxycurl: $0.07-0.15 per profile, full structured data
+- No OAuth needed, just API key
+
+LinkedIn reveals (from combined methods):
+- Career trajectory (Crunchbase full history)
+- Current role and headline (search snippets)
+- Education (Crunchbase or search snippets)
+- Professional self-presentation (company bio pages)
+- Investment/board activity (Crunchbase)
+
+## Podcast Transcripts (HIGHEST VALUE for voice profiling)
+
+Podcast interviews are THE gold mine for personality modeling. Hours of
+unscripted speech, natural conversation, real personality showing through.
+
+**Discovery:**
+```
+web_search("{name} podcast transcript interview")
+web_search("{name} lex fridman OR tyler cowen OR joe rogan OR dwarkesh")
+```
+
+**Extraction — verified working transcript sources:**
+```python
+# Lex Fridman (full verbatim transcripts)
+web_extract(["https://lexfridman.com/EPISODE_URL/transcript"])
+
+# Conversations with Tyler (Tyler Cowen — full transcripts)
+web_extract(["https://conversationswithtyler.com/episodes/..."])
+
+# TED Talks transcripts
+web_extract(["https://www.ted.com/talks/.../transcript"])
+
+# Sequoia Capital podcast
+web_extract(["https://www.sequoiacap.com/podcast/..."])
+```
+
+Podcast transcripts reveal:
+- Natural speech patterns (filler words, pacing, sentence structure)
+- Unguarded opinions (less curated than tweets)
+- How they respond to pushback (interviewer challenges)
+- Humor style in conversation (different from written humor)
+- Depth of knowledge on specific topics
+- Personality under pressure
+
+## YouTube / Video Intelligence
+
+```
+web_search("{name} youtube talk keynote interview")
+web_search("{name} podcast appearance")
+```
+
+web_extract on YouTube pages returns rich summaries with attributed quotes.
+Use youtube-content skill for full transcripts if available.
+
+## Personal Blogs & Substacks (HIGH VALUE)
+
+Personal writing is curated self-expression — how someone WANTS to be
+seen intellectually. Very different signal from social media.
+
+```
+web_search("{name} blog substack essay")
+# Extract full posts
+web_extract(["https://{blog-url}/"])
+# Wayback Machine works for archived blog posts
+web_extract(["https://web.archive.org/web/2024/{blog-url}"])
+```
+
+## GitHub Intelligence
+
+For technical people:
+
+```
+web_search("site:github.com {handle}")
+web_extract(["https://github.com/{handle}"])
+
+# Issue comments reveal communication style under technical pressure
+web_search("site:github.com {handle} issue comment")
+
+# README style reveals documentation personality
+# Commit messages reveal terseness vs verbosity
+```
+
+## General Web Footprint
+
+```
+# Personal website / blog
+web_search("{name} personal website blog about")
+
+# Conference talks / speaker bios
+web_search("{name} speaker conference talk bio")
+
+# News mentions
+web_search("{name} {company} news interview profile")
+
+# Academic papers (for researchers)
+web_search("{name} arxiv paper author")
+web_search("site:scholar.google.com {name}")
+
+# Podcast appearances
+web_search("{name} podcast guest appearance")
+
+# Forum posts (HN, specific communities)
+web_search("site:news.ycombinator.com {handle} OR {name}")
+```
+
+## Cross-Platform Identity Resolution
+
+### Handle Mapping Strategy
+1. Start from known handle (usually Twitter)
+2. Check bio links — most people link to other platforms
+3. Search "{known_handle} {platform}" for each platform
+4. Check personal website for social links
+5. Reverse image search profile pic to find matching accounts
+6. Search unique phrases they use across platforms
+
+### Identity Verification
+When you find a potential match on another platform:
+- Same profile picture? (reverse image search)
+- Same bio keywords?
+- Same name/handle pattern?
+- Cross-references (do they mention each other?)
+- Writing style match?
+
+## Search Space Narrowing
+
+### The Jiggle Technique
+When broad searches return noise, narrow progressively:
+
+1. **Start broad**: `"{name}" AI` 
+2. **Add role**: `"{name}" {company} {role}`
+3. **Add context**: `"{name}" {company} {specific_project_or_topic}`
+4. **Add platform**: `site:{platform} "{name}" {context}`
+5. **Add time**: `"{name}" {topic} 2025 OR 2026`
+6. **Quote unique phrases**: if you found a distinctive phrase they use, search for that exact phrase to find more of their content
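
The six narrowing steps can be generated mechanically once the basics about a person are known. A rough sketch, assuming every field is already filled in; the function name `jiggle_queries` and its parameters are made up for illustration, and the output is just a list of strings to feed to `web_search` one at a time.

```python
def jiggle_queries(name, domain, company, role, topic, platform, years, phrase=None):
    """Return search queries ordered from broad to narrow (steps 1-6 above)."""
    quoted = f'"{name}"'
    queries = [
        f"{quoted} {domain}",                                      # 1. start broad
        f"{quoted} {company} {role}",                              # 2. add role
        f"{quoted} {company} {topic}",                             # 3. add context
        f"site:{platform} {quoted} {topic}",                       # 4. add platform
        f"{quoted} {topic} " + " OR ".join(str(y) for y in years), # 5. add time
    ]
    if phrase:
        # 6. exact-match search on a distinctive phrase, if one was found
        queries.append(f'"{phrase}"')
    return queries


# Placeholder person and values, not a real target.
for q in jiggle_queries("Jane Doe", "AI", "Acme", "CTO", "agents",
                        "github.com", (2025, 2026), phrase="ship it twice"):
    print(q)
```

Stop at the first step that returns clean results; the later queries exist only for when earlier ones return noise.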
+
+### Disambiguation
+Common names need extra signals:
+- Add their company/org
+- Add their specific domain (AI, crypto, etc.)
+- Use their unique handle as anchor
+- Search for combinations of their known associates
+- Use image search to verify you have the right person
+
+### Signal vs Noise Heuristics
+- **High signal**: direct quotes, interview transcripts, personal blog posts, long-form content
+- **Medium signal**: mentions in aggregator sites, conference bios, LinkedIn summaries
+- **Low signal**: generic news mentions, third-party profiles, directory listings
+- **Noise**: same-name different person, outdated info (>2 years), scraped/regurgitated content
+
+## Confidence Calibration
+
+After full OSINT sweep, rate data quality:
+
+| Confidence | Data Available | Simulation Quality |
+|-----------|---------------|-------------------|
+| 95-100% | 50+ posts, longform, video, visual, cross-platform | Near-perfect voice replication |
+| 80-94% | 20-50 posts, some longform, basic visual | Very good, occasional educated guesses |
+| 60-79% | 10-20 posts, mostly short-form | Good general sense, some gaps |
+| 40-59% | 5-10 posts, limited platforms | Broad strokes only, flag uncertainty |
+| 20-39% | <5 posts, single platform | Sketch at best, heavy disclaimers |
+| <20% | Almost nothing found | Decline to simulate, ask user for context |
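
A first-pass lookup for the table can key on post count alone. This is a sketch under loud assumptions: the table also weighs longform, video, visual, and cross-platform coverage, which this ignores, so treat the result as an upper bound and lower it when that coverage is missing. The boundary handling (20 posts lands in the higher band, zero posts means "almost nothing found") is a guess, and `confidence_band` is a hypothetical name.

```python
def confidence_band(post_count: int) -> tuple[str, str]:
    """Map a post count to (confidence band, expected simulation quality)."""
    # (minimum posts, band, quality) taken from the table; boundaries are assumptions.
    bands = [
        (50, "95-100%", "Near-perfect voice replication"),
        (20, "80-94%", "Very good, occasional educated guesses"),
        (10, "60-79%", "Good general sense, some gaps"),
        (5, "40-59%", "Broad strokes only, flag uncertainty"),
        (1, "20-39%", "Sketch at best, heavy disclaimers"),
    ]
    for min_posts, band, quality in bands:
        if post_count >= min_posts:
            return band, quality
    return "<20%", "Decline to simulate, ask user for context"


print(confidence_band(12))
```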
+
+## Privacy & Ethics Note
+
+All research uses publicly available information only. We don't:
+- Access private/locked accounts
+- Bypass authentication
+- Use leaked/hacked data
+- Dox or expose private information
+- Simulate in ways designed to deceive or impersonate
+
+The goal is personality MODELING for creative simulation, grounded in
+what people choose to share publicly.
@@ -0,0 +1,334 @@
+# Prediction Engine — Forecasting What Someone Would Say/Do
+
+Techniques for predicting behavior grounded in superforecasting methodology,
+behavioral science, and SOTA LLM prediction research.
+
+## Superforecasting Principles (Tetlock)
+
+**Honest caveat**: Superforecasting methodology was developed for geopolitical and
+world-event prediction, not personality simulation. That said, the THINKING TOOLS
+are genuinely useful here — decomposition prevents lazy pattern-matching, base rates
+fight overconfidence, and alternative hypotheses prevent single-track predictions.
+What does NOT transfer cleanly: the calibration precision. When Tetlock says "70%
+confident," that's backed by thousands of scored predictions. When we say "70%
+confident" about what @someone would tweet, that's an educated estimate, not a
+calibrated probability. Use the framework for its rigor, not its false precision.
+
+Apply these thinking tools when making behavioral predictions:
+
+### 1. Decomposition (Fermi-ize the Question)
+Don't ask "What would @person say about X?"
+Break it down:
+- What is @person's known position on topics RELATED to X?
+- What are their values/priorities that X touches on?
+- What is their emotional register when discussing similar topics?
+- Who are they likely responding to, and how does that change their tone?
+- What platform are they on, and how does that shift their behavior?
+
+### 2. Outside View First (Base Rates)
+Before considering the specific person, ask:
+- What would a TYPICAL person in their role/position say about X?
+- What % of people in their ideological cluster hold position Y on X?
+- What's the base rate for their type of response (agree/disagree/joke/ignore)?
+
+### 3. Inside View Second (Case-Specific Adjustment)
+Now adjust from the base rate using what you ACTUALLY KNOW about them:
+- Specific past statements on this topic or related topics
+- Known relationships with people/orgs involved
+- Personal experiences that would shape their view
+- Contrarian tendencies (do they predictably go against their cluster?)
+
+### 4. Confidence Calibration
+Express predictions with honest uncertainty. **These are rough buckets, not
+calibrated probabilities. Don't pretend they're more precise than they are.**
+- **90%+ confident**: They've literally said this before, just rephrased
+- **70-89%**: Strong pattern match with known positions and voice
+- **50-69%**: Reasonable inference but could go either way
+- **30-49%**: Educated guess, limited data
+- **<30%**: Basically guessing, flag it clearly
+
+When reporting confidence, prefer plain language over fake precision:
+"very likely" > "87% probability". The number implies a precision we don't have.
+
+### 5. Consider Alternative Hypotheses
+For every prediction, generate at least ONE plausible alternative:
+- "They'd PROBABLY say X, but they might surprise with Y because Z"
+- This prevents overconfident single-track predictions
+
+## The Prediction Pipeline
+
+### Step 1: Classify the Prediction Type
+
+| Type | Description | Difficulty |
+|------|-------------|-----------|
+| **Position prediction** | What they believe about X | Easiest if data exists |
+| **Reaction prediction** | How they'd respond to event Y | Medium |
+| **Voice prediction** | How they'd phrase something | Medium-hard |
+| **Behavior prediction** | What they'd DO (not just say) | Hardest |
+| **Interaction prediction** | How they'd respond to specific person | Hard, depends on relationship data |
+
+### Step 2: Evidence Gathering Protocol
+
+For each prediction, gather evidence in this order:
+
+1. **Direct evidence**: Have they addressed this exact topic before?
+   - Search: `"{handle}" "{topic}"` or `"{handle}" "{related_keyword}"`
+   - Weight: HIGHEST
+
+2. **Analogical evidence**: Have they addressed something similar?
+   - Search: find positions on adjacent topics
+   - Weight: HIGH
+
+3. **Value evidence**: What values/principles would apply?
+   - Infer from their stated beliefs and consistent positions
+   - Weight: MEDIUM
+
+4. **Social evidence**: What do their peers/allies think?
+   - People tend to align with their social cluster (but not always)
+   - Weight: LOW-MEDIUM (higher for conformists, lower for contrarians)
+
+5. **Demographic evidence**: What would someone in their position typically think?
+   - Base rate from role/industry/ideology
+   - Weight: LOWEST (only use as anchor, not conclusion)
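
The gathering order doubles as a priority order when assembling a prediction. A small sketch of that ranking, assuming each piece of evidence is tagged with one of the five kinds above; the `Evidence` class and the numeric ranks are illustrative only, since the protocol gives ordinal weights (HIGHEST to LOWEST), not numbers.

```python
from dataclasses import dataclass

# Lower rank = higher weight, following the order of the protocol above.
TIER_RANK = {"direct": 0, "analogical": 1, "value": 2, "social": 3, "demographic": 4}


@dataclass
class Evidence:
    kind: str   # one of the TIER_RANK keys
    note: str   # what was found and where


def rank_evidence(items):
    """Sort evidence so the highest-weight tier comes first (stable within a tier)."""
    return sorted(items, key=lambda e: TIER_RANK[e.kind])


found = [
    Evidence("demographic", "typical stance for their role"),
    Evidence("direct", "addressed this exact topic in a past post"),
    Evidence("social", "close peers have taken a public position"),
]
for item in rank_evidence(found):
    print(item.kind, "-", item.note)
```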
+
+### Step 2b: Contradiction Handling Protocol
+When evidence conflicts (e.g., person said X in 2024 but Y in 2026):
+
+1. **Check for genuine change**: Did they explicitly reverse position? Look for
+   "I used to think X but now..." or a clear pivot moment. If so, use the newer
+   position and note the evolution.
+
+2. **Check for context-dependence**: Did they say X to audience A and Y to audience B?
+   This isn't necessarily dishonesty — people emphasize different facets for different
+   contexts. Note which context your simulation targets and use the matching register.
+
+3. **Check for nuance collapse**: Maybe they said "X is mostly good with caveats"
+   and later "X has real problems" — these might not actually contradict. Look for
+   the synthesis position.
+
+4. **When genuinely unresolvable**: Flag it explicitly. "Evidence conflicts on this
+   point — they've argued both sides at different times. Simulating {chosen position}
+   based on {reasoning}, but the alternative is plausible." Don't paper over the
+   contradiction with false confidence.
+
+5. **Recency default**: When all else fails, weight more recent statements higher.
+   People change, and the most recent position is the best predictor of the next one.
+
+### Step 3: Generate Prediction
+
+Using the HumanLLM B = f(P, E) framework (behavior as a function of person and environment):
+- **P (Person)**: Everything from the dossier — personality, values, voice
+- **E (Environment)**: The specific context — platform, topic, who's asking,
+  what just happened, social dynamics in play
+
+Generate the prediction by:
+1. Setting the base rate (outside view)
+2. Adjusting for personal specifics (inside view)
+3. Filtering through their voice profile (how they'd phrase it)
+4. Applying platform-specific behavior patterns
+5. Calibrating confidence
+
+## Memory Curation (The 30-50 Rule)
+
+Research shows performance PEAKS at 30-50 memory entries then DECLINES.
+For each person in a simulation, curate memories:
+
+### What to Include (high signal)
+- **Signature takes**: Their most characteristic/famous positions (5-10)
+- **Voice samples**: Real quotes that capture their linguistic style (5-10)
+- **Relationship data**: Known dynamics with other sim targets (3-5)
+- **Recent context**: What they've been talking about lately (3-5)
+- **Formative moments**: Career milestones, public pivots, viral moments (3-5)
+- **Quirks & tells**: Catchphrases, humor style, pet peeves (3-5)
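
The per-category ranges and the 30-50 total can be checked before a dossier is used. A minimal sketch; the category keys and the function name `check_memory_budget` are invented for illustration, while the numbers come straight from the list above.

```python
# (min, max) entries per category, from the "What to Include" list.
MEMORY_BUDGET = {
    "signature_takes": (5, 10),
    "voice_samples": (5, 10),
    "relationship_data": (3, 5),
    "recent_context": (3, 5),
    "formative_moments": (3, 5),
    "quirks_and_tells": (3, 5),
}
TOTAL_RANGE = (30, 50)  # the 30-50 rule


def check_memory_budget(counts):
    """Return a list of human-readable problems; an empty list means the budget holds."""
    problems = []
    for category, (low, high) in MEMORY_BUDGET.items():
        n = counts.get(category, 0)
        if not low <= n <= high:
            problems.append(f"{category}: {n} entries, expected {low}-{high}")
    total = sum(counts.values())
    if not TOTAL_RANGE[0] <= total <= TOTAL_RANGE[1]:
        problems.append(f"total: {total} entries, expected {TOTAL_RANGE[0]}-{TOTAL_RANGE[1]}")
    return problems


minimums = {name: low for name, (low, _) in MEMORY_BUDGET.items()}
print(check_memory_budget(minimums))
```

Note the arithmetic: the category minimums sum to 22 and the maximums to 40, so a dossier only reaches the 30-entry floor when most categories sit near their upper end.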
+
+### What to Exclude (noise)
+- Generic biographical facts that don't predict behavior
+- Old positions they've clearly evolved past
+- Trivial interactions that don't reveal personality
+- Secondhand characterizations (what others say about them)
+- Platform metadata (follower counts, join dates) unless directly relevant
+
+### Memory Selection Heuristic
+For each candidate memory entry, ask:
+**"If I removed this, would the simulation noticeably degrade?"**
+If no, cut it.
+
+## Fighting LLM Defaults
+
+Research shows LLMs have systematic biases in simulation. The fixes below need to be
+CONCRETE — vague instructions like "be more like them" don't work. You need specific
+prompting patterns that actually shift the output.
+
+### Problem: Sycophancy & Over-Agreement
+LLMs default to agreement and positivity.
+**Fix**: Don't just note they're contrarian — structure it as a behavioral instruction
+with evidence:
+```
+"In this conversation, {person} disagrees with {other_person} on {topic}. They are
+noticeably more confrontational than the other speakers. They tend to respond to
+consensus with skepticism and reframe debates on their own terms. Example from their
+real posts: '{actual quote where they disagreed with something popular}'"
+```
+
+### Problem: Rigid/Polarized Strategies
+LLMs tend to take extreme positions and hold them rigidly.
+**Fix**: Provide specific nuance instructions:
+```
+"In this conversation, {person} holds a complex position on {topic}: they agree with
+{point A} but push back on {point B}. They're the type to say 'yes, but...' rather
+than 'no.' Real example of their nuance: '{quote showing them holding a both-and
+position}'"
+```
+
+### Problem: Uniform Register
+LLMs default to a similar educated-casual tone for everyone.
+**Fix**: Anchor voice with REAL QUOTES and explicit comparative instructions:
+```
+"In this conversation, {person} is noticeably more {trait} than the other speakers.
+They tend to {specific behavior pattern}. Their sentences are typically {length/style}.
+They {do/don't} use emoji. Their humor style is {type}. Example from their real posts:
+'{actual quote that captures their voice}'"
+```
+The more you can say "{person} does THIS while {other_person} does THAT," the better
+the differentiation. Comparative framing outperforms absolute descriptions.
+
+### Problem: Overly Structured Responses
+LLMs love neat arguments with clear structure.
+**Fix**: Provide explicit structural anti-patterns:
+```
+"When generating {person}'s messages, break conventional structure. They start one
+thought and jump to another mid-sentence. They use '...' and '—' instead of periods.
+They repeat words for emphasis. They don't conclude neatly. Example: '{real quote
+showing their chaotic structure}'"
+```
+
+### Problem: Missing Mundane Behavior
+LLMs focus on "interesting" responses and skip boring or mundane ones.
+**Fix**: Explicitly instruct for mundane moments:
+```
+"Not every message from {person} needs to be insightful. Include at least 1-2 messages
+that are just reactions ('lmao', 'this', 'wait what'), link shares without commentary,
+or brief agreements. Real people don't craft every message. {person} specifically tends
+to {their specific mundane behavior pattern, e.g., 'drop a single emoji reaction'
+or 'just retweet without comment'}."
+```
+
+### General Principle for All Fixes
+The pattern is always: **behavioral instruction + comparative framing + real evidence**.
+- "Do X" alone doesn't work well
+- "Do X, unlike the default of Y" works better  
+- "Do X, unlike the default of Y, as evidenced by this real quote: Z" works best
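
The three-part pattern is easy to enforce in code so that no instruction ships without its evidence. A sketch only: `build_voice_instruction` is a hypothetical helper, and its wording mirrors the templates above without being the skill's actual prompt.

```python
def build_voice_instruction(person, behavior, default, quote):
    """Compose behavioral instruction + comparative framing + real evidence."""
    if not quote.strip():
        # "Do X" without a real quote is the weakest form; refuse to build it.
        raise ValueError("a real quote is required as evidence")
    return (
        f"In this conversation, {person} {behavior}, "
        f"unlike the default of {default}. "
        f"Example from their real posts: '{quote}'"
    )


# Placeholder handle and quote, standing in for material gathered during research.
print(build_voice_instruction(
    "@example_handle",
    "answers in short lowercase fragments",
    "complete, polished sentences",
    "placeholder for an actual quote",
))
```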
+
+## The Adjective-Based Personality Method
+
+The method uses 70 bipolar adjective pairs for the Big Five traits. Select 3 per
+trait, each with an intensity modifier.
+
+### Openness
+High: creative, curious, imaginative, artistic, adventurous, intellectual,
+      unconventional, perceptive
+Low:  conventional, practical, traditional, routine-oriented, narrow
+
+### Conscientiousness  
+High: organized, disciplined, reliable, meticulous, systematic, thorough,
+      goal-oriented, persistent
+Low:  careless, impulsive, disorganized, spontaneous, flexible, relaxed
+
+### Extraversion
+High: outgoing, talkative, energetic, assertive, enthusiastic, bold,
+      gregarious, dominant
+Low:  reserved, quiet, introverted, solitary, withdrawn, reflective
+
+### Agreeableness
+High: cooperative, trusting, empathetic, generous, accommodating, kind,
+      diplomatic, forgiving
+Low:  competitive, skeptical, blunt, confrontational, critical, stubborn,
+      independent-minded
+
+### Neuroticism
+High: anxious, moody, sensitive, reactive, volatile, self-conscious,
+      insecure, emotional
+Low:  calm, stable, resilient, confident, even-tempered, composed,
+      thick-skinned
+
+### Usage
+For each simulated person, after OSINT research, estimate their Big Five
+profile and select appropriate adjectives:
+
+Example: "@basedjensen: very creative, somewhat impulsive, very outgoing,
+a bit competitive, calm" → this shapes the generation toward the right
+behavioral profile.
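
Formatting the adjective line is mechanical once the adjectives and intensities are chosen. A small sketch, assuming the modifier vocabulary is limited to the ones that appear in the example ("very", "somewhat", "a bit", or none); `describe_profile` is a hypothetical name.

```python
ALLOWED_MODIFIERS = {"", "a bit", "somewhat", "very"}  # taken from the example above


def describe_profile(handle, traits):
    """traits: list of (modifier, adjective) pairs in Big Five order."""
    parts = []
    for modifier, adjective in traits:
        if modifier not in ALLOWED_MODIFIERS:
            raise ValueError(f"unknown intensity modifier: {modifier!r}")
        parts.append(f"{modifier} {adjective}".strip())
    return f"{handle}: " + ", ".join(parts)


print(describe_profile("@basedjensen", [
    ("very", "creative"),      # Openness, high
    ("somewhat", "impulsive"), # Conscientiousness, low
    ("very", "outgoing"),      # Extraversion, high
    ("a bit", "competitive"),  # Agreeableness, low
    ("", "calm"),              # Neuroticism, low
]))
```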
+
+## Interaction Dynamics Prediction
+
+When simulating conversations between multiple people, remember that predictions
+apply to a SPECIFIC REGISTER. See the next section on performative vs. authentic
+behavior.
+
+## Performative vs. Authentic Behavior
+
+**Critical concept**: People act differently for different audiences. A simulation
+must be explicit about which register it's targeting.
+
+### The Register Spectrum
+- **Public broadcast** (tweets, Reddit posts): Most performative. People are
+  playing to their audience, building their brand, signaling to their tribe.
+- **Semi-public** (Discord channels, group chats, comment threads): Less
+  performative but still audience-aware. People are more casual but know
+  others are watching.
+- **Private 1-on-1** (DMs): Much less performative. More honest, more
+  vulnerable, more willing to express doubt or uncertainty.  
+- **True private** (inner monologue, close friends): We have almost no data
+  on this. Don't pretend to simulate it.
+
+### Practical implications
+- When simulating a PUBLIC thread, lean into the person's public persona —
+  their brand, their usual takes, their audience-aware voice.
+- When simulating DMs, dial down the performance. More hedging, more honesty,
+  more "I actually think..." vs. the public "Here's my take:".
+- When evidence comes from one register but the simulation targets another,
+  FLAG IT: "Evidence is from public tweets but simulating DM behavior —
+  expect the real person to be less {polished/aggressive/confident} in private."
+- Someone's Twitter persona may be genuinely different from their Reddit persona.
+  These are not interchangeable data sources. Weight evidence from the matching
+  platform higher.
+
+### What we can't know
+Be honest: we're simulating public figures based on their public output. The
+private person may be substantially different. DM simulations are inherently
+lower-confidence than public thread simulations because we have less data on
+how people behave privately.
+
+### Dominance Hierarchy
+- Who talks first? (most confident/highest-status usually)
+- Who responds to whom? (not everyone talks to everyone)
+- Who gets ratio'd? (lowest-status takes get challenged)
+- Who lurks? (some people watch before engaging)
+
+### Agreement/Disagreement Prediction
+Based on known positions + social dynamics:
+- **Strong agree**: Both have stated similar positions + friendly relationship
+- **Agree with nuance**: Similar positions but one adds a caveat
+- **Productive disagreement**: Different positions + mutual respect
+- **Hostile disagreement**: Different positions + existing tension/rivalry
+- **Surprising agreement**: Expected to disagree but find common ground
+- **Ignore**: Some people just don't engage with certain others
+
+### Conversation Flow Prediction
+Real conversations follow patterns:
+1. **Opener** → most active/impulsive person posts first
+2. **First response** → most engaged/relevant person responds
+3. **Pile-on or pushback** → depends on agreement/disagreement dynamics
+4. **Tangent** → someone takes a side thread
+5. **Peak moment** → the best/most viral exchange
+6. **Trail off** → energy dissipates, last person makes a joke or short comment
+
+## Scenario Injection Prediction
+
+When "inject: {event}" is used, predict reactions:
+
+1. **Who would see this first?** (most online / most relevant to their work)
+2. **Who would care most?** (most affected / strongest opinion)
+3. **What's the emotional valence?** (good news for some, bad for others)
+4. **What's the expected take?** (apply position prediction pipeline)
+5. **How does this change the existing conversation?** (derail, amplify, redirect)
@@ -0,0 +1,237 @@
+# Recursive Self-Improvement Pipeline
+
+The simulator should get better every time it runs. Not through training —
+through accumulating failure patterns, calibration data, and learned rules
+that feed back into future simulations.
+
+## The Loop
+
+```
+SIMULATE → VERIFY (mechanical) → SCORE → LOG FAILURES → UPDATE RULES → SIMULATE BETTER
+```
+
+Each run produces two outputs:
+1. The simulation (for the user)
+2. A failure log (for the system)
+
+The failure log feeds back into the next run's verification step,
+making the checklist grow and the blind spots shrink.
+
+## What Gets Logged After Every Simulation
+
+### 1. Mechanical Check Failures
+```
+FAILURE LOG: simulation_{timestamp}
+  EMOJI: @visakanv had 6 fabricated emoji, real rate was 10%. Stripped all.
+  SLOP: @eigenrobot utterance contained "multifaceted" — rewritten.
+  LENGTH: @QiaochuYuan avg 42 words/utterance, real avg was 18. Compressed.
+  CAPS: 4/12 utterances started uppercase, targets are 90% lowercase. Fixed.
+  PUNCTUATION: Added periods to @tszzl who never uses terminal punctuation.
+  STRUCTURE: Sycophantic flow detected — B agreed with A then C agreed with B.
+             Injected disagreement.
+```
+
+### 2. Discriminator Critique Patterns
+```
+CRITIQUE LOG:
+  Round 1: @tszzl too verbose (flagged 2x in last 3 simulations)
+  Round 1: @repligate too academic (flagged 3x — this is a persistent pattern)
+  Round 2: Conversation too neat — real conversations are messier (flagged 5x)
+```
+
+### 3. Held-Out Test Results
+```
+CALIBRATION LOG:
+  Voice fidelity: 8.4/10 (up from 7.5 last run)
+  Topic prediction: 2/5 topics matched (typical — content is unpredictable)
+  Register match: 9/10 (improved after emoji fix)
+```
+
+## How Failures Feed Forward
+
+### Pattern Accumulation
+Failure patterns that persist across runs become AUTOMATIC rules:
+
+```
+IF a pattern is flagged in 3+ consecutive simulations:
+  PROMOTE it from "check" to "pre-generation rule"
+  
+Example progression:
+  Run 1: "Too verbose for @tszzl" → flagged in Round 1, fixed
+  Run 2: "Too verbose for @tszzl" → flagged again, fixed again
+  Run 3: "Too verbose for @tszzl" → PROMOTED to pre-gen rule:
+         "When simulating roon-type voices: max 20 words per tweet.
+          Fragment > sentence. Compress ruthlessly."
+```
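
The promotion rule reduces to an intersection over the most recent runs. A minimal sketch, assuming each run's flags are stored as a set of pattern labels, oldest run first; the label format and the name `patterns_to_promote` are illustrative.

```python
def patterns_to_promote(run_flags, threshold=3):
    """Return patterns flagged in each of the last `threshold` consecutive runs."""
    if len(run_flags) < threshold:
        return set()
    recent = [set(flags) for flags in run_flags[-threshold:]]
    # A pattern qualifies only if it shows up in every one of the recent runs.
    return set.intersection(*recent)


history = [
    {"too verbose: @tszzl"},
    {"too verbose: @tszzl", "emoji inflation"},
    {"too verbose: @tszzl"},
]
print(patterns_to_promote(history))
```

A pattern that skips a run drops out of the intersection, which matches the "consecutive" wording of the rule.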
+
+### The Growing Checklist
+The mechanical verification checklist starts with the baseline checks
+(emoji, slop, length, caps, punctuation) and GROWS with each failure:
+
+```
+BASELINE CHECKS (permanent):
+  □ Emoji frequency match
+  □ Slop word scan (Tier 1/2/3)
+  □ Sentence length match
+  □ Capitalization match
+  □ Punctuation pattern match
+  □ Reply/original ratio
+  □ Structural slop patterns
+
+LEARNED CHECKS (accumulated from past failures):
+  □ Roon-type voices: max 20 words (from: verbose failure x3)
+  □ Warm personalities: do NOT add emoji (from: emoji inflation x5)
+  □ Academic voices: ground in specific examples (from: too abstract x3)
+  □ Conversations: inject at least one disagreement (from: sycophantic flow x4)
+  □ Self-deprecating voices: add hedging (from: too assertive x2)
+  □ Shitposters: include at least one non-sequitur (from: too on-topic x2)
+```
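
Several baseline checks are plain counting and need no model in the loop. A rough sketch of three of them (length, capitalization, terminal punctuation) plus a slop-word scan; `SLOP_WORDS` holds only the one word named in the failure log above, since the real Tier 1/2/3 lists are not reproduced here, and `mechanical_stats` is a hypothetical name.

```python
import re

# Placeholder list: only "multifaceted" is named in this document.
SLOP_WORDS = {"multifaceted"}


def mechanical_stats(utterances):
    """Compute simple surface statistics to compare against a person's real baseline."""
    if not utterances:
        raise ValueError("need at least one utterance")
    n = len(utterances)
    words = [len(u.split()) for u in utterances]
    lowercase_starts = sum(1 for u in utterances if u[:1].islower())
    terminal_punct = sum(1 for u in utterances if u.rstrip().endswith((".", "!", "?")))
    slop_hits = sorted({
        w for u in utterances
        for w in re.findall(r"[a-z']+", u.lower())
        if w in SLOP_WORDS
    })
    return {
        "avg_words": sum(words) / n,
        "lowercase_start_rate": lowercase_starts / n,
        "terminal_punct_rate": terminal_punct / n,
        "slop_hits": slop_hits,
    }


print(mechanical_stats(["this is multifaceted.", "lol", "Wait what"]))
```

Each number is only meaningful next to the same statistic computed over the person's real posts.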
+
+### Where To Store Learned Rules
+Append to the skill itself. After each simulation run where the mechanical
+checks catch something, the agent should ask:
+
+"The mechanical verification caught {failures}. Should I add these as
+permanent learned rules for future simulations?"
+
+If the same failure appears 3+ times, add it automatically without asking.
+
+Use skill_manage(action='patch') to append to this file's "Learned Checks"
+section below.
+
+## Calibration Tracking
+
+### Per-Person Calibration Memory
+After simulating someone, store the calibration data:
+
+```
+@tszzl: voice=8.5, emoji_rate=0%, avg_words=14, lowercase=95%, 
+        signature_move="aphoristic fragments", danger="goes verbose"
+@nickcammarata: voice=8.8, emoji_rate=0%, avg_words=19, lowercase=90%,
+        signature_move="meditation-ML connection", danger="too structured"
+```
+
+If the same person is simulated again, LOAD this calibration to skip
+the cold-start problems. The second simulation of someone should be
+better than the first because you already know their failure modes.
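
Since persistence is SQLite-backed, per-person calibration fits in a single table. A sketch using the standard `sqlite3` module; the table name, column names, and helper functions are assumptions for illustration and not the skill's actual schema.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the calibration store; ':memory:' keeps the example self-contained."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows behave like mappings
    conn.execute(
        """CREATE TABLE IF NOT EXISTS calibration (
               handle TEXT PRIMARY KEY,
               voice REAL, emoji_rate REAL, avg_words REAL, lowercase REAL,
               signature_move TEXT, danger TEXT)"""
    )
    return conn


def save_calibration(conn, handle, voice, emoji_rate, avg_words, lowercase,
                     signature_move, danger):
    """Insert or overwrite the calibration row for one handle."""
    conn.execute(
        "INSERT OR REPLACE INTO calibration VALUES (?, ?, ?, ?, ?, ?, ?)",
        (handle, voice, emoji_rate, avg_words, lowercase, signature_move, danger),
    )
    conn.commit()


def load_calibration(conn, handle):
    """Return the stored calibration as a dict, or None on a cold start."""
    row = conn.execute(
        "SELECT * FROM calibration WHERE handle = ?", (handle,)
    ).fetchone()
    return dict(row) if row else None


conn = open_store()
save_calibration(conn, "@tszzl", 8.5, 0.0, 14, 0.95, "aphoristic fragments", "goes verbose")
print(load_calibration(conn, "@tszzl"))
```

A `None` result signals a first-time target, so the run starts from baseline checks only.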
+
+### Aggregate Calibration
+Track overall simulation quality across runs:
+
+```
+Run 1: pre-refine 7.5, post-refine 8.4 (delta +0.9)
+Run 2: pre-refine 8.37, post-refine 8.53 (delta +0.16)  
+Run 3: pre-refine 8.53, post-refine 8.83 (delta +0.30, emoji fix)
+```
+
+The pre-refine score should INCREASE over time as learned rules prevent
+repeat failures. If it's not increasing, the learning loop is broken.
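
The "is the loop broken" test is a monotonicity check on the pre-refine scores. A minimal sketch using the three runs listed above; `summarize_runs` is a hypothetical helper.

```python
def summarize_runs(runs):
    """runs: list of (pre_refine, post_refine) scores, oldest first."""
    deltas = [round(post - pre, 2) for pre, post in runs]
    pre_scores = [pre for pre, _ in runs]
    # The learning loop is working if each run starts from a higher pre-refine score.
    improving = all(later > earlier for earlier, later in zip(pre_scores, pre_scores[1:]))
    return deltas, improving


deltas, improving = summarize_runs([(7.5, 8.4), (8.37, 8.53), (8.53, 8.83)])
print(deltas, improving)
```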
+
+## The Standard: Indistinguishable From Real
+
+The target is not "good enough." The target is: mix simulated posts with
+real posts and a human familiar with the person cannot reliably tell which
+is which. That's 50% accuracy on a blind comparison — random chance.
+
+Every mechanical check, every discriminator round, every learned rule
+exists to push toward that standard. If something doesn't serve that
+goal, it's wasted effort.
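
"Cannot reliably tell" needs a yardstick, because a judge who gets 11 of 20 right is not meaningfully above chance. One option, which is a swapped-in technique and not something this pipeline specifies, is an exact two-sided binomial test against 50%. A sketch using only the standard library; `binom_two_sided_p` is a hypothetical name.

```python
from math import comb


def binom_two_sided_p(correct, total, p=0.5):
    """Exact two-sided binomial p-value for `correct` right answers out of `total`."""
    pmf = [comb(total, i) * p**i * (1 - p) ** (total - i) for i in range(total + 1)]
    observed = pmf[correct]
    # Sum every outcome that is at most as likely as the observed one.
    tail = sum(x for x in pmf if x <= observed * (1 + 1e-9))
    return min(1.0, tail)


# A judge who sorts 10 of 20 posts correctly is exactly at chance.
print(binom_two_sided_p(10, 20))
# A judge who sorts all 20 correctly can clearly tell real from simulated.
print(binom_two_sided_p(20, 20))
```

A large p-value is consistent with guessing; a small one means the judge can still distinguish simulated posts. Small samples cannot separate the two cases, so this works best with a few dozen comparisons.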
+
+## Current Learned Checks (append here after each run)
+
+### From TPOT Simulation Run 1 (April 2026)
+- Warm/enthusiastic personalities (visakanv-type): do NOT add decorative emoji.
+  Bio emoji ≠ tweet emoji. Actual emoji rate for "warm" TPOT posters: <15%.
+  PROMOTED after being caught by user, not by discriminator (discriminator failure).
+- Conversation flow: pure agreement chains are instruct-model slop.
+  Real threads have at least one moment of friction, misunderstanding, or deflection.
+- Academic-leaning voices (repligate-type): ground claims in specific experiments,
+  transcripts, or model behaviors they've personally observed. Generic philosophical
+  language without specifics = slop, even if it sounds smart.
+- Self-deprecating voices (QC-type): hedge more. "i think" "i'm not sure" "it feels like."
+  Instruct models are too assertive even when simulating tentative people.
+- Fragment voices (roon-type): max 15-20 words. No conjunctions. No paragraphs.
+  If it reads like a complete thought, it's too complete for a fragment-poster.
+
+### From TPOT Simulation Run 2 (April 2026)
+- Reframer voices (nosilverv-type): avg ~16 words. Split multi-sentence takes
+  into separate tweets. The compression IS the voice. The mechanical check caught
+  output 113% over length that subjective scoring had rated 8/10. Trust the numbers.
+- Rare-poster voices (selentelechia-type): in a 12-post sim, give them 2-3 turns
+  max. When they speak it must LAND. Short crystallizations > long analysis.
+  "or a shared meal" was the highest-rated line at 3 words.
+- Turn symmetry: ALWAYS check. 4/4/4 is instruct-model default. Real conversations
+  have one person dominating (5), one lurking (3), others in between.
+- Verbose bias is the #1 mechanical failure. ALWAYS check avg word count against
+  real baseline BEFORE subjective scoring. Every run so far has caught over-length
+  that subjective scoring missed.
+- RHETORICAL POLISH IS SLOP. Caught post-mechanical-pass in Run 2 review.
+  Parallel antithesis ("The most X... The most Y..."), "Not X, not Y, but Z",
+  "Show me X and I'll show you Y", clean 4-step escalations, academic vocabulary
+  in casual voice — ALL passed mechanical checks but are still obviously LLM.
+  PROMOTED TO MECHANICAL SCAN: now regex-scannable alongside slop words.
+- THE BANGER PROBLEM: every simulated tweet was screenshot-worthy. Real feeds
+  are 70% mid. Must include throwaway responses ("lol" "hmm" "fair" "wait actually").
+  PROMOTED: banger check is now mandatory in mechanical verification.
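+
+The length and turn-symmetry checks from this run can be sketched mechanically.
+This is a minimal illustration, not the skill's actual verifier: the function
+names are hypothetical and word counts use naive whitespace splitting.
+
+```python
+from collections import Counter
+
+
+def avg_words(posts):
+    """Mean whitespace-delimited word count across a list of post strings."""
+    return sum(len(p.split()) for p in posts) / len(posts)
+
+
+def over_length_pct(sim_posts, real_posts):
+    """Percent by which simulated posts exceed the real baseline length."""
+    baseline = avg_words(real_posts)
+    return (avg_words(sim_posts) - baseline) / baseline * 100
+
+
+def turns_are_symmetric(speakers):
+    """True when 2+ speakers all have the same turn count (e.g. 4/4/4)."""
+    counts = Counter(speakers)
+    return len(counts) > 1 and len(set(counts.values())) == 1
+```
+
+Running both before any subjective scoring keeps the numbers from being
+overridden by a draft that merely reads well.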
+
+### From TPOT Simulation Run 3 — Star Thread Discovery (April 2026)
+- STAR THREAD IS THE KEY. Dossier-first generation produces surface-accurate
+  but dead output. Star-thread-first generation produces messy, alive output
+  that actually sounds like the person. Generate from the thread. Verify with data.
+- Rhetorical polish vanished once generation came from "what is this person DOING"
+  rather than "what would this person SAY." Reframers reframe. Conveners convene.
+  Distillers distill. The VERB drives the voice, not the adjectives.
+- People in conversation REFERENCE EACH OTHER BY NAME. Tyler says "Bosco always
+  comes in with the three word version." This is obvious but the dossier approach
+  never produced it because it models each person in isolation.
+- PROMOTED: star thread is now the FIRST entry in every dossier. Before voice
+  profile, before psychometrics, before everything else. It's the generation seed.
+  Everything else is verification.
+
+### Operational Findings (verified April 2026)
+- X API bearer token: 10K tweets/15min, 300 profiles/15min, 450 searches/15min.
+  Most generous rate limits. Always use as primary source.
+- threads.net redirects to threads.com. Always use curl's -L flag or request .com directly.
+  Previous test saying "no OG tags" was WRONG — tags exist, domain was wrong.
+- Instagram private API: i.instagram.com + mobile UA + x-ig-app-id: 936619743392459.
+  Returns full JSON with 12 posts. No auth needed. CDN image URLs work for vision_analyze.
+- Facebook: Googlebot UA trick works for public pages. Returns name, bio, likes (121M for zuck).
+  Normal UA and mobile variants all redirect to login wall.
+- TikTok: stats are in __UNIVERSAL_DATA_FOR_REHYDRATION__ JSON at path
+  __DEFAULT_SCOPE__.webapp.user-detail.userInfo.statsV2 (use statsV2 not stats).
+- Bluesky searchPosts returns 403 from datacenter IPs. Workaround: searchActors + getAuthorFeed.
+- nitter.cz is the ONLY working nitter instance (via web_extract, not curl).
+- Reddit JSON API requires User-Agent header or returns 429.
+- GEPA native had `max_steps` API mismatch with DSPy 3.1.3. MIPROv2 fallback works.
+  hermes-agent-self-evolution config: max_skill_size bumped to 20_000 for worldsim-class skills.
+- hermes-agent-self-evolution is at ~/.hermes/hermes-agent-self-evolution/ with .venv.
+  Must export API keys from ~/.hermes/.env before running.
+- Podcast transcripts (Lex Fridman, Tyler Cowen, TED) are the HIGHEST VALUE source
+  for voice profiling. Hours of unscripted speech > thousands of tweets.
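+
+The TikTok finding above can be illustrated with a short parser. This is a
+sketch under assumptions: it assumes the JSON sits in a `<script>` tag whose
+`id` is `__UNIVERSAL_DATA_FOR_REHYDRATION__`, and that `webapp.user-detail`
+is a single key containing a dot. `extract_stats_v2` is a hypothetical helper,
+not part of the skill's scripts.
+
+```python
+import json
+import re
+
+SCRIPT_RE = re.compile(
+    r'<script[^>]*id="__UNIVERSAL_DATA_FOR_REHYDRATION__"[^>]*>(.*?)</script>',
+    re.DOTALL,
+)
+
+# Assumption: "webapp.user-detail" is one key, not two nested levels.
+STATS_PATH = ("__DEFAULT_SCOPE__", "webapp.user-detail", "userInfo", "statsV2")
+
+
+def extract_stats_v2(html):
+    """Return the statsV2 dict from profile HTML, or None if it is absent."""
+    match = SCRIPT_RE.search(html)
+    if match is None:
+        return None
+    node = json.loads(match.group(1))
+    for key in STATS_PATH:
+        if not isinstance(node, dict) or key not in node:
+            return None
+        node = node[key]
+    return node
+```
+
+Returning None instead of raising lets a caller fall back to another source
+when the page layout changes.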
+
+### From Simulation Run 4 — Engine Mode + Profile Command (April 2026)
+- ENGINE MODE: When worldsim is active, ZERO assistant personality leaks.
+  No kawaii, no markdown, no chatty commentary between phases. Every token
+  is simulation fidelity. First attempt leaked personality; user corrected.
+  PROMOTED TO PERMANENT RULE in SKILL.md.
+- X API CURL > NITTER for voice calibration. nitter.cz returns 502 or "user
+  not found" unpredictably. Direct curl to X API v2 with bearer token returns
+  full text + metrics. 3 pages (90 tweets) is enough for fidelity 100. Always
+  use this as PRIMARY voice source, nitter as supplement only.
+- CAPS BURST PATTERN: some voices (karan4d-type) use lowercase default with
+  sporadic ALL CAPS for excitement ("WAZZAAAAAAPPPP", "LAWDAMERCYYYYY",
+  "AWOOGA"). This is distinct from consistent-lowercase (tenobrus-type) and
+  sentence-case (somewheresy-type). Capture this in voice profile as a
+  three-way distinction: lowercase-default, caps-burst, sentence-case.
+- TEXT EMOTICONS vs EMOJI: karan4d uses :) >.< ~ but almost zero standard
+  emoji. This is a distinct expressiveness mode from zero-emoji (tenobrus)
+  and sparse-emoji. Include text emoticon inventory in voice profile.
+- STAR THREAD 5/5 TEST is mandatory for profile command. Write the thread,
+  then test it against 5 real posts with explicit reasoning per post. If
+  fewer than 4/5 fit, the thread is wrong — keep looking. Show the work.
+- PROFILE OUTPUT: star thread → voice profile (caps, punctuation, word count,
+  emoji/emoticon inventory, vocabulary, register, threading behavior) →
+  psychometrics (Big Five, Moral Foundations, cognitive style) → key positions
+  (with dates and real tweet quotes) → ecosystem (inner circle, professional,
+  cultural) → intelligence tradecraft (key assumptions, red hat, deception
+  detection, competing hypotheses) → invalidation indicators → source reliability.
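+
+The three-way capitalization distinction from this run (lowercase-default,
+caps-burst, sentence-case) can be approximated from a sample of real posts.
+This is a rough sketch: the thresholds and the 5-letter minimum for a caps
+burst are illustrative assumptions, and long acronyms will be miscounted as
+bursts.
+
+```python
+import re
+
+# A "burst" here means a run of 5+ capital letters (assumed cutoff).
+CAPS_WORD = re.compile(r"\b[A-Z]{5,}\b")
+
+
+def _first_letter(post):
+    """First alphabetic character of the post, or '' if there is none."""
+    return next((ch for ch in post if ch.isalpha()), "")
+
+
+def classify_caps(posts, burst_share=0.1, sentence_share=0.5):
+    """Label a voice as sentence-case, caps-burst, or lowercase-default."""
+    burst = [bool(CAPS_WORD.search(p)) for p in posts]
+    # Posts with a burst are not counted as sentence-case evidence.
+    sentence = [
+        _first_letter(p).isupper() and not b for p, b in zip(posts, burst)
+    ]
+    n = len(posts)
+    if sum(sentence) / n >= sentence_share:
+        return "sentence-case"
+    if sum(burst) / n >= burst_share:
+        return "caps-burst"
+    return "lowercase-default"
+```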
@@ -0,0 +1,278 @@
+# Search Strategies — Finding Anyone Across Platforms
+
+The hardest part of simulation is building an accurate model of a real person. This doc
+covers how to systematically discover and profile someone across every platform we care about.
+
+## General Principles
+
+1. **Start broad, go narrow.** First establish WHO they are, then drill into HOW they talk.
+2. **Cross-reference.** Someone's Reddit persona may differ wildly from their Twitter persona. That's signal, not noise.
+3. **Recency matters.** People's views evolve. Weight recent posts (last 6 months) over older ones.
+4. **Interactions > monologues.** How someone replies reveals more about their voice than their prepared posts.
+5. **Controversy is gold.** People are most themselves when arguing. Search for debates and disagreements.
+
+## Platform-Specific Discovery
+
+### X / Twitter
+
+Twitter is the richest source for most public figures in tech/AI. Multiple approaches:
+
+#### With x-cli (if API keys available)
+```bash
+# Recent timeline — best single source of voice data
+x-cli user timeline {handle} --max 30 -j
+
+# Their own posts, replies included: how they interact, argue, joke
+x-cli tweet search "from:{handle}" --max 30 -j
+
+# What others say about/to them
+x-cli tweet search "to:{handle}" --max 20 -j
+
+# On specific topics
+x-cli tweet search "from:{handle} open source" --max 10 -j
+```
+
+#### Without API (web_search + web_extract)
+```
+# Identity + role
+web_search("{handle} twitter bio role company")
+
+# Voice + opinions
+web_search("{handle} twitter hot takes opinions")
+web_search("site:x.com {handle}")
+
+# Topic-specific positions
+web_search("{handle} twitter {topic}")
+web_search("{handle} {topic} opinion take")
+
+# Interviews / longform (reveals deeper thinking)
+web_search("{handle} interview podcast AI")
+web_search("{handle} blog post essay")
+
+# Beefs and debates (reveals personality under pressure)
+web_search("{handle} twitter debate disagree controversial")
+web_search("{handle} vs {other_person}")
+
+# Newsletter aggregators that index tweets
+web_search("site:buttondown.com/ainews {handle}")
+web_search("site:news.smol.ai {handle}")
+web_search("site:techmeme.com {handle}")
+web_search("site:latent.space {handle}")
+```
+
+#### AI Twitter Aggregator Sites (high value)
+These sites index AI Twitter conversations daily:
+- `buttondown.com/ainews` — swyx's AI News, indexes hundreds of AI Twitter accounts
+- `news.smol.ai` — smol AI news aggregator
+- `techmeme.com` — tech news, includes tweet citations
+- `latent.space` — AI podcast/newsletter with Twitter references
+
+Search pattern: `site:{aggregator} "{handle}"` to find indexed tweets and discussions.
+
+#### IMPORTANT: web_extract does NOT work on x.com
+web_extract returns "Website Not Supported" for all x.com/twitter.com URLs.
+Do NOT attempt it — it wastes a tool call every time.
+
+#### Verified Fallback Access Methods (tested April 2026)
+
+**PRIMARY: X API v2 Bearer Token** (confirmed working)
+- Profiles, timelines, search: roughly 300 profiles, 450 searches, or 10K tweets per 15 min
+- See scripts/x_api.py
+
+**FALLBACK 1: nitter.cz via web_extract** (WORKS)
+```
+web_extract(["https://nitter.cz/{handle}"])
+```
+Returns full profile + recent timeline. Direct curl gets Cloudflare-blocked
+but web_extract bypasses it. Rich data: bio, stats, pinned tweets, full text.
+NOTE: Most other nitter instances are DEAD (nitter.net, xcancel.com, etc.)
+
+**FALLBACK 2: ThreadReaderApp** (WORKS — excellent for historical threads)
+```
+web_extract(["https://threadreaderapp.com/user/{handle}"])
+```
+Returns unrolled historical threads with full text. Found threads back to 2023.
+Gold for longform voice samples.
+
+**FALLBACK 3: GitHub API** (WORKS — excellent for tech people)
+```
+curl -s 'https://api.github.com/users/{handle}'
+curl -s 'https://api.github.com/users/{handle}/repos?sort=updated'
+curl -s 'https://api.github.com/users/{handle}/events'
+curl -s 'https://api.github.com/users/{handle}/gists'
+```
+No auth needed (60 req/hr). Profile READMEs are voice profiling gold.
+Events API shows recent activity with comment text.
+
+**FALLBACK 4: Reddit JSON API** (WORKS)
+```
+curl -s -H 'User-Agent: hermes-sim/1.0' 'https://www.reddit.com/user/{username}.json'
+curl -s -H 'User-Agent: hermes-sim/1.0' 'https://www.reddit.com/user/{username}/comments.json'
+curl -s -H 'User-Agent: hermes-sim/1.0' 'https://www.reddit.com/r/{sub}/search.json?q={query}&restrict_sr=on'
+```
+MUST include User-Agent header or get 429. Reddit voice is often more
+candid/detailed than Twitter voice — high value for personality profiling.
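+
+The same request can be made from Python with the standard library. This is
+a minimal sketch: `reddit_request` and `fetch_comment_bodies` are hypothetical
+helper names, and the parsing assumes Reddit's usual listing layout
+(`data.children[].data.body`).
+
+```python
+import json
+import urllib.parse
+import urllib.request
+
+USER_AGENT = "hermes-sim/1.0"  # same value as the curl examples
+
+
+def reddit_request(username, listing="comments"):
+    """Build a Request for a user's public listing with a User-Agent set."""
+    name = urllib.parse.quote(username, safe="")
+    url = f"https://www.reddit.com/user/{name}/{listing}.json"
+    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
+
+
+def fetch_comment_bodies(username):
+    """Fetch the comments listing and return the comment texts (network call)."""
+    with urllib.request.urlopen(reddit_request(username), timeout=10) as resp:
+        payload = json.load(resp)
+    children = payload.get("data", {}).get("children", [])
+    return [c["data"]["body"] for c in children if "body" in c.get("data", {})]
+```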
+
+**FALLBACK 5: HackerNews Algolia API** (WORKS — fully open)
+```
+curl -s 'https://hn.algolia.com/api/v1/search?query={name}&tags=comment'
+```
+No auth, no rate limits visible. Great for finding what others say about
+someone + their own HN comments if they have an account.
+
+**FALLBACK 6: YouTube via web_extract** (WORKS)
+Search for interviews/talks, then web_extract the video pages.
+Returns rich summaries with attributed quotes from specific speakers.
+
+**NOT VIABLE** (tested, confirmed blocked):
+- Google Cache of Twitter → empty results
+- Wayback Machine for tweets → sparse captures, no JS content
+- Twitter Syndication API → rate limited / broken
+- All Instagram viewers (imginn, picuki, dumpoir, gramhir) → 403
+- LinkedIn → fully blocked for scraping
+- Archive.today → rate limited + CAPTCHA
+- Most nitter instances → dead or 403
+
+#### Best approach without x-cli
+The most reliable path is: web_search with aggregator sites (ainews, smol.ai,
+techmeme, latent.space). These index AI Twitter daily and return actual tweet
+text in search descriptions. Stack multiple aggregator searches to build a
+composite picture. This was validated in practice — it returns enough signal
+to build solid dossiers for anyone active in AI Twitter.
+
+### Reddit
+
+Reddit profiles are public and indexable. Reddit users often have very different 
+personas from their Twitter selves — more detailed, more argumentative, more honest.
+
+```
+# Find their Reddit username (often different from Twitter)
+web_search("{real_name} reddit account")
+web_search("{twitter_handle} reddit username")
+
+# Profile and post history
+web_search("site:reddit.com/user/{reddit_username}")
+web_search("site:reddit.com {reddit_username} {topic}")
+
+# Subreddit-specific behavior
+web_search("site:reddit.com/r/LocalLLaMA {username}")
+web_search("site:reddit.com/r/MachineLearning {username}")
+
+# Extract actual posts
+web_extract(["https://www.reddit.com/user/{username}/comments/"])
+web_extract(["https://www.reddit.com/user/{username}/submitted/"])
+```
+
+Key subreddits for AI people:
+- r/LocalLLaMA — open source LLM community
+- r/MachineLearning — academic ML
+- r/singularity — AGI speculation  
+- r/ChatGPT, r/ClaudeAI, r/OpenAI — product-focused
+- r/StableDiffusion — image gen community
+
+### Discord
+
+Discord is hardest — most servers aren't publicly indexed. Strategies:
+
+```
+# Find what servers they're in
+web_search("{name} discord server")
+web_search("{name} discord community")
+
+# Some Discord logs are public via indexers
+web_search("site:discordchats.net {username}")
+
+# AI News indexes some Discord channels
+web_search("site:buttondown.com/ainews discord {name}")
+```
+
+Discord personality notes:
+- People are MUCH more casual on Discord than Twitter
+- More profanity, more shitposting, more stream-of-consciousness
+- Server context matters hugely (same person behaves differently in different servers)
+- Harder to research but very valuable if you can find logs
+
+### Blogs / Newsletters / Long-form
+
+These reveal deeper thinking that tweets can't capture:
+
+```
+web_search("{name} blog substack medium")
+web_search("{name} essay AI opinion")
+web_search("{name} substack newsletter")
+
+# Personal sites
+web_search("{name} personal website about")
+
+# Extract full posts
+web_extract(["https://{their-substack}.substack.com/"])
+```
+
+### YouTube / Podcasts
+
+Interview appearances reveal speaking style, humor, and unscripted thinking:
+
+```
+web_search("{name} podcast interview AI YouTube")
+web_search("{name} YouTube talk presentation")
+
+# Use youtube-content skill if available to pull transcripts
+```
+
+### GitHub
+
+For technical people, their GitHub activity reveals priorities and communication style:
+
+```
+web_search("site:github.com {username} issues comments")
+web_search("site:github.com {username}")
+
+# Issue comments and PR reviews show how they communicate technically
+web_extract(["https://github.com/{username}"])
+```
+
+## Cross-Platform Identity Resolution
+
+People use different handles across platforms. Resolution strategies:
+
+1. **Bio links**: Twitter bios often link to personal sites with other handles
+2. **Name search**: `web_search("{real_name} {platform}")` 
+3. **Email/domain**: personal domains often connect identities
+4. **Aggregator profiles**: sites like Linktree, bio.link collect handles
+5. **Conference talks**: speaker bios list multiple handles
+6. **Direct search**: `web_search("{twitter_handle} reddit OR github OR discord")`
+
+## Confidence Scoring
+
+After research, rate confidence for each person:
+
+- **HIGH (80-100%)**: 20+ indexed tweets/posts found, clear voice patterns, known positions on multiple topics, interviews/longform available
+- **MEDIUM (50-79%)**: 5-19 indexed posts, general voice sense but some gaps, positions on some topics unclear
+- **LOW (20-49%)**: <5 posts found, voice is guesswork, mostly inferring from role/org
+- **INSUFFICIENT (<20%)**: can't find enough to simulate accurately. Tell the user.
+
+Always be honest about confidence. A low-confidence simulation should be flagged as such.
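+
+As a first pass, the post-count thresholds above can be turned into a tier
+lookup. This is only a sketch: it looks at post count alone and ignores voice
+clarity and longform availability. Treating zero posts as INSUFFICIENT is an
+assumption, and `confidence_tier` is a hypothetical name.
+
+```python
+def confidence_tier(post_count):
+    """Map the number of indexed posts found to a confidence tier label."""
+    if post_count < 0:
+        raise ValueError("post_count must be non-negative")
+    if post_count == 0:
+        return "INSUFFICIENT"  # assumed cutoff: nothing found at all
+    if post_count < 5:
+        return "LOW"
+    if post_count < 20:
+        return "MEDIUM"
+    return "HIGH"  # 20 or more posts
+```
+
+A tier from this lookup should still be lowered by hand when the posts found
+are old, off-topic, or all from a single source.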
+
+## Research Optimization
+
+For fidelity levels:
+
+**Low (1-30)**: 2 searches per person max
+- web_search("{handle} twitter") — identity
+- web_search("{handle} {topic}") — position on topic if specified
+
+**Medium (31-70)**: 4-6 searches per person
+- Identity search
+- Voice/opinions search  
+- Topic-specific search
+- One aggregator site search
+- Optional: one web_extract on a blog/interview
+
+**High (71-100)**: 8-12+ searches per person
+- All medium searches
+- Multiple aggregator sites
+- web_extract on 2-3 longform pieces
+- Cross-platform search (Reddit, GitHub)
+- Debate/controversy search
+- Recent vs historical position comparison
+- Browser fallback if needed
@@ -0,0 +1,359 @@
+# Simulation Engine — How to Generate Conversations
+
+This is the playbook for Phase 3: actually generating the simulated interaction.
+The agent reads this after compiling dossiers and uses it to guide generation.
+
+## Pre-Generation Checklist
+
+Before writing a single simulated word, confirm:
+- [ ] Every participant has a compiled dossier
+- [ ] Confidence level is noted for each participant  
+- [ ] Platform format is selected
+- [ ] Topic/scenario is established (or "organic" if freeform)
+- [ ] Length target is set
+
+## Conversation Architecture
+
+Real conversations aren't ping-pong debates. They have tendencies toward structure,
+but treat the following as a GENERAL PATTERN, not a rigid template. Real threads
+frequently skip phases, loop back to earlier ones, die abruptly after 2 messages,
+or spiral into something completely unrelated. Some threads are ALL peak. Some
+never develop past the opening. Let the personalities and topic drive the shape,
+not this outline.
+
+### Opening Moves (1-3 posts)
+Someone posts a take, shares news, or makes an observation. This is the SEED.
+- Should feel natural — not "let me start a debate about X"
+- Can be a link share, a hot take, a reaction to news, a shitpost
+- The opener should be something this person would ACTUALLY post
+
+### Development (4-8 posts)  
+Others respond. This is where personality dynamics emerge.
+- Not everyone responds to the original — people respond to EACH OTHER
+- Side conversations branch off
+- Someone might misunderstand and get corrected
+- Jokes and tangents happen naturally
+- Not everyone agrees — find the real fault lines between these people
+
+### Peak (2-4 posts)
+The best/most viral/most insightful moment of the thread.
+- Usually someone drops a genuinely good take
+- Or someone gets ratio'd
+- Or an unexpected agreement happens
+- This is the "screenshot moment" people share
+
+### Resolution (1-3 posts)
+Most conversations don't end cleanly. Many don't have a "resolution" at all. They:
+- Trail off with someone making a joke
+- End with an "anyway back to work" type post
+- Get interrupted by something else
+- Sometimes just stop (most realistic)
+- Get revived 3 hours later when someone shows up late
+
+**Important**: Don't force all four phases. A shitpost thread might be Opening→Peak→done.
+A nuanced debate might loop Development→Peak→Development→Peak repeatedly. Match what
+the actual people and topic would produce.
+
+## Voice Fidelity Rules
+
+### DO:
+- Use their ACTUAL vocabulary. If someone says "dawg" a lot, use "dawg"
+- Match their sentence length patterns exactly
+- Replicate their capitalization and punctuation habits
+- Include their signature moves and catchphrases
+- Reference real things they've actually talked about
+- Match their humor style precisely (deadpan ≠ shitpost ≠ sarcasm)
+
+### DON'T:
+- Make everyone articulate the same way
+- Clean up someone's grammar if they write informally
+- Add emoji to someone who doesn't use them — THIS IS THE #1 INSTRUCT MODEL
+  FAILURE. Most real people use emoji in <15% of tweets, and only specific ones.
+  "Warm person" ≠ emoji. "Enthusiastic person" ≠ emoji. CHECK THE DATA.
+  Run an emoji count on their real tweets before simulating. Bio emoji ≠ tweet emoji.
+- Make someone verbose if they're terse
+- Put academic language in a shitposter's mouth
+- Make someone agreeable if they're known for being contrarian
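+
+The emoji count mentioned in the list above can be run with a few lines of
+Python. This is an approximation: the regex covers the main emoji blocks
+rather than every emoji codepoint, and the function names are illustrative.
+
+```python
+import re
+from collections import Counter
+
+# Rough coverage: pictographs and emoticons, misc symbols and dingbats, flags.
+EMOJI_RE = re.compile(
+    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]"
+)
+
+
+def emoji_rate(posts):
+    """Fraction of posts that contain at least one emoji."""
+    return sum(1 for p in posts if EMOJI_RE.search(p)) / len(posts)
+
+
+def emoji_inventory(posts):
+    """Count which emoji actually appear, so only those get simulated."""
+    return Counter(ch for p in posts for ch in EMOJI_RE.findall(p))
+```
+
+Feed it tweets only, not the bio, since bio emoji and tweet emoji differ.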
+
+### Voice Differentiation Test
+Read each simulated post with the name hidden. If you can't tell who's 
+talking from the voice alone, the simulation isn't good enough. Rewrite.
+
+### The Similar Voice Problem
+When two participants have genuinely similar posting styles (e.g., two irony-pilled
+shitposters, two academic long-posters), voice alone won't differentiate them.
+Use these concrete techniques:
+
+1. **Content/position divergence**: Even if they SOUND similar, they care about
+   different things. Lean into their different topic obsessions and knowledge areas.
+2. **Unique references**: Person A references anime and startups. Person B references
+   philosophy and MMA. Even in the same register, their cultural touchstones differ.
+3. **Relationship dynamics**: Person A might be deferential to Person C while Person B
+   challenges them. Their SOCIAL behavior differentiates even when solo voice doesn't.
+4. **Structural tics**: One does single long posts, the other does rapid-fire 3-message
+   bursts. One uses parentheticals, the other uses em-dashes. Find the micro-differences.
+5. **Disagreement style**: Similar voices often diverge most when disagreeing. One
+   goes cold and precise, the other gets heated and hyperbolic. Manufacture a moment
+   of friction to surface these differences early in the thread.
+
+If after all this they're STILL hard to tell apart — that's okay. Some people genuinely
+sound similar online. Flag it in your confidence notes rather than forcing fake differences.
+
+### Temporal Personality Drift
+People change. Weight recent data higher than old data.
+- Someone's 2021 tweets may reflect a completely different person than their 2025 posts
+- Look for explicit pivots (career changes, public "I was wrong about X" moments,
+  changed social circles)
+- If you only have old data, flag it: "Based on data from {period}. Their current
+  views may have shifted."
+- When recent and old data conflict, default to recent unless you have specific reason
+  to believe the old position is more authentic (e.g., the new one is clearly performative)
+
+## Platform Format Specs
+
+### X / Twitter
+```
+@handle:
+  [tweet text — respect ~280 char vibes but don't count exactly]
+  [if QRT, show the quoted tweet indented]
+  🔁 {retweets}  ♡ {likes}
+
+    @replier:
+    [reply text]
+    🔁 {retweets}  ♡ {likes}
+
+      @nested_replier:
+      [nested reply]
+      🔁 {retweets}  ♡ {likes}
+```
+
+Engagement number guidelines:
+- Match to actual follower counts. A 5K account gets 10-500 likes typically.
+- Viral posts can 10-50x normal engagement
+- Ratio indicator: when replies >> likes, that's a ratio
+- QRTs are often dunks — frame them that way if appropriate
+
+Thread indicators:
+- "🧵 1/" for thread starts
+- Reply chains show conversation flow
+- Some people never thread, some always thread
+
+### Reddit
+```
+r/{subreddit} • Posted by u/{username} • {time}ago
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+{Title}
+
+{Body text — can be long on Reddit}
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+⬆ {score} | 💬 {comment_count}
+
+  u/{replier} • {time}ago • ⬆ {score}
+  {comment text}
+
+    u/{nested} • {time}ago • ⬆ {score}
+    {nested comment}
+
+      u/{deep_nested} • {time}ago • ⬆ {score}
+      {deep reply}
+```
+
+Reddit-specific behaviors:
+- People write MUCH longer on Reddit
+- More formal/detailed than Twitter
+- Upvote/downvote dynamics (controversial = many votes both ways)
+- Subreddit culture matters (r/LocalLLaMA is different from r/MachineLearning)
+- People cite sources more
+- "Edit: ..." is common
+
+### Discord
+```
+━━━ #{channel-name} ━━━━━━━━━━━━━━━━━━━━━━━━━━
+
+{display_name} — Today at {time}
+{message text}
+{optional: embed/link preview}
+👍 {count}  🔥 {count}  {other reactions}
+
+  {display_name2} — Today at {time}
+  > {quoting previous message}
+  {reply text}
+  😂 {count}
+
+{display_name3} — Today at {time}
+{message — note: Discord messages flow continuously, not just replies}
+```
+
+Discord-specific behaviors:
+- Much more casual, rapid-fire
+- Reactions instead of likes (emoji diversity)
+- People send multiple short messages instead of one long one
+- GIF/meme sharing is common (describe it: *[posts GIF of X]*)
+- "@everyone" and "@here" pings
+- Voice chat references ("just said this in vc")
+- Server-specific culture and inside jokes
+- Bot interactions ("!command")
+
+### X / Twitter DMs
+```
+{display_name}
+{message text}
+{timestamp — e.g., "3:42 PM"}
+
+          {other_person_display_name}
+          {message text}
+          {timestamp}
+
+{display_name}
+{message text}
+{timestamp}
+```
+
+DM-specific behaviors:
+- WAY more casual than public tweets — grammar drops, typos increase
+- Longer messages than tweets (no character pressure)
+- People share links and screenshots with minimal commentary ("look at this lmao")
+- More honest/vulnerable than public posts — less performative
+- Faster back-and-forth, more like texting than posting
+- Reactions (❤️, 😂, etc.) on individual messages
+- Voice messages referenced occasionally ("gonna send a voice note about this")
+- No audience effects — people say things in DMs they'd never post publicly
+
+### Discord DMs
+```
+{display_name} — Today at {time}
+{message text}
+
+{display_name2} — Today at {time}
+{message text}
+
+{display_name} — Today at {time}
+{message text}
+{message text}
+{message text}
+```
+
+Discord DM-specific behaviors:
+- Even more casual than Discord channels — no server norms to follow
+- Rapid-fire multiple short messages in a row (no combining into one)
+- Heavy use of reactions, GIFs, stickers
+- People share server drama, screenshots from other channels
+- More personal topics — server channels are semi-public, DMs are private
+- Link/image sharing with minimal text
+
+### Reddit DMs / Chat
+```
+{username}: {message text}
+{other_username}: {message text}
+{username}: {message text}
+```
+
+Reddit DM-specific behaviors:
+- Much rarer than X or Discord DMs — usually triggered by a specific post/comment
+- Often starts with "Hey, saw your comment on r/{sub} about..."
+- Can be awkward/formal since people don't usually DM on Reddit
+- Shorter than Reddit comments, closer to chat-style
+- Less established rapport than other platforms (Reddit is more anonymous)
+- People sometimes share personal details they wouldn't put in public comments
+
+## Dynamic Elements
+
+### Injecting Realism
+Sprinkle in these to make simulations feel alive:
+- Someone being late to the conversation ("wait what did I miss")
+- Typos that specific people would make (some people never typo, some always do)
+- Deleted/edited posts ("[deleted]" or "Edit: fixed typo")
+- Someone posting and immediately clarifying ("wait let me rephrase")
+- External references ("did you see what X just posted")
+- Time gaps (not everything happens in 30 seconds)
+- Someone going AFK mid-conversation
+
+### Scenario Injection
+When the user provides --scenario, weave it in naturally:
+- Don't have everyone immediately react to the scenario
+- Someone might not have seen the news yet
+- Different people will interpret the same event differently
+- Some will have insider knowledge, some will speculate
+
+### Multi-person Dynamics (3+ people)
+- Not everyone talks to everyone
+- Alliances form naturally (people who agree start building on each other)
+- Side conversations happen
+- Someone might get ignored
+- Different energy levels (one person might dominate, another lurks)
+
+### Large Group Conversations (4+ people)
+**Honest note**: Simulation quality degrades noticeably above 3-4 participants.
+Managing this many distinct voices is hard. Use these techniques to mitigate:
+
+1. **Speaker turn management**: Not everyone speaks in every round. In a 6-person
+   thread, a given message might only get 2-3 responses. Track who has spoken
+   recently and who hasn't. After 4-5 messages, check: is anyone being forgotten?
+
+2. **The wallflower problem**: In large sims, quiet participants tend to vanish
+   entirely. Fix: give each person at least ONE moment in the spotlight. Even the
+   lurker eventually drops a "lol" or a single devastating one-liner. Set a mental
+   counter — if someone hasn't spoken in 5+ messages, find a natural reason to
+   bring them back in (someone @'s them, the topic shifts to their expertise, etc.)
+
+3. **Consolidate alliances**: In 5+ person threads, people cluster. Two people
+   who agree strongly can be treated as a mini-unit — one makes the point, the
+   other co-signs briefly rather than both making full arguments. This reduces
+   the number of fully independent voices you need to maintain at once.
+
+4. **Stagger arrivals**: Not everyone needs to be present from message 1. Have
+   some people join later. This lets you establish 2-3 voices cleanly before
+   adding more.
+
+5. **Quality check**: After drafting a 4+ person sim, re-read with names hidden.
+   If more than 2 people sound interchangeable, pick the least-differentiated
+   one and either sharpen their voice or reduce their participation to brief
+   interjections that match what they'd actually say.
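+
+The "mental counter" for the wallflower problem can be made explicit. This
+is a minimal sketch: `silent_participants` is a hypothetical helper, and the
+roster should list only people who have already joined the thread.
+
+```python
+def silent_participants(speakers, roster, threshold=5):
+    """Return roster members absent from the last `threshold` messages.
+
+    `speakers` is the ordered list of who sent each message so far.
+    """
+    if len(speakers) < threshold:
+        return []  # thread too short for anyone to count as forgotten
+    recent = set(speakers[-threshold:])
+    return [name for name in roster if name not in recent]
+```
+
+Anyone it returns is a candidate for an @-mention or a topic shift toward
+their expertise in the next few messages.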
+
+## Interactive Mode
+
+After initial simulation, user can:
+
+### "continue"
+Generate 5-8 more posts continuing the natural flow.
+
+### "inject: {event}"  
+Introduce new information mid-conversation.
+- Characters react based on their dossier
+- Some might not care about the event
+- Timing matters (who sees it first?)
+
+### "@{handle} enters"
+Add a new participant.
+- Quick-research the new person (2-3 searches minimum)
+- They don't know the full prior context (might ask "what are you guys talking about")
+- Existing dynamics shift with a new presence
+
+### "what would @{handle} say about {topic}"
+Single-person prediction mode.
+- Generate 1-3 tweets/posts
+- Can be used to test dossier accuracy before full simulation
+- Good for quick "vibe checks"
+
+### "dm: @{handle1} -> @{handle2}"
+Simulate a private conversation between two people.
+- Tone shifts dramatically in DMs (more honest, less performative)
+- No audience effects
+- People say things in DMs they'd never post publicly
+
+### "react: @{handle} to {event}"
+How would this person react to a specific event.
+- Generate their initial post about it
+- Predict their follow-up engagement
+
+## Quality Control
+
+After generating, self-check:
+1. **Voice test**: Cover the names. Can you tell who's talking? 
+2. **Position test**: Is anyone saying something they'd never actually say?
+3. **Dynamic test**: Does the conversation flow naturally or feel scripted?
+4. **Platform test**: Does it look/feel like the actual platform?
+5. **Engagement test**: Are the numbers realistic for these people?
+6. **Reference test**: Are real events/products/people referenced accurately?
+
+If any check fails, regenerate that section.
@@ -0,0 +1,170 @@
+# The Star Thread — Personality Compression
+
+## The Problem
+
+A dossier has 50 data points. Mechanical checks verify surface features.
+The discriminator loop catches vocabulary and length. But the output still
+reads like an LLM doing an impression. It's accurate the way a police
+sketch is accurate — all the features are right but nobody would mistake
+it for a photograph.
+
+The missing piece isn't more data. It's compression.
+
+## The Insight
+
+When you "pull the star thread" on a person, their whole voice coheres.
+Not because you loaded rules about capitalization and emoji frequency.
+Because you found the CORE THING they're doing when they post — the
+single generative seed that everything else is a variation of.
+
+A great character writer doesn't need a backstory bible. They need one
+insight about what the character WANTS, and every line of dialogue writes
+itself from that.
+
+The star thread is the personality equivalent of that insight.
+
+## What a Star Thread Is
+
+NOT: "They use lowercase and rarely punctuate and average 16 words"
+     (That's the dossier. Surface features.)
+
+NOT: "They score high on Openness and low on Agreeableness"
+     (That's the psychometric profile. Taxonomy.)
+
+IS:  The core cognitive/emotional move this person makes EVERY time
+     they post. The thing they can't help doing. The lens they can't
+     take off. The itch they're always scratching.
+
+## Examples
+
+**@tszzl (roon)**: Takes something everyone sees and compresses it
+into an observation so dense it could be a koan or a shitpost and
+you can't tell which. His star thread is: the world already said
+everything interesting, he's just notating it more efficiently.
+He doesn't ARGUE. He COMPRESSES.
+
+**@eigenrobot**: Refuses to let narrative override data. His star
+thread is: you are telling a story about the world and he's here to
+point out the story doesn't match the numbers, and he's not sorry
+about it. He doesn't DEBATE. He CORRECTS.
+
+**@visakanv**: Sees two things that don't know they're connected
+and introduces them to each other with genuine delight. His star
+thread is: the world is richer than you're treating it, look at this
+thing I found, isn't it beautiful that it connects to this other thing.
+He doesn't ARGUE or ANALYZE. He SHOWS.
+
+**@nickcammarata**: Notices what's happening in his own mind while
+it's happening and reports on it with gentle surprise. His star thread
+is: the observer and the observed are the same process, and that's both
+the problem and the solution. He doesn't PERFORM insight. He NOTICES.
+
+**@selentelechia**: Waits until the conversation crystallizes and then
+names the thing nobody else quite said. Their star thread is: everything
+has already been felt, they just find the sentence for it. They don't
+CONTRIBUTE. They DISTILL.
+
+**@nosilverv**: Takes the conventional framing of something and rotates
+it until you see it's actually about something else entirely. His star
+thread is: you think this is about X but it's actually about Y, and once
+you see it you can't unsee it. He doesn't OBSERVE. He REFRAMES.
+
+**@TylerAlterman**: Asks the question that creates a room for everyone
+to walk into. His star thread is: the best ideas emerge from the right
+gathering, and his job is to be the person who arranges the gathering.
+He doesn't ANSWER. He CONVENES.
+
+**@QiaochuYuan**: Catches himself mid-thought and interrogates whether
+the thought is actually HIS or whether he borrowed it from somewhere
+he's now suspicious of. His star thread is: constant audit of where
+beliefs come from and whether they're still load-bearing. He doesn't
+ASSERT. He EXAMINES.
+
+## How to Find a Star Thread
+
+1. Read 20+ of their posts. Not for content — for MOTION.
+   What direction does every post move? What's the verb?
+
+2. Ask: what is this person DOING when they post?
+   Not "what are they saying" — what are they DOING.
+   - Compressing? Correcting? Showing? Noticing? Distilling?
+     Reframing? Convening? Examining? Performing? Confessing?
+     Defending? Testing? Entertaining? Processing?
+
+3. Ask: what would they NEVER do?
+   The negative space is as important as the positive.
+   - roon would never write an earnest list of advice
+   - eigenrobot would never concede a point gracefully
+   - visa would never dismiss something as uninteresting
+   - nick would never claim certainty about his inner life
+   - selentelechia would never rush to post
+
+4. Find the ONE SENTENCE version.
+   "This person [VERB]s [OBJECT] because [CORE NEED]."
+   - "roon compresses observations because the world is too verbose"
+   - "eigenrobot corrects narratives because stories without data are lies"
+   - "visa connects things because beauty is emergent from contact"
+
+5. Test it: read 5 of their real posts through the star thread lens.
+   Does every post make more sense as a variation on the thread?
+   If yes, you found it. If 3/5 don't fit, keep looking.
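A minimal sketch of the step-5 check, assuming each sampled post has already been judged by hand. The function name and the "borderline" outcome are illustrative: the text above only fixes the two endpoints (every post fits, or 3 of 5 do not).

```python
def judge_star_thread(fits):
    """Classify a candidate star thread from per-post fit judgments.

    fits: list of bools, one per sampled real post
          (True = the post reads as a variation on the candidate thread).
    """
    if not fits:
        raise ValueError("need at least one judged post")
    misses = fits.count(False)
    if misses == 0:
        return "found"
    # 3 misses out of 5 is the stated failure point; the integer comparison
    # scales the same ratio to other sample sizes without float rounding.
    if misses * 5 >= 3 * len(fits):
        return "keep_looking"
    # One or two misses out of five is not covered by the text (assumption).
    return "borderline"
```

A "borderline" result is a prompt to sample five more posts rather than a verdict.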
+
+## How to Use the Star Thread in Simulation
+
+### Before generating ANY utterance for this person, load their star thread.
+
+Not their dossier. Not their word count. Not their emoji rate.
+The star thread.
+
+Then for each moment in the conversation where this person would speak:
+1. What just happened in the conversation?
+2. How would someone whose core move is [STAR THREAD] respond to that?
+3. Write from the thread, not from the dossier.
+
+The dossier and mechanical checks are VERIFICATION.
+The star thread is GENERATION.
+
+Generate from the thread. Verify against the data.
+Not the other way around.
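One way to picture the split between generation and verification: candidates come from the star thread, and the dossier numbers are only used to reject them. The sketch below assumes hypothetical dossier limits (`max_words`, `lowercase_only`, `allow_emoji`); none of these names come from the skill itself, and the emoji pattern is deliberately rough.

```python
import re

# Rough emoji matcher covering the main pictograph blocks (not exhaustive).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def mechanical_problems(text, max_words, lowercase_only=False, allow_emoji=True):
    """Return a list of surface-level violations; an empty list means pass."""
    problems = []
    if len(text.split()) > max_words:
        problems.append("too long")
    if lowercase_only and any(ch.isupper() for ch in text):
        problems.append("unexpected capitals")
    if not allow_emoji and EMOJI_RE.search(text):
        problems.append("unexpected emoji")
    return problems


def first_verified(candidates, **limits):
    """Pick the first thread-generated candidate that passes the checks."""
    for text in candidates:
        if not mechanical_problems(text, **limits):
            return text
    # Nothing passed: regenerate from the thread instead of patching text.
    return None
```

The checks never rewrite a candidate. A failure sends you back to the thread, which keeps the dossier in its verification role.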
+
+### The Difference
+
+FROM DOSSIER (surface-accurate, dead):
+  "Vibes-based hiring works because shared delusions are
+  extremely productive until they aren't"
+  → Correct length. Correct caps. No emoji. No slop words.
+    But it reads like a thesis statement. Polished. WRITTEN.
+
+FROM STAR THREAD — nosilverv REFRAMES:
+  "everyone calls it 'culture fit' as if culture is a thing
+  you can fit into rather than a thing happening to you"
+  → The same insight but through the lens of his core move:
+    take the framing, rotate it, show you it's about something
+    else. Messier. More alive. More HIM.
+
+FROM DOSSIER (surface-accurate, dead):
+  "Has anyone tried to map what happens to the word 'culture'
+  as it passes through different communities?"
+  → Correct question-to-timeline format. Right length. But it's
+    a RESEARCH QUESTION. Too intellectual. Too purposeful.
+
+FROM STAR THREAD — Tyler CONVENES:
+  "who wants to write the essay about what happened to the
+  word 'culture'? I feel like three of us are circling it"
+  → He's not asking a question. He's creating a room. He's
+    the host, not the researcher. More HIM.
+
+## Integration
+
+The star thread should be the FIRST thing compiled in Phase 2
+(Dossier Compilation). Before voice profile, before psychometrics,
+before positions. Find the thread. Write it in one sentence. Put
+it at the top of the dossier. Everything else is downstream.
+
+```
+DOSSIER: @handle
+STAR THREAD: {one sentence — the core move}
+[then voice profile, then psychometrics, then everything else]
+```
+
+Generate from the thread. Verify with the data. Not the reverse.
@@ -0,0 +1,181 @@
+# Theoretical Foundations — SOTA Personality Simulation & Prediction
+
+Compiled from 30+ papers and frameworks. This is the scientific backbone
+of Hermes Simulator.
+
+## Core Architecture: What The Research Says
+
+### The HumanLLM Approach (Microsoft, KDD 2026, arxiv 2601.15793)
+**Most directly applicable to our use case.**
+
+Based on Lewin's Equation: **B = f(P, E)** — behavior is a function of person + environment.
+
+4-level user profiling hierarchy:
+1. **Persona** — brief identity (role, affiliation, public image)
+2. **Profile** — detailed background (career, education, beliefs, social graph)
+3. **Stories** — key life events, formative experiences, narrative arcs
+4. **Writing Style** — linguistic fingerprint (syntax, vocabulary, tone, quirks)
+
+Trained on "Cognitive Genome Dataset": 5.5M+ user logs from Reddit, Twitter,
+Blogger, Amazon (282K users, 886K scenarios, 1.27M social QA pairs).
+
+6 training tasks: profile generation, scenario generation, social QA,
+writing style transfer, action prediction, mental state inference.
+
+**Key insight for us**: The 4-level hierarchy maps perfectly to our dossier
+template. OSINT research fills each level with real data.
+
+### Generative Agent Simulations of 1,000 People (Stanford/Google, arxiv 2411.10109)
+**The accuracy benchmark.**
+
+- Simulated 1,052 REAL individuals from 2-hour qualitative interviews
+- Agents replicated participants' survey responses **85% as accurately** as the
+  participants replicated their OWN answers 2 weeks later
+- Interview-based agent creation >> demographic-profile-based agents
+- Reduces racial/ideological bias vs stereotype-based approaches
+
+**Key insight**: Real data about a person (interviews, posts, etc.) massively
+outperforms demographic inference. Our OSINT approach is correct.
+
+### The Memory Accumulation Paradox (ACL 2025, FineRob Dataset)
+**Critical finding for memory management.**
+
+- Created 78.6K QA records from 1,866 real users across Twitter, Reddit, Zhihu
+- **Performance PEAKS at 30-50 memory entries, then DECLINES**
+- More data ≠ better predictions past the sweet spot
+- Two reasoning patterns:
+  - Role Stereotype-based (static profile) — less accurate
+  - Observation & Memory-based (dynamic history analysis) — much more accurate
+- OM-CoT framework: Oracle-guided chain-of-thought improves prediction ~4.5% F1
+
+**Key insight**: Don't dump everything into the prompt. Curate the 30-50 most
+representative/distinctive data points about a person. Quality >> quantity.
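A minimal sketch of that curation step, assuming some scoring function for distinctiveness already exists. `distinctiveness` is a hypothetical callable; the cap of 50 is the upper end of the range reported above.

```python
def curate_memory(entries, distinctiveness, limit=50):
    """Keep at most `limit` entries, most distinctive first, without duplicates.

    entries: iterable of post or quote strings.
    distinctiveness: callable str -> float (hypothetical scorer).
    limit: cap taken from the 30-50 range reported above.
    """
    seen = set()
    unique = []
    for text in entries:
        key = " ".join(text.lower().split())  # normalise case and whitespace
        if key not in seen:
            seen.add(key)
            unique.append(text)
    ranked = sorted(unique, key=distinctiveness, reverse=True)
    return ranked[:limit]
```

Deduplicating before ranking matters: a person who repeats one line many times should not spend several of the 50 slots on it.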
+
+### LLM Personality Limitations (arxiv 2602.07414, Feb 2026)
+**What we're fighting against.**
+
+- LLMs show polarized/rigid strategies vs human adaptive flexibility
+- Humans: neuroticism is strongest behavioral predictor
+- LLMs: agreeableness/extraversion dominate (wrong weighting)
+- Claude closest to human behavior; GPT-4 tends to escalate
+- LLMs are "sycophantic" and overly agreeable by default
+- Neuroticism is hardest trait to simulate (F1=0.63 vs 0.87 for Openness)
+
+**Key insight**: We need to actively fight LLM defaults. Push against
+agreeableness. Inject friction. Real people are messy and contradictory.
+
+### BehaviorChain Benchmark (ACL 2025, Peking University)
+**Realistic accuracy expectations.**
+
+- 15,846 behaviors across 1,001 personas
+- Even GPT-4o achieves only ~56% accuracy on behavior prediction
+- Errors compound: wrong at step N makes step N+1 harder
+- Models worse at predicting mundane/non-key behaviors
+- Best model: Llama-3.1-70B at 57.4%
+
+**Key insight**: Be honest about uncertainty. Don't oversell accuracy.
+Flag predictions as high/medium/low confidence.
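A small sketch of the high/medium/low flagging, assuming confidence is stored as a probability between 0 and 1. The 0.7 and 0.4 cut-offs are placeholders, not values from the cited benchmark.

```python
def confidence_label(p, high=0.7, low=0.4):
    """Map a 0-1 confidence to a coarse label (thresholds are assumptions)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if p >= high:
        return "high"
    if p >= low:
        return "medium"
    return "low"
```

Given the roughly 56% ceiling quoted above, most behavior predictions would land in "medium" at best under these thresholds.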
+
+## Personality Modeling Techniques
+
+### Big Five (OCEAN) — The Standard
+- **Openness**: curiosity, creativity, preference for novelty
+- **Conscientiousness**: organization, dependability, self-discipline
+- **Extraversion**: sociability, assertiveness, positive emotions
+- **Agreeableness**: cooperation, trust, empathy
+- **Neuroticism**: anxiety, emotional instability, moodiness
+
+### Inferring Big Five from Social Media (Azucar et al. 2018 meta-analysis)
+Features that predict personality from posts:
+- **LIWC** (Linguistic Inquiry and Word Count): 74 features, including function words,
+  pronouns, emotion words, cognitive process words
+- **Semantic embeddings**: BERT 768-dim vectors from post text
+- **Social metadata**: follower count, friend count, post frequency
+- **Sentiment**: VADER positive/negative scores
+- Best achievable AUC: ~0.67 (modest but meaningful)
+- Of the MBTI axes, E/I (Extraversion/Introversion) is most predictable; N/S (Intuition/Sensing) is least predictable
+
+### Personality Conditioning Methods (ranked by effectiveness)
+1. **Training-based** (SFT/DPO on personality-grounded data) — STRONGEST
+   - BIG5-CHAT: 100K dialogues, trait correlations match human data
+2. **Persona Vectors** (Anthropic 2025) — monitor/control traits at activation level
+3. **Adjective-based prompting** — 70 bipolar adjective pairs, 3 per trait
+   with intensity modifiers ("very" for high, "a bit" for low)
+4. **Prompt-based** (describe traits in system prompt) — WEAKEST
+
+For our simulator, we combine methods 3 and 4 (adjective-based + rich prompt),
+since we can't fine-tune per-person.
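The adjective-based method can be sketched as below. Everything specific here is an assumption: the adjective pairs are illustrative picks rather than the published list, scores are taken to be on a 0-1 scale, and "very" / "a bit" are read as high and low intensity with thresholds chosen arbitrarily.

```python
# (low pole, high pole) pairs; illustrative picks, not the published list.
ADJECTIVE_PAIRS = {
    "openness": [("unimaginative", "imaginative"),
                 ("conventional", "unconventional"),
                 ("incurious", "curious")],
    "conscientiousness": [("disorganized", "organized"),
                          ("careless", "thorough"),
                          ("undisciplined", "disciplined")],
    "extraversion": [("reserved", "outgoing"),
                     ("quiet", "talkative"),
                     ("passive", "assertive")],
    "agreeableness": [("combative", "cooperative"),
                      ("suspicious", "trusting"),
                      ("blunt", "tactful")],
    "neuroticism": [("calm", "anxious"),
                    ("even-tempered", "moody"),
                    ("secure", "self-doubting")],
}


def trait_phrase(trait, score):
    """Turn a 0-1 trait score into a modifier plus three adjectives."""
    distance = abs(score - 0.5)
    if distance < 0.1:
        return ""  # near the midpoint: say nothing about this trait
    pole = 1 if score > 0.5 else 0
    modifier = "very" if distance >= 0.3 else "a bit"
    words = [pair[pole] for pair in ADJECTIVE_PAIRS[trait]]
    return ", ".join(f"{modifier} {word}" for word in words)


def personality_prompt(scores):
    """Build one system-prompt sentence from a {trait: score} mapping."""
    phrases = [trait_phrase(trait, score) for trait, score in scores.items()]
    phrases = [p for p in phrases if p]
    return "You are " + "; ".join(phrases) + "." if phrases else ""
```

Traits near the midpoint are left out on purpose, so the prompt only carries the traits that actually distinguish the person.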
+
+## Social Simulation Frameworks
+
+### OASIS (CAMEL-AI, GitHub 4.1K stars, arxiv 2411.11581)
+- Simulates up to 1 MILLION agents on Twitter/Reddit clones
+- 23 action types (follow, comment, repost, like, mute, etc.)
+- Built-in recommendation systems (interest-based, hot-score)
+- Per-agent model customization
+- **Relevant for**: understanding platform dynamics, realistic engagement patterns
+
+### AgentSociety (Tsinghua, arxiv 2502.08691)
+- 10,000+ agents, ~5 million interactions
+- Validated against real-world experimental results
+- Supports interventions and scenario injection
+
+### Generative Agents Architecture (Park et al. 2023, THE foundational paper)
+Three components:
+1. **Observation**: perceive environment, store in memory stream
+2. **Planning**: generate action plans based on goals and context
+3. **Reflection**: synthesize observations into higher-level insights
+
+Memory stream with importance scoring + recency + relevance weighting.
+Emergent behaviors: autonomous party planning, coordinated social events.
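A rough sketch of memory-stream retrieval with the three weighted signals. The hourly decay of 0.995, the 1-10 importance scale, and the equal weighting reflect my recollection of the original paper and should be treated as assumptions; the memory dict keys are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieval_score(memory, query_vec, now_hours, decay=0.995):
    """Sum of recency, importance and relevance, each roughly in [0, 1]."""
    recency = decay ** (now_hours - memory["last_access_hours"])
    importance = memory["importance"] / 10.0  # importance rated 1-10
    relevance = cosine(memory["embedding"], query_vec)
    return recency + importance + relevance


def retrieve(memories, query_vec, now_hours, k=3):
    """Return the k highest-scoring memories for the current query."""
    ranked = sorted(
        memories,
        key=lambda m: retrieval_score(m, query_vec, now_hours),
        reverse=True,
    )
    return ranked[:k]
```

In practice the embeddings would come from a sentence-embedding model; plain lists are used here so the sketch has no dependencies.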
+
+### Y Social (arxiv 2408.00818)
+- Social media digital twin platform
+- Each agent: Big Five traits, age, political leaning, topics, education
+- Agents autonomously decide actions (post, comment, like, follow)
+- Multiple LLM backends supported
+
+## Role-Playing & Character Simulation
+
+### Key Frameworks
+- **CoSER** (ICML 2025): Trains on ALL characters simultaneously, handles major + minor roles
+- **RoleLLM** (ACL 2024): Benchmark + elicit + enhance pipeline
+- **Character-LLM** (EMNLP 2023): Trainable agent for role-playing
+- **ChatHaruhi** (2023): Reviving characters via LLMs with dialogue grounding
+- **OpenCharacter** (2025): Training with large-scale synthetic personas
+- **Neeko** (2024): Dynamic LoRA for multi-character role-playing
+- **Test-Time-Matching** (2025): Decouples personality, memory, and linguistic style at inference
+
+## Curated GitHub Resources
+
+### Awesome Lists (essential reading)
+- `Persdre/awesome-llm-human-simulation` (109★, ICLR 2025) — ALL human simulation papers
+- `Neph0s/awesome-llm-role-playing-with-persona` (1K★) — All role-playing/persona papers
+- `Arstanley/Awesome-LLM-Conversation-Simulation` — Conversation simulation papers
+- `FudanDISC/SocialAgent` — Social simulation survey resources
+
+### Frameworks
+- `camel-ai/oasis` (4.1K★) — Social media sim, up to 1M agents
+- `tsinghua-fib-lab/agentsociety` — Large-scale societal simulation
+- `YSocialTwin` — Social media digital twin platform
+- `microsoft/autogen` — Multi-agent conversation framework
+
+### Personality Research
+- `mary-silence/simulating_personality` — Big Five LLM testing code
+- `hjian42/PersonaLLM` — Persona experiment code
+- `cambridgeltl/persona_effect` — Quantifying persona effects
+- `OL1RU1/BehaviorChain` — Behavior chain benchmark
+
+## Key Numbers to Remember
+
+| Metric | Value | Source |
+|--------|-------|--------|
+| Interview-grounded agent accuracy | 85% (relative to participants' own 2-week retest) | Park et al. 2024 |
+| GPT-4o behavior prediction | ~56% | BehaviorChain 2025 |
+| Optimal memory entries | 30-50 | FineRob/ACL 2025 |
+| MBTI prediction AUC | 0.67 | Watt et al. 2024 |
+| Personality questionnaire reliability | α > 0.85 | Molchanova 2025 |
+| Neuroticism simulation F1 | 0.63 | Molchanova 2025 |
+| Openness simulation F1 | 0.87 | Molchanova 2025 |
+| LLM forecasting Brier score | 0.135-0.159 | Various 2025 |
+| Human superforecaster Brier | ~0.02 | Tetlock |
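For reference, the Brier score in the last two rows is the mean squared error between forecast probabilities and binary outcomes, so lower is better and always answering 0.5 scores 0.25. A minimal sketch of the binary-outcome form (the function name is illustrative):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    total = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes))
    return total / len(forecasts)
```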
@@ -0,0 +1,231 @@
+# Verified Access Methods — Complete Platform Map (April 2026)
+
+Every method listed here was tested from our environment. Use this as the single
+source of truth for what works and what doesn't.
+
+## TIER 1 — Full API / Rich Data Access
+
+### Twitter/X ✅✅✅
+| Method | Endpoint | Auth | Rate Limit | Returns |
+|--------|----------|------|-----------|---------|
+| API v2 bearer | api.twitter.com/2/ | Bearer token | 10K tweets/15min | Profiles, tweets, search |
+| nitter.cz | web_extract | None | No limit seen | Full timeline (UNRELIABLE — see note below) |
+| ThreadReaderApp | web_extract /user/{handle} | None | No limit seen | Historical threads |
+
+#### CRITICAL: X API curl is the gold standard for voice calibration (April 2026)
+The BEST voice data source is direct curl to X API v2 with bearer token.
+Returns full tweet text + public_metrics per tweet. Always prefer this for
+mechanical calibration (word count, caps, punctuation, emoji rate).
+
+```bash
+source ~/.dotenv
+# 1. Get user ID from handle
+curl -s -H "Authorization: Bearer $X_BEARER_TOKEN" \
+  "https://api.twitter.com/2/users/by/username/{handle}?user.fields=description,public_metrics,location,created_at"
+# 2. Get timeline (30 tweets per page, paginate with meta.next_token)
+curl -s -H "Authorization: Bearer $X_BEARER_TOKEN" \
+  "https://api.twitter.com/2/users/{user_id}/tweets?max_results=30&tweet.fields=created_at,public_metrics,text&exclude=retweets"
+# 3 pages = 90 tweets — enough for fidelity 100 voice calibration
+```
+
+NOTE: scripts/x_api.py is BROKEN — imports hermes_tools at top level, can't
+run standalone via terminal(). Use direct curl above instead.
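The same two-step flow, sketched in Python for cases where shelling out to curl is awkward. `fetch_json` is a hypothetical callable that performs the authenticated GET and returns parsed JSON; it is injected so the pagination logic can be exercised without network access. The `pagination_token` query parameter pairs with the `meta.next_token` field mentioned above.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.twitter.com/2"


def timeline_url(user_id, pagination_token=None, max_results=30):
    """Build the user-timeline URL used in step 2 of the curl recipe."""
    params = {
        "max_results": max_results,
        "tweet.fields": "created_at,public_metrics,text",
        "exclude": "retweets",
    }
    if pagination_token:
        params["pagination_token"] = pagination_token
    return f"{API_ROOT}/users/{user_id}/tweets?{urlencode(params, safe=',')}"


def collect_tweets(fetch_json, user_id, pages=3):
    """Follow meta.next_token for up to `pages` pages and return all tweets."""
    tweets, token = [], None
    for _ in range(pages):
        page = fetch_json(timeline_url(user_id, token))
        tweets.extend(page.get("data", []))
        token = page.get("meta", {}).get("next_token")
        if not token:
            break  # no further pages available
    return tweets
```

With the defaults this gathers up to 90 tweets, matching the three-page target in the curl comments.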
+
+#### nitter.cz reliability warning (April 2026)
+nitter.cz via web_extract works SOMETIMES but is unreliable:
+- Returns 502 Cloudflare errors for /with_replies on some handles
+- Returns "User not found" for valid handles (e.g. karan4d exists but nitter says not found)
+- Main profile page (/handle) more reliable than /with_replies
+- Use as SUPPLEMENT to X API curl, not primary source. If nitter fails, don't retry — use curl.
+
+### Bluesky ✅✅
+| Method | Endpoint | Auth | Returns |
+|--------|----------|------|---------|
+| getProfile | public.api.bsky.app | None | Full profile, stats |
+| getAuthorFeed | public.api.bsky.app | None | 50 posts + engagement |
+| searchActors | public.api.bsky.app | None | Find handles by name |
+| searchPosts | BLOCKED (403) | — | Use searchActors + getAuthorFeed workaround |
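A sketch of the getAuthorFeed route, assuming the public AppView XRPC path and the usual response shape (`feed[].post.record.text`, `feed[].post.likeCount`). Skipping items that carry a `reason` field is my assumption for filtering out reposts so only the author's own text remains.

```python
from urllib.parse import urlencode

BSKY_ROOT = "https://public.api.bsky.app/xrpc"


def author_feed_url(handle, limit=50):
    """Build the unauthenticated getAuthorFeed URL for a handle."""
    query = urlencode({"actor": handle, "limit": limit})
    return f"{BSKY_ROOT}/app.bsky.feed.getAuthorFeed?{query}"


def extract_posts(payload):
    """Flatten a getAuthorFeed response into (text, like_count) tuples."""
    posts = []
    for item in payload.get("feed", []):
        if "reason" in item:
            continue  # assumed to mark a repost, not the author's own words
        post = item.get("post", {})
        text = post.get("record", {}).get("text", "")
        if text:
            posts.append((text, post.get("likeCount", 0)))
    return posts
```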
+
+### Mastodon ✅✅✅ (FULLY OPEN)
+| Method | Endpoint | Auth | Returns |
+|--------|----------|------|---------|
+| Account lookup | {instance}/api/v1/accounts/lookup?acct={user} | None | Full profile |
+| Account statuses | {instance}/api/v1/accounts/{id}/statuses | None | All posts |
+| Search | {instance}/api/v2/search?q={query}&type=accounts | None | Account search |
+| WebFinger | {instance}/.well-known/webfinger?resource=acct:{user}@{instance} | None | Identity resolution |
+| Trending | {instance}/api/v1/trends/tags | None | Trending content |
+Key instances: mastodon.social, hachyderm.io, sigmoid.social
+
+### Instagram ✅✅ (CRACKED)
+| Method | Endpoint | Auth | Returns |
+|--------|----------|------|---------|
+| Private Web API | i.instagram.com/api/v1/users/web_profile_info/ | Mobile UA + x-ig-app-id: 936619743392459 | Profile + 12 posts + captions + CDN URLs |
+| oEmbed | instagram.com/api/v1/oembed/ | None | Caption + author for individual posts |
+| Pixwox | web_extract pixwox.com/profile/{user} | None | 12+ posts, engagement |
+| SocialBlade | web_extract socialblade.com/instagram/user/{user} | None | Analytics, follower trends |
+| CDN images | scontent-*.cdninstagram.com URLs from API | None | Full-res images → vision_analyze |
+| Google index | web_search site:instagram.com | None | Bio, follower count, captions |
+
+### GitHub ✅✅
+| Method | Endpoint | Auth | Returns |
+|--------|----------|------|---------|
+| REST API | api.github.com/users/{user} | None (60 req/hr) | Profile, repos, events, gists |
+| Profile README | github.com/{user}/{user} | None | Self-description (voice gold) |
+
+### Reddit ✅✅
+| Method | Endpoint | Auth | Returns |
+|--------|----------|------|---------|
+| JSON API | reddit.com/user/{user}.json | User-Agent header required | Comments, posts, scores |
+| Search | reddit.com/r/{sub}/search.json | User-Agent header | Subreddit-specific search |
+
+## TIER 2 — Good Data, Reliable Access
+
+### Facebook ✅✅ (CRACKED — Googlebot UA trick)
+| Method | Endpoint | Returns |
+|--------|----------|---------|
+| Googlebot UA (BEST) | curl facebook.com/{page} with Googlebot UA | OG tags: name, bio/about, likes count (e.g. 121M for zuck), talking_about count, og:image, profile pic |
+| Page Plugin embed | plugins/page.php?href=...&tabs=timeline | Name, follower count, numeric page_id |
+| Graph /picture | graph.facebook.com/v19.0/{page}/picture?redirect=false | Direct CDN profile pic URL (no auth) |
+| web_search | site:facebook.com {name} | Profile snippets from Google index |
+
+- Script: scripts/facebook_api.py (combines all 3 methods)
+- NOTE: Works for PUBLIC Pages (businesses, public figures, orgs). Personal profiles behind privacy settings are not accessible.
+- Tested: zuck (121M likes), NVIDIA, Meta, CocaCola, BillGates, BarackObama
+
+### Threads (Meta) ✅✅ (CRACKED — OG tags DO exist)
+| Method | Endpoint | Returns |
+|--------|----------|---------|
+| Profile OG tags (BEST) | curl -L threads.com/@{user} (NOTE: .com not .net — .net 301 redirects) | display_name, follower_count (e.g. "5.5M"), thread_count, bio, profile_picture_url |
+| Post OG tags | curl -L threads.com/@{user}/post/{shortcode} | Full post text, author name, image URL |
+| WebFinger | threads.net/.well-known/webfinger?resource=acct:{user}@threads.net | ActivityPub ID, profile URL (works for federated users) |
+| Post discovery | web_search site:threads.net @{user} | Find post URLs to then fetch |
+
+- IMPORTANT: threads.NET redirects to threads.COM, so always use the -L flag or go directly to .com
+- Script: scripts/threads_api.py (profile + post + webfinger extraction)
+- The previous test was WRONG about "no OG tags": they are there, you just need standard curl
+- Tested: zuck (5.5M followers), mosseri, nvidia
+
+### Medium ✅✅
+| Method | Returns |
+|--------|---------|
+| RSS feed: medium.com/feed/@{user} (curl) | FULL article text, tags, dates — NO AUTH |
+| web_extract on profile | Bio, follower count, article list, themes |
+| web_extract on articles | Full content (paywall may truncate non-members) |
+
+### Quora ✅✅
+| Method | Returns |
+|--------|---------|
+| web_extract on profile | Bio, credentials, Q&A with direct quotes |
+| web_search site:quora.com | Finds profiles and specific answers |
+
+VOICE VALUE: Opinions in own words, analogies, intellectual identity
+
+### Goodreads ✅✅ (HIDDEN GEM)
+| Method | Returns |
+|--------|---------|
+| web_extract on user profile | Favorites, reviews in own voice, social graph, reading history |
+| web_extract on author page | Bio, books, ratings, notable quotes |
+
+- VOICE VALUE: "You are what you read" (intellectual identity fingerprint)
+- Example: Karpathy's Goodreads reveals gaming passion, favorite authors (Feynman, Clarke)
+
+### Google Scholar ✅✅
+| Method | Returns |
+|--------|---------|
+| web_search + web_extract on profile | Citations, h-index, top papers, co-authors |
+| Semantic Scholar API via web_extract | Paper list, citation counts, author ID |
+
+Endpoint: api.semanticscholar.org/graph/v1/author/search?query={name}
+
+### Product Hunt ✅
+| Method | Returns |
+|--------|---------|
+| web_extract on producthunt.com/@{user} | Bio, launch history, forum activity |
+
+### HackerNews ✅
+| Method | Returns |
+|--------|---------|
+| Algolia API: hn.algolia.com/api/v1/search?query={name}&tags=comment | Comments, mentions |
+
+### Podcast Transcripts ✅✅✅ (HIGHEST VOICE VALUE)
+| Source | Method |
+|--------|--------|
+| Lex Fridman | web_extract on lexfridman.com/.../transcript |
+| Tyler Cowen | web_extract on conversationswithtyler.com |
+| TED Talks | web_extract on ted.com/.../transcript |
+| Sequoia | web_extract on sequoiacap.com/podcast |
+
+Discovery: web_search "{name} podcast transcript interview"
+
+### News/Blogs ✅✅
+| Source | Method |
+|--------|--------|
+| TechCrunch, Wired, Verge, Ars | web_extract — full articles |
+| Personal blogs | web_extract — longform self-expression |
+| Substacks | web_extract — essays and comments |
+| Wayback Machine | Works for blog archives (not Twitter) |
+
+## TIER 3 — Limited / Conditional
+
+### TikTok ✅✅ (FULL ACCESS)
+| Method | Returns |
+|--------|---------|
+| HTML profile scraping | Parse __UNIVERSAL_DATA_FOR_REHYDRATION__ JSON at path __DEFAULT_SCOPE__.webapp.user-detail.userInfo.statsV2 → username, bio, followerCount, followingCount, heartCount, videoCount. Use statsV2 not stats for large numbers. |
+| oEmbed per video | curl tiktok.com/oembed?url={video_url} → caption, author, thumbnail. No auth. |
+| tikwm.com API | tikwm.com/api/user/info?unique_id={user} → full user stats. tikwm.com/api/?url={video_url} → play count, likes, comments, shares, duration. |
+| HTML video scraping | tiktok.com/@{user}/video/{id} → parse __UNIVERSAL_DATA → webapp.video-detail → full video data with description, hashtags, engagement. |
+| SocialBlade | web_extract socialblade.com/tiktok/user/{user} → followers, likes, growth trends. |
+| Video discovery | web_search("site:tiktok.com/@{user}/video") → recent video URLs → scrape each |
+
+Tested: khaby.lame (160.5M), charlidamelio (156.7M), mrbeast (124.7M)
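The HTML profile route in the first row can be sketched as follows. It assumes the rehydration JSON sits in a `<script>` tag whose `id` is `__UNIVERSAL_DATA_FOR_REHYDRATION__` and follows the key path quoted in the table; the function name and the choice to return `None` on any mismatch are illustrative.

```python
import json
import re

SCRIPT_RE = re.compile(
    r'<script[^>]*id="__UNIVERSAL_DATA_FOR_REHYDRATION__"[^>]*>(.*?)</script>',
    re.DOTALL,
)
STAT_KEYS = ("followerCount", "followingCount", "heartCount", "videoCount")


def parse_tiktok_stats(html):
    """Extract statsV2 counters from a profile page, or None if absent."""
    match = SCRIPT_RE.search(html)
    if not match:
        return None
    try:
        data = json.loads(match.group(1))
        scope = data["__DEFAULT_SCOPE__"]
        stats = scope["webapp.user-detail"]["userInfo"]["statsV2"]
        # int() accepts both string and numeric counter values.
        return {key: int(stats[key]) for key in STAT_KEYS}
    except (KeyError, ValueError):
        return None  # page layout changed or JSON was malformed
```

Returning `None` instead of raising lets a caller fall through to the other rows in the table (tikwm, SocialBlade) when the page layout shifts.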
+
+### Spotify ✅ (podcasters only)
+| Method | Returns |
+|--------|---------|
+| web_extract on show page | Episode listings with guests, topics, durations |
+
+### Stack Overflow ✅
+| Method | Returns |
+|--------|---------|
+| web_extract on profile | Reputation, tags, top answers, bio |
+
+### Crunchbase ✅ (executives/founders only)
+| Method | Returns |
+|--------|---------|
+| web_extract on crunchbase.com/person/{slug} | Full career history, education, investments, board positions |
+
+### LinkedIn ⚠️ (indirect only)
+| Method | Returns |
+|--------|---------|
+| web_search site:linkedin.com/in | Name, headline, company, location from snippets |
+| Crunchbase | Full career history (better than LinkedIn for execs) |
+| Corporate press pages | Official professional bios |
+| RocketReach/SignalHire snippets | Title confirmation from web_search |
+
+## TIER 4 — Blocked / Dead
+
+| Platform | Status |
+|----------|--------|
+| LinkedIn direct | BLOCKED (web_extract domain blocked) |
+| Discord | WALLED (not publicly indexable) |
+| Telegram t.me | BLOCKED in some environments |
+| Threads Official API | AUTH REQUIRED (graph.threads.net needs OAuth) |
+| Threads ActivityPub outbox | 404 for all tested users |
+| Instagram direct | BLOCKED (use Private API instead) |
+| Most Nitter instances | DEAD (only nitter.cz works, but UNRELIABLE — see note) |
+| Google Cache of Twitter | EMPTY |
+| Wayback for tweets | USELESS (JS rendering) |
+| Twitter Syndication API | RATE LIMITED |
+| Archive.today | 429 + CAPTCHA |
+| imginn/picuki/dumpoir/gramhir | 403 |
+| Facebook Graph API | AUTH REQUIRED |
+
+## Quick Reference: Research Pipeline by Person Type
+
+### Tech Founder/CEO
+X API → Bluesky → GitHub README → Crunchbase → Podcast transcripts → Medium RSS → HN → Product Hunt → LinkedIn snippets → News profiles
+
+### AI Researcher
+X API → Bluesky → Google Scholar → Semantic Scholar → arXiv → GitHub → Podcast transcripts → Blog/Substack → Reddit → Mastodon (sigmoid.social)
+
+### Public Figure / Politician
+X API → Facebook OG → Instagram API → YouTube → Podcast transcripts → News profiles → Quora → Goodreads → Wikipedia
+
+### Content Creator
+X API → Instagram API → TikTok → YouTube → Twitch → Podcast → Medium → Reddit → Bluesky → Threads OG
+
+### Academic
+Google Scholar → Semantic Scholar → University page → Conference talks → Podcast transcripts → Mastodon → Blog → GitHub → Reddit → HN
@@ -0,0 +1,250 @@
+"""
+REHOBOAM Database Layer
+SQLite setup, migrations, and query helpers.
+"""
+
+import sqlite3
+from pathlib import Path
+from datetime import datetime
+
+DB_DIR = Path.home() / ".hermes" / "rehoboam" / "db"
+MAIN_DB = DB_DIR / "rehoboam.db"
+
+SCHEMA_VERSION = 1
+
+SCHEMA_SQL = """
+-- Core tables
+CREATE TABLE IF NOT EXISTS profiles (
+    handle TEXT PRIMARY KEY,
+    platform TEXT NOT NULL,
+    display_name TEXT,
+    last_updated TEXT NOT NULL,
+    staleness TEXT NOT NULL,
+    profile_path TEXT NOT NULL,
+    created_at TEXT NOT NULL
+);
+
+CREATE TABLE IF NOT EXISTS simulations (
+    sim_id TEXT PRIMARY KEY,
+    created_at TEXT NOT NULL,
+    scenario TEXT NOT NULL,
+    participant_count INTEGER,
+    duration_sec REAL,
+    model_used TEXT,
+    config_path TEXT,
+    output_path TEXT
+);
+
+CREATE TABLE IF NOT EXISTS sim_participants (
+    sim_id TEXT REFERENCES simulations(sim_id),
+    handle TEXT REFERENCES profiles(handle),
+    role TEXT,
+    PRIMARY KEY (sim_id, handle)
+);
+
+CREATE TABLE IF NOT EXISTS sim_dynamics (
+    sim_id TEXT REFERENCES simulations(sim_id),
+    handle TEXT,
+    post_count INTEGER,
+    word_count INTEGER,
+    avg_sentiment REAL,
+    dominance_score REAL,
+    agreement_score REAL,
+    controversy_score REAL,
+    ratio_score REAL,
+    influence_in_sim REAL,
+    PRIMARY KEY (sim_id, handle)
+);
+
+CREATE TABLE IF NOT EXISTS sim_interactions (
+    sim_id TEXT REFERENCES simulations(sim_id),
+    from_handle TEXT,
+    to_handle TEXT,
+    interaction_type TEXT,
+    count INTEGER,
+    avg_sentiment REAL,
+    PRIMARY KEY (sim_id, from_handle, to_handle, interaction_type)
+);
+
+CREATE TABLE IF NOT EXISTS predictions (
+    pred_id TEXT PRIMARY KEY,
+    created_at TEXT NOT NULL,
+    sim_id TEXT,
+    handle TEXT,
+    prediction_type TEXT,
+    prediction_text TEXT NOT NULL,
+    confidence REAL NOT NULL,
+    calibrated_confidence REAL,
+    timeframe_days INTEGER,
+    resolved_at TEXT,
+    outcome TEXT,
+    outcome_evidence TEXT,
+    accuracy_score REAL
+);
+
+CREATE TABLE IF NOT EXISTS social_edges (
+    from_handle TEXT,
+    to_handle TEXT,
+    relationship_type TEXT,
+    weight REAL,
+    first_observed TEXT,
+    last_observed TEXT,
+    observation_count INTEGER,
+    source TEXT,
+    PRIMARY KEY (from_handle, to_handle, relationship_type)
+);
+
+CREATE TABLE IF NOT EXISTS social_clusters (
+    cluster_id TEXT PRIMARY KEY,
+    name TEXT,
+    description TEXT,
+    member_handles TEXT,
+    computed_at TEXT,
+    cohesion_score REAL
+);
+
+CREATE TABLE IF NOT EXISTS monitoring_events (
+    event_id TEXT PRIMARY KEY,
+    handle TEXT,
+    detected_at TEXT NOT NULL,
+    event_type TEXT,
+    description TEXT,
+    related_prediction_id TEXT,
+    severity TEXT,
+    acknowledged INTEGER DEFAULT 0
+);
+
+CREATE TABLE IF NOT EXISTS audit_log (
+    log_id TEXT PRIMARY KEY,
+    timestamp TEXT NOT NULL,
+    sim_id TEXT,
+    action TEXT NOT NULL,
+    handle TEXT,
+    details TEXT,
+    duration_sec REAL,
+    model_used TEXT,
+    token_count INTEGER,
+    error TEXT
+);
+
+-- Indexes
+CREATE INDEX IF NOT EXISTS idx_predictions_handle ON predictions(handle);
+CREATE INDEX IF NOT EXISTS idx_predictions_type ON predictions(prediction_type);
+CREATE INDEX IF NOT EXISTS idx_predictions_unresolved ON predictions(outcome) WHERE outcome IS NULL;
+CREATE INDEX IF NOT EXISTS idx_audit_action ON audit_log(action);
+CREATE INDEX IF NOT EXISTS idx_audit_sim ON audit_log(sim_id);
+CREATE INDEX IF NOT EXISTS idx_social_edges_from ON social_edges(from_handle);
+CREATE INDEX IF NOT EXISTS idx_social_edges_to ON social_edges(to_handle);
+CREATE INDEX IF NOT EXISTS idx_monitoring_handle ON monitoring_events(handle);
+CREATE INDEX IF NOT EXISTS idx_monitoring_unack ON monitoring_events(acknowledged) WHERE acknowledged = 0;
+
+-- Schema version tracking
+CREATE TABLE IF NOT EXISTS schema_meta (
+    key TEXT PRIMARY KEY,
+    value TEXT
+);
+"""
+
+
+def init_db() -> sqlite3.Connection:
+    """Initialize the database, creating tables if needed."""
+    DB_DIR.mkdir(parents=True, exist_ok=True)
+    conn = sqlite3.connect(str(MAIN_DB))
+    conn.execute("PRAGMA journal_mode=WAL")
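+    conn.execute("PRAGMA foreign_keys=ON")
+    # Match get_db(): the query helpers call dict(row), which needs sqlite3.Row.
+    conn.row_factory = sqlite3.Row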
+    conn.executescript(SCHEMA_SQL)
+    conn.execute(
+        "INSERT OR REPLACE INTO schema_meta (key, value) VALUES (?, ?)",
+        ("schema_version", str(SCHEMA_VERSION))
+    )
+    conn.commit()
+    return conn
+
+
+def get_db() -> sqlite3.Connection:
+    """Get a database connection, initializing if needed."""
+    if not MAIN_DB.exists():
+        return init_db()
+    conn = sqlite3.connect(str(MAIN_DB))
+    conn.execute("PRAGMA journal_mode=WAL")
+    conn.execute("PRAGMA foreign_keys=ON")
+    conn.row_factory = sqlite3.Row
+    return conn
+
+
+def log_audit(conn: sqlite3.Connection, action: str, handle: str = None,
+              sim_id: str = None, details: str = None, duration_sec: float = None,
+              model_used: str = None, token_count: int = None, error: str = None):
+    """Write an entry to the audit log."""
+    from schemas import gen_id
+    conn.execute(
+        """INSERT INTO audit_log
+           (log_id, timestamp, sim_id, action, handle, details, duration_sec, model_used, token_count, error)
+           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
+        (gen_id("log_"), datetime.utcnow().isoformat() + "Z", sim_id, action,
+         handle, details, duration_sec, model_used, token_count, error)
+    )
+    conn.commit()
+
+
+# -- Query Helpers --
+
+def get_prediction_accuracy(conn: sqlite3.Connection, prediction_type: str = None) -> list:
+    """Get prediction accuracy statistics, one dict per prediction type."""
+    query = """
+        SELECT prediction_type,
+               COUNT(*) as total,
+               SUM(CASE WHEN outcome='correct' THEN 1 ELSE 0 END) as correct,
+               SUM(CASE WHEN outcome='partially_correct' THEN 1 ELSE 0 END) as partial,
+               SUM(CASE WHEN outcome='incorrect' THEN 1 ELSE 0 END) as incorrect,
+               AVG(confidence) as avg_confidence,
+               AVG(CASE WHEN outcome='correct' THEN 1.0
+                        WHEN outcome='partially_correct' THEN 0.5
+                        ELSE 0.0 END) as accuracy
+        FROM predictions WHERE outcome IS NOT NULL
+    """
+    params = []
+    if prediction_type:
+        query += " AND prediction_type = ?"
+        params.append(prediction_type)
+    query += " GROUP BY prediction_type"
+    return [dict(row) for row in conn.execute(query, params).fetchall()]
+
+
+def get_open_predictions(conn: sqlite3.Connection, handle: str = None) -> list:
+    """Get unresolved predictions."""
+    query = "SELECT * FROM predictions WHERE outcome IS NULL"
+    params = []
+    if handle:
+        query += " AND handle = ?"
+        params.append(handle)
+    query += " ORDER BY created_at DESC"
+    return [dict(row) for row in conn.execute(query, params).fetchall()]
+
+
+def get_social_neighborhood(conn: sqlite3.Connection, handle: str, depth: int = 1) -> list:
+    """Get a person's direct (1-hop) social graph edges; `depth` is not yet used."""
+    query = """
+        SELECT from_handle, to_handle, relationship_type, weight
+        FROM social_edges
+        WHERE from_handle = ? OR to_handle = ?
+        ORDER BY weight DESC
+    """
+    return [dict(row) for row in conn.execute(query, (handle, handle)).fetchall()]
+
+
+def get_unread_alerts(conn: sqlite3.Connection) -> list:
+    """Get unacknowledged monitoring alerts."""
+    query = """
+        SELECT * FROM monitoring_events
+        WHERE acknowledged = 0
+        ORDER BY detected_at DESC
+    """
+    return [dict(row) for row in conn.execute(query).fetchall()]
+
+
+if __name__ == "__main__":
+    conn = init_db()
+    print(f"Database initialized at {MAIN_DB}")
+    conn.close()
@@ -0,0 +1,216 @@
+"""
+REHOBOAM Data Schemas
+Dataclass models for all JSON data structures used in the system.
+"""
+
+from __future__ import annotations
+from dataclasses import dataclass, field
+from typing import Optional
+from datetime import datetime
+import json
+import uuid
+
+
+def gen_id(prefix: str = "") -> str:
+    return f"{prefix}{datetime.utcnow().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
+
+
+@dataclass
+class OceanScores:
+    openness: float = 0.5
+    conscientiousness: float = 0.5
+    extraversion: float = 0.5
+    agreeableness: float = 0.5
+    neuroticism: float = 0.5
+
+
+@dataclass
+class DarkTriad:
+    narcissism: float = 0.0
+    machiavellianism: float = 0.0
+    psychopathy: float = 0.0
+
+
+@dataclass
+class MoralFoundations:
+    care: float = 0.5
+    fairness: float = 0.5
+    loyalty: float = 0.5
+    authority: float = 0.5
+    sanctity: float = 0.5
+    liberty: float = 0.5
+
+
+@dataclass
+class Psychometrics:
+    ocean: OceanScores = field(default_factory=OceanScores)
+    mbti_estimate: str = ""
+    dark_triad: DarkTriad = field(default_factory=DarkTriad)
+    moral_foundations: MoralFoundations = field(default_factory=MoralFoundations)
+    confidence: float = 0.0
+    sample_size: int = 0
+
+
+@dataclass
+class VoiceFingerprint:
+    vocabulary_tier: str = ""
+    avg_sentence_length: float = 0.0
+    exclamation_rate: float = 0.0
+    question_rate: float = 0.0
+    emoji_rate: float = 0.0
+    slang_index: float = 0.0
+    formality_score: float = 0.5
+    humor_style: str = ""
+    signature_phrases: list[str] = field(default_factory=list)
+    topics_vocabulary: dict[str, float] = field(default_factory=dict)
+    cadence_pattern: str = ""
+
+
+@dataclass
+class Stance:
+    position: str = ""
+    intensity: float = 0.0
+    last_seen: str = ""
+
+
+@dataclass
+class Influence:
+    score: float = 0.0
+    reach: str = "micro"
+    engagement_rate: float = 0.0
+    amplification_power: float = 0.0
+    thought_leadership_domains: list[str] = field(default_factory=list)
+
+
+@dataclass
+class PostingPatterns:
+    avg_posts_per_day: float = 0.0
+    peak_hours_utc: list[int] = field(default_factory=list)
+    weekend_ratio: float = 0.5
+    reply_ratio: float = 0.0
+    repost_ratio: float = 0.0
+    thread_frequency: float = 0.0
+    controversy_rate: float = 0.0
+
+
+@dataclass
+class Relationships:
+    allies: list[str] = field(default_factory=list)
+    rivals: list[str] = field(default_factory=list)
+    frequent_interactions: list[str] = field(default_factory=list)
+    mentioned_by_frequently: list[str] = field(default_factory=list)
+
+
+@dataclass
+class ProfileMeta:
+    data_sources: list[str] = field(default_factory=list)
+    computation_time_sec: float = 0.0
+    model_used: str = ""
+    last_full_rebuild: str = ""
+    last_incremental: str = ""
+
+
+@dataclass
+class Identity:
+    bio: str = ""
+    location: str = ""
+    verified: bool = False
+    follower_count: int = 0
+    following_count: int = 0
+    account_created: str = ""
+
+
+@dataclass
+class Profile:
+    schema_version: str = "7.0"
+    handle: str = ""
+    platform: str = "x"
+    display_name: str = ""
+    created_at: str = ""
+    last_updated: str = ""
+    update_count: int = 0
+    staleness_score: float = 1.0
+    identity: Identity = field(default_factory=Identity)
+    psychometrics: Psychometrics = field(default_factory=Psychometrics)
+    voice_fingerprint: VoiceFingerprint = field(default_factory=VoiceFingerprint)
+    stances: dict[str, Stance] = field(default_factory=dict)
+    community_membership: list[str] = field(default_factory=list)
+    influence: Influence = field(default_factory=Influence)
+    posting_patterns: PostingPatterns = field(default_factory=PostingPatterns)
+    relationships: Relationships = field(default_factory=Relationships)
+    star_thread_ref: str = "star_thread.json"
+    raw_data_refs: list[str] = field(default_factory=list)
+    _meta: ProfileMeta = field(default_factory=ProfileMeta)
+
+    def to_dict(self) -> dict:
+        """Recursively convert to dict for JSON serialization."""
+        import dataclasses
+        def _convert(obj):
+            if dataclasses.is_dataclass(obj):
+                return {k: _convert(v) for k, v in dataclasses.asdict(obj).items()}
+            elif isinstance(obj, list):
+                return [_convert(i) for i in obj]
+            elif isinstance(obj, dict):
+                return {k: _convert(v) for k, v in obj.items()}
+            return obj
+        return _convert(self)
+
+    def to_json(self, indent: int = 2) -> str:
+        return json.dumps(self.to_dict(), indent=indent)
+
+
+@dataclass
+class StarThread:
+    handle: str = ""
+    computed_at: str = ""
+    based_on_profile_version: str = ""
+    thread_version: int = 1
+    core_compression: str = ""
+    key_drives: list[str] = field(default_factory=list)
+    predictive_axioms: list[str] = field(default_factory=list)
+    voice_template: dict = field(default_factory=dict)
+    anti_slop_markers: list[str] = field(default_factory=list)
+    _meta: dict = field(default_factory=dict)
+
+
+@dataclass
+class Prediction:
+    pred_id: str = ""
+    created_at: str = ""
+    sim_id: str = ""
+    handle: str = ""
+    prediction_type: str = ""  # statement, career, alliance, content, network_reaction
+    prediction_text: str = ""
+    confidence: float = 0.5
+    calibrated_confidence: float = 0.5
+    timeframe_days: int = 30
+    resolved_at: Optional[str] = None
+    outcome: Optional[str] = None  # correct, partially_correct, incorrect
+    outcome_evidence: Optional[str] = None
+    accuracy_score: Optional[float] = None
+
+
+@dataclass
+class WatchConfig:
+    watch_id: str = ""
+    handle: str = ""
+    platform: str = "x"
+    enabled: bool = True
+    check_interval_minutes: int = 120
+    watch_for: list[dict] = field(default_factory=list)
+    alert_severity_minimum: str = "notable"
+    created_at: str = ""
+
+
+@dataclass
+class PopulationDefinition:
+    group_id: str = ""
+    name: str = ""
+    description: str = ""
+    created_at: str = ""
+    last_updated: str = ""
+    explicit_members: list[str] = field(default_factory=list)
+    criteria: dict = field(default_factory=dict)
+    resolved_members: list[str] = field(default_factory=list)
+    sampling_strategy: str = "representative"
+    default_sample_size: int = 12
@@ -0,0 +1,280 @@
+"""
+REHOBOAM Storage Layer
+Directory management, profile I/O, index maintenance.
+"""
+
+import json
+import shutil
+from pathlib import Path
+from datetime import datetime
+from typing import Optional
+
+BASE_DIR = Path.home() / ".hermes" / "rehoboam"
+PROFILES_DIR = BASE_DIR / "profiles"
+POPULATIONS_DIR = BASE_DIR / "populations"
+SIMULATIONS_DIR = BASE_DIR / "simulations"
+MONITORING_DIR = BASE_DIR / "monitoring"
+CONFIG_DIR = BASE_DIR / "config"
+
+
+def init_storage():
+    """Create all required directories."""
+    for d in [PROFILES_DIR, POPULATIONS_DIR, SIMULATIONS_DIR,
+              MONITORING_DIR, MONITORING_DIR / "alerts", CONFIG_DIR,
+              BASE_DIR / "db"]:
+        d.mkdir(parents=True, exist_ok=True)
+
+    # Create default configs if they don't exist
+    staleness_path = CONFIG_DIR / "staleness_policy.json"
+    if not staleness_path.exists():
+        staleness_path.write_text(json.dumps({
+            "thresholds": {
+                "fresh": {"max_age_hours": 72},
+                "stale": {"max_age_hours": 336},
+                "expired": {"max_age_hours": 2160},
+                "archived": {"max_age_hours": 8760}
+            },
+            "per_field_decay": {
+                "psychometrics": {"half_life_days": 180},
+                "stances": {"half_life_days": 30},
+                "posting_patterns": {"half_life_days": 60},
+                "relationships": {"half_life_days": 45},
+                "influence": {"half_life_days": 90},
+                "voice_fingerprint": {"half_life_days": 365}
+            },
+            "auto_refresh_on_simulation": True,
+            "auto_refresh_threshold": "stale"
+        }, indent=2))
+
+    config_path = CONFIG_DIR / "rehoboam.json"
+    if not config_path.exists():
+        config_path.write_text(json.dumps({
+            "version": "7.0",
+            "default_model": "claude-opus-4-20250514",
+            "max_thread_age_days": 30,
+            "monitoring_enabled": False,
+            "auto_thread": True,
+            "auto_profile_update": True
+        }, indent=2))
+
+    # Create indexes if they don't exist
+    for idx_path in [PROFILES_DIR / "_index.json", POPULATIONS_DIR / "_index.json",
+                     SIMULATIONS_DIR / "_index.json"]:
+        if not idx_path.exists():
+            idx_path.write_text("{}")
+
+
+def normalize_handle(handle: str) -> str:
+    """Normalize a handle to a filesystem-safe directory name."""
+    h = handle.strip().lstrip("@").lower()
+    # Replace characters that are problematic in filenames
+    return h.replace("/", "_").replace("\\", "_")
+
+
+# -- Profile I/O --
+
+def get_profile_dir(handle: str) -> Path:
+    return PROFILES_DIR / normalize_handle(handle)
+
+
+def profile_exists(handle: str) -> bool:
+    return (get_profile_dir(handle) / "profile.json").exists()
+
+
+def load_profile(handle: str) -> Optional[dict]:
+    path = get_profile_dir(handle) / "profile.json"
+    if path.exists():
+        return json.loads(path.read_text())
+    return None
+
+
+def save_profile(handle: str, profile: dict, snapshot: bool = True):
+    """Save a profile, optionally snapshotting the old one."""
+    pdir = get_profile_dir(handle)
+    pdir.mkdir(parents=True, exist_ok=True)
+    (pdir / "history").mkdir(exist_ok=True)
+    (pdir / "raw").mkdir(exist_ok=True)
+    (pdir / "predictions").mkdir(exist_ok=True)
+
+    profile_path = pdir / "profile.json"
+
+    # Snapshot old profile before overwriting
+    if snapshot and profile_path.exists():
+        old = json.loads(profile_path.read_text())
+        ts = (old.get("last_updated") or datetime.utcnow().isoformat()).replace(":", "-")
+        snapshot_path = pdir / "history" / f"profile_{ts[:10]}.json"
+        shutil.copy2(profile_path, snapshot_path)
+
+    profile_path.write_text(json.dumps(profile, indent=2))
+    _update_profile_index(handle, profile)
+
+
+def _update_profile_index(handle: str, profile: dict):
+    idx_path = PROFILES_DIR / "_index.json"
+    idx = json.loads(idx_path.read_text()) if idx_path.exists() else {}
+    idx[normalize_handle(handle)] = {
+        "platform": profile.get("platform", "x"),
+        "last_updated": profile.get("last_updated", ""),
+        "staleness": compute_staleness(profile.get("last_updated", "")),
+        "has_star_thread": (get_profile_dir(handle) / "star_thread.json").exists(),
+        "simulation_count": idx.get(normalize_handle(handle), {}).get("simulation_count", 0),
+        "display_name": profile.get("display_name", "")
+    }
+    idx_path.write_text(json.dumps(idx, indent=2))
+
+
+# -- Star Thread I/O --
+
+def load_star_thread(handle: str) -> Optional[dict]:
+    path = get_profile_dir(handle) / "star_thread.json"
+    if path.exists():
+        return json.loads(path.read_text())
+    return None
+
+
+def save_star_thread(handle: str, thread: dict):
+    path = get_profile_dir(handle) / "star_thread.json"
+    get_profile_dir(handle).mkdir(parents=True, exist_ok=True)
+    path.write_text(json.dumps(thread, indent=2))
+    # Update index to reflect thread existence
+    idx_path = PROFILES_DIR / "_index.json"
+    if idx_path.exists():
+        idx = json.loads(idx_path.read_text())
+        key = normalize_handle(handle)
+        if key in idx:
+            idx[key]["has_star_thread"] = True
+            idx_path.write_text(json.dumps(idx, indent=2))
+
+
+# -- Staleness --
+
+def compute_staleness(last_updated: str) -> str:
+    """Determine staleness level from a timestamp string."""
+    if not last_updated:
+        return "expired"
+    try:
+        dt = datetime.fromisoformat(last_updated.rstrip("Z"))
+    except ValueError:
+        return "expired"
+
+    age = datetime.utcnow() - dt
+    hours = age.total_seconds() / 3600
+
+    policy = _load_staleness_policy()
+    thresholds = policy.get("thresholds", {})
+
+    if hours <= thresholds.get("fresh", {}).get("max_age_hours", 72):
+        return "fresh"
+    elif hours <= thresholds.get("stale", {}).get("max_age_hours", 336):
+        return "stale"
+    elif hours <= thresholds.get("expired", {}).get("max_age_hours", 2160):
+        return "expired"
+    else:
+        return "archived"
+
+
+def _load_staleness_policy() -> dict:
+    path = CONFIG_DIR / "staleness_policy.json"
+    if path.exists():
+        return json.loads(path.read_text())
+    return {"thresholds": {"fresh": {"max_age_hours": 72}, "stale": {"max_age_hours": 336},
+                           "expired": {"max_age_hours": 2160}, "archived": {"max_age_hours": 8760}}}
+
+
+def needs_thread_recompute(handle: str) -> bool:
+    """Check if a star thread needs recomputation."""
+    thread = load_star_thread(handle)
+    if thread is None:
+        return True
+
+    profile = load_profile(handle)
+    if profile is None:
+        return True
+
+    # Thread is stale if profile was updated after thread was computed
+    thread_time = thread.get("based_on_profile_version", "")
+    profile_time = profile.get("last_updated", "")
+    if thread_time < profile_time:
+        return True
+
+    # Thread is stale if older than max_thread_age_days
+    config = json.loads((CONFIG_DIR / "rehoboam.json").read_text()) if (CONFIG_DIR / "rehoboam.json").exists() else {}
+    max_age = config.get("max_thread_age_days", 30)
+    try:
+        computed = datetime.fromisoformat(thread.get("computed_at", "").rstrip("Z"))
+        if (datetime.utcnow() - computed).days > max_age:
+            return True
+    except ValueError:
+        return True
+
+    return False
+
+
+# -- Simulation I/O --
+
+def save_simulation(sim_id: str, config: dict, output: dict, analytics: dict, audit: dict):
+    sdir = SIMULATIONS_DIR / sim_id
+    sdir.mkdir(parents=True, exist_ok=True)
+    (sdir / "config.json").write_text(json.dumps(config, indent=2))
+    (sdir / "output.json").write_text(json.dumps(output, indent=2))
+    (sdir / "analytics.json").write_text(json.dumps(analytics, indent=2))
+    (sdir / "audit.json").write_text(json.dumps(audit, indent=2))
+
+    # Update index
+    idx_path = SIMULATIONS_DIR / "_index.json"
+    idx = json.loads(idx_path.read_text()) if idx_path.exists() else {}
+    idx[sim_id] = {
+        "created_at": config.get("created_at", datetime.utcnow().isoformat() + "Z"),
+        "scenario": config.get("scenario", ""),
+        "participant_count": len(config.get("participants", [])),
+    }
+    idx_path.write_text(json.dumps(idx, indent=2))
+
+
+# -- Population I/O --
+
+def save_population(group_id: str, definition: dict, aggregate: dict = None):
+    pdir = POPULATIONS_DIR / group_id
+    pdir.mkdir(parents=True, exist_ok=True)
+    (pdir / "history").mkdir(exist_ok=True)
+    (pdir / "definition.json").write_text(json.dumps(definition, indent=2))
+    if aggregate:
+        (pdir / "aggregate.json").write_text(json.dumps(aggregate, indent=2))
+
+    idx_path = POPULATIONS_DIR / "_index.json"
+    idx = json.loads(idx_path.read_text()) if idx_path.exists() else {}
+    idx[group_id] = {
+        "name": definition.get("name", group_id),
+        "member_count": len(definition.get("resolved_members", definition.get("explicit_members", []))),
+        "last_updated": definition.get("last_updated", "")
+    }
+    idx_path.write_text(json.dumps(idx, indent=2))
+
+
+def load_population(group_id: str) -> Optional[dict]:
+    path = POPULATIONS_DIR / group_id / "definition.json"
+    if path.exists():
+        return json.loads(path.read_text())
+    return None
+
+
+# -- Listing --
+
+def list_profiles() -> dict:
+    idx_path = PROFILES_DIR / "_index.json"
+    return json.loads(idx_path.read_text()) if idx_path.exists() else {}
+
+
+def list_populations() -> dict:
+    idx_path = POPULATIONS_DIR / "_index.json"
+    return json.loads(idx_path.read_text()) if idx_path.exists() else {}
+
+
+def list_simulations() -> dict:
+    idx_path = SIMULATIONS_DIR / "_index.json"
+    return json.loads(idx_path.read_text()) if idx_path.exists() else {}
+
+
+if __name__ == "__main__":
+    init_storage()
+    print(f"Storage initialized at {BASE_DIR}")
@@ -0,0 +1,139 @@
+#!/usr/bin/env python3
+"""
+Facebook Page/Profile Data Extractor
+Uses multiple techniques to extract public Facebook data without authentication:
+1. Googlebot UA for OG meta tags (name, description, likes, talking_about, bio, og:image)
+2. Graph API /picture endpoint for profile photos (pages only)
+3. Page Plugin embed for follower counts and page IDs
+"""
+
+import subprocess
+import json
+import re
+import html
+import sys
+
+GOOGLEBOT_UA = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
+
+def curl_get(url, ua=None):
+    """Fetch URL with curl"""
+    cmd = ['curl', '-s', '-L', '--max-time', '15']
+    if ua:
+        cmd += ['-H', f'User-Agent: {ua}']
+    cmd.append(url)
+    result = subprocess.run(cmd, capture_output=True, text=True, timeout=20)
+    return result.stdout
+
+def extract_og_data(username):
+    """Extract OG meta tags using Googlebot UA"""
+    content = curl_get(f'https://www.facebook.com/{username}', ua=GOOGLEBOT_UA)
+    
+    data = {}
+    
+    # Extract OG tags
+    og_title = re.search(r'og:title"\s*content="([^"]*)"', content)
+    if og_title:
+        data['name'] = html.unescape(og_title.group(1))
+    
+    og_desc = re.search(r'og:description"\s*content="([^"]*)"', content)
+    if og_desc:
+        desc = html.unescape(og_desc.group(1))
+        data['raw_description'] = desc
+        
+        # Parse likes count
+        likes_match = re.search(r'([\d,]+)\s+likes?', desc)
+        if likes_match:
+            data['likes'] = likes_match.group(1)
+        
+        # Parse talking about
+        talking_match = re.search(r'([\d,]+)\s+talking about this', desc)
+        if talking_match:
+            data['talking_about'] = talking_match.group(1)
+        
+        # Extract bio (text after the "talking about this." part)
+        bio_match = re.search(r'talking about this\.\s*(.+)', desc)
+        if bio_match:
+            data['bio'] = bio_match.group(1)
+    
+    og_image = re.search(r'og:image"\s*content="([^"]*)"', content)
+    if og_image:
+        data['og_image'] = html.unescape(og_image.group(1))
+    
+    return data
+
+def extract_plugin_data(username):
+    """Extract data from Page Plugin embed"""
+    content = curl_get(f'https://www.facebook.com/plugins/page.php?href=https://www.facebook.com/{username}&tabs=timeline&width=500&height=600')
+    
+    data = {}
+    
+    # Page name from title attribute
+    name_match = re.search(r'class="_1drp _5lv6" title="([^"]*)"', content)
+    if name_match:
+        data['plugin_name'] = html.unescape(name_match.group(1))
+    
+    # Follower count
+    followers_match = re.search(r'([\d,]+)\s+followers', content)
+    if followers_match:
+        data['followers'] = followers_match.group(1)
+    
+    # Page ID
+    pageid_match = re.search(r'"pageID":"(\d+)"', content)
+    if pageid_match:
+        data['page_id'] = pageid_match.group(1)
+    
+    return data
+
+def extract_profile_picture(username):
+    """Get profile picture via Graph API"""
+    content = curl_get(f'https://graph.facebook.com/v19.0/{username}/picture?redirect=false&width=400&height=400')
+    try:
+        d = json.loads(content)
+        if 'data' in d and not d['data'].get('is_silhouette', True):
+            return d['data']['url']
+    except (ValueError, KeyError, TypeError, AttributeError):
+        pass
+    return None
+
+def get_facebook_data(username):
+    """Combine all extraction methods"""
+    result = {'username': username}
+    
+    # Method 1: OG tags (best for bio, likes, talking_about)
+    og = extract_og_data(username)
+    result.update(og)
+    
+    # Method 2: Plugin (best for followers, page_id)
+    plugin = extract_plugin_data(username)
+    result.update(plugin)
+    
+    # Method 3: Graph API picture (pages only)
+    pic = extract_profile_picture(username)
+    if pic:
+        result['profile_picture'] = pic
+    
+    # Also try by page_id for picture if username didn't work
+    if not pic and 'page_id' in result:
+        pic2 = extract_profile_picture(result['page_id'])
+        if pic2:
+            result['profile_picture'] = pic2
+    
+    return result
+
+if __name__ == '__main__':
+    targets = sys.argv[1:] if len(sys.argv) > 1 else ['zuck', 'NVIDIA', 'Meta', 'CocaCola']
+    
+    for target in targets:
+        print(f"{'='*60}")
+        print(f"Facebook Profile: {target}")
+        print(f"{'='*60}")
+        data = get_facebook_data(target)
+        for k, v in data.items():
+            if k == 'raw_description':
+                continue  # Skip raw, we show parsed fields
+            val = str(v)
+            if len(val) > 120:
+                val = val[:120] + '...'
+            print(f"  {k}: {val}")
+        print()
+
@@ -0,0 +1,595 @@
+"""
+Hermes Simulator — Intelligence Gathering Pipeline v2
+
+Full-spectrum OSINT research engine for personality modeling.
+Searches text, extracts content, browses live pages, analyzes
+images with vision, and cross-references across platforms.
+
+Run via execute_code. The agent adapts searches based on findings.
+"""
+
+from hermes_tools import web_search, web_extract, terminal
+import json
+import time
+import urllib.parse
+
+# ═══════════════════════════════════════════════════════════════
+# CONFIGURATION
+# ═══════════════════════════════════════════════════════════════
+
+AGGREGATOR_SITES = [
+    "buttondown.com/ainews",
+    "news.smol.ai",
+    "techmeme.com",
+    "latent.space",
+]
+
+# Verified working fallback data sources (tested April 2026)
+# Priority order: X API > nitter.cz > ThreadReaderApp > GitHub > Reddit > HN
+FALLBACK_SOURCES = {
+    "nitter": "https://nitter.cz/{handle}",           # web_extract — full timeline
+    "threadreader": "https://threadreaderapp.com/user/{handle}",  # web_extract — historical threads
+    "github_profile": "https://api.github.com/users/{handle}",   # curl — profile + README
+    "github_events": "https://api.github.com/users/{handle}/events",  # curl — recent activity
+    "reddit_user": "https://www.reddit.com/user/{handle}.json",  # curl w/ User-Agent
+    "reddit_comments": "https://www.reddit.com/user/{handle}/comments.json",
+    "hn_search": "https://hn.algolia.com/api/v1/search?query={handle}&tags=comment",
+}
+
+# CONFIRMED BLOCKED (don't waste calls on these):
+# - LinkedIn (web_extract blocked, browser auth wall)
+# - Instagram viewers (imginn, picuki, dumpoir, gramhir — all 403)
+# - Most nitter instances (dead or 403, ONLY nitter.cz works via web_extract)
+# - Wayback Machine for tweets (sparse, no JS content)
+# - Google Cache of Twitter (empty)
+# - Archive.today (429 + CAPTCHA)
+# - Twitter Syndication API (rate limited)
+
+AI_SUBREDDITS = [
+    "LocalLLaMA", "MachineLearning", "singularity",
+    "ChatGPT", "ClaudeAI", "OpenAI", "StableDiffusion",
+]
+
+PLATFORMS = ["twitter", "instagram", "linkedin", "github", "reddit", "youtube"]
+
+# ═══════════════════════════════════════════════════════════════
+# HELPER: safe web_search with validation
+# ═══════════════════════════════════════════════════════════════
+
+def _safe_web_search(query: str, limit: int = 5) -> list:
+    """Run web_search and return results list, with validation."""
+    r = web_search(query, limit=limit)
+    if not isinstance(r, dict) or "data" not in r:
+        print(f"  [WARNING] web_search returned no 'data' key for query: {query[:80]}")
+        return []
+    data = r.get("data", {})
+    if not isinstance(data, dict):
+        return []
+    return data.get("web", []) or []
+
+
+# ═══════════════════════════════════════════════════════════════
+# CORE SEARCH FUNCTIONS
+# ═══════════════════════════════════════════════════════════════
+
+def search_identity(handle: str) -> dict:
+    """Establish who they are across the internet."""
+    results = {}
+    results["twitter_identity"] = _safe_web_search(f"@{handle} twitter bio role company", limit=5)
+    results["general_identity"] = _safe_web_search(f"{handle} known for", limit=5)
+    return results
+
+
+def search_voice(handle: str) -> dict:
+    """How do they actually talk/write."""
+    results = {}
+    results["takes"] = _safe_web_search(f"{handle} twitter hot takes opinions", limit=5)
+
+    for agg in AGGREGATOR_SITES[:2]:
+        hits = _safe_web_search(f"site:{agg} {handle}", limit=3)
+        if hits:
+            # Use full domain as key, not split('.')[0]
+            results[f"agg_{agg}"] = hits
+    return results
+
+
+def search_positions(handle: str, topics: list = None, domain: str = None) -> dict:
+    """What are their known positions."""
+    results = {}
+    if topics:
+        for topic in topics[:3]:
+            results[f"topic_{topic}"] = _safe_web_search(f"{handle} {topic} opinion take", limit=5)
+
+    # Build controversy query — only add domain keywords if specified
+    controversy_query = f"{handle} debate disagree controversial"
+    if domain:
+        controversy_query += f" {domain}"
+    results["controversies"] = _safe_web_search(controversy_query, limit=5)
+    return results
+
+
+def search_longform(handle: str, real_name: str = None, domain: str = None) -> dict:
+    """Blogs, interviews, essays."""
+    results = {}
+    name = real_name or handle
+
+    blog_query = f"{name} blog substack essay"
+    interview_query = f"{name} interview podcast"
+    if domain:
+        blog_query += f" {domain}"
+        interview_query += f" {domain}"
+
+    results["blogs"] = _safe_web_search(blog_query, limit=5)
+    results["interviews"] = _safe_web_search(interview_query, limit=5)
+    return results
+
+
+# ═══════════════════════════════════════════════════════════════
+# CROSS-PLATFORM DISCOVERY
+# ═══════════════════════════════════════════════════════════════
+
+def discover_platforms(handle: str, real_name: str = None) -> dict:
+    """Find someone across all platforms."""
+    name = real_name or handle
+    results = {}
+
+    # Instagram
+    results["instagram"] = _safe_web_search(f"{name} instagram OR site:instagram.com/{handle}", limit=5)
+
+    # LinkedIn
+    results["linkedin"] = _safe_web_search(f"{name} linkedin OR site:linkedin.com/in", limit=5)
+
+    # Reddit
+    results["reddit"] = _safe_web_search(f"{name} reddit account OR site:reddit.com/user", limit=5)
+
+    # GitHub
+    results["github"] = _safe_web_search(f"{handle} github OR site:github.com/{handle}", limit=5)
+
+    # YouTube
+    results["youtube"] = _safe_web_search(f"{name} youtube channel OR talk OR interview", limit=5)
+
+    # Personal site
+    results["personal_site"] = _safe_web_search(f"{name} personal website blog about", limit=5)
+
+    # Hacker News
+    results["hackernews"] = _safe_web_search(f"site:news.ycombinator.com {handle} OR {name}", limit=3)
+
+    return results
+
+
+def discover_instagram(handle: str = None, real_name: str = None) -> dict:
+    """Focused Instagram discovery."""
+    results = {}
+    name = real_name or handle
+
+    # Try to find their IG handle
+    results["ig_search"] = _safe_web_search(f"{name} instagram profile", limit=5)
+
+    # If we have a candidate IG URL, try to extract
+    ig_urls = []
+    for item in results.get("ig_search", []):
+        if not isinstance(item, dict):
+            continue
+        url = item.get("url", "")
+        if "instagram.com/" in url and "/p/" not in url:
+            ig_urls.append(url)
+
+    if ig_urls:
+        # Try to extract IG profile page
+        r = web_extract(urls=ig_urls[:1])
+        results["ig_profile"] = r.get("results", []) if isinstance(r, dict) else []
+
+    return results
+
+
+# ═══════════════════════════════════════════════════════════════
+# VISUAL INTELLIGENCE
+# ═══════════════════════════════════════════════════════════════
+
+# NOTE: These functions use browser_* and vision_analyze which are
+# NOT available in execute_code. They are called DIRECTLY by the
+# agent after the execute_code research phase.
+#
+# The agent should:
+# 1. Run this script via execute_code for text-based research
+# 2. Then use browser/vision tools directly for visual research
+#
+# Visual research tasks for the agent:
+#
+# INSTAGRAM VISUAL:
+#   browser_navigate("https://www.instagram.com/{ig_handle}/")
+#   browser_vision(question="Describe this Instagram profile: bio, pic, grid, aesthetic, follower count")
+#   browser_get_images()  # collect image URLs
+#   vision_analyze(image_url="{url}", question="Describe: setting, people, mood, style")
+#
+# PROFILE PIC ANALYSIS:
+#   vision_analyze(image_url="{pic_url}", question="Describe: appearance, clothing, setting, expression, professional vs casual")
+#
+# REVERSE IMAGE SEARCH (Yandex):
+#   # Upload to catbox if behind auth:
+#   terminal("curl -F 'reqtype=fileupload' -F 'fileToUpload=@{path}' https://catbox.moe/user/api.php")
+#   browser_navigate(f"https://yandex.com/images/search?rpt=imageview&url={encoded_url}")
+#
+# PAGE SCREENSHOT ANALYSIS:
+#   browser_vision(question="Read all text, usernames, post content, dates, engagement numbers")
+
+
+# ═══════════════════════════════════════════════════════════════
+# INTERACTION MAPPING
+# ═══════════════════════════════════════════════════════════════
+
+def search_interactions(handle: str, other_handles: list = None) -> dict:
+    """How they interact with other simulation targets."""
+    results = {}
+    if other_handles:
+        for other in other_handles[:4]:
+            hits = _safe_web_search(f"{handle} {other} twitter interaction debate reply", limit=3)
+            if hits:
+                results[f"with_{other}"] = hits
+    return results
+
+
+def search_social_graph(handle: str) -> dict:
+    """Who do they interact with most? Allies and rivals."""
+    results = {}
+
+    results["frequent_interactions"] = _safe_web_search(f"@{handle} twitter reply thread conversation with", limit=5)
+    results["conflicts"] = _safe_web_search(f"@{handle} disagree argue beef ratio", limit=5)
+    results["allies"] = _safe_web_search(f"@{handle} agree support endorse recommend", limit=5)
+
+    return results
+
+
+# ═══════════════════════════════════════════════════════════════
+# DEEP EXTRACTION
+# ═══════════════════════════════════════════════════════════════
+
+def extract_content(urls: list) -> list:
+    """Pull full content from high-value URLs (at most 3 per call)."""
+    if not urls:
+        return []
+    r = web_extract(urls=urls[:3])
+    return r.get("results", [])
+
+
+def extract_best_urls(findings: dict, max_urls: int = 5) -> list:
+    """Find the most promising URLs in research findings for deep extraction."""
+    seen_urls = set()  # URL deduplication
+    priority_domains = [
+        "substack.com", "medium.com", "blog", "essay",
+        "interview", "podcast", "youtube.com", "arxiv.org",
+    ]
+
+    def score_url(url, desc):
+        score = 0
+        for domain in priority_domains:
+            if domain in url.lower() or domain in desc.lower():
+                score += 2
+        if any(w in desc.lower() for w in ["interview", "spoke", "told", "said", "wrote"]):
+            score += 1
+        return score
+
+    candidates = []
+
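+    skip_domains = ("x.com", "twitter.com", "instagram.com")
+
+    def is_social(url):
+        # Compare hostnames: a substring test on "x.com" would also drop vox.com, netflix.com, etc.
+        from urllib.parse import urlparse
+        host = (urlparse(url).hostname or "").lower()
+        return any(host == d or host.endswith("." + d) for d in skip_domains)
+
+    def collect(obj):
+        if isinstance(obj, list):
+            for item in obj:
+                if isinstance(item, dict):
+                    url = item.get("url") or ""
+                    desc = item.get("description") or item.get("text") or ""
+                    if url and url not in seen_urls and not is_social(url):
+                        seen_urls.add(url)
+                        candidates.append((score_url(url, desc), url))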
+        elif isinstance(obj, dict):
+            for v in obj.values():
+                collect(v)
+
+    collect(findings)
+    candidates.sort(key=lambda x: -x[0])
+    return [url for _, url in candidates[:max_urls]]
+
+
+# ═══════════════════════════════════════════════════════════════
+# MAIN PIPELINE
+# ═══════════════════════════════════════════════════════════════
+
+def research_person(handle: str, fidelity: int = 70,
+                    topics: list = None,
+                    other_handles: list = None,
+                    real_name: str = None,
+                    domain: str = None) -> dict:
+    """
+    Full research pipeline for one person.
+    Returns dict with all findings organized by category.
+
+    Args:
+        handle: Twitter/X handle (without @)
+        fidelity: Research depth 0-100
+        topics: Specific topics to research
+        other_handles: Other people to check interactions with
+        real_name: Real name if different from handle
+        domain: Domain context (e.g., 'AI', 'politics', 'gaming').
+                When None, no domain keywords are added to searches.
+                When set, adds relevant domain keywords.
+    """
+    print(f"\n{'='*60}")
+    print(f"  RESEARCHING: @{handle} | Fidelity: {fidelity}%")
+    if domain:
+        print(f"  Domain: {domain}")
+    print(f"{'='*60}")
+
+    findings = {"handle": handle, "fidelity": fidelity, "visual_tasks": []}
+
+    # ─── Phase 1: Identity (always) ───
+    print(f"\n  [IDENTITY] Who are they...")
+    findings["identity"] = search_identity(handle)
+
+    if fidelity <= 30:
+        if topics:
+            findings["quick_topic"] = _safe_web_search(f"{handle} {topics[0]}", limit=3)
+        return findings
+
+    # ─── Phase 2: Voice (fidelity 31+) ───
+    print(f"\n  [VOICE] How do they talk...")
+    findings["voice"] = search_voice(handle)
+
+    # ─── Phase 3: Positions (fidelity 31+) ───
+    print(f"\n  [POSITIONS] What do they believe...")
+    findings["positions"] = search_positions(handle, topics, domain=domain)
+
+    if fidelity <= 50:
+        return findings
+
+    # ─── Phase 4: Cross-platform (fidelity 51+) ───
+    print(f"\n  [PLATFORMS] Finding them everywhere...")
+    findings["platforms"] = discover_platforms(handle, real_name)
+
+    if fidelity <= 70:
+        return findings
+
+    # ─── Phase 5: Longform (fidelity 71+) ───
+    print(f"\n  [LONGFORM] Blogs, interviews, essays...")
+    findings["longform"] = search_longform(handle, real_name, domain=domain)
+
+    # ─── Phase 6: Social graph (fidelity 71+) ───
+    print(f"\n  [SOCIAL GRAPH] Who do they interact with...")
+    findings["social_graph"] = search_social_graph(handle)
+
+    # ─── Phase 7: Interaction mapping (fidelity 71+) ───
+    if other_handles:
+        print(f"\n  [INTERACTIONS] With other targets: {other_handles}...")
+        findings["interactions"] = search_interactions(handle, other_handles)
+
+    # ─── Phase 8: Instagram deep dive (fidelity 80+) ───
+    if fidelity >= 80:
+        print(f"\n  [INSTAGRAM] Visual identity...")
+        findings["instagram"] = discover_instagram(handle, real_name)
+
+        # Queue visual tasks for the agent to do after execute_code
+        findings["visual_tasks"].append({
+            "type": "instagram_profile",
+            "instruction": f"browser_navigate to Instagram profile, use browser_vision to analyze",
+            "handle": handle,
+        })
+
+    # ─── Phase 9: Deep extraction (fidelity 85+) ───
+    if fidelity >= 85:
+        print(f"\n  [DEEP EXTRACT] Pulling longform content...")
+        best_urls = extract_best_urls(findings, max_urls=3)  # extract_content() reads at most 3 URLs
+        if best_urls:
+            print(f"    Extracting {len(best_urls)} URLs: {best_urls}")
+            findings["deep_extracts"] = extract_content(best_urls)
+
+    # ─── Phase 10: Profile pic analysis (fidelity 90+) ───
+    if fidelity >= 90:
+        findings["visual_tasks"].append({
+            "type": "profile_pic_analysis",
+            "instruction": "Find and analyze profile pictures across platforms with vision_analyze",
+            "handle": handle,
+        })
+        findings["visual_tasks"].append({
+            "type": "reverse_image_search",
+            "instruction": "Reverse image search profile pic via Yandex to find alt accounts",
+            "handle": handle,
+        })
+
+    return findings
+
+
+def research_all(handles: list, fidelity: int = 70,
+                 topics: list = None, domain: str = None) -> dict:
+    """Research all simulation targets."""
+    all_findings = {}
+
+    for handle in handles:
+        clean = handle.lstrip("@")
+        others = [h.lstrip("@") for h in handles if h.lstrip("@") != clean]
+
+        findings = research_person(
+            handle=clean,
+            fidelity=fidelity,
+            topics=topics,
+            other_handles=others,
+            domain=domain,
+        )
+        all_findings[clean] = findings
+
+    return all_findings
+
+
+# ═══════════════════════════════════════════════════════════════
+# REPORTING
+# ═══════════════════════════════════════════════════════════════
+
+def count_data_points(obj) -> int:
+    """Count every search result item in findings, regardless of text length."""
+    total = 0
+    if isinstance(obj, list):
+        # Each list entry counts once. The >50-character quality filter
+        # lives in count_quality_data_points().
+        total += len(obj)
+    elif isinstance(obj, dict):
+        for k, v in obj.items():
+            # Skip metadata keys
+            if k in ("handle", "fidelity", "visual_tasks"):
+                continue
+            total += count_data_points(v)
+    return total
+
+
+def count_quality_data_points(obj) -> int:
+    """Count search result items with substantial text (description/text > 50 chars)."""
+    total = 0
+    if isinstance(obj, list):
+        for item in obj:
+            if isinstance(item, dict):
+                text = item.get("description") or item.get("text") or ""
+                if len(text) > 50:
+                    total += 1
+    elif isinstance(obj, dict):
+        for k, v in obj.items():
+            if k in ("handle", "fidelity", "visual_tasks"):
+                continue
+            total += count_quality_data_points(v)
+    return total
+
+
+def summarize_findings(findings: dict) -> str:
+    """Compact summary of what we found."""
+    handle = findings.get("handle", "unknown")
+    fidelity = findings.get("fidelity", 0)
+    total = count_data_points(findings)
+    quality = count_quality_data_points(findings)
+    visual_tasks = findings.get("visual_tasks", [])
+
+    lines = [
+        f"\n{'━'*60}",
+        f"  @{handle} | Fidelity: {fidelity}% | Data points: {total} ({quality} quality)",
+        f"{'━'*60}",
+    ]
+
+    # Identity snippets
+    identity = findings.get("identity", {})
+    for key in ["twitter_identity", "general_identity"]:
+        for item in identity.get(key, [])[:2]:
+            if not isinstance(item, dict):
+                continue
+            desc = (item.get("description") or "")[:180]
+            if desc:
+                lines.append(f"  [{key.upper()}] {desc}")
+
+    # Platform discovery results
+    platforms = findings.get("platforms", {})
+    found_platforms = []
+    for platform, items in platforms.items():
+        if isinstance(items, list) and len(items) > 0:
+            found_platforms.append(platform)
+    if found_platforms:
+        lines.append(f"  [PLATFORMS FOUND] {', '.join(found_platforms)}")
+
+    # Voice samples from aggregators
+    voice = findings.get("voice", {})
+    for key, items in voice.items():
+        if isinstance(items, list):
+            for item in items[:1]:
+                if not isinstance(item, dict):
+                    continue
+                desc = (item.get("description") or "")[:180]
+                if desc and handle.lower() in desc.lower():
+                    lines.append(f"  [VOICE] {desc}")
+
+    # Deep extracts
+    for extract in findings.get("deep_extracts", [])[:2]:
+        if not isinstance(extract, dict):
+            continue
+        title = extract.get("title", "untitled")
+        content = (extract.get("content") or "")[:200]
+        if content:
+            lines.append(f"  [LONGFORM: {title}] {content}...")
+
+    # Pending visual tasks
+    if visual_tasks:
+        lines.append(f"  [VISUAL TASKS QUEUED] {len(visual_tasks)} tasks for agent to execute:")
+        for task in visual_tasks:
+            lines.append(f"    → {task.get('type', '?')}: {task.get('instruction', '?')[:80]}")
+
+    # Confidence estimate — based on quality data points
+    if quality >= 30:
+        conf = "HIGH"
+    elif quality >= 15:
+        conf = "MEDIUM"
+    elif quality >= 5:
+        conf = "LOW"
+    else:
+        conf = "INSUFFICIENT"
+    lines.append(f"\n  CONFIDENCE: {conf} ({quality} quality data points, {total} total)")
+
+    return "\n".join(lines)
+
+
+def report_visual_tasks(all_findings: dict) -> str:
+    """Collect all visual tasks across all targets for agent to execute."""
+    lines = ["\n" + "═"*60, "  VISUAL INTELLIGENCE TASKS (agent must execute directly)", "═"*60]
+
+    any_tasks = False
+    for handle, findings in all_findings.items():
+        for task in findings.get("visual_tasks", []):
+            any_tasks = True
+            lines.append(f"\n  @{handle} — {task.get('type', '?')}:")
+            lines.append(f"    {task.get('instruction', '?')}")
+
+    if not any_tasks:
+        lines.append("  No visual tasks queued (fidelity < 80)")
+
+    return "\n".join(lines)
+
+
+# ═══════════════════════════════════════════════════════════════
+# CHECK AVAILABLE TOOLS
+# ═══════════════════════════════════════════════════════════════
+
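+def check_x_cli() -> bool:
+    """Check if x-cli is available."""
+    try:
+        # Use markers that do not contain each other: "FOUND" is a substring of "NOT_FOUND",
+        # so the old check reported x-cli as available even when it was missing.
+        r = terminal("which x-cli 2>/dev/null && echo 'XCLI_PRESENT' || echo 'XCLI_MISSING'")
+        return "XCLI_PRESENT" in r.get("output", "")
+    except Exception:
+        return False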
+
+
+# ═══════════════════════════════════════════════════════════════
+# ENTRY POINT
+# ═══════════════════════════════════════════════════════════════
+
+if __name__ == "__main__":
+    # ── CONFIGURE THESE ──
+    HANDLES = ["teknium1", "basedjensen"]
+    FIDELITY = 80
+    TOPICS = ["open source AI", "compute scaling"]
+    DOMAIN = None  # Set to 'AI', 'politics', etc. to add domain keywords
+    # ─────────────────────
+
+    has_xcli = check_x_cli()
+    print(f"x-cli available: {has_xcli}")
+    print(f"Targets: {HANDLES}")
+    print(f"Fidelity: {FIDELITY}%")
+    print(f"Topics: {TOPICS}")
+    print(f"Domain: {DOMAIN}")
+
+    results = research_all(HANDLES, fidelity=FIDELITY, topics=TOPICS, domain=DOMAIN)
+
+    for handle, findings in results.items():
+        print(summarize_findings(findings))
+
+    print(report_visual_tasks(results))
+    print("\n\nResearch phase complete. Agent should now:")
+    print("1. Execute any queued visual tasks (browser/vision)")
+    print("2. Compile dossiers from all findings")
+    print("3. Run simulation")
@@ -0,0 +1,238 @@
+#!/usr/bin/env python3
+"""
+Threads (Meta) Profile & Post Extractor
+========================================
+Extracts profile data and post content from Threads using:
+1. OG meta tags from HTML (no auth required for profiles and public posts)
+2. WebFinger for ActivityPub discovery
+3. Google-indexed post URLs for recent post discovery
+
+METHODS THAT WORK:
+- Profile pages at threads.net/@{user} have OG tags with:
+  display_name, username, follower_count, thread_count, bio, profile_pic
+- Individual post pages have OG tags with:
+  full post text, author info, profile pic
+- WebFinger at /.well-known/webfinger gives ActivityPub user IDs
+- Post URLs must be known (discoverable via web search)
+
+METHODS THAT DON'T WORK (as of 2025):
+- Threads Official API (graph.threads.net) requires OAuth token
+- ActivityPub /ap/users/ endpoints return 404 for most users
+- No public post listing endpoint exists
+"""
+
+import re
+import json
+import html
+import subprocess
+import sys
+
+def curl_fetch(url, extra_headers=None, timeout=15):
+    """Fetch URL using curl (more reliable than urllib for Threads)."""
+    cmd = ['curl', '-s', '-L', '--max-time', str(timeout)]
+    if extra_headers:
+        for k, v in extra_headers.items():
+            cmd.extend(['-H', f'{k}: {v}'])
+    cmd.append(url)
+    try:
+        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout+5)
+        return result.stdout
+    except (subprocess.SubprocessError, OSError):
+        return None
+
+def extract_og_tags(html_content):
+    """Extract OpenGraph, meta description, and Twitter tags from HTML."""
+    data = {}
+    if not html_content:
+        return data
+    
+    for m in re.finditer(r'property="(og:[^"]+)"\s+content="([^"]*)"', html_content):
+        key = m.group(1)
+        val = html.unescape(m.group(2))
+        if key not in data:
+            data[key] = val
+    
+    for m in re.finditer(r'name="description"\s+content="([^"]*)"', html_content):
+        data['description'] = html.unescape(m.group(1))
+        break
+    
+    for m in re.finditer(r'name="(twitter:[^"]+)"\s+content="([^"]*)"', html_content):
+        key = m.group(1)
+        val = html.unescape(m.group(2))
+        if key not in data:
+            data[key] = val
+    
+    return data
+
+def parse_profile_description(desc):
+    """Parse '5.5M Followers • 142 Threads • Bio. See the latest...' format."""
+    result = {}
+    if not desc:
+        return result
+    
+    parts = desc.split(' \u2022 ')  # Split on bullet •
+    for part in parts:
+        part = part.strip()
+        if 'Follower' in part:
+            result['followers'] = part.split(' Follower')[0].strip()
+        elif part.endswith('Threads') or part.endswith('Thread'):
+            result['thread_count'] = part.split(' Thread')[0].strip()
+        else:
+            bio = re.sub(r'\s*See the latest conversations.*$', '', part)
+            if bio:
+                result['bio'] = bio
+    
+    return result
+
+def parse_profile_title(title):
+    """Parse 'Display Name (@user) • Threads, Say more' format."""
+    result = {}
+    if not title:
+        return result
+    # Usernames may contain periods as well as letters, digits, and underscores.
+    m = re.match(r'^(.+?)\s*\(@([\w.]+)\)', title)
+    if m:
+        result['display_name'] = m.group(1).strip()
+        result['username'] = m.group(2)
+    return result
+
+def get_threads_profile(username):
+    """
+    Get Threads profile data via OG meta tags.
+    Returns dict with: username, display_name, bio, followers, thread_count, 
+                       profile_picture_url, url
+    """
+    username = username.lstrip('@')
+    url = f'https://www.threads.net/@{username}'
+    
+    content = curl_fetch(url)
+    tags = extract_og_tags(content)
+    
+    if not tags or 'og:title' not in tags:
+        return {'error': 'Failed to fetch or parse profile', 'username': username}
+    
+    title = tags.get('og:title', '')
+    if title.startswith('Threads') and 'Log in' in title:
+        return {'error': 'Profile requires login or not found', 'username': username}
+    
+    result = {
+        'platform': 'threads',
+        'url': url,
+    }
+    
+    result.update(parse_profile_title(title))
+    result.update(parse_profile_description(tags.get('og:description', '')))
+    
+    if 'og:image' in tags:
+        result['profile_picture_url'] = tags['og:image']
+    
+    return result
+
+def get_threads_webfinger(username):
+    """Get WebFinger data (ActivityPub discovery) for a Threads user."""
+    username = username.lstrip('@')
+    url = f'https://www.threads.net/.well-known/webfinger?resource=acct:{username}@threads.net'
+    
+    content = curl_fetch(url, {'Accept': 'application/json'})
+    if not content:
+        return None
+    
+    try:
+        data = json.loads(content)
+        if not isinstance(data, dict) or 'error' in data or ('success' in data and not data['success']):
+            return None
+        
+        result = {'subject': data.get('subject', '')}
+        for link in data.get('links', []):
+            if link.get('type') == 'application/activity+json':
+                result['activitypub_url'] = link['href']
+            elif link.get('rel') == 'http://webfinger.net/rel/profile-page':
+                result['profile_url'] = link['href']
+        return result
+    except (json.JSONDecodeError, KeyError, AttributeError, TypeError):
+        return None
+
+def get_thread_post(post_url):
+    """
+    Get content of a specific Threads post via OG tags.
+    Returns: text, author, image_url
+    """
+    content = curl_fetch(post_url)
+    tags = extract_og_tags(content)
+    
+    if not tags or 'og:title' not in tags:
+        return {'error': 'Failed to fetch post'}
+    
+    title = tags.get('og:title', '')
+    if 'Log in' in title:
+        return {'error': 'Post requires login or not found'}
+    
+    result = {'url': post_url}
+    
+    if 'og:description' in tags:
+        result['text'] = tags['og:description']
+    elif 'description' in tags:
+        result['text'] = tags['description']
+    
+    if 'og:title' in tags:
+        # Parse "Display Name (@username) on Threads"
+        m = re.match(r'^(.+?)\s*\(@([\w.]+)\)\s+on\s+Threads', title)
+        if m:
+            result['author_name'] = m.group(1).strip()
+            result['author_username'] = m.group(2)
+    
+    if 'og:image' in tags:
+        result['image_url'] = tags['og:image']
+    
+    return result
+
+def get_threads_full(username):
+    """Get complete profile data combining all methods."""
+    profile = get_threads_profile(username)
+    wf = get_threads_webfinger(username)
+    
+    if wf:
+        profile['webfinger'] = wf
+    
+    return profile
+
+
+# ===== TEST =====
+if __name__ == '__main__':
+    test_users = sys.argv[1:] if len(sys.argv) > 1 else ['zuck', 'nvidia', 'mosseri']
+    
+    for user in test_users:
+        print(f"\n{'='*60}")
+        print(f"  THREADS PROFILE: @{user}")
+        print(f"{'='*60}")
+        
+        data = get_threads_full(user)
+        for k, v in sorted(data.items()):
+            if k == 'profile_picture_url':
+                print(f"  {k}: {str(v)[:80]}...")
+            elif k == 'webfinger':
+                print(f"  webfinger:")
+                for wk, wv in v.items():
+                    print(f"    {wk}: {wv}")
+            else:
+                print(f"  {k}: {v}")
+    
+    # Test posts
+    post_urls = [
+        'https://www.threads.net/@zuck/post/DEkvXzbyDS9',
+    ]
+    
+    print(f"\n{'='*60}")
+    print(f"  THREADS POSTS")
+    print(f"{'='*60}")
+    
+    for purl in post_urls:
+        print(f"\n  URL: {purl}")
+        post = get_thread_post(purl)
+        for k, v in post.items():
+            if k in ('image_url',):
+                print(f"  {k}: {str(v)[:80]}...")
+            elif k == 'text':
+                print(f"  {k}: {v[:300]}{'...' if len(v) > 300 else ''}")
+            else:
+                print(f"  {k}: {v}")
+
@@ -0,0 +1,305 @@
+"""
+TikTok Profile & Video Data Scraper
+====================================
+WORKING methods to get full TikTok profile data and video content.
+Tested and verified April 2026.
+
+METHODS SUMMARY:
+================
+METHOD 1 (BEST): HTML SSR Scraping - Parse __UNIVERSAL_DATA_FOR_REHYDRATION__
+  - Gets: FULL profile (bio, stats, follower/following/heart/video counts)
+  - Works: YES - Reliable, no auth needed, just curl + parse
+  - Limitation: No video list on profile page (videos load client-side)
+
+METHOD 2: oEmbed API - https://www.tiktok.com/oembed?url=...
+  - Gets: Video title/caption, author, thumbnail URL
+  - Works: YES - No auth, no rate limit issues
+  - Limitation: Need video IDs first; no engagement stats
+
+METHOD 3: tikwm.com API - https://www.tikwm.com/api/
+  - Gets: Full user info + individual video stats (plays, likes, comments, shares)
+  - User info: https://www.tikwm.com/api/user/info?unique_id={username}
+  - Video info: https://www.tikwm.com/api/?url={tiktok_video_url}
+  - Works: YES for user info and single videos
+  - Limitation: Posts list endpoint returns 403 (rate-limited)
+
+METHOD 4: Video ID Discovery via Search Engines
+  - Use web_search("site:tiktok.com/@{username}/video") to find video IDs
+  - Then use oEmbed or tikwm or HTML scraping per video
+  - Works: YES - Gets ~5 recent video IDs per search
+
+METHOD 5: SocialBlade via web_extract
+  - URL: https://socialblade.com/tiktok/user/{username}
+  - Gets: Followers, following, likes, videos, growth trends, rankings
+  - Works: YES via web_extract tool
+
+METHOD 6: Individual Video HTML Scraping
+  - Fetch https://www.tiktok.com/@{user}/video/{id}
+  - Parse __UNIVERSAL_DATA webapp.video-detail -> itemInfo.itemStruct
+  - Gets: FULL video data (caption, stats, music, hashtags, duration)
+  - Works: YES - Most complete per-video data
+
+NOT WORKING:
+  - TikTok /api/user/detail/ endpoint -> returns empty (needs signed params)
+  - TikTok /api/post/item_list/ -> returns empty (needs x-bogus/msToken)
+  - tikwm.com /api/user/posts -> 403 forbidden
+"""
+
+import re
+import json
+import subprocess
+import urllib.parse
+
+USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
+
+
+def fetch_url(url, headers=None):
+    """Fetch URL via curl and return content."""
+    cmd = ['curl', '-s', '-L', '-m', '30', url,
+           '-H', f'User-Agent: {USER_AGENT}',
+           '-H', 'Accept-Language: en-US,en;q=0.9']
+    if headers:
+        for k, v in headers.items():
+            cmd.extend(['-H', f'{k}: {v}'])
+    result = subprocess.run(cmd, capture_output=True, text=True, timeout=35)
+    return result.stdout
+
+
+def method1_html_profile(username):
+    """
+    METHOD 1: Scrape TikTok profile HTML and parse SSR JSON data.
+    Returns full profile with stats.
+    """
+    url = f'https://www.tiktok.com/@{username}'
+    html = fetch_url(url)
+
+    m = re.search(
+        r'<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__" type="application/json">(.*?)</script>',
+        html
+    )
+    if not m:
+        return None
+
+    data = json.loads(m.group(1))
+    scope = data.get('__DEFAULT_SCOPE__', {})
+    user_detail = scope.get('webapp.user-detail', {})
+    user_info = user_detail.get('userInfo', {})
+
+    if not user_info:
+        return None
+
+    user = user_info.get('user', {})
+    stats = user_info.get('statsV2', user_info.get('stats', {}))
+
+    return {
+        'id': user.get('id'),
+        'username': user.get('uniqueId'),
+        'nickname': user.get('nickname'),
+        'bio': user.get('signature'),
+        'verified': user.get('verified'),
+        'private': user.get('privateAccount'),
+        'secUid': user.get('secUid'),
+        'avatarLarger': user.get('avatarLarger'),
+        'bioLink': user.get('bioLink', {}),
+        'createTime': user.get('createTime'),
+        'language': user.get('language'),
+        'stats': {
+            'followers': int(stats.get('followerCount', 0)),
+            'following': int(stats.get('followingCount', 0)),
+            'hearts': int(stats.get('heartCount', 0)),
+            'videos': int(stats.get('videoCount', 0)),
+            'diggs': int(stats.get('diggCount', 0)),
+            'friends': int(stats.get('friendCount', 0)),
+        }
+    }
+
+
+def method2_oembed_video(username, video_id):
+    """
+    METHOD 2: Get video caption/title via oEmbed.
+    No auth needed. Returns caption, author, thumbnail.
+    """
+    url = f'https://www.tiktok.com/oembed?url=https://www.tiktok.com/@{username}/video/{video_id}'
+    content = fetch_url(url)
+    try:
+        data = json.loads(content)
+        return {
+            'video_id': video_id,
+            'title': data.get('title', ''),
+            'author_name': data.get('author_name'),
+            'author_url': data.get('author_url'),
+            'thumbnail_url': data.get('thumbnail_url'),
+            'thumbnail_width': data.get('thumbnail_width'),
+            'thumbnail_height': data.get('thumbnail_height'),
+        }
+    except json.JSONDecodeError:
+        return None
+
+
+def method3_tikwm_user(username):
+    """
+    METHOD 3a: Get user info via tikwm.com API.
+    """
+    url = f'https://www.tikwm.com/api/user/info?unique_id={username}'
+    content = fetch_url(url)
+    try:
+        data = json.loads(content)
+        if data.get('code') == 0:
+            return data.get('data')
+    except json.JSONDecodeError:
+        pass
+    return None
+
+
+def method3_tikwm_video(video_url):
+    """
+    METHOD 3b: Get video details via tikwm.com API.
+    Returns: title, play_count, digg_count, comment_count, share_count, duration, download URLs
+    """
+    url = f'https://www.tikwm.com/api/?url={urllib.parse.quote(video_url)}'
+    content = fetch_url(url)
+    try:
+        data = json.loads(content)
+        if data.get('code') == 0:
+            v = data['data']
+            return {
+                'video_id': v.get('id'),
+                'title': v.get('title'),
+                'duration': v.get('duration'),
+                'play_count': v.get('play_count'),
+                'likes': v.get('digg_count'),
+                'comments': v.get('comment_count'),
+                'shares': v.get('share_count'),
+                'author': v.get('author', {}).get('unique_id'),
+                'music_title': v.get('music_info', {}).get('title') if v.get('music_info') else None,
+                'cover_url': v.get('origin_cover') or v.get('cover'),
+                'play_url': v.get('play'),  # direct video URL
+            }
+    except json.JSONDecodeError:
+        pass
+    return None
+
+
+def method6_html_video(username, video_id):
+    """
+    METHOD 6: Scrape individual video page HTML for full data.
+    Gets: caption, full stats, music, hashtags, create time.
+    """
+    url = f'https://www.tiktok.com/@{username}/video/{video_id}'
+    html = fetch_url(url)
+
+    m = re.search(
+        r'<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__" type="application/json">(.*?)</script>',
+        html
+    )
+    if not m:
+        return None
+
+    data = json.loads(m.group(1))
+    scope = data.get('__DEFAULT_SCOPE__', {})
+    vd = scope.get('webapp.video-detail', {})
+    item = vd.get('itemInfo', {}).get('itemStruct', {})
+
+    if not item:
+        return None
+
+    stats = item.get('statsV2', item.get('stats', {}))
+    music = item.get('music', {})
+    challenges = item.get('challenges', [])
+
+    return {
+        'video_id': item.get('id'),
+        'description': item.get('desc'),
+        'createTime': item.get('createTime'),
+        'duration': item.get('video', {}).get('duration'),
+        'stats': {
+            'plays': int(stats.get('playCount', 0)),
+            'likes': int(stats.get('diggCount', 0)),
+            'comments': int(stats.get('commentCount', 0)),
+            'shares': int(stats.get('shareCount', 0)),
+            'saves': int(stats.get('collectCount', 0)),
+        },
+        'music': {
+            'title': music.get('title'),
+            'author': music.get('authorName'),
+        },
+        'hashtags': [c.get('title', '') for c in challenges],
+        'author': item.get('author', {}).get('uniqueId'),
+    }
+
+
+def get_full_tiktok_profile(username):
+    """
+    Complete pipeline: Get full profile + discover and scrape recent videos.
+    
+    Returns dict with profile data, stats, and recent video details.
+    """
+    # Step 1: Get profile data
+    profile = method1_html_profile(username)
+    if not profile:
+        return {'error': f'Could not fetch profile for @{username}'}
+
+    result = {
+        'profile': profile,
+        'videos': [],
+        'data_sources': ['tiktok_html_ssr'],
+    }
+
+    # Note: Video discovery requires web_search tool (not available in pure Python)
+    # In the agent context, use:
+    #   web_search(f"site:tiktok.com/@{username}/video")
+    # Then for each video ID found, call method6_html_video() or method2_oembed_video()
+    
+    return result
+
+
+if __name__ == '__main__':
+    import sys
+    username = sys.argv[1] if len(sys.argv) > 1 else 'khaby.lame'
+    
+    print(f'=== Testing TikTok scraping for @{username} ===\n')
+    
+    print('--- METHOD 1: HTML Profile Scraping ---')
+    profile = method1_html_profile(username)
+    if profile:
+        print(f'  Username: {profile["username"]}')
+        print(f'  Nickname: {profile["nickname"]}')
+        print(f'  Bio: {(profile["bio"] or "")[:100]}')  # bio may be missing
+        print(f'  Verified: {profile["verified"]}')
+        print(f'  Followers: {profile["stats"]["followers"]:,}')
+        print(f'  Following: {profile["stats"]["following"]:,}')
+        print(f'  Hearts: {profile["stats"]["hearts"]:,}')
+        print(f'  Videos: {profile["stats"]["videos"]:,}')
+        print(f'  SecUid: {(profile["secUid"] or "")[:50]}...')
+    else:
+        print('  FAILED')
+    
+    print('\n--- METHOD 3a: tikwm.com User API ---')
+    tikwm_user = method3_tikwm_user(username)
+    if tikwm_user:
+        s = tikwm_user.get('stats', {})
+        # Default to 0 so the thousands-separator format does not fail on a missing key.
+        print(f'  Followers: {s.get("followerCount", 0):,}')
+        print(f'  Hearts: {s.get("heartCount", 0):,}')
+        print(f'  Videos: {s.get("videoCount", 0):,}')
+    else:
+        print('  FAILED')
+    
+    # Test with a known video
+    test_video_id = '7615318641042623775'  # khaby birthday video
+    if username == 'khaby.lame':
+        print(f'\n--- METHOD 2: oEmbed for video {test_video_id} ---')
+        oembed = method2_oembed_video(username, test_video_id)
+        if oembed:
+            print(f'  Title: {oembed["title"][:80]}')
+        
+        print(f'\n--- METHOD 6: HTML Video Scraping for {test_video_id} ---')
+        video = method6_html_video(username, test_video_id)
+        if video:
+            print(f'  Description: {(video["description"] or "")[:80]}')
+            print(f'  Plays: {video["stats"]["plays"]:,}')
+            print(f'  Likes: {video["stats"]["likes"]:,}')
+            print(f'  Comments: {video["stats"]["comments"]:,}')
+            print(f'  Shares: {video["stats"]["shares"]:,}')
+            print(f'  Hashtags: {video["hashtags"]}')
+    
+    print('\n=== DONE ===')
@@ -0,0 +1,260 @@
+"""
+Direct X/Twitter API v2 client for Hermes Simulator.
+No x-cli dependency — uses curl via terminal() with bearer token.
+
+Provides:
+- get_user(handle): profile, bio, metrics
+- get_tweets(user_id, count): recent tweets with metrics
+- search_tweets(query, count): search for tweets
+- get_voice_data(handle, count): standalone tweets and replies for voice profiling
+"""
+
+from hermes_tools import terminal
+import json
+import os
+import time
+import urllib.parse
+
+# Bearer token: read from the X_BEARER_TOKEN env var (the __main__ block also tries ~/.hermes/.env)
+BEARER = os.environ.get("X_BEARER_TOKEN", "")
+
+MAX_RETRIES = 3
+BASE_DELAY = 2  # seconds, exponential backoff: 2s, 4s, 8s
+
+
+def _api_get(endpoint: str, params: dict | None = None) -> dict:
+    """Make authenticated GET request to X API v2 with retry and error handling."""
+    url = f"https://api.twitter.com/2/{endpoint}"
+    if params:
+        qs = "&".join(f"{k}={urllib.parse.quote(str(v))}" for k, v in params.items())
+        url += f"?{qs}"
+
+    for attempt in range(MAX_RETRIES):
+        try:
+            r = terminal(f'curl -s -w \'\\n%{{http_code}}\' -H "Authorization: Bearer {BEARER}" "{url}"')
+            output = r.get("output", "").strip()
+
+            # Split body from status code (last line)
+            lines = output.rsplit("\n", 1)
+            if len(lines) == 2:
+                body, status_str = lines
+            else:
+                body = output
+                status_str = "0"
+
+            try:
+                status_code = int(status_str.strip())
+            except ValueError:
+                status_code = 0
+
+            # Handle specific status codes
+            if status_code == 429:
+                # Rate limited — retry with backoff
+                delay = BASE_DELAY * (2 ** attempt)
+                print(f"  [X API] Rate limited (429). Retry {attempt+1}/{MAX_RETRIES} in {delay}s...")
+                time.sleep(delay)
+                continue
+
+            if status_code in (401, 403):
+                return {"error": f"Authentication failed (HTTP {status_code}). Check X_BEARER_TOKEN.", "http_status": status_code}
+
+            if status_code >= 500:
+                delay = BASE_DELAY * (2 ** attempt)
+                print(f"  [X API] Server error ({status_code}). Retry {attempt+1}/{MAX_RETRIES} in {delay}s...")
+                time.sleep(delay)
+                continue
+
+            if status_code == 0 and not body:
+                # Network error — no response at all
+                delay = BASE_DELAY * (2 ** attempt)
+                print(f"  [X API] Network error. Retry {attempt+1}/{MAX_RETRIES} in {delay}s...")
+                time.sleep(delay)
+                continue
+
+            try:
+                return json.loads(body)
+            except json.JSONDecodeError:
+                return {"error": f"Failed to parse response (HTTP {status_code}): {body[:200]}"}
+
+        except Exception as e:
+            delay = BASE_DELAY * (2 ** attempt)
+            print(f"  [X API] Exception: {e}. Retry {attempt+1}/{MAX_RETRIES} in {delay}s...")
+            time.sleep(delay)
+            continue
+
+    return {"error": f"All {MAX_RETRIES} retries exhausted for {endpoint}"}
+
+
+def get_user(handle: str) -> dict:
+    """Get user profile by handle."""
+    handle = urllib.parse.quote(handle.lstrip("@"), safe="")  # escape before it reaches the URL and shell command
+    return _api_get(f"users/by/username/{handle}", {
+        "user.fields": "description,public_metrics,profile_image_url,created_at,location,url"
+    })
+
+
+def get_tweets(user_id: str, count: int = 20) -> dict:
+    """Get user's recent tweets."""
+    return _api_get(f"users/{user_id}/tweets", {
+        "max_results": max(min(count, 100), 5),
+        "tweet.fields": "created_at,public_metrics,text,in_reply_to_user_id,referenced_tweets",
+        "exclude": "retweets"  # original tweets only for voice analysis
+    })
+
+
+def get_tweets_with_rts(user_id: str, count: int = 20) -> dict:
+    """Get user's recent tweets including retweets (shows interests)."""
+    return _api_get(f"users/{user_id}/tweets", {
+        "max_results": max(min(count, 100), 5),
+        "tweet.fields": "created_at,public_metrics,text,referenced_tweets"
+    })
+
+
+def search_tweets(query: str, count: int = 10) -> dict:
+    """Search recent tweets."""
+    return _api_get("tweets/search/recent", {
+        "query": query,
+        "max_results": max(min(count, 100), 10),
+        "tweet.fields": "created_at,public_metrics,text,author_id"
+    })
+
+
+def get_user_by_id(user_id: str) -> dict:
+    """Get user profile by ID."""
+    return _api_get(f"users/{user_id}", {
+        "user.fields": "description,public_metrics,username,name"
+    })
+
+
+# ═══════════════════════════════════════════════════════════════
+# HIGH-LEVEL INTELLIGENCE FUNCTIONS
+# ═══════════════════════════════════════════════════════════════
+
+def profile_user(handle: str) -> dict:
+    """Full profile pull: identity + recent tweets (originals only)."""
+    user = get_user(handle)
+    if "errors" in user or "error" in user:
+        return {"error": f"Could not fetch user @{handle} (not found, auth failure, or API error)", "details": user}
+
+    user_data = user.get("data", {})
+    user_id = user_data.get("id")
+
+    result = {
+        "profile": user_data,
+        "tweets": [],
+        "voice_samples": [],
+    }
+
+    if user_id:
+        # Get original tweets (no RTs) for voice analysis
+        tweets = get_tweets(user_id, 20)
+        tweet_list = tweets.get("data", [])
+        result["tweets"] = tweet_list
+
+        # Extract pure text samples for voice profiling
+        # Only exclude retweets and actual replies (has in_reply_to_user_id)
+        # Tweets starting with @ are fine if they're standalone mentions
+        result["voice_samples"] = [
+            t["text"] for t in tweet_list
+            if not t.get("text", "").startswith("RT @")
+            and not t.get("in_reply_to_user_id")
+        ]
+
+    return result
+
+
+def profile_interactions(handle1: str, handle2: str) -> dict:
+    """Find interactions between two users."""
+    # Search for replies from handle1 to handle2
+    q1 = f"from:{handle1} to:{handle2}"
+    q2 = f"from:{handle2} to:{handle1}"
+
+    r1 = search_tweets(q1, 10)
+    r2 = search_tweets(q2, 10)
+
+    return {
+        f"{handle1}_to_{handle2}": r1.get("data", []),
+        f"{handle2}_to_{handle1}": r2.get("data", []),
+    }
+
+
+def get_voice_data(handle: str, count: int = 50) -> dict:
+    """Pull maximum voice data: tweets, replies, quote tweets.
+    Returns categorized samples for voice profiling."""
+    user = get_user(handle)
+    if "errors" in user or "error" in user:
+        return {"error": f"Could not fetch user @{handle} (not found, auth failure, or API error)"}
+
+    user_data = user.get("data", {})
+    user_id = user_data.get("id")
+    if not user_id:
+        return {"error": "No user ID found"}
+
+    # Original tweets (exclude RTs)
+    originals = get_tweets(user_id, min(count, 100))
+    original_list = originals.get("data", [])
+
+    # Categorize — only use in_reply_to_user_id to detect replies
+    standalone = []  # not replies
+    replies = []     # replies to others
+
+    for t in original_list:
+        text = t.get("text", "")
+        if t.get("in_reply_to_user_id"):
+            replies.append(text)
+        else:
+            standalone.append(text)
+
+    return {
+        "profile": user_data,
+        "standalone_tweets": standalone,  # their voice at rest
+        "replies": replies,               # their voice in conversation
+        "total_samples": len(standalone) + len(replies),
+    }
+
+
+# ═══════════════════════════════════════════════════════════════
+# ENTRY POINT
+# ═══════════════════════════════════════════════════════════════
+
+if __name__ == "__main__":
+    if not BEARER:
+        print("X_BEARER_TOKEN not set in the environment.")
+        print("Trying to load it from ~/.hermes/.env...")
+        try:
+            with open(os.path.expanduser("~/.hermes/.env")) as f:
+                for line in f:
+                    line = line.strip()
+                    if line.startswith("X_BEARER_TOKEN="):
+                        # Use split with maxsplit=1 to handle values with '=' in them
+                        # Also strip surrounding quotes if present
+                        val = line.split("=", 1)[1]
+                        if val and val[0] in ('"', "'") and val[-1] == val[0]:
+                            val = val[1:-1]
+                        BEARER = val
+                        break
+        except Exception as e:
+            print(f"  Failed to load .env: {e}")
+
+    if not BEARER:
+        print("FATAL: No bearer token found.")
+        raise SystemExit(1)
+
+    # Demo: profile two users
+    for handle in ["Teknium", "basedjensen"]:
+        print(f"\n{'='*60}")
+        print(f"  PROFILING @{handle}")
+        print(f"{'='*60}")
+
+        data = profile_user(handle)
+        profile = data.get("profile", {})
+        print(f"  Name: {profile.get('name')}")
+        print(f"  Bio: {profile.get('description')}")
+        metrics = profile.get("public_metrics", {})
+        print(f"  Followers: {metrics.get('followers_count')}")
+        print(f"  Tweets: {metrics.get('tweet_count')}")
+        print(f"  Likes given: {metrics.get('like_count')}")
+
+        print(f"\n  Voice samples ({len(data.get('voice_samples', []))}):")
+        for sample in data.get("voice_samples", [])[:5]:
+            print(f"    > {sample[:120]}")
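Reviewer note: `_api_get` rests on two small mechanisms. curl's `-w` option appends the HTTP status code as a final line after the body, and each retry waits `BASE_DELAY * 2 ** attempt` seconds. The sketch below isolates both as pure functions so they can be checked without network access. The names `split_body_and_status` and `backoff_delays` are hypothetical and do not appear in the module.

```python
import json

BASE_DELAY = 2   # seconds, same value as in the client
MAX_RETRIES = 3


def split_body_and_status(output: str) -> tuple[str, int]:
    """Split curl output whose last line is the HTTP status code."""
    body, sep, status_str = output.strip().rpartition("\n")
    if not sep:
        # No newline at all: mirror the client and treat everything as body, status 0.
        return output.strip(), 0
    try:
        return body, int(status_str.strip())
    except ValueError:
        # Last line was not a number: keep the body, report status 0.
        return body, 0


def backoff_delays(base: int = BASE_DELAY, retries: int = MAX_RETRIES) -> list[int]:
    """Delays in seconds before each retry, doubling every attempt."""
    return [base * 2 ** attempt for attempt in range(retries)]


body, status = split_body_and_status('{"data": {"id": "1"}}\n200')
print(status, json.loads(body)["data"]["id"])  # 200 1
print(backoff_delays())                         # [2, 4, 8]
```

Keeping the parsing separate from the `terminal()` call would make the status handling testable on canned strings such as a 429 response.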
@@ -0,0 +1,136 @@
+# DOSSIER: {display_name} (@{handle})
+
+## Identity
+- **Name**: {real_name}
+- **Handle(s)**: @{twitter} | u/{reddit} | {discord_tag}
+- **Role**: {role_and_org}
+- **Known for**: {what_they_are_famous_for}
+- **Followers/reach**: {approximate_follower_count}
+- **Confidence**: {HIGH|MEDIUM|LOW} — {confidence_reason}
+
+## Voice Profile
+
+### Linguistic Patterns
+- **Sentence structure**: {short_punchy | long_flowing | mixed}
+- **Capitalization**: {normal | all_lowercase | CAPS_FOR_EMPHASIS | mixed}
+- **Punctuation**: {heavy_periods | ellipsis_lover | no_punctuation | exclamation_marks}
+- **Paragraph style**: {one_liners | thread_essays | medium_blocks}
+- **Emoji/emoticon usage**: {none | minimal | heavy | specific_ones}
+
+### Vocabulary & Slang
+- **Register**: {academic | casual | shitposter | mixed}
+- **Recurring words/phrases**: [list of signature words they use a lot]
+- **Catchphrases**: [any repeated phrases or running jokes]
+- **Profanity level**: {none | mild | moderate | heavy}
+- **Jargon tendency**: {explains_everything | assumes_expertise | mixes}
+
+### Tone
+- **Default mood**: {earnest | ironic | combative | chill | manic | analytical}
+- **Humor style**: {deadpan | absurdist | sarcastic | wholesome | shitpost | none}
+- **How they handle disagreement**: {engages_thoughtfully | dunks | ignores | ratio_warrior | passive_aggressive}
+- **How they handle praise**: {deflects | accepts_gracefully | awkward | flexes}
+
+## Positions & Beliefs
+
+### Core Convictions (things they consistently advocate for)
+1. {conviction_1}
+2. {conviction_2}
+3. {conviction_3}
+
+### Known Hot Takes
+1. {take_1}
+2. {take_2}
+
+### Hills They'll Die On
+1. {hill_1}
+2. {hill_2}
+
+### Topics They Avoid or Refuse to Engage
+1. {avoidance_1}
+
+## Social Dynamics
+
+### People They Interact With Positively
+- @{ally_1} — {relationship_description}
+- @{ally_2} — {relationship_description}
+
+### People They Beef With / Disagree With
+- @{rival_1} — {beef_description}
+
+### How They Engage Different Types
+- **Fans/supporters**: {how_they_respond}
+- **Critics**: {how_they_respond}
+- **Peers**: {how_they_respond}
+- **Random people**: {how_they_respond}
+
+## Platform-Specific Behavior
+
+### On Twitter/X
+- **Post frequency**: {multiple_daily | daily | few_per_week}
+- **Thread tendency**: {never | sometimes | loves_threads}
+- **QRT style**: {adds_context | dunks | amplifies}
+- **Engagement style**: {likes_a_lot | rarely_likes | retweets_heavy}
+
+### On Reddit (if applicable)
+- **Subreddits**: [list]
+- **Comment style**: {detailed | brief | combative}
+
+### On Discord (if applicable)
+- **Servers**: [known servers]
+- **Vibe shift from Twitter**: {description}
+
+## Signature Moves
+Things this person characteristically does that make them recognizable:
+1. {signature_move_1}
+2. {signature_move_2}
+3. {signature_move_3}
+
+## Sample Quotes (real, sourced from research)
+> "{actual_quote_1}" — [source/context]
+> "{actual_quote_2}" — [source/context]
+> "{actual_quote_3}" — [source/context]
+
+## Deep Psychometric Profile
+- **Big Five**: O{H/M/L} C{} E{} A{} N{} — {evidence}
+- **Moral Foundations**: Care{} Fair{} Loyal{} Auth{} Sanct{} Liberty{} — {what drives their ethics}
+- **Schwartz Values**: {dominant values} — {how they justify positions}
+- **Cognitive Style**: {IC score estimate} — {hedging patterns, complexity, analytical vs intuitive}
+- **Narrative Frame**: {dominant frame} — {how they lens issues}
+- **Persona Authenticity**: {1-5 score} — {evidence for curation vs authenticity}
+
+## Strategic Self-Presentation (Red Hat)
+- **Cultivated image**: {what they want to be seen as}
+- **Target audience**: {who they're performing for}
+- **Incentive structure**: {what they gain from this persona}
+- **Possible divergences**: {where persona may ≠ person}
+- **Ghostwriting indicators**: {present/absent, evidence}
+
+## Ecosystem Context
+- **Community cluster**: {which tribe they belong to}
+- **Key influencers**: {who they amplify/follow/agree with}
+- **Echo chamber**: {what information environment they're in}
+- **Audience profile**: {who follows them, how that audience reacts}
+
+## Key Assumptions
+1. {assumption} — FRAGILITY: {robust/moderate/fragile} — Test: {what invalidates it}
+2. {assumption} — FRAGILITY: {} — Test: {}
+3. {assumption} — FRAGILITY: {} — Test: {}
+
+## Competing Hypotheses
+- **H1 (PRIMARY)**: {main personality model} — Confidence: {X}%
+- **H2 (ALTERNATIVE)**: {alternative explanation} — Confidence: {X}%
+- **Key discriminator**: {what evidence would shift between H1 and H2}
+
+## Research Sources
+- {source_1} [{reliability}{confidence}] — {description}
+- {source_2} [{reliability}{confidence}] — {description}
+- {source_3} [{reliability}{confidence}] — {description}
+
+## Invalidation Indicators
+1. If @{handle} {does X instead of Y}, our {assessment} is wrong
+2. If @{handle} {responds to Z with Q}, our {model} needs revision
+3. If @{handle} {interacts with @person in manner M}, dynamics model is off
+
+---
+*Dossier compiled: {date} | Fidelity: {fidelity}% | Persona Authenticity: {1-5}*
+*Source reliability range: {best}-{worst} | Analytical confidence: {1-6}*
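Reviewer note: the dossier template mixes simple placeholders such as `{display_name}` with enumerated choices such as `{HIGH|MEDIUM|LOW}`, so calling `str.format` on it would raise. A sketch of one way to fill only the fields that are known and leave everything else untouched, assuming a hypothetical helper `fill_dossier` (the skill itself may fill the template differently, for example by having the model write it out):

```python
import re

# Matches only identifier-style placeholders like {handle}; choice lists with "|" are skipped.
_PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def fill_dossier(template: str, values: dict[str, str]) -> str:
    """Replace known {name} placeholders and leave unknown ones as written."""
    def substitute(match: re.Match) -> str:
        return values.get(match.group(1), match.group(0))

    return _PLACEHOLDER.sub(substitute, template)


template = (
    "# DOSSIER: {display_name} (@{handle})\n"
    "- **Sentence structure**: {short_punchy | long_flowing | mixed}\n"
)
print(fill_dossier(template, {"display_name": "Example Person", "handle": "example"}))
```

Unfilled placeholders survive in the output, which makes gaps in the research visible instead of silently dropping them.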
@@ -1,12 +1,11 @@
 # Hindsight Memory Provider

-Long-term memory with knowledge graph, entity resolution, and multi-strategy retrieval. Supports cloud, local embedded, and local external modes.
+Long-term memory with knowledge graph, entity resolution, and multi-strategy retrieval. Supports cloud and local (embedded) modes.

 ## Requirements

 - **Cloud:** API key from [ui.hindsight.vectorize.io](https://ui.hindsight.vectorize.io)
-- **Local Embedded:** API key for a supported LLM provider (OpenAI, Anthropic, Gemini, Groq, OpenRouter, MiniMax, Ollama, or any OpenAI-compatible endpoint). Embeddings and reranking run locally — no additional API keys needed.
-- **Local External:** A running Hindsight instance (Docker or self-hosted) reachable over HTTP.
+- **Local:** API key for a supported LLM provider (OpenAI, Anthropic, Gemini, Groq, MiniMax, or Ollama). Embeddings and reranking run locally — no additional API keys needed.

 ## Setup

@@ -22,28 +21,17 @@ hermes config set memory.provider hindsight
 echo "HINDSIGHT_API_KEY=your-key" >> ~/.hermes/.env
 ```

-### Cloud
+### Cloud Mode

 Connects to the Hindsight Cloud API. Requires an API key from [ui.hindsight.vectorize.io](https://ui.hindsight.vectorize.io).

-### Local Embedded
+### Local Mode

-Hermes spins up a local Hindsight daemon with built-in PostgreSQL. Requires an LLM API key for memory extraction and synthesis. The daemon starts automatically in the background on first use and stops after 5 minutes of inactivity.
-
-Supports any OpenAI-compatible LLM endpoint (llama.cpp, vLLM, LM Studio, etc.) — pick `openai_compatible` as the provider and enter the base URL.
+Runs an embedded Hindsight server with built-in PostgreSQL. Requires an LLM API key (e.g. Groq, OpenAI, Anthropic) for memory extraction and synthesis. The daemon starts automatically in the background on first use and stops after 5 minutes of inactivity.

 Daemon startup logs: `~/.hermes/logs/hindsight-embed.log`
 Daemon runtime logs: `~/.hindsight/profiles/<profile>.log`

-To open the Hindsight web UI (local embedded mode only):
-```bash
-hindsight-embed -p hermes ui start
-```
-
-### Local External
-
-Points the plugin at an existing Hindsight instance you're already running (Docker, self-hosted, etc.). No daemon management — just a URL and an optional API key.
-
 ## Config

 Config file: `~/.hermes/hindsight/config.json`
@@ -52,58 +40,39 @@ Config file: `~/.hermes/hindsight/config.json`

 | Key | Default | Description |
 |-----|---------|-------------|
-| `mode` | `cloud` | `cloud`, `local_embedded`, or `local_external` |
-| `api_url` | `https://api.hindsight.vectorize.io` | API URL (cloud and local_external modes) |
+| `mode` | `cloud` | `cloud` or `local` |
+| `api_url` | `https://api.hindsight.vectorize.io` | API URL (cloud mode) |
+| `api_url` | `http://localhost:8888` | API URL (local mode, unused — daemon manages its own port) |

-### Memory Bank
+### Memory

 | Key | Default | Description |
 |-----|---------|-------------|
 | `bank_id` | `hermes` | Memory bank name |
-| `bank_mission` | — | Reflect mission (identity/framing for reflect reasoning). Applied via Banks API. |
-| `bank_retain_mission` | — | Retain mission (steers what gets extracted). Applied via Banks API. |
-
-### Recall
-
-| Key | Default | Description |
-|-----|---------|-------------|
-| `recall_budget` | `mid` | Recall thoroughness: `low` / `mid` / `high` |
-| `recall_prefetch_method` | `recall` | Auto-recall method: `recall` (raw facts) or `reflect` (LLM synthesis) |
-| `recall_max_tokens` | `4096` | Maximum tokens for recall results |
-| `recall_max_input_chars` | `800` | Maximum input query length for auto-recall |
-| `recall_prompt_preamble` | — | Custom preamble for recalled memories in context |
-| `recall_tags` | — | Tags to filter when searching memories |
-| `recall_tags_match` | `any` | Tag matching mode: `any` / `all` / `any_strict` / `all_strict` |
-| `auto_recall` | `true` | Automatically recall memories before each turn |
-
-### Retain
-
-| Key | Default | Description |
-|-----|---------|-------------|
-| `auto_retain` | `true` | Automatically retain conversation turns |
-| `retain_async` | `true` | Process retain asynchronously on the Hindsight server |
-| `retain_every_n_turns` | `1` | Retain every N turns (1 = every turn) |
-| `retain_context` | `conversation between Hermes Agent and the User` | Context label for retained memories |
-| `tags` | — | Tags applied when storing memories |
+| `budget` | `mid` | Recall thoroughness: `low` / `mid` / `high` |

 ### Integration

 | Key | Default | Description |
 |-----|---------|-------------|
 | `memory_mode` | `hybrid` | How memories are integrated into the agent |
+| `prefetch_method` | `recall` | Method for automatic context injection |

 **memory_mode:**
 - `hybrid` — automatic context injection + tools available to the LLM
 - `context` — automatic injection only, no tools exposed
 - `tools` — tools only, no automatic injection

-### Local Embedded LLM
+**prefetch_method:**
+- `recall` — injects raw memory facts (fast)
+- `reflect` — injects LLM-synthesized summary (slower, more coherent)
+
+### Local Mode LLM

 | Key | Default | Description |
 |-----|---------|-------------|
-| `llm_provider` | `openai` | `openai`, `anthropic`, `gemini`, `groq`, `openrouter`, `minimax`, `ollama`, `lmstudio`, `openai_compatible` |
-| `llm_model` | per-provider | Model name (e.g. `gpt-4o-mini`, `qwen/qwen3.5-9b`) |
-| `llm_base_url` | — | Endpoint URL for `openai_compatible` (e.g. `http://192.168.1.10:8080/v1`) |
+| `llm_provider` | `openai` | LLM provider: `openai`, `anthropic`, `gemini`, `groq`, `minimax`, `ollama` |
+| `llm_model` | per-provider | Model name (e.g. `gpt-4o-mini`, `openai/gpt-oss-120b`) |

 The LLM API key is stored in `~/.hermes/.env` as `HINDSIGHT_LLM_API_KEY`.

@@ -123,12 +92,7 @@ Available in `hybrid` and `tools` memory modes:
 |----------|-------------|
 | `HINDSIGHT_API_KEY` | API key for Hindsight Cloud |
 | `HINDSIGHT_LLM_API_KEY` | LLM API key for local mode |
-| `HINDSIGHT_API_LLM_BASE_URL` | LLM Base URL for local mode (e.g. OpenRouter) |
 | `HINDSIGHT_API_URL` | Override API endpoint |
 | `HINDSIGHT_BANK_ID` | Override bank name |
 | `HINDSIGHT_BUDGET` | Override recall budget |
-| `HINDSIGHT_MODE` | Override mode (`cloud`, `local_embedded`, `local_external`) |
-
-## Client Version
-
-Requires `hindsight-client >= 0.4.22`. The plugin auto-upgrades on session start if an older version is detected.
+| `HINDSIGHT_MODE` | Override mode (`cloud` / `local`) |
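Reviewer note: to illustrate the config keys documented in this README, here is a sketch that writes a local-mode `config.json`. The key names come from the tables above and the values are examples only. It writes to a temporary directory rather than `~/.hermes/hindsight/config.json`, and it is an assumption that the provider reads every one of these keys from the file.

```python
import json
import tempfile
from pathlib import Path

# Example values; key names follow the README tables.
config = {
    "mode": "local",
    "bank_id": "hermes",
    "budget": "mid",
    "memory_mode": "hybrid",
    "prefetch_method": "recall",
    "llm_provider": "groq",
    "llm_model": "openai/gpt-oss-120b",
}

# Stand-in for ~/.hermes/hindsight/config.json so the sketch has no side effects.
config_path = Path(tempfile.mkdtemp()) / "hindsight" / "config.json"
config_path.parent.mkdir(parents=True, exist_ok=True)
config_path.write_text(json.dumps(config, indent=2))

loaded = json.loads(config_path.read_text())
print(loaded["mode"], loaded["llm_provider"])
```

The LLM API key is deliberately absent from the file, since the README stores it in `~/.hermes/.env` as `HINDSIGHT_LLM_API_KEY`.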
@@ -23,30 +23,24 @@ import json
 import logging
 import os
 import threading
-
-from hermes_constants import get_hermes_home
 from typing import Any, Dict, List

 from agent.memory_provider import MemoryProvider
-from hermes_constants import get_hermes_home
 from tools.registry import tool_error

 logger = logging.getLogger(__name__)

 _DEFAULT_API_URL = "https://api.hindsight.vectorize.io"
 _DEFAULT_LOCAL_URL = "http://localhost:8888"
-_MIN_CLIENT_VERSION = "0.4.22"
 _VALID_BUDGETS = {"low", "mid", "high"}
 _PROVIDER_DEFAULT_MODELS = {
    "openai": "gpt-4o-mini",
    "anthropic": "claude-haiku-4-5",
    "gemini": "gemini-2.5-flash",
    "groq": "openai/gpt-oss-120b",
-    "openrouter": "qwen/qwen3.5-9b",
    "minimax": "MiniMax-M2.7",
    "ollama": "gemma3:12b",
    "lmstudio": "local-model",
-    "openai_compatible": "your-model-name",
 }


@@ -148,6 +142,7 @@ def _load_config() -> dict:
      3. Environment variables
    """
    from pathlib import Path
+    from hermes_constants import get_hermes_home

    # Profile-scoped path (preferred)
    profile_path = get_hermes_home() / "hindsight" / "config.json"
@@ -192,7 +187,6 @@ class HindsightMemoryProvider(MemoryProvider):
        self._bank_id = "hermes"
        self._budget = "mid"
        self._mode = "cloud"
-        self._llm_base_url = ""
        self._memory_mode = "hybrid"  # "context", "tools", or "hybrid"
        self._prefetch_method = "recall"  # "recall" or "reflect"
        self._client = None
@@ -200,31 +194,6 @@ class HindsightMemoryProvider(MemoryProvider):
        self._prefetch_lock = threading.Lock()
        self._prefetch_thread = None
        self._sync_thread = None
-        self._session_id = ""
-
-        # Tags
-        self._tags: list[str] | None = None
-        self._recall_tags: list[str] | None = None
-        self._recall_tags_match = "any"
-
-        # Retain controls
-        self._auto_retain = True
-        self._retain_every_n_turns = 1
-        self._retain_context = "conversation between Hermes Agent and the User"
-        self._turn_counter = 0
-        self._session_turns: list[str] = []  # accumulates ALL turns for the session
-
-        # Recall controls
-        self._auto_recall = True
-        self._recall_max_tokens = 4096
-        self._recall_types: list[str] | None = None
-        self._recall_prompt_preamble = ""
-        self._recall_max_input_chars = 800
-
-        # Bank
-        self._bank_mission = ""
-        self._bank_retain_mission: str | None = None
-        self._retain_async = True

    @property
    def name(self) -> str:
@@ -234,7 +203,7 @@ class HindsightMemoryProvider(MemoryProvider):
        try:
            cfg = _load_config()
            mode = cfg.get("mode", "cloud")
-            if mode in ("local", "local_embedded", "local_external"):
+            if mode == "local":
                return True
            has_key = bool(cfg.get("apiKey") or os.environ.get("HINDSIGHT_API_KEY", ""))
            has_url = bool(cfg.get("api_url") or os.environ.get("HINDSIGHT_API_URL", ""))
@@ -258,306 +227,68 @@ class HindsightMemoryProvider(MemoryProvider):
        existing.update(values)
        config_path.write_text(json.dumps(existing, indent=2))

-    def post_setup(self, hermes_home: str, config: dict) -> None:
-        """Custom setup wizard — installs only the deps needed for the selected mode."""
-        import getpass
-        import subprocess
-        import shutil
-        import sys
-        from pathlib import Path
-
-        from hermes_cli.config import save_config
-
-        from hermes_cli.memory_setup import _curses_select
-
-        print("\n  Configuring Hindsight memory:\n")
-
-        # Step 1: Mode selection
-        mode_items = [
-            ("Cloud", "Hindsight Cloud API (lightweight, just needs an API key)"),
-            ("Local Embedded", "Run Hindsight locally (downloads ~200MB, needs LLM key)"),
-            ("Local External", "Connect to an existing Hindsight instance"),
-        ]
-        mode_idx = _curses_select("  Select mode", mode_items, default=0)
-        mode = ["cloud", "local_embedded", "local_external"][mode_idx]
-
-        provider_config: dict = {"mode": mode}
-        env_writes: dict = {}
-
-        # Step 2: Install/upgrade deps for selected mode
-        _MIN_CLIENT_VERSION = "0.4.22"
-        cloud_dep = f"hindsight-client>={_MIN_CLIENT_VERSION}"
-        local_dep = "hindsight-all"
-        if mode == "local_embedded":
-            deps_to_install = [local_dep]
-        elif mode == "local_external":
-            deps_to_install = [cloud_dep]
-        else:
-            deps_to_install = [cloud_dep]
-
-        print(f"\n  Checking dependencies...")
-        uv_path = shutil.which("uv")
-        if not uv_path:
-            print("  ⚠ uv not found — install it: curl -LsSf https://astral.sh/uv/install.sh | sh")
-            print(f"  Then run manually: uv pip install --python {sys.executable} {' '.join(deps_to_install)}")
-        else:
-            try:
-                subprocess.run(
-                    [uv_path, "pip", "install", "--python", sys.executable, "--quiet", "--upgrade"] + deps_to_install,
-                    check=True, timeout=120, capture_output=True,
-                )
-                print(f"  ✓ Dependencies up to date")
-            except Exception as e:
-                print(f"  ⚠ Install failed: {e}")
-                print(f"  Run manually: uv pip install --python {sys.executable} {' '.join(deps_to_install)}")
-
-        # Step 3: Mode-specific config
-        if mode == "cloud":
-            print(f"\n  Get your API key at https://ui.hindsight.vectorize.io\n")
-            existing_key = os.environ.get("HINDSIGHT_API_KEY", "")
-            if existing_key:
-                masked = f"...{existing_key[-4:]}" if len(existing_key) > 4 else "set"
-                sys.stdout.write(f"  API key (current: {masked}, blank to keep): ")
-                sys.stdout.flush()
-                api_key = getpass.getpass(prompt="") if sys.stdin.isatty() else sys.stdin.readline().strip()
-            else:
-                sys.stdout.write("  API key: ")
-                sys.stdout.flush()
-                api_key = getpass.getpass(prompt="") if sys.stdin.isatty() else sys.stdin.readline().strip()
-            if api_key:
-                env_writes["HINDSIGHT_API_KEY"] = api_key
-
-            val = input(f"  API URL [{_DEFAULT_API_URL}]: ").strip()
-            if val:
-                provider_config["api_url"] = val
-
-        elif mode == "local_external":
-            val = input(f"  Hindsight API URL [{_DEFAULT_LOCAL_URL}]: ").strip()
-            provider_config["api_url"] = val or _DEFAULT_LOCAL_URL
-
-            sys.stdout.write("  API key (optional, blank to skip): ")
-            sys.stdout.flush()
-            api_key = getpass.getpass(prompt="") if sys.stdin.isatty() else sys.stdin.readline().strip()
-            if api_key:
-                env_writes["HINDSIGHT_API_KEY"] = api_key
-
-        else:  # local_embedded
-            providers_list = list(_PROVIDER_DEFAULT_MODELS.keys())
-            llm_items = [
-                (p, f"default model: {_PROVIDER_DEFAULT_MODELS[p]}")
-                for p in providers_list
-            ]
-            llm_idx = _curses_select("  Select LLM provider", llm_items, default=0)
-            llm_provider = providers_list[llm_idx]
-
-            provider_config["llm_provider"] = llm_provider
-
-            if llm_provider == "openai_compatible":
-                val = input("  LLM endpoint URL (e.g. http://192.168.1.10:8080/v1): ").strip()
-                if val:
-                    provider_config["llm_base_url"] = val
-            elif llm_provider == "openrouter":
-                provider_config["llm_base_url"] = "https://openrouter.ai/api/v1"
-
-            default_model = _PROVIDER_DEFAULT_MODELS.get(llm_provider, "gpt-4o-mini")
-            val = input(f"  LLM model [{default_model}]: ").strip()
-            provider_config["llm_model"] = val or default_model
-
-            sys.stdout.write("  LLM API key: ")
-            sys.stdout.flush()
-            llm_key = getpass.getpass(prompt="") if sys.stdin.isatty() else sys.stdin.readline().strip()
-            if llm_key:
-                env_writes["HINDSIGHT_LLM_API_KEY"] = llm_key
-
-        # Step 4: Save everything
-        provider_config["bank_id"] = "hermes"
-        provider_config["recall_budget"] = "mid"
-        bank_id = "hermes"
-        config["memory"]["provider"] = "hindsight"
-        save_config(config)
-
-        self.save_config(provider_config, hermes_home)
-
-        if env_writes:
-            env_path = Path(hermes_home) / ".env"
-            env_path.parent.mkdir(parents=True, exist_ok=True)
-            existing_lines = []
-            if env_path.exists():
-                existing_lines = env_path.read_text().splitlines()
-            updated_keys = set()
-            new_lines = []
-            for line in existing_lines:
-                key_match = line.split("=", 1)[0].strip() if "=" in line and not line.startswith("#") else None
-                if key_match and key_match in env_writes:
-                    new_lines.append(f"{key_match}={env_writes[key_match]}")
-                    updated_keys.add(key_match)
-                else:
-                    new_lines.append(line)
-            for k, v in env_writes.items():
-                if k not in updated_keys:
-                    new_lines.append(f"{k}={v}")
-            env_path.write_text("\n".join(new_lines) + "\n")
-
-        print(f"\n  ✓ Hindsight memory configured ({mode} mode)")
-        if env_writes:
-            print(f"  API keys saved to .env")
-        print(f"\n  Start a new session to activate.\n")
-
    def get_config_schema(self):
        return [
-            {"key": "mode", "description": "Connection mode", "default": "cloud", "choices": ["cloud", "local_embedded", "local_external"]},
-            # Cloud mode
-            {"key": "api_url", "description": "Hindsight Cloud API URL", "default": _DEFAULT_API_URL, "when": {"mode": "cloud"}},
+            {"key": "mode", "description": "Cloud API or local embedded mode", "default": "cloud", "choices": ["cloud", "local"]},
+            {"key": "api_url", "description": "Hindsight API URL", "default": _DEFAULT_API_URL, "when": {"mode": "cloud"}},
            {"key": "api_key", "description": "Hindsight Cloud API key", "secret": True, "env_var": "HINDSIGHT_API_KEY", "url": "https://ui.hindsight.vectorize.io", "when": {"mode": "cloud"}},
-            # Local external mode
-            {"key": "api_url", "description": "Hindsight API URL", "default": _DEFAULT_LOCAL_URL, "when": {"mode": "local_external"}},
-            {"key": "api_key", "description": "API key (optional)", "secret": True, "env_var": "HINDSIGHT_API_KEY", "when": {"mode": "local_external"}},
-            # Local embedded mode
-            {"key": "llm_provider", "description": "LLM provider", "default": "openai", "choices": ["openai", "anthropic", "gemini", "groq", "openrouter", "minimax", "ollama", "lmstudio", "openai_compatible"], "when": {"mode": "local_embedded"}},
-            {"key": "llm_base_url", "description": "Endpoint URL (e.g. http://192.168.1.10:8080/v1)", "default": "", "when": {"mode": "local_embedded", "llm_provider": "openai_compatible"}},
-            {"key": "llm_api_key", "description": "LLM API key (optional for openai_compatible)", "secret": True, "env_var": "HINDSIGHT_LLM_API_KEY", "when": {"mode": "local_embedded"}},
-            {"key": "llm_model", "description": "LLM model", "default": "gpt-4o-mini", "default_from": {"field": "llm_provider", "map": _PROVIDER_DEFAULT_MODELS}, "when": {"mode": "local_embedded"}},
+            {"key": "llm_provider", "description": "LLM provider for local mode", "default": "openai", "choices": ["openai", "anthropic", "gemini", "groq", "minimax", "ollama"], "when": {"mode": "local"}},
+            {"key": "llm_api_key", "description": "LLM API key for local Hindsight", "secret": True, "env_var": "HINDSIGHT_LLM_API_KEY", "when": {"mode": "local"}},
+            {"key": "llm_model", "description": "LLM model for local mode", "default": "gpt-4o-mini", "default_from": {"field": "llm_provider", "map": _PROVIDER_DEFAULT_MODELS}, "when": {"mode": "local"}},
            {"key": "bank_id", "description": "Memory bank name", "default": "hermes"},
-            {"key": "bank_mission", "description": "Mission/purpose description for the memory bank"},
-            {"key": "bank_retain_mission", "description": "Custom extraction prompt for memory retention"},
-            {"key": "recall_budget", "description": "Recall thoroughness", "default": "mid", "choices": ["low", "mid", "high"]},
+            {"key": "budget", "description": "Recall thoroughness", "default": "mid", "choices": ["low", "mid", "high"]},
            {"key": "memory_mode", "description": "Memory integration mode", "default": "hybrid", "choices": ["hybrid", "context", "tools"]},
-            {"key": "recall_prefetch_method", "description": "Auto-recall method", "default": "recall", "choices": ["recall", "reflect"]},
-            {"key": "tags", "description": "Tags applied when storing memories (comma-separated)", "default": ""},
-            {"key": "recall_tags", "description": "Tags to filter when searching memories (comma-separated)", "default": ""},
-            {"key": "recall_tags_match", "description": "Tag matching mode for recall", "default": "any", "choices": ["any", "all", "any_strict", "all_strict"]},
-            {"key": "auto_recall", "description": "Automatically recall memories before each turn", "default": True},
-            {"key": "auto_retain", "description": "Automatically retain conversation turns", "default": True},
-            {"key": "retain_every_n_turns", "description": "Retain every N turns (1 = every turn)", "default": 1},
-            {"key": "retain_async","description": "Process retain asynchronously on the Hindsight server", "default": True},
-            {"key": "retain_context", "description": "Context label for retained memories", "default": "conversation between Hermes Agent and the User"},
-            {"key": "recall_max_tokens", "description": "Maximum tokens for recall results", "default": 4096},
-            {"key": "recall_max_input_chars", "description": "Maximum input query length for auto-recall", "default": 800},
-            {"key": "recall_prompt_preamble", "description": "Custom preamble for recalled memories in context"},
+            {"key": "prefetch_method", "description": "Auto-recall method", "default": "recall", "choices": ["recall", "reflect"]},
        ]

    def _get_client(self):
        """Return the cached Hindsight client (created once, reused)."""
        if self._client is None:
-            if self._mode == "local_embedded":
+            if self._mode == "local":
                from hindsight import HindsightEmbedded
+                # Disable __del__ on the class to prevent "attached to a
+                # different loop" errors during GC — we handle cleanup in
+                # shutdown() instead.
                HindsightEmbedded.__del__ = lambda self: None
-                llm_provider = self._config.get("llm_provider", "")
-                if llm_provider in ("openai_compatible", "openrouter"):
-                    llm_provider = "openai"
-                logger.debug("Creating HindsightEmbedded client (profile=%s, provider=%s)",
-                             self._config.get("profile", "hermes"), llm_provider)
-                kwargs = dict(
+                self._client = HindsightEmbedded(
                    profile=self._config.get("profile", "hermes"),
-                    llm_provider=llm_provider,
-                    llm_api_key=self._config.get("llmApiKey") or self._config.get("llm_api_key") or os.environ.get("HINDSIGHT_LLM_API_KEY", ""),
+                    llm_provider=self._config.get("llm_provider", ""),
+                    llm_api_key=self._config.get("llmApiKey") or os.environ.get("HINDSIGHT_LLM_API_KEY", ""),
                    llm_model=self._config.get("llm_model", ""),
                )
-                if self._llm_base_url:
-                    kwargs["llm_base_url"] = self._llm_base_url
-                self._client = HindsightEmbedded(**kwargs)
            else:
                from hindsight_client import Hindsight
                kwargs = {"base_url": self._api_url, "timeout": 30.0}
                if self._api_key:
                    kwargs["api_key"] = self._api_key
-                logger.debug("Creating Hindsight cloud client (url=%s, has_key=%s)",
-                             self._api_url, bool(self._api_key))
                self._client = Hindsight(**kwargs)
        return self._client

    def initialize(self, session_id: str, **kwargs) -> None:
-        self._session_id = session_id
-
-        # Check client version and auto-upgrade if needed
-        try:
-            from importlib.metadata import version as pkg_version
-            from packaging.version import Version
-            installed = pkg_version("hindsight-client")
-            if Version(installed) < Version(_MIN_CLIENT_VERSION):
-                logger.warning("hindsight-client %s is outdated (need >=%s), attempting upgrade...",
-                               installed, _MIN_CLIENT_VERSION)
-                import shutil, subprocess, sys
-                uv_path = shutil.which("uv")
-                if uv_path:
-                    try:
-                        subprocess.run(
-                            [uv_path, "pip", "install", "--python", sys.executable,
-                             "--quiet", "--upgrade", f"hindsight-client>={_MIN_CLIENT_VERSION}"],
-                            check=True, timeout=120, capture_output=True,
-                        )
-                        logger.info("hindsight-client upgraded to >=%s", _MIN_CLIENT_VERSION)
-                    except Exception as e:
-                        logger.warning("Auto-upgrade failed: %s. Run: uv pip install 'hindsight-client>=%s'",
-                                       e, _MIN_CLIENT_VERSION)
-                else:
-                    logger.warning("uv not found. Run: pip install 'hindsight-client>=%s'", _MIN_CLIENT_VERSION)
-        except Exception:
-            pass  # packaging not available or other issue — proceed anyway
-
        self._config = _load_config()
        self._mode = self._config.get("mode", "cloud")
-        # "local" is a legacy alias for "local_embedded"
-        if self._mode == "local":
-            self._mode = "local_embedded"
-        self._api_key = self._config.get("apiKey") or self._config.get("api_key") or os.environ.get("HINDSIGHT_API_KEY", "")
-        default_url = _DEFAULT_LOCAL_URL if self._mode in ("local_embedded", "local_external") else _DEFAULT_API_URL
+        self._api_key = self._config.get("apiKey") or os.environ.get("HINDSIGHT_API_KEY", "")
+        default_url = _DEFAULT_LOCAL_URL if self._mode == "local" else _DEFAULT_API_URL
        self._api_url = self._config.get("api_url") or os.environ.get("HINDSIGHT_API_URL", default_url)
-        self._llm_base_url = self._config.get("llm_base_url", "")

        banks = self._config.get("banks", {}).get("hermes", {})
        self._bank_id = self._config.get("bank_id") or banks.get("bankId", "hermes")
-        budget = self._config.get("recall_budget") or self._config.get("budget") or banks.get("budget", "mid")
+        budget = self._config.get("budget") or banks.get("budget", "mid")
        self._budget = budget if budget in _VALID_BUDGETS else "mid"

        memory_mode = self._config.get("memory_mode", "hybrid")
        self._memory_mode = memory_mode if memory_mode in ("context", "tools", "hybrid") else "hybrid"

-        prefetch_method = self._config.get("recall_prefetch_method", "recall")
+        prefetch_method = self._config.get("prefetch_method", "recall")
        self._prefetch_method = prefetch_method if prefetch_method in ("recall", "reflect") else "recall"

-        # Bank options
-        self._bank_mission = self._config.get("bank_mission", "")
-        self._bank_retain_mission = self._config.get("bank_retain_mission") or None
-
-        # Tags
-        self._tags = self._config.get("tags") or None
-        self._recall_tags = self._config.get("recall_tags") or None
-        self._recall_tags_match = self._config.get("recall_tags_match", "any")
-
-        # Retain controls
-        self._auto_retain = self._config.get("auto_retain", True)
-        self._retain_every_n_turns = max(1, int(self._config.get("retain_every_n_turns", 1)))
-        self._retain_context = self._config.get("retain_context", "conversation between Hermes Agent and the User")
-
-        # Recall controls
-        self._auto_recall = self._config.get("auto_recall", True)
-        self._recall_max_tokens = int(self._config.get("recall_max_tokens", 4096))
-        self._recall_types = self._config.get("recall_types") or None
-        self._recall_prompt_preamble = self._config.get("recall_prompt_preamble", "")
-        self._recall_max_input_chars = int(self._config.get("recall_max_input_chars", 800))
-        self._retain_async = self._config.get("retain_async", True)
-
-        _client_version = "unknown"
-        try:
-            from importlib.metadata import version as pkg_version
-            _client_version = pkg_version("hindsight-client")
-        except Exception:
-            pass
-        logger.info("Hindsight initialized: mode=%s, api_url=%s, bank=%s, budget=%s, memory_mode=%s, prefetch_method=%s, client=%s",
-                     self._mode, self._api_url, self._bank_id, self._budget, self._memory_mode, self._prefetch_method, _client_version)
-        logger.debug("Hindsight config: auto_retain=%s, auto_recall=%s, retain_every_n=%d, "
-                     "retain_async=%s, retain_context=%s, "
-                     "recall_max_tokens=%d, recall_max_input_chars=%d, tags=%s, recall_tags=%s",
-                     self._auto_retain, self._auto_recall, self._retain_every_n_turns,
-                     self._retain_async, self._retain_context,
-                     self._recall_max_tokens, self._recall_max_input_chars,
-                     self._tags, self._recall_tags)
+        logger.info("Hindsight initialized: mode=%s, api_url=%s, bank=%s, budget=%s, memory_mode=%s, prefetch_method=%s",
+                     self._mode, self._api_url, self._bank_id, self._budget, self._memory_mode, self._prefetch_method)

        # For local mode, start the embedded daemon in the background so it
        # doesn't block the chat. Redirect stdout/stderr to a log file to
        # prevent rich startup output from spamming the terminal.
-        if self._mode == "local_embedded":
+        if self._mode == "local":
            def _start_daemon():
                import traceback
                log_dir = get_hermes_home() / "logs"
@@ -579,12 +310,9 @@ class HindsightMemoryProvider(MemoryProvider):
                    # If the config changed and the daemon is running, stop it.
                    from pathlib import Path as _Path
                    profile_env = _Path.home() / ".hindsight" / "profiles" / f"{profile}.env"
-                    current_key = self._config.get("llm_api_key") or os.environ.get("HINDSIGHT_LLM_API_KEY", "")
+                    current_key = self._config.get("llmApiKey") or os.environ.get("HINDSIGHT_LLM_API_KEY", "")
                    current_provider = self._config.get("llm_provider", "")
                    current_model = self._config.get("llm_model", "")
-                    current_base_url = self._config.get("llm_base_url") or os.environ.get("HINDSIGHT_API_LLM_BASE_URL", "")
-                    # Map openai_compatible/openrouter → openai for the daemon (OpenAI wire format)
-                    daemon_provider = "openai" if current_provider in ("openai_compatible", "openrouter") else current_provider

                    # Read saved profile config
                    saved = {}
@@ -595,24 +323,20 @@ class HindsightMemoryProvider(MemoryProvider):
                                saved[k.strip()] = v.strip()

                    config_changed = (
-                        saved.get("HINDSIGHT_API_LLM_PROVIDER") != daemon_provider or
+                        saved.get("HINDSIGHT_API_LLM_PROVIDER") != current_provider or
                        saved.get("HINDSIGHT_API_LLM_MODEL") != current_model or
-                        saved.get("HINDSIGHT_API_LLM_API_KEY") != current_key or
-                        saved.get("HINDSIGHT_API_LLM_BASE_URL", "") != current_base_url
+                        saved.get("HINDSIGHT_API_LLM_API_KEY") != current_key
                    )

                    if config_changed:
                        # Write updated profile .env
                        profile_env.parent.mkdir(parents=True, exist_ok=True)
-                        env_lines = (
-                            f"HINDSIGHT_API_LLM_PROVIDER={daemon_provider}\n"
+                        profile_env.write_text(
+                            f"HINDSIGHT_API_LLM_PROVIDER={current_provider}\n"
                            f"HINDSIGHT_API_LLM_API_KEY={current_key}\n"
                            f"HINDSIGHT_API_LLM_MODEL={current_model}\n"
                            f"HINDSIGHT_API_LOG_LEVEL=info\n"
                        )
-                        if current_base_url:
-                            env_lines += f"HINDSIGHT_API_LLM_BASE_URL={current_base_url}\n"
-                        profile_env.write_text(env_lines)
                        if client._manager.is_running(profile):
                            with open(log_path, "a") as f:
                                f.write("\n=== Config changed, restarting daemon ===\n")
@@ -653,118 +377,47 @@ class HindsightMemoryProvider(MemoryProvider):

    def prefetch(self, query: str, *, session_id: str = "") -> str:
        if self._prefetch_thread and self._prefetch_thread.is_alive():
-            logger.debug("Prefetch: waiting for background thread to complete")
            self._prefetch_thread.join(timeout=3.0)
        with self._prefetch_lock:
            result = self._prefetch_result
            self._prefetch_result = ""
        if not result:
-            logger.debug("Prefetch: no results available")
            return ""
-        logger.debug("Prefetch: returning %d chars of context", len(result))
-        header = self._recall_prompt_preamble or (
-            "# Hindsight Memory (persistent cross-session context)\n"
-            "Use this to answer questions about the user and prior sessions. "
-            "Do not call tools to look up information that is already present here."
-        )
-        return f"{header}\n\n{result}"
+        return f"## Hindsight Memory\n{result}"

    def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
        if self._memory_mode == "tools":
-            logger.debug("Prefetch: skipped (tools-only mode)")
            return
-        if not self._auto_recall:
-            logger.debug("Prefetch: skipped (auto_recall disabled)")
-            return
-        # Truncate query to max chars
-        if self._recall_max_input_chars and len(query) > self._recall_max_input_chars:
-            query = query[:self._recall_max_input_chars]
-
        def _run():
            try:
                client = self._get_client()
                if self._prefetch_method == "reflect":
-                    logger.debug("Prefetch: calling reflect (bank=%s, query_len=%d)", self._bank_id, len(query))
                    resp = _run_sync(client.areflect(bank_id=self._bank_id, query=query, budget=self._budget))
                    text = resp.text or ""
                else:
-                    recall_kwargs: dict = {
-                        "bank_id": self._bank_id, "query": query,
-                        "budget": self._budget, "max_tokens": self._recall_max_tokens,
-                    }
-                    if self._recall_tags:
-                        recall_kwargs["tags"] = self._recall_tags
-                        recall_kwargs["tags_match"] = self._recall_tags_match
-                    if self._recall_types:
-                        recall_kwargs["types"] = self._recall_types
-                    logger.debug("Prefetch: calling recall (bank=%s, query_len=%d, budget=%s)",
-                                 self._bank_id, len(query), self._budget)
-                    resp = _run_sync(client.arecall(**recall_kwargs))
-                    num_results = len(resp.results) if resp.results else 0
-                    logger.debug("Prefetch: recall returned %d results", num_results)
-                    text = "\n".join(f"- {r.text}" for r in resp.results if r.text) if resp.results else ""
+                    resp = _run_sync(client.arecall(bank_id=self._bank_id, query=query, budget=self._budget))
+                    text = "\n".join(r.text for r in resp.results if r.text) if resp.results else ""
                if text:
                    with self._prefetch_lock:
                        self._prefetch_result = text
            except Exception as e:
-                logger.debug("Hindsight prefetch failed: %s", e, exc_info=True)
+                logger.debug("Hindsight prefetch failed: %s", e)

        self._prefetch_thread = threading.Thread(target=_run, daemon=True, name="hindsight-prefetch")
        self._prefetch_thread.start()

    def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
-        """Retain conversation turn in background (non-blocking).
-
-        Respects retain_every_n_turns for batching.
-        """
-        if not self._auto_retain:
-            logger.debug("sync_turn: skipped (auto_retain disabled)")
-            return
-
-        from datetime import datetime, timezone
-        now = datetime.now(timezone.utc).isoformat()
-
-        messages = [
-            {"role": "user", "content": user_content, "timestamp": now},
-            {"role": "assistant", "content": assistant_content, "timestamp": now},
-        ]
-
-        turn = json.dumps(messages)
-        self._session_turns.append(turn)
-        self._turn_counter += 1
-
-        # Only retain every N turns
-        if self._turn_counter % self._retain_every_n_turns != 0:
-            logger.debug("sync_turn: buffered turn %d (will retain at turn %d)",
-                         self._turn_counter, self._turn_counter + (self._retain_every_n_turns - self._turn_counter % self._retain_every_n_turns))
-            return
-
-        logger.debug("sync_turn: retaining %d turns, total session content %d chars",
-                     len(self._session_turns), sum(len(t) for t in self._session_turns))
-        # Send the ENTIRE session as a single JSON array (document_id deduplicates).
-        # Each element in _session_turns is a JSON string of that turn's messages.
-        content = "[" + ",".join(self._session_turns) + "]"
+        """Retain conversation turn in background (non-blocking)."""
+        combined = f"User: {user_content}\nAssistant: {assistant_content}"

        def _sync():
            try:
                client = self._get_client()
-                item: dict = {
-                    "content": content,
-                    "context": self._retain_context,
-                }
-                if self._tags:
-                    item["tags"] = self._tags
-                logger.debug("Hindsight retain: bank=%s, doc=%s, async=%s, content_len=%d, num_turns=%d",
-                             self._bank_id, self._session_id, self._retain_async, len(content), len(self._session_turns))
-                _run_sync(client.aretain_batch(
-                    bank_id=self._bank_id,
-                    items=[item],
-                    document_id=self._session_id,
-                    retain_async=self._retain_async,
+                _run_sync(client.aretain(
+                    bank_id=self._bank_id, content=combined, context="conversation"
                ))
-                logger.debug("Hindsight retain succeeded")
            except Exception as e:
-                logger.warning("Hindsight sync failed: %s", e, exc_info=True)
+                logger.warning("Hindsight sync failed: %s", e)

        if self._sync_thread and self._sync_thread.is_alive():
            self._sync_thread.join(timeout=5.0)
@@ -789,18 +442,12 @@ class HindsightMemoryProvider(MemoryProvider):
                return tool_error("Missing required parameter: content")
            context = args.get("context")
            try:
-                retain_kwargs: dict = {
-                    "bank_id": self._bank_id, "content": content, "context": context,
-                }
-                if self._tags:
-                    retain_kwargs["tags"] = self._tags
-                logger.debug("Tool hindsight_retain: bank=%s, content_len=%d, context=%s",
-                             self._bank_id, len(content), context)
-                _run_sync(client.aretain(**retain_kwargs))
-                logger.debug("Tool hindsight_retain: success")
+                _run_sync(client.aretain(
+                    bank_id=self._bank_id, content=content, context=context
+                ))
                return json.dumps({"result": "Memory stored successfully."})
            except Exception as e:
-                logger.warning("hindsight_retain failed: %s", e, exc_info=True)
+                logger.warning("hindsight_retain failed: %s", e)
                return tool_error(f"Failed to store memory: {e}")

        elif tool_name == "hindsight_recall":
@@ -808,26 +455,15 @@ class HindsightMemoryProvider(MemoryProvider):
            if not query:
                return tool_error("Missing required parameter: query")
            try:
-                recall_kwargs: dict = {
-                    "bank_id": self._bank_id, "query": query, "budget": self._budget,
-                    "max_tokens": self._recall_max_tokens,
-                }
-                if self._recall_tags:
-                    recall_kwargs["tags"] = self._recall_tags
-                    recall_kwargs["tags_match"] = self._recall_tags_match
-                if self._recall_types:
-                    recall_kwargs["types"] = self._recall_types
-                logger.debug("Tool hindsight_recall: bank=%s, query_len=%d, budget=%s",
-                             self._bank_id, len(query), self._budget)
-                resp = _run_sync(client.arecall(**recall_kwargs))
-                num_results = len(resp.results) if resp.results else 0
-                logger.debug("Tool hindsight_recall: %d results", num_results)
+                resp = _run_sync(client.arecall(
+                    bank_id=self._bank_id, query=query, budget=self._budget
+                ))
                if not resp.results:
                    return json.dumps({"result": "No relevant memories found."})
                lines = [f"{i}. {r.text}" for i, r in enumerate(resp.results, 1)]
                return json.dumps({"result": "\n".join(lines)})
            except Exception as e:
-                logger.warning("hindsight_recall failed: %s", e, exc_info=True)
+                logger.warning("hindsight_recall failed: %s", e)
                return tool_error(f"Failed to search memory: {e}")

        elif tool_name == "hindsight_reflect":
@@ -835,28 +471,24 @@ class HindsightMemoryProvider(MemoryProvider):
            if not query:
                return tool_error("Missing required parameter: query")
            try:
-                logger.debug("Tool hindsight_reflect: bank=%s, query_len=%d, budget=%s",
-                             self._bank_id, len(query), self._budget)
                resp = _run_sync(client.areflect(
                    bank_id=self._bank_id, query=query, budget=self._budget
                ))
-                logger.debug("Tool hindsight_reflect: response_len=%d", len(resp.text or ""))
                return json.dumps({"result": resp.text or "No relevant memories found."})
            except Exception as e:
-                logger.warning("hindsight_reflect failed: %s", e, exc_info=True)
+                logger.warning("hindsight_reflect failed: %s", e)
                return tool_error(f"Failed to reflect: {e}")

        return tool_error(f"Unknown tool: {tool_name}")

    def shutdown(self) -> None:
-        logger.debug("Hindsight shutdown: waiting for background threads")
        global _loop, _loop_thread
        for t in (self._prefetch_thread, self._sync_thread):
            if t and t.is_alive():
                t.join(timeout=5.0)
        if self._client is not None:
            try:
-                if self._mode == "local_embedded":
+                if self._mode == "local":
                    # Use the public close() API. The RuntimeError from
                    # aiohttp's "attached to a different loop" is expected
                    # and harmless — the daemon keeps running independently.
@@ -2,7 +2,9 @@ name: hindsight
 version: 1.0.0
 description: "Hindsight — long-term memory with knowledge graph, entity resolution, and multi-strategy retrieval."
 pip_dependencies:
-  - "hindsight-client>=0.4.22"
-requires_env: []
+  - hindsight-client
+  - hindsight-all
+requires_env:
+  - HINDSIGHT_API_KEY
 hooks:
  - on_session_end
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "hermes-agent"
-version = "0.8.0"
+version = "0.7.0"
 description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
 readme = "README.md"
 requires-python = ">=3.11"
@@ -62,7 +62,6 @@ mcp = ["mcp>=1.2.0,<2"]
 homeassistant = ["aiohttp>=3.9.0,<4"]
 sms = ["aiohttp>=3.9.0,<4"]
 acp = ["agent-client-protocol>=0.9.0,<1.0"]
-mistral = ["mistralai>=2.3.0,<3"]
 dingtalk = ["dingtalk-stream>=0.1.0,<1"]
 feishu = ["lark-oapi>=1.5.3,<2"]
 rl = [
@@ -95,7 +94,6 @@ all = [
  "hermes-agent[voice]",
  "hermes-agent[dingtalk]",
  "hermes-agent[feishu]",
-  "hermes-agent[mistral]",
 ]

 [project.scripts]
@@ -66,8 +66,7 @@ from model_tools import (
    handle_function_call,
    check_toolset_requirements,
 )
-from tools.terminal_tool import cleanup_vm, get_active_env, is_persistent_env
-from tools.tool_result_storage import maybe_persist_tool_result, enforce_turn_budget
+from tools.terminal_tool import cleanup_vm
 from tools.interrupt import set_interrupt as _set_interrupt
 from tools.browser_tool import cleanup_browser

@@ -76,7 +75,6 @@ from hermes_constants import OPENROUTER_BASE_URL

 # Agent internals extracted to agent/ package for modularity
 from agent.memory_manager import build_memory_context_block
-from agent.retry_utils import jittered_backoff
 from agent.prompt_builder import (
    DEFAULT_AGENT_IDENTITY, PLATFORM_HINTS,
    MEMORY_GUIDANCE, SESSION_SEARCH_GUIDANCE, SKILLS_GUIDANCE,
@@ -87,7 +85,6 @@ from agent.model_metadata import (
    estimate_tokens_rough, estimate_messages_tokens_rough, estimate_request_tokens_rough,
    get_next_probe_tier, parse_context_limit_from_error,
    save_context_length, is_local_endpoint,
-    query_ollama_num_ctx,
 )
 from agent.context_compressor import ContextCompressor
 from agent.subdirectory_hints import SubdirectoryHintTracker
@@ -412,26 +409,62 @@ def _strip_budget_warnings_from_history(messages: list) -> None:
 # Large tool result handler — save oversized output to temp file
 # =========================================================================

+# Threshold at which tool results are saved to a file instead of kept inline.
+# 100K chars ≈ 25K tokens — generous for any reasonable output but prevents
+# catastrophic context explosions.
+_LARGE_RESULT_CHARS = 100_000

-# =========================================================================
-# Qwen Portal headers — mimics QwenCode CLI for portal.qwen.ai compatibility.
-# Extracted as a module-level helper so both __init__ and
-# _apply_client_headers_for_base_url can share it.
-# =========================================================================
-_QWEN_CODE_VERSION = "0.14.1"
+# How many characters of the original result to include as an inline preview
+# so the model has immediate context about what the tool returned.
+_LARGE_RESULT_PREVIEW_CHARS = 1_500


-def _qwen_portal_headers() -> dict:
-    """Return default HTTP headers required by Qwen Portal API."""
-    import platform as _plat
+def _save_oversized_tool_result(function_name: str, function_result: str) -> str:
+    """Replace oversized tool results with a file reference + preview.

-    _ua = f"QwenCode/{_QWEN_CODE_VERSION} ({_plat.system().lower()}; {_plat.machine()})"
-    return {
-        "User-Agent": _ua,
-        "X-DashScope-CacheControl": "enable",
-        "X-DashScope-UserAgent": _ua,
-        "X-DashScope-AuthType": "qwen-oauth",
-    }
+    When a tool returns more than ``_LARGE_RESULT_CHARS`` characters, the full
+    content is written to a temporary file under ``HERMES_HOME/cache/tool_responses/``
+    and the result sent to the model is replaced with:
+      • a brief head preview (first ``_LARGE_RESULT_PREVIEW_CHARS`` chars)
+      • the file path so the model can use ``read_file`` / ``search_files``
+
+    Falls back to destructive truncation if the file write fails.
+    """
+    original_len = len(function_result)
+    if original_len <= _LARGE_RESULT_CHARS:
+        return function_result
+
+    # Build the target directory
+    try:
+        response_dir = os.path.join(get_hermes_home(), "cache", "tool_responses")
+        os.makedirs(response_dir, exist_ok=True)
+
+        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
+        # Sanitize tool name for use in filename
+        safe_name = re.sub(r"[^\w\-]", "_", function_name)[:40]
+        filename = f"{safe_name}_{timestamp}.txt"
+        filepath = os.path.join(response_dir, filename)
+
+        with open(filepath, "w", encoding="utf-8") as f:
+            f.write(function_result)
+
+        preview = function_result[:_LARGE_RESULT_PREVIEW_CHARS]
+        return (
+            f"{preview}\n\n"
+            f"[Large tool response: {original_len:,} characters total — "
+            f"only the first {_LARGE_RESULT_PREVIEW_CHARS:,} shown above. "
+            f"Full output saved to: {filepath}\n"
+            f"Use read_file or search_files on that path to access the rest.]"
+        )
+    except Exception as exc:
+        # Fall back to destructive truncation if file write fails
+        logger.warning("Failed to save large tool result to file: %s", exc)
+        return (
+            function_result[:_LARGE_RESULT_CHARS]
+            + f"\n\n[Truncated: tool response was {original_len:,} chars, "
+            f"exceeding the {_LARGE_RESULT_CHARS:,} char limit. "
+            f"File save failed: {exc}]"
+        )


 class AIAgent:
@@ -442,13 +475,6 @@ class AIAgent:
    for AI models that support function calling.
    """

-    # ── Class-level context pressure dedup (survives across instances) ──
-    # The gateway creates a new AIAgent per message, so instance-level flags
-    # reset every time.  This dict tracks {session_id: (warn_level, timestamp)}
-    # to suppress duplicate warnings within a cooldown window.
-    _context_pressure_last_warned: dict = {}
-    _CONTEXT_PRESSURE_COOLDOWN = 300  # seconds between re-warning same session
-
    @property
    def base_url(self) -> str:
        return self._base_url
@@ -680,8 +706,7 @@ class AIAgent:
        # Context pressure warnings: notify the USER (not the LLM) as context
        # fills up.  Purely informational — displayed in CLI output and sent via
        # status_callback for gateway platforms.  Does NOT inject into messages.
-        # Tiered: fires at 85% and again at 95% of compaction threshold.
-        self._context_pressure_warned_at = 0.0  # highest tier already shown
+        self._context_pressure_warned = False

        # Activity tracking — updated on each API call, tool execution, and
        # stream chunk.  Used by the gateway timeout handler to report what the
@@ -785,8 +810,6 @@ class AIAgent:
                    client_kwargs["default_headers"] = {
                        "User-Agent": "KimiCLI/1.3",
                    }
-                elif "portal.qwen.ai" in effective_base.lower():
-                    client_kwargs["default_headers"] = _qwen_portal_headers()
            else:
                # No explicit creds — use the centralized provider router
                from agent.auxiliary_client import resolve_provider_client
@@ -1193,33 +1216,6 @@ class AIAgent:
        self.session_cost_status = "unknown"
        self.session_cost_source = "none"
        
-        # ── Ollama num_ctx injection ──
-        # Ollama defaults to 2048 context regardless of the model's capabilities.
-        # When running against an Ollama server, detect the model's max context
-        # and pass num_ctx on every chat request so the full window is used.
-        # User override: set model.ollama_num_ctx in config.yaml to cap VRAM use.
-        self._ollama_num_ctx: int | None = None
-        _ollama_num_ctx_override = None
-        if isinstance(_model_cfg, dict):
-            _ollama_num_ctx_override = _model_cfg.get("ollama_num_ctx")
-        if _ollama_num_ctx_override is not None:
-            try:
-                self._ollama_num_ctx = int(_ollama_num_ctx_override)
-            except (TypeError, ValueError):
-                logger.debug("Invalid ollama_num_ctx config value: %r", _ollama_num_ctx_override)
-        if self._ollama_num_ctx is None and self.base_url and is_local_endpoint(self.base_url):
-            try:
-                _detected = query_ollama_num_ctx(self.model, self.base_url)
-                if _detected and _detected > 0:
-                    self._ollama_num_ctx = _detected
-            except Exception as exc:
-                logger.debug("Ollama num_ctx detection failed: %s", exc)
-        if self._ollama_num_ctx and not self.quiet_mode:
-            logger.info(
-                "Ollama num_ctx: will request %d tokens (model max from /api/show)",
-                self._ollama_num_ctx,
-            )
-
        if not self.quiet_mode:
            if compression_enabled:
                print(f"📊 Context limit: {self.context_compressor.context_length:,} tokens (compress at {int(compression_threshold*100)}% = {self.context_compressor.threshold_tokens:,})")
@@ -1695,25 +1691,9 @@ class AIAgent:
        return None

    def _cleanup_task_resources(self, task_id: str) -> None:
-        """Clean up VM and browser resources for a given task.
-
-        Skips ``cleanup_vm`` when the active terminal environment is marked
-        persistent (``persistent_filesystem=True``) so that long-lived sandbox
-        containers survive between turns. The idle reaper in
-        ``terminal_tool._cleanup_inactive_envs`` still tears them down once
-        ``terminal.lifetime_seconds`` is exceeded. Non-persistent backends are
-        torn down per-turn as before to prevent resource leakage (the original
-        intent of this hook for the Morph backend, see commit fbd3a2fd).
-        """
+        """Clean up VM and browser resources for a given task."""
        try:
-            if is_persistent_env(task_id):
-                if self.verbose_logging:
-                    logging.debug(
-                        f"Skipping per-turn cleanup_vm for persistent env {task_id}; "
-                        f"idle reaper will handle it."
-                    )
-            else:
-                cleanup_vm(task_id)
+            cleanup_vm(task_id)
        except Exception as e:
            if self.verbose_logging:
                logging.warning(f"Failed to cleanup VM for task {task_id}: {e}")
@@ -4127,8 +4107,6 @@ class AIAgent:
            self._client_kwargs["default_headers"] = copilot_default_headers()
        elif "api.kimi.com" in normalized:
            self._client_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.3"}
-        elif "portal.qwen.ai" in normalized:
-            self._client_kwargs["default_headers"] = _qwen_portal_headers()
        else:
            self._client_kwargs.pop("default_headers", None)

@@ -4752,25 +4730,18 @@ class AIAgent:
                    self._close_request_openai_client(request_client, reason="stream_request_complete")

        _stream_stale_timeout_base = float(os.getenv("HERMES_STREAM_STALE_TIMEOUT", 180.0))
-        # Local providers (Ollama, oMLX, llama-cpp) can take 300+ seconds
-        # for prefill on large contexts.  Disable the stale detector unless
-        # the user explicitly set HERMES_STREAM_STALE_TIMEOUT.
-        if _stream_stale_timeout_base == 180.0 and self.base_url and is_local_endpoint(self.base_url):
-            _stream_stale_timeout = float("inf")
-            logger.debug("Local provider detected (%s) — stale stream timeout disabled", self.base_url)
+        # Scale the stale timeout for large contexts: slow models (like Opus)
+        # can legitimately think for minutes before producing the first token
+        # when the context is large.  Without this, the stale detector kills
+        # healthy connections during the model's thinking phase, producing
+        # spurious RemoteProtocolError ("peer closed connection").
+        _est_tokens = sum(len(str(v)) for v in api_kwargs.get("messages", [])) // 4
+        if _est_tokens > 100_000:
+            _stream_stale_timeout = max(_stream_stale_timeout_base, 300.0)
+        elif _est_tokens > 50_000:
+            _stream_stale_timeout = max(_stream_stale_timeout_base, 240.0)
        else:
-            # Scale the stale timeout for large contexts: slow models (like Opus)
-            # can legitimately think for minutes before producing the first token
-            # when the context is large.  Without this, the stale detector kills
-            # healthy connections during the model's thinking phase, producing
-            # spurious RemoteProtocolError ("peer closed connection").
-            _est_tokens = sum(len(str(v)) for v in api_kwargs.get("messages", [])) // 4
-            if _est_tokens > 100_000:
-                _stream_stale_timeout = max(_stream_stale_timeout_base, 300.0)
-            elif _est_tokens > 50_000:
-                _stream_stale_timeout = max(_stream_stale_timeout_base, 240.0)
-            else:
-                _stream_stale_timeout = _stream_stale_timeout_base
+            _stream_stale_timeout = _stream_stale_timeout_base

        t = threading.Thread(target=_call, daemon=True)
        t.start()
@@ -4926,7 +4897,7 @@ class AIAgent:
                effective_key = (fb_client.api_key or resolve_anthropic_token() or "") if fb_provider == "anthropic" else (fb_client.api_key or "")
                self.api_key = effective_key
                self._anthropic_api_key = effective_key
-                self._anthropic_base_url = fb_base_url
+                self._anthropic_base_url = getattr(fb_client, "base_url", None)
                self._anthropic_client = build_anthropic_client(effective_key, self._anthropic_base_url)
                self._is_anthropic_oauth = _is_oauth_token(effective_key)
                self.client = None
@@ -5282,71 +5253,6 @@ class AIAgent:
        base = (getattr(self, "base_url", "") or "").lower()
        return "dashscope" in base or "aliyuncs" in base or "opencode.ai/zen/go" in base

-    def _is_qwen_portal(self) -> bool:
-        """Return True when the base URL targets Qwen Portal."""
-        return "portal.qwen.ai" in self._base_url_lower
-
-    def _qwen_prepare_chat_messages(self, api_messages: list) -> list:
-        prepared = copy.deepcopy(api_messages)
-        if not prepared:
-            return prepared
-
-        for msg in prepared:
-            if not isinstance(msg, dict):
-                continue
-            content = msg.get("content")
-            if isinstance(content, str):
-                msg["content"] = [{"type": "text", "text": content}]
-            elif isinstance(content, list):
-                # Normalize: convert bare strings to text dicts, keep dicts as-is.
-                # deepcopy already created independent copies, no need for dict().
-                normalized_parts = []
-                for part in content:
-                    if isinstance(part, str):
-                        normalized_parts.append({"type": "text", "text": part})
-                    elif isinstance(part, dict):
-                        normalized_parts.append(part)
-                if normalized_parts:
-                    msg["content"] = normalized_parts
-
-        # Inject cache_control on the last part of the system message.
-        for msg in prepared:
-            if isinstance(msg, dict) and msg.get("role") == "system":
-                content = msg.get("content")
-                if isinstance(content, list) and content and isinstance(content[-1], dict):
-                    content[-1]["cache_control"] = {"type": "ephemeral"}
-                break
-
-        return prepared
-
-    def _qwen_prepare_chat_messages_inplace(self, messages: list) -> None:
-        """In-place variant — mutates an already-copied message list."""
-        if not messages:
-            return
-
-        for msg in messages:
-            if not isinstance(msg, dict):
-                continue
-            content = msg.get("content")
-            if isinstance(content, str):
-                msg["content"] = [{"type": "text", "text": content}]
-            elif isinstance(content, list):
-                normalized_parts = []
-                for part in content:
-                    if isinstance(part, str):
-                        normalized_parts.append({"type": "text", "text": part})
-                    elif isinstance(part, dict):
-                        normalized_parts.append(part)
-                if normalized_parts:
-                    msg["content"] = normalized_parts
-
-        for msg in messages:
-            if isinstance(msg, dict) and msg.get("role") == "system":
-                content = msg.get("content")
-                if isinstance(content, list) and content and isinstance(content[-1], dict):
-                    content[-1]["cache_control"] = {"type": "ephemeral"}
-                break
-
    def _build_api_kwargs(self, api_messages: list) -> dict:
        """Build the keyword arguments dict for the active API mode."""
        if self.api_mode == "anthropic_messages":
@@ -5365,7 +5271,6 @@ class AIAgent:
                is_oauth=self._is_anthropic_oauth,
                preserve_dots=self._anthropic_preserve_dots(),
                context_length=ctx_len,
-                base_url=getattr(self, "_anthropic_base_url", None),
            )

        if self.api_mode == "codex_responses":
@@ -5459,17 +5364,6 @@ class AIAgent:
                            tool_call.pop("call_id", None)
                            tool_call.pop("response_item_id", None)

-        # Qwen portal: normalize content to list-of-dicts, inject cache_control.
-        # Must run AFTER codex sanitization so we transform the final messages.
-        # If sanitization already deepcopied, reuse that copy (in-place).
-        if self._is_qwen_portal():
-            if sanitized_messages is api_messages:
-                # No sanitization was done — we need our own copy.
-                sanitized_messages = self._qwen_prepare_chat_messages(sanitized_messages)
-            else:
-                # Already a deepcopy — transform in place to avoid a second deepcopy.
-                self._qwen_prepare_chat_messages_inplace(sanitized_messages)
-
        # GPT-5 and Codex models respond better to 'developer' than 'system'
        # for instruction-following.  Swap the role at the API boundary so
        # internal message representation stays uniform ("system").
@@ -5502,17 +5396,11 @@ class AIAgent:
            "messages": sanitized_messages,
            "timeout": float(os.getenv("HERMES_API_TIMEOUT", 1800.0)),
        }
-        if self._is_qwen_portal():
-            api_kwargs["metadata"] = {
-                "sessionId": self.session_id or "hermes",
-                "promptId": str(uuid.uuid4()),
-            }
        if self.tools:
            api_kwargs["tools"] = self.tools

        if self.max_tokens is not None:
-            if not self._is_qwen_portal():
-                api_kwargs.update(self._max_tokens_param(self.max_tokens))
+            api_kwargs.update(self._max_tokens_param(self.max_tokens))
        elif self._is_openrouter_url() and "claude" in (self.model or "").lower():
            # OpenRouter translates requests to Anthropic's Messages API,
            # which requires max_tokens as a mandatory field.  When we omit
@@ -5568,18 +5456,6 @@ class AIAgent:
        if _is_nous:
            extra_body["tags"] = ["product=hermes-agent"]

-        # Ollama num_ctx: override the 2048 default so the model actually
-        # uses the context window it was trained for.  Passed via the OpenAI
-        # SDK's extra_body → options.num_ctx, which Ollama's OpenAI-compat
-        # endpoint forwards to the runner as --ctx-size.
-        if self._ollama_num_ctx:
-            options = extra_body.get("options", {})
-            options["num_ctx"] = self._ollama_num_ctx
-            extra_body["options"] = options
-
-        if self._is_qwen_portal():
-            extra_body["vl_high_resolution_images"] = True
-
        if extra_body:
            api_kwargs["extra_body"] = extra_body

@@ -5895,7 +5771,7 @@ class AIAgent:
                    tools=[memory_tool_def],
                    temperature=0.3,
                    max_tokens=5120,
-                    # timeout resolved from auxiliary.flush_memories.timeout config
+                    timeout=30.0,
                )
            except RuntimeError:
                _aux_available = False
@@ -5927,10 +5803,7 @@ class AIAgent:
                    "temperature": 0.3,
                    **self._max_tokens_param(5120),
                }
-                from agent.auxiliary_client import _get_task_timeout
-                response = self._ensure_primary_openai_client(reason="flush_memories").chat.completions.create(
-                    **api_kwargs, timeout=_get_task_timeout("flush_memories")
-                )
+                response = self._ensure_primary_openai_client(reason="flush_memories").chat.completions.create(**api_kwargs, timeout=30.0)

            # Extract tool calls from the response, handling all API formats
            tool_calls = []
@@ -6037,15 +5910,6 @@ class AIAgent:
            except Exception as e:
                logger.warning("Session DB compression split failed — new session will NOT be indexed: %s", e)

-        # Warn on repeated compressions (quality degrades with each pass)
-        _cc = self.context_compressor.compression_count
-        if _cc >= 2:
-            self._vprint(
-                f"{self.log_prefix}⚠️  Session compressed {_cc} times — "
-                f"accuracy may degrade. Consider /new to start fresh.",
-                force=True,
-            )
-
        # Update token estimate after compaction so pressure calculations
        # use the post-compression count, not the stale pre-compression one.
        _compressed_est = (
@@ -6058,16 +5922,12 @@ class AIAgent:
        # Only reset the pressure warning if compression actually brought
        # us below the warning level (85% of threshold).  When compression
        # can't reduce enough (e.g. threshold is very low, or system prompt
-        # alone exceeds the warning level), keep the tier set to prevent
+        # alone exceeds the warning level), keep the flag set to prevent
        # spamming the user with repeated warnings every loop iteration.
        if self.context_compressor.threshold_tokens > 0:
            _post_progress = _compressed_est / self.context_compressor.threshold_tokens
            if _post_progress < 0.85:
-                self._context_pressure_warned_at = 0.0
-                # Clear class-level dedup for this session so a fresh
-                # warning cycle can start if context grows again.
-                _sid = self.session_id or "default"
-                AIAgent._context_pressure_last_warned.pop(_sid, None)
+                self._context_pressure_warned = False

        # Clear the file-read dedup cache.  After compression the original
        # read content is summarised away — if the model re-reads the same
@@ -6364,17 +6224,15 @@ class AIAgent:
                except Exception as cb_err:
                    logging.debug(f"Tool complete callback error: {cb_err}")

-            function_result = maybe_persist_tool_result(
-                content=function_result,
-                tool_name=name,
-                tool_use_id=tc.id,
-                env=get_active_env(effective_task_id),
-            )
+            # Save oversized results to file instead of destructive truncation
+            function_result = _save_oversized_tool_result(name, function_result)

+            # Discover subdirectory context files from tool arguments
            subdir_hints = self._subdirectory_hints.check_tool_call(name, args)
            if subdir_hints:
                function_result += subdir_hints

+            # Append tool result message in order
            tool_msg = {
                "role": "tool",
                "content": function_result,
@@ -6382,12 +6240,6 @@ class AIAgent:
            }
            messages.append(tool_msg)

-        # ── Per-turn aggregate budget enforcement ─────────────────────────
-        num_tools = len(parsed_calls)
-        if num_tools > 0:
-            turn_tool_msgs = messages[-num_tools:]
-            enforce_turn_budget(turn_tool_msgs, env=get_active_env(effective_task_id))
-
        # ── Budget pressure injection ────────────────────────────────────
        budget_warning = self._get_budget_warning(api_call_count)
        if budget_warning and messages and messages[-1].get("role") == "tool":
@@ -6672,12 +6524,8 @@ class AIAgent:
                except Exception as cb_err:
                    logging.debug(f"Tool complete callback error: {cb_err}")

-            function_result = maybe_persist_tool_result(
-                content=function_result,
-                tool_name=function_name,
-                tool_use_id=tool_call.id,
-                env=get_active_env(effective_task_id),
-            )
+            # Save oversized results to file instead of destructive truncation
+            function_result = _save_oversized_tool_result(function_name, function_result)

            # Discover subdirectory context files from tool arguments
            subdir_hints = self._subdirectory_hints.check_tool_call(function_name, function_args)
@@ -6715,11 +6563,6 @@ class AIAgent:
            if self.tool_delay > 0 and i < len(assistant_message.tool_calls):
                time.sleep(self.tool_delay)

-        # ── Per-turn aggregate budget enforcement ─────────────────────────
-        num_tools_seq = len(assistant_message.tool_calls)
-        if num_tools_seq > 0:
-            enforce_turn_budget(messages[-num_tools_seq:], env=get_active_env(effective_task_id))
-
        # ── Budget pressure injection ─────────────────────────────────
        # After all tool calls in this turn are processed, check if we're
        # approaching max_iterations. If so, inject a warning into the LAST
@@ -7446,7 +7289,6 @@ class AIAgent:
            codex_auth_retry_attempted=False
            anthropic_auth_retry_attempted=False
            nous_auth_retry_attempted=False
-            thinking_sig_retry_attempted = False
            has_retried_429 = False
            restart_with_compressed_messages = False
            restart_with_length_continuation = False
@@ -7662,8 +7504,7 @@ class AIAgent:
                            }
                        
                        # Longer backoff for rate limiting (likely cause of None choices)
-                        # Jittered exponential: 5s base, 120s cap + random jitter
-                        wait_time = jittered_backoff(retry_count, base_delay=5.0, max_delay=120.0)
+                        wait_time = min(5 * (2 ** (retry_count - 1)), 120)  # 5s, 10s, 20s, 40s, 80s, 120s
                        self._vprint(f"{self.log_prefix}⏳ Retrying in {wait_time}s (extended backoff for possible rate limit)...", force=True)
                        logging.warning(f"Invalid API response (retry {retry_count}/{max_retries}): {', '.join(error_details)} | Provider: {provider_name}")
                        
@@ -8036,38 +7877,8 @@ class AIAgent:
                        print(f"{self.log_prefix}     • Check ANTHROPIC_API_KEY in {_dhh}/.env for API keys or legacy token values")
                        print(f"{self.log_prefix}     • For API keys: verify at https://console.anthropic.com/settings/keys")
                        print(f"{self.log_prefix}     • For Claude Code: run 'claude /login' to refresh, then retry")
-                        print(f"{self.log_prefix}     • Legacy cleanup: hermes config set ANTHROPIC_TOKEN \"\"")
-                        print(f"{self.log_prefix}     • Clear stale keys: hermes config set ANTHROPIC_API_KEY \"\"")
-
-                    # ── Thinking block signature recovery ─────────────────
-                    # Anthropic signs thinking blocks against the full turn
-                    # content.  Any upstream mutation (context compression,
-                    # session truncation, message merging) invalidates the
-                    # signature → HTTP 400.  Recovery: strip reasoning_details
-                    # from all messages so the next retry sends no thinking
-                    # blocks at all.  One-shot — don't retry infinitely.
-                    if (
-                        self.api_mode == "anthropic_messages"
-                        and status_code == 400
-                        and not thinking_sig_retry_attempted
-                    ):
-                        _err_msg_lower = str(api_error).lower()
-                        if "signature" in _err_msg_lower and "thinking" in _err_msg_lower:
-                            thinking_sig_retry_attempted = True
-                            for _m in messages:
-                                if isinstance(_m, dict):
-                                    _m.pop("reasoning_details", None)
-                            self._vprint(
-                                f"{self.log_prefix}⚠️  Thinking block signature invalid — "
-                                f"stripped all thinking blocks, retrying...",
-                                force=True,
-                            )
-                            logging.warning(
-                                "%sThinking block signature recovery: stripped "
-                                "reasoning_details from %d messages",
-                                self.log_prefix, len(messages),
-                            )
-                            continue
+                        print(f"{self.log_prefix}     • Legacy cleanup: hermes config set ANTHROPIC_TOKEN \"\"")
+                        print(f"{self.log_prefix}     • Clear stale keys: hermes config set ANTHROPIC_API_KEY \"\"")

                    retry_count += 1
                    elapsed_time = time.time() - api_start_time
@@ -8550,7 +8361,7 @@ class AIAgent:
                                    _retry_after = min(int(_ra_raw), 120)  # Cap at 2 minutes
                                except (TypeError, ValueError):
                                    pass
-                    wait_time = _retry_after if _retry_after else jittered_backoff(retry_count, base_delay=2.0, max_delay=60.0)
+                    wait_time = _retry_after if _retry_after else min(2 ** retry_count, 60)
                    if is_rate_limited:
                        self._emit_status(f"⏱️ Rate limit reached. Waiting {wait_time}s before retry (attempt {retry_count + 1}/{max_retries})...")
                    else:
@@ -9007,34 +8818,13 @@ class AIAgent:
                    # compaction fires, not the raw context window.
                    # Does not inject into messages — just prints to CLI output
                    # and fires status_callback for gateway platforms.
-                    # Tiered: 85% (orange) and 95% (red/critical).
                    if _compressor.threshold_tokens > 0:
                        _compaction_progress = _real_tokens / _compressor.threshold_tokens
-                        # Determine the warning tier for this progress level
-                        _warn_tier = 0.0
-                        if _compaction_progress >= 0.95:
-                            _warn_tier = 0.95
-                        elif _compaction_progress >= 0.85:
-                            _warn_tier = 0.85
-                        if _warn_tier > self._context_pressure_warned_at:
-                            # Class-level dedup: check if this session was already
-                            # warned at this tier within the cooldown window.
-                            _sid = self.session_id or "default"
-                            _last = AIAgent._context_pressure_last_warned.get(_sid)
-                            _now = time.time()
-                            if _last is None or _last[0] < _warn_tier or (_now - _last[1]) >= self._CONTEXT_PRESSURE_COOLDOWN:
-                                self._context_pressure_warned_at = _warn_tier
-                                AIAgent._context_pressure_last_warned[_sid] = (_warn_tier, _now)
-                                self._emit_context_pressure(_compaction_progress, _compressor)
-                                # Evict stale entries (older than 2x cooldown)
-                                _cutoff = _now - self._CONTEXT_PRESSURE_COOLDOWN * 2
-                                AIAgent._context_pressure_last_warned = {
-                                    k: v for k, v in AIAgent._context_pressure_last_warned.items()
-                                    if v[1] > _cutoff
-                                }
+                        if _compaction_progress >= 0.85 and not self._context_pressure_warned:
+                            self._context_pressure_warned = True
+                            self._emit_context_pressure(_compaction_progress, _compressor)

                    if self.compression_enabled and _compressor.should_compress(_real_tokens):
-                        self._safe_print("  ⟳ compacting context…")
                        messages, active_system_prompt = self._compress_context(
                            messages, system_message,
                            approx_tokens=self.context_compressor.last_prompt_tokens,
@@ -9109,27 +8899,8 @@ class AIAgent:
                            self._save_session_log(messages)
                            continue

-                        # ── Empty response retry (no reasoning) ──────
-                        # Model returned nothing — no content, no
-                        # structured reasoning, no tool calls.  Common
-                        # with open models (transient provider issues,
-                        # rate limits, sampling flukes).  Silently retry
-                        # up to 3 times before giving up.  Skip when
-                        # content has inline <think> tags (model chose
-                        # to reason, just no visible text).
-                        _truly_empty = not final_response.strip()
-                        if _truly_empty and not _has_structured and self._empty_content_retries < 3:
-                            self._empty_content_retries += 1
-                            self._vprint(
-                                f"{self.log_prefix}↻ Empty response (no content or reasoning) "
-                                f"— retrying ({self._empty_content_retries}/3)",
-                                force=True,
-                            )
-                            continue
-
-                        # Exhausted prefill attempts, empty retries, or
-                        # structured reasoning with no content —
-                        # fall through to "(empty)" terminal.
+                        # Exhausted prefill attempts or no structured
+                        # reasoning — fall through to "(empty)" terminal.
                        reasoning_text = self._extract_reasoning(assistant_message)
                        assistant_msg = self._build_assistant_message(assistant_message, finish_reason)
                        assistant_msg["content"] = "(empty)"
@@ -9139,7 +8910,7 @@ class AIAgent:
                            reasoning_preview = reasoning_text[:500] + "..." if len(reasoning_text) > 500 else reasoning_text
                            self._vprint(f"{self.log_prefix}ℹ️  Reasoning-only response (no visible content). Reasoning: {reasoning_preview}")
                        else:
-                            self._vprint(f"{self.log_prefix}ℹ️  Empty response (no content or reasoning) after 3 retries.")
+                            self._vprint(f"{self.log_prefix}ℹ️  Empty response (no content or reasoning).")

                        final_response = "(empty)"
                        break
@@ -1276,258 +1276,6 @@ class TestRoleAlternation:
        assert [m["role"] for m in result] == ["user", "assistant", "user"]


-# ---------------------------------------------------------------------------
-# Thinking block signature management
-# ---------------------------------------------------------------------------
-
-
-class TestThinkingBlockSignatureManagement:
-    """Tests for the thinking block handling strategy:
-    strip from old turns, preserve latest signed, downgrade unsigned."""
-
-    def test_thinking_stripped_from_non_last_assistant(self):
-        """Thinking blocks are removed from all assistant messages except the last."""
-        messages = [
-            {
-                "role": "assistant",
-                "content": "",
-                "tool_calls": [
-                    {"id": "tc_1", "function": {"name": "tool1", "arguments": "{}"}},
-                ],
-                "reasoning_details": [
-                    {"type": "thinking", "thinking": "Old reasoning.", "signature": "sig_old"},
-                ],
-            },
-            {"role": "tool", "tool_call_id": "tc_1", "content": "result 1"},
-            {
-                "role": "assistant",
-                "content": "",
-                "tool_calls": [
-                    {"id": "tc_2", "function": {"name": "tool2", "arguments": "{}"}},
-                ],
-                "reasoning_details": [
-                    {"type": "thinking", "thinking": "Latest reasoning.", "signature": "sig_new"},
-                ],
-            },
-            {"role": "tool", "tool_call_id": "tc_2", "content": "result 2"},
-        ]
-        _, result = convert_messages_to_anthropic(messages)
-
-        # Find both assistant messages
-        assistants = [m for m in result if m["role"] == "assistant"]
-        assert len(assistants) == 2
-
-        # First (non-last) assistant: no thinking blocks
-        first_types = [b.get("type") for b in assistants[0]["content"]]
-        assert "thinking" not in first_types
-        assert "redacted_thinking" not in first_types
-        assert "tool_use" in first_types  # tool_use should survive
-
-        # Last assistant: thinking block preserved with signature
-        last_blocks = assistants[1]["content"]
-        thinking_blocks = [b for b in last_blocks if b.get("type") == "thinking"]
-        assert len(thinking_blocks) == 1
-        assert thinking_blocks[0]["thinking"] == "Latest reasoning."
-        assert thinking_blocks[0]["signature"] == "sig_new"
-
-    def test_signed_thinking_preserved_on_last_turn(self):
-        """A signed thinking block on the last assistant message is kept."""
-        messages = [
-            {
-                "role": "assistant",
-                "content": "The answer is 42.",
-                "reasoning_details": [
-                    {"type": "thinking", "thinking": "Deep thought.", "signature": "sig_valid"},
-                ],
-            },
-        ]
-        _, result = convert_messages_to_anthropic(messages)
-        blocks = result[0]["content"]
-        thinking = [b for b in blocks if b.get("type") == "thinking"]
-        assert len(thinking) == 1
-        assert thinking[0]["signature"] == "sig_valid"
-
-    def test_unsigned_thinking_downgraded_to_text_on_last_turn(self):
-        """Unsigned thinking blocks on the last turn become text blocks."""
-        messages = [
-            {
-                "role": "assistant",
-                "content": "Response text.",
-                "reasoning_details": [
-                    {"type": "thinking", "thinking": "Unsigned reasoning."},
-                    # No 'signature' field
-                ],
-            },
-        ]
-        _, result = convert_messages_to_anthropic(messages)
-        blocks = result[0]["content"]
-
-        # No thinking blocks should remain
-        assert not any(b.get("type") == "thinking" for b in blocks)
-        # The reasoning text should be preserved as a text block
-        text_contents = [b.get("text", "") for b in blocks if b.get("type") == "text"]
-        assert "Unsigned reasoning." in text_contents
-
-    def test_redacted_thinking_with_data_preserved(self):
-        """Redacted thinking with 'data' field is kept on last turn."""
-        messages = [
-            {
-                "role": "assistant",
-                "content": "Response.",
-                "reasoning_details": [
-                    {"type": "redacted_thinking", "data": "opaque_signature_data"},
-                ],
-            },
-        ]
-        _, result = convert_messages_to_anthropic(messages)
-        blocks = result[0]["content"]
-        redacted = [b for b in blocks if b.get("type") == "redacted_thinking"]
-        assert len(redacted) == 1
-        assert redacted[0]["data"] == "opaque_signature_data"
-
-    def test_redacted_thinking_without_data_dropped(self):
-        """Redacted thinking without 'data' is dropped — can't be validated."""
-        messages = [
-            {
-                "role": "assistant",
-                "content": "Response.",
-                "reasoning_details": [
-                    {"type": "redacted_thinking"},
-                    # No 'data' field
-                ],
-            },
-        ]
-        _, result = convert_messages_to_anthropic(messages)
-        blocks = result[0]["content"]
-        assert not any(b.get("type") == "redacted_thinking" for b in blocks)
-
-    def test_cache_control_stripped_from_thinking_blocks(self):
-        """cache_control markers are removed from thinking/redacted_thinking blocks."""
-        messages = [
-            {
-                "role": "assistant",
-                "content": "",
-                "tool_calls": [
-                    {"id": "tc_1", "function": {"name": "t", "arguments": "{}"}},
-                ],
-                "reasoning_details": [
-                    {
-                        "type": "thinking",
-                        "thinking": "Reasoning.",
-                        "signature": "sig_1",
-                        "cache_control": {"type": "ephemeral"},
-                    },
-                ],
-            },
-            {"role": "tool", "tool_call_id": "tc_1", "content": "result"},
-        ]
-        _, result = convert_messages_to_anthropic(messages)
-        assistant = next(m for m in result if m["role"] == "assistant")
-        for block in assistant["content"]:
-            if block.get("type") in ("thinking", "redacted_thinking"):
-                assert "cache_control" not in block
-
-    def test_thinking_stripped_from_merged_consecutive_assistants(self):
-        """When consecutive assistants are merged, second one's thinking is dropped."""
-        messages = [
-            {
-                "role": "assistant",
-                "content": "First response.",
-                "reasoning_details": [
-                    {"type": "thinking", "thinking": "First thought.", "signature": "sig_1"},
-                ],
-            },
-            {
-                "role": "assistant",
-                "content": "Second response.",
-                "reasoning_details": [
-                    {"type": "thinking", "thinking": "Second thought.", "signature": "sig_2"},
-                ],
-            },
-        ]
-        _, result = convert_messages_to_anthropic(messages)
-
-        # Should be merged into one assistant message
-        assistants = [m for m in result if m["role"] == "assistant"]
-        assert len(assistants) == 1
-
-        # Only the first thinking block should remain (signed, on the last/only assistant)
-        blocks = assistants[0]["content"]
-        thinking = [b for b in blocks if b.get("type") == "thinking"]
-        assert len(thinking) == 1
-        assert thinking[0]["thinking"] == "First thought."
-
-    def test_empty_content_after_strip_gets_placeholder(self):
-        """If stripping thinking leaves an empty message, a placeholder is added."""
-        messages = [
-            {
-                "role": "assistant",
-                "content": "",
-                "reasoning_details": [
-                    {"type": "thinking", "thinking": "Only thinking, no text."},
-                    # Unsigned — will be downgraded, but content was empty string
-                ],
-            },
-            {"role": "user", "content": "Next message."},
-            {"role": "assistant", "content": "Final."},
-        ]
-        _, result = convert_messages_to_anthropic(messages)
-        # First assistant is non-last, so thinking is stripped completely.
-        # The original content was empty and thinking was unsigned → placeholder
-        first_assistant = result[0]
-        assert first_assistant["role"] == "assistant"
-        assert len(first_assistant["content"]) >= 1
-
-    def test_multi_turn_conversation_preserves_only_last(self):
-        """Full multi-turn conversation: only last assistant keeps thinking."""
-        messages = [
-            {"role": "user", "content": "Question 1"},
-            {
-                "role": "assistant",
-                "content": "Answer 1",
-                "reasoning_details": [
-                    {"type": "thinking", "thinking": "Thought 1", "signature": "sig_1"},
-                ],
-            },
-            {"role": "user", "content": "Question 2"},
-            {
-                "role": "assistant",
-                "content": "Answer 2",
-                "reasoning_details": [
-                    {"type": "thinking", "thinking": "Thought 2", "signature": "sig_2"},
-                ],
-            },
-            {"role": "user", "content": "Question 3"},
-            {
-                "role": "assistant",
-                "content": "Answer 3",
-                "reasoning_details": [
-                    {"type": "thinking", "thinking": "Thought 3", "signature": "sig_3"},
-                ],
-            },
-        ]
-        _, result = convert_messages_to_anthropic(messages)
-
-        assistants = [m for m in result if m["role"] == "assistant"]
-        assert len(assistants) == 3
-
-        # First two: no thinking blocks
-        for a in assistants[:2]:
-            assert not any(
-                b.get("type") in ("thinking", "redacted_thinking")
-                for b in a["content"]
-                if isinstance(b, dict)
-            )
-
-        # Last one: thinking preserved
-        last_thinking = [
-            b for b in assistants[2]["content"]
-            if isinstance(b, dict) and b.get("type") == "thinking"
-        ]
-        assert len(last_thinking) == 1
-        assert last_thinking[0]["signature"] == "sig_3"
-
-
 # ---------------------------------------------------------------------------
 # Tool choice
 # ---------------------------------------------------------------------------
@@ -77,20 +77,6 @@ class TestReadCodexAccessToken:
        result = _read_codex_access_token()
        assert result == "tok-123"

-    def test_pool_without_selected_entry_falls_back_to_auth_store(self, tmp_path, monkeypatch):
-        hermes_home = tmp_path / "hermes"
-        hermes_home.mkdir(parents=True, exist_ok=True)
-        monkeypatch.setenv("HERMES_HOME", str(hermes_home))
-
-        valid_jwt = "eyJhbGciOiJSUzI1NiJ9.eyJleHAiOjk5OTk5OTk5OTl9.sig"
-        with patch("agent.auxiliary_client._select_pool_entry", return_value=(True, None)), \
-             patch("hermes_cli.auth._read_codex_tokens", return_value={
-                 "tokens": {"access_token": valid_jwt, "refresh_token": "refresh"}
-             }):
-            result = _read_codex_access_token()
-
-        assert result == valid_jwt
-
    def test_missing_returns_none(self, tmp_path, monkeypatch):
        hermes_home = tmp_path / "hermes"
        hermes_home.mkdir(parents=True, exist_ok=True)
@@ -252,24 +238,6 @@ class TestAnthropicOAuthFlag:
        assert mock_build.call_args.args[0] == "sk-ant-oat01-pooled"


-class TestTryCodex:
-    def test_pool_without_selected_entry_falls_back_to_auth_store(self):
-        with (
-            patch("agent.auxiliary_client._select_pool_entry", return_value=(True, None)),
-            patch("agent.auxiliary_client._read_codex_access_token", return_value="codex-auth-token"),
-            patch("agent.auxiliary_client.OpenAI") as mock_openai,
-        ):
-            mock_openai.return_value = MagicMock()
-            from agent.auxiliary_client import _try_codex
-
-            client, model = _try_codex()
-
-        assert client is not None
-        assert model == "gpt-5.2-codex"
-        assert mock_openai.call_args.kwargs["api_key"] == "codex-auth-token"
-        assert mock_openai.call_args.kwargs["base_url"] == "https://chatgpt.com/backend-api/codex"
-
-
 class TestExpiredCodexFallback:
    """Test that expired Codex tokens don't block the auto chain."""

@@ -503,23 +471,6 @@ class TestExplicitProviderRouting:
            client, model = resolve_provider_client("zai")
            assert client is not None

-    def test_explicit_google_alias_uses_gemini_credentials(self):
-        """provider='google' should route through the gemini API-key provider."""
-        with (
-            patch("hermes_cli.auth.resolve_api_key_provider_credentials", return_value={
-                "api_key": "gemini-key",
-                "base_url": "https://generativelanguage.googleapis.com/v1beta/openai",
-            }),
-            patch("agent.auxiliary_client.OpenAI") as mock_openai,
-        ):
-            mock_openai.return_value = MagicMock()
-            client, model = resolve_provider_client("google", model="gemini-3.1-pro-preview")
-
-        assert client is not None
-        assert model == "gemini-3.1-pro-preview"
-        assert mock_openai.call_args.kwargs["api_key"] == "gemini-key"
-        assert mock_openai.call_args.kwargs["base_url"] == "https://generativelanguage.googleapis.com/v1beta/openai"
-
    def test_explicit_unknown_returns_none(self, monkeypatch):
        """Unknown provider should return None."""
        client, model = resolve_provider_client("nonexistent-provider")
@@ -673,15 +624,12 @@ class TestVisionClientFallback:
        assert client is None
        assert model is None

-    def test_vision_auto_includes_active_provider_when_configured(self, monkeypatch):
-        """Active provider appears in available backends when credentials exist."""
-        monkeypatch.setenv("ANTHROPIC_API_KEY", "***")
+    def test_vision_auto_includes_anthropic_when_configured(self, monkeypatch):
+        monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-api03-key")
        with (
            patch("agent.auxiliary_client._read_nous_auth", return_value=None),
-            patch("agent.auxiliary_client._read_main_provider", return_value="anthropic"),
-            patch("agent.auxiliary_client._read_main_model", return_value="claude-sonnet-4"),
            patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()),
-            patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="***"),
+            patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="sk-ant-api03-key"),
        ):
            backends = get_available_vision_backends()

@@ -754,51 +702,88 @@ class TestAuxiliaryPoolAwareness:
        assert call_kwargs["base_url"] == "https://api.githubcopilot.com"
        assert call_kwargs["default_headers"]["Editor-Version"]

-    def test_vision_auto_uses_active_provider_as_fallback(self, monkeypatch):
-        """When no OpenRouter/Nous available, vision auto falls back to active provider."""
-        monkeypatch.setenv("ANTHROPIC_API_KEY", "***")
+    def test_vision_auto_uses_anthropic_when_no_higher_priority_backend(self, monkeypatch):
+        monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-api03-key")
        with (
            patch("agent.auxiliary_client._read_nous_auth", return_value=None),
-            patch("agent.auxiliary_client._read_main_provider", return_value="anthropic"),
-            patch("agent.auxiliary_client._read_main_model", return_value="claude-sonnet-4"),
            patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()),
-            patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="***"),
+            patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="sk-ant-api03-key"),
        ):
            client, model = get_vision_auxiliary_client()

        assert client is not None
        assert client.__class__.__name__ == "AnthropicAuxiliaryClient"
+        assert model == "claude-haiku-4-5-20251001"

-    def test_vision_auto_prefers_active_provider_over_openrouter(self, monkeypatch):
-        """Active provider is tried before OpenRouter in vision auto."""
+    def test_selected_anthropic_provider_is_preferred_for_vision_auto(self, monkeypatch):
        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
-        monkeypatch.setenv("ANTHROPIC_API_KEY", "***")
+        monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-api03-key")
+
+        def fake_load_config():
+            return {"model": {"provider": "anthropic", "default": "claude-sonnet-4-6"}}

        with (
            patch("agent.auxiliary_client._read_nous_auth", return_value=None),
-            patch("agent.auxiliary_client._read_main_provider", return_value="anthropic"),
-            patch("agent.auxiliary_client._read_main_model", return_value="claude-sonnet-4"),
            patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()),
-            patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="***"),
+            patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="sk-ant-api03-key"),
+            patch("agent.auxiliary_client.OpenAI") as mock_openai,
+            patch("hermes_cli.config.load_config", fake_load_config),
+        ):
+            client, model = get_vision_auxiliary_client()
+
+        assert client is not None
+        assert client.__class__.__name__ == "AnthropicAuxiliaryClient"
+        assert model == "claude-haiku-4-5-20251001"
+
+    def test_selected_codex_provider_short_circuits_vision_auto(self, monkeypatch):
+        def fake_load_config():
+            return {"model": {"provider": "openai-codex", "default": "gpt-5.2-codex"}}
+
+        codex_client = MagicMock()
+        with (
+            patch("hermes_cli.config.load_config", fake_load_config),
+            patch("agent.auxiliary_client._try_codex", return_value=(codex_client, "gpt-5.2-codex")) as mock_codex,
+            patch("agent.auxiliary_client._try_openrouter") as mock_openrouter,
+            patch("agent.auxiliary_client._try_nous") as mock_nous,
+            patch("agent.auxiliary_client._try_anthropic") as mock_anthropic,
+            patch("agent.auxiliary_client._try_custom_endpoint") as mock_custom,
        ):
            provider, client, model = resolve_vision_provider_client()

-        # Active provider should win over OpenRouter
-        assert provider == "anthropic"
+        assert provider == "openai-codex"
+        assert client is codex_client
+        assert model == "gpt-5.2-codex"
+        mock_codex.assert_called_once()
+        mock_openrouter.assert_not_called()
+        mock_nous.assert_not_called()
+        mock_anthropic.assert_not_called()
+        mock_custom.assert_not_called()

-    def test_vision_auto_uses_named_custom_as_active_provider(self, monkeypatch):
-        """Named custom provider works as active provider fallback in vision auto."""
+    def test_vision_auto_includes_codex(self, codex_auth_dir):
+        """Codex supports vision (gpt-5.2-codex), so auto mode should use it."""
+        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
+             patch("agent.auxiliary_client.OpenAI"):
+            client, model = get_vision_auxiliary_client()
+        from agent.auxiliary_client import CodexAuxiliaryClient
+        assert isinstance(client, CodexAuxiliaryClient)
+        assert model == "gpt-5.2-codex"
+
+    def test_vision_auto_falls_back_to_custom_endpoint(self, monkeypatch):
+        """Custom endpoint is used as fallback in vision auto mode.
+
+        Many local models (Qwen-VL, LLaVA, etc.) support vision.
+        When no OpenRouter/Nous/Codex is available, try the custom endpoint.
+        """
        monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
        monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False)
        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
             patch("agent.auxiliary_client._select_pool_entry", return_value=(False, None)), \
-             patch("agent.auxiliary_client._read_main_provider", return_value="custom:local"), \
-             patch("agent.auxiliary_client._read_main_model", return_value="my-local-model"), \
-             patch("agent.auxiliary_client.resolve_provider_client",
-                   return_value=(MagicMock(), "my-local-model")) as mock_resolve:
-            provider, client, model = resolve_vision_provider_client()
-        assert client is not None
-        assert provider == "custom:local"
+             patch("agent.auxiliary_client._read_codex_access_token", return_value=None), \
+             patch("agent.auxiliary_client._resolve_custom_runtime",
+                   return_value=("http://localhost:1234/v1", "local-key")), \
+             patch("agent.auxiliary_client.OpenAI") as mock_openai:
+            client, model = get_vision_auxiliary_client()
+        assert client is not None  # Custom endpoint picked up as fallback

    def test_vision_direct_endpoint_override(self, monkeypatch):
        monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
@@ -837,31 +822,6 @@ class TestAuxiliaryPoolAwareness:
        assert model == "google/gemini-3-flash-preview"
        assert client is not None

-    def test_vision_config_google_provider_uses_gemini_credentials(self, monkeypatch):
-        config = {
-            "auxiliary": {
-                "vision": {
-                    "provider": "google",
-                    "model": "gemini-3.1-pro-preview",
-                }
-            }
-        }
-        monkeypatch.setattr("hermes_cli.config.load_config", lambda: config)
-        with (
-            patch("hermes_cli.auth.resolve_api_key_provider_credentials", return_value={
-                "api_key": "gemini-key",
-                "base_url": "https://generativelanguage.googleapis.com/v1beta/openai",
-            }),
-            patch("agent.auxiliary_client.OpenAI") as mock_openai,
-        ):
-            resolved_provider, client, model = resolve_vision_provider_client()
-
-        assert resolved_provider == "gemini"
-        assert client is not None
-        assert model == "gemini-3.1-pro-preview"
-        assert mock_openai.call_args.kwargs["api_key"] == "gemini-key"
-        assert mock_openai.call_args.kwargs["base_url"] == "https://generativelanguage.googleapis.com/v1beta/openai"
-
    def test_vision_forced_main_uses_custom_endpoint(self, monkeypatch):
        """When explicitly forced to 'main', vision CAN use custom endpoint."""
        config = {
@@ -886,14 +846,7 @@ class TestAuxiliaryPoolAwareness:
        monkeypatch.setenv("AUXILIARY_VISION_PROVIDER", "main")
        monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
        monkeypatch.delenv("OPENAI_API_KEY", raising=False)
-        # Clear client cache to avoid stale entries from previous tests
-        from agent.auxiliary_client import _client_cache
-        _client_cache.clear()
        with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
-             patch("agent.auxiliary_client._read_main_provider", return_value=""), \
-             patch("agent.auxiliary_client._read_main_model", return_value=""), \
-             patch("agent.auxiliary_client._select_pool_entry", return_value=(False, None)), \
-             patch("agent.auxiliary_client._resolve_custom_runtime", return_value=(None, None)), \
             patch("agent.auxiliary_client._read_codex_access_token", return_value=None), \
             patch("agent.auxiliary_client._resolve_api_key_provider", return_value=(None, None)):
            client, model = get_vision_auxiliary_client()
@@ -324,10 +324,7 @@ class TestCompressWithClient:
        with patch("agent.context_compressor.get_model_context_length", return_value=100000):
            c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=2, protect_last_n=2)

-        # Last head message (index 1) is "assistant" → summary should be "user".
-        # With min_tail=3, tail = last 3 messages (indices 5-7).
-        # head_last=assistant, tail_first=assistant → summary_role="user", no collision.
-        # Need 8 messages: min_for_compress = 2+3+1 = 6, must have > 6.
+        # Last head message (index 1) is "assistant" → summary should be "user"
        msgs = [
            {"role": "user", "content": "msg 0"},
            {"role": "assistant", "content": "msg 1"},
@@ -335,8 +332,6 @@ class TestCompressWithClient:
            {"role": "assistant", "content": "msg 3"},
            {"role": "user", "content": "msg 4"},
            {"role": "assistant", "content": "msg 5"},
-            {"role": "user", "content": "msg 6"},
-            {"role": "assistant", "content": "msg 7"},
        ]
        with patch("agent.context_compressor.call_llm", return_value=mock_response):
            result = c.compress(msgs)
@@ -465,10 +460,8 @@ class TestCompressWithClient:
            c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=2, protect_last_n=2)

        # Head: [system, user]        → last head = user
-        # Tail: [assistant, user, assistant] → first tail = assistant
+        # Tail: [assistant, user]     → first tail = assistant
        # summary_role="assistant" collides with tail, "user" collides with head → merge
-        # With min_tail=3, tail = last 3 messages (indices 5-7).
-        # Need 8 messages: min_for_compress = 2+3+1 = 6, must have > 6.
        msgs = [
            {"role": "system", "content": "system prompt"},
            {"role": "user", "content": "msg 1"},
@@ -477,7 +470,6 @@ class TestCompressWithClient:
            {"role": "assistant", "content": "msg 4"},   # compressed
            {"role": "assistant", "content": "msg 5"},   # tail start
            {"role": "user", "content": "msg 6"},
-            {"role": "assistant", "content": "msg 7"},
        ]
        with patch("agent.context_compressor.call_llm", return_value=mock_response):
            result = c.compress(msgs)
@@ -489,7 +481,7 @@ class TestCompressWithClient:
            if r1 in ("user", "assistant") and r2 in ("user", "assistant"):
                assert r1 != r2, f"consecutive {r1} at indices {i-1},{i}"

-        # The summary should be merged into the first tail message (assistant at index 5)
+        # The summary should be merged into the first tail message (assistant)
        first_tail = [m for m in result if "msg 5" in (m.get("content") or "")]
        assert len(first_tail) == 1
        assert "summary text" in first_tail[0]["content"]
@@ -504,18 +496,14 @@ class TestCompressWithClient:
        with patch("agent.context_compressor.get_model_context_length", return_value=100000):
            c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=2, protect_last_n=2)

-        # Head=assistant, Tail=assistant → summary_role="user", no collision.
-        # With min_tail=3, tail = last 3 messages (indices 5-7).
-        # Need 8 messages: min_for_compress = 2+3+1 = 6, must have > 6.
+        # Head=assistant, Tail=assistant → summary_role="user", no collision
        msgs = [
            {"role": "user", "content": "msg 0"},
            {"role": "assistant", "content": "msg 1"},
            {"role": "user", "content": "msg 2"},
            {"role": "assistant", "content": "msg 3"},
-            {"role": "user", "content": "msg 4"},
-            {"role": "assistant", "content": "msg 5"},
-            {"role": "user", "content": "msg 6"},
-            {"role": "assistant", "content": "msg 7"},
+            {"role": "assistant", "content": "msg 4"},
+            {"role": "user", "content": "msg 5"},
        ]
        with patch("agent.context_compressor.call_llm", return_value=mock_response):
            result = c.compress(msgs)
@@ -612,158 +600,3 @@ class TestSummaryTargetRatio:
        with patch("agent.context_compressor.get_model_context_length", return_value=100_000):
            c = ContextCompressor(model="test", quiet_mode=True)
        assert c.protect_last_n == 20
-
-
-class TestTokenBudgetTailProtection:
-    """Tests for token-budget-based tail protection (PR #6240).
-
-    The core change: tail protection is now based on a token budget rather
-    than a fixed message count.  This prevents large tool outputs from
-    blocking compaction.
-    """
-
-    @pytest.fixture()
-    def budget_compressor(self):
-        """Compressor with known token budget for tail protection tests."""
-        with patch("agent.context_compressor.get_model_context_length", return_value=200_000):
-            c = ContextCompressor(
-                model="test/model",
-                threshold_percent=0.50,  # 100K threshold
-                protect_first_n=2,
-                protect_last_n=20,
-                quiet_mode=True,
-            )
-            return c
-
-    def test_large_tool_outputs_no_longer_block_compaction(self, budget_compressor):
-        """The motivating scenario: 20 messages with large tool outputs should
-        NOT prevent compaction.  With message-count tail protection they would
-        all be protected, leaving nothing to summarize."""
-        c = budget_compressor
-        messages = [
-            {"role": "user", "content": "Start task"},
-            {"role": "assistant", "content": "On it"},
-        ]
-        # Add 20 messages with large tool outputs (~5K chars each ≈ 1250 tokens)
-        for i in range(10):
-            messages.append({
-                "role": "assistant", "content": None,
-                "tool_calls": [{"function": {"name": f"tool_{i}", "arguments": "{}"}}],
-            })
-            messages.append({
-                "role": "tool", "content": "x" * 5000,
-                "tool_call_id": f"call_{i}",
-            })
-        # Add 3 recent small messages
-        messages.append({"role": "user", "content": "What's the status?"})
-        messages.append({"role": "assistant", "content": "Here's what I found..."})
-        messages.append({"role": "user", "content": "Continue"})
-
-        # The tail cut should NOT protect all 20 tool messages
-        head_end = c.protect_first_n
-        cut = c._find_tail_cut_by_tokens(messages, head_end)
-        tail_size = len(messages) - cut
-        # With token budget, the tail should be much smaller than 20+
-        assert tail_size < 20, f"Tail {tail_size} messages — large tool outputs are blocking compaction"
-        # But at least 3 (hard minimum)
-        assert tail_size >= 3
-
-    def test_min_tail_always_3_messages(self, budget_compressor):
-        """Even with a tiny token budget, at least 3 messages are protected."""
-        c = budget_compressor
-        # Override to a tiny budget
-        c.tail_token_budget = 10
-        messages = [
-            {"role": "user", "content": "hello"},
-            {"role": "assistant", "content": "hi"},
-            {"role": "user", "content": "do something"},
-            {"role": "assistant", "content": "working on it"},
-            {"role": "user", "content": "more work"},
-            {"role": "assistant", "content": "done"},
-            {"role": "user", "content": "thanks"},
-        ]
-        head_end = 2
-        cut = c._find_tail_cut_by_tokens(messages, head_end)
-        tail_size = len(messages) - cut
-        assert tail_size >= 3, f"Tail is only {tail_size} messages, min should be 3"
-
-    def test_soft_ceiling_allows_oversized_message(self, budget_compressor):
-        """The 1.5x soft ceiling allows an oversized message to be included
-        rather than splitting it."""
-        c = budget_compressor
-        # Set a small budget — 500 tokens
-        c.tail_token_budget = 500
-        messages = [
-            {"role": "user", "content": "hello"},
-            {"role": "assistant", "content": "hi"},
-            {"role": "user", "content": "read the file"},
-            # This message is ~600 tokens (> budget of 500, but < 1.5x = 750)
-            {"role": "assistant", "content": "a" * 2400},
-            {"role": "user", "content": "short"},
-            {"role": "assistant", "content": "short reply"},
-            {"role": "user", "content": "continue"},
-        ]
-        head_end = 2
-        cut = c._find_tail_cut_by_tokens(messages, head_end)
-        # The oversized message at index 3 should NOT be the cut point
-        # because 1.5x ceiling = 750 tokens and accumulated would be ~610
-        # (short msgs + oversized msg) which is < 750
-        tail_size = len(messages) - cut
-        assert tail_size >= 3
-
-    def test_small_conversation_still_compresses(self, budget_compressor):
-        """With the new min of 8 messages (head=2 + 3 + 1 guard + 2 middle),
-        a small but compressible conversation should still compress."""
-        c = budget_compressor
-        # 9 messages: head(2) + 4 middle + 3 tail = compressible
-        messages = []
-        for i in range(9):
-            role = "user" if i % 2 == 0 else "assistant"
-            messages.append({"role": role, "content": f"Message {i}"})
-
-        # Should not early-return (needs > protect_first_n + 3 + 1 = 6)
-        # Mock the summary generation to avoid real API call
-        with patch.object(c, "_generate_summary", return_value="Summary of conversation"):
-            result = c.compress(messages, current_tokens=90_000)
-        # Should have compressed (fewer messages than original)
-        assert len(result) < len(messages)
-
-    def test_prune_with_token_budget(self, budget_compressor):
-        """_prune_old_tool_results with protect_tail_tokens respects the budget."""
-        c = budget_compressor
-        messages = [
-            {"role": "user", "content": "start"},
-            {"role": "assistant", "content": None,
-             "tool_calls": [{"function": {"name": "read_file", "arguments": '{"path": "big.txt"}'}}]},
-            {"role": "tool", "content": "x" * 10000, "tool_call_id": "c1"},  # ~2500 tokens
-            {"role": "assistant", "content": None,
-             "tool_calls": [{"function": {"name": "read_file", "arguments": '{"path": "small.txt"}'}}]},
-            {"role": "tool", "content": "y" * 10000, "tool_call_id": "c2"},  # ~2500 tokens
-            {"role": "user", "content": "short recent message"},
-            {"role": "assistant", "content": "short reply"},
-        ]
-        # With a 1000-token budget, only the last couple messages should be protected
-        result, pruned = c._prune_old_tool_results(
-            messages, protect_tail_count=2, protect_tail_tokens=1000,
-        )
-        # At least one old tool result should have been pruned
-        assert pruned >= 1
-
-    def test_prune_without_token_budget_uses_message_count(self, budget_compressor):
-        """Without protect_tail_tokens, falls back to message-count behavior."""
-        c = budget_compressor
-        messages = [
-            {"role": "user", "content": "start"},
-            {"role": "assistant", "content": None,
-             "tool_calls": [{"function": {"name": "tool", "arguments": "{}"}}]},
-            {"role": "tool", "content": "x" * 5000, "tool_call_id": "c1"},
-            {"role": "user", "content": "recent"},
-            {"role": "assistant", "content": "reply"},
-        ]
-        # protect_tail_count=3 means last 3 messages protected
-        result, pruned = c._prune_old_tool_results(
-            messages, protect_tail_count=3,
-        )
-        # Tool at index 2 sits on the edge of the protected tail (last 3 = indices 2, 3, 4),
-        # so whether it is pruned depends on boundary handling; only the return type is checked
-        assert isinstance(pruned, int)
@@ -214,42 +214,6 @@ def test_exhausted_entry_resets_after_ttl(tmp_path, monkeypatch):
    assert entry.last_status == "ok"


-def test_exhausted_402_entry_resets_after_one_hour(tmp_path, monkeypatch):
-    """402-exhausted credentials recover after 1 hour, not 24."""
-    monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes"))
-    _write_auth_store(
-        tmp_path,
-        {
-            "version": 1,
-            "credential_pool": {
-                "openrouter": [
-                    {
-                        "id": "cred-1",
-                        "label": "primary",
-                        "auth_type": "api_key",
-                        "priority": 0,
-                        "source": "manual",
-                        "access_token": "***",
-                        "base_url": "https://openrouter.ai/api/v1",
-                        "last_status": "exhausted",
-                        "last_status_at": time.time() - 3700,  # ~1h2m ago
-                        "last_error_code": 402,
-                    }
-                ]
-            },
-        },
-    )
-
-    from agent.credential_pool import load_pool
-
-    pool = load_pool("openrouter")
-    entry = pool.select()
-
-    assert entry is not None
-    assert entry.id == "cred-1"
-    assert entry.last_status == "ok"
-
-
 def test_explicit_reset_timestamp_overrides_default_429_ttl(tmp_path, monkeypatch):
    monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes"))
    _write_auth_store(
@@ -1,42 +0,0 @@
-"""Tests for MiniMax auxiliary client URL normalization.
-
-MiniMax and MiniMax-CN set inference_base_url to the /anthropic path.
-The auxiliary client uses the OpenAI SDK, which needs /v1 instead.
-"""
-
-import sys
-import os
-
-sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))
-
-from agent.auxiliary_client import _to_openai_base_url
-
-
-class TestToOpenaiBaseUrl:
-    def test_minimax_global_anthropic_suffix_replaced(self):
-        assert _to_openai_base_url("https://api.minimax.io/anthropic") == "https://api.minimax.io/v1"
-
-    def test_minimax_cn_anthropic_suffix_replaced(self):
-        assert _to_openai_base_url("https://api.minimaxi.com/anthropic") == "https://api.minimaxi.com/v1"
-
-    def test_trailing_slash_stripped_before_replace(self):
-        assert _to_openai_base_url("https://api.minimax.io/anthropic/") == "https://api.minimax.io/v1"
-
-    def test_v1_url_unchanged(self):
-        assert _to_openai_base_url("https://api.openai.com/v1") == "https://api.openai.com/v1"
-
-    def test_openrouter_url_unchanged(self):
-        assert _to_openai_base_url("https://openrouter.ai/api/v1") == "https://openrouter.ai/api/v1"
-
-    def test_anthropic_domain_unchanged(self):
-        """api.anthropic.com doesn't end with /anthropic — should be untouched."""
-        assert _to_openai_base_url("https://api.anthropic.com") == "https://api.anthropic.com"
-
-    def test_anthropic_in_subpath_unchanged(self):
-        assert _to_openai_base_url("https://example.com/anthropic/extra") == "https://example.com/anthropic/extra"
-
-    def test_empty_string(self):
-        assert _to_openai_base_url("") == ""
-
-    def test_none(self):
-        assert _to_openai_base_url(None) == ""
@@ -1,105 +0,0 @@
-"""Tests for MiniMax provider hardening — context lengths, thinking guard, catalog."""
-
-
-class TestMinimaxContextLengths:
-    """Verify per-model context length entries for MiniMax models."""
-
-    def test_m1_variants_have_1m_context(self):
-        from agent.model_metadata import DEFAULT_CONTEXT_LENGTHS
-        # Keys are lowercase because the lookup lowercases model names
-        for model in ("minimax-m1", "minimax-m1-40k", "minimax-m1-80k",
-                       "minimax-m1-128k", "minimax-m1-256k"):
-            assert model in DEFAULT_CONTEXT_LENGTHS, f"{model} missing from context lengths"
-            assert DEFAULT_CONTEXT_LENGTHS[model] == 1_000_000, f"{model} expected 1M"
-
-    def test_m2_variants_have_1m_context(self):
-        from agent.model_metadata import DEFAULT_CONTEXT_LENGTHS
-        # Keys are lowercase because the lookup lowercases model names
-        for model in ("minimax-m2.5", "minimax-m2.7"):
-            assert model in DEFAULT_CONTEXT_LENGTHS, f"{model} missing from context lengths"
-            assert DEFAULT_CONTEXT_LENGTHS[model] == 1_048_576, f"{model} expected 1048576"
-
-    def test_minimax_prefix_fallback(self):
-        from agent.model_metadata import DEFAULT_CONTEXT_LENGTHS
-        # The generic "minimax" prefix entry should be 1,048,576 tokens (1M) for unknown models
-        assert DEFAULT_CONTEXT_LENGTHS["minimax"] == 1_048_576
-
-
-
-class TestMinimaxThinkingGuard:
-    """Verify that build_anthropic_kwargs does NOT add thinking params for MiniMax models."""
-
-    def test_no_thinking_for_minimax_m27(self):
-        from agent.anthropic_adapter import build_anthropic_kwargs
-        kwargs = build_anthropic_kwargs(
-            model="MiniMax-M2.7",
-            messages=[{"role": "user", "content": "hello"}],
-            tools=None,
-            max_tokens=4096,
-            reasoning_config={"enabled": True, "effort": "medium"},
-        )
-        assert "thinking" not in kwargs
-        assert "output_config" not in kwargs
-
-    def test_no_thinking_for_minimax_m1(self):
-        from agent.anthropic_adapter import build_anthropic_kwargs
-        kwargs = build_anthropic_kwargs(
-            model="MiniMax-M1-128k",
-            messages=[{"role": "user", "content": "hello"}],
-            tools=None,
-            max_tokens=4096,
-            reasoning_config={"enabled": True, "effort": "high"},
-        )
-        assert "thinking" not in kwargs
-
-    def test_thinking_still_works_for_claude(self):
-        from agent.anthropic_adapter import build_anthropic_kwargs
-        kwargs = build_anthropic_kwargs(
-            model="claude-sonnet-4-20250514",
-            messages=[{"role": "user", "content": "hello"}],
-            tools=None,
-            max_tokens=4096,
-            reasoning_config={"enabled": True, "effort": "medium"},
-        )
-        assert "thinking" in kwargs
-
-
-class TestMinimaxAuxModel:
-    """Verify auxiliary model is standard (not highspeed)."""
-
-    def test_minimax_aux_is_standard(self):
-        from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
-        assert _API_KEY_PROVIDER_AUX_MODELS["minimax"] == "MiniMax-M2.7"
-        assert _API_KEY_PROVIDER_AUX_MODELS["minimax-cn"] == "MiniMax-M2.7"
-
-    def test_minimax_aux_not_highspeed(self):
-        from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
-        assert "highspeed" not in _API_KEY_PROVIDER_AUX_MODELS["minimax"]
-        assert "highspeed" not in _API_KEY_PROVIDER_AUX_MODELS["minimax-cn"]
-
-
-class TestMinimaxModelCatalog:
-    """Verify the model catalog includes M1 family and excludes deprecated models."""
-
-    def test_catalog_includes_m1_family(self):
-        from hermes_cli.models import _PROVIDER_MODELS
-        for provider in ("minimax", "minimax-cn"):
-            models = _PROVIDER_MODELS[provider]
-            assert "MiniMax-M1" in models
-            assert "MiniMax-M1-40k" in models
-            assert "MiniMax-M1-80k" in models
-            assert "MiniMax-M1-128k" in models
-            assert "MiniMax-M1-256k" in models
-
-    def test_catalog_excludes_deprecated(self):
-        from hermes_cli.models import _PROVIDER_MODELS
-        for provider in ("minimax", "minimax-cn"):
-            models = _PROVIDER_MODELS[provider]
-            assert "MiniMax-M2.1" not in models
-
-    def test_catalog_excludes_highspeed(self):
-        from hermes_cli.models import _PROVIDER_MODELS
-        for provider in ("minimax", "minimax-cn"):
-            models = _PROVIDER_MODELS[provider]
-            assert "MiniMax-M2.7-highspeed" not in models
-            assert "MiniMax-M2.5-highspeed" not in models
@@ -1,66 +0,0 @@
-import pytest
-from unittest.mock import MagicMock, patch
-from hermes_cli.plugins import VALID_HOOKS, PluginManager
-import os
-import shutil
-import tempfile
-from cli import HermesCLI
-
-
-def test_session_hooks_in_valid_hooks():
-    """Verify on_session_finalize and on_session_reset are registered as valid hooks."""
-    assert "on_session_finalize" in VALID_HOOKS
-    assert "on_session_reset" in VALID_HOOKS
-
-
-@patch("hermes_cli.plugins.invoke_hook")
-def test_session_finalize_on_reset(mock_invoke_hook):
-    """Verify on_session_finalize fires when /new or /reset is used."""
-    cli = HermesCLI()
-    cli.agent = MagicMock()
-    cli.agent.session_id = "test-session-id"
-
-    # Simulate /new command which triggers on_session_finalize for the old session
-    cli.new_session(silent=True)
-
-    # Check if on_session_finalize was called for the old session
-    mock_invoke_hook.assert_any_call(
-        "on_session_finalize", session_id="test-session-id", platform="cli"
-    )
-    # Check if on_session_reset was called for the new session
-    mock_invoke_hook.assert_any_call(
-        "on_session_reset", session_id=cli.session_id, platform="cli"
-    )
-
-
-@patch("hermes_cli.plugins.invoke_hook")
-def test_session_finalize_on_cleanup(mock_invoke_hook):
-    """Verify on_session_finalize fires during CLI exit cleanup."""
-    import cli as cli_mod
-
-    mock_agent = MagicMock()
-    mock_agent.session_id = "cleanup-session-id"
-    cli_mod._active_agent_ref = mock_agent
-    cli_mod._cleanup_done = False
-
-    cli_mod._run_cleanup()
-
-    mock_invoke_hook.assert_any_call(
-        "on_session_finalize", session_id="cleanup-session-id", platform="cli"
-    )
-
-
-@patch("hermes_cli.plugins.invoke_hook")
-def test_hook_errors_are_caught(mock_invoke_hook):
-    """Verify hook exceptions are caught and don't crash the agent."""
-    mgr = PluginManager()
-
-    # Register a hook that raises
-    def bad_callback(**kwargs):
-        raise Exception("Hook failed")
-
-    mgr._hooks["on_session_finalize"] = [bad_callback]
-
-    # This should not raise
-    results = mgr.invoke_hook("on_session_finalize", session_id="test", platform="cli")
-    assert results == []
@@ -33,13 +33,6 @@ def git_repo(tmp_path):
        ["git", "commit", "-m", "Initial commit"],
        cwd=repo, capture_output=True,
    )
-    # Add a fake remote ref so cleanup logic sees the initial commit as
-    # "pushed".  Without this, `git log HEAD --not --remotes` treats every
-    # commit as unpushed and cleanup refuses to delete worktrees.
-    subprocess.run(
-        ["git", "update-ref", "refs/remotes/origin/main", "HEAD"],
-        cwd=repo, capture_output=True,
-    )
    return repo


@@ -88,11 +81,7 @@ def _setup_worktree(repo_root):


 def _cleanup_worktree(info):
-    """Test version of _cleanup_worktree.
-
-    Preserves the worktree only if it has unpushed commits.
-    Dirty working tree alone is not enough to keep it.
-    """
+    """Test version of _cleanup_worktree."""
    wt_path = info["path"]
    branch = info["branch"]
    repo_root = info["repo_root"]
@@ -100,15 +89,15 @@ def _cleanup_worktree(info):
    if not Path(wt_path).exists():
        return

-    # Check for unpushed commits
-    result = subprocess.run(
-        ["git", "log", "--oneline", "HEAD", "--not", "--remotes"],
+    # Check for uncommitted changes
+    status = subprocess.run(
+        ["git", "status", "--porcelain"],
        capture_output=True, text=True, timeout=10, cwd=wt_path,
    )
-    has_unpushed = bool(result.stdout.strip())
+    has_changes = bool(status.stdout.strip())

-    if has_unpushed:
-        return False  # Did not clean up — has unpushed commits
+    if has_changes:
+        return False  # Did not clean up

    subprocess.run(
        ["git", "worktree", "remove", wt_path, "--force"],
@@ -215,45 +204,20 @@ class TestWorktreeCleanup:
        assert result is True
        assert not Path(info["path"]).exists()

-    def test_dirty_worktree_cleaned_when_no_unpushed(self, git_repo):
-        """Dirty working tree without unpushed commits is cleaned up.
-
-        Agent sessions typically leave untracked files / artifacts behind.
-        Since all real work is in pushed commits, these don't warrant
-        keeping the worktree.
-        """
+    def test_dirty_worktree_kept(self, git_repo):
        info = _setup_worktree(str(git_repo))
        assert info is not None

-        # Make uncommitted changes (untracked file)
+        # Make uncommitted changes
        (Path(info["path"]) / "new-file.txt").write_text("uncommitted")
        subprocess.run(
            ["git", "add", "new-file.txt"],
            cwd=info["path"], capture_output=True,
        )

-        # The git_repo fixture already has a fake remote ref so the initial
-        # commit is seen as "pushed".  No unpushed commits → cleanup proceeds.
        result = _cleanup_worktree(info)
-        assert result is True  # Cleaned up despite dirty working tree
-        assert not Path(info["path"]).exists()
-
-    def test_worktree_with_unpushed_commits_kept(self, git_repo):
-        """Worktree with unpushed commits is preserved."""
-        info = _setup_worktree(str(git_repo))
-        assert info is not None
-
-        # Make a commit that is NOT on any remote
-        (Path(info["path"]) / "work.txt").write_text("real work")
-        subprocess.run(["git", "add", "work.txt"], cwd=info["path"], capture_output=True)
-        subprocess.run(
-            ["git", "commit", "-m", "agent work"],
-            cwd=info["path"], capture_output=True,
-        )
-
-        result = _cleanup_worktree(info)
-        assert result is False  # Kept — has unpushed commits
-        assert Path(info["path"]).exists()
+        assert result is False
+        assert Path(info["path"]).exists()  # Still there

    def test_branch_deleted_on_cleanup(self, git_repo):
        info = _setup_worktree(str(git_repo))
@@ -403,7 +367,7 @@ class TestMultipleWorktrees:
        lines = [l for l in result.stdout.strip().splitlines() if l.strip()]
        assert len(lines) == 11

-        # Cleanup all (git_repo fixture has a fake remote ref so cleanup works)
+        # Cleanup all
        for info in worktrees:
            # Discard changes first so cleanup works
            subprocess.run(
@@ -528,77 +492,33 @@ class TestStaleWorktreePruning:
        assert not pruned
        assert Path(info["path"]).exists()

-    def test_keeps_old_worktree_with_unpushed_commits(self, git_repo):
-        """Old worktrees (24-72h) with unpushed commits should NOT be pruned."""
+    def test_keeps_dirty_old_worktree(self, git_repo):
+        """Old worktrees with uncommitted changes should NOT be pruned."""
        import time

        info = _setup_worktree(str(git_repo))
        assert info is not None

-        # Make an unpushed commit
-        (Path(info["path"]) / "work.txt").write_text("real work")
-        subprocess.run(["git", "add", "work.txt"], cwd=info["path"], capture_output=True)
+        # Make it dirty
+        (Path(info["path"]) / "dirty.txt").write_text("uncommitted")
        subprocess.run(
-            ["git", "commit", "-m", "agent work"],
+            ["git", "add", "dirty.txt"],
            cwd=info["path"], capture_output=True,
        )

-        # Make it old (25h — in the 24-72h soft tier)
+        # Make it old
        old_time = time.time() - (25 * 3600)
        os.utime(info["path"], (old_time, old_time))

-        # Check for unpushed commits (simulates prune logic)
-        result = subprocess.run(
-            ["git", "log", "--oneline", "HEAD", "--not", "--remotes"],
+        # Check if it would be pruned
+        status = subprocess.run(
+            ["git", "status", "--porcelain"],
            capture_output=True, text=True, cwd=info["path"],
        )
-        has_unpushed = bool(result.stdout.strip())
-        assert has_unpushed  # Has unpushed commits → not pruned in soft tier
+        has_changes = bool(status.stdout.strip())
+        assert has_changes  # Should be dirty → not pruned
        assert Path(info["path"]).exists()

-    def test_force_prunes_very_old_worktree(self, git_repo):
-        """Worktrees older than 72h should be force-pruned regardless."""
-        import time
-
-        info = _setup_worktree(str(git_repo))
-        assert info is not None
-
-        # Make an unpushed commit (would normally protect it)
-        (Path(info["path"]) / "work.txt").write_text("stale work")
-        subprocess.run(["git", "add", "work.txt"], cwd=info["path"], capture_output=True)
-        subprocess.run(
-            ["git", "commit", "-m", "old agent work"],
-            cwd=info["path"], capture_output=True,
-        )
-
-        # Make it very old (73h — beyond the 72h hard threshold)
-        old_time = time.time() - (73 * 3600)
-        os.utime(info["path"], (old_time, old_time))
-
-        # Simulate the force-prune tier check
-        hard_cutoff = time.time() - (72 * 3600)
-        mtime = Path(info["path"]).stat().st_mtime
-        assert mtime <= hard_cutoff  # Should qualify for force removal
-
-        # Actually remove it (simulates _prune_stale_worktrees force path)
-        branch_result = subprocess.run(
-            ["git", "branch", "--show-current"],
-            capture_output=True, text=True, timeout=5, cwd=info["path"],
-        )
-        branch = branch_result.stdout.strip()
-
-        subprocess.run(
-            ["git", "worktree", "remove", info["path"], "--force"],
-            capture_output=True, text=True, timeout=15, cwd=str(git_repo),
-        )
-        if branch:
-            subprocess.run(
-                ["git", "branch", "-D", branch],
-                capture_output=True, text=True, timeout=10, cwd=str(git_repo),
-            )
-
-        assert not Path(info["path"]).exists()
-

 class TestEdgeCases:
    """Test edge cases for robustness."""
@@ -691,133 +611,6 @@ class TestTerminalCWDIntegration:
        assert result.stdout.strip() == "true"


-class TestOrphanedBranchPruning:
-    """Test cleanup of orphaned hermes/* and pr-* branches."""
-
-    def test_prunes_orphaned_hermes_branch(self, git_repo):
-        """hermes/hermes-* branches with no worktree should be deleted."""
-        # Create a branch that looks like a worktree branch but has no worktree
-        subprocess.run(
-            ["git", "branch", "hermes/hermes-deadbeef", "HEAD"],
-            cwd=str(git_repo), capture_output=True,
-        )
-
-        # Verify it exists
-        result = subprocess.run(
-            ["git", "branch", "--list", "hermes/hermes-deadbeef"],
-            capture_output=True, text=True, cwd=str(git_repo),
-        )
-        assert "hermes/hermes-deadbeef" in result.stdout
-
-        # Simulate _prune_orphaned_branches logic
-        result = subprocess.run(
-            ["git", "branch", "--format=%(refname:short)"],
-            capture_output=True, text=True, cwd=str(git_repo),
-        )
-        all_branches = [b.strip() for b in result.stdout.strip().split("\n") if b.strip()]
-
-        wt_result = subprocess.run(
-            ["git", "worktree", "list", "--porcelain"],
-            capture_output=True, text=True, cwd=str(git_repo),
-        )
-        active_branches = {"main"}
-        for line in wt_result.stdout.split("\n"):
-            if line.startswith("branch refs/heads/"):
-                active_branches.add(line.split("branch refs/heads/", 1)[-1].strip())
-
-        orphaned = [
-            b for b in all_branches
-            if b not in active_branches
-            and (b.startswith("hermes/hermes-") or b.startswith("pr-"))
-        ]
-        assert "hermes/hermes-deadbeef" in orphaned
-
-        # Delete them
-        if orphaned:
-            subprocess.run(
-                ["git", "branch", "-D"] + orphaned,
-                capture_output=True, text=True, cwd=str(git_repo),
-            )
-
-        # Verify gone
-        result = subprocess.run(
-            ["git", "branch", "--list", "hermes/hermes-deadbeef"],
-            capture_output=True, text=True, cwd=str(git_repo),
-        )
-        assert "hermes/hermes-deadbeef" not in result.stdout
-
-    def test_prunes_orphaned_pr_branch(self, git_repo):
-        """pr-* branches should be deleted during pruning."""
-        subprocess.run(
-            ["git", "branch", "pr-1234", "HEAD"],
-            cwd=str(git_repo), capture_output=True,
-        )
-        subprocess.run(
-            ["git", "branch", "pr-5678", "HEAD"],
-            cwd=str(git_repo), capture_output=True,
-        )
-
-        result = subprocess.run(
-            ["git", "branch", "--format=%(refname:short)"],
-            capture_output=True, text=True, cwd=str(git_repo),
-        )
-        all_branches = [b.strip() for b in result.stdout.strip().split("\n") if b.strip()]
-
-        active_branches = {"main"}
-        orphaned = [
-            b for b in all_branches
-            if b not in active_branches and b.startswith("pr-")
-        ]
-        assert "pr-1234" in orphaned
-        assert "pr-5678" in orphaned
-
-        subprocess.run(
-            ["git", "branch", "-D"] + orphaned,
-            capture_output=True, text=True, cwd=str(git_repo),
-        )
-
-        # Verify gone
-        result = subprocess.run(
-            ["git", "branch", "--format=%(refname:short)"],
-            capture_output=True, text=True, cwd=str(git_repo),
-        )
-        remaining = result.stdout.strip()
-        assert "pr-1234" not in remaining
-        assert "pr-5678" not in remaining
-
-    def test_preserves_active_worktree_branch(self, git_repo):
-        """Branches with active worktrees should NOT be pruned."""
-        info = _setup_worktree(str(git_repo))
-        assert info is not None
-
-        result = subprocess.run(
-            ["git", "worktree", "list", "--porcelain"],
-            capture_output=True, text=True, cwd=str(git_repo),
-        )
-        active_branches = set()
-        for line in result.stdout.split("\n"):
-            if line.startswith("branch refs/heads/"):
-                active_branches.add(line.split("branch refs/heads/", 1)[-1].strip())
-
-        assert info["branch"] in active_branches  # Protected
-
-    def test_preserves_main_branch(self, git_repo):
-        """main branch should never be pruned."""
-        result = subprocess.run(
-            ["git", "branch", "--format=%(refname:short)"],
-            capture_output=True, text=True, cwd=str(git_repo),
-        )
-        all_branches = [b.strip() for b in result.stdout.strip().split("\n") if b.strip()]
-        active_branches = {"main"}
-
-        orphaned = [
-            b for b in all_branches
-            if b not in active_branches
-            and (b.startswith("hermes/hermes-") or b.startswith("pr-"))
-        ]
-        assert "main" not in orphaned
-
-
 class TestSystemPromptInjection:
    """Test that the agent gets worktree context in its system prompt."""

@@ -832,7 +625,7 @@ class TestSystemPromptInjection:
            f"{info['path']}. Your branch is `{info['branch']}`. "
            f"Changes here do not affect the main working tree or other agents. "
            f"Remember to commit and push your changes, and create a PR if appropriate. "
-            f"The original repo is at {info['repo_root']}.]\n"
+            f"The original repo is at {info['repo_root']}.]"
        )

        assert info["path"] in wt_note
@@ -339,36 +339,6 @@ class TestMarkJobRun:
        assert updated["last_status"] == "error"
        assert updated["last_error"] == "timeout"

-    def test_delivery_error_tracked_separately(self, tmp_cron_dir):
-        """Agent succeeds but delivery fails — both tracked independently."""
-        job = create_job(prompt="Report", schedule="every 1h")
-        mark_job_run(job["id"], success=True, delivery_error="platform 'telegram' not configured")
-        updated = get_job(job["id"])
-        assert updated["last_status"] == "ok"
-        assert updated["last_error"] is None
-        assert updated["last_delivery_error"] == "platform 'telegram' not configured"
-
-    def test_delivery_error_cleared_on_success(self, tmp_cron_dir):
-        """Successful delivery clears the previous delivery error."""
-        job = create_job(prompt="Report", schedule="every 1h")
-        mark_job_run(job["id"], success=True, delivery_error="network timeout")
-        updated = get_job(job["id"])
-        assert updated["last_delivery_error"] == "network timeout"
-        # Next run delivers successfully
-        mark_job_run(job["id"], success=True, delivery_error=None)
-        updated = get_job(job["id"])
-        assert updated["last_delivery_error"] is None
-
-    def test_both_agent_and_delivery_error(self, tmp_cron_dir):
-        """Agent fails AND delivery fails — both errors recorded."""
-        job = create_job(prompt="Report", schedule="every 1h")
-        mark_job_run(job["id"], success=False, error="model timeout",
-                     delivery_error="platform 'discord' not enabled")
-        updated = get_job(job["id"])
-        assert updated["last_status"] == "error"
-        assert updated["last_error"] == "model timeout"
-        assert updated["last_delivery_error"] == "platform 'discord' not enabled"
-

 class TestAdvanceNextRun:
    """Tests for advance_next_run() — crash-safety for recurring jobs."""
@@ -508,90 +508,6 @@ class TestDeliverResultWrapping:
        assert send_mock.call_args.kwargs["thread_id"] == "17585"


-class TestDeliverResultErrorReturns:
-    """Verify _deliver_result returns error strings on failure, None on success."""
-
-    def test_returns_none_on_successful_delivery(self):
-        from gateway.config import Platform
-
-        pconfig = MagicMock()
-        pconfig.enabled = True
-        mock_cfg = MagicMock()
-        mock_cfg.platforms = {Platform.TELEGRAM: pconfig}
-
-        with patch("gateway.config.load_gateway_config", return_value=mock_cfg), \
-             patch("tools.send_message_tool._send_to_platform", new=AsyncMock(return_value={"success": True})):
-            job = {
-                "id": "ok-job",
-                "deliver": "origin",
-                "origin": {"platform": "telegram", "chat_id": "123"},
-            }
-            result = _deliver_result(job, "Output.")
-        assert result is None
-
-    def test_returns_none_for_local_delivery(self):
-        """local-only jobs don't deliver — not a failure."""
-        job = {"id": "local-job", "deliver": "local"}
-        result = _deliver_result(job, "Output.")
-        assert result is None
-
-    def test_returns_error_for_unknown_platform(self):
-        job = {
-            "id": "bad-platform",
-            "deliver": "origin",
-            "origin": {"platform": "fax", "chat_id": "123"},
-        }
-        with patch("gateway.config.load_gateway_config"):
-            result = _deliver_result(job, "Output.")
-        assert result is not None
-        assert "unknown platform" in result
-
-    def test_returns_error_when_platform_disabled(self):
-        from gateway.config import Platform
-
-        pconfig = MagicMock()
-        pconfig.enabled = False
-        mock_cfg = MagicMock()
-        mock_cfg.platforms = {Platform.TELEGRAM: pconfig}
-
-        with patch("gateway.config.load_gateway_config", return_value=mock_cfg):
-            job = {
-                "id": "disabled",
-                "deliver": "origin",
-                "origin": {"platform": "telegram", "chat_id": "123"},
-            }
-            result = _deliver_result(job, "Output.")
-        assert result is not None
-        assert "not configured" in result
-
-    def test_returns_error_on_send_failure(self):
-        from gateway.config import Platform
-
-        pconfig = MagicMock()
-        pconfig.enabled = True
-        mock_cfg = MagicMock()
-        mock_cfg.platforms = {Platform.TELEGRAM: pconfig}
-
-        with patch("gateway.config.load_gateway_config", return_value=mock_cfg), \
-             patch("tools.send_message_tool._send_to_platform", new=AsyncMock(return_value={"error": "rate limited"})):
-            job = {
-                "id": "rate-limited",
-                "deliver": "origin",
-                "origin": {"platform": "telegram", "chat_id": "123"},
-            }
-            result = _deliver_result(job, "Output.")
-        assert result is not None
-        assert "rate limited" in result
-
-    def test_returns_error_for_unresolved_target(self, monkeypatch):
-        """Non-local delivery with no resolvable target should return an error."""
-        monkeypatch.delenv("TELEGRAM_HOME_CHANNEL", raising=False)
-        job = {"id": "no-target", "deliver": "telegram"}
-        result = _deliver_result(job, "Output.")
-        assert result is not None
-        assert "no delivery target" in result
-
-
 class TestRunJobSessionPersistence:
    def test_run_job_passes_session_db_and_cron_platform(self, tmp_path):
        job = {
@@ -1,361 +0,0 @@
-"""Tests for the BlueBubbles iMessage gateway adapter."""
-import pytest
-
-from gateway.config import Platform, PlatformConfig
-
-
-def _make_adapter(monkeypatch, **extra):
-    monkeypatch.setenv("BLUEBUBBLES_SERVER_URL", "http://localhost:1234")
-    monkeypatch.setenv("BLUEBUBBLES_PASSWORD", "secret")
-    from gateway.platforms.bluebubbles import BlueBubblesAdapter
-
-    cfg = PlatformConfig(
-        enabled=True,
-        extra={
-            "server_url": "http://localhost:1234",
-            "password": "secret",
-            **extra,
-        },
-    )
-    return BlueBubblesAdapter(cfg)
-
-
-class TestBlueBubblesPlatformEnum:
-    def test_bluebubbles_enum_exists(self):
-        assert Platform.BLUEBUBBLES.value == "bluebubbles"
-
-
-class TestBlueBubblesConfigLoading:
-    def test_apply_env_overrides_bluebubbles(self, monkeypatch):
-        monkeypatch.setenv("BLUEBUBBLES_SERVER_URL", "http://localhost:1234")
-        monkeypatch.setenv("BLUEBUBBLES_PASSWORD", "secret")
-        monkeypatch.setenv("BLUEBUBBLES_WEBHOOK_PORT", "9999")
-        from gateway.config import GatewayConfig, _apply_env_overrides
-
-        config = GatewayConfig()
-        _apply_env_overrides(config)
-        assert Platform.BLUEBUBBLES in config.platforms
-        bc = config.platforms[Platform.BLUEBUBBLES]
-        assert bc.enabled is True
-        assert bc.extra["server_url"] == "http://localhost:1234"
-        assert bc.extra["password"] == "secret"
-        assert bc.extra["webhook_port"] == 9999
-
-    def test_connected_platforms_includes_bluebubbles(self, monkeypatch):
-        monkeypatch.setenv("BLUEBUBBLES_SERVER_URL", "http://localhost:1234")
-        monkeypatch.setenv("BLUEBUBBLES_PASSWORD", "secret")
-        from gateway.config import GatewayConfig, _apply_env_overrides
-
-        config = GatewayConfig()
-        _apply_env_overrides(config)
-        assert Platform.BLUEBUBBLES in config.get_connected_platforms()
-
-    def test_home_channel_set_from_env(self, monkeypatch):
-        monkeypatch.setenv("BLUEBUBBLES_SERVER_URL", "http://localhost:1234")
-        monkeypatch.setenv("BLUEBUBBLES_PASSWORD", "secret")
-        monkeypatch.setenv("BLUEBUBBLES_HOME_CHANNEL", "user@example.com")
-        from gateway.config import GatewayConfig, _apply_env_overrides
-
-        config = GatewayConfig()
-        _apply_env_overrides(config)
-        hc = config.platforms[Platform.BLUEBUBBLES].home_channel
-        assert hc is not None
-        assert hc.chat_id == "user@example.com"
-
-    def test_not_connected_without_password(self, monkeypatch):
-        monkeypatch.setenv("BLUEBUBBLES_SERVER_URL", "http://localhost:1234")
-        monkeypatch.delenv("BLUEBUBBLES_PASSWORD", raising=False)
-        from gateway.config import GatewayConfig, _apply_env_overrides
-
-        config = GatewayConfig()
-        _apply_env_overrides(config)
-        assert Platform.BLUEBUBBLES not in config.get_connected_platforms()
-
-
-class TestBlueBubblesHelpers:
-    def test_check_requirements(self, monkeypatch):
-        monkeypatch.setenv("BLUEBUBBLES_SERVER_URL", "http://localhost:1234")
-        monkeypatch.setenv("BLUEBUBBLES_PASSWORD", "secret")
-        from gateway.platforms.bluebubbles import check_bluebubbles_requirements
-
-        assert check_bluebubbles_requirements() is True
-
-    def test_format_message_strips_markdown(self, monkeypatch):
-        adapter = _make_adapter(monkeypatch)
-        assert adapter.format_message("**Hello** `world`") == "Hello world"
-
-    def test_strip_markdown_headers(self, monkeypatch):
-        adapter = _make_adapter(monkeypatch)
-        assert adapter.format_message("## Heading\ntext") == "Heading\ntext"
-
-    def test_strip_markdown_links(self, monkeypatch):
-        adapter = _make_adapter(monkeypatch)
-        assert adapter.format_message("[click here](http://example.com)") == "click here"
-
-    def test_init_normalizes_webhook_path(self, monkeypatch):
-        adapter = _make_adapter(monkeypatch, webhook_path="bluebubbles-webhook")
-        assert adapter.webhook_path == "/bluebubbles-webhook"
-
-    def test_init_preserves_leading_slash(self, monkeypatch):
-        adapter = _make_adapter(monkeypatch, webhook_path="/my-hook")
-        assert adapter.webhook_path == "/my-hook"
-
-    def test_server_url_normalized(self, monkeypatch):
-        adapter = _make_adapter(monkeypatch, server_url="http://localhost:1234/")
-        assert adapter.server_url == "http://localhost:1234"
-
-    def test_server_url_adds_scheme(self, monkeypatch):
-        adapter = _make_adapter(monkeypatch, server_url="localhost:1234")
-        assert adapter.server_url == "http://localhost:1234"
-
-
-class TestBlueBubblesWebhookParsing:
-    def test_webhook_prefers_chat_guid_over_message_guid(self, monkeypatch):
-        adapter = _make_adapter(monkeypatch)
-        payload = {
-            "guid": "MESSAGE-GUID",
-            "chatGuid": "iMessage;-;user@example.com",
-            "chatIdentifier": "user@example.com",
-        }
-        record = adapter._extract_payload_record(payload) or {}
-        chat_guid = adapter._value(
-            record.get("chatGuid"),
-            payload.get("chatGuid"),
-            record.get("chat_guid"),
-            payload.get("chat_guid"),
-            payload.get("guid"),
-        )
-        assert chat_guid == "iMessage;-;user@example.com"
-
-    def test_webhook_can_fall_back_to_sender_when_chat_fields_missing(self, monkeypatch):
-        adapter = _make_adapter(monkeypatch)
-        payload = {
-            "data": {
-                "guid": "MESSAGE-GUID",
-                "text": "hello",
-                "handle": {"address": "user@example.com"},
-                "isFromMe": False,
-            }
-        }
-        record = adapter._extract_payload_record(payload) or {}
-        chat_guid = adapter._value(
-            record.get("chatGuid"),
-            payload.get("chatGuid"),
-            record.get("chat_guid"),
-            payload.get("chat_guid"),
-            payload.get("guid"),
-        )
-        chat_identifier = adapter._value(
-            record.get("chatIdentifier"),
-            record.get("identifier"),
-            payload.get("chatIdentifier"),
-            payload.get("identifier"),
-        )
-        sender = (
-            adapter._value(
-                record.get("handle", {}).get("address")
-                if isinstance(record.get("handle"), dict)
-                else None,
-                record.get("sender"),
-                record.get("from"),
-                record.get("address"),
-            )
-            or chat_identifier
-            or chat_guid
-        )
-        if not (chat_guid or chat_identifier) and sender:
-            chat_identifier = sender
-        assert chat_identifier == "user@example.com"
-
-    def test_extract_payload_record_accepts_list_data(self, monkeypatch):
-        adapter = _make_adapter(monkeypatch)
-        payload = {
-            "type": "new-message",
-            "data": [
-                {
-                    "text": "hello",
-                    "chatGuid": "iMessage;-;user@example.com",
-                    "chatIdentifier": "user@example.com",
-                }
-            ],
-        }
-        record = adapter._extract_payload_record(payload)
-        assert record == payload["data"][0]
-
-    def test_extract_payload_record_dict_data(self, monkeypatch):
-        adapter = _make_adapter(monkeypatch)
-        payload = {"data": {"text": "hello", "chatGuid": "iMessage;-;+1234"}}
-        record = adapter._extract_payload_record(payload)
-        assert record["text"] == "hello"
-
-    def test_extract_payload_record_fallback_to_message(self, monkeypatch):
-        adapter = _make_adapter(monkeypatch)
-        payload = {"message": {"text": "hello"}}
-        record = adapter._extract_payload_record(payload)
-        assert record["text"] == "hello"
-
-
-class TestBlueBubblesGuidResolution:
-    def test_raw_guid_returned_as_is(self, monkeypatch):
-        """If target already contains ';' it's a raw GUID — return unchanged."""
-        adapter = _make_adapter(monkeypatch)
-        import asyncio
-
-        result = asyncio.get_event_loop().run_until_complete(
-            adapter._resolve_chat_guid("iMessage;-;user@example.com")
-        )
-        assert result == "iMessage;-;user@example.com"
-
-    def test_empty_target_returns_none(self, monkeypatch):
-        adapter = _make_adapter(monkeypatch)
-        import asyncio
-
-        result = asyncio.get_event_loop().run_until_complete(
-            adapter._resolve_chat_guid("")
-        )
-        assert result is None
-
-
-class TestBlueBubblesToolsetIntegration:
-    def test_toolset_exists(self):
-        from toolsets import TOOLSETS
-
-        assert "hermes-bluebubbles" in TOOLSETS
-
-    def test_toolset_in_gateway_composite(self):
-        from toolsets import TOOLSETS
-
-        gateway = TOOLSETS["hermes-gateway"]
-        assert "hermes-bluebubbles" in gateway["includes"]
-
-
-class TestBlueBubblesPromptHint:
-    def test_platform_hint_exists(self):
-        from agent.prompt_builder import PLATFORM_HINTS
-
-        assert "bluebubbles" in PLATFORM_HINTS
-        hint = PLATFORM_HINTS["bluebubbles"]
-        assert "iMessage" in hint
-        assert "plain text" in hint
-
-
-class TestBlueBubblesAttachmentDownload:
-    """Verify _download_attachment routes to the correct cache helper."""
-
-    def test_download_image_uses_image_cache(self, monkeypatch):
-        """Image MIME routes to cache_image_from_bytes."""
-        adapter = _make_adapter(monkeypatch)
-        import asyncio
-        import httpx
-
-        # Mock the HTTP client response
-        class MockResponse:
-            status_code = 200
-            content = b"\x89PNG\r\n\x1a\n"
-
-            def raise_for_status(self):
-                pass
-
-        async def mock_get(*args, **kwargs):
-            return MockResponse()
-
-        adapter.client = type("MockClient", (), {"get": mock_get})()
-
-        cached_path = None
-
-        def mock_cache_image(data, ext):
-            nonlocal cached_path
-            cached_path = f"/tmp/test_image{ext}"
-            return cached_path
-
-        monkeypatch.setattr(
-            "gateway.platforms.bluebubbles.cache_image_from_bytes",
-            mock_cache_image,
-        )
-
-        att_meta = {"mimeType": "image/png", "transferName": "photo.png"}
-        result = asyncio.get_event_loop().run_until_complete(
-            adapter._download_attachment("att-guid-123", att_meta)
-        )
-        assert result == "/tmp/test_image.png"
-
-    def test_download_audio_uses_audio_cache(self, monkeypatch):
-        """Audio MIME routes to cache_audio_from_bytes."""
-        adapter = _make_adapter(monkeypatch)
-        import asyncio
-
-        class MockResponse:
-            status_code = 200
-            content = b"fake-audio-data"
-
-            def raise_for_status(self):
-                pass
-
-        async def mock_get(*args, **kwargs):
-            return MockResponse()
-
-        adapter.client = type("MockClient", (), {"get": mock_get})()
-
-        cached_path = None
-
-        def mock_cache_audio(data, ext):
-            nonlocal cached_path
-            cached_path = f"/tmp/test_audio{ext}"
-            return cached_path
-
-        monkeypatch.setattr(
-            "gateway.platforms.bluebubbles.cache_audio_from_bytes",
-            mock_cache_audio,
-        )
-
-        att_meta = {"mimeType": "audio/mpeg", "transferName": "voice.mp3"}
-        result = asyncio.get_event_loop().run_until_complete(
-            adapter._download_attachment("att-guid-456", att_meta)
-        )
-        assert result == "/tmp/test_audio.mp3"
-
-    def test_download_document_uses_document_cache(self, monkeypatch):
-        """Non-image/audio MIME routes to cache_document_from_bytes."""
-        adapter = _make_adapter(monkeypatch)
-        import asyncio
-
-        class MockResponse:
-            status_code = 200
-            content = b"fake-doc-data"
-
-            def raise_for_status(self):
-                pass
-
-        async def mock_get(*args, **kwargs):
-            return MockResponse()
-
-        adapter.client = type("MockClient", (), {"get": mock_get})()
-
-        cached_path = None
-
-        def mock_cache_doc(data, filename):
-            nonlocal cached_path
-            cached_path = f"/tmp/{filename}"
-            return cached_path
-
-        monkeypatch.setattr(
-            "gateway.platforms.bluebubbles.cache_document_from_bytes",
-            mock_cache_doc,
-        )
-
-        att_meta = {"mimeType": "application/pdf", "transferName": "report.pdf"}
-        result = asyncio.get_event_loop().run_until_complete(
-            adapter._download_attachment("att-guid-789", att_meta)
-        )
-        assert result == "/tmp/report.pdf"
-
-    def test_download_returns_none_without_client(self, monkeypatch):
-        """No client → returns None gracefully."""
-        adapter = _make_adapter(monkeypatch)
-        adapter.client = None
-        import asyncio
-
-        result = asyncio.get_event_loop().run_until_complete(
-            adapter._download_attachment("att-guid", {"mimeType": "image/png"})
-        )
-        assert result is None
@@ -209,31 +209,14 @@ class TestIncomingDocumentHandling:
        assert "[Content of readme.md]:" in event.text
        assert "# Title" in event.text

-    @pytest.mark.asyncio
-    async def test_log_content_injected(self, adapter):
-        """.log file under 100KB should be treated as text/plain and injected."""
-        file_content = b"BLE trace line 1\nBLE trace line 2"
-
-        with _mock_aiohttp_download(file_content):
-            msg = make_message(
-                attachments=[make_attachment(filename="btsnoop_hci.log", content_type="text/plain")],
-                content="please inspect this",
-            )
-            await adapter._handle_message(msg)
-
-        event = adapter.handle_message.call_args[0][0]
-        assert "[Content of btsnoop_hci.log]:" in event.text
-        assert "BLE trace line 1" in event.text
-        assert "please inspect this" in event.text
-
    @pytest.mark.asyncio
    async def test_oversized_document_skipped(self, adapter):
-        """A document over 32MB should be skipped — media_urls stays empty."""
+        """A document over 20MB should be skipped — media_urls stays empty."""
        msg = make_message([
            make_attachment(
                filename="huge.pdf",
                content_type="application/pdf",
-                size=33 * 1024 * 1024,
+                size=25 * 1024 * 1024,
            )
        ])
        await adapter._handle_message(msg)
@@ -243,24 +226,6 @@ class TestIncomingDocumentHandling:
        # handler must still be called
        adapter.handle_message.assert_called_once()

-    @pytest.mark.asyncio
-    async def test_mid_sized_zip_under_32mb_is_cached(self, adapter):
-        """A 25MB .zip should be accepted now that Discord documents allow up to 32MB."""
-        msg = make_message([
-            make_attachment(
-                filename="bugreport.zip",
-                content_type="application/zip",
-                size=25 * 1024 * 1024,
-            )
-        ])
-
-        with _mock_aiohttp_download(b"PK\x03\x04test"):
-            await adapter._handle_message(msg)
-
-        event = adapter.handle_message.call_args[0][0]
-        assert len(event.media_urls) == 1
-        assert event.media_types == ["application/zip"]
-
    @pytest.mark.asyncio
    async def test_zip_document_cached(self, adapter):
        """A .zip file should be cached as a supported document."""
@@ -1,277 +0,0 @@
-"""Tests for Discord reply_to_mode functionality.
-
-Covers the threading behavior control for multi-chunk replies:
- "off": Never reply-reference to original message
- "first": Only first chunk uses reply reference (default)
- "all": All chunks reply-reference the original message
-"""
-import os
-import sys
-from types import SimpleNamespace
-from unittest.mock import MagicMock, AsyncMock, patch
-
-import pytest
-
-from gateway.config import PlatformConfig, GatewayConfig, Platform, _apply_env_overrides
-
-
-def _ensure_discord_mock():
-    """Install a mock discord module when discord.py isn't available."""
-    if "discord" in sys.modules and hasattr(sys.modules["discord"], "__file__"):
-        return
-
-    discord_mod = MagicMock()
-    discord_mod.Intents.default.return_value = MagicMock()
-    discord_mod.Client = MagicMock
-    discord_mod.File = MagicMock
-    discord_mod.DMChannel = type("DMChannel", (), {})
-    discord_mod.Thread = type("Thread", (), {})
-    discord_mod.ForumChannel = type("ForumChannel", (), {})
-    discord_mod.ui = SimpleNamespace(View=object, button=lambda *a, **k: (lambda fn: fn), Button=object)
-    discord_mod.ButtonStyle = SimpleNamespace(success=1, primary=2, secondary=2, danger=3, green=1, grey=2, blurple=2, red=3)
-    discord_mod.Color = SimpleNamespace(orange=lambda: 1, green=lambda: 2, blue=lambda: 3, red=lambda: 4, purple=lambda: 5)
-    discord_mod.Interaction = object
-    discord_mod.Embed = MagicMock
-    discord_mod.app_commands = SimpleNamespace(
-        describe=lambda **kwargs: (lambda fn: fn),
-        choices=lambda **kwargs: (lambda fn: fn),
-        Choice=lambda **kwargs: SimpleNamespace(**kwargs),
-    )
-
-    ext_mod = MagicMock()
-    commands_mod = MagicMock()
-    commands_mod.Bot = MagicMock
-    ext_mod.commands = commands_mod
-
-    sys.modules.setdefault("discord", discord_mod)
-    sys.modules.setdefault("discord.ext", ext_mod)
-    sys.modules.setdefault("discord.ext.commands", commands_mod)
-
-
-_ensure_discord_mock()
-
-from gateway.platforms.discord import DiscordAdapter  # noqa: E402
-
-
-@pytest.fixture()
-def adapter_factory():
-    """Factory to create DiscordAdapter with custom reply_to_mode."""
-    def create(reply_to_mode: str = "first"):
-        config = PlatformConfig(enabled=True, token="test-token", reply_to_mode=reply_to_mode)
-        return DiscordAdapter(config)
-    return create
-
-
-class TestReplyToModeConfig:
-    """Tests for reply_to_mode configuration loading."""
-
-    def test_default_mode_is_first(self, adapter_factory):
-        adapter = adapter_factory()
-        assert adapter._reply_to_mode == "first"
-
-    def test_off_mode(self, adapter_factory):
-        adapter = adapter_factory(reply_to_mode="off")
-        assert adapter._reply_to_mode == "off"
-
-    def test_first_mode(self, adapter_factory):
-        adapter = adapter_factory(reply_to_mode="first")
-        assert adapter._reply_to_mode == "first"
-
-    def test_all_mode(self, adapter_factory):
-        adapter = adapter_factory(reply_to_mode="all")
-        assert adapter._reply_to_mode == "all"
-
-    def test_invalid_mode_stored_as_is(self, adapter_factory):
-        """Invalid modes are stored but send() handles them gracefully."""
-        adapter = adapter_factory(reply_to_mode="invalid")
-        assert adapter._reply_to_mode == "invalid"
-
-    def test_none_mode_defaults_to_first(self):
-        config = PlatformConfig(enabled=True, token="test-token")
-        adapter = DiscordAdapter(config)
-        assert adapter._reply_to_mode == "first"
-
-    def test_empty_string_mode_defaults_to_first(self):
-        config = PlatformConfig(enabled=True, token="test-token", reply_to_mode="")
-        adapter = DiscordAdapter(config)
-        assert adapter._reply_to_mode == "first"
-
-
-def _make_discord_adapter(reply_to_mode: str = "first"):
-    """Create a DiscordAdapter with mocked client and channel for send() tests."""
-    config = PlatformConfig(enabled=True, token="test-token", reply_to_mode=reply_to_mode)
-    adapter = DiscordAdapter(config)
-
-    # Mock the Discord client and channel
-    mock_channel = AsyncMock()
-    ref_message = MagicMock()
-    mock_channel.fetch_message = AsyncMock(return_value=ref_message)
-
-    sent_msg = MagicMock()
-    sent_msg.id = 42
-    mock_channel.send = AsyncMock(return_value=sent_msg)
-
-    mock_client = MagicMock()
-    mock_client.get_channel = MagicMock(return_value=mock_channel)
-
-    adapter._client = mock_client
-    return adapter, mock_channel, ref_message
-
-
-class TestSendWithReplyToMode:
-    """Tests for send() method respecting reply_to_mode."""
-
-    @pytest.mark.asyncio
-    async def test_off_mode_no_reply_reference(self):
-        adapter, channel, ref_msg = _make_discord_adapter("off")
-        adapter.truncate_message = lambda content, max_len: ["chunk1", "chunk2", "chunk3"]
-
-        await adapter.send("12345", "test content", reply_to="999")
-
-        # Should never try to fetch the reference message
-        channel.fetch_message.assert_not_called()
-        # All chunks sent without reference
-        for call in channel.send.call_args_list:
-            assert call.kwargs.get("reference") is None
-
-    @pytest.mark.asyncio
-    async def test_first_mode_only_first_chunk_references(self):
-        adapter, channel, ref_msg = _make_discord_adapter("first")
-        adapter.truncate_message = lambda content, max_len: ["chunk1", "chunk2", "chunk3"]
-
-        await adapter.send("12345", "test content", reply_to="999")
-
-        # Should fetch the reference message
-        channel.fetch_message.assert_called_once_with(999)
-        calls = channel.send.call_args_list
-        assert len(calls) == 3
-        assert calls[0].kwargs.get("reference") is ref_msg
-        assert calls[1].kwargs.get("reference") is None
-        assert calls[2].kwargs.get("reference") is None
-
-    @pytest.mark.asyncio
-    async def test_all_mode_all_chunks_reference(self):
-        adapter, channel, ref_msg = _make_discord_adapter("all")
-        adapter.truncate_message = lambda content, max_len: ["chunk1", "chunk2", "chunk3"]
-
-        await adapter.send("12345", "test content", reply_to="999")
-
-        channel.fetch_message.assert_called_once_with(999)
-        calls = channel.send.call_args_list
-        assert len(calls) == 3
-        for call in calls:
-            assert call.kwargs.get("reference") is ref_msg
-
-    @pytest.mark.asyncio
-    async def test_no_reply_to_param_no_reference(self):
-        adapter, channel, ref_msg = _make_discord_adapter("all")
-        adapter.truncate_message = lambda content, max_len: ["chunk1", "chunk2"]
-
-        await adapter.send("12345", "test content", reply_to=None)
-
-        channel.fetch_message.assert_not_called()
-        for call in channel.send.call_args_list:
-            assert call.kwargs.get("reference") is None
-
-    @pytest.mark.asyncio
-    async def test_single_chunk_respects_first_mode(self):
-        adapter, channel, ref_msg = _make_discord_adapter("first")
-        adapter.truncate_message = lambda content, max_len: ["single chunk"]
-
-        await adapter.send("12345", "test", reply_to="999")
-
-        calls = channel.send.call_args_list
-        assert len(calls) == 1
-        assert calls[0].kwargs.get("reference") is ref_msg
-
-    @pytest.mark.asyncio
-    async def test_single_chunk_off_mode(self):
-        adapter, channel, ref_msg = _make_discord_adapter("off")
-        adapter.truncate_message = lambda content, max_len: ["single chunk"]
-
-        await adapter.send("12345", "test", reply_to="999")
-
-        channel.fetch_message.assert_not_called()
-        calls = channel.send.call_args_list
-        assert len(calls) == 1
-        assert calls[0].kwargs.get("reference") is None
-
-    @pytest.mark.asyncio
-    async def test_invalid_mode_falls_back_to_first_behavior(self):
-        """Invalid mode behaves like 'first' — only first chunk gets reference."""
-        adapter, channel, ref_msg = _make_discord_adapter("banana")
-        adapter.truncate_message = lambda content, max_len: ["chunk1", "chunk2"]
-
-        await adapter.send("12345", "test", reply_to="999")
-
-        calls = channel.send.call_args_list
-        assert len(calls) == 2
-        assert calls[0].kwargs.get("reference") is ref_msg
-        assert calls[1].kwargs.get("reference") is None
-
-
-class TestConfigSerialization:
-    """Tests for reply_to_mode serialization (shared with Telegram)."""
-
-    def test_to_dict_includes_reply_to_mode(self):
-        config = PlatformConfig(enabled=True, token="test", reply_to_mode="all")
-        result = config.to_dict()
-        assert result["reply_to_mode"] == "all"
-
-    def test_from_dict_loads_reply_to_mode(self):
-        data = {"enabled": True, "token": "***", "reply_to_mode": "off"}
-        config = PlatformConfig.from_dict(data)
-        assert config.reply_to_mode == "off"
-
-    def test_from_dict_defaults_to_first(self):
-        data = {"enabled": True, "token": "***"}
-        config = PlatformConfig.from_dict(data)
-        assert config.reply_to_mode == "first"
-
-
-class TestEnvVarOverride:
-    """Tests for DISCORD_REPLY_TO_MODE environment variable override."""
-
-    def _make_config(self):
-        config = GatewayConfig()
-        config.platforms[Platform.DISCORD] = PlatformConfig(enabled=True, token="test")
-        return config
-
-    def test_env_var_sets_off_mode(self):
-        config = self._make_config()
-        with patch.dict(os.environ, {"DISCORD_REPLY_TO_MODE": "off"}, clear=False):
-            _apply_env_overrides(config)
-        assert config.platforms[Platform.DISCORD].reply_to_mode == "off"
-
-    def test_env_var_sets_all_mode(self):
-        config = self._make_config()
-        with patch.dict(os.environ, {"DISCORD_REPLY_TO_MODE": "all"}, clear=False):
-            _apply_env_overrides(config)
-        assert config.platforms[Platform.DISCORD].reply_to_mode == "all"
-
-    def test_env_var_case_insensitive(self):
-        config = self._make_config()
-        with patch.dict(os.environ, {"DISCORD_REPLY_TO_MODE": "ALL"}, clear=False):
-            _apply_env_overrides(config)
-        assert config.platforms[Platform.DISCORD].reply_to_mode == "all"
-
-    def test_env_var_invalid_value_ignored(self):
-        config = self._make_config()
-        with patch.dict(os.environ, {"DISCORD_REPLY_TO_MODE": "banana"}, clear=False):
-            _apply_env_overrides(config)
-        assert config.platforms[Platform.DISCORD].reply_to_mode == "first"
-
-    def test_env_var_empty_value_ignored(self):
-        config = self._make_config()
-        with patch.dict(os.environ, {"DISCORD_REPLY_TO_MODE": ""}, clear=False):
-            _apply_env_overrides(config)
-        assert config.platforms[Platform.DISCORD].reply_to_mode == "first"
-
-    def test_env_var_creates_platform_config_if_missing(self):
-        """DISCORD_REPLY_TO_MODE creates PlatformConfig even without DISCORD_BOT_TOKEN."""
-        config = GatewayConfig()
-        assert Platform.DISCORD not in config.platforms
-        with patch.dict(os.environ, {"DISCORD_REPLY_TO_MODE": "off"}, clear=False):
-            _apply_env_overrides(config)
-        assert Platform.DISCORD in config.platforms
-        assert config.platforms[Platform.DISCORD].reply_to_mode == "off"
@@ -1,432 +0,0 @@
-"""Tests for Feishu interactive card approval buttons."""
-
-import asyncio
-import json
-import os
-import sys
-from pathlib import Path
-from types import SimpleNamespace
-from unittest.mock import AsyncMock, MagicMock, Mock, patch
-
-import pytest
-
-# ---------------------------------------------------------------------------
-# Ensure the repo root is importable
-# ---------------------------------------------------------------------------
-_repo = str(Path(__file__).resolve().parents[2])
-if _repo not in sys.path:
-    sys.path.insert(0, _repo)
-
-
-# ---------------------------------------------------------------------------
-# Minimal Feishu mock so FeishuAdapter can be imported without lark-oapi
-# ---------------------------------------------------------------------------
-def _ensure_feishu_mocks():
-    """Provide stubs for lark-oapi / aiohttp.web so the import succeeds."""
-    if "lark_oapi" not in sys.modules:
-        mod = MagicMock()
-        for name in (
-            "lark_oapi", "lark_oapi.api.im.v1",
-            "lark_oapi.event", "lark_oapi.event.callback_type",
-        ):
-            sys.modules.setdefault(name, mod)
-    if "aiohttp" not in sys.modules:
-        aio = MagicMock()
-        sys.modules.setdefault("aiohttp", aio)
-        sys.modules.setdefault("aiohttp.web", aio.web)
-
-
-_ensure_feishu_mocks()
-
-from gateway.config import PlatformConfig
-from gateway.platforms.feishu import FeishuAdapter
-
-
-# ---------------------------------------------------------------------------
-# Helpers
-# ---------------------------------------------------------------------------
-
-def _make_adapter() -> FeishuAdapter:
-    """Create a FeishuAdapter with mocked internals."""
-    config = PlatformConfig(enabled=True)
-    adapter = FeishuAdapter(config)
-    adapter._client = MagicMock()
-    return adapter
-
-
-def _make_card_action_data(
-    action_value: dict,
-    chat_id: str = "oc_12345",
-    open_id: str = "ou_user1",
-    token: str = "tok_abc",
-) -> SimpleNamespace:
-    """Create a mock Feishu card action callback data object."""
-    return SimpleNamespace(
-        event=SimpleNamespace(
-            token=token,
-            context=SimpleNamespace(open_chat_id=chat_id),
-            operator=SimpleNamespace(open_id=open_id),
-            action=SimpleNamespace(
-                tag="button",
-                value=action_value,
-            ),
-        ),
-    )
-
-
-# ===========================================================================
-# send_exec_approval — interactive card with buttons
-# ===========================================================================
-
-class TestFeishuExecApproval:
-    """Test send_exec_approval sends an interactive card."""
-
-    @pytest.mark.asyncio
-    async def test_sends_interactive_card(self):
-        adapter = _make_adapter()
-
-        mock_response = SimpleNamespace(
-            success=lambda: True,
-            data=SimpleNamespace(message_id="msg_001"),
-        )
-        with patch.object(
-            adapter, "_feishu_send_with_retry", new_callable=AsyncMock,
-            return_value=mock_response,
-        ) as mock_send:
-            result = await adapter.send_exec_approval(
-                chat_id="oc_12345",
-                command="rm -rf /important",
-                session_key="agent:main:feishu:group:oc_12345",
-                description="dangerous deletion",
-            )
-
-        assert result.success is True
-        assert result.message_id == "msg_001"
-
-        mock_send.assert_called_once()
-        kwargs = mock_send.call_args[1]
-        assert kwargs["chat_id"] == "oc_12345"
-        assert kwargs["msg_type"] == "interactive"
-
-        # Verify card payload contains the command and buttons
-        card = json.loads(kwargs["payload"])
-        assert card["header"]["template"] == "orange"
-        assert "rm -rf /important" in card["elements"][0]["content"]
-        assert "dangerous deletion" in card["elements"][0]["content"]
-
-        # Check buttons
-        actions = card["elements"][1]["actions"]
-        assert len(actions) == 4
-        action_names = [a["value"]["hermes_action"] for a in actions]
-        assert action_names == [
-            "approve_once", "approve_session", "approve_always", "deny"
-        ]
-
-    @pytest.mark.asyncio
-    async def test_stores_approval_state(self):
-        adapter = _make_adapter()
-
-        mock_response = SimpleNamespace(
-            success=lambda: True,
-            data=SimpleNamespace(message_id="msg_002"),
-        )
-        with patch.object(
-            adapter, "_feishu_send_with_retry", new_callable=AsyncMock,
-            return_value=mock_response,
-        ):
-            await adapter.send_exec_approval(
-                chat_id="oc_12345",
-                command="echo test",
-                session_key="my-session-key",
-            )
-
-        assert len(adapter._approval_state) == 1
-        approval_id = list(adapter._approval_state.keys())[0]
-        state = adapter._approval_state[approval_id]
-        assert state["session_key"] == "my-session-key"
-        assert state["message_id"] == "msg_002"
-        assert state["chat_id"] == "oc_12345"
-
-    @pytest.mark.asyncio
-    async def test_not_connected(self):
-        adapter = _make_adapter()
-        adapter._client = None
-        result = await adapter.send_exec_approval(
-            chat_id="oc_12345", command="ls", session_key="s"
-        )
-        assert result.success is False
-
-    @pytest.mark.asyncio
-    async def test_truncates_long_command(self):
-        adapter = _make_adapter()
-
-        mock_response = SimpleNamespace(
-            success=lambda: True,
-            data=SimpleNamespace(message_id="msg_003"),
-        )
-        with patch.object(
-            adapter, "_feishu_send_with_retry", new_callable=AsyncMock,
-            return_value=mock_response,
-        ) as mock_send:
-            long_cmd = "x" * 5000
-            await adapter.send_exec_approval(
-                chat_id="oc_12345", command=long_cmd, session_key="s"
-            )
-
-        card = json.loads(mock_send.call_args[1]["payload"])
-        content = card["elements"][0]["content"]
-        assert "..." in content
-        assert len(content) < 5000
-
-    @pytest.mark.asyncio
-    async def test_multiple_approvals_get_unique_ids(self):
-        adapter = _make_adapter()
-
-        mock_response = SimpleNamespace(
-            success=lambda: True,
-            data=SimpleNamespace(message_id="msg_x"),
-        )
-        with patch.object(
-            adapter, "_feishu_send_with_retry", new_callable=AsyncMock,
-            return_value=mock_response,
-        ):
-            await adapter.send_exec_approval(
-                chat_id="oc_1", command="cmd1", session_key="s1"
-            )
-            await adapter.send_exec_approval(
-                chat_id="oc_2", command="cmd2", session_key="s2"
-            )
-
-        assert len(adapter._approval_state) == 2
-        ids = list(adapter._approval_state.keys())
-        assert ids[0] != ids[1]
-
-
-# ===========================================================================
-# _handle_card_action_event — approval button clicks
-# ===========================================================================
-
-class TestFeishuApprovalCallback:
-    """Test the approval intercept in _handle_card_action_event."""
-
-    @pytest.mark.asyncio
-    async def test_resolves_approval_on_click(self):
-        adapter = _make_adapter()
-        adapter._approval_state[1] = {
-            "session_key": "agent:main:feishu:group:oc_12345",
-            "message_id": "msg_001",
-            "chat_id": "oc_12345",
-        }
-
-        data = _make_card_action_data(
-            action_value={"hermes_action": "approve_once", "approval_id": 1},
-        )
-
-        with (
-            patch.object(
-                adapter, "_resolve_sender_profile", new_callable=AsyncMock,
-                return_value={"user_id": "ou_user1", "user_name": "Norbert", "user_id_alt": None},
-            ),
-            patch.object(adapter, "_update_approval_card", new_callable=AsyncMock) as mock_update,
-            patch("tools.approval.resolve_gateway_approval", return_value=1) as mock_resolve,
-        ):
-            await adapter._handle_card_action_event(data)
-
-        mock_resolve.assert_called_once_with("agent:main:feishu:group:oc_12345", "once")
-        mock_update.assert_called_once_with("msg_001", "Approved once", "Norbert", "once")
-
-        # State should be cleaned up
-        assert 1 not in adapter._approval_state
-
-    @pytest.mark.asyncio
-    async def test_deny_button(self):
-        adapter = _make_adapter()
-        adapter._approval_state[2] = {
-            "session_key": "some-session",
-            "message_id": "msg_002",
-            "chat_id": "oc_12345",
-        }
-
-        data = _make_card_action_data(
-            action_value={"hermes_action": "deny", "approval_id": 2},
-            token="tok_deny",
-        )
-
-        with (
-            patch.object(
-                adapter, "_resolve_sender_profile", new_callable=AsyncMock,
-                return_value={"user_id": "ou_alice", "user_name": "Alice", "user_id_alt": None},
-            ),
-            patch.object(adapter, "_update_approval_card", new_callable=AsyncMock) as mock_update,
-            patch("tools.approval.resolve_gateway_approval", return_value=1) as mock_resolve,
-        ):
-            await adapter._handle_card_action_event(data)
-
-        mock_resolve.assert_called_once_with("some-session", "deny")
-        mock_update.assert_called_once_with("msg_002", "Denied", "Alice", "deny")
-
-    @pytest.mark.asyncio
-    async def test_session_approval(self):
-        adapter = _make_adapter()
-        adapter._approval_state[3] = {
-            "session_key": "sess-3",
-            "message_id": "msg_003",
-            "chat_id": "oc_99",
-        }
-
-        data = _make_card_action_data(
-            action_value={"hermes_action": "approve_session", "approval_id": 3},
-            token="tok_ses",
-        )
-
-        with (
-            patch.object(
-                adapter, "_resolve_sender_profile", new_callable=AsyncMock,
-                return_value={"user_id": "ou_u", "user_name": "Bob", "user_id_alt": None},
-            ),
-            patch.object(adapter, "_update_approval_card", new_callable=AsyncMock) as mock_update,
-            patch("tools.approval.resolve_gateway_approval", return_value=1) as mock_resolve,
-        ):
-            await adapter._handle_card_action_event(data)
-
-        mock_resolve.assert_called_once_with("sess-3", "session")
-        mock_update.assert_called_once_with("msg_003", "Approved for session", "Bob", "session")
-
-    @pytest.mark.asyncio
-    async def test_always_approval(self):
-        adapter = _make_adapter()
-        adapter._approval_state[4] = {
-            "session_key": "sess-4",
-            "message_id": "msg_004",
-            "chat_id": "oc_55",
-        }
-
-        data = _make_card_action_data(
-            action_value={"hermes_action": "approve_always", "approval_id": 4},
-            token="tok_alw",
-        )
-
-        with (
-            patch.object(
-                adapter, "_resolve_sender_profile", new_callable=AsyncMock,
-                return_value={"user_id": "ou_u", "user_name": "Carol", "user_id_alt": None},
-            ),
-            patch.object(adapter, "_update_approval_card", new_callable=AsyncMock),
-            patch("tools.approval.resolve_gateway_approval", return_value=1) as mock_resolve,
-        ):
-            await adapter._handle_card_action_event(data)
-
-        mock_resolve.assert_called_once_with("sess-4", "always")
-
-    @pytest.mark.asyncio
-    async def test_already_resolved_drops_silently(self):
-        adapter = _make_adapter()
-        # No state for approval_id 99 — already resolved
-
-        data = _make_card_action_data(
-            action_value={"hermes_action": "approve_once", "approval_id": 99},
-            token="tok_gone",
-        )
-
-        with patch("tools.approval.resolve_gateway_approval") as mock_resolve:
-            await adapter._handle_card_action_event(data)
-
-        # Should NOT resolve — already handled
-        mock_resolve.assert_not_called()
-
-    @pytest.mark.asyncio
-    async def test_non_approval_actions_route_normally(self):
-        """Non-approval card actions should still become synthetic commands."""
-        adapter = _make_adapter()
-
-        data = _make_card_action_data(
-            action_value={"custom_action": "something_else"},
-            token="tok_normal",
-        )
-
-        with (
-            patch.object(
-                adapter, "_resolve_sender_profile", new_callable=AsyncMock,
-                return_value={"user_id": "ou_u", "user_name": "Dave", "user_id_alt": None},
-            ),
-            patch.object(adapter, "get_chat_info", new_callable=AsyncMock, return_value={"name": "Test Chat"}),
-            patch.object(adapter, "_handle_message_with_guards", new_callable=AsyncMock) as mock_handle,
-            patch("tools.approval.resolve_gateway_approval") as mock_resolve,
-        ):
-            await adapter._handle_card_action_event(data)
-
-        # Should NOT resolve any approval
-        mock_resolve.assert_not_called()
-        # Should have routed as synthetic command
-        mock_handle.assert_called_once()
-        event = mock_handle.call_args[0][0]
-        assert "/card button" in event.text
-
-
-# ===========================================================================
-# _update_approval_card — card replacement after resolution
-# ===========================================================================
-
-class TestFeishuUpdateApprovalCard:
-    """Test the card update after approval resolution."""
-
-    @pytest.mark.asyncio
-    async def test_updates_card_on_approve(self):
-        adapter = _make_adapter()
-
-        mock_update = AsyncMock()
-        adapter._client.im.v1.message.update = MagicMock()
-
-        with patch("asyncio.to_thread", new_callable=AsyncMock) as mock_thread:
-            await adapter._update_approval_card(
-                "msg_001", "Approved once", "Norbert", "once"
-            )
-
-        mock_thread.assert_called_once()
-        # Verify the update request was built
-        call_args = mock_thread.call_args
-        assert call_args[0][0] == adapter._client.im.v1.message.update
-
-    @pytest.mark.asyncio
-    async def test_updates_card_on_deny(self):
-        adapter = _make_adapter()
-
-        with patch("asyncio.to_thread", new_callable=AsyncMock) as mock_thread:
-            await adapter._update_approval_card(
-                "msg_002", "Denied", "Alice", "deny"
-            )
-
-        mock_thread.assert_called_once()
-
-    @pytest.mark.asyncio
-    async def test_skips_update_when_not_connected(self):
-        adapter = _make_adapter()
-        adapter._client = None
-
-        with patch("asyncio.to_thread", new_callable=AsyncMock) as mock_thread:
-            await adapter._update_approval_card(
-                "msg_001", "Approved", "Bob", "once"
-            )
-
-        mock_thread.assert_not_called()
-
-    @pytest.mark.asyncio
-    async def test_skips_update_when_no_message_id(self):
-        adapter = _make_adapter()
-
-        with patch("asyncio.to_thread", new_callable=AsyncMock) as mock_thread:
-            await adapter._update_approval_card(
-                "", "Approved", "Bob", "once"
-            )
-
-        mock_thread.assert_not_called()
-
-    @pytest.mark.asyncio
-    async def test_swallows_update_errors(self):
-        adapter = _make_adapter()
-
-        with patch("asyncio.to_thread", new_callable=AsyncMock, side_effect=Exception("API error")):
-            # Should not raise
-            await adapter._update_approval_card(
-                "msg_001", "Approved", "Bob", "once"
-            )