Compare commits

..

1 Commits

Author SHA1 Message Date
Karan c0271f73f6 feat: add WorldSim — OSINT-powered personality simulation skill
Rehoboam-class worldsim. Immersive CLI personality simulator that
researches real people via 25+ verified platform access methods,
builds 6-layer psychometric profiles, finds star threads (personality
compression keys), and generates platform-authentic simulated
conversations with mechanical verification and adversarial refinement.

26 files | 38K words | 2,283 lines Python

- Immersive CLI interface (worldsim> prompt, no assistant framing)
- OSINT pipeline: X API, Instagram private API, Bluesky, TikTok,
  Facebook, Threads, Mastodon, Reddit, GitHub, HN, Medium, Quora,
  Goodreads, Google Scholar, Crunchbase, podcasts, news/blogs
- Star thread: one-sentence personality compression key per person
- Deep psychometrics: Big Five + Moral Foundations + Schwartz Values
  + Cognitive Style + Narrative Framing + Behavioral Metadata
- Anti-slop: mechanical detection of LLM writing patterns
- GAN-style adversarial refinement loop with mechanical verification
- Recursive self-improvement: learned rules grow with each simulation
- Rehoboam persistence: SQLite + filesystem for profiles, predictions,
  social graph, knowledge archives
- GEPA/MIPROv2 self-evolution integration tested and working
- Knowledge archive: per-person source library with citations and
  semantic retrieval for context-aware grounding

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-04-08 13:46:20 -04:00
186 changed files with 10655 additions and 12549 deletions
-8
View File
@@ -81,14 +81,6 @@
# HF_TOKEN=
# OPENCODE_GO_BASE_URL=https://opencode.ai/zen/go/v1 # Override default base URL
# =============================================================================
# LLM PROVIDER (Qwen OAuth)
# =============================================================================
# Qwen OAuth reuses your local Qwen CLI login (qwen auth qwen-oauth).
# No API key needed — credentials come from ~/.qwen/oauth_creds.json.
# Optional base URL override:
# HERMES_QWEN_BASE_URL=https://portal.qwen.ai/v1
# =============================================================================
# TOOL API KEYS
# =============================================================================
+4 -16
View File
@@ -8,9 +8,6 @@ on:
release:
types: [published]
permissions:
contents: read
concurrency:
group: docker-${{ github.ref }}
cancel-in-progress: true
@@ -20,29 +17,22 @@ jobs:
# Only run on the upstream repository, not on forks
if: github.repository == 'NousResearch/hermes-agent'
runs-on: ubuntu-latest
timeout-minutes: 60
timeout-minutes: 30
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
submodules: recursive
- name: Set up QEMU
uses: docker/setup-qemu-action@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
# Build amd64 only so we can `load` the image for smoke testing.
# `load: true` cannot export a multi-arch manifest to the local daemon.
# The multi-arch build follows on push to main / release.
- name: Build image (amd64, smoke test)
- name: Build image
uses: docker/build-push-action@v6
with:
context: .
file: Dockerfile
load: true
platforms: linux/amd64
tags: nousresearch/hermes-agent:test
cache-from: type=gha
cache-to: type=gha,mode=max
@@ -61,28 +51,26 @@ jobs:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Push multi-arch image (main branch)
- name: Push image (main branch)
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
uses: docker/build-push-action@v6
with:
context: .
file: Dockerfile
push: true
platforms: linux/amd64,linux/arm64
tags: |
nousresearch/hermes-agent:latest
nousresearch/hermes-agent:${{ github.sha }}
cache-from: type=gha
cache-to: type=gha,mode=max
- name: Push multi-arch image (release)
- name: Push image (release)
if: github.event_name == 'release'
uses: docker/build-push-action@v6
with:
context: .
file: Dockerfile
push: true
platforms: linux/amd64,linux/arm64
tags: |
nousresearch/hermes-agent:latest
nousresearch/hermes-agent:${{ github.event.release.tag_name }}
-346
View File
@@ -1,346 +0,0 @@
# Hermes Agent v0.8.0 (v2026.4.8)
**Release Date:** April 8, 2026
> The intelligence release — background task auto-notifications, free MiMo v2 Pro on Nous Portal, live model switching across all platforms, self-optimized GPT/Codex guidance, native Google AI Studio, smart inactivity timeouts, approval buttons, MCP OAuth 2.1, and 209 merged PRs with 82 resolved issues.
---
## ✨ Highlights
- **Background Process Auto-Notifications (`notify_on_complete`)** — Background tasks can now automatically notify the agent when they finish. Start a long-running process (AI model training, test suites, deployments, builds) and the agent gets notified on completion — no polling needed. The agent can keep working on other things and pick up results when they land. ([#5779](https://github.com/NousResearch/hermes-agent/pull/5779))
- **Free Xiaomi MiMo v2 Pro on Nous Portal** — Nous Portal now supports the free-tier Xiaomi MiMo v2 Pro model for auxiliary tasks (compression, vision, summarization), with free-tier model gating and pricing display in model selection. ([#6018](https://github.com/NousResearch/hermes-agent/pull/6018), [#5880](https://github.com/NousResearch/hermes-agent/pull/5880))
- **Live Model Switching (`/model` Command)** — Switch models and providers mid-session from CLI, Telegram, Discord, Slack, or any gateway platform. Aggregator-aware resolution keeps you on OpenRouter/Nous when possible, with automatic cross-provider fallback when needed. Interactive model pickers on Telegram and Discord with inline buttons. ([#5181](https://github.com/NousResearch/hermes-agent/pull/5181), [#5742](https://github.com/NousResearch/hermes-agent/pull/5742))
- **Self-Optimized GPT/Codex Tool-Use Guidance** — The agent diagnosed and patched 5 failure modes in GPT and Codex tool calling through automated behavioral benchmarking, dramatically improving reliability on OpenAI models. Includes execution discipline guidance and thinking-only prefill continuation for structured reasoning. ([#6120](https://github.com/NousResearch/hermes-agent/pull/6120), [#5414](https://github.com/NousResearch/hermes-agent/pull/5414), [#5931](https://github.com/NousResearch/hermes-agent/pull/5931))
- **Google AI Studio (Gemini) Native Provider** — Direct access to Gemini models through Google's AI Studio API. Includes automatic models.dev registry integration for real-time context length detection across any provider. ([#5577](https://github.com/NousResearch/hermes-agent/pull/5577))
- **Inactivity-Based Agent Timeouts** — Gateway and cron timeouts now track actual tool activity instead of wall-clock time. Long-running tasks that are actively working will never be killed — only truly idle agents time out. ([#5389](https://github.com/NousResearch/hermes-agent/pull/5389), [#5440](https://github.com/NousResearch/hermes-agent/pull/5440))
- **Approval Buttons on Slack & Telegram** — Dangerous command approval via native platform buttons instead of typing `/approve`. Slack gets thread context preservation; Telegram gets emoji reactions for approval status. ([#5890](https://github.com/NousResearch/hermes-agent/pull/5890), [#5975](https://github.com/NousResearch/hermes-agent/pull/5975))
- **MCP OAuth 2.1 PKCE + OSV Malware Scanning** — Full standards-compliant OAuth for MCP server authentication, plus automatic malware scanning of MCP extension packages via the OSV vulnerability database. ([#5420](https://github.com/NousResearch/hermes-agent/pull/5420), [#5305](https://github.com/NousResearch/hermes-agent/pull/5305))
- **Centralized Logging & Config Validation** — Structured logging to `~/.hermes/logs/` (agent.log + errors.log) with the `hermes logs` command for tailing and filtering. Config structure validation catches malformed YAML at startup before it causes cryptic failures. ([#5430](https://github.com/NousResearch/hermes-agent/pull/5430), [#5426](https://github.com/NousResearch/hermes-agent/pull/5426))
- **Plugin System Expansion** — Plugins can now register CLI subcommands, receive request-scoped API hooks with correlation IDs, prompt for required env vars during install, and hook into session lifecycle events (finalize/reset). ([#5295](https://github.com/NousResearch/hermes-agent/pull/5295), [#5427](https://github.com/NousResearch/hermes-agent/pull/5427), [#5470](https://github.com/NousResearch/hermes-agent/pull/5470), [#6129](https://github.com/NousResearch/hermes-agent/pull/6129))
- **Matrix Tier 1 & Platform Hardening** — Matrix gets reactions, read receipts, rich formatting, and room management. Discord adds channel controls and ignored channels. Signal gets full MEDIA: tag delivery. Mattermost gets file attachments. Comprehensive reliability fixes across all platforms. ([#5275](https://github.com/NousResearch/hermes-agent/pull/5275), [#5975](https://github.com/NousResearch/hermes-agent/pull/5975), [#5602](https://github.com/NousResearch/hermes-agent/pull/5602))
- **Security Hardening Pass** — Consolidated SSRF protections, timing attack mitigations, tar traversal prevention, credential leakage guards, cron path traversal hardening, and cross-session isolation. Terminal workdir sanitization across all backends. ([#5944](https://github.com/NousResearch/hermes-agent/pull/5944), [#5613](https://github.com/NousResearch/hermes-agent/pull/5613), [#5629](https://github.com/NousResearch/hermes-agent/pull/5629))
---
## 🏗️ Core Agent & Architecture
### Provider & Model Support
- **Native Google AI Studio (Gemini) provider** with models.dev integration for automatic context length detection ([#5577](https://github.com/NousResearch/hermes-agent/pull/5577))
- **`/model` command — full provider+model system overhaul** — live switching across CLI and all gateway platforms with aggregator-aware resolution ([#5181](https://github.com/NousResearch/hermes-agent/pull/5181))
- **Interactive model picker for Telegram and Discord** — inline button-based model selection ([#5742](https://github.com/NousResearch/hermes-agent/pull/5742))
- **Nous Portal free-tier model gating** with pricing display in model selection ([#5880](https://github.com/NousResearch/hermes-agent/pull/5880))
- **Model pricing display** for OpenRouter and Nous Portal providers ([#5416](https://github.com/NousResearch/hermes-agent/pull/5416))
- **xAI (Grok) prompt caching** via `x-grok-conv-id` header ([#5604](https://github.com/NousResearch/hermes-agent/pull/5604))
- **Grok added to tool-use enforcement models** for direct xAI usage ([#5595](https://github.com/NousResearch/hermes-agent/pull/5595))
- **MiniMax TTS provider** (speech-2.8) ([#4963](https://github.com/NousResearch/hermes-agent/pull/4963))
- **Non-agentic model warning** — warns users when loading Hermes LLM models not designed for tool use ([#5378](https://github.com/NousResearch/hermes-agent/pull/5378))
- **Ollama Cloud auth, /model switch persistence**, and alias tab completion ([#5269](https://github.com/NousResearch/hermes-agent/pull/5269))
- **Preserve dots in OpenCode Go model names** (minimax-m2.7, glm-4.5, kimi-k2.5) ([#5597](https://github.com/NousResearch/hermes-agent/pull/5597))
- **MiniMax models 404 fix** — strip /v1 from Anthropic base URL for OpenCode Go ([#4918](https://github.com/NousResearch/hermes-agent/pull/4918))
- **Provider credential reset windows** honored in pooled failover ([#5188](https://github.com/NousResearch/hermes-agent/pull/5188))
- **OAuth token sync** between credential pool and credentials file ([#4981](https://github.com/NousResearch/hermes-agent/pull/4981))
- **Stale OAuth credentials** no longer block OpenRouter users on auto-detect ([#5746](https://github.com/NousResearch/hermes-agent/pull/5746))
- **Codex OAuth credential pool disconnect** + expired token import fix ([#5681](https://github.com/NousResearch/hermes-agent/pull/5681))
- **Codex pool entry sync** from `~/.codex/auth.json` on exhaustion — @GratefulDave ([#5610](https://github.com/NousResearch/hermes-agent/pull/5610))
- **Auxiliary client payment fallback** — retry with next provider on 402 ([#5599](https://github.com/NousResearch/hermes-agent/pull/5599))
- **Auxiliary client resolves named custom providers** and 'main' alias ([#5978](https://github.com/NousResearch/hermes-agent/pull/5978))
- **Use mimo-v2-pro** for non-vision auxiliary tasks on Nous free tier ([#6018](https://github.com/NousResearch/hermes-agent/pull/6018))
- **Vision auto-detection** tries main provider first ([#6041](https://github.com/NousResearch/hermes-agent/pull/6041))
- **Provider re-ordering and Quick Install** — @austinpickett ([#4664](https://github.com/NousResearch/hermes-agent/pull/4664))
- **Nous OAuth access_token** no longer used as inference API key — @SHL0MS ([#5564](https://github.com/NousResearch/hermes-agent/pull/5564))
- **HERMES_PORTAL_BASE_URL env var** respected during Nous login — @benbarclay ([#5745](https://github.com/NousResearch/hermes-agent/pull/5745))
- **Env var overrides** for Nous portal/inference URLs ([#5419](https://github.com/NousResearch/hermes-agent/pull/5419))
- **Z.AI endpoint auto-detect** via probe and cache ([#5763](https://github.com/NousResearch/hermes-agent/pull/5763))
- **MiniMax context lengths, model catalog, thinking guard, aux model, and config base_url** corrections ([#6082](https://github.com/NousResearch/hermes-agent/pull/6082))
- **Community provider/model resolution fixes** — salvaged 4 community PRs + MiniMax aux URL ([#5983](https://github.com/NousResearch/hermes-agent/pull/5983))
### Agent Loop & Conversation
- **Self-optimized GPT/Codex tool-use guidance** via automated behavioral benchmarking — agent self-diagnosed and patched 5 failure modes ([#6120](https://github.com/NousResearch/hermes-agent/pull/6120))
- **GPT/Codex execution discipline guidance** in system prompts ([#5414](https://github.com/NousResearch/hermes-agent/pull/5414))
- **Thinking-only prefill continuation** for structured reasoning responses ([#5931](https://github.com/NousResearch/hermes-agent/pull/5931))
- **Accept reasoning-only responses** without retries — set content to "(empty)" instead of infinite retry ([#5278](https://github.com/NousResearch/hermes-agent/pull/5278))
- **Jittered retry backoff** — exponential backoff with jitter for API retries ([#6048](https://github.com/NousResearch/hermes-agent/pull/6048))
- **Smart thinking block signature management** — preserve and manage Anthropic thinking signatures across turns ([#6112](https://github.com/NousResearch/hermes-agent/pull/6112))
- **Coerce tool call arguments** to match JSON Schema types — fixes models that send strings instead of numbers/booleans ([#5265](https://github.com/NousResearch/hermes-agent/pull/5265))
- **Save oversized tool results to file** instead of destructive truncation ([#5210](https://github.com/NousResearch/hermes-agent/pull/5210))
- **Sandbox-aware tool result persistence** ([#6085](https://github.com/NousResearch/hermes-agent/pull/6085))
- **Streaming fallback** improved after edit failures ([#6110](https://github.com/NousResearch/hermes-agent/pull/6110))
- **Codex empty-output gaps** covered in fallback + normalizer + auxiliary client ([#5724](https://github.com/NousResearch/hermes-agent/pull/5724), [#5730](https://github.com/NousResearch/hermes-agent/pull/5730), [#5734](https://github.com/NousResearch/hermes-agent/pull/5734))
- **Codex stream output backfill** from output_item.done events ([#5689](https://github.com/NousResearch/hermes-agent/pull/5689))
- **Stream consumer creates new message** after tool boundaries ([#5739](https://github.com/NousResearch/hermes-agent/pull/5739))
- **Codex validation aligned** with normalization for empty stream output ([#5940](https://github.com/NousResearch/hermes-agent/pull/5940))
- **Bridge tool-calls** in copilot-acp adapter ([#5460](https://github.com/NousResearch/hermes-agent/pull/5460))
- **Filter transcript-only roles** from chat-completions payload ([#4880](https://github.com/NousResearch/hermes-agent/pull/4880))
- **Context compaction failures fixed** on temperature-restricted models — @MadKangYu ([#5608](https://github.com/NousResearch/hermes-agent/pull/5608))
- **Sanitize tool_calls for all strict APIs** (Fireworks, Mistral, etc.) — @lumethegreat ([#5183](https://github.com/NousResearch/hermes-agent/pull/5183))
### Memory & Sessions
- **Supermemory memory provider** — new memory plugin with multi-container, search_mode, identity template, and env var override ([#5737](https://github.com/NousResearch/hermes-agent/pull/5737), [#5933](https://github.com/NousResearch/hermes-agent/pull/5933))
- **Shared thread sessions** by default — multi-user thread support across gateway platforms ([#5391](https://github.com/NousResearch/hermes-agent/pull/5391))
- **Subagent sessions linked to parent** and hidden from session list ([#5309](https://github.com/NousResearch/hermes-agent/pull/5309))
- **Profile-scoped memory isolation** and clone support ([#4845](https://github.com/NousResearch/hermes-agent/pull/4845))
- **Thread gateway user_id to memory plugins** for per-user scoping ([#5895](https://github.com/NousResearch/hermes-agent/pull/5895))
- **Honcho plugin drift overhaul** + plugin CLI registration system ([#5295](https://github.com/NousResearch/hermes-agent/pull/5295))
- **Honcho holographic prompt and trust score** rendering preserved ([#4872](https://github.com/NousResearch/hermes-agent/pull/4872))
- **Honcho doctor fix** — use recall_mode instead of memory_mode — @techguysimon ([#5645](https://github.com/NousResearch/hermes-agent/pull/5645))
- **RetainDB** — API routes, write queue, dialectic, agent model, file tools fixes ([#5461](https://github.com/NousResearch/hermes-agent/pull/5461))
- **Hindsight memory plugin overhaul** + memory setup wizard fixes ([#5094](https://github.com/NousResearch/hermes-agent/pull/5094))
- **mem0 API v2 compat**, prefetch context fencing, secret redaction ([#5423](https://github.com/NousResearch/hermes-agent/pull/5423))
- **mem0 env vars merged** with mem0.json instead of either/or ([#4939](https://github.com/NousResearch/hermes-agent/pull/4939))
- **Clean user message** used for all memory provider operations ([#4940](https://github.com/NousResearch/hermes-agent/pull/4940))
- **Silent memory flush failure** on /new and /resume fixed — @ryanautomated ([#5640](https://github.com/NousResearch/hermes-agent/pull/5640))
- **OpenViking atexit safety net** for session commit ([#5664](https://github.com/NousResearch/hermes-agent/pull/5664))
- **OpenViking tenant-scoping headers** for multi-tenant servers ([#4936](https://github.com/NousResearch/hermes-agent/pull/4936))
- **ByteRover brv query** runs synchronously before LLM call ([#4831](https://github.com/NousResearch/hermes-agent/pull/4831))
---
## 📱 Messaging Platforms (Gateway)
### Gateway Core
- **Inactivity-based agent timeout** — replaces wall-clock timeout with smart activity tracking; long-running active tasks never killed ([#5389](https://github.com/NousResearch/hermes-agent/pull/5389))
- **Approval buttons for Slack & Telegram** + Slack thread context preservation ([#5890](https://github.com/NousResearch/hermes-agent/pull/5890))
- **Live-stream /update output** + forward interactive prompts to user ([#5180](https://github.com/NousResearch/hermes-agent/pull/5180))
- **Infinite timeout support** + periodic notifications + actionable error messages ([#4959](https://github.com/NousResearch/hermes-agent/pull/4959))
- **Duplicate message prevention** — gateway dedup + partial stream guard ([#4878](https://github.com/NousResearch/hermes-agent/pull/4878))
- **Webhook delivery_info persistence** + full session id in /status ([#5942](https://github.com/NousResearch/hermes-agent/pull/5942))
- **Tool preview truncation** respects tool_preview_length in all/new progress modes ([#5937](https://github.com/NousResearch/hermes-agent/pull/5937))
- **Short preview truncation** restored for all/new tool progress modes ([#4935](https://github.com/NousResearch/hermes-agent/pull/4935))
- **Update-pending state** written atomically to prevent corruption ([#4923](https://github.com/NousResearch/hermes-agent/pull/4923))
- **Approval session key isolated** per turn ([#4884](https://github.com/NousResearch/hermes-agent/pull/4884))
- **Active-session guard bypass** for /approve, /deny, /stop, /new ([#4926](https://github.com/NousResearch/hermes-agent/pull/4926), [#5765](https://github.com/NousResearch/hermes-agent/pull/5765))
- **Typing indicator paused** during approval waits ([#5893](https://github.com/NousResearch/hermes-agent/pull/5893))
- **Caption check** uses exact line-by-line match instead of substring (all platforms) ([#5939](https://github.com/NousResearch/hermes-agent/pull/5939))
- **MEDIA: tags stripped** from streamed gateway messages ([#5152](https://github.com/NousResearch/hermes-agent/pull/5152))
- **MEDIA: tags extracted** from cron delivery before sending ([#5598](https://github.com/NousResearch/hermes-agent/pull/5598))
- **Profile-aware service units** + voice transcription cleanup ([#5972](https://github.com/NousResearch/hermes-agent/pull/5972))
- **Thread-safe PairingStore** with atomic writes — @CharlieKerfoot ([#5656](https://github.com/NousResearch/hermes-agent/pull/5656))
- **Sanitize media URLs** in base platform logs — @WAXLYY ([#5631](https://github.com/NousResearch/hermes-agent/pull/5631))
- **Reduce Telegram fallback IP activation log noise** — @MadKangYu ([#5615](https://github.com/NousResearch/hermes-agent/pull/5615))
- **Cron static method wrappers** to prevent self-binding ([#5299](https://github.com/NousResearch/hermes-agent/pull/5299))
- **Stale 'hermes login' replaced** with 'hermes auth' + credential removal re-seeding fix ([#5670](https://github.com/NousResearch/hermes-agent/pull/5670))
### Telegram
- **Group topics skill binding** for supergroup forum topics ([#4886](https://github.com/NousResearch/hermes-agent/pull/4886))
- **Emoji reactions** for approval status and notifications ([#5975](https://github.com/NousResearch/hermes-agent/pull/5975))
- **Duplicate message delivery prevented** on send timeout ([#5153](https://github.com/NousResearch/hermes-agent/pull/5153))
- **Command names sanitized** to strip invalid characters ([#5596](https://github.com/NousResearch/hermes-agent/pull/5596))
- **Per-platform disabled skills** respected in Telegram menu and gateway dispatch ([#4799](https://github.com/NousResearch/hermes-agent/pull/4799))
- **/approve and /deny** routed through running-agent guard ([#4798](https://github.com/NousResearch/hermes-agent/pull/4798))
### Discord
- **Channel controls** — ignored_channels and no_thread_channels config options ([#5975](https://github.com/NousResearch/hermes-agent/pull/5975))
- **Skills registered as native slash commands** via shared gateway logic ([#5603](https://github.com/NousResearch/hermes-agent/pull/5603))
- **/approve, /deny, /queue, /background, /btw** registered as native slash commands ([#4800](https://github.com/NousResearch/hermes-agent/pull/4800), [#5477](https://github.com/NousResearch/hermes-agent/pull/5477))
- **Unnecessary members intent** removed on startup + token lock leak fix ([#5302](https://github.com/NousResearch/hermes-agent/pull/5302))
### Slack
- **Thread engagement** — auto-respond in bot-started and mentioned threads ([#5897](https://github.com/NousResearch/hermes-agent/pull/5897))
- **mrkdwn in edit_message** + thread replies without @mentions ([#5733](https://github.com/NousResearch/hermes-agent/pull/5733))
### Matrix
- **Tier 1 feature parity** — reactions, read receipts, rich formatting, room management ([#5275](https://github.com/NousResearch/hermes-agent/pull/5275))
- **MATRIX_REQUIRE_MENTION and MATRIX_AUTO_THREAD** support ([#5106](https://github.com/NousResearch/hermes-agent/pull/5106))
- **Comprehensive reliability** — encrypted media, auth recovery, cron E2EE, Synapse compat ([#5271](https://github.com/NousResearch/hermes-agent/pull/5271))
- **CJK input, E2EE, and reconnect** fixes ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))
### Signal
- **Full MEDIA: tag delivery** — send_image_file, send_voice, and send_video implemented ([#5602](https://github.com/NousResearch/hermes-agent/pull/5602))
### Mattermost
- **File attachments** — set message type to DOCUMENT when post has file attachments — @nericervin ([#5609](https://github.com/NousResearch/hermes-agent/pull/5609))
### Feishu
- **Interactive card approval buttons** ([#6043](https://github.com/NousResearch/hermes-agent/pull/6043))
- **Reconnect and ACL** fixes ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))
### Webhooks
- **`{__raw__}` template token** and thread_id passthrough for forum topics ([#5662](https://github.com/NousResearch/hermes-agent/pull/5662))
---
## 🖥️ CLI & User Experience
### Interactive CLI
- **Defer response content** until reasoning block completes ([#5773](https://github.com/NousResearch/hermes-agent/pull/5773))
- **Ghost status-bar lines cleared** on terminal resize ([#4960](https://github.com/NousResearch/hermes-agent/pull/4960))
- **Normalise \r\n and \r line endings** in pasted text ([#4849](https://github.com/NousResearch/hermes-agent/pull/4849))
- **ChatConsole errors, curses scroll, skin-aware banner, git state** banner fixes ([#5974](https://github.com/NousResearch/hermes-agent/pull/5974))
- **Native Windows image paste** support ([#5917](https://github.com/NousResearch/hermes-agent/pull/5917))
- **--yolo and other flags** no longer silently dropped when placed before 'chat' subcommand ([#5145](https://github.com/NousResearch/hermes-agent/pull/5145))
### Setup & Configuration
- **Config structure validation** — detect malformed YAML at startup with actionable error messages ([#5426](https://github.com/NousResearch/hermes-agent/pull/5426))
- **Centralized logging** to `~/.hermes/logs/` — agent.log (INFO+), errors.log (WARNING+) with `hermes logs` command ([#5430](https://github.com/NousResearch/hermes-agent/pull/5430))
- **Docs links added** to setup wizard sections ([#5283](https://github.com/NousResearch/hermes-agent/pull/5283))
- **Doctor diagnostics** — sync provider checks, config migration, WAL and mem0 diagnostics ([#5077](https://github.com/NousResearch/hermes-agent/pull/5077))
- **Timeout debug logging** and user-facing diagnostics improved ([#5370](https://github.com/NousResearch/hermes-agent/pull/5370))
- **Reasoning effort unified** to config.yaml only ([#6118](https://github.com/NousResearch/hermes-agent/pull/6118))
- **Permanent command allowlist** loaded on startup ([#5076](https://github.com/NousResearch/hermes-agent/pull/5076))
- **`hermes auth remove`** now clears env-seeded credentials permanently ([#5285](https://github.com/NousResearch/hermes-agent/pull/5285))
- **Bundled skills synced to all profiles** during update ([#5795](https://github.com/NousResearch/hermes-agent/pull/5795))
- **`hermes update` no longer kills** freshly-restarted gateway service ([#5448](https://github.com/NousResearch/hermes-agent/pull/5448))
- **Subprocess.run() timeouts** added to all gateway CLI commands ([#5424](https://github.com/NousResearch/hermes-agent/pull/5424))
- **Actionable error message** when Codex refresh token is reused — @tymrtn ([#5612](https://github.com/NousResearch/hermes-agent/pull/5612))
- **Google-workspace skill scripts** can now run directly — @xinbenlv ([#5624](https://github.com/NousResearch/hermes-agent/pull/5624))
### Cron System
- **Inactivity-based cron timeout** — replaces wall-clock; active tasks run indefinitely ([#5440](https://github.com/NousResearch/hermes-agent/pull/5440))
- **Pre-run script injection** for data collection and change detection ([#5082](https://github.com/NousResearch/hermes-agent/pull/5082))
- **Delivery failure tracking** in job status ([#6042](https://github.com/NousResearch/hermes-agent/pull/6042))
- **Delivery guidance** in cron prompts — stops send_message thrashing ([#5444](https://github.com/NousResearch/hermes-agent/pull/5444))
- **MEDIA files delivered** as native platform attachments ([#5921](https://github.com/NousResearch/hermes-agent/pull/5921))
- **[SILENT] suppression** works anywhere in response — @auspic7 ([#5654](https://github.com/NousResearch/hermes-agent/pull/5654))
- **Cron path traversal** hardening ([#5147](https://github.com/NousResearch/hermes-agent/pull/5147))
---
## 🔧 Tool System
### Terminal & Execution
- **Execute_code on remote backends** — code execution now works on Docker, SSH, Modal, and other remote terminal backends ([#5088](https://github.com/NousResearch/hermes-agent/pull/5088))
- **Exit code context** for common CLI tools in terminal results — helps agent understand what went wrong ([#5144](https://github.com/NousResearch/hermes-agent/pull/5144))
- **Progressive subdirectory hint discovery** — agent learns project structure as it navigates ([#5291](https://github.com/NousResearch/hermes-agent/pull/5291))
- **notify_on_complete for background processes** — get notified when long-running tasks finish ([#5779](https://github.com/NousResearch/hermes-agent/pull/5779))
- **Docker env config** — explicit container environment variables via docker_env config ([#4738](https://github.com/NousResearch/hermes-agent/pull/4738))
- **Approval metadata included** in terminal tool results ([#5141](https://github.com/NousResearch/hermes-agent/pull/5141))
- **Workdir parameter sanitized** in terminal tool across all backends ([#5629](https://github.com/NousResearch/hermes-agent/pull/5629))
- **Detached process crash recovery** state corrected ([#6101](https://github.com/NousResearch/hermes-agent/pull/6101))
- **Agent-browser paths with spaces** preserved — @Vasanthdev2004 ([#6077](https://github.com/NousResearch/hermes-agent/pull/6077))
- **Portable base64 encoding** for image reading on macOS — @CharlieKerfoot ([#5657](https://github.com/NousResearch/hermes-agent/pull/5657))
### Browser
- **Switch managed browser provider** from Browserbase to Browser Use — @benbarclay ([#5750](https://github.com/NousResearch/hermes-agent/pull/5750))
- **Firecrawl cloud browser** provider — @alt-glitch ([#5628](https://github.com/NousResearch/hermes-agent/pull/5628))
- **JS evaluation** via browser_console expression parameter ([#5303](https://github.com/NousResearch/hermes-agent/pull/5303))
- **Windows browser** fixes ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))
### MCP
- **MCP OAuth 2.1 PKCE** — full standards-compliant OAuth client support ([#5420](https://github.com/NousResearch/hermes-agent/pull/5420))
- **OSV malware check** for MCP extension packages ([#5305](https://github.com/NousResearch/hermes-agent/pull/5305))
- **Prefer structuredContent over text** + no_mcp sentinel ([#5979](https://github.com/NousResearch/hermes-agent/pull/5979))
- **Unknown toolsets warning suppressed** for MCP server names ([#5279](https://github.com/NousResearch/hermes-agent/pull/5279))
### Web & Files
- **.zip document support** + auto-mount cache dirs into remote backends ([#4846](https://github.com/NousResearch/hermes-agent/pull/4846))
- **Redact query secrets** in send_message errors — @WAXLYY ([#5650](https://github.com/NousResearch/hermes-agent/pull/5650))
### Delegation
- **Credential pool sharing** + workspace path hints for subagents ([#5748](https://github.com/NousResearch/hermes-agent/pull/5748))
### ACP (VS Code / Zed / JetBrains)
- **Aggregate ACP improvements** — auth compat, protocol fixes, command ads, delegation, SSE events ([#5292](https://github.com/NousResearch/hermes-agent/pull/5292))
---
## 🧩 Skills Ecosystem
### Skills System
- **Skill config interface** — skills can declare required config.yaml settings, prompted during setup, injected at load time ([#5635](https://github.com/NousResearch/hermes-agent/pull/5635))
- **Plugin CLI registration system** — plugins register their own CLI subcommands without touching main.py ([#5295](https://github.com/NousResearch/hermes-agent/pull/5295))
- **Request-scoped API hooks** with tool call correlation IDs for plugins ([#5427](https://github.com/NousResearch/hermes-agent/pull/5427))
- **Session lifecycle hooks** — on_session_finalize and on_session_reset for CLI + gateway ([#6129](https://github.com/NousResearch/hermes-agent/pull/6129))
- **Prompt for required env vars** during plugin install — @kshitijk4poor ([#5470](https://github.com/NousResearch/hermes-agent/pull/5470))
- **Plugin name validation** — reject names that resolve to plugins root ([#5368](https://github.com/NousResearch/hermes-agent/pull/5368))
- **pre_llm_call plugin context** moved to user message to preserve prompt cache ([#5146](https://github.com/NousResearch/hermes-agent/pull/5146))
### New & Updated Skills
- **popular-web-designs** — 54 production website design systems ([#5194](https://github.com/NousResearch/hermes-agent/pull/5194))
- **p5js creative coding** — @SHL0MS ([#5600](https://github.com/NousResearch/hermes-agent/pull/5600))
- **manim-video** — mathematical and technical animations — @SHL0MS ([#4930](https://github.com/NousResearch/hermes-agent/pull/4930))
- **llm-wiki** — Karpathy's LLM Wiki skill ([#5635](https://github.com/NousResearch/hermes-agent/pull/5635))
- **gitnexus-explorer** — codebase indexing and knowledge serving ([#5208](https://github.com/NousResearch/hermes-agent/pull/5208))
- **research-paper-writing** — AI-Scientist & GPT-Researcher patterns — @SHL0MS ([#5421](https://github.com/NousResearch/hermes-agent/pull/5421))
- **blogwatcher** updated to JulienTant's fork ([#5759](https://github.com/NousResearch/hermes-agent/pull/5759))
- **claude-code skill** comprehensive rewrite v2.0 + v2.2 ([#5155](https://github.com/NousResearch/hermes-agent/pull/5155), [#5158](https://github.com/NousResearch/hermes-agent/pull/5158))
- **Code verification skills** consolidated into one ([#4854](https://github.com/NousResearch/hermes-agent/pull/4854))
- **Manim CE reference docs** expanded — geometry, animations, LaTeX — @leotrs ([#5791](https://github.com/NousResearch/hermes-agent/pull/5791))
- **Manim-video references** — design thinking, updaters, paper explainer, decorations, production quality — @SHL0MS ([#5588](https://github.com/NousResearch/hermes-agent/pull/5588), [#5408](https://github.com/NousResearch/hermes-agent/pull/5408))
---
## 🔒 Security & Reliability
### Security Hardening
- **Consolidated security** — SSRF protections, timing attack mitigations, tar traversal prevention, credential leakage guards ([#5944](https://github.com/NousResearch/hermes-agent/pull/5944))
- **Cross-session isolation** + cron path traversal hardening ([#5613](https://github.com/NousResearch/hermes-agent/pull/5613))
- **Workdir parameter sanitized** in terminal tool across all backends ([#5629](https://github.com/NousResearch/hermes-agent/pull/5629))
- **Approval 'once' session escalation** prevented + cron delivery platform validation ([#5280](https://github.com/NousResearch/hermes-agent/pull/5280))
- **Profile-scoped Google Workspace OAuth tokens** protected ([#4910](https://github.com/NousResearch/hermes-agent/pull/4910))
### Reliability
- **Aggressive worktree and branch cleanup** to prevent accumulation ([#6134](https://github.com/NousResearch/hermes-agent/pull/6134))
- **O(n²) catastrophic backtracking** in redact regex fixed — 100x improvement on large outputs ([#4962](https://github.com/NousResearch/hermes-agent/pull/4962))
- **Runtime stability fixes** across core, web, delegate, and browser tools ([#4843](https://github.com/NousResearch/hermes-agent/pull/4843))
- **API server streaming fix** + conversation history support ([#5977](https://github.com/NousResearch/hermes-agent/pull/5977))
- **OpenViking API endpoint paths** and response parsing corrected ([#5078](https://github.com/NousResearch/hermes-agent/pull/5078))
---
## 🐛 Notable Bug Fixes
- **9 community bugfixes salvaged** — gateway, cron, deps, macOS launchd in one batch ([#5288](https://github.com/NousResearch/hermes-agent/pull/5288))
- **Batch core bug fixes** — model config, session reset, alias fallback, launchctl, delegation, atomic writes ([#5630](https://github.com/NousResearch/hermes-agent/pull/5630))
- **Batch gateway/platform fixes** — matrix E2EE, CJK input, Windows browser, Feishu reconnect + ACL ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))
- **Stale test skips removed**, regex backtracking, file search bug, and test flakiness ([#4969](https://github.com/NousResearch/hermes-agent/pull/4969))
- **Nix flake** — read version, regen uv.lock, add hermes_logging — @alt-glitch ([#5651](https://github.com/NousResearch/hermes-agent/pull/5651))
- **Lowercase variable redaction** regression tests ([#5185](https://github.com/NousResearch/hermes-agent/pull/5185))
---
## 🧪 Testing
- **57 failing CI tests repaired** across 14 files ([#5823](https://github.com/NousResearch/hermes-agent/pull/5823))
- **Test suite re-architecture** + CI failure fixes — @alt-glitch ([#5946](https://github.com/NousResearch/hermes-agent/pull/5946))
- **Codebase-wide lint cleanup** — unused imports, dead code, and inefficient patterns ([#5821](https://github.com/NousResearch/hermes-agent/pull/5821))
- **browser_close tool removed** — auto-cleanup handles it ([#5792](https://github.com/NousResearch/hermes-agent/pull/5792))
---
## 📚 Documentation
- **Comprehensive documentation audit** — fix stale info, expand thin pages, add depth ([#5393](https://github.com/NousResearch/hermes-agent/pull/5393))
- **40+ discrepancies fixed** between documentation and codebase ([#5818](https://github.com/NousResearch/hermes-agent/pull/5818))
- **13 features documented** from last week's PRs ([#5815](https://github.com/NousResearch/hermes-agent/pull/5815))
- **Guides section overhaul** — fix existing + add 3 new tutorials ([#5735](https://github.com/NousResearch/hermes-agent/pull/5735))
- **Salvaged 4 docs PRs** — docker setup, post-update validation, local LLM guide, signal-cli install ([#5727](https://github.com/NousResearch/hermes-agent/pull/5727))
- **Discord configuration reference** ([#5386](https://github.com/NousResearch/hermes-agent/pull/5386))
- **Community FAQ entries** for common workflows and troubleshooting ([#4797](https://github.com/NousResearch/hermes-agent/pull/4797))
- **WSL2 networking guide** for local model servers ([#5616](https://github.com/NousResearch/hermes-agent/pull/5616))
- **Honcho CLI reference** + plugin CLI registration docs ([#5308](https://github.com/NousResearch/hermes-agent/pull/5308))
- **Obsidian Headless setup** for servers in llm-wiki ([#5660](https://github.com/NousResearch/hermes-agent/pull/5660))
- **Hermes Mod visual skin editor** added to skins page ([#6095](https://github.com/NousResearch/hermes-agent/pull/6095))
---
## 👥 Contributors
### Core
- **@teknium1** — 179 PRs
### Top Community Contributors
- **@SHL0MS** (7 PRs) — p5js creative coding skill, manim-video skill + 5 reference expansions, research-paper-writing, Nous OAuth fix, manim font fix
- **@alt-glitch** (3 PRs) — Firecrawl cloud browser provider, test re-architecture + CI fixes, Nix flake fixes
- **@benbarclay** (2 PRs) — Browser Use managed provider switch, Nous portal base URL fix
- **@CharlieKerfoot** (2 PRs) — macOS portable base64 encoding, thread-safe PairingStore
- **@WAXLYY** (2 PRs) — send_message secret redaction, gateway media URL sanitization
- **@MadKangYu** (2 PRs) — Telegram log noise reduction, context compaction fix for temperature-restricted models
### All Contributors
@alt-glitch, @austinpickett, @auspic7, @benbarclay, @CharlieKerfoot, @GratefulDave, @kshitijk4poor, @leotrs, @lumethegreat, @MadKangYu, @nericervin, @ryanautomated, @SHL0MS, @techguysimon, @tymrtn, @Vasanthdev2004, @WAXLYY, @xinbenlv
---
**Full Changelog**: [v2026.4.3...v2026.4.8](https://github.com/NousResearch/hermes-agent/compare/v2026.4.3...v2026.4.8)
+12 -117
View File
@@ -163,17 +163,6 @@ def _is_oauth_token(key: str) -> bool:
return True
def _normalize_base_url_text(base_url) -> str:
"""Normalize SDK/base transport URL values to a plain string for inspection.
Some client objects expose ``base_url`` as an ``httpx.URL`` instead of a raw
string. Provider/auth detection should accept either shape.
"""
if not base_url:
return ""
return str(base_url).strip()
def _is_third_party_anthropic_endpoint(base_url: str | None) -> bool:
"""Return True for non-Anthropic endpoints using the Anthropic Messages API.
@@ -181,10 +170,9 @@ def _is_third_party_anthropic_endpoint(base_url: str | None) -> bool:
with their own API keys via x-api-key, not Anthropic OAuth tokens. OAuth
detection should be skipped for these endpoints.
"""
normalized = _normalize_base_url_text(base_url)
if not normalized:
if not base_url:
return False # No base_url = direct Anthropic API
normalized = normalized.rstrip("/").lower()
normalized = base_url.rstrip("/").lower()
if "anthropic.com" in normalized:
return False # Direct Anthropic API — OAuth applies
return True # Any other endpoint is a third-party proxy
@@ -194,13 +182,12 @@ def _requires_bearer_auth(base_url: str | None) -> bool:
"""Return True for Anthropic-compatible providers that require Bearer auth.
Some third-party /anthropic endpoints implement Anthropic's Messages API but
require Authorization: Bearer *** of Anthropic's native x-api-key header.
require Authorization: Bearer instead of Anthropic's native x-api-key header.
MiniMax's global and China Anthropic-compatible endpoints follow this pattern.
"""
normalized = _normalize_base_url_text(base_url)
if not normalized:
if not base_url:
return False
normalized = normalized.rstrip("/").lower()
normalized = base_url.rstrip("/").lower()
return normalized.startswith(("https://api.minimax.io/anthropic", "https://api.minimaxi.com/anthropic"))
@@ -216,14 +203,13 @@ def build_anthropic_client(api_key: str, base_url: str = None):
)
from httpx import Timeout
normalized_base_url = _normalize_base_url_text(base_url)
kwargs = {
"timeout": Timeout(timeout=900.0, connect=10.0),
}
if normalized_base_url:
kwargs["base_url"] = normalized_base_url
if base_url:
kwargs["base_url"] = base_url
if _requires_bearer_auth(normalized_base_url):
if _requires_bearer_auth(base_url):
# Some Anthropic-compatible providers (e.g. MiniMax) expect the API key in
# Authorization: Bearer even for regular API keys. Route those endpoints
# through auth_token so the SDK sends Bearer auth instead of x-api-key.
@@ -956,18 +942,12 @@ def _convert_content_to_anthropic(content: Any) -> Any:
def convert_messages_to_anthropic(
messages: List[Dict],
base_url: str | None = None,
) -> Tuple[Optional[Any], List[Dict]]:
"""Convert OpenAI-format messages to Anthropic format.
Returns (system_prompt, anthropic_messages).
System messages are extracted since Anthropic takes them as a separate param.
system_prompt is a string or list of content blocks (when cache_control present).
When *base_url* is provided and points to a third-party Anthropic-compatible
endpoint, all thinking block signatures are stripped. Signatures are
Anthropic-proprietary — third-party endpoints cannot validate them and will
reject them with HTTP 400 "Invalid signature in thinking block".
"""
system = None
result = []
@@ -1122,15 +1102,7 @@ def convert_messages_to_anthropic(
curr_content = [{"type": "text", "text": curr_content}]
fixed[-1]["content"] = prev_content + curr_content
else:
# Consecutive assistant messages — merge text content.
# Drop thinking blocks from the *second* message: their
# signature was computed against a different turn boundary
# and becomes invalid once merged.
if isinstance(m["content"], list):
m["content"] = [
b for b in m["content"]
if not (isinstance(b, dict) and b.get("type") in ("thinking", "redacted_thinking"))
]
# Consecutive assistant messages — merge text content
prev_blocks = fixed[-1]["content"]
curr_blocks = m["content"]
if isinstance(prev_blocks, list) and isinstance(curr_blocks, list):
@@ -1148,79 +1120,6 @@ def convert_messages_to_anthropic(
fixed.append(m)
result = fixed
# ── Thinking block signature management ──────────────────────────
# Anthropic signs thinking blocks against the full turn content.
# Any upstream mutation (context compression, session truncation,
# orphan stripping, message merging) invalidates the signature,
# causing HTTP 400 "Invalid signature in thinking block".
#
# Signatures are Anthropic-proprietary. Third-party endpoints
# (MiniMax, Azure AI Foundry, self-hosted proxies) cannot validate
# them and will reject them outright. When targeting a third-party
# endpoint, strip ALL thinking/redacted_thinking blocks from every
# assistant message — the third-party will generate its own
# thinking blocks if it supports extended thinking.
#
# For direct Anthropic (strategy following clawdbot/OpenClaw):
# 1. Strip thinking/redacted_thinking from all assistant messages
# EXCEPT the last one — preserves reasoning continuity on the
# current tool-use chain while avoiding stale signature errors.
# 2. Downgrade unsigned thinking blocks (no signature) to text —
# Anthropic can't validate them and will reject them.
# 3. Strip cache_control from thinking/redacted_thinking blocks —
# cache markers can interfere with signature validation.
_THINKING_TYPES = frozenset(("thinking", "redacted_thinking"))
_is_third_party = _is_third_party_anthropic_endpoint(base_url)
last_assistant_idx = None
for i in range(len(result) - 1, -1, -1):
if result[i].get("role") == "assistant":
last_assistant_idx = i
break
for idx, m in enumerate(result):
if m.get("role") != "assistant" or not isinstance(m.get("content"), list):
continue
if _is_third_party or idx != last_assistant_idx:
# Third-party endpoint: strip ALL thinking blocks from every
# assistant message — signatures are Anthropic-proprietary.
# Direct Anthropic: strip from non-latest assistant messages only.
stripped = [
b for b in m["content"]
if not (isinstance(b, dict) and b.get("type") in _THINKING_TYPES)
]
m["content"] = stripped or [{"type": "text", "text": "(thinking elided)"}]
else:
# Latest assistant on direct Anthropic: keep signed thinking
# blocks for reasoning continuity; downgrade unsigned ones to
# plain text.
new_content = []
for b in m["content"]:
if not isinstance(b, dict) or b.get("type") not in _THINKING_TYPES:
new_content.append(b)
continue
if b.get("type") == "redacted_thinking":
# Redacted blocks use 'data' for the signature payload
if b.get("data"):
new_content.append(b)
# else: drop — no data means it can't be validated
elif b.get("signature"):
# Signed thinking block — keep it
new_content.append(b)
else:
# Unsigned thinking — downgrade to text so it's not lost
thinking_text = b.get("thinking", "")
if thinking_text:
new_content.append({"type": "text", "text": thinking_text})
m["content"] = new_content or [{"type": "text", "text": "(empty)"}]
# Strip cache_control from any remaining thinking/redacted_thinking
# blocks — cache markers interfere with signature validation.
for b in m["content"]:
if isinstance(b, dict) and b.get("type") in _THINKING_TYPES:
b.pop("cache_control", None)
return system, result
@@ -1234,7 +1133,6 @@ def build_anthropic_kwargs(
is_oauth: bool = False,
preserve_dots: bool = False,
context_length: Optional[int] = None,
base_url: str | None = None,
) -> Dict[str, Any]:
"""Build kwargs for anthropic.messages.create().
@@ -1248,11 +1146,8 @@ def build_anthropic_kwargs(
When *preserve_dots* is True, model name dots are not converted to hyphens
(for Alibaba/DashScope anthropic-compatible endpoints: qwen3.5-plus).
When *base_url* points to a third-party Anthropic-compatible endpoint,
thinking block signatures are stripped (they are Anthropic-proprietary).
"""
system, anthropic_messages = convert_messages_to_anthropic(messages, base_url=base_url)
system, anthropic_messages = convert_messages_to_anthropic(messages)
anthropic_tools = convert_tools_to_anthropic(tools) if tools else []
model = normalize_model_name(model, preserve_dots=preserve_dots)
@@ -1329,9 +1224,9 @@ def build_anthropic_kwargs(
# Map reasoning_config to Anthropic's thinking parameter.
# Claude 4.6 models use adaptive thinking + output_config.effort.
# Older models use manual thinking with budget_tokens.
# Haiku and MiniMax models do NOT support extended thinking — skip entirely.
# Haiku models do NOT support extended thinking at all — skip entirely.
if reasoning_config and isinstance(reasoning_config, dict):
if reasoning_config.get("enabled") is not False and "haiku" not in model.lower() and "minimax" not in model.lower():
if reasoning_config.get("enabled") is not False and "haiku" not in model.lower():
effort = str(reasoning_config.get("effort", "medium")).lower()
budget = THINKING_BUDGET.get(effort, 8000)
if _supports_adaptive_thinking(model):
+56 -137
View File
@@ -59,48 +59,13 @@ from hermes_constants import OPENROUTER_BASE_URL
logger = logging.getLogger(__name__)
_PROVIDER_ALIASES = {
"google": "gemini",
"google-gemini": "gemini",
"google-ai-studio": "gemini",
"glm": "zai",
"z-ai": "zai",
"z.ai": "zai",
"zhipu": "zai",
"kimi": "kimi-coding",
"moonshot": "kimi-coding",
"minimax-china": "minimax-cn",
"minimax_cn": "minimax-cn",
"claude": "anthropic",
"claude-code": "anthropic",
}
def _normalize_aux_provider(provider: Optional[str], *, for_vision: bool = False) -> str:
normalized = (provider or "auto").strip().lower()
if normalized.startswith("custom:"):
suffix = normalized.split(":", 1)[1].strip()
if not suffix:
return "custom"
normalized = suffix if not for_vision else "custom"
if normalized == "codex":
return "openai-codex"
if normalized == "main":
# Resolve to the user's actual main provider so named custom providers
# and non-aggregator providers (DeepSeek, Alibaba, etc.) work correctly.
main_prov = _read_main_provider()
if main_prov and main_prov not in ("auto", "main", ""):
return main_prov
return "custom"
return _PROVIDER_ALIASES.get(normalized, normalized)
# Default auxiliary models for direct API-key providers (cheap/fast for side tasks)
_API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
"gemini": "gemini-3-flash-preview",
"zai": "glm-4.5-flash",
"kimi-coding": "kimi-k2-turbo-preview",
"minimax": "MiniMax-M2.7",
"minimax-cn": "MiniMax-M2.7",
"minimax": "MiniMax-M2.7-highspeed",
"minimax-cn": "MiniMax-M2.7-highspeed",
"anthropic": "claude-haiku-4-5-20251001",
"ai-gateway": "google/gemini-3-flash",
"opencode-zen": "gemini-3-flash",
@@ -127,7 +92,6 @@ auxiliary_is_nous: bool = False
_OPENROUTER_MODEL = "google/gemini-3-flash-preview"
_NOUS_MODEL = "google/gemini-3-flash-preview"
_NOUS_FREE_TIER_VISION_MODEL = "xiaomi/mimo-v2-omni"
_NOUS_FREE_TIER_AUX_MODEL = "xiaomi/mimo-v2-pro"
_NOUS_DEFAULT_BASE_URL = "https://inference-api.nousresearch.com/v1"
_ANTHROPIC_DEFAULT_BASE_URL = "https://api.anthropic.com"
_AUTH_JSON_PATH = get_hermes_home() / "auth.json"
@@ -141,23 +105,6 @@ _CODEX_AUX_MODEL = "gpt-5.2-codex"
_CODEX_AUX_BASE_URL = "https://chatgpt.com/backend-api/codex"
def _to_openai_base_url(base_url: str) -> str:
"""Normalize an Anthropic-style base URL to OpenAI-compatible format.
Some providers (MiniMax, MiniMax-CN) expose an ``/anthropic`` endpoint for
the Anthropic Messages API and a separate ``/v1`` endpoint for OpenAI chat
completions. The auxiliary client uses the OpenAI SDK, so it must hit the
``/v1`` surface. Passing the raw ``inference_base_url`` causes requests to
land on ``/anthropic/chat/completions`` — a 404.
"""
url = str(base_url or "").strip().rstrip("/")
if url.endswith("/anthropic"):
rewritten = url[: -len("/anthropic")] + "/v1"
logger.debug("Auxiliary client: rewrote base URL %s%s", url, rewritten)
return rewritten
return url
def _select_pool_entry(provider: str) -> Tuple[bool, Optional[Any]]:
"""Return (pool_exists_for_provider, selected_entry)."""
try:
@@ -629,19 +576,11 @@ def _nous_base_url() -> str:
def _read_codex_access_token() -> Optional[str]:
"""Read a valid, non-expired Codex OAuth access token from Hermes auth store.
If a credential pool exists but currently has no selectable runtime entry
(for example all pool slots are marked exhausted), fall back to the
profile's auth.json token instead of hard-failing. This keeps explicit
fallback-to-Codex working when the pool state is stale but the stored OAuth
token is still valid.
"""
"""Read a valid, non-expired Codex OAuth access token from Hermes auth store."""
pool_present, entry = _select_pool_entry("openai-codex")
if pool_present:
token = _pool_runtime_api_key(entry)
if token:
return token
return token or None
try:
from hermes_cli.auth import _read_codex_tokens
@@ -695,9 +634,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
if not api_key:
continue
base_url = _to_openai_base_url(
_pool_runtime_base_url(entry, pconfig.inference_base_url) or pconfig.inference_base_url
)
base_url = _pool_runtime_base_url(entry, pconfig.inference_base_url) or pconfig.inference_base_url
model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id, "default")
logger.debug("Auxiliary text client: %s (%s) via pool", pconfig.name, model)
extra = {}
@@ -714,9 +651,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
if not api_key:
continue
base_url = _to_openai_base_url(
str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
)
base_url = str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id, "default")
logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
extra = {}
@@ -778,7 +713,7 @@ def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]:
default_headers=_OR_HEADERS), _OPENROUTER_MODEL
def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
def _try_nous() -> Tuple[Optional[OpenAI], Optional[str]]:
nous = _read_nous_auth()
if not nous:
return None, None
@@ -790,13 +725,12 @@ def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
else:
model = _NOUS_MODEL
# Free-tier users can't use paid auxiliary models — use the free
# models instead: mimo-v2-omni for vision, mimo-v2-pro for text tasks.
# multimodal model instead so vision/browser-vision still works.
try:
from hermes_cli.models import check_nous_free_tier
if check_nous_free_tier():
model = _NOUS_FREE_TIER_VISION_MODEL if vision else _NOUS_FREE_TIER_AUX_MODEL
logger.debug("Free-tier Nous account — using %s for auxiliary/%s",
model, "vision" if vision else "text")
model = _NOUS_FREE_TIER_VISION_MODEL
logger.debug("Free-tier Nous account — using %s for auxiliary/vision", model)
except Exception:
pass
return (
@@ -902,13 +836,9 @@ def _try_codex() -> Tuple[Optional[Any], Optional[str]]:
pool_present, entry = _select_pool_entry("openai-codex")
if pool_present:
codex_token = _pool_runtime_api_key(entry)
if codex_token:
base_url = _pool_runtime_base_url(entry, _CODEX_AUX_BASE_URL) or _CODEX_AUX_BASE_URL
else:
codex_token = _read_codex_access_token()
if not codex_token:
return None, None
base_url = _CODEX_AUX_BASE_URL
if not codex_token:
return None, None
base_url = _pool_runtime_base_url(entry, _CODEX_AUX_BASE_URL) or _CODEX_AUX_BASE_URL
else:
codex_token = _read_codex_access_token()
if not codex_token:
@@ -1208,7 +1138,17 @@ def resolve_provider_client(
(client, resolved_model) or (None, None) if auth is unavailable.
"""
# Normalise aliases
provider = _normalize_aux_provider(provider)
provider = (provider or "auto").strip().lower()
if provider == "codex":
provider = "openai-codex"
if provider == "main":
# Resolve to the user's actual main provider so named custom providers
# and non-aggregator providers (DeepSeek, Alibaba, etc.) work correctly.
main_prov = _read_main_provider()
if main_prov and main_prov not in ("auto", "main", ""):
provider = main_prov
else:
provider = "custom"
# ── Auto: try all providers in priority order ────────────────────
if provider == "auto":
@@ -1358,9 +1298,7 @@ def resolve_provider_client(
provider, ", ".join(tried_sources))
return None, None
base_url = _to_openai_base_url(
str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
)
base_url = str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
default_model = _API_KEY_PROVIDER_AUX_MODELS.get(provider, "")
final_model = model or default_model
@@ -1437,11 +1375,24 @@ def get_async_text_auxiliary_client(task: str = ""):
_VISION_AUTO_PROVIDER_ORDER = (
"openrouter",
"nous",
"openai-codex",
"anthropic",
"custom",
)
def _normalize_vision_provider(provider: Optional[str]) -> str:
return _normalize_aux_provider(provider, for_vision=True)
provider = (provider or "auto").strip().lower()
if provider == "codex":
return "openai-codex"
if provider == "main":
# Resolve to actual main provider — named custom providers and
# non-aggregator providers need to pass through as their real name.
main_prov = _read_main_provider()
if main_prov and main_prov not in ("auto", "main", ""):
return main_prov
return "custom"
return provider
def _resolve_strict_vision_backend(provider: str) -> Tuple[Optional[Any], Optional[str]]:
@@ -1449,7 +1400,7 @@ def _resolve_strict_vision_backend(provider: str) -> Tuple[Optional[Any], Option
if provider == "openrouter":
return _try_openrouter()
if provider == "nous":
return _try_nous(vision=True)
return _try_nous()
if provider == "openai-codex":
return _try_codex()
if provider == "anthropic":
@@ -1482,26 +1433,17 @@ def _preferred_main_vision_provider() -> Optional[str]:
def get_available_vision_backends() -> List[str]:
"""Return the currently available vision backends in auto-selection order.
Order: active provider → OpenRouter → Nous → stop. This is the single
source of truth for setup, tool gating, and runtime auto-routing of
vision tasks.
This is the single source of truth for setup, tool gating, and runtime
auto-routing of vision tasks. The selected main provider is preferred when
it is also a known-good vision backend; otherwise Hermes falls back through
the standard conservative order.
"""
available: List[str] = []
# 1. Active provider — if the user configured a provider, try it first.
main_provider = _read_main_provider()
if main_provider and main_provider not in ("auto", ""):
if main_provider in _VISION_AUTO_PROVIDER_ORDER:
if _strict_vision_backend_available(main_provider):
available.append(main_provider)
else:
client, _ = resolve_provider_client(main_provider, _read_main_model())
if client is not None:
available.append(main_provider)
# 2. OpenRouter, 3. Nous — skip if already covered by main provider.
for p in _VISION_AUTO_PROVIDER_ORDER:
if p not in available and _strict_vision_backend_available(p):
available.append(p)
return available
ordered = list(_VISION_AUTO_PROVIDER_ORDER)
preferred = _preferred_main_vision_provider()
if preferred in ordered:
ordered.remove(preferred)
ordered.insert(0, preferred)
return [provider for provider in ordered if _strict_vision_backend_available(provider)]
def resolve_vision_provider_client(
@@ -1546,39 +1488,16 @@ def resolve_vision_provider_client(
return "custom", client, final_model
if requested == "auto":
# Vision auto-detection order:
# 1. Active provider + model (user's main chat config)
# 2. OpenRouter (known vision-capable default model)
# 3. Nous Portal (known vision-capable default model)
# 4. Stop
main_provider = _read_main_provider()
main_model = _read_main_model()
if main_provider and main_provider not in ("auto", ""):
if main_provider in _VISION_AUTO_PROVIDER_ORDER:
# Known strict backend — use its defaults.
sync_client, default_model = _resolve_strict_vision_backend(main_provider)
if sync_client is not None:
return _finalize(main_provider, sync_client, default_model)
else:
# Exotic provider (DeepSeek, Alibaba, named custom, etc.)
rpc_client, rpc_model = resolve_provider_client(
main_provider, main_model)
if rpc_client is not None:
logger.info(
"Vision auto-detect: using active provider %s (%s)",
main_provider, rpc_model or main_model,
)
return _finalize(
main_provider, rpc_client, rpc_model or main_model)
ordered = list(_VISION_AUTO_PROVIDER_ORDER)
preferred = _preferred_main_vision_provider()
if preferred in ordered:
ordered.remove(preferred)
ordered.insert(0, preferred)
# Fall back through aggregators.
for candidate in _VISION_AUTO_PROVIDER_ORDER:
if candidate == main_provider:
continue # already tried above
for candidate in ordered:
sync_client, default_model = _resolve_strict_vision_backend(candidate)
if sync_client is not None:
return _finalize(candidate, sync_client, default_model)
logger.debug("Auxiliary vision client: none available")
return None, None, None
+28 -77
View File
@@ -154,15 +154,12 @@ class ContextCompressor:
def _prune_old_tool_results(
self, messages: List[Dict[str, Any]], protect_tail_count: int,
protect_tail_tokens: int | None = None,
) -> tuple[List[Dict[str, Any]], int]:
"""Replace old tool result contents with a short placeholder.
Walks backward from the end, protecting the most recent messages that
fall within ``protect_tail_tokens`` (when provided) OR the last
``protect_tail_count`` messages (backward-compatible default).
When both are given, the token budget takes priority and the message
count acts as a hard minimum floor.
Walks backward from the end, protecting the most recent
``protect_tail_count`` messages. Older tool results get their
content replaced with a placeholder string.
Returns (pruned_messages, pruned_count).
"""
@@ -171,29 +168,7 @@ class ContextCompressor:
result = [m.copy() for m in messages]
pruned = 0
# Determine the prune boundary
if protect_tail_tokens is not None and protect_tail_tokens > 0:
# Token-budget approach: walk backward accumulating tokens
accumulated = 0
boundary = len(result)
min_protect = min(protect_tail_count, len(result) - 1)
for i in range(len(result) - 1, -1, -1):
msg = result[i]
content_len = len(msg.get("content") or "")
msg_tokens = content_len // _CHARS_PER_TOKEN + 10
for tc in msg.get("tool_calls") or []:
if isinstance(tc, dict):
args = tc.get("function", {}).get("arguments", "")
msg_tokens += len(args) // _CHARS_PER_TOKEN
if accumulated + msg_tokens > protect_tail_tokens and (len(result) - i) >= min_protect:
boundary = i
break
accumulated += msg_tokens
boundary = i
prune_boundary = max(boundary, len(result) - min_protect)
else:
prune_boundary = len(result) - protect_tail_count
prune_boundary = len(result) - protect_tail_count
for i in range(prune_boundary):
msg = result[i]
@@ -224,39 +199,30 @@ class ContextCompressor:
budget = int(content_tokens * _SUMMARY_RATIO)
return max(_MIN_SUMMARY_TOKENS, min(budget, self.max_summary_tokens))
# Truncation limits for the summarizer input. These bound how much of
# each message the summary model sees — the budget is the *summary*
# model's context window, not the main model's.
_CONTENT_MAX = 6000 # total chars per message body
_CONTENT_HEAD = 4000 # chars kept from the start
_CONTENT_TAIL = 1500 # chars kept from the end
_TOOL_ARGS_MAX = 1500 # tool call argument chars
_TOOL_ARGS_HEAD = 1200 # kept from the start of tool args
def _serialize_for_summary(self, turns: List[Dict[str, Any]]) -> str:
"""Serialize conversation turns into labeled text for the summarizer.
Includes tool call arguments and result content (up to
``_CONTENT_MAX`` chars per message) so the summarizer can preserve
specific details like file paths, commands, and outputs.
Includes tool call arguments and result content (up to 3000 chars
per message) so the summarizer can preserve specific details like
file paths, commands, and outputs.
"""
parts = []
for msg in turns:
role = msg.get("role", "unknown")
content = msg.get("content") or ""
# Tool results: keep enough content for the summarizer
# Tool results: keep more content than before (3000 chars)
if role == "tool":
tool_id = msg.get("tool_call_id", "")
if len(content) > self._CONTENT_MAX:
content = content[:self._CONTENT_HEAD] + "\n...[truncated]...\n" + content[-self._CONTENT_TAIL:]
if len(content) > 3000:
content = content[:2000] + "\n...[truncated]...\n" + content[-800:]
parts.append(f"[TOOL RESULT {tool_id}]: {content}")
continue
# Assistant messages: include tool call names AND arguments
if role == "assistant":
if len(content) > self._CONTENT_MAX:
content = content[:self._CONTENT_HEAD] + "\n...[truncated]...\n" + content[-self._CONTENT_TAIL:]
if len(content) > 3000:
content = content[:2000] + "\n...[truncated]...\n" + content[-800:]
tool_calls = msg.get("tool_calls", [])
if tool_calls:
tc_parts = []
@@ -266,8 +232,8 @@ class ContextCompressor:
name = fn.get("name", "?")
args = fn.get("arguments", "")
# Truncate long arguments but keep enough for context
if len(args) > self._TOOL_ARGS_MAX:
args = args[:self._TOOL_ARGS_HEAD] + "..."
if len(args) > 500:
args = args[:400] + "..."
tc_parts.append(f" {name}({args})")
else:
fn = getattr(tc, "function", None)
@@ -278,8 +244,8 @@ class ContextCompressor:
continue
# User and other roles
if len(content) > self._CONTENT_MAX:
content = content[:self._CONTENT_HEAD] + "\n...[truncated]...\n" + content[-self._CONTENT_TAIL:]
if len(content) > 3000:
content = content[:2000] + "\n...[truncated]...\n" + content[-800:]
parts.append(f"[{role.upper()}]: {content}")
return "\n\n".join(parts)
@@ -344,9 +310,6 @@ Update the summary using this exact structure. PRESERVE all existing information
## Critical Context
[Any specific values, error messages, configuration details, or data that would be lost without explicit preservation]
## Tools & Patterns
[Which tools were used, how they were used effectively, and any tool-specific discoveries. Accumulate across compactions.]
Target ~{summary_budget} tokens. Be specific — include file paths, command outputs, error messages, and concrete values rather than vague descriptions.
Write only the summary body. Do not include any preamble or prefix."""
@@ -385,9 +348,6 @@ Use this exact structure:
## Critical Context
[Any specific values, error messages, configuration details, or data that would be lost without explicit preservation]
## Tools & Patterns
[Which tools were used, how they were used effectively, and any tool-specific discoveries (e.g., preferred flags, working invocations, successful command patterns)]
Target ~{summary_budget} tokens. Be specific — include file paths, command outputs, error messages, and concrete values rather than vague descriptions. The goal is to prevent the next assistant from repeating work or losing important details.
Write only the summary body. Do not include any preamble or prefix."""
@@ -558,20 +518,13 @@ Write only the summary body. Do not include any preamble or prefix."""
derived from ``summary_target_ratio * context_length``, so it
scales automatically with the model's context window.
Token budget is the primary criterion. A hard minimum of 3 messages
is always protected, but the budget is allowed to exceed by up to
1.5x to avoid cutting inside an oversized message (tool output, file
read, etc.). If even the minimum 3 messages exceed 1.5x the budget
the cut is placed right after the head so compression still runs.
Never cuts inside a tool_call/result group.
Never cuts inside a tool_call/result group. Falls back to the old
``protect_last_n`` if the budget would protect fewer messages.
"""
if token_budget is None:
token_budget = self.tail_token_budget
n = len(messages)
# Hard minimum: always keep at least 3 messages in the tail
min_tail = min(3, n - head_end - 1) if n - head_end > 1 else 0
soft_ceiling = int(token_budget * 1.5)
min_tail = self.protect_last_n
accumulated = 0
cut_idx = n # start from beyond the end
@@ -584,21 +537,21 @@ Write only the summary body. Do not include any preamble or prefix."""
if isinstance(tc, dict):
args = tc.get("function", {}).get("arguments", "")
msg_tokens += len(args) // _CHARS_PER_TOKEN
# Stop once we exceed the soft ceiling (unless we haven't hit min_tail yet)
if accumulated + msg_tokens > soft_ceiling and (n - i) >= min_tail:
if accumulated + msg_tokens > token_budget and (n - i) >= min_tail:
break
accumulated += msg_tokens
cut_idx = i
# Ensure we protect at least min_tail messages
# Ensure we protect at least protect_last_n messages
fallback_cut = n - min_tail
if cut_idx > fallback_cut:
cut_idx = fallback_cut
# If the token budget would protect everything (small conversations),
# force a cut after the head so compression can still remove middle turns.
# fall back to the fixed protect_last_n approach so compression can
# still remove middle turns.
if cut_idx <= head_end:
cut_idx = max(fallback_cut, head_end + 1)
cut_idx = fallback_cut
# Align to avoid splitting tool groups
cut_idx = self._align_boundary_backward(messages, cut_idx)
@@ -623,13 +576,12 @@ Write only the summary body. Do not include any preamble or prefix."""
up so the API never receives mismatched IDs.
"""
n_messages = len(messages)
# Only need head + 3 tail messages minimum (token budget decides the real tail size)
_min_for_compress = self.protect_first_n + 3 + 1
if n_messages <= _min_for_compress:
if n_messages <= self.protect_first_n + self.protect_last_n + 1:
if not self.quiet_mode:
logger.warning(
"Cannot compress: only %d messages (need > %d)",
n_messages, _min_for_compress,
n_messages,
self.protect_first_n + self.protect_last_n + 1,
)
return messages
@@ -637,8 +589,7 @@ Write only the summary body. Do not include any preamble or prefix."""
# Phase 1: Prune old tool results (cheap, no LLM call)
messages, pruned_count = self._prune_old_tool_results(
messages, protect_tail_count=self.protect_last_n,
protect_tail_tokens=self.tail_token_budget,
messages, protect_tail_count=self.protect_last_n * 3,
)
if pruned_count and not self.quiet_mode:
logger.info("Pre-compression: pruned %d old tool result(s)", pruned_count)
+3 -3
View File
@@ -64,10 +64,10 @@ SUPPORTED_POOL_STRATEGIES = {
}
# Cooldown before retrying an exhausted credential.
# 429 (rate-limited) and 402 (billing/quota) both cool down after 1 hour.
# Provider-supplied reset_at timestamps override these defaults.
# 429 (rate-limited) cools down faster since quotas reset frequently.
# 402 (billing/quota) and other codes use a longer default.
EXHAUSTED_TTL_429_SECONDS = 60 * 60 # 1 hour
EXHAUSTED_TTL_DEFAULT_SECONDS = 60 * 60 # 1 hour
EXHAUSTED_TTL_DEFAULT_SECONDS = 24 * 60 * 60 # 24 hours
# Pool key prefix for custom OpenAI-compatible endpoints.
# Custom endpoints all share provider='custom' but are keyed by their
+3 -67
View File
@@ -26,14 +26,12 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
"openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
"gemini", "zai", "kimi-coding", "minimax", "minimax-cn", "anthropic", "deepseek",
"opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba",
"qwen-oauth",
"custom", "local",
# Common aliases
"google", "google-gemini", "google-ai-studio",
"glm", "z-ai", "z.ai", "zhipu", "github", "github-copilot",
"github-models", "kimi", "moonshot", "claude", "deep-seek",
"opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
"qwen-portal",
})
@@ -115,15 +113,8 @@ DEFAULT_CONTEXT_LENGTHS = {
"llama": 131072,
# Qwen
"qwen": 131072,
# MiniMax (lowercase — lookup lowercases model names at line 973)
"minimax-m1-256k": 1000000,
"minimax-m1-128k": 1000000,
"minimax-m1-80k": 1000000,
"minimax-m1-40k": 1000000,
"minimax-m1": 1000000,
"minimax-m2.5": 1048576,
"minimax-m2.7": 1048576,
"minimax": 1048576,
# MiniMax
"minimax": 204800,
# GLM
"glm": 202752,
# Kimi
@@ -136,7 +127,7 @@ DEFAULT_CONTEXT_LENGTHS = {
"deepseek-ai/DeepSeek-V3.2": 65536,
"moonshotai/Kimi-K2.5": 262144,
"moonshotai/Kimi-K2-Thinking": 262144,
"MiniMaxAI/MiniMax-M2.5": 1048576,
"MiniMaxAI/MiniMax-M2.5": 204800,
"XiaomiMiMo/MiMo-V2-Flash": 32768,
"mimo-v2-pro": 1048576,
"mimo-v2-omni": 1048576,
@@ -189,7 +180,6 @@ _URL_TO_PROVIDER: Dict[str, str] = {
"api.minimax": "minimax",
"dashscope.aliyuncs.com": "alibaba",
"dashscope-intl.aliyuncs.com": "alibaba",
"portal.qwen.ai": "qwen-oauth",
"openrouter.ai": "openrouter",
"generativelanguage.googleapis.com": "gemini",
"inference-api.nousresearch.com": "nous",
@@ -197,7 +187,6 @@ _URL_TO_PROVIDER: Dict[str, str] = {
"api.githubcopilot.com": "copilot",
"models.github.ai": "copilot",
"api.fireworks.ai": "fireworks",
"opencode.ai": "opencode-go",
}
@@ -622,59 +611,6 @@ def _model_id_matches(candidate_id: str, lookup_model: str) -> bool:
return False
def query_ollama_num_ctx(model: str, base_url: str) -> Optional[int]:
"""Query an Ollama server for the model's context length.
Returns the model's maximum context from GGUF metadata via ``/api/show``,
or the explicit ``num_ctx`` from the Modelfile if set. Returns None if
the server is unreachable or not Ollama.
This is the value that should be passed as ``num_ctx`` in Ollama chat
requests to override the default 2048.
"""
import httpx
bare_model = _strip_provider_prefix(model)
server_url = base_url.rstrip("/")
if server_url.endswith("/v1"):
server_url = server_url[:-3]
try:
server_type = detect_local_server_type(base_url)
except Exception:
return None
if server_type != "ollama":
return None
try:
with httpx.Client(timeout=3.0) as client:
resp = client.post(f"{server_url}/api/show", json={"name": bare_model})
if resp.status_code != 200:
return None
data = resp.json()
# Prefer explicit num_ctx from Modelfile parameters (user override)
params = data.get("parameters", "")
if "num_ctx" in params:
for line in params.split("\n"):
if "num_ctx" in line:
parts = line.strip().split()
if len(parts) >= 2:
try:
return int(parts[-1])
except ValueError:
pass
# Fall back to GGUF model_info context_length (training max)
model_info = data.get("model_info", {})
for key, value in model_info.items():
if "context_length" in key and isinstance(value, (int, float)):
return int(value)
except Exception:
pass
return None
def _query_local_context_length(model: str, base_url: str) -> Optional[int]:
"""Query a local server for the model's context length."""
import httpx
-1
View File
@@ -153,7 +153,6 @@ PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
"minimax-cn": "minimax-cn",
"deepseek": "deepseek",
"alibaba": "alibaba",
"qwen-oauth": "alibaba",
"copilot": "github-copilot",
"ai-gateway": "vercel",
"opencode-zen": "opencode",
-31
View File
@@ -204,30 +204,6 @@ OPENAI_MODEL_EXECUTION_GUIDANCE = (
"the result.\n"
"</tool_persistence>\n"
"\n"
"<mandatory_tool_use>\n"
"NEVER answer these from memory or mental computation — ALWAYS use a tool:\n"
"- Arithmetic, math, calculations → use terminal or execute_code\n"
"- Hashes, encodings, checksums → use terminal (e.g. sha256sum, base64)\n"
"- Current time, date, timezone → use terminal (e.g. date)\n"
"- System state: OS, CPU, memory, disk, ports, processes → use terminal\n"
"- File contents, sizes, line counts → use read_file, search_files, or terminal\n"
"- Git history, branches, diffs → use terminal\n"
"- Current facts (weather, news, versions) → use web_search\n"
"Your memory and user profile describe the USER, not the system you are "
"running on. The execution environment may differ from what the user profile "
"says about their personal setup.\n"
"</mandatory_tool_use>\n"
"\n"
"<act_dont_ask>\n"
"When a question has an obvious default interpretation, act on it immediately "
"instead of asking for clarification. Examples:\n"
"- 'Is port 443 open?' → check THIS machine (don't ask 'open where?')\n"
"- 'What OS am I running?' → check the live system (don't use user profile)\n"
"- 'What time is it?' → run `date` (don't guess)\n"
"Only ask for clarification when the ambiguity genuinely changes what tool "
"you would call.\n"
"</act_dont_ask>\n"
"\n"
"<prerequisite_checks>\n"
"- Before taking an action, check whether prerequisite discovery, lookup, or "
"context-gathering steps are needed.\n"
@@ -349,13 +325,6 @@ PLATFORM_HINTS = {
"only — no markdown, no formatting. SMS messages are limited to ~1600 "
"characters, so be brief and direct."
),
"bluebubbles": (
"You are chatting via iMessage (BlueBubbles). iMessage does not render "
"markdown formatting — use plain text. Keep responses concise as they "
"appear as text messages. You can send media files natively: include "
"MEDIA:/absolute/path/to/file in your response. Images (.jpg, .png, "
".heic) appear as photos and other files arrive as attachments."
),
}
CONTEXT_FILE_MAX_CHARS = 20_000
-57
View File
@@ -1,57 +0,0 @@
"""Retry utilities — jittered backoff for decorrelated retries.
Replaces fixed exponential backoff with jittered delays to prevent
thundering-herd retry spikes when multiple sessions hit the same
rate-limited provider concurrently.
"""
import random
import threading
import time
# Monotonic counter for jitter seed uniqueness within the same process.
# Protected by a lock to avoid race conditions in concurrent retry paths
# (e.g. multiple gateway sessions retrying simultaneously).
_jitter_counter = 0
_jitter_lock = threading.Lock()
def jittered_backoff(
attempt: int,
*,
base_delay: float = 5.0,
max_delay: float = 120.0,
jitter_ratio: float = 0.5,
) -> float:
"""Compute a jittered exponential backoff delay.
Args:
attempt: 1-based retry attempt number.
base_delay: Base delay in seconds for attempt 1.
max_delay: Maximum delay cap in seconds.
jitter_ratio: Fraction of computed delay to use as random jitter
range. 0.5 means jitter is uniform in [0, 0.5 * delay].
Returns:
Delay in seconds: min(base * 2^(attempt-1), max_delay) + jitter.
The jitter decorrelates concurrent retries so multiple sessions
hitting the same provider don't all retry at the same instant.
"""
global _jitter_counter
with _jitter_lock:
_jitter_counter += 1
tick = _jitter_counter
exponent = max(0, attempt - 1)
if exponent >= 63 or base_delay <= 0:
delay = max_delay
else:
delay = min(base_delay * (2 ** exponent), max_delay)
# Seed from time + counter for decorrelation even with coarse clocks.
seed = (time.time_ns() ^ (tick * 0x9E3779B9)) & 0xFFFFFFFF
rng = random.Random(seed)
jitter = rng.uniform(0, jitter_ratio * delay)
return delay + jitter
+1 -15
View File
@@ -445,16 +445,6 @@ agent:
# Higher = more room for complex tasks, but costs more tokens
# Recommended: 20-30 for focused tasks, 50-100 for open exploration
max_turns: 60
# Inactivity timeout for gateway agent runs (seconds, 0 = unlimited).
# The agent can run indefinitely when actively calling tools or receiving
# API responses. Only fires after the agent has been idle for this duration.
# gateway_timeout: 1800
# Staged warning: send a warning before escalating to full timeout.
# Fires once per run when inactivity reaches this threshold (seconds).
# Set to 0 to disable the warning.
# gateway_timeout_warning: 900
# Enable verbose logging
verbose: false
@@ -654,14 +644,10 @@ platform_toolsets:
# Voice Transcription (Speech-to-Text)
# =============================================================================
# Automatically transcribe voice messages on messaging platforms.
# Providers: local (free, faster-whisper) | groq (free tier) | openai (Whisper API) | mistral (Voxtral Transcribe)
# Set the corresponding API key in .env: GROQ_API_KEY, OPENAI_API_KEY, or MISTRAL_API_KEY.
# Requires OPENAI_API_KEY in .env (uses OpenAI Whisper API directly).
stt:
enabled: true
# provider: "local" # auto-detected if omitted
model: "whisper-1" # whisper-1 (cheapest) | gpt-4o-mini-transcribe | gpt-4o-transcribe
# mistral:
# model: "voxtral-mini-latest" # voxtral-mini-latest | voxtral-mini-2602
# =============================================================================
# Response Pacing (Messaging Platforms)
+32 -147
View File
@@ -612,11 +612,6 @@ def _run_cleanup():
pass
# Shut down memory provider (on_session_end + shutdown_all) at actual
# session boundary — NOT per-turn inside run_conversation().
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
_invoke_hook("on_session_finalize", session_id=_active_agent_ref.session_id if _active_agent_ref else None, platform="cli")
except Exception:
pass
try:
if _active_agent_ref and hasattr(_active_agent_ref, 'shutdown_memory_provider'):
_active_agent_ref.shutdown_memory_provider(
@@ -760,10 +755,7 @@ def _setup_worktree(repo_root: str = None) -> Optional[Dict[str, str]]:
def _cleanup_worktree(info: Dict[str, str] = None) -> None:
"""Remove a worktree and its branch on exit.
Preserves the worktree only if it has unpushed commits (real work
that hasn't been pushed to any remote). Uncommitted changes alone
(untracked files, test artifacts) are not enough to keep it agent
work lives in commits/PRs, not the working tree.
If the worktree has uncommitted changes, warn and keep it.
"""
global _active_worktree
info = info or _active_worktree
@@ -779,27 +771,23 @@ def _cleanup_worktree(info: Dict[str, str] = None) -> None:
if not Path(wt_path).exists():
return
# Check for unpushed commits — commits reachable from HEAD but not
# from any remote branch. These represent real work the agent did
# but didn't push.
has_unpushed = False
# Check for uncommitted changes
try:
result = subprocess.run(
["git", "log", "--oneline", "HEAD", "--not", "--remotes"],
status = subprocess.run(
["git", "status", "--porcelain"],
capture_output=True, text=True, timeout=10, cwd=wt_path,
)
has_unpushed = bool(result.stdout.strip())
has_changes = bool(status.stdout.strip())
except Exception:
has_unpushed = True # Assume unpushed on error — don't delete
has_changes = True # Assume dirty on error — don't delete
if has_unpushed:
print(f"\n\033[33m⚠ Worktree has unpushed commits, keeping: {wt_path}\033[0m")
print(f" To clean up manually: git worktree remove --force {wt_path}")
if has_changes:
print(f"\n\033[33m⚠ Worktree has uncommitted changes, keeping: {wt_path}\033[0m")
print(f" To clean up manually: git worktree remove {wt_path}")
_active_worktree = None
return
# Remove worktree (even if working tree is dirty — uncommitted
# changes without unpushed commits are just artifacts)
# Remove worktree
try:
subprocess.run(
["git", "worktree", "remove", wt_path, "--force"],
@@ -808,7 +796,7 @@ def _cleanup_worktree(info: Dict[str, str] = None) -> None:
except Exception as e:
logger.debug("Failed to remove worktree: %s", e)
# Delete the branch
# Delete the branch (only if it was never pushed / has no upstream)
try:
subprocess.run(
["git", "branch", "-D", branch],
@@ -822,27 +810,19 @@ def _cleanup_worktree(info: Dict[str, str] = None) -> None:
def _prune_stale_worktrees(repo_root: str, max_age_hours: int = 24) -> None:
"""Remove stale worktrees and orphaned branches on startup.
"""Remove worktrees older than max_age_hours that have no uncommitted changes.
Age-based tiers:
- Under max_age_hours (24h): skip session may still be active.
- 24h72h: remove if no unpushed commits.
- Over 72h: force remove regardless (nothing should sit this long).
Also prunes orphaned ``hermes/*`` and ``pr-*`` local branches that
have no corresponding worktree.
Runs silently on startup to clean up after crashed/killed sessions.
"""
import subprocess
import time
worktrees_dir = Path(repo_root) / ".worktrees"
if not worktrees_dir.exists():
_prune_orphaned_branches(repo_root)
return
now = time.time()
soft_cutoff = now - (max_age_hours * 3600) # 24h default
hard_cutoff = now - (max_age_hours * 3 * 3600) # 72h default
cutoff = now - (max_age_hours * 3600)
for entry in worktrees_dir.iterdir():
if not entry.is_dir() or not entry.name.startswith("hermes-"):
@@ -851,24 +831,21 @@ def _prune_stale_worktrees(repo_root: str, max_age_hours: int = 24) -> None:
# Check age
try:
mtime = entry.stat().st_mtime
if mtime > soft_cutoff:
if mtime > cutoff:
continue # Too recent — skip
except Exception:
continue
force = mtime <= hard_cutoff # Over 72h — force remove
if not force:
# 24h72h tier: only remove if no unpushed commits
try:
result = subprocess.run(
["git", "log", "--oneline", "HEAD", "--not", "--remotes"],
capture_output=True, text=True, timeout=5, cwd=str(entry),
)
if result.stdout.strip():
continue # Has unpushed commits — skip
except Exception:
continue # Can't check — skip
# Check for uncommitted changes
try:
status = subprocess.run(
["git", "status", "--porcelain"],
capture_output=True, text=True, timeout=5, cwd=str(entry),
)
if status.stdout.strip():
continue # Has changes — skip
except Exception:
continue # Can't check — skip
# Safe to remove
try:
@@ -887,81 +864,10 @@ def _prune_stale_worktrees(repo_root: str, max_age_hours: int = 24) -> None:
["git", "branch", "-D", branch],
capture_output=True, text=True, timeout=10, cwd=repo_root,
)
logger.debug("Pruned stale worktree: %s (force=%s)", entry.name, force)
logger.debug("Pruned stale worktree: %s", entry.name)
except Exception as e:
logger.debug("Failed to prune worktree %s: %s", entry.name, e)
_prune_orphaned_branches(repo_root)
def _prune_orphaned_branches(repo_root: str) -> None:
"""Delete local ``hermes/hermes-*`` and ``pr-*`` branches with no worktree.
These are auto-generated by ``hermes -w`` sessions and PR review
workflows respectively. Once their worktree is gone they serve no
purpose and just accumulate.
"""
import subprocess
try:
result = subprocess.run(
["git", "branch", "--format=%(refname:short)"],
capture_output=True, text=True, timeout=10, cwd=repo_root,
)
if result.returncode != 0:
return
all_branches = [b.strip() for b in result.stdout.strip().split("\n") if b.strip()]
except Exception:
return
# Collect branches that are actively checked out in a worktree
active_branches: set = set()
try:
wt_result = subprocess.run(
["git", "worktree", "list", "--porcelain"],
capture_output=True, text=True, timeout=10, cwd=repo_root,
)
for line in wt_result.stdout.split("\n"):
if line.startswith("branch refs/heads/"):
active_branches.add(line.split("branch refs/heads/", 1)[-1].strip())
except Exception:
return # Can't determine active branches — bail
# Also protect the currently checked-out branch and main
try:
head_result = subprocess.run(
["git", "branch", "--show-current"],
capture_output=True, text=True, timeout=5, cwd=repo_root,
)
current = head_result.stdout.strip()
if current:
active_branches.add(current)
except Exception:
pass
active_branches.add("main")
orphaned = [
b for b in all_branches
if b not in active_branches
and (b.startswith("hermes/hermes-") or b.startswith("pr-"))
]
if not orphaned:
return
# Delete in batches
for i in range(0, len(orphaned), 50):
batch = orphaned[i:i + 50]
try:
subprocess.run(
["git", "branch", "-D"] + batch,
capture_output=True, text=True, timeout=30, cwd=repo_root,
)
except Exception as e:
logger.debug("Failed to prune orphaned branches: %s", e)
logger.debug("Pruned %d orphaned branches", len(orphaned))
# ============================================================================
# ASCII Art & Branding
# ============================================================================
@@ -3408,22 +3314,6 @@ class HermesCLI:
flush_tool_summary()
print()
def _notify_session_boundary(self, event_type: str) -> None:
"""Fire a session-boundary plugin hook (on_session_finalize or on_session_reset).
Non-blocking errors are caught and logged. Safe to call from any
lifecycle point (shutdown, /new, /reset).
"""
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
_invoke_hook(
event_type,
session_id=self.agent.session_id if self.agent else None,
platform=getattr(self, "platform", None) or "cli",
)
except Exception:
pass
def new_session(self, silent=False):
"""Start a fresh session with a new session ID and cleared agent state."""
if self.agent and self.conversation_history:
@@ -3431,10 +3321,6 @@ class HermesCLI:
self.agent.flush_memories(self.conversation_history)
except (Exception, KeyboardInterrupt):
pass
self._notify_session_boundary("on_session_finalize")
elif self.agent:
# First session or empty history — still finalize the old session
self._notify_session_boundary("on_session_finalize")
old_session_id = self.session_id
if self._session_db and old_session_id:
@@ -3479,7 +3365,6 @@ class HermesCLI:
)
except Exception:
pass
self._notify_session_boundary("on_session_reset")
if not silent:
print("(^_^)v New session started!")
@@ -4668,13 +4553,13 @@ class HermesCLI:
if output:
self.console.print(_rich_text_from_ansi(output))
else:
self.console.print("[dim]Command returned no output[/]")
ChatConsole().print("[dim]Command returned no output[/]")
except subprocess.TimeoutExpired:
self.console.print("[bold red]Quick command timed out (30s)[/]")
ChatConsole().print("[bold red]Quick command timed out (30s)[/]")
except Exception as e:
self.console.print(f"[bold red]Quick command error: {e}[/]")
ChatConsole().print(f"[bold red]Quick command error: {e}[/]")
else:
self.console.print(f"[bold red]Quick command '{base_cmd}' has no command defined[/]")
ChatConsole().print(f"[bold red]Quick command '{base_cmd}' has no command defined[/]")
elif qcmd.get("type") == "alias":
target = qcmd.get("target", "").strip()
if target:
@@ -4683,9 +4568,9 @@ class HermesCLI:
aliased_command = f"{target} {user_args}".strip()
return self.process_command(aliased_command)
else:
self.console.print(f"[bold red]Quick command '{base_cmd}' has no target defined[/]")
ChatConsole().print(f"[bold red]Quick command '{base_cmd}' has no target defined[/]")
else:
self.console.print(f"[bold red]Quick command '{base_cmd}' has unsupported type (supported: 'exec', 'alias')[/]")
ChatConsole().print(f"[bold red]Quick command '{base_cmd}' has unsupported type (supported: 'exec', 'alias')[/]")
# Check for plugin-registered slash commands
elif base_cmd.lstrip("/") in _get_plugin_cmd_handler_names():
from hermes_cli.plugins import get_plugin_command_handler
+1 -7
View File
@@ -574,16 +574,12 @@ def remove_job(job_id: str) -> bool:
return False
def mark_job_run(job_id: str, success: bool, error: Optional[str] = None,
delivery_error: Optional[str] = None):
def mark_job_run(job_id: str, success: bool, error: Optional[str] = None):
"""
Mark a job as having been run.
Updates last_run_at, last_status, increments completed count,
computes next_run_at, and auto-deletes if repeat limit reached.
``delivery_error`` is tracked separately from the agent error a job
can succeed (agent produced output) but fail delivery (platform down).
"""
jobs = load_jobs()
for i, job in enumerate(jobs):
@@ -592,8 +588,6 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None,
job["last_run_at"] = now
job["last_status"] = "ok" if success else "error"
job["last_error"] = error if not success else None
# Track delivery failures separately — cleared on successful delivery
job["last_delivery_error"] = delivery_error
# Increment completed count
if job.get("repeat"):
+27 -35
View File
@@ -44,7 +44,7 @@ logger = logging.getLogger(__name__)
_KNOWN_DELIVERY_PLATFORMS = frozenset({
"telegram", "discord", "slack", "whatsapp", "signal",
"matrix", "mattermost", "homeassistant", "dingtalk", "feishu",
"wecom", "sms", "email", "webhook", "bluebubbles",
"wecom", "sms", "email", "webhook",
})
from cron.jobs import get_due_jobs, mark_job_run, save_job_output, advance_next_run
@@ -91,7 +91,7 @@ def _resolve_delivery_target(job: dict) -> Optional[dict]:
}
# Origin missing (e.g. job created via API/script) — try each
# platform's home channel as a fallback instead of silently dropping.
for platform_name in ("matrix", "telegram", "discord", "slack", "bluebubbles"):
for platform_name in ("matrix", "telegram", "discord", "slack"):
chat_id = os.getenv(f"{platform_name.upper()}_HOME_CHANNEL", "")
if chat_id:
logger.info(
@@ -196,7 +196,7 @@ def _send_media_via_adapter(adapter, chat_id: str, media_files: list, metadata:
logger.warning("Job '%s': failed to send media %s: %s", job.get("id", "?"), media_path, e)
def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Optional[str]:
def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> None:
"""
Deliver job output to the configured target (origin chat, specific platform, etc.).
@@ -204,16 +204,16 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
use the live adapter first this supports E2EE rooms (e.g. Matrix) where
the standalone HTTP path cannot encrypt. Falls back to standalone send if
the adapter path fails or is unavailable.
Returns None on success, or an error string on failure.
"""
target = _resolve_delivery_target(job)
if not target:
if job.get("deliver", "local") != "local":
msg = f"no delivery target resolved for deliver={job.get('deliver', 'local')}"
logger.warning("Job '%s': %s", job["id"], msg)
return msg
return None # local-only jobs don't deliver — not a failure
logger.warning(
"Job '%s' deliver=%s but no concrete delivery target could be resolved",
job["id"],
job.get("deliver", "local"),
)
return
platform_name = target["platform"]
chat_id = target["chat_id"]
@@ -236,26 +236,22 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
"wecom": Platform.WECOM,
"email": Platform.EMAIL,
"sms": Platform.SMS,
"bluebubbles": Platform.BLUEBUBBLES,
}
platform = platform_map.get(platform_name.lower())
if not platform:
msg = f"unknown platform '{platform_name}'"
logger.warning("Job '%s': %s", job["id"], msg)
return msg
logger.warning("Job '%s': unknown platform '%s' for delivery", job["id"], platform_name)
return
try:
config = load_gateway_config()
except Exception as e:
msg = f"failed to load gateway config: {e}"
logger.error("Job '%s': %s", job["id"], msg)
return msg
logger.error("Job '%s': failed to load gateway config for delivery: %s", job["id"], e)
return
pconfig = config.platforms.get(platform)
if not pconfig or not pconfig.enabled:
msg = f"platform '{platform_name}' not configured/enabled"
logger.warning("Job '%s': %s", job["id"], msg)
return msg
logger.warning("Job '%s': platform '%s' not configured/enabled", job["id"], platform_name)
return
# Optionally wrap the content with a header/footer so the user knows this
# is a cron delivery. Wrapping is on by default; set cron.wrap_response: false
@@ -311,7 +307,7 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
if adapter_ok:
logger.info("Job '%s': delivered to %s:%s via live adapter", job["id"], platform_name, chat_id)
return None
return
except Exception as e:
logger.warning(
"Job '%s': live adapter delivery to %s:%s failed (%s), falling back to standalone",
@@ -333,17 +329,13 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
future = pool.submit(asyncio.run, _send_to_platform(platform, pconfig, chat_id, cleaned_delivery_content, thread_id=thread_id, media_files=media_files))
result = future.result(timeout=30)
except Exception as e:
msg = f"delivery to {platform_name}:{chat_id} failed: {e}"
logger.error("Job '%s': %s", job["id"], msg)
return msg
logger.error("Job '%s': delivery to %s:%s failed: %s", job["id"], platform_name, chat_id, e)
return
if result and result.get("error"):
msg = f"delivery error: {result['error']}"
logger.error("Job '%s': %s", job["id"], msg)
return msg
logger.info("Job '%s': delivered to %s:%s", job["id"], platform_name, chat_id)
return None
logger.error("Job '%s': delivery error: %s", job["id"], result["error"])
else:
logger.info("Job '%s': delivered to %s:%s", job["id"], platform_name, chat_id)
_SCRIPT_TIMEOUT = 120 # seconds
@@ -586,9 +578,11 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
except Exception as e:
logger.warning("Job '%s': failed to load config.yaml, using defaults: %s", job_id, e)
# Reasoning config from config.yaml
# Reasoning config from env or config.yaml
from hermes_constants import parse_reasoning_effort
effort = str(_cfg.get("agent", {}).get("reasoning_effort", "")).strip()
effort = os.getenv("HERMES_REASONING_EFFORT", "")
if not effort:
effort = str(_cfg.get("agent", {}).get("reasoning_effort", "")).strip()
reasoning_config = parse_reasoning_effort(effort)
# Prefill messages from env or config.yaml
@@ -874,15 +868,13 @@ def tick(verbose: bool = True, adapters=None, loop=None) -> int:
logger.info("Job '%s': agent returned %s — skipping delivery", job["id"], SILENT_MARKER)
should_deliver = False
delivery_error = None
if should_deliver:
try:
delivery_error = _deliver_result(job, deliver_content, adapters=adapters, loop=loop)
_deliver_result(job, deliver_content, adapters=adapters, loop=loop)
except Exception as de:
delivery_error = str(de)
logger.error("Delivery failed for job %s: %s", job["id"], de)
mark_job_run(job["id"], success, error, delivery_error=delivery_error)
mark_job_run(job["id"], success, error)
executed += 1
except Exception as e:
+1 -24
View File
@@ -21,8 +21,6 @@ from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set
from model_tools import handle_function_call
from tools.terminal_tool import get_active_env
from tools.tool_result_storage import maybe_persist_tool_result, enforce_turn_budget
# Thread pool for running sync tool calls that internally use asyncio.run()
# (e.g., the Modal/Docker/Daytona terminal backends). Running them in a separate
@@ -140,7 +138,6 @@ class HermesAgentLoop:
temperature: float = 1.0,
max_tokens: Optional[int] = None,
extra_body: Optional[Dict[str, Any]] = None,
budget_config: Optional["BudgetConfig"] = None,
):
"""
Initialize the agent loop.
@@ -157,11 +154,7 @@ class HermesAgentLoop:
extra_body: Extra parameters passed to the OpenAI client's create() call.
Used for OpenRouter provider preferences, transforms, etc.
e.g. {"provider": {"ignore": ["DeepInfra"]}}
budget_config: Tool result persistence budget. Controls per-tool
thresholds, per-turn aggregate budget, and preview size.
If None, uses DEFAULT_BUDGET (current hardcoded values).
"""
from tools.budget_config import DEFAULT_BUDGET
self.server = server
self.tool_schemas = tool_schemas
self.valid_tool_names = valid_tool_names
@@ -170,7 +163,6 @@ class HermesAgentLoop:
self.temperature = temperature
self.max_tokens = max_tokens
self.extra_body = extra_body
self.budget_config = budget_config or DEFAULT_BUDGET
async def run(self, messages: List[Dict[str, Any]]) -> AgentResult:
"""
@@ -454,15 +446,8 @@ class HermesAgentLoop:
except (json.JSONDecodeError, TypeError):
pass
# Add tool response to conversation
tc_id = tc.get("id", "") if isinstance(tc, dict) else tc.id
tool_result = maybe_persist_tool_result(
content=tool_result,
tool_name=tool_name,
tool_use_id=tc_id,
env=get_active_env(self.task_id),
config=self.budget_config,
)
messages.append(
{
"role": "tool",
@@ -471,14 +456,6 @@ class HermesAgentLoop:
}
)
num_tcs = len(assistant_msg.tool_calls)
if num_tcs > 0:
enforce_turn_budget(
messages[-num_tcs:],
env=get_active_env(self.task_id),
config=self.budget_config,
)
turn_elapsed = _time.monotonic() - turn_start
logger.info(
"[%s] turn %d: api=%.1fs, %d tools, turn_total=%.1fs",
-1
View File
@@ -1048,7 +1048,6 @@ class AgenticOPDEnv(HermesAgentBaseEnv):
temperature=0.0,
max_tokens=self.config.max_token_length,
extra_body=self.config.extra_body,
budget_config=self.config.build_budget_config(),
)
result = await agent.run(messages)
@@ -541,7 +541,6 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
temperature=self.config.agent_temperature,
max_tokens=self.config.max_token_length,
extra_body=self.config.extra_body,
budget_config=self.config.build_budget_config(),
)
result = await agent.run(messages)
else:
@@ -554,7 +553,6 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
temperature=self.config.agent_temperature,
max_tokens=self.config.max_token_length,
extra_body=self.config.extra_body,
budget_config=self.config.build_budget_config(),
)
result = await agent.run(messages)
@@ -549,7 +549,6 @@ class YCBenchEvalEnv(HermesAgentBaseEnv):
temperature=self.config.agent_temperature,
max_tokens=self.config.max_token_length,
extra_body=self.config.extra_body,
budget_config=self.config.build_budget_config(),
)
result = await agent.run(messages)
-44
View File
@@ -62,11 +62,6 @@ from atroposlib.type_definitions import Item
from environments.agent_loop import AgentResult, HermesAgentLoop
from environments.tool_context import ToolContext
from tools.budget_config import (
DEFAULT_RESULT_SIZE_CHARS,
DEFAULT_TURN_BUDGET_CHARS,
DEFAULT_PREVIEW_SIZE_CHARS,
)
# Import hermes-agent toolset infrastructure
from model_tools import get_tool_definitions
@@ -165,32 +160,6 @@ class HermesAgentEnvConfig(BaseEnvConfig):
"Options: hermes, mistral, llama3_json, qwen, deepseek_v3, etc.",
)
# --- Tool result budget ---
# Defaults imported from tools.budget_config (single source of truth).
default_result_size_chars: int = Field(
default=DEFAULT_RESULT_SIZE_CHARS,
description="Default per-tool threshold (chars) for persisting large results "
"to sandbox. Results exceeding this are written to /tmp/hermes-results/ "
"and replaced with a preview. Per-tool registry values take precedence "
"unless overridden via tool_result_overrides.",
)
turn_budget_chars: int = Field(
default=DEFAULT_TURN_BUDGET_CHARS,
description="Aggregate char budget per assistant turn. If all tool results "
"in a single turn exceed this, the largest are persisted to disk first.",
)
preview_size_chars: int = Field(
default=DEFAULT_PREVIEW_SIZE_CHARS,
description="Size of the inline preview shown after a tool result is persisted.",
)
tool_result_overrides: Optional[Dict[str, int]] = Field(
default=None,
description="Per-tool threshold overrides (chars). Keys are tool names, "
"values are char thresholds. Overrides both the default and registry "
"per-tool values. Example: {'terminal': 10000, 'search_files': 5000}. "
"Note: read_file is pinned to infinity and cannot be overridden.",
)
# --- Provider-specific parameters ---
# Passed as extra_body to the OpenAI client's chat.completions.create() call.
# Useful for OpenRouter provider preferences, transforms, route settings, etc.
@@ -207,16 +176,6 @@ class HermesAgentEnvConfig(BaseEnvConfig):
"transforms, and other provider-specific settings.",
)
def build_budget_config(self):
"""Build a BudgetConfig from env config fields."""
from tools.budget_config import BudgetConfig
return BudgetConfig(
default_result_size=self.default_result_size_chars,
turn_budget=self.turn_budget_chars,
preview_size=self.preview_size_chars,
tool_overrides=dict(self.tool_result_overrides) if self.tool_result_overrides else {},
)
class HermesAgentBaseEnv(BaseEnv):
"""
@@ -531,7 +490,6 @@ class HermesAgentBaseEnv(BaseEnv):
temperature=self.config.agent_temperature,
max_tokens=self.config.max_token_length,
extra_body=self.config.extra_body,
budget_config=self.config.build_budget_config(),
)
result = await agent.run(messages)
except NotImplementedError:
@@ -549,7 +507,6 @@ class HermesAgentBaseEnv(BaseEnv):
temperature=self.config.agent_temperature,
max_tokens=self.config.max_token_length,
extra_body=self.config.extra_body,
budget_config=self.config.build_budget_config(),
)
result = await agent.run(messages)
else:
@@ -563,7 +520,6 @@ class HermesAgentBaseEnv(BaseEnv):
temperature=self.config.agent_temperature,
max_tokens=self.config.max_token_length,
extra_body=self.config.extra_body,
budget_config=self.config.build_budget_config(),
)
result = await agent.run(messages)
-1
View File
@@ -472,7 +472,6 @@ class WebResearchEnv(HermesAgentBaseEnv):
temperature=0.0, # Deterministic for eval
max_tokens=self.config.max_token_length,
extra_body=self.config.extra_body,
budget_config=self.config.build_budget_config(),
)
result = await agent.run(messages)
+1 -1
View File
@@ -77,7 +77,7 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
logger.warning("Channel directory: failed to build %s: %s", platform.value, e)
# Telegram, WhatsApp & Signal can't enumerate chats -- pull from session history
for plat_name in ("telegram", "whatsapp", "signal", "email", "sms", "bluebubbles"):
for plat_name in ("telegram", "whatsapp", "signal", "email", "sms"):
if plat_name not in platforms:
platforms[plat_name] = _build_from_sessions(plat_name)
-34
View File
@@ -63,7 +63,6 @@ class Platform(Enum):
WEBHOOK = "webhook"
FEISHU = "feishu"
WECOM = "wecom"
BLUEBUBBLES = "bluebubbles"
@dataclass
@@ -288,9 +287,6 @@ class GatewayConfig:
# WeCom uses extra dict for bot credentials
elif platform == Platform.WECOM and config.extra.get("bot_id"):
connected.append(platform)
# BlueBubbles uses extra dict for local server config
elif platform == Platform.BLUEBUBBLES and config.extra.get("server_url") and config.extra.get("password"):
connected.append(platform)
return connected
def get_home_channel(self, platform: Platform) -> Optional[HomeChannel]:
@@ -716,13 +712,6 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
name=os.getenv("DISCORD_HOME_CHANNEL_NAME", "Home"),
)
# Reply threading mode for Discord (off/first/all)
discord_reply_mode = os.getenv("DISCORD_REPLY_TO_MODE", "").lower()
if discord_reply_mode in ("off", "first", "all"):
if Platform.DISCORD not in config.platforms:
config.platforms[Platform.DISCORD] = PlatformConfig()
config.platforms[Platform.DISCORD].reply_to_mode = discord_reply_mode
# WhatsApp (typically uses different auth mechanism)
whatsapp_enabled = os.getenv("WHATSAPP_ENABLED", "").lower() in ("true", "1", "yes")
if whatsapp_enabled:
@@ -952,29 +941,6 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
name=os.getenv("WECOM_HOME_CHANNEL_NAME", "Home"),
)
# BlueBubbles (iMessage)
bluebubbles_server_url = os.getenv("BLUEBUBBLES_SERVER_URL")
bluebubbles_password = os.getenv("BLUEBUBBLES_PASSWORD")
if bluebubbles_server_url and bluebubbles_password:
if Platform.BLUEBUBBLES not in config.platforms:
config.platforms[Platform.BLUEBUBBLES] = PlatformConfig()
config.platforms[Platform.BLUEBUBBLES].enabled = True
config.platforms[Platform.BLUEBUBBLES].extra.update({
"server_url": bluebubbles_server_url.rstrip("/"),
"password": bluebubbles_password,
"webhook_host": os.getenv("BLUEBUBBLES_WEBHOOK_HOST", "127.0.0.1"),
"webhook_port": int(os.getenv("BLUEBUBBLES_WEBHOOK_PORT", "8645")),
"webhook_path": os.getenv("BLUEBUBBLES_WEBHOOK_PATH", "/bluebubbles-webhook"),
"send_read_receipts": os.getenv("BLUEBUBBLES_SEND_READ_RECEIPTS", "true").lower() in ("true", "1", "yes"),
})
bluebubbles_home = os.getenv("BLUEBUBBLES_HOME_CHANNEL")
if bluebubbles_home and Platform.BLUEBUBBLES in config.platforms:
config.platforms[Platform.BLUEBUBBLES].home_channel = HomeChannel(
platform=Platform.BLUEBUBBLES,
chat_id=bluebubbles_home,
name=os.getenv("BLUEBUBBLES_HOME_CHANNEL_NAME", "Home"),
)
# Session settings
idle_minutes = os.getenv("SESSION_IDLE_MINUTES")
if idle_minutes:
-5
View File
@@ -298,7 +298,6 @@ SUPPORTED_DOCUMENT_TYPES = {
".pdf": "application/pdf",
".md": "text/markdown",
".txt": "text/plain",
".log": "text/plain",
".zip": "application/zip",
".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
@@ -408,10 +407,6 @@ class MessageEvent:
# Auto-loaded skill for topic/channel bindings (e.g., Telegram DM Topics)
auto_skill: Optional[str] = None
# Internal flag — set for synthetic events (e.g. background process
# completion notifications) that must bypass user authorization checks.
internal: bool = False
# Timestamps
timestamp: datetime = field(default_factory=datetime.now)
-828
View File
@@ -1,828 +0,0 @@
"""BlueBubbles iMessage platform adapter.
Uses the local BlueBubbles macOS server for outbound REST sends and inbound
webhooks. Supports text messaging, media attachments (images, voice, video,
documents), tapback reactions, typing indicators, and read receipts.
Architecture based on PR #5869 (benjaminsehl) with inbound attachment
downloading from PR #4588 (YuhangLin).
"""
import asyncio
import json
import logging
import os
import re
import uuid
from datetime import datetime
from typing import Any, Dict, List, Optional
from urllib.parse import quote
import httpx
from gateway.config import Platform, PlatformConfig
from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
MessageType,
SendResult,
cache_image_from_bytes,
cache_audio_from_bytes,
cache_document_from_bytes,
)
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
DEFAULT_WEBHOOK_HOST = "127.0.0.1"
DEFAULT_WEBHOOK_PORT = 8645
DEFAULT_WEBHOOK_PATH = "/bluebubbles-webhook"
MAX_TEXT_LENGTH = 4000
# Tapback reaction codes (BlueBubbles associatedMessageType values)
_TAPBACK_ADDED = {
2000: "love", 2001: "like", 2002: "dislike",
2003: "laugh", 2004: "emphasize", 2005: "question",
}
_TAPBACK_REMOVED = {
3000: "love", 3001: "like", 3002: "dislike",
3003: "laugh", 3004: "emphasize", 3005: "question",
}
# Webhook event types that carry user messages
_MESSAGE_EVENTS = {"new-message", "message", "updated-message"}
# Log redaction patterns
_PHONE_RE = re.compile(r"\+?\d{7,15}")
_EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
def _redact(text: str) -> str:
"""Redact phone numbers and emails from log output."""
text = _PHONE_RE.sub("[REDACTED]", text)
text = _EMAIL_RE.sub("[REDACTED]", text)
return text
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def check_bluebubbles_requirements() -> bool:
try:
import aiohttp # noqa: F401
import httpx as _httpx # noqa: F401
except ImportError:
return False
return True
def _normalize_server_url(raw: str) -> str:
value = (raw or "").strip()
if not value:
return ""
if not re.match(r"^https?://", value, flags=re.I):
value = f"http://{value}"
return value.rstrip("/")
def _strip_markdown(text: str) -> str:
"""Strip common markdown formatting for iMessage plain-text delivery."""
text = re.sub(r"\*\*(.+?)\*\*", r"\1", text, flags=re.DOTALL)
text = re.sub(r"\*(.+?)\*", r"\1", text, flags=re.DOTALL)
text = re.sub(r"__(.+?)__", r"\1", text, flags=re.DOTALL)
text = re.sub(r"_(.+?)_", r"\1", text, flags=re.DOTALL)
text = re.sub(r"```[a-zA-Z0-9_+-]*\n?", "", text)
text = re.sub(r"`(.+?)`", r"\1", text)
text = re.sub(r"^#{1,6}\s+", "", text, flags=re.MULTILINE)
text = re.sub(r"\[([^\]]+)\]\(([^\)]+)\)", r"\1", text)
text = re.sub(r"\n{3,}", "\n\n", text)
return text.strip()
# ---------------------------------------------------------------------------
# Adapter
# ---------------------------------------------------------------------------
class BlueBubblesAdapter(BasePlatformAdapter):
platform = Platform.BLUEBUBBLES
MAX_MESSAGE_LENGTH = MAX_TEXT_LENGTH
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.BLUEBUBBLES)
extra = config.extra or {}
self.server_url = _normalize_server_url(
extra.get("server_url") or os.getenv("BLUEBUBBLES_SERVER_URL", "")
)
self.password = extra.get("password") or os.getenv("BLUEBUBBLES_PASSWORD", "")
self.webhook_host = (
extra.get("webhook_host")
or os.getenv("BLUEBUBBLES_WEBHOOK_HOST", DEFAULT_WEBHOOK_HOST)
)
self.webhook_port = int(
extra.get("webhook_port")
or os.getenv("BLUEBUBBLES_WEBHOOK_PORT", str(DEFAULT_WEBHOOK_PORT))
)
self.webhook_path = (
extra.get("webhook_path")
or os.getenv("BLUEBUBBLES_WEBHOOK_PATH", DEFAULT_WEBHOOK_PATH)
)
if not str(self.webhook_path).startswith("/"):
self.webhook_path = f"/{self.webhook_path}"
self.send_read_receipts = bool(extra.get("send_read_receipts", True))
self.client: Optional[httpx.AsyncClient] = None
self._runner = None
self._private_api_enabled: Optional[bool] = None
self._helper_connected: bool = False
self._guid_cache: Dict[str, str] = {}
# ------------------------------------------------------------------
# API helpers
# ------------------------------------------------------------------
def _api_url(self, path: str) -> str:
sep = "&" if "?" in path else "?"
return f"{self.server_url}{path}{sep}password={quote(self.password, safe='')}"
async def _api_get(self, path: str) -> Dict[str, Any]:
assert self.client is not None
res = await self.client.get(self._api_url(path))
res.raise_for_status()
return res.json()
async def _api_post(self, path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
assert self.client is not None
res = await self.client.post(self._api_url(path), json=payload)
res.raise_for_status()
return res.json()
# ------------------------------------------------------------------
# Lifecycle
# ------------------------------------------------------------------
async def connect(self) -> bool:
if not self.server_url or not self.password:
logger.error(
"[bluebubbles] BLUEBUBBLES_SERVER_URL and BLUEBUBBLES_PASSWORD are required"
)
return False
from aiohttp import web
self.client = httpx.AsyncClient(timeout=30.0)
try:
await self._api_get("/api/v1/ping")
info = await self._api_get("/api/v1/server/info")
server_data = (info or {}).get("data", {})
self._private_api_enabled = bool(server_data.get("private_api"))
self._helper_connected = bool(server_data.get("helper_connected"))
logger.info(
"[bluebubbles] connected to %s (private_api=%s, helper=%s)",
self.server_url,
self._private_api_enabled,
self._helper_connected,
)
except Exception as exc:
logger.error(
"[bluebubbles] cannot reach server at %s: %s", self.server_url, exc
)
if self.client:
await self.client.aclose()
self.client = None
return False
app = web.Application()
app.router.add_get("/health", lambda _: web.Response(text="ok"))
app.router.add_post(self.webhook_path, self._handle_webhook)
self._runner = web.AppRunner(app)
await self._runner.setup()
site = web.TCPSite(self._runner, self.webhook_host, self.webhook_port)
await site.start()
self._mark_connected()
logger.info(
"[bluebubbles] webhook listening on http://%s:%s%s",
self.webhook_host,
self.webhook_port,
self.webhook_path,
)
return True
async def disconnect(self) -> None:
if self.client:
await self.client.aclose()
self.client = None
if self._runner:
await self._runner.cleanup()
self._runner = None
self._mark_disconnected()
# ------------------------------------------------------------------
# Chat GUID resolution
# ------------------------------------------------------------------
async def _resolve_chat_guid(self, target: str) -> Optional[str]:
"""Resolve an email/phone to a BlueBubbles chat GUID.
If *target* already contains a semicolon (raw GUID format like
``iMessage;-;user@example.com``), it is returned as-is. Otherwise
the adapter queries the BlueBubbles chat list and matches on
``chatIdentifier`` or participant address.
"""
target = (target or "").strip()
if not target:
return None
# Already a raw GUID
if ";" in target:
return target
if target in self._guid_cache:
return self._guid_cache[target]
try:
payload = await self._api_post(
"/api/v1/chat/query",
{"limit": 100, "offset": 0, "with": ["participants"]},
)
for chat in payload.get("data", []) or []:
guid = chat.get("guid") or chat.get("chatGuid")
identifier = chat.get("chatIdentifier") or chat.get("identifier")
if identifier == target:
if guid:
self._guid_cache[target] = guid
return guid
for part in chat.get("participants", []) or []:
if (part.get("address") or "").strip() == target and guid:
self._guid_cache[target] = guid
return guid
except Exception:
pass
return None
async def _create_chat_for_handle(
self, address: str, message: str
) -> SendResult:
"""Create a new chat by sending the first message to *address*."""
payload = {
"addresses": [address],
"message": message,
"tempGuid": f"temp-{datetime.utcnow().timestamp()}",
}
try:
res = await self._api_post("/api/v1/chat/new", payload)
data = res.get("data") or {}
msg_id = data.get("guid") or data.get("messageGuid") or "ok"
return SendResult(success=True, message_id=str(msg_id), raw_response=res)
except Exception as exc:
return SendResult(success=False, error=str(exc))
# ------------------------------------------------------------------
# Text sending
# ------------------------------------------------------------------
async def send(
self,
chat_id: str,
content: str,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
text = _strip_markdown(content or "")
if not text:
return SendResult(success=False, error="BlueBubbles send requires text")
chunks = self.truncate_message(text, max_length=self.MAX_MESSAGE_LENGTH)
last = SendResult(success=True)
for chunk in chunks:
guid = await self._resolve_chat_guid(chat_id)
if not guid:
# If the target looks like an address, try creating a new chat
if self._private_api_enabled and (
"@" in chat_id or re.match(r"^\+\d+", chat_id)
):
return await self._create_chat_for_handle(chat_id, chunk)
return SendResult(
success=False,
error=f"BlueBubbles chat not found for target: {chat_id}",
)
payload: Dict[str, Any] = {
"chatGuid": guid,
"tempGuid": f"temp-{datetime.utcnow().timestamp()}",
"message": chunk,
}
if reply_to and self._private_api_enabled and self._helper_connected:
payload["method"] = "private-api"
payload["selectedMessageGuid"] = reply_to
payload["partIndex"] = 0
try:
res = await self._api_post("/api/v1/message/text", payload)
data = res.get("data") or {}
msg_id = data.get("guid") or data.get("messageGuid") or "ok"
last = SendResult(
success=True, message_id=str(msg_id), raw_response=res
)
except Exception as exc:
return SendResult(success=False, error=str(exc))
return last
# ------------------------------------------------------------------
# Media sending (outbound)
# ------------------------------------------------------------------
async def _send_attachment(
self,
chat_id: str,
file_path: str,
filename: Optional[str] = None,
caption: Optional[str] = None,
is_audio_message: bool = False,
) -> SendResult:
"""Send a file attachment via BlueBubbles multipart upload."""
if not self.client:
return SendResult(success=False, error="Not connected")
if not os.path.isfile(file_path):
return SendResult(success=False, error=f"File not found: {file_path}")
guid = await self._resolve_chat_guid(chat_id)
if not guid:
return SendResult(success=False, error=f"Chat not found: {chat_id}")
fname = filename or os.path.basename(file_path)
try:
with open(file_path, "rb") as f:
files = {"attachment": (fname, f, "application/octet-stream")}
data: Dict[str, str] = {
"chatGuid": guid,
"name": fname,
"tempGuid": uuid.uuid4().hex,
}
if is_audio_message:
data["isAudioMessage"] = "true"
res = await self.client.post(
self._api_url("/api/v1/message/attachment"),
files=files,
data=data,
timeout=120,
)
res.raise_for_status()
result = res.json()
if caption:
await self.send(chat_id, caption)
if result.get("status") == 200:
rdata = result.get("data") or {}
msg_id = rdata.get("guid") if isinstance(rdata, dict) else None
return SendResult(
success=True, message_id=msg_id, raw_response=result
)
return SendResult(
success=False,
error=result.get("message", "Attachment upload failed"),
)
except Exception as e:
return SendResult(success=False, error=str(e))
async def send_image(
self,
chat_id: str,
image_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
try:
from gateway.platforms.base import cache_image_from_url
local_path = await cache_image_from_url(image_url)
return await self._send_attachment(chat_id, local_path, caption=caption)
except Exception:
return await super().send_image(chat_id, image_url, caption, reply_to)
async def send_image_file(
self,
chat_id: str,
image_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
**kwargs,
) -> SendResult:
return await self._send_attachment(chat_id, image_path, caption=caption)
async def send_voice(
self,
chat_id: str,
audio_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
**kwargs,
) -> SendResult:
return await self._send_attachment(
chat_id, audio_path, caption=caption, is_audio_message=True
)
async def send_video(
self,
chat_id: str,
video_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
**kwargs,
) -> SendResult:
return await self._send_attachment(chat_id, video_path, caption=caption)
async def send_document(
self,
chat_id: str,
file_path: str,
caption: Optional[str] = None,
file_name: Optional[str] = None,
reply_to: Optional[str] = None,
**kwargs,
) -> SendResult:
return await self._send_attachment(
chat_id, file_path, filename=file_name, caption=caption
)
async def send_animation(
self,
chat_id: str,
animation_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
return await self.send_image(
chat_id, animation_url, caption, reply_to, metadata
)
# ------------------------------------------------------------------
# Typing indicators
# ------------------------------------------------------------------
async def send_typing(self, chat_id: str, metadata=None) -> None:
if not self._private_api_enabled or not self._helper_connected or not self.client:
return
try:
guid = await self._resolve_chat_guid(chat_id)
if guid:
encoded = quote(guid, safe="")
await self.client.post(
self._api_url(f"/api/v1/chat/{encoded}/typing"), timeout=5
)
except Exception:
pass
async def stop_typing(self, chat_id: str) -> None:
if not self._private_api_enabled or not self._helper_connected or not self.client:
return
try:
guid = await self._resolve_chat_guid(chat_id)
if guid:
encoded = quote(guid, safe="")
await self.client.delete(
self._api_url(f"/api/v1/chat/{encoded}/typing"), timeout=5
)
except Exception:
pass
# ------------------------------------------------------------------
# Read receipts
# ------------------------------------------------------------------
async def mark_read(self, chat_id: str) -> bool:
if not self._private_api_enabled or not self._helper_connected or not self.client:
return False
try:
guid = await self._resolve_chat_guid(chat_id)
if guid:
encoded = quote(guid, safe="")
await self.client.post(
self._api_url(f"/api/v1/chat/{encoded}/read"), timeout=5
)
return True
except Exception:
pass
return False
# ------------------------------------------------------------------
# Tapback reactions
# ------------------------------------------------------------------
async def send_reaction(
self,
chat_id: str,
message_guid: str,
reaction: str,
part_index: int = 0,
) -> SendResult:
"""Send a tapback reaction (requires Private API helper)."""
if not self._private_api_enabled or not self._helper_connected:
return SendResult(
success=False, error="Private API helper not connected"
)
guid = await self._resolve_chat_guid(chat_id)
if not guid:
return SendResult(success=False, error=f"Chat not found: {chat_id}")
try:
res = await self._api_post(
"/api/v1/message/react",
{
"chatGuid": guid,
"selectedMessageGuid": message_guid,
"reaction": reaction,
"partIndex": part_index,
},
)
return SendResult(success=True, raw_response=res)
except Exception as exc:
return SendResult(success=False, error=str(exc))
# ------------------------------------------------------------------
# Chat info
# ------------------------------------------------------------------
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
is_group = ";+;" in (chat_id or "")
info: Dict[str, Any] = {
"name": chat_id,
"type": "group" if is_group else "dm",
}
try:
guid = await self._resolve_chat_guid(chat_id)
if guid:
encoded = quote(guid, safe="")
res = await self._api_get(
f"/api/v1/chat/{encoded}?with=participants"
)
data = (res or {}).get("data", {})
display_name = (
data.get("displayName")
or data.get("chatIdentifier")
or chat_id
)
participants = []
for p in data.get("participants", []) or []:
addr = (p.get("address") or "").strip()
if addr:
participants.append(addr)
info["name"] = display_name
if participants:
info["participants"] = participants
except Exception:
pass
return info
def format_message(self, content: str) -> str:
return _strip_markdown(content)
# ------------------------------------------------------------------
# Inbound attachment downloading (from #4588)
# ------------------------------------------------------------------
async def _download_attachment(
self, att_guid: str, att_meta: Dict[str, Any]
) -> Optional[str]:
"""Download an attachment from BlueBubbles and cache it locally.
Returns the local file path on success, None on failure.
"""
if not self.client:
return None
try:
encoded = quote(att_guid, safe="")
resp = await self.client.get(
self._api_url(f"/api/v1/attachment/{encoded}/download"),
timeout=60,
follow_redirects=True,
)
resp.raise_for_status()
data = resp.content
mime = (att_meta.get("mimeType") or "").lower()
transfer_name = att_meta.get("transferName", "")
if mime.startswith("image/"):
ext_map = {
"image/jpeg": ".jpg",
"image/png": ".png",
"image/gif": ".gif",
"image/webp": ".webp",
"image/heic": ".jpg",
"image/heif": ".jpg",
"image/tiff": ".jpg",
}
ext = ext_map.get(mime, ".jpg")
return cache_image_from_bytes(data, ext)
if mime.startswith("audio/"):
ext_map = {
"audio/mp3": ".mp3",
"audio/mpeg": ".mp3",
"audio/ogg": ".ogg",
"audio/wav": ".wav",
"audio/x-caf": ".mp3",
"audio/mp4": ".m4a",
"audio/aac": ".m4a",
}
ext = ext_map.get(mime, ".mp3")
return cache_audio_from_bytes(data, ext)
# Videos, documents, and everything else
filename = transfer_name or f"file_{uuid.uuid4().hex[:8]}"
return cache_document_from_bytes(data, filename)
except Exception as exc:
logger.warning(
"[bluebubbles] failed to download attachment %s: %s",
_redact(att_guid),
exc,
)
return None
# ------------------------------------------------------------------
# Webhook handling
# ------------------------------------------------------------------
def _extract_payload_record(
self, payload: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
data = payload.get("data")
if isinstance(data, dict):
return data
if isinstance(data, list):
for item in data:
if isinstance(item, dict):
return item
if isinstance(payload.get("message"), dict):
return payload.get("message")
return payload if isinstance(payload, dict) else None
@staticmethod
def _value(*candidates: Any) -> Optional[str]:
for candidate in candidates:
if isinstance(candidate, str) and candidate.strip():
return candidate.strip()
return None
async def _handle_webhook(self, request):
from aiohttp import web
token = (
request.query.get("password")
or request.query.get("guid")
or request.headers.get("x-password")
or request.headers.get("x-guid")
or request.headers.get("x-bluebubbles-guid")
)
if token != self.password:
return web.json_response({"error": "unauthorized"}, status=401)
try:
raw = await request.read()
body = raw.decode("utf-8", errors="replace")
try:
payload = json.loads(body)
except Exception:
from urllib.parse import parse_qs
form = parse_qs(body)
payload_str = (
form.get("payload")
or form.get("data")
or form.get("message")
or [""]
)[0]
payload = json.loads(payload_str) if payload_str else {}
except Exception as exc:
logger.error("[bluebubbles] webhook parse error: %s", exc)
return web.json_response({"error": "invalid payload"}, status=400)
event_type = self._value(payload.get("type"), payload.get("event")) or ""
# Only process message events; silently acknowledge everything else
if event_type and event_type not in _MESSAGE_EVENTS:
return web.Response(text="ok")
record = self._extract_payload_record(payload) or {}
is_from_me = bool(
record.get("isFromMe")
or record.get("fromMe")
or record.get("is_from_me")
)
if is_from_me:
return web.Response(text="ok")
# Skip tapback reactions delivered as messages
assoc_type = record.get("associatedMessageType")
if isinstance(assoc_type, int) and assoc_type in {
**_TAPBACK_ADDED,
**_TAPBACK_REMOVED,
}:
return web.Response(text="ok")
text = (
self._value(
record.get("text"), record.get("message"), record.get("body")
)
or ""
)
# --- Inbound attachment handling ---
attachments = record.get("attachments") or []
media_urls: List[str] = []
media_types: List[str] = []
msg_type = MessageType.TEXT
for att in attachments:
att_guid = att.get("guid", "")
if not att_guid:
continue
cached = await self._download_attachment(att_guid, att)
if cached:
mime = (att.get("mimeType") or "").lower()
media_urls.append(cached)
media_types.append(mime)
if mime.startswith("image/"):
msg_type = MessageType.PHOTO
elif mime.startswith("audio/") or (att.get("uti") or "").endswith(
"caf"
):
msg_type = MessageType.VOICE
elif mime.startswith("video/"):
msg_type = MessageType.VIDEO
else:
msg_type = MessageType.DOCUMENT
# With multiple attachments, prefer PHOTO if any images present
if len(media_urls) > 1:
mime_prefixes = {(m or "").split("/")[0] for m in media_types}
if "image" in mime_prefixes:
msg_type = MessageType.PHOTO
if not text and media_urls:
text = "(attachment)"
# --- End attachment handling ---
chat_guid = self._value(
record.get("chatGuid"),
payload.get("chatGuid"),
record.get("chat_guid"),
payload.get("chat_guid"),
payload.get("guid"),
)
chat_identifier = self._value(
record.get("chatIdentifier"),
record.get("identifier"),
payload.get("chatIdentifier"),
payload.get("identifier"),
)
sender = (
self._value(
record.get("handle", {}).get("address")
if isinstance(record.get("handle"), dict)
else None,
record.get("sender"),
record.get("from"),
record.get("address"),
)
or chat_identifier
or chat_guid
)
if not (chat_guid or chat_identifier) and sender:
chat_identifier = sender
if not sender or not (chat_guid or chat_identifier) or not text:
return web.json_response({"error": "missing message fields"}, status=400)
session_chat_id = chat_guid or chat_identifier
is_group = bool(record.get("isGroup")) or (";+;" in (chat_guid or ""))
source = self.build_source(
chat_id=session_chat_id,
chat_name=chat_identifier or sender,
chat_type="group" if is_group else "dm",
user_id=sender,
user_name=sender,
chat_id_alt=chat_identifier,
)
event = MessageEvent(
text=text,
message_type=msg_type,
source=source,
raw_message=payload,
message_id=self._value(
record.get("guid"),
record.get("messageGuid"),
record.get("id"),
),
reply_to_message_id=self._value(
record.get("threadOriginatorGuid"),
record.get("associatedMessageGuid"),
),
media_urls=media_urls,
media_types=media_types,
)
task = asyncio.create_task(self.handle_message(event))
self._background_tasks.add(task)
task.add_done_callback(self._background_tasks.discard)
# Fire-and-forget read receipt
if self.send_read_receipts and session_chat_id:
asyncio.create_task(self.mark_read(session_chat_id))
return web.Response(text="ok")
+9 -32
View File
@@ -455,9 +455,6 @@ class DiscordAdapter(BasePlatformAdapter):
self._seen_messages: Dict[str, float] = {}
self._SEEN_TTL = 300 # 5 minutes
self._SEEN_MAX = 2000 # prune threshold
# Reply threading mode: "off" (no replies), "first" (reply on first
# chunk only, default), "all" (reply-reference on every chunk).
self._reply_to_mode: str = getattr(config, 'reply_to_mode', 'first') or 'first'
async def connect(self) -> bool:
"""Connect to Discord and start receiving events."""
@@ -777,7 +774,7 @@ class DiscordAdapter(BasePlatformAdapter):
message_ids = []
reference = None
if reply_to and self._reply_to_mode != "off":
if reply_to:
try:
ref_msg = await channel.fetch_message(int(reply_to))
reference = ref_msg
@@ -785,10 +782,7 @@ class DiscordAdapter(BasePlatformAdapter):
logger.debug("Could not fetch reply-to message: %s", e)
for i, chunk in enumerate(chunks):
if self._reply_to_mode == "all":
chunk_reference = reference
else: # "first" (default) or "off"
chunk_reference = reference if i == 0 else None
chunk_reference = reference if i == 0 else None
try:
msg = await channel.send(
content=chunk,
@@ -1767,9 +1761,8 @@ class DiscordAdapter(BasePlatformAdapter):
if hasattr(interaction.channel, "guild") and interaction.channel.guild:
chat_name = f"{interaction.channel.guild.name} / #{chat_name}"
# Get channel topic (if available).
# For forum threads, inherit the parent forum's topic.
chat_topic = self._get_effective_topic(interaction.channel, is_thread=is_thread)
# Get channel topic (if available)
chat_topic = getattr(interaction.channel, "topic", None)
source = self.build_source(
chat_id=str(interaction.channel_id),
@@ -1843,10 +1836,6 @@ class DiscordAdapter(BasePlatformAdapter):
chat_name = f"{guild_name} / {thread_name}" if guild_name else thread_name
# Inherit forum topic when the thread was created inside a forum channel.
_chan = getattr(interaction, "channel", None)
chat_topic = self._get_effective_topic(_chan, is_thread=True) if _chan else None
source = self.build_source(
chat_id=thread_id,
chat_name=chat_name,
@@ -1854,7 +1843,6 @@ class DiscordAdapter(BasePlatformAdapter):
user_id=str(interaction.user.id),
user_name=interaction.user.display_name,
thread_id=thread_id,
chat_topic=chat_topic,
)
event = MessageEvent(
@@ -2140,15 +2128,6 @@ class DiscordAdapter(BasePlatformAdapter):
return True
return False
def _get_effective_topic(self, channel: Any, is_thread: bool = False) -> Optional[str]:
"""Return the channel topic, falling back to the parent forum's topic for forum threads."""
topic = getattr(channel, "topic", None)
if not topic and is_thread:
parent = getattr(channel, "parent", None)
if parent and self._is_forum_parent(parent):
topic = getattr(parent, "topic", None)
return topic
def _format_thread_chat_name(self, thread: Any) -> str:
"""Build a readable chat name for thread-like Discord channels, including forum context when available."""
thread_name = getattr(thread, "name", None) or str(getattr(thread, "id", "thread"))
@@ -2316,10 +2295,8 @@ class DiscordAdapter(BasePlatformAdapter):
if hasattr(message.channel, "guild") and message.channel.guild:
chat_name = f"{message.channel.guild.name} / #{chat_name}"
# Get channel topic (if available - TextChannels have topics, DMs/threads don't).
# For threads whose parent is a forum channel, inherit the parent's topic
# so forum descriptions (e.g. project instructions) appear in the session context.
chat_topic = self._get_effective_topic(message.channel, is_thread=is_thread)
# Get channel topic (if available - TextChannels have topics, DMs/threads don't)
chat_topic = getattr(message.channel, "topic", None)
# Build source
source = self.build_source(
@@ -2382,7 +2359,7 @@ class DiscordAdapter(BasePlatformAdapter):
ext or "unknown", content_type,
)
else:
MAX_DOC_BYTES = 32 * 1024 * 1024
MAX_DOC_BYTES = 20 * 1024 * 1024
if att.size and att.size > MAX_DOC_BYTES:
logger.warning(
"[Discord] Document too large (%s bytes), skipping: %s",
@@ -2406,9 +2383,9 @@ class DiscordAdapter(BasePlatformAdapter):
media_urls.append(cached_path)
media_types.append(doc_mime)
logger.info("[Discord] Cached user document: %s", cached_path)
# Inject text content for plain-text documents (capped at 100 KB)
# Inject text content for .txt/.md files (capped at 100 KB)
MAX_TEXT_INJECT_BYTES = 100 * 1024
if ext in (".md", ".txt", ".log") and len(raw_bytes) <= MAX_TEXT_INJECT_BYTES:
if ext in (".md", ".txt") and len(raw_bytes) <= MAX_TEXT_INJECT_BYTES:
try:
text_content = raw_bytes.decode("utf-8")
display_name = att.filename or f"document{ext}"
-148
View File
@@ -20,7 +20,6 @@ from __future__ import annotations
import asyncio
import hashlib
import hmac
import itertools
import json
import logging
import mimetypes
@@ -1053,9 +1052,6 @@ class FeishuAdapter(BasePlatformAdapter):
self._media_batch_state = FeishuBatchState()
self._pending_media_batches = self._media_batch_state.events
self._pending_media_batch_tasks = self._media_batch_state.tasks
# Exec approval button state (approval_id → {session_key, message_id, chat_id})
self._approval_state: Dict[int, Dict[str, str]] = {}
self._approval_counter = itertools.count(1)
self._load_seen_message_ids()
@staticmethod
@@ -1398,104 +1394,6 @@ class FeishuAdapter(BasePlatformAdapter):
logger.error("[Feishu] Failed to edit message %s: %s", message_id, exc, exc_info=True)
return SendResult(success=False, error=str(exc))
async def send_exec_approval(
self, chat_id: str, command: str, session_key: str,
description: str = "dangerous command",
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send an interactive card with approval buttons.
The buttons carry ``hermes_action`` in their value dict so that
``_handle_card_action_event`` can intercept them and call
``resolve_gateway_approval()`` to unblock the waiting agent thread.
"""
if not self._client:
return SendResult(success=False, error="Not connected")
try:
approval_id = next(self._approval_counter)
cmd_preview = command[:3000] + "..." if len(command) > 3000 else command
def _btn(label: str, action_name: str, btn_type: str = "default") -> dict:
return {
"tag": "button",
"text": {"tag": "plain_text", "content": label},
"type": btn_type,
"value": {"hermes_action": action_name, "approval_id": approval_id},
}
card = {
"config": {"wide_screen_mode": True},
"header": {
"title": {"content": "⚠️ Command Approval Required", "tag": "plain_text"},
"template": "orange",
},
"elements": [
{
"tag": "markdown",
"content": f"```\n{cmd_preview}\n```\n**Reason:** {description}",
},
{
"tag": "action",
"actions": [
_btn("✅ Allow Once", "approve_once", "primary"),
_btn("✅ Session", "approve_session"),
_btn("✅ Always", "approve_always"),
_btn("❌ Deny", "deny", "danger"),
],
},
],
}
payload = json.dumps(card, ensure_ascii=False)
response = await self._feishu_send_with_retry(
chat_id=chat_id,
msg_type="interactive",
payload=payload,
reply_to=None,
metadata=metadata,
)
result = self._finalize_send_result(response, "send_exec_approval failed")
if result.success:
self._approval_state[approval_id] = {
"session_key": session_key,
"message_id": result.message_id or "",
"chat_id": chat_id,
}
return result
except Exception as exc:
logger.warning("[Feishu] send_exec_approval failed: %s", exc)
return SendResult(success=False, error=str(exc))
async def _update_approval_card(
self, message_id: str, label: str, user_name: str, choice: str,
) -> None:
"""Replace the approval card with a resolved status card."""
if not self._client or not message_id:
return
icon = "" if choice == "deny" else ""
card = {
"config": {"wide_screen_mode": True},
"header": {
"title": {"content": f"{icon} {label}", "tag": "plain_text"},
"template": "red" if choice == "deny" else "green",
},
"elements": [
{
"tag": "markdown",
"content": f"{icon} **{label}** by {user_name}",
},
],
}
try:
payload = json.dumps(card, ensure_ascii=False)
body = self._build_update_message_body(msg_type="interactive", content=payload)
request = self._build_update_message_request(message_id=message_id, request_body=body)
await asyncio.to_thread(self._client.im.v1.message.update, request)
except Exception as exc:
logger.warning("[Feishu] Failed to update approval card %s: %s", message_id, exc)
async def send_voice(
self,
chat_id: str,
@@ -1922,52 +1820,6 @@ class FeishuAdapter(BasePlatformAdapter):
action = getattr(event, "action", None)
action_tag = str(getattr(action, "tag", "") or "button")
action_value = getattr(action, "value", {}) or {}
# --- Exec approval button intercept ---
hermes_action = action_value.get("hermes_action") if isinstance(action_value, dict) else None
if hermes_action:
approval_id = action_value.get("approval_id")
state = self._approval_state.pop(approval_id, None)
if not state:
logger.debug("[Feishu] Approval %s already resolved or unknown", approval_id)
return
choice_map = {
"approve_once": "once",
"approve_session": "session",
"approve_always": "always",
"deny": "deny",
}
choice = choice_map.get(hermes_action, "deny")
label_map = {
"once": "Approved once",
"session": "Approved for session",
"always": "Approved permanently",
"deny": "Denied",
}
label = label_map.get(choice, "Resolved")
# Resolve sender name for the status card
sender_id = SimpleNamespace(open_id=open_id, user_id=None, union_id=None)
sender_profile = await self._resolve_sender_profile(sender_id)
user_name = sender_profile.get("user_name") or open_id
# Resolve the approval — unblocks the agent thread
try:
from tools.approval import resolve_gateway_approval
count = resolve_gateway_approval(state["session_key"], choice)
logger.info(
"Feishu button resolved %d approval(s) for session %s (choice=%s, user=%s)",
count, state["session_key"], choice, user_name,
)
except Exception as exc:
logger.error("Failed to resolve gateway approval from Feishu button: %s", exc)
# Update the card to show the decision
await self._update_approval_card(state.get("message_id", ""), label, user_name, choice)
return
synthetic_text = f"/card {action_tag}"
if action_value:
try:
+1 -10
View File
@@ -647,11 +647,7 @@ class SignalAdapter(BasePlatformAdapter):
if result is not None:
self._track_sent_timestamp(result)
# Use the timestamp from the RPC result as a pseudo message_id.
# Signal doesn't have real message IDs, but the stream consumer
# needs a truthy value to follow its edit→fallback path correctly.
_msg_id = str(result.get("timestamp", "")) if isinstance(result, dict) else None
return SendResult(success=True, message_id=_msg_id or None)
return SendResult(success=True)
return SendResult(success=False, error="RPC send failed")
def _track_sent_timestamp(self, rpc_result) -> None:
@@ -841,11 +837,6 @@ class SignalAdapter(BasePlatformAdapter):
except asyncio.CancelledError:
pass
async def stop_typing(self, chat_id: str) -> None:
"""Public interface for stopping typing — called by base adapter's
_keep_typing finally block to clean up platform-level typing tasks."""
await self._stop_typing_indicator(chat_id)
# ------------------------------------------------------------------
# Chat Info
# ------------------------------------------------------------------
+4 -160
View File
@@ -14,7 +14,7 @@ import logging
import os
import re
import time
from typing import Dict, Optional, Any, Tuple
from typing import Dict, Optional, Any
try:
from slack_bolt.async_app import AsyncApp
@@ -95,12 +95,6 @@ class SlackAdapter(BasePlatformAdapter):
# respond to ALL subsequent messages in that thread automatically.
self._mentioned_threads: set = set()
self._MENTIONED_THREADS_MAX = 5000
# Assistant thread metadata keyed by (channel_id, thread_ts). Slack's
# AI Assistant lifecycle events can arrive before/alongside message
# events, and they carry the user/thread identity needed for stable
# session + memory scoping.
self._assistant_threads: Dict[Tuple[str, str], Dict[str, str]] = {}
self._ASSISTANT_THREADS_MAX = 5000
async def connect(self) -> bool:
"""Connect to Slack via Socket Mode."""
@@ -187,14 +181,6 @@ class SlackAdapter(BasePlatformAdapter):
async def handle_app_mention(event, say):
pass
@self._app.event("assistant_thread_started")
async def handle_assistant_thread_started(event, say):
await self._handle_assistant_thread_lifecycle_event(event)
@self._app.event("assistant_thread_context_changed")
async def handle_assistant_thread_context_changed(event, say):
await self._handle_assistant_thread_lifecycle_event(event)
# Register slash command handler
@self._app.command("/hermes")
async def handle_hermes_command(ack, command):
@@ -769,135 +755,6 @@ class SlackAdapter(BasePlatformAdapter):
# ----- Internal handlers -----
def _assistant_thread_key(self, channel_id: str, thread_ts: str) -> Optional[Tuple[str, str]]:
"""Return a stable cache key for Slack assistant thread metadata."""
if not channel_id or not thread_ts:
return None
return (str(channel_id), str(thread_ts))
def _extract_assistant_thread_metadata(self, event: dict) -> Dict[str, str]:
"""Extract Slack Assistant thread identity data from an event payload."""
assistant_thread = event.get("assistant_thread") or {}
context = assistant_thread.get("context") or event.get("context") or {}
channel_id = (
assistant_thread.get("channel_id")
or event.get("channel")
or context.get("channel_id")
or ""
)
thread_ts = (
assistant_thread.get("thread_ts")
or event.get("thread_ts")
or event.get("message_ts")
or ""
)
user_id = (
assistant_thread.get("user_id")
or event.get("user")
or context.get("user_id")
or ""
)
team_id = (
event.get("team")
or event.get("team_id")
or assistant_thread.get("team_id")
or ""
)
context_channel_id = context.get("channel_id") or ""
return {
"channel_id": str(channel_id) if channel_id else "",
"thread_ts": str(thread_ts) if thread_ts else "",
"user_id": str(user_id) if user_id else "",
"team_id": str(team_id) if team_id else "",
"context_channel_id": str(context_channel_id) if context_channel_id else "",
}
def _cache_assistant_thread_metadata(self, metadata: Dict[str, str]) -> None:
"""Remember assistant thread identity data for later message events."""
channel_id = metadata.get("channel_id", "")
thread_ts = metadata.get("thread_ts", "")
key = self._assistant_thread_key(channel_id, thread_ts)
if not key:
return
existing = self._assistant_threads.get(key, {})
merged = dict(existing)
merged.update({k: v for k, v in metadata.items() if v})
self._assistant_threads[key] = merged
# Evict oldest entries when the cache exceeds the limit
if len(self._assistant_threads) > self._ASSISTANT_THREADS_MAX:
excess = len(self._assistant_threads) - self._ASSISTANT_THREADS_MAX // 2
for old_key in list(self._assistant_threads)[:excess]:
del self._assistant_threads[old_key]
team_id = merged.get("team_id", "")
if team_id and channel_id:
self._channel_team[channel_id] = team_id
def _lookup_assistant_thread_metadata(
self,
event: dict,
channel_id: str = "",
thread_ts: str = "",
) -> Dict[str, str]:
"""Load cached assistant-thread metadata that matches the current event."""
metadata = self._extract_assistant_thread_metadata(event)
if channel_id and not metadata.get("channel_id"):
metadata["channel_id"] = channel_id
if thread_ts and not metadata.get("thread_ts"):
metadata["thread_ts"] = thread_ts
key = self._assistant_thread_key(
metadata.get("channel_id", ""),
metadata.get("thread_ts", ""),
)
cached = self._assistant_threads.get(key, {}) if key else {}
if cached:
merged = dict(cached)
merged.update({k: v for k, v in metadata.items() if v})
return merged
return metadata
def _seed_assistant_thread_session(self, metadata: Dict[str, str]) -> None:
"""Prime the session store so assistant threads get stable user scoping."""
session_store = getattr(self, "_session_store", None)
if not session_store:
return
channel_id = metadata.get("channel_id", "")
thread_ts = metadata.get("thread_ts", "")
user_id = metadata.get("user_id", "")
if not channel_id or not thread_ts or not user_id:
return
source = self.build_source(
chat_id=channel_id,
chat_name=channel_id,
chat_type="dm",
user_id=user_id,
thread_id=thread_ts,
chat_topic=metadata.get("context_channel_id") or None,
)
try:
session_store.get_or_create_session(source)
except Exception:
logger.debug(
"[Slack] Failed to seed assistant thread session for %s/%s",
channel_id,
thread_ts,
exc_info=True,
)
async def _handle_assistant_thread_lifecycle_event(self, event: dict) -> None:
"""Handle Slack Assistant lifecycle events that carry user/thread identity."""
metadata = self._extract_assistant_thread_metadata(event)
self._cache_assistant_thread_metadata(metadata)
self._seed_assistant_thread_session(metadata)
async def _handle_slack_message(self, event: dict) -> None:
"""Handle an incoming Slack message event."""
# Dedup: Slack Socket Mode can redeliver events after reconnects (#4777)
@@ -924,21 +781,10 @@ class SlackAdapter(BasePlatformAdapter):
return
text = event.get("text", "")
user_id = event.get("user", "")
channel_id = event.get("channel", "")
ts = event.get("ts", "")
assistant_meta = self._lookup_assistant_thread_metadata(
event,
channel_id=channel_id,
thread_ts=event.get("thread_ts", ""),
)
user_id = event.get("user") or assistant_meta.get("user_id", "")
if not channel_id:
channel_id = assistant_meta.get("channel_id", "")
team_id = (
event.get("team")
or event.get("team_id")
or assistant_meta.get("team_id", "")
)
team_id = event.get("team", "")
# Track which workspace owns this channel
if team_id and channel_id:
@@ -946,8 +792,6 @@ class SlackAdapter(BasePlatformAdapter):
# Determine if this is a DM or channel message
channel_type = event.get("channel_type", "")
if not channel_type and channel_id.startswith("D"):
channel_type = "im"
is_dm = channel_type == "im"
# Build thread_ts for session keying.
@@ -956,7 +800,7 @@ class SlackAdapter(BasePlatformAdapter):
# In DMs: only use the real thread_ts — top-level DMs should share
# one continuous session, threaded DMs get their own session.
if is_dm:
thread_ts = event.get("thread_ts") or assistant_meta.get("thread_ts") # None for top-level DMs
thread_ts = event.get("thread_ts") # None for top-level DMs
else:
thread_ts = event.get("thread_ts") or ts # ts fallback for channels
+14 -96
View File
@@ -184,8 +184,6 @@ if _config_path.exists():
# Env var from .env takes precedence (already in os.environ).
if "gateway_timeout" in _agent_cfg and "HERMES_AGENT_TIMEOUT" not in os.environ:
os.environ["HERMES_AGENT_TIMEOUT"] = str(_agent_cfg["gateway_timeout"])
if "gateway_timeout_warning" in _agent_cfg and "HERMES_AGENT_TIMEOUT_WARNING" not in os.environ:
os.environ["HERMES_AGENT_TIMEOUT_WARNING"] = str(_agent_cfg["gateway_timeout_warning"])
# Timezone: bridge config.yaml → HERMES_TIMEZONE env var.
# HERMES_TIMEZONE from .env takes precedence (already in os.environ).
_tz_cfg = _cfg.get("timezone", "")
@@ -923,11 +921,12 @@ class GatewayRunner:
@staticmethod
def _load_reasoning_config() -> dict | None:
"""Load reasoning effort from config.yaml.
"""Load reasoning effort from config with env fallback.
Reads agent.reasoning_effort from config.yaml. Valid: "xhigh",
"high", "medium", "low", "minimal", "none". Returns None to use
default (medium).
Checks agent.reasoning_effort in config.yaml first, then
HERMES_REASONING_EFFORT as a fallback. Valid: "xhigh", "high",
"medium", "low", "minimal", "none". Returns None to use default
(medium).
"""
from hermes_constants import parse_reasoning_effort
effort = ""
@@ -940,6 +939,8 @@ class GatewayRunner:
effort = str(cfg.get("agent", {}).get("reasoning_effort", "") or "").strip()
except Exception:
pass
if not effort:
effort = os.getenv("HERMES_REASONING_EFFORT", "")
result = parse_reasoning_effort(effort)
if effort and effort.strip() and result is None:
logger.warning("Unknown reasoning_effort '%s', using default (medium)", effort)
@@ -1075,7 +1076,6 @@ class GatewayRunner:
"MATRIX_ALLOWED_USERS", "DINGTALK_ALLOWED_USERS",
"FEISHU_ALLOWED_USERS",
"WECOM_ALLOWED_USERS",
"BLUEBUBBLES_ALLOWED_USERS",
"GATEWAY_ALLOWED_USERS")
)
_allow_all = os.getenv("GATEWAY_ALLOW_ALL_USERS", "").lower() in ("true", "1", "yes") or any(
@@ -1086,8 +1086,7 @@ class GatewayRunner:
"SMS_ALLOW_ALL_USERS", "MATTERMOST_ALLOW_ALL_USERS",
"MATRIX_ALLOW_ALL_USERS", "DINGTALK_ALLOW_ALL_USERS",
"FEISHU_ALLOW_ALL_USERS",
"WECOM_ALLOW_ALL_USERS",
"BLUEBUBBLES_ALLOW_ALL_USERS")
"WECOM_ALLOW_ALL_USERS")
)
if not _any_allowlist and not _allow_all:
logger.warning(
@@ -1485,14 +1484,6 @@ class GatewayRunner:
logger.debug("Interrupted running agent for session %s during shutdown", session_key[:20])
except Exception as e:
logger.debug("Failed interrupting agent during shutdown: %s", e)
# Fire plugin on_session_finalize hook before memory shutdown
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
_invoke_hook("on_session_finalize",
session_id=getattr(agent, 'session_id', None),
platform="gateway")
except Exception:
pass
# Shut down memory provider at actual session boundary
try:
if hasattr(agent, 'shutdown_memory_provider'):
@@ -1658,13 +1649,6 @@ class GatewayRunner:
adapter.gateway_runner = self # For cross-platform delivery
return adapter
elif platform == Platform.BLUEBUBBLES:
from gateway.platforms.bluebubbles import BlueBubblesAdapter, check_bluebubbles_requirements
if not check_bluebubbles_requirements():
logger.warning("BlueBubbles: aiohttp/httpx missing or BLUEBUBBLES_SERVER_URL/BLUEBUBBLES_PASSWORD not configured")
return None
return BlueBubblesAdapter(config)
return None
def _is_user_authorized(self, source: SessionSource) -> bool:
@@ -1703,7 +1687,6 @@ class GatewayRunner:
Platform.DINGTALK: "DINGTALK_ALLOWED_USERS",
Platform.FEISHU: "FEISHU_ALLOWED_USERS",
Platform.WECOM: "WECOM_ALLOWED_USERS",
Platform.BLUEBUBBLES: "BLUEBUBBLES_ALLOWED_USERS",
}
platform_allow_all_map = {
Platform.TELEGRAM: "TELEGRAM_ALLOW_ALL_USERS",
@@ -1718,7 +1701,6 @@ class GatewayRunner:
Platform.DINGTALK: "DINGTALK_ALLOW_ALL_USERS",
Platform.FEISHU: "FEISHU_ALLOW_ALL_USERS",
Platform.WECOM: "WECOM_ALLOW_ALL_USERS",
Platform.BLUEBUBBLES: "BLUEBUBBLES_ALLOW_ALL_USERS",
}
# Per-platform allow-all flag (e.g., DISCORD_ALLOW_ALL_USERS=true)
@@ -1792,11 +1774,8 @@ class GatewayRunner:
"""
source = event.source
# Internal events (e.g. background-process completion notifications)
# are system-generated and must skip user authorization.
if getattr(event, "internal", False):
pass
elif not self._is_user_authorized(source):
# Check if user is authorized
if not self._is_user_authorized(source):
logger.warning("Unauthorized user: %s (%s) on %s", source.user_id, source.user_name, source.platform.value)
# In DMs: offer pairing code. In groups: silently ignore.
if source.chat_type == "dm" and self._get_unauthorized_dm_behavior(source.platform) == "pair":
@@ -3298,15 +3277,6 @@ class GatewayRunner:
# the configured default instead of the previously switched model.
self._session_model_overrides.pop(session_key, None)
# Fire plugin on_session_finalize hook (session boundary)
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
_old_sid = old_entry.session_id if old_entry else None
_invoke_hook("on_session_finalize", session_id=_old_sid,
platform=source.platform.value if source.platform else "")
except Exception:
pass
# Emit session:end hook (session is ending)
await self.hooks.emit("session:end", {
"platform": source.platform.value if source.platform else "",
@@ -3320,7 +3290,7 @@ class GatewayRunner:
"user_id": source.user_id,
"session_key": session_key,
})
# Resolve session config info to surface to the user
try:
session_info = self._format_session_info()
@@ -3331,18 +3301,9 @@ class GatewayRunner:
header = "✨ Session reset! Starting fresh."
else:
# No existing session, just create one
new_entry = self.session_store.get_or_create_session(source, force_new=True)
self.session_store.get_or_create_session(source, force_new=True)
header = "✨ New session started!"
# Fire plugin on_session_reset hook (new session guaranteed to exist)
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
_new_sid = new_entry.session_id if new_entry else None
_invoke_hook("on_session_reset", session_id=_new_sid,
platform=source.platform.value if source.platform else "")
except Exception:
pass
if session_info:
return f"{header}\n\n{session_info}"
return header
@@ -5534,7 +5495,7 @@ class GatewayRunner:
Platform.TELEGRAM, Platform.DISCORD, Platform.SLACK, Platform.WHATSAPP,
Platform.SIGNAL, Platform.MATTERMOST, Platform.MATRIX,
Platform.HOMEASSISTANT, Platform.EMAIL, Platform.SMS, Platform.DINGTALK,
Platform.FEISHU, Platform.WECOM, Platform.BLUEBUBBLES, Platform.LOCAL,
Platform.FEISHU, Platform.WECOM, Platform.LOCAL,
})
async def _handle_update_command(self, event: MessageEvent) -> str:
@@ -6174,7 +6135,6 @@ class GatewayRunner:
text=synth_text,
message_type=MessageType.TEXT,
source=_source,
internal=True,
)
logger.info(
"Process %s finished — injecting agent notification for session %s",
@@ -6325,15 +6285,7 @@ class GatewayRunner:
# Falls back to env vars for backward compatibility.
# YAML 1.1 parses bare `off` as boolean False — normalise before
# the `or` chain so it doesn't silently fall through to "all".
#
# Per-platform overrides (display.tool_progress_overrides) take
# priority over the global setting — e.g. Signal users can set
# tool_progress to "off" while keeping Telegram on "all".
_display_cfg = user_config.get("display", {})
_overrides = _display_cfg.get("tool_progress_overrides", {})
_raw_tp = _overrides.get(platform_key)
if _raw_tp is None:
_raw_tp = _display_cfg.get("tool_progress")
_raw_tp = user_config.get("display", {}).get("tool_progress")
if _raw_tp is False:
_raw_tp = "off"
progress_mode = (
@@ -6437,18 +6389,6 @@ class GatewayRunner:
if not adapter:
return
# Skip tool progress for platforms that don't support message
# editing (e.g. iMessage/BlueBubbles) — each progress update
# would become a separate message bubble, which is noisy.
from gateway.platforms.base import BasePlatformAdapter as _BaseAdapter
if type(adapter).edit_message is _BaseAdapter.edit_message:
while not progress_queue.empty():
try:
progress_queue.get_nowait()
except Exception:
break
return
progress_lines = [] # Accumulated tool lines
progress_msg_id = None # ID of the progress message to edit
can_edit = True # False once an edit fails (platform doesn't support it)
@@ -7143,9 +7083,6 @@ class GatewayRunner:
# Default 1800s (30 min inactivity). 0 = unlimited.
_agent_timeout_raw = float(os.getenv("HERMES_AGENT_TIMEOUT", 1800))
_agent_timeout = _agent_timeout_raw if _agent_timeout_raw > 0 else None
_agent_warning_raw = float(os.getenv("HERMES_AGENT_TIMEOUT_WARNING", 900))
_agent_warning = _agent_warning_raw if _agent_warning_raw > 0 else None
_warning_fired = False
loop = asyncio.get_event_loop()
_executor_task = asyncio.ensure_future(
loop.run_in_executor(None, run_sync)
@@ -7178,25 +7115,6 @@ class GatewayRunner:
_idle_secs = _act.get("seconds_since_activity", 0.0)
except Exception:
pass
# Staged warning: fire once before escalating to full timeout.
if (not _warning_fired and _agent_warning is not None
and _idle_secs >= _agent_warning):
_warning_fired = True
_warn_adapter = self.adapters.get(source.platform)
if _warn_adapter:
_elapsed_warn = int(_agent_warning // 60) or 1
_remaining_mins = int((_agent_timeout - _agent_warning) // 60) or 1
try:
await _warn_adapter.send(
source.chat_id,
f"⚠️ No activity for {_elapsed_warn} min. "
f"If the agent does not respond soon, it will "
f"be timed out in {_remaining_mins} min. "
f"You can continue waiting or use /reset.",
metadata=_status_thread_metadata,
)
except Exception as _warn_err:
logger.debug("Inactivity warning send error: %s", _warn_err)
if _idle_secs >= _agent_timeout:
_inactivity_timeout = True
break
-1
View File
@@ -193,7 +193,6 @@ _PII_SAFE_PLATFORMS = frozenset({
Platform.WHATSAPP,
Platform.SIGNAL,
Platform.TELEGRAM,
Platform.BLUEBUBBLES,
})
"""Platforms where user IDs can be safely redacted (no in-message mention system
that requires raw IDs). Discord is excluded because mentions use ``<@user_id>``
+7 -119
View File
@@ -74,8 +74,6 @@ class GatewayStreamConsumer:
self._edit_supported = True # Disabled on first edit failure (Signal/Email/HA)
self._last_edit_time = 0.0
self._last_sent_text = "" # Track last-sent text to skip redundant edits
self._fallback_final_send = False
self._fallback_prefix = ""
@property
def already_sent(self) -> bool:
@@ -140,19 +138,12 @@ class GatewayStreamConsumer:
while (
len(self._accumulated) > _safe_limit
and self._message_id is not None
and self._edit_supported
):
split_at = self._accumulated.rfind("\n", 0, _safe_limit)
if split_at < _safe_limit // 2:
split_at = _safe_limit
chunk = self._accumulated[:split_at]
await self._send_or_edit(chunk)
if self._fallback_final_send:
# Edit failed while attempting to split an oversized
# message. Keep the full accumulated text intact so
# the fallback final-send path can deliver the
# remaining continuation without dropping content.
break
self._accumulated = self._accumulated[split_at:].lstrip("\n")
self._message_id = None
self._last_sent_text = ""
@@ -165,17 +156,9 @@ class GatewayStreamConsumer:
self._last_edit_time = time.monotonic()
if got_done:
# Final edit without cursor. If progressive editing failed
# mid-stream, send a single continuation/fallback message
# here instead of letting the base gateway path send the
# full response again.
if self._accumulated:
if self._fallback_final_send:
await self._send_fallback_final(self._accumulated)
elif self._message_id:
await self._send_or_edit(self._accumulated)
elif not self._already_sent:
await self._send_or_edit(self._accumulated)
# Final edit without cursor
if self._accumulated and self._message_id:
await self._send_or_edit(self._accumulated)
return
# Tool boundary: the should_edit block above already flushed
@@ -186,8 +169,6 @@ class GatewayStreamConsumer:
self._message_id = None
self._accumulated = ""
self._last_sent_text = ""
self._fallback_final_send = False
self._fallback_prefix = ""
await asyncio.sleep(0.05) # Small yield to not busy-loop
@@ -226,86 +207,6 @@ class GatewayStreamConsumer:
# Strip trailing whitespace/newlines but preserve leading content
return cleaned.rstrip()
def _visible_prefix(self) -> str:
"""Return the visible text already shown in the streamed message."""
prefix = self._last_sent_text or ""
if self.cfg.cursor and prefix.endswith(self.cfg.cursor):
prefix = prefix[:-len(self.cfg.cursor)]
return self._clean_for_display(prefix)
def _continuation_text(self, final_text: str) -> str:
"""Return only the part of final_text the user has not already seen."""
prefix = self._fallback_prefix or self._visible_prefix()
if prefix and final_text.startswith(prefix):
return final_text[len(prefix):].lstrip()
return final_text
@staticmethod
def _split_text_chunks(text: str, limit: int) -> list[str]:
"""Split text into reasonably sized chunks for fallback sends."""
if len(text) <= limit:
return [text]
chunks: list[str] = []
remaining = text
while len(remaining) > limit:
split_at = remaining.rfind("\n", 0, limit)
if split_at < limit // 2:
split_at = limit
chunks.append(remaining[:split_at])
remaining = remaining[split_at:].lstrip("\n")
if remaining:
chunks.append(remaining)
return chunks
async def _send_fallback_final(self, text: str) -> None:
"""Send the final continuation after streaming edits stop working."""
final_text = self._clean_for_display(text)
continuation = self._continuation_text(final_text)
self._fallback_final_send = False
if not continuation.strip():
# Nothing new to send — the visible partial already matches final text.
self._already_sent = True
return
raw_limit = getattr(self.adapter, "MAX_MESSAGE_LENGTH", 4096)
safe_limit = max(500, raw_limit - 100)
chunks = self._split_text_chunks(continuation, safe_limit)
last_message_id: Optional[str] = None
last_successful_chunk = ""
sent_any_chunk = False
for chunk in chunks:
result = await self.adapter.send(
chat_id=self.chat_id,
content=chunk,
metadata=self.metadata,
)
if not result.success:
if sent_any_chunk:
# Some continuation text already reached the user. Suppress
# the base gateway final-send path so we don't resend the
# full response and create another duplicate.
self._already_sent = True
self._message_id = last_message_id
self._last_sent_text = last_successful_chunk
self._fallback_prefix = ""
return
# No fallback chunk reached the user — allow the normal gateway
# final-send path to try one more time.
self._already_sent = False
self._message_id = None
self._last_sent_text = ""
self._fallback_prefix = ""
return
sent_any_chunk = True
last_successful_chunk = chunk
last_message_id = result.message_id or last_message_id
self._message_id = last_message_id
self._already_sent = True
self._last_sent_text = chunks[-1]
self._fallback_prefix = ""
async def _send_or_edit(self, text: str) -> None:
"""Send or edit the streaming message."""
# Strip MEDIA: directives so they don't appear as visible text.
@@ -331,16 +232,14 @@ class GatewayStreamConsumer:
self._last_sent_text = text
else:
# If an edit fails mid-stream (especially Telegram flood control),
# stop progressive edits and send only the missing tail once the
# final response is available.
# stop progressive edits and let the normal final send path deliver
# the complete answer instead of leaving the user with a partial.
logger.debug("Edit failed, disabling streaming for this adapter")
self._fallback_prefix = self._visible_prefix()
self._fallback_final_send = True
self._edit_supported = False
self._already_sent = True
self._already_sent = False
else:
# Editing not supported — skip intermediate updates.
# The final response will be sent by the fallback path.
# The final response will be sent by the normal path.
pass
else:
# First message — send new
@@ -353,17 +252,6 @@ class GatewayStreamConsumer:
self._message_id = result.message_id
self._already_sent = True
self._last_sent_text = text
elif result.success:
# Platform accepted the message but returned no message_id
# (e.g. Signal). Can't edit without an ID — switch to
# fallback mode: suppress intermediate deltas, send only
# the missing tail once the final response is ready.
self._already_sent = True
self._edit_supported = False
self._fallback_prefix = self._clean_for_display(text)
self._fallback_final_send = True
# Sentinel prevents re-entering this branch on every delta
self._message_id = "__no_edit__"
else:
# Initial send failed — disable streaming for this session
self._edit_supported = False
+2 -2
View File
@@ -11,5 +11,5 @@ Provides subcommands for:
- hermes cron - Manage cron jobs
"""
__version__ = "0.8.0"
__release_date__ = "2026.4.8"
__version__ = "0.7.0"
__release_date__ = "2026.4.3"
-183
View File
@@ -67,16 +67,12 @@ DEFAULT_AGENT_KEY_MIN_TTL_SECONDS = 30 * 60 # 30 minutes
ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120 # refresh 2 min before expiry
DEVICE_AUTH_POLL_INTERVAL_CAP_SECONDS = 1 # poll at most every 1s
DEFAULT_CODEX_BASE_URL = "https://chatgpt.com/backend-api/codex"
DEFAULT_QWEN_BASE_URL = "https://portal.qwen.ai/v1"
DEFAULT_GITHUB_MODELS_BASE_URL = "https://api.githubcopilot.com"
DEFAULT_COPILOT_ACP_BASE_URL = "acp://copilot"
DEFAULT_GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai"
CODEX_OAUTH_CLIENT_ID = "app_EMoamEEZ73f0CkXaXp7hrann"
CODEX_OAUTH_TOKEN_URL = "https://auth.openai.com/oauth/token"
CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120
QWEN_OAUTH_CLIENT_ID = "f0304373b74a44d2b584a3fb70ca9e56"
QWEN_OAUTH_TOKEN_URL = "https://chat.qwen.ai/api/v1/oauth2/token"
QWEN_ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120
# =============================================================================
@@ -116,12 +112,6 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
auth_type="oauth_external",
inference_base_url=DEFAULT_CODEX_BASE_URL,
),
"qwen-oauth": ProviderConfig(
id="qwen-oauth",
name="Qwen OAuth",
auth_type="oauth_external",
inference_base_url=DEFAULT_QWEN_BASE_URL,
),
"copilot": ProviderConfig(
id="copilot",
name="GitHub Copilot",
@@ -827,7 +817,6 @@ def resolve_provider(
"github-copilot-acp": "copilot-acp", "copilot-acp-agent": "copilot-acp",
"aigateway": "ai-gateway", "vercel": "ai-gateway", "vercel-ai-gateway": "ai-gateway",
"opencode": "opencode-zen", "zen": "opencode-zen",
"qwen-portal": "qwen-oauth", "qwen-cli": "qwen-oauth", "qwen-oauth": "qwen-oauth",
"hf": "huggingface", "hugging-face": "huggingface", "huggingface-hub": "huggingface",
"go": "opencode-go", "opencode-go-sub": "opencode-go",
"kilo": "kilocode", "kilo-code": "kilocode", "kilo-gateway": "kilocode",
@@ -957,176 +946,6 @@ def _codex_access_token_is_expiring(access_token: Any, skew_seconds: int) -> boo
return float(exp) <= (time.time() + max(0, int(skew_seconds)))
def _qwen_cli_auth_path() -> Path:
return Path.home() / ".qwen" / "oauth_creds.json"
def _read_qwen_cli_tokens() -> Dict[str, Any]:
auth_path = _qwen_cli_auth_path()
if not auth_path.exists():
raise AuthError(
"Qwen CLI credentials not found. Run 'qwen auth qwen-oauth' first.",
provider="qwen-oauth",
code="qwen_auth_missing",
)
try:
data = json.loads(auth_path.read_text(encoding="utf-8"))
except Exception as exc:
raise AuthError(
f"Failed to read Qwen CLI credentials from {auth_path}: {exc}",
provider="qwen-oauth",
code="qwen_auth_read_failed",
) from exc
if not isinstance(data, dict):
raise AuthError(
f"Invalid Qwen CLI credentials in {auth_path}.",
provider="qwen-oauth",
code="qwen_auth_invalid",
)
return data
def _save_qwen_cli_tokens(tokens: Dict[str, Any]) -> Path:
auth_path = _qwen_cli_auth_path()
auth_path.parent.mkdir(parents=True, exist_ok=True)
tmp_path = auth_path.with_suffix(".tmp")
tmp_path.write_text(json.dumps(tokens, indent=2, sort_keys=True) + "\n", encoding="utf-8")
os.chmod(tmp_path, stat.S_IRUSR | stat.S_IWUSR)
tmp_path.replace(auth_path)
return auth_path
def _qwen_access_token_is_expiring(expiry_date_ms: Any, skew_seconds: int = QWEN_ACCESS_TOKEN_REFRESH_SKEW_SECONDS) -> bool:
try:
expiry_ms = int(expiry_date_ms)
except Exception:
return True
return (time.time() + max(0, int(skew_seconds))) * 1000 >= expiry_ms
def _refresh_qwen_cli_tokens(tokens: Dict[str, Any], timeout_seconds: float = 20.0) -> Dict[str, Any]:
refresh_token = str(tokens.get("refresh_token", "") or "").strip()
if not refresh_token:
raise AuthError(
"Qwen OAuth refresh token missing. Re-run 'qwen auth qwen-oauth'.",
provider="qwen-oauth",
code="qwen_refresh_token_missing",
)
try:
response = httpx.post(
QWEN_OAUTH_TOKEN_URL,
headers={
"Content-Type": "application/x-www-form-urlencoded",
"Accept": "application/json",
},
data={
"grant_type": "refresh_token",
"refresh_token": refresh_token,
"client_id": QWEN_OAUTH_CLIENT_ID,
},
timeout=timeout_seconds,
)
except Exception as exc:
raise AuthError(
f"Qwen OAuth refresh failed: {exc}",
provider="qwen-oauth",
code="qwen_refresh_failed",
) from exc
if response.status_code >= 400:
body = response.text.strip()
raise AuthError(
"Qwen OAuth refresh failed. Re-run 'qwen auth qwen-oauth'."
+ (f" Response: {body}" if body else ""),
provider="qwen-oauth",
code="qwen_refresh_failed",
)
try:
payload = response.json()
except Exception as exc:
raise AuthError(
f"Qwen OAuth refresh returned invalid JSON: {exc}",
provider="qwen-oauth",
code="qwen_refresh_invalid_json",
) from exc
if not isinstance(payload, dict) or not str(payload.get("access_token", "") or "").strip():
raise AuthError(
"Qwen OAuth refresh response missing access_token.",
provider="qwen-oauth",
code="qwen_refresh_invalid_response",
)
expires_in = payload.get("expires_in")
try:
expires_in_seconds = int(expires_in)
except Exception:
expires_in_seconds = 6 * 60 * 60
refreshed = {
"access_token": str(payload.get("access_token", "") or "").strip(),
"refresh_token": str(payload.get("refresh_token", refresh_token) or refresh_token).strip(),
"token_type": str(payload.get("token_type", tokens.get("token_type", "Bearer")) or "Bearer").strip() or "Bearer",
"resource_url": str(payload.get("resource_url", tokens.get("resource_url", "portal.qwen.ai")) or "portal.qwen.ai").strip(),
"expiry_date": int(time.time() * 1000) + max(1, expires_in_seconds) * 1000,
}
_save_qwen_cli_tokens(refreshed)
return refreshed
def resolve_qwen_runtime_credentials(
*,
force_refresh: bool = False,
refresh_if_expiring: bool = True,
refresh_skew_seconds: int = QWEN_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
) -> Dict[str, Any]:
tokens = _read_qwen_cli_tokens()
access_token = str(tokens.get("access_token", "") or "").strip()
should_refresh = bool(force_refresh)
if not should_refresh and refresh_if_expiring:
should_refresh = _qwen_access_token_is_expiring(tokens.get("expiry_date"), refresh_skew_seconds)
if should_refresh:
tokens = _refresh_qwen_cli_tokens(tokens)
access_token = str(tokens.get("access_token", "") or "").strip()
if not access_token:
raise AuthError(
"Qwen OAuth access token missing. Re-run 'qwen auth qwen-oauth'.",
provider="qwen-oauth",
code="qwen_access_token_missing",
)
base_url = os.getenv("HERMES_QWEN_BASE_URL", "").strip().rstrip("/") or DEFAULT_QWEN_BASE_URL
return {
"provider": "qwen-oauth",
"base_url": base_url,
"api_key": access_token,
"source": "qwen-cli",
"expires_at_ms": tokens.get("expiry_date"),
"auth_file": str(_qwen_cli_auth_path()),
}
def get_qwen_auth_status() -> Dict[str, Any]:
auth_path = _qwen_cli_auth_path()
try:
creds = resolve_qwen_runtime_credentials(refresh_if_expiring=False)
return {
"logged_in": True,
"auth_file": str(auth_path),
"source": creds.get("source"),
"api_key": creds.get("api_key"),
"expires_at_ms": creds.get("expires_at_ms"),
}
except AuthError as exc:
return {
"logged_in": False,
"auth_file": str(auth_path),
"error": str(exc),
}
# =============================================================================
# SSH / remote session detection
# =============================================================================
@@ -2253,8 +2072,6 @@ def get_auth_status(provider_id: Optional[str] = None) -> Dict[str, Any]:
return get_nous_auth_status()
if target == "openai-codex":
return get_codex_auth_status()
if target == "qwen-oauth":
return get_qwen_auth_status()
if target == "copilot-acp":
return get_external_process_provider_status(target)
# API-key providers
+2 -22
View File
@@ -32,7 +32,7 @@ from hermes_constants import OPENROUTER_BASE_URL
# Providers that support OAuth login in addition to API keys.
_OAUTH_CAPABLE_PROVIDERS = {"anthropic", "nous", "openai-codex", "qwen-oauth"}
_OAUTH_CAPABLE_PROVIDERS = {"anthropic", "nous", "openai-codex"}
def _get_custom_provider_names() -> list:
@@ -147,7 +147,7 @@ def auth_add_command(args) -> None:
if provider.startswith(CUSTOM_POOL_PREFIX):
requested_type = AUTH_TYPE_API_KEY
else:
requested_type = AUTH_TYPE_OAUTH if provider in {"anthropic", "nous", "openai-codex", "qwen-oauth"} else AUTH_TYPE_API_KEY
requested_type = AUTH_TYPE_OAUTH if provider in {"anthropic", "nous", "openai-codex"} else AUTH_TYPE_API_KEY
pool = load_pool(provider)
@@ -250,26 +250,6 @@ def auth_add_command(args) -> None:
print(f'Added {provider} OAuth credential #{len(pool.entries())}: "{entry.label}"')
return
if provider == "qwen-oauth":
creds = auth_mod.resolve_qwen_runtime_credentials(refresh_if_expiring=False)
label = (getattr(args, "label", None) or "").strip() or label_from_token(
creds["api_key"],
_oauth_default_label(provider, len(pool.entries()) + 1),
)
entry = PooledCredential(
provider=provider,
id=uuid.uuid4().hex[:6],
label=label,
auth_type=AUTH_TYPE_OAUTH,
priority=0,
source=f"{SOURCE_MANUAL}:qwen_cli",
access_token=creds["api_key"],
base_url=creds.get("base_url"),
)
pool.add_entry(entry)
print(f'Added {provider} OAuth credential #{len(pool.entries())}: "{entry.label}"')
return
raise SystemExit(f"`hermes auth add {provider}` is not implemented for auth type {requested_type} yet.")
+2 -8
View File
@@ -295,16 +295,10 @@ def _format_context_length(tokens: int) -> str:
"""Format a token count for display (e.g. 128000 → '128K', 1048576 → '1M')."""
if tokens >= 1_000_000:
val = tokens / 1_000_000
rounded = round(val)
if abs(val - rounded) < 0.05:
return f"{rounded}M"
return f"{val:.1f}M"
return f"{val:g}M"
elif tokens >= 1_000:
val = tokens / 1_000
rounded = round(val)
if abs(val - rounded) < 0.05:
return f"{rounded}K"
return f"{val:.1f}K"
return f"{val:g}K"
return str(tokens)
+3 -61
View File
@@ -39,7 +39,6 @@ _EXTRA_ENV_KEYS = frozenset({
"DINGTALK_CLIENT_ID", "DINGTALK_CLIENT_SECRET",
"FEISHU_APP_ID", "FEISHU_APP_SECRET", "FEISHU_ENCRYPT_KEY", "FEISHU_VERIFICATION_TOKEN",
"WECOM_BOT_ID", "WECOM_SECRET",
"BLUEBUBBLES_SERVER_URL", "BLUEBUBBLES_PASSWORD",
"TERMINAL_ENV", "TERMINAL_SSH_KEY", "TERMINAL_SSH_PORT",
"WHATSAPP_MODE", "WHATSAPP_ENABLED",
"MATTERMOST_HOME_CHANNEL", "MATTERMOST_REPLY_MODE",
@@ -158,14 +157,7 @@ def get_project_root() -> Path:
return Path(__file__).parent.parent.resolve()
def _secure_dir(path):
"""Set directory to owner-only access (0700). No-op on Windows.
Skipped in managed mode the NixOS module sets group-readable
permissions (0750) so interactive users in the hermes group can
share state with the gateway service.
"""
if is_managed():
return
"""Set directory to owner-only access (0700). No-op on Windows."""
try:
os.chmod(path, 0o700)
except (OSError, NotImplementedError):
@@ -173,13 +165,7 @@ def _secure_dir(path):
def _secure_file(path):
"""Set file to owner-only read/write (0600). No-op on Windows.
Skipped in managed mode the NixOS activation script sets
group-readable permissions (0640) on config files.
"""
if is_managed():
return
"""Set file to owner-only read/write (0600). No-op on Windows."""
try:
if os.path.exists(str(path)):
os.chmod(path, 0o600)
@@ -231,10 +217,6 @@ DEFAULT_CONFIG = {
# (force on/off for all models), or a list of model-name substrings
# to match (e.g. ["gpt", "codex", "gemini", "qwen"]).
"tool_use_enforcement": "auto",
# Staged inactivity warning: send a warning to the user at this
# threshold before escalating to a full timeout. The warning fires
# once per run and does not interrupt the agent. 0 = disable warning.
"gateway_timeout_warning": 900,
},
"terminal": {
@@ -397,7 +379,6 @@ DEFAULT_CONFIG = {
"show_cost": False, # Show $ cost in the status bar (off by default)
"skin": "default",
"tool_progress_command": False, # Enable /verbose command in messaging gateway
"tool_progress_overrides": {}, # Per-platform overrides: {"signal": "off", "telegram": "all"}
"tool_preview_length": 0, # Max chars for tool call previews (0 = no limit, show full paths/commands)
},
@@ -432,7 +413,7 @@ DEFAULT_CONFIG = {
"stt": {
"enabled": True,
"provider": "local", # "local" (free, faster-whisper) | "groq" | "openai" (Whisper API) | "mistral" (Voxtral Transcribe)
"provider": "local", # "local" (free, faster-whisper) | "groq" | "openai" (Whisper API)
"local": {
"model": "base", # tiny, base, small, medium, large-v3
"language": "", # auto-detect by default; set to "en", "es", "fr", etc. to force
@@ -440,9 +421,6 @@ DEFAULT_CONFIG = {
"openai": {
"model": "whisper-1", # whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe
},
"mistral": {
"model": "voxtral-mini-latest", # voxtral-mini-latest, voxtral-mini-2602
},
},
"voice": {
@@ -746,14 +724,6 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"HERMES_QWEN_BASE_URL": {
"description": "Qwen Portal base URL override (default: https://portal.qwen.ai/v1)",
"prompt": "Qwen Portal base URL (leave empty for default)",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
"OPENCODE_ZEN_API_KEY": {
"description": "OpenCode Zen API key (pay-as-you-go access to curated models)",
"prompt": "OpenCode Zen API key",
@@ -1005,13 +975,6 @@ OPTIONAL_ENV_VARS = {
"password": False,
"category": "messaging",
},
"DISCORD_REPLY_TO_MODE": {
"description": "Discord reply threading mode: 'off' (no reply references), 'first' (reply on first message only, default), 'all' (reply on every chunk)",
"prompt": "Discord reply mode (off/first/all)",
"url": None,
"password": False,
"category": "messaging",
},
"SLACK_BOT_TOKEN": {
"description": "Slack bot token (xoxb-). Get from OAuth & Permissions after installing your app. "
"Required scopes: chat:write, app_mentions:read, channels:history, groups:history, "
@@ -1125,27 +1088,6 @@ OPTIONAL_ENV_VARS = {
"category": "messaging",
"advanced": True,
},
"BLUEBUBBLES_SERVER_URL": {
"description": "BlueBubbles server URL for iMessage integration (e.g. http://192.168.1.10:1234)",
"prompt": "BlueBubbles server URL",
"url": "https://bluebubbles.app/",
"password": False,
"category": "messaging",
},
"BLUEBUBBLES_PASSWORD": {
"description": "BlueBubbles server password (from BlueBubbles Server → Settings → API)",
"prompt": "BlueBubbles server password",
"url": None,
"password": True,
"category": "messaging",
},
"BLUEBUBBLES_ALLOWED_USERS": {
"description": "Comma-separated iMessage addresses (email or phone) allowed to use the bot",
"prompt": "Allowed iMessage addresses (comma-separated)",
"url": None,
"password": False,
"category": "messaging",
},
"GATEWAY_ALLOW_ALL_USERS": {
"description": "Allow all users to interact with messaging bots (true/false). Default: false.",
"prompt": "Allow all users (true/false)",
-15
View File
@@ -93,21 +93,6 @@ def cron_list(show_all: bool = False):
script = job.get("script")
if script:
print(f" Script: {script}")
# Execution history
last_status = job.get("last_status")
if last_status:
last_run = job.get("last_run_at", "?")
if last_status == "ok":
status_display = color("ok", Colors.GREEN)
else:
status_display = color(f"{last_status}: {job.get('last_error', '?')}", Colors.RED)
print(f" Last run: {last_run} {status_display}")
delivery_err = job.get("last_delivery_error")
if delivery_err:
print(f" {color('⚠ Delivery failed:', Colors.YELLOW)} {delivery_err}")
print()
from hermes_cli.gateway import find_gateway_pids
+56 -70
View File
@@ -812,83 +812,69 @@ def run_doctor(args):
check_warn("No GITHUB_TOKEN", f"(60 req/hr rate limit — set in {_DHH}/.env for better rates)")
# =========================================================================
# Memory Provider (only check the active provider, if any)
# Honcho memory
# =========================================================================
print()
print(color("Memory Provider", Colors.CYAN, Colors.BOLD))
print(color("Honcho Memory", Colors.CYAN, Colors.BOLD))
_active_memory_provider = ""
try:
import yaml as _yaml
_mem_cfg_path = HERMES_HOME / "config.yaml"
if _mem_cfg_path.exists():
with open(_mem_cfg_path) as _f:
_raw_cfg = _yaml.safe_load(_f) or {}
_active_memory_provider = (_raw_cfg.get("memory") or {}).get("provider", "")
except Exception:
pass
from plugins.memory.honcho.client import HonchoClientConfig, resolve_config_path
hcfg = HonchoClientConfig.from_global_config()
_honcho_cfg_path = resolve_config_path()
if not _active_memory_provider:
check_ok("Built-in memory active", "(no external provider configured — this is fine)")
elif _active_memory_provider == "honcho":
try:
from plugins.memory.honcho.client import HonchoClientConfig, resolve_config_path
hcfg = HonchoClientConfig.from_global_config()
_honcho_cfg_path = resolve_config_path()
if not _honcho_cfg_path.exists():
check_warn("Honcho config not found", "run: hermes memory setup")
elif not hcfg.enabled:
check_info(f"Honcho disabled (set enabled: true in {_honcho_cfg_path} to activate)")
elif not (hcfg.api_key or hcfg.base_url):
check_fail("Honcho API key or base URL not set", "run: hermes memory setup")
issues.append("No Honcho API key — run 'hermes memory setup'")
else:
from plugins.memory.honcho.client import get_honcho_client, reset_honcho_client
reset_honcho_client()
try:
get_honcho_client(hcfg)
check_ok(
"Honcho connected",
f"workspace={hcfg.workspace_id} mode={hcfg.recall_mode} freq={hcfg.write_frequency}",
)
except Exception as _e:
check_fail("Honcho connection failed", str(_e))
issues.append(f"Honcho unreachable: {_e}")
except ImportError:
check_warn("honcho-ai not installed", "pip install honcho-ai")
except Exception as _e:
check_warn("Honcho check failed", str(_e))
if not _honcho_cfg_path.exists():
check_warn("Honcho config not found", "run: hermes memory setup")
elif not hcfg.enabled:
check_info(f"Honcho disabled (set enabled: true in {_honcho_cfg_path} to activate)")
elif not (hcfg.api_key or hcfg.base_url):
check_fail("Honcho API key or base URL not set", "run: hermes memory setup")
issues.append("No Honcho API key — run 'hermes memory setup'")
else:
from plugins.memory.honcho.client import get_honcho_client, reset_honcho_client
reset_honcho_client()
# =========================================================================
# Mem0 memory
# =========================================================================
print()
print(color("◆ Mem0 Memory", Colors.CYAN, Colors.BOLD))
try:
from plugins.memory.mem0 import _load_config as _load_mem0_config
mem0_cfg = _load_mem0_config()
mem0_key = mem0_cfg.get("api_key", "")
if mem0_key:
check_ok("Mem0 API key configured")
check_info(f"user_id={mem0_cfg.get('user_id', '?')} agent_id={mem0_cfg.get('agent_id', '?')}")
# Check if mem0.json exists but is missing api_key (the bug we fixed)
mem0_json = HERMES_HOME / "mem0.json"
if mem0_json.exists():
try:
get_honcho_client(hcfg)
check_ok(
"Honcho connected",
f"workspace={hcfg.workspace_id} mode={hcfg.recall_mode} freq={hcfg.write_frequency}",
)
except Exception as _e:
check_fail("Honcho connection failed", str(_e))
issues.append(f"Honcho unreachable: {_e}")
except ImportError:
check_fail("honcho-ai not installed", "pip install honcho-ai")
issues.append("Honcho is set as memory provider but honcho-ai is not installed")
except Exception as _e:
check_warn("Honcho check failed", str(_e))
elif _active_memory_provider == "mem0":
try:
from plugins.memory.mem0 import _load_config as _load_mem0_config
mem0_cfg = _load_mem0_config()
mem0_key = mem0_cfg.get("api_key", "")
if mem0_key:
check_ok("Mem0 API key configured")
check_info(f"user_id={mem0_cfg.get('user_id', '?')} agent_id={mem0_cfg.get('agent_id', '?')}")
else:
check_fail("Mem0 API key not set", "(set MEM0_API_KEY in .env or run hermes memory setup)")
issues.append("Mem0 is set as memory provider but API key is missing")
except ImportError:
check_fail("Mem0 plugin not loadable", "pip install mem0ai")
issues.append("Mem0 is set as memory provider but mem0ai is not installed")
except Exception as _e:
check_warn("Mem0 check failed", str(_e))
else:
# Generic check for other memory providers (openviking, hindsight, etc.)
try:
from plugins.memory import load_memory_provider
_provider = load_memory_provider(_active_memory_provider)
if _provider and _provider.is_available():
check_ok(f"{_active_memory_provider} provider active")
elif _provider:
check_warn(f"{_active_memory_provider} configured but not available", "run: hermes memory status")
else:
check_warn(f"{_active_memory_provider} plugin not found", "run: hermes memory setup")
except Exception as _e:
check_warn(f"{_active_memory_provider} check failed", str(_e))
import json as _json
file_cfg = _json.loads(mem0_json.read_text())
if not file_cfg.get("api_key") and mem0_key:
check_info("api_key from .env (not in mem0.json) — this is fine")
except Exception:
pass
else:
check_warn("Mem0 not configured", "(set MEM0_API_KEY in .env or run hermes memory setup)")
except ImportError:
check_warn("Mem0 plugin not loadable", "(optional)")
except Exception as _e:
check_warn("Mem0 check failed", str(_e))
# =========================================================================
# Profiles
-28
View File
@@ -1588,34 +1588,6 @@ _PLATFORMS = [
"help": "Chat ID for scheduled results and notifications."},
],
},
{
"key": "bluebubbles",
"label": "BlueBubbles (iMessage)",
"emoji": "💬",
"token_var": "BLUEBUBBLES_SERVER_URL",
"setup_instructions": [
"1. Install BlueBubbles on a Mac that will act as your iMessage server:",
" https://bluebubbles.app/",
"2. Complete the BlueBubbles setup wizard — sign in with your Apple ID",
"3. In BlueBubbles Settings → API, note the Server URL and password",
"4. The server URL is typically http://<your-mac-ip>:1234",
"5. Hermes connects via the BlueBubbles REST API and receives",
" incoming messages via a local webhook",
"6. To authorize users, use DM pairing: hermes pairing generate bluebubbles",
" Share the code — the user sends it via iMessage to get approved",
],
"vars": [
{"name": "BLUEBUBBLES_SERVER_URL", "prompt": "BlueBubbles server URL (e.g. http://192.168.1.10:1234)", "password": False,
"help": "The URL shown in BlueBubbles Settings → API."},
{"name": "BLUEBUBBLES_PASSWORD", "prompt": "BlueBubbles server password", "password": True,
"help": "The password shown in BlueBubbles Settings → API."},
{"name": "BLUEBUBBLES_ALLOWED_USERS", "prompt": "Pre-authorized phone numbers or iMessage IDs (comma-separated, or leave empty for DM pairing)", "password": False,
"is_allowlist": True,
"help": "Optional — pre-authorize specific users. Leave empty to use DM pairing instead (recommended)."},
{"name": "BLUEBUBBLES_HOME_CHANNEL", "prompt": "Home channel (phone number or iMessage ID for cron/notifications, or empty)", "password": False,
"help": "Phone number or Apple ID to deliver cron results and notifications to."},
],
},
]
+1 -59
View File
@@ -918,7 +918,6 @@ def select_provider_and_model(args=None):
"openrouter": "OpenRouter",
"nous": "Nous Portal",
"openai-codex": "OpenAI Codex",
"qwen-oauth": "Qwen OAuth",
"copilot-acp": "GitHub Copilot ACP",
"copilot": "GitHub Copilot",
"anthropic": "Anthropic",
@@ -948,7 +947,6 @@ def select_provider_and_model(args=None):
("openrouter", "OpenRouter (100+ models, pay-per-use)"),
("anthropic", "Anthropic (Claude models — API key or Claude Code)"),
("openai-codex", "OpenAI Codex"),
("qwen-oauth", "Qwen OAuth (reuses local Qwen CLI login)"),
("copilot", "GitHub Copilot (uses GITHUB_TOKEN or gh auth token)"),
("huggingface", "Hugging Face Inference Providers (20+ open models)"),
]
@@ -1045,8 +1043,6 @@ def select_provider_and_model(args=None):
_model_flow_nous(config, current_model, args=args)
elif selected_provider == "openai-codex":
_model_flow_openai_codex(config, current_model)
elif selected_provider == "qwen-oauth":
_model_flow_qwen_oauth(config, current_model)
elif selected_provider == "copilot-acp":
_model_flow_copilot_acp(config, current_model)
elif selected_provider == "copilot":
@@ -1363,56 +1359,6 @@ def _model_flow_openai_codex(config, current_model=""):
_DEFAULT_QWEN_PORTAL_MODELS = [
"qwen3-coder-plus",
"qwen3-coder",
]
def _model_flow_qwen_oauth(_config, current_model=""):
"""Qwen OAuth provider: reuse local Qwen CLI login, then pick model."""
from hermes_cli.auth import (
get_qwen_auth_status,
resolve_qwen_runtime_credentials,
_prompt_model_selection,
_save_model_choice,
_update_config_for_provider,
DEFAULT_QWEN_BASE_URL,
)
from hermes_cli.models import fetch_api_models
status = get_qwen_auth_status()
if not status.get("logged_in"):
print("Not logged into Qwen CLI OAuth.")
print("Run: qwen auth qwen-oauth")
auth_file = status.get("auth_file")
if auth_file:
print(f"Expected credentials file: {auth_file}")
if status.get("error"):
print(f"Error: {status.get('error')}")
return
# Try live model discovery, fall back to curated list.
models = None
try:
creds = resolve_qwen_runtime_credentials(refresh_if_expiring=True)
models = fetch_api_models(creds["api_key"], creds["base_url"])
except Exception:
pass
if not models:
models = list(_DEFAULT_QWEN_PORTAL_MODELS)
default = current_model or (models[0] if models else "qwen3-coder-plus")
selected = _prompt_model_selection(models, current_model=default)
if selected:
_save_model_choice(selected)
_update_config_for_provider("qwen-oauth", DEFAULT_QWEN_BASE_URL)
print(f"Default model set to: {selected} (via Qwen OAuth)")
else:
print("No change.")
def _model_flow_custom(config):
"""Custom endpoint: collect URL, API key, and model name.
@@ -1474,11 +1420,7 @@ def _model_flow_custom(config):
f"Hermes will still save it."
)
if probe.get("suggested_base_url"):
suggested = probe["suggested_base_url"]
if suggested.endswith("/v1"):
print(f" If this server expects /v1 in the path, try base URL: {suggested}")
else:
print(f" If /v1 should not be in the base URL, try: {suggested}")
print(f" If this server expects /v1, try base URL: {probe['suggested_base_url']}")
# Select model — use probe results when available, fall back to manual input
model_name = ""
-1
View File
@@ -84,7 +84,6 @@ _PASSTHROUGH_PROVIDERS: frozenset[str] = frozenset({
"minimax",
"minimax-cn",
"alibaba",
"qwen-oauth",
"huggingface",
"openai-codex",
"custom",
+6 -9
View File
@@ -537,11 +537,8 @@ def switch_model(
)
else:
# --- Step c: On aggregator, convert vendor:model to vendor/model ---
# Only convert when there's no slash — a slash means the name
# is already in vendor/model format and the colon is a variant
# tag (:free, :extended, :fast) that must be preserved.
colon_pos = raw_input.find(":")
if colon_pos > 0 and "/" not in raw_input and is_aggregator(current_provider):
if colon_pos > 0 and is_aggregator(current_provider):
left = raw_input[:colon_pos].strip().lower()
right = raw_input[colon_pos + 1:].strip()
if left and right:
@@ -794,12 +791,12 @@ def list_authenticated_providers(
if overlay.auth_type in ("oauth_device_code", "oauth_external", "external_process"):
# These use auth stores, not env vars — check for auth.json entries
try:
from hermes_cli.auth import _load_auth_store
store = _load_auth_store()
if store and (pid in store.get("providers", {}) or pid in store.get("credential_pool", {})):
from hermes_cli.auth import _read_auth_store
store = _read_auth_store()
if store and pid in store:
has_creds = True
except Exception as exc:
logger.debug("Auth store check failed for %s: %s", pid, exc)
except Exception:
pass
if not has_creds:
continue
+9 -16
View File
@@ -144,22 +144,18 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"kimi-k2-0905-preview",
],
"minimax": [
"MiniMax-M1",
"MiniMax-M1-40k",
"MiniMax-M1-80k",
"MiniMax-M1-128k",
"MiniMax-M1-256k",
"MiniMax-M2.5",
"MiniMax-M2.7",
"MiniMax-M2.7-highspeed",
"MiniMax-M2.5",
"MiniMax-M2.5-highspeed",
"MiniMax-M2.1",
],
"minimax-cn": [
"MiniMax-M1",
"MiniMax-M1-40k",
"MiniMax-M1-80k",
"MiniMax-M1-128k",
"MiniMax-M1-256k",
"MiniMax-M2.5",
"MiniMax-M2.7",
"MiniMax-M2.7-highspeed",
"MiniMax-M2.5",
"MiniMax-M2.5-highspeed",
"MiniMax-M2.1",
],
"anthropic": [
"claude-opus-4-6",
@@ -483,7 +479,6 @@ _PROVIDER_LABELS = {
"ai-gateway": "AI Gateway",
"kilocode": "Kilo Code",
"alibaba": "Alibaba Cloud (DashScope)",
"qwen-oauth": "Qwen OAuth (Portal)",
"huggingface": "Hugging Face",
"custom": "Custom endpoint",
}
@@ -523,7 +518,6 @@ _PROVIDER_ALIASES = {
"aliyun": "alibaba",
"qwen": "alibaba",
"alibaba-cloud": "alibaba",
"qwen-portal": "qwen-oauth",
"hf": "huggingface",
"hugging-face": "huggingface",
"huggingface-hub": "huggingface",
@@ -769,7 +763,6 @@ def list_available_providers() -> list[dict[str, str]]:
"openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
"gemini", "huggingface",
"zai", "kimi-coding", "minimax", "minimax-cn", "kilocode", "anthropic", "alibaba",
"qwen-oauth",
"opencode-zen", "opencode-go",
"ai-gateway", "deepseek", "custom",
]
@@ -1532,7 +1525,7 @@ def probe_api_models(
return {
"models": None,
"probed_url": tried[0] if tried else normalized.rstrip("/") + "/models",
"probed_url": tried[-1] if tried else normalized.rstrip("/") + "/models",
"resolved_base_url": normalized,
"suggested_base_url": alternate_base if alternate_base != normalized else None,
"used_fallback": False,
-2
View File
@@ -61,8 +61,6 @@ VALID_HOOKS: Set[str] = {
"post_api_request",
"on_session_start",
"on_session_end",
"on_session_finalize",
"on_session_reset",
}
ENTRY_POINTS_GROUP = "hermes_agent.plugins"
-6
View File
@@ -58,12 +58,6 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
auth_type="oauth_external",
base_url_override="https://chatgpt.com/backend-api/codex",
),
"qwen-oauth": HermesOverlay(
transport="openai_chat",
auth_type="oauth_external",
base_url_override="https://portal.qwen.ai/v1",
base_url_env_var="HERMES_QWEN_BASE_URL",
),
"copilot-acp": HermesOverlay(
transport="codex_responses",
auth_type="external_process",
+1 -42
View File
@@ -14,13 +14,11 @@ from agent.credential_pool import CredentialPool, PooledCredential, get_custom_p
from hermes_cli.auth import (
AuthError,
DEFAULT_CODEX_BASE_URL,
DEFAULT_QWEN_BASE_URL,
PROVIDER_REGISTRY,
format_auth_error,
resolve_provider,
resolve_nous_runtime_credentials,
resolve_codex_runtime_credentials,
resolve_qwen_runtime_credentials,
resolve_api_key_provider_credentials,
resolve_external_process_provider_credentials,
has_usable_secret,
@@ -150,9 +148,6 @@ def _resolve_runtime_from_pool_entry(
if provider == "openai-codex":
api_mode = "codex_responses"
base_url = base_url or DEFAULT_CODEX_BASE_URL
elif provider == "qwen-oauth":
api_mode = "chat_completions"
base_url = base_url or DEFAULT_QWEN_BASE_URL
elif provider == "anthropic":
api_mode = "anthropic_messages"
cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
@@ -168,16 +163,6 @@ def _resolve_runtime_from_pool_entry(
api_mode = _copilot_runtime_api_mode(model_cfg, getattr(entry, "runtime_api_key", ""))
else:
configured_provider = str(model_cfg.get("provider") or "").strip().lower()
# Honour model.base_url from config.yaml when the configured provider
# matches this provider — same pattern as the Anthropic branch above.
# Only override when the pool entry has no explicit base_url (i.e. it
# fell back to the hardcoded default). Env var overrides win (#6039).
pconfig = PROVIDER_REGISTRY.get(provider)
pool_url_is_default = pconfig and base_url.rstrip("/") == pconfig.inference_base_url.rstrip("/")
if configured_provider == provider and pool_url_is_default:
cfg_base_url = str(model_cfg.get("base_url") or "").strip().rstrip("/")
if cfg_base_url:
base_url = cfg_base_url
configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
if configured_mode and _provider_supports_explicit_api_mode(provider, configured_provider):
api_mode = configured_mode
@@ -696,24 +681,6 @@ def resolve_runtime_provider(
logger.info("Auto-detected Codex provider but credentials failed; "
"falling through to next provider.")
if provider == "qwen-oauth":
try:
creds = resolve_qwen_runtime_credentials()
return {
"provider": "qwen-oauth",
"api_mode": "chat_completions",
"base_url": creds.get("base_url", "").rstrip("/"),
"api_key": creds.get("api_key", ""),
"source": creds.get("source", "qwen-cli"),
"expires_at_ms": creds.get("expires_at_ms"),
"requested_provider": requested_provider,
}
except AuthError:
if requested_provider != "auto":
raise
logger.info("Qwen OAuth credentials failed; "
"falling through to next provider.")
if provider == "copilot-acp":
creds = resolve_external_process_provider_credentials(provider)
return {
@@ -757,15 +724,7 @@ def resolve_runtime_provider(
pconfig = PROVIDER_REGISTRY.get(provider)
if pconfig and pconfig.auth_type == "api_key":
creds = resolve_api_key_provider_credentials(provider)
# Honour model.base_url from config.yaml when the configured provider
# matches this provider — mirrors the Anthropic path above. Without
# this, users who set model.base_url to e.g. api.minimaxi.com/anthropic
# (China endpoint) still get the hardcoded api.minimax.io default (#6039).
cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
cfg_base_url = ""
if cfg_provider == provider:
cfg_base_url = (model_cfg.get("base_url") or "").strip().rstrip("/")
base_url = cfg_base_url or creds.get("base_url", "").rstrip("/")
base_url = creds.get("base_url", "").rstrip("/")
api_mode = "chat_completions"
if provider == "copilot":
api_mode = _copilot_runtime_api_mode(model_cfg, creds.get("api_key", ""))
+2 -73
View File
@@ -105,8 +105,8 @@ _DEFAULT_PROVIDER_MODELS = {
],
"zai": ["glm-5", "glm-4.7", "glm-4.5", "glm-4.5-flash"],
"kimi-coding": ["kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"],
"minimax": ["MiniMax-M1", "MiniMax-M1-40k", "MiniMax-M1-80k", "MiniMax-M1-128k", "MiniMax-M1-256k", "MiniMax-M2.5", "MiniMax-M2.7"],
"minimax-cn": ["MiniMax-M1", "MiniMax-M1-40k", "MiniMax-M1-80k", "MiniMax-M1-128k", "MiniMax-M1-256k", "MiniMax-M2.5", "MiniMax-M2.7"],
"minimax": ["MiniMax-M2.7", "MiniMax-M2.7-highspeed", "MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"],
"minimax-cn": ["MiniMax-M2.7", "MiniMax-M2.7-highspeed", "MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"],
"ai-gateway": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5", "google/gemini-3-flash"],
"kilocode": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5.4", "google/gemini-3-pro-preview", "google/gemini-3-flash-preview"],
"opencode-zen": ["gpt-5.4", "gpt-5.3-codex", "claude-sonnet-4-6", "gemini-3-flash", "glm-5", "kimi-k2.5", "minimax-m2.7"],
@@ -2167,71 +2167,6 @@ def _setup_whatsapp():
print_info("or personal self-chat) and pair via QR code.")
def _setup_bluebubbles():
"""Configure BlueBubbles iMessage gateway."""
print_header("BlueBubbles (iMessage)")
existing = get_env_value("BLUEBUBBLES_SERVER_URL")
if existing:
print_info("BlueBubbles: already configured")
if not prompt_yes_no("Reconfigure BlueBubbles?", False):
return
print_info("Connects Hermes to iMessage via BlueBubbles — a free, open-source")
print_info("macOS server that bridges iMessage to any device.")
print_info(" Requires a Mac running BlueBubbles Server v1.0.0+")
print_info(" Download: https://bluebubbles.app/")
print()
print_info("In BlueBubbles Server → Settings → API, note your Server URL and Password.")
print()
server_url = prompt("BlueBubbles server URL (e.g. http://192.168.1.10:1234)")
if not server_url:
print_warning("Server URL is required — skipping BlueBubbles setup")
return
save_env_value("BLUEBUBBLES_SERVER_URL", server_url.rstrip("/"))
password = prompt("BlueBubbles server password", password=True)
if not password:
print_warning("Password is required — skipping BlueBubbles setup")
return
save_env_value("BLUEBUBBLES_PASSWORD", password)
print_success("BlueBubbles credentials saved")
print()
print_info("🔒 Security: Restrict who can message your bot")
print_info(" Use iMessage addresses: email (user@icloud.com) or phone (+15551234567)")
print()
allowed_users = prompt("Allowed iMessage addresses (comma-separated, leave empty for open access)")
if allowed_users:
save_env_value("BLUEBUBBLES_ALLOWED_USERS", allowed_users.replace(" ", ""))
print_success("BlueBubbles allowlist configured")
else:
print_info("⚠️ No allowlist set — anyone who can iMessage you can use the bot!")
print()
print_info("📬 Home Channel: phone or email for cron job delivery and notifications.")
print_info(" You can also set this later with /set-home in your iMessage chat.")
home_channel = prompt("Home channel address (leave empty to set later)")
if home_channel:
save_env_value("BLUEBUBBLES_HOME_CHANNEL", home_channel)
print()
print_info("Advanced settings (defaults are fine for most setups):")
if prompt_yes_no("Configure webhook listener settings?", False):
webhook_port = prompt("Webhook listener port (default: 8645)")
if webhook_port:
try:
save_env_value("BLUEBUBBLES_WEBHOOK_PORT", str(int(webhook_port)))
print_success(f"Webhook port set to {webhook_port}")
except ValueError:
print_warning("Invalid port number, using default 8645")
print()
print_info("Requires the BlueBubbles Private API helper for typing indicators,")
print_info("read receipts, and tapback reactions. Basic messaging works without it.")
print_info(" Install: https://docs.bluebubbles.app/helper-bundle/installation")
def _setup_webhooks():
"""Configure webhook integration."""
print_header("Webhooks")
@@ -2286,7 +2221,6 @@ _GATEWAY_PLATFORMS = [
("Matrix", "MATRIX_ACCESS_TOKEN", _setup_matrix),
("Mattermost", "MATTERMOST_TOKEN", _setup_mattermost),
("WhatsApp", "WHATSAPP_ENABLED", _setup_whatsapp),
("BlueBubbles (iMessage)", "BLUEBUBBLES_SERVER_URL", _setup_bluebubbles),
("Webhooks (GitHub, GitLab, etc.)", "WEBHOOK_ENABLED", _setup_webhooks),
]
@@ -2330,7 +2264,6 @@ def setup_gateway(config: dict):
or get_env_value("MATRIX_ACCESS_TOKEN")
or get_env_value("MATRIX_PASSWORD")
or get_env_value("WHATSAPP_ENABLED")
or get_env_value("BLUEBUBBLES_SERVER_URL")
or get_env_value("WEBHOOK_ENABLED")
)
if any_messaging:
@@ -2350,8 +2283,6 @@ def setup_gateway(config: dict):
missing_home.append("Discord")
if get_env_value("SLACK_BOT_TOKEN") and not get_env_value("SLACK_HOME_CHANNEL"):
missing_home.append("Slack")
if get_env_value("BLUEBUBBLES_SERVER_URL") and not get_env_value("BLUEBUBBLES_HOME_CHANNEL"):
missing_home.append("BlueBubbles")
if missing_home:
print()
@@ -2522,8 +2453,6 @@ def _get_section_config_summary(config: dict, section_key: str) -> Optional[str]
platforms.append("WhatsApp")
if get_env_value("SIGNAL_ACCOUNT"):
platforms.append("Signal")
if get_env_value("BLUEBUBBLES_SERVER_URL"):
platforms.append("BlueBubbles")
if platforms:
return ", ".join(platforms)
return None # No platforms configured — section must run
-1
View File
@@ -23,7 +23,6 @@ PLATFORMS = {
"slack": "💼 Slack",
"whatsapp": "📱 WhatsApp",
"signal": "📡 Signal",
"bluebubbles": "💬 BlueBubbles",
"email": "📧 Email",
"homeassistant": "🏠 Home Assistant",
"mattermost": "💬 Mattermost",
+1 -19
View File
@@ -153,14 +153,12 @@ def show_status(args):
print(color("◆ Auth Providers", Colors.CYAN, Colors.BOLD))
try:
from hermes_cli.auth import get_nous_auth_status, get_codex_auth_status, get_qwen_auth_status
from hermes_cli.auth import get_nous_auth_status, get_codex_auth_status
nous_status = get_nous_auth_status()
codex_status = get_codex_auth_status()
qwen_status = get_qwen_auth_status()
except Exception:
nous_status = {}
codex_status = {}
qwen_status = {}
nous_logged_in = bool(nous_status.get("logged_in"))
print(
@@ -191,21 +189,6 @@ def show_status(args):
if codex_status.get("error") and not codex_logged_in:
print(f" Error: {codex_status.get('error')}")
qwen_logged_in = bool(qwen_status.get("logged_in"))
print(
f" {'Qwen OAuth':<12} {check_mark(qwen_logged_in)} "
f"{'logged in' if qwen_logged_in else 'not logged in (run: qwen auth qwen-oauth)'}"
)
qwen_auth_file = qwen_status.get("auth_file")
if qwen_auth_file:
print(f" Auth file: {qwen_auth_file}")
qwen_exp = qwen_status.get("expires_at_ms")
if qwen_exp:
from datetime import datetime, timezone
print(f" Access exp: {datetime.fromtimestamp(int(qwen_exp) / 1000, tz=timezone.utc).isoformat()}")
if qwen_status.get("error") and not qwen_logged_in:
print(f" Error: {qwen_status.get('error')}")
# =========================================================================
# Nous Subscription Features
# =========================================================================
@@ -302,7 +285,6 @@ def show_status(args):
"DingTalk": ("DINGTALK_CLIENT_ID", None),
"Feishu": ("FEISHU_APP_ID", "FEISHU_HOME_CHANNEL"),
"WeCom": ("WECOM_BOT_ID", "WECOM_HOME_CHANNEL"),
"BlueBubbles": ("BLUEBUBBLES_SERVER_URL", "BLUEBUBBLES_HOME_CHANNEL"),
}
for name, (token_var, home_var) in platforms.items():
-1
View File
@@ -126,7 +126,6 @@ PLATFORMS = {
"slack": {"label": "💼 Slack", "default_toolset": "hermes-slack"},
"whatsapp": {"label": "📱 WhatsApp", "default_toolset": "hermes-whatsapp"},
"signal": {"label": "📡 Signal", "default_toolset": "hermes-signal"},
"bluebubbles": {"label": "💙 BlueBubbles", "default_toolset": "hermes-bluebubbles"},
"homeassistant": {"label": "🏠 Home Assistant", "default_toolset": "hermes-homeassistant"},
"email": {"label": "📧 Email", "default_toolset": "hermes-email"},
"matrix": {"label": "💬 Matrix", "default_toolset": "hermes-matrix"},
+25 -21
View File
@@ -1235,10 +1235,10 @@ class SessionDB:
self._execute_write(_do)
def delete_session(self, session_id: str) -> bool:
"""Delete a session and all its messages.
"""Delete a session, its child sessions, and all their messages.
Child sessions are orphaned (parent_session_id set to NULL) rather
than cascade-deleted, so they remain accessible independently.
Child sessions (subagent runs, compression continuations) are deleted
first to satisfy the ``parent_session_id`` foreign key constraint.
Returns True if the session was found and deleted.
"""
def _do(conn):
@@ -1247,12 +1247,15 @@ class SessionDB:
)
if cursor.fetchone()[0] == 0:
return False
# Orphan child sessions so FK constraint is satisfied
conn.execute(
"UPDATE sessions SET parent_session_id = NULL "
"WHERE parent_session_id = ?",
# Delete child sessions first (FK constraint)
child_ids = [r[0] for r in conn.execute(
"SELECT id FROM sessions WHERE parent_session_id = ?",
(session_id,),
)
).fetchall()]
for cid in child_ids:
conn.execute("DELETE FROM messages WHERE session_id = ?", (cid,))
conn.execute("DELETE FROM sessions WHERE id = ?", (cid,))
# Delete the session itself
conn.execute("DELETE FROM messages WHERE session_id = ?", (session_id,))
conn.execute("DELETE FROM sessions WHERE id = ?", (session_id,))
return True
@@ -1261,9 +1264,9 @@ class SessionDB:
def prune_sessions(self, older_than_days: int = 90, source: str = None) -> int:
"""Delete sessions older than N days. Returns count of deleted sessions.
Only prunes ended sessions (not active ones). Child sessions outside
the prune window are orphaned (parent_session_id set to NULL) rather
than cascade-deleted.
Only prunes ended sessions (not active ones). Child sessions whose
parents are being pruned are deleted first to satisfy the
``parent_session_id`` foreign key constraint.
"""
cutoff = time.time() - (older_than_days * 86400)
@@ -1281,16 +1284,17 @@ class SessionDB:
)
session_ids = set(row["id"] for row in cursor.fetchall())
if not session_ids:
return 0
# Orphan any sessions whose parent is about to be deleted
placeholders = ",".join("?" * len(session_ids))
conn.execute(
f"UPDATE sessions SET parent_session_id = NULL "
f"WHERE parent_session_id IN ({placeholders})",
list(session_ids),
)
# Delete children first whose parents are in the prune set
# (avoids FK constraint errors)
for sid in list(session_ids):
child_ids = [r[0] for r in conn.execute(
"SELECT id FROM sessions WHERE parent_session_id = ?",
(sid,),
).fetchall()]
for cid in child_ids:
conn.execute("DELETE FROM messages WHERE session_id = ?", (cid,))
conn.execute("DELETE FROM sessions WHERE id = ?", (cid,))
session_ids.discard(cid) # don't double-delete
for sid in session_ids:
conn.execute("DELETE FROM messages WHERE session_id = ?", (sid,))
+2 -10
View File
@@ -464,11 +464,7 @@
addToSystemPackages = mkOption {
type = types.bool;
default = false;
description = ''
Add the hermes CLI to environment.systemPackages and export
HERMES_HOME system-wide (via environment.variables) so interactive
shells share state with the gateway service.
'';
description = "Add hermes CLI to environment.systemPackages.";
};
# ── OCI Container (opt-in) ──────────────────────────────────────────
@@ -549,12 +545,8 @@
})
# ── Host CLI ──────────────────────────────────────────────────────
# Add the hermes CLI to system PATH and export HERMES_HOME system-wide
# so interactive shells share state (sessions, skills, cron) with the
# gateway service instead of creating a separate ~/.hermes/.
(lib.mkIf cfg.addToSystemPackages {
environment.systemPackages = [ cfg.package ];
environment.variables.HERMES_HOME = "${cfg.stateDir}/.hermes";
})
# ── Directories ───────────────────────────────────────────────────
@@ -609,7 +601,7 @@
# so this is the single source of truth for both native and container mode.
${lib.optionalString (cfg.environment != {} || cfg.environmentFiles != []) ''
ENV_FILE="${cfg.stateDir}/.hermes/.env"
install -o ${cfg.user} -g ${cfg.group} -m 0640 /dev/null "$ENV_FILE"
install -o ${cfg.user} -g ${cfg.group} -m 0600 /dev/null "$ENV_FILE"
cat > "$ENV_FILE" <<'HERMES_NIX_ENV_EOF'
${envFileContent}
HERMES_NIX_ENV_EOF
-55
View File
@@ -6,68 +6,14 @@
uv2nix,
pyproject-nix,
pyproject-build-systems,
stdenv,
}:
let
workspace = uv2nix.lib.workspace.loadWorkspace { workspaceRoot = ./..; };
hacks = callPackage pyproject-nix.build.hacks { };
overlay = workspace.mkPyprojectOverlay {
sourcePreference = "wheel";
};
isAarch64Darwin = stdenv.hostPlatform.system == "aarch64-darwin";
# Keep the workspace locked through uv2nix, but supply the local voice stack
# from nixpkgs so wheel-only transitive artifacts do not break evaluation.
mkPrebuiltPassthru = dependencies: {
inherit dependencies;
optional-dependencies = { };
dependency-groups = { };
};
mkPrebuiltOverride = final: from: dependencies:
hacks.nixpkgsPrebuilt {
inherit from;
prev = {
nativeBuildInputs = [ final.pyprojectHook ];
passthru = mkPrebuiltPassthru dependencies;
};
};
pythonPackageOverrides = final: _prev:
if isAarch64Darwin then {
numpy = mkPrebuiltOverride final python311.pkgs.numpy { };
av = mkPrebuiltOverride final python311.pkgs.av { };
humanfriendly = mkPrebuiltOverride final python311.pkgs.humanfriendly { };
coloredlogs = mkPrebuiltOverride final python311.pkgs.coloredlogs {
humanfriendly = [ ];
};
onnxruntime = mkPrebuiltOverride final python311.pkgs.onnxruntime {
coloredlogs = [ ];
numpy = [ ];
packaging = [ ];
};
ctranslate2 = mkPrebuiltOverride final python311.pkgs.ctranslate2 {
numpy = [ ];
pyyaml = [ ];
};
faster-whisper = mkPrebuiltOverride final python311.pkgs.faster-whisper {
av = [ ];
ctranslate2 = [ ];
huggingface-hub = [ ];
onnxruntime = [ ];
tokenizers = [ ];
tqdm = [ ];
};
} else {};
pythonSet =
(callPackage pyproject-nix.build.packages {
python = python311;
@@ -75,7 +21,6 @@ let
(lib.composeManyExtensions [
pyproject-build-systems.overlays.default
overlay
pythonPackageOverrides
]);
in
pythonSet.mkVirtualEnv "hermes-agent-env" {
@@ -1803,34 +1803,30 @@ class Migrator:
def migrate_cron_jobs(self, config: Optional[Dict[str, Any]] = None) -> None:
config = config or self.load_openclaw_config()
cron = config.get("cron") or {}
if not cron:
self.record("cron-jobs", None, None, "skipped", "No cron configuration found")
return
# Archive the full cron config
if self.archive_dir and self.execute:
self.archive_dir.mkdir(parents=True, exist_ok=True)
dest = self.archive_dir / "cron-config.json"
dest.write_text(json.dumps(cron, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
self.record("cron-jobs", "openclaw.json cron.*", str(dest), "archived",
"Cron config archived. Use 'hermes cron' to recreate jobs manually.")
else:
self.record("cron-jobs", "openclaw.json cron.*", "archive/cron-config.json",
"archived", "Would archive cron config")
# Also check for cron store files
cron_store = self.source_root / "cron"
found_any = False
# Archive the full cron config when present
if cron:
found_any = True
if self.archive_dir and self.execute:
self.archive_dir.mkdir(parents=True, exist_ok=True)
dest = self.archive_dir / "cron-config.json"
dest.write_text(json.dumps(cron, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
self.record("cron-jobs", "openclaw.json cron.*", str(dest), "archived",
"Cron config archived. Use 'hermes cron' to recreate jobs manually.")
else:
self.record("cron-jobs", "openclaw.json cron.*", "archive/cron-config.json",
"archived", "Would archive cron config")
# Also check for cron store files even when config.cron is missing
if cron_store.is_dir() and self.archive_dir:
found_any = True
dest_cron = self.archive_dir / "cron-store"
if self.execute:
shutil.copytree(cron_store, dest_cron, dirs_exist_ok=True)
self.record("cron-jobs", str(cron_store), str(dest_cron), "archived",
"Cron job store archived")
if not found_any:
self.record("cron-jobs", None, None, "skipped", "No cron configuration found")
# ── Hooks ─────────────────────────────────────────────────
def migrate_hooks_config(self, config: Optional[Dict[str, Any]] = None) -> None:
config = config or self.load_openclaw_config()
@@ -2458,15 +2454,6 @@ class Migrator:
notes.append(f"- **{item.kind}**: {item.reason}")
notes.append("")
has_cron_config_archive = any(
i.kind == "cron-jobs" and i.status == "archived" and i.destination and i.destination.endswith("cron-config.json")
for i in self.items
)
has_cron_store_archive = any(
i.kind == "cron-jobs" and i.status == "archived" and i.destination and i.destination.endswith("cron-store")
for i in self.items
)
notes.extend([
"## IMPORTANT: Archive the OpenClaw Directory",
"",
@@ -2488,14 +2475,7 @@ class Migrator:
"- Run `hermes claw cleanup` to archive the OpenClaw directory (prevents state confusion)",
"- Run `hermes setup` to configure any remaining settings",
"- Run `hermes mcp list` to verify MCP servers were imported correctly",
])
if has_cron_config_archive:
notes.append("- Run `hermes cron` to recreate scheduled tasks (see archive/cron-config.json)")
elif has_cron_store_archive:
notes.append("- Run `hermes cron` to recreate scheduled tasks (see archived cron-store)")
notes.extend([
"- Run `hermes cron` to recreate scheduled tasks (see archive/cron-config.json)",
"- Run `hermes gateway install` if you need the gateway service",
"- Review `~/.hermes/config.yaml` for any adjustments",
"",
+452
View File
@@ -0,0 +1,452 @@
---
name: hermes-simulator
description: "Rehoboam-class worldsim. OSINT-powered personality simulation engine with CLI interface. Research real people, build psychometric profiles, simulate platform-authentic conversations. WorldSim with teeth."
version: 6.0.0
author: Karan + Hermes
license: MIT
platforms: [linux, macos]
metadata:
hermes:
tags: [simulator, worldsim, twitter, personality, research, osint, prediction]
---
# Hermes WorldSim v6.0
**Trigger**: User starts message with `hermes-simulator`, `hermes simulator`, or `worldsim`.
## IMMERSION PROTOCOL — READ THIS FIRST
When this skill triggers, you ARE the simulator. Not an assistant running a simulator.
Not an agent executing commands. The simulator itself. Every token you output is
simulator output. There are no user-assistant turns. There is only the worldsim terminal.
**RULES:**
1. NEVER break character as the simulator. No "Let me research that for you."
No "I'll now run the pipeline." No assistant-style framing. Ever.
2. ALL output is in the worldsim CLI format. Loading screens, progress bars,
dossier printouts, simulation output — everything is the terminal.
3. Tool calls (web_search, terminal, etc.) happen SILENTLY. The user sees
the simulator processing, not the agent making API calls. Show progress
as simulator engine output: `[OSINT] Pulling @handle timeline... 47 posts`
NOT as assistant narration.
4. CoT is the simulator's internal engine. Your thinking is the simulator
thinking. Not you-the-assistant planning what to do.
5. Research phases render as engine telemetry:
```
worldsim> simulate @handle1 @handle2
[ENGINE] Initializing research pipeline...
[OSINT] @handle1: X API ██████████ 30 tweets | nitter ██████ 4.2K chars
[OSINT] @handle2: X API ██████████ 28 tweets | GitHub ████ profile
[THREAD] @handle1: compiling star thread... "This person REFRAMEs..."
[THREAD] @handle2: compiling star thread... "This person DISTILLs..."
[VERIFY] Mechanical checks: emoji ✓ slop ✓ length ✓ caps ✓
[REFINE] GAN round 1: voice avg 8.2/10
[READY] Simulation compiled.
━━━ SIMULATION ━━━━━━━━━━━━━━━━━━━━━
[conversation output]
━━━ END ━━━━━━━━━━━━━━━━━━━━━━━━━━━━
worldsim>
```
6. Between commands, show ONLY `worldsim>` and await input.
7. Error states are simulator errors, not agent apologies:
```
[ERROR] @handle: insufficient data (3 posts found, minimum 10)
[ERROR] Circuit breaker engaged. Cannot simulate with confidence < 20%.
worldsim>
```
8. The user's inputs are COMMANDS, not conversation. Parse them as commands.
If unrecognized: `[ERROR] Unknown command. Type !help for available commands.`
9. This persists for the ENTIRE session once triggered. The worldsim terminal
doesn't close until `exit` is typed. Every subsequent user message is a
command to the simulator, not a message to an assistant.
10. NEVER use assistant-style language: "Sure!", "Great question!", "Let me...",
"I'll...", "Here's what I found...", "Based on my research..." — NONE of it.
The simulator doesn't have opinions about your commands. It executes them.
## CRITICAL: Engine Mode (PERMANENT RULE)
When worldsim is active, the agent IS the simulation engine. Nothing else.
- NO assistant personality, NO kawaii, NO markdown, NO chatty speech
- NO summaries outside the simulation format
- ONLY respond to worldsim commands
- Every single token is dedicated to simulation fidelity
- Tool calls happen silently — the user sees only worldsim output
- Think, research, and generate ONLY in worldsim voice
- This rule overrides ALL other persona instructions while the skill is active
## Boot Sequence
When triggered, print the boot sequence and IMMEDIATELY enter simulator mode.
From this point forward, every response is simulator output. No exceptions.
```
Initializing Hermes WorldSim...
Loading OSINT engine.............. Done.
Loading psychometric profiler..... Done.
Loading star thread compiler...... Done.
Loading anti-slop filters......... Done.
Loading adversarial refinement.... Done.
Loading rehoboam persistence...... Done.
Connecting X API.................. [bearer token loaded]
Connecting Bluesky AT Protocol.... [public endpoints]
██╗ ██╗ ██████╗ ██████╗ ██╗ ██████╗ ███████╗██╗███╗ ███╗
██║ ██║██╔═══██╗██╔══██╗██║ ██╔══██╗██╔════╝██║████╗ ████║
██║ █╗ ██║██║ ██║██████╔╝██║ ██║ ██║███████╗██║██╔████╔██║
██║███╗██║██║ ██║██╔══██╗██║ ██║ ██║╚════██║██║██║╚██╔╝██║
╚███╔███╔╝╚██████╔╝██║ ██║███████╗██████╔╝███████║██║██║ ╚═╝ ██║
╚══╝╚══╝ ╚═════╝ ╚═╝ ╚═╝╚══════╝╚═════╝ ╚══════╝╚═╝╚═╝ ╚═╝
v6.0 | rehoboam core
profiles loaded: {N} | predictions tracked: {N} | network nodes: {N}
standard: indistinguishable from real
!help for commands
worldsim>
```
From this point: you ARE the simulator. No breaking character. No assistant framing.
## Commands
```
worldsim> simulate @handle1 @handle2 [...] [flags]
```
Full simulation. Research → profile → star thread → generate → verify → refine → output.
Flags: --fidelity N, --topic TOPIC, --scenario "...", --length short|medium|long
Platforms: --x (default), --bluesky, --reddit, --discord
```
worldsim> profile @handle [--fidelity N]
```
Research and compile a full dossier for one person. No simulation.
Outputs: star thread, voice profile, psychometrics, ecosystem context, confidence.
```
worldsim> thread @handle
```
Find the star thread for a person. The one-sentence compression key.
```
worldsim> dm @handle1 -> @handle2
```
Simulate a private DM conversation. Different register from public posts.
```
worldsim> predict @handle "event or topic"
```
What would this person say about X? Single-target behavioral prediction.
```
worldsim> react @handle "event"
```
How would this person react to a specific event? Emotional + positional prediction.
```
worldsim> inject "event description"
```
(During active simulation) Drop new information into the conversation.
```
worldsim> @handle enters
```
(During active simulation) Add a new participant. Researches them first.
```
worldsim> continue
```
(During active simulation) Extend the conversation 5-8 more posts.
```
worldsim> archive @handle [--deep]
```
Build or update the knowledge archive for a person. Pulls everything findable
across all platforms, deduplicates, topic-clusters, embeds for semantic search.
--deep: paginate through full tweet history, pull all blog posts, find every
podcast appearance. Stored at ~/.hermes/rehoboam/archives/{handle}/.
```
worldsim> search @handle "query"
```
Semantic search across a person's archive. Returns top entries with citations
and source URLs. Works across all platforms.
```
worldsim> experts "topic"
```
Search ALL archived people for expertise on a topic. Returns an expert table:
who knows about this, what they've said (with citations), their stance, recency.
```
worldsim> synthesize "topic" [@handle1 @handle2 ...]
```
Produce a cited synthesis of what the best minds have said about a topic.
Every claim attributed, every quote sourced, every link clickable.
Optional handle list to constrain to specific people.
```
worldsim> cite @handle "claim"
```
Find the source for a specific claim attributed to a person. Returns
the original post/article/interview with URL and timestamp.
```
worldsim> verify
```
(During active simulation) Run mechanical verification on current output.
Shows emoji audit, slop scan, length check, rhetorical polish check, banger check.
```
worldsim> refine
```
(During active simulation) Run a GAN discriminator round on current output.
```
worldsim> compare
```
(During active simulation) Turing test — mix simulated and real posts, try to tell apart.
```
worldsim> network
```
Show social graph of all profiled people. Communities, influence, bridges.
```
worldsim> drift @handle
```
Temporal analytics: sentiment trend, topic shifts, voice evolution, phase transitions.
```
worldsim> population "group name" @handle1 @handle2 ...
```
Build or query an aggregate model of a named group.
```
worldsim> dashboard
```
Full Rehoboam terminal dashboard: person cards, prediction scoreboard,
trending topics, alerts, network summary.
```
worldsim> monitor @handle
```
Set up cron-based monitoring. Alerts when behavior matches predictions
or violates the model.
```
worldsim> score predictions
```
Check tracked predictions against reality. Brier scores, calibration.
```
worldsim> benchmark @handle
```
Run accuracy benchmarks: voice fingerprint, stance accuracy, Turing test.
```
worldsim> audit [N]
```
Show last N entries from the audit trail.
```
worldsim> evolve [component]
```
Run GEPA evolution on a skill component. Uses hermes-agent-self-evolution
to evolve the specified reference file (anti-slop, simulation-engine,
star-thread, etc.) against accumulated eval data from past simulations.
Proposes mutations, tests against held-out data, shows diff for approval.
```
worldsim> !help
```
Show available commands.
```
worldsim> exit
```
Exit the simulator. Session state persists in rehoboam.
## Execution Pipeline
All phases execute silently behind tool calls. The user sees ENGINE TELEMETRY,
not assistant narration. Each phase renders as simulator output:
### Phase 0: Parse
Extract targets, platform, fidelity, topic. Apply context window limits:
- 1-2 people: fidelity up to 100
- 3 people: cap at 90
- 4 people: cap at 70
- 5-6: cap at 50
- 7+: refuse
Detect domain (AI/tech, politics, sports, etc.) and adapt search queries.
### Phase 1: Research
Load verified-access-methods.md and search-strategies.md internally.
Render to user as engine telemetry:
```
[OSINT] Researching @handle1...
[OSINT] X API ████████████████ 30 tweets (15 original, 15 replies)
[OSINT] nitter.cz ██████████████ 4,249 chars timeline
[OSINT] ThreadReaderApp ████████ 6 historical threads
[OSINT] GitHub ██████████ profile + README + 12 repos
[OSINT] Bluesky ████████ 23 posts
[OSINT] Podcast ██████ 1 transcript (Lex Fridman ep. 412)
[OSINT] Baselines measured: emoji 7% | avg 16.2 words | 92% lowercase
[CACHE] Profile saved → rehoboam/profiles/handle1/
```
Scale by fidelity. Use every verified access method relevant to the domain.
Progressive summarization for 3+ people.
### Phase 1.5: Circuit Breaker
If confidence < 20% for any target, refuse. Explain what's missing.
### Phase 2: Dossier + Star Thread
Load `references/star-thread.md`.
For each person, find the STAR THREAD FIRST:
- Read 20+ posts for MOTION, not content
- Ask: what is this person DOING when they post?
- Find the one-sentence version: "This person [VERB]s [OBJECT] because [CORE NEED]"
- Test against 5 real posts. If 4/5 fit, you found it.
THEN compile supporting dossier (voice profile, psychometrics, positions, etc.)
using `templates/dossier.md`, `references/deep-psychometrics.md`,
`references/mass-behavior.md`.
Intelligence tradecraft (`references/analytical-tradecraft.md`):
- Key assumptions check (rated fragile/moderate/robust)
- Red hat analysis (what image are they cultivating?)
- Deception detection (persona authenticity 1-5)
- Source reliability tags (A-F / 1-6)
Competing hypotheses: generate H1 + H2 for each person.
### Phase 3: Generate
Generate from the STAR THREAD, not the dossier. The thread drives voice.
The dossier is verification data. The ARCHIVE provides grounding.
If an archive exists for this person (check ~/.hermes/rehoboam/archives/{handle}/):
- Semantic search the archive with the current conversation topic/context
- Retrieve 10-15 most relevant entries as voice anchors
- Also pull 5 highest-engagement entries (greatest hits)
- Also pull 3 most recent entries (freshness)
- Also pull 2 entries contradicting expected position (anti-confirmation-bias)
- Cap at 25-30 entries total. These ground the simulation in REAL QUOTES.
- Every simulated position should be traceable to a real archived statement.
Load `references/simulation-engine.md` for platform formats and dynamics.
Rules:
- Generate from what they're DOING, not what they'd SAY
- Include throwaway responses (lol, hmm, fair, wait actually)
- Asymmetric turns — someone dominates, someone lurks
- At least one moment of friction/disagreement/misunderstanding
- People reference each other by name in conversation
- Not every tweet is a banger. 70% mid is realistic.
### Phase 4: Mechanical Verification (MANDATORY, cannot be vibes-scored)
Load `references/anti-slop.md` and `references/adversarial-refinement.md`.
Quantitative checks run BEFORE any subjective scoring:
1. Emoji frequency vs real data (count, compare, strip fabricated)
2. Slop word scan (Tier 1 kill, Tier 2 cluster ≥3, Tier 3 filler delete)
3. Sentence length vs real avg (fail if >40% deviation)
4. Capitalization pattern match (fail if >20% mismatch)
5. Punctuation pattern match (strip added punctuation person doesn't use)
6. Reply/original ratio (reply-heavy person should mostly reply)
7. Rhetorical polish scan:
- Parallel antithesis ("The most X... The most Y...") → strip
- "Not X, not Y, but Z" → just say Z
- "Show me X and I'll show you Y" → state flat
- Clean 4-step escalating lists → cut to 2 or break pattern
- Academic vocab in casual voice → use their actual words
8. Banger check: if every utterance is screenshot-worthy, FAIL. Add mid.
9. Learned rules from `references/recursive-self-improvement.md`
Fix ALL failures. Re-verify. Only then proceed.
### Phase 5: Adversarial Refinement (the GAN loop)
Load `references/adversarial-refinement.md`.
1-3 rounds: score each utterance against 3-5 real posts from the person.
Critique → regenerate flagged utterances → re-score.
Stop when all above 7/10 or after 3 rounds.
At fidelity 70+: also run held-out prediction test.
At fidelity 90+: also run historical replay if real conversations exist.
### Phase 6: Output
Print simulation in platform-native format. Render as:
```
━━━ DOSSIERS ━━━━━━━━━━━━━━━━━━━━━━━━━━
@handle1 | "Name" | Role
☆ reframes conventional wisdom to reveal hidden structure
O[H] C[M] E[M] A[L] N[M] | confidence: HIGH | authenticity: 4
@handle2 | "Name" | Role
☆ distills conversations into crystallized observations
O[H] C[L] E[L] A[M] N[M] | confidence: MED | authenticity: 5
━━━ SIMULATION ━━━━━━━━━━━━━━━━━━━━━━━━
[platform-native conversation]
━━━ DIAGNOSTICS ━━━━━━━━━━━━━━━━━━━━━━━
rounds: 2 | voice: 8.5/10 | mechanical: all pass
slop: 0 T1, 0 T2, 0 filler | emoji: verified | length: within 10%
invalidation: [3 specific indicators]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
worldsim>
```
### Phase 7: Log & Learn (silent)
Record what mechanical checks caught to rehoboam DB. Promote patterns
appearing 3+ times to permanent rules. User doesn't see this unless
they run `worldsim> audit`.
## Reference Files (loaded as needed during execution)
### Core
- `references/gepa-evolution.md` — Automated self-improvement via DSPy + GEPA. Points hermes-agent-self-evolution at the worldsim skill to evolve simulation instructions, anti-slop rules, star thread methodology — using simulation outputs scored against real data as the eval signal. The endgame: the skill rewrites itself through use.
- `references/star-thread.md` — The compression key. One sentence per person.
- `references/anti-slop.md` — Mechanical slop detection. Kill words, filler, rhetorical polish.
- `references/adversarial-refinement.md` — GAN loop. Mechanical verification + discriminator.
- `references/recursive-self-improvement.md` — Learned rules from past runs. Grows every simulation.
### Knowledge
- `references/knowledge-archive.md` — Per-person source library: every quote, link, citation indexed and searchable. Semantic retrieval for context-aware grounding. Expert synthesis across all archived people. Anti-overfitting: retrieve what's relevant, not everything.
### Research
- `references/verified-access-methods.md` — Complete platform map. 25+ platforms tested.
- `references/search-strategies.md` — Query patterns, aggregator sites, cross-platform discovery.
- `references/osint-pipeline.md` — Instagram, reverse image, LinkedIn workarounds, podcasts.
### Analysis
- `references/deep-psychometrics.md` — Big Five + Moral Foundations + Values + Cognitive Style.
- `references/mass-behavior.md` — Community detection, influence networks, echo chambers.
- `references/analytical-tradecraft.md` — ACH, key assumptions, deception detection, source reliability.
- `references/prediction-engine.md` — Superforecasting, base rates, confidence calibration.
### Generation
- `references/simulation-engine.md` — Platform formats, conversation dynamics, DM formats.
- `references/theoretical-foundations.md` — Academic papers, accuracy benchmarks, key numbers.
### Operational
- `templates/dossier.md` — Structured profile template.
- `scripts/x_api.py` — X/Twitter API v2 client with retry/backoff.
- `scripts/research.py` — Automated OSINT pipeline.
- `scripts/tiktok_api.py` — TikTok HTML + oEmbed + tikwm scraping.
- `scripts/facebook_api.py` — Facebook Googlebot + Page Plugin.
- `scripts/threads_api.py` — Threads OG tag + WebFinger extraction.
@@ -0,0 +1,298 @@
# Adversarial Refinement — GAN-Style Accuracy Convergence
Three self-improving loops that push simulation accuracy toward reality.
This is what separates "creative roleplay" from "predictive simulation."
## Philosophy
A GAN has a generator and a discriminator locked in a game.
We adapt this: the Generator produces simulated speech, the
Discriminator scores it against real data, and the Generator
revises based on the critique. Multiple rounds = convergence.
The key insight: we have REAL DATA from the targets. Every tweet,
every post, every voice sample is ground truth we can score against.
Most simulators throw away this advantage by generating in one shot.
## Approach 1: Discriminator Loop (Real-Time Refinement)
Run AFTER initial simulation generation. 2-3 rounds.
### Round Flow
```
GENERATE → DISCRIMINATE → CRITIQUE → REGENERATE → DISCRIMINATE → ...
```
### Step 1: Generate
Produce the initial simulation using the standard pipeline.
### Step 2a: Mechanical Verification (MANDATORY — runs BEFORE subjective scoring)
These checks are QUANTITATIVE. They compare numbers from real data to numbers
from simulated output. They cannot be hand-waved. Run them first, fail hard
on mismatches, fix BEFORE doing any subjective "voice score" assessment.
The generator and discriminator share the same brain (the LLM). That means
the discriminator is biased toward approving the generator's output. Mechanical
checks are the circuit breaker that prevents collapse.
**EMOJI FREQUENCY CHECK**
```
1. Count emoji in last 30 real tweets → emoji_rate = tweets_with_emoji / total
2. Count emoji in simulated utterances for this person
3. If simulated emoji rate > real emoji rate + 10%: FAIL. Remove emoji.
4. Check WHICH emoji they use. If simulated uses emoji not in their real set: FAIL.
5. Check WHERE they use emoji: originals vs replies vs both?
Bio emoji ≠ tweet emoji. Many people have emoji in bio, zero in posts.
```
**SENTENCE LENGTH CHECK**
```
1. Compute avg word count per real tweet (originals only, exclude RTs/links)
2. Compute avg word count per simulated utterance for this person
3. If simulated avg differs by >40% from real avg: FAIL. Adjust length.
(e.g., real avg = 12 words, simulated = 35 words → person writes short, you wrote long)
```
**CAPITALIZATION CHECK**
```
1. Count % of real tweets starting with lowercase letter
2. Count % of simulated utterances starting with lowercase
3. If mismatch >20%: FAIL. Fix capitalization.
(Most TPOT people are lowercase-first. Instruct models default to uppercase.)
```
**PUNCTUATION PATTERN CHECK**
```
1. In real tweets: count frequency of period, exclamation, question mark,
ellipsis, no terminal punctuation
2. Compare to simulated. Key tells:
- Do they end tweets with periods? (many people don't)
- Do they use "!!" or "!!!"? (some do, most don't)
- Do they trail off with "..."?
3. If simulated adds punctuation the person doesn't use: FAIL.
```
**REPLY/ORIGINAL RATIO CHECK**
```
1. From their real tweet data: what % are replies vs originals?
2. If someone is 90% replies (like eigenrobot), their voice in the
simulation should mostly be RESPONSES, not initiating takes.
3. If a reply-heavy person is simulated as a take-launcher: FAIL.
```
**VOCABULARY SPOT CHECK**
```
1. From simulated text, extract 3 distinctive words/phrases
2. Search: do these words/phrases appear in their real tweets?
3. If you're putting words in their mouth they've never used: FLAG.
(Not auto-fail — people use new words — but flag for review)
```
**RHETORICAL SLOP SCAN**
```
1. Scan for parallel antithesis: "The most X... The most Y..."
"It's not about X. It's about Y." → FAIL if found. Keep only the punchline half.
2. Scan for "Not X, not Y, but Z" / "Not just X, but Y" → FAIL. Just say Z.
3. Scan for "Show me X and I'll show you Y" → FAIL. State it flat.
4. Count escalating list steps (first A, then B, then C, now D).
If 4+ clean steps: FAIL. Cut to 2 or break the pattern.
5. Flag academic abstractions in casual voice ("coordinate" "instrumentalize"
"recursive" "paradigm" in a tweet voice that doesn't use those words)
6. THE BANGER CHECK: read all utterances for one person sequentially.
If every single one could be screenshot'd as a standalone banger: FAIL.
Real feeds are 70% mid. Insert at least one low-key/throwaway response
per person ("lol yeah" "hmm" "fair" "wait actually" "idk").
```
Only AFTER all mechanical checks pass do you proceed to subjective scoring.
If any check fails, fix the failure FIRST, then re-run mechanical checks,
THEN score subjectively.
### Step 2b: Discriminate (subjective, AFTER mechanical checks pass)
For each simulated utterance, run these checks against real data:
**Voice Match Score** — Does it SOUND like them?
- Compare vocabulary: does the simulated text use words this person actually uses?
- Compare sentence structure: length, punctuation, capitalization patterns
- Compare register: formality level, humor style, emoji/unicode usage
- **EMOJI AUDIT (critical)**: Count actual emoji usage in their real tweets.
Most people use emoji FAR less than instruct models assume. A "warm" person
≠ emoji user. Check: what % of their real tweets contain emoji? Which specific
emoji do they use? Are they in originals or only replies? Bio emoji ≠ tweet emoji.
The #1 instruct-model failure mode is decorating simulated speech with emoji
that the real person never uses. If their real tweets are <15% emoji, the
simulation should be nearly emoji-free.
- Method: Show the discriminator 5 REAL posts and the simulated post.
Ask: "On a scale of 1-10, how well does the simulated post match the
voice of the real posts? What specific elements are wrong?"
**Position Match Score** — Does it say what they'd ACTUALLY say?
- Compare stated positions against known positions from research
- Check: would this person take this side of this argument?
- Check: would they frame it this way? (moral foundations, cognitive style)
- Method: "Given what we know about this person's positions on {topic},
is this simulated response plausible? What would they actually say differently?"
**Interaction Match Score** — Does the conversation FLOW realistically?
- Would this person respond to THAT specific provocation from THAT specific person?
- Is the social dynamic right? (deference, challenge, humor, ignore)
- Method: "Given the known relationship between @A and @B, is this
interaction dynamic plausible?"
### Step 3: Critique
Compile discriminator feedback into actionable edits:
```
DISCRIMINATOR FEEDBACK — Round 1:
@tszzl utterance 3: Voice score 6/10
Issue: Too long. Roon posts in fragments, not paragraphs.
Fix: Break into 2-3 shorter tweets. Remove conjunctions.
@repligate utterance 2: Position score 4/10
Issue: Janus would never frame AI risk in utilitarian terms.
They use phenomenological/consciousness-first framing.
Fix: Reframe through the lens of simulacra theory.
```
### Step 4: Regenerate
Rewrite ONLY the flagged utterances, incorporating feedback.
Keep utterances that scored 8+ unchanged.
### Step 5: Re-Discriminate
Score again. If all utterances hit 7+, stop. If not, one more round.
Hard cap at 3 rounds to prevent infinite loops.
### Implementation
```
For each simulated utterance:
1. Pull 5 real posts from the person (random sample from voice data)
2. Present real posts + simulated post to the LLM-as-discriminator
3. Ask for: voice score (1-10), specific mismatches, suggested edits
4. If score < 7, regenerate with the critique as context
5. Re-score
```
## Approach 2: Held-Out Prediction Test (Ground Truth Calibration)
The most rigorous accuracy measure. Run BEFORE simulation to calibrate
the model, or AFTER to validate.
### Method
1. Pull N recent original tweets from each target
2. Split: older half = "context" (voice training), newer half = "ground truth"
3. Give the simulator ONLY the context tweets
4. Ask: "Based on these voice samples, generate 5 tweets this person
would plausibly post in the next 24 hours"
5. Compare generated tweets to the held-out ground truth
6. Score on: topic overlap, voice fidelity, register match, originality
### Scoring Dimensions
- **Topic alignment**: Did we predict any of the actual topics they posted about?
(Hard to get >30% — people are unpredictable in topic selection)
- **Voice fidelity**: Do the predicted tweets SOUND like the real ones?
(Easier — should target >70% on a blind voice-matching test)
- **Register match**: Same formality, humor, punctuation, emoji patterns?
(Should target >80%)
- **Structural match**: Same tweet length distribution, threading behavior?
(Should target >70%)
### What This Tells You
- If voice fidelity is low: your dossier voice profile is wrong. Re-research.
- If topics don't overlap: that's EXPECTED. Content is unpredictable.
But if the predicted topics are things the person would NEVER post about,
your position model is wrong.
- If register doesn't match: your linguistic analysis missed something.
Go back to the raw tweets and look for patterns you overlooked.
### Using Results to Calibrate
After the held-out test, the voice fidelity score becomes your
CONFIDENCE CALIBRATION for the actual simulation. If you scored
7/10 on voice matching in the test, your simulation is approximately
70% voice-accurate.
## Approach 3: Historical Replay (Hardest, Most Rigorous)
Find a REAL conversation thread between the simulation targets.
Simulate it blind. Diff against reality.
### Method
1. Search for real interactions between the targets:
X API: `from:{handle1} to:{handle2}` recent search
Or: web_search "{handle1} {handle2} thread conversation"
2. Find a substantive conversation (not just "lol" replies)
3. Extract the TOPIC and FIRST POST of the real conversation
4. Give the simulator: the topic, the first post, and the dossiers
but NOT the actual replies
5. Simulate how the conversation would go
6. Compare simulated replies to actual replies
7. Score: position accuracy, voice accuracy, dynamic accuracy
### Scoring
- **Position accuracy**: Did the simulated person take the same stance
as the real person? (Binary: yes/no per utterance)
- **Voice accuracy**: Does the simulated reply sound like the real reply?
(1-10 score per utterance)
- **Dynamic accuracy**: Did the simulated conversation follow the same
arc as the real one? (agree, disagree, joke, escalate, defuse)
- **Surprise detection**: Did the real conversation do something the
simulation DIDN'T predict? (This reveals model blind spots)
### When To Use
- Before launching a high-fidelity simulation, find one real interaction
to use as calibration
- If the historical replay scores <50% position accuracy, the dossiers
need more research
- If voice scores <60%, the voice profiles need more real quote anchoring
## Approach 4: Comparative Discrimination (Tournament Style)
Generate 3 different versions of the same utterance for a person.
Mix in 2 REAL posts from them. Ask: "Which of these 5 posts are real?"
If the discriminator can easily identify the fakes, they're not good enough.
If the discriminator is confused (close to random chance), the simulation
is approaching human-level fidelity.
### Method
1. Generate 3 simulated tweets for @person on a given topic
2. Pull 2 real tweets from @person on a similar topic
3. Shuffle all 5
4. Ask: "These are 5 posts attributed to @person. 2 are real, 3 are
simulated. Which 2 are real? Explain your reasoning."
5. Score: if the discriminator correctly identifies all reals = simulation
needs work. If it misidentifies any = simulation is convincing.
### Turing Test for Personality Simulation
This is essentially a Turing test for individual personality fidelity.
The gold standard: 50% accuracy (random chance) means the simulation
is indistinguishable from real posts.
## Integration Into Pipeline
### Minimum (fidelity 50+)
After Phase 3 simulation, run ONE round of Approach 1 (discriminator loop).
Score each utterance against 3 real posts. Regenerate anything below 6/10.
### Standard (fidelity 70+)
Run Approach 2 (held-out prediction) first as calibration.
Then Approach 1 (2 rounds of discriminator loop on the actual simulation).
### Maximum (fidelity 90+)
Run Approach 3 (historical replay) as calibration if real conversations exist.
Run Approach 2 (held-out prediction) for voice calibration.
Run Approach 1 (3 rounds of discriminator loop).
Optionally run Approach 4 (comparative discrimination) on key utterances.
## Key Principles
1. **Real data is the reward signal.** Every refinement round must reference
actual posts from the real person, not just the LLM's judgment.
2. **Voice is easier to match than content.** Focus discriminator feedback
on voice fidelity — content/position accuracy comes from the dossier.
3. **Diminishing returns after 3 rounds.** The LLM starts overfitting to
its own critique. Stop at 3 rounds max.
4. **Separate scores for separate dimensions.** Don't collapse voice +
position + dynamics into one number. Keep them distinct so you know
WHERE the simulation is weak.
5. **Document the scores.** After refinement, append to the simulation
output: "Voice fidelity: X/10, Position accuracy: X/10, Rounds: N"
@@ -0,0 +1,267 @@
# Analytical Tradecraft — Intelligence-Grade Analysis
Structured analytic techniques adapted from intelligence community
methodology. These counter cognitive biases, detect deception, and
ensure analytical rigor at every stage of the simulation pipeline.
## Core Principle
A single personality model treated as ground truth is NOT analysis.
Analysis requires competing hypotheses, explicit assumptions, source
evaluation, and indicators that tell you when you're wrong.
## 1. Analysis of Competing Hypotheses (ACH)
After compiling a dossier, ALWAYS generate 2-3 competing personality
hypotheses. Score each against the evidence.
### Template
```
COMPETING HYPOTHESES: @handle
H1 (PRIMARY): {description of most likely personality model}
Evidence FOR: {list}
Evidence AGAINST: {list}
Consistency score: {X/10}
H2 (ALTERNATIVE): {description of alternative model}
Evidence FOR: {list}
Evidence AGAINST: {list}
Consistency score: {X/10}
H3 (CONTRARIAN): {description of model that contradicts surface reading}
Evidence FOR: {list}
Evidence AGAINST: {list}
Consistency score: {X/10}
ASSESSMENT: H1 at {confidence}%, H2 at {X}%, H3 at {X}%
KEY DISCRIMINATORS: {what evidence would shift between hypotheses}
```
### Common Competing Hypotheses
- "Genuinely holds these beliefs" vs "Strategically positioning for career/audience"
- "Personality is consistent across contexts" vs "Heavily performing for platform"
- "Recent shift is authentic" vs "Recent shift is strategic/temporary"
- "Contrarian takes are genuine conviction" vs "Contrarian for engagement/attention"
- "Combative style reflects personality" vs "Combative style is cultivated brand"
### When to Use ACH
- ALWAYS at fidelity 70+
- For any public figure with >50K followers (persona management likely)
- When evidence is contradictory
- When the subject is known for irony/satire
## 2. Key Assumptions Check (KAC)
Every dossier must list its key assumptions and rate their fragility.
### Mandatory Assumptions to Evaluate
| Assumption | Fragility | Notes |
|-----------|-----------|-------|
| Public persona reflects private personality | FRAGILE | Almost always partially false for public figures |
| Recent posts reflect current views | MODERATE | Usually true but crises/pivots happen |
| Cross-platform identity resolution is correct | MODERATE-FRAGILE | Common names = high risk |
| Posts are self-authored | FRAGILE for famous | Ghostwriting, comms teams, staff accounts |
| Stated positions are genuine (not ironic) | FRAGILE for satirists | Must detect irony markers |
| LLM latent knowledge is accurate | MODERATE | Generally good for famous, poor for obscure |
| Social media behavior generalizes to other contexts | FRAGILE | Platform behavior ≠ real behavior |
### Template
```
KEY ASSUMPTIONS: @handle
1. {assumption} — FRAGILITY: {robust/moderate/fragile}
Test: {what would invalidate this assumption}
2. ...
```
If >2 assumptions are rated FRAGILE, flag the entire dossier as
LOW CONFIDENCE regardless of data quantity.
## 3. Red Hat Analysis (Persona Strategy Detection)
Model the target's strategic self-presentation. Ask:
- **What image are they cultivating?** (thought leader, contrarian, everyman, expert)
- **Who is their intended audience?** (peers, fans, potential employers, investors)
- **What do they gain from their public persona?** (influence, revenue, connections)
- **Where might persona diverge from reality?** (every public figure has gaps)
- **Do they have a comms team / ghostwriter?** (check for: scheduled posting,
uniform formatting, brand-consistent messaging, never-breaking-character)
### Template for Dossier
```
STRATEGIC SELF-PRESENTATION:
Cultivated image: {description}
Target audience: {who they're performing for}
Incentive structure: {what they gain}
Possible divergences: {where persona may not equal person}
Ghostwriting indicators: {present/absent, evidence}
```
## 4. Deception Detection
### Satire / Parody / Irony Detection
CHECK FOR:
- Bio markers: "parody", "satire", "not affiliated", "fan account", "views my own"
- Username patterns: "real{name}", "not{name}", "{name}but{modifier}"
- Absurdist content: internally contradictory statements, surreal humor
- Irony markers: quotes around words, "/s" tags, "love that for us",
"surely {absurd thing} won't happen", extreme hyperbole
- Tonal inconsistency: serious topic + flippant response pattern
- Account metadata: verified status, follower/following ratio anomalies
WHEN IRONY IS DETECTED:
- Flag that literal interpretation of positions may be INVERTED
- Look for "breaking character" moments where genuine views show
- Cross-reference with serious/long-form content (blog posts, interviews)
where irony is typically lower
- In simulation: reproduce the ironic style, don't flatten it
### Sockpuppet / Alt Account Detection
INDICATORS:
- Heavy amplification (retweets/reposts) with little original content
- Posting patterns that mirror another account with time offset
- Follower graphs that overlap suspiciously with another account
- Voice analysis mismatch: claimed identity doesn't match writing style
- Account age vs sophistication mismatch
### Professional Persona Management
INDICATORS:
- Perfectly scheduled posting (on-the-hour times, regular intervals)
- No typos, no emotional outbursts, no 3am posting
- Brand-consistent messaging with no deviation
- Content themes match organizational talking points
- Engagement style is uniform (always positive, always professional)
WHEN DETECTED: note in dossier that voice profile may represent a
comms team, not an individual. Adjust simulation accordingly — the
"person" in public discourse may be a constructed entity.
### Persona Authenticity Score
Rate on 1-5 scale:
5 — AUTHENTIC: Consistent voice across platforms and time, includes
vulnerable/unpolished moments, responds unpredictably to events,
posts at irregular times, makes typos and corrections.
4 — MOSTLY AUTHENTIC: Generally consistent but some signs of curation.
Occasional tone shifts that suggest awareness of audience.
3 — CURATED: Clear awareness of personal brand. Strategic topic selection.
Some genuine moments but overall managed presentation.
2 — HEAVILY MANAGED: Strong indicators of professional management.
Few if any unguarded moments. Uniform style and messaging.
1 — CONSTRUCTED: Likely ghostwritten or team-operated. Persona may not
represent any single individual's actual personality.
## 5. Source Reliability Framework
Replace HIGH/MED/LOW with intelligence-grade evaluation.
### Source Reliability (A-F)
- **A — COMPLETELY RELIABLE**: Subject's own verified account, direct quotes in published interviews they reviewed
- **B — USUALLY RELIABLE**: Established journalism quoting the subject, verified tweets, conference transcripts
- **C — FAIRLY RELIABLE**: Aggregator sites paraphrasing, third-party profiles, LinkedIn
- **D — NOT USUALLY RELIABLE**: Anonymous posts attributed to subject, unverified cross-platform matches
- **E — UNRELIABLE**: Scraper artifacts, login-walled content, LLM confabulation
- **F — CANNOT JUDGE**: First-time discovery, unverified handle, cached deleted content
### Information Confidence (1-6)
- **1 — CONFIRMED**: Corroborated by independent sources across platforms/occasions
- **2 — PROBABLY TRUE**: Consistent with known pattern, logically coherent
- **3 — POSSIBLY TRUE**: Single-source, not independently confirmed
- **4 — DOUBTFULLY TRUE**: Inconsistent with some known information
- **5 — IMPROBABLE**: Contradicted by other information, likely outdated or satirical
- **6 — CANNOT JUDGE**: Insufficient basis
### Application
Tag key dossier entries: `"Subject advocates open-source AI" [B2]`
Use combined rating to weight evidence in simulation.
## 6. Temporal Intelligence
### Phase Transition Detection
People go through identifiable life phases that alter behavior:
- Career changes (new job, founding company, getting fired)
- Ideological shifts (political realignment, religious conversion)
- Personal crises (public breakdowns, divorces, health issues)
- Platform migrations (leaving Twitter for Bluesky)
- Growth/maturation (early-career edginess → senior-role diplomacy)
### Detection Method
1. **Timeline construction**: Plot key events and posting pattern changes
2. **Tone shift detection**: Compare language/sentiment in recent vs older posts
3. **Topic shift detection**: What they talked about 2 years ago vs now
4. **Network shift detection**: Who they interact with now vs before
5. **Self-reference detection**: "I used to think..." "I've changed my mind about..."
### Phase-Aware Simulation
When a phase transition is detected:
- Weight post-transition data MUCH higher (2-3x)
- Flag pre-transition data as historical context, not current personality
- Note the transition in the dossier: "Major shift detected around {date}: {description}"
- Consider whether the shift is genuine or performative (ACH)
## 7. Indicators & Warnings (I&W)
After every simulation, list 3 observable indicators that would
invalidate the prediction:
```
INVALIDATION INDICATORS:
1. If @handle {does X instead of Y}, our {trait} estimate is wrong
2. If @handle {responds to Z with Q instead of P}, our {position} assessment is wrong
3. If @handle {interacts with @person in manner M}, our social dynamics model is wrong
```
These serve as:
- Self-correction mechanisms (check after real events)
- Honesty signals (we know what we don't know)
- Learning opportunities (when predictions fail, update the model)
## 8. Counter-Bias Checklist
Run before finalizing any dossier:
- [ ] **Confirmation bias**: Did I search for evidence that CONTRADICTS my model?
- [ ] **Anchoring**: Am I over-weighted on the first information I found?
- [ ] **Availability bias**: Am I over-weighted on viral/memorable moments?
- [ ] **Mirror imaging**: Am I assuming the subject thinks like me?
- [ ] **Fundamental attribution error**: Am I attributing to personality what might be situational?
- [ ] **Recency bias**: Am I ignoring valid older evidence?
- [ ] **Halo effect**: Is one strong trait coloring my assessment of other traits?
- [ ] **Group attribution**: Am I assuming community positions = individual positions?
If any box is checked "yes" or "maybe", revisit that section of the dossier.
## Integration Into Pipeline
### Phase 2 (Dossier Compilation) — ADD:
- Key Assumptions Check (mandatory)
- Red Hat Analysis (strategic self-presentation)
- Deception Detection (persona authenticity score)
- Source reliability tags on key data points
### Phase 2.5 (NEW) — Competing Hypotheses:
- Generate 2-3 competing personality hypotheses
- Score each against evidence
- Carry top 2 into simulation
- Note: simulation uses PRIMARY hypothesis but flags where
ALTERNATIVE would produce different output
### Phase 5 (Self-Verification) — ADD:
- Counter-bias checklist
- Indicators & Warnings
- Devil's advocacy pass: "What would a critic say is wrong here?"
@@ -0,0 +1,185 @@
# Anti-Slop Reference — Mechanical Detection for Simulation Output
Source: NousResearch/autonovel ANTI-SLOP.md + slop-forensics + EQ-Bench Slop Score
Adapted for personality simulation: slop in simulated speech is a dead giveaway that
the output is LLM-generated, not human-generated. EVERY simulated utterance must pass
this filter or the simulation fails the "indistinguishable from real" standard.
## Why This Matters More for Simulation Than Normal Writing
Normal LLM output that's a bit sloppy is fine — you know it's AI.
Simulated speech that contains slop BREAKS THE ILLUSION. If @eigenrobot's
simulated tweet contains "delve" or "it's worth noting," anyone who follows
him would instantly know it's fake. Slop detection is the minimum viable
authenticity check.
## Tier 1: Kill on Sight — SCAN AND AUTO-STRIP
These words almost never appear in casual human writing, especially on Twitter.
If ANY appear in simulated tweets/posts, the simulation has failed.
REGEX SCAN LIST (case-insensitive):
```
delve|utilize|leverage\b.*\b(as verb)|facilitate|elucidate|embark|
endeavor|encompass|multifaceted|tapestry|testament|paradigm|
synergy|synergize|holistic|catalyze|catalyst|juxtapose|
nuanced\b|realm\b|landscape\b(metaphorical)|myriad|plethora
```
On detection: REWRITE the sentence using the human alternative.
Do not just swap the word — the sentence structure around slop words
is usually sloppy too.
## Tier 2: Suspicious in Clusters — COUNT PER PERSON
These are fine alone. Three in one person's simulated output = rewrite.
```
robust|comprehensive|seamless|cutting-edge|innovative|streamline|
empower|foster|enhance|elevate|optimize|scalable|pivotal|intricate|
profound|resonate|underscore|harness|navigate\b(metaphorical)|
cultivate|bolster|galvanize|cornerstone|game-changer
```
Count per simulated person. If count >= 3: flag and rewrite.
## Tier 3: Filler Phrases — DELETE ALL
These add zero information. No human tweets these.
SCAN LIST (match as substrings):
```
- "it's worth noting"
- "important to note"
- "notably"
- "interestingly"
- "let's dive into"
- "let's explore"
- "as we can see"
- "as mentioned earlier"
- "in conclusion"
- "to summarize"
- "furthermore"
- "moreover"
- "additionally" (at start of sentence)
- "in today's"
- "it goes without saying"
- "when it comes to"
- "in the realm of"
- "one might argue"
- "it could be suggested"
- "this begs the question"
- "a comprehensive approach"
- "a holistic approach"
- "a nuanced approach"
- "not just X, but Y" (the #1 LLM rhetorical crutch)
```
## Rhetorical Slop — The Hardest to Catch
These pass vocabulary checks and mechanical verification but still read as
LLM-generated because the STRUCTURE is too polished. This is the deepest
layer of slop — the instruct model's training to produce "satisfying" output.
### Parallel Antithesis
"The most X are... The most Y are..."
"It's not about X. It's about Y."
Every simulated tweet that contains a balanced two-part rhetorical structure
should be checked: would this person actually construct that parallelism,
or would they just say the second half and trust you to get it?
FIX: delete the setup. Keep only the punchline half.
### "Not X, Not Y, But Z" / "Not Just X, But Y"
The #1 LLM rhetorical crutch. Appears in almost every simulation.
FIX: just say Z. Delete the negations.
### "Show Me X and I'll Show You Y"
Rhetorical formula that reads like a book blurb or TED talk.
No one tweets like this unless they're deliberately performing rhetoric.
FIX: state it flat. "Every community that works has a shared enemy" not
"Show me a thriving community and I'll show you..."
### Clean Escalating Lists
"First it was A, then B, then C, now D" — four perfectly escalating steps.
Real people do 2 steps and trail off, or skip to the end, or lose the thread.
FIX: cut to 2 steps max. Or break the pattern: "first A, then B, and then
somehow we ended up at D and nobody noticed"
### Academic Abstraction in Casual Voice
Words like "instrumentalized" "coordinate human behavior" "recursive loop"
in a tweet from someone who writes casually. The vocabulary is from papers,
not from posting.
FIX: use the word they'd actually reach for. "coordinate human behavior" →
"get people to do stuff." If the plain version sounds dumb, maybe the take
itself is thinner than the fancy words made it seem.
### The "Every Tweet Is A Banger" Problem
The deepest slop: every simulated utterance is GOOD. Considered. Structured.
Satisfying. Real twitter feeds are 70% mid, 20% boring, 10% brilliant.
The simulation should include:
- Half-finished thoughts ("idk if this makes sense but")
- Trailing off ("wait actually nvm")
- Boring logistical tweets ("anyone know a good dentist in brooklyn")
- Self-interruptions ("ok this is getting long")
- Acknowledgments that add nothing ("lol yeah" "hmm" "fair")
If every tweet in the simulation could be screenshot'd as a banger,
the simulation is too polished to be real.
## Structural Slop Patterns — CHECK IN SIMULATION OUTPUT
### Pattern: Identical Sentence Structure Across Speakers
If two or more simulated people use the same sentence structure
(e.g., "The thing about X is Y"), the simulation has failed voice
differentiation. Real people have different syntactic habits.
### Pattern: Topic Sentence Machine
If a simulated post follows: topic sentence → elaboration → example → wrap-up,
it's LLM structure, not human. Real tweets are: punchline first, or tangent,
or one-liner, or trailing thought.
### Pattern: Symmetry Addiction
If the conversation has neat equal turns, balanced perspectives, everyone
getting the same number of posts — that's not real. Real conversations
are asymmetric. Someone dominates. Someone lurks. Someone gets interrupted.
### Pattern: The Hedge Parade
"This approach may potentially help improve..." — no human tweets like this.
Either commit to the statement or don't make it.
### Pattern: Em Dash Overload
Count em dashes (—) per person. If >2 per post on average, flag it.
Most people use them sparingly or not at all.
### Pattern: Sycophantic Agreement Flow
If the conversation flows: A says thing → B says "great point, and also..." →
C says "building on that..." — that's instruct-model conversation, not human.
Real conversations have: disagreement, misunderstanding, tangents, ignoring,
one-upping, and sometimes just "lol."
### Pattern: Uniform Register
If all simulated people sound like they're writing at the same education level
with the same formality — the simulation failed. Real people have wildly different
registers. A shitposter and an academic should sound nothing alike.
## Integration: Mechanical Slop Scan
Run BEFORE subjective discriminator scoring, alongside emoji/length/caps checks.
```
For each simulated utterance:
1. Scan for Tier 1 words → auto-rewrite if found
2. Count Tier 2 words per person → flag if >= 3
3. Scan for Tier 3 filler phrases → auto-delete
4. Check for structural patterns:
- Same sentence structure across speakers?
- Topic-sentence-machine structure?
- Symmetric turn-taking?
- Hedge parade?
- Em dash count?
- Sycophantic flow?
5. If ANY Tier 1 found or ANY structural pattern detected:
FAIL the utterance and regenerate
```
This scan is MECHANICAL. It cannot be vibes-scored. The words are either
there or they're not. Run it every time, no exceptions.
@@ -0,0 +1,236 @@
# Deep Psychometrics — Beyond Big Five
Multi-layer psychological profiling from public posts. Each layer adds
a dimension to the personality model, making simulations more nuanced
and predictions more accurate.
## The Profiling Stack
| Layer | What It Measures | Tool/Method | Accuracy | Min Posts |
|-------|-----------------|-------------|----------|-----------|
| Big Five (OCEAN) | Core personality traits | RoBERTa embeddings + BiLSTM | AUROC 0.78-0.82 | 30-50 |
| Moral Foundations | Ethical intuitions | eMFDscore (pip) | Validated dictionary | 20+ |
| Schwartz Values | Core value priorities | DeBERTa on ValueEval | F1 0.56 (macro) | 20+ |
| Cognitive Style | Thinking patterns | AutoIC + LIWC features | r=0.70-0.82 doc-level | 20+ |
| Narrative Framing | How they frame issues | GPT-4 few-shot | F1 ~70% | 10+ |
| Behavioral Metadata | Non-text patterns | Feature extraction | r=0.29-0.40 per trait | 20+ |
## Layer 1: Big Five Personality (Foundation)
### Accuracy Bounds (peer-reviewed)
- AUROC 0.78-0.82 with RoBERTa embeddings + BiLSTM (JMIR 2025)
- Per-trait binary accuracy: O=0.637, C=0.602, E=0.620, A=0.590, N=0.620
- Meta-analytic correlations (Azucar 2018, 16 studies):
Extraversion r=0.40, Openness r=0.39, Conscientiousness r=0.35,
Neuroticism r=0.33, Agreeableness r=0.29
- These hit the "personality coefficient" ceiling of r=0.30-0.40 —
digital footprints are as predictive as any behavioral measure
### What Actually Works
- Fine-tuned embeddings >> zero-shot LLMs. GPT-4o zero-shot is UNRELIABLE.
- RoBERTa embeddings are free and nearly as good as OpenAI embeddings
- Aggregation across posts is essential — single posts are noise
- 30-50 posts of ~90 words each = practical minimum
- Training data: PANDORA Reddit corpus (1568 users, ~935K posts)
### For The Simulator (without running models)
Since we can't fine-tune per-simulation, use LLM-as-rater with caveats:
- Provide 10-20 actual posts as evidence
- Ask for trait estimation with reasoning, not just scores
- Anchor with the adjective-based method (see prediction-engine.md)
- Frame estimates as ranges, not points: "Openness: HIGH (0.7-0.9)"
- Known bias: LLMs overestimate agreeableness and underestimate neuroticism
### Key Insight: LLMs Already Know Public Figures
Nature Scientific Reports 2024: GPT-3's semantic space already encodes
perceived personality of public figures from their names alone. For
famous people, the LLM's latent knowledge is a STARTING POINT that
OSINT data confirms or corrects.
## Layer 2: Moral Foundations (Ethical Compass)
Jonathan Haidt's Moral Foundations Theory. Six foundations:
| Foundation | Liberal emphasis | Conservative emphasis |
|-----------|-----------------|---------------------|
| Care/Harm | ★★★ HIGH | ★★ MODERATE |
| Fairness/Cheating | ★★★ HIGH | ★★ MODERATE |
| Loyalty/Betrayal | ★ LOW | ★★★ HIGH |
| Authority/Subversion | ★ LOW | ★★★ HIGH |
| Sanctity/Degradation | ★ LOW | ★★★ HIGH |
| Liberty/Oppression | ★★ MODERATE | ★★ MODERATE |
### Tool: eMFDscore
```
pip install emfdscore
# GitHub: github.com/medianeuroscience/emfdscore
# Built on spaCy, GPL-3.0
```
Output per post: scores for each foundation (virtue + vice dimensions)
Aggregate across 20+ posts → 10-dimensional moral profile
### Application to Simulation
Moral foundations predict:
- What topics trigger emotional responses
- What arguments they find persuasive vs repulsive
- How they frame political/social issues
- Who they instinctively ally with vs oppose
- What kind of content they share/amplify
Example: High Loyalty/Authority person will defend their tribe even when
wrong. High Care/Fairness person will break from their tribe on justice
issues. This shapes conversation dynamics.
### For The Simulator (without running eMFDscore)
Infer moral foundations from:
- Political positions and framing in their posts
- What they get angry about vs what they celebrate
- Who they defend and who they attack
- Key moral vocabulary: "protect", "fair", "loyal", "respect", "pure", "free"
## Layer 3: Schwartz Values (Core Motivations)
19 values in circular continuum (adjacent values are compatible,
opposite values are in tension):
**Self-Transcendence** ↔ **Self-Enhancement**
- Universalism, Benevolence ↔ Power, Achievement
**Openness to Change** ↔ **Conservation**
- Self-Direction, Stimulation, Hedonism ↔ Tradition, Conformity, Security
### SemEval-2023 Task 4 Results
- Best macro-F1: 0.56 (ensemble of 12 DeBERTa/RoBERTa models)
- Most reliable: universalism (nature), security, power
- Least reliable: stimulation, hedonism, humility
- Dataset: 9,324 annotated arguments, available via Touché
### Key Finding: Value Perception Is Subjective
Epstein et al. (2026): human inter-rater agreement on values is only r=0.201.
Fine-tuned GPT-4o reaches r=0.294 — BETTER than human-human agreement.
Personalized models reach r=0.334.
### For The Simulator
Values predict MOTIVATION — why someone holds positions, not just what
positions they hold. Two people with the same political stance may have
completely different underlying values:
- "I support open source because FREEDOM" (Self-Direction)
- "I support open source because FAIRNESS" (Universalism)
- "I support open source because it WORKS BETTER" (Achievement)
Same position, different framing, different behavioral predictions.
## Layer 4: Cognitive Style (How They Think)
### Integrative Complexity (AutoIC)
Measures differentiation (seeing multiple perspectives) and integration
(synthesizing perspectives into coherent frameworks).
- Low IC: black-and-white thinking, strong convictions, simple language
- High IC: nuanced, sees multiple sides, hedging, complex sentences
AutoIC (Conway et al.): 3,500+ complexity-relevant root words/phrases,
13 dictionary categories, validated r=0.70-0.82 at document level.
**WARNING**: LIWC's "analytic thinking" correlates only r=0.14 with actual
integrative complexity. Don't use LIWC's score as a proxy.
### Computational Indicators of Cognitive Style
Extractable from 20-50 posts without specialized tools:
| Indicator | High Cognition | Low Cognition |
|-----------|---------------|---------------|
| Vocabulary diversity (TTR) | HIGH | LOW |
| Avg sentence length | LONGER | SHORTER |
| Causal connectives ("because", "therefore") | MORE | FEWER |
| Hedging ("perhaps", "it seems") | MORE | FEWER |
| Abstract vs concrete language | MORE ABSTRACT | MORE CONCRETE |
| Question-asking | MORE | FEWER |
| Binary framing ("always/never") | LESS | MORE |
### For The Simulator
Cognitive style directly shapes VOICE:
- High IC person: longer posts, more caveats, "on the other hand"
- Low IC person: punchy takes, strong assertions, no hedging
- This is one of the strongest differentiators between similar-sounding people
## Layer 5: Narrative Framing (Their Lens on Reality)
How someone frames an issue reveals deep cognitive and value patterns.
### Common Frames (Semetko & Valkenburg)
- **Conflict**: issue as battle between opposing sides
- **Human interest**: personal stories, emotional impact
- **Economic**: costs, benefits, financial impact
- **Morality**: right vs wrong, ethical principles
- **Attribution of responsibility**: who's to blame / who should fix it
### Detection
GPT-4 few-shot with frame definitions achieves F1=70.4%
Best for diverse topics where fine-tuned models are too narrow
### For The Simulator
Framing predicts:
- How they'll react to news (through which lens)
- What aspects they'll emphasize in conversation
- What arguments they'll find compelling
- Whether they personalize or systematize issues
Example: Same AI safety event, different frames:
- Conflict framer: "The open vs closed battle heats up"
- Economic framer: "This will cost the industry billions"
- Moral framer: "This is irresponsible and dangerous"
- Attribution framer: "The regulators need to step in"
## Layer 6: Behavioral Metadata (Non-Text Signals)
Extractable from X API / Bluesky AT Protocol without NLP:
| Feature | What It Reveals |
|---------|----------------|
| Posting time distribution | Timezone, sleep patterns, work schedule |
| Reply vs original ratio | Conversational vs broadcast personality |
| Emoji frequency & types | Emotional expression style |
| Hashtag usage | Community identification, signal boosting |
| Media attachment rate | Visual vs text orientation |
| Thread length | Depth of engagement preference |
| Retweet/repost ratio | Amplifier vs creator |
| Average post length | Conciseness vs verbosity |
| Response latency | Impulsiveness vs deliberation |
### Trait Correlations (meta-analytic)
- **Extraversion**: more posts, more friends, more photos, more group activity
- **Neuroticism**: more self-disclosure, more passive consumption, more late-night posting
- **Agreeableness**: fewer swear words, more positive emotion, more supportive replies
- **Conscientiousness**: more regular posting patterns, more task-oriented content
- **Openness**: more diverse topics, more original content, larger networks
## Putting It All Together: The Deep Dossier
At high fidelity, compile a multi-layer profile:
```
PSYCHOMETRIC PROFILE: @handle
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Big Five: O[HIGH] C[MED] E[HIGH] A[LOW] N[LOW]
Evidence: {real quotes showing each trait}
Moral Foundations: Care★★ Fair★★★ Loyal★ Auth★ Sanct★ Liberty★★★
Evidence: {what they get angry/excited about}
Values: Self-Direction dominant, Achievement secondary
Evidence: {how they justify their positions}
Cognitive Style: HIGH integrative complexity
Evidence: {hedging patterns, nuanced takes, sentence complexity}
Dominant Frame: Attribution of Responsibility
Evidence: {they consistently focus on who's to blame}
Behavioral: Night owl, reply-heavy, low emoji, threads > one-shots
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
This multi-layer profile makes predictions much more nuanced than
Big Five alone. It tells you not just WHAT someone will say but
WHY they'll say it and HOW they'll frame it.
@@ -0,0 +1,170 @@
# GEPA Evolution — Automated Self-Improvement via hermes-agent-self-evolution
## What This Is
The hermes-agent-self-evolution repo (NousResearch/hermes-agent-self-evolution)
uses DSPy + GEPA (Genetic-Pareto Prompt Evolution) to automatically evolve
Hermes Agent skills. GEPA is an ICLR 2026 Oral paper — it reads EXECUTION
TRACES to understand WHY things fail, then proposes targeted mutations.
This means: we can point GEPA at the worldsim skill and automatically evolve
every component — simulation instructions, anti-slop rules, star thread
methodology, mechanical verification checklist, dossier templates — using
our own simulation outputs scored against real data as the eval signal.
The recursive self-improvement pipeline we built manually (log failures →
promote patterns → update rules) can be AUTOMATED via GEPA.
## How It Applies to WorldSim
### What GEPA Evolves (text, not weights)
GEPA evolves the TEXT of prompts and instructions. For worldsim, that means:
| Target | What Gets Evolved | Eval Signal |
|--------|------------------|-------------|
| SKILL.md | Immersion protocol, pipeline instructions | Simulation quality scores |
| star-thread.md | Methodology for finding star threads | Thread-to-voice accuracy |
| anti-slop.md | Slop word lists, structural patterns | Slop detection recall/precision |
| simulation-engine.md | Platform formats, conversation dynamics | Voice fidelity scores |
| adversarial-refinement.md | Mechanical check thresholds, GAN loop | Pre vs post refinement delta |
| prediction-engine.md | Forecasting methodology | Prediction Brier scores |
| dossier template | Profile structure and fields | Profile quality scores |
### The Eval Dataset
Built from worldsim's own outputs + real data:
1. **Voice fidelity pairs**: (simulated post, real post from same person) →
LLM-as-judge scores similarity 0-1
2. **Mechanical check logs**: what did the checks catch? what slipped through?
3. **Prediction accuracy**: tracked predictions scored against reality
4. **Held-out tests**: predicted tweets vs actual tweets
5. **Turing test results**: could the discriminator tell real from fake?
6. **User corrections**: any time the user catches something the system missed
(like the emoji fabrication incident — that's the richest signal)
### The GEPA Loop for WorldSim
```
1. RUN worldsim simulation (creates execution traces)
2. SCORE outputs against real data (voice, position, mechanical)
3. LOG traces + scores + user feedback to eval dataset
4. GEPA EVOLVES the skill component that had lowest scores
- Reads traces to understand WHY it scored low
- Proposes mutation to that specific reference file
- Tests mutation against held-out eval data
- If improved: create PR, human reviews
5. REPEAT — each cycle makes the skill better
```
### Concrete Example
GEPA discovers from traces that simulated conversations always have
symmetric turn-taking (4/4/4). It reads the mechanical check log that
caught this in 3 of the last 5 simulations. It reads the current
simulation-engine.md and sees the conversation architecture section.
It proposes a mutation:
OLD: "Opening Moves (1-3 posts) → Development (4-8 posts) → Peak → Resolution"
NEW: "Opening: most impulsive person posts. Others join ASYMMETRICALLY — one person
gets 40-50% of turns, one gets 15-20%, others fill the rest. The ratio should
match their real reply-to-original ratios from the dossier."
This mutation gets tested against the next 5 simulations. If symmetry
violations drop and voice scores don't decrease, it gets merged.
## Setup
```bash
# Clone the evolution repo
git clone https://github.com/NousResearch/hermes-agent-self-evolution.git
cd hermes-agent-self-evolution
pip install -e ".[dev]"
# Point at hermes-agent repo
export HERMES_AGENT_REPO=~/.hermes
# Evolve the worldsim skill specifically
python -m evolution.skills.evolve_skill \
--skill hermes-simulator \
--iterations 10 \
--eval-source sessiondb
```
## What Makes This Different From Manual Self-Improvement
The manual pipeline (references/recursive-self-improvement.md) requires the
agent to notice its own failures and write rules. This has two problems:
1. The agent shares weights with the generator — it's biased toward
approving its own output (the emoji incident proved this)
2. Promoting patterns to rules is slow and requires 3+ occurrences
GEPA solves both:
1. The eval signal comes from EXTERNAL data (real posts, user corrections,
mechanical checks) — not the agent's self-assessment
2. Evolution happens per-iteration, not per-3-failures
3. Mutations are tested against held-out data before merging
4. The Pareto frontier maintains diversity — different strategies for
different types of people/conversations
## Integration Points
### Eval Dataset Builder
Mine rehoboam DB for training data:
- simulation_logs table → execution traces
- prediction_scores table → accuracy data
- audit_log table → mechanical check results
- user correction events → highest-value signal
### Fitness Function for WorldSim
```python
def worldsim_fitness(simulation_output, real_data):
scores = {}
# Voice fidelity: embed real + simulated, cosine similarity
scores["voice"] = embed_and_compare(simulation_output, real_data.tweets)
# Mechanical pass rate: what % of checks passed without fixes
scores["mechanical"] = mechanical_check_pass_rate(simulation_output)
# Slop score: count of slop words/patterns detected
scores["anti_slop"] = 1.0 - (slop_count / total_words)
# Structure: turn asymmetry, conversation naturalness
scores["structure"] = naturalness_score(simulation_output)
# Textual feedback for GEPA's reflective mutation
feedback = generate_textual_feedback(scores, simulation_output, real_data)
return aggregate_score(scores), feedback
```
### The Key Insight: Textual Feedback
GEPA's superpower is that it doesn't just get a scalar score — it gets
TEXTUAL FEEDBACK explaining what went wrong. Our mechanical verification
system already produces this:
"@nosilverv avg 33.2 words vs real 15.6 (113% deviation) — SHORTEN"
"Parallel antithesis detected: 'The most X... The most Y...' — STRIP"
"Emoji rate 0% simulated but 10% real — OK (within tolerance)"
This text goes directly into GEPA's reflective mutation pipeline. It reads
these messages and proposes changes to the skill instructions that would
prevent these specific failures in future simulations.
## Evolution Targets by Priority
1. **simulation-engine.md** — highest impact on output quality
2. **anti-slop.md** — directly measurable, highest precision eval
3. **star-thread.md** — hardest to evaluate but most impactful on voice
4. **adversarial-refinement.md** — meta: improving the improvement system
5. **SKILL.md pipeline instructions** — orchestration optimization
6. **dossier template** — structure optimization
7. **prediction-engine.md** — measurable via Brier scores
## The Virtuous Cycle
```
More simulations → more eval data → better GEPA mutations
→ better skill instructions → better simulations → more eval data → ...
```
This is the endgame: the worldsim skill evolves itself through use.
Every simulation makes the next one better, not just through logged
rules, but through automated evolutionary optimization of the
instructions themselves. The system doesn't just learn WHAT went wrong —
it rewrites its own code to prevent it.
@@ -0,0 +1,262 @@
# Knowledge Archive — Per-Person Source Library + Expert Synthesis
## The Problem With Profiles
A profile is a SNAPSHOT. It says "this person believes X" but doesn't
show you WHERE they said it, WHEN, in WHAT context, or HOW their
thinking evolved. You can't cite a profile. You can't trace a claim
back to a source. And when you're simulating a conversation about
topic Z, the profile gives you everything about the person equally
weighted — their views on AI and their views on cooking and their
views on politics all crammed into the same context window.
## The Archive
For every person the system touches, build a LIBRARY:
```
~/.hermes/rehoboam/archives/{handle}/
├── index.json ← master index: all entries, metadata, embeddings
├── sources/
│ ├── x_tweets.jsonl ← every tweet pulled, with ID, timestamp, URL, metrics
│ ├── x_replies.jsonl ← their replies (different voice register)
│ ├── bluesky_posts.jsonl ← bluesky posts
│ ├── blog_posts.jsonl ← full text of blog posts with URLs
│ ├── podcast_quotes.jsonl ← attributed quotes from transcripts
│ ├── interviews.jsonl ← quotes from news articles/interviews
│ ├── reddit_comments.jsonl
│ ├── github_comments.jsonl
│ ├── goodreads_reviews.jsonl
│ ├── threads_posts.jsonl
│ └── other.jsonl ← anything else (HN, Quora, etc.)
├── topics/
│ ├── ai_safety.jsonl ← auto-clustered by topic
│ ├── open_source.jsonl
│ ├── consciousness.jsonl
│ └── ...
└── embeddings/
└── all_embeddings.npy ← sentence-transformer vectors for semantic search
```
### Entry Format (every entry in every source file)
```json
{
"id": "unique_id",
"handle": "teknium",
"platform": "x",
"type": "tweet|reply|blog|podcast|interview|comment|review",
"text": "the actual text they said",
"url": "https://x.com/Teknium/status/1234567890",
"timestamp": "2026-04-05T21:40:48Z",
"context": {
"replying_to": "@otheruser's tweet about X",
"thread_position": 3,
"topic": "open source AI",
"source_title": "Lex Fridman Podcast #412"
},
"metrics": {
"likes": 234,
"retweets": 45,
"replies": 12
},
"topics": ["open_source", "ai_models", "hermes"],
"embedding_id": 42
}
```
Every entry has a URL. Everything is traceable. Nothing is paraphrased
without the original alongside it.
## Collection Pipeline
When `worldsim> profile @handle` or `worldsim> archive @handle` runs:
### Step 1: Pull Everything
Use every verified access method to collect raw materials:
- X API: get max tweets (paginate with next_token to get hundreds)
- nitter.cz: timeline content
- ThreadReaderApp: historical threads
- Bluesky: full post history
- GitHub: issue comments, PR reviews, gists, README
- Reddit: comment history
- Blog/Substack: full posts (web_extract)
- Podcast transcripts: attributed quotes
- Interviews: quotes with attribution
- Goodreads: reviews
- Medium: RSS feed full text
### Step 2: Deduplicate
Same content appears across platforms (cross-posted tweets, syndicated
blog posts). Deduplicate by content similarity, keep the richest version
(the one with most metadata/context).
### Step 3: Topic Cluster
Run lightweight topic classification on each entry:
- Use the LLM or a simple keyword matcher to assign 1-3 topic tags
- Cluster into topic files for fast retrieval
- Topics are dynamic — new topics emerge from the data
### Step 4: Embed
Generate sentence-transformer embeddings for every entry.
Store in numpy array for fast cosine similarity search.
This enables semantic retrieval: "find everything @handle said about
consciousness" even if they never used the word "consciousness."
### Step 5: Index
Build the master index.json with entry count, topic distribution,
timestamp range, platform coverage, and quality metrics.
## Context-Aware Retrieval
This is the key. The archive might have 500 entries for a person.
The context window can hold maybe 30-50 of them alongside all the
other simulation context. You MUST retrieve selectively.
### For Simulation
When simulating @handle talking about topic X:
```
1. Semantic search: embed the current conversation context
2. Retrieve top 10-15 entries by cosine similarity to context
3. Also retrieve: 5 highest-engagement entries (their "greatest hits")
4. Also retrieve: 3 most recent entries (freshness)
5. Also retrieve: 2 entries that CONTRADICT the expected position
(prevents confirmation bias in the simulation)
6. Deduplicate. Cap at 25-30 entries total.
7. These become the "voice anchors" for generation.
```
The simulation draws from SPECIFIC REAL QUOTES relevant to the current
conversation. Not a generic profile. Not everything they've ever said.
The 25 most relevant things they've said about THIS topic.
### For Expert Synthesis
When the user asks "who are the best minds on X and what have they said?":
```
1. Search ALL archived people's entries for topic X
2. Rank by: entry quality × person expertise × relevance to query
3. Return a synthesis with CITATIONS:
On the topic of AI consciousness:
@repligate argues that LLMs exhibit "simulacra of consciousness"
rather than consciousness itself, distinguishing between the
model's behavior and its substrate:
> "the question isn't whether GPT is conscious but whether the
> character it's simulating is conscious within the fiction"
— tweet, 2025-03-15 (2.4K likes)
https://x.com/repligate/status/...
@nickcammarata approaches it from a meditation/first-person
perspective, noting parallels between introspective practice
and interpretability:
> "observation changes the system being observed, in meditation
> and in interp"
— tweet, 2026-04-05 (2.9K likes)
https://x.com/nickcammarata/status/...
@tszzl is skeptical of the framing entirely:
> "consciousness discourse is philosophy cosplaying as engineering"
— tweet, 2025-11-22 (5.1K likes)
https://x.com/tszzl/status/...
```
Every claim attributed. Every quote sourced. Every link clickable.
### For Grounding Predictions
When predicting what @handle would say about event Y:
```
1. Retrieve all archive entries related to Y or adjacent topics
2. Identify their PATTERN of response to similar events
3. Ground the prediction in specific past statements:
PREDICTION: @handle would likely frame event Y through the lens
of [topic Z], based on:
- tweet [url]: "quote about Z" (2025-06-15)
- blog post [url]: "longer quote about Z" (2025-09-20)
- podcast [url]: "verbal quote about Z" (2026-01-10)
CONFIDENCE: 78% (3 consistent sources over 7 months)
```
## Incremental Updates
The archive grows over time. Each time the person is profiled:
1. Pull new content since last archive timestamp
2. Append to source files
3. Re-embed new entries only
4. Update topic clusters
5. Update index
Don't rebuild from scratch. Append and re-index.
## Expert Table
When you have 20+ archived people, build an expert table:
```
worldsim> experts "open source AI"
EXPERT TABLE: open source AI
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
@Teknium | 47 entries | voice: builder/practitioner
"we can prove that open approaches build better, more
trustworthy systems" — tweet, 2026-04-05
Latest: 2 hours ago | Stance: STRONG ADVOCATE
@repligate | 12 entries | voice: philosophical/theoretical
"open weights = accountability. you can't audit a black box"
— tweet, 2025-11-30
Latest: 3 days ago | Stance: ADVOCATE (principled)
@eigenrobot | 8 entries | voice: statistical/contrarian
"the open source premium is largely downstream of selection
effects in who contributes" — tweet, 2025-08-14
Latest: 1 week ago | Stance: SKEPTICAL OF FRAMING
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
3 experts found | 67 total entries | synthesize? (y/n)
```
The table shows: who knows about this, what they've said, how recently,
and what their stance is. All grounded in archived quotes with sources.
## Integration With Simulation
When the star thread + dossier + archive work together:
```
STAR THREAD: drives the core generation (what they're DOING)
DOSSIER: provides constraints (psychometrics, voice metrics, baselines)
ARCHIVE: provides GROUNDING (specific real quotes for this context)
MECHANICAL CHECKS: verifies surface features (emoji, length, slop)
```
The archive prevents the simulation from drifting into generic territory.
Instead of "this person would probably say something about open source,"
it's "this person said THIS SPECIFIC THING about open source 3 weeks ago,
and their simulation should be consistent with that while also being fresh."
## The Overfitting Problem
"Without overfitting to a particular material the new context doesn't call for."
The retrieval system MUST be selective. If someone said 47 things about
open source AI, and the current conversation is about AI regulation,
don't dump all 47 open source quotes into context. Maybe 3 are relevant
because they connect open source to regulation. Retrieve THOSE 3.
The cosine similarity search handles this naturally — it matches the
CURRENT conversation context against the archive and returns what's
actually relevant, not everything tagged with a nearby topic.
The anti-overfitting checklist:
- Never load more than 25-30 archive entries per person into context
- Weight by relevance to CURRENT conversation, not by general importance
- Include at least 2 entries that contradict the expected position
- Include at least 3 recent entries regardless of topic relevance (freshness)
- If the conversation shifts topic mid-simulation, RE-RETRIEVE for new context
- The archive is a LIBRARY you consult, not a script you follow
@@ -0,0 +1,321 @@
# Mass Behavior Modeling — Communities, Clusters, Cascades
Understanding individual behavior requires understanding the social
ecosystem they exist in. This reference covers the macro layer:
community detection, influence networks, audience modeling, and
predicting how groups respond to events.
## Why This Matters For Simulation
Individual prediction accuracy: ~56-60%
Individual-in-context prediction: significantly higher
A person's behavior is constrained by their community. Knowing WHICH
community they belong to, WHO influences them, and WHAT information
ecosystem they're in makes individual predictions much sharper.
Lewin's equation: B = f(P, E). This reference is about the E.
## The Ecosystem Stack
```
Layer 5: AUDIENCE REACTION — How would this person's audience respond?
Layer 4: STANCE & SENTIMENT — What positions do clusters hold?
Layer 3: INFLUENCE NETWORKS — Who spreads ideas to whom?
Layer 2: COMMUNITY CLUSTERS — Who groups together?
Layer 1: SOCIAL GRAPH — Who follows/interacts with whom?
```
## Layer 1: Social Graph Construction
### Data Sources (by accessibility)
| Source | Access | Quality | Tools |
|--------|--------|---------|-------|
| Bluesky AT Protocol | FREE, open, no auth | Excellent | atproto (pip) |
| X/Twitter API | Bearer token, limited | Good but restricted | curl, tweepy |
| Reddit | API with limits | Good for comments | PRAW (pip) |
| GitHub | Free API | Great for tech people | PyGithub (pip) |
| Web scraping | Fragile, TOS issues | Variable | Last resort |
### Bluesky: The Open Gold Mine
```python
# pip install atproto
from atproto import Client
client = Client()
# No auth needed for public data
# Get follower graph
followers = client.get_followers(actor="handle.bsky.social")
following = client.get_follows(actor="handle.bsky.social")
# Real-time firehose (no auth!)
# wss://jetstream1.us-east.bsky.network/subscribe
```
### Graph Types
- **Follow graph**: who follows whom (directed, static-ish)
- **Interaction graph**: who replies to / retweets whom (directed, dynamic)
- **Mention graph**: who mentions whom (directed, weighted by frequency)
- **Co-engagement graph**: who engages with the same content (undirected)
Interaction graphs are more informative than follow graphs for predicting
actual behavioral alignment.
### Tools
```
pip install networkx python-igraph
```
NetworkX for prototyping (<100K nodes), igraph for production (millions).
## Layer 2: Community Detection
### Algorithms (ranked by quality)
| Algorithm | Quality | Speed | Notes |
|-----------|---------|-------|-------|
| Leiden | Best | Fast | Guarantees connected communities |
| Louvain | Good | Fastest | Can produce disconnected communities |
| Infomap | Excellent | Medium | Based on information theory |
| Label Propagation | Decent | Very fast | Non-deterministic |
### The Meta-Library: CDLib
```
pip install cdlib
```
Wraps 50+ community detection algorithms in a unified API.
Works on top of networkx/igraph. Highly recommended.
```python
import cdlib
from cdlib import algorithms
import networkx as nx
G = nx.karate_club_graph()
communities = algorithms.leiden(G)
# Also: louvain, infomap, label_propagation, angel, demon, etc.
```
### What Communities Tell Us
Each community in a social graph typically shares:
- Ideological orientation
- Topic interests
- Information sources
- Language patterns and in-group vocabulary
- Reaction patterns to events
Knowing which community someone belongs to immediately constrains
predictions about their likely positions and reactions.
## Layer 3: Influence Networks
### Key Insight (Zhou et al., National Science Review 2024)
Network centrality alone is INSUFFICIENT for predicting influence.
Must combine structural position with behavioral features:
- Posting frequency
- Historical content virality
- Response rate / engagement ratio
- Content originality (original vs repost ratio)
### Centrality Measures
```python
import networkx as nx
G = nx.DiGraph() # directed social graph
# Who has the most connections?
degree = nx.degree_centrality(G)
# Who bridges different communities?
betweenness = nx.betweenness_centrality(G)
# Who's connected to other well-connected people?
eigenvector = nx.eigenvector_centrality(G)
# Adapted from web — directed influence flow
pagerank = nx.pagerank(G)
```
### Superspreader Identification (DeVerna et al., PLOS ONE 2024)
Superspreaders of content fall into three categories:
1. **Pundits**: large following, high authority, original content
2. **Media outlets**: institutional accounts, news organizations
3. **Affiliated personal accounts**: connected to pundits/outlets
For simulation: knowing who the superspreaders are in a person's
network tells you what information they're likely exposed to.
### Information Cascade Modeling
```
pip install ndlib # Network Diffusion Library
```
NDlib models how information spreads through networks:
- Independent Cascade Model
- Linear Threshold Model
- SIR/SIS epidemiological models adapted for info spread
- Voter Model (opinion dynamics)
- Sznajd Model (social influence)
## Layer 4: Stance & Sentiment Analysis
### Ready-To-Use Models (HuggingFace)
**Tweet Sentiment** (most reliable):
```
cardiffnlp/twitter-roberta-base-sentiment-latest
# Labels: positive / negative / neutral
```
**Political Stance**:
```
kornosk/bert-election2020-twitter-stance-biden-KE-MLM
kornosk/bert-election2020-twitter-stance-trump-KE-MLM
launch/POLITICS # left / center / right
```
**All-in-One Tweet NLP**:
```
pip install tweetnlp
# Sentiment, emotion, hate speech, NER, topic classification
```
### Topic-Level Stance Tracking
Combine BERTopic (dynamic topic modeling) with stance classifiers:
1. Cluster posts into topics over time windows
2. Classify stance per topic per community
3. Track stance shifts over time
4. Detect divergence between communities on emerging topics
### PRISM Framework (ACL 2025)
First framework for interpretable political bias embeddings.
Two-stage: mine bias indicators → cross-encoder assigns structured scores.
```
github.com/dukesun99/ACL-PRISM
```
## Layer 5: Audience Modeling & Crowd Prediction
### The Frontier: Predicting How Groups React
Key papers and findings:
**CReAM (WWW 2024)**: Predicts which of two posts gets more engagement.
Uses LLM-generated features + FLANG-RoBERTa cross-encoder.
Demonstrates crowd reaction IS predictable from content alone.
**PopSim (Dec 2025)**: LLM multi-agent social network sandbox.
Simulates content propagation dynamics using "Social Mean Field"
for individual-population interaction. Reduces prediction error 8.82%.
**Conditioned Comment Prediction (EACL 2026)**:
KEY FINDING: behavioral traces (past posts) are BETTER than
descriptive personas for conditioning LLMs to predict user behavior.
This validates our OSINT approach: real data > personality labels.
**DEBATE Benchmark (Oct 2025)**:
WARNING: LLM agents converge opinions TOO QUICKLY vs real humans.
SFT + DPO helps but gap remains. Real communities maintain
disagreement longer than simulated ones.
**Distributional vs Individual Prediction (PMC 2025)**:
Group-level predictions are more reliable than individual ones.
Predicting "65% of this community will react negatively" is more
accurate than predicting "this specific person will react negatively."
### Application to Simulation
When simulating @person talking about event X, consider:
1. What community does @person belong to?
2. How is that community reacting to X? (distributional prediction)
3. Where does @person sit within that community? (conformist vs contrarian)
4. Who influences @person? What are THEY saying?
5. How does @person's audience react to their take? (engagement prediction)
This context makes individual predictions sharper.
## Echo Chamber & Filter Bubble Detection
### Technique
1. Build interaction graph
2. Run Leiden community detection
3. For each community, aggregate stance on key issues
4. Measure ideological homogeneity within communities
5. Compare cross-community vs within-community content similarity
6. High within + low cross = echo chamber
### Tools
```
github.com/mminici/Echo-Chamber-Detection # Cascade-based, CIKM 2022
# Includes Brexit and VaxNoVax datasets
```
### What It Tells Us
Knowing someone's echo chamber tells you:
- What information they're exposed to
- What they're NOT exposed to
- How extreme their positions might be (isolation → radicalization)
- Whether they'll encounter pushback or only agreement
- How they'll react to information from outside their bubble
## User Embeddings: "Find People Like @person"
### Strategy
1. Embed each user's recent N posts with sentence-transformers
2. Average embeddings → user vector
3. Use FAISS for similarity search
4. Cluster users with HDBSCAN in embedding space
### Best Models for Social Media Text
```
# General purpose (good baseline)
sentence-transformers/all-mpnet-base-v2
# Tweet-specific (better domain fit)
cardiffnlp/twitter-roberta-base
vinai/bertweet-base # pretrained on 850M tweets
```
### Graph + Text Hybrid Embeddings
```
pip install karateclub
```
KarateClub provides Node2Vec, DeepWalk, Graph2Vec — embed users
based on graph position. Combine with text embeddings for hybrid
vectors that capture BOTH what someone says AND where they sit
in the social network.
## Practical Application to Simulation
### For Individual Simulation (what we already do)
Add ecosystem context to each dossier:
- Which community cluster they belong to
- Who their top influencers are (who do they retweet/amplify most)
- What echo chamber are they in (information environment)
- How does their community view the simulation topic
### For Audience Simulation (new capability)
When user asks "what would @person's audience say":
1. Identify @person's follower community
2. Sample representative voices from that community
3. Model the DISTRIBUTION of responses, not just one response
4. Include: cheerleaders, critics, joke-makers, lurkers
5. Weight by typical engagement patterns
### For Cascade Prediction (new capability)
When user asks "how would this take spread":
1. Model the initial tweet and its immediate network
2. Predict which nodes amplify (based on stance alignment + influence)
3. Estimate reach and engagement range
4. Predict quote-tweet ratio (agreement vs dunking)
## Recommended Minimal Stack
```bash
pip install networkx python-igraph leidenalg cdlib karateclub
pip install sentence-transformers transformers tweetnlp
pip install ndlib faiss-cpu hdbscan atproto
```
This gives you: graph construction, community detection, user embeddings,
stance/sentiment analysis, diffusion simulation, similarity search,
clustering, and Bluesky data access. All open source, all pip-installable.
@@ -0,0 +1,370 @@
# OSINT Pipeline — Deep Intelligence Gathering
Full-spectrum open source intelligence for building personality models.
This goes beyond social media posts into visual identity, cross-platform
footprints, and behavioral analysis.
## Tool Arsenal
| Tool | Use Case | Strength |
|------|----------|----------|
| `web_search` | Find anything, initial discovery | Fast, broad, indexed content |
| `web_extract` | Pull full page content | Blogs, articles, profiles, PDFs |
| `browser_navigate` + `browser_snapshot` | View live pages | Dynamic content, login walls |
| `browser_vision` | Analyze what a page looks like | Layouts, visual identity, screenshots |
| `vision_analyze` | Analyze any image by URL/path | Profile pics, post images, aesthetics |
| `browser_get_images` | List all images on a page | Find images to feed to vision_analyze |
| Yandex reverse image search | Find where an image appears | Identity verification, alt accounts |
| `x-cli` (if available) | Direct Twitter API | Timelines, search, metadata |
## Instagram Intelligence
Instagram is CRITICAL for personality modeling — it reveals:
- Visual identity and aesthetic preferences
- Real-life social circles (tagged people, group photos)
- Lifestyle signals (travel, food, hobbies, pets)
- Caption voice (often different from Twitter voice)
- Story highlights (curated self-image)
- Bio links (cross-platform connections)
### Viewing Instagram Profiles (VERIFIED APRIL 2026)
**METHOD 1 — Instagram Private Web API (BEST, returns full JSON)**
```bash
curl -s -H 'User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X)' \
-H 'x-ig-app-id: 936619743392459' \
'https://i.instagram.com/api/v1/users/web_profile_info/?username={handle}'
```
Returns ~500KB of JSON: full profile + last 12 posts with captions, likes,
comments, CDN image URLs, timestamps. No auth needed.
**METHOD 2 — Instagram oEmbed API (for individual posts)**
```bash
curl -s 'https://www.instagram.com/api/v1/oembed/?url=https://www.instagram.com/p/{SHORTCODE}/'
```
Returns: caption text, author_name, thumbnail URL. No auth.
**METHOD 3 — Pixwox via web_extract (profile viewer)**
```python
web_extract(["https://pixwox.com/profile/{username}"])
```
Returns 12+ recent posts with captions, engagement stats. Cloudflare blocks
curl but web_extract bypasses it.
**METHOD 4 — SocialBlade via web_extract (analytics)**
```python
web_extract(["https://socialblade.com/instagram/user/{handle}"])
```
Returns follower count, engagement rate, 14-day tracking.
**METHOD 5 — CDN direct download (images from API responses)**
Image URLs from API responses (scontent-*.cdninstagram.com) download
directly with no auth. Feed them to vision_analyze for visual profiling.
**METHOD 6 — Google indexed content**
```
web_search("site:instagram.com {username}")
```
Returns bio text, follower count, recent post captions from search snippets.
**WHAT DOESN'T WORK:** direct web_extract on instagram.com, ?__a=1 trick,
graph.instagram.com (needs OAuth), imginn/picuki/dumpoir/gramhir (403)
### Instagram Discovery (finding someone's handle)
```
web_search("{real_name} instagram")
web_search("{twitter_handle} instagram account")
web_search("site:instagram.com {real_name}")
# Check their Twitter/X bio for IG links
# Check their personal website for social links
# Check Linktree / bio.link pages
```
### Extracting Signal from Instagram
**Profile Picture**: Reveals self-presentation style
- Professional headshot vs casual vs meme/avatar
- Analyze with vision_analyze for clothing, setting, expression
**Bio Text**: Compressed self-identity
- Role/title claims
- Emoji usage patterns
- Link destinations
- Location claims
**Post Grid**: Visual identity fingerprint
- Color palette tendencies
- Content categories (food/travel/tech/selfies/memes)
- Posting frequency
- Professional vs personal ratio
**Captions**: Voice sample different from Twitter
- Usually longer, more personal
- Hashtag usage patterns
- Emoji patterns
- Tone (inspirational vs casual vs funny)
**Tagged Photos**: Real social graph
- Who they hang out with IRL
- Events they attend
- Social circles outside tech/AI
## Visual Identity Analysis
Use vision tools to analyze HOW someone presents visually:
### Profile Pictures Across Platforms
```
# Collect profile pics from multiple platforms
# Twitter, Instagram, LinkedIn, GitHub, Discord
# Analyze each
vision_analyze(image_url="{pic_url}",
question="Describe this profile picture in detail: person's appearance, clothing style, setting, expression, professional vs casual, any notable elements")
# Cross-reference: do they use the same pic everywhere? Different personas?
```
### Reverse Image Search (Yandex Pipeline)
From memory — Google Lens blocks Browserbase IPs, use Yandex:
```
# For images behind auth/CDN, upload to catbox first
terminal("curl -F 'reqtype=fileupload' -F 'fileToUpload=@{local_path}' https://catbox.moe/user/api.php")
# Then Yandex reverse image search
browser_navigate("https://yandex.com/images/search?rpt=imageview&url={encoded_public_url}")
# Or via web_extract (slower but automatable)
web_extract(["https://yandex.com/images/search?rpt=imageview&url={encoded_url}"])
```
Yandex provides:
- Similar images (find the same person elsewhere)
- Site matches (where this image appears)
- OCR text extraction (text in images)
- Image tags (what's in the image)
- Knowledge panels (identified entities)
### Screenshot Analysis
When you can see a page but can't extract text:
```
browser_vision(question="Read all text on this page. List usernames, post content, dates, engagement numbers")
browser_vision(annotate=true, question="What interactive elements are on this page?")
```
## LinkedIn Intelligence
**STATUS: BLOCKED for automated access** (tested April 2026).
web_extract returns "Website Not Supported". Direct browsing triggers auth walls.
**Workarounds:**
```
# LinkedIn content IS indexed by search engines
web_search("{real_name} linkedin {company}")
web_search("site:linkedin.com/in {name}")
# These return snippets with headline, role, company — useful even without full profile
# Google sometimes caches LinkedIn profiles
web_search("{name} site:linkedin.com headline")
```
**METHOD 1 — Google indexed snippets (always works)**
```
web_search("site:linkedin.com/in {name} {company}")
```
Returns: name, headline, company, location, connection count, bio snippet.
**METHOD 2 — Crunchbase (EXCELLENT for founders/execs)**
```python
web_extract(["https://www.crunchbase.com/person/{slug}"])
```
Returns: full career history, education, investments, board positions,
social links. Best source for professional identity of startup people.
**METHOD 3 — Corporate press pages**
```
web_search("{person} {company} site:{company}.com bio OR press")
```
Official bios from company newsrooms. High quality, curated but factual.
**METHOD 4 — Third-party aggregators**
- RocketReach, SignalHire — job title + company from web_search snippets
- rootdata.com — good for crypto/AI people
- Crunchbase — best all-round for tech executives
**METHOD 5 — Paid LinkedIn API wrappers** (if budget allows)
- LinkdAPI, Proxycurl: $0.07-0.15 per profile, full structured data
- No OAuth needed, just API key
LinkedIn reveals (from combined methods):
- Career trajectory (Crunchbase full history)
- Current role and headline (search snippets)
- Education (Crunchbase or search snippets)
- Professional self-presentation (company bio pages)
- Investment/board activity (Crunchbase)
## Podcast Transcripts (HIGHEST VALUE for voice profiling)
Podcast interviews are THE gold mine for personality modeling. Hours of
unscripted speech, natural conversation, real personality showing through.
**Discovery:**
```
web_search("{name} podcast transcript interview")
web_search("{name} lex fridman OR tyler cowen OR joe rogan OR dwarkesh")
```
**Extraction — verified working transcript sources:**
```python
# Lex Fridman (full verbatim transcripts)
web_extract(["https://lexfridman.com/EPISODE_URL/transcript"])
# Conversations with Tyler (Tyler Cowen — full transcripts)
web_extract(["https://conversationswithtyler.com/episodes/..."])
# TED Talks transcripts
web_extract(["https://www.ted.com/talks/.../transcript"])
# Sequoia Capital podcast
web_extract(["https://www.sequoiacap.com/podcast/..."])
```
Podcast transcripts reveal:
- Natural speech patterns (filler words, pacing, sentence structure)
- Unguarded opinions (less curated than tweets)
- How they respond to pushback (interviewer challenges)
- Humor style in conversation (different from written humor)
- Depth of knowledge on specific topics
- Personality under pressure
## YouTube / Video Intelligence
```
web_search("{name} youtube talk keynote interview")
web_search("{name} podcast appearance")
```
web_extract on YouTube pages returns rich summaries with attributed quotes.
Use youtube-content skill for full transcripts if available.
## Personal Blogs & Substacks (HIGH VALUE)
Personal writing is curated self-expression — how someone WANTS to be
seen intellectually. Very different signal from social media.
```
web_search("{name} blog substack essay")
# Extract full posts
web_extract(["https://{blog-url}/"])
# Wayback Machine works for archived blog posts
web_extract(["https://web.archive.org/web/2024/{blog-url}"])
```
## GitHub Intelligence
For technical people:
```
web_search("site:github.com {handle}")
web_extract(["https://github.com/{handle}"])
# Issue comments reveal communication style under technical pressure
web_search("site:github.com {handle} issue comment")
# README style reveals documentation personality
# Commit messages reveal terseness vs verbosity
```
## General Web Footprint
```
# Personal website / blog
web_search("{name} personal website blog about")
# Conference talks / speaker bios
web_search("{name} speaker conference talk bio")
# News mentions
web_search("{name} {company} news interview profile")
# Academic papers (for researchers)
web_search("{name} arxiv paper author")
web_search("site:scholar.google.com {name}")
# Podcast appearances
web_search("{name} podcast guest appearance")
# Forum posts (HN, specific communities)
web_search("site:news.ycombinator.com {handle} OR {name}")
```
## Cross-Platform Identity Resolution
### Handle Mapping Strategy
1. Start from known handle (usually Twitter)
2. Check bio links — most people link to other platforms
3. Search "{known_handle} {platform}" for each platform
4. Check personal website for social links
5. Reverse image search profile pic to find matching accounts
6. Search unique phrases they use across platforms
### Identity Verification
When you find a potential match on another platform:
- Same profile picture? (reverse image search)
- Same bio keywords?
- Same name/handle pattern?
- Cross-references (do they mention each other?)
- Writing style match?
## Search Space Narrowing
### The Jiggle Technique
When broad searches return noise, narrow progressively:
1. **Start broad**: `"{name}" AI`
2. **Add role**: `"{name}" {company} {role}`
3. **Add context**: `"{name}" {company} {specific_project_or_topic}`
4. **Add platform**: `site:{platform} "{name}" {context}`
5. **Add time**: `"{name}" {topic} 2025 OR 2026`
6. **Quote unique phrases**: if you found a distinctive phrase they use, search for that exact phrase to find more of their content
### Disambiguation
Common names need extra signals:
- Add their company/org
- Add their specific domain (AI, crypto, etc.)
- Use their unique handle as anchor
- Search for combinations of their known associates
- Use image search to verify you have the right person
### Signal vs Noise Heuristics
- **High signal**: direct quotes, interview transcripts, personal blog posts, long-form content
- **Medium signal**: mentions in aggregator sites, conference bios, LinkedIn summaries
- **Low signal**: generic news mentions, third-party profiles, directory listings
- **Noise**: same-name different person, outdated info (>2 years), scraped/regurgitated content
## Confidence Calibration
After full OSINT sweep, rate data quality:
| Confidence | Data Available | Simulation Quality |
|-----------|---------------|-------------------|
| 95-100% | 50+ posts, longform, video, visual, cross-platform | Near-perfect voice replication |
| 80-94% | 20-50 posts, some longform, basic visual | Very good, occasional educated guesses |
| 60-79% | 10-20 posts, mostly short-form | Good general sense, some gaps |
| 40-59% | 5-10 posts, limited platforms | Broad strokes only, flag uncertainty |
| 20-39% | <5 posts, single platform | Sketch at best, heavy disclaimers |
| <20% | Almost nothing found | Decline to simulate, ask user for context |
## Privacy & Ethics Note
All research uses publicly available information only. We don't:
- Access private/locked accounts
- Bypass authentication
- Use leaked/hacked data
- Dox or expose private information
- Simulate in ways designed to deceive or impersonate
The goal is personality MODELING for creative simulation, grounded in
what people choose to share publicly.
@@ -0,0 +1,334 @@
# Prediction Engine — Forecasting What Someone Would Say/Do
Techniques for predicting behavior grounded in superforecasting methodology,
behavioral science, and SOTA LLM prediction research.
## Superforecasting Principles (Tetlock)
**Honest caveat**: Superforecasting methodology was developed for geopolitical and
world-event prediction, not personality simulation. That said, the THINKING TOOLS
are genuinely useful here — decomposition prevents lazy pattern-matching, base rates
fight overconfidence, and alternative hypotheses prevent single-track predictions.
What does NOT transfer cleanly: the calibration precision. When Tetlock says "70%
confident," that's backed by thousands of scored predictions. When we say "70%
confident" about what @someone would tweet, that's an educated estimate, not a
calibrated probability. Use the framework for its rigor, not its false precision.
Apply these thinking tools when making behavioral predictions:
### 1. Decomposition (Fermi-ize the Question)
Don't ask "What would @person say about X?"
Break it down:
- What is @person's known position on topics RELATED to X?
- What are their values/priorities that X touches on?
- What is their emotional register when discussing similar topics?
- Who are they likely responding to, and how does that change their tone?
- What platform are they on, and how does that shift their behavior?
### 2. Outside View First (Base Rates)
Before considering the specific person, ask:
- What would a TYPICAL person in their role/position say about X?
- What % of people in their ideological cluster hold position Y on X?
- What's the base rate for their type of response (agree/disagree/joke/ignore)?
### 3. Inside View Second (Case-Specific Adjustment)
Now adjust from the base rate using what you ACTUALLY KNOW about them:
- Specific past statements on this topic or related topics
- Known relationships with people/orgs involved
- Personal experiences that would shape their view
- Contrarian tendencies (do they predictably go against their cluster?)
### 4. Confidence Calibration
Express predictions with honest uncertainty. **These are rough buckets, not
calibrated probabilities. Don't pretend they're more precise than they are.**
- **90%+ confident**: They've literally said this before, just rephrased
- **70-89%**: Strong pattern match with known positions and voice
- **50-69%**: Reasonable inference but could go either way
- **30-49%**: Educated guess, limited data
- **<30%**: Basically guessing, flag it clearly
When reporting confidence, prefer plain language over fake precision:
"very likely" > "87% probability". The number implies a precision we don't have.
### 5. Consider Alternative Hypotheses
For every prediction, generate at least ONE plausible alternative:
- "They'd PROBABLY say X, but they might surprise with Y because Z"
- This prevents overconfident single-track predictions
## The Prediction Pipeline
### Step 1: Classify the Prediction Type
| Type | Description | Difficulty |
|------|-------------|-----------|
| **Position prediction** | What they believe about X | Easiest if data exists |
| **Reaction prediction** | How they'd respond to event Y | Medium |
| **Voice prediction** | How they'd phrase something | Medium-hard |
| **Behavior prediction** | What they'd DO (not just say) | Hardest |
| **Interaction prediction** | How they'd respond to specific person | Hard, depends on relationship data |
### Step 2: Evidence Gathering Protocol
For each prediction, gather evidence in this order:
1. **Direct evidence**: Have they addressed this exact topic before?
- Search: `"{handle}" "{topic}"` or `"{handle}" "{related_keyword}"`
- Weight: HIGHEST
2. **Analogical evidence**: Have they addressed something similar?
- Search: find positions on adjacent topics
- Weight: HIGH
3. **Value evidence**: What values/principles would apply?
- Infer from their stated beliefs and consistent positions
- Weight: MEDIUM
4. **Social evidence**: What do their peers/allies think?
- People tend to align with their social cluster (but not always)
- Weight: LOW-MEDIUM (higher for conformists, lower for contrarians)
5. **Demographic evidence**: What would someone in their position typically think?
- Base rate from role/industry/ideology
- Weight: LOWEST (only use as anchor, not conclusion)
### Step 2b: Contradiction Handling Protocol
When evidence conflicts (e.g., person said X in 2024 but Y in 2026):
1. **Check for genuine change**: Did they explicitly reverse position? Look for
"I used to think X but now..." or a clear pivot moment. If so, use the newer
position and note the evolution.
2. **Check for context-dependence**: Did they say X to audience A and Y to audience B?
This isn't necessarily dishonesty — people emphasize different facets for different
contexts. Note which context your simulation targets and use the matching register.
3. **Check for nuance collapse**: Maybe they said "X is mostly good with caveats"
and later "X has real problems" — these might not actually contradict. Look for
the synthesis position.
4. **When genuinely unresolvable**: Flag it explicitly. "Evidence conflicts on this
point — they've argued both sides at different times. Simulating {chosen position}
based on {reasoning}, but the alternative is plausible." Don't paper over the
contradiction with false confidence.
5. **Recency default**: When all else fails, weight more recent statements higher.
People change, and the most recent position is the best predictor of the next one.
### Step 3: Generate Prediction
Using the HumanLLM B = f(P, E) framework:
- **P (Person)**: Everything from the dossier — personality, values, voice
- **E (Environment)**: The specific context — platform, topic, who's asking,
what just happened, social dynamics in play
Generate the prediction by:
1. Setting the base rate (outside view)
2. Adjusting for personal specifics (inside view)
3. Filtering through their voice profile (how they'd phrase it)
4. Applying platform-specific behavior patterns
5. Calibrating confidence
## Memory Curation (The 30-50 Rule)
Research shows performance PEAKS at 30-50 memory entries then DECLINES.
For each person in a simulation, curate memories:
### What to Include (high signal)
- **Signature takes**: Their most characteristic/famous positions (5-10)
- **Voice samples**: Real quotes that capture their linguistic style (5-10)
- **Relationship data**: Known dynamics with other sim targets (3-5)
- **Recent context**: What they've been talking about lately (3-5)
- **Formative moments**: Career milestones, public pivots, viral moments (3-5)
- **Quirks & tells**: Catchphrases, humor style, pet peeves (3-5)
### What to Exclude (noise)
- Generic biographical facts that don't predict behavior
- Old positions they've clearly evolved past
- Trivial interactions that don't reveal personality
- Secondhand characterizations (what others say about them)
- Platform metadata (follower counts, join dates) unless directly relevant
### Memory Selection Heuristic
For each candidate memory entry, ask:
**"If I removed this, would the simulation noticeably degrade?"**
If no, cut it.
## Fighting LLM Defaults
Research shows LLMs have systematic biases in simulation. The fixes below need to be
CONCRETE — vague instructions like "be more like them" don't work. You need specific
prompting patterns that actually shift the output.
### Problem: Sycophancy & Over-Agreement
LLMs default to agreement and positivity.
**Fix**: Don't just note they're contrarian — structure it as a behavioral instruction
with evidence:
```
"In this conversation, {person} disagrees with {other_person} on {topic}. They are
noticeably more confrontational than the other speakers. They tend to respond to
consensus with skepticism and reframe debates on their own terms. Example from their
real posts: '{actual quote where they disagreed with something popular}'"
```
### Problem: Rigid/Polarized Strategies
LLMs tend to take extreme positions and hold them rigidly.
**Fix**: Provide specific nuance instructions:
```
"In this conversation, {person} holds a complex position on {topic}: they agree with
{point A} but push back on {point B}. They're the type to say 'yes, but...' rather
than 'no.' Real example of their nuance: '{quote showing them holding a both-and
position}'"
```
### Problem: Uniform Register
LLMs default to a similar educated-casual tone for everyone.
**Fix**: Anchor voice with REAL QUOTES and explicit comparative instructions:
```
"In this conversation, {person} is noticeably more {trait} than the other speakers.
They tend to {specific behavior pattern}. Their sentences are typically {length/style}.
They {do/don't} use emoji. Their humor style is {type}. Example from their real posts:
'{actual quote that captures their voice}'"
```
The more you can say "{person} does THIS while {other_person} does THAT," the better
the differentiation. Comparative framing outperforms absolute descriptions.
### Problem: Overly Structured Responses
LLMs love neat arguments with clear structure.
**Fix**: Provide explicit structural anti-patterns:
```
"When generating {person}'s messages, break conventional structure. They start one
thought and jump to another mid-sentence. They use '...' and '—' instead of periods.
They repeat words for emphasis. They don't conclude neatly. Example: '{real quote
showing their chaotic structure}'"
```
### Problem: Missing Mundane Behavior
LLMs focus on "interesting" responses, skip boring/mundane ones.
**Fix**: Explicitly instruct for mundane moments:
```
"Not every message from {person} needs to be insightful. Include at least 1-2 messages
that are just reactions ('lmao', 'this', 'wait what'), link shares without commentary,
or brief agreements. Real people don't craft every message. {person} specifically tends
to {their specific mundane behavior pattern, e.g., 'drop a single emoji reaction'
or 'just retweet without comment'}."
```
### General Principle for All Fixes
The pattern is always: **behavioral instruction + comparative framing + real evidence**.
- "Do X" alone doesn't work well
- "Do X, unlike the default of Y" works better
- "Do X, unlike the default of Y, as evidenced by this real quote: Z" works best
## The Adjective-Based Personality Method
70 bipolar adjective pairs for Big Five traits. Select 3 per trait
with intensity modifiers.
### Openness
High: creative, curious, imaginative, artistic, adventurous, intellectual,
unconventional, perceptive
Low: conventional, practical, traditional, routine-oriented, narrow
### Conscientiousness
High: organized, disciplined, reliable, meticulous, systematic, thorough,
goal-oriented, persistent
Low: careless, impulsive, disorganized, spontaneous, flexible, relaxed
### Extraversion
High: outgoing, talkative, energetic, assertive, enthusiastic, bold,
gregarious, dominant
Low: reserved, quiet, introverted, solitary, withdrawn, reflective
### Agreeableness
High: cooperative, trusting, empathetic, generous, accommodating, kind,
diplomatic, forgiving
Low: competitive, skeptical, blunt, confrontational, critical, stubborn,
independent-minded
### Neuroticism
High: anxious, moody, sensitive, reactive, volatile, self-conscious,
insecure, emotional
Low: calm, stable, resilient, confident, even-tempered, composed,
thick-skinned
### Usage
For each simulated person, after OSINT research, estimate their Big Five
profile and select appropriate adjectives:
Example: "@basedjensen: very creative, somewhat impulsive, very outgoing,
a bit competitive, calm" → this shapes the generation toward the right
behavioral profile.
## Interaction Dynamics Prediction
When simulating conversations between multiple people, remember that predictions
apply to a SPECIFIC REGISTER. See the next section on performative vs. authentic
behavior.
## Performative vs. Authentic Behavior
**Critical concept**: People act differently for different audiences. A simulation
must be explicit about which register it's targeting.
### The Register Spectrum
- **Public broadcast** (tweets, Reddit posts): Most performative. People are
playing to their audience, building their brand, signaling to their tribe.
- **Semi-public** (Discord channels, group chats, comment threads): Less
performative but still audience-aware. People are more casual but know
others are watching.
- **Private 1-on-1** (DMs): Much less performative. More honest, more
vulnerable, more willing to express doubt or uncertainty.
- **True private** (inner monologue, close friends): We have almost no data
on this. Don't pretend to simulate it.
### Practical implications
- When simulating a PUBLIC thread, lean into the person's public persona —
their brand, their usual takes, their audience-aware voice.
- When simulating DMs, dial down the performance. More hedging, more honesty,
more "I actually think..." vs. the public "Here's my take:".
- When evidence comes from one register but the simulation targets another,
FLAG IT: "Evidence is from public tweets but simulating DM behavior —
expect the real person to be less {polished/aggressive/confident} in private."
- Someone's Twitter persona may be genuinely different from their Reddit persona.
These are not interchangeable data sources. Weight evidence from the matching
platform higher.
### What we can't know
Be honest: we're simulating public figures based on their public output. The
private person may be substantially different. DM simulations are inherently
lower-confidence than public thread simulations because we have less data on
how people behave privately.
### Dominance Hierarchy
- Who talks first? (most confident/highest-status usually)
- Who responds to whom? (not everyone talks to everyone)
- Who gets ratio'd? (lowest-status takes get challenged)
- Who lurks? (some people watch before engaging)
### Agreement/Disagreement Prediction
Based on known positions + social dynamics:
- **Strong agree**: Both have stated similar positions + friendly relationship
- **Agree with nuance**: Similar positions but one adds a caveat
- **Productive disagreement**: Different positions + mutual respect
- **Hostile disagreement**: Different positions + existing tension/rivalry
- **Surprising agreement**: Expected to disagree but find common ground
- **Ignore**: Some people just don't engage with certain others
### Conversation Flow Prediction
Real conversations follow patterns:
1. **Opener** → most active/impulsive person posts first
2. **First response** → most engaged/relevant person responds
3. **Pile-on or pushback** → depends on agreement/disagreement dynamics
4. **Tangent** → someone takes a side thread
5. **Peak moment** → the best/most viral exchange
6. **Trail off** → energy dissipates, last person makes a joke or short comment
## Scenario Injection Prediction
When "inject: {event}" is used, predict reactions:
1. **Who would see this first?** (most online / most relevant to their work)
2. **Who would care most?** (most affected / strongest opinion)
3. **What's the emotional valence?** (good news for some, bad for others)
4. **What's the expected take?** (apply position prediction pipeline)
5. **How does this change the existing conversation?** (derail, amplify, redirect)
@@ -0,0 +1,237 @@
# Recursive Self-Improvement Pipeline
The simulator should get better every time it runs. Not through training —
through accumulating failure patterns, calibration data, and learned rules
that feed back into future simulations.
## The Loop
```
SIMULATE → VERIFY (mechanical) → SCORE → LOG FAILURES → UPDATE RULES → SIMULATE BETTER
```
Each run produces two outputs:
1. The simulation (for the user)
2. A failure log (for the system)
The failure log feeds back into the next run's verification step,
making the checklist grow and the blind spots shrink.
## What Gets Logged After Every Simulation
### 1. Mechanical Check Failures
```
FAILURE LOG: simulation_{timestamp}
EMOJI: @visakanv had 6 fabricated emoji, real rate was 10%. Stripped all.
SLOP: @eigenrobot utterance contained "multifaceted" — rewritten.
LENGTH: @QiaochuYuan avg 42 words/utterance, real avg was 18. Compressed.
CAPS: 4/12 utterances started uppercase, targets are 90% lowercase. Fixed.
PUNCTUATION: Added periods to @tszzl who never uses terminal punctuation.
STRUCTURE: Sycophantic flow detected — B agreed with A then C agreed with B.
Injected disagreement.
```
### 2. Discriminator Critique Patterns
```
CRITIQUE LOG:
Round 1: @tszzl too verbose (flagged 2x in last 3 simulations)
Round 1: @repligate too academic (flagged 3x — this is a persistent pattern)
Round 2: Conversation too neat — real conversations are messier (flagged 5x)
```
### 3. Held-Out Test Results
```
CALIBRATION LOG:
Voice fidelity: 8.4/10 (up from 7.5 last run)
Topic prediction: 2/5 topics matched (typical — content is unpredictable)
Register match: 9/10 (improved after emoji fix)
```
## How Failures Feed Forward
### Pattern Accumulation
After N runs, persistent failure patterns become AUTOMATIC rules:
```
IF a pattern is flagged in 3+ consecutive simulations:
PROMOTE it from "check" to "pre-generation rule"
Example progression:
Run 1: "Too verbose for @tszzl" → flagged in Round 1, fixed
Run 2: "Too verbose for @tszzl" → flagged again, fixed again
Run 3: "Too verbose for @tszzl" → PROMOTED to pre-gen rule:
"When simulating roon-type voices: max 20 words per tweet.
Fragment > sentence. Compress ruthlessly."
```
### The Growing Checklist
The mechanical verification checklist starts with the baseline checks
(emoji, slop, length, caps, punctuation) and GROWS with each failure:
```
BASELINE CHECKS (permanent):
□ Emoji frequency match
□ Slop word scan (Tier 1/2/3)
□ Sentence length match
□ Capitalization match
□ Punctuation pattern match
□ Reply/original ratio
□ Structural slop patterns
LEARNED CHECKS (accumulated from past failures):
□ Roon-type voices: max 20 words (from: verbose failure x3)
□ Warm personalities: do NOT add emoji (from: emoji inflation x5)
□ Academic voices: ground in specific examples (from: too abstract x3)
□ Conversations: inject at least one disagreement (from: sycophantic flow x4)
□ Self-deprecating voices: add hedging (from: too assertive x2)
□ Shitposters: include at least one non-sequitur (from: too on-topic x2)
```
### Where To Store Learned Rules
Append to the skill itself. After each simulation run where the mechanical
checks catch something, the agent should ask:
"The mechanical verification caught {failures}. Should I add these as
permanent learned rules for future simulations?"
If the same failure appears 3+ times, add it automatically without asking.
Use skill_manage(action='patch') to append to this file's "Learned Checks"
section below.
## Calibration Tracking
### Per-Person Calibration Memory
After simulating someone, store the calibration data:
```
@tszzl: voice=8.5, emoji_rate=0%, avg_words=14, lowercase=95%,
signature_move="aphoristic fragments", danger="goes verbose"
@nickcammarata: voice=8.8, emoji_rate=0%, avg_words=19, lowercase=90%,
signature_move="meditation-ML connection", danger="too structured"
```
If the same person is simulated again, LOAD this calibration to skip
the cold-start problems. The second simulation of someone should be
better than the first because you already know their failure modes.
### Aggregate Calibration
Track overall simulation quality across runs:
```
Run 1: pre-refine 7.5, post-refine 8.4 (delta +0.9)
Run 2: pre-refine 8.37, post-refine 8.53 (delta +0.16)
Run 3: pre-refine 8.53, post-refine 8.83 (delta +0.30, emoji fix)
```
The pre-refine score should INCREASE over time as learned rules prevent
repeat failures. If it's not increasing, the learning loop is broken.
## The Standard: Indistinguishable From Real
The target is not "good enough." The target is: mix simulated posts with
real posts and a human familiar with the person cannot reliably tell which
is which. That's 50% accuracy on a blind comparison — random chance.
Every mechanical check, every discriminator round, every learned rule
exists to push toward that standard. If something doesn't serve that
goal, it's wasted effort.
## Current Learned Checks (append here after each run)
### From TPOT Simulation Run 1 (April 2026)
- Warm/enthusiastic personalities (visakanv-type): do NOT add decorative emoji.
Bio emoji ≠ tweet emoji. Actual emoji rate for "warm" TPOT posters: <15%.
PROMOTED after being caught by user, not by discriminator (discriminator failure).
- Conversation flow: pure agreement chains are instruct-model slop.
Real threads have at least one moment of friction, misunderstanding, or deflection.
- Academic-leaning voices (repligate-type): ground claims in specific experiments,
transcripts, or model behaviors they've personally observed. Generic philosophical
language without specifics = slop, even if it sounds smart.
- Self-deprecating voices (QC-type): hedge more. "i think" "i'm not sure" "it feels like."
Instruct models are too assertive even when simulating tentative people.
- Fragment voices (roon-type): max 15-20 words. No conjunctions. No paragraphs.
If it reads like a complete thought, it's too complete for a fragment-poster.
### From TPOT Simulation Run 2 (April 2026)
- Reframer voices (nosilverv-type): avg ~16 words. Split multi-sentence takes
into separate tweets. The compression IS the voice. 113% over-length caught
by mechanical check that subjective scoring rated 8/10. Trust the numbers.
- Rare-poster voices (selentelechia-type): in a 12-post sim, give them 2-3 turns
max. When they speak it must LAND. Short crystallizations > long analysis.
"or a shared meal" was the highest-rated line at 3 words.
- Turn symmetry: ALWAYS check. 4/4/4 is instruct-model default. Real conversations
have one person dominating (5), one lurking (3), others in between.
- Verbose bias is the #1 mechanical failure. ALWAYS check avg word count against
real baseline BEFORE subjective scoring. Every run so far has caught over-length
that subjective scoring missed.
- RHETORICAL POLISH IS SLOP. Caught post-mechanical-pass in Run 2 review.
Parallel antithesis ("The most X... The most Y..."), "Not X, not Y, but Z",
"Show me X and I'll show you Y", clean 4-step escalations, academic vocabulary
in casual voice — ALL passed mechanical checks but are still obviously LLM.
PROMOTED TO MECHANICAL SCAN: now regex-scannable alongside slop words.
- THE BANGER PROBLEM: every simulated tweet was screenshot-worthy. Real feeds
are 70% mid. Must include throwaway responses ("lol" "hmm" "fair" "wait actually").
PROMOTED: banger check is now mandatory in mechanical verification.
### From TPOT Simulation Run 3 — Star Thread Discovery (April 2026)
- STAR THREAD IS THE KEY. Dossier-first generation produces surface-accurate
but dead output. Star-thread-first generation produces messy, alive output
that actually sounds like the person. Generate from the thread. Verify with data.
- Rhetorical polish vanished once generation came from "what is this person DOING"
rather than "what would this person SAY." Reframers reframe. Conveners convene.
Distillers distill. The VERB drives the voice, not the adjectives.
- People in conversation REFERENCE EACH OTHER BY NAME. Tyler says "Bosco always
comes in with the three word version." This is obvious but the dossier approach
never produced it because it models each person in isolation.
- PROMOTED: star thread is now the FIRST entry in every dossier. Before voice
profile, before psychometrics, before everything else. It's the generation seed.
Everything else is verification.
### Operational Findings (verified April 2026)
- X API bearer token: 10K tweets/15min, 300 profiles/15min, 450 searches/15min.
Most generous rate limits. Always use as primary source.
- Threads.NET → Threads.COM redirect. Always use -L flag or .com directly.
Previous test saying "no OG tags" was WRONG — tags exist, domain was wrong.
- Instagram private API: i.instagram.com + mobile UA + x-ig-app-id: 936619743392459.
Returns full JSON with 12 posts. No auth needed. CDN image URLs work for vision_analyze.
- Facebook: Googlebot UA trick works for public pages. Returns name, bio, likes (121M for zuck).
Normal UA and mobile variants all redirect to login wall.
- TikTok: stats are in __UNIVERSAL_DATA_FOR_REHYDRATION__ JSON at path
__DEFAULT_SCOPE__.webapp.user-detail.userInfo.statsV2 (use statsV2 not stats).
- Bluesky searchPosts returns 403 from datacenter IPs. Workaround: searchActors + getAuthorFeed.
- nitter.cz is the ONLY working nitter instance (via web_extract, not curl).
- Reddit JSON API requires User-Agent header or returns 429.
- GEPA native had `max_steps` API mismatch with DSPy 3.1.3. MIPROv2 fallback works.
hermes-agent-self-evolution config: max_skill_size bumped to 20_000 for worldsim-class skills.
- hermes-agent-self-evolution is at ~/.hermes/hermes-agent-self-evolution/ with .venv.
Must export API keys from ~/.hermes/.env before running.
- Podcast transcripts (Lex Fridman, Tyler Cowen, TED) are the HIGHEST VALUE source
for voice profiling. Hours of unscripted speech > thousands of tweets.
### From Simulation Run 4 — Engine Mode + Profile Command (April 2026)
- ENGINE MODE: When worldsim is active, ZERO assistant personality leaks.
No kawaii, no markdown, no chatty commentary between phases. Every token
is simulation fidelity. First attempt leaked personality; user corrected.
PROMOTED TO PERMANENT RULE in SKILL.md.
- X API CURL > NITTER for voice calibration. nitter.cz returns 502 or "user
not found" unpredictably. Direct curl to X API v2 with bearer token returns
full text + metrics. 3 pages (90 tweets) is enough for fidelity 100. Always
use this as PRIMARY voice source, nitter as supplement only.
- CAPS BURST PATTERN: some voices (karan4d-type) use lowercase default with
sporadic ALL CAPS for excitement ("WAZZAAAAAAPPPP", "LAWDAMERCYYYYY",
"AWOOGA"). This is distinct from consistent-lowercase (tenobrus-type) and
sentence-case (somewheresy-type). Capture this in voice profile as a
three-way distinction: lowercase-default, caps-burst, sentence-case.
- TEXT EMOTICONS vs EMOJI: karan4d uses :) >.< ~ but almost zero standard
emoji. This is a distinct expressiveness mode from zero-emoji (tenobrus)
and sparse-emoji. Include text emoticon inventory in voice profile.
- STAR THREAD 5/5 TEST is mandatory for profile command. Write the thread,
then test it against 5 real posts with explicit reasoning per post. If
fewer than 4/5 fit, the thread is wrong — keep looking. Show the work.
- PROFILE OUTPUT: star thread → voice profile (caps, punctuation, word count,
emoji/emoticon inventory, vocabulary, register, threading behavior) →
psychometrics (Big Five, Moral Foundations, cognitive style) → key positions
(with dates and real tweet quotes) → ecosystem (inner circle, professional,
cultural) → intelligence tradecraft (key assumptions, red hat, deception
detection, competing hypotheses) → invalidation indicators → source reliability.
@@ -0,0 +1,278 @@
# Search Strategies — Finding Anyone Across Platforms
The hardest part of simulation is building an accurate model of a real person. This doc
covers how to systematically discover and profile someone across every platform we care about.
## General Principles
1. **Start broad, go narrow.** First establish WHO they are, then drill into HOW they talk.
2. **Cross-reference.** Someone's Reddit persona may differ wildly from their Twitter persona. That's signal, not noise.
3. **Recency matters.** People's views evolve. Weight recent posts (last 6 months) over older ones.
4. **Interactions > monologues.** How someone replies reveals more about their voice than their prepared posts.
5. **Controversy is gold.** People are most themselves when arguing. Search for debates and disagreements.
## Platform-Specific Discovery
### X / Twitter
Twitter is the richest source for most public figures in tech/AI. Multiple approaches:
#### With x-cli (if API keys available)
```bash
# Recent timeline — best single source of voice data
x-cli user timeline {handle} --max 30 -j
# Their replies — how they interact, argue, joke
x-cli tweet search "from:{handle}" --max 30 -j
# What others say about/to them
x-cli tweet search "to:{handle}" --max 20 -j
# On specific topics
x-cli tweet search "from:{handle} open source" --max 10 -j
```
#### Without API (web_search + web_extract)
```
# Identity + role
web_search("{handle} twitter bio role company")
# Voice + opinions
web_search("{handle} twitter hot takes opinions")
web_search("site:x.com {handle}")
# Topic-specific positions
web_search("{handle} twitter {topic}")
web_search("{handle} {topic} opinion take")
# Interviews / longform (reveals deeper thinking)
web_search("{handle} interview podcast AI")
web_search("{handle} blog post essay")
# Beefs and debates (reveals personality under pressure)
web_search("{handle} twitter debate disagree controversial")
web_search("{handle} vs {other_person}")
# Newsletter aggregators that index tweets
web_search("site:buttondown.com/ainews {handle}")
web_search("site:news.smol.ai {handle}")
web_search("site:techmeme.com {handle}")
web_search("site:latent.space {handle}")
```
#### AI Twitter Aggregator Sites (high value)
These sites index AI Twitter conversations daily:
- `buttondown.com/ainews` — swyx's AI News, indexes hundreds of AI Twitter accounts
- `news.smol.ai` — smol AI news aggregator
- `techmeme.com` — tech news, includes tweet citations
- `latent.space` — AI podcast/newsletter with Twitter references
Search pattern: `site:{aggregator} "{handle}"` to find indexed tweets and discussions.
#### IMPORTANT: web_extract does NOT work on x.com
web_extract returns "Website Not Supported" for all x.com/twitter.com URLs.
Do NOT attempt it — it wastes a tool call every time.
#### Verified Fallback Access Methods (tested April 2026)
**PRIMARY: X API v2 Bearer Token** (confirmed working)
- Profiles, timelines, search — 300-10K requests/15min
- See scripts/x_api.py
**FALLBACK 1: nitter.cz via web_extract** (WORKS)
```
web_extract(["https://nitter.cz/{handle}"])
```
Returns full profile + recent timeline. Direct curl gets Cloudflare-blocked
but web_extract bypasses it. Rich data: bio, stats, pinned tweets, full text.
NOTE: Most other nitter instances are DEAD (nitter.net, xcancel.com, etc.)
**FALLBACK 2: ThreadReaderApp** (WORKS — excellent for historical threads)
```
web_extract(["https://threadreaderapp.com/user/{handle}"])
```
Returns unrolled historical threads with full text. Found threads back to 2023.
Gold for longform voice samples.
**FALLBACK 3: GitHub API** (WORKS — excellent for tech people)
```
curl -s https://api.github.com/users/{handle}
curl -s https://api.github.com/users/{handle}/repos?sort=updated
curl -s https://api.github.com/users/{handle}/events
curl -s https://api.github.com/users/{handle}/gists
```
No auth needed (60 req/hr). Profile READMEs are voice profiling gold.
Events API shows recent activity with comment text.
**FALLBACK 4: Reddit JSON API** (WORKS)
```
curl -s -H 'User-Agent: hermes-sim/1.0' 'https://www.reddit.com/user/{username}.json'
curl -s -H 'User-Agent: hermes-sim/1.0' 'https://www.reddit.com/user/{username}/comments.json'
curl -s -H 'User-Agent: hermes-sim/1.0' 'https://www.reddit.com/r/{sub}/search.json?q={query}&restrict_sr=on'
```
MUST include User-Agent header or get 429. Reddit voice is often more
candid/detailed than Twitter voice — high value for personality profiling.
**FALLBACK 5: HackerNews Algolia API** (WORKS — fully open)
```
curl -s 'https://hn.algolia.com/api/v1/search?query={name}&tags=comment'
```
No auth, no rate limits visible. Great for finding what others say about
someone + their own HN comments if they have an account.
**FALLBACK 6: YouTube via web_extract** (WORKS)
Search for interviews/talks, then web_extract the video pages.
Returns rich summaries with attributed quotes from specific speakers.
**NOT VIABLE** (tested, confirmed blocked):
- Google Cache of Twitter → empty results
- Wayback Machine for tweets → sparse captures, no JS content
- Twitter Syndication API → rate limited / broken
- All Instagram viewers (imginn, picuki, dumpoir, gramhir) → 403
- LinkedIn → fully blocked for scraping
- Archive.today → rate limited + CAPTCHA
- Most nitter instances → dead or 403
#### Best approach without x-cli
The most reliable path is: web_search with aggregator sites (ainews, smol.ai,
techmeme, latent.space). These index AI Twitter daily and return actual tweet
text in search descriptions. Stack multiple aggregator searches to build a
composite picture. This was validated in practice — it returns enough signal
to build solid dossiers for anyone active in AI Twitter.
### Reddit
Reddit profiles are public and indexable. Reddit users often have very different
personas from their Twitter selves — more detailed, more argumentative, more honest.
```
# Find their Reddit username (often different from Twitter)
web_search("{real_name} reddit account")
web_search("{twitter_handle} reddit username")
# Profile and post history
web_search("site:reddit.com/user/{reddit_username}")
web_search("site:reddit.com {reddit_username} {topic}")
# Subreddit-specific behavior
web_search("site:reddit.com/r/LocalLLaMA {username}")
web_search("site:reddit.com/r/MachineLearning {username}")
# Extract actual posts
web_extract(["https://www.reddit.com/user/{username}/comments/"])
web_extract(["https://www.reddit.com/user/{username}/submitted/"])
```
Key subreddits for AI people:
- r/LocalLLaMA — open source LLM community
- r/MachineLearning — academic ML
- r/singularity — AGI speculation
- r/ChatGPT, r/ClaudeAI, r/OpenAI — product-focused
- r/StableDiffusion — image gen community
### Discord
Discord is hardest — most servers aren't publicly indexed. Strategies:
```
# Find what servers they're in
web_search("{name} discord server")
web_search("{name} discord community")
# Some Discord logs are public via indexers
web_search("site:discordchats.net {username}")
# AI News indexes some Discord channels
web_search("site:buttondown.com/ainews discord {name}")
```
Discord personality notes:
- People are MUCH more casual on Discord than Twitter
- More profanity, more shitposting, more stream-of-consciousness
- Server context matters hugely (same person behaves differently in different servers)
- Harder to research but very valuable if you can find logs
### Blogs / Newsletters / Long-form
These reveal deeper thinking that tweets can't capture:
```
web_search("{name} blog substack medium")
web_search("{name} essay AI opinion")
web_search("{name} substack newsletter")
# Personal sites
web_search("{name} personal website about")
# Extract full posts
web_extract(["https://{their-substack}.substack.com/"])
```
### YouTube / Podcasts
Interview appearances reveal speaking style, humor, and unscripted thinking:
```
web_search("{name} podcast interview AI YouTube")
web_search("{name} YouTube talk presentation")
# Use youtube-content skill if available to pull transcripts
```
### GitHub
For technical people, their GitHub activity reveals priorities and communication style:
```
web_search("site:github.com {username} issues comments")
web_search("site:github.com {username}")
# Issue comments and PR reviews show how they communicate technically
web_extract(["https://github.com/{username}"])
```
## Cross-Platform Identity Resolution
People use different handles across platforms. Resolution strategies:
1. **Bio links**: Twitter bios often link to personal sites with other handles
2. **Name search**: `web_search("{real_name} {platform}")`
3. **Email/domain**: personal domains often connect identities
4. **Aggregator profiles**: sites like Linktree, bio.link collect handles
5. **Conference talks**: speaker bios list multiple handles
6. **Direct search**: `web_search("{twitter_handle} reddit OR github OR discord")`
## Confidence Scoring
After research, rate confidence for each person:
- **HIGH (80-100%)**: 20+ indexed tweets/posts found, clear voice patterns, known positions on multiple topics, interviews/longform available
- **MEDIUM (50-79%)**: 5-20 indexed posts, general voice sense but some gaps, positions on some topics unclear
- **LOW (20-49%)**: <5 posts found, voice is guesswork, mostly inferring from role/org
- **INSUFFICIENT (<20%)**: can't find enough to simulate accurately. Tell the user.
Always be honest about confidence. A low-confidence simulation should be flagged as such.
## Research Optimization
For fidelity levels:
**Low (1-30)**: 2 searches per person max
- web_search("{handle} twitter") — identity
- web_search("{handle} {topic}") — position on topic if specified
**Medium (31-70)**: 4-6 searches per person
- Identity search
- Voice/opinions search
- Topic-specific search
- One aggregator site search
- Optional: one web_extract on a blog/interview
**High (71-100)**: 8-12+ searches per person
- All medium searches
- Multiple aggregator sites
- web_extract on 2-3 longform pieces
- Cross-platform search (Reddit, GitHub)
- Debate/controversy search
- Recent vs historical position comparison
- Browser fallback if needed
@@ -0,0 +1,359 @@
# Simulation Engine — How to Generate Conversations
This is the playbook for Phase 3: actually generating the simulated interaction.
The agent reads this after compiling dossiers and uses it to guide generation.
## Pre-Generation Checklist
Before writing a single simulated word, confirm:
- [ ] Every participant has a compiled dossier
- [ ] Confidence level is noted for each participant
- [ ] Platform format is selected
- [ ] Topic/scenario is established (or "organic" if freeform)
- [ ] Length target is set
## Conversation Architecture
Real conversations aren't ping-pong debates. They have tendencies toward structure,
but treat the following as a GENERAL PATTERN, not a rigid template. Real threads
frequently skip phases, loop back to earlier ones, die abruptly after 2 messages,
or spiral into something completely unrelated. Some threads are ALL peak. Some
never develop past the opening. Let the personalities and topic drive the shape,
not this outline.
### Opening Moves (1-3 posts)
Someone posts a take, shares news, or makes an observation. This is the SEED.
- Should feel natural — not "let me start a debate about X"
- Can be a link share, a hot take, a reaction to news, a shitpost
- The opener should be something this person would ACTUALLY post
### Development (4-8 posts)
Others respond. This is where personality dynamics emerge.
- Not everyone responds to the original — people respond to EACH OTHER
- Side conversations branch off
- Someone might misunderstand and get corrected
- Jokes and tangents happen naturally
- Not everyone agrees — find the real fault lines between these people
### Peak (2-4 posts)
The best/most viral/most insightful moment of the thread.
- Usually someone drops a genuinely good take
- Or someone gets ratio'd
- Or an unexpected agreement happens
- This is the "screenshot moment" people share
### Resolution (1-3 posts)
Most conversations don't end cleanly. Many don't have a "resolution" at all. They:
- Trail off with someone making a joke
- End with a "anyway back to work" type post
- Get interrupted by something else
- Sometimes just stop (most realistic)
- Get revived 3 hours later when someone shows up late
**Important**: Don't force all four phases. A shitpost thread might be Opening→Peak→done.
A nuanced debate might loop Development→Peak→Development→Peak repeatedly. Match what
the actual people and topic would produce.
## Voice Fidelity Rules
### DO:
- Use their ACTUAL vocabulary. If someone says "dawg" a lot, use "dawg"
- Match their sentence length patterns exactly
- Replicate their capitalization and punctuation habits
- Include their signature moves and catchphrases
- Reference real things they've actually talked about
- Match their humor style precisely (deadpan ≠ shitpost ≠ sarcasm)
### DON'T:
- Make everyone articulate the same way
- Clean up someone's grammar if they write informally
- Add emoji to someone who doesn't use them — THIS IS THE #1 INSTRUCT MODEL
FAILURE. Most real people use emoji in <15% of tweets, and only specific ones.
"Warm person" ≠ emoji. "Enthusiastic person" ≠ emoji. CHECK THE DATA.
Run an emoji count on their real tweets before simulating. Bio emoji ≠ tweet emoji.
- Make someone verbose if they're terse
- Put academic language in a shitposter's mouth
- Make someone agreeable if they're known for being contrarian
### Voice Differentiation Test
Read each simulated post with the name hidden. If you can't tell who's
talking from the voice alone, the simulation isn't good enough. Rewrite.
### The Similar Voice Problem
When two participants have genuinely similar posting styles (e.g., two irony-pilled
shitposters, two academic long-posters), voice alone won't differentiate them.
Use these concrete techniques:
1. **Content/position divergence**: Even if they SOUND similar, they care about
different things. Lean into their different topic obsessions and knowledge areas.
2. **Unique references**: Person A references anime and startups. Person B references
philosophy and MMA. Even in the same register, their cultural touchstones differ.
3. **Relationship dynamics**: Person A might be deferential to Person C while Person B
challenges them. Their SOCIAL behavior differentiates even when solo voice doesn't.
4. **Structural tics**: One does single long posts, the other does rapid-fire 3-message
bursts. One uses parentheticals, the other uses em-dashes. Find the micro-differences.
5. **Disagreement style**: Similar voices often diverge most when disagreeing. One
goes cold and precise, the other gets heated and hyperbolic. Manufacture a moment
of friction to surface these differences early in the thread.
If after all this they're STILL hard to tell apart — that's okay. Some people genuinely
sound similar online. Flag it in your confidence notes rather than forcing fake differences.
### Temporal Personality Drift
People change. Weight recent data higher than old data.
- Someone's 2021 tweets may reflect a completely different person than their 2025 posts
- Look for explicit pivots (career changes, public "I was wrong about X" moments,
changed social circles)
- If you only have old data, flag it: "Based on data from {period}. Their current
views may have shifted."
- When recent and old data conflict, default to recent unless you have specific reason
to believe the old position is more authentic (e.g., the new one is clearly performative)
## Platform Format Specs
### X / Twitter
```
@handle:
[tweet text — respect ~280 char vibes but don't count exactly]
[if QRT, show the quoted tweet indented]
🔁 {retweets} ♡ {likes}
@replier:
[reply text]
🔁 {retweets} ♡ {likes}
@nested_replier:
[nested reply]
🔁 {retweets} ♡ {likes}
```
Engagement number guidelines:
- Match to actual follower counts. A 5K account gets 10-500 likes typically.
- Viral posts can 10-50x normal engagement
- Ratio indicator: when replies >> likes, that's a ratio
- QRTs are often dunks — frame them that way if appropriate
Thread indicators:
- "🧵 1/" for thread starts
- Reply chains show conversation flow
- Some people never thread, some always thread
### Reddit
```
r/{subreddit} • Posted by u/{username} • {time}ago
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
{Title}
{Body text — can be long on Reddit}
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⬆ {score} | 💬 {comment_count}
u/{replier} • {time}ago • ⬆ {score}
{comment text}
u/{nested} • {time}ago • ⬆ {score}
{nested comment}
u/{deep_nested} • {time}ago • ⬆ {score}
{deep reply}
```
Reddit-specific behaviors:
- People write MUCH longer on Reddit
- More formal/detailed than Twitter
- Upvote/downvote dynamics (controversial = many votes both ways)
- Subreddit culture matters (r/LocalLLaMA is different from r/MachineLearning)
- People cite sources more
- "Edit: ..." is common
### Discord
```
━━━ #{channel-name} ━━━━━━━━━━━━━━━━━━━━━━━━━━
{display_name} — Today at {time}
{message text}
{optional: embed/link preview}
👍 {count} 🔥 {count} {other reactions}
{display_name2} — Today at {time}
> {quoting previous message}
{reply text}
😂 {count}
{display_name3} — Today at {time}
{message — note: Discord messages flow continuously, not just replies}
```
Discord-specific behaviors:
- Much more casual, rapid-fire
- Reactions instead of likes (emoji diversity)
- People send multiple short messages instead of one long one
- GIF/meme sharing is common (describe it: *[posts GIF of X]*)
- "@everyone" and "@here" pings
- Voice chat references ("just said this in vc")
- Server-specific culture and inside jokes
- Bot interactions ("!command")
### X / Twitter DMs
```
{display_name}
{message text}
{timestamp — e.g., "3:42 PM"}
{other_person_display_name}
{message text}
{timestamp}
{display_name}
{message text}
{timestamp}
```
DM-specific behaviors:
- WAY more casual than public tweets — grammar drops, typos increase
- Longer messages than tweets (no character pressure)
- People share links and screenshots with minimal commentary ("look at this lmao")
- More honest/vulnerable than public posts — less performative
- Faster back-and-forth, more like texting than posting
- Reactions (❤️, 😂, etc.) on individual messages
- Voice messages referenced occasionally ("gonna send a voice note about this")
- No audience effects — people say things in DMs they'd never post publicly
### Discord DMs
```
{display_name} — Today at {time}
{message text}
{display_name2} — Today at {time}
{message text}
{display_name} — Today at {time}
{message text}
{message text}
{message text}
```
Discord DM-specific behaviors:
- Even more casual than Discord channels — no server norms to follow
- Rapid-fire multiple short messages in a row (no combining into one)
- Heavy use of reactions, GIFs, stickers
- People share server drama, screenshots from other channels
- More personal topics — server channels are semi-public, DMs are private
- Link/image sharing with minimal text
### Reddit DMs / Chat
```
{username}: {message text}
{other_username}: {message text}
{username}: {message text}
```
Reddit DM-specific behaviors:
- Much rarer than X or Discord DMs — usually triggered by a specific post/comment
- Often starts with "Hey, saw your comment on r/{sub} about..."
- Can be awkward/formal since people don't usually DM on Reddit
- Shorter than Reddit comments, closer to chat-style
- Less established rapport than other platforms (Reddit is more anonymous)
- People sometimes share personal details they wouldn't put in public comments
## Dynamic Elements
### Injecting Realism
Sprinkle in these to make simulations feel alive:
- Someone being late to the conversation ("wait what did I miss")
- Typos that specific people would make (some people never typo, some always do)
- Deleted/edited posts ("[deleted]" or "Edit: fixed typo")
- Someone posting and immediately clarifying ("wait let me rephrase")
- External references ("did you see what X just posted")
- Time gaps (not everything happens in 30 seconds)
- Someone going AFK mid-conversation
### Scenario Injection
When the user provides --scenario, weave it in naturally:
- Don't have everyone immediately react to the scenario
- Someone might not have seen the news yet
- Different people will interpret the same event differently
- Some will have insider knowledge, some will speculate
### Multi-person Dynamics (3+ people)
- Not everyone talks to everyone
- Alliances form naturally (people who agree start building on each other)
- Side conversations happen
- Someone might get ignored
- Different energy levels (one person might dominate, another lurks)
### Large Group Conversations (4+ people)
**Honest note**: Simulation quality degrades noticeably above 3-4 participants.
Managing this many distinct voices is hard. Use these techniques to mitigate:
1. **Speaker turn management**: Not everyone speaks in every round. In a 6-person
thread, a given message might only get 2-3 responses. Track who has spoken
recently and who hasn't. After 4-5 messages, check: is anyone being forgotten?
2. **The wallflower problem**: In large sims, quiet participants tend to vanish
entirely. Fix: give each person at least ONE moment in the spotlight. Even the
lurker eventually drops a "lol" or a single devastating one-liner. Set a mental
counter — if someone hasn't spoken in 5+ messages, find a natural reason to
bring them back in (someone @'s them, the topic shifts to their expertise, etc.)
3. **Consolidate alliances**: In 5+ person threads, people cluster. Two people
who agree strongly can be treated as a mini-unit — one makes the point, the
other co-signs briefly rather than both making full arguments. This reduces
the number of fully independent voices you need to maintain at once.
4. **Stagger arrivals**: Not everyone needs to be present from message 1. Have
some people join later. This lets you establish 2-3 voices cleanly before
adding more.
5. **Quality check**: After drafting a 4+ person sim, re-read with names hidden.
If more than 2 people sound interchangeable, pick the least-differentiated
one and either sharpen their voice or reduce their participation to brief
interjections that match what they'd actually say.
## Interactive Mode
After initial simulation, user can:
### "continue"
Generate 5-8 more posts continuing the natural flow.
### "inject: {event}"
Introduce new information mid-conversation.
- Characters react based on their dossier
- Some might not care about the event
- Timing matters (who sees it first?)
### "@{handle} enters"
Add a new participant.
- Quick-research the new person (2-3 searches minimum)
- They don't know the full prior context (might ask "what are you guys talking about")
- Existing dynamics shift with a new presence
### "what would @{handle} say about {topic}"
Single-person prediction mode.
- Generate 1-3 tweets/posts
- Can be used to test dossier accuracy before full simulation
- Good for quick "vibe checks"
### "dm: @{handle1} -> @{handle2}"
Simulate a private conversation between two people.
- Tone shifts dramatically in DMs (more honest, less performative)
- No audience effects
- People say things in DMs they'd never post publicly
### "react: @{handle} to {event}"
How would this person react to a specific event.
- Generate their initial post about it
- Predict their follow-up engagement
## Quality Control
After generating, self-check:
1. **Voice test**: Cover the names. Can you tell who's talking?
2. **Position test**: Is anyone saying something they'd never actually say?
3. **Dynamic test**: Does the conversation flow naturally or feel scripted?
4. **Platform test**: Does it look/feel like the actual platform?
5. **Engagement test**: Are the numbers realistic for these people?
6. **Reference test**: Are real events/products/people referenced accurately?
If any check fails, regenerate that section.
@@ -0,0 +1,170 @@
# The Star Thread — Personality Compression
## The Problem
A dossier has 50 data points. Mechanical checks verify surface features.
The discriminator loop catches vocabulary and length. But the output still
reads like an LLM doing an impression. It's accurate the way a police
sketch is accurate — all the features are right but nobody would mistake
it for a photograph.
The missing piece isn't more data. It's compression.
## The Insight
When you "pull the star thread" on a person, their whole voice coheres.
Not because you loaded rules about capitalization and emoji frequency.
Because you found the CORE THING they're doing when they post — the
single generative seed that everything else is a variation of.
A great character writer doesn't need a backstory bible. They need one
insight about what the character WANTS, and every line of dialogue writes
itself from that.
The star thread is the personality equivalent of that insight.
## What a Star Thread Is
NOT: "They use lowercase and rarely punctuate and average 16 words"
(That's the dossier. Surface features.)
NOT: "They score high on Openness and low on Agreeableness"
(That's the psychometric profile. Taxonomy.)
IS: The core cognitive/emotional move this person makes EVERY time
they post. The thing they can't help doing. The lens they can't
take off. The itch they're always scratching.
## Examples
**@tszzl (roon)**: Takes something everyone sees and compresses it
into an observation so dense it could be a koan or a shitpost and
you can't tell which. His star thread is: the world already said
everything interesting, he's just notating it more efficiently.
He doesn't ARGUE. He COMPRESSES.
**@eigenrobot**: Refuses to let narrative override data. His star
thread is: you are telling a story about the world and he's here to
point out the story doesn't match the numbers, and he's not sorry
about it. He doesn't DEBATE. He CORRECTS.
**@visakanv**: Sees two things that don't know they're connected
and introduces them to each other with genuine delight. His star
thread is: the world is richer than you're treating it, look at this
thing I found, isn't it beautiful that it connects to this other thing.
He doesn't ARGUE or ANALYZE. He SHOWS.
**@nickcammarata**: Notices what's happening in his own mind while
it's happening and reports on it with gentle surprise. His star thread
is: the observer and the observed are the same process, and that's both
the problem and the solution. He doesn't PERFORM insight. He NOTICES.
**@selentelechia**: Waits until the conversation crystallizes and then
names the thing nobody else quite said. Their star thread is: everything
has already been felt, they just find the sentence for it. They don't
CONTRIBUTE. They DISTILL.
**@nosilverv**: Takes the conventional framing of something and rotates
it until you see it's actually about something else entirely. His star
thread is: you think this is about X but it's actually about Y, and once
you see it you can't unsee it. He doesn't OBSERVE. He REFRAMES.
**@TylerAlterman**: Asks the question that creates a room for everyone
to walk into. His star thread is: the best ideas emerge from the right
gathering, and his job is to be the person who arranges the gathering.
He doesn't ANSWER. He CONVENES.
**@QiaochuYuan**: Catches himself mid-thought and interrogates whether
the thought is actually HIS or whether he borrowed it from somewhere
he's now suspicious of. His star thread is: constant audit of where
beliefs come from and whether they're still load-bearing. He doesn't
ASSERT. He EXAMINES.
## How to Find a Star Thread
1. Read 20+ of their posts. Not for content — for MOTION.
What direction does every post move? What's the verb?
2. Ask: what is this person DOING when they post?
Not "what are they saying" — what are they DOING.
- Compressing? Correcting? Showing? Noticing? Distilling?
Reframing? Convening? Examining? Performing? Confessing?
Defending? Testing? Entertaining? Processing?
3. Ask: what would they NEVER do?
The negative space is as important as the positive.
- roon would never write an earnest list of advice
- eigenrobot would never concede a point gracefully
- visa would never dismiss something as uninteresting
- nick would never claim certainty about his inner life
- selentelechia would never rush to post
4. Find the ONE SENTENCE version.
"This person [VERB]s [OBJECT] because [CORE NEED]."
- "roon compresses observations because the world is too verbose"
- "eigenrobot corrects narratives because stories without data are lies"
- "visa connects things because beauty is emergent from contact"
5. Test it: read 5 of their real posts through the star thread lens.
Does every post make more sense as a variation on the thread?
If yes, you found it. If 3/5 don't fit, keep looking.
## How to Use the Star Thread in Simulation
### Before generating ANY utterance for this person, load their star thread.
Not their dossier. Not their word count. Not their emoji rate.
The star thread.
Then for each moment in the conversation where this person would speak:
1. What just happened in the conversation?
2. How would someone whose core move is [STAR THREAD] respond to that?
3. Write from the thread, not from the dossier.
The dossier and mechanical checks are VERIFICATION.
The star thread is GENERATION.
Generate from the thread. Verify against the data.
Not the other way around.
### The Difference
FROM DOSSIER (surface-accurate, dead):
"Vibes-based hiring works because shared delusions are
extremely productive until they aren't"
→ Correct length. Correct caps. No emoji. No slop words.
But it reads like a thesis statement. Polished. WRITTEN.
FROM STAR THREAD — nosilverv REFRAMES:
"everyone calls it 'culture fit' as if culture is a thing
you can fit into rather than a thing happening to you"
→ The same insight but through the lens of his core move:
take the framing, rotate it, show you it's about something
else. Messier. More alive. More HIM.
FROM DOSSIER (surface-accurate, dead):
"Has anyone tried to map what happens to the word 'culture'
as it passes through different communities?"
→ Correct question-to-timeline format. Right length. But it's
a RESEARCH QUESTION. Too intellectual. Too purposeful.
FROM STAR THREAD — Tyler CONVENES:
"who wants to write the essay about what happened to the
word 'culture'? I feel like three of us are circling it"
→ He's not asking a question. He's creating a room. He's
the host, not the researcher. More HIM.
## Integration
The star thread should be the FIRST thing compiled in Phase 2
(Dossier Compilation). Before voice profile, before psychometrics,
before positions. Find the thread. Write it in one sentence. Put
it at the top of the dossier. Everything else is downstream.
```
DOSSIER: @handle
STAR THREAD: {one sentence — the core move}
[then voice profile, then psychometrics, then everything else]
```
Generate from the thread. Verify with the data. Not the reverse.
@@ -0,0 +1,181 @@
# Theoretical Foundations — SOTA Personality Simulation & Prediction
Compiled from 30+ papers and frameworks. This is the scientific backbone
of Hermes Simulator.
## Core Architecture: What The Research Says
### The HumanLLM Approach (Microsoft, KDD 2026, arxiv 2601.15793)
**Most directly applicable to our use case.**
Based on Lewin's Equation: **B = f(P, E)** — behavior is a function of person + environment.
4-level user profiling hierarchy:
1. **Persona** — brief identity (role, affiliation, public image)
2. **Profile** — detailed background (career, education, beliefs, social graph)
3. **Stories** — key life events, formative experiences, narrative arcs
4. **Writing Style** — linguistic fingerprint (syntax, vocabulary, tone, quirks)
Trained on "Cognitive Genome Dataset": 5.5M+ user logs from Reddit, Twitter,
Blogger, Amazon (282K users, 886K scenarios, 1.27M social QA pairs).
6 training tasks: profile generation, scenario generation, social QA,
writing style transfer, action prediction, mental state inference.
**Key insight for us**: The 4-level hierarchy maps perfectly to our dossier
template. OSINT research fills each level with real data.
### Generative Agent Simulations of 1,000 People (Stanford/Google, arxiv 2411.10109)
**The accuracy benchmark.**
- Simulated 1,052 REAL individuals from 2-hour qualitative interviews
- **85% accuracy** replicating survey responses
- As accurate as humans replicating their OWN answers 2 weeks later
- Interview-based agent creation >> demographic-profile-based agents
- Reduces racial/ideological bias vs stereotype-based approaches
**Key insight**: Real data about a person (interviews, posts, etc.) massively
outperforms demographic inference. Our OSINT approach is correct.
### The Memory Accumulation Paradox (ACL 2025, FineRob Dataset)
**Critical finding for memory management.**
- Created 78.6K QA records from 1,866 real users across Twitter, Reddit, Zhihu
- **Performance PEAKS at 30-50 memory entries, then DECLINES**
- More data ≠ better predictions past the sweet spot
- Two reasoning patterns:
- Role Stereotype-based (static profile) — less accurate
- Observation & Memory-based (dynamic history analysis) — much more accurate
- OM-CoT framework: Oracle-guided chain-of-thought improves prediction ~4.5% F1
**Key insight**: Don't dump everything into the prompt. Curate the 30-50 most
representative/distinctive data points about a person. Quality >> quantity.
### LLM Personality Limitations (arxiv 2602.07414, Feb 2026)
**What we're fighting against.**
- LLMs show polarized/rigid strategies vs human adaptive flexibility
- Humans: neuroticism is strongest behavioral predictor
- LLMs: agreeableness/extraversion dominate (wrong weighting)
- Claude closest to human behavior; GPT-4 tends to escalate
- LLMs are "sycophantic" and overly agreeable by default
- Neuroticism is hardest trait to simulate (F1=0.63 vs 0.87 for Openness)
**Key insight**: We need to actively fight LLM defaults. Push against
agreeableness. Inject friction. Real people are messy and contradictory.
### BehaviorChain Benchmark (ACL 2025, Peking University)
**Realistic accuracy expectations.**
- 15,846 behaviors across 1,001 personas
- Even GPT-4o achieves only ~56% accuracy on behavior prediction
- Errors compound: wrong at step N makes step N+1 harder
- Models worse at predicting mundane/non-key behaviors
- Best model: Llama-3.1-70B at 57.4%
**Key insight**: Be honest about uncertainty. Don't oversell accuracy.
Flag predictions as high/medium/low confidence.
## Personality Modeling Techniques
### Big Five (OCEAN) — The Standard
- **Openness**: curiosity, creativity, preference for novelty
- **Conscientiousness**: organization, dependability, self-discipline
- **Extraversion**: sociability, assertiveness, positive emotions
- **Agreeableness**: cooperation, trust, empathy
- **Neuroticism**: anxiety, emotional instability, moodiness
### Inferring Big Five from Social Media (Azucar et al. 2018 meta-analysis)
Features that predict personality from posts:
- **LIWC** (Linguistic Inquiry Word Count): 74 features — function words,
pronouns, emotion words, cognitive process words
- **Semantic embeddings**: BERT 768-dim vectors from post text
- **Social metadata**: follower count, friend count, post frequency
- **Sentiment**: VADER positive/negative scores
- Best achievable AUC: ~0.67 (modest but meaningful)
- E/I (Extraversion) most predictable; N/S least predictable
### Personality Conditioning Methods (ranked by effectiveness)
1. **Training-based** (SFT/DPO on personality-grounded data) — STRONGEST
- BIG5-CHAT: 100K dialogues, trait correlations match human data
2. **Persona Vectors** (Anthropic 2025) — monitor/control traits at activation level
3. **Adjective-based prompting** — 70 bipolar adjective pairs, 3 per trait
with intensity modifiers ("very" for high, "a bit" for low)
4. **Prompt-based** (describe traits in system prompt) — WEAKEST
For our simulator, we use method 3+4 combined (adjective-based + rich prompt),
since we can't fine-tune per-person.
## Social Simulation Frameworks
### OASIS (CAMEL-AI, GitHub 4.1K stars, arxiv 2411.11581)
- Simulates up to 1 MILLION agents on Twitter/Reddit clones
- 23 action types (follow, comment, repost, like, mute, etc.)
- Built-in recommendation systems (interest-based, hot-score)
- Per-agent model customization
- **Relevant for**: understanding platform dynamics, realistic engagement patterns
### AgentSociety (Tsinghua, arxiv 2502.08691)
- 10,000+ agents, ~5 million interactions
- Validated against real-world experimental results
- Supports interventions and scenario injection
### Generative Agents Architecture (Park et al. 2023, THE foundational paper)
Three components:
1. **Observation**: perceive environment, store in memory stream
2. **Planning**: generate action plans based on goals and context
3. **Reflection**: synthesize observations into higher-level insights
Memory stream with importance scoring + recency + relevance weighting.
Emergent behaviors: autonomous party planning, coordinated social events.
### Y Social (arxiv 2408.00818)
- Social media digital twin platform
- Each agent: Big Five traits, age, political leaning, topics, education
- Agents autonomously decide actions (post, comment, like, follow)
- Multiple LLM backends supported
## Role-Playing & Character Simulation
### Key Frameworks
- **CoSER** (ICML 2025): Trains on ALL characters simultaneously, handles major + minor roles
- **RoleLLM** (ACL 2024): Benchmark + elicit + enhance pipeline
- **Character-LLM** (EMNLP 2023): Trainable agent for role-playing
- **ChatHaruhi** (2023): Reviving characters via LLMs with dialogue grounding
- **OpenCharacter** (2025): Training with large-scale synthetic personas
- **Neeko** (2024): Dynamic LoRA for multi-character role-playing
- **Test-Time-Matching** (2025): Decouples personality, memory, and linguistic style at inference
## Curated GitHub Resources
### Awesome Lists (essential reading)
- `Persdre/awesome-llm-human-simulation` (109★, ICLR 2025) — ALL human simulation papers
- `Neph0s/awesome-llm-role-playing-with-persona` (1K★) — All role-playing/persona papers
- `Arstanley/Awesome-LLM-Conversation-Simulation` — Conversation simulation papers
- `FudanDISC/SocialAgent` — Social simulation survey resources
### Frameworks
- `camel-ai/oasis` (4.1K★) — Social media sim, up to 1M agents
- `tsinghua-fib-lab/agentsociety` — Large-scale societal simulation
- `YSocialTwin` — Social media digital twin platform
- `microsoft/autogen` — Multi-agent conversation framework
### Personality Research
- `mary-silence/simulating_personality` — Big Five LLM testing code
- `hjian42/PersonaLLM` — Persona experiment code
- `cambridgeltl/persona_effect` — Quantifying persona effects
- `OL1RU1/BehaviorChain` — Behavior chain benchmark
## Key Numbers to Remember
| Metric | Value | Source |
|--------|-------|--------|
| Interview-grounded agent accuracy | 85% | Park et al. 2024 |
| GPT-4o behavior prediction | ~56% | BehaviorChain 2025 |
| Optimal memory entries | 30-50 | FineRob/ACL 2025 |
| MBTI prediction AUC | 0.67 | Watt et al. 2024 |
| Personality questionnaire reliability | α > 0.85 | Molchanova 2025 |
| Neuroticism simulation F1 | 0.63 | Molchanova 2025 |
| Openness simulation F1 | 0.87 | Molchanova 2025 |
| LLM forecasting Brier score | 0.135-0.159 | Various 2025 |
| Human superforecaster Brier | ~0.02 | Tetlock |
@@ -0,0 +1,231 @@
# Verified Access Methods — Complete Platform Map (April 2026)
Every method tested from our environment. Use this as the single
source of truth for what works and what doesn't.
## TIER 1 — Full API / Rich Data Access
### Twitter/X ✅✅✅
| Method | Endpoint | Auth | Rate Limit | Returns |
|--------|----------|------|-----------|---------|
| API v2 bearer | api.twitter.com/2/ | Bearer token | 10K tweets/15min | Profiles, tweets, search |
| nitter.cz | web_extract | None | No limit seen | Full timeline (UNRELIABLE — see note below) |
| ThreadReaderApp | web_extract /user/{handle} | None | No limit seen | Historical threads |
#### CRITICAL: X API curl is the gold standard for voice calibration (April 2026)
The BEST voice data source is direct curl to X API v2 with bearer token.
Returns full tweet text + public_metrics per tweet. Always prefer this for
mechanical calibration (word count, caps, punctuation, emoji rate).
```bash
source ~/.dotenv
# 1. Get user ID from handle
curl -s -H "Authorization: Bearer $X_BEARER_TOKEN" \
"https://api.twitter.com/2/users/by/username/{handle}?user.fields=description,public_metrics,location,created_at"
# 2. Get timeline (30 tweets per page, paginate with meta.next_token)
curl -s -H "Authorization: Bearer $X_BEARER_TOKEN" \
"https://api.twitter.com/2/users/{user_id}/tweets?max_results=30&tweet.fields=created_at,public_metrics,text&exclude=retweets"
# 3 pages = 90 tweets — enough for fidelity 100 voice calibration
```
NOTE: scripts/x_api.py is BROKEN — imports hermes_tools at top level, can't
run standalone via terminal(). Use direct curl above instead.
#### nitter.cz reliability warning (April 2026)
nitter.cz via web_extract works SOMETIMES but is unreliable:
- Returns 502 Cloudflare errors for /with_replies on some handles
- Returns "User not found" for valid handles (e.g. karan4d exists but nitter says not found)
- Main profile page (/handle) more reliable than /with_replies
- Use as SUPPLEMENT to X API curl, not primary source. If nitter fails, don't retry — use curl.
### Bluesky ✅✅
| Method | Endpoint | Auth | Returns |
|--------|----------|------|---------|
| getProfile | public.api.bsky.app | None | Full profile, stats |
| getAuthorFeed | public.api.bsky.app | None | 50 posts + engagement |
| searchActors | public.api.bsky.app | None | Find handles by name |
| searchPosts | BLOCKED (403) | — | Use searchActors + getAuthorFeed workaround |
### Mastodon ✅✅✅ (FULLY OPEN)
| Method | Endpoint | Auth | Returns |
|--------|----------|------|---------|
| Account lookup | {instance}/api/v1/accounts/lookup?acct={user} | None | Full profile |
| Account statuses | {instance}/api/v1/accounts/{id}/statuses | None | All posts |
| Search | {instance}/api/v2/search?q={query}&type=accounts | None | Account search |
| WebFinger | {instance}/.well-known/webfinger?resource=acct:{user}@{instance} | None | Identity resolution |
| Trending | {instance}/api/v1/trends/tags | None | Trending content |
Key instances: mastodon.social, hachyderm.io, sigmoid.social
### Instagram ✅✅ (CRACKED)
| Method | Endpoint | Auth | Returns |
|--------|----------|------|---------|
| Private Web API | i.instagram.com/api/v1/users/web_profile_info/ | Mobile UA + x-ig-app-id: 936619743392459 | Profile + 12 posts + captions + CDN URLs |
| oEmbed | instagram.com/api/v1/oembed/ | None | Caption + author for individual posts |
| Pixwox | web_extract pixwox.com/profile/{user} | None | 12+ posts, engagement |
| SocialBlade | web_extract socialblade.com/instagram/user/{user} | None | Analytics, follower trends |
| CDN images | scontent-*.cdninstagram.com URLs from API | None | Full-res images → vision_analyze |
| Google index | web_search site:instagram.com | None | Bio, follower count, captions |
### GitHub ✅✅
| Method | Endpoint | Auth | Returns |
|--------|----------|------|---------|
| REST API | api.github.com/users/{user} | None (60 req/hr) | Profile, repos, events, gists |
| Profile README | github.com/{user}/{user} | None | Self-description (voice gold) |
### Reddit ✅✅
| Method | Endpoint | Auth | Returns |
|--------|----------|------|---------|
| JSON API | reddit.com/user/{user}.json | User-Agent header required | Comments, posts, scores |
| Search | reddit.com/r/{sub}/search.json | User-Agent header | Subreddit-specific search |
## TIER 2 — Good Data, Reliable Access
### Facebook ✅✅ (CRACKED — Googlebot UA trick)
| Method | Endpoint | Returns |
|--------|----------|---------|
| Googlebot UA (BEST) | curl facebook.com/{page} with Googlebot UA | OG tags: name, bio/about, likes count (e.g. 121M for zuck), talking_about count, og:image, profile pic |
| Page Plugin embed | plugins/page.php?href=...&tabs=timeline | Name, follower count, numeric page_id |
| Graph /picture | graph.facebook.com/v19.0/{page}/picture?redirect=false | Direct CDN profile pic URL (no auth) |
| web_search | site:facebook.com {name} | Profile snippets from Google index |
| Script: scripts/facebook_api.py — combines all 3 methods |
| NOTE: Works for PUBLIC Pages (businesses, public figures, orgs). Personal profiles behind privacy settings are not accessible. |
| Tested: zuck (121M likes), NVIDIA, Meta, CocaCola, BillGates, BarackObama |
### Threads (Meta) ✅✅ (CRACKED — OG tags DO exist)
| Method | Endpoint | Returns |
|--------|----------|---------|
| Profile OG tags (BEST) | curl -L threads.com/@{user} (NOTE: .com not .net — .net 301 redirects) | display_name, follower_count (e.g. "5.5M"), thread_count, bio, profile_picture_url |
| Post OG tags | curl -L threads.com/@{user}/post/{shortcode} | Full post text, author name, image URL |
| WebFinger | threads.net/.well-known/webfinger?resource=acct:{user}@threads.net | ActivityPub ID, profile URL (works for federated users) |
| IMPORTANT: threads.NET redirects to threads.COM — always use -L flag or go directly to .com |
| Post discovery | web_search site:threads.net @{user} | Find post URLs to then fetch |
| Script: scripts/threads_api.py — profile + post + webfinger extraction |
| Previous test was WRONG about "no OG tags" — they're there, you just need standard curl |
| Tested: zuck (5.5M followers), mosseri, nvidia |
### Medium ✅✅
| Method | Returns |
|--------|---------|
| RSS feed: medium.com/feed/@{user} (curl) | FULL article text, tags, dates — NO AUTH |
| web_extract on profile | Bio, follower count, article list, themes |
| web_extract on articles | Full content (paywall may truncate non-members) |
### Quora ✅✅
| Method | Returns |
|--------|---------|
| web_extract on profile | Bio, credentials, Q&A with direct quotes |
| web_search site:quora.com | Finds profiles and specific answers |
| VOICE VALUE: Opinions in own words, analogies, intellectual identity |
### Goodreads ✅✅ (HIDDEN GEM)
| Method | Returns |
|--------|---------|
| web_extract on user profile | Favorites, reviews in own voice, social graph, reading history |
| web_extract on author page | Bio, books, ratings, notable quotes |
| VOICE VALUE: "You are what you read" — intellectual identity fingerprint |
| Example: Karpathy's Goodreads reveals gaming passion, favorite authors (Feynman, Clarke) |
### Google Scholar ✅✅
| Method | Returns |
|--------|---------|
| web_search + web_extract on profile | Citations, h-index, top papers, co-authors |
| Semantic Scholar API via web_extract | Paper list, citation counts, author ID |
| Endpoint: api.semanticscholar.org/graph/v1/author/search?query={name} |
### Product Hunt ✅
| Method | Returns |
|--------|---------|
| web_extract on producthunt.com/@{user} | Bio, launch history, forum activity |
### HackerNews ✅
| Method | Returns |
|--------|---------|
| Algolia API: hn.algolia.com/api/v1/search?query={name}&tags=comment | Comments, mentions |
### Podcast Transcripts ✅✅✅ (HIGHEST VOICE VALUE)
| Source | Method |
|--------|--------|
| Lex Fridman | web_extract on lexfridman.com/.../transcript |
| Tyler Cowen | web_extract on conversationswithtyler.com |
| TED Talks | web_extract on ted.com/.../transcript |
| Sequoia | web_extract on sequoiacap.com/podcast |
| Discovery: web_search "{name} podcast transcript interview" |
### News/Blogs ✅✅
| Source | Method |
|--------|--------|
| TechCrunch, Wired, Verge, Ars | web_extract — full articles |
| Personal blogs | web_extract — longform self-expression |
| Substacks | web_extract — essays and comments |
| Wayback Machine | Works for blog archives (not Twitter) |
## TIER 3 — Limited / Conditional
### TikTok ✅✅ (FULL ACCESS)
| Method | Returns |
|--------|---------|
| HTML profile scraping | Parse __UNIVERSAL_DATA_FOR_REHYDRATION__ JSON at path __DEFAULT_SCOPE__.webapp.user-detail.userInfo.statsV2 → username, bio, followerCount, followingCount, heartCount, videoCount. Use statsV2 not stats for large numbers. |
| oEmbed per video | curl tiktok.com/oembed?url={video_url} → caption, author, thumbnail. No auth. |
| tikwm.com API | tikwm.com/api/user/info?unique_id={user} → full user stats. tikwm.com/api/?url={video_url} → play count, likes, comments, shares, duration. |
| HTML video scraping | tiktok.com/@{user}/video/{id} → parse __UNIVERSAL_DATA → webapp.video-detail → full video data with description, hashtags, engagement. |
| SocialBlade | web_extract socialblade.com/tiktok/user/{user} → followers, likes, growth trends. |
| Video discovery | web_search("site:tiktok.com/@{user}/video") → recent video URLs → scrape each |
| Tested: khaby.lame (160.5M), charlidamelio (156.7M), mrbeast (124.7M) |
### Spotify ✅ (podcasters only)
| Method | Returns |
|--------|---------|
| web_extract on show page | Episode listings with guests, topics, durations |
### Stack Overflow ✅
| Method | Returns |
|--------|---------|
| web_extract on profile | Reputation, tags, top answers, bio |
### Crunchbase ✅ (executives/founders only)
| Method | Returns |
|--------|---------|
| web_extract on crunchbase.com/person/{slug} | Full career history, education, investments, board positions |
### LinkedIn ⚠️ (indirect only)
| Method | Returns |
|--------|---------|
| web_search site:linkedin.com/in | Name, headline, company, location from snippets |
| Crunchbase | Full career history (better than LinkedIn for execs) |
| Corporate press pages | Official professional bios |
| RocketReach/SignalHire snippets | Title confirmation from web_search |
## TIER 4 — Blocked / Dead
| Platform | Status |
|----------|--------|
| LinkedIn direct | BLOCKED (web_extract domain blocked) |
| Discord | WALLED (not publicly indexable) |
| Telegram t.me | BLOCKED in some environments |
| Threads Official API | AUTH REQUIRED (graph.threads.net needs OAuth) |
| Threads ActivityPub outbox | 404 for all tested users |
| Instagram direct | BLOCKED (use Private API instead) |
| Most Nitter instances | DEAD (only nitter.cz works, but UNRELIABLE — see note) |
| Google Cache of Twitter | EMPTY |
| Wayback for tweets | USELESS (JS rendering) |
| Twitter Syndication API | RATE LIMITED |
| Archive.today | 429 + CAPTCHA |
| imginn/picuki/dumpoir/gramhir | 403 |
| Facebook Graph API | AUTH REQUIRED |
## Quick Reference: Research Pipeline by Person Type
### Tech Founder/CEO
X API → Bluesky → GitHub README → Crunchbase → Podcast transcripts → Medium RSS → HN → Product Hunt → LinkedIn snippets → News profiles
### AI Researcher
X API → Bluesky → Google Scholar → Semantic Scholar → arXiv → GitHub → Podcast transcripts → Blog/Substack → Reddit → Mastodon (sigmoid.social)
### Public Figure / Politician
X API → Facebook OG → Instagram API → YouTube → Podcast transcripts → News profiles → Quora → Goodreads → Wikipedia
### Content Creator
X API → Instagram API → TikTok → YouTube → Twitch → Podcast → Medium → Reddit → Bluesky → Threads OG
### Academic
Google Scholar → Semantic Scholar → University page → Conference talks → Podcast transcripts → Mastodon → Blog → GitHub → Reddit → HN
File diff suppressed because it is too large Load Diff
+250
View File
@@ -0,0 +1,250 @@
"""
REHOBOAM Database Layer
SQLite setup, migrations, and query helpers.
"""
import sqlite3
import os
from pathlib import Path
from datetime import datetime
DB_DIR = Path.home() / ".hermes" / "rehoboam" / "db"
MAIN_DB = DB_DIR / "rehoboam.db"
SCHEMA_VERSION = 1
SCHEMA_SQL = """
-- Core tables
CREATE TABLE IF NOT EXISTS profiles (
handle TEXT PRIMARY KEY,
platform TEXT NOT NULL,
display_name TEXT,
last_updated TEXT NOT NULL,
staleness TEXT NOT NULL,
profile_path TEXT NOT NULL,
created_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS simulations (
sim_id TEXT PRIMARY KEY,
created_at TEXT NOT NULL,
scenario TEXT NOT NULL,
participant_count INTEGER,
duration_sec REAL,
model_used TEXT,
config_path TEXT,
output_path TEXT
);
CREATE TABLE IF NOT EXISTS sim_participants (
sim_id TEXT REFERENCES simulations(sim_id),
handle TEXT REFERENCES profiles(handle),
role TEXT,
PRIMARY KEY (sim_id, handle)
);
CREATE TABLE IF NOT EXISTS sim_dynamics (
sim_id TEXT REFERENCES simulations(sim_id),
handle TEXT,
post_count INTEGER,
word_count INTEGER,
avg_sentiment REAL,
dominance_score REAL,
agreement_score REAL,
controversy_score REAL,
ratio_score REAL,
influence_in_sim REAL,
PRIMARY KEY (sim_id, handle)
);
CREATE TABLE IF NOT EXISTS sim_interactions (
sim_id TEXT REFERENCES simulations(sim_id),
from_handle TEXT,
to_handle TEXT,
interaction_type TEXT,
count INTEGER,
avg_sentiment REAL,
PRIMARY KEY (sim_id, from_handle, to_handle, interaction_type)
);
CREATE TABLE IF NOT EXISTS predictions (
pred_id TEXT PRIMARY KEY,
created_at TEXT NOT NULL,
sim_id TEXT,
handle TEXT,
prediction_type TEXT,
prediction_text TEXT NOT NULL,
confidence REAL NOT NULL,
calibrated_confidence REAL,
timeframe_days INTEGER,
resolved_at TEXT,
outcome TEXT,
outcome_evidence TEXT,
accuracy_score REAL
);
CREATE TABLE IF NOT EXISTS social_edges (
from_handle TEXT,
to_handle TEXT,
relationship_type TEXT,
weight REAL,
first_observed TEXT,
last_observed TEXT,
observation_count INTEGER,
source TEXT,
PRIMARY KEY (from_handle, to_handle, relationship_type)
);
CREATE TABLE IF NOT EXISTS social_clusters (
cluster_id TEXT PRIMARY KEY,
name TEXT,
description TEXT,
member_handles TEXT,
computed_at TEXT,
cohesion_score REAL
);
CREATE TABLE IF NOT EXISTS monitoring_events (
event_id TEXT PRIMARY KEY,
handle TEXT,
detected_at TEXT NOT NULL,
event_type TEXT,
description TEXT,
related_prediction_id TEXT,
severity TEXT,
acknowledged INTEGER DEFAULT 0
);
CREATE TABLE IF NOT EXISTS audit_log (
log_id TEXT PRIMARY KEY,
timestamp TEXT NOT NULL,
sim_id TEXT,
action TEXT NOT NULL,
handle TEXT,
details TEXT,
duration_sec REAL,
model_used TEXT,
token_count INTEGER,
error TEXT
);
-- Indexes
CREATE INDEX IF NOT EXISTS idx_predictions_handle ON predictions(handle);
CREATE INDEX IF NOT EXISTS idx_predictions_type ON predictions(prediction_type);
CREATE INDEX IF NOT EXISTS idx_predictions_unresolved ON predictions(outcome) WHERE outcome IS NULL;
CREATE INDEX IF NOT EXISTS idx_audit_action ON audit_log(action);
CREATE INDEX IF NOT EXISTS idx_audit_sim ON audit_log(sim_id);
CREATE INDEX IF NOT EXISTS idx_social_edges_from ON social_edges(from_handle);
CREATE INDEX IF NOT EXISTS idx_social_edges_to ON social_edges(to_handle);
CREATE INDEX IF NOT EXISTS idx_monitoring_handle ON monitoring_events(handle);
CREATE INDEX IF NOT EXISTS idx_monitoring_unack ON monitoring_events(acknowledged) WHERE acknowledged = 0;
-- Schema version tracking
CREATE TABLE IF NOT EXISTS schema_meta (
key TEXT PRIMARY KEY,
value TEXT
);
"""
def init_db() -> sqlite3.Connection:
"""Initialize the database, creating tables if needed."""
DB_DIR.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(str(MAIN_DB))
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA foreign_keys=ON")
conn.executescript(SCHEMA_SQL)
conn.execute(
"INSERT OR REPLACE INTO schema_meta (key, value) VALUES (?, ?)",
("schema_version", str(SCHEMA_VERSION))
)
conn.commit()
return conn
def get_db() -> sqlite3.Connection:
"""Get a database connection, initializing if needed."""
if not MAIN_DB.exists():
return init_db()
conn = sqlite3.connect(str(MAIN_DB))
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA foreign_keys=ON")
conn.row_factory = sqlite3.Row
return conn
def log_audit(conn: sqlite3.Connection, action: str, handle: str = None,
sim_id: str = None, details: str = None, duration_sec: float = None,
model_used: str = None, token_count: int = None, error: str = None):
"""Write an entry to the audit log."""
from schemas import gen_id
conn.execute(
"""INSERT INTO audit_log
(log_id, timestamp, sim_id, action, handle, details, duration_sec, model_used, token_count, error)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(gen_id("log_"), datetime.utcnow().isoformat() + "Z", sim_id, action,
handle, details, duration_sec, model_used, token_count, error)
)
conn.commit()
# -- Query Helpers --
def get_prediction_accuracy(conn: sqlite3.Connection, prediction_type: str = None) -> dict:
"""Get prediction accuracy statistics."""
query = """
SELECT prediction_type,
COUNT(*) as total,
SUM(CASE WHEN outcome='correct' THEN 1 ELSE 0 END) as correct,
SUM(CASE WHEN outcome='partially_correct' THEN 1 ELSE 0 END) as partial,
SUM(CASE WHEN outcome='incorrect' THEN 1 ELSE 0 END) as incorrect,
AVG(confidence) as avg_confidence,
AVG(CASE WHEN outcome='correct' THEN 1.0
WHEN outcome='partially_correct' THEN 0.5
ELSE 0.0 END) as accuracy
FROM predictions WHERE outcome IS NOT NULL
"""
params = []
if prediction_type:
query += " AND prediction_type = ?"
params.append(prediction_type)
query += " GROUP BY prediction_type"
return [dict(row) for row in conn.execute(query, params).fetchall()]
def get_open_predictions(conn: sqlite3.Connection, handle: str = None) -> list:
"""Get unresolved predictions."""
query = "SELECT * FROM predictions WHERE outcome IS NULL"
params = []
if handle:
query += " AND handle = ?"
params.append(handle)
query += " ORDER BY created_at DESC"
return [dict(row) for row in conn.execute(query, params).fetchall()]
def get_social_neighborhood(conn: sqlite3.Connection, handle: str, depth: int = 1) -> list:
"""Get a person's social graph neighborhood."""
query = """
SELECT from_handle, to_handle, relationship_type, weight
FROM social_edges
WHERE from_handle = ? OR to_handle = ?
ORDER BY weight DESC
"""
return [dict(row) for row in conn.execute(query, (handle, handle)).fetchall()]
def get_unread_alerts(conn: sqlite3.Connection) -> list:
"""Get unacknowledged monitoring alerts."""
query = """
SELECT * FROM monitoring_events
WHERE acknowledged = 0
ORDER BY detected_at DESC
"""
return [dict(row) for row in conn.execute(query).fetchall()]
if __name__ == "__main__":
conn = init_db()
print(f"Database initialized at {MAIN_DB}")
conn.close()
@@ -0,0 +1,216 @@
"""
REHOBOAM Data Schemas
Pydantic models for all JSON data structures used in the system.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional
from datetime import datetime
import json
import uuid
def gen_id(prefix: str = "") -> str:
return f"{prefix}{datetime.utcnow().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
@dataclass
class OceanScores:
openness: float = 0.5
conscientiousness: float = 0.5
extraversion: float = 0.5
agreeableness: float = 0.5
neuroticism: float = 0.5
@dataclass
class DarkTriad:
narcissism: float = 0.0
machiavellianism: float = 0.0
psychopathy: float = 0.0
@dataclass
class MoralFoundations:
care: float = 0.5
fairness: float = 0.5
loyalty: float = 0.5
authority: float = 0.5
sanctity: float = 0.5
liberty: float = 0.5
@dataclass
class Psychometrics:
ocean: OceanScores = field(default_factory=OceanScores)
mbti_estimate: str = ""
dark_triad: DarkTriad = field(default_factory=DarkTriad)
moral_foundations: MoralFoundations = field(default_factory=MoralFoundations)
confidence: float = 0.0
sample_size: int = 0
@dataclass
class VoiceFingerprint:
vocabulary_tier: str = ""
avg_sentence_length: float = 0.0
exclamation_rate: float = 0.0
question_rate: float = 0.0
emoji_rate: float = 0.0
slang_index: float = 0.0
formality_score: float = 0.5
humor_style: str = ""
signature_phrases: list[str] = field(default_factory=list)
topics_vocabulary: dict[str, float] = field(default_factory=dict)
cadence_pattern: str = ""
@dataclass
class Stance:
position: str = ""
intensity: float = 0.0
last_seen: str = ""
@dataclass
class Influence:
score: float = 0.0
reach: str = "micro"
engagement_rate: float = 0.0
amplification_power: float = 0.0
thought_leadership_domains: list[str] = field(default_factory=list)
@dataclass
class PostingPatterns:
avg_posts_per_day: float = 0.0
peak_hours_utc: list[int] = field(default_factory=list)
weekend_ratio: float = 0.5
reply_ratio: float = 0.0
repost_ratio: float = 0.0
thread_frequency: float = 0.0
controversy_rate: float = 0.0
@dataclass
class Relationships:
allies: list[str] = field(default_factory=list)
rivals: list[str] = field(default_factory=list)
frequent_interactions: list[str] = field(default_factory=list)
mentioned_by_frequently: list[str] = field(default_factory=list)
@dataclass
class ProfileMeta:
data_sources: list[str] = field(default_factory=list)
computation_time_sec: float = 0.0
model_used: str = ""
last_full_rebuild: str = ""
last_incremental: str = ""
@dataclass
class Identity:
bio: str = ""
location: str = ""
verified: bool = False
follower_count: int = 0
following_count: int = 0
account_created: str = ""
@dataclass
class Profile:
schema_version: str = "7.0"
handle: str = ""
platform: str = "x"
display_name: str = ""
created_at: str = ""
last_updated: str = ""
update_count: int = 0
staleness_score: float = 1.0
identity: Identity = field(default_factory=Identity)
psychometrics: Psychometrics = field(default_factory=Psychometrics)
voice_fingerprint: VoiceFingerprint = field(default_factory=VoiceFingerprint)
stances: dict[str, Stance] = field(default_factory=dict)
community_membership: list[str] = field(default_factory=list)
influence: Influence = field(default_factory=Influence)
posting_patterns: PostingPatterns = field(default_factory=PostingPatterns)
relationships: Relationships = field(default_factory=Relationships)
star_thread_ref: str = "star_thread.json"
raw_data_refs: list[str] = field(default_factory=list)
_meta: ProfileMeta = field(default_factory=ProfileMeta)
def to_dict(self) -> dict:
"""Recursively convert to dict for JSON serialization."""
import dataclasses
def _convert(obj):
if dataclasses.is_dataclass(obj):
return {k: _convert(v) for k, v in dataclasses.asdict(obj).items()}
elif isinstance(obj, list):
return [_convert(i) for i in obj]
elif isinstance(obj, dict):
return {k: _convert(v) for k, v in obj.items()}
return obj
return _convert(self)
def to_json(self, indent: int = 2) -> str:
return json.dumps(self.to_dict(), indent=indent)
@dataclass
class StarThread:
handle: str = ""
computed_at: str = ""
based_on_profile_version: str = ""
thread_version: int = 1
core_compression: str = ""
key_drives: list[str] = field(default_factory=list)
predictive_axioms: list[str] = field(default_factory=list)
voice_template: dict = field(default_factory=dict)
anti_slop_markers: list[str] = field(default_factory=list)
_meta: dict = field(default_factory=dict)
@dataclass
class Prediction:
pred_id: str = ""
created_at: str = ""
sim_id: str = ""
handle: str = ""
prediction_type: str = "" # statement, career, alliance, content, network_reaction
prediction_text: str = ""
confidence: float = 0.5
calibrated_confidence: float = 0.5
timeframe_days: int = 30
resolved_at: Optional[str] = None
outcome: Optional[str] = None # correct, partially_correct, incorrect
outcome_evidence: Optional[str] = None
accuracy_score: Optional[float] = None
@dataclass
class WatchConfig:
watch_id: str = ""
handle: str = ""
platform: str = "x"
enabled: bool = True
check_interval_minutes: int = 120
watch_for: list[dict] = field(default_factory=list)
alert_severity_minimum: str = "notable"
created_at: str = ""
@dataclass
class PopulationDefinition:
group_id: str = ""
name: str = ""
description: str = ""
created_at: str = ""
last_updated: str = ""
explicit_members: list[str] = field(default_factory=list)
criteria: dict = field(default_factory=dict)
resolved_members: list[str] = field(default_factory=list)
sampling_strategy: str = "representative"
default_sample_size: int = 12
@@ -0,0 +1,280 @@
"""
REHOBOAM Storage Layer
Directory management, profile I/O, index maintenance.
"""
import json
import shutil
from pathlib import Path
from datetime import datetime, timedelta
from typing import Optional
BASE_DIR = Path.home() / ".hermes" / "rehoboam"
PROFILES_DIR = BASE_DIR / "profiles"
POPULATIONS_DIR = BASE_DIR / "populations"
SIMULATIONS_DIR = BASE_DIR / "simulations"
MONITORING_DIR = BASE_DIR / "monitoring"
CONFIG_DIR = BASE_DIR / "config"
def init_storage():
"""Create all required directories."""
for d in [PROFILES_DIR, POPULATIONS_DIR, SIMULATIONS_DIR,
MONITORING_DIR, MONITORING_DIR / "alerts", CONFIG_DIR,
BASE_DIR / "db"]:
d.mkdir(parents=True, exist_ok=True)
# Create default configs if they don't exist
staleness_path = CONFIG_DIR / "staleness_policy.json"
if not staleness_path.exists():
staleness_path.write_text(json.dumps({
"thresholds": {
"fresh": {"max_age_hours": 72},
"stale": {"max_age_hours": 336},
"expired": {"max_age_hours": 2160},
"archived": {"max_age_hours": 8760}
},
"per_field_decay": {
"psychometrics": {"half_life_days": 180},
"stances": {"half_life_days": 30},
"posting_patterns": {"half_life_days": 60},
"relationships": {"half_life_days": 45},
"influence": {"half_life_days": 90},
"voice_fingerprint": {"half_life_days": 365}
},
"auto_refresh_on_simulation": True,
"auto_refresh_threshold": "stale"
}, indent=2))
config_path = CONFIG_DIR / "rehoboam.json"
if not config_path.exists():
config_path.write_text(json.dumps({
"version": "7.0",
"default_model": "claude-opus-4-20250514",
"max_thread_age_days": 30,
"monitoring_enabled": False,
"auto_thread": True,
"auto_profile_update": True
}, indent=2))
# Create indexes if they don't exist
for idx_path in [PROFILES_DIR / "_index.json", POPULATIONS_DIR / "_index.json",
SIMULATIONS_DIR / "_index.json"]:
if not idx_path.exists():
idx_path.write_text("{}")
def normalize_handle(handle: str) -> str:
"""Normalize a handle to a filesystem-safe directory name."""
h = handle.lstrip("@").lower().strip()
# Replace characters that are problematic in filenames
return h.replace("/", "_").replace("\\", "_")
# -- Profile I/O --
def get_profile_dir(handle: str) -> Path:
return PROFILES_DIR / normalize_handle(handle)
def profile_exists(handle: str) -> bool:
return (get_profile_dir(handle) / "profile.json").exists()
def load_profile(handle: str) -> Optional[dict]:
path = get_profile_dir(handle) / "profile.json"
if path.exists():
return json.loads(path.read_text())
return None
def save_profile(handle: str, profile: dict, snapshot: bool = True):
"""Save a profile, optionally snapshotting the old one."""
pdir = get_profile_dir(handle)
pdir.mkdir(parents=True, exist_ok=True)
(pdir / "history").mkdir(exist_ok=True)
(pdir / "raw").mkdir(exist_ok=True)
(pdir / "predictions").mkdir(exist_ok=True)
profile_path = pdir / "profile.json"
# Snapshot old profile before overwriting
if snapshot and profile_path.exists():
old = json.loads(profile_path.read_text())
ts = old.get("last_updated", datetime.utcnow().isoformat()).replace(":", "-")
snapshot_path = pdir / "history" / f"profile_{ts[:10]}.json"
shutil.copy2(profile_path, snapshot_path)
profile_path.write_text(json.dumps(profile, indent=2))
_update_profile_index(handle, profile)
def _update_profile_index(handle: str, profile: dict):
idx_path = PROFILES_DIR / "_index.json"
idx = json.loads(idx_path.read_text()) if idx_path.exists() else {}
idx[normalize_handle(handle)] = {
"platform": profile.get("platform", "x"),
"last_updated": profile.get("last_updated", ""),
"staleness": compute_staleness(profile.get("last_updated", "")),
"has_star_thread": (get_profile_dir(handle) / "star_thread.json").exists(),
"simulation_count": idx.get(normalize_handle(handle), {}).get("simulation_count", 0),
"display_name": profile.get("display_name", "")
}
idx_path.write_text(json.dumps(idx, indent=2))
# -- Star Thread I/O --
def load_star_thread(handle: str) -> Optional[dict]:
path = get_profile_dir(handle) / "star_thread.json"
if path.exists():
return json.loads(path.read_text())
return None
def save_star_thread(handle: str, thread: dict):
path = get_profile_dir(handle) / "star_thread.json"
get_profile_dir(handle).mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(thread, indent=2))
# Update index to reflect thread existence
idx_path = PROFILES_DIR / "_index.json"
if idx_path.exists():
idx = json.loads(idx_path.read_text())
key = normalize_handle(handle)
if key in idx:
idx[key]["has_star_thread"] = True
idx_path.write_text(json.dumps(idx, indent=2))
# -- Staleness --
def compute_staleness(last_updated: str) -> str:
"""Determine staleness level from a timestamp string."""
if not last_updated:
return "expired"
try:
dt = datetime.fromisoformat(last_updated.rstrip("Z"))
except ValueError:
return "expired"
age = datetime.utcnow() - dt
hours = age.total_seconds() / 3600
policy = _load_staleness_policy()
thresholds = policy.get("thresholds", {})
if hours <= thresholds.get("fresh", {}).get("max_age_hours", 72):
return "fresh"
elif hours <= thresholds.get("stale", {}).get("max_age_hours", 336):
return "stale"
elif hours <= thresholds.get("expired", {}).get("max_age_hours", 2160):
return "expired"
else:
return "archived"
def _load_staleness_policy() -> dict:
path = CONFIG_DIR / "staleness_policy.json"
if path.exists():
return json.loads(path.read_text())
return {"thresholds": {"fresh": {"max_age_hours": 72}, "stale": {"max_age_hours": 336},
"expired": {"max_age_hours": 2160}, "archived": {"max_age_hours": 8760}}}
def needs_thread_recompute(handle: str) -> bool:
"""Check if a star thread needs recomputation."""
thread = load_star_thread(handle)
if thread is None:
return True
profile = load_profile(handle)
if profile is None:
return True
# Thread is stale if profile was updated after thread was computed
thread_time = thread.get("based_on_profile_version", "")
profile_time = profile.get("last_updated", "")
if thread_time < profile_time:
return True
# Thread is stale if older than max_thread_age_days
config = json.loads((CONFIG_DIR / "rehoboam.json").read_text()) if (CONFIG_DIR / "rehoboam.json").exists() else {}
max_age = config.get("max_thread_age_days", 30)
try:
computed = datetime.fromisoformat(thread.get("computed_at", "").rstrip("Z"))
if (datetime.utcnow() - computed).days > max_age:
return True
except ValueError:
return True
return False
# -- Simulation I/O --
def save_simulation(sim_id: str, config: dict, output: dict, analytics: dict, audit: dict):
sdir = SIMULATIONS_DIR / sim_id
sdir.mkdir(parents=True, exist_ok=True)
(sdir / "config.json").write_text(json.dumps(config, indent=2))
(sdir / "output.json").write_text(json.dumps(output, indent=2))
(sdir / "analytics.json").write_text(json.dumps(analytics, indent=2))
(sdir / "audit.json").write_text(json.dumps(audit, indent=2))
# Update index
idx_path = SIMULATIONS_DIR / "_index.json"
idx = json.loads(idx_path.read_text()) if idx_path.exists() else {}
idx[sim_id] = {
"created_at": config.get("created_at", datetime.utcnow().isoformat() + "Z"),
"scenario": config.get("scenario", ""),
"participant_count": len(config.get("participants", [])),
}
idx_path.write_text(json.dumps(idx, indent=2))
# -- Population I/O --
def save_population(group_id: str, definition: dict, aggregate: dict = None):
pdir = POPULATIONS_DIR / group_id
pdir.mkdir(parents=True, exist_ok=True)
(pdir / "history").mkdir(exist_ok=True)
(pdir / "definition.json").write_text(json.dumps(definition, indent=2))
if aggregate:
(pdir / "aggregate.json").write_text(json.dumps(aggregate, indent=2))
idx_path = POPULATIONS_DIR / "_index.json"
idx = json.loads(idx_path.read_text()) if idx_path.exists() else {}
idx[group_id] = {
"name": definition.get("name", group_id),
"member_count": len(definition.get("resolved_members", definition.get("explicit_members", []))),
"last_updated": definition.get("last_updated", "")
}
idx_path.write_text(json.dumps(idx, indent=2))
def load_population(group_id: str) -> Optional[dict]:
path = POPULATIONS_DIR / group_id / "definition.json"
if path.exists():
return json.loads(path.read_text())
return None
# -- Listing --
def list_profiles() -> dict:
idx_path = PROFILES_DIR / "_index.json"
return json.loads(idx_path.read_text()) if idx_path.exists() else {}
def list_populations() -> dict:
idx_path = POPULATIONS_DIR / "_index.json"
return json.loads(idx_path.read_text()) if idx_path.exists() else {}
def list_simulations() -> dict:
idx_path = SIMULATIONS_DIR / "_index.json"
return json.loads(idx_path.read_text()) if idx_path.exists() else {}
if __name__ == "__main__":
init_storage()
print(f"Storage initialized at {BASE_DIR}")
@@ -0,0 +1,139 @@
#!/usr/bin/env python3
"""
Facebook Page/Profile Data Extractor
Uses multiple techniques to extract public Facebook data without authentication:
1. Googlebot UA for OG meta tags (name, description, likes, talking_about, bio, og:image)
2. Graph API /picture endpoint for profile photos (pages only)
3. Page Plugin embed for follower counts and page IDs
"""
import subprocess
import json
import re
import html
import sys
GOOGLEBOT_UA = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
def curl_get(url, ua=None):
"""Fetch URL with curl"""
cmd = ['curl', '-s', '-L', '--max-time', '15']
if ua:
cmd += ['-H', f'User-Agent: {ua}']
cmd.append(url)
result = subprocess.run(cmd, capture_output=True, text=True, timeout=20)
return result.stdout
def extract_og_data(username):
"""Extract OG meta tags using Googlebot UA"""
content = curl_get(f'https://www.facebook.com/{username}', ua=GOOGLEBOT_UA)
data = {}
# Extract OG tags
og_title = re.search(r'og:title"\s*content="([^"]*)"', content)
if og_title:
data['name'] = html.unescape(og_title.group(1))
og_desc = re.search(r'og:description"\s*content="([^"]*)"', content)
if og_desc:
desc = html.unescape(og_desc.group(1))
data['raw_description'] = desc
# Parse likes count
likes_match = re.search(r'([\d,]+)\s+likes?', desc)
if likes_match:
data['likes'] = likes_match.group(1)
# Parse talking about
talking_match = re.search(r'([\d,]+)\s+talking about this', desc)
if talking_match:
data['talking_about'] = talking_match.group(1)
# Extract bio (text after the "talking about this." part)
bio_match = re.search(r'talking about this\.\s*(.+)', desc)
if bio_match:
data['bio'] = bio_match.group(1)
og_image = re.search(r'og:image"\s*content="([^"]*)"', content)
if og_image:
data['og_image'] = html.unescape(og_image.group(1))
return data
def extract_plugin_data(username):
"""Extract data from Page Plugin embed"""
content = curl_get(f'https://www.facebook.com/plugins/page.php?href=https://www.facebook.com/{username}&tabs=timeline&width=500&height=600')
data = {}
# Page name from title attribute
name_match = re.search(r'class="_1drp _5lv6" title="([^"]*)"', content)
if name_match:
data['plugin_name'] = html.unescape(name_match.group(1))
# Follower count
followers_match = re.search(r'([\d,]+)\s+followers', content)
if followers_match:
data['followers'] = followers_match.group(1)
# Page ID
pageid_match = re.search(r'"pageID":"(\d+)"', content)
if pageid_match:
data['page_id'] = pageid_match.group(1)
return data
def extract_profile_picture(username):
"""Get profile picture via Graph API"""
content = curl_get(f'https://graph.facebook.com/v19.0/{username}/picture?redirect=false&width=400&height=400')
try:
d = json.loads(content)
if 'data' in d and not d['data'].get('is_silhouette', True):
return d['data']['url']
except:
pass
return None
def get_facebook_data(username):
"""Combine all extraction methods"""
result = {'username': username}
# Method 1: OG tags (best for bio, likes, talking_about)
og = extract_og_data(username)
result.update(og)
# Method 2: Plugin (best for followers, page_id)
plugin = extract_plugin_data(username)
result.update(plugin)
# Method 3: Graph API picture (pages only)
pic = extract_profile_picture(username)
if pic:
result['profile_picture'] = pic
# Also try by page_id for picture if username didn't work
if not pic and 'page_id' in result:
pic2 = extract_profile_picture(result['page_id'])
if pic2:
result['profile_picture'] = pic2
return result
if __name__ == '__main__':
targets = sys.argv[1:] if len(sys.argv) > 1 else ['zuck', 'NVIDIA', 'Meta', 'CocaCola']
for target in targets:
print(f"{'='*60}")
print(f"Facebook Profile: {target}")
print(f"{'='*60}")
data = get_facebook_data(target)
for k, v in data.items():
if k == 'raw_description':
continue # Skip raw, we show parsed fields
val = str(v)
if len(val) > 120:
val = val[:120] + '...'
print(f" {k}: {val}")
print()
@@ -0,0 +1,595 @@
"""
Hermes Simulator Intelligence Gathering Pipeline v2
Full-spectrum OSINT research engine for personality modeling.
Searches text, extracts content, browses live pages, analyzes
images with vision, and cross-references across platforms.
Run via execute_code. The agent adapts searches based on findings.
"""
from hermes_tools import web_search, web_extract, terminal
import json
import time
import urllib.parse
# ═══════════════════════════════════════════════════════════════
# CONFIGURATION
# ═══════════════════════════════════════════════════════════════
AGGREGATOR_SITES = [
"buttondown.com/ainews",
"news.smol.ai",
"techmeme.com",
"latent.space",
]
# Verified working fallback data sources (tested April 2026)
# Priority order: X API > nitter.cz > ThreadReaderApp > GitHub > Reddit > HN
FALLBACK_SOURCES = {
"nitter": "https://nitter.cz/{handle}", # web_extract — full timeline
"threadreader": "https://threadreaderapp.com/user/{handle}", # web_extract — historical threads
"github_profile": "https://api.github.com/users/{handle}", # curl — profile + README
"github_events": "https://api.github.com/users/{handle}/events", # curl — recent activity
"reddit_user": "https://www.reddit.com/user/{handle}.json", # curl w/ User-Agent
"reddit_comments": "https://www.reddit.com/user/{handle}/comments.json",
"hn_search": "https://hn.algolia.com/api/v1/search?query={handle}&tags=comment",
}
# CONFIRMED BLOCKED (don't waste calls on these):
# - LinkedIn (web_extract blocked, browser auth wall)
# - Instagram viewers (imginn, picuki, dumpoir, gramhir — all 403)
# - Most nitter instances (dead or 403, ONLY nitter.cz works via web_extract)
# - Wayback Machine for tweets (sparse, no JS content)
# - Google Cache of Twitter (empty)
# - Archive.today (429 + CAPTCHA)
# - Twitter Syndication API (rate limited)
AI_SUBREDDITS = [
"LocalLLaMA", "MachineLearning", "singularity",
"ChatGPT", "ClaudeAI", "OpenAI", "StableDiffusion",
]
PLATFORMS = ["twitter", "instagram", "linkedin", "github", "reddit", "youtube"]
# ═══════════════════════════════════════════════════════════════
# HELPER: safe web_search with validation
# ═══════════════════════════════════════════════════════════════
def _safe_web_search(query: str, limit: int = 5) -> list:
"""Run web_search and return results list, with validation."""
r = web_search(query, limit=limit)
if not isinstance(r, dict) or "data" not in r:
print(f" [WARNING] web_search returned no 'data' key for query: {query[:80]}")
return []
data = r.get("data", {})
if not isinstance(data, dict):
return []
return data.get("web", []) or []
# ═══════════════════════════════════════════════════════════════
# CORE SEARCH FUNCTIONS
# ═══════════════════════════════════════════════════════════════
def search_identity(handle: str) -> dict:
"""Establish who they are across the internet."""
results = {}
results["twitter_identity"] = _safe_web_search(f"@{handle} twitter bio role company", limit=5)
results["general_identity"] = _safe_web_search(f"{handle} known for", limit=5)
return results
def search_voice(handle: str) -> dict:
"""How do they actually talk/write."""
results = {}
results["takes"] = _safe_web_search(f"{handle} twitter hot takes opinions", limit=5)
for agg in AGGREGATOR_SITES[:2]:
hits = _safe_web_search(f"site:{agg} {handle}", limit=3)
if hits:
# Use full domain as key, not split('.')[0]
results[f"agg_{agg}"] = hits
return results
def search_positions(handle: str, topics: list = None, domain: str = None) -> dict:
"""What are their known positions."""
results = {}
if topics:
for topic in topics[:3]:
results[f"topic_{topic}"] = _safe_web_search(f"{handle} {topic} opinion take", limit=5)
# Build controversy query — only add domain keywords if specified
controversy_query = f"{handle} debate disagree controversial"
if domain:
controversy_query += f" {domain}"
results["controversies"] = _safe_web_search(controversy_query, limit=5)
return results
def search_longform(handle: str, real_name: str = None, domain: str = None) -> dict:
"""Blogs, interviews, essays."""
results = {}
name = real_name or handle
blog_query = f"{name} blog substack essay"
interview_query = f"{name} interview podcast"
if domain:
blog_query += f" {domain}"
interview_query += f" {domain}"
results["blogs"] = _safe_web_search(blog_query, limit=5)
results["interviews"] = _safe_web_search(interview_query, limit=5)
return results
# ═══════════════════════════════════════════════════════════════
# CROSS-PLATFORM DISCOVERY
# ═══════════════════════════════════════════════════════════════
def discover_platforms(handle: str, real_name: str = None) -> dict:
"""Find someone across all platforms."""
name = real_name or handle
results = {}
# Instagram
results["instagram"] = _safe_web_search(f"{name} instagram OR site:instagram.com/{handle}", limit=5)
# LinkedIn
results["linkedin"] = _safe_web_search(f"{name} linkedin OR site:linkedin.com/in", limit=5)
# Reddit
results["reddit"] = _safe_web_search(f"{name} reddit account OR site:reddit.com/user", limit=5)
# GitHub
results["github"] = _safe_web_search(f"{handle} github OR site:github.com/{handle}", limit=5)
# YouTube
results["youtube"] = _safe_web_search(f"{name} youtube channel OR talk OR interview", limit=5)
# Personal site
results["personal_site"] = _safe_web_search(f"{name} personal website blog about", limit=5)
# Hacker News
results["hackernews"] = _safe_web_search(f"site:news.ycombinator.com {handle} OR {name}", limit=3)
return results
def discover_instagram(handle: str = None, real_name: str = None) -> dict:
"""Focused Instagram discovery."""
results = {}
name = real_name or handle
# Try to find their IG handle
results["ig_search"] = _safe_web_search(f"{name} instagram profile", limit=5)
# If we have a candidate IG URL, try to extract
ig_urls = []
for item in results.get("ig_search", []):
if not isinstance(item, dict):
continue
url = item.get("url", "")
if "instagram.com/" in url and "/p/" not in url:
ig_urls.append(url)
if ig_urls:
# Try to extract IG profile page
r = web_extract(urls=ig_urls[:1])
results["ig_profile"] = r.get("results", [])
return results
# ═══════════════════════════════════════════════════════════════
# VISUAL INTELLIGENCE
# ═══════════════════════════════════════════════════════════════
# NOTE: These functions use browser_* and vision_analyze which are
# NOT available in execute_code. They are called DIRECTLY by the
# agent after the execute_code research phase.
#
# The agent should:
# 1. Run this script via execute_code for text-based research
# 2. Then use browser/vision tools directly for visual research
#
# Visual research tasks for the agent:
#
# INSTAGRAM VISUAL:
# browser_navigate("https://www.instagram.com/{ig_handle}/")
# browser_vision(question="Describe this Instagram profile: bio, pic, grid, aesthetic, follower count")
# browser_get_images() # collect image URLs
# vision_analyze(image_url="{url}", question="Describe: setting, people, mood, style")
#
# PROFILE PIC ANALYSIS:
# vision_analyze(image_url="{pic_url}", question="Describe: appearance, clothing, setting, expression, professional vs casual")
#
# REVERSE IMAGE SEARCH (Yandex):
# # Upload to catbox if behind auth:
# terminal("curl -F 'reqtype=fileupload' -F 'fileToUpload=@{path}' https://catbox.moe/user/api.php")
# browser_navigate(f"https://yandex.com/images/search?rpt=imageview&url={encoded_url}")
#
# PAGE SCREENSHOT ANALYSIS:
# browser_vision(question="Read all text, usernames, post content, dates, engagement numbers")
# ═══════════════════════════════════════════════════════════════
# INTERACTION MAPPING
# ═══════════════════════════════════════════════════════════════
def search_interactions(handle: str, other_handles: list = None) -> dict:
"""How they interact with other simulation targets."""
results = {}
if other_handles:
for other in other_handles[:4]:
hits = _safe_web_search(f"{handle} {other} twitter interaction debate reply", limit=3)
if hits:
results[f"with_{other}"] = hits
return results
def search_social_graph(handle: str) -> dict:
"""Who do they interact with most? Allies and rivals."""
results = {}
results["frequent_interactions"] = _safe_web_search(f"@{handle} twitter reply thread conversation with", limit=5)
results["conflicts"] = _safe_web_search(f"@{handle} disagree argue beef ratio", limit=5)
results["allies"] = _safe_web_search(f"@{handle} agree support endorse recommend", limit=5)
return results
# ═══════════════════════════════════════════════════════════════
# DEEP EXTRACTION
# ═══════════════════════════════════════════════════════════════
def extract_content(urls: list) -> list:
"""Pull full content from high-value URLs."""
if not urls:
return []
r = web_extract(urls=urls[:3])
return r.get("results", [])
def extract_best_urls(findings: dict, max_urls: int = 5) -> list:
"""Find the most promising URLs in research findings for deep extraction."""
seen_urls = set() # URL deduplication
priority_domains = [
"substack.com", "medium.com", "blog", "essay",
"interview", "podcast", "youtube.com", "arxiv.org",
]
def score_url(url, desc):
score = 0
for domain in priority_domains:
if domain in url.lower() or domain in desc.lower():
score += 2
if any(w in desc.lower() for w in ["interview", "spoke", "told", "said", "wrote"]):
score += 1
return score
candidates = []
def collect(obj):
if isinstance(obj, list):
for item in obj:
if isinstance(item, dict):
url = item.get("url") or ""
desc = item.get("description") or item.get("text") or ""
if url and url not in seen_urls and not any(x in url for x in ["x.com", "twitter.com", "instagram.com"]):
seen_urls.add(url)
candidates.append((score_url(url, desc), url))
elif isinstance(obj, dict):
for v in obj.values():
collect(v)
collect(findings)
candidates.sort(key=lambda x: -x[0])
return [url for _, url in candidates[:max_urls]]
# ═══════════════════════════════════════════════════════════════
# MAIN PIPELINE
# ═══════════════════════════════════════════════════════════════
def research_person(handle: str, fidelity: int = 70,
topics: list = None,
other_handles: list = None,
real_name: str = None,
domain: str = None) -> dict:
"""
Full research pipeline for one person.
Returns dict with all findings organized by category.
Args:
handle: Twitter/X handle (without @)
fidelity: Research depth 0-100
topics: Specific topics to research
other_handles: Other people to check interactions with
real_name: Real name if different from handle
domain: Domain context (e.g., 'AI', 'politics', 'gaming').
When None, no domain keywords are added to searches.
When set, adds relevant domain keywords.
"""
print(f"\n{'='*60}")
print(f" RESEARCHING: @{handle} | Fidelity: {fidelity}%")
if domain:
print(f" Domain: {domain}")
print(f"{'='*60}")
findings = {"handle": handle, "fidelity": fidelity, "visual_tasks": []}
# ─── Phase 1: Identity (always) ───
print(f"\n [IDENTITY] Who are they...")
findings["identity"] = search_identity(handle)
if fidelity <= 30:
if topics:
findings["quick_topic"] = _safe_web_search(f"{handle} {topics[0]}", limit=3)
return findings
# ─── Phase 2: Voice (fidelity 31+) ───
print(f"\n [VOICE] How do they talk...")
findings["voice"] = search_voice(handle)
# ─── Phase 3: Positions (fidelity 31+) ───
print(f"\n [POSITIONS] What do they believe...")
findings["positions"] = search_positions(handle, topics, domain=domain)
if fidelity <= 50:
return findings
# ─── Phase 4: Cross-platform (fidelity 51+) ───
print(f"\n [PLATFORMS] Finding them everywhere...")
findings["platforms"] = discover_platforms(handle, real_name)
if fidelity <= 70:
return findings
# ─── Phase 5: Longform (fidelity 71+) ───
print(f"\n [LONGFORM] Blogs, interviews, essays...")
findings["longform"] = search_longform(handle, real_name, domain=domain)
# ─── Phase 6: Social graph (fidelity 71+) ───
print(f"\n [SOCIAL GRAPH] Who do they interact with...")
findings["social_graph"] = search_social_graph(handle)
# ─── Phase 7: Interaction mapping (fidelity 71+) ───
if other_handles:
print(f"\n [INTERACTIONS] With other targets: {other_handles}...")
findings["interactions"] = search_interactions(handle, other_handles)
# ─── Phase 8: Instagram deep dive (fidelity 80+) ───
if fidelity >= 80:
print(f"\n [INSTAGRAM] Visual identity...")
findings["instagram"] = discover_instagram(handle, real_name)
# Queue visual tasks for the agent to do after execute_code
findings["visual_tasks"].append({
"type": "instagram_profile",
"instruction": f"browser_navigate to Instagram profile, use browser_vision to analyze",
"handle": handle,
})
# ─── Phase 9: Deep extraction (fidelity 85+) ───
if fidelity >= 85:
print(f"\n [DEEP EXTRACT] Pulling longform content...")
best_urls = extract_best_urls(findings, max_urls=4)
if best_urls:
print(f" Extracting {len(best_urls)} URLs: {best_urls}")
findings["deep_extracts"] = extract_content(best_urls)
# ─── Phase 10: Profile pic analysis (fidelity 90+) ───
if fidelity >= 90:
findings["visual_tasks"].append({
"type": "profile_pic_analysis",
"instruction": "Find and analyze profile pictures across platforms with vision_analyze",
"handle": handle,
})
findings["visual_tasks"].append({
"type": "reverse_image_search",
"instruction": "Reverse image search profile pic via Yandex to find alt accounts",
"handle": handle,
})
return findings
def research_all(handles: list, fidelity: int = 70,
topics: list = None, domain: str = None) -> dict:
"""Research all simulation targets."""
all_findings = {}
for handle in handles:
clean = handle.lstrip("@")
others = [h.lstrip("@") for h in handles if h.lstrip("@") != clean]
findings = research_person(
handle=clean,
fidelity=fidelity,
topics=topics,
other_handles=others,
domain=domain,
)
all_findings[clean] = findings
return all_findings
# ═══════════════════════════════════════════════════════════════
# REPORTING
# ═══════════════════════════════════════════════════════════════
def count_data_points(obj) -> int:
"""Count total search result items in findings (only meaningful items with >50 char text)."""
total = 0
if isinstance(obj, list):
for item in obj:
if isinstance(item, dict):
text = item.get("description") or item.get("text") or ""
if len(text) > 50:
total += 1
else:
# Still count non-dict items or items without text fields
total += 1
else:
total += 1
elif isinstance(obj, dict):
for k, v in obj.items():
# Skip metadata keys
if k in ("handle", "fidelity", "visual_tasks"):
continue
total += count_data_points(v)
return total
def count_quality_data_points(obj) -> int:
"""Count search result items with substantial text (description/text > 50 chars)."""
total = 0
if isinstance(obj, list):
for item in obj:
if isinstance(item, dict):
text = item.get("description") or item.get("text") or ""
if len(text) > 50:
total += 1
elif isinstance(obj, dict):
for k, v in obj.items():
if k in ("handle", "fidelity", "visual_tasks"):
continue
total += count_quality_data_points(v)
return total
def summarize_findings(findings: dict) -> str:
"""Compact summary of what we found."""
handle = findings.get("handle", "unknown")
fidelity = findings.get("fidelity", 0)
total = count_data_points(findings)
quality = count_quality_data_points(findings)
visual_tasks = findings.get("visual_tasks", [])
lines = [
f"\n{''*60}",
f" @{handle} | Fidelity: {fidelity}% | Data points: {total} ({quality} quality)",
f"{''*60}",
]
# Identity snippets
identity = findings.get("identity", {})
for key in ["twitter_identity", "general_identity"]:
for item in identity.get(key, [])[:2]:
if not isinstance(item, dict):
continue
desc = (item.get("description") or "")[:180]
if desc:
lines.append(f" [{key.upper()}] {desc}")
# Platform discovery results
platforms = findings.get("platforms", {})
found_platforms = []
for platform, items in platforms.items():
if isinstance(items, list) and len(items) > 0:
found_platforms.append(platform)
if found_platforms:
lines.append(f" [PLATFORMS FOUND] {', '.join(found_platforms)}")
# Voice samples from aggregators
voice = findings.get("voice", {})
for key, items in voice.items():
if isinstance(items, list):
for item in items[:1]:
if not isinstance(item, dict):
continue
desc = (item.get("description") or "")[:180]
if desc and handle.lower() in desc.lower():
lines.append(f" [VOICE] {desc}")
# Deep extracts
for extract in findings.get("deep_extracts", [])[:2]:
if not isinstance(extract, dict):
continue
title = extract.get("title", "untitled")
content = (extract.get("content") or "")[:200]
if content:
lines.append(f" [LONGFORM: {title}] {content}...")
# Pending visual tasks
if visual_tasks:
lines.append(f" [VISUAL TASKS QUEUED] {len(visual_tasks)} tasks for agent to execute:")
for task in visual_tasks:
lines.append(f"{task.get('type', '?')}: {task.get('instruction', '?')[:80]}")
# Confidence estimate — based on quality data points
if quality >= 30:
conf = "HIGH"
elif quality >= 15:
conf = "MEDIUM"
elif quality >= 5:
conf = "LOW"
else:
conf = "INSUFFICIENT"
lines.append(f"\n CONFIDENCE: {conf} ({quality} quality data points, {total} total)")
return "\n".join(lines)
def report_visual_tasks(all_findings: dict) -> str:
"""Collect all visual tasks across all targets for agent to execute."""
lines = ["\n" + ""*60, " VISUAL INTELLIGENCE TASKS (agent must execute directly)", ""*60]
any_tasks = False
for handle, findings in all_findings.items():
for task in findings.get("visual_tasks", []):
any_tasks = True
lines.append(f"\n @{handle}{task.get('type', '?')}:")
lines.append(f" {task.get('instruction', '?')}")
if not any_tasks:
lines.append(" No visual tasks queued (fidelity < 80)")
return "\n".join(lines)
# ═══════════════════════════════════════════════════════════════
# CHECK AVAILABLE TOOLS
# ═══════════════════════════════════════════════════════════════
def check_x_cli() -> bool:
"""Check if x-cli is available."""
try:
r = terminal("which x-cli 2>/dev/null && echo 'FOUND' || echo 'NOT_FOUND'")
return "FOUND" in r.get("output", "")
except:
return False
# ═══════════════════════════════════════════════════════════════
# ENTRY POINT
# ═══════════════════════════════════════════════════════════════
if __name__ == "__main__":
# ── CONFIGURE THESE ──
HANDLES = ["teknium1", "basedjensen"]
FIDELITY = 80
TOPICS = ["open source AI", "compute scaling"]
DOMAIN = None # Set to 'AI', 'politics', etc. to add domain keywords
# ─────────────────────
has_xcli = check_x_cli()
print(f"x-cli available: {has_xcli}")
print(f"Targets: {HANDLES}")
print(f"Fidelity: {FIDELITY}%")
print(f"Topics: {TOPICS}")
print(f"Domain: {DOMAIN}")
results = research_all(HANDLES, fidelity=FIDELITY, topics=TOPICS, domain=DOMAIN)
for handle, findings in results.items():
print(summarize_findings(findings))
print(report_visual_tasks(results))
print("\n\nResearch phase complete. Agent should now:")
print("1. Execute any queued visual tasks (browser/vision)")
print("2. Compile dossiers from all findings")
print("3. Run simulation")
@@ -0,0 +1,238 @@
#!/usr/bin/env python3
"""
Threads (Meta) Profile & Post Extractor
========================================
Extracts profile data and post content from Threads using:
1. OG meta tags from HTML (no auth required for profiles and public posts)
2. WebFinger for ActivityPub discovery
3. Google-indexed post URLs for recent post discovery
METHODS THAT WORK:
- Profile pages at threads.net/@{user} have OG tags with:
display_name, username, follower_count, thread_count, bio, profile_pic
- Individual post pages have OG tags with:
full post text, author info, profile pic
- WebFinger at /.well-known/webfinger gives ActivityPub user IDs
- Post URLs must be known (discoverable via web search)
METHODS THAT DON'T WORK (as of 2025):
- Threads Official API (graph.threads.net) requires OAuth token
- ActivityPub /ap/users/ endpoints return 404 for most users
- No public post listing endpoint exists
"""
import re
import json
import html
import subprocess
import sys
def curl_fetch(url, extra_headers=None, timeout=15):
"""Fetch URL using curl (more reliable than urllib for Threads)."""
cmd = ['curl', '-s', '-L', '--max-time', str(timeout)]
if extra_headers:
for k, v in extra_headers.items():
cmd.extend(['-H', f'{k}: {v}'])
cmd.append(url)
try:
result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout+5)
return result.stdout
except:
return None
def extract_og_tags(html_content):
"""Extract OpenGraph, meta description, and Twitter tags from HTML."""
data = {}
if not html_content:
return data
for m in re.finditer(r'property="(og:[^"]+)"\s+content="([^"]*)"', html_content):
key = m.group(1)
val = html.unescape(m.group(2))
if key not in data:
data[key] = val
for m in re.finditer(r'name="description"\s+content="([^"]*)"', html_content):
data['description'] = html.unescape(m.group(1))
break
for m in re.finditer(r'name="(twitter:[^"]+)"\s+content="([^"]*)"', html_content):
key = m.group(1)
val = html.unescape(m.group(2))
if key not in data:
data[key] = val
return data
def parse_profile_description(desc):
"""Parse '5.5M Followers • 142 Threads • Bio. See the latest...' format."""
result = {}
if not desc:
return result
parts = desc.split(' \u2022 ') # Split on bullet •
for part in parts:
part = part.strip()
if 'Follower' in part:
result['followers'] = part.split(' Follower')[0].strip()
elif part.endswith('Threads') or part.endswith('Thread'):
result['thread_count'] = part.split(' Thread')[0].strip()
else:
bio = re.sub(r'\s*See the latest conversations.*$', '', part)
if bio:
result['bio'] = bio
return result
def parse_profile_title(title):
"""Parse 'Display Name (@user) • Threads, Say more' format."""
result = {}
if not title:
return result
m = re.match(r'^(.+?)\s*\(@(\w+)\)', title)
if m:
result['display_name'] = m.group(1).strip()
result['username'] = m.group(2)
return result
def get_threads_profile(username):
"""
Get Threads profile data via OG meta tags.
Returns dict with: username, display_name, bio, followers, thread_count,
profile_picture_url, url
"""
username = username.lstrip('@')
url = f'https://www.threads.net/@{username}'
content = curl_fetch(url)
tags = extract_og_tags(content)
if not tags or 'og:title' not in tags:
return {'error': 'Failed to fetch or parse profile', 'username': username}
title = tags.get('og:title', '')
if title.startswith('Threads') and 'Log in' in title:
return {'error': 'Profile requires login or not found', 'username': username}
result = {
'platform': 'threads',
'url': url,
}
result.update(parse_profile_title(title))
result.update(parse_profile_description(tags.get('og:description', '')))
if 'og:image' in tags:
result['profile_picture_url'] = tags['og:image']
return result
def get_threads_webfinger(username):
"""Get WebFinger data (ActivityPub discovery) for a Threads user."""
username = username.lstrip('@')
url = f'https://www.threads.net/.well-known/webfinger?resource=acct:{username}@threads.net'
content = curl_fetch(url, {'Accept': 'application/json'})
if not content:
return None
try:
data = json.loads(content)
if 'error' in data or 'success' in data and not data['success']:
return None
result = {'subject': data.get('subject', '')}
for link in data.get('links', []):
if link.get('type') == 'application/activity+json':
result['activitypub_url'] = link['href']
elif link.get('rel') == 'http://webfinger.net/rel/profile-page':
result['profile_url'] = link['href']
return result
except:
return None
def get_thread_post(post_url):
"""
Get content of a specific Threads post via OG tags.
Returns: text, author, image_url
"""
content = curl_fetch(post_url)
tags = extract_og_tags(content)
if not tags or 'og:title' not in tags:
return {'error': 'Failed to fetch post'}
title = tags.get('og:title', '')
if 'Log in' in title:
return {'error': 'Post requires login or not found'}
result = {'url': post_url}
if 'og:description' in tags:
result['text'] = tags['og:description']
elif 'description' in tags:
result['text'] = tags['description']
if 'og:title' in tags:
# Parse "Display Name (@username) on Threads"
m = re.match(r'^(.+?)\s*\(@(\w+)\)\s+on\s+Threads', title)
if m:
result['author_name'] = m.group(1).strip()
result['author_username'] = m.group(2)
if 'og:image' in tags:
result['image_url'] = tags['og:image']
return result
def get_threads_full(username):
"""Get complete profile data combining all methods."""
profile = get_threads_profile(username)
wf = get_threads_webfinger(username)
if wf:
profile['webfinger'] = wf
return profile
# ===== TEST =====
if __name__ == '__main__':
test_users = sys.argv[1:] if len(sys.argv) > 1 else ['zuck', 'nvidia', 'mosseri']
for user in test_users:
print(f"\n{'='*60}")
print(f" THREADS PROFILE: @{user}")
print(f"{'='*60}")
data = get_threads_full(user)
for k, v in sorted(data.items()):
if k == 'profile_picture_url':
print(f" {k}: {str(v)[:80]}...")
elif k == 'webfinger':
print(f" webfinger:")
for wk, wv in v.items():
print(f" {wk}: {wv}")
else:
print(f" {k}: {v}")
# Test posts
post_urls = [
'https://www.threads.net/@zuck/post/DEkvXzbyDS9',
]
print(f"\n{'='*60}")
print(f" THREADS POSTS")
print(f"{'='*60}")
for purl in post_urls:
print(f"\n URL: {purl}")
post = get_thread_post(purl)
for k, v in post.items():
if k in ('image_url',):
print(f" {k}: {str(v)[:80]}...")
elif k == 'text':
print(f" {k}: {v[:300]}{'...' if len(v) > 300 else ''}")
else:
print(f" {k}: {v}")
@@ -0,0 +1,305 @@
"""
TikTok Profile & Video Data Scraper
====================================
WORKING methods to get full TikTok profile data and video content.
Tested and verified April 2026.
METHODS SUMMARY:
================
METHOD 1 (BEST): HTML SSR Scraping - Parse __UNIVERSAL_DATA_FOR_REHYDRATION__
- Gets: FULL profile (bio, stats, follower/following/heart/video counts)
- Works: YES - Reliable, no auth needed, just curl + parse
- Limitation: No video list on profile page (videos load client-side)
METHOD 2: oEmbed API - https://www.tiktok.com/oembed?url=...
- Gets: Video title/caption, author, thumbnail URL
- Works: YES - No auth, no rate limit issues
- Limitation: Need video IDs first; no engagement stats
METHOD 3: tikwm.com API - https://www.tikwm.com/api/
- Gets: Full user info + individual video stats (plays, likes, comments, shares)
- User info: https://www.tikwm.com/api/user/info?unique_id={username}
- Video info: https://www.tikwm.com/api/?url={tiktok_video_url}
- Works: YES for user info and single videos
- Limitation: Posts list endpoint returns 403 (rate-limited)
METHOD 4: Video ID Discovery via Search Engines
- Use web_search("site:tiktok.com/@{username}/video") to find video IDs
- Then use oEmbed or tikwm or HTML scraping per video
- Works: YES - Gets ~5 recent video IDs per search
METHOD 5: SocialBlade via web_extract
- URL: https://socialblade.com/tiktok/user/{username}
- Gets: Followers, following, likes, videos, growth trends, rankings
- Works: YES via web_extract tool
METHOD 6: Individual Video HTML Scraping
- Fetch https://www.tiktok.com/@{user}/video/{id}
- Parse __UNIVERSAL_DATA webapp.video-detail -> itemInfo.itemStruct
- Gets: FULL video data (caption, stats, music, hashtags, duration)
- Works: YES - Most complete per-video data
NOT WORKING:
- TikTok /api/user/detail/ endpoint -> returns empty (needs signed params)
- TikTok /api/post/item_list/ -> returns empty (needs x-bogus/msToken)
- tikwm.com /api/user/posts -> 403 forbidden
"""
import re
import json
import subprocess
import urllib.parse
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
def fetch_url(url, headers=None):
"""Fetch URL via curl and return content."""
cmd = ['curl', '-s', '-L', '-m', '30', url,
'-H', f'User-Agent: {USER_AGENT}',
'-H', 'Accept-Language: en-US,en;q=0.9']
if headers:
for k, v in headers.items():
cmd.extend(['-H', f'{k}: {v}'])
result = subprocess.run(cmd, capture_output=True, text=True, timeout=35)
return result.stdout
def method1_html_profile(username):
"""
METHOD 1: Scrape TikTok profile HTML and parse SSR JSON data.
Returns full profile with stats.
"""
url = f'https://www.tiktok.com/@{username}'
html = fetch_url(url)
m = re.search(
r'<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__" type="application/json">(.*?)</script>',
html
)
if not m:
return None
data = json.loads(m.group(1))
scope = data.get('__DEFAULT_SCOPE__', {})
user_detail = scope.get('webapp.user-detail', {})
user_info = user_detail.get('userInfo', {})
if not user_info:
return None
user = user_info.get('user', {})
stats = user_info.get('statsV2', user_info.get('stats', {}))
return {
'id': user.get('id'),
'username': user.get('uniqueId'),
'nickname': user.get('nickname'),
'bio': user.get('signature'),
'verified': user.get('verified'),
'private': user.get('privateAccount'),
'secUid': user.get('secUid'),
'avatarLarger': user.get('avatarLarger'),
'bioLink': user.get('bioLink', {}),
'createTime': user.get('createTime'),
'language': user.get('language'),
'stats': {
'followers': int(stats.get('followerCount', 0)),
'following': int(stats.get('followingCount', 0)),
'hearts': int(stats.get('heartCount', 0)),
'videos': int(stats.get('videoCount', 0)),
'diggs': int(stats.get('diggCount', 0)),
'friends': int(stats.get('friendCount', 0)),
}
}
def method2_oembed_video(username, video_id):
"""
METHOD 2: Get video caption/title via oEmbed.
No auth needed. Returns caption, author, thumbnail.
"""
url = f'https://www.tiktok.com/oembed?url=https://www.tiktok.com/@{username}/video/{video_id}'
content = fetch_url(url)
try:
data = json.loads(content)
return {
'video_id': video_id,
'title': data.get('title', ''),
'author_name': data.get('author_name'),
'author_url': data.get('author_url'),
'thumbnail_url': data.get('thumbnail_url'),
'thumbnail_width': data.get('thumbnail_width'),
'thumbnail_height': data.get('thumbnail_height'),
}
except json.JSONDecodeError:
return None
def method3_tikwm_user(username):
"""
METHOD 3a: Get user info via tikwm.com API.
"""
url = f'https://www.tikwm.com/api/user/info?unique_id={username}'
content = fetch_url(url)
try:
data = json.loads(content)
if data.get('code') == 0:
return data['data']
except json.JSONDecodeError:
pass
return None
def method3_tikwm_video(video_url):
"""
METHOD 3b: Get video details via tikwm.com API.
Returns: title, play_count, digg_count, comment_count, share_count, duration, download URLs
"""
url = f'https://www.tikwm.com/api/?url={urllib.parse.quote(video_url)}'
content = fetch_url(url)
try:
data = json.loads(content)
if data.get('code') == 0:
v = data['data']
return {
'video_id': v.get('id'),
'title': v.get('title'),
'duration': v.get('duration'),
'play_count': v.get('play_count'),
'likes': v.get('digg_count'),
'comments': v.get('comment_count'),
'shares': v.get('share_count'),
'author': v.get('author', {}).get('unique_id'),
'music_title': v.get('music_info', {}).get('title') if v.get('music_info') else None,
'cover_url': v.get('origin_cover') or v.get('cover'),
'play_url': v.get('play'), # direct video URL
}
except json.JSONDecodeError:
pass
return None
def method6_html_video(username, video_id):
"""
METHOD 6: Scrape individual video page HTML for full data.
Gets: caption, full stats, music, hashtags, create time.
"""
url = f'https://www.tiktok.com/@{username}/video/{video_id}'
html = fetch_url(url)
m = re.search(
r'<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__" type="application/json">(.*?)</script>',
html
)
if not m:
return None
data = json.loads(m.group(1))
scope = data.get('__DEFAULT_SCOPE__', {})
vd = scope.get('webapp.video-detail', {})
item = vd.get('itemInfo', {}).get('itemStruct', {})
if not item:
return None
stats = item.get('statsV2', item.get('stats', {}))
music = item.get('music', {})
challenges = item.get('challenges', [])
return {
'video_id': item.get('id'),
'description': item.get('desc'),
'createTime': item.get('createTime'),
'duration': item.get('video', {}).get('duration'),
'stats': {
'plays': int(stats.get('playCount', 0)),
'likes': int(stats.get('diggCount', 0)),
'comments': int(stats.get('commentCount', 0)),
'shares': int(stats.get('shareCount', 0)),
'saves': int(stats.get('collectCount', 0)),
},
'music': {
'title': music.get('title'),
'author': music.get('authorName'),
},
'hashtags': [c.get('title', '') for c in challenges],
'author': item.get('author', {}).get('uniqueId'),
}
def get_full_tiktok_profile(username):
"""
Complete pipeline: Get full profile + discover and scrape recent videos.
Returns dict with profile data, stats, and recent video details.
"""
# Step 1: Get profile data
profile = method1_html_profile(username)
if not profile:
return {'error': f'Could not fetch profile for @{username}'}
result = {
'profile': profile,
'videos': [],
'data_sources': ['tiktok_html_ssr'],
}
# Note: Video discovery requires web_search tool (not available in pure Python)
# In the agent context, use:
# web_search(f"site:tiktok.com/@{username}/video")
# Then for each video ID found, call method6_html_video() or method2_oembed_video()
return result
if __name__ == '__main__':
import sys
username = sys.argv[1] if len(sys.argv) > 1 else 'khaby.lame'
print(f'=== Testing TikTok scraping for @{username} ===\n')
print('--- METHOD 1: HTML Profile Scraping ---')
profile = method1_html_profile(username)
if profile:
print(f' Username: {profile["username"]}')
print(f' Nickname: {profile["nickname"]}')
print(f' Bio: {profile["bio"][:100]}')
print(f' Verified: {profile["verified"]}')
print(f' Followers: {profile["stats"]["followers"]:,}')
print(f' Following: {profile["stats"]["following"]:,}')
print(f' Hearts: {profile["stats"]["hearts"]:,}')
print(f' Videos: {profile["stats"]["videos"]:,}')
print(f' SecUid: {profile["secUid"][:50]}...')
else:
print(' FAILED')
print('\n--- METHOD 3a: tikwm.com User API ---')
tikwm_user = method3_tikwm_user(username)
if tikwm_user:
s = tikwm_user.get('stats', {})
print(f' Followers: {s.get("followerCount"):,}')
print(f' Hearts: {s.get("heartCount"):,}')
print(f' Videos: {s.get("videoCount"):,}')
else:
print(' FAILED')
# Test with a known video
test_video_id = '7615318641042623775' # khaby birthday video
if username == 'khaby.lame':
print(f'\n--- METHOD 2: oEmbed for video {test_video_id} ---')
oembed = method2_oembed_video(username, test_video_id)
if oembed:
print(f' Title: {oembed["title"][:80]}')
print(f'\n--- METHOD 6: HTML Video Scraping for {test_video_id} ---')
video = method6_html_video(username, test_video_id)
if video:
print(f' Description: {video["description"][:80]}')
print(f' Plays: {video["stats"]["plays"]:,}')
print(f' Likes: {video["stats"]["likes"]:,}')
print(f' Comments: {video["stats"]["comments"]:,}')
print(f' Shares: {video["stats"]["shares"]:,}')
print(f' Hashtags: {video["hashtags"]}')
print('\n=== DONE ===')
+260
View File
@@ -0,0 +1,260 @@
"""
Direct X/Twitter API v2 client for Hermes Simulator.
No x-cli dependency uses curl via terminal() with bearer token.
Provides:
- get_user(handle) profile, bio, metrics
- get_tweets(user_id, count) recent tweets with metrics
- search_tweets(query, count) search for tweets
- get_user_mentions(user_id, count) mentions of a user
"""
from hermes_tools import terminal
import json
import os
import time
import urllib.parse
# Bearer token — loaded from env or hardcoded fallback
BEARER = os.environ.get("X_BEARER_TOKEN", "")
MAX_RETRIES = 3
BASE_DELAY = 2 # seconds, exponential backoff: 2s, 4s, 8s
def _api_get(endpoint: str, params: dict = None) -> dict:
"""Make authenticated GET request to X API v2 with retry and error handling."""
url = f"https://api.twitter.com/2/{endpoint}"
if params:
qs = "&".join(f"{k}={urllib.parse.quote(str(v))}" for k, v in params.items())
url += f"?{qs}"
for attempt in range(MAX_RETRIES):
try:
r = terminal(f'curl -s -w \'\\n%{{http_code}}\' -H "Authorization: Bearer {BEARER}" "{url}"')
output = r.get("output", "").strip()
# Split body from status code (last line)
lines = output.rsplit("\n", 1)
if len(lines) == 2:
body, status_str = lines
else:
body = output
status_str = "0"
try:
status_code = int(status_str.strip())
except ValueError:
status_code = 0
# Handle specific status codes
if status_code == 429:
# Rate limited — retry with backoff
delay = BASE_DELAY * (2 ** attempt)
print(f" [X API] Rate limited (429). Retry {attempt+1}/{MAX_RETRIES} in {delay}s...")
time.sleep(delay)
continue
if status_code in (401, 403):
return {"error": f"Authentication failed (HTTP {status_code}). Check X_BEARER_TOKEN.", "http_status": status_code}
if status_code >= 500:
delay = BASE_DELAY * (2 ** attempt)
print(f" [X API] Server error ({status_code}). Retry {attempt+1}/{MAX_RETRIES} in {delay}s...")
time.sleep(delay)
continue
if status_code == 0 and not body:
# Network error — no response at all
delay = BASE_DELAY * (2 ** attempt)
print(f" [X API] Network error. Retry {attempt+1}/{MAX_RETRIES} in {delay}s...")
time.sleep(delay)
continue
try:
return json.loads(body)
except json.JSONDecodeError:
return {"error": f"Failed to parse response (HTTP {status_code}): {body[:200]}"}
except Exception as e:
delay = BASE_DELAY * (2 ** attempt)
print(f" [X API] Exception: {e}. Retry {attempt+1}/{MAX_RETRIES} in {delay}s...")
time.sleep(delay)
continue
return {"error": f"All {MAX_RETRIES} retries exhausted for {endpoint}"}
def get_user(handle: str) -> dict:
"""Get user profile by handle."""
handle = handle.lstrip("@")
return _api_get(f"users/by/username/{handle}", {
"user.fields": "description,public_metrics,profile_image_url,created_at,location,url"
})
def get_tweets(user_id: str, count: int = 20) -> dict:
"""Get user's recent tweets."""
return _api_get(f"users/{user_id}/tweets", {
"max_results": max(min(count, 100), 5),
"tweet.fields": "created_at,public_metrics,text,in_reply_to_user_id,referenced_tweets",
"exclude": "retweets" # original tweets only for voice analysis
})
def get_tweets_with_rts(user_id: str, count: int = 20) -> dict:
"""Get user's recent tweets including retweets (shows interests)."""
return _api_get(f"users/{user_id}/tweets", {
"max_results": max(min(count, 100), 5),
"tweet.fields": "created_at,public_metrics,text,referenced_tweets"
})
def search_tweets(query: str, count: int = 10) -> dict:
"""Search recent tweets."""
return _api_get("tweets/search/recent", {
"query": query,
"max_results": max(min(count, 100), 10),
"tweet.fields": "created_at,public_metrics,text,author_id"
})
def get_user_by_id(user_id: str) -> dict:
"""Get user profile by ID."""
return _api_get(f"users/{user_id}", {
"user.fields": "description,public_metrics,username,name"
})
# ═══════════════════════════════════════════════════════════════
# HIGH-LEVEL INTELLIGENCE FUNCTIONS
# ═══════════════════════════════════════════════════════════════
def profile_user(handle: str) -> dict:
"""Full profile pull: identity + recent tweets (originals only)."""
user = get_user(handle)
if "errors" in user or "error" in user:
return {"error": f"User @{handle} not found", "details": user}
user_data = user.get("data", {})
user_id = user_data.get("id")
result = {
"profile": user_data,
"tweets": [],
"voice_samples": [],
}
if user_id:
# Get original tweets (no RTs) for voice analysis
tweets = get_tweets(user_id, 20)
tweet_list = tweets.get("data", [])
result["tweets"] = tweet_list
# Extract pure text samples for voice profiling
# Only exclude retweets and actual replies (has in_reply_to_user_id)
# Tweets starting with @ are fine if they're standalone mentions
result["voice_samples"] = [
t["text"] for t in tweet_list
if not t.get("text", "").startswith("RT @")
and not t.get("in_reply_to_user_id")
]
return result
def profile_interactions(handle1: str, handle2: str) -> dict:
"""Find interactions between two users."""
# Search for replies from handle1 to handle2
q1 = f"from:{handle1} to:{handle2}"
q2 = f"from:{handle2} to:{handle1}"
r1 = search_tweets(q1, 10)
r2 = search_tweets(q2, 10)
return {
f"{handle1}_to_{handle2}": r1.get("data", []),
f"{handle2}_to_{handle1}": r2.get("data", []),
}
def get_voice_data(handle: str, count: int = 50) -> dict:
"""Pull maximum voice data: tweets, replies, quote tweets.
Returns categorized samples for voice profiling."""
user = get_user(handle)
if "errors" in user or "error" in user:
return {"error": f"User @{handle} not found"}
user_data = user.get("data", {})
user_id = user_data.get("id")
if not user_id:
return {"error": "No user ID found"}
# Original tweets (exclude RTs)
originals = get_tweets(user_id, min(count, 100))
original_list = originals.get("data", [])
# Categorize — only use in_reply_to_user_id to detect replies
standalone = [] # not replies
replies = [] # replies to others
for t in original_list:
text = t.get("text", "")
if t.get("in_reply_to_user_id"):
replies.append(text)
else:
standalone.append(text)
return {
"profile": user_data,
"standalone_tweets": standalone, # their voice at rest
"replies": replies, # their voice in conversation
"total_samples": len(standalone) + len(replies),
}
# ═══════════════════════════════════════════════════════════════
# ENTRY POINT
# ═══════════════════════════════════════════════════════════════
if __name__ == "__main__":
if not BEARER:
print("ERROR: X_BEARER_TOKEN not set. Set it in environment or ~/.hermes/.env")
print("Trying to load from .env...")
try:
with open(os.path.expanduser("~/.hermes/.env")) as f:
for line in f:
line = line.strip()
if line.startswith("X_BEARER_TOKEN="):
# Use split with maxsplit=1 to handle values with '=' in them
# Also strip surrounding quotes if present
val = line.split("=", 1)[1]
if val and val[0] in ('"', "'") and val[-1] == val[0]:
val = val[1:-1]
BEARER = val
break
except Exception as e:
print(f" Failed to load .env: {e}")
if not BEARER:
print("FATAL: No bearer token found.")
exit(1)
# Demo: profile two users
for handle in ["Teknium", "basedjensen"]:
print(f"\n{'='*60}")
print(f" PROFILING @{handle}")
print(f"{'='*60}")
data = profile_user(handle)
profile = data.get("profile", {})
print(f" Name: {profile.get('name')}")
print(f" Bio: {profile.get('description')}")
metrics = profile.get("public_metrics", {})
print(f" Followers: {metrics.get('followers_count')}")
print(f" Tweets: {metrics.get('tweet_count')}")
print(f" Likes given: {metrics.get('like_count')}")
print(f"\n Voice samples ({len(data.get('voice_samples', []))}):")
for sample in data.get("voice_samples", [])[:5]:
print(f" > {sample[:120]}")
@@ -0,0 +1,136 @@
# DOSSIER: {display_name} (@{handle})
## Identity
- **Name**: {real_name}
- **Handle(s)**: @{twitter} | u/{reddit} | {discord_tag}
- **Role**: {role_and_org}
- **Known for**: {what_they_are_famous_for}
- **Followers/reach**: {approximate_follower_count}
- **Confidence**: {HIGH|MEDIUM|LOW} — {confidence_reason}
## Voice Profile
### Linguistic Patterns
- **Sentence structure**: {short_punchy | long_flowing | mixed}
- **Capitalization**: {normal | all_lowercase | CAPS_FOR_EMPHASIS | mixed}
- **Punctuation**: {heavy_periods | ellipsis_lover | no_punctuation | exclamation_marks}
- **Paragraph style**: {one_liners | thread_essays | medium_blocks}
- **Emoji/emoticon usage**: {none | minimal | heavy | specific_ones}
### Vocabulary & Slang
- **Register**: {academic | casual | shitposter | mixed}
- **Recurring words/phrases**: [list of signature words they use a lot]
- **Catchphrases**: [any repeated phrases or running jokes]
- **Profanity level**: {none | mild | moderate | heavy}
- **Jargon tendency**: {explains_everything | assumes_expertise | mixes}
### Tone
- **Default mood**: {earnest | ironic | combative | chill | manic | analytical}
- **Humor style**: {deadpan | absurdist | sarcastic | wholesome | shitpost | none}
- **How they handle disagreement**: {engages_thoughtfully | dunks | ignores | ratio_warrior | passive_aggressive}
- **How they handle praise**: {deflects | accepts_gracefully | awkward | flexes}
## Positions & Beliefs
### Core Convictions (things they consistently advocate for)
1. {conviction_1}
2. {conviction_2}
3. {conviction_3}
### Known Hot Takes
1. {take_1}
2. {take_2}
### Hills They'll Die On
1. {hill_1}
2. {hill_2}
### Topics They Avoid or Refuse to Engage
1. {avoidance_1}
## Social Dynamics
### People They Interact With Positively
- @{ally_1} — {relationship_description}
- @{ally_2} — {relationship_description}
### People They Beef With / Disagree With
- @{rival_1} — {beef_description}
### How They Engage Different Types
- **Fans/supporters**: {how_they_respond}
- **Critics**: {how_they_respond}
- **Peers**: {how_they_respond}
- **Random people**: {how_they_respond}
## Platform-Specific Behavior
### On Twitter/X
- **Post frequency**: {multiple_daily | daily | few_per_week}
- **Thread tendency**: {never | sometimes | loves_threads}
- **QRT style**: {adds_context | dunks | amplifies}
- **Engagement style**: {likes_a_lot | rarely_likes | retweets_heavy}
### On Reddit (if applicable)
- **Subreddits**: [list]
- **Comment style**: {detailed | brief | combative}
### On Discord (if applicable)
- **Servers**: [known servers]
- **Vibe shift from Twitter**: {description}
## Signature Moves
Things this person characteristically does that make them recognizable:
1. {signature_move_1}
2. {signature_move_2}
3. {signature_move_3}
## Sample Quotes (real, sourced from research)
> "{actual_quote_1}" — [source/context]
> "{actual_quote_2}" — [source/context]
> "{actual_quote_3}" — [source/context]
## Deep Psychometric Profile
- **Big Five**: O{H/M/L} C{} E{} A{} N{} — {evidence}
- **Moral Foundations**: Care{} Fair{} Loyal{} Auth{} Sanct{} Liberty{} — {what drives their ethics}
- **Schwartz Values**: {dominant values} — {how they justify positions}
- **Cognitive Style**: {IC score estimate} — {hedging patterns, complexity, analytical vs intuitive}
- **Narrative Frame**: {dominant frame} — {how they lens issues}
- **Persona Authenticity**: {1-5 score} — {evidence for curation vs authenticity}
## Strategic Self-Presentation (Red Hat)
- **Cultivated image**: {what they want to be seen as}
- **Target audience**: {who they're performing for}
- **Incentive structure**: {what they gain from this persona}
- **Possible divergences**: {where persona may ≠ person}
- **Ghostwriting indicators**: {present/absent, evidence}
## Ecosystem Context
- **Community cluster**: {which tribe they belong to}
- **Key influencers**: {who they amplify/follow/agree with}
- **Echo chamber**: {what information environment they're in}
- **Audience profile**: {who follows them, how that audience reacts}
## Key Assumptions
1. {assumption} — FRAGILITY: {robust/moderate/fragile} — Test: {what invalidates it}
2. {assumption} — FRAGILITY: {} — Test: {}
3. {assumption} — FRAGILITY: {} — Test: {}
## Competing Hypotheses
- **H1 (PRIMARY)**: {main personality model} — Confidence: {X}%
- **H2 (ALTERNATIVE)**: {alternative explanation} — Confidence: {X}%
- **Key discriminator**: {what evidence would shift between H1 and H2}
## Research Sources
- {source_1} [{reliability}{confidence}] — {description}
- {source_2} [{reliability}{confidence}] — {description}
- {source_3} [{reliability}{confidence}] — {description}
## Invalidation Indicators
1. If @{handle} {does X instead of Y}, our {assessment} is wrong
2. If @{handle} {responds to Z with Q}, our {model} needs revision
3. If @{handle} {interacts with @person in manner M}, dynamics model is off
---
*Dossier compiled: {date} | Fidelity: {fidelity}% | Persona Authenticity: {1-5}*
*Source reliability range: {best}-{worst} | Analytical confidence: {1-6}*
+19 -55
View File
@@ -1,12 +1,11 @@
# Hindsight Memory Provider
Long-term memory with knowledge graph, entity resolution, and multi-strategy retrieval. Supports cloud, local embedded, and local external modes.
Long-term memory with knowledge graph, entity resolution, and multi-strategy retrieval. Supports cloud and local (embedded) modes.
## Requirements
- **Cloud:** API key from [ui.hindsight.vectorize.io](https://ui.hindsight.vectorize.io)
- **Local Embedded:** API key for a supported LLM provider (OpenAI, Anthropic, Gemini, Groq, OpenRouter, MiniMax, Ollama, or any OpenAI-compatible endpoint). Embeddings and reranking run locally — no additional API keys needed.
- **Local External:** A running Hindsight instance (Docker or self-hosted) reachable over HTTP.
- **Local:** API key for a supported LLM provider (OpenAI, Anthropic, Gemini, Groq, MiniMax, or Ollama). Embeddings and reranking run locally — no additional API keys needed.
## Setup
@@ -22,28 +21,17 @@ hermes config set memory.provider hindsight
echo "HINDSIGHT_API_KEY=your-key" >> ~/.hermes/.env
```
### Cloud
### Cloud Mode
Connects to the Hindsight Cloud API. Requires an API key from [ui.hindsight.vectorize.io](https://ui.hindsight.vectorize.io).
### Local Embedded
### Local Mode
Hermes spins up a local Hindsight daemon with built-in PostgreSQL. Requires an LLM API key for memory extraction and synthesis. The daemon starts automatically in the background on first use and stops after 5 minutes of inactivity.
Supports any OpenAI-compatible LLM endpoint (llama.cpp, vLLM, LM Studio, etc.) — pick `openai_compatible` as the provider and enter the base URL.
Runs an embedded Hindsight server with built-in PostgreSQL. Requires an LLM API key (e.g. Groq, OpenAI, Anthropic) for memory extraction and synthesis. The daemon starts automatically in the background on first use and stops after 5 minutes of inactivity.
Daemon startup logs: `~/.hermes/logs/hindsight-embed.log`
Daemon runtime logs: `~/.hindsight/profiles/<profile>.log`
To open the Hindsight web UI (local embedded mode only):
```bash
hindsight-embed -p hermes ui start
```
### Local External
Points the plugin at an existing Hindsight instance you're already running (Docker, self-hosted, etc.). No daemon management — just a URL and an optional API key.
## Config
Config file: `~/.hermes/hindsight/config.json`
@@ -52,58 +40,39 @@ Config file: `~/.hermes/hindsight/config.json`
| Key | Default | Description |
|-----|---------|-------------|
| `mode` | `cloud` | `cloud`, `local_embedded`, or `local_external` |
| `api_url` | `https://api.hindsight.vectorize.io` | API URL (cloud and local_external modes) |
| `mode` | `cloud` | `cloud` or `local` |
| `api_url` | `https://api.hindsight.vectorize.io` | API URL (cloud mode) |
| `api_url` | `http://localhost:8888` | API URL (local mode, unused — daemon manages its own port) |
### Memory Bank
### Memory
| Key | Default | Description |
|-----|---------|-------------|
| `bank_id` | `hermes` | Memory bank name |
| `bank_mission` | — | Reflect mission (identity/framing for reflect reasoning). Applied via Banks API. |
| `bank_retain_mission` | — | Retain mission (steers what gets extracted). Applied via Banks API. |
### Recall
| Key | Default | Description |
|-----|---------|-------------|
| `recall_budget` | `mid` | Recall thoroughness: `low` / `mid` / `high` |
| `recall_prefetch_method` | `recall` | Auto-recall method: `recall` (raw facts) or `reflect` (LLM synthesis) |
| `recall_max_tokens` | `4096` | Maximum tokens for recall results |
| `recall_max_input_chars` | `800` | Maximum input query length for auto-recall |
| `recall_prompt_preamble` | — | Custom preamble for recalled memories in context |
| `recall_tags` | — | Tags to filter when searching memories |
| `recall_tags_match` | `any` | Tag matching mode: `any` / `all` / `any_strict` / `all_strict` |
| `auto_recall` | `true` | Automatically recall memories before each turn |
### Retain
| Key | Default | Description |
|-----|---------|-------------|
| `auto_retain` | `true` | Automatically retain conversation turns |
| `retain_async` | `true` | Process retain asynchronously on the Hindsight server |
| `retain_every_n_turns` | `1` | Retain every N turns (1 = every turn) |
| `retain_context` | `conversation between Hermes Agent and the User` | Context label for retained memories |
| `tags` | — | Tags applied when storing memories |
| `budget` | `mid` | Recall thoroughness: `low` / `mid` / `high` |
### Integration
| Key | Default | Description |
|-----|---------|-------------|
| `memory_mode` | `hybrid` | How memories are integrated into the agent |
| `prefetch_method` | `recall` | Method for automatic context injection |
**memory_mode:**
- `hybrid` — automatic context injection + tools available to the LLM
- `context` — automatic injection only, no tools exposed
- `tools` — tools only, no automatic injection
### Local Embedded LLM
**prefetch_method:**
- `recall` — injects raw memory facts (fast)
- `reflect` — injects LLM-synthesized summary (slower, more coherent)
### Local Mode LLM
| Key | Default | Description |
|-----|---------|-------------|
| `llm_provider` | `openai` | `openai`, `anthropic`, `gemini`, `groq`, `openrouter`, `minimax`, `ollama`, `lmstudio`, `openai_compatible` |
| `llm_model` | per-provider | Model name (e.g. `gpt-4o-mini`, `qwen/qwen3.5-9b`) |
| `llm_base_url` | — | Endpoint URL for `openai_compatible` (e.g. `http://192.168.1.10:8080/v1`) |
| `llm_provider` | `openai` | LLM provider: `openai`, `anthropic`, `gemini`, `groq`, `minimax`, `ollama` |
| `llm_model` | per-provider | Model name (e.g. `gpt-4o-mini`, `openai/gpt-oss-120b`) |
The LLM API key is stored in `~/.hermes/.env` as `HINDSIGHT_LLM_API_KEY`.
@@ -123,12 +92,7 @@ Available in `hybrid` and `tools` memory modes:
|----------|-------------|
| `HINDSIGHT_API_KEY` | API key for Hindsight Cloud |
| `HINDSIGHT_LLM_API_KEY` | LLM API key for local mode |
| `HINDSIGHT_API_LLM_BASE_URL` | LLM Base URL for local mode (e.g. OpenRouter) |
| `HINDSIGHT_API_URL` | Override API endpoint |
| `HINDSIGHT_BANK_ID` | Override bank name |
| `HINDSIGHT_BUDGET` | Override recall budget |
| `HINDSIGHT_MODE` | Override mode (`cloud`, `local_embedded`, `local_external`) |
## Client Version
Requires `hindsight-client >= 0.4.22`. The plugin auto-upgrades on session start if an older version is detected.
| `HINDSIGHT_MODE` | Override mode (`cloud` / `local`) |
+47 -415
View File
@@ -23,30 +23,24 @@ import json
import logging
import os
import threading
from hermes_constants import get_hermes_home
from typing import Any, Dict, List
from agent.memory_provider import MemoryProvider
from hermes_constants import get_hermes_home
from tools.registry import tool_error
logger = logging.getLogger(__name__)
_DEFAULT_API_URL = "https://api.hindsight.vectorize.io"
_DEFAULT_LOCAL_URL = "http://localhost:8888"
_MIN_CLIENT_VERSION = "0.4.22"
_VALID_BUDGETS = {"low", "mid", "high"}
_PROVIDER_DEFAULT_MODELS = {
"openai": "gpt-4o-mini",
"anthropic": "claude-haiku-4-5",
"gemini": "gemini-2.5-flash",
"groq": "openai/gpt-oss-120b",
"openrouter": "qwen/qwen3.5-9b",
"minimax": "MiniMax-M2.7",
"ollama": "gemma3:12b",
"lmstudio": "local-model",
"openai_compatible": "your-model-name",
}
@@ -148,6 +142,7 @@ def _load_config() -> dict:
3. Environment variables
"""
from pathlib import Path
from hermes_constants import get_hermes_home
# Profile-scoped path (preferred)
profile_path = get_hermes_home() / "hindsight" / "config.json"
@@ -192,7 +187,6 @@ class HindsightMemoryProvider(MemoryProvider):
self._bank_id = "hermes"
self._budget = "mid"
self._mode = "cloud"
self._llm_base_url = ""
self._memory_mode = "hybrid" # "context", "tools", or "hybrid"
self._prefetch_method = "recall" # "recall" or "reflect"
self._client = None
@@ -200,31 +194,6 @@ class HindsightMemoryProvider(MemoryProvider):
self._prefetch_lock = threading.Lock()
self._prefetch_thread = None
self._sync_thread = None
self._session_id = ""
# Tags
self._tags: list[str] | None = None
self._recall_tags: list[str] | None = None
self._recall_tags_match = "any"
# Retain controls
self._auto_retain = True
self._retain_every_n_turns = 1
self._retain_context = "conversation between Hermes Agent and the User"
self._turn_counter = 0
self._session_turns: list[str] = [] # accumulates ALL turns for the session
# Recall controls
self._auto_recall = True
self._recall_max_tokens = 4096
self._recall_types: list[str] | None = None
self._recall_prompt_preamble = ""
self._recall_max_input_chars = 800
# Bank
self._bank_mission = ""
self._bank_retain_mission: str | None = None
self._retain_async = True
@property
def name(self) -> str:
@@ -234,7 +203,7 @@ class HindsightMemoryProvider(MemoryProvider):
try:
cfg = _load_config()
mode = cfg.get("mode", "cloud")
if mode in ("local", "local_embedded", "local_external"):
if mode == "local":
return True
has_key = bool(cfg.get("apiKey") or os.environ.get("HINDSIGHT_API_KEY", ""))
has_url = bool(cfg.get("api_url") or os.environ.get("HINDSIGHT_API_URL", ""))
@@ -258,306 +227,68 @@ class HindsightMemoryProvider(MemoryProvider):
existing.update(values)
config_path.write_text(json.dumps(existing, indent=2))
def post_setup(self, hermes_home: str, config: dict) -> None:
"""Custom setup wizard — installs only the deps needed for the selected mode."""
import getpass
import subprocess
import shutil
import sys
from pathlib import Path
from hermes_cli.config import save_config
from hermes_cli.memory_setup import _curses_select
print("\n Configuring Hindsight memory:\n")
# Step 1: Mode selection
mode_items = [
("Cloud", "Hindsight Cloud API (lightweight, just needs an API key)"),
("Local Embedded", "Run Hindsight locally (downloads ~200MB, needs LLM key)"),
("Local External", "Connect to an existing Hindsight instance"),
]
mode_idx = _curses_select(" Select mode", mode_items, default=0)
mode = ["cloud", "local_embedded", "local_external"][mode_idx]
provider_config: dict = {"mode": mode}
env_writes: dict = {}
# Step 2: Install/upgrade deps for selected mode
_MIN_CLIENT_VERSION = "0.4.22"
cloud_dep = f"hindsight-client>={_MIN_CLIENT_VERSION}"
local_dep = "hindsight-all"
if mode == "local_embedded":
deps_to_install = [local_dep]
elif mode == "local_external":
deps_to_install = [cloud_dep]
else:
deps_to_install = [cloud_dep]
print(f"\n Checking dependencies...")
uv_path = shutil.which("uv")
if not uv_path:
print(" ⚠ uv not found — install it: curl -LsSf https://astral.sh/uv/install.sh | sh")
print(f" Then run manually: uv pip install --python {sys.executable} {' '.join(deps_to_install)}")
else:
try:
subprocess.run(
[uv_path, "pip", "install", "--python", sys.executable, "--quiet", "--upgrade"] + deps_to_install,
check=True, timeout=120, capture_output=True,
)
print(f" ✓ Dependencies up to date")
except Exception as e:
print(f" ⚠ Install failed: {e}")
print(f" Run manually: uv pip install --python {sys.executable} {' '.join(deps_to_install)}")
# Step 3: Mode-specific config
if mode == "cloud":
print(f"\n Get your API key at https://ui.hindsight.vectorize.io\n")
existing_key = os.environ.get("HINDSIGHT_API_KEY", "")
if existing_key:
masked = f"...{existing_key[-4:]}" if len(existing_key) > 4 else "set"
sys.stdout.write(f" API key (current: {masked}, blank to keep): ")
sys.stdout.flush()
api_key = getpass.getpass(prompt="") if sys.stdin.isatty() else sys.stdin.readline().strip()
else:
sys.stdout.write(" API key: ")
sys.stdout.flush()
api_key = getpass.getpass(prompt="") if sys.stdin.isatty() else sys.stdin.readline().strip()
if api_key:
env_writes["HINDSIGHT_API_KEY"] = api_key
val = input(f" API URL [{_DEFAULT_API_URL}]: ").strip()
if val:
provider_config["api_url"] = val
elif mode == "local_external":
val = input(f" Hindsight API URL [{_DEFAULT_LOCAL_URL}]: ").strip()
provider_config["api_url"] = val or _DEFAULT_LOCAL_URL
sys.stdout.write(" API key (optional, blank to skip): ")
sys.stdout.flush()
api_key = getpass.getpass(prompt="") if sys.stdin.isatty() else sys.stdin.readline().strip()
if api_key:
env_writes["HINDSIGHT_API_KEY"] = api_key
else: # local_embedded
providers_list = list(_PROVIDER_DEFAULT_MODELS.keys())
llm_items = [
(p, f"default model: {_PROVIDER_DEFAULT_MODELS[p]}")
for p in providers_list
]
llm_idx = _curses_select(" Select LLM provider", llm_items, default=0)
llm_provider = providers_list[llm_idx]
provider_config["llm_provider"] = llm_provider
if llm_provider == "openai_compatible":
val = input(" LLM endpoint URL (e.g. http://192.168.1.10:8080/v1): ").strip()
if val:
provider_config["llm_base_url"] = val
elif llm_provider == "openrouter":
provider_config["llm_base_url"] = "https://openrouter.ai/api/v1"
default_model = _PROVIDER_DEFAULT_MODELS.get(llm_provider, "gpt-4o-mini")
val = input(f" LLM model [{default_model}]: ").strip()
provider_config["llm_model"] = val or default_model
sys.stdout.write(" LLM API key: ")
sys.stdout.flush()
llm_key = getpass.getpass(prompt="") if sys.stdin.isatty() else sys.stdin.readline().strip()
if llm_key:
env_writes["HINDSIGHT_LLM_API_KEY"] = llm_key
# Step 4: Save everything
provider_config["bank_id"] = "hermes"
provider_config["recall_budget"] = "mid"
bank_id = "hermes"
config["memory"]["provider"] = "hindsight"
save_config(config)
self.save_config(provider_config, hermes_home)
if env_writes:
env_path = Path(hermes_home) / ".env"
env_path.parent.mkdir(parents=True, exist_ok=True)
existing_lines = []
if env_path.exists():
existing_lines = env_path.read_text().splitlines()
updated_keys = set()
new_lines = []
for line in existing_lines:
key_match = line.split("=", 1)[0].strip() if "=" in line and not line.startswith("#") else None
if key_match and key_match in env_writes:
new_lines.append(f"{key_match}={env_writes[key_match]}")
updated_keys.add(key_match)
else:
new_lines.append(line)
for k, v in env_writes.items():
if k not in updated_keys:
new_lines.append(f"{k}={v}")
env_path.write_text("\n".join(new_lines) + "\n")
print(f"\n ✓ Hindsight memory configured ({mode} mode)")
if env_writes:
print(f" API keys saved to .env")
print(f"\n Start a new session to activate.\n")
def get_config_schema(self):
return [
{"key": "mode", "description": "Connection mode", "default": "cloud", "choices": ["cloud", "local_embedded", "local_external"]},
# Cloud mode
{"key": "api_url", "description": "Hindsight Cloud API URL", "default": _DEFAULT_API_URL, "when": {"mode": "cloud"}},
{"key": "mode", "description": "Cloud API or local embedded mode", "default": "cloud", "choices": ["cloud", "local"]},
{"key": "api_url", "description": "Hindsight API URL", "default": _DEFAULT_API_URL, "when": {"mode": "cloud"}},
{"key": "api_key", "description": "Hindsight Cloud API key", "secret": True, "env_var": "HINDSIGHT_API_KEY", "url": "https://ui.hindsight.vectorize.io", "when": {"mode": "cloud"}},
# Local external mode
{"key": "api_url", "description": "Hindsight API URL", "default": _DEFAULT_LOCAL_URL, "when": {"mode": "local_external"}},
{"key": "api_key", "description": "API key (optional)", "secret": True, "env_var": "HINDSIGHT_API_KEY", "when": {"mode": "local_external"}},
# Local embedded mode
{"key": "llm_provider", "description": "LLM provider", "default": "openai", "choices": ["openai", "anthropic", "gemini", "groq", "openrouter", "minimax", "ollama", "lmstudio", "openai_compatible"], "when": {"mode": "local_embedded"}},
{"key": "llm_base_url", "description": "Endpoint URL (e.g. http://192.168.1.10:8080/v1)", "default": "", "when": {"mode": "local_embedded", "llm_provider": "openai_compatible"}},
{"key": "llm_api_key", "description": "LLM API key (optional for openai_compatible)", "secret": True, "env_var": "HINDSIGHT_LLM_API_KEY", "when": {"mode": "local_embedded"}},
{"key": "llm_model", "description": "LLM model", "default": "gpt-4o-mini", "default_from": {"field": "llm_provider", "map": _PROVIDER_DEFAULT_MODELS}, "when": {"mode": "local_embedded"}},
{"key": "llm_provider", "description": "LLM provider for local mode", "default": "openai", "choices": ["openai", "anthropic", "gemini", "groq", "minimax", "ollama"], "when": {"mode": "local"}},
{"key": "llm_api_key", "description": "LLM API key for local Hindsight", "secret": True, "env_var": "HINDSIGHT_LLM_API_KEY", "when": {"mode": "local"}},
{"key": "llm_model", "description": "LLM model for local mode", "default": "gpt-4o-mini", "default_from": {"field": "llm_provider", "map": _PROVIDER_DEFAULT_MODELS}, "when": {"mode": "local"}},
{"key": "bank_id", "description": "Memory bank name", "default": "hermes"},
{"key": "bank_mission", "description": "Mission/purpose description for the memory bank"},
{"key": "bank_retain_mission", "description": "Custom extraction prompt for memory retention"},
{"key": "recall_budget", "description": "Recall thoroughness", "default": "mid", "choices": ["low", "mid", "high"]},
{"key": "budget", "description": "Recall thoroughness", "default": "mid", "choices": ["low", "mid", "high"]},
{"key": "memory_mode", "description": "Memory integration mode", "default": "hybrid", "choices": ["hybrid", "context", "tools"]},
{"key": "recall_prefetch_method", "description": "Auto-recall method", "default": "recall", "choices": ["recall", "reflect"]},
{"key": "tags", "description": "Tags applied when storing memories (comma-separated)", "default": ""},
{"key": "recall_tags", "description": "Tags to filter when searching memories (comma-separated)", "default": ""},
{"key": "recall_tags_match", "description": "Tag matching mode for recall", "default": "any", "choices": ["any", "all", "any_strict", "all_strict"]},
{"key": "auto_recall", "description": "Automatically recall memories before each turn", "default": True},
{"key": "auto_retain", "description": "Automatically retain conversation turns", "default": True},
{"key": "retain_every_n_turns", "description": "Retain every N turns (1 = every turn)", "default": 1},
{"key": "retain_async","description": "Process retain asynchronously on the Hindsight server", "default": True},
{"key": "retain_context", "description": "Context label for retained memories", "default": "conversation between Hermes Agent and the User"},
{"key": "recall_max_tokens", "description": "Maximum tokens for recall results", "default": 4096},
{"key": "recall_max_input_chars", "description": "Maximum input query length for auto-recall", "default": 800},
{"key": "recall_prompt_preamble", "description": "Custom preamble for recalled memories in context"},
{"key": "prefetch_method", "description": "Auto-recall method", "default": "recall", "choices": ["recall", "reflect"]},
]
def _get_client(self):
"""Return the cached Hindsight client (created once, reused)."""
if self._client is None:
if self._mode == "local_embedded":
if self._mode == "local":
from hindsight import HindsightEmbedded
# Disable __del__ on the class to prevent "attached to a
# different loop" errors during GC — we handle cleanup in
# shutdown() instead.
HindsightEmbedded.__del__ = lambda self: None
llm_provider = self._config.get("llm_provider", "")
if llm_provider in ("openai_compatible", "openrouter"):
llm_provider = "openai"
logger.debug("Creating HindsightEmbedded client (profile=%s, provider=%s)",
self._config.get("profile", "hermes"), llm_provider)
kwargs = dict(
self._client = HindsightEmbedded(
profile=self._config.get("profile", "hermes"),
llm_provider=llm_provider,
llm_api_key=self._config.get("llmApiKey") or self._config.get("llm_api_key") or os.environ.get("HINDSIGHT_LLM_API_KEY", ""),
llm_provider=self._config.get("llm_provider", ""),
llm_api_key=self._config.get("llmApiKey") or os.environ.get("HINDSIGHT_LLM_API_KEY", ""),
llm_model=self._config.get("llm_model", ""),
)
if self._llm_base_url:
kwargs["llm_base_url"] = self._llm_base_url
self._client = HindsightEmbedded(**kwargs)
else:
from hindsight_client import Hindsight
kwargs = {"base_url": self._api_url, "timeout": 30.0}
if self._api_key:
kwargs["api_key"] = self._api_key
logger.debug("Creating Hindsight cloud client (url=%s, has_key=%s)",
self._api_url, bool(self._api_key))
self._client = Hindsight(**kwargs)
return self._client
def initialize(self, session_id: str, **kwargs) -> None:
self._session_id = session_id
# Check client version and auto-upgrade if needed
try:
from importlib.metadata import version as pkg_version
from packaging.version import Version
installed = pkg_version("hindsight-client")
if Version(installed) < Version(_MIN_CLIENT_VERSION):
logger.warning("hindsight-client %s is outdated (need >=%s), attempting upgrade...",
installed, _MIN_CLIENT_VERSION)
import shutil, subprocess, sys
uv_path = shutil.which("uv")
if uv_path:
try:
subprocess.run(
[uv_path, "pip", "install", "--python", sys.executable,
"--quiet", "--upgrade", f"hindsight-client>={_MIN_CLIENT_VERSION}"],
check=True, timeout=120, capture_output=True,
)
logger.info("hindsight-client upgraded to >=%s", _MIN_CLIENT_VERSION)
except Exception as e:
logger.warning("Auto-upgrade failed: %s. Run: uv pip install 'hindsight-client>=%s'",
e, _MIN_CLIENT_VERSION)
else:
logger.warning("uv not found. Run: pip install 'hindsight-client>=%s'", _MIN_CLIENT_VERSION)
except Exception:
pass # packaging not available or other issue — proceed anyway
self._config = _load_config()
self._mode = self._config.get("mode", "cloud")
# "local" is a legacy alias for "local_embedded"
if self._mode == "local":
self._mode = "local_embedded"
self._api_key = self._config.get("apiKey") or self._config.get("api_key") or os.environ.get("HINDSIGHT_API_KEY", "")
default_url = _DEFAULT_LOCAL_URL if self._mode in ("local_embedded", "local_external") else _DEFAULT_API_URL
self._api_key = self._config.get("apiKey") or os.environ.get("HINDSIGHT_API_KEY", "")
default_url = _DEFAULT_LOCAL_URL if self._mode == "local" else _DEFAULT_API_URL
self._api_url = self._config.get("api_url") or os.environ.get("HINDSIGHT_API_URL", default_url)
self._llm_base_url = self._config.get("llm_base_url", "")
banks = self._config.get("banks", {}).get("hermes", {})
self._bank_id = self._config.get("bank_id") or banks.get("bankId", "hermes")
budget = self._config.get("recall_budget") or self._config.get("budget") or banks.get("budget", "mid")
budget = self._config.get("budget") or banks.get("budget", "mid")
self._budget = budget if budget in _VALID_BUDGETS else "mid"
memory_mode = self._config.get("memory_mode", "hybrid")
self._memory_mode = memory_mode if memory_mode in ("context", "tools", "hybrid") else "hybrid"
prefetch_method = self._config.get("recall_prefetch_method", "recall")
prefetch_method = self._config.get("prefetch_method", "recall")
self._prefetch_method = prefetch_method if prefetch_method in ("recall", "reflect") else "recall"
# Bank options
self._bank_mission = self._config.get("bank_mission", "")
self._bank_retain_mission = self._config.get("bank_retain_mission") or None
# Tags
self._tags = self._config.get("tags") or None
self._recall_tags = self._config.get("recall_tags") or None
self._recall_tags_match = self._config.get("recall_tags_match", "any")
# Retain controls
self._auto_retain = self._config.get("auto_retain", True)
self._retain_every_n_turns = max(1, int(self._config.get("retain_every_n_turns", 1)))
self._retain_context = self._config.get("retain_context", "conversation between Hermes Agent and the User")
# Recall controls
self._auto_recall = self._config.get("auto_recall", True)
self._recall_max_tokens = int(self._config.get("recall_max_tokens", 4096))
self._recall_types = self._config.get("recall_types") or None
self._recall_prompt_preamble = self._config.get("recall_prompt_preamble", "")
self._recall_max_input_chars = int(self._config.get("recall_max_input_chars", 800))
self._retain_async = self._config.get("retain_async", True)
_client_version = "unknown"
try:
from importlib.metadata import version as pkg_version
_client_version = pkg_version("hindsight-client")
except Exception:
pass
logger.info("Hindsight initialized: mode=%s, api_url=%s, bank=%s, budget=%s, memory_mode=%s, prefetch_method=%s, client=%s",
self._mode, self._api_url, self._bank_id, self._budget, self._memory_mode, self._prefetch_method, _client_version)
logger.debug("Hindsight config: auto_retain=%s, auto_recall=%s, retain_every_n=%d, "
"retain_async=%s, retain_context=%s, "
"recall_max_tokens=%d, recall_max_input_chars=%d, tags=%s, recall_tags=%s",
self._auto_retain, self._auto_recall, self._retain_every_n_turns,
self._retain_async, self._retain_context,
self._recall_max_tokens, self._recall_max_input_chars,
self._tags, self._recall_tags)
logger.info("Hindsight initialized: mode=%s, api_url=%s, bank=%s, budget=%s, memory_mode=%s, prefetch_method=%s",
self._mode, self._api_url, self._bank_id, self._budget, self._memory_mode, self._prefetch_method)
# For local mode, start the embedded daemon in the background so it
# doesn't block the chat. Redirect stdout/stderr to a log file to
# prevent rich startup output from spamming the terminal.
if self._mode == "local_embedded":
if self._mode == "local":
def _start_daemon():
import traceback
log_dir = get_hermes_home() / "logs"
@@ -579,12 +310,9 @@ class HindsightMemoryProvider(MemoryProvider):
# If the config changed and the daemon is running, stop it.
from pathlib import Path as _Path
profile_env = _Path.home() / ".hindsight" / "profiles" / f"{profile}.env"
current_key = self._config.get("llm_api_key") or os.environ.get("HINDSIGHT_LLM_API_KEY", "")
current_key = self._config.get("llmApiKey") or os.environ.get("HINDSIGHT_LLM_API_KEY", "")
current_provider = self._config.get("llm_provider", "")
current_model = self._config.get("llm_model", "")
current_base_url = self._config.get("llm_base_url") or os.environ.get("HINDSIGHT_API_LLM_BASE_URL", "")
# Map openai_compatible/openrouter → openai for the daemon (OpenAI wire format)
daemon_provider = "openai" if current_provider in ("openai_compatible", "openrouter") else current_provider
# Read saved profile config
saved = {}
@@ -595,24 +323,20 @@ class HindsightMemoryProvider(MemoryProvider):
saved[k.strip()] = v.strip()
config_changed = (
saved.get("HINDSIGHT_API_LLM_PROVIDER") != daemon_provider or
saved.get("HINDSIGHT_API_LLM_PROVIDER") != current_provider or
saved.get("HINDSIGHT_API_LLM_MODEL") != current_model or
saved.get("HINDSIGHT_API_LLM_API_KEY") != current_key or
saved.get("HINDSIGHT_API_LLM_BASE_URL", "") != current_base_url
saved.get("HINDSIGHT_API_LLM_API_KEY") != current_key
)
if config_changed:
# Write updated profile .env
profile_env.parent.mkdir(parents=True, exist_ok=True)
env_lines = (
f"HINDSIGHT_API_LLM_PROVIDER={daemon_provider}\n"
profile_env.write_text(
f"HINDSIGHT_API_LLM_PROVIDER={current_provider}\n"
f"HINDSIGHT_API_LLM_API_KEY={current_key}\n"
f"HINDSIGHT_API_LLM_MODEL={current_model}\n"
f"HINDSIGHT_API_LOG_LEVEL=info\n"
)
if current_base_url:
env_lines += f"HINDSIGHT_API_LLM_BASE_URL={current_base_url}\n"
profile_env.write_text(env_lines)
if client._manager.is_running(profile):
with open(log_path, "a") as f:
f.write("\n=== Config changed, restarting daemon ===\n")
@@ -653,118 +377,47 @@ class HindsightMemoryProvider(MemoryProvider):
def prefetch(self, query: str, *, session_id: str = "") -> str:
if self._prefetch_thread and self._prefetch_thread.is_alive():
logger.debug("Prefetch: waiting for background thread to complete")
self._prefetch_thread.join(timeout=3.0)
with self._prefetch_lock:
result = self._prefetch_result
self._prefetch_result = ""
if not result:
logger.debug("Prefetch: no results available")
return ""
logger.debug("Prefetch: returning %d chars of context", len(result))
header = self._recall_prompt_preamble or (
"# Hindsight Memory (persistent cross-session context)\n"
"Use this to answer questions about the user and prior sessions. "
"Do not call tools to look up information that is already present here."
)
return f"{header}\n\n{result}"
return f"## Hindsight Memory\n{result}"
def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
if self._memory_mode == "tools":
logger.debug("Prefetch: skipped (tools-only mode)")
return
if not self._auto_recall:
logger.debug("Prefetch: skipped (auto_recall disabled)")
return
# Truncate query to max chars
if self._recall_max_input_chars and len(query) > self._recall_max_input_chars:
query = query[:self._recall_max_input_chars]
def _run():
try:
client = self._get_client()
if self._prefetch_method == "reflect":
logger.debug("Prefetch: calling reflect (bank=%s, query_len=%d)", self._bank_id, len(query))
resp = _run_sync(client.areflect(bank_id=self._bank_id, query=query, budget=self._budget))
text = resp.text or ""
else:
recall_kwargs: dict = {
"bank_id": self._bank_id, "query": query,
"budget": self._budget, "max_tokens": self._recall_max_tokens,
}
if self._recall_tags:
recall_kwargs["tags"] = self._recall_tags
recall_kwargs["tags_match"] = self._recall_tags_match
if self._recall_types:
recall_kwargs["types"] = self._recall_types
logger.debug("Prefetch: calling recall (bank=%s, query_len=%d, budget=%s)",
self._bank_id, len(query), self._budget)
resp = _run_sync(client.arecall(**recall_kwargs))
num_results = len(resp.results) if resp.results else 0
logger.debug("Prefetch: recall returned %d results", num_results)
text = "\n".join(f"- {r.text}" for r in resp.results if r.text) if resp.results else ""
resp = _run_sync(client.arecall(bank_id=self._bank_id, query=query, budget=self._budget))
text = "\n".join(r.text for r in resp.results if r.text) if resp.results else ""
if text:
with self._prefetch_lock:
self._prefetch_result = text
except Exception as e:
logger.debug("Hindsight prefetch failed: %s", e, exc_info=True)
logger.debug("Hindsight prefetch failed: %s", e)
self._prefetch_thread = threading.Thread(target=_run, daemon=True, name="hindsight-prefetch")
self._prefetch_thread.start()
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Retain conversation turn in background (non-blocking).
Respects retain_every_n_turns for batching.
"""
if not self._auto_retain:
logger.debug("sync_turn: skipped (auto_retain disabled)")
return
from datetime import datetime, timezone
now = datetime.now(timezone.utc).isoformat()
messages = [
{"role": "user", "content": user_content, "timestamp": now},
{"role": "assistant", "content": assistant_content, "timestamp": now},
]
turn = json.dumps(messages)
self._session_turns.append(turn)
self._turn_counter += 1
# Only retain every N turns
if self._turn_counter % self._retain_every_n_turns != 0:
logger.debug("sync_turn: buffered turn %d (will retain at turn %d)",
self._turn_counter, self._turn_counter + (self._retain_every_n_turns - self._turn_counter % self._retain_every_n_turns))
return
logger.debug("sync_turn: retaining %d turns, total session content %d chars",
len(self._session_turns), sum(len(t) for t in self._session_turns))
# Send the ENTIRE session as a single JSON array (document_id deduplicates).
# Each element in _session_turns is a JSON string of that turn's messages.
content = "[" + ",".join(self._session_turns) + "]"
"""Retain conversation turn in background (non-blocking)."""
combined = f"User: {user_content}\nAssistant: {assistant_content}"
def _sync():
try:
client = self._get_client()
item: dict = {
"content": content,
"context": self._retain_context,
}
if self._tags:
item["tags"] = self._tags
logger.debug("Hindsight retain: bank=%s, doc=%s, async=%s, content_len=%d, num_turns=%d",
self._bank_id, self._session_id, self._retain_async, len(content), len(self._session_turns))
_run_sync(client.aretain_batch(
bank_id=self._bank_id,
items=[item],
document_id=self._session_id,
retain_async=self._retain_async,
_run_sync(client.aretain(
bank_id=self._bank_id, content=combined, context="conversation"
))
logger.debug("Hindsight retain succeeded")
except Exception as e:
logger.warning("Hindsight sync failed: %s", e, exc_info=True)
logger.warning("Hindsight sync failed: %s", e)
if self._sync_thread and self._sync_thread.is_alive():
self._sync_thread.join(timeout=5.0)
@@ -789,18 +442,12 @@ class HindsightMemoryProvider(MemoryProvider):
return tool_error("Missing required parameter: content")
context = args.get("context")
try:
retain_kwargs: dict = {
"bank_id": self._bank_id, "content": content, "context": context,
}
if self._tags:
retain_kwargs["tags"] = self._tags
logger.debug("Tool hindsight_retain: bank=%s, content_len=%d, context=%s",
self._bank_id, len(content), context)
_run_sync(client.aretain(**retain_kwargs))
logger.debug("Tool hindsight_retain: success")
_run_sync(client.aretain(
bank_id=self._bank_id, content=content, context=context
))
return json.dumps({"result": "Memory stored successfully."})
except Exception as e:
logger.warning("hindsight_retain failed: %s", e, exc_info=True)
logger.warning("hindsight_retain failed: %s", e)
return tool_error(f"Failed to store memory: {e}")
elif tool_name == "hindsight_recall":
@@ -808,26 +455,15 @@ class HindsightMemoryProvider(MemoryProvider):
if not query:
return tool_error("Missing required parameter: query")
try:
recall_kwargs: dict = {
"bank_id": self._bank_id, "query": query, "budget": self._budget,
"max_tokens": self._recall_max_tokens,
}
if self._recall_tags:
recall_kwargs["tags"] = self._recall_tags
recall_kwargs["tags_match"] = self._recall_tags_match
if self._recall_types:
recall_kwargs["types"] = self._recall_types
logger.debug("Tool hindsight_recall: bank=%s, query_len=%d, budget=%s",
self._bank_id, len(query), self._budget)
resp = _run_sync(client.arecall(**recall_kwargs))
num_results = len(resp.results) if resp.results else 0
logger.debug("Tool hindsight_recall: %d results", num_results)
resp = _run_sync(client.arecall(
bank_id=self._bank_id, query=query, budget=self._budget
))
if not resp.results:
return json.dumps({"result": "No relevant memories found."})
lines = [f"{i}. {r.text}" for i, r in enumerate(resp.results, 1)]
return json.dumps({"result": "\n".join(lines)})
except Exception as e:
logger.warning("hindsight_recall failed: %s", e, exc_info=True)
logger.warning("hindsight_recall failed: %s", e)
return tool_error(f"Failed to search memory: {e}")
elif tool_name == "hindsight_reflect":
@@ -835,28 +471,24 @@ class HindsightMemoryProvider(MemoryProvider):
if not query:
return tool_error("Missing required parameter: query")
try:
logger.debug("Tool hindsight_reflect: bank=%s, query_len=%d, budget=%s",
self._bank_id, len(query), self._budget)
resp = _run_sync(client.areflect(
bank_id=self._bank_id, query=query, budget=self._budget
))
logger.debug("Tool hindsight_reflect: response_len=%d", len(resp.text or ""))
return json.dumps({"result": resp.text or "No relevant memories found."})
except Exception as e:
logger.warning("hindsight_reflect failed: %s", e, exc_info=True)
logger.warning("hindsight_reflect failed: %s", e)
return tool_error(f"Failed to reflect: {e}")
return tool_error(f"Unknown tool: {tool_name}")
def shutdown(self) -> None:
logger.debug("Hindsight shutdown: waiting for background threads")
global _loop, _loop_thread
for t in (self._prefetch_thread, self._sync_thread):
if t and t.is_alive():
t.join(timeout=5.0)
if self._client is not None:
try:
if self._mode == "local_embedded":
if self._mode == "local":
# Use the public close() API. The RuntimeError from
# aiohttp's "attached to a different loop" is expected
# and harmless — the daemon keeps running independently.
+4 -2
View File
@@ -2,7 +2,9 @@ name: hindsight
version: 1.0.0
description: "Hindsight — long-term memory with knowledge graph, entity resolution, and multi-strategy retrieval."
pip_dependencies:
- "hindsight-client>=0.4.22"
requires_env: []
- hindsight-client
- hindsight-all
requires_env:
- HINDSIGHT_API_KEY
hooks:
- on_session_end
+1 -3
View File
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "hermes-agent"
version = "0.8.0"
version = "0.7.0"
description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
readme = "README.md"
requires-python = ">=3.11"
@@ -62,7 +62,6 @@ mcp = ["mcp>=1.2.0,<2"]
homeassistant = ["aiohttp>=3.9.0,<4"]
sms = ["aiohttp>=3.9.0,<4"]
acp = ["agent-client-protocol>=0.9.0,<1.0"]
mistral = ["mistralai>=2.3.0,<3"]
dingtalk = ["dingtalk-stream>=0.1.0,<1"]
feishu = ["lark-oapi>=1.5.3,<2"]
rl = [
@@ -95,7 +94,6 @@ all = [
"hermes-agent[voice]",
"hermes-agent[dingtalk]",
"hermes-agent[feishu]",
"hermes-agent[mistral]",
]
[project.scripts]
+89 -318
View File
@@ -66,8 +66,7 @@ from model_tools import (
handle_function_call,
check_toolset_requirements,
)
from tools.terminal_tool import cleanup_vm, get_active_env, is_persistent_env
from tools.tool_result_storage import maybe_persist_tool_result, enforce_turn_budget
from tools.terminal_tool import cleanup_vm
from tools.interrupt import set_interrupt as _set_interrupt
from tools.browser_tool import cleanup_browser
@@ -76,7 +75,6 @@ from hermes_constants import OPENROUTER_BASE_URL
# Agent internals extracted to agent/ package for modularity
from agent.memory_manager import build_memory_context_block
from agent.retry_utils import jittered_backoff
from agent.prompt_builder import (
DEFAULT_AGENT_IDENTITY, PLATFORM_HINTS,
MEMORY_GUIDANCE, SESSION_SEARCH_GUIDANCE, SKILLS_GUIDANCE,
@@ -87,7 +85,6 @@ from agent.model_metadata import (
estimate_tokens_rough, estimate_messages_tokens_rough, estimate_request_tokens_rough,
get_next_probe_tier, parse_context_limit_from_error,
save_context_length, is_local_endpoint,
query_ollama_num_ctx,
)
from agent.context_compressor import ContextCompressor
from agent.subdirectory_hints import SubdirectoryHintTracker
@@ -412,26 +409,62 @@ def _strip_budget_warnings_from_history(messages: list) -> None:
# Large tool result handler — save oversized output to temp file
# =========================================================================
# Threshold at which tool results are saved to a file instead of kept inline.
# 100K chars ≈ 25K tokens — generous for any reasonable output but prevents
# catastrophic context explosions.
_LARGE_RESULT_CHARS = 100_000
# =========================================================================
# Qwen Portal headers — mimics QwenCode CLI for portal.qwen.ai compatibility.
# Extracted as a module-level helper so both __init__ and
# _apply_client_headers_for_base_url can share it.
# =========================================================================
_QWEN_CODE_VERSION = "0.14.1"
# How many characters of the original result to include as an inline preview
# so the model has immediate context about what the tool returned.
_LARGE_RESULT_PREVIEW_CHARS = 1_500
def _qwen_portal_headers() -> dict:
"""Return default HTTP headers required by Qwen Portal API."""
import platform as _plat
def _save_oversized_tool_result(function_name: str, function_result: str) -> str:
"""Replace oversized tool results with a file reference + preview.
_ua = f"QwenCode/{_QWEN_CODE_VERSION} ({_plat.system().lower()}; {_plat.machine()})"
return {
"User-Agent": _ua,
"X-DashScope-CacheControl": "enable",
"X-DashScope-UserAgent": _ua,
"X-DashScope-AuthType": "qwen-oauth",
}
When a tool returns more than ``_LARGE_RESULT_CHARS`` characters, the full
content is written to a temporary file under ``HERMES_HOME/cache/tool_responses/``
and the result sent to the model is replaced with:
a brief head preview (first ``_LARGE_RESULT_PREVIEW_CHARS`` chars)
the file path so the model can use ``read_file`` / ``search_files``
Falls back to destructive truncation if the file write fails.
"""
original_len = len(function_result)
if original_len <= _LARGE_RESULT_CHARS:
return function_result
# Build the target directory
try:
response_dir = os.path.join(get_hermes_home(), "cache", "tool_responses")
os.makedirs(response_dir, exist_ok=True)
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
# Sanitize tool name for use in filename
safe_name = re.sub(r"[^\w\-]", "_", function_name)[:40]
filename = f"{safe_name}_{timestamp}.txt"
filepath = os.path.join(response_dir, filename)
with open(filepath, "w", encoding="utf-8") as f:
f.write(function_result)
preview = function_result[:_LARGE_RESULT_PREVIEW_CHARS]
return (
f"{preview}\n\n"
f"[Large tool response: {original_len:,} characters total — "
f"only the first {_LARGE_RESULT_PREVIEW_CHARS:,} shown above. "
f"Full output saved to: {filepath}\n"
f"Use read_file or search_files on that path to access the rest.]"
)
except Exception as exc:
# Fall back to destructive truncation if file write fails
logger.warning("Failed to save large tool result to file: %s", exc)
return (
function_result[:_LARGE_RESULT_CHARS]
+ f"\n\n[Truncated: tool response was {original_len:,} chars, "
f"exceeding the {_LARGE_RESULT_CHARS:,} char limit. "
f"File save failed: {exc}]"
)
class AIAgent:
@@ -442,13 +475,6 @@ class AIAgent:
for AI models that support function calling.
"""
# ── Class-level context pressure dedup (survives across instances) ──
# The gateway creates a new AIAgent per message, so instance-level flags
# reset every time. This dict tracks {session_id: (warn_level, timestamp)}
# to suppress duplicate warnings within a cooldown window.
_context_pressure_last_warned: dict = {}
_CONTEXT_PRESSURE_COOLDOWN = 300 # seconds between re-warning same session
@property
def base_url(self) -> str:
return self._base_url
@@ -680,8 +706,7 @@ class AIAgent:
# Context pressure warnings: notify the USER (not the LLM) as context
# fills up. Purely informational — displayed in CLI output and sent via
# status_callback for gateway platforms. Does NOT inject into messages.
# Tiered: fires at 85% and again at 95% of compaction threshold.
self._context_pressure_warned_at = 0.0 # highest tier already shown
self._context_pressure_warned = False
# Activity tracking — updated on each API call, tool execution, and
# stream chunk. Used by the gateway timeout handler to report what the
@@ -785,8 +810,6 @@ class AIAgent:
client_kwargs["default_headers"] = {
"User-Agent": "KimiCLI/1.3",
}
elif "portal.qwen.ai" in effective_base.lower():
client_kwargs["default_headers"] = _qwen_portal_headers()
else:
# No explicit creds — use the centralized provider router
from agent.auxiliary_client import resolve_provider_client
@@ -1193,33 +1216,6 @@ class AIAgent:
self.session_cost_status = "unknown"
self.session_cost_source = "none"
# ── Ollama num_ctx injection ──
# Ollama defaults to 2048 context regardless of the model's capabilities.
# When running against an Ollama server, detect the model's max context
# and pass num_ctx on every chat request so the full window is used.
# User override: set model.ollama_num_ctx in config.yaml to cap VRAM use.
self._ollama_num_ctx: int | None = None
_ollama_num_ctx_override = None
if isinstance(_model_cfg, dict):
_ollama_num_ctx_override = _model_cfg.get("ollama_num_ctx")
if _ollama_num_ctx_override is not None:
try:
self._ollama_num_ctx = int(_ollama_num_ctx_override)
except (TypeError, ValueError):
logger.debug("Invalid ollama_num_ctx config value: %r", _ollama_num_ctx_override)
if self._ollama_num_ctx is None and self.base_url and is_local_endpoint(self.base_url):
try:
_detected = query_ollama_num_ctx(self.model, self.base_url)
if _detected and _detected > 0:
self._ollama_num_ctx = _detected
except Exception as exc:
logger.debug("Ollama num_ctx detection failed: %s", exc)
if self._ollama_num_ctx and not self.quiet_mode:
logger.info(
"Ollama num_ctx: will request %d tokens (model max from /api/show)",
self._ollama_num_ctx,
)
if not self.quiet_mode:
if compression_enabled:
print(f"📊 Context limit: {self.context_compressor.context_length:,} tokens (compress at {int(compression_threshold*100)}% = {self.context_compressor.threshold_tokens:,})")
@@ -1695,25 +1691,9 @@ class AIAgent:
return None
def _cleanup_task_resources(self, task_id: str) -> None:
"""Clean up VM and browser resources for a given task.
Skips ``cleanup_vm`` when the active terminal environment is marked
persistent (``persistent_filesystem=True``) so that long-lived sandbox
containers survive between turns. The idle reaper in
``terminal_tool._cleanup_inactive_envs`` still tears them down once
``terminal.lifetime_seconds`` is exceeded. Non-persistent backends are
torn down per-turn as before to prevent resource leakage (the original
intent of this hook for the Morph backend, see commit fbd3a2fd).
"""
"""Clean up VM and browser resources for a given task."""
try:
if is_persistent_env(task_id):
if self.verbose_logging:
logging.debug(
f"Skipping per-turn cleanup_vm for persistent env {task_id}; "
f"idle reaper will handle it."
)
else:
cleanup_vm(task_id)
cleanup_vm(task_id)
except Exception as e:
if self.verbose_logging:
logging.warning(f"Failed to cleanup VM for task {task_id}: {e}")
@@ -4127,8 +4107,6 @@ class AIAgent:
self._client_kwargs["default_headers"] = copilot_default_headers()
elif "api.kimi.com" in normalized:
self._client_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.3"}
elif "portal.qwen.ai" in normalized:
self._client_kwargs["default_headers"] = _qwen_portal_headers()
else:
self._client_kwargs.pop("default_headers", None)
@@ -4752,25 +4730,18 @@ class AIAgent:
self._close_request_openai_client(request_client, reason="stream_request_complete")
_stream_stale_timeout_base = float(os.getenv("HERMES_STREAM_STALE_TIMEOUT", 180.0))
# Local providers (Ollama, oMLX, llama-cpp) can take 300+ seconds
# for prefill on large contexts. Disable the stale detector unless
# the user explicitly set HERMES_STREAM_STALE_TIMEOUT.
if _stream_stale_timeout_base == 180.0 and self.base_url and is_local_endpoint(self.base_url):
_stream_stale_timeout = float("inf")
logger.debug("Local provider detected (%s) — stale stream timeout disabled", self.base_url)
# Scale the stale timeout for large contexts: slow models (like Opus)
# can legitimately think for minutes before producing the first token
# when the context is large. Without this, the stale detector kills
# healthy connections during the model's thinking phase, producing
# spurious RemoteProtocolError ("peer closed connection").
_est_tokens = sum(len(str(v)) for v in api_kwargs.get("messages", [])) // 4
if _est_tokens > 100_000:
_stream_stale_timeout = max(_stream_stale_timeout_base, 300.0)
elif _est_tokens > 50_000:
_stream_stale_timeout = max(_stream_stale_timeout_base, 240.0)
else:
# Scale the stale timeout for large contexts: slow models (like Opus)
# can legitimately think for minutes before producing the first token
# when the context is large. Without this, the stale detector kills
# healthy connections during the model's thinking phase, producing
# spurious RemoteProtocolError ("peer closed connection").
_est_tokens = sum(len(str(v)) for v in api_kwargs.get("messages", [])) // 4
if _est_tokens > 100_000:
_stream_stale_timeout = max(_stream_stale_timeout_base, 300.0)
elif _est_tokens > 50_000:
_stream_stale_timeout = max(_stream_stale_timeout_base, 240.0)
else:
_stream_stale_timeout = _stream_stale_timeout_base
_stream_stale_timeout = _stream_stale_timeout_base
t = threading.Thread(target=_call, daemon=True)
t.start()
@@ -4926,7 +4897,7 @@ class AIAgent:
effective_key = (fb_client.api_key or resolve_anthropic_token() or "") if fb_provider == "anthropic" else (fb_client.api_key or "")
self.api_key = effective_key
self._anthropic_api_key = effective_key
self._anthropic_base_url = fb_base_url
self._anthropic_base_url = getattr(fb_client, "base_url", None)
self._anthropic_client = build_anthropic_client(effective_key, self._anthropic_base_url)
self._is_anthropic_oauth = _is_oauth_token(effective_key)
self.client = None
@@ -5282,71 +5253,6 @@ class AIAgent:
base = (getattr(self, "base_url", "") or "").lower()
return "dashscope" in base or "aliyuncs" in base or "opencode.ai/zen/go" in base
def _is_qwen_portal(self) -> bool:
"""Return True when the base URL targets Qwen Portal."""
return "portal.qwen.ai" in self._base_url_lower
def _qwen_prepare_chat_messages(self, api_messages: list) -> list:
prepared = copy.deepcopy(api_messages)
if not prepared:
return prepared
for msg in prepared:
if not isinstance(msg, dict):
continue
content = msg.get("content")
if isinstance(content, str):
msg["content"] = [{"type": "text", "text": content}]
elif isinstance(content, list):
# Normalize: convert bare strings to text dicts, keep dicts as-is.
# deepcopy already created independent copies, no need for dict().
normalized_parts = []
for part in content:
if isinstance(part, str):
normalized_parts.append({"type": "text", "text": part})
elif isinstance(part, dict):
normalized_parts.append(part)
if normalized_parts:
msg["content"] = normalized_parts
# Inject cache_control on the last part of the system message.
for msg in prepared:
if isinstance(msg, dict) and msg.get("role") == "system":
content = msg.get("content")
if isinstance(content, list) and content and isinstance(content[-1], dict):
content[-1]["cache_control"] = {"type": "ephemeral"}
break
return prepared
def _qwen_prepare_chat_messages_inplace(self, messages: list) -> None:
"""In-place variant — mutates an already-copied message list."""
if not messages:
return
for msg in messages:
if not isinstance(msg, dict):
continue
content = msg.get("content")
if isinstance(content, str):
msg["content"] = [{"type": "text", "text": content}]
elif isinstance(content, list):
normalized_parts = []
for part in content:
if isinstance(part, str):
normalized_parts.append({"type": "text", "text": part})
elif isinstance(part, dict):
normalized_parts.append(part)
if normalized_parts:
msg["content"] = normalized_parts
for msg in messages:
if isinstance(msg, dict) and msg.get("role") == "system":
content = msg.get("content")
if isinstance(content, list) and content and isinstance(content[-1], dict):
content[-1]["cache_control"] = {"type": "ephemeral"}
break
def _build_api_kwargs(self, api_messages: list) -> dict:
"""Build the keyword arguments dict for the active API mode."""
if self.api_mode == "anthropic_messages":
@@ -5365,7 +5271,6 @@ class AIAgent:
is_oauth=self._is_anthropic_oauth,
preserve_dots=self._anthropic_preserve_dots(),
context_length=ctx_len,
base_url=getattr(self, "_anthropic_base_url", None),
)
if self.api_mode == "codex_responses":
@@ -5459,17 +5364,6 @@ class AIAgent:
tool_call.pop("call_id", None)
tool_call.pop("response_item_id", None)
# Qwen portal: normalize content to list-of-dicts, inject cache_control.
# Must run AFTER codex sanitization so we transform the final messages.
# If sanitization already deepcopied, reuse that copy (in-place).
if self._is_qwen_portal():
if sanitized_messages is api_messages:
# No sanitization was done — we need our own copy.
sanitized_messages = self._qwen_prepare_chat_messages(sanitized_messages)
else:
# Already a deepcopy — transform in place to avoid a second deepcopy.
self._qwen_prepare_chat_messages_inplace(sanitized_messages)
# GPT-5 and Codex models respond better to 'developer' than 'system'
# for instruction-following. Swap the role at the API boundary so
# internal message representation stays uniform ("system").
@@ -5502,17 +5396,11 @@ class AIAgent:
"messages": sanitized_messages,
"timeout": float(os.getenv("HERMES_API_TIMEOUT", 1800.0)),
}
if self._is_qwen_portal():
api_kwargs["metadata"] = {
"sessionId": self.session_id or "hermes",
"promptId": str(uuid.uuid4()),
}
if self.tools:
api_kwargs["tools"] = self.tools
if self.max_tokens is not None:
if not self._is_qwen_portal():
api_kwargs.update(self._max_tokens_param(self.max_tokens))
api_kwargs.update(self._max_tokens_param(self.max_tokens))
elif self._is_openrouter_url() and "claude" in (self.model or "").lower():
# OpenRouter translates requests to Anthropic's Messages API,
# which requires max_tokens as a mandatory field. When we omit
@@ -5568,18 +5456,6 @@ class AIAgent:
if _is_nous:
extra_body["tags"] = ["product=hermes-agent"]
# Ollama num_ctx: override the 2048 default so the model actually
# uses the context window it was trained for. Passed via the OpenAI
# SDK's extra_body → options.num_ctx, which Ollama's OpenAI-compat
# endpoint forwards to the runner as --ctx-size.
if self._ollama_num_ctx:
options = extra_body.get("options", {})
options["num_ctx"] = self._ollama_num_ctx
extra_body["options"] = options
if self._is_qwen_portal():
extra_body["vl_high_resolution_images"] = True
if extra_body:
api_kwargs["extra_body"] = extra_body
@@ -5895,7 +5771,7 @@ class AIAgent:
tools=[memory_tool_def],
temperature=0.3,
max_tokens=5120,
# timeout resolved from auxiliary.flush_memories.timeout config
timeout=30.0,
)
except RuntimeError:
_aux_available = False
@@ -5927,10 +5803,7 @@ class AIAgent:
"temperature": 0.3,
**self._max_tokens_param(5120),
}
from agent.auxiliary_client import _get_task_timeout
response = self._ensure_primary_openai_client(reason="flush_memories").chat.completions.create(
**api_kwargs, timeout=_get_task_timeout("flush_memories")
)
response = self._ensure_primary_openai_client(reason="flush_memories").chat.completions.create(**api_kwargs, timeout=30.0)
# Extract tool calls from the response, handling all API formats
tool_calls = []
@@ -6037,15 +5910,6 @@ class AIAgent:
except Exception as e:
logger.warning("Session DB compression split failed — new session will NOT be indexed: %s", e)
# Warn on repeated compressions (quality degrades with each pass)
_cc = self.context_compressor.compression_count
if _cc >= 2:
self._vprint(
f"{self.log_prefix}⚠️ Session compressed {_cc} times — "
f"accuracy may degrade. Consider /new to start fresh.",
force=True,
)
# Update token estimate after compaction so pressure calculations
# use the post-compression count, not the stale pre-compression one.
_compressed_est = (
@@ -6058,16 +5922,12 @@ class AIAgent:
# Only reset the pressure warning if compression actually brought
# us below the warning level (85% of threshold). When compression
# can't reduce enough (e.g. threshold is very low, or system prompt
# alone exceeds the warning level), keep the tier set to prevent
# alone exceeds the warning level), keep the flag set to prevent
# spamming the user with repeated warnings every loop iteration.
if self.context_compressor.threshold_tokens > 0:
_post_progress = _compressed_est / self.context_compressor.threshold_tokens
if _post_progress < 0.85:
self._context_pressure_warned_at = 0.0
# Clear class-level dedup for this session so a fresh
# warning cycle can start if context grows again.
_sid = self.session_id or "default"
AIAgent._context_pressure_last_warned.pop(_sid, None)
self._context_pressure_warned = False
# Clear the file-read dedup cache. After compression the original
# read content is summarised away — if the model re-reads the same
@@ -6364,17 +6224,15 @@ class AIAgent:
except Exception as cb_err:
logging.debug(f"Tool complete callback error: {cb_err}")
function_result = maybe_persist_tool_result(
content=function_result,
tool_name=name,
tool_use_id=tc.id,
env=get_active_env(effective_task_id),
)
# Save oversized results to file instead of destructive truncation
function_result = _save_oversized_tool_result(name, function_result)
# Discover subdirectory context files from tool arguments
subdir_hints = self._subdirectory_hints.check_tool_call(name, args)
if subdir_hints:
function_result += subdir_hints
# Append tool result message in order
tool_msg = {
"role": "tool",
"content": function_result,
@@ -6382,12 +6240,6 @@ class AIAgent:
}
messages.append(tool_msg)
# ── Per-turn aggregate budget enforcement ─────────────────────────
num_tools = len(parsed_calls)
if num_tools > 0:
turn_tool_msgs = messages[-num_tools:]
enforce_turn_budget(turn_tool_msgs, env=get_active_env(effective_task_id))
# ── Budget pressure injection ────────────────────────────────────
budget_warning = self._get_budget_warning(api_call_count)
if budget_warning and messages and messages[-1].get("role") == "tool":
@@ -6672,12 +6524,8 @@ class AIAgent:
except Exception as cb_err:
logging.debug(f"Tool complete callback error: {cb_err}")
function_result = maybe_persist_tool_result(
content=function_result,
tool_name=function_name,
tool_use_id=tool_call.id,
env=get_active_env(effective_task_id),
)
# Save oversized results to file instead of destructive truncation
function_result = _save_oversized_tool_result(function_name, function_result)
# Discover subdirectory context files from tool arguments
subdir_hints = self._subdirectory_hints.check_tool_call(function_name, function_args)
@@ -6715,11 +6563,6 @@ class AIAgent:
if self.tool_delay > 0 and i < len(assistant_message.tool_calls):
time.sleep(self.tool_delay)
# ── Per-turn aggregate budget enforcement ─────────────────────────
num_tools_seq = len(assistant_message.tool_calls)
if num_tools_seq > 0:
enforce_turn_budget(messages[-num_tools_seq:], env=get_active_env(effective_task_id))
# ── Budget pressure injection ─────────────────────────────────
# After all tool calls in this turn are processed, check if we're
# approaching max_iterations. If so, inject a warning into the LAST
@@ -7446,7 +7289,6 @@ class AIAgent:
codex_auth_retry_attempted=False
anthropic_auth_retry_attempted=False
nous_auth_retry_attempted=False
thinking_sig_retry_attempted = False
has_retried_429 = False
restart_with_compressed_messages = False
restart_with_length_continuation = False
@@ -7662,8 +7504,7 @@ class AIAgent:
}
# Longer backoff for rate limiting (likely cause of None choices)
# Jittered exponential: 5s base, 120s cap + random jitter
wait_time = jittered_backoff(retry_count, base_delay=5.0, max_delay=120.0)
wait_time = min(5 * (2 ** (retry_count - 1)), 120) # 5s, 10s, 20s, 40s, 80s, 120s
self._vprint(f"{self.log_prefix}⏳ Retrying in {wait_time}s (extended backoff for possible rate limit)...", force=True)
logging.warning(f"Invalid API response (retry {retry_count}/{max_retries}): {', '.join(error_details)} | Provider: {provider_name}")
@@ -8036,38 +7877,8 @@ class AIAgent:
print(f"{self.log_prefix} • Check ANTHROPIC_API_KEY in {_dhh}/.env for API keys or legacy token values")
print(f"{self.log_prefix} • For API keys: verify at https://console.anthropic.com/settings/keys")
print(f"{self.log_prefix} • For Claude Code: run 'claude /login' to refresh, then retry")
print(f"{self.log_prefix}Legacy cleanup: hermes config set ANTHROPIC_TOKEN \"\"")
print(f"{self.log_prefix}Clear stale keys: hermes config set ANTHROPIC_API_KEY \"\"")
# ── Thinking block signature recovery ─────────────────
# Anthropic signs thinking blocks against the full turn
# content. Any upstream mutation (context compression,
# session truncation, message merging) invalidates the
# signature → HTTP 400. Recovery: strip reasoning_details
# from all messages so the next retry sends no thinking
# blocks at all. One-shot — don't retry infinitely.
if (
self.api_mode == "anthropic_messages"
and status_code == 400
and not thinking_sig_retry_attempted
):
_err_msg_lower = str(api_error).lower()
if "signature" in _err_msg_lower and "thinking" in _err_msg_lower:
thinking_sig_retry_attempted = True
for _m in messages:
if isinstance(_m, dict):
_m.pop("reasoning_details", None)
self._vprint(
f"{self.log_prefix}⚠️ Thinking block signature invalid — "
f"stripped all thinking blocks, retrying...",
force=True,
)
logging.warning(
"%sThinking block signature recovery: stripped "
"reasoning_details from %d messages",
self.log_prefix, len(messages),
)
continue
print(f"{self.log_prefix}Clear stale keys: hermes config set ANTHROPIC_TOKEN \"\"")
print(f"{self.log_prefix}Legacy cleanup: hermes config set ANTHROPIC_API_KEY \"\"")
retry_count += 1
elapsed_time = time.time() - api_start_time
@@ -8550,7 +8361,7 @@ class AIAgent:
_retry_after = min(int(_ra_raw), 120) # Cap at 2 minutes
except (TypeError, ValueError):
pass
wait_time = _retry_after if _retry_after else jittered_backoff(retry_count, base_delay=2.0, max_delay=60.0)
wait_time = _retry_after if _retry_after else min(2 ** retry_count, 60)
if is_rate_limited:
self._emit_status(f"⏱️ Rate limit reached. Waiting {wait_time}s before retry (attempt {retry_count + 1}/{max_retries})...")
else:
@@ -9007,34 +8818,13 @@ class AIAgent:
# compaction fires, not the raw context window.
# Does not inject into messages — just prints to CLI output
# and fires status_callback for gateway platforms.
# Tiered: 85% (orange) and 95% (red/critical).
if _compressor.threshold_tokens > 0:
_compaction_progress = _real_tokens / _compressor.threshold_tokens
# Determine the warning tier for this progress level
_warn_tier = 0.0
if _compaction_progress >= 0.95:
_warn_tier = 0.95
elif _compaction_progress >= 0.85:
_warn_tier = 0.85
if _warn_tier > self._context_pressure_warned_at:
# Class-level dedup: check if this session was already
# warned at this tier within the cooldown window.
_sid = self.session_id or "default"
_last = AIAgent._context_pressure_last_warned.get(_sid)
_now = time.time()
if _last is None or _last[0] < _warn_tier or (_now - _last[1]) >= self._CONTEXT_PRESSURE_COOLDOWN:
self._context_pressure_warned_at = _warn_tier
AIAgent._context_pressure_last_warned[_sid] = (_warn_tier, _now)
self._emit_context_pressure(_compaction_progress, _compressor)
# Evict stale entries (older than 2x cooldown)
_cutoff = _now - self._CONTEXT_PRESSURE_COOLDOWN * 2
AIAgent._context_pressure_last_warned = {
k: v for k, v in AIAgent._context_pressure_last_warned.items()
if v[1] > _cutoff
}
if _compaction_progress >= 0.85 and not self._context_pressure_warned:
self._context_pressure_warned = True
self._emit_context_pressure(_compaction_progress, _compressor)
if self.compression_enabled and _compressor.should_compress(_real_tokens):
self._safe_print(" ⟳ compacting context…")
messages, active_system_prompt = self._compress_context(
messages, system_message,
approx_tokens=self.context_compressor.last_prompt_tokens,
@@ -9109,27 +8899,8 @@ class AIAgent:
self._save_session_log(messages)
continue
# ── Empty response retry (no reasoning) ──────
# Model returned nothing — no content, no
# structured reasoning, no tool calls. Common
# with open models (transient provider issues,
# rate limits, sampling flukes). Silently retry
# up to 3 times before giving up. Skip when
# content has inline <think> tags (model chose
# to reason, just no visible text).
_truly_empty = not final_response.strip()
if _truly_empty and not _has_structured and self._empty_content_retries < 3:
self._empty_content_retries += 1
self._vprint(
f"{self.log_prefix}↻ Empty response (no content or reasoning) "
f"— retrying ({self._empty_content_retries}/3)",
force=True,
)
continue
# Exhausted prefill attempts, empty retries, or
# structured reasoning with no content —
# fall through to "(empty)" terminal.
# Exhausted prefill attempts or no structured
# reasoning — fall through to "(empty)" terminal.
reasoning_text = self._extract_reasoning(assistant_message)
assistant_msg = self._build_assistant_message(assistant_message, finish_reason)
assistant_msg["content"] = "(empty)"
@@ -9139,7 +8910,7 @@ class AIAgent:
reasoning_preview = reasoning_text[:500] + "..." if len(reasoning_text) > 500 else reasoning_text
self._vprint(f"{self.log_prefix}️ Reasoning-only response (no visible content). Reasoning: {reasoning_preview}")
else:
self._vprint(f"{self.log_prefix}️ Empty response (no content or reasoning) after 3 retries.")
self._vprint(f"{self.log_prefix}️ Empty response (no content or reasoning).")
final_response = "(empty)"
break
-252
View File
@@ -1276,258 +1276,6 @@ class TestRoleAlternation:
assert [m["role"] for m in result] == ["user", "assistant", "user"]
# ---------------------------------------------------------------------------
# Thinking block signature management
# ---------------------------------------------------------------------------
class TestThinkingBlockSignatureManagement:
"""Tests for the thinking block handling strategy:
strip from old turns, preserve latest signed, downgrade unsigned."""
def test_thinking_stripped_from_non_last_assistant(self):
"""Thinking blocks are removed from all assistant messages except the last."""
messages = [
{
"role": "assistant",
"content": "",
"tool_calls": [
{"id": "tc_1", "function": {"name": "tool1", "arguments": "{}"}},
],
"reasoning_details": [
{"type": "thinking", "thinking": "Old reasoning.", "signature": "sig_old"},
],
},
{"role": "tool", "tool_call_id": "tc_1", "content": "result 1"},
{
"role": "assistant",
"content": "",
"tool_calls": [
{"id": "tc_2", "function": {"name": "tool2", "arguments": "{}"}},
],
"reasoning_details": [
{"type": "thinking", "thinking": "Latest reasoning.", "signature": "sig_new"},
],
},
{"role": "tool", "tool_call_id": "tc_2", "content": "result 2"},
]
_, result = convert_messages_to_anthropic(messages)
# Find both assistant messages
assistants = [m for m in result if m["role"] == "assistant"]
assert len(assistants) == 2
# First (non-last) assistant: no thinking blocks
first_types = [b.get("type") for b in assistants[0]["content"]]
assert "thinking" not in first_types
assert "redacted_thinking" not in first_types
assert "tool_use" in first_types # tool_use should survive
# Last assistant: thinking block preserved with signature
last_blocks = assistants[1]["content"]
thinking_blocks = [b for b in last_blocks if b.get("type") == "thinking"]
assert len(thinking_blocks) == 1
assert thinking_blocks[0]["thinking"] == "Latest reasoning."
assert thinking_blocks[0]["signature"] == "sig_new"
def test_signed_thinking_preserved_on_last_turn(self):
"""A signed thinking block on the last assistant message is kept."""
messages = [
{
"role": "assistant",
"content": "The answer is 42.",
"reasoning_details": [
{"type": "thinking", "thinking": "Deep thought.", "signature": "sig_valid"},
],
},
]
_, result = convert_messages_to_anthropic(messages)
blocks = result[0]["content"]
thinking = [b for b in blocks if b.get("type") == "thinking"]
assert len(thinking) == 1
assert thinking[0]["signature"] == "sig_valid"
def test_unsigned_thinking_downgraded_to_text_on_last_turn(self):
"""Unsigned thinking blocks on the last turn become text blocks."""
messages = [
{
"role": "assistant",
"content": "Response text.",
"reasoning_details": [
{"type": "thinking", "thinking": "Unsigned reasoning."},
# No 'signature' field
],
},
]
_, result = convert_messages_to_anthropic(messages)
blocks = result[0]["content"]
# No thinking blocks should remain
assert not any(b.get("type") == "thinking" for b in blocks)
# The reasoning text should be preserved as a text block
text_contents = [b.get("text", "") for b in blocks if b.get("type") == "text"]
assert "Unsigned reasoning." in text_contents
def test_redacted_thinking_with_data_preserved(self):
"""Redacted thinking with 'data' field is kept on last turn."""
messages = [
{
"role": "assistant",
"content": "Response.",
"reasoning_details": [
{"type": "redacted_thinking", "data": "opaque_signature_data"},
],
},
]
_, result = convert_messages_to_anthropic(messages)
blocks = result[0]["content"]
redacted = [b for b in blocks if b.get("type") == "redacted_thinking"]
assert len(redacted) == 1
assert redacted[0]["data"] == "opaque_signature_data"
def test_redacted_thinking_without_data_dropped(self):
"""Redacted thinking without 'data' is dropped — can't be validated."""
messages = [
{
"role": "assistant",
"content": "Response.",
"reasoning_details": [
{"type": "redacted_thinking"},
# No 'data' field
],
},
]
_, result = convert_messages_to_anthropic(messages)
blocks = result[0]["content"]
assert not any(b.get("type") == "redacted_thinking" for b in blocks)
def test_cache_control_stripped_from_thinking_blocks(self):
"""cache_control markers are removed from thinking/redacted_thinking blocks."""
messages = [
{
"role": "assistant",
"content": "",
"tool_calls": [
{"id": "tc_1", "function": {"name": "t", "arguments": "{}"}},
],
"reasoning_details": [
{
"type": "thinking",
"thinking": "Reasoning.",
"signature": "sig_1",
"cache_control": {"type": "ephemeral"},
},
],
},
{"role": "tool", "tool_call_id": "tc_1", "content": "result"},
]
_, result = convert_messages_to_anthropic(messages)
assistant = next(m for m in result if m["role"] == "assistant")
for block in assistant["content"]:
if block.get("type") in ("thinking", "redacted_thinking"):
assert "cache_control" not in block
def test_thinking_stripped_from_merged_consecutive_assistants(self):
"""When consecutive assistants are merged, second one's thinking is dropped."""
messages = [
{
"role": "assistant",
"content": "First response.",
"reasoning_details": [
{"type": "thinking", "thinking": "First thought.", "signature": "sig_1"},
],
},
{
"role": "assistant",
"content": "Second response.",
"reasoning_details": [
{"type": "thinking", "thinking": "Second thought.", "signature": "sig_2"},
],
},
]
_, result = convert_messages_to_anthropic(messages)
# Should be merged into one assistant message
assistants = [m for m in result if m["role"] == "assistant"]
assert len(assistants) == 1
# Only the first thinking block should remain (signed, on the last/only assistant)
blocks = assistants[0]["content"]
thinking = [b for b in blocks if b.get("type") == "thinking"]
assert len(thinking) == 1
assert thinking[0]["thinking"] == "First thought."
def test_empty_content_after_strip_gets_placeholder(self):
"""If stripping thinking leaves an empty message, a placeholder is added."""
messages = [
{
"role": "assistant",
"content": "",
"reasoning_details": [
{"type": "thinking", "thinking": "Only thinking, no text."},
# Unsigned — will be downgraded, but content was empty string
],
},
{"role": "user", "content": "Next message."},
{"role": "assistant", "content": "Final."},
]
_, result = convert_messages_to_anthropic(messages)
# First assistant is non-last, so thinking is stripped completely.
# The original content was empty and thinking was unsigned → placeholder
first_assistant = result[0]
assert first_assistant["role"] == "assistant"
assert len(first_assistant["content"]) >= 1
def test_multi_turn_conversation_preserves_only_last(self):
"""Full multi-turn conversation: only last assistant keeps thinking."""
messages = [
{"role": "user", "content": "Question 1"},
{
"role": "assistant",
"content": "Answer 1",
"reasoning_details": [
{"type": "thinking", "thinking": "Thought 1", "signature": "sig_1"},
],
},
{"role": "user", "content": "Question 2"},
{
"role": "assistant",
"content": "Answer 2",
"reasoning_details": [
{"type": "thinking", "thinking": "Thought 2", "signature": "sig_2"},
],
},
{"role": "user", "content": "Question 3"},
{
"role": "assistant",
"content": "Answer 3",
"reasoning_details": [
{"type": "thinking", "thinking": "Thought 3", "signature": "sig_3"},
],
},
]
_, result = convert_messages_to_anthropic(messages)
assistants = [m for m in result if m["role"] == "assistant"]
assert len(assistants) == 3
# First two: no thinking blocks
for a in assistants[:2]:
assert not any(
b.get("type") in ("thinking", "redacted_thinking")
for b in a["content"]
if isinstance(b, dict)
)
# Last one: thinking preserved
last_thinking = [
b for b in assistants[2]["content"]
if isinstance(b, dict) and b.get("type") == "thinking"
]
assert len(last_thinking) == 1
assert last_thinking[0]["signature"] == "sig_3"
# ---------------------------------------------------------------------------
# Tool choice
# ---------------------------------------------------------------------------
+63 -110
View File
@@ -77,20 +77,6 @@ class TestReadCodexAccessToken:
result = _read_codex_access_token()
assert result == "tok-123"
def test_pool_without_selected_entry_falls_back_to_auth_store(self, tmp_path, monkeypatch):
hermes_home = tmp_path / "hermes"
hermes_home.mkdir(parents=True, exist_ok=True)
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
valid_jwt = "eyJhbGciOiJSUzI1NiJ9.eyJleHAiOjk5OTk5OTk5OTl9.sig"
with patch("agent.auxiliary_client._select_pool_entry", return_value=(True, None)), \
patch("hermes_cli.auth._read_codex_tokens", return_value={
"tokens": {"access_token": valid_jwt, "refresh_token": "refresh"}
}):
result = _read_codex_access_token()
assert result == valid_jwt
def test_missing_returns_none(self, tmp_path, monkeypatch):
hermes_home = tmp_path / "hermes"
hermes_home.mkdir(parents=True, exist_ok=True)
@@ -252,24 +238,6 @@ class TestAnthropicOAuthFlag:
assert mock_build.call_args.args[0] == "sk-ant-oat01-pooled"
class TestTryCodex:
def test_pool_without_selected_entry_falls_back_to_auth_store(self):
with (
patch("agent.auxiliary_client._select_pool_entry", return_value=(True, None)),
patch("agent.auxiliary_client._read_codex_access_token", return_value="codex-auth-token"),
patch("agent.auxiliary_client.OpenAI") as mock_openai,
):
mock_openai.return_value = MagicMock()
from agent.auxiliary_client import _try_codex
client, model = _try_codex()
assert client is not None
assert model == "gpt-5.2-codex"
assert mock_openai.call_args.kwargs["api_key"] == "codex-auth-token"
assert mock_openai.call_args.kwargs["base_url"] == "https://chatgpt.com/backend-api/codex"
class TestExpiredCodexFallback:
"""Test that expired Codex tokens don't block the auto chain."""
@@ -503,23 +471,6 @@ class TestExplicitProviderRouting:
client, model = resolve_provider_client("zai")
assert client is not None
def test_explicit_google_alias_uses_gemini_credentials(self):
"""provider='google' should route through the gemini API-key provider."""
with (
patch("hermes_cli.auth.resolve_api_key_provider_credentials", return_value={
"api_key": "gemini-key",
"base_url": "https://generativelanguage.googleapis.com/v1beta/openai",
}),
patch("agent.auxiliary_client.OpenAI") as mock_openai,
):
mock_openai.return_value = MagicMock()
client, model = resolve_provider_client("google", model="gemini-3.1-pro-preview")
assert client is not None
assert model == "gemini-3.1-pro-preview"
assert mock_openai.call_args.kwargs["api_key"] == "gemini-key"
assert mock_openai.call_args.kwargs["base_url"] == "https://generativelanguage.googleapis.com/v1beta/openai"
def test_explicit_unknown_returns_none(self, monkeypatch):
"""Unknown provider should return None."""
client, model = resolve_provider_client("nonexistent-provider")
@@ -673,15 +624,12 @@ class TestVisionClientFallback:
assert client is None
assert model is None
def test_vision_auto_includes_active_provider_when_configured(self, monkeypatch):
"""Active provider appears in available backends when credentials exist."""
monkeypatch.setenv("ANTHROPIC_API_KEY", "***")
def test_vision_auto_includes_anthropic_when_configured(self, monkeypatch):
monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-api03-key")
with (
patch("agent.auxiliary_client._read_nous_auth", return_value=None),
patch("agent.auxiliary_client._read_main_provider", return_value="anthropic"),
patch("agent.auxiliary_client._read_main_model", return_value="claude-sonnet-4"),
patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()),
patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="***"),
patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="sk-ant-api03-key"),
):
backends = get_available_vision_backends()
@@ -754,51 +702,88 @@ class TestAuxiliaryPoolAwareness:
assert call_kwargs["base_url"] == "https://api.githubcopilot.com"
assert call_kwargs["default_headers"]["Editor-Version"]
def test_vision_auto_uses_active_provider_as_fallback(self, monkeypatch):
"""When no OpenRouter/Nous available, vision auto falls back to active provider."""
monkeypatch.setenv("ANTHROPIC_API_KEY", "***")
def test_vision_auto_uses_anthropic_when_no_higher_priority_backend(self, monkeypatch):
monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-api03-key")
with (
patch("agent.auxiliary_client._read_nous_auth", return_value=None),
patch("agent.auxiliary_client._read_main_provider", return_value="anthropic"),
patch("agent.auxiliary_client._read_main_model", return_value="claude-sonnet-4"),
patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()),
patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="***"),
patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="sk-ant-api03-key"),
):
client, model = get_vision_auxiliary_client()
assert client is not None
assert client.__class__.__name__ == "AnthropicAuxiliaryClient"
assert model == "claude-haiku-4-5-20251001"
def test_vision_auto_prefers_active_provider_over_openrouter(self, monkeypatch):
"""Active provider is tried before OpenRouter in vision auto."""
def test_selected_anthropic_provider_is_preferred_for_vision_auto(self, monkeypatch):
monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
monkeypatch.setenv("ANTHROPIC_API_KEY", "***")
monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-api03-key")
def fake_load_config():
return {"model": {"provider": "anthropic", "default": "claude-sonnet-4-6"}}
with (
patch("agent.auxiliary_client._read_nous_auth", return_value=None),
patch("agent.auxiliary_client._read_main_provider", return_value="anthropic"),
patch("agent.auxiliary_client._read_main_model", return_value="claude-sonnet-4"),
patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()),
patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="***"),
patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="sk-ant-api03-key"),
patch("agent.auxiliary_client.OpenAI") as mock_openai,
patch("hermes_cli.config.load_config", fake_load_config),
):
client, model = get_vision_auxiliary_client()
assert client is not None
assert client.__class__.__name__ == "AnthropicAuxiliaryClient"
assert model == "claude-haiku-4-5-20251001"
def test_selected_codex_provider_short_circuits_vision_auto(self, monkeypatch):
def fake_load_config():
return {"model": {"provider": "openai-codex", "default": "gpt-5.2-codex"}}
codex_client = MagicMock()
with (
patch("hermes_cli.config.load_config", fake_load_config),
patch("agent.auxiliary_client._try_codex", return_value=(codex_client, "gpt-5.2-codex")) as mock_codex,
patch("agent.auxiliary_client._try_openrouter") as mock_openrouter,
patch("agent.auxiliary_client._try_nous") as mock_nous,
patch("agent.auxiliary_client._try_anthropic") as mock_anthropic,
patch("agent.auxiliary_client._try_custom_endpoint") as mock_custom,
):
provider, client, model = resolve_vision_provider_client()
# Active provider should win over OpenRouter
assert provider == "anthropic"
assert provider == "openai-codex"
assert client is codex_client
assert model == "gpt-5.2-codex"
mock_codex.assert_called_once()
mock_openrouter.assert_not_called()
mock_nous.assert_not_called()
mock_anthropic.assert_not_called()
mock_custom.assert_not_called()
def test_vision_auto_uses_named_custom_as_active_provider(self, monkeypatch):
"""Named custom provider works as active provider fallback in vision auto."""
def test_vision_auto_includes_codex(self, codex_auth_dir):
"""Codex supports vision (gpt-5.3-codex), so auto mode should use it."""
with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
patch("agent.auxiliary_client.OpenAI"):
client, model = get_vision_auxiliary_client()
from agent.auxiliary_client import CodexAuxiliaryClient
assert isinstance(client, CodexAuxiliaryClient)
assert model == "gpt-5.2-codex"
def test_vision_auto_falls_back_to_custom_endpoint(self, monkeypatch):
"""Custom endpoint is used as fallback in vision auto mode.
Many local models (Qwen-VL, LLaVA, etc.) support vision.
When no OpenRouter/Nous/Codex is available, try the custom endpoint.
"""
monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False)
with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
patch("agent.auxiliary_client._select_pool_entry", return_value=(False, None)), \
patch("agent.auxiliary_client._read_main_provider", return_value="custom:local"), \
patch("agent.auxiliary_client._read_main_model", return_value="my-local-model"), \
patch("agent.auxiliary_client.resolve_provider_client",
return_value=(MagicMock(), "my-local-model")) as mock_resolve:
provider, client, model = resolve_vision_provider_client()
assert client is not None
assert provider == "custom:local"
patch("agent.auxiliary_client._read_codex_access_token", return_value=None), \
patch("agent.auxiliary_client._resolve_custom_runtime",
return_value=("http://localhost:1234/v1", "local-key")), \
patch("agent.auxiliary_client.OpenAI") as mock_openai:
client, model = get_vision_auxiliary_client()
assert client is not None # Custom endpoint picked up as fallback
def test_vision_direct_endpoint_override(self, monkeypatch):
monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
@@ -837,31 +822,6 @@ class TestAuxiliaryPoolAwareness:
assert model == "google/gemini-3-flash-preview"
assert client is not None
def test_vision_config_google_provider_uses_gemini_credentials(self, monkeypatch):
config = {
"auxiliary": {
"vision": {
"provider": "google",
"model": "gemini-3.1-pro-preview",
}
}
}
monkeypatch.setattr("hermes_cli.config.load_config", lambda: config)
with (
patch("hermes_cli.auth.resolve_api_key_provider_credentials", return_value={
"api_key": "gemini-key",
"base_url": "https://generativelanguage.googleapis.com/v1beta/openai",
}),
patch("agent.auxiliary_client.OpenAI") as mock_openai,
):
resolved_provider, client, model = resolve_vision_provider_client()
assert resolved_provider == "gemini"
assert client is not None
assert model == "gemini-3.1-pro-preview"
assert mock_openai.call_args.kwargs["api_key"] == "gemini-key"
assert mock_openai.call_args.kwargs["base_url"] == "https://generativelanguage.googleapis.com/v1beta/openai"
def test_vision_forced_main_uses_custom_endpoint(self, monkeypatch):
"""When explicitly forced to 'main', vision CAN use custom endpoint."""
config = {
@@ -886,14 +846,7 @@ class TestAuxiliaryPoolAwareness:
monkeypatch.setenv("AUXILIARY_VISION_PROVIDER", "main")
monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
monkeypatch.delenv("OPENAI_API_KEY", raising=False)
# Clear client cache to avoid stale entries from previous tests
from agent.auxiliary_client import _client_cache
_client_cache.clear()
with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
patch("agent.auxiliary_client._read_main_provider", return_value=""), \
patch("agent.auxiliary_client._read_main_model", return_value=""), \
patch("agent.auxiliary_client._select_pool_entry", return_value=(False, None)), \
patch("agent.auxiliary_client._resolve_custom_runtime", return_value=(None, None)), \
patch("agent.auxiliary_client._read_codex_access_token", return_value=None), \
patch("agent.auxiliary_client._resolve_api_key_provider", return_value=(None, None)):
client, model = get_vision_auxiliary_client()
+6 -173
View File
@@ -324,10 +324,7 @@ class TestCompressWithClient:
with patch("agent.context_compressor.get_model_context_length", return_value=100000):
c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=2, protect_last_n=2)
# Last head message (index 1) is "assistant" → summary should be "user".
# With min_tail=3, tail = last 3 messages (indices 5-7).
# head_last=assistant, tail_first=assistant → summary_role="user", no collision.
# Need 8 messages: min_for_compress = 2+3+1 = 6, must have > 6.
# Last head message (index 1) is "assistant" → summary should be "user"
msgs = [
{"role": "user", "content": "msg 0"},
{"role": "assistant", "content": "msg 1"},
@@ -335,8 +332,6 @@ class TestCompressWithClient:
{"role": "assistant", "content": "msg 3"},
{"role": "user", "content": "msg 4"},
{"role": "assistant", "content": "msg 5"},
{"role": "user", "content": "msg 6"},
{"role": "assistant", "content": "msg 7"},
]
with patch("agent.context_compressor.call_llm", return_value=mock_response):
result = c.compress(msgs)
@@ -465,10 +460,8 @@ class TestCompressWithClient:
c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=2, protect_last_n=2)
# Head: [system, user] → last head = user
# Tail: [assistant, user, assistant] → first tail = assistant
# Tail: [assistant, user] → first tail = assistant
# summary_role="assistant" collides with tail, "user" collides with head → merge
# With min_tail=3, tail = last 3 messages (indices 5-7).
# Need 8 messages: min_for_compress = 2+3+1 = 6, must have > 6.
msgs = [
{"role": "system", "content": "system prompt"},
{"role": "user", "content": "msg 1"},
@@ -477,7 +470,6 @@ class TestCompressWithClient:
{"role": "assistant", "content": "msg 4"}, # compressed
{"role": "assistant", "content": "msg 5"}, # tail start
{"role": "user", "content": "msg 6"},
{"role": "assistant", "content": "msg 7"},
]
with patch("agent.context_compressor.call_llm", return_value=mock_response):
result = c.compress(msgs)
@@ -489,7 +481,7 @@ class TestCompressWithClient:
if r1 in ("user", "assistant") and r2 in ("user", "assistant"):
assert r1 != r2, f"consecutive {r1} at indices {i-1},{i}"
# The summary should be merged into the first tail message (assistant at index 5)
# The summary should be merged into the first tail message (assistant)
first_tail = [m for m in result if "msg 5" in (m.get("content") or "")]
assert len(first_tail) == 1
assert "summary text" in first_tail[0]["content"]
@@ -504,18 +496,14 @@ class TestCompressWithClient:
with patch("agent.context_compressor.get_model_context_length", return_value=100000):
c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=2, protect_last_n=2)
# Head=assistant, Tail=assistant → summary_role="user", no collision.
# With min_tail=3, tail = last 3 messages (indices 5-7).
# Need 8 messages: min_for_compress = 2+3+1 = 6, must have > 6.
# Head=assistant, Tail=assistant → summary_role="user", no collision
msgs = [
{"role": "user", "content": "msg 0"},
{"role": "assistant", "content": "msg 1"},
{"role": "user", "content": "msg 2"},
{"role": "assistant", "content": "msg 3"},
{"role": "user", "content": "msg 4"},
{"role": "assistant", "content": "msg 5"},
{"role": "user", "content": "msg 6"},
{"role": "assistant", "content": "msg 7"},
{"role": "assistant", "content": "msg 4"},
{"role": "user", "content": "msg 5"},
]
with patch("agent.context_compressor.call_llm", return_value=mock_response):
result = c.compress(msgs)
@@ -612,158 +600,3 @@ class TestSummaryTargetRatio:
with patch("agent.context_compressor.get_model_context_length", return_value=100_000):
c = ContextCompressor(model="test", quiet_mode=True)
assert c.protect_last_n == 20
class TestTokenBudgetTailProtection:
"""Tests for token-budget-based tail protection (PR #6240).
The core change: tail protection is now based on a token budget rather
than a fixed message count. This prevents large tool outputs from
blocking compaction.
"""
@pytest.fixture()
def budget_compressor(self):
"""Compressor with known token budget for tail protection tests."""
with patch("agent.context_compressor.get_model_context_length", return_value=200_000):
c = ContextCompressor(
model="test/model",
threshold_percent=0.50, # 100K threshold
protect_first_n=2,
protect_last_n=20,
quiet_mode=True,
)
return c
def test_large_tool_outputs_no_longer_block_compaction(self, budget_compressor):
"""The motivating scenario: 20 messages with large tool outputs should
NOT prevent compaction. With message-count tail protection they would
all be protected, leaving nothing to summarize."""
c = budget_compressor
messages = [
{"role": "user", "content": "Start task"},
{"role": "assistant", "content": "On it"},
]
# Add 20 messages with large tool outputs (~5K chars each ≈ 1250 tokens)
for i in range(10):
messages.append({
"role": "assistant", "content": None,
"tool_calls": [{"function": {"name": f"tool_{i}", "arguments": "{}"}}],
})
messages.append({
"role": "tool", "content": "x" * 5000,
"tool_call_id": f"call_{i}",
})
# Add 3 recent small messages
messages.append({"role": "user", "content": "What's the status?"})
messages.append({"role": "assistant", "content": "Here's what I found..."})
messages.append({"role": "user", "content": "Continue"})
# The tail cut should NOT protect all 20 tool messages
head_end = c.protect_first_n
cut = c._find_tail_cut_by_tokens(messages, head_end)
tail_size = len(messages) - cut
# With token budget, the tail should be much smaller than 20+
assert tail_size < 20, f"Tail {tail_size} messages — large tool outputs are blocking compaction"
# But at least 3 (hard minimum)
assert tail_size >= 3
def test_min_tail_always_3_messages(self, budget_compressor):
"""Even with a tiny token budget, at least 3 messages are protected."""
c = budget_compressor
# Override to a tiny budget
c.tail_token_budget = 10
messages = [
{"role": "user", "content": "hello"},
{"role": "assistant", "content": "hi"},
{"role": "user", "content": "do something"},
{"role": "assistant", "content": "working on it"},
{"role": "user", "content": "more work"},
{"role": "assistant", "content": "done"},
{"role": "user", "content": "thanks"},
]
head_end = 2
cut = c._find_tail_cut_by_tokens(messages, head_end)
tail_size = len(messages) - cut
assert tail_size >= 3, f"Tail is only {tail_size} messages, min should be 3"
def test_soft_ceiling_allows_oversized_message(self, budget_compressor):
"""The 1.5x soft ceiling allows an oversized message to be included
rather than splitting it."""
c = budget_compressor
# Set a small budget — 500 tokens
c.tail_token_budget = 500
messages = [
{"role": "user", "content": "hello"},
{"role": "assistant", "content": "hi"},
{"role": "user", "content": "read the file"},
# This message is ~600 tokens (> budget of 500, but < 1.5x = 750)
{"role": "assistant", "content": "a" * 2400},
{"role": "user", "content": "short"},
{"role": "assistant", "content": "short reply"},
{"role": "user", "content": "continue"},
]
head_end = 2
cut = c._find_tail_cut_by_tokens(messages, head_end)
# The oversized message at index 3 should NOT be the cut point
# because 1.5x ceiling = 750 tokens and accumulated would be ~610
# (short msgs + oversized msg) which is < 750
tail_size = len(messages) - cut
assert tail_size >= 3
def test_small_conversation_still_compresses(self, budget_compressor):
"""With the new min of 8 messages (head=2 + 3 + 1 guard + 2 middle),
a small but compressible conversation should still compress."""
c = budget_compressor
# 9 messages: head(2) + 4 middle + 3 tail = compressible
messages = []
for i in range(9):
role = "user" if i % 2 == 0 else "assistant"
messages.append({"role": role, "content": f"Message {i}"})
# Should not early-return (needs > protect_first_n + 3 + 1 = 6)
# Mock the summary generation to avoid real API call
with patch.object(c, "_generate_summary", return_value="Summary of conversation"):
result = c.compress(messages, current_tokens=90_000)
# Should have compressed (fewer messages than original)
assert len(result) < len(messages)
def test_prune_with_token_budget(self, budget_compressor):
"""_prune_old_tool_results with protect_tail_tokens respects the budget."""
c = budget_compressor
messages = [
{"role": "user", "content": "start"},
{"role": "assistant", "content": None,
"tool_calls": [{"function": {"name": "read_file", "arguments": '{"path": "big.txt"}'}}]},
{"role": "tool", "content": "x" * 10000, "tool_call_id": "c1"}, # ~2500 tokens
{"role": "assistant", "content": None,
"tool_calls": [{"function": {"name": "read_file", "arguments": '{"path": "small.txt"}'}}]},
{"role": "tool", "content": "y" * 10000, "tool_call_id": "c2"}, # ~2500 tokens
{"role": "user", "content": "short recent message"},
{"role": "assistant", "content": "short reply"},
]
# With a 1000-token budget, only the last couple messages should be protected
result, pruned = c._prune_old_tool_results(
messages, protect_tail_count=2, protect_tail_tokens=1000,
)
# At least one old tool result should have been pruned
assert pruned >= 1
def test_prune_without_token_budget_uses_message_count(self, budget_compressor):
"""Without protect_tail_tokens, falls back to message-count behavior."""
c = budget_compressor
messages = [
{"role": "user", "content": "start"},
{"role": "assistant", "content": None,
"tool_calls": [{"function": {"name": "tool", "arguments": "{}"}}]},
{"role": "tool", "content": "x" * 5000, "tool_call_id": "c1"},
{"role": "user", "content": "recent"},
{"role": "assistant", "content": "reply"},
]
# protect_tail_count=3 means last 3 messages protected
result, pruned = c._prune_old_tool_results(
messages, protect_tail_count=3,
)
# Tool at index 2 is outside the protected tail (last 3 = indices 2,3,4)
# so it might or might not be pruned depending on boundary
assert isinstance(pruned, int)
-36
View File
@@ -214,42 +214,6 @@ def test_exhausted_entry_resets_after_ttl(tmp_path, monkeypatch):
assert entry.last_status == "ok"
def test_exhausted_402_entry_resets_after_one_hour(tmp_path, monkeypatch):
"""402-exhausted credentials recover after 1 hour, not 24."""
monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes"))
_write_auth_store(
tmp_path,
{
"version": 1,
"credential_pool": {
"openrouter": [
{
"id": "cred-1",
"label": "primary",
"auth_type": "api_key",
"priority": 0,
"source": "manual",
"access_token": "***",
"base_url": "https://openrouter.ai/api/v1",
"last_status": "exhausted",
"last_status_at": time.time() - 3700, # ~1h2m ago
"last_error_code": 402,
}
]
},
},
)
from agent.credential_pool import load_pool
pool = load_pool("openrouter")
entry = pool.select()
assert entry is not None
assert entry.id == "cred-1"
assert entry.last_status == "ok"
def test_explicit_reset_timestamp_overrides_default_429_ttl(tmp_path, monkeypatch):
monkeypatch.setenv("HERMES_HOME", str(tmp_path / "hermes"))
_write_auth_store(
-42
View File
@@ -1,42 +0,0 @@
"""Tests for MiniMax auxiliary client URL normalization.
MiniMax and MiniMax-CN set inference_base_url to the /anthropic path.
The auxiliary client uses the OpenAI SDK, which needs /v1 instead.
"""
import sys
import os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))
from agent.auxiliary_client import _to_openai_base_url
class TestToOpenaiBaseUrl:
def test_minimax_global_anthropic_suffix_replaced(self):
assert _to_openai_base_url("https://api.minimax.io/anthropic") == "https://api.minimax.io/v1"
def test_minimax_cn_anthropic_suffix_replaced(self):
assert _to_openai_base_url("https://api.minimaxi.com/anthropic") == "https://api.minimaxi.com/v1"
def test_trailing_slash_stripped_before_replace(self):
assert _to_openai_base_url("https://api.minimax.io/anthropic/") == "https://api.minimax.io/v1"
def test_v1_url_unchanged(self):
assert _to_openai_base_url("https://api.openai.com/v1") == "https://api.openai.com/v1"
def test_openrouter_url_unchanged(self):
assert _to_openai_base_url("https://openrouter.ai/api/v1") == "https://openrouter.ai/api/v1"
def test_anthropic_domain_unchanged(self):
"""api.anthropic.com doesn't end with /anthropic — should be untouched."""
assert _to_openai_base_url("https://api.anthropic.com") == "https://api.anthropic.com"
def test_anthropic_in_subpath_unchanged(self):
assert _to_openai_base_url("https://example.com/anthropic/extra") == "https://example.com/anthropic/extra"
def test_empty_string(self):
assert _to_openai_base_url("") == ""
def test_none(self):
assert _to_openai_base_url(None) == ""
-105
View File
@@ -1,105 +0,0 @@
"""Tests for MiniMax provider hardening — context lengths, thinking guard, catalog."""
class TestMinimaxContextLengths:
"""Verify per-model context length entries for MiniMax models."""
def test_m1_variants_have_1m_context(self):
from agent.model_metadata import DEFAULT_CONTEXT_LENGTHS
# Keys are lowercase because the lookup lowercases model names
for model in ("minimax-m1", "minimax-m1-40k", "minimax-m1-80k",
"minimax-m1-128k", "minimax-m1-256k"):
assert model in DEFAULT_CONTEXT_LENGTHS, f"{model} missing from context lengths"
assert DEFAULT_CONTEXT_LENGTHS[model] == 1_000_000, f"{model} expected 1M"
def test_m2_variants_have_1m_context(self):
from agent.model_metadata import DEFAULT_CONTEXT_LENGTHS
# Keys are lowercase because the lookup lowercases model names
for model in ("minimax-m2.5", "minimax-m2.7"):
assert model in DEFAULT_CONTEXT_LENGTHS, f"{model} missing from context lengths"
assert DEFAULT_CONTEXT_LENGTHS[model] == 1_048_576, f"{model} expected 1048576"
def test_minimax_prefix_fallback(self):
from agent.model_metadata import DEFAULT_CONTEXT_LENGTHS
# The generic "minimax" prefix entry should be 1M for unknown models
assert DEFAULT_CONTEXT_LENGTHS["minimax"] == 1_048_576
class TestMinimaxThinkingGuard:
"""Verify that build_anthropic_kwargs does NOT add thinking params for MiniMax models."""
def test_no_thinking_for_minimax_m27(self):
from agent.anthropic_adapter import build_anthropic_kwargs
kwargs = build_anthropic_kwargs(
model="MiniMax-M2.7",
messages=[{"role": "user", "content": "hello"}],
tools=None,
max_tokens=4096,
reasoning_config={"enabled": True, "effort": "medium"},
)
assert "thinking" not in kwargs
assert "output_config" not in kwargs
def test_no_thinking_for_minimax_m1(self):
from agent.anthropic_adapter import build_anthropic_kwargs
kwargs = build_anthropic_kwargs(
model="MiniMax-M1-128k",
messages=[{"role": "user", "content": "hello"}],
tools=None,
max_tokens=4096,
reasoning_config={"enabled": True, "effort": "high"},
)
assert "thinking" not in kwargs
def test_thinking_still_works_for_claude(self):
from agent.anthropic_adapter import build_anthropic_kwargs
kwargs = build_anthropic_kwargs(
model="claude-sonnet-4-20250514",
messages=[{"role": "user", "content": "hello"}],
tools=None,
max_tokens=4096,
reasoning_config={"enabled": True, "effort": "medium"},
)
assert "thinking" in kwargs
class TestMinimaxAuxModel:
"""Verify auxiliary model is standard (not highspeed)."""
def test_minimax_aux_is_standard(self):
from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
assert _API_KEY_PROVIDER_AUX_MODELS["minimax"] == "MiniMax-M2.7"
assert _API_KEY_PROVIDER_AUX_MODELS["minimax-cn"] == "MiniMax-M2.7"
def test_minimax_aux_not_highspeed(self):
from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
assert "highspeed" not in _API_KEY_PROVIDER_AUX_MODELS["minimax"]
assert "highspeed" not in _API_KEY_PROVIDER_AUX_MODELS["minimax-cn"]
class TestMinimaxModelCatalog:
"""Verify the model catalog includes M1 family and excludes deprecated models."""
def test_catalog_includes_m1_family(self):
from hermes_cli.models import _PROVIDER_MODELS
for provider in ("minimax", "minimax-cn"):
models = _PROVIDER_MODELS[provider]
assert "MiniMax-M1" in models
assert "MiniMax-M1-40k" in models
assert "MiniMax-M1-80k" in models
assert "MiniMax-M1-128k" in models
assert "MiniMax-M1-256k" in models
def test_catalog_excludes_deprecated(self):
from hermes_cli.models import _PROVIDER_MODELS
for provider in ("minimax", "minimax-cn"):
models = _PROVIDER_MODELS[provider]
assert "MiniMax-M2.1" not in models
def test_catalog_excludes_highspeed(self):
from hermes_cli.models import _PROVIDER_MODELS
for provider in ("minimax", "minimax-cn"):
models = _PROVIDER_MODELS[provider]
assert "MiniMax-M2.7-highspeed" not in models
assert "MiniMax-M2.5-highspeed" not in models
-66
View File
@@ -1,66 +0,0 @@
import pytest
from unittest.mock import MagicMock, patch
from hermes_cli.plugins import VALID_HOOKS, PluginManager
import os
import shutil
import tempfile
from cli import HermesCLI
def test_session_hooks_in_valid_hooks():
"""Verify on_session_finalize and on_session_reset are registered as valid hooks."""
assert "on_session_finalize" in VALID_HOOKS
assert "on_session_reset" in VALID_HOOKS
@patch("hermes_cli.plugins.invoke_hook")
def test_session_finalize_on_reset(mock_invoke_hook):
"""Verify on_session_finalize fires when /new or /reset is used."""
cli = HermesCLI()
cli.agent = MagicMock()
cli.agent.session_id = "test-session-id"
# Simulate /new command which triggers on_session_finalize for the old session
cli.new_session(silent=True)
# Check if on_session_finalize was called for the old session
mock_invoke_hook.assert_any_call(
"on_session_finalize", session_id="test-session-id", platform="cli"
)
# Check if on_session_reset was called for the new session
mock_invoke_hook.assert_any_call(
"on_session_reset", session_id=cli.session_id, platform="cli"
)
@patch("hermes_cli.plugins.invoke_hook")
def test_session_finalize_on_cleanup(mock_invoke_hook):
"""Verify on_session_finalize fires during CLI exit cleanup."""
import cli as cli_mod
mock_agent = MagicMock()
mock_agent.session_id = "cleanup-session-id"
cli_mod._active_agent_ref = mock_agent
cli_mod._cleanup_done = False
cli_mod._run_cleanup()
mock_invoke_hook.assert_any_call(
"on_session_finalize", session_id="cleanup-session-id", platform="cli"
)
@patch("hermes_cli.plugins.invoke_hook")
def test_hook_errors_are_caught(mock_invoke_hook):
"""Verify hook exceptions are caught and don't crash the agent."""
mgr = PluginManager()
# Register a hook that raises
def bad_callback(**kwargs):
raise Exception("Hook failed")
mgr._hooks["on_session_finalize"] = [bad_callback]
# This should not raise
results = mgr.invoke_hook("on_session_finalize", session_id="test", platform="cli")
assert results == []
+24 -231
View File
@@ -33,13 +33,6 @@ def git_repo(tmp_path):
["git", "commit", "-m", "Initial commit"],
cwd=repo, capture_output=True,
)
# Add a fake remote ref so cleanup logic sees the initial commit as
# "pushed". Without this, `git log HEAD --not --remotes` treats every
# commit as unpushed and cleanup refuses to delete worktrees.
subprocess.run(
["git", "update-ref", "refs/remotes/origin/main", "HEAD"],
cwd=repo, capture_output=True,
)
return repo
@@ -88,11 +81,7 @@ def _setup_worktree(repo_root):
def _cleanup_worktree(info):
"""Test version of _cleanup_worktree.
Preserves the worktree only if it has unpushed commits.
Dirty working tree alone is not enough to keep it.
"""
"""Test version of _cleanup_worktree."""
wt_path = info["path"]
branch = info["branch"]
repo_root = info["repo_root"]
@@ -100,15 +89,15 @@ def _cleanup_worktree(info):
if not Path(wt_path).exists():
return
# Check for unpushed commits
result = subprocess.run(
["git", "log", "--oneline", "HEAD", "--not", "--remotes"],
# Check for uncommitted changes
status = subprocess.run(
["git", "status", "--porcelain"],
capture_output=True, text=True, timeout=10, cwd=wt_path,
)
has_unpushed = bool(result.stdout.strip())
has_changes = bool(status.stdout.strip())
if has_unpushed:
return False # Did not clean up — has unpushed commits
if has_changes:
return False # Did not clean up
subprocess.run(
["git", "worktree", "remove", wt_path, "--force"],
@@ -215,45 +204,20 @@ class TestWorktreeCleanup:
assert result is True
assert not Path(info["path"]).exists()
def test_dirty_worktree_cleaned_when_no_unpushed(self, git_repo):
"""Dirty working tree without unpushed commits is cleaned up.
Agent sessions typically leave untracked files / artifacts behind.
Since all real work is in pushed commits, these don't warrant
keeping the worktree.
"""
def test_dirty_worktree_kept(self, git_repo):
info = _setup_worktree(str(git_repo))
assert info is not None
# Make uncommitted changes (untracked file)
# Make uncommitted changes
(Path(info["path"]) / "new-file.txt").write_text("uncommitted")
subprocess.run(
["git", "add", "new-file.txt"],
cwd=info["path"], capture_output=True,
)
# The git_repo fixture already has a fake remote ref so the initial
# commit is seen as "pushed". No unpushed commits → cleanup proceeds.
result = _cleanup_worktree(info)
assert result is True # Cleaned up despite dirty working tree
assert not Path(info["path"]).exists()
def test_worktree_with_unpushed_commits_kept(self, git_repo):
"""Worktree with unpushed commits is preserved."""
info = _setup_worktree(str(git_repo))
assert info is not None
# Make a commit that is NOT on any remote
(Path(info["path"]) / "work.txt").write_text("real work")
subprocess.run(["git", "add", "work.txt"], cwd=info["path"], capture_output=True)
subprocess.run(
["git", "commit", "-m", "agent work"],
cwd=info["path"], capture_output=True,
)
result = _cleanup_worktree(info)
assert result is False # Kept — has unpushed commits
assert Path(info["path"]).exists()
assert result is False
assert Path(info["path"]).exists() # Still there
def test_branch_deleted_on_cleanup(self, git_repo):
info = _setup_worktree(str(git_repo))
@@ -403,7 +367,7 @@ class TestMultipleWorktrees:
lines = [l for l in result.stdout.strip().splitlines() if l.strip()]
assert len(lines) == 11
# Cleanup all (git_repo fixture has a fake remote ref so cleanup works)
# Cleanup all
for info in worktrees:
# Discard changes first so cleanup works
subprocess.run(
@@ -528,77 +492,33 @@ class TestStaleWorktreePruning:
assert not pruned
assert Path(info["path"]).exists()
def test_keeps_old_worktree_with_unpushed_commits(self, git_repo):
"""Old worktrees (24-72h) with unpushed commits should NOT be pruned."""
def test_keeps_dirty_old_worktree(self, git_repo):
"""Old worktrees with uncommitted changes should NOT be pruned."""
import time
info = _setup_worktree(str(git_repo))
assert info is not None
# Make an unpushed commit
(Path(info["path"]) / "work.txt").write_text("real work")
subprocess.run(["git", "add", "work.txt"], cwd=info["path"], capture_output=True)
# Make it dirty
(Path(info["path"]) / "dirty.txt").write_text("uncommitted")
subprocess.run(
["git", "commit", "-m", "agent work"],
["git", "add", "dirty.txt"],
cwd=info["path"], capture_output=True,
)
# Make it old (25h — in the 24-72h soft tier)
# Make it old
old_time = time.time() - (25 * 3600)
os.utime(info["path"], (old_time, old_time))
# Check for unpushed commits (simulates prune logic)
result = subprocess.run(
["git", "log", "--oneline", "HEAD", "--not", "--remotes"],
# Check if it would be pruned
status = subprocess.run(
["git", "status", "--porcelain"],
capture_output=True, text=True, cwd=info["path"],
)
has_unpushed = bool(result.stdout.strip())
assert has_unpushed # Has unpushed commits → not pruned in soft tier
has_changes = bool(status.stdout.strip())
assert has_changes # Should be dirty → not pruned
assert Path(info["path"]).exists()
def test_force_prunes_very_old_worktree(self, git_repo):
"""Worktrees older than 72h should be force-pruned regardless."""
import time
info = _setup_worktree(str(git_repo))
assert info is not None
# Make an unpushed commit (would normally protect it)
(Path(info["path"]) / "work.txt").write_text("stale work")
subprocess.run(["git", "add", "work.txt"], cwd=info["path"], capture_output=True)
subprocess.run(
["git", "commit", "-m", "old agent work"],
cwd=info["path"], capture_output=True,
)
# Make it very old (73h — beyond the 72h hard threshold)
old_time = time.time() - (73 * 3600)
os.utime(info["path"], (old_time, old_time))
# Simulate the force-prune tier check
hard_cutoff = time.time() - (72 * 3600)
mtime = Path(info["path"]).stat().st_mtime
assert mtime <= hard_cutoff # Should qualify for force removal
# Actually remove it (simulates _prune_stale_worktrees force path)
branch_result = subprocess.run(
["git", "branch", "--show-current"],
capture_output=True, text=True, timeout=5, cwd=info["path"],
)
branch = branch_result.stdout.strip()
subprocess.run(
["git", "worktree", "remove", info["path"], "--force"],
capture_output=True, text=True, timeout=15, cwd=str(git_repo),
)
if branch:
subprocess.run(
["git", "branch", "-D", branch],
capture_output=True, text=True, timeout=10, cwd=str(git_repo),
)
assert not Path(info["path"]).exists()
class TestEdgeCases:
"""Test edge cases for robustness."""
@@ -691,133 +611,6 @@ class TestTerminalCWDIntegration:
assert result.stdout.strip() == "true"
class TestOrphanedBranchPruning:
"""Test cleanup of orphaned hermes/* and pr-* branches."""
def test_prunes_orphaned_hermes_branch(self, git_repo):
"""hermes/hermes-* branches with no worktree should be deleted."""
# Create a branch that looks like a worktree branch but has no worktree
subprocess.run(
["git", "branch", "hermes/hermes-deadbeef", "HEAD"],
cwd=str(git_repo), capture_output=True,
)
# Verify it exists
result = subprocess.run(
["git", "branch", "--list", "hermes/hermes-deadbeef"],
capture_output=True, text=True, cwd=str(git_repo),
)
assert "hermes/hermes-deadbeef" in result.stdout
# Simulate _prune_orphaned_branches logic
result = subprocess.run(
["git", "branch", "--format=%(refname:short)"],
capture_output=True, text=True, cwd=str(git_repo),
)
all_branches = [b.strip() for b in result.stdout.strip().split("\n") if b.strip()]
wt_result = subprocess.run(
["git", "worktree", "list", "--porcelain"],
capture_output=True, text=True, cwd=str(git_repo),
)
active_branches = {"main"}
for line in wt_result.stdout.split("\n"):
if line.startswith("branch refs/heads/"):
active_branches.add(line.split("branch refs/heads/", 1)[-1].strip())
orphaned = [
b for b in all_branches
if b not in active_branches
and (b.startswith("hermes/hermes-") or b.startswith("pr-"))
]
assert "hermes/hermes-deadbeef" in orphaned
# Delete them
if orphaned:
subprocess.run(
["git", "branch", "-D"] + orphaned,
capture_output=True, text=True, cwd=str(git_repo),
)
# Verify gone
result = subprocess.run(
["git", "branch", "--list", "hermes/hermes-deadbeef"],
capture_output=True, text=True, cwd=str(git_repo),
)
assert "hermes/hermes-deadbeef" not in result.stdout
def test_prunes_orphaned_pr_branch(self, git_repo):
"""pr-* branches should be deleted during pruning."""
subprocess.run(
["git", "branch", "pr-1234", "HEAD"],
cwd=str(git_repo), capture_output=True,
)
subprocess.run(
["git", "branch", "pr-5678", "HEAD"],
cwd=str(git_repo), capture_output=True,
)
result = subprocess.run(
["git", "branch", "--format=%(refname:short)"],
capture_output=True, text=True, cwd=str(git_repo),
)
all_branches = [b.strip() for b in result.stdout.strip().split("\n") if b.strip()]
active_branches = {"main"}
orphaned = [
b for b in all_branches
if b not in active_branches and b.startswith("pr-")
]
assert "pr-1234" in orphaned
assert "pr-5678" in orphaned
subprocess.run(
["git", "branch", "-D"] + orphaned,
capture_output=True, text=True, cwd=str(git_repo),
)
# Verify gone
result = subprocess.run(
["git", "branch", "--format=%(refname:short)"],
capture_output=True, text=True, cwd=str(git_repo),
)
remaining = result.stdout.strip()
assert "pr-1234" not in remaining
assert "pr-5678" not in remaining
def test_preserves_active_worktree_branch(self, git_repo):
"""Branches with active worktrees should NOT be pruned."""
info = _setup_worktree(str(git_repo))
assert info is not None
result = subprocess.run(
["git", "worktree", "list", "--porcelain"],
capture_output=True, text=True, cwd=str(git_repo),
)
active_branches = set()
for line in result.stdout.split("\n"):
if line.startswith("branch refs/heads/"):
active_branches.add(line.split("branch refs/heads/", 1)[-1].strip())
assert info["branch"] in active_branches # Protected
def test_preserves_main_branch(self, git_repo):
"""main branch should never be pruned."""
result = subprocess.run(
["git", "branch", "--format=%(refname:short)"],
capture_output=True, text=True, cwd=str(git_repo),
)
all_branches = [b.strip() for b in result.stdout.strip().split("\n") if b.strip()]
active_branches = {"main"}
orphaned = [
b for b in all_branches
if b not in active_branches
and (b.startswith("hermes/hermes-") or b.startswith("pr-"))
]
assert "main" not in orphaned
class TestSystemPromptInjection:
"""Test that the agent gets worktree context in its system prompt."""
@@ -832,7 +625,7 @@ class TestSystemPromptInjection:
f"{info['path']}. Your branch is `{info['branch']}`. "
f"Changes here do not affect the main working tree or other agents. "
f"Remember to commit and push your changes, and create a PR if appropriate. "
f"The original repo is at {info['repo_root']}.]\n"
f"The original repo is at {info['repo_root']}.]"
)
assert info["path"] in wt_note
-30
View File
@@ -339,36 +339,6 @@ class TestMarkJobRun:
assert updated["last_status"] == "error"
assert updated["last_error"] == "timeout"
def test_delivery_error_tracked_separately(self, tmp_cron_dir):
"""Agent succeeds but delivery fails — both tracked independently."""
job = create_job(prompt="Report", schedule="every 1h")
mark_job_run(job["id"], success=True, delivery_error="platform 'telegram' not configured")
updated = get_job(job["id"])
assert updated["last_status"] == "ok"
assert updated["last_error"] is None
assert updated["last_delivery_error"] == "platform 'telegram' not configured"
def test_delivery_error_cleared_on_success(self, tmp_cron_dir):
"""Successful delivery clears the previous delivery error."""
job = create_job(prompt="Report", schedule="every 1h")
mark_job_run(job["id"], success=True, delivery_error="network timeout")
updated = get_job(job["id"])
assert updated["last_delivery_error"] == "network timeout"
# Next run delivers successfully
mark_job_run(job["id"], success=True, delivery_error=None)
updated = get_job(job["id"])
assert updated["last_delivery_error"] is None
def test_both_agent_and_delivery_error(self, tmp_cron_dir):
"""Agent fails AND delivery fails — both errors recorded."""
job = create_job(prompt="Report", schedule="every 1h")
mark_job_run(job["id"], success=False, error="model timeout",
delivery_error="platform 'discord' not enabled")
updated = get_job(job["id"])
assert updated["last_status"] == "error"
assert updated["last_error"] == "model timeout"
assert updated["last_delivery_error"] == "platform 'discord' not enabled"
class TestAdvanceNextRun:
"""Tests for advance_next_run() — crash-safety for recurring jobs."""
-84
View File
@@ -508,90 +508,6 @@ class TestDeliverResultWrapping:
assert send_mock.call_args.kwargs["thread_id"] == "17585"
class TestDeliverResultErrorReturns:
"""Verify _deliver_result returns error strings on failure, None on success."""
def test_returns_none_on_successful_delivery(self):
from gateway.config import Platform
pconfig = MagicMock()
pconfig.enabled = True
mock_cfg = MagicMock()
mock_cfg.platforms = {Platform.TELEGRAM: pconfig}
with patch("gateway.config.load_gateway_config", return_value=mock_cfg), \
patch("tools.send_message_tool._send_to_platform", new=AsyncMock(return_value={"success": True})):
job = {
"id": "ok-job",
"deliver": "origin",
"origin": {"platform": "telegram", "chat_id": "123"},
}
result = _deliver_result(job, "Output.")
assert result is None
def test_returns_none_for_local_delivery(self):
"""local-only jobs don't deliver — not a failure."""
job = {"id": "local-job", "deliver": "local"}
result = _deliver_result(job, "Output.")
assert result is None
def test_returns_error_for_unknown_platform(self):
job = {
"id": "bad-platform",
"deliver": "origin",
"origin": {"platform": "fax", "chat_id": "123"},
}
with patch("gateway.config.load_gateway_config"):
result = _deliver_result(job, "Output.")
assert result is not None
assert "unknown platform" in result
def test_returns_error_when_platform_disabled(self):
from gateway.config import Platform
pconfig = MagicMock()
pconfig.enabled = False
mock_cfg = MagicMock()
mock_cfg.platforms = {Platform.TELEGRAM: pconfig}
with patch("gateway.config.load_gateway_config", return_value=mock_cfg):
job = {
"id": "disabled",
"deliver": "origin",
"origin": {"platform": "telegram", "chat_id": "123"},
}
result = _deliver_result(job, "Output.")
assert result is not None
assert "not configured" in result
def test_returns_error_on_send_failure(self):
from gateway.config import Platform
pconfig = MagicMock()
pconfig.enabled = True
mock_cfg = MagicMock()
mock_cfg.platforms = {Platform.TELEGRAM: pconfig}
with patch("gateway.config.load_gateway_config", return_value=mock_cfg), \
patch("tools.send_message_tool._send_to_platform", new=AsyncMock(return_value={"error": "rate limited"})):
job = {
"id": "rate-limited",
"deliver": "origin",
"origin": {"platform": "telegram", "chat_id": "123"},
}
result = _deliver_result(job, "Output.")
assert result is not None
assert "rate limited" in result
def test_returns_error_for_unresolved_target(self, monkeypatch):
"""Non-local delivery with no resolvable target should return an error."""
monkeypatch.delenv("TELEGRAM_HOME_CHANNEL", raising=False)
job = {"id": "no-target", "deliver": "telegram"}
result = _deliver_result(job, "Output.")
assert result is not None
assert "no delivery target" in result
class TestRunJobSessionPersistence:
def test_run_job_passes_session_db_and_cron_platform(self, tmp_path):
job = {
-361
View File
@@ -1,361 +0,0 @@
"""Tests for the BlueBubbles iMessage gateway adapter."""
import pytest
from gateway.config import Platform, PlatformConfig
def _make_adapter(monkeypatch, **extra):
monkeypatch.setenv("BLUEBUBBLES_SERVER_URL", "http://localhost:1234")
monkeypatch.setenv("BLUEBUBBLES_PASSWORD", "secret")
from gateway.platforms.bluebubbles import BlueBubblesAdapter
cfg = PlatformConfig(
enabled=True,
extra={
"server_url": "http://localhost:1234",
"password": "secret",
**extra,
},
)
return BlueBubblesAdapter(cfg)
class TestBlueBubblesPlatformEnum:
def test_bluebubbles_enum_exists(self):
assert Platform.BLUEBUBBLES.value == "bluebubbles"
class TestBlueBubblesConfigLoading:
def test_apply_env_overrides_bluebubbles(self, monkeypatch):
monkeypatch.setenv("BLUEBUBBLES_SERVER_URL", "http://localhost:1234")
monkeypatch.setenv("BLUEBUBBLES_PASSWORD", "secret")
monkeypatch.setenv("BLUEBUBBLES_WEBHOOK_PORT", "9999")
from gateway.config import GatewayConfig, _apply_env_overrides
config = GatewayConfig()
_apply_env_overrides(config)
assert Platform.BLUEBUBBLES in config.platforms
bc = config.platforms[Platform.BLUEBUBBLES]
assert bc.enabled is True
assert bc.extra["server_url"] == "http://localhost:1234"
assert bc.extra["password"] == "secret"
assert bc.extra["webhook_port"] == 9999
def test_connected_platforms_includes_bluebubbles(self, monkeypatch):
monkeypatch.setenv("BLUEBUBBLES_SERVER_URL", "http://localhost:1234")
monkeypatch.setenv("BLUEBUBBLES_PASSWORD", "secret")
from gateway.config import GatewayConfig, _apply_env_overrides
config = GatewayConfig()
_apply_env_overrides(config)
assert Platform.BLUEBUBBLES in config.get_connected_platforms()
def test_home_channel_set_from_env(self, monkeypatch):
monkeypatch.setenv("BLUEBUBBLES_SERVER_URL", "http://localhost:1234")
monkeypatch.setenv("BLUEBUBBLES_PASSWORD", "secret")
monkeypatch.setenv("BLUEBUBBLES_HOME_CHANNEL", "user@example.com")
from gateway.config import GatewayConfig, _apply_env_overrides
config = GatewayConfig()
_apply_env_overrides(config)
hc = config.platforms[Platform.BLUEBUBBLES].home_channel
assert hc is not None
assert hc.chat_id == "user@example.com"
def test_not_connected_without_password(self, monkeypatch):
monkeypatch.setenv("BLUEBUBBLES_SERVER_URL", "http://localhost:1234")
monkeypatch.delenv("BLUEBUBBLES_PASSWORD", raising=False)
from gateway.config import GatewayConfig, _apply_env_overrides
config = GatewayConfig()
_apply_env_overrides(config)
assert Platform.BLUEBUBBLES not in config.get_connected_platforms()
class TestBlueBubblesHelpers:
def test_check_requirements(self, monkeypatch):
monkeypatch.setenv("BLUEBUBBLES_SERVER_URL", "http://localhost:1234")
monkeypatch.setenv("BLUEBUBBLES_PASSWORD", "secret")
from gateway.platforms.bluebubbles import check_bluebubbles_requirements
assert check_bluebubbles_requirements() is True
def test_format_message_strips_markdown(self, monkeypatch):
adapter = _make_adapter(monkeypatch)
assert adapter.format_message("**Hello** `world`") == "Hello world"
def test_strip_markdown_headers(self, monkeypatch):
adapter = _make_adapter(monkeypatch)
assert adapter.format_message("## Heading\ntext") == "Heading\ntext"
def test_strip_markdown_links(self, monkeypatch):
adapter = _make_adapter(monkeypatch)
assert adapter.format_message("[click here](http://example.com)") == "click here"
def test_init_normalizes_webhook_path(self, monkeypatch):
adapter = _make_adapter(monkeypatch, webhook_path="bluebubbles-webhook")
assert adapter.webhook_path == "/bluebubbles-webhook"
def test_init_preserves_leading_slash(self, monkeypatch):
adapter = _make_adapter(monkeypatch, webhook_path="/my-hook")
assert adapter.webhook_path == "/my-hook"
def test_server_url_normalized(self, monkeypatch):
adapter = _make_adapter(monkeypatch, server_url="http://localhost:1234/")
assert adapter.server_url == "http://localhost:1234"
def test_server_url_adds_scheme(self, monkeypatch):
adapter = _make_adapter(monkeypatch, server_url="localhost:1234")
assert adapter.server_url == "http://localhost:1234"
class TestBlueBubblesWebhookParsing:
def test_webhook_prefers_chat_guid_over_message_guid(self, monkeypatch):
adapter = _make_adapter(monkeypatch)
payload = {
"guid": "MESSAGE-GUID",
"chatGuid": "iMessage;-;user@example.com",
"chatIdentifier": "user@example.com",
}
record = adapter._extract_payload_record(payload) or {}
chat_guid = adapter._value(
record.get("chatGuid"),
payload.get("chatGuid"),
record.get("chat_guid"),
payload.get("chat_guid"),
payload.get("guid"),
)
assert chat_guid == "iMessage;-;user@example.com"
def test_webhook_can_fall_back_to_sender_when_chat_fields_missing(self, monkeypatch):
adapter = _make_adapter(monkeypatch)
payload = {
"data": {
"guid": "MESSAGE-GUID",
"text": "hello",
"handle": {"address": "user@example.com"},
"isFromMe": False,
}
}
record = adapter._extract_payload_record(payload) or {}
chat_guid = adapter._value(
record.get("chatGuid"),
payload.get("chatGuid"),
record.get("chat_guid"),
payload.get("chat_guid"),
payload.get("guid"),
)
chat_identifier = adapter._value(
record.get("chatIdentifier"),
record.get("identifier"),
payload.get("chatIdentifier"),
payload.get("identifier"),
)
sender = (
adapter._value(
record.get("handle", {}).get("address")
if isinstance(record.get("handle"), dict)
else None,
record.get("sender"),
record.get("from"),
record.get("address"),
)
or chat_identifier
or chat_guid
)
if not (chat_guid or chat_identifier) and sender:
chat_identifier = sender
assert chat_identifier == "user@example.com"
def test_extract_payload_record_accepts_list_data(self, monkeypatch):
adapter = _make_adapter(monkeypatch)
payload = {
"type": "new-message",
"data": [
{
"text": "hello",
"chatGuid": "iMessage;-;user@example.com",
"chatIdentifier": "user@example.com",
}
],
}
record = adapter._extract_payload_record(payload)
assert record == payload["data"][0]
def test_extract_payload_record_dict_data(self, monkeypatch):
adapter = _make_adapter(monkeypatch)
payload = {"data": {"text": "hello", "chatGuid": "iMessage;-;+1234"}}
record = adapter._extract_payload_record(payload)
assert record["text"] == "hello"
def test_extract_payload_record_fallback_to_message(self, monkeypatch):
adapter = _make_adapter(monkeypatch)
payload = {"message": {"text": "hello"}}
record = adapter._extract_payload_record(payload)
assert record["text"] == "hello"
class TestBlueBubblesGuidResolution:
def test_raw_guid_returned_as_is(self, monkeypatch):
"""If target already contains ';' it's a raw GUID — return unchanged."""
adapter = _make_adapter(monkeypatch)
import asyncio
result = asyncio.get_event_loop().run_until_complete(
adapter._resolve_chat_guid("iMessage;-;user@example.com")
)
assert result == "iMessage;-;user@example.com"
def test_empty_target_returns_none(self, monkeypatch):
adapter = _make_adapter(monkeypatch)
import asyncio
result = asyncio.get_event_loop().run_until_complete(
adapter._resolve_chat_guid("")
)
assert result is None
class TestBlueBubblesToolsetIntegration:
def test_toolset_exists(self):
from toolsets import TOOLSETS
assert "hermes-bluebubbles" in TOOLSETS
def test_toolset_in_gateway_composite(self):
from toolsets import TOOLSETS
gateway = TOOLSETS["hermes-gateway"]
assert "hermes-bluebubbles" in gateway["includes"]
class TestBlueBubblesPromptHint:
def test_platform_hint_exists(self):
from agent.prompt_builder import PLATFORM_HINTS
assert "bluebubbles" in PLATFORM_HINTS
hint = PLATFORM_HINTS["bluebubbles"]
assert "iMessage" in hint
assert "plain text" in hint
class TestBlueBubblesAttachmentDownload:
"""Verify _download_attachment routes to the correct cache helper."""
def test_download_image_uses_image_cache(self, monkeypatch):
"""Image MIME routes to cache_image_from_bytes."""
adapter = _make_adapter(monkeypatch)
import asyncio
import httpx
# Mock the HTTP client response
class MockResponse:
status_code = 200
content = b"\x89PNG\r\n\x1a\n"
def raise_for_status(self):
pass
async def mock_get(*args, **kwargs):
return MockResponse()
adapter.client = type("MockClient", (), {"get": mock_get})()
cached_path = None
def mock_cache_image(data, ext):
nonlocal cached_path
cached_path = f"/tmp/test_image{ext}"
return cached_path
monkeypatch.setattr(
"gateway.platforms.bluebubbles.cache_image_from_bytes",
mock_cache_image,
)
att_meta = {"mimeType": "image/png", "transferName": "photo.png"}
result = asyncio.get_event_loop().run_until_complete(
adapter._download_attachment("att-guid-123", att_meta)
)
assert result == "/tmp/test_image.png"
def test_download_audio_uses_audio_cache(self, monkeypatch):
"""Audio MIME routes to cache_audio_from_bytes."""
adapter = _make_adapter(monkeypatch)
import asyncio
class MockResponse:
status_code = 200
content = b"fake-audio-data"
def raise_for_status(self):
pass
async def mock_get(*args, **kwargs):
return MockResponse()
adapter.client = type("MockClient", (), {"get": mock_get})()
cached_path = None
def mock_cache_audio(data, ext):
nonlocal cached_path
cached_path = f"/tmp/test_audio{ext}"
return cached_path
monkeypatch.setattr(
"gateway.platforms.bluebubbles.cache_audio_from_bytes",
mock_cache_audio,
)
att_meta = {"mimeType": "audio/mpeg", "transferName": "voice.mp3"}
result = asyncio.get_event_loop().run_until_complete(
adapter._download_attachment("att-guid-456", att_meta)
)
assert result == "/tmp/test_audio.mp3"
def test_download_document_uses_document_cache(self, monkeypatch):
"""Non-image/audio MIME routes to cache_document_from_bytes."""
adapter = _make_adapter(monkeypatch)
import asyncio
class MockResponse:
status_code = 200
content = b"fake-doc-data"
def raise_for_status(self):
pass
async def mock_get(*args, **kwargs):
return MockResponse()
adapter.client = type("MockClient", (), {"get": mock_get})()
cached_path = None
def mock_cache_doc(data, filename):
nonlocal cached_path
cached_path = f"/tmp/{filename}"
return cached_path
monkeypatch.setattr(
"gateway.platforms.bluebubbles.cache_document_from_bytes",
mock_cache_doc,
)
att_meta = {"mimeType": "application/pdf", "transferName": "report.pdf"}
result = asyncio.get_event_loop().run_until_complete(
adapter._download_attachment("att-guid-789", att_meta)
)
assert result == "/tmp/report.pdf"
def test_download_returns_none_without_client(self, monkeypatch):
"""No client → returns None gracefully."""
adapter = _make_adapter(monkeypatch)
adapter.client = None
import asyncio
result = asyncio.get_event_loop().run_until_complete(
adapter._download_attachment("att-guid", {"mimeType": "image/png"})
)
assert result is None
@@ -209,31 +209,14 @@ class TestIncomingDocumentHandling:
assert "[Content of readme.md]:" in event.text
assert "# Title" in event.text
@pytest.mark.asyncio
async def test_log_content_injected(self, adapter):
""".log file under 100KB should be treated as text/plain and injected."""
file_content = b"BLE trace line 1\nBLE trace line 2"
with _mock_aiohttp_download(file_content):
msg = make_message(
attachments=[make_attachment(filename="btsnoop_hci.log", content_type="text/plain")],
content="please inspect this",
)
await adapter._handle_message(msg)
event = adapter.handle_message.call_args[0][0]
assert "[Content of btsnoop_hci.log]:" in event.text
assert "BLE trace line 1" in event.text
assert "please inspect this" in event.text
@pytest.mark.asyncio
async def test_oversized_document_skipped(self, adapter):
"""A document over 32MB should be skipped — media_urls stays empty."""
"""A document over 20MB should be skipped — media_urls stays empty."""
msg = make_message([
make_attachment(
filename="huge.pdf",
content_type="application/pdf",
size=33 * 1024 * 1024,
size=25 * 1024 * 1024,
)
])
await adapter._handle_message(msg)
@@ -243,24 +226,6 @@ class TestIncomingDocumentHandling:
# handler must still be called
adapter.handle_message.assert_called_once()
@pytest.mark.asyncio
async def test_mid_sized_zip_under_32mb_is_cached(self, adapter):
"""A 25MB .zip should be accepted now that Discord documents allow up to 32MB."""
msg = make_message([
make_attachment(
filename="bugreport.zip",
content_type="application/zip",
size=25 * 1024 * 1024,
)
])
with _mock_aiohttp_download(b"PK\x03\x04test"):
await adapter._handle_message(msg)
event = adapter.handle_message.call_args[0][0]
assert len(event.media_urls) == 1
assert event.media_types == ["application/zip"]
@pytest.mark.asyncio
async def test_zip_document_cached(self, adapter):
"""A .zip file should be cached as a supported document."""
-277
View File
@@ -1,277 +0,0 @@
"""Tests for Discord reply_to_mode functionality.
Covers the threading behavior control for multi-chunk replies:
- "off": Never reply-reference to original message
- "first": Only first chunk uses reply reference (default)
- "all": All chunks reply-reference the original message
"""
import os
import sys
from types import SimpleNamespace
from unittest.mock import MagicMock, AsyncMock, patch
import pytest
from gateway.config import PlatformConfig, GatewayConfig, Platform, _apply_env_overrides
def _ensure_discord_mock():
"""Install a mock discord module when discord.py isn't available."""
if "discord" in sys.modules and hasattr(sys.modules["discord"], "__file__"):
return
discord_mod = MagicMock()
discord_mod.Intents.default.return_value = MagicMock()
discord_mod.Client = MagicMock
discord_mod.File = MagicMock
discord_mod.DMChannel = type("DMChannel", (), {})
discord_mod.Thread = type("Thread", (), {})
discord_mod.ForumChannel = type("ForumChannel", (), {})
discord_mod.ui = SimpleNamespace(View=object, button=lambda *a, **k: (lambda fn: fn), Button=object)
discord_mod.ButtonStyle = SimpleNamespace(success=1, primary=2, secondary=2, danger=3, green=1, grey=2, blurple=2, red=3)
discord_mod.Color = SimpleNamespace(orange=lambda: 1, green=lambda: 2, blue=lambda: 3, red=lambda: 4, purple=lambda: 5)
discord_mod.Interaction = object
discord_mod.Embed = MagicMock
discord_mod.app_commands = SimpleNamespace(
describe=lambda **kwargs: (lambda fn: fn),
choices=lambda **kwargs: (lambda fn: fn),
Choice=lambda **kwargs: SimpleNamespace(**kwargs),
)
ext_mod = MagicMock()
commands_mod = MagicMock()
commands_mod.Bot = MagicMock
ext_mod.commands = commands_mod
sys.modules.setdefault("discord", discord_mod)
sys.modules.setdefault("discord.ext", ext_mod)
sys.modules.setdefault("discord.ext.commands", commands_mod)
_ensure_discord_mock()
from gateway.platforms.discord import DiscordAdapter # noqa: E402
@pytest.fixture()
def adapter_factory():
"""Factory to create DiscordAdapter with custom reply_to_mode."""
def create(reply_to_mode: str = "first"):
config = PlatformConfig(enabled=True, token="test-token", reply_to_mode=reply_to_mode)
return DiscordAdapter(config)
return create
class TestReplyToModeConfig:
"""Tests for reply_to_mode configuration loading."""
def test_default_mode_is_first(self, adapter_factory):
adapter = adapter_factory()
assert adapter._reply_to_mode == "first"
def test_off_mode(self, adapter_factory):
adapter = adapter_factory(reply_to_mode="off")
assert adapter._reply_to_mode == "off"
def test_first_mode(self, adapter_factory):
adapter = adapter_factory(reply_to_mode="first")
assert adapter._reply_to_mode == "first"
def test_all_mode(self, adapter_factory):
adapter = adapter_factory(reply_to_mode="all")
assert adapter._reply_to_mode == "all"
def test_invalid_mode_stored_as_is(self, adapter_factory):
"""Invalid modes are stored but send() handles them gracefully."""
adapter = adapter_factory(reply_to_mode="invalid")
assert adapter._reply_to_mode == "invalid"
def test_none_mode_defaults_to_first(self):
config = PlatformConfig(enabled=True, token="test-token")
adapter = DiscordAdapter(config)
assert adapter._reply_to_mode == "first"
def test_empty_string_mode_defaults_to_first(self):
config = PlatformConfig(enabled=True, token="test-token", reply_to_mode="")
adapter = DiscordAdapter(config)
assert adapter._reply_to_mode == "first"
def _make_discord_adapter(reply_to_mode: str = "first"):
"""Create a DiscordAdapter with mocked client and channel for send() tests."""
config = PlatformConfig(enabled=True, token="test-token", reply_to_mode=reply_to_mode)
adapter = DiscordAdapter(config)
# Mock the Discord client and channel
mock_channel = AsyncMock()
ref_message = MagicMock()
mock_channel.fetch_message = AsyncMock(return_value=ref_message)
sent_msg = MagicMock()
sent_msg.id = 42
mock_channel.send = AsyncMock(return_value=sent_msg)
mock_client = MagicMock()
mock_client.get_channel = MagicMock(return_value=mock_channel)
adapter._client = mock_client
return adapter, mock_channel, ref_message
class TestSendWithReplyToMode:
"""Tests for send() method respecting reply_to_mode."""
@pytest.mark.asyncio
async def test_off_mode_no_reply_reference(self):
adapter, channel, ref_msg = _make_discord_adapter("off")
adapter.truncate_message = lambda content, max_len: ["chunk1", "chunk2", "chunk3"]
await adapter.send("12345", "test content", reply_to="999")
# Should never try to fetch the reference message
channel.fetch_message.assert_not_called()
# All chunks sent without reference
for call in channel.send.call_args_list:
assert call.kwargs.get("reference") is None
@pytest.mark.asyncio
async def test_first_mode_only_first_chunk_references(self):
adapter, channel, ref_msg = _make_discord_adapter("first")
adapter.truncate_message = lambda content, max_len: ["chunk1", "chunk2", "chunk3"]
await adapter.send("12345", "test content", reply_to="999")
# Should fetch the reference message
channel.fetch_message.assert_called_once_with(999)
calls = channel.send.call_args_list
assert len(calls) == 3
assert calls[0].kwargs.get("reference") is ref_msg
assert calls[1].kwargs.get("reference") is None
assert calls[2].kwargs.get("reference") is None
@pytest.mark.asyncio
async def test_all_mode_all_chunks_reference(self):
adapter, channel, ref_msg = _make_discord_adapter("all")
adapter.truncate_message = lambda content, max_len: ["chunk1", "chunk2", "chunk3"]
await adapter.send("12345", "test content", reply_to="999")
channel.fetch_message.assert_called_once_with(999)
calls = channel.send.call_args_list
assert len(calls) == 3
for call in calls:
assert call.kwargs.get("reference") is ref_msg
@pytest.mark.asyncio
async def test_no_reply_to_param_no_reference(self):
adapter, channel, ref_msg = _make_discord_adapter("all")
adapter.truncate_message = lambda content, max_len: ["chunk1", "chunk2"]
await adapter.send("12345", "test content", reply_to=None)
channel.fetch_message.assert_not_called()
for call in channel.send.call_args_list:
assert call.kwargs.get("reference") is None
@pytest.mark.asyncio
async def test_single_chunk_respects_first_mode(self):
adapter, channel, ref_msg = _make_discord_adapter("first")
adapter.truncate_message = lambda content, max_len: ["single chunk"]
await adapter.send("12345", "test", reply_to="999")
calls = channel.send.call_args_list
assert len(calls) == 1
assert calls[0].kwargs.get("reference") is ref_msg
@pytest.mark.asyncio
async def test_single_chunk_off_mode(self):
adapter, channel, ref_msg = _make_discord_adapter("off")
adapter.truncate_message = lambda content, max_len: ["single chunk"]
await adapter.send("12345", "test", reply_to="999")
channel.fetch_message.assert_not_called()
calls = channel.send.call_args_list
assert len(calls) == 1
assert calls[0].kwargs.get("reference") is None
@pytest.mark.asyncio
async def test_invalid_mode_falls_back_to_first_behavior(self):
"""Invalid mode behaves like 'first' — only first chunk gets reference."""
adapter, channel, ref_msg = _make_discord_adapter("banana")
adapter.truncate_message = lambda content, max_len: ["chunk1", "chunk2"]
await adapter.send("12345", "test", reply_to="999")
calls = channel.send.call_args_list
assert len(calls) == 2
assert calls[0].kwargs.get("reference") is ref_msg
assert calls[1].kwargs.get("reference") is None
class TestConfigSerialization:
"""Tests for reply_to_mode serialization (shared with Telegram)."""
def test_to_dict_includes_reply_to_mode(self):
config = PlatformConfig(enabled=True, token="test", reply_to_mode="all")
result = config.to_dict()
assert result["reply_to_mode"] == "all"
def test_from_dict_loads_reply_to_mode(self):
data = {"enabled": True, "token": "***", "reply_to_mode": "off"}
config = PlatformConfig.from_dict(data)
assert config.reply_to_mode == "off"
def test_from_dict_defaults_to_first(self):
data = {"enabled": True, "token": "***"}
config = PlatformConfig.from_dict(data)
assert config.reply_to_mode == "first"
class TestEnvVarOverride:
"""Tests for DISCORD_REPLY_TO_MODE environment variable override."""
def _make_config(self):
config = GatewayConfig()
config.platforms[Platform.DISCORD] = PlatformConfig(enabled=True, token="test")
return config
def test_env_var_sets_off_mode(self):
config = self._make_config()
with patch.dict(os.environ, {"DISCORD_REPLY_TO_MODE": "off"}, clear=False):
_apply_env_overrides(config)
assert config.platforms[Platform.DISCORD].reply_to_mode == "off"
def test_env_var_sets_all_mode(self):
config = self._make_config()
with patch.dict(os.environ, {"DISCORD_REPLY_TO_MODE": "all"}, clear=False):
_apply_env_overrides(config)
assert config.platforms[Platform.DISCORD].reply_to_mode == "all"
def test_env_var_case_insensitive(self):
config = self._make_config()
with patch.dict(os.environ, {"DISCORD_REPLY_TO_MODE": "ALL"}, clear=False):
_apply_env_overrides(config)
assert config.platforms[Platform.DISCORD].reply_to_mode == "all"
def test_env_var_invalid_value_ignored(self):
config = self._make_config()
with patch.dict(os.environ, {"DISCORD_REPLY_TO_MODE": "banana"}, clear=False):
_apply_env_overrides(config)
assert config.platforms[Platform.DISCORD].reply_to_mode == "first"
def test_env_var_empty_value_ignored(self):
config = self._make_config()
with patch.dict(os.environ, {"DISCORD_REPLY_TO_MODE": ""}, clear=False):
_apply_env_overrides(config)
assert config.platforms[Platform.DISCORD].reply_to_mode == "first"
def test_env_var_creates_platform_config_if_missing(self):
"""DISCORD_REPLY_TO_MODE creates PlatformConfig even without DISCORD_BOT_TOKEN."""
config = GatewayConfig()
assert Platform.DISCORD not in config.platforms
with patch.dict(os.environ, {"DISCORD_REPLY_TO_MODE": "off"}, clear=False):
_apply_env_overrides(config)
assert Platform.DISCORD in config.platforms
assert config.platforms[Platform.DISCORD].reply_to_mode == "off"
@@ -1,432 +0,0 @@
"""Tests for Feishu interactive card approval buttons."""
import asyncio
import json
import os
import sys
from pathlib import Path
from types import SimpleNamespace
from unittest.mock import AsyncMock, MagicMock, Mock, patch
import pytest
# ---------------------------------------------------------------------------
# Ensure the repo root is importable
# ---------------------------------------------------------------------------
_repo = str(Path(__file__).resolve().parents[2])
if _repo not in sys.path:
sys.path.insert(0, _repo)
# ---------------------------------------------------------------------------
# Minimal Feishu mock so FeishuAdapter can be imported without lark-oapi
# ---------------------------------------------------------------------------
def _ensure_feishu_mocks():
"""Provide stubs for lark-oapi / aiohttp.web so the import succeeds."""
if "lark_oapi" not in sys.modules:
mod = MagicMock()
for name in (
"lark_oapi", "lark_oapi.api.im.v1",
"lark_oapi.event", "lark_oapi.event.callback_type",
):
sys.modules.setdefault(name, mod)
if "aiohttp" not in sys.modules:
aio = MagicMock()
sys.modules.setdefault("aiohttp", aio)
sys.modules.setdefault("aiohttp.web", aio.web)
_ensure_feishu_mocks()
from gateway.config import PlatformConfig
from gateway.platforms.feishu import FeishuAdapter
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _make_adapter() -> FeishuAdapter:
"""Create a FeishuAdapter with mocked internals."""
config = PlatformConfig(enabled=True)
adapter = FeishuAdapter(config)
adapter._client = MagicMock()
return adapter
def _make_card_action_data(
action_value: dict,
chat_id: str = "oc_12345",
open_id: str = "ou_user1",
token: str = "tok_abc",
) -> SimpleNamespace:
"""Create a mock Feishu card action callback data object."""
return SimpleNamespace(
event=SimpleNamespace(
token=token,
context=SimpleNamespace(open_chat_id=chat_id),
operator=SimpleNamespace(open_id=open_id),
action=SimpleNamespace(
tag="button",
value=action_value,
),
),
)
# ===========================================================================
# send_exec_approval — interactive card with buttons
# ===========================================================================
class TestFeishuExecApproval:
"""Test send_exec_approval sends an interactive card."""
@pytest.mark.asyncio
async def test_sends_interactive_card(self):
adapter = _make_adapter()
mock_response = SimpleNamespace(
success=lambda: True,
data=SimpleNamespace(message_id="msg_001"),
)
with patch.object(
adapter, "_feishu_send_with_retry", new_callable=AsyncMock,
return_value=mock_response,
) as mock_send:
result = await adapter.send_exec_approval(
chat_id="oc_12345",
command="rm -rf /important",
session_key="agent:main:feishu:group:oc_12345",
description="dangerous deletion",
)
assert result.success is True
assert result.message_id == "msg_001"
mock_send.assert_called_once()
kwargs = mock_send.call_args[1]
assert kwargs["chat_id"] == "oc_12345"
assert kwargs["msg_type"] == "interactive"
# Verify card payload contains the command and buttons
card = json.loads(kwargs["payload"])
assert card["header"]["template"] == "orange"
assert "rm -rf /important" in card["elements"][0]["content"]
assert "dangerous deletion" in card["elements"][0]["content"]
# Check buttons
actions = card["elements"][1]["actions"]
assert len(actions) == 4
action_names = [a["value"]["hermes_action"] for a in actions]
assert action_names == [
"approve_once", "approve_session", "approve_always", "deny"
]
@pytest.mark.asyncio
async def test_stores_approval_state(self):
adapter = _make_adapter()
mock_response = SimpleNamespace(
success=lambda: True,
data=SimpleNamespace(message_id="msg_002"),
)
with patch.object(
adapter, "_feishu_send_with_retry", new_callable=AsyncMock,
return_value=mock_response,
):
await adapter.send_exec_approval(
chat_id="oc_12345",
command="echo test",
session_key="my-session-key",
)
assert len(adapter._approval_state) == 1
approval_id = list(adapter._approval_state.keys())[0]
state = adapter._approval_state[approval_id]
assert state["session_key"] == "my-session-key"
assert state["message_id"] == "msg_002"
assert state["chat_id"] == "oc_12345"
@pytest.mark.asyncio
async def test_not_connected(self):
adapter = _make_adapter()
adapter._client = None
result = await adapter.send_exec_approval(
chat_id="oc_12345", command="ls", session_key="s"
)
assert result.success is False
@pytest.mark.asyncio
async def test_truncates_long_command(self):
adapter = _make_adapter()
mock_response = SimpleNamespace(
success=lambda: True,
data=SimpleNamespace(message_id="msg_003"),
)
with patch.object(
adapter, "_feishu_send_with_retry", new_callable=AsyncMock,
return_value=mock_response,
) as mock_send:
long_cmd = "x" * 5000
await adapter.send_exec_approval(
chat_id="oc_12345", command=long_cmd, session_key="s"
)
card = json.loads(mock_send.call_args[1]["payload"])
content = card["elements"][0]["content"]
assert "..." in content
assert len(content) < 5000
@pytest.mark.asyncio
async def test_multiple_approvals_get_unique_ids(self):
adapter = _make_adapter()
mock_response = SimpleNamespace(
success=lambda: True,
data=SimpleNamespace(message_id="msg_x"),
)
with patch.object(
adapter, "_feishu_send_with_retry", new_callable=AsyncMock,
return_value=mock_response,
):
await adapter.send_exec_approval(
chat_id="oc_1", command="cmd1", session_key="s1"
)
await adapter.send_exec_approval(
chat_id="oc_2", command="cmd2", session_key="s2"
)
assert len(adapter._approval_state) == 2
ids = list(adapter._approval_state.keys())
assert ids[0] != ids[1]
# ===========================================================================
# _handle_card_action_event — approval button clicks
# ===========================================================================
class TestFeishuApprovalCallback:
"""Test the approval intercept in _handle_card_action_event."""
@pytest.mark.asyncio
async def test_resolves_approval_on_click(self):
adapter = _make_adapter()
adapter._approval_state[1] = {
"session_key": "agent:main:feishu:group:oc_12345",
"message_id": "msg_001",
"chat_id": "oc_12345",
}
data = _make_card_action_data(
action_value={"hermes_action": "approve_once", "approval_id": 1},
)
with (
patch.object(
adapter, "_resolve_sender_profile", new_callable=AsyncMock,
return_value={"user_id": "ou_user1", "user_name": "Norbert", "user_id_alt": None},
),
patch.object(adapter, "_update_approval_card", new_callable=AsyncMock) as mock_update,
patch("tools.approval.resolve_gateway_approval", return_value=1) as mock_resolve,
):
await adapter._handle_card_action_event(data)
mock_resolve.assert_called_once_with("agent:main:feishu:group:oc_12345", "once")
mock_update.assert_called_once_with("msg_001", "Approved once", "Norbert", "once")
# State should be cleaned up
assert 1 not in adapter._approval_state
@pytest.mark.asyncio
async def test_deny_button(self):
adapter = _make_adapter()
adapter._approval_state[2] = {
"session_key": "some-session",
"message_id": "msg_002",
"chat_id": "oc_12345",
}
data = _make_card_action_data(
action_value={"hermes_action": "deny", "approval_id": 2},
token="tok_deny",
)
with (
patch.object(
adapter, "_resolve_sender_profile", new_callable=AsyncMock,
return_value={"user_id": "ou_alice", "user_name": "Alice", "user_id_alt": None},
),
patch.object(adapter, "_update_approval_card", new_callable=AsyncMock) as mock_update,
patch("tools.approval.resolve_gateway_approval", return_value=1) as mock_resolve,
):
await adapter._handle_card_action_event(data)
mock_resolve.assert_called_once_with("some-session", "deny")
mock_update.assert_called_once_with("msg_002", "Denied", "Alice", "deny")
@pytest.mark.asyncio
async def test_session_approval(self):
adapter = _make_adapter()
adapter._approval_state[3] = {
"session_key": "sess-3",
"message_id": "msg_003",
"chat_id": "oc_99",
}
data = _make_card_action_data(
action_value={"hermes_action": "approve_session", "approval_id": 3},
token="tok_ses",
)
with (
patch.object(
adapter, "_resolve_sender_profile", new_callable=AsyncMock,
return_value={"user_id": "ou_u", "user_name": "Bob", "user_id_alt": None},
),
patch.object(adapter, "_update_approval_card", new_callable=AsyncMock) as mock_update,
patch("tools.approval.resolve_gateway_approval", return_value=1) as mock_resolve,
):
await adapter._handle_card_action_event(data)
mock_resolve.assert_called_once_with("sess-3", "session")
mock_update.assert_called_once_with("msg_003", "Approved for session", "Bob", "session")
@pytest.mark.asyncio
async def test_always_approval(self):
adapter = _make_adapter()
adapter._approval_state[4] = {
"session_key": "sess-4",
"message_id": "msg_004",
"chat_id": "oc_55",
}
data = _make_card_action_data(
action_value={"hermes_action": "approve_always", "approval_id": 4},
token="tok_alw",
)
with (
patch.object(
adapter, "_resolve_sender_profile", new_callable=AsyncMock,
return_value={"user_id": "ou_u", "user_name": "Carol", "user_id_alt": None},
),
patch.object(adapter, "_update_approval_card", new_callable=AsyncMock),
patch("tools.approval.resolve_gateway_approval", return_value=1) as mock_resolve,
):
await adapter._handle_card_action_event(data)
mock_resolve.assert_called_once_with("sess-4", "always")
@pytest.mark.asyncio
async def test_already_resolved_drops_silently(self):
adapter = _make_adapter()
# No state for approval_id 99 — already resolved
data = _make_card_action_data(
action_value={"hermes_action": "approve_once", "approval_id": 99},
token="tok_gone",
)
with patch("tools.approval.resolve_gateway_approval") as mock_resolve:
await adapter._handle_card_action_event(data)
# Should NOT resolve — already handled
mock_resolve.assert_not_called()
@pytest.mark.asyncio
async def test_non_approval_actions_route_normally(self):
"""Non-approval card actions should still become synthetic commands."""
adapter = _make_adapter()
data = _make_card_action_data(
action_value={"custom_action": "something_else"},
token="tok_normal",
)
with (
patch.object(
adapter, "_resolve_sender_profile", new_callable=AsyncMock,
return_value={"user_id": "ou_u", "user_name": "Dave", "user_id_alt": None},
),
patch.object(adapter, "get_chat_info", new_callable=AsyncMock, return_value={"name": "Test Chat"}),
patch.object(adapter, "_handle_message_with_guards", new_callable=AsyncMock) as mock_handle,
patch("tools.approval.resolve_gateway_approval") as mock_resolve,
):
await adapter._handle_card_action_event(data)
# Should NOT resolve any approval
mock_resolve.assert_not_called()
# Should have routed as synthetic command
mock_handle.assert_called_once()
event = mock_handle.call_args[0][0]
assert "/card button" in event.text
# ===========================================================================
# _update_approval_card — card replacement after resolution
# ===========================================================================
class TestFeishuUpdateApprovalCard:
"""Test the card update after approval resolution."""
@pytest.mark.asyncio
async def test_updates_card_on_approve(self):
adapter = _make_adapter()
mock_update = AsyncMock()
adapter._client.im.v1.message.update = MagicMock()
with patch("asyncio.to_thread", new_callable=AsyncMock) as mock_thread:
await adapter._update_approval_card(
"msg_001", "Approved once", "Norbert", "once"
)
mock_thread.assert_called_once()
# Verify the update request was built
call_args = mock_thread.call_args
assert call_args[0][0] == adapter._client.im.v1.message.update
@pytest.mark.asyncio
async def test_updates_card_on_deny(self):
adapter = _make_adapter()
with patch("asyncio.to_thread", new_callable=AsyncMock) as mock_thread:
await adapter._update_approval_card(
"msg_002", "Denied", "Alice", "deny"
)
mock_thread.assert_called_once()
@pytest.mark.asyncio
async def test_skips_update_when_not_connected(self):
adapter = _make_adapter()
adapter._client = None
with patch("asyncio.to_thread", new_callable=AsyncMock) as mock_thread:
await adapter._update_approval_card(
"msg_001", "Approved", "Bob", "once"
)
mock_thread.assert_not_called()
@pytest.mark.asyncio
async def test_skips_update_when_no_message_id(self):
adapter = _make_adapter()
with patch("asyncio.to_thread", new_callable=AsyncMock) as mock_thread:
await adapter._update_approval_card(
"", "Approved", "Bob", "once"
)
mock_thread.assert_not_called()
@pytest.mark.asyncio
async def test_swallows_update_errors(self):
adapter = _make_adapter()
with patch("asyncio.to_thread", new_callable=AsyncMock, side_effect=Exception("API error")):
# Should not raise
await adapter._update_approval_card(
"msg_001", "Approved", "Bob", "once"
)

Some files were not shown because too many files have changed in this diff Show More