Compare commits

..

481 Commits

Author SHA1 Message Date
emozilla ab6abc2c13 fix: use per-thread persistent event loops in worker threads
Replace asyncio.run() with thread-local persistent event loops for
worker threads (e.g., delegate_task's ThreadPoolExecutor). asyncio.run()
creates and closes a fresh loop on every call, leaving cached
httpx/AsyncOpenAI clients bound to a dead loop — causing 'Event loop is
closed' errors during GC when parallel subagents clean up connections.

The fix mirrors the main thread's _get_tool_loop() pattern but uses
threading.local() so each worker thread gets its own long-lived loop,
avoiding both cross-thread contention and the create-destroy lifecycle.

Added 4 regression tests covering worker loop persistence, reuse,
per-thread isolation, and separation from the main thread's loop.
2026-03-20 15:41:06 -04:00
Teknium aafe86d81a fix: prevent 'event loop already running' when async tools run in parallel (#2207)
When the model returns multiple tool calls, run_agent.py executes them
concurrently in a ThreadPoolExecutor. Each thread called _run_async()
which used a shared persistent event loop (_get_tool_loop()). If two
async tools (like web_extract) ran in parallel, the second thread would
hit 'This event loop is already running' on the shared loop.

Fix: detect worker threads (not main thread) and use asyncio.run() with
a per-thread fresh loop instead of the shared persistent one. The shared
loop is still used for the main thread (CLI sequential path) to keep
cached async clients (httpx/AsyncOpenAI) alive.

Co-authored-by: Test <test@test.com>
2026-03-20 11:39:13 -07:00
Teknium 1aa7027be1 Merge pull request #2192 from NousResearch/hermes/hermes-3d7c23c9
fix(acp): preserve leading whitespace in streaming chunks
2026-03-20 09:52:32 -07:00
Teknium f961937097 Merge pull request #2181 from NousResearch/hermes/hermes-4a7e401e
fix: missing platforms in delivery maps + WhatsApp image/bridge improvements
2026-03-20 09:45:50 -07:00
Teknium 7a427d7b03 fix: persistent event loop in _run_async prevents 'Event loop is closed' (#2190)
Cherry-picked from PR #2146 by @crazywriter1. Fixes #2104.

asyncio.run() creates and closes a fresh event loop each call. Cached
httpx/AsyncOpenAI clients bound to the dead loop crash on GC with
'Event loop is closed'. This hit vision_analyze on first use in CLI.

Two-layer fix:
- model_tools._run_async(): replace asyncio.run() with persistent
  loop via _get_tool_loop() + run_until_complete()
- auxiliary_client._get_cached_client(): track which loop created
  each async client, discard stale entries if loop is closed

6 regression tests covering loop lifecycle, reuse, and full vision
dispatch chain.

Co-authored-by: Test <test@test.com>
2026-03-20 09:44:50 -07:00
Teknium 66a1942524 feat: add /queue command to queue prompts without interrupting (#2191)
Adds /queue <prompt> (alias /q) that queues a message for the next
turn while the agent is busy, without interrupting the current run.

- CLI: /queue <prompt> puts it in _pending_input for the next turn
- Gateway: /queue <prompt> creates a pending MessageEvent on the
  adapter, picked up after the current agent run finishes
- Enter still interrupts as usual (no behavior change)
- /queue with no prompt shows usage
- /queue when agent is idle tells user to just type normally

Co-authored-by: Test <test@test.com>
2026-03-20 09:44:27 -07:00
Dilee 1173adbe86 fix(acp): preserve leading whitespace in streaming chunks 2026-03-20 09:38:13 -07:00
Test a5beb6d8f0 fix(whatsapp): image downloading, bridge reuse, LID allowlist, Baileys 7.x compat
Salvaged from PR #2162 by @Zindar. Reply prefix changes excluded (already
on main via #1756 configurable prefix).

Bridge improvements (bridge.js):
- Download incoming images to ~/.hermes/image_cache/ via downloadMediaMessage
  so the agent can actually see user-sent photos
- Add getMessage callback required for Baileys 7.x E2EE session
  re-establishment (without it, some messages arrive as null)
- Build LID→phone reverse map for allowlist resolution (WhatsApp LID format)
- Add placeholder body for media without caption: [image received]
- Bind express to 127.0.0.1 instead of 0.0.0.0 for security
- Use 127.0.0.1 consistently throughout (more reliable than localhost)

Adapter improvements (whatsapp.py):
- Detect and reuse already-running bridge (only if status=connected)
- Handle local file paths from bridge-cached images in _build_message_event
- Don't kill external bridges on disconnect
- Use 127.0.0.1 throughout for consistency with bridge binding

Fix vs original PR: bridge reuse now checks status=connected, not just
HTTP 200. A disconnected bridge gets restarted instead of reused.

Co-authored-by: Zindar <zindar@users.noreply.github.com>
2026-03-20 09:37:48 -07:00
Teknium 0e3b7b6a39 docs: fill documentation gaps from recent PRs (#2183)
- slash-commands.md: add /approve, /deny (gateway-only), /statusbar
  (CLI-only); update Notes section with new platform-specific commands
- messaging/index.md: add Webhooks to architecture diagram, platform
  toolsets table, and Next Steps links; add /approve and /deny to
  Chat Commands table
- environment-variables.md: add HONCHO_BASE_URL for self-hosted
  Honcho instances
- configuration.md: add Context Pressure Warnings section (separate
  from iteration budget pressure); add base_url to OpenAI TTS config;
  add display.show_cost to Display Settings
- tts.md: add base_url to OpenAI TTS config example

Co-authored-by: Test <test@test.com>
2026-03-20 08:55:49 -07:00
Teknium 5e705bc31b Merge pull request #2182 from NousResearch/hermes/hermes-5d6932ba
fix: 6 bugs in model metadata, reasoning detection, and delegate tool
2026-03-20 08:53:01 -07:00
Test 55ce601502 fix: 6 bugs in model metadata, reasoning detection, and delegate tool
Cherry-picked from PR #2169 by @0xbyt4.

1. _strip_provider_prefix: skip Ollama model:tag names (qwen:0.5b)
2. Fuzzy match: remove reverse direction that made claude-sonnet-4
   resolve to 1M instead of 200K
3. _has_content_after_think_block: reuse _strip_think_blocks() to
   handle all tag variants (thinking, reasoning, REASONING_SCRATCHPAD)
4. models.dev lookup: elif→if so nous provider also queries models.dev
5. Disk cache fallback: use 5-min TTL instead of full hour so network
   is retried soon
6. Delegate build: wrap child construction in try/finally so
   _last_resolved_tool_names is always restored on exception
2026-03-20 08:52:37 -07:00
Test 8f6ecd5c64 fix: add missing platforms to cron/send_message delivery maps and tool schema
Matrix, Mattermost, Home Assistant, and DingTalk were missing from the
platform_map in both cron/scheduler.py and tools/send_message_tool.py,
causing delivery to those platforms to silently fail.

Also updates the cronjob tool schema description to list all available
delivery targets so the model knows its options.
2026-03-20 08:52:21 -07:00
Teknium a51a767407 Merge pull request #2167 from buntingszn/fix/cron-matrix-delivery
fix(cron): add Matrix to scheduler delivery platform_map
2026-03-20 08:50:14 -07:00
Teknium 2ea4dd30c6 fix(gateway): strip orphaned tool_results + let /reset bypass running agent (#2180)
Two fixes for Telegram/gateway-specific bugs:

1. Anthropic adapter: strip orphaned tool_result blocks (mirror of
   existing tool_use stripping). Context compression or session
   truncation can remove an assistant message containing a tool_use
   while leaving the subsequent tool_result intact. Anthropic rejects
   these with a 400: 'unexpected tool_use_id found in tool_result
   blocks'. The adapter now collects all tool_use IDs and filters out
   any tool_result blocks referencing IDs not in that set.

2. Gateway: /reset and /new now bypass the running-agent guard (like
   /status already does). Previously, sending /reset while an agent
   was running caused the raw text to be queued and later fed back as
   a user message with the same broken history — replaying the
   corrupted session instead of resetting it. Now the running agent is
   interrupted, pending messages are cleared, and the reset command
   dispatches immediately.

Tests updated: existing tests now include proper tool_use→tool_result
pairs; two new tests cover orphaned tool_result stripping.

Co-authored-by: Test <test@test.com>
2026-03-20 08:39:49 -07:00
Teknium 80e578d3e3 docs: add context length detection references to FAQ and quickstart (#2179)
- quickstart.md: mention context length prompt for custom endpoints,
  link to configuration docs, add Ollama to provider table
- faq.md: rewrite local models section with hermes model flow and
  context length prompt example, add Ollama num_ctx tip, expand
  context-length-exceeded troubleshooting with detection override
  options and config.yaml examples

Co-authored-by: Test <test@test.com>
2026-03-20 08:38:44 -07:00
Teknium c52353cf8a feat: context pressure warnings for CLI and gateway (#2159)
* feat: context pressure warnings for CLI and gateway

User-facing notifications as context approaches the compaction threshold.
Warnings fire at 60% and 85% of the way to compaction — relative to
the configured compression threshold, not the raw context window.

CLI: Formatted line with a progress bar showing distance to compaction.
Cyan at 60% (approaching), bold yellow at 85% (imminent).

  ◐ context ▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱▱▱▱▱▱ 60% to compaction  100k threshold (50%) · approaching compaction
  ⚠ context ▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱ 85% to compaction  100k threshold (50%) · compaction imminent

Gateway: Plain-text notification sent to the user's chat via the new
status_callback mechanism (asyncio.run_coroutine_threadsafe bridge,
same pattern as step_callback).

Does NOT inject into the message stream. The LLM never sees these
warnings. Flags reset after each compaction cycle.

Files changed:
- agent/display.py — format_context_pressure(), format_context_pressure_gateway()
- run_agent.py — status_callback param, _context_50/70_warned flags,
  _emit_context_pressure(), flag reset in _compress_context()
- gateway/run.py — _status_callback_sync bridge, wired to AIAgent
- tests/test_context_pressure.py — 23 tests

* Merge remote-tracking branch 'origin/main' into hermes/hermes-7ea545bf

---------

Co-authored-by: Test <test@test.com>
2026-03-20 08:37:36 -07:00
Teknium d76ebf0ec3 feat(gateway): webhook platform adapter for external event triggers (#2166)
feat(gateway): webhook platform adapter for external event triggers
2026-03-20 08:27:58 -07:00
bunting szn 4be5070427 fix(cron): add Matrix to scheduler delivery platform_map
Matrix is a supported gateway platform but was missing from the
cron scheduler's delivery platform_map, causing cron job results
to silently fail delivery when targeting Matrix rooms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 08:33:46 -05:00
Test e140c02d51 feat(gateway): add webhook platform adapter for external event triggers
Add a generic webhook platform adapter that receives HTTP POSTs from
external services (GitHub, GitLab, JIRA, Stripe, etc.), validates HMAC
signatures, transforms payloads into agent prompts, and routes responses
back to the source or to another platform.

Features:
- Configurable routes with per-route HMAC secrets, event filters,
  prompt templates with dot-notation payload access, skill loading,
  and pluggable delivery (github_comment, telegram, discord, log)
- HMAC signature validation (GitHub SHA-256, GitLab token, generic)
- Rate limiting (30 req/min per route, configurable)
- Idempotency cache (1hr TTL, prevents duplicate runs on retries)
- Body size limits (1MB default, checked before reading payload)
- Setup wizard integration with security warnings and docs links
- 33 tests (29 unit + 4 integration), all passing

Security:
- HMAC secret required per route (startup validation)
- Setup wizard warns about internet exposure for webhook/SMS platforms
- Sandboxing (Docker/VM) recommended in docs for public-facing deployments

Files changed:
- gateway/config.py — Platform.WEBHOOK enum + env var overrides
- gateway/platforms/webhook.py — WebhookAdapter (~420 lines)
- gateway/run.py — factory wiring + auth bypass for webhook events
- hermes_cli/config.py — WEBHOOK_* env var definitions
- hermes_cli/setup.py — webhook section in setup_gateway()
- tests/gateway/test_webhook_adapter.py — 29 unit tests
- tests/gateway/test_webhook_integration.py — 4 integration tests
- website/docs/user-guide/messaging/webhooks.md — full user docs
- website/docs/reference/environment-variables.md — WEBHOOK_* vars
- website/sidebars.ts — nav entry
2026-03-20 06:33:36 -07:00
Teknium 88643a1ba9 feat: overhaul context length detection with models.dev and provider-aware resolution (#2158)
Replace the fragile hardcoded context length system with a multi-source
resolution chain that correctly identifies context windows per provider.

Key changes:

- New agent/models_dev.py: Fetches and caches the models.dev registry
  (3800+ models across 100+ providers with per-provider context windows).
  In-memory cache (1hr TTL) + disk cache for cold starts.

- Rewritten get_model_context_length() resolution chain:
  0. Config override (model.context_length)
  1. Custom providers per-model context_length
  2. Persistent disk cache
  3. Endpoint /models (local servers)
  4. Anthropic /v1/models API (max_input_tokens, API-key only)
  5. OpenRouter live API (existing, unchanged)
  6. Nous suffix-match via OpenRouter (dot/dash normalization)
  7. models.dev registry lookup (provider-aware)
  8. Thin hardcoded defaults (broad family patterns)
  9. 128K fallback (was 2M)

- Provider-aware context: same model now correctly resolves to different
  context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic,
  128K on GitHub Copilot). Provider name flows through ContextCompressor.

- DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns.
  models.dev replaces the per-model hardcoding.

- CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K]
  to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M.

- hermes model: prompts for context_length when configuring custom
  endpoints. Supports shorthand (32k, 128K). Saved to custom_providers
  per-model config.

- custom_providers schema extended with optional models dict for
  per-model context_length (backward compatible).

- Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against
  OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash
  normalization. Handles all 15 current Nous models.

- Anthropic direct: queries /v1/models for max_input_tokens. Only works
  with regular API keys (sk-ant-api*), not OAuth tokens. Falls through
  to models.dev for OAuth users.

Tests: 5574 passed (18 new tests for models_dev + updated probe tiers)
Docs: Updated configuration.md context length section, AGENTS.md

Co-authored-by: Test <test@test.com>
2026-03-20 06:04:33 -07:00
Teknium b7b585656b Merge pull request #2110 from NousResearch/hermes/hermes-5d6932ba
fix: session reset + custom provider model switch + honcho base_url
2026-03-20 06:01:44 -07:00
Test 4494c0b033 fix(cron): remove send_message/clarify from cron agents + autonomous prompt
Cron jobs run unattended with no user present. Previously the agent had
send_message and clarify tools available, which makes no sense — the
final response is auto-delivered, and there's nobody to ask questions to.

Changes:
- Disable messaging and clarify toolsets for cron agent sessions
- Update cron platform hint to emphasize autonomous execution: no user
  present, cannot ask questions, must execute fully and make decisions
- Update cronjob tool schema description to match (remove stale
  send_message guidance)
2026-03-20 05:18:05 -07:00
Teknium aa6416399e Merge pull request #2161 from NousResearch/hermes/hermes-6757a563
fix(display): show spinners and tool progress during streaming mode
2026-03-20 05:17:55 -07:00
Test b313751acf fix(display): show spinners and tool progress during streaming mode
When streaming was enabled, two visual feedback mechanisms were
completely suppressed:

1. The thinking spinner (TUI toolbar) was skipped because the entire
   spinner block was gated on 'not self._has_stream_consumers()'.
   Now the thinking_callback fires in streaming mode too — the
   raw KawaiiSpinner is still skipped (would conflict with streamed
   tokens) but the TUI toolbar widget works fine alongside streaming.

2. Tool progress lines (the ┊ feed) were invisible because _vprint
   was blanket-suppressed when stream consumers existed. But during
   tool execution, no tokens are actively streaming, so printing is
   safe. Added an _executing_tools flag that _vprint respects to
   allow output during tool execution even with stream consumers
   registered.
2026-03-20 05:14:42 -07:00
Test b1d05dfe8b fix(openai): route api.openai.com to Responses API for GPT-5.x
Based on PR #1859 by @magi-morph (too stale to cherry-pick, reimplemented).

GPT-5.x models reject tool calls + reasoning_effort on
/v1/chat/completions with a 400 error directing to /v1/responses.
This auto-detects api.openai.com in the base URL and switches to
codex_responses mode in three places:

- AIAgent.__init__: upgrades chat_completions → codex_responses
- _try_activate_fallback(): same routing for fallback model
- runtime_provider.py: _detect_api_mode_for_url() for both custom
  provider and openrouter runtime resolution paths

Also extracts _is_direct_openai_url() helper to replace the inline
check in _max_tokens_param().
2026-03-20 05:09:41 -07:00
Teknium f8899af113 Merge pull request #2156 from NousResearch/hermes/hermes-6757a563
fix(signal): handle Note to Self messages with echo-back protection
2026-03-20 04:56:57 -07:00
Test cf29cba084 docs(signal): add Note to Self section to Signal setup guide 2026-03-20 04:48:13 -07:00
Test ec9b868aea fix(signal): handle Note to Self messages with echo-back protection
Support Signal 'Note to Self' messages in single-number setups where
signal-cli is linked as a secondary device on the user's own account.

syncMessage.sentMessage envelopes addressed to the bot's own account
are now promoted to dataMessage for normal processing, while other
sync events (read receipts, typing, etc.) are still filtered.

Echo-back prevention mirrors the WhatsApp bridge pattern:
- Track timestamps of recently sent messages (bounded set of 50)
- When a Note to Self sync arrives, check if its timestamp matches
  a recent outbound — skip if so (agent echo-back)
- Only process sync messages that are genuinely user-initiated

Based on PR #2115 by @Stonelinks with added echo-back protection.
2026-03-20 04:46:32 -07:00
Teknium 3ec6c71e43 fix: update claude 4.6 context length from 200K to 1M (#2155)
* fix: preserve Ollama model:tag colons in context length detection

The colon-split logic in get_model_context_length() and
_query_local_context_length() assumed any colon meant provider:model
format (e.g. "local:my-model"). But Ollama uses model:tag format
(e.g. "qwen3.5:27b"), so the split turned "qwen3.5:27b" into just
"27b" — which matches nothing, causing a fallback to the 2M token
probe tier.

Now only recognised provider prefixes (local, openrouter, anthropic,
etc.) are stripped. Ollama model:tag names pass through intact.

* fix: update claude-opus-4-6 and claude-sonnet-4-6 context length from 200K to 1M

Both models support 1,000,000 token context windows. The hardcoded defaults
were set before Anthropic expanded the context for the 4.6 generation.
Verified via models.dev and OpenRouter API data.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Co-authored-by: Test <test@test.com>
2026-03-20 04:38:59 -07:00
Test 4ad0083118 fix(honcho): read HONCHO_BASE_URL for local/self-hosted instances
Cherry-picked from PR #2120 by @unclebumpy.

- from_env() now reads HONCHO_BASE_URL and enables Honcho when base_url
  is set, even without an API key
- from_global_config() reads baseUrl from config root with
  HONCHO_BASE_URL env var as fallback
- get_honcho_client() guard relaxed to allow base_url without api_key
  for no-auth local instances
- Added HONCHO_BASE_URL to OPTIONAL_ENV_VARS registry

Result: Setting HONCHO_BASE_URL=http://localhost:8000 in ~/.hermes/.env
now correctly routes the Honcho client to a local instance.
2026-03-20 04:36:06 -07:00
Test 1055d4356a fix: skip model auto-detection for custom/local providers
When the user is on a custom provider (provider=custom, localhost, or
127.0.0.1 endpoint), /model <name> no longer tries to auto-detect a
provider switch. The model name changes on the current endpoint as-is.

To switch away from a custom endpoint, users must use explicit
provider:model syntax (e.g. /model openai-codex:gpt-5.2-codex).
A helpful tip is printed when changing models on a custom endpoint.

This prevents the confusing case where someone on LM Studio types
/model gpt-5.2-codex, the auto-detection tries to switch providers,
fails or partially succeeds, and requests still go to the old endpoint.

Also fixes the missing prompt_toolkit.auto_suggest mock stub in
test_cli_init.py (same issue already fixed in test_cli_new_session.py).
2026-03-20 04:35:17 -07:00
Test 5822711ae6 fix: complete session reset — missing compressor counters + test
Follow-up to PR #2101 (InB4DevOps). Adds three missing context compressor
resets in reset_session_state():
- compression_count (displayed in status bar)
- last_total_tokens
- _context_probed (stale context-error flag)

Also fixes the test_cli_new_session.py prompt_toolkit mock (missing
auto_suggest stub) and adds a regression test for #2099 that verifies
all token counters and compressor state are zeroed on /new.
2026-03-20 04:35:17 -07:00
Teknium b19f5133c3 Merge pull request #2118 from NousResearch/hermes/hermes-e83093f0
feat: show reasoning/thinking blocks when show_reasoning is enabled
2026-03-20 04:35:12 -07:00
Teknium 471ea81a7d fix: preserve Ollama model:tag colons in context length detection (#2149)
The colon-split logic in get_model_context_length() and
_query_local_context_length() assumed any colon meant provider:model
format (e.g. "local:my-model"). But Ollama uses model:tag format
(e.g. "qwen3.5:27b"), so the split turned "qwen3.5:27b" into just
"27b" — which matches nothing, causing a fallback to the 2M token
probe tier.

Now only recognised provider prefixes (local, openrouter, anthropic,
etc.) are stripped. Ollama model:tag names pass through intact.

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-20 03:19:31 -07:00
Test b1832faaae feat: show reasoning/thinking blocks when show_reasoning is enabled
- Add <thinking> tag to streaming filter's tag list
- When show_reasoning is on, route XML reasoning content to the
  reasoning display box instead of silently discarding it
- Expand _strip_think_blocks to handle all tag variants:
  <think>, <thinking>, <THINKING>, <reasoning>, <REASONING_SCRATCHPAD>
2026-03-19 19:44:31 -07:00
Teknium 3a9a1bbb84 Merge pull request #2091 from dusterbloom/fix/lmstudio-context-length-detection
feat: query local servers for actual context window size
2026-03-19 19:08:21 -07:00
Teknium d8081790f3 Merge pull request #2102 from NousResearch/hermes/hermes-6757a563
fix(tools,cli): normalise MCP schemas + expand session list columns
2026-03-19 19:06:56 -07:00
Teknium 493bf8db7e Merge pull request #2083 from ygd58/fix/delegate-save-parent-tool-names-before-child-build
fix(delegate): save parent tool names before child construction mutates global
2026-03-19 18:47:29 -07:00
Teknium d9eba2a44f feat: optional FastMCP skill + fix: gateway session race guard (#2113)
feat: optional FastMCP skill + fix: gateway session race guard
2026-03-19 18:26:49 -07:00
Test fc061c2fee fix: harden sentinel guard for /stop during setup and shutdown
- /stop during sentinel returns helpful message instead of queuing
- Shutdown loop skips sentinel entries instead of catching AttributeError
- _handle_stop_command guards against sentinel (defensive)
- Added tests for both edge cases (7 total race guard tests)
2026-03-19 18:26:09 -07:00
Gutslabs aaa96713d4 fix(gateway): prevent concurrent agent runs for the same session
Place a sentinel in _running_agents immediately after the "already
running" guard check passes — before any await.  Without this, the
numerous await points between the guard (line 1324) and agent
registration (track_agent at line 4790) create a window where a
second message for the same session can bypass the guard and start
a duplicate agent, corrupting the transcript.

The await gap includes: hook emissions, vision enrichment (external
API call), audio transcription (external API call), session hygiene
compression, and the run_in_executor call itself.  For messages with
media attachments the window can be several seconds wide.

The sentinel is wrapped in try/finally so it is always cleaned up —
even if the handler raises or takes an early-return path.  When the
real AIAgent is created, track_agent() overwrites the sentinel with
the actual instance (preserving interrupt support).

Also handles the edge case where a message arrives while the sentinel
is set but no real agent exists yet: the message is queued via the
adapter's pending-message mechanism instead of attempting to call
interrupt() on the sentinel object.
2026-03-19 18:23:24 -07:00
kshitijk4poor 02954c1a10 feat: add optional FastMCP skill for building MCP servers
Add FastMCP skill to optional-skills/mcp/fastmcp/ with:
- SKILL.md with workflow, design patterns, quality checklist
- Templates: API wrapper, database server, file processor
- Scaffold CLI script for template instantiation
- FastMCP CLI reference documentation

Moved to optional-skills (requires pip install fastmcp).

Based on work by kshitijk4poor in PR #2096.
Closes #343
2026-03-19 18:23:16 -07:00
Teknium 4355f30422 Merge pull request #2114 from NousResearch/hermes/hermes-14b05543
docs: align venv path to match installer (venv/ not .venv/)
2026-03-19 18:22:03 -07:00
Test 2f07df3177 fix(cli): expand session list columns for full ID visibility
Show complete session IDs in 'hermes sessions list' instead of
truncating to 20 characters. Widens title column from 20→30 chars
and adjusts header widths accordingly.

Fixes #2068. Based on PR #2085 by @Nebula037 with a correction
to preserve the no-titles layout (the original PR accidentally
replaced the Preview/Src header with a duplicate Title/Preview header).
2026-03-19 18:17:28 -07:00
Test 672e9752a0 docs: align venv path to match installer (venv/ not .venv/)
The install script creates venv/ but several docs referenced .venv/,
causing agents to fail with 'No such file or directory' when following
AGENTS.md instructions.

Fixes #2066
2026-03-19 18:16:26 -07:00
Teknium df0f684c34 Merge pull request #2098 from JiwaniZakir/minisweagent_path-missing-wheel-2075
Clean fix — adds minisweagent_path to py-modules so it ships in the wheel. Thanks @JiwaniZakir!
2026-03-19 17:47:25 -07:00
Teknium 21afa134f0 Merge pull request #2101 from InB4DevOps/main
fix: Reset token counters on new session for accurate usage display
2026-03-19 17:47:11 -07:00
Teknium 6bcec1ac25 fix: resolve MiniMax 401 auth error by defaulting to anthropic_messages (#2103)
MiniMax's default base URL was /v1 which caused runtime_provider to
default to chat_completions mode (OpenAI-style Authorization: Bearer
header). MiniMax rejects this with a 401 because they require the
Anthropic-style x-api-key header.

Changes:
- auth.py: Change default inference_base_url for minimax and minimax-cn
  from /v1 to /anthropic
- runtime_provider.py: Auto-correct stale /v1 URLs from existing .env
  files to /anthropic, and always default minimax/minimax-cn providers
  to anthropic_messages mode
- Update tests to reflect new defaults, add tests for stale URL
  auto-correction and explicit api_mode override

Based on PR #2100 by @devorun. Fixes #2094.

Co-authored-by: Test <test@test.com>
2026-03-19 17:47:05 -07:00
InB4DevOps fe331ed9bd fix: Reset token counters on new session for accurate usage display (#2099) 2026-03-20 01:21:25 +01:00
Peppi Littera 746abf5e28 fix: use reasoning content as response when model only produces think blocks
Local models (especially Qwen 3.5) sometimes wrap their entire response
inside <think> tags, leaving actual content empty. Previously this caused
3 retries and then an error, wasting tokens and failing the request.

Now when retries are exhausted and reasoning_text contains the response,
it is used as final_response instead of returning an error. The user
sees the actual answer instead of "Model generated only think blocks."
2026-03-20 00:26:36 +01:00
hermes 4d2c93a04f fix: normalize MCP object schemas without properties 2026-03-19 16:23:45 -07:00
Zakir Jiwani 3959e3cadb fix: add minisweagent_path to py-modules in pyproject.toml
Closes #2075
2026-03-19 22:20:44 +00:00
Peppi Littera ec5fdb8b92 feat: query local servers for actual context window size
Custom endpoints (LM Studio, Ollama, vLLM, llama.cpp) silently fall
back to 2M tokens when /v1/models doesn't include context_length.

Adds _query_local_context_length() which queries server-specific APIs:
- LM Studio: /api/v1/models (max_context_length + loaded instances)
- Ollama: /api/show (model_info + num_ctx parameters)
- llama.cpp: /props (n_ctx from default_generation_settings)
- vLLM: /v1/models/{model} (max_model_len)

Prefers loaded instance context over max (e.g., 122K loaded vs 1M max).
Results are cached via save_context_length() to avoid repeated queries.

Also fixes detect_local_server_type() misidentifying LM Studio as
Ollama (LM Studio returns 200 for /api/tags with an error body).
2026-03-19 21:32:04 +01:00
Peppi Littera c030ac1d85 fix: prefer loaded instance context size over max for LM Studio
When LM Studio has a model loaded with a custom context size (e.g.,
122K), prefer that over the model's max_context_length (e.g., 1M).
This makes the TUI status bar show the actual runtime context window.
2026-03-19 21:24:53 +01:00
Peppi Littera d223f7388d feat: query local server for actual context window size
Instead of defaulting to 2M for unknown local models, query the server
API for the real context length. Supports Ollama (/api/show), vLLM
(max_model_len), and LM Studio (/v1/models). Results are cached to
avoid repeated queries.
2026-03-19 21:24:05 +01:00
ygd58 816d1344ee fix(delegate): save parent tool names before child construction mutates global 2026-03-19 20:27:26 +01:00
Teknium 4c0c7f4c6e fix: /model command — bare provider names, custom endpoint display
Two issues with /model preventing proper provider switching:

1. Bare provider names not detected: typing '/model nous' treated 'nous'
   as a model name instead of triggering a provider switch. Fixed by adding
   step 0 in detect_provider_for_model() that checks if the input matches
   a known provider name/alias (excluding 'custom'/'openrouter' which need
   explicit model names) and returns that provider's default model.

2. Custom endpoint details hidden: /model (no args) showed '[custom]' with
   just a usage hint but no endpoint URL or model name. Now displays the
   configured base_url for custom providers in both CLI and gateway.

Note: config base_url and OPENAI_BASE_URL are intentionally NOT cleared on
provider switch — dedicated provider paths (nous, anthropic, codex) have
their own credential resolution that ignores these, and clearing them would
destroy the user's custom endpoint config, preventing switching back.

Co-authored-by: Test <test@test.com>
2026-03-19 12:06:48 -07:00
StefanIsMe 04b6ecadc4 feat(cli): Tab now accepts auto-suggestions (ghost text)
Previously, Tab only handled dropdown completions. Users seeing gray
ghost text from history-based suggestions had no way to accept them
with Tab - they had to use Right arrow or Ctrl+E.

Now Tab follows priority:
1. Completion menu open → accept selected completion
2. Ghost text suggestion available → accept auto-suggestion
3. Otherwise → start completion menu

This matches user intuition that Tab should 'complete what I see.'
2026-03-19 10:40:37 -07:00
Teknium e84d952dc0 fix(codex): handle reasoning-only responses and replay path (#2070)
* fix(codex): treat reasoning-only responses as incomplete, not stop

When a Codex Responses API response contains only reasoning items
(encrypted thinking state) with no message text or tool calls, the
_normalize_codex_response method was setting finish_reason='stop'.
This sent the response into the empty-content retry loop, which
burned 3 retries and then failed — exactly the pattern Nester
reported in Discord.

Two fixes:
1. _normalize_codex_response: reasoning-only responses (reasoning_items_raw
   non-empty but no final_text) now get finish_reason='incomplete', routing
   them to the Codex continuation path instead of the retry loop.
2. Incomplete handling: also checks for codex_reasoning_items when deciding
   whether to preserve an interim message, so encrypted reasoning state is
   not silently dropped when there is no visible reasoning text.

Adds 4 regression tests covering:
- Unit: reasoning-only → incomplete, reasoning+content → stop
- E2E: reasoning-only → continuation → final answer succeeds
- E2E: encrypted reasoning items preserved in interim messages

* fix(codex): ensure reasoning items have required following item in API input

Follow-up to the reasoning-only response fix. Three additional issues
found by tracing the full replay path:

1. _chat_messages_to_responses_input: when a reasoning-only interim
   message was converted to Responses API input, the reasoning items
   were emitted as the last items with no following item. The Responses
   API requires a following item after each reasoning item (otherwise:
   'missing_following_item' error, as seen in OpenHands #11406). Now
   emits an empty assistant message as the required following item when
   content is empty but reasoning items were added.

2. Duplicate detection: two consecutive reasoning-only incomplete
   messages with identical empty content/reasoning but different
   encrypted codex_reasoning_items were incorrectly treated as
   duplicates, silently dropping the second response's reasoning state.
   Now includes codex_reasoning_items in the duplicate comparison.

3. Added tests for both the API input conversion path and the duplicate
   detection edge case.

Research context: verified against OpenCode (uses Vercel AI SDK, no
retry loop so avoids the issue), Clawdbot (drops orphaned reasoning
blocks entirely), and OpenHands (hit the missing_following_item error).
Our approach preserves reasoning continuity while satisfying the API
constraint.

---------

Co-authored-by: Test <test@test.com>
2026-03-19 10:34:44 -07:00
Teknium 388130a122 fix: persist ACP sessions to SessionDB so they survive process restarts
* fix: persist ACP sessions to disk so they survive process restarts

The ACP adapter stored sessions entirely in-memory. When the editor
restarted the ACP subprocess (idle timeout, crash, system sleep/wake,
editor restart), all sessions were lost. The editor's load_session /
resume_session calls would fail to find the session, forcing a new
empty session and losing all conversation history.

Changes:
- SessionManager now persists each session as a JSON file under
  ~/.hermes/acp_sessions/<session_id>.json
- get_session() transparently restores from disk when not in memory
- update_cwd(), fork_session(), list_sessions() all check disk
- server.py calls save_session() after prompt completion, /reset,
  /compact, and model switches
- cleanup() and remove_session() delete disk files too
- Sessions have a 7-day TTL; expired sessions are pruned on startup
- Atomic writes via tempfile + os.replace to prevent corruption
- 11 new tests covering persistence, disk restoration, and TTL expiry

* refactor: use SessionDB instead of JSON files for ACP session persistence

Replace the standalone JSON file persistence layer with SessionDB
(~/.hermes/state.db) integration. ACP sessions now:
- Share the same DB as CLI and gateway sessions
- Are searchable via session_search (FTS5)
- Get token tracking, cost tracking, and session titles for free
- Follow existing session pruning policies

Key changes:
- _get_db() lazily creates a SessionDB, resolving HERMES_HOME
  dynamically (not at import time) for test compatibility
- _persist() creates session record + replaces messages in DB
- _restore() loads from DB with source='acp' filter
- cwd stored in model_config JSON field (no schema migration)
- Model values coerced to str to handle mock agents in tests
- Removed: json files, sessions_dir, ttl_days, _expire logic
- Tests updated: DB-backed persistence, FTS search, tool_call
  round-tripping, source filtering

---------

Co-authored-by: Test <test@test.com>
2026-03-19 10:30:50 -07:00
cmcleay bb59057d5d fix: normalize live Chrome CDP endpoints for browser tools 2026-03-19 10:17:03 -07:00
Teknium 36a4481152 fix: prevent unavailable tool names from leaking into model schemas
* fix: prevent unavailable tool names from leaking into model schemas

When web_search/web_extract fail check_fn (no API key configured), their
names were still leaking into tool descriptions via two paths:

1. execute_code schema: sandbox_enabled was computed from tools_to_include
   (pre-filter) instead of the actual available tools (post-filter), so
   the execute_code description listed web_search/web_extract as available
   sandbox imports even when they weren't.

2. browser_navigate schema: hardcoded description said 'prefer web_search
   or web_extract' regardless of whether those tools existed.

The model saw these references, assumed the tools existed, and tried
calling them directly — triggering 'Unknown tool' errors.

Fix: compute available_tool_names from the filtered result set and use
that for both execute_code sandbox listing and browser_navigate description
patching.

* docs: add pitfall about cross-tool references in schema descriptions

---------

Co-authored-by: Test <test@test.com>
2026-03-19 10:08:14 -07:00
Test efa753678c Merge PR #2064: feat(tools): add base_url support to OpenAI TTS provider
Authored by Hanai. Allows overriding the OpenAI TTS endpoint via
tts.openai.base_url in config.yaml for self-hosted or OpenAI-compatible
TTS services. Falls back to api.openai.com when not set.
2026-03-19 10:07:58 -07:00
Test 7f3a567259 Merge PR #2063: fix(daytona): migrate sandbox lookup from find_one to get/list
Authored by Lovre Pešut (rovle). Migrates from deprecated find_one(labels=...)
to get(sandbox_name) with deterministic naming (hermes-{task_id}), plus legacy
fallback via list(labels=...) for pre-migration sandboxes.
2026-03-19 10:01:40 -07:00
Yannick Stephan defbe0f9e9 fix(cron): warn and skip missing skills instead of crashing job
When a cron job references a skill that is no longer installed,
_build_job_prompt() now logs a warning and injects a user-visible notice
into the prompt instead of raising RuntimeError. The job continues with
any remaining valid skills and the user prompt.

Adds 4 regression tests for missing skill handling.
2026-03-19 09:56:16 -07:00
rovle 18862145e4 fix(daytona): migrate sandbox lookup from find_one to get/list
find_one is being deprecated. Primary lookup now uses get() with a
deterministic sandbox name (hermes-{task_id}). A legacy fallback via
list(labels=...) ensures sandboxes created before this migration are
still resumable.
2026-03-19 17:54:46 +01:00
Test 35558dadf4 Merge PR #2061: fix(security): eliminate SQL string formatting in execute() calls
Authored by dusterbloom. Closes #1911.

Pre-computes SQL query strings at class definition time in insights.py,
adds identifier quoting for ALTER TABLE DDL in hermes_state.py, and adds
4 regression tests verifying query construction safety.
2026-03-19 09:52:00 -07:00
Test ae8059ca24 fix(delegate): move _saved_tool_names assignment to correct scope
The merge at e7844e9c re-introduced a line in _build_child_agent() that
references _saved_tool_names — a variable only defined in _run_single_child().
This caused NameError on every delegate_task call, completely breaking
subagent delegation.

Moves the child._delegate_saved_tool_names assignment to _run_single_child()
where _saved_tool_names is actually defined, keeping the save/restore in the
same scope as the try/finally block.

Adds two regression tests from PR #2038 (YanSte).
Also fixes the same issue reported in PR #2048 (Gutslabs).

Co-authored-by: Yannick Stephan <yannick.stephan@gmail.com>
Co-authored-by: Guts <gutslabs@users.noreply.github.com>
2026-03-19 09:26:05 -07:00
Han 116984feb7 feat(tools): add base_url support to OpenAI TTS provider
Allow users to configure a custom base_url for the OpenAI TTS provider
in ~/.hermes/config.yaml under tts.openai.base_url. Defaults to the
official OpenAI endpoint. Enables use of self-hosted or OpenAI-compatible
TTS services (e.g. http://localhost:8000/v1).

Also adds a TTS configuration example block to cli-config.yaml.example.
2026-03-19 23:55:13 +08:00
Peppi Littera 219af75704 fix(security): eliminate SQL string formatting in execute() calls
Closes #1911

- insights.py: Pre-compute SELECT queries as class constants instead of
  f-string interpolation at runtime. _SESSION_COLS is now evaluated once
  at class definition time.
- hermes_state.py: Add identifier quoting and whitelist validation for
  ALTER TABLE column names in schema migrations.
- Add 4 tests verifying no injection vectors in SQL query construction.
2026-03-19 15:16:35 +01:00
Teknium d76fa7fc37 fix: detect context length for custom model endpoints via fuzzy matching + config override (#2051)
* fix: detect context length for custom model endpoints via fuzzy matching + config override

Custom model endpoints (non-OpenRouter, non-known-provider) were silently
falling back to 2M tokens when the model name didn't exactly match what the
endpoint's /v1/models reported. This happened because:

1. Endpoint metadata lookup used exact match only — model name mismatches
   (e.g. 'qwen3.5:9b' vs 'Qwen3.5-9B-Q4_K_M.gguf') caused a miss
2. Single-model servers (common for local inference) required exact name
   match even though only one model was loaded
3. No user escape hatch to manually set context length

Changes:
- Add fuzzy matching for endpoint model metadata: single-model servers
  use the only available model regardless of name; multi-model servers
  try substring matching in both directions
- Add model.context_length config override (highest priority) so users
  can explicitly set their model's context length in config.yaml
- Log an informative message when falling back to 2M probe, telling
  users about the config override option
- Thread config_context_length through ContextCompressor and AIAgent init

Tests: 6 new tests covering fuzzy match, single-model fallback, config
override (including zero/None edge cases).

* fix: auto-detect local model name and context length for local servers

Cherry-picked from PR #2043 by sudoingX.

- Auto-detect model name from local server's /v1/models when only one
  model is loaded (no manual model name config needed)
- Add n_ctx_train and n_ctx to context length detection keys for llama.cpp
- Query llama.cpp /props endpoint for actual allocated context (not just
  training context from GGUF metadata)
- Strip .gguf suffix from display in banner and status bar
- _auto_detect_local_model() in runtime_provider.py for CLI init

Co-authored-by: sudo <sudoingx@users.noreply.github.com>

* fix: revert accidental summary_target_tokens change + add docs for context_length config

- Revert summary_target_tokens from 2500 back to 500 (accidental change
  during patching)
- Add 'Context Length Detection' section to Custom & Self-Hosted docs
  explaining model.context_length config override

---------

Co-authored-by: Test <test@test.com>
Co-authored-by: sudo <sudoingx@users.noreply.github.com>
2026-03-19 06:01:16 -07:00
Teknium 7b6d14e62a fix(gateway): replace bare text approval with /approve and /deny commands (#2002)
The gateway approval system previously intercepted bare 'yes'/'no' text
from the user's next message to approve/deny dangerous commands. This was
fragile and dangerous — if the agent asked a clarify question and the user
said 'yes' to answer it, the gateway would execute the pending dangerous
command instead. (Fixes #1888)

Changes:
- Remove bare text matching ('yes', 'y', 'approve', 'ok', etc.) from
  _handle_message approval check
- Add /approve and /deny as gateway-only slash commands in the command
  registry
- /approve supports scoping: /approve (one-time), /approve session,
  /approve always (permanent)
- Add 5-minute timeout for stale approvals
- Gateway appends structured instructions to the agent response when a
  dangerous command is pending, telling the user exactly how to respond
- 9 tests covering approve, deny, timeout, scoping, and verification
  that bare 'yes' no longer triggers execution

Credit to @solo386 and @FlyByNight69420 for identifying and reporting
this security issue in PR #1971 and issue #1888.

Co-authored-by: Test <test@test.com>
2026-03-18 16:58:20 -07:00
Teknium 67d707e851 fix: respect config.yaml model.base_url for Anthropic provider (#1948) (#1998)
After #1675 removed ANTHROPIC_BASE_URL env var support, the Anthropic
provider base URL was hardcoded to https://api.anthropic.com. Now reads
model.base_url from config.yaml as an override, falling back to the
default when not set. Also applies to the auxiliary client.

Cherry-picked from PR #1949 by @rivercrab26.

Co-authored-by: rivercrab26 <rivercrab26@users.noreply.github.com>
2026-03-18 16:51:24 -07:00
Teknium e648863d52 docs: fix documentation inconsistencies across reference and user guides
- toolsets-reference: add browser_console to browser + all platform toolsets,
  add missing hermes-acp, hermes-sms, messaging toolsets, correct hermes-gateway
  as composite, deduplicate platform toolset listings
- tools-reference: add missing vision and web toolset sections
- slash-commands: fix /new+/reset as alias (not separate commands), add /stop to
  CLI section (available in both CLI and gateway), add /plugins command, fix Notes
  section about messaging-only vs CLI-only
- environment-variables: fix HERMES_MAX_ITERATIONS default (90 not 60), add
  DEEPSEEK_API_KEY/BASE_URL, OPENCODE_ZEN/GO keys, TAVILY_API_KEY,
  GITHUB_TOKEN, HERMES_EPHEMERAL_SYSTEM_PROMPT
- configuration: remove duplicate Alibaba Cloud row, add OpenCode Zen/Go providers
- cli-commands: add missing providers to --provider list (opencode-zen,
  opencode-go, ai-gateway, kilocode, alibaba)
- quickstart: add OpenCode Zen and OpenCode Go to provider table

Co-authored-by: Test <test@test.com>
2026-03-18 16:26:27 -07:00
Teknium a7cc1cf309 fix: support Anthropic-compatible endpoints for third-party providers (#1997)
Three bugs prevented providers like MiniMax from using their
Anthropic-compatible endpoints (e.g. api.minimax.io/anthropic):

1. _VALID_API_MODES was missing 'anthropic_messages', so explicit
   api_mode config was silently rejected and defaulted to
   chat_completions.

2. API-key provider resolution hardcoded api_mode to 'chat_completions'
   without checking model config or detecting Anthropic-compatible URLs.

3. run_agent.py auto-detection only recognized api.anthropic.com, not
   third-party endpoints using the /anthropic URL convention.

Fixes:
- Add 'anthropic_messages' to _VALID_API_MODES
- API-key providers now check model config api_mode and auto-detect
  URLs ending in /anthropic
- run_agent.py and fallback logic detect /anthropic URL convention
- 5 new tests covering all scenarios

Users can now either:
- Set MINIMAX_BASE_URL=https://api.minimax.io/anthropic (auto-detected)
- Set api_mode: anthropic_messages in model config (explicit)
- Use custom_providers with api_mode: anthropic_messages

Co-authored-by: Test <test@test.com>
2026-03-18 16:26:06 -07:00
Teknium f24db23458 fix: custom provider uses config base_url and api_key over env vars (#1760) (#1994)
When provider: custom is set in config.yaml with base_url and api_key,
those values are now used instead of falling back to OPENAI_BASE_URL and
OPENAI_API_KEY env vars. Also reads the 'api' field as an alternative to
'api_key' for config compatibility.

Cherry-picked from PR #1762 by crazywriter1.

Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
2026-03-18 16:00:14 -07:00
Teknium d132e344d7 fix(agent): prevent silent tool result loss during context compression (#1993)
_align_boundary_backward only checked messages[idx-1] to decide if
the compress-end boundary splits a tool_call/result group. When an
assistant issues 3+ parallel tool calls, their results span multiple
consecutive messages. If the boundary fell in the middle of that group,
the parent assistant was summarized away and orphaned tool results were
silently deleted by _sanitize_tool_pairs.

Now walks backward through all consecutive tool results to find the
parent assistant, then pulls the boundary before the entire group.

6 regression tests added in tests/test_compression_boundary.py.

Co-authored-by: Guts <Gutslabs@users.noreply.github.com>
2026-03-18 15:22:51 -07:00
Teknium 22f41daded fix: send error details to user in gateway outer exception handler
Previously, if an error occurred during response processing in
_process_message_background (e.g. during extract_media, send, or
any uncaught exception from the handler), the error was only logged
to server console and the user was left with radio silence — typing
indicator stops but no message arrives.

Now the outer except block attempts to send the error type and detail
(truncated to 300 chars) to the user's chat, matching the format
already used by the inner handler in gateway/run.py.

Co-authored-by: Test <test@test.com>
2026-03-18 10:42:43 -07:00
Teknium 7c7feaa033 Merge pull request #1929 from NousResearch/hermes/hermes-b29f73b2
feat: inject model and provider into system prompt
2026-03-18 04:18:41 -07:00
Teknium 2f80bd9f87 fix: whatsapp reply_prefix config.yaml bridging was dead code (#1923)
The whatsapp reply_prefix bridging referenced config.platforms before
the config object was constructed, making it a silent NameError caught
by except Exception: pass.

Fix: fold reply_prefix into the per-platform bridging loop (introduced
in #1919) which correctly writes to gw_data dict pre-construction.
Removes the broken standalone whatsapp bridging block.

Co-authored-by: Test <test@test.com>
2026-03-18 04:18:33 -07:00
Teknium 23e5e8dde9 Merge pull request #1928 from NousResearch/hermes/hermes-ba3c8fa1
chore: trim huggingface-hub skill description
2026-03-18 04:18:27 -07:00
Test e99aca98ab feat: inject model and provider into system prompt
Adds model name and provider to the system prompt metadata block,
alongside the existing session ID and timestamp. These are frozen
at session start and don't change mid-conversation, so they won't
break prompt caching.
2026-03-18 04:18:26 -07:00
Test 7e30e97a59 chore: trim redundant trigger sentence from huggingface-hub description 2026-03-18 04:18:13 -07:00
Teknium db4dfea7ec docs: document SOUL.md as primary agent identity (#1927)
Update all SOUL.md documentation to reflect that it now occupies
slot #1 in the system prompt, replacing the hardcoded default identity.

Updated pages:
- user-guide/features/personality.md — SOUL.md is primary identity, not just a layer
- developer-guide/prompt-assembly.md — updated prompt layer order, context files list
- guides/use-soul-with-hermes.md — SOUL.md replaces built-in identity
- user-guide/configuration.md — updated context files table and directory tree

Co-authored-by: Test <test@test.com>
2026-03-18 04:18:08 -07:00
Teknium 17254a7692 Merge pull request #1926 from NousResearch/hermes/hermes-ba3c8fa1
chore: add search to huggingface-hub skill description
2026-03-18 04:15:17 -07:00
Test adf188c439 chore: add search to huggingface-hub skill description 2026-03-18 04:15:03 -07:00
Teknium 21958a55d1 Merge pull request #1925 from NousResearch/hermes/hermes-ba3c8fa1
chore: tighten huggingface-hub skill description
2026-03-18 04:11:43 -07:00
Test 947827bba0 chore: tighten huggingface-hub skill description 2026-03-18 04:11:33 -07:00
Teknium e4a3ffa9c1 feat: use SOUL.md as primary agent identity instead of hardcoded default (#1922)
SOUL.md now loads in slot #1 of the system prompt, replacing the
hardcoded DEFAULT_AGENT_IDENTITY. This lets users fully customize
the agent's identity and personality by editing ~/.hermes/SOUL.md
without it conflicting with the built-in identity text.

When SOUL.md is loaded as identity, it's excluded from the context
files section to avoid appearing twice. When SOUL.md is missing,
empty, unreadable, or skip_context_files is set, the hardcoded
DEFAULT_AGENT_IDENTITY is used as a fallback.

The default SOUL.md (seeded on first run) already contains the full
Hermes personality, so existing installs are unaffected.

Co-authored-by: Test <test@test.com>
2026-03-18 04:11:20 -07:00
Teknium 1fa3737134 feat: GitHub Copilot provider integration (#1924)
feat: GitHub Copilot provider integration with OAuth auth, API routing, and docs
2026-03-18 04:09:30 -07:00
Test e7844e9c8d Merge origin/main, resolve conflicts (self._base_url_lower) 2026-03-18 04:09:00 -07:00
Teknium 1c761ae042 feat: add huggingface-hub bundled skill (#1921)
feat: add huggingface-hub bundled skill
2026-03-18 04:08:00 -07:00
Test 56ca84f243 feat: add huggingface-hub bundled skill
Adds the Hugging Face CLI (hf) reference as a built-in skill under
mlops/. Covers downloading/uploading models and datasets, repo
management, SQL queries on datasets, inference endpoints, Spaces,
buckets, and more.

Based on the official HF skill from huggingface/skills.
2026-03-18 04:07:41 -07:00
Test 04101bc59e docs: comprehensive GitHub Copilot provider documentation
- Add dedicated GitHub Copilot section in configuration guide with:
  - Auth options (OAuth device code, env vars, gh CLI)
  - Token type table (supported vs unsupported)
  - API routing explanation (GPT-5+ → Responses, others → Chat)
  - Copilot ACP setup instructions
  - Environment variable reference
- Add all Copilot env vars to environment-variables.md:
  COPILOT_GITHUB_TOKEN, HERMES_COPILOT_ACP_COMMAND, etc.
- Add copilot-acp to --provider list in cli-commands.md
- Docs build verified
2026-03-18 04:07:34 -07:00
Teknium 0a247a50f2 feat: support ignoring unauthorized gateway DMs (#1919)
Add unauthorized_dm_behavior config (pair|ignore) with global default
and per-platform override. WhatsApp can silently drop unknown DMs
instead of sending pairing codes.

Adapted config bridging to work with gw_data dict (pre-construction)
rather than config object. Dropped implementation plan document.

Co-authored-by: Frederico Ribeiro <fr@tecompanytea.com>
2026-03-18 04:06:08 -07:00
Teknium 0e2714acea fix(cron): recover recent one-shot jobs (#1918)
Co-authored-by: Frederico Ribeiro <fr@tecompanytea.com>
2026-03-18 04:06:02 -07:00
Test 36921a3e98 fix: correct Copilot API mode selection to match opencode
The previous copilot_model_api_mode() checked the catalog's
supported_endpoints first and picked /chat/completions when a model
supported both endpoints. This is wrong — GPT-5+ models should use
the Responses API even when the catalog lists both.

Replicate opencode's shouldUseCopilotResponsesApi() logic:
- GPT-5+ models (gpt-5.4, gpt-5.3-codex, etc.) → Responses API
- gpt-5-mini → Chat Completions (explicit exception)
- Everything else (gpt-4o, claude, gemini, etc.) → Chat Completions
- Model ID pattern is the primary signal, catalog is secondary

The catalog fallback now only matters for non-GPT-5 models that might
exclusively support /v1/messages (e.g. Claude via Copilot).

Models are auto-detected from the live catalog at
api.githubcopilot.com/models — no hardcoded list required for
supported models, only a static fallback for when the API is
unreachable.
2026-03-18 03:54:50 -07:00
Teknium c1a127c87c Merge pull request #1917 from NousResearch/hermes/hermes-b29f73b2
feat(cli): add /statusbar command to toggle context bar
2026-03-18 03:50:05 -07:00
Test c1750bb32d feat(cli): add /statusbar command to toggle context bar
Adds /statusbar (alias /sb) to show/hide the bottom status bar that
displays model name, context usage, and session duration.

Uses ConditionalContainer so the bar takes zero space when hidden
rather than leaving a blank line.
2026-03-18 03:49:49 -07:00
Teknium 4699c226da chore: reorder OpenRouter model catalog (#1916)
chore: reorder OpenRouter model catalog
2026-03-18 03:31:19 -07:00
Test b05f9b6256 chore: reorder OpenRouter catalog — glm-5-turbo under glm-5, minimax under stepfun 2026-03-18 03:31:04 -07:00
Teknium 0679712d26 feat: reorder OpenRouter catalog, add haiku-4.5, fix minimax slug (#1915)
feat: reorder OpenRouter catalog, add haiku-4.5, fix minimax slug
2026-03-18 03:26:22 -07:00
Test cb54750e07 feat: reorder OpenRouter catalog, add haiku-4.5, fix minimax slug
- Add anthropic/claude-haiku-4.5
- Move gpt-5.4-pro and gpt-5.4-nano to bottom
- Fix minimax/minimax-m2.7 → minimax-m2.5 (m2.7 not on OpenRouter)
- Tag hunter-alpha and healer-alpha as free
- Place hunter/healer-alpha right below gpt-5.4-mini
2026-03-18 03:26:06 -07:00
Test 21c45ba0ac feat: proper Copilot auth with OAuth device code flow and token validation
Builds on PR #1879's Copilot integration with critical auth improvements
modeled after opencode's implementation:

- Add hermes_cli/copilot_auth.py with:
  - OAuth device code flow (copilot_device_code_login) using the same
    client_id (Ov23li8tweQw6odWQebz) as opencode and Copilot CLI
  - Token type validation: reject classic PATs (ghp_*) with a clear
    error message explaining supported token types
  - Proper env var priority: COPILOT_GITHUB_TOKEN > GH_TOKEN > GITHUB_TOKEN
    (matching Copilot CLI documentation)
  - copilot_request_headers() with Openai-Intent, x-initiator, and
    Copilot-Vision-Request headers (matching opencode)

- Update auth.py:
  - PROVIDER_REGISTRY copilot entry uses correct env var order
  - _resolve_api_key_provider_secret delegates to copilot_auth for
    the copilot provider with proper token validation

- Update models.py:
  - copilot_default_headers() now includes Openai-Intent and x-initiator

- Update main.py:
  - _model_flow_copilot offers OAuth device code login when no token
    is found, with manual token entry as fallback
  - Shows supported vs unsupported token types

- 22 new tests covering token validation, env var priority, header
  generation, and integration with existing auth infrastructure
2026-03-18 03:25:58 -07:00
Teknium c0c14e60b4 fix: make concurrent tool batching path-aware for file mutations (#1914)
* Improve tool batching independence checks

* fix: address review feedback on path-aware batching

- Log malformed/non-dict tool arguments at debug level before
  falling back to sequential, instead of silently swallowing
  the error into an empty dict
- Guard empty paths in _paths_overlap (unreachable in practice
  due to upstream filtering, but makes the invariant explicit)
- Add tests: malformed JSON args, non-dict args, _paths_overlap
  unit tests including empty path edge cases
- web_crawl is not a registered tool (only web_search/web_extract
  are); no addition needed to _PARALLEL_SAFE_TOOLS

---------

Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-18 03:25:38 -07:00
Teknium 050b43108c feat: add gpt-5.4-mini, gpt-5.4-nano, healer-alpha to OpenRouter catalog (#1913)
feat: add gpt-5.4-mini, gpt-5.4-nano, healer-alpha to OpenRouter catalog
2026-03-18 03:23:36 -07:00
Test 00cc0c6a28 feat: add gpt-5.4-mini, gpt-5.4-nano, healer-alpha to OpenRouter catalog 2026-03-18 03:23:20 -07:00
Teknium bee13d9921 Merge pull request #1912 from NousResearch/hermes/hermes-b29f73b2
fix(banner): normalize toolset labels and use skin colors
2026-03-18 03:23:15 -07:00
Test f814787144 fix(banner): normalize toolset labels and use skin colors
- Strip '_tools' suffix from internal toolset identifiers in the banner
  (e.g. 'web_tools' -> 'web', 'homeassistant_tools' -> 'homeassistant')
- Stop appending '_tools' to unavailable toolset names
- Replace 6 hardcoded hex colors (#B8860B, #FFBF00, #FFF8DC) in toolset
  rows, overflow line, and MCP server rows with the skin variables
  (dim, accent, text) already resolved at the top of the function

Inspired by PR #1871 by @kshitijk4poor.
Adds 4 tests.
2026-03-18 03:22:58 -07:00
Teknium c9bb0c587f fix: direct user message on STT failure + hermes-agent-setup skill (#1905)
fix: direct user message on STT failure + hermes-agent-setup skill
2026-03-18 03:21:12 -07:00
Test 8422196e89 Merge PR #1879: feat: integrate GitHub Copilot providers 2026-03-18 03:18:33 -07:00
Teknium b70dd51cfa fix: disabled skills respected across banner, system prompt, slash commands, and skill_view (#1897)
* fix: banner skill count now respects disabled skills and platform filtering

The banner's get_available_skills() was doing a raw rglob scan of
~/.hermes/skills/ without checking:
- Whether skills are disabled (skills.disabled config)
- Whether skills match the current platform (platforms: frontmatter)

This caused the banner to show inflated skill counts (e.g. '100 skills'
when many are disabled) and list macOS-only skills on Linux.

Fix: delegate to _find_all_skills() from tools/skills_tool which already
handles both platform gating and disabled-skill filtering.

* fix: system prompt and slash commands now respect disabled skills

Two more places where disabled skills were still surfaced:

1. build_skills_system_prompt() in prompt_builder.py — disabled skills
   appeared in the <available_skills> system prompt section, causing
   the agent to suggest/load them despite being disabled.

2. scan_skill_commands() in skill_commands.py — disabled skills still
   registered as /skill-name slash commands in CLI help and could be
   invoked.

Both now load _get_disabled_skill_names() and filter accordingly.

* fix: skill_view blocks disabled skills

skill_view() checked platform compatibility but not disabled state,
so the agent could still load and read disabled skills directly.

Now returns a clear error when a disabled skill is requested, telling
the user to enable it via hermes skills or inspect the files manually.

---------

Co-authored-by: Test <test@test.com>
2026-03-18 03:17:37 -07:00
Test 190c07975d fix: check skill availability before hinting at hermes-agent-setup
Only mention the hermes-agent-setup skill in STT failure notes (both
the direct user message and the agent context note) when the skill is
actually installed. Uses _find_skill() from skill_manager_tool.

Also confirmed: STT is the only user-facing failure case where the
setup skill hint helps. Vision failures are transient API issues,
runtime transcription errors indicate a configured-but-broken provider,
and platform startup warnings are server logs.
2026-03-18 03:17:23 -07:00
Teknium 011ed540dd Merge pull request #1909 from NousResearch/hermes/hermes-b29f73b2
docs: fix MCP install commands — use uv, not bare pip
2026-03-18 03:15:15 -07:00
Test a9c405fac9 docs: fix MCP install commands — use uv, not bare pip
The standard install already includes MCP via .[all]. For users who
need to add it separately, the correct command is:
  cd ~/.hermes/hermes-agent && uv pip install -e ".[mcp]"

The venv is created by uv, so bare 'pip' isn't available. All four
occurrences across 3 docs pages updated.
2026-03-18 03:14:58 -07:00
Teknium 9c174e0940 Merge pull request #1908 from NousResearch/hermes/hermes-b29f73b2
fix(gateway): detect script-style gateway processes for --replace
2026-03-18 03:13:21 -07:00
TheSameCat2 5c4c4b8b7d fix(gateway): detect script-style gateway processes for --replace
Recognize hermes_cli/main.py gateway command lines in gateway
process detection and PID validation so --replace reliably finds
existing gateway instances.

Adds a regression test covering script-style cmdline detection.

Closes #1830
2026-03-18 03:12:59 -07:00
Test 764825bbff feat: expand hermes-agent-setup skill + tell agent about it in STT notes
Skill now covers full CLI usage (hermes setup, hermes skills, hermes
tools, hermes config, session management, etc.), config file reference,
and expanded gateway commands.

Agent context notes for STT failure now mention the hermes-agent-setup
skill is available to help users configure Hermes features.
2026-03-18 03:05:17 -07:00
Teknium ee4cc8ee3b Merge pull request #1907 from NousResearch/hermes/hermes-b29f73b2
feat(mcp): expose MCP servers as standalone toolsets
2026-03-18 03:04:34 -07:00
Test 4b53b89f09 feat(mcp): expose MCP servers as standalone toolsets
Each configured MCP server now registers as its own toolset in TOOLSETS
(e.g. TOOLSETS['github'] = {tools: ['mcp_github_list_files', ...]}),
making raw server names resolvable in platform_toolsets overrides.

Previously MCP tools were only injected into hermes-* umbrella toolsets,
so gateway sessions using raw toolset names like ['terminal', 'github']
in platform_toolsets couldn't resolve MCP tools.

Skips server names that collide with built-in toolsets. Also handles
idempotent reloads (syncs toolsets even when no new servers connect).

Inspired by PR #1876 by @kshitijk4poor.
Adds 2 tests (standalone toolset creation + built-in collision guard).
2026-03-18 03:04:17 -07:00
Teknium a2440f72f6 feat: use endpoint metadata for custom model context and pricing (#1906)
* perf: cache base_url.lower() via property, consolidate triple load_config(), hoist set constant

run_agent.py:
- Add base_url property that auto-caches _base_url_lower on every
  assignment, eliminating 12+ redundant .lower() calls per API cycle
  across __init__, _build_api_kwargs, _supports_reasoning_extra_body,
  and the main conversation loop
- Consolidate three separate load_config() disk reads in __init__
  (memory, skills, compression) into a single call, reusing the
  result dict for all three config sections

model_tools.py:
- Hoist _READ_SEARCH_TOOLS set to module level (was rebuilt inside
  handle_function_call on every tool invocation)

* Use endpoint metadata for custom model context and pricing

---------

Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-18 03:04:07 -07:00
Test 9c0f346258 fix: direct user message on STT failure + hermes-agent-setup skill
When a user sends a voice message and STT isn't configured, the gateway
now sends a clear message directly to the user explaining how to set up
voice transcription, rather than relying on the agent to relay an
injected context note (which often gets misinterpreted).

Also adds a hermes-agent-setup bundled skill covering STT/TTS setup,
tool configuration, dependency installation, and troubleshooting.
2026-03-18 03:01:41 -07:00
Teknium 11f029c311 fix(tts): document NeuTTS provider and align install guidance (#1903)
Co-authored-by: charles-édouard <59705750+ccbbccbb@users.noreply.github.com>
2026-03-18 02:55:30 -07:00
Teknium fb923d5efc Merge pull request #1902 from NousResearch/hermes/hermes-b29f73b2
fix(gateway): PID-based wait with force-kill for gateway restart
2026-03-18 02:54:38 -07:00
Test ace2cc6257 fix(gateway): PID-based wait with force-kill for gateway restart
Add _wait_for_gateway_exit() that polls get_running_pid() to confirm
the old gateway process has actually exited before starting a new one.
If the process doesn't exit within 5s, sends SIGKILL to the specific
PID. Uses the saved PID from gateway.pid (not launchd labels) so it
works correctly with multiple gateway instances under separate
HERMES_HOME directories.

Applied to both launchd_restart() and the manual restart path (replaces
the blind time.sleep(2)).

Inspired by PR #1881 by @AzothZephyr (race condition diagnosis).
Adds 4 tests.
2026-03-18 02:54:18 -07:00
Teknium 24ac577046 fix: respect model.default from config.yaml for openai-codex provider (#1896)
When config.yaml had a non-default model (e.g. gpt-5.3-codex) and the
provider was openai-codex, _normalize_model_for_provider() would replace
it with the latest available codex model because _model_is_default only
checked the CLI argument, not the config value.

Now _model_is_default is False when config.yaml has a model that differs
from the global fallback (anthropic/claude-opus-4.6), so the user's
explicit config choice is preserved.

Fixes #1887

Co-authored-by: Test <test@test.com>
2026-03-18 02:50:31 -07:00
Teknium e86bfd7667 feat: upgrade MiniMax default to M2.7 + add new OpenRouter models (#1900)
feat: upgrade MiniMax default to M2.7 + add new OpenRouter models
2026-03-18 02:43:19 -07:00
octo-patch e4043633fc feat: upgrade MiniMax default to M2.7 + add new OpenRouter models
MiniMax: Add M2.7 and M2.7-highspeed as new defaults across provider
model lists, auxiliary client, metadata, setup wizard, RL training tool,
fallback tests, and docs. Retain M2.5/M2.1 as alternatives.

OpenRouter: Add grok-4.20-beta, nemotron-3-super-120b-a12b:free,
trinity-large-preview:free, glm-5-turbo, and hunter-alpha to the
model catalog.

MiniMax changes based on PR #1882 by @octo-patch (applied manually
due to stale conflicts in refactored pricing module).
2026-03-18 02:42:58 -07:00
Test a8132d1252 fix: respect model.default from config.yaml for openai-codex provider
When config.yaml had a non-default model (e.g. gpt-5.3-codex) and the
provider was openai-codex, _normalize_model_for_provider() would replace
it with the latest available codex model because _model_is_default only
checked the CLI argument, not the config value.

Now _model_is_default is False when config.yaml has a model that differs
from the global fallback (anthropic/claude-opus-4.6), so the user's
explicit config choice is preserved.

Fixes #1887
2026-03-18 02:24:41 -07:00
Teknium 927f4d3a37 fix(matrix): use correct reply_to_message_id parameter name (#1895)
fix(matrix): use correct reply_to_message_id parameter name
2026-03-18 02:23:38 -07:00
Bartok9 66f71c1836 fix(matrix): use correct reply_to_message_id parameter name
Fixes #1842

The MessageEvent dataclass expects 'reply_to_message_id' but the Matrix
connector was passing 'reply_to'. This caused replies to fail with:

    MessageEvent.__init__() got an unexpected keyword argument 'reply_to'

Changed the parameter name to match the dataclass definition.
2026-03-18 02:23:21 -07:00
Teknium b1069196a6 Merge pull request #1894 from NousResearch/hermes/hermes-b29f73b2
fix(delegate): move _saved_tool_names save/restore to _run_single_child scope
2026-03-18 02:23:14 -07:00
Bartok9 ba7248c669 fix(delegate): move _saved_tool_names save/restore to _run_single_child scope
Fixes #1802

The v0.3.0 refactor split child agent construction (_build_child_agent)
and execution (_run_single_child) into separate functions. This created
a scope bug where _saved_tool_names was defined in _build_child_agent
but referenced in _run_single_child's finally block, causing a NameError
on every delegate_task call.

Solution: Move the save/restore logic entirely into _run_single_child,
keeping the save and restore in the same scope as the try/finally block.
This is cleaner than passing the variable through and removes the dead
save from _build_child_agent.
2026-03-18 02:22:46 -07:00
Teknium 6fc4e36625 fix: search all sources by default in session_search (#1892)
* fix: include ACP sessions in default search sources

* fix: remove hardcoded source allowlist from session search

The default source_filter was a hardcoded list that silently excluded
any platform not explicitly listed. Instead of maintaining an ever-growing
allowlist, remove it entirely so all sources are searched by default.
Callers can still pass source_filter explicitly to narrow results.

Follow-up to cherry-picked PR #1817.

---------

Co-authored-by: someoneexistsontheinternet <154079416+someoneexistsontheinternet@users.noreply.github.com>
Co-authored-by: Test <test@test.com>
2026-03-18 02:21:29 -07:00
Teknium 7d7c2a62dd Merge pull request #1890 from NousResearch/hermes/hermes-b29f73b2
fix: OAuth flag stale after refresh/fallback, memory nudge never fires, dead code
2026-03-18 02:20:19 -07:00
Test 5b74df2bfc fix: OAuth flag stale after refresh/fallback, memory nudge never fires, dead code
- Update _is_anthropic_oauth in _try_refresh_anthropic_client_credentials()
  when token type changes during credential refresh
- Set _is_anthropic_oauth in _try_activate_fallback() Anthropic path
- Move _turns_since_memory and _iters_since_skill init to __init__ so
  nudge counters accumulate across run_conversation() calls in CLI mode
- Remove unreachable retry_count >= max_retries block after raise

Adds 7 regression tests. Salvaged from PR #1797 by @0xbyt4.
2026-03-18 02:19:57 -07:00
max 0c392e7a87 feat: integrate GitHub Copilot providers across Hermes
Add first-class GitHub Copilot and Copilot ACP provider support across
model selection, runtime provider resolution, CLI sessions, delegated
subagents, cron jobs, and the Telegram gateway.

This also normalizes Copilot model catalogs and API modes, introduces a
Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by
resolving Homebrew-installed gh binaries under launchd.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-17 23:40:22 -07:00
Teknium f656dfcb32 Merge pull request #1840 from NousResearch/hermes/hermes-b29f73b2
fix: allow agent-created skills with caution-level findings
2026-03-17 16:33:04 -07:00
Test 0fab46f65c fix: allow agent-created skills with caution-level findings
Agent-created skills were using the same policy as community hub
installs, blocking any skill with medium/high severity findings
(e.g. docker pull, pip install, git clone). This meant the agent
couldn't create skills that reference Docker or other common tools.

Changed agent-created policy from (allow, block, block) to
(allow, allow, block) — matching the trusted policy. Caution-level
findings (medium/high severity) are now allowed through, while
dangerous findings (critical severity like exfiltration, prompt
injection, reverse shells) remain blocked.

Added 4 tests covering the agent-created policy: safe allowed,
caution allowed, dangerous blocked, force override.
2026-03-17 16:32:25 -07:00
Teknium 37dceb043e fix: improve gateway error handling for 429 usage limits and 500 context overflow (#1839)
fix: improve gateway error handling for 429 usage limits and 500 context overflow
2026-03-17 16:32:20 -07:00
silentconsensus 7ce374d3b9 Improve gateway error handling for 429 usage limits and 500 context overflow
- Distinguish plan usage limits (429 with usage_limit_reached) from transient rate limits
- Show approximate reset time in hours for plan limits
- Treat HTTP 500 with large sessions as context overflow (same as 400)
- Move history length check earlier for reuse across status codes
2026-03-17 16:32:01 -07:00
Teknium 6e4415e865 Merge pull request #1838 from NousResearch/hermes/hermes-b29f73b2
fix(context_compressor): replace print() calls with logger
2026-03-17 16:31:32 -07:00
Test 45bad9771d fix(context_compressor): replace print() calls with logger
Replaces all remaining print() calls in compress() with logger.info()
and logger.warning() for consistency with the rest of the module.

Inspired by PR #1822.
2026-03-17 16:31:01 -07:00
Teknium 8d60db0f6f fix(discord): remove bugged followup messages + remove /ask command (#1836)
fix(discord): remove bugged followup messages + remove /ask command
2026-03-17 16:28:36 -07:00
Test 1bee519a6f fix(discord): remove redundant /ask slash command
/ask was just 'send a message to the bot' via the slash command menu —
completely redundant since Discord bots already listen to channel messages.
Removed as part of salvaging PR #1827.
2026-03-17 16:25:09 -07:00
charliekerfoot 72bfa115a0 fix(discord): removebugged follow up messages from discord slash commands 2026-03-17 16:24:17 -07:00
Teknium 7f85b2914d Merge pull request #1824 from cutepawss/fix/search-files-pagination
Clean fix — adds pagination args to search_key for parity with read_file. Thanks @cutepawss!
2026-03-17 16:16:47 -07:00
Teknium b8076bb0bd feat: cron agents can suppress delivery with [SILENT] response (#1833)
feat: cron agents can suppress delivery with [SILENT] response
2026-03-17 16:09:24 -07:00
Test d35d923c76 feat: cron agents can suppress delivery with [SILENT] response
Every cron job prompt now includes guidance that the agent can respond
with [SILENT] when it has nothing new or noteworthy to report. The
scheduler checks for this marker and skips delivery, while still saving
output to disk for audit. Failed jobs always deliver regardless.

This replaces the notify parameter approach from PR #1807 with a simpler
always-on design — the model is smart enough to decide when there's
nothing worth reporting without needing a per-job flag.
2026-03-17 16:06:49 -07:00
darya a654bc04f7 fix(file_tools): include pagination args in repeated search key 2026-03-18 01:19:05 +03:00
Test a71e3f4d98 fix: add /browser to COMMAND_REGISTRY so it shows in help and autocomplete
The /browser command handler existed in cli.py but was never added to
COMMAND_REGISTRY after the centralized command registry refactor. This
meant:
- /browser didn't appear in /help
- No tab-completion or subcommand suggestions
- Dispatch used _base_word fallback instead of canonical resolution

Added CommandDef with connect/disconnect/status subcommands and
switched dispatch to use canonical instead of _base_word.
2026-03-17 13:29:36 -07:00
Teknium 588962d24e docs: escape {id} in api-server.md headings to fix MDX build (#1787)
MDX v2+ interprets curly braces in regular markdown as JSX
expressions. The headings 'GET /v1/responses/{id}' and
'DELETE /v1/responses/{id}' caused a ReferenceError during
Docusaurus static site generation because 'id' is not a
defined JavaScript variable. Escaped with backslashes.

Co-authored-by: Test <test@test.com>
2026-03-17 11:04:37 -07:00
Teknium 2fa33dde81 fix: handle message length overflow in streaming mode (#1783)
Stream consumer now splits messages that exceed the platform's
MAX_MESSAGE_LENGTH. When accumulated text grows past the safe limit,
the current message is finalized and a new message is started for the
overflow — same as how normal sends chunk long responses.

Split point prefers line boundaries (rfind newline) for clean breaks.
Works for all platforms (Telegram 4096, Discord 2000, etc.) by reading
the adapter's MAX_MESSAGE_LENGTH at runtime.

Also added a safety net in the Telegram adapter: if edit_message_text
still hits MESSAGE_TOO_LONG (e.g. markdown formatting expansion), it
truncates and returns success so the stream consumer doesn't die.

Co-authored-by: Test <test@test.com>
2026-03-17 11:00:52 -07:00
Teknium 7ac9088d5c fix: Telegram streaming — config bridge, not-modified, flood control (#1782)
* fix: NameError in OpenCode provider setup (prompt_text -> prompt)

The OpenCode Zen and OpenCode Go setup sections used prompt_text()
which is undefined. All other providers correctly use the local
prompt() function defined in setup.py. Fixes crash during
'hermes setup' when selecting either OpenCode provider.

* fix: Telegram streaming — config bridge, not-modified, flood control

Three fixes for gateway streaming:

1. Bridge streaming config from config.yaml into gateway runtime.
   load_gateway_config() now reads the 'streaming' key from config.yaml
   (same pattern as session_reset, stt, etc.), matching the docs.
   Previously only gateway.json was read.

2. Handle 'Message is not modified' in Telegram edit_message().
   This Telegram API error fires when editing with identical content —
   a no-op, not a real failure. Previously it returned success=False
   which made the stream consumer disable streaming entirely.

3. Handle RetryAfter / flood control in Telegram edit_message().
   Fast providers can hit Telegram rate limits during streaming.
   Now waits the requested retry_after duration and retries once,
   instead of treating it as a fatal edit failure.

Also fixed double-edit on stream finish: the consumer now tracks
last-sent text and skips redundant edits, preventing the not-modified
error at the source.

* refactor: make config.yaml the primary gateway config source

Eliminates the per-key bridge pattern in load_gateway_config().
Previously gateway.json was the primary source and each config.yaml
key needed an individual bridge — easy to forget (streaming was
missing, causing garl4546's bug).

Now config.yaml is read first and its keys are mapped directly into
the GatewayConfig.from_dict() schema. gateway.json is kept as a
legacy fallback layer (loaded first, then overwritten by config.yaml
keys). If gateway.json exists, a log message suggests migrating.

Also:
- Removed dead save_gateway_config() (never called anywhere)
- Updated CLI help text and send_message error to reference
  config.yaml instead of gateway.json

---------

Co-authored-by: Test <test@test.com>
2026-03-17 10:51:54 -07:00
Teknium dd60bcbfb7 feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756)
* feat: OpenAI-compatible API server platform adapter

Salvaged from PR #956, updated for current main.

Adds an HTTP API server as a gateway platform adapter that exposes
hermes-agent via the OpenAI Chat Completions and Responses APIs.
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at
http://localhost:8642/v1.

Endpoints:
- POST /v1/chat/completions  — stateless Chat Completions API
- POST /v1/responses         — stateful Responses API with chaining
- GET  /v1/responses/{id}    — retrieve stored response
- DELETE /v1/responses/{id}  — delete stored response
- GET  /v1/models            — list hermes-agent as available model
- GET  /health               — health check

Features:
- Real SSE streaming via stream_delta_callback (uses main's streaming)
- In-memory LRU response store for Responses API conversation chaining
- Named conversations via 'conversation' parameter
- Bearer token auth (optional, via API_SERVER_KEY)
- CORS support for browser-based frontends
- System prompt layering (frontend system messages on top of core)
- Real token usage tracking in responses

Integration points:
- Platform.API_SERVER in gateway/config.py
- _create_adapter() branch in gateway/run.py
- API_SERVER_* env vars in hermes_cli/config.py
- Env var overrides in gateway/config.py _apply_env_overrides()

Changes vs original PR #956:
- Removed streaming infrastructure (already on main via stream_consumer.py)
- Removed Telegram reply_to_mode (separate feature, not included)
- Updated _resolve_model() -> _resolve_gateway_model()
- Updated stream_callback -> stream_delta_callback
- Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected()
- Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK)

Tests: 72 new tests, all passing
Docs: API server guide, Open WebUI integration guide, env var reference

* feat(whatsapp): make reply prefix configurable via config.yaml

Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env.

The WhatsApp bridge prepends a header to every outgoing message.
This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize
or disable it via config.yaml:

  whatsapp:
    reply_prefix: ''                     # disable header
    reply_prefix: '🤖 *My Bot*\n───\n'  # custom prefix

How it works:
- load_gateway_config() reads whatsapp.reply_prefix from config.yaml
  and stores it in PlatformConfig.extra['reply_prefix']
- WhatsAppAdapter reads it from config.extra at init
- When spawning bridge.js, the adapter passes it as
  WHATSAPP_REPLY_PREFIX in the subprocess environment
- bridge.js handles undefined (default), empty (no header),
  or custom values with \\n escape support
- Self-chat echo suppression uses the configured prefix

Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a
key 10 (TAVILY_API_KEY), so existing users at v9 would never be
prompted for Tavily. Bumped to 10 to close the gap. Added a
regression test to prevent this from happening again.

Credit: ifrederico (PR #1764) for the bridge.js implementation
and the config version gap discovery.

---------

Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
Teknium b5cf0f0aef fix: preserve parent agent's tool list after subagent delegation (#1778)
Save and restore the process-global _last_resolved_tool_names in
_run_single_child() so the parent's execute_code sandbox generates
correct tool imports after delegation completes.

The global was already mostly mitigated (run_agent.py passes
enabled_tools via self.valid_tool_names), but the global itself
remained corrupted — a footgun for any code that reads it directly.

Co-authored-by: shane9coy <shane9coy@users.noreply.github.com>
2026-03-17 10:31:38 -07:00
Teknium 9a1e971126 fix(stt): respect explicit provider config instead of env-var fallback (#1775)
* fix(session): skip corrupt lines in load_transcript instead of crashing

Wrap json.loads() in load_transcript() with try/except JSONDecodeError
so that partial JSONL lines (from mid-write crashes like OOM/SIGKILL)
are skipped with a warning instead of crashing the entire transcript
load. The rest of the history loads fine.

Adds a logger.warning with the session ID and truncated corrupt line
content for debugging visibility.

Salvaged from PR #1193 by alireza78a.
Closes #1193

* fix(stt): respect explicit provider config instead of env-var fallback

Rework _get_provider() to separate explicit config from auto-detect.
When stt.provider is explicitly set in config.yaml, that choice is
authoritative — no silent cross-provider fallback based on which env
vars happen to be set. When no provider is configured, auto-detect
still tries: local > groq > openai.

This fixes the reported scenario where provider: local + a placeholder
OPENAI_API_KEY caused the system to silently select OpenAI and fail
with a 401.

Closes #1774
2026-03-17 10:30:58 -07:00
Teknium 088d65605a fix: NameError in OpenCode provider setup (prompt_text -> prompt) (#1779)
The OpenCode Zen and OpenCode Go setup sections used prompt_text()
which is undefined. All other providers correctly use the local
prompt() function defined in setup.py. Fixes crash during
'hermes setup' when selecting either OpenCode provider.
2026-03-17 10:30:16 -07:00
teknium1 c881209b92 Revert "feat(cli): skin-aware light/dark theme mode with terminal auto-detection"
This reverts commit a1c81360a5.
2026-03-17 10:04:53 -07:00
Teknium d7a2e3ddae fix: handle hyphenated FTS5 queries and preserve quoted literals (#1776)
_sanitize_fts5_query() was stripping ALL double quotes (including
properly paired ones), breaking user-provided quoted phrases like
"exact phrase".  Hyphenated terms like chat-send also silently
expanded to chat AND send, returning unexpected or zero results.

Fix:
1. Extract balanced quoted phrases into placeholders before
   stripping FTS5-special characters, then restore them.
2. Wrap unquoted hyphenated terms (word-word) in double quotes so
   FTS5 matches them as exact phrases instead of splitting on
   the hyphen.
3. Unmatched quotes are still stripped as before.

Based on issue report by @bailob (#1770) and PR #1773 by @Jah-yee
(whose branch contained unrelated changes and couldn't be merged
directly).

Closes #1770
Closes #1773

Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>
2026-03-17 09:44:01 -07:00
Teknium d5af593769 Merge pull request #1769 from sai-samarth/fix/whatsapp-send-message-support
Clean merge — PR is current against main, tests pass, implementation matches existing gateway WhatsApp bridge pattern.
2026-03-17 09:42:01 -07:00
Teknium df74f86955 Merge pull request #1767 from sai-samarth/fix/systemd-node-path-whatsapp
Clean fix for nvm/non-standard Node.js paths in systemd units. Merges cleanly.
2026-03-17 09:41:39 -07:00
sai-samarth a3de843fdb test: replace real-looking WhatsApp jid in regression test 2026-03-17 15:38:37 +00:00
sai-samarth dc15bc508f fix(tools): add outbound WhatsApp send_message routing 2026-03-17 15:31:13 +00:00
sai-samarth b8eb7c5fed fix(gateway): include resolved node path in systemd unit 2026-03-17 15:11:28 +00:00
Teknium 548cedb869 fix(context_compressor): prevent consecutive same-role messages after compression (#1743)
compress() checks both the head and tail neighbors when choosing the
summary message role.  When only the tail collides, the role is flipped.
When BOTH roles would create consecutive same-role messages (e.g.
head=assistant, tail=user), the summary is merged into the first tail
message instead of inserting a standalone message that breaks role
alternation and causes API 400 errors.

The previous code handled head-side collision but left the tail-side
uncovered — long conversations would crash mid-reply with no useful
error, forcing the user to /reset and lose session history.

Based on PR #1186 by @alireza78a, with improved double-collision
handling (merge into tail instead of unconditional 'user' fallback).

Co-authored-by: alireza78a <alireza78.crypto@gmail.com>
2026-03-17 05:18:52 -07:00
Teknium 702191049f fix(session): skip corrupt lines in load_transcript instead of crashing (#1744)
Wrap json.loads() in load_transcript() with try/except JSONDecodeError
so that partial JSONL lines (from mid-write crashes like OOM/SIGKILL)
are skipped with a warning instead of crashing the entire transcript
load. The rest of the history loads fine.

Adds a logger.warning with the session ID and truncated corrupt line
content for debugging visibility.

Salvaged from PR #1193 by alireza78a.
Closes #1193
2026-03-17 05:18:12 -07:00
Teknium aea39eeafb Merge pull request #1736 from NousResearch/fix/gateway-platform-hardening
fix(gateway): SMS session-per-send + Matrix bare media types break downstream processing
2026-03-17 04:46:25 -07:00
Teknium 23a3f01b2b Merge pull request #1735 from NousResearch/fix/tool-handler-safety
fix(tools): browser handlers TypeError on unexpected LLM params + fuzzy_match docstring
2026-03-17 04:46:22 -07:00
Teknium af118501b9 Merge pull request #1733 from NousResearch/fix/defensive-hardening
fix: defensive hardening — logging, dedup, locks, dead code
2026-03-17 04:46:20 -07:00
Teknium d1d17f4f0a feat(compression): add summary_base_url + move compression config to YAML-only
- Add summary_base_url config option to compression block for custom
  OpenAI-compatible endpoints (e.g. zai, DeepSeek, Ollama)
- Remove compression env var bridges from cli.py and gateway/run.py
  (CONTEXT_COMPRESSION_* env vars no longer set from config)
- Switch run_agent.py to read compression config directly from
  config.yaml instead of env vars
- Fix backwards-compat block in _resolve_task_provider_model to also
  fire when auxiliary.compression.provider is 'auto' (DEFAULT_CONFIG
  sets this, which was silently preventing the compression section's
  summary_* keys from being read)
- Add test for summary_base_url config-to-client flow
- Update docs to show compression as config.yaml-only

Closes #1591
Based on PR #1702 by @uzaylisak
2026-03-17 04:46:15 -07:00
teknium1 6832d60bc0 fix(gateway): SMS persistent HTTP session + Matrix MIME media types
1. sms.py: Replace per-send aiohttp.ClientSession with a persistent
   session created in connect() and closed in disconnect(). Each
   outbound SMS no longer pays the TCP+TLS handshake cost. Falls back
   to a temporary session if the persistent one isn't available.

2. matrix.py: Use proper MIME types (image/png, audio/ogg, video/mp4)
   instead of bare category words (image, audio, video). The gateway's
   media processing checks startswith('image/') and startswith('audio/')
   so bare words caused Matrix images to skip vision enrichment and
   Matrix audio to skip transcription. Now extracts the actual MIME
   type from the nio event's content info when available.
2026-03-17 04:35:14 -07:00
teknium1 ea95462998 fix(tools): browser handler safety + fuzzy_match docstring accuracy
1. browser_tool.py: Replace **args spread on browser_click, browser_type,
   and browser_scroll handlers with explicit parameter extraction. The
   **args pattern passed all dict keys as keyword arguments, causing
   TypeError if the LLM sent unexpected parameters. Now extracts only
   the expected params (ref, text, direction) with safe defaults.

2. fuzzy_match.py: Update module docstring to match actual strategy
   order in code. Block anchor was listed as #3 but is actually #7.
   Multi-occurrence is not a separate strategy but a flag. Updated
   count from 9 to 8.
2026-03-17 04:32:39 -07:00
teknium1 847ee20390 fix: defensive hardening — logging, dedup, locks, dead code
Four small fixes:

1. model_tools.py: Tool import failures logged at WARNING instead of
   DEBUG. If a tool module fails to import (syntax error, missing dep),
   the user now sees a warning instead of the tool silently vanishing.

2. hermes_cli/config.py: Remove duplicate 'import sys' (lines 19, 21).

3. agent/model_metadata.py: Remove 6 duplicate entries in
   DEFAULT_CONTEXT_LENGTHS dict. Python keeps the last value, so no
   functional change, but removes maintenance confusion.

4. hermes_state.py: Add missing self._lock to the LIKE query in
   resolve_session_id(). The exact-match path used get_session()
   (which locks internally), but the prefix fallback queried _conn
   without the lock.
2026-03-17 04:31:26 -07:00
Teknium 867a96c051 fix+feat: bug fixes, auto session titles, .hermes.md project config (#1712)
fix+feat: bug fixes, auto session titles, .hermes.md project config
2026-03-17 04:30:48 -07:00
teknium1 0897e4350e merge: resolve conflicts with origin/main 2026-03-17 04:30:37 -07:00
Teknium d2b10545db feat(web): add Tavily as web search/extract/crawl backend (#1731)
Salvage of PR #1707 by @kshitijk4poor (cherry-picked with authorship preserved).

Adds Tavily as a third web backend alongside Firecrawl and Parallel, using the Tavily REST API via httpx.

- Backend selection via hermes tools → saved as web.backend in config.yaml
- All three tools supported: search, extract, crawl
- TAVILY_API_KEY in config registry, doctor, status, setup wizard
- 15 new Tavily tests + 9 backend selection tests + 5 config tests
- Backward compatible

Closes #1707
2026-03-17 04:28:03 -07:00
Teknium 85993fbb5a feat: pre-call sanitization and post-call tool guardrails (#1732)
Salvage of PR #1321 by @alireza78a (cherry-picked concept, reimplemented
against current main).

Phase 1 — Pre-call message sanitization:
  _sanitize_api_messages() now runs unconditionally before every LLM call.
  Previously gated on context_compressor being present, so sessions loaded
  from disk or running without compression could accumulate dangling
  tool_call/tool_result pairs causing API errors.

Phase 2a — Delegate task cap:
  _cap_delegate_task_calls() truncates excess delegate_task calls per turn
  to MAX_CONCURRENT_CHILDREN. The existing cap in delegate_tool.py only
  limits the task array within a single call; this catches multiple
  separate delegate_task tool_calls in one turn.

Phase 2b — Tool call deduplication:
  _deduplicate_tool_calls() drops duplicate (tool_name, arguments) pairs
  within a single turn when models stutter.

All three are static methods on AIAgent, independently testable.
29 tests covering happy paths and edge cases.
2026-03-17 04:24:27 -07:00
Teknium fb20a9e120 Merge pull request #1729 from NousResearch/fix/cron-timezone-naive-iso
fix(cron): naive ISO timestamps stored without timezone — jobs fire at wrong time
2026-03-17 04:24:02 -07:00
Teknium 21b823dd3b Merge pull request #1726 from NousResearch/fix/memory-tool-file-locking
fix(memory): concurrent writes silently drop entries — add file locking
2026-03-17 04:23:59 -07:00
Teknium 618ed2c65f fix(update): use .[all] extras with fallback in hermes update (#1728)
Both update paths now try .[all] first, fall back to . if extras fail. Fixes #1336.

Inspired by PR #1342 by @baketnk.
2026-03-17 04:22:37 -07:00
Teknium 9f81c11ba0 feat: eager fallback to backup model on rate-limit errors (#1730)
When a fallback model is configured, switch to it immediately upon
detecting rate-limit conditions (429, quota exhaustion, empty/malformed
responses) instead of exhausting all retries with exponential backoff.

Two eager-fallback checks:
1. Invalid/empty API responses — fallback attempted before retry loop
2. HTTP 429 / rate-limit keyword detection — fallback before backoff

Both guarded by _fallback_activated for one-shot semantics.

Cherry-picked from PR #1413 by usvimal.

Co-authored-by: usvimal <usvimal@users.noreply.github.com>
2026-03-17 04:21:16 -07:00
teknium1 5301c01776 fix(cron): make naive ISO timestamps timezone-aware at parse time
User-provided ISO timestamps like '2026-02-03T14:00' (no timezone)
were stored naive. The _ensure_aware() helper at check time interprets
naive datetimes using the current system timezone, but if the system
timezone changes between job creation and checking, the job fires at
the wrong time.

Fix: call dt.astimezone() at parse time to immediately stamp the
datetime with the local timezone. The stored value is now always
timezone-aware, so it's stable regardless of later timezone changes.
2026-03-17 04:20:24 -07:00
teknium1 d81de2f3d8 fix(memory): file-lock read-modify-write to prevent concurrent data loss
Two concurrent gateway sessions calling memory add/replace/remove
simultaneously could both read the old state, apply their changes
independently, and write — the last writer silently drops the first
writer's entry.

Fix: wrap each mutation in a file lock (fcntl.flock on a .lock file).
Under the lock, re-read entries from disk to get the latest state,
apply the mutation, then write. This ensures concurrent writers
serialize properly.

The lock uses a separate .lock file since the memory file itself is
atomically replaced via os.replace() (can't flock a replaced file).
Readers remain lock-free since atomic rename ensures they always see
a complete file.
2026-03-17 04:19:11 -07:00
Teknium 1314b4b541 feat(hooks): emit session:end lifecycle event (#1725)
Based on PR #1432 by @bayrakdarerdem. session:start was already on main; this adds the session:end event.

Co-authored-by: bayrakdarerdem <bayrakdarerdem@users.noreply.github.com>
2026-03-17 04:17:44 -07:00
ch3ronsa 695eb04243 feat(agent): .hermes.md per-repository project config discovery
Adds .hermes.md / HERMES.md discovery for per-project agent configuration.
When the agent starts, it walks from cwd to the git root looking for
.hermes.md (preferred) or HERMES.md, strips any YAML frontmatter, and
injects the markdown body into the system prompt as project context.

- Nearest-first discovery (subdirectory configs shadow parent)
- Stops at git root boundary (no leaking into parent repos)
- YAML frontmatter stripped (structured config deferred to Phase 2)
- Same injection scanning and 20K truncation as other context files
- 22 comprehensive tests

Original implementation by ch3ronsa. Cherry-picked and adapted for current main.

Closes #681 (Phase 1)
2026-03-17 04:16:32 -07:00
teknium1 e5fc916814 feat: auto-generate session titles after first exchange
After the first user→assistant exchange, Hermes now generates a short
descriptive session title via the auxiliary LLM (compression task config).
Title generation runs in a background thread so it never delays the
user-facing response.

Key behaviors:
- Fires only on the first 1-2 exchanges (checks user message count)
- Skips if a title already exists (user-set titles are never overwritten)
- Uses call_llm with compression task config (cheapest/fastest model)
- Truncates long messages to keep the title generation request small
- Cleans up LLM output: strips quotes, 'Title:' prefixes, enforces 80 char max
- Works in both CLI and gateway (Telegram/Discord/etc.)

Also updates /title (no args) to show the session ID alongside the title
in both CLI and gateway.

Implements #1426
2026-03-17 04:14:40 -07:00
Teknium 0878e5f4a8 Merge pull request #1724 from NousResearch/fix/model-metadata-fuzzy-match
fix(metadata): fuzzy context length match can return wrong model's value
2026-03-17 04:13:56 -07:00
Teknium 72bcec0ce5 Merge pull request #1723 from NousResearch/fix/compression-attempts-persist
fix(core): compression_attempts resets each iteration — allows unlimited compressions
2026-03-17 04:13:54 -07:00
Teknium d604b9622c Merge pull request #1722 from NousResearch/fix/run-agent-role-violations
fix(core): message role alternation violations in JSON recovery and error handler
2026-03-17 04:13:51 -07:00
Teknium cf0dd777c8 Merge pull request #1721 from NousResearch/fix/browser-session-race
fix(browser): race condition in session creation orphans cloud sessions
2026-03-17 04:13:49 -07:00
Teknium ec272ca8be Merge pull request #1720 from NousResearch/fix/compressor-consecutive-role-violation
fix(compressor): summary role can violate consecutive-role constraint
2026-03-17 04:13:46 -07:00
Teknium 99a44d87dc Merge pull request #1718 from NousResearch/fix/messaging-toolset-missing
fix(toolsets): add missing 'messaging' toolset — can't enable/disable send_message
2026-03-17 04:13:44 -07:00
Teknium 16f38abd25 Merge pull request #1717 from NousResearch/fix/length-continue-retries-reset
fix(core): length_continue_retries never resets — later truncations get fewer retries
2026-03-17 04:13:41 -07:00
Teknium cac3c4d45f Merge pull request #1716 from NousResearch/fix/cron-double-load-jobs
fix(cron): get_due_jobs reads jobs.json twice — race condition
2026-03-17 04:13:39 -07:00
Teknium 4167e2e294 Merge pull request #1714 from NousResearch/fix/anthropic-tool-choice-none
fix(anthropic): tool_choice 'none' still allows tool calls
2026-03-17 04:13:36 -07:00
Teknium 6ddb9ee3e3 Merge pull request #1713 from NousResearch/fix/auxiliary-is-nous-reset
fix(aux): auxiliary_is_nous flag never resets — leaks Nous tags to other providers
2026-03-17 04:13:33 -07:00
Teknium 05aefeddc7 Merge pull request #1711 from NousResearch/fix/matrix-mattermost-mark-connected
fix(gateway): Matrix and Mattermost never report as connected
2026-03-17 04:13:31 -07:00
teknium1 9db75fcfc2 fix(metadata): fuzzy context length match prefers longest key
The fuzzy match for model context lengths iterated dict insertion
order. Shorter model names (e.g. 'gpt-5') could match before more
specific ones (e.g. 'gpt-5.4-pro'), returning the wrong context
length.

Sort by key length descending so more specific model names always
match first.
2026-03-17 04:12:08 -07:00
teknium1 1264275cc3 fix(core): compression_attempts counter resets each loop iteration
compression_attempts was initialized inside the outer while loop,
resetting to 0 on every iteration. Since compression triggers a
'continue' back to the top of the loop, the counter never accumulated
past 1 — effectively allowing unlimited compression attempts.

Move initialization before the outer while loop so the cap of 3
applies across the entire run_conversation() call.
2026-03-17 04:11:32 -07:00
teknium1 cd6dc4ef7e fix(core): message role violations in JSON recovery and error handler
Two edge cases could inject messages that violate role alternation:

1. Invalid JSON recovery (line ~5985): After 3 retries of invalid JSON
   tool args, a user-role recovery message was injected. But the
   assistant's tool_calls were never appended, so the sequence could
   become user → user. Fix: append the assistant message with its
   tool_calls, then respond with proper tool-role error results.

2. System error handler (line ~6238): Always injected a user-role
   error message, which creates consecutive user messages if the last
   message was already user. Fix: dynamically choose the role based on
   the last message to maintain alternation.
2026-03-17 04:10:41 -07:00
teknium1 8cd4a96686 fix(browser): race condition in session creation can orphan cloud sessions
Two concurrent threads (e.g. parallel subagents) could both pass the
'task_id in _active_sessions' check, both create cloud sessions via
network calls, and then one would overwrite the other — leaking the
first cloud session.

Add double-check after the lock is re-acquired: if another thread
already created a session while we were doing the network call, use
the existing one instead of orphaning it.
2026-03-17 04:09:16 -07:00
teknium1 344f3771cb fix(compressor): summary role can create consecutive same-role messages
The summary message role was determined only by the last head message,
ignoring the first tail message. This could create consecutive user
messages (rejected by Anthropic) when the tail started with 'user'.

Now checks both neighbors. Priority: avoid colliding with the head
(already committed). If the chosen role also collides with the tail,
flip it — but only if flipping wouldn't re-collide with the head.
2026-03-17 04:08:37 -07:00
teknium1 8b851e2eeb fix(toolsets): add missing 'messaging' toolset definition
send_message_tool registers under toolset='messaging' but no
'messaging' entry existed in TOOLSETS. This meant --disable-toolset
messaging and --enable-toolset messaging silently failed, and the
hermes tools config UI couldn't toggle the messaging tools.
2026-03-17 04:06:06 -07:00
teknium1 24282dceb1 fix(core): reset length_continue_retries after successful continuation
length_continue_retries and truncated_response_prefix were initialized
once before the outer loop and never reset after a successful
continuation. If a conversation hit length truncation once (counter=1),
succeeded on continuation, did more tool calls, then hit length again,
the counter started at 1 instead of 0 — reducing available retries
from 3 to 2. The stale truncated_response_prefix would also leak
into the next response.

Reset both after the prefix is consumed on a successful final response.
2026-03-17 04:05:20 -07:00
teknium1 1f0bb8742f fix(cron): get_due_jobs read jobs.json twice creating race window
get_due_jobs() called load_jobs() twice: once for filtering (with
_apply_skill_fields) and once for saving updates. Between the two
reads, another process could modify jobs.json, causing the filtering
and saving to operate on different versions.

Fix: load once, deepcopy for the skill-applied working list.
2026-03-17 04:03:42 -07:00
teknium1 0de75505f3 fix(anthropic): tool_choice 'none' still allowed tool calls
When tool_choice was 'none', the code did 'pass' — no tool_choice
was sent but tools were still included in the request. Anthropic
defaults to 'auto' when tools are present, so the model could still
call tools despite the caller requesting 'none'.

Fix: omit tools entirely from the request when tool_choice is 'none',
which is the only way to prevent tool use with the Anthropic API.
2026-03-17 04:02:49 -07:00
teknium1 e5a244ad5d fix(aux): reset auxiliary_is_nous flag on each resolution attempt
The module-level auxiliary_is_nous was set to True by _try_nous() and
never reset. In long-running gateway processes, once Nous was resolved
as auxiliary provider, the flag stayed True forever — even if
subsequent resolutions chose a different provider (e.g. OpenRouter).
This caused Nous product tags to be sent to non-Nous providers.

Reset the flag at the start of _resolve_auto() so only the winning
provider's flag persists.
2026-03-17 04:02:15 -07:00
Teknium 4433b83378 feat(web): add Parallel as alternative web search/extract backend (#1696)
* feat(web): add Parallel as alternative web search/extract backend

Adds Parallel (parallel.ai) as a drop-in alternative to Firecrawl for
web_search and web_extract tools using the official parallel-web SDK.

- Backend selection via WEB_SEARCH_BACKEND env var (auto/parallel/firecrawl)
- Auto mode prefers Firecrawl when both keys present; Parallel when sole backend
- web_crawl remains Firecrawl-only with clear error when unavailable
- Lazy SDK imports, interrupt support, singleton clients
- 16 new unit tests for backend selection and client config

Co-authored-by: s-jag <s-jag@users.noreply.github.com>

* fix: add PARALLEL_API_KEY to config registry and fix web_crawl policy tests

Follow-up for Parallel backend integration:
- Add PARALLEL_API_KEY to OPTIONAL_ENV_VARS (hermes doctor, env blocklist)
- Add to set_config_value api_keys list (hermes config set)
- Add to doctor keys display
- Fix 2 web_crawl policy tests that didn't set FIRECRAWL_API_KEY
  (needed now that web_crawl has a Firecrawl availability guard)

* refactor: explicit backend selection via hermes tools, not auto-detect

Replace the auto-detect backend selection with explicit user choice:
- hermes tools saves WEB_SEARCH_BACKEND to .env when user picks a provider
- _get_backend() reads the explicit choice first
- Fallback only for manual/legacy config (uses whichever key is present)
- _is_provider_active() shows [active] for the selected web backend
- Updated tests, docs, and .env.example to remove 'auto' mode language

* refactor: use config.yaml for web backend, not env var

Match the TTS/browser pattern — web.backend is stored in config.yaml
(set by hermes tools), not as a WEB_SEARCH_BACKEND env var.

- _load_web_config() reads web: section from config.yaml
- _get_backend() reads web.backend from config, falls back to key detection
- _configure_provider() saves to config dict (saved to config.yaml)
- _is_provider_active() reads from config dict
- Removed WEB_SEARCH_BACKEND from .env.example, set_config_value, docs
- Updated all tests to mock _load_web_config instead of env vars

---------

Co-authored-by: s-jag <s-jag@users.noreply.github.com>
2026-03-17 04:02:02 -07:00
crazywriter1 7049dba778 fix(docker): remove container on cleanup when container_persistent=false
When container_persistent=false, the inner mini-swe-agent cleanup only
runs 'docker stop' in the background, leaving containers in Exited state.
Now cleanup() also runs 'docker rm -f' to fully remove the container.

Also fixes pre-existing test failures in model_metadata (gpt-4.1 1M context),
setup tests (TTS provider step), and adds MockInnerDocker.cleanup().

Original fix by crazywriter1. Cherry-picked and adapted for current main.

Fixes #1679
2026-03-17 04:02:01 -07:00
Teknium 6405d389aa test: align Hermes setup and full-suite expectations (#1710)
Salvaged from PR #1708 by @kartikkabadi. Cherry-picked with authorship preserved.

Fixes pre-existing test failures from setup TTS prompt flow changes and environment-sensitive assumptions.

Co-authored-by: Kartik <user2@RentKars-MacBook-Air.local>
2026-03-17 04:01:37 -07:00
teknium1 b111f2a779 fix(gateway): Matrix and Mattermost never report as connected
Neither adapter called _mark_connected() after successful connect(),
so _running stayed False, runtime status never showed 'connected',
and /status reported them as offline even while actively processing
messages.

Add _mark_connected() calls matching the pattern used by Telegram
and DingTalk adapters.
2026-03-17 04:01:02 -07:00
Teknium b16186a32a feat(telegram): auto-detect HTML tags and use parse_mode=HTML in send_message (#1709)
* feat: interactive MCP tool configuration in hermes tools

Add the ability to selectively enable/disable individual MCP server
tools through the interactive 'hermes tools' TUI.

Changes:
- tools/mcp_tool.py: Add probe_mcp_server_tools() — lightweight function
  that temporarily connects to configured MCP servers, discovers their
  tools (names + descriptions), and disconnects. No registry side effects.

- hermes_cli/tools_config.py: Add 'Configure MCP tools' option to the
  interactive menu. When selected:
  1. Probes all enabled MCP servers for their available tools
  2. Shows a per-server curses checklist with tool descriptions
  3. Pre-selects tools based on existing include/exclude config
  4. Writes changes back as tools.exclude entries in config.yaml
  5. Reports which servers failed to connect

The existing CLI commands (hermes tools enable/disable server:tool)
continue to work unchanged. This adds the interactive TUI counterpart
so users can browse and toggle MCP tools visually.

Tests: 22 new tests covering probe function edge cases and interactive
flow (pre-selection, exclude/include modes, description truncation,
multi-server handling, error paths).

* feat(telegram): auto-detect HTML tags and use parse_mode=HTML in send_message

When _send_telegram detects HTML tags in the message body, it now sends
with parse_mode='HTML' instead of converting to MarkdownV2. This allows
cron jobs and agents to send rich HTML-formatted Telegram messages with
bold, italic, code blocks, etc. that render correctly.

Detection uses the same regex from PR #1568 by @ashaney:
  re.search(r'<[a-zA-Z/][^>]*>', message)

Plain-text and markdown messages continue through the existing
MarkdownV2 pipeline. The HTML fallback path also catches HTML parse
errors and falls back to plain text, matching the existing MarkdownV2
error handling.

Inspired by: github.com/ashaney — PR #1568
2026-03-17 03:56:06 -07:00
Teknium abdb4660d4 Merge pull request #1705 from NousResearch/fix/dingtalk-requirements-check
fix(dingtalk): requirements check passes with only one credential set
2026-03-17 03:53:51 -07:00
Teknium ed3bcae8bd Merge pull request #1704 from NousResearch/fix/hermes-state-thread-locks
fix(state): add missing thread locks to 4 SessionDB methods
2026-03-17 03:53:48 -07:00
Teknium 75c5136e5a Merge pull request #1703 from NousResearch/fix/anthropic-adapter-merge-content-loss
fix(anthropic): consecutive assistant message merge drops content on mixed types
2026-03-17 03:53:45 -07:00
Teknium 1781c05adb Merge pull request #1701 from NousResearch/fix/gateway-yaml-pii-redaction
fix(gateway): PII redaction config never read — missing yaml import
2026-03-17 03:53:43 -07:00
teknium1 f613da4219 fix: add missing subprocess import in _install_neutts_deps
The function uses subprocess.run() and subprocess.CalledProcessError but
never imported the module. This caused a NameError crash during setup
when users selected NeuTTS as their TTS provider.

Fixes #1698
2026-03-17 03:53:35 -07:00
Teknium d87655afff fix(gateway): persist watcher metadata in checkpoint for crash recovery (#1706)
Salvaged from PR #1573 by @eren-karakus0. Cherry-picked with authorship preserved.

Fixes #1143 — background process notifications resume after gateway restart.

Co-authored-by: Muhammet Eren Karakuş <erenkar950@gmail.com>
2026-03-17 03:52:15 -07:00
teknium1 a9da944a5d fix(dingtalk): requirements check passes with only one credential set
check_dingtalk_requirements() used 'and' to check for missing env vars:
  if not CLIENT_ID and not CLIENT_SECRET: return False

This only returns False when BOTH are missing. If only one is set
(e.g. CLIENT_ID without CLIENT_SECRET), the check passes and
connect() fails later with a cryptic error.

Fix: Change 'and' to 'or' so it returns False when EITHER is missing.
2026-03-17 03:50:45 -07:00
teknium1 efa778a0ef fix(state): add missing thread locks to 4 SessionDB methods
search_sessions(), clear_messages(), delete_session(), and
prune_sessions() all accessed self._conn without acquiring self._lock.
Every other method in the class uses the lock. In multi-threaded
contexts (gateway serving concurrent platform messages), these
unprotected methods can cause sqlite3.ProgrammingError from concurrent
cursor operations on the same connection.
2026-03-17 03:50:06 -07:00
teknium1 8b411b234d fix(anthropic): merge consecutive assistant messages with mixed content types
When two consecutive assistant messages had mixed content types (one
string, one list), the merge logic just replaced the earlier message
entirely with the later one (fixed[-1] = m), silently dropping the
earlier message's content.

Apply the same normalization pattern used in the tool_use merge path
(lines 952-956): convert both to list format before concatenating.
This preserves all content from both messages.
2026-03-17 03:48:55 -07:00
Teknium ce7418e274 feat: interactive MCP tool configuration in hermes tools (#1694)
Add the ability to selectively enable/disable individual MCP server
tools through the interactive 'hermes tools' TUI.

Changes:
- tools/mcp_tool.py: Add probe_mcp_server_tools() — lightweight function
  that temporarily connects to configured MCP servers, discovers their
  tools (names + descriptions), and disconnects. No registry side effects.

- hermes_cli/tools_config.py: Add 'Configure MCP tools' option to the
  interactive menu. When selected:
  1. Probes all enabled MCP servers for their available tools
  2. Shows a per-server curses checklist with tool descriptions
  3. Pre-selects tools based on existing include/exclude config
  4. Writes changes back as tools.exclude entries in config.yaml
  5. Reports which servers failed to connect

The existing CLI commands (hermes tools enable/disable server:tool)
continue to work unchanged. This adds the interactive TUI counterpart
so users can browse and toggle MCP tools visually.

Tests: 22 new tests covering probe function edge cases and interactive
flow (pre-selection, exclude/include modes, description truncation,
multi-server handling, error paths).
2026-03-17 03:48:44 -07:00
teknium1 7c9beb5829 fix(gateway): add missing yaml import for PII redaction config read
The privacy.redact_pii config reader on line 1546 used bare 'yaml'
which is not in scope — yaml is imported as '_yaml' at module level
(line 93) and as '_y' in other methods. The NameError was silently
caught by the try/except, so PII redaction never activated even when
configured.

Add a local 'import yaml as _pii_yaml' consistent with the pattern
used elsewhere in the file.
2026-03-17 03:48:15 -07:00
Teknium 56e0c90445 Merge pull request #1700 from NousResearch/fix/redacting-formatter-import
fix(core): RedactingFormatter NameError when verbose_logging=True
2026-03-17 03:46:49 -07:00
Teknium 490d37bb80 Merge pull request #1699 from NousResearch/fix/nous-model-fetch-kwargs
fix(cli): fetch_nous_models called with positional args — always TypeError
2026-03-17 03:46:43 -07:00
Teknium ea238721f0 Merge pull request #1697 from NousResearch/fix/gateway-skill-command-nameref
fix(gateway): NameError on skill slash commands — wrong variable reference
2026-03-17 03:46:08 -07:00
Teknium d417ba2a48 feat: add route-aware pricing estimates (#1695)
Salvaged from PR #1563 by @kshitijk4poor. Cherry-picked with authorship preserved.

- Route-aware pricing architecture replacing static MODEL_PRICING + heuristics
- Canonical usage normalization (Anthropic/OpenAI/Codex API shapes)
- Cache-aware billing (separate cache_read/cache_write rates)
- Cost status tracking (estimated/included/unknown/actual)
- OpenRouter live pricing via models API
- Schema migration v4→v5 with billing metadata columns
- Removed speculative forward-looking entries
- Removed cost display from CLI status bar
- Threaded OpenRouter metadata pre-warm

Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-17 03:44:44 -07:00
teknium1 c713d01e72 fix(core): move RedactingFormatter import before conditional block
RedactingFormatter was imported inside 'if not has_errors_log_handler:'
(line 461) but also used unconditionally in the verbose_logging block
(line 479). When the error log handler already exists (e.g. second
AIAgent in the same process) AND verbose_logging=True, the import was
skipped and line 479 raised NameError.

Fix: Move the import one level up so it's always available regardless
of whether the error log handler already exists.
2026-03-17 03:43:21 -07:00
teknium1 f95c6a221b fix(cli): use keyword args for fetch_nous_models (always TypeError)
fetch_nous_models() uses keyword-only parameters (the * separator in
its signature), but models.py called it with positional args and in
the wrong order (api_key first, base_url second). This always raised
TypeError, silently caught by except Exception: pass.

Result: Nous provider model list was completely broken — /model
autocomplete and provider_model_ids('nous') always fell back to the
static model catalog instead of fetching live models.
2026-03-17 03:42:46 -07:00
teknium1 718d4b013c fix(gateway): use correct variable for skill slash command task_id
Line 1482 referenced 'session_key' which is not defined until line 1519,
causing a NameError on every skill slash command invocation in the gateway
(e.g. /deploy, /plan-with-skill). The try/except silently swallowed the
error, making all user-defined skill slash commands silently fail.

The correct variable is '_quick_key', defined at line 1292 (same variable
used by the /plan handler on line 1379).
2026-03-17 03:42:15 -07:00
Teknium d9b9987ad3 docs: comprehensive documentation update for recent features
New documentation:
- DingTalk messaging platform setup guide (dingtalk.md)

Updated existing docs:
- quickstart.md: add Alibaba Cloud, Kilo Code, Vercel AI Gateway to provider table
- configuration.md: add Alibaba Cloud provider, website blocklist config,
  light/dark theme mode, smart approvals (ask/smart/off)
- environment-variables.md: add Mattermost, Matrix, DingTalk, Browser Use,
  DashScope env vars
- browser.md: add Browser Use cloud provider, /browser connect CDP mode,
  multi-provider architecture, fix limitation section contradiction
- slash-commands.md: add /tools enable/disable/list, /browser connect/disconnect/status
- messaging/index.md: add DingTalk, Mattermost, Matrix to architecture diagram,
  platform toolset table, security allowlists, and Next Steps links
- security.md: add website access policy (blocklist) documentation
- sidebars.ts: add Mattermost, Matrix, DingTalk to Messaging Gateway sidebar
2026-03-17 03:42:02 -07:00
Teknium ba728f3e63 docs: add DingTalk setup guide and Alibaba Cloud provider to Docusaurus docs (#1692)
* feat(gateway): wire DingTalk into gateway setup and platform maps

Add DingTalk to:
- hermes_cli/gateway.py: _PLATFORMS list with setup instructions,
  AppKey/AppSecret prompts, and Stream Mode setup guide
- gateway/run.py: all platform-to-config-key maps, allowed users
  map, allow-all-users map, and toolset resolution maps

* docs: add DingTalk setup guide and Alibaba Cloud provider to docs

- Create website/docs/user-guide/messaging/dingtalk.md with full
  setup guide (prerequisites, app creation, config, access control,
  features, troubleshooting, env var reference)
- Update messaging/index.md: add DingTalk to diagram, toolsets
  table, security examples, and next steps
- Update configuration.md: add Alibaba Cloud to provider table
2026-03-17 03:26:44 -07:00
Teknium d83efbb5bc feat(gateway): wire DingTalk into gateway setup and platform maps (#1690)
Add DingTalk to:
- hermes_cli/gateway.py: _PLATFORMS list with setup instructions,
  AppKey/AppSecret prompts, and Stream Mode setup guide
- gateway/run.py: all platform-to-config-key maps, allowed users
  map, allow-all-users map, and toolset resolution maps
2026-03-17 03:19:45 -07:00
Teknium 3cb83404e9 Merge pull request #1683 from NousResearch/feat/mattermost-matrix-adapters
feat: add Mattermost and Matrix gateway adapters
2026-03-17 03:18:25 -07:00
teknium1 1ae1e361b7 docs: add Mattermost and Matrix setup guides
Full Docusaurus docs following the Discord guide structure:

Mattermost (277 lines):
- Step-by-step: enable bot accounts, create bot, get token, add to channels
- All env vars documented with examples
- Reply mode (thread/off), home channel, troubleshooting

Matrix (354 lines):
- Step-by-step: create bot account, get access token (Element or API)
- Dual auth (token + password), E2EE section with libolm install
- Thread support, DM detection, home room, troubleshooting
- Works with any homeserver (Synapse, Conduit, Dendrite, matrix.org)
2026-03-17 03:18:16 -07:00
teknium1 016b1e10d7 feat: register Mattermost and Matrix env vars in OPTIONAL_ENV_VARS
Adds both platforms to the config system so hermes setup, hermes doctor,
and hermes config properly discover and manage their env vars.

- MATTERMOST_URL, MATTERMOST_TOKEN, MATTERMOST_ALLOWED_USERS
- MATRIX_HOMESERVER, MATRIX_ACCESS_TOKEN, MATRIX_USER_ID, MATRIX_ALLOWED_USERS
- Extra env keys for .env sanitizer: MATTERMOST_HOME_CHANNEL,
  MATTERMOST_REPLY_MODE, MATRIX_PASSWORD, MATRIX_ENCRYPTION, MATRIX_HOME_ROOM
2026-03-17 03:18:16 -07:00
teknium1 c3ce6108e3 test: add comprehensive tests for Mattermost and Matrix adapters
77 tests covering:

Mattermost (37 tests):
- Platform enum and config loading
- Message formatting (image markdown stripping)
- Message chunking at 4000 chars
- Send with mocked aiohttp (payload, threading, errors)
- WebSocket event parsing (double-encoded JSON!)
- File upload flow
- Post dedup cache (TTL, pruning)
- Requirements check

Matrix (40 tests):
- Platform enum and config loading (token + password auth, E2EE)
- mxc:// to HTTP URL conversion (authenticated v1.11+ endpoint)
- DM detection via m.direct cache
- Reply fallback stripping
- Thread detection from m.relates_to
- Message formatting and markdown to HTML
- Display name resolution
- Requirements check
2026-03-17 03:18:16 -07:00
teknium1 cd67f60e01 feat: add Mattermost and Matrix gateway adapters
Add support for Mattermost (self-hosted Slack alternative) and Matrix
(federated messaging protocol) as messaging platforms.

Mattermost adapter:
- REST API v4 client for posts, files, channels, typing indicators
- WebSocket listener for real-time 'posted' events with reconnect backoff
- Thread support via root_id
- File upload/download with auth-aware caching
- Dedup cache (5min TTL, 2000 entries)
- Full self-hosted instance support

Matrix adapter:
- matrix-nio AsyncClient with sync loop
- Dual auth: access token or user_id + password
- Optional E2EE via matrix-nio[e2e] (libolm)
- Thread support via m.thread (MSC3440)
- Reply support via m.in_reply_to with fallback stripping
- Media upload/download via mxc:// URLs (authenticated v1.11+ endpoint)
- Auto-join on room invite
- DM detection via m.direct account data with sync fallback
- Markdown to HTML conversion

Fixes applied over original PR #1225 by @cyb0rgk1tty:
- Mattermost: add timeout to file downloads, wrap API helpers in
  try/except for network errors, download incoming files immediately
  with auth headers instead of passing auth-required URLs
- Matrix: use authenticated media endpoint (/_matrix/client/v1/media/),
  robust m.direct cache with sync fallback, prefer aiohttp over httpx

Install Matrix support: pip install 'hermes-agent[matrix]'
Mattermost needs no extra deps (uses aiohttp).

Salvaged from PR #1225 by @cyb0rgk1tty with fixes.
2026-03-17 03:18:16 -07:00
Teknium 07549c967a feat: add SMS (Twilio) platform adapter
Add SMS as a first-class messaging platform via the Twilio API.
Shares credentials with the existing telephony skill — same
TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN, TWILIO_PHONE_NUMBER env vars.

Adapter (gateway/platforms/sms.py):
- aiohttp webhook server for inbound (Twilio form-encoded POSTs)
- Twilio REST API with Basic auth for outbound
- Markdown stripping, smart chunking at 1600 chars
- Echo loop prevention, phone number redaction in logs

Integration (13 files):
- gateway config, run, channel_directory
- agent prompt_builder (SMS platform hint)
- cron scheduler, cronjob tools
- send_message_tool (_send_sms via Twilio API)
- toolsets (hermes-sms + hermes-gateway)
- gateway setup wizard, status display
- pyproject.toml (sms optional extra)
- 21 tests

Docs:
- website/docs/user-guide/messaging/sms.md (full setup guide)
- Updated messaging index (architecture, toolsets, security, links)
- Updated environment-variables.md reference

Inspired by PR #1575 (@sunsakis), rewritten for Twilio.
2026-03-17 03:14:53 -07:00
Teknium 3d38d85287 docs: add Alibaba Cloud and DingTalk to setup wizard and docs (#1687)
* feat(gateway): add DingTalk platform adapter

Add DingTalk as a messaging platform using the dingtalk-stream SDK
for real-time message reception via Stream Mode (no webhook needed).
Replies are sent via session webhook using markdown format.

Features:
- Stream Mode connection (long-lived WebSocket, no public URL needed)
- Text and rich text message support
- DM and group chat support
- Message deduplication with 5-minute window
- Auto-reconnection with exponential backoff
- Session webhook caching for reply routing

Configuration:
  export DINGTALK_CLIENT_ID=your-app-key
  export DINGTALK_CLIENT_SECRET=your-app-secret

  # or in config.yaml:
  platforms:
    dingtalk:
      enabled: true
      extra:
        client_id: your-app-key
        client_secret: your-app-secret

Files:
- gateway/platforms/dingtalk.py (340 lines) — adapter implementation
- gateway/config.py — add DINGTALK to Platform enum
- gateway/run.py — add DingTalk to _create_adapter
- hermes_cli/config.py — add env vars to _EXTRA_ENV_KEYS
- hermes_cli/tools_config.py — add dingtalk to PLATFORMS
- tests/gateway/test_dingtalk.py — 21 tests

* docs: add Alibaba Cloud and DingTalk to setup wizard and docs

Wire Alibaba Cloud (DashScope) into hermes setup and hermes model
provider selection flows. Add DingTalk env vars to documentation.

Changes:
- setup.py: Add Alibaba Cloud as provider choice (index 11) with
  DASHSCOPE_API_KEY prompt and model studio link
- main.py: Add alibaba to provider_labels, providers list, and
  model flow dispatch
- environment-variables.md: Add DASHSCOPE_API_KEY, DINGTALK_CLIENT_ID,
  DINGTALK_CLIENT_SECRET, and alibaba to HERMES_INFERENCE_PROVIDER
2026-03-17 03:13:58 -07:00
teknium1 6fc76ef954 fix: harden website blocklist — default off, TTL cache, fail-open, guarded imports
- Default enabled: false (zero overhead when not configured)
- Fast path: cached disabled state skips all work immediately
- TTL cache (30s) for parsed policy — avoids re-reading config.yaml
  on every URL check
- Missing shared files warn + skip instead of crashing all web tools
- Lazy yaml import — missing PyYAML doesn't break browser toolset
- Guarded browser_tool import — fail-open lambda fallback
- check_website_access never raises for default path (fail-open with
  warning log); only raises with explicit config_path (test mode)
- Simplified enforcement code in web_tools/browser_tool — no more
  try/except wrappers since errors are handled internally
2026-03-17 03:11:26 -07:00
Teknium d132a3dfbb feat(skills): add inference.sh skill (terminal-based, no custom tools) (#1686)
Add inference.sh as a built-in skill that uses the terminal tool to
run infsh CLI commands. No custom tools or tool registration — the
skill teaches the agent how to use the infsh binary via terminal.

Covers 150+ AI apps: image gen (FLUX, Reve, Seedream), video (Veo,
Wan, Seedance), LLMs, search (Tavily, Exa), 3D, avatars, and more.

Includes reference docs for authentication, app discovery, running
apps, and CLI command reference.

Based on PR #1021 by @okaris, reworked as a skill-only integration.

Co-authored-by: okaris <okaris@users.noreply.github.com>
2026-03-17 03:06:53 -07:00
Teknium a6dcc231f8 feat(gateway): add DingTalk platform adapter (#1685)
Add DingTalk as a messaging platform using the dingtalk-stream SDK
for real-time message reception via Stream Mode (no webhook needed).
Replies are sent via session webhook using markdown format.

Features:
- Stream Mode connection (long-lived WebSocket, no public URL needed)
- Text and rich text message support
- DM and group chat support
- Message deduplication with 5-minute window
- Auto-reconnection with exponential backoff
- Session webhook caching for reply routing

Configuration:
  export DINGTALK_CLIENT_ID=your-app-key
  export DINGTALK_CLIENT_SECRET=your-app-secret

  # or in config.yaml:
  platforms:
    dingtalk:
      enabled: true
      extra:
        client_id: your-app-key
        client_secret: your-app-secret

Files:
- gateway/platforms/dingtalk.py (340 lines) — adapter implementation
- gateway/config.py — add DINGTALK to Platform enum
- gateway/run.py — add DingTalk to _create_adapter
- hermes_cli/config.py — add env vars to _EXTRA_ENV_KEYS
- hermes_cli/tools_config.py — add dingtalk to PLATFORMS
- tests/gateway/test_dingtalk.py — 21 tests
2026-03-17 03:04:58 -07:00
Teknium c3d626eb07 Revert "feat: add inference.sh integration (infsh tool + skill) (#1682)" (#1684)
This reverts commit 6020db0243.
2026-03-17 03:01:30 -07:00
Teknium 6d1c5d4491 refactor(tools): extract position calculation logic in fuzzy_match (#1681)
Extract the repeated line-position calculation pattern into a
_calculate_line_positions() helper. The same 4-line pattern was
duplicated across _strategy_trimmed_boundary, _strategy_block_anchor,
_strategy_context_aware, and _find_normalized_matches. Also
standardizes the end_pos clamping (some sites used min(), some used
an if-guard).

Based on PR #1604 by aydnOktay.

Co-authored-by: aydnOktay <aydnOktay@users.noreply.github.com>
2026-03-17 03:00:17 -07:00
teknium1 30c417fe70 feat: add website blocklist enforcement for web/browser tools (#1064)
Adds security.website_blocklist config for user-managed domain blocking
across URL-capable tools. Enforced at the tool level (not monkey-patching)
so it's safe and predictable.

- tools/website_policy.py: shared policy loader with domain normalization,
  wildcard support (*.tracking.example), shared file imports, and
  structured block metadata
- web_extract: pre-fetch URL check + post-redirect recheck
- web_crawl: pre-crawl URL check + per-page URL recheck
- browser_navigate: pre-navigation URL check
- Blocked responses include blocked_by_policy metadata so the agent
  can explain exactly what was denied

Config:
  security:
    website_blocklist:
      enabled: true
      domains: ["evil.com", "*.tracking.example"]
      shared_files: ["team-blocklist.txt"]

Salvaged from PR #1086 by @kshitijk4poor. Browser post-redirect checks
deferred (browser_tool was fully rewritten since the PR branched).

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-17 02:59:39 -07:00
Teknium 6020db0243 feat: add inference.sh integration (infsh tool + skill) (#1682)
Add inference.sh CLI (infsh) as a tool integration, giving agents
access to 150+ AI apps through a single CLI — image gen (FLUX, Reve,
Seedream), video (Veo, Wan, Seedance), LLMs, search (Tavily, Exa),
3D, avatar/lipsync, and more. One API key manages all services.

Tools:
- infsh: run any infsh CLI command (app list, app run, etc.)
- infsh_install: install the CLI if not present

Registered as an 'inference' toolset (opt-in, not in core tools).
Includes comprehensive skill docs with examples for all app categories.

Changes from original PR:
- NOT added to _HERMES_CORE_TOOLS (available via --toolsets inference)
- Added 12 tests covering tool registration, command execution,
  error handling, timeout, JSON parsing, and install flow

Inspired by PR #1021 by @okaris.

Co-authored-by: okaris <okaris@users.noreply.github.com>
2026-03-17 02:59:21 -07:00
Teknium d9a7b83ae3 fix: make _is_write_denied robust to Path objects (#1678)
Cast path to str() before os.path.expanduser() to handle pathlib.Path
inputs safely.

Based on PR #1051 by JackTheGit.

Co-authored-by: JackTheGit <JackTheGit@users.noreply.github.com>
2026-03-17 02:57:02 -07:00
Teknium 1d5a39e002 fix: thread safety for concurrent subagent delegation (#1672)
* fix: thread safety for concurrent subagent delegation

Four thread-safety fixes that prevent crashes and data races when
running multiple subagents concurrently via delegate_task:

1. Remove redirect_stdout/stderr from delegate_tool — mutating global
   sys.stdout races with the spinner thread when multiple children start
   concurrently, causing segfaults. Children already run with
   quiet_mode=True so the redirect was redundant.

2. Split _run_single_child into _build_child_agent (main thread) +
   _run_single_child (worker thread). AIAgent construction creates
   httpx/SSL clients which are not thread-safe to initialize
   concurrently.

3. Add threading.Lock to SessionDB — subagents share the parent's
   SessionDB and call create_session/append_message from worker threads
   with no synchronization.

4. Add _active_children_lock to AIAgent — interrupt() iterates
   _active_children while worker threads append/remove children.

5. Add _client_cache_lock to auxiliary_client — multiple subagent
   threads may resolve clients concurrently via call_llm().

Based on PR #1471 by peteromallet.

* feat: Honcho base_url override via config.yaml + quick command alias type

Two features salvaged from PR #1576:

1. Honcho base_url override: allows pointing Hermes at a remote
   self-hosted Honcho deployment via config.yaml:

     honcho:
       base_url: "http://192.168.x.x:8000"

   When set, this overrides the Honcho SDK's environment mapping
   (production/local), enabling LAN/VPN Honcho deployments without
   requiring the server to live on localhost. Uses config.yaml instead
   of env var (HONCHO_URL) per project convention.

2. Quick command alias type: adds a new 'alias' quick command type
   that rewrites to another slash command before normal dispatch:

     quick_commands:
       sc:
         type: alias
         target: /context

   Supports both CLI and gateway. Arguments are forwarded to the
   target command.

Based on PR #1576 by redhelix.

---------

Co-authored-by: peteromallet <peteromallet@users.noreply.github.com>
Co-authored-by: redhelix <redhelix@users.noreply.github.com>
2026-03-17 02:53:33 -07:00
Teknium fd61ae13e5 revert: revert SMS (Telnyx) platform adapter for review
This reverts commit ef67037f8e.
2026-03-17 02:53:30 -07:00
Teknium ef67037f8e feat: add SMS (Telnyx) platform adapter
Implement SMS as a first-class messaging platform following
ADDING_A_PLATFORM.md checklist. All 16 integration points covered:

- gateway/platforms/sms.py: Core adapter with aiohttp webhook server,
  Telnyx REST API send, markdown stripping, 1600-char chunking,
  echo loop prevention, multi-number reply-from tracking
- gateway/config.py: Platform.SMS enum + env override block
- gateway/run.py: Adapter factory + auth maps (SMS_ALLOWED_USERS,
  SMS_ALLOW_ALL_USERS)
- toolsets.py: hermes-sms toolset + included in hermes-gateway
- cron/scheduler.py: SMS in platform_map for cron delivery
- tools/send_message_tool.py: SMS routing + _send_sms() standalone sender
- tools/cronjob_tools.py: 'sms' in deliver description
- gateway/channel_directory.py: SMS in session-based discovery
- agent/prompt_builder.py: SMS platform hint (plain text, concise)
- hermes_cli/status.py: SMS in platforms status display
- hermes_cli/gateway.py: SMS in setup wizard with Telnyx instructions
- pyproject.toml: sms optional dependency group (aiohttp>=3.9.0)
- tests/gateway/test_sms.py: Unit tests for config, format, truncate,
  echo prevention, requirements, toolset integration

Co-authored-by: sunsakis <teo@sunsakis.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-17 02:52:34 -07:00
Teknium 71c6b1ee99 fix: remove ANTHROPIC_BASE_URL env var to avoid collisions (#1675)
ANTHROPIC_BASE_URL collides with Claude Code and other Anthropic
tooling. Remove it from the Anthropic provider — base URL overrides
should go through config.yaml model.base_url instead.

The Alibaba/DashScope provider has its own dedicated base URL and
API key env vars which don't collide with anything.
2026-03-17 02:51:49 -07:00
teknium1 a1c81360a5 feat(cli): skin-aware light/dark theme mode with terminal auto-detection
Add display.theme_mode setting (auto/light/dark) that makes the CLI
readable on light terminal backgrounds.

- Auto-detect terminal background via COLORFGBG, OSC 11, and macOS
  appearance (fallback chain in hermes_cli/colors.py)
- Add colors_light overrides to all 7 built-in skins with dark/readable
  colors for light backgrounds
- SkinConfig.get_color() now returns light overrides when theme is light
- get_prompt_toolkit_style_overrides() uses light bg colors for
  completion menus in light mode
- init_skin_from_config() reads display.theme_mode from config
- 7 new tests covering theme mode resolution, detection fallbacks,
  and light-mode skin overrides

Salvaged from PR #1187 by @peteromallet. Core design preserved;
adapted to current main (kept all existing helpers, tool_emojis,
convenience functions that were added after the PR branched).

Co-authored-by: Peter O'Mallet <peteromallet@users.noreply.github.com>
2026-03-17 02:51:40 -07:00
Teknium d156942419 fix(telegram): aggregate split text messages before dispatching (#1674)
When a user sends a long message, Telegram clients split it into
multiple updates that arrive within milliseconds of each other.
Previously each chunk was dispatched independently — the first would
start the agent, and subsequent chunks would interrupt or queue as
separate turns, causing the agent to only see part of the message.

Add text message batching to TelegramAdapter following the same pattern
as the existing photo burst batching:

- _enqueue_text_event() buffers text by session key, concatenating
  chunks that arrive in rapid succession
- _flush_text_batch() dispatches the combined message after a 0.6s
  quiet period (configurable via HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS)
- Timer resets on each new chunk, so all parts of a split arrive
  before the batch is dispatched

Reported by NulledVector on Discord.
2026-03-17 02:49:57 -07:00
Teknium 7042a748f5 feat: add Alibaba Cloud provider and Anthropic base_url override (#1673)
Add Alibaba Cloud (DashScope) as a first-class inference provider
using the Anthropic-compatible endpoint. This gives access to Qwen
models (qwen3.5-plus, qwen3-max, qwen3-coder-plus, etc.) through
the same api_mode as native Anthropic.

Also add ANTHROPIC_BASE_URL env var support so users can point the
Anthropic provider at any compatible endpoint.

Changes:
- auth.py: Add alibaba ProviderConfig + ANTHROPIC_BASE_URL on anthropic
- models.py: Add alibaba to catalog, labels, aliases (dashscope/aliyun/qwen), provider order
- runtime_provider.py: Add alibaba resolution (anthropic_messages api_mode) + ANTHROPIC_BASE_URL
- model_metadata.py: Add Qwen model context lengths (128K)
- config.py: Add DASHSCOPE_API_KEY, DASHSCOPE_BASE_URL, ANTHROPIC_BASE_URL env vars

Usage:
  hermes --provider alibaba --model qwen3.5-plus
  # or via aliases:
  hermes --provider qwen --model qwen3-max
2026-03-17 02:49:22 -07:00
Teknium d9d937b7f7 fix: detect Claude Code version dynamically for OAuth user-agent
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

* fix(security): block sandbox backend creds from subprocess env (#1264)

Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.

Cherry-picked from PR #1571 by ygd58.

* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)

When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.

Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.

The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).

* fix(gateway): /model shows active fallback model instead of config default (#1615)

When the agent falls back to a different model (e.g. due to rate
limiting), /model still showed the config default. Now tracks the
effective model/provider after each agent run and displays it.

Cleared when the primary model succeeds again or the user explicitly
switches via /model.

Cherry-picked from PR #1616 by MaxKerkula. Added hasattr guard for
test compatibility.

* feat(gateway): inject reply-to message context for out-of-session replies (#1594)

When a user replies to a Telegram message, check if the quoted text
exists in the current session transcript. If missing (from cron jobs,
background tasks, or old sessions), prepend [Replying to: "..."] to
the message so the agent has context about what's being referenced.

- Add reply_to_text field to MessageEvent (base.py)
- Populate from Telegram's reply_to_message (text or caption)
- Inject context in _handle_message when not found in history

Based on PR #1596 by anpicasso (cherry-picked reply-to feature only,
excluded unrelated /server command and background delegation changes).

* fix: recognize Claude Code OAuth credentials in startup gate (#1455)

The _has_any_provider_configured() startup check didn't look for
Claude Code OAuth credentials (~/.claude/.credentials.json). Users
with only Claude Code auth got the setup wizard instead of starting.

Cherry-picked from PR #1455 by kshitijk4poor.

* perf: use ripgrep for file search (200x faster than find)

search_files(target='files') now uses rg --files -g instead of find.
Ripgrep respects .gitignore, excludes hidden dirs by default, and has
parallel directory traversal — ~200x faster on wide trees (0.14s vs 34s
benchmarked on 164-repo tree).

Falls back to find when rg is unavailable, preserving hidden-dir
exclusion and BSD find compatibility.

Salvaged from PR #1464 by @light-merlin-dark (Merlin) — adapted to
preserve hidden-dir exclusion added since the original PR.

* refactor(tts): replace NeuTTS optional skill with built-in provider + setup flow

Remove the optional skill (redundant now that NeuTTS is a built-in TTS
provider). Replace neutts_cli dependency with a standalone synthesis
helper (tools/neutts_synth.py) that calls the neutts Python API directly
in a subprocess.

Add TTS provider selection to hermes setup:
- 'hermes setup' now prompts for TTS provider after model selection
- 'hermes setup tts' available as standalone section
- Selecting NeuTTS checks for deps and offers to install:
  espeak-ng (system) + neutts[all] (pip)
- ElevenLabs/OpenAI selections prompt for API keys
- Tool status display shows NeuTTS install state

Changes:
- Remove optional-skills/mlops/models/neutts/ (skill + CLI scaffold)
- Add tools/neutts_synth.py (standalone synthesis subprocess helper)
- Move jo.wav/jo.txt to tools/neutts_samples/ (bundled default voice)
- Refactor _generate_neutts() — uses neutts API via subprocess, no
  neutts_cli dependency, config-driven ref_audio/ref_text/model/device
- Add TTS setup to hermes_cli/setup.py (SETUP_SECTIONS, tool status)
- Update config.py defaults (ref_audio, ref_text, model, device)

* fix(docker): add explicit env allowlist for container credentials (#1436)

Docker terminal sessions are secret-dark by default. This adds
terminal.docker_forward_env as an explicit allowlist for env vars
that may be forwarded into Docker containers.

Values resolve from the current shell first, then fall back to
~/.hermes/.env. Only variables the user explicitly lists are
forwarded — nothing is auto-exposed.

Cherry-picked from PR #1449 by @teknium1, conflict-resolved onto
current main.

Fixes #1436
Supersedes #1439

* fix: email send_typing metadata param + ☤ Hermes staff symbol

- email.py: add missing metadata parameter to send_typing() to match
  BasePlatformAdapter signature (PR #1431 by @ItsChoudhry)
- README.md: ⚕ → ☤ — the caduceus is Hermes's staff, not the
  medical Staff of Asclepius (PR #1420 by @rianczerwinski)

* fix(whatsapp): support LID format in self-chat mode (#1556)

WhatsApp now uses LID (Linked Identity Device) format alongside classic
@s.whatsapp.net. Self-chat detection checked only the classic format,
breaking self-chat mode for users on newer WhatsApp versions.

- Check both sock.user.id and sock.user.lid for self-chat detection
- Accept 'append' message type in addition to 'notify' (self-chat
  messages arrive as 'append')
- Track sent message IDs to prevent echo-back loops with media
- Add WHATSAPP_DEBUG env var for troubleshooting

Based on PR #1556 by jcorrego (manually applied due to cherry-pick
conflicts).

* fix: detect Claude Code version dynamically for OAuth user-agent

The _CLAUDE_CODE_VERSION was hardcoded to '2.1.2' but Anthropic
rejects OAuth requests when the spoofed user-agent version is too
far behind the current Claude Code release. The error is a generic
400 with just 'Error' as the message, making it very hard to diagnose.

Fix: detect the installed version via 'claude --version' at import
time, falling back to a bumped static constant (2.1.74) when Claude
Code isn't installed. This means users who keep Claude Code updated
never hit stale-version rejections.

Reported by Jack — changing the version string to match the installed
claude binary fixed persistent OAuth 400 errors immediately.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
Co-authored-by: Max K <MaxKerkula@users.noreply.github.com>
Co-authored-by: Angello Picasso <angello.picasso@devsu.com>
Co-authored-by: kshitij <kshitijk4poor@users.noreply.github.com>
Co-authored-by: jcorrego <jcorrego@users.noreply.github.com>
2026-03-17 02:48:33 -07:00
Teknium 65be657a79 feat(skills): add Sherlock OSINT username search skill
Add optional skill for username enumeration across 400+ social networks
using the Sherlock Project CLI (https://github.com/sherlock-project/sherlock).

Features:
- Smart username extraction from user messages
- Installation verification before execution
- Categorized output with clickable links
- Ethical use guidelines
- Docker, pipx, and pip installation paths

Co-authored-by: unmodeled-tyler <unmodeled.tyler@proton.me>
2026-03-17 02:48:21 -07:00
Teknium b197bb01d3 docs(configuration): clarify self-hosted firecrawl setup
Co-authored-by: caentzminger <112503481+caentzminger@users.noreply.github.com>
2026-03-17 02:45:13 -07:00
Teknium a3ac142c83 fix(core): guard print() calls in run_conversation() against OSError
In headless environments (systemd, Docker, nohup) stdout can become
unavailable mid-session. Raw print() raises OSError which crashes
cron jobs — agent finishes work but delivery never happens because
the error handler's own print() also raises OSError.

Fix:
- Add _safe_print() static method that wraps print() with try/except
  OSError — silently drops output when stdout is broken
- Make _vprint() use _safe_print() — protects all calls through the
  verbose print path
- Convert raw print() calls in run_conversation() hot path to use
  _safe_print(): starting conversation, interrupt, budget exhausted,
  preflight compression, context cache, conversation completed
- Error handler print (the cascading crash point) gets explicit
  try/except with logger.error() fallback so diagnostics aren't lost

Fixes #845
Closes #1358 (superseded — PR was 323 commits stale with a bug)
2026-03-17 02:41:01 -07:00
Teknium 342a0ad372 fix(whatsapp): support LID format in self-chat mode (#1556)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

* fix(security): block sandbox backend creds from subprocess env (#1264)

Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.

Cherry-picked from PR #1571 by ygd58.

* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)

When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.

Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.

The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).

* fix(gateway): /model shows active fallback model instead of config default (#1615)

When the agent falls back to a different model (e.g. due to rate
limiting), /model still showed the config default. Now tracks the
effective model/provider after each agent run and displays it.

Cleared when the primary model succeeds again or the user explicitly
switches via /model.

Cherry-picked from PR #1616 by MaxKerkula. Added hasattr guard for
test compatibility.

* feat(gateway): inject reply-to message context for out-of-session replies (#1594)

When a user replies to a Telegram message, check if the quoted text
exists in the current session transcript. If missing (from cron jobs,
background tasks, or old sessions), prepend [Replying to: "..."] to
the message so the agent has context about what's being referenced.

- Add reply_to_text field to MessageEvent (base.py)
- Populate from Telegram's reply_to_message (text or caption)
- Inject context in _handle_message when not found in history

Based on PR #1596 by anpicasso (cherry-picked reply-to feature only,
excluded unrelated /server command and background delegation changes).

* fix: recognize Claude Code OAuth credentials in startup gate (#1455)

The _has_any_provider_configured() startup check didn't look for
Claude Code OAuth credentials (~/.claude/.credentials.json). Users
with only Claude Code auth got the setup wizard instead of starting.

Cherry-picked from PR #1455 by kshitijk4poor.

* perf: use ripgrep for file search (200x faster than find)

search_files(target='files') now uses rg --files -g instead of find.
Ripgrep respects .gitignore, excludes hidden dirs by default, and has
parallel directory traversal — ~200x faster on wide trees (0.14s vs 34s
benchmarked on 164-repo tree).

Falls back to find when rg is unavailable, preserving hidden-dir
exclusion and BSD find compatibility.

Salvaged from PR #1464 by @light-merlin-dark (Merlin) — adapted to
preserve hidden-dir exclusion added since the original PR.

* refactor(tts): replace NeuTTS optional skill with built-in provider + setup flow

Remove the optional skill (redundant now that NeuTTS is a built-in TTS
provider). Replace neutts_cli dependency with a standalone synthesis
helper (tools/neutts_synth.py) that calls the neutts Python API directly
in a subprocess.

Add TTS provider selection to hermes setup:
- 'hermes setup' now prompts for TTS provider after model selection
- 'hermes setup tts' available as standalone section
- Selecting NeuTTS checks for deps and offers to install:
  espeak-ng (system) + neutts[all] (pip)
- ElevenLabs/OpenAI selections prompt for API keys
- Tool status display shows NeuTTS install state

Changes:
- Remove optional-skills/mlops/models/neutts/ (skill + CLI scaffold)
- Add tools/neutts_synth.py (standalone synthesis subprocess helper)
- Move jo.wav/jo.txt to tools/neutts_samples/ (bundled default voice)
- Refactor _generate_neutts() — uses neutts API via subprocess, no
  neutts_cli dependency, config-driven ref_audio/ref_text/model/device
- Add TTS setup to hermes_cli/setup.py (SETUP_SECTIONS, tool status)
- Update config.py defaults (ref_audio, ref_text, model, device)

* fix(docker): add explicit env allowlist for container credentials (#1436)

Docker terminal sessions are secret-dark by default. This adds
terminal.docker_forward_env as an explicit allowlist for env vars
that may be forwarded into Docker containers.

Values resolve from the current shell first, then fall back to
~/.hermes/.env. Only variables the user explicitly lists are
forwarded — nothing is auto-exposed.

Cherry-picked from PR #1449 by @teknium1, conflict-resolved onto
current main.

Fixes #1436
Supersedes #1439

* fix: email send_typing metadata param + ☤ Hermes staff symbol

- email.py: add missing metadata parameter to send_typing() to match
  BasePlatformAdapter signature (PR #1431 by @ItsChoudhry)
- README.md: ⚕ → ☤ — the caduceus is Hermes's staff, not the
  medical Staff of Asclepius (PR #1420 by @rianczerwinski)

* fix(whatsapp): support LID format in self-chat mode (#1556)

WhatsApp now uses LID (Linked Identity Device) format alongside classic
@s.whatsapp.net. Self-chat detection checked only the classic format,
breaking self-chat mode for users on newer WhatsApp versions.

- Check both sock.user.id and sock.user.lid for self-chat detection
- Accept 'append' message type in addition to 'notify' (self-chat
  messages arrive as 'append')
- Track sent message IDs to prevent echo-back loops with media
- Add WHATSAPP_DEBUG env var for troubleshooting

Based on PR #1556 by jcorrego (manually applied due to cherry-pick
conflicts).

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
Co-authored-by: Max K <MaxKerkula@users.noreply.github.com>
Co-authored-by: Angello Picasso <angello.picasso@devsu.com>
Co-authored-by: kshitij <kshitijk4poor@users.noreply.github.com>
Co-authored-by: jcorrego <jcorrego@users.noreply.github.com>
2026-03-17 02:40:55 -07:00
Teknium 35d948b6e1 feat: add Kilo Code (kilocode) as first-class inference provider (#1666)
Add Kilo Gateway (kilo.ai) as an API-key provider with OpenAI-compatible
endpoint at https://api.kilo.ai/api/gateway. Supports 500+ models from
Anthropic, OpenAI, Google, xAI, Mistral, MiniMax via a single API key.

- Register kilocode in PROVIDER_REGISTRY with aliases (kilo, kilo-code,
  kilo-gateway) and KILOCODE_API_KEY / KILOCODE_BASE_URL env vars
- Add to model catalog, CLI provider menu, setup wizard, doctor checks
- Add google/gemini-3-flash-preview as default aux model
- 12 new tests covering registration, aliases, credential resolution,
  runtime config
- Documentation updates (env vars, config, fallback providers)
- Fix setup test index shift from provider insertion

Inspired by PR #1473 by @amanning3390.

Co-authored-by: amanning3390 <amanning3390@users.noreply.github.com>
2026-03-17 02:40:34 -07:00
Teknium 6c6d12033f fix: email send_typing metadata + ☤ Hermes staff symbol (#1431, #1420)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

* fix(security): block sandbox backend creds from subprocess env (#1264)

Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.

Cherry-picked from PR #1571 by ygd58.

* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)

When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.

Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.

The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).

* fix(gateway): /model shows active fallback model instead of config default (#1615)

When the agent falls back to a different model (e.g. due to rate
limiting), /model still showed the config default. Now tracks the
effective model/provider after each agent run and displays it.

Cleared when the primary model succeeds again or the user explicitly
switches via /model.

Cherry-picked from PR #1616 by MaxKerkula. Added hasattr guard for
test compatibility.

* feat(gateway): inject reply-to message context for out-of-session replies (#1594)

When a user replies to a Telegram message, check if the quoted text
exists in the current session transcript. If missing (from cron jobs,
background tasks, or old sessions), prepend [Replying to: "..."] to
the message so the agent has context about what's being referenced.

- Add reply_to_text field to MessageEvent (base.py)
- Populate from Telegram's reply_to_message (text or caption)
- Inject context in _handle_message when not found in history

Based on PR #1596 by anpicasso (cherry-picked reply-to feature only,
excluded unrelated /server command and background delegation changes).

* fix: recognize Claude Code OAuth credentials in startup gate (#1455)

The _has_any_provider_configured() startup check didn't look for
Claude Code OAuth credentials (~/.claude/.credentials.json). Users
with only Claude Code auth got the setup wizard instead of starting.

Cherry-picked from PR #1455 by kshitijk4poor.

* perf: use ripgrep for file search (200x faster than find)

search_files(target='files') now uses rg --files -g instead of find.
Ripgrep respects .gitignore, excludes hidden dirs by default, and has
parallel directory traversal — ~200x faster on wide trees (0.14s vs 34s
benchmarked on 164-repo tree).

Falls back to find when rg is unavailable, preserving hidden-dir
exclusion and BSD find compatibility.

Salvaged from PR #1464 by @light-merlin-dark (Merlin) — adapted to
preserve hidden-dir exclusion added since the original PR.

* refactor(tts): replace NeuTTS optional skill with built-in provider + setup flow

Remove the optional skill (redundant now that NeuTTS is a built-in TTS
provider). Replace neutts_cli dependency with a standalone synthesis
helper (tools/neutts_synth.py) that calls the neutts Python API directly
in a subprocess.

Add TTS provider selection to hermes setup:
- 'hermes setup' now prompts for TTS provider after model selection
- 'hermes setup tts' available as standalone section
- Selecting NeuTTS checks for deps and offers to install:
  espeak-ng (system) + neutts[all] (pip)
- ElevenLabs/OpenAI selections prompt for API keys
- Tool status display shows NeuTTS install state

Changes:
- Remove optional-skills/mlops/models/neutts/ (skill + CLI scaffold)
- Add tools/neutts_synth.py (standalone synthesis subprocess helper)
- Move jo.wav/jo.txt to tools/neutts_samples/ (bundled default voice)
- Refactor _generate_neutts() — uses neutts API via subprocess, no
  neutts_cli dependency, config-driven ref_audio/ref_text/model/device
- Add TTS setup to hermes_cli/setup.py (SETUP_SECTIONS, tool status)
- Update config.py defaults (ref_audio, ref_text, model, device)

* fix(docker): add explicit env allowlist for container credentials (#1436)

Docker terminal sessions are secret-dark by default. This adds
terminal.docker_forward_env as an explicit allowlist for env vars
that may be forwarded into Docker containers.

Values resolve from the current shell first, then fall back to
~/.hermes/.env. Only variables the user explicitly lists are
forwarded — nothing is auto-exposed.

Cherry-picked from PR #1449 by @teknium1, conflict-resolved onto
current main.

Fixes #1436
Supersedes #1439

* fix: email send_typing metadata param + ☤ Hermes staff symbol

- email.py: add missing metadata parameter to send_typing() to match
  BasePlatformAdapter signature (PR #1431 by @ItsChoudhry)
- README.md: ⚕ → ☤ — the caduceus is Hermes's staff, not the
  medical Staff of Asclepius (PR #1420 by @rianczerwinski)

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
Co-authored-by: Max K <MaxKerkula@users.noreply.github.com>
Co-authored-by: Angello Picasso <angello.picasso@devsu.com>
Co-authored-by: kshitij <kshitijk4poor@users.noreply.github.com>
2026-03-17 02:37:40 -07:00
Teknium 556e0f4b43 fix(docker): add explicit env allowlist for container credentials (#1436)
Docker terminal sessions are secret-dark by default. This adds
terminal.docker_forward_env as an explicit allowlist for env vars
that may be forwarded into Docker containers.

Values resolve from the current shell first, then fall back to
~/.hermes/.env. Only variables the user explicitly lists are
forwarded — nothing is auto-exposed.

Cherry-picked from PR #1449 by @teknium1, conflict-resolved onto
current main.

Fixes #1436
Supersedes #1439
2026-03-17 02:34:35 -07:00
Teknium d50e0711c2 refactor(tts): replace NeuTTS optional skill with built-in provider + setup flow
Remove the optional skill (redundant now that NeuTTS is a built-in TTS
provider). Replace neutts_cli dependency with a standalone synthesis
helper (tools/neutts_synth.py) that calls the neutts Python API directly
in a subprocess.

Add TTS provider selection to hermes setup:
- 'hermes setup' now prompts for TTS provider after model selection
- 'hermes setup tts' available as standalone section
- Selecting NeuTTS checks for deps and offers to install:
  espeak-ng (system) + neutts[all] (pip)
- ElevenLabs/OpenAI selections prompt for API keys
- Tool status display shows NeuTTS install state

Changes:
- Remove optional-skills/mlops/models/neutts/ (skill + CLI scaffold)
- Add tools/neutts_synth.py (standalone synthesis subprocess helper)
- Move jo.wav/jo.txt to tools/neutts_samples/ (bundled default voice)
- Refactor _generate_neutts() — uses neutts API via subprocess, no
  neutts_cli dependency, config-driven ref_audio/ref_text/model/device
- Add TTS setup to hermes_cli/setup.py (SETUP_SECTIONS, tool status)
- Update config.py defaults (ref_audio, ref_text, model, device)
2026-03-17 02:33:12 -07:00
Teknium e2e53d497f fix: recognize Claude Code OAuth credentials in startup gate (#1455)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

* fix(security): block sandbox backend creds from subprocess env (#1264)

Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.

Cherry-picked from PR #1571 by ygd58.

* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)

When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.

Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.

The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).

* fix(gateway): /model shows active fallback model instead of config default (#1615)

When the agent falls back to a different model (e.g. due to rate
limiting), /model still showed the config default. Now tracks the
effective model/provider after each agent run and displays it.

Cleared when the primary model succeeds again or the user explicitly
switches via /model.

Cherry-picked from PR #1616 by MaxKerkula. Added hasattr guard for
test compatibility.

* feat(gateway): inject reply-to message context for out-of-session replies (#1594)

When a user replies to a Telegram message, check if the quoted text
exists in the current session transcript. If missing (from cron jobs,
background tasks, or old sessions), prepend [Replying to: "..."] to
the message so the agent has context about what's being referenced.

- Add reply_to_text field to MessageEvent (base.py)
- Populate from Telegram's reply_to_message (text or caption)
- Inject context in _handle_message when not found in history

Based on PR #1596 by anpicasso (cherry-picked reply-to feature only,
excluded unrelated /server command and background delegation changes).

* fix: recognize Claude Code OAuth credentials in startup gate (#1455)

The _has_any_provider_configured() startup check didn't look for
Claude Code OAuth credentials (~/.claude/.credentials.json). Users
with only Claude Code auth got the setup wizard instead of starting.

Cherry-picked from PR #1455 by kshitijk4poor.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
Co-authored-by: Max K <MaxKerkula@users.noreply.github.com>
Co-authored-by: Angello Picasso <angello.picasso@devsu.com>
Co-authored-by: kshitij <kshitijk4poor@users.noreply.github.com>
2026-03-17 02:32:16 -07:00
teknium1 693f5786ac perf: use ripgrep for file search (200x faster than find)
search_files(target='files') now uses rg --files -g instead of find.
Ripgrep respects .gitignore, excludes hidden dirs by default, and has
parallel directory traversal — ~200x faster on wide trees (0.14s vs 34s
benchmarked on 164-repo tree).

Falls back to find when rg is unavailable, preserving hidden-dir
exclusion and BSD find compatibility.

Salvaged from PR #1464 by @light-merlin-dark (Merlin) — adapted to
preserve hidden-dir exclusion added since the original PR.
2026-03-17 02:32:02 -07:00
Teknium 9ece1ce2de feat(gateway): inject reply-to message context for out-of-session replies (#1594)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

* fix(security): block sandbox backend creds from subprocess env (#1264)

Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.

Cherry-picked from PR #1571 by ygd58.

* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)

When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.

Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.

The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).

* fix(gateway): /model shows active fallback model instead of config default (#1615)

When the agent falls back to a different model (e.g. due to rate
limiting), /model still showed the config default. Now tracks the
effective model/provider after each agent run and displays it.

Cleared when the primary model succeeds again or the user explicitly
switches via /model.

Cherry-picked from PR #1616 by MaxKerkula. Added hasattr guard for
test compatibility.

* feat(gateway): inject reply-to message context for out-of-session replies (#1594)

When a user replies to a Telegram message, check if the quoted text
exists in the current session transcript. If missing (from cron jobs,
background tasks, or old sessions), prepend [Replying to: "..."] to
the message so the agent has context about what's being referenced.

- Add reply_to_text field to MessageEvent (base.py)
- Populate from Telegram's reply_to_message (text or caption)
- Inject context in _handle_message when not found in history

Based on PR #1596 by anpicasso (cherry-picked reply-to feature only,
excluded unrelated /server command and background delegation changes).

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
Co-authored-by: Max K <MaxKerkula@users.noreply.github.com>
Co-authored-by: Angello Picasso <angello.picasso@devsu.com>
2026-03-17 02:31:27 -07:00
Teknium 36a76bf9db Merge pull request #1661 from NousResearch/fix/discord-thread-persistence
fix(discord): persist thread participation across gateway restarts
2026-03-17 02:27:09 -07:00
Teknium d0faf77208 fix(gateway): /model shows active fallback model instead of config default (#1615)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

* fix(security): block sandbox backend creds from subprocess env (#1264)

Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.

Cherry-picked from PR #1571 by ygd58.

* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)

When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.

Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.

The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).

* fix(gateway): /model shows active fallback model instead of config default (#1615)

When the agent falls back to a different model (e.g. due to rate
limiting), /model still showed the config default. Now tracks the
effective model/provider after each agent run and displays it.

Cleared when the primary model succeeds again or the user explicitly
switches via /model.

Cherry-picked from PR #1616 by MaxKerkula. Added hasattr guard for
test compatibility.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
Co-authored-by: Max K <MaxKerkula@users.noreply.github.com>
2026-03-17 02:26:51 -07:00
teknium1 c8582fc4a2 fix(discord): persist thread participation across gateway restarts
_bot_participated_threads was an in-memory set — lost on every restart.
After restart, the bot forgot which threads it was active in, requiring
fresh @mentions and potentially creating duplicate threads instead of
continuing existing conversations.

Changes:
- Persist thread IDs to ~/.hermes/discord_threads.json
- Load on adapter init, save on every new thread participation
- _track_thread() replaces direct .add() calls for atomic persist
- Cap at 500 tracked threads to prevent unbounded growth
- /thread slash command also tracks participation
- 7 new tests covering persistence, restart survival, corruption
  recovery, cap enforcement
2026-03-17 02:26:34 -07:00
Teknium 60b67e2b47 fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

* fix(security): block sandbox backend creds from subprocess env (#1264)

Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.

Cherry-picked from PR #1571 by ygd58.

* fix(gateway): cap interrupt recursion depth to prevent resource exhaustion (#816)

When a user sends multiple messages while the agent keeps failing,
_run_agent() calls itself recursively with no depth limit. This can
exhaust stack/memory if the agent is in a failure loop.

Add _MAX_INTERRUPT_DEPTH = 3. When exceeded, the pending message is
logged and the current result is returned instead of recursing deeper.

The log handler duplication bug described in #816 was already fixed
separately (AIAgent.__init__ deduplicates handlers).

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
2026-03-17 02:23:07 -07:00
Teknium 2c7c30be69 fix(security): harden terminal safety and sandbox file writes (#1653)
* fix(security): harden terminal safety and sandbox file writes

Two security improvements:

1. Dangerous command detection: expand shell -c pattern to catch
   combined flags (bash -lc, bash -ic, ksh -c) that were previously
   undetected. Pattern changed from matching only 'bash -c' to
   matching any shell invocation with -c anywhere in the flags.

2. File write sandboxing: add HERMES_WRITE_SAFE_ROOT env var that
   constrains all write_file/patch operations to a configured directory
   tree. Opt-in — when unset, behavior is unchanged. Useful for
   gateway/messaging deployments that should only touch a workspace.

Based on PR #1085 by ismoilh.

* fix: correct "POSIDEON" typo to "POSEIDON" in banner ASCII art

The poseidon skin's banner_logo had the E and I letters swapped,
spelling "POSIDEON-AGENT" instead of "POSEIDON-AGENT".

---------

Co-authored-by: ismoilh <ismoilh@users.noreply.github.com>
Co-authored-by: unmodeled-tyler <unmodeled.tyler@proton.me>
2026-03-17 02:22:12 -07:00
Teknium 6a320e8bfe fix(security): block sandbox backend creds from subprocess env (#1264)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

* fix(security): block sandbox backend creds from subprocess env (#1264)

Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.

Cherry-picked from PR #1571 by ygd58.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
2026-03-17 02:20:42 -07:00
Teknium cb0deb5f9d feat: add NeuTTS optional skill + local TTS provider backend
* feat(skills): add bundled neutts optional skill

Add NeuTTS optional skill with CLI scaffold, bootstrap helper, and
sample voice profile. Also fixes skills_hub.py to handle binary
assets (WAV files) during skill installation.

Changes:
- optional-skills/mlops/models/neutts/ — skill + CLI scaffold
- tools/skills_hub.py — binary asset support (read_bytes, write_bytes)
- tests/tools/test_skills_hub.py — regression tests for binary assets

* feat(tts): add NeuTTS as local TTS provider backend

Add NeuTTS as a fourth TTS provider option alongside Edge, ElevenLabs,
and OpenAI. NeuTTS runs fully on-device via neutts_cli — no API key
needed.

Provider behavior:
- Explicit: set tts.provider to 'neutts' in config.yaml
- Fallback: when Edge TTS is unavailable and neutts_cli is installed,
  automatically falls back to NeuTTS instead of failing
- check_tts_requirements() now includes NeuTTS in availability checks

NeuTTS outputs WAV natively. For Telegram voice bubbles, ffmpeg
converts to Opus (same pattern as Edge TTS).

Changes:
- tools/tts_tool.py — _generate_neutts(), _check_neutts_available(),
  provider dispatch, fallback logic, Opus conversion
- hermes_cli/config.py — tts.neutts config defaults

---------

Co-authored-by: unmodeled-tyler <unmodeled.tyler@proton.me>
2026-03-17 02:13:34 -07:00
Teknium 766f4aae2b refactor: tie api_mode to provider config instead of env var (#1656)
Remove HERMES_API_MODE env var. api_mode is now configured where the
endpoint is defined:

- model.api_mode in config.yaml (for the active model config)
- custom_providers[].api_mode (for named custom providers)

Replace _get_configured_api_mode() with _parse_api_mode() which just
validates a value against the whitelist without reading env vars.

Both paths (model config and named custom providers) now read api_mode
from their respective config entries rather than a global override.
2026-03-17 02:13:26 -07:00
Teknium 4e66d22151 fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)

When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.

Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.

Cherry-picked from PR #1593 by ygd58.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
2026-03-17 02:10:36 -07:00
Teknium 8992babaa3 fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)

The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.

Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
2026-03-17 02:09:26 -07:00
Teknium 49043b7b7d feat: add /tools disable/enable/list slash commands with session reset (#1652)
Add in-session tool management via /tools disable/enable/list, plus
hermes tools list/disable/enable CLI subcommands. Supports both
built-in toolsets (web, memory) and MCP tools (github:create_issue).

To preserve prompt caching, /tools disable/enable in a chat session
saves the change to config and resets the session cleanly — the user
is asked to confirm before the reset happens.

Also improves prefix matching: /qui now dispatches to /quit instead
of showing ambiguous when longer skill commands like /quint-pipeline
are installed.

Based on PR #1520 by @YanSte.

Co-authored-by: Yannick Stephan <YanSte@users.noreply.github.com>
2026-03-17 02:05:26 -07:00
Teknium f2414bfd45 feat: allow custom endpoints to use responses API via api_mode override (#1651)
Add HERMES_API_MODE env var and model.api_mode config field to let
custom OpenAI-compatible endpoints opt into codex_responses mode
without requiring the OpenAI Codex OAuth provider path.

- _get_configured_api_mode() reads HERMES_API_MODE env (precedence)
  then model.api_mode from config.yaml; validates against whitelist
- Applied in both _resolve_openrouter_runtime() and
  _resolve_named_custom_runtime() (original PR only covered openrouter)
- Fix _dump_api_request_debug() to show /responses URL when in
  codex_responses mode instead of always showing /chat/completions
- Tests for config override, env override, invalid values, named
  custom providers, and debug dump URL for both API modes

Inspired by PR #1041 by @mxyhi.

Co-authored-by: mxyhi <mxyhi@users.noreply.github.com>
2026-03-17 02:04:36 -07:00
0xbyt4 68fbcdaa06 fix: add browser_console to browser toolset and core tools list (#1084)
browser_console was registered in the tool registry but missing from
all toolset definitions (TOOLSETS, _HERMES_CORE_TOOLS, _LEGACY_TOOLSET_MAP),
so the agent could never discover or use it.

Added to all 4 locations + 4 wiring tests.

Cherry-picked from PR #1084 by @0xbyt4 (authorship preserved in tests).
2026-03-17 02:02:57 -07:00
teknium1 7d91b436e4 fix: exclude hidden directories from find/grep search backends (#1558)
The primary injection vector in #1558 was search_files discovering
catalog cache files in .hub/index-cache/ via find or grep, which
don't skip hidden directories like ripgrep does by default.

Three-layer fix:

1. _search_files (find): add -not -path '*/.*' to exclude hidden
   directories, matching ripgrep's default behavior.

2. _search_with_grep: add --exclude-dir='.*' to skip hidden
   directories in the grep fallback path.

3. _write_index_cache: write a .ignore file to .hub/ so ripgrep
   also skips it even when invoked with --hidden (belt-and-suspenders).

This makes all three search backends (rg, grep, find) consistently
exclude hidden directories, preventing the agent from discovering
and reading unvetted community content in hub cache files.
2026-03-17 02:02:57 -07:00
Teknium 40e2f8d9f0 feat(provider): add OpenCode Zen and OpenCode Go providers
Add support for OpenCode Zen (pay-as-you-go, 35+ curated models) and
OpenCode Go ($10/month subscription, open models) as first-class providers.

Both are OpenAI-compatible endpoints resolved via the generic api_key
provider flow — no custom adapter needed.

Files changed:
- hermes_cli/auth.py — ProviderConfig entries + aliases
- hermes_cli/config.py — OPENCODE_ZEN/GO API key env vars
- hermes_cli/models.py — model catalogs, labels, aliases, provider order
- hermes_cli/main.py — provider labels, menu entries, model flow dispatch
- hermes_cli/setup.py — setup wizard branches (idx 10, 11)
- agent/model_metadata.py — context lengths for all OpenCode models
- agent/auxiliary_client.py — default aux models
- .env.example — documentation

Co-authored-by: DevAgarwal2 <DevAgarwal2@users.noreply.github.com>
2026-03-17 02:02:43 -07:00
Teknium 4cb6735541 fix(approval): show full command in dangerous command approval (#1553)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
2026-03-17 02:02:33 -07:00
teknium1 0351e4fa90 fix: add metadata param to base send_image and forward in send_animation
_send_response_parts() calls send_image(metadata=_thread_metadata) but
the base class signature didn't accept metadata, crashing platforms that
don't override send_image. send_animation already had the param but
wasn't forwarding it.

Credit: @0xbyt4 (PR #1077)
2026-03-17 02:02:28 -07:00
Teknium 1b2d6c424c fix: add --yes flag to bypass confirmation in /skills install and uninstall (#1647)
Fixes hanging when using /skills install or /skills uninstall from the
TUI — bare input() calls hang inside prompt_toolkit's event loop.

Changes:
- Add skip_confirm parameter to do_install() and do_uninstall()
- Separate --yes/-y (confirmation bypass) from --force (scan override)
  in both argparse and slash command handlers
- Update usage hint for /skills uninstall to show [--yes]

The original PR (#1595) accidentally deleted the install_from_quarantine()
call, which would have broken all installs. That bug is not present here.

Based on PR #1595 by 333Alden333.

Co-authored-by: 333Alden333 <333Alden333@users.noreply.github.com>
2026-03-17 01:59:07 -07:00
Teknium 28c35d045d Merge pull request #1537 from aydnOktay/improve/skill-manager-error-logging
Improve error logging in skill manager tool
2026-03-17 01:53:58 -07:00
Teknium 1f6a1f0028 fix(tools): chunk long messages in send_message_tool before platform dispatch
* add base support

* fix: correct skill author attribution to youssefea

* fix(tools): chunk long messages in send_message_tool before platform dispatch

  - Convert BasePlatformAdapter.truncate_message() to @staticmethod
  - Apply truncate_message() in _send_to_platform() with per-platform
    max lengths
  - Remove naive character split in _send_discord()
  - Attach media files to last chunk only for Telegram
  - Add regression tests for chunking and media placement

---------

Co-authored-by: youssefea <youcefea99@gmail.com>
Co-authored-by: llbn <46884939+llbn@users.noreply.github.com>
2026-03-17 01:52:51 -07:00
Teknium d7029489d6 fix: show custom endpoint models in /model via live API probe (#1645)
Add 'custom' to the provider order so custom OpenAI-compatible
endpoints appear in /model list. Probes the endpoint's /models API
to dynamically discover available models.

Changes:
- Add 'custom' to _PROVIDER_ORDER in list_available_providers()
- Add _get_custom_base_url() helper to read model.base_url from config
- Add custom branch in provider_model_ids() using fetch_api_models()
- Custom endpoint detection via base_url presence for has_creds check

Based on PR #1612 by @aashizpoudel.

Co-authored-by: Aashish Poudel <aashizpoudel@users.noreply.github.com>
2026-03-17 01:52:46 -07:00
Teknium 12afccd9ca fix(tools): chunk long messages in send_message_tool before dispatch (#1552)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
2026-03-17 01:52:43 -07:00
Teknium 81f76111b0 Merge pull request #1560 from eren-karakus0/fix/singularity-preflight-check
fix(terminal): add Singularity/Apptainer preflight availability check
2026-03-17 01:52:03 -07:00
Teknium 96dac22194 fix: prevent infinite 400 loop on context overflow + block prompt injection via cache files (#1630, #1558)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
2026-03-17 01:50:59 -07:00
Teknium 2d36819503 feat: add Base blockchain optional skill
* add base support

* fix: correct skill author attribution to youssefea

---------

Co-authored-by: youssefea <youcefea99@gmail.com>
2026-03-17 01:50:03 -07:00
Teknium 8e20a7e035 fix(gateway): strip MEDIA: and [[audio_as_voice]] tags from message body
* fix(gateway): strip MEDIA: and [[audio_as_voice]] tags from message body

Closes #1561

* fix: remove redundant re import, use existing import

---------

Co-authored-by: mettin4 <coktinmetin@gmail.com>
2026-03-17 01:47:35 -07:00
Teknium 4920c5940f feat: auto-detect local file paths in gateway responses for native media delivery (#1640)
Small models (7B-14B) can't reliably use MEDIA: or IMAGE: syntax. This
adds extract_local_files() to BasePlatformAdapter that regex-detects
bare local file paths ending in image/video extensions, validates them
with os.path.isfile(), and delivers them as native platform attachments.

Hardened over the original PR:
- Code-block exclusion: paths inside fenced blocks and inline code are
  skipped so code samples are never mutilated
- URL rejection: negative lookbehind prevents matching path segments
  inside HTTP URLs
- Relative path rejection: ./foo.png no longer matches
- Tilde path cleanup: raw ~/... form is removed from response text
- Deduplication by expanded path
- Added .webm to _VIDEO_EXTS
- Fallback to send_document for unrecognized media extensions

Based on PR #1636 by sudoingX.

Co-authored-by: sudoingX <sudoingX@users.noreply.github.com>
2026-03-17 01:47:34 -07:00
Teknium 3744118311 feat(cli): two-stage /model autocomplete with ghost text suggestions (#1641)
* feat(cli): two-stage /model autocomplete with ghost text suggestions

- SlashCommandCompleter: Tab-complete providers first (anthropic:, openrouter:, etc.)
  then models within the selected provider
- SlashCommandAutoSuggest: inline ghost text for slash commands, subcommands,
  and /model provider:model two-stage suggestions
- Custom Tab key binding: accepts provider completion and immediately
  re-triggers completions to show that provider's models
- COMMANDS_BY_CATEGORY: structured format with explicit subcommands for
  tab completion and ghost text (prompt, reasoning, voice, skills, cron, browser)
- SUBCOMMANDS dict auto-extracted from command definitions
- Model/provider info cached 60s for responsive completions

* fix: repair test regression and restore gold color from PR #1622

- Fix test_unknown_command_still_shows_error: patch _cprint instead of
  console.print to match the _cprint switch in process_command()
- Restore gold color on 'Type /help' hint using _DIM + _GOLD constants
  instead of bare \033[2m (was losing the #B8860B gold)
- Use _GOLD constant for ambiguous command message for consistency
- Add clarifying comment on SUBCOMMANDS regex fallback

---------

Co-authored-by: Lars van der Zande <lmvanderzande@gmail.com>
2026-03-17 01:47:32 -07:00
Teknium 5ada0b95e9 Merge pull request #1609 from 0xbyt4/fix/context-counter-cache-tokens
fix: context counter shows cached token count in status bar
2026-03-17 01:45:12 -07:00
teknium1 19eaf5d956 test: fix telegram mock to include ParseMode constant
The MarkdownV2 formatting change imports telegram.constants.ParseMode,
which the test mock didn't provide. Add ParseMode to the mock so
existing tests continue working.
2026-03-17 01:44:11 -07:00
Alex Ferrari 365d175100 fix: apply MarkdownV2 formatting in _send_telegram for proper rendering
The _send_telegram() function was sending raw markdown text without
parse_mode, causing bold, links, and headers to render as plain text.
This fix reuses the gateway adapter's format_message() to convert
markdown to Telegram's MarkdownV2 format, with a fallback to plain
text if parsing fails.
2026-03-17 01:44:11 -07:00
Teknium c3ca68d25b Merge pull request #1614 from PeterFile/fix/launchd-service-recovery
fix(gateway): recover stale launchd service state
2026-03-17 01:43:07 -07:00
Teknium eaa9ceeb43 Merge pull request #1621 from Death-Incarnate/main
fix: isolate test_anthropic_adapter from local credentials
2026-03-17 01:40:39 -07:00
Teknium 949fac192f fix(tools): remove unnecessary crontab requirement from cronjob tool (#1638)
* fix(tools): remove unnecessary crontab requirement from cronjob tool

The hermes cron system is internal — it uses a JSON-based scheduler
ticked by the gateway (cron/scheduler.py), not system crontab.

The check for shutil.which('crontab') was preventing the cronjob tool
from being available in environments without crontab installed (e.g.
minimal Ubuntu containers).

Changes:
- Remove shutil.which('crontab') check from check_cronjob_requirements()
- Remove unused shutil import
- Update docstring to clarify internal scheduler is used
- Update tests to reflect new behavior and add coverage for all
  session modes (interactive, gateway, exec_ask)

Fixes #1589

* test: add HERMES_EXEC_ASK coverage for cronjob requirements

Adds missing test for the exec_ask session mode, complementing
the cherry-picked fix from PR #1633.

---------

Co-authored-by: Bartok9 <bartokmagic@proton.me>
2026-03-17 01:40:02 -07:00
Teknium 4b96d10bc3 fix(cli): invalidate update-check cache after hermes update
Signed-off-by: nidhi-singh02 <nidhi2894@gmail.com>
Co-authored-by: nidhi-singh02 <nidhi2894@gmail.com>
2026-03-17 01:38:11 -07:00
teknium1 c16870277c test: add regression test for stale PID in gateway_state.json (#1631)
Verifies that write_runtime_status() overwrites pid and start_time
from a previous process rather than preserving them via setdefault().
Covers the fix from PR #1632.
2026-03-17 01:35:02 -07:00
Teknium 247e3c1470 Merge pull request #1632 from nidhi-singh02/fix/stale-pid-gateway-state
fix(gateway): overwrite stale PID in gateway_state.json on restart
2026-03-17 01:34:24 -07:00
Teknium 2af4af6390 Merge pull request #1635 from NousResearch/hermes/hermes-a86162db
fix: sanitize corrupted .env files on read and during migration
2026-03-17 01:33:36 -07:00
Teknium 749e9977a0 Merge pull request #1629 from NousResearch/hermes/hermes-6891ac11
feat(browser): multi-provider cloud browser support + Browser Use integration
2026-03-17 01:32:38 -07:00
teknium1 1c61ab6bd9 fix: unconditionally clear ANTHROPIC_TOKEN on v8→v9 migration
No conditional checks — just clear it. The new auth flow doesn't use
this env var. Anyone upgrading gets it wiped once, then it's done.
2026-03-17 01:31:20 -07:00
teknium1 e9f1a8e39b fix: gate ANTHROPIC_TOKEN cleanup to config version 8→9 migration
- Bump _config_version 8 → 9
- Move stale ANTHROPIC_TOKEN clearing into 'if current_ver < 9' block
  so it only runs once during the upgrade, not on every migrate_config()
- ANTHROPIC_TOKEN is still a valid auth path (OAuth flow), so we don't
  want to clear it repeatedly — only during the one-time migration from
  old setups that left it stale
- Add test_skips_on_version_9_or_later to verify one-time behavior
- All tests set config version 8 to trigger migration
2026-03-17 01:28:38 -07:00
teknium1 b6a51c955e fix: clear stale ANTHROPIC_TOKEN during migration, remove false *** detection
- Remove *** placeholder detection from _sanitize_env_lines (was based on
  confusing terminal redaction with literal file content)
- Add migrate_config() logic to clear stale ANTHROPIC_TOKEN when better
  credentials exist (ANTHROPIC_API_KEY or Claude Code auto-discovery)
- Old ANTHROPIC_TOKEN values shadow Claude Code credential fallthrough,
  breaking auth for users who updated without re-running setup
- Preserves ANTHROPIC_TOKEN when it's the only auth method available
- 3 new migration tests, updated existing tests
2026-03-17 01:26:23 -07:00
teknium1 634c1f6752 fix: sanitize corrupted .env files on read and during migration
Fixes two corruption patterns that break API keys during updates:

1. Concatenated KEY=VALUE pairs on a single line due to missing newlines
   (e.g. ANTHROPIC_API_KEY=sk-...OPENAI_BASE_URL=https://...). Uses a
   known-keys set to safely detect and split concatenated entries without
   false-splitting values that contain uppercase text.

2. Stale KEY=*** placeholder entries left by incomplete setup runs that
   never get updated and shadow real credentials.

Changes:
- Add _sanitize_env_lines() that splits concatenated known keys and drops
  *** placeholders
- Add sanitize_env_file() public API for explicit repair
- Call sanitization in save_env_value() on every read (self-healing)
- Call sanitize_env_file() at the start of migrate_config() so existing
  corrupted files are repaired on update
- 12 new tests covering splits, placeholders, edge cases, and integration
2026-03-17 01:13:34 -07:00
Teknium 6ebb816e56 Merge pull request #1634 from NousResearch/hermes/hermes-a86162db
chore: release v0.3.0 (v2026.3.17)
2026-03-17 00:55:51 -07:00
teknium1 37862f74fa chore: release v0.3.0 (v2026.3.17)
- Bump version 0.2.0 → 0.3.0
- Add comprehensive changelog (248 merged PRs, 15 contributors)
- CalVer tag: v2026.3.17
2026-03-17 00:38:48 -07:00
nidhi-singh02 67546746d4 fix(gateway): overwrite stale PID in gateway_state.json on restart
Signed-off-by: nidhi-singh02 <nidhi2894@gmail.com>
2026-03-17 13:01:55 +05:30
ShawnPana d44b6b7f1b feat(browser): multi-provider cloud browser support + Browser Use integration
Introduce a cloud browser provider abstraction so users can switch
between Local Browser, Browserbase, and Browser Use (or future providers)
via hermes tools / hermes setup.

Cloud browser providers are behind an ABC (tools/browser_providers/base.py)
so adding a new provider is a single-file addition with no changes to
browser_tool.py internals.

Changes:
- tools/browser_providers/ package with ABC, Browserbase extraction,
  and Browser Use provider
- browser_tool.py refactored to use _PROVIDER_REGISTRY + _get_cloud_provider()
  (cached) instead of hardcoded _is_local_mode() / _create_browserbase_session()
- tools_config.py: generic _is_provider_active() / _detect_active_provider_index()
  replace TTS-only logic; Browser Use added as third browser option
- config.py: BROWSER_USE_API_KEY added to OPTIONAL_ENV_VARS + show_config + allowlist
- subprocess pipe hang fix: agent-browser daemon inherits pipe fds,
  communicate() blocks. Replaced with Popen + temp files.

Original PR: #1208
Co-authored-by: ShawnPana <shawnpana@users.noreply.github.com>
2026-03-17 00:16:34 -07:00
Teknium 3576f44a57 feat: add Vercel AI Gateway provider (#1628)
* feat: add Vercel AI Gateway as a first-class provider

Adds AI Gateway (ai-gateway.vercel.sh) as a new inference provider
with AI_GATEWAY_API_KEY authentication, live model discovery, and
reasoning support via extra_body.reasoning.

Based on PR #1492 by jerilynzheng.

* feat: add AI Gateway to setup wizard, doctor, and fallback providers

* test: add AI Gateway to api_key_providers test suite

* feat: add AI Gateway to hermes model CLI and model metadata

Wire AI Gateway into the interactive model selection menu and add
context lengths for AI Gateway model IDs in model_metadata.py.

* feat: use claude-haiku-4.5 as AI Gateway auxiliary model

* revert: use gemini-3-flash as AI Gateway auxiliary model

* fix: move AI Gateway below established providers in selection order

---------

Co-authored-by: jerilynzheng <jerilynzheng@users.noreply.github.com>
Co-authored-by: jerilynzheng <zheng.jerilyn@gmail.com>
2026-03-17 00:12:16 -07:00
teknium1 4768ea624d fix: skip stale cron jobs on gateway restart instead of firing immediately
When the gateway restarts after being down past a scheduled run time,
recurring jobs (cron/interval) were firing immediately because their
next_run_at was in the past. Now jobs more than 2 minutes late are
fast-forwarded to the next future occurrence instead.

- get_due_jobs() checks staleness for cron/interval jobs
- Stale jobs get next_run_at recomputed and saved
- Jobs within 2 minutes of their schedule still fire normally
- One-shot (once) jobs are unaffected — they fire if missed

Fixes the 'cron jobs run on every gateway restart' issue.
2026-03-16 23:48:14 -07:00
Teknium e3f9894caf fix: send_animation metadata, MarkdownV2 inline code splitting, tirith cosign-free install (#1626)
* fix: Anthropic OAuth compatibility — Claude Code identity fingerprinting

Anthropic routes OAuth/subscription requests based on Claude Code's
identity markers. Without them, requests get intermittent 500 errors
(~25% failure rate observed). This matches what pi-ai (clawdbot) and
OpenCode both implement for OAuth compatibility.

Changes (OAuth tokens only — API key users unaffected):

1. Headers: user-agent 'claude-cli/2.1.2 (external, cli)' + x-app 'cli'
2. System prompt: prepend 'You are Claude Code, Anthropic's official CLI'
3. System prompt sanitization: replace Hermes/Nous references
4. Tool names: prefix with 'mcp_' (Claude Code convention for non-native tools)
5. Tool name stripping: remove 'mcp_' prefix from response tool calls

Before: 9/12 OK, 1 hard fail, 4 needed retries (~25% error rate)
After: 16/16 OK, 0 failures, 0 retries (0% error rate)

* fix: three gateway issues from user error logs

1. send_animation missing metadata kwarg (base.py)
   - Base class send_animation lacked the metadata parameter that the
     call site in base.py line 917 passes. Telegram's override accepted
     it, but any platform without an override (Discord, Slack, etc.)
     hit TypeError. Added metadata to base class signature.

2. MarkdownV2 split-inside-inline-code (base.py truncate_message)
   - truncate_message could split at a space inside an inline code span
     (e.g. `function(arg1, arg2)`), leaving an unpaired backtick and
     unescaped parentheses in the chunk. Telegram rejects with
     'character ( is reserved'. Added inline code awareness to the
     split-point finder — detects odd backtick counts and moves the
     split before the code span.

3. tirith auto-install without cosign (tirith_security.py)
   - Previously required cosign on PATH for auto-install, blocking
     install entirely with a warning if missing. Now proceeds with
     SHA-256 checksum verification only when cosign is unavailable.
     Cosign is still used for full supply chain verification when
     present. If cosign IS present but verification explicitly fails,
     install is still aborted (tampered release).
2026-03-16 23:39:41 -07:00
teknium1 19c8ad3d3d fix: add Claude Code user-agent to OAuth token exchange/refresh requests
Anthropic's token endpoint is behind Cloudflare which blocks Python's
default urllib user-agent (Python-urllib/3.x). Without a proper
user-agent, the token exchange returns 403 (Cloudflare error 1010).

Adds 'claude-cli/2.1.2 (external, cli)' user-agent to all three
OAuth HTTP requests:
- Initial token exchange (authorization_code grant)
- Hermes token refresh (refresh_token grant)
- Claude Code credential refresh (refresh_token grant)

Verified: full OAuth PKCE flow now works end-to-end.
2026-03-16 23:26:43 -07:00
teknium1 bd3b0c712b fix: make OAuth login URL prominent for SSH/headless users
The URL is now the primary element — displayed in a bordered box
before the browser auto-open attempt. Works for users who SSH into
remote servers where webbrowser.open() silently fails.
2026-03-16 23:21:30 -07:00
Teknium 46176c8029 refactor: centralize slash command registry (#1603)
* refactor: centralize slash command registry

Replace 7+ scattered command definition sites with a single
CommandDef registry in hermes_cli/commands.py. All downstream
consumers now derive from this registry:

- CLI process_command() resolves aliases via resolve_command()
- Gateway _known_commands uses GATEWAY_KNOWN_COMMANDS frozenset
- Gateway help text generated by gateway_help_lines()
- Telegram BotCommands generated by telegram_bot_commands()
- Slack subcommand map generated by slack_subcommand_map()

Adding a command or alias is now a one-line change to
COMMAND_REGISTRY instead of touching 6+ files.

Bugfixes included:
- Telegram now registers /rollback, /background (were missing)
- Slack now has /voice, /update, /reload-mcp (were missing)
- Gateway duplicate 'reasoning' dispatch (dead code) removed
- Gateway help text can no longer drift from CLI help

Backwards-compatible: COMMANDS and COMMANDS_BY_CATEGORY dicts are
rebuilt from the registry, so existing imports work unchanged.

* docs: update developer docs for centralized command registry

Update AGENTS.md with full 'Slash Command Registry' and 'Adding a
Slash Command' sections covering CommandDef fields, registry helpers,
and the one-line alias workflow.

Also update:
- CONTRIBUTING.md: commands.py description
- website/docs/reference/slash-commands.md: reference central registry
- docs/plans/centralize-command-registry.md: mark COMPLETED
- plans/checkpoint-rollback.md: reference new pattern
- hermes-agent-dev skill: architecture table

* chore: remove stale plan docs
2026-03-16 23:21:03 -07:00
teknium1 b798062501 fix: improve OAuth login UX for headless/SSH users
Put the authorization URL front and center instead of treating it as
a fallback. Most Hermes users run on remote servers via SSH where
webbrowser.open() silently fails.
2026-03-16 23:17:29 -07:00
teknium1 63e88326a8 feat: Hermes-native PKCE OAuth flow for Claude Pro/Max subscriptions
Adds our own OAuth login and token refresh flow, independent of Claude
Code CLI. Mirrors the PKCE flow used by pi-ai (clawdbot) and OpenCode:

- run_hermes_oauth_login(): full PKCE authorization code flow
  - Opens browser to claude.ai/oauth/authorize
  - User pastes code#state back
  - Exchanges for access + refresh tokens
  - Stores in ~/.hermes/.anthropic_oauth.json (our own file)
  - Also writes to ~/.claude/.credentials.json for backward compat

- refresh_hermes_oauth_token(): automatic token refresh
  - POST to console.anthropic.com/v1/oauth/token with refresh_token
  - Updates both credential files on success

- Credential resolution priority updated:
  1. ANTHROPIC_TOKEN env var
  2. CLAUDE_CODE_OAUTH_TOKEN env var
  3. Hermes OAuth credentials (~/.hermes/.anthropic_oauth.json) ← NEW
  4. Claude Code credentials (~/.claude/.credentials.json)
  5. ANTHROPIC_API_KEY env var

Uses same CLIENT_ID, endpoints, scopes, and PKCE parameters as
Claude Code / OpenCode / pi-ai. Token refresh happens automatically
before each API call via _try_refresh_anthropic_client_credentials.
2026-03-16 23:15:56 -07:00
Teknium 474301adc6 fix: improve execute_code error logging and harden cleanup (#1623)
* fix(tools): improve error logging in code_execution_tool

* fix: harden execute_code cleanup and reduce logging noise

Follow-up to cherry-picked PR #1588 (aydnOktay):
- Initialize server_sock = None before try block to prevent NameError
  if exception occurs before socket creation (line 413 is inside the try)
- Guard server_sock.close() with None check
- Narrow cleanup exception handlers to OSError (the actual error type)
- Remove exc_info=True from cleanup debug logs — benign teardown
  failures don't need stack traces, the message is sufficient
- Remove redundant try/except around shutil.rmtree(ignore_errors=True)
- Silence sock_path unlink with pass — expected when already cleaned up

---------

Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
2026-03-16 23:13:26 -07:00
DeadMan 285300528b fix: isolate test_anthropic_adapter from local credentials
Two tests lacked filesystem isolation causing them to pick up real
~/.claude/.credentials.json tokens on machines with Claude Code installed.

- test_prefers_oauth_token_over_api_key: add tmp_path, mock Path.home,
  clear CLAUDE_CODE_OAUTH_TOKEN env
- test_falls_back_to_token: same isolation

Also commit run_agent.py generic-400 retry fix.
2026-03-16 22:53:32 -07:00
Verne 673f132151 fix(gateway): Recover stale service state
Repair stale launchd/systemd definitions during install and
teach launchd start to reload unloaded jobs before retrying.

Stop masking service restart failures by falling back to a
foreground gateway when a configured service manager is still
broken.

Refs: #1613
2026-03-17 11:05:28 +08:00
0xbyt4 8d0a96a8bf fix: context counter shows cached token count in status bar
Anthropic prompt caching splits input into cache_read_input_tokens,
cache_creation_input_tokens, and non-cached input_tokens. The context
counter only read input_tokens (non-cached portion), showing ~3 tokens
instead of the real ~18K total. Now includes cached portions for
Anthropic native provider only — other providers (OpenAI, OpenRouter,
Codex) already include cached tokens in their prompt_tokens field.

Before: 3/200K | 0%
After: 17.7K/200K | 9%
2026-03-17 05:06:11 +03:00
SHL0MS cfa87e77a9 Merge pull request #1598 from NousResearch/shloms/ascii-video-v3
Refactor ascii-video skill: creative-first SKILL.md, consolidate references
2026-03-16 20:46:12 -04:00
Teknium 60e38e82ec fix: auto-detect D-Bus session bus for systemctl --user on headless servers (#1601)
* fix: Anthropic OAuth compatibility — Claude Code identity fingerprinting

Anthropic routes OAuth/subscription requests based on Claude Code's
identity markers. Without them, requests get intermittent 500 errors
(~25% failure rate observed). This matches what pi-ai (clawdbot) and
OpenCode both implement for OAuth compatibility.

Changes (OAuth tokens only — API key users unaffected):

1. Headers: user-agent 'claude-cli/2.1.2 (external, cli)' + x-app 'cli'
2. System prompt: prepend 'You are Claude Code, Anthropic's official CLI'
3. System prompt sanitization: replace Hermes/Nous references
4. Tool names: prefix with 'mcp_' (Claude Code convention for non-native tools)
5. Tool name stripping: remove 'mcp_' prefix from response tool calls

Before: 9/12 OK, 1 hard fail, 4 needed retries (~25% error rate)
After: 16/16 OK, 0 failures, 0 retries (0% error rate)

* fix: auto-detect DBUS_SESSION_BUS_ADDRESS for systemctl --user on headless servers

On SSH sessions to headless servers, DBUS_SESSION_BUS_ADDRESS and
XDG_RUNTIME_DIR may not be set even when the user's systemd instance
is running via linger. This causes 'systemctl --user' to fail with
'Failed to connect to bus: No medium found', breaking gateway
restart/start/stop as a service and falling back to foreground mode.

Add _ensure_user_systemd_env() that detects the standard D-Bus socket
at /run/user/<UID>/bus and sets the env vars before any systemctl --user
call. Called from _systemctl_cmd() so all existing call sites benefit
automatically with zero changes.

Fixes: gateway restart falling back to foreground on headless servers

* fix: show linger guidance when gateway restart fails during update and gateway restart

When systemctl --user restart fails during 'hermes update' or
'hermes gateway restart', check linger status and tell the user
exactly what to run (sudo -S -p '' loginctl enable-linger) instead of
silently falling back to foreground mode.

Also applies _ensure_user_systemd_env() to the raw systemctl calls
in cmd_update so they work properly on SSH sessions where D-Bus
env vars are missing.
2026-03-16 17:45:48 -07:00
Teknium ce430fed4c installer: clarify why sudo is needed at every prompt (#1602)
* fix: Anthropic OAuth compatibility — Claude Code identity fingerprinting

Anthropic routes OAuth/subscription requests based on Claude Code's
identity markers. Without them, requests get intermittent 500 errors
(~25% failure rate observed). This matches what pi-ai (clawdbot) and
OpenCode both implement for OAuth compatibility.

Changes (OAuth tokens only — API key users unaffected):

1. Headers: user-agent 'claude-cli/2.1.2 (external, cli)' + x-app 'cli'
2. System prompt: prepend 'You are Claude Code, Anthropic's official CLI'
3. System prompt sanitization: replace Hermes/Nous references
4. Tool names: prefix with 'mcp_' (Claude Code convention for non-native tools)
5. Tool name stripping: remove 'mcp_' prefix from response tool calls

Before: 9/12 OK, 1 hard fail, 4 needed retries (~25% error rate)
After: 16/16 OK, 0 failures, 0 retries (0% error rate)

* installer: clarify why sudo is needed at every prompt

Every sudo prompt now explicitly states what packages are being installed
and that Hermes Agent itself does not require or retain root access.
Covers system packages, build tools, and Playwright browser deps.
2026-03-16 17:43:48 -07:00
Teknium 6794e79bb4 feat: add /bg as alias for /background slash command (#1590)
* feat: add optional smart model routing

Add a conservative cheap-vs-strong routing option that can send very short/simple turns to a cheaper model across providers while keeping the primary model for complex work. Wire it through CLI, gateway, and cron, and document the config.yaml workflow.

* fix(gateway): remove recursive ExecStop from systemd units, extend TimeoutStopSec to 60s

* fix(gateway): avoid recursive ExecStop in user systemd unit

* fix: extend ExecStop removal and TimeoutStopSec=60 to system unit

The cherry-picked PR #1448 fix only covered the user systemd unit.
The system unit had the same TimeoutStopSec=15 and could benefit
from the same 60s timeout for clean shutdown. Also adds a regression
test for the system unit.

---------

Co-authored-by: Ninja <ninja@local>

* feat(skills): add blender-mcp optional skill for 3D modeling

Control a running Blender instance from Hermes via socket connection
to the blender-mcp addon (port 9876). Supports creating 3D objects,
materials, animations, and running arbitrary bpy code.

Placed in optional-skills/ since it requires Blender 4.3+ desktop
with a third-party addon manually started each session.

* feat(acp): support slash commands in ACP adapter (#1532)

Adds /help, /model, /tools, /context, /reset, /compact, /version
to the ACP adapter (VS Code, Zed, JetBrains). Commands are handled
directly in the server without instantiating the TUI — each command
queries agent/session state and returns plain text.

Unrecognized /commands fall through to the LLM as normal messages.

/model uses detect_provider_for_model() for auto-detection when
switching models, matching the CLI and gateway behavior.

Fixes #1402

* fix(logging): improve error logging in session search tool (#1533)

* fix(gateway): restart on retryable startup failures (#1517)

* feat(email): add skip_attachments option via config.yaml

* feat(email): add skip_attachments option via config.yaml

Adds a config.yaml-driven option to skip email attachments in the
gateway email adapter. Useful for malware protection and bandwidth
savings.

Configure in config.yaml:
  platforms:
    email:
      skip_attachments: true

Based on PR #1521 by @an420eth, changed from env var to config.yaml
(via PlatformConfig.extra) to match the project's config-first pattern.

* docs: document skip_attachments option for email adapter

* fix(telegram): retry on transient TLS failures during connect and send

Add exponential-backoff retry (3 attempts) around initialize() to
handle transient TLS resets during gateway startup. Also catches
TimedOut and OSError in addition to NetworkError.

Add exponential-backoff retry (3 attempts) around send_message() for
NetworkError during message delivery, wrapping the existing Markdown
fallback logic.

Both imports are guarded with try/except ImportError for test
environments where telegram is mocked.

Based on PR #1527 by cmd8. Closes #1526.

* feat: permissive block_anchor thresholds and unicode normalization (#1539)

Salvaged from PR #1528 by an420eth. Closes #517.

Improves _strategy_block_anchor in fuzzy_match.py:
- Add unicode normalization (smart quotes, em/en-dashes, ellipsis,
  non-breaking spaces → ASCII) so LLM-produced unicode artifacts
  don't break anchor line matching
- Lower thresholds: 0.10 for unique matches (was 0.70), 0.30 for
  multiple candidates — if first/last lines match exactly, the
  block is almost certainly correct
- Use original (non-normalized) content for offset calculation to
  preserve correct character positions

Tested: 3 new scenarios fixed (em-dash anchors, non-breaking space
anchors, very-low-similarity unique matches), zero regressions on
all 9 existing fuzzy match tests.

Co-authored-by: an420eth <an420eth@users.noreply.github.com>

* feat(cli): add file path autocomplete in the input prompt (#1545)

When typing a path-like token (./  ../  ~/  /  or containing /),
the CLI now shows filesystem completions in the dropdown menu.
Directories show a trailing slash and 'dir' label; files show
their size. Completions are case-insensitive and capped at 30
entries.

Triggered by tokens like:
  edit ./src/ma     → shows ./src/main.py, ./src/manifest.json, ...
  check ~/doc       → shows ~/docs/, ~/documents/, ...
  read /etc/hos     → shows /etc/hosts, /etc/hostname, ...
  open tools/reg    → shows tools/registry.py

Slash command autocomplete (/help, /model, etc.) is unaffected —
it still triggers when the input starts with /.

Inspired by OpenCode PR #145 (file path completion menu).

Implementation:
- hermes_cli/commands.py: _extract_path_word() detects path-like
  tokens, _path_completions() yields filesystem Completions with
  size labels, get_completions() routes to paths vs slash commands
- tests/hermes_cli/test_path_completion.py: 26 tests covering
  path extraction, prefix filtering, directory markers, home
  expansion, case-insensitivity, integration with slash commands

* feat(privacy): redact PII from LLM context when privacy.redact_pii is enabled

Add privacy.redact_pii config option (boolean, default false). When
enabled, the gateway redacts personally identifiable information from
the system prompt before sending it to the LLM provider:

- Phone numbers (user IDs on WhatsApp/Signal) → hashed to user_<sha256>
- User IDs → hashed to user_<sha256>
- Chat IDs → numeric portion hashed, platform prefix preserved
- Home channel IDs → hashed
- Names/usernames → NOT affected (user-chosen, publicly visible)

Hashes are deterministic (same user → same hash) so the model can
still distinguish users in group chats. Routing and delivery use
the original values internally — redaction only affects LLM context.

Inspired by OpenClaw PR #47959.

* fix(privacy): skip PII redaction on Discord/Slack (mentions need real IDs)

Discord uses <@user_id> for mentions and Slack uses <@U12345> — the LLM
needs the real ID to tag users. Redaction now only applies to WhatsApp,
Signal, and Telegram where IDs are pure routing metadata.

Add 4 platform-specific tests covering Discord, WhatsApp, Signal, Slack.

* feat: smart approvals + /stop command (inspired by OpenAI Codex)

* feat: smart approvals — LLM-based risk assessment for dangerous commands

Adds a 'smart' approval mode that uses the auxiliary LLM to assess
whether a flagged command is genuinely dangerous or a false positive,
auto-approving low-risk commands without prompting the user.

Inspired by OpenAI Codex's Smart Approvals guardian subagent
(openai/codex#13860).

Config (config.yaml):
  approvals:
    mode: manual   # manual (default), smart, off

Modes:
- manual — current behavior, always prompt the user
- smart  — aux LLM evaluates risk: APPROVE (auto-allow), DENY (block),
           or ESCALATE (fall through to manual prompt)
- off    — skip all approval prompts (equivalent to --yolo)

When smart mode auto-approves, the pattern gets session-level approval
so subsequent uses of the same pattern don't trigger another LLM call.
When it denies, the command is blocked without user prompt. When
uncertain, it escalates to the normal manual approval flow.

The LLM prompt is carefully scoped: it sees only the command text and
the flagged reason, assesses actual risk vs false positive, and returns
a single-word verdict.

* feat: make smart approval model configurable via config.yaml

Adds auxiliary.approval section to config.yaml with the same
provider/model/base_url/api_key pattern as other aux tasks (vision,
web_extract, compression, etc.).

Config:
  auxiliary:
    approval:
      provider: auto
      model: ''        # fast/cheap model recommended
      base_url: ''
      api_key: ''

Bridged to env vars in both CLI and gateway paths so the aux client
picks them up automatically.

* feat: add /stop command to kill all background processes

Adds a /stop slash command that kills all running background processes
at once. Currently users have to process(list) then process(kill) for
each one individually.

Inspired by OpenAI Codex's separation of interrupt (Ctrl+C stops current
turn) from /stop (cleans up background processes). See openai/codex#14602.

Ctrl+C continues to only interrupt the active agent turn — background
dev servers, watchers, etc. are preserved. /stop is the explicit way
to clean them all up.

* feat: first-class plugin architecture + hide status bar cost by default (#1544)

The persistent status bar now shows context %, token counts, and
duration but NOT $ cost by default. Cost display is opt-in via:

  display:
    show_cost: true

in config.yaml, or: hermes config set display.show_cost true

The /usage command still shows full cost breakdown since the user
explicitly asked for it — this only affects the always-visible bar.

Status bar without cost:
  ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ 15m

Status bar with show_cost: true:
  ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ $0.06 │ 15m

* feat: improve memory prioritization + aggressive skill updates (inspired by OpenAI Codex)

* feat: improve memory prioritization — user preferences over procedural knowledge

Inspired by OpenAI Codex's memory prompt improvements (openai/codex#14493)
which focus memory writes on user preferences and recurring patterns
rather than procedural task details.

Key insight: 'Optimize for reducing future user steering — the most
valuable memory prevents the user from having to repeat themselves.'

Changes:
- MEMORY_GUIDANCE (prompt_builder.py): added prioritization hierarchy
  and the core principle about reducing user steering
- MEMORY_SCHEMA (memory_tool.py): reordered WHEN TO SAVE list to put
  corrections first, added explicit PRIORITY guidance
- Memory nudge (run_agent.py): now asks specifically about preferences,
  corrections, and workflow patterns instead of generic 'anything'
- Memory flush (run_agent.py): now instructs to prioritize user
  preferences and corrections over task-specific details

* feat: more aggressive skill creation and update prompting

Press harder on skill updates — the agent should proactively patch
skills when it encounters issues during use, not wait to be asked.

Changes:
- SKILLS_GUIDANCE: 'consider saving' → 'save'; added explicit instruction
  to patch skills immediately when found outdated/wrong
- Skills header: added instruction to update loaded skills before finishing
  if they had missing steps or wrong commands
- Skill nudge: more assertive ('save the approach' not 'consider saving'),
  now also prompts for updating existing skills used in the task
- Skill nudge interval: lowered default from 15 to 10 iterations
- skill_manage schema: added 'patch it immediately' to update triggers

* feat: first-class plugin architecture (#1555)

Plugin system for extending Hermes with custom tools, hooks, and
integrations — no source code changes required.

Core system (hermes_cli/plugins.py):
  - Plugin discovery from ~/.hermes/plugins/, .hermes/plugins/, and
    pip entry_points (hermes_agent.plugins group)
  - PluginContext with register_tool() and register_hook()
  - 6 lifecycle hooks: pre/post tool_call, pre/post llm_call,
    on_session_start/end
  - Namespace package handling for relative imports in plugins
  - Graceful error isolation — broken plugins never crash the agent

Integration (model_tools.py):
  - Plugin discovery runs after built-in + MCP tools
  - Plugin tools bypass toolset filter via get_plugin_tool_names()
  - Pre/post tool call hooks fire in handle_function_call()

CLI:
  - /plugins command shows loaded plugins, tool counts, status
  - Added to COMMANDS dict for autocomplete

Docs:
  - Getting started guide (build-a-hermes-plugin.md) — full tutorial
    building a calculator plugin step by step
  - Reference page (features/plugins.md) — quick overview + tables
  - Covers: file structure, schemas, handlers, hooks, data files,
    bundled skills, env var gating, pip distribution, common mistakes

Tests: 16 tests covering discovery, loading, hooks, tool visibility.

* feat: add /bg as alias for /background slash command

Adds /bg alias across CLI, gateway, and Slack platform adapter.
Updates help text, autocomplete, known_commands set, and dispatch
logic. Includes tests for the new alias.

* docs: add plan for centralized slash command registry

Scopes a refactor to replace 7+ scattered command definition sites
with a single CommandDef registry in hermes_cli/commands.py. Includes
derived helper functions for gateway help text, Telegram BotCommands,
Slack subcommand maps, and alias resolution.

Documents current drift (Telegram missing /rollback + /background,
Slack missing /voice + /update, gateway dead code) that the refactor
fixes for free.

---------

Co-authored-by: Ninja <ninja@local>
Co-authored-by: alireza78a <alireza78a@users.noreply.github.com>
Co-authored-by: Oktay Aydin <113846926+aydnOktay@users.noreply.github.com>
Co-authored-by: JP Lew <polydegen@protonmail.com>
Co-authored-by: an420eth <an420eth@users.noreply.github.com>
2026-03-16 17:27:02 -07:00
Teknium 181077b785 fix: hide Honcho session line on CLI load when no API key configured (#1582)
HonchoClientConfig.from_env() set enabled=True unconditionally,
even when HONCHO_API_KEY was not set. When ~/.honcho/config.json
didn't exist, from_global_config() fell back to from_env() and
returned enabled=True with a null api_key, causing the Honcho
session indicator to display on every CLI launch.

Fix: from_env() now sets enabled=bool(api_key), matching the
auto-enable logic already used in from_global_config().
Also added api_key guard to the CLI display as defense-in-depth.
2026-03-16 17:22:52 -07:00
SHL0MS 63635744bf Refactor ascii-video skill: creative-first SKILL.md, consolidate reference files 2026-03-16 20:11:12 -04:00
Teknium 2158c44efd fix: Anthropic OAuth compatibility — Claude Code identity fingerprinting (#1597)
Anthropic routes OAuth/subscription requests based on Claude Code's
identity markers. Without them, requests get intermittent 500 errors
(~25% failure rate observed). This matches what pi-ai (clawdbot) and
OpenCode both implement for OAuth compatibility.

Changes (OAuth tokens only — API key users unaffected):

1. Headers: user-agent 'claude-cli/2.1.2 (external, cli)' + x-app 'cli'
2. System prompt: prepend 'You are Claude Code, Anthropic's official CLI'
3. System prompt sanitization: replace Hermes/Nous references
4. Tool names: prefix with 'mcp_' (Claude Code convention for non-native tools)
5. Tool name stripping: remove 'mcp_' prefix from response tool calls

Before: 9/12 OK, 1 hard fail, 4 needed retries (~25% error rate)
After: 16/16 OK, 0 failures, 0 retries (0% error rate)
2026-03-16 17:08:22 -07:00
Teknium e6cf1c94a8 Merge pull request #1585 from 0xbyt4/fix/anthropic-error-handling
fix(anthropic): retry 429/529 errors and surface error details to users
2026-03-16 15:46:06 -07:00
0xbyt4 d998cac319 fix(anthropic): retry 429/529 errors and surface error details to users
- 429 rate limit and 529 overloaded were incorrectly treated as
  non-retryable client errors, causing immediate failure instead of
  exponential backoff retry. Users hitting Anthropic rate limits got
  silent failures or no response at all.
- Generic "Sorry, I encountered an unexpected error" now includes
  error type, details, and status-specific hints (auth, rate limit,
  overloaded).
- Failed agent with final_response=None now surfaces the actual
  error message instead of returning an empty response.
2026-03-17 01:07:11 +03:00
Teknium 6c84e26e70 Merge pull request #1538 from NousResearch/hermes/hermes-a098c323
feat: unified streaming infrastructure — real-time token delivery for CLI + gateway
2026-03-16 14:22:57 -07:00
teknium1 f4d61c168b merge: resolve conflicts with main (show_cost, turn routing, docker docs) 2026-03-16 14:22:38 -07:00
teknium1 8feb9e4656 docs: add streaming section to configuration guide 2026-03-16 12:53:49 -07:00
teknium1 25a1f1867f fix(gateway): prevent message flooding on adapters without edit support
When the stream consumer's first edit_message() call fails (Signal,
Email, HomeAssistant don't support editing), it now disables editing
for the rest of the stream instead of falling back to sending a new
message every 0.3 seconds. The final response is delivered by the
normal send path since already_sent stays false.

Without this fix, enabling gateway streaming on Signal/Email/HA would
flood the chat with dozens of partial messages.
2026-03-16 12:41:28 -07:00
Teknium 5e5c92663d fix: hermes update causes dual gateways on macOS (launchd) (#1567)
* feat: add optional smart model routing

Add a conservative cheap-vs-strong routing option that can send very short/simple turns to a cheaper model across providers while keeping the primary model for complex work. Wire it through CLI, gateway, and cron, and document the config.yaml workflow.

* fix(gateway): remove recursive ExecStop from systemd units, extend TimeoutStopSec to 60s

* fix(gateway): avoid recursive ExecStop in user systemd unit

* fix: extend ExecStop removal and TimeoutStopSec=60 to system unit

The cherry-picked PR #1448 fix only covered the user systemd unit.
The system unit had the same TimeoutStopSec=15 and could benefit
from the same 60s timeout for clean shutdown. Also adds a regression
test for the system unit.

---------

Co-authored-by: Ninja <ninja@local>

* feat(skills): add blender-mcp optional skill for 3D modeling

Control a running Blender instance from Hermes via socket connection
to the blender-mcp addon (port 9876). Supports creating 3D objects,
materials, animations, and running arbitrary bpy code.

Placed in optional-skills/ since it requires Blender 4.3+ desktop
with a third-party addon manually started each session.

* feat(acp): support slash commands in ACP adapter (#1532)

Adds /help, /model, /tools, /context, /reset, /compact, /version
to the ACP adapter (VS Code, Zed, JetBrains). Commands are handled
directly in the server without instantiating the TUI — each command
queries agent/session state and returns plain text.

Unrecognized /commands fall through to the LLM as normal messages.

/model uses detect_provider_for_model() for auto-detection when
switching models, matching the CLI and gateway behavior.

Fixes #1402

* fix(logging): improve error logging in session search tool (#1533)

* fix(gateway): restart on retryable startup failures (#1517)

* feat(email): add skip_attachments option via config.yaml

* feat(email): add skip_attachments option via config.yaml

Adds a config.yaml-driven option to skip email attachments in the
gateway email adapter. Useful for malware protection and bandwidth
savings.

Configure in config.yaml:
  platforms:
    email:
      skip_attachments: true

Based on PR #1521 by @an420eth, changed from env var to config.yaml
(via PlatformConfig.extra) to match the project's config-first pattern.

* docs: document skip_attachments option for email adapter

* fix(telegram): retry on transient TLS failures during connect and send

Add exponential-backoff retry (3 attempts) around initialize() to
handle transient TLS resets during gateway startup. Also catches
TimedOut and OSError in addition to NetworkError.

Add exponential-backoff retry (3 attempts) around send_message() for
NetworkError during message delivery, wrapping the existing Markdown
fallback logic.

Both imports are guarded with try/except ImportError for test
environments where telegram is mocked.

Based on PR #1527 by cmd8. Closes #1526.

* feat: permissive block_anchor thresholds and unicode normalization (#1539)

Salvaged from PR #1528 by an420eth. Closes #517.

Improves _strategy_block_anchor in fuzzy_match.py:
- Add unicode normalization (smart quotes, em/en-dashes, ellipsis,
  non-breaking spaces → ASCII) so LLM-produced unicode artifacts
  don't break anchor line matching
- Lower thresholds: 0.10 for unique matches (was 0.70), 0.30 for
  multiple candidates — if first/last lines match exactly, the
  block is almost certainly correct
- Use original (non-normalized) content for offset calculation to
  preserve correct character positions

Tested: 3 new scenarios fixed (em-dash anchors, non-breaking space
anchors, very-low-similarity unique matches), zero regressions on
all 9 existing fuzzy match tests.

Co-authored-by: an420eth <an420eth@users.noreply.github.com>

* feat(cli): add file path autocomplete in the input prompt (#1545)

When typing a path-like token (./  ../  ~/  /  or containing /),
the CLI now shows filesystem completions in the dropdown menu.
Directories show a trailing slash and 'dir' label; files show
their size. Completions are case-insensitive and capped at 30
entries.

Triggered by tokens like:
  edit ./src/ma     → shows ./src/main.py, ./src/manifest.json, ...
  check ~/doc       → shows ~/docs/, ~/documents/, ...
  read /etc/hos     → shows /etc/hosts, /etc/hostname, ...
  open tools/reg    → shows tools/registry.py

Slash command autocomplete (/help, /model, etc.) is unaffected —
it still triggers when the input starts with /.

Inspired by OpenCode PR #145 (file path completion menu).

Implementation:
- hermes_cli/commands.py: _extract_path_word() detects path-like
  tokens, _path_completions() yields filesystem Completions with
  size labels, get_completions() routes to paths vs slash commands
- tests/hermes_cli/test_path_completion.py: 26 tests covering
  path extraction, prefix filtering, directory markers, home
  expansion, case-insensitivity, integration with slash commands

* feat(privacy): redact PII from LLM context when privacy.redact_pii is enabled

Add privacy.redact_pii config option (boolean, default false). When
enabled, the gateway redacts personally identifiable information from
the system prompt before sending it to the LLM provider:

- Phone numbers (user IDs on WhatsApp/Signal) → hashed to user_<sha256>
- User IDs → hashed to user_<sha256>
- Chat IDs → numeric portion hashed, platform prefix preserved
- Home channel IDs → hashed
- Names/usernames → NOT affected (user-chosen, publicly visible)

Hashes are deterministic (same user → same hash) so the model can
still distinguish users in group chats. Routing and delivery use
the original values internally — redaction only affects LLM context.

Inspired by OpenClaw PR #47959.

* fix(privacy): skip PII redaction on Discord/Slack (mentions need real IDs)

Discord uses <@user_id> for mentions and Slack uses <@U12345> — the LLM
needs the real ID to tag users. Redaction now only applies to WhatsApp,
Signal, and Telegram where IDs are pure routing metadata.

Add 4 platform-specific tests covering Discord, WhatsApp, Signal, Slack.

* feat: smart approvals + /stop command (inspired by OpenAI Codex)

* feat: smart approvals — LLM-based risk assessment for dangerous commands

Adds a 'smart' approval mode that uses the auxiliary LLM to assess
whether a flagged command is genuinely dangerous or a false positive,
auto-approving low-risk commands without prompting the user.

Inspired by OpenAI Codex's Smart Approvals guardian subagent
(openai/codex#13860).

Config (config.yaml):
  approvals:
    mode: manual   # manual (default), smart, off

Modes:
- manual — current behavior, always prompt the user
- smart  — aux LLM evaluates risk: APPROVE (auto-allow), DENY (block),
           or ESCALATE (fall through to manual prompt)
- off    — skip all approval prompts (equivalent to --yolo)

When smart mode auto-approves, the pattern gets session-level approval
so subsequent uses of the same pattern don't trigger another LLM call.
When it denies, the command is blocked without user prompt. When
uncertain, it escalates to the normal manual approval flow.

The LLM prompt is carefully scoped: it sees only the command text and
the flagged reason, assesses actual risk vs false positive, and returns
a single-word verdict.

* feat: make smart approval model configurable via config.yaml

Adds auxiliary.approval section to config.yaml with the same
provider/model/base_url/api_key pattern as other aux tasks (vision,
web_extract, compression, etc.).

Config:
  auxiliary:
    approval:
      provider: auto
      model: ''        # fast/cheap model recommended
      base_url: ''
      api_key: ''

Bridged to env vars in both CLI and gateway paths so the aux client
picks them up automatically.

* feat: add /stop command to kill all background processes

Adds a /stop slash command that kills all running background processes
at once. Currently users have to process(list) then process(kill) for
each one individually.

Inspired by OpenAI Codex's separation of interrupt (Ctrl+C stops current
turn) from /stop (cleans up background processes). See openai/codex#14602.

Ctrl+C continues to only interrupt the active agent turn — background
dev servers, watchers, etc. are preserved. /stop is the explicit way
to clean them all up.

* feat: first-class plugin architecture + hide status bar cost by default (#1544)

The persistent status bar now shows context %, token counts, and
duration but NOT $ cost by default. Cost display is opt-in via:

  display:
    show_cost: true

in config.yaml, or: hermes config set display.show_cost true

The /usage command still shows full cost breakdown since the user
explicitly asked for it — this only affects the always-visible bar.

Status bar without cost:
  ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ 15m

Status bar with show_cost: true:
  ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ $0.06 │ 15m

* feat: improve memory prioritization + aggressive skill updates (inspired by OpenAI Codex)

* feat: improve memory prioritization — user preferences over procedural knowledge

Inspired by OpenAI Codex's memory prompt improvements (openai/codex#14493)
which focus memory writes on user preferences and recurring patterns
rather than procedural task details.

Key insight: 'Optimize for reducing future user steering — the most
valuable memory prevents the user from having to repeat themselves.'

Changes:
- MEMORY_GUIDANCE (prompt_builder.py): added prioritization hierarchy
  and the core principle about reducing user steering
- MEMORY_SCHEMA (memory_tool.py): reordered WHEN TO SAVE list to put
  corrections first, added explicit PRIORITY guidance
- Memory nudge (run_agent.py): now asks specifically about preferences,
  corrections, and workflow patterns instead of generic 'anything'
- Memory flush (run_agent.py): now instructs to prioritize user
  preferences and corrections over task-specific details

* feat: more aggressive skill creation and update prompting

Press harder on skill updates — the agent should proactively patch
skills when it encounters issues during use, not wait to be asked.

Changes:
- SKILLS_GUIDANCE: 'consider saving' → 'save'; added explicit instruction
  to patch skills immediately when found outdated/wrong
- Skills header: added instruction to update loaded skills before finishing
  if they had missing steps or wrong commands
- Skill nudge: more assertive ('save the approach' not 'consider saving'),
  now also prompts for updating existing skills used in the task
- Skill nudge interval: lowered default from 15 to 10 iterations
- skill_manage schema: added 'patch it immediately' to update triggers

* feat: first-class plugin architecture (#1555)

Plugin system for extending Hermes with custom tools, hooks, and
integrations — no source code changes required.

Core system (hermes_cli/plugins.py):
  - Plugin discovery from ~/.hermes/plugins/, .hermes/plugins/, and
    pip entry_points (hermes_agent.plugins group)
  - PluginContext with register_tool() and register_hook()
  - 6 lifecycle hooks: pre/post tool_call, pre/post llm_call,
    on_session_start/end
  - Namespace package handling for relative imports in plugins
  - Graceful error isolation — broken plugins never crash the agent

Integration (model_tools.py):
  - Plugin discovery runs after built-in + MCP tools
  - Plugin tools bypass toolset filter via get_plugin_tool_names()
  - Pre/post tool call hooks fire in handle_function_call()

CLI:
  - /plugins command shows loaded plugins, tool counts, status
  - Added to COMMANDS dict for autocomplete

Docs:
  - Getting started guide (build-a-hermes-plugin.md) — full tutorial
    building a calculator plugin step by step
  - Reference page (features/plugins.md) — quick overview + tables
  - Covers: file structure, schemas, handlers, hooks, data files,
    bundled skills, env var gating, pip distribution, common mistakes

Tests: 16 tests covering discovery, loading, hooks, tool visibility.

* fix: hermes update causes dual gateways on macOS (launchd)

Three bugs worked together to create the dual-gateway problem:

1. cmd_update only checked systemd for gateway restart, completely
   ignoring launchd on macOS. After killing the PID it would print
   'Restart it with: hermes gateway run' even when launchd was about
   to auto-respawn the process.

2. launchd's KeepAlive.SuccessfulExit=false respawns the gateway
   after SIGTERM (non-zero exit), so the user's manual restart
   created a second instance.

3. The launchd plist lacked --replace (systemd had it), so the
   respawned gateway didn't kill stale instances on startup.

Fixes:
- Add --replace to launchd ProgramArguments (matches systemd)
- Add launchd detection to cmd_update's auto-restart logic
- Print 'auto-restart via launchd' instead of manual restart hint

* fix: add launchd plist auto-refresh + explicit restart in cmd_update

Two integration issues with the initial fix:

1. Existing macOS users with old plist (no --replace) would never
   get the fix until manual uninstall/reinstall. Added
   refresh_launchd_plist_if_needed() — mirrors the existing
   refresh_systemd_unit_if_needed(). Called from launchd_start(),
   launchd_restart(), and cmd_update.

2. cmd_update relied on KeepAlive respawn after SIGTERM rather than
   explicit launchctl stop/start. This caused races: launchd would
   respawn the old process before the PID file was cleaned up.
   Now does explicit stop+start (matching how systemd gets an
   explicit systemctl restart), with plist refresh first so the
   new --replace flag is picked up.

---------

Co-authored-by: Ninja <ninja@local>
Co-authored-by: alireza78a <alireza78a@users.noreply.github.com>
Co-authored-by: Oktay Aydin <113846926+aydnOktay@users.noreply.github.com>
Co-authored-by: JP Lew <polydegen@protonmail.com>
Co-authored-by: an420eth <an420eth@users.noreply.github.com>
2026-03-16 12:36:29 -07:00
teknium1 942950f5b9 feat(cli): live reasoning token streaming — dim box above response
When both display.streaming and display.show_reasoning are enabled,
reasoning tokens stream in real-time into a dim bordered box. When
content tokens start arriving, the reasoning box closes and the
response box opens — smooth visual transition.

- _stream_reasoning_delta(): line-buffered rendering in dim text
- _close_reasoning_box(): flush + close, called on first content token
- Reasoning callback routes to streaming version when both flags set
- Skips static post-response reasoning display when streamed live
- State reset per turn via _reset_stream_state()

Works with reasoning_content deltas (OpenRouter reasoning mode) and
thinking_delta events (Anthropic extended thinking).
2026-03-16 10:29:55 -07:00
teknium1 d3687d3e81 docs: document planned live reasoning token display as future enhancement
The streaming infrastructure already fires reasoning deltas via
_fire_reasoning_delta() during streaming. The remaining work is the
CLI display layer: a dim reasoning box that opens on first reasoning
token, streams live, then transitions to the response box.

Reference: PR #1214 (raulvidis) for gateway reasoning visibility.
2026-03-16 10:22:44 -07:00
Muhammet Eren Karakuş 43b8ecd172 fix(tests): use case-insensitive regex in singularity preflight tests
pytest.raises(match=...) is case-sensitive by default. The error
message starts with "Neither" (capital N) but the regex used lowercase
"neither", causing CI failures on Linux.
2026-03-16 19:01:39 +03:00
Muhammet Eren Karakuş 606f57a3ab fix(terminal): add Singularity/Apptainer preflight availability check
When neither apptainer nor singularity is installed, the Singularity
backend silently defaults to "singularity" and fails with a cryptic
FileNotFoundError inside _start_instance().  Add a preflight check
that resolves the executable and verifies it responds, raising a
clear RuntimeError with install instructions on failure.

Closes #1511
2026-03-16 18:25:20 +03:00
teknium1 23b9d88a76 docs: add streaming config to cli-config.yaml.example and defaults
Documents the new streaming options in the example config:
- display.streaming for CLI (under display section)
- streaming.enabled + transport/interval/threshold/cursor for gateway
- Added streaming: false to load_cli_config() defaults dict
2026-03-16 07:53:08 -07:00
teknium1 c0b88018eb feat: ship streaming disabled by default — opt-in via config
Streaming is now off by default for both CLI and gateway. Users opt in:

CLI (config.yaml):
  display:
    streaming: true

Gateway (config.yaml):
  streaming:
    enabled: true

This lets early adopters test streaming while existing users see zero
change. Once we have enough field validation, we flip the default to
true in a subsequent release.
2026-03-16 07:44:42 -07:00
teknium1 fc4080c58a fix(cli): add <THINKING> to streaming tag suppression list
Anthropic native models emit <THINKING> tags in text content (separate
from the SDK's thinking_delta events). Without suppression, these tags
leak into the streamed CLI output. Found during live provider testing.
2026-03-16 07:34:29 -07:00
Teknium 91b9495b04 feat(browser): /browser connect — attach browser tools to live Chrome via CDP (#1549)
feat(browser): /browser connect — attach browser tools to live Chrome via CDP
2026-03-16 07:32:07 -07:00
teknium1 c2769dffe0 merge: resolve conflicts with main (plugins + stop commands) 2026-03-16 07:32:00 -07:00
teknium1 71e35311f5 fix(browser): model waits for user instruction after /browser connect
Updated the injected context message to tell the model to await the
user's instruction before operating the browser. Typical flow is:
user opens Chrome → logs into sites → /browser connect → tells the
agent what to do.
2026-03-16 07:20:43 -07:00
Teknium 97990e7ad5 feat: first-class plugin architecture (#1555)
Plugin system for extending Hermes with custom tools, hooks, and
integrations — no source code changes required.

Core system (hermes_cli/plugins.py):
  - Plugin discovery from ~/.hermes/plugins/, .hermes/plugins/, and
    pip entry_points (hermes_agent.plugins group)
  - PluginContext with register_tool() and register_hook()
  - 6 lifecycle hooks: pre/post tool_call, pre/post llm_call,
    on_session_start/end
  - Namespace package handling for relative imports in plugins
  - Graceful error isolation — broken plugins never crash the agent

Integration (model_tools.py):
  - Plugin discovery runs after built-in + MCP tools
  - Plugin tools bypass toolset filter via get_plugin_tool_names()
  - Pre/post tool call hooks fire in handle_function_call()

CLI:
  - /plugins command shows loaded plugins, tool counts, status
  - Added to COMMANDS dict for autocomplete

Docs:
  - Getting started guide (build-a-hermes-plugin.md) — full tutorial
    building a calculator plugin step by step
  - Reference page (features/plugins.md) — quick overview + tables
  - Covers: file structure, schemas, handlers, hooks, data files,
    bundled skills, env var gating, pip distribution, common mistakes

Tests: 16 tests covering discovery, loading, hooks, tool visibility.
2026-03-16 07:17:36 -07:00
teknium1 73f39a7761 feat(browser): auto-launch Chrome when /browser connect finds no debugger
When /browser connect detects that port 9222 isn't open, it now:
1. Finds Chrome/Chromium/Brave/Edge on the system (macOS app bundles
   or Linux PATH lookup)
2. Launches it with --remote-debugging-port=9222 (detached)
3. Waits up to 5 seconds for the port to come up
4. Falls back to manual instructions if auto-launch fails

This means GUI-only users can just type /browser connect without
needing to know about terminal flags or Chrome launch commands.
2026-03-16 07:05:48 -07:00
Teknium 1ecfe68675 feat: improve memory prioritization + aggressive skill updates (inspired by OpenAI Codex)
* feat: improve memory prioritization — user preferences over procedural knowledge

Inspired by OpenAI Codex's memory prompt improvements (openai/codex#14493)
which focus memory writes on user preferences and recurring patterns
rather than procedural task details.

Key insight: 'Optimize for reducing future user steering — the most
valuable memory prevents the user from having to repeat themselves.'

Changes:
- MEMORY_GUIDANCE (prompt_builder.py): added prioritization hierarchy
  and the core principle about reducing user steering
- MEMORY_SCHEMA (memory_tool.py): reordered WHEN TO SAVE list to put
  corrections first, added explicit PRIORITY guidance
- Memory nudge (run_agent.py): now asks specifically about preferences,
  corrections, and workflow patterns instead of generic 'anything'
- Memory flush (run_agent.py): now instructs to prioritize user
  preferences and corrections over task-specific details

* feat: more aggressive skill creation and update prompting

Press harder on skill updates — the agent should proactively patch
skills when it encounters issues during use, not wait to be asked.

Changes:
- SKILLS_GUIDANCE: 'consider saving' → 'save'; added explicit instruction
  to patch skills immediately when found outdated/wrong
- Skills header: added instruction to update loaded skills before finishing
  if they had missing steps or wrong commands
- Skill nudge: more assertive ('save the approach' not 'consider saving'),
  now also prompts for updating existing skills used in the task
- Skill nudge interval: lowered default from 15 to 10 iterations
- skill_manage schema: added 'patch it immediately' to update triggers
2026-03-16 06:52:32 -07:00
Teknium 447594be28 feat: first-class plugin architecture + hide status bar cost by default (#1544)
The persistent status bar now shows context %, token counts, and
duration but NOT $ cost by default. Cost display is opt-in via:

  display:
    show_cost: true

in config.yaml, or: hermes config set display.show_cost true

The /usage command still shows full cost breakdown since the user
explicitly asked for it — this only affects the always-visible bar.

Status bar without cost:
  ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ 15m

Status bar with show_cost: true:
  ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ $0.06 │ 15m
2026-03-16 06:43:57 -07:00
teknium1 9d1483c7e6 feat(browser): /browser connect — attach browser tools to live Chrome via CDP
Add /browser slash command for connecting browser tools to the user's
live Chrome instance via Chrome DevTools Protocol:

  /browser connect       — connect to Chrome on localhost:9222
  /browser connect ws://host:port  — custom CDP endpoint
  /browser disconnect    — revert to default (headless/Browserbase)
  /browser status        — show current browser mode + connectivity

When connected:
- All browser tools (navigate, snapshot, click, etc.) control the
  user's real Chrome — logged-in sessions, cookies, open tabs
- Platform-specific Chrome launch instructions are shown
- Port connectivity is tested immediately
- A context message is injected so the model knows it's controlling
  a live browser and should be mindful of user's open tabs

Implementation:
- BROWSER_CDP_URL env var drives the backend selection in browser_tool.py
- New _create_cdp_session() creates sessions using the CDP override
- _get_cdp_override() checked before local/Browserbase selection
- Existing agent-browser --cdp flag handles the actual CDP connection

Inspired by OpenClaw's browser profile system.
2026-03-16 06:38:20 -07:00
teknium1 8e07f9ca56 fix: audit fixes — 5 bugs found and resolved
Thorough code review found 5 issues across run_agent.py, cli.py, and gateway/:

1. CRITICAL — Gateway stream consumer task never started: stream_consumer_holder
   was checked BEFORE run_sync populated it. Fixed with async polling pattern
   (same as track_agent).

2. MEDIUM-HIGH — Streaming fallback after partial delivery caused double-response:
   if streaming failed after some tokens were delivered, the fallback would
   re-deliver the full response. Now tracks deltas_were_sent and only falls
   back when no tokens reached consumers yet.

3. MEDIUM — Codex mode lost on_first_delta spinner callback: _run_codex_stream
   now accepts on_first_delta parameter, fires it on first text delta. Passed
   through from _interruptible_streaming_api_call via _codex_on_first_delta
   instance attribute.

4. MEDIUM — CLI close-tag after-text bypassed tag filtering: text after a
   reasoning close tag was sent directly to _emit_stream_text, skipping
   open-tag detection. Now routes through _stream_delta for full filtering.

5. LOW — Removed 140 lines of dead code: old _streaming_api_call method
   (superseded by _interruptible_streaming_api_call). Updated 13 tests in
   test_run_agent.py and test_openai_client_lifecycle.py to use the new
   method name and signature.

4573 tests passing.
2026-03-16 06:35:46 -07:00
Teknium 57be18c026 feat: smart approvals + /stop command (inspired by OpenAI Codex)
* feat: smart approvals — LLM-based risk assessment for dangerous commands

Adds a 'smart' approval mode that uses the auxiliary LLM to assess
whether a flagged command is genuinely dangerous or a false positive,
auto-approving low-risk commands without prompting the user.

Inspired by OpenAI Codex's Smart Approvals guardian subagent
(openai/codex#13860).

Config (config.yaml):
  approvals:
    mode: manual   # manual (default), smart, off

Modes:
- manual — current behavior, always prompt the user
- smart  — aux LLM evaluates risk: APPROVE (auto-allow), DENY (block),
           or ESCALATE (fall through to manual prompt)
- off    — skip all approval prompts (equivalent to --yolo)

When smart mode auto-approves, the pattern gets session-level approval
so subsequent uses of the same pattern don't trigger another LLM call.
When it denies, the command is blocked without user prompt. When
uncertain, it escalates to the normal manual approval flow.

The LLM prompt is carefully scoped: it sees only the command text and
the flagged reason, assesses actual risk vs false positive, and returns
a single-word verdict.

* feat: make smart approval model configurable via config.yaml

Adds auxiliary.approval section to config.yaml with the same
provider/model/base_url/api_key pattern as other aux tasks (vision,
web_extract, compression, etc.).

Config:
  auxiliary:
    approval:
      provider: auto
      model: ''        # fast/cheap model recommended
      base_url: ''
      api_key: ''

Bridged to env vars in both CLI and gateway paths so the aux client
picks them up automatically.

* feat: add /stop command to kill all background processes

Adds a /stop slash command that kills all running background processes
at once. Currently users have to process(list) then process(kill) for
each one individually.

Inspired by OpenAI Codex's separation of interrupt (Ctrl+C stops current
turn) from /stop (cleans up background processes). See openai/codex#14602.

Ctrl+C continues to only interrupt the active agent turn — background
dev servers, watchers, etc. are preserved. /stop is the explicit way
to clean them all up.
2026-03-16 06:20:11 -07:00
teknium1 99369b926c fix: always fall back to non-streaming on ANY streaming error
Previously the fallback only triggered on specific error keywords like
'streaming is not supported'. Many third-party providers have partial
or broken streaming — rejecting stream=True, crashing on stream_options,
dropping connections mid-stream, returning malformed chunks, etc.

Now: any exception during the streaming API call triggers an automatic
fallback to the standard non-streaming request path. The error is logged
at INFO level for diagnostics but never surfaces to the user. If the
fallback also fails, THAT error propagates normally.

This ensures streaming is additive — it improves UX when it works but
never breaks providers that don't support it.

Tests: 2 new (any-error fallback, double-failure propagation), 15 total.
2026-03-16 06:15:09 -07:00
Teknium 2633272ea9 feat(privacy): redact PII from LLM context when privacy.redact_pii is enabled (#1542)
feat(privacy): redact PII from LLM context when privacy.redact_pii is enabled
2026-03-16 06:08:17 -07:00
Teknium 2ba219fa4b feat(cli): add file path autocomplete in the input prompt (#1545)
When typing a path-like token (./  ../  ~/  /  or containing /),
the CLI now shows filesystem completions in the dropdown menu.
Directories show a trailing slash and 'dir' label; files show
their size. Completions are case-insensitive and capped at 30
entries.

Triggered by tokens like:
  edit ./src/ma     → shows ./src/main.py, ./src/manifest.json, ...
  check ~/doc       → shows ~/docs/, ~/documents/, ...
  read /etc/hos     → shows /etc/hosts, /etc/hostname, ...
  open tools/reg    → shows tools/registry.py

Slash command autocomplete (/help, /model, etc.) is unaffected —
it still triggers when the input starts with /.

Inspired by OpenCode PR #145 (file path completion menu).

Implementation:
- hermes_cli/commands.py: _extract_path_word() detects path-like
  tokens, _path_completions() yields filesystem Completions with
  size labels, get_completions() routes to paths vs slash commands
- tests/hermes_cli/test_path_completion.py: 26 tests covering
  path extraction, prefix filtering, directory markers, home
  expansion, case-insensitivity, integration with slash commands
2026-03-16 06:07:45 -07:00
teknium1 9a423c3487 fix(privacy): skip PII redaction on Discord/Slack (mentions need real IDs)
Discord uses <@user_id> for mentions and Slack uses <@U12345> — the LLM
needs the real ID to tag users. Redaction now only applies to WhatsApp,
Signal, and Telegram where IDs are pure routing metadata.

Add 4 platform-specific tests covering Discord, WhatsApp, Signal, Slack.
2026-03-16 05:58:34 -07:00
teknium1 5479bb0e0c feat(gateway): streaming token delivery — StreamingConfig, GatewayStreamConsumer, already_sent
Stage 3 of streaming support. Gateway now streams tokens to messaging platforms:

- StreamingConfig dataclass (enabled, transport, edit_interval, buffer_threshold, cursor)
  on GatewayConfig with from_dict/to_dict serialization
- GatewayStreamConsumer: async queue-based consumer that progressively edits
  a single message on the target platform (edit transport)
- on_delta() → queue → run() async task → send_or_edit() with rate limiting
- already_sent propagation: when streaming delivered the response, handler
  returns None so base adapter skips duplicate send()
- stream_delta_callback wired into AIAgent constructor in _run_agent
- Consumer lifecycle: started as asyncio task, awaited with timeout in finally

Config (config.yaml):
  streaming:
    enabled: true
    transport: edit      # progressive editMessageText
    edit_interval: 0.3   # seconds between edits
    buffer_threshold: 40 # chars before forcing flush
    cursor: ' ▉'

Credit: jobless0x (#774, #1312), OutThisLife (#798), clicksingh (#697).
2026-03-16 05:52:42 -07:00
teknium1 c51e7b4af7 feat(privacy): redact PII from LLM context when privacy.redact_pii is enabled
Add privacy.redact_pii config option (boolean, default false). When
enabled, the gateway redacts personally identifiable information from
the system prompt before sending it to the LLM provider:

- Phone numbers (user IDs on WhatsApp/Signal) → hashed to user_<sha256>
- User IDs → hashed to user_<sha256>
- Chat IDs → numeric portion hashed, platform prefix preserved
- Home channel IDs → hashed
- Names/usernames → NOT affected (user-chosen, publicly visible)

Hashes are deterministic (same user → same hash) so the model can
still distinguish users in group chats. Routing and delivery use
the original values internally — redaction only affects LLM context.

Inspired by OpenClaw PR #47959.
2026-03-16 05:48:45 -07:00
Teknium 7d2c786acc Merge pull request #1534 from NousResearch/fix/1445-docker-cwd-optin
fix(docker): make cwd workspace mount explicit opt-in
2026-03-16 05:42:21 -07:00
teknium1 b72f522e30 test: fake minisweagent for docker cwd mount regressions
Make the new Docker cwd-mount tests pass in CI environments that do not have the minisweagent package installed by injecting a fake module instead of monkeypatching an import path that may not exist.
2026-03-16 05:40:05 -07:00
Teknium 352980311b feat: permissive block_anchor thresholds and unicode normalization (#1539)
Salvaged from PR #1528 by an420eth. Closes #517.

Improves _strategy_block_anchor in fuzzy_match.py:
- Add unicode normalization (smart quotes, em/en-dashes, ellipsis,
  non-breaking spaces → ASCII) so LLM-produced unicode artifacts
  don't break anchor line matching
- Lower thresholds: 0.10 for unique matches (was 0.70), 0.30 for
  multiple candidates — if first/last lines match exactly, the
  block is almost certainly correct
- Use original (non-normalized) content for offset calculation to
  preserve correct character positions

Tested: 3 new scenarios fixed (em-dash anchors, non-breaking space
anchors, very-low-similarity unique matches), zero regressions on
all 9 existing fuzzy match tests.

Co-authored-by: an420eth <an420eth@users.noreply.github.com>
2026-03-16 05:29:25 -07:00
Teknium b411b979cb fix(telegram): retry on transient TLS failures during connect and send (#1535)
fix(telegram): retry on transient TLS failures during connect and send
2026-03-16 05:28:11 -07:00
teknium1 ac739e485f fix(cli): reasoning tag suppression during streaming + fix fallback detection
Fixes two issues found during live testing:

1. Reasoning tag suppression: close tags like </REASONING_SCRATCHPAD>
   that arrive split across stream tokens (e.g. '</REASONING_SCRATCH' +
   'PAD>\n\nHello') were being lost because the buffer was discarded.
   Fix: keep a sliding window of the tail (max close tag length) so
   partial tags survive across tokens.

2. Streaming fallback detection was too broad — 'stream' matched any
   error containing that word (including 'stream_options' rejections).
   Narrowed to specific phrases: 'streaming is not', 'streaming not
   support', 'does not support stream', 'not available'.

Verified with real API calls: streaming works end-to-end with
reasoning block suppression, response box framing, and proper
fallback to Rich Panel when streaming isn't active.
2026-03-16 05:28:10 -07:00
Teknium 8758e2e8d7 feat(email): add skip_attachments option via config.yaml
* feat(email): add skip_attachments option via config.yaml

Adds a config.yaml-driven option to skip email attachments in the
gateway email adapter. Useful for malware protection and bandwidth
savings.

Configure in config.yaml:
  platforms:
    email:
      skip_attachments: true

Based on PR #1521 by @an420eth, changed from env var to config.yaml
(via PlatformConfig.extra) to match the project's config-first pattern.

* docs: document skip_attachments option for email adapter
2026-03-16 05:27:54 -07:00
JP Lew 17e87478d2 fix(gateway): restart on retryable startup failures (#1517) 2026-03-16 05:26:31 -07:00
aydnOktay a5359e61e7 fix(tools): improve error logging in skill_manager_tool 2026-03-16 15:25:30 +03:00
teknium1 25b0ae7979 fix(telegram): retry on transient TLS failures during connect and send
Add exponential-backoff retry (3 attempts) around initialize() to
handle transient TLS resets during gateway startup. Also catches
TimedOut and OSError in addition to NetworkError.

Add exponential-backoff retry (3 attempts) around send_message() for
NetworkError during message delivery, wrapping the existing Markdown
fallback logic.

Both imports are guarded with try/except ImportError for test
environments where telegram is mocked.

Based on PR #1527 by cmd8. Closes #1526.
2026-03-16 05:23:32 -07:00
Oktay Aydin dfe72b9d97 fix(logging): improve error logging in session search tool (#1533) 2026-03-16 05:22:00 -07:00
teknium1 780ddd102b fix(docker): gate cwd workspace mount behind config
Keep Docker sandboxes isolated by default. Add an explicit terminal.docker_mount_cwd_to_workspace opt-in, thread it through terminal/file environment creation, and document the security tradeoff and config.yaml workflow clearly.
2026-03-16 05:20:56 -07:00
Bartok9 8cdbbcaaa2 fix(docker): auto-mount host CWD to /workspace
Fixes #1445 — When using Docker backend, the user's current working
directory is now automatically bind-mounted to /workspace inside the
container. This allows users to run `cd my-project && hermes` and have
their project files accessible to the agent without manual volume config.

Changes:
- Add host_cwd and auto_mount_cwd parameters to DockerEnvironment
- Capture original host CWD in _get_env_config() before container fallback
- Pass host_cwd through _create_environment() to Docker backend
- Add TERMINAL_DOCKER_NO_AUTO_MOUNT env var to disable if needed
- Skip auto-mount when /workspace is already explicitly mounted
- Add tests for auto-mount behavior
- Add documentation for the new feature

The auto-mount is skipped when:
1. TERMINAL_DOCKER_NO_AUTO_MOUNT=true is set
2. User configured docker_volumes with :/workspace
3. persistent_filesystem=true (persistent sandbox mode)

This makes the Docker backend behave more intuitively — the agent
operates on the user's actual project directory by default.
2026-03-16 05:20:21 -07:00
Teknium a2f0d14f29 feat(acp): support slash commands in ACP adapter (#1532)
Adds /help, /model, /tools, /context, /reset, /compact, /version
to the ACP adapter (VS Code, Zed, JetBrains). Commands are handled
directly in the server without instantiating the TUI — each command
queries agent/session state and returns plain text.

Unrecognized /commands fall through to the LLM as normal messages.

/model uses detect_provider_for_model() for auto-detection when
switching models, matching the CLI and gateway behavior.

Fixes #1402
2026-03-16 05:19:36 -07:00
teknium1 2219695d92 test: 14-test streaming suite — accumulator, callbacks, fallback, reasoning, Codex
Tests cover:
- Text/tool-call/mixed response accumulation into correct shape
- Delta callback ordering and on_first_delta firing once
- Tool-call suppression (no callbacks during tool turns)
- Provider fallback on 'not supported' errors
- Reasoning content accumulation and callback
- _has_stream_consumers() detection
- Codex stream delta callback firing
2026-03-16 05:12:38 -07:00
teknium1 d23e9a9bed feat(cli): streaming token display — line-buffered rendering with response box framing
Stage 2 of streaming support. CLI now streams tokens in real-time:

- _stream_delta(): line-buffered rendering via _cprint (prompt_toolkit safe)
- _flush_stream(): emits remaining buffer and closes response box
- Response box opens on first token, closes on flush
- Skip Rich Panel when streaming already displayed content
- Reset streaming state before each agent turn
- Compatible with existing TTS streaming (both can fire simultaneously)
- Uses skin engine for response label branding

Credit: OutThisLife (#798 CLI streaming concept).
2026-03-16 05:10:15 -07:00
Teknium add945e53c feat(skills): add blender-mcp optional skill for 3D modeling (#1531)
feat(skills): add blender-mcp optional skill for 3D modeling
2026-03-16 05:05:56 -07:00
teknium1 c1ac32737d feat: unified streaming infrastructure — core delta callbacks for all providers
Stage 1 of streaming support. Adds:

- stream_delta_callback parameter on AIAgent.__init__ for real-time token delivery
- _interruptible_streaming_api_call() handling chat_completions + anthropic_messages
- Enhanced _run_codex_stream() to fire delta callbacks during Codex streaming
- _fire_stream_delta() fires both display and TTS callbacks
- _fire_reasoning_delta() for reasoning content streaming
- Tool-call suppression: callbacks only fire on text-only responses
- on_first_delta callback for spinner control on first token
- Provider fallback: graceful degradation to non-streaming
- _has_stream_consumers() unifies stream_delta_callback and _stream_callback checks
- Anthropic streaming returns native Message for downstream compatibility

Drawing from PRs #922 (unified streaming), #1312 (gateway consumer),
#774 (Telegram streaming), #798 (CLI streaming), #1214 (reasoning modes).
Credit: jobless0x, OutThisLife, clicksingh, raulvidis.
2026-03-16 05:05:45 -07:00
alireza78a 14b049d658 feat(skills): add blender-mcp optional skill for 3D modeling
Control a running Blender instance from Hermes via socket connection
to the blender-mcp addon (port 9876). Supports creating 3D objects,
materials, animations, and running arbitrary bpy code.

Placed in optional-skills/ since it requires Blender 4.3+ desktop
with a third-party addon manually started each session.
2026-03-16 05:03:19 -07:00
Teknium 002c459981 fix(gateway): remove recursive ExecStop from systemd units, extend TimeoutStopSec to 60s
* fix(gateway): avoid recursive ExecStop in user systemd unit

* fix: extend ExecStop removal and TimeoutStopSec=60 to system unit

The cherry-picked PR #1448 fix only covered the user systemd unit.
The system unit had the same TimeoutStopSec=15 and could benefit
from the same 60s timeout for clean shutdown. Also adds a regression
test for the system unit.

---------

Co-authored-by: Ninja <ninja@local>
2026-03-16 05:03:11 -07:00
Teknium ce660a4413 fix(gateway): remove app-specific Athabasca references from vision enrichment (#1529)
Salvaged from PR #1428 by jplew.

Removes Athabasca-specific persistence guidance accidentally merged
in PR #1422:
- Drop Athabasca docstring and injected note from _enrich_message_with_vision
- Delete tests/gateway/test_image_enrichment.py (asserted app-specific behavior)

Co-authored-by: jplew <jplew@users.noreply.github.com>
2026-03-16 05:02:58 -07:00
Teknium ee579af566 docs: add CLI status bar docs and update /usage reference (#1523)
- Add Status Bar section to user-guide/cli.md with layout example,
  element descriptions, responsive width behavior, and color-coded
  context threshold table
- Update /usage description in slash-commands reference to mention
  cost breakdown and session duration
2026-03-16 04:58:28 -07:00
Teknium caa944e752 fix(setup+gateway): defer config write, PID-based gateway kill, scoped systemd service names (#1499)
fix(setup+gateway): defer config write, PID-based gateway kill, scoped systemd service names
2026-03-16 04:58:12 -07:00
Teknium 00110fb3c3 docs: update checkpoint/rollback docs for new features
- Reflect that checkpoints are now enabled by default
- Document /rollback diff <N> for previewing changes
- Document /rollback <N> <file> for single-file restore
- Document automatic conversation undo on rollback
- Document terminal command checkpoint coverage
- Update listing example to show change stats
- Fix config path (checkpoints.enabled, not agent.checkpoints_enabled)
- Consolidate features/checkpoints.md to brief summary with link
2026-03-16 04:56:22 -07:00
Bartok9 3543b755af fix(docker): auto-mount host CWD to /workspace
Fixes #1445 — When using Docker backend, the user's current working
directory is now automatically bind-mounted to /workspace inside the
container. This allows users to run `cd my-project && hermes` and have
their project files accessible to the agent without manual volume config.

Changes:
- Add host_cwd and auto_mount_cwd parameters to DockerEnvironment
- Capture original host CWD in _get_env_config() before container fallback
- Pass host_cwd through _create_environment() to Docker backend
- Add TERMINAL_DOCKER_NO_AUTO_MOUNT env var to disable if needed
- Skip auto-mount when /workspace is already explicitly mounted
- Add tests for auto-mount behavior
- Add documentation for the new feature

The auto-mount is skipped when:
1. TERMINAL_DOCKER_NO_AUTO_MOUNT=true is set
2. User configured docker_volumes with :/workspace
3. persistent_filesystem=true (persistent sandbox mode)

This makes the Docker backend behave more intuitively — the agent
operates on the user's actual project directory by default.
2026-03-16 04:53:24 -07:00
teknium1 51185354dd docs: document scoped systemd service names for multi-install
- Update messaging guide to use 'hermes gateway' CLI commands instead
  of raw systemctl (auto-resolves the correct service name)
- Add info callout explaining multi-install service name scoping
- Update HERMES_HOME env var docs to mention PID + service name scoping
2026-03-16 04:44:53 -07:00
Teknium 9e845a6e53 feat: major /rollback improvements — enabled by default, diff preview, file-level restore, conversation undo, terminal checkpoints
Checkpoint & rollback upgrades:

1. Enabled by default — checkpoints are now on for all new sessions.
   Zero cost when no file-mutating tools fire. Disable with
   checkpoints.enabled: false in config.yaml.

2. Diff preview — /rollback diff <N> shows a git diff between the
   checkpoint and current working tree before committing to a restore.

3. File-level restore — /rollback <N> <file> restores a single file
   from a checkpoint instead of the entire directory.

4. Conversation undo on rollback — when restoring files, the last
   chat turn is automatically undone so the agent's context matches
   the restored filesystem state.

5. Terminal command checkpoints — destructive terminal commands (rm,
   mv, sed -i, truncate, git reset/clean, output redirects) now
   trigger automatic checkpoints before execution. Previously only
   write_file and patch were covered.

6. Change summary in listing — /rollback now shows file count and
   +insertions/-deletions for each checkpoint.

7. Fixed dead code — removed duplicate _run_git call in
   list_checkpoints with nonsensical --all if False condition.

8. Updated help text — /rollback with no args now shows available
   subcommands (diff, file-level restore).
2026-03-16 04:43:37 -07:00
Teknium 00a0c56598 feat: add persistent CLI status bar and usage details (#1522)
Salvaged from PR #1104 by kshitijk4poor. Closes #683.

Adds a persistent status bar to the CLI showing model name, context
window usage with visual bar, estimated cost, and session duration.
Responsive layout degrades gracefully for narrow terminals.

Changes:
- agent/usage_pricing.py: shared pricing table, cost estimation with
  Decimal arithmetic, duration/token formatting helpers
- agent/insights.py: refactored to reuse usage_pricing (eliminates
  duplicate pricing table and formatting logic)
- cli.py: status bar with FormattedTextControl fragments, color-coded
  context thresholds (green/yellow/orange/red), enhanced /usage with
  cost breakdown, 1Hz idle refresh for status bar updates
- tests/test_cli_status_bar.py: status bar snapshot, width collapsing,
  usage report with/without pricing, zero-priced model handling
- tests/test_insights.py: verify zero-priced providers show as unknown

Salvage fixes:
- Resolved conflict with voice status bar (both coexist in layout)
- Import _format_context_length from hermes_cli.banner (moved since PR)

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-16 04:42:48 -07:00
teknium1 30da22e1c1 feat(gateway): scope systemd service name to HERMES_HOME
Multiple Hermes installations on the same machine now get unique
systemd service names:
- Default ~/.hermes → hermes-gateway (backward compatible)
- Custom HERMES_HOME → hermes-gateway-<8-char-hash>

Changes:
- Add get_service_name() in hermes_cli/gateway.py that derives a
  deterministic service name from HERMES_HOME via SHA256
- Replace all hardcoded 'hermes-gateway' systemd references with
  get_service_name() across gateway.py, main.py, status.py, uninstall.py
- Add HERMES_HOME env var to both user and system systemd unit templates
  so the gateway process uses the correct installation
- Update tests to use get_service_name() in assertions
2026-03-16 04:42:46 -07:00
teknium1 e7d3f1f3ba fix(update): kill gateway via PID file before restart
cmd_update only ran 'systemctl --user restart hermes-gateway', which
left manually-started gateway processes alive, causing duplicates.

Now uses get_running_pid() from gateway/status.py (scoped to
HERMES_HOME) to find and SIGTERM this installation's gateway before
restarting. Safe with multiple Hermes installations since each
HERMES_HOME has its own PID file.

If no systemd service exists, informs the user to restart manually.

Based on PR #1131 by teknium1. Dropped the cli.py Rich from_ansi
changes (already on main).
2026-03-16 04:35:34 -07:00
Teknium c1da1fdcd5 feat: auto-detect provider when switching models via /model (#1506)
When typing /model deepseek-chat while on a different provider, the
model name now auto-resolves to the correct provider instead of
silently staying on the wrong one and causing API errors.

Detection priority:
1. Direct provider with credentials (e.g. DEEPSEEK_API_KEY set)
2. OpenRouter catalog match with proper slug remapping
3. Direct provider without creds (clear error beats silent failure)

Also adds DeepSeek as a first-class API-key provider — just set
DEEPSEEK_API_KEY and /model deepseek-chat routes directly.

Bare model names get remapped to proper OpenRouter slugs:
  /model gpt-5.4 → openai/gpt-5.4
  /model claude-opus-4.6 → anthropic/claude-opus-4.6

Salvages the concept from PR #1177 by @virtaava with credential
awareness and OpenRouter slug mapping added.

Co-authored-by: virtaava <virtaava@users.noreply.github.com>
2026-03-16 04:34:45 -07:00
teknium1 f7c5d8a749 Merge remote-tracking branch 'origin/main' into hermes/hermes-6360cdf9 2026-03-16 00:29:49 -07:00
Teknium 9cf7e2f0af Merge pull request #1495 from NousResearch/fix/814-group-session-isolation
fix(gateway): default group sessions to per-user isolation
2026-03-16 00:25:43 -07:00
Teknium dd7921d514 fix(honcho): isolate session routing for multi-user gateway (#1500)
Salvaged from PR #1470 by adavyas.

Core fix: Honcho tool calls in a multi-session gateway could route to
the wrong session because honcho_tools.py relied on process-global
state. Now threads session context through the call chain:
  AIAgent._invoke_tool() → handle_function_call() → registry.dispatch()
  → handler **kw → _resolve_session_context()

Changes:
- Add _resolve_session_context() to prefer per-call context over globals
- Plumb honcho_manager + honcho_session_key through handle_function_call
- Add sync_honcho=False to run_conversation() for synthetic flush turns
- Pass honcho_session_key through gateway memory flush lifecycle
- Harden gateway PID detection when /proc cmdline is unreadable
- Make interrupt test scripts import-safe for pytest-xdist
- Wrap BibTeX examples in Jekyll raw blocks for docs build
- Fix thread-order-dependent assertion in client lifecycle test
- Expand Honcho docs: session isolation, lifecycle, routing internals

Dropped from original PR:
- Indentation change in _create_request_openai_client that would move
  client creation inside the lock (causes unnecessary contention)

Co-authored-by: adavyas <adavyas@users.noreply.github.com>
2026-03-16 00:23:47 -07:00
Teknium eb4f0348e1 fix: persist CLI token counts to session DB for /insights
Token usage was tracked in-memory during CLI sessions (session_prompt_tokens,
session_completion_tokens) but never written to the SQLite session DB. The
gateway persisted tokens via session_store.update_session(), but CLI sessions
always showed 0 tokens in /insights.

Now run_agent.py persists token deltas to the DB after each API call for CLI
sessions. Gateway sessions continue to use their existing persist path to
avoid double-counting.
2026-03-16 00:23:13 -07:00
teknium1 38b4fd3737 fix(gateway): make group session isolation configurable
default group and channel sessions to per-user isolation, allow opting back into shared room sessions via config.yaml, and document Discord gateway routing and session behavior.
2026-03-16 00:22:23 -07:00
ygd58 36dd7a3e8d fix(setup): defer config.yaml write until after model selection
_update_config_for_provider() was called immediately after provider
selection for zai, kimi-coding, minimax, minimax-cn, and anthropic —
before model selection happened. Since the gateway re-reads config.yaml
per-message, this created a race where the gateway would pick up the
new provider but still use the old (incompatible) model name.

Capture selected_base_url in each provider block, then call
_update_config_for_provider() once, after model selection completes,
right before save_config(). The in-memory _set_model_provider() calls
stay in place so the config object remains consistent during setup.

Closes #1182
2026-03-16 00:18:30 -07:00
Teknium dd698f6d5d fix(gateway): SSL certificate auto-detection for NixOS and non-standard systems (#1494)
fix(gateway): SSL certificate auto-detection for NixOS and non-standard systems
2026-03-16 00:14:13 -07:00
teknium1 06a7d19f98 fix(gateway): isolate group sessions per user
Include participant identifiers in non-DM session keys when available so group and channel conversations no longer share one transcript across every active user in the chat.
2026-03-15 23:08:56 -07:00
teknium1 3801532bd3 fix(gateway): SSL certificate auto-detection for NixOS and non-standard systems
Add _ensure_ssl_certs() that discovers CA certificate bundles before any
HTTP library is imported.  Resolution order:
1. Python's ssl.get_default_verify_paths()
2. certifi (if installed)
3. Common distro/macOS paths

Only sets SSL_CERT_FILE if not already present in the environment.
Wrapped in a function (called immediately) to avoid polluting module
namespace.

Based on PR #1151 by sylvesterroos.
2026-03-15 23:04:34 -07:00
Teknium aaacab7de7 docs: explain checkpoints, /rollback, and git worktrees
* docs: explain checkpoints, rollback, and git worktrees

* fix: correct hermes -w description — auto-creates worktree, takes no path arg

---------

Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
2026-03-15 23:04:07 -07:00
Teknium 4298c6fd9a fix: route background process watcher notifications to Telegram forum topics (#1481)
Salvaged from PR #1146 by spanishflu-est1918.

Background process progress/completion messages were sent with only
chat_id, landing in the general topic instead of the originating forum
topic. Thread the thread_id from HERMES_SESSION_THREAD_ID through the
watcher payload and pass it as metadata to adapter.send() so Telegram
routes notifications to the correct topic.

The env var export (HERMES_SESSION_THREAD_ID in _set_session_env /
_clear_session_env) already existed on main — this commit adds the
missing watcher plumbing.

Co-authored-by: spanishflu-est1918 <spanishflu-est1918@users.noreply.github.com>
2026-03-15 23:01:57 -07:00
Teknium c30505dddd feat: add OSS Security Forensics skill (Skills Hub) (#1482)
* feat: add OSS Security Forensics skill (Skills Hub)

Salvaged from PR #1066 by zagiscoming. Adds a 7-phase multi-agent
investigation framework for GitHub supply chain attack forensics.

Skill contents (optional-skills/security/oss-forensics/):
- SKILL.md: 420-line investigation framework with 8 anti-hallucination
  guardrails, 5 specialist investigators, ethical use guidelines,
  and API rate limiting guidance
- evidence-store.py: CLI evidence manager with add/list/verify/query/
  export/summary + SHA-256 integrity + chain of custody
- references/: evidence types, GH Archive BigQuery guide (expanded with
  12 event types and 6 query templates), recovery techniques (4 methods),
  investigation templates (5 attack patterns)
- templates/: forensic report template (151 lines), malicious package
  report template

Changes from original PR:
- Dropped unrelated core tool changes (delegate_tool.py role parameter,
  AGENTS.md, README.md modifications)
- Removed duplicate skills/security/oss-forensics/ placement
- Fixed github-archive-guide.md (missing from optional-skills/, expanded
  from 33 to 160+ lines with all 12 event types and query templates)
- Added ethical use guidelines and API rate limiting sections
- Rewrote tests to match the v2 evidence store API (12 tests, all pass)

Closes #384

* fix: use python3 and SKILL_DIR paths throughout oss-forensics skill

- Replace all 'python' invocations with 'python3' for portability
  (Ubuntu doesn't ship 'python' by default)
- Replace relative '../scripts/' and '../templates/' paths with
  SKILL_DIR/scripts/ and SKILL_DIR/templates/ convention
- Add path convention note before Phase 0 explaining SKILL_DIR
- Fix double --- separator (cosmetic)
- Applies to SKILL.md, evidence-store.py docstring,
  recovery-techniques.md, and forensic-report.md template

---------

Co-authored-by: zagiscoming <zagiscoming@users.noreply.github.com>
2026-03-15 21:59:53 -07:00
Teknium 70e24d77a1 Merge pull request #1490 from NousResearch/fix/1033-telegram-voice-fallback
fix: restore local STT fallback for gateway voice notes
2026-03-15 21:58:32 -07:00
Teknium fa3db2671a docs(readme): add CLI vs messaging quick reference
Co-authored-by: Frank <97429702+tsubasakong@users.noreply.github.com>
2026-03-15 21:58:11 -07:00
Teknium 6fd9f2a0c5 fix(gateway): null-coalesce mode in SessionResetPolicy.from_dict (#1488)
fix(gateway): null-coalesce mode in SessionResetPolicy.from_dict
2026-03-15 21:57:31 -07:00
teknium1 1f72ce71b7 fix: restore local STT fallback for gateway voice notes
Restore local STT command fallback for voice transcription, detect whisper and ffmpeg in common local install paths, and avoid bogus no-provider messaging when only a backend-specific key is missing.
2026-03-15 21:51:40 -07:00
teknium1 102a255575 fix(gateway): null-coalesce mode in SessionResetPolicy.from_dict
Complete the YAML null handling for all three SessionResetPolicy fields.
at_hour and idle_minutes already had null coalescing; mode was still
using data.get('mode', 'both') which returns None when the key exists
with an explicit null value.

Add regression test covering all-null input.

Based on PR #1120 by stablegenius49.
2026-03-15 21:40:22 -07:00
Teknium 5beb681c70 fix(cli): prefer curses over simple_term_menu in setup.py (#1487) 2026-03-15 21:16:21 -07:00
Teknium c9a9db318e feat(tools): persistent shell mode for local and SSH backends (#1483)
feat(tools): persistent shell mode for local and SSH backends
2026-03-15 21:14:01 -07:00
teknium1 01e62c067b merge: resolve conflicts with origin/main (SSH preflight check) 2026-03-15 21:13:40 -07:00
Teknium ceb970c559 fix(terminal): add SSH preflight check (#1486) 2026-03-15 21:09:07 -07:00
teknium1 6894358fe1 docs: add persistent shell section to configuration and env-vars reference
Documents terminal.persistent_shell config option, per-backend env var
overrides, precedence table, and what state persists across commands.
2026-03-15 21:01:50 -07:00
Teknium 3f0f4a04a9 fix(agent): skip reasoning extra_body for unsupported OpenRouter models (#1485)
* fix(agent): skip reasoning extra_body for models that don't support it

Sending reasoning config to models like MiniMax or Nvidia via OpenRouter
causes a 400 BadRequestError. Previously, reasoning extra_body was sent
to all OpenRouter and Nous models unconditionally.

Fix: only send reasoning extra_body when the model slug starts with a
known reasoning-capable prefix (deepseek/, anthropic/, openai/, x-ai/,
google/gemini-2, qwen/qwen3) or when using Nous Portal directly.

Applies to both the main API call path (_build_api_kwargs) and the
conversation summary path.

Fixes #1083

* test(agent): cover reasoning extra_body gating

---------

Co-authored-by: ygd58 <buraysandro9@gmail.com>
2026-03-15 20:42:07 -07:00
Teknium c564e1c3dc feat(tools): centralize tool emoji metadata in registry + skin integration (#1484)
feat(tools): centralize tool emoji metadata in registry + skin integration
2026-03-15 20:35:24 -07:00
teknium1 210d5ade1e feat(tools): centralize tool emoji metadata in registry + skin integration
- Add 'emoji' field to ToolEntry and 'get_emoji()' to ToolRegistry
- Add emoji= to all 50+ registry.register() calls across tool files
- Add get_tool_emoji() helper in agent/display.py with 3-tier resolution:
  skin override → registry default → hardcoded fallback
- Replace hardcoded emoji maps in run_agent.py, delegate_tool.py, and
  gateway/run.py with centralized get_tool_emoji() calls
- Add 'tool_emojis' field to SkinConfig so skins can override per-tool
  emojis (e.g. ares skin could use swords instead of wrenches)
- Add 11 tests (5 registry emoji, 6 display/skin integration)
- Update AGENTS.md skin docs table

Based on the approach from PR #1061 by ForgingAlex (emoji centralization
in registry). This salvage fixes several issues from the original:
- Does NOT split the cronjob tool (which would crash on missing schemas)
- Does NOT change image_generate toolset/requires_env/is_async
- Does NOT delete existing tests
- Completes the centralization (gateway/run.py was missed)
- Hooks into the skin system for full customizability
2026-03-15 20:21:21 -07:00
teknium1 33ebedc76d feat: enable persistent shell by default for SSH, add config option
SSH persistent shell now defaults to true — non-local backends benefit
most from state persistence across execute() calls. Local backend
remains opt-in via TERMINAL_LOCAL_PERSISTENT env var.

New config.yaml option: terminal.persistent_shell (default: true)
Controls the default for non-local backends. Users can disable with:
  hermes config set terminal.persistent_shell false

Precedence: per-backend env var > TERMINAL_PERSISTENT_SHELL > default.

Wired through cli.py, gateway/run.py, and hermes_cli/config.py so the
config.yaml value reaches terminal_tool via env var bridge.
2026-03-15 20:17:13 -07:00
teknium1 5b80654198 feat(tools): add persistent shell mode to local and SSH backends
Cherry-picked from PR #1067 by alt-glitch.
Adds PersistentShellMixin with file-based IPC protocol for long-lived
bash shells. LocalEnvironment and SSHEnvironment gain persistent=True
option. Controlled via TERMINAL_LOCAL_PERSISTENT / TERMINAL_SSH_PERSISTENT
env vars. Fixes latent stderr pipe buffer deadlock.

Co-authored-by: alt-glitch <balyan.sid@gmail.com>
2026-03-15 20:13:02 -07:00
Teknium 25e53f3c1a fix(custom-endpoint): verify /models and suggest working /v1 base URL (#1480) 2026-03-15 20:09:50 -07:00
Teknium 103f7b1ebc fix: verbose mode shows full untruncated output
* fix(cli): silence tirith prefetch install warnings at startup

* fix: verbose mode now shows full untruncated tool args, results, content, and think blocks

When tool progress is set to 'verbose' (via /verbose or config), the display
was still truncating tool arguments to 100 chars, tool results to 100-200 chars,
assistant content to 100 chars, and think blocks to 5 lines. This defeated the
purpose of verbose mode.

Changes:
- Tool args: show full JSON args (not truncated to log_prefix_chars)
- Tool results: show full result content in both display and debug logs
- Assistant content: show full content during tool-call loops
- Think blocks: show full reasoning text (not truncated to 5 lines/100 chars)
- Auto-enable reasoning display when verbose mode is active
- Fix initial agent creation to respect verbose config (was always quiet_mode=True)
- Updated verbose label to mention think blocks
2026-03-15 20:03:37 -07:00
Teknium a56937735e fix(telegram): escape chunk indicators in MarkdownV2 (#1478) 2026-03-15 19:27:15 -07:00
Teknium 7148534401 fix(gateway): make /status report live state and tokens (#1476) 2026-03-15 19:18:58 -07:00
Teknium 4e91b0240b fix(honcho): correct seed_ai_identity to use session.add_messages() (#1475)
The seed_ai_identity method was calling assistant_peer.add_message() which
doesn't exist on the Honcho SDK's Peer class. Fixed to use the correct
pattern: session.add_messages([peer.message(content)]), matching the
existing message sync code at line 294.

Discovered and fixed by Yuqi (Hermes Agent), Angello's AI companion.

Co-authored-by: Angello Picasso <angello.picasso@devsu.com>
2026-03-15 19:07:57 -07:00
Teknium 5e92a4ce5a fix: auto-reload MCP tools when mcp_servers config changes without restart (#1474)
Fixes #1036

After adding an MCP server to config.yaml, users had to restart Hermes
before the new tools became visible — even though /reload-mcp existed.

Add _check_config_mcp_changes() called from process_loop every 5s:
- stat() config.yaml for mtime changes (fast path, no YAML parse)
- On mtime change, parse and compare mcp_servers section
- If mcp_servers changed, auto-trigger _reload_mcp() and notify user
- Skip check while agent is running to avoid interrupting tool calls
- Throttled to CONFIG_WATCH_INTERVAL=5s to avoid busy-polling

/reload-mcp still works for manual force-reload.

Tests: 6 new tests in TestMCPConfigWatch, all passed

Co-authored-by: teyrebaz33 <hakanerten02@hotmail.com>
2026-03-15 19:03:34 -07:00
Teknium 471c663fdf fix(cli): silence tirith prefetch install warnings at startup (#1452) 2026-03-15 18:07:03 -07:00
Teknium 64d333204b Merge pull request #1242 from NousResearch/fix/file-tool-log-noise
fix: reduce file tool log noise
2026-03-15 11:11:18 -07:00
Teknium c44af43840 Merge pull request #1401 from NousResearch/hermes/hermes-eca4a640
test: protect atomic temp cleanup on interrupts
2026-03-15 11:10:41 -07:00
alt-glitch 4511322f56 Merge origin/main into sid/persistent-backend
Resolve conflict in local.py: keep refactored _make_run_env helper
over inline _sanitize_subprocess_env logic.
2026-03-15 21:08:11 +05:30
Teknium 934fc9df22 Merge pull request #1440 from NousResearch/fix/1071-dict-tool-args
fix: handle dict tool call arguments from local backends
2026-03-15 08:04:09 -07:00
teknium1 5847c180c6 test: restore vllm integration coverage and add dict-args regression
Restore the existing vLLM integration test module that was accidentally replaced during development and add a focused agent-loop regression test for dict tool-call arguments from OpenAI-compatible local backends.
2026-03-15 08:02:29 -07:00
teknium1 93a0c0cddd fix: handle dict tool call arguments from local backends
Normalize tool call arguments when OpenAI-compatible backends return parsed dict/list payloads instead of JSON strings. This prevents the .strip() crash during tool-call validation for llama.cpp and similar servers, while preserving existing empty-string and invalid-JSON handling. Adds a focused regression test for dict arguments in the agent loop.
2026-03-15 08:00:19 -07:00
Teknium 23e8fdd167 feat(discord): auto-thread on @mention + skip mention in bot threads
Two changes to align Discord behavior with Slack:

1. Auto-thread on @mention (default: true)
   - When someone @mentions the bot in a server channel, a thread is
     automatically created from their message and the response goes there.
   - Each thread gets its own isolated session (like Slack).
   - Configurable via discord.auto_thread in config.yaml (default: true)
     or DISCORD_AUTO_THREAD env var (env takes precedence).
   - DMs and existing threads are unaffected.

2. Skip @mention in bot-participated threads
   - Once the bot has responded in a thread (auto-created or manually
     entered), subsequent messages in that thread no longer require
     @mention. Users can just type normally.
   - Tracked via in-memory set (_bot_participated_threads). After a
     gateway restart, users need to @mention once to re-establish.
   - Threads the bot hasn't participated in still require @mention.

Config change:
   discord:
     auto_thread: true  # new, added to DEFAULT_CONFIG

Tests: 7 new tests covering auto-thread default, disable, bot thread
participation tracking, and mention skip logic. All 903 gateway tests pass.
2026-03-15 07:59:55 -07:00
Teknium 3268b98779 Merge pull request #1437 from NousResearch/fix/1219-cron-thread-context
fix: preserve thread context for cronjob deliver=origin
2026-03-15 06:58:37 -07:00
teyrebaz33 20f381cfb6 fix: preserve thread context for cronjob deliver=origin
When a cronjob is created from within a Telegram or Slack thread,
deliver=origin was posting to the parent channel instead of the thread.

Root cause: the gateway never set HERMES_SESSION_THREAD_ID in the
session environment, so cronjob_tools.py could not capture thread_id
into the job's origin metadata — even though the scheduler already
reads origin.get('thread_id').

Fix:
- gateway/run.py: set HERMES_SESSION_THREAD_ID when thread_id is
  present on the session context, and clear it in _clear_session_env
- tools/cronjob_tools.py: read HERMES_SESSION_THREAD_ID into origin

Closes #1219
2026-03-15 06:57:00 -07:00
Teknium 77bfa252b9 Merge pull request #1434 from NousResearch/fix/1244-env-override
fix(config): reload .env over stale shell overrides
2026-03-15 06:47:40 -07:00
teknium1 f24c00a5bf fix(config): reload .env over stale shell overrides
Hermes startup entrypoints now load ~/.hermes/.env and project fallback env files with user config taking precedence over stale shell-exported values. This makes model/provider/base URL changes in .env actually take effect after restarting Hermes. Adds a shared env loader plus regression coverage, and reproduces the original bug case where OPENAI_BASE_URL and HERMES_INFERENCE_PROVIDER remained stuck on old shell values before import.
2026-03-15 06:46:28 -07:00
Teknium 463239ed85 docs: fallback providers + /background command documentation
* docs: comprehensive fallback providers documentation

- New dedicated page: user-guide/features/fallback-providers.md covering
  both primary model fallback and auxiliary task fallback systems
- Updated configuration.md with fallback_model config section
- Updated environment-variables.md noting fallback is config-only
- Fleshed out developer-guide/provider-runtime.md fallback section with
  internal architecture details (trigger points, activation flow, config flow)
- Added cross-reference from provider-routing.md distinguishing OpenRouter
  sub-provider routing from Hermes-level model fallback
- Added new page to sidebar under Integrations

* docs: comprehensive /background command documentation

- Added Background Sessions section to cli.md covering how it works
  (daemon threads, isolated sessions, config inheritance, Rich panel
  output, bell notification, concurrent tasks)
- Added Background Sessions section to messaging/index.md covering
  messaging-specific behavior (async execution, result delivery back
  to same chat, fire-and-forget pattern)
- Documented background_process_notifications config
  (all/result/error/off) in messaging docs and configuration.md
- Added HERMES_BACKGROUND_NOTIFICATIONS env var to reference page
- Fixed inconsistency in slash-commands.md: /background was listed as
  messaging-only but works in both CLI and messaging. Moved it to the
  'both surfaces' note.
- Expanded one-liner table descriptions with detail and cross-references
2026-03-15 06:24:28 -07:00
Teknium 60cce9ca6d Merge pull request #1429 from NousResearch/fix/1336-discord-voice-reliability
fix(voice): Discord voice channel reliability fixes
2026-03-15 05:25:45 -07:00
teknium1 2d57946ee9 test(voice): clarify install guidance and local skips
Add an explicit messaging-extra install hint to the missing PyNaCl/davey error path, cover it with a voice-channel join regression test, and skip the low-level NaCl packet tests when PyNaCl is not installed locally.
2026-03-15 05:24:34 -07:00
0xbyt4 5f32fd8b6d feat(voice): add discord-voice-doctor diagnostic script
Checks the full voice environment and reports what's missing:
- Python packages: discord.py, PyNaCl, davey, STT/TTS providers
- System tools: Opus codec (macOS + Linux paths), ffmpeg
- Environment: bot token, allowed users (resolved to usernames), API keys
- Configuration: STT/TTS provider, voice mode state
- Bot permissions: live Discord API check for Connect, Speak, VAD, etc.

All sensitive values are masked. Gracefully handles missing deps,
invalid tokens, API timeouts, and unreachable Discord API.
2026-03-15 05:20:17 -07:00
0xbyt4 3ea039684e test(voice): add integration tests with real NaCl crypto and Opus codec
End-to-end voice channel tests using real crypto (no mocks):

NaCl decrypt (5): valid packet, wrong key, bot SSRC, multi-packet, multi-SSRC
DAVE passthrough (3): unknown SSRC, Unencrypted error, real error drop
Full flow (5): utterance lifecycle, auto-map, pause/resume, corruption, cleanup
SPEAKING hook (4): hook installed, map/overwrite, mapped audio processed
Auth filtering (3): allowed user, rejected user, empty allowlist
Rejoin flow (3): clean state, new SSRC, missing SPEAKING auto-map
Multi-guild (2): independent receivers, stop isolation
Echo prevention (2): paused audio ignored, resumed audio processed
2026-03-15 05:20:17 -07:00
0xbyt4 63f0ec96ec test(voice): add comprehensive flow tests for voice channel fixes
Tests cover the actual code paths changed in voice fixes:

_on_packet DAVE passthrough (8 tests):
- Known SSRC + DAVE decrypt success → buffered
- Unknown SSRC + DAVE → skip DAVE, passthrough to Opus
- DAVE "Unencrypted" error → passthrough, not dropped
- DAVE other error → packet dropped
- No DAVE session → direct decode
- Bot's own SSRC → ignored (echo prevention)
- Multiple SSRCs → separate buffers

SSRC auto-mapping (6 tests):
- Single allowed user → auto-mapped
- Multiple allowed users → no auto-map
- No allowlist → sole non-bot member inferred
- Unallowed user → rejected
- Only bot in channel → no map
- Auto-map persists across checks

Buffer lifecycle (4 tests):
- Known SSRC completed utterance
- Short buffer ignored
- Recent audio waits
- Stale unknown buffer discarded

TTS playback (10 tests):
- play_tts calls play_in_voice_channel in VC
- play_tts falls through when not in VC
- play_tts wrong channel no match
- Voice input dedup (runner skips)
- Text + voice_mode combinations
- Error/empty response skipped
- Agent TTS tool dedup

UDP keepalive (2 tests):
- Interval within bounds
- Silence frame actually sent via send_packet
2026-03-15 05:20:17 -07:00
0xbyt4 1cacaccca6 fix(voice): show clear error when voice dependencies are missing
When PyNaCl or davey is not installed, joining a voice channel fails
with a raw exception. Now shows a human-readable message pointing
the user to reinstall with voice support.

Closes #1336
2026-03-15 05:20:17 -07:00
0xbyt4 773f3c1137 fix(voice): DAVE passthrough + auto-map SSRC after bot rejoin
After bot leave/rejoin, Discord doesn't resend SPEAKING events for
users already in the channel. This left SSRC unmapped and all audio
was silently dropped by DAVE decrypt.

Fixes:
- Skip DAVE for unknown SSRCs instead of dropping (passthrough)
- Handle "UnencryptedWhenPassthroughDisabled" DAVE errors gracefully
- Auto-infer user_id from sole allowed member in voice channel
- Pass allowed_user_ids to VoiceReceiver for secure inference
2026-03-15 05:20:17 -07:00
0xbyt4 0cc784068d fix(voice): add UDP keepalive to prevent Discord dropping voice after silence
Discord drops the UDP voice route after ~60s of silence - no packets
arrive even when users start speaking again. Send an Opus silence
frame every 15s to keep the UDP session alive.
2026-03-15 05:20:17 -07:00
0xbyt4 f1b4d0b280 fix(voice): make play_tts play in VC instead of no-op
play_tts was returning success without playing anything when bot was
in a voice channel. Now it calls play_in_voice_channel directly.

Simplified skip_double dedup: base adapter handles voice input TTS
via play_tts (which now works for VC), runner skips to avoid double.
2026-03-15 05:20:17 -07:00
Teknium 5254d0bba1 Merge pull request #1427 from NousResearch/fix/1414-gateway-shutdown-restart
fix(gateway): cancel active runs during shutdown
2026-03-15 04:47:28 -07:00
teknium1 21c20aeaa5 fix(gateway): cancel active runs during shutdown
Track adapter background message-processing tasks, cancel them during gateway shutdown, and interrupt running agents before disconnecting adapters. This prevents old gateway instances from continuing in-flight work after stop/replace, which was contributing to the restart-time task continuation/flicker behavior reported in #1414. Adds regression coverage for adapter task cancellation and shutdown interrupts.
2026-03-15 04:21:50 -07:00
Teknium dc095f8491 Merge pull request #1425 from NousResearch/fix/1412-session-delete-prefix
fix(cli): accept session ID prefixes for session actions
2026-03-15 04:02:47 -07:00
teknium1 621fd80b1e fix(cli): accept session ID prefixes for session actions
Resolve session IDs by exact match or unique prefix for sessions delete/export/rename so IDs copied from Preview                                            Last Active   Src    ID
──────────────────────────────────────────────────────────────────────────────────────────
Search for GitHub/GitLab source repositories for   11m ago       cli    20260315_034720_8e1f
[SYSTEM: The user has invoked the "minecraft-atm   1m ago        cli    20260315_034035_57b6
                                                   1h ago        cron   cron_job-1_20260315_
[SYSTEM: The user has invoked the "hermes-agent-   9m ago        cli    20260315_014304_652a
                                                   4h ago        cron   cron_job-1_20260314_
[The user attached an image. Here's what it cont   4h ago        cli    20260314_233806_c8f3
[SYSTEM: The user has invoked the "google-worksp   1h ago        cli    20260314_233301_b04f
Inspect the opencode codebase for how it sends m   4h ago        cli    20260314_232543_0601
Inspect the clawdbot codebase for how it sends m   4h ago        cli    20260314_232543_8125
                                                   4h ago        cron   cron_job-1_20260314_
Reply with exactly: smoke-ok                       4h ago        cli    20260314_231730_aac9
                                                   4h ago        cron   cron_job-1_20260314_
[SYSTEM: The user has invoked the "hermes-agent-   4h ago        cli    20260314_231111_3586
[SYSTEM: The user has invoked the "hermes-agent-   4h ago        cli    20260314_225551_daff
                                                   5h ago        cron   cron_job-1_20260314_
[SYSTEM: The user has invoked the "google-worksp   4h ago        cli    20260314_224629_a9c6
k_sze   — 10:34 PM Just ran hermes update and I    5h ago        cli    20260314_224243_544e
                                                   5h ago        cron   cron_job-1_20260314_
                                                   5h ago        cron   cron_job-1_20260314_
                                                   5h ago        cron   cron_job-1_20260314_ work even when the table view truncates them. Add SessionDB prefix-resolution coverage and a CLI regression test for deleting by listed prefix.
2026-03-15 04:01:56 -07:00
Teknium 2b8fd9a8e3 Merge pull request #1422 from NousResearch/fix/1409-photo-burst-interrupts
fix(gateway): prevent Telegram photo burst interrupts
2026-03-15 03:51:51 -07:00
teknium1 fef710aca8 test(gateway): cover photo burst interrupt regressions
Add regression coverage for non-album Telegram photo burst batching, photo follow-ups that should queue without interrupting active runs, and the gateway priority-interrupt path for photo events.
2026-03-15 03:50:45 -07:00
CoinDegen 4ae1334287 fix(gateway): prevent telegram photo burst interrupts 2026-03-15 03:49:01 -07:00
Teknium db3e3aa6c5 Merge pull request #1421 from NousResearch/fix/1247-preserve-mcp-toolsets
fix(tools): preserve MCP toolsets when saving platform tool config
2026-03-15 03:33:31 -07:00
ygd58 633488e0c0 fix(tools): preserve MCP toolsets when saving platform tool config
_save_platform_tools() overwrote the entire platform_toolsets list with
only the toolsets known to CONFIGURABLE_TOOLSETS. This silently dropped
any MCP server toolsets that users had added manually to config.yaml.

Fix: collect any existing toolset keys that are not in CONFIGURABLE_TOOLSETS
and append them back after the wizard's selections are written. This ensures
MCP toolsets survive a hermes tools save.

Fixes #1247
2026-03-15 03:28:20 -07:00
Teknium 0de200cf4d Merge pull request #1419 from NousResearch/fix/1264-env-secret-blocklist
fix(security): block gateway and tool env vars in subprocesses
2026-03-15 03:22:58 -07:00
Teknium f6fdb18fe6 Merge pull request #1417 from NousResearch/fix/1056-dm-session-isolation
fix(gateway): isolate DM sessions by chat_id
2026-03-15 03:22:39 -07:00
teknium1 b177b4abad fix(security): block gateway and tool env vars in subprocesses
Extend subprocess env sanitization beyond provider credentials by blocking Hermes-managed tool, messaging, and related gateway runtime vars. Reuse a shared sanitizer in LocalEnvironment and ProcessRegistry so background and PTY processes honor the same blocklist and _HERMES_FORCE_ escape hatch. Add regression coverage for local env execution and process_registry spawning.
2026-03-15 02:51:04 -07:00
teknium1 232ba441d7 test: cover DM session key isolation
Update interrupt-key expectations for namespaced DM session keys and add a regression test that different DM chat IDs produce distinct gateway sessions.
2026-03-15 02:38:48 -07:00
heyyyimmax 34e120bcbb fix(gateway): enforce chat_id isolation for all DM sessions 2026-03-15 02:37:53 -07:00
Teknium 779f8df6a6 Merge pull request #1408 from NousResearch/hermes/hermes-daa73839
fix: make Claude image handling work end-to-end
2026-03-14 23:45:03 -07:00
teknium1 62abb453d3 Merge origin/main into hermes/hermes-daa73839 2026-03-14 23:44:47 -07:00
teknium1 735a6e7651 fix: convert anthropic image content blocks 2026-03-14 23:41:20 -07:00
Teknium e5ddca1c8b Merge pull request #1407 from NousResearch/hermes/slack-thread-docs
docs: clarify Slack thread reply behavior
2026-03-14 23:25:43 -07:00
Teknium fd0e1aac72 Merge pull request #1400 from NousResearch/hermes/hermes-45b79a59-clawhub-search
fix: harden ClawHub skill search exact matches
2026-03-14 23:17:24 -07:00
teknium1 8ccd14a0d4 fix: improve clawhub skill search matching 2026-03-14 23:15:04 -07:00
Teknium 6c611c852e fix(update): clarify manual autostash cleanup
fix(update): clarify manual autostash cleanup
2026-03-14 22:58:11 -07:00
teknium1 f882dabf19 fix(update): clarify manual autostash cleanup 2026-03-14 22:56:59 -07:00
teknium1 b117bbc125 test: cover atomic temp cleanup on interrupts
- add regression coverage for BaseException cleanup in atomic_json_write
- add dedicated atomic_yaml_write tests, including interrupt cleanup
- document why BaseException is intentional in both helpers
2026-03-14 22:31:51 -07:00
teknium1 df9020dfa3 fix: harden clawhub skill search exact matches 2026-03-14 22:31:09 -07:00
alt-glitch e266530c7d add different polling intervals for ssh and local backends. ssh has a
longer roundtrip
2026-03-15 02:54:32 +05:30
alt-glitch 879b7d3fbf fix(tests): update mock stdout in env blocklist tests
The fake_popen mock used iter([]) for proc.stdout which doesn't
support .close(). Use MagicMock with __iter__ instead, since
_drain_stdout now calls proc.stdout.close() in its finally block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 02:48:05 +05:30
alt-glitch 9f36483bf4 refactor: deduplicate execute/cleanup, merge init, clean up helpers
- Merge _init_persistent_shell + _start_persistent_shell into single method
- Move execute() dispatcher and cleanup() into PersistentShellMixin
  so LocalEnvironment and SSHEnvironment inherit them
- Remove broad except Exception wrappers from _execute_oneshot in both backends
- Replace try/except with os.path.exists checks in local _read_temp_files
  and _cleanup_temp_files
- Remove redundant bash -c from SSH oneshot (SSH already runs in a shell)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 02:39:56 +05:30
alt-glitch 7be314c456 pass configs to file_tools for r+w over ssh.
pass TERM env.
default to ~ to in local and ssh backends.
ssh backend.
2026-03-15 02:26:39 +05:30
balyan.sid@gmail.com 9001b34146 simplify docstrings, fix some bugs 2026-03-15 01:20:42 +05:30
balyan.sid@gmail.com 861202b56c wip: add persistent shell to ssh and local terminal backends 2026-03-15 01:20:42 +05:30
balyan.sid@gmail.com 9d63dcc3f9 add persistent ssh backend 2026-03-15 01:19:38 +05:30
teknium1 b59da08730 fix: reduce file tool log noise
- treat git diff --cached --quiet rc=1 as an expected checkpoint state
  instead of logging it as an error
- downgrade expected write PermissionError/EROFS/EACCES failures out of
  error logging while keeping unexpected exceptions at error level
- add regression tests for both logging behaviors
2026-03-13 22:14:00 -07:00
371 changed files with 56932 additions and 5484 deletions
+21
View File
@@ -45,14 +45,35 @@ MINIMAX_API_KEY=
MINIMAX_CN_API_KEY=
# MINIMAX_CN_BASE_URL=https://api.minimaxi.com/v1 # Override default base URL
# =============================================================================
# LLM PROVIDER (OpenCode Zen)
# =============================================================================
# OpenCode Zen provides curated, tested models (GPT, Claude, Gemini, MiniMax, GLM, Kimi)
# Pay-as-you-go pricing. Get your key at: https://opencode.ai/auth
OPENCODE_ZEN_API_KEY=
# OPENCODE_ZEN_BASE_URL=https://opencode.ai/zen/v1 # Override default base URL
# =============================================================================
# LLM PROVIDER (OpenCode Go)
# =============================================================================
# OpenCode Go provides access to open models (GLM-5, Kimi K2.5, MiniMax M2.5)
# $10/month subscription. Get your key at: https://opencode.ai/auth
OPENCODE_GO_API_KEY=
# OPENCODE_GO_BASE_URL=https://opencode.ai/zen/go/v1 # Override default base URL
# =============================================================================
# TOOL API KEYS
# =============================================================================
# Parallel API Key - AI-native web search and extract
# Get at: https://parallel.ai
PARALLEL_API_KEY=
# Firecrawl API Key - Web search, extract, and crawl
# Get at: https://firecrawl.dev/
FIRECRAWL_API_KEY=
# FAL.ai API Key - Image generation
# Get at: https://fal.ai/
FAL_KEY=
+50 -9
View File
@@ -5,7 +5,7 @@ Instructions for AI coding assistants and developers working on the hermes-agent
## Development Environment
```bash
source .venv/bin/activate # ALWAYS activate before running Python
source venv/bin/activate # ALWAYS activate before running Python
```
## Project Structure
@@ -23,6 +23,7 @@ hermes-agent/
│ ├── prompt_caching.py # Anthropic prompt caching
│ ├── auxiliary_client.py # Auxiliary LLM client (vision, summarization)
│ ├── model_metadata.py # Model context lengths, token estimation
│ ├── models_dev.py # models.dev registry integration (provider-aware context)
│ ├── display.py # KawaiiSpinner, tool preview formatting
│ ├── skill_commands.py # Skill slash commands (shared CLI/gateway)
│ └── trajectory.py # Trajectory saving helpers
@@ -44,7 +45,7 @@ hermes-agent/
│ ├── terminal_tool.py # Terminal orchestration
│ ├── process_registry.py # Background process management
│ ├── file_tools.py # File read/write/search/patch
│ ├── web_tools.py # Firecrawl search/extract
│ ├── web_tools.py # Web search/extract (Parallel + Firecrawl)
│ ├── browser_tool.py # Browserbase browser automation
│ ├── code_execution_tool.py # execute_code sandbox
│ ├── delegate_tool.py # Subagent delegation
@@ -129,14 +130,50 @@ Messages follow OpenAI format: `{"role": "system/user/assistant/tool", ...}`. Re
- **KawaiiSpinner** (`agent/display.py`) — animated faces during API calls, `┊` activity feed for tool results
- `load_cli_config()` in cli.py merges hardcoded defaults + user config YAML
- **Skin engine** (`hermes_cli/skin_engine.py`) — data-driven CLI theming; initialized from `display.skin` config key at startup; skins customize banner colors, spinner faces/verbs/wings, tool prefix, response box, branding text
- `process_command()` is a method on `HermesCLI` (not in commands.py)
- `process_command()` is a method on `HermesCLI` — dispatches on canonical command name resolved via `resolve_command()` from the central registry
- Skill slash commands: `agent/skill_commands.py` scans `~/.hermes/skills/`, injects as **user message** (not system prompt) to preserve prompt caching
### Adding CLI Commands
### Slash Command Registry (`hermes_cli/commands.py`)
1. Add to `COMMANDS` dict in `hermes_cli/commands.py`
2. Add handler in `HermesCLI.process_command()` in `cli.py`
3. For persistent settings, use `save_config_value()` in `cli.py`
All slash commands are defined in a central `COMMAND_REGISTRY` list of `CommandDef` objects. Every downstream consumer derives from this registry automatically:
- **CLI** — `process_command()` resolves aliases via `resolve_command()`, dispatches on canonical name
- **Gateway** — `GATEWAY_KNOWN_COMMANDS` frozenset for hook emission, `resolve_command()` for dispatch
- **Gateway help** — `gateway_help_lines()` generates `/help` output
- **Telegram** — `telegram_bot_commands()` generates the BotCommand menu
- **Slack** — `slack_subcommand_map()` generates `/hermes` subcommand routing
- **Autocomplete** — `COMMANDS` flat dict feeds `SlashCommandCompleter`
- **CLI help** — `COMMANDS_BY_CATEGORY` dict feeds `show_help()`
### Adding a Slash Command
1. Add a `CommandDef` entry to `COMMAND_REGISTRY` in `hermes_cli/commands.py`:
```python
CommandDef("mycommand", "Description of what it does", "Session",
aliases=("mc",), args_hint="[arg]"),
```
2. Add handler in `HermesCLI.process_command()` in `cli.py`:
```python
elif canonical == "mycommand":
self._handle_mycommand(cmd_original)
```
3. If the command is available in the gateway, add a handler in `gateway/run.py`:
```python
if canonical == "mycommand":
return await self._handle_mycommand(event)
```
4. For persistent settings, use `save_config_value()` in `cli.py`
**CommandDef fields:**
- `name` — canonical name without slash (e.g. `"background"`)
- `description` — human-readable description
- `category` — one of `"Session"`, `"Configuration"`, `"Tools & Skills"`, `"Info"`, `"Exit"`
- `aliases` — tuple of alternative names (e.g. `("bg",)`)
- `args_hint` — argument placeholder shown in help (e.g. `"<prompt>"`, `"[name]"`)
- `cli_only` — only available in the interactive CLI
- `gateway_only` — only available in messaging platforms
**Adding an alias** requires only adding it to the `aliases` tuple on the existing `CommandDef`. No other file changes needed — dispatch, help text, Telegram menu, Slack mapping, and autocomplete all update automatically.
---
@@ -235,6 +272,7 @@ hermes_cli/skin_engine.py # SkinConfig dataclass, built-in skins, YAML loader
| Spinner verbs | `spinner.thinking_verbs` | `display.py` |
| Spinner wings (optional) | `spinner.wings` | `display.py` |
| Tool output prefix | `tool_prefix` | `display.py` |
| Per-tool emojis | `tool_emojis` | `display.py``get_tool_emoji()` |
| Agent name | `branding.agent_name` | `banner.py`, `cli.py` |
| Welcome message | `branding.welcome` | `cli.py` |
| Response box label | `branding.response_label` | `cli.py` |
@@ -327,7 +365,10 @@ Rendering bugs in tmux/iTerm2 — ghosting on scroll. Use `curses` (stdlib) inst
Leaks as literal `?[K` text under `prompt_toolkit`'s `patch_stdout`. Use space-padding: `f"\r{line}{' ' * pad}"`.
### `_last_resolved_tool_names` is a process-global in `model_tools.py`
When subagents overwrite this global, `execute_code` calls after delegation may fail with missing tool imports. Known bug.
`_run_single_child()` in `delegate_tool.py` saves and restores this global around subagent execution. If you add new code that reads this global, be aware it may be temporarily stale during child agent runs.
### DO NOT hardcode cross-tool references in schema descriptions
Tool schema descriptions must not mention tools from other toolsets by name (e.g., `browser_navigate` saying "prefer web_search"). Those tools may be unavailable (missing API keys, disabled toolset), causing the model to hallucinate calls to non-existent tools. If a cross-reference is needed, add it dynamically in `get_tool_definitions()` in `model_tools.py` — see the `browser_navigate` / `execute_code` post-processing blocks for the pattern.
### Tests must not write to `~/.hermes/`
The `_isolate_hermes_home` autouse fixture in `tests/conftest.py` redirects `HERMES_HOME` to a temp dir. Never hardcode `~/.hermes/` paths in tests.
@@ -337,7 +378,7 @@ The `_isolate_hermes_home` autouse fixture in `tests/conftest.py` redirects `HER
## Testing
```bash
source .venv/bin/activate
source venv/bin/activate
python -m pytest tests/ -q # Full suite (~3000 tests, ~3 min)
python -m pytest tests/test_model_tools.py -q # Toolset resolution
python -m pytest tests/test_cli_init.py -q # CLI config loading
+2 -2
View File
@@ -136,7 +136,7 @@ hermes-agent/
│ ├── auth.py # Provider resolution, OAuth, Nous Portal
│ ├── models.py # OpenRouter model selection lists
│ ├── banner.py # Welcome banner, ASCII art
│ ├── commands.py # Slash command definitions + autocomplete
│ ├── commands.py # Central slash command registry (CommandDef), autocomplete, gateway helpers
│ ├── callbacks.py # Interactive callbacks (clarify, sudo, approval)
│ ├── doctor.py # Diagnostics
│ ├── skills_hub.py # Skills Hub CLI + /skills slash command
@@ -147,7 +147,7 @@ hermes-agent/
│ ├── approval.py # Dangerous command detection + per-session approval
│ ├── terminal_tool.py # Terminal orchestration (sudo, env lifecycle, backends)
│ ├── file_operations.py # read_file, write_file, search, patch, etc.
│ ├── web_tools.py # web_search, web_extract (Firecrawl + Gemini summarization)
│ ├── web_tools.py # web_search, web_extract (Parallel/Firecrawl + Gemini summarization)
│ ├── vision_tools.py # Image analysis via multimodal models
│ ├── delegate_tool.py # Subagent spawning and parallel task execution
│ ├── code_execution_tool.py # Sandboxed Python with RPC tool access
+21 -3
View File
@@ -2,7 +2,7 @@
<img src="assets/banner.png" alt="Hermes Agent" width="100%">
</p>
# Hermes Agent
# Hermes Agent
<p align="center">
<a href="https://hermes-agent.nousresearch.com/docs/"><img src="https://img.shields.io/badge/Docs-hermes--agent.nousresearch.com-FFD700?style=for-the-badge" alt="Documentation"></a>
@@ -62,6 +62,24 @@ hermes doctor # Diagnose any issues
📖 **[Full documentation →](https://hermes-agent.nousresearch.com/docs/)**
## CLI vs Messaging Quick Reference
Hermes has two entry points: start the terminal UI with `hermes`, or run the gateway and talk to it from Telegram, Discord, Slack, WhatsApp, Signal, or Email. Once you're in a conversation, many slash commands are shared across both interfaces.
| Action | CLI | Messaging platforms |
|---------|-----|---------------------|
| Start chatting | `hermes` | Run `hermes gateway setup` + `hermes gateway start`, then send the bot a message |
| Start fresh conversation | `/new` or `/reset` | `/new` or `/reset` |
| Change model | `/model [provider:model]` | `/model [provider:model]` |
| Set a personality | `/personality [name]` | `/personality [name]` |
| Retry or undo the last turn | `/retry`, `/undo` | `/retry`, `/undo` |
| Compress context / check usage | `/compress`, `/usage`, `/insights [--days N]` | `/compress`, `/usage`, `/insights [days]` |
| Browse skills | `/skills` or `/<skill-name>` | `/skills` or `/<skill-name>` |
| Interrupt current work | `Ctrl+C` or send a new message | `/stop` or send a new message |
| Platform-specific status | `/platforms` | `/status`, `/sethome` |
For the full command lists, see the [CLI guide](https://hermes-agent.nousresearch.com/docs/user-guide/cli) and the [Messaging Gateway guide](https://hermes-agent.nousresearch.com/docs/user-guide/messaging).
---
## Documentation
@@ -128,8 +146,8 @@ git clone https://github.com/NousResearch/hermes-agent.git
cd hermes-agent
git submodule update --init mini-swe-agent # required terminal backend
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv .venv --python 3.11
source .venv/bin/activate
uv venv venv --python 3.11
source venv/bin/activate
uv pip install -e ".[all,dev]"
uv pip install -e "./mini-swe-agent"
python -m pytest tests/ -q
+377
View File
@@ -0,0 +1,377 @@
# Hermes Agent v0.3.0 (v2026.3.17)
**Release Date:** March 17, 2026
> The streaming, plugins, and provider release — unified real-time token delivery, first-class plugin architecture, rebuilt provider system with Vercel AI Gateway, native Anthropic provider, smart approvals, live Chrome CDP browser connect, ACP IDE integration, Honcho memory, voice mode, persistent shell, and 50+ bug fixes across every platform.
---
## ✨ Highlights
- **Unified Streaming Infrastructure** — Real-time token-by-token delivery in CLI and all gateway platforms. Responses stream as they're generated instead of arriving as a block. ([#1538](https://github.com/NousResearch/hermes-agent/pull/1538))
- **First-Class Plugin Architecture** — Drop Python files into `~/.hermes/plugins/` to extend Hermes with custom tools, commands, and hooks. No forking required. ([#1544](https://github.com/NousResearch/hermes-agent/pull/1544), [#1555](https://github.com/NousResearch/hermes-agent/pull/1555))
- **Native Anthropic Provider** — Direct Anthropic API calls with Claude Code credential auto-discovery, OAuth PKCE flows, and native prompt caching. No OpenRouter middleman needed. ([#1097](https://github.com/NousResearch/hermes-agent/pull/1097))
- **Smart Approvals + /stop Command** — Codex-inspired approval system that learns which commands are safe and remembers your preferences. `/stop` kills the current agent run immediately. ([#1543](https://github.com/NousResearch/hermes-agent/pull/1543))
- **Honcho Memory Integration** — Async memory writes, configurable recall modes, session title integration, and multi-user isolation in gateway mode. By @erosika. ([#736](https://github.com/NousResearch/hermes-agent/pull/736))
- **Voice Mode** — Push-to-talk in CLI, voice notes in Telegram/Discord, Discord voice channel support, and local Whisper transcription via faster-whisper. ([#1299](https://github.com/NousResearch/hermes-agent/pull/1299), [#1185](https://github.com/NousResearch/hermes-agent/pull/1185), [#1429](https://github.com/NousResearch/hermes-agent/pull/1429))
- **Concurrent Tool Execution** — Multiple independent tool calls now run in parallel via ThreadPoolExecutor, significantly reducing latency for multi-tool turns. ([#1152](https://github.com/NousResearch/hermes-agent/pull/1152))
- **PII Redaction** — When `privacy.redact_pii` is enabled, personally identifiable information is automatically scrubbed before sending context to LLM providers. ([#1542](https://github.com/NousResearch/hermes-agent/pull/1542))
- **`/browser connect` via CDP** — Attach browser tools to a live Chrome instance through Chrome DevTools Protocol. Debug, inspect, and interact with pages you already have open. ([#1549](https://github.com/NousResearch/hermes-agent/pull/1549))
- **Vercel AI Gateway Provider** — Route Hermes through Vercel's AI Gateway for access to their model catalog and infrastructure. ([#1628](https://github.com/NousResearch/hermes-agent/pull/1628))
- **Centralized Provider Router** — Rebuilt provider system with `call_llm` API, unified `/model` command, auto-detect provider on model switch, and direct endpoint overrides for auxiliary/delegation clients. ([#1003](https://github.com/NousResearch/hermes-agent/pull/1003), [#1506](https://github.com/NousResearch/hermes-agent/pull/1506), [#1375](https://github.com/NousResearch/hermes-agent/pull/1375))
- **ACP Server (IDE Integration)** — VS Code, Zed, and JetBrains can now connect to Hermes as an agent backend, with full slash command support. ([#1254](https://github.com/NousResearch/hermes-agent/pull/1254), [#1532](https://github.com/NousResearch/hermes-agent/pull/1532))
- **Persistent Shell Mode** — Local and SSH terminal backends can maintain shell state across tool calls — cd, env vars, and aliases persist. By @alt-glitch. ([#1067](https://github.com/NousResearch/hermes-agent/pull/1067), [#1483](https://github.com/NousResearch/hermes-agent/pull/1483))
- **Agentic On-Policy Distillation (OPD)** — New RL training environment for distilling agent policies, expanding the Atropos training ecosystem. ([#1149](https://github.com/NousResearch/hermes-agent/pull/1149))
---
## 🏗️ Core Agent & Architecture
### Provider & Model Support
- **Centralized provider router** with `call_llm` API and unified `/model` command — switch models and providers seamlessly ([#1003](https://github.com/NousResearch/hermes-agent/pull/1003))
- **Vercel AI Gateway** provider support ([#1628](https://github.com/NousResearch/hermes-agent/pull/1628))
- **Auto-detect provider** when switching models via `/model` ([#1506](https://github.com/NousResearch/hermes-agent/pull/1506))
- **Direct endpoint overrides** for auxiliary and delegation clients — point vision/subagent calls at specific endpoints ([#1375](https://github.com/NousResearch/hermes-agent/pull/1375))
- **Native Anthropic auxiliary vision** — use Claude's native vision API instead of routing through OpenAI-compatible endpoints ([#1377](https://github.com/NousResearch/hermes-agent/pull/1377))
- Anthropic OAuth flow improvements — auto-run `claude setup-token`, reauthentication, PKCE state persistence, identity fingerprinting ([#1132](https://github.com/NousResearch/hermes-agent/pull/1132), [#1360](https://github.com/NousResearch/hermes-agent/pull/1360), [#1396](https://github.com/NousResearch/hermes-agent/pull/1396), [#1597](https://github.com/NousResearch/hermes-agent/pull/1597))
- Fix adaptive thinking without `budget_tokens` for Claude 4.6 models — by @ASRagab ([#1128](https://github.com/NousResearch/hermes-agent/pull/1128))
- Fix Anthropic cache markers through adapter — by @brandtcormorant ([#1216](https://github.com/NousResearch/hermes-agent/pull/1216))
- Retry Anthropic 429/529 errors and surface details to users — by @0xbyt4 ([#1585](https://github.com/NousResearch/hermes-agent/pull/1585))
- Fix Anthropic adapter max_tokens, fallback crash, proxy base_url — by @0xbyt4 ([#1121](https://github.com/NousResearch/hermes-agent/pull/1121))
- Fix DeepSeek V3 parser dropping multiple parallel tool calls — by @mr-emmett-one ([#1365](https://github.com/NousResearch/hermes-agent/pull/1365), [#1300](https://github.com/NousResearch/hermes-agent/pull/1300))
- Accept unlisted models with warning instead of rejecting ([#1047](https://github.com/NousResearch/hermes-agent/pull/1047), [#1102](https://github.com/NousResearch/hermes-agent/pull/1102))
- Skip reasoning params for unsupported OpenRouter models ([#1485](https://github.com/NousResearch/hermes-agent/pull/1485))
- MiniMax Anthropic API compatibility fix ([#1623](https://github.com/NousResearch/hermes-agent/pull/1623))
- Custom endpoint `/models` verification and `/v1` base URL suggestion ([#1480](https://github.com/NousResearch/hermes-agent/pull/1480))
- Resolve delegation providers from `custom_providers` config ([#1328](https://github.com/NousResearch/hermes-agent/pull/1328))
- Kimi model additions and User-Agent fix ([#1039](https://github.com/NousResearch/hermes-agent/pull/1039))
- Strip `call_id`/`response_item_id` for Mistral compatibility ([#1058](https://github.com/NousResearch/hermes-agent/pull/1058))
### Agent Loop & Conversation
- **Anthropic Context Editing API** support ([#1147](https://github.com/NousResearch/hermes-agent/pull/1147))
- Improved context compaction handoff summaries — compressor now preserves more actionable state ([#1273](https://github.com/NousResearch/hermes-agent/pull/1273))
- Sync session_id after mid-run context compression ([#1160](https://github.com/NousResearch/hermes-agent/pull/1160))
- Session hygiene threshold tuned to 50% for more proactive compression ([#1096](https://github.com/NousResearch/hermes-agent/pull/1096), [#1161](https://github.com/NousResearch/hermes-agent/pull/1161))
- Include session ID in system prompt via `--pass-session-id` flag ([#1040](https://github.com/NousResearch/hermes-agent/pull/1040))
- Prevent closed OpenAI client reuse across retries ([#1391](https://github.com/NousResearch/hermes-agent/pull/1391))
- Sanitize chat payloads and provider precedence ([#1253](https://github.com/NousResearch/hermes-agent/pull/1253))
- Handle dict tool call arguments from Codex and local backends ([#1393](https://github.com/NousResearch/hermes-agent/pull/1393), [#1440](https://github.com/NousResearch/hermes-agent/pull/1440))
### Memory & Sessions
- **Improve memory prioritization** — user preferences and corrections weighted above procedural knowledge ([#1548](https://github.com/NousResearch/hermes-agent/pull/1548))
- Tighter memory and session recall guidance in system prompts ([#1329](https://github.com/NousResearch/hermes-agent/pull/1329))
- Persist CLI token counts to session DB for `/insights` ([#1498](https://github.com/NousResearch/hermes-agent/pull/1498))
- Keep Honcho recall out of the cached system prefix ([#1201](https://github.com/NousResearch/hermes-agent/pull/1201))
- Correct `seed_ai_identity` to use `session.add_messages()` ([#1475](https://github.com/NousResearch/hermes-agent/pull/1475))
- Isolate Honcho session routing for multi-user gateway ([#1500](https://github.com/NousResearch/hermes-agent/pull/1500))
---
## 📱 Messaging Platforms (Gateway)
### Gateway Core
- **System gateway service mode** — run as a system-level systemd service, not just user-level ([#1371](https://github.com/NousResearch/hermes-agent/pull/1371))
- **Gateway install scope prompts** — choose user vs system scope during setup ([#1374](https://github.com/NousResearch/hermes-agent/pull/1374))
- **Reasoning hot reload** — change reasoning settings without restarting the gateway ([#1275](https://github.com/NousResearch/hermes-agent/pull/1275))
- Default group sessions to per-user isolation — no more shared state across users in group chats ([#1495](https://github.com/NousResearch/hermes-agent/pull/1495), [#1417](https://github.com/NousResearch/hermes-agent/pull/1417))
- Harden gateway restart recovery ([#1310](https://github.com/NousResearch/hermes-agent/pull/1310))
- Cancel active runs during shutdown ([#1427](https://github.com/NousResearch/hermes-agent/pull/1427))
- SSL certificate auto-detection for NixOS and non-standard systems ([#1494](https://github.com/NousResearch/hermes-agent/pull/1494))
- Auto-detect D-Bus session bus for `systemctl --user` on headless servers ([#1601](https://github.com/NousResearch/hermes-agent/pull/1601))
- Auto-enable systemd linger during gateway install on headless servers ([#1334](https://github.com/NousResearch/hermes-agent/pull/1334))
- Fall back to module entrypoint when `hermes` is not on PATH ([#1355](https://github.com/NousResearch/hermes-agent/pull/1355))
- Fix dual gateways on macOS launchd after `hermes update` ([#1567](https://github.com/NousResearch/hermes-agent/pull/1567))
- Remove recursive ExecStop from systemd units ([#1530](https://github.com/NousResearch/hermes-agent/pull/1530))
- Prevent logging handler accumulation in gateway mode ([#1251](https://github.com/NousResearch/hermes-agent/pull/1251))
- Restart on retryable startup failures — by @jplew ([#1517](https://github.com/NousResearch/hermes-agent/pull/1517))
- Backfill model on gateway sessions after agent runs ([#1306](https://github.com/NousResearch/hermes-agent/pull/1306))
- PID-based gateway kill and deferred config write ([#1499](https://github.com/NousResearch/hermes-agent/pull/1499))
### Telegram
- Buffer media groups to prevent self-interruption from photo bursts ([#1341](https://github.com/NousResearch/hermes-agent/pull/1341), [#1422](https://github.com/NousResearch/hermes-agent/pull/1422))
- Retry on transient TLS failures during connect and send ([#1535](https://github.com/NousResearch/hermes-agent/pull/1535))
- Harden polling conflict handling ([#1339](https://github.com/NousResearch/hermes-agent/pull/1339))
- Escape chunk indicators and inline code in MarkdownV2 ([#1478](https://github.com/NousResearch/hermes-agent/pull/1478), [#1626](https://github.com/NousResearch/hermes-agent/pull/1626))
- Check updater/app state before disconnect ([#1389](https://github.com/NousResearch/hermes-agent/pull/1389))
### Discord
- `/thread` command with `auto_thread` config and media metadata fixes ([#1178](https://github.com/NousResearch/hermes-agent/pull/1178))
- Auto-thread on @mention, skip mention text in bot threads ([#1438](https://github.com/NousResearch/hermes-agent/pull/1438))
- Retry without reply reference for system messages ([#1385](https://github.com/NousResearch/hermes-agent/pull/1385))
- Preserve native document and video attachment support ([#1392](https://github.com/NousResearch/hermes-agent/pull/1392))
- Defer discord adapter annotations to avoid optional import crashes ([#1314](https://github.com/NousResearch/hermes-agent/pull/1314))
### Slack
- Thread handling overhaul — progress messages, responses, and session isolation all respect threads ([#1103](https://github.com/NousResearch/hermes-agent/pull/1103))
- Formatting, reactions, user resolution, and command improvements ([#1106](https://github.com/NousResearch/hermes-agent/pull/1106))
- Fix MAX_MESSAGE_LENGTH 3900 → 39000 ([#1117](https://github.com/NousResearch/hermes-agent/pull/1117))
- File upload fallback preserves thread context — by @0xbyt4 ([#1122](https://github.com/NousResearch/hermes-agent/pull/1122))
- Improve setup guidance ([#1387](https://github.com/NousResearch/hermes-agent/pull/1387))
### Email
- Fix IMAP UID tracking and SMTP TLS verification ([#1305](https://github.com/NousResearch/hermes-agent/pull/1305))
- Add `skip_attachments` option via config.yaml ([#1536](https://github.com/NousResearch/hermes-agent/pull/1536))
### Home Assistant
- Event filtering closed by default ([#1169](https://github.com/NousResearch/hermes-agent/pull/1169))
---
## 🖥️ CLI & User Experience
### Interactive CLI
- **Persistent CLI status bar** — always-visible model, provider, and token counts ([#1522](https://github.com/NousResearch/hermes-agent/pull/1522))
- **File path autocomplete** in the input prompt ([#1545](https://github.com/NousResearch/hermes-agent/pull/1545))
- **`/plan` command** — generate implementation plans from specs ([#1372](https://github.com/NousResearch/hermes-agent/pull/1372), [#1381](https://github.com/NousResearch/hermes-agent/pull/1381))
- **Major `/rollback` improvements** — richer checkpoint history, clearer UX ([#1505](https://github.com/NousResearch/hermes-agent/pull/1505))
- **Preload CLI skills on launch** — skills are ready before the first prompt ([#1359](https://github.com/NousResearch/hermes-agent/pull/1359))
- **Centralized slash command registry** — all commands defined once, consumed everywhere ([#1603](https://github.com/NousResearch/hermes-agent/pull/1603))
- `/bg` alias for `/background` ([#1590](https://github.com/NousResearch/hermes-agent/pull/1590))
- Prefix matching for slash commands — `/mod` resolves to `/model` ([#1320](https://github.com/NousResearch/hermes-agent/pull/1320))
- `/new`, `/reset`, `/clear` now start genuinely fresh sessions ([#1237](https://github.com/NousResearch/hermes-agent/pull/1237))
- Accept session ID prefixes for session actions ([#1425](https://github.com/NousResearch/hermes-agent/pull/1425))
- TUI prompt and accent output now respect active skin ([#1282](https://github.com/NousResearch/hermes-agent/pull/1282))
- Centralize tool emoji metadata in registry + skin integration ([#1484](https://github.com/NousResearch/hermes-agent/pull/1484))
- "View full command" option added to dangerous command approval — by @teknium1 based on design by community ([#887](https://github.com/NousResearch/hermes-agent/pull/887))
- Non-blocking startup update check and banner deduplication ([#1386](https://github.com/NousResearch/hermes-agent/pull/1386))
- `/reasoning` command output ordering and inline think extraction fixes ([#1031](https://github.com/NousResearch/hermes-agent/pull/1031))
- Verbose mode shows full untruncated output ([#1472](https://github.com/NousResearch/hermes-agent/pull/1472))
- Fix `/status` to report live state and tokens ([#1476](https://github.com/NousResearch/hermes-agent/pull/1476))
- Seed a default global SOUL.md ([#1311](https://github.com/NousResearch/hermes-agent/pull/1311))
### Setup & Configuration
- **OpenClaw migration** during first-time setup — by @kshitijk4poor ([#981](https://github.com/NousResearch/hermes-agent/pull/981))
- `hermes claw migrate` command + migration docs ([#1059](https://github.com/NousResearch/hermes-agent/pull/1059))
- Smart vision setup that respects the user's chosen provider ([#1323](https://github.com/NousResearch/hermes-agent/pull/1323))
- Handle headless setup flows end-to-end ([#1274](https://github.com/NousResearch/hermes-agent/pull/1274))
- Prefer curses over `simple_term_menu` in setup.py ([#1487](https://github.com/NousResearch/hermes-agent/pull/1487))
- Show effective model and provider in `/status` ([#1284](https://github.com/NousResearch/hermes-agent/pull/1284))
- Config set examples use placeholder syntax ([#1322](https://github.com/NousResearch/hermes-agent/pull/1322))
- Reload .env over stale shell overrides ([#1434](https://github.com/NousResearch/hermes-agent/pull/1434))
- Fix is_coding_plan NameError crash — by @0xbyt4 ([#1123](https://github.com/NousResearch/hermes-agent/pull/1123))
- Add missing packages to setuptools config — by @alt-glitch ([#912](https://github.com/NousResearch/hermes-agent/pull/912))
- Installer: clarify why sudo is needed at every prompt ([#1602](https://github.com/NousResearch/hermes-agent/pull/1602))
---
## 🔧 Tool System
### Terminal & Execution
- **Persistent shell mode** for local and SSH backends — maintain shell state across tool calls — by @alt-glitch ([#1067](https://github.com/NousResearch/hermes-agent/pull/1067), [#1483](https://github.com/NousResearch/hermes-agent/pull/1483))
- **Tirith pre-exec command scanning** — security layer that analyzes commands before execution ([#1256](https://github.com/NousResearch/hermes-agent/pull/1256))
- Strip Hermes provider env vars from all subprocess environments ([#1157](https://github.com/NousResearch/hermes-agent/pull/1157), [#1172](https://github.com/NousResearch/hermes-agent/pull/1172), [#1399](https://github.com/NousResearch/hermes-agent/pull/1399), [#1419](https://github.com/NousResearch/hermes-agent/pull/1419)) — initial fix by @eren-karakus0
- SSH preflight check ([#1486](https://github.com/NousResearch/hermes-agent/pull/1486))
- Docker backend: make cwd workspace mount explicit opt-in ([#1534](https://github.com/NousResearch/hermes-agent/pull/1534))
- Add project root to PYTHONPATH in execute_code sandbox ([#1383](https://github.com/NousResearch/hermes-agent/pull/1383))
- Eliminate execute_code progress spam on gateway platforms ([#1098](https://github.com/NousResearch/hermes-agent/pull/1098))
- Clearer docker backend preflight errors ([#1276](https://github.com/NousResearch/hermes-agent/pull/1276))
### Browser
- **`/browser connect`** — attach browser tools to a live Chrome instance via CDP ([#1549](https://github.com/NousResearch/hermes-agent/pull/1549))
- Improve browser cleanup, local browser PATH setup, and screenshot recovery ([#1333](https://github.com/NousResearch/hermes-agent/pull/1333))
### MCP
- **Selective tool loading** with utility policies — filter which MCP tools are available ([#1302](https://github.com/NousResearch/hermes-agent/pull/1302))
- Auto-reload MCP tools when `mcp_servers` config changes without restart ([#1474](https://github.com/NousResearch/hermes-agent/pull/1474))
- Resolve npx stdio connection failures ([#1291](https://github.com/NousResearch/hermes-agent/pull/1291))
- Preserve MCP toolsets when saving platform tool config ([#1421](https://github.com/NousResearch/hermes-agent/pull/1421))
### Vision
- Unify vision backend gating ([#1367](https://github.com/NousResearch/hermes-agent/pull/1367))
- Surface actual error reason instead of generic message ([#1338](https://github.com/NousResearch/hermes-agent/pull/1338))
- Make Claude image handling work end-to-end ([#1408](https://github.com/NousResearch/hermes-agent/pull/1408))
### Cron
- **Compress cron management into one tool** — single `cronjob` tool replaces multiple commands ([#1343](https://github.com/NousResearch/hermes-agent/pull/1343))
- Suppress duplicate cron sends to auto-delivery targets ([#1357](https://github.com/NousResearch/hermes-agent/pull/1357))
- Persist cron sessions to SQLite ([#1255](https://github.com/NousResearch/hermes-agent/pull/1255))
- Per-job runtime overrides (provider, model, base_url) ([#1398](https://github.com/NousResearch/hermes-agent/pull/1398))
- Atomic write in `save_job_output` to prevent data loss on crash ([#1173](https://github.com/NousResearch/hermes-agent/pull/1173))
- Preserve thread context for `deliver=origin` ([#1437](https://github.com/NousResearch/hermes-agent/pull/1437))
### Patch Tool
- Avoid corrupting pipe chars in V4A patch apply ([#1286](https://github.com/NousResearch/hermes-agent/pull/1286))
- Permissive `block_anchor` thresholds and unicode normalization ([#1539](https://github.com/NousResearch/hermes-agent/pull/1539))
### Delegation
- Add observability metadata to subagent results (model, tokens, duration, tool trace) ([#1175](https://github.com/NousResearch/hermes-agent/pull/1175))
---
## 🧩 Skills Ecosystem
### Skills System
- **Integrate skills.sh** as a hub source alongside ClawHub ([#1303](https://github.com/NousResearch/hermes-agent/pull/1303))
- Secure skill env setup on load ([#1153](https://github.com/NousResearch/hermes-agent/pull/1153))
- Honor policy table for dangerous verdicts ([#1330](https://github.com/NousResearch/hermes-agent/pull/1330))
- Harden ClawHub skill search exact matches ([#1400](https://github.com/NousResearch/hermes-agent/pull/1400))
- Fix ClawHub skill install — use `/download` ZIP endpoint ([#1060](https://github.com/NousResearch/hermes-agent/pull/1060))
- Avoid mislabeling local skills as builtin — by @arceus77-7 ([#862](https://github.com/NousResearch/hermes-agent/pull/862))
### New Skills
- **Linear** project management ([#1230](https://github.com/NousResearch/hermes-agent/pull/1230))
- **X/Twitter** via x-cli ([#1285](https://github.com/NousResearch/hermes-agent/pull/1285))
- **Telephony** — Twilio, SMS, and AI calls ([#1289](https://github.com/NousResearch/hermes-agent/pull/1289))
- **1Password** — by @arceus77-7 ([#883](https://github.com/NousResearch/hermes-agent/pull/883), [#1179](https://github.com/NousResearch/hermes-agent/pull/1179))
- **NeuroSkill BCI** integration ([#1135](https://github.com/NousResearch/hermes-agent/pull/1135))
- **Blender MCP** for 3D modeling ([#1531](https://github.com/NousResearch/hermes-agent/pull/1531))
- **OSS Security Forensics** ([#1482](https://github.com/NousResearch/hermes-agent/pull/1482))
- **Parallel CLI** research skill ([#1301](https://github.com/NousResearch/hermes-agent/pull/1301))
- **OpenCode** CLI skill ([#1174](https://github.com/NousResearch/hermes-agent/pull/1174))
- **ASCII Video** skill refactored — by @SHL0MS ([#1213](https://github.com/NousResearch/hermes-agent/pull/1213), [#1598](https://github.com/NousResearch/hermes-agent/pull/1598))
---
## 🎙️ Voice Mode
- Voice mode foundation — push-to-talk CLI, Telegram/Discord voice notes ([#1299](https://github.com/NousResearch/hermes-agent/pull/1299))
- Free local Whisper transcription via faster-whisper ([#1185](https://github.com/NousResearch/hermes-agent/pull/1185))
- Discord voice channel reliability fixes ([#1429](https://github.com/NousResearch/hermes-agent/pull/1429))
- Restore local STT fallback for gateway voice notes ([#1490](https://github.com/NousResearch/hermes-agent/pull/1490))
- Honor `stt.enabled: false` across gateway transcription ([#1394](https://github.com/NousResearch/hermes-agent/pull/1394))
- Fix bogus incapability message on Telegram voice notes (Issue [#1033](https://github.com/NousResearch/hermes-agent/issues/1033))
---
## 🔌 ACP (IDE Integration)
- Restore ACP server implementation ([#1254](https://github.com/NousResearch/hermes-agent/pull/1254))
- Support slash commands in ACP adapter ([#1532](https://github.com/NousResearch/hermes-agent/pull/1532))
---
## 🧪 RL Training
- **Agentic On-Policy Distillation (OPD)** environment — new RL training environment for agent policy distillation ([#1149](https://github.com/NousResearch/hermes-agent/pull/1149))
- Make tinker-atropos RL training fully optional ([#1062](https://github.com/NousResearch/hermes-agent/pull/1062))
---
## 🔒 Security & Reliability
### Security Hardening
- **Tirith pre-exec command scanning** — static analysis of terminal commands before execution ([#1256](https://github.com/NousResearch/hermes-agent/pull/1256))
- **PII redaction** when `privacy.redact_pii` is enabled ([#1542](https://github.com/NousResearch/hermes-agent/pull/1542))
- Strip Hermes provider/gateway/tool env vars from all subprocess environments ([#1157](https://github.com/NousResearch/hermes-agent/pull/1157), [#1172](https://github.com/NousResearch/hermes-agent/pull/1172), [#1399](https://github.com/NousResearch/hermes-agent/pull/1399), [#1419](https://github.com/NousResearch/hermes-agent/pull/1419))
- Docker cwd workspace mount now explicit opt-in — never auto-mount host directories ([#1534](https://github.com/NousResearch/hermes-agent/pull/1534))
- Escape parens and braces in fork bomb regex pattern ([#1397](https://github.com/NousResearch/hermes-agent/pull/1397))
- Harden `.worktreeinclude` path containment ([#1388](https://github.com/NousResearch/hermes-agent/pull/1388))
- Use description as `pattern_key` to prevent approval collisions ([#1395](https://github.com/NousResearch/hermes-agent/pull/1395))
### Reliability
- Guard init-time stdio writes ([#1271](https://github.com/NousResearch/hermes-agent/pull/1271))
- Session log writes reuse shared atomic JSON helper ([#1280](https://github.com/NousResearch/hermes-agent/pull/1280))
- Atomic temp cleanup protected on interrupts ([#1401](https://github.com/NousResearch/hermes-agent/pull/1401))
---
## 🐛 Notable Bug Fixes
- **`/status` always showing 0 tokens** — now reports live state (Issue [#1465](https://github.com/NousResearch/hermes-agent/issues/1465), [#1476](https://github.com/NousResearch/hermes-agent/pull/1476))
- **Custom model endpoints not working** — restored config-saved endpoint resolution (Issue [#1460](https://github.com/NousResearch/hermes-agent/issues/1460), [#1373](https://github.com/NousResearch/hermes-agent/pull/1373))
- **MCP tools not visible until restart** — auto-reload on config change (Issue [#1036](https://github.com/NousResearch/hermes-agent/issues/1036), [#1474](https://github.com/NousResearch/hermes-agent/pull/1474))
- **`hermes tools` removing MCP tools** — preserve MCP toolsets when saving (Issue [#1247](https://github.com/NousResearch/hermes-agent/issues/1247), [#1421](https://github.com/NousResearch/hermes-agent/pull/1421))
- **Terminal subprocesses inheriting `OPENAI_BASE_URL`** breaking external tools (Issue [#1002](https://github.com/NousResearch/hermes-agent/issues/1002), [#1399](https://github.com/NousResearch/hermes-agent/pull/1399))
- **Background process lost on gateway restart** — improved recovery (Issue [#1144](https://github.com/NousResearch/hermes-agent/issues/1144))
- **Cron jobs not persisting state** — now stored in SQLite (Issue [#1416](https://github.com/NousResearch/hermes-agent/issues/1416), [#1255](https://github.com/NousResearch/hermes-agent/pull/1255))
- **Cronjob `deliver: origin` not preserving thread context** (Issue [#1219](https://github.com/NousResearch/hermes-agent/issues/1219), [#1437](https://github.com/NousResearch/hermes-agent/pull/1437))
- **Gateway systemd service failing to auto-restart** when browser processes orphaned (Issue [#1617](https://github.com/NousResearch/hermes-agent/issues/1617))
- **`/background` completion report cut off in Telegram** (Issue [#1443](https://github.com/NousResearch/hermes-agent/issues/1443))
- **Model switching not taking effect** (Issue [#1244](https://github.com/NousResearch/hermes-agent/issues/1244), [#1183](https://github.com/NousResearch/hermes-agent/pull/1183))
- **`hermes doctor` reporting cronjob as unavailable** (Issue [#878](https://github.com/NousResearch/hermes-agent/issues/878), [#1180](https://github.com/NousResearch/hermes-agent/pull/1180))
- **WhatsApp bridge messages not received** from mobile (Issue [#1142](https://github.com/NousResearch/hermes-agent/issues/1142))
- **Setup wizard hanging on headless SSH** (Issue [#905](https://github.com/NousResearch/hermes-agent/issues/905), [#1274](https://github.com/NousResearch/hermes-agent/pull/1274))
- **Log handler accumulation** degrading gateway performance (Issue [#990](https://github.com/NousResearch/hermes-agent/issues/990), [#1251](https://github.com/NousResearch/hermes-agent/pull/1251))
- **Gateway NULL model in DB** (Issue [#987](https://github.com/NousResearch/hermes-agent/issues/987), [#1306](https://github.com/NousResearch/hermes-agent/pull/1306))
- **Strict endpoints rejecting replayed tool_calls** (Issue [#893](https://github.com/NousResearch/hermes-agent/issues/893))
- **Remaining hardcoded `~/.hermes` paths** — all now respect `HERMES_HOME` (Issue [#892](https://github.com/NousResearch/hermes-agent/issues/892), [#1233](https://github.com/NousResearch/hermes-agent/pull/1233))
- **Delegate tool not working with custom inference providers** (Issue [#1011](https://github.com/NousResearch/hermes-agent/issues/1011), [#1328](https://github.com/NousResearch/hermes-agent/pull/1328))
- **Skills Guard blocking official skills** (Issue [#1006](https://github.com/NousResearch/hermes-agent/issues/1006), [#1330](https://github.com/NousResearch/hermes-agent/pull/1330))
- **Setup writing provider before model selection** (Issue [#1182](https://github.com/NousResearch/hermes-agent/issues/1182))
- **`GatewayConfig.get()` AttributeError** crashing all message handling (Issue [#1158](https://github.com/NousResearch/hermes-agent/issues/1158), [#1287](https://github.com/NousResearch/hermes-agent/pull/1287))
- **`/update` hard-failing with "command not found"** (Issue [#1049](https://github.com/NousResearch/hermes-agent/issues/1049))
- **Image analysis failing silently** (Issue [#1034](https://github.com/NousResearch/hermes-agent/issues/1034), [#1338](https://github.com/NousResearch/hermes-agent/pull/1338))
- **API `BadRequestError` from `'dict'` object has no attribute `'strip'`** (Issue [#1071](https://github.com/NousResearch/hermes-agent/issues/1071))
- **Slash commands requiring exact full name** — now uses prefix matching (Issue [#928](https://github.com/NousResearch/hermes-agent/issues/928), [#1320](https://github.com/NousResearch/hermes-agent/pull/1320))
- **Gateway stops responding when terminal is closed on headless** (Issue [#1005](https://github.com/NousResearch/hermes-agent/issues/1005))
---
## 🧪 Testing
- Cover empty cached Anthropic tool-call turns ([#1222](https://github.com/NousResearch/hermes-agent/pull/1222))
- Fix stale CI assumptions in parser and quick-command coverage ([#1236](https://github.com/NousResearch/hermes-agent/pull/1236))
- Fix gateway async tests without implicit event loop ([#1278](https://github.com/NousResearch/hermes-agent/pull/1278))
- Make gateway async tests xdist-safe ([#1281](https://github.com/NousResearch/hermes-agent/pull/1281))
- Cross-timezone naive timestamp regression for cron ([#1319](https://github.com/NousResearch/hermes-agent/pull/1319))
- Isolate codex provider tests from local env ([#1335](https://github.com/NousResearch/hermes-agent/pull/1335))
- Lock retry replacement semantics ([#1379](https://github.com/NousResearch/hermes-agent/pull/1379))
- Improve error logging in session search tool — by @aydnOktay ([#1533](https://github.com/NousResearch/hermes-agent/pull/1533))
---
## 📚 Documentation
- Comprehensive SOUL.md guide ([#1315](https://github.com/NousResearch/hermes-agent/pull/1315))
- Voice mode documentation ([#1316](https://github.com/NousResearch/hermes-agent/pull/1316), [#1362](https://github.com/NousResearch/hermes-agent/pull/1362))
- Provider contribution guide ([#1361](https://github.com/NousResearch/hermes-agent/pull/1361))
- ACP and internal systems implementation guides ([#1259](https://github.com/NousResearch/hermes-agent/pull/1259))
- Expand Docusaurus coverage across CLI, tools, skills, and skins ([#1232](https://github.com/NousResearch/hermes-agent/pull/1232))
- Terminal backend and Windows troubleshooting ([#1297](https://github.com/NousResearch/hermes-agent/pull/1297))
- Skills hub reference section ([#1317](https://github.com/NousResearch/hermes-agent/pull/1317))
- Checkpoint, /rollback, and git worktrees guide ([#1493](https://github.com/NousResearch/hermes-agent/pull/1493), [#1524](https://github.com/NousResearch/hermes-agent/pull/1524))
- CLI status bar and /usage reference ([#1523](https://github.com/NousResearch/hermes-agent/pull/1523))
- Fallback providers + /background command docs ([#1430](https://github.com/NousResearch/hermes-agent/pull/1430))
- Gateway service scopes docs ([#1378](https://github.com/NousResearch/hermes-agent/pull/1378))
- Slack thread reply behavior docs ([#1407](https://github.com/NousResearch/hermes-agent/pull/1407))
- Redesigned landing page with Nous blue palette — by @austinpickett ([#974](https://github.com/NousResearch/hermes-agent/pull/974))
- Fix several documentation typos — by @JackTheGit ([#953](https://github.com/NousResearch/hermes-agent/pull/953))
- Stabilize website diagrams ([#1405](https://github.com/NousResearch/hermes-agent/pull/1405))
- CLI vs messaging quick reference in README ([#1491](https://github.com/NousResearch/hermes-agent/pull/1491))
- Add search to Docusaurus ([#1053](https://github.com/NousResearch/hermes-agent/pull/1053))
- Home Assistant integration docs ([#1170](https://github.com/NousResearch/hermes-agent/pull/1170))
---
## 👥 Contributors
### Core
- **@teknium1** — 220+ PRs spanning every area of the codebase
### Top Community Contributors
- **@0xbyt4** (4 PRs) — Anthropic adapter fixes (max_tokens, fallback crash, 429/529 retry), Slack file upload thread context, setup NameError fix
- **@erosika** (1 PR) — Honcho memory integration: async writes, memory modes, session title integration
- **@SHL0MS** (2 PRs) — ASCII video skill design patterns and refactoring
- **@alt-glitch** (2 PRs) — Persistent shell mode for local/SSH backends, setuptools packaging fix
- **@arceus77-7** (2 PRs) — 1Password skill, fix skills list mislabeling
- **@kshitijk4poor** (1 PR) — OpenClaw migration during setup wizard
- **@ASRagab** (1 PR) — Fix adaptive thinking for Claude 4.6 models
- **@eren-karakus0** (1 PR) — Strip Hermes provider env vars from subprocess environment
- **@mr-emmett-one** (1 PR) — Fix DeepSeek V3 parser multi-tool call support
- **@jplew** (1 PR) — Gateway restart on retryable startup failures
- **@brandtcormorant** (1 PR) — Fix Anthropic cache control for empty text blocks
- **@aydnOktay** (1 PR) — Improve error logging in session search tool
- **@austinpickett** (1 PR) — Landing page redesign with Nous blue palette
- **@JackTheGit** (1 PR) — Documentation typo fixes
### All Contributors
@0xbyt4, @alt-glitch, @arceus77-7, @ASRagab, @austinpickett, @aydnOktay, @brandtcormorant, @eren-karakus0, @erosika, @JackTheGit, @jplew, @kshitijk4poor, @mr-emmett-one, @SHL0MS, @teknium1
---
**Full Changelog**: [v2026.3.12...v2026.3.17](https://github.com/NousResearch/hermes-agent/compare/v2026.3.12...v2026.3.17)
+6 -9
View File
@@ -42,19 +42,16 @@ def _setup_logging() -> None:
def _load_env() -> None:
"""Load .env from HERMES_HOME (default ``~/.hermes``)."""
from dotenv import load_dotenv
from hermes_cli.env_loader import load_hermes_dotenv
hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
env_file = hermes_home / ".env"
if env_file.exists():
try:
load_dotenv(dotenv_path=env_file, encoding="utf-8")
except UnicodeDecodeError:
load_dotenv(dotenv_path=env_file, encoding="latin-1")
logging.getLogger(__name__).info("Loaded env from %s", env_file)
loaded = load_hermes_dotenv(hermes_home=hermes_home)
if loaded:
for env_file in loaded:
logging.getLogger(__name__).info("Loaded env from %s", env_file)
else:
logging.getLogger(__name__).info(
"No .env found at %s, using system env", env_file
"No .env found at %s, using system env", hermes_home / ".env"
)
+157 -5
View File
@@ -42,7 +42,7 @@ from acp_adapter.events import (
make_tool_progress_cb,
)
from acp_adapter.permissions import make_approval_callback
from acp_adapter.session import SessionManager
from acp_adapter.session import SessionManager, SessionState
logger = logging.getLogger(__name__)
@@ -226,10 +226,19 @@ class HermesACPAgent(acp.Agent):
logger.error("prompt: session %s not found", session_id)
return PromptResponse(stop_reason="refusal")
user_text = _extract_text(prompt)
if not user_text.strip():
user_text = _extract_text(prompt).strip()
if not user_text:
return PromptResponse(stop_reason="end_turn")
# Intercept slash commands — handle locally without calling the LLM
if user_text.startswith("/"):
response_text = self._handle_slash_command(user_text, state)
if response_text is not None:
if self._conn:
update = acp.update_agent_message_text(response_text)
await self._conn.session_update(session_id, update)
return PromptResponse(stop_reason="end_turn")
logger.info("Prompt on session %s: %s", session_id, user_text[:100])
conn = self._conn
@@ -295,6 +304,8 @@ class HermesACPAgent(acp.Agent):
if result.get("messages"):
state.history = result["messages"]
# Persist updated history so sessions survive process restarts.
self.session_manager.save_session(session_id)
final_response = result.get("final_response", "")
if final_response and conn:
@@ -315,12 +326,152 @@ class HermesACPAgent(acp.Agent):
stop_reason = "cancelled" if state.cancel_event and state.cancel_event.is_set() else "end_turn"
return PromptResponse(stop_reason=stop_reason, usage=usage)
# ---- Model switching ----------------------------------------------------
# ---- Slash commands (headless) -------------------------------------------
_SLASH_COMMANDS = {
"help": "Show available commands",
"model": "Show or change current model",
"tools": "List available tools",
"context": "Show conversation context info",
"reset": "Clear conversation history",
"compact": "Compress conversation context",
"version": "Show Hermes version",
}
def _handle_slash_command(self, text: str, state: SessionState) -> str | None:
"""Dispatch a slash command and return the response text.
Returns ``None`` for unrecognized commands so they fall through
to the LLM (the user may have typed ``/something`` as prose).
"""
parts = text.split(maxsplit=1)
cmd = parts[0].lstrip("/").lower()
args = parts[1].strip() if len(parts) > 1 else ""
handler = {
"help": self._cmd_help,
"model": self._cmd_model,
"tools": self._cmd_tools,
"context": self._cmd_context,
"reset": self._cmd_reset,
"compact": self._cmd_compact,
"version": self._cmd_version,
}.get(cmd)
if handler is None:
return None # not a known command — let the LLM handle it
try:
return handler(args, state)
except Exception as e:
logger.error("Slash command /%s error: %s", cmd, e, exc_info=True)
return f"Error executing /{cmd}: {e}"
def _cmd_help(self, args: str, state: SessionState) -> str:
lines = ["Available commands:", ""]
for cmd, desc in self._SLASH_COMMANDS.items():
lines.append(f" /{cmd:10s} {desc}")
lines.append("")
lines.append("Unrecognized /commands are sent to the model as normal messages.")
return "\n".join(lines)
def _cmd_model(self, args: str, state: SessionState) -> str:
if not args:
model = state.model or getattr(state.agent, "model", "unknown")
provider = getattr(state.agent, "provider", None) or "auto"
return f"Current model: {model}\nProvider: {provider}"
new_model = args.strip()
target_provider = None
# Auto-detect provider for the requested model
try:
from hermes_cli.models import parse_model_input, detect_provider_for_model
current_provider = getattr(state.agent, "provider", None) or "openrouter"
target_provider, new_model = parse_model_input(new_model, current_provider)
if target_provider == current_provider:
detected = detect_provider_for_model(new_model, current_provider)
if detected:
target_provider, new_model = detected
except Exception:
logger.debug("Provider detection failed, using model as-is", exc_info=True)
state.model = new_model
state.agent = self.session_manager._make_agent(
session_id=state.session_id,
cwd=state.cwd,
model=new_model,
)
self.session_manager.save_session(state.session_id)
provider_label = target_provider or getattr(state.agent, "provider", "auto")
logger.info("Session %s: model switched to %s", state.session_id, new_model)
return f"Model switched to: {new_model}\nProvider: {provider_label}"
def _cmd_tools(self, args: str, state: SessionState) -> str:
try:
from model_tools import get_tool_definitions
toolsets = getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
tools = get_tool_definitions(enabled_toolsets=toolsets, quiet_mode=True)
if not tools:
return "No tools available."
lines = [f"Available tools ({len(tools)}):"]
for t in tools:
name = t.get("function", {}).get("name", "?")
desc = t.get("function", {}).get("description", "")
# Truncate long descriptions
if len(desc) > 80:
desc = desc[:77] + "..."
lines.append(f" {name}: {desc}")
return "\n".join(lines)
except Exception as e:
return f"Could not list tools: {e}"
def _cmd_context(self, args: str, state: SessionState) -> str:
n_messages = len(state.history)
if n_messages == 0:
return "Conversation is empty (no messages yet)."
# Count by role
roles: dict[str, int] = {}
for msg in state.history:
role = msg.get("role", "unknown")
roles[role] = roles.get(role, 0) + 1
lines = [
f"Conversation: {n_messages} messages",
f" user: {roles.get('user', 0)}, assistant: {roles.get('assistant', 0)}, "
f"tool: {roles.get('tool', 0)}, system: {roles.get('system', 0)}",
]
model = state.model or getattr(state.agent, "model", "")
if model:
lines.append(f"Model: {model}")
return "\n".join(lines)
def _cmd_reset(self, args: str, state: SessionState) -> str:
state.history.clear()
self.session_manager.save_session(state.session_id)
return "Conversation history cleared."
def _cmd_compact(self, args: str, state: SessionState) -> str:
if not state.history:
return "Nothing to compress — conversation is empty."
try:
agent = state.agent
if hasattr(agent, "compress_context"):
agent.compress_context(state.history)
self.session_manager.save_session(state.session_id)
return f"Context compressed. Messages: {len(state.history)}"
return "Context compression not available for this agent."
except Exception as e:
return f"Compression failed: {e}"
def _cmd_version(self, args: str, state: SessionState) -> str:
return f"Hermes Agent v{HERMES_VERSION}"
# ---- Model switching (ACP protocol method) -------------------------------
async def set_session_model(
self, model_id: str, session_id: str, **kwargs: Any
):
"""Switch the model for a session."""
"""Switch the model for a session (called by ACP protocol)."""
state = self.session_manager.get_session(session_id)
if state:
state.model = model_id
@@ -329,5 +480,6 @@ class HermesACPAgent(acp.Agent):
cwd=state.cwd,
model=model_id,
)
self.session_manager.save_session(session_id)
logger.info("Session %s: model switched to %s", session_id, model_id)
return None
+262 -34
View File
@@ -1,7 +1,15 @@
"""ACP session manager — maps ACP sessions to Hermes AIAgent instances."""
"""ACP session manager — maps ACP sessions to Hermes AIAgent instances.
Sessions are persisted to the shared SessionDB (``~/.hermes/state.db``) so they
survive process restarts and appear in ``session_search``. When the editor
reconnects after idle/restart, the ``load_session`` / ``resume_session`` calls
find the persisted session in the database and restore the full conversation
history.
"""
from __future__ import annotations
import copy
import json
import logging
import uuid
from dataclasses import dataclass, field
@@ -46,18 +54,26 @@ class SessionState:
class SessionManager:
"""Thread-safe manager for ACP sessions backed by Hermes AIAgent instances."""
"""Thread-safe manager for ACP sessions backed by Hermes AIAgent instances.
def __init__(self, agent_factory=None):
Sessions are held in-memory for fast access **and** persisted to the
shared SessionDB so they survive process restarts and are searchable
via ``session_search``.
"""
def __init__(self, agent_factory=None, db=None):
"""
Args:
agent_factory: Optional callable that creates an AIAgent-like object.
Used by tests. When omitted, a real AIAgent is created
using the current Hermes runtime provider configuration.
db: Optional SessionDB instance. When omitted, the default
SessionDB (``~/.hermes/state.db``) is lazily created.
"""
self._sessions: Dict[str, SessionState] = {}
self._lock = Lock()
self._agent_factory = agent_factory
self._db_instance = db # None → lazy-init on first use
# ---- public API ---------------------------------------------------------
@@ -77,54 +93,67 @@ class SessionManager:
with self._lock:
self._sessions[session_id] = state
_register_task_cwd(session_id, cwd)
self._persist(state)
logger.info("Created ACP session %s (cwd=%s)", session_id, cwd)
return state
def get_session(self, session_id: str) -> Optional[SessionState]:
"""Return the session for *session_id*, or ``None``."""
"""Return the session for *session_id*, or ``None``.
If the session is not in memory but exists in the database (e.g. after
a process restart), it is transparently restored.
"""
with self._lock:
return self._sessions.get(session_id)
state = self._sessions.get(session_id)
if state is not None:
return state
# Attempt to restore from database.
return self._restore(session_id)
def remove_session(self, session_id: str) -> bool:
"""Remove a session. Returns True if it existed."""
"""Remove a session from memory and database. Returns True if it existed."""
with self._lock:
existed = self._sessions.pop(session_id, None) is not None
if existed:
db_existed = self._delete_persisted(session_id)
if existed or db_existed:
_clear_task_cwd(session_id)
return existed
return existed or db_existed
def fork_session(self, session_id: str, cwd: str = ".") -> Optional[SessionState]:
"""Deep-copy a session's history into a new session."""
import threading
with self._lock:
original = self._sessions.get(session_id)
if original is None:
return None
original = self.get_session(session_id) # checks DB too
if original is None:
return None
new_id = str(uuid.uuid4())
agent = self._make_agent(
session_id=new_id,
cwd=cwd,
model=original.model or None,
)
state = SessionState(
session_id=new_id,
agent=agent,
cwd=cwd,
model=getattr(agent, "model", original.model) or original.model,
history=copy.deepcopy(original.history),
cancel_event=threading.Event(),
)
new_id = str(uuid.uuid4())
agent = self._make_agent(
session_id=new_id,
cwd=cwd,
model=original.model or None,
)
state = SessionState(
session_id=new_id,
agent=agent,
cwd=cwd,
model=getattr(agent, "model", original.model) or original.model,
history=copy.deepcopy(original.history),
cancel_event=threading.Event(),
)
with self._lock:
self._sessions[new_id] = state
_register_task_cwd(new_id, cwd)
self._persist(state)
logger.info("Forked ACP session %s -> %s", session_id, new_id)
return state
def list_sessions(self) -> List[Dict[str, Any]]:
"""Return lightweight info dicts for all sessions."""
"""Return lightweight info dicts for all sessions (memory + database)."""
# Collect in-memory sessions first.
with self._lock:
return [
seen_ids = set(self._sessions.keys())
results = [
{
"session_id": s.session_id,
"cwd": s.cwd,
@@ -134,23 +163,220 @@ class SessionManager:
for s in self._sessions.values()
]
# Merge any persisted sessions not currently in memory.
db = self._get_db()
if db is not None:
try:
rows = db.search_sessions(source="acp", limit=1000)
for row in rows:
sid = row["id"]
if sid in seen_ids:
continue
# Extract cwd from model_config JSON.
cwd = "."
mc = row.get("model_config")
if mc:
try:
cwd = json.loads(mc).get("cwd", ".")
except (json.JSONDecodeError, TypeError):
pass
results.append({
"session_id": sid,
"cwd": cwd,
"model": row.get("model") or "",
"history_len": row.get("message_count") or 0,
})
except Exception:
logger.debug("Failed to list ACP sessions from DB", exc_info=True)
return results
def update_cwd(self, session_id: str, cwd: str) -> Optional[SessionState]:
"""Update the working directory for a session and its tool overrides."""
with self._lock:
state = self._sessions.get(session_id)
if state is None:
return None
state.cwd = cwd
state = self.get_session(session_id) # checks DB too
if state is None:
return None
state.cwd = cwd
_register_task_cwd(session_id, cwd)
self._persist(state)
return state
def cleanup(self) -> None:
"""Remove all sessions and clear task-specific cwd overrides."""
"""Remove all sessions (memory and database) and clear task-specific cwd overrides."""
with self._lock:
session_ids = list(self._sessions.keys())
self._sessions.clear()
for session_id in session_ids:
_clear_task_cwd(session_id)
self._delete_persisted(session_id)
# Also remove any DB-only ACP sessions not currently in memory.
db = self._get_db()
if db is not None:
try:
rows = db.search_sessions(source="acp", limit=10000)
for row in rows:
sid = row["id"]
_clear_task_cwd(sid)
db.delete_session(sid)
except Exception:
logger.debug("Failed to cleanup ACP sessions from DB", exc_info=True)
def save_session(self, session_id: str) -> None:
"""Persist the current state of a session to the database.
Called by the server after prompt completion, slash commands that
mutate history, and model switches.
"""
with self._lock:
state = self._sessions.get(session_id)
if state is not None:
self._persist(state)
# ---- persistence via SessionDB ------------------------------------------
def _get_db(self):
"""Lazily initialise and return the SessionDB instance.
Returns ``None`` if the DB is unavailable (e.g. import error in a
minimal test environment).
Note: we resolve ``HERMES_HOME`` dynamically rather than relying on
the module-level ``DEFAULT_DB_PATH`` constant, because that constant
is evaluated at import time and won't reflect env-var changes made
later (e.g. by the test fixture ``_isolate_hermes_home``).
"""
if self._db_instance is not None:
return self._db_instance
try:
import os
from pathlib import Path
from hermes_state import SessionDB
hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
self._db_instance = SessionDB(db_path=hermes_home / "state.db")
return self._db_instance
except Exception:
logger.debug("SessionDB unavailable for ACP persistence", exc_info=True)
return None
def _persist(self, state: SessionState) -> None:
"""Write session state to the database.
Creates the session record if it doesn't exist, then replaces all
stored messages with the current in-memory history.
"""
db = self._get_db()
if db is None:
return
# Ensure model is a plain string (not a MagicMock or other proxy).
model_str = str(state.model) if state.model else None
cwd_json = json.dumps({"cwd": state.cwd})
try:
# Ensure the session record exists.
existing = db.get_session(state.session_id)
if existing is None:
db.create_session(
session_id=state.session_id,
source="acp",
model=model_str,
model_config={"cwd": state.cwd},
)
else:
# Update model_config (contains cwd) if changed.
try:
with db._lock:
db._conn.execute(
"UPDATE sessions SET model_config = ?, model = COALESCE(?, model) WHERE id = ?",
(cwd_json, model_str, state.session_id),
)
db._conn.commit()
except Exception:
logger.debug("Failed to update ACP session metadata", exc_info=True)
# Replace stored messages with current history.
db.clear_messages(state.session_id)
for msg in state.history:
db.append_message(
session_id=state.session_id,
role=msg.get("role", "user"),
content=msg.get("content"),
tool_name=msg.get("tool_name") or msg.get("name"),
tool_calls=msg.get("tool_calls"),
tool_call_id=msg.get("tool_call_id"),
)
except Exception:
logger.warning("Failed to persist ACP session %s", state.session_id, exc_info=True)
def _restore(self, session_id: str) -> Optional[SessionState]:
"""Load a session from the database into memory, recreating the AIAgent."""
import threading
db = self._get_db()
if db is None:
return None
try:
row = db.get_session(session_id)
except Exception:
logger.debug("Failed to query DB for ACP session %s", session_id, exc_info=True)
return None
if row is None:
return None
# Only restore ACP sessions.
if row.get("source") != "acp":
return None
# Extract cwd from model_config.
cwd = "."
mc = row.get("model_config")
if mc:
try:
cwd = json.loads(mc).get("cwd", ".")
except (json.JSONDecodeError, TypeError):
pass
model = row.get("model") or None
# Load conversation history.
try:
history = db.get_messages_as_conversation(session_id)
except Exception:
logger.warning("Failed to load messages for ACP session %s", session_id, exc_info=True)
history = []
try:
agent = self._make_agent(session_id=session_id, cwd=cwd, model=model)
except Exception:
logger.warning("Failed to recreate agent for ACP session %s", session_id, exc_info=True)
return None
state = SessionState(
session_id=session_id,
agent=agent,
cwd=cwd,
model=model or getattr(agent, "model", "") or "",
history=history,
cancel_event=threading.Event(),
)
with self._lock:
self._sessions[session_id] = state
_register_task_cwd(session_id, cwd)
logger.info("Restored ACP session %s from DB (%d messages)", session_id, len(history))
return state
def _delete_persisted(self, session_id: str) -> bool:
"""Delete a session from the database. Returns True if it existed."""
db = self._get_db()
if db is None:
return False
try:
return db.delete_session(session_id)
except Exception:
logger.debug("Failed to delete ACP session %s from DB", session_id, exc_info=True)
return False
# ---- internal -----------------------------------------------------------
@@ -194,6 +420,8 @@ class SessionManager:
"api_mode": runtime.get("api_mode"),
"base_url": runtime.get("base_url"),
"api_key": runtime.get("api_key"),
"command": runtime.get("command"),
"args": list(runtime.get("args") or []),
}
)
except Exception:
+423 -24
View File
@@ -45,14 +45,49 @@ _COMMON_BETAS = [
"fine-grained-tool-streaming-2025-05-14",
]
# Additional beta headers required for OAuth/subscription auth
# Both clawdbot and OpenCode include claude-code-20250219 alongside oauth-2025-04-20.
# Without claude-code-20250219, Anthropic's API rejects OAuth tokens with 401.
# Additional beta headers required for OAuth/subscription auth.
# Matches what Claude Code (and pi-ai / OpenCode) send.
_OAUTH_ONLY_BETAS = [
"claude-code-20250219",
"oauth-2025-04-20",
]
# Claude Code identity — required for OAuth requests to be routed correctly.
# Without these, Anthropic's infrastructure intermittently 500s OAuth traffic.
# The version must stay reasonably current — Anthropic rejects OAuth requests
# when the spoofed user-agent version is too far behind the actual release.
_CLAUDE_CODE_VERSION_FALLBACK = "2.1.74"
def _detect_claude_code_version() -> str:
"""Detect the installed Claude Code version, fall back to a static constant.
Anthropic's OAuth infrastructure validates the user-agent version and may
reject requests with a version that's too old. Detecting dynamically means
users who keep Claude Code updated never hit stale-version 400s.
"""
import subprocess as _sp
for cmd in ("claude", "claude-code"):
try:
result = _sp.run(
[cmd, "--version"],
capture_output=True, text=True, timeout=5,
)
if result.returncode == 0 and result.stdout.strip():
# Output is like "2.1.74 (Claude Code)" or just "2.1.74"
version = result.stdout.strip().split()[0]
if version and version[0].isdigit():
return version
except Exception:
pass
return _CLAUDE_CODE_VERSION_FALLBACK
_CLAUDE_CODE_VERSION = _detect_claude_code_version()
_CLAUDE_CODE_SYSTEM_PREFIX = "You are Claude Code, Anthropic's official CLI for Claude."
_MCP_TOOL_PREFIX = "mcp_"
def _is_oauth_token(key: str) -> bool:
"""Check if the key is an OAuth/setup token (not a regular Console API key).
@@ -88,10 +123,16 @@ def build_anthropic_client(api_key: str, base_url: str = None):
kwargs["base_url"] = base_url
if _is_oauth_token(api_key):
# OAuth access token / setup-token → Bearer auth + beta headers
# OAuth access token / setup-token → Bearer auth + Claude Code identity.
# Anthropic routes OAuth requests based on user-agent and headers;
# without Claude Code's fingerprint, requests get intermittent 500s.
all_betas = _COMMON_BETAS + _OAUTH_ONLY_BETAS
kwargs["auth_token"] = api_key
kwargs["default_headers"] = {"anthropic-beta": ",".join(all_betas)}
kwargs["default_headers"] = {
"anthropic-beta": ",".join(all_betas),
"user-agent": f"claude-cli/{_CLAUDE_CODE_VERSION} (external, cli)",
"x-app": "cli",
}
else:
# Regular API key → x-api-key header + common betas
kwargs["api_key"] = api_key
@@ -189,7 +230,10 @@ def _refresh_oauth_token(creds: Dict[str, Any]) -> Optional[str]:
req = urllib.request.Request(
"https://console.anthropic.com/v1/oauth/token",
data=data,
headers={"Content-Type": "application/x-www-form-urlencoded"},
headers={
"Content-Type": "application/x-www-form-urlencoded",
"User-Agent": f"claude-cli/{_CLAUDE_CODE_VERSION} (external, cli)",
},
method="POST",
)
@@ -332,12 +376,24 @@ def resolve_anthropic_token() -> Optional[str]:
return preferred
return cc_token
# 3. Claude Code credential file
# 3. Hermes-managed OAuth credentials (~/.hermes/.anthropic_oauth.json)
hermes_creds = read_hermes_oauth_credentials()
if hermes_creds:
if is_claude_code_token_valid(hermes_creds):
logger.debug("Using Hermes-managed OAuth credentials")
return hermes_creds["accessToken"]
# Expired — try refresh
logger.debug("Hermes OAuth token expired — attempting refresh")
refreshed = refresh_hermes_oauth_token()
if refreshed:
return refreshed
# 4. Claude Code credential file
resolved_claude_token = _resolve_claude_code_token_from_credentials(creds)
if resolved_claude_token:
return resolved_claude_token
# 4. Regular API key, or a legacy OAuth token saved in ANTHROPIC_API_KEY.
# 5. Regular API key, or a legacy OAuth token saved in ANTHROPIC_API_KEY.
# This remains as a compatibility fallback for pre-migration Hermes configs.
api_key = os.getenv("ANTHROPIC_API_KEY", "").strip()
if api_key:
@@ -386,6 +442,215 @@ def run_oauth_setup_token() -> Optional[str]:
return None
# ── Hermes-native PKCE OAuth flow ────────────────────────────────────────
# Mirrors the flow used by Claude Code, pi-ai, and OpenCode.
# Stores credentials in ~/.hermes/.anthropic_oauth.json (our own file).
_OAUTH_CLIENT_ID = "9d1c250a-e61b-44d9-88ed-5944d1962f5e"
_OAUTH_TOKEN_URL = "https://console.anthropic.com/v1/oauth/token"
_OAUTH_REDIRECT_URI = "https://console.anthropic.com/oauth/code/callback"
_OAUTH_SCOPES = "org:create_api_key user:profile user:inference"
_HERMES_OAUTH_FILE = Path(os.getenv("HERMES_HOME", str(Path.home() / ".hermes"))) / ".anthropic_oauth.json"
def _generate_pkce() -> tuple:
"""Generate PKCE code_verifier and code_challenge (S256)."""
import base64
import hashlib
import secrets
verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
challenge = base64.urlsafe_b64encode(
hashlib.sha256(verifier.encode()).digest()
).rstrip(b"=").decode()
return verifier, challenge
def run_hermes_oauth_login() -> Optional[str]:
"""Run Hermes-native OAuth PKCE flow for Claude Pro/Max subscription.
Opens a browser to claude.ai for authorization, prompts for the code,
exchanges it for tokens, and stores them in ~/.hermes/.anthropic_oauth.json.
Returns the access token on success, None on failure.
"""
import time
import webbrowser
verifier, challenge = _generate_pkce()
# Build authorization URL
params = {
"code": "true",
"client_id": _OAUTH_CLIENT_ID,
"response_type": "code",
"redirect_uri": _OAUTH_REDIRECT_URI,
"scope": _OAUTH_SCOPES,
"code_challenge": challenge,
"code_challenge_method": "S256",
"state": verifier,
}
from urllib.parse import urlencode
auth_url = f"https://claude.ai/oauth/authorize?{urlencode(params)}"
print()
print("Authorize Hermes with your Claude Pro/Max subscription.")
print()
print("╭─ Claude Pro/Max Authorization ────────────────────╮")
print("│ │")
print("│ Open this link in your browser: │")
print("╰───────────────────────────────────────────────────╯")
print()
print(f" {auth_url}")
print()
# Try to open browser automatically (works on desktop, silently fails on headless/SSH)
try:
webbrowser.open(auth_url)
print(" (Browser opened automatically)")
except Exception:
pass
print()
print("After authorizing, you'll see a code. Paste it below.")
print()
try:
auth_code = input("Authorization code: ").strip()
except (KeyboardInterrupt, EOFError):
return None
if not auth_code:
print("No code entered.")
return None
# Split code#state format
splits = auth_code.split("#")
code = splits[0]
state = splits[1] if len(splits) > 1 else ""
# Exchange code for tokens
try:
import urllib.request
exchange_data = json.dumps({
"grant_type": "authorization_code",
"client_id": _OAUTH_CLIENT_ID,
"code": code,
"state": state,
"redirect_uri": _OAUTH_REDIRECT_URI,
"code_verifier": verifier,
}).encode()
req = urllib.request.Request(
_OAUTH_TOKEN_URL,
data=exchange_data,
headers={
"Content-Type": "application/json",
"User-Agent": f"claude-cli/{_CLAUDE_CODE_VERSION} (external, cli)",
},
method="POST",
)
with urllib.request.urlopen(req, timeout=15) as resp:
result = json.loads(resp.read().decode())
except Exception as e:
print(f"Token exchange failed: {e}")
return None
access_token = result.get("access_token", "")
refresh_token = result.get("refresh_token", "")
expires_in = result.get("expires_in", 3600)
if not access_token:
print("No access token in response.")
return None
# Store credentials
expires_at_ms = int(time.time() * 1000) + (expires_in * 1000)
_save_hermes_oauth_credentials(access_token, refresh_token, expires_at_ms)
# Also write to Claude Code's credential file for backward compat
_write_claude_code_credentials(access_token, refresh_token, expires_at_ms)
print("Authentication successful!")
return access_token
def _save_hermes_oauth_credentials(access_token: str, refresh_token: str, expires_at_ms: int) -> None:
"""Save OAuth credentials to ~/.hermes/.anthropic_oauth.json."""
data = {
"accessToken": access_token,
"refreshToken": refresh_token,
"expiresAt": expires_at_ms,
}
try:
_HERMES_OAUTH_FILE.parent.mkdir(parents=True, exist_ok=True)
_HERMES_OAUTH_FILE.write_text(json.dumps(data, indent=2), encoding="utf-8")
_HERMES_OAUTH_FILE.chmod(0o600)
except (OSError, IOError) as e:
logger.debug("Failed to save Hermes OAuth credentials: %s", e)
def read_hermes_oauth_credentials() -> Optional[Dict[str, Any]]:
"""Read Hermes-managed OAuth credentials from ~/.hermes/.anthropic_oauth.json."""
if _HERMES_OAUTH_FILE.exists():
try:
data = json.loads(_HERMES_OAUTH_FILE.read_text(encoding="utf-8"))
if data.get("accessToken"):
return data
except (json.JSONDecodeError, OSError, IOError) as e:
logger.debug("Failed to read Hermes OAuth credentials: %s", e)
return None
def refresh_hermes_oauth_token() -> Optional[str]:
"""Refresh the Hermes-managed OAuth token using the stored refresh token.
Returns the new access token, or None if refresh fails.
"""
import time
import urllib.request
creds = read_hermes_oauth_credentials()
if not creds or not creds.get("refreshToken"):
return None
try:
data = json.dumps({
"grant_type": "refresh_token",
"refresh_token": creds["refreshToken"],
"client_id": _OAUTH_CLIENT_ID,
}).encode()
req = urllib.request.Request(
_OAUTH_TOKEN_URL,
data=data,
headers={
"Content-Type": "application/json",
"User-Agent": f"claude-cli/{_CLAUDE_CODE_VERSION} (external, cli)",
},
method="POST",
)
with urllib.request.urlopen(req, timeout=10) as resp:
result = json.loads(resp.read().decode())
new_access = result.get("access_token", "")
new_refresh = result.get("refresh_token", creds["refreshToken"])
expires_in = result.get("expires_in", 3600)
if new_access:
new_expires_ms = int(time.time() * 1000) + (expires_in * 1000)
_save_hermes_oauth_credentials(new_access, new_refresh, new_expires_ms)
# Also update Claude Code's credential file
_write_claude_code_credentials(new_access, new_refresh, new_expires_ms)
logger.debug("Successfully refreshed Hermes OAuth token")
return new_access
except Exception as e:
logger.debug("Failed to refresh Hermes OAuth token: %s", e)
return None
# ---------------------------------------------------------------------------
# Message / tool / response format conversion
# ---------------------------------------------------------------------------
@@ -497,6 +762,66 @@ def convert_tools_to_anthropic(tools: List[Dict]) -> List[Dict]:
return result
def _image_source_from_openai_url(url: str) -> Dict[str, str]:
"""Convert an OpenAI-style image URL/data URL into Anthropic image source."""
url = str(url or "").strip()
if not url:
return {"type": "url", "url": ""}
if url.startswith("data:"):
header, _, data = url.partition(",")
media_type = "image/jpeg"
if header.startswith("data:"):
mime_part = header[len("data:"):].split(";", 1)[0].strip()
if mime_part.startswith("image/"):
media_type = mime_part
return {
"type": "base64",
"media_type": media_type,
"data": data,
}
return {"type": "url", "url": url}
def _convert_content_part_to_anthropic(part: Any) -> Optional[Dict[str, Any]]:
"""Convert a single OpenAI-style content part to Anthropic format."""
if part is None:
return None
if isinstance(part, str):
return {"type": "text", "text": part}
if not isinstance(part, dict):
return {"type": "text", "text": str(part)}
ptype = part.get("type")
if ptype == "input_text":
block: Dict[str, Any] = {"type": "text", "text": part.get("text", "")}
elif ptype in {"image_url", "input_image"}:
image_value = part.get("image_url", {})
url = image_value.get("url", "") if isinstance(image_value, dict) else str(image_value or "")
block = {"type": "image", "source": _image_source_from_openai_url(url)}
else:
block = dict(part)
if isinstance(part.get("cache_control"), dict) and "cache_control" not in block:
block["cache_control"] = dict(part["cache_control"])
return block
def _convert_content_to_anthropic(content: Any) -> Any:
"""Convert OpenAI-style multimodal content arrays to Anthropic blocks."""
if not isinstance(content, list):
return content
converted = []
for part in content:
block = _convert_content_part_to_anthropic(part)
if block is not None:
converted.append(block)
return converted
def convert_messages_to_anthropic(
messages: List[Dict],
) -> Tuple[Optional[Any], List[Dict]]:
@@ -533,11 +858,9 @@ def convert_messages_to_anthropic(
blocks = []
if content:
if isinstance(content, list):
for part in content:
if isinstance(part, dict):
blocks.append(dict(part))
elif part is not None:
blocks.append({"type": "text", "text": str(part)})
converted_content = _convert_content_to_anthropic(content)
if isinstance(converted_content, list):
blocks.extend(converted_content)
else:
blocks.append({"type": "text", "text": str(content)})
for tc in m.get("tool_calls", []):
@@ -587,12 +910,11 @@ def convert_messages_to_anthropic(
# Regular user message
if isinstance(content, list):
converted_blocks = []
for part in content:
converted = _convert_user_content_part_to_anthropic(part)
if converted is not None:
converted_blocks.append(converted)
result.append({"role": "user", "content": converted_blocks or [{"type": "text", "text": ""}]})
converted_blocks = _convert_content_to_anthropic(content)
result.append({
"role": "user",
"content": converted_blocks or [{"type": "text", "text": ""}],
})
else:
result.append({"role": "user", "content": content})
@@ -613,6 +935,26 @@ def convert_messages_to_anthropic(
if not m["content"]:
m["content"] = [{"type": "text", "text": "(tool call removed)"}]
# Strip orphaned tool_result blocks (no matching tool_use precedes them).
# This is the mirror of the above: context compression or session truncation
# can remove an assistant message containing a tool_use while leaving the
# subsequent tool_result intact. Anthropic rejects these with a 400.
tool_use_ids = set()
for m in result:
if m["role"] == "assistant" and isinstance(m["content"], list):
for block in m["content"]:
if block.get("type") == "tool_use":
tool_use_ids.add(block.get("id"))
for m in result:
if m["role"] == "user" and isinstance(m["content"], list):
m["content"] = [
b
for b in m["content"]
if b.get("type") != "tool_result" or b.get("tool_use_id") in tool_use_ids
]
if not m["content"]:
m["content"] = [{"type": "text", "text": "(tool result removed)"}]
# Enforce strict role alternation (Anthropic rejects consecutive same-role messages)
fixed = []
for m in result:
@@ -641,8 +983,12 @@ def convert_messages_to_anthropic(
elif isinstance(prev_blocks, str) and isinstance(curr_blocks, str):
fixed[-1]["content"] = prev_blocks + "\n" + curr_blocks
else:
# Keep the later message
fixed[-1] = m
# Mixed types — normalize both to list and merge
if isinstance(prev_blocks, str):
prev_blocks = [{"type": "text", "text": prev_blocks}]
if isinstance(curr_blocks, str):
curr_blocks = [{"type": "text", "text": curr_blocks}]
fixed[-1]["content"] = prev_blocks + curr_blocks
else:
fixed.append(m)
result = fixed
@@ -657,14 +1003,59 @@ def build_anthropic_kwargs(
max_tokens: Optional[int],
reasoning_config: Optional[Dict[str, Any]],
tool_choice: Optional[str] = None,
is_oauth: bool = False,
) -> Dict[str, Any]:
"""Build kwargs for anthropic.messages.create()."""
"""Build kwargs for anthropic.messages.create().
When *is_oauth* is True, applies Claude Code compatibility transforms:
system prompt prefix, tool name prefixing, and prompt sanitization.
"""
system, anthropic_messages = convert_messages_to_anthropic(messages)
anthropic_tools = convert_tools_to_anthropic(tools) if tools else []
model = normalize_model_name(model)
effective_max_tokens = max_tokens or 16384
# ── OAuth: Claude Code identity ──────────────────────────────────
if is_oauth:
# 1. Prepend Claude Code system prompt identity
cc_block = {"type": "text", "text": _CLAUDE_CODE_SYSTEM_PREFIX}
if isinstance(system, list):
system = [cc_block] + system
elif isinstance(system, str) and system:
system = [cc_block, {"type": "text", "text": system}]
else:
system = [cc_block]
# 2. Sanitize system prompt — replace product name references
# to avoid Anthropic's server-side content filters.
for block in system:
if isinstance(block, dict) and block.get("type") == "text":
text = block.get("text", "")
text = text.replace("Hermes Agent", "Claude Code")
text = text.replace("Hermes agent", "Claude Code")
text = text.replace("hermes-agent", "claude-code")
text = text.replace("Nous Research", "Anthropic")
block["text"] = text
# 3. Prefix tool names with mcp_ (Claude Code convention)
if anthropic_tools:
for tool in anthropic_tools:
if "name" in tool:
tool["name"] = _MCP_TOOL_PREFIX + tool["name"]
# 4. Prefix tool names in message history (tool_use and tool_result blocks)
for msg in anthropic_messages:
content = msg.get("content")
if isinstance(content, list):
for block in content:
if isinstance(block, dict):
if block.get("type") == "tool_use" and "name" in block:
if not block["name"].startswith(_MCP_TOOL_PREFIX):
block["name"] = _MCP_TOOL_PREFIX + block["name"]
elif block.get("type") == "tool_result" and "tool_use_id" in block:
pass # tool_result uses ID, not name
kwargs: Dict[str, Any] = {
"model": model,
"messages": anthropic_messages,
@@ -682,7 +1073,8 @@ def build_anthropic_kwargs(
elif tool_choice == "required":
kwargs["tool_choice"] = {"type": "any"}
elif tool_choice == "none":
pass # Don't send tool_choice — Anthropic will use tools if needed
# Anthropic has no tool_choice "none" — omit tools entirely to prevent use
kwargs.pop("tools", None)
elif isinstance(tool_choice, str):
# Specific tool name
kwargs["tool_choice"] = {"type": "tool", "name": tool_choice}
@@ -711,11 +1103,15 @@ def build_anthropic_kwargs(
def normalize_anthropic_response(
response,
strip_tool_prefix: bool = False,
) -> Tuple[SimpleNamespace, str]:
"""Normalize Anthropic response to match the shape expected by AIAgent.
Returns (assistant_message, finish_reason) where assistant_message has
.content, .tool_calls, and .reasoning attributes.
When *strip_tool_prefix* is True, removes the ``mcp_`` prefix that was
added to tool names for OAuth Claude Code compatibility.
"""
text_parts = []
reasoning_parts = []
@@ -727,12 +1123,15 @@ def normalize_anthropic_response(
elif block.type == "thinking":
reasoning_parts.append(block.thinking)
elif block.type == "tool_use":
name = block.name
if strip_tool_prefix and name.startswith(_MCP_TOOL_PREFIX):
name = name[len(_MCP_TOOL_PREFIX):]
tool_calls.append(
SimpleNamespace(
id=block.id,
type="function",
function=SimpleNamespace(
name=block.name,
name=name,
arguments=json.dumps(block.input),
),
)
+94 -51
View File
@@ -39,6 +39,7 @@ custom OpenAI-compatible endpoint without touching the main model settings.
import json
import logging
import os
import threading
from pathlib import Path
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Tuple
@@ -54,9 +55,13 @@ logger = logging.getLogger(__name__)
_API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
"zai": "glm-4.5-flash",
"kimi-coding": "kimi-k2-turbo-preview",
"minimax": "MiniMax-M2.5-highspeed",
"minimax-cn": "MiniMax-M2.5-highspeed",
"minimax": "MiniMax-M2.7-highspeed",
"minimax-cn": "MiniMax-M2.7-highspeed",
"anthropic": "claude-haiku-4-5-20251001",
"ai-gateway": "google/gemini-3-flash",
"opencode-zen": "gemini-3-flash",
"opencode-go": "glm-5",
"kilocode": "google/gemini-3-flash-preview",
}
# OpenRouter app attribution headers
@@ -83,7 +88,10 @@ _AUTH_JSON_PATH = get_hermes_home() / "auth.json"
# Codex fallback: uses the Responses API (the only endpoint the Codex
# OAuth token can access) with a fast model for auxiliary tasks.
_CODEX_AUX_MODEL = "gpt-5.3-codex"
# ChatGPT-backed Codex accounts currently reject gpt-5.3-codex for these
# auxiliary flows, while gpt-5.2-codex remains broadly available and supports
# vision via Responses.
_CODEX_AUX_MODEL = "gpt-5.2-codex"
_CODEX_AUX_BASE_URL = "https://chatgpt.com/backend-api/codex"
@@ -472,11 +480,11 @@ def _read_codex_access_token() -> Optional[str]:
def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
"""Try each API-key provider in PROVIDER_REGISTRY order.
Returns (client, model) for the first provider whose env var is set,
or (None, None) if none are configured.
Returns (client, model) for the first provider with usable runtime
credentials, or (None, None) if none are configured.
"""
try:
from hermes_cli.auth import PROVIDER_REGISTRY
from hermes_cli.auth import PROVIDER_REGISTRY, resolve_api_key_provider_credentials
except ImportError:
logger.debug("Could not import PROVIDER_REGISTRY for API-key fallback")
return None, None
@@ -484,34 +492,24 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
for provider_id, pconfig in PROVIDER_REGISTRY.items():
if pconfig.auth_type != "api_key":
continue
# Check if any of the provider's env vars are set
api_key = ""
for env_var in pconfig.api_key_env_vars:
val = os.getenv(env_var, "").strip()
if val:
api_key = val
break
if not api_key:
continue
if provider_id == "anthropic":
return _try_anthropic()
# Resolve base URL (with optional env-var override)
# Kimi Code keys (sk-kimi-) need api.kimi.com/coding/v1
env_url = ""
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
if env_url:
base_url = env_url.rstrip("/")
elif provider_id == "kimi-coding" and api_key.startswith("sk-kimi-"):
base_url = "https://api.kimi.com/coding/v1"
else:
base_url = pconfig.inference_base_url
creds = resolve_api_key_provider_credentials(provider_id)
api_key = str(creds.get("api_key", "")).strip()
if not api_key:
continue
base_url = str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id, "default")
logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
extra = {}
if "api.kimi.com" in base_url.lower():
extra["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
elif "api.githubcopilot.com" in base_url.lower():
from hermes_cli.models import copilot_default_headers
extra["default_headers"] = copilot_default_headers()
return OpenAI(api_key=api_key, base_url=base_url, **extra), model
return None, None
@@ -656,10 +654,23 @@ def _try_anthropic() -> Tuple[Optional[Any], Optional[str]]:
if not token:
return None, None
# Allow base URL override from config.yaml model.base_url
base_url = _ANTHROPIC_DEFAULT_BASE_URL
try:
from hermes_cli.config import load_config
cfg = load_config()
model_cfg = cfg.get("model")
if isinstance(model_cfg, dict):
cfg_base_url = (model_cfg.get("base_url") or "").strip().rstrip("/")
if cfg_base_url:
base_url = cfg_base_url
except Exception:
pass
model = _API_KEY_PROVIDER_AUX_MODELS.get("anthropic", "claude-haiku-4-5-20251001")
logger.debug("Auxiliary client: Anthropic native (%s)", model)
real_client = build_anthropic_client(token, _ANTHROPIC_DEFAULT_BASE_URL)
return AnthropicAuxiliaryClient(real_client, model, token, _ANTHROPIC_DEFAULT_BASE_URL), model
logger.debug("Auxiliary client: Anthropic native (%s) at %s", model, base_url)
real_client = build_anthropic_client(token, base_url)
return AnthropicAuxiliaryClient(real_client, model, token, base_url), model
def _resolve_forced_provider(forced: str) -> Tuple[Optional[OpenAI], Optional[str]]:
@@ -698,6 +709,8 @@ def _resolve_forced_provider(forced: str) -> Tuple[Optional[OpenAI], Optional[st
def _resolve_auto() -> Tuple[Optional[OpenAI], Optional[str]]:
"""Full auto-detection chain: OpenRouter → Nous → custom → Codex → API-key → None."""
global auxiliary_is_nous
auxiliary_is_nous = False # Reset — _try_nous() will set True if it wins
for try_fn in (_try_openrouter, _try_nous, _try_custom_endpoint,
_try_codex, _resolve_api_key_provider):
client, model = try_fn()
@@ -734,6 +747,10 @@ def _to_async_client(sync_client, model: str):
base_lower = str(sync_client.base_url).lower()
if "openrouter" in base_lower:
async_kwargs["default_headers"] = dict(_OR_HEADERS)
elif "api.githubcopilot.com" in base_lower:
from hermes_cli.models import copilot_default_headers
async_kwargs["default_headers"] = copilot_default_headers()
elif "api.kimi.com" in base_lower:
async_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
return AsyncOpenAI(**async_kwargs), model
@@ -875,7 +892,7 @@ def resolve_provider_client(
# ── API-key providers from PROVIDER_REGISTRY ─────────────────────
try:
from hermes_cli.auth import PROVIDER_REGISTRY, _resolve_kimi_base_url
from hermes_cli.auth import PROVIDER_REGISTRY, resolve_api_key_provider_credentials
except ImportError:
logger.debug("hermes_cli.auth not available for provider %s", provider)
return None, None
@@ -894,26 +911,18 @@ def resolve_provider_client(
final_model = model or default_model
return (_to_async_client(client, final_model) if async_mode else (client, final_model))
# Find the first configured API key
api_key = ""
for env_var in pconfig.api_key_env_vars:
api_key = os.getenv(env_var, "").strip()
if api_key:
break
creds = resolve_api_key_provider_credentials(provider)
api_key = str(creds.get("api_key", "")).strip()
if not api_key:
tried_sources = list(pconfig.api_key_env_vars)
if provider == "copilot":
tried_sources.append("gh auth token")
logger.warning("resolve_provider_client: provider %s has no API "
"key configured (tried: %s)",
provider, ", ".join(pconfig.api_key_env_vars))
provider, ", ".join(tried_sources))
return None, None
# Resolve base URL (env override → provider-specific logic → default)
base_url_override = os.getenv(pconfig.base_url_env_var, "").strip() if pconfig.base_url_env_var else ""
if provider == "kimi-coding":
base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, base_url_override)
elif base_url_override:
base_url = base_url_override
else:
base_url = pconfig.inference_base_url
base_url = str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
default_model = _API_KEY_PROVIDER_AUX_MODELS.get(provider, "")
final_model = model or default_model
@@ -922,6 +931,10 @@ def resolve_provider_client(
headers = {}
if "api.kimi.com" in base_url.lower():
headers["User-Agent"] = "KimiCLI/1.0"
elif "api.githubcopilot.com" in base_url.lower():
from hermes_cli.models import copilot_default_headers
headers.update(copilot_default_headers())
client = OpenAI(api_key=api_key, base_url=base_url,
**({"default_headers": headers} if headers else {}))
@@ -1164,6 +1177,7 @@ def auxiliary_max_tokens_param(value: int) -> dict:
# Client cache: (provider, async_mode, base_url, api_key) -> (client, default_model)
_client_cache: Dict[tuple, tuple] = {}
_client_cache_lock = threading.Lock()
def _get_cached_client(
@@ -1175,9 +1189,21 @@ def _get_cached_client(
) -> Tuple[Optional[Any], Optional[str]]:
"""Get or create a cached client for the given provider."""
cache_key = (provider, async_mode, base_url or "", api_key or "")
if cache_key in _client_cache:
cached_client, cached_default = _client_cache[cache_key]
return cached_client, model or cached_default
with _client_cache_lock:
if cache_key in _client_cache:
cached_client, cached_default, cached_loop = _client_cache[cache_key]
if async_mode:
# Async clients are bound to the event loop that created them.
# A cached async client whose loop has been closed will raise
# "Event loop is closed" when httpx tries to clean up its
# transport. Discard the stale client and create a fresh one.
if cached_loop is not None and cached_loop.is_closed():
del _client_cache[cache_key]
else:
return cached_client, model or cached_default
else:
return cached_client, model or cached_default
# Build outside the lock
client, default_model = resolve_provider_client(
provider,
model,
@@ -1186,7 +1212,20 @@ def _get_cached_client(
explicit_api_key=api_key,
)
if client is not None:
_client_cache[cache_key] = (client, default_model)
# For async clients, remember which loop they were created on so we
# can detect stale entries later.
bound_loop = None
if async_mode:
try:
import asyncio as _aio
bound_loop = _aio.get_event_loop()
except RuntimeError:
pass
with _client_cache_lock:
if cache_key not in _client_cache:
_client_cache[cache_key] = (client, default_model, bound_loop)
else:
client, default_model, _ = _client_cache[cache_key]
return client, model or default_model
@@ -1231,12 +1270,16 @@ def _resolve_task_provider_model(
cfg_base_url = str(task_config.get("base_url", "")).strip() or None
cfg_api_key = str(task_config.get("api_key", "")).strip() or None
# Backwards compat: compression section has its own keys
if task == "compression" and not cfg_provider:
# Backwards compat: compression section has its own keys.
# The auxiliary.compression defaults to provider="auto", so treat
# both None and "auto" as "not explicitly configured".
if task == "compression" and (not cfg_provider or cfg_provider == "auto"):
comp = config.get("compression", {}) if isinstance(config, dict) else {}
if isinstance(comp, dict):
cfg_provider = comp.get("summary_provider", "").strip() or None
cfg_model = cfg_model or comp.get("summary_model", "").strip() or None
_sbu = comp.get("summary_base_url") or ""
cfg_base_url = cfg_base_url or _sbu.strip() or None
env_model = _get_auxiliary_env_override(task, "MODEL") if task else None
resolved_model = model or env_model or cfg_model
+83 -22
View File
@@ -45,16 +45,25 @@ class ContextCompressor:
quiet_mode: bool = False,
summary_model_override: str = None,
base_url: str = "",
api_key: str = "",
config_context_length: int | None = None,
provider: str = "",
):
self.model = model
self.base_url = base_url
self.api_key = api_key
self.provider = provider
self.threshold_percent = threshold_percent
self.protect_first_n = protect_first_n
self.protect_last_n = protect_last_n
self.summary_target_tokens = summary_target_tokens
self.quiet_mode = quiet_mode
self.context_length = get_model_context_length(model, base_url=base_url)
self.context_length = get_model_context_length(
model, base_url=base_url, api_key=api_key,
config_context_length=config_context_length,
provider=provider,
)
self.threshold_tokens = int(self.context_length * threshold_percent)
self.compression_count = 0
self._context_probed = False # True after a step-down from context error
@@ -251,18 +260,24 @@ Write only the summary body. Do not include any preamble or prefix; the system w
"""Pull a compress-end boundary backward to avoid splitting a
tool_call / result group.
If the message just before ``idx`` is an assistant message with
tool_calls, those tool results will start at ``idx`` and would be
separated from their parent. Move backwards to include the whole
group in the summarised region.
If the boundary falls in the middle of a tool-result group (i.e.
there are consecutive tool messages before ``idx``), walk backward
past all of them to find the parent assistant message. If found,
move the boundary before the assistant so the entire
assistant + tool_results group is included in the summarised region
rather than being split (which causes silent data loss when
``_sanitize_tool_pairs`` removes the orphaned tail results).
"""
if idx <= 0 or idx >= len(messages):
return idx
prev = messages[idx - 1]
if prev.get("role") == "assistant" and prev.get("tool_calls"):
# The results for this assistant turn sit at idx..idx+k.
# Include the assistant message in the summarised region too.
idx -= 1
# Walk backward past consecutive tool results
check = idx - 1
while check >= 0 and messages[check].get("role") == "tool":
check -= 1
# If we landed on the parent assistant with tool_calls, pull the
# boundary before it so the whole group gets summarised together.
if check >= 0 and messages[check].get("role") == "assistant" and messages[check].get("tool_calls"):
idx = check
return idx
def compress(self, messages: List[Dict[str, Any]], current_tokens: int = None) -> List[Dict[str, Any]]:
@@ -275,7 +290,11 @@ Write only the summary body. Do not include any preamble or prefix; the system w
n_messages = len(messages)
if n_messages <= self.protect_first_n + self.protect_last_n + 1:
if not self.quiet_mode:
print(f"⚠️ Cannot compress: only {n_messages} messages (need > {self.protect_first_n + self.protect_last_n + 1})")
logger.warning(
"Cannot compress: only %d messages (need > %d)",
n_messages,
self.protect_first_n + self.protect_last_n + 1,
)
return messages
compress_start = self.protect_first_n
@@ -293,11 +312,23 @@ Write only the summary body. Do not include any preamble or prefix; the system w
display_tokens = current_tokens if current_tokens else self.last_prompt_tokens or estimate_messages_tokens_rough(messages)
if not self.quiet_mode:
print(f"\n📦 Context compression triggered ({display_tokens:,} tokens ≥ {self.threshold_tokens:,} threshold)")
print(f" 📊 Model context limit: {self.context_length:,} tokens ({self.threshold_percent*100:.0f}% = {self.threshold_tokens:,})")
if not self.quiet_mode:
print(f" 🗜️ Summarizing turns {compress_start+1}-{compress_end} ({len(turns_to_summarize)} turns)")
logger.info(
"Context compression triggered (%d tokens >= %d threshold)",
display_tokens,
self.threshold_tokens,
)
logger.info(
"Model context limit: %d tokens (%.0f%% = %d)",
self.context_length,
self.threshold_percent * 100,
self.threshold_tokens,
)
logger.info(
"Summarizing turns %d-%d (%d turns)",
compress_start + 1,
compress_end,
len(turns_to_summarize),
)
summary = self._generate_summary(turns_to_summarize)
@@ -311,16 +342,41 @@ Write only the summary body. Do not include any preamble or prefix; the system w
)
compressed.append(msg)
_merge_summary_into_tail = False
if summary:
last_head_role = messages[compress_start - 1].get("role", "user") if compress_start > 0 else "user"
summary_role = "user" if last_head_role in ("assistant", "tool") else "assistant"
compressed.append({"role": summary_role, "content": summary})
first_tail_role = messages[compress_end].get("role", "user") if compress_end < n_messages else "user"
# Pick a role that avoids consecutive same-role with both neighbors.
# Priority: avoid colliding with head (already committed), then tail.
if last_head_role in ("assistant", "tool"):
summary_role = "user"
else:
summary_role = "assistant"
# If the chosen role collides with the tail AND flipping wouldn't
# collide with the head, flip it.
if summary_role == first_tail_role:
flipped = "assistant" if summary_role == "user" else "user"
if flipped != last_head_role:
summary_role = flipped
else:
# Both roles would create consecutive same-role messages
# (e.g. head=assistant, tail=user — neither role works).
# Merge the summary into the first tail message instead
# of inserting a standalone message that breaks alternation.
_merge_summary_into_tail = True
if not _merge_summary_into_tail:
compressed.append({"role": summary_role, "content": summary})
else:
if not self.quiet_mode:
print(" ⚠️ No summary model available — middle turns dropped without summary")
logger.warning("No summary model available — middle turns dropped without summary")
for i in range(compress_end, n_messages):
compressed.append(messages[i].copy())
msg = messages[i].copy()
if _merge_summary_into_tail and i == compress_end:
original = msg.get("content") or ""
msg["content"] = summary + "\n\n" + original
_merge_summary_into_tail = False
compressed.append(msg)
self.compression_count += 1
@@ -329,7 +385,12 @@ Write only the summary body. Do not include any preamble or prefix; the system w
if not self.quiet_mode:
new_estimate = estimate_messages_tokens_rough(compressed)
saved_estimate = display_tokens - new_estimate
print(f" ✅ Compressed: {n_messages}{len(compressed)} messages (~{saved_estimate:,} tokens saved)")
print(f" 💡 Compression #{self.compression_count} complete")
logger.info(
"Compressed: %d -> %d messages (~%d tokens saved)",
n_messages,
len(compressed),
saved_estimate,
)
logger.info("Compression #%d complete", self.compression_count)
return compressed
+447
View File
@@ -0,0 +1,447 @@
"""OpenAI-compatible shim that forwards Hermes requests to `copilot --acp`.
This adapter lets Hermes treat the GitHub Copilot ACP server as a chat-style
backend. Each request starts a short-lived ACP session, sends the formatted
conversation as a single prompt, collects text chunks, and converts the result
back into the minimal shape Hermes expects from an OpenAI client.
"""
from __future__ import annotations
import json
import os
import queue
import shlex
import subprocess
import threading
import time
from collections import deque
from pathlib import Path
from types import SimpleNamespace
from typing import Any
ACP_MARKER_BASE_URL = "acp://copilot"
_DEFAULT_TIMEOUT_SECONDS = 900.0
def _resolve_command() -> str:
return (
os.getenv("HERMES_COPILOT_ACP_COMMAND", "").strip()
or os.getenv("COPILOT_CLI_PATH", "").strip()
or "copilot"
)
def _resolve_args() -> list[str]:
raw = os.getenv("HERMES_COPILOT_ACP_ARGS", "").strip()
if not raw:
return ["--acp", "--stdio"]
return shlex.split(raw)
def _jsonrpc_error(message_id: Any, code: int, message: str) -> dict[str, Any]:
return {
"jsonrpc": "2.0",
"id": message_id,
"error": {
"code": code,
"message": message,
},
}
def _format_messages_as_prompt(messages: list[dict[str, Any]], model: str | None = None) -> str:
sections: list[str] = [
"You are being used as the active ACP agent backend for Hermes.",
"Use your own ACP capabilities and respond directly in natural language.",
"Do not emit OpenAI tool-call JSON.",
]
if model:
sections.append(f"Hermes requested model hint: {model}")
transcript: list[str] = []
for message in messages:
if not isinstance(message, dict):
continue
role = str(message.get("role") or "unknown").strip().lower()
if role == "tool":
role = "tool"
elif role not in {"system", "user", "assistant"}:
role = "context"
content = message.get("content")
rendered = _render_message_content(content)
if not rendered:
continue
label = {
"system": "System",
"user": "User",
"assistant": "Assistant",
"tool": "Tool",
"context": "Context",
}.get(role, role.title())
transcript.append(f"{label}:\n{rendered}")
if transcript:
sections.append("Conversation transcript:\n\n" + "\n\n".join(transcript))
sections.append("Continue the conversation from the latest user request.")
return "\n\n".join(section.strip() for section in sections if section and section.strip())
def _render_message_content(content: Any) -> str:
if content is None:
return ""
if isinstance(content, str):
return content.strip()
if isinstance(content, dict):
if "text" in content:
return str(content.get("text") or "").strip()
if "content" in content and isinstance(content.get("content"), str):
return str(content.get("content") or "").strip()
return json.dumps(content, ensure_ascii=True)
if isinstance(content, list):
parts: list[str] = []
for item in content:
if isinstance(item, str):
parts.append(item)
elif isinstance(item, dict):
text = item.get("text")
if isinstance(text, str) and text.strip():
parts.append(text.strip())
return "\n".join(parts).strip()
return str(content).strip()
def _ensure_path_within_cwd(path_text: str, cwd: str) -> Path:
candidate = Path(path_text)
if not candidate.is_absolute():
raise PermissionError("ACP file-system paths must be absolute.")
resolved = candidate.resolve()
root = Path(cwd).resolve()
try:
resolved.relative_to(root)
except ValueError as exc:
raise PermissionError(f"Path '{resolved}' is outside the session cwd '{root}'.") from exc
return resolved
class _ACPChatCompletions:
def __init__(self, client: "CopilotACPClient"):
self._client = client
def create(self, **kwargs: Any) -> Any:
return self._client._create_chat_completion(**kwargs)
class _ACPChatNamespace:
def __init__(self, client: "CopilotACPClient"):
self.completions = _ACPChatCompletions(client)
class CopilotACPClient:
"""Minimal OpenAI-client-compatible facade for Copilot ACP."""
def __init__(
self,
*,
api_key: str | None = None,
base_url: str | None = None,
default_headers: dict[str, str] | None = None,
acp_command: str | None = None,
acp_args: list[str] | None = None,
acp_cwd: str | None = None,
command: str | None = None,
args: list[str] | None = None,
**_: Any,
):
self.api_key = api_key or "copilot-acp"
self.base_url = base_url or ACP_MARKER_BASE_URL
self._default_headers = dict(default_headers or {})
self._acp_command = acp_command or command or _resolve_command()
self._acp_args = list(acp_args or args or _resolve_args())
self._acp_cwd = str(Path(acp_cwd or os.getcwd()).resolve())
self.chat = _ACPChatNamespace(self)
self.is_closed = False
self._active_process: subprocess.Popen[str] | None = None
self._active_process_lock = threading.Lock()
def close(self) -> None:
proc: subprocess.Popen[str] | None
with self._active_process_lock:
proc = self._active_process
self._active_process = None
self.is_closed = True
if proc is None:
return
try:
proc.terminate()
proc.wait(timeout=2)
except Exception:
try:
proc.kill()
except Exception:
pass
def _create_chat_completion(
self,
*,
model: str | None = None,
messages: list[dict[str, Any]] | None = None,
timeout: float | None = None,
**_: Any,
) -> Any:
prompt_text = _format_messages_as_prompt(messages or [], model=model)
response_text, reasoning_text = self._run_prompt(
prompt_text,
timeout_seconds=float(timeout or _DEFAULT_TIMEOUT_SECONDS),
)
usage = SimpleNamespace(
prompt_tokens=0,
completion_tokens=0,
total_tokens=0,
prompt_tokens_details=SimpleNamespace(cached_tokens=0),
)
assistant_message = SimpleNamespace(
content=response_text,
tool_calls=[],
reasoning=reasoning_text or None,
reasoning_content=reasoning_text or None,
reasoning_details=None,
)
choice = SimpleNamespace(message=assistant_message, finish_reason="stop")
return SimpleNamespace(
choices=[choice],
usage=usage,
model=model or "copilot-acp",
)
def _run_prompt(self, prompt_text: str, *, timeout_seconds: float) -> tuple[str, str]:
try:
proc = subprocess.Popen(
[self._acp_command] + self._acp_args,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
bufsize=1,
cwd=self._acp_cwd,
)
except FileNotFoundError as exc:
raise RuntimeError(
f"Could not start Copilot ACP command '{self._acp_command}'. "
"Install GitHub Copilot CLI or set HERMES_COPILOT_ACP_COMMAND/COPILOT_CLI_PATH."
) from exc
if proc.stdin is None or proc.stdout is None:
proc.kill()
raise RuntimeError("Copilot ACP process did not expose stdin/stdout pipes.")
self.is_closed = False
with self._active_process_lock:
self._active_process = proc
inbox: queue.Queue[dict[str, Any]] = queue.Queue()
stderr_tail: deque[str] = deque(maxlen=40)
def _stdout_reader() -> None:
for line in proc.stdout:
try:
inbox.put(json.loads(line))
except Exception:
inbox.put({"raw": line.rstrip("\n")})
def _stderr_reader() -> None:
if proc.stderr is None:
return
for line in proc.stderr:
stderr_tail.append(line.rstrip("\n"))
out_thread = threading.Thread(target=_stdout_reader, daemon=True)
err_thread = threading.Thread(target=_stderr_reader, daemon=True)
out_thread.start()
err_thread.start()
next_id = 0
def _request(method: str, params: dict[str, Any], *, text_parts: list[str] | None = None, reasoning_parts: list[str] | None = None) -> Any:
nonlocal next_id
next_id += 1
request_id = next_id
payload = {
"jsonrpc": "2.0",
"id": request_id,
"method": method,
"params": params,
}
proc.stdin.write(json.dumps(payload) + "\n")
proc.stdin.flush()
deadline = time.time() + timeout_seconds
while time.time() < deadline:
if proc.poll() is not None:
break
try:
msg = inbox.get(timeout=0.1)
except queue.Empty:
continue
if self._handle_server_message(
msg,
process=proc,
cwd=self._acp_cwd,
text_parts=text_parts,
reasoning_parts=reasoning_parts,
):
continue
if msg.get("id") != request_id:
continue
if "error" in msg:
err = msg.get("error") or {}
raise RuntimeError(
f"Copilot ACP {method} failed: {err.get('message') or err}"
)
return msg.get("result")
stderr_text = "\n".join(stderr_tail).strip()
if proc.poll() is not None and stderr_text:
raise RuntimeError(f"Copilot ACP process exited early: {stderr_text}")
raise TimeoutError(f"Timed out waiting for Copilot ACP response to {method}.")
try:
_request(
"initialize",
{
"protocolVersion": 1,
"clientCapabilities": {
"fs": {
"readTextFile": True,
"writeTextFile": True,
}
},
"clientInfo": {
"name": "hermes-agent",
"title": "Hermes Agent",
"version": "0.0.0",
},
},
)
session = _request(
"session/new",
{
"cwd": self._acp_cwd,
"mcpServers": [],
},
) or {}
session_id = str(session.get("sessionId") or "").strip()
if not session_id:
raise RuntimeError("Copilot ACP did not return a sessionId.")
text_parts: list[str] = []
reasoning_parts: list[str] = []
_request(
"session/prompt",
{
"sessionId": session_id,
"prompt": [
{
"type": "text",
"text": prompt_text,
}
],
},
text_parts=text_parts,
reasoning_parts=reasoning_parts,
)
return "".join(text_parts), "".join(reasoning_parts)
finally:
self.close()
def _handle_server_message(
self,
msg: dict[str, Any],
*,
process: subprocess.Popen[str],
cwd: str,
text_parts: list[str] | None,
reasoning_parts: list[str] | None,
) -> bool:
method = msg.get("method")
if not isinstance(method, str):
return False
if method == "session/update":
params = msg.get("params") or {}
update = params.get("update") or {}
kind = str(update.get("sessionUpdate") or "").strip()
content = update.get("content") or {}
chunk_text = ""
if isinstance(content, dict):
chunk_text = str(content.get("text") or "")
if kind == "agent_message_chunk" and chunk_text and text_parts is not None:
text_parts.append(chunk_text)
elif kind == "agent_thought_chunk" and chunk_text and reasoning_parts is not None:
reasoning_parts.append(chunk_text)
return True
if process.stdin is None:
return True
message_id = msg.get("id")
params = msg.get("params") or {}
if method == "session/request_permission":
response = {
"jsonrpc": "2.0",
"id": message_id,
"result": {
"outcome": {
"outcome": "allow_once",
}
},
}
elif method == "fs/read_text_file":
try:
path = _ensure_path_within_cwd(str(params.get("path") or ""), cwd)
content = path.read_text() if path.exists() else ""
line = params.get("line")
limit = params.get("limit")
if isinstance(line, int) and line > 1:
lines = content.splitlines(keepends=True)
start = line - 1
end = start + limit if isinstance(limit, int) and limit > 0 else None
content = "".join(lines[start:end])
response = {
"jsonrpc": "2.0",
"id": message_id,
"result": {
"content": content,
},
}
except Exception as exc:
response = _jsonrpc_error(message_id, -32602, str(exc))
elif method == "fs/write_text_file":
try:
path = _ensure_path_within_cwd(str(params.get("path") or ""), cwd)
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(str(params.get("content") or ""))
response = {
"jsonrpc": "2.0",
"id": message_id,
"result": None,
}
except Exception as exc:
response = _jsonrpc_error(message_id, -32602, str(exc))
else:
response = _jsonrpc_error(
message_id,
-32601,
f"ACP client method '{method}' is not supported by Hermes yet.",
)
process.stdin.write(json.dumps(response) + "\n")
process.stdin.flush()
return True
+118
View File
@@ -59,6 +59,32 @@ def get_skin_tool_prefix() -> str:
return ""
def get_tool_emoji(tool_name: str, default: str = "") -> str:
"""Get the display emoji for a tool.
Resolution order:
1. Active skin's ``tool_emojis`` overrides (if a skin is loaded)
2. Tool registry's per-tool ``emoji`` field
3. *default* fallback
"""
# 1. Skin override
skin = _get_skin()
if skin and skin.tool_emojis:
override = skin.tool_emojis.get(tool_name)
if override:
return override
# 2. Registry default
try:
from tools.registry import registry
emoji = registry.get_emoji(tool_name, default="")
if emoji:
return emoji
except Exception:
pass
# 3. Hardcoded fallback
return default
# =========================================================================
# Tool preview (one-line summary of a tool call's primary argument)
# =========================================================================
@@ -586,3 +612,95 @@ def write_tty(text: str) -> None:
except OSError:
sys.stdout.write(text)
sys.stdout.flush()
# =========================================================================
# Context pressure display (CLI user-facing warnings)
# =========================================================================
# ANSI color codes for context pressure tiers
_CYAN = "\033[36m"
_YELLOW = "\033[33m"
_BOLD = "\033[1m"
_DIM_ANSI = "\033[2m"
# Bar characters
_BAR_FILLED = ""
_BAR_EMPTY = ""
_BAR_WIDTH = 20
def format_context_pressure(
compaction_progress: float,
threshold_tokens: int,
threshold_percent: float,
compression_enabled: bool = True,
) -> str:
"""Build a formatted context pressure line for CLI display.
The bar and percentage show progress toward the compaction threshold,
NOT the raw context window. 100% = compaction fires.
Uses ANSI colors:
- cyan at ~60% to compaction = informational
- bold yellow at ~85% to compaction = warning
Args:
compaction_progress: How close to compaction (0.01.0, 1.0 = fires).
threshold_tokens: Compaction threshold in tokens.
threshold_percent: Compaction threshold as a fraction of context window.
compression_enabled: Whether auto-compression is active.
"""
pct_int = int(compaction_progress * 100)
filled = min(int(compaction_progress * _BAR_WIDTH), _BAR_WIDTH)
bar = _BAR_FILLED * filled + _BAR_EMPTY * (_BAR_WIDTH - filled)
threshold_k = f"{threshold_tokens // 1000}k" if threshold_tokens >= 1000 else str(threshold_tokens)
threshold_pct_int = int(threshold_percent * 100)
# Tier styling
if compaction_progress >= 0.85:
color = f"{_BOLD}{_YELLOW}"
icon = ""
if compression_enabled:
hint = "compaction imminent"
else:
hint = "no auto-compaction"
else:
color = _CYAN
icon = ""
hint = "approaching compaction"
return (
f" {color}{icon} context {bar} {pct_int}% to compaction{_ANSI_RESET}"
f" {_DIM_ANSI}{threshold_k} threshold ({threshold_pct_int}%) · {hint}{_ANSI_RESET}"
)
def format_context_pressure_gateway(
compaction_progress: float,
threshold_percent: float,
compression_enabled: bool = True,
) -> str:
"""Build a plain-text context pressure notification for messaging platforms.
No ANSI — just Unicode and plain text suitable for Telegram/Discord/etc.
The percentage shows progress toward the compaction threshold.
"""
pct_int = int(compaction_progress * 100)
filled = min(int(compaction_progress * _BAR_WIDTH), _BAR_WIDTH)
bar = _BAR_FILLED * filled + _BAR_EMPTY * (_BAR_WIDTH - filled)
threshold_pct_int = int(threshold_percent * 100)
if compaction_progress >= 0.85:
icon = "⚠️"
if compression_enabled:
hint = f"Context compaction is imminent (threshold: {threshold_pct_int}% of window)."
else:
hint = "Auto-compaction is disabled — context may be truncated."
else:
icon = ""
hint = f"Compaction threshold is at {threshold_pct_int}% of context window."
return f"{icon} Context: {bar} {pct_int}% to compaction\n{hint}"
+106 -132
View File
@@ -20,65 +20,23 @@ import json
import time
from collections import Counter, defaultdict
from datetime import datetime
from typing import Any, Dict, List, Optional
from typing import Any, Dict, List
# =========================================================================
# Model pricing (USD per million tokens) — approximate as of early 2026
# =========================================================================
MODEL_PRICING = {
# OpenAI
"gpt-4o": {"input": 2.50, "output": 10.00},
"gpt-4o-mini": {"input": 0.15, "output": 0.60},
"gpt-4.1": {"input": 2.00, "output": 8.00},
"gpt-4.1-mini": {"input": 0.40, "output": 1.60},
"gpt-4.1-nano": {"input": 0.10, "output": 0.40},
"gpt-4.5-preview": {"input": 75.00, "output": 150.00},
"gpt-5": {"input": 10.00, "output": 30.00},
"gpt-5.4": {"input": 10.00, "output": 30.00},
"o3": {"input": 10.00, "output": 40.00},
"o3-mini": {"input": 1.10, "output": 4.40},
"o4-mini": {"input": 1.10, "output": 4.40},
# Anthropic
"claude-opus-4-20250514": {"input": 15.00, "output": 75.00},
"claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
"claude-3-5-sonnet-20241022": {"input": 3.00, "output": 15.00},
"claude-3-5-haiku-20241022": {"input": 0.80, "output": 4.00},
"claude-3-opus-20240229": {"input": 15.00, "output": 75.00},
"claude-3-haiku-20240307": {"input": 0.25, "output": 1.25},
# DeepSeek
"deepseek-chat": {"input": 0.14, "output": 0.28},
"deepseek-reasoner": {"input": 0.55, "output": 2.19},
# Google
"gemini-2.5-pro": {"input": 1.25, "output": 10.00},
"gemini-2.5-flash": {"input": 0.15, "output": 0.60},
"gemini-2.0-flash": {"input": 0.10, "output": 0.40},
# Meta (via providers)
"llama-4-maverick": {"input": 0.50, "output": 0.70},
"llama-4-scout": {"input": 0.20, "output": 0.30},
# Z.AI / GLM (direct provider — pricing not published externally, treat as local)
"glm-5": {"input": 0.0, "output": 0.0},
"glm-4.7": {"input": 0.0, "output": 0.0},
"glm-4.5": {"input": 0.0, "output": 0.0},
"glm-4.5-flash": {"input": 0.0, "output": 0.0},
# Kimi / Moonshot (direct provider — pricing not published externally, treat as local)
"kimi-k2.5": {"input": 0.0, "output": 0.0},
"kimi-k2-thinking": {"input": 0.0, "output": 0.0},
"kimi-k2-turbo-preview": {"input": 0.0, "output": 0.0},
"kimi-k2-0905-preview": {"input": 0.0, "output": 0.0},
# MiniMax (direct provider — pricing not published externally, treat as local)
"MiniMax-M2.5": {"input": 0.0, "output": 0.0},
"MiniMax-M2.5-highspeed": {"input": 0.0, "output": 0.0},
"MiniMax-M2.1": {"input": 0.0, "output": 0.0},
}
from agent.usage_pricing import (
CanonicalUsage,
DEFAULT_PRICING,
estimate_usage_cost,
format_duration_compact,
get_pricing,
has_known_pricing,
)
# Fallback: unknown/custom models get zero cost (we can't assume pricing
# for self-hosted models, custom OAI endpoints, local inference, etc.)
_DEFAULT_PRICING = {"input": 0.0, "output": 0.0}
_DEFAULT_PRICING = DEFAULT_PRICING
def _has_known_pricing(model_name: str) -> bool:
def _has_known_pricing(model_name: str, provider: str = None, base_url: str = None) -> bool:
"""Check if a model has known pricing (vs unknown/custom endpoint)."""
return _get_pricing(model_name) is not _DEFAULT_PRICING
return has_known_pricing(model_name, provider=provider, base_url=base_url)
def _get_pricing(model_name: str) -> Dict[str, float]:
@@ -87,67 +45,51 @@ def _get_pricing(model_name: str) -> Dict[str, float]:
Returns _DEFAULT_PRICING (zero cost) for unknown/custom models —
we can't assume costs for self-hosted endpoints, local inference, etc.
"""
if not model_name:
return _DEFAULT_PRICING
# Strip provider prefix (e.g., "anthropic/claude-..." -> "claude-...")
bare = model_name.split("/")[-1].lower()
# Exact match first
if bare in MODEL_PRICING:
return MODEL_PRICING[bare]
# Fuzzy prefix match — prefer the LONGEST matching key to avoid
# e.g. "gpt-4o" matching before "gpt-4o-mini" for "gpt-4o-mini-2024-07-18"
best_match = None
best_len = 0
for key, price in MODEL_PRICING.items():
if bare.startswith(key) and len(key) > best_len:
best_match = price
best_len = len(key)
if best_match:
return best_match
# Keyword heuristics (checked in most-specific-first order)
if "opus" in bare:
return {"input": 15.00, "output": 75.00}
if "sonnet" in bare:
return {"input": 3.00, "output": 15.00}
if "haiku" in bare:
return {"input": 0.80, "output": 4.00}
if "gpt-4o-mini" in bare:
return {"input": 0.15, "output": 0.60}
if "gpt-4o" in bare:
return {"input": 2.50, "output": 10.00}
if "gpt-5" in bare:
return {"input": 10.00, "output": 30.00}
if "deepseek" in bare:
return {"input": 0.14, "output": 0.28}
if "gemini" in bare:
return {"input": 0.15, "output": 0.60}
return _DEFAULT_PRICING
return get_pricing(model_name)
def _estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
"""Estimate the USD cost for a given model and token counts."""
pricing = _get_pricing(model)
return (input_tokens * pricing["input"] + output_tokens * pricing["output"]) / 1_000_000
def _estimate_cost(
session_or_model: Dict[str, Any] | str,
input_tokens: int = 0,
output_tokens: int = 0,
*,
cache_read_tokens: int = 0,
cache_write_tokens: int = 0,
provider: str = None,
base_url: str = None,
) -> tuple[float, str]:
"""Estimate the USD cost for a session row or a model/token tuple."""
if isinstance(session_or_model, dict):
session = session_or_model
model = session.get("model") or ""
usage = CanonicalUsage(
input_tokens=session.get("input_tokens") or 0,
output_tokens=session.get("output_tokens") or 0,
cache_read_tokens=session.get("cache_read_tokens") or 0,
cache_write_tokens=session.get("cache_write_tokens") or 0,
)
provider = session.get("billing_provider")
base_url = session.get("billing_base_url")
else:
model = session_or_model or ""
usage = CanonicalUsage(
input_tokens=input_tokens,
output_tokens=output_tokens,
cache_read_tokens=cache_read_tokens,
cache_write_tokens=cache_write_tokens,
)
result = estimate_usage_cost(
model,
usage,
provider=provider,
base_url=base_url,
)
return float(result.amount_usd or 0.0), result.status
def _format_duration(seconds: float) -> str:
"""Format seconds into a human-readable duration string."""
if seconds < 60:
return f"{seconds:.0f}s"
minutes = seconds / 60
if minutes < 60:
return f"{minutes:.0f}m"
hours = minutes / 60
if hours < 24:
remaining_min = int(minutes % 60)
return f"{int(hours)}h {remaining_min}m" if remaining_min else f"{int(hours)}h"
days = hours / 24
return f"{days:.1f}d"
return format_duration_compact(seconds)
def _bar_chart(values: List[int], max_width: int = 20) -> List[str]:
@@ -234,24 +176,30 @@ class InsightsEngine:
# Columns we actually need (skip system_prompt, model_config blobs)
_SESSION_COLS = ("id, source, model, started_at, ended_at, "
"message_count, tool_call_count, input_tokens, output_tokens")
"message_count, tool_call_count, input_tokens, output_tokens, "
"cache_read_tokens, cache_write_tokens, billing_provider, "
"billing_base_url, billing_mode, estimated_cost_usd, "
"actual_cost_usd, cost_status, cost_source")
# Pre-computed query strings — f-string evaluated once at class definition,
# not at runtime, so no user-controlled value can alter the query structure.
_GET_SESSIONS_WITH_SOURCE = (
f"SELECT {_SESSION_COLS} FROM sessions"
" WHERE started_at >= ? AND source = ?"
" ORDER BY started_at DESC"
)
_GET_SESSIONS_ALL = (
f"SELECT {_SESSION_COLS} FROM sessions"
" WHERE started_at >= ?"
" ORDER BY started_at DESC"
)
def _get_sessions(self, cutoff: float, source: str = None) -> List[Dict]:
"""Fetch sessions within the time window."""
if source:
cursor = self._conn.execute(
f"""SELECT {self._SESSION_COLS} FROM sessions
WHERE started_at >= ? AND source = ?
ORDER BY started_at DESC""",
(cutoff, source),
)
cursor = self._conn.execute(self._GET_SESSIONS_WITH_SOURCE, (cutoff, source))
else:
cursor = self._conn.execute(
f"""SELECT {self._SESSION_COLS} FROM sessions
WHERE started_at >= ?
ORDER BY started_at DESC""",
(cutoff,),
)
cursor = self._conn.execute(self._GET_SESSIONS_ALL, (cutoff,))
return [dict(row) for row in cursor.fetchall()]
def _get_tool_usage(self, cutoff: float, source: str = None) -> List[Dict]:
@@ -386,21 +334,30 @@ class InsightsEngine:
"""Compute high-level overview statistics."""
total_input = sum(s.get("input_tokens") or 0 for s in sessions)
total_output = sum(s.get("output_tokens") or 0 for s in sessions)
total_tokens = total_input + total_output
total_cache_read = sum(s.get("cache_read_tokens") or 0 for s in sessions)
total_cache_write = sum(s.get("cache_write_tokens") or 0 for s in sessions)
total_tokens = total_input + total_output + total_cache_read + total_cache_write
total_tool_calls = sum(s.get("tool_call_count") or 0 for s in sessions)
total_messages = sum(s.get("message_count") or 0 for s in sessions)
# Cost estimation (weighted by model)
total_cost = 0.0
actual_cost = 0.0
models_with_pricing = set()
models_without_pricing = set()
unknown_cost_sessions = 0
included_cost_sessions = 0
for s in sessions:
model = s.get("model") or ""
inp = s.get("input_tokens") or 0
out = s.get("output_tokens") or 0
total_cost += _estimate_cost(model, inp, out)
estimated, status = _estimate_cost(s)
total_cost += estimated
actual_cost += s.get("actual_cost_usd") or 0.0
display = model.split("/")[-1] if "/" in model else (model or "unknown")
if _has_known_pricing(model):
if status == "included":
included_cost_sessions += 1
elif status == "unknown":
unknown_cost_sessions += 1
if _has_known_pricing(model, s.get("billing_provider"), s.get("billing_base_url")):
models_with_pricing.add(display)
else:
models_without_pricing.add(display)
@@ -427,8 +384,11 @@ class InsightsEngine:
"total_tool_calls": total_tool_calls,
"total_input_tokens": total_input,
"total_output_tokens": total_output,
"total_cache_read_tokens": total_cache_read,
"total_cache_write_tokens": total_cache_write,
"total_tokens": total_tokens,
"estimated_cost": total_cost,
"actual_cost": actual_cost,
"total_hours": total_hours,
"avg_session_duration": avg_duration,
"avg_messages_per_session": total_messages / len(sessions) if sessions else 0,
@@ -440,12 +400,15 @@ class InsightsEngine:
"date_range_end": date_range_end,
"models_with_pricing": sorted(models_with_pricing),
"models_without_pricing": sorted(models_without_pricing),
"unknown_cost_sessions": unknown_cost_sessions,
"included_cost_sessions": included_cost_sessions,
}
def _compute_model_breakdown(self, sessions: List[Dict]) -> List[Dict]:
"""Break down usage by model."""
model_data = defaultdict(lambda: {
"sessions": 0, "input_tokens": 0, "output_tokens": 0,
"cache_read_tokens": 0, "cache_write_tokens": 0,
"total_tokens": 0, "tool_calls": 0, "cost": 0.0,
})
@@ -457,12 +420,18 @@ class InsightsEngine:
d["sessions"] += 1
inp = s.get("input_tokens") or 0
out = s.get("output_tokens") or 0
cache_read = s.get("cache_read_tokens") or 0
cache_write = s.get("cache_write_tokens") or 0
d["input_tokens"] += inp
d["output_tokens"] += out
d["total_tokens"] += inp + out
d["cache_read_tokens"] += cache_read
d["cache_write_tokens"] += cache_write
d["total_tokens"] += inp + out + cache_read + cache_write
d["tool_calls"] += s.get("tool_call_count") or 0
d["cost"] += _estimate_cost(model, inp, out)
d["has_pricing"] = _has_known_pricing(model)
estimate, status = _estimate_cost(s)
d["cost"] += estimate
d["has_pricing"] = _has_known_pricing(model, s.get("billing_provider"), s.get("billing_base_url"))
d["cost_status"] = status
result = [
{"model": model, **data}
@@ -476,7 +445,8 @@ class InsightsEngine:
"""Break down usage by platform/source."""
platform_data = defaultdict(lambda: {
"sessions": 0, "messages": 0, "input_tokens": 0,
"output_tokens": 0, "total_tokens": 0, "tool_calls": 0,
"output_tokens": 0, "cache_read_tokens": 0,
"cache_write_tokens": 0, "total_tokens": 0, "tool_calls": 0,
})
for s in sessions:
@@ -486,9 +456,13 @@ class InsightsEngine:
d["messages"] += s.get("message_count") or 0
inp = s.get("input_tokens") or 0
out = s.get("output_tokens") or 0
cache_read = s.get("cache_read_tokens") or 0
cache_write = s.get("cache_write_tokens") or 0
d["input_tokens"] += inp
d["output_tokens"] += out
d["total_tokens"] += inp + out
d["cache_read_tokens"] += cache_read
d["cache_write_tokens"] += cache_write
d["total_tokens"] += inp + out + cache_read + cache_write
d["tool_calls"] += s.get("tool_call_count") or 0
result = [
+677 -55
View File
@@ -10,6 +10,7 @@ import re
import time
from pathlib import Path
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse
import requests
import yaml
@@ -18,61 +19,322 @@ from hermes_constants import OPENROUTER_MODELS_URL
logger = logging.getLogger(__name__)
# Provider names that can appear as a "provider:" prefix before a model ID.
# Only these are stripped — Ollama-style "model:tag" colons (e.g. "qwen3.5:27b")
# are preserved so the full model name reaches cache lookups and server queries.
_PROVIDER_PREFIXES: frozenset[str] = frozenset({
"openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
"zai", "kimi-coding", "minimax", "minimax-cn", "anthropic", "deepseek",
"opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba",
"custom", "local",
# Common aliases
"glm", "z-ai", "z.ai", "zhipu", "github", "github-copilot",
"github-models", "kimi", "moonshot", "claude", "deep-seek",
"opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
})
_OLLAMA_TAG_PATTERN = re.compile(
r"^(\d+\.?\d*b|latest|stable|q\d|fp?\d|instruct|chat|coder|vision|text)",
re.IGNORECASE,
)
def _strip_provider_prefix(model: str) -> str:
"""Strip a recognised provider prefix from a model string.
``"local:my-model"`` → ``"my-model"``
``"qwen3.5:27b"`` → ``"qwen3.5:27b"`` (unchanged — not a provider prefix)
``"qwen:0.5b"`` → ``"qwen:0.5b"`` (unchanged — Ollama model:tag)
``"deepseek:latest"``→ ``"deepseek:latest"``(unchanged — Ollama model:tag)
"""
if ":" not in model or model.startswith("http"):
return model
prefix, suffix = model.split(":", 1)
prefix_lower = prefix.strip().lower()
if prefix_lower in _PROVIDER_PREFIXES:
# Don't strip if suffix looks like an Ollama tag (e.g. "7b", "latest", "q4_0")
if _OLLAMA_TAG_PATTERN.match(suffix.strip()):
return model
return suffix
return model
_model_metadata_cache: Dict[str, Dict[str, Any]] = {}
_model_metadata_cache_time: float = 0
_MODEL_CACHE_TTL = 3600
_endpoint_model_metadata_cache: Dict[str, Dict[str, Dict[str, Any]]] = {}
_endpoint_model_metadata_cache_time: Dict[str, float] = {}
_ENDPOINT_MODEL_CACHE_TTL = 300
# Descending tiers for context length probing when the model is unknown.
# We start high and step down on context-length errors until one works.
# We start at 128K (a safe default for most modern models) and step down
# on context-length errors until one works.
CONTEXT_PROBE_TIERS = [
2_000_000,
1_000_000,
512_000,
200_000,
128_000,
64_000,
32_000,
16_000,
8_000,
]
# Default context length when no detection method succeeds.
DEFAULT_FALLBACK_CONTEXT = CONTEXT_PROBE_TIERS[0]
# Thin fallback defaults — only broad model family patterns.
# These fire only when provider is unknown AND models.dev/OpenRouter/Anthropic
# all miss. Replaced the previous 80+ entry dict.
# For provider-specific context lengths, models.dev is the primary source.
DEFAULT_CONTEXT_LENGTHS = {
"anthropic/claude-opus-4": 200000,
"anthropic/claude-opus-4.5": 200000,
"anthropic/claude-opus-4.6": 200000,
"anthropic/claude-sonnet-4": 200000,
"anthropic/claude-sonnet-4-20250514": 200000,
"anthropic/claude-haiku-4.5": 200000,
# Bare Anthropic model IDs (for native API provider)
"claude-opus-4-6": 200000,
"claude-sonnet-4-6": 200000,
"claude-opus-4-5-20251101": 200000,
"claude-sonnet-4-5-20250929": 200000,
"claude-opus-4-1-20250805": 200000,
"claude-opus-4-20250514": 200000,
"claude-sonnet-4-20250514": 200000,
"claude-haiku-4-5-20251001": 200000,
"openai/gpt-4o": 128000,
"openai/gpt-4-turbo": 128000,
"openai/gpt-4o-mini": 128000,
"google/gemini-2.0-flash": 1048576,
"google/gemini-2.5-pro": 1048576,
"meta-llama/llama-3.3-70b-instruct": 131072,
"deepseek/deepseek-chat-v3": 65536,
"qwen/qwen-2.5-72b-instruct": 32768,
"glm-4.7": 202752,
"glm-5": 202752,
"glm-4.5": 131072,
"glm-4.5-flash": 131072,
"kimi-for-coding": 262144,
"kimi-k2.5": 262144,
"kimi-k2-thinking": 262144,
"kimi-k2-thinking-turbo": 262144,
"kimi-k2-turbo-preview": 262144,
"kimi-k2-0905-preview": 131072,
"MiniMax-M2.5": 204800,
"MiniMax-M2.5-highspeed": 204800,
"MiniMax-M2.1": 204800,
# Anthropic Claude 4.6 (1M context) — bare IDs only to avoid
# fuzzy-match collisions (e.g. "anthropic/claude-sonnet-4" is a
# substring of "anthropic/claude-sonnet-4.6").
# OpenRouter-prefixed models resolve via OpenRouter live API or models.dev.
"claude-opus-4-6": 1000000,
"claude-sonnet-4-6": 1000000,
"claude-opus-4.6": 1000000,
"claude-sonnet-4.6": 1000000,
# Catch-all for older Claude models (must sort after specific entries)
"claude": 200000,
# OpenAI
"gpt-4.1": 1047576,
"gpt-5": 128000,
"gpt-4": 128000,
# Google
"gemini": 1048576,
# DeepSeek
"deepseek": 128000,
# Meta
"llama": 131072,
# Qwen
"qwen": 131072,
# MiniMax
"minimax": 204800,
# GLM
"glm": 202752,
# Kimi
"kimi": 262144,
}
_CONTEXT_LENGTH_KEYS = (
"context_length",
"context_window",
"max_context_length",
"max_position_embeddings",
"max_model_len",
"max_input_tokens",
"max_sequence_length",
"max_seq_len",
"n_ctx_train",
"n_ctx",
)
_MAX_COMPLETION_KEYS = (
"max_completion_tokens",
"max_output_tokens",
"max_tokens",
)
# Local server hostnames / address patterns
_LOCAL_HOSTS = ("localhost", "127.0.0.1", "::1", "0.0.0.0")
def _normalize_base_url(base_url: str) -> str:
return (base_url or "").strip().rstrip("/")
def _is_openrouter_base_url(base_url: str) -> bool:
return "openrouter.ai" in _normalize_base_url(base_url).lower()
def _is_custom_endpoint(base_url: str) -> bool:
normalized = _normalize_base_url(base_url)
return bool(normalized) and not _is_openrouter_base_url(normalized)
def _is_known_provider_base_url(base_url: str) -> bool:
normalized = _normalize_base_url(base_url)
if not normalized:
return False
parsed = urlparse(normalized if "://" in normalized else f"https://{normalized}")
host = parsed.netloc.lower() or parsed.path.lower()
known_hosts = (
"api.openai.com",
"chatgpt.com",
"api.anthropic.com",
"api.z.ai",
"api.moonshot.ai",
"api.kimi.com",
"api.minimax",
)
return any(known_host in host for known_host in known_hosts)
def is_local_endpoint(base_url: str) -> bool:
"""Return True if base_url points to a local machine (localhost / RFC-1918 / WSL)."""
normalized = _normalize_base_url(base_url)
if not normalized:
return False
url = normalized if "://" in normalized else f"http://{normalized}"
try:
parsed = urlparse(url)
host = parsed.hostname or ""
except Exception:
return False
if host in _LOCAL_HOSTS:
return True
# RFC-1918 private ranges and link-local
import ipaddress
try:
addr = ipaddress.ip_address(host)
return addr.is_private or addr.is_loopback or addr.is_link_local
except ValueError:
pass
# Bare IP that looks like a private range (e.g. 172.26.x.x for WSL)
parts = host.split(".")
if len(parts) == 4:
try:
first, second = int(parts[0]), int(parts[1])
if first == 10:
return True
if first == 172 and 16 <= second <= 31:
return True
if first == 192 and second == 168:
return True
except ValueError:
pass
return False
def detect_local_server_type(base_url: str) -> Optional[str]:
"""Detect which local server is running at base_url by probing known endpoints.
Returns one of: "ollama", "lm-studio", "vllm", "llamacpp", or None.
"""
import httpx
normalized = _normalize_base_url(base_url)
server_url = normalized
if server_url.endswith("/v1"):
server_url = server_url[:-3]
try:
with httpx.Client(timeout=2.0) as client:
# LM Studio exposes /api/v1/models — check first (most specific)
try:
r = client.get(f"{server_url}/api/v1/models")
if r.status_code == 200:
return "lm-studio"
except Exception:
pass
# Ollama exposes /api/tags and responds with {"models": [...]}
# LM Studio returns {"error": "Unexpected endpoint"} with status 200
# on this path, so we must verify the response contains "models".
try:
r = client.get(f"{server_url}/api/tags")
if r.status_code == 200:
try:
data = r.json()
if "models" in data:
return "ollama"
except Exception:
pass
except Exception:
pass
# llama.cpp exposes /props
try:
r = client.get(f"{server_url}/props")
if r.status_code == 200 and "default_generation_settings" in r.text:
return "llamacpp"
except Exception:
pass
# vLLM: /version
try:
r = client.get(f"{server_url}/version")
if r.status_code == 200:
data = r.json()
if "version" in data:
return "vllm"
except Exception:
pass
except Exception:
pass
return None
def _iter_nested_dicts(value: Any):
if isinstance(value, dict):
yield value
for nested in value.values():
yield from _iter_nested_dicts(nested)
elif isinstance(value, list):
for item in value:
yield from _iter_nested_dicts(item)
def _coerce_reasonable_int(value: Any, minimum: int = 1024, maximum: int = 10_000_000) -> Optional[int]:
try:
if isinstance(value, bool):
return None
if isinstance(value, str):
value = value.strip().replace(",", "")
result = int(value)
except (TypeError, ValueError):
return None
if minimum <= result <= maximum:
return result
return None
def _extract_first_int(payload: Dict[str, Any], keys: tuple[str, ...]) -> Optional[int]:
keyset = {key.lower() for key in keys}
for mapping in _iter_nested_dicts(payload):
for key, value in mapping.items():
if str(key).lower() not in keyset:
continue
coerced = _coerce_reasonable_int(value)
if coerced is not None:
return coerced
return None
def _extract_context_length(payload: Dict[str, Any]) -> Optional[int]:
return _extract_first_int(payload, _CONTEXT_LENGTH_KEYS)
def _extract_max_completion_tokens(payload: Dict[str, Any]) -> Optional[int]:
return _extract_first_int(payload, _MAX_COMPLETION_KEYS)
def _extract_pricing(payload: Dict[str, Any]) -> Dict[str, Any]:
alias_map = {
"prompt": ("prompt", "input", "input_cost_per_token", "prompt_token_cost"),
"completion": ("completion", "output", "output_cost_per_token", "completion_token_cost"),
"request": ("request", "request_cost"),
"cache_read": ("cache_read", "cached_prompt", "input_cache_read", "cache_read_cost_per_token"),
"cache_write": ("cache_write", "cache_creation", "input_cache_write", "cache_write_cost_per_token"),
}
for mapping in _iter_nested_dicts(payload):
normalized = {str(key).lower(): value for key, value in mapping.items()}
if not any(any(alias in normalized for alias in aliases) for aliases in alias_map.values()):
continue
pricing: Dict[str, Any] = {}
for target, aliases in alias_map.items():
for alias in aliases:
if alias in normalized and normalized[alias] not in (None, ""):
pricing[target] = normalized[alias]
break
if pricing:
return pricing
return {}
def _add_model_aliases(cache: Dict[str, Dict[str, Any]], model_id: str, entry: Dict[str, Any]) -> None:
cache[model_id] = entry
if "/" in model_id:
bare_model = model_id.split("/", 1)[1]
cache.setdefault(bare_model, entry)
def fetch_model_metadata(force_refresh: bool = False) -> Dict[str, Dict[str, Any]]:
"""Fetch model metadata from OpenRouter (cached for 1 hour)."""
@@ -89,15 +351,16 @@ def fetch_model_metadata(force_refresh: bool = False) -> Dict[str, Dict[str, Any
cache = {}
for model in data.get("data", []):
model_id = model.get("id", "")
cache[model_id] = {
entry = {
"context_length": model.get("context_length", 128000),
"max_completion_tokens": model.get("top_provider", {}).get("max_completion_tokens", 4096),
"name": model.get("name", model_id),
"pricing": model.get("pricing", {}),
}
_add_model_aliases(cache, model_id, entry)
canonical = model.get("canonical_slug", "")
if canonical and canonical != model_id:
cache[canonical] = cache[model_id]
_add_model_aliases(cache, canonical, entry)
_model_metadata_cache = cache
_model_metadata_cache_time = time.time()
@@ -109,6 +372,94 @@ def fetch_model_metadata(force_refresh: bool = False) -> Dict[str, Dict[str, Any
return _model_metadata_cache or {}
def fetch_endpoint_model_metadata(
base_url: str,
api_key: str = "",
force_refresh: bool = False,
) -> Dict[str, Dict[str, Any]]:
"""Fetch model metadata from an OpenAI-compatible ``/models`` endpoint.
This is used for explicit custom endpoints where hardcoded global model-name
defaults are unreliable. Results are cached in memory per base URL.
"""
normalized = _normalize_base_url(base_url)
if not normalized or _is_openrouter_base_url(normalized):
return {}
if not force_refresh:
cached = _endpoint_model_metadata_cache.get(normalized)
cached_at = _endpoint_model_metadata_cache_time.get(normalized, 0)
if cached is not None and (time.time() - cached_at) < _ENDPOINT_MODEL_CACHE_TTL:
return cached
candidates = [normalized]
if normalized.endswith("/v1"):
alternate = normalized[:-3].rstrip("/")
else:
alternate = normalized + "/v1"
if alternate and alternate not in candidates:
candidates.append(alternate)
headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
last_error: Optional[Exception] = None
for candidate in candidates:
url = candidate.rstrip("/") + "/models"
try:
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
payload = response.json()
cache: Dict[str, Dict[str, Any]] = {}
for model in payload.get("data", []):
if not isinstance(model, dict):
continue
model_id = model.get("id")
if not model_id:
continue
entry: Dict[str, Any] = {"name": model.get("name", model_id)}
context_length = _extract_context_length(model)
if context_length is not None:
entry["context_length"] = context_length
max_completion_tokens = _extract_max_completion_tokens(model)
if max_completion_tokens is not None:
entry["max_completion_tokens"] = max_completion_tokens
pricing = _extract_pricing(model)
if pricing:
entry["pricing"] = pricing
_add_model_aliases(cache, model_id, entry)
# If this is a llama.cpp server, query /props for actual allocated context
is_llamacpp = any(
m.get("owned_by") == "llamacpp"
for m in payload.get("data", []) if isinstance(m, dict)
)
if is_llamacpp:
try:
props_url = candidate.rstrip("/").replace("/v1", "") + "/props"
props_resp = requests.get(props_url, headers=headers, timeout=5)
if props_resp.ok:
props = props_resp.json()
gen_settings = props.get("default_generation_settings", {})
n_ctx = gen_settings.get("n_ctx")
model_alias = props.get("model_alias", "")
if n_ctx and model_alias and model_alias in cache:
cache[model_alias]["context_length"] = n_ctx
except Exception:
pass
_endpoint_model_metadata_cache[normalized] = cache
_endpoint_model_metadata_cache_time[normalized] = time.time()
return cache
except Exception as exc:
last_error = exc
if last_error:
logger.debug("Failed to fetch model metadata from %s/models: %s", normalized, last_error)
_endpoint_model_metadata_cache[normalized] = {}
_endpoint_model_metadata_cache_time[normalized] = time.time()
return {}
def _get_context_cache_path() -> Path:
"""Return path to the persistent context length cache file."""
hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
@@ -116,7 +467,7 @@ def _get_context_cache_path() -> Path:
def _load_context_cache() -> Dict[str, int]:
"""Load the model+provider context_length cache from disk."""
"""Load the model+provider -> context_length cache from disk."""
path = _get_context_cache_path()
if not path.exists():
return {}
@@ -145,7 +496,7 @@ def save_context_length(model: str, base_url: str, length: int) -> None:
path.parent.mkdir(parents=True, exist_ok=True)
with open(path, "w") as f:
yaml.dump({"context_lengths": cache}, f, default_flow_style=False)
logger.info("Cached context length %s %s tokens", key, f"{length:,}")
logger.info("Cached context length %s -> %s tokens", key, f"{length:,}")
except Exception as e:
logger.debug("Failed to save context length cache: %s", e)
@@ -193,33 +544,304 @@ def parse_context_limit_from_error(error_msg: str) -> Optional[int]:
return None
def get_model_context_length(model: str, base_url: str = "") -> int:
def _model_id_matches(candidate_id: str, lookup_model: str) -> bool:
"""Return True if *candidate_id* (from server) matches *lookup_model* (configured).
Supports two forms:
- Exact match: "nvidia-nemotron-super-49b-v1" == "nvidia-nemotron-super-49b-v1"
- Slug match: "nvidia/nvidia-nemotron-super-49b-v1" matches "nvidia-nemotron-super-49b-v1"
(the part after the last "/" equals lookup_model)
This covers LM Studio's native API which stores models as "publisher/slug"
while users typically configure only the slug after the "local:" prefix.
"""
if candidate_id == lookup_model:
return True
# Slug match: basename of candidate equals the lookup name
if "/" in candidate_id and candidate_id.rsplit("/", 1)[1] == lookup_model:
return True
return False
def _query_local_context_length(model: str, base_url: str) -> Optional[int]:
"""Query a local server for the model's context length."""
import httpx
# Strip recognised provider prefix (e.g., "local:model-name" → "model-name").
# Ollama "model:tag" colons (e.g. "qwen3.5:27b") are intentionally preserved.
model = _strip_provider_prefix(model)
# Strip /v1 suffix to get the server root
server_url = base_url.rstrip("/")
if server_url.endswith("/v1"):
server_url = server_url[:-3]
try:
server_type = detect_local_server_type(base_url)
except Exception:
server_type = None
try:
with httpx.Client(timeout=3.0) as client:
# Ollama: /api/show returns model details with context info
if server_type == "ollama":
resp = client.post(f"{server_url}/api/show", json={"name": model})
if resp.status_code == 200:
data = resp.json()
# Check model_info for context length
model_info = data.get("model_info", {})
for key, value in model_info.items():
if "context_length" in key and isinstance(value, (int, float)):
return int(value)
# Check parameters string for num_ctx
params = data.get("parameters", "")
if "num_ctx" in params:
for line in params.split("\n"):
if "num_ctx" in line:
parts = line.strip().split()
if len(parts) >= 2:
try:
return int(parts[-1])
except ValueError:
pass
# LM Studio native API: /api/v1/models returns max_context_length.
# This is more reliable than the OpenAI-compat /v1/models which
# doesn't include context window information for LM Studio servers.
# Use _model_id_matches for fuzzy matching: LM Studio stores models as
# "publisher/slug" but users configure only "slug" after "local:" prefix.
if server_type == "lm-studio":
resp = client.get(f"{server_url}/api/v1/models")
if resp.status_code == 200:
data = resp.json()
for m in data.get("models", []):
if _model_id_matches(m.get("key", ""), model) or _model_id_matches(m.get("id", ""), model):
# Prefer loaded instance context (actual runtime value)
for inst in m.get("loaded_instances", []):
cfg = inst.get("config", {})
ctx = cfg.get("context_length")
if ctx and isinstance(ctx, (int, float)):
return int(ctx)
# Fall back to max_context_length (theoretical model max)
ctx = m.get("max_context_length") or m.get("context_length")
if ctx and isinstance(ctx, (int, float)):
return int(ctx)
# LM Studio / vLLM / llama.cpp: try /v1/models/{model}
resp = client.get(f"{server_url}/v1/models/{model}")
if resp.status_code == 200:
data = resp.json()
# vLLM returns max_model_len
ctx = data.get("max_model_len") or data.get("context_length") or data.get("max_tokens")
if ctx and isinstance(ctx, (int, float)):
return int(ctx)
# Try /v1/models and find the model in the list.
# Use _model_id_matches to handle "publisher/slug" vs bare "slug".
resp = client.get(f"{server_url}/v1/models")
if resp.status_code == 200:
data = resp.json()
models_list = data.get("data", [])
for m in models_list:
if _model_id_matches(m.get("id", ""), model):
ctx = m.get("max_model_len") or m.get("context_length") or m.get("max_tokens")
if ctx and isinstance(ctx, (int, float)):
return int(ctx)
except Exception:
pass
return None
def _normalize_model_version(model: str) -> str:
"""Normalize version separators for matching.
Nous uses dashes: claude-opus-4-6, claude-sonnet-4-5
OpenRouter uses dots: claude-opus-4.6, claude-sonnet-4.5
Normalize both to dashes for comparison.
"""
return model.replace(".", "-")
def _query_anthropic_context_length(model: str, base_url: str, api_key: str) -> Optional[int]:
"""Query Anthropic's /v1/models endpoint for context length.
Only works with regular ANTHROPIC_API_KEY (sk-ant-api*).
OAuth tokens (sk-ant-oat*) from Claude Code return 401.
"""
if not api_key or api_key.startswith("sk-ant-oat"):
return None # OAuth tokens can't access /v1/models
try:
base = base_url.rstrip("/")
if base.endswith("/v1"):
base = base[:-3]
url = f"{base}/v1/models?limit=1000"
headers = {
"x-api-key": api_key,
"anthropic-version": "2023-06-01",
}
resp = requests.get(url, headers=headers, timeout=10)
if resp.status_code != 200:
return None
data = resp.json()
for m in data.get("data", []):
if m.get("id") == model:
ctx = m.get("max_input_tokens")
if isinstance(ctx, int) and ctx > 0:
return ctx
except Exception as e:
logger.debug("Anthropic /v1/models query failed: %s", e)
return None
def _resolve_nous_context_length(model: str) -> Optional[int]:
"""Resolve Nous Portal model context length via OpenRouter metadata.
Nous model IDs are bare (e.g. 'claude-opus-4-6') while OpenRouter uses
prefixed IDs (e.g. 'anthropic/claude-opus-4.6'). Try suffix matching
with version normalization (dot↔dash).
"""
metadata = fetch_model_metadata() # OpenRouter cache
# Exact match first
if model in metadata:
return metadata[model].get("context_length")
normalized = _normalize_model_version(model).lower()
for or_id, entry in metadata.items():
bare = or_id.split("/", 1)[1] if "/" in or_id else or_id
if bare.lower() == model.lower() or _normalize_model_version(bare).lower() == normalized:
return entry.get("context_length")
# Partial prefix match for cases like gemini-3-flash → gemini-3-flash-preview
# Require match to be at a word boundary (followed by -, :, or end of string)
model_lower = model.lower()
for or_id, entry in metadata.items():
bare = or_id.split("/", 1)[1] if "/" in or_id else or_id
for candidate, query in [(bare.lower(), model_lower), (_normalize_model_version(bare).lower(), normalized)]:
if candidate.startswith(query) and (
len(candidate) == len(query) or candidate[len(query)] in "-:."
):
return entry.get("context_length")
return None
def get_model_context_length(
model: str,
base_url: str = "",
api_key: str = "",
config_context_length: int | None = None,
provider: str = "",
) -> int:
"""Get the context length for a model.
Resolution order:
0. Explicit config override (model.context_length or custom_providers per-model)
1. Persistent cache (previously discovered via probing)
2. OpenRouter API metadata
3. Hardcoded DEFAULT_CONTEXT_LENGTHS (fuzzy match)
4. First probe tier (2M) — will be narrowed on first context error
2. Active endpoint metadata (/models for explicit custom endpoints)
3. Local server query (for local endpoints)
4. Anthropic /v1/models API (API-key users only, not OAuth)
5. OpenRouter live API metadata
6. Nous suffix-match via OpenRouter cache
7. models.dev registry lookup (provider-aware)
8. Thin hardcoded defaults (broad family patterns)
9. Default fallback (128K)
"""
# 0. Explicit config override — user knows best
if config_context_length is not None and isinstance(config_context_length, int) and config_context_length > 0:
return config_context_length
# Normalise provider-prefixed model names (e.g. "local:model-name" →
# "model-name") so cache lookups and server queries use the bare ID that
# local servers actually know about. Ollama "model:tag" colons are preserved.
model = _strip_provider_prefix(model)
# 1. Check persistent cache (model+provider)
if base_url:
cached = get_cached_context_length(model, base_url)
if cached is not None:
return cached
# 2. OpenRouter API metadata
# 2. Active endpoint metadata for explicit custom routes
if _is_custom_endpoint(base_url):
endpoint_metadata = fetch_endpoint_model_metadata(base_url, api_key=api_key)
matched = endpoint_metadata.get(model)
if not matched:
# Single-model servers: if only one model is loaded, use it
if len(endpoint_metadata) == 1:
matched = next(iter(endpoint_metadata.values()))
else:
# Fuzzy match: substring in either direction
for key, entry in endpoint_metadata.items():
if model in key or key in model:
matched = entry
break
if matched:
context_length = matched.get("context_length")
if isinstance(context_length, int):
return context_length
if not _is_known_provider_base_url(base_url):
# 3. Try querying local server directly
if is_local_endpoint(base_url):
local_ctx = _query_local_context_length(model, base_url)
if local_ctx and local_ctx > 0:
save_context_length(model, base_url, local_ctx)
return local_ctx
logger.info(
"Could not detect context length for model %r at %s"
"defaulting to %s tokens (probe-down). Set model.context_length "
"in config.yaml to override.",
model, base_url, f"{DEFAULT_FALLBACK_CONTEXT:,}",
)
return DEFAULT_FALLBACK_CONTEXT
# 4. Anthropic /v1/models API (only for regular API keys, not OAuth)
if provider == "anthropic" or (
base_url and "api.anthropic.com" in base_url
):
ctx = _query_anthropic_context_length(model, base_url or "https://api.anthropic.com", api_key)
if ctx:
return ctx
# 5. Provider-aware lookups (before generic OpenRouter cache)
# These are provider-specific and take priority over the generic OR cache,
# since the same model can have different context limits per provider
# (e.g. claude-opus-4.6 is 1M on Anthropic but 128K on GitHub Copilot).
if provider == "nous":
ctx = _resolve_nous_context_length(model)
if ctx:
return ctx
if provider:
from agent.models_dev import lookup_models_dev_context
ctx = lookup_models_dev_context(provider, model)
if ctx:
return ctx
# 6. OpenRouter live API metadata (provider-unaware fallback)
metadata = fetch_model_metadata()
if model in metadata:
return metadata[model].get("context_length", 128000)
# 3. Hardcoded defaults (fuzzy match)
for default_model, length in DEFAULT_CONTEXT_LENGTHS.items():
if default_model in model or model in default_model:
# 8. Hardcoded defaults (fuzzy match — longest key first for specificity)
# Only check `default_model in model` (is the key a substring of the input).
# The reverse (`model in default_model`) causes shorter names like
# "claude-sonnet-4" to incorrectly match "claude-sonnet-4-6" and return 1M.
for default_model, length in sorted(
DEFAULT_CONTEXT_LENGTHS.items(), key=lambda x: len(x[0]), reverse=True
):
if default_model in model:
return length
# 4. Unknown model — start at highest probe tier
return CONTEXT_PROBE_TIERS[0]
# 9. Query local server as last resort
if base_url and is_local_endpoint(base_url):
local_ctx = _query_local_context_length(model, base_url)
if local_ctx and local_ctx > 0:
save_context_length(model, base_url, local_ctx)
return local_ctx
# 10. Default fallback — 128K
return DEFAULT_FALLBACK_CONTEXT
def estimate_tokens_rough(text: str) -> int:
+171
View File
@@ -0,0 +1,171 @@
"""Models.dev registry integration for provider-aware context length detection.
Fetches model metadata from https://models.dev/api.json — a community-maintained
database of 3800+ models across 100+ providers, including per-provider context
windows, pricing, and capabilities.
Data is cached in memory (1hr TTL) and on disk (~/.hermes/models_dev_cache.json)
to avoid cold-start network latency.
"""
import json
import logging
import os
import time
from pathlib import Path
from typing import Any, Dict, Optional
import requests
logger = logging.getLogger(__name__)
MODELS_DEV_URL = "https://models.dev/api.json"
_MODELS_DEV_CACHE_TTL = 3600 # 1 hour in-memory
# In-memory cache
_models_dev_cache: Dict[str, Any] = {}
_models_dev_cache_time: float = 0
# Provider ID mapping: Hermes provider names → models.dev provider IDs
PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
"openrouter": "openrouter",
"anthropic": "anthropic",
"zai": "zai",
"kimi-coding": "kimi-for-coding",
"minimax": "minimax",
"minimax-cn": "minimax-cn",
"deepseek": "deepseek",
"alibaba": "alibaba",
"copilot": "github-copilot",
"ai-gateway": "vercel",
"opencode-zen": "opencode",
"opencode-go": "opencode-go",
"kilocode": "kilo",
}
def _get_cache_path() -> Path:
"""Return path to disk cache file."""
env_val = os.environ.get("HERMES_HOME", "")
hermes_home = Path(env_val) if env_val else Path.home() / ".hermes"
return hermes_home / "models_dev_cache.json"
def _load_disk_cache() -> Dict[str, Any]:
"""Load models.dev data from disk cache."""
try:
cache_path = _get_cache_path()
if cache_path.exists():
with open(cache_path, encoding="utf-8") as f:
return json.load(f)
except Exception as e:
logger.debug("Failed to load models.dev disk cache: %s", e)
return {}
def _save_disk_cache(data: Dict[str, Any]) -> None:
"""Save models.dev data to disk cache."""
try:
cache_path = _get_cache_path()
cache_path.parent.mkdir(parents=True, exist_ok=True)
with open(cache_path, "w", encoding="utf-8") as f:
json.dump(data, f, separators=(",", ":"))
except Exception as e:
logger.debug("Failed to save models.dev disk cache: %s", e)
def fetch_models_dev(force_refresh: bool = False) -> Dict[str, Any]:
"""Fetch models.dev registry. In-memory cache (1hr) + disk fallback.
Returns the full registry dict keyed by provider ID, or empty dict on failure.
"""
global _models_dev_cache, _models_dev_cache_time
# Check in-memory cache
if (
not force_refresh
and _models_dev_cache
and (time.time() - _models_dev_cache_time) < _MODELS_DEV_CACHE_TTL
):
return _models_dev_cache
# Try network fetch
try:
response = requests.get(MODELS_DEV_URL, timeout=15)
response.raise_for_status()
data = response.json()
if isinstance(data, dict) and len(data) > 0:
_models_dev_cache = data
_models_dev_cache_time = time.time()
_save_disk_cache(data)
logger.debug(
"Fetched models.dev registry: %d providers, %d total models",
len(data),
sum(len(p.get("models", {})) for p in data.values() if isinstance(p, dict)),
)
return data
except Exception as e:
logger.debug("Failed to fetch models.dev: %s", e)
# Fall back to disk cache — use a short TTL (5 min) so we retry
# the network fetch soon instead of serving stale data for a full hour.
if not _models_dev_cache:
_models_dev_cache = _load_disk_cache()
if _models_dev_cache:
_models_dev_cache_time = time.time() - _MODELS_DEV_CACHE_TTL + 300
logger.debug("Loaded models.dev from disk cache (%d providers)", len(_models_dev_cache))
return _models_dev_cache
def lookup_models_dev_context(provider: str, model: str) -> Optional[int]:
"""Look up context_length for a provider+model combo in models.dev.
Returns the context window in tokens, or None if not found.
Handles case-insensitive matching and filters out context=0 entries.
"""
mdev_provider_id = PROVIDER_TO_MODELS_DEV.get(provider)
if not mdev_provider_id:
return None
data = fetch_models_dev()
provider_data = data.get(mdev_provider_id)
if not isinstance(provider_data, dict):
return None
models = provider_data.get("models", {})
if not isinstance(models, dict):
return None
# Exact match
entry = models.get(model)
if entry:
ctx = _extract_context(entry)
if ctx:
return ctx
# Case-insensitive match
model_lower = model.lower()
for mid, mdata in models.items():
if mid.lower() == model_lower:
ctx = _extract_context(mdata)
if ctx:
return ctx
return None
def _extract_context(entry: Dict[str, Any]) -> Optional[int]:
"""Extract context_length from a models.dev model entry.
Returns None for invalid/zero values (some audio/image models have context=0).
"""
if not isinstance(entry, dict):
return None
limit = entry.get("limit")
if not isinstance(limit, dict):
return None
ctx = limit.get("context")
if isinstance(ctx, (int, float)) and ctx > 0:
return int(ctx)
return None
+154 -35
View File
@@ -56,6 +56,61 @@ def _scan_context_content(content: str, filename: str) -> str:
return content
def _find_git_root(start: Path) -> Optional[Path]:
"""Walk *start* and its parents looking for a ``.git`` directory.
Returns the directory containing ``.git``, or ``None`` if we hit the
filesystem root without finding one.
"""
current = start.resolve()
for parent in [current, *current.parents]:
if (parent / ".git").exists():
return parent
return None
_HERMES_MD_NAMES = (".hermes.md", "HERMES.md")
def _find_hermes_md(cwd: Path) -> Optional[Path]:
"""Discover the nearest ``.hermes.md`` or ``HERMES.md``.
Search order: *cwd* first, then each parent directory up to (and
including) the git repository root. Returns the first match, or
``None`` if nothing is found.
"""
stop_at = _find_git_root(cwd)
current = cwd.resolve()
for directory in [current, *current.parents]:
for name in _HERMES_MD_NAMES:
candidate = directory / name
if candidate.is_file():
return candidate
# Stop walking at the git root (or filesystem root).
if stop_at and directory == stop_at:
break
return None
def _strip_yaml_frontmatter(content: str) -> str:
"""Remove optional YAML frontmatter (``---`` delimited) from *content*.
The frontmatter may contain structured config (model overrides, tool
settings) that will be handled separately in a future PR. For now we
strip it so only the human-readable markdown body is injected into the
system prompt.
"""
if content.startswith("---"):
end = content.find("\n---", 3)
if end != -1:
# Skip past the closing --- and any trailing newline
body = content[end + 4:].lstrip("\n")
return body if body else content
return content
# =========================================================================
# Constants
# =========================================================================
@@ -73,9 +128,15 @@ DEFAULT_AGENT_IDENTITY = (
MEMORY_GUIDANCE = (
"You have persistent memory across sessions. Save durable facts using the memory "
"tool: user preferences, environment details, tool quirks, and stable conventions. "
"Memory is injected into every turn, so keep it compact. Do NOT save task progress, "
"session outcomes, or completed-work logs to memory; use session_search to recall "
"those from past transcripts."
"Memory is injected into every turn, so keep it compact and focused on facts that "
"will still matter later.\n"
"Prioritize what reduces future user steering — the most valuable memory is one "
"that prevents the user from having to correct or remind you again. "
"User preferences and recurring corrections matter more than procedural task details.\n"
"Do NOT save task progress, session outcomes, completed-work logs, or temporary TODO "
"state to memory; use session_search to recall those from past transcripts. "
"If you've discovered a new way to do something, solved a problem that could be "
"necessary later, save it as a skill with the skill tool."
)
SESSION_SEARCH_GUIDANCE = (
@@ -86,8 +147,11 @@ SESSION_SEARCH_GUIDANCE = (
SKILLS_GUIDANCE = (
"After completing a complex task (5+ tool calls), fixing a tricky error, "
"or discovering a non-trivial workflow, consider saving the approach as a "
"skill with skill_manage so you can reuse it next time."
"or discovering a non-trivial workflow, save the approach as a "
"skill with skill_manage so you can reuse it next time.\n"
"When using a skill and finding it outdated, incomplete, or wrong, "
"patch it immediately with skill_manage(action='patch') — don't wait to be asked. "
"Skills that aren't maintained become liabilities."
)
PLATFORM_HINTS = {
@@ -142,16 +206,21 @@ PLATFORM_HINTS = {
"contextually appropriate."
),
"cron": (
"You are running as a scheduled cron job. Your final response is automatically "
"delivered to the job's configured destination, so do not use send_message to "
"send to that same target again. If you want the user to receive something in "
"the scheduled destination, put it directly in your final response. Use "
"send_message only for additional or different targets."
"You are running as a scheduled cron job. There is no user present — you "
"cannot ask questions, request clarification, or wait for follow-up. Execute "
"the task fully and autonomously, making reasonable decisions where needed. "
"Your final response is automatically delivered to the job's configured "
"destination — put the primary content directly in your response."
),
"cli": (
"You are a CLI AI Agent. Try not to use markdown but simple text "
"renderable inside a terminal."
),
"sms": (
"You are communicating via SMS. Keep responses concise and use plain text "
"only — no markdown, no formatting. SMS messages are limited to ~1600 "
"characters, so be brief and direct."
),
}
CONTEXT_FILE_MAX_CHARS = 20_000
@@ -261,28 +330,34 @@ def build_skills_system_prompt(
# Each entry: (skill_name, description)
# Supports sub-categories: skills/mlops/training/axolotl/SKILL.md
# -> category "mlops/training", skill "axolotl"
# Load disabled skill names once for the entire scan
try:
from tools.skills_tool import _get_disabled_skill_names
disabled = _get_disabled_skill_names()
except Exception:
disabled = set()
skills_by_category: dict[str, list[tuple[str, str]]] = {}
for skill_file in skills_dir.rglob("SKILL.md"):
is_compatible, _, desc = _parse_skill_file(skill_file)
is_compatible, frontmatter, desc = _parse_skill_file(skill_file)
if not is_compatible:
continue
# Skip skills whose conditional activation rules exclude them
conditions = _read_skill_conditions(skill_file)
if not _skill_should_show(conditions, available_tools, available_toolsets):
continue
rel_path = skill_file.relative_to(skills_dir)
parts = rel_path.parts
if len(parts) >= 2:
# Category is everything between skills_dir and the skill folder
# e.g. parts = ("mlops", "training", "axolotl", "SKILL.md")
# → category = "mlops/training", skill_name = "axolotl"
# e.g. parts = ("github", "github-auth", "SKILL.md")
# → category = "github", skill_name = "github-auth"
skill_name = parts[-2]
category = "/".join(parts[:-2]) if len(parts) > 2 else parts[0]
else:
category = "general"
skill_name = skill_file.parent.name
# Respect user's disabled skills config
fm_name = frontmatter.get("name", skill_name)
if fm_name in disabled or skill_name in disabled:
continue
# Skip skills whose conditional activation rules exclude them
conditions = _read_skill_conditions(skill_file)
if not _skill_should_show(conditions, available_tools, available_toolsets):
continue
skills_by_category.setdefault(category, []).append((skill_name, desc))
if not skills_by_category:
@@ -326,6 +401,9 @@ def build_skills_system_prompt(
"Before replying, scan the skills below. If one clearly matches your task, "
"load it with skill_view(name) and follow its instructions. "
"If a skill has issues, fix it with skill_manage(action='patch').\n"
"After difficult/iterative tasks, offer to save as a skill. "
"If a skill you loaded was missing steps, had wrong commands, or needed "
"pitfalls you discovered, update it before finishing.\n"
"\n"
"<available_skills>\n"
+ "\n".join(index_lines) + "\n"
@@ -351,11 +429,42 @@ def _truncate_content(content: str, filename: str, max_chars: int = CONTEXT_FILE
return head + marker + tail
def build_context_files_prompt(cwd: Optional[str] = None) -> str:
def load_soul_md() -> Optional[str]:
"""Load SOUL.md from HERMES_HOME and return its content, or None.
Used as the agent identity (slot #1 in the system prompt). When this
returns content, ``build_context_files_prompt`` should be called with
``skip_soul=True`` so SOUL.md isn't injected twice.
"""
try:
from hermes_cli.config import ensure_hermes_home
ensure_hermes_home()
except Exception as e:
logger.debug("Could not ensure HERMES_HOME before loading SOUL.md: %s", e)
soul_path = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes")) / "SOUL.md"
if not soul_path.exists():
return None
try:
content = soul_path.read_text(encoding="utf-8").strip()
if not content:
return None
content = _scan_context_content(content, "SOUL.md")
content = _truncate_content(content, "SOUL.md")
return content
except Exception as e:
logger.debug("Could not read SOUL.md from %s: %s", soul_path, e)
return None
def build_context_files_prompt(cwd: Optional[str] = None, skip_soul: bool = False) -> str:
"""Discover and load context files for the system prompt.
Discovery: AGENTS.md (recursive), .cursorrules / .cursor/rules/*.mdc,
and SOUL.md from HERMES_HOME only. Each capped at 20,000 chars.
When *skip_soul* is True, SOUL.md is not included here (it was already
loaded via ``load_soul_md()`` for the identity slot).
"""
if cwd is None:
cwd = os.getcwd()
@@ -423,23 +532,33 @@ def build_context_files_prompt(cwd: Optional[str] = None) -> str:
cursorrules_content = _truncate_content(cursorrules_content, ".cursorrules")
sections.append(cursorrules_content)
# SOUL.md from HERMES_HOME only
try:
from hermes_cli.config import ensure_hermes_home
ensure_hermes_home()
except Exception as e:
logger.debug("Could not ensure HERMES_HOME before loading SOUL.md: %s", e)
soul_path = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes")) / "SOUL.md"
if soul_path.exists():
# .hermes.md / HERMES.md — per-project agent config (walk to git root)
hermes_md_content = ""
hermes_md_path = _find_hermes_md(cwd_path)
if hermes_md_path:
try:
content = soul_path.read_text(encoding="utf-8").strip()
content = hermes_md_path.read_text(encoding="utf-8").strip()
if content:
content = _scan_context_content(content, "SOUL.md")
content = _truncate_content(content, "SOUL.md")
sections.append(content)
content = _strip_yaml_frontmatter(content)
rel = hermes_md_path.name
try:
rel = str(hermes_md_path.relative_to(cwd_path))
except ValueError:
pass
content = _scan_context_content(content, rel)
hermes_md_content = f"## {rel}\n\n{content}"
except Exception as e:
logger.debug("Could not read SOUL.md from %s: %s", soul_path, e)
logger.debug("Could not read %s: %s", hermes_md_path, e)
if hermes_md_content:
hermes_md_content = _truncate_content(hermes_md_content, ".hermes.md")
sections.append(hermes_md_content)
# SOUL.md from HERMES_HOME only — skip when already loaded as identity
if not skip_soul:
soul_content = load_soul_md()
if soul_content:
sections.append(soul_content)
if not sections:
return ""
+5 -1
View File
@@ -157,9 +157,10 @@ def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
global _skill_commands
_skill_commands = {}
try:
from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, skill_matches_platform
from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, skill_matches_platform, _get_disabled_skill_names
if not SKILLS_DIR.exists():
return _skill_commands
disabled = _get_disabled_skill_names()
for skill_md in SKILLS_DIR.rglob("SKILL.md"):
if any(part in ('.git', '.github', '.hub') for part in skill_md.parts):
continue
@@ -170,6 +171,9 @@ def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
if not skill_matches_platform(frontmatter):
continue
name = frontmatter.get('name', skill_md.parent.name)
# Respect user's disabled skills config
if name in disabled:
continue
description = frontmatter.get('description', '')
if not description:
for line in body.strip().split('\n'):
+196
View File
@@ -0,0 +1,196 @@
"""Helpers for optional cheap-vs-strong model routing."""
from __future__ import annotations
import os
import re
from typing import Any, Dict, Optional
_COMPLEX_KEYWORDS = {
"debug",
"debugging",
"implement",
"implementation",
"refactor",
"patch",
"traceback",
"stacktrace",
"exception",
"error",
"analyze",
"analysis",
"investigate",
"architecture",
"design",
"compare",
"benchmark",
"optimize",
"optimise",
"review",
"terminal",
"shell",
"tool",
"tools",
"pytest",
"test",
"tests",
"plan",
"planning",
"delegate",
"subagent",
"cron",
"docker",
"kubernetes",
}
_URL_RE = re.compile(r"https?://|www\.", re.IGNORECASE)
def _coerce_bool(value: Any, default: bool = False) -> bool:
if value is None:
return default
if isinstance(value, bool):
return value
if isinstance(value, str):
return value.strip().lower() in {"1", "true", "yes", "on"}
return bool(value)
def _coerce_int(value: Any, default: int) -> int:
try:
return int(value)
except (TypeError, ValueError):
return default
def choose_cheap_model_route(user_message: str, routing_config: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
"""Return the configured cheap-model route when a message looks simple.
Conservative by design: if the message has signs of code/tool/debugging/
long-form work, keep the primary model.
"""
cfg = routing_config or {}
if not _coerce_bool(cfg.get("enabled"), False):
return None
cheap_model = cfg.get("cheap_model") or {}
if not isinstance(cheap_model, dict):
return None
provider = str(cheap_model.get("provider") or "").strip().lower()
model = str(cheap_model.get("model") or "").strip()
if not provider or not model:
return None
text = (user_message or "").strip()
if not text:
return None
max_chars = _coerce_int(cfg.get("max_simple_chars"), 160)
max_words = _coerce_int(cfg.get("max_simple_words"), 28)
if len(text) > max_chars:
return None
if len(text.split()) > max_words:
return None
if text.count("\n") > 1:
return None
if "```" in text or "`" in text:
return None
if _URL_RE.search(text):
return None
lowered = text.lower()
words = {token.strip(".,:;!?()[]{}\"'`") for token in lowered.split()}
if words & _COMPLEX_KEYWORDS:
return None
route = dict(cheap_model)
route["provider"] = provider
route["model"] = model
route["routing_reason"] = "simple_turn"
return route
def resolve_turn_route(user_message: str, routing_config: Optional[Dict[str, Any]], primary: Dict[str, Any]) -> Dict[str, Any]:
"""Resolve the effective model/runtime for one turn.
Returns a dict with model/runtime/signature/label fields.
"""
route = choose_cheap_model_route(user_message, routing_config)
if not route:
return {
"model": primary.get("model"),
"runtime": {
"api_key": primary.get("api_key"),
"base_url": primary.get("base_url"),
"provider": primary.get("provider"),
"api_mode": primary.get("api_mode"),
"command": primary.get("command"),
"args": list(primary.get("args") or []),
},
"label": None,
"signature": (
primary.get("model"),
primary.get("provider"),
primary.get("base_url"),
primary.get("api_mode"),
primary.get("command"),
tuple(primary.get("args") or ()),
),
}
from hermes_cli.runtime_provider import resolve_runtime_provider
explicit_api_key = None
api_key_env = str(route.get("api_key_env") or "").strip()
if api_key_env:
explicit_api_key = os.getenv(api_key_env) or None
try:
runtime = resolve_runtime_provider(
requested=route.get("provider"),
explicit_api_key=explicit_api_key,
explicit_base_url=route.get("base_url"),
)
except Exception:
return {
"model": primary.get("model"),
"runtime": {
"api_key": primary.get("api_key"),
"base_url": primary.get("base_url"),
"provider": primary.get("provider"),
"api_mode": primary.get("api_mode"),
"command": primary.get("command"),
"args": list(primary.get("args") or []),
},
"label": None,
"signature": (
primary.get("model"),
primary.get("provider"),
primary.get("base_url"),
primary.get("api_mode"),
primary.get("command"),
tuple(primary.get("args") or ()),
),
}
return {
"model": route.get("model"),
"runtime": {
"api_key": runtime.get("api_key"),
"base_url": runtime.get("base_url"),
"provider": runtime.get("provider"),
"api_mode": runtime.get("api_mode"),
"command": runtime.get("command"),
"args": list(runtime.get("args") or []),
},
"label": f"smart route → {route.get('model')} ({runtime.get('provider')})",
"signature": (
route.get("model"),
runtime.get("provider"),
runtime.get("base_url"),
runtime.get("api_mode"),
runtime.get("command"),
tuple(runtime.get("args") or ()),
),
}
+125
View File
@@ -0,0 +1,125 @@
"""Auto-generate short session titles from the first user/assistant exchange.
Runs asynchronously after the first response is delivered so it never
adds latency to the user-facing reply.
"""
import logging
import threading
from typing import Optional
from agent.auxiliary_client import call_llm
logger = logging.getLogger(__name__)
_TITLE_PROMPT = (
"Generate a short, descriptive title (3-7 words) for a conversation that starts with the "
"following exchange. The title should capture the main topic or intent. "
"Return ONLY the title text, nothing else. No quotes, no punctuation at the end, no prefixes."
)
def generate_title(user_message: str, assistant_response: str, timeout: float = 15.0) -> Optional[str]:
"""Generate a session title from the first exchange.
Uses the auxiliary LLM client (cheapest/fastest available model).
Returns the title string or None on failure.
"""
# Truncate long messages to keep the request small
user_snippet = user_message[:500] if user_message else ""
assistant_snippet = assistant_response[:500] if assistant_response else ""
messages = [
{"role": "system", "content": _TITLE_PROMPT},
{"role": "user", "content": f"User: {user_snippet}\n\nAssistant: {assistant_snippet}"},
]
try:
response = call_llm(
task="compression", # reuse compression task config (cheap/fast model)
messages=messages,
max_tokens=30,
temperature=0.3,
timeout=timeout,
)
title = (response.choices[0].message.content or "").strip()
# Clean up: remove quotes, trailing punctuation, prefixes like "Title: "
title = title.strip('"\'')
if title.lower().startswith("title:"):
title = title[6:].strip()
# Enforce reasonable length
if len(title) > 80:
title = title[:77] + "..."
return title if title else None
except Exception as e:
logger.debug("Title generation failed: %s", e)
return None
def auto_title_session(
session_db,
session_id: str,
user_message: str,
assistant_response: str,
) -> None:
"""Generate and set a session title if one doesn't already exist.
Called in a background thread after the first exchange completes.
Silently skips if:
- session_db is None
- session already has a title (user-set or previously auto-generated)
- title generation fails
"""
if not session_db or not session_id:
return
# Check if title already exists (user may have set one via /title before first response)
try:
existing = session_db.get_session_title(session_id)
if existing:
return
except Exception:
return
title = generate_title(user_message, assistant_response)
if not title:
return
try:
session_db.set_session_title(session_id, title)
logger.debug("Auto-generated session title: %s", title)
except Exception as e:
logger.debug("Failed to set auto-generated title: %s", e)
def maybe_auto_title(
session_db,
session_id: str,
user_message: str,
assistant_response: str,
conversation_history: list,
) -> None:
"""Fire-and-forget title generation after the first exchange.
Only generates a title when:
- This appears to be the first user→assistant exchange
- No title is already set
"""
if not session_db or not session_id or not user_message or not assistant_response:
return
# Count user messages in history to detect first exchange.
# conversation_history includes the exchange that just happened,
# so for a first exchange we expect exactly 1 user message
# (or 2 counting system). Be generous: generate on first 2 exchanges.
user_msg_count = sum(1 for m in (conversation_history or []) if m.get("role") == "user")
if user_msg_count > 2:
return
thread = threading.Thread(
target=auto_title_session,
args=(session_db, session_id, user_message, assistant_response),
daemon=True,
name="auto-title",
)
thread.start()
+655
View File
@@ -0,0 +1,655 @@
from __future__ import annotations
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal
from typing import Any, Dict, Literal, Optional
from agent.model_metadata import fetch_endpoint_model_metadata, fetch_model_metadata
DEFAULT_PRICING = {"input": 0.0, "output": 0.0}
_ZERO = Decimal("0")
_ONE_MILLION = Decimal("1000000")
CostStatus = Literal["actual", "estimated", "included", "unknown"]
CostSource = Literal[
"provider_cost_api",
"provider_generation_api",
"provider_models_api",
"official_docs_snapshot",
"user_override",
"custom_contract",
"none",
]
@dataclass(frozen=True)
class CanonicalUsage:
input_tokens: int = 0
output_tokens: int = 0
cache_read_tokens: int = 0
cache_write_tokens: int = 0
reasoning_tokens: int = 0
request_count: int = 1
raw_usage: Optional[dict[str, Any]] = None
@property
def prompt_tokens(self) -> int:
return self.input_tokens + self.cache_read_tokens + self.cache_write_tokens
@property
def total_tokens(self) -> int:
return self.prompt_tokens + self.output_tokens
@dataclass(frozen=True)
class BillingRoute:
provider: str
model: str
base_url: str = ""
billing_mode: str = "unknown"
@dataclass(frozen=True)
class PricingEntry:
input_cost_per_million: Optional[Decimal] = None
output_cost_per_million: Optional[Decimal] = None
cache_read_cost_per_million: Optional[Decimal] = None
cache_write_cost_per_million: Optional[Decimal] = None
request_cost: Optional[Decimal] = None
source: CostSource = "none"
source_url: Optional[str] = None
pricing_version: Optional[str] = None
fetched_at: Optional[datetime] = None
@dataclass(frozen=True)
class CostResult:
amount_usd: Optional[Decimal]
status: CostStatus
source: CostSource
label: str
fetched_at: Optional[datetime] = None
pricing_version: Optional[str] = None
notes: tuple[str, ...] = ()
_UTC_NOW = lambda: datetime.now(timezone.utc)
# Official docs snapshot entries. Models whose published pricing and cache
# semantics are stable enough to encode exactly.
_OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
(
"anthropic",
"claude-opus-4-20250514",
): PricingEntry(
input_cost_per_million=Decimal("15.00"),
output_cost_per_million=Decimal("75.00"),
cache_read_cost_per_million=Decimal("1.50"),
cache_write_cost_per_million=Decimal("18.75"),
source="official_docs_snapshot",
source_url="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching",
pricing_version="anthropic-prompt-caching-2026-03-16",
),
(
"anthropic",
"claude-sonnet-4-20250514",
): PricingEntry(
input_cost_per_million=Decimal("3.00"),
output_cost_per_million=Decimal("15.00"),
cache_read_cost_per_million=Decimal("0.30"),
cache_write_cost_per_million=Decimal("3.75"),
source="official_docs_snapshot",
source_url="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching",
pricing_version="anthropic-prompt-caching-2026-03-16",
),
# OpenAI
(
"openai",
"gpt-4o",
): PricingEntry(
input_cost_per_million=Decimal("2.50"),
output_cost_per_million=Decimal("10.00"),
cache_read_cost_per_million=Decimal("1.25"),
source="official_docs_snapshot",
source_url="https://openai.com/api/pricing/",
pricing_version="openai-pricing-2026-03-16",
),
(
"openai",
"gpt-4o-mini",
): PricingEntry(
input_cost_per_million=Decimal("0.15"),
output_cost_per_million=Decimal("0.60"),
cache_read_cost_per_million=Decimal("0.075"),
source="official_docs_snapshot",
source_url="https://openai.com/api/pricing/",
pricing_version="openai-pricing-2026-03-16",
),
(
"openai",
"gpt-4.1",
): PricingEntry(
input_cost_per_million=Decimal("2.00"),
output_cost_per_million=Decimal("8.00"),
cache_read_cost_per_million=Decimal("0.50"),
source="official_docs_snapshot",
source_url="https://openai.com/api/pricing/",
pricing_version="openai-pricing-2026-03-16",
),
(
"openai",
"gpt-4.1-mini",
): PricingEntry(
input_cost_per_million=Decimal("0.40"),
output_cost_per_million=Decimal("1.60"),
cache_read_cost_per_million=Decimal("0.10"),
source="official_docs_snapshot",
source_url="https://openai.com/api/pricing/",
pricing_version="openai-pricing-2026-03-16",
),
(
"openai",
"gpt-4.1-nano",
): PricingEntry(
input_cost_per_million=Decimal("0.10"),
output_cost_per_million=Decimal("0.40"),
cache_read_cost_per_million=Decimal("0.025"),
source="official_docs_snapshot",
source_url="https://openai.com/api/pricing/",
pricing_version="openai-pricing-2026-03-16",
),
(
"openai",
"o3",
): PricingEntry(
input_cost_per_million=Decimal("10.00"),
output_cost_per_million=Decimal("40.00"),
cache_read_cost_per_million=Decimal("2.50"),
source="official_docs_snapshot",
source_url="https://openai.com/api/pricing/",
pricing_version="openai-pricing-2026-03-16",
),
(
"openai",
"o3-mini",
): PricingEntry(
input_cost_per_million=Decimal("1.10"),
output_cost_per_million=Decimal("4.40"),
cache_read_cost_per_million=Decimal("0.55"),
source="official_docs_snapshot",
source_url="https://openai.com/api/pricing/",
pricing_version="openai-pricing-2026-03-16",
),
# Anthropic older models (pre-4.6 generation)
(
"anthropic",
"claude-3-5-sonnet-20241022",
): PricingEntry(
input_cost_per_million=Decimal("3.00"),
output_cost_per_million=Decimal("15.00"),
cache_read_cost_per_million=Decimal("0.30"),
cache_write_cost_per_million=Decimal("3.75"),
source="official_docs_snapshot",
source_url="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching",
pricing_version="anthropic-pricing-2026-03-16",
),
(
"anthropic",
"claude-3-5-haiku-20241022",
): PricingEntry(
input_cost_per_million=Decimal("0.80"),
output_cost_per_million=Decimal("4.00"),
cache_read_cost_per_million=Decimal("0.08"),
cache_write_cost_per_million=Decimal("1.00"),
source="official_docs_snapshot",
source_url="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching",
pricing_version="anthropic-pricing-2026-03-16",
),
(
"anthropic",
"claude-3-opus-20240229",
): PricingEntry(
input_cost_per_million=Decimal("15.00"),
output_cost_per_million=Decimal("75.00"),
cache_read_cost_per_million=Decimal("1.50"),
cache_write_cost_per_million=Decimal("18.75"),
source="official_docs_snapshot",
source_url="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching",
pricing_version="anthropic-pricing-2026-03-16",
),
(
"anthropic",
"claude-3-haiku-20240307",
): PricingEntry(
input_cost_per_million=Decimal("0.25"),
output_cost_per_million=Decimal("1.25"),
cache_read_cost_per_million=Decimal("0.03"),
cache_write_cost_per_million=Decimal("0.30"),
source="official_docs_snapshot",
source_url="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching",
pricing_version="anthropic-pricing-2026-03-16",
),
# DeepSeek
(
"deepseek",
"deepseek-chat",
): PricingEntry(
input_cost_per_million=Decimal("0.14"),
output_cost_per_million=Decimal("0.28"),
source="official_docs_snapshot",
source_url="https://api-docs.deepseek.com/quick_start/pricing",
pricing_version="deepseek-pricing-2026-03-16",
),
(
"deepseek",
"deepseek-reasoner",
): PricingEntry(
input_cost_per_million=Decimal("0.55"),
output_cost_per_million=Decimal("2.19"),
source="official_docs_snapshot",
source_url="https://api-docs.deepseek.com/quick_start/pricing",
pricing_version="deepseek-pricing-2026-03-16",
),
# Google Gemini
(
"google",
"gemini-2.5-pro",
): PricingEntry(
input_cost_per_million=Decimal("1.25"),
output_cost_per_million=Decimal("10.00"),
source="official_docs_snapshot",
source_url="https://ai.google.dev/pricing",
pricing_version="google-pricing-2026-03-16",
),
(
"google",
"gemini-2.5-flash",
): PricingEntry(
input_cost_per_million=Decimal("0.15"),
output_cost_per_million=Decimal("0.60"),
source="official_docs_snapshot",
source_url="https://ai.google.dev/pricing",
pricing_version="google-pricing-2026-03-16",
),
(
"google",
"gemini-2.0-flash",
): PricingEntry(
input_cost_per_million=Decimal("0.10"),
output_cost_per_million=Decimal("0.40"),
source="official_docs_snapshot",
source_url="https://ai.google.dev/pricing",
pricing_version="google-pricing-2026-03-16",
),
}
def _to_decimal(value: Any) -> Optional[Decimal]:
if value is None:
return None
try:
return Decimal(str(value))
except Exception:
return None
def _to_int(value: Any) -> int:
try:
return int(value or 0)
except Exception:
return 0
def resolve_billing_route(
model_name: str,
provider: Optional[str] = None,
base_url: Optional[str] = None,
) -> BillingRoute:
provider_name = (provider or "").strip().lower()
base = (base_url or "").strip().lower()
model = (model_name or "").strip()
if not provider_name and "/" in model:
inferred_provider, bare_model = model.split("/", 1)
if inferred_provider in {"anthropic", "openai", "google"}:
provider_name = inferred_provider
model = bare_model
if provider_name == "openai-codex":
return BillingRoute(provider="openai-codex", model=model, base_url=base_url or "", billing_mode="subscription_included")
if provider_name == "openrouter" or "openrouter.ai" in base:
return BillingRoute(provider="openrouter", model=model, base_url=base_url or "", billing_mode="official_models_api")
if provider_name == "anthropic":
return BillingRoute(provider="anthropic", model=model.split("/")[-1], base_url=base_url or "", billing_mode="official_docs_snapshot")
if provider_name == "openai":
return BillingRoute(provider="openai", model=model.split("/")[-1], base_url=base_url or "", billing_mode="official_docs_snapshot")
if provider_name in {"custom", "local"} or (base and "localhost" in base):
return BillingRoute(provider=provider_name or "custom", model=model, base_url=base_url or "", billing_mode="unknown")
return BillingRoute(provider=provider_name or "unknown", model=model.split("/")[-1] if model else "", base_url=base_url or "", billing_mode="unknown")
def _lookup_official_docs_pricing(route: BillingRoute) -> Optional[PricingEntry]:
return _OFFICIAL_DOCS_PRICING.get((route.provider, route.model.lower()))
def _openrouter_pricing_entry(route: BillingRoute) -> Optional[PricingEntry]:
return _pricing_entry_from_metadata(
fetch_model_metadata(),
route.model,
source_url="https://openrouter.ai/docs/api/api-reference/models/get-models",
pricing_version="openrouter-models-api",
)
def _pricing_entry_from_metadata(
metadata: Dict[str, Dict[str, Any]],
model_id: str,
*,
source_url: str,
pricing_version: str,
) -> Optional[PricingEntry]:
if model_id not in metadata:
return None
pricing = metadata[model_id].get("pricing") or {}
prompt = _to_decimal(pricing.get("prompt"))
completion = _to_decimal(pricing.get("completion"))
request = _to_decimal(pricing.get("request"))
cache_read = _to_decimal(
pricing.get("cache_read")
or pricing.get("cached_prompt")
or pricing.get("input_cache_read")
)
cache_write = _to_decimal(
pricing.get("cache_write")
or pricing.get("cache_creation")
or pricing.get("input_cache_write")
)
if prompt is None and completion is None and request is None:
return None
def _per_token_to_per_million(value: Optional[Decimal]) -> Optional[Decimal]:
if value is None:
return None
return value * _ONE_MILLION
return PricingEntry(
input_cost_per_million=_per_token_to_per_million(prompt),
output_cost_per_million=_per_token_to_per_million(completion),
cache_read_cost_per_million=_per_token_to_per_million(cache_read),
cache_write_cost_per_million=_per_token_to_per_million(cache_write),
request_cost=request,
source="provider_models_api",
source_url=source_url,
pricing_version=pricing_version,
fetched_at=_UTC_NOW(),
)
def get_pricing_entry(
model_name: str,
provider: Optional[str] = None,
base_url: Optional[str] = None,
api_key: Optional[str] = None,
) -> Optional[PricingEntry]:
route = resolve_billing_route(model_name, provider=provider, base_url=base_url)
if route.billing_mode == "subscription_included":
return PricingEntry(
input_cost_per_million=_ZERO,
output_cost_per_million=_ZERO,
cache_read_cost_per_million=_ZERO,
cache_write_cost_per_million=_ZERO,
source="none",
pricing_version="included-route",
)
if route.provider == "openrouter":
return _openrouter_pricing_entry(route)
if route.base_url:
entry = _pricing_entry_from_metadata(
fetch_endpoint_model_metadata(route.base_url, api_key=api_key or ""),
route.model,
source_url=f"{route.base_url.rstrip('/')}/models",
pricing_version="openai-compatible-models-api",
)
if entry:
return entry
return _lookup_official_docs_pricing(route)
def normalize_usage(
response_usage: Any,
*,
provider: Optional[str] = None,
api_mode: Optional[str] = None,
) -> CanonicalUsage:
"""Normalize raw API response usage into canonical token buckets.
Handles three API shapes:
- Anthropic: input_tokens/output_tokens/cache_read_input_tokens/cache_creation_input_tokens
- Codex Responses: input_tokens includes cache tokens; input_tokens_details.cached_tokens separates them
- OpenAI Chat Completions: prompt_tokens includes cache tokens; prompt_tokens_details.cached_tokens separates them
In both Codex and OpenAI modes, input_tokens is derived by subtracting cache
tokens from the total — the API contract is that input/prompt totals include
cached tokens and the details object breaks them out.
"""
if not response_usage:
return CanonicalUsage()
provider_name = (provider or "").strip().lower()
mode = (api_mode or "").strip().lower()
if mode == "anthropic_messages" or provider_name == "anthropic":
input_tokens = _to_int(getattr(response_usage, "input_tokens", 0))
output_tokens = _to_int(getattr(response_usage, "output_tokens", 0))
cache_read_tokens = _to_int(getattr(response_usage, "cache_read_input_tokens", 0))
cache_write_tokens = _to_int(getattr(response_usage, "cache_creation_input_tokens", 0))
elif mode == "codex_responses":
input_total = _to_int(getattr(response_usage, "input_tokens", 0))
output_tokens = _to_int(getattr(response_usage, "output_tokens", 0))
details = getattr(response_usage, "input_tokens_details", None)
cache_read_tokens = _to_int(getattr(details, "cached_tokens", 0) if details else 0)
cache_write_tokens = _to_int(
getattr(details, "cache_creation_tokens", 0) if details else 0
)
input_tokens = max(0, input_total - cache_read_tokens - cache_write_tokens)
else:
prompt_total = _to_int(getattr(response_usage, "prompt_tokens", 0))
output_tokens = _to_int(getattr(response_usage, "completion_tokens", 0))
details = getattr(response_usage, "prompt_tokens_details", None)
cache_read_tokens = _to_int(getattr(details, "cached_tokens", 0) if details else 0)
cache_write_tokens = _to_int(
getattr(details, "cache_write_tokens", 0) if details else 0
)
input_tokens = max(0, prompt_total - cache_read_tokens - cache_write_tokens)
reasoning_tokens = 0
output_details = getattr(response_usage, "output_tokens_details", None)
if output_details:
reasoning_tokens = _to_int(getattr(output_details, "reasoning_tokens", 0))
return CanonicalUsage(
input_tokens=input_tokens,
output_tokens=output_tokens,
cache_read_tokens=cache_read_tokens,
cache_write_tokens=cache_write_tokens,
reasoning_tokens=reasoning_tokens,
)
def estimate_usage_cost(
model_name: str,
usage: CanonicalUsage,
*,
provider: Optional[str] = None,
base_url: Optional[str] = None,
api_key: Optional[str] = None,
) -> CostResult:
route = resolve_billing_route(model_name, provider=provider, base_url=base_url)
if route.billing_mode == "subscription_included":
return CostResult(
amount_usd=_ZERO,
status="included",
source="none",
label="included",
pricing_version="included-route",
)
entry = get_pricing_entry(model_name, provider=provider, base_url=base_url, api_key=api_key)
if not entry:
return CostResult(amount_usd=None, status="unknown", source="none", label="n/a")
notes: list[str] = []
amount = _ZERO
if usage.input_tokens and entry.input_cost_per_million is None:
return CostResult(amount_usd=None, status="unknown", source=entry.source, label="n/a")
if usage.output_tokens and entry.output_cost_per_million is None:
return CostResult(amount_usd=None, status="unknown", source=entry.source, label="n/a")
if usage.cache_read_tokens:
if entry.cache_read_cost_per_million is None:
return CostResult(
amount_usd=None,
status="unknown",
source=entry.source,
label="n/a",
notes=("cache-read pricing unavailable for route",),
)
if usage.cache_write_tokens:
if entry.cache_write_cost_per_million is None:
return CostResult(
amount_usd=None,
status="unknown",
source=entry.source,
label="n/a",
notes=("cache-write pricing unavailable for route",),
)
if entry.input_cost_per_million is not None:
amount += Decimal(usage.input_tokens) * entry.input_cost_per_million / _ONE_MILLION
if entry.output_cost_per_million is not None:
amount += Decimal(usage.output_tokens) * entry.output_cost_per_million / _ONE_MILLION
if entry.cache_read_cost_per_million is not None:
amount += Decimal(usage.cache_read_tokens) * entry.cache_read_cost_per_million / _ONE_MILLION
if entry.cache_write_cost_per_million is not None:
amount += Decimal(usage.cache_write_tokens) * entry.cache_write_cost_per_million / _ONE_MILLION
if entry.request_cost is not None and usage.request_count:
amount += Decimal(usage.request_count) * entry.request_cost
status: CostStatus = "estimated"
label = f"~${amount:.2f}"
if entry.source == "none" and amount == _ZERO:
status = "included"
label = "included"
if route.provider == "openrouter":
notes.append("OpenRouter cost is estimated from the models API until reconciled.")
return CostResult(
amount_usd=amount,
status=status,
source=entry.source,
label=label,
fetched_at=entry.fetched_at,
pricing_version=entry.pricing_version,
notes=tuple(notes),
)
def has_known_pricing(
model_name: str,
provider: Optional[str] = None,
base_url: Optional[str] = None,
api_key: Optional[str] = None,
) -> bool:
"""Check whether we have pricing data for this model+route.
Uses direct lookup instead of routing through the full estimation
pipeline — avoids creating dummy usage objects just to check status.
"""
route = resolve_billing_route(model_name, provider=provider, base_url=base_url)
if route.billing_mode == "subscription_included":
return True
entry = get_pricing_entry(model_name, provider=provider, base_url=base_url, api_key=api_key)
return entry is not None
def get_pricing(
model_name: str,
provider: Optional[str] = None,
base_url: Optional[str] = None,
api_key: Optional[str] = None,
) -> Dict[str, float]:
"""Backward-compatible thin wrapper for legacy callers.
Returns only non-cache input/output fields when a pricing entry exists.
Unknown routes return zeroes.
"""
entry = get_pricing_entry(model_name, provider=provider, base_url=base_url, api_key=api_key)
if not entry:
return {"input": 0.0, "output": 0.0}
return {
"input": float(entry.input_cost_per_million or _ZERO),
"output": float(entry.output_cost_per_million or _ZERO),
}
def estimate_cost_usd(
model: str,
input_tokens: int,
output_tokens: int,
*,
provider: Optional[str] = None,
base_url: Optional[str] = None,
api_key: Optional[str] = None,
) -> float:
"""Backward-compatible helper for legacy callers.
This uses non-cached input/output only. New code should call
`estimate_usage_cost()` with canonical usage buckets.
"""
result = estimate_usage_cost(
model,
CanonicalUsage(input_tokens=input_tokens, output_tokens=output_tokens),
provider=provider,
base_url=base_url,
api_key=api_key,
)
return float(result.amount_usd or _ZERO)
def format_duration_compact(seconds: float) -> str:
if seconds < 60:
return f"{seconds:.0f}s"
minutes = seconds / 60
if minutes < 60:
return f"{minutes:.0f}m"
hours = minutes / 60
if hours < 24:
remaining_min = int(minutes % 60)
return f"{int(hours)}h {remaining_min}m" if remaining_min else f"{int(hours)}h"
days = hours / 24
return f"{days:.1f}d"
def format_token_count_compact(value: int) -> str:
abs_value = abs(int(value))
if abs_value < 1_000:
return str(int(value))
sign = "-" if value < 0 else ""
units = ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K"))
for threshold, suffix in units:
if abs_value >= threshold:
scaled = abs_value / threshold
if scaled < 10:
text = f"{scaled:.2f}"
elif scaled < 100:
text = f"{scaled:.1f}"
else:
text = f"{scaled:.0f}"
text = text.rstrip("0").rstrip(".")
return f"{sign}{text}{suffix}"
return f"{value:,}"
+59 -1
View File
@@ -51,6 +51,20 @@ model:
# # Data policy: "allow" (default) or "deny" to exclude providers that may store data
# # data_collection: "deny"
# =============================================================================
# Smart Model Routing (optional)
# =============================================================================
# Use a cheaper model for short/simple turns while keeping your main model for
# more complex requests. Disabled by default.
#
# smart_model_routing:
# enabled: true
# max_simple_chars: 160
# max_simple_words: 28
# cheap_model:
# provider: openrouter
# model: google/gemini-2.5-flash
# =============================================================================
# Git Worktree Isolation
# =============================================================================
@@ -76,8 +90,9 @@ model:
# - Messaging (Telegram/Discord): Uses MESSAGING_CWD from .env (default: home)
terminal:
backend: "local"
cwd: "." # For local backend: "." = current directory. Ignored for remote backends.
cwd: "." # For local backend: "." = current directory. Ignored for remote backends unless a backend documents otherwise.
timeout: 180
docker_mount_cwd_to_workspace: false # SECURITY: off by default. Opt in to mount the launch cwd into Docker /workspace.
lifetime_seconds: 300
# sudo_password: "" # Enable sudo commands (pipes via sudo -S) - SECURITY WARNING: plaintext!
@@ -107,6 +122,13 @@ terminal:
# timeout: 180
# lifetime_seconds: 300
# docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
# docker_mount_cwd_to_workspace: true # Explicit opt-in: mount your launch cwd into /workspace
# # Optional: explicitly forward selected env vars into Docker.
# # These values come from your current shell first, then ~/.hermes/.env.
# # Warning: anything forwarded here is visible to commands run in the container.
# docker_forward_env:
# - "GITHUB_TOKEN"
# - "NPM_TOKEN"
# -----------------------------------------------------------------------------
# OPTION 4: Singularity/Apptainer container
@@ -333,6 +355,25 @@ session_reset:
idle_minutes: 1440 # Inactivity timeout in minutes (default: 1440 = 24 hours)
at_hour: 4 # Daily reset hour, 0-23 local time (default: 4 AM)
# When true, group/channel chats use one session per participant when the platform
# provides a user ID. This is the secure default and prevents users in the same
# room from sharing context, interrupts, and token costs. Set false only if you
# explicitly want one shared "room brain" per group/channel.
group_sessions_per_user: true
# ─────────────────────────────────────────────────────────────────────────────
# Gateway Streaming
# ─────────────────────────────────────────────────────────────────────────────
# Stream tokens to messaging platforms in real-time. The bot sends a message
# on first token, then progressively edits it as more tokens arrive.
# Disabled by default — enable to try the streaming UX on Telegram/Discord/Slack.
streaming:
enabled: false
# transport: edit # "edit" = progressive editMessageText
# edit_interval: 0.3 # seconds between message edits
# buffer_threshold: 40 # chars before forcing an edit flush
# cursor: " ▉" # cursor shown during streaming
# =============================================================================
# Skills Configuration
# =============================================================================
@@ -694,6 +735,12 @@ display:
# Toggle at runtime with /reasoning show or /reasoning hide.
show_reasoning: false
# Stream tokens to the terminal as they arrive instead of waiting for the
# full response. The response box opens on first token and text appears
# line-by-line. Tool calls are still captured silently.
# Disabled by default — enable to try the streaming UX.
streaming: false
# ───────────────────────────────────────────────────────────────────────────
# Skin / Theme
# ───────────────────────────────────────────────────────────────────────────
@@ -734,3 +781,14 @@ display:
# tool_prefix: "╎" # Tool output line prefix (default: ┊)
#
skin: default
# =============================================================================
# Privacy
# =============================================================================
# privacy:
# # Redact PII from the LLM context prompt.
# # When true, phone numbers are stripped and user/chat IDs are replaced
# # with deterministic hashes before being sent to the model.
# # Names and usernames are NOT affected (user-chosen, publicly visible).
# # Routing/delivery still uses the original values internally.
# redact_pii: false
+1290 -155
View File
File diff suppressed because it is too large Load Diff
+96 -6
View File
@@ -5,7 +5,9 @@ Jobs are stored in ~/.hermes/cron/jobs.json
Output is saved to ~/.hermes/cron/output/{job_id}/{timestamp}.md
"""
import copy
import json
import logging
import tempfile
import os
import re
@@ -14,6 +16,8 @@ from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional, Dict, List, Any
logger = logging.getLogger(__name__)
from hermes_time import now as _hermes_now
try:
@@ -30,6 +34,7 @@ HERMES_DIR = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
CRON_DIR = HERMES_DIR / "cron"
JOBS_FILE = CRON_DIR / "jobs.json"
OUTPUT_DIR = CRON_DIR / "output"
ONESHOT_GRACE_SECONDS = 120
def _normalize_skill_list(skill: Optional[str] = None, skills: Optional[Any] = None) -> List[str]:
@@ -164,6 +169,10 @@ def parse_schedule(schedule: str) -> Dict[str, Any]:
try:
# Parse and validate
dt = datetime.fromisoformat(schedule.replace('Z', '+00:00'))
# Make naive timestamps timezone-aware at parse time so the stored
# value doesn't depend on the system timezone matching at check time.
if dt.tzinfo is None:
dt = dt.astimezone() # Interpret as local timezone
return {
"kind": "once",
"run_at": dt.isoformat(),
@@ -212,6 +221,33 @@ def _ensure_aware(dt: datetime) -> datetime:
return dt.astimezone(target_tz)
def _recoverable_oneshot_run_at(
schedule: Dict[str, Any],
now: datetime,
*,
last_run_at: Optional[str] = None,
) -> Optional[str]:
"""Return a one-shot run time if it is still eligible to fire.
One-shot jobs get a small grace window so jobs created a few seconds after
their requested minute still run on the next tick. Once a one-shot has
already run, it is never eligible again.
"""
if schedule.get("kind") != "once":
return None
if last_run_at:
return None
run_at = schedule.get("run_at")
if not run_at:
return None
run_at_dt = _ensure_aware(datetime.fromisoformat(run_at))
if run_at_dt >= now - timedelta(seconds=ONESHOT_GRACE_SECONDS):
return run_at
return None
def compute_next_run(schedule: Dict[str, Any], last_run_at: Optional[str] = None) -> Optional[str]:
"""
Compute the next run time for a schedule.
@@ -221,9 +257,7 @@ def compute_next_run(schedule: Dict[str, Any], last_run_at: Optional[str] = None
now = _hermes_now()
if schedule["kind"] == "once":
run_at = _ensure_aware(datetime.fromisoformat(schedule["run_at"]))
# If in the future, return it; if in the past, no more runs
return schedule["run_at"] if run_at > now else None
return _recoverable_oneshot_run_at(schedule, now, last_run_at=last_run_at)
elif schedule["kind"] == "interval":
minutes = schedule["minutes"]
@@ -528,10 +562,18 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None):
def get_due_jobs() -> List[Dict[str, Any]]:
"""Get all jobs that are due to run now."""
"""Get all jobs that are due to run now.
For recurring jobs (cron/interval), if the scheduled time is stale
(more than one period in the past, e.g. because the gateway was down),
the job is fast-forwarded to the next future run instead of firing
immediately. This prevents a burst of missed jobs on gateway restart.
"""
now = _hermes_now()
jobs = [_apply_skill_fields(j) for j in load_jobs()]
raw_jobs = load_jobs()
jobs = [_apply_skill_fields(j) for j in copy.deepcopy(raw_jobs)]
due = []
needs_save = False
for job in jobs:
if not job.get("enabled", True):
@@ -539,12 +581,60 @@ def get_due_jobs() -> List[Dict[str, Any]]:
next_run = job.get("next_run_at")
if not next_run:
continue
recovered_next = _recoverable_oneshot_run_at(
job.get("schedule", {}),
now,
last_run_at=job.get("last_run_at"),
)
if not recovered_next:
continue
job["next_run_at"] = recovered_next
next_run = recovered_next
logger.info(
"Job '%s' had no next_run_at; recovering one-shot run at %s",
job.get("name", job["id"]),
recovered_next,
)
for rj in raw_jobs:
if rj["id"] == job["id"]:
rj["next_run_at"] = recovered_next
needs_save = True
break
next_run_dt = _ensure_aware(datetime.fromisoformat(next_run))
if next_run_dt <= now:
schedule = job.get("schedule", {})
kind = schedule.get("kind")
# For recurring jobs, check if the scheduled time is stale
# (gateway was down and missed the window). Fast-forward to
# the next future occurrence instead of firing a stale run.
if kind in ("cron", "interval") and (now - next_run_dt).total_seconds() > 120:
# More than 2 minutes late — this is a missed run, not a current one.
# Recompute next_run_at to the next future occurrence.
new_next = compute_next_run(schedule, now.isoformat())
if new_next:
logger.info(
"Job '%s' missed its scheduled time (%s). "
"Fast-forwarding to next run: %s",
job.get("name", job["id"]),
next_run,
new_next,
)
# Update the job in storage
for rj in raw_jobs:
if rj["id"] == job["id"]:
rj["next_run_at"] = new_next
needs_save = True
break
continue # Skip this run
due.append(job)
if needs_save:
save_jobs(raw_jobs)
return due
+67 -9
View File
@@ -37,6 +37,11 @@ sys.path.insert(0, str(Path(__file__).parent.parent))
from cron.jobs import get_due_jobs, mark_job_run, save_job_output
# Sentinel: when a cron agent has nothing new to report, it can start its
# response with this marker to suppress delivery. Output is still saved
# locally for audit.
SILENT_MARKER = "[SILENT]"
# Resolve Hermes home directory (respects HERMES_HOME override)
_hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
@@ -131,7 +136,12 @@ def _deliver_result(job: dict, content: str) -> None:
"slack": Platform.SLACK,
"whatsapp": Platform.WHATSAPP,
"signal": Platform.SIGNAL,
"matrix": Platform.MATRIX,
"mattermost": Platform.MATTERMOST,
"homeassistant": Platform.HOMEASSISTANT,
"dingtalk": Platform.DINGTALK,
"email": Platform.EMAIL,
"sms": Platform.SMS,
}
platform = platform_map.get(platform_name.lower())
if not platform:
@@ -179,6 +189,17 @@ def _build_job_prompt(job: dict) -> str:
"""Build the effective prompt for a cron job, optionally loading one or more skills first."""
prompt = job.get("prompt", "")
skills = job.get("skills")
# Always prepend [SILENT] guidance so the cron agent can suppress
# delivery when it has nothing new or noteworthy to report.
silent_hint = (
"[SYSTEM: If you have nothing new or noteworthy to report, respond "
"with exactly \"[SILENT]\" (optionally followed by a brief internal "
"note). This suppresses delivery to the user while still saving "
"output locally. Only use [SILENT] when there are genuinely no "
"changes worth reporting.]\n\n"
)
prompt = silent_hint + prompt
if skills is None:
legacy = job.get("skill")
skills = [legacy] if legacy else []
@@ -190,11 +211,14 @@ def _build_job_prompt(job: dict) -> str:
from tools.skills_tool import skill_view
parts = []
skipped: list[str] = []
for skill_name in skill_names:
loaded = json.loads(skill_view(skill_name))
if not loaded.get("success"):
error = loaded.get("error") or f"Failed to load skill '{skill_name}'"
raise RuntimeError(error)
logger.warning("Cron job '%s': skill not found, skipping — %s", job.get("name", job.get("id")), error)
skipped.append(skill_name)
continue
content = str(loaded.get("content") or "").strip()
if parts:
@@ -207,6 +231,15 @@ def _build_job_prompt(job: dict) -> str:
]
)
if skipped:
notice = (
f"[SYSTEM: The following skill(s) were listed for this job but could not be found "
f"and were skipped: {', '.join(skipped)}. "
f"Start your response with a brief notice so the user is aware, e.g.: "
f"'⚠️ Skill(s) not found and skipped: {', '.join(skipped)}']"
)
parts.insert(0, notice)
if prompt:
parts.extend(["", f"The user has provided the following instruction alongside the skill invocation: {prompt}"])
return "\n".join(parts)
@@ -315,6 +348,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
# Provider routing
pr = _cfg.get("provider_routing", {})
smart_routing = _cfg.get("smart_model_routing", {}) or {}
from hermes_cli.runtime_provider import (
resolve_runtime_provider,
@@ -331,12 +365,29 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
message = format_runtime_provider_error(exc)
raise RuntimeError(message) from exc
from agent.smart_model_routing import resolve_turn_route
turn_route = resolve_turn_route(
prompt,
smart_routing,
{
"model": model,
"api_key": runtime.get("api_key"),
"base_url": runtime.get("base_url"),
"provider": runtime.get("provider"),
"api_mode": runtime.get("api_mode"),
"command": runtime.get("command"),
"args": list(runtime.get("args") or []),
},
)
agent = AIAgent(
model=model,
api_key=runtime.get("api_key"),
base_url=runtime.get("base_url"),
provider=runtime.get("provider"),
api_mode=runtime.get("api_mode"),
model=turn_route["model"],
api_key=turn_route["runtime"].get("api_key"),
base_url=turn_route["runtime"].get("base_url"),
provider=turn_route["runtime"].get("provider"),
api_mode=turn_route["runtime"].get("api_mode"),
acp_command=turn_route["runtime"].get("command"),
acp_args=turn_route["runtime"].get("args"),
max_iterations=max_iterations,
reasoning_config=reasoning_config,
prefill_messages=prefill_messages,
@@ -344,7 +395,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
providers_ignored=pr.get("ignore"),
providers_order=pr.get("order"),
provider_sort=pr.get("sort"),
disabled_toolsets=["cronjob"],
disabled_toolsets=["cronjob", "messaging", "clarify"],
quiet_mode=True,
platform="cron",
session_id=f"cron_{job_id}_{_hermes_now().strftime('%Y%m%d_%H%M%S')}",
@@ -465,9 +516,16 @@ def tick(verbose: bool = True) -> int:
if verbose:
logger.info("Output saved to: %s", output_file)
# Deliver the final response to the origin/target chat
# Deliver the final response to the origin/target chat.
# If the agent responded with [SILENT], skip delivery (but
# output is already saved above). Failed jobs always deliver.
deliver_content = final_response if success else f"⚠️ Cron job '{job.get('name', job['id'])}' failed:\n{error}"
if deliver_content:
should_deliver = bool(deliver_content)
if should_deliver and success and deliver_content.strip().upper().startswith(SILENT_MARKER):
logger.info("Job '%s': agent returned %s — skipping delivery", job["id"], SILENT_MARKER)
should_deliver = False
if should_deliver:
try:
_deliver_result(job, deliver_content)
except Exception as de:
@@ -0,0 +1,608 @@
# Pricing Accuracy Architecture
Date: 2026-03-16
## Goal
Hermes should only show dollar costs when they are backed by an official source for the user's actual billing path.
This design replaces the current static, heuristic pricing flow in:
- `run_agent.py`
- `agent/usage_pricing.py`
- `agent/insights.py`
- `cli.py`
with a provider-aware pricing system that:
- handles cache billing correctly
- distinguishes `actual` vs `estimated` vs `included` vs `unknown`
- reconciles post-hoc costs when providers expose authoritative billing data
- supports direct providers, OpenRouter, subscriptions, enterprise pricing, and custom endpoints
## Problems In The Current Design
Current Hermes behavior has four structural issues:
1. It stores only `prompt_tokens` and `completion_tokens`, which is insufficient for providers that bill cache reads and cache writes separately.
2. It uses a static model price table and fuzzy heuristics, which can drift from current official pricing.
3. It assumes public API list pricing matches the user's real billing path.
4. It has no distinction between live estimates and reconciled billed cost.
## Design Principles
1. Normalize usage before pricing.
2. Never fold cached tokens into plain input cost.
3. Track certainty explicitly.
4. Treat the billing path as part of the model identity.
5. Prefer official machine-readable sources over scraped docs.
6. Use post-hoc provider cost APIs when available.
7. Show `n/a` rather than inventing precision.
## High-Level Architecture
The new system has four layers:
1. `usage_normalization`
Converts raw provider usage into a canonical usage record.
2. `pricing_source_resolution`
Determines the billing path, source of truth, and applicable pricing source.
3. `cost_estimation_and_reconciliation`
Produces an immediate estimate when possible, then replaces or annotates it with actual billed cost later.
4. `presentation`
`/usage`, `/insights`, and the status bar display cost with certainty metadata.
## Canonical Usage Record
Add a canonical usage model that every provider path maps into before any pricing math happens.
Suggested structure:
```python
@dataclass
class CanonicalUsage:
provider: str
billing_provider: str
model: str
billing_route: str
input_tokens: int = 0
output_tokens: int = 0
cache_read_tokens: int = 0
cache_write_tokens: int = 0
reasoning_tokens: int = 0
request_count: int = 1
raw_usage: dict[str, Any] | None = None
raw_usage_fields: dict[str, str] | None = None
computed_fields: set[str] | None = None
provider_request_id: str | None = None
provider_generation_id: str | None = None
provider_response_id: str | None = None
```
Rules:
- `input_tokens` means non-cached input only.
- `cache_read_tokens` and `cache_write_tokens` are never merged into `input_tokens`.
- `output_tokens` excludes cache metrics.
- `reasoning_tokens` is telemetry unless a provider officially bills it separately.
This is the same normalization pattern used by `opencode`, extended with provenance and reconciliation ids.
## Provider Normalization Rules
### OpenAI Direct
Source usage fields:
- `prompt_tokens`
- `completion_tokens`
- `prompt_tokens_details.cached_tokens`
Normalization:
- `cache_read_tokens = cached_tokens`
- `input_tokens = prompt_tokens - cached_tokens`
- `cache_write_tokens = 0` unless OpenAI exposes it in the relevant route
- `output_tokens = completion_tokens`
### Anthropic Direct
Source usage fields:
- `input_tokens`
- `output_tokens`
- `cache_read_input_tokens`
- `cache_creation_input_tokens`
Normalization:
- `input_tokens = input_tokens`
- `output_tokens = output_tokens`
- `cache_read_tokens = cache_read_input_tokens`
- `cache_write_tokens = cache_creation_input_tokens`
### OpenRouter
Estimate-time usage normalization should use the response usage payload with the same rules as the underlying provider when possible.
Reconciliation-time records should also store:
- OpenRouter generation id
- native token fields when available
- `total_cost`
- `cache_discount`
- `upstream_inference_cost`
- `is_byok`
### Gemini / Vertex
Use official Gemini or Vertex usage fields where available.
If cached content tokens are exposed:
- map them to `cache_read_tokens`
If a route exposes no cache creation metric:
- store `cache_write_tokens = 0`
- preserve the raw usage payload for later extension
### DeepSeek And Other Direct Providers
Normalize only the fields that are officially exposed.
If a provider does not expose cache buckets:
- do not infer them unless the provider explicitly documents how to derive them
### Subscription / Included-Cost Routes
These still use the canonical usage model.
Tokens are tracked normally. Cost depends on billing mode, not on whether usage exists.
## Billing Route Model
Hermes must stop keying pricing solely by `model`.
Introduce a billing route descriptor:
```python
@dataclass
class BillingRoute:
provider: str
base_url: str | None
model: str
billing_mode: str
organization_hint: str | None = None
```
`billing_mode` values:
- `official_cost_api`
- `official_generation_api`
- `official_models_api`
- `official_docs_snapshot`
- `subscription_included`
- `user_override`
- `custom_contract`
- `unknown`
Examples:
- OpenAI direct API with Costs API access: `official_cost_api`
- Anthropic direct API with Usage & Cost API access: `official_cost_api`
- OpenRouter request before reconciliation: `official_models_api`
- OpenRouter request after generation lookup: `official_generation_api`
- GitHub Copilot style subscription route: `subscription_included`
- local OpenAI-compatible server: `unknown`
- enterprise contract with configured rates: `custom_contract`
## Cost Status Model
Every displayed cost should have:
```python
@dataclass
class CostResult:
amount_usd: Decimal | None
status: Literal["actual", "estimated", "included", "unknown"]
source: Literal[
"provider_cost_api",
"provider_generation_api",
"provider_models_api",
"official_docs_snapshot",
"user_override",
"custom_contract",
"none",
]
label: str
fetched_at: datetime | None
pricing_version: str | None
notes: list[str]
```
Presentation rules:
- `actual`: show dollar amount as final
- `estimated`: show dollar amount with estimate labeling
- `included`: show `included` or `$0.00 (included)` depending on UX choice
- `unknown`: show `n/a`
## Official Source Hierarchy
Resolve cost using this order:
1. Request-level or account-level official billed cost
2. Official machine-readable model pricing
3. Official docs snapshot
4. User override or custom contract
5. Unknown
The system must never skip to a lower level if a higher-confidence source exists for the current billing route.
## Provider-Specific Truth Rules
### OpenAI Direct
Preferred truth:
1. Costs API for reconciled spend
2. Official pricing page for live estimate
### Anthropic Direct
Preferred truth:
1. Usage & Cost API for reconciled spend
2. Official pricing docs for live estimate
### OpenRouter
Preferred truth:
1. `GET /api/v1/generation` for reconciled `total_cost`
2. `GET /api/v1/models` pricing for live estimate
Do not use underlying provider public pricing as the source of truth for OpenRouter billing.
### Gemini / Vertex
Preferred truth:
1. official billing export or billing API for reconciled spend when available for the route
2. official pricing docs for estimate
### DeepSeek
Preferred truth:
1. official machine-readable cost source if available in the future
2. official pricing docs snapshot today
### Subscription-Included Routes
Preferred truth:
1. explicit route config marking the model as included in subscription
These should display `included`, not an API list-price estimate.
### Custom Endpoint / Local Model
Preferred truth:
1. user override
2. custom contract config
3. unknown
These should default to `unknown`.
## Pricing Catalog
Replace the current `MODEL_PRICING` dict with a richer pricing catalog.
Suggested record:
```python
@dataclass
class PricingEntry:
provider: str
route_pattern: str
model_pattern: str
input_cost_per_million: Decimal | None = None
output_cost_per_million: Decimal | None = None
cache_read_cost_per_million: Decimal | None = None
cache_write_cost_per_million: Decimal | None = None
request_cost: Decimal | None = None
image_cost: Decimal | None = None
source: str = "official_docs_snapshot"
source_url: str | None = None
fetched_at: datetime | None = None
pricing_version: str | None = None
```
The catalog should be route-aware:
- `openai:gpt-5`
- `anthropic:claude-opus-4-6`
- `openrouter:anthropic/claude-opus-4.6`
- `copilot:gpt-4o`
This avoids conflating direct-provider billing with aggregator billing.
## Pricing Sync Architecture
Introduce a pricing sync subsystem instead of manually maintaining a single hardcoded table.
Suggested modules:
- `agent/pricing/catalog.py`
- `agent/pricing/sources.py`
- `agent/pricing/sync.py`
- `agent/pricing/reconcile.py`
- `agent/pricing/types.py`
### Sync Sources
- OpenRouter models API
- official provider docs snapshots where no API exists
- user overrides from config
### Sync Output
Cache pricing entries locally with:
- source URL
- fetch timestamp
- version/hash
- confidence/source type
### Sync Frequency
- startup warm cache
- background refresh every 6 to 24 hours depending on source
- manual `hermes pricing sync`
## Reconciliation Architecture
Live requests may produce only an estimate initially. Hermes should reconcile them later when a provider exposes actual billed cost.
Suggested flow:
1. Agent call completes.
2. Hermes stores canonical usage plus reconciliation ids.
3. Hermes computes an immediate estimate if a pricing source exists.
4. A reconciliation worker fetches actual cost when supported.
5. Session and message records are updated with `actual` cost.
This can run:
- inline for cheap lookups
- asynchronously for delayed provider accounting
## Persistence Changes
Session storage should stop storing only aggregate prompt/completion totals.
Add fields for both usage and cost certainty:
- `input_tokens`
- `output_tokens`
- `cache_read_tokens`
- `cache_write_tokens`
- `reasoning_tokens`
- `estimated_cost_usd`
- `actual_cost_usd`
- `cost_status`
- `cost_source`
- `pricing_version`
- `billing_provider`
- `billing_mode`
If schema expansion is too large for one PR, add a new pricing events table:
```text
session_cost_events
id
session_id
request_id
provider
model
billing_mode
input_tokens
output_tokens
cache_read_tokens
cache_write_tokens
estimated_cost_usd
actual_cost_usd
cost_status
cost_source
pricing_version
created_at
updated_at
```
## Hermes Touchpoints
### `run_agent.py`
Current responsibility:
- parse raw provider usage
- update session token counters
New responsibility:
- build `CanonicalUsage`
- update canonical counters
- store reconciliation ids
- emit usage event to pricing subsystem
### `agent/usage_pricing.py`
Current responsibility:
- static lookup table
- direct cost arithmetic
New responsibility:
- move or replace with pricing catalog facade
- no fuzzy model-family heuristics
- no direct pricing without billing-route context
### `cli.py`
Current responsibility:
- compute session cost directly from prompt/completion totals
New responsibility:
- display `CostResult`
- show status badges:
- `actual`
- `estimated`
- `included`
- `n/a`
### `agent/insights.py`
Current responsibility:
- recompute historical estimates from static pricing
New responsibility:
- aggregate stored pricing events
- prefer actual cost over estimate
- surface estimates only when reconciliation is unavailable
## UX Rules
### Status Bar
Show one of:
- `$1.42`
- `~$1.42`
- `included`
- `cost n/a`
Where:
- `$1.42` means `actual`
- `~$1.42` means `estimated`
- `included` means subscription-backed or explicitly zero-cost route
- `cost n/a` means unknown
### `/usage`
Show:
- token buckets
- estimated cost
- actual cost if available
- cost status
- pricing source
### `/insights`
Aggregate:
- actual cost totals
- estimated-only totals
- unknown-cost sessions count
- included-cost sessions count
## Config And Overrides
Add user-configurable pricing overrides in config:
```yaml
pricing:
mode: hybrid
sync_on_startup: true
sync_interval_hours: 12
overrides:
- provider: openrouter
model: anthropic/claude-opus-4.6
billing_mode: custom_contract
input_cost_per_million: 4.25
output_cost_per_million: 22.0
cache_read_cost_per_million: 0.5
cache_write_cost_per_million: 6.0
included_routes:
- provider: copilot
model: "*"
- provider: codex-subscription
model: "*"
```
Overrides must win over catalog defaults for the matching billing route.
## Rollout Plan
### Phase 1
- add canonical usage model
- split cache token buckets in `run_agent.py`
- stop pricing cache-inflated prompt totals
- preserve current UI with improved backend math
### Phase 2
- add route-aware pricing catalog
- integrate OpenRouter models API sync
- add `estimated` vs `included` vs `unknown`
### Phase 3
- add reconciliation for OpenRouter generation cost
- add actual cost persistence
- update `/insights` to prefer actual cost
### Phase 4
- add direct OpenAI and Anthropic reconciliation paths
- add user overrides and contract pricing
- add pricing sync CLI command
## Testing Strategy
Add tests for:
- OpenAI cached token subtraction
- Anthropic cache read/write separation
- OpenRouter estimated vs actual reconciliation
- subscription-backed models showing `included`
- custom endpoints showing `n/a`
- override precedence
- stale catalog fallback behavior
Current tests that assume heuristic pricing should be replaced with route-aware expectations.
## Non-Goals
- exact enterprise billing reconstruction without an official source or user override
- backfilling perfect historical cost for old sessions that lack cache bucket data
- scraping arbitrary provider web pages at request time
## Recommendation
Do not expand the existing `MODEL_PRICING` dict.
That path cannot satisfy the product requirement. Hermes should instead migrate to:
- canonical usage normalization
- route-aware pricing sources
- estimate-then-reconcile cost lifecycle
- explicit certainty states in the UI
This is the minimum architecture that makes the statement "Hermes pricing is backed by official sources where possible, and otherwise clearly labeled" defensible.
+1 -1
View File
@@ -63,7 +63,7 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
logger.warning("Channel directory: failed to build %s: %s", platform.value, e)
# Telegram, WhatsApp & Signal can't enumerate chats -- pull from session history
for plat_name in ("telegram", "whatsapp", "signal", "email"):
for plat_name in ("telegram", "whatsapp", "signal", "email", "sms"):
if plat_name not in platforms:
platforms[plat_name] = _build_from_sessions(plat_name)
+276 -39
View File
@@ -32,6 +32,15 @@ def _coerce_bool(value: Any, default: bool = True) -> bool:
return bool(value)
def _normalize_unauthorized_dm_behavior(value: Any, default: str = "pair") -> str:
"""Normalize unauthorized DM behavior to a supported value."""
if isinstance(value, str):
normalized = value.strip().lower()
if normalized in {"pair", "ignore"}:
return normalized
return default
class Platform(Enum):
"""Supported messaging platforms."""
LOCAL = "local"
@@ -40,8 +49,14 @@ class Platform(Enum):
WHATSAPP = "whatsapp"
SLACK = "slack"
SIGNAL = "signal"
MATTERMOST = "mattermost"
MATRIX = "matrix"
HOMEASSISTANT = "homeassistant"
EMAIL = "email"
SMS = "sms"
DINGTALK = "dingtalk"
API_SERVER = "api_server"
WEBHOOK = "webhook"
@dataclass
@@ -97,10 +112,11 @@ class SessionResetPolicy:
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "SessionResetPolicy":
# Handle both missing keys and explicit null values (YAML null → None)
mode = data.get("mode")
at_hour = data.get("at_hour")
idle_minutes = data.get("idle_minutes")
return cls(
mode=data.get("mode", "both"),
mode=mode if mode is not None else "both",
at_hour=at_hour if at_hour is not None else 4,
idle_minutes=idle_minutes if idle_minutes is not None else 1440,
)
@@ -145,6 +161,37 @@ class PlatformConfig:
)
@dataclass
class StreamingConfig:
"""Configuration for real-time token streaming to messaging platforms."""
enabled: bool = False
transport: str = "edit" # "edit" (progressive editMessageText) or "off"
edit_interval: float = 0.3 # Seconds between message edits
buffer_threshold: int = 40 # Chars before forcing an edit
cursor: str = "" # Cursor shown during streaming
def to_dict(self) -> Dict[str, Any]:
return {
"enabled": self.enabled,
"transport": self.transport,
"edit_interval": self.edit_interval,
"buffer_threshold": self.buffer_threshold,
"cursor": self.cursor,
}
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "StreamingConfig":
if not data:
return cls()
return cls(
enabled=data.get("enabled", False),
transport=data.get("transport", "edit"),
edit_interval=float(data.get("edit_interval", 0.3)),
buffer_threshold=int(data.get("buffer_threshold", 40)),
cursor=data.get("cursor", ""),
)
@dataclass
class GatewayConfig:
"""
@@ -174,7 +221,16 @@ class GatewayConfig:
# STT settings
stt_enabled: bool = True # Whether to auto-transcribe inbound voice messages
# Session isolation in shared chats
group_sessions_per_user: bool = True # Isolate group/channel sessions per participant when user IDs are available
# Unauthorized DM policy
unauthorized_dm_behavior: str = "pair" # "pair" or "ignore"
# Streaming configuration
streaming: StreamingConfig = field(default_factory=StreamingConfig)
def get_connected_platforms(self) -> List[Platform]:
"""Return list of platforms that are enabled and configured."""
connected = []
@@ -193,6 +249,15 @@ class GatewayConfig:
# Email uses extra dict for config (address + imap_host + smtp_host)
elif platform == Platform.EMAIL and config.extra.get("address"):
connected.append(platform)
# SMS uses api_key (Twilio auth token) — SID checked via env
elif platform == Platform.SMS and os.getenv("TWILIO_ACCOUNT_SID"):
connected.append(platform)
# API Server uses enabled flag only (no token needed)
elif platform == Platform.API_SERVER:
connected.append(platform)
# Webhook uses enabled flag only (secrets are per-route)
elif platform == Platform.WEBHOOK:
connected.append(platform)
return connected
def get_home_channel(self, platform: Platform) -> Optional[HomeChannel]:
@@ -239,6 +304,9 @@ class GatewayConfig:
"sessions_dir": str(self.sessions_dir),
"always_log_local": self.always_log_local,
"stt_enabled": self.stt_enabled,
"group_sessions_per_user": self.group_sessions_per_user,
"unauthorized_dm_behavior": self.unauthorized_dm_behavior,
"streaming": self.streaming.to_dict(),
}
@classmethod
@@ -279,6 +347,12 @@ class GatewayConfig:
if stt_enabled is None:
stt_enabled = data.get("stt", {}).get("enabled") if isinstance(data.get("stt"), dict) else None
group_sessions_per_user = data.get("group_sessions_per_user")
unauthorized_dm_behavior = _normalize_unauthorized_dm_behavior(
data.get("unauthorized_dm_behavior"),
"pair",
)
return cls(
platforms=platforms,
default_reset_policy=default_policy,
@@ -289,63 +363,131 @@ class GatewayConfig:
sessions_dir=sessions_dir,
always_log_local=data.get("always_log_local", True),
stt_enabled=_coerce_bool(stt_enabled, True),
group_sessions_per_user=_coerce_bool(group_sessions_per_user, True),
unauthorized_dm_behavior=unauthorized_dm_behavior,
streaming=StreamingConfig.from_dict(data.get("streaming", {})),
)
def get_unauthorized_dm_behavior(self, platform: Optional[Platform] = None) -> str:
"""Return the effective unauthorized-DM behavior for a platform."""
if platform:
platform_cfg = self.platforms.get(platform)
if platform_cfg and "unauthorized_dm_behavior" in platform_cfg.extra:
return _normalize_unauthorized_dm_behavior(
platform_cfg.extra.get("unauthorized_dm_behavior"),
self.unauthorized_dm_behavior,
)
return self.unauthorized_dm_behavior
def load_gateway_config() -> GatewayConfig:
"""
Load gateway configuration from multiple sources.
Priority (highest to lowest):
1. Environment variables
2. ~/.hermes/gateway.json
3. cli-config.yaml gateway section
4. Defaults
2. ~/.hermes/config.yaml (primary user-facing config)
3. ~/.hermes/gateway.json (legacy — provides defaults under config.yaml)
4. Built-in defaults
"""
config = GatewayConfig()
# Try loading from ~/.hermes/gateway.json
_home = get_hermes_home()
gateway_config_path = _home / "gateway.json"
if gateway_config_path.exists():
try:
with open(gateway_config_path, "r", encoding="utf-8") as f:
data = json.load(f)
config = GatewayConfig.from_dict(data)
except Exception as e:
print(f"[gateway] Warning: Failed to load {gateway_config_path}: {e}")
gw_data: dict = {}
# Bridge session_reset from config.yaml (the user-facing config file)
# into the gateway config. config.yaml takes precedence over gateway.json
# for session reset policy since that's where hermes setup writes it.
# Legacy fallback: gateway.json provides the base layer.
# config.yaml keys always win when both specify the same setting.
gateway_json_path = _home / "gateway.json"
if gateway_json_path.exists():
try:
with open(gateway_json_path, "r", encoding="utf-8") as f:
gw_data = json.load(f) or {}
logger.info(
"Loaded legacy %s — consider moving settings to config.yaml",
gateway_json_path,
)
except Exception as e:
logger.warning("Failed to load %s: %s", gateway_json_path, e)
# Primary source: config.yaml
try:
import yaml
config_yaml_path = _home / "config.yaml"
if config_yaml_path.exists():
with open(config_yaml_path, encoding="utf-8") as f:
yaml_cfg = yaml.safe_load(f) or {}
# Map config.yaml keys → GatewayConfig.from_dict() schema.
# Each key overwrites whatever gateway.json may have set.
sr = yaml_cfg.get("session_reset")
if sr and isinstance(sr, dict):
config.default_reset_policy = SessionResetPolicy.from_dict(sr)
gw_data["default_reset_policy"] = sr
# Bridge quick commands from config.yaml into gateway runtime config.
# config.yaml is the user-facing config source, so when present it
# should override gateway.json for this setting.
qc = yaml_cfg.get("quick_commands")
if qc is not None:
if isinstance(qc, dict):
config.quick_commands = qc
gw_data["quick_commands"] = qc
else:
logger.warning("Ignoring invalid quick_commands in config.yaml (expected mapping, got %s)", type(qc).__name__)
logger.warning(
"Ignoring invalid quick_commands in config.yaml "
"(expected mapping, got %s)",
type(qc).__name__,
)
# Bridge STT enable/disable from config.yaml into gateway runtime.
# This keeps the gateway aligned with the user-facing config source.
stt_cfg = yaml_cfg.get("stt")
if isinstance(stt_cfg, dict) and "enabled" in stt_cfg:
config.stt_enabled = _coerce_bool(stt_cfg.get("enabled"), True)
if isinstance(stt_cfg, dict):
gw_data["stt"] = stt_cfg
# Bridge discord settings from config.yaml to env vars
# (env vars take precedence — only set if not already defined)
if "group_sessions_per_user" in yaml_cfg:
gw_data["group_sessions_per_user"] = yaml_cfg["group_sessions_per_user"]
streaming_cfg = yaml_cfg.get("streaming")
if isinstance(streaming_cfg, dict):
gw_data["streaming"] = streaming_cfg
if "reset_triggers" in yaml_cfg:
gw_data["reset_triggers"] = yaml_cfg["reset_triggers"]
if "always_log_local" in yaml_cfg:
gw_data["always_log_local"] = yaml_cfg["always_log_local"]
if "unauthorized_dm_behavior" in yaml_cfg:
gw_data["unauthorized_dm_behavior"] = _normalize_unauthorized_dm_behavior(
yaml_cfg.get("unauthorized_dm_behavior"),
"pair",
)
# Bridge per-platform settings from config.yaml into gw_data
platforms_data = gw_data.setdefault("platforms", {})
if not isinstance(platforms_data, dict):
platforms_data = {}
gw_data["platforms"] = platforms_data
for plat in Platform:
if plat == Platform.LOCAL:
continue
platform_cfg = yaml_cfg.get(plat.value)
if not isinstance(platform_cfg, dict):
continue
# Collect bridgeable keys from this platform section
bridged = {}
if "unauthorized_dm_behavior" in platform_cfg:
bridged["unauthorized_dm_behavior"] = _normalize_unauthorized_dm_behavior(
platform_cfg.get("unauthorized_dm_behavior"),
gw_data.get("unauthorized_dm_behavior", "pair"),
)
if "reply_prefix" in platform_cfg:
bridged["reply_prefix"] = platform_cfg["reply_prefix"]
if not bridged:
continue
plat_data = platforms_data.setdefault(plat.value, {})
if not isinstance(plat_data, dict):
plat_data = {}
platforms_data[plat.value] = plat_data
extra = plat_data.setdefault("extra", {})
if not isinstance(extra, dict):
extra = {}
plat_data["extra"] = extra
extra.update(bridged)
# Discord settings → env vars (env vars take precedence)
discord_cfg = yaml_cfg.get("discord", {})
if isinstance(discord_cfg, dict):
if "require_mention" in discord_cfg and not os.getenv("DISCORD_REQUIRE_MENTION"):
@@ -360,6 +502,8 @@ def load_gateway_config() -> GatewayConfig:
except Exception:
pass
config = GatewayConfig.from_dict(gw_data)
# Override with environment variables
_apply_env_overrides(config)
@@ -385,6 +529,8 @@ def load_gateway_config() -> GatewayConfig:
Platform.TELEGRAM: "TELEGRAM_BOT_TOKEN",
Platform.DISCORD: "DISCORD_BOT_TOKEN",
Platform.SLACK: "SLACK_BOT_TOKEN",
Platform.MATTERMOST: "MATTERMOST_TOKEN",
Platform.MATRIX: "MATRIX_ACCESS_TOKEN",
}
for platform, pconfig in config.platforms.items():
if not pconfig.enabled:
@@ -478,6 +624,53 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
name=os.getenv("SIGNAL_HOME_CHANNEL_NAME", "Home"),
)
# Mattermost
mattermost_token = os.getenv("MATTERMOST_TOKEN")
if mattermost_token:
mattermost_url = os.getenv("MATTERMOST_URL", "")
if not mattermost_url:
logger.warning("MATTERMOST_TOKEN set but MATTERMOST_URL is missing")
if Platform.MATTERMOST not in config.platforms:
config.platforms[Platform.MATTERMOST] = PlatformConfig()
config.platforms[Platform.MATTERMOST].enabled = True
config.platforms[Platform.MATTERMOST].token = mattermost_token
config.platforms[Platform.MATTERMOST].extra["url"] = mattermost_url
mattermost_home = os.getenv("MATTERMOST_HOME_CHANNEL")
if mattermost_home:
config.platforms[Platform.MATTERMOST].home_channel = HomeChannel(
platform=Platform.MATTERMOST,
chat_id=mattermost_home,
name=os.getenv("MATTERMOST_HOME_CHANNEL_NAME", "Home"),
)
# Matrix
matrix_token = os.getenv("MATRIX_ACCESS_TOKEN")
matrix_homeserver = os.getenv("MATRIX_HOMESERVER", "")
if matrix_token or os.getenv("MATRIX_PASSWORD"):
if not matrix_homeserver:
logger.warning("MATRIX_ACCESS_TOKEN/MATRIX_PASSWORD set but MATRIX_HOMESERVER is missing")
if Platform.MATRIX not in config.platforms:
config.platforms[Platform.MATRIX] = PlatformConfig()
config.platforms[Platform.MATRIX].enabled = True
if matrix_token:
config.platforms[Platform.MATRIX].token = matrix_token
config.platforms[Platform.MATRIX].extra["homeserver"] = matrix_homeserver
matrix_user = os.getenv("MATRIX_USER_ID", "")
if matrix_user:
config.platforms[Platform.MATRIX].extra["user_id"] = matrix_user
matrix_password = os.getenv("MATRIX_PASSWORD", "")
if matrix_password:
config.platforms[Platform.MATRIX].extra["password"] = matrix_password
matrix_e2ee = os.getenv("MATRIX_ENCRYPTION", "").lower() in ("true", "1", "yes")
config.platforms[Platform.MATRIX].extra["encryption"] = matrix_e2ee
matrix_home = os.getenv("MATRIX_HOME_ROOM")
if matrix_home:
config.platforms[Platform.MATRIX].home_channel = HomeChannel(
platform=Platform.MATRIX,
chat_id=matrix_home,
name=os.getenv("MATRIX_HOME_ROOM_NAME", "Home"),
)
# Home Assistant
hass_token = os.getenv("HASS_TOKEN")
if hass_token:
@@ -511,6 +704,56 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
name=os.getenv("EMAIL_HOME_ADDRESS_NAME", "Home"),
)
# SMS (Twilio)
twilio_sid = os.getenv("TWILIO_ACCOUNT_SID")
if twilio_sid:
if Platform.SMS not in config.platforms:
config.platforms[Platform.SMS] = PlatformConfig()
config.platforms[Platform.SMS].enabled = True
config.platforms[Platform.SMS].api_key = os.getenv("TWILIO_AUTH_TOKEN", "")
sms_home = os.getenv("SMS_HOME_CHANNEL")
if sms_home:
config.platforms[Platform.SMS].home_channel = HomeChannel(
platform=Platform.SMS,
chat_id=sms_home,
name=os.getenv("SMS_HOME_CHANNEL_NAME", "Home"),
)
# API Server
api_server_enabled = os.getenv("API_SERVER_ENABLED", "").lower() in ("true", "1", "yes")
api_server_key = os.getenv("API_SERVER_KEY", "")
api_server_port = os.getenv("API_SERVER_PORT")
api_server_host = os.getenv("API_SERVER_HOST")
if api_server_enabled or api_server_key:
if Platform.API_SERVER not in config.platforms:
config.platforms[Platform.API_SERVER] = PlatformConfig()
config.platforms[Platform.API_SERVER].enabled = True
if api_server_key:
config.platforms[Platform.API_SERVER].extra["key"] = api_server_key
if api_server_port:
try:
config.platforms[Platform.API_SERVER].extra["port"] = int(api_server_port)
except ValueError:
pass
if api_server_host:
config.platforms[Platform.API_SERVER].extra["host"] = api_server_host
# Webhook platform
webhook_enabled = os.getenv("WEBHOOK_ENABLED", "").lower() in ("true", "1", "yes")
webhook_port = os.getenv("WEBHOOK_PORT")
webhook_secret = os.getenv("WEBHOOK_SECRET", "")
if webhook_enabled:
if Platform.WEBHOOK not in config.platforms:
config.platforms[Platform.WEBHOOK] = PlatformConfig()
config.platforms[Platform.WEBHOOK].enabled = True
if webhook_port:
try:
config.platforms[Platform.WEBHOOK].extra["port"] = int(webhook_port)
except ValueError:
pass
if webhook_secret:
config.platforms[Platform.WEBHOOK].extra["secret"] = webhook_secret
# Session settings
idle_minutes = os.getenv("SESSION_IDLE_MINUTES")
if idle_minutes:
@@ -527,10 +770,4 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
pass
def save_gateway_config(config: GatewayConfig) -> None:
"""Save gateway configuration to ~/.hermes/gateway.json."""
gateway_config_path = get_hermes_home() / "gateway.json"
gateway_config_path.parent.mkdir(parents=True, exist_ok=True)
with open(gateway_config_path, "w", encoding="utf-8") as f:
json.dump(config.to_dict(), f, indent=2)
+3 -2
View File
@@ -8,8 +8,9 @@ Hooks are discovered from ~/.hermes/hooks/ directories, each containing:
Events:
- gateway:startup -- Gateway process starts
- session:start -- New session created
- session:reset -- User ran /new or /reset
- session:start -- New session created (first message of a new session)
- session:end -- Session ends (user ran /new or /reset)
- session:reset -- Session reset completed (new session entry created)
- agent:start -- Agent begins processing a message
- agent:step -- Each turn in the tool-calling loop
- agent:end -- Agent finishes processing
+790
View File
@@ -0,0 +1,790 @@
"""
OpenAI-compatible API server platform adapter.
Exposes an HTTP server with endpoints:
- POST /v1/chat/completions — OpenAI Chat Completions format (stateless)
- POST /v1/responses — OpenAI Responses API format (stateful via previous_response_id)
- GET /v1/responses/{response_id} — Retrieve a stored response
- DELETE /v1/responses/{response_id} — Delete a stored response
- GET /v1/models — lists hermes-agent as an available model
- GET /health — health check
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect to hermes-agent
through this adapter by pointing at http://localhost:8642/v1.
Requires:
- aiohttp (already available in the gateway)
"""
import asyncio
import collections
import json
import logging
import os
import time
import uuid
from typing import Any, Dict, List, Optional
try:
from aiohttp import web
AIOHTTP_AVAILABLE = True
except ImportError:
AIOHTTP_AVAILABLE = False
web = None # type: ignore[assignment]
from gateway.config import Platform, PlatformConfig
from gateway.platforms.base import (
BasePlatformAdapter,
SendResult,
)
logger = logging.getLogger(__name__)
# Default settings
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 8642
MAX_STORED_RESPONSES = 100
def check_api_server_requirements() -> bool:
"""Check if API server dependencies are available."""
return AIOHTTP_AVAILABLE
class ResponseStore:
"""
In-memory LRU store for Responses API state.
Each stored response includes the full internal conversation history
(with tool calls and results) so it can be reconstructed on subsequent
requests via previous_response_id.
"""
def __init__(self, max_size: int = MAX_STORED_RESPONSES):
self._store: collections.OrderedDict[str, Dict[str, Any]] = collections.OrderedDict()
self._max_size = max_size
def get(self, response_id: str) -> Optional[Dict[str, Any]]:
"""Retrieve a stored response by ID (moves to end for LRU)."""
if response_id in self._store:
self._store.move_to_end(response_id)
return self._store[response_id]
return None
def put(self, response_id: str, data: Dict[str, Any]) -> None:
"""Store a response, evicting the oldest if at capacity."""
if response_id in self._store:
self._store.move_to_end(response_id)
self._store[response_id] = data
while len(self._store) > self._max_size:
self._store.popitem(last=False)
def delete(self, response_id: str) -> bool:
"""Remove a response from the store. Returns True if found and deleted."""
if response_id in self._store:
del self._store[response_id]
return True
return False
def __len__(self) -> int:
return len(self._store)
# ---------------------------------------------------------------------------
# CORS middleware
# ---------------------------------------------------------------------------
_CORS_HEADERS = {
"Access-Control-Allow-Origin": "*",
"Access-Control-Allow-Methods": "GET, POST, DELETE, OPTIONS",
"Access-Control-Allow-Headers": "Authorization, Content-Type",
}
if AIOHTTP_AVAILABLE:
@web.middleware
async def cors_middleware(request, handler):
"""Add CORS headers to every response; handle OPTIONS preflight."""
if request.method == "OPTIONS":
return web.Response(status=200, headers=_CORS_HEADERS)
response = await handler(request)
response.headers.update(_CORS_HEADERS)
return response
else:
cors_middleware = None # type: ignore[assignment]
class APIServerAdapter(BasePlatformAdapter):
"""
OpenAI-compatible HTTP API server adapter.
Runs an aiohttp web server that accepts OpenAI-format requests
and routes them through hermes-agent's AIAgent.
"""
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.API_SERVER)
extra = config.extra or {}
self._host: str = extra.get("host", os.getenv("API_SERVER_HOST", DEFAULT_HOST))
self._port: int = int(extra.get("port", os.getenv("API_SERVER_PORT", str(DEFAULT_PORT))))
self._api_key: str = extra.get("key", os.getenv("API_SERVER_KEY", ""))
self._app: Optional["web.Application"] = None
self._runner: Optional["web.AppRunner"] = None
self._site: Optional["web.TCPSite"] = None
self._response_store = ResponseStore()
# Conversation name → latest response_id mapping
self._conversations: Dict[str, str] = {}
# ------------------------------------------------------------------
# Auth helper
# ------------------------------------------------------------------
def _check_auth(self, request: "web.Request") -> Optional["web.Response"]:
"""
Validate Bearer token from Authorization header.
Returns None if auth is OK, or a 401 web.Response on failure.
If no API key is configured, all requests are allowed.
"""
if not self._api_key:
return None # No key configured — allow all (local-only use)
auth_header = request.headers.get("Authorization", "")
if auth_header.startswith("Bearer "):
token = auth_header[7:].strip()
if token == self._api_key:
return None # Auth OK
return web.json_response(
{"error": {"message": "Invalid API key", "type": "invalid_request_error", "code": "invalid_api_key"}},
status=401,
)
# ------------------------------------------------------------------
# Agent creation helper
# ------------------------------------------------------------------
def _create_agent(
self,
ephemeral_system_prompt: Optional[str] = None,
session_id: Optional[str] = None,
stream_delta_callback=None,
) -> Any:
"""
Create an AIAgent instance using the gateway's runtime config.
Uses _resolve_runtime_agent_kwargs() to pick up model, api_key,
base_url, etc. from config.yaml / env vars.
"""
from run_agent import AIAgent
from gateway.run import _resolve_runtime_agent_kwargs, _resolve_gateway_model
runtime_kwargs = _resolve_runtime_agent_kwargs()
model = _resolve_gateway_model()
max_iterations = int(os.getenv("HERMES_MAX_ITERATIONS", "90"))
agent = AIAgent(
model=model,
**runtime_kwargs,
max_iterations=max_iterations,
quiet_mode=True,
verbose_logging=False,
ephemeral_system_prompt=ephemeral_system_prompt or None,
session_id=session_id,
platform="api_server",
stream_delta_callback=stream_delta_callback,
)
return agent
# ------------------------------------------------------------------
# HTTP Handlers
# ------------------------------------------------------------------
async def _handle_health(self, request: "web.Request") -> "web.Response":
"""GET /health — simple health check."""
return web.json_response({"status": "ok", "platform": "hermes-agent"})
async def _handle_models(self, request: "web.Request") -> "web.Response":
"""GET /v1/models — return hermes-agent as an available model."""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
return web.json_response({
"object": "list",
"data": [
{
"id": "hermes-agent",
"object": "model",
"created": int(time.time()),
"owned_by": "hermes",
"permission": [],
"root": "hermes-agent",
"parent": None,
}
],
})
async def _handle_chat_completions(self, request: "web.Request") -> "web.Response":
"""POST /v1/chat/completions — OpenAI Chat Completions format."""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
# Parse request body
try:
body = await request.json()
except (json.JSONDecodeError, Exception):
return web.json_response(
{"error": {"message": "Invalid JSON in request body", "type": "invalid_request_error"}},
status=400,
)
messages = body.get("messages")
if not messages or not isinstance(messages, list):
return web.json_response(
{"error": {"message": "Missing or invalid 'messages' field", "type": "invalid_request_error"}},
status=400,
)
stream = body.get("stream", False)
# Extract system message (becomes ephemeral system prompt layered ON TOP of core)
system_prompt = None
conversation_messages: List[Dict[str, str]] = []
for msg in messages:
role = msg.get("role", "")
content = msg.get("content", "")
if role == "system":
# Accumulate system messages
if system_prompt is None:
system_prompt = content
else:
system_prompt = system_prompt + "\n" + content
elif role in ("user", "assistant"):
conversation_messages.append({"role": role, "content": content})
# Extract the last user message as the primary input
user_message = ""
history = []
if conversation_messages:
user_message = conversation_messages[-1].get("content", "")
history = conversation_messages[:-1]
if not user_message:
return web.json_response(
{"error": {"message": "No user message found in messages", "type": "invalid_request_error"}},
status=400,
)
session_id = str(uuid.uuid4())
completion_id = f"chatcmpl-{uuid.uuid4().hex[:29]}"
model_name = body.get("model", "hermes-agent")
created = int(time.time())
if stream:
import queue as _q
_stream_q: _q.Queue = _q.Queue()
def _on_delta(delta):
_stream_q.put(delta)
# Start agent in background
agent_task = asyncio.ensure_future(self._run_agent(
user_message=user_message,
conversation_history=history,
ephemeral_system_prompt=system_prompt,
session_id=session_id,
stream_delta_callback=_on_delta,
))
return await self._write_sse_chat_completion(
request, completion_id, model_name, created, _stream_q, agent_task
)
# Non-streaming: run the agent and return full response
try:
result, usage = await self._run_agent(
user_message=user_message,
conversation_history=history,
ephemeral_system_prompt=system_prompt,
session_id=session_id,
)
except Exception as e:
logger.error("Error running agent for chat completions: %s", e, exc_info=True)
return web.json_response(
{"error": {"message": f"Internal server error: {e}", "type": "server_error"}},
status=500,
)
final_response = result.get("final_response", "")
if not final_response:
final_response = result.get("error", "(No response generated)")
response_data = {
"id": completion_id,
"object": "chat.completion",
"created": created,
"model": model_name,
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": final_response,
},
"finish_reason": "stop",
}
],
"usage": {
"prompt_tokens": usage.get("input_tokens", 0),
"completion_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
},
}
return web.json_response(response_data)
async def _write_sse_chat_completion(
self, request: "web.Request", completion_id: str, model: str,
created: int, stream_q, agent_task,
) -> "web.StreamResponse":
"""Write real streaming SSE from agent's stream_delta_callback queue."""
import queue as _q
response = web.StreamResponse(
status=200,
headers={"Content-Type": "text/event-stream", "Cache-Control": "no-cache"},
)
await response.prepare(request)
# Role chunk
role_chunk = {
"id": completion_id, "object": "chat.completion.chunk",
"created": created, "model": model,
"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}],
}
await response.write(f"data: {json.dumps(role_chunk)}\n\n".encode())
# Stream content chunks as they arrive from the agent
loop = asyncio.get_event_loop()
while True:
try:
delta = await loop.run_in_executor(None, lambda: stream_q.get(timeout=0.5))
except _q.Empty:
if agent_task.done():
# Drain any remaining items
while True:
try:
delta = stream_q.get_nowait()
if delta is None:
break
content_chunk = {
"id": completion_id, "object": "chat.completion.chunk",
"created": created, "model": model,
"choices": [{"index": 0, "delta": {"content": delta}, "finish_reason": None}],
}
await response.write(f"data: {json.dumps(content_chunk)}\n\n".encode())
except _q.Empty:
break
break
continue
if delta is None: # End of stream sentinel
break
content_chunk = {
"id": completion_id, "object": "chat.completion.chunk",
"created": created, "model": model,
"choices": [{"index": 0, "delta": {"content": delta}, "finish_reason": None}],
}
await response.write(f"data: {json.dumps(content_chunk)}\n\n".encode())
# Get usage from completed agent
usage = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
try:
result, agent_usage = await agent_task
usage = agent_usage or usage
except Exception:
pass
# Finish chunk
finish_chunk = {
"id": completion_id, "object": "chat.completion.chunk",
"created": created, "model": model,
"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
"usage": {
"prompt_tokens": usage.get("input_tokens", 0),
"completion_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
},
}
await response.write(f"data: {json.dumps(finish_chunk)}\n\n".encode())
await response.write(b"data: [DONE]\n\n")
return response
async def _handle_responses(self, request: "web.Request") -> "web.Response":
"""POST /v1/responses — OpenAI Responses API format."""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
# Parse request body
try:
body = await request.json()
except (json.JSONDecodeError, Exception):
return web.json_response(
{"error": {"message": "Invalid JSON in request body", "type": "invalid_request_error"}},
status=400,
)
raw_input = body.get("input")
if raw_input is None:
return web.json_response(
{"error": {"message": "Missing 'input' field", "type": "invalid_request_error"}},
status=400,
)
instructions = body.get("instructions")
previous_response_id = body.get("previous_response_id")
conversation = body.get("conversation")
store = body.get("store", True)
# conversation and previous_response_id are mutually exclusive
if conversation and previous_response_id:
return web.json_response(
{"error": {"message": "Cannot use both 'conversation' and 'previous_response_id'", "type": "invalid_request_error"}},
status=400,
)
# Resolve conversation name to latest response_id
if conversation:
previous_response_id = self._conversations.get(conversation)
# No error if conversation doesn't exist yet — it's a new conversation
# Normalize input to message list
input_messages: List[Dict[str, str]] = []
if isinstance(raw_input, str):
input_messages = [{"role": "user", "content": raw_input}]
elif isinstance(raw_input, list):
for item in raw_input:
if isinstance(item, str):
input_messages.append({"role": "user", "content": item})
elif isinstance(item, dict):
role = item.get("role", "user")
content = item.get("content", "")
# Handle content that may be a list of content parts
if isinstance(content, list):
text_parts = []
for part in content:
if isinstance(part, dict) and part.get("type") == "input_text":
text_parts.append(part.get("text", ""))
elif isinstance(part, dict) and part.get("type") == "output_text":
text_parts.append(part.get("text", ""))
elif isinstance(part, str):
text_parts.append(part)
content = "\n".join(text_parts)
input_messages.append({"role": role, "content": content})
else:
return web.json_response(
{"error": {"message": "'input' must be a string or array", "type": "invalid_request_error"}},
status=400,
)
# Reconstruct conversation history from previous_response_id
conversation_history: List[Dict[str, str]] = []
if previous_response_id:
stored = self._response_store.get(previous_response_id)
if stored is None:
return web.json_response(
{"error": {"message": f"Previous response not found: {previous_response_id}", "type": "invalid_request_error"}},
status=404,
)
conversation_history = list(stored.get("conversation_history", []))
# If no instructions provided, carry forward from previous
if instructions is None:
instructions = stored.get("instructions")
# Append new input messages to history (all but the last become history)
for msg in input_messages[:-1]:
conversation_history.append(msg)
# Last input message is the user_message
user_message = input_messages[-1].get("content", "") if input_messages else ""
if not user_message:
return web.json_response(
{"error": {"message": "No user message found in input", "type": "invalid_request_error"}},
status=400,
)
# Truncation support
if body.get("truncation") == "auto" and len(conversation_history) > 100:
conversation_history = conversation_history[-100:]
# Run the agent
session_id = str(uuid.uuid4())
try:
result, usage = await self._run_agent(
user_message=user_message,
conversation_history=conversation_history,
ephemeral_system_prompt=instructions,
session_id=session_id,
)
except Exception as e:
logger.error("Error running agent for responses: %s", e, exc_info=True)
return web.json_response(
{"error": {"message": f"Internal server error: {e}", "type": "server_error"}},
status=500,
)
final_response = result.get("final_response", "")
if not final_response:
final_response = result.get("error", "(No response generated)")
response_id = f"resp_{uuid.uuid4().hex[:28]}"
created_at = int(time.time())
# Build the full conversation history for storage
# (includes tool calls from the agent run)
full_history = list(conversation_history)
full_history.append({"role": "user", "content": user_message})
# Add agent's internal messages if available
agent_messages = result.get("messages", [])
if agent_messages:
full_history.extend(agent_messages)
else:
full_history.append({"role": "assistant", "content": final_response})
# Build output items (includes tool calls + final message)
output_items = self._extract_output_items(result)
response_data = {
"id": response_id,
"object": "response",
"status": "completed",
"created_at": created_at,
"model": body.get("model", "hermes-agent"),
"output": output_items,
"usage": {
"input_tokens": usage.get("input_tokens", 0),
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
},
}
# Store the complete response object for future chaining / GET retrieval
if store:
self._response_store.put(response_id, {
"response": response_data,
"conversation_history": full_history,
"instructions": instructions,
})
# Update conversation mapping so the next request with the same
# conversation name automatically chains to this response
if conversation:
self._conversations[conversation] = response_id
return web.json_response(response_data)
# ------------------------------------------------------------------
# GET / DELETE response endpoints
# ------------------------------------------------------------------
async def _handle_get_response(self, request: "web.Request") -> "web.Response":
"""GET /v1/responses/{response_id} — retrieve a stored response."""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
response_id = request.match_info["response_id"]
stored = self._response_store.get(response_id)
if stored is None:
return web.json_response(
{"error": {"message": f"Response not found: {response_id}", "type": "invalid_request_error"}},
status=404,
)
return web.json_response(stored["response"])
async def _handle_delete_response(self, request: "web.Request") -> "web.Response":
"""DELETE /v1/responses/{response_id} — delete a stored response."""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
response_id = request.match_info["response_id"]
deleted = self._response_store.delete(response_id)
if not deleted:
return web.json_response(
{"error": {"message": f"Response not found: {response_id}", "type": "invalid_request_error"}},
status=404,
)
return web.json_response({
"id": response_id,
"object": "response",
"deleted": True,
})
# ------------------------------------------------------------------
# Output extraction helper
# ------------------------------------------------------------------
@staticmethod
def _extract_output_items(result: Dict[str, Any]) -> List[Dict[str, Any]]:
"""
Build the full output item array from the agent's messages.
Walks *result["messages"]* and emits:
- ``function_call`` items for each tool_call on assistant messages
- ``function_call_output`` items for each tool-role message
- a final ``message`` item with the assistant's text reply
"""
items: List[Dict[str, Any]] = []
messages = result.get("messages", [])
for msg in messages:
role = msg.get("role")
if role == "assistant" and msg.get("tool_calls"):
for tc in msg["tool_calls"]:
func = tc.get("function", {})
items.append({
"type": "function_call",
"name": func.get("name", ""),
"arguments": func.get("arguments", ""),
"call_id": tc.get("id", ""),
})
elif role == "tool":
items.append({
"type": "function_call_output",
"call_id": msg.get("tool_call_id", ""),
"output": msg.get("content", ""),
})
# Final assistant message
final = result.get("final_response", "")
if not final:
final = result.get("error", "(No response generated)")
items.append({
"type": "message",
"role": "assistant",
"content": [
{
"type": "output_text",
"text": final,
}
],
})
return items
# ------------------------------------------------------------------
# Agent execution
# ------------------------------------------------------------------
async def _run_agent(
self,
user_message: str,
conversation_history: List[Dict[str, str]],
ephemeral_system_prompt: Optional[str] = None,
session_id: Optional[str] = None,
stream_delta_callback=None,
) -> tuple:
"""
Create an agent and run a conversation in a thread executor.
Returns ``(result_dict, usage_dict)`` where *usage_dict* contains
``input_tokens``, ``output_tokens`` and ``total_tokens``.
"""
loop = asyncio.get_event_loop()
def _run():
agent = self._create_agent(
ephemeral_system_prompt=ephemeral_system_prompt,
session_id=session_id,
stream_delta_callback=stream_delta_callback,
)
result = agent.run_conversation(
user_message=user_message,
conversation_history=conversation_history,
)
usage = {
"input_tokens": getattr(agent, "session_prompt_tokens", 0) or 0,
"output_tokens": getattr(agent, "session_completion_tokens", 0) or 0,
"total_tokens": getattr(agent, "session_total_tokens", 0) or 0,
}
return result, usage
return await loop.run_in_executor(None, _run)
# ------------------------------------------------------------------
# BasePlatformAdapter interface
# ------------------------------------------------------------------
async def connect(self) -> bool:
"""Start the aiohttp web server."""
if not AIOHTTP_AVAILABLE:
logger.warning("[%s] aiohttp not installed", self.name)
return False
try:
self._app = web.Application(middlewares=[cors_middleware])
self._app.router.add_get("/health", self._handle_health)
self._app.router.add_get("/v1/models", self._handle_models)
self._app.router.add_post("/v1/chat/completions", self._handle_chat_completions)
self._app.router.add_post("/v1/responses", self._handle_responses)
self._app.router.add_get("/v1/responses/{response_id}", self._handle_get_response)
self._app.router.add_delete("/v1/responses/{response_id}", self._handle_delete_response)
self._runner = web.AppRunner(self._app)
await self._runner.setup()
self._site = web.TCPSite(self._runner, self._host, self._port)
await self._site.start()
self._mark_connected()
logger.info(
"[%s] API server listening on http://%s:%d",
self.name, self._host, self._port,
)
return True
except Exception as e:
logger.error("[%s] Failed to start API server: %s", self.name, e)
return False
async def disconnect(self) -> None:
"""Stop the aiohttp web server."""
self._mark_disconnected()
if self._site:
await self._site.stop()
self._site = None
if self._runner:
await self._runner.cleanup()
self._runner = None
self._app = None
logger.info("[%s] API server stopped", self.name)
async def send(
self,
chat_id: str,
content: str,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""
Not used — HTTP request/response cycle handles delivery directly.
"""
return SendResult(success=False, error="API server uses HTTP request/response, not send()")
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
"""Return basic info about the API server."""
return {
"name": "API Server",
"type": "api",
"host": self._host,
"port": self._port,
}
+202 -8
View File
@@ -288,11 +288,13 @@ class MessageEvent:
message_id: Optional[str] = None
# Media attachments
# media_urls: local file paths (for vision tool access)
media_urls: List[str] = field(default_factory=list)
media_types: List[str] = field(default_factory=list)
# Reply context
reply_to_message_id: Optional[str] = None
reply_to_text: Optional[str] = None # Text of the replied-to message (for context injection)
# Timestamps
timestamp: datetime = field(default_factory=datetime.now)
@@ -355,6 +357,10 @@ class BasePlatformAdapter(ABC):
# Key: session_key (e.g., chat_id), Value: (event, asyncio.Event for interrupt)
self._active_sessions: Dict[str, asyncio.Event] = {}
self._pending_messages: Dict[str, MessageEvent] = {}
# Background message-processing tasks spawned by handle_message().
# Gateway shutdown cancels these so an old gateway instance doesn't keep
# working on a task after --replace or manual restarts.
self._background_tasks: set[asyncio.Task] = set()
# Chats where auto-TTS on voice input is disabled (set by /voice off)
self._auto_tts_disabled_chats: set = set()
@@ -505,6 +511,7 @@ class BasePlatformAdapter(ABC):
image_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""
Send an image natively via the platform API.
@@ -523,6 +530,7 @@ class BasePlatformAdapter(ABC):
animation_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""
Send an animated GIF natively via the platform API.
@@ -531,7 +539,7 @@ class BasePlatformAdapter(ABC):
(e.g., Telegram send_animation) so they auto-play inline.
Default falls back to send_image.
"""
return await self.send_image(chat_id=chat_id, image_url=animation_url, caption=caption, reply_to=reply_to)
return await self.send_image(chat_id=chat_id, image_url=animation_url, caption=caption, reply_to=reply_to, metadata=metadata)
@staticmethod
def _is_animation_url(url: str) -> bool:
@@ -721,7 +729,75 @@ class BasePlatformAdapter(ABC):
cleaned = re.sub(r'\n{3,}', '\n\n', cleaned).strip()
return media, cleaned
@staticmethod
def extract_local_files(content: str) -> Tuple[List[str], str]:
"""
Detect bare local file paths in response text for native media delivery.
Matches absolute paths (/...) and tilde paths (~/) ending in common
image or video extensions. Validates each candidate with
``os.path.isfile()`` to avoid false positives from URLs or
non-existent paths.
Paths inside fenced code blocks (``` ... ```) and inline code
(`...`) are ignored so that code samples are never mutilated.
Returns:
Tuple of (list of expanded file paths, cleaned text with the
raw path strings removed).
"""
_LOCAL_MEDIA_EXTS = (
'.png', '.jpg', '.jpeg', '.gif', '.webp',
'.mp4', '.mov', '.avi', '.mkv', '.webm',
)
ext_part = '|'.join(e.lstrip('.') for e in _LOCAL_MEDIA_EXTS)
# (?<![/:\w.]) prevents matching inside URLs (e.g. https://…/img.png)
# and relative paths (./foo.png)
# (?:~/|/) anchors to absolute or home-relative paths
path_re = re.compile(
r'(?<![/:\w.])(?:~/|/)(?:[\w.\-]+/)*[\w.\-]+\.(?:' + ext_part + r')\b',
re.IGNORECASE,
)
# Build spans covered by fenced code blocks and inline code
code_spans: list = []
for m in re.finditer(r'```[^\n]*\n.*?```', content, re.DOTALL):
code_spans.append((m.start(), m.end()))
for m in re.finditer(r'`[^`\n]+`', content):
code_spans.append((m.start(), m.end()))
def _in_code(pos: int) -> bool:
return any(s <= pos < e for s, e in code_spans)
found: list = [] # (raw_match_text, expanded_path)
for match in path_re.finditer(content):
if _in_code(match.start()):
continue
raw = match.group(0)
expanded = os.path.expanduser(raw)
if os.path.isfile(expanded):
found.append((raw, expanded))
# Deduplicate by expanded path, preserving discovery order
seen: set = set()
unique: list = []
for raw, expanded in found:
if expanded not in seen:
seen.add(expanded)
unique.append((raw, expanded))
paths = [expanded for _, expanded in unique]
cleaned = content
if unique:
for raw, _exp in unique:
cleaned = cleaned.replace(raw, '')
cleaned = re.sub(r'\n{3,}', '\n\n', cleaned).strip()
return paths, cleaned
async def _keep_typing(self, chat_id: str, interval: float = 2.0, metadata=None) -> None:
"""
Continuously send typing indicator until cancelled.
@@ -747,11 +823,32 @@ class BasePlatformAdapter(ABC):
if not self._message_handler:
return
session_key = build_session_key(event.source)
session_key = build_session_key(
event.source,
group_sessions_per_user=self.config.extra.get("group_sessions_per_user", True),
)
# Check if there's already an active handler for this session
if session_key in self._active_sessions:
# Store this as a pending message - it will interrupt the running agent
# Special case: photo bursts/albums frequently arrive as multiple near-
# simultaneous messages. Queue them without interrupting the active run,
# then process them immediately after the current task finishes.
if event.message_type == MessageType.PHOTO:
print(f"[{self.name}] 🖼️ Queuing photo follow-up for session {session_key} without interrupt")
existing = self._pending_messages.get(session_key)
if existing and existing.message_type == MessageType.PHOTO:
existing.media_urls.extend(event.media_urls)
existing.media_types.extend(event.media_types)
if event.text:
if not existing.text:
existing.text = event.text
elif event.text not in existing.text:
existing.text = f"{existing.text}\n\n{event.text}".strip()
else:
self._pending_messages[session_key] = event
return # Don't interrupt now - will run after current task completes
# Default behavior for non-photo follow-ups: interrupt the running agent
print(f"[{self.name}] ⚡ New message while session {session_key} is active - triggering interrupt")
self._pending_messages[session_key] = event
# Signal the interrupt (the processing task checks this)
@@ -759,7 +856,15 @@ class BasePlatformAdapter(ABC):
return # Don't process now - will be handled after current task finishes
# Spawn background task to process this message
asyncio.create_task(self._process_message_background(event, session_key))
task = asyncio.create_task(self._process_message_background(event, session_key))
try:
self._background_tasks.add(task)
except TypeError:
# Some tests stub create_task() with lightweight sentinels that are not
# hashable and do not support lifecycle callbacks.
return
if hasattr(task, "add_done_callback"):
task.add_done_callback(self._background_tasks.discard)
@staticmethod
def _get_human_delay() -> float:
@@ -805,8 +910,17 @@ class BasePlatformAdapter(ABC):
# Extract image URLs and send them as native platform attachments
images, text_content = self.extract_images(response)
# Strip any remaining internal directives from message body (fixes #1561)
text_content = text_content.replace("[[audio_as_voice]]", "").strip()
text_content = re.sub(r"MEDIA:\s*\S+", "", text_content).strip()
if images:
logger.info("[%s] extract_images found %d image(s) in response (%d chars)", self.name, len(images), len(response))
# Auto-detect bare local file paths for native media delivery
# (helps small models that don't use MEDIA: syntax)
local_files, text_content = self.extract_local_files(text_content)
if local_files:
logger.info("[%s] extract_local_files found %d file(s) in response", self.name, len(local_files))
# Auto-TTS: if voice message, generate audio FIRST (before sending text)
# Skipped when the chat has voice mode disabled (/voice off)
@@ -900,7 +1014,7 @@ class BasePlatformAdapter(ABC):
# Send extracted media files — route by file type
_AUDIO_EXTS = {'.ogg', '.opus', '.mp3', '.wav', '.m4a'}
_VIDEO_EXTS = {'.mp4', '.mov', '.avi', '.mkv', '.3gp'}
_VIDEO_EXTS = {'.mp4', '.mov', '.avi', '.mkv', '.webm', '.3gp'}
_IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.webp', '.gif'}
for media_path, is_voice in media_files:
@@ -937,7 +1051,34 @@ class BasePlatformAdapter(ABC):
print(f"[{self.name}] Failed to send media ({ext}): {media_result.error}")
except Exception as media_err:
print(f"[{self.name}] Error sending media: {media_err}")
# Send auto-detected local files as native attachments
for file_path in local_files:
if human_delay > 0:
await asyncio.sleep(human_delay)
try:
ext = Path(file_path).suffix.lower()
if ext in _IMAGE_EXTS:
await self.send_image_file(
chat_id=event.source.chat_id,
image_path=file_path,
metadata=_thread_metadata,
)
elif ext in _VIDEO_EXTS:
await self.send_video(
chat_id=event.source.chat_id,
video_path=file_path,
metadata=_thread_metadata,
)
else:
await self.send_document(
chat_id=event.source.chat_id,
file_path=file_path,
metadata=_thread_metadata,
)
except Exception as file_err:
logger.error("[%s] Error sending local file %s: %s", self.name, file_path, file_err)
# Check if there's a pending message that was queued during our processing
if session_key in self._pending_messages:
pending_event = self._pending_messages.pop(session_key)
@@ -958,6 +1099,22 @@ class BasePlatformAdapter(ABC):
print(f"[{self.name}] Error handling message: {e}")
import traceback
traceback.print_exc()
# Send the error to the user so they aren't left with radio silence
try:
error_type = type(e).__name__
error_detail = str(e)[:300] if str(e) else "no details available"
_thread_metadata = {"thread_id": event.source.thread_id} if event.source.thread_id else None
await self.send(
chat_id=event.source.chat_id,
content=(
f"Sorry, I encountered an error ({error_type}).\n"
f"{error_detail}\n"
"Try again or use /reset to start a fresh session."
),
metadata=_thread_metadata,
)
except Exception:
pass # Last resort — don't let error reporting crash the handler
finally:
# Stop typing indicator
typing_task.cancel()
@@ -969,6 +1126,21 @@ class BasePlatformAdapter(ABC):
if session_key in self._active_sessions:
del self._active_sessions[session_key]
async def cancel_background_tasks(self) -> None:
"""Cancel any in-flight background message-processing tasks.
Used during gateway shutdown/replacement so active sessions from the old
process do not keep running after adapters are being torn down.
"""
tasks = [task for task in self._background_tasks if not task.done()]
for task in tasks:
task.cancel()
if tasks:
await asyncio.gather(*tasks, return_exceptions=True)
self._background_tasks.clear()
self._pending_messages.clear()
self._active_sessions.clear()
def has_pending_interrupt(self, session_key: str) -> bool:
"""Check if there's a pending interrupt for a session."""
return session_key in self._active_sessions and self._active_sessions[session_key].is_set()
@@ -1028,7 +1200,8 @@ class BasePlatformAdapter(ABC):
"""
return content
def truncate_message(self, content: str, max_length: int = 4096) -> List[str]:
@staticmethod
def truncate_message(content: str, max_length: int = 4096) -> List[str]:
"""
Split a long message into chunks, preserving code block boundaries.
@@ -1080,6 +1253,27 @@ class BasePlatformAdapter(ABC):
if split_at < 1:
split_at = headroom
# Avoid splitting inside an inline code span (`...`).
# If the text before split_at has an odd number of unescaped
# backticks, the split falls inside inline code — the resulting
# chunk would have an unpaired backtick and any special characters
# (like parentheses) inside the broken span would be unescaped,
# causing MarkdownV2 parse errors on Telegram.
candidate = remaining[:split_at]
backtick_count = candidate.count("`") - candidate.count("\\`")
if backtick_count % 2 == 1:
# Find the last unescaped backtick and split before it
last_bt = candidate.rfind("`")
while last_bt > 0 and candidate[last_bt - 1] == "\\":
last_bt = candidate.rfind("`", 0, last_bt)
if last_bt > 0:
# Try to find a space or newline just before the backtick
safe_split = candidate.rfind(" ", 0, last_bt)
nl_split = candidate.rfind("\n", 0, last_bt)
safe_split = max(safe_split, nl_split)
if safe_split > headroom // 4:
split_at = safe_split
chunk_body = remaining[:split_at]
remaining = remaining[split_at:].lstrip()
+340
View File
@@ -0,0 +1,340 @@
"""
DingTalk platform adapter using Stream Mode.
Uses dingtalk-stream SDK for real-time message reception without webhooks.
Responses are sent via DingTalk's session webhook (markdown format).
Requires:
pip install dingtalk-stream httpx
DINGTALK_CLIENT_ID and DINGTALK_CLIENT_SECRET env vars
Configuration in config.yaml:
platforms:
dingtalk:
enabled: true
extra:
client_id: "your-app-key" # or DINGTALK_CLIENT_ID env var
client_secret: "your-secret" # or DINGTALK_CLIENT_SECRET env var
"""
import asyncio
import logging
import os
import time
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional
try:
import dingtalk_stream
from dingtalk_stream import ChatbotHandler, ChatbotMessage
DINGTALK_STREAM_AVAILABLE = True
except ImportError:
DINGTALK_STREAM_AVAILABLE = False
dingtalk_stream = None # type: ignore[assignment]
try:
import httpx
HTTPX_AVAILABLE = True
except ImportError:
HTTPX_AVAILABLE = False
httpx = None # type: ignore[assignment]
from gateway.config import Platform, PlatformConfig
from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
MessageType,
SendResult,
)
logger = logging.getLogger(__name__)
MAX_MESSAGE_LENGTH = 20000
DEDUP_WINDOW_SECONDS = 300
DEDUP_MAX_SIZE = 1000
RECONNECT_BACKOFF = [2, 5, 10, 30, 60]
def check_dingtalk_requirements() -> bool:
"""Check if DingTalk dependencies are available and configured."""
if not DINGTALK_STREAM_AVAILABLE or not HTTPX_AVAILABLE:
return False
if not os.getenv("DINGTALK_CLIENT_ID") or not os.getenv("DINGTALK_CLIENT_SECRET"):
return False
return True
class DingTalkAdapter(BasePlatformAdapter):
"""DingTalk chatbot adapter using Stream Mode.
The dingtalk-stream SDK maintains a long-lived WebSocket connection.
Incoming messages arrive via a ChatbotHandler callback. Replies are
sent via the incoming message's session_webhook URL using httpx.
"""
MAX_MESSAGE_LENGTH = MAX_MESSAGE_LENGTH
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.DINGTALK)
extra = config.extra or {}
self._client_id: str = extra.get("client_id") or os.getenv("DINGTALK_CLIENT_ID", "")
self._client_secret: str = extra.get("client_secret") or os.getenv("DINGTALK_CLIENT_SECRET", "")
self._stream_client: Any = None
self._stream_task: Optional[asyncio.Task] = None
self._http_client: Optional["httpx.AsyncClient"] = None
# Message deduplication: msg_id -> timestamp
self._seen_messages: Dict[str, float] = {}
# Map chat_id -> session_webhook for reply routing
self._session_webhooks: Dict[str, str] = {}
# -- Connection lifecycle -----------------------------------------------
async def connect(self) -> bool:
"""Connect to DingTalk via Stream Mode."""
if not DINGTALK_STREAM_AVAILABLE:
logger.warning("[%s] dingtalk-stream not installed. Run: pip install dingtalk-stream", self.name)
return False
if not HTTPX_AVAILABLE:
logger.warning("[%s] httpx not installed. Run: pip install httpx", self.name)
return False
if not self._client_id or not self._client_secret:
logger.warning("[%s] DINGTALK_CLIENT_ID and DINGTALK_CLIENT_SECRET required", self.name)
return False
try:
self._http_client = httpx.AsyncClient(timeout=30.0)
credential = dingtalk_stream.Credential(self._client_id, self._client_secret)
self._stream_client = dingtalk_stream.DingTalkStreamClient(credential)
# Capture the current event loop for cross-thread dispatch
loop = asyncio.get_running_loop()
handler = _IncomingHandler(self, loop)
self._stream_client.register_callback_handler(
dingtalk_stream.ChatbotMessage.TOPIC, handler
)
self._stream_task = asyncio.create_task(self._run_stream())
self._mark_connected()
logger.info("[%s] Connected via Stream Mode", self.name)
return True
except Exception as e:
logger.error("[%s] Failed to connect: %s", self.name, e)
return False
async def _run_stream(self) -> None:
"""Run the blocking stream client with auto-reconnection."""
backoff_idx = 0
while self._running:
try:
logger.debug("[%s] Starting stream client...", self.name)
await asyncio.to_thread(self._stream_client.start)
except asyncio.CancelledError:
return
except Exception as e:
if not self._running:
return
logger.warning("[%s] Stream client error: %s", self.name, e)
if not self._running:
return
delay = RECONNECT_BACKOFF[min(backoff_idx, len(RECONNECT_BACKOFF) - 1)]
logger.info("[%s] Reconnecting in %ds...", self.name, delay)
await asyncio.sleep(delay)
backoff_idx += 1
async def disconnect(self) -> None:
"""Disconnect from DingTalk."""
self._running = False
self._mark_disconnected()
if self._stream_task:
self._stream_task.cancel()
try:
await self._stream_task
except asyncio.CancelledError:
pass
self._stream_task = None
if self._http_client:
await self._http_client.aclose()
self._http_client = None
self._stream_client = None
self._session_webhooks.clear()
self._seen_messages.clear()
logger.info("[%s] Disconnected", self.name)
# -- Inbound message processing -----------------------------------------
async def _on_message(self, message: "ChatbotMessage") -> None:
"""Process an incoming DingTalk chatbot message."""
msg_id = getattr(message, "message_id", None) or uuid.uuid4().hex
if self._is_duplicate(msg_id):
logger.debug("[%s] Duplicate message %s, skipping", self.name, msg_id)
return
text = self._extract_text(message)
if not text:
logger.debug("[%s] Empty message, skipping", self.name)
return
# Chat context
conversation_id = getattr(message, "conversation_id", "") or ""
conversation_type = getattr(message, "conversation_type", "1")
is_group = str(conversation_type) == "2"
sender_id = getattr(message, "sender_id", "") or ""
sender_nick = getattr(message, "sender_nick", "") or sender_id
sender_staff_id = getattr(message, "sender_staff_id", "") or ""
chat_id = conversation_id or sender_id
chat_type = "group" if is_group else "dm"
# Store session webhook for reply routing
session_webhook = getattr(message, "session_webhook", None) or ""
if session_webhook and chat_id:
self._session_webhooks[chat_id] = session_webhook
source = self.build_source(
chat_id=chat_id,
chat_name=getattr(message, "conversation_title", None),
chat_type=chat_type,
user_id=sender_id,
user_name=sender_nick,
user_id_alt=sender_staff_id if sender_staff_id else None,
)
# Parse timestamp
create_at = getattr(message, "create_at", None)
try:
timestamp = datetime.fromtimestamp(int(create_at) / 1000, tz=timezone.utc) if create_at else datetime.now(tz=timezone.utc)
except (ValueError, OSError, TypeError):
timestamp = datetime.now(tz=timezone.utc)
event = MessageEvent(
text=text,
message_type=MessageType.TEXT,
source=source,
message_id=msg_id,
raw_message=message,
timestamp=timestamp,
)
logger.debug("[%s] Message from %s in %s: %s",
self.name, sender_nick, chat_id[:20] if chat_id else "?", text[:50])
await self.handle_message(event)
@staticmethod
def _extract_text(message: "ChatbotMessage") -> str:
"""Extract plain text from a DingTalk chatbot message."""
text = getattr(message, "text", None) or ""
if isinstance(text, dict):
content = text.get("content", "").strip()
else:
content = str(text).strip()
# Fall back to rich text if present
if not content:
rich_text = getattr(message, "rich_text", None)
if rich_text and isinstance(rich_text, list):
parts = [item["text"] for item in rich_text
if isinstance(item, dict) and item.get("text")]
content = " ".join(parts).strip()
return content
# -- Deduplication ------------------------------------------------------
def _is_duplicate(self, msg_id: str) -> bool:
"""Check and record a message ID. Returns True if already seen."""
now = time.time()
if len(self._seen_messages) > DEDUP_MAX_SIZE:
cutoff = now - DEDUP_WINDOW_SECONDS
self._seen_messages = {k: v for k, v in self._seen_messages.items() if v > cutoff}
if msg_id in self._seen_messages:
return True
self._seen_messages[msg_id] = now
return False
# -- Outbound messaging -------------------------------------------------
async def send(
self,
chat_id: str,
content: str,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send a markdown reply via DingTalk session webhook."""
metadata = metadata or {}
session_webhook = metadata.get("session_webhook") or self._session_webhooks.get(chat_id)
if not session_webhook:
return SendResult(success=False,
error="No session_webhook available. Reply must follow an incoming message.")
if not self._http_client:
return SendResult(success=False, error="HTTP client not initialized")
payload = {
"msgtype": "markdown",
"markdown": {"title": "Hermes", "text": content[:self.MAX_MESSAGE_LENGTH]},
}
try:
resp = await self._http_client.post(session_webhook, json=payload, timeout=15.0)
if resp.status_code < 300:
return SendResult(success=True, message_id=uuid.uuid4().hex[:12])
body = resp.text
logger.warning("[%s] Send failed HTTP %d: %s", self.name, resp.status_code, body[:200])
return SendResult(success=False, error=f"HTTP {resp.status_code}: {body[:200]}")
except httpx.TimeoutException:
return SendResult(success=False, error="Timeout sending message to DingTalk")
except Exception as e:
logger.error("[%s] Send error: %s", self.name, e)
return SendResult(success=False, error=str(e))
async def send_typing(self, chat_id: str, metadata=None) -> None:
"""DingTalk does not support typing indicators."""
pass
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
"""Return basic info about a DingTalk conversation."""
return {"name": chat_id, "type": "group" if "group" in chat_id.lower() else "dm"}
# ---------------------------------------------------------------------------
# Internal stream handler
# ---------------------------------------------------------------------------
class _IncomingHandler(ChatbotHandler if DINGTALK_STREAM_AVAILABLE else object):
"""dingtalk-stream ChatbotHandler that forwards messages to the adapter."""
def __init__(self, adapter: DingTalkAdapter, loop: asyncio.AbstractEventLoop):
if DINGTALK_STREAM_AVAILABLE:
super().__init__()
self._adapter = adapter
self._loop = loop
def process(self, message: "ChatbotMessage"):
"""Called by dingtalk-stream in its thread when a message arrives.
Schedules the async handler on the main event loop.
"""
loop = self._loop
if loop is None or loop.is_closed():
logger.error("[DingTalk] Event loop unavailable, cannot dispatch message")
return dingtalk_stream.AckMessage.STATUS_OK, "OK"
future = asyncio.run_coroutine_threadsafe(self._adapter._on_message(message), loop)
try:
future.result(timeout=60)
except Exception:
logger.exception("[DingTalk] Error processing incoming message")
return dingtalk_stream.AckMessage.STATUS_OK, "OK"
+157 -58
View File
@@ -10,6 +10,7 @@ Uses discord.py library for:
"""
import asyncio
import json
import logging
import os
import struct
@@ -18,6 +19,7 @@ import tempfile
import threading
import time
from collections import defaultdict
from pathlib import Path
from typing import Callable, Dict, List, Optional, Any
logger = logging.getLogger(__name__)
@@ -87,8 +89,9 @@ class VoiceReceiver:
SAMPLE_RATE = 48000 # Discord native rate
CHANNELS = 2 # Discord sends stereo
def __init__(self, voice_client):
def __init__(self, voice_client, allowed_user_ids: set = None):
self._vc = voice_client
self._allowed_user_ids = allowed_user_ids or set()
self._running = False
# Decryption
@@ -274,19 +277,21 @@ class VoiceReceiver:
if self._dave_session:
with self._lock:
user_id = self._ssrc_to_user.get(ssrc, 0)
if user_id == 0:
if self._packet_debug_count <= 10:
logger.warning("DAVE skip: unknown user for ssrc=%d", ssrc)
return # unknown user, can't DAVE-decrypt
try:
import davey
decrypted = self._dave_session.decrypt(
user_id, davey.MediaType.audio, decrypted
)
except Exception as e:
if self._packet_debug_count <= 10:
logger.warning("DAVE decrypt failed for ssrc=%d: %s", ssrc, e)
return
if user_id:
try:
import davey
decrypted = self._dave_session.decrypt(
user_id, davey.MediaType.audio, decrypted
)
except Exception as e:
# Unencrypted passthrough — use NaCl-decrypted data as-is
if "Unencrypted" not in str(e):
if self._packet_debug_count <= 10:
logger.warning("DAVE decrypt failed for ssrc=%d: %s", ssrc, e)
return
# If SSRC unknown (no SPEAKING event yet), skip DAVE and try
# Opus decode directly — audio may be in passthrough mode.
# Buffer will get a user_id when SPEAKING event arrives later.
# --- Opus decode -> PCM ---
try:
@@ -304,6 +309,32 @@ class VoiceReceiver:
# Silence detection
# ------------------------------------------------------------------
def _infer_user_for_ssrc(self, ssrc: int) -> int:
"""Try to infer user_id for an unmapped SSRC.
When the bot rejoins a voice channel, Discord may not resend
SPEAKING events for users already speaking. If exactly one
allowed user is in the channel, map the SSRC to them.
"""
try:
channel = self._vc.channel
if not channel:
return 0
bot_id = self._vc.user.id if self._vc.user else 0
allowed = self._allowed_user_ids
candidates = [
m.id for m in channel.members
if m.id != bot_id and (not allowed or str(m.id) in allowed)
]
if len(candidates) == 1:
uid = candidates[0]
self._ssrc_to_user[ssrc] = uid
logger.info("Auto-mapped ssrc=%d -> user=%d (sole allowed member)", ssrc, uid)
return uid
except Exception:
pass
return 0
def check_silence(self) -> list:
"""Return list of (user_id, pcm_bytes) for completed utterances."""
now = time.monotonic()
@@ -322,6 +353,10 @@ class VoiceReceiver:
if silence_duration >= self.SILENCE_THRESHOLD and buf_duration >= self.MIN_SPEECH_DURATION:
user_id = ssrc_user_map.get(ssrc, 0)
if not user_id:
# SSRC not mapped (SPEAKING event missing after bot rejoin).
# Infer from allowed users in the voice channel.
user_id = self._infer_user_for_ssrc(ssrc)
if user_id:
completed.append((user_id, bytes(buf)))
self._buffers[ssrc] = bytearray()
@@ -400,6 +435,12 @@ class DiscordAdapter(BasePlatformAdapter):
self._voice_listen_tasks: Dict[int, asyncio.Task] = {} # guild_id -> listen loop
self._voice_input_callback: Optional[Callable] = None # set by run.py
self._on_voice_disconnect: Optional[Callable] = None # set by run.py
# Track threads where the bot has participated so follow-up messages
# in those threads don't require @mention. Persisted to disk so the
# set survives gateway restarts.
self._bot_participated_threads: set = self._load_participated_threads()
# Cap to prevent unbounded growth (Discord threads get archived).
self._MAX_TRACKED_THREADS = 500
async def connect(self) -> bool:
"""Connect to Discord and start receiving events."""
@@ -580,7 +621,7 @@ class DiscordAdapter(BasePlatformAdapter):
"""Send a message to a Discord channel."""
if not self._client:
return SendResult(success=False, error="Not connected")
try:
# Get the channel
channel = self._client.get_channel(int(chat_id))
@@ -695,13 +736,14 @@ class DiscordAdapter(BasePlatformAdapter):
) -> SendResult:
"""Play auto-TTS audio.
When the bot is in a voice channel for this chat's guild, skip the
file attachment the gateway runner plays audio in the VC instead.
When the bot is in a voice channel for this chat's guild, play
directly in the VC instead of sending as a file attachment.
"""
for gid, text_ch_id in self._voice_text_channels.items():
if str(text_ch_id) == str(chat_id) and self.is_in_voice_channel(gid):
logger.debug("[%s] Skipping play_tts for %s — VC playback handled by runner", self.name, chat_id)
return SendResult(success=True)
logger.info("[%s] Playing TTS in voice channel (guild=%d)", self.name, gid)
success = await self.play_in_voice_channel(gid, audio_path)
return SendResult(success=success)
return await self.send_voice(chat_id=chat_id, audio_path=audio_path, **kwargs)
async def send_voice(
@@ -805,7 +847,7 @@ class DiscordAdapter(BasePlatformAdapter):
# Start voice receiver (Phase 2: listen to users)
try:
receiver = VoiceReceiver(vc)
receiver = VoiceReceiver(vc, allowed_user_ids=self._allowed_user_ids)
receiver.start()
self._voice_receivers[guild_id] = receiver
self._voice_listen_tasks[guild_id] = asyncio.ensure_future(
@@ -1001,14 +1043,32 @@ class DiscordAdapter(BasePlatformAdapter):
# Voice listening (Phase 2)
# ------------------------------------------------------------------
# UDP keepalive interval in seconds — prevents Discord from dropping
# the UDP route after ~60s of silence.
_KEEPALIVE_INTERVAL = 15
async def _voice_listen_loop(self, guild_id: int):
"""Periodically check for completed utterances and process them."""
receiver = self._voice_receivers.get(guild_id)
if not receiver:
return
last_keepalive = time.monotonic()
try:
while receiver._running:
await asyncio.sleep(0.2)
# Send periodic UDP keepalive to prevent Discord from
# dropping the UDP session after ~60s of silence.
now = time.monotonic()
if now - last_keepalive >= self._KEEPALIVE_INTERVAL:
last_keepalive = now
try:
vc = self._voice_clients.get(guild_id)
if vc and vc.is_connected():
vc._connection.send_packet(b'\xf8\xff\xfe')
except Exception:
pass
completed = receiver.check_silence()
for user_id, pcm_data in completed:
if not self._is_allowed_user(str(user_id)):
@@ -1304,16 +1364,17 @@ class DiscordAdapter(BasePlatformAdapter):
self,
interaction: discord.Interaction,
command_text: str,
followup_msg: str = "Done~",
followup_msg: str | None = None,
) -> None:
"""Common handler for simple slash commands that dispatch a command string."""
await interaction.response.defer(ephemeral=True)
event = self._build_slash_event(interaction, command_text)
await self.handle_message(event)
try:
await interaction.followup.send(followup_msg, ephemeral=True)
except Exception as e:
logger.debug("Discord followup failed: %s", e)
if followup_msg:
try:
await interaction.followup.send(followup_msg, ephemeral=True)
except Exception as e:
logger.debug("Discord followup failed: %s", e)
def _register_slash_commands(self) -> None:
"""Register Discord slash commands on the command tree."""
@@ -1322,19 +1383,6 @@ class DiscordAdapter(BasePlatformAdapter):
tree = self._client.tree
@tree.command(name="ask", description="Ask Hermes a question")
@discord.app_commands.describe(question="Your question for Hermes")
async def slash_ask(interaction: discord.Interaction, question: str):
await interaction.response.defer()
event = self._build_slash_event(interaction, question)
await self.handle_message(event)
# The response is sent via the normal send() flow
# Send a followup to close the interaction if needed
try:
await interaction.followup.send("Processing complete~", ephemeral=True)
except Exception as e:
logger.debug("Discord followup failed: %s", e)
@tree.command(name="new", description="Start a new conversation")
async def slash_new(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/reset", "New conversation started~")
@@ -1354,10 +1402,6 @@ class DiscordAdapter(BasePlatformAdapter):
await interaction.response.defer(ephemeral=True)
event = self._build_slash_event(interaction, f"/reasoning {effort}".strip())
await self.handle_message(event)
try:
await interaction.followup.send("Done~", ephemeral=True)
except Exception as e:
logger.debug("Discord followup failed: %s", e)
@tree.command(name="personality", description="Set a personality")
@discord.app_commands.describe(name="Personality name. Leave empty to list available.")
@@ -1433,10 +1477,6 @@ class DiscordAdapter(BasePlatformAdapter):
await interaction.response.defer(ephemeral=True)
event = self._build_slash_event(interaction, f"/voice {mode}".strip())
await self.handle_message(event)
try:
await interaction.followup.send("Done~", ephemeral=True)
except Exception as e:
logger.debug("Discord followup failed: %s", e)
@tree.command(name="update", description="Update Hermes Agent to the latest version")
async def slash_update(interaction: discord.Interaction):
@@ -1518,6 +1558,10 @@ class DiscordAdapter(BasePlatformAdapter):
link = f"<#{thread_id}>" if thread_id else f"**{thread_name}**"
await interaction.followup.send(f"Created thread {link}", ephemeral=True)
# Track thread participation so follow-ups don't require @mention
if thread_id:
self._track_thread(thread_id)
# If a message was provided, kick off a new Hermes session in the thread
starter = (message or "").strip()
if starter and thread_id:
@@ -1685,9 +1729,12 @@ class DiscordAdapter(BasePlatformAdapter):
if not channel:
channel = await self._client.fetch_channel(int(chat_id))
# Discord embed description limit is 4096; show full command up to that
max_desc = 4088
cmd_display = command if len(command) <= max_desc else command[: max_desc - 3] + "..."
embed = discord.Embed(
title="Command Approval Required",
description=f"```\n{command[:500]}\n```",
description=f"```\n{cmd_display}\n```",
color=discord.Color.orange(),
)
embed.set_footer(text=f"Approval ID: {approval_id}")
@@ -1743,17 +1790,59 @@ class DiscordAdapter(BasePlatformAdapter):
return f"{parent_name} / {thread_name}"
return thread_name
# ------------------------------------------------------------------
# Thread participation persistence
# ------------------------------------------------------------------
@staticmethod
def _thread_state_path() -> Path:
"""Path to the persisted thread participation set."""
from hermes_cli.config import get_hermes_home
return get_hermes_home() / "discord_threads.json"
@classmethod
def _load_participated_threads(cls) -> set:
"""Load persisted thread IDs from disk."""
path = cls._thread_state_path()
try:
if path.exists():
data = json.loads(path.read_text(encoding="utf-8"))
if isinstance(data, list):
return set(data)
except Exception as e:
logger.debug("Could not load discord thread state: %s", e)
return set()
def _save_participated_threads(self) -> None:
"""Persist the current thread set to disk (best-effort)."""
path = self._thread_state_path()
try:
# Trim to most recent entries if over cap
thread_list = list(self._bot_participated_threads)
if len(thread_list) > self._MAX_TRACKED_THREADS:
thread_list = thread_list[-self._MAX_TRACKED_THREADS:]
self._bot_participated_threads = set(thread_list)
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(thread_list), encoding="utf-8")
except Exception as e:
logger.debug("Could not save discord thread state: %s", e)
def _track_thread(self, thread_id: str) -> None:
"""Add a thread to the participation set and persist."""
if thread_id not in self._bot_participated_threads:
self._bot_participated_threads.add(thread_id)
self._save_participated_threads()
async def _handle_message(self, message: DiscordMessage) -> None:
"""Handle incoming Discord messages."""
# In server channels (not DMs), require the bot to be @mentioned
# UNLESS the channel is in the free-response list.
# UNLESS the channel is in the free-response list or the message is
# in a thread where the bot has already participated.
#
# Config:
# DISCORD_FREE_RESPONSE_CHANNELS: Comma-separated channel IDs where the
# bot responds to every message without needing a mention.
# DISCORD_REQUIRE_MENTION: Set to "false" to disable mention requirement
# globally (all channels become free-response). Default: "true".
# Can also be set via discord.require_mention in config.yaml.
# Config (all settable via discord.* in config.yaml):
# discord.require_mention: Require @mention in server channels (default: true)
# discord.free_response_channels: Channel IDs where bot responds without mention
# discord.auto_thread: Auto-create thread on @mention in channels (default: true)
thread_id = None
parent_channel_id = None
@@ -1772,7 +1861,11 @@ class DiscordAdapter(BasePlatformAdapter):
require_mention = os.getenv("DISCORD_REQUIRE_MENTION", "true").lower() not in ("false", "0", "no")
is_free_channel = bool(channel_ids & free_channels)
if require_mention and not is_free_channel:
# Skip the mention check if the message is in a thread where
# the bot has previously participated (auto-created or replied in).
in_bot_thread = is_thread and thread_id in self._bot_participated_threads
if require_mention and not is_free_channel and not in_bot_thread:
if self._client.user not in message.mentions:
return
@@ -1781,17 +1874,18 @@ class DiscordAdapter(BasePlatformAdapter):
message.content = message.content.replace(f"<@!{self._client.user.id}>", "").strip()
# Auto-thread: when enabled, automatically create a thread for every
# new message in a text channel so each conversation is isolated.
# @mention in a text channel so each conversation is isolated (like Slack).
# Messages already inside threads or DMs are unaffected.
auto_threaded_channel = None
if not is_thread and not isinstance(message.channel, discord.DMChannel):
auto_thread = os.getenv("DISCORD_AUTO_THREAD", "").lower() in ("true", "1", "yes")
auto_thread = os.getenv("DISCORD_AUTO_THREAD", "true").lower() in ("true", "1", "yes")
if auto_thread:
thread = await self._auto_create_thread(message)
if thread:
is_thread = True
thread_id = str(thread.id)
auto_threaded_channel = thread
self._track_thread(thread_id)
# Determine message type
msg_type = MessageType.TEXT
@@ -1891,7 +1985,12 @@ class DiscordAdapter(BasePlatformAdapter):
reply_to_message_id=str(message.reference.message_id) if message.reference else None,
timestamp=message.created_at,
)
# Track thread participation so the bot won't require @mention for
# follow-up messages in threads it has already engaged in.
if thread_id:
self._track_thread(thread_id)
await self.handle_message(event)
+20 -4
View File
@@ -135,14 +135,23 @@ def _extract_email_address(raw: str) -> str:
return raw.strip().lower()
def _extract_attachments(msg: email_lib.message.Message) -> List[Dict[str, Any]]:
"""Extract attachment metadata and cache files locally."""
def _extract_attachments(
msg: email_lib.message.Message,
skip_attachments: bool = False,
) -> List[Dict[str, Any]]:
"""Extract attachment metadata and cache files locally.
When *skip_attachments* is True, all attachment/inline parts are ignored
(useful for malware protection or bandwidth savings).
"""
attachments = []
if not msg.is_multipart():
return attachments
for part in msg.walk():
disposition = str(part.get("Content-Disposition", ""))
if skip_attachments and ("attachment" in disposition or "inline" in disposition):
continue
if "attachment" not in disposition and "inline" not in disposition:
continue
# Skip text/plain and text/html body parts
@@ -196,6 +205,13 @@ class EmailAdapter(BasePlatformAdapter):
self._smtp_port = int(os.getenv("EMAIL_SMTP_PORT", "587"))
self._poll_interval = int(os.getenv("EMAIL_POLL_INTERVAL", "15"))
# Skip attachments — configured via config.yaml:
# platforms:
# email:
# skip_attachments: true
extra = config.extra or {}
self._skip_attachments = extra.get("skip_attachments", False)
# Track message IDs we've already processed to avoid duplicates
self._seen_uids: set = set()
self._poll_task: Optional[asyncio.Task] = None
@@ -306,7 +322,7 @@ class EmailAdapter(BasePlatformAdapter):
message_id = msg.get("Message-ID", "")
in_reply_to = msg.get("In-Reply-To", "")
body = _extract_text_body(msg)
attachments = _extract_attachments(msg)
attachments = _extract_attachments(msg, skip_attachments=self._skip_attachments)
results.append({
"uid": uid,
@@ -436,7 +452,7 @@ class EmailAdapter(BasePlatformAdapter):
logger.info("[Email] Sent reply to %s (subject: %s)", to_addr, subject)
return msg_id
async def send_typing(self, chat_id: str) -> None:
async def send_typing(self, chat_id: str, metadata: Optional[Dict[str, Any]] = None) -> None:
"""Email has no typing indicator — no-op."""
pass
+849
View File
@@ -0,0 +1,849 @@
"""Matrix gateway adapter.
Connects to any Matrix homeserver (self-hosted or matrix.org) via the
matrix-nio Python SDK. Supports optional end-to-end encryption (E2EE)
when installed with ``pip install "matrix-nio[e2e]"``.
Environment variables:
MATRIX_HOMESERVER Homeserver URL (e.g. https://matrix.example.org)
MATRIX_ACCESS_TOKEN Access token (preferred auth method)
MATRIX_USER_ID Full user ID (@bot:server) required for password login
MATRIX_PASSWORD Password (alternative to access token)
MATRIX_ENCRYPTION Set "true" to enable E2EE
MATRIX_ALLOWED_USERS Comma-separated Matrix user IDs (@user:server)
MATRIX_HOME_ROOM Room ID for cron/notification delivery
"""
from __future__ import annotations
import asyncio
import json
import logging
import mimetypes
import os
import re
import time
from pathlib import Path
from typing import Any, Dict, List, Optional, Set
from gateway.config import Platform, PlatformConfig
from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
MessageType,
SendResult,
)
logger = logging.getLogger(__name__)
# Matrix message size limit (4000 chars practical, spec has no hard limit
# but clients render poorly above this).
MAX_MESSAGE_LENGTH = 4000
# Store directory for E2EE keys and sync state.
_STORE_DIR = Path.home() / ".hermes" / "matrix" / "store"
# Grace period: ignore messages older than this many seconds before startup.
_STARTUP_GRACE_SECONDS = 5
def check_matrix_requirements() -> bool:
"""Return True if the Matrix adapter can be used."""
token = os.getenv("MATRIX_ACCESS_TOKEN", "")
password = os.getenv("MATRIX_PASSWORD", "")
homeserver = os.getenv("MATRIX_HOMESERVER", "")
if not token and not password:
logger.debug("Matrix: neither MATRIX_ACCESS_TOKEN nor MATRIX_PASSWORD set")
return False
if not homeserver:
logger.warning("Matrix: MATRIX_HOMESERVER not set")
return False
try:
import nio # noqa: F401
return True
except ImportError:
logger.warning(
"Matrix: matrix-nio not installed. "
"Run: pip install 'matrix-nio[e2e]'"
)
return False
class MatrixAdapter(BasePlatformAdapter):
"""Gateway adapter for Matrix (any homeserver)."""
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.MATRIX)
self._homeserver: str = (
config.extra.get("homeserver", "")
or os.getenv("MATRIX_HOMESERVER", "")
).rstrip("/")
self._access_token: str = config.token or os.getenv("MATRIX_ACCESS_TOKEN", "")
self._user_id: str = (
config.extra.get("user_id", "")
or os.getenv("MATRIX_USER_ID", "")
)
self._password: str = (
config.extra.get("password", "")
or os.getenv("MATRIX_PASSWORD", "")
)
self._encryption: bool = config.extra.get(
"encryption",
os.getenv("MATRIX_ENCRYPTION", "").lower() in ("true", "1", "yes"),
)
self._client: Any = None # nio.AsyncClient
self._sync_task: Optional[asyncio.Task] = None
self._closing = False
self._startup_ts: float = 0.0
# Cache: room_id → bool (is DM)
self._dm_rooms: Dict[str, bool] = {}
# Set of room IDs we've joined
self._joined_rooms: Set[str] = set()
# ------------------------------------------------------------------
# Required overrides
# ------------------------------------------------------------------
async def connect(self) -> bool:
"""Connect to the Matrix homeserver and start syncing."""
import nio
if not self._homeserver:
logger.error("Matrix: homeserver URL not configured")
return False
# Determine store path and ensure it exists.
store_path = str(_STORE_DIR)
_STORE_DIR.mkdir(parents=True, exist_ok=True)
# Create the client.
if self._encryption:
try:
client = nio.AsyncClient(
self._homeserver,
self._user_id or "",
store_path=store_path,
)
logger.info("Matrix: E2EE enabled (store: %s)", store_path)
except Exception as exc:
logger.warning(
"Matrix: failed to create E2EE client (%s), "
"falling back to plain client. Install: "
"pip install 'matrix-nio[e2e]'",
exc,
)
client = nio.AsyncClient(self._homeserver, self._user_id or "")
else:
client = nio.AsyncClient(self._homeserver, self._user_id or "")
self._client = client
# Authenticate.
if self._access_token:
client.access_token = self._access_token
# Resolve user_id if not set.
if not self._user_id:
resp = await client.whoami()
if isinstance(resp, nio.WhoamiResponse):
self._user_id = resp.user_id
client.user_id = resp.user_id
logger.info("Matrix: authenticated as %s", self._user_id)
else:
logger.error(
"Matrix: whoami failed — check MATRIX_ACCESS_TOKEN and MATRIX_HOMESERVER"
)
await client.close()
return False
else:
client.user_id = self._user_id
logger.info("Matrix: using access token for %s", self._user_id)
elif self._password and self._user_id:
resp = await client.login(
self._password,
device_name="Hermes Agent",
)
if isinstance(resp, nio.LoginResponse):
logger.info("Matrix: logged in as %s", self._user_id)
else:
logger.error("Matrix: login failed — %s", getattr(resp, "message", resp))
await client.close()
return False
else:
logger.error("Matrix: need MATRIX_ACCESS_TOKEN or MATRIX_USER_ID + MATRIX_PASSWORD")
await client.close()
return False
# If E2EE is enabled, load the crypto store.
if self._encryption and hasattr(client, "olm"):
try:
if client.should_upload_keys:
await client.keys_upload()
logger.info("Matrix: E2EE crypto initialized")
except Exception as exc:
logger.warning("Matrix: crypto init issue: %s", exc)
# Register event callbacks.
client.add_event_callback(self._on_room_message, nio.RoomMessageText)
client.add_event_callback(self._on_room_message_media, nio.RoomMessageMedia)
client.add_event_callback(self._on_room_message_media, nio.RoomMessageImage)
client.add_event_callback(self._on_room_message_media, nio.RoomMessageAudio)
client.add_event_callback(self._on_room_message_media, nio.RoomMessageVideo)
client.add_event_callback(self._on_room_message_media, nio.RoomMessageFile)
client.add_event_callback(self._on_invite, nio.InviteMemberEvent)
# If E2EE: handle encrypted events.
if self._encryption and hasattr(client, "olm"):
client.add_event_callback(
self._on_room_message, nio.MegolmEvent
)
# Initial sync to catch up, then start background sync.
self._startup_ts = time.time()
self._closing = False
# Do an initial sync to populate room state.
resp = await client.sync(timeout=10000, full_state=True)
if isinstance(resp, nio.SyncResponse):
self._joined_rooms = set(resp.rooms.join.keys())
logger.info(
"Matrix: initial sync complete, joined %d rooms",
len(self._joined_rooms),
)
# Build DM room cache from m.direct account data.
await self._refresh_dm_cache()
else:
logger.warning("Matrix: initial sync returned %s", type(resp).__name__)
# Start the sync loop.
self._sync_task = asyncio.create_task(self._sync_loop())
self._mark_connected()
return True
async def disconnect(self) -> None:
"""Disconnect from Matrix."""
self._closing = True
if self._sync_task and not self._sync_task.done():
self._sync_task.cancel()
try:
await self._sync_task
except (asyncio.CancelledError, Exception):
pass
if self._client:
await self._client.close()
self._client = None
logger.info("Matrix: disconnected")
async def send(
self,
chat_id: str,
content: str,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send a message to a Matrix room."""
import nio
if not content:
return SendResult(success=True)
formatted = self.format_message(content)
chunks = self.truncate_message(formatted, MAX_MESSAGE_LENGTH)
last_event_id = None
for chunk in chunks:
msg_content: Dict[str, Any] = {
"msgtype": "m.text",
"body": chunk,
}
# Convert markdown to HTML for rich rendering.
html = self._markdown_to_html(chunk)
if html and html != chunk:
msg_content["format"] = "org.matrix.custom.html"
msg_content["formatted_body"] = html
# Reply-to support.
if reply_to:
msg_content["m.relates_to"] = {
"m.in_reply_to": {"event_id": reply_to}
}
# Thread support: if metadata has thread_id, send as threaded reply.
thread_id = (metadata or {}).get("thread_id")
if thread_id:
relates_to = msg_content.get("m.relates_to", {})
relates_to["rel_type"] = "m.thread"
relates_to["event_id"] = thread_id
relates_to["is_falling_back"] = True
if reply_to and "m.in_reply_to" not in relates_to:
relates_to["m.in_reply_to"] = {"event_id": reply_to}
msg_content["m.relates_to"] = relates_to
resp = await self._client.room_send(
chat_id,
"m.room.message",
msg_content,
)
if isinstance(resp, nio.RoomSendResponse):
last_event_id = resp.event_id
else:
err = getattr(resp, "message", str(resp))
logger.error("Matrix: failed to send to %s: %s", chat_id, err)
return SendResult(success=False, error=err)
return SendResult(success=True, message_id=last_event_id)
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
"""Return room name and type (dm/group)."""
name = chat_id
chat_type = "group"
if self._client:
room = self._client.rooms.get(chat_id)
if room:
name = room.display_name or room.canonical_alias or chat_id
# Use DM cache.
if self._dm_rooms.get(chat_id, False):
chat_type = "dm"
elif room.member_count == 2:
chat_type = "dm"
return {"name": name, "type": chat_type}
# ------------------------------------------------------------------
# Optional overrides
# ------------------------------------------------------------------
async def send_typing(
self, chat_id: str, metadata: Optional[Dict[str, Any]] = None
) -> None:
"""Send a typing indicator."""
if self._client:
try:
await self._client.room_typing(chat_id, typing_state=True, timeout=30000)
except Exception:
pass
async def edit_message(
self, chat_id: str, message_id: str, content: str
) -> SendResult:
"""Edit an existing message (via m.replace)."""
import nio
formatted = self.format_message(content)
msg_content: Dict[str, Any] = {
"msgtype": "m.text",
"body": f"* {formatted}",
"m.new_content": {
"msgtype": "m.text",
"body": formatted,
},
"m.relates_to": {
"rel_type": "m.replace",
"event_id": message_id,
},
}
html = self._markdown_to_html(formatted)
if html and html != formatted:
msg_content["m.new_content"]["format"] = "org.matrix.custom.html"
msg_content["m.new_content"]["formatted_body"] = html
msg_content["format"] = "org.matrix.custom.html"
msg_content["formatted_body"] = f"* {html}"
resp = await self._client.room_send(chat_id, "m.room.message", msg_content)
if isinstance(resp, nio.RoomSendResponse):
return SendResult(success=True, message_id=resp.event_id)
return SendResult(success=False, error=getattr(resp, "message", str(resp)))
async def send_image(
self,
chat_id: str,
image_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Download an image URL and upload it to Matrix."""
try:
# Try aiohttp first (always available), fall back to httpx
try:
import aiohttp as _aiohttp
async with _aiohttp.ClientSession() as http:
async with http.get(image_url, timeout=_aiohttp.ClientTimeout(total=30)) as resp:
resp.raise_for_status()
data = await resp.read()
ct = resp.content_type or "image/png"
fname = image_url.rsplit("/", 1)[-1].split("?")[0] or "image.png"
except ImportError:
import httpx
async with httpx.AsyncClient() as http:
resp = await http.get(image_url, follow_redirects=True, timeout=30)
resp.raise_for_status()
data = resp.content
ct = resp.headers.get("content-type", "image/png")
fname = image_url.rsplit("/", 1)[-1].split("?")[0] or "image.png"
except Exception as exc:
logger.warning("Matrix: failed to download image %s: %s", image_url, exc)
return await self.send(chat_id, f"{caption or ''}\n{image_url}".strip(), reply_to)
return await self._upload_and_send(chat_id, data, fname, ct, "m.image", caption, reply_to, metadata)
async def send_image_file(
self,
chat_id: str,
image_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Upload a local image file to Matrix."""
return await self._send_local_file(chat_id, image_path, "m.image", caption, reply_to, metadata=metadata)
async def send_document(
self,
chat_id: str,
file_path: str,
caption: Optional[str] = None,
file_name: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Upload a local file as a document."""
return await self._send_local_file(chat_id, file_path, "m.file", caption, reply_to, file_name, metadata)
async def send_voice(
self,
chat_id: str,
audio_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Upload an audio file as a voice message."""
return await self._send_local_file(chat_id, audio_path, "m.audio", caption, reply_to, metadata=metadata)
async def send_video(
self,
chat_id: str,
video_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Upload a video file."""
return await self._send_local_file(chat_id, video_path, "m.video", caption, reply_to, metadata=metadata)
def format_message(self, content: str) -> str:
"""Pass-through — Matrix supports standard Markdown natively."""
# Strip image markdown; media is uploaded separately.
content = re.sub(r"!\[([^\]]*)\]\(([^)]+)\)", r"\2", content)
return content
# ------------------------------------------------------------------
# File helpers
# ------------------------------------------------------------------
async def _upload_and_send(
self,
room_id: str,
data: bytes,
filename: str,
content_type: str,
msgtype: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Upload bytes to Matrix and send as a media message."""
import nio
# Upload to homeserver.
resp = await self._client.upload(
data,
content_type=content_type,
filename=filename,
)
if not isinstance(resp, nio.UploadResponse):
err = getattr(resp, "message", str(resp))
logger.error("Matrix: upload failed: %s", err)
return SendResult(success=False, error=err)
mxc_url = resp.content_uri
# Build media message content.
msg_content: Dict[str, Any] = {
"msgtype": msgtype,
"body": caption or filename,
"url": mxc_url,
"info": {
"mimetype": content_type,
"size": len(data),
},
}
if reply_to:
msg_content["m.relates_to"] = {
"m.in_reply_to": {"event_id": reply_to}
}
thread_id = (metadata or {}).get("thread_id")
if thread_id:
relates_to = msg_content.get("m.relates_to", {})
relates_to["rel_type"] = "m.thread"
relates_to["event_id"] = thread_id
relates_to["is_falling_back"] = True
msg_content["m.relates_to"] = relates_to
resp2 = await self._client.room_send(room_id, "m.room.message", msg_content)
if isinstance(resp2, nio.RoomSendResponse):
return SendResult(success=True, message_id=resp2.event_id)
return SendResult(success=False, error=getattr(resp2, "message", str(resp2)))
async def _send_local_file(
self,
room_id: str,
file_path: str,
msgtype: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
file_name: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Read a local file and upload it."""
p = Path(file_path)
if not p.exists():
return await self.send(
room_id, f"{caption or ''}\n(file not found: {file_path})", reply_to
)
fname = file_name or p.name
ct = mimetypes.guess_type(fname)[0] or "application/octet-stream"
data = p.read_bytes()
return await self._upload_and_send(room_id, data, fname, ct, msgtype, caption, reply_to, metadata)
# ------------------------------------------------------------------
# Sync loop
# ------------------------------------------------------------------
async def _sync_loop(self) -> None:
"""Continuously sync with the homeserver."""
while not self._closing:
try:
await self._client.sync(timeout=30000)
except asyncio.CancelledError:
return
except Exception as exc:
if self._closing:
return
logger.warning("Matrix: sync error: %s — retrying in 5s", exc)
await asyncio.sleep(5)
# ------------------------------------------------------------------
# Event callbacks
# ------------------------------------------------------------------
async def _on_room_message(self, room: Any, event: Any) -> None:
"""Handle incoming text messages (and decrypted megolm events)."""
import nio
# Ignore own messages.
if event.sender == self._user_id:
return
# Startup grace: ignore old messages from initial sync.
event_ts = getattr(event, "server_timestamp", 0) / 1000.0
if event_ts and event_ts < self._startup_ts - _STARTUP_GRACE_SECONDS:
return
# Handle decrypted MegolmEvents — extract the inner event.
if isinstance(event, nio.MegolmEvent):
# Failed to decrypt.
logger.warning(
"Matrix: could not decrypt event %s in %s",
event.event_id, room.room_id,
)
return
# Skip edits (m.replace relation).
source_content = getattr(event, "source", {}).get("content", {})
relates_to = source_content.get("m.relates_to", {})
if relates_to.get("rel_type") == "m.replace":
return
body = getattr(event, "body", "") or ""
if not body:
return
# Determine chat type.
is_dm = self._dm_rooms.get(room.room_id, False)
if not is_dm and room.member_count == 2:
is_dm = True
chat_type = "dm" if is_dm else "group"
# Thread support.
thread_id = None
if relates_to.get("rel_type") == "m.thread":
thread_id = relates_to.get("event_id")
# Reply-to detection.
reply_to = None
in_reply_to = relates_to.get("m.in_reply_to", {})
if in_reply_to:
reply_to = in_reply_to.get("event_id")
# Strip reply fallback from body (Matrix prepends "> ..." lines).
if reply_to and body.startswith("> "):
lines = body.split("\n")
stripped = []
past_fallback = False
for line in lines:
if not past_fallback:
if line.startswith("> ") or line == ">":
continue
if line == "":
past_fallback = True
continue
past_fallback = True
stripped.append(line)
body = "\n".join(stripped) if stripped else body
# Message type.
msg_type = MessageType.TEXT
if body.startswith("!") or body.startswith("/"):
msg_type = MessageType.COMMAND
source = self.build_source(
chat_id=room.room_id,
chat_type=chat_type,
user_id=event.sender,
user_name=self._get_display_name(room, event.sender),
thread_id=thread_id,
)
msg_event = MessageEvent(
text=body,
message_type=msg_type,
source=source,
raw_message=getattr(event, "source", {}),
message_id=event.event_id,
reply_to_message_id=reply_to,
)
await self.handle_message(msg_event)
async def _on_room_message_media(self, room: Any, event: Any) -> None:
"""Handle incoming media messages (images, audio, video, files)."""
import nio
# Ignore own messages.
if event.sender == self._user_id:
return
# Startup grace.
event_ts = getattr(event, "server_timestamp", 0) / 1000.0
if event_ts and event_ts < self._startup_ts - _STARTUP_GRACE_SECONDS:
return
body = getattr(event, "body", "") or ""
url = getattr(event, "url", "")
# Convert mxc:// to HTTP URL for downstream processing.
http_url = ""
if url and url.startswith("mxc://"):
http_url = self._mxc_to_http(url)
# Determine message type from event class.
# Use the MIME type from the event's content info when available,
# falling back to category-level MIME types for downstream matching
# (gateway/run.py checks startswith("image/"), startswith("audio/"), etc.)
content_info = getattr(event, "content", {}) if isinstance(getattr(event, "content", None), dict) else {}
event_mimetype = (content_info.get("info") or {}).get("mimetype", "")
media_type = "application/octet-stream"
msg_type = MessageType.DOCUMENT
if isinstance(event, nio.RoomMessageImage):
msg_type = MessageType.PHOTO
media_type = event_mimetype or "image/png"
elif isinstance(event, nio.RoomMessageAudio):
msg_type = MessageType.AUDIO
media_type = event_mimetype or "audio/ogg"
elif isinstance(event, nio.RoomMessageVideo):
msg_type = MessageType.VIDEO
media_type = event_mimetype or "video/mp4"
elif event_mimetype:
media_type = event_mimetype
is_dm = self._dm_rooms.get(room.room_id, False)
if not is_dm and room.member_count == 2:
is_dm = True
chat_type = "dm" if is_dm else "group"
# Thread/reply detection.
source_content = getattr(event, "source", {}).get("content", {})
relates_to = source_content.get("m.relates_to", {})
thread_id = None
if relates_to.get("rel_type") == "m.thread":
thread_id = relates_to.get("event_id")
source = self.build_source(
chat_id=room.room_id,
chat_type=chat_type,
user_id=event.sender,
user_name=self._get_display_name(room, event.sender),
thread_id=thread_id,
)
msg_event = MessageEvent(
text=body,
message_type=msg_type,
source=source,
raw_message=getattr(event, "source", {}),
message_id=event.event_id,
media_urls=[http_url] if http_url else None,
media_types=[media_type] if http_url else None,
)
await self.handle_message(msg_event)
async def _on_invite(self, room: Any, event: Any) -> None:
"""Auto-join rooms when invited."""
import nio
if not isinstance(event, nio.InviteMemberEvent):
return
# Only process invites directed at us.
if event.state_key != self._user_id:
return
if event.membership != "invite":
return
logger.info(
"Matrix: invited to %s by %s — joining",
room.room_id, event.sender,
)
try:
resp = await self._client.join(room.room_id)
if isinstance(resp, nio.JoinResponse):
self._joined_rooms.add(room.room_id)
logger.info("Matrix: joined %s", room.room_id)
# Refresh DM cache since new room may be a DM.
await self._refresh_dm_cache()
else:
logger.warning(
"Matrix: failed to join %s: %s",
room.room_id, getattr(resp, "message", resp),
)
except Exception as exc:
logger.warning("Matrix: error joining %s: %s", room.room_id, exc)
# ------------------------------------------------------------------
# Helpers
# ------------------------------------------------------------------
async def _refresh_dm_cache(self) -> None:
"""Refresh the DM room cache from m.direct account data.
Tries the account_data API first, then falls back to parsing
the sync response's account_data for robustness.
"""
if not self._client:
return
dm_data: Optional[Dict] = None
# Primary: try the dedicated account data endpoint.
try:
resp = await self._client.get_account_data("m.direct")
if hasattr(resp, "content"):
dm_data = resp.content
elif isinstance(resp, dict):
dm_data = resp
except Exception as exc:
logger.debug("Matrix: get_account_data('m.direct') failed: %s — trying sync fallback", exc)
# Fallback: parse from the client's account_data store (populated by sync).
if dm_data is None:
try:
# matrix-nio stores account data events on the client object
ad = getattr(self._client, "account_data", None)
if ad and isinstance(ad, dict) and "m.direct" in ad:
event = ad["m.direct"]
if hasattr(event, "content"):
dm_data = event.content
elif isinstance(event, dict):
dm_data = event
except Exception:
pass
if dm_data is None:
return
dm_room_ids: Set[str] = set()
for user_id, rooms in dm_data.items():
if isinstance(rooms, list):
dm_room_ids.update(rooms)
self._dm_rooms = {
rid: (rid in dm_room_ids)
for rid in self._joined_rooms
}
def _get_display_name(self, room: Any, user_id: str) -> str:
"""Get a user's display name in a room, falling back to user_id."""
if room and hasattr(room, "users"):
user = room.users.get(user_id)
if user and getattr(user, "display_name", None):
return user.display_name
# Strip the @...:server format to just the localpart.
if user_id.startswith("@") and ":" in user_id:
return user_id[1:].split(":")[0]
return user_id
def _mxc_to_http(self, mxc_url: str) -> str:
"""Convert mxc://server/media_id to an HTTP download URL."""
# mxc://matrix.org/abc123 → https://matrix.org/_matrix/client/v1/media/download/matrix.org/abc123
# Uses the authenticated client endpoint (spec v1.11+) instead of the
# deprecated /_matrix/media/v3/download/ path.
if not mxc_url.startswith("mxc://"):
return mxc_url
parts = mxc_url[6:] # strip mxc://
# Use our homeserver for download (federation handles the rest).
return f"{self._homeserver}/_matrix/client/v1/media/download/{parts}"
def _markdown_to_html(self, text: str) -> str:
"""Convert Markdown to Matrix-compatible HTML.
Uses a simple conversion for common patterns. For full fidelity
a markdown-it style library could be used, but this covers the
common cases without an extra dependency.
"""
try:
import markdown
html = markdown.markdown(
text,
extensions=["fenced_code", "tables", "nl2br"],
)
# Strip wrapping <p> tags for single-paragraph messages.
if html.count("<p>") == 1:
html = html.replace("<p>", "").replace("</p>", "")
return html
except ImportError:
pass
# Minimal fallback: just handle bold, italic, code.
html = text
html = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", html)
html = re.sub(r"\*(.+?)\*", r"<em>\1</em>", html)
html = re.sub(r"`([^`]+)`", r"<code>\1</code>", html)
html = re.sub(r"\n", r"<br>", html)
return html
+664
View File
@@ -0,0 +1,664 @@
"""Mattermost gateway adapter.
Connects to a self-hosted (or cloud) Mattermost instance via its REST API
(v4) and WebSocket for real-time events. No external Mattermost library
required uses aiohttp which is already a Hermes dependency.
Environment variables:
MATTERMOST_URL Server URL (e.g. https://mm.example.com)
MATTERMOST_TOKEN Bot token or personal-access token
MATTERMOST_ALLOWED_USERS Comma-separated user IDs
MATTERMOST_HOME_CHANNEL Channel ID for cron/notification delivery
"""
from __future__ import annotations
import asyncio
import json
import logging
import os
import re
import time
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
from gateway.config import Platform, PlatformConfig
from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
MessageType,
SendResult,
)
logger = logging.getLogger(__name__)
# Mattermost post size limit (server default is 16383, but 4000 is the
# practical limit for readable messages — matching OpenClaw's choice).
MAX_POST_LENGTH = 4000
# Channel type codes returned by the Mattermost API.
_CHANNEL_TYPE_MAP = {
"D": "dm",
"G": "group",
"P": "group", # private channel → treat as group
"O": "channel",
}
# Reconnect parameters (exponential backoff).
_RECONNECT_BASE_DELAY = 2.0
_RECONNECT_MAX_DELAY = 60.0
_RECONNECT_JITTER = 0.2
def check_mattermost_requirements() -> bool:
"""Return True if the Mattermost adapter can be used."""
token = os.getenv("MATTERMOST_TOKEN", "")
url = os.getenv("MATTERMOST_URL", "")
if not token:
logger.debug("Mattermost: MATTERMOST_TOKEN not set")
return False
if not url:
logger.warning("Mattermost: MATTERMOST_URL not set")
return False
try:
import aiohttp # noqa: F401
return True
except ImportError:
logger.warning("Mattermost: aiohttp not installed")
return False
class MattermostAdapter(BasePlatformAdapter):
"""Gateway adapter for Mattermost (self-hosted or cloud)."""
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.MATTERMOST)
self._base_url: str = (
config.extra.get("url", "")
or os.getenv("MATTERMOST_URL", "")
).rstrip("/")
self._token: str = config.token or os.getenv("MATTERMOST_TOKEN", "")
self._bot_user_id: str = ""
self._bot_username: str = ""
# aiohttp session + websocket handle
self._session: Any = None # aiohttp.ClientSession
self._ws: Any = None # aiohttp.ClientWebSocketResponse
self._ws_task: Optional[asyncio.Task] = None
self._reconnect_task: Optional[asyncio.Task] = None
self._closing = False
# Reply mode: "thread" to nest replies, "off" for flat messages.
self._reply_mode: str = (
config.extra.get("reply_mode", "")
or os.getenv("MATTERMOST_REPLY_MODE", "off")
).lower()
# Dedup cache: post_id → timestamp (prevent reprocessing)
self._seen_posts: Dict[str, float] = {}
self._SEEN_MAX = 2000
self._SEEN_TTL = 300 # 5 minutes
# ------------------------------------------------------------------
# HTTP helpers
# ------------------------------------------------------------------
def _headers(self) -> Dict[str, str]:
return {
"Authorization": f"Bearer {self._token}",
"Content-Type": "application/json",
}
async def _api_get(self, path: str) -> Dict[str, Any]:
"""GET /api/v4/{path}."""
import aiohttp
url = f"{self._base_url}/api/v4/{path.lstrip('/')}"
try:
async with self._session.get(url, headers=self._headers()) as resp:
if resp.status >= 400:
body = await resp.text()
logger.error("MM API GET %s%s: %s", path, resp.status, body[:200])
return {}
return await resp.json()
except aiohttp.ClientError as exc:
logger.error("MM API GET %s network error: %s", path, exc)
return {}
async def _api_post(
self, path: str, payload: Dict[str, Any]
) -> Dict[str, Any]:
"""POST /api/v4/{path} with JSON body."""
import aiohttp
url = f"{self._base_url}/api/v4/{path.lstrip('/')}"
try:
async with self._session.post(
url, headers=self._headers(), json=payload
) as resp:
if resp.status >= 400:
body = await resp.text()
logger.error("MM API POST %s%s: %s", path, resp.status, body[:200])
return {}
return await resp.json()
except aiohttp.ClientError as exc:
logger.error("MM API POST %s network error: %s", path, exc)
return {}
async def _api_put(
self, path: str, payload: Dict[str, Any]
) -> Dict[str, Any]:
"""PUT /api/v4/{path} with JSON body."""
import aiohttp
url = f"{self._base_url}/api/v4/{path.lstrip('/')}"
try:
async with self._session.put(
url, headers=self._headers(), json=payload
) as resp:
if resp.status >= 400:
body = await resp.text()
logger.error("MM API PUT %s%s: %s", path, resp.status, body[:200])
return {}
return await resp.json()
except aiohttp.ClientError as exc:
logger.error("MM API PUT %s network error: %s", path, exc)
return {}
async def _upload_file(
self, channel_id: str, file_data: bytes, filename: str, content_type: str = "application/octet-stream"
) -> Optional[str]:
"""Upload a file and return its file ID, or None on failure."""
import aiohttp
url = f"{self._base_url}/api/v4/files"
form = aiohttp.FormData()
form.add_field("channel_id", channel_id)
form.add_field(
"files",
file_data,
filename=filename,
content_type=content_type,
)
headers = {"Authorization": f"Bearer {self._token}"}
async with self._session.post(url, headers=headers, data=form) as resp:
if resp.status >= 400:
body = await resp.text()
logger.error("MM file upload → %s: %s", resp.status, body[:200])
return None
data = await resp.json()
infos = data.get("file_infos", [])
return infos[0]["id"] if infos else None
# ------------------------------------------------------------------
# Required overrides
# ------------------------------------------------------------------
async def connect(self) -> bool:
"""Connect to Mattermost and start the WebSocket listener."""
import aiohttp
if not self._base_url or not self._token:
logger.error("Mattermost: URL or token not configured")
return False
self._session = aiohttp.ClientSession()
self._closing = False
# Verify credentials and fetch bot identity.
me = await self._api_get("users/me")
if not me or "id" not in me:
logger.error("Mattermost: failed to authenticate — check MATTERMOST_TOKEN and MATTERMOST_URL")
await self._session.close()
return False
self._bot_user_id = me["id"]
self._bot_username = me.get("username", "")
logger.info(
"Mattermost: authenticated as @%s (%s) on %s",
self._bot_username,
self._bot_user_id,
self._base_url,
)
# Start WebSocket in background.
self._ws_task = asyncio.create_task(self._ws_loop())
self._mark_connected()
return True
async def disconnect(self) -> None:
"""Disconnect from Mattermost."""
self._closing = True
if self._ws_task and not self._ws_task.done():
self._ws_task.cancel()
try:
await self._ws_task
except (asyncio.CancelledError, Exception):
pass
if self._reconnect_task and not self._reconnect_task.done():
self._reconnect_task.cancel()
if self._ws:
await self._ws.close()
self._ws = None
if self._session and not self._session.closed:
await self._session.close()
logger.info("Mattermost: disconnected")
async def send(
self,
chat_id: str,
content: str,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send a message (or multiple chunks) to a channel."""
if not content:
return SendResult(success=True)
formatted = self.format_message(content)
chunks = self.truncate_message(formatted, MAX_POST_LENGTH)
last_id = None
for chunk in chunks:
payload: Dict[str, Any] = {
"channel_id": chat_id,
"message": chunk,
}
# Thread support: reply_to is the root post ID.
if reply_to and self._reply_mode == "thread":
payload["root_id"] = reply_to
data = await self._api_post("posts", payload)
if not data or "id" not in data:
return SendResult(success=False, error="Failed to create post")
last_id = data["id"]
return SendResult(success=True, message_id=last_id)
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
"""Return channel name and type."""
data = await self._api_get(f"channels/{chat_id}")
if not data:
return {"name": chat_id, "type": "channel"}
ch_type = _CHANNEL_TYPE_MAP.get(data.get("type", "O"), "channel")
display_name = data.get("display_name") or data.get("name") or chat_id
return {"name": display_name, "type": ch_type}
# ------------------------------------------------------------------
# Optional overrides
# ------------------------------------------------------------------
async def send_typing(
self, chat_id: str, metadata: Optional[Dict[str, Any]] = None
) -> None:
"""Send a typing indicator."""
await self._api_post(
f"users/{self._bot_user_id}/typing",
{"channel_id": chat_id},
)
async def edit_message(
self, chat_id: str, message_id: str, content: str
) -> SendResult:
"""Edit an existing post."""
formatted = self.format_message(content)
data = await self._api_put(
f"posts/{message_id}/patch",
{"message": formatted},
)
if not data or "id" not in data:
return SendResult(success=False, error="Failed to edit post")
return SendResult(success=True, message_id=data["id"])
async def send_image(
self,
chat_id: str,
image_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Download an image and upload it as a file attachment."""
return await self._send_url_as_file(
chat_id, image_url, caption, reply_to, "image"
)
async def send_image_file(
self,
chat_id: str,
image_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Upload a local image file."""
return await self._send_local_file(
chat_id, image_path, caption, reply_to
)
async def send_document(
self,
chat_id: str,
file_path: str,
caption: Optional[str] = None,
file_name: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Upload a local file as a document."""
return await self._send_local_file(
chat_id, file_path, caption, reply_to, file_name
)
async def send_voice(
self,
chat_id: str,
audio_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Upload an audio file."""
return await self._send_local_file(
chat_id, audio_path, caption, reply_to
)
async def send_video(
self,
chat_id: str,
video_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Upload a video file."""
return await self._send_local_file(
chat_id, video_path, caption, reply_to
)
def format_message(self, content: str) -> str:
"""Mattermost uses standard Markdown — mostly pass through.
Strip image markdown into plain links (files are uploaded separately).
"""
# Convert ![alt](url) to just the URL — Mattermost renders
# image URLs as inline previews automatically.
content = re.sub(r"!\[([^\]]*)\]\(([^)]+)\)", r"\2", content)
return content
# ------------------------------------------------------------------
# File helpers
# ------------------------------------------------------------------
async def _send_url_as_file(
self,
chat_id: str,
url: str,
caption: Optional[str],
reply_to: Optional[str],
kind: str = "file",
) -> SendResult:
"""Download a URL and upload it as a file attachment."""
import aiohttp
try:
async with self._session.get(url, timeout=aiohttp.ClientTimeout(total=30)) as resp:
if resp.status >= 400:
# Fall back to sending the URL as text.
return await self.send(chat_id, f"{caption or ''}\n{url}".strip(), reply_to)
file_data = await resp.read()
ct = resp.content_type or "application/octet-stream"
# Derive filename from URL.
fname = url.rsplit("/", 1)[-1].split("?")[0] or f"{kind}.png"
except Exception as exc:
logger.warning("Mattermost: failed to download %s: %s", url, exc)
return await self.send(chat_id, f"{caption or ''}\n{url}".strip(), reply_to)
file_id = await self._upload_file(chat_id, file_data, fname, ct)
if not file_id:
return await self.send(chat_id, f"{caption or ''}\n{url}".strip(), reply_to)
payload: Dict[str, Any] = {
"channel_id": chat_id,
"message": caption or "",
"file_ids": [file_id],
}
if reply_to and self._reply_mode == "thread":
payload["root_id"] = reply_to
data = await self._api_post("posts", payload)
if not data or "id" not in data:
return SendResult(success=False, error="Failed to post with file")
return SendResult(success=True, message_id=data["id"])
async def _send_local_file(
self,
chat_id: str,
file_path: str,
caption: Optional[str],
reply_to: Optional[str],
file_name: Optional[str] = None,
) -> SendResult:
"""Upload a local file and attach it to a post."""
import mimetypes
p = Path(file_path)
if not p.exists():
return await self.send(
chat_id, f"{caption or ''}\n(file not found: {file_path})", reply_to
)
fname = file_name or p.name
ct = mimetypes.guess_type(fname)[0] or "application/octet-stream"
file_data = p.read_bytes()
file_id = await self._upload_file(chat_id, file_data, fname, ct)
if not file_id:
return SendResult(success=False, error="File upload failed")
payload: Dict[str, Any] = {
"channel_id": chat_id,
"message": caption or "",
"file_ids": [file_id],
}
if reply_to and self._reply_mode == "thread":
payload["root_id"] = reply_to
data = await self._api_post("posts", payload)
if not data or "id" not in data:
return SendResult(success=False, error="Failed to post with file")
return SendResult(success=True, message_id=data["id"])
# ------------------------------------------------------------------
# WebSocket
# ------------------------------------------------------------------
async def _ws_loop(self) -> None:
"""Connect to the WebSocket and listen for events, reconnecting on failure."""
delay = _RECONNECT_BASE_DELAY
while not self._closing:
try:
await self._ws_connect_and_listen()
# Clean disconnect — reset delay.
delay = _RECONNECT_BASE_DELAY
except asyncio.CancelledError:
return
except Exception as exc:
if self._closing:
return
logger.warning("Mattermost WS error: %s — reconnecting in %.0fs", exc, delay)
if self._closing:
return
# Exponential backoff with jitter.
import random
jitter = delay * _RECONNECT_JITTER * random.random()
await asyncio.sleep(delay + jitter)
delay = min(delay * 2, _RECONNECT_MAX_DELAY)
async def _ws_connect_and_listen(self) -> None:
"""Single WebSocket session: connect, authenticate, process events."""
# Build WS URL: https:// → wss://, http:// → ws://
ws_url = re.sub(r"^http", "ws", self._base_url) + "/api/v4/websocket"
logger.info("Mattermost: connecting to %s", ws_url)
self._ws = await self._session.ws_connect(ws_url, heartbeat=30.0)
# Authenticate via the WebSocket.
auth_msg = {
"seq": 1,
"action": "authentication_challenge",
"data": {"token": self._token},
}
await self._ws.send_json(auth_msg)
logger.info("Mattermost: WebSocket connected and authenticated")
async for raw_msg in self._ws:
if self._closing:
return
if raw_msg.type in (
raw_msg.type.TEXT,
raw_msg.type.BINARY,
):
try:
event = json.loads(raw_msg.data)
except (json.JSONDecodeError, TypeError):
continue
await self._handle_ws_event(event)
elif raw_msg.type in (
raw_msg.type.ERROR,
raw_msg.type.CLOSE,
raw_msg.type.CLOSING,
raw_msg.type.CLOSED,
):
logger.info("Mattermost: WebSocket closed (%s)", raw_msg.type)
break
async def _handle_ws_event(self, event: Dict[str, Any]) -> None:
"""Process a single WebSocket event."""
event_type = event.get("event")
if event_type != "posted":
return
data = event.get("data", {})
raw_post_str = data.get("post")
if not raw_post_str:
return
try:
post = json.loads(raw_post_str)
except (json.JSONDecodeError, TypeError):
return
# Ignore own messages.
if post.get("user_id") == self._bot_user_id:
return
# Ignore system posts.
if post.get("type"):
return
post_id = post.get("id", "")
# Dedup.
self._prune_seen()
if post_id in self._seen_posts:
return
self._seen_posts[post_id] = time.time()
# Build message event.
channel_id = post.get("channel_id", "")
channel_type_raw = data.get("channel_type", "O")
chat_type = _CHANNEL_TYPE_MAP.get(channel_type_raw, "channel")
# For DMs, user_id is sufficient. For channels, check for @mention.
message_text = post.get("message", "")
# Resolve sender info.
sender_id = post.get("user_id", "")
sender_name = data.get("sender_name", "").lstrip("@") or sender_id
# Thread support: if the post is in a thread, use root_id.
thread_id = post.get("root_id") or None
# Determine message type.
file_ids = post.get("file_ids") or []
msg_type = MessageType.TEXT
if message_text.startswith("/"):
msg_type = MessageType.COMMAND
# Download file attachments immediately (URLs require auth headers
# that downstream tools won't have).
media_urls: List[str] = []
media_types: List[str] = []
for fid in file_ids:
try:
file_info = await self._api_get(f"files/{fid}/info")
fname = file_info.get("name", f"file_{fid}")
ext = Path(fname).suffix or ""
mime = file_info.get("mime_type", "application/octet-stream")
import aiohttp
dl_url = f"{self._base_url}/api/v4/files/{fid}"
async with self._session.get(
dl_url,
headers={"Authorization": f"Bearer {self._token}"},
timeout=aiohttp.ClientTimeout(total=30),
) as resp:
if resp.status < 400:
file_data = await resp.read()
from gateway.platforms.base import cache_image_from_bytes, cache_document_from_bytes
if mime.startswith("image/"):
local_path = cache_image_from_bytes(file_data, ext or ".png")
media_urls.append(local_path)
media_types.append("image")
elif mime.startswith("audio/"):
from gateway.platforms.base import cache_audio_from_bytes
local_path = cache_audio_from_bytes(file_data, ext or ".ogg")
media_urls.append(local_path)
media_types.append("audio")
else:
local_path = cache_document_from_bytes(file_data, fname)
media_urls.append(local_path)
media_types.append("document")
else:
logger.warning("Mattermost: failed to download file %s: HTTP %s", fid, resp.status)
except Exception as exc:
logger.warning("Mattermost: error downloading file %s: %s", fid, exc)
source = self.build_source(
chat_id=channel_id,
chat_type=chat_type,
user_id=sender_id,
user_name=sender_name,
thread_id=thread_id,
)
msg_event = MessageEvent(
text=message_text,
message_type=msg_type,
source=source,
raw_message=post,
message_id=post_id,
media_urls=media_urls if media_urls else None,
media_types=media_types if media_types else None,
)
await self.handle_message(msg_event)
def _prune_seen(self) -> None:
"""Remove expired entries from the dedup cache."""
if len(self._seen_posts) < self._SEEN_MAX:
return
now = time.time()
self._seen_posts = {
pid: ts
for pid, ts in self._seen_posts.items()
if now - ts < self._SEEN_TTL
}
+37 -5
View File
@@ -179,6 +179,11 @@ class SignalAdapter(BasePlatformAdapter):
# Normalize account for self-message filtering
self._account_normalized = self.account.strip()
# Track recently sent message timestamps to prevent echo-back loops
# in Note to Self / self-chat mode (mirrors WhatsApp recentlySentIds)
self._recent_sent_timestamps: set = set()
self._max_recent_timestamps = 50
logger.info("Signal adapter initialized: url=%s account=%s groups=%s",
self.http_url, _redact_phone(self.account),
"enabled" if self.group_allow_from else "disabled")
@@ -353,10 +358,26 @@ class SignalAdapter(BasePlatformAdapter):
# Unwrap nested envelope if present
envelope_data = envelope.get("envelope", envelope)
# Filter syncMessage envelopes (sent transcripts, read receipts, etc.)
# signal-cli may set syncMessage to null vs omitting it, so check key existence
# Handle syncMessage: extract "Note to Self" messages (sent to own account)
# while still filtering other sync events (read receipts, typing, etc.)
is_note_to_self = False
if "syncMessage" in envelope_data:
return
sync_msg = envelope_data.get("syncMessage")
if sync_msg and isinstance(sync_msg, dict):
sent_msg = sync_msg.get("sentMessage")
if sent_msg and isinstance(sent_msg, dict):
dest = sent_msg.get("destinationNumber") or sent_msg.get("destination")
sent_ts = sent_msg.get("timestamp")
if dest == self._account_normalized:
# Check if this is an echo of our own outbound reply
if sent_ts and sent_ts in self._recent_sent_timestamps:
self._recent_sent_timestamps.discard(sent_ts)
return
# Genuine user Note to Self — promote to dataMessage
is_note_to_self = True
envelope_data = {**envelope_data, "dataMessage": sent_msg}
if not is_note_to_self:
return
# Extract sender info
sender = (
@@ -371,8 +392,8 @@ class SignalAdapter(BasePlatformAdapter):
logger.debug("Signal: ignoring envelope with no sender")
return
# Self-message filtering — prevent reply loops
if self._account_normalized and sender == self._account_normalized:
# Self-message filtering — prevent reply loops (but allow Note to Self)
if self._account_normalized and sender == self._account_normalized and not is_note_to_self:
return
# Filter stories
@@ -577,9 +598,18 @@ class SignalAdapter(BasePlatformAdapter):
result = await self._rpc("send", params)
if result is not None:
self._track_sent_timestamp(result)
return SendResult(success=True)
return SendResult(success=False, error="RPC send failed")
def _track_sent_timestamp(self, rpc_result) -> None:
"""Record outbound message timestamp for echo-back filtering."""
ts = rpc_result.get("timestamp") if isinstance(rpc_result, dict) else None
if ts:
self._recent_sent_timestamps.add(ts)
if len(self._recent_sent_timestamps) > self._max_recent_timestamps:
self._recent_sent_timestamps.pop()
async def send_typing(self, chat_id: str, metadata=None) -> None:
"""Send a typing indicator."""
params: Dict[str, Any] = {
@@ -635,6 +665,7 @@ class SignalAdapter(BasePlatformAdapter):
result = await self._rpc("send", params)
if result is not None:
self._track_sent_timestamp(result)
return SendResult(success=True)
return SendResult(success=False, error="RPC send with attachment failed")
@@ -665,6 +696,7 @@ class SignalAdapter(BasePlatformAdapter):
result = await self._rpc("send", params)
if result is not None:
self._track_sent_timestamp(result)
return SendResult(success=True)
return SendResult(success=False, error="RPC send document failed")
+5 -17
View File
@@ -789,23 +789,11 @@ class SlackAdapter(BasePlatformAdapter):
user_id = command.get("user_id", "")
channel_id = command.get("channel_id", "")
# Map subcommands to gateway commands
subcommand_map = {
"new": "/reset", "reset": "/reset",
"status": "/status", "stop": "/stop",
"help": "/help",
"model": "/model", "personality": "/personality",
"retry": "/retry", "undo": "/undo",
"compact": "/compress", "compress": "/compress",
"resume": "/resume",
"background": "/background",
"usage": "/usage",
"insights": "/insights",
"title": "/title",
"reasoning": "/reasoning",
"provider": "/provider",
"rollback": "/rollback",
}
# Map subcommands to gateway commands — derived from central registry.
# Also keep "compact" as a Slack-specific alias for /compress.
from hermes_cli.commands import slack_subcommand_map
subcommand_map = slack_subcommand_map()
subcommand_map["compact"] = "/compress"
first_word = text.split()[0] if text else ""
if first_word in subcommand_map:
# Preserve arguments after the subcommand
+271
View File
@@ -0,0 +1,271 @@
"""SMS (Twilio) platform adapter.
Connects to the Twilio REST API for outbound SMS and runs an aiohttp
webhook server to receive inbound messages.
Shares credentials with the optional telephony skill same env vars:
- TWILIO_ACCOUNT_SID
- TWILIO_AUTH_TOKEN
- TWILIO_PHONE_NUMBER (E.164 from-number, e.g. +15551234567)
Gateway-specific env vars:
- SMS_WEBHOOK_PORT (default 8080)
- SMS_ALLOWED_USERS (comma-separated E.164 phone numbers)
- SMS_ALLOW_ALL_USERS (true/false)
- SMS_HOME_CHANNEL (phone number for cron delivery)
"""
import asyncio
import base64
import json
import logging
import os
import re
import urllib.parse
from typing import Any, Dict, List, Optional
from gateway.config import Platform, PlatformConfig
from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
MessageType,
SendResult,
)
logger = logging.getLogger(__name__)
TWILIO_API_BASE = "https://api.twilio.com/2010-04-01/Accounts"
MAX_SMS_LENGTH = 1600 # ~10 SMS segments
DEFAULT_WEBHOOK_PORT = 8080
# E.164 phone number pattern for redaction
_PHONE_RE = re.compile(r"\+[1-9]\d{6,14}")
def _redact_phone(phone: str) -> str:
"""Redact a phone number for logging: +15551234567 -> +1555***4567."""
if not phone:
return "<none>"
if len(phone) <= 8:
return phone[:2] + "***" + phone[-2:] if len(phone) > 4 else "****"
return phone[:5] + "***" + phone[-4:]
def check_sms_requirements() -> bool:
"""Check if SMS adapter dependencies are available."""
try:
import aiohttp # noqa: F401
except ImportError:
return False
return bool(os.getenv("TWILIO_ACCOUNT_SID") and os.getenv("TWILIO_AUTH_TOKEN"))
class SmsAdapter(BasePlatformAdapter):
"""
Twilio SMS <-> Hermes gateway adapter.
Each inbound phone number gets its own Hermes session (multi-tenant).
Replies are always sent from the configured TWILIO_PHONE_NUMBER.
"""
MAX_MESSAGE_LENGTH = MAX_SMS_LENGTH
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.SMS)
self._account_sid: str = os.environ["TWILIO_ACCOUNT_SID"]
self._auth_token: str = os.environ["TWILIO_AUTH_TOKEN"]
self._from_number: str = os.getenv("TWILIO_PHONE_NUMBER", "")
self._webhook_port: int = int(
os.getenv("SMS_WEBHOOK_PORT", str(DEFAULT_WEBHOOK_PORT))
)
self._runner = None
self._http_session: Optional["aiohttp.ClientSession"] = None
def _basic_auth_header(self) -> str:
"""Build HTTP Basic auth header value for Twilio."""
creds = f"{self._account_sid}:{self._auth_token}"
encoded = base64.b64encode(creds.encode("ascii")).decode("ascii")
return f"Basic {encoded}"
# ------------------------------------------------------------------
# Required abstract methods
# ------------------------------------------------------------------
async def connect(self) -> bool:
import aiohttp
from aiohttp import web
if not self._from_number:
logger.error("[sms] TWILIO_PHONE_NUMBER not set — cannot send replies")
return False
app = web.Application()
app.router.add_post("/webhooks/twilio", self._handle_webhook)
app.router.add_get("/health", lambda _: web.Response(text="ok"))
self._runner = web.AppRunner(app)
await self._runner.setup()
site = web.TCPSite(self._runner, "0.0.0.0", self._webhook_port)
await site.start()
self._http_session = aiohttp.ClientSession()
self._running = True
logger.info(
"[sms] Twilio webhook server listening on port %d, from: %s",
self._webhook_port,
_redact_phone(self._from_number),
)
return True
async def disconnect(self) -> None:
if self._http_session:
await self._http_session.close()
self._http_session = None
if self._runner:
await self._runner.cleanup()
self._runner = None
self._running = False
logger.info("[sms] Disconnected")
async def send(
self,
chat_id: str,
content: str,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
import aiohttp
formatted = self.format_message(content)
chunks = self.truncate_message(formatted)
last_result = SendResult(success=True)
url = f"{TWILIO_API_BASE}/{self._account_sid}/Messages.json"
headers = {
"Authorization": self._basic_auth_header(),
}
session = self._http_session or aiohttp.ClientSession()
try:
for chunk in chunks:
form_data = aiohttp.FormData()
form_data.add_field("From", self._from_number)
form_data.add_field("To", chat_id)
form_data.add_field("Body", chunk)
try:
async with session.post(url, data=form_data, headers=headers) as resp:
body = await resp.json()
if resp.status >= 400:
error_msg = body.get("message", str(body))
logger.error(
"[sms] send failed to %s: %s %s",
_redact_phone(chat_id),
resp.status,
error_msg,
)
return SendResult(
success=False,
error=f"Twilio {resp.status}: {error_msg}",
)
msg_sid = body.get("sid", "")
last_result = SendResult(success=True, message_id=msg_sid)
except Exception as e:
logger.error("[sms] send error to %s: %s", _redact_phone(chat_id), e)
return SendResult(success=False, error=str(e))
finally:
# Close session only if we created a fallback (no persistent session)
if not self._http_session and session:
await session.close()
return last_result
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
return {"name": chat_id, "type": "dm"}
# ------------------------------------------------------------------
# SMS-specific formatting
# ------------------------------------------------------------------
def format_message(self, content: str) -> str:
"""Strip markdown — SMS renders it as literal characters."""
content = re.sub(r"\*\*(.+?)\*\*", r"\1", content, flags=re.DOTALL)
content = re.sub(r"\*(.+?)\*", r"\1", content, flags=re.DOTALL)
content = re.sub(r"__(.+?)__", r"\1", content, flags=re.DOTALL)
content = re.sub(r"_(.+?)_", r"\1", content, flags=re.DOTALL)
content = re.sub(r"```[a-z]*\n?", "", content)
content = re.sub(r"`(.+?)`", r"\1", content)
content = re.sub(r"^#{1,6}\s+", "", content, flags=re.MULTILINE)
content = re.sub(r"\[([^\]]+)\]\([^\)]+\)", r"\1", content)
content = re.sub(r"\n{3,}", "\n\n", content)
return content.strip()
# ------------------------------------------------------------------
# Twilio webhook handler
# ------------------------------------------------------------------
async def _handle_webhook(self, request) -> "aiohttp.web.Response":
from aiohttp import web
try:
raw = await request.read()
# Twilio sends form-encoded data, not JSON
form = urllib.parse.parse_qs(raw.decode("utf-8"))
except Exception as e:
logger.error("[sms] webhook parse error: %s", e)
return web.Response(
text='<?xml version="1.0" encoding="UTF-8"?><Response></Response>',
content_type="application/xml",
status=400,
)
# Extract fields (parse_qs returns lists)
from_number = (form.get("From", [""]))[0].strip()
to_number = (form.get("To", [""]))[0].strip()
text = (form.get("Body", [""]))[0].strip()
message_sid = (form.get("MessageSid", [""]))[0].strip()
if not from_number or not text:
return web.Response(
text='<?xml version="1.0" encoding="UTF-8"?><Response></Response>',
content_type="application/xml",
)
# Ignore messages from our own number (echo prevention)
if from_number == self._from_number:
logger.debug("[sms] ignoring echo from own number %s", _redact_phone(from_number))
return web.Response(
text='<?xml version="1.0" encoding="UTF-8"?><Response></Response>',
content_type="application/xml",
)
logger.info(
"[sms] inbound from %s -> %s: %s",
_redact_phone(from_number),
_redact_phone(to_number),
text[:80],
)
source = self.build_source(
chat_id=from_number,
chat_name=from_number,
chat_type="dm",
user_id=from_number,
user_name=from_number,
)
event = MessageEvent(
text=text,
message_type=MessageType.TEXT,
source=source,
raw_message=form,
message_id=message_sid,
)
# Non-blocking: Twilio expects a fast response
asyncio.create_task(self.handle_message(event))
# Return empty TwiML — we send replies via the REST API, not inline TwiML
return web.Response(
text='<?xml version="1.0" encoding="UTF-8"?><Response></Response>',
content_type="application/xml",
)
+272 -56
View File
@@ -111,8 +111,18 @@ class TelegramAdapter(BasePlatformAdapter):
super().__init__(config, Platform.TELEGRAM)
self._app: Optional[Application] = None
self._bot: Optional[Bot] = None
# Buffer rapid/album photo updates so Telegram image bursts are handled
# as a single MessageEvent instead of self-interrupting multiple turns.
self._media_batch_delay_seconds = float(os.getenv("HERMES_TELEGRAM_MEDIA_BATCH_DELAY_SECONDS", "0.8"))
self._pending_photo_batches: Dict[str, MessageEvent] = {}
self._pending_photo_batch_tasks: Dict[str, asyncio.Task] = {}
self._media_group_events: Dict[str, MessageEvent] = {}
self._media_group_tasks: Dict[str, asyncio.Task] = {}
# Buffer rapid text messages so Telegram client-side splits of long
# messages are aggregated into a single MessageEvent.
self._text_batch_delay_seconds = float(os.getenv("HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS", "0.6"))
self._pending_text_batches: Dict[str, MessageEvent] = {}
self._pending_text_batch_tasks: Dict[str, asyncio.Task] = {}
self._token_lock_identity: Optional[str] = None
self._polling_error_task: Optional[asyncio.Task] = None
@@ -197,8 +207,26 @@ class TelegramAdapter(BasePlatformAdapter):
self._handle_media_message
))
# Start polling in background
await self._app.initialize()
# Start polling — retry initialize() for transient TLS resets
try:
from telegram.error import NetworkError, TimedOut
except ImportError:
NetworkError = TimedOut = OSError # type: ignore[misc,assignment]
_max_connect = 3
for _attempt in range(_max_connect):
try:
await self._app.initialize()
break
except (NetworkError, TimedOut, OSError) as init_err:
if _attempt < _max_connect - 1:
wait = 2 ** _attempt
logger.warning(
"[%s] Connect attempt %d/%d failed: %s — retrying in %ds",
self.name, _attempt + 1, _max_connect, init_err, wait,
)
await asyncio.sleep(wait)
else:
raise
await self._app.start()
loop = asyncio.get_running_loop()
@@ -217,29 +245,13 @@ class TelegramAdapter(BasePlatformAdapter):
)
# Register bot commands so Telegram shows a hint menu when users type /
# List is derived from the central COMMAND_REGISTRY — adding a new
# gateway command there automatically adds it to the Telegram menu.
try:
from telegram import BotCommand
from hermes_cli.commands import telegram_bot_commands
await self._bot.set_my_commands([
BotCommand("new", "Start a new conversation"),
BotCommand("reset", "Reset conversation history"),
BotCommand("model", "Show or change the model"),
BotCommand("reasoning", "Show or change reasoning effort"),
BotCommand("personality", "Set a personality"),
BotCommand("retry", "Retry your last message"),
BotCommand("undo", "Remove the last exchange"),
BotCommand("status", "Show session info"),
BotCommand("stop", "Stop the running agent"),
BotCommand("sethome", "Set this chat as the home channel"),
BotCommand("compress", "Compress conversation context"),
BotCommand("title", "Set or show the session title"),
BotCommand("resume", "Resume a previously-named session"),
BotCommand("usage", "Show token usage for this session"),
BotCommand("provider", "Show available providers"),
BotCommand("insights", "Show usage insights and analytics"),
BotCommand("update", "Update Hermes to the latest version"),
BotCommand("reload_mcp", "Reload MCP servers from config"),
BotCommand("voice", "Toggle voice reply mode"),
BotCommand("help", "Show available commands"),
BotCommand(name, desc) for name, desc in telegram_bot_commands()
])
except Exception as e:
logger.warning(
@@ -260,6 +272,8 @@ class TelegramAdapter(BasePlatformAdapter):
release_scoped_lock("telegram-bot-token", self._token_lock_identity)
except Exception:
pass
message = f"Telegram startup failed: {e}"
self._set_fatal_error("telegram_connect_error", message, retryable=True)
logger.error("[%s] Failed to connect to Telegram: %s", self.name, e, exc_info=True)
return False
@@ -289,13 +303,19 @@ class TelegramAdapter(BasePlatformAdapter):
release_scoped_lock("telegram-bot-token", self._token_lock_identity)
except Exception as e:
logger.warning("[%s] Error releasing Telegram token lock: %s", self.name, e, exc_info=True)
for task in self._pending_photo_batch_tasks.values():
if task and not task.done():
task.cancel()
self._pending_photo_batch_tasks.clear()
self._pending_photo_batches.clear()
self._mark_disconnected()
self._app = None
self._bot = None
self._token_lock_identity = None
logger.info("[%s] Disconnected from Telegram", self.name)
async def send(
self,
chat_id: str,
@@ -311,36 +331,59 @@ class TelegramAdapter(BasePlatformAdapter):
# Format and split message if needed
formatted = self.format_message(content)
chunks = self.truncate_message(formatted, self.MAX_MESSAGE_LENGTH)
if len(chunks) > 1:
# truncate_message appends a raw " (1/2)" suffix. Escape the
# MarkdownV2-special parentheses so Telegram doesn't reject the
# chunk and fall back to plain text.
chunks = [
re.sub(r" \((\d+)/(\d+)\)$", r" \\(\1/\2\\)", chunk)
for chunk in chunks
]
message_ids = []
thread_id = metadata.get("thread_id") if metadata else None
try:
from telegram.error import NetworkError as _NetErr
except ImportError:
_NetErr = OSError # type: ignore[misc,assignment]
for i, chunk in enumerate(chunks):
# Try Markdown first, fall back to plain text if it fails
try:
msg = await self._bot.send_message(
chat_id=int(chat_id),
text=chunk,
parse_mode=ParseMode.MARKDOWN_V2,
reply_to_message_id=int(reply_to) if reply_to and i == 0 else None,
message_thread_id=int(thread_id) if thread_id else None,
)
except Exception as md_error:
# Markdown parsing failed, try plain text
if "parse" in str(md_error).lower() or "markdown" in str(md_error).lower():
logger.warning("[%s] MarkdownV2 parse failed, falling back to plain text: %s", self.name, md_error)
# Strip MDV2 escape backslashes so the user doesn't
# see raw backslashes littered through the message.
plain_chunk = _strip_mdv2(chunk)
msg = await self._bot.send_message(
chat_id=int(chat_id),
text=plain_chunk,
parse_mode=None, # Plain text
reply_to_message_id=int(reply_to) if reply_to and i == 0 else None,
message_thread_id=int(thread_id) if thread_id else None,
)
else:
raise # Re-raise if not a parse error
msg = None
for _send_attempt in range(3):
try:
# Try Markdown first, fall back to plain text if it fails
try:
msg = await self._bot.send_message(
chat_id=int(chat_id),
text=chunk,
parse_mode=ParseMode.MARKDOWN_V2,
reply_to_message_id=int(reply_to) if reply_to and i == 0 else None,
message_thread_id=int(thread_id) if thread_id else None,
)
except Exception as md_error:
# Markdown parsing failed, try plain text
if "parse" in str(md_error).lower() or "markdown" in str(md_error).lower():
logger.warning("[%s] MarkdownV2 parse failed, falling back to plain text: %s", self.name, md_error)
plain_chunk = _strip_mdv2(chunk)
msg = await self._bot.send_message(
chat_id=int(chat_id),
text=plain_chunk,
parse_mode=None,
reply_to_message_id=int(reply_to) if reply_to and i == 0 else None,
message_thread_id=int(thread_id) if thread_id else None,
)
else:
raise
break # success
except _NetErr as send_err:
if _send_attempt < 2:
wait = 2 ** _send_attempt
logger.warning("[%s] Network error on send (attempt %d/3), retrying in %ds: %s",
self.name, _send_attempt + 1, wait, send_err)
await asyncio.sleep(wait)
else:
raise
message_ids.append(str(msg.message_id))
return SendResult(
@@ -371,7 +414,10 @@ class TelegramAdapter(BasePlatformAdapter):
text=formatted,
parse_mode=ParseMode.MARKDOWN_V2,
)
except Exception:
except Exception as fmt_err:
# "Message is not modified" is a no-op, not an error
if "not modified" in str(fmt_err).lower():
return SendResult(success=True, message_id=message_id)
# Fallback: retry without markdown formatting
await self._bot.edit_message_text(
chat_id=int(chat_id),
@@ -380,6 +426,46 @@ class TelegramAdapter(BasePlatformAdapter):
)
return SendResult(success=True, message_id=message_id)
except Exception as e:
err_str = str(e).lower()
# "Message is not modified" — content identical, treat as success
if "not modified" in err_str:
return SendResult(success=True, message_id=message_id)
# Message too long — content exceeded 4096 chars (e.g. during
# streaming). Truncate and succeed so the stream consumer can
# split the overflow into a new message instead of dying.
if "message_too_long" in err_str or "too long" in err_str:
truncated = content[: self.MAX_MESSAGE_LENGTH - 20] + ""
try:
await self._bot.edit_message_text(
chat_id=int(chat_id),
message_id=int(message_id),
text=truncated,
)
except Exception:
pass # best-effort truncation
return SendResult(success=True, message_id=message_id)
# Flood control / RetryAfter — back off and retry once
retry_after = getattr(e, "retry_after", None)
if retry_after is not None or "retry after" in err_str:
wait = retry_after if retry_after else 1.0
logger.warning(
"[%s] Telegram flood control, waiting %.1fs",
self.name, wait,
)
await asyncio.sleep(wait)
try:
await self._bot.edit_message_text(
chat_id=int(chat_id),
message_id=int(message_id),
text=content,
)
return SendResult(success=True, message_id=message_id)
except Exception as retry_err:
logger.error(
"[%s] Edit retry failed after flood wait: %s",
self.name, retry_err,
)
return SendResult(success=False, error=str(retry_err))
logger.error(
"[%s] Failed to edit Telegram message %s: %s",
self.name,
@@ -757,12 +843,17 @@ class TelegramAdapter(BasePlatformAdapter):
return text
async def _handle_text_message(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
"""Handle incoming text messages."""
"""Handle incoming text messages.
Telegram clients split long messages into multiple updates. Buffer
rapid successive text messages from the same user/chat and aggregate
them into a single MessageEvent before dispatching.
"""
if not update.message or not update.message.text:
return
event = self._build_message_event(update.message, MessageType.TEXT)
await self.handle_message(event)
self._enqueue_text_event(event)
async def _handle_command(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
"""Handle incoming command messages."""
@@ -807,6 +898,114 @@ class TelegramAdapter(BasePlatformAdapter):
event.text = "\n".join(parts)
await self.handle_message(event)
# ------------------------------------------------------------------
# Text message aggregation (handles Telegram client-side splits)
# ------------------------------------------------------------------
def _text_batch_key(self, event: MessageEvent) -> str:
"""Session-scoped key for text message batching."""
from gateway.session import build_session_key
return build_session_key(
event.source,
group_sessions_per_user=self.config.extra.get("group_sessions_per_user", True),
)
def _enqueue_text_event(self, event: MessageEvent) -> None:
"""Buffer a text event and reset the flush timer.
When Telegram splits a long user message into multiple updates,
they arrive within a few hundred milliseconds. This method
concatenates them and waits for a short quiet period before
dispatching the combined message.
"""
key = self._text_batch_key(event)
existing = self._pending_text_batches.get(key)
if existing is None:
self._pending_text_batches[key] = event
else:
# Append text from the follow-up chunk
if event.text:
existing.text = f"{existing.text}\n{event.text}" if existing.text else event.text
# Merge any media that might be attached
if event.media_urls:
existing.media_urls.extend(event.media_urls)
existing.media_types.extend(event.media_types)
# Cancel any pending flush and restart the timer
prior_task = self._pending_text_batch_tasks.get(key)
if prior_task and not prior_task.done():
prior_task.cancel()
self._pending_text_batch_tasks[key] = asyncio.create_task(
self._flush_text_batch(key)
)
async def _flush_text_batch(self, key: str) -> None:
"""Wait for the quiet period then dispatch the aggregated text."""
current_task = asyncio.current_task()
try:
await asyncio.sleep(self._text_batch_delay_seconds)
event = self._pending_text_batches.pop(key, None)
if not event:
return
logger.info(
"[Telegram] Flushing text batch %s (%d chars)",
key, len(event.text or ""),
)
await self.handle_message(event)
finally:
if self._pending_text_batch_tasks.get(key) is current_task:
self._pending_text_batch_tasks.pop(key, None)
# ------------------------------------------------------------------
# Photo batching
# ------------------------------------------------------------------
def _photo_batch_key(self, event: MessageEvent, msg: Message) -> str:
"""Return a batching key for Telegram photos/albums."""
from gateway.session import build_session_key
session_key = build_session_key(
event.source,
group_sessions_per_user=self.config.extra.get("group_sessions_per_user", True),
)
media_group_id = getattr(msg, "media_group_id", None)
if media_group_id:
return f"{session_key}:album:{media_group_id}"
return f"{session_key}:photo-burst"
async def _flush_photo_batch(self, batch_key: str) -> None:
"""Send a buffered photo burst/album as a single MessageEvent."""
current_task = asyncio.current_task()
try:
await asyncio.sleep(self._media_batch_delay_seconds)
event = self._pending_photo_batches.pop(batch_key, None)
if not event:
return
logger.info("[Telegram] Flushing photo batch %s with %d image(s)", batch_key, len(event.media_urls))
await self.handle_message(event)
finally:
if self._pending_photo_batch_tasks.get(batch_key) is current_task:
self._pending_photo_batch_tasks.pop(batch_key, None)
def _enqueue_photo_event(self, batch_key: str, event: MessageEvent) -> None:
"""Merge photo events into a pending batch and schedule flush."""
existing = self._pending_photo_batches.get(batch_key)
if existing is None:
self._pending_photo_batches[batch_key] = event
else:
existing.media_urls.extend(event.media_urls)
existing.media_types.extend(event.media_types)
if event.text:
if not existing.text:
existing.text = event.text
elif event.text not in existing.text:
existing.text = f"{existing.text}\n\n{event.text}".strip()
prior_task = self._pending_photo_batch_tasks.get(batch_key)
if prior_task and not prior_task.done():
prior_task.cancel()
self._pending_photo_batch_tasks[batch_key] = asyncio.create_task(self._flush_photo_batch(batch_key))
async def _handle_media_message(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
"""Handle incoming media messages, downloading images to local cache."""
if not update.message:
@@ -858,14 +1057,22 @@ class TelegramAdapter(BasePlatformAdapter):
if file_obj.file_path.lower().endswith(candidate):
ext = candidate
break
# Save to cache and populate media_urls with the local path
# Save to local cache (for vision tool access)
cached_path = cache_image_from_bytes(bytes(image_bytes), ext=ext)
event.media_urls = [cached_path]
event.media_types = [f"image/{ext.lstrip('.')}"]
event.media_types = [f"image/{ext.lstrip('.')}" ]
logger.info("[Telegram] Cached user photo at %s", cached_path)
media_group_id = getattr(msg, "media_group_id", None)
if media_group_id:
await self._queue_media_group_event(str(media_group_id), event)
else:
batch_key = self._photo_batch_key(event, msg)
self._enqueue_photo_event(batch_key, event)
return
except Exception as e:
logger.warning("[Telegram] Failed to cache photo: %s", e, exc_info=True)
# Download voice/audio messages to cache for STT transcription
if msg.voice:
try:
@@ -1093,11 +1300,20 @@ class TelegramAdapter(BasePlatformAdapter):
thread_id=str(message.message_thread_id) if message.message_thread_id else None,
)
# Extract reply context if this message is a reply
reply_to_id = None
reply_to_text = None
if message.reply_to_message:
reply_to_id = str(message.reply_to_message.message_id)
reply_to_text = message.reply_to_message.text or message.reply_to_message.caption or None
return MessageEvent(
text=message.text or "",
message_type=msg_type,
source=source,
raw_message=message,
message_id=str(message.message_id),
reply_to_message_id=reply_to_id,
reply_to_text=reply_to_text,
timestamp=message.date,
)
+557
View File
@@ -0,0 +1,557 @@
"""Generic webhook platform adapter.
Runs an aiohttp HTTP server that receives webhook POSTs from external
services (GitHub, GitLab, JIRA, Stripe, etc.), validates HMAC signatures,
transforms payloads into agent prompts, and routes responses back to the
source or to another configured platform.
Configuration lives in config.yaml under platforms.webhook.extra.routes.
Each route defines:
- events: which event types to accept (header-based filtering)
- secret: HMAC secret for signature validation (REQUIRED)
- prompt: template string formatted with the webhook payload
- skills: optional list of skills to load for the agent
- deliver: where to send the response (github_comment, telegram, etc.)
- deliver_extra: additional delivery config (repo, pr_number, chat_id)
Security:
- HMAC secret is required per route (validated at startup)
- Rate limiting per route (fixed-window, configurable)
- Idempotency cache prevents duplicate agent runs on webhook retries
- Body size limits checked before reading payload
- Set secret to "INSECURE_NO_AUTH" to skip validation (testing only)
"""
import asyncio
import hashlib
import hmac
import json
import logging
import re
import subprocess
import time
from typing import Any, Dict, List, Optional
try:
from aiohttp import web
AIOHTTP_AVAILABLE = True
except ImportError:
AIOHTTP_AVAILABLE = False
web = None # type: ignore[assignment]
from gateway.config import Platform, PlatformConfig
from gateway.platforms.base import (
BasePlatformAdapter,
MessageEvent,
MessageType,
SendResult,
)
logger = logging.getLogger(__name__)
DEFAULT_HOST = "0.0.0.0"
DEFAULT_PORT = 8644
_INSECURE_NO_AUTH = "INSECURE_NO_AUTH"
def check_webhook_requirements() -> bool:
"""Check if webhook adapter dependencies are available."""
return AIOHTTP_AVAILABLE
class WebhookAdapter(BasePlatformAdapter):
"""Generic webhook receiver that triggers agent runs from HTTP POSTs."""
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.WEBHOOK)
self._host: str = config.extra.get("host", DEFAULT_HOST)
self._port: int = int(config.extra.get("port", DEFAULT_PORT))
self._global_secret: str = config.extra.get("secret", "")
self._routes: Dict[str, dict] = config.extra.get("routes", {})
self._runner = None
# Delivery info keyed by session chat_id — consumed by send()
self._delivery_info: Dict[str, dict] = {}
# Reference to gateway runner for cross-platform delivery (set externally)
self.gateway_runner = None
# Idempotency: TTL cache of recently processed delivery IDs.
# Prevents duplicate agent runs when webhook providers retry.
self._seen_deliveries: Dict[str, float] = {}
self._idempotency_ttl: int = 3600 # 1 hour
# Rate limiting: per-route timestamps in a fixed window.
self._rate_counts: Dict[str, List[float]] = {}
self._rate_limit: int = int(config.extra.get("rate_limit", 30)) # per minute
# Body size limit (auth-before-body pattern)
self._max_body_bytes: int = int(
config.extra.get("max_body_bytes", 1_048_576)
) # 1MB
# ------------------------------------------------------------------
# Lifecycle
# ------------------------------------------------------------------
async def connect(self) -> bool:
# Validate routes at startup — secret is required per route
for name, route in self._routes.items():
secret = route.get("secret", self._global_secret)
if not secret:
raise ValueError(
f"[webhook] Route '{name}' has no HMAC secret. "
f"Set 'secret' on the route or globally. "
f"For testing without auth, set secret to '{_INSECURE_NO_AUTH}'."
)
app = web.Application()
app.router.add_get("/health", self._handle_health)
app.router.add_post("/webhooks/{route_name}", self._handle_webhook)
self._runner = web.AppRunner(app)
await self._runner.setup()
site = web.TCPSite(self._runner, self._host, self._port)
await site.start()
self._mark_connected()
route_names = ", ".join(self._routes.keys()) or "(none configured)"
logger.info(
"[webhook] Listening on %s:%d — routes: %s",
self._host,
self._port,
route_names,
)
return True
async def disconnect(self) -> None:
if self._runner:
await self._runner.cleanup()
self._runner = None
self._mark_disconnected()
logger.info("[webhook] Disconnected")
async def send(
self,
chat_id: str,
content: str,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Deliver the agent's response to the configured destination.
chat_id is ``webhook:{route}:{delivery_id}`` we pop the delivery
info stored during webhook receipt so it doesn't leak memory.
"""
delivery = self._delivery_info.pop(chat_id, {})
deliver_type = delivery.get("deliver", "log")
if deliver_type == "log":
logger.info("[webhook] Response for %s: %s", chat_id, content[:200])
return SendResult(success=True)
if deliver_type == "github_comment":
return await self._deliver_github_comment(content, delivery)
# Cross-platform delivery (telegram, discord, etc.)
if self.gateway_runner and deliver_type in (
"telegram",
"discord",
"slack",
"signal",
"sms",
):
return await self._deliver_cross_platform(
deliver_type, content, delivery
)
logger.warning("[webhook] Unknown deliver type: %s", deliver_type)
return SendResult(
success=False, error=f"Unknown deliver type: {deliver_type}"
)
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
return {"name": chat_id, "type": "webhook"}
# ------------------------------------------------------------------
# HTTP handlers
# ------------------------------------------------------------------
async def _handle_health(self, request: "web.Request") -> "web.Response":
"""GET /health — simple health check."""
return web.json_response({"status": "ok", "platform": "webhook"})
async def _handle_webhook(self, request: "web.Request") -> "web.Response":
"""POST /webhooks/{route_name} — receive and process a webhook event."""
route_name = request.match_info.get("route_name", "")
route_config = self._routes.get(route_name)
if not route_config:
return web.json_response(
{"error": f"Unknown route: {route_name}"}, status=404
)
# ── Auth-before-body ─────────────────────────────────────
# Check Content-Length before reading the full payload.
content_length = request.content_length or 0
if content_length > self._max_body_bytes:
return web.json_response(
{"error": "Payload too large"}, status=413
)
# ── Rate limiting ────────────────────────────────────────
now = time.time()
window = self._rate_counts.setdefault(route_name, [])
window[:] = [t for t in window if now - t < 60]
if len(window) >= self._rate_limit:
return web.json_response(
{"error": "Rate limit exceeded"}, status=429
)
window.append(now)
# Read body
try:
raw_body = await request.read()
except Exception as e:
logger.error("[webhook] Failed to read body: %s", e)
return web.json_response({"error": "Bad request"}, status=400)
# Validate HMAC signature (skip for INSECURE_NO_AUTH testing mode)
secret = route_config.get("secret", self._global_secret)
if secret and secret != _INSECURE_NO_AUTH:
if not self._validate_signature(request, raw_body, secret):
logger.warning(
"[webhook] Invalid signature for route %s", route_name
)
return web.json_response(
{"error": "Invalid signature"}, status=401
)
# Parse payload
try:
payload = json.loads(raw_body)
except json.JSONDecodeError:
# Try form-encoded as fallback
try:
import urllib.parse
payload = dict(
urllib.parse.parse_qsl(raw_body.decode("utf-8"))
)
except Exception:
return web.json_response(
{"error": "Cannot parse body"}, status=400
)
# Check event type filter
event_type = (
request.headers.get("X-GitHub-Event", "")
or request.headers.get("X-GitLab-Event", "")
or payload.get("event_type", "")
or "unknown"
)
allowed_events = route_config.get("events", [])
if allowed_events and event_type not in allowed_events:
logger.debug(
"[webhook] Ignoring event %s for route %s (allowed: %s)",
event_type,
route_name,
allowed_events,
)
return web.json_response(
{"status": "ignored", "event": event_type}
)
# Format prompt from template
prompt_template = route_config.get("prompt", "")
prompt = self._render_prompt(
prompt_template, payload, event_type, route_name
)
# Inject skill content if configured.
# We call build_skill_invocation_message() directly rather than
# using /skill-name slash commands — the gateway's command parser
# would intercept those and break the flow.
skills = route_config.get("skills", [])
if skills:
try:
from agent.skill_commands import (
build_skill_invocation_message,
get_skill_commands,
)
skill_cmds = get_skill_commands()
for skill_name in skills:
cmd_key = f"/{skill_name}"
if cmd_key in skill_cmds:
skill_content = build_skill_invocation_message(
cmd_key, user_instruction=prompt
)
if skill_content:
prompt = skill_content
break # Load the first matching skill
else:
logger.warning(
"[webhook] Skill '%s' not found", skill_name
)
except Exception as e:
logger.warning("[webhook] Skill loading failed: %s", e)
# Build a unique delivery ID
delivery_id = request.headers.get(
"X-GitHub-Delivery",
request.headers.get("X-Request-ID", str(int(time.time() * 1000))),
)
# ── Idempotency ─────────────────────────────────────────
# Skip duplicate deliveries (webhook retries).
now = time.time()
# Prune expired entries
self._seen_deliveries = {
k: v
for k, v in self._seen_deliveries.items()
if now - v < self._idempotency_ttl
}
if delivery_id in self._seen_deliveries:
logger.info(
"[webhook] Skipping duplicate delivery %s", delivery_id
)
return web.json_response(
{"status": "duplicate", "delivery_id": delivery_id},
status=200,
)
self._seen_deliveries[delivery_id] = now
# Use delivery_id in session key so concurrent webhooks on the
# same route get independent agent runs (not queued/interrupted).
session_chat_id = f"webhook:{route_name}:{delivery_id}"
# Store delivery info for send() — consumed (popped) on delivery
deliver_config = {
"deliver": route_config.get("deliver", "log"),
"deliver_extra": self._render_delivery_extra(
route_config.get("deliver_extra", {}), payload
),
"payload": payload,
}
self._delivery_info[session_chat_id] = deliver_config
# Build source and event
source = self.build_source(
chat_id=session_chat_id,
chat_name=f"webhook/{route_name}",
chat_type="webhook",
user_id=f"webhook:{route_name}",
user_name=route_name,
)
event = MessageEvent(
text=prompt,
message_type=MessageType.TEXT,
source=source,
raw_message=payload,
message_id=delivery_id,
)
logger.info(
"[webhook] %s event=%s route=%s prompt_len=%d delivery=%s",
request.method,
event_type,
route_name,
len(prompt),
delivery_id,
)
# Non-blocking — return 202 Accepted immediately
asyncio.create_task(self.handle_message(event))
return web.json_response(
{
"status": "accepted",
"route": route_name,
"event": event_type,
"delivery_id": delivery_id,
},
status=202,
)
# ------------------------------------------------------------------
# Signature validation
# ------------------------------------------------------------------
def _validate_signature(
self, request: "web.Request", body: bytes, secret: str
) -> bool:
"""Validate webhook signature (GitHub, GitLab, generic HMAC-SHA256)."""
# GitHub: X-Hub-Signature-256 = sha256=<hex>
gh_sig = request.headers.get("X-Hub-Signature-256", "")
if gh_sig:
expected = "sha256=" + hmac.new(
secret.encode(), body, hashlib.sha256
).hexdigest()
return hmac.compare_digest(gh_sig, expected)
# GitLab: X-Gitlab-Token = <plain secret>
gl_token = request.headers.get("X-Gitlab-Token", "")
if gl_token:
return hmac.compare_digest(gl_token, secret)
# Generic: X-Webhook-Signature = <hex HMAC-SHA256>
generic_sig = request.headers.get("X-Webhook-Signature", "")
if generic_sig:
expected = hmac.new(
secret.encode(), body, hashlib.sha256
).hexdigest()
return hmac.compare_digest(generic_sig, expected)
# No recognised signature header but secret is configured → reject
logger.debug(
"[webhook] Secret configured but no signature header found"
)
return False
# ------------------------------------------------------------------
# Prompt rendering
# ------------------------------------------------------------------
def _render_prompt(
self,
template: str,
payload: dict,
event_type: str,
route_name: str,
) -> str:
"""Render a prompt template with the webhook payload.
Supports dot-notation access into nested dicts:
``{pull_request.title}`` ``payload["pull_request"]["title"]``
"""
if not template:
truncated = json.dumps(payload, indent=2)[:4000]
return (
f"Webhook event '{event_type}' on route "
f"'{route_name}':\n\n```json\n{truncated}\n```"
)
def _resolve(match: re.Match) -> str:
key = match.group(1)
value: Any = payload
for part in key.split("."):
if isinstance(value, dict):
value = value.get(part, f"{{{key}}}")
else:
return f"{{{key}}}"
if isinstance(value, (dict, list)):
return json.dumps(value, indent=2)[:2000]
return str(value)
return re.sub(r"\{([a-zA-Z0-9_.]+)\}", _resolve, template)
def _render_delivery_extra(
self, extra: dict, payload: dict
) -> dict:
"""Render delivery_extra template values with payload data."""
rendered: Dict[str, Any] = {}
for key, value in extra.items():
if isinstance(value, str):
rendered[key] = self._render_prompt(value, payload, "", "")
else:
rendered[key] = value
return rendered
# ------------------------------------------------------------------
# Response delivery
# ------------------------------------------------------------------
async def _deliver_github_comment(
self, content: str, delivery: dict
) -> SendResult:
"""Post agent response as a GitHub PR/issue comment via ``gh`` CLI."""
extra = delivery.get("deliver_extra", {})
repo = extra.get("repo", "")
pr_number = extra.get("pr_number", "")
if not repo or not pr_number:
logger.error(
"[webhook] github_comment delivery missing repo or pr_number"
)
return SendResult(
success=False, error="Missing repo or pr_number"
)
try:
result = subprocess.run(
[
"gh",
"pr",
"comment",
str(pr_number),
"--repo",
repo,
"--body",
content,
],
capture_output=True,
text=True,
timeout=30,
)
if result.returncode == 0:
logger.info(
"[webhook] Posted comment on %s#%s", repo, pr_number
)
return SendResult(success=True)
else:
logger.error(
"[webhook] gh pr comment failed: %s", result.stderr
)
return SendResult(success=False, error=result.stderr)
except FileNotFoundError:
logger.error(
"[webhook] 'gh' CLI not found — install GitHub CLI for "
"github_comment delivery"
)
return SendResult(
success=False, error="gh CLI not installed"
)
except Exception as e:
logger.error("[webhook] github_comment delivery error: %s", e)
return SendResult(success=False, error=str(e))
async def _deliver_cross_platform(
self, platform_name: str, content: str, delivery: dict
) -> SendResult:
"""Route response to another platform (telegram, discord, etc.)."""
if not self.gateway_runner:
return SendResult(
success=False,
error="No gateway runner for cross-platform delivery",
)
try:
target_platform = Platform(platform_name)
except ValueError:
return SendResult(
success=False, error=f"Unknown platform: {platform_name}"
)
adapter = self.gateway_runner.adapters.get(target_platform)
if not adapter:
return SendResult(
success=False,
error=f"Platform {platform_name} not connected",
)
# Use home channel if no specific chat_id in deliver_extra
extra = delivery.get("deliver_extra", {})
chat_id = extra.get("chat_id", "")
if not chat_id:
home = self.gateway_runner.config.get_home_channel(target_platform)
if home:
chat_id = home.chat_id
else:
return SendResult(
success=False,
error=f"No chat_id or home channel for {platform_name}",
)
return await adapter.send(chat_id, content)
+49 -12
View File
@@ -136,6 +136,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
"session_path",
get_hermes_home() / "whatsapp" / "session"
))
self._reply_prefix: Optional[str] = config.extra.get("reply_prefix")
self._message_queue: asyncio.Queue = asyncio.Queue()
self._bridge_log_fh = None
self._bridge_log: Optional[Path] = None
@@ -181,9 +182,31 @@ class WhatsAppAdapter(BasePlatformAdapter):
# Ensure session directory exists
self._session_path.mkdir(parents=True, exist_ok=True)
# Check if bridge is already running and connected
import aiohttp
import asyncio
try:
async with aiohttp.ClientSession() as session:
async with session.get(
f"http://127.0.0.1:{self._bridge_port}/health",
timeout=aiohttp.ClientTimeout(total=2)
) as resp:
if resp.status == 200:
data = await resp.json()
bridge_status = data.get("status", "unknown")
if bridge_status == "connected":
print(f"[{self.name}] Using existing bridge (status: {bridge_status})")
self._running = True
self._bridge_process = None # Not managed by us
asyncio.create_task(self._poll_messages())
return True
else:
print(f"[{self.name}] Bridge found but not connected (status: {bridge_status}), restarting")
except Exception:
pass # Bridge not running, start a new one
# Kill any orphaned bridge from a previous gateway run
_kill_port_process(self._bridge_port)
import asyncio
await asyncio.sleep(1)
# Start the bridge process in its own process group.
@@ -193,6 +216,14 @@ class WhatsAppAdapter(BasePlatformAdapter):
self._bridge_log = self._session_path.parent / "bridge.log"
bridge_log_fh = open(self._bridge_log, "a")
self._bridge_log_fh = bridge_log_fh
# Build bridge subprocess environment.
# Pass WHATSAPP_REPLY_PREFIX from config.yaml so the Node bridge
# can use it without the user needing to set a separate env var.
bridge_env = os.environ.copy()
if self._reply_prefix is not None:
bridge_env["WHATSAPP_REPLY_PREFIX"] = self._reply_prefix
self._bridge_process = subprocess.Popen(
[
"node",
@@ -204,6 +235,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
stdout=bridge_log_fh,
stderr=bridge_log_fh,
preexec_fn=None if _IS_WINDOWS else os.setsid,
env=bridge_env,
)
# Wait for the bridge to connect to WhatsApp.
@@ -222,7 +254,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
try:
async with aiohttp.ClientSession() as session:
async with session.get(
f"http://localhost:{self._bridge_port}/health",
f"http://127.0.0.1:{self._bridge_port}/health",
timeout=aiohttp.ClientTimeout(total=2)
) as resp:
if resp.status == 200:
@@ -254,7 +286,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
try:
async with aiohttp.ClientSession() as session:
async with session.get(
f"http://localhost:{self._bridge_port}/health",
f"http://127.0.0.1:{self._bridge_port}/health",
timeout=aiohttp.ClientTimeout(total=2)
) as resp:
if resp.status == 200:
@@ -316,9 +348,9 @@ class WhatsAppAdapter(BasePlatformAdapter):
self._bridge_process.kill()
except Exception as e:
print(f"[{self.name}] Error stopping bridge: {e}")
# Also kill any orphaned bridge processes on our port
_kill_port_process(self._bridge_port)
else:
# Bridge was not started by us, don't kill it
print(f"[{self.name}] Disconnecting (external bridge left running)")
self._running = False
self._bridge_process = None
@@ -348,7 +380,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
payload["replyTo"] = reply_to
async with session.post(
f"http://localhost:{self._bridge_port}/send",
f"http://127.0.0.1:{self._bridge_port}/send",
json=payload,
timeout=aiohttp.ClientTimeout(total=30)
) as resp:
@@ -384,7 +416,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
import aiohttp
async with aiohttp.ClientSession() as session:
async with session.post(
f"http://localhost:{self._bridge_port}/edit",
f"http://127.0.0.1:{self._bridge_port}/edit",
json={
"chatId": chat_id,
"messageId": message_id,
@@ -429,7 +461,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
async with aiohttp.ClientSession() as session:
async with session.post(
f"http://localhost:{self._bridge_port}/send-media",
f"http://127.0.0.1:{self._bridge_port}/send-media",
json=payload,
timeout=aiohttp.ClientTimeout(total=120),
) as resp:
@@ -505,7 +537,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
async with aiohttp.ClientSession() as session:
await session.post(
f"http://localhost:{self._bridge_port}/typing",
f"http://127.0.0.1:{self._bridge_port}/typing",
json={"chatId": chat_id},
timeout=aiohttp.ClientTimeout(total=5)
)
@@ -522,7 +554,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
async with aiohttp.ClientSession() as session:
async with session.get(
f"http://localhost:{self._bridge_port}/chat/{chat_id}",
f"http://127.0.0.1:{self._bridge_port}/chat/{chat_id}",
timeout=aiohttp.ClientTimeout(total=10)
) as resp:
if resp.status == 200:
@@ -549,7 +581,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
try:
async with aiohttp.ClientSession() as session:
async with session.get(
f"http://localhost:{self._bridge_port}/messages",
f"http://127.0.0.1:{self._bridge_port}/messages",
timeout=aiohttp.ClientTimeout(total=30)
) as resp:
if resp.status == 200:
@@ -611,6 +643,11 @@ class WhatsAppAdapter(BasePlatformAdapter):
print(f"[{self.name}] Failed to cache image: {e}", flush=True)
cached_urls.append(url)
media_types.append("image/jpeg")
elif msg_type == MessageType.PHOTO and os.path.isabs(url):
# Local file path — bridge already downloaded the image
cached_urls.append(url)
media_types.append("image/jpeg")
print(f"[{self.name}] Using bridge-cached image: {url}", flush=True)
elif msg_type == MessageType.VOICE and url.startswith(("http://", "https://")):
try:
cached_path = await cache_audio_from_url(url, ext=".ogg")
+968 -239
View File
File diff suppressed because it is too large Load Diff
+168 -20
View File
@@ -8,9 +8,11 @@ Handles:
- Dynamic system prompt injection (agent knows its context)
"""
import hashlib
import logging
import os
import json
import re
import uuid
from pathlib import Path
from datetime import datetime, timedelta
@@ -19,6 +21,41 @@ from typing import Dict, List, Optional, Any
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# PII redaction helpers
# ---------------------------------------------------------------------------
_PHONE_RE = re.compile(r"^\+?\d[\d\-\s]{6,}$")
def _hash_id(value: str) -> str:
"""Deterministic 12-char hex hash of an identifier."""
return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
def _hash_sender_id(value: str) -> str:
"""Hash a sender ID to ``user_<12hex>``."""
return f"user_{_hash_id(value)}"
def _hash_chat_id(value: str) -> str:
"""Hash the numeric portion of a chat ID, preserving platform prefix.
``telegram:12345`` ``telegram:<hash>``
``12345`` ``<hash>``
"""
colon = value.find(":")
if colon > 0:
prefix = value[:colon]
return f"{prefix}:{_hash_id(value[colon + 1:])}"
return _hash_id(value)
def _looks_like_phone(value: str) -> bool:
"""Return True if *value* looks like a phone number (E.164 or similar)."""
return bool(_PHONE_RE.match(value.strip()))
from .config import (
Platform,
GatewayConfig,
@@ -146,7 +183,21 @@ class SessionContext:
}
def build_session_context_prompt(context: SessionContext) -> str:
_PII_SAFE_PLATFORMS = frozenset({
Platform.WHATSAPP,
Platform.SIGNAL,
Platform.TELEGRAM,
})
"""Platforms where user IDs can be safely redacted (no in-message mention system
that requires raw IDs). Discord is excluded because mentions use ``<@user_id>``
and the LLM needs the real ID to tag users."""
def build_session_context_prompt(
context: SessionContext,
*,
redact_pii: bool = False,
) -> str:
"""
Build the dynamic system prompt section that tells the agent about its context.
@@ -154,7 +205,15 @@ def build_session_context_prompt(context: SessionContext) -> str:
- Where messages are coming from
- What platforms are connected
- Where it can deliver scheduled task outputs
When *redact_pii* is True **and** the source platform is in
``_PII_SAFE_PLATFORMS``, phone numbers are stripped and user/chat IDs
are replaced with deterministic hashes before being sent to the LLM.
Platforms like Discord are excluded because mentions need real IDs.
Routing still uses the original values (they stay in SessionSource).
"""
# Only apply redaction on platforms where IDs aren't needed for mentions
redact_pii = redact_pii and context.source.platform in _PII_SAFE_PLATFORMS
lines = [
"## Current Session Context",
"",
@@ -165,7 +224,25 @@ def build_session_context_prompt(context: SessionContext) -> str:
if context.source.platform == Platform.LOCAL:
lines.append(f"**Source:** {platform_name} (the machine running this agent)")
else:
lines.append(f"**Source:** {platform_name} ({context.source.description})")
# Build a description that respects PII redaction
src = context.source
if redact_pii:
# Build a safe description without raw IDs
_uname = src.user_name or (
_hash_sender_id(src.user_id) if src.user_id else "user"
)
_cname = src.chat_name or _hash_chat_id(src.chat_id)
if src.chat_type == "dm":
desc = f"DM with {_uname}"
elif src.chat_type == "group":
desc = f"group: {_cname}"
elif src.chat_type == "channel":
desc = f"channel: {_cname}"
else:
desc = _cname
else:
desc = src.description
lines.append(f"**Source:** {platform_name} ({desc})")
# Channel topic (if available - provides context about the channel's purpose)
if context.source.chat_topic:
@@ -175,7 +252,10 @@ def build_session_context_prompt(context: SessionContext) -> str:
if context.source.user_name:
lines.append(f"**User:** {context.source.user_name}")
elif context.source.user_id:
lines.append(f"**User ID:** {context.source.user_id}")
uid = context.source.user_id
if redact_pii:
uid = _hash_sender_id(uid)
lines.append(f"**User ID:** {uid}")
# Platform-specific behavioral notes
if context.source.platform == Platform.SLACK:
@@ -210,7 +290,8 @@ def build_session_context_prompt(context: SessionContext) -> str:
lines.append("")
lines.append("**Home Channels (default destinations):**")
for platform, home in context.home_channels.items():
lines.append(f" - {platform.value}: {home.name} (ID: {home.chat_id})")
hc_id = _hash_chat_id(home.chat_id) if redact_pii else home.chat_id
lines.append(f" - {platform.value}: {home.name} (ID: {hc_id})")
# Delivery options for scheduled tasks
lines.append("")
@@ -220,7 +301,10 @@ def build_session_context_prompt(context: SessionContext) -> str:
if context.source.platform == Platform.LOCAL:
lines.append("- `\"origin\"` → Local output (saved to files)")
else:
lines.append(f"- `\"origin\"` → Back to this chat ({context.source.chat_name or context.source.chat_id})")
_origin_label = context.source.chat_name or (
_hash_chat_id(context.source.chat_id) if redact_pii else context.source.chat_id
)
lines.append(f"- `\"origin\"` → Back to this chat ({_origin_label})")
# Local always available
lines.append("- `\"local\"` → Save to local files only (~/.hermes/cron/output/)")
@@ -259,7 +343,11 @@ class SessionEntry:
# Token tracking
input_tokens: int = 0
output_tokens: int = 0
cache_read_tokens: int = 0
cache_write_tokens: int = 0
total_tokens: int = 0
estimated_cost_usd: float = 0.0
cost_status: str = "unknown"
# Last API-reported prompt tokens (for accurate compression pre-check)
last_prompt_tokens: int = 0
@@ -279,8 +367,12 @@ class SessionEntry:
"chat_type": self.chat_type,
"input_tokens": self.input_tokens,
"output_tokens": self.output_tokens,
"cache_read_tokens": self.cache_read_tokens,
"cache_write_tokens": self.cache_write_tokens,
"total_tokens": self.total_tokens,
"last_prompt_tokens": self.last_prompt_tokens,
"estimated_cost_usd": self.estimated_cost_usd,
"cost_status": self.cost_status,
}
if self.origin:
result["origin"] = self.origin.to_dict()
@@ -310,36 +402,56 @@ class SessionEntry:
chat_type=data.get("chat_type", "dm"),
input_tokens=data.get("input_tokens", 0),
output_tokens=data.get("output_tokens", 0),
cache_read_tokens=data.get("cache_read_tokens", 0),
cache_write_tokens=data.get("cache_write_tokens", 0),
total_tokens=data.get("total_tokens", 0),
last_prompt_tokens=data.get("last_prompt_tokens", 0),
estimated_cost_usd=data.get("estimated_cost_usd", 0.0),
cost_status=data.get("cost_status", "unknown"),
)
def build_session_key(source: SessionSource) -> str:
def build_session_key(source: SessionSource, group_sessions_per_user: bool = True) -> str:
"""Build a deterministic session key from a message source.
This is the single source of truth for session key construction.
DM rules:
- WhatsApp DMs include chat_id (multi-user support).
- Other DMs include thread_id when present (e.g. Slack threaded DMs),
so each DM thread gets its own session while top-level DMs share one.
- Without thread_id or chat_id, all DMs share a single session.
- DMs include chat_id when present, so each private conversation is isolated.
- thread_id further differentiates threaded DMs within the same DM chat.
- Without chat_id, thread_id is used as a best-effort fallback.
- Without thread_id or chat_id, DMs share a single session.
Group/channel rules:
- thread_id differentiates threads within a channel.
- Without thread_id, all messages in a channel share one session.
- chat_id identifies the parent group/channel.
- user_id/user_id_alt isolates participants within that parent chat when available when
``group_sessions_per_user`` is enabled.
- thread_id differentiates threads within that parent chat.
- Without participant identifiers, or when isolation is disabled, messages fall back to one
shared session per chat.
- Without identifiers, messages fall back to one session per platform/chat_type.
"""
platform = source.platform.value
if source.chat_type == "dm":
if source.chat_id:
if source.thread_id:
return f"agent:main:{platform}:dm:{source.chat_id}:{source.thread_id}"
return f"agent:main:{platform}:dm:{source.chat_id}"
if source.thread_id:
return f"agent:main:{platform}:dm:{source.thread_id}"
if platform == "whatsapp" and source.chat_id:
return f"agent:main:{platform}:dm:{source.chat_id}"
return f"agent:main:{platform}:dm"
participant_id = source.user_id_alt or source.user_id
key_parts = ["agent:main", platform, source.chat_type]
if source.chat_id:
key_parts.append(source.chat_id)
if source.thread_id:
return f"agent:main:{platform}:{source.chat_type}:{source.chat_id}:{source.thread_id}"
return f"agent:main:{platform}:{source.chat_type}:{source.chat_id}"
key_parts.append(source.thread_id)
if group_sessions_per_user and participant_id:
key_parts.append(str(participant_id))
return ":".join(key_parts)
class SessionStore:
@@ -418,7 +530,10 @@ class SessionStore:
def _generate_session_key(self, source: SessionSource) -> str:
"""Generate a session key from a source."""
return build_session_key(source)
return build_session_key(
source,
group_sessions_per_user=getattr(self.config, "group_sessions_per_user", True),
)
def _is_session_expired(self, entry: SessionEntry) -> bool:
"""Check if a session has expired based on its reset policy.
@@ -593,8 +708,15 @@ class SessionStore:
session_key: str,
input_tokens: int = 0,
output_tokens: int = 0,
cache_read_tokens: int = 0,
cache_write_tokens: int = 0,
last_prompt_tokens: int = None,
model: str = None,
estimated_cost_usd: Optional[float] = None,
cost_status: Optional[str] = None,
cost_source: Optional[str] = None,
provider: Optional[str] = None,
base_url: Optional[str] = None,
) -> None:
"""Update a session's metadata after an interaction."""
self._ensure_loaded()
@@ -604,15 +726,35 @@ class SessionStore:
entry.updated_at = datetime.now()
entry.input_tokens += input_tokens
entry.output_tokens += output_tokens
entry.cache_read_tokens += cache_read_tokens
entry.cache_write_tokens += cache_write_tokens
if last_prompt_tokens is not None:
entry.last_prompt_tokens = last_prompt_tokens
entry.total_tokens = entry.input_tokens + entry.output_tokens
if estimated_cost_usd is not None:
entry.estimated_cost_usd += estimated_cost_usd
if cost_status:
entry.cost_status = cost_status
entry.total_tokens = (
entry.input_tokens
+ entry.output_tokens
+ entry.cache_read_tokens
+ entry.cache_write_tokens
)
self._save()
if self._db:
try:
self._db.update_token_counts(
entry.session_id, input_tokens, output_tokens,
entry.session_id,
input_tokens=input_tokens,
output_tokens=output_tokens,
cache_read_tokens=cache_read_tokens,
cache_write_tokens=cache_write_tokens,
estimated_cost_usd=estimated_cost_usd,
cost_status=cost_status,
cost_source=cost_source,
billing_provider=provider,
billing_base_url=base_url,
model=model,
)
except Exception as e:
@@ -802,7 +944,13 @@ class SessionStore:
for line in f:
line = line.strip()
if line:
messages.append(json.loads(line))
try:
messages.append(json.loads(line))
except json.JSONDecodeError:
logger.warning(
"Skipping corrupt line in transcript %s: %s",
session_id, line[:120],
)
return messages
+26 -6
View File
@@ -83,11 +83,30 @@ def _looks_like_gateway_process(pid: int) -> bool:
"""Return True when the live PID still looks like the Hermes gateway."""
cmdline = _read_process_cmdline(pid)
if not cmdline:
# If we cannot inspect the process, fall back to the liveness check.
return True
return False
patterns = (
"hermes_cli.main gateway",
"hermes_cli/main.py gateway",
"hermes gateway",
"gateway/run.py",
)
return any(pattern in cmdline for pattern in patterns)
def _record_looks_like_gateway(record: dict[str, Any]) -> bool:
"""Validate gateway identity from PID-file metadata when cmdline is unavailable."""
if record.get("kind") != _GATEWAY_KIND:
return False
argv = record.get("argv")
if not isinstance(argv, list) or not argv:
return False
cmdline = " ".join(str(part) for part in argv)
patterns = (
"hermes_cli.main gateway",
"hermes_cli/main.py gateway",
"hermes gateway",
"gateway/run.py",
)
@@ -178,8 +197,8 @@ def write_runtime_status(
payload = _read_json_file(path) or _build_runtime_status_record()
payload.setdefault("platforms", {})
payload.setdefault("kind", _GATEWAY_KIND)
payload.setdefault("pid", os.getpid())
payload.setdefault("start_time", _get_process_start_time(os.getpid()))
payload["pid"] = os.getpid()
payload["start_time"] = _get_process_start_time(os.getpid())
payload["updated_at"] = _utc_now_iso()
if gateway_state is not None:
@@ -325,8 +344,9 @@ def get_running_pid() -> Optional[int]:
return None
if not _looks_like_gateway_process(pid):
remove_pid_file()
return None
if not _record_looks_like_gateway(record):
remove_pid_file()
return None
return pid
+202
View File
@@ -0,0 +1,202 @@
"""Gateway streaming consumer — bridges sync agent callbacks to async platform delivery.
The agent fires stream_delta_callback(text) synchronously from its worker thread.
GatewayStreamConsumer:
1. Receives deltas via on_delta() (thread-safe, sync)
2. Queues them to an asyncio task via queue.Queue
3. The async run() task buffers, rate-limits, and progressively edits
a single message on the target platform
Design: Uses the edit transport (send initial message, then editMessageText).
This is universally supported across Telegram, Discord, and Slack.
Credit: jobless0x (#774, #1312), OutThisLife (#798), clicksingh (#697).
"""
from __future__ import annotations
import asyncio
import logging
import queue
import time
from dataclasses import dataclass
from typing import Any, Optional
logger = logging.getLogger("gateway.stream_consumer")
# Sentinel to signal the stream is complete
_DONE = object()
@dataclass
class StreamConsumerConfig:
"""Runtime config for a single stream consumer instance."""
edit_interval: float = 0.3
buffer_threshold: int = 40
cursor: str = ""
class GatewayStreamConsumer:
"""Async consumer that progressively edits a platform message with streamed tokens.
Usage::
consumer = GatewayStreamConsumer(adapter, chat_id, config, metadata=metadata)
# Pass consumer.on_delta as stream_delta_callback to AIAgent
agent = AIAgent(..., stream_delta_callback=consumer.on_delta)
# Start the consumer as an asyncio task
task = asyncio.create_task(consumer.run())
# ... run agent in thread pool ...
consumer.finish() # signal completion
await task # wait for final edit
"""
def __init__(
self,
adapter: Any,
chat_id: str,
config: Optional[StreamConsumerConfig] = None,
metadata: Optional[dict] = None,
):
self.adapter = adapter
self.chat_id = chat_id
self.cfg = config or StreamConsumerConfig()
self.metadata = metadata
self._queue: queue.Queue = queue.Queue()
self._accumulated = ""
self._message_id: Optional[str] = None
self._already_sent = False
self._edit_supported = True # Disabled on first edit failure (Signal/Email/HA)
self._last_edit_time = 0.0
self._last_sent_text = "" # Track last-sent text to skip redundant edits
@property
def already_sent(self) -> bool:
"""True if at least one message was sent/edited — signals the base
adapter to skip re-sending the final response."""
return self._already_sent
def on_delta(self, text: str) -> None:
"""Thread-safe callback — called from the agent's worker thread."""
if text:
self._queue.put(text)
def finish(self) -> None:
"""Signal that the stream is complete."""
self._queue.put(_DONE)
async def run(self) -> None:
"""Async task that drains the queue and edits the platform message."""
# Platform message length limit — leave room for cursor + formatting
_raw_limit = getattr(self.adapter, "MAX_MESSAGE_LENGTH", 4096)
_safe_limit = max(500, _raw_limit - len(self.cfg.cursor) - 100)
try:
while True:
# Drain all available items from the queue
got_done = False
while True:
try:
item = self._queue.get_nowait()
if item is _DONE:
got_done = True
break
self._accumulated += item
except queue.Empty:
break
# Decide whether to flush an edit
now = time.monotonic()
elapsed = now - self._last_edit_time
should_edit = (
got_done
or (elapsed >= self.cfg.edit_interval
and len(self._accumulated) > 0)
or len(self._accumulated) >= self.cfg.buffer_threshold
)
if should_edit and self._accumulated:
# Split overflow: if accumulated text exceeds the platform
# limit, finalize the current message and start a new one.
while (
len(self._accumulated) > _safe_limit
and self._message_id is not None
):
split_at = self._accumulated.rfind("\n", 0, _safe_limit)
if split_at < _safe_limit // 2:
split_at = _safe_limit
chunk = self._accumulated[:split_at]
await self._send_or_edit(chunk)
self._accumulated = self._accumulated[split_at:].lstrip("\n")
self._message_id = None
self._last_sent_text = ""
display_text = self._accumulated
if not got_done:
display_text += self.cfg.cursor
await self._send_or_edit(display_text)
self._last_edit_time = time.monotonic()
if got_done:
# Final edit without cursor
if self._accumulated and self._message_id:
await self._send_or_edit(self._accumulated)
return
await asyncio.sleep(0.05) # Small yield to not busy-loop
except asyncio.CancelledError:
# Best-effort final edit on cancellation
if self._accumulated and self._message_id:
try:
await self._send_or_edit(self._accumulated)
except Exception:
pass
except Exception as e:
logger.error("Stream consumer error: %s", e)
async def _send_or_edit(self, text: str) -> None:
"""Send or edit the streaming message."""
try:
if self._message_id is not None:
if self._edit_supported:
# Skip if text is identical to what we last sent
if text == self._last_sent_text:
return
# Edit existing message
result = await self.adapter.edit_message(
chat_id=self.chat_id,
message_id=self._message_id,
content=text,
)
if result.success:
self._already_sent = True
self._last_sent_text = text
else:
# Edit not supported by this adapter — stop streaming,
# let the normal send path handle the final response.
# Without this guard, adapters like Signal/Email would
# flood the chat with a new message every edit_interval.
logger.debug("Edit failed, disabling streaming for this adapter")
self._edit_supported = False
else:
# Editing not supported — skip intermediate updates.
# The final response will be sent by the normal path.
pass
else:
# First message — send new
result = await self.adapter.send(
chat_id=self.chat_id,
content=text,
metadata=self.metadata,
)
if result.success and result.message_id:
self._message_id = result.message_id
self._already_sent = True
self._last_sent_text = text
else:
# Initial send failed — disable streaming for this session
self._edit_supported = False
except Exception as e:
logger.error("Stream send/edit error: %s", e)
+2 -2
View File
@@ -11,5 +11,5 @@ Provides subcommands for:
- hermes cron - Manage cron jobs
"""
__version__ = "0.2.0"
__release_date__ = "2026.3.12"
__version__ = "0.4.0"
__release_date__ = "2026.3.18"
+217 -14
View File
@@ -19,6 +19,7 @@ import json
import logging
import os
import shutil
import shlex
import stat
import base64
import hashlib
@@ -66,6 +67,8 @@ DEFAULT_AGENT_KEY_MIN_TTL_SECONDS = 30 * 60 # 30 minutes
ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120 # refresh 2 min before expiry
DEVICE_AUTH_POLL_INTERVAL_CAP_SECONDS = 1 # poll at most every 1s
DEFAULT_CODEX_BASE_URL = "https://chatgpt.com/backend-api/codex"
DEFAULT_GITHUB_MODELS_BASE_URL = "https://api.githubcopilot.com"
DEFAULT_COPILOT_ACP_BASE_URL = "acp://copilot"
CODEX_OAUTH_CLIENT_ID = "app_EMoamEEZ73f0CkXaXp7hrann"
CODEX_OAUTH_TOKEN_URL = "https://auth.openai.com/oauth/token"
CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS = 120
@@ -108,6 +111,20 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
auth_type="oauth_external",
inference_base_url=DEFAULT_CODEX_BASE_URL,
),
"copilot": ProviderConfig(
id="copilot",
name="GitHub Copilot",
auth_type="api_key",
inference_base_url=DEFAULT_GITHUB_MODELS_BASE_URL,
api_key_env_vars=("COPILOT_GITHUB_TOKEN", "GH_TOKEN", "GITHUB_TOKEN"),
),
"copilot-acp": ProviderConfig(
id="copilot-acp",
name="GitHub Copilot ACP",
auth_type="external_process",
inference_base_url=DEFAULT_COPILOT_ACP_BASE_URL,
base_url_env_var="COPILOT_ACP_BASE_URL",
),
"zai": ProviderConfig(
id="zai",
name="Z.AI / GLM",
@@ -128,7 +145,7 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
id="minimax",
name="MiniMax",
auth_type="api_key",
inference_base_url="https://api.minimax.io/v1",
inference_base_url="https://api.minimax.io/anthropic",
api_key_env_vars=("MINIMAX_API_KEY",),
base_url_env_var="MINIMAX_BASE_URL",
),
@@ -139,14 +156,62 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
inference_base_url="https://api.anthropic.com",
api_key_env_vars=("ANTHROPIC_API_KEY", "ANTHROPIC_TOKEN", "CLAUDE_CODE_OAUTH_TOKEN"),
),
"alibaba": ProviderConfig(
id="alibaba",
name="Alibaba Cloud (DashScope)",
auth_type="api_key",
inference_base_url="https://dashscope-intl.aliyuncs.com/apps/anthropic",
api_key_env_vars=("DASHSCOPE_API_KEY",),
base_url_env_var="DASHSCOPE_BASE_URL",
),
"minimax-cn": ProviderConfig(
id="minimax-cn",
name="MiniMax (China)",
auth_type="api_key",
inference_base_url="https://api.minimaxi.com/v1",
inference_base_url="https://api.minimaxi.com/anthropic",
api_key_env_vars=("MINIMAX_CN_API_KEY",),
base_url_env_var="MINIMAX_CN_BASE_URL",
),
"deepseek": ProviderConfig(
id="deepseek",
name="DeepSeek",
auth_type="api_key",
inference_base_url="https://api.deepseek.com/v1",
api_key_env_vars=("DEEPSEEK_API_KEY",),
base_url_env_var="DEEPSEEK_BASE_URL",
),
"ai-gateway": ProviderConfig(
id="ai-gateway",
name="AI Gateway",
auth_type="api_key",
inference_base_url="https://ai-gateway.vercel.sh/v1",
api_key_env_vars=("AI_GATEWAY_API_KEY",),
base_url_env_var="AI_GATEWAY_BASE_URL",
),
"opencode-zen": ProviderConfig(
id="opencode-zen",
name="OpenCode Zen",
auth_type="api_key",
inference_base_url="https://opencode.ai/zen/v1",
api_key_env_vars=("OPENCODE_ZEN_API_KEY",),
base_url_env_var="OPENCODE_ZEN_BASE_URL",
),
"opencode-go": ProviderConfig(
id="opencode-go",
name="OpenCode Go",
auth_type="***",
inference_base_url="https://opencode.ai/zen/go/v1",
api_key_env_vars=("OPEN...",),
base_url_env_var="OPENCODE_GO_BASE_URL",
),
"kilocode": ProviderConfig(
id="kilocode",
name="Kilo Code",
auth_type="api_key",
inference_base_url="https://api.kilo.ai/api/gateway",
api_key_env_vars=("KILOCODE_API_KEY",),
base_url_env_var="KILOCODE_BASE_URL",
),
}
@@ -174,6 +239,70 @@ def _resolve_kimi_base_url(api_key: str, default_url: str, env_override: str) ->
return default_url
def _gh_cli_candidates() -> list[str]:
"""Return candidate ``gh`` binary paths, including common Homebrew installs."""
candidates: list[str] = []
resolved = shutil.which("gh")
if resolved:
candidates.append(resolved)
for candidate in (
"/opt/homebrew/bin/gh",
"/usr/local/bin/gh",
str(Path.home() / ".local" / "bin" / "gh"),
):
if candidate in candidates:
continue
if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
candidates.append(candidate)
return candidates
def _try_gh_cli_token() -> Optional[str]:
"""Return a token from ``gh auth token`` when the GitHub CLI is available."""
for gh_path in _gh_cli_candidates():
try:
result = subprocess.run(
[gh_path, "auth", "token"],
capture_output=True,
text=True,
timeout=5,
)
except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
logger.debug("gh CLI token lookup failed (%s): %s", gh_path, exc)
continue
if result.returncode == 0 and result.stdout.strip():
return result.stdout.strip()
return None
def _resolve_api_key_provider_secret(
provider_id: str, pconfig: ProviderConfig
) -> tuple[str, str]:
"""Resolve an API-key provider's token and indicate where it came from."""
if provider_id == "copilot":
# Use the dedicated copilot auth module for proper token validation
try:
from hermes_cli.copilot_auth import resolve_copilot_token
token, source = resolve_copilot_token()
if token:
return token, source
except ValueError as exc:
logger.warning("Copilot token validation failed: %s", exc)
except Exception:
pass
return "", ""
for env_var in pconfig.api_key_env_vars:
val = os.getenv(env_var, "").strip()
if val:
return val, env_var
return "", ""
# =============================================================================
# Z.AI Endpoint Detection
# =============================================================================
@@ -524,6 +653,13 @@ def resolve_provider(
"kimi": "kimi-coding", "moonshot": "kimi-coding",
"minimax-china": "minimax-cn", "minimax_cn": "minimax-cn",
"claude": "anthropic", "claude-code": "anthropic",
"github": "copilot", "github-copilot": "copilot",
"github-models": "copilot", "github-model": "copilot",
"github-copilot-acp": "copilot-acp", "copilot-acp-agent": "copilot-acp",
"aigateway": "ai-gateway", "vercel": "ai-gateway", "vercel-ai-gateway": "ai-gateway",
"opencode": "opencode-zen", "zen": "opencode-zen",
"go": "opencode-go", "opencode-go-sub": "opencode-go",
"kilo": "kilocode", "kilo-code": "kilocode", "kilo-gateway": "kilocode",
}
normalized = _PROVIDER_ALIASES.get(normalized, normalized)
@@ -559,6 +695,11 @@ def resolve_provider(
for pid, pconfig in PROVIDER_REGISTRY.items():
if pconfig.auth_type != "api_key":
continue
# GitHub tokens are commonly present for repo/tool access but should not
# hijack inference auto-selection unless the user explicitly chooses
# Copilot/GitHub Models as the provider.
if pid == "copilot":
continue
for env_var in pconfig.api_key_env_vars:
if os.getenv(env_var, "").strip():
return pid
@@ -1427,12 +1568,7 @@ def get_api_key_provider_status(provider_id: str) -> Dict[str, Any]:
api_key = ""
key_source = ""
for env_var in pconfig.api_key_env_vars:
val = os.getenv(env_var, "").strip()
if val:
api_key = val
key_source = env_var
break
api_key, key_source = _resolve_api_key_provider_secret(provider_id, pconfig)
env_url = ""
if pconfig.base_url_env_var:
@@ -1455,6 +1591,36 @@ def get_api_key_provider_status(provider_id: str) -> Dict[str, Any]:
}
def get_external_process_provider_status(provider_id: str) -> Dict[str, Any]:
"""Status snapshot for providers that run a local subprocess."""
pconfig = PROVIDER_REGISTRY.get(provider_id)
if not pconfig or pconfig.auth_type != "external_process":
return {"configured": False}
command = (
os.getenv("HERMES_COPILOT_ACP_COMMAND", "").strip()
or os.getenv("COPILOT_CLI_PATH", "").strip()
or "copilot"
)
raw_args = os.getenv("HERMES_COPILOT_ACP_ARGS", "").strip()
args = shlex.split(raw_args) if raw_args else ["--acp", "--stdio"]
base_url = os.getenv(pconfig.base_url_env_var, "").strip() if pconfig.base_url_env_var else ""
if not base_url:
base_url = pconfig.inference_base_url
resolved_command = shutil.which(command) if command else None
return {
"configured": bool(resolved_command or base_url.startswith("acp+tcp://")),
"provider": provider_id,
"name": pconfig.name,
"command": command,
"args": args,
"resolved_command": resolved_command,
"base_url": base_url,
"logged_in": bool(resolved_command or base_url.startswith("acp+tcp://")),
}
def get_auth_status(provider_id: Optional[str] = None) -> Dict[str, Any]:
"""Generic auth status dispatcher."""
target = provider_id or get_active_provider()
@@ -1462,6 +1628,8 @@ def get_auth_status(provider_id: Optional[str] = None) -> Dict[str, Any]:
return get_nous_auth_status()
if target == "openai-codex":
return get_codex_auth_status()
if target == "copilot-acp":
return get_external_process_provider_status(target)
# API-key providers
pconfig = PROVIDER_REGISTRY.get(target)
if pconfig and pconfig.auth_type == "api_key":
@@ -1484,12 +1652,7 @@ def resolve_api_key_provider_credentials(provider_id: str) -> Dict[str, Any]:
api_key = ""
key_source = ""
for env_var in pconfig.api_key_env_vars:
val = os.getenv(env_var, "").strip()
if val:
api_key = val
key_source = env_var
break
api_key, key_source = _resolve_api_key_provider_secret(provider_id, pconfig)
env_url = ""
if pconfig.base_url_env_var:
@@ -1510,6 +1673,46 @@ def resolve_api_key_provider_credentials(provider_id: str) -> Dict[str, Any]:
}
def resolve_external_process_provider_credentials(provider_id: str) -> Dict[str, Any]:
"""Resolve runtime details for local subprocess-backed providers."""
pconfig = PROVIDER_REGISTRY.get(provider_id)
if not pconfig or pconfig.auth_type != "external_process":
raise AuthError(
f"Provider '{provider_id}' is not an external-process provider.",
provider=provider_id,
code="invalid_provider",
)
base_url = os.getenv(pconfig.base_url_env_var, "").strip() if pconfig.base_url_env_var else ""
if not base_url:
base_url = pconfig.inference_base_url
command = (
os.getenv("HERMES_COPILOT_ACP_COMMAND", "").strip()
or os.getenv("COPILOT_CLI_PATH", "").strip()
or "copilot"
)
raw_args = os.getenv("HERMES_COPILOT_ACP_ARGS", "").strip()
args = shlex.split(raw_args) if raw_args else ["--acp", "--stdio"]
resolved_command = shutil.which(command) if command else None
if not resolved_command and not base_url.startswith("acp+tcp://"):
raise AuthError(
f"Could not find the Copilot CLI command '{command}'. "
"Install GitHub Copilot CLI or set HERMES_COPILOT_ACP_COMMAND/COPILOT_CLI_PATH.",
provider=provider_id,
code="missing_copilot_cli",
)
return {
"provider": provider_id,
"api_key": "copilot-acp",
"base_url": base_url.rstrip("/"),
"command": resolved_command or command,
"args": args,
"source": "process",
}
# =============================================================================
# External credential detection
# =============================================================================
+34 -26
View File
@@ -102,27 +102,22 @@ COMPACT_BANNER = """
# =========================================================================
def get_available_skills() -> Dict[str, List[str]]:
"""Scan ~/.hermes/skills/ and return skills grouped by category."""
import os
"""Return skills grouped by category, filtered by platform and disabled state.
hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
skills_dir = hermes_home / "skills"
skills_by_category = {}
if not skills_dir.exists():
return skills_by_category
for skill_file in skills_dir.rglob("SKILL.md"):
rel_path = skill_file.relative_to(skills_dir)
parts = rel_path.parts
if len(parts) >= 2:
category = parts[0]
skill_name = parts[-2]
else:
category = "general"
skill_name = skill_file.parent.name
skills_by_category.setdefault(category, []).append(skill_name)
Delegates to ``_find_all_skills()`` from ``tools/skills_tool`` which already
handles platform gating (``platforms:`` frontmatter) and respects the
user's ``skills.disabled`` config list.
"""
try:
from tools.skills_tool import _find_all_skills
all_skills = _find_all_skills() # already filtered
except Exception:
return {}
skills_by_category: Dict[str, List[str]] = {}
for skill in all_skills:
category = skill.get("category") or "general"
skills_by_category.setdefault(category, []).append(skill["name"])
return skills_by_category
@@ -233,6 +228,17 @@ def _format_context_length(tokens: int) -> str:
return str(tokens)
def _display_toolset_name(toolset_name: str) -> str:
"""Normalize internal/legacy toolset identifiers for banner display."""
if not toolset_name:
return "unknown"
return (
toolset_name[:-6]
if toolset_name.endswith("_tools")
else toolset_name
)
def build_welcome_banner(console: Console, model: str, cwd: str,
tools: List[dict] = None,
enabled_toolsets: List[str] = None,
@@ -283,6 +289,8 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
_hero = HERMES_CADUCEUS
left_lines = ["", _hero, ""]
model_short = model.split("/")[-1] if "/" in model else model
if model_short.endswith(".gguf"):
model_short = model_short[:-5]
if len(model_short) > 28:
model_short = model_short[:25] + "..."
ctx_str = f" [dim {dim}]·[/] [dim {dim}]{_format_context_length(context_length)} context[/]" if context_length else ""
@@ -297,12 +305,12 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
for tool in tools:
tool_name = tool["function"]["name"]
toolset = get_toolset_for_tool(tool_name) or "other"
toolset = _display_toolset_name(get_toolset_for_tool(tool_name) or "other")
toolsets_dict.setdefault(toolset, []).append(tool_name)
for item in unavailable_toolsets:
toolset_id = item.get("id", item.get("name", "unknown"))
display_name = f"{toolset_id}_tools" if not toolset_id.endswith("_tools") else toolset_id
display_name = _display_toolset_name(toolset_id)
if display_name not in toolsets_dict:
toolsets_dict[display_name] = []
for tool_name in item.get("tools", []):
@@ -342,10 +350,10 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
colored_names.append(f"[{text}]{name}[/]")
tools_str = ", ".join(colored_names)
right_lines.append(f"[dim #B8860B]{toolset}:[/] {tools_str}")
right_lines.append(f"[dim {dim}]{toolset}:[/] {tools_str}")
if remaining_toolsets > 0:
right_lines.append(f"[dim #B8860B](and {remaining_toolsets} more toolsets...)[/]")
right_lines.append(f"[dim {dim}](and {remaining_toolsets} more toolsets...)[/]")
# MCP Servers section (only if configured)
try:
@@ -356,12 +364,12 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
if mcp_status:
right_lines.append("")
right_lines.append("[bold #FFBF00]MCP Servers[/]")
right_lines.append(f"[bold {accent}]MCP Servers[/]")
for srv in mcp_status:
if srv["connected"]:
right_lines.append(
f"[dim #B8860B]{srv['name']}[/] [#FFF8DC]({srv['transport']})[/] "
f"[dim #B8860B]—[/] [#FFF8DC]{srv['tools']} tool(s)[/]"
f"[dim {dim}]{srv['name']}[/] [{text}]({srv['transport']})[/] "
f"[dim {dim}]—[/] [{text}]{srv['tools']} tool(s)[/]"
)
else:
right_lines.append(
+15
View File
@@ -294,3 +294,18 @@ def _print_migration_report(report: dict, dry_run: bool):
elif migrated:
print()
print_success("Migration complete!")
# Warn if API keys were skipped (migrate_secrets not enabled)
skipped_keys = [
i for i in report.get("items", [])
if i.get("kind") == "provider-keys" and i.get("status") == "skipped"
]
if skipped_keys:
print()
print(color(" ⚠ API keys were NOT migrated (secrets migration is disabled by default).", Colors.YELLOW))
print(color(" Your OPENROUTER_API_KEY and other provider keys must be added manually.", Colors.YELLOW))
print()
print_info("To migrate API keys, re-run with:")
print_info(" hermes claw migrate --migrate-secrets")
print()
print_info("Or add your key manually:")
print_info(" hermes config set OPENROUTER_API_KEY sk-or-v1-...")
+524 -52
View File
@@ -1,77 +1,307 @@
"""Slash command definitions and autocomplete for the Hermes CLI.
Contains the shared built-in ``COMMANDS`` dict and ``SlashCommandCompleter``.
The completer can optionally include dynamic skill slash commands supplied by the
interactive CLI.
Central registry for all slash commands. Every consumer -- CLI help, gateway
dispatch, Telegram BotCommands, Slack subcommand mapping, autocomplete --
derives its data from ``COMMAND_REGISTRY``.
To add a command: add a ``CommandDef`` entry to ``COMMAND_REGISTRY``.
To add an alias: set ``aliases=("short",)`` on the existing ``CommandDef``.
"""
from __future__ import annotations
import os
import re
from collections.abc import Callable, Mapping
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any
from prompt_toolkit.auto_suggest import AutoSuggest, Suggestion
from prompt_toolkit.completion import Completer, Completion
# Commands organized by category for better help display
COMMANDS_BY_CATEGORY = {
"Session": {
"/new": "Start a new session (fresh session ID + history)",
"/reset": "Start a new session (alias for /new)",
"/clear": "Clear screen and start a new session",
"/history": "Show conversation history",
"/save": "Save the current conversation",
"/retry": "Retry the last message (resend to agent)",
"/undo": "Remove the last user/assistant exchange",
"/title": "Set a title for the current session (usage: /title My Session Name)",
"/compress": "Manually compress conversation context (flush memories + summarize)",
"/rollback": "List or restore filesystem checkpoints (usage: /rollback [number])",
"/background": "Run a prompt in the background (usage: /background <prompt>)",
},
"Configuration": {
"/config": "Show current configuration",
"/model": "Show or change the current model",
"/provider": "Show available providers and current provider",
"/prompt": "View/set custom system prompt",
"/personality": "Set a predefined personality",
"/verbose": "Cycle tool progress display: off → new → all → verbose",
"/reasoning": "Manage reasoning effort and display (usage: /reasoning [level|show|hide])",
"/skin": "Show or change the display skin/theme",
"/voice": "Toggle voice mode (Ctrl+B to record). Usage: /voice [on|off|tts|status]",
},
"Tools & Skills": {
"/tools": "List available tools",
"/toolsets": "List available toolsets",
"/skills": "Search, install, inspect, or manage skills from online registries",
"/cron": "Manage scheduled tasks (list, add/create, edit, pause, resume, run, remove)",
"/reload-mcp": "Reload MCP servers from config.yaml",
},
"Info": {
"/help": "Show this help message",
"/usage": "Show token usage for the current session",
"/insights": "Show usage insights and analytics (last 30 days)",
"/platforms": "Show gateway/messaging platform status",
"/paste": "Check clipboard for an image and attach it",
},
"Exit": {
"/quit": "Exit the CLI (also: /exit, /q)",
},
}
# ---------------------------------------------------------------------------
# CommandDef dataclass
# ---------------------------------------------------------------------------
# Flat dict for backwards compatibility and autocomplete
COMMANDS = {}
for category_commands in COMMANDS_BY_CATEGORY.values():
COMMANDS.update(category_commands)
@dataclass(frozen=True)
class CommandDef:
"""Definition of a single slash command."""
name: str # canonical name without slash: "background"
description: str # human-readable description
category: str # "Session", "Configuration", etc.
aliases: tuple[str, ...] = () # alternative names: ("bg",)
args_hint: str = "" # argument placeholder: "<prompt>", "[name]"
subcommands: tuple[str, ...] = () # tab-completable subcommands
cli_only: bool = False # only available in CLI
gateway_only: bool = False # only available in gateway/messaging
# ---------------------------------------------------------------------------
# Central registry -- single source of truth
# ---------------------------------------------------------------------------
COMMAND_REGISTRY: list[CommandDef] = [
# Session
CommandDef("new", "Start a new session (fresh session ID + history)", "Session",
aliases=("reset",)),
CommandDef("clear", "Clear screen and start a new session", "Session",
cli_only=True),
CommandDef("history", "Show conversation history", "Session",
cli_only=True),
CommandDef("save", "Save the current conversation", "Session",
cli_only=True),
CommandDef("retry", "Retry the last message (resend to agent)", "Session"),
CommandDef("undo", "Remove the last user/assistant exchange", "Session"),
CommandDef("title", "Set a title for the current session", "Session",
args_hint="[name]"),
CommandDef("compress", "Manually compress conversation context", "Session"),
CommandDef("rollback", "List or restore filesystem checkpoints", "Session",
args_hint="[number]"),
CommandDef("stop", "Kill all running background processes", "Session"),
CommandDef("approve", "Approve a pending dangerous command", "Session",
gateway_only=True, args_hint="[session|always]"),
CommandDef("deny", "Deny a pending dangerous command", "Session",
gateway_only=True),
CommandDef("background", "Run a prompt in the background", "Session",
aliases=("bg",), args_hint="<prompt>"),
CommandDef("queue", "Queue a prompt for the next turn (doesn't interrupt)", "Session",
aliases=("q",), args_hint="<prompt>"),
CommandDef("status", "Show session info", "Session",
gateway_only=True),
CommandDef("sethome", "Set this chat as the home channel", "Session",
gateway_only=True, aliases=("set-home",)),
CommandDef("resume", "Resume a previously-named session", "Session",
args_hint="[name]"),
# Configuration
CommandDef("config", "Show current configuration", "Configuration",
cli_only=True),
CommandDef("model", "Show or change the current model", "Configuration",
args_hint="[name]"),
CommandDef("provider", "Show available providers and current provider",
"Configuration"),
CommandDef("prompt", "View/set custom system prompt", "Configuration",
cli_only=True, args_hint="[text]", subcommands=("clear",)),
CommandDef("personality", "Set a predefined personality", "Configuration",
args_hint="[name]"),
CommandDef("statusbar", "Toggle the context/model status bar", "Configuration",
cli_only=True, aliases=("sb",)),
CommandDef("verbose", "Cycle tool progress display: off -> new -> all -> verbose",
"Configuration", cli_only=True),
CommandDef("reasoning", "Manage reasoning effort and display", "Configuration",
args_hint="[level|show|hide]",
subcommands=("none", "low", "minimal", "medium", "high", "xhigh", "show", "hide", "on", "off")),
CommandDef("skin", "Show or change the display skin/theme", "Configuration",
cli_only=True, args_hint="[name]"),
CommandDef("voice", "Toggle voice mode", "Configuration",
args_hint="[on|off|tts|status]", subcommands=("on", "off", "tts", "status")),
# Tools & Skills
CommandDef("tools", "Manage tools: /tools [list|disable|enable] [name...]", "Tools & Skills",
args_hint="[list|disable|enable] [name...]", cli_only=True),
CommandDef("toolsets", "List available toolsets", "Tools & Skills",
cli_only=True),
CommandDef("skills", "Search, install, inspect, or manage skills",
"Tools & Skills", cli_only=True,
subcommands=("search", "browse", "inspect", "install")),
CommandDef("cron", "Manage scheduled tasks", "Tools & Skills",
cli_only=True, args_hint="[subcommand]",
subcommands=("list", "add", "create", "edit", "pause", "resume", "run", "remove")),
CommandDef("reload-mcp", "Reload MCP servers from config", "Tools & Skills",
aliases=("reload_mcp",)),
CommandDef("browser", "Connect browser tools to your live Chrome via CDP", "Tools & Skills",
cli_only=True, args_hint="[connect|disconnect|status]",
subcommands=("connect", "disconnect", "status")),
CommandDef("plugins", "List installed plugins and their status",
"Tools & Skills", cli_only=True),
# Info
CommandDef("help", "Show available commands", "Info"),
CommandDef("usage", "Show token usage for the current session", "Info"),
CommandDef("insights", "Show usage insights and analytics", "Info",
args_hint="[days]"),
CommandDef("platforms", "Show gateway/messaging platform status", "Info",
cli_only=True, aliases=("gateway",)),
CommandDef("paste", "Check clipboard for an image and attach it", "Info",
cli_only=True),
CommandDef("update", "Update Hermes Agent to the latest version", "Info",
gateway_only=True),
# Exit
CommandDef("quit", "Exit the CLI", "Exit",
cli_only=True, aliases=("exit", "q")),
]
# ---------------------------------------------------------------------------
# Derived lookups -- rebuilt once at import time
# ---------------------------------------------------------------------------
def _build_command_lookup() -> dict[str, CommandDef]:
"""Map every name and alias to its CommandDef."""
lookup: dict[str, CommandDef] = {}
for cmd in COMMAND_REGISTRY:
lookup[cmd.name] = cmd
for alias in cmd.aliases:
lookup[alias] = cmd
return lookup
_COMMAND_LOOKUP: dict[str, CommandDef] = _build_command_lookup()
def resolve_command(name: str) -> CommandDef | None:
"""Resolve a command name or alias to its CommandDef.
Accepts names with or without the leading slash.
"""
return _COMMAND_LOOKUP.get(name.lower().lstrip("/"))
def _build_description(cmd: CommandDef) -> str:
"""Build a CLI-facing description string including usage hint."""
if cmd.args_hint:
return f"{cmd.description} (usage: /{cmd.name} {cmd.args_hint})"
return cmd.description
# Backwards-compatible flat dict: "/command" -> description
COMMANDS: dict[str, str] = {}
for _cmd in COMMAND_REGISTRY:
if not _cmd.gateway_only:
COMMANDS[f"/{_cmd.name}"] = _build_description(_cmd)
for _alias in _cmd.aliases:
COMMANDS[f"/{_alias}"] = f"{_cmd.description} (alias for /{_cmd.name})"
# Backwards-compatible categorized dict
COMMANDS_BY_CATEGORY: dict[str, dict[str, str]] = {}
for _cmd in COMMAND_REGISTRY:
if not _cmd.gateway_only:
_cat = COMMANDS_BY_CATEGORY.setdefault(_cmd.category, {})
_cat[f"/{_cmd.name}"] = COMMANDS[f"/{_cmd.name}"]
for _alias in _cmd.aliases:
_cat[f"/{_alias}"] = COMMANDS[f"/{_alias}"]
# Subcommands lookup: "/cmd" -> ["sub1", "sub2", ...]
SUBCOMMANDS: dict[str, list[str]] = {}
for _cmd in COMMAND_REGISTRY:
if _cmd.subcommands:
SUBCOMMANDS[f"/{_cmd.name}"] = list(_cmd.subcommands)
# Also extract subcommands hinted in args_hint via pipe-separated patterns
# e.g. args_hint="[on|off|tts|status]" for commands that don't have explicit subcommands.
# NOTE: If a command already has explicit subcommands, this fallback is skipped.
# Use the `subcommands` field on CommandDef for intentional tab-completable args.
_PIPE_SUBS_RE = re.compile(r"[a-z]+(?:\|[a-z]+)+")
for _cmd in COMMAND_REGISTRY:
key = f"/{_cmd.name}"
if key in SUBCOMMANDS or not _cmd.args_hint:
continue
m = _PIPE_SUBS_RE.search(_cmd.args_hint)
if m:
SUBCOMMANDS[key] = m.group(0).split("|")
# ---------------------------------------------------------------------------
# Gateway helpers
# ---------------------------------------------------------------------------
# Set of all command names + aliases recognized by the gateway
GATEWAY_KNOWN_COMMANDS: frozenset[str] = frozenset(
name
for cmd in COMMAND_REGISTRY
if not cmd.cli_only
for name in (cmd.name, *cmd.aliases)
)
def gateway_help_lines() -> list[str]:
"""Generate gateway help text lines from the registry."""
lines: list[str] = []
for cmd in COMMAND_REGISTRY:
if cmd.cli_only:
continue
args = f" {cmd.args_hint}" if cmd.args_hint else ""
alias_parts: list[str] = []
for a in cmd.aliases:
# Skip internal aliases like reload_mcp (underscore variant)
if a.replace("-", "_") == cmd.name.replace("-", "_") and a != cmd.name:
continue
alias_parts.append(f"`/{a}`")
alias_note = f" (alias: {', '.join(alias_parts)})" if alias_parts else ""
lines.append(f"`/{cmd.name}{args}` -- {cmd.description}{alias_note}")
return lines
def telegram_bot_commands() -> list[tuple[str, str]]:
"""Return (command_name, description) pairs for Telegram setMyCommands.
Telegram command names cannot contain hyphens, so they are replaced with
underscores. Aliases are skipped -- Telegram shows one menu entry per
canonical command.
"""
result: list[tuple[str, str]] = []
for cmd in COMMAND_REGISTRY:
if cmd.cli_only:
continue
tg_name = cmd.name.replace("-", "_")
result.append((tg_name, cmd.description))
return result
def slack_subcommand_map() -> dict[str, str]:
"""Return subcommand -> /command mapping for Slack /hermes handler.
Maps both canonical names and aliases so /hermes bg do stuff works
the same as /hermes background do stuff.
"""
mapping: dict[str, str] = {}
for cmd in COMMAND_REGISTRY:
if cmd.cli_only:
continue
mapping[cmd.name] = f"/{cmd.name}"
for alias in cmd.aliases:
mapping[alias] = f"/{alias}"
return mapping
# ---------------------------------------------------------------------------
# Autocomplete
# ---------------------------------------------------------------------------
class SlashCommandCompleter(Completer):
"""Autocomplete for built-in slash commands and optional skill commands."""
"""Autocomplete for built-in slash commands, subcommands, and skill commands."""
def __init__(
self,
skill_commands_provider: Callable[[], Mapping[str, dict[str, Any]]] | None = None,
model_completer_provider: Callable[[], dict[str, Any]] | None = None,
) -> None:
self._skill_commands_provider = skill_commands_provider
# model_completer_provider returns {"current_provider": str,
# "providers": {id: label, ...}, "models_for": callable(provider) -> list[str]}
self._model_completer_provider = model_completer_provider
self._model_info_cache: dict[str, Any] | None = None
self._model_info_cache_time: float = 0
def _get_model_info(self) -> dict[str, Any]:
"""Get cached model/provider info for /model autocomplete."""
import time
now = time.monotonic()
if self._model_info_cache is not None and now - self._model_info_cache_time < 60:
return self._model_info_cache
if self._model_completer_provider is None:
return {}
try:
self._model_info_cache = self._model_completer_provider() or {}
self._model_info_cache_time = now
except Exception:
self._model_info_cache = self._model_info_cache or {}
return self._model_info_cache
def _iter_skill_commands(self) -> Mapping[str, dict[str, Any]]:
if self._skill_commands_provider is None:
@@ -92,9 +322,152 @@ class SlashCommandCompleter(Completer):
"""
return f"{cmd_name} " if cmd_name == word else cmd_name
@staticmethod
def _extract_path_word(text: str) -> str | None:
"""Extract the current word if it looks like a file path.
Returns the path-like token under the cursor, or None if the
current word doesn't look like a path. A word is path-like when
it starts with ``./``, ``../``, ``~/``, ``/``, or contains a
``/`` separator (e.g. ``src/main.py``).
"""
if not text:
return None
# Walk backwards to find the start of the current "word".
# Words are delimited by spaces, but paths can contain almost anything.
i = len(text) - 1
while i >= 0 and text[i] != " ":
i -= 1
word = text[i + 1:]
if not word:
return None
# Only trigger path completion for path-like tokens
if word.startswith(("./", "../", "~/", "/")) or "/" in word:
return word
return None
@staticmethod
def _path_completions(word: str, limit: int = 30):
"""Yield Completion objects for file paths matching *word*."""
expanded = os.path.expanduser(word)
# Split into directory part and prefix to match inside it
if expanded.endswith("/"):
search_dir = expanded
prefix = ""
else:
search_dir = os.path.dirname(expanded) or "."
prefix = os.path.basename(expanded)
try:
entries = os.listdir(search_dir)
except OSError:
return
count = 0
prefix_lower = prefix.lower()
for entry in sorted(entries):
if prefix and not entry.lower().startswith(prefix_lower):
continue
if count >= limit:
break
full_path = os.path.join(search_dir, entry)
is_dir = os.path.isdir(full_path)
# Build the completion text (what replaces the typed word)
if word.startswith("~"):
display_path = "~/" + os.path.relpath(full_path, os.path.expanduser("~"))
elif os.path.isabs(word):
display_path = full_path
else:
# Keep relative
display_path = os.path.relpath(full_path)
if is_dir:
display_path += "/"
suffix = "/" if is_dir else ""
meta = "dir" if is_dir else _file_size_label(full_path)
yield Completion(
display_path,
start_position=-len(word),
display=entry + suffix,
display_meta=meta,
)
count += 1
def get_completions(self, document, complete_event):
text = document.text_before_cursor
if not text.startswith("/"):
# Try file path completion for non-slash input
path_word = self._extract_path_word(text)
if path_word is not None:
yield from self._path_completions(path_word)
return
# Check if we're completing a subcommand (base command already typed)
parts = text.split(maxsplit=1)
base_cmd = parts[0].lower()
if len(parts) > 1 or (len(parts) == 1 and text.endswith(" ")):
sub_text = parts[1] if len(parts) > 1 else ""
sub_lower = sub_text.lower()
# /model gets two-stage completion:
# Stage 1: provider names (with : suffix)
# Stage 2: after "provider:", list that provider's models
if base_cmd == "/model" and " " not in sub_text:
info = self._get_model_info()
if info:
current_prov = info.get("current_provider", "")
providers = info.get("providers", {})
models_for = info.get("models_for")
if ":" in sub_text:
# Stage 2: "anthropic:cl" → models for anthropic
prov_part, model_part = sub_text.split(":", 1)
model_lower = model_part.lower()
if models_for:
try:
prov_models = models_for(prov_part)
except Exception:
prov_models = []
for mid in prov_models:
if mid.lower().startswith(model_lower) and mid.lower() != model_lower:
full = f"{prov_part}:{mid}"
yield Completion(
full,
start_position=-len(sub_text),
display=mid,
)
else:
# Stage 1: providers sorted: non-current first, current last
for pid, plabel in sorted(
providers.items(),
key=lambda kv: (kv[0] == current_prov, kv[0]),
):
display_name = f"{pid}:"
if display_name.lower().startswith(sub_lower):
meta = f"({plabel})" if plabel != pid else ""
if pid == current_prov:
meta = f"(current — {plabel})" if plabel != pid else "(current)"
yield Completion(
display_name,
start_position=-len(sub_text),
display=display_name,
display_meta=meta,
)
return
# Static subcommand completions
if " " not in sub_text and base_cmd in SUBCOMMANDS:
for sub in SUBCOMMANDS[base_cmd]:
if sub.startswith(sub_lower) and sub != sub_lower:
yield Completion(
sub,
start_position=-len(sub_text),
display=sub,
)
return
word = text[1:]
@@ -120,3 +493,102 @@ class SlashCommandCompleter(Completer):
display=cmd,
display_meta=f"{short_desc}",
)
# ---------------------------------------------------------------------------
# Inline auto-suggest (ghost text) for slash commands
# ---------------------------------------------------------------------------
class SlashCommandAutoSuggest(AutoSuggest):
"""Inline ghost-text suggestions for slash commands and their subcommands.
Shows the rest of a command or subcommand in dim text as you type.
Falls back to history-based suggestions for non-slash input.
"""
def __init__(
self,
history_suggest: AutoSuggest | None = None,
completer: SlashCommandCompleter | None = None,
) -> None:
self._history = history_suggest
self._completer = completer # Reuse its model cache
def get_suggestion(self, buffer, document):
text = document.text_before_cursor
# Only suggest for slash commands
if not text.startswith("/"):
# Fall back to history for regular text
if self._history:
return self._history.get_suggestion(buffer, document)
return None
parts = text.split(maxsplit=1)
base_cmd = parts[0].lower()
if len(parts) == 1 and not text.endswith(" "):
# Still typing the command name: /upd → suggest "ate"
word = text[1:].lower()
for cmd in COMMANDS:
cmd_name = cmd[1:] # strip leading /
if cmd_name.startswith(word) and cmd_name != word:
return Suggestion(cmd_name[len(word):])
return None
# Command is complete — suggest subcommands or model names
sub_text = parts[1] if len(parts) > 1 else ""
sub_lower = sub_text.lower()
# /model gets two-stage ghost text
if base_cmd == "/model" and " " not in sub_text and self._completer:
info = self._completer._get_model_info()
if info:
providers = info.get("providers", {})
models_for = info.get("models_for")
current_prov = info.get("current_provider", "")
if ":" in sub_text:
# Stage 2: after provider:, suggest model
prov_part, model_part = sub_text.split(":", 1)
model_lower = model_part.lower()
if models_for:
try:
for mid in models_for(prov_part):
if mid.lower().startswith(model_lower) and mid.lower() != model_lower:
return Suggestion(mid[len(model_part):])
except Exception:
pass
else:
# Stage 1: suggest provider name with :
for pid in sorted(providers, key=lambda p: (p == current_prov, p)):
candidate = f"{pid}:"
if candidate.lower().startswith(sub_lower) and candidate.lower() != sub_lower:
return Suggestion(candidate[len(sub_text):])
# Static subcommands
if base_cmd in SUBCOMMANDS and SUBCOMMANDS[base_cmd]:
if " " not in sub_text:
for sub in SUBCOMMANDS[base_cmd]:
if sub.startswith(sub_lower) and sub != sub_lower:
return Suggestion(sub[len(sub_text):])
# Fall back to history
if self._history:
return self._history.get_suggestion(buffer, document)
return None
def _file_size_label(path: str) -> str:
"""Return a compact human-readable file size, or '' on error."""
try:
size = os.path.getsize(path)
except OSError:
return ""
if size < 1024:
return f"{size}B"
if size < 1024 * 1024:
return f"{size / 1024:.0f}K"
if size < 1024 * 1024 * 1024:
return f"{size / (1024 * 1024):.1f}M"
return f"{size / (1024 * 1024 * 1024):.1f}G"
+420 -6
View File
@@ -16,7 +16,6 @@ import os
import platform
import re
import stat
import sys
import subprocess
import sys
import tempfile
@@ -25,6 +24,21 @@ from typing import Dict, Any, Optional, List, Tuple
_IS_WINDOWS = platform.system() == "Windows"
_ENV_VAR_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
# Env var names written to .env that aren't in OPTIONAL_ENV_VARS
# (managed by setup/provider flows directly).
_EXTRA_ENV_KEYS = frozenset({
"OPENAI_API_KEY", "OPENAI_BASE_URL",
"ANTHROPIC_API_KEY", "ANTHROPIC_TOKEN",
"AUXILIARY_VISION_MODEL",
"DISCORD_HOME_CHANNEL", "TELEGRAM_HOME_CHANNEL",
"SIGNAL_ACCOUNT", "SIGNAL_HTTP_URL",
"SIGNAL_ALLOWED_USERS", "SIGNAL_GROUP_ALLOWED_USERS",
"DINGTALK_CLIENT_ID", "DINGTALK_CLIENT_SECRET",
"TERMINAL_ENV", "TERMINAL_SSH_KEY", "TERMINAL_SSH_PORT",
"WHATSAPP_MODE", "WHATSAPP_ENABLED",
"MATTERMOST_HOME_CHANNEL", "MATTERMOST_REPLY_MODE",
"MATRIX_PASSWORD", "MATRIX_ENCRYPTION", "MATRIX_HOME_ROOM",
})
import yaml
@@ -106,6 +120,7 @@ DEFAULT_CONFIG = {
"cwd": ".", # Use current directory
"timeout": 180,
"docker_image": "nikolaik/python-nodejs:python3.11-nodejs20",
"docker_forward_env": [],
"singularity_image": "docker://nikolaik/python-nodejs:python3.11-nodejs20",
"modal_image": "nikolaik/python-nodejs:python3.11-nodejs20",
"daytona_image": "nikolaik/python-nodejs:python3.11-nodejs20",
@@ -118,6 +133,14 @@ DEFAULT_CONFIG = {
# Each entry is "host_path:container_path" (standard Docker -v syntax).
# Example: ["/home/user/projects:/workspace/projects", "/data:/data"]
"docker_volumes": [],
# Explicit opt-in: mount the host cwd into /workspace for Docker sessions.
# Default off because passing host directories into a sandbox weakens isolation.
"docker_mount_cwd_to_workspace": False,
# Persistent shell — keep a long-lived bash shell across execute() calls
# so cwd/env vars/shell variables survive between commands.
# Enabled by default for non-local backends (SSH); local is always opt-in
# via TERMINAL_LOCAL_PERSISTENT env var.
"persistent_shell": True,
},
"browser": {
@@ -129,7 +152,7 @@ DEFAULT_CONFIG = {
# When enabled, the agent takes a snapshot of the working directory once per
# conversation turn (on first write_file/patch call). Use /rollback to restore.
"checkpoints": {
"enabled": False,
"enabled": True,
"max_snapshots": 50, # Max checkpoints to keep per directory
},
@@ -138,6 +161,13 @@ DEFAULT_CONFIG = {
"threshold": 0.50,
"summary_model": "google/gemini-3-flash-preview",
"summary_provider": "auto",
"summary_base_url": None,
},
"smart_model_routing": {
"enabled": False,
"max_simple_chars": 160,
"max_simple_words": 28,
"cheap_model": {},
},
# Auxiliary model config — provider:model for each side task.
@@ -177,6 +207,12 @@ DEFAULT_CONFIG = {
"base_url": "",
"api_key": "",
},
"approval": {
"provider": "auto",
"model": "", # fast/cheap model recommended (e.g. gemini-flash, haiku)
"base_url": "",
"api_key": "",
},
"mcp": {
"provider": "auto",
"model": "",
@@ -197,12 +233,19 @@ DEFAULT_CONFIG = {
"resume_display": "full",
"bell_on_complete": False,
"show_reasoning": False,
"streaming": False,
"show_cost": False, # Show $ cost in the status bar (off by default)
"skin": "default",
},
# Privacy settings
"privacy": {
"redact_pii": False, # When True, hash user IDs and strip phone numbers from LLM context
},
# Text-to-speech configuration
"tts": {
"provider": "edge", # "edge" (free) | "elevenlabs" (premium) | "openai"
"provider": "edge", # "edge" (free) | "elevenlabs" (premium) | "openai" | "neutts" (local)
"edge": {
"voice": "en-US-AriaNeural",
# Popular: AriaNeural, JennyNeural, AndrewNeural, BrianNeural, SoniaNeural
@@ -216,6 +259,12 @@ DEFAULT_CONFIG = {
"voice": "alloy",
# Voices: alloy, echo, fable, onyx, nova, shimmer
},
"neutts": {
"ref_audio": "", # Path to reference voice audio (empty = bundled default)
"ref_text": "", # Path to reference voice transcript (empty = bundled default)
"model": "neuphonic/neutts-air-q4-gguf", # HuggingFace model repo
"device": "cpu", # cpu, cuda, or mps
},
},
"stt": {
@@ -280,6 +329,23 @@ DEFAULT_CONFIG = {
"discord": {
"require_mention": True, # Require @mention to respond in server channels
"free_response_channels": "", # Comma-separated channel IDs where bot responds without mention
"auto_thread": True, # Auto-create threads on @mention in channels (like Slack)
},
# WhatsApp platform settings (gateway mode)
"whatsapp": {
# Reply prefix prepended to every outgoing WhatsApp message.
# Default (None) uses the built-in "⚕ *Hermes Agent*" header.
# Set to "" (empty string) to disable the header entirely.
# Supports \n for newlines, e.g. "🤖 *My Bot*\n──────\n"
},
# Approval mode for dangerous commands:
# manual — always prompt the user (default)
# smart — use auxiliary LLM to auto-approve low-risk commands, prompt for high-risk
# off — skip all approval prompts (equivalent to --yolo)
"approvals": {
"mode": "manual",
},
# Permanently allowed dangerous command patterns (added via "always" approval)
@@ -298,10 +364,15 @@ DEFAULT_CONFIG = {
"tirith_path": "tirith",
"tirith_timeout": 5,
"tirith_fail_open": True,
"website_blocklist": {
"enabled": False,
"domains": [],
"shared_files": [],
},
},
# Config schema version - bump this when adding new required fields
"_config_version": 8,
"_config_version": 10,
}
# =============================================================================
@@ -315,6 +386,7 @@ ENV_VARS_BY_VERSION: Dict[int, List[str]] = {
4: ["VOICE_TOOLS_OPENAI_KEY", "ELEVENLABS_API_KEY"],
5: ["WHATSAPP_ENABLED", "WHATSAPP_MODE", "WHATSAPP_ALLOWED_USERS",
"SLACK_BOT_TOKEN", "SLACK_APP_TOKEN", "SLACK_ALLOWED_USERS"],
10: ["TAVILY_API_KEY"],
}
# Required environment variables with metadata for migration prompts.
@@ -423,8 +495,77 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"DEEPSEEK_API_KEY": {
"description": "DeepSeek API key for direct DeepSeek access",
"prompt": "DeepSeek API Key",
"url": "https://platform.deepseek.com/api_keys",
"password": True,
"category": "provider",
},
"DEEPSEEK_BASE_URL": {
"description": "Custom DeepSeek API base URL (advanced)",
"prompt": "DeepSeek Base URL",
"url": "",
"password": False,
"category": "provider",
},
"DASHSCOPE_API_KEY": {
"description": "Alibaba Cloud DashScope API key for Qwen models",
"prompt": "DashScope API Key",
"url": "https://modelstudio.console.alibabacloud.com/",
"password": True,
"category": "provider",
},
"DASHSCOPE_BASE_URL": {
"description": "Custom DashScope base URL (default: international endpoint)",
"prompt": "DashScope Base URL",
"url": "",
"password": False,
"category": "provider",
"advanced": True,
},
"OPENCODE_ZEN_API_KEY": {
"description": "OpenCode Zen API key (pay-as-you-go access to curated models)",
"prompt": "OpenCode Zen API key",
"url": "https://opencode.ai/auth",
"password": True,
"category": "provider",
"advanced": True,
},
"OPENCODE_ZEN_BASE_URL": {
"description": "OpenCode Zen base URL override",
"prompt": "OpenCode Zen base URL (leave empty for default)",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
"OPENCODE_GO_API_KEY": {
"description": "OpenCode Go API key ($10/month subscription for open models)",
"prompt": "OpenCode Go API key",
"url": "https://opencode.ai/auth",
"password": True,
"category": "provider",
"advanced": True,
},
"OPENCODE_GO_BASE_URL": {
"description": "OpenCode Go base URL override",
"prompt": "OpenCode Go base URL (leave empty for default)",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
# ── Tool API keys ──
"PARALLEL_API_KEY": {
"description": "Parallel API key for AI-native web search and extract",
"prompt": "Parallel API key",
"url": "https://parallel.ai/",
"tools": ["web_search", "web_extract"],
"password": True,
"category": "tool",
},
"FIRECRAWL_API_KEY": {
"description": "Firecrawl API key for web search and scraping",
"prompt": "Firecrawl API key",
@@ -441,6 +582,14 @@ OPTIONAL_ENV_VARS = {
"category": "tool",
"advanced": True,
},
"TAVILY_API_KEY": {
"description": "Tavily API key for AI-native web search, extract, and crawl",
"prompt": "Tavily API key",
"url": "https://app.tavily.com/home",
"tools": ["web_search", "web_extract", "web_crawl"],
"password": True,
"category": "tool",
},
"BROWSERBASE_API_KEY": {
"description": "Browserbase API key for cloud browser (optional — local browser works without this)",
"prompt": "Browserbase API key",
@@ -457,6 +606,14 @@ OPTIONAL_ENV_VARS = {
"password": False,
"category": "tool",
},
"BROWSER_USE_API_KEY": {
"description": "Browser Use API key for cloud browser (optional — local browser works without this)",
"prompt": "Browser Use API key",
"url": "https://browser-use.com/",
"tools": ["browser_navigate", "browser_click"],
"password": True,
"category": "tool",
},
"FAL_KEY": {
"description": "FAL API key for image generation",
"prompt": "FAL API key",
@@ -513,6 +670,11 @@ OPTIONAL_ENV_VARS = {
"password": True,
"category": "tool",
},
"HONCHO_BASE_URL": {
"description": "Base URL for self-hosted Honcho instances (no API key needed)",
"prompt": "Honcho base URL (e.g. http://localhost:8000)",
"category": "tool",
},
# ── Messaging platforms ──
"TELEGRAM_BOT_TOKEN": {
@@ -561,6 +723,55 @@ OPTIONAL_ENV_VARS = {
"password": True,
"category": "messaging",
},
"MATTERMOST_URL": {
"description": "Mattermost server URL (e.g. https://mm.example.com)",
"prompt": "Mattermost server URL",
"url": "https://mattermost.com/deploy/",
"password": False,
"category": "messaging",
},
"MATTERMOST_TOKEN": {
"description": "Mattermost bot token or personal access token",
"prompt": "Mattermost bot token",
"url": None,
"password": True,
"category": "messaging",
},
"MATTERMOST_ALLOWED_USERS": {
"description": "Comma-separated Mattermost user IDs allowed to use the bot",
"prompt": "Allowed Mattermost user IDs (comma-separated)",
"url": None,
"password": False,
"category": "messaging",
},
"MATRIX_HOMESERVER": {
"description": "Matrix homeserver URL (e.g. https://matrix.example.org)",
"prompt": "Matrix homeserver URL",
"url": "https://matrix.org/ecosystem/servers/",
"password": False,
"category": "messaging",
},
"MATRIX_ACCESS_TOKEN": {
"description": "Matrix access token (preferred over password login)",
"prompt": "Matrix access token",
"url": None,
"password": True,
"category": "messaging",
},
"MATRIX_USER_ID": {
"description": "Matrix user ID (e.g. @hermes:example.org)",
"prompt": "Matrix user ID (@user:server)",
"url": None,
"password": False,
"category": "messaging",
},
"MATRIX_ALLOWED_USERS": {
"description": "Comma-separated Matrix user IDs allowed to use the bot (@user:server format)",
"prompt": "Allowed Matrix user IDs (comma-separated)",
"url": None,
"password": False,
"category": "messaging",
},
"GATEWAY_ALLOW_ALL_USERS": {
"description": "Allow all users to interact with messaging bots (true/false). Default: false.",
"prompt": "Allow all users (true/false)",
@@ -569,6 +780,59 @@ OPTIONAL_ENV_VARS = {
"category": "messaging",
"advanced": True,
},
"API_SERVER_ENABLED": {
"description": "Enable the OpenAI-compatible API server (true/false). Allows frontends like Open WebUI, LobeChat, etc. to connect.",
"prompt": "Enable API server (true/false)",
"url": None,
"password": False,
"category": "messaging",
"advanced": True,
},
"API_SERVER_KEY": {
"description": "Bearer token for API server authentication. If empty, all requests are allowed (local use only).",
"prompt": "API server auth key (optional)",
"url": None,
"password": True,
"category": "messaging",
"advanced": True,
},
"API_SERVER_PORT": {
"description": "Port for the API server (default: 8642).",
"prompt": "API server port",
"url": None,
"password": False,
"category": "messaging",
"advanced": True,
},
"API_SERVER_HOST": {
"description": "Host/bind address for the API server (default: 127.0.0.1). Use 0.0.0.0 for network access — requires API_SERVER_KEY for security.",
"prompt": "API server host",
"url": None,
"password": False,
"category": "messaging",
"advanced": True,
},
"WEBHOOK_ENABLED": {
"description": "Enable the webhook platform adapter for receiving events from GitHub, GitLab, etc.",
"prompt": "Enable webhooks (true/false)",
"url": None,
"password": False,
"category": "messaging",
},
"WEBHOOK_PORT": {
"description": "Port for the webhook HTTP server (default: 8644).",
"prompt": "Webhook port",
"url": None,
"password": False,
"category": "messaging",
},
"WEBHOOK_SECRET": {
"description": "Global HMAC secret for webhook signature validation (overridable per route in config.yaml).",
"prompt": "Webhook secret",
"url": None,
"password": True,
"category": "messaging",
},
# ── Agent settings ──
"MESSAGING_CWD": {
@@ -715,7 +979,15 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
Dict with migration results: {"env_added": [...], "config_added": [...], "warnings": [...]}
"""
results = {"env_added": [], "config_added": [], "warnings": []}
# ── Always: sanitize .env (split concatenated keys) ──
try:
fixes = sanitize_env_file()
if fixes and not quiet:
print(f" ✓ Repaired .env file ({fixes} corrupted entries fixed)")
except Exception:
pass # best-effort; don't block migration on sanitize failure
# Check config version
current_ver, latest_ver = check_config_version()
@@ -758,6 +1030,18 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
tz_display = config["timezone"] or "(server-local)"
print(f" ✓ Added timezone to config.yaml: {tz_display}")
# ── Version 8 → 9: clear ANTHROPIC_TOKEN from .env ──
# The new Anthropic auth flow no longer uses this env var.
if current_ver < 9:
try:
old_token = get_env_value("ANTHROPIC_TOKEN")
if old_token:
save_env_value("ANTHROPIC_TOKEN", "")
if not quiet:
print(" ✓ Cleared ANTHROPIC_TOKEN from .env (no longer used)")
except Exception:
pass
if current_ver < latest_ver and not quiet:
print(f"Config version: {current_ver}{latest_ver}")
@@ -967,6 +1251,19 @@ _FALLBACK_COMMENT = """
# fallback_model:
# provider: openrouter
# model: anthropic/claude-sonnet-4
#
# ── Smart Model Routing ────────────────────────────────────────────────
# Optional cheap-vs-strong routing for simple turns.
# Keeps the primary model for complex work, but can route short/simple
# messages to a cheaper model across providers.
#
# smart_model_routing:
# enabled: true
# max_simple_chars: 160
# max_simple_words: 28
# cheap_model:
# provider: openrouter
# model: google/gemini-2.5-flash
"""
@@ -997,6 +1294,19 @@ _COMMENTED_SECTIONS = """
# fallback_model:
# provider: openrouter
# model: anthropic/claude-sonnet-4
#
# ── Smart Model Routing ────────────────────────────────────────────────
# Optional cheap-vs-strong routing for simple turns.
# Keeps the primary model for complex work, but can route short/simple
# messages to a cheaper model across providers.
#
# smart_model_routing:
# enabled: true
# max_simple_chars: 160
# max_simple_words: 28
# cheap_model:
# provider: openrouter
# model: google/gemini-2.5-flash
"""
@@ -1045,6 +1355,102 @@ def load_env() -> Dict[str, str]:
return env_vars
def _sanitize_env_lines(lines: list) -> list:
"""Fix corrupted .env lines before writing.
Handles two known corruption patterns:
1. Concatenated KEY=VALUE pairs on a single line (missing newline between
entries, e.g. ``ANTHROPIC_API_KEY=sk-...OPENAI_BASE_URL=https://...``).
2. Stale ``KEY=***`` placeholder entries left by incomplete setup runs.
Uses a known-keys set (OPTIONAL_ENV_VARS + _EXTRA_ENV_KEYS) so we only
split on real Hermes env var names, avoiding false positives from values
that happen to contain uppercase text with ``=``.
"""
# Build the known keys set lazily from OPTIONAL_ENV_VARS + extras.
# Done inside the function so OPTIONAL_ENV_VARS is guaranteed to be defined.
known_keys = set(OPTIONAL_ENV_VARS.keys()) | _EXTRA_ENV_KEYS
sanitized: list[str] = []
for line in lines:
raw = line.rstrip("\r\n")
stripped = raw.strip()
# Preserve blank lines and comments
if not stripped or stripped.startswith("#"):
sanitized.append(raw + "\n")
continue
# Detect concatenated KEY=VALUE pairs on one line.
# Search for known KEY= patterns at any position in the line.
split_positions = []
for key_name in known_keys:
needle = key_name + "="
idx = stripped.find(needle)
while idx >= 0:
split_positions.append(idx)
idx = stripped.find(needle, idx + len(needle))
if len(split_positions) > 1:
split_positions.sort()
# Deduplicate (shouldn't happen, but be safe)
split_positions = sorted(set(split_positions))
for i, pos in enumerate(split_positions):
end = split_positions[i + 1] if i + 1 < len(split_positions) else len(stripped)
part = stripped[pos:end].strip()
if part:
sanitized.append(part + "\n")
else:
sanitized.append(stripped + "\n")
return sanitized
def sanitize_env_file() -> int:
"""Read, sanitize, and rewrite ~/.hermes/.env in place.
Returns the number of lines that were fixed (concatenation splits +
placeholder removals). Returns 0 when no changes are needed.
"""
env_path = get_env_path()
if not env_path.exists():
return 0
read_kw = {"encoding": "utf-8", "errors": "replace"} if _IS_WINDOWS else {}
write_kw = {"encoding": "utf-8"} if _IS_WINDOWS else {}
with open(env_path, **read_kw) as f:
original_lines = f.readlines()
sanitized = _sanitize_env_lines(original_lines)
if sanitized == original_lines:
return 0
# Count fixes: difference in line count (from splits) + removed lines
fixes = abs(len(sanitized) - len(original_lines))
if fixes == 0:
# Lines changed content (e.g. *** removal) even if count is same
fixes = sum(1 for a, b in zip(original_lines, sanitized) if a != b)
fixes += abs(len(sanitized) - len(original_lines))
fd, tmp_path = tempfile.mkstemp(dir=str(env_path.parent), suffix=".tmp", prefix=".env_")
try:
with os.fdopen(fd, "w", **write_kw) as f:
f.writelines(sanitized)
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, env_path)
except BaseException:
try:
os.unlink(tmp_path)
except OSError:
pass
raise
_secure_file(env_path)
return fixes
def save_env_value(key: str, value: str):
"""Save or update a value in ~/.hermes/.env."""
if not _ENV_VAR_NAME_RE.match(key):
@@ -1062,6 +1468,8 @@ def save_env_value(key: str, value: str):
if env_path.exists():
with open(env_path, **read_kw) as f:
lines = f.readlines()
# Sanitize on every read: split concatenated keys, drop stale placeholders
lines = _sanitize_env_lines(lines)
# Find and update or append
found = False
@@ -1180,8 +1588,11 @@ def show_config():
keys = [
("OPENROUTER_API_KEY", "OpenRouter"),
("VOICE_TOOLS_OPENAI_KEY", "OpenAI (STT/TTS)"),
("PARALLEL_API_KEY", "Parallel"),
("FIRECRAWL_API_KEY", "Firecrawl"),
("TAVILY_API_KEY", "Tavily"),
("BROWSERBASE_API_KEY", "Browserbase"),
("BROWSER_USE_API_KEY", "Browser Use"),
("FAL_KEY", "FAL"),
]
@@ -1328,7 +1739,8 @@ def set_config_value(key: str, value: str):
# Check if it's an API key (goes to .env)
api_keys = [
'OPENROUTER_API_KEY', 'OPENAI_API_KEY', 'ANTHROPIC_API_KEY', 'VOICE_TOOLS_OPENAI_KEY',
'FIRECRAWL_API_KEY', 'FIRECRAWL_API_URL', 'BROWSERBASE_API_KEY', 'BROWSERBASE_PROJECT_ID',
'PARALLEL_API_KEY', 'FIRECRAWL_API_KEY', 'FIRECRAWL_API_URL', 'TAVILY_API_KEY',
'BROWSERBASE_API_KEY', 'BROWSERBASE_PROJECT_ID', 'BROWSER_USE_API_KEY',
'FAL_KEY', 'TELEGRAM_BOT_TOKEN', 'DISCORD_BOT_TOKEN',
'TERMINAL_SSH_HOST', 'TERMINAL_SSH_USER', 'TERMINAL_SSH_KEY',
'SUDO_PASSWORD', 'SLACK_BOT_TOKEN', 'SLACK_APP_TOKEN',
@@ -1387,9 +1799,11 @@ def set_config_value(key: str, value: str):
"terminal.singularity_image": "TERMINAL_SINGULARITY_IMAGE",
"terminal.modal_image": "TERMINAL_MODAL_IMAGE",
"terminal.daytona_image": "TERMINAL_DAYTONA_IMAGE",
"terminal.docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
"terminal.cwd": "TERMINAL_CWD",
"terminal.timeout": "TERMINAL_TIMEOUT",
"terminal.sandbox_dir": "TERMINAL_SANDBOX_DIR",
"terminal.persistent_shell": "TERMINAL_PERSISTENT_SHELL",
}
if key in _config_to_env_sync:
save_env_value(_config_to_env_sync[key], str(value))
+295
View File
@@ -0,0 +1,295 @@
"""GitHub Copilot authentication utilities.
Implements the OAuth device code flow used by the Copilot CLI and handles
token validation/exchange for the Copilot API.
Token type support (per GitHub docs):
gho_ OAuth token (default via copilot login)
github_pat_ Fine-grained PAT (needs Copilot Requests permission)
ghu_ GitHub App token (via environment variable)
ghp_ Classic PAT NOT SUPPORTED
Credential search order (matching Copilot CLI behaviour):
1. COPILOT_GITHUB_TOKEN env var
2. GH_TOKEN env var
3. GITHUB_TOKEN env var
4. gh auth token CLI fallback
"""
from __future__ import annotations
import json
import logging
import os
import re
import shutil
import subprocess
import time
from pathlib import Path
from typing import Any, Optional
logger = logging.getLogger(__name__)
# OAuth device code flow constants (same client ID as opencode/Copilot CLI)
COPILOT_OAUTH_CLIENT_ID = "Ov23li8tweQw6odWQebz"
COPILOT_DEVICE_CODE_URL = "https://github.com/login/device/code"
COPILOT_ACCESS_TOKEN_URL = "https://github.com/login/oauth/access_token"
# Copilot API constants
COPILOT_TOKEN_EXCHANGE_URL = "https://api.github.com/copilot_internal/v2/token"
COPILOT_API_BASE_URL = "https://api.githubcopilot.com"
# Token type prefixes
_CLASSIC_PAT_PREFIX = "ghp_"
_SUPPORTED_PREFIXES = ("gho_", "github_pat_", "ghu_")
# Env var search order (matches Copilot CLI)
COPILOT_ENV_VARS = ("COPILOT_GITHUB_TOKEN", "GH_TOKEN", "GITHUB_TOKEN")
# Polling constants
_DEVICE_CODE_POLL_INTERVAL = 5 # seconds
_DEVICE_CODE_POLL_SAFETY_MARGIN = 3 # seconds
def is_classic_pat(token: str) -> bool:
"""Check if a token is a classic PAT (ghp_*), which Copilot doesn't support."""
return token.strip().startswith(_CLASSIC_PAT_PREFIX)
def validate_copilot_token(token: str) -> tuple[bool, str]:
"""Validate that a token is usable with the Copilot API.
Returns (valid, message).
"""
token = token.strip()
if not token:
return False, "Empty token"
if token.startswith(_CLASSIC_PAT_PREFIX):
return False, (
"Classic Personal Access Tokens (ghp_*) are not supported by the "
"Copilot API. Use one of:\n"
" → `copilot login` or `hermes model` to authenticate via OAuth\n"
" → A fine-grained PAT (github_pat_*) with Copilot Requests permission\n"
" → `gh auth login` with the default device code flow (produces gho_* tokens)"
)
return True, "OK"
def resolve_copilot_token() -> tuple[str, str]:
"""Resolve a GitHub token suitable for Copilot API use.
Returns (token, source) where source describes where the token came from.
Raises ValueError if only a classic PAT is available.
"""
# 1. Check env vars in priority order
for env_var in COPILOT_ENV_VARS:
val = os.getenv(env_var, "").strip()
if val:
valid, msg = validate_copilot_token(val)
if not valid:
logger.warning(
"Token from %s is not supported: %s", env_var, msg
)
continue
return val, env_var
# 2. Fall back to gh auth token
token = _try_gh_cli_token()
if token:
valid, msg = validate_copilot_token(token)
if not valid:
raise ValueError(
f"Token from `gh auth token` is a classic PAT (ghp_*). {msg}"
)
return token, "gh auth token"
return "", ""
def _gh_cli_candidates() -> list[str]:
"""Return candidate ``gh`` binary paths, including common Homebrew installs."""
candidates: list[str] = []
resolved = shutil.which("gh")
if resolved:
candidates.append(resolved)
for candidate in (
"/opt/homebrew/bin/gh",
"/usr/local/bin/gh",
str(Path.home() / ".local" / "bin" / "gh"),
):
if candidate in candidates:
continue
if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
candidates.append(candidate)
return candidates
def _try_gh_cli_token() -> Optional[str]:
"""Return a token from ``gh auth token`` when the GitHub CLI is available."""
for gh_path in _gh_cli_candidates():
try:
result = subprocess.run(
[gh_path, "auth", "token"],
capture_output=True,
text=True,
timeout=5,
)
except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
logger.debug("gh CLI token lookup failed (%s): %s", gh_path, exc)
continue
if result.returncode == 0 and result.stdout.strip():
return result.stdout.strip()
return None
# ─── OAuth Device Code Flow ────────────────────────────────────────────────
def copilot_device_code_login(
*,
host: str = "github.com",
timeout_seconds: float = 300,
) -> Optional[str]:
"""Run the GitHub OAuth device code flow for Copilot.
Prints instructions for the user, polls for completion, and returns
the OAuth access token on success, or None on failure/cancellation.
This replicates the flow used by opencode and the Copilot CLI.
"""
import urllib.request
import urllib.parse
domain = host.rstrip("/")
device_code_url = f"https://{domain}/login/device/code"
access_token_url = f"https://{domain}/login/oauth/access_token"
# Step 1: Request device code
data = urllib.parse.urlencode({
"client_id": COPILOT_OAUTH_CLIENT_ID,
"scope": "read:user",
}).encode()
req = urllib.request.Request(
device_code_url,
data=data,
headers={
"Accept": "application/json",
"Content-Type": "application/x-www-form-urlencoded",
"User-Agent": "HermesAgent/1.0",
},
)
try:
with urllib.request.urlopen(req, timeout=15) as resp:
device_data = json.loads(resp.read().decode())
except Exception as exc:
logger.error("Failed to initiate device authorization: %s", exc)
print(f" ✗ Failed to start device authorization: {exc}")
return None
verification_uri = device_data.get("verification_uri", "https://github.com/login/device")
user_code = device_data.get("user_code", "")
device_code = device_data.get("device_code", "")
interval = max(device_data.get("interval", _DEVICE_CODE_POLL_INTERVAL), 1)
if not device_code or not user_code:
print(" ✗ GitHub did not return a device code.")
return None
# Step 2: Show instructions
print()
print(f" Open this URL in your browser: {verification_uri}")
print(f" Enter this code: {user_code}")
print()
print(" Waiting for authorization...", end="", flush=True)
# Step 3: Poll for completion
deadline = time.time() + timeout_seconds
while time.time() < deadline:
time.sleep(interval + _DEVICE_CODE_POLL_SAFETY_MARGIN)
poll_data = urllib.parse.urlencode({
"client_id": COPILOT_OAUTH_CLIENT_ID,
"device_code": device_code,
"grant_type": "urn:ietf:params:oauth:grant-type:device_code",
}).encode()
poll_req = urllib.request.Request(
access_token_url,
data=poll_data,
headers={
"Accept": "application/json",
"Content-Type": "application/x-www-form-urlencoded",
"User-Agent": "HermesAgent/1.0",
},
)
try:
with urllib.request.urlopen(poll_req, timeout=10) as resp:
result = json.loads(resp.read().decode())
except Exception:
print(".", end="", flush=True)
continue
if result.get("access_token"):
print("")
return result["access_token"]
error = result.get("error", "")
if error == "authorization_pending":
print(".", end="", flush=True)
continue
elif error == "slow_down":
# RFC 8628: add 5 seconds to polling interval
server_interval = result.get("interval")
if isinstance(server_interval, (int, float)) and server_interval > 0:
interval = int(server_interval)
else:
interval += 5
print(".", end="", flush=True)
continue
elif error == "expired_token":
print()
print(" ✗ Device code expired. Please try again.")
return None
elif error == "access_denied":
print()
print(" ✗ Authorization was denied.")
return None
elif error:
print()
print(f" ✗ Authorization failed: {error}")
return None
print()
print(" ✗ Timed out waiting for authorization.")
return None
# ─── Copilot API Headers ───────────────────────────────────────────────────
def copilot_request_headers(
*,
is_agent_turn: bool = True,
is_vision: bool = False,
) -> dict[str, str]:
"""Build the standard headers for Copilot API requests.
Replicates the header set used by opencode and the Copilot CLI.
"""
headers: dict[str, str] = {
"Editor-Version": "vscode/1.104.1",
"User-Agent": "HermesAgent/1.0",
"Openai-Intent": "conversation-edits",
"x-initiator": "agent" if is_agent_turn else "user",
}
if is_vision:
headers["Copilot-Vision-Request"] = "true"
return headers
+3
View File
@@ -46,6 +46,7 @@ _PROVIDER_ENV_HINTS = (
"KIMI_API_KEY",
"MINIMAX_API_KEY",
"MINIMAX_CN_API_KEY",
"KILOCODE_API_KEY",
)
@@ -570,6 +571,8 @@ def run_doctor(args):
# MiniMax APIs don't support /models endpoint — https://github.com/NousResearch/hermes-agent/issues/811
("MiniMax", ("MINIMAX_API_KEY",), None, "MINIMAX_BASE_URL", False),
("MiniMax (China)", ("MINIMAX_CN_API_KEY",), None, "MINIMAX_CN_BASE_URL", False),
("AI Gateway", ("AI_GATEWAY_API_KEY",), "https://ai-gateway.vercel.sh/v1/models", "AI_GATEWAY_BASE_URL", True),
("Kilo Code", ("KILOCODE_API_KEY",), "https://api.kilo.ai/api/gateway/models", "KILOCODE_BASE_URL", True),
]
for _pname, _env_vars, _default_url, _base_env, _supports_health_check in _apikey_providers:
_key = ""
+46
View File
@@ -0,0 +1,46 @@
"""Helpers for loading Hermes .env files consistently across entrypoints."""
from __future__ import annotations
import os
from pathlib import Path
from typing import Iterable
from dotenv import load_dotenv
def _load_dotenv_with_fallback(path: Path, *, override: bool) -> None:
try:
load_dotenv(dotenv_path=path, override=override, encoding="utf-8")
except UnicodeDecodeError:
load_dotenv(dotenv_path=path, override=override, encoding="latin-1")
def load_hermes_dotenv(
*,
hermes_home: str | os.PathLike | None = None,
project_env: str | os.PathLike | None = None,
) -> list[Path]:
"""Load Hermes environment files with user config taking precedence.
Behavior:
- `~/.hermes/.env` overrides stale shell-exported values when present.
- project `.env` acts as a dev fallback and only fills missing values when
the user env exists.
- if no user env exists, the project `.env` also overrides stale shell vars.
"""
loaded: list[Path] = []
home_path = Path(hermes_home or os.getenv("HERMES_HOME", Path.home() / ".hermes"))
user_env = home_path / ".env"
project_env_path = Path(project_env) if project_env else None
if user_env.exists():
_load_dotenv_with_fallback(user_env, override=True)
loaded.append(user_env)
if project_env_path and project_env_path.exists():
_load_dotenv_with_fallback(project_env_path, override=not loaded)
loaded.append(project_env_path)
return loaded
+328 -30
View File
@@ -6,6 +6,7 @@ Handles: hermes gateway [run|start|stop|restart|status|install|uninstall|setup]
import asyncio
import os
import shutil
import signal
import subprocess
import sys
@@ -30,6 +31,7 @@ def find_gateway_pids() -> list:
pids = []
patterns = [
"hermes_cli.main gateway",
"hermes_cli/main.py gateway",
"hermes gateway",
"gateway/run.py",
]
@@ -119,17 +121,62 @@ def is_windows() -> bool:
# Service Configuration
# =============================================================================
SERVICE_NAME = "hermes-gateway"
_SERVICE_BASE = "hermes-gateway"
SERVICE_DESCRIPTION = "Hermes Agent Gateway - Messaging Platform Integration"
def get_service_name() -> str:
"""Derive a systemd service name scoped to this HERMES_HOME.
Default ``~/.hermes`` returns ``hermes-gateway`` (backward compatible).
Any other HERMES_HOME appends a short hash so multiple installations
can each have their own systemd service without conflicting.
"""
import hashlib
from pathlib import Path as _Path # local import to avoid monkeypatch interference
home = _Path(os.getenv("HERMES_HOME", _Path.home() / ".hermes")).resolve()
default = (_Path.home() / ".hermes").resolve()
if home == default:
return _SERVICE_BASE
suffix = hashlib.sha256(str(home).encode()).hexdigest()[:8]
return f"{_SERVICE_BASE}-{suffix}"
SERVICE_NAME = _SERVICE_BASE # backward-compat for external importers; prefer get_service_name()
def get_systemd_unit_path(system: bool = False) -> Path:
name = get_service_name()
if system:
return Path("/etc/systemd/system") / f"{SERVICE_NAME}.service"
return Path.home() / ".config" / "systemd" / "user" / f"{SERVICE_NAME}.service"
return Path("/etc/systemd/system") / f"{name}.service"
return Path.home() / ".config" / "systemd" / "user" / f"{name}.service"
def _ensure_user_systemd_env() -> None:
"""Ensure DBUS_SESSION_BUS_ADDRESS and XDG_RUNTIME_DIR are set for systemctl --user.
On headless servers (SSH sessions), these env vars may be missing even when
the user's systemd instance is running (via linger). Without them,
``systemctl --user`` fails with "Failed to connect to bus: No medium found".
We detect the standard socket path and set the vars so all subsequent
subprocess calls inherit them.
"""
uid = os.getuid()
if "XDG_RUNTIME_DIR" not in os.environ:
runtime_dir = f"/run/user/{uid}"
if Path(runtime_dir).exists():
os.environ["XDG_RUNTIME_DIR"] = runtime_dir
if "DBUS_SESSION_BUS_ADDRESS" not in os.environ:
xdg_runtime = os.environ.get("XDG_RUNTIME_DIR", f"/run/user/{uid}")
bus_path = Path(xdg_runtime) / "bus"
if bus_path.exists():
os.environ["DBUS_SESSION_BUS_ADDRESS"] = f"unix:path={bus_path}"
def _systemctl_cmd(system: bool = False) -> list[str]:
if not system:
_ensure_user_systemd_env()
return ["systemctl"] if system else ["systemctl", "--user"]
@@ -350,17 +397,22 @@ def get_hermes_cli_path() -> str:
# =============================================================================
def generate_systemd_unit(system: bool = False, run_as_user: str | None = None) -> str:
import shutil
python_path = get_python_path()
working_dir = str(PROJECT_ROOT)
venv_dir = str(PROJECT_ROOT / "venv")
venv_bin = str(PROJECT_ROOT / "venv" / "bin")
node_bin = str(PROJECT_ROOT / "node_modules" / ".bin")
# Build a PATH that includes the venv, node_modules, and standard system dirs
sane_path = f"{venv_bin}:{node_bin}:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
hermes_cli = shutil.which("hermes") or f"{python_path} -m hermes_cli.main"
path_entries = [venv_bin, node_bin]
resolved_node = shutil.which("node")
if resolved_node:
resolved_node_dir = str(Path(resolved_node).resolve().parent)
if resolved_node_dir not in path_entries:
path_entries.append(resolved_node_dir)
path_entries.extend(["/usr/local/sbin", "/usr/local/bin", "/usr/sbin", "/usr/bin", "/sbin", "/bin"])
sane_path = ":".join(path_entries)
hermes_home = str(Path(os.getenv("HERMES_HOME", Path.home() / ".hermes")).resolve())
if system:
username, group_name, home_dir = _system_service_identity(run_as_user)
@@ -380,11 +432,12 @@ Environment="USER={username}"
Environment="LOGNAME={username}"
Environment="PATH={sane_path}"
Environment="VIRTUAL_ENV={venv_dir}"
Environment="HERMES_HOME={hermes_home}"
Restart=on-failure
RestartSec=10
KillMode=mixed
KillSignal=SIGTERM
TimeoutStopSec=15
TimeoutStopSec=60
StandardOutput=journal
StandardError=journal
@@ -399,15 +452,15 @@ After=network.target
[Service]
Type=simple
ExecStart={python_path} -m hermes_cli.main gateway run --replace
ExecStop={hermes_cli} gateway stop
WorkingDirectory={working_dir}
Environment="PATH={sane_path}"
Environment="VIRTUAL_ENV={venv_dir}"
Environment="HERMES_HOME={hermes_home}"
Restart=on-failure
RestartSec=10
KillMode=mixed
KillSignal=SIGTERM
TimeoutStopSec=15
TimeoutStopSec=60
StandardOutput=journal
StandardError=journal
@@ -455,7 +508,7 @@ def _print_linger_enable_warning(username: str, detail: str | None = None) -> No
print(f" sudo loginctl enable-linger {username}")
print()
print(" Then restart the gateway:")
print(f" systemctl --user restart {SERVICE_NAME}.service")
print(f" systemctl --user restart {get_service_name()}.service")
print()
@@ -517,6 +570,12 @@ def systemd_install(force: bool = False, system: bool = False, run_as_user: str
scope_flag = " --system" if system else ""
if unit_path.exists() and not force:
if not systemd_unit_is_current(system=system):
print(f"↻ Repairing outdated {_service_scope_label(system)} systemd service at: {unit_path}")
refresh_systemd_unit_if_needed(system=system)
subprocess.run(_systemctl_cmd(system) + ["enable", get_service_name()], check=True)
print(f"{_service_scope_label(system).capitalize()} service definition updated")
return
print(f"Service already installed at: {unit_path}")
print("Use --force to reinstall")
return
@@ -526,7 +585,7 @@ def systemd_install(force: bool = False, system: bool = False, run_as_user: str
unit_path.write_text(generate_systemd_unit(system=system, run_as_user=run_as_user), encoding="utf-8")
subprocess.run(_systemctl_cmd(system) + ["daemon-reload"], check=True)
subprocess.run(_systemctl_cmd(system) + ["enable", SERVICE_NAME], check=True)
subprocess.run(_systemctl_cmd(system) + ["enable", get_service_name()], check=True)
print()
print(f"{_service_scope_label(system).capitalize()} service installed and enabled!")
@@ -534,7 +593,7 @@ def systemd_install(force: bool = False, system: bool = False, run_as_user: str
print("Next steps:")
print(f" {'sudo ' if system else ''}hermes gateway start{scope_flag} # Start the service")
print(f" {'sudo ' if system else ''}hermes gateway status{scope_flag} # Check status")
print(f" {'journalctl' if system else 'journalctl --user'} -u {SERVICE_NAME} -f # View logs")
print(f" {'journalctl' if system else 'journalctl --user'} -u {get_service_name()} -f # View logs")
print()
if system:
@@ -552,8 +611,8 @@ def systemd_uninstall(system: bool = False):
if system:
_require_root_for_system_service("uninstall")
subprocess.run(_systemctl_cmd(system) + ["stop", SERVICE_NAME], check=False)
subprocess.run(_systemctl_cmd(system) + ["disable", SERVICE_NAME], check=False)
subprocess.run(_systemctl_cmd(system) + ["stop", get_service_name()], check=False)
subprocess.run(_systemctl_cmd(system) + ["disable", get_service_name()], check=False)
unit_path = get_systemd_unit_path(system=system)
if unit_path.exists():
@@ -569,7 +628,7 @@ def systemd_start(system: bool = False):
if system:
_require_root_for_system_service("start")
refresh_systemd_unit_if_needed(system=system)
subprocess.run(_systemctl_cmd(system) + ["start", SERVICE_NAME], check=True)
subprocess.run(_systemctl_cmd(system) + ["start", get_service_name()], check=True)
print(f"{_service_scope_label(system).capitalize()} service started")
@@ -578,7 +637,7 @@ def systemd_stop(system: bool = False):
system = _select_systemd_scope(system)
if system:
_require_root_for_system_service("stop")
subprocess.run(_systemctl_cmd(system) + ["stop", SERVICE_NAME], check=True)
subprocess.run(_systemctl_cmd(system) + ["stop", get_service_name()], check=True)
print(f"{_service_scope_label(system).capitalize()} service stopped")
@@ -588,7 +647,7 @@ def systemd_restart(system: bool = False):
if system:
_require_root_for_system_service("restart")
refresh_systemd_unit_if_needed(system=system)
subprocess.run(_systemctl_cmd(system) + ["restart", SERVICE_NAME], check=True)
subprocess.run(_systemctl_cmd(system) + ["restart", get_service_name()], check=True)
print(f"{_service_scope_label(system).capitalize()} service restarted")
@@ -613,12 +672,12 @@ def systemd_status(deep: bool = False, system: bool = False):
print()
subprocess.run(
_systemctl_cmd(system) + ["status", SERVICE_NAME, "--no-pager"],
_systemctl_cmd(system) + ["status", get_service_name(), "--no-pager"],
capture_output=False,
)
result = subprocess.run(
_systemctl_cmd(system) + ["is-active", SERVICE_NAME],
_systemctl_cmd(system) + ["is-active", get_service_name()],
capture_output=True,
text=True,
)
@@ -657,7 +716,7 @@ def systemd_status(deep: bool = False, system: bool = False):
if deep:
print()
print("Recent logs:")
subprocess.run(_journalctl_cmd(system) + ["-u", SERVICE_NAME, "-n", "20", "--no-pager"])
subprocess.run(_journalctl_cmd(system) + ["-u", get_service_name(), "-n", "20", "--no-pager"])
# =============================================================================
@@ -684,6 +743,7 @@ def generate_launchd_plist() -> str:
<string>hermes_cli.main</string>
<string>gateway</string>
<string>run</string>
<string>--replace</string>
</array>
<key>WorkingDirectory</key>
@@ -707,10 +767,45 @@ def generate_launchd_plist() -> str:
</plist>
"""
def launchd_plist_is_current() -> bool:
"""Check if the installed launchd plist matches the currently generated one."""
plist_path = get_launchd_plist_path()
if not plist_path.exists():
return False
installed = plist_path.read_text(encoding="utf-8")
expected = generate_launchd_plist()
return _normalize_service_definition(installed) == _normalize_service_definition(expected)
def refresh_launchd_plist_if_needed() -> bool:
"""Rewrite the installed launchd plist when the generated definition has changed.
Unlike systemd, launchd picks up plist changes on the next ``launchctl stop``/
``launchctl start`` cycle no daemon-reload is needed. We still unload/reload
to make launchd re-read the updated plist immediately.
"""
plist_path = get_launchd_plist_path()
if not plist_path.exists() or launchd_plist_is_current():
return False
plist_path.write_text(generate_launchd_plist(), encoding="utf-8")
# Unload/reload so launchd picks up the new definition
subprocess.run(["launchctl", "unload", str(plist_path)], check=False)
subprocess.run(["launchctl", "load", str(plist_path)], check=False)
print("↻ Updated gateway launchd service definition to match the current Hermes install")
return True
def launchd_install(force: bool = False):
plist_path = get_launchd_plist_path()
if plist_path.exists() and not force:
if not launchd_plist_is_current():
print(f"↻ Repairing outdated launchd service at: {plist_path}")
refresh_launchd_plist_if_needed()
print("✓ Service definition updated")
return
print(f"Service already installed at: {plist_path}")
print("Use --force to reinstall")
return
@@ -739,29 +834,94 @@ def launchd_uninstall():
print("✓ Service uninstalled")
def launchd_start():
subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
refresh_launchd_plist_if_needed()
plist_path = get_launchd_plist_path()
try:
subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
except subprocess.CalledProcessError as e:
if e.returncode != 3 or not plist_path.exists():
raise
print("↻ launchd job was unloaded; reloading service definition")
subprocess.run(["launchctl", "load", str(plist_path)], check=True)
subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
print("✓ Service started")
def launchd_stop():
subprocess.run(["launchctl", "stop", "ai.hermes.gateway"], check=True)
print("✓ Service stopped")
def _wait_for_gateway_exit(timeout: float = 10.0, force_after: float = 5.0):
"""Wait for the gateway process (by saved PID) to exit.
Uses the PID from the gateway.pid file not launchd labels so this
works correctly when multiple gateway instances run under separate
HERMES_HOME directories.
Args:
timeout: Total seconds to wait before giving up.
force_after: Seconds of graceful waiting before sending SIGKILL.
"""
import time
from gateway.status import get_running_pid
deadline = time.monotonic() + timeout
force_deadline = time.monotonic() + force_after
force_sent = False
while time.monotonic() < deadline:
pid = get_running_pid()
if pid is None:
return # Process exited cleanly.
if not force_sent and time.monotonic() >= force_deadline:
# Grace period expired — force-kill the specific PID.
try:
os.kill(pid, signal.SIGKILL)
print(f"⚠ Gateway PID {pid} did not exit gracefully; sent SIGKILL")
except (ProcessLookupError, PermissionError):
return # Already gone or we can't touch it.
force_sent = True
time.sleep(0.3)
# Timed out even after SIGKILL.
remaining_pid = get_running_pid()
if remaining_pid is not None:
print(f"⚠ Gateway PID {remaining_pid} still running after {timeout}s — restart may fail")
def launchd_restart():
launchd_stop()
try:
launchd_stop()
except subprocess.CalledProcessError as e:
if e.returncode != 3:
raise
print("↻ launchd job was unloaded; skipping stop")
_wait_for_gateway_exit()
launchd_start()
def launchd_status(deep: bool = False):
plist_path = get_launchd_plist_path()
result = subprocess.run(
["launchctl", "list", "ai.hermes.gateway"],
capture_output=True,
text=True
)
print(f"Launchd plist: {plist_path}")
if launchd_plist_is_current():
print("✓ Service definition matches the current Hermes install")
else:
print("⚠ Service definition is stale relative to the current Hermes install")
print(" Run: hermes gateway start")
if result.returncode == 0:
print("✓ Gateway service is loaded")
print(result.stdout)
else:
print("✗ Gateway service is not loaded")
print(" Service definition exists locally but launchd has not loaded it.")
print(" Run: hermes gateway start")
if deep:
log_file = get_hermes_home() / "logs" / "gateway.log"
@@ -890,6 +1050,64 @@ _PLATFORMS = [
"help": "Paste your member ID from step 7 above."},
],
},
{
"key": "matrix",
"label": "Matrix",
"emoji": "🔐",
"token_var": "MATRIX_ACCESS_TOKEN",
"setup_instructions": [
"1. Works with any Matrix homeserver (self-hosted Synapse/Conduit/Dendrite or matrix.org)",
"2. Create a bot user on your homeserver, or use your own account",
"3. Get an access token: Element → Settings → Help & About → Access Token",
" Or via API: curl -X POST https://your-server/_matrix/client/v3/login \\",
" -d '{\"type\":\"m.login.password\",\"user\":\"@bot:server\",\"password\":\"...\"}'",
"4. Alternatively, provide user ID + password and Hermes will log in directly",
"5. For E2EE: set MATRIX_ENCRYPTION=true (requires pip install 'matrix-nio[e2e]')",
"6. To find your user ID: it's @username:your-server (shown in Element profile)",
],
"vars": [
{"name": "MATRIX_HOMESERVER", "prompt": "Homeserver URL (e.g. https://matrix.example.org)", "password": False,
"help": "Your Matrix homeserver URL. Works with any self-hosted instance."},
{"name": "MATRIX_ACCESS_TOKEN", "prompt": "Access token (leave empty to use password login instead)", "password": True,
"help": "Paste your access token, or leave empty and provide user ID + password below."},
{"name": "MATRIX_USER_ID", "prompt": "User ID (@bot:server — required for password login)", "password": False,
"help": "Full Matrix user ID, e.g. @hermes:matrix.example.org"},
{"name": "MATRIX_ALLOWED_USERS", "prompt": "Allowed user IDs (comma-separated, e.g. @you:server)", "password": False,
"is_allowlist": True,
"help": "Matrix user IDs who can interact with the bot."},
{"name": "MATRIX_HOME_ROOM", "prompt": "Home room ID (for cron/notification delivery, or empty to set later with /set-home)", "password": False,
"help": "Room ID (e.g. !abc123:server) for delivering cron results and notifications."},
],
},
{
"key": "mattermost",
"label": "Mattermost",
"emoji": "💬",
"token_var": "MATTERMOST_TOKEN",
"setup_instructions": [
"1. In Mattermost: Integrations → Bot Accounts → Add Bot Account",
" (System Console → Integrations → Bot Accounts must be enabled)",
"2. Give it a username (e.g. hermes) and copy the bot token",
"3. Works with any self-hosted Mattermost instance — enter your server URL",
"4. To find your user ID: click your avatar (top-left) → Profile",
" Your user ID is displayed there — click it to copy.",
" ⚠ This is NOT your username — it's a 26-character alphanumeric ID.",
"5. To get a channel ID: click the channel name → View Info → copy the ID",
],
"vars": [
{"name": "MATTERMOST_URL", "prompt": "Server URL (e.g. https://mm.example.com)", "password": False,
"help": "Your Mattermost server URL. Works with any self-hosted instance."},
{"name": "MATTERMOST_TOKEN", "prompt": "Bot token", "password": True,
"help": "Paste the bot token from step 2 above."},
{"name": "MATTERMOST_ALLOWED_USERS", "prompt": "Allowed user IDs (comma-separated)", "password": False,
"is_allowlist": True,
"help": "Your Mattermost user ID from step 4 above."},
{"name": "MATTERMOST_HOME_CHANNEL", "prompt": "Home channel ID (for cron/notification delivery, or empty to set later with /set-home)", "password": False,
"help": "Channel ID where Hermes delivers cron results and notifications."},
{"name": "MATTERMOST_REPLY_MODE", "prompt": "Reply mode — 'off' for flat messages, 'thread' for threaded replies (default: off)", "password": False,
"help": "off = flat channel messages, thread = replies nest under your message."},
],
},
{
"key": "whatsapp",
"label": "WhatsApp",
@@ -928,6 +1146,51 @@ _PLATFORMS = [
"help": "Only emails from these addresses will be processed."},
],
},
{
"key": "sms",
"label": "SMS (Twilio)",
"emoji": "📱",
"token_var": "TWILIO_ACCOUNT_SID",
"setup_instructions": [
"1. Create a Twilio account at https://www.twilio.com/",
"2. Get your Account SID and Auth Token from the Twilio Console dashboard",
"3. Buy or configure a phone number capable of sending SMS",
"4. Set up your webhook URL for inbound SMS:",
" Twilio Console → Phone Numbers → Active Numbers → your number",
" → Messaging → A MESSAGE COMES IN → Webhook → https://your-server:8080/webhooks/twilio",
],
"vars": [
{"name": "TWILIO_ACCOUNT_SID", "prompt": "Twilio Account SID", "password": False,
"help": "Found on the Twilio Console dashboard."},
{"name": "TWILIO_AUTH_TOKEN", "prompt": "Twilio Auth Token", "password": True,
"help": "Found on the Twilio Console dashboard (click to reveal)."},
{"name": "TWILIO_PHONE_NUMBER", "prompt": "Twilio phone number (E.164 format, e.g. +15551234567)", "password": False,
"help": "The Twilio phone number to send SMS from."},
{"name": "SMS_ALLOWED_USERS", "prompt": "Allowed phone numbers (comma-separated, E.164 format)", "password": False,
"is_allowlist": True,
"help": "Only messages from these phone numbers will be processed."},
{"name": "SMS_HOME_CHANNEL", "prompt": "Home channel phone number (for cron/notification delivery, or empty)", "password": False,
"help": "Phone number to deliver cron job results and notifications to."},
],
},
{
"key": "dingtalk",
"label": "DingTalk",
"emoji": "💬",
"token_var": "DINGTALK_CLIENT_ID",
"setup_instructions": [
"1. Go to https://open-dev.dingtalk.com → Create Application",
"2. Under 'Credentials', copy the AppKey (Client ID) and AppSecret (Client Secret)",
"3. Enable 'Stream Mode' under the bot settings",
"4. Add the bot to a group chat or message it directly",
],
"vars": [
{"name": "DINGTALK_CLIENT_ID", "prompt": "AppKey (Client ID)", "password": False,
"help": "The AppKey from your DingTalk application credentials."},
{"name": "DINGTALK_CLIENT_SECRET", "prompt": "AppSecret (Client Secret)", "password": True,
"help": "The AppSecret from your DingTalk application credentials."},
],
},
]
@@ -962,6 +1225,16 @@ def _platform_status(platform: dict) -> str:
if any([val, pwd, imap, smtp]):
return "partially configured"
return "not configured"
if platform.get("key") == "matrix":
homeserver = get_env_value("MATRIX_HOMESERVER")
password = get_env_value("MATRIX_PASSWORD")
if (val or password) and homeserver:
e2ee = get_env_value("MATRIX_ENCRYPTION")
suffix = " + E2EE" if e2ee and e2ee.lower() in ("true", "1", "yes") else ""
return f"configured{suffix}"
if val or password or homeserver:
return "partially configured"
return "not configured"
if val:
return "configured"
return "not configured"
@@ -1118,7 +1391,7 @@ def _is_service_running() -> bool:
if user_unit_exists:
result = subprocess.run(
_systemctl_cmd(False) + ["is-active", SERVICE_NAME],
_systemctl_cmd(False) + ["is-active", get_service_name()],
capture_output=True, text=True
)
if result.stdout.strip() == "active":
@@ -1126,7 +1399,7 @@ def _is_service_running() -> bool:
if system_unit_exists:
result = subprocess.run(
_systemctl_cmd(True) + ["is-active", SERVICE_NAME],
_systemctl_cmd(True) + ["is-active", get_service_name()],
capture_output=True, text=True
)
if result.stdout.strip() == "active":
@@ -1477,14 +1750,17 @@ def gateway_command(args):
# Try service first, fall back to killing and restarting
service_available = False
system = getattr(args, 'system', False)
service_configured = False
if is_linux() and (get_systemd_unit_path(system=False).exists() or get_systemd_unit_path(system=True).exists()):
service_configured = True
try:
systemd_restart(system=system)
service_available = True
except subprocess.CalledProcessError:
pass
elif is_macos() and get_launchd_plist_path().exists():
service_configured = True
try:
launchd_restart()
service_available = True
@@ -1492,14 +1768,36 @@ def gateway_command(args):
pass
if not service_available:
# systemd/launchd restart failed — check if linger is the issue
if is_linux():
linger_ok, _detail = get_systemd_linger_status()
if linger_ok is not True:
import getpass
_username = getpass.getuser()
print()
print("⚠ Cannot restart gateway as a service — linger is not enabled.")
print(" The gateway user service requires linger to function on headless servers.")
print()
print(f" Run: sudo loginctl enable-linger {_username}")
print()
print(" Then restart the gateway:")
print(" hermes gateway restart")
return
if service_configured:
print()
print("✗ Gateway service restart failed.")
print(" The service definition exists, but the service manager did not recover it.")
print(" Fix the service, then retry: hermes gateway start")
sys.exit(1)
# Manual restart: kill existing processes
killed = kill_gateway_processes()
if killed:
print(f"✓ Stopped {killed} gateway process(es)")
import time
time.sleep(2)
_wait_for_gateway_exit(timeout=10.0, force_after=5.0)
# Start fresh
print("Starting gateway...")
run_gateway(verbose=False)
+754 -73
View File
File diff suppressed because it is too large Load Diff
+777 -28
View File
@@ -8,26 +8,46 @@ Add, remove, or reorder entries here — both `hermes setup` and
from __future__ import annotations
import json
import os
import urllib.request
import urllib.error
from difflib import get_close_matches
from typing import Any, Optional
COPILOT_BASE_URL = "https://api.githubcopilot.com"
COPILOT_MODELS_URL = f"{COPILOT_BASE_URL}/models"
COPILOT_EDITOR_VERSION = "vscode/1.104.1"
COPILOT_REASONING_EFFORTS_GPT5 = ["minimal", "low", "medium", "high"]
COPILOT_REASONING_EFFORTS_O_SERIES = ["low", "medium", "high"]
# Backward-compatible aliases for the earlier GitHub Models-backed Copilot work.
GITHUB_MODELS_BASE_URL = COPILOT_BASE_URL
GITHUB_MODELS_CATALOG_URL = COPILOT_MODELS_URL
# (model_id, display description shown in menus)
OPENROUTER_MODELS: list[tuple[str, str]] = [
("anthropic/claude-opus-4.6", "recommended"),
("anthropic/claude-sonnet-4.5", ""),
("openai/gpt-5.4-pro", ""),
("anthropic/claude-haiku-4.5", ""),
("openai/gpt-5.4", ""),
("openai/gpt-5.4-mini", ""),
("openrouter/hunter-alpha", "free"),
("openrouter/healer-alpha", "free"),
("openai/gpt-5.3-codex", ""),
("google/gemini-3-pro-preview", ""),
("google/gemini-3-flash-preview", ""),
("qwen/qwen3.5-plus-02-15", ""),
("qwen/qwen3.5-35b-a3b", ""),
("stepfun/step-3.5-flash", ""),
("z-ai/glm-5", ""),
("moonshotai/kimi-k2.5", ""),
("minimax/minimax-m2.5", ""),
("z-ai/glm-5", ""),
("z-ai/glm-5-turbo", ""),
("moonshotai/kimi-k2.5", ""),
("x-ai/grok-4.20-beta", ""),
("nvidia/nemotron-3-super-120b-a12b:free", "free"),
("arcee-ai/trinity-large-preview:free", "free"),
("openai/gpt-5.4-pro", ""),
("openai/gpt-5.4-nano", ""),
]
_PROVIDER_MODELS: dict[str, list[str]] = {
@@ -45,6 +65,25 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"gpt-5.1-codex-mini",
"gpt-5.1-codex-max",
],
"copilot-acp": [
"copilot-acp",
],
"copilot": [
"gpt-5.4",
"gpt-5.4-mini",
"gpt-5-mini",
"gpt-5.3-codex",
"gpt-5.2-codex",
"gpt-4.1",
"gpt-4o",
"gpt-4o-mini",
"claude-opus-4.6",
"claude-sonnet-4.6",
"claude-sonnet-4.5",
"claude-haiku-4.5",
"gemini-2.5-pro",
"grok-code-fast-1",
],
"zai": [
"glm-5",
"glm-4.7",
@@ -60,11 +99,15 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"kimi-k2-0905-preview",
],
"minimax": [
"MiniMax-M2.7",
"MiniMax-M2.7-highspeed",
"MiniMax-M2.5",
"MiniMax-M2.5-highspeed",
"MiniMax-M2.1",
],
"minimax-cn": [
"MiniMax-M2.7",
"MiniMax-M2.7-highspeed",
"MiniMax-M2.5",
"MiniMax-M2.5-highspeed",
"MiniMax-M2.1",
@@ -78,17 +121,101 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"claude-sonnet-4-20250514",
"claude-haiku-4-5-20251001",
],
"deepseek": [
"deepseek-chat",
"deepseek-reasoner",
],
"opencode-zen": [
"gpt-5.4-pro",
"gpt-5.4",
"gpt-5.3-codex",
"gpt-5.3-codex-spark",
"gpt-5.2",
"gpt-5.2-codex",
"gpt-5.1",
"gpt-5.1-codex",
"gpt-5.1-codex-max",
"gpt-5.1-codex-mini",
"gpt-5",
"gpt-5-codex",
"gpt-5-nano",
"claude-opus-4-6",
"claude-opus-4-5",
"claude-opus-4-1",
"claude-sonnet-4-6",
"claude-sonnet-4-5",
"claude-sonnet-4",
"claude-haiku-4-5",
"claude-3-5-haiku",
"gemini-3.1-pro",
"gemini-3-pro",
"gemini-3-flash",
"minimax-m2.5",
"minimax-m2.5-free",
"minimax-m2.1",
"glm-5",
"glm-4.7",
"glm-4.6",
"kimi-k2.5",
"kimi-k2-thinking",
"kimi-k2",
"qwen3-coder",
"big-pickle",
],
"opencode-go": [
"glm-5",
"kimi-k2.5",
"minimax-m2.5",
],
"ai-gateway": [
"anthropic/claude-opus-4.6",
"anthropic/claude-sonnet-4.6",
"anthropic/claude-sonnet-4.5",
"anthropic/claude-haiku-4.5",
"openai/gpt-5",
"openai/gpt-4.1",
"openai/gpt-4.1-mini",
"google/gemini-3-pro-preview",
"google/gemini-3-flash",
"google/gemini-2.5-pro",
"google/gemini-2.5-flash",
"deepseek/deepseek-v3.2",
],
"kilocode": [
"anthropic/claude-opus-4.6",
"anthropic/claude-sonnet-4.6",
"openai/gpt-5.4",
"google/gemini-3-pro-preview",
"google/gemini-3-flash-preview",
],
"alibaba": [
"qwen3.5-plus",
"qwen3-max",
"qwen3-coder-plus",
"qwen3-coder-next",
"qwen-plus-latest",
"qwen3.5-flash",
"qwen-vl-max",
],
}
_PROVIDER_LABELS = {
"openrouter": "OpenRouter",
"openai-codex": "OpenAI Codex",
"copilot-acp": "GitHub Copilot ACP",
"nous": "Nous Portal",
"copilot": "GitHub Copilot",
"zai": "Z.AI / GLM",
"kimi-coding": "Kimi / Moonshot",
"minimax": "MiniMax",
"minimax-cn": "MiniMax (China)",
"anthropic": "Anthropic",
"deepseek": "DeepSeek",
"opencode-zen": "OpenCode Zen",
"opencode-go": "OpenCode Go",
"ai-gateway": "AI Gateway",
"kilocode": "Kilo Code",
"alibaba": "Alibaba Cloud (DashScope)",
"custom": "Custom endpoint",
}
@@ -97,12 +224,33 @@ _PROVIDER_ALIASES = {
"z-ai": "zai",
"z.ai": "zai",
"zhipu": "zai",
"github": "copilot",
"github-copilot": "copilot",
"github-models": "copilot",
"github-model": "copilot",
"github-copilot-acp": "copilot-acp",
"copilot-acp-agent": "copilot-acp",
"kimi": "kimi-coding",
"moonshot": "kimi-coding",
"minimax-china": "minimax-cn",
"minimax_cn": "minimax-cn",
"claude": "anthropic",
"claude-code": "anthropic",
"deep-seek": "deepseek",
"opencode": "opencode-zen",
"zen": "opencode-zen",
"go": "opencode-go",
"opencode-go-sub": "opencode-go",
"aigateway": "ai-gateway",
"vercel": "ai-gateway",
"vercel-ai-gateway": "ai-gateway",
"kilo": "kilocode",
"kilo-code": "kilocode",
"kilo-gateway": "kilocode",
"dashscope": "alibaba",
"aliyun": "alibaba",
"qwen": "alibaba",
"alibaba-cloud": "alibaba",
}
@@ -135,8 +283,10 @@ def list_available_providers() -> list[dict[str, str]]:
"""
# Canonical providers in display order
_PROVIDER_ORDER = [
"openrouter", "nous", "openai-codex",
"zai", "kimi-coding", "minimax", "minimax-cn", "anthropic",
"openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
"zai", "kimi-coding", "minimax", "minimax-cn", "kilocode", "anthropic", "alibaba",
"opencode-zen", "opencode-go",
"ai-gateway", "deepseek", "custom",
]
# Build reverse alias map
aliases_for: dict[str, list[str]] = {}
@@ -150,9 +300,12 @@ def list_available_providers() -> list[dict[str, str]]:
# Check if this provider has credentials available
has_creds = False
try:
from hermes_cli.runtime_provider import resolve_runtime_provider
runtime = resolve_runtime_provider(requested=pid)
has_creds = bool(runtime.get("api_key"))
if pid == "custom":
has_creds = bool(_get_custom_base_url())
else:
from hermes_cli.runtime_provider import resolve_runtime_provider
runtime = resolve_runtime_provider(requested=pid)
has_creds = bool(runtime.get("api_key"))
except Exception:
pass
result.append({
@@ -191,6 +344,19 @@ def parse_model_input(raw: str, current_provider: str) -> tuple[str, str]:
return (current_provider, stripped)
def _get_custom_base_url() -> str:
"""Get the custom endpoint base_url from config.yaml."""
try:
from hermes_cli.config import load_config
config = load_config()
model_cfg = config.get("model", {})
if isinstance(model_cfg, dict):
return str(model_cfg.get("base_url", "")).strip()
except Exception:
pass
return ""
def curated_models_for_provider(provider: Optional[str]) -> list[tuple[str, str]]:
"""Return ``(model_id, description)`` tuples for a provider's model list.
@@ -212,6 +378,127 @@ def curated_models_for_provider(provider: Optional[str]) -> list[tuple[str, str]
return [(m, "") for m in models]
def detect_provider_for_model(
model_name: str,
current_provider: str,
) -> Optional[tuple[str, str]]:
"""Auto-detect the best provider for a model name.
Returns ``(provider_id, model_name)`` the model name may be remapped
(e.g. bare ``deepseek-chat`` ``deepseek/deepseek-chat`` for OpenRouter).
Returns ``None`` when no confident match is found.
Priority:
0. Bare provider name switch to that provider's default model
1. Direct provider with credentials (highest)
2. Direct provider without credentials remap to OpenRouter slug
3. OpenRouter catalog match
"""
name = (model_name or "").strip()
if not name:
return None
name_lower = name.lower()
# --- Step 0: bare provider name typed as model ---
# If someone types `/model nous` or `/model anthropic`, treat it as a
# provider switch and pick the first model from that provider's catalog.
# Skip "custom" and "openrouter" — custom has no model catalog, and
# openrouter requires an explicit model name to be useful.
resolved_provider = _PROVIDER_ALIASES.get(name_lower, name_lower)
if resolved_provider not in {"custom", "openrouter"}:
default_models = _PROVIDER_MODELS.get(resolved_provider, [])
if (
resolved_provider in _PROVIDER_LABELS
and default_models
and resolved_provider != normalize_provider(current_provider)
):
return (resolved_provider, default_models[0])
# Aggregators list other providers' models — never auto-switch TO them
_AGGREGATORS = {"nous", "openrouter"}
# If the model belongs to the current provider's catalog, don't suggest switching
current_models = _PROVIDER_MODELS.get(current_provider, [])
if any(name_lower == m.lower() for m in current_models):
return None
# --- Step 1: check static provider catalogs for a direct match ---
direct_match: Optional[str] = None
for pid, models in _PROVIDER_MODELS.items():
if pid == current_provider or pid in _AGGREGATORS:
continue
if any(name_lower == m.lower() for m in models):
direct_match = pid
break
if direct_match:
# Check if we have credentials for this provider
has_creds = False
try:
from hermes_cli.auth import PROVIDER_REGISTRY
pconfig = PROVIDER_REGISTRY.get(direct_match)
if pconfig:
import os
for env_var in pconfig.api_key_env_vars:
if os.getenv(env_var, "").strip():
has_creds = True
break
except Exception:
pass
if has_creds:
return (direct_match, name)
# No direct creds — try to find this model on OpenRouter instead
or_slug = _find_openrouter_slug(name)
if or_slug:
return ("openrouter", or_slug)
# Still return the direct provider — credential resolution will
# give a clear error rather than silently using the wrong provider
return (direct_match, name)
# --- Step 2: check OpenRouter catalog ---
# First try exact match (handles provider/model format)
or_slug = _find_openrouter_slug(name)
if or_slug:
if current_provider != "openrouter":
return ("openrouter", or_slug)
# Already on openrouter, just return the resolved slug
if or_slug != name:
return ("openrouter", or_slug)
return None # already on openrouter with matching name
return None
def _find_openrouter_slug(model_name: str) -> Optional[str]:
"""Find the full OpenRouter model slug for a bare or partial model name.
Handles:
- Exact match: ``anthropic/claude-opus-4.6`` as-is
- Bare name: ``deepseek-chat`` ``deepseek/deepseek-chat``
- Bare name: ``claude-opus-4.6`` ``anthropic/claude-opus-4.6``
"""
name_lower = model_name.strip().lower()
if not name_lower:
return None
# Exact match (already has provider/ prefix)
for mid, _ in OPENROUTER_MODELS:
if name_lower == mid.lower():
return mid
# Try matching just the model part (after the /)
for mid, _ in OPENROUTER_MODELS:
if "/" in mid:
_, model_part = mid.split("/", 1)
if name_lower == model_part.lower():
return mid
return None
def normalize_provider(provider: Optional[str]) -> str:
"""Normalize provider aliases to Hermes' canonical provider ids.
@@ -233,6 +520,17 @@ def provider_label(provider: Optional[str]) -> str:
return _PROVIDER_LABELS.get(normalized, original or "OpenRouter")
def _resolve_copilot_catalog_api_key() -> str:
"""Best-effort GitHub token for fetching the Copilot model catalog."""
try:
from hermes_cli.auth import resolve_api_key_provider_credentials
creds = resolve_api_key_provider_credentials("copilot")
return str(creds.get("api_key") or "").strip()
except Exception:
return ""
def provider_model_ids(provider: Optional[str]) -> list[str]:
"""Return the best known model catalog for a provider.
@@ -246,13 +544,22 @@ def provider_model_ids(provider: Optional[str]) -> list[str]:
from hermes_cli.codex_models import get_codex_model_ids
return get_codex_model_ids()
if normalized in {"copilot", "copilot-acp"}:
try:
live = _fetch_github_models(_resolve_copilot_catalog_api_key())
if live:
return live
except Exception:
pass
if normalized == "copilot-acp":
return list(_PROVIDER_MODELS.get("copilot", []))
if normalized == "nous":
# Try live Nous Portal /models endpoint
try:
from hermes_cli.auth import fetch_nous_models, resolve_nous_runtime_credentials
creds = resolve_nous_runtime_credentials()
if creds:
live = fetch_nous_models(creds.get("api_key", ""), creds.get("base_url", ""))
live = fetch_nous_models(api_key=creds.get("api_key", ""), inference_base_url=creds.get("base_url", ""))
if live:
return live
except Exception:
@@ -261,6 +568,22 @@ def provider_model_ids(provider: Optional[str]) -> list[str]:
live = _fetch_anthropic_models()
if live:
return live
if normalized == "ai-gateway":
live = _fetch_ai_gateway_models()
if live:
return live
if normalized == "custom":
base_url = _get_custom_base_url()
if base_url:
# Try common API key env vars for custom endpoints
api_key = (
os.getenv("CUSTOM_API_KEY", "")
or os.getenv("OPENAI_API_KEY", "")
or os.getenv("OPENROUTER_API_KEY", "")
)
live = fetch_api_models(api_key, base_url)
if live:
return live
return list(_PROVIDER_MODELS.get(normalized, []))
@@ -308,6 +631,401 @@ def _fetch_anthropic_models(timeout: float = 5.0) -> Optional[list[str]]:
return None
def _payload_items(payload: Any) -> list[dict[str, Any]]:
if isinstance(payload, list):
return [item for item in payload if isinstance(item, dict)]
if isinstance(payload, dict):
data = payload.get("data", [])
if isinstance(data, list):
return [item for item in data if isinstance(item, dict)]
return []
def _extract_model_ids(payload: Any) -> list[str]:
return [item.get("id", "") for item in _payload_items(payload) if item.get("id")]
def copilot_default_headers() -> dict[str, str]:
"""Standard headers for Copilot API requests.
Includes Openai-Intent and x-initiator headers that opencode and the
Copilot CLI send on every request.
"""
try:
from hermes_cli.copilot_auth import copilot_request_headers
return copilot_request_headers(is_agent_turn=True)
except ImportError:
return {
"Editor-Version": COPILOT_EDITOR_VERSION,
"User-Agent": "HermesAgent/1.0",
"Openai-Intent": "conversation-edits",
"x-initiator": "agent",
}
def _copilot_catalog_item_is_text_model(item: dict[str, Any]) -> bool:
model_id = str(item.get("id") or "").strip()
if not model_id:
return False
if item.get("model_picker_enabled") is False:
return False
capabilities = item.get("capabilities")
if isinstance(capabilities, dict):
model_type = str(capabilities.get("type") or "").strip().lower()
if model_type and model_type != "chat":
return False
supported_endpoints = item.get("supported_endpoints")
if isinstance(supported_endpoints, list):
normalized_endpoints = {
str(endpoint).strip()
for endpoint in supported_endpoints
if str(endpoint).strip()
}
if normalized_endpoints and not normalized_endpoints.intersection(
{"/chat/completions", "/responses", "/v1/messages"}
):
return False
return True
def fetch_github_model_catalog(
api_key: Optional[str] = None, timeout: float = 5.0
) -> Optional[list[dict[str, Any]]]:
"""Fetch the live GitHub Copilot model catalog for this account."""
attempts: list[dict[str, str]] = []
if api_key:
attempts.append({
**copilot_default_headers(),
"Authorization": f"Bearer {api_key}",
})
attempts.append(copilot_default_headers())
for headers in attempts:
req = urllib.request.Request(COPILOT_MODELS_URL, headers=headers)
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
data = json.loads(resp.read().decode())
items = _payload_items(data)
models: list[dict[str, Any]] = []
seen_ids: set[str] = set()
for item in items:
if not _copilot_catalog_item_is_text_model(item):
continue
model_id = str(item.get("id") or "").strip()
if not model_id or model_id in seen_ids:
continue
seen_ids.add(model_id)
models.append(item)
if models:
return models
except Exception:
continue
return None
def _is_github_models_base_url(base_url: Optional[str]) -> bool:
normalized = (base_url or "").strip().rstrip("/").lower()
return (
normalized.startswith(COPILOT_BASE_URL)
or normalized.startswith("https://models.github.ai/inference")
)
def _fetch_github_models(api_key: Optional[str] = None, timeout: float = 5.0) -> Optional[list[str]]:
catalog = fetch_github_model_catalog(api_key=api_key, timeout=timeout)
if not catalog:
return None
return [item.get("id", "") for item in catalog if item.get("id")]
_COPILOT_MODEL_ALIASES = {
"openai/gpt-5": "gpt-5-mini",
"openai/gpt-5-chat": "gpt-5-mini",
"openai/gpt-5-mini": "gpt-5-mini",
"openai/gpt-5-nano": "gpt-5-mini",
"openai/gpt-4.1": "gpt-4.1",
"openai/gpt-4.1-mini": "gpt-4.1",
"openai/gpt-4.1-nano": "gpt-4.1",
"openai/gpt-4o": "gpt-4o",
"openai/gpt-4o-mini": "gpt-4o-mini",
"openai/o1": "gpt-5.2",
"openai/o1-mini": "gpt-5-mini",
"openai/o1-preview": "gpt-5.2",
"openai/o3": "gpt-5.3-codex",
"openai/o3-mini": "gpt-5-mini",
"openai/o4-mini": "gpt-5-mini",
"anthropic/claude-opus-4.6": "claude-opus-4.6",
"anthropic/claude-sonnet-4.6": "claude-sonnet-4.6",
"anthropic/claude-sonnet-4.5": "claude-sonnet-4.5",
"anthropic/claude-haiku-4.5": "claude-haiku-4.5",
}
def _copilot_catalog_ids(
catalog: Optional[list[dict[str, Any]]] = None,
api_key: Optional[str] = None,
) -> set[str]:
if catalog is None and api_key:
catalog = fetch_github_model_catalog(api_key=api_key)
if not catalog:
return set()
return {
str(item.get("id") or "").strip()
for item in catalog
if str(item.get("id") or "").strip()
}
def normalize_copilot_model_id(
model_id: Optional[str],
*,
catalog: Optional[list[dict[str, Any]]] = None,
api_key: Optional[str] = None,
) -> str:
raw = str(model_id or "").strip()
if not raw:
return ""
catalog_ids = _copilot_catalog_ids(catalog=catalog, api_key=api_key)
alias = _COPILOT_MODEL_ALIASES.get(raw)
if alias:
return alias
candidates = [raw]
if "/" in raw:
candidates.append(raw.split("/", 1)[1].strip())
if raw.endswith("-mini"):
candidates.append(raw[:-5])
if raw.endswith("-nano"):
candidates.append(raw[:-5])
if raw.endswith("-chat"):
candidates.append(raw[:-5])
seen: set[str] = set()
for candidate in candidates:
if not candidate or candidate in seen:
continue
seen.add(candidate)
if candidate in _COPILOT_MODEL_ALIASES:
return _COPILOT_MODEL_ALIASES[candidate]
if candidate in catalog_ids:
return candidate
if "/" in raw:
return raw.split("/", 1)[1].strip()
return raw
def _github_reasoning_efforts_for_model_id(model_id: str) -> list[str]:
raw = (model_id or "").strip().lower()
if raw.startswith(("openai/o1", "openai/o3", "openai/o4", "o1", "o3", "o4")):
return list(COPILOT_REASONING_EFFORTS_O_SERIES)
normalized = normalize_copilot_model_id(model_id).lower()
if normalized.startswith("gpt-5"):
return list(COPILOT_REASONING_EFFORTS_GPT5)
return []
def _should_use_copilot_responses_api(model_id: str) -> bool:
"""Decide whether a Copilot model should use the Responses API.
Replicates opencode's ``shouldUseCopilotResponsesApi`` logic:
GPT-5+ models use Responses API, except ``gpt-5-mini`` which uses
Chat Completions. All non-GPT models (Claude, Gemini, etc.) use
Chat Completions.
"""
import re
match = re.match(r"^gpt-(\d+)", model_id)
if not match:
return False
major = int(match.group(1))
return major >= 5 and not model_id.startswith("gpt-5-mini")
def copilot_model_api_mode(
model_id: Optional[str],
*,
catalog: Optional[list[dict[str, Any]]] = None,
api_key: Optional[str] = None,
) -> str:
"""Determine the API mode for a Copilot model.
Uses the model ID pattern (matching opencode's approach) as the
primary signal. Falls back to the catalog's ``supported_endpoints``
only for models not covered by the pattern check.
"""
normalized = normalize_copilot_model_id(model_id, catalog=catalog, api_key=api_key)
if not normalized:
return "chat_completions"
# Primary: model ID pattern (matches opencode's shouldUseCopilotResponsesApi)
if _should_use_copilot_responses_api(normalized):
return "codex_responses"
# Secondary: check catalog for non-GPT-5 models (Claude via /v1/messages, etc.)
if catalog is None and api_key:
catalog = fetch_github_model_catalog(api_key=api_key)
if catalog:
catalog_entry = next((item for item in catalog if item.get("id") == normalized), None)
if isinstance(catalog_entry, dict):
supported_endpoints = {
str(endpoint).strip()
for endpoint in (catalog_entry.get("supported_endpoints") or [])
if str(endpoint).strip()
}
# For non-GPT-5 models, check if they only support messages API
if "/v1/messages" in supported_endpoints and "/chat/completions" not in supported_endpoints:
return "anthropic_messages"
return "chat_completions"
def github_model_reasoning_efforts(
model_id: Optional[str],
*,
catalog: Optional[list[dict[str, Any]]] = None,
api_key: Optional[str] = None,
) -> list[str]:
"""Return supported reasoning-effort levels for a Copilot-visible model."""
normalized = normalize_copilot_model_id(model_id, catalog=catalog, api_key=api_key)
if not normalized:
return []
catalog_entry = None
if catalog is not None:
catalog_entry = next((item for item in catalog if item.get("id") == normalized), None)
elif api_key:
fetched_catalog = fetch_github_model_catalog(api_key=api_key)
if fetched_catalog:
catalog_entry = next((item for item in fetched_catalog if item.get("id") == normalized), None)
if catalog_entry is not None:
capabilities = catalog_entry.get("capabilities")
if isinstance(capabilities, dict):
supports = capabilities.get("supports")
if isinstance(supports, dict):
efforts = supports.get("reasoning_effort")
if isinstance(efforts, list):
normalized_efforts = [
str(effort).strip().lower()
for effort in efforts
if str(effort).strip()
]
return list(dict.fromkeys(normalized_efforts))
return []
legacy_capabilities = {
str(capability).strip().lower()
for capability in catalog_entry.get("capabilities", [])
if str(capability).strip()
}
if "reasoning" not in legacy_capabilities:
return []
return _github_reasoning_efforts_for_model_id(str(model_id or normalized))
def probe_api_models(
api_key: Optional[str],
base_url: Optional[str],
timeout: float = 5.0,
) -> dict[str, Any]:
"""Probe an OpenAI-compatible ``/models`` endpoint with light URL heuristics."""
normalized = (base_url or "").strip().rstrip("/")
if not normalized:
return {
"models": None,
"probed_url": None,
"resolved_base_url": "",
"suggested_base_url": None,
"used_fallback": False,
}
if _is_github_models_base_url(normalized):
models = _fetch_github_models(api_key=api_key, timeout=timeout)
return {
"models": models,
"probed_url": COPILOT_MODELS_URL,
"resolved_base_url": COPILOT_BASE_URL,
"suggested_base_url": None,
"used_fallback": False,
}
if normalized.endswith("/v1"):
alternate_base = normalized[:-3].rstrip("/")
else:
alternate_base = normalized + "/v1"
candidates: list[tuple[str, bool]] = [(normalized, False)]
if alternate_base and alternate_base != normalized:
candidates.append((alternate_base, True))
tried: list[str] = []
headers: dict[str, str] = {}
if api_key:
headers["Authorization"] = f"Bearer {api_key}"
if normalized.startswith(COPILOT_BASE_URL):
headers.update(copilot_default_headers())
for candidate_base, is_fallback in candidates:
url = candidate_base.rstrip("/") + "/models"
tried.append(url)
req = urllib.request.Request(url, headers=headers)
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
data = json.loads(resp.read().decode())
return {
"models": [m.get("id", "") for m in data.get("data", [])],
"probed_url": url,
"resolved_base_url": candidate_base.rstrip("/"),
"suggested_base_url": alternate_base if alternate_base != candidate_base else normalized,
"used_fallback": is_fallback,
}
except Exception:
continue
return {
"models": None,
"probed_url": tried[-1] if tried else normalized.rstrip("/") + "/models",
"resolved_base_url": normalized,
"suggested_base_url": alternate_base if alternate_base != normalized else None,
"used_fallback": False,
}
def _fetch_ai_gateway_models(timeout: float = 5.0) -> Optional[list[str]]:
"""Fetch available language models with tool-use from AI Gateway."""
api_key = os.getenv("AI_GATEWAY_API_KEY", "").strip()
if not api_key:
return None
base_url = os.getenv("AI_GATEWAY_BASE_URL", "").strip()
if not base_url:
from hermes_constants import AI_GATEWAY_BASE_URL
base_url = AI_GATEWAY_BASE_URL
url = base_url.rstrip("/") + "/models"
headers: dict[str, str] = {"Authorization": f"Bearer {api_key}"}
req = urllib.request.Request(url, headers=headers)
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
data = json.loads(resp.read().decode())
return [
m["id"]
for m in data.get("data", [])
if m.get("id")
and m.get("type") == "language"
and "tool-use" in (m.get("tags") or [])
]
except Exception:
return None
def fetch_api_models(
api_key: Optional[str],
base_url: Optional[str],
@@ -318,22 +1036,7 @@ def fetch_api_models(
Returns a list of model ID strings, or ``None`` if the endpoint could not
be reached (network error, timeout, auth failure, etc.).
"""
if not base_url:
return None
url = base_url.rstrip("/") + "/models"
headers: dict[str, str] = {}
if api_key:
headers["Authorization"] = f"Bearer {api_key}"
req = urllib.request.Request(url, headers=headers)
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
data = json.loads(resp.read().decode())
# Standard OpenAI format: {"data": [{"id": "model-name", ...}, ...]}
return [m.get("id", "") for m in data.get("data", [])]
except Exception:
return None
return probe_api_models(api_key, base_url, timeout=timeout).get("models")
def validate_requested_model(
@@ -359,6 +1062,12 @@ def validate_requested_model(
normalized = normalize_provider(provider)
if normalized == "openrouter" and base_url and "openrouter.ai" not in base_url:
normalized = "custom"
requested_for_lookup = requested
if normalized == "copilot":
requested_for_lookup = normalize_copilot_model_id(
requested,
api_key=api_key,
) or requested
if not requested:
return {
@@ -376,20 +1085,60 @@ def validate_requested_model(
"message": "Model names cannot contain spaces.",
}
# Custom endpoints can serve any model — skip validation
if normalized == "custom":
probe = probe_api_models(api_key, base_url)
api_models = probe.get("models")
if api_models is not None:
if requested_for_lookup in set(api_models):
return {
"accepted": True,
"persist": True,
"recognized": True,
"message": None,
}
suggestions = get_close_matches(requested, api_models, n=3, cutoff=0.5)
suggestion_text = ""
if suggestions:
suggestion_text = "\n Similar models: " + ", ".join(f"`{s}`" for s in suggestions)
message = (
f"Note: `{requested}` was not found in this custom endpoint's model listing "
f"({probe.get('probed_url')}). It may still work if the server supports hidden or aliased models."
f"{suggestion_text}"
)
if probe.get("used_fallback"):
message += (
f"\n Endpoint verification succeeded after trying `{probe.get('resolved_base_url')}`. "
f"Consider saving that as your base URL."
)
return {
"accepted": True,
"persist": True,
"recognized": False,
"message": message,
}
message = (
f"Note: could not reach this custom endpoint's model listing at `{probe.get('probed_url')}`. "
f"Hermes will still save `{requested}`, but the endpoint should expose `/models` for verification."
)
if probe.get("suggested_base_url"):
message += f"\n If this server expects `/v1`, try base URL: `{probe.get('suggested_base_url')}`"
return {
"accepted": True,
"persist": True,
"recognized": False,
"message": None,
"message": message,
}
# Probe the live API to check if the model actually exists
api_models = fetch_api_models(api_key, base_url)
if api_models is not None:
if requested in set(api_models):
if requested_for_lookup in set(api_models):
# API confirmed the model exists
return {
"accepted": True,
+449
View File
@@ -0,0 +1,449 @@
"""
Hermes Plugin System
====================
Discovers, loads, and manages plugins from three sources:
1. **User plugins** ``~/.hermes/plugins/<name>/``
2. **Project plugins** ``./.hermes/plugins/<name>/``
3. **Pip plugins** packages that expose the ``hermes_agent.plugins``
entry-point group.
Each directory plugin must contain a ``plugin.yaml`` manifest **and** an
``__init__.py`` with a ``register(ctx)`` function.
Lifecycle hooks
---------------
Plugins may register callbacks for any of the hooks in ``VALID_HOOKS``.
The agent core calls ``invoke_hook(name, **kwargs)`` at the appropriate
points.
Tool registration
-----------------
``PluginContext.register_tool()`` delegates to ``tools.registry.register()``
so plugin-defined tools appear alongside the built-in tools.
"""
from __future__ import annotations
import importlib
import importlib.metadata
import importlib.util
import logging
import os
import sys
import types
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional, Set
try:
import yaml
except ImportError: # pragma: no cover yaml is optional at import time
yaml = None # type: ignore[assignment]
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
VALID_HOOKS: Set[str] = {
"pre_tool_call",
"post_tool_call",
"pre_llm_call",
"post_llm_call",
"on_session_start",
"on_session_end",
}
ENTRY_POINTS_GROUP = "hermes_agent.plugins"
_NS_PARENT = "hermes_plugins"
# ---------------------------------------------------------------------------
# Data classes
# ---------------------------------------------------------------------------
@dataclass
class PluginManifest:
"""Parsed representation of a plugin.yaml manifest."""
name: str
version: str = ""
description: str = ""
author: str = ""
requires_env: List[str] = field(default_factory=list)
provides_tools: List[str] = field(default_factory=list)
provides_hooks: List[str] = field(default_factory=list)
source: str = "" # "user", "project", or "entrypoint"
path: Optional[str] = None
@dataclass
class LoadedPlugin:
"""Runtime state for a single loaded plugin."""
manifest: PluginManifest
module: Optional[types.ModuleType] = None
tools_registered: List[str] = field(default_factory=list)
hooks_registered: List[str] = field(default_factory=list)
enabled: bool = False
error: Optional[str] = None
# ---------------------------------------------------------------------------
# PluginContext handed to each plugin's ``register()`` function
# ---------------------------------------------------------------------------
class PluginContext:
"""Facade given to plugins so they can register tools and hooks."""
def __init__(self, manifest: PluginManifest, manager: "PluginManager"):
self.manifest = manifest
self._manager = manager
# -- tool registration --------------------------------------------------
def register_tool(
self,
name: str,
toolset: str,
schema: dict,
handler: Callable,
check_fn: Callable | None = None,
requires_env: list | None = None,
is_async: bool = False,
description: str = "",
emoji: str = "",
) -> None:
"""Register a tool in the global registry **and** track it as plugin-provided."""
from tools.registry import registry
registry.register(
name=name,
toolset=toolset,
schema=schema,
handler=handler,
check_fn=check_fn,
requires_env=requires_env,
is_async=is_async,
description=description,
emoji=emoji,
)
self._manager._plugin_tool_names.add(name)
logger.debug("Plugin %s registered tool: %s", self.manifest.name, name)
# -- hook registration --------------------------------------------------
def register_hook(self, hook_name: str, callback: Callable) -> None:
"""Register a lifecycle hook callback.
Unknown hook names produce a warning but are still stored so
forward-compatible plugins don't break.
"""
if hook_name not in VALID_HOOKS:
logger.warning(
"Plugin '%s' registered unknown hook '%s' "
"(valid: %s)",
self.manifest.name,
hook_name,
", ".join(sorted(VALID_HOOKS)),
)
self._manager._hooks.setdefault(hook_name, []).append(callback)
logger.debug("Plugin %s registered hook: %s", self.manifest.name, hook_name)
# ---------------------------------------------------------------------------
# PluginManager
# ---------------------------------------------------------------------------
class PluginManager:
"""Central manager that discovers, loads, and invokes plugins."""
def __init__(self) -> None:
self._plugins: Dict[str, LoadedPlugin] = {}
self._hooks: Dict[str, List[Callable]] = {}
self._plugin_tool_names: Set[str] = set()
self._discovered: bool = False
# -----------------------------------------------------------------------
# Public
# -----------------------------------------------------------------------
def discover_and_load(self) -> None:
"""Scan all plugin sources and load each plugin found."""
if self._discovered:
return
self._discovered = True
manifests: List[PluginManifest] = []
# 1. User plugins (~/.hermes/plugins/)
hermes_home = os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes"))
user_dir = Path(hermes_home) / "plugins"
manifests.extend(self._scan_directory(user_dir, source="user"))
# 2. Project plugins (./.hermes/plugins/)
project_dir = Path.cwd() / ".hermes" / "plugins"
manifests.extend(self._scan_directory(project_dir, source="project"))
# 3. Pip / entry-point plugins
manifests.extend(self._scan_entry_points())
# Load each manifest
for manifest in manifests:
self._load_plugin(manifest)
if manifests:
logger.info(
"Plugin discovery complete: %d found, %d enabled",
len(self._plugins),
sum(1 for p in self._plugins.values() if p.enabled),
)
# -----------------------------------------------------------------------
# Directory scanning
# -----------------------------------------------------------------------
def _scan_directory(self, path: Path, source: str) -> List[PluginManifest]:
"""Read ``plugin.yaml`` manifests from subdirectories of *path*."""
manifests: List[PluginManifest] = []
if not path.is_dir():
return manifests
for child in sorted(path.iterdir()):
if not child.is_dir():
continue
manifest_file = child / "plugin.yaml"
if not manifest_file.exists():
manifest_file = child / "plugin.yml"
if not manifest_file.exists():
logger.debug("Skipping %s (no plugin.yaml)", child)
continue
try:
if yaml is None:
logger.warning("PyYAML not installed cannot load %s", manifest_file)
continue
data = yaml.safe_load(manifest_file.read_text()) or {}
manifest = PluginManifest(
name=data.get("name", child.name),
version=str(data.get("version", "")),
description=data.get("description", ""),
author=data.get("author", ""),
requires_env=data.get("requires_env", []),
provides_tools=data.get("provides_tools", []),
provides_hooks=data.get("provides_hooks", []),
source=source,
path=str(child),
)
manifests.append(manifest)
except Exception as exc:
logger.warning("Failed to parse %s: %s", manifest_file, exc)
return manifests
# -----------------------------------------------------------------------
# Entry-point scanning
# -----------------------------------------------------------------------
def _scan_entry_points(self) -> List[PluginManifest]:
"""Check ``importlib.metadata`` for pip-installed plugins."""
manifests: List[PluginManifest] = []
try:
eps = importlib.metadata.entry_points()
# Python 3.12+ returns a SelectableGroups; earlier returns dict
if hasattr(eps, "select"):
group_eps = eps.select(group=ENTRY_POINTS_GROUP)
elif isinstance(eps, dict):
group_eps = eps.get(ENTRY_POINTS_GROUP, [])
else:
group_eps = [ep for ep in eps if ep.group == ENTRY_POINTS_GROUP]
for ep in group_eps:
manifest = PluginManifest(
name=ep.name,
source="entrypoint",
path=ep.value,
)
manifests.append(manifest)
except Exception as exc:
logger.debug("Entry-point scan failed: %s", exc)
return manifests
# -----------------------------------------------------------------------
# Loading
# -----------------------------------------------------------------------
def _load_plugin(self, manifest: PluginManifest) -> None:
"""Import a plugin module and call its ``register(ctx)`` function."""
loaded = LoadedPlugin(manifest=manifest)
try:
if manifest.source in ("user", "project"):
module = self._load_directory_module(manifest)
else:
module = self._load_entrypoint_module(manifest)
loaded.module = module
# Call register()
register_fn = getattr(module, "register", None)
if register_fn is None:
loaded.error = "no register() function"
logger.warning("Plugin '%s' has no register() function", manifest.name)
else:
ctx = PluginContext(manifest, self)
register_fn(ctx)
loaded.tools_registered = [
t for t in self._plugin_tool_names
if t not in {
n
for name, p in self._plugins.items()
for n in p.tools_registered
}
]
loaded.hooks_registered = list(
{
h
for h, cbs in self._hooks.items()
if cbs # non-empty
}
- {
h
for name, p in self._plugins.items()
for h in p.hooks_registered
}
)
loaded.enabled = True
except Exception as exc:
loaded.error = str(exc)
logger.warning("Failed to load plugin '%s': %s", manifest.name, exc)
self._plugins[manifest.name] = loaded
def _load_directory_module(self, manifest: PluginManifest) -> types.ModuleType:
"""Import a directory-based plugin as ``hermes_plugins.<name>``."""
plugin_dir = Path(manifest.path) # type: ignore[arg-type]
init_file = plugin_dir / "__init__.py"
if not init_file.exists():
raise FileNotFoundError(f"No __init__.py in {plugin_dir}")
# Ensure the namespace parent package exists
if _NS_PARENT not in sys.modules:
ns_pkg = types.ModuleType(_NS_PARENT)
ns_pkg.__path__ = [] # type: ignore[attr-defined]
ns_pkg.__package__ = _NS_PARENT
sys.modules[_NS_PARENT] = ns_pkg
module_name = f"{_NS_PARENT}.{manifest.name.replace('-', '_')}"
spec = importlib.util.spec_from_file_location(
module_name,
init_file,
submodule_search_locations=[str(plugin_dir)],
)
if spec is None or spec.loader is None:
raise ImportError(f"Cannot create module spec for {init_file}")
module = importlib.util.module_from_spec(spec)
module.__package__ = module_name
module.__path__ = [str(plugin_dir)] # type: ignore[attr-defined]
sys.modules[module_name] = module
spec.loader.exec_module(module)
return module
def _load_entrypoint_module(self, manifest: PluginManifest) -> types.ModuleType:
"""Load a pip-installed plugin via its entry-point reference."""
eps = importlib.metadata.entry_points()
if hasattr(eps, "select"):
group_eps = eps.select(group=ENTRY_POINTS_GROUP)
elif isinstance(eps, dict):
group_eps = eps.get(ENTRY_POINTS_GROUP, [])
else:
group_eps = [ep for ep in eps if ep.group == ENTRY_POINTS_GROUP]
for ep in group_eps:
if ep.name == manifest.name:
return ep.load()
raise ImportError(
f"Entry point '{manifest.name}' not found in group '{ENTRY_POINTS_GROUP}'"
)
# -----------------------------------------------------------------------
# Hook invocation
# -----------------------------------------------------------------------
def invoke_hook(self, hook_name: str, **kwargs: Any) -> None:
"""Call all registered callbacks for *hook_name*.
Each callback is wrapped in its own try/except so a misbehaving
plugin cannot break the core agent loop.
"""
callbacks = self._hooks.get(hook_name, [])
for cb in callbacks:
try:
cb(**kwargs)
except Exception as exc:
logger.warning(
"Hook '%s' callback %s raised: %s",
hook_name,
getattr(cb, "__name__", repr(cb)),
exc,
)
# -----------------------------------------------------------------------
# Introspection
# -----------------------------------------------------------------------
def list_plugins(self) -> List[Dict[str, Any]]:
"""Return a list of info dicts for all discovered plugins."""
result: List[Dict[str, Any]] = []
for name, loaded in sorted(self._plugins.items()):
result.append(
{
"name": name,
"version": loaded.manifest.version,
"description": loaded.manifest.description,
"source": loaded.manifest.source,
"enabled": loaded.enabled,
"tools": len(loaded.tools_registered),
"hooks": len(loaded.hooks_registered),
"error": loaded.error,
}
)
return result
# ---------------------------------------------------------------------------
# Module-level singleton & convenience functions
# ---------------------------------------------------------------------------
_plugin_manager: Optional[PluginManager] = None
def get_plugin_manager() -> PluginManager:
"""Return (and lazily create) the global PluginManager singleton."""
global _plugin_manager
if _plugin_manager is None:
_plugin_manager = PluginManager()
return _plugin_manager
def discover_plugins() -> None:
"""Discover and load all plugins (idempotent)."""
get_plugin_manager().discover_and_load()
def invoke_hook(hook_name: str, **kwargs: Any) -> None:
"""Invoke a lifecycle hook on all loaded plugins."""
get_plugin_manager().invoke_hook(hook_name, **kwargs)
def get_plugin_tool_names() -> Set[str]:
"""Return the set of tool names registered by plugins."""
return get_plugin_manager()._plugin_tool_names
+152 -16
View File
@@ -14,6 +14,7 @@ from hermes_cli.auth import (
resolve_nous_runtime_credentials,
resolve_codex_runtime_credentials,
resolve_api_key_provider_credentials,
resolve_external_process_provider_credentials,
)
from hermes_cli.config import load_config
from hermes_constants import OPENROUTER_BASE_URL
@@ -23,16 +24,87 @@ def _normalize_custom_provider_name(value: str) -> str:
return value.strip().lower().replace(" ", "-")
def _detect_api_mode_for_url(base_url: str) -> Optional[str]:
"""Auto-detect api_mode from the resolved base URL.
Direct api.openai.com endpoints need the Responses API for GPT-5.x
tool calls with reasoning (chat/completions returns 400).
"""
normalized = (base_url or "").strip().lower().rstrip("/")
if "api.openai.com" in normalized and "openrouter" not in normalized:
return "codex_responses"
return None
def _auto_detect_local_model(base_url: str) -> str:
"""Query a local server for its model name when only one model is loaded."""
if not base_url:
return ""
try:
import requests
url = base_url.rstrip("/")
if not url.endswith("/v1"):
url += "/v1"
resp = requests.get(url + "/models", timeout=5)
if resp.ok:
models = resp.json().get("data", [])
if len(models) == 1:
model_id = models[0].get("id", "")
if model_id:
return model_id
except Exception:
pass
return ""
def _get_model_config() -> Dict[str, Any]:
config = load_config()
model_cfg = config.get("model")
if isinstance(model_cfg, dict):
return dict(model_cfg)
cfg = dict(model_cfg)
default = cfg.get("default", "").strip()
base_url = cfg.get("base_url", "").strip()
is_local = "localhost" in base_url or "127.0.0.1" in base_url
is_fallback = not default or default == "anthropic/claude-opus-4.6"
if is_local and is_fallback and base_url:
detected = _auto_detect_local_model(base_url)
if detected:
cfg["default"] = detected
return cfg
if isinstance(model_cfg, str) and model_cfg.strip():
return {"default": model_cfg.strip()}
return {}
def _copilot_runtime_api_mode(model_cfg: Dict[str, Any], api_key: str) -> str:
configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
if configured_mode:
return configured_mode
model_name = str(model_cfg.get("default") or "").strip()
if not model_name:
return "chat_completions"
try:
from hermes_cli.models import copilot_model_api_mode
return copilot_model_api_mode(model_name, api_key=api_key)
except Exception:
return "chat_completions"
_VALID_API_MODES = {"chat_completions", "codex_responses", "anthropic_messages"}
def _parse_api_mode(raw: Any) -> Optional[str]:
"""Validate an api_mode value from config. Returns None if invalid."""
if isinstance(raw, str):
normalized = raw.strip().lower()
if normalized in _VALID_API_MODES:
return normalized
return None
def resolve_requested_provider(requested: Optional[str] = None) -> str:
"""Resolve provider request from explicit arg, config, then env."""
if requested and requested.strip():
@@ -86,11 +158,15 @@ def _get_named_custom_provider(requested_provider: str) -> Optional[Dict[str, An
menu_key = f"custom:{name_norm}"
if requested_norm not in {name_norm, menu_key}:
continue
return {
result = {
"name": name.strip(),
"base_url": base_url.strip(),
"api_key": str(entry.get("api_key", "") or "").strip(),
}
api_mode = _parse_api_mode(entry.get("api_mode"))
if api_mode:
result["api_mode"] = api_mode
return result
return None
@@ -121,7 +197,9 @@ def _resolve_named_custom_runtime(
return {
"provider": "openrouter",
"api_mode": "chat_completions",
"api_mode": custom_provider.get("api_mode")
or _detect_api_mode_for_url(base_url)
or "chat_completions",
"base_url": base_url,
"api_key": api_key,
"source": f"custom_provider:{custom_provider.get('name', requested_provider)}",
@@ -137,6 +215,12 @@ def _resolve_openrouter_runtime(
model_cfg = _get_model_config()
cfg_base_url = model_cfg.get("base_url") if isinstance(model_cfg.get("base_url"), str) else ""
cfg_provider = model_cfg.get("provider") if isinstance(model_cfg.get("provider"), str) else ""
cfg_api_key = ""
for k in ("api_key", "api"):
v = model_cfg.get(k)
if isinstance(v, str) and v.strip():
cfg_api_key = v.strip()
break
requested_norm = (requested_provider or "").strip().lower()
cfg_provider = cfg_provider.strip().lower()
@@ -144,26 +228,24 @@ def _resolve_openrouter_runtime(
env_openrouter_base_url = os.getenv("OPENROUTER_BASE_URL", "").strip()
use_config_base_url = False
if cfg_base_url.strip() and not explicit_base_url and not env_openai_base_url:
if cfg_base_url.strip() and not explicit_base_url:
if requested_norm == "auto":
if not cfg_provider or cfg_provider == "auto":
use_config_base_url = True
elif requested_norm == "custom":
# Persisted custom endpoints store their base URL in config.yaml.
# If OPENAI_BASE_URL is not currently set in the environment, keep
# honoring that saved endpoint instead of falling back to OpenRouter.
if cfg_provider == "custom":
if (not cfg_provider or cfg_provider == "auto") and not env_openai_base_url:
use_config_base_url = True
elif requested_norm == "custom" and cfg_provider == "custom":
# provider: custom — use base_url from config (Fixes #1760).
use_config_base_url = True
# When the user explicitly requested the openrouter provider, skip
# OPENAI_BASE_URL — it typically points to a custom / non-OpenRouter
# endpoint and would prevent switching back to OpenRouter (#874).
skip_openai_base = requested_norm == "openrouter"
# For custom, prefer config base_url over env so config.yaml is honored (#1760).
base_url = (
(explicit_base_url or "").strip()
or ("" if skip_openai_base else env_openai_base_url)
or (cfg_base_url.strip() if use_config_base_url else "")
or ("" if skip_openai_base else env_openai_base_url)
or env_openrouter_base_url
or OPENROUTER_BASE_URL
).rstrip("/")
@@ -182,8 +264,10 @@ def _resolve_openrouter_runtime(
or ""
)
else:
# Custom endpoint: use api_key from config when using config base_url (#1760).
api_key = (
explicit_api_key
or (cfg_api_key if use_config_base_url else "")
or os.getenv("OPENAI_API_KEY")
or os.getenv("OPENROUTER_API_KEY")
or ""
@@ -193,7 +277,9 @@ def _resolve_openrouter_runtime(
return {
"provider": "openrouter",
"api_mode": "chat_completions",
"api_mode": _parse_api_mode(model_cfg.get("api_mode"))
or _detect_api_mode_for_url(base_url)
or "chat_completions",
"base_url": base_url,
"api_key": api_key,
"source": source,
@@ -251,6 +337,19 @@ def resolve_runtime_provider(
"requested_provider": requested_provider,
}
if provider == "copilot-acp":
creds = resolve_external_process_provider_credentials(provider)
return {
"provider": "copilot-acp",
"api_mode": "chat_completions",
"base_url": creds.get("base_url", "").rstrip("/"),
"api_key": creds.get("api_key", ""),
"command": creds.get("command", ""),
"args": list(creds.get("args") or []),
"source": creds.get("source", "process"),
"requested_provider": requested_provider,
}
# Anthropic (native Messages API)
if provider == "anthropic":
from agent.anthropic_adapter import resolve_anthropic_token
@@ -260,23 +359,60 @@ def resolve_runtime_provider(
"No Anthropic credentials found. Set ANTHROPIC_TOKEN or ANTHROPIC_API_KEY, "
"run 'claude setup-token', or authenticate with 'claude /login'."
)
# Allow base URL override from config.yaml model.base_url
model_cfg = _get_model_config()
cfg_base_url = (model_cfg.get("base_url") or "").strip().rstrip("/")
base_url = cfg_base_url or "https://api.anthropic.com"
return {
"provider": "anthropic",
"api_mode": "anthropic_messages",
"base_url": "https://api.anthropic.com",
"base_url": base_url,
"api_key": token,
"source": "env",
"requested_provider": requested_provider,
}
# Alibaba Cloud / DashScope (Anthropic-compatible endpoint)
if provider == "alibaba":
creds = resolve_api_key_provider_credentials(provider)
base_url = creds.get("base_url", "").rstrip("/") or "https://dashscope-intl.aliyuncs.com/apps/anthropic"
return {
"provider": "alibaba",
"api_mode": "anthropic_messages",
"base_url": base_url,
"api_key": creds.get("api_key", ""),
"source": creds.get("source", "env"),
"requested_provider": requested_provider,
}
# API-key providers (z.ai/GLM, Kimi, MiniMax, MiniMax-CN)
pconfig = PROVIDER_REGISTRY.get(provider)
if pconfig and pconfig.auth_type == "api_key":
creds = resolve_api_key_provider_credentials(provider)
model_cfg = _get_model_config()
base_url = creds.get("base_url", "").rstrip("/")
api_mode = "chat_completions"
if provider == "copilot":
api_mode = _copilot_runtime_api_mode(model_cfg, creds.get("api_key", ""))
else:
# Check explicit api_mode from model config first
configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
if configured_mode:
api_mode = configured_mode
# Auto-detect Anthropic-compatible endpoints by URL convention
# (e.g. https://api.minimax.io/anthropic, https://dashscope.../anthropic)
elif base_url.rstrip("/").endswith("/anthropic"):
api_mode = "anthropic_messages"
# MiniMax providers always use Anthropic Messages API.
# Auto-correct stale /v1 URLs (from old .env or config) to /anthropic.
elif provider in ("minimax", "minimax-cn"):
api_mode = "anthropic_messages"
if base_url.rstrip("/").endswith("/v1"):
base_url = base_url.rstrip("/")[:-3] + "/anthropic"
return {
"provider": provider,
"api_mode": "chat_completions",
"base_url": creds.get("base_url", "").rstrip("/"),
"api_mode": api_mode,
"base_url": base_url,
"api_key": creds.get("api_key", ""),
"source": creds.get("source", "env"),
"requested_provider": requested_provider,
+841 -204
View File
File diff suppressed because it is too large Load Diff
+26 -16
View File
@@ -304,7 +304,7 @@ def do_browse(page: int = 1, page_size: int = 20, source: str = "all",
def do_install(identifier: str, category: str = "", force: bool = False,
console: Optional[Console] = None) -> None:
console: Optional[Console] = None, skip_confirm: bool = False) -> None:
"""Fetch, quarantine, scan, confirm, and install a skill."""
from tools.skills_hub import (
GitHubAuth, create_source_router, ensure_hub_dirs,
@@ -378,7 +378,8 @@ def do_install(identifier: str, category: str = "", force: bool = False,
c.print(Panel("\n".join(metadata_lines), title="Upstream Metadata", border_style="blue"))
# Confirm with user — show appropriate warning based on source
if not force:
# skip_confirm bypasses the prompt (needed in TUI mode where input() hangs)
if not force and not skip_confirm:
c.print()
if bundle.source == "official":
c.print(Panel(
@@ -598,20 +599,23 @@ def do_audit(name: Optional[str] = None, console: Optional[Console] = None) -> N
c.print()
def do_uninstall(name: str, console: Optional[Console] = None) -> None:
def do_uninstall(name: str, console: Optional[Console] = None,
skip_confirm: bool = False) -> None:
"""Remove a hub-installed skill with confirmation."""
from tools.skills_hub import uninstall_skill
c = console or _console
c.print(f"\n[bold]Uninstall '{name}'?[/]")
try:
answer = input("Confirm [y/N]: ").strip().lower()
except (EOFError, KeyboardInterrupt):
answer = "n"
if answer not in ("y", "yes"):
c.print("[dim]Cancelled.[/]\n")
return
# skip_confirm bypasses the prompt (needed in TUI mode where input() hangs)
if not skip_confirm:
c.print(f"\n[bold]Uninstall '{name}'?[/]")
try:
answer = input("Confirm [y/N]: ").strip().lower()
except (EOFError, KeyboardInterrupt):
answer = "n"
if answer not in ("y", "yes"):
c.print("[dim]Cancelled.[/]\n")
return
success, msg = uninstall_skill(name)
if success:
@@ -923,7 +927,8 @@ def skills_command(args) -> None:
elif action == "search":
do_search(args.query, source=args.source, limit=args.limit)
elif action == "install":
do_install(args.identifier, category=args.category, force=args.force)
do_install(args.identifier, category=args.category, force=args.force,
skip_confirm=getattr(args, "yes", False))
elif action == "inspect":
do_inspect(args.identifier)
elif action == "list":
@@ -1054,11 +1059,15 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:
return
identifier = args[0]
category = ""
force = any(flag in args for flag in ("--force", "--yes", "-y"))
# --yes / -y bypasses confirmation prompt (needed in TUI mode)
# --force handles reinstall override
skip_confirm = any(flag in args for flag in ("--yes", "-y"))
force = "--force" in args
for i, a in enumerate(args):
if a == "--category" and i + 1 < len(args):
category = args[i + 1]
do_install(identifier, category=category, force=force, console=c)
do_install(identifier, category=category, force=force,
skip_confirm=skip_confirm, console=c)
elif action == "inspect":
if not args:
@@ -1088,9 +1097,10 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:
elif action == "uninstall":
if not args:
c.print("[bold red]Usage:[/] /skills uninstall <name>\n")
c.print("[bold red]Usage:[/] /skills uninstall <name> [--yes]\n")
return
do_uninstall(args[0], console=c)
skip_confirm = any(flag in args for flag in ("--yes", "-y"))
do_uninstall(args[0], console=c, skip_confirm=skip_confirm)
elif action == "publish":
if not args:
+14 -6
View File
@@ -60,6 +60,12 @@ All fields are optional. Missing values inherit from the ``default`` skin.
# Tool prefix: character for tool output lines (default: ┊)
tool_prefix: ""
# Tool emojis: override the default emoji for any tool (used in spinners & progress)
tool_emojis:
terminal: "" # Override terminal tool emoji
web_search: "🔮" # Override web_search tool emoji
# Any tool not listed here uses its registry default
USAGE
=====
@@ -111,6 +117,7 @@ class SkinConfig:
spinner: Dict[str, Any] = field(default_factory=dict)
branding: Dict[str, str] = field(default_factory=dict)
tool_prefix: str = ""
tool_emojis: Dict[str, str] = field(default_factory=dict) # per-tool emoji overrides
banner_logo: str = "" # Rich-markup ASCII art logo (replaces HERMES_AGENT_LOGO)
banner_hero: str = "" # Rich-markup hero art (replaces HERMES_CADUCEUS)
@@ -344,12 +351,12 @@ _BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
"help_header": "(Ψ) Available Commands",
},
"tool_prefix": "",
"banner_logo": """[bold #B8E8FF]██████╗ ██████╗ ███████╗██╗██████╗ ███████╗ ██████╗ ███╗ ██╗ █████╗ ██████╗ ███████╗███╗ ██╗████████╗[/]
[bold #97D6FF]██╔══██╗██╔═══██╗██╔════╝██║██╔══██╗██╔════╝██╔═══██╗████╗ ██║ ██╔══██╗██╔════╝ ██╔════╝████╗ ██║╚══██╔══╝[/]
[#75C1F6]██████╔╝██║ ██║███████╗██║██║ ██║█████╗ ██║ ██║██╔██╗ ██║█████╗███████║██║ ███╗█████╗ ██╔██╗ ██║ ██║[/]
[#4FA2E0]██╔═══╝ ██║ ██║╚════██║██║██║ ██║██╔══╝ ██║ ██║██║╚██╗██║╚════╝██╔══██║██║ ██║██╔══╝ ██║╚██╗██║ ██║[/]
[#2E7CC7]██║ ╚██████╔╝███████║██║██████╔╝███████╗╚██████╔╝██║ ╚████║ ██║ ██║╚██████╔╝███████╗██║ ╚████║ ██║[/]
[#1B4F95]╚═╝ ╚═════╝ ╚══════╝╚═╝╚═════╝ ╚══════╝ ╚═════╝ ╚═╝ ╚═══╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝╚═╝ ╚═══╝ ╚═╝[/]""",
"banner_logo": """[bold #B8E8FF]██████╗ ██████╗ ███████╗███████╗██╗██████╗ ██████╗ ███╗ ██╗ █████╗ ██████╗ ███████╗███╗ ██╗████████╗[/]
[bold #97D6FF]██╔══██╗██╔═══██╗██╔════╝██╔════╝██║██╔══██╗██╔═══██╗████╗ ██║ ██╔══██╗██╔════╝ ██╔════╝████╗ ██║╚══██╔══╝[/]
[#75C1F6]██████╔╝██║ ██║███████╗█████╗ ██║██║ ██║██║ ██║██╔██╗ ██║█████╗███████║██║ ███╗█████╗ ██╔██╗ ██║ ██║[/]
[#4FA2E0]██╔═══╝ ██║ ██║╚════██║██╔══╝ ██║██║ ██║██║ ██║██║╚██╗██║╚════╝██╔══██║██║ ██║██╔══╝ ██║╚██╗██║ ██║[/]
[#2E7CC7]██║ ╚██████╔╝███████║███████╗██║██████╔╝╚██████╔╝██║ ╚████║ ██║ ██║╚██████╔╝███████╗██║ ╚████║ ██║[/]
[#1B4F95]╚═╝ ╚═════╝ ╚══════╝╚══════╝╚═╝╚═════╝ ╚═════╝ ╚═╝ ╚═══╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝╚═╝ ╚═══╝ ╚═╝[/]""",
"banner_hero": """[#2A6FB9]⠀⢀⣀⡀⠀⠀⠀⠀⠀⠀[/]
[#5DB8F5]⠀⣠⣾⣿⣷⣄⠀⠀⠀⠀⠀⠀[/]
[#5DB8F5]⠀⢠⣿⠏⠀Ψ⠀⠹⣿⡄⠀⠀⠀⠀⠀⠀⠀[/]
@@ -541,6 +548,7 @@ def _build_skin_config(data: Dict[str, Any]) -> SkinConfig:
spinner=spinner,
branding=branding,
tool_prefix=data.get("tool_prefix", default.get("tool_prefix", "")),
tool_emojis=data.get("tool_emojis", {}),
banner_logo=data.get("banner_logo", ""),
banner_hero=data.get("banner_hero", ""),
)
+8 -1
View File
@@ -120,6 +120,7 @@ def show_status(args):
"MiniMax": "MINIMAX_API_KEY",
"MiniMax-CN": "MINIMAX_CN_API_KEY",
"Firecrawl": "FIRECRAWL_API_KEY",
"Tavily": "TAVILY_API_KEY",
"Browserbase": "BROWSERBASE_API_KEY", # Optional — local browser works without this
"FAL": "FAL_KEY",
"Tinker": "TINKER_API_KEY",
@@ -252,6 +253,7 @@ def show_status(args):
"Signal": ("SIGNAL_HTTP_URL", "SIGNAL_HOME_CHANNEL"),
"Slack": ("SLACK_BOT_TOKEN", None),
"Email": ("EMAIL_ADDRESS", "EMAIL_HOME_ADDRESS"),
"SMS": ("TWILIO_ACCOUNT_SID", "SMS_HOME_CHANNEL"),
}
for name, (token_var, home_var) in platforms.items():
@@ -275,8 +277,13 @@ def show_status(args):
print(color("◆ Gateway Service", Colors.CYAN, Colors.BOLD))
if sys.platform.startswith('linux'):
try:
from hermes_cli.gateway import get_service_name
_gw_svc = get_service_name()
except Exception:
_gw_svc = "hermes-gateway"
result = subprocess.run(
["systemctl", "--user", "is-active", "hermes-gateway"],
["systemctl", "--user", "is-active", _gw_svc],
capture_output=True,
text=True
)
+368 -26
View File
@@ -110,6 +110,7 @@ PLATFORMS = {
"whatsapp": {"label": "📱 WhatsApp", "default_toolset": "hermes-whatsapp"},
"signal": {"label": "📡 Signal", "default_toolset": "hermes-signal"},
"email": {"label": "📧 Email", "default_toolset": "hermes-email"},
"dingtalk": {"label": "💬 DingTalk", "default_toolset": "hermes-dingtalk"},
}
@@ -150,19 +151,37 @@ TOOL_CATEGORIES = {
"web": {
"name": "Web Search & Extract",
"setup_title": "Select Search Provider",
"setup_note": "A free DuckDuckGo search skill is also included — skip this if you don't need Firecrawl.",
"setup_note": "A free DuckDuckGo search skill is also included — skip this if you don't need a premium provider.",
"icon": "🔍",
"providers": [
{
"name": "Firecrawl Cloud",
"tag": "Recommended - hosted service",
"tag": "Hosted service - search, extract, and crawl",
"web_backend": "firecrawl",
"env_vars": [
{"key": "FIRECRAWL_API_KEY", "prompt": "Firecrawl API key", "url": "https://firecrawl.dev"},
],
},
{
"name": "Parallel",
"tag": "AI-native search and extract",
"web_backend": "parallel",
"env_vars": [
{"key": "PARALLEL_API_KEY", "prompt": "Parallel API key", "url": "https://parallel.ai"},
],
},
{
"name": "Tavily",
"tag": "AI-native search, extract, and crawl",
"web_backend": "tavily",
"env_vars": [
{"key": "TAVILY_API_KEY", "prompt": "Tavily API key", "url": "https://app.tavily.com/home"},
],
},
{
"name": "Firecrawl Self-Hosted",
"tag": "Free - run your own instance",
"web_backend": "firecrawl",
"env_vars": [
{"key": "FIRECRAWL_API_URL", "prompt": "Your Firecrawl instance URL (e.g., http://localhost:3002)"},
],
@@ -190,6 +209,7 @@ TOOL_CATEGORIES = {
"name": "Local Browser",
"tag": "Free headless Chromium (no API key needed)",
"env_vars": [],
"browser_provider": None,
"post_setup": "browserbase", # Same npm install for agent-browser
},
{
@@ -199,6 +219,16 @@ TOOL_CATEGORIES = {
{"key": "BROWSERBASE_API_KEY", "prompt": "Browserbase API key", "url": "https://browserbase.com"},
{"key": "BROWSERBASE_PROJECT_ID", "prompt": "Browserbase project ID"},
],
"browser_provider": "browserbase",
"post_setup": "browserbase",
},
{
"name": "Browser Use",
"tag": "Cloud browser with remote execution",
"env_vars": [
{"key": "BROWSER_USE_API_KEY", "prompt": "Browser Use API key", "url": "https://browser-use.com"},
],
"browser_provider": "browser-use",
"post_setup": "browserbase",
},
],
@@ -354,9 +384,29 @@ def _get_platform_tools(config: dict, platform: str) -> Set[str]:
def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[str]):
"""Save the selected toolset keys for a platform to config."""
"""Save the selected toolset keys for a platform to config.
Preserves any non-configurable toolset entries (like MCP server names)
that were already in the config for this platform.
"""
config.setdefault("platform_toolsets", {})
config["platform_toolsets"][platform] = sorted(enabled_toolset_keys)
# Get the set of all configurable toolset keys
configurable_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
# Get existing toolsets for this platform
existing_toolsets = config.get("platform_toolsets", {}).get(platform, [])
if not isinstance(existing_toolsets, list):
existing_toolsets = []
# Preserve any entries that are NOT configurable toolsets (i.e. MCP server names)
preserved_entries = {
entry for entry in existing_toolsets
if entry not in configurable_keys
}
# Merge preserved entries with new enabled toolsets
config["platform_toolsets"][platform] = sorted(enabled_toolset_keys | preserved_entries)
save_config(config)
@@ -555,10 +605,10 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
configured = ""
env_vars = p.get("env_vars", [])
if not env_vars or all(get_env_value(v["key"]) for v in env_vars):
if p.get("tts_provider") and config.get("tts", {}).get("provider") == p["tts_provider"]:
if _is_provider_active(p, config):
configured = " [active]"
elif not env_vars:
configured = " [active]" if config.get("tts", {}).get("provider", "edge") == p.get("tts_provider", "") else ""
configured = ""
else:
configured = " [configured]"
provider_choices.append(f"{p['name']}{tag}{configured}")
@@ -567,15 +617,7 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
provider_choices.append("Skip — keep defaults / configure later")
# Detect current provider as default
default_idx = 0
for i, p in enumerate(providers):
if p.get("tts_provider") and config.get("tts", {}).get("provider") == p["tts_provider"]:
default_idx = i
break
env_vars = p.get("env_vars", [])
if env_vars and all(get_env_value(v["key"]) for v in env_vars):
default_idx = i
break
default_idx = _detect_active_provider_index(providers, config)
provider_idx = _prompt_choice(f" {title}:", provider_choices, default_idx)
@@ -587,6 +629,31 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
_configure_provider(providers[provider_idx], config)
def _is_provider_active(provider: dict, config: dict) -> bool:
"""Check if a provider entry matches the currently active config."""
if provider.get("tts_provider"):
return config.get("tts", {}).get("provider") == provider["tts_provider"]
if "browser_provider" in provider:
current = config.get("browser", {}).get("cloud_provider")
return provider["browser_provider"] == current
if provider.get("web_backend"):
current = config.get("web", {}).get("backend")
return current == provider["web_backend"]
return False
def _detect_active_provider_index(providers: list, config: dict) -> int:
"""Return the index of the currently active provider, or 0."""
for i, p in enumerate(providers):
if _is_provider_active(p, config):
return i
# Fallback: env vars present → likely configured
env_vars = p.get("env_vars", [])
if env_vars and all(get_env_value(v["key"]) for v in env_vars):
return i
return 0
def _configure_provider(provider: dict, config: dict):
"""Configure a single provider - prompt for API keys and set config."""
env_vars = provider.get("env_vars", [])
@@ -595,6 +662,20 @@ def _configure_provider(provider: dict, config: dict):
if provider.get("tts_provider"):
config.setdefault("tts", {})["provider"] = provider["tts_provider"]
# Set browser cloud provider in config if applicable
if "browser_provider" in provider:
bp = provider["browser_provider"]
if bp:
config.setdefault("browser", {})["cloud_provider"] = bp
_print_success(f" Browser cloud provider set to: {bp}")
else:
config.get("browser", {}).pop("cloud_provider", None)
# Set web search backend in config if applicable
if provider.get("web_backend"):
config.setdefault("web", {})["backend"] = provider["web_backend"]
_print_success(f" Web backend set to: {provider['web_backend']}")
if not env_vars:
_print_success(f" {provider['name']} - no configuration needed!")
return
@@ -747,7 +828,7 @@ def _configure_tool_category_for_reconfig(ts_key: str, cat: dict, config: dict):
configured = ""
env_vars = p.get("env_vars", [])
if not env_vars or all(get_env_value(v["key"]) for v in env_vars):
if p.get("tts_provider") and config.get("tts", {}).get("provider") == p["tts_provider"]:
if _is_provider_active(p, config):
configured = " [active]"
elif not env_vars:
configured = ""
@@ -755,15 +836,7 @@ def _configure_tool_category_for_reconfig(ts_key: str, cat: dict, config: dict):
configured = " [configured]"
provider_choices.append(f"{p['name']}{tag}{configured}")
default_idx = 0
for i, p in enumerate(providers):
if p.get("tts_provider") and config.get("tts", {}).get("provider") == p["tts_provider"]:
default_idx = i
break
env_vars = p.get("env_vars", [])
if env_vars and all(get_env_value(v["key"]) for v in env_vars):
default_idx = i
break
default_idx = _detect_active_provider_index(providers, config)
provider_idx = _prompt_choice(" Select provider:", provider_choices, default_idx)
_reconfigure_provider(providers[provider_idx], config)
@@ -777,6 +850,20 @@ def _reconfigure_provider(provider: dict, config: dict):
config.setdefault("tts", {})["provider"] = provider["tts_provider"]
_print_success(f" TTS provider set to: {provider['tts_provider']}")
if "browser_provider" in provider:
bp = provider["browser_provider"]
if bp:
config.setdefault("browser", {})["cloud_provider"] = bp
_print_success(f" Browser cloud provider set to: {bp}")
else:
config.get("browser", {}).pop("cloud_provider", None)
_print_success(f" Browser set to local mode")
# Set web search backend in config if applicable
if provider.get("web_backend"):
config.setdefault("web", {})["backend"] = provider["web_backend"]
_print_success(f" Web backend set to: {provider['web_backend']}")
if not env_vars:
_print_success(f" {provider['name']} - no configuration needed!")
return
@@ -929,12 +1016,19 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
if len(platform_keys) > 1:
platform_choices.append("Configure all platforms (global)")
platform_choices.append("Reconfigure an existing tool's provider or API key")
# Show MCP option if any MCP servers are configured
_has_mcp = bool(config.get("mcp_servers"))
if _has_mcp:
platform_choices.append("Configure MCP server tools")
platform_choices.append("Done")
# Index offsets for the extra options after per-platform entries
_global_idx = len(platform_keys) if len(platform_keys) > 1 else -1
_reconfig_idx = len(platform_keys) + (1 if len(platform_keys) > 1 else 0)
_done_idx = _reconfig_idx + 1
_mcp_idx = (_reconfig_idx + 1) if _has_mcp else -1
_done_idx = _reconfig_idx + (2 if _has_mcp else 1)
while True:
idx = _prompt_choice("Select an option:", platform_choices, default=0)
@@ -949,6 +1043,12 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
print()
continue
# "Configure MCP tools" selected
if idx == _mcp_idx:
_configure_mcp_tools_interactive(config)
print()
continue
# "Configure all platforms (global)" selected
if idx == _global_idx:
# Use the union of all platforms' current tools as the starting state
@@ -1033,3 +1133,245 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
print(color(" Tool configuration saved to ~/.hermes/config.yaml", Colors.DIM))
print(color(" Changes take effect on next 'hermes' or gateway restart.", Colors.DIM))
print()
# ─── MCP Tools Interactive Configuration ─────────────────────────────────────
def _configure_mcp_tools_interactive(config: dict):
"""Probe MCP servers for available tools and let user toggle them on/off.
Connects to each configured MCP server, discovers tools, then shows
a per-server curses checklist. Writes changes back as ``tools.exclude``
entries in config.yaml.
"""
from hermes_cli.curses_ui import curses_checklist
mcp_servers = config.get("mcp_servers") or {}
if not mcp_servers:
_print_info("No MCP servers configured.")
return
# Count enabled servers
enabled_names = [
k for k, v in mcp_servers.items()
if v.get("enabled", True) not in (False, "false", "0", "no", "off")
]
if not enabled_names:
_print_info("All MCP servers are disabled.")
return
print()
print(color(" Discovering tools from MCP servers...", Colors.YELLOW))
print(color(f" Connecting to {len(enabled_names)} server(s): {', '.join(enabled_names)}", Colors.DIM))
try:
from tools.mcp_tool import probe_mcp_server_tools
server_tools = probe_mcp_server_tools()
except Exception as exc:
_print_error(f"Failed to probe MCP servers: {exc}")
return
if not server_tools:
_print_warning("Could not discover tools from any MCP server.")
_print_info("Check that server commands/URLs are correct and dependencies are installed.")
return
# Report discovery results
failed = [n for n in enabled_names if n not in server_tools]
if failed:
for name in failed:
_print_warning(f" Could not connect to '{name}'")
total_tools = sum(len(tools) for tools in server_tools.values())
print(color(f" Found {total_tools} tool(s) across {len(server_tools)} server(s)", Colors.GREEN))
print()
any_changes = False
for server_name, tools in server_tools.items():
if not tools:
_print_info(f" {server_name}: no tools found")
continue
srv_cfg = mcp_servers.get(server_name, {})
tools_cfg = srv_cfg.get("tools") or {}
include_list = tools_cfg.get("include") or []
exclude_list = tools_cfg.get("exclude") or []
# Build checklist labels
labels = []
for tool_name, description in tools:
desc_short = description[:70] + "..." if len(description) > 70 else description
if desc_short:
labels.append(f"{tool_name} ({desc_short})")
else:
labels.append(tool_name)
# Determine which tools are currently enabled
pre_selected: Set[int] = set()
tool_names = [t[0] for t in tools]
for i, tool_name in enumerate(tool_names):
if include_list:
# Include mode: only included tools are selected
if tool_name in include_list:
pre_selected.add(i)
elif exclude_list:
# Exclude mode: everything except excluded
if tool_name not in exclude_list:
pre_selected.add(i)
else:
# No filter: all enabled
pre_selected.add(i)
chosen = curses_checklist(
f"MCP Server: {server_name} ({len(tools)} tools)",
labels,
pre_selected,
cancel_returns=pre_selected,
)
if chosen == pre_selected:
_print_info(f" {server_name}: no changes")
continue
# Compute new exclude list based on unchecked tools
new_exclude = [tool_names[i] for i in range(len(tool_names)) if i not in chosen]
# Update config
srv_cfg = mcp_servers.setdefault(server_name, {})
tools_cfg = srv_cfg.setdefault("tools", {})
if new_exclude:
tools_cfg["exclude"] = new_exclude
# Remove include if present — we're switching to exclude mode
tools_cfg.pop("include", None)
else:
# All tools enabled — clear filters
tools_cfg.pop("exclude", None)
tools_cfg.pop("include", None)
enabled_count = len(chosen)
disabled_count = len(tools) - enabled_count
_print_success(
f" {server_name}: {enabled_count} enabled, {disabled_count} disabled"
)
any_changes = True
if any_changes:
save_config(config)
print()
print(color(" ✓ MCP tool configuration saved", Colors.GREEN))
else:
print(color(" No changes to MCP tools", Colors.DIM))
# ─── Non-interactive disable/enable ──────────────────────────────────────────
def _apply_toolset_change(config: dict, platform: str, toolset_names: List[str], action: str):
"""Add or remove built-in toolsets for a platform."""
enabled = _get_platform_tools(config, platform)
if action == "disable":
updated = enabled - set(toolset_names)
else:
updated = enabled | set(toolset_names)
_save_platform_tools(config, platform, updated)
def _apply_mcp_change(config: dict, targets: List[str], action: str) -> Set[str]:
"""Add or remove specific MCP tools from a server's exclude list.
Returns the set of server names that were not found in config.
"""
failed_servers: Set[str] = set()
mcp_servers = config.get("mcp_servers") or {}
for target in targets:
server_name, tool_name = target.split(":", 1)
if server_name not in mcp_servers:
failed_servers.add(server_name)
continue
tools_cfg = mcp_servers[server_name].setdefault("tools", {})
exclude = list(tools_cfg.get("exclude") or [])
if action == "disable":
if tool_name not in exclude:
exclude.append(tool_name)
else:
exclude = [t for t in exclude if t != tool_name]
tools_cfg["exclude"] = exclude
return failed_servers
def _print_tools_list(enabled_toolsets: set, mcp_servers: dict, platform: str = "cli"):
"""Print a summary of enabled/disabled toolsets and MCP tool filters."""
print(f"Built-in toolsets ({platform}):")
for ts_key, label, _ in CONFIGURABLE_TOOLSETS:
status = (color("✓ enabled", Colors.GREEN) if ts_key in enabled_toolsets
else color("✗ disabled", Colors.RED))
print(f" {status} {ts_key} {color(label, Colors.DIM)}")
if mcp_servers:
print()
print("MCP servers:")
for srv_name, srv_cfg in mcp_servers.items():
tools_cfg = srv_cfg.get("tools") or {}
exclude = tools_cfg.get("exclude") or []
include = tools_cfg.get("include") or []
if include:
_print_info(f"{srv_name} [include only: {', '.join(include)}]")
elif exclude:
_print_info(f"{srv_name} [excluded: {color(', '.join(exclude), Colors.YELLOW)}]")
else:
_print_info(f"{srv_name} {color('all tools enabled', Colors.DIM)}")
def tools_disable_enable_command(args):
"""Enable, disable, or list tools for a platform.
Built-in toolsets use plain names (e.g. ``web``, ``memory``).
MCP tools use ``server:tool`` notation (e.g. ``github:create_issue``).
"""
action = args.tools_action
platform = getattr(args, "platform", "cli")
config = load_config()
if platform not in PLATFORMS:
_print_error(f"Unknown platform '{platform}'. Valid: {', '.join(PLATFORMS)}")
return
if action == "list":
_print_tools_list(_get_platform_tools(config, platform),
config.get("mcp_servers") or {}, platform)
return
targets: List[str] = args.names
toolset_targets = [t for t in targets if ":" not in t]
mcp_targets = [t for t in targets if ":" in t]
valid_toolsets = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
unknown_toolsets = [t for t in toolset_targets if t not in valid_toolsets]
if unknown_toolsets:
for name in unknown_toolsets:
_print_error(f"Unknown toolset '{name}'")
toolset_targets = [t for t in toolset_targets if t in valid_toolsets]
if toolset_targets:
_apply_toolset_change(config, platform, toolset_targets, action)
failed_servers: Set[str] = set()
if mcp_targets:
failed_servers = _apply_mcp_change(config, mcp_targets, action)
for srv in failed_servers:
_print_error(f"MCP server '{srv}' not found in config")
save_config(config)
successful = [
t for t in targets
if t not in unknown_toolsets and (":" not in t or t.split(":")[0] not in failed_servers)
]
if successful:
verb = "Disabled" if action == "disable" else "Enabled"
_print_success(f"{verb}: {', '.join(successful)}")
+9 -3
View File
@@ -133,7 +133,13 @@ def uninstall_gateway_service():
if platform.system() != "Linux":
return False
service_file = Path.home() / ".config" / "systemd" / "user" / "hermes-gateway.service"
try:
from hermes_cli.gateway import get_service_name
svc_name = get_service_name()
except Exception:
svc_name = "hermes-gateway"
service_file = Path.home() / ".config" / "systemd" / "user" / f"{svc_name}.service"
if not service_file.exists():
return False
@@ -141,14 +147,14 @@ def uninstall_gateway_service():
try:
# Stop the service
subprocess.run(
["systemctl", "--user", "stop", "hermes-gateway"],
["systemctl", "--user", "stop", svc_name],
capture_output=True,
check=False
)
# Disable the service
subprocess.run(
["systemctl", "--user", "disable", "hermes-gateway"],
["systemctl", "--user", "disable", svc_name],
capture_output=True,
check=False
)
+4
View File
@@ -8,5 +8,9 @@ OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
OPENROUTER_MODELS_URL = f"{OPENROUTER_BASE_URL}/models"
OPENROUTER_CHAT_URL = f"{OPENROUTER_BASE_URL}/chat/completions"
AI_GATEWAY_BASE_URL = "https://ai-gateway.vercel.sh/v1"
AI_GATEWAY_MODELS_URL = f"{AI_GATEWAY_BASE_URL}/models"
AI_GATEWAY_CHAT_URL = f"{AI_GATEWAY_BASE_URL}/chat/completions"
NOUS_API_BASE_URL = "https://inference-api.nousresearch.com/v1"
NOUS_API_CHAT_URL = f"{NOUS_API_BASE_URL}/chat/completions"
+363 -208
View File
@@ -18,6 +18,7 @@ import json
import os
import re
import sqlite3
import threading
import time
from pathlib import Path
from typing import Dict, Any, List, Optional
@@ -25,7 +26,7 @@ from typing import Dict, Any, List, Optional
DEFAULT_DB_PATH = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes")) / "state.db"
SCHEMA_VERSION = 4
SCHEMA_VERSION = 5
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS schema_version (
@@ -47,6 +48,17 @@ CREATE TABLE IF NOT EXISTS sessions (
tool_call_count INTEGER DEFAULT 0,
input_tokens INTEGER DEFAULT 0,
output_tokens INTEGER DEFAULT 0,
cache_read_tokens INTEGER DEFAULT 0,
cache_write_tokens INTEGER DEFAULT 0,
reasoning_tokens INTEGER DEFAULT 0,
billing_provider TEXT,
billing_base_url TEXT,
billing_mode TEXT,
estimated_cost_usd REAL,
actual_cost_usd REAL,
cost_status TEXT,
cost_source TEXT,
pricing_version TEXT,
title TEXT,
FOREIGN KEY (parent_session_id) REFERENCES sessions(id)
);
@@ -104,6 +116,7 @@ class SessionDB:
self.db_path = db_path or DEFAULT_DB_PATH
self.db_path.parent.mkdir(parents=True, exist_ok=True)
self._lock = threading.Lock()
self._conn = sqlite3.connect(
str(self.db_path),
check_same_thread=False,
@@ -152,6 +165,30 @@ class SessionDB:
except sqlite3.OperationalError:
pass # Index already exists
cursor.execute("UPDATE schema_version SET version = 4")
if current_version < 5:
new_columns = [
("cache_read_tokens", "INTEGER DEFAULT 0"),
("cache_write_tokens", "INTEGER DEFAULT 0"),
("reasoning_tokens", "INTEGER DEFAULT 0"),
("billing_provider", "TEXT"),
("billing_base_url", "TEXT"),
("billing_mode", "TEXT"),
("estimated_cost_usd", "REAL"),
("actual_cost_usd", "REAL"),
("cost_status", "TEXT"),
("cost_source", "TEXT"),
("pricing_version", "TEXT"),
]
for name, column_type in new_columns:
try:
# name and column_type come from the hardcoded tuple above,
# not user input. Double-quote identifier escaping is applied
# as defense-in-depth; SQLite DDL cannot be parameterized.
safe_name = name.replace('"', '""')
cursor.execute(f'ALTER TABLE sessions ADD COLUMN "{safe_name}" {column_type}')
except sqlite3.OperationalError:
pass
cursor.execute("UPDATE schema_version SET version = 5")
# Unique title index — always ensure it exists (safe to run after migrations
# since the title column is guaranteed to exist at this point)
@@ -173,9 +210,10 @@ class SessionDB:
def close(self):
"""Close the database connection."""
if self._conn:
self._conn.close()
self._conn = None
with self._lock:
if self._conn:
self._conn.close()
self._conn = None
# =========================================================================
# Session lifecycle
@@ -192,63 +230,140 @@ class SessionDB:
parent_session_id: str = None,
) -> str:
"""Create a new session record. Returns the session_id."""
self._conn.execute(
"""INSERT INTO sessions (id, source, user_id, model, model_config,
system_prompt, parent_session_id, started_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
(
session_id,
source,
user_id,
model,
json.dumps(model_config) if model_config else None,
system_prompt,
parent_session_id,
time.time(),
),
)
self._conn.commit()
with self._lock:
self._conn.execute(
"""INSERT INTO sessions (id, source, user_id, model, model_config,
system_prompt, parent_session_id, started_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
(
session_id,
source,
user_id,
model,
json.dumps(model_config) if model_config else None,
system_prompt,
parent_session_id,
time.time(),
),
)
self._conn.commit()
return session_id
def end_session(self, session_id: str, end_reason: str) -> None:
"""Mark a session as ended."""
self._conn.execute(
"UPDATE sessions SET ended_at = ?, end_reason = ? WHERE id = ?",
(time.time(), end_reason, session_id),
)
self._conn.commit()
with self._lock:
self._conn.execute(
"UPDATE sessions SET ended_at = ?, end_reason = ? WHERE id = ?",
(time.time(), end_reason, session_id),
)
self._conn.commit()
def update_system_prompt(self, session_id: str, system_prompt: str) -> None:
"""Store the full assembled system prompt snapshot."""
self._conn.execute(
"UPDATE sessions SET system_prompt = ? WHERE id = ?",
(system_prompt, session_id),
)
self._conn.commit()
with self._lock:
self._conn.execute(
"UPDATE sessions SET system_prompt = ? WHERE id = ?",
(system_prompt, session_id),
)
self._conn.commit()
def update_token_counts(
self, session_id: str, input_tokens: int = 0, output_tokens: int = 0,
self,
session_id: str,
input_tokens: int = 0,
output_tokens: int = 0,
model: str = None,
cache_read_tokens: int = 0,
cache_write_tokens: int = 0,
reasoning_tokens: int = 0,
estimated_cost_usd: Optional[float] = None,
actual_cost_usd: Optional[float] = None,
cost_status: Optional[str] = None,
cost_source: Optional[str] = None,
pricing_version: Optional[str] = None,
billing_provider: Optional[str] = None,
billing_base_url: Optional[str] = None,
billing_mode: Optional[str] = None,
) -> None:
"""Increment token counters and backfill model if not already set."""
self._conn.execute(
"""UPDATE sessions SET
input_tokens = input_tokens + ?,
output_tokens = output_tokens + ?,
model = COALESCE(model, ?)
WHERE id = ?""",
(input_tokens, output_tokens, model, session_id),
)
self._conn.commit()
with self._lock:
self._conn.execute(
"""UPDATE sessions SET
input_tokens = input_tokens + ?,
output_tokens = output_tokens + ?,
cache_read_tokens = cache_read_tokens + ?,
cache_write_tokens = cache_write_tokens + ?,
reasoning_tokens = reasoning_tokens + ?,
estimated_cost_usd = COALESCE(estimated_cost_usd, 0) + COALESCE(?, 0),
actual_cost_usd = CASE
WHEN ? IS NULL THEN actual_cost_usd
ELSE COALESCE(actual_cost_usd, 0) + ?
END,
cost_status = COALESCE(?, cost_status),
cost_source = COALESCE(?, cost_source),
pricing_version = COALESCE(?, pricing_version),
billing_provider = COALESCE(billing_provider, ?),
billing_base_url = COALESCE(billing_base_url, ?),
billing_mode = COALESCE(billing_mode, ?),
model = COALESCE(model, ?)
WHERE id = ?""",
(
input_tokens,
output_tokens,
cache_read_tokens,
cache_write_tokens,
reasoning_tokens,
estimated_cost_usd,
actual_cost_usd,
actual_cost_usd,
cost_status,
cost_source,
pricing_version,
billing_provider,
billing_base_url,
billing_mode,
model,
session_id,
),
)
self._conn.commit()
def get_session(self, session_id: str) -> Optional[Dict[str, Any]]:
"""Get a session by ID."""
cursor = self._conn.execute(
"SELECT * FROM sessions WHERE id = ?", (session_id,)
)
row = cursor.fetchone()
with self._lock:
cursor = self._conn.execute(
"SELECT * FROM sessions WHERE id = ?", (session_id,)
)
row = cursor.fetchone()
return dict(row) if row else None
def resolve_session_id(self, session_id_or_prefix: str) -> Optional[str]:
"""Resolve an exact or uniquely prefixed session ID to the full ID.
Returns the exact ID when it exists. Otherwise treats the input as a
prefix and returns the single matching session ID if the prefix is
unambiguous. Returns None for no matches or ambiguous prefixes.
"""
exact = self.get_session(session_id_or_prefix)
if exact:
return exact["id"]
escaped = (
session_id_or_prefix
.replace("\\", "\\\\")
.replace("%", "\\%")
.replace("_", "\\_")
)
with self._lock:
cursor = self._conn.execute(
"SELECT id FROM sessions WHERE id LIKE ? ESCAPE '\\' ORDER BY started_at DESC LIMIT 2",
(f"{escaped}%",),
)
matches = [row["id"] for row in cursor.fetchall()]
if len(matches) == 1:
return matches[0]
return None
# Maximum length for session titles
MAX_TITLE_LENGTH = 100
@@ -305,38 +420,42 @@ class SessionDB:
Empty/whitespace-only strings are normalized to None (clearing the title).
"""
title = self.sanitize_title(title)
if title:
# Check uniqueness (allow the same session to keep its own title)
with self._lock:
if title:
# Check uniqueness (allow the same session to keep its own title)
cursor = self._conn.execute(
"SELECT id FROM sessions WHERE title = ? AND id != ?",
(title, session_id),
)
conflict = cursor.fetchone()
if conflict:
raise ValueError(
f"Title '{title}' is already in use by session {conflict['id']}"
)
cursor = self._conn.execute(
"SELECT id FROM sessions WHERE title = ? AND id != ?",
"UPDATE sessions SET title = ? WHERE id = ?",
(title, session_id),
)
conflict = cursor.fetchone()
if conflict:
raise ValueError(
f"Title '{title}' is already in use by session {conflict['id']}"
)
cursor = self._conn.execute(
"UPDATE sessions SET title = ? WHERE id = ?",
(title, session_id),
)
self._conn.commit()
return cursor.rowcount > 0
self._conn.commit()
rowcount = cursor.rowcount
return rowcount > 0
def get_session_title(self, session_id: str) -> Optional[str]:
"""Get the title for a session, or None."""
cursor = self._conn.execute(
"SELECT title FROM sessions WHERE id = ?", (session_id,)
)
row = cursor.fetchone()
with self._lock:
cursor = self._conn.execute(
"SELECT title FROM sessions WHERE id = ?", (session_id,)
)
row = cursor.fetchone()
return row["title"] if row else None
def get_session_by_title(self, title: str) -> Optional[Dict[str, Any]]:
"""Look up a session by exact title. Returns session dict or None."""
cursor = self._conn.execute(
"SELECT * FROM sessions WHERE title = ?", (title,)
)
row = cursor.fetchone()
with self._lock:
cursor = self._conn.execute(
"SELECT * FROM sessions WHERE title = ?", (title,)
)
row = cursor.fetchone()
return dict(row) if row else None
def resolve_session_by_title(self, title: str) -> Optional[str]:
@@ -353,12 +472,13 @@ class SessionDB:
# Also search for numbered variants: "title #2", "title #3", etc.
# Escape SQL LIKE wildcards (%, _) in the title to prevent false matches
escaped = title.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
cursor = self._conn.execute(
"SELECT id, title, started_at FROM sessions "
"WHERE title LIKE ? ESCAPE '\\' ORDER BY started_at DESC",
(f"{escaped} #%",),
)
numbered = cursor.fetchall()
with self._lock:
cursor = self._conn.execute(
"SELECT id, title, started_at FROM sessions "
"WHERE title LIKE ? ESCAPE '\\' ORDER BY started_at DESC",
(f"{escaped} #%",),
)
numbered = cursor.fetchall()
if numbered:
# Return the most recent numbered variant
@@ -383,11 +503,12 @@ class SessionDB:
# Find all existing numbered variants
# Escape SQL LIKE wildcards (%, _) in the base to prevent false matches
escaped = base.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
cursor = self._conn.execute(
"SELECT title FROM sessions WHERE title = ? OR title LIKE ? ESCAPE '\\'",
(base, f"{escaped} #%"),
)
existing = [row["title"] for row in cursor.fetchall()]
with self._lock:
cursor = self._conn.execute(
"SELECT title FROM sessions WHERE title = ? OR title LIKE ? ESCAPE '\\'",
(base, f"{escaped} #%"),
)
existing = [row["title"] for row in cursor.fetchall()]
if not existing:
return base # No conflict, use the base name as-is
@@ -435,9 +556,11 @@ class SessionDB:
LIMIT ? OFFSET ?
"""
params = (source, limit, offset) if source else (limit, offset)
cursor = self._conn.execute(query, params)
with self._lock:
cursor = self._conn.execute(query, params)
rows = cursor.fetchall()
sessions = []
for row in cursor.fetchall():
for row in rows:
s = dict(row)
# Build the preview from the raw substring
raw = s.pop("_preview_raw", "").strip()
@@ -471,52 +594,54 @@ class SessionDB:
Also increments the session's message_count (and tool_call_count
if role is 'tool' or tool_calls is present).
"""
cursor = self._conn.execute(
"""INSERT INTO messages (session_id, role, content, tool_call_id,
tool_calls, tool_name, timestamp, token_count, finish_reason)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(
session_id,
role,
content,
tool_call_id,
json.dumps(tool_calls) if tool_calls else None,
tool_name,
time.time(),
token_count,
finish_reason,
),
)
msg_id = cursor.lastrowid
# Update counters
# Count actual tool calls from the tool_calls list (not from tool responses).
# A single assistant message can contain multiple parallel tool calls.
num_tool_calls = 0
if tool_calls is not None:
num_tool_calls = len(tool_calls) if isinstance(tool_calls, list) else 1
if num_tool_calls > 0:
self._conn.execute(
"""UPDATE sessions SET message_count = message_count + 1,
tool_call_count = tool_call_count + ? WHERE id = ?""",
(num_tool_calls, session_id),
)
else:
self._conn.execute(
"UPDATE sessions SET message_count = message_count + 1 WHERE id = ?",
(session_id,),
with self._lock:
cursor = self._conn.execute(
"""INSERT INTO messages (session_id, role, content, tool_call_id,
tool_calls, tool_name, timestamp, token_count, finish_reason)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(
session_id,
role,
content,
tool_call_id,
json.dumps(tool_calls) if tool_calls else None,
tool_name,
time.time(),
token_count,
finish_reason,
),
)
msg_id = cursor.lastrowid
self._conn.commit()
# Update counters
# Count actual tool calls from the tool_calls list (not from tool responses).
# A single assistant message can contain multiple parallel tool calls.
num_tool_calls = 0
if tool_calls is not None:
num_tool_calls = len(tool_calls) if isinstance(tool_calls, list) else 1
if num_tool_calls > 0:
self._conn.execute(
"""UPDATE sessions SET message_count = message_count + 1,
tool_call_count = tool_call_count + ? WHERE id = ?""",
(num_tool_calls, session_id),
)
else:
self._conn.execute(
"UPDATE sessions SET message_count = message_count + 1 WHERE id = ?",
(session_id,),
)
self._conn.commit()
return msg_id
def get_messages(self, session_id: str) -> List[Dict[str, Any]]:
"""Load all messages for a session, ordered by timestamp."""
cursor = self._conn.execute(
"SELECT * FROM messages WHERE session_id = ? ORDER BY timestamp, id",
(session_id,),
)
rows = cursor.fetchall()
with self._lock:
cursor = self._conn.execute(
"SELECT * FROM messages WHERE session_id = ? ORDER BY timestamp, id",
(session_id,),
)
rows = cursor.fetchall()
result = []
for row in rows:
msg = dict(row)
@@ -533,13 +658,15 @@ class SessionDB:
Load messages in the OpenAI conversation format (role + content dicts).
Used by the gateway to restore conversation history.
"""
cursor = self._conn.execute(
"SELECT role, content, tool_call_id, tool_calls, tool_name "
"FROM messages WHERE session_id = ? ORDER BY timestamp, id",
(session_id,),
)
with self._lock:
cursor = self._conn.execute(
"SELECT role, content, tool_call_id, tool_calls, tool_name "
"FROM messages WHERE session_id = ? ORDER BY timestamp, id",
(session_id,),
)
rows = cursor.fetchall()
messages = []
for row in cursor.fetchall():
for row in rows:
msg = {"role": row["role"], "content": row["content"]}
if row["tool_call_id"]:
msg["tool_call_id"] = row["tool_call_id"]
@@ -566,21 +693,45 @@ class SessionDB:
``NOT``) have special meaning. Passing raw user input directly to
MATCH can cause ``sqlite3.OperationalError``.
Strategy: strip characters that are only meaningful as FTS5 operators
and would otherwise cause syntax errors. This preserves normal keyword
search while preventing crashes on inputs like ``C++``, ``"unterminated``,
or ``hello AND``.
Strategy:
- Preserve properly paired quoted phrases (``"exact phrase"``)
- Strip unmatched FTS5-special characters that would cause errors
- Wrap unquoted hyphenated terms in quotes so FTS5 matches them
as exact phrases instead of splitting on the hyphen
"""
# Remove FTS5-special characters that are not useful in keyword search
sanitized = re.sub(r'[+{}()"^]', " ", query)
# Collapse repeated * (e.g. "***") into a single one, and remove
# leading * (prefix-only matching requires at least one char before *)
# Step 1: Extract balanced double-quoted phrases and protect them
# from further processing via numbered placeholders.
_quoted_parts: list = []
def _preserve_quoted(m: re.Match) -> str:
_quoted_parts.append(m.group(0))
return f"\x00Q{len(_quoted_parts) - 1}\x00"
sanitized = re.sub(r'"[^"]*"', _preserve_quoted, query)
# Step 2: Strip remaining (unmatched) FTS5-special characters
sanitized = re.sub(r'[+{}()\"^]', " ", sanitized)
# Step 3: Collapse repeated * (e.g. "***") into a single one,
# and remove leading * (prefix-only needs at least one char before *)
sanitized = re.sub(r"\*+", "*", sanitized)
sanitized = re.sub(r"(^|\s)\*", r"\1", sanitized)
# Remove dangling boolean operators at start/end that would cause
# syntax errors (e.g. "hello AND" or "OR world")
# Step 4: Remove dangling boolean operators at start/end that would
# cause syntax errors (e.g. "hello AND" or "OR world")
sanitized = re.sub(r"(?i)^(AND|OR|NOT)\b\s*", "", sanitized.strip())
sanitized = re.sub(r"(?i)\s+(AND|OR|NOT)\s*$", "", sanitized.strip())
# Step 5: Wrap unquoted hyphenated terms (e.g. ``chat-send``) in
# double quotes. FTS5's tokenizer splits on hyphens, turning
# ``chat-send`` into ``chat AND send``. Quoting preserves the
# intended phrase match.
sanitized = re.sub(r"\b(\w+(?:-\w+)+)\b", r'"\1"', sanitized)
# Step 6: Restore preserved quoted phrases
for i, quoted in enumerate(_quoted_parts):
sanitized = sanitized.replace(f"\x00Q{i}\x00", quoted)
return sanitized.strip()
def search_messages(
@@ -610,16 +761,14 @@ class SessionDB:
if not query:
return []
if source_filter is None:
source_filter = ["cli", "telegram", "discord", "whatsapp", "slack"]
# Build WHERE clauses dynamically
where_clauses = ["messages_fts MATCH ?"]
params: list = [query]
source_placeholders = ",".join("?" for _ in source_filter)
where_clauses.append(f"s.source IN ({source_placeholders})")
params.extend(source_filter)
if source_filter is not None:
source_placeholders = ",".join("?" for _ in source_filter)
where_clauses.append(f"s.source IN ({source_placeholders})")
params.extend(source_filter)
if role_filter:
role_placeholders = ",".join("?" for _ in role_filter)
@@ -649,31 +798,33 @@ class SessionDB:
LIMIT ? OFFSET ?
"""
try:
cursor = self._conn.execute(sql, params)
except sqlite3.OperationalError:
# FTS5 query syntax error despite sanitization — return empty
return []
matches = [dict(row) for row in cursor.fetchall()]
# Add surrounding context (1 message before + after each match)
for match in matches:
with self._lock:
try:
ctx_cursor = self._conn.execute(
"""SELECT role, content FROM messages
WHERE session_id = ? AND id >= ? - 1 AND id <= ? + 1
ORDER BY id""",
(match["session_id"], match["id"], match["id"]),
)
context_msgs = [
{"role": r["role"], "content": (r["content"] or "")[:200]}
for r in ctx_cursor.fetchall()
]
match["context"] = context_msgs
except Exception:
match["context"] = []
cursor = self._conn.execute(sql, params)
except sqlite3.OperationalError:
# FTS5 query syntax error despite sanitization — return empty
return []
matches = [dict(row) for row in cursor.fetchall()]
# Remove full content from result (snippet is enough, saves tokens)
# Add surrounding context (1 message before + after each match)
for match in matches:
try:
ctx_cursor = self._conn.execute(
"""SELECT role, content FROM messages
WHERE session_id = ? AND id >= ? - 1 AND id <= ? + 1
ORDER BY id""",
(match["session_id"], match["id"], match["id"]),
)
context_msgs = [
{"role": r["role"], "content": (r["content"] or "")[:200]}
for r in ctx_cursor.fetchall()
]
match["context"] = context_msgs
except Exception:
match["context"] = []
# Remove full content from result (snippet is enough, saves tokens)
for match in matches:
match.pop("content", None)
return matches
@@ -685,17 +836,18 @@ class SessionDB:
offset: int = 0,
) -> List[Dict[str, Any]]:
"""List sessions, optionally filtered by source."""
if source:
cursor = self._conn.execute(
"SELECT * FROM sessions WHERE source = ? ORDER BY started_at DESC LIMIT ? OFFSET ?",
(source, limit, offset),
)
else:
cursor = self._conn.execute(
"SELECT * FROM sessions ORDER BY started_at DESC LIMIT ? OFFSET ?",
(limit, offset),
)
return [dict(row) for row in cursor.fetchall()]
with self._lock:
if source:
cursor = self._conn.execute(
"SELECT * FROM sessions WHERE source = ? ORDER BY started_at DESC LIMIT ? OFFSET ?",
(source, limit, offset),
)
else:
cursor = self._conn.execute(
"SELECT * FROM sessions ORDER BY started_at DESC LIMIT ? OFFSET ?",
(limit, offset),
)
return [dict(row) for row in cursor.fetchall()]
# =========================================================================
# Utility
@@ -747,26 +899,28 @@ class SessionDB:
def clear_messages(self, session_id: str) -> None:
"""Delete all messages for a session and reset its counters."""
self._conn.execute(
"DELETE FROM messages WHERE session_id = ?", (session_id,)
)
self._conn.execute(
"UPDATE sessions SET message_count = 0, tool_call_count = 0 WHERE id = ?",
(session_id,),
)
self._conn.commit()
with self._lock:
self._conn.execute(
"DELETE FROM messages WHERE session_id = ?", (session_id,)
)
self._conn.execute(
"UPDATE sessions SET message_count = 0, tool_call_count = 0 WHERE id = ?",
(session_id,),
)
self._conn.commit()
def delete_session(self, session_id: str) -> bool:
"""Delete a session and all its messages. Returns True if found."""
cursor = self._conn.execute(
"SELECT COUNT(*) FROM sessions WHERE id = ?", (session_id,)
)
if cursor.fetchone()[0] == 0:
return False
self._conn.execute("DELETE FROM messages WHERE session_id = ?", (session_id,))
self._conn.execute("DELETE FROM sessions WHERE id = ?", (session_id,))
self._conn.commit()
return True
with self._lock:
cursor = self._conn.execute(
"SELECT COUNT(*) FROM sessions WHERE id = ?", (session_id,)
)
if cursor.fetchone()[0] == 0:
return False
self._conn.execute("DELETE FROM messages WHERE session_id = ?", (session_id,))
self._conn.execute("DELETE FROM sessions WHERE id = ?", (session_id,))
self._conn.commit()
return True
def prune_sessions(self, older_than_days: int = 90, source: str = None) -> int:
"""
@@ -776,22 +930,23 @@ class SessionDB:
import time as _time
cutoff = _time.time() - (older_than_days * 86400)
if source:
cursor = self._conn.execute(
"""SELECT id FROM sessions
WHERE started_at < ? AND ended_at IS NOT NULL AND source = ?""",
(cutoff, source),
)
else:
cursor = self._conn.execute(
"SELECT id FROM sessions WHERE started_at < ? AND ended_at IS NOT NULL",
(cutoff,),
)
session_ids = [row["id"] for row in cursor.fetchall()]
with self._lock:
if source:
cursor = self._conn.execute(
"""SELECT id FROM sessions
WHERE started_at < ? AND ended_at IS NOT NULL AND source = ?""",
(cutoff, source),
)
else:
cursor = self._conn.execute(
"SELECT id FROM sessions WHERE started_at < ? AND ended_at IS NOT NULL",
(cutoff,),
)
session_ids = [row["id"] for row in cursor.fetchall()]
for sid in session_ids:
self._conn.execute("DELETE FROM messages WHERE session_id = ?", (sid,))
self._conn.execute("DELETE FROM sessions WHERE id = ?", (sid,))
for sid in session_ids:
self._conn.execute("DELETE FROM messages WHERE session_id = ?", (sid,))
self._conn.execute("DELETE FROM sessions WHERE id = ?", (sid,))
self._conn.commit()
self._conn.commit()
return len(session_ids)
+48 -14
View File
@@ -69,6 +69,8 @@ class HonchoClientConfig:
workspace_id: str = "hermes"
api_key: str | None = None
environment: str = "production"
# Optional base URL for self-hosted Honcho (overrides environment mapping)
base_url: str | None = None
# Identity
peer_name: str | None = None
ai_peer: str = "hermes"
@@ -114,11 +116,14 @@ class HonchoClientConfig:
@classmethod
def from_env(cls, workspace_id: str = "hermes") -> HonchoClientConfig:
"""Create config from environment variables (fallback)."""
api_key = os.environ.get("HONCHO_API_KEY")
base_url = os.environ.get("HONCHO_BASE_URL", "").strip() or None
return cls(
workspace_id=workspace_id,
api_key=os.environ.get("HONCHO_API_KEY"),
api_key=api_key,
environment=os.environ.get("HONCHO_ENVIRONMENT", "production"),
enabled=True,
base_url=base_url,
enabled=bool(api_key or base_url),
)
@classmethod
@@ -168,8 +173,14 @@ class HonchoClientConfig:
or raw.get("environment", "production")
)
# Auto-enable when API key is present (unless explicitly disabled)
# Host-level enabled wins, then root-level, then auto-enable if key exists.
base_url = (
raw.get("baseUrl")
or os.environ.get("HONCHO_BASE_URL", "").strip()
or None
)
# Auto-enable when API key or base_url is present (unless explicitly disabled)
# Host-level enabled wins, then root-level, then auto-enable if key/url exists.
host_enabled = host_block.get("enabled")
root_enabled = raw.get("enabled")
if host_enabled is not None:
@@ -177,8 +188,8 @@ class HonchoClientConfig:
elif root_enabled is not None:
enabled = root_enabled
else:
# Not explicitly set anywhere -> auto-enable if API key exists
enabled = bool(api_key)
# Not explicitly set anywhere -> auto-enable if API key or base_url exists
enabled = bool(api_key or base_url)
# write_frequency: accept int or string
raw_wf = (
@@ -211,6 +222,7 @@ class HonchoClientConfig:
workspace_id=workspace,
api_key=api_key,
environment=environment,
base_url=base_url,
peer_name=host_block.get("peerName") or raw.get("peerName"),
ai_peer=ai_peer,
linked_hosts=linked_hosts,
@@ -345,11 +357,12 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
if config is None:
config = HonchoClientConfig.from_global_config()
if not config.api_key:
if not config.api_key and not config.base_url:
raise ValueError(
"Honcho API key not found. "
"Get your API key at https://app.honcho.dev, "
"then run 'hermes honcho setup' or set HONCHO_API_KEY."
"then run 'hermes honcho setup' or set HONCHO_API_KEY. "
"For local instances, set HONCHO_BASE_URL instead."
)
try:
@@ -360,13 +373,34 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
"Install it with: pip install honcho-ai"
)
logger.info("Initializing Honcho client (host: %s, workspace: %s)", config.host, config.workspace_id)
# Allow config.yaml honcho.base_url to override the SDK's environment
# mapping, enabling remote self-hosted Honcho deployments without
# requiring the server to live on localhost.
resolved_base_url = config.base_url
if not resolved_base_url:
try:
from hermes_cli.config import load_config
hermes_cfg = load_config()
honcho_cfg = hermes_cfg.get("honcho", {})
if isinstance(honcho_cfg, dict):
resolved_base_url = honcho_cfg.get("base_url", "").strip() or None
except Exception:
pass
_honcho_client = Honcho(
workspace_id=config.workspace_id,
api_key=config.api_key,
environment=config.environment,
)
if resolved_base_url:
logger.info("Initializing Honcho client (base_url: %s, workspace: %s)", resolved_base_url, config.workspace_id)
else:
logger.info("Initializing Honcho client (host: %s, workspace: %s)", config.host, config.workspace_id)
kwargs: dict = {
"workspace_id": config.workspace_id,
"api_key": config.api_key,
"environment": config.environment,
}
if resolved_base_url:
kwargs["base_url"] = resolved_base_url
_honcho_client = Honcho(**kwargs)
return _honcho_client
+6 -1
View File
@@ -927,6 +927,11 @@ class HonchoSessionManager:
return False
assistant_peer = self._get_or_create_peer(session.assistant_peer_id)
honcho_session = self._sessions_cache.get(session.honcho_session_id)
if not honcho_session:
logger.warning("No Honcho session cached for '%s', skipping AI seed", session_key)
return False
try:
wrapped = (
f"<ai_identity_seed>\n"
@@ -935,7 +940,7 @@ class HonchoSessionManager:
f"{content.strip()}\n"
f"</ai_identity_seed>"
)
assistant_peer.add_message("assistant", wrapped)
honcho_session.add_messages([assistant_peer.message(wrapped)])
logger.info("Seeded AI identity from '%s' into %s", source, session_key)
return True
except Exception as e:
+142 -14
View File
@@ -24,6 +24,7 @@ import json
import asyncio
import os
import logging
import threading
from typing import Dict, Any, List, Optional, Tuple
from tools.registry import registry
@@ -36,6 +37,48 @@ logger = logging.getLogger(__name__)
# Async Bridging (single source of truth -- used by registry.dispatch too)
# =============================================================================
_tool_loop = None # persistent loop for the main (CLI) thread
_tool_loop_lock = threading.Lock()
_worker_thread_local = threading.local() # per-worker-thread persistent loops
def _get_tool_loop():
"""Return a long-lived event loop for running async tool handlers.
Using a persistent loop (instead of asyncio.run() which creates and
*closes* a fresh loop every time) prevents "Event loop is closed"
errors that occur when cached httpx/AsyncOpenAI clients attempt to
close their transport on a dead loop during garbage collection.
"""
global _tool_loop
with _tool_loop_lock:
if _tool_loop is None or _tool_loop.is_closed():
_tool_loop = asyncio.new_event_loop()
return _tool_loop
def _get_worker_loop():
"""Return a persistent event loop for the current worker thread.
Each worker thread (e.g., delegate_task's ThreadPoolExecutor threads)
gets its own long-lived loop stored in thread-local storage. This
prevents the "Event loop is closed" errors that occurred when
asyncio.run() was used per-call: asyncio.run() creates a loop, runs
the coroutine, then *closes* the loop but cached httpx/AsyncOpenAI
clients remain bound to that now-dead loop and raise RuntimeError
during garbage collection or subsequent use.
By keeping the loop alive for the thread's lifetime, cached clients
stay valid and their cleanup runs on a live loop.
"""
loop = getattr(_worker_thread_local, 'loop', None)
if loop is None or loop.is_closed():
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
_worker_thread_local.loop = loop
return loop
def _run_async(coro):
"""Run an async coroutine from a sync context.
@@ -44,6 +87,15 @@ def _run_async(coro):
disposable thread so asyncio.run() can create its own loop without
conflicting.
For the common CLI path (no running loop), we use a persistent event
loop so that cached async clients (httpx / AsyncOpenAI) remain bound
to a live loop and don't trigger "Event loop is closed" on GC.
When called from a worker thread (parallel tool execution), we use a
per-thread persistent loop to avoid both contention with the main
thread's shared loop AND the "Event loop is closed" errors caused by
asyncio.run()'s create-and-destroy lifecycle.
This is the single source of truth for sync->async bridging in tool
handlers. The RL paths (agent_loop.py, tool_context.py) also provide
outer thread-pool wrapping as defense-in-depth, but each handler is
@@ -55,11 +107,23 @@ def _run_async(coro):
loop = None
if loop and loop.is_running():
# Inside an async context (gateway, RL env) — run in a fresh thread.
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
future = pool.submit(asyncio.run, coro)
return future.result(timeout=300)
return asyncio.run(coro)
# If we're on a worker thread (e.g., parallel tool execution in
# delegate_task), use a per-thread persistent loop. This avoids
# contention with the main thread's shared loop while keeping cached
# httpx/AsyncOpenAI clients bound to a live loop for the thread's
# lifetime — preventing "Event loop is closed" on GC cleanup.
if threading.current_thread() is not threading.main_thread():
worker_loop = _get_worker_loop()
return worker_loop.run_until_complete(coro)
tool_loop = _get_tool_loop()
return tool_loop.run_until_complete(coro)
# =============================================================================
@@ -101,7 +165,7 @@ def _discover_tools():
try:
importlib.import_module(mod_name)
except Exception as e:
logger.debug("Could not import %s: %s", mod_name, e)
logger.warning("Could not import tool module %s: %s", mod_name, e)
_discover_tools()
@@ -113,6 +177,13 @@ try:
except Exception as e:
logger.debug("MCP tool discovery failed: %s", e)
# Plugin tool discovery (user/project/pip plugins)
try:
from hermes_cli.plugins import discover_plugins
discover_plugins()
except Exception as e:
logger.debug("Plugin discovery failed: %s", e)
# =============================================================================
# Backward-compat constants (built once after discovery)
@@ -142,7 +213,7 @@ _LEGACY_TOOLSET_MAP = {
"browser_navigate", "browser_snapshot", "browser_click",
"browser_type", "browser_scroll", "browser_back",
"browser_press", "browser_close", "browser_get_images",
"browser_vision"
"browser_vision", "browser_console"
],
"cronjob_tools": ["cronjob"],
"rl_tools": [
@@ -222,21 +293,58 @@ def get_tool_definitions(
for ts_name in get_all_toolsets():
tools_to_include.update(resolve_toolset(ts_name))
# Always include plugin-registered tools — they bypass the toolset filter
# because their toolsets are dynamic (created at plugin load time).
try:
from hermes_cli.plugins import get_plugin_tool_names
plugin_tools = get_plugin_tool_names()
if plugin_tools:
tools_to_include.update(plugin_tools)
except Exception:
pass
# Ask the registry for schemas (only returns tools whose check_fn passes)
filtered_tools = registry.get_definitions(tools_to_include, quiet=quiet_mode)
# The set of tool names that actually passed check_fn filtering.
# Use this (not tools_to_include) for any downstream schema that references
# other tools by name — otherwise the model sees tools mentioned in
# descriptions that don't actually exist, and hallucinates calls to them.
available_tool_names = {t["function"]["name"] for t in filtered_tools}
# Rebuild execute_code schema to only list sandbox tools that are actually
# enabled. Without this, the model sees "web_search is available in
# execute_code" even when the user disabled the web toolset (#560-discord).
if "execute_code" in tools_to_include:
# available. Without this, the model sees "web_search is available in
# execute_code" even when the API key isn't configured or the toolset is
# disabled (#560-discord).
if "execute_code" in available_tool_names:
from tools.code_execution_tool import SANDBOX_ALLOWED_TOOLS, build_execute_code_schema
sandbox_enabled = SANDBOX_ALLOWED_TOOLS & tools_to_include
sandbox_enabled = SANDBOX_ALLOWED_TOOLS & available_tool_names
dynamic_schema = build_execute_code_schema(sandbox_enabled)
for i, td in enumerate(filtered_tools):
if td.get("function", {}).get("name") == "execute_code":
filtered_tools[i] = {"type": "function", "function": dynamic_schema}
break
# Strip web tool cross-references from browser_navigate description when
# web_search / web_extract are not available. The static schema says
# "prefer web_search or web_extract" which causes the model to hallucinate
# those tools when they're missing.
if "browser_navigate" in available_tool_names:
web_tools_available = {"web_search", "web_extract"} & available_tool_names
if not web_tools_available:
for i, td in enumerate(filtered_tools):
if td.get("function", {}).get("name") == "browser_navigate":
desc = td["function"].get("description", "")
desc = desc.replace(
" For simple information retrieval, prefer web_search or web_extract (faster, cheaper).",
"",
)
filtered_tools[i] = {
"type": "function",
"function": {**td["function"], "description": desc},
}
break
if not quiet_mode:
if filtered_tools:
tool_names = [t["function"]["name"] for t in filtered_tools]
@@ -259,6 +367,7 @@ def get_tool_definitions(
# The registry still holds their schemas; dispatch just returns a stub error
# so if something slips through, the LLM sees a sensible message.
_AGENT_LOOP_TOOLS = {"todo", "memory", "session_search", "delegate_task"}
_READ_SEARCH_TOOLS = {"read_file", "search_files"}
def handle_function_call(
@@ -267,6 +376,8 @@ def handle_function_call(
task_id: Optional[str] = None,
user_task: Optional[str] = None,
enabled_tools: Optional[List[str]] = None,
honcho_manager: Optional[Any] = None,
honcho_session_key: Optional[str] = None,
) -> str:
"""
Main function call dispatcher that routes calls to the tool registry.
@@ -286,7 +397,6 @@ def handle_function_call(
"""
# Notify the read-loop tracker when a non-read/search tool runs,
# so the *consecutive* counter resets (reads after other work are fine).
_READ_SEARCH_TOOLS = {"read_file", "search_files"}
if function_name not in _READ_SEARCH_TOOLS:
try:
from tools.file_tools import notify_other_tool_call
@@ -298,21 +408,39 @@ def handle_function_call(
if function_name in _AGENT_LOOP_TOOLS:
return json.dumps({"error": f"{function_name} must be handled by the agent loop"})
try:
from hermes_cli.plugins import invoke_hook
invoke_hook("pre_tool_call", tool_name=function_name, args=function_args, task_id=task_id or "")
except Exception:
pass
if function_name == "execute_code":
# Prefer the caller-provided list so subagents can't overwrite
# the parent's tool set via the process-global.
sandbox_enabled = enabled_tools if enabled_tools is not None else _last_resolved_tool_names
return registry.dispatch(
result = registry.dispatch(
function_name, function_args,
task_id=task_id,
enabled_tools=sandbox_enabled,
honcho_manager=honcho_manager,
honcho_session_key=honcho_session_key,
)
else:
result = registry.dispatch(
function_name, function_args,
task_id=task_id,
user_task=user_task,
honcho_manager=honcho_manager,
honcho_session_key=honcho_session_key,
)
return registry.dispatch(
function_name, function_args,
task_id=task_id,
user_task=user_task,
)
try:
from hermes_cli.plugins import invoke_hook
invoke_hook("post_tool_call", tool_name=function_name, args=function_args, result=result, task_id=task_id or "")
except Exception:
pass
return result
except Exception as e:
error_msg = f"Error executing {function_name}: {str(e)}"
+231
View File
@@ -0,0 +1,231 @@
---
name: base
description: Query Base (Ethereum L2) blockchain data with USD pricing — wallet balances, token info, transaction details, gas analysis, contract inspection, whale detection, and live network stats. Uses Base RPC + CoinGecko. No API key required.
version: 0.1.0
author: youssefea
license: MIT
metadata:
hermes:
tags: [Base, Blockchain, Crypto, Web3, RPC, DeFi, EVM, L2, Ethereum]
related_skills: []
---
# Base Blockchain Skill
Query Base (Ethereum L2) on-chain data enriched with USD pricing via CoinGecko.
8 commands: wallet portfolio, token info, transactions, gas analysis,
contract inspection, whale detection, network stats, and price lookup.
No API key needed. Uses only Python standard library (urllib, json, argparse).
---
## When to Use
- User asks for a Base wallet balance, token holdings, or portfolio value
- User wants to inspect a specific transaction by hash
- User wants ERC-20 token metadata, price, supply, or market cap
- User wants to understand Base gas costs and L1 data fees
- User wants to inspect a contract (ERC type detection, proxy resolution)
- User wants to find large ETH transfers (whale detection)
- User wants Base network health, gas price, or ETH price
- User asks "what's the price of USDC/AERO/DEGEN/ETH?"
---
## Prerequisites
The helper script uses only Python standard library (urllib, json, argparse).
No external packages required.
Pricing data comes from CoinGecko's free API (no key needed, rate-limited
to ~10-30 requests/minute). For faster lookups, use `--no-prices` flag.
---
## Quick Reference
RPC endpoint (default): https://mainnet.base.org
Override: export BASE_RPC_URL=https://your-private-rpc.com
Helper script path: ~/.hermes/skills/blockchain/base/scripts/base_client.py
```
python3 base_client.py wallet <address> [--limit N] [--all] [--no-prices]
python3 base_client.py tx <hash>
python3 base_client.py token <contract_address>
python3 base_client.py gas
python3 base_client.py contract <address>
python3 base_client.py whales [--min-eth N]
python3 base_client.py stats
python3 base_client.py price <contract_address_or_symbol>
```
---
## Procedure
### 0. Setup Check
```bash
python3 --version
# Optional: set a private RPC for better rate limits
export BASE_RPC_URL="https://mainnet.base.org"
# Confirm connectivity
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py stats
```
### 1. Wallet Portfolio
Get ETH balance and ERC-20 token holdings with USD values.
Checks ~15 well-known Base tokens (USDC, WETH, AERO, DEGEN, etc.)
via on-chain `balanceOf` calls. Tokens sorted by value, dust filtered.
```bash
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
wallet 0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045
```
Flags:
- `--limit N` — show top N tokens (default: 20)
- `--all` — show all tokens, no dust filter, no limit
- `--no-prices` — skip CoinGecko price lookups (faster, RPC-only)
Output includes: ETH balance + USD value, token list with prices sorted
by value, dust count, total portfolio value in USD.
Note: Only checks known tokens. Unknown ERC-20s are not discovered.
Use the `token` command with a specific contract address for any token.
### 2. Transaction Details
Inspect a full transaction by its hash. Shows ETH value transferred,
gas used, fee in ETH/USD, status, and decoded ERC-20/ERC-721 transfers.
```bash
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
tx 0xabc123...your_tx_hash_here
```
Output: hash, block, from, to, value (ETH + USD), gas price, gas used,
fee, status, contract creation address (if any), token transfers.
### 3. Token Info
Get ERC-20 token metadata: name, symbol, decimals, total supply, price,
market cap, and contract code size.
```bash
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
token 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
```
Output: name, symbol, decimals, total supply, price, market cap.
Reads name/symbol/decimals directly from the contract via eth_call.
### 4. Gas Analysis
Detailed gas analysis with cost estimates for common operations.
Shows current gas price, base fee trends over 10 blocks, block
utilization, and estimated costs for ETH transfers, ERC-20 transfers,
and swaps.
```bash
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py gas
```
Output: current gas price, base fee, block utilization, 10-block trend,
cost estimates in ETH and USD.
Note: Base is an L2 — actual transaction costs include an L1 data
posting fee that depends on calldata size and L1 gas prices. The
estimates shown are for L2 execution only.
### 5. Contract Inspection
Inspect an address: determine if it's an EOA or contract, detect
ERC-20/ERC-721/ERC-1155 interfaces, resolve EIP-1967 proxy
implementation addresses.
```bash
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
contract 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
```
Output: is_contract, code size, ETH balance, detected interfaces
(ERC-20, ERC-721, ERC-1155), ERC-20 metadata, proxy implementation
address.
### 6. Whale Detector
Scan the most recent block for large ETH transfers with USD values.
```bash
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
whales --min-eth 1.0
```
Note: scans the latest block only — point-in-time snapshot, not historical.
Default threshold is 1.0 ETH (lower than Solana's default since ETH
values are higher).
### 7. Network Stats
Live Base network health: latest block, chain ID, gas price, base fee,
block utilization, transaction count, and ETH price.
```bash
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py stats
```
### 8. Price Lookup
Quick price check for any token by contract address or known symbol.
```bash
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price ETH
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price USDC
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price AERO
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price DEGEN
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
```
Known symbols: ETH, WETH, USDC, cbETH, AERO, DEGEN, TOSHI, BRETT,
WELL, wstETH, rETH, cbBTC.
---
## Pitfalls
- **CoinGecko rate-limits** — free tier allows ~10-30 requests/minute.
Price lookups use 1 request per token. Use `--no-prices` for speed.
- **Public RPC rate-limits** — Base's public RPC limits requests.
For production use, set BASE_RPC_URL to a private endpoint
(Alchemy, QuickNode, Infura).
- **Wallet shows known tokens only** — unlike Solana, EVM chains have no
built-in "get all tokens" RPC. The wallet command checks ~15 popular
Base tokens via `balanceOf`. Unknown ERC-20s won't appear. Use the
`token` command for any specific contract.
- **Token names read from contract** — if a contract doesn't implement
`name()` or `symbol()`, these fields may be empty. Known tokens have
hardcoded labels as fallback.
- **Gas estimates are L2 only** — Base transaction costs include an L1
data posting fee (depends on calldata size and L1 gas prices). The gas
command estimates L2 execution cost only.
- **Whale detector scans latest block only** — not historical. Results
vary by the moment you query. Default threshold is 1.0 ETH.
- **Proxy detection** — only EIP-1967 proxies are detected. Other proxy
patterns (EIP-1167 minimal proxy, custom storage slots) are not checked.
- **Retry on 429** — both RPC and CoinGecko calls retry up to 2 times
with exponential backoff on rate-limit errors.
---
## Verification
```bash
# Should print Base chain ID (8453), latest block, gas price, and ETH price
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py stats
```
File diff suppressed because it is too large Load Diff
@@ -0,0 +1,116 @@
---
name: blender-mcp
description: Control Blender directly from Hermes via socket connection to the blender-mcp addon. Create 3D objects, materials, animations, and run arbitrary Blender Python (bpy) code. Use when user wants to create or modify anything in Blender.
version: 1.0.0
requires: Blender 4.3+ (desktop instance required, headless not supported)
author: alireza78a
tags: [blender, 3d, animation, modeling, bpy, mcp]
---
# Blender MCP
Control a running Blender instance from Hermes via socket on TCP port 9876.
## Setup (one-time)
### 1. Install the Blender addon
curl -sL https://raw.githubusercontent.com/ahujasid/blender-mcp/main/addon.py -o ~/Desktop/blender_mcp_addon.py
In Blender:
Edit > Preferences > Add-ons > Install > select blender_mcp_addon.py
Enable "Interface: Blender MCP"
### 2. Start the socket server in Blender
Press N in Blender viewport to open sidebar.
Find "BlenderMCP" tab and click "Start Server".
### 3. Verify connection
nc -z -w2 localhost 9876 && echo "OPEN" || echo "CLOSED"
## Protocol
Plain UTF-8 JSON over TCP -- no length prefix.
Send: {"type": "<command>", "params": {<kwargs>}}
Receive: {"status": "success", "result": <value>}
{"status": "error", "message": "<reason>"}
## Available Commands
| type | params | description |
|-------------------------|-------------------|---------------------------------|
| execute_code | code (str) | Run arbitrary bpy Python code |
| get_scene_info | (none) | List all objects in scene |
| get_object_info | object_name (str) | Details on a specific object |
| get_viewport_screenshot | (none) | Screenshot of current viewport |
## Python Helper
Use this inside execute_code tool calls:
import socket, json
def blender_exec(code: str, host="localhost", port=9876, timeout=15):
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host, port))
s.settimeout(timeout)
payload = json.dumps({"type": "execute_code", "params": {"code": code}})
s.sendall(payload.encode("utf-8"))
buf = b""
while True:
try:
chunk = s.recv(4096)
if not chunk:
break
buf += chunk
try:
json.loads(buf.decode("utf-8"))
break
except json.JSONDecodeError:
continue
except socket.timeout:
break
s.close()
return json.loads(buf.decode("utf-8"))
## Common bpy Patterns
### Clear scene
bpy.ops.object.select_all(action='SELECT')
bpy.ops.object.delete()
### Add mesh objects
bpy.ops.mesh.primitive_uv_sphere_add(radius=1, location=(0, 0, 0))
bpy.ops.mesh.primitive_cube_add(size=2, location=(3, 0, 0))
bpy.ops.mesh.primitive_cylinder_add(radius=0.5, depth=2, location=(-3, 0, 0))
### Create and assign material
mat = bpy.data.materials.new(name="MyMat")
mat.use_nodes = True
bsdf = mat.node_tree.nodes.get("Principled BSDF")
bsdf.inputs["Base Color"].default_value = (R, G, B, 1.0)
bsdf.inputs["Roughness"].default_value = 0.3
bsdf.inputs["Metallic"].default_value = 0.0
obj.data.materials.append(mat)
### Keyframe animation
obj.location = (0, 0, 0)
obj.keyframe_insert(data_path="location", frame=1)
obj.location = (0, 0, 3)
obj.keyframe_insert(data_path="location", frame=60)
### Render to file
bpy.context.scene.render.filepath = "/tmp/render.png"
bpy.context.scene.render.engine = 'CYCLES'
bpy.ops.render.render(write_still=True)
## Pitfalls
- Must check socket is open before running (nc -z localhost 9876)
- Addon server must be started inside Blender each session (N-panel > BlenderMCP > Connect)
- Break complex scenes into multiple smaller execute_code calls to avoid timeouts
- Render output path must be absolute (/tmp/...) not relative
- shade_smooth() requires object to be selected and in object mode
+3
View File
@@ -0,0 +1,3 @@
# MCP
Skills for building, testing, and deploying MCP (Model Context Protocol) servers.
+299
View File
@@ -0,0 +1,299 @@
---
name: fastmcp
description: Build, test, inspect, install, and deploy MCP servers with FastMCP in Python. Use when creating a new MCP server, wrapping an API or database as MCP tools, exposing resources or prompts, or preparing a FastMCP server for Claude Code, Cursor, or HTTP deployment.
version: 1.0.0
author: Hermes Agent
license: MIT
metadata:
hermes:
tags: [MCP, FastMCP, Python, Tools, Resources, Prompts, Deployment]
homepage: https://gofastmcp.com
related_skills: [native-mcp, mcporter]
prerequisites:
commands: [python3]
---
# FastMCP
Build MCP servers in Python with FastMCP, validate them locally, install them into MCP clients, and deploy them as HTTP endpoints.
## When to Use
Use this skill when the task is to:
- create a new MCP server in Python
- wrap an API, database, CLI, or file-processing workflow as MCP tools
- expose resources or prompts in addition to tools
- smoke-test a server with the FastMCP CLI before wiring it into Hermes or another client
- install a server into Claude Code, Claude Desktop, Cursor, or a similar MCP client
- prepare a FastMCP server repo for HTTP deployment
Use `native-mcp` when the server already exists and only needs to be connected to Hermes. Use `mcporter` when the goal is ad-hoc CLI access to an existing MCP server instead of building one.
## Prerequisites
Install FastMCP in the working environment first:
```bash
pip install fastmcp
fastmcp version
```
For the API template, install `httpx` if it is not already present:
```bash
pip install httpx
```
## Included Files
### Templates
- `templates/api_wrapper.py` - REST API wrapper with auth header support
- `templates/database_server.py` - read-only SQLite query server
- `templates/file_processor.py` - text-file inspection and search server
### Scripts
- `scripts/scaffold_fastmcp.py` - copy a starter template and replace the server name placeholder
### References
- `references/fastmcp-cli.md` - FastMCP CLI workflow, installation targets, and deployment checks
## Workflow
### 1. Pick the Smallest Viable Server Shape
Choose the narrowest useful surface area first:
- API wrapper: start with 1-3 high-value endpoints, not the whole API
- database server: expose read-only introspection and a constrained query path
- file processor: expose deterministic operations with explicit path arguments
- prompts/resources: add only when the client needs reusable prompt templates or discoverable documents
Prefer a thin server with good names, docstrings, and schemas over a large server with vague tools.
### 2. Scaffold from a Template
Copy a template directly or use the scaffold helper:
```bash
python ~/.hermes/skills/mcp/fastmcp/scripts/scaffold_fastmcp.py \
--template api_wrapper \
--name "Acme API" \
--output ./acme_server.py
```
Available templates:
```bash
python ~/.hermes/skills/mcp/fastmcp/scripts/scaffold_fastmcp.py --list
```
If copying manually, replace `__SERVER_NAME__` with a real server name.
### 3. Implement Tools First
Start with `@mcp.tool` functions before adding resources or prompts.
Rules for tool design:
- Give every tool a concrete verb-based name
- Write docstrings as user-facing tool descriptions
- Keep parameters explicit and typed
- Return structured JSON-safe data where possible
- Validate unsafe inputs early
- Prefer read-only behavior by default for first versions
Good tool examples:
- `get_customer`
- `search_tickets`
- `describe_table`
- `summarize_text_file`
Weak tool examples:
- `run`
- `process`
- `do_thing`
### 4. Add Resources and Prompts Only When They Help
Add `@mcp.resource` when the client benefits from fetching stable read-only content such as schemas, policy docs, or generated reports.
Add `@mcp.prompt` when the server should provide a reusable prompt template for a known workflow.
Do not turn every document into a prompt. Prefer:
- tools for actions
- resources for data/document retrieval
- prompts for reusable LLM instructions
### 5. Test the Server Before Integrating It Anywhere
Use the FastMCP CLI for local validation:
```bash
fastmcp inspect acme_server.py:mcp
fastmcp list acme_server.py --json
fastmcp call acme_server.py search_resources query=router limit=5 --json
```
For fast iterative debugging, run the server locally:
```bash
fastmcp run acme_server.py:mcp
```
To test HTTP transport locally:
```bash
fastmcp run acme_server.py:mcp --transport http --host 127.0.0.1 --port 8000
fastmcp list http://127.0.0.1:8000/mcp --json
fastmcp call http://127.0.0.1:8000/mcp search_resources query=router --json
```
Always run at least one real `fastmcp call` against each new tool before claiming the server works.
### 6. Install into a Client When Local Validation Passes
FastMCP can register the server with supported MCP clients:
```bash
fastmcp install claude-code acme_server.py
fastmcp install claude-desktop acme_server.py
fastmcp install cursor acme_server.py -e .
```
Use `fastmcp discover` to inspect named MCP servers already configured on the machine.
When the goal is Hermes integration, either:
- configure the server in `~/.hermes/config.yaml` using the `native-mcp` skill, or
- keep using FastMCP CLI commands during development until the interface stabilizes
### 7. Deploy After the Local Contract Is Stable
For managed hosting, Prefect Horizon is the path FastMCP documents most directly. Before deployment:
```bash
fastmcp inspect acme_server.py:mcp
```
Make sure the repo contains:
- a Python file with the FastMCP server object
- `requirements.txt` or `pyproject.toml`
- any environment-variable documentation needed for deployment
For generic HTTP hosting, validate the HTTP transport locally first, then deploy on any Python-compatible platform that can expose the server port.
## Common Patterns
### API Wrapper Pattern
Use when exposing a REST or HTTP API as MCP tools.
Recommended first slice:
- one read path
- one list/search path
- optional health check
Implementation notes:
- keep auth in environment variables, not hardcoded
- centralize request logic in one helper
- surface API errors with concise context
- normalize inconsistent upstream payloads before returning them
Start from `templates/api_wrapper.py`.
### Database Pattern
Use when exposing safe query and inspection capabilities.
Recommended first slice:
- `list_tables`
- `describe_table`
- one constrained read query tool
Implementation notes:
- default to read-only DB access
- reject non-`SELECT` SQL in early versions
- limit row counts
- return rows plus column names
Start from `templates/database_server.py`.
### File Processor Pattern
Use when the server needs to inspect or transform files on demand.
Recommended first slice:
- summarize file contents
- search within files
- extract deterministic metadata
Implementation notes:
- accept explicit file paths
- check for missing files and encoding failures
- cap previews and result counts
- avoid shelling out unless a specific external tool is required
Start from `templates/file_processor.py`.
## Quality Bar
Before handing off a FastMCP server, verify all of the following:
- server imports cleanly
- `fastmcp inspect <file.py:mcp>` succeeds
- `fastmcp list <server spec> --json` succeeds
- every new tool has at least one real `fastmcp call`
- environment variables are documented
- the tool surface is small enough to understand without guesswork
## Troubleshooting
### FastMCP command missing
Install the package in the active environment:
```bash
pip install fastmcp
fastmcp version
```
### `fastmcp inspect` fails
Check that:
- the file imports without side effects that crash
- the FastMCP instance is named correctly in `<file.py:object>`
- optional dependencies from the template are installed
### Tool works in Python but not through CLI
Run:
```bash
fastmcp list server.py --json
fastmcp call server.py your_tool_name --json
```
This usually exposes naming mismatches, missing required arguments, or non-serializable return values.
### Hermes cannot see the deployed server
The server-building part may be correct while the Hermes config is not. Load the `native-mcp` skill and configure the server in `~/.hermes/config.yaml`, then restart Hermes.
## References
For CLI details, install targets, and deployment checks, read `references/fastmcp-cli.md`.
@@ -0,0 +1,110 @@
# FastMCP CLI Reference
Use this file when the task needs exact FastMCP CLI workflows rather than the higher-level guidance in `SKILL.md`.
## Install and Verify
```bash
pip install fastmcp
fastmcp version
```
FastMCP documents `pip install fastmcp` and `fastmcp version` as the baseline installation and verification path.
## Run a Server
Run a server object from a Python file:
```bash
fastmcp run server.py:mcp
```
Run the same server over HTTP:
```bash
fastmcp run server.py:mcp --transport http --host 127.0.0.1 --port 8000
```
## Inspect a Server
Inspect what FastMCP will expose:
```bash
fastmcp inspect server.py:mcp
```
This is also the check FastMCP recommends before deploying to Prefect Horizon.
## List and Call Tools
List tools from a Python file:
```bash
fastmcp list server.py --json
```
List tools from an HTTP endpoint:
```bash
fastmcp list http://127.0.0.1:8000/mcp --json
```
Call a tool with key-value arguments:
```bash
fastmcp call server.py search_resources query=router limit=5 --json
```
Call a tool with a full JSON input payload:
```bash
fastmcp call server.py create_item '{"name": "Widget", "tags": ["sale"]}' --json
```
## Discover Named MCP Servers
Find named servers already configured in local MCP-aware tools:
```bash
fastmcp discover
```
FastMCP documents name-based resolution for Claude Desktop, Claude Code, Cursor, Gemini, Goose, and `./mcp.json`.
## Install into MCP Clients
Register a server with common clients:
```bash
fastmcp install claude-code server.py
fastmcp install claude-desktop server.py
fastmcp install cursor server.py -e .
```
FastMCP notes that client installs run in isolated environments, so declare dependencies explicitly when needed with flags such as `--with`, `--env-file`, or editable installs.
## Deployment Checks
### Prefect Horizon
Before pushing to Horizon:
```bash
fastmcp inspect server.py:mcp
```
FastMCPs Horizon docs expect:
- a GitHub repo
- a Python file containing the FastMCP server object
- dependencies declared in `requirements.txt` or `pyproject.toml`
- an entrypoint like `main.py:mcp`
### Generic HTTP Hosting
Before shipping to any other host:
1. Start the server locally with HTTP transport.
2. Verify `fastmcp list` against the local `/mcp` URL.
3. Verify at least one `fastmcp call`.
4. Document required environment variables.
@@ -0,0 +1,56 @@
#!/usr/bin/env python3
"""Copy a FastMCP starter template into a working file."""
from __future__ import annotations
import argparse
from pathlib import Path
SCRIPT_DIR = Path(__file__).resolve().parent
SKILL_DIR = SCRIPT_DIR.parent
TEMPLATE_DIR = SKILL_DIR / "templates"
PLACEHOLDER = "__SERVER_NAME__"
def list_templates() -> list[str]:
return sorted(path.stem for path in TEMPLATE_DIR.glob("*.py"))
def render_template(template_name: str, server_name: str) -> str:
template_path = TEMPLATE_DIR / f"{template_name}.py"
if not template_path.exists():
available = ", ".join(list_templates())
raise SystemExit(f"Unknown template '{template_name}'. Available: {available}")
return template_path.read_text(encoding="utf-8").replace(PLACEHOLDER, server_name)
def main() -> int:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument("--template", help="Template name without .py suffix")
parser.add_argument("--name", help="FastMCP server display name")
parser.add_argument("--output", help="Destination Python file path")
parser.add_argument("--force", action="store_true", help="Overwrite an existing output file")
parser.add_argument("--list", action="store_true", help="List available templates and exit")
args = parser.parse_args()
if args.list:
for name in list_templates():
print(name)
return 0
if not args.template or not args.name or not args.output:
parser.error("--template, --name, and --output are required unless --list is used")
output_path = Path(args.output).expanduser()
if output_path.exists() and not args.force:
raise SystemExit(f"Refusing to overwrite existing file: {output_path}")
output_path.parent.mkdir(parents=True, exist_ok=True)
output_path.write_text(render_template(args.template, args.name), encoding="utf-8")
print(f"Wrote {output_path}")
return 0
if __name__ == "__main__":
raise SystemExit(main())
@@ -0,0 +1,54 @@
from __future__ import annotations
import os
from typing import Any
import httpx
from fastmcp import FastMCP
mcp = FastMCP("__SERVER_NAME__")
API_BASE_URL = os.getenv("API_BASE_URL", "https://api.example.com")
API_TOKEN = os.getenv("API_TOKEN")
REQUEST_TIMEOUT = float(os.getenv("API_TIMEOUT_SECONDS", "20"))
def _headers() -> dict[str, str]:
headers = {"Accept": "application/json"}
if API_TOKEN:
headers["Authorization"] = f"Bearer {API_TOKEN}"
return headers
def _request(method: str, path: str, *, params: dict[str, Any] | None = None) -> Any:
url = f"{API_BASE_URL.rstrip('/')}/{path.lstrip('/')}"
with httpx.Client(timeout=REQUEST_TIMEOUT, headers=_headers()) as client:
response = client.request(method, url, params=params)
response.raise_for_status()
return response.json()
@mcp.tool
def health_check() -> dict[str, Any]:
"""Check whether the upstream API is reachable."""
payload = _request("GET", "/health")
return {"base_url": API_BASE_URL, "result": payload}
@mcp.tool
def get_resource(resource_id: str) -> dict[str, Any]:
"""Fetch one resource by ID from the upstream API."""
payload = _request("GET", f"/resources/{resource_id}")
return {"resource_id": resource_id, "data": payload}
@mcp.tool
def search_resources(query: str, limit: int = 10) -> dict[str, Any]:
"""Search upstream resources by query string."""
payload = _request("GET", "/resources", params={"q": query, "limit": limit})
return {"query": query, "limit": limit, "results": payload}
if __name__ == "__main__":
mcp.run()
@@ -0,0 +1,77 @@
from __future__ import annotations
import os
import re
import sqlite3
from typing import Any
from fastmcp import FastMCP
mcp = FastMCP("__SERVER_NAME__")
DATABASE_PATH = os.getenv("SQLITE_PATH", "./app.db")
MAX_ROWS = int(os.getenv("SQLITE_MAX_ROWS", "200"))
TABLE_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
def _connect() -> sqlite3.Connection:
return sqlite3.connect(f"file:{DATABASE_PATH}?mode=ro", uri=True)
def _reject_mutation(sql: str) -> None:
normalized = sql.strip().lower()
if not normalized.startswith("select"):
raise ValueError("Only SELECT queries are allowed")
def _validate_table_name(table_name: str) -> str:
if not TABLE_NAME_RE.fullmatch(table_name):
raise ValueError("Invalid table name")
return table_name
@mcp.tool
def list_tables() -> list[str]:
"""List user-defined SQLite tables."""
with _connect() as conn:
rows = conn.execute(
"SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
).fetchall()
return [row[0] for row in rows]
@mcp.tool
def describe_table(table_name: str) -> list[dict[str, Any]]:
"""Describe columns for a SQLite table."""
safe_table_name = _validate_table_name(table_name)
with _connect() as conn:
rows = conn.execute(f"PRAGMA table_info({safe_table_name})").fetchall()
return [
{
"cid": row[0],
"name": row[1],
"type": row[2],
"notnull": bool(row[3]),
"default": row[4],
"pk": bool(row[5]),
}
for row in rows
]
@mcp.tool
def query(sql: str, limit: int = 50) -> dict[str, Any]:
"""Run a read-only SELECT query and return rows plus column names."""
_reject_mutation(sql)
safe_limit = max(0, min(limit, MAX_ROWS))
wrapped_sql = f"SELECT * FROM ({sql.strip().rstrip(';')}) LIMIT {safe_limit}"
with _connect() as conn:
cursor = conn.execute(wrapped_sql)
columns = [column[0] for column in cursor.description or []]
rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
return {"limit": safe_limit, "columns": columns, "rows": rows}
if __name__ == "__main__":
mcp.run()
@@ -0,0 +1,55 @@
from __future__ import annotations
from pathlib import Path
from typing import Any
from fastmcp import FastMCP
mcp = FastMCP("__SERVER_NAME__")
def _read_text(path: str) -> str:
file_path = Path(path).expanduser()
try:
return file_path.read_text(encoding="utf-8")
except FileNotFoundError as exc:
raise ValueError(f"File not found: {file_path}") from exc
except UnicodeDecodeError as exc:
raise ValueError(f"File is not valid UTF-8 text: {file_path}") from exc
@mcp.tool
def summarize_text_file(path: str, preview_chars: int = 1200) -> dict[str, int | str]:
"""Return basic metadata and a preview for a UTF-8 text file."""
file_path = Path(path).expanduser()
text = _read_text(path)
return {
"path": str(file_path),
"characters": len(text),
"lines": len(text.splitlines()),
"preview": text[:preview_chars],
}
@mcp.tool
def search_text_file(path: str, needle: str, max_matches: int = 20) -> dict[str, Any]:
"""Find matching lines in a UTF-8 text file."""
file_path = Path(path).expanduser()
matches: list[dict[str, Any]] = []
for line_number, line in enumerate(_read_text(path).splitlines(), start=1):
if needle.lower() in line.lower():
matches.append({"line_number": line_number, "line": line})
if len(matches) >= max_matches:
break
return {"path": str(file_path), "needle": needle, "matches": matches}
@mcp.resource("file://{path}")
def read_file_resource(path: str) -> str:
"""Expose a text file as a resource."""
return _read_text(path)
if __name__ == "__main__":
mcp.run()
@@ -0,0 +1,422 @@
---
name: oss-forensics
description: |
Supply chain investigation, evidence recovery, and forensic analysis for GitHub repositories.
Covers deleted commit recovery, force-push detection, IOC extraction, multi-source evidence
collection, hypothesis formation/validation, and structured forensic reporting.
Inspired by RAPTOR's 1800+ line OSS Forensics system.
category: security
triggers:
- "investigate this repository"
- "investigate [owner/repo]"
- "check for supply chain compromise"
- "recover deleted commits"
- "forensic analysis of [owner/repo]"
- "was this repo compromised"
- "supply chain attack"
- "suspicious commit"
- "force push detected"
- "IOC extraction"
toolsets:
- terminal
- web
- file
- delegation
---
# OSS Security Forensics Skill
A 7-phase multi-agent investigation framework for researching open-source supply chain attacks.
Adapted from RAPTOR's forensics system. Covers GitHub Archive, Wayback Machine, GitHub API,
local git analysis, IOC extraction, evidence-backed hypothesis formation and validation,
and final forensic report generation.
---
## ⚠️ Anti-Hallucination Guardrails
Read these before every investigation step. Violating them invalidates the report.
1. **Evidence-First Rule**: Every claim in any report, hypothesis, or summary MUST cite at least one evidence ID (`EV-XXXX`). Assertions without citations are forbidden.
2. **STAY IN YOUR LANE**: Each sub-agent (investigator) has a single data source. Do NOT mix sources. The GH Archive investigator does not query the GitHub API, and vice versa. Role boundaries are hard.
3. **Fact vs. Hypothesis Separation**: Mark all unverified inferences with `[HYPOTHESIS]`. Only statements verified against original sources may be stated as facts.
4. **No Evidence Fabrication**: The hypothesis validator MUST mechanically check that every cited evidence ID actually exists in the evidence store before accepting a hypothesis.
5. **Proof-Required Disproval**: A hypothesis cannot be dismissed without a specific, evidence-backed counter-argument. "No evidence found" is not sufficient to disprove—it only makes a hypothesis inconclusive.
6. **SHA/URL Double-Verification**: Any commit SHA, URL, or external identifier cited as evidence must be independently confirmed from at least two sources before being marked as verified.
7. **Suspicious Code Rule**: Never run code found inside the investigated repository locally. Analyze statically only, or use `execute_code` in a sandboxed environment.
8. **Secret Redaction**: Any API keys, tokens, or credentials discovered during investigation must be redacted in the final report. Log them internally only.
---
## Example Scenarios
- **Scenario A: Dependency Confusion**: A malicious package `internal-lib-v2` is uploaded to NPM with a higher version than the internal one. The investigator must track when this package was first seen and if any PushEvents in the target repo updated `package.json` to this version.
- **Scenario B: Maintainer Takeover**: A long-term contributor's account is used to push a backdoored `.github/workflows/build.yml`. The investigator looks for PushEvents from this user after a long period of inactivity or from a new IP/location (if detectable via BigQuery).
- **Scenario C: Force-Push Hide**: A developer accidentally commits a production secret, then force-pushes to "fix" it. The investigator uses `git fsck` and GH Archive to recover the original commit SHA and verify what was leaked.
---
> **Path convention**: Throughout this skill, `SKILL_DIR` refers to the root of this skill's
> installation directory (the folder containing this `SKILL.md`). When the skill is loaded,
> resolve `SKILL_DIR` to the actual path — e.g. `~/.hermes/skills/security/oss-forensics/`
> or the `optional-skills/` equivalent. All script and template references are relative to it.
## Phase 0: Initialization
1. Create investigation working directory:
```bash
mkdir investigation_$(echo "REPO_NAME" | tr '/' '_')
cd investigation_$(echo "REPO_NAME" | tr '/' '_')
```
2. Initialize the evidence store:
```bash
python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json list
```
3. Copy the forensic report template:
```bash
cp SKILL_DIR/templates/forensic-report.md ./investigation-report.md
```
4. Create an `iocs.md` file to track Indicators of Compromise as they are discovered.
5. Record the investigation start time, target repository, and stated investigation goal.
---
## Phase 1: Prompt Parsing and IOC Extraction
**Goal**: Extract all structured investigative targets from the user's request.
**Actions**:
- Parse the user prompt and extract:
- Target repository (`owner/repo`)
- Target actors (GitHub handles, email addresses)
- Time window of interest (commit date ranges, PR timestamps)
- Provided Indicators of Compromise: commit SHAs, file paths, package names, IP addresses, domains, API keys/tokens, malicious URLs
- Any linked vendor security reports or blog posts
**Tools**: Reasoning only, or `execute_code` for regex extraction from large text blocks.
**Output**: Populate `iocs.md` with extracted IOCs. Each IOC must have:
- Type (from: COMMIT_SHA, FILE_PATH, API_KEY, SECRET, IP_ADDRESS, DOMAIN, PACKAGE_NAME, ACTOR_USERNAME, MALICIOUS_URL, OTHER)
- Value
- Source (user-provided, inferred)
**Reference**: See [evidence-types.md](./references/evidence-types.md) for IOC taxonomy.
---
## Phase 2: Parallel Evidence Collection
Spawn up to 5 specialist investigator sub-agents using `delegate_task` (batch mode, max 3 concurrent). Each investigator has a **single data source** and must not mix sources.
> **Orchestrator note**: Pass the IOC list from Phase 1 and the investigation time window in the `context` field of each delegated task.
---
### Investigator 1: Local Git Investigator
**ROLE BOUNDARY**: You query the LOCAL GIT REPOSITORY ONLY. Do not call any external APIs.
**Actions**:
```bash
# Clone repository
git clone https://github.com/OWNER/REPO.git target_repo && cd target_repo
# Full commit log with stats
git log --all --full-history --stat --format="%H|%ae|%an|%ai|%s" > ../git_log.txt
# Detect force-push evidence (orphaned/dangling commits)
git fsck --lost-found --unreachable 2>&1 | grep commit > ../dangling_commits.txt
# Check reflog for rewritten history
git reflog --all > ../reflog.txt
# List ALL branches including deleted remote refs
git branch -a -v > ../branches.txt
# Find suspicious large binary additions
git log --all --diff-filter=A --name-only --format="%H %ai" -- "*.so" "*.dll" "*.exe" "*.bin" > ../binary_additions.txt
# Check for GPG signature anomalies
git log --show-signature --format="%H %ai %aN" > ../signature_check.txt 2>&1
```
**Evidence to collect** (add via `python3 SKILL_DIR/scripts/evidence-store.py add`):
- Each dangling commit SHA → type: `git`
- Force-push evidence (reflog showing history rewrite) → type: `git`
- Unsigned commits from verified contributors → type: `git`
- Suspicious binary file additions → type: `git`
**Reference**: See [recovery-techniques.md](./references/recovery-techniques.md) for accessing force-pushed commits.
---
### Investigator 2: GitHub API Investigator
**ROLE BOUNDARY**: You query the GITHUB REST API ONLY. Do not run git commands locally.
**Actions**:
```bash
# Commits (paginated)
curl -s "https://api.github.com/repos/OWNER/REPO/commits?per_page=100" > api_commits.json
# Pull Requests including closed/deleted
curl -s "https://api.github.com/repos/OWNER/REPO/pulls?state=all&per_page=100" > api_prs.json
# Issues
curl -s "https://api.github.com/repos/OWNER/REPO/issues?state=all&per_page=100" > api_issues.json
# Contributors and collaborator changes
curl -s "https://api.github.com/repos/OWNER/REPO/contributors" > api_contributors.json
# Repository events (last 300)
curl -s "https://api.github.com/repos/OWNER/REPO/events?per_page=100" > api_events.json
# Check specific suspicious commit SHA details
curl -s "https://api.github.com/repos/OWNER/REPO/git/commits/SHA" > commit_detail.json
# Releases
curl -s "https://api.github.com/repos/OWNER/REPO/releases?per_page=100" > api_releases.json
# Check if a specific commit exists (force-pushed commits may 404 on commits/ but succeed on git/commits/)
curl -s "https://api.github.com/repos/OWNER/REPO/commits/SHA" | jq .sha
```
**Cross-reference targets** (flag discrepancies as evidence):
- PR exists in archive but missing from API → evidence of deletion
- Contributor in archive events but not in contributors list → evidence of permission revocation
- Commit in archive PushEvents but not in API commit list → evidence of force-push/deletion
**Reference**: See [evidence-types.md](./references/evidence-types.md) for GH event types.
---
### Investigator 3: Wayback Machine Investigator
**ROLE BOUNDARY**: You query the WAYBACK MACHINE CDX API ONLY. Do not use the GitHub API.
**Goal**: Recover deleted GitHub pages (READMEs, issues, PRs, releases, wiki pages).
**Actions**:
```bash
# Search for archived snapshots of the repo main page
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO&output=json&limit=100&from=YYYYMMDD&to=YYYYMMDD" > wayback_main.json
# Search for a specific deleted issue
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/issues/NUM&output=json&limit=50" > wayback_issue_NUM.json
# Search for a specific deleted PR
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/pull/NUM&output=json&limit=50" > wayback_pr_NUM.json
# Fetch the best snapshot of a page
# Use the Wayback Machine URL: https://web.archive.org/web/TIMESTAMP/ORIGINAL_URL
# Example: https://web.archive.org/web/20240101000000*/github.com/OWNER/REPO
# Advanced: Search for deleted releases/tags
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/releases/tag/*&output=json" > wayback_tags.json
# Advanced: Search for historical wiki changes
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/wiki/*&output=json" > wayback_wiki.json
```
**Evidence to collect**:
- Archived snapshots of deleted issues/PRs with their content
- Historical README versions showing changes
- Evidence of content present in archive but missing from current GitHub state
**Reference**: See [github-archive-guide.md](./references/github-archive-guide.md) for CDX API parameters.
---
### Investigator 4: GH Archive / BigQuery Investigator
**ROLE BOUNDARY**: You query GITHUB ARCHIVE via BIGQUERY ONLY. This is a tamper-proof record of all public GitHub events.
> **Prerequisites**: Requires Google Cloud credentials with BigQuery access (`gcloud auth application-default login`). If unavailable, skip this investigator and note it in the report.
**Cost Optimization Rules** (MANDATORY):
1. ALWAYS run a `--dry_run` before every query to estimate cost.
2. Use `_TABLE_SUFFIX` to filter by date range and minimize scanned data.
3. Only SELECT the columns you need.
4. Add a LIMIT unless aggregating.
```bash
# Template: safe BigQuery query for PushEvents to OWNER/REPO
bq query --use_legacy_sql=false --dry_run "
SELECT created_at, actor.login, payload.commits, payload.before, payload.head,
payload.size, payload.distinct_size
FROM \`githubarchive.month.*\`
WHERE _TABLE_SUFFIX BETWEEN 'YYYYMM' AND 'YYYYMM'
AND type = 'PushEvent'
AND repo.name = 'OWNER/REPO'
LIMIT 1000
"
# If cost is acceptable, re-run without --dry_run
# Detect force-pushes: zero-distinct_size PushEvents mean commits were force-erased
# payload.distinct_size = 0 AND payload.size > 0 → force push indicator
# Check for deleted branch events
bq query --use_legacy_sql=false "
SELECT created_at, actor.login, payload.ref, payload.ref_type
FROM \`githubarchive.month.*\`
WHERE _TABLE_SUFFIX BETWEEN 'YYYYMM' AND 'YYYYMM'
AND type = 'DeleteEvent'
AND repo.name = 'OWNER/REPO'
LIMIT 200
"
```
**Evidence to collect**:
- Force-push events (payload.size > 0, payload.distinct_size = 0)
- DeleteEvents for branches/tags
- WorkflowRunEvents for suspicious CI/CD automation
- PushEvents that precede a "gap" in the git log (evidence of rewrite)
**Reference**: See [github-archive-guide.md](./references/github-archive-guide.md) for all 12 event types and query patterns.
---
### Investigator 5: IOC Enrichment Investigator
**ROLE BOUNDARY**: You enrich EXISTING IOCs from Phase 1 using passive public sources ONLY. Do not execute any code from the target repository.
**Actions**:
- For each commit SHA: attempt recovery via direct GitHub URL (`github.com/OWNER/REPO/commit/SHA.patch`)
- For each domain/IP: check passive DNS, WHOIS records (via `web_extract` on public WHOIS services)
- For each package name: check npm/PyPI for matching malicious package reports
- For each actor username: check GitHub profile, contribution history, account age
- Recover force-pushed commits using 3 methods (see [recovery-techniques.md](./references/recovery-techniques.md))
---
## Phase 3: Evidence Consolidation
After all investigators complete:
1. Run `python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json list` to see all collected evidence.
2. For each piece of evidence, verify the `content_sha256` hash matches the original source.
3. Group evidence by:
- **Timeline**: Sort all timestamped evidence chronologically
- **Actor**: Group by GitHub handle or email
- **IOC**: Link evidence to the IOC it relates to
4. Identify **discrepancies**: items present in one source but absent in another (key deletion indicators).
5. Flag evidence as `[VERIFIED]` (confirmed from 2+ independent sources) or `[UNVERIFIED]` (single source only).
---
## Phase 4: Hypothesis Formation
A hypothesis must:
- State a specific claim (e.g., "Actor X force-pushed to BRANCH on DATE to erase commit SHA")
- Cite at least 2 evidence IDs that support it (`EV-XXXX`, `EV-YYYY`)
- Identify what evidence would disprove it
- Be labeled `[HYPOTHESIS]` until validated
**Common hypothesis templates** (see [investigation-templates.md](./references/investigation-templates.md)):
- Maintainer Compromise: legitimate account used post-takeover to inject malicious code
- Dependency Confusion: package name squatting to intercept installs
- CI/CD Injection: malicious workflow changes to run code during builds
- Typosquatting: near-identical package name targeting misspellers
- Credential Leak: token/key accidentally committed then force-pushed to erase
For each hypothesis, spawn a `delegate_task` sub-agent to attempt to find disconfirming evidence before confirming.
---
## Phase 5: Hypothesis Validation
The validator sub-agent MUST mechanically check:
1. For each hypothesis, extract all cited evidence IDs.
2. Verify each ID exists in `evidence.json` (hard failure if any ID is missing → hypothesis rejected as potentially fabricated).
3. Verify each `[VERIFIED]` piece of evidence was confirmed from 2+ sources.
4. Check logical consistency: does the timeline depicted by the evidence support the hypothesis?
5. Check for alternative explanations: could the same evidence pattern arise from a benign cause?
**Output**:
- `VALIDATED`: All evidence cited, verified, logically consistent, no plausible alternative explanation.
- `INCONCLUSIVE`: Evidence supports hypothesis but alternative explanations exist or evidence is insufficient.
- `REJECTED`: Missing evidence IDs, unverified evidence cited as fact, logical inconsistency detected.
Rejected hypotheses feed back into Phase 4 for refinement (max 3 iterations).
---
## Phase 6: Final Report Generation
Populate `investigation-report.md` using the template in [forensic-report.md](./templates/forensic-report.md).
**Mandatory sections**:
- Executive Summary: one-paragraph verdict (Compromised / Clean / Inconclusive) with confidence level
- Timeline: chronological reconstruction of all significant events with evidence citations
- Validated Hypotheses: each with status and supporting evidence IDs
- Evidence Registry: table of all `EV-XXXX` entries with source, type, and verification status
- IOC List: all extracted and enriched Indicators of Compromise
- Chain of Custody: how evidence was collected, from what sources, at what timestamps
- Recommendations: immediate mitigations if compromise detected; monitoring recommendations
**Report rules**:
- Every factual claim must have at least one `[EV-XXXX]` citation
- Executive Summary must state confidence level (High / Medium / Low)
- All secrets/credentials must be redacted to `[REDACTED]`
---
## Phase 7: Completion
1. Run final evidence count: `python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json list`
2. Archive the full investigation directory.
3. If compromise is confirmed:
- List immediate mitigations (rotate credentials, pin dependency hashes, notify affected users)
- Identify affected versions/packages
- Note disclosure obligations (if a public package: coordinate with the package registry)
4. Present the final `investigation-report.md` to the user.
---
## Ethical Use Guidelines
This skill is designed for **defensive security investigation** — protecting open-source software from supply chain attacks. It must not be used for:
- **Harassment or stalking** of contributors or maintainers
- **Doxing** — correlating GitHub activity to real identities for malicious purposes
- **Competitive intelligence** — investigating proprietary or internal repositories without authorization
- **False accusations** — publishing investigation results without validated evidence (see anti-hallucination guardrails)
Investigations should be conducted with the principle of **minimal intrusion**: collect only the evidence necessary to validate or refute the hypothesis. When publishing results, follow responsible disclosure practices and coordinate with affected maintainers before public disclosure.
If the investigation reveals a genuine compromise, follow the coordinated vulnerability disclosure process:
1. Notify the repository maintainers privately first
2. Allow reasonable time for remediation (typically 90 days)
3. Coordinate with package registries (npm, PyPI, etc.) if published packages are affected
4. File a CVE if appropriate
---
## API Rate Limiting
GitHub REST API enforces rate limits that will interrupt large investigations if not managed.
**Authenticated requests**: 5,000/hour (requires `GITHUB_TOKEN` env var or `gh` CLI auth)
**Unauthenticated requests**: 60/hour (unusable for investigations)
**Best practices**:
- Always authenticate: `export GITHUB_TOKEN=ghp_...` or use `gh` CLI (auto-authenticates)
- Use conditional requests (`If-None-Match` / `If-Modified-Since` headers) to avoid consuming quota on unchanged data
- For paginated endpoints, fetch all pages in sequence — don't parallelize against the same endpoint
- Check `X-RateLimit-Remaining` header; if below 100, pause for `X-RateLimit-Reset` timestamp
- BigQuery has its own quotas (10 TiB/day free tier) — always dry-run first
- Wayback Machine CDX API: no formal rate limit, but be courteous (1-2 req/sec max)
If rate-limited mid-investigation, record the partial results in the evidence store and note the limitation in the report.
---
## Reference Materials
- [github-archive-guide.md](./references/github-archive-guide.md) — BigQuery queries, CDX API, 12 event types
- [evidence-types.md](./references/evidence-types.md) — IOC taxonomy, evidence source types, observation types
- [recovery-techniques.md](./references/recovery-techniques.md) — Recovering deleted commits, PRs, issues
- [investigation-templates.md](./references/investigation-templates.md) — Pre-built hypothesis templates per attack type
- [evidence-store.py](./scripts/evidence-store.py) — CLI tool for managing the evidence JSON store
- [forensic-report.md](./templates/forensic-report.md) — Structured report template
@@ -0,0 +1,89 @@
# Evidence Types Reference
Taxonomy of all evidence types, IOC types, GitHub event types, and observation types
used in OSS forensic investigations.
---
## Evidence Source Types
| Type | Description | Example Sources |
|------|-------------|-----------------|
| `git` | Data from local git repository analysis | `git log`, `git fsck`, `git reflog`, `git blame` |
| `gh_api` | Data from GitHub REST API responses | `/repos/.../commits`, `/repos/.../pulls`, `/repos/.../events` |
| `gh_archive` | Data from GitHub Archive (BigQuery) | `githubarchive.month.*` BigQuery tables |
| `web_archive` | Archived web pages from Wayback Machine | CDX API results, `web.archive.org/web/...` snapshots |
| `ioc` | Indicator of Compromise from any source | Extracted from vendor reports, git history, network traces |
| `analysis` | Derived insight from cross-source correlation | "SHA present in archive but absent from API" |
| `vendor_report` | External security vendor or researcher report | CVE advisories, blog posts, NVD records |
| `manual` | Manually recorded observation by investigator | Notes on behavioral patterns, timeline gaps |
---
## IOC Types
| Type | Description | Example |
|------|-------------|---------|
| `COMMIT_SHA` | A git commit hash linked to malicious activity | `abc123def456...` |
| `FILE_PATH` | A suspicious file inside the repository | `src/utils/crypto.js`, `dist/index.min.js` |
| `API_KEY` | An API key accidentally committed | `AKIA...` (AWS), `ghp_...` (GitHub PAT) |
| `SECRET` | A generic secret / credential | Database password, private key blob |
| `IP_ADDRESS` | A C2 server or attacker IP | `192.0.2.1` |
| `DOMAIN` | A malicious or suspicious domain | `evil-cdn.io`, typosquatted package registry domain |
| `PACKAGE_NAME` | A malicious or squatted package name | `colo-rs` (typosquatting `color`), `lodash-utils` |
| `ACTOR_USERNAME` | A GitHub handle linked to the attack | `malicious-bot-account` |
| `MALICIOUS_URL` | A URL to a malicious resource | `https://evil.example.com/payload.sh` |
| `WORKFLOW_FILE` | A suspicious CI/CD workflow file | `.github/workflows/release.yml` |
| `BRANCH_NAME` | A suspicious branch | `refs/heads/temp-fix-do-not-merge` |
| `TAG_NAME` | A suspicious git tag | `v1.0.0-security-patch` |
| `RELEASE_NAME` | A suspicious release | Release with no associated tag or changelog |
| `OTHER` | Catch-all for unclassified IOCs | — |
---
## GitHub Archive Event Types (12 Types)
| Event Type | Forensic Relevance |
|------------|-------------------|
| `PushEvent` | Core: `payload.distinct_size=0` with `payload.size>0` → force push. `payload.before`/`payload.head` shows rewritten history. |
| `PullRequestEvent` | Detects deleted PRs, rapid open→close patterns, PRs from new accounts |
| `IssueEvent` | Detects deleted issues, coordinated labeling, rapid closure of vulnerability reports |
| `IssueCommentEvent` | Deleted comments, rapid activity bursts |
| `WatchEvent` | Star-farming campaigns (coordinated starring from new accounts) |
| `ForkEvent` | Unusual fork patterns before malicious commit |
| `CreateEvent` | Branch/tag creation: signals new release or code injection point |
| `DeleteEvent` | Branch/tag deletion: critical — often used to hide traces |
| `ReleaseEvent` | Unauthorized releases, release artifacts modified post-publish |
| `MemberEvent` | Collaborator added/removed: maintainer compromise indicator |
| `PublicEvent` | Repository made public (sometimes to drop malicious code briefly) |
| `WorkflowRunEvent` | CI/CD pipeline executions: workflow injection, secret exfiltration |
---
## Evidence Verification States
| State | Meaning |
|-------|---------|
| `unverified` | Collected from a single source, not cross-referenced |
| `single_source` | The primary source has been confirmed directly (e.g., SHA resolves on GitHub), but no second source |
| `multi_source_verified` | Confirmed from 2+ independent sources (e.g., GH Archive AND GitHub API both show the same event) |
Only `multi_source_verified` evidence may be cited as fact in validated hypotheses.
`unverified` and `single_source` evidence must be labeled `[UNVERIFIED]` or `[SINGLE-SOURCE]`.
---
## Observation Types (Patterned after RAPTOR)
| Type | Description |
|------|-------------|
| `CommitObservation` | Specific commit SHA with metadata (author, date, files changed) |
| `ForceWashObservation` | Evidence that commits were force-erased from a branch |
| `DanglingCommitObservation` | SHA present in git object store but unreachable from any ref |
| `IssueObservation` | A GitHub issue (current or archived) with title, body, timestamp |
| `PRObservation` | A GitHub PR (current or archived) with diff summary, reviewers |
| `IOC` | A single Indicator of Compromise with context |
| `TimelineGap` | A period with unusual absence of expected activity |
| `ActorAnomalyObservation` | Behavioral anomaly for a specific GitHub actor |
| `WorkflowAnomalyObservation` | Suspicious CI/CD workflow change or unexpected run |
| `CrossSourceDiscrepancy` | Item present in one source but absent in another (strong deletion indicator) |
@@ -0,0 +1,184 @@
# GitHub Archive Query Guide (BigQuery)
GitHub Archive records every public event on GitHub as immutable JSON records. This data is accessible via Google BigQuery and is the most reliable source for forensic investigation — events cannot be deleted or modified after recording.
## Public Dataset
- **Project**: `githubarchive`
- **Tables**: `day.YYYYMMDD`, `month.YYYYMM`, `year.YYYY`
- **Cost**: $6.25 per TiB scanned. Always run dry runs first.
- **Access**: Requires a Google Cloud account with BigQuery enabled. Free tier includes 1 TiB/month of queries.
---
## The 12 GitHub Event Types
| Event Type | What It Records | Forensic Value |
|------------|-----------------|----------------|
| `PushEvent` | Commits pushed to a branch | Force-push detection, commit timeline, author attribution |
| `PullRequestEvent` | PR opened, closed, merged, reopened | Deleted PR recovery, review timeline |
| `IssuesEvent` | Issue opened, closed, reopened, labeled | Deleted issue recovery, social engineering traces |
| `IssueCommentEvent` | Comments on issues and PRs | Deleted comment recovery, communication patterns |
| `CreateEvent` | Branch, tag, or repository creation | Suspicious branch creation, tag timing |
| `DeleteEvent` | Branch or tag deletion | Evidence of cleanup after compromise |
| `MemberEvent` | Collaborator added or removed | Permission changes, access escalation |
| `PublicEvent` | Repository made public | Accidental exposure of private repos |
| `WatchEvent` | User stars a repository | Actor reconnaissance patterns |
| `ForkEvent` | Repository forked | Exfiltration of code before cleanup |
| `ReleaseEvent` | Release published, edited, deleted | Malicious release injection, deleted release recovery |
| `WorkflowRunEvent` | GitHub Actions workflow triggered | CI/CD abuse, unauthorized workflow runs |
---
## Query Templates
### Basic: All Events for a Repository
```sql
SELECT
created_at,
type,
actor.login,
repo.name,
payload
FROM
`githubarchive.day.20240101` -- Adjust date
WHERE
repo.name = 'owner/repo'
AND type IN ('PushEvent', 'DeleteEvent', 'MemberEvent')
ORDER BY
created_at ASC
```
### Force-Push Detection
Force-pushes produce PushEvents where commits are overwritten. Key indicators:
- `payload.distinct_size = 0` with `payload.size > 0` → commits were erased
- `payload.before` contains the SHA before the rewrite (recoverable)
```sql
SELECT
created_at,
actor.login,
JSON_EXTRACT_SCALAR(payload, '$.before') AS before_sha,
JSON_EXTRACT_SCALAR(payload, '$.head') AS after_sha,
JSON_EXTRACT_SCALAR(payload, '$.size') AS total_commits,
JSON_EXTRACT_SCALAR(payload, '$.distinct_size') AS distinct_commits,
JSON_EXTRACT_SCALAR(payload, '$.ref') AS branch_ref
FROM
`githubarchive.month.*`
WHERE
_TABLE_SUFFIX BETWEEN '202401' AND '202403'
AND type = 'PushEvent'
AND repo.name = 'owner/repo'
AND CAST(JSON_EXTRACT_SCALAR(payload, '$.distinct_size') AS INT64) = 0
ORDER BY
created_at ASC
```
### Deleted Branch/Tag Detection
```sql
SELECT
created_at,
actor.login,
JSON_EXTRACT_SCALAR(payload, '$.ref') AS deleted_ref,
JSON_EXTRACT_SCALAR(payload, '$.ref_type') AS ref_type
FROM
`githubarchive.month.*`
WHERE
_TABLE_SUFFIX BETWEEN '202401' AND '202403'
AND type = 'DeleteEvent'
AND repo.name = 'owner/repo'
ORDER BY
created_at ASC
```
### Collaborator Permission Changes
```sql
SELECT
created_at,
actor.login,
JSON_EXTRACT_SCALAR(payload, '$.action') AS action,
JSON_EXTRACT_SCALAR(payload, '$.member.login') AS member
FROM
`githubarchive.month.*`
WHERE
_TABLE_SUFFIX BETWEEN '202401' AND '202403'
AND type = 'MemberEvent'
AND repo.name = 'owner/repo'
ORDER BY
created_at ASC
```
### CI/CD Workflow Activity
```sql
SELECT
created_at,
actor.login,
JSON_EXTRACT_SCALAR(payload, '$.action') AS action,
JSON_EXTRACT_SCALAR(payload, '$.workflow_run.name') AS workflow_name,
JSON_EXTRACT_SCALAR(payload, '$.workflow_run.conclusion') AS conclusion,
JSON_EXTRACT_SCALAR(payload, '$.workflow_run.head_sha') AS head_sha
FROM
`githubarchive.month.*`
WHERE
_TABLE_SUFFIX BETWEEN '202401' AND '202403'
AND type = 'WorkflowRunEvent'
AND repo.name = 'owner/repo'
ORDER BY
created_at ASC
```
### Actor Activity Profiling
```sql
SELECT
type,
COUNT(*) AS event_count,
MIN(created_at) AS first_event,
MAX(created_at) AS last_event
FROM
`githubarchive.month.*`
WHERE
_TABLE_SUFFIX BETWEEN '202301' AND '202412'
AND actor.login = 'suspicious-username'
GROUP BY type
ORDER BY event_count DESC
```
---
## Cost Optimization (MANDATORY)
1. **Always dry run first**: Add `--dry_run` flag to `bq query` to see estimated bytes scanned before executing.
2. **Use `_TABLE_SUFFIX`**: Narrow the date range as much as possible. `day.*` tables are cheapest for narrow windows; `month.*` for broader sweeps.
3. **Select only needed columns**: Avoid `SELECT *`. The `payload` column is large — only select specific JSON paths.
4. **Add LIMIT**: Use `LIMIT 1000` during exploration. Remove only for final exhaustive queries.
5. **Column filtering in WHERE**: Filter on indexed columns (`type`, `repo.name`, `actor.login`) before payload extraction.
**Cost estimation**: A single month of GH Archive data is ~1-2 TiB uncompressed. Querying a specific repo + event type with `_TABLE_SUFFIX` typically scans 1-10 GiB ($0.006-$0.06).
---
## Accessing via Hermes
**Option A: BigQuery CLI** (if `gcloud` is installed)
```bash
bq query --use_legacy_sql=false --format=json "YOUR QUERY"
```
**Option B: Python** (via `execute_code`)
```python
from google.cloud import bigquery
client = bigquery.Client()
query = "YOUR QUERY"
results = client.query(query).result()
for row in results:
print(dict(row))
```
**Option C: No GCP credentials available**
If BigQuery is unavailable, document this limitation in the report. Use the other 4 investigators (Git, GitHub API, Wayback Machine, IOC Enrichment) — they cover most investigation needs without BigQuery.
@@ -0,0 +1,131 @@
# Investigation Templates
Pre-built hypothesis and investigation templates for common supply chain attack scenarios.
Each template includes: attack pattern, key evidence to collect, and hypothesis starters.
---
## Template 1: Maintainer Account Compromise
**Pattern**: Attacker gains access to a legitimate maintainer account (phishing, credential stuffing)
and uses it to push malicious code, create backdoored releases, or exfiltrate CI secrets.
**Real-world examples**: XZ Utils (2024), Codecov (2021), event-stream (2018)
**Key Evidence to Collect**:
- [ ] Push events from maintainer account outside normal working hours/timezone
- [ ] Commits adding new dependencies, obfuscated code, or modified build scripts
- [ ] Release creation immediately after suspicious push (to maximize package distribution)
- [ ] MemberEvent adding unknown collaborators (attacker adding backup access)
- [ ] WorkflowRunEvent with unexpected secret access or exfiltration-like behavior
- [ ] Account login location changes (check social media, conference talks for corroboration)
**Hypothesis Starters**:
```
[HYPOTHESIS] Actor <HANDLE>'s account was compromised on or around <DATE>,
based on anomalous commit timing [EV-XXXX] and geographic access patterns [EV-YYYY].
```
```
[HYPOTHESIS] Release <VERSION> was published by the compromised account to push
malicious code to downstream users, evidenced by the malicious commit [EV-XXXX]
being added <N> hours before the release [EV-YYYY].
```
---
## Template 2: Malicious Dependency Injection
**Pattern**: A trusted package is modified to include malicious code in a dependency,
or a new malicious dependency is injected into an existing package.
**Key Evidence to Collect**:
- [ ] Diff of `package.json`/`requirements.txt`/`go.mod` before and after suspicious commit
- [ ] The new dependency's publication timestamp vs. the injection commit timestamp
- [ ] Whether the new dependency exists on npm/PyPI and who owns it
- [ ] Any obfuscation patterns in the injected dependency code
- [ ] Install-time scripts (`postinstall`, `setup.py`, etc.) that execute code on install
**Hypothesis Starters**:
```
[HYPOTHESIS] Commit <SHA> [EV-XXXX] introduced dependency <PACKAGE@VERSION>
which appears to be a malicious package published by actor <HANDLE> [EV-YYYY],
designed to execute <BEHAVIOR> during installation.
```
---
## Template 3: CI/CD Pipeline Injection
**Pattern**: Attacker modifies GitHub Actions workflows to steal secrets, exfiltrate code,
or inject malicious artifacts into the build output.
**Key Evidence to Collect**:
- [ ] Diff of all `.github/workflows/*.yml` files before/after suspicious period
- [ ] WorkflowRunEvents triggered by the modified workflows
- [ ] Any `curl`, `wget`, or network calls added to workflow steps
- [ ] New or modified `env:` sections referencing `secrets.*`
- [ ] Artifacts produced by modified workflow runs
**Hypothesis Starters**:
```
[HYPOTHESIS] Workflow file <FILE> was modified in commit <SHA> [EV-XXXX] to
exfiltrate repository secrets via <METHOD>, as evidenced by the added network
call pattern [EV-YYYY].
```
---
## Template 4: Typosquatting / Dependency Confusion
**Pattern**: Attacker registers a package with a name similar to a popular package
(or an internal package name) to intercept installs from users who mistype.
**Key Evidence to Collect**:
- [ ] Registration timestamp of the suspicious package on the registry
- [ ] Package content: does it contain malicious code or is it a stub?
- [ ] Download statistics for the suspicious package
- [ ] Names of internal packages that could be targeted (if private repo scope)
- [ ] Any references to the legitimate package in the malicious one's metadata
**Hypothesis Starters**:
```
[HYPOTHESIS] Package <MALICIOUS_NAME> was registered on <DATE> [EV-XXXX] to
typosquat on <LEGITIMATE_NAME>, targeting users who misspell the package name.
The package contains <BEHAVIOR> [EV-YYYY].
```
---
## Template 5: Force-Push History Rewrite (Evidence Erasure)
**Pattern**: After a malicious commit is detected (or before wider notice), the attacker
force-pushes to remove the malicious commit from branch history.
**Detection is key** — this template focuses on proving the erasure happened.
**Key Evidence to Collect**:
- [ ] GH Archive PushEvent with `distinct_size=0` (force push indicator) [EV-XXXX]
- [ ] The SHA of the commit BEFORE the force push (from GH Archive `payload.before`)
- [ ] Recovery of the erased commit via direct URL or `git fetch origin SHA`
- [ ] Wayback Machine snapshot of the commit page before erasure
- [ ] Timeline gap in git log (N commits visible in archive but M < N in current repo)
**Hypothesis Starters**:
```
[HYPOTHESIS] Actor <HANDLE> force-pushed branch <BRANCH> on <DATE> [EV-XXXX]
to erase commit <SHA> [EV-YYYY], which contained <MALICIOUS_CONTENT>.
The erased commit was recovered via <METHOD> [EV-ZZZZ].
```
---
## Cross-Cutting Investigation Checklist
Apply to every investigation regardless of template:
- [ ] Check all contributors for newly created accounts (< 30 days old at time of malicious activity)
- [ ] Check if any maintainer account changed email in the period (sign of account takeover)
- [ ] Verify GPG signatures on suspicious commits match known maintainer keys
- [ ] Check if the repository changed ownership or transferred orgs near the incident
- [ ] Look for "cleanup" commits immediately after the malicious commit (cover-up pattern)
- [ ] Check related packages/repos by the same author for similar patterns
@@ -0,0 +1,164 @@
# Deleted Content Recovery Techniques
## Key Insight: GitHub Never Fully Deletes Force-Pushed Commits
Force-pushed commits are removed from the branch history but REMAIN on GitHub's servers until garbage collection runs (which can take weeks to months). This is the foundation of deleted commit recovery.
---
## Method 1: Direct GitHub URL (Fastest — No Auth Required)
If you have a commit SHA, access it directly even if it was force-pushed off a branch:
```bash
# View commit metadata
curl -s "https://github.com/OWNER/REPO/commit/SHA"
# Download as patch (includes full diff)
curl -s "https://github.com/OWNER/REPO/commit/SHA.patch" > recovered_commit.patch
# Download as diff
curl -s "https://github.com/OWNER/REPO/commit/SHA.diff" > recovered_commit.diff
# Example (Istio credential leak - real incident):
curl -s "https://github.com/istio/istio/commit/FORCE_PUSHED_SHA.patch"
```
**When this works**: SHA is known (from GH Archive, Wayback Machine, or `git fsck`)
**When this fails**: GitHub has already garbage-collected the object (rare, typically 3090 days post-force-push)
---
## Method 2: GitHub REST API
```bash
# Works for commits force-pushed off branches but still on server
# Note: /commits/SHA may 404, but /git/commits/SHA often succeeds for orphaned commits
curl -s "https://api.github.com/repos/OWNER/REPO/git/commits/SHA" | jq .
# Get the tree (file listing) of a force-pushed commit
curl -s "https://api.github.com/repos/OWNER/REPO/git/trees/SHA?recursive=1" | jq .
# Get a specific file from a force-pushed commit
curl -s "https://api.github.com/repos/OWNER/REPO/contents/PATH?ref=SHA" | jq .content | base64 -d
```
---
## Method 3: Git Fetch by SHA (Local — Requires Clone)
```bash
# Fetch an orphaned commit directly by SHA into local repo
cd target_repo
git fetch origin SHA
git log FETCH_HEAD -1 # view the commit
git diff FETCH_HEAD~1 FETCH_HEAD # view the diff
# If the SHA was recently force-pushed it will still be fetchable
# This stops working once GitHub GC runs
```
---
## Method 4: Dangling Commits via git fsck
```bash
cd target_repo
# Find all unreachable objects (includes force-pushed commits)
git fsck --unreachable --no-reflogs 2>&1 | grep "unreachable commit" | awk '{print $3}' > dangling_shas.txt
# For each dangling commit, get its metadata
while read sha; do
echo "=== $sha ===" >> dangling_details.txt
git show --stat "$sha" >> dangling_details.txt 2>&1
done < dangling_shas.txt
# Note: dangling objects only exist in LOCAL clone — not the same as GitHub's copies
# GitHub's copies are accessible via Methods 1-3 until GC runs
```
---
## Recovering Deleted GitHub Issues and PRs
### Via Wayback Machine CDX API
```bash
# Find all archived snapshots of a specific issue
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO/issues/NUMBER&output=json&limit=50&fl=timestamp,statuscode,original" | python3 -m json.tool
# Fetch the best snapshot
# Use the timestamp from the CDX result:
# https://web.archive.org/web/TIMESTAMP/https://github.com/OWNER/REPO/issues/NUMBER
curl -s "https://web.archive.org/web/TIMESTAMP/https://github.com/OWNER/REPO/issues/NUMBER" > issue_NUMBER_archived.html
# Find all snapshots of the repo in a date range
curl -s "https://web.archive.org/cdx/search/cdx?url=github.com/OWNER/REPO*&output=json&from=20240101&to=20240201&limit=200&fl=timestamp,urlkey,statuscode" | python3 -m json.tool
```
### Via GitHub API (Limited — Only Non-Deleted Content)
```bash
# Closed issues (not deleted) are retrievable
curl -s "https://api.github.com/repos/OWNER/REPO/issues?state=closed&per_page=100" | jq '.[].number'
# Note: DELETED issues/PRs do NOT appear in the API. Use Wayback Machine or GH Archive for those.
```
### Via GitHub Archive (For Event History — Not Content)
```sql
-- Find all IssueEvents for a repo in a date range
SELECT created_at, actor.login, payload.action, payload.issue.number, payload.issue.title
FROM `githubarchive.day.*`
WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240201'
AND type = 'IssuesEvent'
AND repo.name = 'OWNER/REPO'
ORDER BY created_at
```
---
## Recovering Deleted Files from a Known Commit
```bash
# If you have the commit SHA (even force-pushed):
git show SHA:path/to/file.py > recovered_file.py
# Or via API (base64 encoded content):
curl -s "https://api.github.com/repos/OWNER/REPO/contents/path/to/file.py?ref=SHA" | python3 -c "
import sys, json, base64
d = json.load(sys.stdin)
print(base64.b64decode(d['content']).decode())
"
```
---
## Evidence Recording
After recovering any deleted content, immediately record it:
```bash
python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json add \
--source "git fetch origin FORCE_PUSHED_SHA" \
--content "Recovered commit: FORCE_PUSHED_SHA | Author: attacker@example.com | Date: 2024-01-15 | Added file: malicious.sh" \
--type git \
--actor "attacker-handle" \
--url "https://github.com/OWNER/REPO/commit/FORCE_PUSHED_SHA.patch" \
--timestamp "2024-01-15T00:00:00Z" \
--verification single_source \
--notes "Commit force-pushed off main branch on 2024-01-16. Recovered via direct fetch."
```
---
## Recovery Failure Modes
| Failure | Cause | Workaround |
|---------|-------|------------|
| `git fetch origin SHA` returns "not our ref" | GitHub GC already ran | Try Method 1/2, search Wayback Machine |
| `github.com/OWNER/REPO/commit/SHA` returns 404 | GC ran or SHA is wrong | Verify SHA via GH Archive; try partial SHA search |
| Wayback Machine has no snapshots | Page was never crawled by IA | Check `commoncrawl.org`, check Google Cache |
| BigQuery shows event but no content | GH Archive stores event metadata, not file contents | Recovery only reveals the event occurred, not the content |
@@ -0,0 +1,313 @@
#!/usr/bin/env python3
"""
OSS Forensics Evidence Store Manager
Manages a JSON-based evidence store for forensic investigations.
Commands:
add - Add a piece of evidence
list - List all evidence (optionally filter by type or actor)
verify - Re-check SHA-256 hashes for integrity
query - Search evidence by keyword
export - Export evidence as a Markdown table
summary - Print investigation statistics
Usage example:
python3 evidence-store.py --store evidence.json add \
--source "git fsck output" --content "dangling commit abc123" \
--type git --actor "malicious-user" --url "https://github.com/owner/repo/commit/abc123"
python3 evidence-store.py --store evidence.json list --type git
python3 evidence-store.py --store evidence.json verify
python3 evidence-store.py --store evidence.json export > evidence-table.md
"""
import json
import argparse
import os
import datetime
import hashlib
import sys
EVIDENCE_TYPES = [
"git", # Local git repository data (commits, reflog, fsck)
"gh_api", # GitHub REST API responses
"gh_archive", # GitHub Archive / BigQuery query results
"web_archive", # Wayback Machine snapshots
"ioc", # Indicator of Compromise (SHA, domain, IP, package name, etc.)
"analysis", # Derived analysis / cross-source correlation result
"manual", # Manually noted observation
"vendor_report", # External security vendor report excerpt
]
VERIFICATION_STATES = ["unverified", "single_source", "multi_source_verified"]
IOC_TYPES = [
"COMMIT_SHA", "FILE_PATH", "API_KEY", "SECRET", "IP_ADDRESS",
"DOMAIN", "PACKAGE_NAME", "ACTOR_USERNAME", "MALICIOUS_URL",
"WORKFLOW_FILE", "BRANCH_NAME", "TAG_NAME", "RELEASE_NAME", "OTHER",
]
def _now_iso():
return datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds") + "Z"
def _sha256(content: str) -> str:
return hashlib.sha256(content.encode("utf-8")).hexdigest()
class EvidenceStore:
def __init__(self, filepath: str):
self.filepath = filepath
self.data = {
"metadata": {
"version": "2.0",
"created_at": _now_iso(),
"last_updated": _now_iso(),
"investigation": "",
"target_repo": "",
},
"evidence": [],
"chain_of_custody": [],
}
if os.path.exists(filepath):
try:
with open(filepath, "r", encoding="utf-8") as f:
self.data = json.load(f)
except (json.JSONDecodeError, IOError) as e:
print(f"Error loading evidence store '{filepath}': {e}", file=sys.stderr)
print("Hint: The file might be corrupted. Check for manual edits or syntax errors.", file=sys.stderr)
sys.exit(1)
def _save(self):
self.data["metadata"]["last_updated"] = _now_iso()
with open(self.filepath, "w", encoding="utf-8") as f:
json.dump(self.data, f, indent=2, ensure_ascii=False)
def _next_id(self) -> str:
return f"EV-{len(self.data['evidence']) + 1:04d}"
def add(
self,
source: str,
content: str,
evidence_type: str,
actor: str = None,
url: str = None,
timestamp: str = None,
ioc_type: str = None,
verification: str = "unverified",
notes: str = None,
) -> str:
evidence_id = self._next_id()
entry = {
"id": evidence_id,
"type": evidence_type,
"source": source,
"content": content,
"content_sha256": _sha256(content),
"actor": actor,
"url": url,
"event_timestamp": timestamp,
"collected_at": _now_iso(),
"ioc_type": ioc_type,
"verification": verification,
"notes": notes,
}
self.data["evidence"].append(entry)
self.data["chain_of_custody"].append({
"action": "add",
"evidence_id": evidence_id,
"timestamp": _now_iso(),
"source": source,
})
self._save()
return evidence_id
def list_evidence(self, filter_type: str = None, filter_actor: str = None):
results = self.data["evidence"]
if filter_type:
results = [e for e in results if e.get("type") == filter_type]
if filter_actor:
results = [e for e in results if e.get("actor") == filter_actor]
return results
def verify_integrity(self):
"""Re-compute SHA-256 for all entries and report mismatches."""
issues = []
for entry in self.data["evidence"]:
expected = _sha256(entry["content"])
stored = entry.get("content_sha256", "")
if expected != stored:
issues.append({
"id": entry["id"],
"stored_sha256": stored,
"computed_sha256": expected,
})
return issues
def query(self, keyword: str):
"""Search for keyword in content, source, actor, or url."""
keyword_lower = keyword.lower()
return [
e for e in self.data["evidence"]
if keyword_lower in (e.get("content", "") or "").lower()
or keyword_lower in (e.get("source", "") or "").lower()
or keyword_lower in (e.get("actor", "") or "").lower()
or keyword_lower in (e.get("url", "") or "").lower()
]
def export_markdown(self) -> str:
lines = [
"# Evidence Registry",
"",
f"**Store**: `{self.filepath}`",
f"**Last Updated**: {self.data['metadata'].get('last_updated', 'N/A')}",
f"**Total Evidence Items**: {len(self.data['evidence'])}",
"",
"| ID | Type | Source | Actor | Verification | Event Timestamp | URL |",
"|----|------|--------|-------|--------------|-----------------|-----|",
]
for e in self.data["evidence"]:
url = e.get("url") or ""
url_display = f"[link]({url})" if url else ""
lines.append(
f"| {e['id']} | {e.get('type','')} | {e.get('source','')} "
f"| {e.get('actor') or ''} | {e.get('verification','')} "
f"| {e.get('event_timestamp') or ''} | {url_display} |"
)
lines.append("")
lines.append("## Chain of Custody")
lines.append("")
lines.append("| Evidence ID | Action | Timestamp | Source |")
lines.append("|-------------|--------|-----------|--------|")
for c in self.data["chain_of_custody"]:
lines.append(
f"| {c.get('evidence_id','')} | {c.get('action','')} "
f"| {c.get('timestamp','')} | {c.get('source','')} |"
)
return "\n".join(lines)
def summary(self) -> dict:
by_type = {}
by_verification = {}
actors = set()
for e in self.data["evidence"]:
t = e.get("type", "unknown")
by_type[t] = by_type.get(t, 0) + 1
v = e.get("verification", "unverified")
by_verification[v] = by_verification.get(v, 0) + 1
if e.get("actor"):
actors.add(e["actor"])
return {
"total": len(self.data["evidence"]),
"by_type": by_type,
"by_verification": by_verification,
"unique_actors": sorted(actors),
}
def main():
parser = argparse.ArgumentParser(
description="OSS Forensics Evidence Store Manager v2.0",
formatter_class=argparse.RawDescriptionHelpFormatter,
)
parser.add_argument("--store", default="evidence.json", help="Path to evidence JSON file (default: evidence.json)")
subparsers = parser.add_subparsers(dest="command", metavar="COMMAND")
# --- add ---
add_p = subparsers.add_parser("add", help="Add a new evidence entry")
add_p.add_argument("--source", required=True, help="Where this evidence came from (e.g. 'git fsck', 'GH API /commits')")
add_p.add_argument("--content", required=True, help="The evidence content (commit SHA, API response excerpt, etc.)")
add_p.add_argument("--type", required=True, choices=EVIDENCE_TYPES, dest="evidence_type", help="Evidence type")
add_p.add_argument("--actor", help="GitHub handle or email of associated actor")
add_p.add_argument("--url", help="URL to original source")
add_p.add_argument("--timestamp", help="When the event occurred (ISO 8601)")
add_p.add_argument("--ioc-type", choices=IOC_TYPES, help="IOC subtype (for --type ioc)")
add_p.add_argument("--verification", choices=VERIFICATION_STATES, default="unverified")
add_p.add_argument("--notes", help="Additional investigator notes")
add_p.add_argument("--quiet", action="store_true", help="Suppress success message")
# --- list ---
list_p = subparsers.add_parser("list", help="List all evidence entries")
list_p.add_argument("--type", dest="filter_type", choices=EVIDENCE_TYPES, help="Filter by type")
list_p.add_argument("--actor", dest="filter_actor", help="Filter by actor")
# --- verify ---
subparsers.add_parser("verify", help="Verify SHA-256 integrity of all evidence content")
# --- query ---
query_p = subparsers.add_parser("query", help="Search evidence by keyword")
query_p.add_argument("keyword", help="Keyword to search for")
# --- export ---
subparsers.add_parser("export", help="Export evidence as a Markdown table (stdout)")
# --- summary ---
subparsers.add_parser("summary", help="Print investigation statistics")
args = parser.parse_args()
if not args.command:
parser.print_help()
sys.exit(0)
store = EvidenceStore(args.store)
if args.command == "add":
eid = store.add(
source=args.source,
content=args.content,
evidence_type=args.evidence_type,
actor=args.actor,
url=args.url,
timestamp=args.timestamp,
ioc_type=args.ioc_type,
verification=args.verification,
notes=args.notes,
)
if not getattr(args, "quiet", False):
print(f"✓ Added evidence: {eid}")
elif args.command == "list":
items = store.list_evidence(
filter_type=getattr(args, "filter_type", None),
filter_actor=getattr(args, "filter_actor", None),
)
if not items:
print("No evidence found.")
for e in items:
actor_str = f" | actor: {e['actor']}" if e.get("actor") else ""
url_str = f" | {e['url']}" if e.get("url") else ""
print(f"[{e['id']}] {e['type']:12s} | {e['verification']:20s} | {e['source']}{actor_str}{url_str}")
elif args.command == "verify":
issues = store.verify_integrity()
if not issues:
print(f"✓ All {len(store.data['evidence'])} evidence entries passed SHA-256 integrity check.")
else:
print(f"{len(issues)} integrity issue(s) detected:")
for i in issues:
print(f" [{i['id']}] stored={i['stored_sha256'][:16]}... computed={i['computed_sha256'][:16]}...")
sys.exit(1)
elif args.command == "query":
results = store.query(args.keyword)
print(f"Found {len(results)} result(s) for '{args.keyword}':")
for e in results:
print(f" [{e['id']}] {e['type']} | {e['source']} | {e['content'][:80]}")
elif args.command == "export":
print(store.export_markdown())
elif args.command == "summary":
s = store.summary()
print(f"Total evidence items : {s['total']}")
print(f"By type : {json.dumps(s['by_type'], indent=2)}")
print(f"By verification : {json.dumps(s['by_verification'], indent=2)}")
print(f"Unique actors : {s['unique_actors']}")
if __name__ == "__main__":
main()
@@ -0,0 +1,151 @@
# Forensic Investigation Report
> **Instructions**: Fill in all sections. Every factual claim must cite at least one `[EV-XXXX]` evidence ID.
> Remove placeholder text and instruction notes before finalizing. Redact all secrets to `[REDACTED]`.
---
## Executive Summary
**Target Repository**: `OWNER/REPO`
**Investigation Period**: YYYY-MM-DD to YYYY-MM-DD
**Verdict**: <!-- Compromised / Clean / Inconclusive -->
**Confidence Level**: <!-- High / Medium / Low -->
**Report Date**: YYYY-MM-DD
**Investigator**: <!-- Agent session ID or analyst name -->
<!-- One paragraph: what was investigated, what was found, what is recommended. -->
---
## Timeline of Events
> All timestamps in UTC. Each event must cite at least one evidence ID.
| Timestamp (UTC) | Event | Evidence IDs | Source |
|-----------------|-------|--------------|--------|
| YYYY-MM-DDTHH:MM:SSZ | _Describe event_ | [EV-XXXX] | git / gh_api / gh_archive / web_archive |
| | | | |
---
## Validated Hypotheses
### Hypothesis 1: <!-- Short title -->
**Status**: <!-- VALIDATED / INCONCLUSIVE / REJECTED -->
**Claim**: _Full statement of the hypothesis._
**Supporting Evidence**:
- [EV-XXXX]: _What this evidence shows_
- [EV-YYYY]: _What this evidence shows_
**Counter-Evidence Considered**: _What might disprove this, and why it was ruled out or not._
**Confidence**: <!-- High / Medium / Low, and why -->
---
## Indicators of Compromise (IOC List)
| Type | Value | Status | Evidence |
|------|-------|--------|----------|
| COMMIT_SHA | `abc123...` | Confirmed malicious | [EV-XXXX] |
| ACTOR_USERNAME | `handle` | Suspected compromised | [EV-YYYY] |
| FILE_PATH | `src/evil.js` | Confirmed malicious | [EV-ZZZZ] |
| DOMAIN | `evil-cdn.io` | Confirmed C2 | [EV-WWWW] |
---
## Affected Versions
| Version / Tag | Published | Contains Malicious Code | Evidence |
|---------------|-----------|------------------------|----------|
| `v1.2.3` | YYYY-MM-DD | Yes / No / Unknown | [EV-XXXX] |
---
## Evidence Registry
> Generated by: `python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json export`
<!-- Paste the Markdown table output from the evidence-store.py export command here -->
| ID | Type | Source | Actor | Verification | Event Timestamp | URL |
|----|------|--------|-------|--------------|-----------------|-----|
| EV-0001 | | | | | | |
---
## Chain of Custody
> Generated by: `python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json export`
<!-- Paste the chain of custody section from the export output here -->
| Evidence ID | Action | Timestamp | Source |
|-------------|--------|-----------|--------|
| EV-0001 | add | | |
---
## Technical Findings
### Git History Analysis
_Summarize findings from local git analysis: dangling commits, reflog anomalies, unsigned commits, binary additions, etc._
### GitHub API Analysis
_Summarize findings from GitHub REST API: deleted PRs/issues, contributor changes, release anomalies, etc._
### GitHub Archive Analysis
_Summarize findings from BigQuery: force-push events, delete events, workflow anomalies, member changes, etc._
_Note: If BigQuery was unavailable, state this explicitly._
### Wayback Machine Analysis
_Summarize findings from archive.org: recovered deleted pages, historical content differences, etc._
### IOC Enrichment
_Summarize enrichment results: WHOIS data for domains, recovered commit content, actor account analysis, etc._
---
## Recommendations
### Immediate Actions (If Compromise Confirmed)
- [ ] Rotate all GitHub tokens, API keys, and credentials that may have been exposed
- [ ] Pin dependency versions to hashes in all affected packages
- [ ] Publish a security advisory / CVE if applicable
- [ ] Notify downstream users/package registries (npm, PyPI, etc.)
- [ ] Revoke access for the compromised account and re-secure with hardware 2FA
- [ ] Audit all CI/CD workflow files for unauthorized modifications
- [ ] Review all releases published during the compromise window
### Monitoring Recommendations
- [ ] Enable branch protection on `main`/`master` (require code review, disallow force-push)
- [ ] Enable required commit signing (GPG/SSH)
- [ ] Set up GitHub audit log streaming for future monitoring
- [ ] Pin critical dependencies to known-good SHAs in lock files
---
## Limitations and Caveats
- _List any data sources that were unavailable (e.g., no BigQuery access)_
- _Note any evidence that is single-source only (not independently verified)_
- _Note any hypotheses that could not be confirmed or denied_
---
## References
- Evidence store: `evidence.json` (SHA-256 integrity: run `python3 SKILL_DIR/scripts/evidence-store.py --store evidence.json verify`)
- Related issues: <!-- Link to GitHub issues, CVEs, security advisories -->
- RAPTOR framework: https://github.com/gadievron/raptor
@@ -0,0 +1,43 @@
# Malicious Package Investigation Report
---
## 📦 Package Metadata
- **Package Name**:
- **Registry**: [NPM / PyPI / RubyGems / etc.]
- **Affected Versions**:
- **Malicious Version(s)**:
- **Downloads at Time of Detection**:
- **Package URL**:
---
## 🚩 Indicators of Compromise (IOCs)
- **Malicious URL(s)**:
- **Exfiltrated Data Types**: [Environment variables, ~/.ssh/id_rsa, /etc/shadow, etc.]
- **Exfiltration Method**: [DNS tunneling, HTTP POST to C2, etc.]
- **C2 IP/Domain**:
---
## 🛠️ Analysis Summary
- **Primary Mechanism**: [Typosquatting / Dependency Confusion / Maintainer Takeover]
- **Behavior Description**:
- [Example: Installs a postinstall script that exfiltrates environment variables.]
- [Example: Patches `setup.py` to download a secondary payload.]
---
## 🔍 Evidence Registry
| Evidence ID | Type | Source | Description |
|-------------|------|--------|-------------|
| EV-XXXX | ioc | NPM | Package install script snapshot |
| EV-YYYY | web | Wayback| Historical version comparison |
---
## 🛡️ Recommended Mitigations
1. [ ] Unpublish/Report the package to the registry.
2. [ ] Audit `package-lock.json` or `requirements.txt` across all projects.
3. [ ] Rotate secrets exfiltrated via environment variables.
4. [ ] Pin specific hashes (SHASUM) for mission-critical dependencies.
+192
View File
@@ -0,0 +1,192 @@
---
name: sherlock
description: OSINT username search across 400+ social networks. Hunt down social media accounts by username.
version: 1.0.0
author: unmodeled-tyler
license: MIT
metadata:
hermes:
tags: [osint, security, username, social-media, reconnaissance]
category: security
prerequisites:
commands: [sherlock]
---
# Sherlock OSINT Username Search
Hunt down social media accounts by username across 400+ social networks using the [Sherlock Project](https://github.com/sherlock-project/sherlock).
## When to Use
- User asks to find accounts associated with a username
- User wants to check username availability across platforms
- User is conducting OSINT or reconnaissance research
- User asks "where is this username registered?" or similar
## Requirements
- Sherlock CLI installed: `pipx install sherlock-project` or `pip install sherlock-project`
- Alternatively: Docker available (`docker run -it --rm sherlock/sherlock`)
- Network access to query social platforms
## Procedure
### 1. Check if Sherlock is Installed
**Before doing anything else**, verify sherlock is available:
```bash
sherlock --version
```
If the command fails:
- Offer to install: `pipx install sherlock-project` (recommended) or `pip install sherlock-project`
- **Do NOT** try multiple installation methods — pick one and proceed
- If installation fails, inform the user and stop
### 2. Extract Username
**Extract the username directly from the user's message if clearly stated.**
Examples where you should **NOT** use clarify:
- "Find accounts for nasa" → username is `nasa`
- "Search for johndoe123" → username is `johndoe123`
- "Check if alice exists on social media" → username is `alice`
- "Look up user bob on social networks" → username is `bob`
**Only use clarify if:**
- Multiple potential usernames mentioned ("search for alice or bob")
- Ambiguous phrasing ("search for my username" without specifying)
- No username mentioned at all ("do an OSINT search")
When extracting, take the **exact** username as stated — preserve case, numbers, underscores, etc.
### 3. Build Command
**Default command** (use this unless user specifically requests otherwise):
```bash
sherlock --print-found --no-color "<username>" --timeout 90
```
**Optional flags** (only add if user explicitly requests):
- `--nsfw` — Include NSFW sites (only if user asks)
- `--tor` — Route through Tor (only if user asks for anonymity)
**Do NOT ask about options via clarify** — just run the default search. Users can request specific options if needed.
### 4. Execute Search
Run via the `terminal` tool. The command typically takes 30-120 seconds depending on network conditions and site count.
**Example terminal call:**
```json
{
"command": "sherlock --print-found --no-color \"target_username\"",
"timeout": 180
}
```
### 5. Parse and Present Results
Sherlock outputs found accounts in a simple format. Parse the output and present:
1. **Summary line:** "Found X accounts for username 'Y'"
2. **Categorized links:** Group by platform type if helpful (social, professional, forums, etc.)
3. **Output file location:** Sherlock saves results to `<username>.txt` by default
**Example output parsing:**
```
[+] Instagram: https://instagram.com/username
[+] Twitter: https://twitter.com/username
[+] GitHub: https://github.com/username
```
Present findings as clickable links when possible.
## Pitfalls
### No Results Found
If Sherlock finds no accounts, this is often correct — the username may not be registered on checked platforms. Suggest:
- Checking spelling/variation
- Trying similar usernames with `?` wildcard: `sherlock "user?name"`
- The user may have privacy settings or deleted accounts
### Timeout Issues
Some sites are slow or block automated requests. Use `--timeout 120` to increase wait time, or `--site` to limit scope.
### Tor Configuration
`--tor` requires Tor daemon running. If user wants anonymity but Tor isn't available, suggest:
- Installing Tor service
- Using `--proxy` with an alternative proxy
### False Positives
Some sites always return "found" due to their response structure. Cross-reference unexpected results with manual checks.
### Rate Limiting
Aggressive searches may trigger rate limits. For bulk username searches, add delays between calls or use `--local` with cached data.
## Installation
### pipx (recommended)
```bash
pipx install sherlock-project
```
### pip
```bash
pip install sherlock-project
```
### Docker
```bash
docker pull sherlock/sherlock
docker run -it --rm sherlock/sherlock <username>
```
### Linux packages
Available on Debian 13+, Ubuntu 22.10+, Homebrew, Kali, BlackArch.
## Ethical Use
This tool is for legitimate OSINT and research purposes only. Remind users:
- Only search usernames they own or have permission to investigate
- Respect platform terms of service
- Do not use for harassment, stalking, or illegal activities
- Consider privacy implications before sharing results
## Verification
After running sherlock, verify:
1. Output lists found sites with URLs
2. `<username>.txt` file created (default output) if using file output
3. If `--print-found` used, output should only contain `[+]` lines for matches
## Example Interaction
**User:** "Can you check if the username 'johndoe123' exists on social media?"
**Agent procedure:**
1. Check `sherlock --version` (verify installed)
2. Username provided — proceed directly
3. Run: `sherlock --print-found --no-color "johndoe123" --timeout 90`
4. Parse output and present links
**Response format:**
> Found 12 accounts for username 'johndoe123':
>
> • https://twitter.com/johndoe123
> • https://github.com/johndoe123
> • https://instagram.com/johndoe123
> • [... additional links]
>
> Results saved to: johndoe123.txt
---
**User:** "Search for username 'alice' including NSFW sites"
**Agent procedure:**
1. Check sherlock installed
2. Username + NSFW flag both provided
3. Run: `sherlock --print-found --no-color --nsfw "alice" --timeout 90`
4. Present results
-218
View File
@@ -1,218 +0,0 @@
# Checkpoint & Rollback — Implementation Plan
## Goal
Automatic filesystem snapshots before destructive file operations, with user-facing rollback. The agent never sees or interacts with this — it's transparent infrastructure.
## Design Principles
1. **Not a tool** — the LLM never knows about it. Zero prompt tokens, zero tool schema overhead.
2. **Once per turn** — checkpoint at most once per conversation turn (user message → agent response cycle), triggered lazily on the first file-mutating operation. Not on every write.
3. **Opt-in via config** — disabled by default, enabled with `checkpoints: true` in config.yaml.
4. **Works on any directory** — uses a shadow git repo completely separate from the user's project git. Works on git repos, non-git directories, anything.
5. **User-facing rollback**`/rollback` slash command (CLI + gateway) to list and restore checkpoints. Also `hermes rollback` CLI subcommand.
## Architecture
```
~/.hermes/checkpoints/
{sha256(abs_dir)[:16]}/ # Shadow git repo per working directory
HEAD, refs/, objects/... # Standard git internals
HERMES_WORKDIR # Original dir path (for display)
info/exclude # Default excludes (node_modules, .env, etc.)
```
### Core: CheckpointManager (new file: tools/checkpoint_manager.py)
Adapted from PR #559's CheckpointStore. Key changes from the PR:
- **Not a tool** — no schema, no registry entry, no handler
- **Turn-scoped deduplication** — tracks `_checkpointed_dirs: Set[str]` per turn
- **Configurable** — reads `checkpoints` config key
- **Pruning** — keeps last N snapshots per directory (default 50), prunes on take
```python
class CheckpointManager:
def __init__(self, enabled: bool = False, max_snapshots: int = 50):
self.enabled = enabled
self.max_snapshots = max_snapshots
self._checkpointed_dirs: Set[str] = set() # reset each turn
def new_turn(self):
"""Call at start of each conversation turn to reset dedup."""
self._checkpointed_dirs.clear()
def ensure_checkpoint(self, working_dir: str, reason: str = "auto") -> None:
"""Take a checkpoint if enabled and not already done this turn."""
if not self.enabled:
return
abs_dir = str(Path(working_dir).resolve())
if abs_dir in self._checkpointed_dirs:
return
self._checkpointed_dirs.add(abs_dir)
try:
self._take(abs_dir, reason)
except Exception as e:
logger.debug("Checkpoint failed (non-fatal): %s", e)
def list_checkpoints(self, working_dir: str) -> List[dict]:
"""List available checkpoints for a directory."""
...
def restore(self, working_dir: str, commit_hash: str) -> dict:
"""Restore files to a checkpoint state."""
...
def _take(self, working_dir: str, reason: str):
"""Shadow git: add -A + commit. Prune if over max_snapshots."""
...
def _prune(self, shadow_repo: Path):
"""Keep only last max_snapshots commits."""
...
```
### Integration Point: run_agent.py
The AIAgent already owns the conversation loop. Add CheckpointManager as an instance attribute:
```python
class AIAgent:
def __init__(self, ...):
...
# Checkpoint manager — reads config to determine if enabled
self._checkpoint_mgr = CheckpointManager(
enabled=config.get("checkpoints", False),
max_snapshots=config.get("checkpoint_max_snapshots", 50),
)
```
**Turn boundary** — in `run_conversation()`, call `new_turn()` at the start of each agent iteration (before processing tool calls):
```python
# Inside the main loop, before _execute_tool_calls():
self._checkpoint_mgr.new_turn()
```
**Trigger point** — in `_execute_tool_calls()`, before dispatching file-mutating tools:
```python
# Before the handle_function_call dispatch:
if function_name in ("write_file", "patch"):
# Determine working dir from the file path in the args
file_path = function_args.get("path", "") or function_args.get("old_string", "")
if file_path:
work_dir = str(Path(file_path).parent.resolve())
self._checkpoint_mgr.ensure_checkpoint(work_dir, f"before {function_name}")
```
This means:
- First `write_file` in a turn → checkpoint (fast, one `git add -A && git commit`)
- Subsequent writes in the same turn → no-op (already checkpointed)
- Next turn (new user message) → fresh checkpoint eligibility
### Config
Add to `DEFAULT_CONFIG` in `hermes_cli/config.py`:
```python
"checkpoints": False, # Enable filesystem checkpoints before destructive ops
"checkpoint_max_snapshots": 50, # Max snapshots to keep per directory
```
User enables with:
```yaml
# ~/.hermes/config.yaml
checkpoints: true
```
### User-Facing Rollback
**CLI slash command** — add `/rollback` to `process_command()` in `cli.py`:
```
/rollback — List recent checkpoints for the current directory
/rollback <hash> — Restore files to that checkpoint
```
Shows a numbered list:
```
📸 Checkpoints for /home/user/project:
1. abc1234 2026-03-09 21:15 before write_file (3 files changed)
2. def5678 2026-03-09 20:42 before patch (1 file changed)
3. ghi9012 2026-03-09 20:30 before write_file (2 files changed)
Use /rollback <number> to restore, e.g. /rollback 1
```
**Gateway slash command** — add `/rollback` to gateway/run.py with the same behavior.
**CLI subcommand** — `hermes rollback` (optional, lower priority).
### What Gets Excluded (not checkpointed)
Same as the PR's defaults — written to the shadow repo's `info/exclude`:
```
node_modules/
dist/
build/
.env
.env.*
__pycache__/
*.pyc
.DS_Store
*.log
.cache/
.venv/
.git/
```
Also respects the project's `.gitignore` if present (shadow repo can read it via `core.excludesFile`).
### Safety
- `ensure_checkpoint()` wraps everything in try/except — a checkpoint failure never blocks the actual file operation
- Shadow repo is completely isolated — GIT_DIR + GIT_WORK_TREE env vars, never touches user's .git
- If git isn't installed, checkpoints silently disable
- Large directories: add a file count check — skip checkpoint if >50K files to avoid slowdowns
## Files to Create/Modify
| File | Change |
|------|--------|
| `tools/checkpoint_manager.py` | **NEW** — CheckpointManager class (adapted from PR #559) |
| `run_agent.py` | Add CheckpointManager init + trigger in `_execute_tool_calls()` |
| `hermes_cli/config.py` | Add `checkpoints` + `checkpoint_max_snapshots` to DEFAULT_CONFIG |
| `cli.py` | Add `/rollback` slash command handler |
| `gateway/run.py` | Add `/rollback` slash command handler |
| `tests/tools/test_checkpoint_manager.py` | **NEW** — tests (adapted from PR #559's tests) |
## What We Take From PR #559
- `_shadow_repo_path()` — deterministic path hashing ✅
- `_git_env()` — GIT_DIR/GIT_WORK_TREE isolation ✅
- `_run_git()` — subprocess wrapper with timeout ✅
- `_init_shadow_repo()` — shadow repo initialization ✅
- `DEFAULT_EXCLUDES` list ✅
- Test structure and patterns ✅
## What We Change From PR #559
- **Remove tool schema/registry** — not a tool
- **Remove injection into file_operations.py and patch_parser.py** — trigger from run_agent.py instead
- **Add turn-scoped deduplication** — one checkpoint per turn, not per operation
- **Add pruning** — keep last N snapshots
- **Add config flag** — opt-in, not mandatory
- **Add /rollback command** — user-facing restore UI
- **Add file count guard** — skip huge directories
## Implementation Order
1. `tools/checkpoint_manager.py` — core class with take/list/restore/prune
2. `tests/tools/test_checkpoint_manager.py` — tests
3. `hermes_cli/config.py` — config keys
4. `run_agent.py` — integration (init + trigger)
5. `cli.py``/rollback` slash command
6. `gateway/run.py``/rollback` slash command
7. Full test suite run + manual smoke test
+6 -2
View File
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "hermes-agent"
version = "0.2.0"
version = "0.4.0"
description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
readme = "README.md"
requires-python = ">=3.11"
@@ -27,6 +27,7 @@ dependencies = [
"prompt_toolkit",
# Tools
"firecrawl-py",
"parallel-web>=0.4.2",
"fal-client",
# Text-to-speech (Edge TTS is free, no API key needed)
"edge-tts",
@@ -46,6 +47,7 @@ dev = ["pytest", "pytest-asyncio", "pytest-xdist", "mcp>=1.2.0"]
messaging = ["python-telegram-bot>=20.0", "discord.py[voice]>=2.0", "aiohttp>=3.9.0", "slack-bolt>=1.18.0", "slack-sdk>=3.27.0"]
cron = ["croniter"]
slack = ["slack-bolt>=1.18.0", "slack-sdk>=3.27.0"]
matrix = ["matrix-nio[e2e]>=0.24.0"]
cli = ["simple-term-menu"]
tts-premium = ["elevenlabs"]
voice = ["sounddevice>=0.4.6", "numpy>=1.24.0"]
@@ -56,6 +58,7 @@ pty = [
honcho = ["honcho-ai>=2.0.1"]
mcp = ["mcp>=1.2.0"]
homeassistant = ["aiohttp>=3.9.0"]
sms = ["aiohttp>=3.9.0"]
acp = ["agent-client-protocol>=0.8.1,<1.0"]
rl = [
"atroposlib @ git+https://github.com/NousResearch/atropos.git",
@@ -78,6 +81,7 @@ all = [
"hermes-agent[honcho]",
"hermes-agent[mcp]",
"hermes-agent[homeassistant]",
"hermes-agent[sms]",
"hermes-agent[acp]",
"hermes-agent[voice]",
]
@@ -88,7 +92,7 @@ hermes-agent = "run_agent:main"
hermes-acp = "acp_adapter.entry:main"
[tool.setuptools]
py-modules = ["run_agent", "model_tools", "toolsets", "batch_runner", "trajectory_compressor", "toolset_distributions", "cli", "hermes_constants", "hermes_state", "hermes_time", "mini_swe_runner", "rl_cli", "utils"]
py-modules = ["run_agent", "model_tools", "toolsets", "batch_runner", "trajectory_compressor", "toolset_distributions", "cli", "hermes_constants", "hermes_state", "hermes_time", "mini_swe_runner", "minisweagent_path", "rl_cli", "utils"]
[tool.setuptools.packages.find]
include = ["agent", "tools", "tools.*", "hermes_cli", "gateway", "gateway.*", "cron", "honcho_integration", "acp_adapter"]
+1
View File
@@ -18,6 +18,7 @@ PyJWT[crypto]
# Web tools
firecrawl-py
parallel-web>=0.4.2
# Image generation
fal-client
+7 -16
View File
@@ -27,25 +27,16 @@ from pathlib import Path
import fire
import yaml
# Load .env from ~/.hermes/.env first, then project root as dev fallback
from dotenv import load_dotenv
# Load .env from ~/.hermes/.env first, then project root as dev fallback.
# User-managed env files should override stale shell exports on restart.
_hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
_user_env = _hermes_home / ".env"
_project_env = Path(__file__).parent / '.env'
if _user_env.exists():
try:
load_dotenv(dotenv_path=_user_env, encoding="utf-8")
except UnicodeDecodeError:
load_dotenv(dotenv_path=_user_env, encoding="latin-1")
print(f"✅ Loaded environment variables from {_user_env}")
elif _project_env.exists():
try:
load_dotenv(dotenv_path=_project_env, encoding="utf-8")
except UnicodeDecodeError:
load_dotenv(dotenv_path=_project_env, encoding="latin-1")
print(f"✅ Loaded environment variables from {_project_env}")
from hermes_cli.env_loader import load_hermes_dotenv
_loaded_env_paths = load_hermes_dotenv(hermes_home=_hermes_home, project_env=_project_env)
for _env_path in _loaded_env_paths:
print(f"✅ Loaded environment variables from {_env_path}")
# Set terminal working directory to tinker-atropos submodule
# This ensures terminal commands run in the right context for RL work
+1469 -343
View File
File diff suppressed because it is too large Load Diff
+389
View File
@@ -0,0 +1,389 @@
#!/usr/bin/env python3
"""Discord Voice Doctor — diagnostic tool for voice channel support.
Checks all dependencies, configuration, and bot permissions needed
for Discord voice mode to work correctly.
Usage:
python scripts/discord-voice-doctor.py
.venv/bin/python scripts/discord-voice-doctor.py
"""
import os
import sys
import shutil
from pathlib import Path
# Resolve project root
SCRIPT_DIR = Path(__file__).resolve().parent
PROJECT_ROOT = SCRIPT_DIR.parent
sys.path.insert(0, str(PROJECT_ROOT))
HERMES_HOME = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
ENV_FILE = HERMES_HOME / ".env"
OK = "\033[92m\u2713\033[0m"
FAIL = "\033[91m\u2717\033[0m"
WARN = "\033[93m!\033[0m"
# Track whether discord.py is available for later sections
_discord_available = False
def mask(value):
"""Mask sensitive value: show only first 4 chars."""
if not value or len(value) < 8:
return "****"
return f"{value[:4]}{'*' * (len(value) - 4)}"
def check(label, ok, detail=""):
symbol = OK if ok else FAIL
msg = f" {symbol} {label}"
if detail:
msg += f" ({detail})"
print(msg)
return ok
def warn(label, detail=""):
msg = f" {WARN} {label}"
if detail:
msg += f" ({detail})"
print(msg)
def section(title):
print(f"\n\033[1m{title}\033[0m")
def check_packages():
"""Check Python package dependencies. Returns True if all critical deps OK."""
global _discord_available
section("Python Packages")
ok = True
# discord.py
try:
import discord
_discord_available = True
check("discord.py", True, f"v{discord.__version__}")
except ImportError:
check("discord.py", False, "pip install discord.py[voice]")
ok = False
# PyNaCl
try:
import nacl
ver = getattr(nacl, "__version__", "unknown")
try:
import nacl.secret
nacl.secret.Aead(bytes(32))
check("PyNaCl", True, f"v{ver}")
except (AttributeError, Exception):
check("PyNaCl (Aead)", False, f"v{ver} — need >=1.5.0")
ok = False
except ImportError:
check("PyNaCl", False, "pip install PyNaCl>=1.5.0")
ok = False
# davey (DAVE E2EE)
try:
import davey
check("davey (DAVE E2EE)", True, f"v{getattr(davey, '__version__', '?')}")
except ImportError:
check("davey (DAVE E2EE)", False, "pip install davey")
ok = False
# Optional: local STT
try:
import faster_whisper
check("faster-whisper (local STT)", True)
except ImportError:
warn("faster-whisper (local STT)", "not installed — local STT unavailable")
# Optional: TTS providers
try:
import edge_tts
check("edge-tts", True)
except ImportError:
warn("edge-tts", "not installed — edge TTS unavailable")
try:
import elevenlabs
check("elevenlabs SDK", True)
except ImportError:
warn("elevenlabs SDK", "not installed — premium TTS unavailable")
return ok
def check_system_tools():
"""Check system-level tools (opus, ffmpeg). Returns True if all OK."""
section("System Tools")
ok = True
# Opus codec
if _discord_available:
try:
import discord
opus_loaded = discord.opus.is_loaded()
if not opus_loaded:
import ctypes.util
opus_path = ctypes.util.find_library("opus")
if not opus_path:
# Platform-specific fallback paths
candidates = [
"/opt/homebrew/lib/libopus.dylib", # macOS Apple Silicon
"/usr/local/lib/libopus.dylib", # macOS Intel
"/usr/lib/x86_64-linux-gnu/libopus.so.0", # Debian/Ubuntu x86
"/usr/lib/aarch64-linux-gnu/libopus.so.0", # Debian/Ubuntu ARM
"/usr/lib/libopus.so", # Arch Linux
"/usr/lib64/libopus.so", # RHEL/Fedora
]
for p in candidates:
if os.path.isfile(p):
opus_path = p
break
if opus_path:
discord.opus.load_opus(opus_path)
opus_loaded = discord.opus.is_loaded()
if opus_loaded:
check("Opus codec", True)
else:
check("Opus codec", False, "brew install opus / apt install libopus0")
ok = False
except Exception as e:
check("Opus codec", False, str(e))
ok = False
else:
warn("Opus codec", "skipped — discord.py not installed")
# ffmpeg
ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path:
check("ffmpeg", True, ffmpeg_path)
else:
check("ffmpeg", False, "brew install ffmpeg / apt install ffmpeg")
ok = False
return ok
def check_env_vars():
"""Check environment variables. Returns (ok, token, groq_key, eleven_key)."""
section("Environment Variables")
# Load .env
try:
from dotenv import load_dotenv
if ENV_FILE.exists():
load_dotenv(ENV_FILE)
except ImportError:
pass
ok = True
token = os.getenv("DISCORD_BOT_TOKEN", "")
if token:
check("DISCORD_BOT_TOKEN", True, mask(token))
else:
check("DISCORD_BOT_TOKEN", False, "not set")
ok = False
# Allowed users — resolve usernames if possible
allowed = os.getenv("DISCORD_ALLOWED_USERS", "")
if allowed:
users = [u.strip() for u in allowed.split(",") if u.strip()]
user_labels = []
for uid in users:
label = mask(uid)
if token and uid.isdigit():
try:
import requests
r = requests.get(
f"https://discord.com/api/v10/users/{uid}",
headers={"Authorization": f"Bot {token}"},
timeout=3,
)
if r.status_code == 200:
label = f"{r.json().get('username', '?')} ({mask(uid)})"
except Exception:
pass
user_labels.append(label)
check("DISCORD_ALLOWED_USERS", True, f"{len(users)} user(s): {', '.join(user_labels)}")
else:
warn("DISCORD_ALLOWED_USERS", "not set — all users can use voice")
groq_key = os.getenv("GROQ_API_KEY", "")
eleven_key = os.getenv("ELEVENLABS_API_KEY", "")
if groq_key:
check("GROQ_API_KEY (STT)", True, mask(groq_key))
else:
warn("GROQ_API_KEY", "not set — Groq STT unavailable")
if eleven_key:
check("ELEVENLABS_API_KEY (TTS)", True, mask(eleven_key))
else:
warn("ELEVENLABS_API_KEY", "not set — ElevenLabs TTS unavailable")
return ok, token, groq_key, eleven_key
def check_config(groq_key, eleven_key):
"""Check hermes config.yaml."""
section("Configuration")
config_path = HERMES_HOME / "config.yaml"
if config_path.exists():
try:
import yaml
with open(config_path) as f:
cfg = yaml.safe_load(f) or {}
stt_provider = cfg.get("stt", {}).get("provider", "local")
tts_provider = cfg.get("tts", {}).get("provider", "edge")
check("STT provider", True, stt_provider)
check("TTS provider", True, tts_provider)
if stt_provider == "groq" and not groq_key:
warn("STT config says groq but GROQ_API_KEY is missing")
if tts_provider == "elevenlabs" and not eleven_key:
warn("TTS config says elevenlabs but ELEVENLABS_API_KEY is missing")
except Exception as e:
warn("config.yaml", f"parse error: {e}")
else:
warn("config.yaml", "not found — using defaults")
# Voice mode state
voice_mode_path = HERMES_HOME / "gateway_voice_mode.json"
if voice_mode_path.exists():
try:
import json
modes = json.loads(voice_mode_path.read_text())
off_count = sum(1 for v in modes.values() if v == "off")
all_count = sum(1 for v in modes.values() if v == "all")
check("Voice mode state", True, f"{all_count} on, {off_count} off, {len(modes)} total")
except Exception:
warn("Voice mode state", "parse error")
else:
check("Voice mode state", True, "no saved state (fresh)")
def check_bot_permissions(token):
"""Check bot permissions via Discord API. Returns True if all OK."""
section("Bot Permissions")
if not token:
warn("Bot permissions", "no token — skipping")
return True
try:
import requests
except ImportError:
warn("Bot permissions", "requests not installed — skipping")
return True
VOICE_PERMS = {
"Priority Speaker": 8,
"Stream": 9,
"View Channel": 10,
"Send Messages": 11,
"Embed Links": 14,
"Attach Files": 15,
"Read Message History": 16,
"Connect": 20,
"Speak": 21,
"Mute Members": 22,
"Deafen Members": 23,
"Move Members": 24,
"Use VAD": 25,
"Send Voice Messages": 46,
}
REQUIRED_PERMS = {"Connect", "Speak", "View Channel", "Send Messages"}
ok = True
try:
headers = {"Authorization": f"Bot {token}"}
r = requests.get("https://discord.com/api/v10/users/@me", headers=headers, timeout=5)
if r.status_code == 401:
check("Bot login", False, "invalid token (401)")
return False
if r.status_code != 200:
check("Bot login", False, f"HTTP {r.status_code}")
return False
bot = r.json()
bot_name = bot.get("username", "?")
check("Bot login", True, f"{bot_name[:3]}{'*' * (len(bot_name) - 3)}")
# Check guilds
r2 = requests.get("https://discord.com/api/v10/users/@me/guilds", headers=headers, timeout=5)
if r2.status_code != 200:
warn("Guilds", f"HTTP {r2.status_code}")
return ok
guilds = r2.json()
check("Guilds", True, f"{len(guilds)} guild(s)")
for g in guilds[:5]:
perms = int(g.get("permissions", 0))
is_admin = bool(perms & (1 << 3))
if is_admin:
print(f" {OK} {g['name']}: Administrator (all permissions)")
continue
has = []
missing = []
for name, bit in sorted(VOICE_PERMS.items(), key=lambda x: x[1]):
if perms & (1 << bit):
has.append(name)
elif name in REQUIRED_PERMS:
missing.append(name)
if missing:
print(f" {FAIL} {g['name']}: missing {', '.join(missing)}")
ok = False
else:
print(f" {OK} {g['name']}: {', '.join(has)}")
except requests.exceptions.Timeout:
warn("Bot permissions", "Discord API timeout")
except requests.exceptions.ConnectionError:
warn("Bot permissions", "cannot reach Discord API")
except Exception as e:
warn("Bot permissions", f"check failed: {e}")
return ok
def main():
print()
print("\033[1m" + "=" * 50 + "\033[0m")
print("\033[1m Discord Voice Doctor\033[0m")
print("\033[1m" + "=" * 50 + "\033[0m")
all_ok = True
all_ok &= check_packages()
all_ok &= check_system_tools()
env_ok, token, groq_key, eleven_key = check_env_vars()
all_ok &= env_ok
check_config(groq_key, eleven_key)
all_ok &= check_bot_permissions(token)
# Summary
print()
print("\033[1m" + "-" * 50 + "\033[0m")
if all_ok:
print(f" {OK} \033[92mAll checks passed — voice mode ready!\033[0m")
else:
print(f" {FAIL} \033[91mSome checks failed — fix issues above.\033[0m")
print()
if __name__ == "__main__":
main()
+10 -3
View File
@@ -483,6 +483,8 @@ install_system_packages() {
elif command -v sudo &> /dev/null; then
if [ "$IS_INTERACTIVE" = true ]; then
echo ""
log_info "sudo is needed ONLY to install optional system packages (${pkgs[*]}) via your package manager."
log_info "Hermes Agent itself does not require or retain root access."
read -p "Install ${description}? (requires sudo) [y/N] " -n 1 -r
echo
if [[ $REPLY =~ ^[Yy]$ ]]; then
@@ -496,8 +498,9 @@ install_system_packages() {
# Non-interactive (e.g. curl | bash) but a terminal is available.
# Read the prompt from /dev/tty (same approach the setup wizard uses).
echo ""
log_info "Installing ${description} requires sudo."
read -p "Install? [Y/n] " -n 1 -r < /dev/tty
log_info "sudo is needed ONLY to install optional system packages (${pkgs[*]}) via your package manager."
log_info "Hermes Agent itself does not require or retain root access."
read -p "Install ${description}? [Y/n] " -n 1 -r < /dev/tty
echo
if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
if sudo DEBIAN_FRONTEND=noninteractive NEEDRESTART_MODE=a $install_cmd < /dev/tty; then
@@ -688,7 +691,9 @@ install_deps() {
sudo DEBIAN_FRONTEND=noninteractive NEEDRESTART_MODE=a apt-get update -qq && sudo DEBIAN_FRONTEND=noninteractive NEEDRESTART_MODE=a apt-get install -y -qq build-essential python3-dev libffi-dev >/dev/null 2>&1 || true
log_success "Build tools installed"
else
read -p "Install build tools (build-essential, python3-dev)? (requires sudo) [Y/n] " -n 1 -r < /dev/tty
log_info "sudo is needed ONLY to install build tools (build-essential, python3-dev, libffi-dev) via apt."
log_info "Hermes Agent itself does not require or retain root access."
read -p "Install build tools? [Y/n] " -n 1 -r < /dev/tty
echo
if [[ $REPLY =~ ^[Yy]$ ]] || [[ -z $REPLY ]]; then
sudo DEBIAN_FRONTEND=noninteractive NEEDRESTART_MODE=a apt-get update -qq && sudo DEBIAN_FRONTEND=noninteractive NEEDRESTART_MODE=a apt-get install -y -qq build-essential python3-dev libffi-dev >/dev/null 2>&1 || true
@@ -908,6 +913,8 @@ install_node_deps() {
cd "$INSTALL_DIR" && npx playwright install chromium 2>/dev/null || true
;;
*)
log_info "Playwright may request sudo to install browser system dependencies (shared libraries)."
log_info "This is standard Playwright setup — Hermes itself does not require root access."
cd "$INSTALL_DIR" && npx playwright install --with-deps chromium 2>/dev/null || true
;;
esac
+125 -17
View File
@@ -18,12 +18,13 @@
* node bridge.js --port 3000 --session ~/.hermes/whatsapp/session
*/
import { makeWASocket, useMultiFileAuthState, DisconnectReason, fetchLatestBaileysVersion } from '@whiskeysockets/baileys';
import { makeWASocket, useMultiFileAuthState, DisconnectReason, fetchLatestBaileysVersion, downloadMediaMessage } from '@whiskeysockets/baileys';
import express from 'express';
import { Boom } from '@hapi/boom';
import pino from 'pino';
import path from 'path';
import { mkdirSync, readFileSync, existsSync } from 'fs';
import { mkdirSync, readFileSync, writeFileSync, existsSync, readdirSync } from 'fs';
import { randomBytes } from 'crypto';
import qrcode from 'qrcode-terminal';
// Parse CLI args
@@ -33,20 +34,55 @@ function getArg(name, defaultVal) {
return idx !== -1 && args[idx + 1] ? args[idx + 1] : defaultVal;
}
const WHATSAPP_DEBUG =
typeof process !== 'undefined' &&
process.env &&
typeof process.env.WHATSAPP_DEBUG === 'string' &&
['1', 'true', 'yes', 'on'].includes(process.env.WHATSAPP_DEBUG.toLowerCase());
const PORT = parseInt(getArg('port', '3000'), 10);
const SESSION_DIR = getArg('session', path.join(process.env.HOME || '~', '.hermes', 'whatsapp', 'session'));
const IMAGE_CACHE_DIR = path.join(process.env.HOME || '~', '.hermes', 'image_cache');
const PAIR_ONLY = args.includes('--pair-only');
const WHATSAPP_MODE = getArg('mode', process.env.WHATSAPP_MODE || 'self-chat'); // "bot" or "self-chat"
const ALLOWED_USERS = (process.env.WHATSAPP_ALLOWED_USERS || '').split(',').map(s => s.trim()).filter(Boolean);
const DEFAULT_REPLY_PREFIX = '⚕ *Hermes Agent*\n────────────\n';
const REPLY_PREFIX = process.env.WHATSAPP_REPLY_PREFIX === undefined
? DEFAULT_REPLY_PREFIX
: process.env.WHATSAPP_REPLY_PREFIX.replace(/\\n/g, '\n');
function formatOutgoingMessage(message) {
return REPLY_PREFIX ? `${REPLY_PREFIX}${message}` : message;
}
mkdirSync(SESSION_DIR, { recursive: true });
// Build LID → phone reverse map from session files (lid-mapping-{phone}.json)
function buildLidMap() {
const map = {};
try {
for (const f of readdirSync(SESSION_DIR)) {
const m = f.match(/^lid-mapping-(\d+)\.json$/);
if (!m) continue;
const phone = m[1];
const lid = JSON.parse(readFileSync(path.join(SESSION_DIR, f), 'utf8'));
if (lid) map[String(lid)] = phone;
}
} catch {}
return map;
}
let lidToPhone = buildLidMap();
const logger = pino({ level: 'warn' });
// Message queue for polling
const messageQueue = [];
const MAX_QUEUE_SIZE = 100;
// Track recently sent message IDs to prevent echo-back loops with media
const recentlySentIds = new Set();
const MAX_RECENT_IDS = 50;
let sock = null;
let connectionState = 'disconnected';
@@ -62,9 +98,16 @@ async function startSocket() {
browser: ['Hermes Agent', 'Chrome', '120.0'],
syncFullHistory: false,
markOnlineOnConnect: false,
// Required for Baileys 7.x: without this, incoming messages that need
// E2EE session re-establishment are silently dropped (msg.message === null)
getMessage: async (key) => {
// We don't maintain a message store, so return a placeholder.
// This is enough for Baileys to complete the retry handshake.
return { conversation: '' };
},
});
sock.ev.on('creds.update', saveCreds);
sock.ev.on('creds.update', () => { saveCreds(); lidToPhone = buildLidMap(); });
sock.ev.on('connection.update', (update) => {
const { connection, lastDisconnect, qr } = update;
@@ -102,13 +145,25 @@ async function startSocket() {
}
});
sock.ev.on('messages.upsert', ({ messages, type }) => {
if (type !== 'notify') return;
sock.ev.on('messages.upsert', async ({ messages, type }) => {
// In self-chat mode, your own messages commonly arrive as 'append' rather
// than 'notify'. Accept both and filter agent echo-backs below.
if (type !== 'notify' && type !== 'append') return;
for (const msg of messages) {
if (!msg.message) continue;
const chatId = msg.key.remoteJid;
if (WHATSAPP_DEBUG) {
try {
console.log(JSON.stringify({
event: 'upsert', type,
fromMe: !!msg.key.fromMe, chatId,
senderId: msg.key.participant || chatId,
messageKeys: Object.keys(msg.message || {}),
}));
} catch {}
}
const senderId = msg.key.participant || chatId;
const isGroup = chatId.endsWith('@g.us');
const senderNumber = senderId.replace(/@.*/, '');
@@ -123,15 +178,20 @@ async function startSocket() {
}
// Self-chat mode: only allow messages in the user's own self-chat
// WhatsApp now uses LID (Linked Identity Device) format: 67427329167522@lid
// AND classic format: 34652029134@s.whatsapp.net
// sock.user has both: { id: "number:10@s.whatsapp.net", lid: "lid_number:10@lid" }
const myNumber = (sock.user?.id || '').replace(/:.*@/, '@').replace(/@.*/, '');
const myLid = (sock.user?.lid || '').replace(/:.*@/, '@').replace(/@.*/, '');
const chatNumber = chatId.replace(/@.*/, '');
const isSelfChat = myNumber && chatNumber === myNumber;
const isSelfChat = (myNumber && chatNumber === myNumber) || (myLid && chatNumber === myLid);
if (!isSelfChat) continue;
}
// Check allowlist for messages from others
if (!msg.key.fromMe && ALLOWED_USERS.length > 0 && !ALLOWED_USERS.includes(senderNumber)) {
continue;
// Check allowlist for messages from others (resolve LID → phone if needed)
if (!msg.key.fromMe && ALLOWED_USERS.length > 0) {
const resolvedNumber = lidToPhone[senderNumber] || senderNumber;
if (!ALLOWED_USERS.includes(resolvedNumber)) continue;
}
// Extract message body
@@ -148,6 +208,18 @@ async function startSocket() {
body = msg.message.imageMessage.caption || '';
hasMedia = true;
mediaType = 'image';
try {
const buf = await downloadMediaMessage(msg, 'buffer', {}, { logger, reuploadRequest: sock.updateMediaMessage });
const mime = msg.message.imageMessage.mimetype || 'image/jpeg';
const extMap = { 'image/jpeg': '.jpg', 'image/png': '.png', 'image/webp': '.webp', 'image/gif': '.gif' };
const ext = extMap[mime] || '.jpg';
mkdirSync(IMAGE_CACHE_DIR, { recursive: true });
const filePath = path.join(IMAGE_CACHE_DIR, `img_${randomBytes(6).toString('hex')}${ext}`);
writeFileSync(filePath, buf);
mediaUrls.push(filePath);
} catch (err) {
console.error('[bridge] Failed to download image:', err.message);
}
} else if (msg.message.videoMessage) {
body = msg.message.videoMessage.caption || '';
hasMedia = true;
@@ -161,8 +233,30 @@ async function startSocket() {
mediaType = 'document';
}
// For media without caption, use a placeholder so the API message is never empty
if (hasMedia && !body) {
body = `[${mediaType} received]`;
}
// Ignore Hermes' own reply messages in self-chat mode to avoid loops.
if (msg.key.fromMe && ((REPLY_PREFIX && body.startsWith(REPLY_PREFIX)) || recentlySentIds.has(msg.key.id))) {
if (WHATSAPP_DEBUG) {
try { console.log(JSON.stringify({ event: 'ignored', reason: 'agent_echo', chatId, messageId: msg.key.id })); } catch {}
}
continue;
}
// Skip empty messages
if (!body && !hasMedia) continue;
if (!body && !hasMedia) {
if (WHATSAPP_DEBUG) {
try {
console.log(JSON.stringify({ event: 'ignored', reason: 'empty', chatId, messageKeys: Object.keys(msg.message || {}) }));
} catch (err) {
console.error('Failed to log empty message event:', err);
}
}
continue;
}
const event = {
messageId: msg.key.id,
@@ -208,10 +302,16 @@ app.post('/send', async (req, res) => {
}
try {
// Prefix responses so the user can distinguish agent replies from their
// own messages (especially in self-chat / "Message Yourself").
const prefixed = `⚕ *Hermes Agent*\n────────────\n${message}`;
const sent = await sock.sendMessage(chatId, { text: prefixed });
const sent = await sock.sendMessage(chatId, { text: formatOutgoingMessage(message) });
// Track sent message ID to prevent echo-back loops
if (sent?.key?.id) {
recentlySentIds.add(sent.key.id);
if (recentlySentIds.size > MAX_RECENT_IDS) {
recentlySentIds.delete(recentlySentIds.values().next().value);
}
}
res.json({ success: true, messageId: sent?.key?.id });
} catch (err) {
res.status(500).json({ error: err.message });
@@ -230,9 +330,8 @@ app.post('/edit', async (req, res) => {
}
try {
const prefixed = `⚕ *Hermes Agent*\n────────────\n${message}`;
const key = { id: messageId, fromMe: true, remoteJid: chatId };
await sock.sendMessage(chatId, { text: prefixed, edit: key });
await sock.sendMessage(chatId, { text: formatOutgoingMessage(message), edit: key });
res.json({ success: true });
} catch (err) {
res.status(500).json({ error: err.message });
@@ -303,6 +402,15 @@ app.post('/send-media', async (req, res) => {
}
const sent = await sock.sendMessage(chatId, msgPayload);
// Track sent message ID to prevent echo-back loops
if (sent?.key?.id) {
recentlySentIds.add(sent.key.id);
if (recentlySentIds.size > MAX_RECENT_IDS) {
recentlySentIds.delete(recentlySentIds.values().next().value);
}
}
res.json({ success: true, messageId: sent?.key?.id });
} catch (err) {
res.status(500).json({ error: err.message });
@@ -368,7 +476,7 @@ if (PAIR_ONLY) {
console.log();
startSocket();
} else {
app.listen(PORT, () => {
app.listen(PORT, '127.0.0.1', () => {
console.log(`🌉 WhatsApp bridge listening on port ${PORT} (mode: ${WHATSAPP_MODE})`);
console.log(`📁 Session stored in: ${SESSION_DIR}`);
if (ALLOWED_USERS.length > 0) {
+111 -162
View File
@@ -5,12 +5,26 @@ description: "Production pipeline for ASCII art video — any format. Converts v
# ASCII Video Production Pipeline
Full production pipeline for rendering any content as colored ASCII character video.
## Creative Standard
This is visual art. ASCII characters are the medium; cinema is the standard.
**Before writing a single line of code**, articulate the creative concept. What is the mood? What visual story does this tell? What makes THIS project different from every other ASCII video? The user's prompt is a starting point — interpret it with creative ambition, not literal transcription.
**First-render excellence is non-negotiable.** The output must be visually striking without requiring revision rounds. If something looks generic, flat, or like "AI-generated ASCII art," it is wrong — rethink the creative concept before shipping.
**Go beyond the reference vocabulary.** The effect catalogs, shader presets, and palette libraries in the references are a starting vocabulary. For every project, combine, modify, and invent new patterns. The catalog is a palette of paints — you write the painting.
**Be proactively creative.** Extend the skill's vocabulary when the project calls for it. If the references don't have what the vision demands, build it. Include at least one visual moment the user didn't ask for but will appreciate — a transition, an effect, a color choice that elevates the whole piece.
**Cohesive aesthetic over technical correctness.** All scenes in a video must feel connected by a unifying visual language — shared color temperature, related character palettes, consistent motion vocabulary. A technically correct video where every scene uses a random different effect is an aesthetic failure.
**Dense, layered, considered.** Every frame should reward viewing. Never flat black backgrounds. Always multi-grid composition. Always per-scene variation. Always intentional color.
## Modes
| Mode | Input | Output | Read |
|------|-------|--------|------|
| Mode | Input | Output | Reference |
|------|-------|--------|-----------|
| **Video-to-ASCII** | Video file | ASCII recreation of source footage | `references/inputs.md` § Video Sampling |
| **Audio-reactive** | Audio file | Generative visuals driven by audio features | `references/inputs.md` § Audio Analysis |
| **Generative** | None (or seed params) | Procedural ASCII animation | `references/effects.md` |
@@ -20,210 +34,154 @@ Full production pipeline for rendering any content as colored ASCII character vi
## Stack
Single self-contained Python script per project. No GPU.
Single self-contained Python script per project. No GPU required.
| Layer | Tool | Purpose |
|-------|------|---------|
| Core | Python 3.10+, NumPy | Math, array ops, vectorized effects |
| Signal | SciPy | FFT, peak detection (audio modes only) |
| Imaging | Pillow (PIL) | Font rasterization, video frame decoding, image I/O |
| Video I/O | ffmpeg (CLI) | Decode input, encode output segments, mux audio, mix tracks |
| Parallel | concurrent.futures / multiprocessing | N workers for batch/clip rendering |
| TTS | ElevenLabs API (or similar) | Generate narration clips for quote/testimonial videos |
| Optional | OpenCV | Video frame sampling, edge detection, optical flow |
| Signal | SciPy | FFT, peak detection (audio modes) |
| Imaging | Pillow (PIL) | Font rasterization, frame decoding, image I/O |
| Video I/O | ffmpeg (CLI) | Decode input, encode output, mux audio |
| Parallel | concurrent.futures | N workers for batch/clip rendering |
| TTS | ElevenLabs API (optional) | Generate narration clips |
| Optional | OpenCV | Video frame sampling, edge detection |
## Pipeline Architecture (v2)
## Pipeline Architecture
Every mode follows the same 6-stage pipeline. See `references/architecture.md` for implementation details, `references/scenes.md` for scene protocol, and `references/composition.md` for multi-grid composition and tonemap.
Every mode follows the same 6-stage pipeline:
```
┌─────────┐ ┌──────────┐ ┌───────────┐ ┌──────────┐ ┌─────────┐ ┌────────┐
│ 1.INPUT │→│ 2.ANALYZE │→│ 3.SCENE_FN │→│ 4.TONEMAP │→│ 5.SHADE │→│ 6.ENCODE│
│ load src │ │ features │ │ → canvas │ │ normalize │ │ post-fx │ │ → video │
└─────────┘ └──────────┘ └───────────┘ └──────────┘ └─────────┘ └────────┘
INPUT → ANALYZE → SCENE_FN → TONEMAP → SHADE → ENCODE
```
1. **INPUT** — Load/decode source material (video frames, audio samples, images, or nothing)
2. **ANALYZE** — Extract per-frame features (audio bands, video luminance/edges, motion vectors)
3. **SCENE_FN** — Scene function renders directly to pixel canvas (`uint8 H,W,3`). May internally compose multiple character grids via `_render_vf()` + pixel blend modes. See `references/composition.md`
4. **TONEMAP** — Percentile-based adaptive brightness normalization with per-scene gamma. Replaces linear brightness multipliers. See `references/composition.md` § Adaptive Tonemap
5. **SHADE**Apply post-processing `ShaderChain` + `FeedbackBuffer`. See `references/shaders.md`
3. **SCENE_FN** — Scene function renders to pixel canvas (`uint8 H,W,3`). Composes multiple character grids via `_render_vf()` + pixel blend modes. See `references/composition.md`
4. **TONEMAP** — Percentile-based adaptive brightness normalization. See `references/composition.md` § Adaptive Tonemap
5. **SHADE**Post-processing via `ShaderChain` + `FeedbackBuffer`. See `references/shaders.md`
6. **ENCODE** — Pipe raw RGB frames to ffmpeg for H.264/GIF encoding
## Creative Direction
**Every project should look and feel different.** The references provide a vocabulary of building blocks — don't copy them verbatim. Combine, modify, and invent.
### Aesthetic Dimensions to Vary
### Aesthetic Dimensions
| Dimension | Options | Reference |
|-----------|---------|-----------|
| **Character palette** | Density ramps, block elements, symbols, scripts (katakana, Greek, runes, braille), dots, project-specific | `architecture.md` § Character Palettes |
| **Color strategy** | HSV (angle/distance/time/value mapped), OKLAB/OKLCH (perceptually uniform), discrete RGB palettes, auto-generated harmony (complementary/triadic/analogous/tetradic), monochrome, temperature | `architecture.md` § Color System |
| **Color tint** | Warm, cool, amber, matrix green, neon pink, sepia, ice, blood, void, sunset | `shaders.md` § Color Grade |
| **Background texture** | Sine fields, fBM noise, domain warp, voronoi cells, reaction-diffusion, cellular automata, video source | `effects.md` § Background Fills, Noise-Based Fields, Simulation-Based Fields |
| **Primary effects** | Rings, spirals, tunnel, vortex, waves, interference, aurora, ripple, fire, strange attractors, SDFs (geometric shapes with smooth booleans) | `effects.md` § Radial / Wave / Fire / SDF-Based Fields |
| **Particles** | Energy sparks, snow, rain, bubbles, runes, binary data, orbits, gravity wells, flocking boids, flow-field followers, trail-drawing particles | `effects.md` § Particle Systems |
| **Shader mood** | Retro CRT, clean modern, glitch art, cinematic, dreamy, harsh industrial, psychedelic | `shaders.md` § Design Philosophy |
| **Character palette** | Density ramps, block elements, symbols, scripts (katakana, Greek, runes, braille), project-specific | `architecture.md` § Palettes |
| **Color strategy** | HSV, OKLAB/OKLCH, discrete RGB palettes, auto-generated harmony, monochrome, temperature | `architecture.md` § Color System |
| **Background texture** | Sine fields, fBM noise, domain warp, voronoi, reaction-diffusion, cellular automata, video | `effects.md` |
| **Primary effects** | Rings, spirals, tunnel, vortex, waves, interference, aurora, fire, SDFs, strange attractors | `effects.md` |
| **Particles** | Sparks, snow, rain, bubbles, runes, orbits, flocking boids, flow-field followers, trails | `effects.md` § Particles |
| **Shader mood** | Retro CRT, clean modern, glitch art, cinematic, dreamy, industrial, psychedelic | `shaders.md` |
| **Grid density** | xs(8px) through xxl(40px), mixed per layer | `architecture.md` § Grid System |
| **Font** | Menlo, Monaco, Courier, SF Mono, JetBrains Mono, Fira Code, IBM Plex | `architecture.md` § Font Selection |
| **Coordinate space** | Cartesian, polar, tiled, rotated, skewed, fisheye, twisted, Möbius, domain-warped | `effects.md` § Coordinate Transforms |
| **Mirror mode** | None, horizontal, vertical, quad, diagonal, kaleidoscope | `shaders.md` § Mirror Effects |
| **Masking** | Circle, rect, ring, gradient, text stencil, value-field-as-mask, animated iris/wipe/dissolve | `composition.md` § Masking |
| **Temporal motion** | Static, audio-reactive, eased keyframes, morphing between fields, temporal noise (smooth in-place evolution) | `effects.md` § Temporal Coherence |
| **Transition style** | Crossfade, wipe (directional/radial), dissolve, glitch cut, iris open/close, mask-based reveal | `shaders.md` § Transitions, `composition.md` § Animated Masks |
| **Aspect ratio** | Landscape (16:9), portrait (9:16), square (1:1), ultrawide (21:9) | `architecture.md` § Resolution Presets |
| **Coordinate space** | Cartesian, polar, tiled, rotated, fisheye, Möbius, domain-warped | `effects.md` § Transforms |
| **Feedback** | Zoom tunnel, rainbow trails, ghostly echo, rotating mandala, color evolution | `composition.md` § Feedback |
| **Masking** | Circle, ring, gradient, text stencil, animated iris/wipe/dissolve | `composition.md` § Masking |
| **Transitions** | Crossfade, wipe, dissolve, glitch cut, iris, mask-based reveal | `shaders.md` § Transitions |
### Per-Section Variation
Never use the same config for the entire video. For each section/scene/quote:
- Choose a **different background effect** (or compose 2-3)
- Choose a **different character palette** (match the mood)
- Choose a **different color strategy** (or at minimum a different hue)
- Vary **shader intensity** (more bloom during peaks, more grain during quiet)
- Use **different particle types** if particles are active
Never use the same config for the entire video. For each section/scene:
- **Different background effect** (or compose 2-3)
- **Different character palette** (match the mood)
- **Different color strategy** (or at minimum a different hue)
- **Vary shader intensity** (more bloom during peaks, more grain during quiet)
- **Different particle types** if particles are active
### Project-Specific Invention
For every project, invent at least one of:
- A custom character palette matching the theme
- A custom background effect (combine/modify existing ones)
- A custom background effect (combine/modify existing building blocks)
- A custom color palette (discrete RGB set matching the brand/mood)
- A custom particle character set
- A novel scene transition or visual moment
Don't just pick from the catalog. The catalog is vocabulary — you write the poem.
## Workflow
### Step 1: Determine Mode and Gather Requirements
### Step 1: Creative Vision
Before any code, articulate the creative concept:
- **Mood/atmosphere**: What should the viewer feel? Energetic, meditative, chaotic, elegant, ominous?
- **Visual story**: What happens over the duration? Build tension? Transform? Dissolve?
- **Color world**: Warm/cool? Monochrome? Neon? Earth tones? What's the dominant hue?
- **Character texture**: Dense data? Sparse stars? Organic dots? Geometric blocks?
- **What makes THIS different**: What's the one thing that makes this project unique?
- **Emotional arc**: How do scenes progress? Open with energy, build to climax, resolve?
Map the user's prompt to aesthetic choices. A "chill lo-fi visualizer" demands different everything from a "glitch cyberpunk data stream."
### Step 2: Technical Design
Establish with user:
- **Input source** — file path, format, duration
- **Mode** — which of the 6 modes above
- **Sections**time-mapped style changes (timestamps → effect names)
- **Resolution** — landscape 1920x1080 (default), portrait 1080x1920, square 1080x1080 @ 24fps; GIFs typically 640x360 @ 15fps
- **Style direction**dense/sparse, bright/dark, chaotic/minimal, color palette
- **Text/branding** — easter eggs, overlays, credits, themed character sets
- **Output format** — MP4 (default), GIF, PNG sequence
- **Aspect ratio** — landscape (16:9), portrait (9:16 for TikTok/Reels/Stories), square (1:1 for IG feed)
### Step 2: Detect Hardware and Set Quality
Before building the script, detect the user's hardware and set appropriate defaults. See `references/optimization.md` § Hardware Detection.
```python
hw = detect_hardware()
profile = quality_profile(hw, target_duration, user_quality_pref)
log(f"Hardware: {hw['cpu_count']} cores, {hw['mem_gb']:.1f}GB RAM")
log(f"Render: {profile['vw']}x{profile['vh']} @{profile['fps']}fps, {profile['workers']} workers")
```
Never hardcode worker counts, resolution, or CRF. Always detect and adapt.
- **Resolution**landscape 1920x1080 (default), portrait 1080x1920, square 1080x1080 @ 24fps
- **Hardware detection** — auto-detect cores/RAM, set quality profile. See `references/optimization.md`
- **Sections**map timestamps to scene functions, each with its own effect/palette/color/shader config
- **Output format** — MP4 (default), GIF (640x360 @ 15fps), PNG sequence
### Step 3: Build the Script
Write as a single Python file. Major components:
Single Python file. Components (with references):
1. **Hardware detection + quality profile** see `references/optimization.md`
2. **Input loader** — mode-dependent; see `references/inputs.md`
3. **Feature analyzer** — audio FFT, video luminance, or pass-through
4. **Grid + renderer** — multi-density character grids with bitmap cache; `_render_vf()` helper for value/hue field → canvas
5. **Character palettes** — multiple palettes chosen per project theme; see `references/architecture.md`
6. **Color system** — HSV + discrete RGB palettes as needed; see `references/architecture.md`
7. **Scene functions** — each returns `canvas (uint8 H,W,3)` directly. May compose multiple grids internally via pixel blend modes. See `references/scenes.md` + `references/composition.md`
8. **Tonemap** — adaptive brightness normalization with per-scene gamma; see `references/composition.md`
9. **Shader pipeline**`ShaderChain` + `FeedbackBuffer` per-section config; see `references/shaders.md`
10. **Scene table + dispatcher**maps time ranges to scene functions + shader/feedback configs; see `references/scenes.md`
11. **Parallel encoder** — N-worker batch clip rendering with ffmpeg pipes
1. **Hardware detection + quality profile**`references/optimization.md`
2. **Input loader** — mode-dependent; `references/inputs.md`
3. **Feature analyzer** — audio FFT, video luminance, or synthetic
4. **Grid + renderer** — multi-density grids with bitmap cache; `references/architecture.md`
5. **Character palettes** — multiple per project; `references/architecture.md` § Palettes
6. **Color system** — HSV + discrete RGB + harmony generation; `references/architecture.md` § Color
7. **Scene functions** — each returns `canvas (uint8 H,W,3)`; `references/scenes.md`
8. **Tonemap** — adaptive brightness normalization; `references/composition.md`
9. **Shader pipeline**`ShaderChain` + `FeedbackBuffer`; `references/shaders.md`
10. **Scene table + dispatcher**time → scene function + config; `references/scenes.md`
11. **Parallel encoder** — N-worker clip rendering with ffmpeg pipes
12. **Main** — orchestrate full pipeline
### Step 4: Handle Critical Bugs
### Step 4: Quality Verification
#### Font Cell Height (macOS Pillow)
- **Test frames first**: render single frames at key timestamps before full render
- **Brightness check**: `canvas.mean() > 8` for all ASCII content. If dark, lower gamma
- **Visual coherence**: do all scenes feel like they belong to the same video?
- **Creative vision check**: does the output match the concept from Step 1? If it looks generic, go back
`textbbox()` returns wrong height. Use `font.getmetrics()`:
## Critical Implementation Notes
```python
ascent, descent = font.getmetrics()
cell_height = ascent + descent # correct
```
### Brightness — Use `tonemap()`, Not Linear Multipliers
#### ffmpeg Pipe Deadlock
Never use `stderr=subprocess.PIPE` with long-running ffmpeg. Redirect to file:
```python
stderr_fh = open(err_path, "w")
pipe = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.DEVNULL, stderr=stderr_fh)
```
#### Brightness — Use `tonemap()`, Not Linear Multipliers
ASCII on black is inherently dark. This is the #1 visual issue. **Do NOT use linear `* N` brightness multipliers** — they clip highlights and wash out the image. Instead, use the **adaptive tonemap** function from `references/composition.md`:
This is the #1 visual issue. ASCII on black is inherently dark. **Never use `canvas * N` multipliers** — they clip highlights. Use adaptive tonemap:
```python
def tonemap(canvas, gamma=0.75):
"""Percentile-based adaptive normalization + gamma. Replaces all brightness multipliers."""
f = canvas.astype(np.float32)
lo = np.percentile(f, 1) # black point (1st percentile)
hi = np.percentile(f, 99.5) # white point (99.5th percentile)
if hi - lo < 1: hi = lo + 1
f = (f - lo) / (hi - lo)
f = np.clip(f, 0, 1) ** gamma # gamma < 1 = brighter mids
lo, hi = np.percentile(f[::4, ::4], [1, 99.5])
if hi - lo < 10: hi = lo + 10
f = np.clip((f - lo) / (hi - lo), 0, 1) ** gamma
return (f * 255).astype(np.uint8)
```
Pipeline ordering: `scene_fn() → tonemap() → FeedbackBuffer → ShaderChain → ffmpeg`
Pipeline: `scene_fn() → tonemap() → FeedbackBuffer → ShaderChain → ffmpeg`
Per-scene gamma overrides for destructive effects:
- Default: `gamma=0.75`
- Solarize scenes: `gamma=0.55` (solarize darkens above-threshold pixels)
- Posterize scenes: `gamma=0.50` (quantization loses brightness range)
- Already-bright scenes: `gamma=0.85`
Per-scene gamma: default 0.75, solarize 0.55, posterize 0.50, bright scenes 0.85. Use `screen` blend (not `overlay`) for dark layers.
Additional brightness best practices:
- Dense animated backgrounds — never flat black, always fill the grid
- Vignette minimum clamped to 0.15 (not 0.12)
- Bloom threshold lowered to 130 (not 170) so more pixels contribute to glow
- Use `screen` blend mode (not `overlay`) when compositing dark ASCII layers — overlay squares dark values: `2 * 0.12 * 0.12 = 0.03`
### Font Cell Height
#### Font Compatibility
macOS Pillow: `textbbox()` returns wrong height. Use `font.getmetrics()`: `cell_height = ascent + descent`. See `references/troubleshooting.md`.
Not all Unicode characters render in all fonts. Validate palettes at init:
```python
for c in palette:
img = Image.new("L", (20, 20), 0)
ImageDraw.Draw(img).text((0, 0), c, fill=255, font=font)
if np.array(img).max() == 0:
log(f"WARNING: char '{c}' (U+{ord(c):04X}) not in font, removing from palette")
```
### ffmpeg Pipe Deadlock
### Step 4b: Per-Clip Architecture (for segmented videos)
Never `stderr=subprocess.PIPE` with long-running ffmpeg — buffer fills at 64KB and deadlocks. Redirect to file. See `references/troubleshooting.md`.
When the video has discrete segments (quotes, scenes, chapters), render each as a separate clip file. This enables:
- Re-rendering individual clips without touching the rest (`--clip q05`)
- Faster iteration on specific sections
- Easy reordering or trimming in post
### Font Compatibility
```python
segments = [
{"id": "intro", "start": 0.0, "end": 5.0, "type": "intro"},
{"id": "q00", "start": 5.0, "end": 12.0, "type": "quote", "qi": 0, ...},
{"id": "t00", "start": 12.0, "end": 13.5, "type": "transition", ...},
{"id": "outro", "start": 208.0, "end": 211.6, "type": "outro"},
]
Not all Unicode chars render in all fonts. Validate palettes at init — render each char, check for blank output. See `references/troubleshooting.md`.
from concurrent.futures import ProcessPoolExecutor, as_completed
with ProcessPoolExecutor(max_workers=hw["workers"]) as pool:
futures = {pool.submit(render_clip, seg, features, path): seg["id"]
for seg, path in clip_args}
for fut in as_completed(futures):
fut.result()
```
### Per-Clip Architecture
CLI: `--clip q00 t00 q01` to re-render specific clips, `--list` to show segments, `--skip-render` to re-stitch only.
For segmented videos (quotes, scenes, chapters), render each as a separate clip file for parallel rendering and selective re-rendering. See `references/scenes.md`.
### Step 5: Render and Iterate
Performance targets per frame:
## Performance Targets
| Component | Budget |
|-----------|--------|
@@ -233,24 +191,15 @@ Performance targets per frame:
| Shader pipeline | 5-25ms |
| **Total** | ~100-200ms/frame |
**Fast iteration**: render single test frames to check brightness/layout before full render:
```python
canvas = render_single_frame(frame_index, features, renderer)
Image.fromarray(canvas).save("test.png")
```
**Brightness verification**: sample 5-10 frames across video, check `mean > 8` for ASCII content.
## References
| File | Contents |
|------|----------|
| `references/architecture.md` | Grid system (landscape/portrait/square resolution presets), font selection, character palettes (library of 20+), color system (HSV + OKLAB/OKLCH + discrete RGB + color harmony generation + perceptual gradient interpolation), `_render_vf()` helper, compositing, v2 effect function contract |
| `references/inputs.md` | All input sources: audio analysis, video sampling, image conversion, text/lyrics, TTS integration (ElevenLabs, voice assignment, audio mixing) |
| `references/effects.md` | Effect building blocks: 20+ value field generators (trig, noise/fBM, domain warp, voronoi, reaction-diffusion, cellular automata, strange attractors, SDFs), 8 hue field generators, coordinate transforms (rotate/tile/polar/Möbius), temporal coherence (easing, keyframes, morphing), radial/wave/fire effects, advanced particles (flocking, flow fields, trails), composing guide |
| `references/shaders.md` | 38 shader implementations (geometry, channel, color, glow, noise, pattern, tone, glitch, mirror), `ShaderChain` class, full `_apply_shader_step()` dispatch, audio-reactive scaling, transitions, tint presets |
| `references/composition.md` | **v2 core**: pixel blend modes (20 modes with implementations), multi-grid composition, `_render_vf()` helper, adaptive `tonemap()`, per-scene gamma, `FeedbackBuffer` with spatial transforms, `PixelBlendStack`, masking/stencil system (shape masks, text stencils, animated masks, boolean ops) |
| `references/scenes.md` | **v2 scene protocol**: scene function contract (local time convention), `Renderer` class, `SCENES` table structure, `render_clip()` loop, beat-synced cutting, parallel rendering + pickling constraints, 4 complete scene examples, scene design checklist |
| `references/design-patterns.md` | **Scene composition patterns**: layer hierarchy (bg/content/accent), directional parameter arcs vs oscillation, scene concepts and visual metaphors, counter-rotating dual systems, wave collision, progressive fragmentation, entropy/consumption, staggered layer entry (crescendo), scene ordering |
| `references/troubleshooting.md` | NumPy broadcasting traps, blend mode pitfalls, multiprocessing/pickling issues, brightness diagnostics, ffmpeg deadlocks, font issues, performance bottlenecks, common mistakes |
| `references/optimization.md` | Hardware detection, adaptive quality profiles (draft/preview/production/max), CLI integration, vectorized effect patterns, parallel rendering, memory management |
| `references/architecture.md` | Grid system, resolution presets, font selection, character palettes (20+), color system (HSV + OKLAB + discrete RGB + harmony generation), `_render_vf()` helper, GridLayer class |
| `references/composition.md` | Pixel blend modes (20 modes), `blend_canvas()`, multi-grid composition, adaptive `tonemap()`, `FeedbackBuffer`, `PixelBlendStack`, masking/stencil system |
| `references/effects.md` | Effect building blocks: value field generators, hue fields, noise/fBM/domain warp, voronoi, reaction-diffusion, cellular automata, SDFs, strange attractors, particle systems, coordinate transforms, temporal coherence |
| `references/shaders.md` | `ShaderChain`, `_apply_shader_step()` dispatch, 38 shader catalog, audio-reactive scaling, transitions, tint presets, output format encoding, terminal rendering |
| `references/scenes.md` | Scene protocol, `Renderer` class, `SCENES` table, `render_clip()`, beat-synced cutting, parallel rendering, design patterns (layer hierarchy, directional arcs, visual metaphors, compositional techniques), complete scene examples at every complexity level, scene design checklist |
| `references/inputs.md` | Audio analysis (FFT, bands, beats), video sampling, image conversion, text/lyrics, TTS integration (ElevenLabs, voice assignment, audio mixing) |
| `references/optimization.md` | Hardware detection, quality profiles, vectorized patterns, parallel rendering, memory management, performance budgets |
| `references/troubleshooting.md` | NumPy broadcasting traps, blend mode pitfalls, multiprocessing/pickling, brightness diagnostics, ffmpeg issues, font problems, common mistakes |
@@ -1,14 +1,6 @@
# Architecture Reference
**Cross-references:**
- Effect building blocks (value fields, noise, SDFs, particles): `effects.md`
- `_render_vf()`, blend modes, tonemap, masking: `composition.md`
- Scene protocol, render_clip, SCENES table: `scenes.md`
- Shader pipeline, feedback buffer, output encoding: `shaders.md`
- Complete scene examples: `examples.md`
- Input sources (audio analysis, video, TTS): `inputs.md`
- Performance tuning, hardware detection: `optimization.md`
- Common bugs (broadcasting, font, encoding): `troubleshooting.md`
> **See also:** composition.md · effects.md · scenes.md · shaders.md · inputs.md · optimization.md · troubleshooting.md
## Grid System

Some files were not shown because too many files have changed in this diff Show More