Compare commits

..

112 Commits

Author SHA1 Message Date
Teknium 417eccc054 feat(mattermost): configurable mention behavior — respond without @mention
Adds MATTERMOST_REQUIRE_MENTION and MATTERMOST_FREE_RESPONSE_CHANNELS
env vars, matching Discord's existing mention gating pattern.

- MATTERMOST_REQUIRE_MENTION=false: respond to all channel messages
- MATTERMOST_FREE_RESPONSE_CHANNELS=id1,id2: specific channels where
  bot responds without @mention even when require_mention is true
- DMs always respond regardless of mention settings
- @mention is now stripped from message text (clean agent input)

7 new tests for mention gating, free-response channels, DM bypass,
and mention stripping. Updated existing test for mention stripping.

Docs: updated mattermost.md with Mention Behavior section,
environment-variables.md with new vars, config.py with metadata.
2026-03-28 21:10:24 -07:00
Teknium 3e1157080a fix(tools): use non-deprecated streamable_http_client for MCP HTTP transport (#3646)
Switch MCP HTTP transport from the deprecated streamablehttp_client()
(mcp < 1.24.0) to the new streamable_http_client() API that accepts a
pre-built httpx.AsyncClient.

Changes vs the original PR #3391:
- Separate try/except imports so mcp < 1.24.0 doesn't break (graceful
  fallback to deprecated API instead of losing HTTP MCP entirely)
- Wrap httpx.AsyncClient in async-with for proper lifecycle management
  (the new SDK API explicitly skips closing caller-provided clients)
- Match SDK's own create_mcp_http_client defaults: follow_redirects=True,
  Timeout(connect_timeout, read=300.0)
- Keep deprecated code path as fallback for older SDK versions

Co-authored-by: HenkDz <HenkDz@users.noreply.github.com>
2026-03-28 18:20:49 -07:00
Teknium 1a032ccf79 fix(skills): stop marking persisted env vars missing on remote backends (#3650)
Salvage of PR #3452 (kentimsit). Fixes skill readiness checks on remote backends — persisted env vars are no longer incorrectly marked as missing.

Co-Authored-By: kentimsit <kentimsit@users.noreply.github.com>
2026-03-28 17:52:32 -07:00
Teknium 0bd7e95dfc fix(honcho): allow self-hosted local instances without API key (#3644)
Self-hosted Honcho on localhost doesn't require authentication, but
both the activation gates and the SDK client required an API key.

Combined fix from three contributor PRs:
- Relax all 8 activation gates to accept (api_key OR base_url) as
  valid credentials (#3482 by @cameronbergh)
- Use 'local' placeholder for the SDK client when base_url points to
  localhost/127.0.0.1/::1 (#3570 by @ygd58)

Files changed: run_agent.py (2 gates), cli.py (1 gate),
gateway/run.py (1 gate), honcho_integration/cli.py (2 gates),
hermes_cli/doctor.py (2 gates), honcho_integration/client.py (SDK).

Co-authored-by: cameronbergh <cameronbergh@users.noreply.github.com>
Co-authored-by: ygd58 <ygd58@users.noreply.github.com>
Co-authored-by: devorun <devorun@users.noreply.github.com>
2026-03-28 17:49:56 -07:00
Teknium d35567c6e0 feat(web): add Exa as a web search and extract backend (#3648)
Adds Exa (https://exa.ai) as a fourth web backend alongside Parallel,
Firecrawl, and Tavily. Follows the exact same integration pattern:

- Backend selection: config web.backend=exa or auto-detect from EXA_API_KEY
- Search: _exa_search() with highlights for result descriptions
- Extract: _exa_extract() with full text content extraction
- Lazy singleton client with x-exa-integration header
- Wired into web_search_tool and web_extract_tool dispatchers
- check_web_api_key() and requires_env updated
- CLI: hermes setup summary, hermes tools config, hermes config show
- config.py: EXA_API_KEY in OPTIONAL_ENV_VARS with metadata
- pyproject.toml: exa-py>=2.9.0,<3 in dependencies


Salvaged from PR #1850.

Co-authored-by: louiswalsh <louiswalsh@users.noreply.github.com>
2026-03-28 17:35:53 -07:00
Teknium bea49e02a3 fix: route /bg spinner through TUI widget to prevent status bar collision (#3643)
Background agent's KawaiiSpinner wrote \r-based animation and stop()
messages through StdoutProxy, colliding with prompt_toolkit's status bar.

Two fixes:
- display.py: use isinstance(out, StdoutProxy) instead of fragile
  hasattr+name check for detecting prompt_toolkit's stdout wrapper
- cli.py: silence bg agent's raw spinner (_print_fn=no-op) and route
  thinking updates through the TUI widget only when no foreground
  agent is active; clear spinner text in finally block with same guard

Closes #2718

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-28 17:29:37 -07:00
nguyen binh c6e2e486bf fix: add download retry to cache_audio_from_url matching cache_image_from_url (#3401)
PR #3323 added retry with exponential backoff to cache_image_from_url
but missed the sibling function cache_audio_from_url 18 lines below in
the same file. A single transient 429/5xx/timeout loses voice messages
while image downloads now survive them.

Apply the same retry pattern: 3 attempts with 1.5s exponential backoff,
immediate raise on non-retryable 4xx.
2026-03-28 17:28:38 -07:00
Teknium 973deb4f76 fix(browser): guard LLM response content against None in snapshot and vision (#3642)
Salvage of PR #3532 (binhnt92). Guards browser_tool.py against None content from reasoning-only models (DeepSeek-R1, QwQ). Follow-up to #3449.

Co-Authored-By: binhnt92 <binhnt92@users.noreply.github.com>
2026-03-28 17:25:04 -07:00
Teknium dc74998718 fix(sessions): support stdout (-) in session and snapshot export (salvage #3617) (#3641)
* fix(sessions): support stdout when output path is '-' in session export

* fix: style cleanup + extend stdout support to snapshot export

Follow-up for salvaged PR #3617:
- Fix import sys; on one line (style consistency)
- Update help text to mention - for stdout
- Apply same stdout support to hermes skills snapshot export

---------

Co-authored-by: ygd58 <buraysandro9@gmail.com>
2026-03-28 17:24:32 -07:00
Teknium 17617e4399 feat(discord): DISCORD_IGNORE_NO_MENTION — skip messages that @mention others but not the bot (#3640)
Salvage of PR #3310 (luojiesi). When DISCORD_IGNORE_NO_MENTION=true (default), messages that @mention other users but not the bot are silently skipped in server channels. DMs excluded — mentions there are just references.

Co-Authored-By: luojiesi <luojiesi@users.noreply.github.com>
2026-03-28 17:19:41 -07:00
Siddharth Balyan ffdfeb91d8 fix(nix): unify directory and file permissions across all three layers (#3619)
Activation script, tmpfiles, and container entrypoint now agree on
0750 for all directories. Tighten config.yaml and workspace documents
from 0644 to 0640 (group-readable, no world access). Add explicit
chmod for .managed marker and container $TARGET_HOME to eliminate
umask dependence. Secrets (auth.json, .env) remain 0600.
2026-03-29 05:29:24 +05:30
Teknium 857a5d7b47 fix: sanitize surrogate characters from clipboard paste to prevent UnicodeEncodeError (#3624)
Pasting text from rich-text editors (Google Docs, Word, etc.) can inject
lone surrogate characters (U+D800..U+DFFF) that are invalid UTF-8.
The OpenAI SDK serializes messages with ensure_ascii=False, then encodes
to UTF-8 for the HTTP body — surrogates crash this with:
  UnicodeEncodeError: 'utf-8' codec can't encode character '\udce2'

Three-layer fix:
1. Primary: sanitize user_message at the top of run_conversation()
2. CLI: sanitize in chat() before appending to conversation_history
3. Safety net: catch UnicodeEncodeError in the API error handler,
   sanitize the entire messages list in-place, and retry once.
   Also exclude UnicodeEncodeError from is_local_validation_error
   so it doesn't get classified as non-retryable.

Includes 14 new tests covering the sanitization helpers and the
integration with run_conversation().
2026-03-28 16:53:14 -07:00
Teknium b029742092 fix(cli): strengthen paste collapse fallback for terminals without bracketed paste (#3625)
The _on_text_changed fallback only detected pastes when all characters
arrived in a single event (chars_added > 1).  Some terminals (notably
VSCode integrated terminal in certain configs) may deliver paste data
differently, causing the fallback to miss.

Add a second heuristic: if the newline count jumps by 4+ in a single
text-change event, treat it as a paste.  Alt+Enter only adds 1 newline
per event, so this never false-positives on manual multi-line input.

Also fixes: the fallback path was missing _paste_just_collapsed flag
set before replacing buffer text, which could cause a re-trigger loop.
2026-03-28 15:40:49 -07:00
Teknium 02fb7c4aaf docs: comprehensive docs audit — fix 12 stale/missing items across 10 pages (#3618)
Fixes found by auditing docs against recent PRs/commits:

Critical (misleading):
- hooks.md: Remove stale 'planned — not yet wired' markers for 4 hooks
  that are now active (#3542). Add correct callback signatures.
- security.md: Update tirith verdict behavior — block verdicts now go
  through approval flow instead of hard-blocking (#3428). Add pkill/killall
  self-termination guard and gateway-run backgrounding patterns (#3593).

New feature docs:
- configuration.md: Add tool_use_enforcement section with value table
  (auto/true/false/list) from #3551/#3528.
- configuration.md: Expand auxiliary config with per-task timeouts
  (compression 120s, web_extract 30s, approval 30s) from #3597.
- api-server.md: Add /v1/health alias, Security Headers section,
  CORS details (Max-Age, SSE headers, Idempotency-Key) from
  #3572/#3573/#3576/#3580/#3530.

Stale/incomplete:
- configuration.md: Fix Alibaba model name qwen-plus -> qwen3.5-plus (#3484).
- environment-variables.md: Specify actual DashScope default URL.
- cli-commands.md: Add alibaba to --provider list.
- fallback-providers.md: Add Alibaba/DashScope to provider table.
- email.md: Document noreply/automated sender filtering (#3606).
- toolsets-reference.md: Add 4 missing platform toolsets — matrix,
  mattermost, dingtalk, api-server (#3583).
- skills.md: List default GitHub taps including garrytan/gstack (#3605).
2026-03-28 15:26:35 -07:00
Teknium 1e924e99b9 refactor: consolidate ~/.hermes directory layout with backward compat (#3610)
New installs get a cleaner structure:
  cache/images/      (was image_cache/)
  cache/audio/       (was audio_cache/)
  cache/documents/   (was document_cache/)
  cache/screenshots/ (was browser_screenshots/)
  platforms/whatsapp/session/ (was whatsapp/session/)
  platforms/matrix/store/    (was matrix/store/)
  platforms/pairing/         (was pairing/)

Existing installs are unaffected -- get_hermes_dir() checks for the
old path first and uses it if present. No migration needed.

Adds get_hermes_dir(new_subpath, old_name) helper to hermes_constants.py
for reuse by any future subsystem.
2026-03-28 15:22:19 -07:00
Teknium 614e43d3d9 feat(skills): add garrytan/gstack as default Skills Hub tap (#3605)
Add the gstack community skills repo to the default tap list and fix
skill_identifier construction for repos with an empty path prefix.

Co-authored-by: Tugrul Guner <tugrulguner@users.noreply.github.com>
2026-03-28 14:55:49 -07:00
Teknium e4480ff426 fix(config): accept 'model' key as alias for 'default' in model config (#3603)
Users intuitively write model: { model: my-model } instead of
model: { default: my-model } and it silently falls back to the
hardcoded default. Now both spellings work across all three config
consumers: runtime_provider, CLI, and gateway.

Co-authored-by: ygd58 <ygd58@users.noreply.github.com>
2026-03-28 14:55:27 -07:00
Teknium 9a364f2805 fix: cap percentage displays at 100% in stats, gateway, and memory tool (#3599)
Salvage of PR #3533 (binhnt92). Follow-up to #3480 — applies min(100, ...) to 5 remaining unclamped percentage display sites in context_compressor, cli /stats, gateway /stats, and memory tool. Defensive clamps now that the root cause (estimation heuristic) was already removed in #3480.

Co-Authored-By: binhnt92 <binhnt92@users.noreply.github.com>
2026-03-28 14:55:18 -07:00
Teknium 1b2d4f21f3 feat(cli): show resume-by-title command in exit summary (#3607)
When exiting a session that has a title (auto-generated or manual),
the exit summary now also shows:
  hermes -c "Session Title"
alongside the existing hermes --resume <id> command.

Also adds the title to the session info block.
2026-03-28 14:54:53 -07:00
Teknium 9009169eeb fix: recover updater when venv pip is missing (#3608)
Some environments lose pip inside the venv. Before invoking pip install,
check pip --version and bootstrap with ensurepip if missing. Applied to
both update code paths (_update_via_zip and cmd_update).


Salvaged from PR #3359.

Co-authored-by: Git-on-my-level <Git-on-my-level@users.noreply.github.com>
2026-03-28 14:54:49 -07:00
Teknium 0f042f3930 fix(email): filter automated/noreply senders to prevent reply loops (salvage #3461) (#3606)
* fix(gateway): filter automated/noreply senders in email adapter

Fixes #3453

Adds noreply/automated sender filtering to the email adapter. Drops emails from noreply, mailer-daemon, postmaster addresses and bulk mail headers (Auto-Submitted, Precedence, List-Unsubscribe) before dispatching. Prevents pairing codes and AI responses being sent to automated senders.

* fix: remove redundant seen_uids add + trailing whitespace cleanup

---------

Co-authored-by: devorun <130918800+devorun@users.noreply.github.com>
2026-03-28 14:50:50 -07:00
Siddharth Balyan 7a9e45e560 fix: regenerate uv.lock to match v0.5.0 in pyproject.toml (#3594)
The lockfile was still pinned to hermes-agent 0.4.0 after the v0.5.0
release, causing downstream consumers (e.g. the Nix package built via
uv2nix) to report the wrong version.  Also drops stale transitive deps
(bashlex, boto3, swe-rex) that were carried over from the removed
swe-rex integration.
2026-03-29 03:19:47 +05:30
Teknium a641f20cac fix(gateway): self-heal missing launchd plist on start (#3601)
When the plist is deleted (manual cleanup, failed upgrade),
hermes gateway start now regenerates it automatically instead of
failing. Also simplifies the returncode==3 error path since the
plist is guaranteed to exist at that point.

Co-authored-by: Bartok9 <Bartok9@users.noreply.github.com>
2026-03-28 14:48:55 -07:00
Teknium ee066b7be6 fix: use placeholder api_key for custom providers without credentials (#3604)
Local/custom OpenAI-compatible providers (Ollama, LM Studio, vLLM) that
don't require auth were hitting empty api_key rejections from the OpenAI
SDK, especially when used as smart model routing targets.

Uses the same 'no-key-required' placeholder already used in
_resolve_openrouter_runtime() for the identical scenario.


Salvaged from PR #3543.

Co-authored-by: scottlowry <scottlowry@users.noreply.github.com>
2026-03-28 14:47:41 -07:00
Mibay a6bc13ce13 fix(github-auth): check ~/.hermes/.env before ~/.git-credentials for token extraction (#3466)
* fix(github-auth): check ~/.hermes/.env before ~/.git-credentials for token extraction

Users who configured their token via `hermes setup` have it stored in
~/.hermes/.env (GITHUB_TOKEN=...), not in ~/.git-credentials. On macOS
with osxkeychain as the default git credential helper, ~/.git-credentials
may not exist at all, causing silent 401 failures in all GitHub skills.

Add ~/.hermes/.env as the first fallback in the auth detection block and
the inline "Extracting the Token from Git Credentials" example.

Priority order: env var → ~/.hermes/.env → ~/.git-credentials → none

Part of fix for NousResearch/hermes-agent#3464

* fix(github-auth): check ~/.hermes/.env before ~/.git-credentials

Fixes #3464

* fix(github-auth): check ~/.hermes/.env before ~/.git-credentials

Fixes #3464

* fix(github-auth): check ~/.hermes/.env before ~/.git-credentials

Fixes #3464

* fix(github-auth): check ~/.hermes/.env before ~/.git-credentials

Fixes #3464

* fix(github-auth): check ~/.hermes/.env before ~/.git-credentials

Fixes #3464

* fix(github-auth): check ~/.hermes/.env before ~/.git-credentials

Fixes #3464
2026-03-28 14:46:49 -07:00
Teknium f803f66339 fix(terminal): avoid merging heredoc EOF with fence wrapper (#3598)
One-shot local execution built `printf FENCE; <cmd>; __hermes_rc=...`, so a
command ending in a heredoc produced a closing line like `EOF; __hermes_rc=...`,
which is not a valid delimiter. Bash then treated the rest of the wrapper as
heredoc body, leaking it into tool output (e.g. gh issue/PR flows).

Use newline-separated wrapper lines so the delimiter stays alone and the
trailer runs after the heredoc completes.

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-28 14:43:41 -07:00
Teknium 839d9d7471 feat(agent): configurable timeouts for auxiliary LLM calls via config.yaml (#3597)
Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml
instead of hardcoded values. Users with slow local models (Ollama, llama.cpp)
can now increase timeouts for compression, vision, session search, etc.

Defaults:
  - auxiliary.compression.timeout: 120s (was hardcoded 45s)
  - auxiliary.vision.timeout: 30s (unchanged)
  - all other aux tasks: 30s (was hardcoded 30s)
  - title_generator: 30s (was hardcoded 15s)

call_llm/async_call_llm now auto-resolve timeout from config when not
explicitly passed. Callers can still override with an explicit timeout arg.

Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml
per project conventions.

Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>
2026-03-28 14:35:28 -07:00
Teknium 404a0b823e fix: add self-termination guard for pkill/killall targeting hermes/gateway (#3593)
Prevent the agent from accidentally killing its own process with
pkill -f gateway, killall hermes, etc. Adds a dangerous command
pattern that triggers the approval flow.

Co-authored-by: arasovic <arasovic@users.noreply.github.com>
2026-03-28 14:33:48 -07:00
Teknium dabe3c34cc feat(webhook): hermes webhook CLI + skill for event-driven subscriptions (#3578)
Adds 'hermes webhook' CLI subcommand and a skill — zero new model tools.

CLI commands (require webhook platform to be enabled):
  hermes webhook subscribe <name> [--events, --prompt, --deliver, ...]
  hermes webhook list
  hermes webhook remove <name>
  hermes webhook test <name>

All commands gate on webhook platform being enabled in config. If not
configured, prints setup instructions (gateway setup wizard, manual
config.yaml, or env vars).

The agent uses these via terminal tool, guided by the webhook-subscriptions
skill which documents setup, common patterns (GitHub, Stripe, CI/CD,
monitoring), prompt template syntax, security, and troubleshooting.

Adapter enhancement: webhook.py hot-reloads dynamic subscriptions from
~/.hermes/webhook_subscriptions.json on each incoming request (mtime-gated).
Static config.yaml routes always take precedence.

Docs: updated webhooks.md with Dynamic Subscriptions section, added
hermes webhook to cli-commands.md reference.

No new model tools. No toolset changes.

24 new tests for CLI CRUD, persistence, enabled-gate, and adapter
dynamic route loading.
2026-03-28 14:33:35 -07:00
Teknium 82d6c28bd5 fix(skills): cache-aware /skills install and uninstall in TUI (#3586)
Two fixes for /skills install and /skills uninstall slash commands:

1. input() hangs indefinitely inside prompt_toolkit's TUI event loop,
   soft-locking the CLI. The user typing the slash command is already
   implicit consent, so confirmation is now always skipped.

2. Cache invalidation was unconditional — installing or uninstalling a
   skill mid-session silently broke the prompt cache, increasing costs.
   The slash handler now defers cache invalidation by default (skill
   takes effect next session). Pass --now to invalidate immediately,
   with a message explaining the cost tradeoff. The CLI argparse path
   (hermes skills install) is unaffected and still invalidates.

Fixes #3474
Salvaged from PR #3496 by dlkakbs.
2026-03-28 14:32:23 -07:00
Islandman93 dc7d504aca Remove incorrect docker alternative for signal-cli (#3545)
Removed docker alternative for signal-cli-rest-api from the documentation. It does not support the raw signal-cli http daemon. See https://github.com/bbernhard/signal-cli-rest-api/issues/720
2026-03-28 14:28:57 -07:00
Teknium 9e411f7d70 fix(update): skip config migration prompts in non-interactive sessions (#3584)
hermes update hangs on input() when run from cron, scripts, or piped
contexts. Check both stdin and stdout isatty(), catch EOFError as a
fallback, and print guidance to run 'hermes config migrate' later.

Co-authored-by: phippsbot-byte <phippsbot-byte@users.noreply.github.com>
2026-03-28 14:26:32 -07:00
Teknium 708f187549 fix(gateway): exit with failure when all platforms fail with retryable errors (#3592)
When all messaging platforms exhaust retries and get queued for background
reconnection, exit with code 1 so systemd Restart=on-failure can restart
the process. Previously the gateway stayed alive as a zombie with no
connected platforms and exit code 0.

Salvaged from PR #3567 by kelsia14. Test updates added.

Co-authored-by: kelsia14 <kelsia14@users.noreply.github.com>
2026-03-28 14:25:12 -07:00
Teknium d7c41f3cef fix(telegram): honor proxy env vars in fallback transport (salvage #3411) (#3591)
* fix: keep gateway running through telegram proxy failures

- continue gateway startup in degraded mode when Telegram cannot connect yet
- ensure Telegram fallback transport also honors proxy env vars
- support reconnect retries without taking down the whole gateway

* test(telegram): cover proxy env handling in fallback transport

---------

Co-authored-by: kufufu9 <pi@local>
2026-03-28 14:23:27 -07:00
Teknium 6893c3befc fix(gateway): inject PATH + VIRTUAL_ENV into launchd plist for macOS service (#3585)
Salvage of PR #2173 (hanai) and PR #3432 (timknip).

Injects PATH, VIRTUAL_ENV, and HERMES_HOME into the macOS launchd plist so gateway subprocesses find user-installed tools (node, ffmpeg, etc.). Matches systemd unit parity with venv/bin, node_modules/.bin, and resolved node dir in PATH. Includes 7 new tests and docs updates across 4 pages.

Co-Authored-By: Han <ihanai1991@gmail.com>
Co-Authored-By: timknip <timknip@users.noreply.github.com>
2026-03-28 14:23:26 -07:00
Teknium 5cdc24c2e2 docs(slack): add missing Messages Tab setup step (#3590)
Without enabling the Messages Tab in App Home settings, users see
"Sending messages to this app has been turned off" when trying to DM
the bot — even with all correct scopes and event subscriptions.

Add Step 5 (Enable the Messages Tab) between Event Subscriptions and
Install App, with a danger admonition. Also add troubleshooting entry
for this specific error message. Renumber subsequent steps (6→7→8→9).

Co-authored-by: Alberto Leal <mail4alberto@gmail.com>
2026-03-28 14:23:19 -07:00
Teknium 2dd286c162 fix: write models.dev disk cache atomically (#3588)
Use atomic_json_write() from utils.py instead of plain open()/json.dump()
for the models.dev disk cache. Prevents corrupted cache if the process is
killed mid-write — _load_disk_cache() silently returns {} on corrupt JSON,
losing all model metadata until the next successful API fetch.

Co-authored-by: memosr <memosr@users.noreply.github.com>
2026-03-28 14:20:30 -07:00
Teknium 924857c3e3 fix: prevent tool name/arg concatenation for Ollama-compatible endpoints (#3582)
Ollama reuses index 0 for every tool call in a parallel batch,
distinguishing them only by id.  The streaming accumulator now
detects a new non-empty id at an already-active index and redirects
it to a fresh slot, preventing names and arguments from being
concatenated into a single tool call.

No-op for normal providers that use incrementing indices.

Co-authored-by: dmater01 <dmater01@users.noreply.github.com>
2026-03-28 14:08:26 -07:00
Teknium ba3bbf5b53 fix: add missing mattermost/matrix/dingtalk toolsets + platform consistency tests (salvage #3512) (#3583)
* Fixing mattermost configuration parsing bugs

* fix: add homeassistant to skills_config + platform consistency tests

Follow-up for cherry-picked #3512:
- Add homeassistant to skills_config.py PLATFORMS (was in tools_config
  but missing from skills_config)
- Add 3 consistency tests that verify all platforms in tools_config have
  matching toolset definitions, gateway includes, and skills_config entries
  — prevents this class of bug from recurring

---------

Co-authored-by: DaneelV3 <dannel@v3rtical.tech>
2026-03-28 14:05:02 -07:00
Teknium d6b4fa2e9f fix: strip @botname from commands so /new@TigerNanoBot resolves correctly (#3581)
Commands sent directly to the bot in groups include @botname suffix
(e.g. /compress@TigerNanoBot). get_command() now strips the @anything
part before lookup, matching how Telegram bot menu generates commands.
Fixes all slash commands silently doing nothing when sent with @mention.

Co-authored-by: MacroAnarchy <MacroAnarchy@users.noreply.github.com>
2026-03-28 14:01:01 -07:00
Teknium df1bf0a209 feat(api-server): add basic security headers (#3576)
Add X-Content-Type-Options: nosniff and Referrer-Policy: no-referrer
to all API server responses via a new security_headers_middleware.

Co-authored-by: Oktay Aydin <aydnOktay@users.noreply.github.com>
2026-03-28 14:00:52 -07:00
Teknium 49a49983e4 feat(api-server): add Access-Control-Max-Age to CORS preflight responses (#3580)
Adds Access-Control-Max-Age: 600 to CORS preflight responses, telling
browsers to cache the preflight for 10 minutes. Reduces redundant OPTIONS
requests and improves perceived latency for browser-based API clients.

Salvaged from PR #3514 by aydnOktay.

Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
2026-03-28 14:00:03 -07:00
Teknium e97c0cb578 fix: replace hardcoded ~/.hermes paths with get_hermes_home() for profile support
* feat: GPT tool-use steering + strip budget warnings from history

Two changes to improve tool reliability, especially for OpenAI GPT models:

1. GPT tool-use enforcement prompt: Adds GPT_TOOL_USE_GUIDANCE to the
   system prompt when the model name contains 'gpt' and tools are loaded.
   This addresses a known behavioral pattern where GPT models describe
   intended actions ('I will run the tests') instead of actually making
   tool calls. Inspired by similar steering in OpenCode (beast.txt) and
   Cline (GPT-5.1 variant).

2. Budget warning history stripping: Budget pressure warnings injected by
   _get_budget_warning() into tool results are now stripped when
   conversation history is replayed via run_conversation(). Previously,
   these turn-scoped signals persisted across turns, causing models to
   avoid tool calls in all subsequent messages after any turn that hit
   the 70-90% iteration threshold.

* fix: replace hardcoded ~/.hermes paths with get_hermes_home() for profile support

Prep for the upcoming profiles feature — each profile is a separate
HERMES_HOME directory, so all paths must respect the env var.

Fixes:
- gateway/platforms/matrix.py: Matrix E2EE store was hardcoded to
  ~/.hermes/matrix/store, ignoring HERMES_HOME. Now uses
  get_hermes_home() so each profile gets its own Matrix state.

- gateway/platforms/telegram.py: Two locations reading config.yaml via
  Path.home()/.hermes instead of get_hermes_home(). DM topic thread_id
  persistence and hot-reload would read the wrong config in a profile.

- tools/file_tools.py: Security path for hub index blocking was
  hardcoded to ~/.hermes, would miss the actual profile's hub cache.

- hermes_cli/gateway.py: Service naming now uses the profile name
  (hermes-gateway-coder) instead of a cryptic hash suffix. Extracted
  _profile_suffix() helper shared by systemd and launchd.

- hermes_cli/gateway.py: Launchd plist path and Label now scoped per
  profile (ai.hermes.gateway-coder.plist). Previously all profiles
  would collide on the same plist file on macOS.

- hermes_cli/gateway.py: Launchd plist now includes HERMES_HOME in
  EnvironmentVariables — was missing entirely, making custom
  HERMES_HOME broken on macOS launchd (pre-existing bug).

- All launchctl commands in gateway.py, main.py, status.py updated
  to use get_launchd_label() instead of hardcoded string.

Test fixes: DM topic tests now set HERMES_HOME env var alongside
Path.home() mock. Launchd test uses get_launchd_label() for expected
commands.
2026-03-28 13:51:08 -07:00
Teknium c0aa06f300 fix(test): update streaming test to match PR #3566 behavior change (#3574)
PR #3566 intentionally routes suppressed content to stream_delta_callback
when tool calls are present, so reasoning tag extraction can fire during
streaming. The test was still asserting the old behavior where content
after tool calls was fully suppressed from the callback.

Updated the assertion to match: content IS delivered to the callback
(for tag extraction), with display-level suppression handled by the
CLI's _stream_delta.
2026-03-28 13:41:23 -07:00
Teknium 3273732891 fix(api-server): add CORS headers to streaming SSE responses (#3573)
StreamResponse headers are flushed on prepare() before the CORS
middleware can inject them. Resolve CORS headers up front using
_cors_headers_for_origin() so the full set (including
Access-Control-Allow-Origin) is present on SSE streams.

Co-authored-by: ygd58 <ygd58@users.noreply.github.com>
2026-03-28 13:38:30 -07:00
Teknium 09ebf8b252 feat(api-server): add /v1/health alias for OpenAI compatibility (#3572)
Add GET /v1/health as an alias to the existing /health endpoint so
OpenAI-compatible health checks work out of the box.

Co-authored-by: Oktay Aydin <aydnOktay@users.noreply.github.com>
2026-03-28 13:32:39 -07:00
Teknium 33c89e52ec fix(whatsapp): add **kwargs to media sending methods to accept metadata (#3571)
The base orchestrator passes metadata=_thread_metadata to
send_image_file, send_video, and send_document. WhatsApp was the
only platform adapter missing the parameter, causing TypeError
crashes when sending media.

Extended to all three methods (original PR only fixed send_image_file).


Salvaged from PR #3144.

Co-authored-by: afifai <afifai@users.noreply.github.com>
2026-03-28 13:28:04 -07:00
Teknium 558cc14ad9 chore: release v0.5.0 (v2026.3.28) (#3568)
The hardening release — Nous Portal 400+ models, Hugging Face provider,
Telegram Private Chat Topics, native Modal SDK, plugin lifecycle hooks,
improved OpenAI model reliability, Nix flake, supply chain hardening,
Anthropic output limits fix, and 50+ security/reliability fixes.

165 merged PRs, 65 closed issues across a 5-day window.
2026-03-28 13:11:39 -07:00
Teknium 1d0a119368 fix(display): show reasoning before response when tool calls suppress content (#3566)
* fix(provider): remove MiniMax /v1→/anthropic auto-correction to allow user override

The minimax-specific auto-correction in runtime_provider.py was
preventing users from overriding to the OpenAI-compatible endpoint
via MINIMAX_BASE_URL. Users in certain regions get nginx 404 on
api.minimax.io/anthropic and need to switch to api.minimax.chat/v1.

The generic URL-suffix detection already handles /anthropic →
anthropic_messages, so the minimax-specific code was redundant for
the default path and harmful for the override path.

Now: default /anthropic URL works via generic detection, user
override to /v1 gets chat_completions mode naturally.

Closes #3546 (different approach — respects user overrides instead
of changing the default endpoint).

* fix(display): show reasoning during streaming even when tool calls suppress content

When a model generates content (containing <REASONING_SCRATCHPAD> tags)
alongside tool calls in the same API response, content deltas were
suppressed from streaming once any tool call chunk arrived. This
prevented the CLI's tag extraction from running, so reasoning was
never shown during streaming. The post-response fallback then
displayed reasoning AFTER the already-visible streamed response,
creating a confusing reversed order.

Fix: route suppressed content to stream_delta_callback even when tool
calls are present. The CLI's _stream_delta handles tag extraction —
reasoning tags are routed to the reasoning display box, while
non-reasoning text is handled by the existing stream display logic.
This ensures reasoning appears before tool execution and the final
response, matching the expected visual order.
2026-03-28 12:34:32 -07:00
Teknium 901494d728 feat: make tool-use enforcement configurable via agent.tool_use_enforcement (#3551)
The TOOL_USE_ENFORCEMENT_GUIDANCE injection (added in #3528) was
hardcoded to only match gpt/codex model names. This makes it a
config option so users can turn it on for any model family.

New config key: agent.tool_use_enforcement
  - "auto" (default): matches gpt/codex (existing behavior)
  - true: inject for all models
  - false: never inject
  - list of strings: custom model-name substrings to match
    e.g. ["gpt", "codex", "deepseek", "qwen"]

No version bump needed — deep merge provides the default
automatically for existing installs.

12 new tests covering all config modes.
2026-03-28 12:31:22 -07:00
Osman Mehmood d26ee20659 docs(discord): fix Public Bot setting for Discord-provided invite link (#3519)
The documentation incorrectly instructed users to set Public Bot to OFF,
but this prevents using the Discord-provided invite link (recommended method),
causing the error: 'Private application cannot have a default authorization link'.

Changes:
- Changed Step 2: Public Bot now set to ON (required for Installation tab method)
- Added info callout explaining the Private Bot alternative (use Manual URL)
- Added note in Step 5 Option A clarifying the Public Bot requirement

Fixes Discord bot setup flow for new users following the recommended path.

Co-authored-by: Docs Fix <docs-fix@example.com>
2026-03-28 12:24:43 -07:00
Teknium 393929831e fix(gateway): preserve transcript on /compress and hygiene compression (salvage #3516) (#3556)
* fix(gateway): preserve full transcript on /compress instead of overwriting

The /compress command calls _compress_context() which correctly ends the
old session (preserving its full transcript in SQLite) and creates a new
session_id for the continuation. However, it then immediately called
rewrite_transcript() on the OLD session_id, overwriting the preserved
transcript with the compressed version — destroying searchable history.

Auto-compression (triggered by context pressure) does not have this bug
because the gateway already handles the session_id swap via the
agent.session_id != session_id check after _run_agent_sync.

Fix: after _compress_context creates the new session, write the compressed
messages into the NEW session_id and update the session store pointer.
The old session's full transcript stays intact and searchable via
session_search.

Before: /compress destroys original messages, session_search can't find
details from compressed portions.

After: /compress behaves like /new for history — full transcript preserved,
compressed context for the live session.

* fix(gateway): preserve transcript on /compress and hygiene compression

Apply session_id swap after _compress_context in both /compress handler
and hygiene pre-compression. _compress_context creates a new session
(ending the old one), but both paths were calling rewrite_transcript on
the OLD session_id — overwriting the preserved transcript and destroying
searchable history.

Now follows the same pattern as the auto-compression handler (lines
5415-5423): detect the new session_id, update the session store entry,
and write compressed messages to the new session.

Also fix FakeCompressAgent test mock to include session_id attribute
and simulate the session_id change that real _compress_context performs.

Co-authored-by: MacroAnarchy <MacroAnarchy@users.noreply.github.com>

---------

Co-authored-by: MacroAnarchy <MacroAnarchy@users.noreply.github.com>
2026-03-28 12:23:43 -07:00
Teknium be322efdf2 fix(matrix): harden e2ee access-token handling (#3562)
* fix(matrix): harden e2ee access-token handling

* fix: patch nio mock in e2ee maintenance sync loop test

The sync_loop now imports nio for SyncError checking (from PR #3280),
so the test needs to inject a fake nio module via sys.modules.

---------

Co-authored-by: Cortana <andrew+cortana@chalkley.org>
2026-03-28 12:13:35 -07:00
Teknium be39292633 fix(cli): guard .strip() against None values from YAML config (#3552)
dict.get(key, default) only returns default when key is ABSENT.
When YAML has 'key:' with no value, it parses as None — .get()
returns None, then .strip() crashes with AttributeError.

Use (x or '') pattern to handle both missing and null cases.


Salvaged from PR #3217.

Co-authored-by: erosika <erosika@users.noreply.github.com>
2026-03-28 11:39:01 -07:00
Teknium df6ce848e9 fix(provider): remove MiniMax /v1→/anthropic auto-correction to allow user override (#3553)
The minimax-specific auto-correction in runtime_provider.py was
preventing users from overriding to the OpenAI-compatible endpoint
via MINIMAX_BASE_URL. Users in certain regions get nginx 404 on
api.minimax.io/anthropic and need to switch to api.minimax.chat/v1.

The generic URL-suffix detection already handles /anthropic →
anthropic_messages, so the minimax-specific code was redundant for
the default path and harmful for the override path.

Now: default /anthropic URL works via generic detection, user
override to /v1 gets chat_completions mode naturally.

Closes #3546 (different approach — respects user overrides instead
of changing the default endpoint).
2026-03-28 11:36:59 -07:00
Teknium 735ca9dfb2 refactor: replace swe-rex with native Modal SDK for Modal backend (#3538)
Drop the swe-rex dependency for Modal terminal backend and use the
Modal SDK directly (Sandbox.create + Sandbox.exec). This fixes:

- AsyncUsageWarning from synchronous App.lookup() in async context
- DeprecationError from unencrypted_ports / .url on unencrypted tunnels
  (deprecated 2026-03-05)

The new implementation:
- Uses modal.App.lookup.aio() for async-safe app creation
- Uses Sandbox.create.aio() with 'sleep infinity' entrypoint
- Uses Sandbox.exec.aio() for direct command execution (no HTTP server
  or tunnel needed)
- Keeps all existing features: persistent filesystem snapshots,
  configurable resources (CPU/memory/disk), sudo support, interrupt
  handling, _AsyncWorker for event loop safety

Consistent with the Docker backend precedent (PR #2804) where we
removed mini-swe-agent in favor of direct docker run.

Files changed:
- tools/environments/modal.py - core rewrite
- tools/terminal_tool.py - health check: modal instead of swerex
- hermes_cli/setup.py - install modal instead of swe-rex[modal]
- pyproject.toml - modal extra: modal>=1.0.0 instead of swe-rex[modal]
- scripts/kill_modal.sh - grep for hermes-agent instead of swe-rex
- tests/ - updated for new implementation
- environments/README.md - updated patches section
- website/docs - updated install command
2026-03-28 11:21:44 -07:00
Teknium 455bf2e853 feat: activate plugin lifecycle hooks (pre/post_llm_call, session start/end) (#3542)
The plugin system defined six lifecycle hooks but only pre_tool_call and
post_tool_call were invoked.  This activates the remaining four so that
external plugins (e.g. memory systems) can hook into the conversation
loop without touching core code.

Hook semantics:
- on_session_start: fires once when a new session is created
- pre_llm_call: fires once per turn before the tool-calling loop;
  plugins can return {"context": "..."} to inject into the ephemeral
  system prompt (not cached, not persisted)
- post_llm_call: fires once per turn after the loop completes, with
  user_message and assistant_response for sync/storage
- on_session_end: fires at the end of every run_conversation call

invoke_hook() now returns a list of non-None callback return values,
enabling pre_llm_call context injection while remaining backward
compatible (existing hooks that return None are unaffected).

Salvaged from PR #2823.

Co-authored-by: Nicolò Boschi <boschi1997@gmail.com>
2026-03-28 11:14:54 -07:00
Teknium 411e3c1539 fix(api-server): allow Idempotency-Key in CORS headers (#3530)
Browser clients using the Idempotency-Key header for request
deduplication were blocked by CORS preflight because the header
was not listed in Access-Control-Allow-Headers.

Add Idempotency-Key to _CORS_HEADERS and add tests for both the
new header allowance and the existing Vary: Origin behavior.

Co-authored-by: aydnOktay <aydnOktay@users.noreply.github.com>
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-03-28 08:16:41 -07:00
Teknium d313a3b7d7 fix: auto-repair jobs.json with invalid control characters (#3537)
load_jobs() uses strict json.load() which rejects bare control characters
(e.g. literal newlines) in JSON string values. When a cron job prompt
contains such characters, the parser throws JSONDecodeError and the
function silently returns an empty list — causing ALL scheduled jobs
to stop firing with no error logged.

Fix: on JSONDecodeError, retry with json.loads(strict=False). If jobs
are recovered, auto-rewrite the file with proper escaping via save_jobs()
and log a warning. Only fall back to empty list if the JSON is truly
unrecoverable.

Co-authored-by: Sebastian Bochna <sbochna@SB-MBP-M2-2.local>
2026-03-28 08:15:31 -07:00
Teknium 80a899a8e2 fix: enable fine-grained tool streaming for Claude/OpenRouter + retry SSE errors (#3497)
Root cause: Anthropic buffers entire tool call arguments and goes silent
for minutes while thinking (verified: 167s gap with zero SSE events on
direct API).  OpenRouter's upstream proxy times out after ~125s of
inactivity and drops the connection with 'Network connection lost'.

Fix: Send the x-anthropic-beta: fine-grained-tool-streaming-2025-05-14
header for Claude models on OpenRouter.  This makes Anthropic stream
tool call arguments token-by-token instead of buffering them, keeping
the connection alive through OpenRouter's proxy.

Live-tested: the exact prompt that consistently failed at ~128s now
completes successfully — 2,972 lines written, 49K tokens, 8 minutes.

Additional improvements:

1. Send explicit max_tokens for Claude through OpenRouter.  Without it,
   OpenRouter defaults to 65,536 (confirmed via echo_upstream_body) —
   only half of Opus 4.6's 128K limit.

2. Classify SSE 'Network connection lost' as retryable in the streaming
   inner retry loop.  The OpenAI SDK raises APIError from SSE error
   events, which was bypassing our transient error retry logic.

3. Actionable diagnostic guidance when stream-drop retries exhaust.
2026-03-28 08:01:37 -07:00
Teknium e295a2215a fix(gateway): include user-local bin paths in systemd unit PATH (#3527)
Add ~/.local/bin, ~/.cargo/bin, ~/go/bin, ~/.npm-global/bin to the
systemd unit PATH so tools installed via uv/pipx/cargo/go are
discoverable by MCP servers and terminal commands.

Uses a _build_user_local_paths() helper that checks exists() before
adding, and correctly resolves home dir for both user and system
service types.

Co-authored-by: Kal Sze <ksze@users.noreply.github.com>
2026-03-28 07:47:40 -07:00
Teknium 831e8ba0e5 feat: tool-use enforcement + strip budget warnings from history (#3528)
Cherry-pick of feat/gpt-tool-steering with modifications:

1. Tool-use enforcement prompt (refactored from GPT-specific):
   - Renamed GPT_TOOL_USE_GUIDANCE -> TOOL_USE_ENFORCEMENT_GUIDANCE
   - Added TOOL_USE_ENFORCEMENT_MODELS tuple: ('gpt', 'codex')
   - Injection logic now checks against the tuple instead of hardcoding
     'gpt' — adding new model families is a one-line change
   - Addresses models describing actions instead of making tool calls

2. Budget warning history stripping:
   - _strip_budget_warnings_from_history() strips _budget_warning JSON
     keys and [BUDGET WARNING: ...] text from tool results at the start
     of run_conversation()
   - Prevents old budget warnings from poisoning subsequent turns

Based on PR #3479 by teknium1.
2026-03-28 07:38:36 -07:00
Teknium 9d4b3e5470 fix: harden hermes update against diverged history, non-main branches, and gateway edge cases (salvage #3489) (#3492)
* fix: harden `hermes update` against diverged history, non-main branches, and gateway edge cases

The self-update command (`hermes update` / gateway `/update`) could fail
or silently corrupt state in several scenarios:

1. **Diverged history** — `git pull --ff-only` aborts with a cryptic
   subprocess error when upstream has force-pushed or rebased. Now falls
   back to `git reset --hard origin/main` since local changes are already
   stashed.

2. **User on a feature branch / detached HEAD** — the old code would
   either clobber the feature branch HEAD to point at origin/main, or
   silently pull against a non-existent remote branch. Now auto-checkouts
   main before pulling, with a clear warning.

3. **Fetch failures** — network or auth errors produced raw subprocess
   tracebacks. Now shows user-friendly messages ("Network error",
   "Authentication failed") with actionable hints.

4. **reset --hard failure** — if the fallback reset itself fails (disk
   full, permissions), the old code would still attempt stash restore on
   a broken working tree. Now skips restore and tells the user their
   changes are safe in stash.

5. **Gateway /update stash conflicts** — non-interactive mode (Telegram
   `/update`) called sys.exit(1) when stash restore had conflicts, making
   the entire update report as failed even though the code update itself
   succeeded. Now treats stash conflicts as non-fatal in non-interactive
   mode (returns False instead of exiting).

* fix: restore stash and branch on 'already up to date' early return

The PR moved stash creation before the commit-count check (needed for
the branch-switching feature), but the 'already up to date' early return
didn't restore the stash or switch back to the original branch — leaving
the user stranded on main with changes trapped in a stash.

Now the early-return path restores the stash and checks out the original
branch when applicable.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-27 23:12:43 -07:00
Teknium 6ed9740444 fix: prevent unbounded growth of _seen_uids in EmailAdapter (#3490)
EmailAdapter._seen_uids accumulates every IMAP UID ever seen but
never removes any. A long-running gateway processing a high-volume
inbox would leak memory indefinitely — thousands of integers per day.

IMAP UIDs are monotonically increasing integers, so old UIDs are safe
to drop: new messages always have higher UIDs, and the IMAP UNSEEN
flag already prevents re-delivery regardless of our local tracking.

Fix adds _trim_seen_uids() which keeps only the most recent 1000 UIDs
(half of the 2000-entry cap) when the set grows too large. Called
automatically during connect() and after each fetch cycle.

Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
2026-03-27 23:08:42 -07:00
Teknium 290c71a707 fix(gateway): scope progress thread fallback to Slack only (salvage #3414) (#3488)
* test(gateway): map fixture adapter by platform in progress threading tests

* fix(gateway): scope progress thread fallback to Slack only

---------

Co-authored-by: EmpireOperating <258363005+EmpireOperating@users.noreply.github.com>
2026-03-27 22:37:53 -07:00
Teknium 09796b183b fix: alibaba provider default endpoint and model list (#3484)
- Change default inference_base_url from dashscope-intl Anthropic-compat
  endpoint to coding-intl OpenAI-compat /v1 endpoint. The old Anthropic
  endpoint 404'd when used with the OpenAI SDK (which appends
  /chat/completions to a /apps/anthropic base URL).

- Update curated model list: remove models unavailable on coding-intl
  (qwen3-max, qwen-plus-latest, qwen3.5-flash, qwen-vl-max), add
  third-party models available on the platform (glm-5, glm-4.7,
  kimi-k2.5, MiniMax-M2.5).

- URL-based api_mode auto-detection still works: overriding
  DASHSCOPE_BASE_URL to an /apps/anthropic endpoint automatically
  switches to anthropic_messages mode.

- Update provider description and env var descriptions to reflect the
  coding-intl multi-provider platform.

- Update tests to match new default URL and test the anthropic override
  path instead.
2026-03-27 22:10:10 -07:00
Teknium 15cfd20820 fix: cap context pressure percentage at 100% in display (#3480)
* fix: cap context pressure percentage at 100% in display

The forward-looking token estimate can overshoot the compaction threshold
(e.g. a large tool result pushes it from 70% to 109% in one step). The
progress bar was already capped via min(), but pct_int was not — causing
the user to see '109% to compaction' which is confusing.

Cap pct_int at 100 in both CLI and gateway display functions.

Reported by @JoshExile82.

* refactor: use real API token counts for compression decisions

Replace the rough chars/3 estimation with actual prompt_tokens +
completion_tokens from the API response. The estimation was needed to
predict whether tool results would push context past the threshold, but
the default 50% threshold leaves ample headroom — if tool results push
past it, the next API call reports real usage and triggers compression
then.

This removes all estimation from the compression and context pressure
paths, making both 100% data-driven from provider-reported token counts.

Also removes the dead _msg_count_before_tools variable.
2026-03-27 21:42:09 -07:00
Teknium 03f24c1edd fix: session_search fallback preview on summarization failure (salvage #3413) (#3478)
* Fix #3409: Add fallback to session_search to prevent false negatives on summarization failure

Fixes #3409. When the auxiliary summarizer fails or returns None, the tool now returns a raw fallback preview of the matched session instead of silently dropping it and returning an empty list

* fix: clean up fallback logic — separate exception handling from preview

Restructure the loop: handle exceptions first (log + nullify), build
entry dict once, then branch on result truthiness. Removes duplicated
field assignments and makes the control flow linear.

---------

Co-authored-by: devorun <130918800+devorun@users.noreply.github.com>
2026-03-27 21:27:51 -07:00
Teknium 388fa5293d fix(matrix): add missing matrix entry in PLATFORMS dict (#3473)
Matrix platform was missing from the PLATFORMS config, causing a
KeyError in _get_platform_tools() when handling Matrix messages.
Every other platform (telegram, discord, slack, etc.) was present
but matrix was overlooked.

Co-authored-by: williamtwomey <williamtwomey@users.noreply.github.com>
2026-03-27 18:36:23 -07:00
Teknium 83043e9aa8 fix: add timeout to subprocess calls in context_references (#3469)
_expand_git_reference() and _rg_files() called subprocess.run()
without a timeout. On a large repository, @diff, @staged, or
@git:N references could hang the agent indefinitely while git
or ripgrep processes slow output.

- Add timeout=30 to git subprocess in _expand_git_reference()
  with a user-friendly error message on TimeoutExpired
- Add timeout=10 to rg subprocess in _rg_files() returning
  None on timeout (falls back to os.walk folder listing)

Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
2026-03-27 17:51:14 -07:00
Teknium b6b87dedd4 fix: discover plugins before reading plugin toolsets in tools_config (#3457)
hermes tools and _get_platform_tools() call get_plugin_toolsets() /
_get_plugin_toolset_keys() without first ensuring plugins have been
discovered. discover_plugins() only runs as a side effect of importing
model_tools.py, which hermes tools never does. This means:

- hermes tools TUI never shows plugin toolsets (invisible to users)
- _get_platform_tools() in standalone processes misses plugin toolsets

Fix: call discover_plugins() (idempotent) in both
_get_plugin_toolset_keys() and _get_effective_configurable_toolsets()
before accessing plugin state. In the gateway/CLI where model_tools.py
is already imported, the call is a no-op (discover_and_load checks
_discovered flag).
2026-03-27 15:31:17 -07:00
Teknium 8fdfc4b00c fix(agent): detect thinking-budget exhaustion on truncation, skip useless retries (#3444)
When finish_reason='length' and the response contains only reasoning
(think blocks or empty content), the model exhausted its output token
budget on thinking with nothing left for the actual response.

Previously, this fell into either:
- chat_completions: 3 useless continuation retries (model hits same limit)
- anthropic/codex: generic 'Response truncated' error with rollback

Now: detect the think-only + length condition early and return immediately
with a targeted error message: 'Model used all output tokens on reasoning
with none left for the response. Try lowering reasoning effort or
increasing max_tokens.'

This saves 2 wasted API calls on the chat_completions path and gives
users actionable guidance instead of a cryptic error.

The existing think-only retry logic (finish_reason='stop') is unchanged —
that's a genuine model glitch where retrying can help.
2026-03-27 15:29:30 -07:00
Teknium 658692799d fix: guard aux LLM calls against None content + reasoning fallback + retry (salvage #3389) (#3449)
Salvage of #3389 by @binhnt92 with reasoning fallback and retry logic added on top.

All 7 auxiliary LLM call sites now use extract_content_or_reasoning() which mirrors the main agent loop's behavior: extract content, strip think blocks, fall back to structured reasoning fields, retry on empty.

Closes #3389.
2026-03-27 15:28:19 -07:00
Teknium ab09f6b568 feat: curate HF model picker with OpenRouter analogues (#3440)
Show only agentic models that map to OpenRouter defaults:

  Qwen/Qwen3.5-397B-A17B          ↔ qwen/qwen3.5-plus
  Qwen/Qwen3.5-35B-A3B            ↔ qwen/qwen3.5-35b-a3b
  deepseek-ai/DeepSeek-V3.2       ↔ deepseek/deepseek-chat
  moonshotai/Kimi-K2.5             ↔ moonshotai/kimi-k2.5
  MiniMaxAI/MiniMax-M2.5           ↔ minimax/minimax-m2.5
  zai-org/GLM-5                    ↔ z-ai/glm-5
  XiaomiMiMo/MiMo-V2-Flash         ↔ xiaomi/mimo-v2-pro
  moonshotai/Kimi-K2-Thinking      ↔ moonshotai/kimi-k2-thinking

Users can still pick any HF model via Enter custom model name.
2026-03-27 13:54:46 -07:00
Teknium e4e04c2005 fix: make tirith block verdicts approvable instead of hard-blocking (#3428)
Previously, tirith exit code 1 (block) immediately rejected the command
with no approval prompt — users saw 'BLOCKED: Command blocked by
security scan' and the agent moved on.  This prevented gateway/CLI users
from approving pipe-to-shell installs like 'curl ... | sh' even when
they understood the risk.

Changes:
- Tirith 'block' and 'warn' now both go through the approval flow.
  Users see the full tirith findings (severity, title, description,
  safer alternatives) and can choose to approve or deny.
- New _format_tirith_description() builds rich descriptions from tirith
  findings JSON so the approval prompt is informative.
- CLI startup now warns when tirith is enabled but not available, so
  users know command scanning is degraded to pattern matching only.

The default approval choice is still deny, so the security posture is
unchanged for unattended/timeout scenarios.

Reported via Discord by pistrie — 'curl -fsSL https://mandex.dev/install.sh | sh'
was hard-blocked with no way to approve.
2026-03-27 13:22:01 -07:00
Teknium 6f11ff53ad fix(anthropic): use model-native output limits instead of hardcoded 16K (#3426)
The Anthropic adapter defaulted to max_tokens=16384 when no explicit value
was configured.  This severely limits thinking-enabled models where thinking
tokens count toward max_tokens:

- Claude Opus 4.6 supports 128K output but was capped at 16K
- Claude Sonnet 4.6 supports 64K output but was capped at 16K

With extended thinking (adaptive or budget-based), the model could exhaust
the entire 16K on reasoning, leaving zero tokens for the actual response.
This caused two user-visible errors:
- 'Response truncated (finish_reason=length)' — thinking consumed most tokens
- 'Response only contains think block with no content' — thinking consumed all

Fix: add _ANTHROPIC_OUTPUT_LIMITS lookup table (sourced from Anthropic docs
and Cline's model catalog) and use the model's actual output limit as the
default.  Unknown future models default to 128K (the current maximum).

Also adds context_length clamping: if the user configured a smaller context
window (e.g. custom endpoint), max_tokens is clamped to context_length - 1
to avoid exceeding the window.

Closes #2706
2026-03-27 13:02:52 -07:00
Teknium fb46a90098 fix: increase API timeout default from 900s to 1800s for slow-thinking models (#3431)
Models like GLM-5/5.1 can think for 15+ minutes. The previous 900s
(15 min) default for HERMES_API_TIMEOUT killed legitimate requests.

Raised to 1800s (30 min) in both places that read the env var:
- _build_api_kwargs() timeout (non-streaming total timeout)
- _call_chat_completions() write timeout (streaming connection)

The streaming per-chunk read timeout (60s) and stale stream detector
(180-300s) are unchanged — those are appropriate for inter-chunk timing.
2026-03-27 13:02:23 -07:00
Teknium fd8c465e42 feat: add Hugging Face as a first-class inference provider (#3419)
Salvage of PR #1747 (original PR #1171 by @davanstrien) onto current main.

Registers Hugging Face Inference Providers (router.huggingface.co/v1) as a named provider:
- hermes chat --provider huggingface (or --provider hf)
- 18 curated open models via hermes model picker
- HF_TOKEN in ~/.hermes/.env
- OpenAI-compatible endpoint with automatic failover (Groq, Together, SambaNova, etc.)

Files: auth.py, models.py, main.py, setup.py, config.py, model_metadata.py, .env.example, 5 docs pages, 17 new tests.

Co-authored-by: Daniel van Strien <davanstrien@gmail.com>
2026-03-27 12:41:59 -07:00
Teknium f57ebf52e9 fix(api-server): cancel orphaned agent + true interrupt on SSE disconnect (salvage #3399) (#3427)
Salvage of #3399 by @binhnt92 with true agent interruption added on top.

When a streaming /v1/chat/completions client disconnects mid-stream, the agent is now interrupted via agent.interrupt() so it stops making LLM API calls, and the asyncio task wrapper is cancelled.

Closes #3399.
2026-03-27 11:33:19 -07:00
Teknium 5127567d5d perf(ttft): cache skills prompt with shared skill_utils module (salvage #3366) (#3421)
Two-layer caching for build_skills_system_prompt():
  1. In-process LRU (OrderedDict, max 8) — same-process: 546ms → <1ms
  2. Disk snapshot (.skills_prompt_snapshot.json) — cold start: 297ms → 103ms

Key improvements over original PR #3366:
- Extract shared logic into agent/skill_utils.py (parse_frontmatter,
  skill_matches_platform, get_disabled_skill_names, extract_skill_conditions,
  extract_skill_description, iter_skill_index_files)
- tools/skills_tool.py delegates to shared module — zero code duplication
- Proper LRU eviction via OrderedDict.move_to_end + popitem(last=False)
- Cache invalidation on all skill mutation paths:
  - skill_manage tool (in-conversation writes)
  - hermes skills install (CLI hub)
  - hermes skills uninstall (CLI hub)
  - Automatic via mtime/size manifest on cold start

prompt_builder.py no longer imports tools.skills_tool (avoids pulling
in the entire tool registry chain at prompt build time).

6301 tests pass, 0 failures.

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-27 10:54:02 -07:00
Siddharth Balyan cc4514076b feat(nix): add suffix PATHs during nix build for more agent-friendliness (#3274)
* refactor: suffix runtimeDeps PATH so apt-installed tools take priority

Changes makeWrapper from --prefix to --suffix. In container mode,
tools installed via apt in /usr/bin now win over read-only nix store
copies. Nix store versions become dead-letter fallbacks. Native NixOS
mode unaffected — tools in /run/current-system/sw/bin already precede
the suffix.

* feat(container): first-boot apt provisioning for agent tools

Installs nodejs, npm, curl via apt and uv via curl on first container
boot. Uses sentinel file so subsequent boots skip. Container recreation
triggers fresh install. Combined with --suffix PATH change, agents get
mutable tools that support npm i -g and uv without hitting read-only
nix store paths.

* docs: update nixosModules header for tool provisioning

* feat(container): consolidate first-boot provisioning + Python 3.11 venv

Merge sudo and tool apt installs into a single apt-get update call.
Move uv install outside the sentinel so transient failures retry on
next boot. Bootstrap a Python 3.11 venv via uv (--seed for pip) and
prepend ~/.venv/bin to PATH so agents get writable python/pip/node
out of the box.

---------

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-03-27 23:00:56 +05:30
Teknium 8ecd7aed2c fix: prevent reasoning box from rendering 3x during tool-calling loops (#3405)
Two independent bugs caused the reasoning box to appear three times when
the model produced reasoning + tool_calls:

Bug A: _build_assistant_message() re-fired reasoning_callback with the full
reasoning text even when streaming had already displayed it. The original
guard only checked structured reasoning_content deltas, but reasoning also
arrives via content tag extraction (<REASONING_SCRATCHPAD>/<think> tags
in delta.content), which went through _fire_stream_delta not
_fire_reasoning_delta. Fix: skip the callback entirely when streaming is
active — both paths display reasoning during the stream. Any reasoning not
shown during streaming is caught by the CLI post-response fallback.

Bug B: The post-response reasoning display checked _reasoning_stream_started,
but that flag was reset by _reset_stream_state() during intermediate turn
boundaries (when stream_delta_callback(None) fires between tool calls).
Introduced _reasoning_shown_this_turn flag that persists across the tool
loop and is only reset at the start of each user turn.

Live-tested in PTY: reasoning now shows exactly once per API call, no
duplicates across tool-calling loops.
2026-03-27 09:57:50 -07:00
Teknium e0dbbdb2c9 fix: eliminate 'Event loop is closed' / 'Press ENTER to continue' during idle (#3398)
The OpenAI SDK's AsyncHttpxClientWrapper.__del__ schedules aclose() via
asyncio.get_running_loop().create_task().  When an AsyncOpenAI client is
garbage-collected while prompt_toolkit's event loop is running (the common
CLI idle state), the aclose() task runs on prompt_toolkit's loop but the
underlying TCP transport is bound to a different (dead) worker loop.
The transport's self._loop.call_soon() then raises RuntimeError('Event
loop is closed'), which prompt_toolkit surfaces as the disruptive
'Unhandled exception in event loop ... Press ENTER to continue...' error.

Three-layer fix:

1. neuter_async_httpx_del(): Monkey-patches __del__ to a no-op at CLI
   startup before any AsyncOpenAI clients are created.  Safe because
   cached clients are explicitly cleaned via _force_close_async_httpx,
   and uncached clients' TCP connections are cleaned by the OS on exit.

2. Custom asyncio exception handler: Installed on prompt_toolkit's event
   loop to silently suppress 'Event loop is closed' RuntimeError.
   Defense-in-depth for SDK upgrades that might change the class name.

3. cleanup_stale_async_clients(): Called after each agent turn (when the
   agent thread joins) to proactively evict cache entries whose event
   loop is closed, preventing stale clients from accumulating.
2026-03-27 09:45:25 -07:00
Teknium eb2127c1dc fix(cron): prevent recurring job re-fire on gateway crash/restart loop (#3396)
When a gateway crashes mid-job execution (before mark_job_run can persist
the updated next_run_at), the job would fire again on every restart attempt
within the grace window. For a daily 6:15 AM job with a 2-hour grace,
rapidly restarting the gateway could trigger dozens of duplicate runs.

Fix: call advance_next_run() BEFORE run_job() in tick(). For recurring
jobs (cron/interval), this preemptively advances next_run_at to the next
future occurrence and persists it to disk. If the process then crashes
during execution, the job won't be considered due on restart.

One-shot jobs are left unchanged — they still retry on restart since
there's no future occurrence to advance to.

This changes the scheduler from at-least-once to at-most-once semantics
for recurring jobs, which is the correct tradeoff: missing one daily
message is far better than sending it dozens of times.
2026-03-27 08:02:58 -07:00
Teknium 5a1e2a307a perf(ttft): salvage easy-win startup optimizations from #3346 (#3395)
* perf(ttft): dedupe shared tool availability checks

* perf(ttft): short-circuit vision auto-resolution

* perf(ttft): make Claude Code version detection lazy

* perf(ttft): reuse loaded toolsets for skills prompt

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-27 07:49:44 -07:00
Teknium 41d9d08078 fix(telegram): fall back to no thread_id on 'Message thread not found' (#3390)
python-telegram-bot's BadRequest inherits from NetworkError, so the
send() retry loop was catching 'Message thread not found' as a transient
network error and retrying 3 times before silently failing. This killed
all tool progress messages, streaming responses, and typing indicators
when the incoming message carried an invalid message_thread_id.

Now detect BadRequest inside the NetworkError handler:
- 'thread not found' + thread_id set → clear thread_id and retry once
  (message still reaches the chat, just without topic threading)
- Other BadRequest errors → raise immediately (permanent, don't retry)
- True NetworkError → retry as before (transient)

252 silent failures in gateway.log traced to this on 2026-03-26.

5 new tests for thread fallback, non-thread BadRequest, no-thread sends,
network retry, and multi-chunk fallback.
2026-03-27 06:07:28 -07:00
Teknium b7bcae49c6 fix: SQLite WAL write-lock contention causing 15-20s TUI freeze (#3385)
Multiple hermes processes (gateway + CLI sessions + worktree agents) sharing
one state.db caused WAL write-lock convoy effects. SQLite's built-in busy
handler uses deterministic sleep intervals (up to 100ms) that synchronize
competing writers, creating 15-20 second freezes during agent init.

Root cause: timeout=30.0 with 7+ concurrent connections meant:
- WAL never checkpointed (294MB, readers always blocked it)
- Bloated WAL slowed all reads and writes
- Deterministic backoff caused convoy effects under contention

Fix:
- Replace 30s SQLite timeout with 1s + app-level retry (15 attempts,
  random 20-150ms jitter between retries to break convoys)
- Use BEGIN IMMEDIATE for explicit write-lock acquisition (fail fast)
- Set isolation_level=None for manual transaction control
- PASSIVE WAL checkpoint on close() and every 50 writes
- All 12 write methods converted to _execute_write() helper

Before: 15-20s frozen at create_session during agent init
After:  <1s to API call, WAL stays at ~4MB

Tested: 4355 tests pass, 3 concurrent live sessions with simultaneous
writes showed zero contention on every py-spy sample.
2026-03-27 05:22:57 -07:00
Teknium 915df02bbf fix(streaming): stale stream detector race causing spurious RemoteProtocolError
The stale stream detector (90s timeout) was killing healthy connections
during the model's thinking phase, producing self-inflicted
RemoteProtocolError ("peer closed connection without sending complete
message body"). Three issues:

1. last_chunk_time was never reset between inner stream retries, so
   subsequent attempts inherited the previous attempt's stale budget
2. The non-streaming fallback path didn't reset the timer either
3. 90s base timeout was too aggressive for large-context Opus sessions
   where thinking time before first token routinely exceeds 90s

Fix: reset last_chunk_time at the start of each streaming attempt and
before the non-streaming fallback. Increase base timeout to 180s and
scale to 300s for >100K token contexts.

Made-with: Cursor
2026-03-27 04:05:51 -07:00
Teknium 75fcbc44ce feat(telegram): auto-discover fallback IPs via DoH when api.telegram.org is unreachable (#3376)
* feat(telegram): auto-discover fallback IPs via DoH when api.telegram.org is unreachable

On some networks (university, corporate), api.telegram.org resolves to a
valid Telegram IP that is unreachable due to routing/firewall rules. A
different IP in the same Telegram-owned 149.154.160.0/20 block works fine.

This adds automatic fallback IP discovery at connect time:
1. Query Google and Cloudflare DNS-over-HTTPS for api.telegram.org A records
2. Exclude the system-DNS IP (the unreachable one), use the rest as fallbacks
3. If DoH is also blocked, fall back to a seed list (149.154.167.220)
4. TelegramFallbackTransport tries primary first, sticks to whichever works

No configuration needed — works automatically. TELEGRAM_FALLBACK_IPS env var
still available as manual override. Zero impact on healthy networks (primary
path succeeds on first attempt, fallback never exercised).

No new dependencies (uses httpx already in deps + stdlib socket).

* fix: share transport instance and downgrade seed fallback log to info

- Use single TelegramFallbackTransport shared between request and
  get_updates_request so sticky IP is shared across polling and API calls
- Keep separate HTTPXRequest instances (different timeout settings)
- Downgrade "using seed fallback IPs" from warning to info to avoid
  noisy logs on healthy networks

* fix: add telegram.request mock and discovery fixture to remaining test files

The original PR missed test_dm_topics.py and
test_telegram_network_reconnect.py — both need the telegram.request
mock module. The reconnect test also needs _no_auto_discovery since
_handle_polling_network_error calls connect() which now invokes
discover_fallback_ips().

---------

Co-authored-by: Mohan Qiao <Gavin-Qiao@users.noreply.github.com>
2026-03-27 04:03:13 -07:00
Teknium be416cdfa9 fix: guard config.get() against YAML null values to prevent AttributeError (#3377)
dict.get(key, default) returns None — not the default — when the key IS
present but explicitly set to null/~ in YAML.  Calling .lower() on that
raises AttributeError.

Use (config.get(key) or fallback) so both missing keys and explicit nulls
coalesce to the intended default.

Files fixed:
- tools/tts_tool.py — _get_provider()
- tools/web_tools.py — _get_backend()
- tools/mcp_tool.py — MCPServerTask auth config
- trajectory_compressor.py — _detect_provider() and config loading

Co-authored-by: dieutx <dangtc94@gmail.com>
2026-03-27 04:03:00 -07:00
Teknium b8b1f24fd7 fix: handle addition-only hunks in V4A patch parser (#3325)
V4A patches with only + lines (no context or - lines) were silently
dropped because search_lines was empty and the 'if search_lines:' block
was the only code path. Addition-only hunks are common when the model
generates patches for new functions or blocks.

Adds an else branch that inserts at the context_hint position when
available, or appends at end of file.

Includes 2 regression tests for addition-only hunks with and without
context hints.

Salvaged from PR #3092 by thakoreh.

Co-authored-by: Hiren <hiren.thakore58@gmail.com>
2026-03-26 19:38:04 -07:00
Teknium a2847ea7f0 fix(gateway): add media download retry to Mattermost, Slack, and base cache (#3323)
* fix(gateway): add media download retry to Mattermost, Slack, and base cache

Media downloads on Mattermost and Slack fail permanently on transient
errors (timeouts, 429 rate limits, 5xx server errors). Telegram and
WhatsApp already have retry logic, but these platforms had single-attempt
downloads with hardcoded 30s timeouts.

Changes:
- base.py cache_image_from_url: add retry with exponential backoff
  (covers Signal and any platform using the shared cache helper)
- mattermost.py _send_media_url: retry on 429/5xx/timeout (3 attempts)
- slack.py _download_slack_file: retry on timeout/5xx (3 attempts)
- slack.py _download_slack_file_bytes: same retry pattern

* test: add tests for media download retry

---------

Co-authored-by: dieutx <dangtc94@gmail.com>
2026-03-26 19:33:18 -07:00
Teknium 58ca875e19 feat(gateway): surface session config on /new, /reset, and auto-reset (#3321)
When a new session starts in the gateway (via /new, /reset, or
auto-reset), send the user a summary of the detected configuration:

   Session reset! Starting fresh.

  ◆ Model: qwen3.5:27b-q4_K_M
  ◆ Provider: custom
  ◆ Context: 8K tokens (config)
  ◆ Endpoint: http://localhost:11434/v1

This makes misconfigured context length immediately visible — a user
running a local 8K model that falls to the 128K default will see:

  ◆ Context: 128K tokens (default — set model.context_length in config to override)

Instead of silently getting no compression and degrading responses.

- _format_session_info() resolves model, provider, context length,
  and endpoint from config + runtime, matching the hygiene code's
  resolution chain
- Local/custom endpoints shown; cloud endpoints hidden (not useful)
- Context source annotated: config, detected, or default with hint
- Appended to /new and /reset responses, and auto-reset notifications
- 9 tests covering all formatting paths and failure resilience

Addresses the user-facing side of #2708 — instead of trying to fix
every edge case in context detection, surface the values so users
can immediately see when something is wrong.
2026-03-26 19:27:58 -07:00
Teknium 3f95e741a7 fix: validate empty user messages to prevent Anthropic API 400 errors (#3322)
When user messages have empty content (e.g., Discord @mention-only
messages, unrecognized attachments), the Anthropic API rejects the
request with 'user messages must have non-empty content'.

Changes:
- anthropic_adapter.py: Add empty content validation for user messages
  (string and list formats), matching the existing pattern for assistant
  and tool messages. Empty content gets '(empty message)' placeholder.

- discord.py: Defense-in-depth check at gateway layer to catch empty
  messages before they enter session history.

- Add 4 regression tests covering empty string, whitespace-only,
  empty list, and empty text block scenarios.

Fixes #3143

Co-authored-by: Bartok9 <bartok9@users.noreply.github.com>
2026-03-26 19:24:03 -07:00
Teknium 03396627a6 fix(ci): pin acp <0.9 and update retry-exhaust test (#3320)
Two remaining CI failures:

1. agent-client-protocol 0.9.0 removed AuthMethod (replaced with
   AuthMethodAgent/EnvVar/Terminal). Pin to <0.9 until the new API
   is evaluated — our usage doesn't map 1:1 to the new types.

2. test_429_exhausts_all_retries_before_raising expected pytest.raises
   but the agent now catches 429s after max retries, tries fallback,
   then returns a result dict. Updated to check final_response.
2026-03-26 19:21:34 -07:00
Teknium 22cfad157b fix: gateway token double-counting — use absolute set instead of increment (#3317)
The gateway's update_session() used += for token counts, but the cached
agent's session_prompt_tokens / session_completion_tokens are cumulative
totals that grow across messages. Each update_session call re-added the
running total, inflating usage stats with every message (1.7x after 3
messages, worse over longer conversations).

Fix: change += to = for in-memory entry fields, add set_token_counts()
to SessionDB that uses direct assignment instead of SQL increment, and
switch the gateway to call it.

CLI mode continues using update_token_counts() (increment) since it
tracks per-API-call deltas — that path is unchanged.

Based on analysis from PR #3222 by @zaycruz (closed).

Co-authored-by: zaycruz <zay@users.noreply.github.com>
2026-03-26 19:13:07 -07:00
Teknium 867eefdd9f fix(signal): track SSE keepalive comments as connection activity (#3316)
signal-cli sends SSE comment lines (':') as keepalives every ~15s. The
SSE listener only counted 'data:' lines as activity, so the health
monitor reported false idle warnings every 2 minutes during quiet
periods. Recognize ':' lines as valid activity per the SSE spec.

Salvaged from PR #2938 by ticketclosed-wontfix.
2026-03-26 19:10:25 -07:00
Teknium a8df7f9964 fix: gateway token double-counting with cached agents (#3306)
The cached agent accumulates session_input_tokens across messages, so
run_conversation() returns cumulative totals. But update_session() used
+= (increment), double-counting on every message after the first.

- session.py: change in-memory entry updates from += to = (direct
  assignment for cumulative values)
- hermes_state.py: add absolute=True flag to update_token_counts()
  that uses SET column = ? instead of SET column = column + ?
- session.py: pass absolute=True to the DB call

CLI path is unchanged — it passes per-API-call deltas directly to
update_token_counts() with the default absolute=False (increment).

Reported by @zaycruz in #3222. Closes #3222.
2026-03-26 19:04:53 -07:00
Teknium 1519c4d477 fix(session): add /resume CLI handler, session log truncation guard, reopen_session API (#3315)
Three improvements salvaged from PR #3225 by Mibayy:

1. Add /resume slash command handler in CLI process_command(). The
   command was registered in the commands registry but had no handler,
   so typing /resume produced 'Unknown command'. The handler resolves
   by title or session ID, ends the current session cleanly, loads
   conversation history from SQLite, re-opens the target session, and
   syncs the AIAgent instance. Follows the same pattern as new_session().

2. Add truncation guard in _save_session_log(). When resuming a session
   whose messages weren't fully written to SQLite, the agent starts with
   partial history and the first save would overwrite the full JSON log
   on disk. The guard reads the existing file and skips the write if it
   already has more messages than the current batch.

3. Add reopen_session() method to SessionDB. Proper API for clearing
   ended_at/end_reason instead of reaching into _conn directly.

Note: Bug 1 from the original PR (INSERT OR IGNORE + _session_db = None)
is already fixed on main — skipped as redundant.

Closes #3123.
2026-03-26 19:04:28 -07:00
Teknium 005786c55d fix(gateway): include per-platform ALLOW_ALL and SIGNAL_GROUP in startup allowlist check (#3313)
The startup warning 'No user allowlists configured' only checked
GATEWAY_ALLOW_ALL_USERS and per-platform _ALLOWED_USERS vars. It
missed SIGNAL_GROUP_ALLOWED_USERS and per-platform _ALLOW_ALL_USERS
vars (e.g. TELEGRAM_ALLOW_ALL_USERS), causing a false warning even
when users had these configured. The actual auth check in
_is_user_authorized already recognized these vars.

Cherry-picked from PR #3202 by binhnt92.

Co-authored-by: binhnt92 <binhnt.ht.92@gmail.com>
2026-03-26 18:23:49 -07:00
Teknium ad764d3513 fix(auxiliary): catch ImportError from build_anthropic_client in vision auto-detection (#3312)
_try_anthropic() caught ImportError on the module import (line 667-669)
but not on the build_anthropic_client() call (line 696). When the
anthropic_adapter module imports fine but the anthropic SDK is missing,
build_anthropic_client() raises ImportError at call time. This escaped
_try_anthropic() entirely, killing get_available_vision_backends() and
cascading to 7 test failures:

- 4 setup wizard tests hit unexpected 'Configure vision:' prompt
- 3 codex-auth-as-vision tests failed check_vision_requirements()

The fix wraps the build_anthropic_client call in try/except ImportError,
returning (None, None) when the SDK is unavailable — consistent with the
existing guard at the top of the function.
2026-03-26 18:21:59 -07:00
Teknium f008ee1019 fix(session): preserve reasoning fields in rewrite_transcript (#3311)
rewrite_transcript (used by /retry, /undo, /compress) was calling
append_message without reasoning, reasoning_details, or
codex_reasoning_items — permanently dropping them from SQLite.

Co-authored-by: alireza78a <alireza78.crypto@gmail.com>
2026-03-26 18:18:00 -07:00
Teknium 60fdb58ce4 fix(agent): update context compressor limits after fallback activation (#3305)
When _try_activate_fallback() switches to the fallback model, it
updates the agent's model/provider/client but never touches
self.context_compressor. The compressor keeps the primary model's
context_length and threshold_tokens, so compression decisions use
wrong limits — a 200K primary → 32K fallback still uses 200K-based
thresholds, causing oversized sessions to overflow the fallback.

Update the compressor's model, credentials, context_length, and
threshold_tokens after fallback activation using get_model_context_length()
for the new model.

Cherry-picked from PR #3202 by binhnt92.

Co-authored-by: binhnt92 <binhnt.ht.92@gmail.com>
2026-03-26 18:10:50 -07:00
Teknium 18d28c63a7 fix: add explicit hermes-api-server toolset for API server platform (#3304)
The API server adapter was creating agents without specifying
enabled_toolsets, causing ALL tools to load — including clarify,
send_message, and text_to_speech which don't work without interactive
callbacks or gateway dispatch.

Changes:
- toolsets.py: Add hermes-api-server toolset (core tools minus clarify,
  send_message, text_to_speech)
- api_server.py: Resolve toolsets from config.yaml platform_toolsets
  via _get_platform_tools() — same path as all other gateway platforms.
  Falls back to hermes-api-server default when no override configured.
- tools_config.py: Add api_server to PLATFORMS dict so users can
  customize via 'hermes tools' or platform_toolsets.api_server in
  config.yaml
- 12 tests covering toolset definition, config resolution, and
  user override

Reported by thatwolfieguy on Discord.
2026-03-26 18:02:26 -07:00
Teknium 3c57eaf744 fix: YAML boolean handling for tool_progress config (#3300)
YAML 1.1 parses bare `off` as boolean False, which is falsy in
Python's `or` chain and silently falls through to the 'all' default.
Users setting `display.tool_progress: off` in config.yaml saw no
effect — tool progress stayed on.

Normalise False → 'off' before the or chain in both affected paths:
- gateway/run.py _run_agent() tool progress reader
- cli.py HermesCLI.__init__() tool_progress_mode

Reported by @gibbsoft in #2859. Closes #2859.
2026-03-26 17:58:50 -07:00
Teknium 2d232c9991 feat(cli): configurable busy input mode + fix /queue always working (#3298)
Two changes:

1. Fix /queue command: remove the _agent_running guard that rejected
   /queue after the agent finished. The prompt was deferred in
   _pending_input until the agent completed, then the handler checked
   _agent_running (now False) and rejected it. /queue now always queues
   regardless of timing.

2. Add display.busy_input_mode config (CLI-only):
   - 'interrupt' (default): Enter while busy interrupts the current run
     (preserves existing behavior)
   - 'queue': Enter while busy queues the message for the next turn,
     with a 'Queued for the next turn: ...' confirmation
   Ctrl+C always interrupts regardless of this setting.

Salvaged from PR #3037 by StefanoChiodino. Key differences:
- Default is 'interrupt' (preserves existing behavior) not 'queue'
- No config version bump (unnecessary for new key in existing section)
- Simpler normalization (no alias map)
- /queue fix is simpler: just remove the guard instead of intercepting
  commands during busy state
2026-03-26 17:58:40 -07:00
Teknium 0375b2a0d7 fix(gateway): silence background agent terminal output (#3297)
* fix(gateway): silence flush agent terminal output

quiet_mode=True only suppresses AIAgent init messages.
Tool call output still leaks to the terminal through
_safe_print → _print_fn during session reset/expiry.

Since #2670 injected live memory state into the flush prompt,
the flush agent now reliably calls memory tools — making the
output leak noticeable for the first time.

Set _print_fn to a no-op so the background flush is fully silent.

* test(gateway): add test for flush agent terminal silence + fix dotenv mock

- Add TestFlushAgentSilenced: verifies _print_fn is set to a no-op on
  the flush agent so tool output never leaks to the terminal
- Fix pre-existing test failures: replace patch('run_agent.AIAgent')
  with sys.modules mock to avoid importing run_agent (requires openai)
- Add autouse _mock_dotenv fixture so all tests in this file run
  without the dotenv package installed

* fix(display): route KawaiiSpinner output through print_fn to fully silence flush agent

The previous fix set tmp_agent._print_fn = no-op on the flush agent but
spinner output and quiet-mode cute messages bypassed _print_fn entirely:
- KawaiiSpinner captured sys.stdout at __init__ and wrote directly to it
- quiet-mode tool results used builtin print() instead of _safe_print()

Add optional print_fn parameter to KawaiiSpinner.__init__; _write routes
through it when set. Pass self._print_fn to all spinner construction sites
in run_agent.py and change the quiet-mode cute message print to _safe_print.
The existing gateway fix (tmp_agent._print_fn = lambda) now propagates
correctly through both paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(gateway): silence hygiene and compression background agents

Two more background AIAgent instances in the gateway were created with
quiet_mode=True but without _print_fn = no-op, causing tool output to
leak to the terminal:
- _hyg_agent (in-turn hygiene memory agent)
- tmp_agent (_compress_context path)

Apply the same _print_fn no-op pattern used for the flush agent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(display): remove unused _last_flush_time from KawaiiSpinner

Attribute was set but never read; upstream already removed it.
Leftover from conflict resolution during rebase onto upstream/main.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 17:40:31 -07:00
Teknium 08fa326bb0 feat(gateway): deliver background review notifications to user chat (#3293)
The background memory/skill review (_spawn_background_review) runs
after the agent response when turn/iteration counters exceed their
thresholds. It saves memories and skills, then prints a summary like
'💾 Memory updated · User profile updated'. In CLI mode this goes to
the terminal via _safe_print. In gateway mode, _safe_print routes to
print() which goes to stdout — invisible to the user.

Add a background_review_callback attribute to AIAgent. When set, the
background review thread calls it with the summary string after saves
complete. The gateway wires this to adapter.send() via the same
run_coroutine_threadsafe bridge used by status_callback, delivering
the notification to the user's chat.
2026-03-26 17:38:24 -07:00
Teknium bde45f5a2a fix(gateway): retry transient send failures and notify user on exhaustion (#3288)
When send() fails due to a network error (ConnectError, ReadTimeout, etc.),
the failure was silently logged and the user received no feedback — appearing
as a hang. In one reported case, a user waited 1+ hour for a response that
had already been generated but failed to deliver (#2910).

Adds _send_with_retry() to BasePlatformAdapter:
- Transient errors: retry up to 2x with exponential backoff + jitter
- On exhaustion: send delivery-failure notice so user knows to retry
- Permanent errors: fall back to plain-text version (preserves existing behavior)
- SendResult.retryable flag for platform-specific transient errors

All adapters benefit automatically via BasePlatformAdapter inheritance.

Cherry-picked from PR #3108 by Mibayy.

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
2026-03-26 17:37:10 -07:00
Teknium 716e616d28 fix(tui): status bar duplicates and degrades during long sessions (#3291)
shutil.get_terminal_size() can return stale/fallback values on SSH that
differ from prompt_toolkit's actual terminal width. Fragments built for
the wrong width overflow and wrap onto a second line (wrap_lines=True
default), appearing as progressively degrading duplicates.

- Read width from get_app().output.get_size().columns when inside a
  prompt_toolkit TUI, falling back to shutil outside TUI context
- Add wrap_lines=False on the status bar Window as belt-and-suspenders
  guard against any future width mismatch

Closes #3130

Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>
2026-03-26 17:33:11 -07:00
Teknium bdccdd67a1 fix: OpenClaw migration overwrites defaults and setup wizard skips imported sections (#3282)
Two bugs caused the OpenClaw migration during first-time setup to be
ineffective, forcing users to reconfigure everything manually:

1. The setup wizard created config.yaml with all defaults BEFORE running
   the migration, then the migrator ran with overwrite=False. Every config
   setting was reported as a 'conflict' against the defaults and skipped.
   Fix: use overwrite=True during setup-time migration (safe because only
   defaults exist at that point). The hermes claw migrate CLI command
   still defaults to overwrite=False for post-setup use.

2. After migration, the full setup wizard ran all 5 sections unconditionally,
   forcing the user through model/terminal/agent/messaging/tools configuration
   even when those settings were just imported.
   Fix: add _get_section_config_summary() and _skip_configured_section()
   helpers. After migration, each section checks if it's already configured
   (API keys present, non-default values, platform tokens) and offers
   'Reconfigure? [y/N]' with default No. Unconfigured sections still run
   normally.

Reported by Dev Bredda on social media.
2026-03-26 16:29:38 -07:00
Teknium 148f46620f fix(matrix): add backoff for SyncError in sync loop (#3280)
When the homeserver returns an error response, matrix-nio parses it
as a SyncError return value rather than raising an exception. The sync
loop only had backoff in the except handler, so SyncError caused a
tight retry loop (~489 req/s) flooding logs and hammering the
homeserver. Check the return value and sleep 5s before retry.

Cherry-picked from PR #2937 by ticketclosed-wontfix.

Co-authored-by: ticketclosed-wontfix <ticketclosed-wontfix@users.noreply.github.com>
2026-03-26 16:19:58 -07:00
176 changed files with 11948 additions and 1239 deletions
+13
View File
@@ -59,12 +59,25 @@ OPENCODE_ZEN_API_KEY=
# OpenCode Go provides access to open models (GLM-5, Kimi K2.5, MiniMax M2.5)
# $10/month subscription. Get your key at: https://opencode.ai/auth
OPENCODE_GO_API_KEY=
# =============================================================================
# LLM PROVIDER (Hugging Face Inference Providers)
# =============================================================================
# Hugging Face routes to 20+ open models via unified OpenAI-compatible endpoint.
# Free tier included ($0.10/month), no markup on provider rates.
# Get your token at: https://huggingface.co/settings/tokens
# Required permission: "Make calls to Inference Providers"
HF_TOKEN=
# OPENCODE_GO_BASE_URL=https://opencode.ai/zen/go/v1 # Override default base URL
# =============================================================================
# TOOL API KEYS
# =============================================================================
# Exa API Key - AI-native web search and contents
# Get at: https://exa.ai
EXA_API_KEY=
# Parallel API Key - AI-native web search and extract
# Get at: https://parallel.ai
PARALLEL_API_KEY=
+348
View File
@@ -0,0 +1,348 @@
# Hermes Agent v0.5.0 (v2026.3.28)
**Release Date:** March 28, 2026
> The hardening release — Hugging Face provider, /model command overhaul, Telegram Private Chat Topics, native Modal SDK, plugin lifecycle hooks, tool-use enforcement for GPT models, Nix flake, 50+ security and reliability fixes, and a comprehensive supply chain audit.
---
## ✨ Highlights
- **Nous Portal now supports 400+ models** — The Nous Research inference portal has expanded dramatically, giving Hermes Agent users access to over 400 models through a single provider endpoint
- **Hugging Face as a first-class inference provider** — Full integration with HF Inference API including curated agentic model picker that maps to OpenRouter analogues, live `/models` endpoint probe, and setup wizard flow ([#3419](https://github.com/NousResearch/hermes-agent/pull/3419), [#3440](https://github.com/NousResearch/hermes-agent/pull/3440))
- **Telegram Private Chat Topics** — Project-based conversations with functional skill binding per topic, enabling isolated workflows within a single Telegram chat ([#3163](https://github.com/NousResearch/hermes-agent/pull/3163))
- **Native Modal SDK backend** — Replaced swe-rex dependency with native Modal SDK (`Sandbox.create.aio` + `exec.aio`), eliminating tunnels and simplifying the Modal terminal backend ([#3538](https://github.com/NousResearch/hermes-agent/pull/3538))
- **Plugin lifecycle hooks activated** — `pre_llm_call`, `post_llm_call`, `on_session_start`, and `on_session_end` hooks now fire in the agent loop and CLI/gateway, completing the plugin hook system ([#3542](https://github.com/NousResearch/hermes-agent/pull/3542))
- **Improved OpenAI Model Reliability** — Added `GPT_TOOL_USE_GUIDANCE` to prevent GPT models from describing intended actions instead of making tool calls, plus automatic stripping of stale budget warnings from conversation history that caused models to avoid tools across turns ([#3528](https://github.com/NousResearch/hermes-agent/pull/3528))
- **Nix flake** — Full uv2nix build, NixOS module with persistent container mode, auto-generated config keys from Python source, and suffix PATHs for agent-friendliness ([#20](https://github.com/NousResearch/hermes-agent/pull/20), [#3274](https://github.com/NousResearch/hermes-agent/pull/3274), [#3061](https://github.com/NousResearch/hermes-agent/pull/3061)) by @alt-glitch
- **Supply chain hardening** — Removed compromised `litellm` dependency, pinned all dependency version ranges, regenerated `uv.lock` with hashes, added CI workflow scanning PRs for supply chain attack patterns, and bumped deps to fix CVEs ([#2796](https://github.com/NousResearch/hermes-agent/pull/2796), [#2810](https://github.com/NousResearch/hermes-agent/pull/2810), [#2812](https://github.com/NousResearch/hermes-agent/pull/2812), [#2816](https://github.com/NousResearch/hermes-agent/pull/2816), [#3073](https://github.com/NousResearch/hermes-agent/pull/3073))
- **Anthropic output limits fix** — Replaced hardcoded 16K `max_tokens` with per-model native output limits (128K for Opus 4.6, 64K for Sonnet 4.6), fixing "Response truncated" and thinking-budget exhaustion on direct Anthropic API ([#3426](https://github.com/NousResearch/hermes-agent/pull/3426), [#3444](https://github.com/NousResearch/hermes-agent/pull/3444))
---
## 🏗️ Core Agent & Architecture
### New Provider: Hugging Face
- First-class Hugging Face Inference API integration with auth, setup wizard, and model picker ([#3419](https://github.com/NousResearch/hermes-agent/pull/3419))
- Curated model list mapping OpenRouter agentic defaults to HF equivalents — providers with 8+ curated models skip live `/models` probe for speed ([#3440](https://github.com/NousResearch/hermes-agent/pull/3440))
- Added glm-5-turbo to Z.AI provider model list ([#3095](https://github.com/NousResearch/hermes-agent/pull/3095))
### Provider & Model Improvements
- `/model` command overhaul — extracted shared `switch_model()` pipeline for CLI and gateway, custom endpoint support, provider-aware routing ([#2795](https://github.com/NousResearch/hermes-agent/pull/2795), [#2799](https://github.com/NousResearch/hermes-agent/pull/2799))
- Removed `/model` slash command from CLI and gateway in favor of `hermes model` subcommand ([#3080](https://github.com/NousResearch/hermes-agent/pull/3080))
- Preserve `custom` provider instead of silently remapping to `openrouter` ([#2792](https://github.com/NousResearch/hermes-agent/pull/2792))
- Read root-level `provider` and `base_url` from config.yaml into model config ([#3112](https://github.com/NousResearch/hermes-agent/pull/3112))
- Align Nous Portal model slugs with OpenRouter naming ([#3253](https://github.com/NousResearch/hermes-agent/pull/3253))
- Fix Alibaba provider default endpoint and model list ([#3484](https://github.com/NousResearch/hermes-agent/pull/3484))
- Allow MiniMax users to override `/v1``/anthropic` auto-correction ([#3553](https://github.com/NousResearch/hermes-agent/pull/3553))
- Migrate OAuth token refresh to `platform.claude.com` with fallback ([#3246](https://github.com/NousResearch/hermes-agent/pull/3246))
### Agent Loop & Conversation
- **Improved OpenAI model reliability** — `GPT_TOOL_USE_GUIDANCE` prevents GPT models from describing actions instead of calling tools + automatic budget warning stripping from history ([#3528](https://github.com/NousResearch/hermes-agent/pull/3528))
- **Surface lifecycle events** — All retry, fallback, and compression events now surface to the user as formatted messages ([#3153](https://github.com/NousResearch/hermes-agent/pull/3153))
- **Anthropic output limits** — Per-model native output limits instead of hardcoded 16K `max_tokens` ([#3426](https://github.com/NousResearch/hermes-agent/pull/3426))
- **Thinking-budget exhaustion detection** — Skip useless continuation retries when model uses all output tokens on reasoning ([#3444](https://github.com/NousResearch/hermes-agent/pull/3444))
- Always prefer streaming for API calls to prevent hung subagents ([#3120](https://github.com/NousResearch/hermes-agent/pull/3120))
- Restore safe non-streaming fallback after stream failures ([#3020](https://github.com/NousResearch/hermes-agent/pull/3020))
- Give subagents independent iteration budgets ([#3004](https://github.com/NousResearch/hermes-agent/pull/3004))
- Update `api_key` in `_try_activate_fallback` for subagent auth ([#3103](https://github.com/NousResearch/hermes-agent/pull/3103))
- Graceful return on max retries instead of crashing thread ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Count compression restarts toward retry limit ([#3070](https://github.com/NousResearch/hermes-agent/pull/3070))
- Include tool tokens in preflight estimate, guard context probe persistence ([#3164](https://github.com/NousResearch/hermes-agent/pull/3164))
- Update context compressor limits after fallback activation ([#3305](https://github.com/NousResearch/hermes-agent/pull/3305))
- Validate empty user messages to prevent Anthropic API 400 errors ([#3322](https://github.com/NousResearch/hermes-agent/pull/3322))
- GLM reasoning-only and max-length handling ([#3010](https://github.com/NousResearch/hermes-agent/pull/3010))
- Increase API timeout default from 900s to 1800s for slow-thinking models ([#3431](https://github.com/NousResearch/hermes-agent/pull/3431))
- Send `max_tokens` for Claude/OpenRouter + retry SSE connection errors ([#3497](https://github.com/NousResearch/hermes-agent/pull/3497))
- Prevent AsyncOpenAI/httpx cross-loop deadlock in gateway mode ([#2701](https://github.com/NousResearch/hermes-agent/pull/2701)) by @ctlst
### Streaming & Reasoning
- **Persist reasoning across gateway session turns** with new schema v6 columns (`reasoning`, `reasoning_details`, `codex_reasoning_items`) ([#2974](https://github.com/NousResearch/hermes-agent/pull/2974))
- Detect and kill stale SSE connections ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Fix stale stream detector race causing spurious `RemoteProtocolError` ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Skip duplicate callback for `<think>`-extracted reasoning during streaming ([#3116](https://github.com/NousResearch/hermes-agent/pull/3116))
- Preserve reasoning fields in `rewrite_transcript` ([#3311](https://github.com/NousResearch/hermes-agent/pull/3311))
- Preserve Gemini thought signatures in streamed tool calls ([#2997](https://github.com/NousResearch/hermes-agent/pull/2997))
- Ensure first delta is fired during reasoning updates ([untagged commit](https://github.com/NousResearch/hermes-agent))
### Session & Memory
- **Session search recent sessions mode** — Omit query to browse recent sessions with titles, previews, and timestamps ([#2533](https://github.com/NousResearch/hermes-agent/pull/2533))
- **Session config surfacing** on `/new`, `/reset`, and auto-reset ([#3321](https://github.com/NousResearch/hermes-agent/pull/3321))
- **Third-party session isolation** — `--source` flag for isolating sessions by origin ([#3255](https://github.com/NousResearch/hermes-agent/pull/3255))
- Add `/resume` CLI handler, session log truncation guard, `reopen_session` API ([#3315](https://github.com/NousResearch/hermes-agent/pull/3315))
- Clear compressor summary and turn counter on `/clear` and `/new` ([#3102](https://github.com/NousResearch/hermes-agent/pull/3102))
- Surface silent SessionDB failures that cause session data loss ([#2999](https://github.com/NousResearch/hermes-agent/pull/2999))
- Session search fallback preview on summarization failure ([#3478](https://github.com/NousResearch/hermes-agent/pull/3478))
- Prevent stale memory overwrites by flush agent ([#2687](https://github.com/NousResearch/hermes-agent/pull/2687))
### Context Compression
- Replace dead `summary_target_tokens` with ratio-based scaling ([#2554](https://github.com/NousResearch/hermes-agent/pull/2554))
- Expose `compression.target_ratio`, `protect_last_n`, and `threshold` in `DEFAULT_CONFIG` ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Restore sane defaults and cap summary at 12K tokens ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Preserve transcript on `/compress` and hygiene compression ([#3556](https://github.com/NousResearch/hermes-agent/pull/3556))
- Update context pressure warnings and token estimates after compaction ([untagged commit](https://github.com/NousResearch/hermes-agent))
### Architecture & Dependencies
- **Remove mini-swe-agent dependency** — Inline Docker and Modal backends directly ([#2804](https://github.com/NousResearch/hermes-agent/pull/2804))
- **Replace swe-rex with native Modal SDK** for Modal backend ([#3538](https://github.com/NousResearch/hermes-agent/pull/3538))
- **Plugin lifecycle hooks** — `pre_llm_call`, `post_llm_call`, `on_session_start`, `on_session_end` now fire in the agent loop ([#3542](https://github.com/NousResearch/hermes-agent/pull/3542))
- Fix plugin toolsets invisible in `hermes tools` and standalone processes ([#3457](https://github.com/NousResearch/hermes-agent/pull/3457))
- Consolidate `get_hermes_home()` and `parse_reasoning_effort()` ([#3062](https://github.com/NousResearch/hermes-agent/pull/3062))
- Remove unused Hermes-native PKCE OAuth flow ([#3107](https://github.com/NousResearch/hermes-agent/pull/3107))
- Remove ~100 unused imports across 55 files ([#3016](https://github.com/NousResearch/hermes-agent/pull/3016))
- Fix 154 f-strings, simplify getattr/URL patterns, remove dead code ([#3119](https://github.com/NousResearch/hermes-agent/pull/3119))
---
## 📱 Messaging Platforms (Gateway)
### Telegram
- **Private Chat Topics** — Project-based conversations with functional skill binding per topic, enabling isolated workflows within a single Telegram chat ([#3163](https://github.com/NousResearch/hermes-agent/pull/3163))
- **Auto-discover fallback IPs via DNS-over-HTTPS** when `api.telegram.org` is unreachable ([#3376](https://github.com/NousResearch/hermes-agent/pull/3376))
- **Configurable reply threading mode** ([#2907](https://github.com/NousResearch/hermes-agent/pull/2907))
- Fall back to no `thread_id` on "Message thread not found" BadRequest ([#3390](https://github.com/NousResearch/hermes-agent/pull/3390))
- Self-reschedule reconnect when `start_polling` fails after 502 ([#3268](https://github.com/NousResearch/hermes-agent/pull/3268))
### Discord
- Stop phantom typing indicator after agent turn completes ([#3003](https://github.com/NousResearch/hermes-agent/pull/3003))
### Slack
- Send tool call progress messages to correct Slack thread ([#3063](https://github.com/NousResearch/hermes-agent/pull/3063))
- Scope progress thread fallback to Slack only ([#3488](https://github.com/NousResearch/hermes-agent/pull/3488))
### WhatsApp
- Download documents, audio, and video media from messages ([#2978](https://github.com/NousResearch/hermes-agent/pull/2978))
### Matrix
- Add missing Matrix entry in `PLATFORMS` dict ([#3473](https://github.com/NousResearch/hermes-agent/pull/3473))
- Harden e2ee access-token handling ([#3562](https://github.com/NousResearch/hermes-agent/pull/3562))
- Add backoff for `SyncError` in sync loop ([#3280](https://github.com/NousResearch/hermes-agent/pull/3280))
### Signal
- Track SSE keepalive comments as connection activity ([#3316](https://github.com/NousResearch/hermes-agent/pull/3316))
### Email
- Prevent unbounded growth of `_seen_uids` in EmailAdapter ([#3490](https://github.com/NousResearch/hermes-agent/pull/3490))
### Gateway Core
- **Config-gated `/verbose` command** for messaging platforms — toggle tool output verbosity from chat ([#3262](https://github.com/NousResearch/hermes-agent/pull/3262))
- **Background review notifications** delivered to user chat ([#3293](https://github.com/NousResearch/hermes-agent/pull/3293))
- **Retry transient send failures** and notify user on exhaustion ([#3288](https://github.com/NousResearch/hermes-agent/pull/3288))
- Recover from hung agents — `/stop` hard-kills session lock ([#3104](https://github.com/NousResearch/hermes-agent/pull/3104))
- Thread-safe `SessionStore` — protect `_entries` with `threading.Lock` ([#3052](https://github.com/NousResearch/hermes-agent/pull/3052))
- Fix gateway token double-counting with cached agents — use absolute set instead of increment ([#3306](https://github.com/NousResearch/hermes-agent/pull/3306), [#3317](https://github.com/NousResearch/hermes-agent/pull/3317))
- Fingerprint full auth token in agent cache signature ([#3247](https://github.com/NousResearch/hermes-agent/pull/3247))
- Silence background agent terminal output ([#3297](https://github.com/NousResearch/hermes-agent/pull/3297))
- Include per-platform `ALLOW_ALL` and `SIGNAL_GROUP` in startup allowlist check ([#3313](https://github.com/NousResearch/hermes-agent/pull/3313))
- Include user-local bin paths in systemd unit PATH ([#3527](https://github.com/NousResearch/hermes-agent/pull/3527))
- Track background task references in `GatewayRunner` ([#3254](https://github.com/NousResearch/hermes-agent/pull/3254))
- Add request timeouts to HA, Email, Mattermost, SMS adapters ([#3258](https://github.com/NousResearch/hermes-agent/pull/3258))
- Add media download retry to Mattermost, Slack, and base cache ([#3323](https://github.com/NousResearch/hermes-agent/pull/3323))
- Detect virtualenv path instead of hardcoding `venv/` ([#2797](https://github.com/NousResearch/hermes-agent/pull/2797))
- Use `TERMINAL_CWD` for context file discovery, not process cwd ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Stop loading hermes repo AGENTS.md into gateway sessions (~10k wasted tokens) ([#2891](https://github.com/NousResearch/hermes-agent/pull/2891))
---
## 🖥️ CLI & User Experience
### Interactive CLI
- **Configurable busy input mode** + fix `/queue` always working ([#3298](https://github.com/NousResearch/hermes-agent/pull/3298))
- **Preserve user input on multiline paste** ([#3065](https://github.com/NousResearch/hermes-agent/pull/3065))
- **Tool generation callback** — streaming "preparing terminal…" updates during tool argument generation ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Show tool progress for substantive tools, not just "preparing" ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Buffer reasoning preview chunks and fix duplicate display ([#3013](https://github.com/NousResearch/hermes-agent/pull/3013))
- Prevent reasoning box from rendering 3x during tool-calling loops ([#3405](https://github.com/NousResearch/hermes-agent/pull/3405))
- Eliminate "Event loop is closed" / "Press ENTER to continue" during idle — three-layer fix with `neuter_async_httpx_del()`, custom exception handler, and stale client cleanup ([#3398](https://github.com/NousResearch/hermes-agent/pull/3398))
- Fix status bar shows 26K instead of 260K for token counts with trailing zeros ([#3024](https://github.com/NousResearch/hermes-agent/pull/3024))
- Fix status bar duplicates and degrades during long sessions ([#3291](https://github.com/NousResearch/hermes-agent/pull/3291))
- Refresh TUI before background task output to prevent status bar overlap ([#3048](https://github.com/NousResearch/hermes-agent/pull/3048))
- Suppress KawaiiSpinner animation under `patch_stdout` ([#2994](https://github.com/NousResearch/hermes-agent/pull/2994))
- Skip KawaiiSpinner when TUI handles tool progress ([#2973](https://github.com/NousResearch/hermes-agent/pull/2973))
- Guard `isatty()` against closed streams via `_is_tty` property ([#3056](https://github.com/NousResearch/hermes-agent/pull/3056))
- Ensure single closure of streaming boxes during tool generation ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Cap context pressure percentage at 100% in display ([#3480](https://github.com/NousResearch/hermes-agent/pull/3480))
- Clean up HTML error messages in CLI display ([#3069](https://github.com/NousResearch/hermes-agent/pull/3069))
- Show HTTP status code and 400 body in API error output ([#3096](https://github.com/NousResearch/hermes-agent/pull/3096))
- Extract useful info from HTML error pages, dump debug on max retries ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Prevent TypeError on startup when `base_url` is None ([#3068](https://github.com/NousResearch/hermes-agent/pull/3068))
- Prevent update crash in non-TTY environments ([#3094](https://github.com/NousResearch/hermes-agent/pull/3094))
- Handle EOFError in sessions delete/prune confirmation prompts ([#3101](https://github.com/NousResearch/hermes-agent/pull/3101))
- Catch KeyboardInterrupt during `flush_memories` on exit and in exit cleanup handlers ([#3025](https://github.com/NousResearch/hermes-agent/pull/3025), [#3257](https://github.com/NousResearch/hermes-agent/pull/3257))
- Guard `.strip()` against None values from YAML config ([#3552](https://github.com/NousResearch/hermes-agent/pull/3552))
- Guard `config.get()` against YAML null values to prevent AttributeError ([#3377](https://github.com/NousResearch/hermes-agent/pull/3377))
- Store asyncio task references to prevent GC mid-execution ([#3267](https://github.com/NousResearch/hermes-agent/pull/3267))
### Setup & Configuration
- Use explicit key mapping for returning-user menu dispatch instead of positional index ([#3083](https://github.com/NousResearch/hermes-agent/pull/3083))
- Use `sys.executable` for pip in update commands to fix PEP 668 ([#3099](https://github.com/NousResearch/hermes-agent/pull/3099))
- Harden `hermes update` against diverged history, non-main branches, and gateway edge cases ([#3492](https://github.com/NousResearch/hermes-agent/pull/3492))
- OpenClaw migration overwrites defaults and setup wizard skips imported sections — fixed ([#3282](https://github.com/NousResearch/hermes-agent/pull/3282))
- Stop recursive AGENTS.md walk, load top-level only ([#3110](https://github.com/NousResearch/hermes-agent/pull/3110))
- Add macOS Homebrew paths to browser and terminal PATH resolution ([#2713](https://github.com/NousResearch/hermes-agent/pull/2713))
- YAML boolean handling for `tool_progress` config ([#3300](https://github.com/NousResearch/hermes-agent/pull/3300))
- Reset default SOUL.md to baseline identity text ([#3159](https://github.com/NousResearch/hermes-agent/pull/3159))
- Reject relative cwd paths for container terminal backends ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Add explicit `hermes-api-server` toolset for API server platform ([#3304](https://github.com/NousResearch/hermes-agent/pull/3304))
- Reorder setup wizard providers — OpenRouter first ([untagged commit](https://github.com/NousResearch/hermes-agent))
---
## 🔧 Tool System
### API Server
- **Idempotency-Key support**, body size limit, and OpenAI error envelope ([#2903](https://github.com/NousResearch/hermes-agent/pull/2903))
- Allow Idempotency-Key in CORS headers ([#3530](https://github.com/NousResearch/hermes-agent/pull/3530))
- Cancel orphaned agent + true interrupt on SSE disconnect ([#3427](https://github.com/NousResearch/hermes-agent/pull/3427))
- Fix streaming breaks when agent makes tool calls ([#2985](https://github.com/NousResearch/hermes-agent/pull/2985))
### Terminal & File Operations
- Handle addition-only hunks in V4A patch parser ([#3325](https://github.com/NousResearch/hermes-agent/pull/3325))
- Exponential backoff for persistent shell polling ([#2996](https://github.com/NousResearch/hermes-agent/pull/2996))
- Add timeout to subprocess calls in `context_references` ([#3469](https://github.com/NousResearch/hermes-agent/pull/3469))
### Browser & Vision
- Handle 402 insufficient credits error in vision tool ([#2802](https://github.com/NousResearch/hermes-agent/pull/2802))
- Fix `browser_vision` ignores `auxiliary.vision.timeout` config ([#2901](https://github.com/NousResearch/hermes-agent/pull/2901))
- Make browser command timeout configurable via config.yaml ([#2801](https://github.com/NousResearch/hermes-agent/pull/2801))
### MCP
- MCP toolset resolution for runtime and config ([#3252](https://github.com/NousResearch/hermes-agent/pull/3252))
- Add MCP tool name collision protection ([#3077](https://github.com/NousResearch/hermes-agent/pull/3077))
### Auxiliary LLM
- Guard aux LLM calls against None content + reasoning fallback + retry ([#3449](https://github.com/NousResearch/hermes-agent/pull/3449))
- Catch ImportError from `build_anthropic_client` in vision auto-detection ([#3312](https://github.com/NousResearch/hermes-agent/pull/3312))
### Other Tools
- Add request timeouts to `send_message_tool` HTTP calls ([#3162](https://github.com/NousResearch/hermes-agent/pull/3162)) by @memosr
- Auto-repair `jobs.json` with invalid control characters ([#3537](https://github.com/NousResearch/hermes-agent/pull/3537))
- Enable fine-grained tool streaming for Claude/OpenRouter ([#3497](https://github.com/NousResearch/hermes-agent/pull/3497))
---
## 🧩 Skills Ecosystem
### Skills System
- **Env var passthrough** for skills and user config — skills can declare environment variables to pass through ([#2807](https://github.com/NousResearch/hermes-agent/pull/2807))
- Cache skills prompt with shared `skill_utils` module for faster TTFT ([#3421](https://github.com/NousResearch/hermes-agent/pull/3421))
- Avoid redundant file re-read for skill conditions ([#2992](https://github.com/NousResearch/hermes-agent/pull/2992))
- Use Git Trees API to prevent silent subdirectory loss during install ([#2995](https://github.com/NousResearch/hermes-agent/pull/2995))
- Fix skills-sh install for deeply nested repo structures ([#2980](https://github.com/NousResearch/hermes-agent/pull/2980))
- Handle null metadata in skill frontmatter ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Preserve trust for skills-sh identifiers + reduce resolution churn ([#3251](https://github.com/NousResearch/hermes-agent/pull/3251))
- Agent-created skills were incorrectly treated as untrusted community content — fixed ([untagged commit](https://github.com/NousResearch/hermes-agent))
### New Skills
- **G0DM0D3 godmode jailbreaking skill** + docs ([#3157](https://github.com/NousResearch/hermes-agent/pull/3157))
- **Docker management skill** added to optional-skills ([#3060](https://github.com/NousResearch/hermes-agent/pull/3060))
- **OpenClaw migration v2** — 17 new modules, terminal recap for migrating from OpenClaw to Hermes ([#2906](https://github.com/NousResearch/hermes-agent/pull/2906))
---
## 🔒 Security & Reliability
### Security Hardening
- **SSRF protection** added to `browser_navigate` ([#3058](https://github.com/NousResearch/hermes-agent/pull/3058))
- **SSRF protection** added to `vision_tools` and `web_tools` (hardened) ([#2679](https://github.com/NousResearch/hermes-agent/pull/2679))
- **Restrict subagent toolsets** to parent's enabled set ([#3269](https://github.com/NousResearch/hermes-agent/pull/3269))
- **Prevent zip-slip path traversal** in self-update ([#3250](https://github.com/NousResearch/hermes-agent/pull/3250))
- **Prevent shell injection** in `_expand_path` via `~user` path suffix ([#2685](https://github.com/NousResearch/hermes-agent/pull/2685))
- **Normalize input** before dangerous command detection ([#3260](https://github.com/NousResearch/hermes-agent/pull/3260))
- Make tirith block verdicts approvable instead of hard-blocking ([#3428](https://github.com/NousResearch/hermes-agent/pull/3428))
- Remove compromised `litellm`/`typer`/`platformdirs` from deps ([#2796](https://github.com/NousResearch/hermes-agent/pull/2796))
- Pin all dependency version ranges ([#2810](https://github.com/NousResearch/hermes-agent/pull/2810))
- Regenerate `uv.lock` with hashes, use lockfile in setup ([#2812](https://github.com/NousResearch/hermes-agent/pull/2812))
- Bump dependencies to fix CVEs + regenerate `uv.lock` ([#3073](https://github.com/NousResearch/hermes-agent/pull/3073))
- Supply chain audit CI workflow for PR scanning ([#2816](https://github.com/NousResearch/hermes-agent/pull/2816))
### Reliability
- **SQLite WAL write-lock contention** causing 15-20s TUI freeze — fixed ([#3385](https://github.com/NousResearch/hermes-agent/pull/3385))
- **SQLite concurrency hardening** + session transcript integrity ([#3249](https://github.com/NousResearch/hermes-agent/pull/3249))
- Prevent recurring cron job re-fire on gateway crash/restart loop ([#3396](https://github.com/NousResearch/hermes-agent/pull/3396))
- Mark cron session as ended after job completes ([#2998](https://github.com/NousResearch/hermes-agent/pull/2998))
---
## ⚡ Performance
- **TTFT startup optimizations** — salvaged easy-win startup improvements ([#3395](https://github.com/NousResearch/hermes-agent/pull/3395))
- Cache skills prompt with shared `skill_utils` module ([#3421](https://github.com/NousResearch/hermes-agent/pull/3421))
- Avoid redundant file re-read for skill conditions in prompt builder ([#2992](https://github.com/NousResearch/hermes-agent/pull/2992))
---
## 🐛 Notable Bug Fixes
- Fix gateway token double-counting with cached agents ([#3306](https://github.com/NousResearch/hermes-agent/pull/3306), [#3317](https://github.com/NousResearch/hermes-agent/pull/3317))
- Fix "Event loop is closed" / "Press ENTER to continue" during idle sessions ([#3398](https://github.com/NousResearch/hermes-agent/pull/3398))
- Fix reasoning box rendering 3x during tool-calling loops ([#3405](https://github.com/NousResearch/hermes-agent/pull/3405))
- Fix status bar shows 26K instead of 260K for token counts ([#3024](https://github.com/NousResearch/hermes-agent/pull/3024))
- Fix `/queue` always working regardless of config ([#3298](https://github.com/NousResearch/hermes-agent/pull/3298))
- Fix phantom Discord typing indicator after agent turn ([#3003](https://github.com/NousResearch/hermes-agent/pull/3003))
- Fix Slack progress messages appearing in wrong thread ([#3063](https://github.com/NousResearch/hermes-agent/pull/3063))
- Fix WhatsApp media downloads (documents, audio, video) ([#2978](https://github.com/NousResearch/hermes-agent/pull/2978))
- Fix Telegram "Message thread not found" killing progress messages ([#3390](https://github.com/NousResearch/hermes-agent/pull/3390))
- Fix OpenClaw migration overwriting defaults ([#3282](https://github.com/NousResearch/hermes-agent/pull/3282))
- Fix returning-user setup menu dispatching wrong section ([#3083](https://github.com/NousResearch/hermes-agent/pull/3083))
- Fix `hermes update` PEP 668 "externally-managed-environment" error ([#3099](https://github.com/NousResearch/hermes-agent/pull/3099))
- Fix subagents hitting `max_iterations` prematurely via shared budget ([#3004](https://github.com/NousResearch/hermes-agent/pull/3004))
- Fix YAML boolean handling for `tool_progress` config ([#3300](https://github.com/NousResearch/hermes-agent/pull/3300))
- Fix `config.get()` crashes on YAML null values ([#3377](https://github.com/NousResearch/hermes-agent/pull/3377))
- Fix `.strip()` crash on None values from YAML config ([#3552](https://github.com/NousResearch/hermes-agent/pull/3552))
- Fix hung agents on gateway — `/stop` now hard-kills session lock ([#3104](https://github.com/NousResearch/hermes-agent/pull/3104))
- Fix `_custom` provider silently remapped to `openrouter` ([#2792](https://github.com/NousResearch/hermes-agent/pull/2792))
- Fix Matrix missing from `PLATFORMS` dict ([#3473](https://github.com/NousResearch/hermes-agent/pull/3473))
- Fix Email adapter unbounded `_seen_uids` growth ([#3490](https://github.com/NousResearch/hermes-agent/pull/3490))
---
## 🧪 Testing
- Pin `agent-client-protocol` < 0.9 to handle breaking upstream release ([#3320](https://github.com/NousResearch/hermes-agent/pull/3320))
- Catch anthropic ImportError in vision auto-detection tests ([#3312](https://github.com/NousResearch/hermes-agent/pull/3312))
- Update retry-exhaust test for new graceful return behavior ([#3320](https://github.com/NousResearch/hermes-agent/pull/3320))
- Add regression tests for null metadata frontmatter ([untagged commit](https://github.com/NousResearch/hermes-agent))
---
## 📚 Documentation
- Update all docs for `/model` command overhaul and custom provider support ([#2800](https://github.com/NousResearch/hermes-agent/pull/2800))
- Fix stale and incorrect documentation across 18 files ([#2805](https://github.com/NousResearch/hermes-agent/pull/2805))
- Document 9 previously undocumented features ([#2814](https://github.com/NousResearch/hermes-agent/pull/2814))
- Add missing skills, CLI commands, and messaging env vars to docs ([#2809](https://github.com/NousResearch/hermes-agent/pull/2809))
- Fix api-server response storage documentation — SQLite, not in-memory ([#2819](https://github.com/NousResearch/hermes-agent/pull/2819))
- Quote pip install extras to fix zsh glob errors ([#2815](https://github.com/NousResearch/hermes-agent/pull/2815))
- Unify hooks documentation — add plugin hooks to hooks page, add `session:end` event ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Clarify two-mode behavior in `session_search` schema description ([untagged commit](https://github.com/NousResearch/hermes-agent))
- Fix Discord Public Bot setting for Discord-provided invite link ([#3519](https://github.com/NousResearch/hermes-agent/pull/3519)) by @mehmoodosman
- Revise v0.4.0 changelog — fix feature attribution, reorder sections ([untagged commit](https://github.com/NousResearch/hermes-agent))
---
## 👥 Contributors
### Core
- **@teknium1** — 157 PRs covering the full scope of this release
### Community Contributors
- **@alt-glitch** (Siddharth Balyan) — 2 PRs: Nix flake with uv2nix build, NixOS module, and persistent container mode ([#20](https://github.com/NousResearch/hermes-agent/pull/20)); auto-generated config keys and suffix PATHs for Nix builds ([#3061](https://github.com/NousResearch/hermes-agent/pull/3061), [#3274](https://github.com/NousResearch/hermes-agent/pull/3274))
- **@ctlst** — 1 PR: Prevent AsyncOpenAI/httpx cross-loop deadlock in gateway mode ([#2701](https://github.com/NousResearch/hermes-agent/pull/2701))
- **@memosr** (memosr.eth) — 1 PR: Add request timeouts to `send_message_tool` HTTP calls ([#3162](https://github.com/NousResearch/hermes-agent/pull/3162))
- **@mehmoodosman** (Osman Mehmood) — 1 PR: Fix Discord docs for Public Bot setting ([#3519](https://github.com/NousResearch/hermes-agent/pull/3519))
### All Contributors
@alt-glitch, @ctlst, @mehmoodosman, @memosr, @teknium1
---
**Full Changelog**: [v2026.3.23...v2026.3.28](https://github.com/NousResearch/hermes-agent/compare/v2026.3.23...v2026.3.28)
+83 -9
View File
@@ -35,6 +35,54 @@ ADAPTIVE_EFFORT_MAP = {
"minimal": "low",
}
# ── Max output token limits per Anthropic model ───────────────────────
# Source: Anthropic docs + Cline model catalog. Anthropic's API requires
# max_tokens as a mandatory field. Previously we hardcoded 16384, which
# starves thinking-enabled models (thinking tokens count toward the limit).
_ANTHROPIC_OUTPUT_LIMITS = {
# Claude 4.6
"claude-opus-4-6": 128_000,
"claude-sonnet-4-6": 64_000,
# Claude 4.5
"claude-opus-4-5": 64_000,
"claude-sonnet-4-5": 64_000,
"claude-haiku-4-5": 64_000,
# Claude 4
"claude-opus-4": 32_000,
"claude-sonnet-4": 64_000,
# Claude 3.7
"claude-3-7-sonnet": 128_000,
# Claude 3.5
"claude-3-5-sonnet": 8_192,
"claude-3-5-haiku": 8_192,
# Claude 3
"claude-3-opus": 4_096,
"claude-3-sonnet": 4_096,
"claude-3-haiku": 4_096,
}
# For any model not in the table, assume the highest current limit.
# Future Anthropic models are unlikely to have *less* output capacity.
_ANTHROPIC_DEFAULT_OUTPUT_LIMIT = 128_000
def _get_anthropic_max_output(model: str) -> int:
"""Look up the max output token limit for an Anthropic model.
Uses substring matching against _ANTHROPIC_OUTPUT_LIMITS so date-stamped
model IDs (claude-sonnet-4-5-20250929) and variant suffixes (:1m, :fast)
resolve correctly. Longest-prefix match wins to avoid e.g. "claude-3-5"
matching before "claude-3-5-sonnet".
"""
m = model.lower()
best_key = ""
best_val = _ANTHROPIC_DEFAULT_OUTPUT_LIMIT
for key, val in _ANTHROPIC_OUTPUT_LIMITS.items():
if key in m and len(key) > len(best_key):
best_key = key
best_val = val
return best_val
def _supports_adaptive_thinking(model: str) -> bool:
"""Return True for Claude 4.6 models that support adaptive thinking."""
@@ -59,6 +107,7 @@ _OAUTH_ONLY_BETAS = [
# The version must stay reasonably current — Anthropic rejects OAuth requests
# when the spoofed user-agent version is too far behind the actual release.
_CLAUDE_CODE_VERSION_FALLBACK = "2.1.74"
_claude_code_version_cache: Optional[str] = None
def _detect_claude_code_version() -> str:
@@ -86,11 +135,18 @@ def _detect_claude_code_version() -> str:
return _CLAUDE_CODE_VERSION_FALLBACK
_CLAUDE_CODE_VERSION = _detect_claude_code_version()
_CLAUDE_CODE_SYSTEM_PREFIX = "You are Claude Code, Anthropic's official CLI for Claude."
_MCP_TOOL_PREFIX = "mcp_"
def _get_claude_code_version() -> str:
"""Lazily detect the installed Claude Code version when OAuth headers need it."""
global _claude_code_version_cache
if _claude_code_version_cache is None:
_claude_code_version_cache = _detect_claude_code_version()
return _claude_code_version_cache
def _is_oauth_token(key: str) -> bool:
"""Check if the key is an OAuth/setup token (not a regular Console API key).
@@ -132,7 +188,7 @@ def build_anthropic_client(api_key: str, base_url: str = None):
kwargs["auth_token"] = api_key
kwargs["default_headers"] = {
"anthropic-beta": ",".join(all_betas),
"user-agent": f"claude-cli/{_CLAUDE_CODE_VERSION} (external, cli)",
"user-agent": f"claude-cli/{_get_claude_code_version()} (external, cli)",
"x-app": "cli",
}
else:
@@ -241,7 +297,7 @@ def _refresh_oauth_token(creds: Dict[str, Any]) -> Optional[str]:
headers = {
"Content-Type": "application/json",
"User-Agent": f"claude-cli/{_CLAUDE_CODE_VERSION} (external, cli)",
"User-Agent": f"claude-cli/{_get_claude_code_version()} (external, cli)",
}
for endpoint in token_endpoints:
@@ -706,14 +762,21 @@ def convert_messages_to_anthropic(
result.append({"role": "user", "content": [tool_result]})
continue
# Regular user message
# Regular user message — validate non-empty content (Anthropic rejects empty)
if isinstance(content, list):
converted_blocks = _convert_content_to_anthropic(content)
result.append({
"role": "user",
"content": converted_blocks or [{"type": "text", "text": ""}],
})
# Check if all text blocks are empty
if not converted_blocks or all(
b.get("text", "").strip() == ""
for b in converted_blocks
if isinstance(b, dict) and b.get("type") == "text"
):
converted_blocks = [{"type": "text", "text": "(empty message)"}]
result.append({"role": "user", "content": converted_blocks})
else:
# Validate string content is non-empty
if not content or (isinstance(content, str) and not content.strip()):
content = "(empty message)"
result.append({"role": "user", "content": content})
# Strip orphaned tool_use blocks (no matching tool_result follows)
@@ -803,9 +866,15 @@ def build_anthropic_kwargs(
tool_choice: Optional[str] = None,
is_oauth: bool = False,
preserve_dots: bool = False,
context_length: Optional[int] = None,
) -> Dict[str, Any]:
"""Build kwargs for anthropic.messages.create().
When *max_tokens* is None, the model's native output limit is used
(e.g. 128K for Opus 4.6, 64K for Sonnet 4.6). If *context_length*
is provided, the effective limit is clamped so it doesn't exceed
the context window.
When *is_oauth* is True, applies Claude Code compatibility transforms:
system prompt prefix, tool name prefixing, and prompt sanitization.
@@ -816,7 +885,12 @@ def build_anthropic_kwargs(
anthropic_tools = convert_tools_to_anthropic(tools) if tools else []
model = normalize_model_name(model, preserve_dots=preserve_dots)
effective_max_tokens = max_tokens or 16384
effective_max_tokens = max_tokens or _get_anthropic_max_output(model)
# Clamp to context window if the user set a lower context_length
# (e.g. custom endpoint with limited capacity).
if context_length and effective_max_tokens > context_length:
effective_max_tokens = max(context_length - 1, 1)
# ── OAuth: Claude Code identity ──────────────────────────────────
if is_oauth:
+154 -7
View File
@@ -693,7 +693,13 @@ def _try_anthropic() -> Tuple[Optional[Any], Optional[str]]:
is_oauth = _is_oauth_token(token)
model = _API_KEY_PROVIDER_AUX_MODELS.get("anthropic", "claude-haiku-4-5-20251001")
logger.debug("Auxiliary client: Anthropic native (%s) at %s (oauth=%s)", model, base_url, is_oauth)
real_client = build_anthropic_client(token, base_url)
try:
real_client = build_anthropic_client(token, base_url)
except ImportError:
# The anthropic_adapter module imports fine but the SDK itself is
# missing — build_anthropic_client raises ImportError at call time
# when _anthropic_sdk is None. Treat as unavailable.
return None, None
return AnthropicAuxiliaryClient(real_client, model, token, base_url, is_oauth=is_oauth), model
@@ -1131,7 +1137,13 @@ def resolve_vision_provider_client(
return "custom", client, final_model
if requested == "auto":
for candidate in get_available_vision_backends():
ordered = list(_VISION_AUTO_PROVIDER_ORDER)
preferred = _preferred_main_vision_provider()
if preferred in ordered:
ordered.remove(preferred)
ordered.insert(0, preferred)
for candidate in ordered:
sync_client, default_model = _resolve_strict_vision_backend(candidate)
if sync_client is not None:
return _finalize(candidate, sync_client, default_model)
@@ -1204,6 +1216,39 @@ _client_cache: Dict[tuple, tuple] = {}
_client_cache_lock = threading.Lock()
def neuter_async_httpx_del() -> None:
"""Monkey-patch ``AsyncHttpxClientWrapper.__del__`` to be a no-op.
The OpenAI SDK's ``AsyncHttpxClientWrapper.__del__`` schedules
``self.aclose()`` via ``asyncio.get_running_loop().create_task()``.
When an ``AsyncOpenAI`` client is garbage-collected while
prompt_toolkit's event loop is running (the common CLI idle state),
the ``aclose()`` task runs on prompt_toolkit's loop but the
underlying TCP transport is bound to a *different* loop (the worker
thread's loop that the client was originally created on). If that
loop is closed or its thread is dead, the transport's
``self._loop.call_soon()`` raises ``RuntimeError("Event loop is
closed")``, which prompt_toolkit surfaces as "Unhandled exception
in event loop ... Press ENTER to continue...".
Neutering ``__del__`` is safe because:
- Cached clients are explicitly cleaned via ``_force_close_async_httpx``
on stale-loop detection and ``shutdown_cached_clients`` on exit.
- Uncached clients' TCP connections are cleaned up by the OS when the
process exits.
- The OpenAI SDK itself marks this as a TODO (``# TODO(someday):
support non asyncio runtimes here``).
Call this once at CLI startup, before any ``AsyncOpenAI`` clients are
created.
"""
try:
from openai._base_client import AsyncHttpxClientWrapper
AsyncHttpxClientWrapper.__del__ = lambda self: None # type: ignore[assignment]
except (ImportError, AttributeError):
pass # Graceful degradation if the SDK changes its internals
def _force_close_async_httpx(client: Any) -> None:
"""Mark the httpx AsyncClient inside an AsyncOpenAI client as closed.
@@ -1251,6 +1296,25 @@ def shutdown_cached_clients() -> None:
_client_cache.clear()
def cleanup_stale_async_clients() -> None:
"""Force-close cached async clients whose event loop is closed.
Call this after each agent turn to proactively clean up stale clients
before GC can trigger ``AsyncHttpxClientWrapper.__del__`` on them.
This is defense-in-depth — the primary fix is ``neuter_async_httpx_del``
which disables ``__del__`` entirely.
"""
with _client_cache_lock:
stale_keys = []
for key, entry in _client_cache.items():
client, _default, cached_loop = entry
if cached_loop is not None and cached_loop.is_closed():
_force_close_async_httpx(client)
stale_keys.append(key)
for key in stale_keys:
del _client_cache[key]
def _get_cached_client(
provider: str,
model: str = None,
@@ -1394,6 +1458,29 @@ def _resolve_task_provider_model(
return "auto", resolved_model, None, None
_DEFAULT_AUX_TIMEOUT = 30.0
def _get_task_timeout(task: str, default: float = _DEFAULT_AUX_TIMEOUT) -> float:
"""Read timeout from auxiliary.{task}.timeout in config, falling back to *default*."""
if not task:
return default
try:
from hermes_cli.config import load_config
config = load_config()
except ImportError:
return default
aux = config.get("auxiliary", {}) if isinstance(config, dict) else {}
task_config = aux.get(task, {}) if isinstance(aux, dict) else {}
raw = task_config.get("timeout")
if raw is not None:
try:
return float(raw)
except (ValueError, TypeError):
pass
return default
def _build_call_kwargs(
provider: str,
model: str,
@@ -1451,7 +1538,7 @@ def call_llm(
temperature: float = None,
max_tokens: int = None,
tools: list = None,
timeout: float = 30.0,
timeout: float = None,
extra_body: dict = None,
) -> Any:
"""Centralized synchronous LLM call.
@@ -1469,7 +1556,7 @@ def call_llm(
temperature: Sampling temperature (None = provider default).
max_tokens: Max output tokens (handles max_tokens vs max_completion_tokens).
tools: Tool definitions (for function calling).
timeout: Request timeout in seconds.
timeout: Request timeout in seconds (None = read from auxiliary.{task}.timeout config).
extra_body: Additional request body fields.
Returns:
@@ -1534,10 +1621,12 @@ def call_llm(
f"No LLM provider configured for task={task} provider={resolved_provider}. "
f"Run: hermes setup")
effective_timeout = timeout if timeout is not None else _get_task_timeout(task)
kwargs = _build_call_kwargs(
resolved_provider, final_model, messages,
temperature=temperature, max_tokens=max_tokens,
tools=tools, timeout=timeout, extra_body=extra_body,
tools=tools, timeout=effective_timeout, extra_body=extra_body,
base_url=resolved_base_url)
# Handle max_tokens vs max_completion_tokens retry
@@ -1552,6 +1641,62 @@ def call_llm(
raise
def extract_content_or_reasoning(response) -> str:
"""Extract content from an LLM response, falling back to reasoning fields.
Mirrors the main agent loop's behavior when a reasoning model (DeepSeek-R1,
Qwen-QwQ, etc.) returns ``content=None`` with reasoning in structured fields.
Resolution order:
1. ``message.content`` — strip inline think/reasoning blocks, check for
remaining non-whitespace text.
2. ``message.reasoning`` / ``message.reasoning_content`` — direct
structured reasoning fields (DeepSeek, Moonshot, Novita, etc.).
3. ``message.reasoning_details`` — OpenRouter unified array format.
Returns the best available text, or ``""`` if nothing found.
"""
import re
msg = response.choices[0].message
content = (msg.content or "").strip()
if content:
# Strip inline think/reasoning blocks (mirrors _strip_think_blocks)
cleaned = re.sub(
r"<(?:think|thinking|reasoning|REASONING_SCRATCHPAD)>"
r".*?"
r"</(?:think|thinking|reasoning|REASONING_SCRATCHPAD)>",
"", content, flags=re.DOTALL | re.IGNORECASE,
).strip()
if cleaned:
return cleaned
# Content is empty or reasoning-only — try structured reasoning fields
reasoning_parts: list[str] = []
for field in ("reasoning", "reasoning_content"):
val = getattr(msg, field, None)
if val and isinstance(val, str) and val.strip() and val not in reasoning_parts:
reasoning_parts.append(val.strip())
details = getattr(msg, "reasoning_details", None)
if details and isinstance(details, list):
for detail in details:
if isinstance(detail, dict):
summary = (
detail.get("summary")
or detail.get("content")
or detail.get("text")
)
if summary and summary not in reasoning_parts:
reasoning_parts.append(summary.strip() if isinstance(summary, str) else str(summary))
if reasoning_parts:
return "\n\n".join(reasoning_parts)
return ""
async def async_call_llm(
task: str = None,
*,
@@ -1563,7 +1708,7 @@ async def async_call_llm(
temperature: float = None,
max_tokens: int = None,
tools: list = None,
timeout: float = 30.0,
timeout: float = None,
extra_body: dict = None,
) -> Any:
"""Centralized asynchronous LLM call.
@@ -1624,10 +1769,12 @@ async def async_call_llm(
f"No LLM provider configured for task={task} provider={resolved_provider}. "
f"Run: hermes setup")
effective_timeout = timeout if timeout is not None else _get_task_timeout(task)
kwargs = _build_call_kwargs(
resolved_provider, final_model, messages,
temperature=temperature, max_tokens=max_tokens,
tools=tools, timeout=timeout, extra_body=extra_body,
tools=tools, timeout=effective_timeout, extra_body=extra_body,
base_url=resolved_base_url)
try:
+2 -2
View File
@@ -141,7 +141,7 @@ class ContextCompressor:
"last_prompt_tokens": self.last_prompt_tokens,
"threshold_tokens": self.threshold_tokens,
"context_length": self.context_length,
"usage_percent": (self.last_prompt_tokens / self.context_length * 100) if self.context_length else 0,
"usage_percent": min(100, (self.last_prompt_tokens / self.context_length * 100)) if self.context_length else 0,
"compression_count": self.compression_count,
}
@@ -347,7 +347,7 @@ Write only the summary body. Do not include any preamble or prefix."""
"messages": [{"role": "user", "content": prompt}],
"temperature": 0.3,
"max_tokens": summary_budget * 2,
"timeout": 45.0,
# timeout resolved from auxiliary.compression.timeout config by call_llm
}
if self.summary_model:
call_kwargs["model"] = self.summary_model
+13 -6
View File
@@ -286,12 +286,16 @@ def _expand_git_reference(
args: list[str],
label: str,
) -> tuple[str | None, str | None]:
result = subprocess.run(
["git", *args],
cwd=cwd,
capture_output=True,
text=True,
)
try:
result = subprocess.run(
["git", *args],
cwd=cwd,
capture_output=True,
text=True,
timeout=30,
)
except subprocess.TimeoutExpired:
return f"{ref.raw}: git command timed out (30s)", None
if result.returncode != 0:
stderr = (result.stderr or "").strip() or "git command failed"
return f"{ref.raw}: {stderr}", None
@@ -449,9 +453,12 @@ def _rg_files(path: Path, cwd: Path, limit: int) -> list[Path] | None:
cwd=cwd,
capture_output=True,
text=True,
timeout=10,
)
except FileNotFoundError:
return None
except subprocess.TimeoutExpired:
return None
if result.returncode != 0:
return None
files = [Path(line.strip()) for line in result.stdout.splitlines() if line.strip()]
+23 -9
View File
@@ -231,7 +231,7 @@ class KawaiiSpinner:
"analyzing", "computing", "synthesizing", "formulating", "brainstorming",
]
def __init__(self, message: str = "", spinner_type: str = 'dots'):
def __init__(self, message: str = "", spinner_type: str = 'dots', print_fn=None):
self.message = message
self.spinner_frames = self.SPINNERS.get(spinner_type, self.SPINNERS['dots'])
self.running = False
@@ -239,12 +239,26 @@ class KawaiiSpinner:
self.frame_idx = 0
self.start_time = None
self.last_line_len = 0
# Optional callable to route all output through (e.g. a no-op for silent
# background agents). When set, bypasses self._out entirely so that
# agents with _print_fn overridden remain fully silent.
self._print_fn = print_fn
# Capture stdout NOW, before any redirect_stdout(devnull) from
# child agents can replace sys.stdout with a black hole.
self._out = sys.stdout
def _write(self, text: str, end: str = '\n', flush: bool = False):
"""Write to the stdout captured at spinner creation time."""
"""Write to the stdout captured at spinner creation time.
If a print_fn was supplied at construction, all output is routed through
it instead — allowing callers to silence the spinner with a no-op lambda.
"""
if self._print_fn is not None:
try:
self._print_fn(text)
except Exception:
pass
return
try:
self._out.write(text + end)
if flush:
@@ -270,11 +284,11 @@ class KawaiiSpinner:
The CLI already drives a TUI widget (_spinner_text) for spinner display,
so KawaiiSpinner's \\r-based animation is redundant under StdoutProxy.
"""
out = self._out
# StdoutProxy has a 'raw' attribute (bool) that plain file objects lack.
if hasattr(out, 'raw') and type(out).__name__ == 'StdoutProxy':
return True
return False
try:
from prompt_toolkit.patch_stdout import StdoutProxy
return isinstance(self._out, StdoutProxy)
except ImportError:
return False
def _animate(self):
# When stdout is not a real terminal (e.g. Docker, systemd, pipe),
@@ -685,7 +699,7 @@ def format_context_pressure(
threshold_percent: Compaction threshold as a fraction of context window.
compression_enabled: Whether auto-compression is active.
"""
pct_int = int(compaction_progress * 100)
pct_int = min(int(compaction_progress * 100), 100)
filled = min(int(compaction_progress * _BAR_WIDTH), _BAR_WIDTH)
bar = _BAR_FILLED * filled + _BAR_EMPTY * (_BAR_WIDTH - filled)
@@ -715,7 +729,7 @@ def format_context_pressure_gateway(
No ANSI — just Unicode and plain text suitable for Telegram/Discord/etc.
The percentage shows progress toward the compaction threshold.
"""
pct_int = int(compaction_progress * 100)
pct_int = min(int(compaction_progress * 100), 100)
filled = min(int(compaction_progress * _BAR_WIDTH), _BAR_WIDTH)
bar = _BAR_FILLED * filled + _BAR_EMPTY * (_BAR_WIDTH - filled)
+9
View File
@@ -113,6 +113,15 @@ DEFAULT_CONTEXT_LENGTHS = {
"glm": 202752,
# Kimi
"kimi": 262144,
# Hugging Face Inference Providers — model IDs use org/name format
"Qwen/Qwen3.5-397B-A17B": 131072,
"Qwen/Qwen3.5-35B-A3B": 131072,
"deepseek-ai/DeepSeek-V3.2": 65536,
"moonshotai/Kimi-K2.5": 262144,
"moonshotai/Kimi-K2-Thinking": 262144,
"MiniMaxAI/MiniMax-M2.5": 204800,
"XiaomiMiMo/MiMo-V2-Flash": 32768,
"zai-org/GLM-5": 202752,
}
_CONTEXT_LENGTH_KEYS = (
+4 -4
View File
@@ -15,6 +15,8 @@ import time
from pathlib import Path
from typing import Any, Dict, Optional
from utils import atomic_json_write
import requests
logger = logging.getLogger(__name__)
@@ -64,12 +66,10 @@ def _load_disk_cache() -> Dict[str, Any]:
def _save_disk_cache(data: Dict[str, Any]) -> None:
"""Save models.dev data to disk cache."""
"""Save models.dev data to disk cache atomically."""
try:
cache_path = _get_cache_path()
cache_path.parent.mkdir(parents=True, exist_ok=True)
with open(cache_path, "w", encoding="utf-8") as f:
json.dump(data, f, separators=(",", ":"))
atomic_json_write(cache_path, data, indent=None, separators=(",", ":"))
except Exception as e:
logger.debug("Failed to save models.dev disk cache: %s", e)
+272 -107
View File
@@ -4,14 +4,27 @@ All functions are stateless. AIAgent._build_system_prompt() calls these to
assemble pieces, then combines them with memory and ephemeral prompts.
"""
import json
import logging
import os
import re
import threading
from collections import OrderedDict
from pathlib import Path
from hermes_constants import get_hermes_home
from typing import Optional
from agent.skill_utils import (
extract_skill_conditions,
extract_skill_description,
get_disabled_skill_names,
iter_skill_index_files,
parse_frontmatter,
skill_matches_platform,
)
from utils import atomic_json_write
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
@@ -156,6 +169,25 @@ SKILLS_GUIDANCE = (
"Skills that aren't maintained become liabilities."
)
TOOL_USE_ENFORCEMENT_GUIDANCE = (
"# Tool-use enforcement\n"
"You MUST use your tools to take action — do not describe what you would do "
"or plan to do without actually doing it. When you say you will perform an "
"action (e.g. 'I will run the tests', 'Let me check the file', 'I will create "
"the project'), you MUST immediately make the corresponding tool call in the same "
"response. Never end your turn with a promise of future action — execute it now.\n"
"Keep working until the task is actually complete. Do not stop with a summary of "
"what you plan to do next time. If you have tools available that can accomplish "
"the task, use them instead of telling the user what you would do.\n"
"Every response should either (a) contain tool calls that make progress, or "
"(b) deliver a final result to the user. Responses that only describe intentions "
"without acting are not acceptable."
)
# Model name substrings that trigger tool-use enforcement guidance.
# Add new patterns here when a model family needs explicit steering.
TOOL_USE_ENFORCEMENT_MODELS = ("gpt", "codex")
PLATFORM_HINTS = {
"whatsapp": (
"You are on a text messaging communication platform, WhatsApp. "
@@ -230,6 +262,111 @@ CONTEXT_TRUNCATE_HEAD_RATIO = 0.7
CONTEXT_TRUNCATE_TAIL_RATIO = 0.2
# =========================================================================
# Skills prompt cache
# =========================================================================
_SKILLS_PROMPT_CACHE_MAX = 8
_SKILLS_PROMPT_CACHE: OrderedDict[tuple, str] = OrderedDict()
_SKILLS_PROMPT_CACHE_LOCK = threading.Lock()
_SKILLS_SNAPSHOT_VERSION = 1
def _skills_prompt_snapshot_path() -> Path:
return get_hermes_home() / ".skills_prompt_snapshot.json"
def clear_skills_system_prompt_cache(*, clear_snapshot: bool = False) -> None:
"""Drop the in-process skills prompt cache (and optionally the disk snapshot)."""
with _SKILLS_PROMPT_CACHE_LOCK:
_SKILLS_PROMPT_CACHE.clear()
if clear_snapshot:
try:
_skills_prompt_snapshot_path().unlink(missing_ok=True)
except OSError as e:
logger.debug("Could not remove skills prompt snapshot: %s", e)
def _build_skills_manifest(skills_dir: Path) -> dict[str, list[int]]:
"""Build an mtime/size manifest of all SKILL.md and DESCRIPTION.md files."""
manifest: dict[str, list[int]] = {}
for filename in ("SKILL.md", "DESCRIPTION.md"):
for path in iter_skill_index_files(skills_dir, filename):
try:
st = path.stat()
except OSError:
continue
manifest[str(path.relative_to(skills_dir))] = [st.st_mtime_ns, st.st_size]
return manifest
def _load_skills_snapshot(skills_dir: Path) -> Optional[dict]:
"""Load the disk snapshot if it exists and its manifest still matches."""
snapshot_path = _skills_prompt_snapshot_path()
if not snapshot_path.exists():
return None
try:
snapshot = json.loads(snapshot_path.read_text(encoding="utf-8"))
except Exception:
return None
if not isinstance(snapshot, dict):
return None
if snapshot.get("version") != _SKILLS_SNAPSHOT_VERSION:
return None
if snapshot.get("manifest") != _build_skills_manifest(skills_dir):
return None
return snapshot
def _write_skills_snapshot(
skills_dir: Path,
manifest: dict[str, list[int]],
skill_entries: list[dict],
category_descriptions: dict[str, str],
) -> None:
"""Persist skill metadata to disk for fast cold-start reuse."""
payload = {
"version": _SKILLS_SNAPSHOT_VERSION,
"manifest": manifest,
"skills": skill_entries,
"category_descriptions": category_descriptions,
}
try:
atomic_json_write(_skills_prompt_snapshot_path(), payload)
except Exception as e:
logger.debug("Could not write skills prompt snapshot: %s", e)
def _build_snapshot_entry(
skill_file: Path,
skills_dir: Path,
frontmatter: dict,
description: str,
) -> dict:
"""Build a serialisable metadata dict for one skill."""
rel_path = skill_file.relative_to(skills_dir)
parts = rel_path.parts
if len(parts) >= 2:
skill_name = parts[-2]
category = "/".join(parts[:-2]) if len(parts) > 2 else parts[0]
else:
category = "general"
skill_name = skill_file.parent.name
platforms = frontmatter.get("platforms") or []
if isinstance(platforms, str):
platforms = [platforms]
return {
"skill_name": skill_name,
"category": category,
"frontmatter_name": str(frontmatter.get("name", skill_name)),
"description": description,
"platforms": [str(p).strip() for p in platforms if str(p).strip()],
"conditions": extract_skill_conditions(frontmatter),
}
# =========================================================================
# Skills index
# =========================================================================
@@ -241,22 +378,13 @@ def _parse_skill_file(skill_file: Path) -> tuple[bool, dict, str]:
(True, {}, "") to err on the side of showing the skill.
"""
try:
from tools.skills_tool import _parse_frontmatter, skill_matches_platform
raw = skill_file.read_text(encoding="utf-8")[:2000]
frontmatter, _ = _parse_frontmatter(raw)
frontmatter, _ = parse_frontmatter(raw)
if not skill_matches_platform(frontmatter):
return False, {}, ""
return False, frontmatter, ""
desc = ""
raw_desc = frontmatter.get("description", "")
if raw_desc:
desc = str(raw_desc).strip().strip("'\"")
if len(desc) > 60:
desc = desc[:57] + "..."
return True, frontmatter, desc
return True, frontmatter, extract_skill_description(frontmatter)
except Exception as e:
logger.debug("Failed to parse skill file %s: %s", skill_file, e)
return True, {}, ""
@@ -265,16 +393,9 @@ def _parse_skill_file(skill_file: Path) -> tuple[bool, dict, str]:
def _read_skill_conditions(skill_file: Path) -> dict:
"""Extract conditional activation fields from SKILL.md frontmatter."""
try:
from tools.skills_tool import _parse_frontmatter
raw = skill_file.read_text(encoding="utf-8")[:2000]
frontmatter, _ = _parse_frontmatter(raw)
hermes = frontmatter.get("metadata", {}).get("hermes", {})
return {
"fallback_for_toolsets": hermes.get("fallback_for_toolsets", []),
"requires_toolsets": hermes.get("requires_toolsets", []),
"fallback_for_tools": hermes.get("fallback_for_tools", []),
"requires_tools": hermes.get("requires_tools", []),
}
frontmatter, _ = parse_frontmatter(raw)
return extract_skill_conditions(frontmatter)
except Exception as e:
logger.debug("Failed to read skill conditions from %s: %s", skill_file, e)
return {}
@@ -317,10 +438,12 @@ def build_skills_system_prompt(
) -> str:
"""Build a compact skill index for the system prompt.
Scans ~/.hermes/skills/ for SKILL.md files grouped by category.
Includes per-skill descriptions from frontmatter so the model can
match skills by meaning, not just name.
Filters out skills incompatible with the current OS platform.
Two-layer cache:
1. In-process LRU dict keyed by (skills_dir, tools, toolsets)
2. Disk snapshot (``.skills_prompt_snapshot.json``) validated by
mtime/size manifest — survives process restarts
Falls back to a full filesystem scan when both layers miss.
"""
hermes_home = get_hermes_home()
skills_dir = hermes_home / "skills"
@@ -328,98 +451,140 @@ def build_skills_system_prompt(
if not skills_dir.exists():
return ""
# Collect skills with descriptions, grouped by category.
# Each entry: (skill_name, description)
# Supports sub-categories: skills/mlops/training/axolotl/SKILL.md
# -> category "mlops/training", skill "axolotl"
# Load disabled skill names once for the entire scan
try:
from tools.skills_tool import _get_disabled_skill_names
disabled = _get_disabled_skill_names()
except Exception:
disabled = set()
# ── Layer 1: in-process LRU cache ─────────────────────────────────
cache_key = (
str(skills_dir.resolve()),
tuple(sorted(str(t) for t in (available_tools or set()))),
tuple(sorted(str(ts) for ts in (available_toolsets or set()))),
)
with _SKILLS_PROMPT_CACHE_LOCK:
cached = _SKILLS_PROMPT_CACHE.get(cache_key)
if cached is not None:
_SKILLS_PROMPT_CACHE.move_to_end(cache_key)
return cached
disabled = get_disabled_skill_names()
# ── Layer 2: disk snapshot ────────────────────────────────────────
snapshot = _load_skills_snapshot(skills_dir)
skills_by_category: dict[str, list[tuple[str, str]]] = {}
for skill_file in skills_dir.rglob("SKILL.md"):
is_compatible, frontmatter, desc = _parse_skill_file(skill_file)
if not is_compatible:
continue
rel_path = skill_file.relative_to(skills_dir)
parts = rel_path.parts
if len(parts) >= 2:
skill_name = parts[-2]
category = "/".join(parts[:-2]) if len(parts) > 2 else parts[0]
else:
category = "general"
skill_name = skill_file.parent.name
# Respect user's disabled skills config
fm_name = frontmatter.get("name", skill_name)
if fm_name in disabled or skill_name in disabled:
continue
# Extract conditions inline from already-parsed frontmatter
# (avoids redundant file re-read that _read_skill_conditions would do)
hermes_meta = (frontmatter.get("metadata") or {}).get("hermes") or {}
conditions = {
"fallback_for_toolsets": hermes_meta.get("fallback_for_toolsets", []),
"requires_toolsets": hermes_meta.get("requires_toolsets", []),
"fallback_for_tools": hermes_meta.get("fallback_for_tools", []),
"requires_tools": hermes_meta.get("requires_tools", []),
category_descriptions: dict[str, str] = {}
if snapshot is not None:
# Fast path: use pre-parsed metadata from disk
for entry in snapshot.get("skills", []):
if not isinstance(entry, dict):
continue
skill_name = entry.get("skill_name") or ""
category = entry.get("category") or "general"
frontmatter_name = entry.get("frontmatter_name") or skill_name
platforms = entry.get("platforms") or []
if not skill_matches_platform({"platforms": platforms}):
continue
if frontmatter_name in disabled or skill_name in disabled:
continue
if not _skill_should_show(
entry.get("conditions") or {},
available_tools,
available_toolsets,
):
continue
skills_by_category.setdefault(category, []).append(
(skill_name, entry.get("description", ""))
)
category_descriptions = {
str(k): str(v)
for k, v in (snapshot.get("category_descriptions") or {}).items()
}
if not _skill_should_show(conditions, available_tools, available_toolsets):
continue
skills_by_category.setdefault(category, []).append((skill_name, desc))
else:
# Cold path: full filesystem scan + write snapshot for next time
skill_entries: list[dict] = []
for skill_file in iter_skill_index_files(skills_dir, "SKILL.md"):
is_compatible, frontmatter, desc = _parse_skill_file(skill_file)
entry = _build_snapshot_entry(skill_file, skills_dir, frontmatter, desc)
skill_entries.append(entry)
if not is_compatible:
continue
skill_name = entry["skill_name"]
if entry["frontmatter_name"] in disabled or skill_name in disabled:
continue
if not _skill_should_show(
extract_skill_conditions(frontmatter),
available_tools,
available_toolsets,
):
continue
skills_by_category.setdefault(entry["category"], []).append(
(skill_name, entry["description"])
)
if not skills_by_category:
return ""
# Read category-level descriptions from DESCRIPTION.md
# Checks both the exact category path and parent directories
category_descriptions = {}
for category in skills_by_category:
cat_path = Path(category)
desc_file = skills_dir / cat_path / "DESCRIPTION.md"
if desc_file.exists():
# Read category-level DESCRIPTION.md files
for desc_file in iter_skill_index_files(skills_dir, "DESCRIPTION.md"):
try:
content = desc_file.read_text(encoding="utf-8")
match = re.search(r"^---\s*\n.*?description:\s*(.+?)\s*\n.*?^---", content, re.MULTILINE | re.DOTALL)
if match:
category_descriptions[category] = match.group(1).strip()
fm, _ = parse_frontmatter(content)
cat_desc = fm.get("description")
if not cat_desc:
continue
rel = desc_file.relative_to(skills_dir)
cat = "/".join(rel.parts[:-1]) if len(rel.parts) > 1 else "general"
category_descriptions[cat] = str(cat_desc).strip().strip("'\"")
except Exception as e:
logger.debug("Could not read skill description %s: %s", desc_file, e)
index_lines = []
for category in sorted(skills_by_category.keys()):
cat_desc = category_descriptions.get(category, "")
if cat_desc:
index_lines.append(f" {category}: {cat_desc}")
else:
index_lines.append(f" {category}:")
# Deduplicate and sort skills within each category
seen = set()
for name, desc in sorted(skills_by_category[category], key=lambda x: x[0]):
if name in seen:
continue
seen.add(name)
if desc:
index_lines.append(f" - {name}: {desc}")
else:
index_lines.append(f" - {name}")
_write_skills_snapshot(
skills_dir,
_build_skills_manifest(skills_dir),
skill_entries,
category_descriptions,
)
return (
"## Skills (mandatory)\n"
"Before replying, scan the skills below. If one clearly matches your task, "
"load it with skill_view(name) and follow its instructions. "
"If a skill has issues, fix it with skill_manage(action='patch').\n"
"After difficult/iterative tasks, offer to save as a skill. "
"If a skill you loaded was missing steps, had wrong commands, or needed "
"pitfalls you discovered, update it before finishing.\n"
"\n"
"<available_skills>\n"
+ "\n".join(index_lines) + "\n"
"</available_skills>\n"
"\n"
"If none match, proceed normally without loading a skill."
)
if not skills_by_category:
result = ""
else:
index_lines = []
for category in sorted(skills_by_category.keys()):
cat_desc = category_descriptions.get(category, "")
if cat_desc:
index_lines.append(f" {category}: {cat_desc}")
else:
index_lines.append(f" {category}:")
# Deduplicate and sort skills within each category
seen = set()
for name, desc in sorted(skills_by_category[category], key=lambda x: x[0]):
if name in seen:
continue
seen.add(name)
if desc:
index_lines.append(f" - {name}: {desc}")
else:
index_lines.append(f" - {name}")
result = (
"## Skills (mandatory)\n"
"Before replying, scan the skills below. If one clearly matches your task, "
"load it with skill_view(name) and follow its instructions. "
"If a skill has issues, fix it with skill_manage(action='patch').\n"
"After difficult/iterative tasks, offer to save as a skill. "
"If a skill you loaded was missing steps, had wrong commands, or needed "
"pitfalls you discovered, update it before finishing.\n"
"\n"
"<available_skills>\n"
+ "\n".join(index_lines) + "\n"
"</available_skills>\n"
"\n"
"If none match, proceed normally without loading a skill."
)
# ── Store in LRU cache ────────────────────────────────────────────
with _SKILLS_PROMPT_CACHE_LOCK:
_SKILLS_PROMPT_CACHE[cache_key] = result
_SKILLS_PROMPT_CACHE.move_to_end(cache_key)
while len(_SKILLS_PROMPT_CACHE) > _SKILLS_PROMPT_CACHE_MAX:
_SKILLS_PROMPT_CACHE.popitem(last=False)
return result
# =========================================================================
+203
View File
@@ -0,0 +1,203 @@
"""Lightweight skill metadata utilities shared by prompt_builder and skills_tool.
This module intentionally avoids importing the tool registry, CLI config, or any
heavy dependency chain. It is safe to import at module level without triggering
tool registration or provider resolution.
"""
import logging
import os
import re
import sys
from pathlib import Path
from typing import Any, Dict, List, Optional, Set, Tuple
from hermes_constants import get_hermes_home
logger = logging.getLogger(__name__)
# ── Platform mapping ──────────────────────────────────────────────────────
PLATFORM_MAP = {
"macos": "darwin",
"linux": "linux",
"windows": "win32",
}
EXCLUDED_SKILL_DIRS = frozenset((".git", ".github", ".hub"))
# ── Lazy YAML loader ─────────────────────────────────────────────────────
_yaml_load_fn = None
def yaml_load(content: str):
"""Parse YAML with lazy import and CSafeLoader preference."""
global _yaml_load_fn
if _yaml_load_fn is None:
import yaml
loader = getattr(yaml, "CSafeLoader", None) or yaml.SafeLoader
def _load(value: str):
return yaml.load(value, Loader=loader)
_yaml_load_fn = _load
return _yaml_load_fn(content)
# ── Frontmatter parsing ──────────────────────────────────────────────────
def parse_frontmatter(content: str) -> Tuple[Dict[str, Any], str]:
"""Parse YAML frontmatter from a markdown string.
Uses yaml with CSafeLoader for full YAML support (nested metadata, lists)
with a fallback to simple key:value splitting for robustness.
Returns:
(frontmatter_dict, remaining_body)
"""
frontmatter: Dict[str, Any] = {}
body = content
if not content.startswith("---"):
return frontmatter, body
end_match = re.search(r"\n---\s*\n", content[3:])
if not end_match:
return frontmatter, body
yaml_content = content[3 : end_match.start() + 3]
body = content[end_match.end() + 3 :]
try:
parsed = yaml_load(yaml_content)
if isinstance(parsed, dict):
frontmatter = parsed
except Exception:
# Fallback: simple key:value parsing for malformed YAML
for line in yaml_content.strip().split("\n"):
if ":" not in line:
continue
key, value = line.split(":", 1)
frontmatter[key.strip()] = value.strip()
return frontmatter, body
# ── Platform matching ─────────────────────────────────────────────────────
def skill_matches_platform(frontmatter: Dict[str, Any]) -> bool:
"""Return True when the skill is compatible with the current OS.
Skills declare platform requirements via a top-level ``platforms`` list
in their YAML frontmatter::
platforms: [macos] # macOS only
platforms: [macos, linux] # macOS and Linux
If the field is absent or empty the skill is compatible with **all**
platforms (backward-compatible default).
"""
platforms = frontmatter.get("platforms")
if not platforms:
return True
if not isinstance(platforms, list):
platforms = [platforms]
current = sys.platform
for platform in platforms:
normalized = str(platform).lower().strip()
mapped = PLATFORM_MAP.get(normalized, normalized)
if current.startswith(mapped):
return True
return False
# ── Disabled skills ───────────────────────────────────────────────────────
def get_disabled_skill_names() -> Set[str]:
"""Read disabled skill names from config.yaml.
Resolves platform from ``HERMES_PLATFORM`` env var, falls back to
the global disabled list. Reads the config file directly (no CLI
config imports) to stay lightweight.
"""
config_path = get_hermes_home() / "config.yaml"
if not config_path.exists():
return set()
try:
parsed = yaml_load(config_path.read_text(encoding="utf-8"))
except Exception as e:
logger.debug("Could not read skill config %s: %s", config_path, e)
return set()
if not isinstance(parsed, dict):
return set()
skills_cfg = parsed.get("skills")
if not isinstance(skills_cfg, dict):
return set()
resolved_platform = os.getenv("HERMES_PLATFORM")
if resolved_platform:
platform_disabled = (skills_cfg.get("platform_disabled") or {}).get(
resolved_platform
)
if platform_disabled is not None:
return _normalize_string_set(platform_disabled)
return _normalize_string_set(skills_cfg.get("disabled"))
def _normalize_string_set(values) -> Set[str]:
if values is None:
return set()
if isinstance(values, str):
values = [values]
return {str(v).strip() for v in values if str(v).strip()}
# ── Condition extraction ──────────────────────────────────────────────────
def extract_skill_conditions(frontmatter: Dict[str, Any]) -> Dict[str, List]:
"""Extract conditional activation fields from parsed frontmatter."""
hermes = (frontmatter.get("metadata") or {}).get("hermes") or {}
return {
"fallback_for_toolsets": hermes.get("fallback_for_toolsets", []),
"requires_toolsets": hermes.get("requires_toolsets", []),
"fallback_for_tools": hermes.get("fallback_for_tools", []),
"requires_tools": hermes.get("requires_tools", []),
}
# ── Description extraction ────────────────────────────────────────────────
def extract_skill_description(frontmatter: Dict[str, Any]) -> str:
"""Extract a truncated description from parsed frontmatter."""
raw_desc = frontmatter.get("description", "")
if not raw_desc:
return ""
desc = str(raw_desc).strip().strip("'\"")
if len(desc) > 60:
return desc[:57] + "..."
return desc
# ── File iteration ────────────────────────────────────────────────────────
def iter_skill_index_files(skills_dir: Path, filename: str):
"""Walk skills_dir yielding sorted paths matching *filename*.
Excludes ``.git``, ``.github``, ``.hub`` directories.
"""
matches = []
for root, dirs, files in os.walk(skills_dir):
dirs[:] = [d for d in dirs if d not in EXCLUDED_SKILL_DIRS]
if filename in files:
matches.append(Path(root) / filename)
for path in sorted(matches, key=lambda p: str(p.relative_to(skills_dir))):
yield path
+1 -1
View File
@@ -19,7 +19,7 @@ _TITLE_PROMPT = (
)
def generate_title(user_message: str, assistant_response: str, timeout: float = 15.0) -> Optional[str]:
def generate_title(user_message: str, assistant_response: str, timeout: float = 30.0) -> Optional[str]:
"""Generate a session title from the first exchange.
Uses the auxiliary LLM client (cheapest/fastest available model).
+7
View File
@@ -7,6 +7,7 @@
# =============================================================================
model:
# Default model to use (can be overridden with --model flag)
# Both "default" and "model" work as the key name here.
default: "anthropic/claude-opus-4.6"
# Inference provider selection:
@@ -688,6 +689,12 @@ display:
# Toggle at runtime with /verbose in the CLI
tool_progress: all
# What Enter does when Hermes is already busy in the CLI.
# interrupt: Interrupt the current run and redirect Hermes (default)
# queue: Queue your message for the next turn
# Ctrl+C always interrupts regardless of this setting.
busy_input_mode: interrupt
# Background process notifications (gateway/messaging only).
# Controls how chatty the process watcher is when you use
# terminal(background=true, check_interval=...) from Telegram/Discord/etc.
+249 -34
View File
@@ -205,6 +205,7 @@ def load_cli_config() -> Dict[str, Any]:
"resume_display": "full",
"show_reasoning": False,
"streaming": True,
"busy_input_mode": "interrupt",
"skin": "default",
},
@@ -448,6 +449,17 @@ try:
except Exception:
pass # Skin engine is optional — default skin used if unavailable
# Neuter AsyncHttpxClientWrapper.__del__ before any AsyncOpenAI clients are
# created. The SDK's __del__ schedules aclose() on asyncio.get_running_loop()
# which, during CLI idle time, finds prompt_toolkit's event loop and tries to
# close TCP transports bound to dead worker loops — producing
# "Event loop is closed" / "Press ENTER to continue..." errors.
try:
from agent.auxiliary_client import neuter_async_httpx_del
neuter_async_httpx_del()
except Exception:
pass
from rich import box as rich_box
from rich.console import Console
from rich.markup import escape as _escape
@@ -1035,13 +1047,18 @@ class HermesCLI:
self.config = CLI_CONFIG
self.compact = compact if compact is not None else CLI_CONFIG["display"].get("compact", False)
# tool_progress: "off", "new", "all", "verbose" (from config.yaml display section)
self.tool_progress_mode = CLI_CONFIG["display"].get("tool_progress", "all")
# YAML 1.1 parses bare `off` as boolean False — normalise to string.
_raw_tp = CLI_CONFIG["display"].get("tool_progress", "all")
self.tool_progress_mode = "off" if _raw_tp is False else str(_raw_tp)
# resume_display: "full" (show history) | "minimal" (one-liner only)
self.resume_display = CLI_CONFIG["display"].get("resume_display", "full")
# bell_on_complete: play terminal bell (\a) when agent finishes a response
self.bell_on_complete = CLI_CONFIG["display"].get("bell_on_complete", False)
# show_reasoning: display model thinking/reasoning before the response
self.show_reasoning = CLI_CONFIG["display"].get("show_reasoning", False)
# busy_input_mode: "interrupt" (Enter interrupts current run) or "queue" (Enter queues for next turn)
_bim = CLI_CONFIG["display"].get("busy_input_mode", "interrupt")
self.busy_input_mode = "queue" if str(_bim).strip().lower() == "queue" else "interrupt"
self.verbose = verbose if verbose is not None else (self.tool_progress_mode == "verbose")
@@ -1061,12 +1078,12 @@ class HermesCLI:
# authoritative. This avoids conflicts in multi-agent setups where
# env vars would stomp each other.
_model_config = CLI_CONFIG.get("model", {})
_config_model = _model_config.get("default", "") if isinstance(_model_config, dict) else (_model_config or "")
_config_model = (_model_config.get("default") or _model_config.get("model") or "") if isinstance(_model_config, dict) else (_model_config or "")
_FALLBACK_MODEL = "anthropic/claude-opus-4.6"
self.model = model or _config_model or _FALLBACK_MODEL
# Auto-detect model from local server if still on fallback
if self.model == _FALLBACK_MODEL:
_base_url = _model_config.get("base_url", "") if isinstance(_model_config, dict) else ""
_base_url = (_model_config.get("base_url") or "") if isinstance(_model_config, dict) else ""
if "localhost" in _base_url or "127.0.0.1" in _base_url:
from hermes_cli.runtime_provider import _auto_detect_local_model
_detected = _auto_detect_local_model(_base_url)
@@ -1329,7 +1346,12 @@ class HermesCLI:
def _build_status_bar_text(self, width: Optional[int] = None) -> str:
try:
snapshot = self._get_status_bar_snapshot()
width = width or shutil.get_terminal_size((80, 24)).columns
if width is None:
try:
from prompt_toolkit.application import get_app
width = get_app().output.get_size().columns
except Exception:
width = shutil.get_terminal_size((80, 24)).columns
percent = snapshot["context_percent"]
percent_label = f"{percent}%" if percent is not None else "--"
duration_label = snapshot["duration"]
@@ -1359,7 +1381,16 @@ class HermesCLI:
return []
try:
snapshot = self._get_status_bar_snapshot()
width = shutil.get_terminal_size((80, 24)).columns
# Use prompt_toolkit's own terminal width when running inside the
# TUI — shutil.get_terminal_size() can return stale or fallback
# values (especially on SSH) that differ from what prompt_toolkit
# actually renders, causing the fragments to overflow to a second
# line and produce duplicated status bar rows over long sessions.
try:
from prompt_toolkit.application import get_app
width = get_app().output.get_size().columns
except Exception:
width = shutil.get_terminal_size((80, 24)).columns
duration_label = snapshot["duration"]
if width < 52:
@@ -1594,6 +1625,7 @@ class HermesCLI:
if not text:
return
self._reasoning_stream_started = True
self._reasoning_shown_this_turn = True
if getattr(self, "_stream_box_opened", False):
return
@@ -2929,6 +2961,82 @@ class HermesCLI:
if not silent:
print("(^_^)v New session started!")
def _handle_resume_command(self, cmd_original: str) -> None:
"""Handle /resume <session_id_or_title> — switch to a previous session mid-conversation."""
parts = cmd_original.split(None, 1)
target = parts[1].strip() if len(parts) > 1 else ""
if not target:
_cprint(" Usage: /resume <session_id_or_title>")
_cprint(" Tip: Use /history or `hermes sessions list` to find sessions.")
return
if not self._session_db:
_cprint(" Session database not available.")
return
# Resolve title or ID
from hermes_cli.main import _resolve_session_by_name_or_id
resolved = _resolve_session_by_name_or_id(target)
target_id = resolved or target
session_meta = self._session_db.get_session(target_id)
if not session_meta:
_cprint(f" Session not found: {target}")
_cprint(" Use /history or `hermes sessions list` to see available sessions.")
return
if target_id == self.session_id:
_cprint(" Already on that session.")
return
# End current session
try:
self._session_db.end_session(self.session_id, "resumed_other")
except Exception:
pass
# Switch to the target session
self.session_id = target_id
self._resumed = True
self._pending_title = None
# Load conversation history
restored = self._session_db.get_messages_as_conversation(target_id)
self.conversation_history = restored or []
# Re-open the target session so it's not marked as ended
try:
self._session_db.reopen_session(target_id)
except Exception:
pass
# Sync the agent if already initialised
if self.agent:
self.agent.session_id = target_id
self.agent.reset_session_state()
if hasattr(self.agent, "_last_flushed_db_idx"):
self.agent._last_flushed_db_idx = len(self.conversation_history)
if hasattr(self.agent, "_todo_store"):
try:
from tools.todo_tool import TodoStore
self.agent._todo_store = TodoStore()
except Exception:
pass
if hasattr(self.agent, "_invalidate_system_prompt"):
self.agent._invalidate_system_prompt()
title_part = f" \"{session_meta['title']}\"" if session_meta.get("title") else ""
msg_count = len([m for m in self.conversation_history if m.get("role") == "user"])
if self.conversation_history:
_cprint(
f" ↻ Resumed session {target_id}{title_part}"
f" ({msg_count} user message{'s' if msg_count != 1 else ''},"
f" {len(self.conversation_history)} total)"
)
else:
_cprint(f" ↻ Resumed session {target_id}{title_part} — no messages, starting fresh.")
def reset_conversation(self):
"""Reset the conversation by starting a new session."""
self.new_session()
@@ -3647,6 +3755,8 @@ class HermesCLI:
_cprint(" Session database not available.")
elif canonical == "new":
self.new_session()
elif canonical == "resume":
self._handle_resume_command(cmd_original)
elif canonical == "provider":
self._show_model_and_providers()
elif canonical == "prompt":
@@ -3722,17 +3832,17 @@ class HermesCLI:
elif canonical == "background":
self._handle_background_command(cmd_original)
elif canonical == "queue":
if not self._agent_running:
_cprint(" /queue only works while Hermes is busy. Just type your message normally.")
# Extract prompt after "/queue " or "/q "
parts = cmd_original.split(None, 1)
payload = parts[1].strip() if len(parts) > 1 else ""
if not payload:
_cprint(" Usage: /queue <prompt>")
else:
# Extract prompt after "/queue " or "/q "
parts = cmd_original.split(None, 1)
payload = parts[1].strip() if len(parts) > 1 else ""
if not payload:
_cprint(" Usage: /queue <prompt>")
else:
self._pending_input.put(payload)
self._pending_input.put(payload)
if self._agent_running:
_cprint(f" Queued for the next turn: {payload[:80]}{'...' if len(payload) > 80 else ''}")
else:
_cprint(f" Queued: {payload[:80]}{'...' if len(payload) > 80 else ''}")
elif canonical == "skin":
self._handle_skin_command(cmd_original)
elif canonical == "voice":
@@ -3924,6 +4034,17 @@ class HermesCLI:
provider_data_collection=self._provider_data_collection,
fallback_model=self._fallback_model,
)
# Silence raw spinner; route thinking through TUI widget when no foreground agent is active.
bg_agent._print_fn = lambda *_a, **_kw: None
def _bg_thinking(text: str) -> None:
# Concurrent bg tasks may race on _spinner_text; acceptable for best-effort UI.
if not self._agent_running:
self._spinner_text = text
if self._app:
self._app.invalidate()
bg_agent.thinking_callback = _bg_thinking
result = bg_agent.run_conversation(
user_message=prompt,
@@ -3986,6 +4107,9 @@ class HermesCLI:
_cprint(f" ❌ Background task #{task_num} failed: {e}")
finally:
self._background_tasks.pop(task_id, None)
# Clear spinner only if no foreground agent owns it
if not self._agent_running:
self._spinner_text = ""
if self._app:
self._invalidate(min_interval=0)
@@ -4396,7 +4520,7 @@ class HermesCLI:
compressor = agent.context_compressor
last_prompt = compressor.last_prompt_tokens
ctx_len = compressor.context_length
pct = (last_prompt / ctx_len * 100) if ctx_len else 0
pct = min(100, (last_prompt / ctx_len * 100)) if ctx_len else 0
compressions = compressor.compression_count
msg_count = len(self.conversation_history)
@@ -5424,6 +5548,13 @@ class HermesCLI:
except Exception as e:
logging.debug("@ context reference expansion failed: %s", e)
# Sanitize surrogate characters that can arrive via clipboard paste from
# rich-text editors (Google Docs, Word, etc.). Lone surrogates are invalid
# UTF-8 and crash JSON serialization in the OpenAI SDK.
if isinstance(message, str):
from run_agent import _sanitize_surrogates
message = _sanitize_surrogates(message)
# Add user message to history
self.conversation_history.append({"role": "user", "content": message})
@@ -5436,6 +5567,10 @@ class HermesCLI:
# Reset streaming display state for this turn
self._reset_stream_state()
# Separate from _reset_stream_state because this must persist
# across intermediate turn boundaries (tool-calling loops) — only
# reset at the start of each user turn.
self._reasoning_shown_this_turn = False
# --- Streaming TTS setup ---
# When ElevenLabs is the TTS provider and sounddevice is available,
@@ -5580,6 +5715,16 @@ class HermesCLI:
agent_thread.join() # Ensure agent thread completes
# Proactively clean up async clients whose event loop is dead.
# The agent thread may have created AsyncOpenAI clients bound
# to a per-thread event loop; if that loop is now closed, those
# clients' __del__ would crash prompt_toolkit's loop on GC.
try:
from agent.auxiliary_client import cleanup_stale_async_clients
cleanup_stale_async_clients()
except Exception:
pass
# Flush any remaining streamed text and close the box
self._flush_stream()
@@ -5640,8 +5785,13 @@ class HermesCLI:
response_previewed = result.get("response_previewed", False) if result else False
# Display reasoning (thinking) box if enabled and available.
# Skip when streaming already showed reasoning live.
if self.show_reasoning and result and not self._reasoning_stream_started:
# Skip when streaming already showed reasoning live. Use the
# turn-persistent flag (_reasoning_shown_this_turn) instead of
# _reasoning_stream_started — the latter gets reset during
# intermediate turn boundaries (tool-calling loops), which caused
# the reasoning box to re-render after the final response.
_reasoning_already_shown = getattr(self, '_reasoning_shown_this_turn', False)
if self.show_reasoning and result and not _reasoning_already_shown:
reasoning = result.get("last_reasoning")
if reasoning:
w = shutil.get_terminal_size().columns
@@ -5762,10 +5912,22 @@ class HermesCLI:
else:
duration_str = f"{seconds}s"
# Look up session title for resume-by-name hint
session_title = None
if self._session_db:
try:
session_title = self._session_db.get_session_title(self.session_id)
except Exception:
pass
print("Resume this session with:")
print(f" hermes --resume {self.session_id}")
if session_title:
print(f" hermes -c \"{session_title}\"")
print()
print(f"Session: {self.session_id}")
if session_title:
print(f"Title: {session_title}")
print(f"Duration: {duration_str}")
print(f"Messages: {msg_count} ({user_msgs} user, {tool_calls} tool calls)")
else:
@@ -5941,7 +6103,7 @@ class HermesCLI:
from honcho_integration.client import HonchoClientConfig
from agent.display import honcho_session_line, write_tty
hcfg = HonchoClientConfig.from_global_config()
if hcfg.enabled and hcfg.api_key and hcfg.explicitly_configured:
if hcfg.enabled and (hcfg.api_key or hcfg.base_url) and hcfg.explicitly_configured:
sname = hcfg.resolve_session_name(session_id=self.session_id)
if sname:
write_tty(honcho_session_line(hcfg.workspace_id, sname) + "\n")
@@ -6028,10 +6190,18 @@ class HermesCLI:
set_approval_callback(self._approval_callback)
set_secret_capture_callback(self._secret_capture_callback)
# Ensure tirith security scanner is available (downloads if needed)
# Ensure tirith security scanner is available (downloads if needed).
# Warn the user if tirith is enabled in config but not available,
# so they know command security scanning is degraded.
try:
from tools.tirith_security import ensure_installed
ensure_installed(log_failures=False)
tirith_path = ensure_installed(log_failures=False)
if tirith_path is None:
security_cfg = self.config.get("security", {}) or {}
tirith_enabled = security_cfg.get("tirith_enabled", True)
if tirith_enabled:
_cprint(f" {_DIM}⚠ tirith security scanner enabled but not available "
f"— command scanning will use pattern matching only{_RST}")
except Exception:
pass # Non-fatal — fail-open at scan time if unavailable
@@ -6112,16 +6282,22 @@ class HermesCLI:
# Bundle text + images as a tuple when images are present
payload = (text, images) if images else text
if self._agent_running and not (text and text.startswith("/")):
self._interrupt_queue.put(payload)
# Debug: log to file when message enters interrupt queue
try:
_dbg = _hermes_home / "interrupt_debug.log"
with open(_dbg, "a") as _f:
import time as _t
_f.write(f"{_t.strftime('%H:%M:%S')} ENTER: queued interrupt msg={str(payload)[:60]!r}, "
f"agent_running={self._agent_running}\n")
except Exception:
pass
if self.busy_input_mode == "queue":
# Queue for the next turn instead of interrupting
self._pending_input.put(payload)
preview = text if text else f"[{len(images)} image{'s' if len(images) != 1 else ''} attached]"
_cprint(f" Queued for the next turn: {preview[:80]}{'...' if len(preview) > 80 else ''}")
else:
self._interrupt_queue.put(payload)
# Debug: log to file when message enters interrupt queue
try:
_dbg = _hermes_home / "interrupt_debug.log"
with open(_dbg, "a") as _f:
import time as _t
_f.write(f"{_t.strftime('%H:%M:%S')} ENTER: queued interrupt msg={str(payload)[:60]!r}, "
f"agent_running={self._agent_running}\n")
except Exception:
pass
else:
self._pending_input.put(payload)
event.app.current_buffer.reset(append_to_history=True)
@@ -6501,6 +6677,7 @@ class HermesCLI:
# Paste collapsing: detect large pastes and save to temp file
_paste_counter = [0]
_prev_text_len = [0]
_prev_newline_count = [0]
_paste_just_collapsed = [False]
def _on_text_changed(buf):
@@ -6509,18 +6686,27 @@ class HermesCLI:
When bracketed paste is available, handle_paste collapses
large pastes directly. This handler is a fallback for
terminals without bracketed paste support.
Two heuristics (either triggers collapse):
1. Many characters added at once (chars_added > 1) works
when the terminal delivers the paste in one event-loop tick.
2. Newline count jumped by 4+ in a single text-change event
catches terminals that feed characters individually but
still batch newlines. Alt+Enter only adds 1 newline per
event so it never triggers this.
"""
text = buf.text
chars_added = len(text) - _prev_text_len[0]
_prev_text_len[0] = len(text)
if _paste_just_collapsed[0]:
_paste_just_collapsed[0] = False
_prev_newline_count[0] = text.count('\n')
return
line_count = text.count('\n')
# Heuristic: a real paste adds many characters at once (not just a
# single newline from Alt+Enter) AND the result has 5+ lines.
# Fallback for terminals without bracketed paste support.
if line_count >= 5 and chars_added > 1 and not text.startswith('/'):
newlines_added = line_count - _prev_newline_count[0]
_prev_newline_count[0] = line_count
is_paste = chars_added > 1 or newlines_added >= 4
if line_count >= 5 and is_paste and not text.startswith('/'):
_paste_counter[0] += 1
# Save to temp file
paste_dir = _hermes_home / "pastes"
@@ -6528,6 +6714,7 @@ class HermesCLI:
paste_file = paste_dir / f"paste_{_paste_counter[0]}_{datetime.now().strftime('%H%M%S')}.txt"
paste_file.write_text(text, encoding="utf-8")
# Replace buffer with compact reference
_paste_just_collapsed[0] = True
buf.text = f"[Pasted text #{_paste_counter[0]}: {line_count + 1} lines \u2192 {paste_file}]"
buf.cursor_position = len(buf.text)
@@ -6894,6 +7081,15 @@ class HermesCLI:
Window(
content=FormattedTextControl(lambda: cli_ref._get_status_bar_fragments()),
height=1,
# Prevent fragments that overflow the terminal width from
# wrapping onto a second line, which causes the status bar to
# appear duplicated (one full + one partial row) during long
# sessions, especially on SSH where shutil.get_terminal_size
# may return stale values. _get_status_bar_fragments now reads
# width from prompt_toolkit's own output object, so fragments
# will always fit; wrap_lines=False is the belt-and-suspenders
# guard against any future width mismatch.
wrap_lines=False,
),
filter=Condition(lambda: cli_ref._status_bar_visible),
)
@@ -7128,9 +7324,28 @@ class HermesCLI:
# Register atexit cleanup so resources are freed even on unexpected exit
atexit.register(_run_cleanup)
# Install a custom asyncio exception handler that suppresses the
# "Event loop is closed" RuntimeError from httpx transport cleanup.
# This is defense-in-depth — the primary fix is neuter_async_httpx_del
# which disables __del__ entirely, but older clients or SDK upgrades
# could bypass it.
def _suppress_closed_loop_errors(loop, context):
exc = context.get("exception")
if isinstance(exc, RuntimeError) and "Event loop is closed" in str(exc):
return # silently suppress
# Fall back to default handler for everything else
loop.default_exception_handler(context)
# Run the application with patch_stdout for proper output handling
try:
with patch_stdout():
# Set the custom handler on prompt_toolkit's event loop
try:
import asyncio as _aio
_loop = _aio.get_event_loop()
_loop.set_exception_handler(_suppress_closed_loop_errors)
except Exception:
pass
app.run()
except (EOFError, KeyboardInterrupt):
pass
+42 -1
View File
@@ -327,7 +327,20 @@ def load_jobs() -> List[Dict[str, Any]]:
with open(JOBS_FILE, 'r', encoding='utf-8') as f:
data = json.load(f)
return data.get("jobs", [])
except (json.JSONDecodeError, IOError):
except json.JSONDecodeError:
# Retry with strict=False to handle bare control chars in string values
try:
with open(JOBS_FILE, 'r', encoding='utf-8') as f:
data = json.loads(f.read(), strict=False)
jobs = data.get("jobs", [])
if jobs:
# Auto-repair: rewrite with proper escaping
save_jobs(jobs)
logger.warning("Auto-repaired jobs.json (had invalid control characters)")
return jobs
except Exception:
return []
except IOError:
return []
@@ -598,6 +611,34 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None):
save_jobs(jobs)
def advance_next_run(job_id: str) -> bool:
"""Preemptively advance next_run_at for a recurring job before execution.
Call this BEFORE run_job() so that if the process crashes mid-execution,
the job won't re-fire on the next gateway restart. This converts the
scheduler from at-least-once to at-most-once for recurring jobs — missing
one run is far better than firing dozens of times in a crash loop.
One-shot jobs are left unchanged so they can still retry on restart.
Returns True if next_run_at was advanced, False otherwise.
"""
jobs = load_jobs()
for job in jobs:
if job["id"] == job_id:
kind = job.get("schedule", {}).get("kind")
if kind not in ("cron", "interval"):
return False
now = _hermes_now().isoformat()
new_next = compute_next_run(job["schedule"], now)
if new_next and new_next != job.get("next_run_at"):
job["next_run_at"] = new_next
save_jobs(jobs)
return True
return False
return False
def get_due_jobs() -> List[Dict[str, Any]]:
"""Get all jobs that are due to run now.
+7 -1
View File
@@ -35,7 +35,7 @@ logger = logging.getLogger(__name__)
# Add parent directory to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent))
from cron.jobs import get_due_jobs, mark_job_run, save_job_output
from cron.jobs import get_due_jobs, mark_job_run, save_job_output, advance_next_run
# Sentinel: when a cron agent has nothing new to report, it can start its
# response with this marker to suppress delivery. Output is still saved
@@ -524,6 +524,12 @@ def tick(verbose: bool = True) -> int:
executed = 0
for job in due_jobs:
try:
# For recurring jobs (cron/interval), advance next_run_at to the
# next future occurrence BEFORE execution. This way, if the
# process crashes mid-run, the job won't re-fire on restart.
# One-shot jobs are left alone so they can retry on restart.
advance_next_run(job["id"])
success, output, final_response, error = run_job(job)
output_file = save_job_output(job["id"], output)
+3 -13
View File
@@ -101,21 +101,11 @@ Available methods:
### Patches (`patches.py`)
**Problem**: Some hermes-agent tools use `asyncio.run()` internally (e.g., the Modal backend via SWE-ReX). This crashes when called from inside Atropos's event loop because `asyncio.run()` cannot be nested.
**Problem**: Some hermes-agent tools use `asyncio.run()` internally (e.g., the Modal backend). This crashes when called from inside Atropos's event loop because `asyncio.run()` cannot be nested.
**Solution**: `patches.py` monkey-patches `SwerexModalEnvironment` to use a dedicated background thread (`_AsyncWorker`) with its own event loop. The calling code sees the same sync interface, but internally the async work happens on a separate thread that doesn't conflict with Atropos's loop.
**Solution**: `ModalEnvironment` uses a dedicated `_AsyncWorker` background thread with its own event loop. The calling code sees a sync interface, but internally all async Modal SDK calls happen on the worker thread so they don't conflict with Atropos's loop. This is built directly into `tools/environments/modal.py` — no monkey-patching required.
What gets patched:
- `SwerexModalEnvironment.__init__` -- creates Modal deployment on a background thread
- `SwerexModalEnvironment.execute` -- runs commands on the same background thread
- `SwerexModalEnvironment.stop` -- stops deployment on the background thread
The patches are:
- **Idempotent** -- calling `apply_patches()` multiple times is safe
- **Transparent** -- same interface and behavior, only the internal async execution changes
- **Universal** -- works identically in normal CLI use (no running event loop)
Applied automatically at import time by `hermes_base_env.py`.
`patches.py` is now a no-op (kept for backward compatibility with imports).
### Tool Call Parsers (`tool_call_parsers/`)
+8
View File
@@ -601,6 +601,14 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
config.platforms[Platform.TELEGRAM] = PlatformConfig()
config.platforms[Platform.TELEGRAM].reply_to_mode = telegram_reply_mode
telegram_fallback_ips = os.getenv("TELEGRAM_FALLBACK_IPS", "")
if telegram_fallback_ips:
if Platform.TELEGRAM not in config.platforms:
config.platforms[Platform.TELEGRAM] = PlatformConfig()
config.platforms[Platform.TELEGRAM].extra["fallback_ips"] = [
ip.strip() for ip in telegram_fallback_ips.split(",") if ip.strip()
]
telegram_home = os.getenv("TELEGRAM_HOME_CHANNEL")
if telegram_home and Platform.TELEGRAM in config.platforms:
config.platforms[Platform.TELEGRAM].home_channel = HomeChannel(
+2 -2
View File
@@ -25,7 +25,7 @@ import time
from pathlib import Path
from typing import Optional
from hermes_cli.config import get_hermes_home
from hermes_constants import get_hermes_dir
# Unambiguous alphabet -- excludes 0/O, 1/I to prevent confusion
@@ -41,7 +41,7 @@ LOCKOUT_SECONDS = 3600 # Lockout duration after too many failures
MAX_PENDING_PER_PLATFORM = 3 # Max pending codes per platform
MAX_FAILED_ATTEMPTS = 5 # Failed approvals before lockout
PAIRING_DIR = get_hermes_home() / "pairing"
PAIRING_DIR = get_hermes_dir("platforms/pairing", "pairing")
def _secure_write(path: Path, data: str) -> None:
+136 -71
View File
@@ -166,7 +166,7 @@ class ResponseStore:
_CORS_HEADERS = {
"Access-Control-Allow-Methods": "GET, POST, DELETE, OPTIONS",
"Access-Control-Allow-Headers": "Authorization, Content-Type",
"Access-Control-Allow-Headers": "Authorization, Content-Type, Idempotency-Key",
}
@@ -223,6 +223,23 @@ if AIOHTTP_AVAILABLE:
else:
body_limit_middleware = None # type: ignore[assignment]
_SECURITY_HEADERS = {
"X-Content-Type-Options": "nosniff",
"Referrer-Policy": "no-referrer",
}
if AIOHTTP_AVAILABLE:
@web.middleware
async def security_headers_middleware(request, handler):
"""Add security headers to all responses (including errors)."""
response = await handler(request)
for k, v in _SECURITY_HEADERS.items():
response.headers.setdefault(k, v)
return response
else:
security_headers_middleware = None # type: ignore[assignment]
class _IdempotencyCache:
"""In-memory idempotency cache with TTL and basic LRU semantics."""
@@ -307,6 +324,7 @@ class APIServerAdapter(BasePlatformAdapter):
if "*" in self._cors_origins:
headers = dict(_CORS_HEADERS)
headers["Access-Control-Allow-Origin"] = "*"
headers["Access-Control-Max-Age"] = "600"
return headers
if origin not in self._cors_origins:
@@ -315,6 +333,7 @@ class APIServerAdapter(BasePlatformAdapter):
headers = dict(_CORS_HEADERS)
headers["Access-Control-Allow-Origin"] = origin
headers["Vary"] = "Origin"
headers["Access-Control-Max-Age"] = "600"
return headers
def _origin_allowed(self, origin: str) -> bool:
@@ -366,14 +385,20 @@ class APIServerAdapter(BasePlatformAdapter):
Create an AIAgent instance using the gateway's runtime config.
Uses _resolve_runtime_agent_kwargs() to pick up model, api_key,
base_url, etc. from config.yaml / env vars.
base_url, etc. from config.yaml / env vars. Toolsets are resolved
from config.yaml platform_toolsets.api_server (same as all other
gateway platforms), falling back to the hermes-api-server default.
"""
from run_agent import AIAgent
from gateway.run import _resolve_runtime_agent_kwargs, _resolve_gateway_model
from gateway.run import _resolve_runtime_agent_kwargs, _resolve_gateway_model, _load_gateway_config
from hermes_cli.tools_config import _get_platform_tools
runtime_kwargs = _resolve_runtime_agent_kwargs()
model = _resolve_gateway_model()
user_config = _load_gateway_config()
enabled_toolsets = sorted(_get_platform_tools(user_config, "api_server"))
max_iterations = int(os.getenv("HERMES_MAX_ITERATIONS", "90"))
agent = AIAgent(
@@ -383,7 +408,7 @@ class APIServerAdapter(BasePlatformAdapter):
quiet_mode=True,
verbose_logging=False,
ephemeral_system_prompt=ephemeral_system_prompt or None,
enabled_toolsets=["hermes-api-server"],
enabled_toolsets=enabled_toolsets,
session_id=session_id,
platform="api_server",
stream_delta_callback=stream_delta_callback,
@@ -489,17 +514,21 @@ class APIServerAdapter(BasePlatformAdapter):
if delta is not None:
_stream_q.put(delta)
# Start agent in background
# Start agent in background. agent_ref is a mutable container
# so the SSE writer can interrupt the agent on client disconnect.
agent_ref = [None]
agent_task = asyncio.ensure_future(self._run_agent(
user_message=user_message,
conversation_history=history,
ephemeral_system_prompt=system_prompt,
session_id=session_id,
stream_delta_callback=_on_delta,
agent_ref=agent_ref,
))
return await self._write_sse_chat_completion(
request, completion_id, model_name, created, _stream_q, agent_task
request, completion_id, model_name, created, _stream_q,
agent_task, agent_ref,
)
# Non-streaming: run the agent (with optional Idempotency-Key)
@@ -562,80 +591,107 @@ class APIServerAdapter(BasePlatformAdapter):
async def _write_sse_chat_completion(
self, request: "web.Request", completion_id: str, model: str,
created: int, stream_q, agent_task,
created: int, stream_q, agent_task, agent_ref=None,
) -> "web.StreamResponse":
"""Write real streaming SSE from agent's stream_delta_callback queue."""
"""Write real streaming SSE from agent's stream_delta_callback queue.
If the client disconnects mid-stream (network drop, browser tab close),
the agent is interrupted via ``agent.interrupt()`` so it stops making
LLM API calls, and the asyncio task wrapper is cancelled.
"""
import queue as _q
response = web.StreamResponse(
status=200,
headers={"Content-Type": "text/event-stream", "Cache-Control": "no-cache"},
)
sse_headers = {"Content-Type": "text/event-stream", "Cache-Control": "no-cache"}
# CORS middleware can't inject headers into StreamResponse after
# prepare() flushes them, so resolve CORS headers up front.
origin = request.headers.get("Origin", "")
cors = self._cors_headers_for_origin(origin) if origin else None
if cors:
sse_headers.update(cors)
response = web.StreamResponse(status=200, headers=sse_headers)
await response.prepare(request)
# Role chunk
role_chunk = {
"id": completion_id, "object": "chat.completion.chunk",
"created": created, "model": model,
"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}],
}
await response.write(f"data: {json.dumps(role_chunk)}\n\n".encode())
# Stream content chunks as they arrive from the agent
loop = asyncio.get_event_loop()
while True:
try:
delta = await loop.run_in_executor(None, lambda: stream_q.get(timeout=0.5))
except _q.Empty:
if agent_task.done():
# Drain any remaining items
while True:
try:
delta = stream_q.get_nowait()
if delta is None:
break
content_chunk = {
"id": completion_id, "object": "chat.completion.chunk",
"created": created, "model": model,
"choices": [{"index": 0, "delta": {"content": delta}, "finish_reason": None}],
}
await response.write(f"data: {json.dumps(content_chunk)}\n\n".encode())
except _q.Empty:
break
break
continue
if delta is None: # End of stream sentinel
break
content_chunk = {
try:
# Role chunk
role_chunk = {
"id": completion_id, "object": "chat.completion.chunk",
"created": created, "model": model,
"choices": [{"index": 0, "delta": {"content": delta}, "finish_reason": None}],
"choices": [{"index": 0, "delta": {"role": "assistant"}, "finish_reason": None}],
}
await response.write(f"data: {json.dumps(content_chunk)}\n\n".encode())
await response.write(f"data: {json.dumps(role_chunk)}\n\n".encode())
# Get usage from completed agent
usage = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
try:
result, agent_usage = await agent_task
usage = agent_usage or usage
except Exception:
pass
# Stream content chunks as they arrive from the agent
loop = asyncio.get_event_loop()
while True:
try:
delta = await loop.run_in_executor(None, lambda: stream_q.get(timeout=0.5))
except _q.Empty:
if agent_task.done():
# Drain any remaining items
while True:
try:
delta = stream_q.get_nowait()
if delta is None:
break
content_chunk = {
"id": completion_id, "object": "chat.completion.chunk",
"created": created, "model": model,
"choices": [{"index": 0, "delta": {"content": delta}, "finish_reason": None}],
}
await response.write(f"data: {json.dumps(content_chunk)}\n\n".encode())
except _q.Empty:
break
break
continue
# Finish chunk
finish_chunk = {
"id": completion_id, "object": "chat.completion.chunk",
"created": created, "model": model,
"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
"usage": {
"prompt_tokens": usage.get("input_tokens", 0),
"completion_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
},
}
await response.write(f"data: {json.dumps(finish_chunk)}\n\n".encode())
await response.write(b"data: [DONE]\n\n")
if delta is None: # End of stream sentinel
break
content_chunk = {
"id": completion_id, "object": "chat.completion.chunk",
"created": created, "model": model,
"choices": [{"index": 0, "delta": {"content": delta}, "finish_reason": None}],
}
await response.write(f"data: {json.dumps(content_chunk)}\n\n".encode())
# Get usage from completed agent
usage = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
try:
result, agent_usage = await agent_task
usage = agent_usage or usage
except Exception:
pass
# Finish chunk
finish_chunk = {
"id": completion_id, "object": "chat.completion.chunk",
"created": created, "model": model,
"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
"usage": {
"prompt_tokens": usage.get("input_tokens", 0),
"completion_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
},
}
await response.write(f"data: {json.dumps(finish_chunk)}\n\n".encode())
await response.write(b"data: [DONE]\n\n")
except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError, OSError):
# Client disconnected mid-stream. Interrupt the agent so it
# stops making LLM API calls at the next loop iteration, then
# cancel the asyncio task wrapper.
agent = agent_ref[0] if agent_ref else None
if agent is not None:
try:
agent.interrupt("SSE client disconnected")
except Exception:
pass
if not agent_task.done():
agent_task.cancel()
try:
await agent_task
except (asyncio.CancelledError, Exception):
pass
logger.info("SSE client disconnected; interrupted agent task %s", completion_id)
return response
@@ -1138,12 +1194,18 @@ class APIServerAdapter(BasePlatformAdapter):
ephemeral_system_prompt: Optional[str] = None,
session_id: Optional[str] = None,
stream_delta_callback=None,
agent_ref: Optional[list] = None,
) -> tuple:
"""
Create an agent and run a conversation in a thread executor.
Returns ``(result_dict, usage_dict)`` where *usage_dict* contains
``input_tokens``, ``output_tokens`` and ``total_tokens``.
If *agent_ref* is a one-element list, the AIAgent instance is stored
at ``agent_ref[0]`` before ``run_conversation`` begins. This allows
callers (e.g. the SSE writer) to call ``agent.interrupt()`` from
another thread to stop in-progress LLM calls.
"""
loop = asyncio.get_event_loop()
@@ -1153,6 +1215,8 @@ class APIServerAdapter(BasePlatformAdapter):
session_id=session_id,
stream_delta_callback=stream_delta_callback,
)
if agent_ref is not None:
agent_ref[0] = agent
result = agent.run_conversation(
user_message=user_message,
conversation_history=conversation_history,
@@ -1177,10 +1241,11 @@ class APIServerAdapter(BasePlatformAdapter):
return False
try:
mws = [mw for mw in (cors_middleware, body_limit_middleware) if mw is not None]
mws = [mw for mw in (cors_middleware, body_limit_middleware, security_headers_middleware) if mw is not None]
self._app = web.Application(middlewares=mws)
self._app["api_server_adapter"] = self
self._app.router.add_get("/health", self._handle_health)
self._app.router.add_get("/v1/health", self._handle_health)
self._app.router.add_get("/v1/models", self._handle_models)
self._app.router.add_post("/v1/chat/completions", self._handle_chat_completions)
self._app.router.add_post("/v1/responses", self._handle_responses)
+177 -40
View File
@@ -8,6 +8,7 @@ and implement the required methods.
import asyncio
import logging
import os
import random
import re
import uuid
from abc import ABC, abstractmethod
@@ -26,6 +27,7 @@ sys.path.insert(0, str(_Path(__file__).resolve().parents[2]))
from gateway.config import Platform, PlatformConfig
from gateway.session import SessionSource, build_session_key
from hermes_cli.config import get_hermes_home
from hermes_constants import get_hermes_dir
GATEWAY_SECRET_CAPTURE_UNSUPPORTED_MESSAGE = (
@@ -43,8 +45,8 @@ GATEWAY_SECRET_CAPTURE_UNSUPPORTED_MESSAGE = (
# (e.g. Telegram file URLs expire after ~1 hour).
# ---------------------------------------------------------------------------
# Default location: {HERMES_HOME}/image_cache/
IMAGE_CACHE_DIR = get_hermes_home() / "image_cache"
# Default location: {HERMES_HOME}/cache/images/ (legacy: image_cache/)
IMAGE_CACHE_DIR = get_hermes_dir("cache/images", "image_cache")
def get_image_cache_dir() -> Path:
@@ -71,31 +73,51 @@ def cache_image_from_bytes(data: bytes, ext: str = ".jpg") -> str:
return str(filepath)
async def cache_image_from_url(url: str, ext: str = ".jpg") -> str:
async def cache_image_from_url(url: str, ext: str = ".jpg", retries: int = 2) -> str:
"""
Download an image from a URL and save it to the local cache.
Uses httpx for async download with a reasonable timeout.
Retries on transient failures (timeouts, 429, 5xx) with exponential
backoff so a single slow CDN response doesn't lose the media.
Args:
url: The HTTP/HTTPS URL to download from.
ext: File extension including the dot (e.g. ".jpg", ".png").
retries: Number of retry attempts on transient failures.
Returns:
Absolute path to the cached image file as a string.
"""
import asyncio
import httpx
import logging as _logging
_log = _logging.getLogger(__name__)
last_exc = None
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
response = await client.get(
url,
headers={
"User-Agent": "Mozilla/5.0 (compatible; HermesAgent/1.0)",
"Accept": "image/*,*/*;q=0.8",
},
)
response.raise_for_status()
return cache_image_from_bytes(response.content, ext)
for attempt in range(retries + 1):
try:
response = await client.get(
url,
headers={
"User-Agent": "Mozilla/5.0 (compatible; HermesAgent/1.0)",
"Accept": "image/*,*/*;q=0.8",
},
)
response.raise_for_status()
return cache_image_from_bytes(response.content, ext)
except (httpx.TimeoutException, httpx.HTTPStatusError) as exc:
last_exc = exc
if isinstance(exc, httpx.HTTPStatusError) and exc.response.status_code < 429:
raise
if attempt < retries:
wait = 1.5 * (attempt + 1)
_log.debug("Media cache retry %d/%d for %s (%.1fs): %s",
attempt + 1, retries, url[:80], wait, exc)
await asyncio.sleep(wait)
continue
raise
raise last_exc
def cleanup_image_cache(max_age_hours: int = 24) -> int:
@@ -126,7 +148,7 @@ def cleanup_image_cache(max_age_hours: int = 24) -> int:
# here so the STT tool (OpenAI Whisper) can transcribe them from local files.
# ---------------------------------------------------------------------------
AUDIO_CACHE_DIR = get_hermes_home() / "audio_cache"
AUDIO_CACHE_DIR = get_hermes_dir("cache/audio", "audio_cache")
def get_audio_cache_dir() -> Path:
@@ -153,29 +175,51 @@ def cache_audio_from_bytes(data: bytes, ext: str = ".ogg") -> str:
return str(filepath)
async def cache_audio_from_url(url: str, ext: str = ".ogg") -> str:
async def cache_audio_from_url(url: str, ext: str = ".ogg", retries: int = 2) -> str:
"""
Download an audio file from a URL and save it to the local cache.
Retries on transient failures (timeouts, 429, 5xx) with exponential
backoff so a single slow CDN response doesn't lose the media.
Args:
url: The HTTP/HTTPS URL to download from.
ext: File extension including the dot (e.g. ".ogg", ".mp3").
retries: Number of retry attempts on transient failures.
Returns:
Absolute path to the cached audio file as a string.
"""
import asyncio
import httpx
import logging as _logging
_log = _logging.getLogger(__name__)
last_exc = None
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
response = await client.get(
url,
headers={
"User-Agent": "Mozilla/5.0 (compatible; HermesAgent/1.0)",
"Accept": "audio/*,*/*;q=0.8",
},
)
response.raise_for_status()
return cache_audio_from_bytes(response.content, ext)
for attempt in range(retries + 1):
try:
response = await client.get(
url,
headers={
"User-Agent": "Mozilla/5.0 (compatible; HermesAgent/1.0)",
"Accept": "audio/*,*/*;q=0.8",
},
)
response.raise_for_status()
return cache_audio_from_bytes(response.content, ext)
except (httpx.TimeoutException, httpx.HTTPStatusError) as exc:
last_exc = exc
if isinstance(exc, httpx.HTTPStatusError) and exc.response.status_code < 429:
raise
if attempt < retries:
wait = 1.5 * (attempt + 1)
_log.debug("Audio cache retry %d/%d for %s (%.1fs): %s",
attempt + 1, retries, url[:80], wait, exc)
await asyncio.sleep(wait)
continue
raise
raise last_exc
# ---------------------------------------------------------------------------
@@ -185,7 +229,7 @@ async def cache_audio_from_url(url: str, ext: str = ".ogg") -> str:
# here so the agent can reference them by local file path.
# ---------------------------------------------------------------------------
DOCUMENT_CACHE_DIR = get_hermes_home() / "document_cache"
DOCUMENT_CACHE_DIR = get_hermes_dir("cache/documents", "document_cache")
SUPPORTED_DOCUMENT_TYPES = {
".pdf": "application/pdf",
@@ -312,7 +356,10 @@ class MessageEvent:
return None
# Split on space and get first word, strip the /
parts = self.text.split(maxsplit=1)
return parts[0][1:].lower() if parts else None
raw = parts[0][1:].lower() if parts else None
if raw and "@" in raw:
raw = raw.split("@", 1)[0]
return raw
def get_command_args(self) -> str:
"""Get the arguments after a command."""
@@ -329,6 +376,24 @@ class SendResult:
message_id: Optional[str] = None
error: Optional[str] = None
raw_response: Any = None
retryable: bool = False # True for transient errors (network, timeout) — base will retry automatically
# Error substrings that indicate a transient network failure worth retrying
_RETRYABLE_ERROR_PATTERNS = (
"connecterror",
"connectionerror",
"connectionreset",
"connectionrefused",
"timeout",
"timed out",
"network",
"broken pipe",
"remotedisconnected",
"eoferror",
"readtimeout",
"writetimeout",
)
# Type for message handlers
@@ -833,6 +898,91 @@ class BasePlatformAdapter(ABC):
except Exception:
pass
@staticmethod
def _is_retryable_error(error: Optional[str]) -> bool:
"""Return True if the error string looks like a transient network failure."""
if not error:
return False
lowered = error.lower()
return any(pat in lowered for pat in _RETRYABLE_ERROR_PATTERNS)
async def _send_with_retry(
self,
chat_id: str,
content: str,
reply_to: Optional[str] = None,
metadata: Any = None,
max_retries: int = 2,
base_delay: float = 2.0,
) -> "SendResult":
"""
Send a message with automatic retry for transient network errors.
On permanent failures (e.g. formatting / permission errors) falls back
to a plain-text version before giving up. If all attempts fail due to
network errors, sends the user a brief delivery-failure notice so they
know to retry rather than waiting indefinitely.
"""
result = await self.send(
chat_id=chat_id,
content=content,
reply_to=reply_to,
metadata=metadata,
)
if result.success:
return result
error_str = result.error or ""
is_network = result.retryable or self._is_retryable_error(error_str)
if is_network:
# Retry with exponential backoff for transient errors
for attempt in range(1, max_retries + 1):
delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 1)
logger.warning(
"[%s] Send failed (attempt %d/%d, retrying in %.1fs): %s",
self.name, attempt, max_retries, delay, error_str,
)
await asyncio.sleep(delay)
result = await self.send(
chat_id=chat_id,
content=content,
reply_to=reply_to,
metadata=metadata,
)
if result.success:
logger.info("[%s] Send succeeded on retry %d", self.name, attempt)
return result
error_str = result.error or ""
if not (result.retryable or self._is_retryable_error(error_str)):
break # error switched to non-transient — fall through to plain-text fallback
else:
# All retries exhausted (loop completed without break) — notify user
logger.error("[%s] Failed to deliver response after %d retries: %s", self.name, max_retries, error_str)
notice = (
"\u26a0\ufe0f Message delivery failed after multiple attempts. "
"Please try again \u2014 your request was processed but the response could not be sent."
)
try:
await self.send(chat_id=chat_id, content=notice, reply_to=reply_to, metadata=metadata)
except Exception as notify_err:
logger.debug("[%s] Could not send delivery-failure notice: %s", self.name, notify_err)
return result
# Non-network / post-retry formatting failure: try plain text as fallback
logger.warning("[%s] Send failed: %s — trying plain-text fallback", self.name, error_str)
fallback_result = await self.send(
chat_id=chat_id,
content=f"(Response formatting failed, plain text:)\n\n{content[:3500]}",
reply_to=reply_to,
metadata=metadata,
)
if not fallback_result.success:
logger.error("[%s] Fallback send also failed: %s", self.name, fallback_result.error)
return fallback_result
async def handle_message(self, event: MessageEvent) -> None:
"""
Process an incoming message.
@@ -982,26 +1132,13 @@ class BasePlatformAdapter(ABC):
# Send the text portion
if text_content:
logger.info("[%s] Sending response (%d chars) to %s", self.name, len(text_content), event.source.chat_id)
result = await self.send(
result = await self._send_with_retry(
chat_id=event.source.chat_id,
content=text_content,
reply_to=event.message_id,
metadata=_thread_metadata,
)
# Log send failures (don't raise - user already saw tool progress)
if not result.success:
print(f"[{self.name}] Failed to send response: {result.error}")
# Try sending without markdown as fallback
fallback_result = await self.send(
chat_id=event.source.chat_id,
content=f"(Response formatting failed, plain text:)\n\n{text_content[:3500]}",
reply_to=event.message_id,
metadata=_thread_metadata,
)
if not fallback_result.success:
print(f"[{self.name}] Fallback send also failed: {fallback_result.error}")
# Human-like pacing delay between text and media
human_delay = self._get_human_delay()
+21
View File
@@ -550,6 +550,22 @@ class DiscordAdapter(BasePlatformAdapter):
return
# "all" falls through to handle_message
# If the message @mentions other users but NOT the bot, the
# sender is talking to someone else — stay silent. Only
# applies in server channels; in DMs the user is always
# talking to the bot (mentions are just references).
# Controlled by DISCORD_IGNORE_NO_MENTION (default: true).
_ignore_no_mention = os.getenv(
"DISCORD_IGNORE_NO_MENTION", "true"
).lower() in ("true", "1", "yes")
if _ignore_no_mention and message.mentions and not isinstance(message.channel, discord.DMChannel):
_bot_mentioned = (
self._client.user is not None
and self._client.user in message.mentions
)
if not _bot_mentioned:
return # Talking to someone else, don't interrupt
await self._handle_message(message)
@self._client.event
@@ -2096,6 +2112,11 @@ class DiscordAdapter(BasePlatformAdapter):
if pending_text_injection:
event_text = f"{pending_text_injection}\n\n{event_text}" if event_text else pending_text_injection
# Defense-in-depth: prevent empty user messages from entering session
# (can happen when user sends @mention-only with no other text)
if not event_text or not event_text.strip():
event_text = "(The user sent a message with no text content)"
event = MessageEvent(
text=event_text,
message_type=msg_type,
+61 -1
View File
@@ -43,6 +43,20 @@ from gateway.platforms.base import (
from gateway.config import Platform, PlatformConfig
logger = logging.getLogger(__name__)
# Automated sender patterns — emails from these are silently ignored
_NOREPLY_PATTERNS = (
"noreply", "no-reply", "no_reply", "donotreply", "do-not-reply",
"mailer-daemon", "postmaster", "bounce", "notifications@",
"automated@", "auto-confirm", "auto-reply", "automailer",
)
# RFC headers that indicate bulk/automated mail
_AUTOMATED_HEADERS = {
"Auto-Submitted": lambda v: v.lower() != "no",
"Precedence": lambda v: v.lower() in ("bulk", "list", "junk"),
"X-Auto-Response-Suppress": lambda v: bool(v),
"List-Unsubscribe": lambda v: bool(v),
}
# Gmail-safe max length per email body
MAX_MESSAGE_LENGTH = 50_000
@@ -50,7 +64,17 @@ MAX_MESSAGE_LENGTH = 50_000
# Supported image extensions for inline detection
_IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
def _is_automated_sender(address: str, headers: dict) -> bool:
"""Return True if this email is from an automated/noreply source."""
addr = address.lower()
if any(pattern in addr for pattern in _NOREPLY_PATTERNS):
return True
for header, check in _AUTOMATED_HEADERS.items():
value = headers.get(header, "")
if value and check(value):
return True
return False
def check_email_requirements() -> bool:
"""Check if email platform dependencies are available."""
addr = os.getenv("EMAIL_ADDRESS")
@@ -213,6 +237,7 @@ class EmailAdapter(BasePlatformAdapter):
# Track message IDs we've already processed to avoid duplicates
self._seen_uids: set = set()
self._seen_uids_max: int = 2000 # cap to prevent unbounded memory growth
self._poll_task: Optional[asyncio.Task] = None
# Map chat_id (sender email) -> last subject + message-id for threading
@@ -220,6 +245,26 @@ class EmailAdapter(BasePlatformAdapter):
logger.info("[Email] Adapter initialized for %s", self._address)
def _trim_seen_uids(self) -> None:
"""Keep only the most recent UIDs to prevent unbounded memory growth.
IMAP UIDs are monotonically increasing integers. When the set grows
beyond the cap, we keep only the highest half old UIDs are safe to
drop because new messages always have higher UIDs and IMAP's UNSEEN
flag prevents re-delivery regardless.
"""
if len(self._seen_uids) <= self._seen_uids_max:
return
try:
# UIDs are bytes like b'1234' — sort numerically and keep top half
sorted_uids = sorted(self._seen_uids, key=lambda u: int(u))
keep = self._seen_uids_max // 2
self._seen_uids = set(sorted_uids[-keep:])
logger.debug("[Email] Trimmed seen UIDs to %d entries", len(self._seen_uids))
except (ValueError, TypeError):
# Fallback: just clear old entries if sort fails
self._seen_uids = set(list(self._seen_uids)[-self._seen_uids_max // 2:])
async def connect(self) -> bool:
"""Connect to the IMAP server and start polling for new messages."""
try:
@@ -232,6 +277,8 @@ class EmailAdapter(BasePlatformAdapter):
if status == "OK" and data and data[0]:
for uid in data[0].split():
self._seen_uids.add(uid)
# Keep only the most recent UIDs to prevent unbounded growth
self._trim_seen_uids()
imap.logout()
logger.info("[Email] IMAP connection test passed. %d existing messages skipped.", len(self._seen_uids))
except Exception as e:
@@ -302,6 +349,9 @@ class EmailAdapter(BasePlatformAdapter):
if uid in self._seen_uids:
continue
self._seen_uids.add(uid)
# Trim periodically to prevent unbounded memory growth
if len(self._seen_uids) > self._seen_uids_max:
self._trim_seen_uids()
status, msg_data = imap.uid("fetch", uid, "(RFC822)")
if status != "OK":
@@ -320,6 +370,11 @@ class EmailAdapter(BasePlatformAdapter):
subject = _decode_header_value(msg.get("Subject", "(no subject)"))
message_id = msg.get("Message-ID", "")
in_reply_to = msg.get("In-Reply-To", "")
# Skip automated/noreply senders before any processing
msg_headers = dict(msg.items())
if _is_automated_sender(sender_addr, msg_headers):
logger.debug("[Email] Skipping automated sender: %s", sender_addr)
continue
body = _extract_text_body(msg)
attachments = _extract_attachments(msg, skip_attachments=self._skip_attachments)
@@ -348,6 +403,11 @@ class EmailAdapter(BasePlatformAdapter):
if sender_addr == self._address.lower():
return
# Never reply to automated senders
if _is_automated_sender(sender_addr, {}):
logger.debug("[Email] Dropping automated sender at dispatch: %s", sender_addr)
return
subject = msg_data["subject"]
body = msg_data["body"].strip()
attachments = msg_data["attachments"]
+138 -22
View File
@@ -40,7 +40,9 @@ logger = logging.getLogger(__name__)
MAX_MESSAGE_LENGTH = 4000
# Store directory for E2EE keys and sync state.
_STORE_DIR = Path.home() / ".hermes" / "matrix" / "store"
# Uses get_hermes_home() so each profile gets its own Matrix store.
from hermes_constants import get_hermes_dir as _get_hermes_dir
_STORE_DIR = _get_hermes_dir("platforms/matrix/store", "matrix/store")
# Grace period: ignore messages older than this many seconds before startup.
_STARTUP_GRACE_SECONDS = 5
@@ -161,22 +163,49 @@ class MatrixAdapter(BasePlatformAdapter):
# Authenticate.
if self._access_token:
client.access_token = self._access_token
# Resolve user_id if not set.
if not self._user_id:
resp = await client.whoami()
if isinstance(resp, nio.WhoamiResponse):
self._user_id = resp.user_id
client.user_id = resp.user_id
logger.info("Matrix: authenticated as %s", self._user_id)
else:
logger.error(
"Matrix: whoami failed — check MATRIX_ACCESS_TOKEN and MATRIX_HOMESERVER"
# With access-token auth, always resolve whoami so we validate the
# token and learn the device_id. The device_id matters for E2EE:
# without it, matrix-nio can send plain messages but may fail to
# decrypt inbound encrypted events or encrypt outbound room sends.
resp = await client.whoami()
if isinstance(resp, nio.WhoamiResponse):
resolved_user_id = getattr(resp, "user_id", "") or self._user_id
resolved_device_id = getattr(resp, "device_id", "")
if resolved_user_id:
self._user_id = resolved_user_id
# restore_login() is the matrix-nio path that binds the access
# token to a specific device and loads the crypto store.
if resolved_device_id and hasattr(client, "restore_login"):
client.restore_login(
self._user_id or resolved_user_id,
resolved_device_id,
self._access_token,
)
await client.close()
return False
else:
if self._user_id:
client.user_id = self._user_id
if resolved_device_id:
client.device_id = resolved_device_id
client.access_token = self._access_token
if self._encryption:
logger.warning(
"Matrix: access-token login did not restore E2EE state; "
"encrypted rooms may fail until a device_id is available"
)
logger.info(
"Matrix: using access token for %s%s",
self._user_id or "(unknown user)",
f" (device {resolved_device_id})" if resolved_device_id else "",
)
else:
client.user_id = self._user_id
logger.info("Matrix: using access token for %s", self._user_id)
logger.error(
"Matrix: whoami failed — check MATRIX_ACCESS_TOKEN and MATRIX_HOMESERVER"
)
await client.close()
return False
elif self._password and self._user_id:
resp = await client.login(
self._password,
@@ -194,13 +223,18 @@ class MatrixAdapter(BasePlatformAdapter):
return False
# If E2EE is enabled, load the crypto store.
if self._encryption and hasattr(client, "olm"):
if self._encryption and getattr(client, "olm", None):
try:
if client.should_upload_keys:
await client.keys_upload()
logger.info("Matrix: E2EE crypto initialized")
except Exception as exc:
logger.warning("Matrix: crypto init issue: %s", exc)
elif self._encryption:
logger.warning(
"Matrix: E2EE requested but crypto store is not loaded; "
"encrypted rooms may fail"
)
# Register event callbacks.
client.add_event_callback(self._on_room_message, nio.RoomMessageText)
@@ -230,6 +264,7 @@ class MatrixAdapter(BasePlatformAdapter):
)
# Build DM room cache from m.direct account data.
await self._refresh_dm_cache()
await self._run_e2ee_maintenance()
else:
logger.warning("Matrix: initial sync returned %s", type(resp).__name__)
@@ -301,13 +336,48 @@ class MatrixAdapter(BasePlatformAdapter):
relates_to["m.in_reply_to"] = {"event_id": reply_to}
msg_content["m.relates_to"] = relates_to
resp = await self._client.room_send(
chat_id,
"m.room.message",
msg_content,
)
async def _room_send_once(*, ignore_unverified_devices: bool = False):
return await asyncio.wait_for(
self._client.room_send(
chat_id,
"m.room.message",
msg_content,
ignore_unverified_devices=ignore_unverified_devices,
),
timeout=45,
)
try:
resp = await _room_send_once(ignore_unverified_devices=False)
except Exception as exc:
retryable = isinstance(exc, asyncio.TimeoutError)
olm_unverified = getattr(nio, "OlmUnverifiedDeviceError", None)
send_retry = getattr(nio, "SendRetryError", None)
if isinstance(olm_unverified, type) and isinstance(exc, olm_unverified):
retryable = True
if isinstance(send_retry, type) and isinstance(exc, send_retry):
retryable = True
if not retryable:
logger.error("Matrix: failed to send to %s: %s", chat_id, exc)
return SendResult(success=False, error=str(exc))
logger.warning(
"Matrix: initial encrypted send to %s failed (%s); "
"retrying after E2EE maintenance with ignored unverified devices",
chat_id,
exc,
)
await self._run_e2ee_maintenance()
try:
resp = await _room_send_once(ignore_unverified_devices=True)
except Exception as retry_exc:
logger.error("Matrix: failed to send to %s after retry: %s", chat_id, retry_exc)
return SendResult(success=False, error=str(retry_exc))
if isinstance(resp, nio.RoomSendResponse):
last_event_id = resp.event_id
logger.info("Matrix: sent event %s to %s", last_event_id, chat_id)
else:
err = getattr(resp, "message", str(resp))
logger.error("Matrix: failed to send to %s: %s", chat_id, err)
@@ -551,9 +621,23 @@ class MatrixAdapter(BasePlatformAdapter):
async def _sync_loop(self) -> None:
"""Continuously sync with the homeserver."""
import nio
while not self._closing:
try:
await self._client.sync(timeout=30000)
resp = await self._client.sync(timeout=30000)
if isinstance(resp, nio.SyncError):
if self._closing:
return
logger.warning(
"Matrix: sync returned %s: %s — retrying in 5s",
type(resp).__name__,
getattr(resp, "message", resp),
)
await asyncio.sleep(5)
continue
await self._run_e2ee_maintenance()
except asyncio.CancelledError:
return
except Exception as exc:
@@ -562,6 +646,38 @@ class MatrixAdapter(BasePlatformAdapter):
logger.warning("Matrix: sync error: %s — retrying in 5s", exc)
await asyncio.sleep(5)
async def _run_e2ee_maintenance(self) -> None:
"""Run matrix-nio E2EE housekeeping between syncs.
Hermes uses a custom sync loop instead of matrix-nio's sync_forever(),
so we need to explicitly drive the key management work that sync_forever()
normally handles for encrypted rooms.
"""
client = self._client
if not client or not self._encryption or not getattr(client, "olm", None):
return
tasks = [asyncio.create_task(client.send_to_device_messages())]
if client.should_upload_keys:
tasks.append(asyncio.create_task(client.keys_upload()))
if client.should_query_keys:
tasks.append(asyncio.create_task(client.keys_query()))
if client.should_claim_keys:
users = client.get_users_for_key_claiming()
if users:
tasks.append(asyncio.create_task(client.keys_claim(users)))
for task in asyncio.as_completed(tasks):
try:
await task
except asyncio.CancelledError:
raise
except Exception as exc:
logger.warning("Matrix: E2EE maintenance task failed: %s", exc)
# ------------------------------------------------------------------
# Event callbacks
# ------------------------------------------------------------------
+52 -14
View File
@@ -407,18 +407,38 @@ class MattermostAdapter(BasePlatformAdapter):
kind: str = "file",
) -> SendResult:
"""Download a URL and upload it as a file attachment."""
import asyncio
import aiohttp
try:
async with self._session.get(url, timeout=aiohttp.ClientTimeout(total=30)) as resp:
if resp.status >= 400:
# Fall back to sending the URL as text.
return await self.send(chat_id, f"{caption or ''}\n{url}".strip(), reply_to)
file_data = await resp.read()
ct = resp.content_type or "application/octet-stream"
# Derive filename from URL.
fname = url.rsplit("/", 1)[-1].split("?")[0] or f"{kind}.png"
except Exception as exc:
logger.warning("Mattermost: failed to download %s: %s", url, exc)
last_exc = None
file_data = None
ct = "application/octet-stream"
fname = url.rsplit("/", 1)[-1].split("?")[0] or f"{kind}.png"
for attempt in range(3):
try:
async with self._session.get(url, timeout=aiohttp.ClientTimeout(total=30)) as resp:
if resp.status >= 500 or resp.status == 429:
if attempt < 2:
logger.debug("Mattermost download retry %d/2 for %s (status %d)",
attempt + 1, url[:80], resp.status)
await asyncio.sleep(1.5 * (attempt + 1))
continue
if resp.status >= 400:
return await self.send(chat_id, f"{caption or ''}\n{url}".strip(), reply_to)
file_data = await resp.read()
ct = resp.content_type or "application/octet-stream"
break
except (aiohttp.ClientError, asyncio.TimeoutError) as exc:
last_exc = exc
if attempt < 2:
await asyncio.sleep(1.5 * (attempt + 1))
continue
logger.warning("Mattermost: failed to download %s after %d attempts: %s", url, attempt + 1, exc)
return await self.send(chat_id, f"{caption or ''}\n{url}".strip(), reply_to)
if file_data is None:
logger.warning("Mattermost: download returned no data for %s", url)
return await self.send(chat_id, f"{caption or ''}\n{url}".strip(), reply_to)
file_id = await self._upload_file(chat_id, file_data, fname, ct)
@@ -583,9 +603,19 @@ class MattermostAdapter(BasePlatformAdapter):
# For DMs, user_id is sufficient. For channels, check for @mention.
message_text = post.get("message", "")
# Mention-only mode: skip channel messages that don't @mention the bot.
# DMs (type "D") are always processed.
# Mention-gating for non-DM channels.
# Config (env vars):
# MATTERMOST_REQUIRE_MENTION: Require @mention in channels (default: true)
# MATTERMOST_FREE_RESPONSE_CHANNELS: Channel IDs where bot responds without mention
if channel_type_raw != "D":
require_mention = os.getenv(
"MATTERMOST_REQUIRE_MENTION", "true"
).lower() not in ("false", "0", "no")
free_channels_raw = os.getenv("MATTERMOST_FREE_RESPONSE_CHANNELS", "")
free_channels = {ch.strip() for ch in free_channels_raw.split(",") if ch.strip()}
is_free_channel = channel_id in free_channels
mention_patterns = [
f"@{self._bot_username}",
f"@{self._bot_user_id}",
@@ -594,13 +624,21 @@ class MattermostAdapter(BasePlatformAdapter):
pattern.lower() in message_text.lower()
for pattern in mention_patterns
)
if not has_mention:
if require_mention and not is_free_channel and not has_mention:
logger.debug(
"Mattermost: skipping non-DM message without @mention (channel=%s)",
channel_id,
)
return
# Strip @mention from the message text so the agent sees clean input.
if has_mention:
for pattern in mention_patterns:
message_text = re.sub(
re.escape(pattern), "", message_text, flags=re.IGNORECASE
).strip()
# Resolve sender info.
sender_id = post.get("user_id", "")
sender_name = data.get("sender_name", "").lstrip("@") or sender_id
+6
View File
@@ -279,6 +279,12 @@ class SignalAdapter(BasePlatformAdapter):
line = line.strip()
if not line:
continue
# SSE keepalive comments (":") prove the connection
# is alive — update activity so the health monitor
# doesn't report false idle warnings.
if line.startswith(":"):
self._last_sse_activity = time.time()
continue
# Parse SSE data lines
if line.startswith("data:"):
data_str = line[5:].strip()
+52 -20
View File
@@ -819,33 +819,65 @@ class SlackAdapter(BasePlatformAdapter):
await self.handle_message(event)
async def _download_slack_file(self, url: str, ext: str, audio: bool = False) -> str:
"""Download a Slack file using the bot token for auth."""
"""Download a Slack file using the bot token for auth, with retry."""
import asyncio
import httpx
bot_token = self.config.token
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
response = await client.get(
url,
headers={"Authorization": f"Bearer {bot_token}"},
)
response.raise_for_status()
last_exc = None
if audio:
from gateway.platforms.base import cache_audio_from_bytes
return cache_audio_from_bytes(response.content, ext)
else:
from gateway.platforms.base import cache_image_from_bytes
return cache_image_from_bytes(response.content, ext)
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
for attempt in range(3):
try:
response = await client.get(
url,
headers={"Authorization": f"Bearer {bot_token}"},
)
response.raise_for_status()
if audio:
from gateway.platforms.base import cache_audio_from_bytes
return cache_audio_from_bytes(response.content, ext)
else:
from gateway.platforms.base import cache_image_from_bytes
return cache_image_from_bytes(response.content, ext)
except (httpx.TimeoutException, httpx.HTTPStatusError) as exc:
last_exc = exc
if isinstance(exc, httpx.HTTPStatusError) and exc.response.status_code < 429:
raise
if attempt < 2:
logger.debug("Slack file download retry %d/2 for %s: %s",
attempt + 1, url[:80], exc)
await asyncio.sleep(1.5 * (attempt + 1))
continue
raise
raise last_exc
async def _download_slack_file_bytes(self, url: str) -> bytes:
"""Download a Slack file and return raw bytes."""
"""Download a Slack file and return raw bytes, with retry."""
import asyncio
import httpx
bot_token = self.config.token
last_exc = None
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
response = await client.get(
url,
headers={"Authorization": f"Bearer {bot_token}"},
)
response.raise_for_status()
return response.content
for attempt in range(3):
try:
response = await client.get(
url,
headers={"Authorization": f"Bearer {bot_token}"},
)
response.raise_for_status()
return response.content
except (httpx.TimeoutException, httpx.HTTPStatusError) as exc:
last_exc = exc
if isinstance(exc, httpx.HTTPStatusError) and exc.response.status_code < 429:
raise
if attempt < 2:
logger.debug("Slack file download retry %d/2 for %s: %s",
attempt + 1, url[:80], exc)
await asyncio.sleep(1.5 * (attempt + 1))
continue
raise
raise last_exc
+65 -6
View File
@@ -11,7 +11,7 @@ import asyncio
import logging
import os
import re
from typing import Dict, Optional, Any
from typing import Dict, List, Optional, Any
logger = logging.getLogger(__name__)
@@ -25,6 +25,7 @@ try:
filters,
)
from telegram.constants import ParseMode, ChatType
from telegram.request import HTTPXRequest
TELEGRAM_AVAILABLE = True
except ImportError:
TELEGRAM_AVAILABLE = False
@@ -34,6 +35,7 @@ except ImportError:
Application = Any
CommandHandler = Any
TelegramMessageHandler = Any
HTTPXRequest = Any
filters = None
ParseMode = None
ChatType = None
@@ -59,6 +61,11 @@ from gateway.platforms.base import (
cache_document_from_bytes,
SUPPORTED_DOCUMENT_TYPES,
)
from gateway.platforms.telegram_network import (
TelegramFallbackTransport,
discover_fallback_ips,
parse_fallback_ip_env,
)
def check_telegram_requirements() -> bool:
@@ -138,6 +145,13 @@ class TelegramAdapter(BasePlatformAdapter):
# DM Topics config from extra.dm_topics
self._dm_topics_config: List[Dict[str, Any]] = self.config.extra.get("dm_topics", [])
def _fallback_ips(self) -> list[str]:
"""Return validated fallback IPs from config (populated by _apply_env_overrides)."""
configured = self.config.extra.get("fallback_ips", []) if getattr(self.config, "extra", None) else []
if isinstance(configured, str):
configured = configured.split(",")
return parse_fallback_ip_env(",".join(str(v) for v in configured) if configured else None)
@staticmethod
def _looks_like_polling_conflict(error: Exception) -> bool:
text = str(error).lower()
@@ -331,7 +345,8 @@ class TelegramAdapter(BasePlatformAdapter):
def _persist_dm_topic_thread_id(self, chat_id: int, topic_name: str, thread_id: int) -> None:
"""Save a newly created thread_id back into config.yaml so it persists across restarts."""
try:
config_path = _Path.home() / ".hermes" / "config.yaml"
from hermes_constants import get_hermes_home
config_path = get_hermes_home() / "config.yaml"
if not config_path.exists():
logger.warning("[%s] Config file not found at %s, cannot persist thread_id", self.name, config_path)
return
@@ -474,7 +489,26 @@ class TelegramAdapter(BasePlatformAdapter):
return False
# Build the application
self._app = Application.builder().token(self.config.token).build()
builder = Application.builder().token(self.config.token)
fallback_ips = self._fallback_ips()
if not fallback_ips:
fallback_ips = await discover_fallback_ips()
logger.info(
"[%s] Auto-discovered Telegram fallback IPs: %s",
self.name,
", ".join(fallback_ips),
)
if fallback_ips:
logger.warning(
"[%s] Telegram fallback IPs active: %s",
self.name,
", ".join(fallback_ips),
)
transport = TelegramFallbackTransport(fallback_ips)
request = HTTPXRequest(httpx_kwargs={"transport": transport})
get_updates_request = HTTPXRequest(httpx_kwargs={"transport": transport})
builder = builder.request(request).get_updates_request(get_updates_request)
self._app = builder.build()
self._bot = self._app.bot
# Register handlers
@@ -674,9 +708,15 @@ class TelegramAdapter(BasePlatformAdapter):
except ImportError:
_NetErr = OSError # type: ignore[misc,assignment]
try:
from telegram.error import BadRequest as _BadReq
except ImportError:
_BadReq = None # type: ignore[assignment,misc]
for i, chunk in enumerate(chunks):
should_thread = self._should_thread_reply(reply_to, i)
reply_to_id = int(reply_to) if should_thread else None
effective_thread_id = int(thread_id) if thread_id else None
msg = None
for _send_attempt in range(3):
@@ -688,7 +728,7 @@ class TelegramAdapter(BasePlatformAdapter):
text=chunk,
parse_mode=ParseMode.MARKDOWN_V2,
reply_to_message_id=reply_to_id,
message_thread_id=int(thread_id) if thread_id else None,
message_thread_id=effective_thread_id,
)
except Exception as md_error:
# Markdown parsing failed, try plain text
@@ -700,12 +740,30 @@ class TelegramAdapter(BasePlatformAdapter):
text=plain_chunk,
parse_mode=None,
reply_to_message_id=reply_to_id,
message_thread_id=int(thread_id) if thread_id else None,
message_thread_id=effective_thread_id,
)
else:
raise
break # success
except _NetErr as send_err:
# BadRequest is a subclass of NetworkError in
# python-telegram-bot but represents permanent errors
# (not transient network issues). Detect and handle
# specific cases instead of blindly retrying.
if _BadReq and isinstance(send_err, _BadReq):
err_lower = str(send_err).lower()
if "thread not found" in err_lower and effective_thread_id is not None:
# Thread doesn't exist — retry without
# message_thread_id so the message still
# reaches the chat.
logger.warning(
"[%s] Thread %s not found, retrying without message_thread_id",
self.name, effective_thread_id,
)
effective_thread_id = None
continue
# Other BadRequest errors are permanent — don't retry
raise
if _send_attempt < 2:
wait = 2 ** _send_attempt
logger.warning("[%s] Network error on send (attempt %d/3), retrying in %ds: %s",
@@ -1700,7 +1758,8 @@ class TelegramAdapter(BasePlatformAdapter):
recognized without a gateway restart.
"""
try:
config_path = _Path.home() / ".hermes" / "config.yaml"
from hermes_constants import get_hermes_home
config_path = get_hermes_home() / "config.yaml"
if not config_path.exists():
return
+245
View File
@@ -0,0 +1,245 @@
"""Telegram-specific network helpers.
Provides a hostname-preserving fallback transport for networks where
api.telegram.org resolves to an endpoint that is unreachable from the current
host. The transport keeps the logical request host and TLS SNI as
api.telegram.org while retrying the TCP connection against one or more fallback
IPv4 addresses.
"""
from __future__ import annotations
import asyncio
import ipaddress
import logging
import os
import socket
from typing import Iterable, Optional
import httpx
logger = logging.getLogger(__name__)
_TELEGRAM_API_HOST = "api.telegram.org"
# DNS-over-HTTPS providers used to discover Telegram API IPs that may differ
# from the (potentially unreachable) IP returned by the local system resolver.
_DOH_TIMEOUT = 4.0 # seconds — bounded so connect() isn't noticeably delayed
_DOH_PROVIDERS: list[dict] = [
{
"url": "https://dns.google/resolve",
"params": {"name": _TELEGRAM_API_HOST, "type": "A"},
"headers": {},
},
{
"url": "https://cloudflare-dns.com/dns-query",
"params": {"name": _TELEGRAM_API_HOST, "type": "A"},
"headers": {"Accept": "application/dns-json"},
},
]
# Last-resort IPs when DoH is also blocked. These are stable Telegram Bot API
# endpoints in the 149.154.160.0/20 block (same seed used by OpenClaw).
_SEED_FALLBACK_IPS: list[str] = ["149.154.167.220"]
def _resolve_proxy_url() -> str | None:
for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY", "https_proxy", "http_proxy", "all_proxy"):
value = (os.environ.get(key) or "").strip()
if value:
return value
return None
class TelegramFallbackTransport(httpx.AsyncBaseTransport):
"""Retry Telegram Bot API requests via fallback IPs while preserving TLS/SNI.
Requests continue to target https://api.telegram.org/... logically, but on
connect failures the underlying TCP connection is retried against a known
reachable IP. This is effectively the programmatic equivalent of
``curl --resolve api.telegram.org:443:<ip>``.
"""
def __init__(self, fallback_ips: Iterable[str], **transport_kwargs):
self._fallback_ips = [ip for ip in dict.fromkeys(_normalize_fallback_ips(fallback_ips))]
proxy_url = _resolve_proxy_url()
if proxy_url and "proxy" not in transport_kwargs:
transport_kwargs["proxy"] = proxy_url
self._primary = httpx.AsyncHTTPTransport(**transport_kwargs)
self._fallbacks = {
ip: httpx.AsyncHTTPTransport(**transport_kwargs) for ip in self._fallback_ips
}
self._sticky_ip: Optional[str] = None
self._sticky_lock = asyncio.Lock()
async def handle_async_request(self, request: httpx.Request) -> httpx.Response:
if request.url.host != _TELEGRAM_API_HOST or not self._fallback_ips:
return await self._primary.handle_async_request(request)
sticky_ip = self._sticky_ip
attempt_order: list[Optional[str]] = [sticky_ip] if sticky_ip else [None]
for ip in self._fallback_ips:
if ip != sticky_ip:
attempt_order.append(ip)
last_error: Exception | None = None
for ip in attempt_order:
candidate = request if ip is None else _rewrite_request_for_ip(request, ip)
transport = self._primary if ip is None else self._fallbacks[ip]
try:
response = await transport.handle_async_request(candidate)
if ip is not None and self._sticky_ip != ip:
async with self._sticky_lock:
if self._sticky_ip != ip:
self._sticky_ip = ip
logger.warning(
"[Telegram] Primary api.telegram.org path unreachable; using sticky fallback IP %s",
ip,
)
return response
except Exception as exc:
last_error = exc
if not _is_retryable_connect_error(exc):
raise
if ip is None:
logger.warning(
"[Telegram] Primary api.telegram.org connection failed (%s); trying fallback IPs %s",
exc,
", ".join(self._fallback_ips),
)
continue
logger.warning("[Telegram] Fallback IP %s failed: %s", ip, exc)
continue
assert last_error is not None
raise last_error
async def aclose(self) -> None:
await self._primary.aclose()
for transport in self._fallbacks.values():
await transport.aclose()
def _normalize_fallback_ips(values: Iterable[str]) -> list[str]:
normalized: list[str] = []
for value in values:
raw = str(value).strip()
if not raw:
continue
try:
addr = ipaddress.ip_address(raw)
except ValueError:
logger.warning("Ignoring invalid Telegram fallback IP: %r", raw)
continue
if addr.version != 4:
logger.warning("Ignoring non-IPv4 Telegram fallback IP: %s", raw)
continue
normalized.append(str(addr))
return normalized
def parse_fallback_ip_env(value: str | None) -> list[str]:
if not value:
return []
parts = [part.strip() for part in value.split(",")]
return _normalize_fallback_ips(parts)
def _resolve_system_dns() -> set[str]:
"""Return the IPv4 addresses that the OS resolver gives for api.telegram.org."""
try:
results = socket.getaddrinfo(_TELEGRAM_API_HOST, 443, socket.AF_INET)
return {addr[4][0] for addr in results}
except Exception:
return set()
async def _query_doh_provider(
client: httpx.AsyncClient, provider: dict
) -> list[str]:
"""Query one DoH provider and return A-record IPs."""
try:
resp = await client.get(
provider["url"], params=provider["params"], headers=provider["headers"]
)
resp.raise_for_status()
data = resp.json()
ips: list[str] = []
for answer in data.get("Answer", []):
if answer.get("type") != 1: # A record
continue
raw = answer.get("data", "").strip()
try:
ipaddress.ip_address(raw)
ips.append(raw)
except ValueError:
continue
return ips
except Exception as exc:
logger.debug("DoH query to %s failed: %s", provider["url"], exc)
return []
async def discover_fallback_ips() -> list[str]:
"""Auto-discover Telegram API IPs via DNS-over-HTTPS.
Resolves api.telegram.org through Google and Cloudflare DoH, collects all
unique IPs, and excludes the system-DNS-resolved IP (which is presumably
unreachable on this network). Falls back to a hardcoded seed list when DoH
is also unavailable.
"""
async with httpx.AsyncClient(timeout=httpx.Timeout(_DOH_TIMEOUT)) as client:
doh_tasks = [_query_doh_provider(client, p) for p in _DOH_PROVIDERS]
system_dns_task = asyncio.to_thread(_resolve_system_dns)
results = await asyncio.gather(system_dns_task, *doh_tasks, return_exceptions=True)
# results[0] = system DNS IPs (set), results[1:] = DoH IP lists
system_ips: set[str] = results[0] if isinstance(results[0], set) else set()
doh_ips: list[str] = []
for r in results[1:]:
if isinstance(r, list):
doh_ips.extend(r)
# Deduplicate preserving order, exclude system-DNS IPs
seen: set[str] = set()
candidates: list[str] = []
for ip in doh_ips:
if ip not in seen and ip not in system_ips:
seen.add(ip)
candidates.append(ip)
# Validate through existing normalization
validated = _normalize_fallback_ips(candidates)
if validated:
logger.debug("Discovered Telegram fallback IPs via DoH: %s", ", ".join(validated))
return validated
logger.info(
"DoH discovery yielded no new IPs (system DNS: %s); using seed fallback IPs %s",
", ".join(system_ips) or "unknown",
", ".join(_SEED_FALLBACK_IPS),
)
return list(_SEED_FALLBACK_IPS)
def _rewrite_request_for_ip(request: httpx.Request, ip: str) -> httpx.Request:
original_host = request.url.host or _TELEGRAM_API_HOST
url = request.url.copy_with(host=ip)
headers = request.headers.copy()
headers["host"] = original_host
extensions = dict(request.extensions)
extensions["sni_hostname"] = original_host
return httpx.Request(
method=request.method,
url=url,
headers=headers,
stream=request.stream,
extensions=extensions,
)
def _is_retryable_connect_error(exc: Exception) -> bool:
return isinstance(exc, (httpx.ConnectTimeout, httpx.ConnectError))
+47 -1
View File
@@ -27,6 +27,7 @@ import hashlib
import hmac
import json
import logging
import os
import re
import subprocess
import time
@@ -53,6 +54,7 @@ logger = logging.getLogger(__name__)
DEFAULT_HOST = "0.0.0.0"
DEFAULT_PORT = 8644
_INSECURE_NO_AUTH = "INSECURE_NO_AUTH"
_DYNAMIC_ROUTES_FILENAME = "webhook_subscriptions.json"
def check_webhook_requirements() -> bool:
@@ -68,7 +70,10 @@ class WebhookAdapter(BasePlatformAdapter):
self._host: str = config.extra.get("host", DEFAULT_HOST)
self._port: int = int(config.extra.get("port", DEFAULT_PORT))
self._global_secret: str = config.extra.get("secret", "")
self._routes: Dict[str, dict] = config.extra.get("routes", {})
self._static_routes: Dict[str, dict] = config.extra.get("routes", {})
self._dynamic_routes: Dict[str, dict] = {}
self._dynamic_routes_mtime: float = 0.0
self._routes: Dict[str, dict] = dict(self._static_routes)
self._runner = None
# Delivery info keyed by session chat_id — consumed by send()
@@ -96,6 +101,9 @@ class WebhookAdapter(BasePlatformAdapter):
# ------------------------------------------------------------------
async def connect(self) -> bool:
# Load agent-created subscriptions before validating
self._reload_dynamic_routes()
# Validate routes at startup — secret is required per route
for name, route in self._routes.items():
secret = route.get("secret", self._global_secret)
@@ -182,8 +190,46 @@ class WebhookAdapter(BasePlatformAdapter):
"""GET /health — simple health check."""
return web.json_response({"status": "ok", "platform": "webhook"})
def _reload_dynamic_routes(self) -> None:
"""Reload agent-created subscriptions from disk if the file changed."""
from pathlib import Path as _Path
hermes_home = _Path(
os.getenv("HERMES_HOME", str(_Path.home() / ".hermes"))
).expanduser()
subs_path = hermes_home / _DYNAMIC_ROUTES_FILENAME
if not subs_path.exists():
if self._dynamic_routes:
self._dynamic_routes = {}
self._routes = dict(self._static_routes)
logger.debug("[webhook] Dynamic subscriptions file removed, cleared dynamic routes")
return
try:
mtime = subs_path.stat().st_mtime
if mtime <= self._dynamic_routes_mtime:
return # No change
data = json.loads(subs_path.read_text(encoding="utf-8"))
if not isinstance(data, dict):
return
# Merge: static routes take precedence over dynamic ones
self._dynamic_routes = {
k: v for k, v in data.items()
if k not in self._static_routes
}
self._routes = {**self._dynamic_routes, **self._static_routes}
self._dynamic_routes_mtime = mtime
logger.info(
"[webhook] Reloaded %d dynamic route(s): %s",
len(self._dynamic_routes),
", ".join(self._dynamic_routes.keys()) or "(none)",
)
except Exception as e:
logger.warning("[webhook] Failed to reload dynamic routes: %s", e)
async def _handle_webhook(self, request: "web.Request") -> "web.Response":
"""POST /webhooks/{route_name} — receive and process a webhook event."""
# Hot-reload dynamic subscriptions on each request (mtime-gated, cheap)
self._reload_dynamic_routes()
route_name = request.match_info.get("route_name", "")
route_config = self._routes.get(route_name)
+5 -1
View File
@@ -26,6 +26,7 @@ from pathlib import Path
from typing import Dict, Optional, Any
from hermes_cli.config import get_hermes_home
from hermes_constants import get_hermes_dir
logger = logging.getLogger(__name__)
@@ -134,7 +135,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
)
self._session_path: Path = Path(config.extra.get(
"session_path",
get_hermes_home() / "whatsapp" / "session"
get_hermes_dir("platforms/whatsapp/session", "whatsapp/session")
))
self._reply_prefix: Optional[str] = config.extra.get("reply_prefix")
self._message_queue: asyncio.Queue = asyncio.Queue()
@@ -526,6 +527,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
image_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
**kwargs,
) -> SendResult:
"""Send a local image file natively via bridge."""
return await self._send_media_to_bridge(chat_id, image_path, "image", caption)
@@ -536,6 +538,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
video_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
**kwargs,
) -> SendResult:
"""Send a video natively via bridge — plays inline in WhatsApp."""
return await self._send_media_to_bridge(chat_id, video_path, "video", caption)
@@ -547,6 +550,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
caption: Optional[str] = None,
file_name: Optional[str] = None,
reply_to: Optional[str] = None,
**kwargs,
) -> SendResult:
"""Send a document/file as a downloadable attachment via bridge."""
return await self._send_media_to_bridge(
+191 -24
View File
@@ -288,7 +288,7 @@ def _resolve_gateway_model(config: dict | None = None) -> str:
if isinstance(model_cfg, str):
model = model_cfg
elif isinstance(model_cfg, dict):
model = model_cfg.get("default", model)
model = model_cfg.get("default") or model_cfg.get("model") or model
return model
@@ -432,7 +432,7 @@ class GatewayRunner:
from honcho_integration.session import HonchoSessionManager
hcfg = HonchoClientConfig.from_global_config()
if not hcfg.enabled or not hcfg.api_key:
if not hcfg.enabled or not (hcfg.api_key or hcfg.base_url):
return None, hcfg
client = get_honcho_client(hcfg)
@@ -573,6 +573,10 @@ class GatewayRunner:
session_id=old_session_id,
honcho_session_key=honcho_session_key,
)
# Fully silence the flush agent — quiet_mode only suppresses init
# messages; tool call output still leaks to the terminal through
# _safe_print → _print_fn. Set a no-op to prevent that.
tmp_agent._print_fn = lambda *a, **kw: None
# Build conversation history from transcript
msgs = [
@@ -741,10 +745,22 @@ class GatewayRunner:
logger.error("No connected messaging platforms remain. Shutting down gateway cleanly.")
await self.stop()
elif not self.adapters and self._failed_platforms:
logger.warning(
"No connected messaging platforms remain, but %d platform(s) queued for reconnection",
len(self._failed_platforms),
)
# All platforms are down and queued for background reconnection.
# If the error is retryable, exit with failure so systemd Restart=on-failure
# can restart the process. Otherwise stay alive and keep retrying in background.
if adapter.fatal_error_retryable:
self._exit_reason = adapter.fatal_error_message or "All messaging platforms failed with retryable errors"
self._exit_with_failure = True
logger.error(
"All messaging platforms failed with retryable errors. "
"Shutting down gateway for service restart (systemd will retry)."
)
await self.stop()
else:
logger.warning(
"No connected messaging platforms remain, but %d platform(s) queued for reconnection",
len(self._failed_platforms),
)
def _request_clean_exit(self, reason: str) -> None:
self._exit_cleanly = True
@@ -954,12 +970,20 @@ class GatewayRunner:
os.getenv(v)
for v in ("TELEGRAM_ALLOWED_USERS", "DISCORD_ALLOWED_USERS",
"WHATSAPP_ALLOWED_USERS", "SLACK_ALLOWED_USERS",
"SIGNAL_ALLOWED_USERS", "EMAIL_ALLOWED_USERS",
"SIGNAL_ALLOWED_USERS", "SIGNAL_GROUP_ALLOWED_USERS",
"EMAIL_ALLOWED_USERS",
"SMS_ALLOWED_USERS", "MATTERMOST_ALLOWED_USERS",
"MATRIX_ALLOWED_USERS", "DINGTALK_ALLOWED_USERS",
"GATEWAY_ALLOWED_USERS")
)
_allow_all = os.getenv("GATEWAY_ALLOW_ALL_USERS", "").lower() in ("true", "1", "yes")
_allow_all = os.getenv("GATEWAY_ALLOW_ALL_USERS", "").lower() in ("true", "1", "yes") or any(
os.getenv(v, "").lower() in ("true", "1", "yes")
for v in ("TELEGRAM_ALLOW_ALL_USERS", "DISCORD_ALLOW_ALL_USERS",
"WHATSAPP_ALLOW_ALL_USERS", "SLACK_ALLOW_ALL_USERS",
"SIGNAL_ALLOW_ALL_USERS", "EMAIL_ALLOW_ALL_USERS",
"SMS_ALLOW_ALL_USERS", "MATTERMOST_ALLOW_ALL_USERS",
"MATRIX_ALLOW_ALL_USERS", "DINGTALK_ALLOW_ALL_USERS")
)
if not _any_allowlist and not _allow_all:
logger.warning(
"No user allowlists configured. All unauthorized users will be denied. "
@@ -1970,6 +1994,12 @@ class GatewayRunner:
f"Use /resume to browse and restore a previous session.\n"
f"Adjust reset timing in config.yaml under session_reset."
)
try:
session_info = self._format_session_info()
if session_info:
notice = f"{notice}\n\n{session_info}"
except Exception:
pass
await adapter.send(
source.chat_id, notice,
metadata=getattr(event, 'metadata', None),
@@ -2063,7 +2093,7 @@ class GatewayRunner:
if isinstance(_model_cfg, str):
_hyg_model = _model_cfg
elif isinstance(_model_cfg, dict):
_hyg_model = _model_cfg.get("default", _hyg_model)
_hyg_model = _model_cfg.get("default") or _model_cfg.get("model") or _hyg_model
# Read explicit context_length override from model config
# (same as run_agent.py lines 995-1005)
_raw_ctx = _model_cfg.get("context_length")
@@ -2175,6 +2205,7 @@ class GatewayRunner:
enabled_toolsets=["memory"],
session_id=session_entry.session_id,
)
_hyg_agent._print_fn = lambda *a, **kw: None
loop = asyncio.get_event_loop()
_compressed, _ = await loop.run_in_executor(
@@ -2185,6 +2216,15 @@ class GatewayRunner:
),
)
# _compress_context ends the old session and creates
# a new session_id. Write compressed messages into
# the NEW session so the old transcript stays intact
# and searchable via session_search.
_hyg_new_sid = _hyg_agent.session_id
if _hyg_new_sid != session_entry.session_id:
session_entry.session_id = _hyg_new_sid
self.session_store._save()
self.session_store.rewrite_transcript(
session_entry.session_id, _compressed
)
@@ -2736,6 +2776,85 @@ class GatewayRunner:
# Clear session env
self._clear_session_env()
def _format_session_info(self) -> str:
"""Resolve current model config and return a formatted info block.
Surfaces model, provider, context length, and endpoint so gateway
users can immediately see if context detection went wrong (e.g.
local models falling to the 128K default).
"""
from agent.model_metadata import get_model_context_length, DEFAULT_FALLBACK_CONTEXT
model = _resolve_gateway_model()
config_context_length = None
provider = None
base_url = None
api_key = None
try:
cfg_path = _hermes_home / "config.yaml"
if cfg_path.exists():
import yaml as _info_yaml
with open(cfg_path, encoding="utf-8") as f:
data = _info_yaml.safe_load(f) or {}
model_cfg = data.get("model", {})
if isinstance(model_cfg, dict):
raw_ctx = model_cfg.get("context_length")
if raw_ctx is not None:
try:
config_context_length = int(raw_ctx)
except (TypeError, ValueError):
pass
provider = model_cfg.get("provider") or None
base_url = model_cfg.get("base_url") or None
except Exception:
pass
# Resolve runtime credentials for probing
try:
runtime = _resolve_runtime_agent_kwargs()
provider = provider or runtime.get("provider")
base_url = base_url or runtime.get("base_url")
api_key = runtime.get("api_key")
except Exception:
pass
context_length = get_model_context_length(
model,
base_url=base_url or "",
api_key=api_key or "",
config_context_length=config_context_length,
provider=provider or "",
)
# Format context source hint
if config_context_length is not None:
ctx_source = "config"
elif context_length == DEFAULT_FALLBACK_CONTEXT:
ctx_source = "default — set model.context_length in config to override"
else:
ctx_source = "detected"
# Format context length for display
if context_length >= 1_000_000:
ctx_display = f"{context_length / 1_000_000:.1f}M"
elif context_length >= 1_000:
ctx_display = f"{context_length // 1_000}K"
else:
ctx_display = str(context_length)
lines = [
f"◆ Model: `{model}`",
f"◆ Provider: {provider or 'openrouter'}",
f"◆ Context: {ctx_display} tokens ({ctx_source})",
]
# Show endpoint for local/custom setups
if base_url and ("localhost" in base_url or "127.0.0.1" in base_url or "0.0.0.0" in base_url):
lines.append(f"◆ Endpoint: {base_url}")
return "\n".join(lines)
async def _handle_reset_command(self, event: MessageEvent) -> str:
"""Handle /new or /reset command."""
source = event.source
@@ -2776,12 +2895,22 @@ class GatewayRunner:
"session_key": session_key,
})
# Resolve session config info to surface to the user
try:
session_info = self._format_session_info()
except Exception:
session_info = ""
if new_entry:
return "✨ Session reset! I've started fresh with no memory of our previous conversation."
header = "✨ Session reset! Starting fresh."
else:
# No existing session, just create one
self.session_store.get_or_create_session(source, force_new=True)
return "✨ New session started!"
header = "✨ New session started!"
if session_info:
return f"{header}\n\n{session_info}"
return header
async def _handle_status_command(self, event: MessageEvent) -> str:
"""Handle /status command."""
@@ -3885,17 +4014,27 @@ class GatewayRunner:
enabled_toolsets=["memory"],
session_id=session_entry.session_id,
)
tmp_agent._print_fn = lambda *a, **kw: None
loop = asyncio.get_event_loop()
compressed, _ = await loop.run_in_executor(
None,
lambda: tmp_agent._compress_context(msgs, "", approx_tokens=approx_tokens),
lambda: tmp_agent._compress_context(msgs, "", approx_tokens=approx_tokens)
)
self.session_store.rewrite_transcript(session_entry.session_id, compressed)
# _compress_context already calls end_session() on the old session
# (preserving its full transcript in SQLite) and creates a new
# session_id for the continuation. Write the compressed messages
# into the NEW session so the original history stays searchable.
new_session_id = tmp_agent.session_id
if new_session_id != session_entry.session_id:
session_entry.session_id = new_session_id
self.session_store._save()
self.session_store.rewrite_transcript(new_session_id, compressed)
# Reset stored token count — transcript changed, old value is stale
self.session_store.update_session(
session_entry.session_key, last_prompt_tokens=0,
session_entry.session_key, last_prompt_tokens=0
)
new_count = len(compressed)
new_tokens = estimate_messages_tokens_rough(compressed)
@@ -4051,7 +4190,7 @@ class GatewayRunner:
]
ctx = agent.context_compressor
if ctx.last_prompt_tokens:
pct = ctx.last_prompt_tokens / ctx.context_length * 100 if ctx.context_length else 0
pct = min(100, ctx.last_prompt_tokens / ctx.context_length * 100) if ctx.context_length else 0
lines.append(f"Context: {ctx.last_prompt_tokens:,} / {ctx.context_length:,} ({pct:.0f}%)")
if ctx.compression_count:
lines.append(f"Compressions: {ctx.compression_count}")
@@ -4799,9 +4938,14 @@ class GatewayRunner:
enabled_toolsets = sorted(_get_platform_tools(user_config, platform_key))
# Tool progress mode from config.yaml: "all", "new", "verbose", "off"
# Falls back to env vars for backward compatibility
# Falls back to env vars for backward compatibility.
# YAML 1.1 parses bare `off` as boolean False — normalise before
# the `or` chain so it doesn't silently fall through to "all".
_raw_tp = user_config.get("display", {}).get("tool_progress")
if _raw_tp is False:
_raw_tp = "off"
progress_mode = (
user_config.get("display", {}).get("tool_progress")
_raw_tp
or os.getenv("HERMES_TOOL_PROGRESS_MODE")
or "all"
)
@@ -4860,12 +5004,17 @@ class GatewayRunner:
progress_queue.put(msg)
# Background task to send progress messages
# Accumulates tool lines into a single message that gets edited
# For DM top-level Slack messages, source.thread_id is None but the
# final reply will be threaded under the original message via reply_to.
# Use event_message_id as fallback so progress messages land in the
# same thread as the final response instead of going to the DM root.
_progress_thread_id = source.thread_id or event_message_id
# Accumulates tool lines into a single message that gets edited.
#
# Threading metadata is platform-specific:
# - Slack DM threading needs event_message_id fallback (reply thread)
# - Telegram uses message_thread_id only for forum topics; passing a
# normal DM/group message id as thread_id causes send failures
# - Other platforms should use explicit source.thread_id only
if source.platform == Platform.SLACK:
_progress_thread_id = source.thread_id or event_message_id
else:
_progress_thread_id = source.thread_id
_progress_metadata = {"thread_id": _progress_thread_id} if _progress_thread_id else None
async def send_progress_messages():
@@ -5128,7 +5277,25 @@ class GatewayRunner:
agent.stream_delta_callback = _stream_delta_cb
agent.status_callback = _status_callback_sync
agent.reasoning_config = reasoning_config
# Background review delivery — send "💾 Memory updated" etc. to user
def _bg_review_send(message: str) -> None:
if not _status_adapter:
return
try:
asyncio.run_coroutine_threadsafe(
_status_adapter.send(
_status_chat_id,
message,
metadata=_status_thread_metadata,
),
_loop_for_step,
)
except Exception as _e:
logger.debug("background_review_callback error: %s", _e)
agent.background_review_callback = _bg_review_send
# Store agent reference for interrupt support
agent_holder[0] = agent
# Capture the full tool definitions for transcript logging
+14 -7
View File
@@ -762,14 +762,16 @@ class SessionStore:
if session_key in self._entries:
entry = self._entries[session_key]
entry.updated_at = _now()
entry.input_tokens += input_tokens
entry.output_tokens += output_tokens
entry.cache_read_tokens += cache_read_tokens
entry.cache_write_tokens += cache_write_tokens
# Direct assignment — the gateway receives cumulative totals
# from the cached agent, not per-call deltas.
entry.input_tokens = input_tokens
entry.output_tokens = output_tokens
entry.cache_read_tokens = cache_read_tokens
entry.cache_write_tokens = cache_write_tokens
if last_prompt_tokens is not None:
entry.last_prompt_tokens = last_prompt_tokens
if estimated_cost_usd is not None:
entry.estimated_cost_usd += estimated_cost_usd
entry.estimated_cost_usd = estimated_cost_usd
if cost_status:
entry.cost_status = cost_status
entry.total_tokens = (
@@ -783,7 +785,7 @@ class SessionStore:
if self._db and db_session_id:
try:
self._db.update_token_counts(
self._db.set_token_counts(
db_session_id,
input_tokens=input_tokens,
output_tokens=output_tokens,
@@ -795,6 +797,7 @@ class SessionStore:
billing_provider=provider,
billing_base_url=base_url,
model=model,
absolute=True,
)
except Exception as e:
logger.debug("Session DB operation failed: %s", e)
@@ -955,13 +958,17 @@ class SessionStore:
try:
self._db.clear_messages(session_id)
for msg in messages:
role = msg.get("role", "unknown")
self._db.append_message(
session_id=session_id,
role=msg.get("role", "unknown"),
role=role,
content=msg.get("content"),
tool_name=msg.get("tool_name"),
tool_calls=msg.get("tool_calls"),
tool_call_id=msg.get("tool_call_id"),
reasoning=msg.get("reasoning") if role == "assistant" else None,
reasoning_details=msg.get("reasoning_details") if role == "assistant" else None,
codex_reasoning_items=msg.get("codex_reasoning_items") if role == "assistant" else None,
)
except Exception as e:
logger.debug("Failed to rewrite transcript in DB: %s", e)
+2 -2
View File
@@ -11,5 +11,5 @@ Provides subcommands for:
- hermes cron - Manage cron jobs
"""
__version__ = "0.4.0"
__release_date__ = "2026.3.23"
__version__ = "0.5.0"
__release_date__ = "2026.3.28"
+10 -1
View File
@@ -160,7 +160,7 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
id="alibaba",
name="Alibaba Cloud (DashScope)",
auth_type="api_key",
inference_base_url="https://dashscope-intl.aliyuncs.com/apps/anthropic",
inference_base_url="https://coding-intl.dashscope.aliyuncs.com/v1",
api_key_env_vars=("DASHSCOPE_API_KEY",),
base_url_env_var="DASHSCOPE_BASE_URL",
),
@@ -212,6 +212,14 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
api_key_env_vars=("KILOCODE_API_KEY",),
base_url_env_var="KILOCODE_BASE_URL",
),
"huggingface": ProviderConfig(
id="huggingface",
name="Hugging Face",
auth_type="api_key",
inference_base_url="https://router.huggingface.co/v1",
api_key_env_vars=("HF_TOKEN",),
base_url_env_var="HF_BASE_URL",
),
}
@@ -685,6 +693,7 @@ def resolve_provider(
"github-copilot-acp": "copilot-acp", "copilot-acp-agent": "copilot-acp",
"aigateway": "ai-gateway", "vercel": "ai-gateway", "vercel-ai-gateway": "ai-gateway",
"opencode": "opencode-zen", "zen": "opencode-zen",
"hf": "huggingface", "hugging-face": "huggingface", "huggingface-hub": "huggingface",
"go": "opencode-go", "opencode-go-sub": "opencode-go",
"kilo": "kilocode", "kilo-code": "kilocode", "kilo-gateway": "kilocode",
}
+55 -3
View File
@@ -138,6 +138,12 @@ DEFAULT_CONFIG = {
"toolsets": ["hermes-cli"],
"agent": {
"max_turns": 90,
# Tool-use enforcement: injects system prompt guidance that tells the
# model to actually call tools instead of describing intended actions.
# Values: "auto" (default — applies to gpt/codex models), true/false
# (force on/off for all models), or a list of model-name substrings
# to match (e.g. ["gpt", "codex", "gemini", "qwen"]).
"tool_use_enforcement": "auto",
},
"terminal": {
@@ -221,42 +227,49 @@ DEFAULT_CONFIG = {
"model": "",
"base_url": "",
"api_key": "",
"timeout": 30, # seconds — increase for slow local models
},
"compression": {
"provider": "auto",
"model": "",
"base_url": "",
"api_key": "",
"timeout": 120, # seconds — compression summarises large contexts; increase for local models
},
"session_search": {
"provider": "auto",
"model": "",
"base_url": "",
"api_key": "",
"timeout": 30,
},
"skills_hub": {
"provider": "auto",
"model": "",
"base_url": "",
"api_key": "",
"timeout": 30,
},
"approval": {
"provider": "auto",
"model": "", # fast/cheap model recommended (e.g. gemini-flash, haiku)
"base_url": "",
"api_key": "",
"timeout": 30,
},
"mcp": {
"provider": "auto",
"model": "",
"base_url": "",
"api_key": "",
"timeout": 30,
},
"flush_memories": {
"provider": "auto",
"model": "",
"base_url": "",
"api_key": "",
"timeout": 30,
},
},
@@ -264,6 +277,7 @@ DEFAULT_CONFIG = {
"compact": False,
"personality": "kawaii",
"resume_display": "full",
"busy_input_mode": "interrupt",
"bell_on_complete": False,
"show_reasoning": False,
"streaming": False,
@@ -546,14 +560,14 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
},
"DASHSCOPE_API_KEY": {
"description": "Alibaba Cloud DashScope API key for Qwen models",
"description": "Alibaba Cloud DashScope API key (Qwen + multi-provider models)",
"prompt": "DashScope API Key",
"url": "https://modelstudio.console.alibabacloud.com/",
"password": True,
"category": "provider",
},
"DASHSCOPE_BASE_URL": {
"description": "Custom DashScope base URL (default: international endpoint)",
"description": "Custom DashScope base URL (default: coding-intl OpenAI-compat endpoint)",
"prompt": "DashScope Base URL",
"url": "",
"password": False,
@@ -592,8 +606,31 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"HF_TOKEN": {
"description": "Hugging Face token for Inference Providers (20+ open models via router.huggingface.co)",
"prompt": "Hugging Face Token",
"url": "https://huggingface.co/settings/tokens",
"password": True,
"category": "provider",
},
"HF_BASE_URL": {
"description": "Hugging Face Inference Providers base URL override",
"prompt": "HF base URL (leave empty for default)",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
# ── Tool API keys ──
"EXA_API_KEY": {
"description": "Exa API key for AI-native web search and contents",
"prompt": "Exa API key",
"url": "https://exa.ai/",
"tools": ["web_search", "web_extract"],
"password": True,
"category": "tool",
},
"PARALLEL_API_KEY": {
"description": "Parallel API key for AI-native web search and extract",
"prompt": "Parallel API key",
@@ -780,6 +817,20 @@ OPTIONAL_ENV_VARS = {
"password": False,
"category": "messaging",
},
"MATTERMOST_REQUIRE_MENTION": {
"description": "Require @mention in Mattermost channels (default: true). Set to false to respond to all messages.",
"prompt": "Require @mention in channels",
"url": None,
"password": False,
"category": "messaging",
},
"MATTERMOST_FREE_RESPONSE_CHANNELS": {
"description": "Comma-separated Mattermost channel IDs where bot responds without @mention",
"prompt": "Free-response channel IDs (comma-separated)",
"url": None,
"password": False,
"category": "messaging",
},
"MATRIX_HOMESERVER": {
"description": "Matrix homeserver URL (e.g. https://matrix.example.org)",
"prompt": "Matrix homeserver URL",
@@ -1650,6 +1701,7 @@ def show_config():
keys = [
("OPENROUTER_API_KEY", "OpenRouter"),
("VOICE_TOOLS_OPENAI_KEY", "OpenAI (STT/TTS)"),
("EXA_API_KEY", "Exa"),
("PARALLEL_API_KEY", "Parallel"),
("FIRECRAWL_API_KEY", "Firecrawl"),
("TAVILY_API_KEY", "Tavily"),
@@ -1809,7 +1861,7 @@ def set_config_value(key: str, value: str):
# Check if it's an API key (goes to .env)
api_keys = [
'OPENROUTER_API_KEY', 'OPENAI_API_KEY', 'ANTHROPIC_API_KEY', 'VOICE_TOOLS_OPENAI_KEY',
'PARALLEL_API_KEY', 'FIRECRAWL_API_KEY', 'FIRECRAWL_API_URL', 'TAVILY_API_KEY',
'EXA_API_KEY', 'PARALLEL_API_KEY', 'FIRECRAWL_API_KEY', 'FIRECRAWL_API_URL', 'TAVILY_API_KEY',
'BROWSERBASE_API_KEY', 'BROWSERBASE_PROJECT_ID', 'BROWSER_USE_API_KEY',
'FAL_KEY', 'TELEGRAM_BOT_TOKEN', 'DISCORD_BOT_TOKEN',
'TERMINAL_SSH_HOST', 'TERMINAL_SSH_USER', 'TERMINAL_SSH_KEY',
+3 -3
View File
@@ -56,7 +56,7 @@ def _honcho_is_configured_for_doctor() -> bool:
from honcho_integration.client import HonchoClientConfig
cfg = HonchoClientConfig.from_global_config()
return bool(cfg.enabled and cfg.api_key)
return bool(cfg.enabled and (cfg.api_key or cfg.base_url))
except Exception:
return False
@@ -708,8 +708,8 @@ def run_doctor(args):
check_warn("Honcho config not found", "run: hermes honcho setup")
elif not hcfg.enabled:
check_info(f"Honcho disabled (set enabled: true in {_honcho_cfg_path} to activate)")
elif not hcfg.api_key:
check_fail("Honcho API key not set", "run: hermes honcho setup")
elif not (hcfg.api_key or hcfg.base_url):
check_fail("Honcho API key or base URL not set", "run: hermes honcho setup")
issues.append("No Honcho API key — run 'hermes honcho setup'")
else:
from honcho_integration.client import get_honcho_client, reset_honcho_client
+119 -20
View File
@@ -125,20 +125,43 @@ _SERVICE_BASE = "hermes-gateway"
SERVICE_DESCRIPTION = "Hermes Agent Gateway - Messaging Platform Integration"
def _profile_suffix() -> str:
"""Derive a service-name suffix from the current HERMES_HOME.
Returns ``""`` for the default ``~/.hermes``, the profile name for
``~/.hermes/profiles/<name>``, or a short hash for any other custom
HERMES_HOME path.
"""
import hashlib
import re
from pathlib import Path as _Path
home = get_hermes_home().resolve()
default = (_Path.home() / ".hermes").resolve()
if home == default:
return ""
# Detect ~/.hermes/profiles/<name> pattern → use the profile name
profiles_root = (default / "profiles").resolve()
try:
rel = home.relative_to(profiles_root)
parts = rel.parts
if len(parts) == 1 and re.match(r"^[a-z0-9][a-z0-9_-]{0,63}$", parts[0]):
return parts[0]
except ValueError:
pass
# Fallback: short hash for arbitrary HERMES_HOME paths
return hashlib.sha256(str(home).encode()).hexdigest()[:8]
def get_service_name() -> str:
"""Derive a systemd service name scoped to this HERMES_HOME.
Default ``~/.hermes`` returns ``hermes-gateway`` (backward compatible).
Any other HERMES_HOME appends a short hash so multiple installations
can each have their own systemd service without conflicting.
Profile ``~/.hermes/profiles/coder`` returns ``hermes-gateway-coder``.
Any other HERMES_HOME appends a short hash for uniqueness.
"""
import hashlib
from pathlib import Path as _Path # local import to avoid monkeypatch interference
home = get_hermes_home().resolve()
default = (_Path.home() / ".hermes").resolve()
if home == default:
suffix = _profile_suffix()
if not suffix:
return _SERVICE_BASE
suffix = hashlib.sha256(str(home).encode()).hexdigest()[:8]
return f"{_SERVICE_BASE}-{suffix}"
@@ -369,7 +392,14 @@ def print_systemd_linger_guidance() -> None:
print(" sudo loginctl enable-linger $USER")
def get_launchd_plist_path() -> Path:
return Path.home() / "Library" / "LaunchAgents" / "ai.hermes.gateway.plist"
"""Return the launchd plist path, scoped per profile.
Default ``~/.hermes`` ``ai.hermes.gateway.plist`` (backward compatible).
Profile ``~/.hermes/profiles/coder`` ``ai.hermes.gateway-coder.plist``.
"""
suffix = _profile_suffix()
name = f"ai.hermes.gateway-{suffix}" if suffix else "ai.hermes.gateway"
return Path.home() / "Library" / "LaunchAgents" / f"{name}.plist"
def _detect_venv_dir() -> Path | None:
"""Detect the active virtualenv directory.
@@ -420,6 +450,17 @@ def get_hermes_cli_path() -> str:
# Systemd (Linux)
# =============================================================================
def _build_user_local_paths(home: Path, path_entries: list[str]) -> list[str]:
"""Return user-local bin dirs that exist and aren't already in *path_entries*."""
candidates = [
str(home / ".local" / "bin"), # uv, uvx, pip-installed CLIs
str(home / ".cargo" / "bin"), # Rust/cargo tools
str(home / "go" / "bin"), # Go tools
str(home / ".npm-global" / "bin"), # npm global packages
]
return [p for p in candidates if p not in path_entries and Path(p).exists()]
def generate_systemd_unit(system: bool = False, run_as_user: str | None = None) -> str:
python_path = get_python_path()
working_dir = str(PROJECT_ROOT)
@@ -434,13 +475,16 @@ def generate_systemd_unit(system: bool = False, run_as_user: str | None = None)
resolved_node_dir = str(Path(resolved_node).resolve().parent)
if resolved_node_dir not in path_entries:
path_entries.append(resolved_node_dir)
path_entries.extend(["/usr/local/sbin", "/usr/local/bin", "/usr/sbin", "/usr/bin", "/sbin", "/bin"])
sane_path = ":".join(path_entries)
hermes_home = str(get_hermes_home().resolve())
common_bin_paths = ["/usr/local/sbin", "/usr/local/bin", "/usr/sbin", "/usr/bin", "/sbin", "/bin"]
if system:
username, group_name, home_dir = _system_service_identity(run_as_user)
path_entries.extend(_build_user_local_paths(Path(home_dir), path_entries))
path_entries.extend(common_bin_paths)
sane_path = ":".join(path_entries)
return f"""[Unit]
Description={SERVICE_DESCRIPTION}
After=network-online.target
@@ -472,6 +516,9 @@ StandardError=journal
WantedBy=multi-user.target
"""
path_entries.extend(_build_user_local_paths(Path.home(), path_entries))
path_entries.extend(common_bin_paths)
sane_path = ":".join(path_entries)
return f"""[Unit]
Description={SERVICE_DESCRIPTION}
After=network.target
@@ -752,18 +799,46 @@ def systemd_status(deep: bool = False, system: bool = False):
# Launchd (macOS)
# =============================================================================
def get_launchd_label() -> str:
"""Return the launchd service label, scoped per profile."""
suffix = _profile_suffix()
return f"ai.hermes.gateway-{suffix}" if suffix else "ai.hermes.gateway"
def generate_launchd_plist() -> str:
python_path = get_python_path()
working_dir = str(PROJECT_ROOT)
hermes_home = str(get_hermes_home().resolve())
log_dir = get_hermes_home() / "logs"
log_dir.mkdir(parents=True, exist_ok=True)
label = get_launchd_label()
# Build a sane PATH for the launchd plist. launchd provides only a
# minimal default (/usr/bin:/bin:/usr/sbin:/sbin) which misses Homebrew,
# nvm, cargo, etc. We prepend venv/bin and node_modules/.bin (matching
# the systemd unit), then capture the user's full shell PATH so every
# user-installed tool (node, ffmpeg, …) is reachable.
detected_venv = _detect_venv_dir()
venv_bin = str(detected_venv / "bin") if detected_venv else str(PROJECT_ROOT / "venv" / "bin")
venv_dir = str(detected_venv) if detected_venv else str(PROJECT_ROOT / "venv")
node_bin = str(PROJECT_ROOT / "node_modules" / ".bin")
# Resolve the directory containing the node binary (e.g. Homebrew, nvm)
# so it's explicitly in PATH even if the user's shell PATH changes later.
priority_dirs = [venv_bin, node_bin]
resolved_node = shutil.which("node")
if resolved_node:
resolved_node_dir = str(Path(resolved_node).resolve().parent)
if resolved_node_dir not in priority_dirs:
priority_dirs.append(resolved_node_dir)
sane_path = ":".join(
dict.fromkeys(priority_dirs + [p for p in os.environ.get("PATH", "").split(":") if p])
)
return f"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>ai.hermes.gateway</string>
<string>{label}</string>
<key>ProgramArguments</key>
<array>
@@ -778,6 +853,16 @@ def generate_launchd_plist() -> str:
<key>WorkingDirectory</key>
<string>{working_dir}</string>
<key>EnvironmentVariables</key>
<dict>
<key>PATH</key>
<string>{sane_path}</string>
<key>VIRTUAL_ENV</key>
<string>{venv_dir}</string>
<key>HERMES_HOME</key>
<string>{hermes_home}</string>
</dict>
<key>RunAtLoad</key>
<true/>
@@ -863,20 +948,33 @@ def launchd_uninstall():
print("✓ Service uninstalled")
def launchd_start():
refresh_launchd_plist_if_needed()
plist_path = get_launchd_plist_path()
label = get_launchd_label()
# Self-heal if the plist is missing entirely (e.g., manual cleanup, failed upgrade)
if not plist_path.exists():
print("↻ launchd plist missing; regenerating service definition")
plist_path.parent.mkdir(parents=True, exist_ok=True)
plist_path.write_text(generate_launchd_plist(), encoding="utf-8")
subprocess.run(["launchctl", "load", str(plist_path)], check=True)
subprocess.run(["launchctl", "start", label], check=True)
print("✓ Service started")
return
refresh_launchd_plist_if_needed()
try:
subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
subprocess.run(["launchctl", "start", label], check=True)
except subprocess.CalledProcessError as e:
if e.returncode != 3 or not plist_path.exists():
if e.returncode != 3:
raise
print("↻ launchd job was unloaded; reloading service definition")
subprocess.run(["launchctl", "load", str(plist_path)], check=True)
subprocess.run(["launchctl", "start", "ai.hermes.gateway"], check=True)
subprocess.run(["launchctl", "start", label], check=True)
print("✓ Service started")
def launchd_stop():
subprocess.run(["launchctl", "stop", "ai.hermes.gateway"], check=True)
label = get_launchd_label()
subprocess.run(["launchctl", "stop", label], check=True)
print("✓ Service stopped")
def _wait_for_gateway_exit(timeout: float = 10.0, force_after: float = 5.0):
@@ -931,8 +1029,9 @@ def launchd_restart():
def launchd_status(deep: bool = False):
plist_path = get_launchd_plist_path()
label = get_launchd_label()
result = subprocess.run(
["launchctl", "list", "ai.hermes.gateway"],
["launchctl", "list", label],
capture_output=True,
text=True
)
@@ -1437,7 +1536,7 @@ def _is_service_running() -> bool:
return False
elif is_macos() and get_launchd_plist_path().exists():
result = subprocess.run(
["launchctl", "list", "ai.hermes.gateway"],
["launchctl", "list", get_launchd_label()],
capture_output=True, text=True
)
return result.returncode == 0
+228 -61
View File
@@ -795,6 +795,7 @@ def cmd_model(args):
"ai-gateway": "AI Gateway",
"kilocode": "Kilo Code",
"alibaba": "Alibaba Cloud (DashScope)",
"huggingface": "Hugging Face",
"custom": "Custom endpoint",
}
active_label = provider_labels.get(active, active)
@@ -820,7 +821,8 @@ def cmd_model(args):
("opencode-zen", "OpenCode Zen (35+ curated models, pay-as-you-go)"),
("opencode-go", "OpenCode Go (open models, $10/month subscription)"),
("ai-gateway", "AI Gateway (Vercel — 200+ models, pay-per-use)"),
("alibaba", "Alibaba Cloud / DashScope (Qwen models, Anthropic-compatible)"),
("alibaba", "Alibaba Cloud / DashScope Coding (Qwen + multi-provider)"),
("huggingface", "Hugging Face Inference Providers (20+ open models)"),
]
# Add user-defined custom providers from config.yaml
@@ -830,8 +832,8 @@ def cmd_model(args):
for entry in custom_providers_cfg:
if not isinstance(entry, dict):
continue
name = entry.get("name", "").strip()
base_url = entry.get("base_url", "").strip()
name = (entry.get("name") or "").strip()
base_url = (entry.get("base_url") or "").strip()
if not name or not base_url:
continue
# Generate a stable key from the name
@@ -893,7 +895,7 @@ def cmd_model(args):
_model_flow_anthropic(config, current_model)
elif selected_provider == "kimi-coding":
_model_flow_kimi(config, current_model)
elif selected_provider in ("zai", "minimax", "minimax-cn", "kilocode", "opencode-zen", "opencode-go", "ai-gateway", "alibaba"):
elif selected_provider in ("zai", "minimax", "minimax-cn", "kilocode", "opencode-zen", "opencode-go", "ai-gateway", "alibaba", "huggingface"):
_model_flow_api_key_provider(config, selected_provider, current_model)
@@ -1502,6 +1504,18 @@ _PROVIDER_MODELS = {
"google/gemini-3-pro-preview",
"google/gemini-3-flash-preview",
],
# Curated HF model list — only agentic models that map to OpenRouter defaults.
# Format: HF model ID → OpenRouter equivalent noted in comment
"huggingface": [
"Qwen/Qwen3.5-397B-A17B", # ↔ qwen/qwen3.5-plus
"Qwen/Qwen3.5-35B-A3B", # ↔ qwen/qwen3.5-35b-a3b
"deepseek-ai/DeepSeek-V3.2", # ↔ deepseek/deepseek-chat
"moonshotai/Kimi-K2.5", # ↔ moonshotai/kimi-k2.5
"MiniMaxAI/MiniMax-M2.5", # ↔ minimax/minimax-m2.5
"zai-org/GLM-5", # ↔ z-ai/glm-5
"XiaomiMiMo/MiMo-V2-Flash", # ↔ xiaomi/mimo-v2-pro
"moonshotai/Kimi-K2-Thinking", # ↔ moonshotai/kimi-k2-thinking
],
}
@@ -2031,19 +2045,25 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
save_env_value(base_url_env, override)
effective_base = override
# Model selection — try live /models endpoint first, fall back to defaults
from hermes_cli.models import fetch_api_models
api_key_for_probe = existing_key or (get_env_value(key_env) if key_env else "")
live_models = fetch_api_models(api_key_for_probe, effective_base)
# Model selection — try live /models endpoint first, fall back to defaults.
# Providers with large live catalogs (100+ models) use a curated list instead
# so users see familiar model names rather than an overwhelming dump.
curated = _PROVIDER_MODELS.get(provider_id, [])
if curated and len(curated) >= 8:
# Curated list is substantial — use it directly, skip live probe
live_models = None
else:
from hermes_cli.models import fetch_api_models
api_key_for_probe = existing_key or (get_env_value(key_env) if key_env else "")
live_models = fetch_api_models(api_key_for_probe, effective_base)
if live_models:
model_list = live_models
print(f" Found {len(model_list)} model(s) from {pconfig.name} API")
else:
model_list = _PROVIDER_MODELS.get(provider_id, [])
model_list = curated
if model_list:
print(" ⚠ Could not auto-detect models from API — showing defaults.")
print(" Use \"Enter custom model name\" if you don't see your model.")
print(f" Showing {len(model_list)} curated models — use \"Enter custom model name\" for others.")
# else: no defaults either, will fall through to raw input
if model_list:
@@ -2319,6 +2339,12 @@ def cmd_cron(args):
cron_command(args)
def cmd_webhook(args):
"""Webhook subscription management."""
from hermes_cli.webhook import webhook_command
webhook_command(args)
def cmd_doctor(args):
"""Check configuration and dependencies."""
from hermes_cli.doctor import run_doctor
@@ -2450,8 +2476,18 @@ def _update_via_zip(args):
)
else:
# Use sys.executable to explicitly call the venv's pip module,
# avoiding PEP 668 'externally-managed-environment' errors on Debian/Ubuntu
# avoiding PEP 668 'externally-managed-environment' errors on Debian/Ubuntu.
# Some environments lose pip inside the venv; bootstrap it back with
# ensurepip before trying the editable install.
pip_cmd = [sys.executable, "-m", "pip"]
try:
subprocess.run(pip_cmd + ["--version"], cwd=PROJECT_ROOT, check=True, capture_output=True)
except subprocess.CalledProcessError:
subprocess.run(
[sys.executable, "-m", "ensurepip", "--upgrade", "--default-pip"],
cwd=PROJECT_ROOT,
check=True,
)
try:
subprocess.run(pip_cmd + ["install", "-e", ".[all]", "--quiet"], cwd=PROJECT_ROOT, check=True)
except subprocess.CalledProcessError:
@@ -2612,7 +2648,12 @@ def _restore_stashed_changes(
print("Resolve conflicts manually, then run: git stash drop")
print(f"Restore your changes with: git stash apply {stash_ref}")
sys.exit(1)
# In non-interactive mode (gateway /update), don't abort — the code
# update itself succeeded, only the stash restore had conflicts.
# Aborting would report the entire update as failed.
if prompt_user:
sys.exit(1)
return False
stash_selector = _resolve_stash_selector(git_cmd, cwd, stash_ref)
if stash_selector is None:
@@ -2686,30 +2727,60 @@ def cmd_update(args):
# Fetch and pull
try:
print("→ Fetching updates...")
git_cmd = ["git"]
if sys.platform == "win32":
git_cmd = ["git", "-c", "windows.appendAtomically=false"]
subprocess.run(git_cmd + ["fetch", "origin"], cwd=PROJECT_ROOT, check=True)
# Get current branch
print("→ Fetching updates...")
fetch_result = subprocess.run(
git_cmd + ["fetch", "origin"],
cwd=PROJECT_ROOT,
capture_output=True,
text=True,
)
if fetch_result.returncode != 0:
stderr = fetch_result.stderr.strip()
if "Could not resolve host" in stderr or "unable to access" in stderr:
print("✗ Network error — cannot reach the remote repository.")
print(f" {stderr.splitlines()[0]}" if stderr else "")
elif "Authentication failed" in stderr or "could not read Username" in stderr:
print("✗ Authentication failed — check your git credentials or SSH key.")
else:
print(f"✗ Failed to fetch updates from origin.")
if stderr:
print(f" {stderr.splitlines()[0]}")
sys.exit(1)
# Get current branch (returns literal "HEAD" when detached)
result = subprocess.run(
git_cmd + ["rev-parse", "--abbrev-ref", "HEAD"],
cwd=PROJECT_ROOT,
capture_output=True,
text=True,
check=True
check=True,
)
branch = result.stdout.strip()
current_branch = result.stdout.strip()
# Fall back to main if the current branch doesn't exist on the remote
verify = subprocess.run(
git_cmd + ["rev-parse", "--verify", f"origin/{branch}"],
cwd=PROJECT_ROOT, capture_output=True, text=True,
)
if verify.returncode != 0:
branch = "main"
# Always update against main
branch = "main"
# If user is on a non-main branch or detached HEAD, switch to main
if current_branch != "main":
label = "detached HEAD" if current_branch == "HEAD" else f"branch '{current_branch}'"
print(f" ⚠ Currently on {label} — switching to main for update...")
# Stash before checkout so uncommitted work isn't lost
auto_stash_ref = _stash_local_changes_if_needed(git_cmd, PROJECT_ROOT)
subprocess.run(
git_cmd + ["checkout", "main"],
cwd=PROJECT_ROOT,
capture_output=True,
text=True,
check=True,
)
else:
auto_stash_ref = _stash_local_changes_if_needed(git_cmd, PROJECT_ROOT)
prompt_for_restore = auto_stash_ref is not None and sys.stdin.isatty() and sys.stdout.isatty()
# Check if there are updates
result = subprocess.run(
@@ -2717,31 +2788,69 @@ def cmd_update(args):
cwd=PROJECT_ROOT,
capture_output=True,
text=True,
check=True
check=True,
)
commit_count = int(result.stdout.strip())
if commit_count == 0:
_invalidate_update_cache()
print("✓ Already up to date!")
return
print(f"→ Found {commit_count} new commit(s)")
auto_stash_ref = _stash_local_changes_if_needed(git_cmd, PROJECT_ROOT)
prompt_for_restore = auto_stash_ref is not None and sys.stdin.isatty() and sys.stdout.isatty()
print("→ Pulling updates...")
try:
subprocess.run(git_cmd + ["pull", "--ff-only", "origin", branch], cwd=PROJECT_ROOT, check=True)
finally:
# Restore stash and switch back to original branch if we moved
if auto_stash_ref is not None:
_restore_stashed_changes(
git_cmd,
PROJECT_ROOT,
auto_stash_ref,
git_cmd, PROJECT_ROOT, auto_stash_ref,
prompt_user=prompt_for_restore,
)
if current_branch not in ("main", "HEAD"):
subprocess.run(
git_cmd + ["checkout", current_branch],
cwd=PROJECT_ROOT, capture_output=True, text=True, check=False,
)
print("✓ Already up to date!")
return
print(f"→ Found {commit_count} new commit(s)")
print("→ Pulling updates...")
update_succeeded = False
try:
pull_result = subprocess.run(
git_cmd + ["pull", "--ff-only", "origin", branch],
cwd=PROJECT_ROOT,
capture_output=True,
text=True,
)
if pull_result.returncode != 0:
# ff-only failed — local and remote have diverged (e.g. upstream
# force-pushed or rebase). Since local changes are already
# stashed, reset to match the remote exactly.
print(" ⚠ Fast-forward not possible (history diverged), resetting to match remote...")
reset_result = subprocess.run(
git_cmd + ["reset", "--hard", f"origin/{branch}"],
cwd=PROJECT_ROOT,
capture_output=True,
text=True,
)
if reset_result.returncode != 0:
print(f"✗ Failed to reset to origin/{branch}.")
if reset_result.stderr.strip():
print(f" {reset_result.stderr.strip()}")
print(" Try manually: git fetch origin && git reset --hard origin/main")
sys.exit(1)
update_succeeded = True
finally:
if auto_stash_ref is not None:
# Don't attempt stash restore if the code update itself failed —
# working tree is in an unknown state.
if not update_succeeded:
print(f" ️ Local changes preserved in stash (ref: {auto_stash_ref})")
print(f" Restore manually with: git stash apply")
else:
_restore_stashed_changes(
git_cmd,
PROJECT_ROOT,
auto_stash_ref,
prompt_user=prompt_for_restore,
)
_invalidate_update_cache()
@@ -2764,8 +2873,18 @@ def cmd_update(args):
)
else:
# Use sys.executable to explicitly call the venv's pip module,
# avoiding PEP 668 'externally-managed-environment' errors on Debian/Ubuntu
# avoiding PEP 668 'externally-managed-environment' errors on Debian/Ubuntu.
# Some environments lose pip inside the venv; bootstrap it back with
# ensurepip before trying the editable install.
pip_cmd = [sys.executable, "-m", "pip"]
try:
subprocess.run(pip_cmd + ["--version"], cwd=PROJECT_ROOT, check=True, capture_output=True)
except subprocess.CalledProcessError:
subprocess.run(
[sys.executable, "-m", "ensurepip", "--upgrade", "--default-pip"],
cwd=PROJECT_ROOT,
check=True,
)
try:
subprocess.run(pip_cmd + ["install", "-e", ".[all]", "--quiet"], cwd=PROJECT_ROOT, check=True)
except subprocess.CalledProcessError:
@@ -2824,10 +2943,15 @@ def cmd_update(args):
print(f" {len(missing_config)} new config option(s) available")
print()
if sys.stdin.isatty():
response = input("Would you like to configure them now? [Y/n]: ").strip().lower()
else:
if not (sys.stdin.isatty() and sys.stdout.isatty()):
print(" Non-interactive session — skipping config migration prompt.")
print(" Run 'hermes config migrate' later to apply any new config/env options.")
response = "n"
else:
try:
response = input("Would you like to configure them now? [Y/n]: ").strip().lower()
except EOFError:
response = "n"
if response in ('', 'y', 'yes'):
print()
@@ -2875,10 +2999,11 @@ def cmd_update(args):
# Check for macOS launchd service
if is_macos():
try:
from hermes_cli.gateway import get_launchd_label
plist_path = get_launchd_plist_path()
if plist_path.exists():
check = subprocess.run(
["launchctl", "list", "ai.hermes.gateway"],
["launchctl", "list", get_launchd_label()],
capture_output=True, text=True, timeout=5,
)
has_launchd_service = check.returncode == 0
@@ -2934,12 +3059,13 @@ def cmd_update(args):
# after a manual SIGTERM, which would race with the
# PID file cleanup.
print("→ Restarting gateway service...")
_launchd_label = get_launchd_label()
stop = subprocess.run(
["launchctl", "stop", "ai.hermes.gateway"],
["launchctl", "stop", _launchd_label],
capture_output=True, text=True, timeout=10,
)
start = subprocess.run(
["launchctl", "start", "ai.hermes.gateway"],
["launchctl", "start", _launchd_label],
capture_output=True, text=True, timeout=10,
)
if start.returncode == 0:
@@ -3122,7 +3248,7 @@ For more help on a command:
)
chat_parser.add_argument(
"--provider",
choices=["auto", "openrouter", "nous", "openai-codex", "copilot-acp", "copilot", "anthropic", "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode"],
choices=["auto", "openrouter", "nous", "openai-codex", "copilot-acp", "copilot", "anthropic", "huggingface", "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode"],
default=None,
help="Inference provider (default: auto)"
)
@@ -3423,7 +3549,38 @@ For more help on a command:
cron_subparsers.add_parser("tick", help="Run due jobs once and exit")
cron_parser.set_defaults(func=cmd_cron)
# =========================================================================
# webhook command
# =========================================================================
webhook_parser = subparsers.add_parser(
"webhook",
help="Manage dynamic webhook subscriptions",
description="Create, list, and remove webhook subscriptions for event-driven agent activation",
)
webhook_subparsers = webhook_parser.add_subparsers(dest="webhook_action")
wh_sub = webhook_subparsers.add_parser("subscribe", aliases=["add"], help="Create a webhook subscription")
wh_sub.add_argument("name", help="Route name (used in URL: /webhooks/<name>)")
wh_sub.add_argument("--prompt", default="", help="Prompt template with {dot.notation} payload refs")
wh_sub.add_argument("--events", default="", help="Comma-separated event types to accept")
wh_sub.add_argument("--description", default="", help="What this subscription does")
wh_sub.add_argument("--skills", default="", help="Comma-separated skill names to load")
wh_sub.add_argument("--deliver", default="log", help="Delivery target: log, telegram, discord, slack, etc.")
wh_sub.add_argument("--deliver-chat-id", default="", help="Target chat ID for cross-platform delivery")
wh_sub.add_argument("--secret", default="", help="HMAC secret (auto-generated if omitted)")
webhook_subparsers.add_parser("list", aliases=["ls"], help="List all dynamic subscriptions")
wh_rm = webhook_subparsers.add_parser("remove", aliases=["rm"], help="Remove a subscription")
wh_rm.add_argument("name", help="Subscription name to remove")
wh_test = webhook_subparsers.add_parser("test", help="Send a test POST to a webhook route")
wh_test.add_argument("name", help="Subscription name to test")
wh_test.add_argument("--payload", default="", help="JSON payload to send (default: test payload)")
webhook_parser.set_defaults(func=cmd_webhook)
# =========================================================================
# doctor command
# =========================================================================
@@ -3556,7 +3713,7 @@ For more help on a command:
skills_snapshot = skills_subparsers.add_parser("snapshot", help="Export/import skill configurations")
snapshot_subparsers = skills_snapshot.add_subparsers(dest="snapshot_action")
snap_export = snapshot_subparsers.add_parser("export", help="Export installed skills to a file")
snap_export.add_argument("output", help="Output JSON file path")
snap_export.add_argument("output", help="Output JSON file path (use - for stdout)")
snap_import = snapshot_subparsers.add_parser("import", help="Import and install skills from a file")
snap_import.add_argument("input", help="Input JSON file path")
snap_import.add_argument("--force", action="store_true", help="Force install despite caution verdict")
@@ -3833,7 +3990,7 @@ For more help on a command:
sessions_list.add_argument("--limit", type=int, default=20, help="Max sessions to show")
sessions_export = sessions_subparsers.add_parser("export", help="Export sessions to a JSONL file")
sessions_export.add_argument("output", help="Output JSONL file path")
sessions_export.add_argument("output", help="Output JSONL file path (use - for stdout)")
sessions_export.add_argument("--source", help="Filter by source")
sessions_export.add_argument("--session-id", help="Export a specific session")
@@ -3914,15 +4071,25 @@ For more help on a command:
if not data:
print(f"Session '{args.session_id}' not found.")
return
with open(args.output, "w", encoding="utf-8") as f:
f.write(_json.dumps(data, ensure_ascii=False) + "\n")
print(f"Exported 1 session to {args.output}")
line = _json.dumps(data, ensure_ascii=False) + "\n"
if args.output == "-":
import sys
sys.stdout.write(line)
else:
with open(args.output, "w", encoding="utf-8") as f:
f.write(line)
print(f"Exported 1 session to {args.output}")
else:
sessions = db.export_all(source=args.source)
with open(args.output, "w", encoding="utf-8") as f:
if args.output == "-":
import sys
for s in sessions:
f.write(_json.dumps(s, ensure_ascii=False) + "\n")
print(f"Exported {len(sessions)} sessions to {args.output}")
sys.stdout.write(_json.dumps(s, ensure_ascii=False) + "\n")
else:
with open(args.output, "w", encoding="utf-8") as f:
for s in sessions:
f.write(_json.dumps(s, ensure_ascii=False) + "\n")
print(f"Exported {len(sessions)} sessions to {args.output}")
elif action == "delete":
resolved_session_id = db.resolve_session_id(args.session_id)
+26 -5
View File
@@ -208,14 +208,31 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"google/gemini-3-pro-preview",
"google/gemini-3-flash-preview",
],
# Alibaba DashScope Coding platform (coding-intl) — default endpoint.
# Supports Qwen models + third-party providers (GLM, Kimi, MiniMax).
# Users with classic DashScope keys should override DASHSCOPE_BASE_URL
# to https://dashscope-intl.aliyuncs.com/compatible-mode/v1 (OpenAI-compat)
# or https://dashscope-intl.aliyuncs.com/apps/anthropic (Anthropic-compat).
"alibaba": [
"qwen3.5-plus",
"qwen3-max",
"qwen3-coder-plus",
"qwen3-coder-next",
"qwen-plus-latest",
"qwen3.5-flash",
"qwen-vl-max",
# Third-party models available on coding-intl
"glm-5",
"glm-4.7",
"kimi-k2.5",
"MiniMax-M2.5",
],
# Curated HF model list — only agentic models that map to OpenRouter defaults.
"huggingface": [
"Qwen/Qwen3.5-397B-A17B",
"Qwen/Qwen3.5-35B-A3B",
"deepseek-ai/DeepSeek-V3.2",
"moonshotai/Kimi-K2.5",
"MiniMaxAI/MiniMax-M2.5",
"zai-org/GLM-5",
"XiaomiMiMo/MiMo-V2-Flash",
"moonshotai/Kimi-K2-Thinking",
],
}
@@ -236,6 +253,7 @@ _PROVIDER_LABELS = {
"ai-gateway": "AI Gateway",
"kilocode": "Kilo Code",
"alibaba": "Alibaba Cloud (DashScope)",
"huggingface": "Hugging Face",
"custom": "Custom endpoint",
}
@@ -271,6 +289,9 @@ _PROVIDER_ALIASES = {
"aliyun": "alibaba",
"qwen": "alibaba",
"alibaba-cloud": "alibaba",
"hf": "huggingface",
"hugging-face": "huggingface",
"huggingface-hub": "huggingface",
}
@@ -304,7 +325,7 @@ def list_available_providers() -> list[dict[str, str]]:
# Canonical providers in display order
_PROVIDER_ORDER = [
"openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
"zai", "kimi-coding", "minimax", "minimax-cn", "kilocode", "anthropic", "alibaba",
"huggingface", "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode", "anthropic", "alibaba",
"opencode-zen", "opencode-go",
"ai-gateway", "deepseek", "custom",
]
+16 -5
View File
@@ -385,16 +385,23 @@ class PluginManager:
# Hook invocation
# -----------------------------------------------------------------------
def invoke_hook(self, hook_name: str, **kwargs: Any) -> None:
def invoke_hook(self, hook_name: str, **kwargs: Any) -> List[Any]:
"""Call all registered callbacks for *hook_name*.
Each callback is wrapped in its own try/except so a misbehaving
plugin cannot break the core agent loop.
Returns a list of non-``None`` return values from callbacks.
This allows hooks like ``pre_llm_call`` to contribute context
that the agent core can collect and inject.
"""
callbacks = self._hooks.get(hook_name, [])
results: List[Any] = []
for cb in callbacks:
try:
cb(**kwargs)
ret = cb(**kwargs)
if ret is not None:
results.append(ret)
except Exception as exc:
logger.warning(
"Hook '%s' callback %s raised: %s",
@@ -402,6 +409,7 @@ class PluginManager:
getattr(cb, "__name__", repr(cb)),
exc,
)
return results
# -----------------------------------------------------------------------
# Introspection
@@ -446,9 +454,12 @@ def discover_plugins() -> None:
get_plugin_manager().discover_and_load()
def invoke_hook(hook_name: str, **kwargs: Any) -> None:
"""Invoke a lifecycle hook on all loaded plugins."""
get_plugin_manager().invoke_hook(hook_name, **kwargs)
def invoke_hook(hook_name: str, **kwargs: Any) -> List[Any]:
"""Invoke a lifecycle hook on all loaded plugins.
Returns a list of non-``None`` return values from plugin callbacks.
"""
return get_plugin_manager().invoke_hook(hook_name, **kwargs)
def get_plugin_tool_names() -> Set[str]:
+6 -9
View File
@@ -63,8 +63,11 @@ def _get_model_config() -> Dict[str, Any]:
model_cfg = config.get("model")
if isinstance(model_cfg, dict):
cfg = dict(model_cfg)
default = cfg.get("default", "").strip()
base_url = cfg.get("base_url", "").strip()
# Accept "model" as alias for "default" (users intuitively write model.model)
if not cfg.get("default") and cfg.get("model"):
cfg["default"] = cfg["model"]
default = (cfg.get("default") or "").strip()
base_url = (cfg.get("base_url") or "").strip()
is_local = "localhost" in base_url or "127.0.0.1" in base_url
is_fallback = not default or default == "anthropic/claude-opus-4.6"
if is_local and is_fallback and base_url:
@@ -203,7 +206,7 @@ def _resolve_named_custom_runtime(
or _detect_api_mode_for_url(base_url)
or "chat_completions",
"base_url": base_url,
"api_key": api_key,
"api_key": api_key or "no-key-required",
"source": f"custom_provider:{custom_provider.get('name', requested_provider)}",
}
@@ -407,12 +410,6 @@ def resolve_runtime_provider(
# (e.g. https://api.minimax.io/anthropic, https://dashscope.../anthropic)
elif base_url.rstrip("/").endswith("/anthropic"):
api_mode = "anthropic_messages"
# MiniMax providers always use Anthropic Messages API.
# Auto-correct stale /v1 URLs (from old .env or config) to /anthropic.
elif provider in ("minimax", "minimax-cn"):
api_mode = "anthropic_messages"
if base_url.rstrip("/").endswith("/v1"):
base_url = base_url.rstrip("/")[:-3] + "/anthropic"
return {
"provider": provider,
"api_mode": api_mode,
+146 -18
View File
@@ -80,6 +80,11 @@ _DEFAULT_PROVIDER_MODELS = {
"minimax-cn": ["MiniMax-M2.7", "MiniMax-M2.7-highspeed", "MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"],
"ai-gateway": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5", "google/gemini-3-flash"],
"kilocode": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5.4", "google/gemini-3-pro-preview", "google/gemini-3-flash-preview"],
"huggingface": [
"Qwen/Qwen3.5-397B-A17B", "Qwen/Qwen3-235B-A22B-Thinking-2507",
"Qwen/Qwen3-Coder-480B-A35B-Instruct", "deepseek-ai/DeepSeek-R1-0528",
"deepseek-ai/DeepSeek-V3.2", "moonshotai/Kimi-K2.5",
],
}
@@ -580,11 +585,11 @@ def _print_setup_summary(config: dict, hermes_home):
else:
tool_status.append(("Mixture of Agents", False, "OPENROUTER_API_KEY"))
# Web tools (Parallel, Firecrawl, or Tavily)
if get_env_value("PARALLEL_API_KEY") or get_env_value("FIRECRAWL_API_KEY") or get_env_value("FIRECRAWL_API_URL") or get_env_value("TAVILY_API_KEY"):
# Web tools (Exa, Parallel, Firecrawl, or Tavily)
if get_env_value("EXA_API_KEY") or get_env_value("PARALLEL_API_KEY") or get_env_value("FIRECRAWL_API_KEY") or get_env_value("FIRECRAWL_API_URL") or get_env_value("TAVILY_API_KEY"):
tool_status.append(("Web Search & Extract", True, None))
else:
tool_status.append(("Web Search & Extract", False, "PARALLEL_API_KEY, FIRECRAWL_API_KEY, or TAVILY_API_KEY"))
tool_status.append(("Web Search & Extract", False, "EXA_API_KEY, PARALLEL_API_KEY, FIRECRAWL_API_KEY, or TAVILY_API_KEY"))
# Browser tools (local Chromium or Browserbase cloud)
import shutil
@@ -884,6 +889,7 @@ def setup_model_provider(config: dict):
"OpenCode Go (open models, $10/month subscription)",
"GitHub Copilot (uses GITHUB_TOKEN or gh auth token)",
"GitHub Copilot ACP (spawns `copilot --acp --stdio`)",
"Hugging Face Inference Providers (20+ open models)",
]
if keep_label:
provider_choices.append(keep_label)
@@ -1528,7 +1534,26 @@ def setup_model_provider(config: dict):
_set_model_provider(config, "copilot-acp", pconfig.inference_base_url)
selected_base_url = pconfig.inference_base_url
# else: provider_idx == 16 (Keep current) — only shown when a provider already exists
elif provider_idx == 16: # Hugging Face Inference Providers
selected_provider = "huggingface"
print()
print_header("Hugging Face API Token")
pconfig = PROVIDER_REGISTRY["huggingface"]
print_info(f"Provider: {pconfig.name}")
print_info("Get your token at: https://huggingface.co/settings/tokens")
print_info("Required permission: 'Make calls to Inference Providers'")
print()
api_key = prompt(" HF Token", password=True)
if api_key:
save_env_value("HF_TOKEN", api_key)
# Clear OpenRouter env vars to prevent routing confusion
save_env_value("OPENAI_BASE_URL", "")
save_env_value("OPENAI_API_KEY", "")
_set_model_provider(config, "huggingface", pconfig.inference_base_url)
selected_base_url = pconfig.inference_base_url
# else: provider_idx == 17 (Keep current) — only shown when a provider already exists
# Normalize "keep current" to an explicit provider so downstream logic
# doesn't fall back to the generic OpenRouter/static-model path.
if selected_provider is None:
@@ -2067,11 +2092,11 @@ def setup_terminal_backend(config: dict):
print_info("Serverless cloud sandboxes. Each session gets its own container.")
print_info("Requires a Modal account: https://modal.com")
# Check if swe-rex[modal] is installed
# Check if modal SDK is installed
try:
__import__("swe_rex")
__import__("modal")
except ImportError:
print_info("Installing swe-rex[modal]...")
print_info("Installing modal SDK...")
import subprocess
uv_bin = shutil.which("uv")
@@ -2083,22 +2108,22 @@ def setup_terminal_backend(config: dict):
"install",
"--python",
sys.executable,
"swe-rex[modal]",
"modal",
],
capture_output=True,
text=True,
)
else:
result = subprocess.run(
[sys.executable, "-m", "pip", "install", "swe-rex[modal]"],
[sys.executable, "-m", "pip", "install", "modal"],
capture_output=True,
text=True,
)
if result.returncode == 0:
print_success("swe-rex[modal] installed")
print_success("modal SDK installed")
else:
print_warning(
"Install failed — run manually: pip install 'swe-rex[modal]'"
"Install failed — run manually: pip install modal"
)
# Modal token
@@ -2968,6 +2993,95 @@ def setup_tools(config: dict, first_install: bool = False):
tools_command(first_install=first_install, config=config)
# =============================================================================
# Post-Migration Section Skip Logic
# =============================================================================
def _get_section_config_summary(config: dict, section_key: str) -> Optional[str]:
"""Return a short summary if a setup section is already configured, else None.
Used after OpenClaw migration to detect which sections can be skipped.
``get_env_value`` is the module-level import from hermes_cli.config
so that test patches on ``setup_mod.get_env_value`` take effect.
"""
if section_key == "model":
has_key = bool(
get_env_value("OPENROUTER_API_KEY")
or get_env_value("OPENAI_API_KEY")
or get_env_value("ANTHROPIC_API_KEY")
)
if not has_key:
# Check for OAuth providers
try:
from hermes_cli.auth import get_active_provider
if get_active_provider():
has_key = True
except Exception:
pass
if not has_key:
return None
model = config.get("model")
if isinstance(model, str) and model.strip():
return model.strip()
if isinstance(model, dict):
return str(model.get("default") or model.get("model") or "configured")
return "configured"
elif section_key == "terminal":
backend = config.get("terminal", {}).get("backend", "local")
return f"backend: {backend}"
elif section_key == "agent":
max_turns = config.get("agent", {}).get("max_turns", 90)
return f"max turns: {max_turns}"
elif section_key == "gateway":
platforms = []
if get_env_value("TELEGRAM_BOT_TOKEN"):
platforms.append("Telegram")
if get_env_value("DISCORD_BOT_TOKEN"):
platforms.append("Discord")
if get_env_value("SLACK_BOT_TOKEN"):
platforms.append("Slack")
if get_env_value("WHATSAPP_PHONE_NUMBER_ID"):
platforms.append("WhatsApp")
if get_env_value("SIGNAL_ACCOUNT"):
platforms.append("Signal")
if platforms:
return ", ".join(platforms)
return None # No platforms configured — section must run
elif section_key == "tools":
tools = []
if get_env_value("ELEVENLABS_API_KEY"):
tools.append("TTS/ElevenLabs")
if get_env_value("BROWSERBASE_API_KEY"):
tools.append("Browser")
if get_env_value("FIRECRAWL_API_KEY"):
tools.append("Firecrawl")
if tools:
return ", ".join(tools)
return None
return None
def _skip_configured_section(
config: dict, section_key: str, label: str
) -> bool:
"""Show an already-configured section summary and offer to skip.
Returns True if the user chose to skip, False if the section should run.
"""
summary = _get_section_config_summary(config, section_key)
if not summary:
return False
print()
print_success(f" {label}: {summary}")
return not prompt_yes_no(f" Reconfigure {label.lower()}?", default=False)
# =============================================================================
# OpenClaw Migration
# =============================================================================
@@ -3039,7 +3153,7 @@ def _offer_openclaw_migration(hermes_home: Path) -> bool:
target_root=hermes_home.resolve(),
execute=True,
workspace_target=None,
overwrite=False,
overwrite=True,
migrate_secrets=True,
output_dir=None,
selected_options=selected,
@@ -3195,6 +3309,8 @@ def run_setup_wizard(args):
)
)
migration_ran = False
if is_existing:
# ── Returning User Menu ──
print()
@@ -3264,7 +3380,8 @@ def run_setup_wizard(args):
return
# Offer OpenClaw migration before configuration begins
if _offer_openclaw_migration(hermes_home):
migration_ran = _offer_openclaw_migration(hermes_home)
if migration_ran:
# Reload config in case migration wrote to it
config = load_config()
@@ -3277,20 +3394,31 @@ def run_setup_wizard(args):
print()
print_info("You can edit these files directly or use 'hermes config edit'")
if migration_ran:
print()
print_info("Settings were imported from OpenClaw.")
print_info("Each section below will show what was imported — press Enter to keep,")
print_info("or choose to reconfigure if needed.")
# Section 1: Model & Provider
setup_model_provider(config)
if not (migration_ran and _skip_configured_section(config, "model", "Model & Provider")):
setup_model_provider(config)
# Section 2: Terminal Backend
setup_terminal_backend(config)
if not (migration_ran and _skip_configured_section(config, "terminal", "Terminal Backend")):
setup_terminal_backend(config)
# Section 3: Agent Settings
setup_agent_settings(config)
if not (migration_ran and _skip_configured_section(config, "agent", "Agent Settings")):
setup_agent_settings(config)
# Section 4: Messaging Platforms
setup_gateway(config)
if not (migration_ran and _skip_configured_section(config, "gateway", "Messaging Platforms")):
setup_gateway(config)
# Section 5: Tools
setup_tools(config, first_install=not is_existing)
if not (migration_ran and _skip_configured_section(config, "tools", "Tools")):
setup_tools(config, first_install=not is_existing)
# Save and show summary
save_config(config)
+4
View File
@@ -24,6 +24,10 @@ PLATFORMS = {
"whatsapp": "📱 WhatsApp",
"signal": "📡 Signal",
"email": "📧 Email",
"homeassistant": "🏠 Home Assistant",
"mattermost": "💬 Mattermost",
"matrix": "💬 Matrix",
"dingtalk": "💬 DingTalk",
}
# ─── Config Helpers ───────────────────────────────────────────────────────────
+48 -14
View File
@@ -304,7 +304,8 @@ def do_browse(page: int = 1, page_size: int = 20, source: str = "all",
def do_install(identifier: str, category: str = "", force: bool = False,
console: Optional[Console] = None, skip_confirm: bool = False) -> None:
console: Optional[Console] = None, skip_confirm: bool = False,
invalidate_cache: bool = True) -> None:
"""Fetch, quarantine, scan, confirm, and install a skill."""
from tools.skills_hub import (
GitHubAuth, create_source_router, ensure_hub_dirs,
@@ -417,6 +418,17 @@ def do_install(identifier: str, category: str = "", force: bool = False,
c.print(f"[bold green]Installed:[/] {install_dir.relative_to(SKILLS_DIR)}")
c.print(f"[dim]Files: {', '.join(bundle.files.keys())}[/]\n")
if invalidate_cache:
# Invalidate the skills prompt cache so the new skill appears immediately
try:
from agent.prompt_builder import clear_skills_system_prompt_cache
clear_skills_system_prompt_cache(clear_snapshot=True)
except Exception:
pass
else:
c.print("[dim]Skill will be available in your next session.[/]")
c.print("[dim]Use /reset to start a new session now, or --now to activate immediately (invalidates prompt cache).[/]\n")
def do_inspect(identifier: str, console: Optional[Console] = None) -> None:
"""Preview a skill's SKILL.md content without installing."""
@@ -603,7 +615,8 @@ def do_audit(name: Optional[str] = None, console: Optional[Console] = None) -> N
def do_uninstall(name: str, console: Optional[Console] = None,
skip_confirm: bool = False) -> None:
skip_confirm: bool = False,
invalidate_cache: bool = True) -> None:
"""Remove a hub-installed skill with confirmation."""
from tools.skills_hub import uninstall_skill
@@ -623,6 +636,15 @@ def do_uninstall(name: str, console: Optional[Console] = None,
success, msg = uninstall_skill(name)
if success:
c.print(f"[bold green]{msg}[/]\n")
if invalidate_cache:
try:
from agent.prompt_builder import clear_skills_system_prompt_cache
clear_skills_system_prompt_cache(clear_snapshot=True)
except Exception:
pass
else:
c.print("[dim]Change will take effect in your next session.[/]")
c.print("[dim]Use /reset to start a new session now, or --now to apply immediately (invalidates prompt cache).[/]\n")
else:
c.print(f"[bold red]Error:[/] {msg}\n")
@@ -865,10 +887,15 @@ def do_snapshot_export(output_path: str, console: Optional[Console] = None) -> N
"taps": tap_list,
}
out = Path(output_path)
out.write_text(json.dumps(snapshot, indent=2, ensure_ascii=False) + "\n")
c.print(f"[bold green]Snapshot exported:[/] {out}")
c.print(f"[dim]{len(installed)} skill(s), {len(tap_list)} tap(s)[/]\n")
payload = json.dumps(snapshot, indent=2, ensure_ascii=False) + "\n"
if output_path == "-":
import sys
sys.stdout.write(payload)
else:
out = Path(output_path)
out.write_text(payload)
c.print(f"[bold green]Snapshot exported:[/] {out}")
c.print(f"[dim]{len(installed)} skill(s), {len(tap_list)} tap(s)[/]\n")
def do_snapshot_import(input_path: str, force: bool = False,
@@ -1059,19 +1086,23 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:
elif action == "install":
if not args:
c.print("[bold red]Usage:[/] /skills install <identifier> [--category <cat>] [--force|--yes]\n")
c.print("[bold red]Usage:[/] /skills install <identifier> [--category <cat>] [--force] [--now]\n")
return
identifier = args[0]
category = ""
# --yes / -y bypasses confirmation prompt (needed in TUI mode)
# --force handles reinstall override
skip_confirm = any(flag in args for flag in ("--yes", "-y"))
# Slash commands run inside prompt_toolkit where input() hangs.
# Always skip confirmation — the user typing the command is implicit consent.
skip_confirm = True
force = "--force" in args
# --now invalidates prompt cache immediately (costs more money).
# Default: defer to next session to preserve cache.
invalidate_cache = "--now" in args
for i, a in enumerate(args):
if a == "--category" and i + 1 < len(args):
category = args[i + 1]
do_install(identifier, category=category, force=force,
skip_confirm=skip_confirm, console=c)
skip_confirm=skip_confirm, invalidate_cache=invalidate_cache,
console=c)
elif action == "inspect":
if not args:
@@ -1101,10 +1132,13 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:
elif action == "uninstall":
if not args:
c.print("[bold red]Usage:[/] /skills uninstall <name> [--yes]\n")
c.print("[bold red]Usage:[/] /skills uninstall <name> [--now]\n")
return
skip_confirm = any(flag in args for flag in ("--yes", "-y"))
do_uninstall(args[0], console=c, skip_confirm=skip_confirm)
# Slash commands run inside prompt_toolkit where input() hangs.
skip_confirm = True
invalidate_cache = "--now" in args
do_uninstall(args[0], console=c, skip_confirm=skip_confirm,
invalidate_cache=invalidate_cache)
elif action == "publish":
if not args:
+2 -1
View File
@@ -292,8 +292,9 @@ def show_status(args):
print(" Manager: systemd (user)")
elif sys.platform == 'darwin':
from hermes_cli.gateway import get_launchd_label
result = subprocess.run(
["launchctl", "list", "ai.hermes.gateway"],
["launchctl", "list", get_launchd_label()],
capture_output=True,
text=True
)
+14 -2
View File
@@ -108,7 +108,8 @@ def _get_effective_configurable_toolsets():
"""
result = list(CONFIGURABLE_TOOLSETS)
try:
from hermes_cli.plugins import get_plugin_toolsets
from hermes_cli.plugins import discover_plugins, get_plugin_toolsets
discover_plugins() # idempotent — ensures plugins are loaded
result.extend(get_plugin_toolsets())
except Exception:
pass
@@ -118,7 +119,8 @@ def _get_effective_configurable_toolsets():
def _get_plugin_toolset_keys() -> set:
"""Return the set of toolset keys provided by plugins."""
try:
from hermes_cli.plugins import get_plugin_toolsets
from hermes_cli.plugins import discover_plugins, get_plugin_toolsets
discover_plugins() # idempotent — ensures plugins are loaded
return {ts_key for ts_key, _, _ in get_plugin_toolsets()}
except Exception:
return set()
@@ -133,8 +135,10 @@ PLATFORMS = {
"signal": {"label": "📡 Signal", "default_toolset": "hermes-signal"},
"homeassistant": {"label": "🏠 Home Assistant", "default_toolset": "hermes-homeassistant"},
"email": {"label": "📧 Email", "default_toolset": "hermes-email"},
"matrix": {"label": "💬 Matrix", "default_toolset": "hermes-matrix"},
"dingtalk": {"label": "💬 DingTalk", "default_toolset": "hermes-dingtalk"},
"api_server": {"label": "🌐 API Server", "default_toolset": "hermes-api-server"},
"mattermost": {"label": "💬 Mattermost", "default_toolset": "hermes-mattermost"},
}
@@ -186,6 +190,14 @@ TOOL_CATEGORIES = {
{"key": "FIRECRAWL_API_KEY", "prompt": "Firecrawl API key", "url": "https://firecrawl.dev"},
],
},
{
"name": "Exa",
"tag": "AI-native search and contents",
"web_backend": "exa",
"env_vars": [
{"key": "EXA_API_KEY", "prompt": "Exa API key", "url": "https://exa.ai"},
],
},
{
"name": "Parallel",
"tag": "AI-native search and extract",
+256
View File
@@ -0,0 +1,256 @@
"""hermes webhook — manage dynamic webhook subscriptions from the CLI.
Usage:
hermes webhook subscribe <name> [options]
hermes webhook list
hermes webhook remove <name>
hermes webhook test <name> [--payload '{"key": "value"}']
Subscriptions persist to ~/.hermes/webhook_subscriptions.json and are
hot-reloaded by the webhook adapter without a gateway restart.
"""
import json
import os
import re
import secrets
import time
from pathlib import Path
from typing import Dict, Optional
_SUBSCRIPTIONS_FILENAME = "webhook_subscriptions.json"
def _hermes_home() -> Path:
return Path(
os.getenv("HERMES_HOME", str(Path.home() / ".hermes"))
).expanduser()
def _subscriptions_path() -> Path:
return _hermes_home() / _SUBSCRIPTIONS_FILENAME
def _load_subscriptions() -> Dict[str, dict]:
path = _subscriptions_path()
if not path.exists():
return {}
try:
data = json.loads(path.read_text(encoding="utf-8"))
return data if isinstance(data, dict) else {}
except Exception:
return {}
def _save_subscriptions(subs: Dict[str, dict]) -> None:
path = _subscriptions_path()
path.parent.mkdir(parents=True, exist_ok=True)
tmp_path = path.with_suffix(".tmp")
tmp_path.write_text(
json.dumps(subs, indent=2, ensure_ascii=False),
encoding="utf-8",
)
os.replace(str(tmp_path), str(path))
def _get_webhook_config() -> dict:
"""Load webhook platform config. Returns {} if not configured."""
try:
from hermes_cli.config import load_config
cfg = load_config()
return cfg.get("platforms", {}).get("webhook", {})
except Exception:
return {}
def _is_webhook_enabled() -> bool:
return bool(_get_webhook_config().get("enabled"))
def _get_webhook_base_url() -> str:
wh = _get_webhook_config().get("extra", {})
host = wh.get("host", "0.0.0.0")
port = wh.get("port", 8644)
display_host = "localhost" if host == "0.0.0.0" else host
return f"http://{display_host}:{port}"
_SETUP_HINT = """
Webhook platform is not enabled. To set it up:
1. Run the gateway setup wizard:
hermes gateway setup
2. Or manually add to ~/.hermes/config.yaml:
platforms:
webhook:
enabled: true
extra:
host: "0.0.0.0"
port: 8644
secret: "your-global-hmac-secret"
3. Or set environment variables in ~/.hermes/.env:
WEBHOOK_ENABLED=true
WEBHOOK_PORT=8644
WEBHOOK_SECRET=your-global-secret
Then start the gateway: hermes gateway run
"""
def _require_webhook_enabled() -> bool:
"""Check webhook is enabled. Print setup guide and return False if not."""
if _is_webhook_enabled():
return True
print(_SETUP_HINT)
return False
def webhook_command(args):
"""Entry point for 'hermes webhook' subcommand."""
sub = getattr(args, "webhook_action", None)
if not sub:
print("Usage: hermes webhook {subscribe|list|remove|test}")
print("Run 'hermes webhook --help' for details.")
return
if not _require_webhook_enabled():
return
if sub in ("subscribe", "add"):
_cmd_subscribe(args)
elif sub in ("list", "ls"):
_cmd_list(args)
elif sub in ("remove", "rm"):
_cmd_remove(args)
elif sub == "test":
_cmd_test(args)
def _cmd_subscribe(args):
name = args.name.strip().lower().replace(" ", "-")
if not re.match(r'^[a-z0-9][a-z0-9_-]*$', name):
print(f"Error: Invalid name '{name}'. Use lowercase alphanumeric with hyphens/underscores.")
return
subs = _load_subscriptions()
is_update = name in subs
secret = args.secret or secrets.token_urlsafe(32)
events = [e.strip() for e in args.events.split(",")] if args.events else []
route = {
"description": args.description or f"Agent-created subscription: {name}",
"events": events,
"secret": secret,
"prompt": args.prompt or "",
"skills": [s.strip() for s in args.skills.split(",")] if args.skills else [],
"deliver": args.deliver or "log",
"created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
}
if args.deliver_chat_id:
route["deliver_extra"] = {"chat_id": args.deliver_chat_id}
subs[name] = route
_save_subscriptions(subs)
base_url = _get_webhook_base_url()
status = "Updated" if is_update else "Created"
print(f"\n {status} webhook subscription: {name}")
print(f" URL: {base_url}/webhooks/{name}")
print(f" Secret: {secret}")
if events:
print(f" Events: {', '.join(events)}")
else:
print(" Events: (all)")
print(f" Deliver: {route['deliver']}")
if route.get("prompt"):
prompt_preview = route["prompt"][:80] + ("..." if len(route["prompt"]) > 80 else "")
print(f" Prompt: {prompt_preview}")
print(f"\n Configure your service to POST to the URL above.")
print(f" Use the secret for HMAC-SHA256 signature validation.")
print(f" The gateway must be running to receive events (hermes gateway run).\n")
def _cmd_list(args):
subs = _load_subscriptions()
if not subs:
print(" No dynamic webhook subscriptions.")
print(" Create one with: hermes webhook subscribe <name>")
return
base_url = _get_webhook_base_url()
print(f"\n {len(subs)} webhook subscription(s):\n")
for name, route in subs.items():
events = ", ".join(route.get("events", [])) or "(all)"
deliver = route.get("deliver", "log")
desc = route.get("description", "")
print(f"{name}")
if desc:
print(f" {desc}")
print(f" URL: {base_url}/webhooks/{name}")
print(f" Events: {events}")
print(f" Deliver: {deliver}")
print()
def _cmd_remove(args):
name = args.name.strip().lower()
subs = _load_subscriptions()
if name not in subs:
print(f" No subscription named '{name}'.")
print(" Note: Static routes from config.yaml cannot be removed here.")
return
del subs[name]
_save_subscriptions(subs)
print(f" Removed webhook subscription: {name}")
def _cmd_test(args):
"""Send a test POST to a webhook route."""
name = args.name.strip().lower()
subs = _load_subscriptions()
if name not in subs:
print(f" No subscription named '{name}'.")
return
route = subs[name]
secret = route.get("secret", "")
base_url = _get_webhook_base_url()
url = f"{base_url}/webhooks/{name}"
payload = args.payload or '{"test": true, "event_type": "test", "message": "Hello from hermes webhook test"}'
import hmac
import hashlib
sig = "sha256=" + hmac.new(
secret.encode(), payload.encode(), hashlib.sha256
).hexdigest()
print(f" Sending test POST to {url}")
try:
import urllib.request
req = urllib.request.Request(
url,
data=payload.encode(),
headers={
"Content-Type": "application/json",
"X-Hub-Signature-256": sig,
"X-GitHub-Event": "test",
},
method="POST",
)
with urllib.request.urlopen(req, timeout=10) as resp:
body = resp.read().decode()
print(f" Response ({resp.status}): {body}")
except Exception as e:
print(f" Error: {e}")
print(" Is the gateway running? (hermes gateway run)")
+21
View File
@@ -17,6 +17,27 @@ def get_hermes_home() -> Path:
return Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
def get_hermes_dir(new_subpath: str, old_name: str) -> Path:
"""Resolve a Hermes subdirectory with backward compatibility.
New installs get the consolidated layout (e.g. ``cache/images``).
Existing installs that already have the old path (e.g. ``image_cache``)
keep using it no migration required.
Args:
new_subpath: Preferred path relative to HERMES_HOME (e.g. ``"cache/images"``).
old_name: Legacy path relative to HERMES_HOME (e.g. ``"image_cache"``).
Returns:
Absolute ``Path`` old location if it exists on disk, otherwise the new one.
"""
home = get_hermes_home()
old_path = home / old_name
if old_path.exists():
return old_path
return home / new_subpath
VALID_REASONING_EFFORTS = ("xhigh", "high", "medium", "low", "minimal")
+304 -84
View File
@@ -15,15 +15,20 @@ Key design decisions:
"""
import json
import logging
import os
import random
import re
import sqlite3
import threading
import time
from pathlib import Path
from hermes_constants import get_hermes_home
from typing import Dict, Any, List, Optional
from typing import Any, Callable, Dict, List, Optional, TypeVar
logger = logging.getLogger(__name__)
T = TypeVar("T")
DEFAULT_DB_PATH = get_hermes_home() / "state.db"
@@ -116,18 +121,38 @@ class SessionDB:
single writer via WAL mode). Each method opens its own cursor.
"""
# ── Write-contention tuning ──
# With multiple hermes processes (gateway + CLI sessions + worktree agents)
# all sharing one state.db, WAL write-lock contention causes visible TUI
# freezes. SQLite's built-in busy handler uses a deterministic sleep
# schedule that causes convoy effects under high concurrency.
#
# Instead, we keep the SQLite timeout short (1s) and handle retries at the
# application level with random jitter, which naturally staggers competing
# writers and avoids the convoy.
_WRITE_MAX_RETRIES = 15
_WRITE_RETRY_MIN_S = 0.020 # 20ms
_WRITE_RETRY_MAX_S = 0.150 # 150ms
# Attempt a PASSIVE WAL checkpoint every N successful writes.
_CHECKPOINT_EVERY_N_WRITES = 50
def __init__(self, db_path: Path = None):
self.db_path = db_path or DEFAULT_DB_PATH
self.db_path.parent.mkdir(parents=True, exist_ok=True)
self._lock = threading.Lock()
self._write_count = 0
self._conn = sqlite3.connect(
str(self.db_path),
check_same_thread=False,
# 30s gives the WAL writer (CLI or gateway) time to finish a batch
# flush before the concurrent reader/writer gives up. 10s was too
# short when the CLI is doing frequent memory flushes.
timeout=30.0,
# Short timeout — application-level retry with random jitter
# handles contention instead of sitting in SQLite's internal
# busy handler for up to 30s.
timeout=1.0,
# Autocommit mode: Python's default isolation_level="" auto-starts
# transactions on DML, which conflicts with our explicit
# BEGIN IMMEDIATE. None = we manage transactions ourselves.
isolation_level=None,
)
self._conn.row_factory = sqlite3.Row
self._conn.execute("PRAGMA journal_mode=WAL")
@@ -135,6 +160,96 @@ class SessionDB:
self._init_schema()
# ── Core write helper ──
def _execute_write(self, fn: Callable[[sqlite3.Connection], T]) -> T:
"""Execute a write transaction with BEGIN IMMEDIATE and jitter retry.
*fn* receives the connection and should perform INSERT/UPDATE/DELETE
statements. The caller must NOT call ``commit()`` that's handled
here after *fn* returns.
BEGIN IMMEDIATE acquires the WAL write lock at transaction start
(not at commit time), so lock contention surfaces immediately.
On ``database is locked``, we release the Python lock, sleep a
random 20-150ms, and retry breaking the convoy pattern that
SQLite's built-in deterministic backoff creates.
Returns whatever *fn* returns.
"""
last_err: Optional[Exception] = None
for attempt in range(self._WRITE_MAX_RETRIES):
try:
with self._lock:
self._conn.execute("BEGIN IMMEDIATE")
try:
result = fn(self._conn)
self._conn.commit()
except BaseException:
try:
self._conn.rollback()
except Exception:
pass
raise
# Success — periodic best-effort checkpoint.
self._write_count += 1
if self._write_count % self._CHECKPOINT_EVERY_N_WRITES == 0:
self._try_wal_checkpoint()
return result
except sqlite3.OperationalError as exc:
err_msg = str(exc).lower()
if "locked" in err_msg or "busy" in err_msg:
last_err = exc
if attempt < self._WRITE_MAX_RETRIES - 1:
jitter = random.uniform(
self._WRITE_RETRY_MIN_S,
self._WRITE_RETRY_MAX_S,
)
time.sleep(jitter)
continue
# Non-lock error or retries exhausted — propagate.
raise
# Retries exhausted (shouldn't normally reach here).
raise last_err or sqlite3.OperationalError(
"database is locked after max retries"
)
def _try_wal_checkpoint(self) -> None:
"""Best-effort PASSIVE WAL checkpoint. Never blocks, never raises.
Flushes committed WAL frames back into the main DB file for any
frames that no other connection currently needs. Keeps the WAL
from growing unbounded when many processes hold persistent
connections.
"""
try:
with self._lock:
result = self._conn.execute(
"PRAGMA wal_checkpoint(PASSIVE)"
).fetchone()
if result and result[1] > 0:
logger.debug(
"WAL checkpoint: %d/%d pages checkpointed",
result[2], result[1],
)
except Exception:
pass # Best effort — never fatal.
def close(self):
"""Close the database connection.
Attempts a PASSIVE WAL checkpoint first so that exiting processes
help keep the WAL file from growing unbounded.
"""
with self._lock:
if self._conn:
try:
self._conn.execute("PRAGMA wal_checkpoint(PASSIVE)")
except Exception:
pass
self._conn.close()
self._conn = None
def _init_schema(self):
"""Create tables and FTS if they don't exist, run migrations."""
cursor = self._conn.cursor()
@@ -256,8 +371,8 @@ class SessionDB:
parent_session_id: str = None,
) -> str:
"""Create a new session record. Returns the session_id."""
with self._lock:
self._conn.execute(
def _do(conn):
conn.execute(
"""INSERT OR IGNORE INTO sessions (id, source, user_id, model, model_config,
system_prompt, parent_session_id, started_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
@@ -272,26 +387,35 @@ class SessionDB:
time.time(),
),
)
self._conn.commit()
self._execute_write(_do)
return session_id
def end_session(self, session_id: str, end_reason: str) -> None:
"""Mark a session as ended."""
with self._lock:
self._conn.execute(
def _do(conn):
conn.execute(
"UPDATE sessions SET ended_at = ?, end_reason = ? WHERE id = ?",
(time.time(), end_reason, session_id),
)
self._conn.commit()
self._execute_write(_do)
def reopen_session(self, session_id: str) -> None:
"""Clear ended_at/end_reason so a session can be resumed."""
def _do(conn):
conn.execute(
"UPDATE sessions SET ended_at = NULL, end_reason = NULL WHERE id = ?",
(session_id,),
)
self._execute_write(_do)
def update_system_prompt(self, session_id: str, system_prompt: str) -> None:
"""Store the full assembled system prompt snapshot."""
with self._lock:
self._conn.execute(
def _do(conn):
conn.execute(
"UPDATE sessions SET system_prompt = ? WHERE id = ?",
(system_prompt, session_id),
)
self._conn.commit()
self._execute_write(_do)
def update_token_counts(
self,
@@ -310,11 +434,39 @@ class SessionDB:
billing_provider: Optional[str] = None,
billing_base_url: Optional[str] = None,
billing_mode: Optional[str] = None,
absolute: bool = False,
) -> None:
"""Increment token counters and backfill model if not already set."""
with self._lock:
self._conn.execute(
"""UPDATE sessions SET
"""Update token counters and backfill model if not already set.
When *absolute* is False (default), values are **incremented** use
this for per-API-call deltas (CLI path).
When *absolute* is True, values are **set directly** use this when
the caller already holds cumulative totals (gateway path, where the
cached agent accumulates across messages).
"""
if absolute:
sql = """UPDATE sessions SET
input_tokens = ?,
output_tokens = ?,
cache_read_tokens = ?,
cache_write_tokens = ?,
reasoning_tokens = ?,
estimated_cost_usd = COALESCE(?, 0),
actual_cost_usd = CASE
WHEN ? IS NULL THEN actual_cost_usd
ELSE ?
END,
cost_status = COALESCE(?, cost_status),
cost_source = COALESCE(?, cost_source),
pricing_version = COALESCE(?, pricing_version),
billing_provider = COALESCE(billing_provider, ?),
billing_base_url = COALESCE(billing_base_url, ?),
billing_mode = COALESCE(billing_mode, ?),
model = COALESCE(model, ?)
WHERE id = ?"""
else:
sql = """UPDATE sessions SET
input_tokens = input_tokens + ?,
output_tokens = output_tokens + ?,
cache_read_tokens = cache_read_tokens + ?,
@@ -332,6 +484,94 @@ class SessionDB:
billing_base_url = COALESCE(billing_base_url, ?),
billing_mode = COALESCE(billing_mode, ?),
model = COALESCE(model, ?)
WHERE id = ?"""
params = (
input_tokens,
output_tokens,
cache_read_tokens,
cache_write_tokens,
reasoning_tokens,
estimated_cost_usd,
actual_cost_usd,
actual_cost_usd,
cost_status,
cost_source,
pricing_version,
billing_provider,
billing_base_url,
billing_mode,
model,
session_id,
)
def _do(conn):
conn.execute(sql, params)
self._execute_write(_do)
def ensure_session(
self,
session_id: str,
source: str = "unknown",
model: str = None,
) -> None:
"""Ensure a session row exists, creating it with minimal metadata if absent.
Used by _flush_messages_to_session_db to recover from a failed
create_session() call (e.g. transient SQLite lock at agent startup).
INSERT OR IGNORE is safe to call even when the row already exists.
"""
def _do(conn):
conn.execute(
"""INSERT OR IGNORE INTO sessions
(id, source, model, started_at)
VALUES (?, ?, ?, ?)""",
(session_id, source, model, time.time()),
)
self._execute_write(_do)
def set_token_counts(
self,
session_id: str,
input_tokens: int = 0,
output_tokens: int = 0,
model: str = None,
cache_read_tokens: int = 0,
cache_write_tokens: int = 0,
reasoning_tokens: int = 0,
estimated_cost_usd: Optional[float] = None,
actual_cost_usd: Optional[float] = None,
cost_status: Optional[str] = None,
cost_source: Optional[str] = None,
pricing_version: Optional[str] = None,
billing_provider: Optional[str] = None,
billing_base_url: Optional[str] = None,
billing_mode: Optional[str] = None,
) -> None:
"""Set token counters to absolute values (not increment).
Use this when the caller provides cumulative totals from a completed
conversation run (e.g. the gateway, where the cached agent's
session_prompt_tokens already reflects the running total).
"""
def _do(conn):
conn.execute(
"""UPDATE sessions SET
input_tokens = ?,
output_tokens = ?,
cache_read_tokens = ?,
cache_write_tokens = ?,
reasoning_tokens = ?,
estimated_cost_usd = ?,
actual_cost_usd = CASE
WHEN ? IS NULL THEN actual_cost_usd
ELSE ?
END,
cost_status = COALESCE(?, cost_status),
cost_source = COALESCE(?, cost_source),
pricing_version = COALESCE(?, pricing_version),
billing_provider = COALESCE(billing_provider, ?),
billing_base_url = COALESCE(billing_base_url, ?),
billing_mode = COALESCE(billing_mode, ?),
model = COALESCE(model, ?)
WHERE id = ?""",
(
input_tokens,
@@ -352,28 +592,7 @@ class SessionDB:
session_id,
),
)
self._conn.commit()
def ensure_session(
self,
session_id: str,
source: str = "unknown",
model: str = None,
) -> None:
"""Ensure a session row exists, creating it with minimal metadata if absent.
Used by _flush_messages_to_session_db to recover from a failed
create_session() call (e.g. transient SQLite lock at agent startup).
INSERT OR IGNORE is safe to call even when the row already exists.
"""
with self._lock:
self._conn.execute(
"""INSERT OR IGNORE INTO sessions
(id, source, model, started_at)
VALUES (?, ?, ?, ?)""",
(session_id, source, model, time.time()),
)
self._conn.commit()
self._execute_write(_do)
def get_session(self, session_id: str) -> Optional[Dict[str, Any]]:
"""Get a session by ID."""
@@ -467,10 +686,10 @@ class SessionDB:
Empty/whitespace-only strings are normalized to None (clearing the title).
"""
title = self.sanitize_title(title)
with self._lock:
def _do(conn):
if title:
# Check uniqueness (allow the same session to keep its own title)
cursor = self._conn.execute(
cursor = conn.execute(
"SELECT id FROM sessions WHERE title = ? AND id != ?",
(title, session_id),
)
@@ -479,12 +698,12 @@ class SessionDB:
raise ValueError(
f"Title '{title}' is already in use by session {conflict['id']}"
)
cursor = self._conn.execute(
cursor = conn.execute(
"UPDATE sessions SET title = ? WHERE id = ?",
(title, session_id),
)
self._conn.commit()
rowcount = cursor.rowcount
return cursor.rowcount
rowcount = self._execute_write(_do)
return rowcount > 0
def get_session_title(self, session_id: str) -> Optional[str]:
@@ -656,17 +875,24 @@ class SessionDB:
Also increments the session's message_count (and tool_call_count
if role is 'tool' or tool_calls is present).
"""
with self._lock:
# Serialize structured fields to JSON for storage
reasoning_details_json = (
json.dumps(reasoning_details)
if reasoning_details else None
)
codex_items_json = (
json.dumps(codex_reasoning_items)
if codex_reasoning_items else None
)
cursor = self._conn.execute(
# Serialize structured fields to JSON before entering the write txn
reasoning_details_json = (
json.dumps(reasoning_details)
if reasoning_details else None
)
codex_items_json = (
json.dumps(codex_reasoning_items)
if codex_reasoning_items else None
)
tool_calls_json = json.dumps(tool_calls) if tool_calls else None
# Pre-compute tool call count
num_tool_calls = 0
if tool_calls is not None:
num_tool_calls = len(tool_calls) if isinstance(tool_calls, list) else 1
def _do(conn):
cursor = conn.execute(
"""INSERT INTO messages (session_id, role, content, tool_call_id,
tool_calls, tool_name, timestamp, token_count, finish_reason,
reasoning, reasoning_details, codex_reasoning_items)
@@ -676,7 +902,7 @@ class SessionDB:
role,
content,
tool_call_id,
json.dumps(tool_calls) if tool_calls else None,
tool_calls_json,
tool_name,
time.time(),
token_count,
@@ -689,25 +915,20 @@ class SessionDB:
msg_id = cursor.lastrowid
# Update counters
# Count actual tool calls from the tool_calls list (not from tool responses).
# A single assistant message can contain multiple parallel tool calls.
num_tool_calls = 0
if tool_calls is not None:
num_tool_calls = len(tool_calls) if isinstance(tool_calls, list) else 1
if num_tool_calls > 0:
self._conn.execute(
conn.execute(
"""UPDATE sessions SET message_count = message_count + 1,
tool_call_count = tool_call_count + ? WHERE id = ?""",
(num_tool_calls, session_id),
)
else:
self._conn.execute(
conn.execute(
"UPDATE sessions SET message_count = message_count + 1 WHERE id = ?",
(session_id,),
)
return msg_id
self._conn.commit()
return msg_id
return self._execute_write(_do)
def get_messages(self, session_id: str) -> List[Dict[str, Any]]:
"""Load all messages for a session, ordered by timestamp."""
@@ -1001,54 +1222,53 @@ class SessionDB:
def clear_messages(self, session_id: str) -> None:
"""Delete all messages for a session and reset its counters."""
with self._lock:
self._conn.execute(
def _do(conn):
conn.execute(
"DELETE FROM messages WHERE session_id = ?", (session_id,)
)
self._conn.execute(
conn.execute(
"UPDATE sessions SET message_count = 0, tool_call_count = 0 WHERE id = ?",
(session_id,),
)
self._conn.commit()
self._execute_write(_do)
def delete_session(self, session_id: str) -> bool:
"""Delete a session and all its messages. Returns True if found."""
with self._lock:
cursor = self._conn.execute(
def _do(conn):
cursor = conn.execute(
"SELECT COUNT(*) FROM sessions WHERE id = ?", (session_id,)
)
if cursor.fetchone()[0] == 0:
return False
self._conn.execute("DELETE FROM messages WHERE session_id = ?", (session_id,))
self._conn.execute("DELETE FROM sessions WHERE id = ?", (session_id,))
self._conn.commit()
conn.execute("DELETE FROM messages WHERE session_id = ?", (session_id,))
conn.execute("DELETE FROM sessions WHERE id = ?", (session_id,))
return True
return self._execute_write(_do)
def prune_sessions(self, older_than_days: int = 90, source: str = None) -> int:
"""
Delete sessions older than N days. Returns count of deleted sessions.
Only prunes ended sessions (not active ones).
"""
import time as _time
cutoff = _time.time() - (older_than_days * 86400)
cutoff = time.time() - (older_than_days * 86400)
with self._lock:
def _do(conn):
if source:
cursor = self._conn.execute(
cursor = conn.execute(
"""SELECT id FROM sessions
WHERE started_at < ? AND ended_at IS NOT NULL AND source = ?""",
(cutoff, source),
)
else:
cursor = self._conn.execute(
cursor = conn.execute(
"SELECT id FROM sessions WHERE started_at < ? AND ended_at IS NOT NULL",
(cutoff,),
)
session_ids = [row["id"] for row in cursor.fetchall()]
for sid in session_ids:
self._conn.execute("DELETE FROM messages WHERE session_id = ?", (sid,))
self._conn.execute("DELETE FROM sessions WHERE id = ?", (sid,))
conn.execute("DELETE FROM messages WHERE session_id = ?", (sid,))
conn.execute("DELETE FROM sessions WHERE id = ?", (sid,))
return len(session_ids)
self._conn.commit()
return len(session_ids)
return self._execute_write(_do)
+2 -2
View File
@@ -270,7 +270,7 @@ def cmd_status(args) -> None:
print(f" {peer}: {mode}")
print(f" Write freq: {hcfg.write_frequency}")
if hcfg.enabled and hcfg.api_key:
if hcfg.enabled and (hcfg.api_key or hcfg.base_url):
print("\n Connection... ", end="", flush=True)
try:
get_honcho_client(hcfg)
@@ -278,7 +278,7 @@ def cmd_status(args) -> None:
except Exception as e:
print(f"FAILED ({e})\n")
else:
reason = "disabled" if not hcfg.enabled else "no API key"
reason = "disabled" if not hcfg.enabled else "no API key or base URL"
print(f"\n Not connected ({reason})\n")
+10 -1
View File
@@ -417,9 +417,18 @@ def get_honcho_client(config: HonchoClientConfig | None = None) -> Honcho:
else:
logger.info("Initializing Honcho client (host: %s, workspace: %s)", config.host, config.workspace_id)
# Local Honcho instances don't require an API key, but the SDK
# expects a non-empty string. Use a placeholder for local URLs.
_is_local = resolved_base_url and (
"localhost" in resolved_base_url
or "127.0.0.1" in resolved_base_url
or "::1" in resolved_base_url
)
effective_api_key = config.api_key or ("local" if _is_local else None)
kwargs: dict = {
"workspace_id": config.workspace_id,
"api_key": config.api_key,
"api_key": effective_api_key,
"environment": config.environment,
}
if resolved_base_url:
+46 -8
View File
@@ -10,6 +10,12 @@
# container recreation. Environment variables are written to $HERMES_HOME/.env
# and read by hermes at startup — no container recreation needed for env changes.
#
# Tool resolution: the hermes wrapper uses --suffix PATH for nix store tools,
# so apt/uv-installed versions take priority. The container entrypoint provisions
# extensible tools on first boot: nodejs/npm via apt, uv via curl, and a Python
# 3.11 venv (bootstrapped entirely by uv) at ~/.venv with pip seeded. Agents get
# writable tool prefixes for npm i -g, pip install, uv tool install, etc.
#
# Usage:
# services.hermes-agent = {
# enable = true;
@@ -105,22 +111,52 @@
fi
mkdir -p "$TARGET_HOME"
chown "$HERMES_UID:$HERMES_GID" "$TARGET_HOME"
chmod 0750 "$TARGET_HOME"
# Ensure HERMES_HOME is owned by the target user
if [ -n "''${HERMES_HOME:-}" ] && [ -d "$HERMES_HOME" ]; then
chown -R "$HERMES_UID:$HERMES_GID" "$HERMES_HOME"
fi
# Install sudo on Debian/Ubuntu if missing (first boot only, cached in writable layer)
if command -v apt-get >/dev/null 2>&1 && ! command -v sudo >/dev/null 2>&1; then
apt-get update -qq >/dev/null 2>&1 && apt-get install -y -qq sudo >/dev/null 2>&1 || true
# ── Provision apt packages (first boot only, cached in writable layer) ──
# sudo: agent self-modification
# nodejs/npm: writable node so npm i -g works (nix store copies are read-only)
# curl: needed for uv installer
if [ ! -f /var/lib/hermes-tools-provisioned ] && command -v apt-get >/dev/null 2>&1; then
echo "First boot: provisioning agent tools..."
apt-get update -qq
apt-get install -y -qq sudo nodejs npm curl
touch /var/lib/hermes-tools-provisioned
fi
if command -v sudo >/dev/null 2>&1 && [ ! -f /etc/sudoers.d/hermes ]; then
mkdir -p /etc/sudoers.d
echo "$TARGET_USER ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/hermes
chmod 0440 /etc/sudoers.d/hermes
fi
# uv (Python manager) — not in Ubuntu repos, retry-safe outside the sentinel
if ! command -v uv >/dev/null 2>&1 && [ ! -x "$TARGET_HOME/.local/bin/uv" ] && command -v curl >/dev/null 2>&1; then
su -s /bin/sh "$TARGET_USER" -c 'curl -LsSf https://astral.sh/uv/install.sh | sh' || true
fi
# Python 3.11 venv — gives the agent a writable Python with pip.
# Uses uv to install Python 3.11 (Ubuntu 24.04 ships 3.12).
# --seed includes pip/setuptools so bare `pip install` works.
_UV_BIN="$TARGET_HOME/.local/bin/uv"
if [ ! -d "$TARGET_HOME/.venv" ] && [ -x "$_UV_BIN" ]; then
su -s /bin/sh "$TARGET_USER" -c "
export PATH=\"\$HOME/.local/bin:\$PATH\"
uv python install 3.11
uv venv --python 3.11 --seed \"\$HOME/.venv\"
" || true
fi
# Put the agent venv first on PATH so python/pip resolve to writable copies
if [ -d "$TARGET_HOME/.venv/bin" ]; then
export PATH="$TARGET_HOME/.venv/bin:$PATH"
fi
if command -v setpriv >/dev/null 2>&1; then
exec setpriv --reuid="$HERMES_UID" --regid="$HERMES_GID" --init-groups "$@"
elif command -v su >/dev/null 2>&1; then
@@ -516,8 +552,8 @@
# ── Directories ───────────────────────────────────────────────────
{
systemd.tmpfiles.rules = [
"d ${cfg.stateDir} 0755 ${cfg.user} ${cfg.group} - -"
"d ${cfg.stateDir}/.hermes 0755 ${cfg.user} ${cfg.group} - -"
"d ${cfg.stateDir} 0750 ${cfg.user} ${cfg.group} - -"
"d ${cfg.stateDir}/.hermes 0750 ${cfg.user} ${cfg.group} - -"
"d ${cfg.stateDir}/home 0750 ${cfg.user} ${cfg.group} - -"
"d ${cfg.workingDirectory} 0750 ${cfg.user} ${cfg.group} - -"
];
@@ -531,21 +567,23 @@
mkdir -p ${cfg.stateDir}/home
mkdir -p ${cfg.workingDirectory}
chown ${cfg.user}:${cfg.group} ${cfg.stateDir} ${cfg.stateDir}/.hermes ${cfg.stateDir}/home ${cfg.workingDirectory}
chmod 0750 ${cfg.stateDir} ${cfg.stateDir}/.hermes ${cfg.stateDir}/home ${cfg.workingDirectory}
# Merge Nix settings into existing config.yaml.
# Preserves user-added keys (skills, streaming, etc.); Nix keys win.
# If configFile is user-provided (not generated), overwrite instead of merge.
${if cfg.configFile != null then ''
install -o ${cfg.user} -g ${cfg.group} -m 0644 -D ${configFile} ${cfg.stateDir}/.hermes/config.yaml
install -o ${cfg.user} -g ${cfg.group} -m 0640 -D ${configFile} ${cfg.stateDir}/.hermes/config.yaml
'' else ''
${configMergeScript} ${generatedConfigFile} ${cfg.stateDir}/.hermes/config.yaml
chown ${cfg.user}:${cfg.group} ${cfg.stateDir}/.hermes/config.yaml
chmod 0644 ${cfg.stateDir}/.hermes/config.yaml
chmod 0640 ${cfg.stateDir}/.hermes/config.yaml
''}
# Managed mode marker (so interactive shells also detect NixOS management)
touch ${cfg.stateDir}/.hermes/.managed
chown ${cfg.user}:${cfg.group} ${cfg.stateDir}/.hermes/.managed
chmod 0644 ${cfg.stateDir}/.hermes/.managed
# Seed auth file if provided
${lib.optionalString (cfg.authFile != null) ''
@@ -577,7 +615,7 @@ HERMES_NIX_ENV_EOF
# Link documents into workspace
${lib.concatStringsSep "\n" (lib.mapAttrsToList (name: _value: ''
install -o ${cfg.user} -g ${cfg.group} -m 0644 ${documentDerivation}/${name} ${cfg.workingDirectory}/${name}
install -o ${cfg.user} -g ${cfg.group} -m 0640 ${documentDerivation}/${name} ${cfg.workingDirectory}/${name}
'') cfg.documents)}
'';
}
+1 -1
View File
@@ -35,7 +35,7 @@
${pkgs.lib.concatMapStringsSep "\n" (name: ''
makeWrapper ${hermesVenv}/bin/${name} $out/bin/${name} \
--prefix PATH : "${runtimePath}" \
--suffix PATH : "${runtimePath}" \
--set HERMES_BUNDLED_SKILLS $out/share/hermes-agent/skills
'') [ "hermes" "hermes-agent" "hermes-acp" ]}
+4 -3
View File
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "hermes-agent"
version = "0.4.0"
version = "0.5.0"
description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
readme = "README.md"
requires-python = ">=3.11"
@@ -26,6 +26,7 @@ dependencies = [
# Interactive CLI (prompt_toolkit is used directly by cli.py)
"prompt_toolkit>=3.0.52,<4",
# Tools
"exa-py>=2.9.0,<3",
"firecrawl-py>=4.16.0,<5",
"parallel-web>=0.4.2,<1",
"fal-client>=0.13.1,<1",
@@ -37,7 +38,7 @@ dependencies = [
]
[project.optional-dependencies]
modal = ["swe-rex[modal]>=1.4.0,<2"]
modal = ["modal>=1.0.0,<2"]
daytona = ["daytona>=0.148.0,<1"]
dev = ["pytest>=9.0.2,<10", "pytest-asyncio>=1.3.0,<2", "pytest-xdist>=3.0,<4", "mcp>=1.2.0,<2"]
messaging = ["python-telegram-bot>=22.6,<23", "discord.py[voice]>=2.7.1,<3", "aiohttp>=3.13.3,<4", "slack-bolt>=1.18.0,<2", "slack-sdk>=3.27.0,<4"]
@@ -55,7 +56,7 @@ honcho = ["honcho-ai>=2.0.1,<3"]
mcp = ["mcp>=1.2.0,<2"]
homeassistant = ["aiohttp>=3.9.0,<4"]
sms = ["aiohttp>=3.9.0,<4"]
acp = ["agent-client-protocol>=0.8.1,<1.0"]
acp = ["agent-client-protocol>=0.8.1,<0.9"]
dingtalk = ["dingtalk-stream>=0.1.0,<1"]
rl = [
"atroposlib @ git+https://github.com/NousResearch/atropos.git",
+563 -36
View File
@@ -62,7 +62,12 @@ else:
# Import our tool system
from model_tools import get_tool_definitions, handle_function_call, check_toolset_requirements
from model_tools import (
get_tool_definitions,
get_toolset_for_tool,
handle_function_call,
check_toolset_requirements,
)
from tools.terminal_tool import cleanup_vm
from tools.interrupt import set_interrupt as _set_interrupt
from tools.browser_tool import cleanup_browser
@@ -83,7 +88,7 @@ from agent.model_metadata import (
)
from agent.context_compressor import ContextCompressor
from agent.prompt_caching import apply_anthropic_cache_control
from agent.prompt_builder import build_skills_system_prompt, build_context_files_prompt, load_soul_md
from agent.prompt_builder import build_skills_system_prompt, build_context_files_prompt, load_soul_md, TOOL_USE_ENFORCEMENT_GUIDANCE, TOOL_USE_ENFORCEMENT_MODELS
from agent.usage_pricing import estimate_usage_cost, normalize_usage
from agent.display import (
KawaiiSpinner, build_tool_preview as _build_tool_preview,
@@ -356,6 +361,85 @@ def _inject_honcho_turn_context(content, turn_context: str):
return f"{text}\n\n{note}"
# Budget warning text patterns injected by _get_budget_warning().
_BUDGET_WARNING_RE = re.compile(
r"\[BUDGET(?:\s+WARNING)?:\s+Iteration\s+\d+/\d+\..*?\]",
re.DOTALL,
)
# Regex to match lone surrogate code points (U+D800..U+DFFF).
# These are invalid in UTF-8 and cause UnicodeEncodeError when the OpenAI SDK
# serialises messages to JSON. Common source: clipboard paste from Google Docs
# or other rich-text editors on some platforms.
_SURROGATE_RE = re.compile(r'[\ud800-\udfff]')
def _sanitize_surrogates(text: str) -> str:
"""Replace lone surrogate code points with U+FFFD (replacement character).
Surrogates are invalid in UTF-8 and will crash ``json.dumps()`` inside the
OpenAI SDK. This is a fast no-op when the text contains no surrogates.
"""
if _SURROGATE_RE.search(text):
return _SURROGATE_RE.sub('\ufffd', text)
return text
def _sanitize_messages_surrogates(messages: list) -> bool:
"""Sanitize surrogate characters from all string content in a messages list.
Walks message dicts in-place. Returns True if any surrogates were found
and replaced, False otherwise.
"""
found = False
for msg in messages:
if not isinstance(msg, dict):
continue
content = msg.get("content")
if isinstance(content, str) and _SURROGATE_RE.search(content):
msg["content"] = _SURROGATE_RE.sub('\ufffd', content)
found = True
elif isinstance(content, list):
for part in content:
if isinstance(part, dict):
text = part.get("text")
if isinstance(text, str) and _SURROGATE_RE.search(text):
part["text"] = _SURROGATE_RE.sub('\ufffd', text)
found = True
return found
def _strip_budget_warnings_from_history(messages: list) -> None:
"""Remove budget pressure warnings from tool-result messages in-place.
Budget warnings are turn-scoped signals that must not leak into replayed
history. They live in tool-result ``content`` either as a JSON key
(``_budget_warning``) or appended plain text.
"""
for msg in messages:
if not isinstance(msg, dict) or msg.get("role") != "tool":
continue
content = msg.get("content")
if not isinstance(content, str) or "_budget_warning" not in content and "[BUDGET" not in content:
continue
# Try JSON first (the common case: _budget_warning key in a dict)
try:
parsed = json.loads(content)
if isinstance(parsed, dict) and "_budget_warning" in parsed:
del parsed["_budget_warning"]
msg["content"] = json.dumps(parsed, ensure_ascii=False)
continue
except (json.JSONDecodeError, TypeError):
pass
# Fallback: strip the text pattern from plain-text tool results
cleaned = _BUDGET_WARNING_RE.sub("", content).strip()
if cleaned != content:
msg["content"] = cleaned
class AIAgent:
"""
AI Agent with tool calling capabilities.
@@ -486,6 +570,7 @@ class AIAgent:
# instead of going directly to stdout where patch_stdout's StdoutProxy
# would mangle the escape sequences. None = use builtins.print.
self._print_fn = None
self.background_review_callback = None # Optional sync callback for gateway delivery
self.skip_context_files = skip_context_files
self.pass_session_id = pass_session_id
self.log_prefix_chars = log_prefix_chars
@@ -533,6 +618,7 @@ class AIAgent:
self.tool_progress_callback = tool_progress_callback
self.thinking_callback = thinking_callback
self.reasoning_callback = reasoning_callback
self._reasoning_deltas_fired = False # Set by _fire_reasoning_delta, reset per API call
self.clarify_callback = clarify_callback
self.step_callback = step_callback
self.stream_delta_callback = stream_delta_callback
@@ -775,6 +861,25 @@ class AIAgent:
}
self._client_kwargs = client_kwargs # stored for rebuilding after interrupt
# Enable fine-grained tool streaming for Claude on OpenRouter.
# Without this, Anthropic buffers the entire tool call and goes
# silent for minutes while thinking — OpenRouter's upstream proxy
# times out during the silence. The beta header makes Anthropic
# stream tool call arguments token-by-token, keeping the
# connection alive.
_effective_base = str(client_kwargs.get("base_url", "")).lower()
if "openrouter" in _effective_base and "claude" in (self.model or "").lower():
headers = client_kwargs.get("default_headers") or {}
existing_beta = headers.get("x-anthropic-beta", "")
_FINE_GRAINED = "fine-grained-tool-streaming-2025-05-14"
if _FINE_GRAINED not in existing_beta:
if existing_beta:
headers["x-anthropic-beta"] = f"{existing_beta},{_FINE_GRAINED}"
else:
headers["x-anthropic-beta"] = _FINE_GRAINED
client_kwargs["default_headers"] = headers
self.api_key = client_kwargs.get("api_key", "")
try:
self.client = self._create_openai_client(client_kwargs, reason="agent_init", shared=True)
@@ -979,8 +1084,8 @@ class AIAgent:
else:
if not hcfg.enabled:
logger.debug("Honcho disabled in global config")
elif not hcfg.api_key:
logger.debug("Honcho enabled but no API key configured")
elif not (hcfg.api_key or hcfg.base_url):
logger.debug("Honcho enabled but no API key or base URL configured")
else:
logger.debug("Honcho enabled but missing API key or disabled in config")
except Exception as e:
@@ -1017,6 +1122,13 @@ class AIAgent:
except Exception:
pass
# Tool-use enforcement config: "auto" (default — matches hardcoded
# model list), true (always), false (never), or list of substrings.
_agent_section = _agent_cfg.get("agent", {})
if not isinstance(_agent_section, dict):
_agent_section = {}
self._tool_use_enforcement = _agent_section.get("tool_use_enforcement", "auto")
# Initialize context compressor for automatic context management
# Compresses conversation when approaching model's context limit
# Configuration via config.yaml (compression section)
@@ -1525,6 +1637,12 @@ class AIAgent:
if actions:
summary = " · ".join(dict.fromkeys(actions))
self._safe_print(f" 💾 {summary}")
_bg_cb = self.background_review_callback
if _bg_cb:
try:
_bg_cb(f"💾 {summary}")
except Exception:
pass
except Exception as e:
logger.debug("Background memory/skill review failed: %s", e)
@@ -2048,6 +2166,23 @@ class AIAgent:
msg["content"] = self._clean_session_content(msg["content"])
cleaned.append(msg)
# Guard: never overwrite a larger session log with fewer messages.
# This protects against data loss when --resume loads a session whose
# messages weren't fully written to SQLite — the resumed agent starts
# with partial history and would otherwise clobber the full JSON log.
if self.session_log_file.exists():
try:
existing = json.loads(self.session_log_file.read_text(encoding="utf-8"))
existing_count = existing.get("message_count", len(existing.get("messages", [])))
if existing_count > len(cleaned):
logging.debug(
"Skipping session log overwrite: existing has %d messages, current has %d",
existing_count, len(cleaned),
)
return
except Exception:
pass # corrupted existing file — allow the overwrite
entry = {
"session_id": self.session_id,
"model": self.model,
@@ -2157,8 +2292,14 @@ class AIAgent:
# ── Honcho integration helpers ──
def _honcho_should_activate(self, hcfg) -> bool:
"""Return True when remote Honcho should be active."""
if not hcfg or not hcfg.enabled or not hcfg.api_key:
"""Return True when Honcho should be active.
Self-hosted Honcho may be configured with a base_url and no API key,
so activation should accept either credential style.
"""
if not hcfg or not hcfg.enabled:
return False
if not (hcfg.api_key or hcfg.base_url):
return False
return True
@@ -2424,6 +2565,30 @@ class AIAgent:
if tool_guidance:
prompt_parts.append(" ".join(tool_guidance))
# Tool-use enforcement: tells the model to actually call tools instead
# of describing intended actions. Controlled by config.yaml
# agent.tool_use_enforcement:
# "auto" (default) — matches TOOL_USE_ENFORCEMENT_MODELS
# true — always inject (all models)
# false — never inject
# list — custom model-name substrings to match
if self.valid_tool_names:
_enforce = self._tool_use_enforcement
_inject = False
if _enforce is True or (isinstance(_enforce, str) and _enforce.lower() in ("true", "always", "yes", "on")):
_inject = True
elif _enforce is False or (isinstance(_enforce, str) and _enforce.lower() in ("false", "never", "no", "off")):
_inject = False
elif isinstance(_enforce, list):
model_lower = (self.model or "").lower()
_inject = any(p.lower() in model_lower for p in _enforce if isinstance(p, str))
else:
# "auto" or any unrecognised value — use hardcoded defaults
model_lower = (self.model or "").lower()
_inject = any(p in model_lower for p in TOOL_USE_ENFORCEMENT_MODELS)
if _inject:
prompt_parts.append(TOOL_USE_ENFORCEMENT_GUIDANCE)
# Honcho CLI awareness: tell Hermes about its own management commands
# so it can refer the user to them rather than reinventing answers.
if self._honcho and self._honcho_session_key:
@@ -2496,7 +2661,13 @@ class AIAgent:
has_skills_tools = any(name in self.valid_tool_names for name in ['skills_list', 'skill_view', 'skill_manage'])
if has_skills_tools:
avail_toolsets = {ts for ts, avail in check_toolset_requirements().items() if avail}
avail_toolsets = {
toolset
for toolset in (
get_toolset_for_tool(tool_name) for tool_name in self.valid_tool_names
)
if toolset
}
skills_prompt = build_skills_system_prompt(
available_tools=self.valid_tool_names,
available_toolsets=avail_toolsets,
@@ -3380,6 +3551,7 @@ class AIAgent:
max_stream_retries = 1
has_tool_calls = False
first_delta_fired = False
self._reasoning_deltas_fired = False
for attempt in range(max_stream_retries + 1):
try:
with active_client.responses.stream(**api_kwargs) as stream:
@@ -3656,6 +3828,7 @@ class AIAgent:
def _fire_reasoning_delta(self, text: str) -> None:
"""Fire reasoning callback if registered."""
self._reasoning_deltas_fired = True
cb = self.reasoning_callback
if cb is not None:
try:
@@ -3734,7 +3907,7 @@ class AIAgent:
def _call_chat_completions():
"""Stream a chat completions response."""
import httpx as _httpx
_base_timeout = float(os.getenv("HERMES_API_TIMEOUT", 900.0))
_base_timeout = float(os.getenv("HERMES_API_TIMEOUT", 1800.0))
_stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 60.0))
stream_kwargs = {
**api_kwargs,
@@ -3750,16 +3923,28 @@ class AIAgent:
request_client_holder["client"] = self._create_request_openai_client(
reason="chat_completion_stream_request"
)
# Reset stale-stream timer so the detector measures from this
# attempt's start, not a previous attempt's last chunk.
last_chunk_time["t"] = time.time()
stream = request_client_holder["client"].chat.completions.create(**stream_kwargs)
content_parts: list = []
tool_calls_acc: dict = {}
tool_gen_notified: set = set()
# Ollama-compatible endpoints reuse index 0 for every tool call
# in a parallel batch, distinguishing them only by id. Track
# the last seen id per raw index so we can detect a new tool
# call starting at the same index and redirect it to a fresh slot.
_last_id_at_idx: dict = {} # raw_index -> last seen non-empty id
_active_slot_by_idx: dict = {} # raw_index -> current slot in tool_calls_acc
finish_reason = None
model_name = None
role = "assistant"
reasoning_parts: list = []
usage_obj = None
# Reset per-call reasoning tracking so _build_assistant_message
# knows whether reasoning was already displayed during streaming.
self._reasoning_deltas_fired = False
for chunk in stream:
last_chunk_time["t"] = time.time()
@@ -3793,11 +3978,45 @@ class AIAgent:
_fire_first_delta()
self._fire_stream_delta(delta.content)
deltas_were_sent["yes"] = True
else:
# Tool calls suppress regular content streaming (avoids
# displaying chatty "I'll use the tool..." text alongside
# tool calls). But reasoning tags embedded in suppressed
# content should still reach the display — otherwise the
# reasoning box only appears as a post-response fallback,
# rendering it confusingly after the already-streamed
# response. Route suppressed content through the stream
# delta callback so its tag extraction can fire the
# reasoning display. Non-reasoning text is harmlessly
# suppressed by the CLI's _stream_delta when the stream
# box is already closed (tool boundary flush).
if self.stream_delta_callback:
try:
self.stream_delta_callback(delta.content)
except Exception:
pass
# Accumulate tool call deltas — notify display on first name
if delta and delta.tool_calls:
for tc_delta in delta.tool_calls:
idx = tc_delta.index if tc_delta.index is not None else 0
raw_idx = tc_delta.index if tc_delta.index is not None else 0
delta_id = tc_delta.id or ""
# Ollama fix: detect a new tool call reusing the same
# raw index (different id) and redirect to a fresh slot.
if raw_idx not in _active_slot_by_idx:
_active_slot_by_idx[raw_idx] = raw_idx
if (
delta_id
and raw_idx in _last_id_at_idx
and delta_id != _last_id_at_idx[raw_idx]
):
new_slot = max(tool_calls_acc, default=-1) + 1
_active_slot_by_idx[raw_idx] = new_slot
if delta_id:
_last_id_at_idx[raw_idx] = delta_id
idx = _active_slot_by_idx[raw_idx]
if idx not in tool_calls_acc:
tool_calls_acc[idx] = {
"id": tc_delta.id or "",
@@ -3879,7 +4098,10 @@ class AIAgent:
works unchanged.
"""
has_tool_use = False
self._reasoning_deltas_fired = False
# Reset stale-stream timer for this attempt
last_chunk_time["t"] = time.time()
# Use the Anthropic SDK's streaming context manager
with self._anthropic_client.messages.stream(**api_kwargs) as stream:
for event in stream:
@@ -3947,7 +4169,37 @@ class AIAgent:
e, (_httpx.ConnectError, _httpx.RemoteProtocolError, ConnectionError)
)
if _is_timeout or _is_conn_err:
# SSE error events from proxies (e.g. OpenRouter sends
# {"error":{"message":"Network connection lost."}}) are
# raised as APIError by the OpenAI SDK. These are
# semantically identical to httpx connection drops —
# the upstream stream died — and should be retried with
# a fresh connection. Distinguish from HTTP errors:
# APIError from SSE has no status_code, while
# APIStatusError (4xx/5xx) always has one.
_is_sse_conn_err = False
if not _is_timeout and not _is_conn_err:
from openai import APIError as _APIError
if isinstance(e, _APIError) and not getattr(e, "status_code", None):
_err_lower_sse = str(e).lower()
_SSE_CONN_PHRASES = (
"connection lost",
"connection reset",
"connection closed",
"connection terminated",
"network error",
"network connection",
"terminated",
"peer closed",
"broken pipe",
"upstream connect error",
)
_is_sse_conn_err = any(
phrase in _err_lower_sse
for phrase in _SSE_CONN_PHRASES
)
if _is_timeout or _is_conn_err or _is_sse_conn_err:
# Transient network / timeout error. Retry the
# streaming request with a fresh connection first.
if _stream_attempt < _max_stream_retries:
@@ -3992,6 +4244,10 @@ class AIAgent:
)
try:
# Reset stale timer — the non-streaming fallback
# uses its own client; prevent the stale detector
# from firing on stale timestamps from failed streams.
last_chunk_time["t"] = time.time()
result["response"] = self._interruptible_api_call(api_kwargs)
except Exception as fallback_err:
result["error"] = fallback_err
@@ -4001,7 +4257,19 @@ class AIAgent:
if request_client is not None:
self._close_request_openai_client(request_client, reason="stream_request_complete")
_stream_stale_timeout = float(os.getenv("HERMES_STREAM_STALE_TIMEOUT", 90.0))
_stream_stale_timeout_base = float(os.getenv("HERMES_STREAM_STALE_TIMEOUT", 180.0))
# Scale the stale timeout for large contexts: slow models (like Opus)
# can legitimately think for minutes before producing the first token
# when the context is large. Without this, the stale detector kills
# healthy connections during the model's thinking phase, producing
# spurious RemoteProtocolError ("peer closed connection").
_est_tokens = sum(len(str(v)) for v in api_kwargs.get("messages", [])) // 4
if _est_tokens > 100_000:
_stream_stale_timeout = max(_stream_stale_timeout_base, 300.0)
elif _est_tokens > 50_000:
_stream_stale_timeout = max(_stream_stale_timeout_base, 240.0)
else:
_stream_stale_timeout = _stream_stale_timeout_base
t = threading.Thread(target=_call, daemon=True)
t.start()
@@ -4127,6 +4395,25 @@ class AIAgent:
or is_native_anthropic
)
# Update context compressor limits for the fallback model.
# Without this, compression decisions use the primary model's
# context window (e.g. 200K) instead of the fallback's (e.g. 32K),
# causing oversized sessions to overflow the fallback.
if hasattr(self, 'context_compressor') and self.context_compressor:
from agent.model_metadata import get_model_context_length
fb_context_length = get_model_context_length(
self.model, base_url=self.base_url,
api_key=self.api_key, provider=self.provider,
)
self.context_compressor.model = self.model
self.context_compressor.base_url = self.base_url
self.context_compressor.api_key = self.api_key
self.context_compressor.provider = self.provider
self.context_compressor.context_length = fb_context_length
self.context_compressor.threshold_tokens = int(
fb_context_length * self.context_compressor.threshold_percent
)
self._emit_status(
f"🔄 Primary model failed — switching to fallback: "
f"{fb_model} via {fb_provider}"
@@ -4296,6 +4583,10 @@ class AIAgent:
if self.api_mode == "anthropic_messages":
from agent.anthropic_adapter import build_anthropic_kwargs
anthropic_messages = self._prepare_anthropic_messages_for_api(api_messages)
# Pass context_length so the adapter can clamp max_tokens if the
# user configured a smaller context window than the model's output limit.
ctx_len = getattr(self, "context_compressor", None)
ctx_len = ctx_len.context_length if ctx_len else None
return build_anthropic_kwargs(
model=self.model,
messages=anthropic_messages,
@@ -4304,6 +4595,7 @@ class AIAgent:
reasoning_config=self.reasoning_config,
is_oauth=self._is_anthropic_oauth,
preserve_dots=self._anthropic_preserve_dots(),
context_length=ctx_len,
)
if self.api_mode == "codex_responses":
@@ -4415,11 +4707,25 @@ class AIAgent:
"model": self.model,
"messages": sanitized_messages,
"tools": self.tools if self.tools else None,
"timeout": float(os.getenv("HERMES_API_TIMEOUT", 900.0)),
"timeout": float(os.getenv("HERMES_API_TIMEOUT", 1800.0)),
}
if self.max_tokens is not None:
api_kwargs.update(self._max_tokens_param(self.max_tokens))
elif self._is_openrouter_url() and "claude" in (self.model or "").lower():
# OpenRouter translates requests to Anthropic's Messages API,
# which requires max_tokens as a mandatory field. When we omit
# it, OpenRouter picks a default that can be too low — the model
# spends its output budget on thinking and has almost nothing
# left for the actual response (especially large tool calls like
# write_file). Sending the model's real output limit ensures
# full capacity. Other providers handle the default fine.
try:
from agent.anthropic_adapter import _get_anthropic_max_output
_model_output_limit = _get_anthropic_max_output(self.model)
api_kwargs["max_tokens"] = _model_output_limit
except Exception:
pass # fail open — let OpenRouter pick its default
extra_body = {}
@@ -4555,11 +4861,15 @@ class AIAgent:
logging.debug(f"Captured reasoning ({len(reasoning_text)} chars): {reasoning_text}")
if reasoning_text and self.reasoning_callback:
# Skip callback for <think>-extracted reasoning when streaming is active.
# _stream_delta() already displayed <think> blocks during streaming;
# firing the callback again would cause duplicate display.
# Structured reasoning (from reasoning_content field) always fires.
if _from_structured or not self.stream_delta_callback:
# Skip callback when streaming is active — reasoning was already
# displayed during the stream via one of two paths:
# (a) _fire_reasoning_delta (structured reasoning_content deltas)
# (b) _stream_delta tag extraction (<think>/<REASONING_SCRATCHPAD>)
# When streaming is NOT active, always fire so non-streaming modes
# (gateway, batch, quiet) still get reasoning.
# Any reasoning that wasn't shown during streaming is caught by the
# CLI post-response display fallback (cli.py _reasoning_shown_this_turn).
if not self.stream_delta_callback:
try:
self.reasoning_callback(reasoning_text)
except Exception:
@@ -5080,7 +5390,7 @@ class AIAgent:
spinner = None
if self.quiet_mode and not self.tool_progress_callback:
face = random.choice(KawaiiSpinner.KAWAII_WAITING)
spinner = KawaiiSpinner(f"{face} ⚡ running {num_tools} tools concurrently", spinner_type='dots')
spinner = KawaiiSpinner(f"{face} ⚡ running {num_tools} tools concurrently", spinner_type='dots', print_fn=self._print_fn)
spinner.start()
try:
@@ -5121,7 +5431,7 @@ class AIAgent:
# Print cute message per tool
if self.quiet_mode:
cute_msg = _get_cute_tool_message_impl(name, args, tool_duration, result=function_result)
print(f" {cute_msg}")
self._safe_print(f" {cute_msg}")
elif not self.quiet_mode:
if self.verbose_logging:
print(f" ✅ Tool {i+1} completed in {tool_duration:.2f}s")
@@ -5306,7 +5616,7 @@ class AIAgent:
spinner = None
if self.quiet_mode and not self.tool_progress_callback:
face = random.choice(KawaiiSpinner.KAWAII_WAITING)
spinner = KawaiiSpinner(f"{face} {spinner_label}", spinner_type='dots')
spinner = KawaiiSpinner(f"{face} {spinner_label}", spinner_type='dots', print_fn=self._print_fn)
spinner.start()
self._delegate_spinner = spinner
_delegate_result = None
@@ -5336,7 +5646,7 @@ class AIAgent:
preview = _build_tool_preview(function_name, function_args) or function_name
if len(preview) > 30:
preview = preview[:27] + "..."
spinner = KawaiiSpinner(f"{face} {emoji} {preview}", spinner_type='dots')
spinner = KawaiiSpinner(f"{face} {emoji} {preview}", spinner_type='dots', print_fn=self._print_fn)
spinner.start()
_spinner_result = None
try:
@@ -5697,6 +6007,14 @@ class AIAgent:
# Installed once, transparent when streams are healthy, prevents crash on write.
_install_safe_stdio()
# Sanitize surrogate characters from user input. Clipboard paste from
# rich-text editors (Google Docs, Word, etc.) can inject lone surrogates
# that are invalid UTF-8 and crash JSON serialization in the OpenAI SDK.
if isinstance(user_message, str):
user_message = _sanitize_surrogates(user_message)
if isinstance(persist_user_message, str):
persist_user_message = _sanitize_surrogates(persist_user_message)
# Store stream callback for _interruptible_api_call to pick up
self._stream_callback = stream_callback
self._persist_user_message_idx = None
@@ -5713,6 +6031,7 @@ class AIAgent:
self._codex_incomplete_retries = 0
self._last_content_with_tools = None
self._mute_post_response = False
self._surrogate_sanitized = False
# NOTE: _turns_since_memory and _iters_since_skill are NOT reset here.
# They are initialized in __init__ and must persist across run_conversation
# calls so that nudge logic accumulates correctly in CLI mode.
@@ -5720,6 +6039,14 @@ class AIAgent:
# Initialize conversation (copy to avoid mutating the caller's list)
messages = list(conversation_history) if conversation_history else []
# Strip budget pressure warnings from previous turns. These are
# turn-scoped signals injected by _get_budget_warning() into tool
# result content. If left in the replayed history, models (especially
# GPT-family) interpret them as still-active instructions and avoid
# making tool calls in ALL subsequent turns.
if messages:
_strip_budget_warnings_from_history(messages)
# Hydrate todo store from conversation history (gateway creates a fresh
# AIAgent per message, so the in-memory store is empty -- we need to
@@ -5815,6 +6142,22 @@ class AIAgent:
self._cached_system_prompt = (
self._cached_system_prompt + "\n\n" + self._honcho_context
).strip()
# Plugin hook: on_session_start
# Fired once when a brand-new session is created (not on
# continuation). Plugins can use this to initialise
# session-scoped state (e.g. warm a memory cache).
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
_invoke_hook(
"on_session_start",
session_id=self.session_id,
model=self.model,
platform=getattr(self, "platform", None) or "",
)
except Exception as exc:
logger.warning("on_session_start hook failed: %s", exc)
# Store the system prompt snapshot in SQLite
if self._session_db:
try:
@@ -5876,6 +6219,34 @@ class AIAgent:
if _preflight_tokens < self.context_compressor.threshold_tokens:
break # Under threshold
# Plugin hook: pre_llm_call
# Fired once per turn before the tool-calling loop. Plugins can
# return a dict with a ``context`` key whose value is a string
# that will be appended to the ephemeral system prompt for every
# API call in this turn (not persisted to session DB or cache).
_plugin_turn_context = ""
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
_pre_results = _invoke_hook(
"pre_llm_call",
session_id=self.session_id,
user_message=original_user_message,
conversation_history=list(messages),
is_first_turn=(not bool(conversation_history)),
model=self.model,
platform=getattr(self, "platform", None) or "",
)
_ctx_parts = []
for r in _pre_results:
if isinstance(r, dict) and r.get("context"):
_ctx_parts.append(str(r["context"]))
elif isinstance(r, str) and r.strip():
_ctx_parts.append(r)
if _ctx_parts:
_plugin_turn_context = "\n\n".join(_ctx_parts)
except Exception as exc:
logger.warning("pre_llm_call hook failed: %s", exc)
# Main conversation loop
api_call_count = 0
final_response = None
@@ -5973,6 +6344,9 @@ class AIAgent:
effective_system = active_system_prompt or ""
if self.ephemeral_system_prompt:
effective_system = (effective_system + "\n\n" + self.ephemeral_system_prompt).strip()
# Plugin context from pre_llm_call hooks — ephemeral, not cached.
if _plugin_turn_context:
effective_system = (effective_system + "\n\n" + _plugin_turn_context).strip()
if effective_system:
api_messages = [{"role": "system", "content": effective_system}] + api_messages
@@ -6019,7 +6393,7 @@ class AIAgent:
# Raw KawaiiSpinner only when no streaming consumers
# (would conflict with streamed token output)
spinner_type = random.choice(['brain', 'sparkle', 'pulse', 'moon', 'star'])
thinking_spinner = KawaiiSpinner(f"{face} {verb}...", spinner_type=spinner_type)
thinking_spinner = KawaiiSpinner(f"{face} {verb}...", spinner_type=spinner_type, print_fn=self._print_fn)
thinking_spinner.start()
# Log request details if verbose
@@ -6249,6 +6623,62 @@ class AIAgent:
if finish_reason == "length":
self._vprint(f"{self.log_prefix}⚠️ Response truncated (finish_reason='length') - model hit max output tokens", force=True)
# ── Detect thinking-budget exhaustion ──────────────
# When the model spends ALL output tokens on reasoning
# and has none left for the response, continuation
# retries are pointless. Detect this early and give a
# targeted error instead of wasting 3 API calls.
_trunc_content = None
if self.api_mode == "chat_completions":
_trunc_msg = response.choices[0].message if (hasattr(response, "choices") and response.choices) else None
_trunc_content = getattr(_trunc_msg, "content", None) if _trunc_msg else None
elif self.api_mode == "anthropic_messages":
# Anthropic response.content is a list of blocks
_text_parts = []
for _blk in getattr(response, "content", []):
if getattr(_blk, "type", None) == "text":
_text_parts.append(getattr(_blk, "text", ""))
_trunc_content = "\n".join(_text_parts) if _text_parts else None
_thinking_exhausted = (
_trunc_content is not None
and not self._has_content_after_think_block(_trunc_content)
) or _trunc_content is None
if _thinking_exhausted:
_exhaust_error = (
"Model used all output tokens on reasoning with none left "
"for the response. Try lowering reasoning effort or "
"increasing max_tokens."
)
self._vprint(
f"{self.log_prefix}💭 Reasoning exhausted the output token budget — "
f"no visible response was produced.",
force=True,
)
# Return a user-friendly message as the response so
# CLI (response box) and gateway (chat message) both
# display it naturally instead of a suppressed error.
_exhaust_response = (
"⚠️ **Thinking Budget Exhausted**\n\n"
"The model used all its output tokens on reasoning "
"and had none left for the actual response.\n\n"
"To fix this:\n"
"→ Lower reasoning effort: `/thinkon low` or `/thinkon minimal`\n"
"→ Increase the output token limit: "
"set `model.max_tokens` in config.yaml"
)
self._cleanup_task_resources(effective_task_id)
self._persist_session(messages, conversation_history)
return {
"final_response": _exhaust_response,
"messages": messages,
"api_calls": api_call_count,
"completed": False,
"partial": True,
"error": _exhaust_error,
}
if self.api_mode == "chat_completions":
assistant_message = response.choices[0].message
if not assistant_message.tool_calls:
@@ -6437,6 +6867,24 @@ class AIAgent:
if self.thinking_callback:
self.thinking_callback("")
# -----------------------------------------------------------
# Surrogate character recovery. UnicodeEncodeError happens
# when the messages contain lone surrogates (U+D800..U+DFFF)
# that are invalid UTF-8. Common source: clipboard paste
# from Google Docs or similar rich-text editors. We sanitize
# the entire messages list in-place and retry once.
# -----------------------------------------------------------
if isinstance(api_error, UnicodeEncodeError) and not getattr(self, '_surrogate_sanitized', False):
self._surrogate_sanitized = True
if _sanitize_messages_surrogates(messages):
self._vprint(
f"{self.log_prefix}⚠️ Stripped invalid surrogate characters from messages. Retrying...",
force=True,
)
continue
# Surrogates weren't in messages — might be in system
# prompt or prefill. Fall through to normal error path.
status_code = getattr(api_error, "status_code", None)
if (
self.api_mode == "codex_responses"
@@ -6705,8 +7153,13 @@ class AIAgent:
# 529 (Anthropic overloaded) is also transient.
# Also catch local validation errors (ValueError, TypeError) — these
# are programming bugs, not transient failures.
# Exclude UnicodeEncodeError — it's a ValueError subclass but is
# handled separately by the surrogate sanitization path above.
_RETRYABLE_STATUS_CODES = {413, 429, 529}
is_local_validation_error = isinstance(api_error, (ValueError, TypeError))
is_local_validation_error = (
isinstance(api_error, (ValueError, TypeError))
and not isinstance(api_error, UnicodeEncodeError)
)
# Detect generic 400s from Anthropic OAuth (transient server-side failures).
# Real invalid_request_error responses include a descriptive message;
# transient ones contain only "Error" or are empty. (ref: issue #1608)
@@ -6776,6 +7229,36 @@ class AIAgent:
_final_summary = self._summarize_api_error(api_error)
self._vprint(f"{self.log_prefix}❌ Max retries ({max_retries}) exceeded. Giving up.", force=True)
self._vprint(f"{self.log_prefix} 💀 Final error: {_final_summary}", force=True)
# Detect SSE stream-drop pattern (e.g. "Network
# connection lost") and surface actionable guidance.
# This typically happens when the model generates a
# very large tool call (write_file with huge content)
# and the proxy/CDN drops the stream mid-response.
_is_stream_drop = (
not getattr(api_error, "status_code", None)
and any(p in error_msg for p in (
"connection lost", "connection reset",
"connection closed", "network connection",
"network error", "terminated",
))
)
if _is_stream_drop:
self._vprint(
f"{self.log_prefix} 💡 The provider's stream "
f"connection keeps dropping. This often happens "
f"when the model tries to write a very large "
f"file in a single tool call.",
force=True,
)
self._vprint(
f"{self.log_prefix} Try asking the model "
f"to use execute_code with Python's open() for "
f"large files, or to write the file in smaller "
f"sections.",
force=True,
)
logging.error(
"%sAPI call failed after %s retries. %s | provider=%s model=%s msgs=%s tokens=~%s",
self.log_prefix, max_retries, _final_summary,
@@ -6785,8 +7268,18 @@ class AIAgent:
api_kwargs, reason="max_retries_exhausted", error=api_error,
)
self._persist_session(messages, conversation_history)
_final_response = f"API call failed after {max_retries} retries: {_final_summary}"
if _is_stream_drop:
_final_response += (
"\n\nThe provider's stream connection keeps "
"dropping — this often happens when generating "
"very large tool call responses (e.g. write_file "
"with long content). Try asking me to use "
"execute_code with Python's open() for large "
"files, or to write in smaller sections."
)
return {
"final_response": f"API call failed after {max_retries} retries: {_final_summary}",
"final_response": _final_response,
"messages": messages,
"api_calls": api_call_count,
"completed": False,
@@ -7164,7 +7657,6 @@ class AIAgent:
except Exception:
pass
_msg_count_before_tools = len(messages)
self._execute_tool_calls(assistant_message, messages, effective_task_id, api_call_count)
# Signal that a paragraph break is needed before the next
@@ -7182,18 +7674,18 @@ class AIAgent:
if _tc_names == {"execute_code"}:
self.iteration_budget.refund()
# Estimate next prompt size using real token counts from the
# last API response + rough estimate of newly appended tool
# results. This catches cases where tool results push the
# context past the limit that last_prompt_tokens alone misses
# (e.g. large file reads, web extractions).
# Use real token counts from the API response to decide
# compression. prompt_tokens + completion_tokens is the
# actual context size the provider reported plus the
# assistant turn — a tight lower bound for the next prompt.
# Tool results appended above aren't counted yet, but the
# threshold (default 50%) leaves ample headroom; if tool
# results push past it, the next API call will report the
# real total and trigger compression then.
_compressor = self.context_compressor
_new_tool_msgs = messages[_msg_count_before_tools:]
_new_chars = sum(len(str(m.get("content", "") or "")) for m in _new_tool_msgs)
_estimated_next_prompt = (
_real_tokens = (
_compressor.last_prompt_tokens
+ _compressor.last_completion_tokens
+ _new_chars // 3 # conservative: JSON-heavy tool results ≈ 3 chars/token
)
# ── Context pressure warnings (user-facing only) ──────────
@@ -7203,12 +7695,12 @@ class AIAgent:
# Does not inject into messages — just prints to CLI output
# and fires status_callback for gateway platforms.
if _compressor.threshold_tokens > 0:
_compaction_progress = _estimated_next_prompt / _compressor.threshold_tokens
_compaction_progress = _real_tokens / _compressor.threshold_tokens
if _compaction_progress >= 0.85 and not self._context_pressure_warned:
self._context_pressure_warned = True
self._emit_context_pressure(_compaction_progress, _compressor)
if self.compression_enabled and _compressor.should_compress(_estimated_next_prompt):
if self.compression_enabled and _compressor.should_compress(_real_tokens):
messages, active_system_prompt = self._compress_context(
messages, system_message,
approx_tokens=self.context_compressor.last_prompt_tokens,
@@ -7455,6 +7947,25 @@ class AIAgent:
self._honcho_sync(original_user_message, final_response)
self._queue_honcho_prefetch(original_user_message)
# Plugin hook: post_llm_call
# Fired once per turn after the tool-calling loop completes.
# Plugins can use this to persist conversation data (e.g. sync
# to an external memory system).
if final_response and not interrupted:
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
_invoke_hook(
"post_llm_call",
session_id=self.session_id,
user_message=original_user_message,
assistant_response=final_response,
conversation_history=list(messages),
model=self.model,
platform=getattr(self, "platform", None) or "",
)
except Exception as exc:
logger.warning("post_llm_call hook failed: %s", exc)
# Extract reasoning from the last assistant message (if any)
last_reasoning = None
for msg in reversed(messages):
@@ -7520,6 +8031,22 @@ class AIAgent:
except Exception:
pass # Background review is best-effort
# Plugin hook: on_session_end
# Fired at the very end of every run_conversation call.
# Plugins can use this for cleanup, flushing buffers, etc.
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
_invoke_hook(
"on_session_end",
session_id=self.session_id,
completed=completed,
interrupted=interrupted,
model=self.model,
platform=getattr(self, "platform", None) or "",
)
except Exception as exc:
logger.warning("on_session_end hook failed: %s", exc)
return result
def chat(self, message: str, stream_callback: Optional[callable] = None) -> str:
+6 -6
View File
@@ -2,7 +2,7 @@
# Kill all running Modal apps (sandboxes, deployments, etc.)
#
# Usage:
# bash scripts/kill_modal.sh # Stop swe-rex (the sandbox app)
# bash scripts/kill_modal.sh # Stop hermes-agent sandboxes
# bash scripts/kill_modal.sh --all # Stop ALL Modal apps
set -uo pipefail
@@ -17,10 +17,10 @@ if [[ "${1:-}" == "--all" ]]; then
modal app stop "$app_id" 2>/dev/null || true
done
else
echo "Stopping swe-rex sandboxes..."
APPS=$(echo "$APP_LIST" | grep 'swe-rex' | grep -oE 'ap-[A-Za-z0-9]+' || true)
echo "Stopping hermes-agent sandboxes..."
APPS=$(echo "$APP_LIST" | grep 'hermes-agent' | grep -oE 'ap-[A-Za-z0-9]+' || true)
if [[ -z "$APPS" ]]; then
echo " No swe-rex apps found."
echo " No hermes-agent apps found."
else
echo "$APPS" | while read app_id; do
echo " Stopping $app_id"
@@ -30,5 +30,5 @@ else
fi
echo ""
echo "Current swe-rex status:"
modal app list 2>/dev/null | grep -E 'State|swe-rex' || echo " (none)"
echo "Current hermes-agent status:"
modal app list 2>/dev/null | grep -E 'State|hermes-agent' || echo " (none)"
@@ -0,0 +1,180 @@
---
name: webhook-subscriptions
description: Create and manage webhook subscriptions for event-driven agent activation. Use when the user wants external services to trigger agent runs automatically.
version: 1.0.0
metadata:
hermes:
tags: [webhook, events, automation, integrations]
---
# Webhook Subscriptions
Create dynamic webhook subscriptions so external services (GitHub, GitLab, Stripe, CI/CD, IoT sensors, monitoring tools) can trigger Hermes agent runs by POSTing events to a URL.
## Setup (Required First)
The webhook platform must be enabled before subscriptions can be created. Check with:
```bash
hermes webhook list
```
If it says "Webhook platform is not enabled", set it up:
### Option 1: Setup wizard
```bash
hermes gateway setup
```
Follow the prompts to enable webhooks, set the port, and set a global HMAC secret.
### Option 2: Manual config
Add to `~/.hermes/config.yaml`:
```yaml
platforms:
webhook:
enabled: true
extra:
host: "0.0.0.0"
port: 8644
secret: "generate-a-strong-secret-here"
```
### Option 3: Environment variables
Add to `~/.hermes/.env`:
```bash
WEBHOOK_ENABLED=true
WEBHOOK_PORT=8644
WEBHOOK_SECRET=generate-a-strong-secret-here
```
After configuration, start (or restart) the gateway:
```bash
hermes gateway run
# Or if using systemd:
systemctl --user restart hermes-gateway
```
Verify it's running:
```bash
curl http://localhost:8644/health
```
## Commands
All management is via the `hermes webhook` CLI command:
### Create a subscription
```bash
hermes webhook subscribe <name> \
--prompt "Prompt template with {payload.fields}" \
--events "event1,event2" \
--description "What this does" \
--skills "skill1,skill2" \
--deliver telegram \
--deliver-chat-id "12345" \
--secret "optional-custom-secret"
```
Returns the webhook URL and HMAC secret. The user configures their service to POST to that URL.
### List subscriptions
```bash
hermes webhook list
```
### Remove a subscription
```bash
hermes webhook remove <name>
```
### Test a subscription
```bash
hermes webhook test <name>
hermes webhook test <name> --payload '{"key": "value"}'
```
## Prompt Templates
Prompts support `{dot.notation}` for accessing nested payload fields:
- `{issue.title}` — GitHub issue title
- `{pull_request.user.login}` — PR author
- `{data.object.amount}` — Stripe payment amount
- `{sensor.temperature}` — IoT sensor reading
If no prompt is specified, the full JSON payload is dumped into the agent prompt.
## Common Patterns
### GitHub: new issues
```bash
hermes webhook subscribe github-issues \
--events "issues" \
--prompt "New GitHub issue #{issue.number}: {issue.title}\n\nAction: {action}\nAuthor: {issue.user.login}\nBody:\n{issue.body}\n\nPlease triage this issue." \
--deliver telegram \
--deliver-chat-id "-100123456789"
```
Then in GitHub repo Settings → Webhooks → Add webhook:
- Payload URL: the returned webhook_url
- Content type: application/json
- Secret: the returned secret
- Events: "Issues"
### GitHub: PR reviews
```bash
hermes webhook subscribe github-prs \
--events "pull_request" \
--prompt "PR #{pull_request.number} {action}: {pull_request.title}\nBy: {pull_request.user.login}\nBranch: {pull_request.head.ref}\n\n{pull_request.body}" \
--skills "github-code-review" \
--deliver github_comment
```
### Stripe: payment events
```bash
hermes webhook subscribe stripe-payments \
--events "payment_intent.succeeded,payment_intent.payment_failed" \
--prompt "Payment {data.object.status}: {data.object.amount} cents from {data.object.receipt_email}" \
--deliver telegram \
--deliver-chat-id "-100123456789"
```
### CI/CD: build notifications
```bash
hermes webhook subscribe ci-builds \
--events "pipeline" \
--prompt "Build {object_attributes.status} on {project.name} branch {object_attributes.ref}\nCommit: {commit.message}" \
--deliver discord \
--deliver-chat-id "1234567890"
```
### Generic monitoring alert
```bash
hermes webhook subscribe alerts \
--prompt "Alert: {alert.name}\nSeverity: {alert.severity}\nMessage: {alert.message}\n\nPlease investigate and suggest remediation." \
--deliver origin
```
## Security
- Each subscription gets an auto-generated HMAC-SHA256 secret (or provide your own with `--secret`)
- The webhook adapter validates signatures on every incoming POST
- Static routes from config.yaml cannot be overwritten by dynamic subscriptions
- Subscriptions persist to `~/.hermes/webhook_subscriptions.json`
## How It Works
1. `hermes webhook subscribe` writes to `~/.hermes/webhook_subscriptions.json`
2. The webhook adapter hot-reloads this file on each incoming request (mtime-gated, negligible overhead)
3. When a POST arrives matching a route, the adapter formats the prompt and triggers an agent run
4. The agent's response is delivered to the configured target (Telegram, Discord, GitHub comment, etc.)
## Troubleshooting
If webhooks aren't working:
1. **Is the gateway running?** Check with `systemctl --user status hermes-gateway` or `ps aux | grep gateway`
2. **Is the webhook server listening?** `curl http://localhost:8644/health` should return `{"status": "ok"}`
3. **Check gateway logs:** `grep webhook ~/.hermes/logs/gateway.log | tail -20`
4. **Signature mismatch?** Verify the secret in your service matches the one from `hermes webhook list`. GitHub sends `X-Hub-Signature-256`, GitLab sends `X-Gitlab-Token`.
5. **Firewall/NAT?** The webhook URL must be reachable from the service. For local development, use a tunnel (ngrok, cloudflared).
6. **Wrong event type?** Check `--events` filter matches what the service sends. Use `hermes webhook test <name>` to verify the route works.
+3
View File
@@ -219,6 +219,9 @@ if command -v gh &>/dev/null && gh auth status &>/dev/null; then
echo "AUTH_METHOD=gh"
elif [ -n "$GITHUB_TOKEN" ]; then
echo "AUTH_METHOD=curl"
elif [ -f ~/.hermes/.env ] && grep -q "^GITHUB_TOKEN=" ~/.hermes/.env; then
export GITHUB_TOKEN=$(grep "^GITHUB_TOKEN=" ~/.hermes/.env | head -1 | cut -d= -f2 | tr -d '\n\r')
echo "AUTH_METHOD=curl"
elif grep -q "github.com" ~/.git-credentials 2>/dev/null; then
export GITHUB_TOKEN=$(grep "github.com" ~/.git-credentials | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
echo "AUTH_METHOD=curl"
@@ -23,6 +23,11 @@ if command -v gh &>/dev/null && gh auth status &>/dev/null 2>&1; then
GH_USER=$(gh api user --jq '.login' 2>/dev/null)
elif [ -n "$GITHUB_TOKEN" ]; then
GH_AUTH_METHOD="curl"
elif [ -f "$HOME/.hermes/.env" ] && grep -q "^GITHUB_TOKEN=" "$HOME/.hermes/.env" 2>/dev/null; then
GITHUB_TOKEN=$(grep "^GITHUB_TOKEN=" "$HOME/.hermes/.env" | head -1 | cut -d= -f2 | tr -d '\n\r')
if [ -n "$GITHUB_TOKEN" ]; then
GH_AUTH_METHOD="curl"
fi
elif [ -f "$HOME/.git-credentials" ] && grep -q "github.com" "$HOME/.git-credentials" 2>/dev/null; then
GITHUB_TOKEN=$(grep "github.com" "$HOME/.git-credentials" | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
if [ -n "$GITHUB_TOKEN" ]; then
+5 -1
View File
@@ -27,7 +27,11 @@ if command -v gh &>/dev/null && gh auth status &>/dev/null; then
else
AUTH="git"
if [ -z "$GITHUB_TOKEN" ]; then
GITHUB_TOKEN=$(grep "github.com" ~/.git-credentials 2>/dev/null | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
if [ -f ~/.hermes/.env ] && grep -q "^GITHUB_TOKEN=" ~/.hermes/.env; then
GITHUB_TOKEN=$(grep "^GITHUB_TOKEN=" ~/.hermes/.env | head -1 | cut -d= -f2 | tr -d '\n\r')
elif grep -q "github.com" ~/.git-credentials 2>/dev/null; then
GITHUB_TOKEN=$(grep "github.com" ~/.git-credentials 2>/dev/null | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
fi
fi
fi
+5 -1
View File
@@ -27,7 +27,11 @@ if command -v gh &>/dev/null && gh auth status &>/dev/null; then
else
AUTH="git"
if [ -z "$GITHUB_TOKEN" ]; then
GITHUB_TOKEN=$(grep "github.com" ~/.git-credentials 2>/dev/null | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
if [ -f ~/.hermes/.env ] && grep -q "^GITHUB_TOKEN=" ~/.hermes/.env; then
GITHUB_TOKEN=$(grep "^GITHUB_TOKEN=" ~/.hermes/.env | head -1 | cut -d= -f2 | tr -d '\n\r')
elif grep -q "github.com" ~/.git-credentials 2>/dev/null; then
GITHUB_TOKEN=$(grep "github.com" ~/.git-credentials 2>/dev/null | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
fi
fi
fi
+5 -1
View File
@@ -29,7 +29,11 @@ else
AUTH="git"
# Ensure we have a token for API calls
if [ -z "$GITHUB_TOKEN" ]; then
GITHUB_TOKEN=$(grep "github.com" ~/.git-credentials 2>/dev/null | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
if [ -f ~/.hermes/.env ] && grep -q "^GITHUB_TOKEN=" ~/.hermes/.env; then
GITHUB_TOKEN=$(grep "^GITHUB_TOKEN=" ~/.hermes/.env | head -1 | cut -d= -f2 | tr -d '\n\r')
elif grep -q "github.com" ~/.git-credentials 2>/dev/null; then
GITHUB_TOKEN=$(grep "github.com" ~/.git-credentials 2>/dev/null | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
fi
fi
fi
echo "Using: $AUTH"
@@ -26,7 +26,11 @@ if command -v gh &>/dev/null && gh auth status &>/dev/null; then
else
AUTH="git"
if [ -z "$GITHUB_TOKEN" ]; then
GITHUB_TOKEN=$(grep "github.com" ~/.git-credentials 2>/dev/null | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
if [ -f ~/.hermes/.env ] && grep -q "^GITHUB_TOKEN=" ~/.hermes/.env; then
GITHUB_TOKEN=$(grep "^GITHUB_TOKEN=" ~/.hermes/.env | head -1 | cut -d= -f2 | tr -d '\n\r')
elif grep -q "github.com" ~/.git-credentials 2>/dev/null; then
GITHUB_TOKEN=$(grep "github.com" ~/.git-credentials 2>/dev/null | head -1 | sed 's|https://[^:]*:\([^@]*\)@.*|\1|')
fi
fi
fi
+25
View File
@@ -11,6 +11,7 @@ from agent.auxiliary_client import (
get_text_auxiliary_client,
get_vision_auxiliary_client,
get_available_vision_backends,
resolve_vision_provider_client,
resolve_provider_client,
auxiliary_max_tokens_param,
_read_codex_access_token,
@@ -638,6 +639,30 @@ class TestVisionClientFallback:
assert client.__class__.__name__ == "AnthropicAuxiliaryClient"
assert model == "claude-haiku-4-5-20251001"
def test_selected_codex_provider_short_circuits_vision_auto(self, monkeypatch):
def fake_load_config():
return {"model": {"provider": "openai-codex", "default": "gpt-5.2-codex"}}
codex_client = MagicMock()
with (
patch("hermes_cli.config.load_config", fake_load_config),
patch("agent.auxiliary_client._try_codex", return_value=(codex_client, "gpt-5.2-codex")) as mock_codex,
patch("agent.auxiliary_client._try_openrouter") as mock_openrouter,
patch("agent.auxiliary_client._try_nous") as mock_nous,
patch("agent.auxiliary_client._try_anthropic") as mock_anthropic,
patch("agent.auxiliary_client._try_custom_endpoint") as mock_custom,
):
provider, client, model = resolve_vision_provider_client()
assert provider == "openai-codex"
assert client is codex_client
assert model == "gpt-5.2-codex"
mock_codex.assert_called_once()
mock_openrouter.assert_not_called()
mock_nous.assert_not_called()
mock_anthropic.assert_not_called()
mock_custom.assert_not_called()
def test_vision_auto_includes_codex(self, codex_auth_dir):
"""Codex supports vision (gpt-5.3-codex), so auto mode should use it."""
with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
+117 -2
View File
@@ -18,6 +18,8 @@ from agent.prompt_builder import (
build_context_files_prompt,
CONTEXT_FILE_MAX_CHARS,
DEFAULT_AGENT_IDENTITY,
TOOL_USE_ENFORCEMENT_GUIDANCE,
TOOL_USE_ENFORCEMENT_MODELS,
MEMORY_GUIDANCE,
SESSION_SEARCH_GUIDANCE,
PLATFORM_HINTS,
@@ -232,7 +234,18 @@ class TestPromptBuilderImports:
# =========================================================================
import pytest
class TestBuildSkillsSystemPrompt:
@pytest.fixture(autouse=True)
def _clear_skills_cache(self):
"""Ensure the in-process skills prompt cache doesn't leak between tests."""
from agent.prompt_builder import clear_skills_system_prompt_cache
clear_skills_system_prompt_cache(clear_snapshot=True)
yield
clear_skills_system_prompt_cache(clear_snapshot=True)
def test_empty_when_no_skills_dir(self, monkeypatch, tmp_path):
monkeypatch.setenv("HERMES_HOME", str(tmp_path))
result = build_skills_system_prompt()
@@ -302,7 +315,7 @@ class TestBuildSkillsSystemPrompt:
from unittest.mock import patch
with patch("tools.skills_tool.sys") as mock_sys:
with patch("agent.skill_utils.sys") as mock_sys:
mock_sys.platform = "darwin"
result = build_skills_system_prompt()
@@ -330,7 +343,7 @@ class TestBuildSkillsSystemPrompt:
from unittest.mock import patch
with patch(
"tools.skills_tool._get_disabled_skill_names",
"agent.prompt_builder.get_disabled_skill_names",
return_value={"old-tool"},
):
result = build_skills_system_prompt()
@@ -804,6 +817,13 @@ class TestSkillShouldShow:
class TestBuildSkillsSystemPromptConditional:
@pytest.fixture(autouse=True)
def _clear_skills_cache(self):
from agent.prompt_builder import clear_skills_system_prompt_cache
clear_skills_system_prompt_cache(clear_snapshot=True)
yield
clear_skills_system_prompt_cache(clear_snapshot=True)
def test_fallback_skill_hidden_when_primary_available(self, monkeypatch, tmp_path):
monkeypatch.setenv("HERMES_HOME", str(tmp_path))
skill_dir = tmp_path / "skills" / "search" / "duckduckgo"
@@ -908,3 +928,98 @@ class TestBuildSkillsSystemPromptConditional:
available_toolsets=set(),
)
assert "nested-null" in result
# =========================================================================
# Tool-use enforcement guidance
# =========================================================================
class TestToolUseEnforcementGuidance:
def test_guidance_mentions_tool_calls(self):
assert "tool call" in TOOL_USE_ENFORCEMENT_GUIDANCE.lower()
def test_guidance_forbids_description_only(self):
assert "describe" in TOOL_USE_ENFORCEMENT_GUIDANCE.lower()
assert "promise" in TOOL_USE_ENFORCEMENT_GUIDANCE.lower()
def test_guidance_requires_action(self):
assert "MUST" in TOOL_USE_ENFORCEMENT_GUIDANCE
def test_enforcement_models_includes_gpt(self):
assert "gpt" in TOOL_USE_ENFORCEMENT_MODELS
def test_enforcement_models_includes_codex(self):
assert "codex" in TOOL_USE_ENFORCEMENT_MODELS
def test_enforcement_models_is_tuple(self):
assert isinstance(TOOL_USE_ENFORCEMENT_MODELS, tuple)
# =========================================================================
# Budget warning history stripping
# =========================================================================
class TestStripBudgetWarningsFromHistory:
def test_strips_json_budget_warning_key(self):
import json
from run_agent import _strip_budget_warnings_from_history
messages = [
{"role": "tool", "tool_call_id": "c1", "content": json.dumps({
"output": "hello",
"exit_code": 0,
"_budget_warning": "[BUDGET: Iteration 55/60. 5 iterations left. Start consolidating your work.]",
})},
]
_strip_budget_warnings_from_history(messages)
parsed = json.loads(messages[0]["content"])
assert "_budget_warning" not in parsed
assert parsed["output"] == "hello"
assert parsed["exit_code"] == 0
def test_strips_text_budget_warning(self):
from run_agent import _strip_budget_warnings_from_history
messages = [
{"role": "tool", "tool_call_id": "c1",
"content": "some result\n\n[BUDGET WARNING: Iteration 58/60. Only 2 iteration(s) left. Provide your final response NOW. No more tool calls unless absolutely critical.]"},
]
_strip_budget_warnings_from_history(messages)
assert messages[0]["content"] == "some result"
def test_leaves_non_tool_messages_unchanged(self):
from run_agent import _strip_budget_warnings_from_history
messages = [
{"role": "assistant", "content": "[BUDGET WARNING: Iteration 58/60. Only 2 iteration(s) left. Provide your final response NOW. No more tool calls unless absolutely critical.]"},
{"role": "user", "content": "hello"},
]
original_contents = [m["content"] for m in messages]
_strip_budget_warnings_from_history(messages)
assert [m["content"] for m in messages] == original_contents
def test_handles_empty_and_missing_content(self):
from run_agent import _strip_budget_warnings_from_history
messages = [
{"role": "tool", "tool_call_id": "c1", "content": ""},
{"role": "tool", "tool_call_id": "c2"},
]
_strip_budget_warnings_from_history(messages)
assert messages[0]["content"] == ""
def test_strips_caution_variant(self):
import json
from run_agent import _strip_budget_warnings_from_history
messages = [
{"role": "tool", "tool_call_id": "c1", "content": json.dumps({
"output": "ok",
"_budget_warning": "[BUDGET: Iteration 42/60. 18 iterations left. Start consolidating your work.]",
})},
]
_strip_budget_warnings_from_history(messages)
parsed = json.loads(messages[0]["content"])
assert "_budget_warning" not in parsed
+3 -3
View File
@@ -54,7 +54,7 @@ class TestScanSkillCommands:
"""macOS-only skills should not register slash commands on Linux."""
with (
patch("tools.skills_tool.SKILLS_DIR", tmp_path),
patch("tools.skills_tool.sys") as mock_sys,
patch("agent.skill_utils.sys") as mock_sys,
):
mock_sys.platform = "linux"
_make_skill(tmp_path, "imessage", frontmatter_extra="platforms: [macos]\n")
@@ -67,7 +67,7 @@ class TestScanSkillCommands:
"""macOS-only skills should register slash commands on macOS."""
with (
patch("tools.skills_tool.SKILLS_DIR", tmp_path),
patch("tools.skills_tool.sys") as mock_sys,
patch("agent.skill_utils.sys") as mock_sys,
):
mock_sys.platform = "darwin"
_make_skill(tmp_path, "imessage", frontmatter_extra="platforms: [macos]\n")
@@ -78,7 +78,7 @@ class TestScanSkillCommands:
"""Skills without platforms field should register on any platform."""
with (
patch("tools.skills_tool.SKILLS_DIR", tmp_path),
patch("tools.skills_tool.sys") as mock_sys,
patch("agent.skill_utils.sys") as mock_sys,
):
mock_sys.platform = "win32"
_make_skill(tmp_path, "generic-tool")
+85
View File
@@ -20,6 +20,7 @@ from cron.jobs import (
resume_job,
remove_job,
mark_job_run,
advance_next_run,
get_due_jobs,
save_job_output,
)
@@ -339,6 +340,90 @@ class TestMarkJobRun:
assert updated["last_error"] == "timeout"
class TestAdvanceNextRun:
"""Tests for advance_next_run() — crash-safety for recurring jobs."""
def test_advances_interval_job(self, tmp_cron_dir):
"""Interval jobs should have next_run_at bumped to the next future occurrence."""
job = create_job(prompt="Recurring check", schedule="every 1h")
# Force next_run_at to 5 minutes ago (i.e. the job is due)
jobs = load_jobs()
old_next = (datetime.now() - timedelta(minutes=5)).isoformat()
jobs[0]["next_run_at"] = old_next
save_jobs(jobs)
result = advance_next_run(job["id"])
assert result is True
updated = get_job(job["id"])
from cron.jobs import _ensure_aware, _hermes_now
new_next_dt = _ensure_aware(datetime.fromisoformat(updated["next_run_at"]))
assert new_next_dt > _hermes_now(), "next_run_at should be in the future after advance"
def test_advances_cron_job(self, tmp_cron_dir):
"""Cron-expression jobs should have next_run_at bumped to the next occurrence."""
pytest.importorskip("croniter")
job = create_job(prompt="Daily wakeup", schedule="15 6 * * *")
# Force next_run_at to 30 minutes ago
jobs = load_jobs()
old_next = (datetime.now() - timedelta(minutes=30)).isoformat()
jobs[0]["next_run_at"] = old_next
save_jobs(jobs)
result = advance_next_run(job["id"])
assert result is True
updated = get_job(job["id"])
from cron.jobs import _ensure_aware, _hermes_now
new_next_dt = _ensure_aware(datetime.fromisoformat(updated["next_run_at"]))
assert new_next_dt > _hermes_now(), "next_run_at should be in the future after advance"
def test_skips_oneshot_job(self, tmp_cron_dir):
"""One-shot jobs should NOT be advanced — they need to retry on restart."""
job = create_job(prompt="Run once", schedule="30m")
original_next = get_job(job["id"])["next_run_at"]
result = advance_next_run(job["id"])
assert result is False
updated = get_job(job["id"])
assert updated["next_run_at"] == original_next, "one-shot next_run_at should be unchanged"
def test_nonexistent_job_returns_false(self, tmp_cron_dir):
result = advance_next_run("nonexistent-id")
assert result is False
def test_already_future_stays_future(self, tmp_cron_dir):
"""If next_run_at is already in the future, advance keeps it in the future (no harm)."""
job = create_job(prompt="Future job", schedule="every 1h")
# next_run_at is already set to ~1h from now by create_job
advance_next_run(job["id"])
# Regardless of return value, the job should still be in the future
updated = get_job(job["id"])
from cron.jobs import _ensure_aware, _hermes_now
new_next_dt = _ensure_aware(datetime.fromisoformat(updated["next_run_at"]))
assert new_next_dt > _hermes_now(), "next_run_at should remain in the future"
def test_crash_safety_scenario(self, tmp_cron_dir):
"""Simulate the crash-loop scenario: after advance, the job should NOT be due."""
job = create_job(prompt="Crash test", schedule="every 1h")
# Force next_run_at to 5 minutes ago (job is due)
jobs = load_jobs()
jobs[0]["next_run_at"] = (datetime.now() - timedelta(minutes=5)).isoformat()
save_jobs(jobs)
# Job should be due before advance
due_before = get_due_jobs()
assert len(due_before) == 1
# Advance (simulating what tick() does before run_job)
advance_next_run(job["id"])
# Now the job should NOT be due (simulates restart after crash)
due_after = get_due_jobs()
assert len(due_after) == 0, "Job should not be due after advance_next_run"
class TestGetDueJobs:
def test_past_due_within_window_returned(self, tmp_cron_dir):
"""Jobs within the dynamic grace window are still considered due (not stale).
+38
View File
@@ -687,3 +687,41 @@ class TestBuildJobPromptMissingSkill:
result = _build_job_prompt({"skills": ["ghost-skill", "real-skill"], "prompt": "go"})
assert "Real skill content." in result
assert "go" in result
class TestTickAdvanceBeforeRun:
"""Verify that tick() calls advance_next_run before run_job for crash safety."""
def test_advance_called_before_run_job(self, tmp_path):
"""advance_next_run must be called before run_job to prevent crash-loop re-fires."""
call_order = []
def fake_advance(job_id):
call_order.append(("advance", job_id))
return True
def fake_run_job(job):
call_order.append(("run", job["id"]))
return True, "output", "response", None
fake_job = {
"id": "test-advance",
"name": "test",
"prompt": "hello",
"enabled": True,
"schedule": {"kind": "cron", "expr": "15 6 * * *"},
}
with patch("cron.scheduler.get_due_jobs", return_value=[fake_job]), \
patch("cron.scheduler.advance_next_run", side_effect=fake_advance) as adv_mock, \
patch("cron.scheduler.run_job", side_effect=fake_run_job), \
patch("cron.scheduler.save_job_output", return_value=tmp_path / "out.md"), \
patch("cron.scheduler.mark_job_run"), \
patch("cron.scheduler._deliver_result"):
from cron.scheduler import tick
executed = tick(verbose=False)
assert executed == 1
adv_mock.assert_called_once_with("test-advance")
# advance must happen before run
assert call_order == [("advance", "test-advance"), ("run", "test-advance")]
@@ -0,0 +1,46 @@
"""Tests for the startup allowlist warning check in gateway/run.py."""
import os
from unittest.mock import patch
def _would_warn():
"""Replicate the startup allowlist warning logic. Returns True if warning fires."""
_any_allowlist = any(
os.getenv(v)
for v in ("TELEGRAM_ALLOWED_USERS", "DISCORD_ALLOWED_USERS",
"WHATSAPP_ALLOWED_USERS", "SLACK_ALLOWED_USERS",
"SIGNAL_ALLOWED_USERS", "SIGNAL_GROUP_ALLOWED_USERS",
"EMAIL_ALLOWED_USERS",
"SMS_ALLOWED_USERS", "MATTERMOST_ALLOWED_USERS",
"MATRIX_ALLOWED_USERS", "DINGTALK_ALLOWED_USERS",
"GATEWAY_ALLOWED_USERS")
)
_allow_all = os.getenv("GATEWAY_ALLOW_ALL_USERS", "").lower() in ("true", "1", "yes") or any(
os.getenv(v, "").lower() in ("true", "1", "yes")
for v in ("TELEGRAM_ALLOW_ALL_USERS", "DISCORD_ALLOW_ALL_USERS",
"WHATSAPP_ALLOW_ALL_USERS", "SLACK_ALLOW_ALL_USERS",
"SIGNAL_ALLOW_ALL_USERS", "EMAIL_ALLOW_ALL_USERS",
"SMS_ALLOW_ALL_USERS", "MATTERMOST_ALLOW_ALL_USERS",
"MATRIX_ALLOW_ALL_USERS", "DINGTALK_ALLOW_ALL_USERS")
)
return not _any_allowlist and not _allow_all
class TestAllowlistStartupCheck:
def test_no_config_emits_warning(self):
with patch.dict(os.environ, {}, clear=True):
assert _would_warn() is True
def test_signal_group_allowed_users_suppresses_warning(self):
with patch.dict(os.environ, {"SIGNAL_GROUP_ALLOWED_USERS": "user1"}, clear=True):
assert _would_warn() is False
def test_telegram_allow_all_users_suppresses_warning(self):
with patch.dict(os.environ, {"TELEGRAM_ALLOW_ALL_USERS": "true"}, clear=True):
assert _would_warn() is False
def test_gateway_allow_all_users_suppresses_warning(self):
with patch.dict(os.environ, {"GATEWAY_ALLOW_ALL_USERS": "yes"}, clear=True):
assert _would_warn() is False
+65 -1
View File
@@ -28,6 +28,7 @@ from gateway.platforms.api_server import (
_CORS_HEADERS,
check_api_server_requirements,
cors_middleware,
security_headers_middleware,
)
@@ -214,9 +215,11 @@ def _make_adapter(api_key: str = "", cors_origins=None) -> APIServerAdapter:
def _create_app(adapter: APIServerAdapter) -> web.Application:
"""Create the aiohttp app from the adapter (without starting the full server)."""
app = web.Application(middlewares=[cors_middleware])
mws = [mw for mw in (cors_middleware, security_headers_middleware) if mw is not None]
app = web.Application(middlewares=mws)
app["api_server_adapter"] = adapter
app.router.add_get("/health", adapter._handle_health)
app.router.add_get("/v1/health", adapter._handle_health)
app.router.add_get("/v1/models", adapter._handle_models)
app.router.add_post("/v1/chat/completions", adapter._handle_chat_completions)
app.router.add_post("/v1/responses", adapter._handle_responses)
@@ -241,6 +244,16 @@ def auth_adapter():
class TestHealthEndpoint:
@pytest.mark.asyncio
async def test_security_headers_present(self, adapter):
"""Responses should include basic security headers."""
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
resp = await cli.get("/health")
assert resp.status == 200
assert resp.headers.get("X-Content-Type-Options") == "nosniff"
assert resp.headers.get("Referrer-Policy") == "no-referrer"
@pytest.mark.asyncio
async def test_health_returns_ok(self, adapter):
app = _create_app(adapter)
@@ -251,6 +264,17 @@ class TestHealthEndpoint:
assert data["status"] == "ok"
assert data["platform"] == "hermes-agent"
@pytest.mark.asyncio
async def test_v1_health_alias_returns_ok(self, adapter):
"""GET /v1/health should return the same response as /health."""
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
resp = await cli.get("/v1/health")
assert resp.status == 200
data = await resp.json()
assert data["status"] == "ok"
assert data["platform"] == "hermes-agent"
# ---------------------------------------------------------------------------
# /v1/models endpoint
@@ -1300,6 +1324,31 @@ class TestCORS:
assert "POST" in resp.headers.get("Access-Control-Allow-Methods", "")
assert "DELETE" in resp.headers.get("Access-Control-Allow-Methods", "")
@pytest.mark.asyncio
async def test_cors_allows_idempotency_key_header(self):
adapter = _make_adapter(cors_origins=["http://localhost:3000"])
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
resp = await cli.options(
"/v1/chat/completions",
headers={
"Origin": "http://localhost:3000",
"Access-Control-Request-Method": "POST",
"Access-Control-Request-Headers": "Idempotency-Key",
},
)
assert resp.status == 200
assert "Idempotency-Key" in resp.headers.get("Access-Control-Allow-Headers", "")
@pytest.mark.asyncio
async def test_cors_sets_vary_origin_header(self):
adapter = _make_adapter(cors_origins=["http://localhost:3000"])
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
resp = await cli.get("/health", headers={"Origin": "http://localhost:3000"})
assert resp.status == 200
assert resp.headers.get("Vary") == "Origin"
@pytest.mark.asyncio
async def test_cors_options_preflight_allowed_for_configured_origin(self):
"""Configured origins can complete browser preflight."""
@@ -1319,6 +1368,21 @@ class TestCORS:
assert "Authorization" in resp.headers.get("Access-Control-Allow-Headers", "")
@pytest.mark.asyncio
async def test_cors_preflight_sets_max_age(self):
adapter = _make_adapter(cors_origins=["http://localhost:3000"])
app = _create_app(adapter)
async with TestClient(TestServer(app)) as cli:
resp = await cli.options(
"/v1/chat/completions",
headers={
"Origin": "http://localhost:3000",
"Access-Control-Request-Method": "POST",
"Access-Control-Request-Headers": "Authorization, Content-Type",
},
)
assert resp.status == 200
assert resp.headers.get("Access-Control-Max-Age") == "600"
# ---------------------------------------------------------------------------
# Conversation parameter
# ---------------------------------------------------------------------------
+38 -2
View File
@@ -69,7 +69,8 @@ class TestApiServerPlatformConfig:
class TestApiServerAdapterToolset:
@patch("gateway.platforms.api_server.AIOHTTP_AVAILABLE", True)
def test_create_agent_uses_api_server_toolset(self):
def test_create_agent_reads_config_toolsets(self):
"""API server resolves toolsets from config like all other platforms."""
from gateway.platforms.api_server import APIServerAdapter
from gateway.config import PlatformConfig
@@ -77,17 +78,52 @@ class TestApiServerAdapterToolset:
with patch("gateway.run._resolve_runtime_agent_kwargs") as mock_kwargs, \
patch("gateway.run._resolve_gateway_model") as mock_model, \
patch("gateway.run._load_gateway_config") as mock_config, \
patch("run_agent.AIAgent") as mock_agent_cls:
mock_kwargs.return_value = {"api_key": "test-key", "base_url": None,
"provider": None, "api_mode": None,
"command": None, "args": []}
mock_model.return_value = "test/model"
# No platform_toolsets override — should fall back to hermes-api-server default
mock_config.return_value = {}
mock_agent_cls.return_value = MagicMock()
adapter._create_agent()
mock_agent_cls.assert_called_once()
call_kwargs = mock_agent_cls.call_args
assert call_kwargs.kwargs.get("enabled_toolsets") == ["hermes-api-server"]
toolsets = call_kwargs.kwargs.get("enabled_toolsets")
assert isinstance(toolsets, list)
assert len(toolsets) > 0
assert call_kwargs.kwargs.get("platform") == "api_server"
@patch("gateway.platforms.api_server.AIOHTTP_AVAILABLE", True)
def test_create_agent_respects_config_override(self):
"""User can override API server toolsets via platform_toolsets in config.yaml."""
from gateway.platforms.api_server import APIServerAdapter
from gateway.config import PlatformConfig
adapter = APIServerAdapter(PlatformConfig())
with patch("gateway.run._resolve_runtime_agent_kwargs") as mock_kwargs, \
patch("gateway.run._resolve_gateway_model") as mock_model, \
patch("gateway.run._load_gateway_config") as mock_config, \
patch("run_agent.AIAgent") as mock_agent_cls:
mock_kwargs.return_value = {"api_key": "test-key", "base_url": None,
"provider": None, "api_mode": None,
"command": None, "args": []}
mock_model.return_value = "test/model"
# User overrides with just web and terminal
mock_config.return_value = {
"platform_toolsets": {"api_server": ["web", "terminal"]}
}
mock_agent_cls.return_value = MagicMock()
adapter._create_agent()
mock_agent_cls.assert_called_once()
call_kwargs = mock_agent_cls.call_args
toolsets = call_kwargs.kwargs.get("enabled_toolsets")
assert sorted(toolsets) == ["terminal", "web"]
+6 -3
View File
@@ -10,6 +10,7 @@ Covers:
"""
import asyncio
import os
import sys
from pathlib import Path
from types import SimpleNamespace
@@ -32,7 +33,7 @@ def _ensure_telegram_mock():
telegram_mod.constants.ChatType.CHANNEL = "channel"
telegram_mod.constants.ChatType.PRIVATE = "private"
for name in ("telegram", "telegram.ext", "telegram.constants"):
for name in ("telegram", "telegram.ext", "telegram.constants", "telegram.request"):
sys.modules.setdefault(name, telegram_mod)
@@ -227,7 +228,8 @@ def test_persist_dm_topic_thread_id_writes_config(tmp_path):
adapter = _make_adapter()
with patch.object(Path, "home", return_value=tmp_path):
with patch.object(Path, "home", return_value=tmp_path), \
patch.dict(os.environ, {"HERMES_HOME": str(tmp_path / ".hermes")}):
adapter._persist_dm_topic_thread_id(111, "General", 999)
with open(config_file) as f:
@@ -366,7 +368,8 @@ def test_get_dm_topic_info_hot_reloads_from_config(tmp_path):
with open(config_file, "w") as f:
yaml.dump(config_data, f)
with patch.object(Path, "home", return_value=tmp_path):
with patch.object(Path, "home", return_value=tmp_path), \
patch.dict(os.environ, {"HERMES_HOME": str(tmp_path / ".hermes")}):
result = adapter._get_dm_topic_info("111", "555")
assert result is not None
+89 -33
View File
@@ -7,11 +7,21 @@ Verifies that:
3. The flush still works normally when memory files don't exist
"""
import sys
import types
import pytest
from pathlib import Path
from unittest.mock import MagicMock, patch, call
@pytest.fixture(autouse=True)
def _mock_dotenv(monkeypatch):
"""gateway.run imports dotenv at module level; stub it so tests run without the package."""
fake = types.ModuleType("dotenv")
fake.load_dotenv = lambda *a, **kw: None
monkeypatch.setitem(sys.modules, "dotenv", fake)
def _make_runner():
from gateway.run import GatewayRunner
@@ -57,105 +67,151 @@ class TestCronSessionBypass:
runner.session_store.load_transcript.assert_called_once_with("session_abc123")
def _make_flush_context(monkeypatch, memory_dir=None):
"""Return (runner, tmp_agent, fake_run_agent) with run_agent mocked in sys.modules."""
tmp_agent = MagicMock()
fake_run_agent = types.ModuleType("run_agent")
fake_run_agent.AIAgent = MagicMock(return_value=tmp_agent)
monkeypatch.setitem(sys.modules, "run_agent", fake_run_agent)
runner = _make_runner()
runner.session_store.load_transcript.return_value = _TRANSCRIPT_4_MSGS
return runner, tmp_agent, memory_dir
class TestMemoryInjection:
"""The flush prompt should include current memory state from disk."""
def test_memory_content_injected_into_flush_prompt(self, tmp_path):
def test_memory_content_injected_into_flush_prompt(self, tmp_path, monkeypatch):
"""When memory files exist, their content appears in the flush prompt."""
runner = _make_runner()
runner.session_store.load_transcript.return_value = _TRANSCRIPT_4_MSGS
tmp_agent = MagicMock()
memory_dir = tmp_path / "memories"
memory_dir.mkdir()
(memory_dir / "MEMORY.md").write_text("Agent knows Python\n§\nUser prefers dark mode")
(memory_dir / "USER.md").write_text("Name: Alice\n§\nTimezone: PST")
runner, tmp_agent, _ = _make_flush_context(monkeypatch, memory_dir)
with (
patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "k"}),
patch("gateway.run._resolve_gateway_model", return_value="test-model"),
patch("run_agent.AIAgent", return_value=tmp_agent),
# Intercept `from tools.memory_tool import MEMORY_DIR` inside the function
patch.dict("sys.modules", {"tools.memory_tool": MagicMock(MEMORY_DIR=memory_dir)}),
):
runner._flush_memories_for_session("session_123")
tmp_agent.run_conversation.assert_called_once()
call_kwargs = tmp_agent.run_conversation.call_args.kwargs
flush_prompt = call_kwargs.get("user_message", "")
# Verify both memory sections appear in the prompt
flush_prompt = tmp_agent.run_conversation.call_args.kwargs.get("user_message", "")
assert "Agent knows Python" in flush_prompt
assert "User prefers dark mode" in flush_prompt
assert "Name: Alice" in flush_prompt
assert "Timezone: PST" in flush_prompt
# Verify the stale-overwrite warning is present
assert "Do NOT overwrite or remove entries" in flush_prompt
assert "current live state of memory" in flush_prompt
def test_flush_works_without_memory_files(self, tmp_path):
def test_flush_works_without_memory_files(self, tmp_path, monkeypatch):
"""When no memory files exist, flush still runs without the guard."""
runner = _make_runner()
runner.session_store.load_transcript.return_value = _TRANSCRIPT_4_MSGS
tmp_agent = MagicMock()
empty_dir = tmp_path / "no_memories"
empty_dir.mkdir()
runner, tmp_agent, _ = _make_flush_context(monkeypatch)
with (
patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "k"}),
patch("gateway.run._resolve_gateway_model", return_value="test-model"),
patch("run_agent.AIAgent", return_value=tmp_agent),
patch.dict("sys.modules", {"tools.memory_tool": MagicMock(MEMORY_DIR=empty_dir)}),
):
runner._flush_memories_for_session("session_456")
# Should still run, just without the memory guard section
tmp_agent.run_conversation.assert_called_once()
flush_prompt = tmp_agent.run_conversation.call_args.kwargs.get("user_message", "")
assert "Do NOT overwrite or remove entries" not in flush_prompt
assert "Review the conversation above" in flush_prompt
def test_empty_memory_files_no_injection(self, tmp_path):
def test_empty_memory_files_no_injection(self, tmp_path, monkeypatch):
"""Empty memory files should not trigger the guard section."""
runner = _make_runner()
runner.session_store.load_transcript.return_value = _TRANSCRIPT_4_MSGS
tmp_agent = MagicMock()
memory_dir = tmp_path / "memories"
memory_dir.mkdir()
(memory_dir / "MEMORY.md").write_text("")
(memory_dir / "USER.md").write_text(" \n ") # whitespace only
runner, tmp_agent, _ = _make_flush_context(monkeypatch)
with (
patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "k"}),
patch("gateway.run._resolve_gateway_model", return_value="test-model"),
patch("run_agent.AIAgent", return_value=tmp_agent),
patch.dict("sys.modules", {"tools.memory_tool": MagicMock(MEMORY_DIR=memory_dir)}),
):
runner._flush_memories_for_session("session_789")
tmp_agent.run_conversation.assert_called_once()
flush_prompt = tmp_agent.run_conversation.call_args.kwargs.get("user_message", "")
# No memory content → no guard section
assert "current live state of memory" not in flush_prompt
class TestFlushAgentSilenced:
"""The flush agent must not produce any terminal output."""
def test_print_fn_set_to_noop(self, tmp_path, monkeypatch):
"""_print_fn on the flush agent must be a no-op so tool output never leaks."""
runner = _make_runner()
runner.session_store.load_transcript.return_value = _TRANSCRIPT_4_MSGS
captured_agent = {}
def _fake_ai_agent(*args, **kwargs):
agent = MagicMock()
captured_agent["instance"] = agent
return agent
fake_run_agent = types.ModuleType("run_agent")
fake_run_agent.AIAgent = _fake_ai_agent
monkeypatch.setitem(sys.modules, "run_agent", fake_run_agent)
with (
patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "k"}),
patch("gateway.run._resolve_gateway_model", return_value="test-model"),
patch.dict("sys.modules", {"tools.memory_tool": MagicMock(MEMORY_DIR=tmp_path)}),
):
runner._flush_memories_for_session("session_silent")
agent = captured_agent["instance"]
assert agent._print_fn is not None, "_print_fn should be overridden to suppress output"
# Confirm it is callable and produces no output (no exception)
agent._print_fn("should be silenced")
def test_kawaii_spinner_respects_print_fn(self):
"""KawaiiSpinner must route all output through print_fn when supplied."""
from agent.display import KawaiiSpinner
written = []
spinner = KawaiiSpinner("test", print_fn=lambda *a, **kw: written.append(a))
spinner._write("hello")
assert written == [("hello",)], "spinner should route through print_fn"
# A no-op print_fn must produce no output to stdout
import io, sys
buf = io.StringIO()
old_stdout = sys.stdout
sys.stdout = buf
try:
silent_spinner = KawaiiSpinner("silent", print_fn=lambda *a, **kw: None)
silent_spinner._write("should not appear")
silent_spinner.stop("done")
finally:
sys.stdout = old_stdout
assert buf.getvalue() == "", "no-op print_fn spinner must not write to stdout"
class TestFlushPromptStructure:
"""Verify the flush prompt retains its core instructions."""
def test_core_instructions_present(self):
def test_core_instructions_present(self, monkeypatch):
"""The flush prompt should still contain the original guidance."""
runner = _make_runner()
runner.session_store.load_transcript.return_value = _TRANSCRIPT_4_MSGS
tmp_agent = MagicMock()
runner, tmp_agent, _ = _make_flush_context(monkeypatch)
with (
patch("gateway.run._resolve_runtime_agent_kwargs", return_value={"api_key": "k"}),
patch("gateway.run._resolve_gateway_model", return_value="test-model"),
patch("run_agent.AIAgent", return_value=tmp_agent),
# Make the import fail gracefully so we test without memory files
patch.dict("sys.modules", {"tools.memory_tool": MagicMock(MEMORY_DIR=Path("/nonexistent"))}),
):
runner._flush_memories_for_session("session_struct")
+197
View File
@@ -1,4 +1,5 @@
"""Tests for Matrix platform adapter."""
import asyncio
import json
import re
import pytest
@@ -446,3 +447,199 @@ class TestMatrixRequirements:
monkeypatch.delenv("MATRIX_HOMESERVER", raising=False)
from gateway.platforms.matrix import check_matrix_requirements
assert check_matrix_requirements() is False
# ---------------------------------------------------------------------------
# Access-token auth / E2EE bootstrap
# ---------------------------------------------------------------------------
class TestMatrixAccessTokenAuth:
@pytest.mark.asyncio
async def test_connect_fetches_device_id_from_whoami_for_access_token(self):
from gateway.platforms.matrix import MatrixAdapter
config = PlatformConfig(
enabled=True,
token="syt_test_access_token",
extra={
"homeserver": "https://matrix.example.org",
"user_id": "@bot:example.org",
"encryption": True,
},
)
adapter = MatrixAdapter(config)
class FakeWhoamiResponse:
def __init__(self, user_id, device_id):
self.user_id = user_id
self.device_id = device_id
class FakeSyncResponse:
def __init__(self):
self.rooms = MagicMock(join={})
fake_client = MagicMock()
fake_client.whoami = AsyncMock(return_value=FakeWhoamiResponse("@bot:example.org", "DEV123"))
fake_client.sync = AsyncMock(return_value=FakeSyncResponse())
fake_client.keys_upload = AsyncMock()
fake_client.keys_query = AsyncMock()
fake_client.keys_claim = AsyncMock()
fake_client.send_to_device_messages = AsyncMock(return_value=[])
fake_client.get_users_for_key_claiming = MagicMock(return_value={})
fake_client.close = AsyncMock()
fake_client.add_event_callback = MagicMock()
fake_client.rooms = {}
fake_client.account_data = {}
fake_client.olm = object()
fake_client.should_upload_keys = False
fake_client.should_query_keys = False
fake_client.should_claim_keys = False
def _restore_login(user_id, device_id, access_token):
fake_client.user_id = user_id
fake_client.device_id = device_id
fake_client.access_token = access_token
fake_client.olm = object()
fake_client.restore_login = MagicMock(side_effect=_restore_login)
fake_nio = MagicMock()
fake_nio.AsyncClient = MagicMock(return_value=fake_client)
fake_nio.WhoamiResponse = FakeWhoamiResponse
fake_nio.SyncResponse = FakeSyncResponse
fake_nio.LoginResponse = type("LoginResponse", (), {})
fake_nio.RoomMessageText = type("RoomMessageText", (), {})
fake_nio.RoomMessageImage = type("RoomMessageImage", (), {})
fake_nio.RoomMessageAudio = type("RoomMessageAudio", (), {})
fake_nio.RoomMessageVideo = type("RoomMessageVideo", (), {})
fake_nio.RoomMessageFile = type("RoomMessageFile", (), {})
fake_nio.InviteMemberEvent = type("InviteMemberEvent", (), {})
fake_nio.MegolmEvent = type("MegolmEvent", (), {})
with patch.dict("sys.modules", {"nio": fake_nio}):
with patch.object(adapter, "_refresh_dm_cache", AsyncMock()):
with patch.object(adapter, "_sync_loop", AsyncMock(return_value=None)):
assert await adapter.connect() is True
fake_client.restore_login.assert_called_once_with(
"@bot:example.org", "DEV123", "syt_test_access_token"
)
assert fake_client.access_token == "syt_test_access_token"
assert fake_client.user_id == "@bot:example.org"
assert fake_client.device_id == "DEV123"
fake_client.whoami.assert_awaited_once()
await adapter.disconnect()
class TestMatrixE2EEMaintenance:
@pytest.mark.asyncio
async def test_sync_loop_runs_e2ee_maintenance_requests(self):
adapter = _make_adapter()
adapter._encryption = True
adapter._closing = False
class FakeSyncError:
pass
async def _sync_once(timeout=30000):
adapter._closing = True
return MagicMock()
fake_client = MagicMock()
fake_client.sync = AsyncMock(side_effect=_sync_once)
fake_client.send_to_device_messages = AsyncMock(return_value=[])
fake_client.keys_upload = AsyncMock()
fake_client.keys_query = AsyncMock()
fake_client.get_users_for_key_claiming = MagicMock(
return_value={"@alice:example.org": ["DEVICE1"]}
)
fake_client.keys_claim = AsyncMock()
fake_client.olm = object()
fake_client.should_upload_keys = True
fake_client.should_query_keys = True
fake_client.should_claim_keys = True
adapter._client = fake_client
fake_nio = MagicMock()
fake_nio.SyncError = FakeSyncError
with patch.dict("sys.modules", {"nio": fake_nio}):
await adapter._sync_loop()
fake_client.sync.assert_awaited_once_with(timeout=30000)
fake_client.send_to_device_messages.assert_awaited_once()
fake_client.keys_upload.assert_awaited_once()
fake_client.keys_query.assert_awaited_once()
fake_client.keys_claim.assert_awaited_once_with(
{"@alice:example.org": ["DEVICE1"]}
)
class TestMatrixEncryptedSendFallback:
@pytest.mark.asyncio
async def test_send_retries_with_ignored_unverified_devices(self):
adapter = _make_adapter()
adapter._encryption = True
class FakeRoomSendResponse:
def __init__(self, event_id):
self.event_id = event_id
class FakeOlmUnverifiedDeviceError(Exception):
pass
fake_client = MagicMock()
fake_client.room_send = AsyncMock(side_effect=[
FakeOlmUnverifiedDeviceError("unverified"),
FakeRoomSendResponse("$event123"),
])
adapter._client = fake_client
adapter._run_e2ee_maintenance = AsyncMock()
fake_nio = MagicMock()
fake_nio.RoomSendResponse = FakeRoomSendResponse
fake_nio.OlmUnverifiedDeviceError = FakeOlmUnverifiedDeviceError
with patch.dict("sys.modules", {"nio": fake_nio}):
result = await adapter.send("!room:example.org", "hello")
assert result.success is True
assert result.message_id == "$event123"
adapter._run_e2ee_maintenance.assert_awaited_once()
assert fake_client.room_send.await_count == 2
first_call = fake_client.room_send.await_args_list[0]
second_call = fake_client.room_send.await_args_list[1]
assert first_call.kwargs.get("ignore_unverified_devices") is False
assert second_call.kwargs.get("ignore_unverified_devices") is True
@pytest.mark.asyncio
async def test_send_retries_after_timeout_in_encrypted_room(self):
adapter = _make_adapter()
adapter._encryption = True
class FakeRoomSendResponse:
def __init__(self, event_id):
self.event_id = event_id
fake_client = MagicMock()
fake_client.room_send = AsyncMock(side_effect=[
asyncio.TimeoutError(),
FakeRoomSendResponse("$event456"),
])
adapter._client = fake_client
adapter._run_e2ee_maintenance = AsyncMock()
fake_nio = MagicMock()
fake_nio.RoomSendResponse = FakeRoomSendResponse
with patch.dict("sys.modules", {"nio": fake_nio}):
result = await adapter.send("!room:example.org", "hello")
assert result.success is True
assert result.message_id == "$event456"
adapter._run_e2ee_maintenance.assert_awaited_once()
assert fake_client.room_send.await_count == 2
second_call = fake_client.room_send.await_args_list[1]
assert second_call.kwargs.get("ignore_unverified_devices") is True
+85 -1
View File
@@ -1,5 +1,6 @@
"""Tests for Mattermost platform adapter."""
import json
import os
import time
import pytest
from unittest.mock import MagicMock, patch, AsyncMock
@@ -269,6 +270,7 @@ class TestMattermostWebSocketParsing:
def setup_method(self):
self.adapter = _make_adapter()
self.adapter._bot_user_id = "bot_user_id"
self.adapter._bot_username = "hermes-bot"
# Mock handle_message to capture the MessageEvent without processing
self.adapter.handle_message = AsyncMock()
@@ -293,7 +295,8 @@ class TestMattermostWebSocketParsing:
await self.adapter._handle_ws_event(event)
assert self.adapter.handle_message.called
msg_event = self.adapter.handle_message.call_args[0][0]
assert msg_event.text == "@bot_user_id Hello from Matrix!"
# @mention is stripped from the message text
assert msg_event.text == "Hello from Matrix!"
assert msg_event.message_id == "post_abc"
@pytest.mark.asyncio
@@ -410,6 +413,87 @@ class TestMattermostWebSocketParsing:
assert not self.adapter.handle_message.called
# ---------------------------------------------------------------------------
# Mention behavior (require_mention + free_response_channels)
# ---------------------------------------------------------------------------
class TestMattermostMentionBehavior:
def setup_method(self):
self.adapter = _make_adapter()
self.adapter._bot_user_id = "bot_user_id"
self.adapter._bot_username = "hermes-bot"
self.adapter.handle_message = AsyncMock()
def _make_event(self, message, channel_type="O", channel_id="chan_456"):
post_data = {
"id": "post_mention",
"user_id": "user_123",
"channel_id": channel_id,
"message": message,
}
return {
"event": "posted",
"data": {
"post": json.dumps(post_data),
"channel_type": channel_type,
"sender_name": "@alice",
},
}
@pytest.mark.asyncio
async def test_require_mention_true_skips_without_mention(self):
"""Default: messages without @mention in channels are skipped."""
with patch.dict(os.environ, {}, clear=False):
os.environ.pop("MATTERMOST_REQUIRE_MENTION", None)
os.environ.pop("MATTERMOST_FREE_RESPONSE_CHANNELS", None)
await self.adapter._handle_ws_event(self._make_event("hello"))
assert not self.adapter.handle_message.called
@pytest.mark.asyncio
async def test_require_mention_false_responds_to_all(self):
"""MATTERMOST_REQUIRE_MENTION=false: respond to all channel messages."""
with patch.dict(os.environ, {"MATTERMOST_REQUIRE_MENTION": "false"}):
await self.adapter._handle_ws_event(self._make_event("hello"))
assert self.adapter.handle_message.called
@pytest.mark.asyncio
async def test_free_response_channel_responds_without_mention(self):
"""Messages in free-response channels don't need @mention."""
with patch.dict(os.environ, {"MATTERMOST_FREE_RESPONSE_CHANNELS": "chan_456,chan_789"}):
os.environ.pop("MATTERMOST_REQUIRE_MENTION", None)
await self.adapter._handle_ws_event(self._make_event("hello", channel_id="chan_456"))
assert self.adapter.handle_message.called
@pytest.mark.asyncio
async def test_non_free_channel_still_requires_mention(self):
"""Channels NOT in free-response list still require @mention."""
with patch.dict(os.environ, {"MATTERMOST_FREE_RESPONSE_CHANNELS": "chan_789"}):
os.environ.pop("MATTERMOST_REQUIRE_MENTION", None)
await self.adapter._handle_ws_event(self._make_event("hello", channel_id="chan_456"))
assert not self.adapter.handle_message.called
@pytest.mark.asyncio
async def test_dm_always_responds(self):
"""DMs (channel_type=D) always respond regardless of mention settings."""
with patch.dict(os.environ, {}, clear=False):
os.environ.pop("MATTERMOST_REQUIRE_MENTION", None)
await self.adapter._handle_ws_event(self._make_event("hello", channel_type="D"))
assert self.adapter.handle_message.called
@pytest.mark.asyncio
async def test_mention_stripped_from_text(self):
"""@mention is stripped from message text."""
with patch.dict(os.environ, {}, clear=False):
os.environ.pop("MATTERMOST_REQUIRE_MENTION", None)
await self.adapter._handle_ws_event(
self._make_event("@hermes-bot what is 2+2")
)
assert self.adapter.handle_message.called
msg = self.adapter.handle_message.call_args[0][0]
assert "@hermes-bot" not in msg.text
assert "2+2" in msg.text
# ---------------------------------------------------------------------------
# File upload (send_image)
# ---------------------------------------------------------------------------
+722
View File
@@ -0,0 +1,722 @@
"""
Tests for media download retry logic added in PR #2982.
Covers:
- gateway/platforms/base.py: cache_image_from_url
- gateway/platforms/slack.py: SlackAdapter._download_slack_file
SlackAdapter._download_slack_file_bytes
- gateway/platforms/mattermost.py: MattermostAdapter._send_url_as_file
All async tests use asyncio.run() directly pytest-asyncio is not installed
in this environment.
"""
import asyncio
import sys
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
import httpx
# ---------------------------------------------------------------------------
# Helpers for building httpx exceptions
# ---------------------------------------------------------------------------
def _make_http_status_error(status_code: int) -> httpx.HTTPStatusError:
request = httpx.Request("GET", "http://example.com/img.jpg")
response = httpx.Response(status_code=status_code, request=request)
return httpx.HTTPStatusError(
f"HTTP {status_code}", request=request, response=response
)
def _make_timeout_error() -> httpx.TimeoutException:
return httpx.TimeoutException("timed out")
# ---------------------------------------------------------------------------
# cache_image_from_url (base.py)
# ---------------------------------------------------------------------------
class TestCacheImageFromUrl:
"""Tests for gateway.platforms.base.cache_image_from_url"""
def test_success_on_first_attempt(self, tmp_path, monkeypatch):
"""A clean 200 response caches the image and returns a path."""
monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")
fake_response = MagicMock()
fake_response.content = b"\xff\xd8\xff fake jpeg"
fake_response.raise_for_status = MagicMock()
mock_client = AsyncMock()
mock_client.get = AsyncMock(return_value=fake_response)
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=False)
async def run():
with patch("httpx.AsyncClient", return_value=mock_client):
from gateway.platforms.base import cache_image_from_url
return await cache_image_from_url(
"http://example.com/img.jpg", ext=".jpg"
)
path = asyncio.run(run())
assert path.endswith(".jpg")
mock_client.get.assert_called_once()
def test_retries_on_timeout_then_succeeds(self, tmp_path, monkeypatch):
"""A timeout on the first attempt is retried; second attempt succeeds."""
monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")
fake_response = MagicMock()
fake_response.content = b"image data"
fake_response.raise_for_status = MagicMock()
mock_client = AsyncMock()
mock_client.get = AsyncMock(
side_effect=[_make_timeout_error(), fake_response]
)
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=False)
mock_sleep = AsyncMock()
async def run():
with patch("httpx.AsyncClient", return_value=mock_client), \
patch("asyncio.sleep", mock_sleep):
from gateway.platforms.base import cache_image_from_url
return await cache_image_from_url(
"http://example.com/img.jpg", ext=".jpg", retries=2
)
path = asyncio.run(run())
assert path.endswith(".jpg")
assert mock_client.get.call_count == 2
mock_sleep.assert_called_once()
def test_retries_on_429_then_succeeds(self, tmp_path, monkeypatch):
"""A 429 response on the first attempt is retried; second attempt succeeds."""
monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")
ok_response = MagicMock()
ok_response.content = b"image data"
ok_response.raise_for_status = MagicMock()
mock_client = AsyncMock()
mock_client.get = AsyncMock(
side_effect=[_make_http_status_error(429), ok_response]
)
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=False)
async def run():
with patch("httpx.AsyncClient", return_value=mock_client), \
patch("asyncio.sleep", new_callable=AsyncMock):
from gateway.platforms.base import cache_image_from_url
return await cache_image_from_url(
"http://example.com/img.jpg", ext=".jpg", retries=2
)
path = asyncio.run(run())
assert path.endswith(".jpg")
assert mock_client.get.call_count == 2
def test_raises_after_max_retries_exhausted(self, tmp_path, monkeypatch):
"""Timeout on every attempt raises after all retries are consumed."""
monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")
mock_client = AsyncMock()
mock_client.get = AsyncMock(side_effect=_make_timeout_error())
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=False)
async def run():
with patch("httpx.AsyncClient", return_value=mock_client), \
patch("asyncio.sleep", new_callable=AsyncMock):
from gateway.platforms.base import cache_image_from_url
await cache_image_from_url(
"http://example.com/img.jpg", ext=".jpg", retries=2
)
with pytest.raises(httpx.TimeoutException):
asyncio.run(run())
# 3 total calls: initial + 2 retries
assert mock_client.get.call_count == 3
def test_non_retryable_4xx_raises_immediately(self, tmp_path, monkeypatch):
"""A 404 (non-retryable) is raised immediately without any retry."""
monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")
mock_sleep = AsyncMock()
mock_client = AsyncMock()
mock_client.get = AsyncMock(side_effect=_make_http_status_error(404))
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=False)
async def run():
with patch("httpx.AsyncClient", return_value=mock_client), \
patch("asyncio.sleep", mock_sleep):
from gateway.platforms.base import cache_image_from_url
await cache_image_from_url(
"http://example.com/img.jpg", ext=".jpg", retries=2
)
with pytest.raises(httpx.HTTPStatusError):
asyncio.run(run())
# Only 1 attempt, no sleep
assert mock_client.get.call_count == 1
mock_sleep.assert_not_called()
# ---------------------------------------------------------------------------
# cache_audio_from_url (base.py)
# ---------------------------------------------------------------------------
class TestCacheAudioFromUrl:
"""Tests for gateway.platforms.base.cache_audio_from_url"""
def test_success_on_first_attempt(self, tmp_path, monkeypatch):
"""A clean 200 response caches the audio and returns a path."""
monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")
fake_response = MagicMock()
fake_response.content = b"\x00\x01 fake audio"
fake_response.raise_for_status = MagicMock()
mock_client = AsyncMock()
mock_client.get = AsyncMock(return_value=fake_response)
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=False)
async def run():
with patch("httpx.AsyncClient", return_value=mock_client):
from gateway.platforms.base import cache_audio_from_url
return await cache_audio_from_url(
"http://example.com/voice.ogg", ext=".ogg"
)
path = asyncio.run(run())
assert path.endswith(".ogg")
mock_client.get.assert_called_once()
def test_retries_on_timeout_then_succeeds(self, tmp_path, monkeypatch):
"""A timeout on the first attempt is retried; second attempt succeeds."""
monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")
fake_response = MagicMock()
fake_response.content = b"audio data"
fake_response.raise_for_status = MagicMock()
mock_client = AsyncMock()
mock_client.get = AsyncMock(
side_effect=[_make_timeout_error(), fake_response]
)
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=False)
mock_sleep = AsyncMock()
async def run():
with patch("httpx.AsyncClient", return_value=mock_client), \
patch("asyncio.sleep", mock_sleep):
from gateway.platforms.base import cache_audio_from_url
return await cache_audio_from_url(
"http://example.com/voice.ogg", ext=".ogg", retries=2
)
path = asyncio.run(run())
assert path.endswith(".ogg")
assert mock_client.get.call_count == 2
mock_sleep.assert_called_once()
def test_retries_on_429_then_succeeds(self, tmp_path, monkeypatch):
"""A 429 response on the first attempt is retried; second attempt succeeds."""
monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")
ok_response = MagicMock()
ok_response.content = b"audio data"
ok_response.raise_for_status = MagicMock()
mock_client = AsyncMock()
mock_client.get = AsyncMock(
side_effect=[_make_http_status_error(429), ok_response]
)
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=False)
async def run():
with patch("httpx.AsyncClient", return_value=mock_client), \
patch("asyncio.sleep", new_callable=AsyncMock):
from gateway.platforms.base import cache_audio_from_url
return await cache_audio_from_url(
"http://example.com/voice.ogg", ext=".ogg", retries=2
)
path = asyncio.run(run())
assert path.endswith(".ogg")
assert mock_client.get.call_count == 2
def test_retries_on_500_then_succeeds(self, tmp_path, monkeypatch):
"""A 500 response on the first attempt is retried; second attempt succeeds."""
monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")
ok_response = MagicMock()
ok_response.content = b"audio data"
ok_response.raise_for_status = MagicMock()
mock_client = AsyncMock()
mock_client.get = AsyncMock(
side_effect=[_make_http_status_error(500), ok_response]
)
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=False)
async def run():
with patch("httpx.AsyncClient", return_value=mock_client), \
patch("asyncio.sleep", new_callable=AsyncMock):
from gateway.platforms.base import cache_audio_from_url
return await cache_audio_from_url(
"http://example.com/voice.ogg", ext=".ogg", retries=2
)
path = asyncio.run(run())
assert path.endswith(".ogg")
assert mock_client.get.call_count == 2
def test_raises_after_max_retries_exhausted(self, tmp_path, monkeypatch):
"""Timeout on every attempt raises after all retries are consumed."""
monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")
mock_client = AsyncMock()
mock_client.get = AsyncMock(side_effect=_make_timeout_error())
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=False)
async def run():
with patch("httpx.AsyncClient", return_value=mock_client), \
patch("asyncio.sleep", new_callable=AsyncMock):
from gateway.platforms.base import cache_audio_from_url
await cache_audio_from_url(
"http://example.com/voice.ogg", ext=".ogg", retries=2
)
with pytest.raises(httpx.TimeoutException):
asyncio.run(run())
# 3 total calls: initial + 2 retries
assert mock_client.get.call_count == 3
def test_non_retryable_4xx_raises_immediately(self, tmp_path, monkeypatch):
"""A 404 (non-retryable) is raised immediately without any retry."""
monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")
mock_sleep = AsyncMock()
mock_client = AsyncMock()
mock_client.get = AsyncMock(side_effect=_make_http_status_error(404))
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=False)
async def run():
with patch("httpx.AsyncClient", return_value=mock_client), \
patch("asyncio.sleep", mock_sleep):
from gateway.platforms.base import cache_audio_from_url
await cache_audio_from_url(
"http://example.com/voice.ogg", ext=".ogg", retries=2
)
with pytest.raises(httpx.HTTPStatusError):
asyncio.run(run())
# Only 1 attempt, no sleep
assert mock_client.get.call_count == 1
mock_sleep.assert_not_called()
# ---------------------------------------------------------------------------
# Slack mock setup (mirrors existing test_slack.py approach)
# ---------------------------------------------------------------------------
def _ensure_slack_mock():
if "slack_bolt" in sys.modules and hasattr(sys.modules["slack_bolt"], "__file__"):
return
slack_bolt = MagicMock()
slack_bolt.async_app.AsyncApp = MagicMock
slack_bolt.adapter.socket_mode.async_handler.AsyncSocketModeHandler = MagicMock
slack_sdk = MagicMock()
slack_sdk.web.async_client.AsyncWebClient = MagicMock
for name, mod in [
("slack_bolt", slack_bolt),
("slack_bolt.async_app", slack_bolt.async_app),
("slack_bolt.adapter", slack_bolt.adapter),
("slack_bolt.adapter.socket_mode", slack_bolt.adapter.socket_mode),
("slack_bolt.adapter.socket_mode.async_handler",
slack_bolt.adapter.socket_mode.async_handler),
("slack_sdk", slack_sdk),
("slack_sdk.web", slack_sdk.web),
("slack_sdk.web.async_client", slack_sdk.web.async_client),
]:
sys.modules.setdefault(name, mod)
_ensure_slack_mock()
import gateway.platforms.slack as _slack_mod # noqa: E402
_slack_mod.SLACK_AVAILABLE = True
from gateway.platforms.slack import SlackAdapter # noqa: E402
from gateway.config import Platform, PlatformConfig # noqa: E402
def _make_slack_adapter():
config = PlatformConfig(enabled=True, token="xoxb-fake-token")
adapter = SlackAdapter(config)
adapter._app = MagicMock()
adapter._app.client = AsyncMock()
adapter._bot_user_id = "U_BOT"
adapter._running = True
return adapter
# ---------------------------------------------------------------------------
# SlackAdapter._download_slack_file
# ---------------------------------------------------------------------------
class TestSlackDownloadSlackFile:
"""Tests for SlackAdapter._download_slack_file"""
def test_success_on_first_attempt(self, tmp_path, monkeypatch):
"""Successful download on first try returns a cached file path."""
monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")
adapter = _make_slack_adapter()
fake_response = MagicMock()
fake_response.content = b"fake image bytes"
fake_response.raise_for_status = MagicMock()
mock_client = AsyncMock()
mock_client.get = AsyncMock(return_value=fake_response)
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=False)
async def run():
with patch("httpx.AsyncClient", return_value=mock_client):
return await adapter._download_slack_file(
"https://files.slack.com/img.jpg", ext=".jpg"
)
path = asyncio.run(run())
assert path.endswith(".jpg")
mock_client.get.assert_called_once()
def test_retries_on_timeout_then_succeeds(self, tmp_path, monkeypatch):
"""Timeout on first attempt triggers retry; success on second."""
monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")
adapter = _make_slack_adapter()
fake_response = MagicMock()
fake_response.content = b"image bytes"
fake_response.raise_for_status = MagicMock()
mock_client = AsyncMock()
mock_client.get = AsyncMock(
side_effect=[_make_timeout_error(), fake_response]
)
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=False)
mock_sleep = AsyncMock()
async def run():
with patch("httpx.AsyncClient", return_value=mock_client), \
patch("asyncio.sleep", mock_sleep):
return await adapter._download_slack_file(
"https://files.slack.com/img.jpg", ext=".jpg"
)
path = asyncio.run(run())
assert path.endswith(".jpg")
assert mock_client.get.call_count == 2
mock_sleep.assert_called_once()
def test_raises_after_max_retries(self, tmp_path, monkeypatch):
"""Timeout on every attempt eventually raises after 3 total tries."""
monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")
adapter = _make_slack_adapter()
mock_client = AsyncMock()
mock_client.get = AsyncMock(side_effect=_make_timeout_error())
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=False)
async def run():
with patch("httpx.AsyncClient", return_value=mock_client), \
patch("asyncio.sleep", new_callable=AsyncMock):
await adapter._download_slack_file(
"https://files.slack.com/img.jpg", ext=".jpg"
)
with pytest.raises(httpx.TimeoutException):
asyncio.run(run())
assert mock_client.get.call_count == 3
def test_non_retryable_403_raises_immediately(self, tmp_path, monkeypatch):
"""A 403 is not retried; it raises immediately."""
monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")
adapter = _make_slack_adapter()
mock_sleep = AsyncMock()
mock_client = AsyncMock()
mock_client.get = AsyncMock(side_effect=_make_http_status_error(403))
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=False)
async def run():
with patch("httpx.AsyncClient", return_value=mock_client), \
patch("asyncio.sleep", mock_sleep):
await adapter._download_slack_file(
"https://files.slack.com/img.jpg", ext=".jpg"
)
with pytest.raises(httpx.HTTPStatusError):
asyncio.run(run())
assert mock_client.get.call_count == 1
mock_sleep.assert_not_called()
# ---------------------------------------------------------------------------
# SlackAdapter._download_slack_file_bytes
# ---------------------------------------------------------------------------
class TestSlackDownloadSlackFileBytes:
"""Tests for SlackAdapter._download_slack_file_bytes"""
def test_success_returns_bytes(self):
"""Successful download returns raw bytes."""
adapter = _make_slack_adapter()
fake_response = MagicMock()
fake_response.content = b"raw bytes here"
fake_response.raise_for_status = MagicMock()
mock_client = AsyncMock()
mock_client.get = AsyncMock(return_value=fake_response)
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=False)
async def run():
with patch("httpx.AsyncClient", return_value=mock_client):
return await adapter._download_slack_file_bytes(
"https://files.slack.com/file.bin"
)
result = asyncio.run(run())
assert result == b"raw bytes here"
def test_retries_on_429_then_succeeds(self):
"""429 on first attempt is retried; raw bytes returned on second."""
adapter = _make_slack_adapter()
ok_response = MagicMock()
ok_response.content = b"final bytes"
ok_response.raise_for_status = MagicMock()
mock_client = AsyncMock()
mock_client.get = AsyncMock(
side_effect=[_make_http_status_error(429), ok_response]
)
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=False)
async def run():
with patch("httpx.AsyncClient", return_value=mock_client), \
patch("asyncio.sleep", new_callable=AsyncMock):
return await adapter._download_slack_file_bytes(
"https://files.slack.com/file.bin"
)
result = asyncio.run(run())
assert result == b"final bytes"
assert mock_client.get.call_count == 2
def test_raises_after_max_retries(self):
"""Persistent timeouts raise after all 3 attempts are exhausted."""
adapter = _make_slack_adapter()
mock_client = AsyncMock()
mock_client.get = AsyncMock(side_effect=_make_timeout_error())
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=False)
async def run():
with patch("httpx.AsyncClient", return_value=mock_client), \
patch("asyncio.sleep", new_callable=AsyncMock):
await adapter._download_slack_file_bytes(
"https://files.slack.com/file.bin"
)
with pytest.raises(httpx.TimeoutException):
asyncio.run(run())
assert mock_client.get.call_count == 3
# ---------------------------------------------------------------------------
# MattermostAdapter._send_url_as_file
# ---------------------------------------------------------------------------
def _make_mm_adapter():
"""Build a minimal MattermostAdapter with mocked internals."""
from gateway.platforms.mattermost import MattermostAdapter
config = PlatformConfig(
enabled=True, token="mm-token-fake",
extra={"url": "https://mm.example.com"},
)
adapter = MattermostAdapter(config)
adapter._session = MagicMock()
adapter._upload_file = AsyncMock(return_value="file-id-123")
adapter._api_post = AsyncMock(return_value={"id": "post-id-abc"})
adapter.send = AsyncMock(return_value=MagicMock(success=True))
return adapter
def _make_aiohttp_resp(status: int, content: bytes = b"file bytes",
content_type: str = "image/jpeg"):
"""Build a context-manager mock for an aiohttp response."""
resp = MagicMock()
resp.status = status
resp.content_type = content_type
resp.read = AsyncMock(return_value=content)
resp.__aenter__ = AsyncMock(return_value=resp)
resp.__aexit__ = AsyncMock(return_value=False)
return resp
class TestMattermostSendUrlAsFile:
"""Tests for MattermostAdapter._send_url_as_file"""
def test_success_on_first_attempt(self):
"""200 on first attempt → file uploaded and post created."""
adapter = _make_mm_adapter()
resp = _make_aiohttp_resp(200)
adapter._session.get = MagicMock(return_value=resp)
async def run():
with patch("asyncio.sleep", new_callable=AsyncMock):
return await adapter._send_url_as_file(
"C123", "http://cdn.example.com/img.png", "caption", None
)
result = asyncio.run(run())
assert result.success
adapter._upload_file.assert_called_once()
adapter._api_post.assert_called_once()
def test_retries_on_429_then_succeeds(self):
"""429 on first attempt is retried; 200 on second attempt succeeds."""
adapter = _make_mm_adapter()
resp_429 = _make_aiohttp_resp(429)
resp_200 = _make_aiohttp_resp(200)
adapter._session.get = MagicMock(side_effect=[resp_429, resp_200])
mock_sleep = AsyncMock()
async def run():
with patch("asyncio.sleep", mock_sleep):
return await adapter._send_url_as_file(
"C123", "http://cdn.example.com/img.png", None, None
)
result = asyncio.run(run())
assert result.success
assert adapter._session.get.call_count == 2
mock_sleep.assert_called_once()
def test_retries_on_500_then_succeeds(self):
"""5xx on first attempt is retried; 200 on second attempt succeeds."""
adapter = _make_mm_adapter()
resp_500 = _make_aiohttp_resp(500)
resp_200 = _make_aiohttp_resp(200)
adapter._session.get = MagicMock(side_effect=[resp_500, resp_200])
async def run():
with patch("asyncio.sleep", new_callable=AsyncMock):
return await adapter._send_url_as_file(
"C123", "http://cdn.example.com/img.png", None, None
)
result = asyncio.run(run())
assert result.success
assert adapter._session.get.call_count == 2
def test_falls_back_to_text_after_max_retries_on_5xx(self):
"""Three consecutive 500s exhaust retries; falls back to send() with URL text."""
adapter = _make_mm_adapter()
resp_500 = _make_aiohttp_resp(500)
adapter._session.get = MagicMock(return_value=resp_500)
async def run():
with patch("asyncio.sleep", new_callable=AsyncMock):
return await adapter._send_url_as_file(
"C123", "http://cdn.example.com/img.png", "my caption", None
)
asyncio.run(run())
adapter.send.assert_called_once()
text_arg = adapter.send.call_args[0][1]
assert "http://cdn.example.com/img.png" in text_arg
def test_falls_back_on_client_error(self):
"""aiohttp.ClientError on every attempt falls back to send() with URL."""
import aiohttp
adapter = _make_mm_adapter()
error_resp = MagicMock()
error_resp.__aenter__ = AsyncMock(
side_effect=aiohttp.ClientConnectionError("connection refused")
)
error_resp.__aexit__ = AsyncMock(return_value=False)
adapter._session.get = MagicMock(return_value=error_resp)
async def run():
with patch("asyncio.sleep", new_callable=AsyncMock):
return await adapter._send_url_as_file(
"C123", "http://cdn.example.com/img.png", None, None
)
asyncio.run(run())
adapter.send.assert_called_once()
text_arg = adapter.send.call_args[0][1]
assert "http://cdn.example.com/img.png" in text_arg
def test_non_retryable_404_falls_back_immediately(self):
"""404 is non-retryable (< 500, != 429); send() is called right away."""
adapter = _make_mm_adapter()
resp_404 = _make_aiohttp_resp(404)
adapter._session.get = MagicMock(return_value=resp_404)
mock_sleep = AsyncMock()
async def run():
with patch("asyncio.sleep", mock_sleep):
return await adapter._send_url_as_file(
"C123", "http://cdn.example.com/img.png", None, None
)
asyncio.run(run())
adapter.send.assert_called_once()
# No sleep — fell back on first attempt
mock_sleep.assert_not_called()
assert adapter._session.get.call_count == 1
+12
View File
@@ -62,6 +62,18 @@ class TestMessageEventGetCommand:
event = MessageEvent(text="/")
assert event.get_command() == ""
def test_command_with_at_botname(self):
event = MessageEvent(text="/new@TigerNanoBot")
assert event.get_command() == "new"
def test_command_with_at_botname_and_args(self):
event = MessageEvent(text="/compress@TigerNanoBot")
assert event.get_command() == "compress"
def test_command_mixed_case_with_at_botname(self):
event = MessageEvent(text="/RESET@TigerNanoBot")
assert event.get_command() == "reset"
class TestMessageEventGetCommandArgs:
def test_command_with_args(self):
+29 -3
View File
@@ -344,6 +344,7 @@ class TestRuntimeDisconnectQueuing:
async def test_retryable_runtime_error_queued_for_reconnect(self):
"""Retryable runtime errors should add the platform to _failed_platforms."""
runner = _make_runner()
runner.stop = AsyncMock()
adapter = StubAdapter(succeed=True)
adapter._set_fatal_error("network_error", "DNS failure", retryable=True)
@@ -371,8 +372,12 @@ class TestRuntimeDisconnectQueuing:
assert Platform.TELEGRAM not in runner._failed_platforms
@pytest.mark.asyncio
async def test_retryable_error_prevents_shutdown_when_queued(self):
"""Gateway should not shut down if failed platforms are queued for reconnection."""
async def test_retryable_error_exits_for_service_restart_when_all_down(self):
"""Gateway should exit with failure when all platforms fail with retryable errors.
This lets systemd Restart=on-failure restart the process, which is more
reliable than in-process background reconnection after exhausted retries.
"""
runner = _make_runner()
runner.stop = AsyncMock()
@@ -382,7 +387,28 @@ class TestRuntimeDisconnectQueuing:
await runner._handle_adapter_fatal_error(adapter)
# stop() should NOT have been called since we have platforms queued
# stop() SHOULD be called — gateway exits for systemd restart
runner.stop.assert_called_once()
assert runner._exit_with_failure is True
assert Platform.TELEGRAM in runner._failed_platforms
@pytest.mark.asyncio
async def test_retryable_error_no_exit_when_other_adapters_still_connected(self):
"""Gateway should NOT exit if some adapters are still connected."""
runner = _make_runner()
runner.stop = AsyncMock()
failing_adapter = StubAdapter(succeed=True)
failing_adapter._set_fatal_error("network_error", "DNS failure", retryable=True)
runner.adapters[Platform.TELEGRAM] = failing_adapter
# Another adapter is still connected
healthy_adapter = StubAdapter(succeed=True)
runner.adapters[Platform.DISCORD] = healthy_adapter
await runner._handle_adapter_fatal_error(failing_adapter)
# stop() should NOT have been called — Discord is still up
runner.stop.assert_not_called()
assert Platform.TELEGRAM in runner._failed_platforms
+87 -3
View File
@@ -14,8 +14,8 @@ from gateway.session import SessionSource
class ProgressCaptureAdapter(BasePlatformAdapter):
def __init__(self):
super().__init__(PlatformConfig(enabled=True, token="fake-token"), Platform.TELEGRAM)
def __init__(self, platform=Platform.TELEGRAM):
super().__init__(PlatformConfig(enabled=True, token="***"), platform)
self.sent = []
self.edits = []
self.typing = []
@@ -76,7 +76,7 @@ def _make_runner(adapter):
GatewayRunner = gateway_run.GatewayRunner
runner = object.__new__(GatewayRunner)
runner.adapters = {Platform.TELEGRAM: adapter}
runner.adapters = {adapter.platform: adapter}
runner._voice_mode = {}
runner._prefill_messages = []
runner._ephemeral_system_prompt = ""
@@ -133,3 +133,87 @@ async def test_run_agent_progress_stays_in_originating_topic(monkeypatch, tmp_pa
]
assert adapter.edits
assert all(call["metadata"] == {"thread_id": "17585"} for call in adapter.typing)
@pytest.mark.asyncio
async def test_run_agent_progress_does_not_use_event_message_id_for_telegram_dm(monkeypatch, tmp_path):
"""Telegram DM progress must not reuse event message id as thread metadata."""
monkeypatch.setenv("HERMES_TOOL_PROGRESS_MODE", "all")
fake_dotenv = types.ModuleType("dotenv")
fake_dotenv.load_dotenv = lambda *args, **kwargs: None
monkeypatch.setitem(sys.modules, "dotenv", fake_dotenv)
fake_run_agent = types.ModuleType("run_agent")
fake_run_agent.AIAgent = FakeAgent
monkeypatch.setitem(sys.modules, "run_agent", fake_run_agent)
adapter = ProgressCaptureAdapter(platform=Platform.TELEGRAM)
runner = _make_runner(adapter)
gateway_run = importlib.import_module("gateway.run")
monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path)
monkeypatch.setattr(gateway_run, "_resolve_runtime_agent_kwargs", lambda: {"api_key": "***"})
source = SessionSource(
platform=Platform.TELEGRAM,
chat_id="12345",
chat_type="dm",
thread_id=None,
)
result = await runner._run_agent(
message="hello",
context_prompt="",
history=[],
source=source,
session_id="sess-2",
session_key="agent:main:telegram:dm:12345",
event_message_id="777",
)
assert result["final_response"] == "done"
assert adapter.sent
assert adapter.sent[0]["metadata"] is None
assert all(call["metadata"] is None for call in adapter.typing)
@pytest.mark.asyncio
async def test_run_agent_progress_uses_event_message_id_for_slack_dm(monkeypatch, tmp_path):
"""Slack DM progress should keep event ts fallback threading."""
monkeypatch.setenv("HERMES_TOOL_PROGRESS_MODE", "all")
fake_dotenv = types.ModuleType("dotenv")
fake_dotenv.load_dotenv = lambda *args, **kwargs: None
monkeypatch.setitem(sys.modules, "dotenv", fake_dotenv)
fake_run_agent = types.ModuleType("run_agent")
fake_run_agent.AIAgent = FakeAgent
monkeypatch.setitem(sys.modules, "run_agent", fake_run_agent)
adapter = ProgressCaptureAdapter(platform=Platform.SLACK)
runner = _make_runner(adapter)
gateway_run = importlib.import_module("gateway.run")
monkeypatch.setattr(gateway_run, "_hermes_home", tmp_path)
monkeypatch.setattr(gateway_run, "_resolve_runtime_agent_kwargs", lambda: {"api_key": "***"})
source = SessionSource(
platform=Platform.SLACK,
chat_id="D123",
chat_type="dm",
thread_id=None,
)
result = await runner._run_agent(
message="hello",
context_prompt="",
history=[],
source=source,
session_id="sess-3",
session_key="agent:main:slack:dm:D123",
event_message_id="1234567890.000001",
)
assert result["final_response"] == "done"
assert adapter.sent
assert adapter.sent[0]["metadata"] == {"thread_id": "1234567890.000001"}
assert all(call["metadata"] == {"thread_id": "1234567890.000001"} for call in adapter.typing)
+3 -2
View File
@@ -89,7 +89,8 @@ async def test_runner_queues_retryable_runtime_fatal_for_reconnection(monkeypatc
await runner._handle_adapter_fatal_error(adapter)
# Should NOT shut down — platform is queued for reconnection
runner.stop.assert_not_awaited()
# Should shut down with failure — systemd Restart=on-failure will restart
runner.stop.assert_awaited_once()
assert runner._exit_with_failure is True
assert Platform.WHATSAPP in runner._failed_platforms
assert runner._failed_platforms[Platform.WHATSAPP]["attempts"] == 0
+1 -1
View File
@@ -76,7 +76,7 @@ def _ensure_telegram_mock():
telegram_mod.constants.ChatType.CHANNEL = "channel"
telegram_mod.constants.ChatType.PRIVATE = "private"
for name in ("telegram", "telegram.ext", "telegram.constants"):
for name in ("telegram", "telegram.ext", "telegram.constants", "telegram.request"):
sys.modules.setdefault(name, telegram_mod)
+231
View File
@@ -0,0 +1,231 @@
"""
Tests for BasePlatformAdapter._send_with_retry and _is_retryable_error.
Verifies that:
- Transient network errors trigger retry with backoff
- Permanent errors fall back to plain-text immediately (no retry)
- User receives a delivery-failure notice when all retries are exhausted
- Successful sends on retry return success
- SendResult.retryable flag is respected
"""
import pytest
from unittest.mock import AsyncMock, patch
from gateway.platforms.base import BasePlatformAdapter, SendResult, _RETRYABLE_ERROR_PATTERNS
from gateway.platforms.base import Platform, PlatformConfig
# ---------------------------------------------------------------------------
# Minimal concrete adapter for testing (no real network)
# ---------------------------------------------------------------------------
class _StubAdapter(BasePlatformAdapter):
def __init__(self):
cfg = PlatformConfig()
super().__init__(cfg, Platform.TELEGRAM)
self._send_results = [] # queue of SendResult to return per call
self._send_calls = [] # record of (chat_id, content) sent
def _next_result(self) -> SendResult:
if self._send_results:
return self._send_results.pop(0)
return SendResult(success=True, message_id="ok")
async def send(self, chat_id, content, reply_to=None, metadata=None, **kwargs) -> SendResult:
self._send_calls.append((chat_id, content))
return self._next_result()
async def connect(self) -> bool:
return True
async def disconnect(self) -> None:
pass
async def send_typing(self, chat_id, metadata=None) -> None:
pass
async def get_chat_info(self, chat_id):
return {"name": "test", "type": "direct", "chat_id": chat_id}
# ---------------------------------------------------------------------------
# _is_retryable_error
# ---------------------------------------------------------------------------
class TestIsRetryableError:
def test_none_is_not_retryable(self):
assert not _StubAdapter._is_retryable_error(None)
def test_empty_string_is_not_retryable(self):
assert not _StubAdapter._is_retryable_error("")
@pytest.mark.parametrize("pattern", _RETRYABLE_ERROR_PATTERNS)
def test_known_pattern_is_retryable(self, pattern):
assert _StubAdapter._is_retryable_error(f"httpx.{pattern.title()}: connection dropped")
def test_permission_error_not_retryable(self):
assert not _StubAdapter._is_retryable_error("Forbidden: bot was blocked by the user")
def test_bad_request_not_retryable(self):
assert not _StubAdapter._is_retryable_error("Bad Request: can't parse entities")
def test_case_insensitive(self):
assert _StubAdapter._is_retryable_error("CONNECTERROR: host unreachable")
# ---------------------------------------------------------------------------
# _send_with_retry — success on first attempt
# ---------------------------------------------------------------------------
class TestSendWithRetrySuccess:
@pytest.mark.asyncio
async def test_success_first_attempt(self):
adapter = _StubAdapter()
adapter._send_results = [SendResult(success=True, message_id="123")]
result = await adapter._send_with_retry("chat1", "hello")
assert result.success
assert len(adapter._send_calls) == 1
@pytest.mark.asyncio
async def test_returns_message_id(self):
adapter = _StubAdapter()
adapter._send_results = [SendResult(success=True, message_id="abc")]
result = await adapter._send_with_retry("chat1", "hi")
assert result.message_id == "abc"
# ---------------------------------------------------------------------------
# _send_with_retry — network error with successful retry
# ---------------------------------------------------------------------------
class TestSendWithRetryNetworkRetry:
@pytest.mark.asyncio
async def test_retries_on_connect_error_and_succeeds(self):
adapter = _StubAdapter()
adapter._send_results = [
SendResult(success=False, error="httpx.ConnectError: connection refused"),
SendResult(success=True, message_id="ok"),
]
with patch("asyncio.sleep", new_callable=AsyncMock):
result = await adapter._send_with_retry("chat1", "hello", max_retries=2, base_delay=0)
assert result.success
assert len(adapter._send_calls) == 2 # initial + 1 retry
@pytest.mark.asyncio
async def test_retries_on_timeout_and_succeeds(self):
adapter = _StubAdapter()
adapter._send_results = [
SendResult(success=False, error="ReadTimeout: request timed out"),
SendResult(success=False, error="ReadTimeout: request timed out"),
SendResult(success=True, message_id="ok"),
]
with patch("asyncio.sleep", new_callable=AsyncMock):
result = await adapter._send_with_retry("chat1", "hello", max_retries=3, base_delay=0)
assert result.success
assert len(adapter._send_calls) == 3
@pytest.mark.asyncio
async def test_retryable_flag_respected(self):
"""SendResult.retryable=True should trigger retry even if error string doesn't match."""
adapter = _StubAdapter()
adapter._send_results = [
SendResult(success=False, error="internal platform error", retryable=True),
SendResult(success=True, message_id="ok"),
]
with patch("asyncio.sleep", new_callable=AsyncMock):
result = await adapter._send_with_retry("chat1", "hello", max_retries=2, base_delay=0)
assert result.success
assert len(adapter._send_calls) == 2
@pytest.mark.asyncio
async def test_network_to_nonnetwork_transition_falls_back_to_plaintext(self):
"""If error switches from network to formatting mid-retry, fall through to plain-text fallback."""
adapter = _StubAdapter()
adapter._send_results = [
SendResult(success=False, error="httpx.ConnectError: host unreachable"),
SendResult(success=False, error="Bad Request: can't parse entities"),
SendResult(success=True, message_id="fallback_ok"), # plain-text fallback
]
with patch("asyncio.sleep", new_callable=AsyncMock):
result = await adapter._send_with_retry("chat1", "**bold**", max_retries=2, base_delay=0)
assert result.success
# 3 calls: initial (network) + 1 retry (non-network, breaks loop) + plain-text fallback
assert len(adapter._send_calls) == 3
assert "plain text" in adapter._send_calls[-1][1].lower()
# ---------------------------------------------------------------------------
# _send_with_retry — all retries exhausted → user notification
# ---------------------------------------------------------------------------
class TestSendWithRetryExhausted:
@pytest.mark.asyncio
async def test_sends_user_notice_after_exhaustion(self):
adapter = _StubAdapter()
network_err = SendResult(success=False, error="httpx.ConnectError: host unreachable")
# initial + 2 retries + notice attempt
adapter._send_results = [network_err, network_err, network_err, SendResult(success=True)]
with patch("asyncio.sleep", new_callable=AsyncMock):
result = await adapter._send_with_retry("chat1", "hello", max_retries=2, base_delay=0)
# Result is the last failed one (before notice)
assert not result.success
# 4 total calls: 1 initial + 2 retries + 1 notice
assert len(adapter._send_calls) == 4
# The notice content should mention delivery failure
notice_content = adapter._send_calls[-1][1]
assert "delivery failed" in notice_content.lower() or "Message delivery failed" in notice_content
@pytest.mark.asyncio
async def test_notice_send_exception_doesnt_propagate(self):
"""If the notice itself throws, _send_with_retry should not raise."""
adapter = _StubAdapter()
network_err = SendResult(success=False, error="ConnectError")
adapter._send_results = [network_err, network_err, network_err]
original_send = adapter.send
call_count = [0]
async def send_with_notice_failure(chat_id, content, **kwargs):
call_count[0] += 1
if call_count[0] > 3:
raise RuntimeError("notice send also failed")
return network_err
adapter.send = send_with_notice_failure
with patch("asyncio.sleep", new_callable=AsyncMock):
result = await adapter._send_with_retry("chat1", "hello", max_retries=2, base_delay=0)
assert not result.success # still failed, but no exception raised
# ---------------------------------------------------------------------------
# _send_with_retry — non-network failure → plain-text fallback (no retry)
# ---------------------------------------------------------------------------
class TestSendWithRetryFallback:
@pytest.mark.asyncio
async def test_non_network_error_falls_back_immediately(self):
adapter = _StubAdapter()
adapter._send_results = [
SendResult(success=False, error="Bad Request: can't parse entities"),
SendResult(success=True, message_id="fallback_ok"),
]
with patch("asyncio.sleep", new_callable=AsyncMock) as mock_sleep:
result = await adapter._send_with_retry("chat1", "**bold**", max_retries=2, base_delay=0)
# No sleep — no retry loop for non-network errors
mock_sleep.assert_not_called()
assert result.success
assert len(adapter._send_calls) == 2
# Fallback content should be plain-text notice
assert "plain text" in adapter._send_calls[1][1].lower()
@pytest.mark.asyncio
async def test_fallback_failure_logged_but_not_raised(self):
adapter = _StubAdapter()
adapter._send_results = [
SendResult(success=False, error="Forbidden: bot blocked"),
SendResult(success=False, error="Forbidden: bot blocked"),
]
with patch("asyncio.sleep", new_callable=AsyncMock):
result = await adapter._send_with_retry("chat1", "hello", max_retries=2)
assert not result.success
assert len(adapter._send_calls) == 2 # original + fallback only
+45 -1
View File
@@ -846,7 +846,7 @@ class TestLastPromptTokens:
store.update_session("k1", model="openai/gpt-5.4")
store._db.update_token_counts.assert_called_once_with(
store._db.set_token_counts.assert_called_once_with(
"s1",
input_tokens=0,
output_tokens=0,
@@ -858,4 +858,48 @@ class TestLastPromptTokens:
billing_provider=None,
billing_base_url=None,
model="openai/gpt-5.4",
absolute=True,
)
class TestRewriteTranscriptPreservesReasoning:
"""rewrite_transcript must not drop reasoning fields from SQLite."""
def test_reasoning_survives_rewrite(self, tmp_path):
from hermes_state import SessionDB
db = SessionDB(db_path=tmp_path / "test.db")
session_id = "reasoning-test"
db.create_session(session_id=session_id, source="cli")
# Insert a message WITH all three reasoning fields
db.append_message(
session_id=session_id,
role="assistant",
content="The answer is 42.",
reasoning="I need to think step by step.",
reasoning_details=[{"type": "summary", "text": "step by step"}],
codex_reasoning_items=[{"id": "r1", "type": "reasoning"}],
)
# Verify all three were stored
before = db.get_messages_as_conversation(session_id)
assert before[0].get("reasoning") == "I need to think step by step."
assert before[0].get("reasoning_details") == [{"type": "summary", "text": "step by step"}]
assert before[0].get("codex_reasoning_items") == [{"id": "r1", "type": "reasoning"}]
# Now simulate /retry: build the SessionStore and call rewrite_transcript
config = GatewayConfig()
with patch("gateway.session.SessionStore._ensure_loaded"):
store = SessionStore(sessions_dir=tmp_path, config=config)
store._db = db
store._loaded = True
# rewrite_transcript receives the messages that load_transcript returned
store.rewrite_transcript(session_id, before)
# Load again — all three reasoning fields must survive
after = db.get_messages_as_conversation(session_id)
assert after[0].get("reasoning") == "I need to think step by step."
assert after[0].get("reasoning_details") == [{"type": "summary", "text": "step by step"}]
assert after[0].get("codex_reasoning_items") == [{"id": "r1", "type": "reasoning"}]
+4
View File
@@ -304,8 +304,12 @@ async def test_session_hygiene_messages_stay_in_originating_topic(monkeypatch, t
class FakeCompressAgent:
def __init__(self, **kwargs):
self.model = kwargs.get("model")
self.session_id = kwargs.get("session_id", "fake-session")
self._print_fn = None
def _compress_context(self, messages, *_args, **_kwargs):
# Simulate real _compress_context: create a new session_id
self.session_id = f"{self.session_id}_compressed"
return ([{"role": "assistant", "content": "compressed"}], None)
fake_run_agent = types.ModuleType("run_agent")
+110
View File
@@ -0,0 +1,110 @@
"""Tests for GatewayRunner._format_session_info — session config surfacing."""
import pytest
from unittest.mock import patch, MagicMock
from pathlib import Path
from gateway.run import GatewayRunner
@pytest.fixture()
def runner():
"""Create a bare GatewayRunner without __init__."""
return GatewayRunner.__new__(GatewayRunner)
def _patch_info(tmp_path, config_yaml, model, runtime):
"""Return a context-manager stack that patches _format_session_info deps."""
cfg_path = tmp_path / "config.yaml"
if config_yaml is not None:
cfg_path.write_text(config_yaml)
return (
patch("gateway.run._hermes_home", tmp_path),
patch("gateway.run._resolve_gateway_model", return_value=model),
patch("gateway.run._resolve_runtime_agent_kwargs", return_value=runtime),
)
class TestFormatSessionInfo:
def test_includes_model_name(self, runner, tmp_path):
p1, p2, p3 = _patch_info(tmp_path, "model:\n default: anthropic/claude-opus-4.6\n provider: openrouter\n",
"anthropic/claude-opus-4.6",
{"provider": "openrouter", "base_url": "https://openrouter.ai/api/v1", "api_key": "k"})
with p1, p2, p3:
info = runner._format_session_info()
assert "claude-opus-4.6" in info
def test_includes_provider(self, runner, tmp_path):
p1, p2, p3 = _patch_info(tmp_path, "model:\n default: test-model\n provider: openrouter\n",
"test-model",
{"provider": "openrouter", "base_url": "", "api_key": ""})
with p1, p2, p3:
info = runner._format_session_info()
assert "openrouter" in info
def test_config_context_length(self, runner, tmp_path):
p1, p2, p3 = _patch_info(tmp_path, "model:\n default: test-model\n context_length: 32768\n",
"test-model",
{"provider": "custom", "base_url": "", "api_key": ""})
with p1, p2, p3:
info = runner._format_session_info()
assert "32K" in info
assert "config" in info
def test_default_fallback_hint(self, runner, tmp_path):
p1, p2, p3 = _patch_info(tmp_path, "model:\n default: unknown-model-xyz\n",
"unknown-model-xyz",
{"provider": "", "base_url": "", "api_key": ""})
with p1, p2, p3:
info = runner._format_session_info()
assert "128K" in info
assert "model.context_length" in info
def test_local_endpoint_shown(self, runner, tmp_path):
p1, p2, p3 = _patch_info(
tmp_path,
"model:\n default: qwen3:8b\n provider: custom\n base_url: http://localhost:11434/v1\n context_length: 8192\n",
"qwen3:8b",
{"provider": "custom", "base_url": "http://localhost:11434/v1", "api_key": ""})
with p1, p2, p3:
info = runner._format_session_info()
assert "localhost:11434" in info
assert "8K" in info
def test_cloud_endpoint_hidden(self, runner, tmp_path):
p1, p2, p3 = _patch_info(tmp_path, "model:\n default: test-model\n provider: openrouter\n",
"test-model",
{"provider": "openrouter", "base_url": "https://openrouter.ai/api/v1", "api_key": "k"})
with p1, p2, p3:
info = runner._format_session_info()
assert "Endpoint" not in info
def test_million_context_format(self, runner, tmp_path):
p1, p2, p3 = _patch_info(tmp_path, "model:\n default: test-model\n context_length: 1000000\n",
"test-model",
{"provider": "", "base_url": "", "api_key": ""})
with p1, p2, p3:
info = runner._format_session_info()
assert "1.0M" in info
def test_missing_config(self, runner, tmp_path):
"""No config.yaml should not crash."""
p1, p2, p3 = _patch_info(tmp_path, None, # don't create config
"anthropic/claude-sonnet-4.6",
{"provider": "openrouter", "base_url": "", "api_key": ""})
with p1, p2, p3:
info = runner._format_session_info()
assert "Model" in info
assert "Context" in info
def test_runtime_resolution_failure_doesnt_crash(self, runner, tmp_path):
"""If runtime resolution raises, should still produce output."""
cfg_path = tmp_path / "config.yaml"
cfg_path.write_text("model:\n default: test-model\n context_length: 4096\n")
with patch("gateway.run._hermes_home", tmp_path), \
patch("gateway.run._resolve_gateway_model", return_value="test-model"), \
patch("gateway.run._resolve_runtime_agent_kwargs", side_effect=RuntimeError("no creds")):
info = runner._format_session_info()
assert "4K" in info
assert "config" in info
+280
View File
@@ -0,0 +1,280 @@
"""Tests for SSE client disconnect → agent task cancellation.
When a streaming /v1/chat/completions client disconnects mid-stream
(network drop, browser tab close), the agent is interrupted via
agent.interrupt() so it stops making LLM API calls, and the asyncio
task wrapper is cancelled.
"""
import asyncio
import json
import queue
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _make_adapter():
"""Build a minimal APIServerAdapter with mocked internals."""
from gateway.platforms.api_server import APIServerAdapter
from gateway.config import PlatformConfig
config = PlatformConfig(enabled=True, token="test-key")
adapter = APIServerAdapter(config)
return adapter
def _make_request():
"""Build a mock aiohttp request."""
req = MagicMock()
req.headers = {}
return req
# ---------------------------------------------------------------------------
# Tests
# ---------------------------------------------------------------------------
class TestSSEAgentCancelOnDisconnect:
"""gateway/platforms/api_server.py — _write_sse_chat_completion()"""
def test_agent_task_cancelled_on_client_disconnect(self):
"""When response.write raises ConnectionResetError (client dropped),
the agent task must be cancelled."""
adapter = _make_adapter()
stream_q = queue.Queue()
stream_q.put("hello ") # Some data already queued
# Agent task that runs forever (simulates a long LLM call)
agent_done = asyncio.Event()
async def fake_agent():
await agent_done.wait()
return {"final_response": "done"}, {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}
async def run():
from aiohttp import web
agent_task = asyncio.ensure_future(fake_agent())
# Mock response that raises ConnectionResetError on second write
mock_response = AsyncMock(spec=web.StreamResponse)
call_count = 0
async def write_side_effect(data):
nonlocal call_count
call_count += 1
if call_count >= 2:
raise ConnectionResetError("client disconnected")
mock_response.write = AsyncMock(side_effect=write_side_effect)
mock_response.prepare = AsyncMock()
with patch.object(type(adapter), '_write_sse_chat_completion',
adapter._write_sse_chat_completion):
# Patch StreamResponse creation
with patch("gateway.platforms.api_server.web.StreamResponse",
return_value=mock_response):
await adapter._write_sse_chat_completion(
_make_request(), "cmpl-123", "gpt-4", 1234567890,
stream_q, agent_task,
)
# The critical assertion: agent_task must be cancelled
assert agent_task.cancelled() or agent_task.done()
# Clean up
agent_done.set()
asyncio.run(run())
def test_agent_task_not_cancelled_on_normal_completion(self):
"""On normal stream completion, agent task should NOT be cancelled."""
adapter = _make_adapter()
stream_q = queue.Queue()
stream_q.put("hello")
stream_q.put(None) # End-of-stream sentinel
async def fake_agent():
return {"final_response": "done"}, {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}
async def run():
from aiohttp import web
agent_task = asyncio.ensure_future(fake_agent())
await asyncio.sleep(0) # Let agent complete
mock_response = AsyncMock(spec=web.StreamResponse)
mock_response.write = AsyncMock()
mock_response.prepare = AsyncMock()
with patch("gateway.platforms.api_server.web.StreamResponse",
return_value=mock_response):
await adapter._write_sse_chat_completion(
_make_request(), "cmpl-456", "gpt-4", 1234567890,
stream_q, agent_task,
)
# Agent should have completed normally, not been cancelled
assert agent_task.done()
assert not agent_task.cancelled()
asyncio.run(run())
def test_broken_pipe_also_cancels_agent(self):
"""BrokenPipeError (another disconnect variant) also cancels the task."""
adapter = _make_adapter()
stream_q = queue.Queue()
async def fake_agent():
await asyncio.sleep(999) # Never completes
return {}, {}
async def run():
from aiohttp import web
agent_task = asyncio.ensure_future(fake_agent())
mock_response = AsyncMock(spec=web.StreamResponse)
mock_response.write = AsyncMock(side_effect=BrokenPipeError("pipe broken"))
mock_response.prepare = AsyncMock()
with patch("gateway.platforms.api_server.web.StreamResponse",
return_value=mock_response):
await adapter._write_sse_chat_completion(
_make_request(), "cmpl-789", "gpt-4", 1234567890,
stream_q, agent_task,
)
assert agent_task.cancelled() or agent_task.done()
asyncio.run(run())
def test_already_done_task_not_cancelled_on_disconnect(self):
"""If agent already finished before disconnect, don't try to cancel."""
adapter = _make_adapter()
stream_q = queue.Queue()
stream_q.put("data")
async def fake_agent():
return {"final_response": "done"}, {}
async def run():
from aiohttp import web
agent_task = asyncio.ensure_future(fake_agent())
await asyncio.sleep(0) # Let agent complete
mock_response = AsyncMock(spec=web.StreamResponse)
call_count = 0
async def write_side_effect(data):
nonlocal call_count
call_count += 1
if call_count >= 2:
raise ConnectionResetError("late disconnect")
mock_response.write = AsyncMock(side_effect=write_side_effect)
mock_response.prepare = AsyncMock()
with patch("gateway.platforms.api_server.web.StreamResponse",
return_value=mock_response):
await adapter._write_sse_chat_completion(
_make_request(), "cmpl-done", "gpt-4", 1234567890,
stream_q, agent_task,
)
# Task was already done — should not be cancelled
assert agent_task.done()
assert not agent_task.cancelled()
asyncio.run(run())
def test_agent_interrupt_called_on_disconnect(self):
"""When the client disconnects, agent.interrupt() must be called
so the agent thread stops making LLM API calls."""
adapter = _make_adapter()
stream_q = queue.Queue()
stream_q.put("hello ")
agent_done = asyncio.Event()
async def fake_agent():
await agent_done.wait()
return {"final_response": "done"}, {}
# Mock agent with an interrupt method
mock_agent = MagicMock()
mock_agent.interrupt = MagicMock()
async def run():
from aiohttp import web
agent_task = asyncio.ensure_future(fake_agent())
agent_ref = [mock_agent]
mock_response = AsyncMock(spec=web.StreamResponse)
call_count = 0
async def write_side_effect(data):
nonlocal call_count
call_count += 1
if call_count >= 2:
raise ConnectionResetError("client disconnected")
mock_response.write = AsyncMock(side_effect=write_side_effect)
mock_response.prepare = AsyncMock()
with patch("gateway.platforms.api_server.web.StreamResponse",
return_value=mock_response):
await adapter._write_sse_chat_completion(
_make_request(), "cmpl-int", "gpt-4", 1234567890,
stream_q, agent_task, agent_ref,
)
# agent.interrupt() must have been called
mock_agent.interrupt.assert_called_once_with("SSE client disconnected")
# Clean up
agent_done.set()
asyncio.run(run())
def test_agent_ref_none_still_cancels_task(self):
"""When agent_ref is not provided (None), the task is still cancelled
on disconnect just without the interrupt() call."""
adapter = _make_adapter()
stream_q = queue.Queue()
async def fake_agent():
await asyncio.sleep(999)
return {}, {}
async def run():
from aiohttp import web
agent_task = asyncio.ensure_future(fake_agent())
mock_response = AsyncMock(spec=web.StreamResponse)
mock_response.write = AsyncMock(side_effect=BrokenPipeError("gone"))
mock_response.prepare = AsyncMock()
with patch("gateway.platforms.api_server.web.StreamResponse",
return_value=mock_response):
# No agent_ref passed — should still handle disconnect cleanly
await adapter._write_sse_chat_completion(
_make_request(), "cmpl-noref", "gpt-4", 1234567890,
stream_q, agent_task,
)
assert agent_task.cancelled() or agent_task.done()
asyncio.run(run())
+9 -1
View File
@@ -20,7 +20,7 @@ def _ensure_telegram_mock():
telegram_mod.constants.ChatType.CHANNEL = "channel"
telegram_mod.constants.ChatType.PRIVATE = "private"
for name in ("telegram", "telegram.ext", "telegram.constants"):
for name in ("telegram", "telegram.ext", "telegram.constants", "telegram.request"):
sys.modules.setdefault(name, telegram_mod)
@@ -29,6 +29,14 @@ _ensure_telegram_mock()
from gateway.platforms.telegram import TelegramAdapter # noqa: E402
@pytest.fixture(autouse=True)
def _no_auto_discovery(monkeypatch):
"""Disable DoH auto-discovery so connect() uses the plain builder chain."""
async def _noop():
return []
monkeypatch.setattr("gateway.platforms.telegram.discover_fallback_ips", _noop)
@pytest.mark.asyncio
async def test_connect_rejects_same_host_token_lock(monkeypatch):
adapter = TelegramAdapter(PlatformConfig(enabled=True, token="secret-token"))
+1 -1
View File
@@ -45,7 +45,7 @@ def _ensure_telegram_mock():
telegram_mod.constants.ChatType.CHANNEL = "channel"
telegram_mod.constants.ChatType.PRIVATE = "private"
for name in ("telegram", "telegram.ext", "telegram.constants"):
for name in ("telegram", "telegram.ext", "telegram.constants", "telegram.request"):
sys.modules.setdefault(name, telegram_mod)
+1 -1
View File
@@ -28,7 +28,7 @@ def _ensure_telegram_mock():
mod.constants.ChatType.SUPERGROUP = "supergroup"
mod.constants.ChatType.CHANNEL = "channel"
mod.constants.ChatType.PRIVATE = "private"
for name in ("telegram", "telegram.ext", "telegram.constants"):
for name in ("telegram", "telegram.ext", "telegram.constants", "telegram.request"):
sys.modules.setdefault(name, mod)
+644
View File
@@ -0,0 +1,644 @@
"""Tests for gateway.platforms.telegram_network fallback transport layer.
Background
----------
api.telegram.org resolves to an IP (e.g. 149.154.166.110) that is unreachable
from some networks. The workaround: route TCP through a different IP in the
same Telegram-owned 149.154.160.0/20 block (e.g. 149.154.167.220) while
keeping TLS SNI and the Host header as api.telegram.org so Telegram's edge
servers still accept the request. This is the programmatic equivalent of:
curl --resolve api.telegram.org:443:149.154.167.220 https://api.telegram.org/bot<token>/getMe
The TelegramFallbackTransport implements this: try the primary (DNS-resolved)
path first, and on ConnectTimeout / ConnectError fall through to configured
fallback IPs in order, then "stick" to whichever IP works.
"""
import httpx
import pytest
from gateway.platforms import telegram_network as tnet
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
class FakeTransport(httpx.AsyncBaseTransport):
"""Records calls and raises / returns based on a host→action mapping."""
def __init__(self, calls, behavior):
self.calls = calls
self.behavior = behavior
self.closed = False
async def handle_async_request(self, request: httpx.Request) -> httpx.Response:
self.calls.append(
{
"url_host": request.url.host,
"host_header": request.headers.get("host"),
"sni_hostname": request.extensions.get("sni_hostname"),
"path": request.url.path,
}
)
action = self.behavior.get(request.url.host, "ok")
if action == "timeout":
raise httpx.ConnectTimeout("timed out")
if action == "connect_error":
raise httpx.ConnectError("connect error")
if isinstance(action, Exception):
raise action
return httpx.Response(200, request=request, text="ok")
async def aclose(self) -> None:
self.closed = True
def _fake_transport_factory(calls, behavior):
"""Returns a factory that creates FakeTransport instances."""
instances = []
def factory(**kwargs):
t = FakeTransport(calls, behavior)
instances.append(t)
return t
factory.instances = instances
return factory
def _telegram_request(path="/botTOKEN/getMe"):
return httpx.Request("GET", f"https://api.telegram.org{path}")
# ═══════════════════════════════════════════════════════════════════════════
# IP parsing & validation
# ═══════════════════════════════════════════════════════════════════════════
class TestParseFallbackIpEnv:
def test_filters_invalid_and_ipv6(self, caplog):
ips = tnet.parse_fallback_ip_env("149.154.167.220, bad, 2001:67c:4e8:f004::9,149.154.167.220")
assert ips == ["149.154.167.220", "149.154.167.220"]
assert "Ignoring invalid Telegram fallback IP" in caplog.text
assert "Ignoring non-IPv4 Telegram fallback IP" in caplog.text
def test_none_returns_empty(self):
assert tnet.parse_fallback_ip_env(None) == []
def test_empty_string_returns_empty(self):
assert tnet.parse_fallback_ip_env("") == []
def test_whitespace_only_returns_empty(self):
assert tnet.parse_fallback_ip_env(" , , ") == []
def test_single_valid_ip(self):
assert tnet.parse_fallback_ip_env("149.154.167.220") == ["149.154.167.220"]
def test_multiple_valid_ips(self):
ips = tnet.parse_fallback_ip_env("149.154.167.220, 149.154.167.221")
assert ips == ["149.154.167.220", "149.154.167.221"]
def test_rejects_leading_zeros(self, caplog):
"""Leading zeros are ambiguous (octal?) so ipaddress rejects them."""
ips = tnet.parse_fallback_ip_env("149.154.167.010")
assert ips == []
assert "Ignoring invalid" in caplog.text
class TestNormalizeFallbackIps:
def test_deduplication_happens_at_transport_level(self):
"""_normalize does not dedup; TelegramFallbackTransport.__init__ does."""
raw = ["149.154.167.220", "149.154.167.220"]
assert tnet._normalize_fallback_ips(raw) == ["149.154.167.220", "149.154.167.220"]
def test_empty_strings_skipped(self):
assert tnet._normalize_fallback_ips(["", " ", "149.154.167.220"]) == ["149.154.167.220"]
# ═══════════════════════════════════════════════════════════════════════════
# Request rewriting
# ═══════════════════════════════════════════════════════════════════════════
class TestRewriteRequestForIp:
def test_preserves_host_and_sni(self):
request = _telegram_request()
rewritten = tnet._rewrite_request_for_ip(request, "149.154.167.220")
assert rewritten.url.host == "149.154.167.220"
assert rewritten.headers["host"] == "api.telegram.org"
assert rewritten.extensions["sni_hostname"] == "api.telegram.org"
assert rewritten.url.path == "/botTOKEN/getMe"
def test_preserves_method_and_path(self):
request = httpx.Request("POST", "https://api.telegram.org/botTOKEN/sendMessage")
rewritten = tnet._rewrite_request_for_ip(request, "149.154.167.220")
assert rewritten.method == "POST"
assert rewritten.url.path == "/botTOKEN/sendMessage"
# ═══════════════════════════════════════════════════════════════════════════
# Fallback transport core behavior
# ═══════════════════════════════════════════════════════════════════════════
class TestFallbackTransport:
"""Primary path fails → try fallback IPs → stick to whichever works."""
@pytest.mark.asyncio
async def test_falls_back_on_connect_timeout_and_becomes_sticky(self, monkeypatch):
calls = []
behavior = {"api.telegram.org": "timeout", "149.154.167.220": "ok"}
monkeypatch.setattr(tnet.httpx, "AsyncHTTPTransport", _fake_transport_factory(calls, behavior))
transport = tnet.TelegramFallbackTransport(["149.154.167.220"])
resp = await transport.handle_async_request(_telegram_request())
assert resp.status_code == 200
assert transport._sticky_ip == "149.154.167.220"
# First attempt was primary (api.telegram.org), second was fallback
assert calls[0]["url_host"] == "api.telegram.org"
assert calls[1]["url_host"] == "149.154.167.220"
assert calls[1]["host_header"] == "api.telegram.org"
assert calls[1]["sni_hostname"] == "api.telegram.org"
# Second request goes straight to sticky IP
calls.clear()
resp2 = await transport.handle_async_request(_telegram_request())
assert resp2.status_code == 200
assert calls[0]["url_host"] == "149.154.167.220"
@pytest.mark.asyncio
async def test_falls_back_on_connect_error(self, monkeypatch):
calls = []
behavior = {"api.telegram.org": "connect_error", "149.154.167.220": "ok"}
monkeypatch.setattr(tnet.httpx, "AsyncHTTPTransport", _fake_transport_factory(calls, behavior))
transport = tnet.TelegramFallbackTransport(["149.154.167.220"])
resp = await transport.handle_async_request(_telegram_request())
assert resp.status_code == 200
assert transport._sticky_ip == "149.154.167.220"
@pytest.mark.asyncio
async def test_does_not_fallback_on_non_connect_error(self, monkeypatch):
"""Errors like ReadTimeout are not connection issues — don't retry."""
calls = []
behavior = {"api.telegram.org": httpx.ReadTimeout("read timeout"), "149.154.167.220": "ok"}
monkeypatch.setattr(tnet.httpx, "AsyncHTTPTransport", _fake_transport_factory(calls, behavior))
transport = tnet.TelegramFallbackTransport(["149.154.167.220"])
with pytest.raises(httpx.ReadTimeout):
await transport.handle_async_request(_telegram_request())
assert [c["url_host"] for c in calls] == ["api.telegram.org"]
@pytest.mark.asyncio
async def test_all_ips_fail_raises_last_error(self, monkeypatch):
calls = []
behavior = {"api.telegram.org": "timeout", "149.154.167.220": "timeout"}
monkeypatch.setattr(tnet.httpx, "AsyncHTTPTransport", _fake_transport_factory(calls, behavior))
transport = tnet.TelegramFallbackTransport(["149.154.167.220"])
with pytest.raises(httpx.ConnectTimeout):
await transport.handle_async_request(_telegram_request())
assert [c["url_host"] for c in calls] == ["api.telegram.org", "149.154.167.220"]
assert transport._sticky_ip is None
@pytest.mark.asyncio
async def test_multiple_fallback_ips_tried_in_order(self, monkeypatch):
calls = []
behavior = {
"api.telegram.org": "timeout",
"149.154.167.220": "timeout",
"149.154.167.221": "ok",
}
monkeypatch.setattr(tnet.httpx, "AsyncHTTPTransport", _fake_transport_factory(calls, behavior))
transport = tnet.TelegramFallbackTransport(["149.154.167.220", "149.154.167.221"])
resp = await transport.handle_async_request(_telegram_request())
assert resp.status_code == 200
assert transport._sticky_ip == "149.154.167.221"
assert [c["url_host"] for c in calls] == [
"api.telegram.org",
"149.154.167.220",
"149.154.167.221",
]
@pytest.mark.asyncio
async def test_sticky_ip_tried_first_but_falls_through_if_stale(self, monkeypatch):
"""If the sticky IP stops working, the transport retries others."""
calls = []
behavior = {
"api.telegram.org": "timeout",
"149.154.167.220": "ok",
"149.154.167.221": "ok",
}
monkeypatch.setattr(tnet.httpx, "AsyncHTTPTransport", _fake_transport_factory(calls, behavior))
transport = tnet.TelegramFallbackTransport(["149.154.167.220", "149.154.167.221"])
# First request: primary fails → .220 works → becomes sticky
await transport.handle_async_request(_telegram_request())
assert transport._sticky_ip == "149.154.167.220"
# Now .220 goes bad too
calls.clear()
behavior["149.154.167.220"] = "timeout"
resp = await transport.handle_async_request(_telegram_request())
assert resp.status_code == 200
# Tried sticky (.220) first, then fell through to .221
assert [c["url_host"] for c in calls] == ["149.154.167.220", "149.154.167.221"]
assert transport._sticky_ip == "149.154.167.221"
class TestFallbackTransportPassthrough:
"""Requests that don't need fallback behavior."""
@pytest.mark.asyncio
async def test_non_telegram_host_bypasses_fallback(self, monkeypatch):
calls = []
behavior = {}
monkeypatch.setattr(tnet.httpx, "AsyncHTTPTransport", _fake_transport_factory(calls, behavior))
transport = tnet.TelegramFallbackTransport(["149.154.167.220"])
request = httpx.Request("GET", "https://example.com/path")
resp = await transport.handle_async_request(request)
assert resp.status_code == 200
assert calls[0]["url_host"] == "example.com"
assert transport._sticky_ip is None
@pytest.mark.asyncio
async def test_empty_fallback_list_uses_primary_only(self, monkeypatch):
calls = []
behavior = {}
monkeypatch.setattr(tnet.httpx, "AsyncHTTPTransport", _fake_transport_factory(calls, behavior))
transport = tnet.TelegramFallbackTransport([])
resp = await transport.handle_async_request(_telegram_request())
assert resp.status_code == 200
assert calls[0]["url_host"] == "api.telegram.org"
@pytest.mark.asyncio
async def test_primary_succeeds_no_fallback_needed(self, monkeypatch):
calls = []
behavior = {"api.telegram.org": "ok"}
monkeypatch.setattr(tnet.httpx, "AsyncHTTPTransport", _fake_transport_factory(calls, behavior))
transport = tnet.TelegramFallbackTransport(["149.154.167.220"])
resp = await transport.handle_async_request(_telegram_request())
assert resp.status_code == 200
assert transport._sticky_ip is None
assert len(calls) == 1
class TestFallbackTransportInit:
def test_deduplicates_fallback_ips(self, monkeypatch):
monkeypatch.setattr(
tnet.httpx, "AsyncHTTPTransport", lambda **kw: FakeTransport([], {})
)
transport = tnet.TelegramFallbackTransport(["149.154.167.220", "149.154.167.220"])
assert transport._fallback_ips == ["149.154.167.220"]
def test_filters_invalid_ips_at_init(self, monkeypatch):
monkeypatch.setattr(
tnet.httpx, "AsyncHTTPTransport", lambda **kw: FakeTransport([], {})
)
transport = tnet.TelegramFallbackTransport(["149.154.167.220", "not-an-ip"])
assert transport._fallback_ips == ["149.154.167.220"]
def test_uses_proxy_env_for_primary_and_fallback_transports(self, monkeypatch):
seen_kwargs = []
def factory(**kwargs):
seen_kwargs.append(kwargs.copy())
return FakeTransport([], {})
for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY", "https_proxy", "http_proxy", "all_proxy"):
monkeypatch.delenv(key, raising=False)
monkeypatch.setenv("HTTPS_PROXY", "http://proxy.example:8080")
monkeypatch.setattr(tnet.httpx, "AsyncHTTPTransport", factory)
transport = tnet.TelegramFallbackTransport(["149.154.167.220"])
assert transport._fallback_ips == ["149.154.167.220"]
assert len(seen_kwargs) == 2
assert all(kwargs["proxy"] == "http://proxy.example:8080" for kwargs in seen_kwargs)
class TestFallbackTransportClose:
@pytest.mark.asyncio
async def test_aclose_closes_all_transports(self, monkeypatch):
factory = _fake_transport_factory([], {})
monkeypatch.setattr(tnet.httpx, "AsyncHTTPTransport", factory)
transport = tnet.TelegramFallbackTransport(["149.154.167.220", "149.154.167.221"])
await transport.aclose()
# 1 primary + 2 fallback transports
assert len(factory.instances) == 3
assert all(t.closed for t in factory.instances)
# ═══════════════════════════════════════════════════════════════════════════
# Config layer TELEGRAM_FALLBACK_IPS env → config.extra
# ═══════════════════════════════════════════════════════════════════════════
class TestConfigFallbackIps:
def test_env_var_populates_config_extra(self, monkeypatch):
from gateway.config import GatewayConfig, Platform, PlatformConfig, _apply_env_overrides
monkeypatch.setenv("TELEGRAM_FALLBACK_IPS", "149.154.167.220,149.154.167.221")
config = GatewayConfig(platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="tok")})
_apply_env_overrides(config)
assert config.platforms[Platform.TELEGRAM].extra["fallback_ips"] == [
"149.154.167.220", "149.154.167.221",
]
def test_env_var_creates_platform_if_missing(self, monkeypatch):
from gateway.config import GatewayConfig, Platform, _apply_env_overrides
monkeypatch.setenv("TELEGRAM_FALLBACK_IPS", "149.154.167.220")
config = GatewayConfig(platforms={})
_apply_env_overrides(config)
assert Platform.TELEGRAM in config.platforms
assert config.platforms[Platform.TELEGRAM].extra["fallback_ips"] == ["149.154.167.220"]
def test_env_var_strips_whitespace(self, monkeypatch):
from gateway.config import GatewayConfig, Platform, PlatformConfig, _apply_env_overrides
monkeypatch.setenv("TELEGRAM_FALLBACK_IPS", " 149.154.167.220 , 149.154.167.221 ")
config = GatewayConfig(platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="tok")})
_apply_env_overrides(config)
assert config.platforms[Platform.TELEGRAM].extra["fallback_ips"] == [
"149.154.167.220", "149.154.167.221",
]
def test_empty_env_var_does_not_populate(self, monkeypatch):
from gateway.config import GatewayConfig, Platform, PlatformConfig, _apply_env_overrides
monkeypatch.setenv("TELEGRAM_FALLBACK_IPS", "")
config = GatewayConfig(platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="tok")})
_apply_env_overrides(config)
assert "fallback_ips" not in config.platforms[Platform.TELEGRAM].extra
# ═══════════════════════════════════════════════════════════════════════════
# Adapter layer _fallback_ips() reads config correctly
# ═══════════════════════════════════════════════════════════════════════════
class TestAdapterFallbackIps:
def _make_adapter(self, extra=None):
import sys
from unittest.mock import MagicMock
# Ensure telegram mock is in place
if "telegram" not in sys.modules or not hasattr(sys.modules["telegram"], "__file__"):
mod = MagicMock()
mod.ext.ContextTypes.DEFAULT_TYPE = type(None)
mod.constants.ParseMode.MARKDOWN_V2 = "MarkdownV2"
mod.constants.ChatType.GROUP = "group"
mod.constants.ChatType.SUPERGROUP = "supergroup"
mod.constants.ChatType.CHANNEL = "channel"
mod.constants.ChatType.PRIVATE = "private"
for name in ("telegram", "telegram.ext", "telegram.constants", "telegram.request"):
sys.modules.setdefault(name, mod)
from gateway.config import PlatformConfig
from gateway.platforms.telegram import TelegramAdapter
config = PlatformConfig(enabled=True, token="test-token")
if extra:
config.extra.update(extra)
return TelegramAdapter(config)
def test_list_in_extra(self):
adapter = self._make_adapter(extra={"fallback_ips": ["149.154.167.220"]})
assert adapter._fallback_ips() == ["149.154.167.220"]
def test_csv_string_in_extra(self):
adapter = self._make_adapter(extra={"fallback_ips": "149.154.167.220,149.154.167.221"})
assert adapter._fallback_ips() == ["149.154.167.220", "149.154.167.221"]
def test_empty_extra(self):
adapter = self._make_adapter()
assert adapter._fallback_ips() == []
def test_no_extra_attr(self):
adapter = self._make_adapter()
adapter.config.extra = None
assert adapter._fallback_ips() == []
def test_invalid_ips_filtered(self):
adapter = self._make_adapter(extra={"fallback_ips": ["149.154.167.220", "not-valid"]})
assert adapter._fallback_ips() == ["149.154.167.220"]
# ═══════════════════════════════════════════════════════════════════════════
# DoH auto-discovery
# ═══════════════════════════════════════════════════════════════════════════
def _doh_answer(*ips: str) -> dict:
"""Build a minimal DoH JSON response with A records."""
return {"Answer": [{"type": 1, "data": ip} for ip in ips]}
class FakeDoHClient:
"""Mock httpx.AsyncClient for DoH queries."""
def __init__(self, responses: dict):
# responses: URL prefix → (status, json_body) | Exception
self._responses = responses
self.requests_made: list[dict] = []
@staticmethod
def _make_response(status, body, url):
"""Build an httpx.Response with a request attached (needed for raise_for_status)."""
request = httpx.Request("GET", url)
return httpx.Response(status, json=body, request=request)
async def get(self, url, *, params=None, headers=None, **kwargs):
self.requests_made.append({"url": url, "params": params, "headers": headers})
for prefix, action in self._responses.items():
if url.startswith(prefix):
if isinstance(action, Exception):
raise action
status, body = action
return self._make_response(status, body, url)
return self._make_response(200, {}, url)
async def __aenter__(self):
return self
async def __aexit__(self, *args):
pass
class TestDiscoverFallbackIps:
"""Tests for discover_fallback_ips() — DoH-based auto-discovery."""
def _patch_doh(self, monkeypatch, responses, system_dns_ips=None):
"""Wire up fake DoH client and system DNS."""
client = FakeDoHClient(responses)
monkeypatch.setattr(tnet.httpx, "AsyncClient", lambda **kw: client)
if system_dns_ips is not None:
addrs = [(None, None, None, None, (ip, 443)) for ip in system_dns_ips]
monkeypatch.setattr(tnet.socket, "getaddrinfo", lambda *a, **kw: addrs)
else:
def _fail(*a, **kw):
raise OSError("dns failed")
monkeypatch.setattr(tnet.socket, "getaddrinfo", _fail)
return client
@pytest.mark.asyncio
async def test_google_and_cloudflare_ips_collected(self, monkeypatch):
self._patch_doh(monkeypatch, {
"https://dns.google": (200, _doh_answer("149.154.167.220")),
"https://cloudflare-dns.com": (200, _doh_answer("149.154.167.221")),
}, system_dns_ips=["149.154.166.110"])
ips = await tnet.discover_fallback_ips()
assert "149.154.167.220" in ips
assert "149.154.167.221" in ips
@pytest.mark.asyncio
async def test_system_dns_ip_excluded(self, monkeypatch):
"""The IP from system DNS is the one that doesn't work — exclude it."""
self._patch_doh(monkeypatch, {
"https://dns.google": (200, _doh_answer("149.154.166.110", "149.154.167.220")),
"https://cloudflare-dns.com": (200, _doh_answer("149.154.166.110")),
}, system_dns_ips=["149.154.166.110"])
ips = await tnet.discover_fallback_ips()
assert ips == ["149.154.167.220"]
@pytest.mark.asyncio
async def test_doh_results_deduplicated(self, monkeypatch):
self._patch_doh(monkeypatch, {
"https://dns.google": (200, _doh_answer("149.154.167.220")),
"https://cloudflare-dns.com": (200, _doh_answer("149.154.167.220")),
}, system_dns_ips=["149.154.166.110"])
ips = await tnet.discover_fallback_ips()
assert ips == ["149.154.167.220"]
@pytest.mark.asyncio
async def test_doh_timeout_falls_back_to_seed(self, monkeypatch):
self._patch_doh(monkeypatch, {
"https://dns.google": httpx.TimeoutException("timeout"),
"https://cloudflare-dns.com": httpx.TimeoutException("timeout"),
}, system_dns_ips=["149.154.166.110"])
ips = await tnet.discover_fallback_ips()
assert ips == tnet._SEED_FALLBACK_IPS
@pytest.mark.asyncio
async def test_doh_connect_error_falls_back_to_seed(self, monkeypatch):
self._patch_doh(monkeypatch, {
"https://dns.google": httpx.ConnectError("refused"),
"https://cloudflare-dns.com": httpx.ConnectError("refused"),
}, system_dns_ips=["149.154.166.110"])
ips = await tnet.discover_fallback_ips()
assert ips == tnet._SEED_FALLBACK_IPS
@pytest.mark.asyncio
async def test_doh_malformed_json_falls_back_to_seed(self, monkeypatch):
self._patch_doh(monkeypatch, {
"https://dns.google": (200, {"Status": 0}), # no Answer key
"https://cloudflare-dns.com": (200, {"garbage": True}),
}, system_dns_ips=["149.154.166.110"])
ips = await tnet.discover_fallback_ips()
assert ips == tnet._SEED_FALLBACK_IPS
@pytest.mark.asyncio
async def test_one_provider_fails_other_succeeds(self, monkeypatch):
self._patch_doh(monkeypatch, {
"https://dns.google": httpx.TimeoutException("timeout"),
"https://cloudflare-dns.com": (200, _doh_answer("149.154.167.220")),
}, system_dns_ips=["149.154.166.110"])
ips = await tnet.discover_fallback_ips()
assert ips == ["149.154.167.220"]
@pytest.mark.asyncio
async def test_system_dns_failure_keeps_all_doh_ips(self, monkeypatch):
"""If system DNS fails, nothing gets excluded — all DoH IPs kept."""
self._patch_doh(monkeypatch, {
"https://dns.google": (200, _doh_answer("149.154.166.110", "149.154.167.220")),
"https://cloudflare-dns.com": (200, _doh_answer()),
}, system_dns_ips=None) # triggers OSError
ips = await tnet.discover_fallback_ips()
assert "149.154.166.110" in ips
assert "149.154.167.220" in ips
@pytest.mark.asyncio
async def test_all_doh_ips_same_as_system_dns_uses_seed(self, monkeypatch):
"""DoH returns only the same blocked IP — seed list is the fallback."""
self._patch_doh(monkeypatch, {
"https://dns.google": (200, _doh_answer("149.154.166.110")),
"https://cloudflare-dns.com": (200, _doh_answer("149.154.166.110")),
}, system_dns_ips=["149.154.166.110"])
ips = await tnet.discover_fallback_ips()
assert ips == tnet._SEED_FALLBACK_IPS
@pytest.mark.asyncio
async def test_cloudflare_gets_accept_header(self, monkeypatch):
client = self._patch_doh(monkeypatch, {
"https://dns.google": (200, _doh_answer("149.154.167.220")),
"https://cloudflare-dns.com": (200, _doh_answer("149.154.167.221")),
}, system_dns_ips=["149.154.166.110"])
await tnet.discover_fallback_ips()
cf_reqs = [r for r in client.requests_made if "cloudflare" in r["url"]]
assert cf_reqs
assert cf_reqs[0]["headers"]["Accept"] == "application/dns-json"
@pytest.mark.asyncio
async def test_non_a_records_ignored(self, monkeypatch):
"""AAAA records (type 28) and CNAME (type 5) should be skipped."""
answer = {
"Answer": [
{"type": 5, "data": "telegram.org"}, # CNAME
{"type": 28, "data": "2001:67c:4e8:f004::9"}, # AAAA
{"type": 1, "data": "149.154.167.220"}, # A ✓
]
}
self._patch_doh(monkeypatch, {
"https://dns.google": (200, answer),
"https://cloudflare-dns.com": (200, _doh_answer()),
}, system_dns_ips=["149.154.166.110"])
ips = await tnet.discover_fallback_ips()
assert ips == ["149.154.167.220"]
@pytest.mark.asyncio
async def test_invalid_ip_in_doh_response_skipped(self, monkeypatch):
answer = {"Answer": [
{"type": 1, "data": "not-an-ip"},
{"type": 1, "data": "149.154.167.220"},
]}
self._patch_doh(monkeypatch, {
"https://dns.google": (200, answer),
"https://cloudflare-dns.com": (200, _doh_answer()),
}, system_dns_ips=["149.154.166.110"])
ips = await tnet.discover_fallback_ips()
assert ips == ["149.154.167.220"]
@@ -27,7 +27,7 @@ def _ensure_telegram_mock():
telegram_mod.constants.ChatType.CHANNEL = "channel"
telegram_mod.constants.ChatType.PRIVATE = "private"
for name in ("telegram", "telegram.ext", "telegram.constants"):
for name in ("telegram", "telegram.ext", "telegram.constants", "telegram.request"):
sys.modules.setdefault(name, telegram_mod)
@@ -36,6 +36,14 @@ _ensure_telegram_mock()
from gateway.platforms.telegram import TelegramAdapter # noqa: E402
@pytest.fixture(autouse=True)
def _no_auto_discovery(monkeypatch):
"""Disable DoH auto-discovery so connect() uses the plain builder chain."""
async def _noop():
return []
monkeypatch.setattr("gateway.platforms.telegram.discover_fallback_ips", _noop)
def _make_adapter() -> TelegramAdapter:
return TelegramAdapter(PlatformConfig(enabled=True, token="test-token"))
+1 -1
View File
@@ -25,7 +25,7 @@ def _ensure_telegram_mock():
mod.constants.ChatType.SUPERGROUP = "supergroup"
mod.constants.ChatType.CHANNEL = "channel"
mod.constants.ChatType.PRIVATE = "private"
for name in ("telegram", "telegram.ext", "telegram.constants"):
for name in ("telegram", "telegram.ext", "telegram.constants", "telegram.request"):
sys.modules.setdefault(name, mod)
@@ -0,0 +1,199 @@
"""Tests for Telegram send() thread_id fallback.
When message_thread_id points to a non-existent thread, Telegram returns
BadRequest('Message thread not found'). Since BadRequest is a subclass of
NetworkError in python-telegram-bot, the old retry loop treated this as a
transient error and retried 3 times before silently failing killing all
tool progress messages, streaming responses, and typing indicators.
The fix detects "thread not found" BadRequest errors and retries the send
WITHOUT message_thread_id so the message still reaches the chat.
"""
import sys
import types
from types import SimpleNamespace
import pytest
from gateway.config import PlatformConfig, Platform
from gateway.platforms.base import SendResult
# ── Fake telegram.error hierarchy ──────────────────────────────────────
# Mirrors the real python-telegram-bot hierarchy:
# BadRequest → NetworkError → TelegramError → Exception
class FakeNetworkError(Exception):
pass
class FakeBadRequest(FakeNetworkError):
pass
# Build a fake telegram module tree so the adapter's internal imports work
_fake_telegram = types.ModuleType("telegram")
_fake_telegram_error = types.ModuleType("telegram.error")
_fake_telegram_error.NetworkError = FakeNetworkError
_fake_telegram_error.BadRequest = FakeBadRequest
_fake_telegram.error = _fake_telegram_error
_fake_telegram_constants = types.ModuleType("telegram.constants")
_fake_telegram_constants.ParseMode = SimpleNamespace(MARKDOWN_V2="MarkdownV2")
_fake_telegram.constants = _fake_telegram_constants
@pytest.fixture(autouse=True)
def _inject_fake_telegram(monkeypatch):
"""Inject fake telegram modules so the adapter can import from them."""
monkeypatch.setitem(sys.modules, "telegram", _fake_telegram)
monkeypatch.setitem(sys.modules, "telegram.error", _fake_telegram_error)
monkeypatch.setitem(sys.modules, "telegram.constants", _fake_telegram_constants)
def _make_adapter():
from gateway.platforms.telegram import TelegramAdapter
config = PlatformConfig(enabled=True, token="fake-token")
adapter = object.__new__(TelegramAdapter)
adapter._config = config
adapter._platform = Platform.TELEGRAM
adapter._connected = True
adapter._dm_topics = {}
adapter._dm_topics_config = []
adapter._reply_to_mode = "first"
adapter._fallback_ips = []
adapter._polling_conflict_count = 0
adapter._polling_network_error_count = 0
adapter._polling_error_callback_ref = None
adapter.platform = Platform.TELEGRAM
return adapter
@pytest.mark.asyncio
async def test_send_retries_without_thread_on_thread_not_found():
"""When message_thread_id causes 'thread not found', retry without it."""
adapter = _make_adapter()
call_log = []
async def mock_send_message(**kwargs):
call_log.append(dict(kwargs))
tid = kwargs.get("message_thread_id")
if tid is not None:
raise FakeBadRequest("Message thread not found")
return SimpleNamespace(message_id=42)
adapter._bot = SimpleNamespace(send_message=mock_send_message)
result = await adapter.send(
chat_id="123",
content="test message",
metadata={"thread_id": "99999"},
)
assert result.success is True
assert result.message_id == "42"
# First call has thread_id, second call retries without
assert len(call_log) == 2
assert call_log[0]["message_thread_id"] == 99999
assert call_log[1]["message_thread_id"] is None
@pytest.mark.asyncio
async def test_send_raises_on_other_bad_request():
"""Non-thread BadRequest errors should NOT be retried — they fail immediately."""
adapter = _make_adapter()
async def mock_send_message(**kwargs):
raise FakeBadRequest("Chat not found")
adapter._bot = SimpleNamespace(send_message=mock_send_message)
result = await adapter.send(
chat_id="123",
content="test message",
metadata={"thread_id": "99999"},
)
assert result.success is False
assert "Chat not found" in result.error
@pytest.mark.asyncio
async def test_send_without_thread_id_unaffected():
"""Normal sends without thread_id should work as before."""
adapter = _make_adapter()
call_log = []
async def mock_send_message(**kwargs):
call_log.append(dict(kwargs))
return SimpleNamespace(message_id=100)
adapter._bot = SimpleNamespace(send_message=mock_send_message)
result = await adapter.send(
chat_id="123",
content="test message",
)
assert result.success is True
assert len(call_log) == 1
assert call_log[0]["message_thread_id"] is None
@pytest.mark.asyncio
async def test_send_retries_network_errors_normally():
"""Real transient network errors (not BadRequest) should still be retried."""
adapter = _make_adapter()
attempt = [0]
async def mock_send_message(**kwargs):
attempt[0] += 1
if attempt[0] < 3:
raise FakeNetworkError("Connection reset")
return SimpleNamespace(message_id=200)
adapter._bot = SimpleNamespace(send_message=mock_send_message)
result = await adapter.send(
chat_id="123",
content="test message",
)
assert result.success is True
assert attempt[0] == 3 # Two retries then success
@pytest.mark.asyncio
async def test_thread_fallback_only_fires_once():
"""After clearing thread_id, subsequent chunks should also use None."""
adapter = _make_adapter()
call_log = []
async def mock_send_message(**kwargs):
call_log.append(dict(kwargs))
tid = kwargs.get("message_thread_id")
if tid is not None:
raise FakeBadRequest("Message thread not found")
return SimpleNamespace(message_id=42)
adapter._bot = SimpleNamespace(send_message=mock_send_message)
# Send a long message that gets split into chunks
long_msg = "A" * 5000 # Exceeds Telegram's 4096 limit
result = await adapter.send(
chat_id="123",
content=long_msg,
metadata={"thread_id": "99999"},
)
assert result.success is True
# First chunk: attempt with thread → fail → retry without → succeed
# Second chunk: should use thread_id=None directly (effective_thread_id
# was cleared per-chunk but the metadata doesn't change between chunks)
# The key point: the message was delivered despite the invalid thread
@@ -0,0 +1,87 @@
"""Tests for webhook adapter dynamic route loading."""
import json
import os
import pytest
from pathlib import Path
from gateway.config import PlatformConfig
from gateway.platforms.webhook import WebhookAdapter, _DYNAMIC_ROUTES_FILENAME
def _make_adapter(routes=None, extra=None):
_extra = extra or {}
if routes:
_extra["routes"] = routes
_extra.setdefault("secret", "test-global-secret")
config = PlatformConfig(enabled=True, extra=_extra)
return WebhookAdapter(config)
@pytest.fixture(autouse=True)
def _isolate(tmp_path, monkeypatch):
monkeypatch.setenv("HERMES_HOME", str(tmp_path))
class TestDynamicRouteLoading:
def test_no_dynamic_file(self):
adapter = _make_adapter(routes={"static": {"secret": "s"}})
adapter._reload_dynamic_routes()
assert "static" in adapter._routes
assert len(adapter._dynamic_routes) == 0
def test_loads_dynamic_routes(self, tmp_path):
subs = {"my-hook": {"secret": "dynamic-secret", "prompt": "test", "events": []}}
(tmp_path / _DYNAMIC_ROUTES_FILENAME).write_text(json.dumps(subs))
adapter = _make_adapter(routes={"static": {"secret": "s"}})
adapter._reload_dynamic_routes()
assert "my-hook" in adapter._routes
assert "static" in adapter._routes
def test_static_takes_precedence(self, tmp_path):
(tmp_path / _DYNAMIC_ROUTES_FILENAME).write_text(
json.dumps({"conflict": {"secret": "dynamic", "prompt": "dyn"}})
)
adapter = _make_adapter(routes={"conflict": {"secret": "static", "prompt": "stat"}})
adapter._reload_dynamic_routes()
assert adapter._routes["conflict"]["secret"] == "static"
def test_mtime_gated(self, tmp_path):
import time
path = tmp_path / _DYNAMIC_ROUTES_FILENAME
path.write_text(json.dumps({"v1": {"secret": "s"}}))
adapter = _make_adapter()
adapter._reload_dynamic_routes()
assert "v1" in adapter._dynamic_routes
# Same mtime — no reload
adapter._dynamic_routes["injected"] = True
adapter._reload_dynamic_routes()
assert "injected" in adapter._dynamic_routes
# New write — reloads
time.sleep(0.05)
path.write_text(json.dumps({"v2": {"secret": "s"}}))
adapter._reload_dynamic_routes()
assert "v2" in adapter._dynamic_routes
assert "v1" not in adapter._dynamic_routes
def test_file_removal_clears(self, tmp_path):
path = tmp_path / _DYNAMIC_ROUTES_FILENAME
path.write_text(json.dumps({"temp": {"secret": "s"}}))
adapter = _make_adapter()
adapter._reload_dynamic_routes()
assert "temp" in adapter._dynamic_routes
path.unlink()
adapter._reload_dynamic_routes()
assert len(adapter._dynamic_routes) == 0
def test_corrupted_file(self, tmp_path):
(tmp_path / _DYNAMIC_ROUTES_FILENAME).write_text("not json")
adapter = _make_adapter(routes={"static": {"secret": "s"}})
adapter._reload_dynamic_routes()
assert "static" in adapter._routes
assert len(adapter._dynamic_routes) == 0
+21
View File
@@ -105,3 +105,24 @@ class TestCmdUpdateBranchFallback:
commands = [" ".join(str(a) for a in c.args[0]) for c in mock_run.call_args_list]
pull_cmds = [c for c in commands if "pull" in c]
assert len(pull_cmds) == 0
def test_update_non_interactive_skips_migration_prompt(self, mock_args, capsys):
"""When stdin/stdout aren't TTYs, config migration prompt is skipped."""
with patch("shutil.which", return_value=None), patch(
"subprocess.run"
) as mock_run, patch("builtins.input") as mock_input, patch(
"hermes_cli.config.get_missing_env_vars", return_value=["MISSING_KEY"]
), patch("hermes_cli.config.get_missing_config_fields", return_value=[]), patch(
"hermes_cli.config.check_config_version", return_value=(1, 2)
), patch("hermes_cli.main.sys") as mock_sys:
mock_sys.stdin.isatty.return_value = False
mock_sys.stdout.isatty.return_value = False
mock_run.side_effect = _make_run_side_effect(
branch="main", verify_ok=True, commit_count="1"
)
cmd_update(mock_args)
mock_input.assert_not_called()
captured = capsys.readouterr()
assert "Non-interactive session" in captured.out
+19 -3
View File
@@ -1,6 +1,7 @@
"""Tests for gateway service management helpers."""
import os
from pathlib import Path
from types import SimpleNamespace
import hermes_cli.gateway as gateway_cli
@@ -152,12 +153,13 @@ class TestLaunchdServiceRecovery:
def test_launchd_start_reloads_unloaded_job_and_retries(self, tmp_path, monkeypatch):
plist_path = tmp_path / "ai.hermes.gateway.plist"
plist_path.write_text(gateway_cli.generate_launchd_plist(), encoding="utf-8")
label = gateway_cli.get_launchd_label()
calls = []
def fake_run(cmd, check=False, **kwargs):
calls.append(cmd)
if cmd == ["launchctl", "start", "ai.hermes.gateway"] and calls.count(cmd) == 1:
if cmd == ["launchctl", "start", label] and calls.count(cmd) == 1:
raise gateway_cli.subprocess.CalledProcessError(3, cmd, stderr="Could not find service")
return SimpleNamespace(returncode=0, stdout="", stderr="")
@@ -167,9 +169,9 @@ class TestLaunchdServiceRecovery:
gateway_cli.launchd_start()
assert calls == [
["launchctl", "start", "ai.hermes.gateway"],
["launchctl", "start", label],
["launchctl", "load", str(plist_path)],
["launchctl", "start", "ai.hermes.gateway"],
["launchctl", "start", label],
]
def test_launchd_status_reports_local_stale_plist_when_unloaded(self, tmp_path, monkeypatch, capsys):
@@ -354,6 +356,20 @@ class TestGeneratedUnitUsesDetectedVenv:
assert "/venv/" not in unit or "/.venv/" in unit
class TestGeneratedUnitIncludesLocalBin:
"""~/.local/bin must be in PATH so uvx/pipx tools are discoverable."""
def test_user_unit_includes_local_bin_in_path(self):
unit = gateway_cli.generate_systemd_unit(system=False)
home = str(Path.home())
assert f"{home}/.local/bin" in unit
def test_system_unit_includes_local_bin_in_path(self):
unit = gateway_cli.generate_systemd_unit(system=True)
# System unit uses the resolved home dir from _system_service_identity
assert "/.local/bin" in unit
class TestEnsureUserSystemdEnv:
"""Tests for _ensure_user_systemd_env() D-Bus session bus auto-detection."""
@@ -94,7 +94,7 @@ class TestOfferOpenclawMigration:
fake_mod.Migrator.assert_called_once()
call_kwargs = fake_mod.Migrator.call_args[1]
assert call_kwargs["execute"] is True
assert call_kwargs["overwrite"] is False
assert call_kwargs["overwrite"] is True
assert call_kwargs["migrate_secrets"] is True
assert call_kwargs["preset_name"] == "full"
fake_migrator.migrate.assert_called_once()
@@ -285,3 +285,182 @@ class TestSetupWizardOpenclawIntegration:
setup_mod.run_setup_wizard(args)
mock_migration.assert_not_called()
# ---------------------------------------------------------------------------
# _get_section_config_summary / _skip_configured_section — unit tests
# ---------------------------------------------------------------------------
class TestGetSectionConfigSummary:
"""Test the _get_section_config_summary helper."""
def test_model_returns_none_without_api_key(self):
with patch.object(setup_mod, "get_env_value", return_value=""):
result = setup_mod._get_section_config_summary({}, "model")
assert result is None
def test_model_returns_summary_with_api_key(self):
def env_side(key):
return "sk-xxx" if key == "OPENROUTER_API_KEY" else ""
with patch.object(setup_mod, "get_env_value", side_effect=env_side):
result = setup_mod._get_section_config_summary(
{"model": "openai/gpt-4"}, "model"
)
assert result == "openai/gpt-4"
def test_model_returns_dict_default_key(self):
def env_side(key):
return "sk-xxx" if key == "OPENAI_API_KEY" else ""
with patch.object(setup_mod, "get_env_value", side_effect=env_side):
result = setup_mod._get_section_config_summary(
{"model": {"default": "claude-opus-4", "provider": "anthropic"}},
"model",
)
assert result == "claude-opus-4"
def test_terminal_always_returns(self):
with patch.object(setup_mod, "get_env_value", return_value=""):
result = setup_mod._get_section_config_summary(
{"terminal": {"backend": "docker"}}, "terminal"
)
assert result == "backend: docker"
def test_agent_always_returns(self):
with patch.object(setup_mod, "get_env_value", return_value=""):
result = setup_mod._get_section_config_summary(
{"agent": {"max_turns": 120}}, "agent"
)
assert result == "max turns: 120"
def test_gateway_returns_none_without_tokens(self):
with patch.object(setup_mod, "get_env_value", return_value=""):
result = setup_mod._get_section_config_summary({}, "gateway")
assert result is None
def test_gateway_lists_platforms(self):
def env_side(key):
if key == "TELEGRAM_BOT_TOKEN":
return "tok123"
if key == "DISCORD_BOT_TOKEN":
return "disc456"
return ""
with patch.object(setup_mod, "get_env_value", side_effect=env_side):
result = setup_mod._get_section_config_summary({}, "gateway")
assert "Telegram" in result
assert "Discord" in result
def test_tools_returns_none_without_keys(self):
with patch.object(setup_mod, "get_env_value", return_value=""):
result = setup_mod._get_section_config_summary({}, "tools")
assert result is None
def test_tools_lists_configured(self):
def env_side(key):
return "key" if key == "BROWSERBASE_API_KEY" else ""
with patch.object(setup_mod, "get_env_value", side_effect=env_side):
result = setup_mod._get_section_config_summary({}, "tools")
assert "Browser" in result
class TestSkipConfiguredSection:
"""Test the _skip_configured_section helper."""
def test_returns_false_when_not_configured(self):
with patch.object(setup_mod, "get_env_value", return_value=""):
result = setup_mod._skip_configured_section({}, "model", "Model")
assert result is False
def test_returns_true_when_user_skips(self):
def env_side(key):
return "sk-xxx" if key == "OPENROUTER_API_KEY" else ""
with (
patch.object(setup_mod, "get_env_value", side_effect=env_side),
patch.object(setup_mod, "prompt_yes_no", return_value=False),
):
result = setup_mod._skip_configured_section(
{"model": "openai/gpt-4"}, "model", "Model"
)
assert result is True
def test_returns_false_when_user_wants_reconfig(self):
def env_side(key):
return "sk-xxx" if key == "OPENROUTER_API_KEY" else ""
with (
patch.object(setup_mod, "get_env_value", side_effect=env_side),
patch.object(setup_mod, "prompt_yes_no", return_value=True),
):
result = setup_mod._skip_configured_section(
{"model": "openai/gpt-4"}, "model", "Model"
)
assert result is False
class TestSetupWizardSkipsConfiguredSections:
"""After migration, already-configured sections should offer skip."""
def test_sections_skipped_when_migration_imported_settings(self, tmp_path):
"""When migration ran and API key exists, model section should be skippable.
Simulates the real flow: get_env_value returns "" during the is_existing
check (before migration), then returns a key after migration imported it.
"""
args = _first_time_args()
# Track whether migration has "run" — after it does, API key is available
migration_done = {"value": False}
def env_side(key):
if migration_done["value"] and key == "OPENROUTER_API_KEY":
return "sk-xxx"
return ""
def fake_migration(hermes_home):
migration_done["value"] = True
return True
reloaded_config = {"model": "openai/gpt-4"}
with (
patch.object(setup_mod, "ensure_hermes_home"),
patch.object(
setup_mod, "load_config",
side_effect=[{}, reloaded_config],
),
patch.object(setup_mod, "get_hermes_home", return_value=tmp_path),
patch.object(setup_mod, "get_env_value", side_effect=env_side),
patch.object(setup_mod, "is_interactive_stdin", return_value=True),
patch("hermes_cli.auth.get_active_provider", return_value=None),
patch("builtins.input", return_value=""),
# Migration succeeds and flips the env_side flag
patch.object(
setup_mod, "_offer_openclaw_migration",
side_effect=fake_migration,
),
# User says No to all reconfig prompts
patch.object(setup_mod, "prompt_yes_no", return_value=False),
patch.object(setup_mod, "setup_model_provider") as mock_model,
patch.object(setup_mod, "setup_terminal_backend") as mock_terminal,
patch.object(setup_mod, "setup_agent_settings") as mock_agent,
patch.object(setup_mod, "setup_gateway") as mock_gateway,
patch.object(setup_mod, "setup_tools") as mock_tools,
patch.object(setup_mod, "save_config"),
patch.object(setup_mod, "_print_setup_summary"),
):
setup_mod.run_setup_wizard(args)
# Model has API key → skip offered, user said No → section NOT called
mock_model.assert_not_called()
# Terminal/agent always have a summary → skip offered, user said No
mock_terminal.assert_not_called()
mock_agent.assert_not_called()
# Gateway has no tokens (env_side returns "" for gateway keys) → section runs
mock_gateway.assert_called_once()
# Tools have no keys → section runs
mock_tools.assert_called_once()
+51 -9
View File
@@ -1,10 +1,13 @@
"""
Tests for skip_confirm behavior in /skills install and /skills uninstall.
Tests for skip_confirm and invalidate_cache behavior in /skills install
and /skills uninstall slash commands.
Verifies that --yes / -y bypasses the interactive confirmation prompt
that hangs inside prompt_toolkit's TUI.
Slash commands always skip confirmation (input() hangs in TUI).
Cache invalidation is deferred by default; --now opts into immediate
invalidation (at the cost of breaking prompt cache mid-session).
Based on PR #1595 by 333Alden333 (salvaged).
Updated for PR #3586 (cache-aware install/uninstall).
"""
from unittest.mock import patch, MagicMock
@@ -32,23 +35,43 @@ class TestHandleSkillsSlashInstallFlags:
_, kwargs = mock_install.call_args
assert kwargs.get("skip_confirm") is True
def test_force_flag_sets_force_not_skip(self):
def test_force_flag_sets_force(self):
from hermes_cli.skills_hub import handle_skills_slash
with patch("hermes_cli.skills_hub.do_install") as mock_install:
handle_skills_slash("/skills install test/skill --force")
mock_install.assert_called_once()
_, kwargs = mock_install.call_args
assert kwargs.get("force") is True
assert kwargs.get("skip_confirm") is False
# Slash commands always skip confirmation (input() hangs in TUI)
assert kwargs.get("skip_confirm") is True
def test_no_flags(self):
def test_no_flags_still_skips_confirm(self):
"""Slash commands always skip confirmation — input() hangs in TUI."""
from hermes_cli.skills_hub import handle_skills_slash
with patch("hermes_cli.skills_hub.do_install") as mock_install:
handle_skills_slash("/skills install test/skill")
mock_install.assert_called_once()
_, kwargs = mock_install.call_args
assert kwargs.get("force") is False
assert kwargs.get("skip_confirm") is False
assert kwargs.get("skip_confirm") is True
def test_default_defers_cache_invalidation(self):
"""Without --now, cache invalidation is deferred to next session."""
from hermes_cli.skills_hub import handle_skills_slash
with patch("hermes_cli.skills_hub.do_install") as mock_install:
handle_skills_slash("/skills install test/skill")
mock_install.assert_called_once()
_, kwargs = mock_install.call_args
assert kwargs.get("invalidate_cache") is False
def test_now_flag_invalidates_cache(self):
"""--now opts into immediate cache invalidation."""
from hermes_cli.skills_hub import handle_skills_slash
with patch("hermes_cli.skills_hub.do_install") as mock_install:
handle_skills_slash("/skills install test/skill --now")
mock_install.assert_called_once()
_, kwargs = mock_install.call_args
assert kwargs.get("invalidate_cache") is True
class TestHandleSkillsSlashUninstallFlags:
@@ -70,13 +93,32 @@ class TestHandleSkillsSlashUninstallFlags:
_, kwargs = mock_uninstall.call_args
assert kwargs.get("skip_confirm") is True
def test_no_flags(self):
def test_no_flags_still_skips_confirm(self):
"""Slash commands always skip confirmation — input() hangs in TUI."""
from hermes_cli.skills_hub import handle_skills_slash
with patch("hermes_cli.skills_hub.do_uninstall") as mock_uninstall:
handle_skills_slash("/skills uninstall test-skill")
mock_uninstall.assert_called_once()
_, kwargs = mock_uninstall.call_args
assert kwargs.get("skip_confirm", False) is False
assert kwargs.get("skip_confirm") is True
def test_default_defers_cache_invalidation(self):
"""Without --now, cache invalidation is deferred to next session."""
from hermes_cli.skills_hub import handle_skills_slash
with patch("hermes_cli.skills_hub.do_uninstall") as mock_uninstall:
handle_skills_slash("/skills uninstall test-skill")
mock_uninstall.assert_called_once()
_, kwargs = mock_uninstall.call_args
assert kwargs.get("invalidate_cache") is False
def test_now_flag_invalidates_cache(self):
"""--now opts into immediate cache invalidation."""
from hermes_cli.skills_hub import handle_skills_slash
with patch("hermes_cli.skills_hub.do_uninstall") as mock_uninstall:
handle_skills_slash("/skills uninstall test-skill --now")
mock_uninstall.assert_called_once()
_, kwargs = mock_uninstall.call_args
assert kwargs.get("invalidate_cache") is True
class TestDoInstallSkipConfirm:
+50
View File
@@ -237,3 +237,53 @@ def test_save_platform_tools_still_preserves_mcp_with_platform_default_present()
# Deselected configurable toolset removed
assert "terminal" not in saved
# ── Platform / toolset consistency ────────────────────────────────────────────
class TestPlatformToolsetConsistency:
"""Every platform in tools_config.PLATFORMS must have a matching toolset."""
def test_all_platforms_have_toolset_definitions(self):
"""Each platform's default_toolset must exist in TOOLSETS."""
from hermes_cli.tools_config import PLATFORMS
from toolsets import TOOLSETS
for platform, meta in PLATFORMS.items():
ts_name = meta["default_toolset"]
assert ts_name in TOOLSETS, (
f"Platform {platform!r} references toolset {ts_name!r} "
f"which is not defined in toolsets.py"
)
def test_gateway_toolset_includes_all_messaging_platforms(self):
"""hermes-gateway includes list should cover all messaging platforms."""
from hermes_cli.tools_config import PLATFORMS
from toolsets import TOOLSETS
gateway_includes = set(TOOLSETS["hermes-gateway"]["includes"])
# Exclude non-messaging platforms from the check
non_messaging = {"cli", "api_server"}
for platform, meta in PLATFORMS.items():
if platform in non_messaging:
continue
ts_name = meta["default_toolset"]
assert ts_name in gateway_includes, (
f"Platform {platform!r} toolset {ts_name!r} missing from "
f"hermes-gateway includes"
)
def test_skills_config_covers_tools_config_platforms(self):
"""skills_config.PLATFORMS should have entries for all gateway platforms."""
from hermes_cli.tools_config import PLATFORMS as TOOLS_PLATFORMS
from hermes_cli.skills_config import PLATFORMS as SKILLS_PLATFORMS
non_messaging = {"api_server"}
for platform in TOOLS_PLATFORMS:
if platform in non_messaging:
continue
assert platform in SKILLS_PLATFORMS, (
f"Platform {platform!r} in tools_config but missing from "
f"skills_config PLATFORMS"
)

Some files were not shown because too many files have changed in this diff Show More