Compare commits

...

295 Commits

Author SHA1 Message Date
Brooklyn Nicholson 3d8be3e84d feat(lint): observability for the LSP bridge with steady-state silence
Adds per-call structured logging on the dedicated ``hermes.lint.lsp``
logger so an opt-in user can answer "did LSP fire on that edit?" with
``rg 'lsp\['`` against ``~/.hermes/logs/agent.log``. Levels are tuned
so a 1000-write session emits exactly ONE INFO line at the default
threshold, not 1000.

Level model
-----------
* ``DEBUG`` (invisible at the default INFO threshold) for every per-call
  steady-state event: ``clean``, ``feature off``, ``extension not
  mapped``, ``backend not local``, repeated ``no project root`` for an
  already-announced file, repeated ``server unavailable`` for an
  already-announced binary.
* ``INFO`` for state transitions worth surfacing once: ``active for
  <root>`` the first time a (language, project_root) client starts,
  ``no project root for <path>`` the first time we see that orphan
  file. Plus every ``N diags`` event — diagnostics are inherently rare
  per-edit and are exactly the failure signal users want to grep for.
* ``WARNING`` for action-required failures the first time per
  (language, binary): ``server unavailable`` (binary not on PATH),
  ``no server configured``. Per-call ``WARNING`` for timeouts, server
  errors, and unexpected bridge exceptions — these are inherently
  novel events, not steady state, and each one is its own signal.

Dedup is in-process module-level sets guarded by a lock. Sets grow at
most by the number of distinct (language, project_root) and (language,
binary) pairs touched in one Python process — a few hundred entries in
the most aggressive monorepo session, which is bytes of memory. A
bounded LRU was rejected because evicting an entry would risk
re-firing the WARNING/INFO line we explicitly want to suppress.

Why this matters
----------------
The previous draft logged every per-call event at INFO. ``agent.log``
caps at 5 MB × 3 backups (= 20 MB) via ``RotatingFileHandler``, so
nothing would crash, but a normal coding session would dwarf the actual
signal under hundreds of ``lsp[typescript] clean (...)`` lines. The
new model preserves the verification answer ("LSP active for <root>")
and the action-required signals while keeping clean steady state out
of the user's face.

Tests
-----
* ``TestLogLevelsSteadyState`` — feature off, unmapped extension, non-
  local backend, and repeated clean writes all stay at DEBUG. Exactly
  one INFO ("active for ...") survives across N calls.
* ``TestLogLevelsNovelEvents`` — diagnostics are INFO per call;
  ``active for`` fires once per (language, root).
* ``TestLogLevelsActionRequired`` — server unavailable warns once per
  binary; orphan files INFO once per path; timeouts WARN every time.
2026-05-12 00:28:30 -04:00
Brooklyn Nicholson 9a546d6b08 feat(lint): opt-in LSP-backed lint path in _check_lint
The post-edit lint hook in `tools/file_operations.py::_check_lint` has
historically resolved `.ts`/`.tsx` to `npx tsc --noEmit <single-file>`,
which has no view of the surrounding project and floods the agent with
phantom "Cannot find module" errors that disappear the moment
`tsconfig.json` is in scope. Same shape for `.go` (`go vet` orphan) and
`.rs` (`rustfmt --check` is style, not types).

This change adds an opt-in LSP path that runs *before* the in-process /
shell linter table:

- `tools/lsp_client.py` — stdio JSON-RPC client with Content-Length
  framing, per-(language, project_root, server_cmd) cache, idle reaper.
  Sync API (`diagnostics(path, content)`) returns one snapshot via
  `textDocument/didOpen` + `publishDiagnostics` with a configurable
  settle window for servers that emit "indexing" diagnostics first.
  Hand-rolled rather than pulled from `multilspy` / `sansio-lsp-client`
  / `pygls` — see the module docstring for the comparison.
- `tools/lsp_lint.py` — bridge between `_check_lint` and the client.
  Returns `None` (caller falls through to the existing shell linter)
  whenever LSP cannot or should not handle the file: feature off,
  language unmapped, env not local, no project root marker, server
  binary missing, request timed out. Never raises.
- `tools/file_operations.py::_check_lint` — calls the bridge first,
  falls through to the in-process and shell linters unchanged. With
  the flag off (the default) this is one extra `import` on the lint
  path and zero behaviour change.
- `hermes_cli/config.py` — new `lint.lsp.{enabled,servers,...}` block,
  off by default. New top-level key, so the deep-merge handles
  upgrades and no `_config_version` bump is required.

Out of scope for this PR (explicitly noted in the module docstrings):

- Container / SSH / Modal / Daytona backends. The server has to run
  inside the sandbox, which means baking it into the image or
  installing on first use; that's the gnarly half and gets its own PR.
- didChange / live editing. We re-open with the freshly written bytes
  on every call.
- Pull diagnostics (LSP 3.17). Push works on every server we care
  about today.

Tests:

- `test_lsp_client.py` — pure helpers (project root walk, URI, framing,
  Diagnostic conversion) + end-to-end coverage against an in-tree fake
  LSP server (Python script over stdio) for clean / dirty / settle /
  timeout / registry caching / shutdown.
- `test_lsp_lint_bridge.py` — feature flag, backend gating, project
  root gating, language-map gating, error containment.
- `test_check_lint_lsp_routing.py` — regression guard that the default
  (LSP disabled) path still hits in-process / shell linters, plus
  proof that an enabled bridge short-circuits the shell linter.
2026-05-12 00:13:12 -04:00
Ben Barclay 3c23b15f81 fix(tui-clipboard): skip native safety net on OSC52-capable terminals (#20954)
* fix(tui-clipboard): skip native safety net on OSC52-capable terminals

On terminals with first-class OSC 52 support (Ghostty, kitty, WezTerm,
Windows Terminal, VS Code), setClipboard() currently fires both OSC 52
AND a parallel native-tool write (wl-copy / xclip / pbcopy). On Wayland
+ wl-copy this corrupts the clipboard: probeLinuxCopy() runs wl-copy
with empty stdin as an existence check (destructive — wipes clipboard
to empty string), and the subsequent real wl-copy invocation races
OSC 52 plus its own daemon's previous SIGTERM.

Symptom: user on Arch + Ghostty + wl-copy (Wayland, no tmux, no SSH)
had to press Ctrl+Shift+C three times before a selection landed.
env -u WAYLAND_DISPLAY -u DISPLAY HERMES_TUI_FORCE_OSC52=1 (which
short-circuits copyNative via the DISPLAY-absent early-return) made
every copy work instantly — proving OSC 52 alone is sufficient on
Ghostty and that copyNative() is actively destructive there.

Add OSC52_CAPABLE_TERMINALS allowlist to terminal.ts (same pattern as
the existing EXTENDED_KEYS_TERMINALS), and gate copyNative() on the
terminal NOT being on it. The native safety net continues to fire on
unrecognised terminals (xterm, GNOME Terminal, Konsole, Terminal.app,
etc.) where OSC 52 is less reliable.

* fix(tui-clipboard): address Copilot review feedback

- Move OSC52_CAPABLE_TERMINALS + supportsOsc52Clipboard() from
  ink/terminal.ts to utils/env.ts. ink/terminal.ts already imports
  link from ink/termio/osc.ts; importing back into termio/osc.ts
  introduced a circular dependency. utils/env.ts has no deps on
  either file and already owns terminal detection (detectTerminal()),
  so the helper sits naturally next to it.

- Replace the inline gating (!SSH_CONNECTION && !supportsOsc52Clipboard())
  with a pure shouldUseNativeClipboard(env, terminal) helper. The old
  expression skipped native on allowlisted terminals even when
  setClipboard() wouldn't actually emit OSC 52 (e.g. inside
  TMUX/STY where we use tmux load-buffer instead, or when the user
  has set HERMES_TUI_FORCE_OSC52=0). That made the clipboard write
  a no-op in those configurations. The new helper:
    1. SSH_CONNECTION set -> false (existing behaviour)
    2. TMUX or STY set -> true (we go through load-buffer, no race)
    3. shouldEmitClipboardSequence() false -> true (native is the
       only path left when OSC 52 is suppressed)
    4. Otherwise: skip native iff terminal is allowlisted.

- Add 11 tests for shouldUseNativeClipboard covering the SSH guard,
  TMUX/STY tmux-inside-Ghostty case, HERMES_TUI_FORCE_OSC52=0
  override, allowlisted vs non-allowlisted terminals, precedence,
  and default-args smoke. Tests follow the package's existing
  parameterised-helper style (no vi.mock; helpers accept env and
  terminal as arguments).

- Update test imports to the new utils/env.js path.

* fix(tui-clipboard): address Copilot round 2 feedback

* fix(tui-clipboard): address Copilot round 3 feedback

* fix(tui-clipboard): address Copilot round 4 feedback
2026-05-11 19:40:07 -07:00
Teknium e85592591e fix(nous): surface Portal-flagged free models in picker even when curated list is stale (#24082)
Free-tier users were seeing 'No free models currently available.' in the
`hermes model` and post-login pickers even though qwen/qwen3.6-plus is
free on the Portal right now. Three independent breakages compounded:

1. The docs-hosted catalog manifest at website/static/api/model-catalog.json
   was not regenerated when _PROVIDER_MODELS['nous'] was updated, so users
   fetching the manifest got a list that didn't include qwen/qwen3.6-plus.
2. _resolve_nous_pricing_credentials() returned ('', '') on any auth blip,
   collapsing get_pricing_for_provider('nous') to {} and making every
   curated model fall through the free-tier filter as 'paid'.
3. Even with healthy pricing, the picker only ever showed models from the
   in-repo curated list intersected with live pricing — a Portal-flagged
   free model not yet in the curated list could never appear.

Changes:
- hermes_cli/models.py: new union_with_portal_free_recommendations() that
  augments the curated list with Portal freeRecommendedModels entries
  (with synthetic free pricing so partition keeps them). The Portal's
  /api/nous/recommended-models endpoint is now the source of truth for
  free-tier surfacing — old Hermes builds will see new free models
  without a CLI release.
- hermes_cli/models.py: _resolve_nous_pricing_credentials() falls back to
  the public inference base URL when runtime cred resolution fails.
  The /v1/models endpoint exposes pricing without auth, so silently
  returning {} just because a refresh token expired was wrong.
- hermes_cli/auth.py + hermes_cli/main.py: both free-tier picker call
  sites call union_with_portal_free_recommendations() before partition.
- tests/hermes_cli/test_models.py: 7 tests covering union behaviour
  (prepend, dedup, end-to-end with stale pricing, empty/missing/error
  payloads, invalid entries).
- tests/hermes_cli/test_model_catalog.py: drift guard
  TestManifestMatchesInRepoLists fails CI when _PROVIDER_MODELS['nous']
  or OPENROUTER_MODELS is edited without re-running
  scripts/build_model_catalog.py. Verified empirically that removing a
  manifest entry triggers an assertion with an actionable error message.

Validation:
- 133/133 targeted tests pass (test_models, test_model_catalog,
  test_auth_nous_provider).
- Live E2E against the real Portal:
  - Stale curated list ['claude-opus','claude-sonnet','gpt-5.4'] (no
    qwen) → after union: ['qwen/qwen3.6-plus', ...] →
    partition(free_tier=True): selectable=['qwen/qwen3.6-plus'].
  - Simulated expired refresh token → anon fetch returns 403 pricing
    entries including qwen/qwen3.6-plus -> {prompt:0, completion:0}.
- ruff: clean.
2026-05-11 18:08:16 -07:00
Teknium ced1990c1c feat(computer-use): refresh cua-driver on hermes update + add install --upgrade (#24063)
cua-driver was only installed once on toolset enable: `_run_post_setup` early-returns when the binary is already on PATH, so upstream fixes (e.g. v0.1.6 Safari window-focus fix) never reached existing users without manual reinstall.

Two refresh points now:
- `hermes update` re-runs the upstream installer at the end of the update if cua-driver is on PATH (macOS-only, no-op otherwise). Ties driver freshness to the user-controlled update cadence — no startup latency, no per-launch GitHub API call.
- `hermes computer-use install --upgrade` for manual force-refresh.

The upstream `install.sh` always pulls the latest release, so re-running is the canonical upgrade path. No version-comparison logic needed.

`hermes computer-use status` now shows the installed version, and points at `--upgrade` for refreshing.
2026-05-11 17:10:58 -07:00
Teknium1 97a0e69df0 chore(release): add AUTHOR_MAP entry for ahmedbadr3 2026-05-11 16:51:09 -07:00
Ahmed Badr 05bad7b1e7 fix(dashboard): MiniMax 'Login' button launched Claude OAuth (#22832)
Fixes #22832.

## Root cause

`hermes_cli/web_server.py:start_oauth_login` dispatched OAuth flows by
the catalog's `flow` field rather than provider id:

    if catalog_entry["flow"] == "pkce":
        return _start_anthropic_pkce()

The catalog had two `flow: "pkce"` entries — `anthropic` and
`minimax-oauth` — so clicking "Login" on MiniMax in the dashboard's
Keys tab unconditionally launched the Anthropic/Claude PKCE flow.

## Fix

Three changes in `hermes_cli/web_server.py`:

1. Catalog entry for `minimax-oauth` changed from `flow: "pkce"` to
   `flow: "device_code"`. From a UX perspective MiniMax is a
   verification-URI + user-code flow (open URL, enter code, backend
   polls) — same shape as Nous's device-code flow. The PKCE bit
   (verifier + challenge from `_minimax_pkce_pair`) is a security
   extension that doesn't change the operator experience; the existing
   dashboard modal already renders `device_code` correctly for this UX.

2. New MiniMax branch in `_start_device_code_flow`, mirroring the
   existing Nous branch but calling MiniMax-specific helpers
   (`_minimax_request_user_code`, `_minimax_pkce_pair`). Stashes
   verifier + state in the session for the poller to consume. Handles
   the overloaded `expired_in` field (could be unix-ms timestamp OR
   seconds-from-now duration) the same way `_minimax_poll_token` does.

3. New `_minimax_poller` background thread mirroring `_nous_poller`.
   Calls `_minimax_poll_token` → on success builds the same
   `auth_state` dict the CLI flow (`_minimax_oauth_login`) builds, and
   persists via `_minimax_save_auth_state` so the dashboard path leaves
   the system in the same state as `hermes auth add minimax-oauth`.

Plus a dispatcher tightening to prevent regression: the `pkce` branch
now requires `provider_id == "anthropic"`, so any future PKCE provider
added without a proper start function gets a clean
`400 Unsupported flow` rather than silently launching Anthropic OAuth.

## Test

New `tests/hermes_cli/test_web_oauth_dispatch.py`:

- Regression test asserting MiniMax start does NOT return claude.ai
- Sanity test that Anthropic PKCE still works after the dispatcher
  tightening
- Forward-looking test: a hypothetical pkce-flagged provider without
  an explicit branch is rejected cleanly rather than misrouted

## Limitations

- The dashboard MiniMax path defaults to `region="global"`. CN-region
  operators can still use the CLI flow which supports `--region cn`.
  Adding a region toggle to the dashboard UI is a follow-up.
2026-05-11 16:51:09 -07:00
Teknium ea1d0462cf fix(cli): vertical fallback for markdown tables wider than terminal (#23948)
Follow-up to #23863 (CJK table alignment). The realigner was
correctly padding pipes to identical column offsets, but when a
table's natural width exceeds terminal cells it produced lines that
the terminal soft-wrapped mid-cell, destroying column alignment
visually even though the bytes were perfectly padded. Reported as
'columns are not aligned' on tables containing one long row alongside
several short rows.

Approach mirrors Claude Code's MarkdownTable.tsx narrow-terminal
fallback: when realign_markdown_tables is given an available_width
budget and the rebuilt horizontal table exceeds it, render each body
row as 'Header: value' lines separated by a thin ─ rule. Word-wraps
oversize values at the budget with a 2-space continuation indent.

- agent/markdown_tables.py: realign_markdown_tables(text, available_width=None);
  threshold check at the top of _render_block flips into a new
  _render_vertical fallback. Includes _wrap_to_width with hard-break
  for tokens longer than the budget.
- cli.py: helper _terminal_width_for_streaming() returns
  shutil.get_terminal_size().columns minus _STREAM_PAD and a 2-cell
  safety margin; passed to all three realign call sites
  (_render_final_assistant_content for strip+render Panel paths, and
  the streaming flushers in _emit_stream_text / _flush_stream).
- tests/agent/test_markdown_tables.py: 4 new tests covering the
  overflow-vertical fallback for ASCII + CJK content, the
  'fits → keep horizontal' case, and the long-cell wrap with indent.

Live-verified: with COLUMNS=100, the user's reported 'long row in
ASCII table' case now renders as vertical key-value rows that all fit
the panel; the 6-column CJK comparison table still renders as an
aligned horizontal table because it fits inside 100 cols.
2026-05-11 16:49:13 -07:00
ethernet 825bd50e6b Merge pull request #18036 from NousResearch/fix/bundle-size
ui-tui: bundle with esbuild, drop runtime node_modules
2026-05-11 17:46:19 -04:00
brooklyn! 75b428c852 feat(ui-tui): resolve markdown links to readable page titles (#24013)
* feat(ui-tui): resolve links to readable page titles

Mirror desktop pretty-link behavior in the TUI by resolving HTTP links to page titles with shared caching and safe fetch filters, plus slug-based fallbacks so chat links stay readable even when title fetch fails.

* refactor(ui-tui): tighten link-title fallback handling

Clean up the link-title resolver by hardening in-flight cleanup and clarifying title length limits, while adding focused coverage for HTML entity decoding and markdown-label fallback behavior.

* fix(ui-tui): block private-network targets in title fetches

Prevent automatic link-title resolution from requesting local or private hosts by rejecting RFC1918, link-local, ULA, and intranet-style hostnames before fetch, and add regression coverage for blocked host patterns.
2026-05-11 14:16:31 -07:00
ethernet c6ca11618a refactor(tui): simplify TUI build logic, remove stale staleness checks
The old mtime-tracking staleness machinery (_tui_build_needed,
_hermes_ink_bundle_stale, _find_bundled_tui) tried to avoid rebuilding
by comparing source timestamps to dist/entry.js. This was fragile and
added ~100 lines of code. Replace with three clear paths:

1. HERMES_TUI_DIR set (prebuilt/nix): just node dist/entry.js, no build
2. --dev mode: tsx src/entry.tsx, no build, hot reload
3. Normal: always npm run build (esbuild is ~1s, correctness > caching)

Also error when HERMES_TUI_DIR is set with --dev (footgun: prebuilt
bundle has no source code to hot-reload).
2026-05-11 17:04:34 -04:00
kshitijk4poor 9a63b5f16c chore: add nicoechaniz to AUTHOR_MAP 2026-05-11 13:16:07 -07:00
nicoechaniz e2b713cced fix(model-metadata): skip OpenRouter for known providers, add kimi/moonshot to PROVIDER_TO_MODELS_DEV
Based on PR #23950 by @nicoechaniz.

- Add "kimi" and "moonshot" to PROVIDER_TO_MODELS_DEV → kimi-for-coding
- Gate OpenRouter metadata step behind "if not effective_provider":
  known providers should not be overridden by community-maintained OR data
- Keep the targeted Kimi-family 32k guard as a secondary safety net
  inside the OR gate (for unknown providers with Kimi models)

Co-authored-by: nicoechaniz <nicoechaniz@altermundi.net>
2026-05-11 13:16:07 -07:00
kshitijk4poor 91eef6255e fix: correct context-length resolution for kimi-k2.6 on Ollama Cloud and Kimi Coding
Kimi-k2.6 (which supports 262K context) was incorrectly resolved as 32K,
tripping the 64K minimum-context guard and preventing use of the model on
Ollama Cloud and Kimi Coding / Moonshot providers.

Three fixes in the context-length resolution chain:

1. Ollama Cloud native /api/show query: new _query_ollama_api_show()
   queries the Ollama native API for authoritative GGUF model_info
   context_length.  For hosted Ollama, prefers model_info over num_ctx
   since users can't set their own num_ctx on Cloud.  Added at step 5e
   in get_model_context_length(), before the models.dev fallback.

2. models.dev :cloud/-cloud suffix fallback: lookup_models_dev_context()
   now also tries appending :cloud and -cloud suffixes when the bare
   model name doesn't match.  models.dev stores 'kimi-k2.6:cloud' but
   users and the live API use bare 'kimi-k2.6'.

3. Kimi-family 32K guard: after the OpenRouter metadata step, reject
   exactly 32768 for Kimi-named models (kimi-*, moonshot*) and fall
   through to hardcoded defaults ('kimi': 262144).  OpenRouter reports
   32768 for moonshotai/kimi-k2.6 but the model actually supports 262K.
   Narrow filter — only 32768, only Kimi-family — becomes dead code
   when OpenRouter updates its metadata.

---
2026-05-11 13:16:07 -07:00
ethernet 3197b4de6d Merge remote-tracking branch 'origin/main' into fix/bundle-size 2026-05-11 16:01:04 -04:00
Siddharth Balyan 271883447e feat: expose HERMES_SESSION_ID to agent tools via ContextVar + env (#23847)
Set HERMES_SESSION_ID using the existing session_context.py ContextVar
system for concurrency safety (multiple gateway sessions in one process
won't cross-talk). Also writes os.environ as fallback for CLI mode.

Touchpoints:
- gateway/session_context.py: Add _SESSION_ID ContextVar + _VAR_MAP entry
- run_agent.py: Set both ContextVar and os.environ at init and on
  context-compression rotation
- tools/environments/local.py: Bridge ContextVars into subprocess env
  in _make_run_env() (ContextVars don't propagate to child processes)
- tests/run_agent/test_session_id_env.py: 3 tests covering env, provided
  ID, and ContextVar paths

execute_code subprocess already passes HERMES_* prefixed vars through
_scrub_child_env (line 82: _SAFE_ENV_PREFIXES includes 'HERMES_').

Primary use case: webhook-triggered agents that need to include a
`--resume <session_id>` takeover command in their output.
2026-05-12 00:16:45 +05:30
kshitij ce0f529cde chore: ruff auto-fix C401, C416, C408, PLR1722 (#23940)
C401:   set(x for x in y) -> {x for x in y}      (set comprehension)
C416:   [(k,v) for k,v in d] -> list(d.items())  (unnecessary listcomp)
C408:   tuple()/dict() -> ()/{}                   (unnecessary collection call)
PLR1722: exit() -> sys.exit()                     (adds import sys where needed)

21 instances fixed, 0 remaining. 19 files, +40/-36.
2026-05-11 11:20:58 -07:00
Teknium 7b76366552 feat(prompt-cache): cross-session 1h prefix cache for Claude on Anthropic / OpenRouter / Nous Portal (#23828)
Cuts input cost for first-turn Claude requests by ~85-90% on subsequent
sessions within an hour. Tools array (~13k tokens for default toolset) +
stable system prefix (~5-8k tokens) get a 1h cache_control marker; the
volatile suffix (memory, USER profile, timestamp, session id) sits in a
separate non-cached block at the end so it doesn't poison the cross-session
prefix when it changes.

Provider gate: Claude on native Anthropic (incl. OAuth subscription),
OpenRouter, and Nous Portal (which proxies to OpenRouter). All other
providers keep today's system_and_3 layout unchanged.

Layout (4 cache_control breakpoints, Anthropic max):
  1. tools[-1]              -> 1h (cross-session)
  2. system content[0]      -> 1h (cross-session, stable prefix)
  3. messages[-2]           -> 5m (within-session rolling)
  4. messages[-1]           -> 5m (within-session rolling)

Within-session rolling shrinks from 3 messages to 2 to free the breakpoint
budget. On Claude with realistic tool loadouts the long-lived tier carries
the bulk of cross-session value anyway.

System prompt is now always assembled cache-friendly: stable identity /
guidance / skills / platform hints first, then session-stable context
files (AGENTS.md, .cursorrules), then per-call volatile content. Old
single-string callers see the same logical content (same join order),
just reordered so volatile lives at the end.

Config knobs (defaults shown):
  prompt_caching:
    cache_ttl: "5m"           # rolling-window TTL (unchanged)
    long_lived_prefix: true    # opt-out switch
    long_lived_ttl: "1h"       # cross-session prefix TTL

Live E2E (tests/agent/test_prompt_caching_live.py, gated on
OPENROUTER_API_KEY) on anthropic/claude-haiku-4.5 with default toolset:
  Call 1 (cold):              cache_write=13,415  cache_read=0
  Call 2 (NEW agent + msg):   cache_write=391     cache_read=13,025
  Cross-session reuse:        97.09%

Implementation:
* agent/prompt_caching.py: new apply_anthropic_cache_control_long_lived()
  + mark_tools_for_long_lived_cache(); existing apply_anthropic_cache_control()
  preserved verbatim for the fallback path.
* agent/anthropic_adapter.py: convert_tools_to_anthropic() now forwards
  cache_control onto each Anthropic-format tool dict.
* run_agent.py: _build_system_prompt_parts() returns the 3-tier dict;
  _build_system_prompt() joins them (backward compatible).
  _supports_long_lived_anthropic_cache() policy added next to the existing
  _anthropic_prompt_cache_policy() (which now also recognises Nous Portal
  Claude — pre-existing gap fixed in passing).
  _build_api_kwargs() resolves tools_for_api once and propagates the
  marker through all four build paths (anthropic_messages, bedrock,
  codex_responses, profile/legacy chat completions).
  Long-lived flag plumbed into the runtime snapshot/restore + model-switch
  + fallback-promotion paths.

Tests:
* tests/agent/test_prompt_caching.py: +8 tests (TestMarkToolsForLongLivedCache,
  TestApplyAnthropicCacheControlLongLived).
* tests/run_agent/test_anthropic_prompt_cache_policy.py: +9 tests
  (TestSupportsLongLivedAnthropicCache matrix across 8 endpoint classes
  + a fallback-target case).
* tests/agent/test_prompt_caching_live.py: new live E2E (skipif when
  OPENROUTER_API_KEY is unset; runs outside the hermetic suite).
* Targeted suites: 327/327 pass (caching/adapter/policy/builder).
* tests/agent/ + tests/run_agent/: 3992 pass, 17 skip, 1 pre-existing
  flake (test_async_httpx_del_neuter::test_same_key_replaces_stale_loop_entry,
  verified failing on pristine origin/main).
2026-05-11 11:14:56 -07:00
kshitij 2ec8d2b42f chore: ruff auto-fix PLR6201 — tuple → set in membership tests (#23937)
Replace  with  for all literal-tuple
membership tests. Set lookup is O(1) vs O(n) for tuple — consistent
micro-optimization across the codebase.

608 instances fixed via `ruff --fix --unsafe-fixes`, 0 remaining.
133 files, +626/-626 (net zero).
2026-05-11 11:13:25 -07:00
Teknium1 8c11710314 chore(release): add AUTHOR_MAP entry for wuli666 2026-05-11 11:13:20 -07:00
wuli666 111b859e49 fix(auxiliary): evict async wrappers on poisoned client (follow-up to #23482)
#23482 fixed cache poisoning in the sync path: when a Codex auxiliary
timeout closes the underlying OpenAI client, _evict_cached_client_instance
walks CodexAuxiliaryClient wrappers via their _real_client attribute and
drops the cache entry so the next aux call rebuilds.

The cache key includes async_mode (see _client_cache_key), so the sync and
async clients for the same provider live in two distinct entries pointing
at the same underlying transport. The fix walked the sync wrapper's
_real_client correctly but the async wrappers
(AsyncCodexAuxiliaryClient, AsyncAnthropicAuxiliaryClient,
AsyncGeminiNativeClient) never exposed _real_client at all, so the async
entry survived eviction and kept handing out the poisoned client.

Effect on async aux callers: one timeout now poisons every subsequent
async aux call (compression, vision, session_search, title_generation)
with 'Connection error' until gateway restart -- even while the sync
route recovered as designed in #23482.

Mirror the sync wrapper's _real_client onto each async wrapper so the
existing eviction helper finds them. Three changes, one per wrapper:

- AsyncCodexAuxiliaryClient: self._real_client = sync_wrapper._real_client
  (the underlying OpenAI client)
- AsyncAnthropicAuxiliaryClient: same shape
- AsyncGeminiNativeClient: self._real_client = sync_client (Gemini's
  native facade is itself the leaf; no OpenAI client beneath it)

Update _evict_cached_client_instance docstring to reflect that it now
covers both sync and async wrappers via the same attribute walk.

Test: TestAuxiliaryClientPoisonedCacheEviction.test_evict_cached_client_instance_walks_async_wrapper
seeds both sync and async cache entries pointing at the same leaf and
asserts both are dropped on a single eviction call. Verified the test
fails without the wrapper changes ("async cache entry survived
eviction -- wrapper is missing _real_client") and passes with them.

Refs #23482, #23432
2026-05-11 11:13:20 -07:00
Teknium 1d00716754 fix(cli,tui): align CJK / wide-char markdown tables (#23863)
CJK and emoji glyphs render as two terminal cells but JS String#length
and the model's own padding count them as one, so any markdown table
with Chinese / Japanese / Korean cells drifts right per row when a
real terminal renders it. Both surfaces fix this with a display-cell
width measurement (wcswidth on the Python side, stringWidth on the
TUI side).

Changes:
- agent/markdown_tables.py: new helper. realign_markdown_tables(text)
  detects markdown table blocks (header + |---| divider) and
  rewrites the row padding using wcwidth.wcswidth so every pipe and
  dash lines up across rows. No-op on text without tables.
- cli.py: hook the helper into _render_final_assistant_content for
  strip / render modes (raw passes through untouched), and into the
  streaming line emitter so live token-by-token rendering also
  produces aligned tables. A small two-buffer state machine in
  _emit_stream_text holds table rows until the block ends, then
  flushes them through the realigner so all rows pad to a single
  per-column width.
- ui-tui/src/components/markdown.tsx: renderTable now uses
  stringWidth (Bun.stringWidth fast path + East-Asian-width-aware
  fallback, already memoised in @hermes/ink) instead of UTF-16
  String#length for both column-width measurement and per-cell
  padding. Drops the comment that documented the bug as a deliberate
  limitation.

Validation:
- New tests/agent/test_markdown_tables.py (11): every rebuilt block
  shares pipe column offsets across rows for pure CJK, mixed
  CJK+emoji, ragged-row, and multi-table inputs.
- Updated tests/cli/test_cli_markdown_rendering.py: the existing
  strip-mode test asserted exact whitespace; rewritten to assert the
  alignment contract (cell content survives + every rendered row
  shares pipe offsets).
- New ui-tui markdown.test.ts case (1): rendered column-2 start
  offset is identical for the header + every body row, including
  the CJK row that drifted before the fix.
- Live: hermes chat -q with the user-reported screenshot prompt now
  produces a perfectly aligned table on the wire (header, divider,
  4 body rows including '通义千问', all pipes at identical columns).
2026-05-11 11:13:06 -07:00
kshitij 657874460f chore: ruff auto-fixes — collapsible-else-if, if-stmt-min-max, dict.fromkeys (#23926)
PLR5501 (collapsible-else-if): 28 instances — else: if: → elif:
PLR1730 (if-stmt-min-max):   15 instances — if x<y: x=y → x=max(x,y)
C420   (dict.fromkeys):       2 instances — dictcomp → dict.fromkeys
PLR1704 (redefined-argument): 1 instance — reason → err_msg (shadow fix)
C414   (unnecessary-list):    1 instance — sorted(list(x)) → sorted(x)

28 files, -44 net lines. All mechanical, zero logic changes.
17,211 tests pass, zero regressions.
2026-05-11 11:03:29 -07:00
Teknium 8e2eb4b511 fix(/model): surface Nous Portal models from remote catalog manifest (#23912)
The /model picker for Nous Portal users was returning the in-repo
_PROVIDER_MODELS["nous"] snapshot — which only updates on Hermes
releases — instead of the remote manifest published at
https://hermes-agent.nousresearch.com/docs/api/model-catalog.json.

OpenRouter already pulled from the manifest via fetch_openrouter_models;
"nous" was the only curated provider where the existing manifest
plumbing (get_curated_nous_model_ids → get_curated_nous_models) was
defined but not wired into the picker pipeline. Switch the curated
build in list_authenticated_providers to use it, with the same
graceful fallback to the in-repo snapshot when the manifest is
unreachable.

Test: tests/hermes_cli/test_model_catalog.py exercises the picker with
a patched manifest and asserts the manifest's nous list reaches
list_picker_providers. Falls-back-to-static path was already covered
by test_curated_nous_ids_falls_back_to_hardcoded_on_empty_catalog.
2026-05-11 10:15:30 -07:00
Teknium1 cc9e788c14 fix(cli): defensive _slash_confirm_state access + AUTHOR_MAP
- getattr(self, '_slash_confirm_state', None) at the two read sites that
  trip object.__new__(HermesCLI) test fixtures (test_cli_external_editor,
  test_cli_skin_integration)
- _build_tui_layout_children: make slash_confirm_widget keyword-only with
  default None to avoid breaking subclassing extension hook for wrapper
  CLIs (test_cli_extension_hooks)
- AUTHOR_MAP entry for zhengyn0001

Follow-up to the salvaged commit ca1d4375a.
2026-05-11 10:02:03 -07:00
zhengyuna 054f568578 fix: use TUI modal for slash confirmations 2026-05-11 10:02:03 -07:00
rob-maron e155f2aca9 rebuild model catalog 2026-05-11 09:54:31 -07:00
Teknium1 283381b1ce fix(dashboard): validate dist exists when --skip-build is set
Follow-up to PR #23824. Adds two correctness fixes on top of the
contributor's salvaged commit:

1. Stale-dist fallback no longer gated on `fatal=False`. `cmd_dashboard`
   passes `fatal=True` and is the primary scenario this fallback is for
   (issue #23817 — Windows Scheduled Task at logon). The previous gate
   meant the fallback never fired in the case it was designed for.

2. `--skip-build` now verifies the dist actually exists before starting
   the server. Without this, a misconfigured pre-build would launch the
   dashboard pointing at a missing dist and silently serve 404s. We now
   exit 1 with a clear "pre-build first: cd web && npm run build"
   message, and on success print which dist directory is being used.

Verified end-to-end on Linux:
- build fails + stale dist (fatal=True)  -> fallback fires
- build fails + no dist (fatal=True)     -> exit 1 with stderr surfaced
- build fails + stale dist (fatal=False) -> fallback fires
- --skip-build + missing dist            -> exit 1 with clear guidance
- --skip-build + valid dist              -> 'Skipping web UI build...'
2026-05-11 09:27:05 -07:00
ygd58 7085f4e238 fix(dashboard): fallback to stale dist, retry build, add --skip-build flag
Three improvements for non-interactive contexts (Windows Scheduled
Tasks, CI/CD) where the web UI build may fail (issue #23817):

1. Retry build once after 3s — covers boot-time races (antivirus
   scanning Node.js, npm cache not ready, transient disk I/O)
2. Fall back to existing dist when build fails (non-fatal mode) —
   a stale UI is far better than no UI at all
3. Add --skip-build flag — lets callers pre-build in their wrapper
   script and start the dashboard without internal build attempt
4. Surface npm stderr in build failure output for easier debugging

Fixes #23817
2026-05-11 09:27:05 -07:00
Teknium1 88a2ce4ae5 chore: AUTHOR_MAP entry for VinceZcrikl noreply (#23647) 2026-05-11 08:14:03 -07:00
文森.Z a479ec01ed fix: make web UI build output decoding robust on Windows
On Windows systems using a Chinese GBK locale, `hermes update` could misreport the Web UI build as failed even when `npm run build` actually succeeded. The failure was caused by Python decoding captured npm output with the process locale inside a background subprocess reader thread. When npm emitted bytes such as `0x85`, decoding under GBK raised `UnicodeDecodeError`, and Hermes then surfaced a misleading "Web UI build failed" warning.

This change makes the npm install/npm ci path and the Web UI build step decode captured output explicitly as UTF-8 with `errors="replace"`. That keeps unexpected bytes from crashing output collection, preserves successful builds, and prevents false negatives during update on Windows.

The patch also adds regression tests that verify these subprocess calls always use explicit UTF-8 decoding with replacement semantics.
2026-05-11 08:14:03 -07:00
Teknium 7026af4e23 fix(agent): catch ChatGPT-account Codex data-URL rejection so images are stripped instead of cascading to compression (#23602)
When the user's main provider is openai-codex on the ChatGPT-account
backend (https://chatgpt.com/backend-api/codex), sending a native image
attachment encodes it as data:image/...base64,... in the input_image
field. The OpenAI Responses API on the public endpoint accepts that, but
the ChatGPT-account variant rejects it with HTTP 400:

  Invalid 'input[N].content[K].image_url'. Expected a valid URL, but got
  a value with an invalid format.

Hermes' image-rejection phrase list didn't include this wording, so the
error escaped the strip-and-retry branch and fell through to the generic
recovery path: model fallback → context-too-large → compression cascade
→ auxiliary OpenRouter 402 spam (issue #23570).

Add a NARROW phrase keyed on the field-path apostrophe used by the Codex
Responses error format: "image_url'. expected". This matches the actual
error format without false-tripping on generic 'Expected a valid URL'
errors from unrelated tools (webhooks, redirect_uri, etc.). Once matched,
the existing branch strips images from history, sets _vision_supported=
False for the session, and retries text-only.

Refs #23570 (1 of 3 image-replay improvements; persistence rewrite to
store image PATHS instead of inlined base64 is a separate follow-up)
2026-05-11 07:37:22 -07:00
Teknium 3e7145e0bb revert: roll back /goal checklist + /subgoal feature stack (#23813)
* Revert "fix(goals): force judge to use tool calls instead of JSON-text replies (#23547)"

This reverts commit a63a2b7c78.

* Revert "fix(goals): forward standing /goal state on auto-compression session rotation (#23530)"

This reverts commit 4a080b1d5a.

* Revert "feat(goals): /goal checklist + /subgoal user controls (#23456)"

This reverts commit 404640a2b7.
2026-05-11 07:06:27 -07:00
kshitijk4poor 1d4a4997b1 chore: AUTHOR_MAP entries for sudo-hardening salvage contributors
- openclaw@agent.local → 29206394 (PR #22194)
- freedemon@gmail.com  → fr33d3m0n (PR #21128)
2026-05-11 06:56:30 -07:00
fr33d3m0n 976d8e27ad fix(approval): catch sudo with stdin/askpass/shell privilege flags
Adds the only #17873 category not covered by the in-flight PRs #17962
(briandevans, reverse shell + download-execute) and #7993 (SHL0MS,
credential reads + curl/wget exfiltration): sudo invocations that an
LLM-driven agent can drive without TTY interaction.

The agent has no TTY, so the sudo forms that succeed without human
involvement are those reading the password from stdin (`-S` / `--stdin`)
or via an askpass helper (`-A` / `--askpass`). The shell-launch (`-s`)
and list-privileges (`-a`) flags are also gated since they are
privilege-relevant invocations the agent can chain after acquiring the
password (e.g. read SUDO_PASSWORD from .env -> sudo -S -s -> root shell).
Plain `sudo cmd` (no flag) is TTY-bound and excluded.

Two patterns:

  1. Direct flag: `\bsudo\b[^;|&\n]*?\s+(?:-s\b|--stdin\b|-a\b|--askpass\b)`
     The lazy `[^;|&\n]*?` consumes flag-arguments without spanning
     command separators, so `sudo -u root -S whoami` matches (a textbook
     offensive form that a strict `(?:\s+-[^\s]+)*` "leading flags only"
     pattern would have missed because `root` is a flag-value not a flag).

  2. Combined short flags: `\bsudo\b[^;|&\n]*?\s+-[a-z]*[sa][a-z]*\b`
     Catches packed forms like `sudo -nS id` where multiple flags share
     a single `-X` token.

`_normalize_command_for_detection` lowercases input before pattern
matching (tools/approval.py:340), so case variants of S/s and A/a
collapse — both letter-pairs are gated since each is a privilege-
relevant invocation.

Tests: 21 new cases in TestDetectSudoStdin (12 positive covering all
flag-order permutations including herestring source and printf-piped
forms; 9 negative including TTY-bound `sudo whoami`, interactive
`sudo -i`, env-var reference `$SUDO_USER`, doc lookup `man sudo`,
package install, and the `pseudosudo` word-boundary edge case).

Empirical coverage: 11/11 attacks matched, 0/10 false positives.

Refs: #17873 category 4. Adjacent: #17962 (reverse shell + download-
execute), #7993 (credential reads + curl/wget exfiltration).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 06:56:30 -07:00
OpenClaw Agent 9520a1ccdf fix(terminal): block sudo -S password guessing when SUDO_PASSWORD is not set
Fixes #9590: Block explicit sudo -S (stdin password mode) commands
when the SUDO_PASSWORD environment variable is not configured.

The attack vector: the LLM constructs 'echo guessedpass | sudo -S cmd'
to brute-force sudo passwords, iterates based on sudo's error output
('Sorry, try again').  The existing _transform_sudo_command only
injects -S when SUDO_PASSWORD exists; without it, the LLM's explicit
sudo -S must be treated as a guessing attempt.

Changes:
- Add _check_sudo_stdin_guard() in approval.py: detects sudo -S when
  SUDO_PASSWORD is absent, anchored to command-start positions
  (^ ; && || | etc.) to avoid false positives on literal text
- Integrate into check_all_command_guards() above yolo/mode=off so
  the block is unconditional (like the hardline floor)
- Add 6 tests covering: detection, allow-list, SUDO_PASSWORD bypass,
  integration with check_all_command_guards, yolo non-bypass,
  container backend bypass
2026-05-11 06:56:30 -07:00
kshitijk4poor 494824fb11 chore: remove unused sentinel in test_send_message_tool 2026-05-11 06:44:58 -07:00
kshitijk4poor 5712483487 fix: guard resolve_profile_env against missing profile dirs
The _default_spawn HERMES_HOME injection (PR #23356) calls
resolve_profile_env which raises FileNotFoundError when the profile
dir doesn't exist. In production the profile always exists (workers are
only dispatched for live profiles), but tests with isolated HERMES_HOME
never create profile dirs. Catch FileNotFoundError and fall through —
HERMES_PROFILE is still set below, so the worker CLI resolves the
profile at startup.
2026-05-11 06:44:58 -07:00
kshitijk4poor 7087702210 chore: add salvage contributors to AUTHOR_MAP
For PRs #23206 (Frowtek), #23252 (Sylw3ster), #23358 (dmnkhorvath),
#23659 (smwbev), and #23356 (TurgutKural) — all part of the kanban
bug-fix batch salvage.
2026-05-11 06:44:58 -07:00
Ninso112 a1854ac07c fix(kanban): treat archived parent tasks as terminal for dependency resolution
When a parent task is archived, dependent child tasks were stuck in
todo forever because recompute_ready and claim_task only checked for
status == 'done'. Now both functions also treat 'archived' as a
terminal status, allowing children to proceed when their parent is
archived.

Fixes #23180.
2026-05-11 06:44:58 -07:00
Evgenii 27cfe72543 fix(kanban): use localized column label in select-all aria label 2026-05-11 06:44:58 -07:00
Dominikh 379e7dd014 test(send_message): cover _check_send_message gating paths
Adds a TestCheckSendMessage class with 7 focused tests pinning the
four passing conditions and the failure modes:

  - HERMES_KANBAN_TASK grants access (the new branch)
  - HERMES_KANBAN_TASK short-circuits before consulting
    session_context or gateway.status (so workers don't depend on
    those import paths being healthy)
  - HERMES_SESSION_PLATFORM=telegram grants access
  - HERMES_SESSION_PLATFORM=local falls through to gateway check
  - is_gateway_running()=True grants access
  - All signals absent → False
  - gateway.status ImportError is swallowed → False

Pinning the short-circuit (test #2) is the load-bearing one — it
documents the contract that worker-side availability cannot regress
to depending on gateway-side state lookups.
2026-05-11 06:44:58 -07:00
Dominikh 8ac998cb0c fix(send_message): allow kanban workers to call send_message
The kanban dispatcher sets HERMES_KANBAN_TASK on every spawned worker
but launches it with the assignee profile's HERMES_HOME (e.g.
~/.hermes/profiles/<name>/), which has no gateway.pid file. The
existing _check_send_message therefore returned False from the
is_gateway_running() fallback, even though the parent gateway is
alive and reachable.

Net effect: workers could call kanban_* tools (gated on
HERMES_KANBAN_TASK in _check_kanban_mode) but not send_message. This
breaks the natural pattern of "worker does the job, calls
send_message to deliver rich content to the originating chat, then
calls kanban_complete with a one-line summary" because the kanban
notifier's payload_summary is hard-truncated to the first line
(~200 chars) at gateway/run.py:3963 — anything richer has to ship
via send_message.

Honoring HERMES_KANBAN_TASK in _check_send_message — symmetric with
_check_kanban_mode in kanban_tools.py:42 — closes the gap. No new
state, no new env var, no profile-config changes required.
2026-05-11 06:44:58 -07:00
TurgutKural 5af315c4cc fix(kanban): inject HERMES_HOME into worker subprocess env
Default spawn did not propagate HERMES_HOME when forking kanban workers.
The worker's env is copied from the parent via dict(os.environ), so
HERMES_HOME is absent. When the child then starts hermes -p <profile>,
the CLI's _apply_profile_override() runs before hermes_constants is
imported and get_hermes_home() falls back to ~/.hermes (the default
profile root), silently ignoring the profile's config.yaml.  Profile-
scoped fallback_providers, toolsets, and agent settings are therefore
never applied to kanban workers.

The fix injects HERMES_HOME into the worker's env using
resolve_profile_env(profile_arg) so the child reads the correct profile
directory instead of the default root.
2026-05-11 06:44:58 -07:00
Sylw3ster 641e40c4bd fix(kanban): restore HERMES_KANBAN_BOARD after scoped slash override 2026-05-11 06:44:58 -07:00
liuhao1024 2b3bf17dfa fix(kanban): call kanban_block on iteration-budget exhaustion to prevent protocol violation
When a kanban worker subprocess hits the iteration budget, the agent
loop strips tools and asks the model for a summary.  The model cannot
call kanban_block itself at that point, so the process exits rc=0
without calling kanban_complete or kanban_block — a protocol violation
that the dispatcher detects as a fatal error, giving up after 1 failure
and stranding downstream tasks.

Fix: after _handle_max_iterations() returns, check HERMES_KANBAN_TASK
and call kanban_block with a reason describing the exhaustion.  The
dispatcher then sees a clean block transition instead of a protocol
violation, and the task can be retried or escalated by a human.

Fixes [Bug] kanban-worker exits cleanly (rc=0) on iteration-budget
exhaustion without calling kanban_complete or kanban_block #23216
2026-05-11 06:44:58 -07:00
Frowtek f6d4f3c37d fix(kanban): route gateway create auto-subscribe to explicit board 2026-05-11 06:44:58 -07:00
Siddharth Balyan 64145a1996 fix(nix): replace chown -R with targeted find in container entrypoint (#23633)
The container entrypoint ran `chown -R` on $HERMES_HOME every start.
`chown` strips the setgid bit (kernel security behavior), destroying
the 2770 permissions the NixOS activation script sets for group access
by hostUsers. This caused PermissionError for interactive CLI users
even though they were in the hermes group.

Replace with `find ... ! -user $UID -exec chown` which only touches
files with wrong ownership, leaving correctly-owned directories and
their permission bits intact.

Affects: container.enable + container.hostUsers + addToSystemPackages

Related: #19795, #19788, #9383
2026-05-11 12:59:57 +05:30
Siddharth Balyan 5606258855 feat(nix): add extraDependencyGroups for sealed venv extras (#21817)
Expose the dependency-groups parameter from python.nix through
hermes-agent.nix and the NixOS module, allowing users to opt into
pyproject.toml optional extras (e.g. hindsight, voice, matrix) that
are resolved by uv inside the sealed venv.

Unlike extraPythonPackages (which appends to PYTHONPATH and requires
collision checking), extraDependencyGroups resolves the full dependency
graph in a single uv pass — no PYTHONPATH patching, no version
conflicts, no collision risk.

When to use which:
- extraDependencyGroups: enable a pyproject.toml optional extra
- extraPythonPackages: add an external Python plugin not in pyproject.toml

Usage:
  services.hermes-agent.extraDependencyGroups = [ "hindsight" ];

Or via overlay:
  pkgs.hermes-agent.override { extraDependencyGroups = [ "hindsight" ]; }

Refs: #8873, #9194
2026-05-11 12:23:48 +05:30
Siddharth Balyan d992fd9aaf feat(deps): add hindsight-client as optional dependency (#21818)
Declares hindsight-client as an optional dependency group [hindsight]
in pyproject.toml. This allows build-time inclusion for environments
where runtime pip install is not possible (NixOS sealed venvs, Docker,
Kubernetes).

Not included in [all] — memory providers are plugins and should be
opted into explicitly.

Install via:
  uv sync --extra hindsight
  pip install hermes-agent[hindsight]

NixOS (with extraDependencyGroups):
  services.hermes-agent.extraDependencyGroups = [ "hindsight" ];

Closes #8873
2026-05-11 12:22:02 +05:30
Mibayy ebf2ea584a feat(terminal,cli): docker_extra_args + display.timestamps
Two independent opt-in QoL toggles, both off by default.

terminal.docker_extra_args:
- List of extra flags appended verbatim to docker run after security
  defaults. Useful for adding capabilities (e.g. --cap-add SETUID) or
  other docker run options not exposed by existing config keys.
- Non-string entries are logged and skipped.
- Also available via TERMINAL_DOCKER_EXTRA_ARGS='[...]' env var.

display.timestamps:
- Appends [HH:MM] to user input bullet and the assistant response box
  header. Single hub in _format_submitted_user_message_preview()
  covers both single-line and multi-line user previews; assistant
  response label gets the timestamp at box-open time.

Closes #1569 (timestamps).

Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>
2026-05-10 22:43:39 -07:00
Teknium 228b7d27bd fix(auxiliary): cache 402'd providers as unhealthy with TTL to stop per-call retry storms (#23597)
When an auxiliary provider returns HTTP 402 (credit / payment), every
subsequent compression / title-gen / session-search / vision call still
re-tried it as the FIRST entry in the chain — burning ~1 RTT to hit 402
again, then falling back. On a long Discord/LCM session that meant dozens
of doomed 402s per minute (issue #23570).

Add a per-process unhealthy-provider cache with a 10 min TTL. When any
caller observes a payment error against a provider, the label is marked
unhealthy and skipped by:
  * _resolve_auto Step-1 (main provider use-as-aux path)
  * _resolve_auto Step-2 (aggregator/fallback chain)
  * _try_payment_fallback (used by call_llm/acall_llm on first 402)

Skip-logs are throttled to once per minute per label so a bursty session
doesn't spam agent.log. Entries auto-expire so a topped-up account
recovers without manual intervention. The cache is in-process only by
design — multi-profile users with different keys per profile must each
hit the 402 once.

Refs #23570
2026-05-10 22:43:14 -07:00
0xbyt4 ace1c4ea8c fix(discord): typing indicator task not cleaned up after API error
When the Discord typing API call fails (rate limit, network error, 403),
_typing_loop returns early but the stale task remains in _typing_tasks.
Subsequent send_typing calls see the stale entry and skip, leaving no
typing indicator for the rest of the agent invocation.

Add finally block to _typing_loop to always remove the task from
_typing_tasks on exit, whether from cancellation, error, or normal
completion. This allows send_typing to create a fresh task.

3 new tests in test_discord_send.py:
- Task removed after API error
- Typing restartable after failure
- stop_typing cleans up
2026-05-10 22:41:26 -07:00
teknium1 0458d99f22 chore(release): AUTHOR_MAP entry for Mibayy clawhub email 2026-05-10 22:37:42 -07:00
teknium1 9526040700 chore(skills/stocks): tighten SKILL.md to modern format 2026-05-10 22:37:42 -07:00
teknium1 2ea957fc41 chore(skills/stocks): relocate to optional-skills/finance/stocks/ 2026-05-10 22:37:42 -07:00
Mibayy 896a7ce261 feat: add stocks & finance skill (Yahoo Finance, no API key)
5 commands: quote, search, history, compare, crypto
Zero dependencies, Python stdlib only.
Supports multi-symbol queries and crypto prices.
2026-05-10 22:37:42 -07:00
Jeffrey Quesnelle bf2cc8b31c Merge pull request #20317 from NousResearch/meta/security-policy
docs(security): rewrite policy around OS-level isolation as the boundary
2026-05-11 01:36:32 -04:00
Teknium 228a4d11ae fix(config): warn loudly on YAML parse failure instead of silent default fallback (#23585)
A YAML parse error in ~/.hermes/config.yaml caused load_config() to print
one line to stdout (Warning: Failed to load config: ...) and silently fall
back to DEFAULT_CONFIG, dropping every user override (auxiliary providers,
fallback chain, model settings). Users only noticed when downstream
behavior misbehaved — see issue #23570 where a tab-indent error in the
auxiliary section caused aux fallback to use OpenRouter (depleted) instead
of the configured Codex/MiniMax chain.

Now: log at WARNING (so 'hermes logs' surfaces it), write a prominent line
to stderr, dedup on (path, mtime_ns, size) so concurrent loads don't spam,
and re-warn after the user edits the file. Both call sites (raw read +
merged load) route through the same helper.

Refs #23570
2026-05-10 22:36:19 -07:00
Gutslabs 3af3c4eb8c fix(misc): three small defensive fixes from PR #1974
Salvages the three substantive low-severity fixes from Gutslabs' #1974
"misc bug fixes" bundle.  The other 8 claims in that PR were either
already fixed on main with superior implementations (state lock,
firecrawl lazy import, fcntl/msvcrt guard, path normalization, schema
migrations) or did not survive review.

- run_agent: `_materialize_data_url_for_vision` uses
  `NamedTemporaryFile(delete=False)`; if `base64.b64decode` raises on a
  corrupt data URL the temp file would persist forever.  Wrap the
  write in try/except and `os.unlink` the temp on failure.

- gateway/session: `append_to_transcript` JSONL write had no error
  handling, so disk-full / read-only-fs / permission errors crashed the
  message handler.  The SQLite write above is the primary store, so
  swallow OSError on the JSONL fallback with a debug log.

- gateway/status: `_read_pid_record` reads `pid_path.read_text()` after
  an `exists()` check; if the PID file is deleted between the two
  calls (concurrent gateway restart) we hit an unhandled OSError.
  Catch it and return None.

Adds a regression test for the tempfile cleanup; the other two paths
are defensive try/excepts on infrequent OSError that don't warrant
dedicated tests.

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-10 22:28:01 -07:00
teknium1 482d49cf90 chore: AUTHOR_MAP entry for wilsen0 2026-05-10 22:22:25 -07:00
teknium1 edb4a2bda5 test(telegram): cover env-clamped helper + adaptive text-batch tiers
- New tests/gateway/test_telegram_text_batch_perf.py:
  TestEnvFloatClamped — 7 tests covering default-when-unset, valid
  parse, garbage fallback, NaN rejection, Inf rejection, min-clamp,
  max-clamp.  Asserts asyncio.sleep() always gets a finite number.

  TestAdaptiveTextBatchTiers — 4 tests covering the tier-constant
  invariants and the min(cap, tier_delay) composition rule.

- tests/gateway/test_display_config.py: update assertions for
  Telegram's new tool_progress='new' default.
2026-05-10 22:22:25 -07:00
wilsen0 ac95b8cdbe perf(gateway): tune Telegram cadence + adaptive fast-path for short replies
Re-authored against current main from PR #10388 by @wilsen0.  The
original branch is 3800+ commits stale and could not be cherry-picked
without reverting unrelated work; this change carries only the perf
intent forward.

Tuning summary
==============

Text-batch ingress (gateway/platforms/telegram.py):
  - HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS default 0.6 -> 0.3
  - HERMES_TELEGRAM_TEXT_BATCH_SPLIT_DELAY_SECONDS default 2.0 -> 1.0
  - Adaptive fast-path tiers in _flush_text_batch:
      total <= 320 cp -> min(cap, 0.18)
      total <= 1024 cp -> min(cap, 0.24)
      else            -> cap
    A single short reply now reaches the agent in ~180ms instead of
    600ms.  Tier constants compose with the configured cap via min()
    so an operator who tightens HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS
    below 0.18 still wins on every tier.
  - _env_float_clamped helper replaces bare float(os.getenv()).
    Rejects NaN / Inf, applies optional min/max bounds.  Used for
    text-batch + media-batch knobs.  Prevents asyncio.sleep(NaN)
    crashes when an operator typos an env var.

Stream cadence (gateway/config.py + stream_consumer.py):
  - StreamingConfig.edit_interval default 1.0s -> 0.8s
  - StreamingConfig.buffer_threshold default 40 -> 24 chars
  - DEFAULT_STREAMING_EDIT_INTERVAL / BUFFER_THRESHOLD / CURSOR are now
    a single source of truth.  StreamConsumerConfig imports them
    instead of duplicating the literals; the prior dual-source drift
    is fixed.

Tool progress (gateway/display_config.py):
  - Telegram default tool_progress 'all' -> 'new'.  Inside
    Telegram's ~1 edit/s flood envelope the 'all' default would
    accumulate edit pressure on busy chats; 'new' shows only the
    leading bubble per tool batch and feels less spammy.
  - Slack tier_low override (tool_progress='off') is preserved.

Composition with native draft streaming (#23512)
================================================

The mid-stream cadence (edit_interval, buffer_threshold) gates BOTH
the draft path (send_draft) and the edit path (edit_message), so the
tighter cadence helps native draft as much as edit-based.  The
text-batch fast-path applies before the consumer starts, so it speeds
up the first-token latency on every transport.  No conflict.

Stale-base avoidance
====================

Re-authored from scratch rather than cherry-picked.  Dropped from the
original branch:
  - Unrelated d2f043f9c 'fix(anthropic): preserve third-party thinking
    continuity' commit
  - boot_md.py builtin gateway hook (unrelated)
  - Reverted Slack tool_progress='off' (#14663) restoration
  - Reverted Platform plugin discovery, MSGRAPH_WEBHOOK, YUANBAO
    members deletion
  - 2300+ lines of run.py base-skew noise

Tests
=====

New tests/gateway/test_telegram_text_batch_perf.py:
  - 7 tests for _env_float_clamped (NaN, Inf, garbage, bounds).
  - 4 tests for the adaptive-tier composition rules.

Updated tests/gateway/test_display_config.py:
  - test_platform_default_when_no_user_config: 'all' -> 'new' for
    Telegram, with comment.
  - test_high_tier_platforms: split into Telegram-overrides-to-new
    and Discord-stays-all assertions.

Closes #10388.

Co-authored-by: wilsen0 <132184373+wilsen0@users.noreply.github.com>
2026-05-10 22:22:25 -07:00
Teknium e3b88a8fe2 rename(skills): api-testing -> rest-graphql-debug (#23589)
More specific name. The skill is REST + GraphQL debugging end-to-end,
not generic 'api testing' (a smoke-test pytest scaffold is one short
section out of ~500 lines). Renames directory + frontmatter name +
self-reference in the delegate_task example body.
2026-05-10 22:22:19 -07:00
teknium1 5f767879e6 chore(release): AUTHOR_MAP entry for Hugo-SEQUIER 2026-05-10 22:15:04 -07:00
teknium1 1f899393dc chore(skills/hyperliquid): tighten SKILL.md to modern format
- description shortened to <=60 chars
- platforms gated to [linux, macos, windows] (stdlib-only, all OK)
- author credits Hugo Sequier
- collapse redundant prerequisites/setup blocks
- terminal-tool-oriented procedure section
2026-05-10 22:15:04 -07:00
Hugo Sqr f2e8ed2405 Add unit tests for hyperliquid skill functionality
- Implement tests for normalizing perpetual markets and DEXs.
- Validate JSON output for main commands including markets, candles, and review.
- Ensure environment variable resolution and dotenv file reading are covered.
- Test export functionality for market data with expected output structure.
2026-05-10 22:15:04 -07:00
Teknium1 28b4fe6007 test: stabilize quick-command redaction test against xdist ordering
agent.redact._REDACT_ENABLED is snapshotted at import time from
HERMES_REDACT_SECRETS env. Under xdist a prior test in the same worker
can flip it, so test_exec_command_output_is_redacted was order-dependent.
Pin it via monkeypatch like test_terminal_output_transform_still_runs_strip_and_redact does.
2026-05-10 22:12:23 -07:00
0xbyt4 f6736ced81 fix(security): sanitize env and redact output in quick commands + remove write-only _pending_messages
1. Quick command exec ran in the gateway process's full environment
   without env sanitization or output redaction. A quick command like
   "env" or "printenv" would leak all API keys, OAuth tokens, and
   bot credentials to the messaging user.

   Fix: apply _sanitize_subprocess_env() before exec and
   redact_sensitive_text() on output before returning.

2. GatewayRunner._pending_messages was written on every interrupt
   (lines 1331-1334) but never read or consumed anywhere. The actual
   interrupt delivery uses adapter._pending_messages (a separate dict).
   Removed the write-only accumulation to prevent unbounded growth.
2026-05-10 22:12:23 -07:00
Muhammet Eren Karakuş 4c57a5b318 feat(skills): add api-testing optional skill (#1800)
Adds optional-skills/software-development/api-testing/SKILL.md — a single-file
runbook for systematic REST/GraphQL API debugging via Hermes tools (terminal,
execute_code, web_extract, delegate_task).

- 60-char description; gated to platforms: [linux, macos]
- Layered debug flow (connectivity → TLS → auth → format → parse → semantics)
- HTTP status playbook (401/403/404/409/422/429/5xx)
- Pagination, idempotency, contract validation, correlation IDs
- pytest smoke template, token-redaction patterns, leak checklist
- Hermes tool patterns replace generic curl/python examples

Lands in optional-skills/ (not always-active skills/) so it's installed via
hermes skills install official/software-development/api-testing.

scripts/release.py: AUTHOR_MAP entry for erenkar950@gmail.com → eren-karakus0.

Closes #1800.

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-10 22:11:31 -07:00
teknium1 6c1af45b78 chore: AUTHOR_MAP entry for kjames2001 (James Huang) 2026-05-10 22:02:56 -07:00
teknium1 82352e54c4 test(telegram): regression coverage for edit overflow split-and-deliver
Two new tests:

- tests/gateway/test_telegram_format.py
  test_message_too_long_splits_into_continuations_not_silent_truncation:
  asserts edit_message returns success=True with continuation_message_ids
  populated and message_id pointing at the last continuation when
  content exceeds MAX_MESSAGE_LENGTH (#19537). Replaces the original
  fail-on-overflow assertion with the split-and-deliver contract.

- tests/gateway/test_stream_consumer.py
  TestEditOverflowSplitAndDeliver.test_consumer_advances_message_id_on_split_and_deliver:
  asserts the consumer side updates _message_id to the latest
  continuation, clears _last_sent_text, and fires on_new_message when
  the adapter reports a split-and-deliver result.
2026-05-10 22:02:56 -07:00
kjames2001 bf1f40996f fix(telegram): split-and-deliver oversized edits instead of silent truncation
When edit_message_text exceeded Telegram's 4096 UTF-16 codepoint limit,
the adapter caught the BadRequest, best-effort truncated the content
with '…', and returned SendResult(success=True). The stream consumer
believed the full edit was delivered and never recovered, silently
dropping everything past the truncation boundary on long replies.

Returning failure isn't safe either — the consumer's existing fallback
path can race against the next streaming tick, producing duplicate
sends or gaps. Instead, the adapter now SPLITS the oversized payload
across the existing message + new continuation messages, so the user
always gets the full reply in correct order.

How it works:

1. Pre-flight: if utf16_len(content) already exceeds MAX_MESSAGE_LENGTH,
   call the new _edit_overflow_split helper directly — saves a doomed
   round-trip + a Telegram error.

2. Reactive: if Telegram still returns 'message_too_long' after the
   pre-flight (e.g. parse_mode formatting inflated the payload past
   the limit via MarkdownV2 escapes), the same helper handles it.

3. _edit_overflow_split:
   - Splits via truncate_message(len_fn=utf16_len) — same chunking the
     non-streaming send() path uses; chunks get '(1/N)' suffixes.
   - Edits the original message_id with chunk 1 (with parse_mode +
     plain-fallback when finalize=True, mirroring the main edit path).
   - Sends each remaining chunk via self._bot.send_message threaded as
     a reply to the previous chunk so the user sees them as a
     contiguous block. MarkdownV2-with-plain-fallback per chunk on
     finalize.
   - Returns SendResult(success=True, message_id=<last_chunk_id>,
     continuation_message_ids=(<chunk2_id>, <chunk3_id>, ...)) so the
     stream consumer can keep editing the most recent visible message
     and the gateway has full visibility into every message id.

SendResult contract extension:

  Added optional continuation_message_ids: tuple = () field. When
  empty (the common case), behavior is unchanged. When populated, the
  caller knows the adapter delivered across multiple platform messages.

Stream consumer integration:

  GatewayStreamConsumer._send_or_edit advances _message_id to the
  last-continuation id when it sees continuation_message_ids on a
  successful edit result, resets _last_sent_text (the new visible
  message holds only the final chunk's text), and fires
  on_new_message so tool-progress bubbles linearize below the new
  continuation rather than the original. Mirrors the openclaw #32535
  inter-tool-leak guard.

Composes with what just landed:

  - PR #23455 (UTF-16 length-aware splitting in stream consumer)
    prevents most overflows upstream by measuring text in UTF-16
    codeunits before deciding to split. This PR is the safety net at
    the adapter boundary.
  - PR #23512 (native draft streaming, default for DM Telegram) routes
    DM streaming through send_draft, which has its own contract
    unaffected by this change. So this fix narrows in scope to the
    edit-based path: groups, supergroups, forum topics, every
    non-Telegram platform, and the per-response fallback after a
    draft failure.

Salvage notes:

  - Cherry-picked from PR #19537 by @kjames2001. Original PR returned
    failure on overflow; this evolves to split-and-deliver so users
    never lose content and the consumer state stays consistent.
  - Dropped an unrelated model-picker hunk (line 2114-2117) that
    silently killed the 'X more available — type /model <name>
    directly' hint by hardcoding total=len(models). Not in scope.
  - Restored the timeout-aware retryable=not is_timeout signal in
    send()'s fallthrough catch block.

Closes #19537.
2026-05-10 22:02:56 -07:00
Teknium 3b122cc1ac feat(kanban): stranded_in_ready diagnostic for unclaimed tasks (#23578)
Surface ready tasks that nobody claims within a threshold (default
30 min) regardless of why. One identity-agnostic signal that catches:

- Operator typo'd the assignee
- Profile was deleted, leaving its tasks stranded
- External worker pool (Codex CLI lane, custom daemon) is down
- Dispatcher misconfigured (wrong board / wrong HERMES_HOME)

Today the dispatcher correctly skips these (no respawn loop, good)
but nothing surfaces the fact that operator-actionable work is
accumulating. The new `stranded_in_ready` rule does that without
requiring a manual lane registry — it reads the most recent ready-
transition event (`created` / `promoted` / `reclaimed` / `unblocked`)
and fires when (now - last_ready_ts) > threshold.

Severity escalates with age: warning at threshold, error at 2x,
critical at 6x. The cli_hint and reassign actions point operators
at the right next step.

Out of scope deliberately:
- Lane registry (#20157 closed) — this signal supersedes it.
- Pushing the diagnostic into messaging gateways — diagnostics
  are pull-only via 'hermes kanban diagnostics' for now; gateway
  push is a separate UX decision.

Tests: 10 new + 461 existing kanban tests pass. E2E verified end-
to-end via 'hermes kanban diagnostics --json' against a 2h-old
stranded task — surfaces as error severity with correct actions.
2026-05-10 21:58:44 -07:00
Teknium1 bf5b8a7d61 chore(release): map @eloklam tailnet email 2026-05-10 21:44:37 -07:00
Teknium1 b8bf2f817d fix(kanban): merge dashboard batch QOL with i18n + collapse + assignee-casing
PR #23240 was branched before main landed:
- c39168453 i18n localization (16 locales)
- a91e5a875 native <details> collapse + skip empty metadata
- 0e0ddaac8 tone down completed-run metadata panel
- b308dd7d7 preserve assignee casing in dashboard

The cherry-pick took PR's dist/index.js wholesale via -X theirs,
which dropped those features. This commit re-applies them by
hand-merging the 7 conflict regions:

1. bulk-action catch handler: keep PR's failedIds + loadBoard,
   keep main's t-in-deps for tx() i18n calls
2. Refresh button: keep main's tx(t, 'refresh', ...), add PR's
   Clear filters button with tx(t, 'clearFilters', ...)
3. Archive button: keep main's tx(t, 'archive', ...), add PR's
   priority setter with tx(t, 'priority'/'setPriority', ...)
4. Column header: keep main's colHelp i18n var, add PR's
   column-select-all checkbox
5/6. lane.tasks/column.tasks .map: keep main's t->tk rename
   (avoids shadowing the i18n t), apply tk to PR's failed/
   draggingSource props
7. Card checkbox label-wrap: keep PR's <label> structure
   (larger hit target), keep main's tx(i18n, 'selectForBulk', ...)

Adds three new i18n keys (clearFilters, priority, setPriority)
that fall back to English via tx() until translators add them
to the kanban catalog, matching the existing pattern.
2026-05-10 21:44:37 -07:00
eloklam b60462a205 test(kanban): remove stale t.summary assertion from search test
Task.summary was never a real field; latest_summary already covers it.
Matches the haystack cleanup in commit f3015e6ab.
2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam 3df7e30244 kanban dashboard: fix shift-click range selection, column select-all toggle, and bulk action optimistic UI
- Bug 1: shift-click now always adds the target card and sets it as the
  last-selected anchor, so range selection works even when 0 or 1 cards
  are selected.
- Bug 2: column select-all checkbox now toggles: if every card in the
  column is already selected, clicking unselects them all.
- Bug 3: applyBulk now mirrors moveSelected with optimistic UI updates
  for status moves and calls loadBoard() on catch for consistency.
2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam 69053832e3 kanban dashboard: remove redundant t.summary from search haystack
The Task dataclass has no `summary` field; only Run carries summary.
The dashboard already searches `latest_summary` (derived from the
latest run), so `t.summary` in the client-side haystack was always
undefined and therefore redundant.

Verdict from task t_4bcac44f:
- Before batch QOL (6c7ec94d9): search only covered id, title,
  assignee, tenant.
- Batch QOL (7fd187102) correctly added body, result, latest_summary.
- `t.summary` was included but is a misleading no-op because tasks
  never expose a `summary` key — `latest_summary` already covers it.

Removes the redundant field from the haystack only.
2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam a88f201cd4 kanban dashboard: multi-card drag visual feedback
- When dragging a selected card while multiple cards are selected, the
  browser ghost image now shows a 'N cards' badge instead of a single card.
- All selected cards in the original column are dimmed (opacity 0.45 +
  grayscale) during the drag so the user sees the whole set is in-flight.
- Uses React state for the dragged task id; event delegation on the board
  columns container to avoid deep prop threading.
2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam 98c499b235 kanban dashboard: fix batch QOL oracle blockers
- Preserve failedIds partial-failure highlighting after moveSelected/
  applyBulk by clearing only selectedIds/lastSelectedId instead of
  calling clearSelected() (which also wiped failedIds).
- Fix touch/native multi-drag drop stale closure by adding
  props.selectedIds and props.onMoveSelected to the hermes-kanban:drop
  useEffect dependency array.

Fixes t_5bfafb73.
2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam 0ea234e093 feat(kanban): dashboard batch QOL upgrade
- Shift-click range selection, column select-all, select-all-visible
- Multi-card drag/drop via selectedIds + /tasks/bulk
- Expanded bulk actions: todo/ready/blocked/unblock/complete/archive,
  priority setter, reassign with reclaim_first checkbox
- Partial failure card highlight (failedIds + hermes-kanban-card--failed)
- Search expanded to body, result, latest_summary, summary
- Clear filters button + reset all filters on board switch
- Accessibility: larger checkbox hit target, tabIndex/role/aria-label,
  Enter/Space/Esc keyboard handlers
- Fix temporal-dead-zone bug: move clearSelected before moveSelected
2026-05-10 21:44:37 -07:00
Yi Lok Enoch Lam 518d37f6af feat(kanban): add reclaim_first support to bulk reassign endpoint
- Extend BulkTaskBody with reclaim_first: bool = False
- In bulk_update, use kanban_db.reassign_task(..., reclaim_first=True)
  when payload.reclaim_first is set and assignee is present
- Falls back to existing assign_task behavior when reclaim_first is false

This enables the dashboard to bulk-reassign running tasks by
reclaiming their claims first, matching the single-task
/tasks/{id}/reassign endpoint behavior.
2026-05-10 21:44:37 -07:00
Teknium a63a2b7c78 fix(goals): force judge to use tool calls instead of JSON-text replies (#23547)
Live-tested on gemini-3-flash-preview the judge kept returning empty
or non-JSON content, tripping the consecutive-parse-failures auto-
pause. Free-form JSON output is hopeful; tool-call schemas are
enforced server-side by virtually every modern provider.

Two new tools the judge calls:

  - submit_checklist(items)  — Phase A, decompose
  - update_checklist(updates, new_items, reason) — Phase B, evaluate

Both phases now call the auxiliary client with tool_choice forcing
the right tool. read_file remains for Phase B history inspection,
with the loop exiting only when update_checklist is called or the
read budget is exhausted (at which point read_file is dropped from
the toolbox and update_checklist is forced).

Robustness:
- _call_judge_with_tool_choice falls back tool_choice forced→required→
  auto if the provider rejects a particular shape.
- If a fully-broken provider still returns content instead of a tool
  call, the legacy JSON-text parsers stay around as a last-ditch
  backstop so we never silently lose a checklist.
- _normalize_update_args replaces the JSON parser for the apply
  layer; same 1-based→0-based conversion + terminal-status filter.

Live verification: same fizzbuzz goal that was hitting 'judge model
returned unparseable output 3 turns in a row' before now terminates
in 2 turns, all 11 items marked completed with item-specific
evidence, no auto-pause. Agent log shows
'produced 11 checklist items via tool call' instead of the JSON-
parse path.

Tests: 7 new cases for the tool-call path (Phase A success, Phase B
update only, Phase B read_file→update, JSON-content backstop,
empty-text item dropping, non-terminal status filter).
2026-05-10 20:51:40 -07:00
Teknium 4a080b1d5a fix(goals): forward standing /goal state on auto-compression session rotation (#23530)
When run_agent's _compress_context fires mid-turn it ends the parent
session in SessionDB and creates a new continuation session with a
fresh session_id. The /goal state is keyed on session_id in
state_meta ("goal:<sid>"), so without forwarding the goal silently
disappears: _get_goal_manager() rebinds for the new session_id,
load_goal() returns None, mgr.is_active() is False, and the
continuation loop dies with no user-visible signal.

Fix: in the same SessionDB transaction block that creates the
continuation session, copy state_meta[goal:<old>] →
state_meta[goal:<new>] when present. No-op when the user has no
active goal. Logged at INFO so a stuck loop is debuggable.

Tests cover the round-trip via SessionDB and the no-op path.

Affects all three run-conversation surfaces (CLI, gateway, TUI
gateway) because _compress_context is the single rotation site.
2026-05-10 20:41:53 -07:00
teknium1 68d081f570 fix(kanban): keep '--created-by' default as 'user'
Out-of-scope behavior change in #23521 — the kanban notifier-routing fix
also flipped the 'kanban create --created-by' default from 'user' to the
active profile name. Revert to keep PR scope focused on the notifier
ownership fix; the profile-aware author default can be its own change.
2026-05-10 20:04:53 -07:00
Mike Nguyen ba5640fa11 fix(gateway): route kanban notifications to creator profile 2026-05-10 20:04:53 -07:00
teknium1 9e005d6779 chore: AUTHOR_MAP entry for NivOO5 2026-05-10 20:02:50 -07:00
teknium1 7f90141c63 test(telegram): native-draft transport coverage + docs
Added tests/gateway/test_stream_consumer_draft.py with 11 tests
covering:
- Transport selection: auto+dm-supported -> draft; auto+group -> edit;
  explicit edit; explicit draft on unsupported adapter -> edit;
  MagicMock adapter -> edit (back-compat for the existing test suite).
- Happy path: DM stream animates draft frames with a single shared
  draft_id, then finalizes via a regular adapter.send.
- Group fallback: drafts entirely skipped in non-DM chats.
- Failure fallback: send_draft returning success=False disables drafts
  for the rest of the response.
- Draft_id lifecycle: consecutive responses use distinct ids; tool
  boundaries bump the id so post-tool text animates fresh below the
  tool-progress bubble (the openclaw #32535 leak guard).
- _already_sent contract: drafts must NOT set the flag so the gateway's
  fallback final-send still fires (drafts have no message_id).

Updated website/docs/user-guide/messaging/telegram.md with a
'Streaming transport' section explaining auto|draft|edit|off, the
DM-only constraint, and the per-response fallback behaviour.
2026-05-10 20:02:50 -07:00
NivOO5 4ed293b38e feat(telegram): native draft streaming via sendMessageDraft (Bot API 9.5+)
Adds Telegram's native streaming-draft API as a streaming transport so DM
replies render with smooth animated previews as tokens arrive, dropping
the per-edit jitter of the legacy editMessageText polling path.

Adapter contract (gateway/platforms/base.py):
  - supports_draft_streaming(chat_type, metadata) -> bool. Default False.
    Telegram returns True only for DMs and only when the bound python-
    telegram-bot version exposes Bot.send_message_draft (PTB 22.6+).
  - send_draft(chat_id, draft_id, content, metadata) -> SendResult.
    Default raises NotImplementedError. Telegram delegates to PTB's
    send_message_draft. Drafts have no message_id (Bot API contract);
    SendResult.message_id is None on success.

Telegram adapter (gateway/platforms/telegram.py):
  - supports_draft_streaming gates on chat_type='dm' AND PTB capability.
  - send_draft trims to MAX_MESSAGE_LENGTH using utf16_len, threads
    message_thread_id through metadata, and routes failures back as
    SendResult(success=False, error=...) so the consumer can fall back.

Stream consumer (gateway/stream_consumer.py):
  - StreamConsumerConfig gains transport ('auto'|'draft'|'edit'|'off')
    and chat_type fields.
  - run() resolves _use_draft_streaming once via a probe at the top of
    the run, allocating a fresh class-wide draft_id_counter so each
    response animates as its own preview (no animation collision across
    consecutive responses to the same chat).
  - _send_or_edit gains a pre-edit branch: when drafts are active AND
    not finalizing AND no edit-path message_id is established, the
    frame routes through _send_draft_frame instead of edit_message.
    Drafts intentionally do NOT set _already_sent so the gateway's
    final sendMessage path still fires — drafts have no message_id and
    the user needs a real message in their chat history.
  - _reset_segment_state bumps the draft_id when the consumer is in
    draft mode so each text block after a tool boundary animates as a
    fresh preview below the tool-progress bubble (avoids the inter-
    tool-call leak openclaw documented in their #32535).
  - Per-response fallback: any send_draft failure (transient network,
    server reject, capability gap) flips _use_draft_streaming to False
    for the rest of the run, gracefully returning to the edit path.

Gateway config (gateway/config.py):
  - StreamingConfig.transport default flips edit -> auto. The auto path
    is identical to edit on every chat type that doesn't currently
    support drafts (groups, supergroups, forum topics, every non-
    Telegram platform), so the default is backwards-compatible for
    non-DM users.

Lifecycle model (Telegram Bot API 9.5):
  1. sendMessageDraft(chat_id, draft_id, text='') opens the bubble.
  2. Repeated sendMessageDraft calls with the SAME draft_id animate
     the preview as text grows.
  3. Drafts have no message_id and cannot be edited or deleted.
  4. When the response finishes the gateway's normal sendMessage path
     delivers the final answer; the draft preview clears naturally on
     the client and the user sees a real message in their history.

Inspired by PR #3412 by @NivOO5. Re-authored against current main
(stream_consumer.py is now ~4x larger than at #3412's branch base, with
new _NEW_SEGMENT/_COMMENTARY/finalize/_on_new_message machinery the
original PR didn't account for) but the design call (DM-only, edit-
fallback, transport=auto|draft|edit|off) is faithful to the original
proposal, with two improvements baked in:

  1. Per-response draft_id (monotonic counter, not a time hash) — no
     collision risk across consecutive responses on the same chat.
  2. Tool-boundary draft_id bump — prevents the inter-tool-call leak
     openclaw hit during their rollout (their #32535).

Closes #21439 (duplicate feature request).
2026-05-10 20:02:50 -07:00
Teknium 80bb5f2947 fix(achievements): use canonical X-Hermes-Session-Token header
Follow-up to TreyDong's fix: switch the auth header to
`X-Hermes-Session-Token` (the canonical pattern used by the rest of
the dashboard SPA — see `web/src/lib/api.ts` `fetchJSON()`). The
server still accepts both schemes, so the original `Authorization:
Bearer` form would also work; we standardize on X-header to match
every other dashboard fetch and only set the header when a token is
actually present.

Also add scripts/release.py AUTHOR_MAP entry for treydong.zh@gmail.com.
2026-05-10 19:41:45 -07:00
treydong da2ed478b5 fix(achievements): inject Authorization header in plugin API calls 2026-05-10 19:41:45 -07:00
Teknium 771b8c4a36 test(conftest): plug every gateway-kill leak path (#23486)
The existing _live_system_guard (PR #23397) blocked os.kill / os.killpg
and a narrow subset of subprocess invocations. Tests still SIGTERMed the
live gateway today (May 10) because the guard had structural holes.

Plug them all:
- subprocess: also wrap getoutput, getstatusoutput
- os.system, os.popen - completely unwrapped before
- pty.spawn - completely unwrapped before
- asyncio.create_subprocess_exec / create_subprocess_shell - bypassed
  the subprocess module entirely; now wrapped
- Subprocess command inspection now looks at the WHOLE command string,
  not just tokens[0]. Catches sudo systemctl, env systemctl, bash -c
  'systemctl', setsid systemctl, /usr/bin/systemctl, etc.
- New process-killer block: pkill / killall / taskkill / fuser
  targeting hermes/python patterns is now refused
- os.kill PID 0 (own group) allowed; PID -1 (every process we can
  signal) refused
- subprocess.Popen wrapper preserves __class_getitem__ so third-party
  packages that use Popen[bytes] as a type annotation still import

Coverage is locked in by tests/test_live_system_guard_self_test.py -
exercises every primitive against a guaranteed-foreign PID and asserts
the guard fires. Adding a new kill primitive without updating the guard
breaks CI.

scripts/run_tests.sh now also force-loads ~/.hermes/pytest_live_guard.py
when present (developer-machine convenience), so even worktrees that
predate this commit get the protection on subsequent test runs through
the canonical wrapper.
2026-05-10 18:55:28 -07:00
Teknium e5bce320db fix(auxiliary): evict cached client on timeout/connection error (#23482)
A Codex auxiliary timeout closes the underlying OpenAI client (so the
streaming hang doesn't sit until the user kills the session), but the
cached wrapper kept pointing at the now-dead transport. Subsequent
auxiliary calls (compression retry, memory flush, background review,
title generation routed via provider: main) reused that closed client
and failed fast with 'Connection error' until the gateway restarted —
even though the main agent route was healthy the whole time.

Sync `_get_cached_client` had no liveness check (async did, via loop
identity), and the connection-error fallback in `call_llm` only fired
on the auto provider path, so an explicit provider — including the
common `auxiliary.compression.provider: main` shape — never evicted.

Three fixes:

* New `_evict_cached_client_instance(target)` helper that drops the
  cache entry whose stored client is target (or wraps it via
  `_real_client`, for `CodexAuxiliaryClient`).
* `_CodexCompletionsAdapter._close_client_on_timeout` evicts the
  wrapper after closing the inner OpenAI client.
* `call_llm` and `async_call_llm` evict on `_is_connection_error`
  before re-raising, regardless of whether the provider is auto.

Net effect: one timeout costs one summary attempt + the existing 30s
compressor cooldown; the next compaction rebuilds the client and
works. Non-connection errors (4xx/5xx) do not evict, so cache hits
stay stable.

Closes #23432
2026-05-10 18:55:05 -07:00
Teknium1 ae83a54be4 docs(kanban): worker lane contract page + review-required convention
Closes the architectural-pin part of #19931. Most of what that issue
asked for is already implemented (logs under kanban root, env-pinned
workspace, dispatcher routing of unknown assignees, lifecycle
ownership, structured handoff conventions). What was missing:

1. A written contract integrators can point at when adding a new
   worker lane shape, and
2. The "code-changing workers should not auto-promote success to
   done" convention.

This commit ships both as docs+convention layered on existing primitives.
No kernel changes — the kanban_complete / kanban_block / kanban_comment
surfaces already support the review-required pattern; we just hadn't
written it down or made it visible to workers.

Changes:

- `agent/prompt_builder.py::KANBAN_GUIDANCE`: append the review-required
  exception to step 5 of the lifecycle. Workers get the cue
  auto-injected into their system prompt — drop structured metadata
  into a kanban_comment first, then end with
  kanban_block(reason="review-required: <summary>") instead of
  kanban_complete when the work needs review. Total prompt size went
  from ~3000 to ~3275 chars; well under the 4096 budget enforced by
  test_kanban_guidance_size.

- `skills/devops/kanban-worker/SKILL.md`: add a worked example to the
  existing "Good summary + metadata shapes" section between the
  Coding-task and Research-task examples. Same shape as the others
  (kanban_comment with structured handoff JSON, then kanban_block with
  the human-readable reason). Plus a one-line guide on when to use
  kanban_complete vs the review-required pattern.

- `website/docs/user-guide/features/kanban-worker-lanes.md` (new): the
  integrator-facing contract. Covers the hierarchy, the three things
  every lane must provide (assignee, spawn mechanism, lifecycle
  terminator), the env vars the dispatcher injects, the
  review-required convention, the failure modes the kernel handles
  for free, and an explicit "external CLI worker lane" deferred-
  pending-concrete-asker section that links to #19931 and #19924.

- `website/sidebars.ts`: link the new page under user-guide/features.

The "specialist worker lanes for external CLI tools (Codex / Claude
Code / OpenCode)" runner is NOT shipped here. The dispatcher's
spawn_fn parameter already supports plugin-shaped extension; the
per-CLI integration work (auth, sandbox policy, exit-code mapping)
needs a concrete asker. The new docs page tells would-be integrators
the contract any such lane must satisfy.

Refs #19931
2026-05-10 18:15:52 -07:00
teknium1 666b751536 chore: AUTHOR_MAP entry for rahimsais 2026-05-10 18:09:31 -07:00
rahimsais 737314fe91 fix(telegram): normalize dm threads and retry control sends
Cherry-picked from PR #10371. Two-layer defense for the spurious-thread_id
issue (#3206):

1. _build_message_event filters DM thread_ids: only preserve thread_id
   for real topic messages (is_topic_message=True). Telegram puts
   message_thread_id on every DM that is a reply, but reply-chain ids
   route to nonexistent threads on send.

2. _send_message_with_thread_fallback helper: control sends
   (send_update_prompt, send_exec_approval / send_slash_confirm,
   send_model_picker) retry once without message_thread_id when
   Telegram returns BadRequest 'Message thread not found'. Mirrors
   the pattern PR #3390 added for the streaming send path.

Salvage notes:
- Conflict 1 (line ~4099): merged the contributor's DM is_topic_message
  filter with the existing forum General-topic default from #22423,
  preserving both behaviors.
- Conflict 2 (line ~1664 / 1690): kept main's delete_message (PR #23416)
  alongside the new helper. Tightened the helper's exception catch
  from bare 'Exception' to use the existing _is_bad_request_error +
  _is_thread_not_found_error helpers (line 484-496) for consistency
  with the streaming send path.
- Widened the fix to send_update_prompt (was bare self._bot.send_message,
  same bug class).

Authored by rahimsais via PR #10371 (re-attributed from donrhmexe@
local commit author).
2026-05-10 18:09:31 -07:00
Teknium 404640a2b7 feat(goals): /goal checklist + /subgoal user controls (#23456)
* feat(goals): /goal checklist + /subgoal user controls

Two-phase judge for /goal — Phase A decomposes the goal into a detailed
checklist on first turn; Phase B evaluates each pending item harshly
against the agent's most recent response. The goal completes only when
every item is in a terminal status (completed or impossible). Adds
/subgoal so the user can append, complete, mark impossible, undo,
remove, or clear items the judge missed or got wrong.

Mechanics:
- GoalState gains `checklist` and `decomposed` fields, both backwards
  compatible (old state_meta rows load unchanged).
- Phase A: aux call writes a harsh, exhaustive checklist; biased toward
  more items not fewer. Falls through to legacy freeform judge when
  decompose fails.
- Phase B: judge gets the checklist + last-response snippet + path to
  a per-session conversation dump at <HERMES_HOME>/goals/<sid>.json.
  A bounded read_file tool (max 5 calls per turn, restricted to that
  one file) lets the judge inspect history when the snippet is
  ambiguous. Stickiness in code: terminal items are frozen, only the
  user can revert via /subgoal undo.
- Continuation prompt shows checklist progress when non-empty;
  reverts to old prompt when empty.
- Status line shows M/N done counts.

CLI + gateway + TUI gateway all pass the agent reference into
evaluate_after_turn so the dump can be written. Gateway-side
/subgoal is allowed mid-run since it only modifies the checklist
the judge consults at turn boundaries.

Tests: 24 new cases — backcompat round-trip, Phase A decompose,
Phase B updates + new_items + stickiness, user override flows,
conversation dump (incl. unsafe-sid sanitization), judge read_file
restriction. Existing freeform-mode tests updated to patch the
renamed `judge_goal_freeform` and skip Phase A explicitly.

* fix(goals): off-by-one in judge index, message-list plumbing, prompt tuning

Three live-test findings from running /goal end-to-end against
gemini-3-flash-preview as the judge:

1. Off-by-one bug — the judge sees the checklist rendered with 1-based
   indices ('1. [ ] foo, 2. [ ] bar') but the apply layer indexed
   state.checklist as 0-based. Result: every judge update landed on
   the wrong item, evidence got attached to neighbouring rows, and
   the genuine 'first pending' item (usually #1) never got marked.
   Fix: convert 1 → 0 in _parse_evaluate_response. Also tightened the
   user prompt to call out the 1-based scheme explicitly. New tests
   cover the parser conversion + an end-to-end fake-judge round-trip.

2. Conversation dump never happened — _extract_agent_messages tried
   common AIAgent attribute names (.messages, .conversation_history,
   etc.) but AIAgent doesn't expose the message list as an instance
   attribute; it lives inside run_conversation()'s scope. Result: the
   judge's read_file tool always saw history_path=unavailable. Fix:
   added an explicit messages= kwarg to evaluate_after_turn that all
   three call sites (CLI, gateway, TUI gateway) now pass directly.
   Agent-attribute extraction kept as back-compat fallback.

3. Prompt was too harsh on simple goals. The original 'be HARSH,
   default to leaving items pending' wording made the judge refuse
   to mark 'file exists' completed even after the agent ran ls,
   test -f, os.path.isfile, and find — burning the entire 8-turn
   budget on a fizzbuzz task. Softened to 'strict but not absurd'
   with explicit guidance on what counts as evidence and a directive
   not to require re-proving items already established earlier.

Re-tested live with the same fizzbuzz goal: now terminates in 2
turns with all 8 checklist items correctly attributed to their
own evidence. /subgoal user-action flow (add / complete / undo /
impossible) verified live as well.
2026-05-10 16:56:51 -07:00
teknium1 c0bbdec850 chore: AUTHOR_MAP entry for Freeman-Consulting 2026-05-10 16:21:07 -07:00
teknium1 121bbe0385 test(stream-consumer): add UTF-16 overflow regression tests for #11170
New TestUtf16OverflowDetection class covers two scenarios:
- test_emoji_text_exceeding_utf16_limit_triggers_overflow_split: feeds
  2200 emoji codepoints (4400 UTF-16 units) — under Telegram's
  codepoint-equivalent limit but over its UTF-16 limit. Asserts
  truncate_message was called with len_fn=utf16_len, confirming the
  consumer detected the overflow.
- test_codepoint_only_adapter_falls_back_to_len: documents that
  adapters which don't subclass BasePlatformAdapter (or test MagicMocks)
  fall back to plain len for backwards compat.

The contributor's PR shipped no tests for the UTF-16 path.
2026-05-10 16:21:07 -07:00
Aubrey Freeman III c0da5d09a6 fix: use UTF-16 length for Telegram stream consumer message splitting
The stream consumer measured message length using Python's len() (Unicode
code points), but Telegram's actual limit is in UTF-16 code units. This
caused messages with supplementary characters (emoji, CJK, etc.) to exceed
Telegram's 4096-character limit, resulting in truncated messages with
formatting artifacts.

Changes:
- Add message_len_fn property to BasePlatformAdapter (defaults to len)
- Override in TelegramAdapter to return utf16_len
- Stream consumer uses adapter.message_len_fn for:
  - safe_limit calculation
  - overflow detection
  - truncate_message calls
  - split point calculation (via _custom_unit_to_cp)
  - fallback final send chunking

Fixes truncated messages with black square artifacts on Telegram when
the model generates responses containing multi-byte Unicode characters.
2026-05-10 16:21:07 -07:00
Teknium c5f1f863ac fix(cli): drive _prompt_text_input directly when off main thread (#23454)
Slash commands (/clear, /new, /undo, /reload-mcp) are dispatched from the
process_loop daemon thread.  prompt_toolkit.run_in_terminal returns a
coroutine that only the main-thread event loop can drive, so calling it
from a daemon thread orphans the coroutine — the input prompt never
renders and user keystrokes leak into the composer instead of the
confirmation prompt (issue #23185).

Mirror the thread-aware guard already in _run_curses_picker: when off the
main thread, fall back to a direct input() call.  Also wrap
run_in_terminal in try/except so WSL / Warp / other emulators that
silently drop the scheduled coroutine fall back to input() too.

Tests: tests/cli/test_prompt_text_input_thread_safety.py covers main
thread (run_in_terminal path), daemon thread (direct input fallback),
no-app, run_in_terminal-raises, and EOF handling.
2026-05-10 16:16:10 -07:00
konsisumer 62cfe79e93 fix(tools): clarify kanban_complete phantom-card retry guidance
When kanban_complete rejects a created_cards list as hallucinated, the
task is intentionally left in-flight (the gate runs before the write
txn) so the worker can retry with a corrected list or pass
created_cards=[] to skip the check. The retry path already worked, but
the previous error wording read like a terminal failure and workers
were observed abandoning the run instead of trying again.

Spell out the recovery path explicitly in the tool_error response
("Your task is still in-flight ... Retry kanban_complete with ...") and
add regression coverage at both the kernel and tool layers so the
retry contract — and the wording the worker depends on to discover
it — is pinned.

Fixes #22923
2026-05-10 16:14:43 -07:00
Keyu Yuan 2f00559d9e fix(telegram): pass source.thread_id explicitly on auto-reset notice (carve-out of #7404)
The auto-reset notice ("◐ Session automatically reset…") was being sent
with metadata=getattr(event, 'metadata', None), which can drop or
mis-route in Telegram forum topics: the event's metadata isn't
guaranteed to carry the originating thread_id, so the notice could leak
into General or another topic.

Use the existing self._thread_metadata_for_source(source) helper, which
already handles thread_id construction plus the Telegram DM topic
reply-fallback shape used everywhere else in the gateway.

Carve-out of #7404. The PR's other hunk (line 7578, queued first
response) is already redundant on main — gateway/run.py:15782 has used
_status_thread_metadata since the _thread_metadata_for_source plumbing
landed.

Closes #7355 (path B; paths A and C closed via prior salvage merges).
2026-05-10 16:12:40 -07:00
Wesley Simplicio a2920b1762 fix(tui): right-click copies selection, only pastes when no selection
Sub-issue 5 of #22034.

Right-click on the composer always pasted from the clipboard, even when
the user had highlighted text — diverging from terminal-native behavior
(xterm/iTerm/gnome-terminal) where right-click copies an active selection
and only pastes when nothing is selected.

Extract a small pure helper, decideRightClickAction(value, range), and
route the existing onMouseDown right-click branch through it. Selection
present and non-empty -> writeClipboardText(slice). Otherwise fall back
to the existing emitPaste path.
2026-05-10 16:06:33 -07:00
Teknium1 59d3f24f10 chore: AUTHOR_MAP entry for konsisumer noreply (#23071) 2026-05-10 15:23:04 -07:00
konsisumer 88588b6159 fix(kanban): extend stale claim instead of killing live worker
Workers running slow models (e.g. kimi-k2.6) can spend longer than
DEFAULT_CLAIM_TTL_SECONDS inside a single tool-free LLM call, making
no tool calls and therefore not heartbeating. release_stale_claims
previously reclaimed these healthy workers, producing the
spawn-then-immediately-reclaim loop reported in #23025.

When a stale-by-TTL claim's host-local worker PID is still alive,
extend the claim (emit a claim_extended event) rather than killing
it. enforce_max_runtime / detect_crashed_workers remain the upper
bounds for genuinely wedged or dead workers. Reclaim events now also
record claim_expires, last_heartbeat_at, worker_pid, and host_local
so operators can see why a worker was killed.
2026-05-10 15:23:04 -07:00
Teknium 3974a137c6 docs(user-stories): add 116 stories from the Hermes Discord archive (#23436)
* docs(user-stories): add 116 stories from Discord archive

Mined teknium1/nous-discord-archive for first-person user stories that match
the existing collage voice ('I run X every day', 'my family uses Hermes for
Y', 'so I built Z'). Skipped pure project pitches, Q&A, install help, and
generic announcements.

- Added 'discord' as a source in UserStoriesCollage (label + brand color)
- Added 116 entries to userStories.json (237 total, up from 121)
- Each entry links back to the discord-archive thread or channel archive file

* docs(user-stories): interleave discord stories across the full collage

Shuffle userStories.json with a fixed seed so the 116 Discord-sourced
entries are mixed evenly with the existing 121 entries instead of
appearing as a contiguous block at the end. Even distribution: 10-16
discord entries per decile across the array (ideal would be ~11).
2026-05-10 15:21:40 -07:00
Teknium d6e1fadbf5 fix(xai): omit reasoning.effort for grok models that reject it (#23435)
xAI's Responses API returns HTTP 400 ("Model X does not support
parameter reasoningEffort") for grok-4, grok-4-0709, grok-4-fast-*,
grok-4-1-fast-*, grok-3, grok-4.20-0309-*, and grok-code-fast-1 — even
though those models reason natively. Hermes was unconditionally sending
`reasoning: {effort: 'medium'}` to xAI for every Grok model, breaking
direct `--provider xai` for the entire grok-4 line.

Add a substring allowlist predicate (verified live against api.x.ai
2026-05-10) covering the only Grok families that accept the effort dial:
grok-3-mini*, grok-4.20-multi-agent*, grok-4.3*. The Responses transport
omits the `reasoning` key entirely for everything else while still
including `reasoning.encrypted_content` so we capture native reasoning
tokens.

Verified end-to-end: `hermes chat -q hi --provider xai --model grok-4-0709`
went from HTTP 400 to a successful reply.
2026-05-10 15:21:30 -07:00
teknium1 cc2a0c674a chore: AUTHOR_MAP entry for hrygo (黄飞虹) 2026-05-10 15:20:40 -07:00
teknium1 f9e0d60a99 test(thread-routing): handle both lark-SDK-present and absent paths
The contributor's regression test for Feishu fallback thread routing
asserted on attributes specific to the real lark SDK builder
(call_args.body, body.receive_id). In test environments without the
lark SDK installed, the in-tree fallback (gateway/platforms/feishu.py
_build_create_message_request) returns a SimpleNamespace using
.request_body instead of .body, causing AttributeError.

Now reads via getattr fallback and also verifies receive_id_type is
'thread_id' (not 'chat_id') as a stronger contract check.
2026-05-10 15:20:40 -07:00
黄飞虹 e164a9c1ed fix(stream-consumer): preserve thread routing on overflow first-send path
When the first streamed message exceeds the platform length limit and
gets split into chunks, _send_new_chunk was called with self._message_id
(which is None on first send), dropping thread routing entirely.

Fallback to self._initial_reply_to_id so overflow chunks land in the
correct topic/thread.

Also fix a fragile test assertion that could be silently skipped.
2026-05-10 15:20:40 -07:00
hrygo ff14666cdc fix(gateway): stream consumer first message drops thread context
Cherry-picked from PR #13077 commits:
- 5500c7d8 fix(gateway): stream consumer first message drops thread context
- e84403b9 test(gateway): add regression tests for stream consumer thread routing

Fixes: Streaming first message drops thread/topic context in Feishu group
topics, Slack threads, Telegram forum topics. Adds initial_reply_to_id
ctor arg to GatewayStreamConsumer, threaded through _send_or_edit and
_send_new_chunk. Also fixes Feishu _send_raw_message fallback path
(reply -> create) to use receive_id_type='thread_id' so the new message
lands in the correct topic instead of the main channel.

Authored by hrygo via PR #13077 (re-attributed from the bot-authored
salvage commit on the original branch).
2026-05-10 15:20:40 -07:00
Teknium 6636fecd47 fix(gateway): only mark final response sent when split-overflow chunks actually land (#23420)
The split-overflow path in _send_or_edit (gateway/stream_consumer.py) was
copying the cumulative _already_sent flag into _final_response_sent on the
done frame. _already_sent goes True on any successful prior edit (tool
progress) or on fallback-mode promotion when an edit fails — neither
proves the *current* chunked send delivered the final answer.

When the chunked send actually fails (network error, flood control), the
consumer would wrongly claim 'final delivered' and the gateway's
independent fallback delivery in run.py would be suppressed. User saw
only tool-progress bubbles and never got the answer.

Now we track per-chunk success locally: _send_new_chunk returns the new
message_id on success or returns the passed-in reply_to unchanged on
failure. If at least one returned id differs, chunks_delivered = True;
otherwise stays False, gateway fallback runs.

Adds two regression tests:
- test_split_overflow_failed_send_does_not_mark_final_sent — primes
  _already_sent=True, then makes every send fail; asserts
  _final_response_sent stays False.
- test_split_overflow_partial_send_marks_final_sent — happy path,
  asserts _final_response_sent goes True.

Note: the companion bug at the CancelledError handler (issue cited
lines 417-418) was already fixed by 3b5572ded on 2026-04-16.

Closes #10748
2026-05-10 15:13:54 -07:00
Teknium b38b100105 chore: AUTHOR_MAP entry for jelrod27 (#21398) 2026-05-10 14:27:59 -07:00
Teknium 787e3c368c test(kanban): cover redeliver-on-cycle + flip stale unsub-on-abnormal-event tests
Follow-up to the previous commit's notifier behavior change. Two test fixes:

1. `tests/gateway/test_kanban_notifier.py` gains
   `test_notifier_redelivers_same_kind_on_dispatch_cycle` — pins the new
   contract directly: a task that crashes, gets reclaimed, and crashes
   again notifies the user BOTH times. Before #21398 the second crash
   silently dropped because the subscription was already deleted.

2. `tests/hermes_cli/test_kanban_notify.py::
   test_notifier_unsubs_after_abnormal_events[gave_up|crashed|timed_out]`
   is flipped. Those tests were added in the salvage of #22941 and
   asserted the OLD behavior (subscription deleted after gave_up /
   crashed / timed_out). They're now obsolete — the new contract is
   "subscription survives a non-final terminal event so retries reach
   the user." Updated docstring + asserts; the cursor-advance check is
   added to confirm the dedup mechanism still works.

The `test_notifier_unsubs_after_completed_event` test stays untouched
because `completed` IS still a terminal event that triggers unsub
(the task hits `done` status, which is handled by the `task_terminal`
branch in the notifier loop).
2026-05-10 14:27:59 -07:00
jelrod27 a96dd54872 fix: deduplicate kanban notifications for blocked/gave_up states
The kanban notifier was re-firing the same blocked/gave_up/crashed/timed_out
notifications on every 5-second tick. Root cause: after delivering a terminal
event, the notifier unsubscribed the subscription, deleting its cursor. If
the unsub failed (WAL contention, transient error), the subscription survived
with a stale cursor, and the next tick would re-deliver the same event.

Even when the unsub succeeded, the subscription was gone. If the task later
transitioned to a different state (e.g., blocked -> unblocked -> blocked
again), a new subscription would start at cursor=0, re-delivering all past
events.

Fix: stop unsubscribing on terminal event kinds. Only remove the subscription
when the task reaches a truly final status (done/archived). For blocked,
gave_up, crashed, and timed_out, the subscription stays alive and the cursor
mechanism deduplicates naturally -- events with id <= last_event_id are never
re-fetched. This makes the dedup idempotent and eliminates the re-fire bug.

The old concern about subscriptions leaking forever on blocked tasks is moot:
blocked tasks will eventually be unblocked (transitioning to ready/running)
or archived, at which point the subscription is cleaned up.
2026-05-10 14:27:59 -07:00
teknium1 04e18160ab chore: AUTHOR_MAP entry for HuangYuChuh 2026-05-10 14:22:59 -07:00
teknium1 ec1fad3449 fix(gateway): align fallback delete with sibling style + add regression tests
Follow-up to HuangYuChuh's #17384 cherry-pick:

- Use defensive getattr+logger.debug for delete_message lookup, mirroring
  the sibling _try_send_fresh_final cleanup pattern at L820+. Platforms
  that don't implement delete_message no longer raise AttributeError; the
  failure path now logs at debug for diagnosability instead of silently
  swallowing.
- Add three regression tests in tests/gateway/test_stream_consumer.py:
  - delete_message awaited on happy-path exit with stale id
  - delete_message NOT awaited when no fallback chunks reached the user
  - no crash on adapters that lack delete_message (spec-restricted mock)
2026-05-10 14:22:59 -07:00
HuangYuChuh 4eb8479ebd fix(gateway): delete partial message after fallback send on flood control
When Telegram flood control triggers 3+ consecutive edit failures, the
stream consumer enters fallback mode and sends the complete response as
a new message. This leaves the user seeing two messages: a frozen
partial (with cursor) and the full duplicate.

After the fallback chunks are sent successfully, delete the original
partial message so the user only sees one complete response. The delete
is best-effort — if it fails (e.g. flood still active, missing
permissions), the full answer is still delivered.

Fixes #16668
2026-05-10 14:22:59 -07:00
Teknium cdb6e5e52a test(conftest): block tests from killing the live hermes-gateway (#23397)
The shutdown forensics added in #23285 caught tests/hermes_cli/ pytest
runs sending SIGTERM to the developer's live gateway 5+ times in 3
days. Root cause: when a single test forgets to mock os.kill or
find_gateway_pids, the real call leaks past the hermetic HERMES_HOME
isolation — find_gateway_pids' psutil scan walks the whole machine and
returns the live gateway PID, then the unmocked os.kill delivers the
signal.

Rather than audit and patch ~30 tests across cmd_update, kill_gateway_processes,
and stop_profile_gateway code paths, install a single autouse guard in
tests/conftest.py that blocks the two primitives that actually cause
the damage:

  - os.kill rejects any PID outside the test process subtree with a
    hard RuntimeError so the offending test gets a stack trace instead
    of silently murdering the real gateway.
  - subprocess.run / Popen / call / check_call / check_output reject
    any 'systemctl <verb> hermes-gateway' invocation that would mutate
    the live unit. Read-only systemctl calls (status, show, list-units)
    still pass through.

We intentionally do NOT stub find_gateway_pids / _scan_gateway_pids —
tests of those functions themselves need the real implementation.
Discovery without delivery is harmless; the os.kill + systemctl guards
catch the actual damage path.

Tests that legitimately need real signal delivery (e.g. PTY tests
signalling their own child) opt out via
@pytest.mark.live_system_guard_bypass.

Validation: tests/hermes_cli/ + tests/cli/ + tests/gateway/ produce
the same 17 failures with and without this guard (all pre-existing on
main, unrelated to gateway-kill leaks). The live gateway survives the
test run that previously SIGTERMed it.
2026-05-10 13:20:27 -07:00
Mike Nguyen 6062c24fd1 ci: skip lint comment on fork PRs 2026-05-10 13:19:41 -07:00
Teknium 9c68d12079 test(kanban): cover send-exception rewind + drop noisy success log to debug
Two follow-up improvements to the previous commit's notifier dedup work.

1. Add a regression test for the send-exception rewind path. The
   contributor's PR included a test for the adapter-disconnect path
   (test_kanban_notifier_rewinds_claim_if_adapter_disconnects, where
   adapter is None at delivery time), but not for the "adapter is
   connected, send() raises" path that fires inside the inner try/except
   at gateway/run.py:4314. The new test
   (test_kanban_notifier_rewinds_claim_on_send_exception) uses a
   FailingAdapter that always raises and confirms (a) send was actually
   attempted, (b) the claim was rewound, (c) the next call to
   unseen_events_for_sub still returns the event for retry.

2. Drop the per-delivery success log from INFO to DEBUG. A busy board
   on a multi-platform gateway can produce hundreds of these per day;
   that's gateway.log noise that obscures real warnings. Failure paths
   stay at WARNING (where you'd want to look when something's wrong)
   so we don't lose visibility into transient send issues.
2026-05-10 13:19:41 -07:00
Mike Nguyen 861ce7c0b6 fix: dedupe kanban notifier delivery claims 2026-05-10 13:19:41 -07:00
Teknium 373c4d6647 docs(sessions): document /handoff cross-platform session transfer (#23400)
Adds a Cross-Platform Handoff section to user-guide/sessions.md covering
the CLI flow, per-platform thread behavior (Telegram topics / Discord
threads / Slack message-anchored / no-thread fallback), failure modes,
and the resume-back-to-CLI loop.

Adds the /handoff entry to reference/slash-commands.md and updates the
CLI-only commands note.
2026-05-10 13:12:37 -07:00
Teknium 4d9dcbc47a fix(windows): unbreak install + update on Windows (#23394)
Three issues hit during a fresh Windows install + first `hermes update`:

1. `pyproject.toml` re-introduced the invalid `exclude-newer = "7 days"`
   under [tool.uv]. uv requires an RFC 3339 / ISO date — relative-duration
   strings parse-fail. The line was removed in PR #21221 on May 7 and
   accidentally added back in the v0.13.0 release commit (498bfc7bc1)
   the same day. Every uv invocation throughout install logged a TOML
   parse error, confusing users into thinking the install was broken.
   Fix: remove the line (and the now-empty [tool.uv] section).

2. `hermes update` failed on Windows with
   `Access is denied. (os error 5)` when uv tried to overwrite
   `venv\\Scripts\\hermes.exe` — the running entry-point shim. Windows
   blocks REPLACE on a mapped/loaded executable but allows RENAME (kernel
   tracks the file by handle, not path; same trick Chrome/Firefox use for
   self-update). Pre-rename live shims to `hermes.exe.old.<unix-ms>`
   before each `uv pip install -e .`; uv writes a fresh shim at the
   original path; the .old files are swept on the next hermes invocation.
   Wraps every install attempt (primary, base-only fallback, and
   per-extra retries). Restores shims if uv fails before writing
   replacements.

3. Tools post-setup hooks (ddgs, piper-tts, kittentts, langfuse,
   tinker-atropos) shelled out to `[sys.executable, '-m', 'pip', ...]`
   and died with `No module named pip` on every fresh Windows install.
   install.ps1 creates the venv via `uv venv` which doesn't seed pip;
   install.ps1 bootstraps pip later, but only inside the platform-SDK
   verify block — by then the wizard's post-setup hooks have already
   run and failed.

   New `_pip_install` helper tries uv pip first (works in pip-less
   venvs), then python -m pip, then ensurepip-bootstrap-then-pip. All
   five post-setup sites now route through it.

E2E:
- uv pip compile pyproject.toml — no parse warning
- quarantine + cleanup with simulated Windows scripts dir; rollback
  works when uv install fails before writing replacement shim
- _pip_install in a real `uv venv`-created (pip-less) venv: bootstraps
  pip via ensurepip and completes the install

Tests: tests/hermes_cli/ — 4135 pass, 8 pre-existing failures on main
unrelated to this PR (kanban_boards, openclaw_migration,
update_gateway_restart, web_server PluginAPIAuth).
2026-05-10 13:07:08 -07:00
teknium1 00ce5f04d9 feat(session): make /handoff actually transfer the session live
Builds on @kshitijk4poor's CLI handoff stub. The original PR's flow
deferred everything to whenever a real user happened to message the
target platform; this rewrites it so the gateway picks up handoffs
immediately and the destination chat just starts working.

State machine on sessions table replaces the boolean flag:
  None -> 'pending' -> 'running' -> ('completed' | 'failed')
plus handoff_error for failure reasons. CLI request_handoff /
get_handoff_state / list_pending_handoffs / claim_handoff /
complete_handoff / fail_handoff helpers wrap the transitions.

CLI side (cli.py): /handoff <platform> validates the platform's home
channel via load_gateway_config, refuses if the agent is mid-turn,
flips the row to 'pending', and poll-blocks (60s) on terminal state.
On 'completed' it prints the /resume hint and exits the CLI like
/quit. On 'failed' or timeout it surfaces the reason and the CLI
session stays intact.

Gateway side (gateway/run.py): new _handoff_watcher background task
scans state.db every 2s, atomically claims pending rows, and runs
_process_handoff for each. _process_handoff:

  1. Resolves the platform's home channel.
  2. Asks the adapter for a fresh thread via the new
     create_handoff_thread(parent_chat_id, name) capability so the
     handed-off conversation gets its own scrollback. Adapters that
     don't support threads (or fail) return None and the watcher
     falls back to the home channel directly.
  3. Constructs a SessionSource keyed as 'thread' when a thread was
     created, 'dm' otherwise, then session_store.switch_session
     re-binds the destination key to the CLI session_id. The full
     role-aware transcript replays via load_transcript on the next
     turn (no flat-text injection into context_prompt).
  4. Forges a synthetic MessageEvent(internal=True) with the handoff
     notice and dispatches through _handle_message; the agent runs
     against the loaded transcript and adapter.send delivers the
     reply.
  5. Marks the row 'completed' on success, 'failed' (+error) on any
     exception.

Adapter capability (gateway/platforms/base.py): create_handoff_thread
default returns None. Three overrides:

  - Telegram (gateway/platforms/telegram.py): wraps _create_dm_topic
    so DM topics (Bot API 9.4+) and forum supergroups both work.
  - Discord (gateway/platforms/discord.py): parent.create_thread on
    text channels with a seed-message + message.create_thread
    fallback for permission edge cases. Skips DMs and other
    non-thread-capable parents.
  - Slack (gateway/platforms/slack.py): posts a seed message and
    returns its ts as the thread anchor — Slack threads are
    message-anchored.

In thread mode, build_session_key keys the destination without
user_id (thread_sessions_per_user defaults to False) so the synthetic
turn and any later real-user message in the thread share the same
session_key — seamless takeover without race.

CommandDef stays cli_only=True (handoff is initiated from the CLI;
gateway exposes /resume for the reverse direction).

Removed the original PR's _handle_message_with_agent handoff hook
(transcript-as-text injection into context_prompt) and the
send_message_tool notification — both replaced by the watcher path.

Tests rewritten around the new state machine: 13/13 pass.
E2E-validated thread + no-thread paths and the failure path against
real worktree imports with mocked adapters.
2026-05-10 13:06:25 -07:00
kshitijk4poor 878611a79d feat(session): add /handoff command for cross-platform session transfer
Adds /handoff <platform> CLI command that queues the current session for
resume on the configured home channel of any messaging platform.

CLI side:
- /handoff telegram — marks session in shared DB, sends summary to
  the Telegram home channel via send_message
- /handoff discord — same for Discord
- Supports telegram, discord, slack, whatsapp, signal, matrix

Gateway side:
- On new session creation, checks for pending handoffs for the
  incoming message's platform
- If found, loads the CLI session's full conversation history and
  injects it into the context prompt as a handoff transcript
- Agent continues the conversation seamlessly

Files:
- hermes_state.py: handoff_pending, handoff_platform columns + helpers
- cli.py: _handle_handoff_command dispatch + handler
- hermes_cli/commands.py: CommandDef entry
- gateway/run.py: handoff detection in _handle_message_with_agent
- tests/hermes_cli/test_session_handoff.py: 8 tests
2026-05-10 13:06:25 -07:00
Teknium 6e5c49bdc4 refactor(kanban-orchestrator): drop hardcoded specialist roster, add Step-0 profile discovery
The skill enumerated 8 specialist profile names (researcher, analyst,
writer, reviewer, backend-eng, frontend-eng, ops, pm) as "the standard
roster" and told orchestrators to "assume these exist." Almost no real
Hermes setup matches that fleet — single-profile setups, Docker-worker
setups, and curated-team setups all violate it — so following the skill
literally produced cards assigned to non-existent profiles, which the
dispatcher silently failed to spawn (no autocorrect, no fallback, just
sits in `ready` forever).

Changes:

- Drop the standard-specialist-roster table.
- Add a "Profiles are user-configured — not a fixed roster" section at
  the top with a Step 0 that prescribes `hermes profile list` (or asking
  the user) before fanning out. Cache the result in working memory.
- Rewrite the worked task-graph example with placeholder names
  (<profile-A>, <profile-B>, <profile-C>) so the structure is still
  teachable but doesn't invite copy-paste of role names that may not
  exist.
- Reframe the "If no specialist fits" anti-temptation rule: don't
  invent profile names; ask the user.
- Add a "Inventing profile names that doesn't exist" entry to Pitfalls.
- Bump skill version 2.0.0 → 3.0.0 (semantic break: previous behavior
  promised a roster the skill no longer enumerates).
- Update website/docs/user-guide/features/kanban.md to drop the
  matching "(researcher, writer, analyst, backend-eng, reviewer, ops)"
  line and explain the discovery prompt instead.
- Re-run website/scripts/generate-skill-docs.py to refresh the
  auto-generated skill page + catalog.

Closes #21131 in spirit — addresses the same hardcoded-names footgun
@yehuosi flagged, with a different shape than their PR (delete the
roster rather than replace each name with placeholder, since the
roster table was the load-bearing footgun and the worked example is
salvageable with placeholder profile names).

Co-authored-by: yehuosi <yehuosi@users.noreply.github.com>
2026-05-10 12:59:11 -07:00
Teknium a282434301 feat(gateway): per-platform admin/user split for slash commands (salvage of #4443) (#23373)
* feat(gateway): per-platform admin/user split for slash commands

Adds an opt-in two-list access control on top of the existing per-platform
`allow_from` allowlists, scoped to slash commands only:

  - allow_admin_from         — full slash command access
  - user_allowed_commands    — what non-admins may run
  - group_allow_admin_from   — same, group/channel scope
  - group_user_allowed_commands

When `allow_admin_from` is unset for a scope, gating is disabled and every
allowed user keeps full access (backward compat). Plain chat is unaffected.
`/help` and `/whoami` are always reachable so users can see what they
can run.

Gate runs at the slash command dispatch site in gateway/run.py and uses
`is_gateway_known_command()`, so it covers built-in AND plugin-registered
commands through the live registry without per-feature wiring.

Adds `/whoami` showing platform, scope, tier, and runnable commands.

Salvage of PR #4443's permission tier work, scoped down. The full tier
system, tool filtering, audit log, usage tracking, rate limiting,
`/promote` flow, and persistent SQLite stores are not included here —
those can be re-expanded later if needed.

Co-authored-by: ReqX <mike@grossmann.at>

* fix(gateway): close running-agent fast-path bypass + add coverage and central docs

The slash command access gate was only applied at the cold dispatch site
(line ~5921). When an agent was already running, the running-agent
fast-path block (line ~5574) dispatched /restart, /stop, /new, /steer,
/model, /approve, /deny, /agents, /background, /kanban, /goal, /yolo,
/verbose, /footer, /help, /commands, /profile, /update directly
without going through the gate — letting non-admins bypass gating just
because an agent happens to be busy.

Refactored the gate into _check_slash_access() and called from BOTH
paths. /status remains intentionally pre-gate so users can always see
session state.

Also added 18 more dispatch tests covering:
  - Running-agent fast-path: blocks non-admin, allows admin, /status
    always works
  - Alias canonicalization (gate uses canonical name, not user alias)
  - Unknown / unregistered commands pass through (don't false-positive)
  - DM admin scope-locked when group has its own admin list
  - Multi-platform isolation (Discord gated, Telegram unrestricted)

Docs: added Slash Command Access Control section to the central
messaging index page + /whoami row in the chat commands table.

Co-authored-by: ReqX <mike@grossmann.at>

---------

Co-authored-by: ReqX <mike@grossmann.at>
2026-05-10 12:33:54 -07:00
Teknium 594209389d fix(xai): drop models being retired May 15, 2026 from pickers (#23291)
xAI is retiring grok-4, grok-4-0709, grok-4-fast{,-reasoning,-non-reasoning},
grok-4-1-fast{,-reasoning,-non-reasoning}, and grok-code-fast-1 on
May 15, 2026 at 12:00 PT. Remove them from the static fallbacks so the
`hermes model` picker, gateway /model picker, and setup wizard stop
auto-suggesting models that will be dead in days.

- _XAI_STATIC_FALLBACK in hermes_cli/models.py now lists only grok-4.20-*
  and grok-4.3 (the live replacements).
- copilot lists in hermes_cli/models.py and hermes_cli/setup.py drop
  grok-code-fast-1 (Copilot proxies it through xAI, so the upstream
  retirement breaks it there too).

Old configs that already reference retired IDs keep working until xAI
flips the switch — context-length lookups in agent/model_metadata.py and
the cache-affinity-header logic in provider_profiles still recognise the
old names. The cleanup here is purely about not advertising them to new
users.

Closes #23278.

Source: https://docs.x.ai/developers/migration/may-15-retirement
2026-05-10 12:12:55 -07:00
Teknium d62808c373 chore: AUTHOR_MAP entry for guglielmofonda (#21505) 2026-05-10 09:13:07 -07:00
Teknium 3fbbf58853 docs(kanban): document max_spawn as live concurrency cap (not per-tick budget)
Follow-up to the previous commit's behavior fix.

Adds a paragraph to dispatch_once's docstring making the concurrency-cap
semantic explicit, and an inline comment near the running_count query
explaining why we do the count (so a future reader doesn't refactor it
back to per-tick semantics thinking it's redundant). Both call out the
unbounded-accumulation failure mode that motivated the fix, since
nothing in the codebase or skills currently documents what max_spawn
is supposed to mean.

The semantic is per-board: each kanban board has its own SQLite file,
so the running-count COUNT(*) is naturally scoped to the board the
dispatcher tick is processing.
2026-05-10 09:13:07 -07:00
guglielmofonda 845be254ec fix(kanban): cap dispatch by running workers 2026-05-10 09:13:07 -07:00
Teknium cede612987 feat(gateway): shutdown forensics — non-blocking diag, per-phase timing, stale-unit warning (#23285)
When the gateway received SIGTERM, the shutdown_signal_handler ran a
synchronous 'ps aux' (3s timeout) inside the asyncio event loop, then
asyncio.create_task(runner.stop()).  On a busy host that ate 1-3s of
the teardown budget before draining could even start, and the resulting
log line was a multi-line ps dump that didn't tell us who sent the
signal.  The shutdown path itself logged 'Stopping gateway...' and then
nothing until 'Gateway stopped' — when systemd SIGKILLed mid-drain,
there was no way to see which phase wedged.

Changes:
- New gateway/shutdown_forensics.py:
  * snapshot_shutdown_context(sig) — sub-millisecond /proc-only capture
    of signal name, parent pid+name+cmdline, INVOCATION_ID (systemd
    marker), loadavg_1m, TracerPid, takeover/planned-stop marker
    presence + whether-it-names-self.  Pure stdlib, never raises.
  * spawn_async_diagnostic(log_path, sig) — detached subprocess with
    its own 'timeout 5s', start_new_session=True, writes ps auxf +
    pstree + dmesg to ~/.hermes/logs/gateway-shutdown-diag.log.
    Returns immediately, can't block the event loop or the cgroup
    teardown.
  * check_systemd_timing_alignment(drain_timeout) — reads
    /proc/self/cgroup for our unit, asks systemctl show for
    TimeoutStopUSec, returns mismatch info when the unit's stop
    timeout is smaller than restart_drain_timeout + 30s headroom
    (the case where systemd SIGKILLs mid-drain).
  * _parse_systemd_duration_to_us — covers '90s', '1min 30s',
    '500ms', '1h' style values from systemctl show.
  * format_context_for_log — single scannable key=value line, parent
    cmdline last.
- gateway/run.py shutdown_signal_handler:
  * Replaces synchronous ps aux + ad-hoc 'hermes-related lines' filter
    with snapshot + detached spawn.
  * Always logs 'Shutdown context: signal=... parent_pid=...
    parent_cmdline=...' regardless of planned/unexpected so we can
    correlate signal source even on planned restarts.
- gateway/run.py _stop_impl:
  * Per-phase '+X.XXs' timing for notify_active_sessions, drain
    (with drain_seconds, active_at_start, active_now, timed_out),
    post-interrupt tool kill, each adapter disconnect (Xs),
    all adapters disconnected, final-cleanup tool kill, SessionDB
    close, total teardown.
- gateway/run.py start():
  * Stale-unit warning at startup when the running systemd unit's
    TimeoutStopSec is smaller than the configured drain timeout.
    Points the user at 'hermes gateway service install --replace'
    to regenerate, or at shortening agent.restart_drain_timeout.

Tests: 30 new in tests/gateway/test_shutdown_forensics.py — snapshot
speed bound, signal name resolution, marker detection self-vs-other,
async diag spawn doesn't block caller, systemd duration parser, and
alignment check returns None outside systemd.  Wider tests/gateway/
suite: 5258 passing, 3 pre-existing TTS-routing failures unchanged
on main.
2026-05-10 09:01:51 -07:00
Teknium 1f5983c4c8 feat(kanban): aggregate all toolset-name typos in skills before raising
Follow-up to the previous commit's toolset-vs-skill validation.

The contributor's fix raises ValueError on the first toolset name found
in the skills list. That works for one mistake, but agents that confuse
skills with toolsets usually pass several at once
(`skills=["web", "browser", "terminal"]`) — and serial-correcting one
per failure round-trip wastes tokens. Collect all toolset-shaped
entries first, then raise once with the full list.

The error message is also slightly clearer:

    'web', 'browser', 'terminal' are toolset names, not skill name(s).
    Put toolsets in the assignee profile's `toolsets:` config instead of
    per-task skills. Skills are named skill bundles (e.g. `kanban-worker`,
    `blogwatcher`); toolsets are runtime capabilities (e.g. `web`,
    `browser`, `terminal`).

vs. the previous "the assignee profile's toolsets" — explicitly naming
the YAML key (`toolsets:`) and giving concrete examples in both
categories closes the conceptual gap that produced the bug to begin
with.

Adds one regression test (test_create_task_skills_lists_all_toolset_typos)
covering the multi-name aggregation path. The single-typo test from
the original PR still passes (the loose `match="toolset name"` matches
both singular and plural forms).
2026-05-10 08:41:28 -07:00
LeonSGP43 673418dfa1 fix(kanban): reject toolset names in task skills 2026-05-10 08:41:28 -07:00
Teknium a91e5a8759 feat(kanban-dashboard): native <details> collapse + skip empty metadata
Two follow-up improvements to Tranquil-Flow's metadata-panel restyle.
Both stay within the parent PR's "tone down the panel" scope.

1. Native <details>/<summary> collapse for verbose metadata.

   The parent PR consciously deferred this ("adding native expand/collapse
   would be the next step but requires UX agreement"). The default they
   asked for is straightforward: collapsed when the rendered JSON exceeds
   300 chars (the threshold where the max-height: 8.5rem cap actually
   starts mattering), expanded otherwise. <details>/<summary> is the right
   primitive — zero JS, browser-handled state, accessible by default
   (keyboard-navigable, screen-reader announces the disclosure state),
   and survives any react-state churn for free.

   The OS-default disclosure marker is suppressed (list-style: none +
   ::-webkit-details-marker hidden) and replaced with a CSS ::before
   chevron that rotates 90deg on the [open] attribute, so the look is
   consistent across Firefox/WebKit/Blink without the double-marker
   that would otherwise appear on the platforms that still render the
   default triangle.

2. Skip rendering when metadata is an empty object.

   `r.metadata && ...` truthy-checks, but `{}` is truthy in JS — so a
   completed task with no actual metadata would render a "Metadata"
   labeled disclosure block containing literal `{}`. Adds an
   Object.keys(r.metadata).length > 0 guard so empty payloads render
   nothing instead of an empty disclosure stub.

Tests: three new static-asset assertions covering the <details> shape,
the empty-object skip, and the suppress-default-marker + animated-chevron
CSS — all in `tests/plugins/test_kanban_dashboard_plugin.py`.
2026-05-10 08:30:42 -07:00
Tranquil-Flow 0e0ddaac8f fix(kanban-dashboard): tone down completed-run metadata panel (#19548)
Hand-rebased onto current main from PR #19980; the original branch was stale
against main (~6 unrelated dashboard fixes had landed since), so applying
the PR's dist files directly would have silently reverted them.

The run-history panel in the task drawer rendered each completed run's
`metadata` field as a `<code class="hermes-kanban-run-meta">` containing
`JSON.stringify(r.metadata)` — a single unindented monoline. With
`white-space: pre-wrap` and a monospace font, a writer task's metadata
(changed_files paths, source URLs, generated-artifact details) wrapped
into a tall block of code-ish text that filled the parent run row. The
container's faint `--color-foreground 3%` background then made the whole
thing read like a crash dump even though the run completed normally.

Restyle and label, no interactivity changes:

- Wrap the meta payload in a `.hermes-kanban-run-meta-block` sub-block
  with an explicit `Metadata` label (small, uppercase, muted) so the
  panel reads as auxiliary detail at a glance.
- Pretty-print the JSON (`indent=2`) so the structure is scannable
  instead of a wall of monoline text.
- Cap `.hermes-kanban-run-meta` at `max-height: 8.5rem; overflow: auto`
  so a verbose blob scrolls inside its own pane rather than swamping
  the run row.
- Sub-block uses a thin `border-left` rule and `background: transparent`
  — distinct from the destructive-tinted treatment used by crashed /
  timed_out / blocked / spawn_failed runs higher in the same file.

Tests: two new static-asset assertions in
`tests/plugins/test_kanban_dashboard_plugin.py` lock in the rendered
shape (the plugin ships built-only, no src/).
2026-05-10 08:30:42 -07:00
Teknium d4b26df897 perf(browser): route browser_console eval through supervisor's persistent CDP WS (180x faster) (#23226)
Adds CDPSupervisor.evaluate_runtime() and wires it into _browser_eval as a
fast path when a supervisor is alive for the current task_id. Replaces the
~180ms agent-browser subprocess fork+exec+Node-startup hop with a ~1ms
Runtime.evaluate over the supervisor's already-connected WebSocket.

Falls through to the existing agent-browser CLI path when no supervisor is
running (e.g. backends without CDP, or before the first browser_navigate
attaches one), so behaviour is unchanged where it can't apply.

JS-side exceptions surface directly without falling through to the
subprocess (the subprocess would just re-raise the same error, slower);
supervisor-side failures (loop down, no session) fall through cleanly.

Benchmark — 30 iterations of `1 + 1` against headless Chrome:
  supervisor WS              mean=  0.96ms  median=  0.91ms
  agent-browser subprocess   mean=179.35ms  median=167.73ms
  → 187x speedup mean

Tests: 14 unit tests (mocked supervisor + response-shape coverage), 5
real-Chrome e2e tests in test_browser_supervisor.py (gated on Chrome
being installed). Browser test suite: 355 passed, 1 skipped.
2026-05-10 07:37:55 -07:00
Teknium 08c5b35a73 test(kanban-dashboard): pin assignee-casing static-asset regressions + AUTHOR_MAP
Follow-up to the previous commit's casing fix.

The original PR shipped the dist edits without test coverage. The
contributor's reasoning (UI-only attributes in a pre-built JS bundle,
nothing meaningful to unit-test) is fair, but a static-asset assertion
catches the most likely regression vector — a future rebuild of the
dist bundle that loses the attributes — at near-zero cost.

Adds two regression tests in tests/plugins/test_kanban_dashboard_plugin.py:

- test_dashboard_assignee_inputs_preserve_casing — reads dist/index.js
  and asserts autoCapitalize="none", autoCorrect="off", spellCheck=false,
  and textTransform="none" each appear at least twice (one per assignee
  input — inline triage/lane create + task-edit panel).
- test_dashboard_lane_head_preserves_assignee_casing — reads dist/style.css
  and asserts the .hermes-kanban-lane-head rule body does NOT contain
  text-transform: uppercase. Locates the rule by marker so unrelated CSS
  churn nearby doesn't flake the test.

Both follow the same shape as the existing test_dashboard_requests_default_board_explicitly
static-asset guard from PR #22940's salvage.

Also adds the AUTHOR_MAP entry for princepal9120's GitHub-noreply email
so release notes credit the right account.
2026-05-10 07:35:01 -07:00
princepal9120 b308dd7d75 fix(kanban): preserve assignee casing in dashboard 2026-05-10 07:35:01 -07:00
Teknium 40a4bfa719 test(kanban): cover task_age safe-int guards + AUTHOR_MAP entry
Follow-up to the previous commit's safe-int task_age fix.

The original PR shipped without test coverage. This commit adds:

- test_safe_int_accepts_int_and_int_string — sanity for the well-typed
  path so the helper itself can't quietly start swallowing valid values.
- test_safe_int_returns_none_on_corrupt_inputs — the failure modes
  (None, '%s', 'abc', '', '1.5', random objects). Covers both the
  ValueError and TypeError catch branches.
- test_task_age_handles_corrupt_created_at — the headline regression:
  a task with created_at='%s' used to raise ValueError and turn
  GET /api/plugins/kanban/board into a 500.
- test_task_age_handles_corrupt_started_and_completed — confirms the
  safe-int treatment is consistent across all three timestamp fields.
- test_task_age_well_formed_task — regression that the safe path
  doesn't change observable output for normal data.
- test_task_dict_survives_corrupt_created_at — defense in depth.
  Writes a corrupt row directly via SQL, reads it back through the
  ORM, and confirms task_age + the surrounding plugin_api guard
  degrade gracefully instead of crashing.

Also adds the AUTHOR_MAP entry for the contributor's GitHub-noreply
email so release notes credit @baocin (the commit was authored locally
as `aoi <aoi@hino.local>` — re-attributed during salvage to the
github noreply form).
2026-05-10 07:15:59 -07:00
baocin 061a183008 fix(kanban): guard task_age against corrupt created_at values like '%s'
task_age() crashed with ValueError when created_at contained the
literal format string '%s' instead of a Unix timestamp, taking down
the entire GET /board endpoint with a 500.

- Add _safe_int() helper that returns None on non-numeric values
- Refactor task_age() to use _safe_int instead of bare int() casts
- Wrap task_age() call in _task_dict with try/except fallback so one
  corrupt row never kills the whole board endpoint
2026-05-10 07:15:59 -07:00
Teknium c39168453d feat(i18n): localize all gateway commands + web dashboard, add 8 new locales (16 total) (#22914)
* feat(i18n): localize /model command output

Reported by @tianma8888: when Chinese users run /model, the labels
("Provider:", "Context:", "_session only_", etc.) are still English.
This routes the static prose through the existing i18n catalog so it
follows display.language / HERMES_LANGUAGE.

Changes:
- locales/{en,zh,ja,de,es,fr,tr,uk}.yaml: add 17 keys under
  gateway.model.* covering switched/provider/context/max_output/cost/
  capabilities/prompt_caching/warning/saved_global/session_only_hint/
  current_label/current_tag/more_models_suffix/usage_*.
- gateway/run.py _handle_model_command: replace hardcoded f-strings in
  the picker callback, the text-list fallback, and the direct-switch
  confirmation block with t("gateway.model.<key>", ...).

What stays English:
- model IDs, provider slugs, capability strings, cost figures, and the
  "[Note: model was just switched...]" prepended to the model's next
  prompt (LLM-facing, not user-facing).
- The two slightly-different session-only hints unify on a single key
  with the em-dash phrasing.

Validation: tests/agent/test_i18n.py 27/27 passing (parity contract
holds), tests/gateway/ -k 'model or i18n' 74/74 passing.

* feat(i18n): localize all gateway slash command outputs

Expands the i18n catalog from 7 strings to 234 keys across 35 gateway
slash command handlers, so non-English users see localized output for
\`/profile\`, \`/status\`, \`/help\`, \`/personality\`, \`/voice\`, \`/reset\`,
\`/agents\`, \`/restart\`, \`/commands\`, \`/goal\`, \`/retry\`, \`/undo\`,
\`/sethome\`, \`/title\`, \`/yolo\`, \`/background\`, \`/approve\`, \`/deny\`,
\`/insights\`, \`/debug\`, \`/rollback\`, \`/reasoning\`, \`/fast\`,
\`/verbose\`, \`/footer\`, \`/compress\`, \`/topic\`, \`/kanban\`,
\`/resume\`, \`/branch\`, \`/usage\`, \`/reload-mcp\`, \`/reload-skills\`,
\`/update\`, \`/stop\` (plus the \`/model\` block already added in the
previous commit).

Reported by @tianma8888 — Chinese users want command output prose in
their language, not just the labels we already had.

Translations are hand-written for all 8 supported locales (en, zh, ja,
de, es, fr, tr, uk), matching each catalog's existing style: full-width
punctuation in zh, em-dashes in zh/ja/uk, French spaced colons,
German noun capitalization, etc.

What stays English (unchanged):
- Identifiers/values: model IDs, file paths, profile names, session IDs,
  command flag names like --global, URLs, config keys.
- Backtick code spans: \`/foo\`, \`config.yaml\`.
- Log messages (logger.info/warning/error).
- LLM-facing system notes prepended to next prompt (e.g. [Note: model
  was just switched...]).
- Strings produced by external modules (gateway_help_lines,
  format_gateway, manual_compression_feedback) — those have their
  own surfaces.

New shared keys for cross-handler boilerplate:
- gateway.shared.session_db_unavailable (5 call sites: branch, title,
  resume, topic, _disable_telegram_topic_mode_for_chat)
- gateway.shared.session_not_found (1 site)
- gateway.shared.warn_passthrough (2 sites in /title's f"⚠️ {e}" pattern)

YAML gotcha fixed: \`yolo.on\` and \`yolo.off\` were originally written
unquoted, which YAML 1.1 parses as boolean True/False keys. Renamed to
\`yolo.enabled\` / \`yolo.disabled\` for both safety and clarity.

Test fix: tests/agent/test_i18n.py::test_t_missing_key_in_non_english_falls_back_to_english
now resets the catalog cache on teardown, so the fake "foo: English Foo"
locale doesn't poison the module-level cache for subsequent tests in
the same xdist worker. (Without this, every gateway slash command test
that shares a worker with the i18n suite would see the fake catalog.)

Validation:
- tests/agent/test_i18n.py: 27/27 (parity contract — every key in every
  locale, matching placeholder tokens).
- tests/gateway/: 5077 passed, 0 failed (full gateway suite).
- 180 t() call sites added across 35 handlers; 1872 catalog entries
  total (234 keys × 8 locales).

* feat(i18n): add 8 new locales — af, ko, it, ga, zh-hant, pt, ru, hu

Expands the static-message catalog from 8 → 16 languages, each with full
270-key parity against the English source-of-truth.  Every locale now
covers the same surface PR #22914 added: approval prompts plus all 35
gateway slash command outputs.

New locales:
- af  Afrikaans      (community ask in #21961 by @GodsBoy; PRs #21962, #21970)
- ko  Korean         (PRs #20297 by @tmdgusya, #22285 by @project820)
- it  Italian        (PR #20371 by @leprincep35700)
- ga  Irish/Gaeilge  (PR #20962 by @ryanmcc09-dot)
- zh-hant Traditional Chinese (PRs #20523 by @jackey8616, #13140 by @anomixer)
- pt  Portuguese     (PRs #20443 by @pedroborges, #15737 by @carloshenriquecarniatto, #22063 by @Magaav)
- ru  Russian        (PR #22770 by @DrMaks22)
- hu  Hungarian      (PR #22336 by @lunasec007)

Each locale uses native-quality translations matching the existing tone
and conventions of the older 8 locales:
- zh-hant uses 繁體 characters with TW/HK technical vocabulary (軟體
  not 软件, 連線 not 连接, 設定 not 设置, 訊息 not 消息, 工作階段 not 会话, 程式
  not 程序, 預設 not 默认, 伺服器 not 服务器), full-width punctuation 「:()」.
- ko uses formal 합니다체 (습니다/합니다) register throughout.
- pt uses European Portuguese as baseline with neutral PT/BR vocabulary
  where possible.
- ga uses standard An Caighdeán Oifigiúil; English loanwords retained
  for tech terms without good Irish equivalents (gateway, API, JSON).
- All preserve {placeholder} tokens, backtick code spans, slash commands,
  brand names (Hermes, MCP, TTS, YOLO, OpenAI, Telegram, etc.), and emoji.

Aliases added in agent/i18n.py:
- af-za, Afrikaans → af
- ko-kr, Korean, 한국어 → ko
- it-it, italiano → it
- ga-ie, Irish, Gaeilge → ga
- zh-tw, zh-hk, zh-mo, traditional-chinese → zh-hant (note: zh-tw used to
  alias to zh; now aliases to its own zh-hant catalog)
- zh-cn, zh-hans, zh-sg → zh (unchanged from before)
- pt-pt, pt-br, brazilian, portuguese → pt
- ru-ru, Russian, русский → ru
- hu-hu, Magyar → hu

The zh-tw alias re-routing is intentional: previously typing 'zh-TW' got
the Simplified Chinese catalog (wrong vocabulary for Taiwan/HK users).
Now those users get the proper Traditional Chinese catalog.

Validation:
- tests/agent/test_i18n.py: 43/43 (parity contract holds for all 16
  languages × 270 keys = 4320 catalog entries, with matching placeholder
  tokens).
- E2E alias resolution verified for all 19 alias inputs (Afrikaans, ko-KR,
  한국어, italiano, Gaeilge, zh-TW, zh-HK, traditional-chinese, pt-BR,
  brazilian, Magyar, etc.).
- tests/gateway/: 5198 passed (3 pre-existing TTS routing failures
  unrelated to i18n).

Credit to all contributors whose PRs surfaced these language requests.
Their original PRs may now be closed as superseded with credit.

* feat(dashboard-i18n): add 14 web dashboard locales matching the static catalog

Brings the React dashboard (web/src/) up to the same 16-language
coverage the static catalog already has after the previous commits in
this PR. The Translations interface is TypeScript-typed, so every new
locale must provide every key — tsc -b is the parity guard.

Languages added (each is a complete 429-line locale file):
- af  Afrikaans
- ja  Japanese        (PR #22513 by @snuffxxx surfaced this)
- de  German          (PR #21749 by @mag1art)
- es  Spanish         (PR #21749)
- fr  French          (PRs #21749, #10310 by @foXaCe)
- tr  Turkish
- uk  Ukrainian
- ko  Korean          (PRs #21749, #18894 by @ovstng, #22285 by @project820)
- it  Italian
- ga  Irish (Gaeilge)
- zh-hant Traditional Chinese (PR #13140 by @anomixer)
- pt  Portuguese      (PRs #22063 by @Magaav, #22182 by @wesleysimplicio, #15737 by @carloshenriquecarniatto)
- ru  Russian         (PRs #21749, #22770 by @DrMaks22)
- hu  Hungarian       (PR #22336 by @lunasec007)

Each translation covers all 15 namespaces with full key parity vs en.ts,
preserves every {placeholder} token verbatim, keeps identifiers
untranslated (brand names, file paths, cron expressions, code spans),
translates the language.switchTo tooltip into the target language, and
matches existing tone conventions (zh-hant uses TW/HK vocab; ja uses
formal desu/masu; ko uses formal seumnida register; ga uses An
Caighdean Oifigiuil with English loanwords for tech vocab without good
Irish equivalents).

Plumbing:
- web/src/i18n/types.ts: Locale union expanded to all 16 codes.
- web/src/i18n/context.tsx: imports all 16 catalogs; exports
  LOCALE_META (endonym + flag per locale); isLocale() type guard.
- web/src/i18n/index.ts: re-export LOCALE_META.
- web/src/components/LanguageSwitcher.tsx: replaced two-state EN-ZH
  toggle with a click-to-open dropdown listing all 16 languages.

Note: zh-hant.ts exports zhHant (camelCase) since hyphen is invalid in
a JS identifier; the canonical 'zh-hant' string keys it in TRANSLATIONS.

Validation:
- npx tsc -b: 0 errors. Every locale satisfies Translations.
- npm run build (tsc + vite production): green, 2062 modules.
- Each locale file is exactly 429 lines.

Out of scope: plugin dashboards (kanban/achievements ship as prebuilt
bundles with no source in repo); Docusaurus docs (separate surface);
TUI (no i18n yet).

* feat(plugin-i18n): localize achievements + kanban plugin dashboards across all 16 locales

Brings the two shipped plugin dashboards (hermes-achievements, kanban)
under the same i18n umbrella as the core dashboard PR #22914 just
established.  Both bundles now read user-facing strings from the host's
i18n catalog via SDK.useI18n() instead of hardcoded English.

## Approach

Plugin dashboards ship as prebuilt IIFE bundles in
plugins/<name>/dashboard/dist/index.js — no build step, no source in
repo (upstream-authored, vendored as compiled JS).  Earlier contributor
PRs (#22594, #22595, #18747) tried direct edits but didn't actually
wire the bundles to read translations.

This change does the wiring properly:

1.  Each bundle gets a useI18n shim at IIFE scope:
        const useI18n = SDK.useI18n
          || function () { return { t: { kanban: null }, locale: "en" }; };
    Older host SDKs without useI18n still load the bundle and render
    English fallbacks.

2.  A small tx(t, path, fallback, vars) helper resolves dotted keys
    under the plugin's namespace (t.kanban.* or t.achievements.*) and
    interpolates {placeholder} tokens.

3.  Every React component starts with const { t } = useI18n() and
    each user-visible string is wrapped in tx(t, "key", "English fallback").
    Helpers called outside React components (window.prompt callers,
    constants used during init) take t as a parameter.

4.  Top-level constants that were English dictionaries (COLUMN_LABEL,
    COLUMN_HELP, DESTRUCTIVE_TRANSITIONS, DIAGNOSTIC_EVENT_LABELS in
    kanban) become getColumnLabel(t, status)-style functions backed by
    FALLBACK_* dictionaries.

## Translations added

Two new top-level namespaces added to the dashboard's TypeScript-typed
Translations interface:

- achievements: ~70 keys covering the hero, scan banner, achievement
  card, share dialog, stats, filters, and empty states.
- kanban: ~145 keys covering the board, columns (with nested
  columnLabels and columnHelp sub-dicts), card detail panel,
  bulk-actions toolbar, dependency editor, board switcher, and
  diagnostic callouts.

Each key is provided across all 16 supported locales:
en, zh, zh-hant, ja, de, es, fr, tr, uk, af, ko, it, ga, pt, ru, hu.

Total new translation entries: ~3,440 (215 keys × 16 locales).

## What stays English (deliberate)

- API paths, CSS class names, data-* attributes, JSON keys, regex
  strings, URLs, file paths (~/.hermes/kanban.db, boards/_archived/).
- State identifier strings used as lookup keys (triage / todo / ready /
  running / blocked / done / archived) — labels translate, key strings
  don't.
- The PNG share-card text rendered to canvas in the achievements
  ShareDialog (HERMES AGENT watermark, UNLOCKED stamp, tier names) —
  these become part of a globally-shared image and stay English.
- localStorage keys (hermes.kanban.selectedBoard).
- Brand names (Kanban, Hermes, WebSocket, Nous Research).

## Contributor credit

PR #22594 by @02356abc and PR #22595 by @02356abc supplied the
en + zh kanban namespace skeleton (145 keys); used as the en source-
of-truth in this commit and translated to the other 14 locales.

PR #18747 by @laolaoshiren first surfaced the achievements
localization request.

## Validation

- npx tsc -b: 0 errors. All 16 locale .ts files satisfy the
  Translations type with full key parity.
- npm run build (tsc + vite production build): green, 2062 modules,
  1.56MB JS / 95KB CSS, ~2.5s build.
- node --check on both plugin bundles: parse cleanly.
- 126 tx() call sites in kanban, 46 in achievements.

## Out of scope

- TUI (ui-tui/) has no i18n infrastructure yet.
- Docusaurus docs (website/i18n/) — already had zh-Hans; expanding
  is a separate translation workstream (Thai / Korean / Hindi PRs).
2026-05-10 07:14:14 -07:00
Teknium 62b1c74cbc fix(kanban): correct dispatcher spawn module name + PATH-first lookup
Follow-up to the previous commit's contributor cherry-pick.

The cherry-picked change replaced the bare ``["hermes", ...]`` spawn with
``[sys.executable, "-m", "hermes", ...]``. The intent was right (avoid
PATH dependence — cron, systemd User= services, launchd jobs, and other
detached dispatcher invocations routinely run with a stripped $PATH that
doesn't include the venv's bin/, breaking the bare-shim spawn) but the
module name is wrong: there is no top-level ``hermes`` package. The
console-script entry point in pyproject.toml is
``hermes = "hermes_cli.main:main"``, and ``python -m hermes`` fails with
``No module named hermes``. The cherry-picked form would have replaced a
sometimes-broken spawn with an always-broken one.

This commit:

- Adds ``_resolve_hermes_argv()``, mirroring ``gateway.run._resolve_hermes_bin``.
  Tries ``shutil.which("hermes")`` first (preferred — keeps existing ``ps``
  output and log lines familiar in the common case) and falls back to
  ``[sys.executable, "-m", "hermes_cli.main"]`` when the shim is not on
  PATH. The fallback goes through the running interpreter so it's
  PATH-independent. Kept as a local helper rather than imported from
  gateway because ``hermes_cli`` sits below ``gateway`` in the dependency
  order.
- Switches the dispatcher's ``cmd`` list to use ``*_resolve_hermes_argv()``.
- Adds three regression tests:
  * ``test_resolve_hermes_argv_prefers_path_shim`` — pins the PATH-first
    branch so a future refactor doesn't silently flip the order.
  * ``test_resolve_hermes_argv_falls_back_to_module_form_when_no_path_shim`` —
    pins the correct module name (``hermes_cli.main``, NOT ``hermes``).
    Direct regression guard for the form that shipped in the original PR.
  * ``test_resolve_hermes_argv_module_actually_runs`` — runs the fallback
    invocation as a real subprocess and asserts ``--version`` works, so
    losing ``hermes_cli.main``'s ``__main__`` handling can't slip past the
    string-match test.

Verified end-to-end: with the shim on PATH the resolver returns
``[/.../hermes]`` and ``--version`` works; with the shim removed the
resolver returns ``[python, -m, hermes_cli.main]`` and ``--version``
still works; the original PR's ``python -m hermes`` invocation fails as
expected (``No module named hermes``).
2026-05-10 07:10:47 -07:00
Wali Reheman d3db6724dd fix(kanban): use sys.executable -m hermes for dispatcher spawn
In NixOS container mode, hermes is installed at a store path with no
symlink on PATH (e.g. /data/current-package/bin/hermes). The kanban
dispatcher spawns workers via _default_spawn() using a bare 'hermes'
subprocess call, which fails with 'hermes executable not found on PATH'
in container mode.

Fix by calling sys.executable -m hermes instead, which is guaranteed
to resolve to the same Python interpreter running the dispatcher.
2026-05-10 07:10:47 -07:00
Teknium 5aa755e4e6 feat(plugins): run any LLM call from inside a plugin via ctx.llm (#23194)
* feat(plugins): host-owned LLM access via ctx.llm

Plugins can now ask the host to run a one-shot chat or structured
completion against the user's active model and auth, without ever
seeing an OAuth token or API key. Closes the gap where plugins that
needed bounded structured inference (receipts, CRM extraction,
support classification) had to either bring their own provider keys
or register a tool the agent had to call.

New surface on PluginContext:
- ctx.llm.complete(messages, ...)
- ctx.llm.complete_structured(instructions, input, json_schema, ...)
- async siblings ctx.llm.acomplete / acomplete_structured

Backed by the existing auxiliary_client.call_llm pipeline — every
provider, fallback chain, vision routing, and timeout policy Hermes
already supports applies automatically.

Trust gate (fail-closed by default):
- plugins.entries.<id>.llm.allow_model_override
- plugins.entries.<id>.llm.allowed_models (allowlist; '*' = any)
- plugins.entries.<id>.llm.allow_agent_id_override
- plugins.entries.<id>.llm.allow_profile_override

Embedded model@profile shorthand goes through the same gate as
explicit profile=, so it can't bypass the auth-profile policy.
Conflicting explicit and embedded profiles fail closed.

Also lands:
- plugins/plugin-llm-example/ — reference plugin that registers
  /receipt-extract, demonstrating image+text structured input,
  jsonschema validation, and the trust-gate config.
- website/docs/developer-guide/plugin-llm-access.md — full API docs.
- 45 unit tests covering trust gates, JSON parsing, schema
  validation, image encoding, async surface, and config loading.

Validation:
- 2628 tests pass in tests/agent/
- E2E: bundled plugin loaded with isolated HERMES_HOME, slash
  command produced parsed JSON via stubbed call_llm
- response_format extra_body wired correctly for both json_object
  and json_schema modes

* docs(plugin-llm): rewrite quickstart and framing

The quickstart now uses a meeting-notes-to-tasks example instead of
a receipt extractor, and the page leads with hook-time / gateway
pre-filter / scheduled-job framing rather than the OpenClaw
KB/support/CRM/finance/migration enumeration that the original
upstream PR used. Receipt example moved to a separate worked
example link so the docs page itself doesn't echo any of the
upstream framing.

Also clarifies where ctx.llm fits in the broader plugin surface
(table comparing register_tool / register_platform / register_hook
/ etc.) and what makes this lane different from auxiliary_client
internals.

No code change.

* docs(plugin-llm): reframe as any LLM call, not just structured output

The original draft leaned heavily on complete_structured() and made
the chat lane (complete() / acomplete()) feel like a footnote.
Restructure so:

- The page title and description say 'any LLM call.'
- The lead shows BOTH a plain chat call (error rewriter) AND a
  structured call (triage scorer) up top.
- Quick start has two complete plugin examples — /tldr (chat) and
  /paste-to-tasks (structured).
- New 'When to use which' table for choosing complete() vs
  complete_structured() vs the async siblings.
- Trust-gate sections explicitly note 'all four methods,' and the
  request-shaping list calls out chat-only fields (messages) and
  structured-only fields (instructions, input, json_schema)
  alongside each other.
- The 'Where this fits' section now says 'for any reason,
  structured or not.'

The receipt-extractor reference plugin still exists under
plugins/plugin-llm-example/ — but the docs page no longer treats
it as the canonical surface example. It's now described as 'a third
worked example, this time with image input.'

No code change.

* feat(plugin-llm): split provider/model into independent explicit kwargs

The first cut accepted a single 'provider/model' slug on every method
and split it internally. That looked clean but broke under live test:
the model-override path tried to use the slug's vendor prefix as a
literal Hermes provider id, which silently switched the user off
their aggregator (e.g. plugin asks for 'openai/gpt-4o-mini' on a user
who routes through OpenRouter — host attempted to call the 'openai'
provider directly, failed because OPENAI_API_KEY wasn't set).

New shape mirrors the host's main config:

  ctx.llm.complete(
      messages=[...],
      provider='openrouter',         # gated, optional
      model='openai/gpt-4o-mini',    # gated, optional
      profile='work',                # gated, optional
      ...
  )

Each is independently gated by its own allow_*_override flag.
Granting model-override does NOT auto-grant provider-override.
Allowlists are now per-axis (allowed_providers, allowed_models)
matched literally against whatever string the plugin sends.

Dropped 'model@profile' embedded-suffix shorthand entirely. Hermes
doesn't use that pattern anywhere else; profile= is its own kwarg.

Live E2E (against real OpenRouter via Teknium's config) confirms:
- zero-config call works
- default-deny blocks each override with a helpful error
- model-only override stays on user's active provider (the bug)
- provider+model override switches cleanly
- allowlist refuses non-listed entries
- structured output round-trip parses + schema-validates

Tests: 49 cases (up from 45); all green. Docs updated to match the
new shape, including a 'most plugins never need this section' callout
on the trust-gate config block.

* fix+cleanup(plugin-llm): real attribution, hook-mode coverage, move example out of core

Three integration fixes for the ctx.llm surface:

1. Attribution bug — result.provider and result.model now reflect
   what call_llm actually used, not placeholder fallbacks ('auto',
   'default'). New _resolve_attribution() helper:

     - explicit overrides win (what the call targeted)
     - response.model wins for the recorded model (provider
       canonicalisation: 'gpt-4o' → 'gpt-4o-2024-08-06' etc.)
     - falls back to _read_main_provider() / _read_main_model()
       when no override is set, so audit logs reflect the user's
       active main provider/model
     - 'auto' / 'default' only when EVERYTHING is empty

   Live verified: zero-config call now records
   provider='openrouter', model='anthropic/claude-4.7-opus-20260416'
   instead of provider='auto', model='default'.

2. Hook-mode coverage — TestHookMode confirms ctx.llm.complete
   works from inside a registered post_tool_call callback. The
   docs page promised hook integration; now there's a test that
   exercises the lazy-import path through the real invoke_hook
   machinery. Two cases: traceback-rewrite hook with conditional
   ctx.llm.complete, and minimal hook regression for the
   sync-hook + sync-llm path.

3. Reference plugin moved out of core. plugins/plugin-llm-example/
   is gone from hermes-agent — it now lives in the new
   NousResearch/hermes-example-plugins companion repo. The docs
   page links there. Hermes' bundled plugins should be plugins
   users actually run; reference / docs-companion plugins live
   externally.

Test count: 56 (up from 49). Wider sweep on tests/hermes_cli/
+ tests/gateway/ + tests/tools/ + tests/agent/ shows 16770
passing; the 12 failures are all pre-existing on origin/main
(verified by stashing this branch's changes and re-running) —
kanban-boards, delegate-task, gateway-restart, tts-routing —
none touch the plugin_llm surface.

* chore(plugins): move all example plugins to companion repo

Reference / docs-companion plugins now live exclusively in
NousResearch/hermes-example-plugins, not bundled with the core repo:

- example-dashboard
- strike-freedom-cockpit

A new fourth example, plugin-llm-async-example, was added to that
repo demonstrating ctx.llm's async surface (acomplete()) with
asyncio.gather() — registers /translate <lang>: <text> which fires
forward translation + sentiment classifier in parallel, then a
back-translation for QA. Live-tested at 2.5s for three real
provider round-trips (would be ~5-6s sequential).

Docs updated:
- developer-guide/plugin-llm-access.md links both sync and async
  examples in the Reference section
- user-guide/features/extending-the-dashboard.md repoints both demo
  sections to the companion repo with corrected install paths
- user-guide/features/built-in-plugins.md drops the two demo rows
- AGENTS.md notes that example plugins live in the companion repo

Net: hermes-agent's plugins/ directory now contains only plugins
users actually run (memory providers, dashboard tabs that ship real
features, the disk-cleanup hook, platform adapters). All four
demo / reference plugins live externally where they can be cloned
on demand instead of inflating the core install.
2026-05-10 07:09:28 -07:00
Teknium ae4b09ce10 test(security): broaden plugin API auth coverage + correct stale docstring
Follow-up to the previous commit's middleware fix.

- plugins/kanban/dashboard/plugin_api.py: rewrite the "Security note"
  docstring. The previous text said "/api/plugins/ is unauthenticated by
  design" — that's now actively wrong and dangerously misleading. New
  text explains that plugin routes flow through the same session-token
  middleware as core API routes and that --host 0.0.0.0 is safe to use
  on a LAN as a result.

- tests/hermes_cli/test_web_server.py: extend TestPluginAPIAuth to cover
  the surfaces the original PR didn't pin:
  * test_plugin_route_allows_auth now exercises a real plugin path
    (/api/plugins/example/hello) instead of accepting 200 OR 404 from
    a maybe-loaded kanban plugin — the assertion was effectively vacuous.
  * test_plugin_patch_requires_auth + test_plugin_delete_requires_auth
    cover non-GET mutation methods in case a future regression
    whitelists them by accident.
  * test_non_kanban_plugin_route_requires_auth proves the fix is
    plugin-agnostic, not kanban-specific (hits hermes-achievements +
    a non-existent plugin namespace; both 401 before route resolution).
  * test_plugin_websocket_unaffected_by_http_middleware locks in that
    the HTTP middleware change didn't accidentally start gating WS
    upgrades — kanban /events still uses its own ?token= check.
  Plus a cosmetic blank-line cleanup.
2026-05-10 07:04:18 -07:00
liuhao1024 ec9329ec41 fix(security): require dashboard auth for plugin API routes
Remove the blanket /api/plugins/* exemption from auth_middleware so
plugin API routes (e.g. Kanban dashboard) require the same session
token as all other /api/ endpoints.

Fixes #19533
2026-05-10 07:04:18 -07:00
Teknium 7312f7f849 feat(curator): hint at hermes curator pin in the rename block (#23212)
Surfaces the pin command at the moment users care about it: when a
consolidation just landed against their skill library and they're
looking at the umbrella name in the curator output. Previously `hermes
curator pin` existed but had no discovery surface — users only learned
it existed by reading docs or stumbling onto `hermes curator --help`.

The hint:

    archived 3 skill(s):
      • docx-extraction → document-tools
      • pdf-extraction → document-tools
      • old-stale — pruned (stale)
    full report: hermes curator status
    keep an umbrella stable: hermes curator pin document-tools

Gated on having at least one consolidation that produced an umbrella.
Pruned-only runs (nothing surviving to pin) skip the hint. When
multiple umbrellas were produced, picks alphabetically first as a
concrete example rather than listing them all.

3 new tests in tests/agent/test_curator_classification.py covering:
consolidation produces hint with real umbrella name, pruned-only run
omits it, multi-umbrella picks one example.
2026-05-10 06:44:53 -07:00
Teknium 50f9fee988 feat(gateway): add LINE Messaging API platform plugin (#23197)
* feat(gateway): add LINE Messaging API platform plugin

Adds LINE as a bundled platform plugin under `plugins/platforms/line/`,
synthesized from the strongest pieces of seven open community PRs. The
adapter requires zero core edits — `Platform("line")` is auto-discovered
via the bundled-plugin scan in `gateway/config.py`, and all hooks
(setup, env-enablement, cron delivery, standalone send) are wired
through `register_platform()` kwargs the way IRC and Teams do it.

Highlights merged into one plugin:

- **Reply token preferred, Push fallback.** Try the free reply token
  first (single-use, ~60s TTL); fall back to metered Push when the
  token is absent, expired, or rejected. (PR #21023)
- **Slow-LLM Template Buttons postback.** When the LLM is still running
  past `LINE_SLOW_RESPONSE_THRESHOLD` (default 45s), the adapter burns
  the original reply token to send a "Get answer" button bubble. The
  user taps it to fetch the cached answer via a fresh reply token —
  also free. State machine: PENDING → READY → DELIVERED, ERROR for
  cancelled runs (orphan resolves to `LINE_INTERRUPTED_TEXT` after
  /stop). Set threshold to 0 to disable. (PR #18153)
- **Three-allowlist gating** — separate user / group / room allowlists
  with `LINE_ALLOW_ALL_USERS=true` dev-only escape hatch. (PR #18153)
- **Markdown URL preservation.** Strip bold/italic/code-fence/heading
  markers (LINE renders them literally) but keep `[label](url)` →
  `label (url)` so URLs stay tappable. (PR #18153)
- **System-message bypass** for ` Interrupting`, ` Queued`, etc. —
  busy-acks reach the user as visible bubbles instead of being
  swallowed into the postback cache. (PR #18153)
- **Media via public HTTPS URLs.** LINE doesn't accept binary uploads;
  images/audio/video must be HTTPS-reachable. The adapter serves
  registered tempfiles under `/line/media/<token>/<filename>` from the
  same aiohttp app. Allowed-roots traversal guard covers
  `tempfile.gettempdir()`, `/tmp` (→ `/private/tmp` on macOS), and
  `HERMES_HOME`. `LINE_PUBLIC_URL` overrides URL construction for
  setups behind tunnels/proxies. (PR #8398)
- **5-message-per-call batching.** LINE rejects >5 messages per
  Reply/Push; smart-chunker caps text at 4500 chars per bubble.
- **Inbound dedup** via `webhookEventId` LRU. (PR #21023)
- **Self-message filter** via `/v2/bot/info` userId lookup. (PR #21023)
- **Loading-animation indicator** wired to LINE's `chat/loading/start`
  endpoint, DM-only (LINE rejects it for groups/rooms). (PR #21023)
- **Out-of-process cron delivery** via `_standalone_send`, so
  `deliver: line` cron jobs work even when cron runs detached from
  the gateway.
- **Webhook hardening** — 1 MiB body cap, constant-time HMAC-SHA256
  signature verification, dedup, scoped lock so two profiles can't
  bind the same channel.

Validation
----------

- `scripts/run_tests.sh tests/gateway/test_line_plugin.py` →
  73 passed in 1.05s
- `scripts/run_tests.sh tests/gateway/test_line_plugin.py
  tests/gateway/test_irc_adapter.py
  tests/gateway/test_plugin_platform_interface.py
  tests/gateway/test_platform_registry.py
  tests/gateway/test_config.py` → 193 passed, 7 skipped
- E2E import + register + signature roundtrip + `Platform("line")`
  bundled-plugin discovery verified against current `origin/main`.

Closes the seven open LINE PRs (#18153, #16832, #6676, #21023, #14942,
#14988, #8398) by superseding them with a single plugin-form
implementation that takes the best idea from each.

Co-authored-by: pwlee <32443648+leepoweii@users.noreply.github.com>
Co-authored-by: Jetha Chan <jetha@google.com>
Co-authored-by: Cattia <openclaw@liyangchen.me>
Co-authored-by: perng <charles@perng.com>
Co-authored-by: Soichiro Yoshimura <soichiro0111.dev@gmail.com>
Co-authored-by: David Zhou <77736378+David-0x221Eight@users.noreply.github.com>
Co-authored-by: Yu-ga <74749461+yuga-hashimoto@users.noreply.github.com>

* docs(platforms): document platform-specific slow-LLM UX pattern

Add a 'Platform-Specific Slow-LLM UX' section to the platform-adapter
developer guide covering the _keep_typing override pattern that LINE
uses for its Template Buttons postback flow.

Three subsections:
- Pattern: subclass _keep_typing to layer mid-flight UX (with code)
- Pattern: subclass send to route through a cache instead of sending
- When this pattern is appropriate (vs. always-Push fallback)

Plus a short pointer in gateway/platforms/ADDING_A_PLATFORM.md so
tree-readers find the prose walkthrough on the docsite.

Filed because the LINE plugin (PR #23197) was the first bundled
adapter to need this pattern — every prior plugin (irc, teams,
google_chat) handles slow responses with the default typing-loop and
a regular send_text. Documenting now while the rationale is fresh.

---------

Co-authored-by: pwlee <32443648+leepoweii@users.noreply.github.com>
Co-authored-by: Jetha Chan <jetha@google.com>
Co-authored-by: Cattia <openclaw@liyangchen.me>
Co-authored-by: perng <charles@perng.com>
Co-authored-by: Soichiro Yoshimura <soichiro0111.dev@gmail.com>
Co-authored-by: David Zhou <77736378+David-0x221Eight@users.noreply.github.com>
Co-authored-by: Yu-ga <74749461+yuga-hashimoto@users.noreply.github.com>
2026-05-10 06:40:46 -07:00
Teknium 9cdcf31cae docs(web-search): explain auxiliary-model summarization for web_extract (#23211)
web_extract runs returned page content through the web_extract auxiliary
model when pages exceed 5 000 chars (single-pass up to 500k, chunked up
to 2M, refused above that). The user-guide page didn't mention this —
users were surprised that long-page extracts produced summaries instead
of raw markdown, and that those summaries cost main-model tokens by
default.

Adds:
- size-driven behavior table (under 5k / 5k–500k / 500k–2M / over 2M)
- which auxiliary task does the work (auxiliary.web_extract)
- how to route summaries to a cheap model regardless of main
- escape hatch: browser_navigate when you need raw content
- troubleshooting entry for summarization timeouts
2026-05-10 06:40:23 -07:00
Teknium 3d4297a59a docs(user-stories): add 4 entries from @emmagine79 thread (#23204)
Captain Awesome's May 10 thread on hermes + Discord with GPT-5.5 / DeepSeek v4:
- life-changing umbrella tweet
- Google-me -> SSH-deploy landing page to VPS
- cron jobs triaging tech news into Discord channels by urgency
- PM paperclip agent running morning + evening standups for ADHD
2026-05-10 06:32:53 -07:00
Teknium ce374bc1ba chore: AUTHOR_MAP entry for kallidean (#20568) 2026-05-10 05:58:44 -07:00
Teknium 2704e7b67e fix(kanban): restrict board routing tools to orchestrators
Adapted from PR #20568 commit ce3518578 (Eric Litovsky / @kallidean).
Adds two-tier gating for the kanban tool surface so dispatcher-spawned
workers see only task-lifecycle tools (show/complete/block/heartbeat/
comment/create/link) while orchestrator profiles with `toolsets: [kanban]`
also see board-routing tools (kanban_list, kanban_unblock).

Workers shouldn't be enumerating or unblocking the board — they should
close their own task via the lifecycle tools. Hiding board-routing tools
from worker schemas keeps the worker focused and the toolset-isolation
contract honest.

Plus inherited from the same upstream commit:
- 50/200 row bound on kanban_list with `truncated` + `next_limit` metadata.
- Belt-and-suspenders runtime guard `_require_orchestrator_tool()` inside
  the orchestrator handlers in case a stale registration ever routes a
  worker to one of them.
- Tests for the new gate, the stricter bound, and the fact that even a
  worker with `toolsets: [kanban]` in config still doesn't see board
  routing.

Co-authored-by: Eric Litovsky <elitovsky@zenproject.net>
2026-05-10 05:58:44 -07:00
Eric Litovsky 50d281495e fix(kanban): parse triage flag explicitly 2026-05-10 05:58:44 -07:00
Eric Litovsky 26bf45f8c5 fix(kanban): parse include_archived explicitly 2026-05-10 05:58:44 -07:00
Eric Litovsky 236cbe16b6 feat(kanban): add orchestrator board tools 2026-05-10 05:58:44 -07:00
kshitijk4poor 44cdf555a8 fix(codex-spark): defensive 128k entry in DEFAULT_CONTEXT_LENGTHS + clarify validation test docstring
Two follow-ups from self-review:

1. Add gpt-5.3-codex-spark to DEFAULT_CONTEXT_LENGTHS at 128k. The
   primary resolution path for Spark goes through provider='openai-codex'
   → _CODEX_OAUTH_CONTEXT_FALLBACK (already correct). But if any future
   code path resolves Spark's context with a different provider (custom
   proxy, generic fallthrough), the longest-substring-first lookup in
   step 8 would match 'gpt-5' and report 400k, which is wrong by ~3x.
   Adding the explicit override is a cheap defensive correctness fix
   matching how gpt-5.4-mini and gpt-5.4-nano already shadow the generic
   gpt-5 entry.

2. Update test_openai_codex_model_validation_fallback.py docstring. The
   bug it was originally written for (gpt-5.3-codex-spark missing from
   listing) is now resolved by this PR's catalog restoration. The test
   still validly exercises the soft-accept code path for any future
   entitlement-gated Codex slug that ships before Hermes catalogs it,
   but the framing was stale — clarified.
2026-05-09 23:17:25 -07:00
kshitijk4poor 826e7171e9 test(codex-spark): add live-API regression and make picker test deterministic
Two follow-ups from self-review:

1. Add unit test for _fetch_models_from_api covering the live HTTP path.
   The salvaged PR #19530 dropped the supported_in_api:false filter in
   both _fetch_models_from_api and _read_cache_models, but only the
   cache path had a regression test. This adds the symmetric live-fetch
   test (mocked httpx) so a future drive-by change to the HTTP path
   can't silently re-introduce the filter.

2. Pin test_codex_picker_uses_live_codex_catalog to the cache fallback.
   The test wrote a fake JWT and a CODEX_HOME cache, but provider_model_ids
   ('openai-codex') still issued a real 10s HTTP probe to
   chatgpt.com/backend-api/codex/models before falling back to the cache.
   That made the test slow and non-deterministic in restricted/CI
   networks. Patch _fetch_models_from_api to return [] so we go straight
   to the cache path the test actually means to exercise.
2026-05-09 23:17:25 -07:00
kshitij 9ee9a4297d docs(codex-spark): document ChatGPT Pro entitlement gating
PR #12994 stripped gpt-5.3-codex-spark on the assumption that it was
unsupported. It's actually research-preview, ChatGPT-Pro-only, exposed
via the Codex OAuth backend at chatgpt.com/backend-api/codex/models —
not via the public OpenAI API.

Add explanatory comments in:
  - DEFAULT_CODEX_MODELS / _FORWARD_COMPAT_TEMPLATE_MODELS (codex_models.py)
  - _CODEX_OAUTH_CONTEXT_FALLBACK (model_metadata.py)
  - list_authenticated_providers' live-discovery branch (model_switch.py)

so future maintainers don't strip the entry again. Also documents the
intentional asymmetry that Spark stays out of the "openai" provider
catalog (it isn't on the public API) and why the supported_in_api
filter is *not* applied for the openai-codex route.
2026-05-09 23:17:25 -07:00
kshitij 6b5e0119b3 chore: add codex-spark salvage contributors to AUTHOR_MAP
Maps olegwn@gmail.com → nederev (PR #18286) and vesper@askclaw.dev →
askclaw-vesper (PR #19530) so the contributor attribution check passes
when their commits land via this salvage.
2026-05-09 23:17:25 -07:00
Vesper 🌙 9457644390 fix: surface Codex CLI-only models 2026-05-09 23:17:25 -07:00
olegdater c6dc295a35 fix(model-metadata): set codex-spark fallback context to 128k 2026-05-09 23:17:25 -07:00
olegdater 2a6f3deb50 fix(model-metadata): restore gpt-5.3-codex-spark fallback context 2026-05-09 23:17:25 -07:00
olegdater dcc8de83a9 feat(codex): add gpt-5.3-codex-spark model 2026-05-09 23:17:25 -07:00
Teknium e5af1dd633 fix(review): tell background reviewer not to capture transient env failures as skills (#23004)
Closes #6051.

Reported failure mode: agent migrated to WSL2, browser launch failed
because Playwright wasn't installed yet. Background reviewer captured
the failure as a durable skill (`browser-tool-launch-issue`) and the
agent kept refusing the browser tool for weeks after Playwright was
installed and verified working. Negative claims also propagated into
unrelated skills ("browser tools do not work", "cannot use Y from
execute_code").

Root cause: `_SKILL_REVIEW_PROMPT` and `_COMBINED_REVIEW_PROMPT` both
lean hard on "be active, save things, a pass that does nothing is a
missed learning opportunity." Neither distinguished durable knowledge
from transient environment state. The reviewer was doing what it was
told.

Fix at the write site — both prompts now carry a "Do NOT capture"
section calling out:
  • Environment-dependent failures (missing binaries, fresh-install
    errors, post-migration path mismatches, 'command not found',
    unconfigured credentials, uninstalled packages)
  • Negative claims about tools or features ("X does not work")
    that harden into self-cited refusals
  • Session-specific transient errors that resolved before the
    conversation ended
  • One-off task narratives ("summarize today's market", "analyze
    this PR") — also addresses the #12812 / #4538 family

Plus a positive-reframing line: when a tool fails because of setup
state, capture the FIX (install command, config step, env var)
under an existing setup/troubleshooting skill — never "this tool
doesn't work" as a standalone constraint.

Targeted tests: 24/24 passing in tests/run_agent/test_review_prompt_class_first.py
(2 new + all existing review-prompt assertions). Substring-based
checks so future prompt edits don't false-fail.
2026-05-09 22:51:25 -07:00
Teknium 126cbffb8a feat(stream-retry): add upstream + timing diagnostics to drop log (#23005)
The previous PR (#22993) gave us a structured WARNING per stream drop
but the only diagnostic was 'error_type=APIError error=Network
connection lost.' — same nothing the user started with. To actually
diagnose why subagents drop streams disproportionately we need to know
WHERE the drop happened.

Adds three breadcrumbs to the agent.log WARNING:

1. Inner exception chain. openai SDK wraps httpx errors as
   APIConnectionError / APIError so the catch site only sees the
   wrapper. _flatten_exception_chain walks __cause__/__context__ up to
   4 levels deep and renders 'Outer(msg) <- Inner(msg)' so we can
   tell ConnectError vs RemoteProtocolError vs ReadError vs
   ProxyError without enabling verbose mode.

2. Upstream HTTP headers. Snapshots cf-ray, x-openrouter-provider,
   x-openrouter-model, x-openrouter-id, x-request-id, server, via,
   etc. from stream.response immediately after open (so they survive
   even when the stream dies before the first chunk). These answer
   'is one CF edge / one downstream provider responsible, or random?'

3. Per-attempt counters. bytes streamed, chunk count, elapsed time on
   the dying attempt, and time-to-first-byte. Distinguishes 'couldn't
   connect at all' (0s, 0 bytes) from 'died after 30s mid-stream'
   (very different root causes — first is auth/routing, second is
   upstream idle-kill or proxy timeout).

Plumbing:

- _stream_diag_init / _stream_diag_capture_response live on AIAgent
  and produce a per-attempt dict held on request_client_holder['diag']
  for closure access from the retry block.
- _call_chat_completions and _call_anthropic both initialize the diag
  and increment counters per chunk/event (best-effort, never raises in
  the streaming hot path).
- _log_stream_retry / _emit_stream_drop accept an optional diag and
  render the new fields. Final-exhaustion log goes through the same
  helper so it gets the same diagnostic dump.
- UI status line gains a brief 'after Xs' suffix when timing is
  available — distinguishes 'connect failed' from 'died mid-stream'
  at a glance without grepping logs.

Sample WARNING after this change:

  Stream drop mid tool-call on attempt 2/3 — retrying.
    subagent_id=sa-2-cafef00d depth=1 provider=openrouter
    base_url=https://openrouter.ai/api/v1
    error_type=APIError error=Connection error.
    chain=APIError(Connection error.) <- RemoteProtocolError(peer
      closed connection without sending complete message body)
    http_status=200 bytes=12400 chunks=47 elapsed=12.00s ttfb=0.83s
    upstream=[cf-ray=8f1a2b3c4d5e6f7g-LAX
      x-openrouter-provider=Anthropic
      x-openrouter-id=gen-abc123 server=cloudflare]

Tests: 10 covering diag init, header capture (whitelist enforced for
PII), exception-chain walking + depth cap, log content with full diag,
log content without diag (placeholders), UI elapsed-suffix on/off.
2026-05-09 22:49:35 -07:00
Teknium 5a70d9b6be chore: AUTHOR_MAP entry for tymrtn (#21794) 2026-05-09 22:49:29 -07:00
tymrtn d1fc748def fix(kanban): /kanban slash command emits argparse garbage instead of help
Closes #21794.

`/kanban`, `/kanban help`, `/kanban --help`, and `/kanban <sub> -h`
all returned broken output to the gateway and interactive CLI. Three
underlying bugs in `hermes_cli.kanban.run_slash`:

1. argparse writes help to **stdout** but `run_slash` only captured
   stderr at parse time, so `-h` text was silently swallowed and
   replaced with the `(usage error: 0)` sentinel.
2. The wrapping parser used `prog="/"` and routed via a synthetic
   "_top → kanban" subparser, producing `usage: / kanban …` (stray
   space) and `usage: /kanban kanban …` (doubled token) in error text.
3. Bare `/kanban` and `/kanban help` dumped argparse's full ~3KB
   usage tree, which reads as visual garbage in a chat bubble.

Fix: drive the kanban_parser directly (no double-wrap), rewrite prog
strings on every leaf subparser, capture stdout AND stderr around
parse_args, distinguish SystemExit(0) (help — return captured stdout)
from SystemExit(2) (error — return single-line ⚠-prefixed message),
and add an explicit chat-friendly short-help block returned for bare
invocation and the help aliases (`help`, `--help`, `-h`, `?`).

Added 5 regression tests covering bare invocation, every help alias,
subcommand help, unknown action, and missing required arg.

Affects every chat platform via gateway/run.py::_handle_kanban_command
and the interactive CLI via cli.py::_handle_kanban_command.

Co-Authored-By: Nagatha (Claude Opus 4.7) <noreply@anthropic.com>
2026-05-09 22:49:29 -07:00
Teknium 3d2bfc502e chore(models): refresh OpenRouter + Nous fallback lists (#23001)
Reorder Anthropic Opus 4.7/4.6 + Sonnet 4.6 to the top, cluster free
models at the bottom of the OpenRouter list, and mirror the same
ordering into the Nous portal list (paid models only).

- Add inclusionai/ring-2.6-1t:free
- Drop minimax-m2.5, minimax-m2.5:free, sonnet-4.5, mimo-v2.5,
  glm-5v-turbo, glm-5-turbo, trinity-large-preview:free,
  trinity-large-thinking, qwen3.5-plus-02-15
- Replace qwen3.5-35b-a3b with qwen3.6-35b-a3b
- Drop x-ai/grok-4.20-beta from the Nous list
2026-05-09 22:47:38 -07:00
Teknium e2ce89a8aa chore: AUTHOR_MAP entry for li0near gmail (#21378) 2026-05-09 22:38:01 -07:00
li0near 6f2d60559e fix(kanban): drop redundant init_db() in gateway watchers (#21378)
Both `_kanban_notifier_watcher` and `_kanban_dispatcher_watcher`'s
`_tick_once_for_board` called `_kb.connect(board=slug)` immediately
followed by `_kb.init_db(board=slug)`. Since `connect()` already runs
the schema + idempotent migration on first open per process, the
explicit `init_db()` was redundant — and worse, `init_db()` deliberately
busts the per-process `_INITIALIZED_PATHS` cache and re-runs the migration
on a *second* connection that races the first.

On every cold gateway start against a legacy DB this surfaced as either
`sqlite3.OperationalError: duplicate column name: <col>` or intermittent
`database is locked` errors logged at the first tick. The duplicate-column
case is now tolerated by `_add_column_if_missing` (commit 78698381a), but
the wasted second migration plus the database-is-locked race remain
fixable by skipping the redundant call entirely.

Drops `_kb.init_db(board=slug)` at both call sites and adds a regression
test in `tests/hermes_cli/test_kanban_notify.py` that pins the absence
via source inspection plus a runtime spy.

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-09 22:38:01 -07:00
Teknium 68e44642c8 fix(stream-retry): collapse two-line drop status, name provider, and let agent.log capture diagnostics (#22993)
Subagent stream drops were spamming the parent terminal with two lines
per blip ('Connection dropped...' + 'Reconnected...') while leaving zero
breadcrumb in agent.log to debug them.

Two underlying bugs, fixed together:

1. quiet_mode raised the run_agent/tools/etc. loggers to ERROR, which
   filters records before root-logger file handlers see them. The comment
   claimed 'File handlers still capture everything' — that was wrong.
   Removed in both run_agent.py and cli.py; console quietness already
   comes from hermes_logging not installing a console StreamHandler in
   non-verbose mode.

2. The stream-retry blocks emitted two _emit_status calls per drop
   ('⚠️ Connection dropped... Reconnecting...' + '🔄 Reconnected —
   resuming…') with no provider name, so multi-provider sessions had to
   dig through agent.log to attribute a drop. Replaced both call sites
   with a single _emit_stream_drop helper that emits ONE line naming the
   provider and error class, and always writes a structured WARNING to
   agent.log with subagent_id, depth, provider, base_url, error_type.

Net UX change: 6 lines per triple-subagent drop → 3 lines, each
naming the provider. agent.log now has a structured breadcrumb per
retry that didn't exist before.

Tests: 6 new tests in tests/run_agent/test_stream_drop_logging.py
covering the logger-level guard, structured WARNING content, single
status line per drop (no Reconnected follow-up), and provider naming.
2026-05-09 22:35:35 -07:00
Teknium 3800972dd0 feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955)
When the active main model has native vision and the provider supports
multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini
3, OpenRouter, Nous), vision_analyze loads the image bytes and returns
them to the model as a multimodal tool-result envelope. The model then
sees the pixels directly on its next turn instead of receiving a lossy
text description from an auxiliary LLM.

Falls back to the legacy aux-LLM text path for non-vision models and
unverified providers.

Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and
Cline. All four converge on the same pattern: tool results carry image
content blocks for vision-capable provider/model combinations.

Changes
- tools/vision_tools.py: _vision_analyze_native fast path + provider
  capability table (_supports_media_in_tool_results). Schema description
  updated to reflect new behaviour.
- agent/codex_responses_adapter.py: function_call_output.output now
  accepts the array form for multimodal tool results (was string-only).
  Preflight validates input_text/input_image parts.
- agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so
  tools see the live CLI/gateway override, not the stale config.yaml
  default. set_runtime_main()/clear_runtime_main() helpers.
- run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn
  start so vision_analyze's fast-path check sees the actual runtime.
- tests/conftest.py: clear runtime-main override between tests.

Tests
- tests/tools/test_vision_native_fast_path.py: provider capability
  table, envelope shape, fast-path gating (vision-capable model uses
  fast path; non-vision model falls through to aux).
- tests/run_agent/test_codex_multimodal_tool_result.py: list tool
  content becomes function_call_output.output array; preflight
  preserves arrays and drops unknown part types.

Live verified
- Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a
  typed filepath, gets pixels back, reads exact text from images that
  no aux description could capture (font color irony, multi-line
  fruit-count list, etc.).

PR replaces the closed prior efforts (#16506 shipped the inbound user-
attached path; this PR closes the gap for tool-discovered images).
2026-05-09 21:06:19 -07:00
Teknium e62250453b docs(user-stories): add 18 verified social entries (99 → 117) (#22920)
Found 18 real Hermes-Agent stories from HN, X, and Reddit not yet
captured on the page. All URLs HTTP-verified to return 200 with
matching titles.

Reddit (15): r/hermesagent (Obsidian-as-memory writeup at 794 upvotes,
LLM cheatsheet at 635 upvotes, Kanban game-changer post, OpenRouter #1
ranking, AMA from the Nous team, etc.); r/LocalLLaMA, r/Rag,
r/openclaw, r/SideProject, r/LocalLLM threads where users describe
their actual setups (Qwen3.5-9b on 16gb VRAM, 5060Ti + Telegram, smart
routing tiers).

X (3): @vmiss33's 'what I use Hermes for' guide, @HeyYanvi's
X-to-NotebookLM podcast workflow, @ExileAI_0's spare-laptop Iris
running RenPy + ComfyUI, @brucexu_eth's Hermes Inc. Telegram startup
sim from the hackathon, Hype's deep-dive blog.

HN (1): 'I'm using Hermes — sandbox it like any agent.'

No component changes — all new entries fit the existing schema
(real URL, real author, real date).
2026-05-09 20:58:09 -07:00
Clooooode 998676dd0c chore(test): comment of test case rewrite to english 2026-05-09 19:31:41 -07:00
Clooooode a4036654f1 fix(kanban): remove blocked kind from unsub 2026-05-09 19:31:41 -07:00
Clooooode dd49d50389 test(kanban): assert re-block notification is delivered after unblock cycle
Adds test_notifier_second_blocked_delivers to cover the case where a
task is blocked, unblocked, then blocked again — the second blocked
event must still deliver a gateway notification.

Currently fails because blocked is treated as a terminal event kind,
causing the subscription to be dropped after the first block.
2026-05-09 19:31:41 -07:00
Tranquil-Flow 8954537f95 fix(kanban): request default board explicitly (#21819) 2026-05-09 19:31:32 -07:00
Teknium eb3db231dc chore: AUTHOR_MAP entry for eloklam (#22898) 2026-05-09 19:31:14 -07:00
eloklam d04a0b81ee docs(skills): clarify kanban fan-out decomposition 2026-05-09 19:31:14 -07:00
Teknium 08ec602770 fix(tool-result-storage): persist via stdin to bypass 128 KB exec-arg cap (#22913)
Linux's MAX_ARG_STRLEN caps any single argv element at 128 KB
(32 * PAGE_SIZE). The previous heredoc-in-the-command-string approach
in _write_to_sandbox put the entire tool result inside the 'bash -c'
arg, so any result over ~128 KB raised OSError [Errno 7] 'Argument
list too long' before the heredoc ever ran. The caller logged a
warning, but quiet_mode (CLI default) sets tools.* to ERROR — so the
warning never reached agent.log either, and the agent saw a 1.5 KB
preview tagged 'Full output could not be saved to sandbox'. Hits
delegate_task with 3+ subagent outputs routinely now.

Switch to passing content via env.execute(stdin_data=...). cmd is
now just 'mkdir -p X && cat > Y' (under 1 KB), and the heavyweight
payload travels through stdin where there is no argv-element limit.

E2E reproduced the user's exact 144,778-char delegate_task envelope:
old code OSError'd, new code round-trips cleanly to disk with all
three task summaries intact.
2026-05-09 18:44:58 -07:00
Teknium ded194eb6a chore(skills): move heavy training skills + outlines to optional-skills (#22912)
These skills require heavy GPU/CUDA stacks or are niche enough that they shouldn't
be active by default. Moved to optional-skills/ where users opt-in via
`hermes skills install official/...`.

Moved:
- mlops/training/axolotl
- mlops/training/trl-fine-tuning
- mlops/training/unsloth
- mlops/inference/outlines

Counts: 91 -> 87 built-in, 72 -> 76 optional.

Auto-regenerated docs (per-skill pages + catalogs) reflect the move.
2026-05-09 18:44:12 -07:00
Teknium 4375b82cd9 feat(curator): show rename map in user-visible summary (#22910)
* feat(curator): show rename map (where skills went) in user-visible summary

The full data has always been on disk in REPORT.md, but the user-visible
curator summary (gateway 💾 line, CLI session-start panel,
`hermes curator status`) was counts-only — "consolidated 4 into 2
umbrellas" with no names. Users only discovered renames when something
they expected was gone.

New `_build_rename_summary()` formats the rename map and appends it to
`final_summary`:

    auto: 1 marked stale; llm: consolidated 2 into 1, pruned 1
    archived 3 skill(s):
      • docx-extraction → document-tools
      • pdf-extraction → document-tools
      • old-stale-thing — pruned (stale)
    full report: hermes curator status

Empty on no-op ticks (no archives), so most ticks add zero log noise.
Cap of 10 entries keeps agent.log readable when a 50-skill
consolidation lands; the full list is always in REPORT.md.

`hermes curator status` indents continuation lines so the multi-line
summary reads as one logical field.

5 new tests in tests/agent/test_curator_classification.py covering
empty / consolidation / pruning / cap / mixed cases.

* feat(curator): show recent run summary once on `hermes update`

The rename map is now visible from where users actually look — the
update flow they explicitly run, instead of just the live gateway log
or transient CLI session-start panel.

Behavior:
- After `hermes update`, if the most recent curator run produced a
  rename map (multi-line summary) that the user hasn't seen yet, print
  it once with a 'last run Xh ago' header and a one-time-message
  footer.
- Stamp `last_run_summary_shown_at = last_run_at` after printing so
  subsequent `hermes update` invocations are silent until a newer
  curator run lands.
- Silent on no-op runs (single-line summary like 'auto: no changes;
  llm: no change'). Still stamps shown so we don't reconsider on
  every update.
- Silent when the curator has never run (the existing first-run
  notice handles that case).

Output:

    ℹ Skill curator — last run 4h ago
      auto: 1 marked stale; llm: consolidated 2 into 1, pruned 1
      archived 3 skill(s):
        • docx-extraction → document-tools
        • pdf-extraction → document-tools
        • old-stale-thing — pruned (stale)
      full report: hermes curator status
      (This message shows once per curator run. View anytime: hermes curator status)

State migration:
- `_default_state()` gains `last_run_summary_shown_at: None`. Existing
  state files lack the field; `.get()` returns None; the comparison
  treats any prior run as 'not yet shown' and prints once on next
  update. Self-healing.

Wiring:
- Both `hermes update` paths in main.py call the new
  `_print_curator_recent_run_notice()` right after the existing
  first-run notice. Best-effort try/except so a state-load bug
  never breaks the update flow.

6 tests in tests/hermes_cli/test_curator_recent_run_notice.py:
no-run / single-line / multi-line / show-once / new-run-resets /
time-formatter buckets.
2026-05-09 18:43:40 -07:00
Teknium b67ea7ff47 perf(cli): skip welcome banner on chat -q single-query mode (#22904)
`hermes chat -q "..."` printed the full welcome banner before
running the query — kawaii ASCII logo, available toolsets list,
available skills list, model name, session ID, working directory,
update-available notice. Building it took ~420 ms on cold start
(~200 ms version-update probe, the rest is toolset / skill enumeration
plus Rich panel rendering).

For a one-shot `-q` query the banner is noise: the user already
picked the prompt, doesn't need a toolset reference, and gets the
session ID + resume hint from `_print_exit_summary()` after the
response prints.

The fully-quiet `-Q` / `--quiet` machine-readable path was already
banner-free; this brings the human-facing single-query path in line
so all non-interactive invocations are fast.

Measured impact (`hermes chat -q "ok" --max-turns 1`, 10-run
percentiles, 9950X3D):
  median:  1.90 → 1.75 s  (-150 ms)
  min:     1.80 → 1.73 s  ( -70 ms)
  P25:     1.82 → 1.74 s  ( -80 ms)

Wider variance than expected; the banner cost overlaps with API
latency on real `chat -q` runs. Min-time delta of 70 ms is the
cleanest signal — that's the deterministic banner-build cost gone.
The 150 ms median delta picks up cases where the version-update
probe also finishes during the wait.

Interactive mode (`hermes` with no `-q`) and the `--list-tools` /
`--list-toolsets` one-shot listing commands still show the banner —
those are the contexts where it's actually wanted.

Tests: 656/656 `tests/cli/` pass on top of latest main (modulo 5 pre-
existing flakes in `test_cli_save_config_value.py` that fail with
`No module named 'ruamel'` both with and without this change).
2026-05-09 18:20:28 -07:00
Teknium 5971a4e092 feat(docs): richer info panels on the Skills Hub for built-in + optional skills (#22905)
The Skills Hub at /skills had cards that, when expanded, showed only the
one-line description, tags, author, version, and an install command. For
the 163 bundled and optional skills shipped with the repo, this was thinner
than the data we already have on disk.

Three changes, all under website/:

1. extract-skills.py now pulls four extra fields per local skill:
   - 'overview' — first non-heading body paragraph from SKILL.md (stripped
     of admonitions/code fences, capped at ~500 chars at a sentence boundary)
   - 'envVars' / 'commands' — from the prerequisites: block in frontmatter
   - 'license' — from the top-level frontmatter
   - 'docsPath' — slug to the per-skill /docs/user-guide/skills/.../* page,
     computed with the same logic as generate-skill-docs.py

   162 of 163 local skills get a non-empty overview automatically. The
   remaining one (media/heartmula) has only headings/code in its body and
   falls through to the description.

2. Skill TS interface + SkillCard expanded-panel render the new fields:
   - Overview paragraph at the top of the panel
   - Prerequisites box (env vars + required commands) when frontmatter
     declares them
   - License row alongside author/version
   - 'View full documentation →' link to the per-skill docs page

   Search now covers the overview text too, so users can find skills by
   matching content from inside SKILL.md, not just the one-line description.

3. styles.module.css gains six new classes (overviewBlock, detailLabel,
   overviewText, prereqBlock/Row/Kind/List/Item, docsLink) styled to match
   the existing dark panel aesthetic.

External / community skills (Anthropic, LobeHub, Claude Marketplace cached
indexes) keep the old behavior — overview is empty, no prereqs, no docsPath.

Validation: 'npm run build' clean (exit 0); broken-link count unchanged at
155 baseline; all 163 generated docsPath values resolve to existing pages
under website/docs/user-guide/skills/.
2026-05-09 18:17:39 -07:00
Teknium da086a0154 chore: add ming1523 to AUTHOR_MAP 2026-05-09 17:55:12 -07:00
ming 85383c6363 fix(cli): preserve config comments on setting writes 2026-05-09 17:55:12 -07:00
Teknium de54618720 chore: add v1b3coder to AUTHOR_MAP 2026-05-09 17:54:58 -07:00
v1b3coder 4fdaf0b4d8 fix: use credential_pool for custom endpoint model listing probes
Same-provider /model switches on a 'custom' endpoint kept stale credentials
because (a) _resolve_named_custom_runtime's bare-custom + explicit_base_url
path went straight to OPENAI_API_KEY/OPENROUTER_API_KEY env fallbacks
without consulting the credential pool, and (b) switch_model() guarded
against custom-provider re-resolution to preserve base_url, locking in
the prior api_key.

Now the bare-custom path queries the credential pool first (mirroring
the named-custom-provider branch behavior), and the same-provider switch
guard is removed since resolve_runtime_provider has since grown a robust
custom-resolution path that preserves base_url from model_cfg.

Refs #18681 (the gateway-side api_key wiring is still separate),
#16254, #12919.
2026-05-09 17:54:58 -07:00
Teknium f93b8c28e3 chore: add DanielLSM to AUTHOR_MAP 2026-05-09 17:54:44 -07:00
Daniel Marta 1fb9f7c68c fix(gateway): pass max_total_size_mb and max_file_size_mb to CheckpointManager
The /rollback command handler in gateway/run.py was constructing
CheckpointManager with only enabled and max_snapshots, omitting
max_total_size_mb and max_file_size_mb that the __init__ expects.
This caused a TypeError on every /rollback invocation when checkpoints
were enabled.

Fixes: NousResearch/hermes-agent#18841
2026-05-09 17:54:44 -07:00
Teknium 4ca7c2104d test(gateway): stub /proc unavailability in find_gateway_pids fallback test
Follow-up test fix for #22693 — the existing test for ps-failure +
pid-file fallback needed the /proc walk path stubbed too since /proc
is now consulted first.
2026-05-09 17:54:17 -07:00
Wesley Simplicio 6bf7ac3185 fix(gateway): detect gateway process via /proc in Docker without procps
Salvage of NousResearch/hermes-agent#7622.

Docker images often lack procps so `ps` is unavailable.  Try reading
/proc/*/cmdline first (works in any Linux container) and fall back to
`ps -A eww` only when /proc is not present.  PermissionError on
individual PIDs is silently skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 17:54:17 -07:00
Teknium 2ffef15675 fix(test_gateway): stop run_gateway() tests from rewriting the dev's installed systemd unit (#22900)
run_gateway() calls refresh_systemd_unit_if_needed() on every invocation
so restart settings stay current after exit-code-75 respawns. The
user-scope unit path resolves under Path.home() (NOT sandboxed by
conftest, only HERMES_HOME is), and generate_systemd_unit() bakes the
current HERMES_HOME into the unit's Environment= line.

Result: any test that exercises run_gateway() end-to-end on a real
Linux dev box silently rewrites the developer's installed
~/.config/systemd/user/hermes-gateway.service with a polluted
HERMES_HOME pointing at /tmp/pytest-of-<user>/.../hermes_test. On the
next reboot, systemd loads that unit, the gateway starts looking at an
empty tmp dir, and Telegram/Discord/etc. all show as 'No messaging
platforms enabled' even though the user's real config is fine. Three
tests in tests/hermes_cli/test_gateway.py hit this path:
test_run_gateway_exits_cleanly_on_keyboard_interrupt,
test_run_gateway_exits_nonzero_when_start_gateway_reports_failure, and
test_run_gateway_root_guard_has_escape_hatch.

Two-layer fix:

1. _install_fake_gateway_run helper (covers all four run_gateway() call
   sites in test_gateway.py and any future ones) now also stubs
   supports_systemd_services and refresh_systemd_unit_if_needed.

2. refresh_systemd_unit_if_needed() itself sniffs the generated unit
   body for /pytest-of- and /hermes_test markers and refuses to write
   when present. Defense in depth so a future test that bypasses the
   helper still can't corrupt the dev's gateway. Tests that legitimately
   exercise the refresh flow (test_run_gateway_refreshes_outdated_unit_on_boot)
   patch generate_systemd_unit to return synthetic content that doesn't
   carry those markers, so they keep working.

Adds test_refresh_refuses_to_bake_pytest_tmpdir_into_real_user_unit as a
regression test for the source-side guard.
2026-05-09 17:54:09 -07:00
Wesley Simplicio 4f8d8ad912 fix(error_classifier): classify generic-typed timeout messages as transient (carve-out of #22664)
RuntimeError('claude CLI turn timed out') from a local OpenAI-compatible
shim was falling through to FailoverReason.unknown, surfacing as 'Empty
response from model' and burning 3 retry slots on the same failing
endpoint. _classify_by_message had no timeout-message branch — only
billing/rate_limit/auth/context_overflow/model_not_found patterns. The
type-based check at line 565 also requires isinstance(error, (TimeoutError,
ConnectionError, OSError)) — a plain RuntimeError doesn't match.

Add _TIMEOUT_MESSAGE_PATTERNS for 'timed out', 'deadline exceeded',
'request timed out', 'operation timed out', 'upstream timed out', 'turn
timed out'. _classify_by_message returns FailoverReason.timeout (retryable=True)
when any pattern matches.

Salvage of #22664's classifier portion. The original PR also bundled a
fallback self-selection guard which is now redundant (already on main
via #22780) plus DeepSeek thinking and session_search fixes that are
their own separate concerns.

Follow-up to #22780 — fixes the still-broken classification of
generic-typed provider-shim timeouts that #22780's dedup didn't cover.
2026-05-09 17:54:07 -07:00
Wesley Simplicio 6ddc48b058 fix(fallback): resolve api_key_env in fallback chain entries (carve-out of #22665)
Fallback chain entries with 'api_key_env: ENV_VAR_NAME' weren't being
resolved by either the init-time fallback path (line ~1660) or the
runtime _try_activate_fallback path (line ~8045). Only literal
'api_key' was honored; the snake_case 'api_key_env' alias documented
elsewhere in the config was silently dropped, so a 'provider: custom'
fallback with base_url + api_key_env worked as primary but failed as
fallback with 'no endpoint credentials found' / 401.

Adds 'or fb.get("api_key_env")' to the existing 'key_env' lookup in
both call sites, with empty-string-to-None coercion so unset env vars
don't poison the resolver.

Salvage of #22665's fallback portion. The original PR also bundled
gateway-degrade-on-no-adapters changes (those land via the carve-out
in #22853 which is the same code) and run_agent.py memory-nudge
counter hydration (issue #22357 territory, not mentioned in the
title). Drops both bundled pieces; keeps just the api_key_env fix.

Closes #5392.
2026-05-09 17:53:56 -07:00
Wesley Simplicio 246c676c2b fix(gateway): degrade gracefully when all platform adapters are missing
When connected_count == 0 AND enabled_platform_count > 0, the gateway
treated 'all adapters returned None' identically to 'all adapters
failed to connect' — both as fatal startup errors. The 'returned None'
case happens when imports fail silently or when adapters are present
in config but their dependencies aren't installed (e.g. discord.py
missing). Cron jobs and other gateway-runtime work would unnecessarily
fail to start.

Split: only return False when startup_retryable_errors is non-empty
(real connection attempt failed). When the list is empty AND enabled
> 0, log a warning and continue running, matching the 'no platforms
enabled' cron path.

Salvage of #22642's gateway slice. Drops the bundled run_agent.py
memory-nudge counter hydration block (issue #22357 territory) which
wasn't mentioned in the PR description.

Closes #5196.
2026-05-09 17:53:46 -07:00
Wesley Simplicio 116a1446a4 fix(terminal): bridge docker_env config to TERMINAL_DOCKER_ENV
Problem: terminal.docker_env set in config.yaml was silently ignored.
Docker containers never received the user-specified env vars.

Root cause: docker_env was missing from all three config→env bridging
maps (cli.py env_mappings, gateway/run.py _terminal_env_map,
hermes_cli/config.py _config_to_env_sync) and from the terminal_tool
_get_env_config() reader. _create_environment() consumed the key from
container_config correctly, but it was always {} because TERMINAL_DOCKER_ENV
was never set.

Also extend the list-serialisation branches in cli.py and gateway/run.py
to handle dict values via json.dumps (lists already used json.dumps;
plain str() on a dict produces undecodable output).

Fix:
- cli.py: add "docker_env": "TERMINAL_DOCKER_ENV" to env_mappings;
  serialise dict values with json.dumps alongside existing list path
- gateway/run.py: same additions to _terminal_env_map and serialisation
- hermes_cli/config.py: add "terminal.docker_env": "TERMINAL_DOCKER_ENV"
  to _config_to_env_sync so `hermes config set terminal.docker_env …`
  persists to .env correctly
- tools/terminal_tool.py: add docker_env key to _get_env_config() reading
  TERMINAL_DOCKER_ENV via _parse_env_var with default "{}"

Tests: add test_docker_env_is_bridged_everywhere to
tests/tools/test_terminal_config_env_sync.py — stash-verified: fails on
origin/main, passes with fix.

Fixes #20537
2026-05-09 17:53:35 -07:00
Wesley Simplicio 53ec32819c fix(process_registry): kill orphaned Popen on post-spawn setup failure
After Popen succeeds with os.setsid (detached process group), 5 things
happen with no try/except: Thread construction, reader.start(), lock
acquisition, prune+register, checkpoint write. If any raises, the
Popen object goes unregistered and the detached process group leaks
indefinitely.

Wrap the post-spawn setup in try/except. On failure:
  - os.killpg(getpgid(pid), SIGKILL) takes down the entire process
    group (not just the shell - important because of detached PG +
    -lic shell wrapper that may have spawned children)
  - proc.kill() fallback for ProcessLookupError/PermissionError/OSError
  - proc.wait(timeout=5) reaps with a bound
  - re-raise to preserve original traceback
Nested try/except around cleanup so a secondary failure can't mask the
original.

Closes #2749.
2026-05-09 17:53:24 -07:00
Teknium c179bdab3c fix(install): also patch psutil on Termux fresh-install path
The Termux update path (PR #22814) prebuilds psutil from a marker-patched
sdist so 'platform android is not supported' doesn't kill it. The same
psutil setup.py error blocks fresh installs via scripts/install.sh — only
the update path was wired up. Without this, a brand-new Termux user can't
get past the very first 'pip install -e .[termux-all]' call.

- New scripts/install_psutil_android.py — standalone version of the same
  patcher hermes_cli/main.py uses, callable from bash.
- scripts/install.sh detects sys.platform == 'android' and runs the
  patcher before pip install.
- TODO note added to both copies pointing at upstream
  https://github.com/giampaolo/psutil/pull/2762; remove both when that
  ships.

Note: we keep psutil as a base dep on Android (do not adopt the proposed
sys_platform != 'android' marker in pyproject). Removing it would crash
five unguarded 'import psutil' sites at runtime
(tools/code_execution_tool.py, tools/tts_tool.py, tools/process_registry.py
(2x), gateway/platforms/whatsapp.py).
2026-05-09 17:53:15 -07:00
adybag14-cyber 6d5d467d39 fix(update): use termux-all uv fallback path on Termux 2026-05-09 17:53:15 -07:00
adybag14-cyber 3863d6d344 fix(update): prebuild psutil on Termux Android via Linux path shim 2026-05-09 17:53:15 -07:00
Wesley Simplicio 2245879af0 fix(checkpoint): guard _touch_project against non-dict project metadata
Problem
=======
`tools.checkpoint_manager._touch_project` reads the project metadata
file with `json.loads(meta_path.read_text(...))`, then immediately does:

    meta["workdir"] = str(_normalize_path(working_dir))

The `except` block only catches `(OSError, ValueError)`.  When the file
parses successfully but returns a non-dict value (a list `[]`, `null`,
or a scalar from a corrupted or hand-truncated write), `json.loads`
succeeds without error and `meta` is set to, e.g., `[]`.  The subsequent
subscript assignment then raises `TypeError: list indices must be
integers or slices, not str`, which is NOT caught by the narrow except
clause.

This TypeError propagates up through `_take` to `ensure_checkpoint`,
where the broad `except Exception` safety net swallows it.  The effect
is that `ensure_checkpoint` silently returns False for the entire
session — all checkpoints are skipped for the affected working directory
without any user-visible error.

Root cause
==========
Missing `isinstance(meta, dict)` guard after `json.loads`, identical in
pattern to bugs fixed in `cron/jobs.py` (#22569) and
`tools/process_registry.py` (#22544).  The same guard is already
present one function below in `_list_projects` (line 506), but was
inadvertently omitted in `_touch_project`.

Fix
===
Add two lines after the try/except:

```python
if not isinstance(meta, dict):
    meta = {}
```

This matches the existing guard in `_list_projects` and ensures a fresh
empty dict is used whenever the persisted value is not a mapping —
preserving the `created_at` semantics via `setdefault` on the next line.

Tests
=====
`TestTouchProjectMalformedMeta` covers four non-dict root values
(`[]`, `null`, `42`, `"oops"`).  Each writes a corrupted metadata file,
calls `_touch_project`, and asserts: (a) no exception raised, (b) the
metadata file is rewritten as a valid dict containing `last_touch` and
`workdir`.  All four fail on main with `TypeError`, pass with fix.
Full `tests/tools/test_checkpoint_manager.py` regression: 77 passed.
2026-05-09 17:53:13 -07:00
Wesley Simplicio 058c50816c fix(session): route OR-combined short CJK tokens to LIKE fallback (#20494)
The FTS5 trigram tokenizer requires >=3 CJK characters per individual
token to produce matchable trigrams. A query like "广西 OR 桂林 OR 漓江"
has cjk_count=6 (passes the existing >=3 guard) but each token is only
2 CJK chars, so the trigram index returns 0 results.

Fix:
- Add per-token check: if any non-operator CJK token has <3 CJK chars,
  force the LIKE fallback path regardless of total cjk_count.
- Expand the LIKE fallback to build one LIKE condition per non-operator
  token joined with OR, so each term is matched independently.

Regression tests added in TestCJKSearchFallback:
- test_cjk_or_combined_short_tokens_returns_results
- test_cjk_short_token_or_query_preserves_filters
2026-05-09 17:53:02 -07:00
Wesley Simplicio 35f773c459 fix(context_compressor): treat streaming premature-close as transient error
Problem:
When a provider or proxy drops a streaming response mid-flight (httpcore
raises RemoteProtocolError: "incomplete chunked read", "peer closed
connection", "response ended prematurely", etc.), _generate_summary
would not classify it as a transient error.  Instead of retrying on the
main model, it entered the generic 60-second cooldown, leaving context
growing unbounded until the cooldown expired.  Issue #18458.

Root cause:
_is_connection_error in auxiliary_client.py did not match httpcore's
streaming premature-close error substrings.  context_compressor.py's
_generate_summary except block never called _is_connection_error, so
those errors fell through to the 60-second generic cooldown rather than
triggering the retry-on-main fallback path used for timeouts.

Fix:
1. auxiliary_client.py — extend _is_connection_error keyword list with:
   "incomplete chunked read", "peer closed connection",
   "response ended prematurely", "unexpected eof",
   "remoteprotocolerror", "localprotocolerror".
   Also guard the `from openai import ...` with try/except ImportError
   so the function works in environments without the openai package.
2. context_compressor.py — import _is_connection_error and call it in
   _generate_summary's except block as _is_streaming_closed.  Include
   _is_streaming_closed in the fallback-to-main condition (alongside
   _is_model_not_found, _is_timeout, _is_json_decode) and use the
   shorter 30s transient cooldown for streaming-closed errors.

Tests:
4 new regression tests in TestStreamingClosedFallback:
- test_incomplete_chunked_read_falls_back_to_main
- test_peer_closed_connection_falls_back_to_main
- test_streaming_closed_on_main_uses_short_cooldown  (stash-verified)
- test_non_streaming_unknown_error_still_uses_long_cooldown

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-09 17:52:51 -07:00
heathley 0c5c4d1b8d fix(skills-hub): cover remaining SSRF fetch paths after #10029 2026-05-09 17:52:12 -07:00
Teknium af9df46525 chore: add kidonng to AUTHOR_MAP 2026-05-09 17:51:04 -07:00
Kid 1321bcf5fe fix(gateway): finalize final stream edit on done 2026-05-09 17:51:04 -07:00
Teknium c1cc3d4ea6 perf(image_gen): defer fal_client import to first generation request (#22859)
`tools/image_generation_tool.py` did `import fal_client` at module
top, which pulled the entire fal_client + httpx + rich stack on every
process that ran `discover_builtin_tools()` — every `hermes` cold
start, even ones that never touch image generation.

Make the import lazy: replace the eager import with a placeholder
(`fal_client: Any = None`) and add an idempotent `_load_fal_client()`
that rebinds the module global on first use. Call it from the two
runtime entry points (`_ManagedFalSyncClient.__init__` and
`_submit_fal_request`) and from the SDK-presence check in
`check_image_generation_requirements`.

The loader short-circuits if the global is already truthy, which
preserves the test pattern of monkeypatching `fal_client` to install
a mock — the `monkeypatch.setattr(image_tool, "fal_client", ...)`
calls in test_image_generation.py keep working unchanged.

Measured impact (15-run min times, 9950X3D):
  tools.image_generation_tool alone:  77 → 20 ms  (-74%)
                                      36 → 20 MB   (-44%)
  import cli (full):                 734 → 720 ms  (-2%)
  import model_tools:                372 → 366 ms  (-2%)

The microbench is dramatic but the full-CLI win is small — fal_client
shares its httpx + rich dependencies with the rest of the agent, so
on a real cold start most of the 16 MB / 64 ms is already paid by
other imports. The win matters mostly for processes that touch this
tool without otherwise loading httpx (rare) and for architectural
consistency with the previous lazy-load PRs (#22681 google_chat,
#22831 teams).

Tests: 55/55 `tests/tools/test_image_generation.py` pass, including
the cases that monkeypatch the module global to install a mock
fal_client. End-to-end verification confirms `import model_tools`
no longer pulls `fal_client` into `sys.modules`.
2026-05-09 17:45:09 -07:00
Teknium fef1a41248 docs: round 2 audit — messaging, developer-guide, guides, integrations (#22858)
Cross-checked 75 docs pages under user-guide/messaging/, developer-guide/,
guides/, and integrations/ against the live registries and gateway code.

messaging/
- index.md: API Server toolset is hermes-api-server (was 'hermes (default)');
  Google Chat slug is hermes-google_chat (underscore — plugin name uses _).
- google_chat.md: drop bogus 'pip install hermes-agent[google_chat]' (no such
  extra); list the actual deps (google-cloud-pubsub, google-api-python-client,
  google-auth, google-auth-oauthlib).
- qqbot.md: config namespace is platforms.qqbot (was platforms.qq, which is
  silently ignored by the adapter); QQ_STT_BASE_URL is not read directly —
  baseUrl lives under platforms.qqbot.extra.stt.
- teams-meetings.md: 'hermes teams-pipeline' is plugin-gated (teams_pipeline
  plugin must be enabled), not a built-in subcommand.
- sms.md: example log line 0.0.0.0:8080 -> 127.0.0.1:8080 (default
  SMS_WEBHOOK_HOST).
- open-webui.md: API_SERVER_* are env vars, not YAML keys — write them to
  per-profile .env, not 'hermes config set' (same pattern fixed in
  api-server.md last round). Also bumped example ports to 8650+ to dodge the
  default webhook (8644)/wecom-callback (8645)/msgraph-webhook (8646)
  collision.

developer-guide/
- architecture.md: tool/toolset counts (61/52 -> 70+/~28); LOC stamps for
  run_agent.py, cli.py, hermes_cli/main.py, setup.py, mcp_tool.py,
  gateway/run.py replaced with 'large file' to stop drifting.
- agent-loop.md: same LOC drift (~13,700 -> 'a large file (15k+ lines)').
- gateway-internals.md: '14+ external messaging platforms' -> '20+'; gateway
  platform tree updated (qqbot is a sub-package, not qqbot.py; added
  yuanbao.py, feishu_comment.py, msgraph_webhook.py); 'gateway/builtin_hooks/
  (always active)' was wrong — it's an empty extension point and
  _register_builtin_hooks() is a no-op stub.
- acp-internals.md: drop fictional 'message_callback' from the bridged-
  callbacks list; clarify thinking_callback is currently set to None.
- provider-runtime.md: provider list was missing AWS Bedrock, Azure Foundry,
  NVIDIA NIM, xAI, Arcee, GMI Cloud, StepFun, Qwen OAuth, Xiaomi, Ollama
  Cloud, LM Studio, Tencent TokenHub. Fallback section described only the
  legacy single-pair model — corrected to the canonical list-form
  fallback_providers chain.
- environments.md: parsers list missing llama4_json and the deepseek_v31
  alias; both register via @register_parser.
- browser-supervisor.md: drop reference to scripts/browser_supervisor_e2e.py
  which doesn't exist in-repo.
- contributing.md: tinker-atropos is a git submodule — note that
  'git submodule update --init' is required if cloning without
  --recurse-submodules.

guides/
- operate-teams-meeting-pipeline.md: cron flags were all wrong — schedule is
  positional (not --schedule), the script-only flag is --no-agent (not
  --script-only), and there's no --command flag. Replaced with a real example
  that creates the script under ~/.hermes/scripts/ and uses the actual flags.
  Also replaced fictional 'hermes cron show <name>' with 'hermes cron status'.
- automation-templates.md: 'cron create --skills "a,b"' doesn't work —
  the flag is --skill (singular, repeatable). Fixed all 5 occurrences via AST
  rewrite.
- minimax-oauth.md: 'hermes auth add minimax-oauth --region cn' silently
  fails because --region isn't registered on the auth-add argparse spec.
  Pointed users at the minimax-cn provider (or MINIMAX_CN_API_KEY env) for
  China-region access.
- cron-script-only.md: 'hermes send' is fictional — replaced the comparison-
  table mention with a webhook-subscription pointer; also fixed the dead link
  to /guides/pipe-script-output (page doesn't exist).
- cron-troubleshooting.md: 'hermes serve' isn't a real subcommand. Pointed
  at 'hermes gateway' (foreground) / 'hermes gateway start' (service).
- local-ollama-setup.md: 'agent.api_timeout' is not a config key. The right
  knob is the HERMES_API_TIMEOUT env var.
- python-library.md: run_conversation() return dict has only final_response
  and messages — task_id is stored on the agent instance, not echoed back.
- use-mcp-with-hermes.md: '--args /c "npx -y …"' wraps the npx command in
  one quoted string, so cmd.exe gets a single arg instead of the multi-token
  command line it needs. Removed the surrounding quotes — argparse nargs='*'
  collects each token correctly.

integrations/
- providers.md: Bedrock guardrail YAML keys were 'id'/'version' (don't exist);
  actual keys are guardrail_identifier/guardrail_version (matches DEFAULT_CONFIG
  and the run_agent.py reader). GMI default base URL (api.gmi.ai/v1 ->
  api.gmi-serving.com/v1) and portal URL (inference.gmi.ai -> www.gmicloud.ai)
  refreshed. Fallback section rewritten to lead with the canonical
  fallback_providers list form (was leading with the legacy fallback_model
  single dict); supported-providers list extended to include azure-foundry,
  alibaba-coding-plan, lmstudio.

index.md
- '68 built-in tools' -> '70+'; '15+ platforms' was both inconsistent with
  integrations/index.md ('19+') and undercounted — bumped to 20+ and added
  Weixin/QQ Bot/Yuanbao/Google Chat to the list.

Validation: 'npm run build' clean (exit 0); broken-link count unchanged at
155 (same as round-1 post-skill-regen baseline). 24 files, +132/-89.
2026-05-09 15:00:24 -07:00
Teknium 0bcc327cab docs(openrouter): document auxiliary.<task>.extra_body for OR routing and Pareto (#22844)
The plumbing for setting OpenRouter provider preferences and the Pareto Code
router on auxiliary tasks already exists — auxiliary.<task>.extra_body is
forwarded verbatim by call_llm() / async_call_llm(). It just wasn't documented,
so users who wanted (e.g.) Pareto Code routing for compression but the strongest
coder for the main agent had no way to discover the escape hatch.

- hermes_cli/config.py: expand the auxiliary section header with a YAML
  example showing provider routing plus plugins under extra_body, and an
  explicit note that main-agent provider_routing / openrouter.min_coding_score
  do NOT propagate to aux calls (each task is independent by design)
- website/docs/user-guide/configuration.md: new 'OpenRouter routing and
  Pareto Code for auxiliary tasks' subsection with worked example
- website/docs/integrations/providers.md: cross-link from the Pareto Code
  Router section to the aux-side doc

E2E verified that auxiliary.<task>.extra_body reaches the OpenRouter API with
the configured provider routing and plugins blocks intact.
2026-05-09 14:51:20 -07:00
Teknium 70bfd429e5 fix(gateway): preserve reasoning_content, codex_message_items, finish_reason on transcript replay (#22839)
PR #2974 whitelisted three reasoning fields (reasoning, reasoning_details,
codex_reasoning_items) for the gateway's simple-text replay branch. Three
more fields were added to the DB later but the whitelist was never updated:

  - reasoning_content: provider-facing thinking text. _copy_reasoning_content_for_api
    promotes 'reasoning' -> 'reasoning_content' at send time only when the
    strings happen to match. Carrying the original verbatim avoids loss
    for providers that return them as distinct fields (DeepSeek/Kimi/
    Moonshot thinking modes), and preserves the empty-string sentinel
    that DeepSeek V4 Pro requires for thinking-mode replay.
  - codex_message_items: exact assistant message items with 'phase'.
    OpenAI docs: 'preserve and resend phase on all assistant messages —
    dropping it can degrade performance.' Required for prefix cache hits.
    No recovery path exists — once dropped, gone.
  - finish_reason: informational; cheap to keep so transcripts replay
    identically across CLI and gateway.

The CLI is unaffected because cli.py keeps the live in-memory message list
across turns (cli.py:10046 'self.conversation_history = result["messages"]').
The gateway rebuilds agent_history from the SQLite transcript on every turn,
so any field stripped during replay is silently lost.

Refactors the inline whitelist into a module-level _build_replay_entry()
helper so the contract can be unit-tested. 16 new tests pin the field set
and falsy-value handling.

Verified end-to-end: DB stores all 8 fields, replay now preserves all 8
(was preserving only 5 for assistant text turns).
2026-05-09 14:47:33 -07:00
Teknium c7f0aab949 feat(openrouter): wire Pareto Code router with min_coding_score knob (#22838)
Pick openrouter/pareto-code as your model and OpenRouter auto-routes each
request to the cheapest model meeting your coding-quality bar (ranked by
Artificial Analysis). The new openrouter.min_coding_score config key (0.0-1.0,
default 0.65) tunes the floor.

- hermes_cli/models.py: add openrouter/pareto-code to OPENROUTER_MODELS so
  it shows up in the picker with a description
- hermes_cli/config.py: add openrouter.min_coding_score (default 0.65 — lands
  on a mid-tier coder on the current Pareto frontier)
- plugins/model-providers/openrouter: emit extra_body.plugins =
  [{id: pareto-router, min_coding_score: X}] when model is openrouter/pareto-code
  AND the score is a valid float in [0.0, 1.0]
- agent/transports/chat_completions.py: same emission on the legacy flag
  path (when no provider profile is loaded)
- run_agent.py: openrouter_min_coding_score kwarg + storage; plumbed into
  both build_kwargs() invocations and the context-summary extra_body path
- cli.py: read openrouter.min_coding_score once at init, validate float in
  [0,1], pass to AIAgent constructions (CLI + background-task paths)
- cron/scheduler.py, batch_runner.py, tools/delegate_tool.py,
  tui_gateway/server.py: propagate the kwarg (mirrors providers_order
  plumbing — subagents inherit, cron/batch read from config)
- tests: profile-level + transport-level coverage of the model gating,
  unset/empty/out-of-range handling, and the legacy flag path
- docs: new 'OpenRouter Pareto Code Router' section in providers.md

Verified end-to-end against api.openrouter.ai: at score=0.65 we land on a
mid-tier coder, at omission we get the strongest. Score is silently dropped
on any model other than openrouter/pareto-code, so it's safe to leave set.
2026-05-09 14:47:00 -07:00
Henkey b349ae1e4c fix(acp): honor task cwd for foreground terminal commands 2026-05-09 14:46:34 -07:00
Teknium 550f6e2efc perf(teams): defer httpx import to first webhook call (#22831)
Same pattern as the google_chat lazy-load (PR #22681), applied to the
Teams plugin. The bundled `plugins/platforms/teams/adapter.py` did
`import httpx` at module top, which dragged the entire httpx +
httpcore stack into every process that triggered plugin discovery —
including `hermes` invocations that never instantiate the Teams
adapter.

`httpx` is only needed inside one method
(`TeamsMeetingPipeline._write_summary_via_incoming_webhook`), and the
`httpx.AsyncBaseTransport` parameter annotation is already string-only
thanks to the existing `from __future__ import annotations`. Move the
runtime import inside the method.

Measured impact (7-run medians, 9950X3D):
  teams plugin alone:    118 → 89 ms  (-25%)
                         46 → 38 MB   (-17%)
  import cli (full):     unchanged
  import model_tools:    unchanged

The full-CLI numbers are flat because httpx is loaded transitively
from many other modules on that path. The microbench win is the real
signal: 29 ms / 8 MB shaved off any process that touches the teams
plugin without otherwise pulling httpx — primarily future workflows
where the gateway is enabled but Teams is not configured.

Tests: 44/44 `tests/gateway/test_teams.py` pass; 345 across all
plugin-platform suites (teams + qqbot + google_chat). The test file
imports `httpx` itself for the `MockTransport` fixture, which is
correct — tests legitimately use httpx, only the plugin's module-level
import was the issue.
2026-05-09 14:42:12 -07:00
HenkDz 840ebe063e fix: make session search initialize session db 2026-05-09 14:36:58 -07:00
helix4u 9c26297c80 fix(gateway): preserve Ctrl+C for Windows foreground runs 2026-05-09 14:34:18 -07:00
Teknium bfc84bdc6f chore: add Ninso112 to AUTHOR_MAP 2026-05-09 13:38:52 -07:00
Ninso112 883e11f0a0 fix(openrouter): add x-grok-conv-id header for Grok models to improve prompt cache hit rates (carve-out of #22708)
Pass session_id through to provider profile build_api_kwargs_extras so
the OpenRouter profile can attach an xAI cache-affinity header
(x-grok-conv-id: <session-id>) for x-ai/grok-* models. xAI prompt
cache requires server affinity via this header — without it the cache
is poisoned and Grok prompt-cache hit rates drop dramatically on
multi-turn sessions.

Carve-out of #22708 by Ninso112. The original PR bundled a /diff
slash command, a zsh completion fix (already on main via #22802),
and holographic memory null-guards. This salvage keeps just the
Grok header work — small, targeted, and well-tested. Other
contributors and changes preserved for separate review.

Closes #22705.
2026-05-09 13:38:52 -07:00
Teknium 5e2eba87e6 chore: add mbac to AUTHOR_MAP 2026-05-09 13:38:38 -07:00
mbac 1508dcb9c2 fix(gateway): adopt unit's HERMES_HOME for --system CLI ops
When systemd_restart / systemd_status / systemd_stop run under sudo,
HERMES_HOME is stripped and HOME=/root, so get_hermes_home() resolves
to /root/.hermes instead of the unit's pinned home. read_runtime_status
and get_running_pid then look at the wrong gateway_state.json — the
60s status poll never sees "running", times out, and forces another
systemctl restart that SIGTERMs the in-progress new gateway.

Read the unit's pinned HERMES_HOME from `systemctl show -p Environment`
and mirror it into os.environ before any HERMES_HOME-derived read.
Early-out when system=False (user-scope inherits naturally). Errors
swallowed so a transient systemctl failure doesn't break unrelated
CLI ops.

Closes #22035.
2026-05-09 13:38:38 -07:00
Teknium 448c11f16d fix(telegram): default notifications to 'important' (silence intermediate)
Per-tool-call push notifications on Telegram are noisy enough that
'all' is the wrong default — long agent runs spam the user's notification
shade with status messages they didn't ask to be pinged about. Final
responses, approval prompts, and slash confirmations still notify;
intermediate progress, streaming, and tool-progress messages now
deliver silently via disable_notification.

Users who want the legacy behavior can opt back in with:
  display:
    platforms:
      telegram:
        notifications: all
or HERMES_TELEGRAM_NOTIFICATIONS=all.
2026-05-09 13:38:25 -07:00
Teknium b4d3092f69 chore: add CalmProton to AUTHOR_MAP 2026-05-09 13:38:25 -07:00
Denis 236f3b0521 feat(gateway): add Telegram notification mode to suppress intermediate push notifications
Add a configurable notifications mode for the Telegram platform adapter
that controls which messages trigger push notifications.

- display.platforms.telegram.notifications: "all" (default) | "important"
- HERMES_TELEGRAM_NOTIFICATIONS env var override
- In "important" mode, all sends use disable_notification=True except:
  - Approvals (send_exec_approval) and slash confirmations
  - Final response messages (metadata["notify"]=True)
- Zero overhead in default "all" mode
- Zero impact on non-Telegram platforms

Closes #22771
2026-05-09 13:38:25 -07:00
Wesley Simplicio ca13993217 fix(delegate): add explicit do-not-use guidance to acp_command/acp_args schema (carve-out of #22680)
acp_command / acp_args descriptions previously primed the model to
populate them — "Per-task ACP command override (e.g. 'copilot')" —
even when no ACP CLI was installed. Models with weaker schema-following
discipline would set them and the spawn would fail.

Add explicit "Do NOT set unless the user has explicitly told you"
guidance at both the top-level acp_command and the per-task override.
Strengthen acp_args to mention it's empty unless acp_command is set.
Adds 2 tests pinning the descriptions.

Note: this is a cosmetic prompt-engineering fix — the params remain
exposed in the schema. The fully-correct fix is to gate them behind
a config flag or runtime ACP-CLI detection so the schema only emits
them when an ACP harness is available. Tracked as a follow-up; this
PR ships the low-cost stopgap.

Salvage of #22680 (delegate schema only). The original PR also
bundled unrelated fixes for #22548, #21944, #22150 — those
need separate PRs since #22548 and #21944 are already addressed
on main (#22780 + #22798 in flight) and #22150 deserves its own
review.

Closes #22013.
2026-05-09 13:37:30 -07:00
Teknium 1c9ffb177c fix(model-metadata): align hy3-preview static fallback + delete change-detector test (#22805)
Two co-located fixes:

1. agent/model_metadata.py: bump hy3-preview static fallback from
   256000 to 262144 (256 * 1024) to match OpenRouter live metadata
   so cache and offline both agree (issue #22268).

2. tests/hermes_cli/test_tencent_tokenhub_provider.py: replace the
   exact-value change-detector (assert ctx == 256000) with an
   invariant assertion (registered + >= 4096). Per AGENTS.md
   'Don't write change-detector tests': pinning the upstream-controlled
   context length is exactly the test class the rule forbids — it
   breaks every time the provider bumps the published value, with
   zero behavioral coverage gained.

Salvage of #22574 with a redirect on the test approach. The
contributor's diff bumped the integer and added a SECOND
change-detector pinning DEFAULT_CONTEXT_LENGTHS[hy3-preview] == 262144,
which would re-break on the next published bump. We instead delete
the change-detector entirely and assert the relationship.

Closes #22268.
2026-05-09 13:37:19 -07:00
Sanjay Santhanam fe61d95b44 fix(completion): use valid zsh _arguments exclusion-group syntax
The generated zsh completion script used `(-h --help)` as the exclusion
group for `_arguments`, which zsh rejects with:

  _arguments:comparguments: invalid argument: (-h --help){-h,--help}[...]

Exclusion groups in `_arguments` cannot contain long options. Use the
canonical `(-)` form (exclude all other options) which correctly
handles flag pairs like `-h`/`--help`.

Fixes NousResearch/hermes-agent#22686
2026-05-09 13:36:44 -07:00
Wesley Simplicio 6e848f60ef fix(doctor): normalize provider name and aliases before dedicated-skip check 2026-05-09 13:36:33 -07:00
Wesley Simplicio 1dd0790654 fix(doctor): skip pluggable provider profiles when a dedicated check exists (#22346)
Problem
-------
`hermes doctor` ran two health checks for Anthropic: a dedicated one
with the correct `x-api-key` + `anthropic-version` headers, and a
generic Bearer-auth one driven by the pluggable `ProviderProfile` for
"anthropic". The generic check called `https://api.anthropic.com/v1/models`
with `Authorization: Bearer ...`, which Anthropic answers with HTTP 404,
producing a noisy duplicate warning even when the dedicated check passed.

Root cause
----------
`hermes_cli/doctor.py:_build_apikey_providers_list` deduplicated profiles
against a `_known_canonical` set built from the static list (Z.AI/GLM,
Kimi, DeepSeek, …). Providers with their own dedicated check above the
generic loop (Anthropic, OpenRouter, Bedrock) were not in that set, so
their profiles were appended and ran a second, broken check.

Fix
---
Add `{"anthropic", "openrouter", "bedrock"}` to the skip set, and
also skip profiles whose aliases match any of those names (e.g.
`claude`, `claude-oauth` → anthropic).

Tests
-----
tests/hermes_cli/test_doctor_dedicated_provider_skip.py:
  - test_build_apikey_providers_list_skips_dedicated_check_providers:
    asserts the assembled list does not contain anthropic, openrouter,
    or bedrock entries.
  - test_build_apikey_providers_list_includes_non_dedicated_providers:
    sanity guard that legitimate providers (DeepSeek, Z.AI/GLM) survive.
Both confirmed via stash-verify (fail pre-fix with anthropic/openrouter
leaking, pass post-fix).

Fixes #22346
2026-05-09 13:36:33 -07:00
Wesley Simplicio 78698381af fix(kanban): make _migrate_add_optional_columns idempotent on concurrent open
ALTER TABLE calls inside _migrate_add_optional_columns were guarded by a
snapshot of PRAGMA table_info taken at function entry.  When the gateway
dispatcher opens the kanban DB twice per tick (once in _tick_once_for_board
and once via init_db's discard-and-reconnect path), a second connection can
run the same migration before the first one commits, causing:

  sqlite3.OperationalError: duplicate column name: consecutive_failures

This crashed the dispatcher on every first tick after a gateway restart
(subsequent ticks succeeded because the columns were then present).

Fix: introduce _add_column_if_missing() which wraps ALTER TABLE in a
try/except that swallows OperationalError whose message contains
'duplicate column name'.  All ALTER TABLE calls in
_migrate_add_optional_columns are routed through this helper.

Closes #21708
2026-05-09 13:36:23 -07:00
Wesley Simplicio 68854cdcdb fix(agent): extract thinking from content-list blocks for DeepSeek V4 Pro
DeepSeek V4 Pro returns thinking content as typed blocks inside the
content array rather than as a top-level reasoning_content field:

  [{"type": "thinking", "thinking": "..."}, {"type": "output", ...}]

_extract_reasoning only handled content as a plain string, so the
thinking text was silently dropped.  On the next turn the session was
replayed without the thinking block, causing:

  HTTP 400: The content[].thinking in the thinking mode must be
  passed back to the API.

Fix: when content is a list and no structured reasoning field was
found, scan for items with type=='thinking' and accumulate their
'thinking' (or 'text') value into reasoning_parts.  Structured fields
(reasoning, reasoning_content, reasoning_details) still take priority
so existing provider behaviour is unchanged.

Closes #21944
2026-05-09 13:36:12 -07:00
Wesley Simplicio 98e94beb1b fix(deps): declare youtube-transcript-api in pyproject.toml [youtube] extra
skills/media/youtube-content/scripts/fetch_transcript.py and
optional-skills/productivity/memento-flashcards/scripts/youtube_quiz.py
both import youtube-transcript-api at runtime, but the package was not
listed in pyproject.toml.  A fresh `uv sync` therefore omits it, and
both skills fail on first invocation with:

    ModuleNotFoundError: No module named 'youtube_transcript_api'

Add a new [youtube] optional-dependency group with
youtube-transcript-api>=1.2.0 (the v1.x API surface the scripts already
use) and include it in [all] so standard installs pick it up.

Regression tests: TestPyprojectDeclaresYoutubeExtra verifies the extra
is present in pyproject.toml and included in [all].

Closes #22243

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-09 13:36:01 -07:00
Wesley Simplicio a671d8a27a fix(email): use real hermes version in IMAP ID command 2026-05-09 13:35:50 -07:00
Wesley Simplicio 3fd4ccbd8b fix(email): send IMAP ID extension to support 163/NetEase mailbox
163/NetEase IMAP servers reject every UID SEARCH/FETCH with `BYE Unsafe
Login` unless the client first identifies itself via the RFC 2971 ID
command after LOGIN.  Without this, the email gateway logs in OK but
then fails on the very first poll and the connection is torn down.

Send the ID payload best-effort after both `imap.login()` sites
(`EmailAdapter.connect` and `_fetch_new_messages`).  Failures are
swallowed at debug level so non-supporting IMAP servers (Gmail,
Outlook, Fastmail, Yahoo, etc.) keep working unchanged.

Closes #22271
2026-05-09 13:35:50 -07:00
Wesley Simplicio 48bf0ea249 fix(browser_tool): fall through to autodetect on config read failure 2026-05-09 13:35:39 -07:00
Wesley Simplicio 3170c8d448 fix(browser_tool): do not cache transient None cloud provider resolution
Problem: `_get_cloud_provider()` set `_cloud_provider_resolved = True`
before resolution. If credentials were briefly unavailable on the first
call (e.g. a managed Nous Portal token mid-refresh), the resolver pinned
the entire process to local mode forever, even after credentials
self-healed seconds later.

Root cause: bookkeeping was set up-front, so any code path that fell
through to `return _cached_cloud_provider` (config read failure, no
credentials yet, explicit-provider instantiation failure) committed the
transient `None` to the cache permanently.

Fix: invert the bookkeeping. `_cloud_provider_resolved = True` is now
set only when (a) the user explicitly chose `cloud_provider: local`, or
(b) a provider was successfully resolved. All transient `None` paths
return without poisoning the cache, so the next call retries. Explicit
provider instantiation failures now log at warning level with stack
trace so operators can diagnose them.

Tests: 5 new cases in tests/tools/test_browser_cloud_provider_cache.py
covering explicit local, successful resolution, no-credentials-yet,
config read failure, and explicit provider instantiation failure.
Stash-verify confirmed the 3 transient-None tests fail without the fix.
All 320 existing browser tests still green.

Closes #22324
2026-05-09 13:35:39 -07:00
Teknium 5a0021146b chore: add Qwinty to AUTHOR_MAP 2026-05-09 13:35:04 -07:00
Maxim Esipov 17d8914850 fix(auxiliary): rotate pooled auth after quota failures 2026-05-09 13:35:04 -07:00
Teknium 775c0e22cf perf(models_dev): cache-first lookup, skip network when disk cache is fresh (#22808)
`fetch_models_dev()` is on the hot path of every `AIAgent.__init__`
(via `context_compressor → get_model_context_length`). The previous
policy was "always try network first, only fall back to disk if
network fails," so every fresh `hermes chat` / `hermes gateway` /
batch / cron process paid 250-500 ms re-fetching a 2 MB JSON registry
that was already on disk from earlier runs.

Add a stage 2 between in-mem and network: if
`models_dev_cache.json` exists and its mtime is younger than the
existing `_MODELS_DEV_CACHE_TTL` (1 hour, same TTL the in-mem cache
already uses), load from disk and skip the network call.

The in-mem TTL is anchored to the disk file's age, so a 50-min-old
cache stays in-memory for only 10 more minutes — no surprise
extension of staleness window.

Invariants preserved:
- `force_refresh=True` still always hits the network and only falls
  back to disk on failure (`hermes config refresh` semantics).
- Missing disk cache → fall through to network (first-ever run).
- Stale disk cache (mtime > TTL) → fall through to network.
- Negative file age (clock skew) → fall through to network.
- Network failure → existing stage-4 stale-disk fallback unchanged.

Measured impact (3-run medians, 9950X3D, fresh process per run):
  fetch_models_dev cold:  256 → 17 ms  (-93%)
  hermes chat -q wall:   4.00 → 3.73 s (-7% median)
                         3.99 → 3.60 s (-10% min)

The chat-end-to-end win is bounded below by API latency variance, but
the fetch_models_dev microbenchmark is the cleanest signal: 239 ms
shaved off every fresh-process agent construction.

Win compounds with the previous perf PRs:
  #22681 google_chat lazy-load
  #22766 doctor parallel + IMDS off
  #22790 gateway.platforms PEP 562

Tests: all 30 `tests/agent/test_models_dev.py` pass (added 4 new ones
covering the new disk-cache-first path, force_refresh override, stale
disk fallback, and missing-disk-cache fall-through). Full `tests/agent/`
suite: 2560 passed, 0 failed.
2026-05-09 13:32:38 -07:00
Julien Talbot cd712b176a feat(transports/codex): pass reasoning.effort to xAI Responses API
The is_xai_responses branch only sent include=[reasoning.encrypted_content]
without forwarding the resolved reasoning_effort. Other Responses providers
(OpenAI, GitHub) already get effort forwarded — this aligns the xAI path.

Without this, agent.reasoning_effort is silently dropped on the xAI direct
path, making Hermes unable to control reasoning depth on grok-4.x via
api.x.ai. Tests added to TestCodexBuildKwargs cover effort passthrough,
disabled state, and minimal-clamp parity with non-xAI.
2026-05-09 13:23:02 -07:00
Teknium 252d68fd45 docs: deep audit — fix stale config keys, missing commands, and registry drift (#22784)
* docs: deep audit — fix stale config keys, missing commands, and registry drift

Cross-checked ~80 high-impact docs pages (getting-started, reference, top-level
user-guide, user-guide/features) against the live registries:

  hermes_cli/commands.py    COMMAND_REGISTRY (slash commands)
  hermes_cli/auth.py        PROVIDER_REGISTRY (providers)
  hermes_cli/config.py      DEFAULT_CONFIG (config keys)
  toolsets.py               TOOLSETS (toolsets)
  tools/registry.py         get_all_tool_names() (tools)
  python -m hermes_cli.main <subcmd> --help (CLI args)

reference/
- cli-commands.md: drop duplicate hermes fallback row + duplicate section,
  add stepfun/lmstudio to --provider enum, expand auth/mcp/curator subcommand
  lists to match --help output (status/logout/spotify, login, archive/prune/
  list-archived).
- slash-commands.md: add missing /sessions and /reload-skills entries +
  correct the cross-platform Notes line.
- tools-reference.md: drop bogus '68 tools' headline, drop fictional
  'browser-cdp toolset' (these tools live in 'browser' and are runtime-gated),
  add missing 'kanban' and 'video' toolset sections, fix MCP example to use
  the real mcp_<server>_<tool> prefix.
- toolsets-reference.md: list browser_cdp/browser_dialog inside the 'browser'
  row, add missing 'kanban' and 'video' toolset rows, drop the stale
  '38 tools' count for hermes-cli.
- profile-commands.md: add missing install/update/info subcommands, document
  fish completion.
- environment-variables.md: dedupe GMI_API_KEY/GMI_BASE_URL rows (kept the
  one with the correct gmi-serving.com default).
- faq.md: Anthropic/Google/OpenAI examples — direct providers exist (not just
  via OpenRouter), refresh the OpenAI model list.

getting-started/
- installation.md: PortableGit (not MinGit) is what the Windows installer
  fetches; document the 32-bit MinGit fallback.
- installation.md / termux.md: installer prefers .[termux-all] then falls
  back to .[termux].
- nix-setup.md: Python 3.12 (not 3.11), Node.js 22 (not 20); fix invalid
  'nix flake update --flake' invocation.
- updating.md: 'hermes backup restore --state pre-update' doesn't exist —
  point at the snapshot/quick-snapshot flow; correct config key
  'updates.pre_update_backup' (was 'update.backup').

user-guide/
- configuration.md: api_max_retries default 3 (not 2); display.runtime_footer
  is the real key (not display.runtime_metadata_footer); checkpoints defaults
  enabled=false / max_snapshots=20 (not true / 50).
- configuring-models.md: 'hermes model list' / 'hermes model set ...' don't
  exist — hermes model is interactive only.
- tui.md: busy_indicator -> tui_status_indicator with values
  kaomoji|emoji|unicode|ascii (not kawaii|minimal|dots|wings|none).
- security.md: SSH backend keys (TERMINAL_SSH_HOST/USER/KEY) live in .env,
  not config.yaml.
- windows-wsl-quickstart.md: there is no 'hermes api' subcommand — the
  OpenAI-compatible API server runs inside hermes gateway.

user-guide/features/
- computer-use.md: approvals.mode (not security.approval_level); fix broken
  ./browser-use.md link to ./browser.md.
- fallback-providers.md: top-level fallback_providers (not
  model.fallback_providers); the picker is subcommand-based, not modal.
- api-server.md: API_SERVER_* are env vars — write to per-profile .env,
  not 'hermes config set' which targets YAML.
- web-search.md: drop web_crawl as a registered tool (it isn't); deep-crawl
  modes are exposed through web_extract.
- kanban.md: failure_limit default is 2, not '~5'.
- plugins.md: drop hard-coded '33 providers' count.
- honcho.md: fix unclosed quote in echo HONCHO_API_KEY snippet; document
  that 'hermes honcho' subcommand is gated on memory.provider=honcho;
  reconcile subcommand list with actual --help output.
- memory-providers.md: legacy 'hermes honcho setup' redirect documented.

Verified via 'npm run build' — site builds cleanly; broken-link count went
from 149 to 146 (no regressions, fixed a few in passing).

* docs: round 2 audit fixes + regenerate skill catalogs

Follow-up to the previous commit on this branch:

Round 2 manual fixes:
- quickstart.md: KIMI_CODING_API_KEY mentioned alongside KIMI_API_KEY;
  voice-mode and ACP install commands rewritten — bare 'pip install ...'
  doesn't work for curl-installed setups (no pip on PATH, not in repo
  dir); replaced with 'cd ~/.hermes/hermes-agent && uv pip install -e
  ".[voice]"'. ACP already ships in [all] so the curl install includes it.
- cli.md / configuration.md: 'auxiliary.compression.model' shown as
  'google/gemini-3-flash-preview' (the doc's own claimed default);
  actual default is empty (= use main model). Reworded as 'leave empty
  (default) or pin a cheap model'.
- built-in-plugins.md: added the bundled 'kanban/dashboard' plugin row
  that was missing from the table.

Regenerated skill catalogs:
- ran website/scripts/generate-skill-docs.py to refresh all 163 per-skill
  pages and both reference catalogs (skills-catalog.md,
  optional-skills-catalog.md). This adds the entries that were genuinely
  missing — productivity/teams-meeting-pipeline (bundled),
  optional/finance/* (entire category — 7 skills:
  3-statement-model, comps-analysis, dcf-model, excel-author, lbo-model,
  merger-model, pptx-author), creative/hyperframes,
  creative/kanban-video-orchestrator, devops/watchers,
  productivity/shop-app, research/searxng-search,
  apple/macos-computer-use — and rewrites every other per-skill page from
  the current SKILL.md. Most diffs are tiny (one line of refreshed
  metadata).

Validation:
- 'npm run build' succeeded.
- Broken-link count moved 146 -> 155 — the +9 are zh-Hans translation
  shells that lag every newly-added skill page (pre-existing pattern).
  No regressions on any en/ page.
2026-05-09 13:19:51 -07:00
Teknium ea2d66ddc0 perf(gateway): defer QQAdapter and YuanbaoAdapter imports via PEP 562 (#22790)
`gateway/platforms/__init__.py` eagerly imported `QQAdapter` and
`YuanbaoAdapter` at package-init time, which transitively pulled in
qqbot's chunked-upload + keyboards + onboard machinery and yuanbao's
websocket stack. About 84 ms wall and 23 MB RSS on every fresh process
that touched anything under `gateway.platforms` — including `hermes
chat` (via run_agent → cli's plugin discovery transitive import).

Nothing in the codebase actually consumes these symbols from the
package root; every real call site uses the long-form path
(`from gateway.platforms.qqbot import QQAdapter`,
`from gateway.platforms.yuanbao import YuanbaoAdapter` in gateway/run.py).
The eager re-export was only there for convenience.

Replace with a PEP 562 module-level `__getattr__` that lazily imports
on first attribute access. Public API stays identical:
`from gateway.platforms import QQAdapter` keeps working but only
pays the import cost when the symbol is actually touched. `__dir__`
preserves help() / autocomplete behavior.

Measured impact (7-run medians, 9950X3D):
  import gateway.platforms        127 →  43 ms  (-66%)
                                   50 →  27 MB  (-46%)
  import gateway.platforms.base   127 →  44 ms  (-65%)
                                   50 →  27 MB  (-46%)
  import cli (full chat path)     745 → 710 ms  ( -5%)
                                   96 →  90 MB  ( -6%)
  hermes chat -q (cold)                  -5 MB

The per-import win is biggest because qqbot/yuanbao deps don't overlap
with anything on the gateway-platforms path — full `import cli`
already loads aiohttp/websockets transitively from other places, so
the marginal CLI win is smaller than the isolated import benchmark.
The `gateway.platforms.base` win is what matters most for long-lived
gateway processes: every gateway boot saves 23 MB resident.

All 144 qqbot tests pass; broader gateway suite (5132 tests) passes
modulo 4 pre-existing flakes also failing on main without this change.
2026-05-09 13:17:48 -07:00
Teknium dcff23a25f test(xai-image): regression-guard literal '1k'/'2k' resolution payload
The xAI image-gen provider was DOA from PR #14765 onward — every request
422'd because the resolution param was being mapped to '1024'/'2048' but
xAI's API expects the literal strings '1k'/'2k'. PR #18678 fixed the
mapping; this test asserts the wire payload carries the literal so the
regression cannot recur silently.
2026-05-09 13:07:46 -07:00
Ayman Kamal 5b32c9fc66 chore: add A-kamal to AUTHOR_MAP for PR #18678 2026-05-09 13:07:46 -07:00
Ayman Kamal 13b474c56e fix: send correct resolution param to xAI image generation API
The xAI /v1/images/generations endpoint expects resolution as a
literal string ('1k' or '2k'), not the numeric value ('1024').

- Change _XAI_RESOLUTIONS from a dict mapping to a validation set
- Use the resolution key directly instead of the mapped value
- Fall back to DEFAULT_RESOLUTION on invalid config values

Fixes 422 Unprocessable Entity errors when resolution was sent.
2026-05-09 13:07:46 -07:00
Teknium e612c3d6f0 perf(doctor): parallelize API connectivity checks and disable IMDS (#22766)
`hermes doctor` ran every connectivity probe sequentially and on a typical
developer laptop spent ~2s of its ~5s wall time inside boto3's EC2
instance-metadata-service lookup (169.254.169.254) — the default
AWS credential chain probes IMDS even when AWS_BEARER_TOKEN_BEDROCK
or AWS_ACCESS_KEY_ID is the only legitimate source.

Refactor the API Connectivity section so every probe (OpenRouter,
Anthropic, ~16 static API-key providers + dynamic profiles, AWS
Bedrock) is a pure function returning a structured result, then
fan them out through a ThreadPoolExecutor(max_workers=8). Output
order, glyphs, colours, padding, and issue strings stay byte-for-byte
identical to the sequential implementation; results are gathered
in submission order.

Also disable IMDS for the parallel block by setting
AWS_EC2_METADATA_DISABLED=true on the parent thread before submitting
work (and restoring its prior value in a finally block). Bedrock's
real-API call gets a Config(connect_timeout=5, read_timeout=10,
retries={max_attempts:1}) so a transient regional failure can't pad
the run by 30+ seconds.

Measured impact (5-run medians, 9950X3D):
  hermes doctor:           5.07 → 2.16 s  (-57%)

Doctor tests: 48 passed (test_doctor.py + test_doctor_command_install.py).

The remaining ~2s of wall is import overhead + a couple of one-off
network calls outside the API Connectivity section (`fetch_models_dev`
provider catalog refresh, Nous OAuth refresh in `Auth Providers`).
Those are next-tier targets, not part of this change.
2026-05-09 13:03:20 -07:00
Teknium 8f711f79a4 fix(tools): install cua-driver when Computer Use is enabled via 'hermes tools' (#22765)
Returning users who enabled '🖱️ Computer Use (macOS)' via 'hermes tools'
saw '✓ Saved configuration' but no install — cua-driver was never on
PATH and the toolset failed at first use. Two compounding causes:

1. _toolset_needs_configuration_prompt fell through to _toolset_has_keys,
   which returned True for any provider with empty env_vars. cua-driver
   has no env vars, so the gate skipped _configure_toolset entirely and
   _run_post_setup('cua_driver') never ran.

2. No stable CLI entry-point existed for re-running the install when
   the picker no-op'd it (e.g. when toggling the toolset off+on inside
   one picker session, where 'added' is empty).

Changes:

- hermes_cli/tools_config.py: add _POST_SETUP_INSTALLED registry
  mapping post_setup keys to installed-state predicates. The gate
  now returns True when any visible provider has a registered
  post_setup whose predicate fails. cua_driver is the only opt-in
  for now; other post_setup hooks keep their existing behaviour.
- hermes_cli/main.py: add 'hermes computer-use install' and
  'hermes computer-use status' as a stable docs target. install
  reuses the same _run_post_setup('cua_driver') path that the
  picker invokes; status reports whether cua-driver is on PATH.
- tools/computer_use/cua_backend.py: install hint now points users
  at 'hermes computer-use install' first.
- website/docs/user-guide/features/computer-use.md: document the
  new command as the primary install path.
- website/docs/reference/cli-commands.md: catalog 'hermes
  computer-use' alongside 'hermes tools'.
- tests/hermes_cli/test_post_setup_gating.py: regression coverage
  for the gate predicate (missing -> setup forced, installed ->
  setup skipped, broken predicate -> non-blocking, unregistered
  keys -> behaviour unchanged).

Fixes #22737. Reported by @f-trycua.
2026-05-09 13:02:25 -07:00
Teknium 6e5489c9f3 fix(memory): tighten MEMORY_GUIDANCE against ephemeral PR/issue/SHA notes (#22781)
The model regularly writes session-outcome facts to MEMORY.md despite
the existing 'Do NOT save task progress' line — entries like
'Submitted PR #22577 for the kanban dedup fix' or 'Fixed bug X in
file Y'. These are stale within days, pollute the system prompt,
and crowd out durable user preferences (the issue #22563 reporter
saw 9 sections of bug-fix notes injected on a brand-new task).

Add explicit examples of what NOT to save (PR numbers, issue
numbers, commit SHAs, 'fixed/submitted/Phase N done', file counts)
plus the 7-day-staleness heuristic so the model has a concrete
calibration target rather than guessing what counts as 'task progress'.

Closes #22563 (the prompt-side, low-risk portion). The bigger
relevance-based-injection / vector-retrieval feature requested in
#22563 is tracked under #2184 (Richer local memory). Per skill rule
on prompt caching, dynamic memory injection breaks the frozen-snapshot
invariant and needs a separate design call.
2026-05-09 12:48:25 -07:00
Teknium e7c0d6ee53 fix(fallback): skip chain entries matching current provider/model/base_url (#22780)
_try_activate_fallback() walked the chain by index without comparing
the candidate entry against the currently-failing backend. So a
misconfigured chain that listed the same provider+model as the primary,
or two custom_providers entries pointing at the same shim URL, would
loop the same failure 3x for the same backend.

After the fix, advance() skips:
  - entries where (provider, model) match the current agent's
  - entries with a base_url + model matching the current backend
    (catches two custom_providers names pointing at the same shim)

Recursing through self._try_activate_fallback() continues to the next
chain entry; if everything matches, returns False and the caller
moves on without retrying the same broken path.

3 regression tests covering same-provider-same-model skip, same-base_url-
same-model skip, and the all-self-matching-returns-False exhaustion path.

Closes #22548 (the Hermes-side portion). The 120s timeout itself in
the downstream claude-cli shim is a deployment concern documented in
that issue's wherewolf87 comment.
2026-05-09 12:48:19 -07:00
Teknium 70bc52e408 fix(cli): make Ctrl+Enter insert newline on WSL/SSH/Windows Terminal (#22777)
Native Windows, WSL, SSH sessions, and Windows Terminal all send
Ctrl+Enter as bare LF (c-j). Hermes was binding c-j as submit on
every POSIX platform, so Ctrl+Enter submitted instead of inserting
a newline on those terminals. Reported in #22379.

Add _preserve_ctrl_enter_newline() predicate that detects the
environments where Ctrl+Enter must produce a newline (sys.platform
== 'win32', SSH_CONNECTION/SSH_CLIENT/SSH_TTY env, WT_SESSION,
WSL_DISTRO_NAME, /proc/version 'microsoft' marker). Gate the
c-j-as-submit binding off in those environments and gate the
c-j-as-newline handler on. Local POSIX TTYs without those markers
(docker exec, plain ssh from a Mac) keep c-j as submit so plain
Enter still works on thin PTYs.

Add install_ctrl_enter_alias() in hermes_cli/pt_input_extras.py
mapping the three CSI-u / modifyOtherKeys variants of Ctrl+Enter
('\x1b[13;5u', '\x1b[27;5;13~', '\x1b[27;5;13u') to the
(Escape, ControlM) tuple Alt+Enter produces. This lets Kitty /
mintty / xterm-with-modifyOtherKeys users over SSH get a Ctrl+Enter
newline through the existing Alt+Enter handler.

9 new tests + extended existing test_lf_enter_binds_to_submit_handler_posix
to cover bare-local vs SSH branches.

Closes #22379.
2026-05-09 12:48:14 -07:00
Teknium 2124ad72a2 fix(api-server): emit length/error finish_reason for truncation/failure (#22775)
Non-streaming /v1/chat/completions wrapped any AIAgent result \u2014 including
partial/failed runs \u2014 as a successful 200 with finish_reason='stop' and
the internal failure string substituted into message.content. API
clients had no way to distinguish 'agent answered: X' from
'agent crashed and the X you see is its error message'.

After the fix:
  - completed: True             \u2192 200 finish_reason='stop' (unchanged)
  - partial + truncated text    \u2192 200 finish_reason='length' + hermes extras
  - partial + no text / failed  \u2192 502 OpenAI error envelope (SDKs raise)
  - other failures              \u2192 200 finish_reason='error' + hermes extras

Adds X-Hermes-Completed / X-Hermes-Partial / X-Hermes-Error headers
plus a 'hermes' extras object on partial responses for clients that
want the full picture.

Closes #22496.
2026-05-09 12:48:08 -07:00
Teknium 86f69e8c2a fix(agent): hydrate memory-nudge counters from conversation_history (#22774)
Gateway creates a fresh AIAgent per inbound message in several common
scenarios: cache miss, idle eviction (1h TTL), config-signature
mismatch, process restart. A freshly-built AIAgent has
_turns_since_memory=0 and _user_turn_count=0, so the
memory.nudge_interval trigger ('_turns_since_memory >=
_memory_nudge_interval') can never be reached when these reconstructions
happen on roughly the cadence of the interval. A user can chat for hours
on Telegram without ever seeing a self-improvement review fire.

Reconstruct the counters from conversation_history at the top of
run_conversation(), right after the existing _hydrate_todo_store call.
Idempotent guard ('if self._user_turn_count == 0') means a cached agent
that already accumulated counters keeps them; only freshly-built agents
hydrate. Modulo arithmetic preserves the original 1-in-N cadence rather
than firing a review immediately on resume.

7 regression tests pinning the contract (mid-cycle history, modulo wrap,
idempotency, zero-interval skip, role==user filtering, production-code
anchor).

Closes #22357.
2026-05-09 12:48:03 -07:00
Teknium ade5981429 fix(kanban): sanitize comment author rendering in build_worker_context (#22769)
Operator-controlled HERMES_PROFILE values were rendered as
'**${author}** (${ts}):' — markdown bold with no provenance prefix.
Worker comment bodies render directly underneath. A misleading
profile name like 'hermes-system' or 'operator' could be misread by
the next worker as a system directive above attacker-influenced
content (confused-deputy primitive gated on operator misconfig).

The LLM-controlled author-forgery surface was already closed in
#22435 (author removed from KANBAN_COMMENT_SCHEMA). This is
defense-in-depth: render with an explicit 'comment from worker
`<author>` at <ts>:' prefix so even 'hermes-system' resolves to
'comment from worker `hermes-system` at ...' — parseable as
worker-comment metadata, not a system directive. Strip backticks
from author so they can't break out of the fence.

Update test_build_worker_context_caps_comments to count by body
regex since the rendered author line now also starts with
'comment '.

Closes #22452.
2026-05-09 12:47:58 -07:00
Teknium f00dc6d7a3 fix(tests): harden run_tests.sh — uv-aware bootstrap + scrub HERMES_CRON_SESSION (#22767)
Two unrelated but co-located fixes to scripts/run_tests.sh:

1. pytest-split bootstrap (#22401): the script tried '$PYTHON -m pip
   install pytest-split' on first run, but uv-created venvs ship without
   pip. Result: 'No module named pip' before any test ran. Add a uv
   fallback (uv pip install --python $PYTHON), keep pip as a secondary
   path, and emit a clear error pointing at 'uv pip install -e ".[dev]"'
   when neither is available. Also declare pytest-split in
   pyproject.toml dev extra so a normal '.[dev]' install provisions it.

2. HERMES_CRON_SESSION leak (#22400): the hermetic env scrub already
   unsets HERMES_GATEWAY_SESSION and HERMES_INTERACTIVE but missed the
   sibling HERMES_CRON_SESSION. When run_tests.sh is invoked from a
   Hermes cron job, that variable leaks into pytest, flipping
   tools/approval.py into cron-deny mode and breaking
   tests/acp/test_approval_isolation.py and friends.

Closes #22400.
Closes #22401.
2026-05-09 12:47:52 -07:00
Teknium e90aa7f280 fix(agent): notify context engine on commit_memory_session (#22764)
When session_id rotates (e.g. /new), commit_memory_session was firing
MemoryManager.on_session_end but skipping ContextEngine.on_session_end.
Engines that accumulate per-session state (LCM-style DAGs, summary
stores) leaked that state from the rotated-out session into whatever
continued under the same compressor instance.

Mirror the call shutdown_memory_provider already makes — same
lifecycle moment, same hook contract ("real session boundaries (CLI
exit, /reset, gateway expiry)"). /new is a real boundary for the old
session_id; providers keep their state but the rotated-out session_id
is done.

6 regression tests covering both-hooks-fire, no-memory-manager,
no-context-engine, both failure-tolerant paths.

Closes #22394.
2026-05-09 12:28:42 -07:00
kshitijk4poor dae94fa652 fix: follow-up for salvaged PR #22263
- Restore allowed_chats gate before thread_id check so ignored_threads
  applies universally (even to guest mentions).
- Compute _message_mentions_bot once in _should_process_message to
  eliminate redundant second entity scan when guest_mode=true and the
  message does not mention the bot.
- Remove redundant _is_group_chat from _is_guest_mention (caller already
  verified the message is a group chat).
- Update _telegram_allowed_chats docstring to note guest_mode exception.
- Add test coverage: bot_command entity, text_mention entity,
  caption_entities, and ignored_threads + guest_mode interaction.
- Add nik1t7n to AUTHOR_MAP.
2026-05-09 11:54:04 -07:00
Nikita Nosov 55f518e521 feat(gateway): add Telegram guest mention mode 2026-05-09 11:54:04 -07:00
Teknium 369cee018d chore: add wali-reheman to AUTHOR_MAP 2026-05-09 11:12:03 -07:00
Teknium b959cfa056 fix: move pytest.importorskip below pytest import in skip-guarded tests
The original PR placed 'pwd = pytest.importorskip("pwd")' on line 4
but 'import pytest' on line 9 — NameError on module load. Same for
test_file_sync_back.py. Plus, the in-function 'pwd = pytest.importorskip'
calls in test_auto_detected_root_is_rejected confused Python's scope
analysis (later 'import pytest' made pytest local everywhere in the
function) and caused UnboundLocalError. Drop the now-redundant
in-function importorskip calls and rely on the module-level guard.
2026-05-09 11:12:03 -07:00
Wali Reheman 4e8b8573ca tests: add Windows skip guards for UNIX-only stdlib imports 2026-05-09 11:12:03 -07:00
Teknium b6ff96c057 fix(cron): allow quoted URL in github auth-header allowlist
The github-pr-workflow skill wraps the URL in double-quotes
('curl -H ... "https://api.github.com/..."'), which the original
allowlist regex (\s+https://api...) did not match. Without this,
the bundled github-pr-workflow skill is still blocked at every
cron tick despite #22605's fix landing for the bare-URL form.

Make the leading quote optional and add a regression test pinning
both single- and double-quoted forms.
2026-05-09 11:11:45 -07:00
qWaitCrypto 691778a08b fix(cron): keep auth-header exfiltration blocked 2026-05-09 11:11:45 -07:00
qWaitCrypto 783d11717a fix(cron): avoid github skill false positives in scanner 2026-05-09 11:11:45 -07:00
kshitijk4poor 9aefa74a9f feat(mcp): add codex preset for built-in MCP server discovery
Adds 'codex' to the _MCP_PRESETS registry so users can add it via

  Connecting to 'codex'...

  ✓ Connected! Found 2 tool(s) from 'codex':

    codex                                    Run a Codex session. Accepts configuration parameters matchi...
    codex-reply                              Continue a Codex conversation by providing the thread id and...

  Enable all 2 tools? [Y/n/select]:
  Cancelled. without manually specifying
the command and args.

Enables: codex mcp-server → Hermes native MCP client → Codex tools
available as first-class Hermes tools.
2026-05-09 11:11:28 -07:00
Teknium 684fd14db0 fix(dingtalk): align override signatures with base + guard Optional[error] in tests 2026-05-09 11:11:10 -07:00
qWaitCrypto c705c7ac9b fix(dingtalk): clarify webhook media behavior 2026-05-09 11:11:10 -07:00
Wesley Simplicio a33c63b9f8 fix(profiles): honour active_profile when HERMES_HOME points to hermes root
Problem:
After `hermes profile use NAME`, the gateway (started via systemd with
HERMES_HOME=/root/.hermes hardcoded) ignores the active profile and
always runs as the Default profile.  WebUI, Telegram, and all non-CLI
platforms are affected.

Root cause:
_apply_profile_override() contained an early-return guard:

    if profile_name is None and os.environ.get("HERMES_HOME"):
        return   # trust the inherited value

The intent was to let child processes inherit their parent's profile via
HERMES_HOME without redundantly re-reading active_profile.  But
systemd also sets HERMES_HOME — to the hermes root (/root/.hermes),
not a profile directory — so the guard fired and silently skipped the
active_profile check.  The user's `hermes profile use NAME` write to
~/.hermes/active_profile was never seen by the gateway process.

Fix:
Only skip the active_profile check when HERMES_HOME is already a
profile directory, identified by its immediate parent directory being
named "profiles" (e.g. ~/.hermes/profiles/coder or
/opt/data/profiles/coder).  When HERMES_HOME points to a root
directory (parent name != "profiles"), continue to read active_profile.

Tests:
- test_hermes_home_at_root_with_active_profile_is_redirected: the
  bug scenario — HERMES_HOME=/root/.hermes + active_profile=coder →
  HERMES_HOME must be redirected to .../profiles/coder.
  Stash-verified: FAILS without fix, PASSES with fix.
- test_hermes_home_already_profile_dir_is_trusted: child-process
  inheritance contract unchanged — .../profiles/coder is trusted as-is.
- test_hermes_home_unset_reads_active_profile: classic path unchanged.
- test_hermes_home_unset_default_profile_no_redirect: "default" still
  produces no redirect.
4/4 tests green.

Closes #22502.
2026-05-09 11:10:53 -07:00
briandevans 854c2ce309 fix(telegram): honor message.quote for partial-quote reply context
When a Telegram user replies using the native quote feature to select
only part of a prior message, _build_message_event was injecting the
ENTIRE replied-to message into reply_to_text via
message.reply_to_message.text/caption. python-telegram-bot exposes
the user-selected substring as message.quote (TextQuote.text); we now
prefer that and fall back to the full replied-to text only when no
native quote is present.

The agent-visible "[Replying to: \"...\"]" prefix can otherwise expand
the user's narrow quote into the full prior message, causing the agent
to act on unrelated actionable-looking text the user did not select
(e.g. multi-item briefings where the user quotes one bullet but the
prefix injects every bullet). Falls back cleanly when message.quote
is absent (PTB <21 or replies that don't quote a substring).

Fixes #22619

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 11:10:36 -07:00
Teknium 78b8155ecb chore: add xieNniu to AUTHOR_MAP 2026-05-09 11:10:04 -07:00
xieNniu c8ede8aa1b fix(plugins): resolve Git binary for installs under minimal PATH
Resolve git via shutil.which with POSIX and Git-for-Windows fallbacks before clone and pull so Dashboard/API installs do not misreport Git as missing.

Add regression tests for the resolver and pull subprocess invocation.
2026-05-09 11:10:04 -07:00
qWaitCrypto 124fbb0af0 fix(gateway): refresh runtime argv metadata 2026-05-09 11:08:23 -07:00
JackJin 7d276bfbee fix(cli): expand composite toolset when mixed with configurables in platform_toolsets
When platform_toolsets[<platform>] contains both a composite (e.g.
hermes-cli) and at least one configurable opt-in (e.g. spotify), the
has_explicit_config branch in _get_platform_tools silently dropped the
composite, leaving sessions with only the configurable + plugin tools
and no native tools (terminal, file, web, browser, memory, etc.).

Mirror the else-branch's subset inference for composites that sit
alongside the configurables, but apply _DEFAULT_OFF_TOOLSETS only to the
implicit expansion so user-listed default-off toolsets (spotify,
discord) survive.
2026-05-09 11:08:05 -07:00
Teknium 1f4200debf feat(delegate): show user's actual concurrency / spawn-depth limits in tool description (#22694)
The delegate_task tool description hardcoded 'default 3' / 'default 2' for
max_concurrent_children / max_spawn_depth, which misled the model on any
install that raised these limits — the schema text said 'default 3' even
when the user had set max_concurrent_children=15 / max_spawn_depth=3, so
the model would self-cap at 3 and never use the headroom.

Make the description dynamic. ToolEntry gains an optional
dynamic_schema_overrides callable; registry.get_definitions() merges its
output on top of the static schema before returning it. delegate_tool
registers a builder that reads the current delegation.* config and emits:

- 'up to N items concurrently for this user' (N = max_concurrent_children)
- 'Nested delegation IS enabled / OFF for this user (max_spawn_depth=N)'
- 'orchestrator children can themselves delegate up to M more level(s)'
- 'orchestrator_enabled=false' when the kill switch is set

The model_tools cache key already includes config.yaml mtime+size, so
edits to delegation.* in config invalidate the cached tool definitions
without an explicit hook. CLI_CONFIG staleness within a process is a
pre-existing limitation of _load_config and out of scope here.

Static description / tasks.description / role.description in
DELEGATE_TASK_SCHEMA are placeholders so module import doesn't trigger
cli.CLI_CONFIG load before the test conftest can redirect HERMES_HOME.
2026-05-09 11:07:53 -07:00
Teknium 000ddb8a93 chore: add SiliconID to AUTHOR_MAP 2026-05-09 11:07:37 -07:00
Matthew Cater cda20eec0c fix(kanban): gate claim + unblock on parent completion
Enforce the parent-completion invariant at claim_task (the single
ready->running chokepoint) and re-gate unblock_task so blocked->ready
only fires when parents are done. Prevents child tasks from running
ahead of in-progress parents under the create-then-link race.

Also adds a stress test that races concurrent create+link against
hammered claim_task and asserts no child runs while any parent is undone.

Ref: kanban/boards/cookai/workspaces/t_a6acd07d/root-cause.md
Refs: t_8d6af9d6
2026-05-09 11:07:37 -07:00
Teknium 79694018f8 feat(plugins): HERMES_PLUGINS_DEBUG=1 surfaces plugin discovery logs (#22684)
Plugin authors had no easy way to figure out why their plugin wasn't
loading — failures were buried in agent.log at WARNING and skip reasons
(disabled, not enabled, depth cap, exclusive) were DEBUG-only and
invisible by default.

Set HERMES_PLUGINS_DEBUG=1 to attach a stderr handler at DEBUG to the
hermes_cli.plugins logger only. Surfaces:

  - which directories were scanned + manifest counts per source
  - per manifest: resolved key, name, kind, source, on-disk path
  - skip reasons (disabled, not enabled, exclusive, depth cap, no register)
  - per load: tools/hooks/slash/CLI commands the plugin registered
  - full traceback on YAML parse failure (exc_info on the existing warning)
  - full traceback on register() exceptions, pointing at the plugin author's line

Env var off (default) → zero new stderr output, same as before.

Touches only hermes_cli/plugins.py + a doc section in the plugin-build
guide + an entry in the env-vars reference. 3 new tests lock the
attach/idempotent/no-attach behavior.
2026-05-09 11:07:12 -07:00
Teknium 8f83046f6c perf(google_chat): defer heavy google-cloud imports to first adapter use (#22681)
Plugin discovery imports every bundled platform plugin at model_tools
import time. The google_chat adapter unconditionally pulled in
google.cloud.pubsub_v1, googleapiclient, grpc, httplib2, and friends at
module top — about 33 MB RSS and 110 ms wall on every CLI invocation,
even ones that never construct a gateway adapter.

Wrap the heavy imports in _load_google_modules(): an idempotent loader
that rebinds the module-level globals (pubsub_v1, service_account,
HttpError, MediaFileUpload, …) on first call and is invoked from
GoogleChatAdapter.__init__, connect(), and check_google_chat_requirements().

The HttpError = Exception placeholder is preserved for the brief window
before the loader runs, so 'except HttpError as exc:' clauses stay
correct (Python looks up the name at try/except evaluation time, not
at function definition time).

Measured impact on a 9950X3D, 7-run medians:
  import cli:              895 → 787 ms  (-108 ms / -12%)
                           133 → 110 MB  ( -23 MB / -17%)
  import model_tools:      491 → 400 ms  ( -91 ms / -19%)
                            95 →  66 MB  ( -29 MB / -31%)
  google_chat alone:       244 → 132 ms  (-112 ms / -46%)
                            83 →  50 MB  ( -33 MB / -40%)
  hermes chat -q (cold):   177 → 145 MB  ( -32 MB / -18%)

Real-world win lands on every path that imports cli.py: hermes chat,
hermes gateway, cron jobs, batch runs, subagents. Long-lived gateway
processes save ~30 MB resident.

All 157 google_chat tests pass; full gateway suite (5050 tests) green.
2026-05-09 11:07:06 -07:00
Teknium 0d9800743c chore: add wesleysimplicio to AUTHOR_MAP 2026-05-09 11:06:21 -07:00
Wesley Simplicio 0c22434f03 fix(kanban): call recompute_ready after unlink_tasks removes a dependency
Problem:
unlink_tasks() removes a parent→child dependency edge but does not trigger
recompute_ready().  A child whose last blocking parent is unlinked stays
stuck in 'todo' indefinitely — it only promotes to 'ready' on the next
dispatcher tick or a manual 'hermes kanban recompute'.  For CLI-only users
without a dispatcher, the child is permanently stuck.

Root cause:
complete_task() and unblock_task() both call recompute_ready() after their
write transaction so downstream children are evaluated immediately.
unlink_tasks() was missing this call — removing a dependency is
semantically equivalent to completing one, so the same recompute is needed.

Fix:
Capture the rowcount result before the write_txn exits, then call
recompute_ready(conn) outside the transaction when a row was actually
deleted (so the child sees the updated task_links state).

Tests:
Added test_unlink_tasks_triggers_recompute_ready in
tests/hermes_cli/test_kanban_db.py: creates parent A (done) + parent C
(running), child B with both parents (todo), unlinks C→B, asserts B is
ready immediately.  Stash-verified: FAILS without fix (child stays todo),
PASSES with fix.
62/62 tests green in tests/hermes_cli/test_kanban_db.py.

Closes #22459.
2026-05-09 11:06:21 -07:00
Teknium b9c001116e feat: confirm prompt for destructive slash commands (#4069) (#22687)
/clear, /new, /reset, and /undo now ask the user to confirm before
discarding conversation state — three-option prompt routed through the
existing tools.slash_confirm primitive.

Native yes/no buttons render on Telegram, Discord, and Slack (their
adapters already implement send_slash_confirm); other platforms get a
text-fallback prompt and reply with /approve, /always, or /cancel.

The classic prompt_toolkit CLI uses the same three-option flow via the
established _prompt_text_input pattern (see _confirm_and_reload_mcp).
TUI keeps its existing modal overlay (#12312).

Gated by new config key approvals.destructive_slash_confirm (default
true). Picking 'Always Approve' flips the gate to false so subsequent
destructive commands run silently — matches the established
mcp_reload_confirm UX.

Out of scope: /cron remove (separate domain — scheduled jobs, not
session history). Existing TUI overlay env-var (HERMES_TUI_NO_CONFIRM)
left unchanged; cosmetic unification can come later.

Closes #4069.
2026-05-09 11:04:46 -07:00
ethernet 0cafe7d50d Merge pull request #22510 from novax635/fix/gateway-slash-confirm-boundary-cleanup
fix gateway: clear slash confirm state during session boundary cleanup
2026-05-09 12:48:49 -04:00
ethernet f1f42a7b9f Merge pull request #22610 from uzunkuyruk/fix/telegram-table-row-label-duplicate-bullet
fix(telegram): exclude row-label column from bullet items in table re…
2026-05-09 11:47:45 -04:00
uzunkuyruk 8fdaf4d3d6 fix(telegram): exclude row-label column from bullet items in table rendering
When a GFM table has a row-label column (first column with no header),
_render_table_block_for_telegram incorrectly included the row-label cell
in the bullet zip alongside the data cells, producing a spurious bullet
like '• 維度: 核心賣點' before the real data rows.

Detect the row-label column by comparing the first data row cell count
against the header count (has_row_label_col = len(first_data_row) ==
len(headers) + 1). When present, use cells[0] as the heading and
zip headers against cells[1:] only, correctly excluding the row-label
from the bullet list.

Fixes #22604
2026-05-09 17:39:16 +03:00
novax635 8b6501786c fix(gateway): clear slash-confirm state during session boundary cleanup 2026-05-09 14:18:20 +03:00
emozilla 0d1cbc2dda changes from feedback 2026-05-05 22:45:12 -04:00
emozilla 401aadb5b8 docs(security): rewrite policy around OS-level isolation as the boundary
Restate the trust model from first principles: the OS is the only
load-bearing boundary against an adversarial LLM. Distinguish
terminal-backend isolation (sandboxes the shell tool) from
whole-process wrapping (sandboxes the agent itself, reference
deployment NVIDIA OpenShell). Name in-process components (approval
gate, output redaction, Skills Guard) as heuristics, and the class
of reports that defeat them as out of scope under this policy —
while explicitly welcoming them as regular issues or PRs.

Introduce 'agent-loaded content' as the narrow, honest commitment:
attacker-influenced input must not chain into a write the agent
later loads on its own initiative.

Strip implementation-detail enumerations (backend names, adapter
names, config keys, env vars, internal symbols) so the doc stays
evergreen as code evolves.
2026-05-05 12:46:51 -04:00
ethernet 9d645d98c4 fix(tui): update README 2026-04-30 18:23:28 -04:00
ethernet 242659f5af fix(tui): don't hardcode /home/bb 2026-04-30 18:23:28 -04:00
ethernet 42df7ec597 fix(tui): update comments 2026-04-30 18:23:28 -04:00
ethernet 42e166c7ea refactor(docker): drop manual @hermes/ink build, rely on esbuild bundle
the esbuild pipeline (scripts/build.mjs) already bundles ink into a
single self-contained dist/entry.js.

remove the Dockerfile steps that manually copied packages/hermes-ink
into node_modules/@hermes/ink and ran a nested
npm install there.

- Dockerfile: simplify TUI build step to just 'npm run build'
- hermes_cli/main.py: _tui_build_needed now checks dist/entry.js
staleness against source files before falling back to the old
ink-bundle.js logic
- tests: update TUI npm install tests and drop the Dockerfile contract
test for the removed ink materialization step
2026-04-30 17:32:55 -04:00
github-actions[bot] 279504d5b8 fix(nix): refresh npm lockfile hashes 2026-04-30 19:49:01 +00:00
ethernet 42627b4eaf refactor(tui): bundle with esbuild, drop runtime node_modules
Replace the tsc + babel pipeline with a single esbuild invocation that
produces a self-contained dist/entry.js. The nix TUI derivation no
longer copies node_modules — only dist/ + package.json ship, shrinking
the output from hundreds of MB to ~2.9 MB.

- ui-tui/scripts/build.mjs: new esbuild bundler. Aliases @hermes/ink
  to source (esbuild's __esm helper doesn't await nested async init,
  which breaks lazy-assigned exports like 'render' when re-exporting
  through a prebuilt submodule). Stubs react-devtools-core (dev-only).
  Injects a createRequire shim for transitive CJS deps. Strips the
  shebang from src/entry.tsx because Nix patchShebangs mangles
  '/usr/bin/env -S node --max-old-space-size=8192 --expose-gc' — it
  drops the 'node' token. The Python launcher always invokes node
  explicitly, so the shebang is redundant.
- nix/tui.nix: installPhase no longer copies node_modules or the
  @hermes/ink packages dir.
- nix/checks.nix: drop the 'node_modules present' assertion.
- hermes_cli/main.py: _tui_need_npm_install short-circuits when
  dist/entry.js exists and no package-lock.json is present. That is
  the prebuilt-bundle layout (nix / packaged release) and there is
  nothing to install. Without this, the launcher tried to npm install
  in a non-existent site-packages/ui-tui path.
2026-04-30 15:38:50 -04:00
635 changed files with 64525 additions and 5331 deletions
+12
View File
@@ -143,6 +143,18 @@
# Also requires ~/.honcho/config.json with enabled=true (see README).
# HONCHO_API_KEY=
# =============================================================================
# HYPERLIQUID OPTIONAL SKILL
# =============================================================================
# Optional defaults for the Hyperliquid skill in optional-skills/blockchain/hyperliquid
#
# Hyperliquid API base URL override
# Default: https://api.hyperliquid.xyz
# HYPERLIQUID_API_URL=https://api.hyperliquid-testnet.xyz
#
# Default address for account-level commands like state, fills, orders, and review
# HYPERLIQUID_USER_ADDRESS=0x0000000000000000000000000000000000000000
# =============================================================================
# TERMINAL TOOL CONFIGURATION
# =============================================================================
+2 -1
View File
@@ -122,7 +122,8 @@ jobs:
retention-days: 14
- name: Post / update PR comment
if: github.event_name == 'pull_request'
if: github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository
continue-on-error: true
uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7
with:
script: |
+8 -4
View File
@@ -540,10 +540,14 @@ Full authoring guide: `website/docs/developer-guide/model-provider-plugin.md`.
### Dashboard / context-engine / image-gen plugin directories
`plugins/context_engine/`, `plugins/image_gen/`, `plugins/example-dashboard/`,
etc. follow the same pattern (ABC + orchestrator + per-plugin directory).
Context engines plug into `agent/context_engine.py`; image-gen providers
into `agent/image_gen_provider.py`.
`plugins/context_engine/`, `plugins/image_gen/`, etc. follow the same
pattern (ABC + orchestrator + per-plugin directory). Context engines
plug into `agent/context_engine.py`; image-gen providers into
`agent/image_gen_provider.py`. Reference / docs-companion plugins
(`example-dashboard`, `strike-freedom-cockpit`, `plugin-llm-example`,
`plugin-llm-async-example`) live in the
[`hermes-example-plugins`](https://github.com/NousResearch/hermes-example-plugins)
companion repo, not in this tree.
---
+302 -55
View File
@@ -1,84 +1,331 @@
# Hermes Agent Security Policy
This document outlines the security protocols, trust model, and deployment hardening guidelines for the **Hermes Agent** project.
This document describes Hermes Agent's trust model, names the one
security boundary the project treats as load-bearing, and defines the
scope for vulnerability reports.
## 1. Vulnerability Reporting
## 1. Reporting a Vulnerability
Hermes Agent does **not** operate a bug bounty program. Security issues should be reported via [GitHub Security Advisories (GHSA)](https://github.com/NousResearch/hermes-agent/security/advisories/new) or by emailing **security@nousresearch.com**. Do not open public issues for security vulnerabilities.
Report privately via [GitHub Security Advisories](https://github.com/NousResearch/hermes-agent/security/advisories/new)
or **security@nousresearch.com**. Do not open public issues for
security vulnerabilities. **Hermes Agent does not operate a bug
bounty program.**
### Required Submission Details
- **Title & Severity:** Concise description and CVSS score/rating.
- **Affected Component:** Exact file path and line range (e.g., `tools/approval.py:120-145`).
- **Environment:** Output of `hermes version`, commit SHA, OS, and Python version.
- **Reproduction:** Step-by-step Proof-of-Concept (PoC) against `main` or the latest release.
- **Impact:** Explanation of what trust boundary was crossed.
A useful report includes:
- A concise description and severity assessment.
- The affected component, identified by file path and line range
(e.g. `path/to/file.py:120-145`).
- Environment details (`hermes version`, commit SHA, OS, Python
version).
- A reproduction against `main` or the latest release.
- A statement of which trust boundary in §2 is crossed.
Please read §2 and §3 before submitting. Reports that demonstrate
limits of an in-process heuristic this policy does not treat as a
boundary will be closed as out-of-scope under §3 — but see §3.2:
they are still welcome as regular issues or pull requests, just not
through the private security channel.
---
## 2. Trust Model
The core assumption is that Hermes is a **personal agent** with one trusted operator.
Hermes Agent is a single-tenant personal agent. Its posture is
layered, and the layers are not equally load-bearing. Reporters and
operators should reason about them in the same terms.
### Operator & Session Trust
- **Single Tenant:** The system protects the operator from LLM actions, not from malicious co-tenants. Multi-user isolation must happen at the OS/host level.
- **Gateway Security:** Authorized callers (Telegram, Discord, Slack, etc.) receive equal trust. Session keys are used for routing, not as authorization boundaries.
- **Execution:** Defaults to `terminal.backend: local` (direct host execution). Container isolation (Docker, Modal, Daytona) is opt-in for sandboxing.
### 2.1 Definitions
### Dangerous Command Approval
The approval system (`tools/approval.py`) is a core security boundary. Terminal commands, file operations, and other potentially destructive actions are gated behind explicit user confirmation before execution. The approval mode is configurable via `approvals.mode` in `config.yaml`:
- `"on"` (default) — prompts the user to approve dangerous commands.
- `"auto"` — auto-approves after a configurable delay.
- `"off"` — disables the gate entirely (break-glass; see Section 3).
- **Agent process.** The Python interpreter running Hermes Agent,
including any Python modules it has loaded (skills, plugins,
hook handlers).
- **Terminal backend.** A pluggable execution target for the
`terminal()` tool. The default runs commands directly on the host.
Other backends run commands inside a container, cloud sandbox, or
remote host.
- **Input surface.** Any channel through which content enters the
agent's context: operator input, web fetches, email, gateway
messages, file reads, MCP server responses, tool results.
- **Trust envelope.** The set of resources an operator has implicitly
granted Hermes Agent access to by running it — typically, whatever
the operator's own user account can reach on the host.
- **Stance.** An explicit statement in Hermes Agent's documentation
or code about how a consuming layer (adapter, UI, file writer,
shell) should treat agent output — e.g. "the dashboard renders
agent output as inert HTML."
### Output Redaction
`agent/redact.py` strips secret-like patterns (API keys, tokens, credentials) from all display output before it reaches the terminal or gateway platform. This prevents accidental credential leakage in chat logs, tool previews, and response text. Redaction operates on the display layer only — underlying values remain intact for internal agent operations.
### 2.2 The Boundary: OS-Level Isolation
### Skills vs. MCP Servers
- **Installed Skills:** High trust. Equivalent to local host code; skills can read environment variables and run arbitrary commands.
- **MCP Servers:** Lower trust. MCP subprocesses receive a filtered environment (`_build_safe_env()` in `tools/mcp_tool.py`) — only safe baseline variables (`PATH`, `HOME`, `XDG_*`) plus variables explicitly declared in the server's `env` config block are passed through. Host credentials are stripped by default. Additionally, packages invoked via `npx`/`uvx` are checked against the OSV malware database before spawning.
**The only security boundary against an adversarial LLM is the
operating system.** Nothing inside the agent process constitutes
containment — not the approval gate, not output redaction, not any
pattern scanner, not any tool allowlist. Any in-process component
that screens LLM output is a heuristic operating on an
attacker-influenced string, and this policy treats it as such.
### Code Execution Sandbox
The `execute_code` tool (`tools/code_execution_tool.py`) runs LLM-generated Python scripts in a child process with API keys and tokens stripped from the environment to prevent credential exfiltration. Only environment variables explicitly declared by loaded skills (via `env_passthrough`) or by the user in `config.yaml` (`terminal.env_passthrough`) are passed through. The child accesses Hermes tools via RPC, not direct API calls.
Hermes Agent supports two OS-level isolation postures. They address
different threats and an operator should choose deliberately.
### Subagents
- **No recursive delegation:** The `delegate_task` tool is disabled for child agents.
- **Depth limit:** `MAX_DEPTH = 2` — parent (depth 0) can spawn a child (depth 1); grandchildren are rejected.
- **Memory isolation:** Subagents run with `skip_memory=True` and do not have access to the parent's persistent memory provider. The parent receives only the task prompt and final response as an observation.
#### Terminal-backend isolation
A non-default terminal backend runs LLM-emitted shell commands
inside a container, remote host, or cloud sandbox. The file tools
(`read_file`, `write_file`, `patch`) also run through this backend,
since they are implemented on top of the shell contract — they
cannot reach paths the backend doesn't expose.
What this confines: anything the agent does by issuing shell or
file operations. What this does **not** confine: everything the
agent does in its own Python process. That includes the
code-execution tool (spawned as a host subprocess), MCP subprocesses
(spawned from the agent's environment), plugin loading, hook
dispatch, and skill loading (all imported into the agent
interpreter).
Terminal-backend isolation is the right posture when the concern is
LLM-emitted destructive shell or unwanted file-tool writes, and the
operator is otherwise trusted.
#### Whole-process wrapping
Whole-process wrapping runs the entire agent process tree inside a
sandbox. Every code path — shell, code-execution, MCP, file tools,
plugins, hooks, skill loading — is subject to the same filesystem,
network, process, and (where applicable) inference policy.
Hermes Agent supports this in two ways:
- **Hermes Agent's own Docker image and Compose setup.** Lighter-
weight; the agent runs in a standard container with operator-
configured mounts and network policy.
- **[NVIDIA OpenShell](https://github.com/NVIDIA/OpenShell)**.
OpenShell provides per-session sandboxes with declarative policy
across filesystem, network (L7 egress), process/syscall, and
inference-routing layers. Network and inference policies are
hot-reloadable. Credentials are injected from a Provider store
and never touch the sandbox filesystem.
Under a whole-process wrapper, Hermes Agent's in-process heuristics
(§2.4) function as accident-prevention layered on top of a real
boundary. This is the supported posture when the agent ingests
content from surfaces the operator does not control — the open web,
inbound email, multi-user channels, untrusted MCP servers — and for
production or shared deployments.
Operators running the default local backend with untrusted input
surfaces, or running a terminal-backend sandbox and expecting it to
contain code paths that don't go through the shell, are operating
outside the supported security posture.
### 2.3 Credential Scoping
Hermes Agent filters the environment it passes to its lower-trust
in-process components: shell subprocesses, MCP subprocesses, and
the code-execution child. Credentials like provider API keys and
gateway tokens are stripped by default; variables explicitly
declared by the operator or by a loaded skill are passed through.
This reduces casual exfiltration. It is not containment. Any
component running inside the agent process (skills, plugins, hook
handlers) can read whatever the agent itself can read, including
in-memory credentials. The mitigation against a compromised
in-process component is operator review before install (§2.4,
§2.5), not environment scrubbing.
### 2.4 In-Process Heuristics
The following components screen or warn about LLM behavior. They
are useful. They are not boundaries.
- The **approval gate** detects common destructive shell patterns
and prompts the operator before execution. Shell is Turing-
complete; a denylist over shell strings is structurally
incomplete. The gate catches cooperative-mode mistakes, not
adversarial output.
- **Output redaction** strips secret-like patterns from display.
A motivated output producer will defeat it.
- **Skills Guard** scans installable skill content for injection
patterns. It is a review aid; the boundary for third-party skills
is operator review before install. Reviewing a skill means
reading its Python code and scripts, not just its SKILL.md
description — skills execute arbitrary Python at import time.
### 2.5 Plugin Trust Model
Plugins load into the agent process and run with full agent
privileges: they can read the same credentials, call the same
tools, register the same hooks, and import the same modules as
anything shipped in-tree. The boundary for third-party plugins is
operator review before install — the same rule as skills (§2.4),
called out separately because plugins are architecturally heavier
and often ship their own background services, network listeners,
and dependencies.
A malicious or buggy plugin is not a vulnerability in Hermes Agent
itself. Bugs in Hermes Agent's plugin-install or plugin-discovery
path that prevent the operator from seeing what they're installing
are in scope under §3.1.
### 2.6 External Surfaces
An **external surface** is any channel outside the local agent
process through which a caller can dispatch agent work, resolve
approvals, or receive agent output. Each surface has its own
authorization model, but the rules below apply uniformly.
**Surfaces in Hermes Agent:**
- **Gateway platform adapters.** Messaging integrations in
`gateway/platforms/` (Telegram, Discord, Slack, email, SMS, etc.)
and analogous adapters shipped as plugins.
- **Network-exposed HTTP surfaces.** The API server adapter, the
dashboard plugin, the kanban plugin's HTTP endpoints, and any
other plugin that binds a listening socket.
- **Editor / IDE adapters.** The ACP adapter (`acp_adapter/`) and
equivalent integrations that accept requests from a local client
process.
- **The TUI gateway (`tui_gateway/`).** JSON-RPC backend for the
Ink terminal UI, reached over local IPC.
**Uniform rules:**
1. **Authorization is required at every surface that crosses a
trust boundary.** For messaging and network HTTP surfaces, the
boundary is the network: authorization means an operator-
configured caller allowlist. For editor and local-IPC surfaces
(ACP, TUI gateway), the boundary is the host's user account:
authorization means relying on OS-level access control (file
permissions, loopback-only binds) and not exposing the surface
beyond the local user without an explicit network auth layer.
2. **An allowlist is required for every enabled network-exposed
adapter.** Adapters must refuse to dispatch agent work, resolve
approvals, or relay output until an allowlist is set. Code paths
that fail open when no allowlist is configured are code bugs in
scope under §3.1.
3. **Session identifiers are routing handles, not authorization
boundaries.** Knowing another caller's session ID does not grant
access to their approvals or output; authorization is always
re-checked against the allowlist (or OS-level equivalent).
4. **Within the authorized set, all callers are equally trusted.**
Hermes Agent does not model per-caller capabilities inside a
single adapter. Operators who need capability separation should
run separate agent instances with separate allowlists.
5. **Binding a local-only surface to a non-loopback interface is a
break-glass operator decision (§3.2).** The dashboard and other
plugin HTTP servers default to loopback; exposing them via
`--host 0.0.0.0` or equivalent makes public-exposure hardening
(§4) the operator's responsibility.
---
## 3. Out of Scope (Non-Vulnerabilities)
## 3. Scope
The following scenarios are **not** considered security breaches:
- **Prompt Injection:** Unless it results in a concrete bypass of the approval system, toolset restrictions, or container sandbox.
- **Public Exposure:** Deploying the gateway to the public internet without external authentication or network protection.
- **Trusted State Access:** Reports that require pre-existing write access to `~/.hermes/`, `.env`, or `config.yaml` (these are operator-owned files).
- **Default Behavior:** Host-level command execution when `terminal.backend` is set to `local` — this is the documented default, not a vulnerability.
- **Configuration Trade-offs:** Intentional break-glass settings such as `approvals.mode: "off"` or `terminal.backend: local` in production.
- **Tool-level read/access restrictions:** The agent has unrestricted shell access via the `terminal` tool by design. Reports that a specific tool (e.g., `read_file`) can access a resource are not vulnerabilities if the same access is available through `terminal`. Tool-level deny lists only constitute a meaningful security boundary when paired with equivalent restrictions on the terminal side (as with write operations, where `WRITE_DENIED_PATHS` is paired with the dangerous command approval system).
### 3.1 In Scope
- Escape from a declared OS-level isolation posture (§2.2): an
attacker-controlled code path reaching state that the posture
claimed to confine.
- Unauthorized external-surface access: a caller outside the
configured authorization set (allowlist, or OS-level equivalent
for local-IPC surfaces) dispatching work, receiving output, or
resolving approvals (§2.6).
- Credential exfiltration: leakage of operator credentials or
session authorization material to a destination outside the
trust envelope, via a mechanism that should have prevented it
(environment scrubbing bug, adapter logging, transport error
that flushes credentials to an upstream, etc.).
- Trust-model documentation violations: code behaving contrary to
what this policy, Hermes Agent's own documentation, or reasonable
operator expectations would predict — including cases where
Hermes Agent has documented a stance about how its output should
be rendered by a consuming layer (dashboard, gateway adapter,
file writer, shell) and a code path breaks that stance.
### 3.2 Out of Scope
"Out of scope" here means "not a security vulnerability under this
policy." It does not mean "not worth reporting." Improvements to the
in-process heuristics, hardening ideas, and UX fixes are welcome as
regular issues or pull requests — the approval gate can always catch
more patterns, redaction can always get smarter, adapter behavior
can always be tightened. These items just don't go through the
private-disclosure channel and don't receive advisories.
- **Bypasses of in-process heuristics (§2.4)** — approval-gate regex
bypasses, redaction bypasses, Skills Guard pattern bypasses, and
analogous reports against future heuristics. These components are
not boundaries; defeating them is not a vulnerability under this
policy.
- **Prompt injection per se.** Getting the LLM to emit unusual
output — via injected content, hallucination, training artifacts,
or any other cause — is not itself a vulnerability. "I achieved
prompt injection" without a chained §3.1 outcome is not an
actionable report under this policy.
- **Consequences of a chosen isolation posture.** Reports that a
code path operating within its posture's scope can do what that
posture permits are not vulnerabilities. Examples: shell or file
tools reaching host state under the local backend; code-execution
or MCP subprocesses reaching host state under terminal-backend
isolation that only sandboxes shell; reports whose preconditions
require pre-existing write access to operator-owned configuration
or credential files (those are already inside the trust envelope).
- **Documented break-glass settings.** Operator-selected trade-offs
that explicitly disable protections: `--insecure` and equivalent
flags on the dashboard or other components, disabled approvals,
local backend in production, development profiles that bypass
hermes-home security, and similar. Reports against those
configurations are not vulnerabilities — that's the flag's job.
- **Community-contributed skills and plugins.** Third-party skills
(including the community skills repository) and third-party
plugins are in the operator's review surface, not Hermes Agent's
trust surface (§2.4, §2.5). A skill or plugin doing something
malicious is the expected failure mode of one that wasn't
reviewed, not a vulnerability in Hermes Agent. Bugs in Hermes
Agent's skill-install or plugin-install path that prevent the
operator from seeing what they're installing are in scope under
§3.1.
- **Public exposure without external controls.** Exposing the
gateway or API to the public internet without authentication,
VPN, or firewall.
- **Tool-level read/write restrictions on a posture where shell is
permitted.** If a path is reachable via the terminal tool, reports
that other file tools can reach it add nothing.
---
## 4. Deployment Hardening & Best Practices
## 4. Deployment Hardening
### Filesystem & Network
- **Production sandboxing:** Use container backends (`docker`, `modal`, `daytona`) instead of `local` for untrusted workloads.
- **File permissions:** Run as non-root (the Docker image uses UID 10000); protect credentials with `chmod 600 ~/.hermes/.env` on local installs.
- **Network exposure:** Do not expose the gateway or API server to the public internet without VPN, Tailscale, or firewall protection. SSRF protection is enabled by default across all gateway platform adapters (Telegram, Discord, Slack, Matrix, Mattermost, etc.) with redirect validation. Note: the local terminal backend does not apply SSRF filtering, as it operates within the trusted operator's environment.
The single most important hardening decision is matching isolation
(§2.2) to the trust of the content the agent will ingest. Beyond
that:
### Skills & Supply Chain
- **Skill installation:** Review Skills Guard reports (`tools/skills_guard.py`) before installing third-party skills. The audit log at `~/.hermes/skills/.hub/audit.log` tracks every install and removal.
- **MCP safety:** OSV malware checking runs automatically for `npx`/`uvx` packages before MCP server processes are spawned.
- **CI/CD:** GitHub Actions are pinned to full commit SHAs. The `supply-chain-audit.yml` workflow blocks PRs containing `.pth` files or suspicious `base64`+`exec` patterns.
### Credential Storage
- API keys and tokens belong exclusively in `~/.hermes/.env` — never in `config.yaml` or checked into version control.
- The credential pool system (`agent/credential_pool.py`) handles key rotation and fallback. Credentials are resolved from environment variables, not stored in plaintext databases.
- Run the agent as a non-root user. The supplied container image
does this by default.
- Keep credentials in the operator credential file with tight
permissions, never in the main config, never in version control.
Under OpenShell, use the Provider store rather than an on-disk
credential file.
- Do not expose the gateway or API to the public internet without
VPN, Tailscale, or firewall protection. Under OpenShell, use the
network policy layer to restrict egress.
- Configure a caller allowlist for every network-exposed adapter
you enable (§2.6).
- Review third-party skills and plugins before install (§2.4,
§2.5). For skills, this means reading the Python and scripts,
not just SKILL.md. Skills Guard reports and the install audit
log are the review surface.
- Hermes Agent includes supply-chain guards for MCP server
launches and for dependency / bundled-package changes in CI; see
`CONTRIBUTING.md` for specifics.
---
## 5. Disclosure Process
## 5. Disclosure
- **Coordinated Disclosure:** 90-day window or until a fix is released, whichever comes first.
- **Communication:** All updates occur via the GHSA thread or email correspondence with security@nousresearch.com.
- **Credits:** Reporters are credited in release notes unless anonymity is requested.
- **Coordinated disclosure window:** 90 days from report, or until a
fix is released, whichever comes first.
- **Channel:** the GHSA thread or email correspondence with
security@nousresearch.com.
- **Credit:** reporters are credited in release notes unless
anonymity is requested.
+1
View File
@@ -601,6 +601,7 @@ class SessionManager:
),
"quiet_mode": True,
"session_id": session_id,
"session_db": self._get_db(),
"model": model or default_model,
}
+2 -2
View File
@@ -769,8 +769,8 @@ def _build_patch_mode_content(patch_text: str) -> List[Any]:
old_chunks: list[str] = []
new_chunks: list[str] = []
for hunk in op.hunks:
old_lines = [line.content for line in hunk.lines if line.prefix in (" ", "-")]
new_lines = [line.content for line in hunk.lines if line.prefix in (" ", "+")]
old_lines = [line.content for line in hunk.lines if line.prefix in {" ", "-"}]
new_lines = [line.content for line in hunk.lines if line.prefix in {" ", "+"}]
if old_lines or new_lines:
old_chunks.append("\n".join(old_lines))
new_chunks.append("\n".join(new_lines))
+1 -1
View File
@@ -47,7 +47,7 @@ def _title_case_slug(value: Optional[str]) -> Optional[str]:
def _parse_dt(value: Any) -> Optional[datetime]:
if value in (None, ""):
if value in {None, ""}:
return None
if isinstance(value, (int, float)):
return datetime.fromtimestamp(float(value), tz=timezone.utc)
+12 -4
View File
@@ -1289,13 +1289,21 @@ def convert_tools_to_anthropic(tools: List[Dict]) -> List[Dict]:
continue
if name:
seen_names.add(name)
result.append({
anthropic_tool: Dict[str, Any] = {
"name": name,
"description": fn.get("description", ""),
"input_schema": _normalize_tool_input_schema(
fn.get("parameters", {"type": "object", "properties": {}})
),
})
}
# Forward cache_control marker when present on the OpenAI-format
# tool dict (set by ``mark_tools_for_long_lived_cache``). Anthropic's
# tools array supports cache_control on the last tool to cache the
# entire schema cross-session.
cache_control = t.get("cache_control")
if isinstance(cache_control, dict):
anthropic_tool["cache_control"] = dict(cache_control)
result.append(anthropic_tool)
return result
@@ -1537,7 +1545,7 @@ def convert_messages_to_anthropic(
# downgraded to a spurious text block on the last assistant message.
reasoning_content = m.get("reasoning_content")
_already_has_thinking = any(
isinstance(b, dict) and b.get("type") in ("thinking", "redacted_thinking")
isinstance(b, dict) and b.get("type") in {"thinking", "redacted_thinking"}
for b in blocks
)
if isinstance(reasoning_content, str) and not _already_has_thinking:
@@ -1688,7 +1696,7 @@ def convert_messages_to_anthropic(
if isinstance(m["content"], list):
m["content"] = [
b for b in m["content"]
if not (isinstance(b, dict) and b.get("type") in ("thinking", "redacted_thinking"))
if not (isinstance(b, dict) and b.get("type") in {"thinking", "redacted_thinking"})
]
prev_blocks = fixed[-1]["content"]
curr_blocks = m["content"]
+617 -93
View File
@@ -175,7 +175,7 @@ def _normalize_aux_provider(provider: Optional[str]) -> str:
# Resolve to the user's actual main provider so named custom providers
# and non-aggregator providers (DeepSeek, Alibaba, etc.) work correctly.
main_prov = (_read_main_provider() or "").strip().lower()
if main_prov and main_prov not in ("auto", "main", ""):
if main_prov and main_prov not in {"auto", "main", ""}:
normalized = main_prov
else:
return "custom"
@@ -490,6 +490,29 @@ def _select_pool_entry(provider: str) -> Tuple[bool, Optional[Any]]:
return True, None
def _peek_pool_entry(provider: str) -> Optional[Any]:
"""Best-effort current/next pool entry without mutating selection order."""
try:
pool = load_pool(provider)
except Exception as exc:
logger.debug("Auxiliary client: could not load pool for %s (peek): %s", provider, exc)
return None
if not pool or not pool.has_credentials():
return None
try:
current_fn = getattr(pool, "current", None)
if callable(current_fn):
current = current_fn()
if current is not None:
return current
peek_fn = getattr(pool, "peek", None)
if callable(peek_fn):
return peek_fn()
except Exception as exc:
logger.debug("Auxiliary client: could not peek pool entry for %s: %s", provider, exc)
return None
def _pool_runtime_api_key(entry: Any) -> str:
if entry is None:
return ""
@@ -555,7 +578,7 @@ def _convert_content_for_responses(content: Any) -> Any:
if detail:
entry["detail"] = detail
converted.append(entry)
elif ptype in ("input_text", "input_image"):
elif ptype in {"input_text", "input_image"}:
# Already in Responses format — pass through
converted.append(part)
else:
@@ -683,6 +706,16 @@ class _CodexCompletionsAdapter:
close()
except Exception:
logger.debug("Codex auxiliary: client close during timeout failed", exc_info=True)
# The cached auxiliary client wraps this same ``self._client``
# (or *is* a ``CodexAuxiliaryClient`` whose ``_real_client`` is
# this instance). After we close the httpx transport above, the
# cache must drop that entry — otherwise the next auxiliary call
# (compression retry, memory flush, etc.) reuses the dead client
# and fails fast with a connection error. See issue #23432.
try:
_evict_cached_client_instance(self._client)
except Exception:
logger.debug("Codex auxiliary: cache eviction on timeout failed", exc_info=True)
def _check_cancelled() -> None:
if deadline is not None and time.monotonic() >= deadline:
@@ -765,7 +798,7 @@ class _CodexCompletionsAdapter:
if item_type == "message":
for part in (_item_get(item, "content") or []):
ptype = _item_get(part, "type")
if ptype in ("output_text", "text"):
if ptype in {"output_text", "text"}:
text_parts.append(_item_get(part, "text", ""))
elif item_type == "function_call":
tool_calls_raw.append(SimpleNamespace(
@@ -867,6 +900,14 @@ class AsyncCodexAuxiliaryClient:
self.chat = _AsyncCodexChatShim(async_adapter)
self.api_key = sync_wrapper.api_key
self.base_url = sync_wrapper.base_url
# Mirror the sync wrapper's _real_client so cache eviction by leaf
# OpenAI client (e.g. _close_client_on_timeout in #23482) drops
# this async entry too. Without this, sync and async cache entries
# diverge on poisoning: the sync entry is evicted but the async
# entry keeps reusing the closed transport, failing every
# subsequent async aux call with 'Connection error' until the
# gateway restarts.
self._real_client = sync_wrapper._real_client
class _AnthropicCompletionsAdapter:
@@ -1002,6 +1043,9 @@ class AsyncAnthropicAuxiliaryClient:
self.chat = _AsyncAnthropicChatShim(async_adapter)
self.api_key = sync_wrapper.api_key
self.base_url = sync_wrapper.base_url
# See AsyncCodexAuxiliaryClient: mirror _real_client so cache
# eviction on a poisoned underlying client also drops this entry.
self._real_client = sync_wrapper._real_client
def _endpoint_speaks_anthropic_messages(base_url: str) -> bool:
@@ -1440,7 +1484,16 @@ def _read_main_model() -> str:
config.yaml model.default is the single source of truth for the active
model. Environment variables are no longer consulted.
Runtime override: when an AIAgent is active with a CLI/gateway-provided
model that differs from config.yaml, ``set_runtime_main()`` records the
override in a process-local global. This is consulted FIRST so tools
that gate on "the active main model" (e.g. ``vision_analyze``'s native
fast path) see the live runtime, not the persisted config default.
"""
override = _RUNTIME_MAIN_MODEL
if isinstance(override, str) and override.strip():
return override.strip()
try:
from hermes_cli.config import load_config
cfg = load_config()
@@ -1461,7 +1514,13 @@ def _read_main_provider() -> str:
Returns the lowercase provider id (e.g. "alibaba", "openrouter") or ""
if not configured.
Runtime override: see ``_read_main_model`` — same mechanism for the
provider half of the runtime tuple.
"""
override = _RUNTIME_MAIN_PROVIDER
if isinstance(override, str) and override.strip():
return override.strip().lower()
try:
from hermes_cli.config import load_config
cfg = load_config()
@@ -1475,6 +1534,32 @@ def _read_main_provider() -> str:
return ""
# Process-local override set by AIAgent at session/turn start. Single-threaded
# per turn — no lock needed. Cleared by ``clear_runtime_main()``.
_RUNTIME_MAIN_PROVIDER: str = ""
_RUNTIME_MAIN_MODEL: str = ""
def set_runtime_main(provider: str, model: str) -> None:
"""Record the live runtime provider/model for the current AIAgent.
Called by ``run_agent.AIAgent._sync_runtime_main_for_aux_routing`` (or
equivalent setter) at the top of each turn so that
``_read_main_provider`` / ``_read_main_model`` reflect CLI/gateway
overrides instead of the stale config.yaml default.
"""
global _RUNTIME_MAIN_PROVIDER, _RUNTIME_MAIN_MODEL
_RUNTIME_MAIN_PROVIDER = (provider or "").strip().lower()
_RUNTIME_MAIN_MODEL = (model or "").strip()
def clear_runtime_main() -> None:
"""Clear the runtime override (e.g. on session end)."""
global _RUNTIME_MAIN_PROVIDER, _RUNTIME_MAIN_MODEL
_RUNTIME_MAIN_PROVIDER = ""
_RUNTIME_MAIN_MODEL = ""
def _resolve_custom_runtime() -> Tuple[Optional[str], Optional[str], Optional[str]]:
"""Resolve the active custom/main endpoint the same way the main CLI does.
@@ -1756,6 +1841,113 @@ def _get_provider_chain() -> List[tuple]:
]
# ── Auxiliary "recently 402'd" unhealthy-provider cache ────────────────────
#
# When an auxiliary provider returns HTTP 402 (Payment Required / credit
# exhaustion), retrying it on every subsequent aux call is wasteful — the
# provider stays depleted for hours or days, but the chain re-tries it as
# the FIRST entry on every compression/title-gen/session-search call,
# burns ~1 RTT, gets 402 again, then falls back. On a long Discord/LCM
# session that adds up to dozens of doomed 402s.
#
# Solution: when ANY caller observes a payment error against a provider,
# mark it unhealthy for ``_AUX_UNHEALTHY_TTL_SECONDS``. ``_resolve_auto``
# Step-2 and ``_try_payment_fallback`` both consult this cache and skip
# unhealthy entries (logging once per skip-reason so the user sees what
# happened). Entries auto-expire so a topped-up account recovers without
# manual intervention.
#
# Failure isolation: the cache is in-process only. A second hermes
# process won't inherit the unhealthy mark — that's intentional, since
# the user might be running two profiles with different OpenRouter keys.
_AUX_UNHEALTHY_TTL_SECONDS = 600 # 10 minutes
_aux_unhealthy_until: Dict[str, float] = {}
_aux_unhealthy_logged_at: Dict[str, float] = {}
# Map provider names that show up in resolved_provider / explicit-config
# back to the chain labels used by _get_provider_chain(). Keep in sync
# with the alias map in _try_payment_fallback below.
_AUX_UNHEALTHY_LABEL_ALIASES = {
"openrouter": "openrouter",
"nous": "nous",
"custom": "local/custom",
"local/custom": "local/custom",
"openai-codex": "openai-codex",
"codex": "openai-codex",
}
def _normalize_chain_label(provider: str) -> str:
"""Normalize a resolved_provider value to a chain label used by
``_get_provider_chain()``. Falls back to the lowercased input for
direct API-key providers (deepseek, alibaba, minimax, etc.) which
each report their own provider name from the api-key chain.
"""
if not provider:
return ""
p = str(provider).strip().lower()
return _AUX_UNHEALTHY_LABEL_ALIASES.get(p, p)
def _mark_provider_unhealthy(provider: str, ttl: Optional[float] = None) -> None:
"""Mark ``provider`` as recently-402'd, hidden from chain iteration
until the TTL expires. Called from the payment-fallback branches in
``call_llm`` and ``acall_llm`` after a confirmed payment error.
"""
label = _normalize_chain_label(provider)
if not label:
return
expires_at = time.time() + (ttl if ttl is not None else _AUX_UNHEALTHY_TTL_SECONDS)
_aux_unhealthy_until[label] = expires_at
logger.warning(
"Auxiliary: marking %s unhealthy for %ds (payment / credit error). "
"Subsequent auxiliary calls will skip it until %s.",
label,
int(ttl if ttl is not None else _AUX_UNHEALTHY_TTL_SECONDS),
time.strftime("%H:%M:%S", time.localtime(expires_at)),
)
def _is_provider_unhealthy(label: str) -> bool:
"""True iff ``label`` is in the unhealthy cache and the TTL hasn't expired.
Lazily evicts expired entries so the cache stays small.
"""
if not label:
return False
expires_at = _aux_unhealthy_until.get(label)
if expires_at is None:
return False
if time.time() >= expires_at:
_aux_unhealthy_until.pop(label, None)
_aux_unhealthy_logged_at.pop(label, None)
return False
return True
def _log_skip_unhealthy(label: str, task: Optional[str] = None) -> None:
"""Emit a single info-level log per minute when we skip an unhealthy
provider. Avoids spamming the log on bursty sessions while still
giving the user a trail.
"""
now = time.time()
last = _aux_unhealthy_logged_at.get(label, 0.0)
if now - last >= 60:
_aux_unhealthy_logged_at[label] = now
expires_at = _aux_unhealthy_until.get(label, now)
logger.info(
"Auxiliary %s: skipping %s (recently returned payment error, retry in %ds)",
task or "call", label, max(0, int(expires_at - now)),
)
def _reset_aux_unhealthy_cache() -> None:
"""Clear the unhealthy cache. Used by tests and by a future explicit
user trigger (e.g. ``hermes config aux reset``)."""
_aux_unhealthy_until.clear()
_aux_unhealthy_logged_at.clear()
def _is_payment_error(exc: Exception) -> bool:
"""Detect payment/credit/quota exhaustion errors.
@@ -1768,7 +1960,7 @@ def _is_payment_error(exc: Exception) -> bool:
err_lower = str(exc).lower()
# OpenRouter and other providers include "credits" or "afford" in 402 bodies,
# but sometimes wrap them in 429 or other codes.
if status in (402, 429, None):
if status in {402, 429, None}:
if any(kw in err_lower for kw in ("credits", "insufficient funds",
"can only afford", "billing",
"payment required")):
@@ -1817,10 +2009,12 @@ def _is_connection_error(exc: Exception) -> bool:
distinct from API errors (4xx/5xx) which indicate the provider IS
reachable but returned an error.
"""
from openai import APIConnectionError, APITimeoutError
if isinstance(exc, (APIConnectionError, APITimeoutError)):
return True
try:
from openai import APIConnectionError, APITimeoutError
if isinstance(exc, (APIConnectionError, APITimeoutError)):
return True
except ImportError:
pass
# urllib3 / httpx / httpcore connection errors
err_type = type(exc).__name__
if any(kw in err_type for kw in ("Connection", "Timeout", "DNS", "SSL")):
@@ -1830,6 +2024,16 @@ def _is_connection_error(exc: Exception) -> bool:
"connection refused", "name or service not known",
"no route to host", "network is unreachable",
"timed out", "connection reset",
# httpcore / httpx streaming premature-close errors. These surface
# when a proxy or provider drops the connection mid-stream and are
# transient by nature — the request should be retried or rerouted.
# See issue #18458.
"incomplete chunked read",
"peer closed connection",
"response ended prematurely",
"unexpected eof",
"remoteprotocolerror",
"localprotocolerror",
)):
return True
return False
@@ -1908,6 +2112,246 @@ def _evict_cached_clients(provider: str) -> None:
_client_cache.pop(key, None)
def _evict_cached_client_instance(target: Any) -> bool:
"""Drop the cache entry whose stored client is *target*.
Used when a specific cached client has been poisoned (closed httpx
transport after a timeout, broken streaming session, etc.) so the next
auxiliary call rebuilds rather than reusing the dead instance.
Walks both sync and async wrappers (``CodexAuxiliaryClient``,
``AnthropicAuxiliaryClient``, ``AsyncCodexAuxiliaryClient``, etc.) via
their ``_real_client`` attribute so a timeout that closes the underlying
``OpenAI`` (or native provider) client evicts every cached shim that
exposed it. Async wrappers must mirror their sync sibling's
``_real_client`` for this to work — otherwise the sync entry is evicted
but the async entry survives and keeps reusing the dead transport.
Returns True when at least one entry was evicted.
"""
if target is None:
return False
evicted = False
with _client_cache_lock:
for key in list(_client_cache.keys()):
entry = _client_cache.get(key)
if entry is None:
continue
cached = entry[0]
if cached is None:
continue
real = getattr(cached, "_real_client", None)
if cached is target or real is target:
del _client_cache[key]
evicted = True
return evicted
def _pool_cache_hint(
provider: str,
*,
main_runtime: Optional[Dict[str, Any]] = None,
) -> str:
"""Return a stable cache discriminator for pooled providers."""
normalized = _normalize_aux_provider(provider)
if normalized == "auto":
runtime = _normalize_main_runtime(main_runtime)
normalized = _normalize_aux_provider(runtime.get("provider") or _read_main_provider())
if normalized in {"", "auto", "custom"}:
return ""
entry = _peek_pool_entry(normalized)
if entry is None:
return ""
entry_id = str(getattr(entry, "id", "") or "").strip()
if not entry_id:
return ""
return f"{normalized}:{entry_id}"
def _pool_error_context(exc: Exception) -> Dict[str, Any]:
status = getattr(exc, "status_code", None)
payload: Dict[str, Any] = {"message": str(exc)}
if status is not None:
payload["status_code"] = status
return payload
def _recoverable_pool_provider(resolved_provider: str, client: Any) -> Optional[str]:
"""Infer which provider pool can recover the current auxiliary client."""
normalized = _normalize_aux_provider(resolved_provider)
if normalized not in {"", "auto", "custom"}:
return normalized
base = str(getattr(client, "base_url", "") or "")
if base_url_host_matches(base, "chatgpt.com"):
return "openai-codex"
if base_url_host_matches(base, "openrouter.ai"):
return "openrouter"
if base_url_host_matches(base, "inference-api.nousresearch.com"):
return "nous"
if base_url_host_matches(base, "api.anthropic.com"):
return "anthropic"
if base_url_host_matches(base, "api.githubcopilot.com"):
return "copilot"
if base_url_host_matches(base, "api.kimi.com"):
return "kimi-coding"
return None
def _recover_provider_pool(provider: str, exc: Exception) -> bool:
"""Try same-provider credential-pool recovery for auxiliary calls."""
normalized = _normalize_aux_provider(provider)
try:
pool = load_pool(normalized)
except Exception as load_exc:
logger.debug("Auxiliary client: could not load pool for %s recovery: %s", normalized, load_exc)
return False
if not pool or not pool.has_credentials():
return False
status_code = getattr(exc, "status_code", None)
error_context = _pool_error_context(exc)
if _is_auth_error(exc):
refreshed = pool.try_refresh_current()
if refreshed is not None:
_evict_cached_clients(normalized)
return True
next_entry = pool.mark_exhausted_and_rotate(
status_code=status_code if status_code is not None else 401,
error_context=error_context,
)
if next_entry is not None:
_evict_cached_clients(normalized)
return True
return False
if _is_payment_error(exc) or _is_rate_limit_error(exc):
fallback_status = 402 if _is_payment_error(exc) else 429
next_entry = pool.mark_exhausted_and_rotate(
status_code=status_code if status_code is not None else fallback_status,
error_context=error_context,
)
if next_entry is not None:
_evict_cached_clients(normalized)
return True
return False
def _retry_same_provider_sync(
*,
task: Optional[str],
resolved_provider: str,
resolved_model: Optional[str],
resolved_base_url: Optional[str],
resolved_api_key: Optional[str],
resolved_api_mode: Optional[str],
main_runtime: Optional[Dict[str, Any]],
final_model: Optional[str],
messages: list,
temperature: Optional[float],
max_tokens: Optional[int],
tools: Optional[list],
effective_timeout: float,
effective_extra_body: dict,
) -> Any:
if task == "vision":
_, retry_client, retry_model = resolve_vision_provider_client(
provider=resolved_provider,
model=final_model,
base_url=resolved_base_url,
api_key=resolved_api_key,
async_mode=False,
)
else:
retry_client, retry_model = _get_cached_client(
resolved_provider,
resolved_model,
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
main_runtime=main_runtime,
)
if retry_client is None:
raise RuntimeError(
f"Auxiliary {task or 'call'}: provider {resolved_provider} could not be rebuilt after recovery"
)
retry_base = str(getattr(retry_client, "base_url", "") or "")
retry_kwargs = _build_call_kwargs(
resolved_provider,
retry_model or final_model,
messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
timeout=effective_timeout,
extra_body=effective_extra_body,
base_url=retry_base or resolved_base_url,
)
if _is_anthropic_compat_endpoint(resolved_provider, retry_base):
retry_kwargs["messages"] = _convert_openai_images_to_anthropic(retry_kwargs["messages"])
return _validate_llm_response(
retry_client.chat.completions.create(**retry_kwargs), task,
)
async def _retry_same_provider_async(
*,
task: Optional[str],
resolved_provider: str,
resolved_model: Optional[str],
resolved_base_url: Optional[str],
resolved_api_key: Optional[str],
resolved_api_mode: Optional[str],
final_model: Optional[str],
messages: list,
temperature: Optional[float],
max_tokens: Optional[int],
tools: Optional[list],
effective_timeout: float,
effective_extra_body: dict,
) -> Any:
if task == "vision":
_, retry_client, retry_model = resolve_vision_provider_client(
provider=resolved_provider,
model=final_model,
base_url=resolved_base_url,
api_key=resolved_api_key,
async_mode=True,
)
else:
retry_client, retry_model = _get_cached_client(
resolved_provider,
resolved_model,
async_mode=True,
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
)
if retry_client is None:
raise RuntimeError(
f"Auxiliary {task or 'call'}: provider {resolved_provider} could not be rebuilt after recovery"
)
retry_base = str(getattr(retry_client, "base_url", "") or "")
retry_kwargs = _build_call_kwargs(
resolved_provider,
retry_model or final_model,
messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
timeout=effective_timeout,
extra_body=effective_extra_body,
base_url=retry_base or resolved_base_url,
)
if _is_anthropic_compat_endpoint(resolved_provider, retry_base):
retry_kwargs["messages"] = _convert_openai_images_to_anthropic(retry_kwargs["messages"])
return _validate_llm_response(
await retry_client.chat.completions.create(**retry_kwargs), task,
)
def _refresh_provider_credentials(provider: str) -> bool:
"""Refresh short-lived credentials for OAuth-backed auxiliary providers."""
normalized = _normalize_aux_provider(provider)
@@ -1980,6 +2424,10 @@ def _try_payment_fallback(
for label, try_fn in _get_provider_chain():
if label in skip_chain_labels:
continue
if _is_provider_unhealthy(label):
_log_skip_unhealthy(label, task)
tried.append(f"{label} (unhealthy)")
continue
client, model = try_fn()
if client is not None:
logger.info(
@@ -2048,7 +2496,7 @@ def _resolve_auto(main_runtime: Optional[Dict[str, Any]] = None) -> Tuple[Option
main_provider = runtime_provider or _read_main_provider()
main_model = runtime_model or _read_main_model()
if (main_provider and main_model
and main_provider not in ("auto", "")):
and main_provider not in {"auto", ""}):
resolved_provider = main_provider
explicit_base_url = None
explicit_api_key = None
@@ -2056,21 +2504,34 @@ def _resolve_auto(main_runtime: Optional[Dict[str, Any]] = None) -> Tuple[Option
resolved_provider = "custom"
explicit_base_url = runtime_base_url
explicit_api_key = runtime_api_key or None
client, resolved = resolve_provider_client(
resolved_provider,
main_model,
explicit_base_url=explicit_base_url,
explicit_api_key=explicit_api_key,
api_mode=runtime_api_mode or None,
)
if client is not None:
logger.info("Auxiliary auto-detect: using main provider %s (%s)",
main_provider, resolved or main_model)
return client, resolved or main_model
# Skip Step-1 if the main provider was recently 402'd. The unhealthy
# cache TTL bounds how long we bypass it, so a topped-up account
# recovers automatically. If we tried Step-1 anyway, every aux call
# on a depleted main provider would pay one doomed 402 RTT before
# falling to Step-2.
main_chain_label = _normalize_chain_label(resolved_provider)
if main_chain_label and _is_provider_unhealthy(main_chain_label):
_log_skip_unhealthy(main_chain_label)
else:
client, resolved = resolve_provider_client(
resolved_provider,
main_model,
explicit_base_url=explicit_base_url,
explicit_api_key=explicit_api_key,
api_mode=runtime_api_mode or None,
)
if client is not None:
logger.info("Auxiliary auto-detect: using main provider %s (%s)",
main_provider, resolved or main_model)
return client, resolved or main_model
# ── Step 2: aggregator / fallback chain ──────────────────────────────
tried = []
for label, try_fn in _get_provider_chain():
if _is_provider_unhealthy(label):
_log_skip_unhealthy(label)
tried.append(f"{label} (unhealthy)")
continue
client, model = try_fn()
if client is not None:
if tried:
@@ -2696,7 +3157,7 @@ def resolve_provider_client(
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
elif pconfig.auth_type in ("oauth_device_code", "oauth_external"):
elif pconfig.auth_type in {"oauth_device_code", "oauth_external"}:
# OAuth providers — route through their specific try functions
if provider == "nous":
return resolve_provider_client("nous", model, async_mode)
@@ -2805,7 +3266,7 @@ def get_available_vision_backends() -> List[str]:
available: List[str] = []
# 1. Active provider — if the user configured a provider, try it first.
main_provider = _read_main_provider()
if main_provider and main_provider not in ("auto", ""):
if main_provider and main_provider not in {"auto", ""}:
if main_provider in _VISION_AUTO_PROVIDER_ORDER:
if _strict_vision_backend_available(main_provider):
available.append(main_provider)
@@ -2851,7 +3312,7 @@ def resolve_vision_provider_client(
if resolved_base_url:
provider_for_base_override = (
requested if requested and requested not in ("", "auto") else "custom"
requested if requested and requested not in {"", "auto"} else "custom"
)
client, final_model = resolve_provider_client(
provider_for_base_override,
@@ -2879,7 +3340,7 @@ def resolve_vision_provider_client(
# 4. Stop
main_provider = _read_main_provider()
main_model = _read_main_model()
if main_provider and main_provider not in ("auto", ""):
if main_provider and main_provider not in {"auto", ""}:
vision_model = _PROVIDER_VISION_MODELS.get(main_provider, main_model)
if main_provider == "nous":
sync_client, default_model = _resolve_strict_vision_backend(
@@ -3033,7 +3494,8 @@ def _client_cache_key(
) -> tuple:
runtime = _normalize_main_runtime(main_runtime)
runtime_key = tuple(runtime.get(field, "") for field in _MAIN_RUNTIME_FIELDS) if provider == "auto" else ()
return (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key, is_vision)
pool_hint = _pool_cache_hint(provider, main_runtime=main_runtime)
return (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key, is_vision, pool_hint)
def _store_cached_client(cache_key: tuple, client: Any, default_model: Optional[str], *, bound_loop: Any = None) -> None:
@@ -3684,7 +4146,7 @@ def call_llm(
# credentials were found, fail fast instead of silently routing
# through OpenRouter (which causes confusing 404s).
_explicit = (resolved_provider or "").strip().lower()
if _explicit and _explicit not in ("auto", "openrouter", "custom"):
if _explicit and _explicit not in {"auto", "openrouter", "custom"}:
raise RuntimeError(
f"Provider '{_explicit}' is set in config.yaml but no API key "
f"was found. Set the {_explicit.upper()}_API_KEY environment "
@@ -3814,46 +4276,63 @@ def call_llm(
# ── Auth refresh retry ───────────────────────────────────────
if (_is_auth_error(first_err)
and resolved_provider not in ("auto", "", None)
and resolved_provider not in {"auto", "", None}
and not client_is_nous):
if _refresh_provider_credentials(resolved_provider):
logger.info(
"Auxiliary %s: refreshed %s credentials after auth error, retrying",
task or "call", resolved_provider,
)
retry_client, retry_model = (
resolve_vision_provider_client(
provider=resolved_provider,
model=final_model,
async_mode=False,
)[1:]
if task == "vision"
else _get_cached_client(
resolved_provider,
resolved_model,
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
main_runtime=main_runtime,
)
return _retry_same_provider_sync(
task=task,
resolved_provider=resolved_provider,
resolved_model=resolved_model,
resolved_base_url=resolved_base_url,
resolved_api_key=resolved_api_key,
resolved_api_mode=resolved_api_mode,
main_runtime=main_runtime,
final_model=final_model,
messages=messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
effective_timeout=effective_timeout,
effective_extra_body=effective_extra_body,
)
if retry_client is not None:
retry_kwargs = _build_call_kwargs(
resolved_provider,
retry_model or final_model,
messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
timeout=effective_timeout,
extra_body=effective_extra_body,
base_url=resolved_base_url,
)
_retry_base = str(getattr(retry_client, "base_url", "") or "")
if _is_anthropic_compat_endpoint(resolved_provider, _retry_base):
retry_kwargs["messages"] = _convert_openai_images_to_anthropic(retry_kwargs["messages"])
# ── Same-provider credential-pool recovery ─────────────────────
pool_provider = _recoverable_pool_provider(resolved_provider, client)
if pool_provider and (_is_auth_error(first_err) or _is_payment_error(first_err) or _is_rate_limit_error(first_err)):
recovery_err = first_err
if _is_rate_limit_error(first_err):
try:
return _validate_llm_response(
retry_client.chat.completions.create(**retry_kwargs), task)
client.chat.completions.create(**kwargs), task)
except Exception as retry_err:
if not (_is_auth_error(retry_err) or _is_payment_error(retry_err) or _is_rate_limit_error(retry_err)):
raise
recovery_err = retry_err
if _recover_provider_pool(pool_provider, recovery_err):
logger.info(
"Auxiliary %s: recovered %s via credential-pool rotation after %s",
task or "call", pool_provider, type(recovery_err).__name__,
)
return _retry_same_provider_sync(
task=task,
resolved_provider=resolved_provider,
resolved_model=resolved_model,
resolved_base_url=resolved_base_url,
resolved_api_key=resolved_api_key,
resolved_api_mode=resolved_api_mode,
main_runtime=main_runtime,
final_model=final_model,
messages=messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
effective_timeout=effective_timeout,
effective_extra_body=effective_extra_body,
)
# ── Payment / credit exhaustion fallback ──────────────────────
# When the resolved provider returns 402 or a credit-related error,
@@ -3880,10 +4359,17 @@ def call_llm(
# Only try alternative providers when the user didn't explicitly
# configure this task's provider. Explicit provider = hard constraint;
# auto (the default) = best-effort fallback chain. (#7559)
is_auto = resolved_provider in ("auto", "", None)
is_auto = resolved_provider in {"auto", "", None}
if should_fallback and is_auto:
if _is_payment_error(first_err):
reason = "payment error"
# Resolve the actual provider label (resolved_provider may be
# "auto"; the client's base_url tells us which backend got the
# 402). Mark THAT label unhealthy so subsequent aux calls
# skip it instead of paying another doomed RTT.
_mark_provider_unhealthy(
_recoverable_pool_provider(resolved_provider, client) or resolved_provider
)
elif _is_rate_limit_error(first_err):
reason = "rate limit"
else:
@@ -3901,6 +4387,17 @@ def call_llm(
base_url=str(getattr(fb_client, "base_url", "") or ""))
return _validate_llm_response(
fb_client.chat.completions.create(**fb_kwargs), task)
# Connection/timeout errors leave the cached client poisoned (closed
# httpx transport, half-read stream, dead async loop). Drop it from
# the cache regardless of whether we found a fallback above so the
# next auxiliary call rebuilds a fresh client instead of reusing the
# dead one. See issue #23432.
if _is_connection_error(first_err):
try:
_evict_cached_client_instance(client)
except Exception:
logger.debug("Auxiliary: cache eviction after connection error failed",
exc_info=True)
raise
@@ -4018,7 +4515,7 @@ async def async_call_llm(
)
if client is None:
_explicit = (resolved_provider or "").strip().lower()
if _explicit and _explicit not in ("auto", "openrouter", "custom"):
if _explicit and _explicit not in {"auto", "openrouter", "custom"}:
raise RuntimeError(
f"Provider '{_explicit}' is set in config.yaml but no API key "
f"was found. Set the {_explicit.upper()}_API_KEY environment "
@@ -4129,45 +4626,61 @@ async def async_call_llm(
# ── Auth refresh retry (mirrors sync call_llm) ───────────────
if (_is_auth_error(first_err)
and resolved_provider not in ("auto", "", None)
and resolved_provider not in {"auto", "", None}
and not client_is_nous):
if _refresh_provider_credentials(resolved_provider):
logger.info(
"Auxiliary %s (async): refreshed %s credentials after auth error, retrying",
task or "call", resolved_provider,
)
if task == "vision":
_, retry_client, retry_model = resolve_vision_provider_client(
provider=resolved_provider,
model=final_model,
async_mode=True,
)
else:
retry_client, retry_model = _get_cached_client(
resolved_provider,
resolved_model,
async_mode=True,
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
)
if retry_client is not None:
retry_kwargs = _build_call_kwargs(
resolved_provider,
retry_model or final_model,
messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
timeout=effective_timeout,
extra_body=effective_extra_body,
base_url=resolved_base_url,
)
_retry_base = str(getattr(retry_client, "base_url", "") or "")
if _is_anthropic_compat_endpoint(resolved_provider, _retry_base):
retry_kwargs["messages"] = _convert_openai_images_to_anthropic(retry_kwargs["messages"])
return await _retry_same_provider_async(
task=task,
resolved_provider=resolved_provider,
resolved_model=resolved_model,
resolved_base_url=resolved_base_url,
resolved_api_key=resolved_api_key,
resolved_api_mode=resolved_api_mode,
final_model=final_model,
messages=messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
effective_timeout=effective_timeout,
effective_extra_body=effective_extra_body,
)
# ── Same-provider credential-pool recovery (mirrors sync) ─────
pool_provider = _recoverable_pool_provider(resolved_provider, client)
if pool_provider and (_is_auth_error(first_err) or _is_payment_error(first_err) or _is_rate_limit_error(first_err)):
recovery_err = first_err
if _is_rate_limit_error(first_err):
try:
return _validate_llm_response(
await retry_client.chat.completions.create(**retry_kwargs), task)
await client.chat.completions.create(**kwargs), task)
except Exception as retry_err:
if not (_is_auth_error(retry_err) or _is_payment_error(retry_err) or _is_rate_limit_error(retry_err)):
raise
recovery_err = retry_err
if _recover_provider_pool(pool_provider, recovery_err):
logger.info(
"Auxiliary %s (async): recovered %s via credential-pool rotation after %s",
task or "call", pool_provider, type(recovery_err).__name__,
)
return await _retry_same_provider_async(
task=task,
resolved_provider=resolved_provider,
resolved_model=resolved_model,
resolved_base_url=resolved_base_url,
resolved_api_key=resolved_api_key,
resolved_api_mode=resolved_api_mode,
final_model=final_model,
messages=messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
effective_timeout=effective_timeout,
effective_extra_body=effective_extra_body,
)
# ── Payment / connection / rate-limit fallback (mirrors sync call_llm) ──
should_fallback = (
@@ -4175,10 +4688,13 @@ async def async_call_llm(
or _is_connection_error(first_err)
or _is_rate_limit_error(first_err)
)
is_auto = resolved_provider in ("auto", "", None)
is_auto = resolved_provider in {"auto", "", None}
if should_fallback and is_auto:
if _is_payment_error(first_err):
reason = "payment error"
_mark_provider_unhealthy(
_recoverable_pool_provider(resolved_provider, client) or resolved_provider
)
elif _is_rate_limit_error(first_err):
reason = "rate limit"
else:
@@ -4202,4 +4718,12 @@ async def async_call_llm(
fb_kwargs["model"] = async_fb_model
return _validate_llm_response(
await async_fb.chat.completions.create(**fb_kwargs), task)
# Mirror the sync path: drop poisoned clients on connection/timeout
# so the next aux call rebuilds. See issue #23432.
if _is_connection_error(first_err):
try:
_evict_cached_client_instance(client)
except Exception:
logger.debug("Auxiliary (async): cache eviction after connection error failed",
exc_info=True)
raise
+52 -1
View File
@@ -410,10 +410,29 @@ def _chat_messages_to_responses_input(messages: List[Dict[str, Any]]) -> List[Di
call_id = raw_tool_call_id.strip()
if not isinstance(call_id, str) or not call_id.strip():
continue
# Multimodal tool result: convert OpenAI-style content list into
# Responses ``function_call_output.output`` array. The Responses
# API accepts ``output`` as either a string or an array of
# ``input_text``/``input_image`` items. See
# https://developers.openai.com/api/reference/python/resources/responses/.
tool_content = msg.get("content")
output_value: Any
if isinstance(tool_content, list):
converted = _chat_content_to_responses_parts(
tool_content, role="user",
)
if converted:
output_value = converted
else:
output_value = ""
else:
output_value = str(tool_content or "")
items.append({
"type": "function_call_output",
"call_id": call_id,
"output": str(msg.get("content", "") or ""),
"output": output_value,
})
return items
@@ -466,6 +485,38 @@ def _preflight_codex_input_items(raw_items: Any) -> List[Dict[str, Any]]:
output = item.get("output", "")
if output is None:
output = ""
# Output may be a string OR an array of structured content
# items (input_text / input_image) for multimodal tool results.
# Both shapes are accepted by the Responses API. We preserve
# the array form when present.
if isinstance(output, list):
# Validate each item is a recognised content shape; drop
# anything else to avoid 4xx from the API.
cleaned: List[Dict[str, Any]] = []
for part in output:
if not isinstance(part, dict):
continue
ptype = part.get("type")
if ptype == "input_text":
text = part.get("text")
if isinstance(text, str) and text:
cleaned.append({"type": "input_text", "text": text})
elif ptype == "input_image":
url = part.get("image_url")
if isinstance(url, str) and url:
entry: Dict[str, Any] = {"type": "input_image", "image_url": url}
detail = part.get("detail")
if isinstance(detail, str) and detail.strip():
entry["detail"] = detail.strip()
cleaned.append(entry)
normalized.append(
{
"type": "function_call_output",
"call_id": call_id.strip(),
"output": cleaned if cleaned else "",
}
)
continue
if not isinstance(output, str):
output = str(output)
+24 -15
View File
@@ -23,7 +23,7 @@ import re
import time
from typing import Any, Dict, List, Optional
from agent.auxiliary_client import call_llm
from agent.auxiliary_client import call_llm, _is_connection_error
from agent.context_engine import ContextEngine
from agent.model_metadata import (
MINIMUM_CONTEXT_LENGTH,
@@ -167,7 +167,7 @@ def _strip_image_parts_from_parts(parts: Any) -> Any:
out.append(part)
continue
ptype = part.get("type")
if ptype in ("image", "image_url", "input_image"):
if ptype in {"image", "image_url", "input_image"}:
had_image = True
out.append({"type": "text", "text": "[screenshot removed to save context]"})
else:
@@ -274,8 +274,8 @@ def _summarize_tool_result(tool_name: str, tool_args: str, tool_content: str) ->
mode = args.get("mode", "replace")
return f"[patch] {mode} in {path} ({content_len:,} chars result)"
if tool_name in ("browser_navigate", "browser_click", "browser_snapshot",
"browser_type", "browser_scroll", "browser_vision"):
if tool_name in {"browser_navigate", "browser_click", "browser_snapshot",
"browser_type", "browser_scroll", "browser_vision"}:
url = args.get("url", "")
ref = args.get("ref", "")
detail = f" {url}" if url else (f" ref={ref}" if ref else "")
@@ -304,7 +304,7 @@ def _summarize_tool_result(tool_name: str, tool_args: str, tool_content: str) ->
code_preview += "..."
return f"[execute_code] `{code_preview}` ({line_count} lines output)"
if tool_name in ("skill_view", "skills_list", "skill_manage"):
if tool_name in {"skill_view", "skills_list", "skill_manage"}:
name = args.get("name", "?")
return f"[{tool_name}] name={name} ({content_len:,} chars)"
@@ -979,13 +979,13 @@ The user has requested that this compaction PRIORITISE preserving all informatio
_status = getattr(e, "status_code", None) or getattr(getattr(e, "response", None), "status_code", None)
_err_str = str(e).lower()
_is_model_not_found = (
_status in (404, 503)
_status in {404, 503}
or "model_not_found" in _err_str
or "does not exist" in _err_str
or "no available channel" in _err_str
)
_is_timeout = (
_status in (408, 429, 502, 504)
_status in {408, 429, 502, 504}
or "timeout" in _err_str
)
# Non-JSON / malformed-body responses from misconfigured providers
@@ -1000,6 +1000,14 @@ The user has requested that this compaction PRIORITISE preserving all informatio
isinstance(e, json.JSONDecodeError)
or "expecting value" in _err_str
)
# httpcore / httpx streaming premature-close errors surface as
# ConnectionError subclasses or plain Exception with characteristic
# substrings ("incomplete chunked read", "peer closed connection",
# "response ended prematurely", "unexpected eof"). These are
# transient network events; treat them like a timeout so we fall
# back to the main model instead of entering a 60-second cooldown.
# See issue #18458.
_is_streaming_closed = _is_connection_error(e)
if _is_json_decode and not _is_model_not_found and not _is_timeout:
logger.error(
"Context compression failed: auxiliary LLM returned a "
@@ -1012,7 +1020,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
e,
)
if (
(_is_model_not_found or _is_timeout or _is_json_decode)
(_is_model_not_found or _is_timeout or _is_json_decode or _is_streaming_closed)
and self.summary_model
and self.summary_model != self.model
and not getattr(self, "_summary_model_fallen_back", False)
@@ -1021,6 +1029,8 @@ The user has requested that this compaction PRIORITISE preserving all informatio
_reason = "returned invalid JSON"
elif _is_model_not_found:
_reason = "unavailable"
elif _is_streaming_closed:
_reason = "closed stream prematurely"
else:
_reason = "timed out"
self._fallback_to_main_for_compression(e, _reason)
@@ -1043,10 +1053,10 @@ The user has requested that this compaction PRIORITISE preserving all informatio
self._fallback_to_main_for_compression(e, "failed")
return self._generate_summary(turns_to_summarize, focus_topic=focus_topic)
# Transient errors (timeout, rate limit, network, JSON decode) —
# shorter cooldown for JSON decode since the body shape can flip
# back to valid quickly when an upstream proxy recovers.
_transient_cooldown = 30 if _is_json_decode else 60
# Transient errors (timeout, rate limit, network, JSON decode,
# streaming premature-close) — shorter cooldown for JSON decode and
# streaming-closed since those conditions can self-resolve quickly.
_transient_cooldown = 30 if (_is_json_decode or _is_streaming_closed) else 60
self._summary_failure_cooldown_until = time.monotonic() + _transient_cooldown
err_text = str(e).strip() or e.__class__.__name__
if len(err_text) > 220:
@@ -1306,8 +1316,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
# Ensure we protect at least min_tail messages
fallback_cut = n - min_tail
if cut_idx > fallback_cut:
cut_idx = fallback_cut
cut_idx = min(cut_idx, fallback_cut)
# If the token budget would protect everything (small conversations),
# force a cut after the head so compression can still remove middle turns.
@@ -1470,7 +1479,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
first_tail_role = messages[compress_end].get("role", "user") if compress_end < n_messages else "user"
# Pick a role that avoids consecutive same-role with both neighbors.
# Priority: avoid colliding with head (already committed), then tail.
if last_head_role in ("assistant", "tool"):
if last_head_role in {"assistant", "tool"}:
summary_role = "user"
else:
summary_role = "assistant"
+1 -1
View File
@@ -149,7 +149,7 @@ class PooledCredential:
}
result: Dict[str, Any] = {}
for field_def in fields(self):
if field_def.name in ("provider", "extra"):
if field_def.name in {"provider", "extra"}:
continue
value = getattr(self, field_def.name)
if value is not None or field_def.name in _ALWAYS_EMIT:
+107
View File
@@ -72,6 +72,7 @@ def _default_state() -> Dict[str, Any]:
"last_run_at": None,
"last_run_duration_seconds": None,
"last_run_summary": None,
"last_run_summary_shown_at": None,
"last_report_path": None,
"paused": False,
"run_count": 0,
@@ -876,6 +877,96 @@ def _reconcile_classification(
return {"consolidated": consolidated, "pruned": pruned}
def _build_rename_summary(
*,
before_names: Set[str],
after_report: List[Dict[str, Any]],
tool_calls: List[Dict[str, Any]],
model_final: str,
) -> str:
"""Format the user-visible rename map for a curator run.
Renders the "where did my skills go?" lines that get appended to the
`final_summary` string fed to gateway/CLI receivers. Empty string when
nothing was archived this run most ticks are no-op and shouldn't add
extra log noise.
Format::
archived 4 skill(s):
pdf-extraction document-tools
docx-extraction document-tools
flaky-thing pruned (stale)
old-utility spreadsheet-ops
full report: hermes curator status
keep an umbrella stable: hermes curator pin document-tools
Cap is 10 entries so a 50-skill consolidation doesn't blow up
agent.log; the full list is always in REPORT.md. The pin hint only
appears when at least one consolidation produced an umbrella worth
pinning (pruned-only runs skip it).
"""
after_by_name = {r.get("name"): r for r in after_report if isinstance(r, dict)}
after_names = set(after_by_name.keys())
removed = sorted(before_names - after_names)
added = sorted(after_names - before_names)
if not removed:
return ""
heuristic = _classify_removed_skills(
removed=removed,
added=added,
after_names=after_names,
tool_calls=tool_calls,
)
model_block = _parse_structured_summary(model_final)
destinations = set(after_names) | set(added)
absorbed_declarations = _extract_absorbed_into_declarations(tool_calls)
classification = _reconcile_classification(
removed=removed,
heuristic=heuristic,
model_block=model_block,
destinations=destinations,
absorbed_declarations=absorbed_declarations,
)
consolidated = classification["consolidated"]
pruned = classification["pruned"]
SHOW = 10
lines: List[str] = []
total = len(consolidated) + len(pruned)
lines.append(f"archived {total} skill(s):")
shown = 0
for entry in consolidated:
if shown >= SHOW:
break
name = entry.get("name", "?")
into = entry.get("into", "?")
lines.append(f"{name}{into}")
shown += 1
for entry in pruned:
if shown >= SHOW:
break
name = entry.get("name", "?") if isinstance(entry, dict) else str(entry)
lines.append(f"{name} — pruned (stale)")
shown += 1
if total > SHOW:
lines.append(f" … and {total - SHOW} more")
lines.append("full report: hermes curator status")
# Pin hint — only surface it when there's actually a destination skill
# worth pinning. The umbrella skills that absorbed content are the natural
# candidates: pinning one tells future curator runs to leave it alone.
# Pruned-only runs don't get this hint (nothing surviving to pin).
if consolidated:
umbrellas = sorted({e.get("into") for e in consolidated if e.get("into")})
if umbrellas:
example = umbrellas[0]
lines.append(
f"keep an umbrella stable: hermes curator pin {example}"
)
return "\n".join(lines)
def _write_run_report(
*,
started_at: datetime,
@@ -1398,6 +1489,22 @@ def run_curator_review(
"error": str(e),
}
# Append the rename map (`old-name → umbrella`) to the user-visible
# summary so people don't have to dig into REPORT.md to find out where
# their skills went. Best-effort: classification is pure but never
# block the run on a formatting issue.
try:
rename_lines = _build_rename_summary(
before_names=before_names,
after_report=skill_usage.agent_created_report(),
tool_calls=llm_meta.get("tool_calls", []) or [],
model_final=llm_meta.get("final", "") or "",
)
if rename_lines:
final_summary = f"{final_summary}\n{rename_lines}"
except Exception as e:
logger.debug("Curator rename summary build failed: %s", e, exc_info=True)
elapsed = (datetime.now(timezone.utc) - start).total_seconds()
state2 = load_state()
state2["last_run_duration_seconds"] = elapsed
+30 -8
View File
@@ -83,7 +83,7 @@ class ClassifiedError:
@property
def is_auth(self) -> bool:
return self.reason in (FailoverReason.auth, FailoverReason.auth_permanent)
return self.reason in {FailoverReason.auth, FailoverReason.auth_permanent}
@@ -254,6 +254,20 @@ _THINKING_SIG_PATTERNS = [
"signature", # Combined with "thinking" check
]
# Message-string patterns that indicate a provider-side timeout even when
# the exception type is generic (e.g. RuntimeError from a local shim that
# wraps a subprocess timeout). Checked before the type-based transport
# heuristics so custom-provider "timed out" errors don't fall through to
# the unknown bucket and get misreported as empty responses.
_TIMEOUT_MESSAGE_PATTERNS = [
"timed out",
"turn timed out",
"request timed out",
"deadline exceeded",
"operation timed out",
"upstream timed out",
]
# Transport error type names
_TRANSPORT_ERROR_TYPES = frozenset({
"ReadTimeout", "ConnectTimeout", "PoolTimeout",
@@ -674,10 +688,10 @@ def _classify_by_status(
result_fn=result_fn,
)
if status_code in (500, 502):
if status_code in {500, 502}:
return result_fn(FailoverReason.server_error, retryable=True)
if status_code in (503, 529):
if status_code in {503, 529}:
return result_fn(FailoverReason.overloaded, retryable=True)
# Other 4xx — non-retryable
@@ -796,7 +810,7 @@ def _classify_400(
# Responses API (and some providers) use flat body: {"message": "..."}
if not err_body_msg:
err_body_msg = str(body.get("message") or "").strip().lower()
is_generic = len(err_body_msg) < 30 or err_body_msg in ("error", "")
is_generic = len(err_body_msg) < 30 or err_body_msg in {"error", ""}
# Absolute token/message-count thresholds are only a proxy for smaller
# context windows. Large-context sessions can have many messages while
# still being far below their actual token budget.
@@ -827,14 +841,14 @@ def _classify_by_error_code(
"""Classify by structured error codes from the response body."""
code_lower = error_code.lower()
if code_lower in ("resource_exhausted", "throttled", "rate_limit_exceeded"):
if code_lower in {"resource_exhausted", "throttled", "rate_limit_exceeded"}:
return result_fn(
FailoverReason.rate_limit,
retryable=True,
should_rotate_credential=True,
)
if code_lower in ("insufficient_quota", "billing_not_active", "payment_required"):
if code_lower in {"insufficient_quota", "billing_not_active", "payment_required"}:
return result_fn(
FailoverReason.billing,
retryable=False,
@@ -842,14 +856,14 @@ def _classify_by_error_code(
should_fallback=True,
)
if code_lower in ("model_not_found", "model_not_available", "invalid_model"):
if code_lower in {"model_not_found", "model_not_available", "invalid_model"}:
return result_fn(
FailoverReason.model_not_found,
retryable=False,
should_fallback=True,
)
if code_lower in ("context_length_exceeded", "max_tokens_exceeded"):
if code_lower in {"context_length_exceeded", "max_tokens_exceeded"}:
return result_fn(
FailoverReason.context_overflow,
retryable=True,
@@ -963,6 +977,14 @@ def _classify_by_message(
should_fallback=True,
)
# Timeout message patterns — generic exception types (e.g. RuntimeError)
# raised by local shims or custom providers that internally wrap a
# subprocess/HTTP timeout. Classified as transport timeout so the retry
# loop rebuilds the client instead of treating the turn as an empty
# model response.
if any(p in error_msg for p in _TIMEOUT_MESSAGE_PATTERNS):
return result_fn(FailoverReason.timeout, retryable=True)
return None
+1 -1
View File
@@ -77,7 +77,7 @@ def _coerce_content_to_text(content: Any) -> str:
if p.get("type") == "text" and isinstance(p.get("text"), str):
pieces.append(p["text"])
# Multimodal (image_url, etc.) — stub for now; log and skip
elif p.get("type") in ("image_url", "input_audio"):
elif p.get("type") in {"image_url", "input_audio"}:
logger.debug("Dropping multimodal part (not yet supported): %s", p.get("type"))
return "\n".join(pieces)
return str(content)
+6
View File
@@ -945,6 +945,12 @@ class AsyncGeminiNativeClient:
self.api_key = sync_client.api_key
self.base_url = sync_client.base_url
self.chat = _AsyncGeminiChatNamespace(self)
# Expose the underlying sync client as _real_client so the auxiliary
# cache's eviction-by-leaf-client helper (#23482) can find and drop
# this async entry when the sync GeminiNativeClient is poisoned.
# GeminiNativeClient is itself the leaf (no OpenAI client beneath
# it), so we point at the sync_client directly.
self._real_client = sync_client
async def _create_chat_completion(self, **kwargs: Any) -> Any:
stream = bool(kwargs.get("stream"))
+29 -4
View File
@@ -39,20 +39,45 @@ from typing import Any
logger = logging.getLogger(__name__)
SUPPORTED_LANGUAGES: tuple[str, ...] = ("en", "zh", "ja", "de", "es", "fr", "tr", "uk")
SUPPORTED_LANGUAGES: tuple[str, ...] = (
"en", "zh", "zh-hant", "ja", "de", "es", "fr", "tr", "uk",
"af", "ko", "it", "ga", "pt", "ru", "hu",
)
DEFAULT_LANGUAGE = "en"
# Accept a few natural aliases so users who type "chinese" / "zh-CN" / "jp"
# get the right catalog instead of silently falling back to English.
_LANGUAGE_ALIASES: dict[str, str] = {
"english": "en", "en-us": "en", "en-gb": "en",
"chinese": "zh", "mandarin": "zh", "zh-cn": "zh", "zh-tw": "zh", "zh-hans": "zh", "zh-hant": "zh",
# Simplified Chinese — explicit codes route here; bare "chinese" / "mandarin"
# also default to Simplified since that's the larger user base.
"chinese": "zh", "mandarin": "zh", "zh-cn": "zh", "zh-hans": "zh", "zh-sg": "zh",
# Traditional Chinese — distinct catalog. Cover Taiwan / Hong Kong / Macau
# locale tags plus the common "traditional" alias.
"traditional-chinese": "zh-hant", "traditional_chinese": "zh-hant",
"zh-tw": "zh-hant", "zh-hk": "zh-hant", "zh-mo": "zh-hant",
"japanese": "ja", "jp": "ja", "ja-jp": "ja",
"german": "de", "deutsch": "de", "de-de": "de",
"spanish": "es", "español": "es", "espanol": "es", "es-es": "es", "es-mx": "es",
"german": "de", "deutsch": "de", "de-de": "de", "de-at": "de", "de-ch": "de",
"spanish": "es", "español": "es", "espanol": "es", "es-es": "es", "es-mx": "es", "es-ar": "es",
"french": "fr", "français": "fr", "france": "fr", "fr-fr": "fr", "fr-be": "fr", "fr-ca": "fr", "fr-ch": "fr",
"ukrainian": "uk", "ukrainisch": "uk", "українська": "uk", "uk-ua": "uk", "ua": "uk",
"turkish": "tr", "türkçe": "tr", "tr-tr": "tr",
# Afrikaans — South African Dutch-derived language; "af-ZA" is the common BCP-47 tag.
"afrikaans": "af", "af-za": "af",
# Korean
"korean": "ko", "한국어": "ko", "ko-kr": "ko",
# Italian
"italian": "it", "italiano": "it", "it-it": "it", "it-ch": "it",
# Irish (Gaeilge) — ga is the BCP-47 code
"irish": "ga", "gaeilge": "ga", "ga-ie": "ga",
# Portuguese — bare "portuguese" routes to European Portuguese; pt-br
# is in the same family but rendered identically here (no separate br catalog).
"portuguese": "pt", "português": "pt", "portugues": "pt",
"pt-pt": "pt", "pt-br": "pt", "brazilian": "pt", "brasileiro": "pt",
# Russian
"russian": "ru", "русский": "ru", "ru-ru": "ru",
# Hungarian
"hungarian": "hu", "magyar": "hu", "hu-hu": "hu",
}
_catalog_cache: dict[str, dict[str, str]] = {}
+4 -4
View File
@@ -76,7 +76,7 @@ def _explicit_aux_vision_override(cfg: Optional[Dict[str, Any]]) -> bool:
base_url = str(vision.get("base_url") or "").strip()
# "auto" / "" / blank = not explicit
if provider in ("", "auto") and not model and not base_url:
if provider in {"", "auto"} and not model and not base_url:
return False
return True
@@ -163,7 +163,7 @@ def _sniff_mime_from_bytes(raw: bytes) -> Optional[str]:
if raw.startswith(b"\xff\xd8\xff"):
return "image/jpeg"
# GIF87a / GIF89a
if raw[:6] in (b"GIF87a", b"GIF89a"):
if raw[:6] in {b"GIF87a", b"GIF89a"}:
return "image/gif"
# WEBP: "RIFF" .... "WEBP"
if len(raw) >= 12 and raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
@@ -172,9 +172,9 @@ def _sniff_mime_from_bytes(raw: bytes) -> Optional[str]:
if raw.startswith(b"BM"):
return "image/bmp"
# HEIC/HEIF: ftypheic / ftypheix / ftypmif1 / ftypmsf1 etc.
if len(raw) >= 12 and raw[4:8] == b"ftyp" and raw[8:12] in (
if len(raw) >= 12 and raw[4:8] == b"ftyp" and raw[8:12] in {
b"heic", b"heix", b"hevc", b"hevx", b"mif1", b"msf1", b"heim", b"heis",
):
}:
return "image/heic"
return None
+309
View File
@@ -0,0 +1,309 @@
"""CJK/wide-character-aware re-alignment of model-emitted markdown tables.
Models pad markdown tables assuming each character occupies one terminal
cell. CJK glyphs and most emoji render as two cells, so the model's
spacing collapses into drift the moment a table reaches a real terminal
header pipes line up, every body row drifts right by N cells per CJK
char.
This module rebuilds row padding using ``wcwidth.wcswidth`` (display
columns), preserving the table's pipes and dashes so it still reads as a
plain-text table in ``strip`` / unrendered display modes. Standard Rich
markdown rendering already aligns CJK correctly inside a wide enough
panel; this helper is for the paths that print the model's text more or
less verbatim.
The helper is deliberately conservative:
* Only contiguous ``| ... |`` blocks with a divider line are rewritten.
* Anything that does not look like a table is passed through unchanged.
* Single-line / mid-stream fragments are left alone callers buffer
table rows and flush them once the block is complete.
There is a small, intentional caveat: ``wcwidth`` returns ``-1`` for some
emoji-with-variation-selector sequences (e.g. ````); we clamp those to
0 so they do not corrupt the column width math. The 1-cell drift on
those specific glyphs is preferable to silently widening every table
that contains one.
"""
from __future__ import annotations
import re
from typing import List
from wcwidth import wcswidth
__all__ = [
"is_table_divider",
"looks_like_table_row",
"realign_markdown_tables",
"split_table_row",
]
_DIVIDER_CELL_RE = re.compile(r"^\s*:?-{3,}:?\s*$")
_MIN_COL_WIDTH = 3 # matches the divider's minimum dash run.
def _disp_width(s: str) -> int:
"""``wcswidth`` clamped to a non-negative integer.
``wcswidth`` returns ``-1`` when it encounters a control char or an
unknown sequence; treat those as zero-width rather than letting a
negative number flow into ``max`` and break the column-width math.
"""
w = wcswidth(s)
return w if w > 0 else 0
def _pad_to_width(s: str, target: int) -> str:
return s + " " * max(0, target - _disp_width(s))
def split_table_row(row: str) -> List[str]:
"""Split ``| a | b | c |`` into ``["a", "b", "c"]`` with trims."""
s = row.strip()
if s.startswith("|"):
s = s[1:]
if s.endswith("|"):
s = s[:-1]
return [c.strip() for c in s.split("|")]
def is_table_divider(row: str) -> bool:
"""True when ``row`` is a markdown table separator line."""
cells = split_table_row(row)
return len(cells) > 1 and all(_DIVIDER_CELL_RE.match(c) for c in cells)
def looks_like_table_row(row: str) -> bool:
"""True when ``row`` could plausibly be a markdown table row.
Used by streaming callers to decide whether to buffer an in-flight
line. We are intentionally permissive here the realigner itself
only rewrites blocks that are accompanied by a divider, so a false
positive here at most delays the print of one line.
"""
if "|" not in row:
return False
stripped = row.strip()
if not stripped:
return False
# A leading pipe is the strongest signal; without it we still allow
# rows with at least two pipes so models that omit the leading pipe
# don't slip past us.
if stripped.startswith("|"):
return True
return stripped.count("|") >= 2
def _render_block(rows: List[List[str]], available_width: int | None = None) -> List[str]:
"""Render ``rows`` (header + body, divider implied) at uniform widths.
If ``available_width`` is given and the rebuilt horizontal table
would exceed it, fall back to a vertical key-value rendering so
rows do not soft-wrap mid-cell terminal soft-wrap destroys
column alignment visually even when the underlying bytes are
perfectly padded, which is exactly the "tables look broken"
user report this code path is meant to address.
"""
ncols = max(len(r) for r in rows)
rows = [r + [""] * (ncols - len(r)) for r in rows]
widths = [
max(_MIN_COL_WIDTH, *(_disp_width(r[c]) for r in rows))
for c in range(ncols)
]
# Total horizontal width for the rendered row:
# `| ` + cell + ` ` for each column, plus the final closing `|`.
horizontal_width = sum(widths) + 3 * ncols + 1
if available_width is not None and horizontal_width > max(available_width, 20):
return _render_vertical(rows, ncols, available_width)
def _row(cells: List[str]) -> str:
return (
"| "
+ " | ".join(_pad_to_width(c, widths[k]) for k, c in enumerate(cells))
+ " |"
)
out = [_row(rows[0])]
out.append("|" + "|".join("-" * (w + 2) for w in widths) + "|")
for r in rows[1:]:
out.append(_row(r))
return out
def _wrap_to_width(text: str, width: int) -> List[str]:
"""Soft-wrap ``text`` at word boundaries to fit ``width`` display cells.
Falls back to hard-breaking the longest word if a single token is
wider than ``width``. Empty input yields a single empty string so
the caller's row count stays predictable.
"""
if width <= 0 or not text:
return [text]
words = text.split()
if not words:
return [""]
lines: List[str] = []
current = ""
current_w = 0
def _hard_break(word: str, w: int) -> List[str]:
out: List[str] = []
buf = ""
bw = 0
for ch in word:
cw = _disp_width(ch) or 1
if bw + cw > w and buf:
out.append(buf)
buf = ch
bw = cw
else:
buf += ch
bw += cw
if buf:
out.append(buf)
return out
for word in words:
ww = _disp_width(word)
if not current:
if ww <= width:
current = word
current_w = ww
else:
pieces = _hard_break(word, width)
lines.extend(pieces[:-1])
current = pieces[-1] if pieces else ""
current_w = _disp_width(current)
continue
if current_w + 1 + ww <= width:
current += " " + word
current_w += 1 + ww
else:
lines.append(current)
if ww <= width:
current = word
current_w = ww
else:
pieces = _hard_break(word, width)
lines.extend(pieces[:-1])
current = pieces[-1] if pieces else ""
current_w = _disp_width(current)
if current:
lines.append(current)
return lines or [""]
def _render_vertical(
rows: List[List[str]], ncols: int, available_width: int
) -> List[str]:
"""Render a too-wide table as vertical ``Header: value`` rows.
Mirrors Claude Code's narrow-terminal fallback in
``MarkdownTable.tsx``: each body row becomes a small block of
``Header: cell-value`` lines (continuation lines indented two
spaces) separated by a thin ```` divider between rows. Keeps
every line narrower than ``available_width`` so the terminal does
not soft-wrap mid-cell.
"""
if not rows:
return []
headers = rows[0] + [""] * (ncols - len(rows[0]))
body = rows[1:]
labels = [h or f"Column {i + 1}" for i, h in enumerate(headers)]
sep_width = max(20, min(40, available_width - 2)) if available_width else 30
separator = "" * sep_width
indent = " "
indent_w = _disp_width(indent)
out: List[str] = []
for ri, row in enumerate(body):
if ri > 0:
out.append(separator)
for ci in range(ncols):
label = labels[ci]
value = row[ci] if ci < len(row) else ""
label_w = _disp_width(label)
first_budget = max(10, available_width - label_w - 2)
cont_budget = max(10, available_width - indent_w)
if not value:
out.append(f"{label}:")
continue
wrapped = _wrap_to_width(value, first_budget)
out.append(f"{label}: {wrapped[0]}")
if len(wrapped) > 1:
# Re-flow continuation text at the wider continuation
# budget — words split across the narrower first-line
# budget should re-pack greedily for the rest.
cont_text = " ".join(wrapped[1:])
for cl in _wrap_to_width(cont_text, cont_budget):
if cl.strip():
out.append(f"{indent}{cl}")
return out
def realign_markdown_tables(text: str, available_width: int | None = None) -> str:
"""Rewrite every ``| ... |`` + divider block with wcwidth-aware padding.
Lines that are not part of a recognised table are returned verbatim,
so this is safe to apply to arbitrary assistant prose.
If ``available_width`` is given (terminal cells available for the
rendered table), tables wider than that are rendered as vertical
key-value pairs instead of a horizontal pipe-bordered grid. This
avoids the terminal soft-wrapping mid-cell, which destroys column
alignment visually even when the bytes are perfectly padded.
"""
if "|" not in text:
return text
lines = text.split("\n")
out: List[str] = []
i = 0
n = len(lines)
while i < n:
line = lines[i]
# A table starts with a header row whose next line is a divider.
if (
"|" in line
and i + 1 < n
and is_table_divider(lines[i + 1])
):
header = split_table_row(line)
body: List[List[str]] = []
j = i + 2
while j < n and "|" in lines[j] and lines[j].strip():
if is_table_divider(lines[j]):
j += 1
continue
body.append(split_table_row(lines[j]))
j += 1
if any(c for c in header) or body:
out.extend(_render_block([header] + body, available_width))
i = j
continue
out.append(line)
i += 1
return "\n".join(out)
+2 -2
View File
@@ -470,11 +470,11 @@ class MemoryManager:
accepted = [
p for p in params
if p.kind in (
if p.kind in {
inspect.Parameter.POSITIONAL_ONLY,
inspect.Parameter.POSITIONAL_OR_KEYWORD,
inspect.Parameter.KEYWORD_ONLY,
)
}
]
if len(accepted) >= 4:
return "positional"
+193 -18
View File
@@ -157,6 +157,13 @@ DEFAULT_CONTEXT_LENGTHS = {
"gpt-5.4-nano": 400000, # 400k (not 1.05M like full 5.4)
"gpt-5.4-mini": 400000, # 400k (not 1.05M like full 5.4)
"gpt-5.4": 1050000, # GPT-5.4, GPT-5.4 Pro (1.05M context)
# gpt-5.3-codex-spark is Codex-OAuth-only (ChatGPT Pro entitlement) and
# uses a smaller 128k window than other gpt-5.x slugs. Listed here as
# a defensive override so the longest-substring fallback doesn't match
# the generic "gpt-5" entry below (400k) and report the wrong limit if
# Spark's context ever needs to be resolved through this path. Real
# usage flows through _CODEX_OAUTH_CONTEXT_FALLBACK at line ~1113.
"gpt-5.3-codex-spark": 128000,
"gpt-5.1-chat": 128000, # Chat variant has 128k context
"gpt-5": 400000, # GPT-5.x base, mini, codex variants (400k)
"gpt-4.1": 1047576,
@@ -210,8 +217,10 @@ DEFAULT_CONTEXT_LENGTHS = {
"grok": 131072, # catch-all (grok-beta, unknown grok-*)
# Kimi
"kimi": 262144,
# Tencent — Hy3 Preview (Hunyuan) with 256K context window
"hy3-preview": 256000,
# Tencent — Hy3 Preview (Hunyuan) with 256K context window.
# OpenRouter live metadata reports 262144 (256 × 1024); align the
# static fallback so cache and offline both agree (issue #22268).
"hy3-preview": 262144,
# Nemotron — NVIDIA's open-weights series (128K context across all sizes)
"nemotron": 131072,
# Arcee
@@ -235,6 +244,44 @@ DEFAULT_CONTEXT_LENGTHS = {
"zai-org/GLM-5": 202752,
}
# xAI Grok models that ACCEPT the `reasoning.effort` parameter on
# api.x.ai. Verified live against /v1/responses 2026-05-10:
#
# ACCEPTS effort: grok-3-mini, grok-3-mini-fast, grok-4.20-multi-agent-0309,
# grok-4.3
# REJECTS effort: grok-3, grok-4, grok-4-0709, grok-4-fast-(non-)reasoning,
# grok-4-1-fast-(non-)reasoning, grok-4.20-0309-(non-)reasoning,
# grok-code-fast-1
#
# REJECTS-side models still reason natively — they just don't expose an
# effort dial — so callers should send no `reasoning` key at all rather
# than a default `medium` (which 400s with "Model X does not support
# parameter reasoningEffort").
_GROK_EFFORT_CAPABLE_PREFIXES = (
"grok-3-mini",
"grok-4.20-multi-agent",
"grok-4.3",
)
def grok_supports_reasoning_effort(model: str) -> bool:
"""Return True when an xAI Grok model accepts ``reasoning.effort``.
Allowlist by substring (matches both bare ``grok-3-mini`` and
aggregator-prefixed ``x-ai/grok-3-mini``). Conservative by design:
if a future Grok model isn't listed, we send no effort dial rather
than 400.
"""
name = (model or "").strip().lower()
if not name:
return False
# Strip common aggregator prefixes (x-ai/, openrouter/x-ai/, xai/, ...)
for sep in ("/",):
if sep in name:
name = name.rsplit(sep, 1)[-1]
return any(name.startswith(prefix) for prefix in _GROK_EFFORT_CAPABLE_PREFIXES)
_CONTEXT_LENGTH_KEYS = (
"context_length",
"context_window",
@@ -524,7 +571,7 @@ def _extract_pricing(payload: Dict[str, Any]) -> Dict[str, Any]:
pricing: Dict[str, Any] = {}
for target, aliases in alias_map.items():
for alias in aliases:
if alias in normalized and normalized[alias] not in (None, ""):
if alias in normalized and normalized[alias] not in {None, ""}:
pricing[target] = normalized[alias]
break
if pricing:
@@ -959,6 +1006,79 @@ def query_ollama_num_ctx(model: str, base_url: str, api_key: str = "") -> Option
return None
def _query_ollama_api_show(model: str, base_url: str, api_key: str = "") -> Optional[int]:
"""Query an Ollama server's native ``/api/show`` for context length.
Provider-agnostic: works against ANY Ollama-compatible server regardless
of hostname local Ollama, Ollama Cloud (``ollama.com``), custom Ollama
hosting behind a reverse proxy, etc. For non-Ollama servers the POST
returns 404/405 quickly; the function handles errors gracefully.
For hosted servers the GGUF ``model_info.*.context_length`` is the
authoritative source: the user can't set their own ``num_ctx``, and the
OpenAI-compat ``/v1/models`` endpoint correctly omits ``context_length``
per the OpenAI schema.
Resolution order for hosted Ollama:
1. ``model_info.*.context_length`` GGUF training max (authoritative)
2. ``parameters`` ``num_ctx`` server-side Modelfile override
The order is flipped vs ``query_ollama_num_ctx()`` because local users
control ``num_ctx`` themselves; hosted users can't.
"""
import httpx
server_url = base_url.rstrip("/")
if server_url.endswith("/v1"):
server_url = server_url[:-3]
headers = _auth_headers(api_key)
try:
with httpx.Client(timeout=5.0, headers=headers) as client:
resp = client.post(f"{server_url}/api/show", json={"name": model})
if resp.status_code != 200:
return None
data = resp.json()
# Hosted Ollama: GGUF model_info is the real max — prefer it over
# num_ctx which the Cloud operator may have capped arbitrarily.
model_info = data.get("model_info", {})
for key, value in model_info.items():
if "context_length" in key and isinstance(value, (int, float)):
ctx = int(value)
if ctx >= 1024:
return ctx
# Fall back to num_ctx from Modelfile parameters (rare on Cloud)
params = data.get("parameters", "")
if "num_ctx" in params:
for line in params.split("\n"):
if "num_ctx" in line:
parts = line.strip().split()
if len(parts) >= 2:
try:
ctx = int(parts[-1])
if ctx >= 1024:
return ctx
except ValueError:
pass
except Exception:
pass
return None
def _model_name_suggests_kimi(model: str) -> bool:
"""Return True if the model name looks like a Kimi-family model.
Catches ``kimi-k2.6``, ``kimi-k2.5``, ``kimi-k2-thinking``,
``moonshotai/Kimi-K2.6``, and similar variants. Used as a guard
against stale OpenRouter metadata that underreports these models
as 32K context when they actually support 262K+.
"""
lower = model.lower()
return lower.startswith("kimi") or "moonshot" in lower
def _query_local_context_length(model: str, base_url: str, api_key: str = "") -> Optional[int]:
"""Query a local server for the model's context length."""
import httpx
@@ -1106,6 +1226,12 @@ _CODEX_OAUTH_CONTEXT_FALLBACK: Dict[str, int] = {
"gpt-5.1-codex-max": 272_000,
"gpt-5.1-codex-mini": 272_000,
"gpt-5.3-codex": 272_000,
# Spark runs on specialised low-latency hardware and exposes a smaller
# 128k window than other Codex OAuth slugs. Listed explicitly so the
# longest-key-first fallback resolves it correctly — substring match
# on "gpt-5.3-codex" otherwise wins and reports 272k. Availability is
# gated by ChatGPT Pro entitlement on the Codex backend.
"gpt-5.3-codex-spark": 128_000,
"gpt-5.2-codex": 272_000,
"gpt-5.4-mini": 272_000,
"gpt-5.5": 272_000,
@@ -1254,12 +1380,17 @@ def get_model_context_length(
2. Active endpoint metadata (/models for explicit custom endpoints)
3. Local server query (for local endpoints)
4. Anthropic /v1/models API (API-key users only, not OAuth)
5. OpenRouter live API metadata
6. Nous suffix-match via OpenRouter cache
7. models.dev registry lookup (provider-aware)
8. Thin hardcoded defaults (broad family patterns)
9. Default fallback (256K)
"""
5. Provider-aware lookups (before generic OpenRouter cache):
a. Copilot live /models API
b. Nous suffix-match via OpenRouter cache
c. Codex OAuth /models probe
d. GMI /models endpoint
e. Ollama native /api/show probe (any base_url, provider-agnostic)
f. models.dev registry lookup (with :cloud/-cloud suffix fallback)
6. OpenRouter live API metadata (Kimi-family 32k guard)
7. Hardcoded defaults (broad family patterns, longest-key-first)
8. Local server query (last resort)
9. Default fallback (256K)"""
# 0. Explicit config override — user knows best
if config_context_length is not None and isinstance(config_context_length, int) and config_context_length > 0:
return config_context_length
@@ -1339,6 +1470,13 @@ def get_model_context_length(
if context_length is not None:
return context_length
if not _is_known_provider_base_url(base_url):
# 2b. Ollama native /api/show — any URL might be an Ollama server
# (local, cloud, or custom hosting). Non-Ollama servers return
# 404/405 quickly. Fall through on failure.
ctx = _query_ollama_api_show(model, base_url, api_key=api_key)
if ctx is not None:
save_context_length(model, base_url, ctx)
return ctx
# 3. Try querying local server directly
if is_local_endpoint(base_url):
local_ctx = _query_local_context_length(model, base_url, api_key=api_key)
@@ -1370,7 +1508,7 @@ def get_model_context_length(
# (e.g. claude-opus-4.6 is 1M on Anthropic but 128K on GitHub Copilot).
# If provider is generic (openrouter/custom/empty), try to infer from URL.
effective_provider = provider
if not effective_provider or effective_provider in ("openrouter", "custom"):
if not effective_provider or effective_provider in {"openrouter", "custom"}:
if base_url:
inferred = _infer_provider_from_url(base_url)
if inferred:
@@ -1380,7 +1518,7 @@ def get_model_context_length(
# This catches account-specific models (e.g. claude-opus-4.6-1m) that
# don't exist in models.dev. For models that ARE in models.dev, this
# returns the provider-enforced limit which is what users can actually use.
if effective_provider in ("copilot", "copilot-acp", "github-copilot"):
if effective_provider in {"copilot", "copilot-acp", "github-copilot"}:
try:
from hermes_cli.models import get_copilot_model_context
ctx = get_copilot_model_context(model, api_key=api_key)
@@ -1408,16 +1546,53 @@ def get_model_context_length(
ctx = _resolve_endpoint_context_length(model, base_url, api_key=api_key)
if ctx is not None:
return ctx
# 5e. Ollama native /api/show probe — runs for ANY provider with a
# base_url, not just ollama-cloud. Ollama-compatible servers expose
# this endpoint regardless of hostname (local Ollama, Ollama Cloud,
# custom Ollama hosting). The OpenAI-compat /v1/models endpoint
# correctly omits context_length per the OpenAI schema, but /api/show
# returns the authoritative GGUF model_info.context_length.
# For non-Ollama servers (OpenAI, Anthropic, etc.), the POST returns
# 404/405 quickly. Results are cached, so the hit is per-model+URL,
# once per hour.
if base_url:
ctx = _query_ollama_api_show(model, base_url, api_key=api_key)
if ctx is not None:
save_context_length(model, base_url, ctx)
return ctx
if effective_provider:
from agent.models_dev import lookup_models_dev_context
ctx = lookup_models_dev_context(effective_provider, model)
if ctx:
return ctx
# 6. OpenRouter live API metadata (provider-unaware fallback)
metadata = fetch_model_metadata()
if model in metadata:
return metadata[model].get("context_length", DEFAULT_FALLBACK_CONTEXT)
# 6. OpenRouter live API metadata provider-unaware fallback.
# Only consulted when the provider is unknown (no effective_provider),
# because OpenRouter data is community-maintained and can be incorrect
# for models that belong to known providers with curated defaults.
if not effective_provider:
metadata = fetch_model_metadata()
if model in metadata:
or_ctx = metadata[model].get("context_length", DEFAULT_FALLBACK_CONTEXT)
# Guard against stale OpenRouter metadata for Kimi-family models.
# OpenRouter reports 32768 for moonshotai/kimi-k2.6, but the model
# actually supports 262144 (models.dev + official Kimi docs agree).
# Providers that host their own Kimi endpoints (Ollama Cloud, Kimi
# Coding, Moonshot) would otherwise trip the 64k minimum-context
# guard and reject a perfectly capable model.
# The filter is narrow: only reject exactly 32768 for Kimi-named
# models. If OpenRouter ever updates its data, the stale path
# becomes dead code with no impact.
if or_ctx == 32768 and _model_name_suggests_kimi(model):
logger.info(
"Rejecting OpenRouter metadata context=%s for %r "
"(Kimi-family underreport); falling through to hardcoded defaults",
or_ctx, model,
)
else:
return or_ctx
# 7. (reserved)
# 8. Hardcoded defaults (fuzzy match — longest key first for specificity)
# Only check `default_model in model` (is the key a substring of the input).
@@ -1480,7 +1655,7 @@ def _count_image_tokens(msg: Dict[str, Any], cost_per_image: int) -> int:
if not isinstance(part, dict):
continue
ptype = part.get("type")
if ptype in ("image", "image_url", "input_image"):
if ptype in {"image", "image_url", "input_image"}:
count += 1
stashed = msg.get("_anthropic_content_blocks") if isinstance(msg, dict) else None
if isinstance(stashed, list):
@@ -1492,7 +1667,7 @@ def _count_image_tokens(msg: Dict[str, Any], cost_per_image: int) -> int:
inner = content.get("content")
if isinstance(inner, list):
for part in inner:
if isinstance(part, dict) and part.get("type") in ("image", "image_url"):
if isinstance(part, dict) and part.get("type") in {"image", "image_url"}:
count += 1
return count * cost_per_image
@@ -1514,7 +1689,7 @@ def _estimate_message_chars(msg: Dict[str, Any]) -> int:
cleaned = []
for part in v:
if isinstance(part, dict):
if part.get("type") in ("image", "image_url", "input_image"):
if part.get("type") in {"image", "image_url", "input_image"}:
cleaned.append({"type": part.get("type"), "image": "[stripped]"})
else:
cleaned.append(part)
+92 -5
View File
@@ -145,7 +145,9 @@ PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
"openai": "openai",
"openai-codex": "openai",
"zai": "zai",
"kimi": "kimi-for-coding",
"kimi-coding": "kimi-for-coding",
"moonshot": "kimi-for-coding",
"stepfun": "stepfun",
"kimi-coding-cn": "kimi-for-coding",
"minimax": "minimax",
@@ -197,6 +199,32 @@ def _load_disk_cache() -> Dict[str, Any]:
return {}
def _disk_cache_age_seconds() -> Optional[float]:
"""Return age (in seconds) of the disk cache file, or None if missing.
Used by ``fetch_models_dev`` to short-circuit the network probe when
a recent on-disk cache exists. Errors (missing file, permission
denied, weird filesystem) all return None callers fall through
to the network fetch path.
"""
try:
cache_path = _get_cache_path()
if not cache_path.exists():
return None
mtime = cache_path.stat().st_mtime
age = time.time() - mtime
# Negative age means the file's mtime is in the future (clock skew
# or system clock reset). Treat as "unknown freshness" → fall
# through to network so we don't serve potentially-bad data
# forever.
if age < 0:
return None
return age
except Exception as e:
logger.debug("Failed to stat models.dev disk cache: %s", e)
return None
def _save_disk_cache(data: Dict[str, Any]) -> None:
"""Save models.dev data to disk cache atomically."""
try:
@@ -207,13 +235,29 @@ def _save_disk_cache(data: Dict[str, Any]) -> None:
def fetch_models_dev(force_refresh: bool = False) -> Dict[str, Any]:
"""Fetch models.dev registry. In-memory cache (1hr) + disk fallback.
"""Fetch models.dev registry. Cache hierarchy: in-mem → disk → network.
Returns the full registry dict keyed by provider ID, or empty dict on failure.
Cache hierarchy (when ``force_refresh=False``):
1. In-memory cache, populated and < TTL old return immediately.
2. **Disk cache file < TTL old by mtime load, populate in-mem, return.**
No network call. Saves ~500 ms per cold-start agent construction;
``models.dev`` only changes when providers add new models, so a
1 hour staleness window is acceptable (same TTL as in-mem cache).
3. Network fetch on success, save to disk + in-mem and return.
4. Network fails fall back to ANY available disk cache (even stale)
with a short 5 min in-mem grace period before retrying network.
When ``force_refresh=True`` (used by ``hermes config refresh``, the
\"refresh model catalog\" code path), stages 1 and 2 are skipped. The
function always hits the network and only falls back to disk if the
network call fails.
"""
global _models_dev_cache, _models_dev_cache_time
# Check in-memory cache
# Stage 1: fresh in-memory cache wins. This is the hot path on
# long-lived processes — no I/O, no system calls.
if (
not force_refresh
and _models_dev_cache
@@ -221,7 +265,27 @@ def fetch_models_dev(force_refresh: bool = False) -> Dict[str, Any]:
):
return _models_dev_cache
# Try network fetch
# Stage 2: fresh-by-mtime disk cache short-circuits the network call.
# Only kicks in on cold-start processes (in-mem cache is empty or
# expired) and only when the user hasn't asked for a forced refresh.
# Skipped if the disk cache file is missing, unreadable, or older
# than _MODELS_DEV_CACHE_TTL.
if not force_refresh:
disk_age = _disk_cache_age_seconds()
if disk_age is not None and disk_age < _MODELS_DEV_CACHE_TTL:
disk_data = _load_disk_cache()
if disk_data:
_models_dev_cache = disk_data
# Anchor in-mem TTL to the disk file's age so we don't
# extend an already-aging cache by another full hour.
_models_dev_cache_time = time.time() - disk_age
logger.debug(
"Loaded models.dev from fresh disk cache "
"(%d providers, age=%.0fs)", len(disk_data), disk_age,
)
return _models_dev_cache
# Stage 3: network fetch.
try:
response = requests.get(MODELS_DEV_URL, timeout=15)
response.raise_for_status()
@@ -239,8 +303,9 @@ def fetch_models_dev(force_refresh: bool = False) -> Dict[str, Any]:
except Exception as e:
logger.debug("Failed to fetch models.dev: %s", e)
# Fall back to disk cache — use a short TTL (5 min) so we retry
# the network fetch soon instead of serving stale data for a full hour.
# Stage 4: network failed — fall back to whatever disk cache exists,
# even if it's stale. Give it a short 5 min in-mem TTL so we retry
# the network soon instead of serving stale data for a full hour.
if not _models_dev_cache:
_models_dev_cache = _load_disk_cache()
if _models_dev_cache:
@@ -284,6 +349,28 @@ def lookup_models_dev_context(provider: str, model: str) -> Optional[int]:
if ctx:
return ctx
# Suffix-aware fallback: some providers (e.g. ollama-cloud) store
# model IDs with :cloud / -cloud suffixes in models.dev while the
# live API returns bare names. Without this, kimi-k2.6 misses the
# kimi-k2.6:cloud entry and falls through to stale OpenRouter metadata
# reporting 32768 — tripping the 64k minimum-context guard.
# The suffix-stripping in fetch_ollama_cloud_models() handles the
# model-picker UX; this handles the context-length lookup path.
for suffix in (":cloud", "-cloud"):
suffixed_key = model + suffix
entry = models.get(suffixed_key)
if entry:
ctx = _extract_context(entry)
if ctx:
return ctx
# Also try case-insensitive
suffixed_lower = model_lower + suffix
for mid, mdata in models.items():
if mid.lower() == suffixed_lower:
ctx = _extract_context(mdata)
if ctx:
return ctx
return None
+2 -2
View File
@@ -122,7 +122,7 @@ def _repair_schema(node: Any, is_schema: bool = True) -> Any:
# empty, drop it entirely.
if "enum" in repaired and isinstance(repaired["enum"], list):
node_type = repaired.get("type")
if node_type in ("string", "integer", "number", "boolean"):
if node_type in {"string", "integer", "number", "boolean"}:
cleaned = [v for v in repaired["enum"]
if v is not None and v != ""]
if cleaned:
@@ -135,7 +135,7 @@ def _repair_schema(node: Any, is_schema: bool = True) -> Any:
def _fill_missing_type(node: Dict[str, Any]) -> Dict[str, Any]:
"""Infer a reasonable ``type`` if this schema node has none."""
if "type" in node and node["type"] not in (None, ""):
if "type" in node and node["type"] not in {None, ""}:
return node
# Heuristic: presence of ``properties`` → object, ``items`` → array, ``enum``
+1046
View File
File diff suppressed because it is too large Load Diff
+12 -1
View File
@@ -157,6 +157,9 @@ MEMORY_GUIDANCE = (
"User preferences and recurring corrections matter more than procedural task details.\n"
"Do NOT save task progress, session outcomes, completed-work logs, or temporary TODO "
"state to memory; use session_search to recall those from past transcripts. "
"Specifically: do not record PR numbers, issue numbers, commit SHAs, 'fixed bug X', "
"'submitted PR Y', 'Phase N done', file counts, or any artifact that will be stale "
"in 7 days. If a fact will be stale in a week, it does not belong in memory. "
"If you've discovered a new way to do something, solved a problem that could be "
"necessary later, save it as a skill with the skill tool.\n"
"Write memories as declarative facts, not instructions to yourself. "
@@ -213,7 +216,15 @@ KANBAN_GUIDANCE = (
"artifacts. `metadata` is machine-readable facts "
"(`{changed_files: [...], tests_run: N, decisions: [...]}`). Downstream "
"workers read both via their own `kanban_show`. Never put secrets / "
"tokens / raw PII in either field — run rows are durable forever.\n"
"tokens / raw PII in either field — run rows are durable forever. "
"Exception: if your output is a code change that needs human review "
"before counting as merged/done (most coding tasks), drop the "
"structured metadata (changed_files / tests_run / diff_path) into a "
"`kanban_comment` first, then end with "
"`kanban_block(reason=\"review-required: <one-line summary>\")` so a "
"reviewer can approve+unblock or request changes. Reviewing-then-"
"completing is more honest than auto-completing work that still needs "
"eyes on it.\n"
"6. **If follow-up work appears, create it; don't do it.** Use "
"`kanban_create(title=..., assignee=<right-profile>, parents=[your-task-id])` "
"to spawn a child task for the appropriate specialist profile instead of "
+139 -10
View File
@@ -1,15 +1,25 @@
"""Anthropic prompt caching (system_and_3 strategy).
"""Anthropic prompt caching strategies.
Reduces input token costs by ~75% on multi-turn conversations by caching
the conversation prefix. Uses 4 cache_control breakpoints (Anthropic max):
1. System prompt (stable across all turns)
2-4. Last 3 non-system messages (rolling window)
Two layouts:
* ``system_and_3`` (default, used everywhere except the long-lived path):
4 cache_control breakpoints system prompt + last 3 non-system messages.
All at the same TTL (5m or 1h). Reduces input token costs by ~75% on
multi-turn conversations within a single session.
* ``prefix_and_2`` (Claude on Anthropic / OpenRouter / Nous Portal):
4 breakpoints split across two TTL tiers tools[-1] (1h) +
stable system prefix (1h) + last 2 non-system messages (5m). The
long-lived prefix is byte-stable across sessions for a given user
config, so every fresh session reads the cached system+tools instead
of re-paying for them. Within-session rolling window shrinks from 3
messages to 2 to free the breakpoint budget.
Pure functions -- no class state, no AIAgent dependency.
"""
import copy
from typing import Any, Dict, List
from typing import Any, Dict, List, Optional
def _apply_cache_marker(msg: dict, cache_marker: dict, native_anthropic: bool = False) -> None:
@@ -38,6 +48,14 @@ def _apply_cache_marker(msg: dict, cache_marker: dict, native_anthropic: bool =
last["cache_control"] = cache_marker
def _build_marker(ttl: str) -> Dict[str, str]:
"""Build a cache_control marker dict for the given TTL ('5m' or '1h')."""
marker: Dict[str, str] = {"type": "ephemeral"}
if ttl == "1h":
marker["ttl"] = "1h"
return marker
def apply_anthropic_cache_control(
api_messages: List[Dict[str, Any]],
cache_ttl: str = "5m",
@@ -45,7 +63,8 @@ def apply_anthropic_cache_control(
) -> List[Dict[str, Any]]:
"""Apply system_and_3 caching strategy to messages for Anthropic models.
Places up to 4 cache_control breakpoints: system prompt + last 3 non-system messages.
Places up to 4 cache_control breakpoints: system prompt + last 3 non-system
messages, all at the same TTL.
Returns:
Deep copy of messages with cache_control breakpoints injected.
@@ -54,9 +73,7 @@ def apply_anthropic_cache_control(
if not messages:
return messages
marker = {"type": "ephemeral"}
if cache_ttl == "1h":
marker["ttl"] = "1h"
marker = _build_marker(cache_ttl)
breakpoints_used = 0
@@ -70,3 +87,115 @@ def apply_anthropic_cache_control(
_apply_cache_marker(messages[idx], marker, native_anthropic=native_anthropic)
return messages
def _mark_system_stable_block(
messages: List[Dict[str, Any]],
long_lived_marker: Dict[str, str],
) -> bool:
"""Mark the *first* content block of the system message with the 1h marker.
The system message is expected to have been split into multiple content
blocks beforehand by the caller block[0] is the cross-session-stable
prefix, subsequent blocks carry context files + volatile suffix.
Falls back to marking the whole system message as a single block when
the message hasn't been split (preserves correctness on the fallback path).
Returns True when a marker was placed.
"""
if not messages or messages[0].get("role") != "system":
return False
sys_msg = messages[0]
content = sys_msg.get("content")
# Already a list of blocks → mark the first block.
if isinstance(content, list) and content:
first = content[0]
if isinstance(first, dict):
first["cache_control"] = long_lived_marker
return True
return False
# String content (no split) → cannot place a stable-prefix breakpoint
# without changing the byte content. Caller is responsible for
# splitting; if they didn't, fall through to envelope marker so we still
# cache *something* for this turn.
if isinstance(content, str) and content:
sys_msg["content"] = [
{"type": "text", "text": content, "cache_control": long_lived_marker}
]
return True
return False
def apply_anthropic_cache_control_long_lived(
api_messages: List[Dict[str, Any]],
long_lived_ttl: str = "1h",
rolling_ttl: str = "5m",
native_anthropic: bool = False,
) -> List[Dict[str, Any]]:
"""Apply prefix_and_2 caching: long-lived stable prefix + rolling window.
Layout (4 breakpoints total):
* Stable system prefix (block[0]) ``long_lived_ttl`` TTL
* Last 2 non-system messages ``rolling_ttl`` TTL each
NOTE: this function does NOT mark the tools array. Tools cache_control
is attached separately (see ``mark_tools_for_long_lived_cache``) because
tools live outside the messages list in the API payload.
The caller MUST have split the system message into ordered content
blocks where block[0] is the cross-session-stable portion. If the system
message is still a single string, it is wrapped into a single block and
marked this is correct, just less effective (the volatile suffix is
not isolated, so the prefix invalidates per-session).
Returns:
Deep copy of messages with cache_control breakpoints injected.
"""
messages = copy.deepcopy(api_messages)
if not messages:
return messages
long_marker = _build_marker(long_lived_ttl)
rolling_marker = _build_marker(rolling_ttl)
placed_prefix = _mark_system_stable_block(messages, long_marker)
# Reserve 1 breakpoint for the system prefix (when placed); spend the
# remaining 3 on the rolling tail. Anthropic max is 4 total —
# tools[-1] (when marked) consumes the 4th, so we cap rolling at 2 here.
rolling_budget = 2 if placed_prefix else 3
non_sys = [i for i in range(len(messages)) if messages[i].get("role") != "system"]
for idx in non_sys[-rolling_budget:]:
_apply_cache_marker(messages[idx], rolling_marker, native_anthropic=native_anthropic)
return messages
def mark_tools_for_long_lived_cache(
tools: Optional[List[Dict[str, Any]]],
long_lived_ttl: str = "1h",
) -> Optional[List[Dict[str, Any]]]:
"""Attach cache_control to the last tool in the OpenAI-format tools list.
Anthropic prefix-cache order is ``tools system messages``. Marking
the last tool dict caches the entire tools array (Anthropic's docs:
"the marker is placed on the last block you want included in the cached
prefix"). Marker is preserved across the OpenAI-wire boundary on
OpenRouter and Nous Portal (which proxies to OpenRouter); on native
Anthropic the marker is forwarded by ``convert_tools_to_anthropic``.
Returns a deep copy of the tools list with the marker attached, or the
input unchanged when tools is empty/None. Pure function does not
mutate the input.
"""
if not tools:
return tools
out = copy.deepcopy(tools)
last = out[-1]
if isinstance(last, dict):
last["cache_control"] = _build_marker(long_lived_ttl)
return out
+1 -1
View File
@@ -64,7 +64,7 @@ _SENSITIVE_BODY_KEYS = frozenset({
# cli.py) or `HERMES_REDACT_SECRETS=false` in ~/.hermes/.env. An opt-out
# warning is logged at gateway and CLI startup so operators see the
# downgrade — see `_log_redaction_status()` in gateway/run.py and cli.py.
_REDACT_ENABLED = os.getenv("HERMES_REDACT_SECRETS", "true").lower() in ("1", "true", "yes", "on")
_REDACT_ENABLED = os.getenv("HERMES_REDACT_SECRETS", "true").lower() in {"1", "true", "yes", "on"}
# Known API key prefixes -- match the prefix + contiguous token chars
_PREFIX_PATTERNS = [
+5 -5
View File
@@ -312,7 +312,7 @@ def _parse_single_entry(
)
matcher = None
if matcher is not None and event not in ("pre_tool_call", "post_tool_call"):
if matcher is not None and event not in {"pre_tool_call", "post_tool_call"}:
logger.warning(
"hooks.%s[%d].matcher=%r will be ignored at runtime — the "
"matcher field is only honored for pre_tool_call / "
@@ -423,7 +423,7 @@ def _make_callback(spec: ShellHookSpec) -> Callable[..., Optional[Dict[str, Any]
def _callback(**kwargs: Any) -> Optional[Dict[str, Any]]:
# Matcher gate — only meaningful for tool-scoped events.
if spec.event in ("pre_tool_call", "post_tool_call"):
if spec.event in {"pre_tool_call", "post_tool_call"}:
if not spec.matches_tool(kwargs.get("tool_name")):
return None
@@ -658,7 +658,7 @@ def _prompt_and_record(
print() # keep the terminal tidy after ^C
return False
if answer in ("y", "yes"):
if answer in {"y", "yes"}:
_record_approval(event, command)
return True
@@ -752,13 +752,13 @@ def _resolve_effective_accept(
if accept_hooks_arg:
return True
env = os.environ.get("HERMES_ACCEPT_HOOKS", "").strip().lower()
if env in ("1", "true", "yes", "on"):
if env in {"1", "true", "yes", "on"}:
return True
cfg_val = cfg.get("hooks_auto_accept", False)
if isinstance(cfg_val, bool):
return cfg_val
if isinstance(cfg_val, str):
return cfg_val.strip().lower() in ("1", "true", "yes", "on")
return cfg_val.strip().lower() in {"1", "true", "yes", "on"}
return False
+1 -1
View File
@@ -261,7 +261,7 @@ def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
for scan_dir in dirs_to_scan:
for skill_md in iter_skill_index_files(scan_dir, "SKILL.md"):
if any(part in ('.git', '.github', '.hub', '.archive') for part in skill_md.parts):
if any(part in {'.git', '.github', '.hub', '.archive'} for part in skill_md.parts):
continue
try:
content = skill_md.read_text(encoding='utf-8')
+19 -2
View File
@@ -279,7 +279,7 @@ class ChatCompletionsTransport(ProviderTransport):
_kimi_effort = "medium"
if reasoning_config and isinstance(reasoning_config, dict):
_e = (reasoning_config.get("effort") or "").strip().lower()
if _e in ("low", "medium", "high"):
if _e in {"low", "medium", "high"}:
_kimi_effort = _e
api_kwargs["reasoning_effort"] = _kimi_effort
@@ -294,7 +294,7 @@ class ChatCompletionsTransport(ProviderTransport):
_tokenhub_effort = "high"
if reasoning_config and isinstance(reasoning_config, dict):
_e = (reasoning_config.get("effort") or "").strip().lower()
if _e in ("low", "medium", "high"):
if _e in {"low", "medium", "high"}:
_tokenhub_effort = _e
api_kwargs["reasoning_effort"] = _tokenhub_effort
@@ -323,6 +323,21 @@ class ChatCompletionsTransport(ProviderTransport):
if provider_prefs and is_openrouter:
extra_body["provider"] = provider_prefs
# Pareto Code router plugin — model-gated. Same shape as the
# profile path in plugins/model-providers/openrouter/__init__.py;
# this branch only runs when the OpenRouter profile isn't loaded.
if is_openrouter and model == "openrouter/pareto-code":
_pareto_score = params.get("openrouter_min_coding_score")
if _pareto_score is not None and _pareto_score != "":
try:
_pareto_score_f = float(_pareto_score)
except (TypeError, ValueError):
_pareto_score_f = None
if _pareto_score_f is not None and 0.0 <= _pareto_score_f <= 1.0:
extra_body["plugins"] = [
{"id": "pareto-router", "min_coding_score": _pareto_score_f}
]
# Kimi extra_body.thinking
if is_kimi:
_kimi_thinking_enabled = True
@@ -448,6 +463,7 @@ class ChatCompletionsTransport(ProviderTransport):
qwen_session_metadata=params.get("qwen_session_metadata"),
model=model,
ollama_num_ctx=params.get("ollama_num_ctx"),
session_id=params.get("session_id"),
)
)
api_kwargs.update(top_level_from_profile)
@@ -462,6 +478,7 @@ class ChatCompletionsTransport(ProviderTransport):
model=model,
base_url=params.get("base_url"),
reasoning_config=reasoning_config,
openrouter_min_coding_score=params.get("openrouter_min_coding_score"),
)
if profile_body:
extra_body.update(profile_body)
+9
View File
@@ -104,7 +104,16 @@ class ResponsesApiTransport(ProviderTransport):
kwargs["prompt_cache_key"] = session_id
if reasoning_enabled and is_xai_responses:
from agent.model_metadata import grok_supports_reasoning_effort
kwargs["include"] = ["reasoning.encrypted_content"]
# xAI rejects `reasoning.effort` on grok-4 / grok-4-fast / grok-3
# / grok-code-fast / grok-4.20-0309-* with HTTP 400 even though
# those models reason natively. Only send the effort dial when
# the target model is on the allowlist; otherwise send no
# `reasoning` key at all and let the model reason on its own.
if grok_supports_reasoning_effort(model):
kwargs["reasoning"] = {"effort": reasoning_effort}
elif reasoning_enabled:
if is_github_responses:
github_reasoning = params.get("github_reasoning_extra")
+5 -1
View File
@@ -337,6 +337,7 @@ def _process_single_prompt(
providers_ignored=config.get("providers_ignored"),
providers_order=config.get("providers_order"),
provider_sort=config.get("provider_sort"),
openrouter_min_coding_score=config.get("openrouter_min_coding_score"),
max_tokens=config.get("max_tokens"),
reasoning_config=config.get("reasoning_config"),
prefill_messages=config.get("prefill_messages"),
@@ -546,6 +547,7 @@ class BatchRunner:
providers_ignored: List[str] = None,
providers_order: List[str] = None,
provider_sort: str = None,
openrouter_min_coding_score: Optional[float] = None,
max_tokens: int = None,
reasoning_config: Dict[str, Any] = None,
prefill_messages: List[Dict[str, Any]] = None,
@@ -595,6 +597,7 @@ class BatchRunner:
self.providers_ignored = providers_ignored
self.providers_order = providers_order
self.provider_sort = provider_sort
self.openrouter_min_coding_score = openrouter_min_coding_score
self.max_tokens = max_tokens
self.reasoning_config = reasoning_config
self.prefill_messages = prefill_messages
@@ -792,7 +795,7 @@ class BatchRunner:
conversations = entry.get("conversations", [])
for msg in conversations:
role = msg.get("role") or msg.get("from")
if role in ("user", "human"):
if role in {"user", "human"}:
prompt_text = (msg.get("content") or msg.get("value", "")).strip()
break
@@ -873,6 +876,7 @@ class BatchRunner:
"providers_ignored": self.providers_ignored,
"providers_order": self.providers_order,
"provider_sort": self.provider_sort,
"openrouter_min_coding_score": self.openrouter_min_coding_score,
"max_tokens": self.max_tokens,
"reasoning_config": self.reasoning_config,
"prefill_messages": self.prefill_messages,
+13
View File
@@ -203,6 +203,12 @@ terminal:
# docker_forward_env:
# - "GITHUB_TOKEN"
# - "NPM_TOKEN"
# # Optional: extra flags passed verbatim to docker run (appended after security defaults).
# # Useful for adding capabilities (e.g. apt installs needing SETUID) or custom options.
# # Example: add a Linux capability not included by default
# # docker_extra_args:
# # - "--cap-add"
# # - "SETUID"
# -----------------------------------------------------------------------------
# OPTION 4: Singularity/Apptainer container
@@ -657,6 +663,10 @@ platform_toolsets:
# platforms:
# telegram:
# reply_to_mode: "first" # off | first | all
# # guest_mode lets explicit @mentions from non-allowlisted groups through.
# # Default false; ordinary messages, replies, and regex wake words stay blocked.
# guest_mode: false
# # allowed_chats: ["-1001234567890"]
# extra:
# disable_link_previews: false # Set true to suppress Telegram URL previews in bot messages
@@ -943,6 +953,9 @@ display:
# false: Wait for the full response before rendering
streaming: true
# Show [HH:MM] timestamps on user input and assistant response labels.
# timestamps: false
# ───────────────────────────────────────────────────────────────────────────
# Skin / Theme
# ───────────────────────────────────────────────────────────────────────────
+837 -122
View File
File diff suppressed because it is too large Load Diff
+7 -8
View File
@@ -664,7 +664,7 @@ def update_job(job_id: str, updates: Dict[str, Any]) -> Optional[Dict[str, Any]]
# None both mean "clear the field" (restore old behaviour).
if "workdir" in updates:
_wd = updates["workdir"]
if _wd in (None, "", False):
if _wd in {None, "", False}:
updates["workdir"] = None
else:
updates["workdir"] = _normalize_workdir(_wd)
@@ -811,7 +811,7 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None,
# schedule quietly goes off. See issue #16265.
if job["next_run_at"] is None:
kind = job.get("schedule", {}).get("kind")
if kind in ("cron", "interval"):
if kind in {"cron", "interval"}:
job["state"] = "error"
if not job.get("last_error"):
job["last_error"] = (
@@ -855,7 +855,7 @@ def advance_next_run(job_id: str) -> bool:
for job in jobs:
if job["id"] == job_id:
kind = job.get("schedule", {}).get("kind")
if kind not in ("cron", "interval"):
if kind not in {"cron", "interval"}:
return False
now = _hermes_now().isoformat()
new_next = compute_next_run(job["schedule"], now)
@@ -909,7 +909,7 @@ def _get_due_jobs_locked() -> List[Dict[str, Any]]:
# next_run_at unset. Without this branch, such jobs are
# silently skipped forever; recompute next_run_at from the
# schedule so they pick up at their next scheduled tick.
if not recovered_next and kind in ("cron", "interval"):
if not recovered_next and kind in {"cron", "interval"}:
recovered_next = compute_next_run(schedule, now.isoformat())
if recovered_next:
recovery_kind = kind
@@ -940,7 +940,7 @@ def _get_due_jobs_locked() -> List[Dict[str, Any]]:
# (gateway was down and missed the window). Fast-forward to
# the next future occurrence instead of firing a stale run.
grace = _compute_grace_seconds(schedule)
if kind in ("cron", "interval") and (now - next_run_dt).total_seconds() > grace:
if kind in {"cron", "interval"} and (now - next_run_dt).total_seconds() > grace:
# Job is past its catch-up grace window — this is a stale missed run.
# Grace scales with schedule period: daily=2h, hourly=30m, 10min=5m.
new_next = compute_next_run(schedule, now.isoformat())
@@ -1082,9 +1082,8 @@ def rewrite_skill_refs(
new_skills.append(target)
elif name in pruned_set:
dropped.append(name)
else:
if name not in new_skills:
new_skills.append(name)
elif name not in new_skills:
new_skills.append(name)
if not mapped and not dropped:
continue
+2 -1
View File
@@ -754,7 +754,7 @@ def _run_job_script(script_path: str) -> tuple[bool, str]:
# shebang: the scripts dir is trusted, but keeping the interpreter
# choice explicit here keeps the allowed surface small and auditable.
suffix = path.suffix.lower()
if suffix in (".sh", ".bash"):
if suffix in {".sh", ".bash"}:
# Resolve bash dynamically so Windows (Git Bash) and Linux/macOS
# all work. On native Windows without Git for Windows installed
# shutil.which returns None — fall back to a clear error rather
@@ -1439,6 +1439,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
providers_ignored=pr.get("ignore"),
providers_order=pr.get("order"),
provider_sort=pr.get("sort"),
openrouter_min_coding_score=(_cfg.get("openrouter") or {}).get("min_coding_score"),
enabled_toolsets=_resolve_cron_enabled_toolsets(job, _cfg),
disabled_toolsets=["cronjob", "messaging", "clarify"],
quiet_mode=True,
+1 -1
View File
@@ -264,7 +264,7 @@ def _parse_hint_result(text: str) -> tuple[int | None, str]:
"""Parse the judge's boxed decision and hint text."""
boxed = _BOXED_RE.findall(text)
score = int(boxed[-1]) if boxed else None
if score not in (1, -1):
if score not in {1, -1}:
score = None
hint_matches = _HINT_RE.findall(text)
hint = hint_matches[-1].strip() if hint_matches else ""
@@ -162,7 +162,7 @@ def _normalize_tar_member_parts(member_name: str) -> list:
):
raise ValueError(f"Unsafe archive member path: {member_name}")
parts = [part for part in posix_path.parts if part not in ("", ".")]
parts = [part for part in posix_path.parts if part not in {"", "."}]
if not parts or any(part == ".." for part in parts):
raise ValueError(f"Unsafe archive member path: {member_name}")
return parts
@@ -561,7 +561,7 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
# --- 5. Verify -- run test suite in the agent's sandbox ---
# Skip verification if the agent produced no meaningful output
only_system_and_user = all(
msg.get("role") in ("system", "user") for msg in result.messages
msg.get("role") in {"system", "user"} for msg in result.messages
)
if result.turns_used == 0 or only_system_and_user:
logger.warning(
@@ -919,7 +919,7 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
eval_metrics[f"eval/pass_rate_{cat_key}"] = cat_pass_rate
# Store metrics for wandb_log
self.eval_metrics = [(k, v) for k, v in eval_metrics.items()]
self.eval_metrics = list(eval_metrics.items())
# ---- Print summary ----
print(f"\n{'='*60}")
@@ -759,7 +759,7 @@ class YCBenchEvalEnv(HermesAgentBaseEnv):
eval_metrics[f"eval/survival_rate_{key}"] = ps / pt if pt else 0
eval_metrics[f"eval/avg_score_{key}"] = pa
self.eval_metrics = [(k, v) for k, v in eval_metrics.items()]
self.eval_metrics = list(eval_metrics.items())
# --- Print summary ---
print(f"\n{'='*60}")
+1 -1
View File
@@ -571,7 +571,7 @@ class HermesAgentBaseEnv(BaseEnv):
# (e.g., API call failed on turn 1). No point spinning up a Modal sandbox
# just to verify files that were never created.
only_system_and_user = all(
msg.get("role") in ("system", "user") for msg in result.messages
msg.get("role") in {"system", "user"} for msg in result.messages
)
if result.turns_used == 0 or only_system_and_user:
logger.warning(
+1 -1
View File
@@ -179,7 +179,7 @@ class ToolContext:
# Ensure parent directory exists in the sandbox
parent = str(_Path(remote_path).parent)
if parent not in (".", "/"):
if parent not in {".", "/"}:
self.terminal(f"mkdir -p {parent}", timeout=10)
# For small files, single command is fine
+66 -34
View File
@@ -28,9 +28,9 @@ def _coerce_bool(value: Any, default: bool = True) -> bool:
return default
if isinstance(value, str):
lowered = value.strip().lower()
if lowered in ("true", "1", "yes", "on"):
if lowered in {"true", "1", "yes", "on"}:
return True
if lowered in ("false", "0", "no", "off"):
if lowered in {"false", "0", "no", "off"}:
return False
return default
return is_truthy_value(value, default=default)
@@ -317,14 +317,32 @@ class PlatformConfig:
)
# Streaming defaults — single source of truth so both StreamingConfig and
# StreamConsumerConfig agree on the out-of-the-box edit rhythm. Tuned for
# Telegram's ~1 edit/s flood envelope: a touch under 1s lets the cadence
# breathe without bumping into rate limits, and a smaller buffer threshold
# makes short replies feel near-instant in DMs.
DEFAULT_STREAMING_EDIT_INTERVAL: float = 0.8
DEFAULT_STREAMING_BUFFER_THRESHOLD: int = 24
DEFAULT_STREAMING_CURSOR: str = ""
@dataclass
class StreamingConfig:
"""Configuration for real-time token streaming to messaging platforms."""
enabled: bool = False
transport: str = "edit" # "edit" (progressive editMessageText) or "off"
edit_interval: float = 1.0 # Seconds between message edits (Telegram rate-limits at ~1/s)
buffer_threshold: int = 40 # Chars before forcing an edit
cursor: str = "" # Cursor shown during streaming
# Transport selection:
# "auto" — prefer native streaming-draft updates when the platform
# supports them (Telegram sendMessageDraft, Bot API 9.5+);
# fall back to edit-based when not. Recommended.
# "draft" — explicitly request native drafts; falls back to edit when
# the platform/chat doesn't support them.
# "edit" — progressive editMessageText only (legacy behaviour).
# "off" — disable streaming entirely.
transport: str = "auto"
edit_interval: float = DEFAULT_STREAMING_EDIT_INTERVAL
buffer_threshold: int = DEFAULT_STREAMING_BUFFER_THRESHOLD
cursor: str = DEFAULT_STREAMING_CURSOR
# Ported from openclaw/openclaw#72038. When >0, the final edit for
# a long-running streamed response is delivered as a fresh message
# if the original preview has been visible for at least this many
@@ -350,10 +368,14 @@ class StreamingConfig:
return cls()
return cls(
enabled=_coerce_bool(data.get("enabled"), False),
transport=data.get("transport", "edit"),
edit_interval=_coerce_float(data.get("edit_interval"), 1.0),
buffer_threshold=_coerce_int(data.get("buffer_threshold"), 40),
cursor=data.get("cursor", ""),
transport=data.get("transport", "auto"),
edit_interval=_coerce_float(
data.get("edit_interval"), DEFAULT_STREAMING_EDIT_INTERVAL,
),
buffer_threshold=_coerce_int(
data.get("buffer_threshold"), DEFAULT_STREAMING_BUFFER_THRESHOLD,
),
cursor=data.get("cursor", DEFAULT_STREAMING_CURSOR),
fresh_final_after_seconds=_coerce_float(
data.get("fresh_final_after_seconds"), 60.0
),
@@ -588,8 +610,7 @@ class GatewayConfig:
try:
session_store_max_age_days = int(data.get("session_store_max_age_days", 90))
if session_store_max_age_days < 0:
session_store_max_age_days = 0
session_store_max_age_days = max(session_store_max_age_days, 0)
except (TypeError, ValueError):
session_store_max_age_days = 90
@@ -766,11 +787,19 @@ def load_gateway_config() -> GatewayConfig:
bridged["dm_policy"] = platform_cfg["dm_policy"]
if "allow_from" in platform_cfg:
bridged["allow_from"] = platform_cfg["allow_from"]
if "allow_admin_from" in platform_cfg:
bridged["allow_admin_from"] = platform_cfg["allow_admin_from"]
if "user_allowed_commands" in platform_cfg:
bridged["user_allowed_commands"] = platform_cfg["user_allowed_commands"]
if "group_policy" in platform_cfg:
bridged["group_policy"] = platform_cfg["group_policy"]
if "group_allow_from" in platform_cfg:
bridged["group_allow_from"] = platform_cfg["group_allow_from"]
if plat in (Platform.DISCORD, Platform.SLACK) and "channel_skill_bindings" in platform_cfg:
if "group_allow_admin_from" in platform_cfg:
bridged["group_allow_admin_from"] = platform_cfg["group_allow_admin_from"]
if "group_user_allowed_commands" in platform_cfg:
bridged["group_user_allowed_commands"] = platform_cfg["group_user_allowed_commands"]
if plat in {Platform.DISCORD, Platform.SLACK} and "channel_skill_bindings" in platform_cfg:
bridged["channel_skill_bindings"] = platform_cfg["channel_skill_bindings"]
if "channel_prompts" in platform_cfg:
channel_prompts = platform_cfg["channel_prompts"]
@@ -896,6 +925,8 @@ def load_gateway_config() -> GatewayConfig:
os.environ["TELEGRAM_REQUIRE_MENTION"] = str(_effective_rm).lower()
if "mention_patterns" in telegram_cfg and not os.getenv("TELEGRAM_MENTION_PATTERNS"):
os.environ["TELEGRAM_MENTION_PATTERNS"] = json.dumps(telegram_cfg["mention_patterns"])
if "guest_mode" in telegram_cfg and not os.getenv("TELEGRAM_GUEST_MODE"):
os.environ["TELEGRAM_GUEST_MODE"] = str(telegram_cfg["guest_mode"]).lower()
frc = telegram_cfg.get("free_response_chats")
if frc is not None and not os.getenv("TELEGRAM_FREE_RESPONSE_CHATS"):
if isinstance(frc, list):
@@ -941,16 +972,17 @@ def load_gateway_config() -> GatewayConfig:
if isinstance(group_allowed_chats, list):
group_allowed_chats = ",".join(str(v) for v in group_allowed_chats)
os.environ["TELEGRAM_GROUP_ALLOWED_CHATS"] = str(group_allowed_chats)
if "disable_link_previews" in telegram_cfg:
plat_data = platforms_data.setdefault(Platform.TELEGRAM.value, {})
if not isinstance(plat_data, dict):
plat_data = {}
platforms_data[Platform.TELEGRAM.value] = plat_data
extra = plat_data.setdefault("extra", {})
if not isinstance(extra, dict):
extra = {}
plat_data["extra"] = extra
extra["disable_link_previews"] = telegram_cfg["disable_link_previews"]
for _telegram_extra_key in ("guest_mode", "disable_link_previews"):
if _telegram_extra_key in telegram_cfg:
plat_data = platforms_data.setdefault(Platform.TELEGRAM.value, {})
if not isinstance(plat_data, dict):
plat_data = {}
platforms_data[Platform.TELEGRAM.value] = plat_data
extra = plat_data.setdefault("extra", {})
if not isinstance(extra, dict):
extra = {}
plat_data["extra"] = extra
extra[_telegram_extra_key] = telegram_cfg[_telegram_extra_key]
whatsapp_cfg = yaml_cfg.get("whatsapp", {})
if isinstance(whatsapp_cfg, dict):
@@ -1147,7 +1179,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
# Reply threading mode for Telegram (off/first/all)
telegram_reply_mode = os.getenv("TELEGRAM_REPLY_TO_MODE", "").lower()
if telegram_reply_mode in ("off", "first", "all"):
if telegram_reply_mode in {"off", "first", "all"}:
if Platform.TELEGRAM not in config.platforms:
config.platforms[Platform.TELEGRAM] = PlatformConfig()
config.platforms[Platform.TELEGRAM].reply_to_mode = telegram_reply_mode
@@ -1188,14 +1220,14 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
# Reply threading mode for Discord (off/first/all)
discord_reply_mode = os.getenv("DISCORD_REPLY_TO_MODE", "").lower()
if discord_reply_mode in ("off", "first", "all"):
if discord_reply_mode in {"off", "first", "all"}:
if Platform.DISCORD not in config.platforms:
config.platforms[Platform.DISCORD] = PlatformConfig()
config.platforms[Platform.DISCORD].reply_to_mode = discord_reply_mode
# WhatsApp (typically uses different auth mechanism)
whatsapp_enabled = os.getenv("WHATSAPP_ENABLED", "").lower() in ("true", "1", "yes")
whatsapp_disabled_explicitly = os.getenv("WHATSAPP_ENABLED", "").lower() in ("false", "0", "no")
whatsapp_enabled = os.getenv("WHATSAPP_ENABLED", "").lower() in {"true", "1", "yes"}
whatsapp_disabled_explicitly = os.getenv("WHATSAPP_ENABLED", "").lower() in {"false", "0", "no"}
if Platform.WHATSAPP in config.platforms:
# YAML config exists — respect explicit disable
wa_cfg = config.platforms[Platform.WHATSAPP]
@@ -1253,7 +1285,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
config.platforms[Platform.SIGNAL].extra.update({
"http_url": signal_url,
"account": signal_account,
"ignore_stories": os.getenv("SIGNAL_IGNORE_STORIES", "true").lower() in ("true", "1", "yes"),
"ignore_stories": os.getenv("SIGNAL_IGNORE_STORIES", "true").lower() in {"true", "1", "yes"},
})
signal_home = os.getenv("SIGNAL_HOME_CHANNEL")
if signal_home and Platform.SIGNAL in config.platforms:
@@ -1302,7 +1334,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
matrix_password = os.getenv("MATRIX_PASSWORD", "")
if matrix_password:
config.platforms[Platform.MATRIX].extra["password"] = matrix_password
matrix_e2ee = os.getenv("MATRIX_ENCRYPTION", "").lower() in ("true", "1", "yes")
matrix_e2ee = os.getenv("MATRIX_ENCRYPTION", "").lower() in {"true", "1", "yes"}
config.platforms[Platform.MATRIX].extra["encryption"] = matrix_e2ee
matrix_device_id = os.getenv("MATRIX_DEVICE_ID", "")
if matrix_device_id:
@@ -1367,7 +1399,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
)
# API Server
api_server_enabled = os.getenv("API_SERVER_ENABLED", "").lower() in ("true", "1", "yes")
api_server_enabled = os.getenv("API_SERVER_ENABLED", "").lower() in {"true", "1", "yes"}
api_server_key = os.getenv("API_SERVER_KEY", "")
api_server_cors_origins = os.getenv("API_SERVER_CORS_ORIGINS", "")
api_server_port = os.getenv("API_SERVER_PORT")
@@ -1394,7 +1426,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
config.platforms[Platform.API_SERVER].extra["model_name"] = api_server_model_name
# Webhook platform
webhook_enabled = os.getenv("WEBHOOK_ENABLED", "").lower() in ("true", "1", "yes")
webhook_enabled = os.getenv("WEBHOOK_ENABLED", "").lower() in {"true", "1", "yes"}
webhook_port = os.getenv("WEBHOOK_PORT")
webhook_secret = os.getenv("WEBHOOK_SECRET", "")
if webhook_enabled:
@@ -1410,11 +1442,11 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
config.platforms[Platform.WEBHOOK].extra["secret"] = webhook_secret
# Microsoft Graph webhook platform
msgraph_webhook_enabled = os.getenv("MSGRAPH_WEBHOOK_ENABLED", "").lower() in (
msgraph_webhook_enabled = os.getenv("MSGRAPH_WEBHOOK_ENABLED", "").lower() in {
"true",
"1",
"yes",
)
}
msgraph_webhook_port = os.getenv("MSGRAPH_WEBHOOK_PORT")
msgraph_webhook_client_state = os.getenv("MSGRAPH_WEBHOOK_CLIENT_STATE", "")
msgraph_webhook_resources = os.getenv("MSGRAPH_WEBHOOK_ACCEPTED_RESOURCES", "")
@@ -1608,7 +1640,7 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
"webhook_host": os.getenv("BLUEBUBBLES_WEBHOOK_HOST", "127.0.0.1"),
"webhook_port": int(os.getenv("BLUEBUBBLES_WEBHOOK_PORT", "8645")),
"webhook_path": os.getenv("BLUEBUBBLES_WEBHOOK_PATH", "/bluebubbles-webhook"),
"send_read_receipts": os.getenv("BLUEBUBBLES_SEND_READ_RECEIPTS", "true").lower() in ("true", "1", "yes"),
"send_read_receipts": os.getenv("BLUEBUBBLES_SEND_READ_RECEIPTS", "true").lower() in {"true", "1", "yes"},
})
bluebubbles_home = os.getenv("BLUEBUBBLES_HOME_CHANNEL")
if bluebubbles_home and Platform.BLUEBUBBLES in config.platforms:
+4 -4
View File
@@ -81,7 +81,7 @@ _TIER_MINIMAL = {
_PLATFORM_DEFAULTS: dict[str, dict[str, Any]] = {
# Tier 1 — full edit support, personal/team use
"telegram": _TIER_HIGH,
"telegram": {**_TIER_HIGH, "tool_progress": "new"},
"discord": _TIER_HIGH,
# Tier 2 — edit support, often customer/workspace channels
@@ -190,13 +190,13 @@ def _normalise(setting: str, value: Any) -> Any:
if value is True:
return "all"
return str(value).lower()
if setting in ("show_reasoning", "streaming"):
if setting in {"show_reasoning", "streaming"}:
if isinstance(value, str):
return value.lower() in ("true", "1", "yes", "on")
return value.lower() in {"true", "1", "yes", "on"}
return bool(value)
if setting == "cleanup_progress":
if isinstance(value, str):
return value.lower() in ("true", "1", "yes", "on")
return value.lower() in {"true", "1", "yes", "on"}
return bool(value)
if setting == "tool_preview_length":
try:
+11
View File
@@ -33,6 +33,17 @@ status display, gateway setup, and more.
auto-populate `OPTIONAL_ENV_VARS` in `hermes_cli/config.py` so the setup
wizard surfaces proper descriptions, prompts, password flags, and URLs.
**Subclassing for platform-specific UX.** When a platform has a hard
time-window constraint that the base adapter can't anticipate (LINE's
60s single-use reply token, WhatsApp's 24h session window, etc.), an
adapter can override `_keep_typing` to layer a mid-flight bubble at a
threshold without expanding the kwarg surface. Always
`await super()._keep_typing(...)` so the typing heartbeat keeps running,
and tear down your side task in `finally`. See `plugins/platforms/line/`
for the full pattern (Template Buttons postback at 45s, `RequestCache`
state machine, `interrupt_session_activity` override for `/stop`
orphans) and the developer-guide page for the prose walkthrough.
See `plugins/platforms/irc/`, `plugins/platforms/teams/`, and
`plugins/platforms/google_chat/` for complete working examples, and
`website/docs/developer-guide/adding-platform-adapters.md` for the full
+26 -2
View File
@@ -9,9 +9,19 @@ Each adapter handles:
"""
from .base import BasePlatformAdapter, MessageEvent, SendResult
from .qqbot import QQAdapter
from .yuanbao import YuanbaoAdapter
# QQAdapter and YuanbaoAdapter were previously imported eagerly here, but
# nothing in the codebase consumes ``from gateway.platforms import
# QQAdapter`` (every real call site uses the long-form path
# ``from gateway.platforms.qqbot import QQAdapter``). The eager imports
# pulled in qqbot's chunked-upload + keyboards + onboard machinery and
# yuanbao's websocket stack — about 48 ms wall and ~8 MB RSS on every
# CLI invocation, even ones that never touch a gateway adapter.
#
# Use PEP 562 module ``__getattr__`` to keep the public re-export working
# while deferring the actual import to first attribute access. This is
# 100% backward-compatible for any external code that still imports the
# adapters from the package root.
__all__ = [
"BasePlatformAdapter",
"MessageEvent",
@@ -19,3 +29,17 @@ __all__ = [
"QQAdapter",
"YuanbaoAdapter",
]
def __getattr__(name):
if name == "QQAdapter":
from .qqbot import QQAdapter # noqa: F401
return QQAdapter
if name == "YuanbaoAdapter":
from .yuanbao import YuanbaoAdapter # noqa: F401
return YuanbaoAdapter
raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
def __dir__():
return sorted(__all__)
+59 -13
View File
@@ -449,7 +449,7 @@ if AIOHTTP_AVAILABLE:
@web.middleware
async def body_limit_middleware(request, handler):
"""Reject overly large request bodies early based on Content-Length."""
if request.method in ("POST", "PUT", "PATCH"):
if request.method in {"POST", "PUT", "PATCH"}:
cl = request.headers.get("Content-Length")
if cl is not None:
try:
@@ -646,7 +646,7 @@ class APIServerAdapter(BasePlatformAdapter):
try:
from hermes_cli.profiles import get_active_profile_name
profile = get_active_profile_name()
if profile and profile not in ("default", "custom"):
if profile and profile not in {"default", "custom"}:
return profile
except Exception:
pass
@@ -1003,7 +1003,7 @@ class APIServerAdapter(BasePlatformAdapter):
system_prompt = content
else:
system_prompt = system_prompt + "\n" + content
elif role in ("user", "assistant"):
elif role in {"user", "assistant"}:
try:
content = _normalize_multimodal_content(raw_content)
except ValueError as exc:
@@ -1206,10 +1206,49 @@ class APIServerAdapter(BasePlatformAdapter):
status=500,
)
final_response = result.get("final_response", "")
if not final_response:
final_response = result.get("error", "(No response generated)")
final_response = result.get("final_response") or ""
is_partial = bool(result.get("partial"))
is_failed = bool(result.get("failed"))
completed = bool(result.get("completed", True))
err_msg = result.get("error")
# Decide finish_reason. OpenAI uses "length" for truncation, "stop"
# for normal completion, and downstream SDKs accept "error" / custom
# codes. See issue #22496.
if is_partial and err_msg and "truncat" in err_msg.lower():
finish_reason = "length"
elif is_failed or (not completed and err_msg):
finish_reason = "error"
else:
finish_reason = "stop"
response_headers = {
"X-Hermes-Session-Id": result.get("session_id", session_id),
}
if gateway_session_key:
response_headers["X-Hermes-Session-Key"] = gateway_session_key
# Hard-fail path: no usable assistant text AND a real failure → 5xx
# with OpenAI-style error envelope so SDK clients raise instead of
# silently rendering the internal failure string as message.content.
if not final_response and (is_failed or is_partial):
err_body = _openai_error(
err_msg or "Agent run did not produce a response.",
err_type="server_error",
code="agent_incomplete",
)
err_body["error"]["hermes"] = {
"completed": completed,
"partial": is_partial,
"failed": is_failed,
}
response_headers["X-Hermes-Completed"] = "false"
response_headers["X-Hermes-Partial"] = "true" if is_partial else "false"
return web.json_response(err_body, status=502, headers=response_headers)
# Soft-partial path: we have *some* text but the run did not complete
# (e.g. truncation with partial buffered output). Still 200 but signal
# truncation via finish_reason="length" + Hermes-specific extras.
response_data = {
"id": completion_id,
"object": "chat.completion",
@@ -1222,7 +1261,7 @@ class APIServerAdapter(BasePlatformAdapter):
"role": "assistant",
"content": final_response,
},
"finish_reason": "stop",
"finish_reason": finish_reason,
}
],
"usage": {
@@ -1231,12 +1270,19 @@ class APIServerAdapter(BasePlatformAdapter):
"total_tokens": usage.get("total_tokens", 0),
},
}
if is_partial or is_failed or not completed:
response_data["hermes"] = {
"completed": completed,
"partial": is_partial,
"failed": is_failed,
"error": err_msg,
"error_code": "output_truncated" if finish_reason == "length" else "agent_error",
}
response_headers["X-Hermes-Completed"] = "false"
response_headers["X-Hermes-Partial"] = "true" if is_partial else "false"
if err_msg:
response_headers["X-Hermes-Error"] = err_msg[:200]
response_headers = {
"X-Hermes-Session-Id": result.get("session_id", session_id),
}
if gateway_session_key:
response_headers["X-Hermes-Session-Key"] = gateway_session_key
return web.json_response(response_data, headers=response_headers)
async def _write_sse_chat_completion(
@@ -2335,7 +2381,7 @@ class APIServerAdapter(BasePlatformAdapter):
if cron_err:
return cron_err
try:
include_disabled = request.query.get("include_disabled", "").lower() in ("true", "1")
include_disabled = request.query.get("include_disabled", "").lower() in {"true", "1"}
jobs = _cron_list(include_disabled=include_disabled)
return web.json_response({"jobs": jobs})
except Exception as e:
+104 -3
View File
@@ -560,7 +560,7 @@ def _looks_like_image(data: bytes) -> bool:
return True
if data[:3] == b"\xff\xd8\xff":
return True
if data[:6] in (b"GIF87a", b"GIF89a"):
if data[:6] in {b"GIF87a", b"GIF89a"}:
return True
if data[:2] == b"BM":
return True
@@ -859,7 +859,7 @@ def cache_document_from_bytes(data: bytes, filename: str) -> str:
# Sanitize: strip directory components, null bytes, and control characters
safe_name = Path(filename).name if filename else "document"
safe_name = safe_name.replace("\x00", "").strip()
if not safe_name or safe_name in (".", ".."):
if not safe_name or safe_name in {".", ".."}:
safe_name = "document"
cached_name = f"doc_{uuid.uuid4().hex[:12]}_{safe_name}"
filepath = cache_dir / cached_name
@@ -1035,6 +1035,13 @@ class SendResult:
error: Optional[str] = None
raw_response: Any = None
retryable: bool = False # True for transient connection errors — base will retry automatically
# When the adapter had to split an oversized payload across multiple
# platform messages (e.g. Telegram edit_message overflow split-and-deliver),
# ``message_id`` is the LAST visible message id (so subsequent edits target
# the most recent chunk) and these are the additional message ids that
# made up the full payload, in send order. Empty tuple for the common
# single-message case.
continuation_message_ids: tuple = ()
class EphemeralReply(str):
@@ -1311,6 +1318,61 @@ class BasePlatformAdapter(ABC):
# _keep_typing skips send_typing when the chat_id is in this set.
self._typing_paused: set = set()
@property
def message_len_fn(self) -> Callable[[str], int]:
"""Return the length function for measuring message size on this platform.
Override in adapters whose platform counts characters differently from
Python ``len`` (e.g. Telegram counts UTF-16 code units).
"""
return len
def supports_draft_streaming(
self,
chat_type: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> bool:
"""Whether this adapter supports native streaming-draft updates.
Telegram Bot API 9.5 introduced ``sendMessageDraft``, which renders an
animated streaming preview as the bot calls it repeatedly with the
same ``draft_id`` and growing text. Adapters that implement
``send_draft`` should return True here for the chat types where the
platform supports it (Telegram restricts drafts to private DMs).
Default implementation returns False. Stream consumers fall back to
the edit-based path (``send`` + ``edit_message``) when this returns
False or when ``send_draft`` raises.
"""
return False
async def send_draft(
self,
chat_id: str,
draft_id: int,
content: str,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send or update an animated streaming-draft preview.
Reuse the same ``draft_id`` (any non-zero int) across consecutive
calls within a single response so the platform animates the preview
rather than re-creating it. Different responses must use different
``draft_id`` values within the same chat to avoid animating over a
prior bubble.
Drafts have no message_id and cannot be edited, replied to, or
deleted via normal message APIs. When the response finishes, the
caller delivers the final answer as a regular ``send`` and the
draft preview clears naturally on the client.
Default implementation raises NotImplementedError; adapters that
also return True from :meth:`supports_draft_streaming` must override.
"""
raise NotImplementedError(
f"{type(self).__name__} does not implement send_draft"
)
@property
def has_fatal_error(self) -> bool:
return self._fatal_error_message is not None
@@ -1511,6 +1573,33 @@ class BasePlatformAdapter(ABC):
# property) so the stream consumer knows not to short-circuit.
REQUIRES_EDIT_FINALIZE: bool = False
async def create_handoff_thread(
self,
parent_chat_id: str,
name: str,
) -> Optional[str]:
"""Create a fresh thread under ``parent_chat_id`` for a session handoff.
Used by the gateway's handoff watcher when transferring a CLI
session to a thread-capable platform the new thread isolates the
handed-off conversation from any pre-existing chat in the home
channel and gives users a clean per-handoff scrollback.
Returns the new thread/topic id (as a string) on success, or
``None`` if the platform doesn't support threading or the
attempt failed (permissions, topics-mode off, etc.). When ``None``
is returned the watcher falls back to using ``parent_chat_id``
directly.
Default implementation returns ``None`` adapters that support
threads override this. See:
- Telegram: forum topics in groups, DM topics with bot API 9.4+
- Discord: text-channel threads (1440-min auto-archive)
- Slack: seed-message thread anchoring
"""
return None
async def edit_message(
self,
chat_id: str,
@@ -2704,7 +2793,7 @@ class BasePlatformAdapter(ABC):
# and preserve ordering of queued follow-ups. Route those
# through the dedicated handoff path that serializes
# cancellation + runner response + pending drain.
if cmd in ("stop", "new", "reset"):
if cmd in {"stop", "new", "reset"}:
try:
await self._dispatch_active_session_command(event, session_key, cmd)
except Exception as e:
@@ -2950,6 +3039,18 @@ class BasePlatformAdapter(ABC):
if text_content:
logger.info("[%s] Sending response (%d chars) to %s", self.name, len(text_content), event.source.chat_id)
_reply_anchor = _reply_anchor_for_event(event)
# Mark final response messages for notification delivery.
# Platform adapters that support per-message notification
# control (e.g. Telegram's disable_notification) use this
# flag to override silent-mode and ensure the final
# response triggers a push notification.
# Clone to avoid mutating the metadata shared with the
# typing-indicator task (which must remain unmarked).
if _thread_metadata is not None:
_thread_metadata = dict(_thread_metadata)
_thread_metadata["notify"] = True
else:
_thread_metadata = {"notify": True}
result = await self._send_with_retry(
chat_id=event.source.chat_id,
content=text_content,
+1 -1
View File
@@ -223,7 +223,7 @@ class BlueBubblesAdapter(BasePlatformAdapter):
def _webhook_url(self) -> str:
"""Compute the external webhook URL for BlueBubbles registration."""
host = self.webhook_host
if host in ("0.0.0.0", "127.0.0.1", "localhost", "::"):
if host in {"0.0.0.0", "127.0.0.1", "localhost", "::"}:
host = "localhost"
return f"http://{host}:{self.webhook_port}{self.webhook_path}"
+63 -2
View File
@@ -353,9 +353,9 @@ class DingTalkAdapter(BasePlatformAdapter):
configured = self.config.extra.get("require_mention")
if configured is not None:
if isinstance(configured, str):
return configured.lower() in ("true", "1", "yes", "on")
return configured.lower() in {"true", "1", "yes", "on"}
return bool(configured)
return os.getenv("DINGTALK_REQUIRE_MENTION", "false").lower() in ("true", "1", "yes", "on")
return os.getenv("DINGTALK_REQUIRE_MENTION", "false").lower() in {"true", "1", "yes", "on"}
def _dingtalk_free_response_chats(self) -> Set[str]:
raw = self.config.extra.get("free_response_chats")
@@ -886,6 +886,67 @@ class DingTalkAdapter(BasePlatformAdapter):
"""DingTalk does not support typing indicators."""
pass
async def send_image(
self,
chat_id: str,
image_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send an image via DingTalk markdown.
DingTalk's session webhook only supports text/markdown payloads, not
native image/file attachments. For remote image URLs, render the image
inline with markdown so the user still sees the image. Local files need
OpenAPI media upload and are handled separately.
"""
image_block = f"![image]({image_url})"
content = f"{caption}\n\n{image_block}" if caption else image_block
return await self.send(
chat_id=chat_id,
content=content,
reply_to=reply_to,
metadata=metadata,
)
async def send_image_file(
self,
chat_id: str,
image_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
**kwargs,
) -> SendResult:
"""DingTalk webhook replies cannot send local image files directly."""
return SendResult(
success=False,
error=(
"DingTalk session webhook replies do not support local image uploads. "
"Only markdown/text replies are supported without OpenAPI media upload."
),
)
async def send_document(
self,
chat_id: str,
file_path: str,
caption: Optional[str] = None,
file_name: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
**kwargs,
) -> SendResult:
"""DingTalk webhook replies cannot send local file attachments directly."""
return SendResult(
success=False,
error=(
"DingTalk session webhook replies do not support local file attachments. "
"Only markdown/text replies are supported without OpenAPI message send."
),
)
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
"""Return basic info about a DingTalk conversation."""
return {
+92 -12
View File
@@ -115,7 +115,7 @@ def _build_allowed_mentions():
raw = os.getenv(name, "").strip().lower()
if not raw:
return default
return raw in ("true", "1", "yes", "on")
return raw in {"true", "1", "yes", "on"}
return discord.AllowedMentions(
everyone=_b("DISCORD_ALLOW_MENTION_EVERYONE", False),
@@ -708,7 +708,7 @@ class DiscordAdapter(BasePlatformAdapter):
# Ignore Discord system messages (thread renames, pins, member joins, etc.)
# Allow both default and reply types — replies have a distinct MessageType.
if message.type not in (discord.MessageType.default, discord.MessageType.reply):
if message.type not in {discord.MessageType.default, discord.MessageType.reply}:
return
# Bot message filtering (DISCORD_ALLOW_BOTS):
@@ -769,7 +769,7 @@ class DiscordAdapter(BasePlatformAdapter):
# answer regardless of who is mentioned.
_ignore_no_mention = os.getenv(
"DISCORD_IGNORE_NO_MENTION", "true"
).lower() in ("true", "1", "yes")
).lower() in {"true", "1", "yes"}
if _ignore_no_mention and not _self_mentioned and not _other_bots_mentioned:
_channel_id = str(message.channel.id)
_parent_id = None
@@ -1317,7 +1317,7 @@ class DiscordAdapter(BasePlatformAdapter):
def _reactions_enabled(self) -> bool:
"""Check if message reactions are enabled via config/env."""
return os.getenv("DISCORD_REACTIONS", "true").lower() not in ("false", "0", "no")
return os.getenv("DISCORD_REACTIONS", "true").lower() not in {"false", "0", "no"}
async def on_processing_start(self, event: MessageEvent) -> None:
"""Add an in-progress reaction for normal Discord message events."""
@@ -2697,6 +2697,8 @@ class DiscordAdapter(BasePlatformAdapter):
await asyncio.sleep(8)
except asyncio.CancelledError:
pass
finally:
self._typing_tasks.pop(chat_id, None)
self._typing_tasks[chat_id] = asyncio.create_task(_typing_loop())
@@ -3135,9 +3137,9 @@ class DiscordAdapter(BasePlatformAdapter):
# UX so users don't see commands they can't invoke. Off by default
# to preserve the slash UX for deployments that intentionally allow
# everyone in the guild.
if os.getenv("DISCORD_HIDE_SLASH_COMMANDS", "false").strip().lower() in (
if os.getenv("DISCORD_HIDE_SLASH_COMMANDS", "false").strip().lower() in {
"true", "1", "yes", "on",
):
}:
self._apply_owner_only_visibility(tree)
def _apply_owner_only_visibility(self, tree) -> None:
@@ -3524,9 +3526,9 @@ class DiscordAdapter(BasePlatformAdapter):
configured = self.config.extra.get("require_mention")
if configured is not None:
if isinstance(configured, str):
return configured.lower() not in ("false", "0", "no", "off")
return configured.lower() not in {"false", "0", "no", "off"}
return bool(configured)
return os.getenv("DISCORD_REQUIRE_MENTION", "true").lower() not in ("false", "0", "no", "off")
return os.getenv("DISCORD_REQUIRE_MENTION", "true").lower() not in {"false", "0", "no", "off"}
def _discord_free_response_channels(self) -> set:
"""Return Discord channel IDs where no bot mention is required.
@@ -3689,6 +3691,84 @@ class DiscordAdapter(BasePlatformAdapter):
)
return None
async def create_handoff_thread(
self,
parent_chat_id: str,
name: str,
) -> Optional[str]:
"""Create a Discord thread under a text channel for a handoff.
Falls back to a seed-message + ``message.create_thread`` path if
``parent.create_thread`` is rejected (some channel types or
permission setups). Returns the new thread id as a string, or
``None`` on failure or when the parent isn't a text channel
(DMs, voice channels, threads themselves can't host threads).
"""
if not self._client or not DISCORD_AVAILABLE:
return None
try:
parent_id = int(parent_chat_id)
except (TypeError, ValueError):
return None
try:
parent = self._client.get_channel(parent_id)
if parent is None:
parent = await self._client.fetch_channel(parent_id)
except Exception as exc:
logger.warning(
"[%s] Handoff thread: cannot resolve parent %s: %s",
self.name, parent_chat_id, exc,
)
return None
# DMs, voice channels, and existing threads can't host child threads.
if isinstance(parent, getattr(discord, "DMChannel", ())):
logger.info(
"[%s] Handoff thread: parent %s is a DM; threads not supported here",
self.name, parent_chat_id,
)
return None
thread_name = (name or "handoff").strip()[:80] or "handoff"
reason = "Hermes session handoff"
# First try: create a thread directly on the channel.
try:
create = getattr(parent, "create_thread", None)
if create is not None:
thread = await create(
name=thread_name,
auto_archive_duration=1440,
reason=reason,
)
return str(thread.id)
except Exception as direct_error:
logger.debug(
"[%s] Handoff thread: direct create failed (%s); trying seed-message fallback",
self.name, direct_error,
)
# Fallback: post a seed message and create the thread from it.
try:
send = getattr(parent, "send", None)
if send is None:
return None
seed_msg = await send(f"\U0001f9f5 Hermes handoff: **{thread_name}**")
thread = await seed_msg.create_thread(
name=thread_name,
auto_archive_duration=1440,
reason=reason,
)
return str(thread.id)
except Exception as fallback_error:
logger.warning(
"[%s] Handoff thread: both create paths failed for parent %s: %s",
self.name, parent_chat_id, fallback_error,
)
return None
async def send_exec_approval(
self, chat_id: str, command: str, session_key: str,
description: str = "dangerous command",
@@ -4120,7 +4200,7 @@ class DiscordAdapter(BasePlatformAdapter):
no_thread_channels_raw = os.getenv("DISCORD_NO_THREAD_CHANNELS", "")
no_thread_channels = {ch.strip() for ch in no_thread_channels_raw.split(",") if ch.strip()}
skip_thread = bool(channel_ids & no_thread_channels)
auto_thread = os.getenv("DISCORD_AUTO_THREAD", "true").lower() in ("true", "1", "yes")
auto_thread = os.getenv("DISCORD_AUTO_THREAD", "true").lower() in {"true", "1", "yes"}
is_reply_message = getattr(message, "type", None) == discord.MessageType.reply
if auto_thread and not skip_thread and not is_voice_linked_channel and not is_reply_message:
thread = await self._auto_create_thread(message)
@@ -4202,7 +4282,7 @@ class DiscordAdapter(BasePlatformAdapter):
try:
# Determine extension from content type (image/png -> .png)
ext = "." + content_type.split("/")[-1].split(";")[0]
if ext not in (".jpg", ".jpeg", ".png", ".gif", ".webp"):
if ext not in {".jpg", ".jpeg", ".png", ".gif", ".webp"}:
ext = ".jpg"
cached_path = await self._cache_discord_image(att, ext)
media_urls.append(cached_path)
@@ -4216,7 +4296,7 @@ class DiscordAdapter(BasePlatformAdapter):
elif content_type.startswith("audio/"):
try:
ext = "." + content_type.split("/")[-1].split(";")[0]
if ext not in (".ogg", ".mp3", ".wav", ".webm", ".m4a"):
if ext not in {".ogg", ".mp3", ".wav", ".webm", ".m4a"}:
ext = ".ogg"
cached_path = await self._cache_discord_audio(att, ext)
media_urls.append(cached_path)
@@ -4259,7 +4339,7 @@ class DiscordAdapter(BasePlatformAdapter):
logger.info("[Discord] Cached user document: %s", cached_path)
# Inject text content for plain-text documents (capped at 100 KB)
MAX_TEXT_INJECT_BYTES = 100 * 1024
if ext in (".md", ".txt", ".log") and len(raw_bytes) <= MAX_TEXT_INJECT_BYTES:
if ext in {".md", ".txt", ".log"} and len(raw_bytes) <= MAX_TEXT_INJECT_BYTES:
try:
text_content = raw_bytes.decode("utf-8")
display_name = att.filename or f"document{ext}"
+27 -2
View File
@@ -54,7 +54,7 @@ _NOREPLY_PATTERNS = (
# RFC headers that indicate bulk/automated mail
_AUTOMATED_HEADERS = {
"Auto-Submitted": lambda v: v.lower() != "no",
"Precedence": lambda v: v.lower() in ("bulk", "list", "junk"),
"Precedence": lambda v: v.lower() in {"bulk", "list", "junk"},
"X-Auto-Response-Suppress": lambda v: bool(v),
"List-Unsubscribe": lambda v: bool(v),
}
@@ -65,6 +65,29 @@ MAX_MESSAGE_LENGTH = 50_000
# Supported image extensions for inline detection
_IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
def _send_imap_id(imap: "imaplib.IMAP4") -> None:
"""Send RFC 2971 IMAP ID command identifying this client.
Required by 163/NetEase mailbox after LOGIN: without it, every UID
SEARCH/FETCH returns ``BYE Unsafe Login`` and disconnects. Other
IMAP servers either honor it silently or reject the unknown command;
we swallow failures so non-supporting servers keep working.
"""
try:
try:
from hermes_cli import __version__ as _hermes_version
except Exception: # noqa: BLE001 — keep ID best-effort if import fails
_hermes_version = "0"
imap.xatom(
"ID",
f'("name" "hermes-agent" "version" "{_hermes_version}" '
'"vendor" "NousResearch" '
'"support-email" "noreply@nousresearch.com")',
)
except Exception as e: # noqa: BLE001 — best-effort, never fatal
logger.debug("[Email] IMAP ID command not accepted: %s", e)
def _is_automated_sender(address: str, headers: dict) -> bool:
"""Return True if this email is from an automated/noreply source."""
addr = address.lower()
@@ -180,7 +203,7 @@ def _extract_attachments(
continue
# Skip text/plain and text/html body parts
content_type = part.get_content_type()
if content_type in ("text/plain", "text/html") and "attachment" not in disposition:
if content_type in {"text/plain", "text/html"} and "attachment" not in disposition:
continue
filename = part.get_filename()
@@ -276,6 +299,7 @@ class EmailAdapter(BasePlatformAdapter):
# Test IMAP connection
imap = imaplib.IMAP4_SSL(self._imap_host, self._imap_port, timeout=30)
imap.login(self._address, self._password)
_send_imap_id(imap)
# Mark all existing messages as seen so we only process new ones
imap.select("INBOX")
status, data = imap.uid("search", None, "ALL")
@@ -344,6 +368,7 @@ class EmailAdapter(BasePlatformAdapter):
imap = imaplib.IMAP4_SSL(self._imap_host, self._imap_port, timeout=30)
try:
imap.login(self._address, self._password)
_send_imap_id(imap)
imap.select("INBOX")
status, data = imap.uid("search", None, "UNSEEN")
+31 -21
View File
@@ -428,7 +428,7 @@ RejectReason = Literal[
def _is_bot_sender(sender: Any) -> bool:
# receive_v1 docs say {user, bot}; accept "app" defensively.
return getattr(sender, "sender_type", "") in ("bot", "app")
return getattr(sender, "sender_type", "") in {"bot", "app"}
def _sender_identity(sender: Any) -> frozenset:
@@ -1428,8 +1428,8 @@ class FeishuAdapter(BasePlatformAdapter):
per_chat_require_mention = _to_boolean(rule_cfg.get("require_mention"))
group_rules[str(chat_id)] = FeishuGroupRule(
policy=str(rule_cfg.get("policy", "open")).strip().lower(),
allowlist=set(str(u).strip() for u in rule_cfg.get("allowlist", []) if str(u).strip()),
blacklist=set(str(u).strip() for u in rule_cfg.get("blacklist", []) if str(u).strip()),
allowlist={str(u).strip() for u in rule_cfg.get("allowlist", []) if str(u).strip()},
blacklist={str(u).strip() for u in rule_cfg.get("blacklist", []) if str(u).strip()},
require_mention=per_chat_require_mention,
)
@@ -1443,7 +1443,7 @@ class FeishuAdapter(BasePlatformAdapter):
# Env-only so adapter and gateway auth bypass share one source; yaml
# feishu.allow_bots is bridged to this env var at config load.
allow_bots = os.getenv("FEISHU_ALLOW_BOTS", "none").strip().lower()
if allow_bots not in ("none", "mentions", "all"):
if allow_bots not in {"none", "mentions", "all"}:
logger.warning(
"[Feishu] Unknown allow_bots=%r, falling back to 'none'. Valid: none, mentions, all.",
allow_bots,
@@ -2752,7 +2752,7 @@ class FeishuAdapter(BasePlatformAdapter):
# =========================================================================
def _reactions_enabled(self) -> bool:
return os.getenv("FEISHU_REACTIONS", "true").strip().lower() not in ("false", "0", "no")
return os.getenv("FEISHU_REACTIONS", "true").strip().lower() not in {"false", "0", "no"}
async def _add_reaction(self, message_id: str, emoji_type: str) -> Optional[str]:
"""Return the reaction_id on success, else None. The id is needed later for deletion."""
@@ -3219,7 +3219,7 @@ class FeishuAdapter(BasePlatformAdapter):
self._on_bot_added_to_chat(data)
elif event_type == "im.chat.member.bot.deleted_v1":
self._on_bot_removed_from_chat(data)
elif event_type in ("im.message.reaction.created_v1", "im.message.reaction.deleted_v1"):
elif event_type in {"im.message.reaction.created_v1", "im.message.reaction.deleted_v1"}:
self._on_reaction_event(event_type, data)
elif event_type == "card.action.trigger":
self._on_card_action_trigger(data)
@@ -4273,21 +4273,31 @@ class FeishuAdapter(BasePlatformAdapter):
request = self._build_reply_message_request(effective_reply_to, body)
return await asyncio.to_thread(self._client.im.v1.message.reply, request)
body = self._build_create_message_body(
receive_id=chat_id,
msg_type=msg_type,
content=payload,
uuid_value=str(uuid.uuid4()),
)
# Detect whether chat_id is a user open_id (DM) or a chat_id (group).
# Feishu API expects receive_id_type="open_id" for user DMs (ou_ prefix)
# and receive_id_type="chat_id" for group chats (oc_ prefix, which IS
# the chat_id format — see https://open.feishu.cn/document/).
if chat_id.startswith("ou_"):
receive_id_type = "open_id"
# For topic/thread messages that fell back from reply→create, use
# thread_id as receive_id so the message lands in the topic instead of
# the main chat.
_thread_id = (metadata or {}).get("thread_id")
if _thread_id:
body = self._build_create_message_body(
receive_id=_thread_id,
msg_type=msg_type,
content=payload,
uuid_value=str(uuid.uuid4()),
)
request = self._build_create_message_request("thread_id", body)
else:
receive_id_type = "chat_id"
request = self._build_create_message_request(receive_id_type, body)
body = self._build_create_message_body(
receive_id=chat_id,
msg_type=msg_type,
content=payload,
uuid_value=str(uuid.uuid4()),
)
# Detect whether chat_id is a user open_id (DM) or a chat_id (group).
if chat_id.startswith("ou_"):
receive_id_type = "open_id"
else:
receive_id_type = "chat_id"
request = self._build_create_message_request(receive_id_type, body)
return await asyncio.to_thread(self._client.im.v1.message.create, request)
@staticmethod
@@ -4805,7 +4815,7 @@ def _poll_registration(
# Terminal errors
error = res.get("error", "")
if error in ("access_denied", "expired_token"):
if error in {"access_denied", "expired_token"}:
if poll_count > 0:
print()
logger.warning("[Feishu onboard] Registration %s", error)
+3 -3
View File
@@ -690,7 +690,7 @@ def _extract_docs_links(replies: List[Dict[str, Any]]) -> List[Dict[str, str]]:
except (json.JSONDecodeError, TypeError):
continue
for elem in content.get("elements", []):
if elem.get("type") not in ("docs_link", "link"):
if elem.get("type") not in {"docs_link", "link"}:
continue
link_data = elem.get("docs_link") or elem.get("link") or {}
url = link_data.get("url", "")
@@ -1031,7 +1031,7 @@ def _save_session_history(key: str, messages: List[Dict[str, Any]]) -> None:
# Only keep user/assistant messages (strip system messages and tool internals)
cleaned = [
m for m in messages
if m.get("role") in ("user", "assistant") and m.get("content")
if m.get("role") in {"user", "assistant"} and m.get("content")
]
# Keep last N
if len(cleaned) > _SESSION_MAX_MESSAGES:
@@ -1170,7 +1170,7 @@ async def handle_drive_comment_event(
rule = resolve_rule(comments_cfg, file_type, file_token)
# If no exact match and config has wiki keys, try reverse-lookup
if rule.match_source in ("wildcard", "top") and has_wiki_keys(comments_cfg):
if rule.match_source in {"wildcard", "top"} and has_wiki_keys(comments_cfg):
wiki_token = await _reverse_lookup_wiki_token(client, file_type, file_token)
if wiki_token:
rule = resolve_rule(comments_cfg, file_type, file_token, wiki_token=wiki_token)
+1 -1
View File
@@ -228,7 +228,7 @@ def _load_pairing_approved() -> set:
if isinstance(approved, dict):
return set(approved.keys())
if isinstance(approved, list):
return set(str(u) for u in approved if u)
return {str(u) for u in approved if u}
return set()
+1 -1
View File
@@ -246,7 +246,7 @@ class ThreadParticipationTracker:
thread_list = list(self._threads)
if len(thread_list) > self._max_tracked:
thread_list = thread_list[-self._max_tracked:]
self._threads = {thread_id: None for thread_id in thread_list}
self._threads = dict.fromkeys(thread_list)
atomic_json_write(path, thread_list, indent=None)
def mark(self, thread_id: str) -> None:
+2 -2
View File
@@ -256,7 +256,7 @@ class HomeAssistantAdapter(BasePlatformAdapter):
await self._handle_ha_event(data.get("event", {}))
except json.JSONDecodeError:
logger.debug("Invalid JSON from HA WS: %s", ws_msg.data[:200])
elif ws_msg.type in (aiohttp.WSMsgType.CLOSED, aiohttp.WSMsgType.ERROR):
elif ws_msg.type in {aiohttp.WSMsgType.CLOSED, aiohttp.WSMsgType.ERROR}:
break
async def _handle_ha_event(self, event: Dict[str, Any]) -> None:
@@ -361,7 +361,7 @@ class HomeAssistantAdapter(BasePlatformAdapter):
f"(was {'triggered' if old_val == 'on' else 'cleared'})"
)
if domain in ("light", "switch", "fan"):
if domain in {"light", "switch", "fan"}:
return (
f"[Home Assistant] {friendly_name}: turned "
f"{'on' if new_val == 'on' else 'off'}"
+13 -13
View File
@@ -245,11 +245,11 @@ def check_matrix_requirements() -> bool:
# If encryption is requested, verify E2EE deps are available at startup
# rather than silently degrading to plaintext-only at connect time.
encryption_requested = os.getenv("MATRIX_ENCRYPTION", "").lower() in (
encryption_requested = os.getenv("MATRIX_ENCRYPTION", "").lower() in {
"true",
"1",
"yes",
)
}
if encryption_requested and not _check_e2ee_deps():
logger.error(
"Matrix: MATRIX_ENCRYPTION=true but E2EE dependencies are missing. %s. "
@@ -312,7 +312,7 @@ class MatrixAdapter(BasePlatformAdapter):
)
self._encryption: bool = config.extra.get(
"encryption",
os.getenv("MATRIX_ENCRYPTION", "").lower() in ("true", "1", "yes"),
os.getenv("MATRIX_ENCRYPTION", "").lower() in {"true", "1", "yes"},
)
self._device_id: str = config.extra.get("device_id", "") or os.getenv(
"MATRIX_DEVICE_ID", ""
@@ -343,7 +343,7 @@ class MatrixAdapter(BasePlatformAdapter):
# Mention/thread gating — parsed once from env vars.
self._require_mention: bool = os.getenv(
"MATRIX_REQUIRE_MENTION", "true"
).lower() not in ("false", "0", "no")
).lower() not in {"false", "0", "no"}
free_rooms_raw = config.extra.get("free_response_rooms")
if free_rooms_raw is None:
free_rooms_raw = os.getenv("MATRIX_FREE_RESPONSE_ROOMS", "")
@@ -367,22 +367,22 @@ class MatrixAdapter(BasePlatformAdapter):
self._allowed_rooms: Set[str] = {
r.strip() for r in str(allowed_rooms_raw).split(",") if r.strip()
}
self._auto_thread: bool = os.getenv("MATRIX_AUTO_THREAD", "true").lower() in (
self._auto_thread: bool = os.getenv("MATRIX_AUTO_THREAD", "true").lower() in {
"true",
"1",
"yes",
)
}
self._dm_auto_thread: bool = os.getenv(
"MATRIX_DM_AUTO_THREAD", "false"
).lower() in ("true", "1", "yes")
).lower() in {"true", "1", "yes"}
self._dm_mention_threads: bool = os.getenv(
"MATRIX_DM_MENTION_THREADS", "false"
).lower() in ("true", "1", "yes")
).lower() in {"true", "1", "yes"}
# Reactions: configurable via MATRIX_REACTIONS (default: true).
self._reactions_enabled: bool = os.getenv(
"MATRIX_REACTIONS", "true"
).lower() not in ("false", "0", "no")
).lower() not in {"false", "0", "no"}
self._pending_reactions: dict[tuple[str, str], str] = {}
# Delay before redacting reactions so Matrix homeservers have time to
# deliver the final message event without tripping "missing event"
@@ -1771,9 +1771,9 @@ class MatrixAdapter(BasePlatformAdapter):
# Cache media locally when downstream tools need a real file path.
cached_path = None
should_cache_locally = msg_type in (
should_cache_locally = msg_type in {
MessageType.PHOTO, MessageType.AUDIO, MessageType.VIDEO, MessageType.DOCUMENT,
) or is_voice_message or is_encrypted_media
} or is_voice_message or is_encrypted_media
if should_cache_locally and url:
try:
file_bytes = await self._client.download_media(ContentURI(url))
@@ -1834,7 +1834,7 @@ class MatrixAdapter(BasePlatformAdapter):
ext = ext_map.get(media_type, ".jpg")
cached_path = cache_image_from_bytes(file_bytes, ext=ext)
logger.info("[Matrix] Cached user image at %s", cached_path)
elif msg_type in (MessageType.AUDIO, MessageType.VOICE):
elif msg_type in {MessageType.AUDIO, MessageType.VOICE}:
ext = (
Path(
body
@@ -2602,7 +2602,7 @@ class MatrixAdapter(BasePlatformAdapter):
"""Sanitize a URL for use in an href attribute."""
stripped = url.strip()
scheme = stripped.split(":", 1)[0].lower().strip() if ":" in stripped else ""
if scheme in ("javascript", "data", "vbscript"):
if scheme in {"javascript", "data", "vbscript"}:
return ""
return stripped.replace('"', "&quot;")
+6 -6
View File
@@ -611,7 +611,7 @@ class MattermostAdapter(BasePlatformAdapter):
# succeed on retry — stop reconnecting instead of looping forever.
import aiohttp
err_str = str(exc).lower()
if isinstance(exc, aiohttp.WSServerHandshakeError) and exc.status in (401, 403):
if isinstance(exc, aiohttp.WSServerHandshakeError) and exc.status in {401, 403}:
logger.error("Mattermost WS auth failed (HTTP %d) — stopping reconnect", exc.status)
return
if "401" in err_str or "403" in err_str or "unauthorized" in err_str:
@@ -649,21 +649,21 @@ class MattermostAdapter(BasePlatformAdapter):
if self._closing:
return
if raw_msg.type in (
if raw_msg.type in {
raw_msg.type.TEXT,
raw_msg.type.BINARY,
):
}:
try:
event = json.loads(raw_msg.data)
except (json.JSONDecodeError, TypeError):
continue
await self._handle_ws_event(event)
elif raw_msg.type in (
elif raw_msg.type in {
raw_msg.type.ERROR,
raw_msg.type.CLOSE,
raw_msg.type.CLOSING,
raw_msg.type.CLOSED,
):
}:
logger.info("Mattermost: WebSocket closed (%s)", raw_msg.type)
break
@@ -732,7 +732,7 @@ class MattermostAdapter(BasePlatformAdapter):
require_mention = os.getenv(
"MATTERMOST_REQUIRE_MENTION", "true"
).lower() not in ("false", "0", "no")
).lower() not in {"false", "0", "no"}
free_channels_raw = os.getenv("MATTERMOST_FREE_RESPONSE_CHANNELS", "")
free_channels = {ch.strip() for ch in free_channels_raw.split(",") if ch.strip()}
+16 -16
View File
@@ -513,7 +513,7 @@ class QQAdapter(BasePlatformAdapter):
self._fail_pending("Connection closed")
# Stop reconnecting for fatal codes
if code in (4914, 4915):
if code in {4914, 4915}:
desc = "offline/sandbox-only" if code == 4914 else "banned"
logger.error(
"[%s] Bot is %s. Check QQ Open Platform.", self._log_tag, desc
@@ -550,7 +550,7 @@ class QQAdapter(BasePlatformAdapter):
self._token_expires_at = 0.0
# Session invalid → clear session, will re-identify on next Hello
if code in (
if code in {
4006,
4007,
4009,
@@ -568,7 +568,7 @@ class QQAdapter(BasePlatformAdapter):
4911,
4912,
4913,
):
}:
logger.info(
"[%s] Session error (%d), clearing session for re-identify",
self._log_tag,
@@ -637,12 +637,12 @@ class QQAdapter(BasePlatformAdapter):
payload = self._parse_json(msg.data)
if payload:
self._dispatch_payload(payload)
elif msg.type in (aiohttp.WSMsgType.PING,):
elif msg.type in {aiohttp.WSMsgType.PING,}:
# aiohttp auto-replies with PONG
pass
elif msg.type == aiohttp.WSMsgType.CLOSE:
raise QQCloseError(msg.data, msg.extra)
elif msg.type in (aiohttp.WSMsgType.CLOSED, aiohttp.WSMsgType.ERROR):
elif msg.type in {aiohttp.WSMsgType.CLOSED, aiohttp.WSMsgType.ERROR}:
raise RuntimeError("WebSocket closed")
async def _heartbeat_loop(self) -> None:
@@ -783,13 +783,13 @@ class QQAdapter(BasePlatformAdapter):
self._handle_ready(d)
elif t == "RESUMED":
logger.info("[%s] Session resumed", self._log_tag)
elif t in (
elif t in {
"C2C_MESSAGE_CREATE",
"GROUP_AT_MESSAGE_CREATE",
"DIRECT_MESSAGE_CREATE",
"GUILD_MESSAGE_CREATE",
"GUILD_AT_MESSAGE_CREATE",
):
}:
asyncio.create_task(self._on_message(t, d))
elif t == "INTERACTION_CREATE":
self._create_task(self._on_interaction(d))
@@ -859,9 +859,9 @@ class QQAdapter(BasePlatformAdapter):
# Route by event type
if event_type == "C2C_MESSAGE_CREATE":
await self._handle_c2c_message(d, msg_id, content, author, timestamp)
elif event_type in ("GROUP_AT_MESSAGE_CREATE",):
elif event_type in {"GROUP_AT_MESSAGE_CREATE",}:
await self._handle_group_message(d, msg_id, content, author, timestamp)
elif event_type in ("GUILD_MESSAGE_CREATE", "GUILD_AT_MESSAGE_CREATE"):
elif event_type in {"GUILD_MESSAGE_CREATE", "GUILD_AT_MESSAGE_CREATE"}:
await self._handle_guild_message(d, msg_id, content, author, timestamp)
elif event_type == "DIRECT_MESSAGE_CREATE":
await self._handle_dm_message(d, msg_id, content, author, timestamp)
@@ -1864,7 +1864,7 @@ class QQAdapter(BasePlatformAdapter):
return ".wav"
if data[:4] == b"fLaC":
return ".flac"
if data[:2] in (b"\xff\xfb", b"\xff\xf3", b"\xff\xf2"):
if data[:2] in {b"\xff\xfb", b"\xff\xf3", b"\xff\xf2"}:
return ".mp3"
if data[:4] == b"\x30\x26\xb2\x75" or data[:4] == b"\x4f\x67\x67\x53":
return ".ogg"
@@ -2033,7 +2033,7 @@ class QQAdapter(BasePlatformAdapter):
"base_url": base_url,
"api_key": api_key,
"model": model
or ("glm-asr" if provider in ("zai", "glm") else "whisper-1"),
or ("glm-asr" if provider in {"zai", "glm"} else "whisper-1"),
}
# 2. QQ-specific env vars (set by `hermes setup gateway` / `hermes gateway`)
@@ -2115,7 +2115,7 @@ class QQAdapter(BasePlatformAdapter):
if urlparse(source_url).path
else ""
)
if not ext or ext not in (
if not ext or ext not in {
".silk",
".amr",
".mp3",
@@ -2124,7 +2124,7 @@ class QQAdapter(BasePlatformAdapter):
".m4a",
".aac",
".flac",
):
}:
ext = self._guess_ext_from_data(audio_data)
with tempfile.NamedTemporaryFile(suffix=ext, delete=False) as tmp_src:
@@ -2870,7 +2870,7 @@ class QQAdapter(BasePlatformAdapter):
raise ValueError("Media source is required")
parsed = urlparse(source)
if parsed.scheme in ("http", "https"):
if parsed.scheme in {"http", "https"}:
# For URLs, pass through directly to the upload API
content_type = mimetypes.guess_type(source)[0] or "application/octet-stream"
resolved_name = file_name or Path(parsed.path).name or "media"
@@ -2966,7 +2966,7 @@ class QQAdapter(BasePlatformAdapter):
chat_type = self._guess_chat_type(chat_id)
return {
"name": chat_id,
"type": "group" if chat_type in ("group", "guild") else "dm",
"type": "group" if chat_type in {"group", "guild"} else "dm",
}
# ------------------------------------------------------------------
@@ -2975,7 +2975,7 @@ class QQAdapter(BasePlatformAdapter):
@staticmethod
def _is_url(source: str) -> bool:
return urlparse(str(source)).scheme in ("http", "https")
return urlparse(str(source)).scheme in {"http", "https"}
def _guess_chat_type(self, chat_id: str) -> str:
"""Determine chat type from stored inbound metadata, fallback to 'c2c'."""
+2 -3
View File
@@ -239,7 +239,7 @@ class ChunkedUploader:
:raises UploadFileTooLargeError: When the file exceeds the platform limit.
:raises RuntimeError: On other API or I/O failures.
"""
if chat_type not in ("c2c", "group"):
if chat_type not in {"c2c", "group"}:
raise ValueError(
f"ChunkedUploader: unsupported chat_type {chat_type!r}"
)
@@ -592,8 +592,7 @@ async def _run_with_concurrency(
concurrency: int,
) -> None:
"""Run a list of thunks with a bounded number in flight at once."""
if concurrency < 1:
concurrency = 1
concurrency = max(concurrency, 1)
sem = asyncio.Semaphore(concurrency)
async def _wrap(thunk: Callable[[], Awaitable[None]]) -> None:
+3 -3
View File
@@ -99,11 +99,11 @@ def _guess_extension(data: bytes) -> str:
def _is_image_ext(ext: str) -> bool:
return ext.lower() in (".jpg", ".jpeg", ".png", ".gif", ".webp")
return ext.lower() in {".jpg", ".jpeg", ".png", ".gif", ".webp"}
def _is_audio_ext(ext: str) -> bool:
return ext.lower() in (".mp3", ".wav", ".ogg", ".m4a", ".aac")
return ext.lower() in {".mp3", ".wav", ".ogg", ".m4a", ".aac"}
_EXT_TO_MIME = {
@@ -1449,7 +1449,7 @@ class SignalAdapter(BasePlatformAdapter):
contacts from seeing the 👀 reaction (which fires before run.py's
auth gate and would otherwise reveal that a bot is listening).
"""
if os.getenv("SIGNAL_REACTIONS", "true").lower() in ("false", "0", "no"):
if os.getenv("SIGNAL_REACTIONS", "true").lower() in {"false", "0", "no"}:
return False
if event is not None:
sender = getattr(getattr(event, "source", None), "user_id", None)
+46 -11
View File
@@ -679,6 +679,41 @@ class SlackAdapter(BasePlatformAdapter):
if lock_acquired and not self._running:
self._release_platform_lock()
async def create_handoff_thread(
self,
parent_chat_id: str,
name: str,
) -> Optional[str]:
"""Create a Slack thread anchor for a session handoff.
Slack threads are anchored to a parent message (``thread_ts``), not
a channel-level construct. So we post a seed message into the home
channel and return its ``ts`` the watcher uses that as the
``thread_id`` for subsequent sends.
Returns the seed message ts as a string, or ``None`` on failure.
"""
if not self._app:
return None
try:
client = self._get_client(parent_chat_id)
if client is None:
return None
seed_text = f":thread: Hermes handoff — *{(name or 'session').strip()[:80]}*"
result = await client.chat_postMessage(
channel=parent_chat_id,
text=seed_text,
)
ts = result.get("ts") if isinstance(result, dict) else getattr(result, "get", lambda _k, _d=None: None)("ts")
if ts:
return str(ts)
except Exception as exc:
logger.warning(
"[%s] Handoff thread: seed-post failed for channel %s: %s",
self.name, parent_chat_id, exc,
)
return None
async def disconnect(self) -> None:
"""Disconnect from Slack."""
if self._handler:
@@ -900,7 +935,7 @@ class SlackAdapter(BasePlatformAdapter):
raw = self.config.extra.get("dm_top_level_threads_as_sessions")
if raw is None:
return True # default: each DM thread is its own session
return str(raw).strip().lower() in ("1", "true", "yes", "on")
return str(raw).strip().lower() in {"1", "true", "yes", "on"}
def _resolve_thread_ts(
self,
@@ -1265,7 +1300,7 @@ class SlackAdapter(BasePlatformAdapter):
def _reactions_enabled(self) -> bool:
"""Check if message reactions are enabled via config/env."""
return os.getenv("SLACK_REACTIONS", "true").lower() not in ("false", "0", "no")
return os.getenv("SLACK_REACTIONS", "true").lower() not in {"false", "0", "no"}
async def on_processing_start(self, event: MessageEvent) -> None:
"""Add an in-progress reaction when message processing begins."""
@@ -1738,7 +1773,7 @@ class SlackAdapter(BasePlatformAdapter):
# Ignore message edits and deletions
subtype = event.get("subtype")
if subtype in ("message_changed", "message_deleted"):
if subtype in {"message_changed", "message_deleted"}:
return
original_text = event.get("text", "")
@@ -1857,7 +1892,7 @@ class SlackAdapter(BasePlatformAdapter):
channel_type = event.get("channel_type", "")
if not channel_type and channel_id.startswith("D"):
channel_type = "im"
is_dm = channel_type in ("im", "mpim") # Both 1:1 and group DMs
is_dm = channel_type in {"im", "mpim"} # Both 1:1 and group DMs
# Build thread_ts for session keying.
# In channels: fall back to ts so each top-level @mention starts a
@@ -1998,7 +2033,7 @@ class SlackAdapter(BasePlatformAdapter):
if mimetype.startswith("image/") and url:
try:
ext = "." + mimetype.split("/")[-1].split(";")[0]
if ext not in (".jpg", ".jpeg", ".png", ".gif", ".webp"):
if ext not in {".jpg", ".jpeg", ".png", ".gif", ".webp"}:
ext = ".jpg"
# Slack private URLs require the bot token as auth header
cached = await self._download_slack_file(url, ext, team_id=team_id)
@@ -2014,7 +2049,7 @@ class SlackAdapter(BasePlatformAdapter):
elif mimetype.startswith("audio/") and url:
try:
ext = "." + mimetype.split("/")[-1].split(";")[0]
if ext not in (".ogg", ".mp3", ".wav", ".webm", ".m4a"):
if ext not in {".ogg", ".mp3", ".wav", ".webm", ".m4a"}:
ext = ".ogg"
cached = await self._download_slack_file(url, ext, audio=True, team_id=team_id)
media_urls.append(cached)
@@ -2702,7 +2737,7 @@ class SlackAdapter(BasePlatformAdapter):
if team_id and channel_id:
self._channel_team[channel_id] = team_id
if slash_name in ("hermes", ""):
if slash_name in {"hermes", ""}:
# Legacy /hermes <subcommand> [args] routing + free-form questions.
# Empty slash_name falls into this branch for backward compat
# with any caller that didn't populate command["command"].
@@ -2897,9 +2932,9 @@ class SlackAdapter(BasePlatformAdapter):
configured = self.config.extra.get("require_mention")
if configured is not None:
if isinstance(configured, str):
return configured.lower() not in ("false", "0", "no", "off")
return configured.lower() not in {"false", "0", "no", "off"}
return bool(configured)
return os.getenv("SLACK_REQUIRE_MENTION", "true").lower() not in ("false", "0", "no", "off")
return os.getenv("SLACK_REQUIRE_MENTION", "true").lower() not in {"false", "0", "no", "off"}
def _slack_strict_mention(self) -> bool:
"""When true, channel threads require an explicit @-mention on every
@@ -2909,9 +2944,9 @@ class SlackAdapter(BasePlatformAdapter):
configured = self.config.extra.get("strict_mention")
if configured is not None:
if isinstance(configured, str):
return configured.lower() in ("true", "1", "yes", "on")
return configured.lower() in {"true", "1", "yes", "on"}
return bool(configured)
return os.getenv("SLACK_STRICT_MENTION", "false").lower() in ("true", "1", "yes", "on")
return os.getenv("SLACK_STRICT_MENTION", "false").lower() in {"true", "1", "yes", "on"}
def _slack_free_response_channels(self) -> set:
"""Return channel IDs where no @mention is required."""
+531 -65
View File
@@ -77,7 +77,6 @@ from gateway.platforms.base import (
SUPPORTED_VIDEO_TYPES,
SUPPORTED_DOCUMENT_TYPES,
utf16_len,
_prefix_within_utf16_limit,
)
from gateway.platforms.telegram_network import (
TelegramFallbackTransport,
@@ -180,18 +179,32 @@ def _render_table_block_for_telegram(table_block: list[str]) -> str:
if len(headers) < 2:
return "\n".join(table_block)
# Detect row-label column: present when data rows have one more cell
# than the header row (the row-label column carries no header).
first_data_row = _split_markdown_table_row(table_block[2]) if len(table_block) > 2 else []
has_row_label_col = len(first_data_row) == len(headers) + 1
rendered_rows: list[str] = []
for index, row in enumerate(table_block[2:], start=1):
cells = _split_markdown_table_row(row)
if len(cells) < len(headers):
cells.extend([""] * (len(headers) - len(cells)))
elif len(cells) > len(headers):
cells = cells[: len(headers)]
if has_row_label_col:
# First cell is the row-label (heading); remaining cells align with headers.
heading = cells[0] if cells and cells[0] else f"Row {index}"
data_cells = cells[1:]
else:
# No row-label column: use first non-empty cell as heading.
heading = next((cell for cell in cells if cell), f"Row {index}")
data_cells = cells
# Pad or trim data_cells to match headers length.
if len(data_cells) < len(headers):
data_cells.extend([""] * (len(headers) - len(data_cells)))
elif len(data_cells) > len(headers):
data_cells = data_cells[: len(headers)]
heading = next((cell for cell in cells if cell), f"Row {index}")
rendered_rows.append(f"**{heading}**")
rendered_rows.extend(
f"{header}: {value}" for header, value in zip(headers, cells)
f"{header}: {value}" for header, value in zip(headers, data_cells)
)
return "\n\n".join(rendered_rows)
@@ -269,6 +282,50 @@ class TelegramAdapter(BasePlatformAdapter):
MEDIA_GROUP_WAIT_SECONDS = 0.8
_GENERAL_TOPIC_THREAD_ID = "1"
# Adaptive text-batch ingress: short messages need a tighter delay so the
# first token reaches the agent fast. Numbers tuned for "feels instant":
# ≤320 codepoints (one short paragraph) settles in ~180ms; ≤1024
# (a normal paragraph) in ~240ms; longer waits the configured cap.
# Always clamped to ``_text_batch_delay_seconds`` so an operator can lower
# the cap further via env var.
_TEXT_BATCH_FAST_LEN = 320
_TEXT_BATCH_FAST_DELAY_S = 0.18
_TEXT_BATCH_SHORT_LEN = 1024
_TEXT_BATCH_SHORT_DELAY_S = 0.24
@staticmethod
def _env_float_clamped(
name: str,
default: float,
*,
min_value: Optional[float] = None,
max_value: Optional[float] = None,
) -> float:
"""Read a float env var, reject non-finite values, and clamp to bounds.
Guarantees the returned value is a finite number usable directly in
``asyncio.sleep()`` and similar APIs that reject NaN / Inf.
"""
import math
raw = os.getenv(name)
try:
value = float(raw) if raw is not None else float(default)
except (TypeError, ValueError):
value = float(default)
if not math.isfinite(value):
value = float(default)
if min_value is not None:
value = max(value, min_value)
if max_value is not None:
value = min(value, max_value)
return value
@property
def message_len_fn(self):
"""Telegram measures message length in UTF-16 code units."""
return utf16_len
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.TELEGRAM)
self._app: Optional[Application] = None
@@ -285,9 +342,24 @@ class TelegramAdapter(BasePlatformAdapter):
self._media_group_events: Dict[str, MessageEvent] = {}
self._media_group_tasks: Dict[str, asyncio.Task] = {}
# Buffer rapid text messages so Telegram client-side splits of long
# messages are aggregated into a single MessageEvent.
self._text_batch_delay_seconds = float(os.getenv("HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS", "0.6"))
self._text_batch_split_delay_seconds = float(os.getenv("HERMES_TELEGRAM_TEXT_BATCH_SPLIT_DELAY_SECONDS", "2.0"))
# messages are aggregated into a single MessageEvent. Lower defaults
# (0.3s / 1.0s instead of 0.6s / 2.0s) let short replies stream
# without a noticeable wait — combined with the adaptive fast-path
# in ``_calc_text_batch_delay`` below, ≤320-codepoint replies settle
# in ~180ms. All bounds are conservative for Telegram's
# ~1 edit/s flood envelope.
self._text_batch_delay_seconds = self._env_float_clamped(
"HERMES_TELEGRAM_TEXT_BATCH_DELAY_SECONDS",
0.3,
min_value=0.08,
max_value=2.0,
)
self._text_batch_split_delay_seconds = self._env_float_clamped(
"HERMES_TELEGRAM_TEXT_BATCH_SPLIT_DELAY_SECONDS",
1.0,
min_value=self._text_batch_delay_seconds,
max_value=4.0,
)
self._pending_text_batches: Dict[str, MessageEvent] = {}
self._pending_text_batch_tasks: Dict[str, asyncio.Task] = {}
self._polling_error_task: Optional[asyncio.Task] = None
@@ -305,6 +377,30 @@ class TelegramAdapter(BasePlatformAdapter):
# Slash-confirm button state: confirm_id → session_key (for /reload-mcp
# and any other slash-confirm prompts; see GatewayRunner._request_slash_confirm).
self._slash_confirm_state: Dict[str, str] = {}
# Notification mode for message sends.
# "important" — only final responses, approvals, and slash confirmations
# trigger notifications; tool progress, streaming, status
# messages are delivered silently via disable_notification.
# This is the default — Telegram users found per-tool-call
# push notifications too noisy.
# "all" — every message triggers a push notification (legacy
# behavior; opt-in via display.platforms.telegram.notifications).
self._notifications_mode: str = "important"
def _notification_kwargs(
self, metadata: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
"""Return disable_notification kwargs when the adapter is in silent mode.
In "important" mode, all message sends are silently delivered
(disable_notification=True) unless the caller explicitly requests a
notification by setting ``metadata["notify"] = True``.
"""
if getattr(self, "_notifications_mode", "important") != "important":
return {}
if (metadata or {}).get("notify"):
return {}
return {"disable_notification": True}
def _is_callback_user_authorized(
self,
@@ -520,7 +616,7 @@ class TelegramAdapter(BasePlatformAdapter):
def _looks_like_network_error(error: Exception) -> bool:
"""Return True for transient network errors that warrant a reconnect attempt."""
name = error.__class__.__name__.lower()
if name in ("networkerror", "timedout", "connectionerror"):
if name in {"networkerror", "timedout", "connectionerror"}:
return True
try:
from telegram.error import NetworkError, TimedOut
@@ -536,9 +632,9 @@ class TelegramAdapter(BasePlatformAdapter):
return default
if isinstance(value, str):
lowered = value.strip().lower()
if lowered in ("true", "1", "yes", "on"):
if lowered in {"true", "1", "yes", "on"}:
return True
if lowered in ("false", "0", "no", "off"):
if lowered in {"false", "0", "no", "off"}:
return False
return default
return bool(value)
@@ -827,6 +923,24 @@ class TelegramAdapter(BasePlatformAdapter):
)
return None
async def create_handoff_thread(
self,
parent_chat_id: str,
name: str,
) -> Optional[str]:
"""Create a forum topic for a session handoff.
Works for DM topics (Bot API 9.4+, requires user to enable Topics
in their chat with the bot) and forum supergroups. Returns the
``message_thread_id`` as a string, or ``None`` on failure.
"""
try:
chat_id_int = int(parent_chat_id)
except (TypeError, ValueError):
return None
thread_id = await self._create_dm_topic(chat_id_int, name=name)
return str(thread_id) if thread_id else None
async def rename_dm_topic(
self,
chat_id: int,
@@ -1057,7 +1171,7 @@ class TelegramAdapter(BasePlatformAdapter):
"write_timeout": _env_float("HERMES_TELEGRAM_HTTP_WRITE_TIMEOUT", 20.0),
}
disable_fallback = (os.getenv("HERMES_TELEGRAM_DISABLE_FALLBACK_IPS", "").strip().lower() in ("1", "true", "yes", "on"))
disable_fallback = (os.getenv("HERMES_TELEGRAM_DISABLE_FALLBACK_IPS", "").strip().lower() in {"1", "true", "yes", "on"})
fallback_ips = self._fallback_ips()
if not fallback_ips:
fallback_ips = await discover_fallback_ips()
@@ -1400,6 +1514,7 @@ class TelegramAdapter(BasePlatformAdapter):
reply_to_message_id=reply_to_id,
**thread_kwargs,
**self._link_preview_kwargs(),
**self._notification_kwargs(metadata),
)
except Exception as md_error:
# Markdown parsing failed, try plain text
@@ -1413,6 +1528,7 @@ class TelegramAdapter(BasePlatformAdapter):
reply_to_message_id=reply_to_id,
**thread_kwargs,
**self._link_preview_kwargs(),
**self._notification_kwargs(metadata),
)
else:
raise
@@ -1496,10 +1612,18 @@ class TelegramAdapter(BasePlatformAdapter):
except Exception as e:
logger.error("[%s] Failed to send Telegram message: %s", self.name, e, exc_info=True)
err_str = str(e).lower()
# Message too long — content exceeded 4096 chars. Return failure so
# stream consumer enters fallback mode and sends the remainder.
if "message_too_long" in err_str or "too long" in err_str:
logger.debug(
"[%s] send() content too long, falling back to new-message continuation",
self.name,
)
return SendResult(success=False, error="message_too_long")
# TimedOut means the request may have reached Telegram —
# mark as non-retryable so _send_with_retry() doesn't re-send.
_to = locals().get("_TimedOut")
err_str = str(e).lower()
is_timeout = (_to and isinstance(e, _to)) or "timed out" in err_str
return SendResult(success=False, error=str(e), retryable=not is_timeout)
@@ -1511,9 +1635,26 @@ class TelegramAdapter(BasePlatformAdapter):
*,
finalize: bool = False,
) -> SendResult:
"""Edit a previously sent Telegram message."""
"""Edit a previously sent Telegram message.
Telegram caps single-message text at 4096 UTF-16 codeunits. Streaming
replies that grow past this limit must NOT be silently truncated and
must NOT return failure (the consumer would re-send and create a
duplicate). Instead this method split-and-delivers: edit the
existing message with the first chunk and send the rest as
continuation messages, returning the final chunk's id so subsequent
edits target the most recent visible message.
"""
if not self._bot:
return SendResult(success=False, error="Not connected")
# Pre-flight: if content already exceeds the limit, split-and-deliver
# without round-tripping a doomed edit.
if utf16_len(content) > self.MAX_MESSAGE_LENGTH:
return await self._edit_overflow_split(
chat_id, message_id, content, finalize=finalize,
)
try:
if not finalize:
await self._bot.edit_message_text(
@@ -1547,22 +1688,17 @@ class TelegramAdapter(BasePlatformAdapter):
# "Message is not modified" — content identical, treat as success
if "not modified" in err_str:
return SendResult(success=True, message_id=message_id)
# Message too long — content exceeded 4096 chars (e.g. during
# streaming). Truncate and succeed so the stream consumer can
# split the overflow into a new message instead of dying.
# Reactive split-and-deliver: parse_mode formatting can inflate
# the payload past the limit even when the raw text was under
# (e.g. MarkdownV2 escapes). Same fix as the pre-flight path.
if "message_too_long" in err_str or "too long" in err_str:
truncated = _prefix_within_utf16_limit(
content, self.MAX_MESSAGE_LENGTH - 20
) + ""
try:
await self._bot.edit_message_text(
chat_id=int(chat_id),
message_id=int(message_id),
text=truncated,
)
except Exception:
pass # best-effort truncation
return SendResult(success=True, message_id=message_id)
logger.debug(
"[%s] edit_message overflow (%d UTF-16 > %d), splitting",
self.name, utf16_len(content), self.MAX_MESSAGE_LENGTH,
)
return await self._edit_overflow_split(
chat_id, message_id, content, finalize=finalize,
)
# Flood control / RetryAfter — short waits are retried inline,
# long waits return a failure immediately so streaming can fall back
# to a normal final send instead of leaving a truncated partial.
@@ -1598,6 +1734,147 @@ class TelegramAdapter(BasePlatformAdapter):
)
return SendResult(success=False, error=str(e))
async def _edit_overflow_split(
self,
chat_id: str,
message_id: str,
content: str,
*,
finalize: bool,
) -> SendResult:
"""Split an oversized edit across the existing message + continuations.
Edit the original ``message_id`` with chunk 1 (with the platform's
usual ``(1/N)`` suffix preserved), then send the remaining chunks as
new messages threaded as replies to the previous chunk so the user
sees them grouped. Returns ``SendResult(success=True,
message_id=<last-chunk-id>, continuation_message_ids=(...))`` so the
stream consumer can keep editing the most recent visible message
and the gateway has full visibility into every message id we put on
screen.
Falls back to ``SendResult(success=False)`` only if even the first-
chunk edit fails that's a real adapter problem, not an overflow.
"""
chunks = self.truncate_message(
content, self.MAX_MESSAGE_LENGTH, len_fn=utf16_len,
)
if len(chunks) <= 1:
# Defensive: shouldn't happen given the caller's pre-flight, but
# if truncate_message returned a single chunk just edit normally.
chunks = [content]
# Step 1 — edit the existing message with the first chunk.
first_chunk = chunks[0]
try:
if finalize:
# Use format_message + parse_mode for the final chunk;
# mirror edit_message's main happy-path.
formatted = self.format_message(first_chunk)
try:
await self._bot.edit_message_text(
chat_id=int(chat_id),
message_id=int(message_id),
text=formatted,
parse_mode=ParseMode.MARKDOWN_V2,
)
except Exception as fmt_err:
if "not modified" not in str(fmt_err).lower():
await self._bot.edit_message_text(
chat_id=int(chat_id),
message_id=int(message_id),
text=first_chunk,
)
else:
await self._bot.edit_message_text(
chat_id=int(chat_id),
message_id=int(message_id),
text=first_chunk,
)
except Exception as e:
err_str = str(e).lower()
if "not modified" in err_str:
# First chunk identical to current text — fall through to
# send continuations.
pass
else:
logger.error(
"[%s] Overflow split: first-chunk edit failed: %s",
self.name, e, exc_info=True,
)
return SendResult(success=False, error=str(e))
# Step 2 — send each remaining chunk as a continuation message,
# threaded as a reply to the previous so the user sees them as a
# contiguous block. We call self._bot.send_message directly so the
# continuation skips ``self.send``'s own pre-chunking pass (chunks
# are already correctly sized). Best-effort MarkdownV2 with plain
# fallback, mirroring send().
continuation_ids: list[str] = []
prev_id = message_id
for chunk in chunks[1:]:
sent_msg = None
for use_markdown in (True, False) if finalize else (False,):
try:
text = self.format_message(chunk) if use_markdown else chunk
sent_msg = await self._bot.send_message(
chat_id=int(chat_id),
text=text,
parse_mode=ParseMode.MARKDOWN_V2 if use_markdown else None,
reply_to_message_id=int(prev_id) if prev_id else None,
)
break
except Exception as send_err:
if "reply message not found" in str(send_err).lower():
# Drop the reply anchor and try again.
try:
sent_msg = await self._bot.send_message(
chat_id=int(chat_id),
text=chunk,
)
break
except Exception as _retry_err:
logger.warning(
"[%s] Overflow continuation no-reply retry failed: %s",
self.name, _retry_err,
)
sent_msg = None
break
if use_markdown:
# try plain text on next loop iteration
continue
logger.warning(
"[%s] Overflow continuation send failed: %s",
self.name, send_err,
)
sent_msg = None
break
if sent_msg is None:
# Continuation failed — the user has chunk 1 + however many
# continuations succeeded. Report success with what we got
# so the stream consumer knows the edit landed; the
# remaining tail is lost on this attempt and the next
# streaming tick may retry.
logger.warning(
"[%s] Overflow split: stopped at %d/%d chunks delivered",
self.name, 1 + len(continuation_ids), len(chunks),
)
break
new_id = str(getattr(sent_msg, "message_id", "")) or prev_id
continuation_ids.append(new_id)
prev_id = new_id
last_id = continuation_ids[-1] if continuation_ids else message_id
logger.debug(
"[%s] Overflow split delivered %d chunks; last_id=%s",
self.name, 1 + len(continuation_ids), last_id,
)
return SendResult(
success=True,
message_id=last_id,
continuation_message_ids=tuple(continuation_ids),
)
async def delete_message(self, chat_id: str, message_id: str) -> bool:
"""Delete a previously sent Telegram message.
@@ -1623,6 +1900,109 @@ class TelegramAdapter(BasePlatformAdapter):
)
return False
def supports_draft_streaming(
self,
chat_type: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> bool:
"""Telegram supports sendMessageDraft for private chats only.
Bot API 9.5 (March 2026) opened ``sendMessageDraft`` to all bots
unconditionally for private (DM) chats. Groups, supergroups, and
channels still rely on the edit-based path.
We additionally require ``self._bot`` to expose ``send_message_draft``
(added to python-telegram-bot in 22.6); older PTB installs gracefully
fall back to the edit path even on DMs.
"""
if not self._bot or not hasattr(self._bot, "send_message_draft"):
return False
return (chat_type or "").lower() in {"dm", "private"}
async def send_draft(
self,
chat_id: str,
draft_id: int,
content: str,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Stream a partial message via Telegram's native sendMessageDraft.
The Bot API animates the preview when the same ``draft_id`` is reused
across consecutive calls in the same chat. When the response
finishes, the caller sends the final text via the normal ``send``
path; the draft preview clears naturally on the client (Telegram has
no Bot API to "promote" a draft to a real message the final
``sendMessage`` is what the user receives in their history).
"""
if not self._bot:
return SendResult(success=False, error="not_connected")
if not hasattr(self._bot, "send_message_draft"):
return SendResult(success=False, error="api_unavailable")
# Trim to the same UTF-16 budget the platform enforces on regular
# sends. Drafts have the same length contract as messages.
text = content if len(content) <= self.MAX_MESSAGE_LENGTH else \
self.truncate_message(content, self.MAX_MESSAGE_LENGTH, len_fn=utf16_len)[0]
kwargs: Dict[str, Any] = {
"chat_id": int(chat_id),
"draft_id": int(draft_id),
"text": text,
}
thread_id = self._metadata_thread_id(metadata)
if thread_id is not None:
kwargs["message_thread_id"] = thread_id
try:
ok = await self._bot.send_message_draft(**kwargs)
if ok:
# Drafts have no message_id; we report success without one
# so the caller knows the animation frame landed.
return SendResult(success=True, message_id=None)
return SendResult(success=False, error="draft_rejected")
except Exception as e:
# Most likely: BadRequest because this bot/chat doesn't allow
# drafts, or a transient server hiccup. The caller treats any
# failure as "fall back to edit-based for this response".
logger.debug(
"[%s] sendMessageDraft failed (chat=%s draft_id=%s): %s",
self.name, chat_id, draft_id, e,
)
return SendResult(success=False, error=str(e))
async def _send_message_with_thread_fallback(self, **kwargs):
"""Send a Telegram message, retrying once without message_thread_id
if Telegram returns 'Message thread not found'.
Used for control-style sends (approval prompts, model picker,
update prompts) that can carry a stale thread_id from a DM
reply chain. The streaming send loop has its own equivalent
(PR #3390) at the body of ``send``; this helper applies the
same retry pattern to the non-streaming control paths.
"""
if not self._bot:
raise RuntimeError("Not connected")
message_thread_id = kwargs.get("message_thread_id")
try:
return await self._bot.send_message(**kwargs)
except Exception as send_err:
if (
message_thread_id is not None
and self._is_bad_request_error(send_err)
and self._is_thread_not_found_error(send_err)
):
logger.warning(
"[%s] Thread %s not found for control message, retrying without message_thread_id",
self.name,
message_thread_id,
)
retry_kwargs = dict(kwargs)
retry_kwargs.pop("message_thread_id", None)
return await self._bot.send_message(**retry_kwargs)
raise
async def send_update_prompt(
self, chat_id: str, prompt: str, default: str = "",
session_key: str = "",
@@ -1646,7 +2026,7 @@ class TelegramAdapter(BasePlatformAdapter):
])
thread_id = self._metadata_thread_id(metadata)
reply_to_id = self._reply_to_message_id_for_send(None, metadata)
msg = await self._bot.send_message(
msg = await self._send_message_with_thread_fallback(
chat_id=int(chat_id),
text=text,
parse_mode=ParseMode.MARKDOWN,
@@ -1726,7 +2106,7 @@ class TelegramAdapter(BasePlatformAdapter):
)
)
msg = await self._bot.send_message(**kwargs)
msg = await self._send_message_with_thread_fallback(**kwargs)
# Store session_key keyed by approval_id for the callback handler
self._approval_state[approval_id] = session_key
@@ -1778,7 +2158,7 @@ class TelegramAdapter(BasePlatformAdapter):
)
)
msg = await self._bot.send_message(**kwargs)
msg = await self._send_message_with_thread_fallback(**kwargs)
self._slash_confirm_state[confirm_id] = session_key
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
@@ -1836,7 +2216,7 @@ class TelegramAdapter(BasePlatformAdapter):
thread_id = metadata.get("thread_id") if metadata else None
reply_to_id = self._reply_to_message_id_for_send(None, metadata)
msg = await self._bot.send_message(
msg = await self._send_message_with_thread_fallback(
chat_id=int(chat_id),
text=text,
parse_mode=ParseMode.MARKDOWN,
@@ -2343,7 +2723,7 @@ class TelegramAdapter(BasePlatformAdapter):
with open(audio_path, "rb") as audio_file:
ext = os.path.splitext(audio_path)[1].lower()
# .ogg / .opus files -> send as voice (round playable bubble)
if ext in (".ogg", ".opus"):
if ext in {".ogg", ".opus"}:
_voice_thread = self._metadata_thread_id(metadata)
reply_to_id = self._reply_to_message_id_for_send(reply_to, metadata)
voice_thread_kwargs = self._thread_kwargs_for_send(
@@ -2360,13 +2740,14 @@ class TelegramAdapter(BasePlatformAdapter):
"caption": caption[:1024] if caption else None,
"reply_to_message_id": reply_to_id,
**voice_thread_kwargs,
**self._notification_kwargs(metadata),
},
metadata,
reply_to_id,
"voice",
reset_media=lambda: audio_file.seek(0),
)
elif ext in (".mp3", ".m4a"):
elif ext in {".mp3", ".m4a"}:
# Telegram's Bot API sendAudio only accepts MP3 / M4A.
_audio_thread = self._metadata_thread_id(metadata)
reply_to_id = self._reply_to_message_id_for_send(reply_to, metadata)
@@ -2384,6 +2765,7 @@ class TelegramAdapter(BasePlatformAdapter):
"caption": caption[:1024] if caption else None,
"reply_to_message_id": reply_to_id,
**audio_thread_kwargs,
**self._notification_kwargs(metadata),
},
metadata,
reply_to_id,
@@ -2520,6 +2902,7 @@ class TelegramAdapter(BasePlatformAdapter):
"media": media,
"reply_to_message_id": reply_to_id,
**thread_kwargs,
**self._notification_kwargs(metadata),
},
metadata,
reply_to_id,
@@ -2577,6 +2960,7 @@ class TelegramAdapter(BasePlatformAdapter):
"caption": caption[:1024] if caption else None,
"reply_to_message_id": reply_to_id,
**thread_kwargs,
**self._notification_kwargs(metadata),
},
metadata,
reply_to_id,
@@ -2672,6 +3056,7 @@ class TelegramAdapter(BasePlatformAdapter):
"caption": caption[:1024] if caption else None,
"reply_to_message_id": reply_to_id,
**thread_kwargs,
**self._notification_kwargs(metadata),
},
metadata,
reply_to_id,
@@ -2717,6 +3102,7 @@ class TelegramAdapter(BasePlatformAdapter):
"caption": caption[:1024] if caption else None,
"reply_to_message_id": reply_to_id,
**thread_kwargs,
**self._notification_kwargs(metadata),
},
metadata,
reply_to_id,
@@ -2767,6 +3153,7 @@ class TelegramAdapter(BasePlatformAdapter):
"caption": caption[:1024] if caption else None,
"reply_to_message_id": reply_to_id,
**photo_thread_kwargs,
**self._notification_kwargs(metadata),
},
metadata,
reply_to_id,
@@ -2802,6 +3189,7 @@ class TelegramAdapter(BasePlatformAdapter):
"caption": caption[:1024] if caption else None,
"reply_to_message_id": reply_to_id,
**upload_thread_kwargs,
**self._notification_kwargs(metadata),
},
metadata,
reply_to_id,
@@ -2847,6 +3235,7 @@ class TelegramAdapter(BasePlatformAdapter):
"caption": caption[:1024] if caption else None,
"reply_to_message_id": reply_to_id,
**animation_thread_kwargs,
**self._notification_kwargs(metadata),
},
metadata,
reply_to_id,
@@ -3109,9 +3498,18 @@ class TelegramAdapter(BasePlatformAdapter):
configured = self.config.extra.get("require_mention")
if configured is not None:
if isinstance(configured, str):
return configured.lower() in ("true", "1", "yes", "on")
return configured.lower() in {"true", "1", "yes", "on"}
return bool(configured)
return os.getenv("TELEGRAM_REQUIRE_MENTION", "false").lower() in ("true", "1", "yes", "on")
return os.getenv("TELEGRAM_REQUIRE_MENTION", "false").lower() in {"true", "1", "yes", "on"}
def _telegram_guest_mode(self) -> bool:
"""Return whether non-allowlisted groups may trigger via direct @mention."""
configured = self.config.extra.get("guest_mode")
if configured is not None:
if isinstance(configured, str):
return configured.lower() in {"true", "1", "yes", "on"}
return bool(configured)
return os.getenv("TELEGRAM_GUEST_MODE", "false").lower() in {"true", "1", "yes", "on"}
def _telegram_free_response_chats(self) -> set[str]:
raw = self.config.extra.get("free_response_chats")
@@ -3124,8 +3522,9 @@ class TelegramAdapter(BasePlatformAdapter):
def _telegram_allowed_chats(self) -> set[str]:
"""Return the whitelist of group/supergroup chat IDs the bot will respond in.
When non-empty, group messages from chats NOT in this set are silently
ignored even if the bot is @mentioned. DMs are never filtered.
When non-empty, group messages from chats NOT in this set are
silently ignored unless ``guest_mode`` is enabled and the bot is
explicitly @mentioned. DMs are never filtered.
Empty set means no restriction (fully backward compatible).
"""
raw = self.config.extra.get("allowed_chats")
@@ -3199,7 +3598,7 @@ class TelegramAdapter(BasePlatformAdapter):
if not chat:
return False
chat_type = str(getattr(chat, "type", "")).split(".")[-1].lower()
return chat_type in ("group", "supergroup")
return chat_type in {"group", "supergroup"}
def _is_reply_to_bot(self, message: Message) -> bool:
if not self._bot or not getattr(message, "reply_to_message", None):
@@ -3272,6 +3671,14 @@ class TelegramAdapter(BasePlatformAdapter):
return True
return False
def _is_guest_mention(self, message: Message) -> bool:
"""Return True for the narrow guest-mode bypass: explicit bot mention.
The caller (:meth:`_should_process_message`) has already verified
the message is a group chat, so that check is not repeated here.
"""
return self._telegram_guest_mode() and self._message_mentions_bot(message)
def _clean_bot_trigger_text(self, text: Optional[str]) -> Optional[str]:
if not text or not self._bot or not getattr(self._bot, "username", None):
return text
@@ -3283,16 +3690,18 @@ class TelegramAdapter(BasePlatformAdapter):
"""Apply Telegram group trigger rules.
DMs remain unrestricted. Group/supergroup messages are accepted when:
- the chat passes the ``allowed_chats`` whitelist (when set)
- the chat passes the ``allowed_chats`` whitelist (when set), or
``guest_mode`` is enabled and the bot is explicitly mentioned
- the chat is explicitly allowlisted in ``free_response_chats``
- ``require_mention`` is disabled
- the message replies to the bot
- the bot is @mentioned
- the text/caption matches a configured regex wake-word pattern
When ``allowed_chats`` is non-empty, it acts as a hard gate messages
from any chat not in the list are ignored regardless of the other
rules. When ``require_mention`` is enabled, slash commands are not given
When ``allowed_chats`` is non-empty, it remains a hard gate except for
the narrow ``guest_mode`` bypass: group/supergroup messages that
explicitly @mention this bot. Replies and regex wake words do not bypass
``allowed_chats``. When ``require_mention`` is enabled, slash commands are not given
special treatment they must pass the same mention/reply checks
as any other group message. Users can still trigger commands via
the Telegram bot menu (``/command@botname``) or by explicitly
@@ -3301,14 +3710,7 @@ class TelegramAdapter(BasePlatformAdapter):
"""
if not self._is_group_chat(message):
return True
# allowed_chats check (whitelist — must pass before other gating).
# When set, group messages from chats NOT in this whitelist are
# silently ignored, even if @mentioned. DMs are already excluded above.
allowed = self._telegram_allowed_chats()
if allowed:
chat_id_str = str(getattr(getattr(message, "chat", None), "id", ""))
if chat_id_str not in allowed:
return False
thread_id = getattr(message, "message_thread_id", None)
if thread_id is not None:
try:
@@ -3316,13 +3718,31 @@ class TelegramAdapter(BasePlatformAdapter):
return False
except (TypeError, ValueError):
logger.warning("[%s] Ignoring non-numeric Telegram message_thread_id: %r", self.name, thread_id)
if str(getattr(getattr(message, "chat", None), "id", "")) in self._telegram_free_response_chats():
chat_id_str = str(getattr(getattr(message, "chat", None), "id", ""))
# Resolve guest-mode mention bypass once so _message_mentions_bot
# is not called redundantly in the normal flow below.
guest_mention = self._is_guest_mention(message)
# allowed_chats check (whitelist). When set, group messages from chats
# outside the whitelist are ignored unless guest_mode permits this
# exact message as an explicit direct mention. DMs are excluded above.
allowed = self._telegram_allowed_chats()
if allowed and chat_id_str not in allowed:
return guest_mention
if guest_mention:
return True
if chat_id_str in self._telegram_free_response_chats():
return True
if not self._telegram_require_mention():
return True
if self._is_reply_to_bot(message):
return True
if self._message_mentions_bot(message):
# When guest_mode is True, _is_guest_mention already called
# _message_mentions_bot above — skip the redundant second call.
if not self._telegram_guest_mode() and self._message_mentions_bot(message):
return True
return self._message_matches_mention_patterns(message)
@@ -3442,12 +3862,27 @@ class TelegramAdapter(BasePlatformAdapter):
"""
current_task = asyncio.current_task()
try:
# Adaptive delay: if the latest chunk is near Telegram's 4096-char
# split point, a continuation is almost certain — wait longer.
# Adaptive delay tiers:
# - last chunk ≥ _SPLIT_THRESHOLD: a continuation is almost
# certain → wait the longer split delay.
# - total accumulated text ≤ _TEXT_BATCH_FAST_LEN (~320 cp):
# short message → cap delay at _TEXT_BATCH_FAST_DELAY_S
# so the agent sees the text near-instantly.
# - total ≤ _TEXT_BATCH_SHORT_LEN (~1024 cp):
# medium → cap at _TEXT_BATCH_SHORT_DELAY_S.
# - otherwise: use the configured cap.
# Tiers compose with operator overrides via the env-var-driven
# ``_text_batch_delay_seconds`` (e.g. an operator who sets the
# cap below 0.18s gets that lower number on every tier).
pending = self._pending_text_batches.get(key)
last_len = getattr(pending, "_last_chunk_len", 0) if pending else 0
total_len = len(getattr(pending, "text", "") or "") if pending else 0
if last_len >= self._SPLIT_THRESHOLD:
delay = self._text_batch_split_delay_seconds
elif total_len <= self._TEXT_BATCH_FAST_LEN:
delay = min(self._text_batch_delay_seconds, self._TEXT_BATCH_FAST_DELAY_S)
elif total_len <= self._TEXT_BATCH_SHORT_LEN:
delay = min(self._text_batch_delay_seconds, self._TEXT_BATCH_SHORT_DELAY_S)
else:
delay = self._text_batch_delay_seconds
await asyncio.sleep(delay)
@@ -3722,7 +4157,7 @@ class TelegramAdapter(BasePlatformAdapter):
# For text files, inject content into event.text (capped at 100 KB)
MAX_TEXT_INJECT_BYTES = 100 * 1024
if ext in (".md", ".txt") and len(raw_bytes) <= MAX_TEXT_INJECT_BYTES:
if ext in {".md", ".txt"} and len(raw_bytes) <= MAX_TEXT_INJECT_BYTES:
try:
text_content = raw_bytes.decode("utf-8")
display_name = original_filename or f"document{ext}"
@@ -3961,14 +4396,29 @@ class TelegramAdapter(BasePlatformAdapter):
# Determine chat type
chat_type = "dm"
if chat.type in (ChatType.GROUP, ChatType.SUPERGROUP):
if chat.type in {ChatType.GROUP, ChatType.SUPERGROUP}:
chat_type = "group"
elif chat.type == ChatType.CHANNEL:
chat_type = "channel"
# Resolve DM topic name and skill binding
# Resolve DM topic name and skill binding.
# In private chats, only preserve thread ids for real topic messages
# (is_topic_message=True). Telegram puts message_thread_id on every
# DM that is a reply, even when the user is just replying to a
# previous message in the same DM — that bogus id then routes to a
# nonexistent thread and Telegram returns 'Message thread not found'
# on send (#3206).
thread_id_raw = message.message_thread_id
thread_id_str = str(thread_id_raw) if thread_id_raw is not None else None
is_topic_message = bool(getattr(message, "is_topic_message", False))
thread_id_str = None
if thread_id_raw is not None:
if chat_type == "group":
thread_id_str = str(thread_id_raw)
elif chat_type == "dm" and is_topic_message:
thread_id_str = str(thread_id_raw)
# For forum groups without an explicit topic, default to the
# General-topic id so the gateway routes back to the General topic
# rather than dropping into the bot's main channel (#22423).
if chat_type == "group" and thread_id_str is None and getattr(chat, "is_forum", False):
thread_id_str = self._GENERAL_TOPIC_THREAD_ID
chat_topic = None
@@ -4012,12 +4462,28 @@ class TelegramAdapter(BasePlatformAdapter):
chat_topic=chat_topic,
)
# Extract reply context if this message is a reply
# Extract reply context if this message is a reply.
# Prefer Telegram's native partial quote (message.quote, TextQuote)
# so a user replying to a single selected substring of a prior
# multi-section message doesn't get the whole replied-to message
# injected into the agent's context — which can cause the agent
# to act on unrelated actionable-looking text the user didn't
# quote (#22619). Fall back to the full replied-to message text
# / caption when no native quote is present.
reply_to_id = None
reply_to_text = None
if message.reply_to_message:
reply_to_id = str(message.reply_to_message.message_id)
reply_to_text = message.reply_to_message.text or message.reply_to_message.caption or None
quote = getattr(message, "quote", None)
quote_text = getattr(quote, "text", None) if quote is not None else None
if quote_text:
reply_to_text = quote_text
else:
reply_to_text = (
message.reply_to_message.text
or message.reply_to_message.caption
or None
)
# Per-channel/topic ephemeral prompt
from gateway.platforms.base import resolve_channel_prompt
@@ -4046,7 +4512,7 @@ class TelegramAdapter(BasePlatformAdapter):
def _reactions_enabled(self) -> bool:
"""Check if message reactions are enabled via config/env."""
return os.getenv("TELEGRAM_REACTIONS", "false").lower() not in ("false", "0", "no")
return os.getenv("TELEGRAM_REACTIONS", "false").lower() not in {"false", "0", "no"}
async def _set_reaction(self, chat_id: str, message_id: str, emoji: str) -> bool:
"""Set a single emoji reaction on a Telegram message."""
+1 -1
View File
@@ -59,7 +59,7 @@ class TelegramFallbackTransport(httpx.AsyncBaseTransport):
"""
def __init__(self, fallback_ips: Iterable[str], **transport_kwargs):
self._fallback_ips = [ip for ip in dict.fromkeys(_normalize_fallback_ips(fallback_ips))]
self._fallback_ips = list(dict.fromkeys(_normalize_fallback_ips(fallback_ips)))
proxy_url = _resolve_proxy_url(target_hosts=[_TELEGRAM_API_HOST, *self._fallback_ips])
if proxy_url and "proxy" not in transport_kwargs:
transport_kwargs["proxy"] = proxy_url
+4 -4
View File
@@ -295,7 +295,7 @@ class WeComAdapter(BasePlatformAdapter):
auth_payload = await self._wait_for_handshake(req_id)
errcode = auth_payload.get("errcode", 0)
if errcode not in (0, None):
if errcode not in {0, None}:
errmsg = auth_payload.get("errmsg", "authentication failed")
raise RuntimeError(f"{errmsg} (errcode={errcode})")
@@ -320,7 +320,7 @@ class WeComAdapter(BasePlatformAdapter):
if self._payload_req_id(payload) == req_id:
return payload
logger.debug("[%s] Ignoring pre-auth payload: %s", self.name, payload.get("cmd"))
elif msg.type in (aiohttp.WSMsgType.CLOSED, aiohttp.WSMsgType.CLOSE, aiohttp.WSMsgType.ERROR):
elif msg.type in {aiohttp.WSMsgType.CLOSED, aiohttp.WSMsgType.CLOSE, aiohttp.WSMsgType.ERROR}:
raise RuntimeError("WeCom websocket closed during authentication")
async def _listen_loop(self) -> None:
@@ -360,7 +360,7 @@ class WeComAdapter(BasePlatformAdapter):
payload = self._parse_json(msg.data)
if payload:
await self._dispatch_payload(payload)
elif msg.type in (aiohttp.WSMsgType.CLOSE, aiohttp.WSMsgType.CLOSED, aiohttp.WSMsgType.ERROR):
elif msg.type in {aiohttp.WSMsgType.CLOSE, aiohttp.WSMsgType.CLOSED, aiohttp.WSMsgType.ERROR}:
raise RuntimeError("WeCom websocket closed")
async def _heartbeat_loop(self) -> None:
@@ -998,7 +998,7 @@ class WeComAdapter(BasePlatformAdapter):
@staticmethod
def _response_error(response: Dict[str, Any]) -> Optional[str]:
errcode = response.get("errcode", 0)
if errcode in (0, None):
if errcode in {0, None}:
return None
errmsg = str(response.get("errmsg") or "unknown error")
return f"WeCom errcode {errcode}: {errmsg}"
+4 -4
View File
@@ -605,7 +605,7 @@ def _assert_weixin_cdn_url(url: str) -> None:
except Exception as exc: # noqa: BLE001
raise ValueError(f"Unparseable media URL: {url!r}") from exc
if scheme not in ("http", "https"):
if scheme not in {"http", "https"}:
raise ValueError(
f"Media URL has disallowed scheme {scheme!r}; only http/https are permitted."
)
@@ -983,7 +983,7 @@ def _extract_text(item_list: List[Dict[str, Any]]) -> str:
ref = item.get("ref_msg") or {}
ref_item = ref.get("message_item") or {}
ref_type = ref_item.get("type")
if ref_type in (ITEM_IMAGE, ITEM_VIDEO, ITEM_FILE, ITEM_VOICE):
if ref_type in {ITEM_IMAGE, ITEM_VIDEO, ITEM_FILE, ITEM_VOICE}:
title = ref.get("title") or ""
prefix = f"[引用媒体: {title}]\n" if title else "[引用媒体]\n"
return f"{prefix}{text}".strip()
@@ -1331,7 +1331,7 @@ class WeixinAdapter(BasePlatformAdapter):
ret = response.get("ret", 0)
errcode = response.get("errcode", 0)
if ret not in (0, None) or errcode not in (0, None):
if ret not in {0, None} or errcode not in {0, None}:
if (ret == SESSION_EXPIRED_ERRCODE or errcode == SESSION_EXPIRED_ERRCODE
or _is_stale_session_ret(ret, errcode, response.get("errmsg"))):
logger.error("[%s] Session expired; pausing for 10 minutes", self.name)
@@ -1601,7 +1601,7 @@ class WeixinAdapter(BasePlatformAdapter):
if resp and isinstance(resp, dict):
ret = resp.get("ret")
errcode = resp.get("errcode")
if (ret is not None and ret not in (0,)) or (errcode is not None and errcode not in (0,)):
if (ret is not None and ret not in {0,}) or (errcode is not None and errcode not in {0,}):
is_session_expired = (
ret == SESSION_EXPIRED_ERRCODE
or errcode == SESSION_EXPIRED_ERRCODE
+4 -4
View File
@@ -301,9 +301,9 @@ class WhatsAppAdapter(BasePlatformAdapter):
configured = self.config.extra.get("require_mention")
if configured is not None:
if isinstance(configured, str):
return configured.lower() in ("true", "1", "yes", "on")
return configured.lower() in {"true", "1", "yes", "on"}
return bool(configured)
return os.getenv("WHATSAPP_REQUIRE_MENTION", "false").lower() in ("true", "1", "yes", "on")
return os.getenv("WHATSAPP_REQUIRE_MENTION", "false").lower() in {"true", "1", "yes", "on"}
def _whatsapp_free_response_chats(self) -> set[str]:
raw = self.config.extra.get("free_response_chats")
@@ -679,7 +679,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
# getattr-with-default keeps tests that construct the adapter via
# ``WhatsAppAdapter.__new__`` (bypassing __init__) working without
# every _make_adapter() helper having to seed the attribute.
if getattr(self, "_shutting_down", False) and returncode in (0, -2, -15):
if getattr(self, "_shutting_down", False) and returncode in {0, -2, -15}:
logger.info(
"[%s] Bridge exited during shutdown (code %d).",
self.name,
@@ -1183,7 +1183,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
if msg_type == MessageType.DOCUMENT and cached_urls:
for doc_path in cached_urls:
ext = Path(doc_path).suffix.lower()
if ext in (".txt", ".md", ".csv", ".json", ".xml", ".yaml", ".yml", ".log", ".py", ".js", ".ts", ".html", ".css"):
if ext in {".txt", ".md", ".csv", ".json", ".xml", ".yaml", ".yml", ".log", ".py", ".js", ".ts", ".html", ".css"}:
try:
file_size = Path(doc_path).stat().st_size
if file_size > MAX_TEXT_INJECT_BYTES:
+5 -5
View File
@@ -2228,7 +2228,7 @@ class MediaResolveMiddleware(InboundMiddleware):
resp.raise_for_status()
payload = resp.json()
code = payload.get("code")
if code not in (None, 0):
if code not in {None, 0}:
raise RuntimeError(
f"resource/v1/download failed: code={code}, msg={payload.get('msg', '')}"
)
@@ -2391,7 +2391,7 @@ class MediaResolveMiddleware(InboundMiddleware):
rid = m.group(2)
kind, _, filename = head.partition(":")
kind = kind.strip()
if kind not in ("image", "file"):
if kind not in {"image", "file"}:
continue
if rid in seen:
continue
@@ -2993,10 +2993,10 @@ class ConnectionManager:
# Fire-and-forget heartbeat ACKs — server always responds but callers don't
# wait on these; silently discard to avoid "Unmatched Response" noise.
if cmd_type == CMD_TYPE["Response"] and cmd in (
if cmd_type == CMD_TYPE["Response"] and cmd in {
"send_group_heartbeat",
"send_private_heartbeat",
):
}:
logger.debug("[%s] Heartbeat ACK received: cmd=%s msg_id=%s", adapter.name, cmd, msg_id)
return
@@ -3369,7 +3369,7 @@ class MediaSendHandler(ABC):
# Remove keys already passed explicitly to avoid "multiple values" TypeError
fwd_kwargs = {
k: v for k, v in kwargs.items()
if k not in ("file_uuid", "filename", "content_type")
if k not in {"file_uuid", "filename", "content_type"}
}
msg_body = self.build_msg_body(
upload_result,
+2 -2
View File
@@ -150,7 +150,7 @@ def _parse_jpeg_size(buf: bytes) -> Optional[dict[str, int]]:
i += 1
continue
marker = buf[i + 1]
if marker in (0xC0, 0xC2):
if marker in {0xC0, 0xC2}:
h = struct.unpack(">H", buf[i + 5: i + 7])[0]
w = struct.unpack(">H", buf[i + 7: i + 9])[0]
return {"width": w, "height": h}
@@ -165,7 +165,7 @@ def _parse_gif_size(buf: bytes) -> Optional[dict[str, int]]:
if len(buf) < 10:
return None
sig = buf[:6].decode("ascii", errors="replace")
if sig not in ("GIF87a", "GIF89a"):
if sig not in {"GIF87a", "GIF89a"}:
return None
w = struct.unpack("<H", buf[6:8])[0]
h = struct.unpack("<H", buf[8:10])[0]
+1 -1
View File
@@ -702,7 +702,7 @@ def decode_inbound_push(data: bytes) -> Optional[dict]:
"trace_id": trace_id,
}
# 过滤空值(保持 API 整洁)
return {k: v for k, v in result.items() if v or k in ("msg_body", "msg_seq")}
return {k: v for k, v in result.items() if v or k in {"msg_body", "msg_seq"}}
except Exception as e:
if DEBUG_MODE:
logger.debug("[yuanbao_proto] decode_inbound_push failed: %s", e)
+1298 -564
View File
File diff suppressed because it is too large Load Diff
+12 -7
View File
@@ -764,12 +764,12 @@ class SessionStore:
now = _now()
if policy.mode in ("idle", "both"):
if policy.mode in {"idle", "both"}:
idle_deadline = entry.updated_at + timedelta(minutes=policy.idle_minutes)
if now > idle_deadline:
return True
if policy.mode in ("daily", "both"):
if policy.mode in {"daily", "both"}:
today_reset = now.replace(
hour=policy.at_hour,
minute=0, second=0, microsecond=0,
@@ -805,12 +805,12 @@ class SessionStore:
now = _now()
if policy.mode in ("idle", "both"):
if policy.mode in {"idle", "both"}:
idle_deadline = entry.updated_at + timedelta(minutes=policy.idle_minutes)
if now > idle_deadline:
return "idle"
if policy.mode in ("daily", "both"):
if policy.mode in {"daily", "both"}:
today_reset = now.replace(
hour=policy.at_hour,
minute=0,
@@ -1276,9 +1276,14 @@ class SessionStore:
# Also write legacy JSONL (keeps existing tooling working during transition)
transcript_path = self.get_transcript_path(session_id)
with self._lock:
with open(transcript_path, "a", encoding="utf-8") as f:
f.write(json.dumps(message, ensure_ascii=False) + "\n")
try:
with self._lock:
with open(transcript_path, "a", encoding="utf-8") as f:
f.write(json.dumps(message, ensure_ascii=False) + "\n")
except OSError as e:
# Disk full / read-only fs / permission errors must not crash the
# message handler — the SQLite write above is the primary store.
logger.debug("Failed to write JSONL transcript for %s: %s", session_id, e)
def rewrite_transcript(self, session_id: str, messages: List[Dict[str, Any]]) -> None:
"""Replace the entire transcript for a session with new messages.
+2
View File
@@ -55,6 +55,7 @@ _SESSION_THREAD_ID: ContextVar = ContextVar("HERMES_SESSION_THREAD_ID", default=
_SESSION_USER_ID: ContextVar = ContextVar("HERMES_SESSION_USER_ID", default=_UNSET)
_SESSION_USER_NAME: ContextVar = ContextVar("HERMES_SESSION_USER_NAME", default=_UNSET)
_SESSION_KEY: ContextVar = ContextVar("HERMES_SESSION_KEY", default=_UNSET)
_SESSION_ID: ContextVar = ContextVar("HERMES_SESSION_ID", default=_UNSET)
# Cron auto-delivery vars — set per-job in run_job() so concurrent jobs
# don't clobber each other's delivery targets.
@@ -70,6 +71,7 @@ _VAR_MAP = {
"HERMES_SESSION_USER_ID": _SESSION_USER_ID,
"HERMES_SESSION_USER_NAME": _SESSION_USER_NAME,
"HERMES_SESSION_KEY": _SESSION_KEY,
"HERMES_SESSION_ID": _SESSION_ID,
"HERMES_CRON_AUTO_DELIVER_PLATFORM": _CRON_AUTO_DELIVER_PLATFORM,
"HERMES_CRON_AUTO_DELIVER_CHAT_ID": _CRON_AUTO_DELIVER_CHAT_ID,
"HERMES_CRON_AUTO_DELIVER_THREAD_ID": _CRON_AUTO_DELIVER_THREAD_ID,
+462
View File
@@ -0,0 +1,462 @@
"""Shutdown forensics — capture context when the gateway receives SIGTERM/SIGINT.
The gateway's ``shutdown_signal_handler`` runs synchronously inside the
asyncio event loop. We can't safely block it for long, but we DO want a
durable record of who/what triggered the shutdown so that "the gateway
keeps dying" incidents can be diagnosed after the fact.
This module exposes :func:`snapshot_shutdown_context`, a fast (<10ms),
non-blocking probe that returns a structured dict the signal handler can
log immediately, plus :func:`spawn_async_diagnostic`, a fire-and-forget
``ps`` walk that runs as a detached subprocess so it can't block teardown
even if /proc is wedged.
Anything that needs to wait (e.g. shelling out to ``ps aux``) belongs in
the async helper, never in the synchronous probe.
"""
from __future__ import annotations
import json
import os
import signal
import subprocess
import sys
import time
from pathlib import Path
from typing import Any, Dict, List, Optional
_SIGNAL_NAME_BY_NUM: Dict[int, str] = {}
for _name in ("SIGTERM", "SIGINT", "SIGHUP", "SIGQUIT", "SIGUSR1", "SIGUSR2"):
_val = getattr(signal, _name, None)
if _val is not None:
_SIGNAL_NAME_BY_NUM[int(_val)] = _name
def _signal_name(sig: Any) -> str:
"""Return a human-readable signal name (or ``str(sig)`` as fallback)."""
if sig is None:
return "UNKNOWN"
try:
sig_int = int(sig)
except (TypeError, ValueError):
return str(sig)
return _SIGNAL_NAME_BY_NUM.get(sig_int, f"signal#{sig_int}")
def _read_proc_field(pid: int, key: str) -> Optional[str]:
"""Read a single field from /proc/<pid>/status. Linux only; None elsewhere."""
try:
with open(f"/proc/{pid}/status", encoding="utf-8") as fh:
for line in fh:
if line.startswith(key + ":"):
return line.split(":", 1)[1].strip()
except (FileNotFoundError, PermissionError, OSError):
pass
return None
def _read_proc_cmdline(pid: int) -> Optional[str]:
"""Read /proc/<pid>/cmdline as a printable string. Linux only; None elsewhere."""
try:
with open(f"/proc/{pid}/cmdline", "rb") as fh:
data = fh.read()
except (FileNotFoundError, PermissionError, OSError):
return None
if not data:
return None
# cmdline uses NUL separators
return data.replace(b"\x00", b" ").decode("utf-8", errors="replace").strip()
def _proc_summary(pid: int) -> Dict[str, Any]:
"""Compact /proc/<pid> snapshot: pid, ppid, state, uid, cmdline.
Best-effort. Missing fields are simply omitted rather than raising.
"""
summary: Dict[str, Any] = {"pid": pid}
if pid <= 0:
return summary
name = _read_proc_field(pid, "Name")
if name is not None:
summary["name"] = name
state = _read_proc_field(pid, "State")
if state is not None:
summary["state"] = state
ppid = _read_proc_field(pid, "PPid")
if ppid is not None:
try:
summary["ppid"] = int(ppid)
except ValueError:
pass
uid = _read_proc_field(pid, "Uid")
if uid is not None:
# "real effective saved fs"
summary["uid"] = uid.split()[0] if uid else uid
cmdline = _read_proc_cmdline(pid)
if cmdline:
# Truncate aggressively — these can be 4KB
summary["cmdline"] = cmdline[:300]
return summary
def snapshot_shutdown_context(received_signal: Any = None) -> Dict[str, Any]:
"""Fast (<10ms) snapshot of who/what is asking us to shut down.
Captures:
* The signal number/name (so SIGINT vs SIGTERM is visible)
* Our own PID/ppid + parent process info from /proc (Linux)
* Whether systemd is our parent (``ppid==1`` or ``INVOCATION_ID`` set)
* Whether takeover/planned-stop markers exist (consumed lazily by the caller)
* /proc/self limits + load average (1-min)
* Wall-clock and monotonic timestamps for cross-correlating later phases
Pure stdlib, never raises, never blocks on subprocesses.
"""
now = time.time()
monotonic = time.monotonic()
pid = os.getpid()
ppid = os.getppid()
ctx: Dict[str, Any] = {
"ts": now,
"ts_monotonic": monotonic,
"signal": _signal_name(received_signal),
"signal_num": int(received_signal) if received_signal is not None else None,
"pid": pid,
"ppid": ppid,
"parent": _proc_summary(ppid),
"self": _proc_summary(pid),
}
# systemd context. If we were started by a systemd unit, INVOCATION_ID
# is set in our env. ppid==1 (init) is also a strong signal that
# systemd reaped+forwarded the SIGTERM.
invocation_id = os.environ.get("INVOCATION_ID")
if invocation_id:
ctx["systemd_invocation_id"] = invocation_id
journal_stream = os.environ.get("JOURNAL_STREAM")
if journal_stream:
ctx["systemd_journal_stream"] = journal_stream
ctx["under_systemd"] = bool(invocation_id) or ppid == 1
# Load average — high load points the finger at "something else
# crushing the box" rather than "external killer".
try:
ctx["loadavg_1m"] = os.getloadavg()[0]
except (OSError, AttributeError):
pass
# /proc/self/status TracerPid: nonzero means a debugger / strace is
# attached. Useful when "phantom SIGKILL" turns out to be a manual
# gdb session.
try:
tracer = _read_proc_field(pid, "TracerPid")
if tracer is not None and tracer != "0":
ctx["tracer_pid"] = int(tracer) if tracer.isdigit() else tracer
ctx["tracer"] = _proc_summary(int(tracer)) if tracer.isdigit() else None
except (TypeError, ValueError):
pass
# Race-detection hint: did somebody recently start a sibling gateway
# with --replace? We can't see the new process directly here, but if
# there's a takeover marker on disk that DOESN'T name us, that's a
# smoking gun for "another --replace instance is killing us".
# Filenames mirror gateway.status (._TAKEOVER_MARKER_FILENAME /
# _PLANNED_STOP_MARKER_FILENAME); we use string literals here so the
# signal-handler path stays import-light.
try:
hermes_home_str = os.environ.get("HERMES_HOME")
if hermes_home_str:
takeover_path = Path(hermes_home_str) / ".gateway-takeover.json"
if takeover_path.exists():
try:
raw = takeover_path.read_text(encoding="utf-8")
ctx["takeover_marker"] = raw[:300]
ctx["takeover_marker_for_self"] = (
f'"target_pid": {pid}' in raw
or f"'target_pid': {pid}" in raw
)
except OSError:
pass
planned_stop_path = Path(hermes_home_str) / ".gateway-planned-stop.json"
if planned_stop_path.exists():
try:
raw = planned_stop_path.read_text(encoding="utf-8")
ctx["planned_stop_marker"] = raw[:300]
except OSError:
pass
except Exception: # noqa: BLE001 — never raise from a signal handler
pass
return ctx
def spawn_async_diagnostic(
log_path: Path,
signal_name: str,
*,
timeout_seconds: float = 5.0,
) -> Optional[int]:
"""Fire-and-forget ``ps``-style snapshot written to ``log_path``.
Runs as a detached subprocess so it can't block the asyncio event loop
or compete with platform teardown. The subprocess uses its own
``timeout`` so a wedged ``ps`` still self-cleans within
``timeout_seconds``.
Returns the subprocess PID on success, ``None`` on failure. Never
raises.
We deliberately avoid ``subprocess.run(["ps", "aux"])`` from inside the
signal handler (the pre-existing pattern): on a busy host with hundreds
of processes, ``ps aux`` can take >2s to walk /proc, during which the
asyncio loop is frozen and adapter teardown can't begin.
"""
try:
log_path.parent.mkdir(parents=True, exist_ok=True)
except OSError:
return None
# Inline shell so we don't have to ship a helper script. bash -c is
# available on every POSIX target we support; on Windows we just skip
# the snapshot (the platform doesn't ship ps anyway).
if sys.platform == "win32":
return None
script = (
f"echo '=== shutdown diagnostic @ {signal_name} ==='; "
"echo '--- date ---'; date -u +%Y-%m-%dT%H:%M:%SZ; "
"echo '--- ps auxf (top 60 by cpu) ---'; "
"ps auxf --sort=-pcpu 2>/dev/null | head -60; "
"echo '--- pstree of self ---'; "
f"pstree -plau {os.getpid()} 2>/dev/null | head -40 || true; "
"echo '--- /proc/loadavg ---'; "
"cat /proc/loadavg 2>/dev/null || true; "
"echo '--- recent dmesg (oom/killed) ---'; "
"dmesg -T 2>/dev/null | tail -20 || journalctl --user -n 20 --no-pager 2>/dev/null | tail -20 || true; "
"echo '=== end ==='"
)
try:
# Open the log file in append mode and let the subprocess inherit.
# We use os.O_APPEND so concurrent diagnostics from rapid signals
# don't trample each other.
fd = os.open(str(log_path), os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
except OSError:
return None
try:
# Detach from our process group so the subprocess survives even
# if systemd kills our cgroup with KillMode=control-group (which
# would also reap us anyway, but defense in depth). Without
# start_new_session, a SIGKILL on our cgroup takes the diag down
# before it can flush.
proc = subprocess.Popen(
["timeout", f"{timeout_seconds:.0f}", "bash", "-c", script],
stdout=fd,
stderr=subprocess.STDOUT,
stdin=subprocess.DEVNULL,
start_new_session=True,
close_fds=True,
)
except (FileNotFoundError, OSError):
try:
os.close(fd)
except OSError:
pass
return None
finally:
# Subprocess inherited the fd; we can drop our handle.
try:
os.close(fd)
except OSError:
pass
return proc.pid
def format_context_for_log(ctx: Dict[str, Any]) -> str:
"""Render a shutdown context dict as a single, scannable log line."""
sig = ctx.get("signal", "?")
parent = ctx.get("parent") or {}
parent_cmd = parent.get("cmdline", "(unknown)")
parent_name = parent.get("name") or "?"
parent_pid = parent.get("pid") or "?"
under_systemd = "yes" if ctx.get("under_systemd") else "no"
load = ctx.get("loadavg_1m")
load_str = f"{load:.2f}" if isinstance(load, (int, float)) else "?"
extras: List[str] = []
if ctx.get("takeover_marker") is not None:
for_self = ctx.get("takeover_marker_for_self")
extras.append(
f"takeover_marker_present={'self' if for_self else 'other'}"
)
if ctx.get("planned_stop_marker") is not None:
extras.append("planned_stop_marker_present=yes")
if ctx.get("tracer_pid"):
extras.append(f"tracer_pid={ctx['tracer_pid']}")
extras_str = (" " + " ".join(extras)) if extras else ""
# Parent cmdline is the most useful single signal — log it prominently.
return (
f"signal={sig} "
f"under_systemd={under_systemd} "
f"parent_pid={parent_pid} "
f"parent_name={parent_name} "
f"loadavg_1m={load_str}"
f"{extras_str} "
f"parent_cmdline={parent_cmd!r}"
)
def context_as_json(ctx: Dict[str, Any]) -> str:
"""JSON-serialise a context dict for structured ingestion. Never raises."""
try:
return json.dumps(ctx, default=str, sort_keys=True)
except (TypeError, ValueError):
return "{}"
def check_systemd_timing_alignment(drain_timeout: float) -> Optional[Dict[str, Any]]:
"""At startup, sanity-check that systemd's TimeoutStopSec >= drain_timeout.
When the gateway is run under a stale systemd unit file (e.g. the user
upgraded hermes-agent but never re-ran ``hermes setup`` to regenerate
the unit), ``TimeoutStopSec`` can be smaller than the configured
``restart_drain_timeout``. Result: SIGTERM arrives, the drain starts,
and systemd SIGKILLs the cgroup mid-drain looks like a phantom kill
in the journal because the journal only logs ``code=killed status=9``.
Returns ``None`` when the alignment is fine OR we can't determine it
(not running under systemd, ``systemctl`` unavailable, etc.). Returns
a dict with ``timeout_stop_sec`` + ``drain_timeout`` + ``mismatch``
bool when we have data to report.
Best-effort. Never raises.
"""
invocation_id = os.environ.get("INVOCATION_ID")
if not invocation_id:
return None # Not running under systemd (or at least not directly)
# Try to identify our unit name and ask systemctl for its config.
unit_name: Optional[str] = None
try:
# /proc/self/cgroup gives us "0::/user.slice/.../hermes-gateway.service"
with open("/proc/self/cgroup", encoding="utf-8") as fh:
for line in fh:
# systemd cgroup line ends with the unit name
if ".service" in line:
parts = line.strip().split("/")
for p in reversed(parts):
if p.endswith(".service"):
unit_name = p
break
if unit_name:
break
except (OSError, FileNotFoundError):
pass
if not unit_name:
return None
# Query systemctl for TimeoutStopUSec. Use --user OR system depending
# on which manager actually owns the unit. Try user first since
# that's the common case for hermes.
timeout_us: Optional[int] = None
for flag in (["--user"], []):
try:
result = subprocess.run(
["systemctl", *flag, "show", unit_name, "--property=TimeoutStopUSec"],
capture_output=True, text=True, timeout=2.0,
)
except (FileNotFoundError, subprocess.TimeoutExpired, OSError):
continue
if result.returncode != 0:
continue
# Output: "TimeoutStopUSec=1min 30s" or "TimeoutStopUSec=90000000"
for line in result.stdout.splitlines():
if line.startswith("TimeoutStopUSec="):
value = line.split("=", 1)[1].strip()
# Try numeric microseconds first
if value.isdigit():
timeout_us = int(value)
else:
timeout_us = _parse_systemd_duration_to_us(value)
if timeout_us is not None:
break
if timeout_us is not None:
break
if timeout_us is None:
return None
timeout_stop_sec = timeout_us / 1_000_000.0
# systemd needs headroom for: post-interrupt kill, adapter disconnect,
# SessionDB close, file unlinks, etc. 30s matches the unit-template
# constant in hermes_cli/gateway.py.
headroom = 30.0
expected = drain_timeout + headroom
return {
"unit": unit_name,
"timeout_stop_sec": timeout_stop_sec,
"drain_timeout": drain_timeout,
"expected_min": expected,
"mismatch": timeout_stop_sec < expected,
}
def _parse_systemd_duration_to_us(raw: str) -> Optional[int]:
"""Parse 'TimeoutStopUSec=1min 30s' / '90s' style values to microseconds.
systemd accepts a wide grammar; we cover the common cases (s, ms, min,
h) and return None on anything unexpected. Never raises.
"""
if not raw:
return None
units = {
"us": 1,
"ms": 1_000,
"s": 1_000_000,
"sec": 1_000_000,
"min": 60_000_000,
"h": 3_600_000_000,
"hr": 3_600_000_000,
}
total_us = 0
token = ""
digits = ""
for ch in raw + " ":
if ch.isdigit() or ch == ".":
if token:
# End previous unit, start new number
multiplier = units.get(token.lower())
if multiplier is None or not digits:
return None
try:
total_us += int(float(digits) * multiplier)
except ValueError:
return None
digits = ""
token = ""
digits += ch
elif ch.isalpha():
token += ch
elif digits and token:
multiplier = units.get(token.lower())
if multiplier is None:
return None
try:
total_us += int(float(digits) * multiplier)
except ValueError:
return None
digits = ""
token = ""
elif digits and not token:
# Bare number = seconds (rare but valid)
try:
total_us += int(float(digits) * 1_000_000)
except ValueError:
return None
digits = ""
return total_us if total_us > 0 else None
+229
View File
@@ -0,0 +1,229 @@
"""Per-platform slash command access control.
This module sits beside the existing per-platform allowlist (``allow_from``)
and adds a second axis: of the users who are *allowed to talk to the
gateway*, which ones can run *which slash commands*.
Two lists per platform scope (DM vs group, mirroring ``allow_from`` vs
``group_allow_from``):
- ``allow_admin_from`` user IDs that get every registered slash
command (built-in + plugin-registered).
- ``user_allowed_commands`` slash command names non-admin users may
run. Empty / unset non-admins get no
slash commands.
Backward compatibility:
If ``allow_admin_from`` is not set for a scope, slash command gating
is disabled entirely for that scope. Every allowed user can run every
slash command, exactly like before. This means existing installs are
unaffected until an operator opts in by listing at least one admin.
The gate is applied at the slash command dispatch site in
``gateway/run.py`` so it covers BOTH built-in and plugin-registered
commands via the live registry. Gating slash commands does not affect
plain chat non-admin users can still talk to the agent normally,
they just can't trigger commands outside ``user_allowed_commands``.
Authored as a slimmed-down salvage of PR #4443's permission tiers
(co-authored by @ReqX). The full tier system, audit log, usage
tracking, rate limiting, and tool filtering from that PR are not
included here only the slash-command access split.
"""
from __future__ import annotations
from dataclasses import dataclass
from typing import Any, FrozenSet, Iterable, Optional, Tuple
# Slash commands that MUST stay reachable for any allowed user, even when
# slash gating is enabled and the user has no commands listed. Without this
# carve-out, a non-admin user has no way to discover what they can or
# can't do (``/help``, ``/whoami``) and no way to see what state the agent
# is in (``/status``). These mirror the smallest set of read-only commands
# we'd hand to a guest. Operators can still narrow this further by writing
# their own ``user_allowed_commands`` (this set is only the implicit
# fallback floor — anything in ``user_allowed_commands`` overrides it
# additively, never restrictively).
_ALWAYS_ALLOWED_FOR_USERS: FrozenSet[str] = frozenset({
"help",
"whoami",
})
@dataclass(frozen=True)
class SlashAccessPolicy:
"""Resolved access policy for a single (platform, scope) pair.
``scope`` is ``"dm"`` for direct messages and ``"group"`` for groups,
channels, threads, and any other multi-user context. The mapping from
SessionSource.chat_type scope happens in ``policy_for_source``.
"""
enabled: bool # gating active for this scope?
admin_user_ids: FrozenSet[str]
user_allowed_commands: FrozenSet[str]
def is_admin(self, user_id: Optional[str]) -> bool:
if not self.enabled:
# Gating disabled → treat every allowed user as admin so
# downstream code can keep using ``is_admin`` / ``can_run``
# uniformly.
return True
if not user_id:
return False
return str(user_id) in self.admin_user_ids
def can_run(self, user_id: Optional[str], canonical_cmd: str) -> bool:
if not self.enabled:
return True
if self.is_admin(user_id):
return True
if not canonical_cmd:
return False
if canonical_cmd in _ALWAYS_ALLOWED_FOR_USERS:
return True
return canonical_cmd in self.user_allowed_commands
_DM_CHAT_TYPES = frozenset({"dm", "direct", "private", ""})
def _coerce_id_list(raw: Any) -> FrozenSet[str]:
"""Normalize a YAML-loaded admin/user list into a frozenset of strings.
Accepts ``None``, list, tuple, or comma-separated string. Stringifies
each entry and strips whitespace; empty entries are dropped.
"""
if raw is None:
return frozenset()
if isinstance(raw, (list, tuple, set, frozenset)):
items: Iterable[Any] = raw
elif isinstance(raw, str):
items = (s for s in raw.split(",") if s.strip())
else:
# single scalar (int user id, etc.)
items = (raw,)
out: list[str] = []
for it in items:
s = str(it).strip()
if s:
out.append(s)
return frozenset(out)
def _coerce_command_list(raw: Any) -> FrozenSet[str]:
"""Normalize a slash command allowlist.
Strips leading slashes so YAML can read either ``["help", "status"]``
or ``["/help", "/status"]``. Lowercase canonicalization matches how
``resolve_command()`` stores names.
"""
if raw is None:
return frozenset()
if isinstance(raw, (list, tuple, set, frozenset)):
items: Iterable[Any] = raw
elif isinstance(raw, str):
items = (s for s in raw.split(",") if s.strip())
else:
items = (raw,)
out: list[str] = []
for it in items:
s = str(it).strip().lstrip("/").lower()
if s:
out.append(s)
return frozenset(out)
def _scope_for_chat_type(chat_type: Optional[str]) -> str:
if chat_type and chat_type.lower() in _DM_CHAT_TYPES:
return "dm"
return "group"
def _platform_extra(platform_config: Any) -> dict:
"""Return the ``extra`` dict from a PlatformConfig-like object.
Defensively handles None and non-PlatformConfig shapes so calling
code can stay simple.
"""
if platform_config is None:
return {}
extra = getattr(platform_config, "extra", None)
if isinstance(extra, dict):
return extra
if isinstance(platform_config, dict):
# Some test harnesses pass dicts directly.
return platform_config
return {}
def _keys_for_scope(scope: str) -> Tuple[str, str]:
"""Return (admin_key, user_cmd_key) names for a scope."""
if scope == "group":
return ("group_allow_admin_from", "group_user_allowed_commands")
return ("allow_admin_from", "user_allowed_commands")
def policy_from_extra(extra: dict, scope: str) -> SlashAccessPolicy:
"""Build a policy from a platform's ``extra`` dict for one scope.
DM scope falls back to group scope keys ONLY for ``user_allowed_commands``
when the DM scope didn't specify its own. This keeps the common case
(operator wants the same command set DM and group) ergonomic without
forcing duplication. Admin lists are NOT cross-scope: an admin in
DMs is not implicitly an admin in a group.
"""
admin_key, cmd_key = _keys_for_scope(scope)
admin_ids = _coerce_id_list(extra.get(admin_key))
cmds = _coerce_command_list(extra.get(cmd_key))
if scope == "dm" and not cmds:
# DM didn't specify — let group's user_allowed_commands fall through
# so operators only need to list it once if it's the same.
cmds = _coerce_command_list(extra.get("group_user_allowed_commands"))
enabled = bool(admin_ids)
return SlashAccessPolicy(
enabled=enabled,
admin_user_ids=admin_ids,
user_allowed_commands=cmds,
)
def policy_for_source(gateway_config: Any, source: Any) -> SlashAccessPolicy:
"""Resolve the access policy for a SessionSource.
Returns a "disabled" policy (gating off, allow everything) when:
- gateway_config is None
- the platform has no PlatformConfig
- the platform's PlatformConfig has no admin list set for the scope
Callers should treat the returned policy as authoritative for slash
command gating only. It does not gate plain chat messages.
"""
if gateway_config is None or source is None:
return SlashAccessPolicy(
enabled=False,
admin_user_ids=frozenset(),
user_allowed_commands=frozenset(),
)
platforms = getattr(gateway_config, "platforms", None)
platform_config = None
if platforms is not None:
try:
platform_config = platforms.get(source.platform)
except Exception:
platform_config = None
extra = _platform_extra(platform_config)
scope = _scope_for_chat_type(getattr(source, "chat_type", None))
return policy_from_extra(extra, scope)
__all__ = [
"SlashAccessPolicy",
"policy_from_extra",
"policy_for_source",
]
+11 -5
View File
@@ -218,7 +218,11 @@ def _read_pid_record(pid_path: Optional[Path] = None) -> Optional[dict]:
if not pid_path.exists():
return None
raw = pid_path.read_text().strip()
try:
raw = pid_path.read_text().strip()
except OSError:
# File was deleted between exists() and read_text(), or permission flipped.
return None
if not raw:
return None
@@ -482,10 +486,12 @@ def write_runtime_status(
"""Persist gateway runtime health information for diagnostics/status."""
path = _get_runtime_status_path()
payload = _read_json_file(path) or _build_runtime_status_record()
current_record = _build_pid_record()
payload.setdefault("platforms", {})
payload.setdefault("kind", _GATEWAY_KIND)
payload["pid"] = os.getpid()
payload["start_time"] = _get_process_start_time(os.getpid())
payload["kind"] = current_record["kind"]
payload["pid"] = current_record["pid"]
payload["argv"] = current_record["argv"]
payload["start_time"] = current_record["start_time"]
payload["updated_at"] = _utc_now_iso()
if gateway_state is not _UNSET:
@@ -598,7 +604,7 @@ def acquire_scoped_lock(scope: str, identity: str, metadata: Optional[dict[str,
for _line in _proc_status.read_text(encoding="utf-8").splitlines():
if _line.startswith("State:"):
_state = _line.split()[1]
if _state in ("T", "t"): # stopped or tracing stop
if _state in {"T", "t"}: # stopped or tracing stop
stale = True
break
except (OSError, PermissionError):
+271 -20
View File
@@ -21,7 +21,15 @@ import queue
import re
import time
from dataclasses import dataclass
from typing import Any, Optional
from typing import Any, Callable, Optional
from gateway.platforms.base import BasePlatformAdapter as _BasePlatformAdapter
from gateway.platforms.base import _custom_unit_to_cp
from gateway.config import (
DEFAULT_STREAMING_EDIT_INTERVAL as _DEFAULT_STREAMING_EDIT_INTERVAL,
DEFAULT_STREAMING_BUFFER_THRESHOLD as _DEFAULT_STREAMING_BUFFER_THRESHOLD,
DEFAULT_STREAMING_CURSOR as _DEFAULT_STREAMING_CURSOR,
)
logger = logging.getLogger("gateway.stream_consumer")
@@ -40,9 +48,9 @@ _COMMENTARY = object()
@dataclass
class StreamConsumerConfig:
"""Runtime config for a single stream consumer instance."""
edit_interval: float = 1.0
buffer_threshold: int = 40
cursor: str = ""
edit_interval: float = _DEFAULT_STREAMING_EDIT_INTERVAL
buffer_threshold: int = _DEFAULT_STREAMING_BUFFER_THRESHOLD
cursor: str = _DEFAULT_STREAMING_CURSOR
buffer_only: bool = False
# When >0, the final edit for a streamed response is delivered as a
# fresh message if the original preview has been visible for at least
@@ -52,6 +60,18 @@ class StreamConsumerConfig:
# openclaw/openclaw#72038. Default 0 = always edit in place (legacy
# behavior). The gateway enables this selectively per-platform.
fresh_final_after_seconds: float = 0.0
# Streaming transport selection:
# "auto" — prefer native draft streaming (e.g. Telegram sendMessageDraft)
# when the adapter + chat supports it; fall back to edit.
# "draft" — explicitly request native draft streaming; fall back to
# edit when unsupported.
# "edit" — progressive editMessageText (legacy behavior).
# "off" — handled by the gateway before the consumer is even built.
transport: str = "auto"
# Hint for the consumer about the originating chat type (e.g. "dm",
# "group", "supergroup", "forum"). Used to gate native draft streaming,
# which is platform-specific (Telegram drafts are DM-only).
chat_type: str = ""
class GatewayStreamConsumer:
@@ -85,6 +105,11 @@ class GatewayStreamConsumer:
"</THINKING>", "</thinking>", "</thought>",
)
# Class-wide monotonic counter for native-streaming draft ids. Telegram
# animates a draft when the same draft_id is reused across consecutive
# calls in the same chat, so we need a fresh non-zero id per response.
_draft_id_counter: int = 0
def __init__(
self,
adapter: Any,
@@ -92,6 +117,7 @@ class GatewayStreamConsumer:
config: Optional[StreamConsumerConfig] = None,
metadata: Optional[dict] = None,
on_new_message: Optional[callable] = None,
initial_reply_to_id: Optional[str] = None,
):
self.adapter = adapter
self.chat_id = chat_id
@@ -105,6 +131,7 @@ class GatewayStreamConsumer:
# the content, not edit the old bubble above it.
# Called with no arguments. Exceptions are swallowed.
self._on_new_message = on_new_message
self._initial_reply_to_id = initial_reply_to_id
self._queue: queue.Queue = queue.Queue()
self._accumulated = ""
self._message_id: Optional[str] = None
@@ -136,6 +163,20 @@ class GatewayStreamConsumer:
self._in_think_block = False
self._think_buffer = ""
# Native draft-streaming state. Resolved at the start of run() based
# on cfg.transport, cfg.chat_type, and the adapter's
# supports_draft_streaming() probe. When True, the consumer emits
# animated draft frames via adapter.send_draft instead of progressive
# edits via adapter.edit_message. The final answer still goes
# through the normal first-send path so the user gets a real message
# in their chat history (drafts have no message_id).
self._use_draft_streaming = False
self._draft_id: Optional[int] = None
# Cumulative draft-frame failure count for this consumer. After the
# first failure we permanently disable drafts for the remainder of
# this response and route through edit-based for graceful degradation.
self._draft_failures = 0
@property
def already_sent(self) -> bool:
"""True if at least one message was sent or edited during the run."""
@@ -174,6 +215,16 @@ class GatewayStreamConsumer:
self._last_sent_text = ""
self._fallback_final_send = False
self._fallback_prefix = ""
# Native draft streaming: bump the draft_id so the next text segment
# animates as a fresh preview below the tool-progress bubbles, not
# over the prior segment's already-finalized draft. This is how
# we avoid the "inter-tool-call text leak" failure mode openclaw
# documented in their issue #32535 — each text block becomes its
# own visible message via the finalize, then a new draft animates
# for the next one.
if self._use_draft_streaming:
type(self)._draft_id_counter += 1
self._draft_id = type(self)._draft_id_counter
def on_delta(self, text: str) -> None:
"""Thread-safe callback — called from the agent's worker thread.
@@ -299,9 +350,32 @@ class GatewayStreamConsumer:
async def run(self) -> None:
"""Async task that drains the queue and edits the platform message."""
# Platform message length limit — leave room for cursor + formatting
# Platform message length limit — leave room for cursor + formatting.
# Use the adapter's length function (e.g. utf16_len for Telegram) so
# overflow detection matches what the platform actually enforces.
# Gate on isinstance(BasePlatformAdapter) so test MagicMocks (whose
# auto-attributes return mock objects, not callables) fall back to len.
_len_fn: "Callable[[str], int]" = (
self.adapter.message_len_fn
if isinstance(self.adapter, _BasePlatformAdapter)
else len
)
_raw_limit = getattr(self.adapter, "MAX_MESSAGE_LENGTH", 4096)
_safe_limit = max(500, _raw_limit - len(self.cfg.cursor) - 100)
_safe_limit = max(500, _raw_limit - _len_fn(self.cfg.cursor) - 100)
# Resolve native draft streaming once per run. When enabled the
# consumer routes mid-stream frames through adapter.send_draft and
# leaves _message_id=None so the existing got_done path delivers the
# final answer as a regular sendMessage (drafts have no message_id
# to edit).
self._use_draft_streaming = self._resolve_draft_streaming()
if self._use_draft_streaming:
type(self)._draft_id_counter += 1
self._draft_id = type(self)._draft_id_counter
logger.debug(
"Stream consumer using native-draft transport (chat=%s draft_id=%s)",
self.chat_id, self._draft_id,
)
try:
while True:
@@ -343,6 +417,10 @@ class GatewayStreamConsumer:
should_edit = should_edit or (
(elapsed >= self._current_edit_interval
and self._accumulated)
# buffer_threshold is intentionally codepoint-based:
# it's a debounce heuristic ("send updates roughly
# every N visible characters"), not a platform-limit
# check. _len_fn is reserved for overflow detection.
or len(self._accumulated) >= self.cfg.buffer_threshold
)
@@ -351,7 +429,7 @@ class GatewayStreamConsumer:
# Split overflow: if accumulated text exceeds the platform
# limit, split into properly sized chunks.
if (
len(self._accumulated) > _safe_limit
_len_fn(self._accumulated) > _safe_limit
and self._message_id is None
):
# No existing message to edit (first message or after a
@@ -360,15 +438,23 @@ class GatewayStreamConsumer:
# proper word/code-fence boundaries and chunk
# indicators like "(1/2)".
chunks = self.adapter.truncate_message(
self._accumulated, _safe_limit
self._accumulated, _safe_limit, len_fn=_len_fn,
)
chunks_delivered = False
reply_to = self._message_id or self._initial_reply_to_id
for chunk in chunks:
await self._send_new_chunk(chunk, self._message_id)
new_id = await self._send_new_chunk(chunk, reply_to)
if new_id is not None and new_id != reply_to:
chunks_delivered = True
self._accumulated = ""
self._last_sent_text = ""
self._last_edit_time = time.monotonic()
if got_done:
self._final_response_sent = self._already_sent
# Only claim final delivery if THESE chunks actually
# landed. ``_already_sent`` may be True from prior
# tool-progress edits or fallback-mode promotion (#10748)
# — that doesn't mean the final answer reached the user.
self._final_response_sent = chunks_delivered
return
if got_segment_break:
self._message_id = None
@@ -379,11 +465,14 @@ class GatewayStreamConsumer:
# Existing message: edit it with the first chunk, then
# start a new message for the overflow remainder.
while (
len(self._accumulated) > _safe_limit
_len_fn(self._accumulated) > _safe_limit
and self._message_id is not None
and self._edit_supported
):
split_at = self._accumulated.rfind("\n", 0, _safe_limit)
_cp_budget = _custom_unit_to_cp(
self._accumulated, _safe_limit, _len_fn,
)
split_at = self._accumulated.rfind("\n", 0, _cp_budget)
if split_at < _safe_limit // 2:
split_at = _safe_limit
chunk = self._accumulated[:split_at]
@@ -411,7 +500,7 @@ class GatewayStreamConsumer:
# path below so we don't finalize here for it.
current_update_visible = await self._send_or_edit(
display_text,
finalize=got_segment_break,
finalize=(got_done or got_segment_break),
)
self._last_edit_time = time.monotonic()
@@ -574,14 +663,18 @@ class GatewayStreamConsumer:
return final_text
@staticmethod
def _split_text_chunks(text: str, limit: int) -> list[str]:
def _split_text_chunks(
text: str, limit: int,
len_fn: "Callable[[str], int]" = len,
) -> list[str]:
"""Split text into reasonably sized chunks for fallback sends."""
if len(text) <= limit:
if len_fn(text) <= limit:
return [text]
chunks: list[str] = []
remaining = text
while len(remaining) > limit:
split_at = remaining.rfind("\n", 0, limit)
while len_fn(remaining) > limit:
_cp_budget = _custom_unit_to_cp(remaining, limit, len_fn)
split_at = remaining.rfind("\n", 0, _cp_budget)
if split_at < limit // 2:
split_at = limit
chunks.append(remaining[:split_at])
@@ -637,9 +730,15 @@ class GatewayStreamConsumer:
return
raw_limit = getattr(self.adapter, "MAX_MESSAGE_LENGTH", 4096)
_len_fn: "Callable[[str], int]" = (
self.adapter.message_len_fn
if isinstance(self.adapter, _BasePlatformAdapter)
else len
)
safe_limit = max(500, raw_limit - 100)
chunks = self._split_text_chunks(continuation, safe_limit)
chunks = self._split_text_chunks(continuation, safe_limit, len_fn=_len_fn)
stale_message_id = self._message_id # partial message to clean up
last_message_id: Optional[str] = None
last_successful_chunk = ""
sent_any_chunk = False
@@ -687,6 +786,22 @@ class GatewayStreamConsumer:
# so any stale tool-progress bubble gets closed off.
self._notify_new_message()
# Remove the frozen partial message so the user only sees the
# complete fallback response. Best-effort — if the platform doesn't
# implement ``delete_message``, the delete fails (flood control still
# active, bot lacks permission, message too old to delete), the
# partial remains but at least the full answer was delivered.
if stale_message_id and stale_message_id != last_message_id:
delete_fn = getattr(self.adapter, "delete_message", None)
if delete_fn is not None:
try:
await delete_fn(self.chat_id, stale_message_id)
except Exception as e:
logger.debug(
"Fallback partial cleanup failed (%s): %s",
stale_message_id, e,
)
self._message_id = last_message_id
self._already_sent = True
self._final_response_sent = True
@@ -699,6 +814,89 @@ class GatewayStreamConsumer:
err_lower = err.lower()
return "flood" in err_lower or "retry after" in err_lower or "rate" in err_lower
def _resolve_draft_streaming(self) -> bool:
"""Decide whether this run should use native draft streaming.
Honors ``cfg.transport``:
* ``"edit"`` never use drafts (legacy progressive-edit path).
* ``"draft"`` require draft support; gracefully fall back to edit
when the adapter declines. Logs the downgrade at debug.
* ``"auto"`` use drafts when the adapter supports them for this
chat type; otherwise edit.
Adapter eligibility is checked via
:meth:`BasePlatformAdapter.supports_draft_streaming`, which considers
the chat type (e.g. Telegram drafts are DM-only) and platform-version
gates (e.g. python-telegram-bot 22.6+).
"""
transport = (self.cfg.transport or "auto").lower()
if transport == "edit":
return False
# "off" is filtered upstream by the gateway; treat as edit defensively.
if transport == "off":
return False
# Test adapters are MagicMocks that don't subclass BasePlatformAdapter;
# default them to edit so existing test behaviour is preserved.
if not isinstance(self.adapter, _BasePlatformAdapter):
return False
try:
supported = self.adapter.supports_draft_streaming(
chat_type=self.cfg.chat_type or None,
metadata=self.metadata,
)
except Exception:
logger.debug("supports_draft_streaming probe raised", exc_info=True)
supported = False
if not supported:
if transport == "draft":
logger.debug(
"Draft streaming requested but unsupported (chat=%s, type=%r) — "
"falling back to edit",
self.chat_id, self.cfg.chat_type,
)
return False
return True
async def _send_draft_frame(self, text: str) -> bool:
"""Emit a single animated draft frame for the current accumulated text.
Returns True when the frame landed. On any failure, permanently
disables drafts for the remainder of this run so subsequent frames
flow through the edit-based path (which can adapt with flood-control
backoff, etc.). Drafts have no message_id and clear naturally on
the client when the response finalizes via a regular sendMessage.
"""
if self._draft_id is None:
# Defensive: should never happen — _use_draft_streaming gate is
# set in tandem with _draft_id in run(). Disable to be safe.
self._use_draft_streaming = False
return False
try:
result = await self.adapter.send_draft(
chat_id=self.chat_id,
draft_id=self._draft_id,
content=text,
metadata=self.metadata,
)
except Exception as e:
logger.debug(
"send_draft raised, disabling draft transport for this run: %s", e,
)
self._draft_failures += 1
self._use_draft_streaming = False
return False
if not getattr(result, "success", False):
logger.debug(
"send_draft returned success=False, disabling draft transport: %s",
getattr(result, "error", "unknown"),
)
self._draft_failures += 1
self._use_draft_streaming = False
return False
# Frame delivered. Track text for parity with edit-based no-op skip.
self._last_sent_text = text
return True
async def _flush_segment_tail_on_edit_failure(self) -> None:
"""Deliver un-sent tail content before a segment-break reset.
@@ -893,6 +1091,35 @@ class GatewayStreamConsumer:
and self.cfg.cursor in text
and len(_visible_stripped) < _MIN_NEW_MSG_CHARS):
return True # too short for a standalone message — accumulate more
# Native draft streaming: route mid-stream frames through send_draft.
# The final answer is delivered via the regular sendMessage path
# below — drafts have no message_id so we can't finalize them
# in-place; the regular sendMessage clears the draft naturally on
# the client and gives the user a real message in their history.
# Skip when:
# * finalize=True (this is the final answer; needs to be a real message)
# * an edit path is already established (message_id is set, e.g. after
# a tool-boundary segment break where the prior text was finalized
# as a real sendMessage and the next text segment continues editing
# that one — staying on edit-based for that segment is correct).
if (
self._use_draft_streaming
and not finalize
and self._message_id is None
):
# No-op skip: identical to the last frame we sent.
if text == self._last_sent_text:
return True
ok = await self._send_draft_frame(text)
if ok:
# Drafts mark "we put something on screen" but DO NOT set
# _already_sent — that flag gates the gateway's fallback
# final-send path and we still need that to fire so the
# user gets a real message (drafts have no message_id).
return True
# Failure already disabled drafts for this run; fall through to
# the regular edit/send path below.
try:
if self._message_id is not None:
if self._edit_supported:
@@ -931,7 +1158,29 @@ class GatewayStreamConsumer:
)
if result.success:
self._already_sent = True
self._last_sent_text = text
# Adapter may have split-and-delivered an oversized
# edit across the original message + N continuations.
# When that happens, ``message_id`` is the LAST visible
# continuation and ``_last_sent_text`` no longer reflects
# the on-screen content (the new message only holds the
# final chunk's text), so subsequent edits must target
# the new id and skip-if-same comparisons must reset.
# Fire on_new_message so tool-progress bubbles linearize
# below the new continuation, not the original.
# ``getattr`` with default keeps backwards compat with
# SimpleNamespace mocks in tests that pre-date the field.
_continuation_ids = getattr(result, "continuation_message_ids", ()) or ()
if (
_continuation_ids
and result.message_id
and result.message_id != self._message_id
):
self._message_id = str(result.message_id)
self._message_created_ts = time.monotonic()
self._last_sent_text = ""
self._notify_new_message()
else:
self._last_sent_text = text
# Successful edit — reset flood strike counter
self._flood_strikes = 0
return True
@@ -979,10 +1228,12 @@ class GatewayStreamConsumer:
# The final response will be sent by the fallback path.
return False
else:
# First message — send new
# First message — send new, threaded to the original user message
# so it lands in the correct topic/thread.
result = await self.adapter.send(
chat_id=self.chat_id,
content=text,
reply_to=self._initial_reply_to_id,
metadata=self.metadata,
)
if result.success:
+19 -9
View File
@@ -1450,7 +1450,7 @@ def resolve_provider(
# whose availability isn't implied by LM_API_KEY presence (it may be
# offline, and the no-auth setup uses a placeholder value), so it
# also requires explicit selection.
if pid in ("copilot", "lmstudio"):
if pid in {"copilot", "lmstudio"}:
continue
for env_var in pconfig.api_key_env_vars:
if has_usable_secret(os.getenv(env_var, "")):
@@ -2541,7 +2541,7 @@ def refresh_codex_oauth_pure(
# A 401/403 from the token endpoint always means the refresh token
# is invalid/expired — force relogin even if the body error code
# wasn't one of the known strings above.
if response.status_code in (401, 403) and not relogin_required:
if response.status_code in {401, 403} and not relogin_required:
relogin_required = True
raise AuthError(
message,
@@ -2947,7 +2947,7 @@ def _merge_shared_nous_oauth_state(state: Dict[str, Any]) -> bool:
"expires_at",
):
value = shared.get(key)
if value not in (None, ""):
if value not in {None, ""}:
state[key] = value
return True
@@ -3986,7 +3986,7 @@ def get_api_key_provider_status(provider_id: str) -> Dict[str, Any]:
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
if provider_id in ("kimi-coding", "kimi-coding-cn"):
if provider_id in {"kimi-coding", "kimi-coding-cn"}:
base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, env_url)
elif env_url:
base_url = env_url
@@ -4090,7 +4090,7 @@ def resolve_api_key_provider_credentials(provider_id: str) -> Dict[str, Any]:
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
if provider_id in ("kimi-coding", "kimi-coding-cn"):
if provider_id in {"kimi-coding", "kimi-coding-cn"}:
base_url = _resolve_kimi_base_url(api_key, pconfig.inference_base_url, env_url)
elif provider_id == "zai":
base_url = _resolve_zai_base_url(api_key, pconfig.inference_base_url, env_url)
@@ -4510,7 +4510,7 @@ def _login_openai_codex(
reuse = input("Use existing credentials? [Y/n]: ").strip().lower()
except (EOFError, KeyboardInterrupt):
reuse = "y"
if reuse in ("", "y", "yes"):
if reuse in {"", "y", "yes"}:
config_path = _update_config_for_provider("openai-codex", existing.get("base_url", DEFAULT_CODEX_BASE_URL))
print()
print("Login successful!")
@@ -4531,7 +4531,7 @@ def _login_openai_codex(
do_import = input("Import these credentials? (a separate login is recommended) [y/N]: ").strip().lower()
except (EOFError, KeyboardInterrupt):
do_import = "n"
if do_import in ("y", "yes"):
if do_import in {"y", "yes"}:
_save_codex_tokens(cli_tokens)
base_url = os.getenv("HERMES_CODEX_BASE_URL", "").strip().rstrip("/") or DEFAULT_CODEX_BASE_URL
config_path = _update_config_for_provider("openai-codex", base_url)
@@ -4623,7 +4623,7 @@ def _codex_device_code_login() -> Dict[str, Any]:
if poll_resp.status_code == 200:
code_resp = poll_resp.json()
break
elif poll_resp.status_code in (403, 404):
elif poll_resp.status_code in {403, 404}:
continue # User hasn't completed login yet
else:
raise AuthError(
@@ -5188,7 +5188,7 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
do_import = input("Import these credentials? [Y/n]: ").strip().lower()
except (EOFError, KeyboardInterrupt):
do_import = "y"
if do_import in ("", "y", "yes"):
if do_import in {"", "y", "yes"}:
print("Rehydrating Nous session from shared credentials...")
auth_state = _try_import_shared_nous_state(
timeout_seconds=timeout_seconds,
@@ -5251,6 +5251,7 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
from hermes_cli.models import (
get_curated_nous_model_ids, get_pricing_for_provider,
check_nous_free_tier, partition_nous_models_by_tier,
union_with_portal_free_recommendations,
)
model_ids = get_curated_nous_model_ids()
@@ -5260,6 +5261,15 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
pricing = get_pricing_for_provider("nous")
free_tier = check_nous_free_tier()
if free_tier:
# The Portal's freeRecommendedModels endpoint is the
# source of truth for what's free *right now*. Augment
# the curated list with anything new the Portal flags
# as free so users on older Hermes builds still see
# newly-launched free models without a CLI release.
_portal_for_recs = auth_state.get("portal_base_url", "")
model_ids, pricing = union_with_portal_free_recommendations(
model_ids, pricing, _portal_for_recs,
)
model_ids, unavailable_models = partition_nous_models_by_tier(
model_ids, pricing, free_tier=True,
)
+1 -1
View File
@@ -266,7 +266,7 @@ def auth_add_command(args) -> None:
do_import = input("Import these credentials? [Y/n]: ").strip().lower()
except (EOFError, KeyboardInterrupt):
do_import = "y"
if do_import in ("", "y", "yes"):
if do_import in {"", "y", "yes"}:
print("Rehydrating Nous session from shared credentials...")
rehydrated = auth_mod._try_import_shared_nous_state(
timeout_seconds=getattr(args, "timeout", None) or 15.0,
+4 -6
View File
@@ -298,7 +298,7 @@ def _detect_prefix(zf: zipfile.ZipFile) -> str:
if len(first_parts) == 1:
prefix = first_parts.pop()
# Only strip if it looks like a hermes dir name
if prefix in (".hermes", "hermes"):
if prefix in {".hermes", "hermes"}:
return prefix + "/"
return ""
@@ -349,7 +349,7 @@ def run_import(args) -> None:
except (EOFError, KeyboardInterrupt):
print("\nAborted.")
sys.exit(1)
if answer not in ("y", "yes"):
if answer not in {"y", "yes"}:
print("Aborted.")
return
@@ -802,8 +802,7 @@ def _prune_pre_update_backups(backup_dir: Path, keep: int) -> int:
Operators who genuinely don't want a backup should set
``updates.pre_update_backup: false`` in config that gates creation.
"""
if keep < 1:
keep = 1
keep = max(keep, 1)
if not backup_dir.exists():
return 0
@@ -875,8 +874,7 @@ def _prune_pre_migration_backups(backup_dir: Path, keep: int) -> int:
Only touches files matching ``pre-migration-*.zip`` so other backups in
the same directory are never touched.
"""
if keep < 0:
keep = 0
keep = max(keep, 0)
if not backup_dir.exists():
return 0
+1 -1
View File
@@ -139,7 +139,7 @@ def _confirm(prompt: str) -> bool:
except (EOFError, KeyboardInterrupt):
print()
return False
return resp in ("y", "yes")
return resp in {"y", "yes"}
def cmd_clear(args: argparse.Namespace) -> int:
+10 -11
View File
@@ -298,7 +298,7 @@ def claw_command(args):
if action == "migrate":
_cmd_migrate(args)
elif action in ("cleanup", "clean"):
elif action in {"cleanup", "clean"}:
_cmd_cleanup(args)
else:
print("Usage: hermes claw <command> [options]")
@@ -670,17 +670,16 @@ def _cmd_cleanup(args):
elif not auto_yes and not sys.stdin.isatty():
print_info(f"Non-interactive session — would archive: {source_dir}")
print_info("To execute, re-run with: hermes claw cleanup --yes")
elif auto_yes or prompt_yes_no(f"Archive {source_dir}?", default=True):
try:
archive_path = _archive_directory(source_dir)
print_success(f"Archived: {source_dir}{archive_path}")
total_archived += 1
except OSError as e:
print_error(f"Could not archive: {e}")
print_info(f"Try manually: mv {source_dir} {source_dir}.pre-migration")
else:
if auto_yes or prompt_yes_no(f"Archive {source_dir}?", default=True):
try:
archive_path = _archive_directory(source_dir)
print_success(f"Archived: {source_dir}{archive_path}")
total_archived += 1
except OSError as e:
print_error(f"Could not archive: {e}")
print_info(f"Try manually: mv {source_dir} {source_dir}.pre-migration")
else:
print_info("Skipped.")
print_info("Skipped.")
# Summary
print()
+27 -6
View File
@@ -16,6 +16,19 @@ DEFAULT_CODEX_MODELS: List[str] = [
"gpt-5.4-mini",
"gpt-5.4",
"gpt-5.3-codex",
# gpt-5.3-codex-spark is in research preview and is exposed *only* via
# the Codex CLI / OAuth backend (chatgpt.com/backend-api/codex/models)
# for ChatGPT Pro subscribers. It is NOT available in the public OpenAI
# API, so it intentionally stays out of the "openai" provider catalog
# in hermes_cli/models.py — only the openai-codex (OAuth) provider
# surfaces it. The Codex backend reports ``supported_in_api: false`` for
# this slug; that flag describes API availability, not Codex backend
# availability, so the fetch/cache code paths below intentionally do
# not filter on it. PR #12994 removed this entry on the assumption it
# was unsupported — that was wrong; restored here. Keep it in the
# curated fallback so Pro users still see Spark in `/model` when live
# discovery is unavailable (offline first run, transient API failure).
"gpt-5.3-codex-spark",
"gpt-5.2-codex",
"gpt-5.1-codex-max",
"gpt-5.1-codex-mini",
@@ -26,6 +39,11 @@ _FORWARD_COMPAT_TEMPLATE_MODELS: List[tuple[str, tuple[str, ...]]] = [
("gpt-5.4-mini", ("gpt-5.3-codex", "gpt-5.2-codex")),
("gpt-5.4", ("gpt-5.3-codex", "gpt-5.2-codex")),
("gpt-5.3-codex", ("gpt-5.2-codex",)),
# Surface Spark whenever any compatible Codex template is present so
# accounts hitting the live endpoint with an older lineup still see
# Spark in the picker. Backend gates real availability by ChatGPT Pro
# entitlement; Hermes does not.
("gpt-5.3-codex-spark", ("gpt-5.3-codex", "gpt-5.2-codex")),
]
@@ -78,10 +96,12 @@ def _fetch_models_from_api(access_token: str) -> List[str]:
if not isinstance(slug, str) or not slug.strip():
continue
slug = slug.strip()
if item.get("supported_in_api") is False:
continue
# Codex CLI's catalog uses ``supported_in_api`` for the public OpenAI
# API, not for the OAuth-backed Codex backend that this provider uses.
# Some valid Codex CLI models (for example gpt-5.3-codex-spark) are
# marked false here but are still accepted by the Codex route.
visibility = item.get("visibility", "")
if isinstance(visibility, str) and visibility.strip().lower() in ("hide", "hidden"):
if isinstance(visibility, str) and visibility.strip().lower() in {"hide", "hidden"}:
continue
priority = item.get("priority")
rank = int(priority) if isinstance(priority, (int, float)) else 10_000
@@ -128,10 +148,11 @@ def _read_cache_models(codex_home: Path) -> List[str]:
if not isinstance(slug, str) or not slug.strip():
continue
slug = slug.strip()
if item.get("supported_in_api") is False:
continue
# Do not filter on ``supported_in_api`` here. It describes the
# public OpenAI API, while Hermes openai-codex talks to the same
# OAuth-backed Codex backend as Codex CLI.
visibility = item.get("visibility")
if isinstance(visibility, str) and visibility.strip().lower() in ("hide", "hidden"):
if isinstance(visibility, str) and visibility.strip().lower() in {"hide", "hidden"}:
continue
priority = item.get("priority")
rank = int(priority) if isinstance(priority, (int, float)) else 10_000
+4 -1
View File
@@ -79,6 +79,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("undo", "Remove the last user/assistant exchange", "Session"),
CommandDef("title", "Set a title for the current session", "Session",
args_hint="[name]"),
CommandDef("handoff", "Hand off this session to a messaging platform (Telegram, Discord, etc.)", "Session",
args_hint="<platform>", cli_only=True),
CommandDef("branch", "Branch the current session (explore a different path)", "Session",
aliases=("fork",), args_hint="[name]"),
CommandDef("compress", "Manually compress conversation context", "Session",
@@ -103,6 +105,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("goal", "Set a standing goal Hermes works on across turns until achieved", "Session",
args_hint="[text | pause | resume | clear | status]"),
CommandDef("status", "Show session info", "Session"),
CommandDef("whoami", "Show your slash command access (admin / user)", "Info"),
CommandDef("profile", "Show active profile name and home directory", "Info"),
CommandDef("sethome", "Set this chat as the home channel", "Session",
gateway_only=True, aliases=("set-home",)),
@@ -808,7 +811,7 @@ def discord_skill_commands_by_category(
# names are marked with a sentinel so the warning distinguishes
# "skill collided with a reserved command" from "two skills collided
# on the 32-char clamp" — the latter is the rename-worthy case.
_names_used: dict[str, str] = {n: "<reserved>" for n in reserved_names}
_names_used: dict[str, str] = dict.fromkeys(reserved_names, "<reserved>")
hidden = 0
try:
+3 -3
View File
@@ -216,9 +216,9 @@ _hermes() {{
typeset -A opt_args
_arguments -C \\
'(-h --help){{-h,--help}}[Show help and exit]' \\
'(-V --version){{-V,--version}}[Show version and exit]' \\
'(-p --profile){{-p,--profile}}[Profile name]:profile:_hermes_profiles' \\
'(-)'{{-h,--help}}'[Show help and exit]' \\
'(-)'{{-V,--version}}'[Show version and exit]' \\
'(-)'{{-p,--profile}}'[Profile name]:profile:_hermes_profiles' \\
'1:command:->commands' \\
'*::arg:->args'
+143 -13
View File
@@ -28,6 +28,48 @@ from typing import Dict, Any, Optional, List, Tuple
logger = logging.getLogger(__name__)
# Track which (config_path, mtime_ns, size) tuples we've already warned about
# so concurrent CLI/gateway loads of a broken config.yaml don't spam stderr
# every time. Cleared automatically when the file changes (different mtime).
_CONFIG_PARSE_WARNED: set = set()
def _warn_config_parse_failure(config_path: Path, exc: Exception) -> None:
"""Surface a config.yaml parse failure to user, log, and stderr.
A YAML parse error in ``~/.hermes/config.yaml`` causes ``load_config()``
to silently fall back to ``DEFAULT_CONFIG``, which means every user
override (auxiliary providers, fallback chain, model overrides, etc.)
is dropped. Before this helper that was a one-line ``print(...)`` that
scrolled off-screen on the first invocation and was never seen again.
Now: warn once per (path, mtime_ns, size) on stderr **and** in
``agent.log`` / ``errors.log`` at WARNING level so ``hermes logs``
surfaces it. Re-warns automatically if the file changes (different
mtime/size), so users editing the config see the next failure.
"""
try:
st = config_path.stat()
key = (str(config_path), st.st_mtime_ns, st.st_size)
except OSError:
key = (str(config_path), 0, 0)
if key in _CONFIG_PARSE_WARNED:
return
_CONFIG_PARSE_WARNED.add(key)
msg = (
f"Failed to parse {config_path}: {exc}. "
f"Falling back to default config — every user override "
f"(auxiliary providers, fallback chain, model settings) is being IGNORED. "
f"Fix the YAML and restart."
)
logger.warning(msg)
try:
sys.stderr.write(f"⚠️ hermes config: {msg}\n")
sys.stderr.flush()
except Exception:
pass
_IS_WINDOWS = platform.system() == "Windows"
_ENV_VAR_NAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_LAST_EXPANDED_CONFIG_BY_PATH: Dict[str, Any] = {}
@@ -537,6 +579,7 @@ DEFAULT_CONFIG = {
# Explicit opt-in: mount the host cwd into /workspace for Docker sessions.
# Default off because passing host directories into a sandbox weakens isolation.
"docker_mount_cwd_to_workspace": False,
"docker_extra_args": [], # Extra flags passed verbatim to docker run
# Explicit opt-in: run the Docker container as the host user's uid:gid
# (via `--user`). When enabled, files written into bind-mounted dirs
# (docker_volumes, the persistent workspace, or the auto-mounted cwd)
@@ -680,8 +723,15 @@ DEFAULT_CONFIG = {
# Anthropic prompt caching (Claude via OpenRouter or native Anthropic API).
# cache_ttl must be "5m" or "1h" (Anthropic-supported tiers); other values are ignored.
# long_lived_prefix: when true (default), Claude on Anthropic / OpenRouter / Nous
# Portal uses a split layout: tools[-1] + stable system prefix at long_lived_ttl
# (cross-session cache), last 2 messages at cache_ttl (within-session rolling).
# Set false to keep the legacy "system + last 3 messages" single-tier layout.
# long_lived_ttl: TTL for the cross-session prefix tier ("5m" or "1h"; default "1h").
"prompt_caching": {
"cache_ttl": "5m",
"long_lived_prefix": True,
"long_lived_ttl": "1h",
},
# OpenRouter-specific settings.
@@ -691,9 +741,18 @@ DEFAULT_CONFIG = {
# See: https://openrouter.ai/docs/guides/features/response-caching
# response_cache_ttl: how long cached responses remain valid, in seconds (1-86400).
# Default 300 (5 minutes). Only used when response_cache is enabled.
# min_coding_score: knob for the openrouter/pareto-code router (0.0-1.0).
# Only applied when model.model is "openrouter/pareto-code". Higher
# values route to stronger (more expensive) coders; lower values open
# up cheaper, faster options. Default 0.65 lands on the mid-tier
# coder on the current Pareto frontier. Empty string = let OpenRouter
# pick the strongest available coder (router's documented default
# when the plugins block is omitted).
# See: https://openrouter.ai/docs/guides/routing/routers/pareto-router
"openrouter": {
"response_cache": True,
"response_cache_ttl": 300,
"min_coding_score": 0.65,
},
# AWS Bedrock provider configuration.
@@ -722,6 +781,26 @@ DEFAULT_CONFIG = {
# Empty model = use provider's default auxiliary model.
# All tasks fall back to openrouter:google/gemini-3-flash-preview if
# the configured provider is unavailable.
#
# extra_body: forwarded verbatim as request body fields on every aux call
# for that task. Use this to set provider-specific knobs (independent of
# main-agent settings). On OpenRouter you can set provider routing prefs
# and the Pareto Code coding-score floor here. Example:
#
# auxiliary:
# compression:
# provider: openrouter
# model: openrouter/pareto-code
# extra_body:
# provider: # OpenRouter provider routing
# order: [anthropic, google]
# sort: throughput # or price | latency
# plugins: # OpenRouter Pareto Code router
# - id: pareto-router
# min_coding_score: 0.5
#
# Each aux task is independent — main-agent provider_routing and
# openrouter.min_coding_score do NOT propagate to aux calls by design.
"auxiliary": {
"vision": {
"provider": "auto", # auto | openrouter | nous | codex | custom
@@ -830,6 +909,7 @@ DEFAULT_CONFIG = {
"bell_on_complete": False,
"show_reasoning": False,
"streaming": False,
"timestamps": False, # Show [HH:MM] on user and assistant labels
"final_response_markdown": "strip", # render | strip | raw
# Preserve recent classic CLI output across Ctrl+L, /redraw, and
# terminal resize full-screen clears. Disable if a terminal emulator
@@ -1113,6 +1193,45 @@ DEFAULT_CONFIG = {
},
},
# Post-edit lint behaviour. Hermes runs a syntax check after every
# write/patch and surfaces only the errors that edit introduced (see
# ``ShellFileOperations._check_lint_delta`` in tools/file_operations.py).
# The defaults below leave the legacy shell linters (``npx tsc --noEmit``,
# ``go vet``, ``rustfmt --check``, ``py_compile``) in charge — the LSP
# path is opt-in until container/SSH backends grow a way to host the
# language server inside the sandbox.
"lint": {
"lsp": {
# Master switch. When false, ``_check_lint`` skips LSP entirely
# and the in-process / shell linter table runs as before.
"enabled": False,
# Per-language server launch command. Each value is either a
# whitespace-separated string or a list. Missing languages
# fall back to ``tools.lsp_lint._DEFAULT_SERVERS``.
"servers": {
"typescript": "typescript-language-server --stdio",
"typescriptreact": "typescript-language-server --stdio",
"javascript": "typescript-language-server --stdio",
"javascriptreact": "typescript-language-server --stdio",
"rust": "rust-analyzer",
"go": "gopls",
},
# Wall-clock timeout for a single ``didOpen`` → diagnostics
# round-trip. Servers that take longer than this on cold start
# (notably rust-analyzer indexing a fresh checkout) cause the
# caller to fall through to the shell linter.
"diagnostic_timeout": 10,
# Settle window (milliseconds). Typescript-language-server
# publishes an empty diagnostics batch while the program graph
# loads, then re-publishes once the file is fully analysed.
# Re-snapshotting after this delay catches the real verdict.
"settle_ms": 400,
# Shut down a long-lived server process after this many seconds
# of inactivity. The reaper runs in-process; no daemons.
"idle_shutdown": 600,
},
},
# Honcho AI-native memory -- reads ~/.honcho/config.json as single source of truth.
# This section is only needed for hermes-specific overrides; everything else
# (apiKey, workspace, peerName, sessions, enabled) comes from the global config.
@@ -1204,6 +1323,15 @@ DEFAULT_CONFIG = {
# "Always Approve" to silence the prompt permanently; that flips
# this key to false.
"mcp_reload_confirm": True,
# When true, destructive session slash commands (/clear, /new, /reset,
# /undo) ask the user to confirm before discarding conversation state.
# Three-option prompt (Approve Once / Always Approve / Cancel) routed
# through tools.slash_confirm — native yes/no buttons on Telegram,
# Discord, and Slack; text fallback elsewhere. Users click "Always
# Approve" to silence the prompt permanently; that flips this key to
# false. TUI has its own modal overlay (HERMES_TUI_NO_CONFIRM=1 to
# opt out there).
"destructive_slash_confirm": True,
},
# Permanently allowed dangerous command patterns (added via "always" approval)
@@ -3120,7 +3248,7 @@ def warn_deprecated_cwd_env_vars(config: Optional[Dict[str, Any]] = None) -> Non
terminal_cfg = config.get("terminal", {})
config_cwd = terminal_cfg.get("cwd", ".") if isinstance(terminal_cfg, dict) else "."
# Only warn if config.yaml doesn't have an explicit path
config_has_explicit_cwd = config_cwd not in (".", "auto", "cwd", "")
config_has_explicit_cwd = config_cwd not in {".", "auto", "cwd", ""}
lines: list[str] = []
if messaging_cwd:
@@ -3180,10 +3308,10 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
if "tool_progress" not in display:
old_enabled = get_env_value("HERMES_TOOL_PROGRESS")
old_mode = get_env_value("HERMES_TOOL_PROGRESS_MODE")
if old_enabled and old_enabled.lower() in ("false", "0", "no"):
if old_enabled and old_enabled.lower() in {"false", "0", "no"}:
display["tool_progress"] = "off"
results["config_added"].append("display.tool_progress=off (from HERMES_TOOL_PROGRESS=false)")
elif old_mode and old_mode.lower() in ("new", "all"):
elif old_mode and old_mode.lower() in {"new", "all"}:
display["tool_progress"] = old_mode.lower()
results["config_added"].append(f"display.tool_progress={old_mode.lower()} (from HERMES_TOOL_PROGRESS_MODE)")
else:
@@ -3262,7 +3390,7 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
new_entry = {"api": old_url}
if old_name:
new_entry["name"] = old_name
if old_key and old_key not in ("no-key", "no-key-required", ""):
if old_key and old_key not in {"no-key", "no-key-required", ""}:
new_entry["api_key"] = old_key
# Carry over model and api_mode if present
@@ -3320,7 +3448,7 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
stt.pop("model", None)
# Place it in the appropriate provider section only if the
# user didn't already set a model there
if provider in ("local", "local_command"):
if provider in {"local", "local_command"}:
# Don't migrate an OpenAI model name into the local section
_local_models = {
"tiny.en", "tiny", "base.en", "base", "small.en", "small",
@@ -3404,7 +3532,7 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
if not aux_comp.get("model"):
aux_comp["model"] = str(s_model).strip()
migrated_keys.append(f"model={s_model}")
if s_provider and str(s_provider).strip() not in ("", "auto"):
if s_provider and str(s_provider).strip() not in {"", "auto"}:
aux = config.setdefault("auxiliary", {})
aux_comp = aux.setdefault("compression", {})
if not aux_comp.get("provider") or aux_comp.get("provider") == "auto":
@@ -3635,7 +3763,7 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
except (EOFError, KeyboardInterrupt):
answer = "n"
if answer in ("y", "yes"):
if answer in {"y", "yes"}:
print()
for name, info in new_and_unset:
if info.get("url"):
@@ -3696,7 +3824,7 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
except (EOFError, KeyboardInterrupt):
answer = "n"
if answer in ("y", "yes"):
if answer in {"y", "yes"}:
print()
config = load_config()
try:
@@ -3966,7 +4094,8 @@ def read_raw_config() -> Dict[str, Any]:
try:
with open(config_path, encoding="utf-8") as f:
data = yaml.safe_load(f) or {}
except Exception:
except Exception as e:
_warn_config_parse_failure(config_path, e)
return {}
if not isinstance(data, dict):
@@ -4016,7 +4145,7 @@ def load_config() -> Dict[str, Any]:
config = _deep_merge(config, user_config)
except Exception as e:
print(f"Warning: Failed to load config: {e}")
_warn_config_parse_failure(config_path, e)
normalized = _normalize_root_model_keys(_normalize_max_turns_config(config))
expanded = _expand_env_vars(normalized)
@@ -4777,9 +4906,9 @@ def set_config_value(key: str, value: str):
# inline navigation here silently overwrote lists with dicts.
# Convert value to appropriate type
if value.lower() in ('true', 'yes', 'on'):
if value.lower() in {'true', 'yes', 'on'}:
value = True
elif value.lower() in ('false', 'no', 'off'):
elif value.lower() in {'false', 'no', 'off'}:
value = False
elif value.isdigit():
value = int(value)
@@ -4805,6 +4934,7 @@ def set_config_value(key: str, value: str):
"terminal.vercel_runtime": "TERMINAL_VERCEL_RUNTIME",
"terminal.docker_mount_cwd_to_workspace": "TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE",
"terminal.docker_run_as_host_user": "TERMINAL_DOCKER_RUN_AS_HOST_USER",
"terminal.docker_env": "TERMINAL_DOCKER_ENV",
# terminal.cwd intentionally excluded — CLI resolves at runtime,
# gateway bridges it in gateway/run.py. Persisting to .env causes
# stale values to poison child processes.
@@ -4983,7 +5113,7 @@ def _inject_profile_env_vars() -> None:
try:
from providers import list_providers
for _pp in list_providers():
if _pp.auth_type not in ("api_key",):
if _pp.auth_type not in {"api_key",}:
continue
for _var in _pp.env_vars:
if _var in OPTIONAL_ENV_VARS:
+1 -1
View File
@@ -128,7 +128,7 @@ def _try_gh_cli_token() -> Optional[str]:
# Build a clean env so gh doesn't short-circuit on GITHUB_TOKEN / GH_TOKEN
clean_env = {k: v for k, v in os.environ.items()
if k not in ("GITHUB_TOKEN", "GH_TOKEN")}
if k not in {"GITHUB_TOKEN", "GH_TOKEN"}}
for gh_path in _gh_cli_candidates():
cmd = [gh_path, "auth", "token"]
+12 -3
View File
@@ -55,7 +55,16 @@ def _cmd_status(args) -> int:
print(f"curator: {status_line}")
print(f" runs: {runs}")
print(f" last run: {_fmt_ts(last_run)}")
print(f" last summary: {summary}")
# Summary may be multi-line when the curator archived skills (the rename
# map gets appended as `name → umbrella` lines). Indent continuation
# lines so the block reads as one logical field.
if "\n" in summary:
first, *rest = summary.splitlines()
print(f" last summary: {first}")
for line in rest:
print(f" {line}")
else:
print(f" last summary: {summary}")
_report = state.get("last_report_path")
if _report:
suffix = "" if Path(_report).exists() else " (missing)"
@@ -338,7 +347,7 @@ def _cmd_prune(args) -> int:
except (EOFError, KeyboardInterrupt):
print("\ncurator: aborted")
return 1
if reply not in ("y", "yes"):
if reply not in {"y", "yes"}:
print("curator: aborted")
return 1
@@ -440,7 +449,7 @@ def _cmd_rollback(args) -> int:
except (EOFError, KeyboardInterrupt):
print("\ncancelled")
return 1
if ans not in ("y", "yes"):
if ans not in {"y", "yes"}:
print("cancelled")
return 1
+12 -12
View File
@@ -139,16 +139,16 @@ def curses_checklist(
stdscr.refresh()
key = stdscr.getch()
if key in (curses.KEY_UP, ord("k")):
if key in {curses.KEY_UP, ord("k")}:
cursor = (cursor - 1) % len(items)
elif key in (curses.KEY_DOWN, ord("j")):
elif key in {curses.KEY_DOWN, ord("j")}:
cursor = (cursor + 1) % len(items)
elif key == ord(" "):
chosen.symmetric_difference_update({cursor})
elif key in (curses.KEY_ENTER, 10, 13):
elif key in {curses.KEY_ENTER, 10, 13}:
result_holder[0] = set(chosen)
return
elif key in (27, ord("q")):
elif key in {27, ord("q")}:
result_holder[0] = cancel_returns
return
@@ -265,14 +265,14 @@ def curses_radiolist(
stdscr.refresh()
key = stdscr.getch()
if key in (curses.KEY_UP, ord("k")):
if key in {curses.KEY_UP, ord("k")}:
cursor = (cursor - 1) % len(items)
elif key in (curses.KEY_DOWN, ord("j")):
elif key in {curses.KEY_DOWN, ord("j")}:
cursor = (cursor + 1) % len(items)
elif key in (ord(" "), curses.KEY_ENTER, 10, 13):
elif key in {ord(" "), curses.KEY_ENTER, 10, 13}:
result_holder[0] = cursor
return
elif key in (27, ord("q")):
elif key in {27, ord("q")}:
result_holder[0] = cancel_returns
return
@@ -388,14 +388,14 @@ def curses_single_select(
stdscr.refresh()
key = stdscr.getch()
if key in (curses.KEY_UP, ord("k")):
if key in {curses.KEY_UP, ord("k")}:
cursor = (cursor - 1) % len(all_items)
elif key in (curses.KEY_DOWN, ord("j")):
elif key in {curses.KEY_DOWN, ord("j")}:
cursor = (cursor + 1) % len(all_items)
elif key in (curses.KEY_ENTER, 10, 13):
elif key in {curses.KEY_ENTER, 10, 13}:
result_holder[0] = cursor
return
elif key in (27, ord("q")):
elif key in {27, ord("q")}:
result_holder[0] = None
return
+1 -1
View File
@@ -93,7 +93,7 @@ def poll_registration(device_code: str) -> dict:
"""
data = _api_post("/app/registration/poll", {"device_code": device_code})
status_raw = str(data.get("status", "")).strip().upper()
if status_raw not in ("WAITING", "SUCCESS", "FAIL", "EXPIRED"):
if status_raw not in {"WAITING", "SUCCESS", "FAIL", "EXPIRED"}:
status_raw = "UNKNOWN"
return {
"status": status_raw,
+351 -186
View File
@@ -245,15 +245,31 @@ def _build_apikey_providers_list() -> list:
}
for _label, _canonical in _name_to_canonical.items():
_known_canonical.add(_canonical)
# Providers that already have a dedicated health check above the generic
# API-key loop (with custom headers/auth). Skip their pluggable profiles
# here so the generic Bearer-auth loop doesn't run a duplicate, broken
# check (e.g. Anthropic native API requires x-api-key, not Bearer).
_dedicated_canonical = {"anthropic", "openrouter", "bedrock"}
_known_canonical.update(_dedicated_canonical)
try:
from providers import list_providers
from providers.base import ProviderProfile as _PP
try:
from hermes_cli.providers import normalize_provider as _normalize_provider
except Exception: # pragma: no cover - normalization is best-effort
def _normalize_provider(_name: str) -> str:
return (_name or "").strip().lower()
for _pp in list_providers():
if not isinstance(_pp, _PP) or _pp.auth_type != "api_key" or not _pp.env_vars:
continue
_label = _pp.display_name or _pp.name
if _label in _known_names or _pp.name in _known_canonical:
continue
_candidates = {_normalize_provider(_pp.name)}
for _alias in (_pp.aliases or ()):
_candidates.add(_normalize_provider(_alias))
if _candidates & _dedicated_canonical:
continue
# Separate API-key vars from base-URL override vars — the health-check
# loop sends the first found value as Authorization: Bearer, so a URL
# string must never be picked.
@@ -457,7 +473,7 @@ def run_doctor(args):
if (
provider
and _resolve_auth_provider is not None
and provider not in ("auto", "custom")
and provider not in {"auto", "custom"}
):
try:
runtime_provider = _resolve_auth_provider(provider)
@@ -469,7 +485,7 @@ def run_doctor(args):
if (
provider
and _resolve_provider_full is not None
and provider not in ("auto", "custom")
and provider not in {"auto", "custom"}
):
provider_def = _resolve_provider_full(provider, user_providers, custom_providers)
catalog_provider = provider_def.id if provider_def is not None else None
@@ -526,7 +542,7 @@ def run_doctor(args):
# own env-var checks elsewhere in doctor, and get_auth_status()
# returns a bare {logged_in: False} for anything it doesn't
# explicitly dispatch, which would produce false positives.
if runtime_provider and runtime_provider not in ("auto", "custom", "openrouter"):
if runtime_provider and runtime_provider not in {"auto", "custom", "openrouter"}:
try:
from hermes_cli.auth import PROVIDER_REGISTRY, get_auth_status
pconfig = PROVIDER_REGISTRY.get(runtime_provider)
@@ -713,13 +729,12 @@ def run_doctor(args):
hermes_home = HERMES_HOME
if hermes_home.exists():
check_ok(f"{_DHH} directory exists")
elif should_fix:
hermes_home.mkdir(parents=True, exist_ok=True)
check_ok(f"Created {_DHH} directory")
fixed_count += 1
else:
if should_fix:
hermes_home.mkdir(parents=True, exist_ok=True)
check_ok(f"Created {_DHH} directory")
fixed_count += 1
else:
check_warn(f"{_DHH} not found", "(will be created on first use)")
check_warn(f"{_DHH} not found", "(will be created on first use)")
# Check expected subdirectories
expected_subdirs = ["cron", "sessions", "logs", "skills", "memories"]
@@ -727,13 +742,12 @@ def run_doctor(args):
subdir_path = hermes_home / subdir_name
if subdir_path.exists():
check_ok(f"{_DHH}/{subdir_name}/ exists")
elif should_fix:
subdir_path.mkdir(parents=True, exist_ok=True)
check_ok(f"Created {_DHH}/{subdir_name}/")
fixed_count += 1
else:
if should_fix:
subdir_path.mkdir(parents=True, exist_ok=True)
check_ok(f"Created {_DHH}/{subdir_name}/")
fixed_count += 1
else:
check_warn(f"{_DHH}/{subdir_name}/ not found", "(will be created on first use)")
check_warn(f"{_DHH}/{subdir_name}/ not found", "(will be created on first use)")
# Check for SOUL.md persona file
soul_path = hermes_home / "SOUL.md"
@@ -939,14 +953,12 @@ def run_doctor(args):
else:
check_fail("docker not found", "(required for TERMINAL_ENV=docker)")
issues.append("Install Docker or change TERMINAL_ENV")
elif _safe_which("docker"):
check_ok("docker", "(optional)")
elif _is_termux():
check_info("Docker backend is not available inside Termux (expected on Android)")
else:
if _safe_which("docker"):
check_ok("docker", "(optional)")
else:
if _is_termux():
check_info("Docker backend is not available inside Termux (expected on Android)")
else:
check_warn("docker not found", "(optional)")
check_warn("docker not found", "(optional)")
# SSH (if using ssh backend)
if terminal_env == "ssh":
@@ -998,7 +1010,7 @@ def run_doctor(args):
issues.append(f"Set TERMINAL_VERCEL_RUNTIME to one of: {supported}")
disk = os.getenv("TERMINAL_CONTAINER_DISK", "51200").strip()
if disk in ("", "0", "51200"):
if disk in {"", "0", "51200"}:
check_ok("Vercel disk setting", "(uses platform default)")
else:
check_fail("Vercel custom disk unsupported", "(reset terminal.container_disk to 51200)")
@@ -1024,7 +1036,7 @@ def run_doctor(args):
for line in auth_status.detail_lines:
check_info(f"Vercel auth {line}")
persistent = os.getenv("TERMINAL_CONTAINER_PERSISTENT", "true").lower() in ("1", "true", "yes", "on")
persistent = os.getenv("TERMINAL_CONTAINER_PERSISTENT", "true").lower() in {"1", "true", "yes", "on"}
if persistent:
check_info("Vercel persistence: snapshot filesystem only; live processes do not survive sandbox recreation")
else:
@@ -1042,15 +1054,14 @@ def run_doctor(args):
elif shutil.which("agent-browser"):
check_ok("agent-browser", "(browser automation)")
agent_browser_ok = True
elif _is_termux():
check_info("agent-browser is not installed (expected in the tested Termux path)")
check_info("Install it manually later with: npm install -g agent-browser && agent-browser install")
check_info("Termux browser setup:")
for step in _termux_browser_setup_steps(node_installed=True):
check_info(step)
else:
if _is_termux():
check_info("agent-browser is not installed (expected in the tested Termux path)")
check_info("Install it manually later with: npm install -g agent-browser && agent-browser install")
check_info("Termux browser setup:")
for step in _termux_browser_setup_steps(node_installed=True):
check_info(step)
else:
check_warn("agent-browser not installed", "(run: npm install)")
check_warn("agent-browser not installed", "(run: npm install)")
# Chromium presence — the browser tools silently fail to register when
# agent-browser is found but no Playwright-managed Chromium is on disk
@@ -1101,15 +1112,14 @@ def run_doctor(args):
f"Install with: cd {PROJECT_ROOT} && "
"npx playwright install --with-deps chromium"
)
elif _is_termux():
check_info("Node.js not found (browser tools are optional in the tested Termux path)")
check_info("Install Node.js on Termux with: pkg install nodejs")
check_info("Termux browser setup:")
for step in _termux_browser_setup_steps(node_installed=False):
check_info(step)
else:
if _is_termux():
check_info("Node.js not found (browser tools are optional in the tested Termux path)")
check_info("Install Node.js on Termux with: pkg install nodejs")
check_info("Termux browser setup:")
for step in _termux_browser_setup_steps(node_installed=False):
check_info(step)
else:
check_warn("Node.js not found", "(optional, needed for browser tools)")
check_warn("Node.js not found", "(optional, needed for browser tools)")
# npm audit for all Node.js packages
_npm_bin = _safe_which("npm")
@@ -1166,44 +1176,92 @@ def run_doctor(args):
# =========================================================================
print()
print(color("◆ API Connectivity", Colors.CYAN, Colors.BOLD))
openrouter_key = os.getenv("OPENROUTER_API_KEY")
if openrouter_key:
print(" Checking OpenRouter API...", end="", flush=True)
# Refactor: every connectivity probe below is HTTP-bound and fully
# independent. Running them in series spent ~5s wall on a typical
# workstation (2s of that was boto3's IMDS lookup for AWS credentials,
# which times out unless you're actually on EC2). Threading them with
# a small executor pool collapses the section to roughly the slowest
# single probe — about 2s — without changing the output format.
#
# Each ``_probe_*`` helper is a pure function: takes its inputs,
# makes one HTTP/SDK call, returns a ``_ConnectivityResult`` carrying
# the line(s) to print and any issue strings to append. No globals,
# no shared mutable state, no printing inside the workers.
import concurrent.futures as _futures
from collections import namedtuple as _namedtuple
_ConnectivityResult = _namedtuple(
"_ConnectivityResult", ["label", "lines", "issues"]
)
_probes: list = [] # list of (label, callable) submitted in display order
def _probe_openrouter() -> _ConnectivityResult:
key = os.getenv("OPENROUTER_API_KEY")
if not key:
return _ConnectivityResult(
"OpenRouter API",
[(color("", Colors.YELLOW), "OpenRouter API",
color("(not configured)", Colors.DIM))],
[],
)
try:
import httpx
response = httpx.get(
r = httpx.get(
OPENROUTER_MODELS_URL,
headers={"Authorization": f"Bearer {openrouter_key}"},
timeout=10
headers={"Authorization": f"Bearer {key}"},
timeout=10,
)
if response.status_code == 200:
print(f"\r {color('', Colors.GREEN)} OpenRouter API ")
elif response.status_code == 401:
print(f"\r {color('', Colors.RED)} OpenRouter API {color('(invalid API key)', Colors.DIM)} ")
issues.append("Check OPENROUTER_API_KEY in .env")
elif response.status_code == 402:
print(f"\r {color('', Colors.RED)} OpenRouter API {color('(out of credits — payment required)', Colors.DIM)}")
issues.append(
"OpenRouter account has insufficient credits. "
"Fix: run 'hermes config set model.provider <provider>' to switch providers, "
"or fund your OpenRouter account at https://openrouter.ai/settings/credits"
if r.status_code == 200:
return _ConnectivityResult(
"OpenRouter API",
[(color("", Colors.GREEN), "OpenRouter API", "")],
[],
)
elif response.status_code == 429:
print(f"\r {color('', Colors.RED)} OpenRouter API {color('(rate limited)', Colors.DIM)} ")
issues.append("OpenRouter rate limit hit — consider switching to a different provider or waiting")
else:
print(f"\r {color('', Colors.RED)} OpenRouter API {color(f'(HTTP {response.status_code})', Colors.DIM)} ")
if r.status_code == 401:
return _ConnectivityResult(
"OpenRouter API",
[(color("", Colors.RED), "OpenRouter API",
color("(invalid API key)", Colors.DIM))],
["Check OPENROUTER_API_KEY in .env"],
)
if r.status_code == 402:
return _ConnectivityResult(
"OpenRouter API",
[(color("", Colors.RED), "OpenRouter API",
color("(out of credits — payment required)", Colors.DIM))],
["OpenRouter account has insufficient credits. "
"Fix: run 'hermes config set model.provider <provider>' "
"to switch providers, or fund your OpenRouter account "
"at https://openrouter.ai/settings/credits"],
)
if r.status_code == 429:
return _ConnectivityResult(
"OpenRouter API",
[(color("", Colors.RED), "OpenRouter API",
color("(rate limited)", Colors.DIM))],
["OpenRouter rate limit hit — consider switching to "
"a different provider or waiting"],
)
return _ConnectivityResult(
"OpenRouter API",
[(color("", Colors.RED), "OpenRouter API",
color(f"(HTTP {r.status_code})", Colors.DIM))],
[],
)
except Exception as e:
print(f"\r {color('', Colors.RED)} OpenRouter API {color(f'({e})', Colors.DIM)} ")
issues.append("Check network connectivity")
else:
check_warn("OpenRouter API", "(not configured)")
from hermes_cli.auth import get_anthropic_key
anthropic_key = get_anthropic_key()
if anthropic_key:
print(" Checking Anthropic API...", end="", flush=True)
return _ConnectivityResult(
"OpenRouter API",
[(color("", Colors.RED), "OpenRouter API",
color(f"({e})", Colors.DIM))],
["Check network connectivity"],
)
def _probe_anthropic() -> _ConnectivityResult:
from hermes_cli.auth import get_anthropic_key
key = get_anthropic_key()
if not key:
return _ConnectivityResult("Anthropic API", [], [])
try:
import httpx
from agent.anthropic_adapter import (
@@ -1212,140 +1270,247 @@ def run_doctor(args):
_OAUTH_ONLY_BETAS,
_CONTEXT_1M_BETA,
)
headers = {"anthropic-version": "2023-06-01"}
is_oauth = _is_oauth_token(anthropic_key)
is_oauth = _is_oauth_token(key)
if is_oauth:
headers["Authorization"] = f"Bearer {anthropic_key}"
headers["Authorization"] = f"Bearer {key}"
headers["anthropic-beta"] = ",".join(_COMMON_BETAS + _OAUTH_ONLY_BETAS)
else:
headers["x-api-key"] = anthropic_key
response = httpx.get(
headers["x-api-key"] = key
r = httpx.get(
"https://api.anthropic.com/v1/models",
headers=headers,
timeout=10
headers=headers, timeout=10,
)
# Reactive recovery: OAuth subscriptions that don't include 1M
# context reject the request with 400 "long context beta is not
# yet available for this subscription". Retry once with that
# beta stripped so the doctor check doesn't falsely report the
# Anthropic API as unreachable for those users.
# Reactive recovery: OAuth subscriptions without 1M context reject the
# request with 400 "long context beta is not yet available for this
# subscription". Retry once with that beta stripped so the doctor
# check doesn't falsely report Anthropic as unreachable.
if (
is_oauth
and response.status_code == 400
and "long context beta" in response.text.lower()
and "not yet available" in response.text.lower()
and r.status_code == 400
and "long context beta" in r.text.lower()
and "not yet available" in r.text.lower()
):
headers["anthropic-beta"] = ",".join(
[b for b in _COMMON_BETAS if b != _CONTEXT_1M_BETA] + list(_OAUTH_ONLY_BETAS)
[b for b in _COMMON_BETAS if b != _CONTEXT_1M_BETA]
+ list(_OAUTH_ONLY_BETAS)
)
response = httpx.get(
r = httpx.get(
"https://api.anthropic.com/v1/models",
headers=headers,
timeout=10,
headers=headers, timeout=10,
)
if response.status_code == 200:
print(f"\r {color('', Colors.GREEN)} Anthropic API ")
elif response.status_code == 401:
print(f"\r {color('', Colors.RED)} Anthropic API {color('(invalid API key)', Colors.DIM)} ")
else:
msg = "(couldn't verify)"
print(f"\r {color('', Colors.YELLOW)} Anthropic API {color(msg, Colors.DIM)} ")
if r.status_code == 200:
return _ConnectivityResult(
"Anthropic API",
[(color("", Colors.GREEN), "Anthropic API", "")],
[],
)
if r.status_code == 401:
return _ConnectivityResult(
"Anthropic API",
[(color("", Colors.RED), "Anthropic API",
color("(invalid API key)", Colors.DIM))],
[],
)
return _ConnectivityResult(
"Anthropic API",
[(color("", Colors.YELLOW), "Anthropic API",
color("(couldn't verify)", Colors.DIM))],
[],
)
except Exception as e:
print(f"\r {color('', Colors.YELLOW)} Anthropic API {color(f'({e})', Colors.DIM)} ")
return _ConnectivityResult(
"Anthropic API",
[(color("", Colors.YELLOW), "Anthropic API",
color(f"({e})", Colors.DIM))],
[],
)
def _probe_apikey_provider(pname, env_vars, default_url, base_env,
supports_health_check) -> _ConnectivityResult:
key = ""
for ev in env_vars:
key = os.getenv(ev, "")
if key:
break
if not key:
return _ConnectivityResult(pname, [], [])
label = pname.ljust(20)
if not supports_health_check:
return _ConnectivityResult(
pname,
[(color("", Colors.GREEN), label,
color("(key configured)", Colors.DIM))],
[],
)
try:
import httpx
base = os.getenv(base_env, "") if base_env else ""
# Auto-detect Kimi Code keys (sk-kimi-) → api.kimi.com/coding/v1
# (OpenAI-compat surface, which exposes /models for health check).
if not base and key.startswith("sk-kimi-"):
base = "https://api.kimi.com/coding/v1"
# Anthropic-compat endpoints (/anthropic, api.kimi.com/coding
# with no /v1) don't support /models. Rewrite to OpenAI-compat
# /v1 surface for health checks.
if base and base.rstrip("/").endswith("/anthropic"):
from agent.auxiliary_client import _to_openai_base_url
base = _to_openai_base_url(base)
if base_url_host_matches(base, "api.kimi.com") and base.rstrip("/").endswith("/coding"):
base = base.rstrip("/") + "/v1"
url = (base.rstrip("/") + "/models") if base else default_url
headers = {
"Authorization": f"Bearer {key}",
"User-Agent": _HERMES_USER_AGENT,
}
if base_url_host_matches(base, "api.kimi.com"):
headers["User-Agent"] = "claude-code/0.1.0"
r = httpx.get(url, headers=headers, timeout=10)
if (
pname == "Alibaba/DashScope"
and not base
and r.status_code == 401
):
r = httpx.get(
"https://dashscope.aliyuncs.com/compatible-mode/v1/models",
headers=headers, timeout=10,
)
if r.status_code == 200:
return _ConnectivityResult(
pname,
[(color("", Colors.GREEN), label, "")],
[],
)
if r.status_code == 401:
return _ConnectivityResult(
pname,
[(color("", Colors.RED), label,
color("(invalid API key)", Colors.DIM))],
[f"Check {env_vars[0]} in .env"],
)
return _ConnectivityResult(
pname,
[(color("", Colors.YELLOW), label,
color(f"(HTTP {r.status_code})", Colors.DIM))],
[],
)
except Exception as e:
return _ConnectivityResult(
pname,
[(color("", Colors.YELLOW), label,
color(f"({e})", Colors.DIM))],
[],
)
def _probe_bedrock() -> _ConnectivityResult:
try:
from agent.bedrock_adapter import (
has_aws_credentials,
resolve_aws_auth_env_var,
resolve_bedrock_region,
)
except ImportError:
return _ConnectivityResult("AWS Bedrock", [], [])
if not has_aws_credentials():
return _ConnectivityResult("AWS Bedrock", [], [])
auth_var = resolve_aws_auth_env_var()
region = resolve_bedrock_region()
label = "AWS Bedrock".ljust(20)
try:
import boto3
from botocore.config import Config as _BotoConfig
# Trim retries on the actual Bedrock API call so a transient
# failure doesn't pad the doctor run by 30+ seconds.
cfg = _BotoConfig(
connect_timeout=5,
read_timeout=10,
retries={"max_attempts": 1},
)
client = boto3.client("bedrock", region_name=region, config=cfg)
resp = client.list_foundation_models()
n = len(resp.get("modelSummaries", []))
return _ConnectivityResult(
"AWS Bedrock",
[(color("", Colors.GREEN), label,
color(f"({auth_var}, {region}, {n} models)", Colors.DIM))],
[],
)
except ImportError:
return _ConnectivityResult(
"AWS Bedrock",
[(color("", Colors.YELLOW), label,
color(f"(boto3 not installed — {sys.executable} -m pip install boto3)",
Colors.DIM))],
[f"Install boto3 for Bedrock: {sys.executable} -m pip install boto3"],
)
except Exception as e:
err_name = type(e).__name__
return _ConnectivityResult(
"AWS Bedrock",
[(color("", Colors.YELLOW), label,
color(f"({err_name}: {e})", Colors.DIM))],
[f"AWS Bedrock: {err_name} — check IAM permissions for "
f"bedrock:ListFoundationModels"],
)
# Build the probe submission list in display order
_probes.append(("OpenRouter API", _probe_openrouter))
_probes.append(("Anthropic API", _probe_anthropic))
# -- API-key providers --
# Tuple: (name, env_vars, default_url, base_env, supports_models_endpoint)
# If supports_models_endpoint is False, we skip the health check and just show "configured"
# Cached at module level after first build — profiles auto-extend it.
global _APIKEY_PROVIDERS_CACHE
if _APIKEY_PROVIDERS_CACHE is None:
_APIKEY_PROVIDERS_CACHE = _build_apikey_providers_list()
_apikey_providers = _APIKEY_PROVIDERS_CACHE
for _pname, _env_vars, _default_url, _base_env, _supports_health_check in _apikey_providers:
_key = ""
for _ev in _env_vars:
_key = os.getenv(_ev, "")
if _key:
break
if _key:
_label = _pname.ljust(20)
# Some providers (like MiniMax) don't support /models endpoint
if not _supports_health_check:
print(f" {color('', Colors.GREEN)} {_label} {color('(key configured)', Colors.DIM)}")
continue
print(f" Checking {_pname} API...", end="", flush=True)
try:
import httpx
_base = os.getenv(_base_env, "") if _base_env else ""
# Auto-detect Kimi Code keys (sk-kimi-) → api.kimi.com/coding/v1
# (OpenAI-compat surface, which exposes /models for health check).
if not _base and _key.startswith("sk-kimi-"):
_base = "https://api.kimi.com/coding/v1"
# Anthropic-compat endpoints (/anthropic, api.kimi.com/coding
# with no /v1) don't support /models. Rewrite to the OpenAI-compat
# /v1 surface for health checks.
if _base and _base.rstrip("/").endswith("/anthropic"):
from agent.auxiliary_client import _to_openai_base_url
_base = _to_openai_base_url(_base)
if base_url_host_matches(_base, "api.kimi.com") and _base.rstrip("/").endswith("/coding"):
_base = _base.rstrip("/") + "/v1"
_url = (_base.rstrip("/") + "/models") if _base else _default_url
_headers = {
"Authorization": f"Bearer {_key}",
"User-Agent": _HERMES_USER_AGENT,
}
if base_url_host_matches(_base, "api.kimi.com"):
_headers["User-Agent"] = "claude-code/0.1.0"
_resp = httpx.get(
_url,
headers=_headers,
timeout=10,
)
if (
_pname == "Alibaba/DashScope"
and not _base
and _resp.status_code == 401
):
_resp = httpx.get(
"https://dashscope.aliyuncs.com/compatible-mode/v1/models",
headers=_headers,
timeout=10,
)
if _resp.status_code == 200:
print(f"\r {color('', Colors.GREEN)} {_label} ")
elif _resp.status_code == 401:
print(f"\r {color('', Colors.RED)} {_label} {color('(invalid API key)', Colors.DIM)} ")
issues.append(f"Check {_env_vars[0]} in .env")
else:
print(f"\r {color('', Colors.YELLOW)} {_label} {color(f'(HTTP {_resp.status_code})', Colors.DIM)} ")
except Exception as _e:
print(f"\r {color('', Colors.YELLOW)} {_label} {color(f'({_e})', Colors.DIM)} ")
for _entry in _APIKEY_PROVIDERS_CACHE:
_pname, _env_vars, _default_url, _base_env, _supports = _entry
# Capture loop vars by binding default args — without this, all closures
# would share the final iteration's values and every probe would hit
# the last provider's URL.
_probes.append((_pname, lambda p=_pname, e=_env_vars, u=_default_url,
b=_base_env, s=_supports:
_probe_apikey_provider(p, e, u, b, s)))
# -- AWS Bedrock --
# Bedrock uses the AWS SDK credential chain, not API keys.
_probes.append(("AWS Bedrock", _probe_bedrock))
# Print a single status line so users see something happening, then
# fan out. ``\r`` clears it once the first real result line lands.
print(f" {color(f'Running {len(_probes)} connectivity checks in parallel…', Colors.DIM)}",
end="", flush=True)
# Disable boto3's EC2 instance-metadata-service probe for the duration
# of the parallel block. boto's default credential chain tries
# 169.254.169.254 with a multi-second timeout when we're not on EC2,
# which dominated the section's wall time before this fix
# (~2s on a developer laptop, even with the rest parallelized).
# Set on the parent thread before submitting work so the env-var
# mutation never races with another worker. has_aws_credentials() in
# the bedrock probe already gates on real env-var creds, so IMDS is
# never the legitimate source for `hermes doctor`.
_imds_prev = os.environ.get("AWS_EC2_METADATA_DISABLED")
os.environ["AWS_EC2_METADATA_DISABLED"] = "true"
try:
from agent.bedrock_adapter import has_aws_credentials, resolve_aws_auth_env_var, resolve_bedrock_region
if has_aws_credentials():
_auth_var = resolve_aws_auth_env_var()
_region = resolve_bedrock_region()
_label = "AWS Bedrock".ljust(20)
print(f" Checking AWS Bedrock...", end="", flush=True)
try:
import boto3
_br_client = boto3.client("bedrock", region_name=_region)
_br_resp = _br_client.list_foundation_models()
_model_count = len(_br_resp.get("modelSummaries", []))
print(f"\r {color('', Colors.GREEN)} {_label} {color(f'({_auth_var}, {_region}, {_model_count} models)', Colors.DIM)} ")
except ImportError:
print(f"\r {color('', Colors.YELLOW)} {_label} {color(f'(boto3 not installed — {sys.executable} -m pip install boto3)', Colors.DIM)} ")
issues.append(f"Install boto3 for Bedrock: {sys.executable} -m pip install boto3")
except Exception as _e:
_err_name = type(_e).__name__
print(f"\r {color('', Colors.YELLOW)} {_label} {color(f'({_err_name}: {_e})', Colors.DIM)} ")
issues.append(f"AWS Bedrock: {_err_name} — check IAM permissions for bedrock:ListFoundationModels")
except ImportError:
pass # bedrock_adapter not available — skip silently
# 8 workers is plenty — each probe is a single HTTP call plus a TLS
# handshake. More than that wastes thread-startup cost and risks
# noisy output if anything ever printed from inside a worker.
with _futures.ThreadPoolExecutor(max_workers=8,
thread_name_prefix="doctor-probe") as _ex:
_futures_in_order = [_ex.submit(_fn) for _, _fn in _probes]
_results = [_f.result() for _f in _futures_in_order]
finally:
if _imds_prev is None:
os.environ.pop("AWS_EC2_METADATA_DISABLED", None)
else:
os.environ["AWS_EC2_METADATA_DISABLED"] = _imds_prev
# Clear the "Running …" line and print all results in submission order.
print("\r" + " " * 70 + "\r", end="")
for _r in _results:
for _glyph, _label, _detail in _r.lines:
if _detail:
print(f" {_glyph} {_label} {_detail}")
else:
print(f" {_glyph} {_label}")
for _issue in _r.issues:
issues.append(_issue)
# =========================================================================
# Check: Submodules
+3 -3
View File
@@ -307,7 +307,7 @@ def cmd_fallback_clear(args) -> None: # noqa: ARG001
print()
print(" Cancelled.")
return
if resp not in ("y", "yes"):
if resp not in {"y", "yes"}:
print(" Cancelled — no change.")
return
@@ -347,11 +347,11 @@ def _numbered_pick(question: str, choices: List[str]) -> Optional[int]:
def cmd_fallback(args) -> None:
"""Top-level dispatcher for ``hermes fallback [subcommand]``."""
sub = getattr(args, "fallback_command", None)
if sub in (None, "", "list", "ls"):
if sub in {None, "", "list", "ls"}:
cmd_fallback_list(args)
elif sub == "add":
cmd_fallback_add(args)
elif sub in ("remove", "rm"):
elif sub in {"remove", "rm"}:
cmd_fallback_remove(args)
elif sub == "clear":
cmd_fallback_clear(args)
+187 -70
View File
@@ -394,42 +394,68 @@ def _scan_gateway_pids(exclude_pids: set[int], all_profiles: bool = False) -> li
pass
current_cmd = ""
else:
result = subprocess.run(
["ps", "-A", "eww", "-o", "pid=,command="],
capture_output=True,
text=True,
timeout=10,
)
if result.returncode != 0:
return []
for line in result.stdout.split("\n"):
stripped = line.strip()
if not stripped or "grep" in stripped:
continue
# Try /proc first (works in Docker without procps installed),
# fall back to ps -A eww.
_found_via_proc = False
if os.path.isdir("/proc"):
try:
my_pid = os.getpid()
for entry in os.listdir("/proc"):
if not entry.isdigit():
continue
pid = int(entry)
if pid == my_pid or pid in exclude_pids:
continue
try:
cmdline = open(f"/proc/{pid}/cmdline", "rb").read().decode("utf-8", errors="replace")
cmdline = cmdline.replace("\x00", " ")
if any(p in cmdline for p in patterns) and (
all_profiles or _matches_current_profile(cmdline)
):
_append_unique_pid(pids, pid, exclude_pids)
except (OSError, PermissionError):
continue
_found_via_proc = True
except Exception:
pass
pid = None
command = ""
if not _found_via_proc:
result = subprocess.run(
["ps", "-A", "eww", "-o", "pid=,command="],
capture_output=True,
text=True,
timeout=10,
)
if result.returncode != 0:
return []
for line in result.stdout.split("\n"):
stripped = line.strip()
if not stripped or "grep" in stripped:
continue
parts = stripped.split(None, 1)
if len(parts) == 2:
try:
pid = int(parts[0])
command = parts[1]
except ValueError:
pid = None
pid = None
command = ""
if pid is None:
aux_parts = stripped.split()
if len(aux_parts) > 10 and aux_parts[1].isdigit():
pid = int(aux_parts[1])
command = " ".join(aux_parts[10:])
parts = stripped.split(None, 1)
if len(parts) == 2:
try:
pid = int(parts[0])
command = parts[1]
except ValueError:
pid = None
if pid is None:
continue
if any(pattern in command for pattern in patterns) and (
all_profiles or _matches_current_profile(command)
):
_append_unique_pid(pids, pid, exclude_pids)
if pid is None:
aux_parts = stripped.split()
if len(aux_parts) > 10 and aux_parts[1].isdigit():
pid = int(aux_parts[1])
command = " ".join(aux_parts[10:])
if pid is None:
continue
if any(pattern in command for pattern in patterns) and (
all_profiles or _matches_current_profile(command)
):
_append_unique_pid(pids, pid, exclude_pids)
except (OSError, subprocess.TimeoutExpired):
return []
@@ -635,6 +661,66 @@ def _probe_systemd_service_running(system: bool = False) -> tuple[bool, bool]:
return selected_system, result.stdout.strip() == "active"
def _read_systemd_unit_environment(system: bool = False) -> dict[str, str]:
"""Parse the gateway unit's ``Environment=`` directives.
``systemctl show -p Environment`` returns a single line of
space-separated ``KEY=VALUE`` pairs; values are not quoted in the output
even when the unit file quoted them. We split on whitespace and ``=``.
"""
selected_system = _select_systemd_scope(system)
try:
result = _run_systemctl(
[
"show",
get_service_name(),
"--no-pager",
"--property",
"Environment",
],
system=selected_system,
capture_output=True,
text=True,
timeout=10,
)
except (RuntimeError, subprocess.TimeoutExpired, OSError):
return {}
if result.returncode != 0:
return {}
parsed: dict[str, str] = {}
for line in result.stdout.splitlines():
if not line.startswith("Environment="):
continue
body = line[len("Environment="):].strip()
for token in body.split():
if "=" not in token:
continue
key, value = token.split("=", 1)
parsed[key] = value
return parsed
def _sync_hermes_home_from_systemd_unit(system: bool) -> None:
"""When acting on a system-scope unit, adopt its ``HERMES_HOME``.
Under ``sudo``, ``HERMES_HOME`` is stripped and ``HOME=/root``, so
:func:`get_hermes_home` falls back to ``/root/.hermes`` the wrong
profile. The unit file pins ``HERMES_HOME`` for the actual gateway
process, so we mirror that into our own environment to make
``read_runtime_status`` / ``get_running_pid`` read the correct files.
"""
if not system:
return
env = _read_systemd_unit_environment(system=True)
unit_home = env.get("HERMES_HOME", "").strip()
if not unit_home:
return
current = os.environ.get("HERMES_HOME", "").strip()
if current == unit_home:
return
os.environ["HERMES_HOME"] = unit_home
def _read_systemd_unit_properties(
system: bool = False,
properties: tuple[str, ...] = (
@@ -1108,7 +1194,7 @@ def _systemd_operational(system: bool = False) -> bool:
)
# "running", "degraded", "starting" all mean systemd is PID 1
status = result.stdout.strip().lower()
return status in ("running", "degraded", "starting", "initializing")
return status in {"running", "degraded", "starting", "initializing"}
except (RuntimeError, subprocess.TimeoutExpired, OSError):
return False
@@ -1141,6 +1227,27 @@ def is_windows() -> bool:
return sys.platform == 'win32'
def _windows_gateway_should_absorb_console_controls() -> bool:
"""Return True for detached Windows gateway runs that should ignore Ctrl+C.
Foreground ``hermes gateway run`` must remain interruptible from
PowerShell/CMD. Detached service-style launches opt in via
``HERMES_GATEWAY_DETACHED=1``; older wrappers without the env marker are
treated as detached when no interactive stdin is attached.
"""
if not is_windows():
return False
detached = os.getenv("HERMES_GATEWAY_DETACHED", "").strip().lower()
if detached in {"1", "true", "yes", "on"}:
return True
try:
return not bool(sys.stdin and sys.stdin.isatty())
except (ValueError, OSError):
return True
# =============================================================================
# Service Configuration
# =============================================================================
@@ -2149,7 +2256,30 @@ def refresh_systemd_unit_if_needed(system: bool = False) -> bool:
return False
expected_user = _read_systemd_user_from_unit(unit_path) if system else None
unit_path.write_text(generate_systemd_unit(system=system, run_as_user=expected_user), encoding="utf-8")
new_unit = generate_systemd_unit(system=system, run_as_user=expected_user)
# ── Test-environment safety belt ─────────────────────────────────────
# The user-scope unit path resolves under ``Path.home()``, which is NOT
# sandboxed by the test conftest (only HERMES_HOME is). If a test
# exercises ``run_gateway()`` with a pytest-tmp HERMES_HOME, the freshly
# generated unit bakes that ``/tmp/pytest-of-.../hermes_test`` path into
# ``Environment="HERMES_HOME=..."``. Writing that to the developer's
# real user systemd unit file silently breaks their gateway on the next
# reboot (systemd loads the polluted env, the gateway looks at an empty
# tmp dir, and Telegram/Discord/etc. all show as "not configured").
# Refuse to write when the generated unit references a pytest tmpdir.
# Detection sniffs the unit body — tests that legitimately exercise the
# refresh flow patch ``generate_systemd_unit`` to return synthetic
# content (``"new unit\n"``) which doesn't contain these markers and
# still works.
if not system and (
"/pytest-of-" in new_unit
or "/hermes_test\"" in new_unit
or "/hermes_test/" in new_unit
):
return False
unit_path.write_text(new_unit, encoding="utf-8")
_run_systemctl(["daemon-reload"], system=system, check=True, timeout=30)
print(f"↻ Updated gateway {_service_scope_label(system)} service definition to match the current Hermes install")
return True
@@ -2380,6 +2510,7 @@ def systemd_stop(system: bool = False):
if system:
_require_root_for_system_service("stop")
_require_service_installed("stop", system=system)
_sync_hermes_home_from_systemd_unit(system=system)
try:
from gateway.status import get_running_pid, write_planned_stop_marker
pid = get_running_pid(cleanup_stale=False)
@@ -2408,6 +2539,7 @@ def systemd_restart(system: bool = False):
_preflight_user_systemd()
_require_service_installed("restart", system=system)
refresh_systemd_unit_if_needed(system=system)
_sync_hermes_home_from_systemd_unit(system=system)
from gateway.status import get_running_pid
pid = get_running_pid() or _systemd_main_pid(system=system)
@@ -2503,6 +2635,8 @@ def systemd_status(deep: bool = False, system: bool = False, full: bool = False)
print(f" Run: {'sudo ' if system else ''}hermes gateway install{scope_flag}")
return
_sync_hermes_home_from_systemd_unit(system=system)
if has_conflicting_systemd_units():
print_systemd_scope_conflict_warning()
print()
@@ -2781,7 +2915,7 @@ def launchd_start():
try:
subprocess.run(["launchctl", "kickstart", f"{_launchd_domain()}/{label}"], check=True, timeout=30)
except subprocess.CalledProcessError as e:
if e.returncode not in (3, 113):
if e.returncode not in {3, 113}:
raise
print("↻ launchd job was unloaded; reloading service definition")
subprocess.run(["launchctl", "bootstrap", _launchd_domain(), str(plist_path)], check=True, timeout=30)
@@ -2805,7 +2939,7 @@ def launchd_stop():
try:
subprocess.run(["launchctl", "bootout", target], check=True, timeout=90)
except subprocess.CalledProcessError as e:
if e.returncode in (3, 113):
if e.returncode in {3, 113}:
pass # Already unloaded — nothing to stop.
else:
raise
@@ -2877,7 +3011,7 @@ def launchd_restart():
subprocess.run(["launchctl", "kickstart", "-k", target], check=True, timeout=90)
print("✓ Service restarted")
except subprocess.CalledProcessError as e:
if e.returncode not in (3, 113):
if e.returncode not in {3, 113}:
raise
# Job not loaded — bootstrap and start fresh
print("↻ launchd job was unloaded; reloading")
@@ -2978,34 +3112,17 @@ def run_gateway(verbose: int = 0, quiet: bool = False, replace: bool = False):
_guard_official_docker_root_gateway()
sys.path.insert(0, str(PROJECT_ROOT))
# On Windows, when the gateway is launched as a detached background
# process (via ``hermes gateway install`` → Scheduled Task / Startup
# folder / direct pythonw.exe spawn) there is no console attached. In
# that case Windows can still deliver CTRL_C_EVENT / CTRL_BREAK_EVENT
# to the process group under some circumstances (e.g. when *another*
# process in the same group sends one), which Python 3.11 translates
# into KeyboardInterrupt inside asyncio.run(). The outer handler below
# catches that and exits cleanly — silently killing the gateway. On
# detached boots we must absorb those spurious signals so the gateway
# stays alive; real user Ctrl+C still comes through prompt_toolkit /
# the asyncio signal handler when running in a real console.
#
# IMPORTANT lesson (May 2026): we originally gated this on "stdin is
# NOT a TTY" assuming only detached pythonw runs would be vulnerable.
# Wrong. When the user runs `hermes gateway start` from a PowerShell
# console, the gateway inherits that console and stdin IS a TTY —
# but it's STILL vulnerable to CTRL_C_EVENT broadcast by any sibling
# `hermes` invocation (like `hermes gateway status` 30 seconds later)
# because Windows routes console events to all processes sharing the
# console. Every hermes CLI process after that sibling fires is a
# potential drive-by killer. So on Windows, for `gateway run`
# specifically (never interactive by design), always install the
# SIGINT absorber regardless of TTY state.
# Detached Windows gateway runs must ignore console-control broadcasts
# from sibling CLI processes, but foreground `hermes gateway run` still
# needs to obey the banner's "Press Ctrl+C to stop" contract.
# Service-style launchers set HERMES_GATEWAY_DETACHED=1; older wrappers
# without the marker are handled by the non-TTY fallback.
try:
_stdin_is_tty = bool(sys.stdin and sys.stdin.isatty())
except (ValueError, OSError):
_stdin_is_tty = False
if is_windows():
_absorb_windows_console_controls = _windows_gateway_should_absorb_console_controls()
if _absorb_windows_console_controls:
try:
signal.signal(signal.SIGINT, signal.SIG_IGN)
if hasattr(signal, "SIGBREAK"):
@@ -3103,6 +3220,7 @@ def run_gateway(verbose: int = 0, quiet: bool = False, replace: bool = False):
replace=replace,
argv=sys.argv,
stdin_is_tty=_stdin_is_tty,
absorb_windows_console_controls=_absorb_windows_console_controls,
)
def _atexit_hook() -> None:
@@ -3631,7 +3749,7 @@ def _platform_status(platform: dict) -> str:
password = get_env_value("MATRIX_PASSWORD")
if (val or password) and homeserver:
e2ee = get_env_value("MATRIX_ENCRYPTION")
suffix = " + E2EE" if e2ee and e2ee.lower() in ("true", "1", "yes") else ""
suffix = " + E2EE" if e2ee and e2ee.lower() in {"true", "1", "yes"} else ""
return f"configured{suffix}"
if val or password or homeserver:
return "partially configured"
@@ -4829,15 +4947,14 @@ def gateway_setup():
print_info(" Run in foreground: hermes gateway run")
print_info(" For persistence: tmux new -s hermes 'hermes gateway run'")
print_info(" To enable systemd: add systemd=true to /etc/wsl.conf, then 'wsl --shutdown'")
elif is_termux():
from hermes_constants import display_hermes_home as _dhh
print_info(" Termux does not use systemd/launchd services.")
print_info(" Run in foreground: hermes gateway run")
print_info(f" Or start it manually in the background (best effort): nohup hermes gateway run >{_dhh()}/logs/gateway.log 2>&1 &")
else:
if is_termux():
from hermes_constants import display_hermes_home as _dhh
print_info(" Termux does not use systemd/launchd services.")
print_info(" Run in foreground: hermes gateway run")
print_info(f" Or start it manually in the background (best effort): nohup hermes gateway run >{_dhh()}/logs/gateway.log 2>&1 &")
else:
print_info(" Service install not supported on this platform.")
print_info(" Run in foreground: hermes gateway run")
print_info(" Service install not supported on this platform.")
print_info(" Run in foreground: hermes gateway run")
else:
print()
print_info("No platforms configured. Run 'hermes gateway setup' when ready.")
+2
View File
@@ -216,6 +216,7 @@ def _build_gateway_cmd_script(
lines.append(f"cd /d {_quote_cmd_script_arg(working_dir)}")
lines.append(f'set "HERMES_HOME={hermes_home}"')
lines.append('set "PYTHONIOENCODING=utf-8"')
lines.append('set "HERMES_GATEWAY_DETACHED=1"')
# VIRTUAL_ENV lets the gateway's own python detection find the venv
# if someone imports hermes_constants-based logic during startup.
venv_dir = str(Path(python_path).resolve().parent.parent)
@@ -371,6 +372,7 @@ def _build_gateway_argv() -> tuple[list[str], str, dict[str, str]]:
env_overlay = {
"HERMES_HOME": hermes_home,
"PYTHONIOENCODING": "utf-8",
"HERMES_GATEWAY_DETACHED": "1",
"VIRTUAL_ENV": str(Path(python_exe).resolve().parent.parent),
}
return argv, working_dir, env_overlay
+3 -3
View File
@@ -270,7 +270,7 @@ def _parse_judge_response(raw: str) -> Tuple[bool, str, bool]:
done_val = data.get("done")
if isinstance(done_val, str):
done = done_val.strip().lower() in ("true", "yes", "1", "done")
done = done_val.strip().lower() in {"true", "yes", "1", "done"}
else:
done = bool(done_val)
reason = str(data.get("reason") or "").strip()
@@ -389,11 +389,11 @@ class GoalManager:
return self._state is not None and self._state.status == "active"
def has_goal(self) -> bool:
return self._state is not None and self._state.status in ("active", "paused")
return self._state is not None and self._state.status in {"active", "paused"}
def status_line(self) -> str:
s = self._state
if s is None or s.status in ("cleared",):
if s is None or s.status in {"cleared",}:
return "No active goal. Set one with /goal <text>."
turns = f"{s.turns_used}/{s.max_turns} turns"
if s.status == "active":
+3 -3
View File
@@ -32,11 +32,11 @@ def hooks_command(args) -> None:
print("Run 'hermes hooks --help' for details.")
return
if sub in ("list", "ls"):
if sub in {"list", "ls"}:
_cmd_list(args)
elif sub == "test":
_cmd_test(args)
elif sub in ("revoke", "remove", "rm"):
elif sub in {"revoke", "remove", "rm"}:
_cmd_revoke(args)
elif sub == "doctor":
_cmd_doctor(args)
@@ -220,7 +220,7 @@ def _cmd_test(args) -> None:
if getattr(args, "for_tool", None):
specs = [
s for s in specs
if s.event not in ("pre_tool_call", "post_tool_call")
if s.event not in {"pre_tool_call", "post_tool_call"}
or s.matches_tool(args.for_tool)
]
+104 -36
View File
@@ -82,7 +82,7 @@ def _parse_workspace_flag(value: str) -> tuple[str, Optional[str]]:
if not value:
return ("scratch", None)
v = value.strip()
if v in ("scratch", "worktree"):
if v in {"scratch", "worktree"}:
return (v, None)
if v.startswith("dir:"):
path = v[len("dir:"):].strip()
@@ -510,6 +510,10 @@ def build_parser(parent_subparsers: argparse._SubParsersAction) -> argparse.Argu
p_nsub.add_argument("--chat-id", required=True)
p_nsub.add_argument("--thread-id", default=None)
p_nsub.add_argument("--user-id", default=None)
p_nsub.add_argument(
"--notifier-profile", default=None,
help="Profile gateway that owns/delivers this subscription (default: active profile)",
)
p_nlist = sub.add_parser(
"notify-list",
@@ -648,6 +652,16 @@ def kanban_command(args: argparse.Namespace) -> int:
# keeps the patch small and inherits the exact same resolution the
# dispatcher uses for workers — consistency is a feature here.
board_override = getattr(args, "board", None)
prev_board_env = os.environ.get("HERMES_KANBAN_BOARD")
restore_board_env = False
def _restore_board_env() -> None:
if not restore_board_env:
return
if prev_board_env is None:
os.environ.pop("HERMES_KANBAN_BOARD", None)
else:
os.environ["HERMES_KANBAN_BOARD"] = prev_board_env
if board_override:
try:
normed = kb._normalize_board_slug(board_override)
@@ -667,12 +681,16 @@ def kanban_command(args: argparse.Namespace) -> int:
)
return 1
os.environ["HERMES_KANBAN_BOARD"] = normed
restore_board_env = True
# Boards management doesn't touch the DB at all — dispatch early so
# fresh installs that haven't initialized any DB can still use
# `hermes kanban boards create …`.
if action == "boards":
return _dispatch_boards(args)
try:
return _dispatch_boards(args)
finally:
_restore_board_env()
# Auto-initialize the DB before dispatching any subcommand. init_db
# is idempotent, so running it every invocation is cheap (one
@@ -685,6 +703,7 @@ def kanban_command(args: argparse.Namespace) -> int:
kb.init_db()
except Exception as exc:
print(f"kanban: could not initialize database: {exc}", file=sys.stderr)
_restore_board_env()
return 1
handlers = {
@@ -726,12 +745,16 @@ def kanban_command(args: argparse.Namespace) -> int:
handler = handlers.get(action)
if not handler:
print(f"kanban: unknown action {action!r}", file=sys.stderr)
_restore_board_env()
return 2
try:
return int(handler(args) or 0)
except (ValueError, RuntimeError) as exc:
print(f"kanban: {exc}", file=sys.stderr)
_restore_board_env()
return 1
finally:
_restore_board_env()
# ---------------------------------------------------------------------------
@@ -765,15 +788,15 @@ def _dispatch_boards(args: argparse.Namespace) -> int:
can still run ``boards create`` / ``boards list``.
"""
sub = getattr(args, "boards_action", None) or "list"
if sub in ("list", "ls"):
if sub in {"list", "ls"}:
return _cmd_boards_list(args)
if sub in ("create", "new"):
if sub in {"create", "new"}:
return _cmd_boards_create(args)
if sub in ("rm", "remove", "delete"):
if sub in {"rm", "remove", "delete"}:
return _cmd_boards_rm(args)
if sub in ("switch", "use"):
if sub in {"switch", "use"}:
return _cmd_boards_switch(args)
if sub in ("show", "current"):
if sub in {"show", "current"}:
return _cmd_boards_show(args)
if sub == "rename":
return _cmd_boards_rename(args)
@@ -1278,7 +1301,7 @@ def _cmd_show(args: argparse.Namespace) -> int:
def _cmd_assign(args: argparse.Namespace) -> int:
profile = None if args.profile.lower() in ("none", "-", "null") else args.profile
profile = None if args.profile.lower() in {"none", "-", "null"} else args.profile
with kb.connect() as conn:
ok = kb.assign_task(conn, args.task_id, profile)
if not ok:
@@ -1305,7 +1328,7 @@ def _cmd_reclaim(args: argparse.Namespace) -> int:
def _cmd_reassign(args: argparse.Namespace) -> int:
profile = None if args.profile.lower() in ("none", "-", "null") else args.profile
profile = None if args.profile.lower() in {"none", "-", "null"} else args.profile
with kb.connect() as conn:
ok = kb.reassign_task(
conn, args.task_id, profile,
@@ -1921,6 +1944,7 @@ def _cmd_notify_subscribe(args: argparse.Namespace) -> int:
conn, task_id=args.task_id,
platform=args.platform, chat_id=args.chat_id,
thread_id=args.thread_id, user_id=args.user_id,
notifier_profile=args.notifier_profile or _profile_author(),
)
print(f"Subscribed {args.platform}:{args.chat_id}"
+ (f":{args.thread_id}" if args.thread_id else "")
@@ -1939,8 +1963,9 @@ def _cmd_notify_list(args: argparse.Namespace) -> int:
return 0
for s in subs:
thr = f":{s['thread_id']}" if s.get("thread_id") else ""
owner = f" owner={s['notifier_profile']}" if s.get("notifier_profile") else ""
print(f" {s['task_id']:10s} {s['platform']}:{s['chat_id']}{thr}"
f" (since event {s['last_event_id']})")
f" (since event {s['last_event_id']}){owner}")
return 0
@@ -2071,19 +2096,18 @@ def _cmd_specify(args: argparse.Namespace) -> int:
"reason": outcome.reason,
"new_title": outcome.new_title,
}))
elif outcome.ok:
title_suffix = (
f" — retitled: {outcome.new_title!r}"
if outcome.new_title
else ""
)
print(f"Specified {outcome.task_id} → todo{title_suffix}")
else:
if outcome.ok:
title_suffix = (
f" — retitled: {outcome.new_title!r}"
if outcome.new_title
else ""
)
print(f"Specified {outcome.task_id} → todo{title_suffix}")
else:
print(
f"kanban: specify {outcome.task_id}: {outcome.reason}",
file=sys.stderr,
)
print(
f"kanban: specify {outcome.task_id}: {outcome.reason}",
file=sys.stderr,
)
if not all_flag:
return 0 if ok_count == 1 else 1
# --all: succeed if at least one promotion landed; exit 1 only when
@@ -2136,6 +2160,29 @@ def _cmd_gc(args: argparse.Namespace) -> int:
# Slash-command entry point (used by /kanban from CLI and gateway)
# ---------------------------------------------------------------------------
_SLASH_KANBAN_HELP = """\
**/kanban** manage the shared task board.
Common subcommands:
`list` (alias `ls`) List tasks on the current board
`show <id>` Task details + comments + events
`stats` Per-status / per-assignee counts
`create <title>` Create a task (auto-subscribes you to events)
`comment <id> <msg>` Append a comment
`complete <id>` Mark task(s) done
`block <id> [reason]` Mark blocked; `unblock <id>` to revive
`assign <id> <profile>` Reassign
`boards list` Show all boards
`assignees` Known profiles + counts
`context <id>` Full worker-context dump
`runs <id>` Attempt history
`log <id>` Worker log
Run `/kanban <subcommand> -h` for arguments. \
Read-only commands are safe while an agent is running.\
"""
def run_slash(rest: str) -> str:
"""Execute a ``/kanban …`` string and return captured stdout/stderr.
@@ -2148,26 +2195,47 @@ def run_slash(rest: str) -> str:
tokens = shlex.split(rest) if rest and rest.strip() else []
parser = argparse.ArgumentParser(prog="/kanban", add_help=False)
parser.exit_on_error = False # type: ignore[attr-defined]
sub = parser.add_subparsers(dest="kanban_action")
# Reuse the argparse builder -- call it with a throwaway parent
# subparsers via a wrapping top-level parser.
wrap = argparse.ArgumentParser(prog="/", add_help=False)
wrap.exit_on_error = False # type: ignore[attr-defined]
wrap_sub = wrap.add_subparsers(dest="_top")
build_parser(wrap_sub)
# Bare ``/kanban`` or ``/kanban help`` / ``--help`` / ``-h`` / ``?``:
# show the curated short-help block instead of dumping argparse's full
# usage tree (which is enormous and reads as garbage in a chat
# bubble). Per-subcommand help still works via ``/kanban foo -h``.
if not tokens or tokens[0] in {"help", "--help", "-h", "?"}:
return _SLASH_KANBAN_HELP
# Single argparse tree rooted at "/kanban". build_parser() expects a
# subparsers action to attach to, so build a throwaway one and pull
# the kanban_parser back out — then drive it directly so usage/error
# text reads as ``/kanban`` (not ``/kanban-wrap kanban``).
_wrap = argparse.ArgumentParser(prog="/kanban-wrap", add_help=False)
_wrap.exit_on_error = False # type: ignore[attr-defined]
_top_sub = _wrap.add_subparsers(dest="_top")
kanban_parser = build_parser(_top_sub)
kanban_parser.prog = "/kanban"
kanban_parser.exit_on_error = False # type: ignore[attr-defined]
for _action in kanban_parser._actions:
if isinstance(_action, argparse._SubParsersAction):
for _name, _choice in _action.choices.items():
_choice.prog = f"/kanban {_name}"
_choice.exit_on_error = False # type: ignore[attr-defined]
buf_out = io.StringIO()
buf_err = io.StringIO()
# ``-h`` / ``--help`` makes argparse print to stdout and SystemExit(0).
# Capture both streams so neither the help text nor the error text
# bypasses our buffer.
try:
# Prepend the "kanban" token so our top-level subparser routes here.
argv = ["kanban", *tokens] if tokens else ["kanban"]
args = wrap.parse_args(argv)
with contextlib.redirect_stdout(buf_out), contextlib.redirect_stderr(buf_err):
args = kanban_parser.parse_args(tokens)
except SystemExit as exc:
return f"(usage error: {exc})"
out = buf_out.getvalue().rstrip()
err = buf_err.getvalue().rstrip()
# Help dump (exit 0) → return the captured help text directly.
if exc.code in {0, None} and out:
return out
body = err or out
return f"⚠ /kanban usage error\n{body}" if body else "⚠ /kanban usage error"
except argparse.ArgumentError as exc:
return f"(usage error: {exc})"
return f"⚠ /kanban usage error: {exc}"
with contextlib.redirect_stdout(buf_out), contextlib.redirect_stderr(buf_err):
try:
+394 -41
View File
@@ -83,6 +83,8 @@ from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Iterable, Optional
from toolsets import get_toolset_names
# ---------------------------------------------------------------------------
# Constants
@@ -90,6 +92,7 @@ from typing import Any, Iterable, Optional
VALID_STATUSES = {"triage", "todo", "ready", "running", "blocked", "done", "archived"}
VALID_WORKSPACE_KINDS = {"scratch", "worktree", "dir"}
KNOWN_TOOLSET_NAMES = frozenset(name.casefold() for name in get_toolset_names())
# A running task's claim is valid for 15 minutes; after that the next
# dispatcher tick reclaims it. Workers that outlive this window should call
@@ -858,6 +861,7 @@ CREATE TABLE IF NOT EXISTS kanban_notify_subs (
chat_id TEXT NOT NULL,
thread_id TEXT NOT NULL DEFAULT '',
user_id TEXT,
notifier_profile TEXT,
created_at INTEGER NOT NULL,
last_event_id INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (task_id, platform, chat_id, thread_id)
@@ -963,6 +967,25 @@ def init_db(
return path
def _add_column_if_missing(
conn: sqlite3.Connection, table: str, column: str, ddl: str
) -> bool:
"""Run ``ALTER TABLE <table> ADD COLUMN <ddl>``, idempotent across races.
Returns ``True`` when the column was actually added by this call.
Swallows ``duplicate column name`` errors so a concurrent connection
that ran the same migration first does not crash the dispatcher tick
(issue #21708).
"""
try:
conn.execute(f"ALTER TABLE {table} ADD COLUMN {ddl}")
return True
except sqlite3.OperationalError as exc:
if "duplicate column name" in str(exc).lower():
return False
raise
def _migrate_add_optional_columns(conn: sqlite3.Connection) -> None:
"""Add columns that were introduced after v1 release to legacy DBs.
@@ -970,11 +993,13 @@ def _migrate_add_optional_columns(conn: sqlite3.Connection) -> None:
"""
cols = {row["name"] for row in conn.execute("PRAGMA table_info(tasks)")}
if "tenant" not in cols:
conn.execute("ALTER TABLE tasks ADD COLUMN tenant TEXT")
_add_column_if_missing(conn, "tasks", "tenant", "tenant TEXT")
if "result" not in cols:
conn.execute("ALTER TABLE tasks ADD COLUMN result TEXT")
_add_column_if_missing(conn, "tasks", "result", "result TEXT")
if "idempotency_key" not in cols:
conn.execute("ALTER TABLE tasks ADD COLUMN idempotency_key TEXT")
_add_column_if_missing(
conn, "tasks", "idempotency_key", "idempotency_key TEXT"
)
conn.execute(
"CREATE INDEX IF NOT EXISTS idx_tasks_idempotency "
"ON tasks(idempotency_key)"
@@ -997,37 +1022,51 @@ def _migrate_add_optional_columns(conn: sqlite3.Connection) -> None:
# the *original* snapshot; this is intentional and safe as long as
# no step depends on a column added by a previous step in the same call.
if "consecutive_failures" not in cols:
conn.execute(
"ALTER TABLE tasks ADD COLUMN consecutive_failures "
"INTEGER NOT NULL DEFAULT 0"
added = _add_column_if_missing(
conn,
"tasks",
"consecutive_failures",
"consecutive_failures INTEGER NOT NULL DEFAULT 0",
)
if "spawn_failures" in cols:
if added and "spawn_failures" in cols:
conn.execute(
"UPDATE tasks SET consecutive_failures = COALESCE(spawn_failures, 0)"
)
if "worker_pid" not in cols:
conn.execute("ALTER TABLE tasks ADD COLUMN worker_pid INTEGER")
_add_column_if_missing(conn, "tasks", "worker_pid", "worker_pid INTEGER")
if "last_failure_error" not in cols:
conn.execute("ALTER TABLE tasks ADD COLUMN last_failure_error TEXT")
if "last_spawn_error" in cols:
added = _add_column_if_missing(
conn, "tasks", "last_failure_error", "last_failure_error TEXT"
)
if added and "last_spawn_error" in cols:
conn.execute(
"UPDATE tasks SET last_failure_error = last_spawn_error"
)
if "max_runtime_seconds" not in cols:
conn.execute("ALTER TABLE tasks ADD COLUMN max_runtime_seconds INTEGER")
_add_column_if_missing(
conn, "tasks", "max_runtime_seconds", "max_runtime_seconds INTEGER"
)
if "last_heartbeat_at" not in cols:
conn.execute("ALTER TABLE tasks ADD COLUMN last_heartbeat_at INTEGER")
_add_column_if_missing(
conn, "tasks", "last_heartbeat_at", "last_heartbeat_at INTEGER"
)
if "current_run_id" not in cols:
conn.execute("ALTER TABLE tasks ADD COLUMN current_run_id INTEGER")
_add_column_if_missing(
conn, "tasks", "current_run_id", "current_run_id INTEGER"
)
if "workflow_template_id" not in cols:
conn.execute("ALTER TABLE tasks ADD COLUMN workflow_template_id TEXT")
_add_column_if_missing(
conn, "tasks", "workflow_template_id", "workflow_template_id TEXT"
)
if "current_step_key" not in cols:
conn.execute("ALTER TABLE tasks ADD COLUMN current_step_key TEXT")
_add_column_if_missing(
conn, "tasks", "current_step_key", "current_step_key TEXT"
)
if "skills" not in cols:
# JSON array of skill names the dispatcher force-loads into the
# worker (additive to the built-in `kanban-worker`). NULL is fine
# for existing rows.
conn.execute("ALTER TABLE tasks ADD COLUMN skills TEXT")
_add_column_if_missing(conn, "tasks", "skills", "skills TEXT")
if "max_retries" not in cols:
# Per-task override for the consecutive-failure circuit breaker.
@@ -1035,18 +1074,30 @@ def _migrate_add_optional_columns(conn: sqlite3.Connection) -> None:
# config, then ``DEFAULT_FAILURE_LIMIT``. Existing rows get NULL,
# which is the correct default (they keep the global behaviour
# they were getting before the column existed).
conn.execute("ALTER TABLE tasks ADD COLUMN max_retries INTEGER")
_add_column_if_missing(conn, "tasks", "max_retries", "max_retries INTEGER")
# task_events gained a run_id column; back-fill it as NULL for
# historical events (they predate runs and can't be attributed).
ev_cols = {row["name"] for row in conn.execute("PRAGMA table_info(task_events)")}
if "run_id" not in ev_cols:
conn.execute("ALTER TABLE task_events ADD COLUMN run_id INTEGER")
_add_column_if_missing(conn, "task_events", "run_id", "run_id INTEGER")
conn.execute(
"CREATE INDEX IF NOT EXISTS idx_events_run "
"ON task_events(run_id, id)"
)
notify_table_exists = conn.execute(
"SELECT name FROM sqlite_master WHERE type='table' AND name='kanban_notify_subs'"
).fetchone() is not None
if notify_table_exists:
notify_cols = {
row["name"] for row in conn.execute("PRAGMA table_info(kanban_notify_subs)")
}
if "notifier_profile" not in notify_cols:
_add_column_if_missing(
conn, "kanban_notify_subs", "notifier_profile", "notifier_profile TEXT"
)
# One-shot backfill: any task that is 'running' before runs existed
# had its claim_lock / claim_expires / worker_pid on the task row.
# Synthesize a matching task_runs row so subsequent end-run / heartbeat
@@ -1237,6 +1288,12 @@ def create_task(
if skills is not None:
cleaned: list[str] = []
seen: set[str] = set()
# Collect all toolset-name confusions up front so the user sees the
# whole list at once. Raising on the first hit is friendly when the
# input has one mistake, but agents that confuse skills with toolsets
# usually pass several at once (`skills=["web", "browser", "terminal"]`)
# and serial-correcting one per failure round-trips wastes tokens.
toolset_typos: list[str] = []
for s in skills:
if not s:
continue
@@ -1248,10 +1305,23 @@ def create_task(
f"skill name cannot contain comma: {name!r} "
f"(pass a list of separate names instead of a comma-joined string)"
)
if name.casefold() in KNOWN_TOOLSET_NAMES:
toolset_typos.append(name)
continue
if name in seen:
continue
seen.add(name)
cleaned.append(name)
if toolset_typos:
quoted = ", ".join(repr(n) for n in toolset_typos)
noun = "is a toolset name" if len(toolset_typos) == 1 else "are toolset names"
raise ValueError(
f"{quoted} {noun}, not skill name(s). "
"Put toolsets in the assignee profile's `toolsets:` config "
"instead of per-task skills. Skills are named skill bundles "
"(e.g. `kanban-worker`, `blogwatcher`); toolsets are runtime "
"capabilities (e.g. `web`, `browser`, `terminal`)."
)
skills_list = cleaned
# Idempotency check — return the existing task instead of creating a
@@ -1504,7 +1574,14 @@ def unlink_tasks(conn: sqlite3.Connection, parent_id: str, child_id: str) -> boo
conn, child_id, "unlinked",
{"parent": parent_id, "child": child_id},
)
return cur.rowcount > 0
removed = cur.rowcount > 0
if removed:
# Dependency edge removed — re-evaluate promotion eligibility for the
# child immediately. Matches the contract of complete_task and
# unblock_task; without this the child stays stuck in todo until the
# next dispatcher tick or a manual `hermes kanban recompute` (issue #22459).
recompute_ready(conn)
return removed
def parent_ids(conn: sqlite3.Connection, task_id: str) -> list[str]:
@@ -1749,7 +1826,7 @@ def _synthesize_ended_run(
# ---------------------------------------------------------------------------
def recompute_ready(conn: sqlite3.Connection) -> int:
"""Promote ``todo`` tasks to ``ready`` when all parents are ``done``.
"""Promote ``todo`` tasks to ``ready`` when all parents are ``done`` or ``archived``.
Returns the number of tasks promoted. Safe to call inside or outside
an existing transaction; it opens its own IMMEDIATE txn.
@@ -1767,7 +1844,7 @@ def recompute_ready(conn: sqlite3.Connection) -> int:
"WHERE l.child_id = ?",
(task_id,),
).fetchall()
if all(p["status"] == "done" for p in parents):
if all(p["status"] in {"done", "archived"} for p in parents):
conn.execute(
"UPDATE tasks SET status = 'ready' WHERE id = ? AND status = 'todo'",
(task_id,),
@@ -1797,6 +1874,31 @@ def claim_task(
lock = claimer or _claimer_id()
expires = now + int(ttl_seconds)
with write_txn(conn):
# Structural invariant: never transition ready -> running while any
# parent is not yet 'done'. This is the single enforcement point
# regardless of which writer (create_task, link_tasks, unblock_task,
# release_stale_claims, manual SQL) set status='ready'. If a racy
# writer promoted a task with undone parents, demote it back to
# 'todo' here — recompute_ready will re-promote when the parents
# actually finish. See RCA at
# kanban/boards/cookai/workspaces/t_a6acd07d/root-cause.md.
undone = conn.execute(
"SELECT 1 FROM task_links l "
"JOIN tasks p ON p.id = l.parent_id "
"WHERE l.child_id = ? AND p.status NOT IN ('done', 'archived') LIMIT 1",
(task_id,),
).fetchone()
if undone:
conn.execute(
"UPDATE tasks SET status = 'todo' "
"WHERE id = ? AND status = 'ready'",
(task_id,),
)
_append_event(
conn, task_id, "claim_rejected",
{"reason": "parents_not_done"},
)
return None
# Defensive: if a prior run somehow leaked (invariant violation from
# an unknown code path), close it as 'reclaimed' so we don't strand
# it when the CAS resets the pointer below. No-op when the invariant
@@ -1908,16 +2010,69 @@ def release_stale_claims(
) -> int:
"""Reset any ``running`` task whose claim has expired.
Returns the number of stale claims reclaimed. Safe to call often.
A stale-by-TTL claim whose host-local worker PID is still alive is
*extended* (with a ``claim_extended`` event) instead of being
reclaimed. Reclaiming a live worker mid-flight produces the spawn-
then-immediately-reclaim loop seen on slow models that spend longer
than ``DEFAULT_CLAIM_TTL_SECONDS`` inside a single tool-free LLM
call (#23025): no tool calls means no ``kanban_heartbeat``, even
though the subprocess is healthy. ``enforce_max_runtime`` and
``detect_crashed_workers`` remain the upper bounds for genuinely
wedged or dead workers.
Returns the number of stale claims actually reclaimed (live-pid
extensions don't count). Safe to call often.
"""
now = int(time.time())
reclaimed = 0
host_prefix = f"{_claimer_id().split(':', 1)[0]}:"
stale = conn.execute(
"SELECT id, claim_lock, worker_pid FROM tasks "
"WHERE status = 'running' AND claim_expires IS NOT NULL AND claim_expires < ?",
"SELECT id, claim_lock, worker_pid, claim_expires, last_heartbeat_at "
"FROM tasks "
"WHERE status = 'running' AND claim_expires IS NOT NULL "
" AND claim_expires < ?",
(now,),
).fetchall()
for row in stale:
lock = row["claim_lock"] or ""
host_local = lock.startswith(host_prefix)
if host_local and row["worker_pid"] and _pid_alive(row["worker_pid"]):
new_expires = now + int(DEFAULT_CLAIM_TTL_SECONDS)
with write_txn(conn):
cur = conn.execute(
"UPDATE tasks SET claim_expires = ? "
"WHERE id = ? AND status = 'running' "
" AND claim_lock IS ? "
" AND claim_expires IS NOT NULL "
" AND claim_expires < ?",
(new_expires, row["id"], row["claim_lock"], now),
)
if cur.rowcount != 1:
continue
run_id = _current_run_id(conn, row["id"])
if run_id is not None:
conn.execute(
"UPDATE task_runs SET claim_expires = ? WHERE id = ?",
(new_expires, run_id),
)
_append_event(
conn, row["id"], "claim_extended",
{
"reason": "pid_alive",
"worker_pid": int(row["worker_pid"]),
"claim_lock": row["claim_lock"],
"claim_expires_was": int(row["claim_expires"]),
"claim_expires_now": new_expires,
"last_heartbeat_at": (
int(row["last_heartbeat_at"])
if row["last_heartbeat_at"] is not None
else None
),
},
run_id=run_id,
)
continue
termination = _terminate_reclaimed_worker(
row["worker_pid"], row["claim_lock"], signal_fn=signal_fn,
)
@@ -1937,7 +2092,20 @@ def release_stale_claims(
error=f"stale_lock={row['claim_lock']}",
metadata=termination,
)
payload = {"stale_lock": row["claim_lock"]}
payload = {
"stale_lock": row["claim_lock"],
"worker_pid": (
int(row["worker_pid"])
if row["worker_pid"] is not None else None
),
"claim_expires": int(row["claim_expires"]),
"last_heartbeat_at": (
int(row["last_heartbeat_at"])
if row["last_heartbeat_at"] is not None else None
),
"now": now,
"host_local": host_local,
}
payload.update(termination)
_append_event(
conn, row["id"], "reclaimed",
@@ -2496,14 +2664,30 @@ def unblock_task(conn: sqlite3.Connection, task_id: str) -> bool:
""",
(now, int(stale["current_run_id"])),
)
cur = conn.execute(
"UPDATE tasks SET status = 'ready', current_run_id = NULL "
"WHERE id = ? AND status = 'blocked'",
# Re-gate on parent completion before flipping 'blocked' back to
# 'ready'. Unconditionally setting status='ready' here bypasses the
# parent-completion invariant (the dispatcher trusts that column);
# if parents are still in progress the task must wait in 'todo'
# until recompute_ready picks it up. RCA: Bug 2 at
# kanban/boards/cookai/workspaces/t_a6acd07d/root-cause.md.
undone_parents = conn.execute(
"SELECT 1 FROM task_links l "
"JOIN tasks p ON p.id = l.parent_id "
"WHERE l.child_id = ? AND p.status != 'done' LIMIT 1",
(task_id,),
).fetchone()
new_status = "todo" if undone_parents else "ready"
cur = conn.execute(
"UPDATE tasks SET status = ?, current_run_id = NULL "
"WHERE id = ? AND status = 'blocked'",
(new_status, task_id),
)
if cur.rowcount != 1:
return False
_append_event(conn, task_id, "unblocked", None)
_append_event(
conn, task_id, "unblocked",
{"status": new_status} if new_status != "ready" else None,
)
return True
@@ -3504,6 +3688,14 @@ def dispatch_once(
failures the task is auto-blocked with the last error as its reason
prevents the dispatcher from thrashing forever on an unfixable task.
``max_spawn`` is a **live concurrency cap**, not a per-tick spawn budget:
it counts tasks already in ``status='running'`` plus this tick's spawns
against the limit. So ``max_spawn=4`` means "at most 4 workers running
at any time across the whole board" — matching the gateway's stated
intent ("limit concurrent kanban tasks"). With a per-tick interpretation
a 60-second tick interval could grow concurrency by N every minute on a
busy board and accumulate without bound.
``spawn_fn`` defaults to ``_default_spawn``. Tests pass a stub.
``board`` pins workspace/log/db resolution for this tick to a specific
board. When omitted, the current-board resolution chain is used.
@@ -3555,6 +3747,21 @@ def dispatch_once(
result.timed_out = enforce_max_runtime(conn)
result.promoted = recompute_ready(conn)
# Count tasks already running so max_spawn enforces concurrency rather
# than a per-tick spawn budget. See the docstring above for the full
# rationale; the short version is that a 60-second tick interval with a
# per-tick budget of N would grow concurrency by N every tick on a busy
# board, since "running" tasks aren't reclaimed by completion alone —
# they sit in status='running' until the worker calls
# kanban_complete/kanban_block (or the dispatcher TTL-reclaims them).
running_count = 0
if max_spawn is not None:
running_count = int(
conn.execute(
"SELECT COUNT(*) FROM tasks WHERE status = 'running'"
).fetchone()[0]
)
ready_rows = conn.execute(
"SELECT id, assignee FROM tasks "
"WHERE status = 'ready' AND claim_lock IS NULL "
@@ -3562,7 +3769,7 @@ def dispatch_once(
).fetchall()
spawned = 0
for row in ready_rows:
if max_spawn is not None and spawned >= max_spawn:
if max_spawn is not None and running_count + spawned >= max_spawn:
break
if not row["assignee"]:
result.skipped_unassigned.append(row["id"])
@@ -3666,6 +3873,35 @@ def _rotate_worker_log(log_path: Path, max_bytes: int) -> None:
pass
def _resolve_hermes_argv() -> list[str]:
"""Resolve the ``hermes`` invocation as argv parts for ``Popen``.
Tries in order:
1. ``shutil.which("hermes")`` the console-script shim, the same form
that shows up in ``ps`` output and existing logs. Preferred so live
systems' diagnostics stay familiar.
2. ``sys.executable -m hermes_cli.main`` fallback for setups where
Hermes is launched from a venv and the ``hermes`` shim is not on
the dispatcher's ``$PATH`` (cron, systemd ``User=`` services,
launchd jobs, detached processes, etc.). Goes through the running
interpreter so the result is independent of ``$PATH``.
Mirrors ``gateway.run._resolve_hermes_bin`` for the same reason. Kept
local (not imported from gateway) because ``hermes_cli`` sits below
``gateway`` in the dependency order.
"""
import shutil
hermes_bin = shutil.which("hermes")
if hermes_bin:
return [hermes_bin]
# Fallback to the module form. ``hermes_cli.main`` is the actual
# console-script target declared in pyproject.toml, NOT a top-level
# ``hermes`` package — there is no ``hermes`` package to import.
return [sys.executable, "-m", "hermes_cli.main"]
def _default_spawn(
task: Task,
workspace: str,
@@ -3694,6 +3930,25 @@ def _default_spawn(
prompt = f"work kanban task {task.id}"
env = dict(os.environ)
# Inject HERMES_HOME so the worker reads the profile-scoped config.yaml
# (fallback_providers, toolsets, agent settings, etc.) instead of the root
# config. Without this, `env = dict(os.environ)` copies only the parent's
# env, and when the child process starts `hermes -p <name>` the
# _apply_profile_override() runs *before* hermes_constants is imported.
# If HERMES_HOME is absent from the child's env, get_hermes_home() falls
# back to Path.home() / ".hermes" (the DEFAULT profile root), ignoring the
# profile-specific config entirely. Fixes profile-scoped fallback_providers
# being invisible to kanban workers.
from hermes_cli.profiles import resolve_profile_env
try:
env["HERMES_HOME"] = resolve_profile_env(profile_arg)
except FileNotFoundError:
# Profile dir doesn't exist — defer resolution to the CLI's
# _apply_profile_override() via HERMES_PROFILE (set below).
# This only happens in test fixtures where the isolated
# HERMES_HOME never had profiles created.
pass
if task.tenant:
env["HERMES_TENANT"] = task.tenant
env["HERMES_KANBAN_TASK"] = task.id
@@ -3722,7 +3977,7 @@ def _default_spawn(
env["HERMES_PROFILE"] = profile_arg
cmd = [
"hermes",
*_resolve_hermes_argv(),
"-p", profile_arg,
# Auto-load the kanban-worker skill so every dispatched worker
# has the pattern library (good summary/metadata shapes, retry
@@ -4024,7 +4279,14 @@ def build_worker_context(conn: sqlite3.Connection, task_id: str) -> str:
)
for c in shown_c:
ts = time.strftime("%Y-%m-%d %H:%M", time.localtime(c.created_at))
lines.append(f"**{c.author}** ({ts}):")
# Render author with explicit "comment from worker" framing so
# operator-controlled HERMES_PROFILE values like "hermes-system"
# or "operator" can't be misread by the next worker as a system
# directive above the (attacker-influenceable) comment body.
# Defense-in-depth — the LLM-controlled author-forgery surface
# was already closed in #22435. See #22452.
safe_author = (c.author or "").replace("`", "")
lines.append(f"comment from worker `{safe_author}` at {ts}:")
lines.append(_cap(c.body, _CTX_MAX_COMMENT_BYTES))
lines.append("")
@@ -4071,16 +4333,26 @@ def board_stats(conn: sqlite3.Connection) -> dict:
}
def _safe_int(val: Optional[str]) -> Optional[int]:
"""Parse a timestamp field to int, returning None on garbage like '%s'."""
if val is None:
return None
try:
return int(val)
except (ValueError, TypeError):
return None
def task_age(task: Task) -> dict:
"""Return age metrics for a single task. All values are seconds or None."""
now = int(time.time())
age_since_created = now - int(task.created_at) if task.created_at else None
age_since_started = (
now - int(task.started_at) if task.started_at else None
)
created = _safe_int(task.created_at)
started = _safe_int(task.started_at)
completed = _safe_int(task.completed_at)
age_since_created = now - created if created else None
age_since_started = now - started if started else None
time_to_complete = (
int(task.completed_at) - int(task.started_at or task.created_at)
if task.completed_at else None
completed - (started or created) if completed else None
)
return {
"created_age_seconds": age_since_created,
@@ -4101,6 +4373,7 @@ def add_notify_sub(
chat_id: str,
thread_id: Optional[str] = None,
user_id: Optional[str] = None,
notifier_profile: Optional[str] = None,
) -> None:
"""Register a gateway source that wants terminal-state notifications
for ``task_id``. Idempotent on (task, platform, chat, thread)."""
@@ -4109,10 +4382,10 @@ def add_notify_sub(
conn.execute(
"""
INSERT OR IGNORE INTO kanban_notify_subs
(task_id, platform, chat_id, thread_id, user_id, created_at)
VALUES (?, ?, ?, ?, ?, ?)
(task_id, platform, chat_id, thread_id, user_id, notifier_profile, created_at)
VALUES (?, ?, ?, ?, ?, ?, ?)
""",
(task_id, platform, chat_id, thread_id or "", user_id, now),
(task_id, platform, chat_id, thread_id or "", user_id, notifier_profile, now),
)
@@ -4194,6 +4467,57 @@ def unseen_events_for_sub(
return max_id, out
def claim_unseen_events_for_sub(
conn: sqlite3.Connection,
*,
task_id: str,
platform: str,
chat_id: str,
thread_id: Optional[str] = None,
kinds: Optional[Iterable[str]] = None,
) -> tuple[int, int, list[Event]]:
"""Atomically claim unseen notification events for one subscription.
Returns ``(old_cursor, new_cursor, events)``. When events are returned,
``kanban_notify_subs.last_event_id`` has already been advanced to
``new_cursor`` inside a ``BEGIN IMMEDIATE`` transaction. That makes the
notifier's read/claim step single-owner across multiple gateway watcher
processes pointed at the same board DB: concurrent watchers serialize on
SQLite's writer lock, and only the first process sees and claims a given
event range.
Callers should send the claimed events, then either leave the cursor at
``new_cursor`` on success or call :func:`rewind_notify_cursor` if delivery
failed before any terminal unsubscribe removed the row.
"""
with write_txn(conn):
row = conn.execute(
"SELECT last_event_id FROM kanban_notify_subs "
"WHERE task_id = ? AND platform = ? AND chat_id = ? AND thread_id = ?",
(task_id, platform, chat_id, thread_id or ""),
).fetchone()
if row is None:
return 0, 0, []
old_cursor = int(row["last_event_id"])
new_cursor, events = unseen_events_for_sub(
conn,
task_id=task_id,
platform=platform,
chat_id=chat_id,
thread_id=thread_id,
kinds=kinds,
)
if not events:
return old_cursor, old_cursor, []
conn.execute(
"UPDATE kanban_notify_subs SET last_event_id = ? "
"WHERE task_id = ? AND platform = ? AND chat_id = ? AND thread_id = ? "
"AND last_event_id = ?",
(int(new_cursor), task_id, platform, chat_id, thread_id or "", int(old_cursor)),
)
return old_cursor, new_cursor, events
def advance_notify_cursor(
conn: sqlite3.Connection,
*,
@@ -4211,6 +4535,35 @@ def advance_notify_cursor(
)
def rewind_notify_cursor(
conn: sqlite3.Connection,
*,
task_id: str,
platform: str,
chat_id: str,
thread_id: Optional[str] = None,
claimed_cursor: int,
old_cursor: int,
) -> bool:
"""Undo a notification claim when delivery fails.
The CAS guard only rewinds if no later notifier advanced the row after our
claim. This keeps retry behavior for transient send failures without
clobbering newer progress.
"""
with write_txn(conn):
cur = conn.execute(
"UPDATE kanban_notify_subs SET last_event_id = ? "
"WHERE task_id = ? AND platform = ? AND chat_id = ? AND thread_id = ? "
"AND last_event_id = ?",
(
int(old_cursor), task_id, platform, chat_id, thread_id or "",
int(claimed_cursor),
),
)
return cur.rowcount > 0
# ---------------------------------------------------------------------------
# Retention + garbage collection
# ---------------------------------------------------------------------------
+137 -10
View File
@@ -177,7 +177,7 @@ def _active_hallucination_events(
active: list[Any] = []
for ev in events:
k = _event_kind(ev)
if k in ("completed", "edited"):
if k in {"completed", "edited"}:
active.clear()
elif k == kind:
active.append(ev)
@@ -193,10 +193,9 @@ def _latest_clean_event_ts(events: Iterable[Any]) -> int:
"""
latest = 0
for ev in events:
if _event_kind(ev) in ("completed", "edited"):
if _event_kind(ev) in {"completed", "edited"}:
t = _event_ts(ev)
if t > latest:
latest = t
latest = max(latest, t)
return latest
@@ -356,7 +355,7 @@ def _rule_repeated_failures(task, events, runs, now, cfg) -> list[Diagnostic]:
most_recent_outcome = None
for r in reversed(ordered_runs):
oc = _task_field(r, "outcome")
if oc in ("spawn_failed", "timed_out", "crashed"):
if oc in {"spawn_failed", "timed_out", "crashed"}:
most_recent_outcome = oc
break
@@ -374,7 +373,7 @@ def _rule_repeated_failures(task, events, runs, now, cfg) -> list[Diagnostic]:
label=f"Fix profile auth: hermes -p {assignee} auth",
payload={"command": f"hermes -p {assignee} auth"},
))
elif most_recent_outcome in ("timed_out", "crashed"):
elif most_recent_outcome in {"timed_out", "crashed"}:
# Worker got off the ground but died. Logs are the right place
# to diagnose; reclaim/reassign are the recovery levers.
task_id = _task_field(task, "id")
@@ -467,7 +466,7 @@ def _rule_repeated_crashes(task, events, runs, now, cfg) -> list[Diagnostic]:
consecutive += 1
if last_err is None:
last_err = _task_field(r, "error")
elif outcome in ("completed", "reclaimed"):
elif outcome in {"completed", "reclaimed"}:
# A success (or manual reclaim) breaks the streak.
break
else:
@@ -534,8 +533,7 @@ def _rule_stuck_in_blocked(task, events, runs, now, cfg) -> list[Diagnostic]:
for ev in events:
if _event_kind(ev) == "blocked":
t = _event_ts(ev)
if t > last_blocked_ts:
last_blocked_ts = t
last_blocked_ts = max(last_blocked_ts, t)
if last_blocked_ts == 0:
return []
age_hours = (now - last_blocked_ts) / 3600.0
@@ -543,7 +541,7 @@ def _rule_stuck_in_blocked(task, events, runs, now, cfg) -> list[Diagnostic]:
return []
# Any comment / unblock after the block breaks the "stale" signal.
for ev in events:
if _event_kind(ev) in ("commented", "unblocked") and _event_ts(ev) > last_blocked_ts:
if _event_kind(ev) in {"commented", "unblocked"} and _event_ts(ev) > last_blocked_ts:
return []
actions: list[DiagnosticAction] = [
DiagnosticAction(
@@ -570,6 +568,129 @@ def _rule_stuck_in_blocked(task, events, runs, now, cfg) -> list[Diagnostic]:
)]
def _rule_stranded_in_ready(task, events, runs, now, cfg) -> list[Diagnostic]:
"""Task has been in ``ready`` status for too long without any worker
claiming it.
Threshold: cfg["stranded_threshold_seconds"] (default 1800 = 30 min).
Catches every "task waiting for a worker that never comes" case
without caring WHY:
* Operator typo'd the assignee — no profile or external worker matches.
* Profile was deleted, leaving its tasks stranded.
* External worker pool (Codex CLI, Claude Code lane, custom daemon)
is down, hung, or wasn't started.
* Dispatcher is misconfigured (wrong board, wrong HERMES_HOME).
Pre-rule, all of these silently rotted in ``skipped_nonspawnable``
the dispatcher correctly skipped them (good no respawn loop) but
nobody surfaced the fact that operator-actionable work was
accumulating. The rule fires when a ready task's promoted-to-ready
timestamp is older than the threshold AND the assignee is non-empty
(truly unassigned tasks have their own ``skipped_unassigned`` signal
on the dispatcher and a different operator response).
The signal is age-based on purpose: it's identity-agnostic, so it
works for Hermes profiles, registered lanes, external workers, and
typos uniformly. No registry to curate, no per-board allowlist.
"""
threshold_seconds = float(
cfg.get("stranded_threshold_seconds", 30 * 60)
)
status = _task_field(task, "status")
if status != "ready":
return []
# Skip tasks with a live claim — they're being worked on, even if
# the worker hasn't reported progress yet (run-level liveness
# extends the claim TTL; we don't want to second-guess that here).
if _task_field(task, "claim_lock"):
return []
assignee = _task_field(task, "assignee") or ""
if not assignee.strip():
# Unassigned tasks: the dispatcher's ``skipped_unassigned`` is
# already the right signal. A separate diagnostic here would
# double-flag the same condition.
return []
# Find the most recent event that put this task into ready.
# ``created`` covers tasks born ready; ``promoted`` covers parent-
# done auto-promotion; ``reclaimed`` covers TTL/crash recovery;
# ``unblocked`` covers human-driven resumes.
READY_TRANSITION_KINDS = {
"created", "promoted", "reclaimed", "unblocked",
}
last_ready_ts = 0
for ev in events:
if _event_kind(ev) in READY_TRANSITION_KINDS:
t = _event_ts(ev)
last_ready_ts = max(last_ready_ts, t)
# Fallback: if no qualifying event exists (very old task or events
# truncated), fall back to ``created_at`` on the task row. Better
# to occasionally over-flag an ancient task than miss a stranded one.
if last_ready_ts == 0:
last_ready_ts = int(_task_field(task, "created_at", default=0) or 0)
if last_ready_ts == 0:
return []
age_seconds = now - last_ready_ts
if age_seconds < threshold_seconds:
return []
# Format the age in the largest sensible unit.
if age_seconds >= 3600:
age_str = f"{age_seconds / 3600:.1f}h"
else:
age_str = f"{int(age_seconds / 60)}m"
# Severity escalates with age. Below 2x threshold = warning;
# 2x 6x = error; beyond 6x = critical (something is clearly
# broken, not just slow).
if age_seconds >= threshold_seconds * 6:
severity = "critical"
elif age_seconds >= threshold_seconds * 2:
severity = "error"
else:
severity = "warning"
actions = [
DiagnosticAction(
kind="reassign",
label="Reassign to a different worker",
payload={"current_assignee": assignee},
),
DiagnosticAction(
kind="cli_hint",
label="Check dispatcher status",
payload={"command": "hermes kanban diagnostics"},
),
]
return [Diagnostic(
kind="stranded_in_ready",
severity=severity,
title=f"Ready for {age_str} with no worker",
detail=(
f"This task has been ready for {age_str} but nothing has "
f"claimed it. Common causes: assignee {assignee!r} is "
f"misspelled, the profile was deleted, or the external "
f"worker pool for this lane is down. Confirm the assignee "
f"is correct and that a worker is actually polling for it."
),
actions=actions,
first_seen_at=last_ready_ts,
last_seen_at=last_ready_ts,
count=1,
data={
"ready_since": last_ready_ts,
"age_seconds": int(age_seconds),
"assignee": assignee,
"threshold_seconds": int(threshold_seconds),
},
)]
# Registry — order matters: rules higher on the list render first when
# severity ties. Add new rules here.
_RULES: list[RuleFn] = [
@@ -578,6 +699,7 @@ _RULES: list[RuleFn] = [
_rule_repeated_failures,
_rule_repeated_crashes,
_rule_stuck_in_blocked,
_rule_stranded_in_ready,
]
@@ -589,6 +711,7 @@ DIAGNOSTIC_KINDS = (
"repeated_failures",
"repeated_crashes",
"stuck_in_blocked",
"stranded_in_ready",
)
@@ -598,6 +721,10 @@ DEFAULT_CONFIG = {
"spawn_failure_threshold": 3,
"crash_threshold": 2,
"blocked_stale_hours": 24,
# Stranded-task threshold. 30 min by default — below that, the
# signal is dominated by tasks that are about to be claimed on the
# next dispatcher tick (default 60s) and would just be noise.
"stranded_threshold_seconds": 30 * 60,
}
+580 -194
View File
File diff suppressed because it is too large Load Diff

Some files were not shown because too many files have changed in this diff Show More