Adds a built-in 'bunnny' skin preset with a hot-pink coquette palette:
- Hot pink (#FF3366) borders with Barbie-pink (#FF69B4) accents
- Lavender-blush (#FFF0F5) text on deep-plum (#2A0E1E) surfaces
- Coquette spinner verbs (sparkling, twirling, tying a little bow)
- Heart/sparkle/flower spinner faces (♡ ✧ ✿ ❀ ෆ)
- Heart (♡) prompt symbol and tool prefix
- (ノ◕ヮ◕)ノ*:・゚✧ kaomoji in welcome + help header
- Custom HERMES <3 banner_logo in pink gradient
- banner_hero of twin coquette bunnies holding paws, framed with
floating sparkles, hearts, and flowers to fill the banner width
Skin is cosmetic only — agent_name stays 'Hermes Agent'. Adds entry
to the skins.md docs table and ignores .venv/ in .gitignore.
The background skill-review prompts (_SKILL_REVIEW_PROMPT and the **Skills**
half of _COMBINED_REVIEW_PROMPT) steered the reviewer toward passive
behavior — most passes concluded 'Nothing to save.' even when the session
produced real lessons. User-preference corrections (style, format,
legibility, verbosity) were especially lost: they were read as memory
signals only, so skills never carried the fix.
This rewrite changes the stance:
- **Active-update bias.** The reviewer now treats inaction as a missed
learning opportunity. 'Nothing to save.' remains an explicit escape
but is no longer framed as the most-common outcome.
- **User-preference corrections are first-class skill signals.** Style,
tone, format, legibility, verbosity complaints — and the actual
phrasings users use ('stop doing X', 'this is too verbose', 'I hate
when you Y', 'remember this') — now warrant patching the skill that
governs the task, not just writing to memory.
- **Loaded-skill-first preference order.** When a skill was loaded via
/skill-name or skill_view during the session, the reviewer patches
THAT one first. It was in play; it's the right place.
- **Four-step ladder: patch-loaded → patch-umbrella → support-file →
create.** Support files are explicitly enumerated as three kinds:
* references/<topic>.md — session-specific detail OR condensed
knowledge banks (quoted research, API docs excerpts, domain notes)
* templates/<name>.<ext> — starter files to copy and modify
* scripts/<name>.<ext> — statically re-runnable actions
- **Name-veto for CREATE.** New skill names MUST be class-level — no PR
numbers, error strings, codenames, library-alone names, or session
artifacts ('fix-X / debug-Y / audit-Z-today'). If the proposed name
only fits today's task, fall back to one of the patch/support-file
options.
- **Memory scope clarified.** 'who the user is and what the current
situation and state of your operations are' — MEMORY.md is
situational/state, USER.md is identity/preferences.
- **Curator handoff.** Reviewer flags overlap; the background curator
handles consolidation at scale. Single-session reviewer doesn't
attempt umbrella-rebalancing.
Tests: tests/run_agent/test_review_prompt_class_first.py upgraded to
assert the new behavioral contracts (active bias, user-correction
signals, loaded-skill-first, support-file kinds, name-veto, memory
framing, curator handoff). 17 tests, all pass.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Three modules independently implemented the same "preserve head+tail of
a secret, mask the middle" logic with slightly different behaviors that
had started to drift:
hermes_cli/config.py redact_key — 12-char floor, 4+4, DIM '(not set)'
hermes_cli/status.py redact_key — 12-char floor, 4+4, plain '(not set)' ← drift
hermes_cli/dump.py _redact — 12-char floor, 4+4, empty string
The visible bug: 'hermes status' displayed the '(not set)' placeholder
in plain text while 'hermes config' showed it in dim text. Same concept,
inconsistent UI.
Introduces mask_secret() in agent/redact.py as the canonical helper,
with head/tail/floor/placeholder/empty kwargs. The three call sites
become one-line wrappers that differ only in the 'empty' handling:
config.redact_key → mask_secret(k, empty=color('(not set)', Colors.DIM))
status.redact_key → mask_secret(k, empty=color('(not set)', Colors.DIM))
dump._redact → mask_secret(v) # empty → ''
agent.redact._mask_token (log redactor, different policy: 18-char floor,
6+4 visible, '***' on empty) also ports to mask_secret but retains its
own empty-case handling to preserve the historical '***' return.
Net: the three display-time redactors now agree on formatting, the
canonical helper lives in one place, and future tweaks (e.g. adding
bullet-point masking, changing the head/tail widths) happen once.
Verified:
- 3/3 tests/hermes_cli/test_web_server.py::TestRedactKey pass
- 89/89 agent/tests/test_redact.py + tests/tools/test_browser_secret_exfil.py
+ tests/hermes_cli/test_redact_config_bridge.py pass
- Live 'hermes status', 'hermes config', 'hermes dump' all render the
same way they did before (verified against actual env with real
keys: OpenRouter, Firecrawl, Browserbase, FAL, Tinker all show
'prefix...suffix'; Kimi shows '***' at <12 chars; unset shows
'(not set)' uniformly).
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Match the buffered-stdin rearm cadence to IN_PASTE state so large pastes do not spin the normal escape timeout while waiting for readable data to drain.
Keep the latest prompt sticky while the viewport is in live assistant output beyond history, and clear stale sticky state at the real bottom using fresh scroll height.
detect_dangerous_command() and detect_hardline_command() were calling
re.search(pattern, text, re.IGNORECASE | re.DOTALL) inline — Python's
re._cache (512 patterns) amortizes compile cost on the warm path, but:
1. The first terminal() call per process pays the full compile fan-out
for all 59 patterns (12 HARDLINE + 47 DANGEROUS). Measured at
~2.6 ms per detect_dangerous_command() call after re.purge().
2. The re._cache is LRU — unrelated regex work elsewhere in the agent
(response parsing, text normalization, etc.) can evict our patterns
and silently re-compile them on the next terminal() call.
Precompiling at module load eliminates both costs:
detect_dangerous_command:
cold 2.613 ms → 0.298 ms (-88%)
warm 0.042 ms → 0.004 ms (-90%)
detect_hardline_command:
cold ~0.6 ms → 0.006 ms
warm 0.011 ms → 0.002 ms
Savings are per terminal() call. Agents with heavy terminal use see
compound savings; the bigger value is the stability guarantee (no
re._cache eviction can silently re-introduce the 2.6 ms cold cost
mid-session).
Implementation:
- HARDLINE_PATTERNS_COMPILED and DANGEROUS_PATTERNS_COMPILED built at
module load from the existing (pattern, description) tuples, using
shared _RE_FLAGS = re.IGNORECASE | re.DOTALL.
- detect_* functions now iterate the compiled list and call pattern_re.search(text).
- Original HARDLINE_PATTERNS and DANGEROUS_PATTERNS lists kept as-is
(other code in the file uses them for key derivation /
_PATTERN_KEY_ALIASES).
Verified:
- 160/161 tests/tools/test_approval*.py pass (1 pre-existing heartbeat
test flake on main).
- 349/349 tests/tools/ 'approval or terminal or dangerous' pass.
- Live hermes chat smoke: 3 benign terminal commands + 1 rm -rf /tmp/
(clarify prompt fired — approval path still works) + 1 sudo (sudo
password prompt fired — DANGEROUS pattern match still works). 23
log lines in the smoke window, zero errors.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Replace the removed built-in boot-md hook (#17093) with a how-to that
shows users how to wire up the same behavior themselves via the hooks
system. Uses _resolve_gateway_model() + _resolve_runtime_agent_kwargs()
so the example works against custom endpoints and OAuth providers,
not just the aggregator defaults that the old built-in silently assumed.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Validate configured providers against both Hermes runtime provider ids and
catalog-normalized provider ids. This keeps providers like ai-gateway from
being rejected after catalog resolution maps them to models.dev ids.
Keep credential checks and vendor-slug warnings anchored to the runtime id
so doctor reports actionable provider names in follow-up diagnostics.
Two amplifying optimizations to per-turn overhead in the gateway:
1. get_tool_definitions() memoization (model_tools.py)
Keyed on (frozenset(enabled), frozenset(disabled),
registry._generation, config.yaml mtime+size). Only active when
quiet_mode=True (which is every hot-path caller — gateway,
AIAgent.__init__); quiet_mode=False keeps the existing print side
effects. Cached path returns a shallow-copy list sharing read-only
schema dicts.
Measured: 7.5 ms → 0.01 ms per call (~750× speedup). Gateway
constructs fresh AIAgent per message, so this saves ~7 ms/turn before
any LLM work.
2. check_fn() TTL cache (tools/registry.py)
check_fn callables like check_terminal_requirements probe external
state (Docker daemon, Modal SDK, playwright binary). For a long-lived
process, hitting them on every get_definitions() pass was pure waste
— external state changes on human timescales. 30 s TTL so env-var
flips (hermes tools enable X) propagate within a turn or two without
explicit invalidation.
Measured: first call 7.5ms → 1.6ms (check_fn probes now dominate);
subsequent calls ~0.01ms via the upstream memoization.
Invalidation surface:
- registry._generation bumps on register/deregister/register_toolset_alias,
invalidating the memoized definitions automatically.
- config.yaml mtime in the cache key captures user-visible config edits
affecting dynamic schemas (execute_code mode, discord allowlist).
- invalidate_check_fn_cache() exposed for explicit flushes (e.g. after
hermes tools enable/disable).
- tests/conftest.py autouse fixture clears both caches before every test
so env-var monkeypatches don't see stale results.
Also fixes a regression from PR #17046 that I missed:
- tools/web_tools.py — Firecrawl was removed from module scope by the
lazy import, breaking 8 tests that patch 'tools.web_tools.Firecrawl'.
Applied the same _FirecrawlProxy pattern used in auxiliary_client/
run_agent for OpenAI (module-level proxy that looks like the class
but imports the SDK on first call/isinstance; patch() replaces the
attribute as usual).
Verified:
- 49/49 tests/tools/test_web_tools_config.py pass (was 8 failing on main)
- 68/68 tests/tools/test_homeassistant_tool.py pass (was 1 failing in
the full suite due to check_fn TTL cross-test pollution; fixed by
the autouse fixture)
- 3887/3895 tests/tools/ (8 pre-existing fails: 2 delegate, 1 mcp
dynamic discovery, 5 mcp structured content — all confirmed on main)
- 2973/2976 tests/agent/ + tests/run_agent/ (3 pre-existing fails)
- 868/868 tests/run_agent/ (excluding test_run_agent.py which has
pre-existing suite-level issues)
- Live smoke: 2 turns + /model switch + tool calls, zero errors in
agent.log session window.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
* fix(tui): honor documented mouse_tracking config key
The TUI runtime was reading display.tui_mouse while docs and user-facing
examples pointed users at display.mouse_tracking. That made persistent
mouse-disable config look like a no-op for users trying to restore native
terminal selection/copy behavior on Linux/SSH/tmux terminals.
Use display.mouse_tracking as the canonical key, keep display.tui_mouse as
a legacy fallback, and have /mouse write the documented key. Both gateway
config.get and client-side config sync now share the same precedence: the
canonical key wins, then the legacy key, then default on.
* review(copilot): align mouse tracking config coercion
- Load gateway config once before deriving display.mouse_tracking state.
- Use key-presence precedence on the TUI client too, so canonical
mouse_tracking wins over legacy tui_mouse even when the value is null.
- Treat numeric 0 as disabled on both gateway and client, matching the
existing string "0" handling.
- Widen ConfigDisplayConfig mouse fields because config.get full returns raw
YAML, not normalized booleans.
This PR groups the TUI fixes that restore macOS Terminal usability and clean up the theme/composer regressions:
- copy transcript selections on macOS drag-release so Terminal.app users can copy while mouse tracking is enabled
- copy composer selections on macOS drag-release; composer selection is internal to TextInput and does not use the global Ink selection bus
- keep IDE Cmd+C forwarding setup macOS-only, and make keybinding conflict checks respect simple when-clause overlap/negation
- force truecolor before chalk initializes (unless NO_COLOR / FORCE_COLOR / HERMES_TUI_TRUECOLOR opt-outs apply) so the default banner keeps its gold/amber/bronze gradient in Terminal.app
- move TUI surfaces onto semantic theme tokens and preserve skin prompt symbols as bare tokens with renderer-owned spacing
- render focused placeholders as dim hint text in TTY mode instead of inverse/selected-looking synthetic cursor text
Round 1 of #17174 hit `nix-lockfile-check` failure. Root cause was
NOT a stale hash — the primary `nix (ubuntu-latest)` and
`nix (macos-latest)` builds passed. GitHub's Magic Nix Cache returned
HTTP 418 (rate-limited / throttled) mid-run, so the rebuild bailed
with `some outputs of '/nix/store/...-npm-deps.drv' are not valid,
so checking is not possible` — no `got:` line for the script to
extract.
The script then incorrectly treated this as 'build failed with no
hash mismatch' and exited 1, breaking the lint on every PR whenever
the cache is throttled.
Now we recognize the throttling/cache-disabled signature and skip
that entry with a warning. A real stale hash still surfaces in the
primary `.#$ATTR` build (separate CI job), so we don't lose
coverage.
`web/package-lock.json` was updated by the design-system refactor
(merged via #17007 + follow-ups: spinner / select / badges / buttons)
without bumping `nix/web.nix::npmDeps.hash`, breaking nix builds on
every PR + main since 2026-04-28T18:46.
Hash sourced from the actual `Check flake` failure output:
specified: sha256-AahWmJ9gDQ9pMPa1FYwUjYdO2mOi6JM9Mst27E0vp68=
got: sha256-+B2+Fe4djPzHHcUXRx+m0cuyaopAhW0PcHsMgYfV5VE=
Standalone single-file fix so it can land fast and clear nix on
every other open PR.
* feat(tui): pluggable busy-indicator styles (kaomoji/emoji/unicode/ascii)
The status-bar `FaceTicker` rotated through wide-and-variable kaomoji
glyphs (`(。•́︿•̀。)`, `( ͡° ͜ʖ ͡°)`, …) every 2.5s. Real display widths range
from ~5 to ~16 columns, so the rest of the bar (cwd, ctx %, voice,
bg counter) shifted on every cycle. Padding the verb alone (#17116)
helped but didn't address the dominant jitter source — the glyph
itself.
Add four indicator styles, configurable + hot-swappable:
* `kaomoji` (default — preserves the existing vibe; verb is now
pad-stable so the only width churn left is the kaomoji itself).
* `emoji` — single 2-col emoji frame (`⚕ 🌀🤔✨🍵🔮`).
* `unicode` — `unicode-animations` braille spinner (1-col, smooth).
* `ascii` — `| / - \` (1-col, max compat).
Wires:
* `display.tui_status_indicator` in `DEFAULT_CONFIG` (default
`kaomoji`).
* New JSON-RPC `config.set/get indicator` keys, narrow allow-list.
* `applyDisplay` reads the field and patches `UiState.indicatorStyle`,
so the existing `mtime` poll picks up `~/.hermes/config.yaml` edits
within ~5s without a TUI restart.
* `/indicator [style]` slash command (alias `/indicator-style`,
subcommand completion `kaomoji|emoji|unicode|ascii`). Bare form
shows the current style; setter fires `config.set` and
optimistically `patchUiState({ indicatorStyle })` so the live TUI
swaps immediately, matching the `/skin` UX.
* `CommandDef("indicator", ..., subcommands=...)` so classic CLI
autocomplete + TUI `complete.slash` both surface it.
* `FaceTicker` decouples spinner cadence from verb cadence — the
glyph runs at the spinner's authored interval (or `FACE_TICK_MS`
for kaomoji), the verb stays on the original 2.5s cycle, and both
re-arm cleanly when style changes.
Tests:
* `normalizeIndicatorStyle` rejects unknown / non-string input.
* `applyDisplay → tui_status_indicator` covers fan-out + fallback.
* `/indicator <style>` hot-swaps `UiState.indicatorStyle` after a
successful `config.set`.
* `/indicator sparkle` rejects with the usage hint and never hits
the gateway.
* Slash-parity matrix gets `'/indicator'` → `config.get`.
Validation:
cd ui-tui && npm run type-check — clean; npm test --run — 398/398.
scripts/run_tests.sh tests/test_tui_gateway_server.py
tests/hermes_cli/test_commands.py — 220/220.
* chore(tui): drop /indicator-style alias to declutter autocomplete
* fix(tui): drop verb-width pad — /indicator handles glyph jitter directly
* fix(tui): unicode indicator style hides the verb (cleanest option)
* refactor(tui): single source of truth for INDICATOR_STYLES; cleaner error format
Round 1 Copilot review on PR #17150:
- Exported `INDICATOR_STYLES` const tuple from `interfaces.ts`;
`IndicatorStyle` union type is derived from it. `useConfigSync`
builds its validation Set from the tuple, and `session.ts` uses it
for both the usage hint and the runtime allow-list — adding/removing
a style now touches one line.
- Backend `config.set indicator` error message: switched
`sorted(allowed)` list repr to `pick one of ascii|emoji|kaomoji|unicode`
(matches the TUI usage hint), and reports the normalized `raw`
instead of the original `value`. Backend allowed tuple now has a
comment pointing back at `INDICATOR_STYLES` so the two stay aligned.
Note: kept the verb portion unpadded per design intent — fixed-width
padding was the exact UX the `/indicator` command was added to remove.
Stable width comes from the glyph; verbs cycling is part of the kawaii
aesthetic. Reply on the verb thread will explain.
* fix(tui): drop type collapse + gate verb timer + DEFAULT_INDICATOR_STYLE
Round 2 Copilot review on PR #17150:
- `tui_status_indicator?: 'ascii' | ... | string` collapses to `string`
in TS — consumers got no narrowing. Documented as plain `string` with
a comment about runtime validation via `normalizeIndicatorStyle`.
- `FaceTicker` always started a 2.5s verb interval, even for the
`unicode` style which hides the verb entirely. Now gated on
`showVerb` from `renderIndicator` — `unicode` stays calm.
Pre-emptive self-review (avoid round 3):
- Three call sites duplicated the literal `'kaomoji'` default
(uiStore, normalizeIndicatorStyle, slash command). Added
`DEFAULT_INDICATOR_STYLE` to interfaces.ts and threaded it through
so changing the default touches one line.
* fix(tui-gateway): normalize config.get indicator output to match TUI render
Round 4 Copilot review on PR #17150: `config.get` for `indicator`
returned the raw `display.tui_status_indicator` value without
validation, so a hand-edited config.yaml with stray casing or an
unknown style would leave `/indicator` printing one thing while
the TUI rendered the kaomoji default (frontend's
`normalizeIndicatorStyle` does this normalization on receive).
Lifted the allow-list to module scope as `_INDICATOR_STYLES` /
`_INDICATOR_DEFAULT`, reused by both `config.set` and `config.get`.
Comment notes the alignment with `INDICATOR_STYLES` /
`DEFAULT_INDICATOR_STYLE` in interfaces.ts so adding/removing a
style is a one-line change on each end.
Tests cover: known value verbatim, casing/whitespace normalize,
unknown→default, unset→default.
* fix(tui-gateway): preserve falsy-input diagnostics in config.set indicator error
Round 5 Copilot review on PR #17150: `raw = str(value or "").strip().lower()`
collapsed any falsy non-string (`0`, `False`, `[]`) to empty string,
so the error message read `unknown indicator: ` with nothing after —
losing the original input.
Switched to `("" if value is None else str(value)).strip().lower()`
so only `None` (the genuine 'no value' case) becomes blank. Used
`{raw!r}` in the error so the diagnostic is unambiguous (`'0'` vs `0`).
Tests:
- known-value happy path (`'EMOJI'` → `'emoji'`)
- falsy non-string inputs (`0` / `False` / `[]`) surface meaningfully
- `None` keeps the blank-repr error
* feat(tui): expand light-terminal auto-detection (HERMES_TUI_THEME, BG hex)
Modern terminals (Ghostty, Warp, iTerm2) don't set COLORFGBG, so the
auto-light path was effectively COLORFGBG-only and silently broken for
many users. Two pragmatic additions, both opt-in, plus a clearer
priority chain:
1. **`HERMES_TUI_THEME=light|dark`** as a symmetric explicit override.
The existing `HERMES_TUI_LIGHT` is fine but reads as boolean noise;
a named theme env var matches `display.skin` muscle memory.
2. **`HERMES_TUI_BACKGROUND` hex/rgb hint.** Lets advanced users
(or a future OSC11 query helper that caches the answer) state a
ground-truth background colour. Decoded to Rec. 709 luma; ≥ 0.6
counts as light.
Priority order is now fully ordered and explainable:
1. `HERMES_TUI_LIGHT` (1/0/true/false/on/off).
2. `HERMES_TUI_THEME=light|dark`.
3. `HERMES_TUI_BACKGROUND` luminance.
4. `COLORFGBG` last field — light slots 7/15 → light, 0–15 → dark
(authoritative when set, so the new TERM_PROGRAM path can never
stomp on a terminal that already volunteered a dark answer).
5. `TERM_PROGRAM` allow-list — empty by default. The slot is left
in place because folks asked for it but populating it risks
wrongly flipping users on Apple_Terminal / iTerm2 dark profiles
to light. Easy to add per terminal once we have signal.
Tests: 5 new cases in `theme.test.ts` covering theme env, background
hex (3- and 6-char), invalid hex falling through, and COLORFGBG taking
precedence over the future allow-list.
Validation: `npm run type-check` clean, `npm test --run` 392/392.
* review(copilot): tighten theme detection comments + drop unnecessary cast
* review(copilot): strict hex regex so partial garbage doesn't slip into luminance
* test(tui): make TERM_PROGRAM allow-list injectable so precedence is provable
Copilot review on PR #17113: `LIGHT_DEFAULT_TERM_PROGRAMS` is empty
in production, so the prior assertion would have passed even if
`detectLightMode` ignored `COLORFGBG` entirely. That defeats the
test's purpose.
`detectLightMode` now takes the allow-list as an optional second
argument (defaults to the production set). The test injects a set
containing `Apple_Terminal`, asserts the allow-list alone WOULD
return light, then asserts `COLORFGBG: '15;0'` overrides it — the
precedence rule is now exercised, not assumed.
* fix(tui): COLORFGBG empty-trailing-field falls through; isolate DEFAULT_THEME tests
Round 2 Copilot review on PR #17113:
1. `Number(colorfgbg.split(';').at(-1))` returns 0 for an empty trailing
field (e.g. `COLORFGBG='15;'` → bg===0), which would have looked
like an authoritative dark slot and incorrectly blocked the
TERM_PROGRAM allow-list. Added a `/^\d+$/` guard before coercion;
non-numeric trailing fields now fall through.
2. Fixed the misleading '0–6 / 8–15 ranges are dark' comment — the
block returns true for bg===15, so the range is actually 0–6 / 8–14.
3. `DEFAULT_THEME` is computed from `process.env` at module-load.
A developer shell with `HERMES_TUI_THEME=light` (or a bright
`HERMES_TUI_BACKGROUND`) would flip it and break local tests.
The DEFAULT_THEME describe blocks now sterilize the relevant env
vars + dynamically import theme.ts (vi.resetModules pattern from
platform.test.ts). fromSkin tests compare against DARK_THEME
directly to decouple them from ambient env.
* test(tui): isolate ALL env-coupled theme symbols, not just DEFAULT_THEME
Round 3 Copilot review on PR #17113: the static top-level imports of
`fromSkin`, `DARK_THEME`, `LIGHT_THEME` evaluated theme.ts before
`importThemeWithCleanEnv` had a chance to clean the env. Because
`fromSkin` closes over `DEFAULT_THEME`, an ambient `HERMES_TUI_THEME=light`
or bright `HERMES_TUI_BACKGROUND` would still flip the base palette
and cause local-only failures.
Removed the static import entirely. Every test now obtains its theme
symbols via `importThemeWithCleanEnv`, including `detectLightMode`
(for consistency, even though it takes env as a parameter).
`fromSkin` tests assert against the cleaned `DEFAULT_THEME` from the
same dynamic import — preserves the actual contract (skins extend the
ambient base palette) without coupling the test to dev-shell state.
Verified by running with HERMES_TUI_THEME=light + HERMES_TUI_BACKGROUND=#ffffff:
all 20 theme tests still pass.
Self-review (avoid round 4):
- Audited other test files importing DEFAULT_THEME (syntax.test.ts,
streamingMarkdown.test.ts, constants.test.ts) — all just pass it as
a parameter or assert palette property existence (works on both
light + dark), so no env coupling there.
* fix(tui-gateway): harden stdio transport against half-closed pipes + SIGTERM races
`tui_gateway` reports `tui_gateway_crash.log` traces where the main
thread sits in `sys.stdin` while a worker holds `_stdout_lock` mid-
flush, and SIGTERM then calls `sys.exit(0)` while the lock is still
held — the interpreter shutdown stalls behind the wedged write.
Two narrowly scoped hardenings:
**`tui_gateway/transport.py`**
* Move JSON serialisation outside the lock — long messages no longer
block sibling writers while we serialise.
* Treat `BrokenPipeError`, `ValueError` ("I/O on closed file") and
generic `OSError` from both `write` and `flush` as "peer is gone":
return `False` instead of bubbling, matching what `write_json`'s
callers in `entry.py` already expect.
* Split `flush` into its own try block so a stuck flush never strands
a partial write or holds the lock indefinitely on its way out.
* Optional `HERMES_TUI_GATEWAY_NO_FLUSH=1` env knob to skip explicit
`flush()` entirely on environments where a half-closed read pipe
produces an indefinite kernel-level block. Default unchanged.
**`tui_gateway/entry.py`**
* `_log_signal` now spawns a 1-second daemon timer that calls
`os._exit(0)` if the orderly `sys.exit(0)` path is itself stuck
behind a wedged worker. Atexit handlers run inside the grace
window when they can; the timer is the safety net so a deadlocked
flush no longer strands the gateway process.
Tests:
* `test_write_json_closed_stream_returns_false` — ValueError path.
* `test_write_json_oserror_on_flush_returns_false` — OSError on flush
must not strand the lock; the write portion still landed before the
flush failure.
* `test_write_json_no_flush_env_skips_flush` — env knob bypass.
Validation: `scripts/run_tests.sh tests/tui_gateway/test_protocol.py`
(42/42 pass; one pre-existing failure on
`test_session_resume_returns_hydrated_messages` is unrelated to this
change — same `include_ancestors` mock kwarg issue tracked elsewhere).
`scripts/run_tests.sh tests/test_tui_gateway_server.py` 90/90 pass.
* review(copilot): tighten transport hardening comments + test cleanup
* review(copilot): narrow exception capture, configurable grace, simpler no-flush test
* fix(tui-gateway): narrow ValueError to closed-stream; surface UnicodeEncodeError
Copilot review on PR #17118: `UnicodeEncodeError` is a ValueError
subclass, so a non-UTF-8 stdout (mismatched PYTHONIOENCODING / locale)
would have been silently swallowed as 'peer gone' under
`except ValueError`. That hides a real environment bug.
Now:
- UnicodeEncodeError → log with exc_info (warning) and drop the frame
- ValueError where str(e) contains 'closed file' → peer gone, return False
- Any other ValueError → log loudly, drop frame (defensive, but visible)
Same shape applied to flush. Adds two regression tests.
* fix(tui-gateway): reserve write() False for peer-gone; re-raise programming errors
Round 2 Copilot review on PR #17118: `Transport.write()` returning
`False` is documented as 'peer is gone', and `entry.py` reacts by
calling `sys.exit(0)`. But the implementation also returned False
for non-IO conditions (non-JSON-safe payloads, UnicodeEncodeError,
unrelated ValueErrors), so a programming error or local env bug would
present as a clean disconnect — exactly the diagnosis pain we wanted
to eliminate.
Now:
- `json.dumps` failure → re-raises (TypeError/ValueError surfaces in crash log)
- `BrokenPipeError` → False (peer gone)
- `ValueError('...closed file...')` → False (peer gone)
- `UnicodeEncodeError` and any other ValueError → re-raise
- `OSError` → False (existing IO-failure semantics, debug-logged)
Tests updated to assert the re-raise behaviour and added a
non-serializable-payload regression test.
* fix(tui-gateway): narrow OSError to peer-gone errnos; honest test naming
Round 3 Copilot review on PR #17118:
- Docstring claimed False = peer gone, but generic OSError on write/flush
also returned False — meaning ENOSPC/EACCES/EIO would silently exit.
Added `_PEER_GONE_ERRNOS = {EPIPE, ECONNRESET, EBADF, ESHUTDOWN, +WSA}`
and narrowed the OSError handlers; non-peer-gone errnos re-raise.
Docstring now lists OSError as peer-gone branch with the errno set.
- The `_DISABLE_FLUSH` test was named after the env var but actually
patched the module constant. Renamed it to reflect the contract being
tested (skips flush when constant is true) AND added a real
end-to-end test that sets the env var, reloads transport.py, and
asserts the constant flips. Cleanup reload restores defaults so
parallel tests stay isolated.
Self-review (avoid round 4):
- Verified TeeTransport's secondary-swallow stays intentional.
- _log_signal grace path already covered by separate tests.
* fix(tui): honor display.busy_input_mode in TUI v2
The TUI v2 frontend hard-coded `composerActions.enqueue(full)` whenever
`ui.busy` was true. The classic CLI and gateway adapters honor the
`display.busy_input_mode` config key (`interrupt` | `queue` | `steer`),
but Ink ignored it — sending a message during a long-running turn always
landed in the queue regardless of config. The config default is already
`interrupt` (hermes_cli/config.py), so users who explicitly opted into
that experience were silently stuck on the legacy queue path.
This wires the value through the existing config-sync surface:
* `applyDisplay` now reads `display.busy_input_mode`, defaults to
`interrupt` (matching `_load_busy_input_mode` in tui_gateway), and
drops it into a new `UiState.busyInputMode` field.
* `dispatchSubmission` and the queue-edit fall-through call a shared
`handleBusyInput` helper that branches on the mode:
* `queue` — legacy behavior, append to the queue.
* `steer` — call `session.steer`; on rejection, fall back to
queue with a sys note.
* `interrupt` — `turnController.interruptTurn(...)` then `send()`,
so the new prompt actually moves.
* Mtime polling in `useConfigSync` already re-applies `config.full`, so
flipping `display.busy_input_mode` in `~/.hermes/config.yaml` takes
effect on the next 5s tick without restarting the TUI.
Tests:
* `applyDisplay → busy_input_mode` covers normalization + UiState fan-out.
* `normalizeBusyInputMode` mirrors the Python side's allow-list.
Validation:
* `npm run type-check` (in `ui-tui/`) — clean.
* `npm test --run` (in `ui-tui/`) — 394/394.
* review(copilot): narrow busy_input_mode type, preserve queue order on steer fallback
* review(copilot): clarify handleBusyInput comment (option, not return value)
* fix(tui): default busy_input_mode to queue in TUI (CLI keeps interrupt)
In a full-screen TUI users typically author the next prompt while the
agent is still streaming, so an unintended interrupt loses in-flight
typing. TUI fallback now defaults to `queue`; CLI / messaging
adapters keep `interrupt` as the framework default.
Override per-config via `display.busy_input_mode: interrupt` (or
`steer`) — the normalize/wire path is unchanged, only the missing-
value branch differs from the Python default.
uiStore initial value also flipped to `queue` so first-frame render
before `config.full` lands matches the eventual normalized value.
`turnController.recordMessageComplete` and `recordMessageDelta` both
prioritised `payload.rendered` over `payload.text`. `payload.rendered`
is the Rich-Console output `tui_gateway` builds for terminals that
can't render markdown themselves; the TUI already renders markdown via
`<Md>`. Two real bugs follow:
1. **Final answer garbled when `display.final_response_markdown: render`
is set** (#16391). Raw ANSI escape sequences pass through into the
React tree and the user sees overlapping coloured text instead of
their answer.
2. **Streaming silently drops content.** Per-delta `rendered` is an
*incremental* Rich fragment. The previous code did
`this.bufRef = rendered ?? this.bufRef + text`, which on every tick
replaced the whole accumulated buffer with the latest mid-sequence
ANSI fragment. Long replies arrived truncated and looked
half-painted — easy to miss as "model is being terse" instead of a
client bug.
Fix:
* `recordMessageComplete` now prefers `payload.text`, falling back to
`payload.rendered` only when the gateway elected not to send any.
* `recordMessageDelta` always accumulates `text`; `rendered` is ignored
on the streaming path entirely (Ink does its own markdown render via
`<Md>` / `streamingMarkdown.tsx`).
Tests:
* `prefers raw text over Rich-rendered ANSI on message.complete` —
the assistant message reflects raw markdown, not ANSI.
* `falls back to payload.rendered when text is missing` — preserves
the legacy "no `text`, only ANSI" path used by some adapters.
* `always accumulates raw text in message.delta and ignores rendered` —
pre-fix code would have made this assertion fail because each delta
overwrote the buffer.
Validation: `npm run type-check` clean, `npm test --run` 392/392 pass.
* fix(tui): make /browser connect actually take effect on the live agent
Reports were that `/browser connect <url>` (and "changes to CDP url
don't get picked up") didn't propagate to the live agent in `--tui`,
forcing users to fall back to setting `browser.cdp_url` in
`config.yaml` and restarting. Tracing the path on current main shows
the protocol wiring is already correct — `/browser` is registered in
`ui-tui/src/app/slash/commands/ops.ts` and dispatches `browser.manage`
through the gateway RPC, NOT the slash worker (covered by the
`browser.manage` row in `slashParity.test.ts`). But three real gaps
left the experience flaky:
1. `cleanup_all_browsers()` ran AFTER `os.environ["BROWSER_CDP_URL"]`
was rewritten. `_ensure_cdp_supervisor(...)` reads the env to
resolve its target URL, so a tool call landing in that brief window
could re-attach the supervisor to the OLD CDP endpoint just before
we reaped sessions, leaving the agent talking to a dead URL.
Reorder to clean first, swap env, clean again so the supervisor
for the default task is definitively closed.
2. `browser.manage status` reported only the env var, ignoring
`browser.cdp_url` from config.yaml. `_get_cdp_override()` (the
resolver the agent itself uses) consults both — match it so
`/browser status` answers the same question the next
`browser_navigate` will see. Closes a stealth bug where users
saw "browser not connected" while their CDP URL was perfectly
set in config.yaml.
3. `/browser disconnect` only cleared `BROWSER_CDP_URL` and reaped
once, leaving the same swap window as connect. Symmetrical
double-cleanup here too.
Frontend (`ops.ts`):
* Echo "next browser tool call will use this CDP endpoint" on success
so users see immediate confirmation that the gateway accepted the
swap, even before any tool runs.
* Mention `browser.cdp_url` in `config.yaml` in the usage hint and
the not-connected status line. Persistent config is the correct
fix for some terminal-multiplexer / sub-agent flows where env
inheritance is unreliable; surfacing it makes that workaround
discoverable.
Tests (4 new, all hermetic):
* `status` returns the resolved URL when only `browser.cdp_url` is
set in config.yaml.
* `connect` writes env AND cleans before/after, in that order.
* `connect` against an unreachable endpoint does NOT mutate env or
reap.
* `disconnect` removes env and cleans twice.
Validation:
scripts/run_tests.sh tests/test_tui_gateway_server.py — 94/94 pass.
cd ui-tui && npm run type-check — clean; npm test --run — 389/389.
* review(copilot): always defer to _get_cdp_override; normalize bare host:port
* review(copilot): collapse discovery-style CDP paths so /json/version isn't duplicated
* fix(tui): /browser status must not perform CDP discovery I/O
Copilot review on PR #17120: previous version routed through
`tools.browser_tool._get_cdp_override`, which calls
`_resolve_cdp_override` and performs an HTTP probe to /json/version
with a multi-second timeout for discovery-style URLs. That blocks
the TUI on `/browser status` whenever the configured host is slow
or unreachable.
Status now reads env-then-config directly with no network I/O. The
WS normalization still happens in `browser_navigate` for actual
tool calls, so behaviour-on-call is unchanged.
* fix(tui): skip /json/version probe for concrete ws://devtools/browser endpoints
Round 2 Copilot review on PR #17120: hosted CDP providers (Browserbase,
browserless, etc.) return concrete `ws[s]://.../devtools/browser/<id>`
URLs which are already directly connectable but don't serve the HTTP
discovery path. The previous `/json/version` probe rejected these
valid endpoints with 'could not reach browser CDP'.
For `ws[s]://...` URLs whose path starts with `/devtools/browser/` we
now do a TCP-level reachability check (`socket.create_connection`)
instead of the HTTP probe. The actual CDP handshake happens on the
next `browser_navigate` call, so we still surface unreachable hosts
as 5031 errors — just without the false negatives.
Discovery-style URLs (`http://host:port[/json[/version]]`) keep the
HTTP probe path unchanged. Updated existing test + added two new
ones (TCP-only success, TCP unreachable → 5031).
* feat(tui): opt-in auto-resume of the most recent session
`hermes --tui` always forges a fresh session at startup unless the user
sets `HERMES_TUI_RESUME=<id>`. Disconnects, terminal-window crashes,
and accidental Ctrl+D therefore lose every piece of in-flight context
even though `state.db` still has the full history a `/resume` away.
Add an opt-in path that mirrors classic CLI's `hermes -c` muscle
memory: when `display.tui_auto_resume_recent: true` is set in
`~/.hermes/config.yaml`, the TUI looks up the most recent human-facing
session and resumes it instead of starting fresh. Default off so
existing users aren't surprised; explicit `HERMES_TUI_RESUME` always
wins.
Wires:
* New `session.most_recent` JSON-RPC in `tui_gateway/server.py` that
returns the first non-`tool` row from `list_sessions_rich`, or
`{"session_id": null}` when none. Uses the same deny-list as
`session.list` so sub-agent rows can't sneak in.
* `createGatewayEventHandler.handleReady` re-ordered: explicit
`STARTUP_RESUME_ID` first (unchanged), then conditional auto-resume
via `config.get full → display.tui_auto_resume_recent`, then the
legacy `newSession()` fallback. Failures of either RPC fall back
to `newSession()` so the path is always finite.
* Default `display.tui_auto_resume_recent: False` added to
`DEFAULT_CONFIG` in `hermes_cli/config.py` (no `_config_version`
bump per AGENTS.md — deep-merge handles the additive key).
Tests:
* 4 new vitest cases in `createGatewayEventHandler.test.ts` cover
every gate-and-fallback combination (env wins, config off, config
on with hit, config on with miss).
* 3 new pytest cases for `session.most_recent` (denied row skip,
tool-only → null, db-unavailable → null).
Validation:
scripts/run_tests.sh tests/test_tui_gateway_server.py — 93/93.
cd ui-tui && npm run type-check — clean; npm test --run — 393/393.
* review(copilot): fold session.most_recent errors into null + extend ConfigDisplayConfig
* review(copilot): cover RPC-rejection fallbacks in auto-resume tests
* fix(tui): drop stale stream events after ctrl-c interrupt
Once interruptTurn() flips this.interrupted, only recordMessageDelta
short-circuited. recordReasoningDelta/Available, recordToolStart/
Progress/Complete, and recordInlineDiffToolComplete kept populating
turnState until the python loop reached its next _interrupt_requested
check (~1s on busy turns), making it look like ctrl-c was ignored
while late "thinking" + tool calls kept landing in the UI.
Add the same interrupted guard to every stream-side recorder, and
clear the flag at startMessage() so the next turn isn't suppressed
if the previous turn never delivered message.complete.
* fix(tui): guard recordTodos against post-interrupt mutation; fake-timers in test
Copilot review on PR #16706:
1. `recordToolStart` is interruption-guarded, but `tool.start`
handler also calls `recordTodos(payload.todos)` first — so a
late tool.start carrying todos could still mutate `turnState.todos`
after Ctrl-C, leaving ghost rows in the panel. Adds the same
`if (this.interrupted) return` early-exit to `recordTodos` so
*all* tool.start side-effects are dropped post-interrupt.
2. The interrupt test was leaking a real `setTimeout` (interrupt
cooldown) across test files, which could fire later and mutate
uiStore from the wrong test context. Wraps the test in
`vi.useFakeTimers()` + `vi.runAllTimers()` and restores real
timers in finally.
3. Extends the same test with a todos payload on the post-interrupt
tool.start so we have explicit regression coverage for #1.
* fix(tui): guard pushTrail post-interrupt; harden interrupt-test cleanup
Round 2 Copilot review on PR #16706:
1. `tool.generating` events route through `pushTrail`, which was not
interruption-guarded — late events could still write 'drafting …'
into `turnTrail` after Ctrl-C, leaving a stale shimmer in the UI.
Adds the same `if (this.interrupted) return` early-exit.
2. Test cleanup moved `vi.runAllTimers()` into `finally` (before
`vi.useRealTimers()`) so a mid-test assertion failure can't leak
the interrupt-cooldown setTimeout across other test files.
3. Replaced the misleading 'pre-interrupt todos … expected to be
cleared by the interrupt cycle' comment with an accurate one
reflecting current behaviour (interrupt does NOT clear todos).
4. Added an explicit assertion that a post-interrupt `tool.generating`
event does not extend `turnTrail` — regression coverage for #1.
* fix(tui): append gateway stderr tail to start_timeout activity
`gateway.start_timeout` previously published only `cwd` + `python`,
which made TUI startup failures hard to disambiguate. The user saw
`gateway startup timed out · /path/to/python /repo · /logs to inspect`
with no signal whether the actual cause was a wrong python interpreter,
a missing dependency, or a config parse failure.
Plumb a 20-line stderr tail through the event so the most useful lines
land directly in the TUI activity feed, capped to the last 8 non-empty
lines for readability:
* `gatewayClient.ts` — collect `getLogTail(20)` when the readyTimer
fires and attach it as `payload.stderr_tail`.
* `gatewayTypes.ts` — extend the `gateway.start_timeout` event union
with the new optional field.
* `createGatewayEventHandler.ts` — emit the trimmed lines after the
existing `gateway startup timed out` activity entry, classified
`error`.
Tests: regression test in `createGatewayEventHandler.test.ts` checks
that `ModuleNotFoundError` / `FileNotFoundError` lines from the tail
land in `getTurnState().activity` so they show up in the UI immediately.
Validation: `npm run type-check` clean, `npm test --run` 390/390.
* review(copilot): filter blanks before slice and cap stderr tail at 120 chars
Tables rendered through `<Md>` had no separator and no header weight,
so they read as a paragraph with extra whitespace. This adds two tiny,
border-free changes that survive Ink's grapheme-approximate column
widths better than a full outline:
* Bold the header row, keeping the existing amber colour.
* Insert a dim `─`-dashed rule between the header and body rows.
We deliberately stay away from a full outline — column widths are
measured via `stripInlineMarkup(...).length`, which is grapheme-aware
but still off by a cell on East Asian wide characters and emoji-mid-
cell strings. A header rule plus the existing 2-space column gap
gives the visual hierarchy the issue asks for without amplifying that
inaccuracy into a misaligned border.
Validation: `npm run type-check` clean, `npm test --run` 389/389.
- Remove dead _lmstudio_loaded_context attribute from run_agent.py (set
but never read — the loaded context is pushed to context_compressor.update_model
which is the actual consumer)
- Cache empty reasoning options with 60s TTL to avoid per-turn HTTP probe
for non-reasoning LM Studio models. Non-empty results cached permanently.
- Extract _lmstudio_server_root(), _lmstudio_request_headers(), and
_lmstudio_fetch_raw_models() shared helpers in models.py — eliminates
URL-strip + auth-header + HTTP-call duplication across probe_lmstudio_models,
ensure_lmstudio_model_loaded, and lmstudio_model_reasoning_options
- Revert runtime_provider.py base_url precedence change: preserve the
established contract (saved config.base_url > env var > default) for all
api_key providers
- Remove unnecessary config version bump 22→23
- Fix TUI test: relax target_model assertion to avoid module-cache flake
- AUTHOR_MAP: added rugved@lmstudio.ai → rugvedS07
CopilotACPClient communicates via subprocess stdio and returns a plain
SimpleNamespace from _create_chat_completion(). The streaming path tries
to iterate this as a stream, crashing with:
TypeError: 'types.SimpleNamespace' object is not iterable
Mirror the existing ACP exclusion pattern (used for Responses API upgrade)
to disable streaming when provider is copilot-acp or base_url starts with
acp:// or acp+tcp://.
Based on PR #9428 by @ningfangbin and issue #16271 by @Joseph19820124.
Fixes#16271
* ci(nix): auto-fix stale npm hashes on push to main
When a PR merges to main with updated package-lock.json or package.json
in ui-tui/ or web/, the new auto-fix-main job detects stale npmDepsHash
values and pushes a fix commit directly to main.
This eliminates the recurring manual hash-bump PRs (#15420, #15314,
#15272, #15244) by reusing the existing fix-lockfiles --apply pipeline.
The fix commit only touches nix/*.nix files, which are outside the push
path filter (package-lock.json / package.json), so it cannot re-trigger
itself.
Closes#15314
* fix(ci): use GitHub App token for auto-fix-main push
GITHUB_TOKEN commits are invisible to workflow triggers (GitHub's
infinite-loop prevention). The auto-fix-main job pushes directly to
main, so the fix commit never triggered downstream nix.yml verification.
Mint a short-lived token via the repo's GitHub App (daimon-nous, APP_ID
+ APP_PRIVATE_KEY secrets) so the push is treated as a real event and
nix.yml fires to verify the corrected hashes.
Tested via workflow_dispatch dry-run: app token minted successfully,
checkout with app token succeeded, fix job correctly gated.
Resolves review feedback from Bugbot (r3144569551).
* ci(nix): rename lockfile check job for required status check
Rename 'check' → 'nix-lockfile-check' so the status check name is
unambiguous when added as a required check on main.
* fix(ci): harden auto-fix-main against races, loops, and silent failures
Address adversarial review findings:
1. Race condition (#1): Job-level concurrency with cancel-in-progress
collapses back-to-back pushes; ref: main checkout always gets latest
branch state; explicit push target (origin HEAD:main).
2. Loop prevention (#2): File-whitelist check before commit aborts if
any file outside nix/{tui,web}.nix was modified, preventing
accidental self-triggering.
3. Silent infra failures (#8): nix-lockfile-check now fails explicitly
when fix-lockfiles exits without reporting stale status (catches nix
setup failures, network errors, script bugs that bypass continue-on-error).
4. Commit traceability (#11): Auto-fix commits include source SHA and
workflow run URL in the commit body.
5. Explicit push target (#12): git push origin HEAD:main instead of
bare git push.
---------
Co-authored-by: alt-glitch <alt-glitch@users.noreply.github.com>
* fix(nix): make extraPackages actually work — wire into per-user profile
#17030 deprecated extraPackages because it only set the systemd service
PATH, which the terminal backend's login-shell snapshot discards.
Instead of deprecating, fix it: set users.users.${cfg.user}.packages
so NixOS builds a per-user profile at /etc/profiles/per-user/hermes/bin.
This path is included in PATH by /etc/set-environment, which the login
shell sources, so the terminal backend's snapshot picks it up.
One line of actual logic:
users.users.${cfg.user}.packages = cfg.extraPackages;
Verified in a NixOS VM test: su - hermes -c 'which hello' resolves
to /etc/profiles/per-user/hermes/bin/hello.
Reverts the deprecation warning and docs changes from #17030, restores
extraPackages as the recommended way to give the agent extra tools.
Container mode is unaffected — extraPackages was always native-only
(the systemd path line is inside !cfg.container.enable).
* nix: clarify additive merge semantics for extraPackages user profile
---------
Co-authored-by: Siddharth Balyan <daimon@noreply.github.com>
BOOT.md was merged in PR #3733 before the feature was ready — the
built-in hook spawned a bare AIAgent() with no model/runtime kwargs,
which immediately 401s on any provider with a custom endpoint. Three
separate community PRs (#5240, #12514, #14992) tried to paper over it.
Remove the BOOT.md hook entirely and its user-facing docs/tips. Keep
the gateway/builtin_hooks/ package and the HookRegistry._register_builtin_hooks()
hook-point intact as the extension surface for future always-on
gateway hooks.
Closes#5239.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
* perf(startup): lazy-import OpenAI, Anthropic, Firecrawl, account_usage
Four heavy SDK/module imports are now deferred off the hot startup path.
Net savings on cold module imports:
cli 1200 → 958 ms (-242)
run_agent 1220 → 901 ms (-319)
tools.web_tools 711 → 423 ms (-288)
agent.anthropic_adapter 230 → 15 ms (-215)
agent.auxiliary_client 253 → 68 ms (-185)
Four independent changes in one PR since they all use the same pattern
and share the same risk profile (heavy SDK import → lazy proxy or
function-local import):
1. tools/web_tools.py:
'from firecrawl import Firecrawl' moved into _get_firecrawl_client(),
which is only called when backend='firecrawl'. Users on Exa/Tavily/
Parallel pay zero firecrawl cost.
2. cli.py + gateway/run.py:
'from agent.account_usage import ...' moved into the /limits handlers.
account_usage transitively pulls the OpenAI SDK chain; only needed
when the user runs /limits.
3. agent/anthropic_adapter.py:
'try: import anthropic as _anthropic_sdk' replaced with a cached
'_get_anthropic_sdk()' accessor. The three usage sites
(build_anthropic_client, build_anthropic_bedrock_client,
read_claude_code_credentials_from_keychain) now resolve via the
accessor. All pre-existing test patches of
'agent.anthropic_adapter._anthropic_sdk' keep working because the
accessor respects any value already in module globals.
4. agent/auxiliary_client.py AND run_agent.py:
'from openai import OpenAI' replaced with an '_OpenAIProxy()' module-
level object that looks like the OpenAI class but imports the SDK on
first call/isinstance check. This preserves:
- 15+ in-module OpenAI(...) construction sites in auxiliary_client
and the single site in run_agent's _create_openai_client (Python's
function-scope name lookup finds the proxy, forwards the call);
- 'patch("agent.auxiliary_client.OpenAI", ...)' and
'patch("run_agent.OpenAI", ...)' test patterns used by 28+ test
files (patch replaces the module attribute as usual).
Tried two alternatives first:
- 'from openai._client import OpenAI' — doesn't skip openai/__init__.py
(the audit's hypothesis here was wrong).
- Module-level __getattr__ — works for external access but Python
function-scope name resolution skips __getattr__, so in-module
OpenAI(...) calls NameError.
Note: 'openai' still loads on 'import cli' because
cli.py -> neuter_async_httpx_del() -> openai._base_client, and
run_agent.py -> code_execution_tool.py (module-level
build_execute_code_schema) -> _load_config() -> 'from cli import
CLI_CONFIG'. Deferring those is a separate, larger change — out of scope
for this PR. The savings above all come from avoiding the openai/*,
anthropic/*, and firecrawl/* top-level type-tree imports on paths that
don't need them.
Verified:
- 302/302 tests in tests/agent/{test_anthropic_adapter,
test_bedrock_1m_context, test_minimax_provider, test_anthropic_keychain}
pass. Two pre-existing failures on main unchanged.
- 106/106 tests/agent/test_auxiliary_client.py pass (1 pre-existing fail).
- 97/97 tests/run_agent/test_create_openai_client_kwargs_isolation.py,
test_plugin_context_engine_init.py, test_invalid_context_length_warning.py,
test_api_max_retries_config.py,
tests/hermes_cli/test_gemini_provider.py, test_ollama_cloud_provider.py
pass (1 pre-existing fail).
- Live hermes chat smoke: 2 turns + /model switch + tool calls, zero
errors in the 57-line agent.log window.
- Module-level import of run_agent + auxiliary_client + anthropic_adapter
no longer pulls 'anthropic' or 'firecrawl' at all.
* fix(gateway): restore top-level account_usage import for test-patch surface
CI caught two failures in tests/gateway/test_usage_command.py that I
missed locally:
AttributeError: 'module' object at gateway.run has no attribute 'fetch_account_usage'
The test uses monkeypatch.setattr('gateway.run.fetch_account_usage', ...)
to inject a fake account-fetch call. Moving the import inside the
handler deleted that module-level attribute, breaking the patch surface.
Restoring the top-level import in gateway/run.py gives up the ~230 ms
gateway-boot savings from that one lazy, but:
1. the gateway is a long-running daemon — boot cost is paid once per
install, not per turn;
2. the other four lazy-imports (firecrawl, openai, anthropic, cli's
account_usage) remain in place and still account for the bulk of
the savings reported in the PR body;
3. preserving the patch surface keeps the established
'gateway.run.fetch_account_usage' monkeypatch pattern working
without touching tests.
Verified: tests/gateway/test_usage_command.py — 8 passed, 0 failed.
Full targeted sweep (2336 tests across agent/gateway/hermes_cli/run_agent):
2332 passed, 4 failed — all 4 pre-existing on main.
---------
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
load_config() and read_raw_config() now cache their result keyed on
the config file's (mtime_ns, size). On cache hit they return a deepcopy
of the cached value, skipping yaml.safe_load + deep-merge + normalize +
env-var expansion entirely. save_config() + migrate_config() write via
atomic_yaml_write which produces a fresh inode, so stat() sees a new
mtime_ns and the next load repopulates automatically — no explicit
invalidation hook needed.
Measured per-call cost:
load_config() cold: 13.3 ms
load_config() cached: 0.23 ms (57x faster)
read_raw_config() cached: 0.13 ms
A single gateway turn hits the config 5-15 times (session context,
auxiliary client resolution, memory config, plugin hooks, approval
lookups, per-tool settings). That's 65-200 ms/turn of pure YAML
re-parsing on main. After this change: 1-3 ms/turn.
Also migrates gateway/run.py's 6 direct yaml.safe_load(config.yaml)
call sites through _load_gateway_config, which now shares the
read_raw_config cache when _hermes_home agrees with the canonical
config path. The direct-read fallback is retained for tests that
monkeypatch gateway_run._hermes_home without touching HERMES_HOME.
Safety:
- load_config() returns a deepcopy on every call; the 67+ call sites
that mutate the result (cfg["model"]["default"] = ..., etc.) can't
corrupt the cache.
- save_config() / atomic_yaml_write bump mtime, naturally invalidating
the cache for the next reader.
- Cache is keyed on str(config_path), so HERMES_HOME profile switches
don't collide.
Verified:
- 112 config tests pass (test_config, test_config_env_expansion,
test_config_env_refs, test_config_drift, test_config_validation,
test_aux_config).
- 87 gateway tests pass (test_verbose_command, test_session_info,
test_compress_focus, test_runtime_footer, test_resume_command,
test_reasoning_command, test_approve_deny_commands,
test_run_progress_interrupt).
- Live hermes chat smoke — 2 turns + /model switch + tool calls,
zero errors in agent.log.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Previously, check_browser_requirements() only checked for the agent-browser
CLI, not the Chromium binary it drives. When the CLI was present but
Chromium wasn't (common in Docker images predating the playwright install
step), the browser tool was advertised to the agent, every call hung for
the full command timeout (~30s each, ~220s for a chained navigate), and
the agent eventually gave up with no useful error — users saw 'browser
not working' with empty errors.log.
Changes:
- tools/browser_tool.py: add _chromium_installed() checking
PLAYWRIGHT_BROWSERS_PATH + default Playwright cache paths for
chromium-* / chromium_headless_shell-* dirs; wire into
check_browser_requirements() for local mode (cloud providers
unaffected). _run_browser_command fails fast with an actionable
Docker vs. host message instead of hanging. _running_in_docker()
checks /.dockerenv and /proc/1/cgroup.
- hermes_cli/tools_config.py: post_setup for 'Local Browser' now runs
'agent-browser install --with-deps' after npm install to actually
download Chromium. In Docker, points user at the updated image pull
instead of trying to install into a read-only layer. Cloud-provider
post_setup (browserbase) skips Chromium install entirely.
- tests/tools/test_browser_chromium_check.py: new tests covering
search roots, install detection, requirements branches (local/cloud/
camofox), and the fast-fail guard in docker/non-docker contexts.
- tests/tools/test_browser_homebrew_paths.py: 5 existing subprocess-path
tests now mock _chromium_installed=True since they exercise the
post-guard subprocess path.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Discord's per-app command-management bucket is ~5 writes / 20 s. A
mass-prune-plus-upsert reconcile (77 orphans + 30 desired = 107 writes
in the reported case) can't finish under the old flat 30 s budget, and
the subsequent reconnect retries inside the rate-limit cooldown also
time out — leaving slash commands broken for ~60 min until the bucket
fully recovers.
Bump the timeout to 600 s so realistic bursts drain, update the warning
message to point at the saturated bucket instead of a hardcoded 30 s.
The 600 s cap still guards against a true hang.
Credit to @Tranquil-Flow for PR #16739 and @davidbordenwi for reporting
#16713 with the bucket-math diagnosis.
Closes#16713.
Co-authored-by: Teknium <teknium@nousresearch.com>
The telegram.reactions key was already wired up (gateway/config.py bridges
it to TELEGRAM_REACTIONS at startup) but was undocumented and missing from
DEFAULT_CONFIG, so users had no way to discover it. Add it with the
existing off-by-default behavior preserved.
No behavior change — runtime default stays False.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
extraPackages adds packages to the systemd service PATH, but the
terminal backend's login-shell snapshot rebuilds PATH from NixOS system
profiles, so tools added via extraPackages are invisible to terminal
commands, skills, and cron jobs — the entire use case.
Changes:
- Mark the option description as deprecated with explanation
- Emit a NixOS warning when extraPackages is non-empty, including a
ready-to-paste environment.systemPackages replacement
- Update docs: quick-reference table, plugin example, and options
reference all point to environment.systemPackages
The option still functions (non-breaking) so existing configs keep
working while users migrate.
Auxiliary tasks (title_generation, vision, compression, web_extract,
session_search) now pick the correct wire protocol based on the
endpoint, not just on which resolve_provider_client branch built the
client. Fixes 404s on Kimi Coding Plan and any other named provider
whose endpoint speaks Anthropic Messages.
Root cause: the 'api_key' branch of resolve_provider_client (and the
Step 2 fallback chain inside _resolve_auto) always built a plain
OpenAI client regardless of what the endpoint actually spoke. For
provider=kimi-coding + model=kimi-for-coding, that meant:
POST https://api.kimi.com/coding/v1/chat/completions
{ "model": "kimi-for-coding", ... }
→ 404 resource_not_found_error
The /coding route only accepts the Anthropic Messages shape (the main
agent already uses api_mode=anthropic_messages for it). Earlier fixes
(#16819, #22ddac4b1) patched the anonymous-custom, named-custom, and
external-process branches — but the named api_key branch (kimi-coding,
minimax, zai, future /anthropic providers) was the fourth sibling and
never got the same treatment.
Fix: one module-level helper _maybe_wrap_anthropic() that rewraps a
plain OpenAI client in AnthropicAuxiliaryClient when:
- api_mode is explicitly 'anthropic_messages', OR
- the URL ends in '/anthropic', OR
- the host is api.kimi.com + path contains '/coding', OR
- the host is api.anthropic.com.
Wired into _wrap_if_needed (covers all resolve_provider_client
branches that already go through it) and into the Step 2 api_key
fallback chain inside _resolve_auto. Explicit api_mode still wins:
passing api_mode='chat_completions' forces OpenAI wire, and already-
wrapped specialized adapters (Codex, Gemini native, CopilotACP) pass
through unchanged.
E2E verified:
- resolve_provider_client('kimi-coding', 'kimi-for-coding')
→ AnthropicAuxiliaryClient (was plain OpenAI, which 404'd)
- _resolve_auto Step 1 for kimi-coding runtime → AnthropicAuxiliaryClient
- resolve_provider_client('openrouter', ...) → plain OpenAI (no regression)
- api_mode='chat_completions' override → plain OpenAI (explicit wins)
Tests:
- tests/agent/test_auxiliary_transport_autodetect.py (new): 21 tests
covering URL detection, wrap decisions, and integration.
- 204/205 existing auxiliary tests pass (1 pre-existing failure on
main, unrelated to this change).
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Append a compact 'model · 68% · ~/projects/hermes' footer to the FINAL
message of each turn, disabled by default (display.runtime_footer.enabled).
Answers the Telegram-side parity ask: runtime context that the CLI status
bar already shows is now available in messaging replies when enabled.
Wiring:
- gateway/runtime_footer.py: resolve_footer_config + format_runtime_footer +
build_footer_line. Pure-function renderer; per-platform overrides under
display.platforms.<platform>.runtime_footer.
- gateway/run.py: appends footer to response right after reasoning prepend
so it lands only on the final message (never tool progress or streaming
chunks). When streaming already delivered the body (already_sent), the
footer is sent as a small trailing message instead.
- agent_result now exposes context_length alongside last_prompt_tokens so
the footer can compute the pct; both gateway return paths updated.
- /footer [on|off|status] slash command, wired in CLI (cli.py) and gateway
(gateway/run.py both running-agent bypass and main dispatch). Global
toggle only; per-platform overrides via config.yaml.
Graceful degradation:
- Missing context_length (unknown model) → pct field silently dropped
(no '?%' artifact).
- Empty final_response → no footer appended.
- Unknown field names in config → silently ignored.
Tests: 25-case unit suite (tests/gateway/test_runtime_footer.py) plus E2E
harness covering streaming vs non-streaming branches, per-platform override,
and the exact argument contract gateway/run.py uses.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Mechanical cleanup across 43 files — removes 46 unused imports
(F401) and 14 unused local variables (F841) detected by
`ruff check --select F401,F841`. Net: -49 lines.
Also fixes a latent NameError in rl_cli.py where `get_hermes_home()`
was called at module line 32 before its import at line 65 — the
module never imported successfully on main. The ruff audit surfaced
this because it correctly saw the symbol as imported-but-unused
(the call happened before the import ran); the fix moves the import
to the top of the file alongside other stdlib imports.
One `# noqa: F401` kept in hermes_cli/status.py for `subprocess`:
tests monkeypatch `hermes_cli.status.subprocess` as a regression
guard that systemctl isn't called on Termux, so the name must
exist at module scope even though the module body doesn't reference
it. Docstring explains the reason.
Also fixes an invalid `# noqa:` directive in
gateway/platforms/discord.py:308 that lacked a rule code.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
The contributor's PR (#16750) scoped the fix to run_setup_wizard() and
explicitly punted the two sibling sites. Both have the identical
[ -e /dev/tty ] pattern followed by a < /dev/tty redirect and crash in
Docker the same way:
- scripts/install.sh:732 install_system_packages() -- apt sudo prompt
fallback. sudo ... < /dev/tty dies with the same ENXIO.
- scripts/install.sh:1395 maybe_start_gateway() -- gateway-install gate,
same function path as the wizard reproducer.
Fix both with the same (: </dev/tty) 2>/dev/null probe, and parametrize
the regression test over all three gated functions so any future
regression is caught regardless of which site breaks.
Address the three Copilot inline findings on the regression test:
- Switch _extract_run_setup_wizard() from str.index() with hard-coded
markers (which raises ValueError if `maybe_start_gateway()` is renamed
or the marker leaks into a comment) to an anchored regex on the
function-definition + closing-brace boundaries.
- Match `[ -e /dev/tty ]` with surrounding whitespace, optional quoting,
and the `test -e /dev/tty` form so the regression guard catches every
spelling of the existence-only check, not just the exact substring.
- Replace the literal `(: </dev/tty)` substring assertion with a
higher-level invariant — the gate must be an `if`/`if !` whose test
redirects stdin from /dev/tty — so equivalent open-based probes
(`exec 3</dev/tty` + close, brace-grouped variants, etc.) keep the
test green while the bare existence check stays caught.
Verified guard: both tests still pass on the fix and both fail on
`origin/main` with the documented messages.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
In Docker builds the `/dev/tty` device node is present in the mount
namespace, so `[ -e /dev/tty ]` returns true — but opening it fails
with `ENXIO: No such device or address`. Under the old gate the
"no terminal available" skip never triggered, the setup wizard ran,
and the build aborted a few lines later when bash tried `< /dev/tty`:
/tmp/install.sh: line 1347: /dev/tty: No such device or address
Replace the existence check with `(: </dev/tty) 2>/dev/null`, which
actually attempts to open /dev/tty in a subshell. The probe succeeds
when piped from `curl | bash` on a real terminal (the wizard's intended
use case) and fails cleanly in Docker build / CI contexts so the skip
kicks in before the redirect can crash.
Add a regression test that statically asserts run_setup_wizard does not
gate on the bare existence check and that the open-based probe is in
place.
Fixes#16746.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
delegate_task runs inside the parent turn and is cancelled when the parent is interrupted (new user message, /stop, /new). The child status payload (status=interrupted, exit_reason=interrupted) is already honest, but the tool schema and user-facing docs did not set the expectation, so users reasonably assumed delegated subagents would keep running in the background after interrupting the parent.
Updates:
- tools/delegate_tool.py DELEGATE_TASK_SCHEMA description adds a WHEN NOT TO USE bullet pointing at cronjob / terminal(background=True, notify_on_complete=True) for durable long-running work.
- website/docs/user-guide/features/delegation.md gains a Lifetime and Durability callout above Key Properties.
- website/docs/guides/delegation-patterns.md expands the Use something else list and the Constraints section with the same guidance.
Reported by LizLiz (@lizliz404) via Teknium.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
The gateway caches one AIAgent per session to preserve prompt-cache hits,
keyed by _agent_config_signature(). The signature previously only
fingerprinted model/credentials/toolsets/ephemeral-prompt — NOT the
compression or context_length config. As a result, users who edited
model.context_length or compression.threshold in config.yaml on a
long-lived gateway saw no effect until they triggered an unrelated
cache eviction (/model switch, /reset, gateway restart).
Add a new cache_keys parameter to _agent_config_signature and a
_CACHE_BUSTING_CONFIG_KEYS registry listing config values the agent
bakes in at construction time. Call sites read the current config and
pass it through — next gateway message with an edited config
rebuilds the agent.
Keys registered:
- model.context_length
- compression.enabled
- compression.threshold
- compression.target_ratio
- compression.protect_last_n
Reported by @OP (Apr 26 feedback bundle).
## Changes
- gateway/run.py: new _CACHE_BUSTING_CONFIG_KEYS tuple,
_extract_cache_busting_config classmethod, cache_keys kwarg on
_agent_config_signature, call site passes the extracted dict
- tests/gateway/test_agent_cache.py: 11 new tests
(5 on _agent_config_signature behavior, 6 on _extract_cache_busting_config)
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Network errors through proxies (e.g. sing-box) can leave httpx
connections in a half-closed state occupying pool slots. After enough
reconnect cycles the 256-connection default fills up entirely, causing
Pool timeout: All connections in the connection pool are occupied.
Fix: cycle only the getUpdates request object (_request[0]) via
shut-down + re-initialize before restarting polling. This drains stale
connections without touching the general request (_request[1]) that
concurrent send_message / edit_message calls rely on.
The drain is applied to both _handle_polling_network_error and
_handle_polling_conflict reconnect paths via a shared
_drain_polling_connections() helper. Failures in the drain are
swallowed so reconnect always proceeds.
Based on #16466 by @Mirac1eSky.
Auxiliary callers that configure reasoning via
auxiliary.<task>.extra_body.reasoning were having that config silently
dropped by the Codex Responses adapter — it only forwarded
messages/model/tools through to responses.stream(), never translating
chat.completions-shaped reasoning hints into the Responses API's
top-level reasoning + include fields.
Mirror the main-agent translation from agent/transports/codex.py:
- extra_body.reasoning.effort → resp_kwargs.reasoning.{effort, summary:"auto"}
- 'minimal' → 'low' clamp (Codex backend rejects 'minimal')
- Always include ['reasoning.encrypted_content'] when reasoning is enabled
- {'enabled': False} → omit reasoning and include entirely
- Non-dict reasoning values are ignored defensively
Reported by @OP (Apr 26 feedback bundle).
## Changes
- agent/auxiliary_client.py: _CodexCompletionsAdapter.create() now reads
and translates extra_body.reasoning before calling responses.stream()
- tests/agent/test_auxiliary_client.py: 9 new tests covering all effort
levels, the minimal→low clamp, the disabled path, the no-op paths,
and defensive handling of wrong-shape inputs
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
The gateway session-hygiene pre-compression safety valve had a hardcoded
400-message threshold. On long-lived sessions with short turns this was
either too high (users with aggressive compression preferences) or too
low (users with very large context models who want to keep more history
in-flight).
Add compression.hygiene_hard_message_limit (default 400) so it can be
tuned without forking the gateway.
Reported by @OP (Apr 26 feedback bundle).
## Changes
- hermes_cli/config.py: new DEFAULT_CONFIG key with 400 default
- gateway/run.py: read compression.hygiene_hard_message_limit at
hygiene-time, fall back to 400 if missing/invalid
- tests/gateway/test_session_hygiene.py: two tests — override fires at
the configured limit, default does not fire below 400
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
When openai-codex tokens expire or the ChatGPT account hits a 429
window, the pool entry gets marked STATUS_EXHAUSTED with
last_error_reset_at many hours in the future. If the user then runs
`hermes model` / `hermes auth openai-codex` to reauth, fresh tokens
land in ~/.hermes/auth.json but the pool entry stayed frozen behind
its reset_at — every request kept failing with 'credential pool: no
available entries (all exhausted or empty)' until the original window
elapsed.
_available_entries() already had auth.json/credentials-file resync
branches for anthropic/claude_code and nous/device_code; openai-codex
was missing. Added _sync_codex_entry_from_auth_store() mirroring the
nous version (reads state["tokens"][{access,refresh}_token] +
state["last_refresh"]) and wired it into the exhausted-entry resync
loop.
Also softens the 'codex CLI not found' doctor warning — native
device-code OAuth does not require the Codex binary, only
importing existing Codex CLI tokens does. Downgraded to an info line.
Reported on Discord by p1aceho1der: Codex stalled indefinitely after
a rate-limit reset, reauth didn't help, and doctor falsely warned
that the codex CLI was required.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Gemini 3 Flash documents low/medium/high as the accepted thinkingLevel
values. The salvaged bridge was forwarding Hermes' "minimal" effort to
Flash verbatim, which is not a documented Gemini level and risks a 400
from the native adapter.
Clamp minimal->low on Flash (matching how Pro already clamps minimal+low
down), and funnel anything outside {low, medium, high} into medium to
keep the request valid by construction. No behaviour change for the
documented effort levels.
Telegram has no native table syntax. The gateway auto-rewrites pipe
tables into row-group bullets (see previous commit), but letting models
know up front means they emit the clean form directly instead of
relying on post-processing to synthesize headings.
Also helps users whose MEMORY.md formatting policies were being
overridden — the platform hint now carries the guidance.
The rate-limit branch added by the original PR did sleep+continue with
no attempt to record the last error, so persistent iLink -2 responses
exhausted the retry loop and hit 'assert last_error is not None',
raising AssertionError instead of a descriptive RuntimeError.
Record last_error = RuntimeError(...) before continuing, and break out
of the loop on the final attempt instead of sleeping uselessly.
- Change MAX_MESSAGE_LENGTH from 4000 to 2000 to match Weixin iLink API limit
- Add RATE_LIMIT_ERRCODE = -2 handling with 3x backoff retry
- Increase default send_chunk_delay_seconds from 0.35 to 1.5 to avoid rate limits
- Increase default send_chunk_retries from 2 to 4 for better reliability
- Use _split_text() in send() to chunk long messages before delivery
Fixes#16411
Add tests/test_cli_manual_compress.py verifying _manual_compress passes
None (not the cached system prompt) to _compress_context, forwards the
/compress <topic> focus string, rotates CLI session_id to the new child
session, and clears the pending title.
Co-authored-by: revar <revar@users.noreply.github.com>
_manual_compress() passed self.agent._cached_system_prompt to
_compress_context() as the system_message argument. _compress_context
calls _build_system_prompt(system_message), which appends system_message
to prompt_parts that already contain the agent identity block — causing
the identity to appear twice in the new session's system prompt
(20,957 -> 42,303 chars, +102% as reported in issue #15281).
Fix: pass None instead of _cached_system_prompt. _build_system_prompt(None)
rebuilds the system prompt correctly from scratch without appending a
pre-built prompt on top of the identity layers.
Fixes#15281
Follow-up to PR #16802 (BeliefanX). The original fix read
`agent_history[-1].get("timestamp")` for the tool-tail freshness gate,
but `gateway/run.py` strips the `timestamp` field off all tool/tool_call
rows when building `agent_history` from the raw transcript (see
`clean_msg = {k: v for k, v in msg.items() if k != "timestamp"}`). At
runtime the tool-tail branch always saw `None` and silently took the
legacy-fresh path — the stale-guard never fired for the tool-tail case
it was supposed to cover.
Changes:
- Read the freshness signal from the RAW `history` list (via new
`_last_transcript_timestamp()` helper) BEFORE the strip. Both the
resume_pending branch and the tool-tail branch use this single signal,
replacing the two divergent ones.
- Default window bumped 15 min → 1 hour via new
`_AUTO_CONTINUE_FRESHNESS_SECS_DEFAULT`. The 15-minute default was
shorter than the default `gateway_timeout` of 30 min, so a legitimate
long-running turn interrupted near its timeout boundary and resumed
shortly after would have been misclassified as stale.
- Configurable via `config.yaml` `agent.gateway_auto_continue_freshness`
(bridged to `HERMES_AUTO_CONTINUE_FRESHNESS` at gateway startup — same
pattern as `gateway_timeout`). Set to 0 to disable the gate.
- `_coerce_gateway_timestamp` now explicitly rejects bool (which is a
subclass of int and would otherwise coerce to 0.0/1.0).
- Tests rewritten to exercise the real production data shape: raw
`history` → `_build_agent_history` strip → freshness decision. A
regression guard (`test_stale_tool_tail_with_production_data_shape`)
asserts `agent_history` tool rows carry NO timestamp, protecting
against someone "fixing" the original bug by re-adding the stripped
field (which would break the OpenAI tool-result message contract).
Add BeliefanX to scripts/release.py AUTHOR_MAP.
E2E verified: config.yaml → env var bridge → helper returns configured
value; default 1h window; malformed/empty env var falls back to default;
ISO-Z timestamps parse; ms-epoch coerced; bool rejected.
Extract the islink/realpath guard from the 16743 fix into a single
atomic_replace() helper in utils.py, then migrate every os.replace()
call site in the codebase to use it.
The original PR #16777 correctly identified and fixed the bug, but
only patched 9 of ~24 call sites. The same bug class (managed
deployments that symlink state files silently losing the link on
every write) still existed at auth.json, sessions file, gateway
config, env_loader, webhook subscriptions, debug store, model
catalog, pairing, google OAuth, nous rate guard, and more.
Rather than add another 10+ copies of the same three-line guard,
consolidate into atomic_replace(tmp, target) which:
- resolves symlinks via os.path.realpath before os.replace
- returns the resolved real path so callers can re-apply permissions
- is a drop-in replacement for os.replace at the use sites
Changes:
- utils.py: new atomic_replace() helper + atomic_json_write /
atomic_yaml_write now call it instead of inlining the guard
- 16 files: all os.replace() call sites migrated to atomic_replace()
- agent/{google_oauth, nous_rate_guard, shell_hooks}.py
- cron/jobs.py
- gateway/{pairing, session, platforms/telegram}.py
- hermes_cli/{auth, config, debug, env_loader, model_catalog, webhook}.py
- tools/{memory_tool, skill_manager_tool, skills_sync}.py
Tests: tests/test_atomic_replace_symlinks.py pins the invariant for
atomic_replace + atomic_json_write + atomic_yaml_write, covers plain
files, first-time creates, broken symlinks, and permission preservation.
Refs #16743
Builds on #16777 by @vominh1919.
os.replace(tmp, path) replaces the symlink itself with a regular file,
breaking users who symlink config.yaml, SOUL.md, or .env from ~/.hermes/
to a dotfiles repo or managed profile package.
Fix: resolve symlinks via os.path.realpath() before os.replace(), so the
real file is overwritten in-place while the symlink survives.
Fixed in 7 files covering all os.replace call sites:
- utils.py (atomic_json_write, atomic_yaml_write — fixes save_config)
- hermes_cli/config.py (env sanitizer, save_env_value, remove_env_value)
- tools/skill_manager_tool.py (_atomic_write_text — SOUL.md writes)
- tools/memory_tool.py (memory file writes)
- tools/skills_sync.py (manifest writes)
- cron/jobs.py (job state + output file writes)
- agent/shell_hooks.py (hook file writes)
FixesNousResearch/hermes-agent#16743
Real OpenClaw configs key agents.defaults.models by full provider/model
API ID with an 'alias' field on the value (e.g.
{'anthropic/claude-opus-4-6': {'alias': 'Claude Opus 4.6'}}). Add
regression tests for issue #16745 covering:
- reverse-lookup of alias against real schema (keyed by API ID)
- alias resolution when model is a bare string vs {'primary': ...}
- passthrough when the value is already a provider/model API ID
- passthrough when the alias has no catalog match
- string-valued catalog entries (belt-and-suspenders)
- no catalog at all
`hermes claw migrate` copied OpenClaw's model setting verbatim, which
could be a display alias (e.g. "Claude Opus 4.6") instead of the actual
API ID (e.g. "claude-opus-4-6"). Hermes then sent the alias to the API,
causing HTTP 404 model not found.
Fix: look up the model string in agents.defaults.models (plural) alias
catalog. If found, use the resolved "id" field, prepending the provider
prefix if needed. If not found (already an API ID), pass through unchanged.
FixesNousResearch/hermes-agent#16745
DeepSeek API returns HTTP 400 with 'Insufficient Balance' message when
account funds are depleted. This pattern was not in _BILLING_PATTERNS,
causing the error to be misclassified instead of triggering billing
exhaustion handling (e.g., fallback to alternate provider).
Suggested by teknium1 in PR review of #15586.
Adds tools.schema_sanitizer.strip_nullable_unions as the single
implementation for collapsing anyOf/oneOf nullable unions. Both the
MCP input-schema normalizer and the Anthropic tool-schema guard now
delegate to it instead of re-implementing the same walk three times.
The global sanitizer also gains a final pass so any tool that slips
past the two earlier hooks (plugin tools, non-MCP custom tools with
Pydantic-shaped schemas) still gets safe input_schemas on Anthropic.
- tools/schema_sanitizer.py:
* New public strip_nullable_unions(schema, keep_nullable_hint=True).
* _sanitize_single_tool() calls it as a final pass (hint preserved
so coerce_tool_args can still map string "null" to None).
- tools/mcp_tool.py: _normalize_mcp_input_schema delegates.
- agent/anthropic_adapter.py: _normalize_tool_input_schema delegates
with keep_nullable_hint=False (Anthropic does not recognize nullable).
No behavioral change for the fix itself; tests (73/73 targeted +
E2E across MCP→sanitizer→Anthropic paths) pass.
25 new tests (all Bedrock API calls mocked, no real AWS creds needed):
tests/hermes_cli/test_bedrock_model_picker.py (20 tests):
- provider_model_ids("bedrock") uses live discovery, returns regional
model IDs, falls back gracefully on empty/exception, resolves all
bedrock aliases (aws, aws-bedrock, amazon-bedrock) to live discovery
- list_authenticated_providers() section 2: bedrock appears with AWS
creds, model list from discover_bedrock_models(), total_models
matches, is_current flag works, absent creds hides bedrock, discovery
failure does not crash, no duplicate entries
- Region routing: botocore profile eu-central-1 yields eu.* model IDs
end-to-end; env var takes priority over botocore profile
- providers.py overlay: exists with correct transport/auth_type, label
is non-empty, all aliases normalize to bedrock
tests/agent/test_bedrock_adapter.py (5 tests):
- resolve_bedrock_region() botocore profile fallback, botocore failure
fallback, us-east-1 hard fallback (with botocore mocked)
provider_model_ids("bedrock") fell through to a static _PROVIDER_MODELS
table containing only hardcoded us.* model IDs. Users configured for
non-US AWS regions (eu-central-1, ap-northeast-1, etc.) saw wrong or no
models in /model and autocomplete.
Root causes fixed:
1. models.py: provider_model_ids() now calls discover_bedrock_models()
keyed by the resolved region before falling back to the static table.
A new bedrock_model_ids_or_none() helper in bedrock_adapter.py
consolidates the discover -> extract IDs -> fallback pattern used by
all three call sites.
2. providers.py: registers bedrock in HERMES_OVERLAYS with
transport=bedrock_converse and auth_type=aws_sdk so
get_provider("bedrock") and resolve_provider_full("bedrock") work.
3. model_switch.py: list_authenticated_providers() sections 2 and 3
detect AWS credentials via has_aws_credentials() for aws_sdk
overlays and use live discovery for the model list.
4. bedrock_adapter.py: resolve_bedrock_region() reads the configured
region from botocore.session before falling back to us-east-1,
covering users who set their region in ~/.aws/config via a named
profile rather than env vars.
5. tui_gateway/server.py: passes provider= to get_model_context_length()
so context window lookups work correctly for the Bedrock provider.
* fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path
OAuth requests now identify as Hermes on the wire. Removed:
- "You are Claude Code, Anthropic's official CLI for Claude." system
prompt prepend
- Hermes Agent → Claude Code / Nous Research → Anthropic
system-prompt substitutions
- mcp_ tool-name prefix on outgoing tool schemas + message history
- Matching mcp_ strip on inbound tool_use blocks (strip_tool_prefix path
removed from AnthropicTransport.normalize_response, + all 5 call
sites in run_agent.py and auxiliary_client.py)
- user-agent: claude-cli/<v> (external, cli) and x-app: cli headers on
the Messages API client
Added:
- OAuth path strips context-1m-2025-08-07 — Anthropic rejects OAuth
requests carrying it with HTTP 400 'This authentication style is
incompatible with the long context beta header.'
Kept (auth plumbing, not identity spoofing):
- _is_oauth_token classifier and is_oauth flag threading
- Bearer vs x-api-key auth routing
- _OAUTH_ONLY_BETAS (claude-code-20250219, oauth-2025-04-20) — backend
requires these on the OAuth-gated Messages endpoint
- _OAUTH_CLIENT_ID (Claude Code's) — Anthropic doesn't issue OAuth
creds to third parties; this is the only way the login flow works
- claude-cli/<v> User-Agent on the OAuth token exchange + refresh
endpoints at platform.claude.com/v1/oauth/token — bare requests get
Cloudflare 1010 blocked
Verified live against api.anthropic.com with a fresh sk-ant-oat01-*
token:
- claude-haiku-4-5 simple message: HTTP 200, 'OK' response
- claude-haiku-4-5 tool call: HTTP 200, stop_reason=tool_use, tool
named 'terminal' (no mcp_ prefix) round-tripped correctly
- Outgoing wire: no user-agent, no x-app, real Hermes identity in
system prompt, real tool name in schema
Closes/supersedes #16820 (mcp_ PascalCase normalization patch — no longer
needed since the mcp_ round-trip is gone).
* fix(anthropic): resolve_anthropic_token() reads credential pool first
Close the gap where ~/.hermes/auth.json → credential_pool.anthropic
(where hermes login + dashboard PKCE flow write OAuth tokens) was not
in resolve_anthropic_token()'s source list.
Before: users who authed via hermes login got the token written into
the pool, but legacy fallback code paths (auxiliary_client, models
catalog fetch, explicit-runtime path) that call resolve_anthropic_token()
saw None and raised 'No Anthropic credentials found' — even though the
token was sitting in auth.json.
New priority 1: pool.select() with env-sourced entries skipped. Skipping
env:* entries preserves the existing env-var priority logic further
down the chain (static env OAuth → refreshable Claude Code upgrade via
_prefer_refreshable_claude_code_token).
Surfaced while writing the hermes-agent-dev skill playbook for
'finding a live OAuth token for an E2E test'.
---------
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Adds a pre-call sanitizer that detects assistant messages containing only
reasoning (reasoning / reasoning_content, no visible content, no
tool_calls) and drops them from the API copy. Adjacent user messages
left behind are merged so role alternation is preserved for the
provider.
Mirrors Claude Code's approach in src/utils/messages.ts
(filterOrphanedThinkingOnlyMessages + mergeAdjacentUserMessages). We
drop the whole turn rather than fabricate stub text (the '.' /
'(continued)' pattern from contributor PRs #11098, #13010, #16842 that
were rejected because they put words in the model's mouth).
The stored conversation history (self.messages) is never mutated — only
the per-call api_messages copy. Users still see the reasoning block in
the CLI/gateway transcript; only the wire copy is cleaned. Session
persistence keeps the full trace.
Two call sites covered:
- Main agent loop, after _sanitize_api_messages (catches every turn).
- Iteration-limit-summary fallback path.
Tests: tests/run_agent/test_thinking_only_sanitizer.py — 25 cases
covering detection (string/list content, whitespace-only, tool_calls,
reasoning_details list form), drop behavior, adjacent-user merge
(string+string, list+list, mixed), non-mutation of input dicts, and
system-message handling.
E2E live-tested against 5 providers with a poisoned history (empty
assistant message + reasoning_content): OpenRouter→Anthropic/OpenAI/
DeepSeek-R1/Qwen, native Gemini. All 5 accepted the cleaned request.
Happy-path regression (5/5) confirms the sanitizer is a noop when no
thinking-only turn exists.
Related: #16823 (wontfix — stub-text approach rejected).
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Registers tencent-tokenhub (https://tokenhub.tencentmaas.com/v1) as a
new API-key provider with model tencent/hy3-preview (256K context).
- PROVIDER_REGISTRY entry + TOKENHUB_API_KEY / TOKENHUB_BASE_URL env vars
- Aliases: tencent, tokenhub, tencent-cloud, tencentmaas
- openai_chat transport with is_tokenhub branch for top-level
reasoning_effort (Hy3 is a reasoning model)
- tencent/hy3-preview:free added to OpenRouter curated list
- 60+ tests (provider registry, aliases, runtime resolution,
credentials, model catalog, URL mapping, context length)
- Docs: integrations/providers.md, environment-variables.md,
model-catalog.json
Author: simonweng <simonweng@tencent.com>
Salvaged from PR #16860 onto current main (resolved conflicts with
#16935 Azure Anthropic env-var hint tests and the --provider choices=
list removal in chat_parser).
Three related fixes around custom env-var-name hints for provider entries.
1. Azure Anthropic path: previously hardcoded to look up AZURE_ANTHROPIC_KEY
then ANTHROPIC_API_KEY with no way to override. If a user wrote
model:
provider: anthropic
base_url: https://my-resource.services.ai.azure.com/anthropic
key_env: MY_CUSTOM_KEY
the key_env hint was silently ignored and the resolver raised
'No Azure Anthropic API key found' even when MY_CUSTOM_KEY was set
in the environment. The runtime now checks, in order:
(1) os.getenv(model_cfg.key_env)
(2) os.getenv(model_cfg.api_key_env) # docs alias
(3) model_cfg.api_key # inline value
(4) AZURE_ANTHROPIC_KEY # historical default
(5) ANTHROPIC_API_KEY # historical default
Error message updated to mention key_env as an option.
2. Provider entry normalizer (_normalize_custom_provider_entry): accept
'api_key_env' as a snake_case alias for 'key_env', and 'apiKeyEnv' as a
camelCase alias. Adds both to the _KNOWN_KEYS set so the 'unknown
config keys ignored' warning doesn't fire on valid configs.
3. _VALID_CUSTOM_PROVIDER_FIELDS: add 'key_env'. That set documents
supported custom_providers entry fields; it was drifting from reality
since key_env has been read at runtime in auxiliary_client.py,
runtime_provider.py, and main.py for a while.
Docs: website/docs/guides/azure-foundry.md now uses the canonical key_env
field and notes that api_key_env / keyEnv / apiKeyEnv are accepted as
aliases.
Validation: 12 new tests in test_runtime_provider_resolution.py covering
all 5 Azure Anthropic resolution paths + 4 normalizer-alias tests. Pass
rate across related suites (165 + 46 tests): 100%.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
The mention_user_id injection from #38a6bada9 unconditionally attached an
@user:server mention pill + MSC3952 m.mentions.user_ids payload to every
outbound reply and every tool-progress status update. The stated intent
was push notifications in muted rooms, but shipped as always-on in every
room, DM or group, muted or not — so every reply pinged the user.
- gateway/platforms/base.py: stop injecting mention_user_id into send
metadata on every reply; restore the original _thread_metadata passthrough.
- gateway/run.py: drop mention_user_id from status-thread metadata.
- gateway/platforms/matrix.py: drop the mention-pill append block in
_send_text that consumed the metadata. Keep the reaction-based exec
approval half of #38a6bada9 and the inbound/outbound m.mentions
handling (unrelated to the per-reply ping).
Reported by Elkim [NOUS] on Discord.
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
* feat(claw-migrate): harden OpenClaw import with plan-first apply, redaction, and pre-migration backup
Adopts four design patterns from OpenClaw's reciprocal migrate-hermes
importer so both migration paths have the same safety posture.
- **Refuse-on-conflict apply.** 'hermes claw migrate' now refuses to
execute when the plan has any conflict items, unless --overwrite is
set. Previously the user could say 'yes, proceed' and end up with a
silent partial migration that skipped every conflicting item.
- **Engine-level secret redaction.** The report.json and summary.md
written to disk (and --json stdout) run through a redactor that
matches OpenClaw's key-name markers and value-shape patterns
(sk-*, ghp_*, xox*-, AIza*, Bearer *). Prevents accidental API key
leakage in bug reports and support channels.
- **Pre-migration tarball snapshot.** Apply creates one timestamped
restore-point archive of ~/.hermes/ at ~/.hermes/migration/pre-migration-backups/
before any mutation, excluding regenerable directories
(sessions, logs, cache). Opt out with --no-backup.
- **Blocked-by-earlier-conflict sequencing.** If a config.yaml write
hits conflict/error mid-apply, subsequent config-mutating options
are marked skipped with reason 'blocked by earlier apply conflict'
rather than attempting partial writes.
- **Structured warnings[] and next_steps[] on the report** — actionable
guidance surfaces in both JSON output and summary.md.
- **--json output mode** — emits the redacted report on stdout for CI.
Also flips --preset full to NOT auto-enable --migrate-secrets. Users
now have to opt in to secret import explicitly, mirroring OpenClaw's
two-phase posture.
Status/kind/action constants are defined (STATUS_MIGRATED etc) with
values that match the existing strings the script emits, so the
report schema is backward-compatible. ItemResult gains a 'sensitive'
bool field that redaction and consumers can key off.
Validation: 26 new unit tests + 1 updated test in tests/skills/
test_openclaw_migration_hardening.py and test_claw.py cover redaction
(key markers, value patterns, recursion, on-disk), warnings/next_steps,
blocked-by-earlier sequencing, --json mode, and the preset-flip.
Manual E2E against a fake $HERMES_HOME with real-shaped secrets
confirmed: (1) secrets never appear in stdout or on disk,
(2) _cmd_migrate refuses apply when plan has conflicts,
(3) --overwrite proceeds past the guard and the backup tarball is
created, (4) --no-backup skips the archive.
Related docs: website/docs/guides/migrate-from-openclaw.md and
website/docs/reference/cli-commands.md updated to reflect the
preset-flip and new --no-backup flag.
* refactor(claw-migrate): reuse hermes backup system for pre-migration snapshot
Drops the inline tarball in hermes_cli/claw.py in favor of
hermes_cli.backup.create_pre_migration_backup(), which shares an
implementation with create_pre_update_backup via a new
_write_full_zip_backup helper. Benefits:
- Consistent exclusion rules with hermes backup (_EXCLUDED_DIRS,
_EXCLUDED_SUFFIXES, _EXCLUDED_NAMES — single source of truth).
- SQLite safe-copy via _safe_copy_db (state.db restores cleanly).
- Zip format restorable with 'hermes import <archive>'.
- Lives under ~/.hermes/backups/pre-migration-*.zip alongside
pre-update-*.zip — one place for all snapshot archives.
- Auto-prune rotation with separate keep counters (pre-migration
keeps 5, pre-update keeps 5, they don't touch each other's files).
7 new tests in tests/hermes_cli/test_backup.py lock the contract:
directory location, shared exclusion rules, _validate_backup_zip
acceptance (i.e. restorable with 'hermes import'), non-recursive
into prior backups, rotation, missing-home handling, and the
invariant that pre-migration rotation never touches pre-update
backups.
Help text and docs updated — the restore hint now says
'hermes import <name>' instead of 'tar -xzf <archive> -C ~/'.
* chore(claw-migrate): use backup._format_size and drop duplicate output line
Minor polish using another existing primitive from hermes_cli.backup:
- Show backup archive size with _format_size (e.g. '(245 B)' or '(2.4 MB)')
matching the format hermes backup already uses.
- Drop the duplicate 'Pre-migration backup saved' line after Migration
Results — the earlier 'Pre-migration backup: <path> (<size>)' line
already surfaces the path before apply runs.
---------
Co-authored-by: teknium1 <teknium@users.noreply.github.com>
Follow-up to the static list refresh: replace the hardcoded xAI entries
with _xai_curated_models(), mirroring the _codex_curated_models()
pattern from PR #7844. The helper reads $HERMES_HOME/models_dev_cache.json
at import time (no network call) and falls back to a small static list
when the cache is missing or malformed.
Why: _PROVIDER_MODELS["xai"] has drifted once already (issue #16699) and
will drift again next time xAI renames a model. Hermes already maintains
the models.dev cache and uses it for context-length lookups; pointing
_PROVIDER_MODELS at the same source means the /model picker self-heals on
the next cache refresh instead of requiring a PR.
Behavior:
- With cache populated (normal user): shows every current xAI model ID,
picks up renames automatically on next refresh.
- Without cache (fresh install, offline): falls back to a static snapshot
of the 9 current flagship IDs.
- Malformed cache / unexpected shape: same static fallback, no crash.
Import time verified <20ms — disk read only, no HTTP.
Addresses the structural piece of #16699 ("consider a single
_provider_models(provider) resolver") for xAI. Other per-provider lists
can adopt the same pattern as drift is observed.
_PROVIDER_MODELS["xai"] was pointing at model IDs the xAI direct API
no longer accepts:
- grok-4.20-reasoning
- grok-4-1-fast-reasoning
Replaced with the actual current xAI catalog IDs from models.dev
($HERMES_HOME/models_dev_cache.json, mirror of https://models.dev/api.json):
grok-4.20-0309-reasoning
grok-4.20-0309-non-reasoning
grok-4.20-multi-agent-0309
grok-4-1-fast
grok-4-1-fast-non-reasoning
grok-4-fast
grok-4-fast-non-reasoning
grok-4
grok-code-fast-1
The xAI-direct API (https://api.x.ai/v1) serves the dated IDs shown
above; the bare aliases (grok-4.20, grok-4.1-fast, etc.) are
OpenRouter/Vercel-gateway normalizations and are not accepted on
xAI-direct. Those gateways remain unaffected.
Fixes#16699
Follow-up to PR #16819 applying the same treatment to the two sibling
fallback sites in resolve_provider_client() that carry the identical bug
class as the anonymous-custom branch:
- Named custom provider (providers: / custom_providers: config entries):
apply _to_openai_base_url() on the OpenAI-wire path (chat_completions /
codex_responses), leave custom_base untouched on the anthropic_messages
path where the /anthropic surface is intentional. Prefer
main_runtime.get('model') over _read_main_model() so the entry model
still wins first. The ImportError fallback for anthropic_messages now
redoes query-param extraction against the rewritten URL so the final
OpenAI client hits /v1.
- external_process branch (copilot-acp): same main_runtime.get('model')
fallback before _read_main_model() so auxiliary tasks on this provider
track live /model switches instead of stale config.yaml.
Keeps the fix consistent across all three custom-endpoint fallback sites
in resolve_provider_client().
Three related issues prevented user-defined providers in `providers:` and
`model_aliases:` from being reachable through standard CLI flags. Requests
silently routed to the configured `model.base_url` instead of the user-
intended endpoint.
* hermes_cli/model_switch.py — root cause of the silent misrouting:
`_ensure_direct_aliases()` rebound `DIRECT_ALIASES` to a freshly-loaded
dict, leaving every `from hermes_cli.model_switch import DIRECT_ALIASES`
caller stuck on the stale empty original. Switched to `.update()` so
module attribute references stay valid.
* hermes_cli/main.py — chat subcommand `--provider` had `choices=[...]`
hardcoded to built-in providers, rejecting valid keys from user
`providers:` config. Dropped the choices list; runtime resolution
validates correctly downstream.
* hermes_cli/oneshot.py — `-m <alias>` only resolved the model name; the
alias's base_url was never propagated. Now consults `DIRECT_ALIASES`
before falling through to `detect_provider_for_model`, and threads the
alias's base_url to `resolve_runtime_provider(explicit_base_url=...)`.
* hermes_cli/runtime_provider.py — `_resolve_named_custom_runtime` now
honors `(provider="custom", explicit_base_url=...)` so a base_url
propagated from a direct-alias resolution actually builds a runtime
instead of falling through to provider-registry handlers that don't
know about ad-hoc local endpoints.
Verified: `hermes chat --provider <user-key> -m <model> -q "..."` and
`hermes -m <user-alias> -z "..."` both route to the user-intended
endpoint, observable via the target server's request log.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to #15328's vision-unsupported retry branch in run_agent.py.
_strip_images_from_messages() previously deleted any message whose content
was entirely images. That's fine for synthetic user messages injected for
attachment delivery, but it breaks providers for tool-role messages — the
paired tool_call_id on the preceding assistant message ends up unmatched,
which OpenAI-compatible APIs reject with HTTP 400.
Fix: tool-role messages whose content becomes empty are replaced with a
plaintext placeholder that preserves the tool_call_id linkage. Only
non-tool messages are dropped. Added 10 tests covering the role-alternation
invariants + image-type coverage.
Image-rejection detector: expanded phrase list (image content not
supported / multimodal input / vision input / model does not support
image) and gated on 4xx status so transient 5xx errors never get
misinterpreted as 'server said no to images'. Detection is documented as
best-effort English phrase matching.
AUTHOR_MAP: mapped 3820588+ddupont808@users.noreply.github.com to
ddupont808 so release notes attribute the salvage correctly.
Tool handlers (e.g. computer_use capture) return a _multimodal envelope
dict when a screenshot is attached. The tool-message builder was passing
this raw dict as the `content` field of role:tool messages, which is an
illegal format — OpenAI-compatible APIs expect a string or a content-parts
list, not a plain Python dict, and would reject it with a 400/422 error.
Fix: unwrap _multimodal results to their `content` list
([{type:text,...},{type:image_url,...}]) in both the parallel and
sequential tool-call paths. The Anthropic adapter already handles content
lists natively; vision-capable OpenAI-compatible servers (mlx-vlm,
GPT-4o, etc.) accept image_url parts in tool messages directly.
Also add a _vision_supported adaptive fallback: on first image-rejection
error ("Only 'text' content type is supported." etc.) the agent strips all
image parts from the message history and retries with text only, so
text-only endpoints degrade gracefully without crashing the session.
Extends the cua-driver computer-use backend to drive backgrounded macOS
windows without stealing keyboard or mouse focus from the foreground app.
All changes target the cua-driver MCP backend and the shared dispatcher.
## cua_backend.py
**Window-aware capture**: capture() now calls list_windows + get_window_state
instead of the removed capture tool. Prefers structuredContent.windows
(MCP 2024-11-05+ cua-driver) for zero-parse window enumeration; falls back
to regex-parsed text for older builds. Stores the selected (pid, window_id)
as sticky context so subsequent action calls do not need a redundant round-trip.
**Action routing**: click/scroll/type_text/key all carry the sticky pid
(and window_id for element-indexed clicks). type_text routes through
type_text_chars (individual key events) rather than AX attribute write --
WebKit AXTextFields reject attribute writes from backgrounded processes.
**Key parsing**: _parse_key_combo splits cmd+s-style strings into
(key, [modifiers]) and routes to hotkey (modifier present) or
press_key (bare key) -- cua-driver actual tool names.
**set_value method**: new set_value(value, element) calls the cua-driver
set_value MCP tool. For AXPopUpButton / HTML select in a backgrounded Safari,
AXPress opens the native macOS popup which closes immediately when the app is
non-frontmost; set_value AX-presses the matching child option directly
(no menu required, no focus steal).
**focus_app**: reimplemented as a pure window-selector (enumerates
list_windows, sets sticky pid/window_id) without ever raising the window
or stealing focus.
**list_apps**: fixed tool name from listApps to list_apps; handles plain-text
response via regex when structured data is absent.
**Structured-content extraction**: _extract_tool_result now surfaces
structuredContent from MCP results, enabling the list_windows window array
without text parsing.
**Helpers**: _parse_windows_from_text, _parse_elements_from_tree,
_split_tree_text, _parse_key_combo extracted as module-level functions.
## schema.py
Added set_value to the action enum with a description explaining when to
prefer it over click (select/popup elements, sliders, no focus steal).
Added value field for set_value payloads.
## tool.py
Routed set_value action through _dispatch to backend.set_value.
Added set_value to _DESTRUCTIVE_ACTIONS (approval-gated).
Fixed MIME-type detection in _capture_response: cua-driver may return
JPEG; detect from base64 magic bytes (/9j/ -> image/jpeg, else image/png)
rather than hardcoding image/png.
## agent/display.py + run_agent.py
Guard _detect_tool_failure and result-preview logic against non-string
function_result values: multimodal tool results (dicts with _multimodal=True)
are not string-sliceable; treat them as successes and fall back to str()
for length/preview.
Background macOS desktop control via cua-driver MCP — does NOT steal the
user's cursor or keyboard focus, works with any tool-capable model.
Replaces the Anthropic-native `computer_20251124` approach from the
abandoned #4562 with a generic OpenAI function-calling schema plus SOM
(set-of-mark) captures so Claude, GPT, Gemini, and open models can all
drive the desktop via numbered element indices.
- `tools/computer_use/` package — swappable ComputerUseBackend ABC +
CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary).
- Universal `computer_use` tool with one schema for all providers.
Actions: capture (som/vision/ax), click, double_click, right_click,
middle_click, drag, scroll, type, key, wait, list_apps, focus_app.
- Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style
`content: [text, image_url]` parts) that flows through
handle_function_call into the tool message. Anthropic adapter converts
into native `tool_result` image blocks; OpenAI-compatible providers
get the parts list directly.
- Image eviction in convert_messages_to_anthropic: only the 3 most
recent screenshots carry real image data; older ones become text
placeholders to cap per-turn token cost.
- Context compressor image pruning: old multimodal tool results have
their image parts stripped instead of being skipped.
- Image-aware token estimation: each image counts as a flat 1500 tokens
instead of its base64 char length (~1MB would have registered as
~250K tokens before).
- COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset
is active.
- Session DB persistence strips base64 from multimodal tool messages.
- Trajectory saver normalises multimodal messages to text-only.
- `hermes tools` post-setup installs cua-driver via the upstream script
and prints permission-grant instructions.
- CLI approval callback wired so destructive computer_use actions go
through the same prompt_toolkit approval dialog as terminal commands.
- Hard safety guards at the tool level: blocked type patterns
(curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash,
force delete, lock screen, log out).
- Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic)
workflow guide.
- Docs: `user-guide/features/computer-use.md` plus reference catalog
entries.
44 new tests in tests/tools/test_computer_use.py covering schema
shape (universal, not Anthropic-native), dispatch routing, safety
guards, multimodal envelope, Anthropic adapter conversion, screenshot
eviction, context compressor pruning, image-aware token estimation,
run_agent helpers, and universality guarantees.
469/469 pass across tests/tools/test_computer_use.py + the affected
agent/ test suites.
- `model_tools.py` provider-gating: the tool is available to every
provider. Providers without multi-part tool message support will see
text-only tool results (graceful degradation via `text_summary`).
- Anthropic server-side `clear_tool_uses_20250919` — deferred;
client-side eviction + compressor pruning cover the same cost ceiling
without a beta header.
- macOS only. cua-driver uses private SkyLight SPIs
(SLEventPostToPid, SLPSPostEventRecordTo,
_AXObserverAddNotificationAndCheckRemote) that can break on any macOS
update. Pin with HERMES_CUA_DRIVER_VERSION.
- Requires Accessibility + Screen Recording permissions — the post-setup
prints the Settings path.
Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-
native schema). Credit @0xbyt4 for the original #3816 groundwork whose
context/eviction/token design is preserved here in generic form.
Opt-in Langfuse tracing for Hermes conversations — LLM calls, tool
usage, usage/cost breakdown per span. Hooks into pre/post_api_request,
pre/post_llm_call, pre/post_tool_call. SDK is optional; missing SDK or
credentials renders the plugin inert.
Salvaged from PR #16845 by @kshitijk4poor, who wrote the plugin
(~875 LOC, 6 hooks, Langfuse usage-details/cost-details normalization,
read_file payload summarization).
Salvage scope (why this isn't PR #16845 as-authored):
- Lives at plugins/observability/langfuse/ (standalone kind, opt-in via
plugins.enabled) instead of a new parallel optional-plugins/
directory. Standalone bundled plugins are already opt-in — only their
plugin.yaml is scanned at startup; the Python module is not imported
unless the user enables it. The premise of optional-plugins/ (avoid
import cost for users who don't want it) is already solved by the
existing plugin system.
- Dropped the triple activation gate (plugins.enabled +
plugins.langfuse.enabled + HERMES_LANGFUSE_ENABLED). The Hermes plugin
system's own enable/disable is authoritative; runtime credentials
gate whether the hook actually traces.
- Rewrote _is_enabled() → cached _get_langfuse() with an _INIT_FAILED
sentinel. The original called hermes_cli.config.load_config() from
every hook invocation (full yaml parse + deep merge + env expansion
on every pre/post_tool_call, potentially 100+ times per turn). The
cached version reads env once and returns the cached client or None
on every subsequent call with zero further work.
- hermes tools → Langfuse Observability post-setup adds
observability/langfuse to plugins.enabled directly (via
_save_enabled_set) instead of going through an install-copy flow.
Enable:
hermes tools # interactive
hermes plugins enable observability/langfuse # manual
Required env (set by `hermes tools` or in ~/.hermes/.env):
HERMES_LANGFUSE_PUBLIC_KEY
HERMES_LANGFUSE_SECRET_KEY
HERMES_LANGFUSE_BASE_URL # optional
Co-authored-by: kshitijk4poor <kshitijk4poor@gmail.com>
Narrow plaintext shortcut that rewrites a tiny set of admin phrases
("restart gateway", "restart the gateway", "restart hermes") into the
/restart slash command, but only in DMs. Scope is intentionally tight:
- DM text messages only — group chats keep natural-language semantics
- Exact restart-style phrases only
- Skips anything already starting with "/"
Without this, the LLM can receive "restart gateway" as a user turn and
try to satisfy it via the terminal tool (systemctl restart ...). That
kills the gateway while the originating agent is still running, which
leaves systemd in "draining" state waiting on a process it's about to
kill. Routing the phrase to the slash-command dispatcher bypasses the
agent loop and uses the existing restart machinery (request_restart).
Called once, at the adapter level in BasePlatformAdapter.handle_message,
so every platform gets it for free and pending-message reinjection is
covered by the same call site.
Adds 2 Telegram-parametrized e2e tests: DM routes to request_restart,
group chats fall through to the normal agent path.
Runtime already supports list-form fallback_model (run_agent.py:1459
iterates fallback_chain; fallback_cmd.py migrates legacy single-dict
configs to list format). The config validator and save_config comment
gate still assumed single-dict form and flagged list-form configs as
errors. Fix both:
- validate_config_structure: when fallback_model is a list, validate
each entry has provider+model; keep the existing single-dict path.
- save_config: suppress the "add fallback_model" comment when any list
entry is well-formed.
Adds 4 list-form validator tests.
PR #16858's session-scoped interactive sudo password cache falls back to
a thread-identity scope when no HERMES_SESSION_KEY is bound. ACP never
set that contextvar, so two ACP sessions landing on the same reused
ThreadPoolExecutor thread still shared the cache — the exact scenario
the PR headlined.
acp_adapter/server.py now:
- binds HERMES_SESSION_KEY=<session_id> via gateway.session_context
inside _run_agent() (and clears on exit)
- wraps the loop.run_in_executor(_executor, _run_agent) call in a fresh
contextvars.copy_context() so concurrent ACP sessions don't stomp on
each other's ContextVar writes (executor pool threads would otherwise
share a context).
Adds tests/acp/test_approval_isolation.py::
test_sudo_password_cache_isolated_across_acp_sessions_on_same_pool_thread
which drives two back-to-back sessions through a 1-worker ThreadPoolExecutor
and asserts B does not observe A's cached password.
Follow-up on top of the cherry-picked contributor commit for #16751:
1. Delete triggers: the original PR switched FTS5 from external to inline
content mode and concatenated content || tool_name || tool_calls in
the insert/update triggers, but left the delete triggers passing
old.content to the FTS5 delete-command. FTS5 inline delete requires
the content to match what was stored, so every DELETE on messages
raised 'SQL logic error'. Replaced with plain DELETE FROM ... WHERE
rowid = old.id on all four delete paths (normal + trigram, delete +
update-delete).
2. v11 migration: existing DBs have the old external-content FTS tables
and triggers. Because CREATE VIRTUAL TABLE IF NOT EXISTS / CREATE
TRIGGER IF NOT EXISTS skip when the objects already exist, upgraders
would have kept the broken behavior forever. Bumped SCHEMA_VERSION
to 11 and added a migration that drops both FTS tables + all 6 old
triggers, recreates them via FTS_SQL / FTS_TRIGRAM_SQL, and backfills
from messages using the same concatenation expression.
3. Regression tests: 6 new tests cover INSERT / UPDATE / DELETE paths
for tool_name + tool_calls indexing plus the full v10 -> v11 upgrade
path on a hand-built legacy DB.
The FTS5 virtual tables (messages_fts, messages_fts_trigram) previously
only indexed the content column via external content mode. Tool calls
and tool names stored in the tool_calls (JSON) and tool_name columns
were invisible to FTS5 search.
Root cause: FTS5 triggers only INSERTed new.content into the index.
Changes:
- Switch FTS5 tables from external content (content=messages) to inline
mode so that trigger-inserted content is both indexed and stored
- Update all 6 FTS5 triggers to concatenate content, tool_name, and
tool_calls when indexing new messages
- Extend the short-CJK LIKE fallback to also search tool_name and
tool_calls columns
Closes: #16751
FTS5 default tokenizer splits 'sp_new1' into tokens 'sp' and 'new1'.
Without quoting, a search for 'sp_new' becomes an AND query
('sp AND new') that fails to match rows indexed as 'sp_new1'.
Fix: add underscore to the character class in Step 5 regex
([.-] -> [._-]) so underscored terms are wrapped in double quotes.
Also adds test_sanitize_fts5_quotes_underscored_terms.
Both keys are documented in cli-config.yaml.example and read at runtime by
hermes_cli/timeouts.py (get_provider_request_timeout and get_provider_stale_timeout),
but the provider-entry validator in config.py flagged them as unknown, producing
noisy warnings on every CLI invocation for users who followed the documented config.
Fixes#16779
- App.tsx doc comment: replace stale ChatPageHost reference with
'persistent chat host block rendered inline near the bottom of this
file' so readers can find the actual code.
- App.tsx persistent host: show a small spinner on /chat while plugin
manifests are loading instead of a blank content area. Direct
/chat deep-links used to paint empty for up to ~2s in the worst
case (plugin-registration safety timeout) because both the route
sink (null) and the persistent host (!pluginsLoading gate) render
nothing during that window. Non-chat routes stay empty as before.
- ChatPage.tsx: rename setter to match the 'raw' state — useState
now destructures as [mobilePanelOpenRaw, setMobilePanelOpenRaw],
and all four call sites (closeMobilePanel, matchMedia listener,
open-button onClick, plus destructure) updated accordingly. No
behavior change; matches the 'raw vs derived' convention the
original comment set up.
The dashboard's Chat tab (hermes dashboard --tui) lost its session
whenever the user navigated to another tab and came back. React Router
unmounted ChatPage on path change, which ran the cleanup function,
closed the PTY WebSocket, and terminated the underlying TUI child -
so the next mount generated a fresh channel id, spawned a new PTY, and
started a brand-new conversation.
Rather than rebuild the destroyed state (session id capture + resume
via HERMES_TUI_RESUME would reload history from disk but drop in-flight
tool state, scrollback, and picker position), keep the component tree
alive.
* Pull ChatPage out of Routes into a sibling always-mounted host that
toggles visibility via display:none keyed off the current route. A
tiny ChatRouteSink still claims /chat so the catch-all redirect
does not fire.
* xterm instance, WebSocket, PTY child, and TUI/agent state all
survive; returning to /chat shows the exact conversation the user
left.
* Respect plugin `/chat` overrides: if a plugin manifest declares
`tab.override: "/chat"`, the Routes tree already swaps the element
for <PluginPage /> — we additionally suppress the persistent host
so the two don't paint on top of each other. Preserves the
pre-persistence contract that a plugin owning /chat replaces the
built-in chat UI entirely.
* Wait for usePlugins() to finish loading before mounting the
persistent host. Manifests arrive asynchronously from
/api/dashboard/plugins, so without the `!pluginsLoading` gate the
host would mount with manifests=[], spawn a PTY, and then unmount
mid-session when the manifest list resolves and reveals a /chat
override. Typical delay is <50ms; worst case is the 2s plugin-
registration safety timeout. Cheaper than killing someone's
conversation underneath them.
* Gate page-header slot (`setEnd`), the mobile sheet's portalled
render, and body-scroll lock on a new `isActive` prop so the hidden
ChatPage doesn't fight the active page for shared state. The
scroll-lock effect keys on the *derived* `mobilePanelOpen` (which is
`isActive && mobilePanelOpenRaw`) rather than the raw state — that
way tab-switch flips the dep false, fires the cleanup, and releases
`document.body.style.overflow`. Keying on the raw state would leave
body.overflow="hidden" stuck on /sessions and every other tab until
the user navigated back to /chat and explicitly closed the sheet.
* When isActive flips false to true, force a double-rAF fit:
display:none collapses the host box and ResizeObserver does not fire
on display changes, so xterm would otherwise stay at a stale or 1x1
grid. Also early-return from syncTerminalMetrics when the host has
zero area, since fit() on a zero-sized element produces a 1x1
terminal.
* Focus handling on tab return: only steal focus into the terminal if
focus wasn't already parked somewhere inside ChatPage (e.g. the
sidebar model picker, a tool-call entry). Yanking focus away from
whatever the user last clicked is surprising and a screen-reader
foot-gun; the typical "first activation" case still focuses the
terminal because document.activeElement is <body> at that point.
Trade-off worth flagging, deliberately not mitigated in this change:
while hidden, ChatPage still holds a PTY child + WebSocket + xterm
instance for the dashboard's full lifetime. The WS keeps delivering
bytes and xterm keeps parsing them into a display:none host (cheap —
no paint work, but not free). Reasonable costs to pay for the session
preservation; if they become a problem we can pause `term.write` when
!isActive or idle-disconnect after N minutes hidden.
Lint clean on touched files. tsc -b && vite build pass.
Follow-up to the salvaged PR #16867 that added the read path for
agent.disabled_toolsets in _get_platform_tools():
- Document the new config key under a "Global Toolset Disable" section
in website/docs/user-guide/configuration.md, including the precedence
note (global disable overrides per-platform platform_toolsets).
- Map nazirulhafiy@gmail.com -> nazirulhafiy in scripts/release.py
AUTHOR_MAP so release-notes CI attributes the cherry-picked commit.
Previously, agent.disabled_toolsets in config.yaml only worked for CLI
mode (run_agent.py --disabled_toolsets). The gateway always passed
enabled_toolsets to AIAgent, and get_tool_definitions() ignored
disabled_toolsets when enabled_toolsets was set.
Fix: _get_platform_tools() now reads agent.disabled_toolsets from config
and excludes those toolsets from the returned set. This runs last so it
overrides everything above.
Added 3 tests covering cross-platform suppression, explicit platform
config override, and empty/missing config no-op behavior.
Streaming-only providers (glm, MiniMax, gpt-5.x via aigw, Anthropic via
openai-compat shims) emit reasoning through delta.reasoning_content
chunks that get accumulated into the local reasoning_text string — but
never land on the assistant message object as a top-level attribute. The
prior guard at _build_assistant_message only wrote reasoning_content
when the SDK exposed hasattr(msg, 'reasoning_content'), so these
providers persisted the chain-of-thought under the internal 'reasoning'
key and omitted the protocol-standard field.
The poison was silent until the user later switched to a DeepSeek-v4 or
Kimi thinking model, at which point replay failed with HTTP 400:
'The reasoning_content in the thinking mode must be passed back to the
API.' One reported session store accumulated 4,031 poisoned messages
across 1,101 files (#16844).
Fix: add an additive fallback that promotes the already-sanitized
reasoning_text to reasoning_content when no earlier branch wrote it AND
reasoning text was actually captured. Layered on top of the existing
SDK-attr branch and DeepSeek ''-pad (#15250) rather than replacing them,
so every existing behavior is preserved:
- SDK-exposed reasoning_content (OpenAI/Moonshot/DeepSeek SDK) still
wins.
- DeepSeek tool-call ''-pad still fires when the SDK exposes the attr
but the value is None.
- Non-thinking turns with no reasoning leave the field absent, so
_copy_reasoning_content_for_api's cross-provider leak guard (#15748),
promote-from-'reasoning' tier, and thinking-pad tier remain live at
replay time.
- No empty '' gets eagerly written on every assistant turn (which would
have bypassed the read-side ladder and triggered empty thinking-block
insertion in the Anthropic adapter).
Tests: three new TestBuildAssistantMessage cases covering the streaming
promotion path, SDK precedence, and field-absent-when-no-reasoning
invariant.
Credit @Sanjays2402 for the original diagnosis and patch in #16884;
this is a scoped rework that preserves the existing read-side
compensation code as defense in depth.
Refs #16844, #16884, #15250, #15353, #15748.
Address Copilot review on #16868:
1. Tighten pool iteration. ``validate_copilot_token`` only rejects empty
strings and classic PATs (``ghp_*``); a malformed/unsupported ``gho_*``
token at ``credential_pool.copilot[0]`` would pass the gate and short-
circuit the loop, hiding a later valid entry. Switch to calling
``exchange_copilot_token`` directly: only entries that actually exchange
into a live Copilot API token are returned. Bad/expired entries fall
through to the next, and an exhausted pool returns ``""`` so the picker
falls back to the curated list (existing behaviour).
2. Reword the docstring + test module docstring to describe the pool seed
path accurately — ``hermes auth add copilot`` adds an api-key-typed
credential whose ``access_token`` field stores the pasted token, and
``_seed_from_env`` mirrors ``COPILOT_GITHUB_TOKEN`` from
``~/.hermes/.env`` into the pool. The previous wording implied
``auth add copilot`` itself ran the device-code flow, which it does
not (the device-code flow lives in ``hermes model``).
Two new tests cover the iteration change:
- ``test_skips_pool_entry_that_fails_to_exchange`` — pool[0] raises,
pool[1] succeeds, picker uses pool[1].
- ``test_all_pool_entries_fail_exchange_returns_empty`` — every entry
raises, return ``""``.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Users whose only Copilot credential is the OAuth `access_token` saved by
`hermes auth add copilot` (device-code flow) saw the `/model` picker drop
back to a stale hardcoded list. Reason: `_resolve_copilot_catalog_api_key`
only consulted env vars (`COPILOT_GITHUB_TOKEN` / `GH_TOKEN` /
`GITHUB_TOKEN`) and the `gh auth token` CLI fallback, never the credential
pool that Hermes's own login flow writes into `auth.json`. With no token,
the live catalog fetch silently 401s and the picker hides current models
(claude-opus-4.7, claude-sonnet-4.6, gpt-5.5, grok-code-fast-1) — even
though `/model <id>` works fine because runtime inference reads the pool
through a different code path.
Mirror the Codex catalog resolver pattern: env-var first (unchanged), then
walk `read_credential_pool("copilot")` for the first entry with a
supported `access_token` (`gho_*` / `github_pat_*` / `ghu_*`). Run it
through `get_copilot_api_token()` so the catalog request uses the same
exchanged token the runtime path uses. Classic PATs (`ghp_*`) are still
rejected up-front via `validate_copilot_token` since the Copilot API
doesn't accept them.
Strictly additive: env still wins, and a missing/locked auth.json (or any
exception during pool read) still returns "" so the caller falls through
to the curated catalog.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
model_tools.py ran discover_mcp_tools() as a module-level side effect.
discover_mcp_tools() uses a blocking 120s wait internally (via
_run_on_mcp_loop -> future.result(timeout=120)).
The gateway lazy-imports run_agent -> model_tools on the first user
message, which happens inside the asyncio event loop thread. A slow or
unreachable MCP server therefore froze Discord shard heartbeats and
Telegram polling for up to 120s on the first message after gateway
start.
Fix: remove the module-level call. Every entry point now runs
discovery explicitly at its own startup, using the context-appropriate
blocking/non-blocking pattern:
- gateway/run.py: loop.run_in_executor(None, discover_mcp_tools)
before platforms start accepting traffic
- hermes_cli/main.py: inline (no event loop at CLI startup)
- tui_gateway/entry.py: inline (sync stdin loop, no event loop)
- acp_adapter/entry.py: inline before asyncio.run()
Closes#16856.
_handle_set_home_command wrote FEISHU_HOME_CHANNEL / DISCORD_HOME_CHANNEL /
etc. as top-level keys into config.yaml, but load_gateway_config() only
reads home channels from env vars. After every gateway restart the home
channel was lost — on every platform, not just Feishu.
Fix: switch /sethome to save_env_value(), which atomically writes to
~/.hermes/.env and updates the current process env in one shot. The
handler builds the env key from platform_name.upper(), so one line
change repairs /sethome for every platform that has a HOME_CHANNEL
env var.
Also widen _EXTRA_ENV_KEYS in hermes_cli/config.py so HOME_CHANNEL and
HOME_CHANNEL_NAME for every platform are treated as managed env vars:
SIGNAL, SLACK, SMS, DINGTALK, BLUEBUBBLES, FEISHU, WECOM, YUANBAO, plus
the missing *_NAME variants for DISCORD/TELEGRAM/MATTERMOST.
Closes#16806
Co-authored-by: teknium1 <screenmachine@gmail.com>
`/new` after `/model <custom-provider>:<model>` silently reverted to a
native provider whose static catalog happened to contain the same model
name (e.g. `deepseek-v4-pro` → native `deepseek` → 401).
Root cause at the `/model` writeback site: `HERMES_INFERENCE_PROVIDER`
was set unconditionally but `HERMES_TUI_PROVIDER` was only mirrored when
it was already set. On sessions launched without `--provider`,
`HERMES_TUI_PROVIDER` stayed unset, so `_resolve_startup_runtime()` on
`/new` skipped the explicit-provider early return and fell through to
`detect_static_provider_for_model()`.
Fix: set `HERMES_TUI_PROVIDER` unconditionally alongside
`HERMES_INFERENCE_PROVIDER` when `/model` lands. Keeps #15755's
invariant intact — `HERMES_TUI_PROVIDER` remains the canonical
"explicit this process" carrier, `HERMES_INFERENCE_PROVIDER` remains
ambient and does not short-circuit startup resolution.
Bug report and diagnosis: @Bartok9 in #16857 / #16873.
Fixes#16857
Replace the Linux/macOS pgrep regex ("hermes.*dashboard") with a ps
scan + the same explicit patterns list already used on the Windows
branch and in hermes_cli.gateway._scan_gateway_pids:
hermes dashboard
hermes_cli.main dashboard
hermes_cli/main.py dashboard
The old greedy regex would match any cmdline containing both words —
e.g. a chat session whose argv mentions "dashboard" or an unrelated
grafana/dashboard-server process. Added regression tests for both.
Follow-up tightening on #16881.
The dashboard is a long-lived server process users start and forget.
When hermes update replaces files on disk, the running process holds
the old Python backend in memory while the JS bundle gets updated,
producing a silent frontend/backend mismatch (e.g. v0.11.0 changed
the session token header -- old backends reject every API call).
Scan for running dashboard processes after a successful update (both
git and ZIP paths) and print a warning with their PIDs and restart
instructions. Mirrors the existing pattern for gateway processes.
Fixes#16872
When delegation.provider is configured (e.g. minimax-cn), subagents
inherited the parent's acp_command unconditionally. This caused
run_agent.py to initialize CopilotACPClient, which bypassed the
override credentials entirely and used its own default model
(provider=copilot-acp model=qwen3.5-397b-a17b) instead of the
configured delegation.provider and delegation.model.
Fix: when override_provider is set but override_acp_command is not,
clear effective_acp_command and effective_acp_args so the child agent
uses direct API calls with the configured provider credentials.
The existing override_acp_command path is unchanged — explicit ACP
transport overrides still force provider=copilot-acp as before.
Fixes#16816
PR #16888 swaps the opencode-zen/go resolver so that api_mode is always
re-derived from the effective model before the persisted api_mode is
consulted. That's the point of the fix — a stale anthropic_messages
from a previous minimax default must not survive a /model switch to a
chat_completions target (or vice versa) and strip /v1 from base_url.
The prior test asserted the opposite precedence — that a persisted
api_mode won over model-derived mode — and was added in #4508 to lock
in escape-hatch behavior. Under the new precedence that escape hatch
no longer exists for opencode (only for providers that genuinely
support both modes at a single endpoint — and for opencode the model
name is the unambiguous signal). Rename + invert the assertion to
document the intentional behavior change.
Refs #16878.
opencode-zen and opencode-go each serve both anthropic_messages
(e.g. minimax-m2.7) and chat_completions (e.g. deepseek-v4-flash)
models behind a single base_url. The api_mode resolver in
hermes_cli/runtime_provider.py honoured the persisted
model_cfg.api_mode (set by the previous default model) before checking
the opencode model registry, so /model deepseek-v4-flash from a session
whose default was minimax-m2.7 inherited 'anthropic_messages', stripped
'/v1' from base_url (the Anthropic SDK adds its own /v1/messages), and
404'd.
Promote the opencode detection branch above the configured_mode check
in both api_mode resolution paths:
- _resolve_runtime_from_pool_entry (pool-backed providers)
- _resolve_api_key_runtime (api-key providers, fallback path)
Both branches now call opencode_model_api_mode(provider, effective_model)
unconditionally for opencode-zen/go before considering any persisted
api_mode, so the mode always reflects the model the user just switched
to.
Existing tests pass (12/12 in tests/hermes_cli/test_model_switch_opencode_anthropic.py).
Fixes#16878
Switch _PRIORITY_PROCESSING_MODELS and _ANTHROPIC_FAST_MODE_MODELS from
hardcoded frozensets to prefix-based matching. Any gpt-*, o1*, o3*, o4*
(OpenAI) and any claude-* (Anthropic) now exposes /fast.
Fixes the case where gpt-5.5 and other post-catalog models silently
skipped Priority Processing because they weren't in the frozenset.
Future OpenAI/Anthropic releases will work without a catalog bump.
Safety:
- Codex-series (*codex*) still excluded — they route through the Codex
Responses API which doesn't take service_tier.
- Anthropic adapter already gates speed=fast on native endpoints only
(_is_third_party_anthropic_endpoint), so claude-sonnet-4.6 on
OpenRouter/Bedrock/opencode-zen won't leak the unknown beta.
- service_tier=priority is silently dropped by non-OpenAI proxies, so
false positives are harmless.
Drop the duplicate _load_openclaw_config_early() added in the salvaged
commit — load_openclaw_config() (line 979) has the identical body and
is a plain instance method that only needs self.source_root, which is
already set before __init__ needs it.
OpenClaw users who started before the rebrand (when the project was
clawd/clawdbot) often have a custom workspace directory configured via
agents.defaults.workspace in openclaw.json (e.g. ~/clawd/ instead of
~/.openclaw/workspace/).
The migration tool only checked hardcoded relative paths (workspace/,
workspace-main/, workspace-assistant/) inside the source root, so files
like MEMORY.md, skills, and daily memory in custom workspaces were
silently skipped.
This change:
- Reads agents.defaults.workspace from openclaw.json at init time
- Uses it as a final fallback in source_candidate() when files aren't
found in the standard locations
- Standard workspace paths are still preferred (custom is fallback only)
- Custom workspace is only used when it's outside the source_root tree
(avoids double-matching when workspace/ is the default)
Adds two tests:
- Custom workspace files are discovered and migrated
- Standard workspace location is preferred over custom
Flips security.redact_secrets from true to false in DEFAULT_CONFIG, and
the HERMES_REDACT_SECRETS env-var fallback in agent/redact.py now
requires explicit opt-in ("1"/"true"/"yes"/"on") to enable.
New installs and users without a security.redact_secrets key get pass-
through tool output. Existing users whose config.yaml explicitly sets
redact_secrets: true keep redaction on — the config-yaml -> env-var
bridges in hermes_cli/main.py and gateway/run.py still honor their
setting.
Also updates the inline config comments, website docs, and the
hermes-agent skill so /hermes config set security.redact_secrets true
is now the documented way to turn it on.
MatrixAdapter._is_self_sender returns True defensively when _user_id is empty
(whoami not yet resolved) to prevent echo loops — see #15763. The reaction
approval test must therefore initialize a user_id so _on_reaction does not
drop the inbound test event before reaching the approval handler.
Self-contained docker-compose harness that exercises the new bootstrap
branch against a real Continuwuity homeserver. Three tests:
1. fresh bot → bootstrap fires, /keys/query returns master + ssk
with UNPADDED base64 keyids, current device is signed by the
new SSK
2. second startup with same crypto store → bootstrap is skipped
3. MATRIX_RECOVERY_KEY set → existing verify_with_recovery_key path
takes precedence, no new bootstrap
Run via:
docker compose -f tests/e2e/matrix_xsign_bootstrap/docker-compose.yml up -d
python tests/e2e/matrix_xsign_bootstrap/test_bootstrap.py
docker compose -f tests/e2e/matrix_xsign_bootstrap/docker-compose.yml down -v
The test mirrors the bootstrap snippet from matrix.py inline so it can
run without importing the full hermes gateway and its deps. Skipped
automatically when mautrix isn't installed or the homeserver is
unreachable.
All three pass against ghcr.io/continuwuity/continuwuity:latest
(Continuwuity 0.5.7). The unpadded-keyid assertion is the load-bearing
one — it's exactly the property the PR's bootstrap path provides that
the hand-rolled `base64.b64encode().decode()` scripts get wrong.
Without this, every Matrix bot started under hermes-agent shows the
"Encrypted by a device not verified by its owner" badge in Element
indefinitely, because the cross-signing chain (master → SSK → device)
was never published. Operators currently have to write their own
bootstrap script and remember to run it once per bot — and it's easy
to get wrong (the obvious base64.b64encode().decode() produces padded
keyids that matrix-rust-sdk silently rejects in /keys/query, so even
correctly-signed keys fail to load identity in Element).
mautrix already has the right primitive: generate_recovery_key() does
the full flow — generate seeds, upload privates to SSSS, publish
publics to the homeserver, sign the current device with the new SSK,
and return the human-readable recovery key. We invoke it once on
startup if the bot has no existing cross-signing identity, and log
the recovery key with a clear instruction to save it for future
restarts via MATRIX_RECOVERY_KEY (which the existing recovery-key
path already consumes).
Skipped when MATRIX_RECOVERY_KEY is set (existing path takes over)
or when the bot already has cross-signing keys on the homeserver
(get_own_cross_signing_public_keys returns non-None).
Bootstrap failure is non-fatal — logged with hint about UIA; the bot
continues without cross-signing and Element will show the warning
that prompted this PR. That matches the existing soft-fail pattern
for verify_with_recovery_key.
Tested against Continuwuity 0.5.7 (no UIA required). Synapse with
UIA enabled will need a follow-up PR to thread MATRIX_PASSWORD
through to /keys/device_signing/upload.
Five ``except Exception as exc:`` blocks in the Matrix adapter logged
only ``str(exc)`` without ``exc_info=True``:
- _reverify_keys_after_upload → post-upload key verification failure
- _upload_keys_if_needed → initial device-key query failure
- _upload_keys_if_needed → re-upload device keys failure
- _upload_keys_if_needed → initial device key upload failure
- connect → whoami / access-token validation failure
The E2EE key paths here are security-critical: a silent traceback-
less failure during device-key verification or upload makes it
hard for operators to tell whether their Matrix bot is failing
because of a stale token, a federation timeout, or an olm state
mismatch — all three fail with different tracebacks, which
``str(exc)`` alone flattens.
The contributing guide asks for ``exc_info=True`` on error logs.
Append it to each of the five call sites. Pure logging enrichment.
- Wrap _sync_loop sync() call with asyncio.wait_for(timeout=45s) to guard
against TCP-level hangs that the Matrix long-poll timeout cannot catch
- Add logger.debug at the top of _on_room_message so LOG_LEVEL=DEBUG
confirms whether callbacks fire at all (diagnoses #5819, #7914, #12614)
- Add logger.debug when MATRIX_REQUIRE_MENTION silently drops a message,
pointing users to the env var to disable the filter
Adapted for current mautrix-python adapter (PR was written against the
legacy matrix-nio adapter).
Closes#5819
* Port from Kilo-Org/kilocode#9448: roll up subagent costs into parent session total
Child subagents built by delegate_task() each track their own
session_estimated_cost_usd, but the parent agent's total never folded
those numbers in. On runs where the parent mostly delegates and the
children do the expensive work, the footer/UI was reporting a fraction
of the actual spend — sometimes $0.00 when the parent itself made no
billed calls.
Fix:
- Capture each child's session_estimated_cost_usd into _child_cost_usd
on the result entry (before child.close() drops the counter).
- After the existing subagent_stop hook loop, sum the children's costs
and add the total to parent.session_estimated_cost_usd.
- Promote session_cost_source from 'none' -> 'subagent' when the parent
had no direct spend but children did, so the UI doesn't label the
total as having unknown provenance. Real sources (openrouter,
anthropic, etc.) are preserved.
Nested orchestrator -> worker trees roll up naturally: each layer's own
delegate_task() folds its direct children in, and when the orchestrator
itself returns, its parent folds the orchestrator's now-inflated total
on top.
Internal fields (_child_cost_usd, _child_role) are stripped from the
results dict before it's serialised back to the model — same contract
as _child_role already followed.
Tests: TestSubagentCostRollup (5 cases) covers single-child, batch,
zero-cost-children, preserved-source, and legacy-fixture paths.
Source: https://github.com/Kilo-Org/kilocode/pull/9448
* fix(web): scope dashboard config Reset button to the current tab
Reported by @ykmfb001 via X: clicking 'Restore Defaults' (恢复默认值) on
the Auxiliary page wiped the entire config.yaml to defaults, not just
the auxiliary section. The button sits next to the category tabs and
users reasonably assumed 'reset this tab', not 'reset everything'.
Changes:
- handleReset now scopes to the fields in the current view:
active category's fields (form mode) or search-matched fields
(search mode). Only those keys are copied from defaults; the rest
of the config is left alone.
- Added a window.confirm() with the scope name before applying.
- Button is hidden in YAML mode (scoping doesn't apply there).
- Tooltip/aria-label now name the scope, e.g. 'Reset Auxiliary to
defaults'.
- i18n: new resetScopeTooltip / confirmResetScope / resetScopeToast
strings in en + zh; resetDefaults key preserved for compat.
On AWS Bedrock (and Azure AI Foundry), Claude Opus 4.6/4.7 and Sonnet 4.6
are capped at 200K context unless the request carries the
`context-1m-2025-08-07` beta header. On native Anthropic (api.anthropic.com)
1M went GA so the header is a harmless no-op, but Bedrock/Azure still gate
it as beta as of 2026-04.
Hermes was advertising 1M in model_metadata.py (`claude-opus-4-7: 1000000`)
while silently sending a request without the beta — so Bedrock users saw
a 200K ceiling with no error message, and no config knob unblocked it.
Claude Code sends this header by default, which is why the same Bedrock
credentials worked there.
- Add `context-1m-2025-08-07` to `_COMMON_BETAS` (alongside interleaved
thinking and fine-grained tool streaming).
- Strip it in `_common_betas_for_base_url` for MiniMax bearer-auth
endpoints — they host their own models, not Claude, so Anthropic beta
headers are irrelevant and could risk rejection.
- Attach `_COMMON_BETAS` as `default_headers` on the AnthropicBedrock
client. Previously that constructor passed no betas at all, so native
Anthropic had the 1M unlock via default_headers but Bedrock didn't.
- Fast-mode per-request `extra_headers` already rebuilds from
`_common_betas_for_base_url`, so it picks up the 1M beta automatically.
Reported by user 'Rodmar' on Discord: Bedrock Opus 4.7 stuck at 200K while
same credentials worked in Claude Code.
Anyone who ran hermes between Apr 15 (42aeb4ec) and Apr 22 (a7d78d3b)
has schema_version=7 from the pre-renumber api_call_count migration.
When a7d78d3b inserted reasoning_content as the new v7 and pushed
api_call_count to v8, the 'if current_version < 7' gate was already
false for those users, so reasoning_content was never created —
sqlite3.OperationalError: no such column: reasoning_content on any
/continue or /resume touching assistant replays.
Replaces the version-gated ADD COLUMN chain with _reconcile_columns():
on every startup, parse SCHEMA_SQL via an in-memory SQLite and diff
against PRAGMA table_info; ALTER TABLE ADD COLUMN for anything missing.
Follows the Beets / sqlite-utils pattern — SCHEMA_SQL becomes the single
source of truth for declared columns. Self-healing and idempotent.
v10 trigram FTS backfill is retained in a version-gated block — that
migration isn't a column add, it inserts existing message rows into
the new FTS virtual table, so reconciliation can't express it.
schema_version is also kept for future row-data migrations.
Salvaged from #14097 (@kshitijk4poor) onto current main; v10 trigram
preservation and the v9 codex_message_items column (stale-missed by
the original branch) are covered automatically by reconciliation.
Tests:
- Regression: DB at old v7 with api_call_count but no reasoning_content
gets the column on open
- Idempotency: reopening the same DB is a no-op
- Structural invariant: every SCHEMA_SQL column is in the live DB
- Existing v2 migration test still passes
- E2E verified against fresh / v1 / old-v7 / v9 DBs, plus v10 trigram
backfill preserved
Port https://github.com/blader/humanizer (MIT, v2.5.1, 16k stars) into
the built-in skills under skills/creative/humanizer/. Based on Wikipedia's
'Signs of AI writing' guide (WikiProject AI Cleanup) — detects 29 AI-writing
patterns and rewrites them to sound human.
Hermes-native adaptations:
- Description (<60 chars) explains what it's for: 'Humanize text: strip
AI-isms and add real voice.'
- 'When to use this skill' section — trigger phrases (humanize, de-AI,
de-slop, un-ChatGPT, rewrite to not sound like an LLM) plus guidance to
apply it to the agent's own output (release notes, PR descriptions, docs).
- 'How to use it in Hermes' — maps the three real input paths (inline,
file via read_file/patch/write_file, voice-calibration sample) onto the
tools the agent actually has. Drops Claude Code's allowed-tools block.
- Converted frontmatter to Hermes format (metadata.hermes.tags, category,
homepage, related_skills).
Attribution preserved:
- Original author Siqi Chen (@blader) credited in frontmatter and body.
- Full MIT LICENSE copied verbatim alongside SKILL.md.
- Wikipedia / WikiProject AI Cleanup credited.
- 29 patterns, personality/soul section, and full worked example kept
verbatim from the source (29,914 chars).
Validated end-to-end against a clean HERMES_HOME:
- sync_skills() copies skills/creative/humanizer/ including LICENSE.
- skills_list(category='creative') returns the 48-char description.
- skill_view(name='humanizer') returns the full body with all 29 patterns,
personality/soul, attribution, and Hermes tool refs (read_file, patch,
write_file) intact.
Plugins can now observe dangerous-command approval events in real time,
on both the CLI-interactive path and the async gateway path. This is the
missing hook surface external tools need to build approval notifiers
(macOS menu-bar allow/deny, Slack alerts, audit logs, etc.) without
forking Hermes or running a parallel gateway adapter.
Changes:
- hermes_cli/plugins.py: add two entries to VALID_HOOKS
- tools/approval.py: fire both hooks from check_all_command_guards --
around prompt_dangerous_approval (CLI surface) and around the
notify_cb + blocking event.wait loop (gateway surface)
- website/docs/user-guide/features/hooks.md: document both hooks with
a macOS-notification example
- tests/tools/test_approval_plugin_hooks.py: 5 tests covering CLI once,
CLI deny, plugin-crash resilience, gateway approve, gateway timeout
Hooks are observer-only: return values are ignored, so plugins cannot
veto or pre-answer an approval (use pre_tool_call for that). A crashing
plugin cannot break the approval flow -- invoke_hook swallows per-
callback errors, and the wrapper logs and swallows dispatch-layer
errors too.
Surface kwarg distinguishes "cli" from "gateway"; post hook reports
choice as one of once/session/always/deny/timeout.
A misconfigured auxiliary.compression.model is a user-fixable problem that silent recovery would hide. The previous retry-on-main logic transparently swallowed aux-model failures whenever the fallback succeeded, leaving the user's broken config in place and racking up future failures.
Track the aux-model failure on the compressor alongside the existing fallback-placeholder fields:
- _last_aux_model_failure_model: str | None
- _last_aux_model_failure_error: str | None
Both are set at the moment the aux model errors (captured before summary_model is cleared for retry), regardless of whether the retry succeeds. Cleared at compress() start and on on_session_reset() so a clean run doesn't leak stale warnings.
Surface at three places:
- gateway hygiene auto-compress: ℹ note to the platform adapter (thread_id preserved)
- gateway /compress command: ℹ line appended to the reply
- CLI via _emit_warning: deduped on (model, error) so repeat compactions don't spam
Distinct from the existing ⚠️ dropped-turns warning — different severity, different emoji, explicit 'context is intact' reassurance.
Adds four new reference docs covering common TD use cases not previously
documented in the skill:
- animation.md: LFOs, timers, keyframes, easing, time references
- midi-osc.md: MIDI controllers, OSC routing, TouchOSC, multi-machine sync
- particles.md: POPs and particleSOP — emission, forces, collisions, render
- projection-mapping.md: windowCOMP, corner pin, mesh warp, edge blending
Also clarifies the SKILL.md tool quick reference: adds td_screen_point_to_global
and notes that 4 admin/dev-mode tools (td_project_quit, td_test_session,
td_dev_log, td_clear_dev_log) live only in mcp-tools.md to keep the main
reference focused on creative workflows.
No SKILL.md workflow or critical-rules changes. References load on demand
so no token-budget impact at session start.
The existing retry-on-main path in _generate_summary only fires for errors that match the _is_model_not_found heuristic (404/503, 'model_not_found', 'does not exist', 'no available channel'). Other misconfiguration errors — 400s from aggregators, provider-specific 'no route' strings, opaque rejections — fall straight through to the transient-cooldown branch, which drops N turns of context and inserts a static placeholder.
Losing context is almost always worse than one extra summary attempt. Add a best-effort retry-on-main for the unknown-error branch, guarded by the same invariants as the existing fast-path retry: only when summary_model differs from main, and only once per compressor (_summary_model_fallen_back).
Tests cover: 404 fast-path fallback still works, unknown 400 now falls back, same-model aux skips retry (no infinite loop), and a double-failure (aux + main) stops at 2 calls.
PR #16333 added a warning to the manual /compress reply when the
auxiliary summariser fails and the static fallback placeholder is
used, but only the gateway-hygiene path had a test
(test_session_hygiene_warns_user_when_summary_generation_fails).
The /compress branch in _handle_compress_command was uncovered.
New test test_compress_command_appends_warning_when_summary_generation_fails
mocks the compressor's _last_summary_fallback_used /
_last_summary_dropped_count / _last_summary_error fields and
verifies the /compress reply contains the ⚠️ marker, the underlying
error string, the dropped message count, and the 'historical
message(s) were removed' wording — i.e. the same contract the
hygiene-path test enforces.
The per-call reset block at the top of compress() cleared
_last_summary_dropped_count and _last_summary_fallback_used but
not _last_summary_error. Functionally this didn't break the
gateway warning path (callers gate on _last_summary_fallback_used
first, and _last_summary_error is overwritten on the next failure),
but it left the three tracking fields inconsistent — anyone
reading _last_summary_error standalone after a successful compress
would see a stale value from a previous failed compress.
Reset all three together so the per-call contract is uniform.
The fallback placeholder said "N conversation turns were removed" while the
gateway warning said "N historical message(s) were removed". Use "messages"
in both so users don't wonder if the two counters refer to different things.
Address review feedback on PR #16333:
1. The hygiene-path warning send was missing metadata=_hyg_meta. On
Telegram topics / Slack threads / Discord threads the warning would
land in the main channel instead of the originating thread. Now
reuses the same _hyg_meta dict already computed for the hygiene
compaction itself.
2. New gateway-level test
test_session_hygiene_warns_user_when_summary_generation_fails
verifies end-to-end:
- When the compressor's _last_summary_fallback_used flag is True,
the gateway invokes adapter.send() exactly once.
- The warning message includes the dropped count and the underlying
error string.
- metadata={'thread_id': ...} is propagated so the warning lands
in the originating topic/thread.
Tests: 20 gateway hygiene + 54 context_compressor — all pass.
When auxiliary compression's summary LLM call fails (e.g. model 404,
auxiliary model misconfigured), the compressor still drops the selected
turns and inserts a static fallback placeholder — the dropped context
is unrecoverable.
Previously the only signal of this was a WARNING in agent.log. Gateway
users (Telegram/Discord/etc.) had no way to know context was lost
because the existing _emit_warning path requires a status_callback,
and the gateway hygiene path uses a temporary _hyg_agent with
quiet_mode=True and no callback wired up.
Changes:
- ContextCompressor: track _last_summary_fallback_used and
_last_summary_dropped_count on each compress() call. Cleared at the
start of compress() and on session reset.
- gateway/run.py hygiene: after auto-compress, inspect the temp
agent's compressor; if fallback was used, send a visible ⚠️ warning
to the user via the platform adapter (TG/Discord/etc.) including
dropped count and the underlying error.
- gateway/run.py /compress: append the same warning to the manual
compress reply so users running /compress see the failure too.
Acceptance:
- Summary success: no user-visible warning (unchanged).
- Summary failure on gateway hygiene: user receives a TG/Discord
message with dropped count + error + remediation hint.
- Summary failure on /compress: warning appended to the command reply.
- CLI status_callback / _emit_warning path is untouched.
- Test coverage: two new tests verify the tracking fields are set on
failure and cleared on subsequent success.
The typing-indicator refresh loop in BasePlatformAdapter._keep_typing
awaited each send_typing call unconditionally. Each call is an HTTP
round-trip to the platform API (Telegram/Discord), normally ~100ms. When
the same network instability that causes upstream provider timeouts
(e.g. Anthropic capacity blips slowing first-token latency past the
120s stream-read timeout) also slows the platform typing API to
multi-second response times, the refresh loop stalls inside the await.
Platform-side typing expires at ~5s, so the bubble dies and stays dead
until the stuck send_typing call returns — right when the user most
needs the 'still working' signal and instead sees a bot that looks
dead, then asks 'wtf are you doing' which itself interrupts the
eventually-recovering turn.
Bound each send_typing with asyncio.wait_for (1.5s cap, derived from
interval so it's always below the 2s cadence). Slow calls get abandoned
so the next scheduled tick fires a fresh send_typing on schedule. As
long as any one of them reaches the platform within its ~5s
typing-expiry window, the bubble stays visible across the stall.
Also catches non-timeout send_typing exceptions (transient HTTP errors)
so one bad tick doesn't terminate the whole loop.
Tests: 4 new in tests/gateway/test_keep_typing_timeout.py covering
slow-send non-blocking, fast-send still-awaited, exception resilience,
and paused-chat regression guard.
- moveCursor(extend=true) now collapses to the bare cursor when the
computed offset equals the existing anchor instead of leaving a
zero-length sel. Without this, Shift+Left at col 0 / Shift+Home at
start would silently hide the hardware cursor (selected truthy)
without rendering any highlight.
- _tui_need_npm_install also catches UnicodeDecodeError so a corrupted
/ non-UTF8 lockfile falls back to the mtime path the docstring
promises instead of crashing.
Made-with: Cursor
* feat(tui): auto copy-on-select for transcript text
Drag in the transcript already highlighted but you had to press Cmd+C to
land it on the clipboard, and the highlight cleared on copy — most users
never realised selection existed. Now drag-release fires copySelectionNoClear
so the text is on the clipboard immediately while the highlight stays put,
matching iTerm2's "Copy to pasteboard on selection" default. Esc clears.
Behaviour:
- Single click in the input still positions the cursor (TextInput onClick).
- Single click in the transcript still does nothing destructive.
- Double / triple click select word / line, then drag extends.
- /copyselect [on|off|toggle] (alias /cos) flips the setting at runtime,
HERMES_TUI_DISABLE_COPY_ON_SELECT=1 disables at startup, persists via
display.tui_copy_on_select in config.yaml.
Help overlay now lists drag-select, multi-click, and click-to-position
so the gestures are discoverable.
Made-with: Cursor
* fix(tui): support prompt text selection gestures
Add mouse drag selection and Shift+Arrow/Home/End extension inside the TUI composer so prompt text behaves like a normal editable field while keeping click-to-position and right-click paste intact.
Made-with: Cursor
* Revert "feat(tui): auto copy-on-select for transcript text"
This reverts commit 6701288fe0.
* fix(tui): allow composer selection from prompt whitespace
Give the composer a one-cell mouse capture pad before the editable text. The prompt glyph/gutter still does not become selectable, but dragging from the edge now anchors at input offset 0 so users do not need to hit the first character precisely.
Made-with: Cursor
* fix(tui): clear selections from blank composer space
Clicking blank space in the transcript or composer now clears active TUI/input selections like a normal text surface. TextInput clicks stop bubbling so cursor placement and selection gestures keep their local behavior.
Made-with: Cursor
* fix(tui): delegate prompt gutter drags to composer text
The prompt gutter is now an input gesture region, not selectable content. Dragging from the whitespace or prompt area anchors the composer selection at offset 0, while selection highlight/copy remains limited to actual input text.
Made-with: Cursor
* fix(tui): move composer cursor to end on selection clear
External clear actions now collapse the composer selection to the end of the input, matching normal text-field behavior after dismissing a selection.
Made-with: Cursor
* fix(tui): capture composer padding before prompt
Add an explicit mouse capture cell over the left padding before the prompt glyph. Drags starting there now delegate to the composer input at offset 0 instead of starting terminal-level selection over the prompt chrome.
Made-with: Cursor
* fix(tui): avoid npm install on lockfile mtime churn
Compare package-lock.json against npm's hidden node_modules lock by content instead of mtimes. Git checkouts and npm lock rewrites can make the root lockfile newer even when installed dependencies already match, causing hermes --tui to print Installing TUI dependencies on every launch.
Made-with: Cursor
* fix(tui): include prompt leading cell in gesture region
Use the prompt box's real layout region to cover the leading whitespace cell before the glyph. The cell now participates in mouse hit testing and delegates to composer selection instead of starting terminal-level selection.
Made-with: Cursor
* fix(tui): widen prompt-side gesture capture band
Capture a wider left-side band around the composer prompt row so drags starting in terminal gutter/padding cells are consumed and delegated to input selection, instead of triggering terminal-level selection chrome.
Made-with: Cursor
* fix(tui): make pre-prompt spacer non-selectable content
Replace the sticky-prompt fallback `Text(' ')` with an empty spacer box so the visual gap remains but no literal space character is rendered/copyable before the composer prompt.
Made-with: Cursor
* fix(tui): capture pre-prompt spacer without shifting prompt layout
Revert the widened negative-margin prompt capture band and instead capture drags on the dedicated spacer row above the prompt. This keeps prompt/text alignment stable while still delegating whitespace-start drags to composer selection.
Made-with: Cursor
* fix(tui): align prompt with status bar and capture full input row
Drop the leading prompt column from 3 to 2 so the input first character lines up with the status bar text. Wrap the prompt+input row in a single mouse-capture box and stop event propagation from TextInput's own handlers so any drag in that row delegates to composer selection without leaking to terminal-level selection.
Made-with: Cursor
* fix(tui): anchor hardware cursor during composer selection
When a composer selection covers a row exactly the column width, the rendered text fills the row and the terminal auto-wraps the hardware cursor to col 0 of the next row, leaving a ghost block beneath the prompt. Park the cursor at the start of the input box during selection so it can't escape the input region.
Made-with: Cursor
* fix(tui): hide hardware cursor during composer selection
Stop fighting auto-wrap by hiding the hardware cursor outright while the
composer has an active selection. This prevents both the ghost block under
the prompt (cursor wrapping past the last cell) and the parked-cursor block
on the first selected character. The cursor restores as soon as the
selection clears or focus changes.
Made-with: Cursor
* chore(tui): /clean — drop dead capture-pad path, dedupe gutter handlers
- TextInput: remove unused leftCaptureColumns prop and capture-pad math, drop
unused mouseApi.startAt, fold mouse offset into a single offsetAt helper,
share a MouseEventLite type across the four handlers.
- appLayout: hoist a GutterMouseEvent type and an endInputDrag callback so the
spacer/prompt/input rows share one shape.
- _tui_need_npm_install: lift the runtime-only key set to a module constant,
collapse nested isinstance checks, and document the mtime fallback.
Made-with: Cursor
* fix(tui): address copilot review on PR #16732
- Split InputSelection.clear() into clear() (cursor-preserving) and
collapseToEnd() (clear + jump to end). Cmd+C copy paths keep using
clear() so the cursor stays put; the blank-area click in useMainApp
switches to collapseToEnd() to match the requested UX.
- Spacer-row drags now force row=0 when forwarding into the input,
since the spacer's vertical origin doesn't align with the input box
and Ink mouse-capture keeps dispatching motion to the original
target. Prompt+input row drag keeps localRow because origins match.
Made-with: Cursor
* fix(tui): give TextInput Box an explicit width
After the /clean pass dropped the unused capture-pad math, the wrapping
Box also lost its explicit width and started sizing to its rendered
content. Clicks past the last character missed TextInput and fell
through to the parent prompt-row Box, which collapsed the cursor to
offset 0. Pin the Box back to `columns` so the input owns its full
column span regardless of value length.
Made-with: Cursor
* feat(tui): double-click select-all + hide cursor on terminal blur
- Track click time/offset in TextInput so a quick second click on the
same offset triggers select-all. Ink's screen-level multi-click is
bypassed once our onMouseDown captures, so the gesture has to be
detected locally.
- Extend the cursor-hide effect to also fire when the terminal loses
focus, so the hollow-rect ghost most terminals draw at the parked
cursor position disappears too.
Made-with: Cursor
* chore(tui): /clean — extract isMultiClickAt helper
Pull the click-recurrence math out of TextInput's onMouseDown into a
small isMultiClickAt(offset) helper so the handler reads as the gesture
list it actually is (multi-click → select-all, otherwise start).
Drop the redundant length>0 guard now that selectAll() already noops on
an empty value.
Made-with: Cursor
* docs(tui): explain _tui_need_npm_install content-vs-mtime comparison
Expand the docstring so future readers understand why we parse the
lockfiles instead of comparing mtimes, what the optional/peer skip
covers, how stale hidden-lock entries are handled, and when we fall
back to mtime.
The previous commit on this branch went through a layer that redacted
strings matching API-key patterns. Restore the original placeholder
values (sk-ant-..., ${ANTHROPIC_API_KEY}, etc.) that were already in
main so the diff is scoped strictly to the new Multi-profile support
section.
Clarifies that Hermes' built-in multi-profile feature is not recommended
when running under Docker. Recommends instead running one container per
profile, each bind-mounting its own host data directory as /opt/data.
Includes docker run examples, a rationale list (isolation, independent
lifecycle, port separation, concurrent-write safety), and a Compose
snippet showing two profile services side by side.
- Rename `removeAt` → `removeAtInPlace` and document the mutation
contract; the old name read like a non-mutating helper.
- Hotkey table + queue header: use `Ctrl+X` / `Esc` to match the
rest of the UI (was `⌃X` / `esc`).
- Render the queued header as a single template literal so JSX
text-node whitespace can't sneak into the rendered line.
- Make `Esc` while editing beat the `terminal.hasSelection` clear:
the header promises 'Esc cancel', so an active selection
shouldn't silently consume the keystroke.
The text input's ctrl-passthrough whitelist only listed Ctrl+C and
Ctrl+B. Ctrl+X fell through to the printable-char branch and got
inserted as 'x' alongside the queue-delete action firing in
useInputHandlers.
Add Ctrl+X to the same whitelist so it bypasses the readline-style
fallback and reaches the app-level handler unchanged. When not in
queue-edit mode it's a no-op, which is fine — typing 'x' on Ctrl+X
was the wrong default anyway.
Today there's no way to remove a queued message — ↑ loads it for edit,
ctrl-K dispatches the head, but a draft you no longer want stays put
forever. ctrl-C just clears the composer and exits edit mode without
touching the queue.
Two new bindings, both gated on queueEditIdx !== null so they're
inert when the user isn't pointing at a queue item:
- ctrl-X — delete the queue item being edited, clear composer, exit
edit mode. "cut" matches the mental model and doesn't collide with
any existing binding.
- esc — cancel the edit (composer clears, item stays in queue).
Mirrors ctrl-C's existing behavior so muscle memory has two paths.
Header line now reads `queued (3) · editing 2 · ⌃X delete · esc cancel`
when in edit mode, so the affordance is discoverable without /help.
The /help hotkey table also gets a Ctrl+X entry.
ctrl-C is intentionally unchanged: it should never destroy queued
content. Cancel is non-destructive (esc / ctrl-C); only ctrl-X
removes the item.
Same layering concern as the persisted-assistant scrub already removed:
_emit_interim_assistant_message and the final_response return path were
mutating model output broadly. Streaming scrubber covers real leaks
delta-by-delta; these post-stream scrubs were redundant.
Reviewer pushback on the original boundary-hardening commits — three
overreach points pulled plugin-specific policy into shared core paths:
1. gateway/run.py hardcoded a '## Honcho Context' literal split for
vision-LLM output. Plugin-format heading in framework code; could
truncate legitimate output naturally containing that header.
Drop the literal split; keep generic sanitize_context (the wrapper
strip is plugin-agnostic). Plugin-specific cleanup belongs at the
provider boundary, not the shared gateway path.
2. run_agent.run_conversation scrubbed user_message and
persist_user_message before the conversation loop. User text is
sacred — if a user types a literal <memory-context> tag we must
not silently delete it. The producer (build_memory_context_block)
is the only legitimate emitter; user input should never need the
reverse op.
3. _build_assistant_message scrubbed model output before persistence.
Same hazard: would silently mutate legitimate documentation/code
the model emits containing the literal markers. The streaming
scrubber catches real leaks delta-by-delta before content is
concatenated; persist-time scrub was redundant belt-and-suspenders.
4. _fire_stream_delta stripped leading newlines from every delta unless
a paragraph break flag was set. Mid-stream '\n' is legitimate
markdown — lists, code fences, paragraph breaks — and chunk
boundaries are arbitrary. Narrow lstrip to the very first delta
of the stream only (so stale provider preamble still gets cleaned
on turn start, but mid-stream formatting survives).
Plus: build_memory_context_block now logs a warning when its defensive
sanitize_context strips something — surfaces buggy providers returning
pre-wrapped text instead of silently double-fencing.
Net architectural change: scrub surface collapses from 8 sites to 3
(StreamingContextScrubber on output deltas, plugin→backend send,
build_memory_context_block input-validation). Plugin-specific strings
stay out of shared runtime paths. User input and persisted assistant
output are no longer mutated.
Tests: rescoped TestMemoryContextSanitization (helper-correctness only,
no source-inspection of removed call sites), updated vision tests to
drop '## Honcho Context' literal-split assertions, updated
_build_assistant_message persistence test to assert preservation.
Added: cross-turn scrubber reset, build_memory_context_block warn-on-
violation, mid-stream newline preservation (plain + code fence).
Closed PR #5137 addressed the retrieval path (peer cards via get_card()
instead of the session-scoped lookup that returned empty for per-session
messaging flows) — that architectural fix is already in main as
_fetch_peer_card / _fetch_peer_context.
What never got fixed is the user-visible side: honcho_profile returning
a flat 'No profile facts available yet.' leaves the model to guess at
why. The model then often surfaces it to the user as a cryptic error.
Adds a diagnostic hint next to the existing 'result' message, enumerating
the likely causes in rough order of frequency:
1. Observation disabled for this peer (user_observe_me/others off)
2. Peer card hasn't accumulated yet (fresh peer / dialectic cadence
hasn't fired enough turns — cards build over time)
3. Generic fallback: self-hosted Honcho < 3.x lacks peer cards
The hint also suggests alternative tools (honcho_reasoning / honcho_search)
so the model can route around the empty card rather than giving up.
Schema description updated so the model knows the hint field exists and
that an empty card is NOT an error state.
7 tests cover the hint paths: warmup, observation-disabled for user + ai,
generic fallback, populated card still returns plain result (no hint),
alternative-tool suggestion present.
The scheme-validation commit (e77a3f2c) was too strict: a user with
legacy ''baseUrl: localhost:8000'' (no ''http://'' prefix) in their
''~/.honcho/config.json'' would get ''No API key configured'' from the
CLI after that change, even though their setup worked before.
urlparse on a schemeless host:port treats the host segment as the
scheme and leaves netloc empty, so the http/https check rejected it.
Falls back to a lenient check for schemeless strings that look like
hosts: contain '.' or ':', aren't a boolean/null literal, aren't pure
digits. The SDK still rejects truly malformed URLs at connect time
with a clearer error than ours.
Three new tests: legacy schemeless hosts accepted; obvious garbage
literals (''true'', ''null'', ''12345'') still rejected. Reviewer
noted concern #1: schemeless regression for self-hosters with old
configs.
main's 6a957a74 added an optional 'metadata' kwarg to
MemoryProvider.on_memory_write so providers can distinguish tool-driven
memory writes from background-review writes. MemoryManager already
does a getfullargspec-based introspection, so the old 3-arg signature
didn't break at runtime — but it missed the origin hint entirely.
Updates HonchoMemoryProvider.on_memory_write to accept the kwarg. The
metadata isn't yet threaded into Honcho's create_conclusion payload —
that's worth its own PR once the consolidation lands and the new
metadata shape stabilises.
Two small follow-ups to the PR review:
- Hoist hashlib import from _enforce_session_id_limit() to module top.
stdlib imports are free after first cache, but keeping all imports at
module top matches the rest of the codebase.
- _resolve_api_key now URL-parses baseUrl and requires http/https +
non-empty netloc before returning the 'local' sentinel. A typo like
baseUrl: 'true' (or bare 'localhost') no longer silently passes the
credential guard; the CLI correctly reports 'not configured'.
Three new tests cover the new validation (garbage strings, non-http
schemes, valid https).
fixes#5719
The auxiliary vision LLM called by gateway._enrich_message_with_vision
can echo its injected Honcho system prompt back into the image
description. That description gets embedded verbatim into the enriched
user message, so recalled memory (personal facts, dialectic output)
surfaces into a user-visible bubble.
Strips both forms of leak before embedding:
- <memory-context>...</memory-context> fenced blocks (sanitize_context)
- trailing '## Honcho Context' sections (header + everything after)
Plus regression tests:
- tests/agent/test_streaming_context_scrubber.py — 13 tests on the
stateful scrubber (whole block, split tags, false-positive partial
tags, unterminated span, reset, case-insensitivity)
- tests/run_agent/test_run_agent_codex_responses.py — 2 new tests on
_fire_stream_delta covering the realistic 7-chunk leak scenario and
the cross-turn scrubber reset
- tests/gateway/test_vision_memory_leak.py — 4 tests covering the
vision auto-analysis boundary (clean pass-through, '## Honcho Context'
header, fenced block, both patterns together)
sanitize_context() uses a non-greedy block regex that needs both
<memory-context> open and close tags present in a single string. When a
provider streams the fenced memory block across multiple deltas (typical
for recalled-context leaks — the payload often arrives in 10+ 1-80 char
chunks), the per-delta sanitize stripped the lone open/close tags via
_FENCE_TAG_RE but let the payload in between flow straight to the UI.
Adds StreamingContextScrubber: a small stateful scrubber that tracks
open/close tag pairs across deltas, holds back partial-tag tails at
chunk boundaries, and discards span contents wholesale (including the
system-note line that fragments across deltas).
Wired into _fire_stream_delta; reset per user turn; benign trailing
partial-tag tails are flushed at the end of each model call. Mid-span
interruption (provider drops closing tag) drops the orphaned content
rather than leaking it — truncated answer > leaked memory.
Follow-up to #13672 (@dontcallmejames).
new_session() was popping the old cached session, releasing the lock,
calling get_or_create, then re-acquiring the lock to insert. A concurrent
caller could observe the empty-cache window and race-create its own
session, producing two divergent session objects for the same key.
_cache_lock is an RLock, so nested reacquisition inside get_or_create is
safe. Hold it across the whole pop/create/insert sequence.
Follow-up to #13510 (@hekaru-agent).
When no explicit timeout is configured (HonchoClientConfig.timeout,
honcho.timeout / requestTimeout, or HONCHO_TIMEOUT), get_honcho_client
previously constructed the SDK with no timeout kwarg, letting the
underlying httpx client hang indefinitely if the Honcho backend
became unreachable mid-request.
This is a silent-failure hazard on the post-response path of
run_conversation: the memory_manager.sync_all() / queue_prefetch_all()
calls fire after the agent has already generated its final reply, so
a stalled Honcho request blocks run_conversation from returning.
The gateway never logs "response ready" and never delivers the
response to the platform (Telegram, etc.), even though the text is
already saved to the session file.
Repro: unplug the network or block app.honcho.dev mid-turn after
the model has produced its final message. Without this change,
_run_agent never returns. With it, the call aborts after 30s,
run_conversation returns, and the gateway delivers the response
(Honcho sync failure is logged and swallowed as before).
The default applies only when nothing is configured, so any
deployment that has explicitly set timeout / HONCHO_TIMEOUT /
honcho.timeout / honcho.requestTimeout keeps its existing value.
Self-hosted deployments that genuinely need a longer ceiling can
still override via any of those knobs.
_resolve_api_key() only checks for apiKey / HONCHO_API_KEY, so all
CLI subcommands (identity --show, status, migrate, etc.) bail with
"No API key configured" on self-hosted instances that use baseUrl
without an API key.
Return "local" when baseUrl or HONCHO_BASE_URL is set, matching the
client.py behavior that already handles this case for the SDK.
Tested on: macOS, self-hosted Honcho (Docker, localhost:8000).
Wraps _session_cache mutations in threading.RLock. Without this, concurrent
gateway sessions (e.g., Telegram + Discord hitting Honcho at the same time)
can race on the cache and silently lose conclusions or memory writes.
Adopted from #13510 by @hekaru-agent; the off-topic cron/jobs.py cleanup
hunk from that PR is dropped here for scope isolation. Resolved a small
conflict with the pinPeerName guard (kept both).
Gateway session keys (Matrix "!room:server" + thread event IDs, Telegram
supergroup reply chains, Slack thread IDs with long workspace prefixes) can
exceed Honcho's 100-character session ID limit after sanitization. Every
Honcho API call for those sessions then 400s with "session_id too long".
Add a helper that enforces the 100-char limit after sanitization:
short keys (the common case) short-circuit unchanged; over-limit keys
keep a prefix and append a deterministic `-<8 hex>` SHA-256 suffix over
the original key so two long keys sharing a leading segment can't
collide onto the same truncated ID.
Adds 7 regression tests in tests/honcho_plugin/test_client.py covering
short / exact-limit / long / deterministic / collision-resistant /
allowlist-preserving / hash-suffix-present cases.
CI caught that ``test_session_manager_prefers_runtime_user_id_over_config_peer_name``
in ``tests/agent/test_memory_user_id.py`` failed after this branch: that
test passes a ``MagicMock`` for ``config``, where
``mock.pin_peer_name`` silently returns another ``MagicMock`` — truthy by
default. My ``getattr(..., "pin_peer_name", False)`` fallback was
supposed to guard against callers that haven't added the new attr, but
MagicMock *does* have the attr — it just returns a live mock for it.
Tightened the gate to ``getattr(..., False) is True``. Real configs
built via ``HonchoClientConfig.from_global_config`` always yield a
proper boolean, so strict equality matches the pinned case and rejects
both the unset-attr fallback and MagicMock stand-ins. Added a comment
explaining why ``is True`` is intentional, not paranoid.
Also tightened the ``peer_name`` existence check to
``getattr(..., None)`` so a MagicMock with ``peer_name`` left at its
default (also truthy) doesn't spuriously enable pinning either.
Verified against both the new ``test_pin_peer_name.py`` suite (13/13
pass) and the previously-failing
``TestHonchoUserIdScoping`` (3/3 pass). Zero behaviour change for real
``HonchoClientConfig`` values.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a gateway drives Hermes (Telegram, Discord, Slack, ...), it passes the
platform-native user ID as ``runtime_user_peer_name`` into the Honcho
session manager. That ID wins over ``peer_name`` in ``honcho.json``, so a
single user who connects over three platforms ends up as three separate
Honcho peers — one per platform — with fragmented memory and no cross-
platform context continuity.
For multi-user bots this is correct (and must not change): each user gets
their own peer scope. For the vast majority of personal Hermes deployments
the configured ``peer_name`` is an unambiguous identity, though, so the
reporter asked for an opt-in knob that pins the user peer to that value.
Fix: new ``pinPeerName`` boolean on the host config, default ``false``.
When ``true`` AND ``peerName`` is set, the configured peer_name beats the
gateway's runtime identity; every other resolution case is unchanged.
honcho.json:
{
"peerName": "Igor",
"hosts": {
"hermes": { "pinPeerName": true }
}
}
session.py (resolution order, pinned case):
runtime_user_peer_name → skipped (opt-in flag active)
config.peer_name → WINS "Igor"
session-key fallback → unreached
Parsing follows the same host-block-overrides-root pattern as every other
flag in HonchoClientConfig.from_global_config (``_resolve_bool`` helper).
Tests (tests/honcho_plugin/test_pin_peer_name.py — 13 cases, 5 groups):
- Config parsing: default, root true, host-block true, host overrides
root, explicit false.
- Peer resolution: runtime wins by default (regression guard for multi-
user bots), config wins when pinned, pin-without-peer_name is a no-op
(prevents silent peer-id collapse to session-key fallback), CLI path
where runtime is absent, deepest fallback intact, assistant peer
untouched by the flag.
- Cross-platform unification: Telegram UID + Discord snowflake collapse
to one peer when pinned; negative control confirms two distinct
runtime IDs still produce two peers when unpinned.
244 honcho_plugin tests pass, 3 pre-existing skips, zero regressions.
Defensive detail: session.py uses ``getattr(self._config, "pin_peer_name",
False)`` so callers building partial config objects (several test fixtures
across the codebase do this) don't break if they haven't updated yet.
Runtime cost: one attr lookup per new session.
Closes#14984
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* feat(nix): parameterize dependency-groups in python.nix
* refactor(nix): extract package to callPackage-able hermes-agent.nix
Makes the package overridable via .override{} and adds
extraPythonPackages parameter for PYTHONPATH injection.
Includes build-time collision check using PEP 503 name
canonicalization.
* feat(nix): add overlay for external NixOS consumption
External flakes can now add overlays = [ inputs.hermes-agent.overlays.default ]
to get pkgs.hermes-agent with full .override support.
* test(nix): add check for extraPythonPackages PYTHONPATH injection
Verifies wrapper has PYTHONPATH when extras provided, and
base package has no PYTHONPATH without extras.
* feat(nix): add extraPlugins option for directory-based plugins
Symlinks plugin packages into HERMES_HOME/plugins/ at activation time.
Validates plugin.yaml presence. Asserts unique plugin names at eval time.
Hermes discovers them automatically via its directory scan.
* feat(nix): add extraPythonPackages option for entry-point plugins
Overrides the hermes package with PYTHONPATH injection when
extraPythonPackages is non-empty. Plugin .dist-info directories
become visible to importlib.metadata for entry-point discovery.
Works in both native systemd and container modes.
* docs: add NixOS declarative plugin installation to nix-setup, plugins, and build-a-plugin guides
- nix-setup.md: new Plugins section with extraPlugins/extraPythonPackages
examples, overlay usage, collision checking note, options reference rows
- plugins.md: Nix row in discovery table, NixOS declarative plugins section
- build-a-hermes-plugin.md: Distribute for NixOS section after pip section
* fix: address review feedback — remove unrelated umask, fix fetchFromGitHub naming, simplify checks
- Remove accidentally introduced umask/migration changes (unrelated to plugins)
- Add pluginName helper, fix fetchFromGitHub producing name='source'
- Show name= in extraPlugins example docs
- Simplify checks.nix: use hermes-agent.override instead of re-callPackage
- Fix fragile grep shell logic in checks
* refactor: address simplify feedback — lib.getName, drop unused inputs', Python list for extras
- Use lib.getName instead of custom pluginName helper
- Drop unused inputs' from checks.nix perSystem args
- Pass extraPythonPackages as Python list literal instead of colon-split string
* fix: walk propagatedBuildInputs for plugin PYTHONPATH and collision check
Uses python312.pkgs.requiredPythonModules to resolve the full transitive
closure of extraPythonPackages. Without this, a plugin with third-party
deps (e.g. requests) would fail at runtime if those deps weren't already
in the sealed uv2nix venv. The collision check now also scans the full
closure, catching transitive conflicts.
* cleanup: fold plugins into subdir loop, use find for symlink cleanup, inline lib.getName
- Add 'plugins' to the existing cron/sessions/logs/memories subdir loop
instead of a separate mkdir/chown/chmod block
- Replace fragile for-glob with find -delete for stale symlink cleanup
- Inline lib.getName at both call sites, remove pluginName wrapper
* fix: bypass FTS5 for CJK queries in session_search
FTS5 default tokenizer splits CJK characters into individual tokens,
so multi-character queries like "大别山项目" become AND of single chars.
This produces few/no results compared to LIKE substring search.
For CJK queries, skip FTS5 entirely and use LIKE for accurate
phrase matching.
Fixes NousResearch/hermes-agent#15500
* fix: cache _contains_cjk, escape LIKE wildcards, add regression tests
On top of the CJK FTS5 bypass from #15509:
- Cache _contains_cjk() result in a local var to avoid redundant O(n)
scans on every CJK query
- Escape %, _ in LIKE queries so literal wildcards in user input are
not treated as SQL wildcards (consistent with other LIKE queries in
hermes_state.py that use ESCAPE '\')
- Fix misleading comment ('or CJK fallback' → accurate description)
- Add 3 regression tests:
- test_cjk_partial_fts5_results_supplemented_by_like (#15500 / #14829)
- test_cjk_like_dedup_no_duplicates
- test_cjk_like_escapes_wildcards (new wildcard escaping)
* feat: trigram FTS5 index for CJK search, replace LIKE fallback
Replace the LIKE '%query%' full-table-scan fallback for CJK queries with
a proper trigram FTS5 index (messages_fts_trigram). The trigram tokenizer
creates overlapping 3-byte sequences so substring matching works natively
for any script — CJK, Thai, etc.
For queries with 3+ CJK characters: uses the trigram FTS5 table with
proper ranking, snippets, and indexed lookups. For shorter queries
(1-2 CJK chars): falls back to LIKE since the trigram tokenizer needs
≥9 UTF-8 bytes (3 CJK chars) minimum.
Schema v10 migration creates the trigram table and backfills existing
messages. Triggers keep the index in sync on INSERT/UPDATE/DELETE.
Builds on top of #16276 (bypass FTS5 for CJK, escape LIKE wildcards).
---------
Co-authored-by: vominh1919 <vominh1919@gmail.com>
Keep the parity test backed by the real Python command registry while avoiding hard failures in Node-only Vitest environments that cannot import hermes_cli.commands.
- config.py: remove dead ENV_VARS_BY_VERSION[17] entry (current _config_version
is 22, so all users are past version 17 and would never be prompted for
GMI_API_KEY on upgrade — consistent with how arcee was added)
- auxiliary_client.py: use google/gemini-3.1-flash-lite-preview as GMI aux
model instead of anthropic/claude-opus-4.6 (matches cheap fast-model pattern
used by all other providers: zai→glm-4.5-flash, kimi→kimi-k2-turbo-preview,
stepfun→step-3.5-flash, kilocode→google/gemini-3-flash-preview)
- test_gmi_provider.py: fix malformed write_text() call in doctor test
(was: write_text("GMI_API_KEY=*** encoding="utf-8") → missing closing quote,
wrote literal string 'GMI_API_KEY=*** encoding=' to .env file)
- test_gmi_provider.py + test_auxiliary_client.py: update aux model assertions
to match new cheaper default
- docs/integrations/providers.md: add 'gmi' to inline 'Supported providers'
fallback list (was only in the table, not the inline list at line ~1181)
- docs/reference/cli-commands.md: add 'gmi' to --provider choices list
Distinguish missing model from unsupported model before enabling fast mode and cover both cases so config and live agent state remain untouched on invalid fast toggles.
Match classic CLI parity by refusing to enable fast mode when the active model cannot produce fast request overrides, avoiding a misleading fast status with no runtime effect.
Make `config.set fast status` read-only and keep live agent request overrides in sync with fast-mode toggles so runtime API kwargs match the selected mode.
Expose a small forceRedraw API from @hermes/ink and use it for Ctrl/Cmd+L so the hotkey performs a real terminal clear + full repaint instead of a no-op state patch.
Use explicit repaint patch semantics for Ctrl/Cmd+L and narrow the hotkey assertion to the actual +L entry so unrelated descriptions do not cause false failures.
Harden busy mode config reads against invalid display config shapes and align /fast help+usage text with accepted aliases, with regression coverage for non-dict display values.
Make Ctrl+L non-destructive by redrawing the current screen state instead of starting a new session, and stop auto-appending --global for typed /model commands so session scope remains the default unless explicitly requested.
Route /browser, /reload-mcp, /rollback, /stop, /fast, and /busy through direct TUI RPC handlers so state changes hit the live gateway session instead of slash-worker fallback. Add TUI session finalize/reset parity hooks (memory commit + plugin boundaries) and parity matrix tests to keep mutating commands off fallback.
Handle queued-title ValueError cleanup during session init, harden Discord message source building for test stubs, and fix the Dockerfile contract test syntax error. Also refresh the TUI lockfile and Nix build flags so nix ubuntu-latest no longer fails on npm lock/peer resolution drift.
Retry queued pending titles even when the DB already has a non-empty title so explicit user title intents are not silently lost (for example after auto-title). Includes regression coverage.
Tighten pending-title flush during session init and treat row lookup failures during title-set no-op detection as RPC errors instead of silently queueing.
Handle session.title read failures without crashing, distinguish no-op title writes from missing session rows, and use a distinct empty-title error code with regression coverage.
- create HERMES_TUI_ACTIVE_SESSION_FILE with mkstemp instead of a predictable tmp path and always cleanup in finally
- add assertions that launch wiring uses a randomized session file path and removes it on exit
- use a grouped last_active join in search_sessions to avoid per-row correlated max lookups
- always close SessionDB in _resolve_last_session via finally and add regression coverage for search failure cleanup
- order session listing by computed last_active in SessionDB so callers get MRU rows directly
- keep _resolve_last_session as a single-row lookup and add regression coverage for >20 session sampling
Route TUI /title through session.title RPC and queue titles when the session DB row is still initializing, so renamed sessions reliably appear in /resume and browse flows.
The auto-lowered-threshold warning only named the compression model,
making it confusing when the main and aux models are configured with
the same slug but end up with different resolved context lengths (e.g.
OpenRouter's stepfun/step-3.5-flash catalog value vs. a main-model
context_length override). Users couldn't tell whether the warning
reflected two different models or a context-resolution mismatch.
Now includes both 'model (provider)' labels. The aux provider falls
back to the client's base_url hostname when the configured provider
is 'auto', so users see where compression is actually being called.
Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.
Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.
Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>
* fix: clean gateway auxiliary client caches on teardown
* fix(gateway): recover from stale pid files and close cron agents
Two issues were keeping the gateway from surviving long runs:
1. `_cleanup_invalid_pid_path` delegated to `remove_pid_file`, which
refuses to unlink when the file's pid differs from our own. That
safety check exists for the --replace atexit handoff, but it also
applied to stale-record cleanup, so after a crashy exit the pid
file was orphaned: `write_pid_file()`'s O_EXCL create then failed
with `FileExistsError`, and systemd looped on "PID file race lost
to another gateway instance". Unlink unconditionally from this
helper since the caller has already verified the record is dead.
2. The cron scheduler never closed the ephemeral `AIAgent` it creates
per tick, and never swept the process-global auxiliary-client
cache. Over days of 10-minute ticks this leaked subprocesses and
async httpx transports until the gateway hit EMFILE. Release the
agent and call `cleanup_stale_async_clients()` in `run_job`'s
outer `finally`, matching the gateway's own per-turn cleanup.
* chore(release): map bloodcarter@gmail.com -> bloodcarter
---------
Co-authored-by: bloodcarter <bloodcarter@gmail.com>
When a paste takes longer than 500ms to process on the prompt_toolkit
event-loop thread, emit a logger.warning with elapsed time, byte size,
line count, and sys.platform. Gives us concrete repro data for the
recurring 'CLI freezes after paste on macOS' class of reports (issue
#16263, plus sibling reports across Claude Code / Cursor / Lightroom
against macOS Tahoe 26).
Pure diagnostic — no behavior change. Two time.perf_counter() calls
and one conditional per paste event. Log line only fires when the
handler is actually slow, so normal pastes add no log noise.
The backup takes a consistent snapshot of each .db via sqlite3.backup(),
so shipping the live .db-wal / .db-shm / .db-journal alongside pairs the
fresh snapshot with stale sidecar state and produces a torn restore on
first open. Sidecars are transient and SQLite regenerates them on next
connection anyway.
This also trims multi-MB of junk from every zip — state.db-wal alone was
~9 MB here, doubled by the fact the WAL is the live write-ahead log, not
data.
PR #13734 fixed the concurrent-tool-executor vector (ThreadPoolExecutor
workers didn't inherit the CLI's TLS approval callback). Two vectors
remained that could still land in the deadlocking input() fallback:
1. _spawn_background_review spawns a raw threading.Thread with no
approval callback installed, so any dangerous-command guard the
review agent trips falls back to input() -> deadlock against the
parent's prompt_toolkit TUI (same class as delegate_task subagents,
fixed in 023b1bff1 / #15491). Install a _bg_review_auto_deny
callback at thread start, clear on finally.
2. prompt_dangerous_approval's fallback unconditionally spawned a
daemon thread calling input() when approval_callback was None.
That fallback can never succeed under prompt_toolkit because the
user's Enter goes to pt's raw-mode stdin capture. Detect an active
pt Application via get_app_or_none() and fail closed (deny + log)
instead, so future threads that forget to install a callback
degrade gracefully instead of hanging 60s invisibly.
Regression guards:
- tests/run_agent/test_background_review.py verifies the review
worker thread sees a callable auto-deny callback mid-run and that
the slot is cleared in the finally block.
- tests/tools/test_approval.py TestFailClosedUnderPromptToolkit
verifies prompt_dangerous_approval returns 'deny' fast under a
mocked pt Application, and that a real callback still wins over
the guard.
When tools execute concurrently via ThreadPoolExecutor, worker threads
could not see the thread-local approval/sudo callbacks registered by
the CLI. This caused dangerous-command prompts to fall back to plain
input(), which deadlocks against prompt_toolkit's raw terminal mode.
Capture parent-thread callbacks before launching workers, register
them locally in each _run_tool thread, and clear them on exit.
Mirrors the existing fix pattern from cli.py run_agent() for the
main agent worker thread (GHSA-qg5c-hvr5-hjgr / #13617).
The background skill/memory review agent was created without toolset
restrictions, inheriting the full default tool set. This allowed it to
use terminal, send_message, delegate_task, and other tools outside its
intended scope, potentially performing unrelated side effects after
skill creation.
Restrict the review agent to only memory and skills toolsets by passing
enabled_toolsets=['memory', 'skills'] during AIAgent construction.
Fixes#15204
The gateway fix in the previous commit forwards _session_messages on
gateway session teardown. The CLI exit cleanup path had the same bug:
it read getattr(agent, 'conversation_history', None) or [] — but AIAgent
has no conversation_history attribute, so providers always received [].
Switch to _session_messages (same attribute the gateway now uses),
guarded by isinstance(..., list) to preserve the no-arg fallback for
MagicMock-based CLI test stubs.
Adds tests/cli/test_cli_shutdown_memory_messages.py (4 cases mirroring
the gateway suite).
``_cleanup_agent_resources`` previously invoked
``agent.shutdown_memory_provider()`` with no arguments, so every memory
provider's ``on_session_end`` hook received an empty list. Providers
with an early-return guard on empty input (Holographic, Hindsight) never
extracted facts from the conversation, and users hit
"抱歉,找不到相關的對話記錄" on the first turn after any gateway
restart, session reset, or idle expiry.
Forward ``agent._session_messages`` — the transcript the agent itself
maintains and refreshes every turn via ``_persist_session`` — so
providers see the actual conversation. Falls back to the legacy no-arg
call whenever the attribute is absent or not a list (test stubs built
via ``object.__new__`` or ``MagicMock``) to preserve backward
compatibility with existing suites. ``AIAgent.shutdown_memory_provider``
already accepts ``messages: list = None`` (run_agent.py:4126), so this
is a pure caller-side fix.
Paths that use ``skip_memory=True`` temporary agents (memory flush,
hygiene auto-compress, ``/compress``) are no-ops inside
``shutdown_memory_provider`` because ``self._memory_manager`` is None —
no behaviour change for them.
Covers Part A of the bug report. Part B (adding ``on_session_end`` to
the Hindsight plugin) is a separate concern that would benefit from
this fix landing first.
Regression test added at
``tests/gateway/test_shutdown_memory_provider_messages.py`` covering:
populated messages forwarded, empty list still forwarded, attribute
missing falls back, non-list (MagicMock) falls back, provider
exceptions don't block ``close()``, None agent no-op, and agent
without ``shutdown_memory_provider`` tolerated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Session-local trajectory cache — keyed by session hash, regenerated
per-session, won't port to another machine anyway. On a large install
this was multiple GB of pure noise in every zip.
Also adds a regression test for the pre-existing backups/ exclusion
so the two machine-local dirs share coverage.
The zip backup could add minutes to every 'hermes update' on large
HERMES_HOME directories. Flip the default to off and add a --backup
flag for one-off opt-in runs.
- updates.pre_update_backup default: True -> False
- hermes update: new --backup flag (opposite of existing --no-backup)
- Silent no-op when disabled (no message spam on every update)
- Existing --no-backup still works and wins over --backup
- Users who explicitly set pre_update_backup: true keep the old behavior
- Tests updated to cover default-off, --backup opt-in, and config-enabled paths
* feat(image-input): native multimodal routing based on model vision capability
Attach user-sent images as OpenAI-style content parts on the user turn when
the active model supports native vision, so vision-capable models see real
pixels instead of a lossy text description from vision_analyze.
Routing decision (agent/image_routing.py::decide_image_input_mode):
agent.image_input_mode = auto | native | text (default: auto)
In auto mode:
- If auxiliary.vision.provider/model is explicitly configured, keep the
text pipeline (user paid for a dedicated vision backend).
- Else if models.dev reports supports_vision=True for the active
provider/model, attach natively.
- Else fall back to text (current behaviour).
Call sites updated: gateway/run.py (all messaging platforms), tui_gateway
(dashboard/Ink), cli.py (interactive /attach + drag-drop).
run_agent.py changes:
- _prepare_anthropic_messages_for_api now passes image parts through
unchanged when the model supports vision — the Anthropic adapter
translates them to native image blocks. Previous behaviour
(vision_analyze → text) only runs for non-vision Anthropic models.
- New _prepare_messages_for_non_vision_model mirrors the same contract
for chat.completions and codex_responses paths, so non-vision models
on any provider get text-fallback instead of failing at the provider.
- New _model_supports_vision() helper reads models.dev caps.
vision_analyze description rewritten: positions it as a tool for images
NOT already visible in the conversation (URLs, tool output, deeper
inspection). Prevents the model from redundantly calling it on images
already attached natively.
Config default: agent.image_input_mode = auto.
Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py),
all existing tests that reference _prepare_anthropic_messages_for_api
still pass (198 targeted + new tests green).
* feat(image-input): size-cap + resize oversized images, charge image tokens in compressor
Two follow-ups that make the native image routing safer for long / heavy
sessions:
1) Oversize handling in build_native_content_parts:
- 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES,
the most restrictive provider — Gemini inline data).
- Delegates to vision_tools._resize_image_for_vision (Pillow-based,
already battle-tested) to downscale to 5 MB first-try.
- If Pillow is missing or resize still overshoots, the image is
dropped and reported back in skipped[]; caller falls back to text
enrichment for that image.
2) Image-token accounting in context_compressor:
- New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant;
within the realistic range for Anthropic/GPT-4o/Gemini billing).
- _content_length_for_budget() helper: sums text-part lengths and
charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/
input_image part. Base64 payload inside image_url is NOT counted
as chars — dimensions don't matter, only image-presence.
- Both tail-cut sites (_prune_old_tool_results L527 and
_find_tail_cut_by_tokens L1126) now call the helper so multi-image
conversations don't slip past compression budget.
Tests: 9 new in test_image_routing.py (oversize triggers resize,
resize-fails-returns-None, oversize-skipped-reported), 11 new in
test_compressor_image_tokens.py (flat charge per image, multiple images,
Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on
raw base64, bounds-check on the constant, integration test that an
image-heavy tail actually gets trimmed).
* fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits
The previous commit imposed a hardcoded 20 MB base64 ceiling on all
providers, triggering auto-resize on anything larger. This was wrong in
both directions:
* Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400
'image exceeds 5 MB maximum' above that).
* Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without
complaint (empirically verified April 2026 with progressive PNG
sizes).
New behaviour:
* _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a
ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder).
* Providers NOT in the table get no ceiling — images attach at native
size and we trust the provider to return its own error if it
disagrees. A provider-specific 400 message is clearer than us
guessing wrong and silently degrading image quality.
* build_native_content_parts() gains a keyword-only provider arg;
gateway/CLI/TUI pass the active provider so Anthropic users get
auto-resize protection while OpenAI users don't pay it.
* Resize target dropped from 5 MB to 4 MB to slide safely under
Anthropic's boundary with header overhead.
Empirical measurements (direct API, no Hermes in the loop):
image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5
0.19 MB ✓ ✓ ✓
12.37 MB ✗ 400 5MB ✓ ✓
23.85 MB ✗ 400 5MB ✓ ✓
49.46 MB ✗ 413 ✓ ✓
Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through,
Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts
routes ceiling by provider, unknown provider gets no ceiling. All 52
targeted tests pass.
* refactor(image-input): attempt native, shrink-and-retry on provider reject
Replace proactive per-provider size ceilings with a reactive shrink path
on the provider's actual rejection. All providers now attempt native
full-size attachment first; if the provider returns an image-too-large
error, the agent silently shrinks and retries once.
Why the previous design was wrong: hardcoding provider ceilings
(anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image
paid no tax, but Anthropic users lost quality on anything >5MB even
though the empirical behaviour at provider-reject time is the same
(shrink + retry). Baking the table into the routing layer also
requires updating Hermes every time a provider's limit changes.
Reactive design:
- image_routing.py: _file_to_data_url encodes native size, no ceiling.
build_native_content_parts drops its provider kwarg.
- error_classifier.py: new FailoverReason.image_too_large + pattern
match ("image exceeds", "image too large", etc.) checked BEFORE
context_overflow so Anthropic's 5MB rejection lands in the right
bucket.
- run_agent.py: new _try_shrink_image_parts_in_messages walks api
messages in-place, re-encodes oversized data: URL image parts
through vision_tools._resize_image_for_vision to fit under 4MB,
handles both chat.completions (dict image_url) and Responses
(string image_url) shapes, ignores http URLs (provider-fetched).
New image_shrink_retry_attempted flag in the retry loop fires the
shrink exactly once per turn after credential-pool recovery but
before auth retries.
E2E verified live against Anthropic claude-sonnet-4-6:
- 17.9MB PNG (23.9MB b64) attached at native size
- Anthropic returns 400 "image exceeds 5 MB maximum"
- Agent logs '📐 Image(s) exceeded provider size limit — shrank and
retrying...'
- Retry succeeds, correct response delivered in 6.8s total.
Tests: 12 new (8 shrink-helper shapes + 4 classifier signals),
replaces 5 proactive-ceiling tests with 3 simpler 'native attach works'
tests. 181 targeted tests pass. test_enum_members_exist in
test_error_classifier.py updated for the new enum value.
* feat(plugins): google_meet — bundled plugin for join+transcribe Meet calls
v1 shipping transcribe-only. Spawns headless Chromium via Playwright,
joins an explicit https://meet.google.com/ URL, enables live captions,
and scrapes them into a transcript file the agent can read across turns.
The agent then has the meeting content in context and can do followup
work (send recap, file issues, schedule followups) with its regular tools.
Surface:
- Tools: meet_join, meet_status, meet_transcript, meet_leave, meet_say
(meet_say is a v1 stub — returns not-implemented; v2 will wire
realtime duplex audio via OpenAI Realtime / Gemini Live +
BlackHole / PulseAudio null-sink.)
- CLI: hermes meet setup | auth | join | status | transcript | stop
- Lifecycle: on_session_end auto-leaves any still-running bot.
Safety:
- URL regex rejects anything that isn't https://meet.google.com/...
- No calendar scanning, no auto-dial, no auto-consent announcement.
- Single active meeting per install; a second meet_join leaves the first.
- Platform-gated to Linux + macOS (Windows audio routing for v2 untested).
- Opt-in: standalone plugin, user must add 'google_meet' to
plugins.enabled in config.yaml.
Zero core changes. Plugin uses existing register_tool /
register_cli_command / register_hook surfaces. 21 new unit tests cover the
URL safety gate, transcript dedup + status round-trip, process-manager
refusals/start/stop paths, tool-handler JSON shape under each branch,
session-end cleanup, and platform-gated register().
* feat(plugins/google_meet): v2 realtime audio + v3 remote node host
v2 \u2014 agent speaks in-meeting
audio_bridge.py: PulseAudio null-sink (Linux) + BlackHole probe (macOS).
On Linux we load pactl module-null-sink + module-virtual-source, track
module ids for teardown; Chrome gets PULSE_SOURCE=<virt src> env so its
fake mic reads what we write to the sink. macOS just probes BlackHole
2ch and returns its device name \u2014 the plugin refuses to switch the
user's default audio input (that would surprise them).
realtime/openai_client.py: sync WebSocket client for the OpenAI Realtime
API. RealtimeSession.speak(text) sends conversation.item.create +
response.create, accumulates response.audio.delta PCM bytes, appends
them to a file. RealtimeSpeaker runs a JSONL-queue loop consuming
meet_say calls. 'websockets' is an optional dep imported lazily.
meet_bot.py: when HERMES_MEET_MODE=realtime, provisions AudioBridge,
starts RealtimeSession + speaker thread, spawns paplay to pump PCM
into the null-sink, then cleans everything up on SIGTERM. If any
realtime setup step fails, falls back cleanly to transcribe mode
with an error flagged in status.json.
process_manager.enqueue_say(): writes a JSONL line to say_queue.jsonl;
refuses when no active meeting or active meeting is transcribe-only.
tools.meet_say: real implementation; requires active mode='realtime'.
meet_join: adds mode='transcribe'|'realtime' param.
v3 \u2014 remote node host
node/protocol.py: JSON envelope (type, id, token, payload) + validate.
node/registry.py: $HERMES_HOME/workspace/meetings/nodes.json, with
resolve() auto-selecting the sole registered node when name is None.
node/server.py: NodeServer \u2014 websockets.serve, bearer-token auth,
dispatches start_bot/stop/status/transcript/say/ping onto the local
process_manager. Token auto-generated + persisted on first run.
node/client.py: NodeClient \u2014 short-lived sync WS per RPC, raises
RuntimeError on error envelopes, clean API matching the server.
node/cli.py: 'hermes meet node {run,list,approve,remove,status,ping}'
subtree; wired into the main meet CLI by cli.py so 'hermes meet node'
Just Works.
tools.py: every meet_* tool accepts node='<name>'|'auto'; when set,
routes through NodeClient to the remote bot instead of running
locally. Unknown node \u2192 clear 'no registered meet node matches ...'
error.
cli.py: 'hermes meet join --node my-mac --mode realtime' and
'hermes meet say "..." --node my-mac' route to the node; 'hermes
meet node approve <name> <url> <token>' registers one.
Tests
21 v1 tests updated (meet_say is no longer a stub; active-record now
carries mode).
20 new audio_bridge + realtime tests.
42 new node tests (protocol/registry/server/client/cli).
17 new v1/v2/v3 integration tests at the plugin level covering
enqueue_say edge cases, env var passthrough, mode validation, node
routing (known/unknown/auto/ambiguous), and argparse wiring for
`hermes meet say` + `hermes meet node` + --mode/--node flags.
Total: 100 plugin tests + 58 plugin-system tests = 158 passing.
E2E verified on Linux with fresh HERMES_HOME: plugin loads, 5 tools
register, on_session_end hook wires, 'hermes meet' CLI tree wires
including the node subtree, NodeRegistry round-trips, meet_join routes
correctly to NodeClient under node='my-mac' with mode='realtime',
enqueue_say accepts realtime/rejects transcribe, argparse parses every
new flag cleanly.
Zero changes to core. All new code lives under plugins/google_meet/.
* feat(plugins/google_meet): auto-install, admission detect, mac PCM pump, barge-in, richer status
Ready-for-live-test follow-up on PR #16364. Five additions that matter for
the first live run on a real Meet, in priority order:
1. hermes meet install [--realtime] [--yes]
pip install playwright websockets + python -m playwright install chromium
--realtime: installs platform audio deps (pulseaudio-utils on Linux via
sudo apt, blackhole-2ch + ffmpeg on macOS via brew). Prompts before
sudo/brew unless --yes. Refuses on Windows. Refuses to auto-flip the
macOS default input — user still selects BlackHole in System Settings
(deliberate; surprise audio rerouting is worse than a manual step).
2. Admission detection
_detect_admission(page): Leave-button visible OR caption region
attached OR participants list present → we're in-call.
_detect_denied(page): 'You can\'t join this video call' / 'You were
removed' / 'No one responded to your request' → bail out.
HERMES_MEET_LOBBY_TIMEOUT (default 300s) caps how long we sit in
the lobby before giving up. in_call stays False until admitted.
Status surfaces leaveReason: duration_expired | lobby_timeout |
denied | page_closed.
3. macOS PCM pump
ffmpeg reads speaker.pcm (24kHz s16le mono) and writes to the
BlackHole AVFoundation output via -f audiotoolbox
-audio_device_index <N>. _mac_audio_device_index() probes
ffmpeg -f avfoundation -list_devices true to resolve 'BlackHole 2ch'
→ numeric index. Falls back to index 0 on probe failure. Linux
paplay pump unchanged.
4. Richer status dict
_BotState now tracks realtime, realtimeReady, realtimeDevice,
audioBytesOut, lastAudioOutAt, lastBargeInAt, joinAttemptedAt,
leaveReason. RealtimeSession.audio_bytes_out / last_audio_out_at
counters fold into the status file once a second so meet_status()
can show the agent's voice activity in near-real-time.
5. Barge-in
RealtimeSession.cancel_response() sends type='response.cancel' over
the same WS (lock-guarded so it's safe to call from the caption
thread while speak() is reading frames). Handles response.cancelled
as a terminal frame type. _looks_like_human_speaker() gates triggers
so the bot's own name, 'You', 'Unknown', and blanks don't self-cancel.
Called from the caption drain loop: when a new caption arrives
attributed to a real participant while rt.session exists, we fire
cancel_response() and stamp lastBargeInAt.
Tests: 20 new unit tests across _BotState telemetry, barge-in gating,
admission/denied probe error handling, cancel_response with and without
a connected WS, and `hermes meet install` CLI wiring (flag parsing +
end-to-end subprocess.run verification + Linux-already-installed fast
path). Total 171 passing across all google_meet test files + the
plugin-system regression suite.
E2E verified on Linux: plugin loads, all 5 tools register,
`hermes meet install --realtime --yes` parses, fresh-bot status.json
has every new telemetry key, cancel_response on a disconnected session
returns False without raising, barge-in helper gates the bot's own
name correctly.
Still out of scope (for a future PR, not blocking live test):
mic → Realtime duplex (the agent listening to meeting audio via
WebRTC), node-host TLS/pairing UX, Windows audio, Meet create+Twilio.
Docs updated: SKILL.md now lists the installer subcommand, lobby
timeout, barge-in caveat, and the full status-dict reference table.
README.md quick-start uses hermes meet install.
Every 'hermes update' now runs a full backup of ~/.hermes/ first, so
users can always roll back to the exact state they had before the
update if anything goes wrong (corrupted sessions.db, broken skills,
config migrations that don't round-trip, etc.).
Changes:
- hermes_cli/backup.py: new create_pre_update_backup() helper. Writes
to <HERMES_HOME>/backups/pre-update-<stamp>.zip using the same
exclusion rules and SQLite safe-copy as 'hermes backup'. Auto-rotates
(keep last N, pre-update-*.zip only — hand-dropped zips in backups/
are untouched). Adds 'backups' to _EXCLUDED_DIRS so subsequent backups
don't nest prior ones.
- hermes_cli/main.py: _run_pre_update_backup() wired into
_cmd_update_impl before any git operation. Prints save path, restore
command, and how to disable. Swallows failures so a broken backup
never blocks the update itself. New --no-backup flag on 'hermes
update' for one-off override.
- hermes_cli/config.py: new 'updates' section in DEFAULT_CONFIG with
pre_update_backup (default true) and backup_keep (default 5).
Auto-surfaces in the dashboard config UI.
- tests/hermes_cli/test_backup.py: +11 tests covering backup location,
content parity with 'hermes backup', no-recursion, rotation, manual
file preservation, config gate, --no-backup flag, flag-wins-over-config.
Adds a short always-on pointer to the system prompt: when the user asks
about configuring, setting up, troubleshooting, or using Hermes Agent
itself, load the hermes-agent skill via skill_view(name='hermes-agent')
and fall back to https://hermes-agent.nousresearch.com/docs via
web_extract. Keeps sessions without skill_view loaded useful too — the
docs URL + web_extract is enough to answer most questions.
The guidance is appended right after DEFAULT_AGENT_IDENTITY (or SOUL.md)
so it ships regardless of which toolset profile is active. Footprint is
~560 chars, behind the existing prompt cache.
The CLI renders through prompt_toolkit in non-full-screen mode, so every
repaint uses the renderer's tracked _cursor_pos.y to cursor_up() + erase
before drawing the new frame. Any time that tracked position drifts from
terminal reality, redraws stack on top of stale content instead of
overwriting it. Four user-visible bugs share this root cause.
Fixes:
- #5474 (SIGWINCH ghosts): the resize wrapper previously only handled
column-shrink reflow. Generalize it to force a full screen-clear
(erase_screen + cursor_goto(0,0)) and renderer.reset() on every resize
— covers widen, row-shrink, and multiplexer SIGWINCH-less redraws.
- #8688 (cmux/tmux tab switch): no SIGWINCH fires on focus regain, so
prompt_toolkit has no signal to recover. Add a _force_full_redraw()
helper, bound to Ctrl+L (standard bash/zsh/vim convention) and exposed
as /redraw. Users can manually clear drift without restarting Hermes.
- #14692 (DSR response leaks — ^[[53;1R): resize storms make
prompt_toolkit's CSI 6n queries race past the input parser; the
terminal's reply ends up as literal input text. Add a sibling of the
bracketed-paste sanitizer that strips \x1b[<row>;<col>R and the
caret-escape visible form from paste text, buffer text-filter, and
the input-processing loop.
The idle-redraw removal (#12641) is in the preceding commit from
@foxion37 — keeping them as separate commits preserves attribution.
On provider switches mid-session (e.g. MiniMax -> DeepSeek), the source
assistant turn carries a 'reasoning' field written by the prior provider
but no 'reasoning_content' key. _copy_reasoning_content_for_api would
promote that foreign 'reasoning' to 'reasoning_content' on the outbound
DeepSeek request, leaking a cross-provider chain of thought and in
practice causing HTTP 400.
DeepSeek's own _build_assistant_message always pins reasoning_content=''
at creation time for tool-call turns, so the shape (reasoning set,
reasoning_content absent, tool_calls present) is unreachable from
same-provider DeepSeek history — it can only come from a prior provider.
Pad with '' in that case instead of promoting.
Healthy same-provider 'reasoning' promotion (no tool_calls, or on
providers that do not require the empty-string pin) is unchanged.
Defensive: when the generator encounters a fenced code block containing
Unicode box-drawing characters, wrap it in `<!-- ascii-guard-ignore -->`
markers so the docs-site-checks lint (which scans inside code fences)
can't reject the page for a skill's own diagram.
Plain bash/python code blocks stay uncluttered — only blocks with box
chars get wrapped. Skill authors no longer have to remember to add the
ignore markers in every SKILL.md with ASCII art.
Fixes#15305.
Previously 'hermes debug share' uploads only got DELETEd when the user
ran 'hermes debug share' again — opportunistic-sweep-on-invoke was the
only cleanup path. A user who uploaded once and never ran debug again
left pastes up until paste.rs's retention kicked in (which, empirically,
never actually expires them).
Hook _sweep_expired_pastes into the gateway cron ticker at the same
hourly cadence as the image/document cache cleanups. The opportunistic
sweep in 'hermes debug share' stays as a fallback for CLI-only users
who never start the gateway.
On macOS (bash 3.2 and some Homebrew bash builds) `source`ing a file that
contains `declare -x` statements prints each declaration to stdout. The
persistent-shell wrapper in tools/environments/base.py was only redirecting
stderr when sourcing the session snapshot, so ~60 lines of env vars leaked
into every terminal tool response — blowing out context and triggering
HTTP 400s on context-limited providers.
Fix: redirect both stdout and stderr when sourcing the snapshot. Linux
bash is silent here, so the redirect is harmless there; macOS no longer
leaks.
Closes#15459
Co-authored-by: Sanjays2402 <51058514+Sanjays2402@users.noreply.github.com>
Quick state snapshot now includes pairing JSONs (generic + legacy +
Feishu comment pairing), and `hermes update` takes a pre-update
snapshot labeled `pre-update` before pulling.
Pairing data lives outside state.db in platform-specific JSONs under
~/.hermes/pairing/, ~/.hermes/platforms/pairing/, and
~/.hermes/feishu_comment_pairing.json. The update command already
couldn't touch $HERMES_HOME, but #15733 reports lost pairing after
an update — this gives users something to restore from via
`/snapshot list` / `/snapshot restore <id>` if anything clobbers
the approved-user lists.
- Extend _QUICK_STATE_FILES with pairing paths (files + dirs)
- Snapshot walks directories recursively and records each file in the
manifest individually so restore logic is unchanged
- _cmd_update_impl calls create_quick_snapshot(label='pre-update')
after 'Found N new commits' and before 'Pulling updates'
- Snapshot failures are logged at debug and never block the update
Refs #15733.
read_file's dedup path returned a lightweight stub on re-reads of an
unchanged file, then returned early — so the consecutive-read loop
guard (hard block at count>=4) at the bottom of read_file_tool never
ran for stub-looped calls. Weaker tool-following models (local Qwen3.6
variants in the reported case) ignore the passive 'refer to earlier
result' hint and hammer the same read_file call until iteration budget
runs out.
Track per-key stub returns in task_data['dedup_hits'] and, on the
second stub for the same (path, offset, limit), return a hard BLOCKED
error mirroring the wording the real-read path already uses. A real
read, an intervening non-read tool call (notify_other_tool_call), or
reset_file_dedup (on context compression) all clear the counter so
the guard never stays engaged longer than the actual loop.
Closes#15759
Telegram groups emit a single bot_command entity covering the whole
/cmd@botname span with no accompanying mention entity, so the existing
mention gate in _message_mentions_bot dropped slash commands sent via
the bot-menu autocomplete whenever require_mention is enabled.
Recognise bot_command entities whose @botname suffix matches the bot
username (case-insensitive) as a direct mention, and keep rejecting
commands addressed at other bots. Fixes#15415.
When 'hermes model' runs against a providers: (keyed-schema) entry that
relies only on key_env, the picker resolves the env var for the live
/models request and then wrote a synthesized 'api_key: ${KEY_ENV}' back
to the providers.<key> entry. That's redundant — the runtime already
resolves from key_env directly — and it clutters configs that
intentionally keep credentials out of config.yaml.
Only persist provider_entry['api_key'] when the user originally had an
inline value (literal secret or ${VAR} template). Entries that declared
only key_env stay clean on save.
Fixes#15803.
For 14 of 74 compressed skills, the original description contained
trigger keywords, technique counts, attribution, or use-case phrases
not covered by the existing body content. Prepends a 'When to use' /
'What's inside' block near the top so the agent still has the full
context when the skill is loaded.
Skills salvaged:
- codex, ascii-video, creative-ideation, excalidraw, manim-video, p5js
- gif-search, heartmula, youtube-content
- lm-evaluation-harness, obliteratus, vllm, axolotl
- powerpoint
Remaining 60 skills were verified to already cover the dropped content
in their existing body sections (When to Use, overview, intro prose)
or had short descriptions fully captured by the new compressed form.
Target: every skill's description fits in a one-line gateway menu and
leads with trigger keywords an agent would match on. Drops filler like
'Use this skill to', 'A skill for', 'This skill provides'.
Before: max description length was 791 chars (architecture-diagram),
74 of 81 built-in skills were >60 chars.
After: max 60, mean 54, all 81 built-in skills <=60.
Rewritten with double-quoted YAML scalars to preserve Chinese/arrow
glyphs (baoyu-comic, yuanbao, youtube-content).
- claude-design: 'Design one-off HTML artifacts (landing, deck, prototype).' (57)
- popular-web-designs: '54 real design systems (Stripe, Linear, Vercel) as HTML/CSS.' (60)
- design-md: "Author/validate/export Google's DESIGN.md token spec files." (59)
Also adds an inline callout near the top of claude-design pointing to
popular-web-designs and design-md so the cross-reference lands even
without reading the full decision table.
- claude-design: design process + taste for one-off HTML artifacts
- popular-web-designs: 54 ready-to-paste design systems (Stripe/Linear/etc.)
- design-md: formal DESIGN.md token spec file authoring
Adds a comparison table to claude-design's 'When To Use' section and
reciprocal pointers in design-md and popular-web-designs. Also corrects
claude-design author attribution to BadTechBandit.
Harden the Matrix adapter's sender-drop guards so bot-self events and
appservice/bridge identities never reach the gateway's pairing flow or
the agent loop.
Two filters, applied as early as possible in _on_room_message (and
_on_reaction for the self-filter):
1. _is_self_sender(sender) — case-insensitive + whitespace-trimmed
equality with self._user_id. When self._user_id is still empty
(whoami has not resolved, or login failed), returns True
defensively: an unidentified bot dropping its own events is always
preferable to falling into an echo loop. The previous byte-for-byte
equality check let differently-cased copies of the bot's MXID slip
through, and an unresolved self-ID silently disabled the guard.
2. _is_system_or_bridge_sender(sender) — drops appservice namespace
puppets (conventional @_bridge_...:server form) and malformed
senders with an empty localpart. These identities used to fall
through to the gateway's unauthorized-user path, trigger a pairing
code, and — once an operator approved the bridge — every outbound
message the bridge relayed would loop back as an authorized user
message. This was the root of the 'hall of mirrors' symptom.
Fixes#15763
Test plan
---------
scripts/run_tests.sh tests/gateway/test_matrix.py
scripts/run_tests.sh tests/gateway/test_matrix_mention.py tests/gateway/test_matrix_voice.py
All 182 tests pass. 14 new regression tests cover exact / case-insensitive
/ whitespace / unresolved-self-id matches, bridge prefix detection, empty
sender, and the full _on_room_message drop path.
Closes#15775.
Title generation swallowed exceptions at debug level and returned None,
so a depleted auxiliary provider (e.g. OpenRouter 402) silently left
sessions with NULL titles. Reporter observed 45 untitled sessions
accumulated over 19 days with no user-visible indication.
- agent/title_generator.py: accept optional failure_callback, bump log
to WARNING, invoke callback on call_llm exception (swallowing callback
errors so nothing can crash the fire-and-forget worker thread).
- cli.py, gateway/run.py: pass agent._emit_auxiliary_failure as the
callback so failures route through the existing user-visible warning
channel.
- tests: cover callback fires / errors are swallowed / no-callback
legacy behavior / maybe_auto_title forwards kwarg to worker.
The bare-string isinstance guard added in 80ae2621 covered _find_tail_cut_by_tokens
(line 1084) but missed the identical pattern in _calculate_protect_tail_boundary
(line 487, the protect-tail scan loop). Both loops call .get("text", "") on every
list item in message["content"]; both crash with AttributeError when that list
contains a bare string.
Apply the same dict/str/fallback isinstance guard to the protect-tail path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
raw_content from message["content"] can be a list that contains bare
strings, not only dicts. The previous `p.get("text", "")` call raised
AttributeError on string items, crashing context compression for any
session that had a message with mixed content.
Guard with isinstance checks: dict → .get("text"), str → len(p),
fallback → len(str(p)). Adds a regression test covering the bare-string
case that would have AttributeError'd on the pre-fix code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_find_tail_cut_by_tokens called len(content) to estimate message tokens.
When content is a list of blocks (multimodal: text + image_url), len()
returns block count (e.g. 2) rather than character count, so a message
with 500 chars of text was counted as ~10 tokens instead of ~135.
This caused the backward walk to exhaust all messages before hitting the
budget ceiling; the head_end safeguard then forced cut = n - min_tail,
shrinking the protected tail to the bare minimum and preventing effective
compression of long multimodal conversations.
Fix mirrors the existing pattern in _prune_old_tool_results (line 487):
sum(len(p.get("text", "")) for p in raw_content)
if isinstance(raw_content, list) else len(raw_content)
Tests: 3 new cases in TestTokenBudgetTailProtection — regression guard
(confirms the test fails with the bug), plain-string regression guard,
and image-only block edge case.
Fixes#16087.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
If the gateway's Python env loses access to 'croniter' between when a
cron job was created and when mark_job_run() fires, compute_next_run()
returns None for cron schedules. mark_job_run() treated that as terminal
completion and wrote enabled=false, state=completed — turning a missing
runtime dep into a silent, permanent job-off.
That behaviour is safe for one-shot jobs but wrong for recurring ones. A
missing dep should surface as an error the user can see, not as successful
completion of a job that is about to stop firing.
mark_job_run() now only disables the job on next_run_at=None when the
schedule is one-shot. For recurring (cron/interval) schedules it keeps
enabled=true, sets state=error, and records last_error so the user can
see why the job isn't advancing. compute_next_run() also logs a warning
the first time cron+no-croniter hits, so the underlying cause is visible
in the gateway log.
Tests cover:
- recurring cron job stays enabled with state=error when HAS_CRONITER=False
- recurring interval stays enabled when compute_next_run returns None
- one-shot jobs still flip to enabled=false, state=completed (no regression)
Fixes#16265
Azure Foundry deploys GPT-5.x, codex-*, and o1/o3/o4 reasoning models as
Responses-API-only. Calling /chat/completions against these deployments
returns 400 'The requested operation is unsupported.', which broke any
user who ran 'hermes model' on Azure, picked a gpt-5/codex deployment,
and kept the default api_mode: chat_completions. Verified in a user
debug bundle on 2026-04-26: gpt-5.3-codex failed on synopsisse.openai.azure.com
with that exact payload while gpt-4o-pure on the same endpoint worked.
Adds azure_foundry_model_api_mode(model_name) that returns
codex_responses when the model name starts with gpt-5, codex, o1, o3,
or o4 — otherwise None so chat_completions / anthropic_messages stay
untouched for gpt-4o, Llama, Claude-via-Anthropic, etc.
Resolver (both the direct Azure Foundry path and the pool-entry path)
consults it and upgrades api_mode unless the user explicitly picked
anthropic_messages. target_model (from /model mid-session switch)
takes precedence over the persisted default so switching from gpt-4o
to gpt-5.3-codex routes correctly before the next request.
Docs: correct the azure-foundry guide which previously claimed Azure
keeps gpt-5.x on chat completions — that was only true for early Azure
OpenAI, not Azure Foundry codex/o-series deployments.
Tests: 14 unit tests for azure_foundry_model_api_mode + 6 integration
tests in TestAzureFoundryResolution covering Bob's exact scenario,
target_model override, anthropic_messages guard, and o3-mini.
Follow-up to #16323 — the UrlSource adapter is shipped but four
user-facing docs surfaces still only listed the hub-identifier forms.
- user-guide/features/skills.md: add ``url`` to the Supported-hub-sources
table; add a new "#### 8. Direct URL (`url`)" section explaining scope
(single-file SKILL.md only), name-resolution order (frontmatter → URL
slug → interactive prompt → --name flag), and both TTY and
non-interactive usage. Add two URL examples to the install-examples
block near the top of the page.
- reference/cli-commands.md: two URL install examples + one note
explaining the name-resolution fallback chain.
- guides/work-with-skills.md: one URL-install example alongside the
existing hub-identifier examples.
- skills/autonomous-ai-agents/hermes-agent/SKILL.md: Quick Reference
block's ``hermes skills install`` line now spells out that ID can be
a hub identifier OR a direct SKILL.md URL, and mentions --name for
frontmatter-less skills.
No code changes. No new dependencies. Website builds via the usual
Docusaurus pipeline.
Co-authored-by: teknium1 <teknium@noreply.github.com>
Parse scope from the raw callback URL before stripping the auth code so Flow.fetch_token matches user-granted scopes. Add regression test for dual-scope callbacks.
Made-with: Cursor
Two related fixes for OpenClaw-residue problems after an OpenClaw→Hermes
migration (especially migrations done via OpenClaw's own tool, which
doesn't archive the source directory).
1. optional-skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py:
rebrand_text() was rewriting ~/.openclaw/config.yaml → ~/.Hermes/config.yaml
(capital H — a directory that doesn't exist). Now case-preserving:
"OpenClaw" → "Hermes" (prose), but "openclaw" → "hermes" (so filesystem
paths land on the real Hermes home). Regex logic unchanged — replacement
function now checks if the matched text was all-lowercase and emits the
replacement in the matching case.
2. agent/onboarding.py + cli.py: one-time startup banner the first time
Hermes launches and finds ~/.openclaw/. Tells the user to run
`hermes claw cleanup` to archive it, gated on the existing onboarding
seen-flag framework (onboarding.seen.openclaw_residue_cleanup in
config.yaml). Fires once per install; re-running requires wiping that
flag or running cleanup directly.
Tests:
- 4 new TestDetectOpenclawResidue tests (present / absent / file-instead-
of-dir / default-home smoke)
- 2 TestOpenclawResidueHint tests (content check)
- 2 TestOpenclawResidueSeenFlag tests (flag isolation + round-trip)
- test_rebrand_text_preserves_filesystem_path_casing regression test
with 4 scenarios including the exact ~/.openclaw/config.yaml case
- Existing test_rebrand_text_* tests updated to the new case-preserving
contract (lowercase input → lowercase output)
Co-authored-by: teknium1 <teknium@noreply.github.com>
Four small tool-description / skill-content tweaks addressing recurring
model mistakes seen in @versun's docx feedback (Kimi 2.6, but the patterns
apply to every model):
1. browser_navigate description: call out .md/.txt/.json/.yaml/.csv/.xml,
raw.githubusercontent.com, and API endpoints as specifically preferring
curl or web_extract. The generic "prefer web_search or web_extract" was
too weak; models kept firing up the browser for plain-text URLs.
2. delegate_task description: two additions.
(a) Pass user language / output-style preferences in 'context' when they
differ from English — otherwise subagents default to English and their
summaries contaminate the final reply (caused the bilingual digest bug).
(b) Subagent summaries are self-reports, not verified facts. For
operations with external side-effects (HTTP uploads, remote writes,
file creation at shared paths), require a verifiable handle (URL, ID,
path) and verify it yourself before claiming success.
3. agent/prompt_builder.py Skills-mandatory block: new explicit line
"Whenever the user asks to configure / set up / modify / install /
enable / disable / troubleshoot Hermes Agent itself, load the
`hermes-agent` skill first." The generic "load what's relevant" didn't
route Hermes-meta questions (like "how do I turn off redaction?") to
the one skill that has the answer.
4. skills/autonomous-ai-agents/hermes-agent/SKILL.md: new "Security &
Privacy Toggles" section covering security.redact_secrets (with the
import-time-snapshot restart-required caveat), privacy.redact_pii,
approvals.mode (manual/smart/off) + --yolo + HERMES_YOLO_MODE, shell
hooks allowlist, and how to disable network/media tools entirely.
Every command verified against the actual config keys — no invented
knobs.
Co-authored-by: teknium1 <teknium@noreply.github.com>
* feat(skills): install skills from a direct HTTP(S) URL
Adds UrlSource adapter so `hermes skills install <url-to-SKILL.md>` and
`/skills install <url>` work as first-class operations — no more
improvising with curl + patch + cp.
- Claims identifiers that start with http(s):// and end in .md
- Skips /.well-known/skills/ URLs (WellKnownSkillSource handles those)
- Skill name from YAML frontmatter, URL-slug fallback
- Single-file SKILL.md only (v1 scope — multi-file skills need a manifest)
- Trust level 'community'; full security scan still runs
- Lock file stores the URL as identifier so `hermes skills update`
re-fetches from the same URL cleanly
Scope matches real user need from @versun's docx feedback where
`https://sharethis.chat/SKILL.md` had no first-class install path.
* feat(skills): interactive name/category for URL installs + --name override
Follow-up to the UrlSource adapter. The previous commit fell back to weak
heuristics when frontmatter had no ``name:`` and could produce garbage names
like ``SKILL`` or ``unnamed-skill``. Now:
tools/skills_hub.py
- ``UrlSource._is_valid_skill_name()`` — strict identifier check
(``^[a-z][a-z0-9_-]*$``), rejects sentinel values (``SKILL``, ``README``,
``INDEX``, ``unnamed-skill``, empty, non-strings).
- ``_resolve_skill_name()`` returns ``Optional[str]`` — ``None`` when
nothing valid is resolvable. Also ignores unsafe frontmatter names
(``../evil``) and falls through to URL slug instead of returning None
immediately, so a URL with a bad frontmatter but a good path still
works.
- ``fetch()``/``inspect()`` carry an ``awaiting_name=True`` marker in
metadata/extra when resolution fails, letting ``do_install`` decide
whether to prompt, apply an override, or error out.
hermes_cli/skills_hub.py
- ``do_install`` gains a ``name_override`` parameter.
- On URL-sourced bundles with ``awaiting_name=True``:
1. If ``name_override`` is valid → use it.
2. If ``name_override`` is invalid → refuse with a clear error.
3. Else if ``skip_confirm=True`` (non-interactive: slash / TUI /
gateway / scripts) → refuse with an actionable retry hint pointing
at ``--name <your-name>`` on both CLI and slash forms.
4. Else (interactive TTY) → prompt for the name.
- Interactive TTY also prompts for a category when none is given for a
URL-sourced install, hinting existing category buckets so users can
reuse ``productivity``, ``devops``, etc. Empty input → flat install.
- ``_existing_categories()`` scans ``~/.hermes/skills/`` for subdirs that
look like category buckets (contain nested SKILL.md files); skips
top-level skills and hidden dirs.
- ``_prompt_for_skill_name()`` / ``_prompt_for_category()`` helpers
(EOF/Ctrl-C-safe, match the existing ``Confirm [y/N]`` prompt style).
hermes_cli/main.py
- ``hermes skills install`` argparse gains ``--name <name>``.
hermes_cli/skills_hub.py (slash)
- ``/skills install <url> --name <x>`` parsing added.
Tests
- tests/tools/test_skills_hub.py: updated ``UrlSource`` tests to assert
the new ``awaiting_name`` metadata; added 4 new tests for
``_is_valid_skill_name`` rejection sets and the awaiting-name marker.
- tests/hermes_cli/test_skills_hub.py: 8 new tests covering --name
override accept/reject, non-interactive error, interactive name prompt,
interactive category prompt, cancel-aborts-install, and
``_existing_categories`` scan behavior (buckets vs flat skills).
- E2E verified all four paths (no-name/no-override → error;
--name override → install; frontmatter name → install;
invalid --name → rejection).
---------
Co-authored-by: teknium1 <teknium@noreply.github.com>
_search_members() and _fetch_messages() call min(limit, 100) assuming
limit is int. Models can pass limit as a string (e.g. "10"), causing
TypeError: '<' not supported between instances of 'str' and 'int'.
Add try/except int() coercion with safe defaults at the top of both
functions, matching the pattern used in session_search fix (#10522).
`_resolve_effective_accept()` used `return bool(cfg_val)` for the
`hooks_auto_accept` config key. In Python, `bool("false")` is `True`,
so a user setting `hooks_auto_accept: "false"` (quoted YAML string)
in `config.yaml` would silently enable auto-approval of every shell
hook, bypassing the consent prompt entirely.
Replace the coercion with the same type-aware parsing already used for
the HERMES_ACCEPT_HOOKS env var three lines above: bool passthrough,
strings checked against {1,true,yes,on} case-insensitively, everything
else (including "false", None, 0, ints) rejected.
Add TestHooksAutoAcceptParsing guarding the regression across all four
value shapes (bool, string-truthy, string-falsy, missing/None).
Reported by @sprmn24 in #16244.
Follow-up on top of #16243. Two small tweaks:
- Compile the regex once as `_SAFE_IDENTIFIER_RE` and pin it to
`[A-Za-z0-9@.+\-]`. The previous `\w` accepts Unicode word chars
(full-width digits, accented letters) which aren't valid WhatsApp
identifiers and shouldn't reach the mapping-file lookup.
- Add a comment clarifying this is defense-in-depth, not a live
traversal. The hardcoded `lid-mapping-{current}{suffix}.json`
prefix already prevents escape via pathlib's component split —
with `current='../secrets'`, the first path component under
`session/` is the literal directory name `lid-mapping-..`,
which the attacker cannot create.
E2E verified: legit mapping chains still resolve, all probed attack
shapes (`../`, absolute paths, shell metacharacters, Unicode digit
tricks) are rejected before any file access.
expand_whatsapp_aliases() interpolated untrusted identifiers directly
into filenames (lid-mapping-{current}.json) without validation.
An identifier containing ../ or / could escape the session directory.
Also replaced bare except Exception: continue with targeted
(OSError, json.JSONDecodeError) and a debug log so mapping
corruption is diagnosable instead of silently skipped.
Fixes:
- Reject identifiers with unsafe characters via re.match guard
- Replace broad exception swallow with specific catch + debug log
Both get_provider_request_timeout() and get_provider_stale_timeout()
wrapped the load_config import in try/except ImportError but left the
actual load_config() call unprotected. A corrupt config file, YAML
parse error, or permission failure would raise instead of returning
None safely.
Move load_config() inside the try block so any exception returns None.
- remove the temporary -c MRU logic and companion test from this branch so PR #15926 stays focused on TUI perf work
- keep the resume-ordering change isolated in the dedicated follow-up PR
- drop unused TUI helpers, test-only layout scaffolding, and stale public debug exports
- remove an unused profiler import and trim test-only coverage for deleted helpers
- gateway handler: turnController always archives in recordMessageComplete,
so the post-complete archiveTodosAtTurnEnd().forEach is dead code. Drop
it and the now-unused import.
- turnController: collapse archive prepend into a single spread expression.
- gateway server: one-line comment for the tool.start todo skip.
Two bugs surfaced together while the model fired the todo tool:
1. Count flickered (e.g. 3 → 1 → 3) because tool.start echoed
args.todos as the live state. With merge=true (or any partial
replacement) args.todos is just the items being updated, not the
full list. Drop the early echo — tool.complete already carries the
canonical full list from the tool result.
2. After turn end the panel jumped from under the user prompt to below
thinking/tools because archiveDoneTodos() was pushed AFTER segments
in finalMessages. Prepend the archive trail msg so it sits right
after the user prompt — same visual slot the live panel occupied
during streaming.
CPU profiling showed the built TUI loading React development modules unless NODE_ENV was set. Default CLI and dashboard TUI children to production while preserving explicit user overrides.
Keep history metadata consistent with lineage replay, globally order replayed lineage messages, and make Ink cache eviction report post-eviction sizes. Also keys TUI config cache by path to avoid cross-home test leakage.
When _compress_context rotates session_id (compression split), fire
on_session_start(new_sid, boundary_reason="compression",
old_session_id=<old>) on the active context engine. Plugin engines
(e.g. hermes-lcm) use this to preserve DAG lineage across the rollover
instead of re-initializing fresh per-session state.
Built-in ContextCompressor.on_session_start accepts **kwargs and ignores
them — no behavior change for default users.
Closes hermes-lcm#68 symptom: after Hermes compressed and minted a new
physical session, LCM was treating the split as a fresh /new and losing
continuity (compression_count: 1, store_messages: 0, dag_nodes: 0).
Credit: @Tosko4 (PR #13370) — minimized scope to the boundary_reason
signal only; the broader session-lifecycle refactor will be taken in
separate PRs if justified by concrete plugin need.
Every working dir hermes ever touches gets its own shadow git repo under
~/.hermes/checkpoints/{sha256(abs_dir)[:16]}/. The per-repo _prune is a
no-op (comment in CheckpointManager._prune says so), so abandoned repos
from deleted/moved projects or one-off tmp dirs pile up forever. Field
reports put the typical offender at 1000+ repos / ~12 GB on active
contributor machines.
Adds an opt-in startup sweep that mirrors the sessions.auto_prune
pattern from #13861 / #16286:
- tools/checkpoint_manager.py: new prune_checkpoints() and
maybe_auto_prune_checkpoints() helpers. Deletes shadow repos that
are orphan (HERMES_WORKDIR marker points to a path that no longer
exists) or stale (newest in-repo mtime older than retention_days).
Idempotent via a CHECKPOINT_BASE/.last_prune marker file so it only
runs once per min_interval_hours regardless of how many hermes
processes start up.
- hermes_cli/config.py: new checkpoints.auto_prune /
retention_days / delete_orphans / min_interval_hours knobs.
Default auto_prune: false so users who rely on /rollback against
long-ago sessions never lose data silently.
- cli.py / gateway/run.py: startup hooks gated on checkpoints.auto_prune,
called right next to the existing state.db maintenance block.
- Docs updated with the new config knobs.
- 11 regression tests: orphan/stale deletion, precedence, byte-freed
tracking, non-shadow dir skip, interval gating, corrupt marker
recovery.
Refs #3015 (session-file disk growth was fixed in #16286; this covers
the checkpoint side noted out-of-scope there).
The write_file guard added in #16223 used strict equality against the
internal dedup status message. In practice, the model sometimes
prepends a short note or appends a trailing comment before calling
write_file, which slipped past the strict check.
Broaden the heuristic: reject writes whose stripped content equals
the status message OR contains it and is <=2x its length. Short,
status-dominated writes are always corruption; legitimate docs that
quote the message verbatim are always much longer.
Adds two tests: one for the small-wrapper corruption shape, one
confirming large legitimate files that quote the status still write.
write_file_tool and patch_tool both call _update_read_timestamp to
refresh the staleness tracker after writing, but they never invalidate
the dedup cache entries for the written path. The dedup cache keys are
(resolved_path, offset, limit) → mtime tuples populated by read_file_tool.
On filesystems where a read and write land in the same mtime second (or
when mtime granularity is 1s), the cached and current mtime are equal,
so the dedup check incorrectly returns a 'File unchanged since last
read' stub — even though the file was just overwritten.
The agent then sees stale content (or a stale 'File not found' error)
and enters expensive error-recovery loops, burning API calls.
Fix: add _invalidate_dedup_for_path(filepath, task_id) that removes all
dedup entries whose resolved path matches the written file. Called from
_update_read_timestamp so both write_file_tool and patch_tool benefit
automatically. Scoped to the writing task_id — other tasks' caches are
not affected.
6 regression tests added covering:
- read→write→read within same mtime second (core #13144 scenario)
- invalidation across all offset/limit combinations
- isolation: writing file A does not invalidate file B's cache
- isolation: writing in task A does not invalidate task B's cache
- _invalidate_dedup_for_path safety on missing task / empty dedup
All 25 tests pass (19 existing + 6 new).
Fixes#13144
Follow-up to #15960 — the provider-active detection in tools_config.py
also read use_gateway with raw truthiness (is False, not dict.get), so
quoted 'false' caused the FAL-direct row to show wrong active status in
the hermes tools picker. Route both sites through is_truthy_value().
PR #16013 plugged the leak in `/new`, but two sibling session-boundary
resets had the same bug:
1. Inactivity / suspended-session auto-reset (top of `_handle_message`)
previously cleared only reasoning. Now drops model override and the
queued "/model switched" note as well.
2. Compression-exhaustion auto-reset now also drops the pending note
alongside the existing model/reasoning cleanup.
All three session-boundary sites now use the identical cleanup idiom.
`npm install --silent` (used by `_build_web_ui` and `_update_node_dependencies`)
silently rewrites package-lock.json on npm ≥ 10 (strips "peer": true etc.),
leaving the working tree dirty after every `hermes update`. The next update
then detects the dirty lockfile and stashes it — producing a trail of
hermes-update-autostash entries for web/package-lock.json, ui-tui/package-lock.json,
and root package-lock.json.
Switch to `npm ci` (strict, lockfile-preserving) via a new
`_run_npm_install_deterministic` helper that falls back to `npm install`
when the lockfile is missing or out of sync (WIP forks).
Verified locally: all three lockfiles stay byte-identical after the real
_build_web_ui / _update_node_dependencies run twice back-to-back. Fallback
path tested with a deliberately out-of-sync lockfile and a no-lockfile case.
Four independent session-UX bugs reported by an external user (#16294).
/save wrote hermes_conversation_<ts>.json to CWD — invisible to
'hermes sessions browse' and easy to lose. Snapshots now write under
~/.hermes/sessions/saved/ and the command prints the absolute path plus
a 'hermes --resume <id>' hint for the live DB-indexed session.
'hermes sessions browse' default --limit raised from 50 to 500. With the
old ceiling, users with moderately long histories saw only the most
recent 50 rows and assumed older sessions had been lost.
TUI session.list (`/resume` picker) switched from a hardcoded allow-list
of 13 gateway source names to a deny-list of just { 'tool' }. Sessions
tagged acp / webhook / user-defined HERMES_SESSION_SOURCE values and
any newly-added platform now surface. Default limit 20 → 200.
ollama-cloud provider setup passes force_refresh=True to
fetch_ollama_cloud_models() so a user entering their API key sees the
fresh catalog (e.g. deepseek v4 flash, kimi k2.6) immediately instead
of waiting up to an hour for the disk cache TTL to expire.
Closes#16294.
Expand the airtable skill from bare CRUD to a full Hermes-shaped
cookbook matching the linear/notion neighbors, and trim the
description to fit the 60-char system-prompt cutoff.
Hermes-specific additions:
- Explicit 'use the terminal tool with curl — not web_extract or
browser_navigate' guidance, matching the same note in linear.
- Note that AIRTABLE_API_KEY flows from ~/.hermes/.env into the
subprocess automatically via env_passthrough, so curl calls don't
need to re-export it.
- Prefer 'python3 -m json.tool' (always present) over jq (optional)
for pretty-printing, with -s on every curl to keep output clean.
- Read-before-write workflow that resolves record IDs via
filterByFormula instead of guessing.
Cookbook expansion (new vs original):
- Field-type reference table (text, select, multi-select, attachment,
linked record, user) with the exact write-shape Airtable expects.
- typecast flag for auto-coercing values / auto-creating select options.
- performUpsert PATCH for idempotent sync by merge field.
- Batch create/delete endpoints (10-record cap per call).
- Sort + fields query params with URL-encoding (%5B / %5D).
- Named-view query that applies saved filter/sort server-side.
- Full pagination loop template (while loop with offset).
- Common filterByFormula patterns (exact match, contains, AND/OR,
date comparison, NOT empty).
- Rate-limit backoff guidance (Retry-After header, per-base budget).
- Airtable error-code reference (AUTHENTICATION_REQUIRED,
INVALID_PERMISSIONS, MODEL_ID_NOT_FOUND,
INVALID_MULTIPLE_CHOICE_OPTIONS) so the agent can map failures to
user-actionable fixes instead of just retrying.
Also: description trimmed from 183 chars (truncated to 60 in system
prompt, losing 'filter/upsert/delete' trigger terms) down to 59 chars
that render whole: 'Airtable REST API via curl. Records CRUD, filters,
upserts.' Catalog row updated to match.
SKILL.md grew from 115 to 228 lines — still under the 500-line soft
cap and below the linear skill (297 lines) which serves the same
role for GraphQL.
- scripts/release.py: map sonoyuncudmr@gmail.com -> Sonoyunchu so the
check-attribution CI job and release notes credit Soynchu correctly.
- website/docs/reference/skills-catalog.md: add the airtable row to
the productivity bundled-skills table.
Adds NOTION_API_KEY, LINEAR_API_KEY, TENOR_API_KEY, and AIRTABLE_API_KEY
to OPTIONAL_ENV_VARS so:
- They persist to ~/.hermes/.env via save_env_value like every other
key Hermes knows about, instead of being ad-hoc variables the user
has to hand-edit the dotfile for.
- load_env() / reload_env() populate os.environ from .env on every
startup — the user sets the key once, skills keep working across
restarts without losing access.
- hermes setup / hermes config show surface them as known optional
vars with the correct signup URL (linear.app/settings/api,
airtable.com/create/tokens, etc.).
These four entries use category="skill" (new) rather than "tool".
tools/environments/local.py auto-adds every category=tool/messaging
entry to _HERMES_PROVIDER_ENV_BLOCKLIST, which stops env passthrough
from leaking provider credentials into the execute_code sandbox
(GHSA-rhgp-j443-p4rf). Skill API keys are the opposite case — the
point is for the agent's subprocess to see them so curl can read
Authorization headers — so they must be outside the blocklist. The
new category is inert for that check.
All four entries are advanced=True: they show up in 'hermes config'
and 'hermes status' displays, but do not nag users who have never
touched those skills during setup checklists.
E2E verified: save_env_value → reload_env → os.environ populated →
skill_view reports setup_needed=False → env_passthrough registers
the key for subprocess inheritance.
Convert the airtable skill from 'skills.config.airtable.api_key'
(config.yaml, wrong bucket for a secret) to 'prerequisites.env_vars:
[AIRTABLE_API_KEY]' (~/.hermes/.env), matching every other bundled
skill that authenticates with an API token.
Why the original shape was wrong:
- metadata.hermes.config is for non-secret skill settings (paths,
preferences) per references/skill-config-interface.md. Storing a
bearer token under skills.config.* also triggered the documented
'hermes config migrate' nag-on-every-run problem.
- The Quick Reference's 'AIRTABLE_API_KEY=...' bash line couldn't
read skills.config.airtable.api_key anyway — it's a yaml path, not
an env var.
Follow-up polish on the same pass:
- Added version/author/license frontmatter to match notion/linear.
- Added prerequisites.commands: [curl].
- Setup section now specifies the PAT format (pat...) that replaced
legacy 'key...' API keys in Feb 2024, plus the three required scopes
(data.records:read/write, schema.bases:read) and the per-base Access
list requirement.
- Clarified PATCH vs PUT and pagination (100 records/page cap).
- Swapped verification from 'hermes -q ...' (non-deterministic) to a
curl /v0/meta/bases call that returns a verifiable HTTP status code.
_web_ui_build_needed() in PR #14914 checked web_dir/"dist" as the
sentinel, but vite.config.ts sets outDir: "../hermes_cli/web_dist" so
the build output lands in hermes_cli/web_dist/, never in web/dist/.
The sentinel was therefore always missing → _web_ui_build_needed always
returned True → npm install + Vite build ran on every startup → OOM on
low-memory VPS persisted unchanged.
Fix: derive dist_dir as web_dir.parent / "hermes_cli" / "web_dist" so
the sentinel points to the actual build output directory.
Fixes#14898
When the gateway intercepts a pending /update prompt and the user sends
a recognized slash command (/new, /help, ...), the command now dispatches
normally AND the detached update subprocess is unblocked by writing a
blank .update_response. _gateway_prompt reads '' → strips → returns the
prompt's default (typically a safe 'n' / skip), so the update process
exits cleanly instead of blocking on stdin until the 30-minute watcher
timeout.
Also clears _update_prompt_pending[session_key] on this path so stray
future input for the same session isn't re-intercepted.
Extends PR #15849 with tests for the new cancel-write + a regression
test pinning the legacy behavior of unrecognized /foo slash commands
still being consumed as the response.
Slack Bolt posts are not editable like CLI spinners; medium-tier new still emitted a permanent line per tool start (issue #14663).
- Built-in slack default: off; other tier-2 platforms unchanged.
- Adjust /verbose isolation test for off to new cycle.
- Migration tests: read/write config.yaml as UTF-8 (Windows locale).
Previously, setting SLACK_BOT_TOKEN in .env would unconditionally enable
the Slack gateway adapter regardless of `slack.enabled: false` in config.yaml.
This caused spurious "SLACK_APP_TOKEN not set" errors when the token was
used only by skills (e.g. cron jobs that send Slack messages) rather than
for the Hermes messaging gateway.
Now, enabled: false in config.yaml is respected — the token is stored so
skills can still use it, but the gateway adapter is not activated.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- TestAutoMaintenance gains 3 tests: auto-prune deletes transcript files
when sessions_dir is passed, preserves them when it isn't (backward-
compat), and never touches active-session files during prune.
- FakeDB helpers in test_sessions_delete.py accept **kwargs so they
don't break when delete_session signature gains sessions_dir.
`delete_session()` and `prune_sessions()` only removed SQLite records,
leaving .json/.jsonl transcript files on disk forever. Over time this
causes unbounded disk growth (~27MB/day observed).
Changes:
- Add `_remove_session_files()` static helper that cleans up
`{session_id}.json`, `.jsonl`, and `request_dump_{session_id}_*.json`
- `delete_session()` accepts optional `sessions_dir` param and removes
files for the deleted session and its children
- `prune_sessions()` accepts optional `sessions_dir` param and removes
files for all pruned sessions after the DB transaction
- Wire up CLI `hermes sessions delete` and `hermes sessions prune` to
pass `sessions_dir`
- File cleanup is best-effort (OSError silenced) so DB operations are
never blocked by filesystem issues
- Fully backward-compatible: `sessions_dir=None` (default) preserves
existing behavior
Extends the existing channel_skill_bindings mechanism (previously
Discord-only) to Slack, so a channel or DM can auto-load one or more
skills at session start without relying on the model's skill selector
for every short reply.
Motivation: Mats's German flashcards DM pushes a cron-driven card
5x/day; he responds with one-word guesses like 'work'. Previously each
reply required the main agent to decide whether to load german-flashcards
(full opus turn just to pick a skill). With the binding configured per
Slack channel, the skill is injected at session start and grading runs
directly.
Changes:
- Extract resolve_channel_skills() from DiscordAdapter._resolve_channel_skills
into gateway.platforms.base (now shared across adapters).
- DiscordAdapter._resolve_channel_skills delegates to the shared helper
(behavior preserved — existing test suite still passes unchanged).
- SlackAdapter: resolve channel_skill_bindings on each message and attach
auto_skill to MessageEvent. gateway/run.py already handles auto-skill
injection on new sessions; this just wires Slack through it.
- gateway/config.py: accept channel_skill_bindings in slack: block of
config.yaml (was Discord-only).
- Tests: new tests/gateway/test_slack_channel_skills.py with 11 cases
covering DM/thread/parent resolution, single-vs-list skills, dedup,
malformed entries. Discord suite unchanged.
- Docs: add 'Per-Channel Skill Bindings' section to Slack user guide.
Config example:
slack:
channel_skill_bindings:
- id: "D0ATH9TQ0G6"
skills: ["german-flashcards"]
Enter while the agent is busy can now inject the typed text via /steer —
arriving at the agent after the next tool call — instead of interrupting
(current default) or queueing for the next turn.
Changes:
- cli.py: keybinding honors busy_input_mode='steer' by calling
agent.steer(text) on the UI thread (thread-safe), with automatic
fallback to 'queue' when the agent is missing, steer() is unavailable,
images are attached, or steer() rejects the payload. /busy accepts
'steer' as a fourth argument alongside queue/interrupt/status.
- gateway/run.py: busy-message handler and the PRIORITY running-agent
path both route through running_agent.steer() when the mode is 'steer',
with the same fallback-to-queue safety net. Ack wording tells users
their message was steered into the current run. Restart-drain queueing
now also activates for 'steer' so messages aren't lost across restarts.
- agent/onboarding.py: first-touch hint has a steer branch for both
CLI and gateway.
- hermes_cli/commands.py: /busy args_hint updated to include steer,
and 'steer' is registered as a subcommand (completions).
- hermes_cli/web_server.py: dashboard select widget offers steer.
- hermes_cli/config.py, cli-config.yaml.example, hermes_cli/tips.py:
inline docs updated.
- website/docs/user-guide/cli.md + messaging/index.md: documented.
- Tests: steer set/status path for /busy; onboarding hints;
_load_busy_input_mode accepts steer; busy-session ack exercises
steer success + two fallback-to-queue branches.
Requested on X by @CodingAcct.
Default is unchanged (interrupt).
MCP stdio servers are spawned via the SDK's stdio_client, which on
Linux uses start_new_session=True (setsid). When a cron job is
cancelled mid-way (timeout, agent finish, exception), the subprocess
often escapes the SDK's teardown and survives as a session leader.
Because setsid() detaches the child from the gateway's process group
/ cgroup tree, systemd does not reap it on service restart either —
so every cron tick that touches an MCP tool leaks a dangling server
process.
Fix:
* tools/mcp_tool.py — _run_stdio now wraps the whole stdio+session
context in try/finally. On any exit path (clean, exception,
cancellation), PIDs still alive are moved from the active
_stdio_pids set into a new _orphan_stdio_pids set. Orphan
detection is done via os.kill(pid, 0) — a cheap liveness probe
that never signals the target.
* tools/mcp_tool.py — _kill_orphaned_mcp_children gains an
include_active=False flag. Default behaviour now only reaps the
orphan set so concurrent sessions (other parallel cron jobs or
live user chats) are never disrupted. The existing shutdown path
passes include_active=True to keep the previous "kill everything"
semantics after the MCP loop is stopped.
* cron/scheduler.py — the cleanup hook is moved from run_job()'s
finally (which would race with parallel siblings after #13021)
into tick() after the ThreadPoolExecutor has joined every future.
At that point there are no in-flight sessions from this tick, so
sweeping the orphan set is always safe.
Net effect: zero regression for healthy sessions, and orphan MCP
servers no longer accumulate between gateway restarts.
Made-with: Cursor
Multiple overlapping Slack attachment improvements:
1. Upload retry with backoff on transient errors (429, 5xx, connection
reset, rate_limited, service unavailable). New _is_retryable_upload_error
helper covers three upload paths: _upload_file, send_video,
send_document. Up to 3 attempts with 1.5s * attempt backoff.
2. Thread participation tracking: successful file uploads now add the
thread_ts to _bot_message_ts, mirroring how text replies are tracked.
This lets follow-up thread messages auto-trigger the bot (same
engagement rules as replied threads).
3. Thread metadata preservation in the image redirect-guard fallback
(send_image → send text fallback) and in two gateway.run.py send
paths (image + document fallback calls).
4. HTML response rejection in _download_slack_file_bytes. Parallels
the existing check in _download_slack_file. Guards against Slack
returning a sign-in / redirect page as document bytes when scopes
are missing, so the agent doesn't get HTML-as-a-PDF.
5. File lifecycle event acks (file_shared / file_created / file_change).
These events arrive around snippet uploads. Acking them silences the
slack_bolt 'Unhandled request' 404 warnings without changing behavior.
6. Post-loop message type classification so a mixed image+document upload
classifies as PHOTO (or VOICE if no image), falling back to DOCUMENT.
Previously, the per-file classification in the inbound loop could be
overwritten unpredictably.
7. Expanded text-inject whitelist in inbound document handling to cover
.csv, .json, .xml, .yaml, .yml, .toml, .ini, .cfg (up to 100KB) so
snippets and config files are directly visible to the agent, not just
cached as opaque uploads. Paired with new MIME entries in
SUPPORTED_DOCUMENT_TYPES in base.py.
Squashed from two commits in #11819 so the single commit carries the
contributor's GitHub attribution (the original commits were authored
under a local dev hostname).
- stringWidth: true LRU on cache hit (touch-on-read via delete+set) so
hot strings stay resident under long sessions; was insertion-order
FIFO before
- virtualHeights: include todos, panel sections, and intro version in
messageHeightKey so height-cache reuse correctly invalidates when
todo content / panel sections change
- virtualHeights: estimate trail+todos rows at todos.length+2 (or 2
collapsed) instead of the generic ~1-line fallback, so initial
virtualization offsets are closer to reality
- useInputHandlers: clearTimeout on unmount for scrollIdleTimer so
pending relaxStreaming() never fires after teardown
- render-node-to-output: drop unused declined.noHint counter from
scrollFastPathStats; it was always 0 (the "hint missing" branch is
outside the diagnostics block)
- perfPane / hermes-ink.d.ts: follow the noHint removal
- wheelAccel: replace ~/claude-code path comment with generic
attribution that doesn't reference a developer-local checkout
TodoPanel now renders as a child of the most recent user message's
virtualized row container, so it visually belongs to that prompt and
follows it during scroll. Falls back gracefully when no user message
exists yet (panel just doesn't render).
Adds an `evictInkCaches(level)` API that prunes the four hot module-level
caches (`widthCache`, `wrapCache`, `sliceCache`, `lineWidthCache`) with
either a half-keep LRU pass or a full clear. Wired into:
- memoryMonitor: half-prune on 'high', full drop on 'critical', before
the heap dump / auto-restart path. Gives long sessions a shot at
recovering RSS instead of hard-exiting.
- useSessionLifecycle.resetSession: half-prune so a /new session starts
with a half-warm pool and the prior session can resume cheaply.
Also: lineWidthCache now uses LRU half-eviction on overflow instead of a
full `cache.clear()`, matching the other three caches.
Comparison vs claude-code: both forks now share the same `prevScreen`
blit + dirty-cascade machinery in render-node-to-output. Their smoothness
came from sibling-memo discipline (every chrome pane memo'd so dirty
cascade doesn't disable transcript blit) — already in place in our
appLayout.tsx (TranscriptPane / ComposerPane / StatusRulePane all memo'd).
Alt-screen is not the cause; both use it. The remaining gap was per-row
CPU on width/wrap/slice, which the previous commit closed.
CPU profile (Apr 2026, real-user scroll on 11k-line session) showed three
hot loops in the per-frame render path:
Output.get() per-frame walk: 24% total
└─ sliceAnsi(line, from, to) per write: 18% total
stringWidth(line) chain (cached + JS): 14% total
All three were re-doing identical work every frame: same string → same
clipped slice → same width.
Fixes:
1. Memoize stringWidth (8k-entry LRU) for non-ASCII strings; ASCII fast-path
skips the cache (inline scan beats Map.get for short ASCII, the >90%
case). String.charCodeAt scan up to 64 chars is cheaper than the regex
fallback.
2. Memoize wrapText (4k-entry LRU keyed by maxWidth|wrapType|text) — wrapAnsi
is pure and the same content reflows identically every frame.
3. Memoize sliceAnsi (4k-entry LRU keyed by start|end|str) for the
end-defined hot path used by Output.get().
4. Skip the slice entirely in Output.get() when the line already fits the
clip box (startsBefore=false && endsAfter=false). Most transcript lines
never exceed their container width, and tokenizing them just to slice
(line, 0, width) was pure overhead. This single fast-path drops
sliceAnsi from 18% → ~0% in the profile.
Also tighten virtualization constants (MAX_MOUNTED 260→120, OVERSCAN 40→20,
SLIDE_STEP 25→12) and cap historical-message render at 800 chars / 16
lines via HISTORY_RENDER_MAX_*; messages inside the FULL_RENDER_TAIL_ITEMS
window still render in full so reading-zone behavior is unchanged.
Validation, real-user CPU profile, page-up scroll on 11k-line session:
Output.get() self-time: 24% → 0.3%
sliceAnsi total: 18% → not in top 25
stringWidth family: 14% → ~3%
idle: 60.7% → 77.3%
Frame timings (synthetic page-up profile harness):
dur p95: ~10ms → 4.87ms
dur p99: 25ms+ → 12.80ms
yoga p99: ~20ms → 1.87ms
The remaining CPU in the profile is Yoga layoutNode + React commit,
which is the irreducible work for this UI tree size.
Ports openclaw/openclaw#72038 to hermes-agent.
Telegram's `editMessageText` preserves the original message timestamp,
so a long-running streamed reply (reasoning models that take 60+ seconds
to finish) would keep the first-token timestamp even after completion.
Users can't tell how long a task actually took.
When a preview message has been visible for >= 60s (configurable via
`streaming.fresh_final_after_seconds`), finalize by sending a fresh
message instead of editing in place, then best-effort delete the stale
preview. Short previews still edit in place (the existing fast path).
Implementation notes adapted from OpenClaw's TypeScript original:
- `StreamConsumerConfig` gains `fresh_final_after_seconds` (default 0 =
legacy edit-in-place). Gateway-level `StreamingConfig` defaults to 60.
- `GatewayStreamConsumer` tracks `_message_created_ts` at first-send and
checks it in `_send_or_edit` on `finalize=True`. New helpers
`_should_send_fresh_final` + `_try_fresh_final`.
- `BasePlatformAdapter` gains optional `delete_message(chat_id, message_id)`
returning False by default. `TelegramAdapter` implements it via
`_bot.delete_message`.
- `gateway/run.py` only enables fresh-final for `Platform.TELEGRAM`;
other platforms ignore the setting (they don't have the stale-edit
timestamp problem or edit-then-read works cheaply).
- Fallback to normal edit on any fresh-send failure — no user-visible
regression if Telegram rate-limits a send or the message is gone.
Tests: 15 new cases in tests/gateway/test_stream_consumer_fresh_final.py
covering short/long previews, config plumbing, delete-support absent,
send-failure fallback, __no_edit__ sentinel safety, and StreamingConfig
round-trip.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
Adds a corner-overlay FPS readout gated on HERMES_TUI_FPS, fed by
ink's onFrame callback (so it's the REAL render rate, not a timer).
Displays fps, last-frame duration, and total frame count, colored by
threshold (green ≥50, yellow ≥30, red below).
Implementation:
* lib/fpsStore.ts — nanostore atom updated from a trackFrame()
sink. Ring buffer of last 30 frame timestamps; fps = 29/elapsed.
trackFrame is undefined when SHOW_FPS is off so ink's onFrame
short-circuits at the optional chain.
* components/fpsOverlay.tsx — tiny <Text> subscriber; returns null
when SHOW_FPS is off (React skips the subtree entirely).
* entry.tsx — composes onFrame from logFrameEvent (dev-perf) and
trackFrame (fps) so both flags can coexist. When both are off,
onFrame is undefined and ink never attaches the handler.
* appLayout.tsx — mounts the overlay as a flex-shrink=0 right-
aligned Box below the composer, conditional on SHOW_FPS.
Usage:
HERMES_TUI_FPS=1 hermes --tui
# bottom right: " 62.3fps · 0.8ms · #1234" (green/yellow/red)
Intended as a user-facing diagnostic during the scroll-perf tuning
pass — watch the counter drop while holding PageUp to see where
frames go silent, without having to run scripts/profile-tui.py in a
side terminal.
126 files post-compile with React Compiler; 352 tests still pass.
Replaces the static WHEEL_SCROLL_STEP=1 multiplier on wheel events
with an adaptive accel state machine that infers user intent from
inter-event timing.
Algorithm ported straight from claude-code's
src/components/ScrollKeybindingHandler.tsx. All tuning constants,
the native/xterm.js path split, the encoder-bounce detection, the
trackpad-burst signature → all theirs. This file is a mechanical
port into our module structure.
What it does:
precision click (>500ms gap) 1 row/event (deliberate scan)
sustained mouse (40-200ms) 2-6 rows (decay curve)
detected wheel bounce ramps to 15 (sticky wheel-mode)
trackpad flick (5+ <5ms) 1 row/event (burst detect)
direction reversal reset to base
Two implementation paths:
* native terminals (ghostty, iTerm2, Kitty, WezTerm) — linear
window-ramp + optional wheel-mode curve triggered by detected
encoder bounce. SGR proportional reporting handled via the
burst-count guard.
* xterm.js (VS Code / Cursor / browser terminals) — pure
exponential-decay curve with fractional carry. Events arrive
1-per-notch with no pre-amplification, so the curve is more
aggressive.
Selected at construction via isXtermJs() from @hermes/ink (now
exported). Per-user tune via HERMES_TUI_SCROLL_SPEED (alias
CLAUDE_CODE_SCROLL_SPEED for portability).
13 unit tests covering direction flip/bounce/reversal, idle
disengage, trackpad-burst disengage, frac invariants, and the
native vs xterm.js branches.
Profiled under --rate 30 (stress test) and --rate 10 (realistic
sustained scroll): accel ramps to cap=6 at 30Hz burst, decays to
1-3 rows at sparse 10Hz clicks. Perf is comparable to baseline
because accel IS multiplying step — the win is perceptual (fast
flicks cover distance, slow clicks keep precision), not raw fps.
Companion to the earlier WHEEL_SCROLL_STEP=1 change: that set the
base; this modulates around it.
Was user-local in ~/.hermes/skills/. Ported into skills/software-development/
so other Hermes users get it and so the related_skills links from
node-inspect-debugger and python-debugpy resolve in-repo.
Frontmatter upgraded to match repo convention (version/author/license/
metadata.hermes.{tags,related_skills}, description rewritten as "Use when ...").
Body expanded with debugging-tactics section pointing at the two new
debugger skills, and additional common-issues / pitfalls entries.
Adds a gate so we can A/B test whether bypassing the alt-screen +
viewport constraint lets the terminal's native scrollback beat our
virtualization on scroll perf.
Result: definitively NO. Inline mode is 40x worse on every metric
that moves, because AlternateScreen is what constrains the ScrollBox
to the viewport height. Without it, the ScrollBox grows to contain
every child of the transcript and every frame re-renders all 1100
messages.
Profile under hold-wheel_up (1106-msg session, 30Hz for 6s):
metric fullscreen inline delta
patches_total 28,864 1,111,574 +3751%
writeBytes_total 42 KB 1.6 MB +3881%
fps_throughput 15.8 fps 1.75 fps -89%
frames 179 18 -90%
gap_p50_ms 17 (~60fps) 726 (~1fps) +4170%
yoga_p99 34 ms 405 ms +1083%
renderer_p99 14 ms 169 ms +1062%
flickers 0 5 offscreen —
This is actually the cleanest data we've gotten so far:
* AlternateScreen is LOAD-BEARING for perf — its viewport height
constraint is what lets useVirtualHistory's culling work. No
constraint → ScrollBox grows unbounded → every fiber mounts.
* The outer terminal (Cursor's xterm.js) parsed 1.6 MB of ANSI in
under 10 seconds with drain p99 = 8.83 ms and 0 backpressure
frames. Our terminal-write hypothesis from last session was
wrong: the bottleneck is React + Yoga, not the wire.
* Doing proper inline mode (non-virtualized transcript in
scrollback, composer pinned below) is not a flag flip — it's a
different UI architecture. Leaving this flag in so anyone
re-running the experiment gets the same numbers, but not
building the architecture until we're sure the perf win is
worth the UX loss (it probably isn't — the fullscreen + virt
path is the one we should optimize, not replace).
Keeping the flag as an experiment gate. Flip HERMES_TUI_INLINE=1
and run scripts/profile-tui.py --compare to reproduce.
Two new skills under skills/software-development/ for real breakpoint-driven
debugging from the terminal:
- node-inspect-debugger: node --inspect / --inspect-brk, node inspect REPL,
CDP scripting via chrome-remote-interface, attaching to running Node
processes (SIGUSR1), ui-tui-specific recipes, Vitest under debugger,
CPU profiles + heap snapshots.
- python-debugpy: pdb quick reference, breakpoint() workflow, pytest --pdb
(with xdist caveat for scripts/run_tests.sh), post-mortem, debugpy for
remote/attach, remote-pdb as the agent-friendly alternative to DAP,
recipes for tui_gateway/_SlashWorker/subprocess debugging.
Before: change code → build → run profile → manually compare to
mental model of last run. After: `--loop` watches ui-tui/src and
packages/hermes-ink/src for .ts(x) changes, rebuilds on change,
re-runs the same scenario, prints a side-by-side A/B diff against
the previous iteration — so each edit's impact is quantified
instantly. Ctrl+C to stop.
Also added:
--save LABEL saves metrics snapshot to /tmp/perf-<LABEL>.json
--compare LABEL diffs the current run vs that snapshot
--extra-flag X pass-through to node dist/entry.js (prepping for
--no-fullscreen below)
key_metrics() flattens a full run into scalar numbers across
frames, React commits, and per-phase timings. format_diff() prints
a table with ↑/↓ markers denoting regressions vs improvements based
on whether the metric is lower-is-better (p99, max, patches, drain)
or higher-is-better (fps, gaps_under_16ms).
Run-to-run noise on static code is ~5-15% on most metrics — big
signal (>30% change on renderer_p99 / fps) cuts through cleanly.
Useful both for validating a single fix and for detecting subtle
regressions during the wheel-accel port.
Usage during the next perf session:
# one-shot with a baseline for later comparison
scripts/profile-tui.py --seconds 6 --hold wheel_up --save pre-accel
# after porting the wheel handler
scripts/profile-tui.py --seconds 6 --hold wheel_up --compare pre-accel
# continuous iteration
scripts/profile-tui.py --seconds 6 --hold wheel_up --loop
Adds four fields to FrameEvent.phases and the matching profile
summary:
optimizedPatches post-optimize patch count (what's actually
written to stdout; the .patches field is
pre-optimize)
writeBytes UTF-8 byte count of the write this frame
backpressure true when Node's stdout.write returned false
(Writable buffer full — outer terminal can't
keep up)
prevFrameDrainMs end-to-end drain time of the PREVIOUS frame's
write, captured from stdout.write's 2-arg
callback. Reported on the next frame so the
measurement reflects "time until OS flushed
the bytes to the terminal fd", not "time until
queued in Node".
writeDiffToTerminal() now returns { bytes, backpressure } and
accepts an optional onDrain callback. Only attached on TTY with
diff; piped/non-TTY stdout bypasses flow control so the callback
would fire synchronously anyway.
Initial measurements under hold-wheel_up against 1106-msg session
(30Hz for 6s):
patches total 28,888
optimized total 16,700 (ratio 0.58 — optimizer cuts ~42%)
writeBytes 42 KB / 10s = 4.2 KB/s throughput
drainMs p50 0.14 ms terminal accepts bytes instantly
drainMs p99 0.85 ms
backpressure 0% of frames
This rules out the terminal-parse hypothesis — Cursor's xterm.js
drains our output in sub-millisecond time at only 4 KB/s. The
remaining lag has to be in the render pipeline, not the wire.
Profile output now includes the bytes+drain+backpressure lines to
keep this visible on every subsequent iteration.
Profiled with scripts/profile-tui.py under hold-PageUp + hold-wheel.
The placeholder → microtask-upgrade pattern did not reduce renderer
p99 (63ms → 63ms) or max (96ms → 142ms, slightly worse). Each fresh
row still pays the Md cost — just on a follow-up commit instead of
inline — and the follow-up commit shows up as a second heavy frame
a few ms later.
The real bottlenecks turned out to be:
1. wheel step too large (fixed in 7ca16eea)
2. outer terminal ANSI parse throughput (diagnosing next)
3. React commit frequency during hold-scroll (needs coalescing)
None of which DeferredMd addresses. Clearing the complexity so the
next experiments land on a simpler substrate.
User observation: "it doesn't scroll line by line/row by row."
Was right. Two places hardcoded big deltas:
1. WHEEL_SCROLL_STEP = 6 (config/limits.ts)
Each wheel event scrolled 6 rows. A mechanical wheel notch emits
3-5 events → 18-30 rows per click, which visually teleports past
content instead of smooth-scrolling it. Drop to 1. Trackpads
emit 50-100 events per flick — at step=1 that's still a fast flick
(a whole viewport in one flick) but each intermediate frame is
visible. Porting claude-code's wheel accel state machine is the
right next step if this feels sluggish on precision scrolls.
2. pageUp/pageDown = viewport - 2 (useInputHandlers.ts)
Full-viewport jumps replace the entire screen — no visual
continuity, can't scan content — AND land right at Ink's fast-path
threshold (`delta < innerHeight`), which disqualifies the DECSTBM
blit on every press. Half-viewport keeps 50% continuity AND
drops well under the threshold. Two presses still cover the same
total distance.
Profiled against the 1106-msg session, holding the key at 30Hz for
6s:
wheel_up (step 6 → 1):
frames 142 → 163 (+15%)
throughput 10.7 → 15.8 fps (+48%)
patches tot 53018→ 36562 (-31%)
gap p50 5ms → 16ms (actual rendering ~60fps now)
<16ms frames 93 → 76
16-33ms 82 → 76
hitches 3 → 1
pageUp (viewport-2 → viewport/2):
throughput 10.7 → 9.5 fps (same ballpark — smaller delta × same
event rate = less total scroll)
Ink's proportional drain caps at `innerHeight - 1` per frame to keep
the DECSTBM fast path firing. With these smaller deltas every event
comfortably fits under that cap, so fast-path hit rate goes up and
patch volume per frame drops — the measured 31% reduction in total
patches-sent correlates with users perceiving smoother scrolling
because the outer terminal (VS Code / xterm.js / tmux) isn't drowning
in ANSI between paints.
Tests/type-check/build clean; 352 tests pass.
Adds DeferredMd — a wrapper around <Md> that renders a lightweight
<Text> placeholder on first mount and upgrades to the full markdown
subtree on a queueMicrotask follow-up. Rationale: fresh MessageLine
mounts during PageUp hold run our markdown tokenizer + syntax
highlighter synchronously, producing the 63-112ms renderer spikes
profiled earlier. A plain <Text> placeholder only needs Yoga to wrap
the pre-stripped string (no tokenizer, no highlight), then the Md
subtree builds in a follow-up React commit.
Upgrade cache: once a (theme, compact, text) tuple has been upgraded,
a WeakMap-keyed Set remembers it so remounts (scroll-out then
scroll-back) mount straight into <Md> — no placeholder round-trip.
WeakMap on theme means palette swaps re-upgrade naturally.
Honesty note: profiling under hold-PageUp showed this didn't reduce
renderer p99 measurably — the upgrade commit just pays the Md cost on
a follow-up frame instead of inline. The bigger bottleneck turned out
to be React commit frequency (3.5 commits/sec during 30Hz scroll
input, with 200ms+ silent gaps between commits dominating perceived
FPS), which this change doesn't address. Keeping the deferred path
anyway because:
1. It's correct and tested — no regressions across 352 tests
2. Defensive for pathological fresh-mount cases (giant code blocks,
wide tables) that aren't in the current profile fixture
3. Pairs naturally with useVirtualHistory's useDeferredValue to keep
React's concurrent scheduler able to interrupt upgrade commits
If the follow-up perf investigation (terminal write throughput / patch
volume / commit frequency) shows DeferredMd is net-neutral-or-worse in
practice, this can be reverted with a one-line swap back to <Md> in
messageLine.tsx:115.
Companion to the streaming 2-column fix in 7242361a — these two
touched messageLine.tsx together so they land as a pair.
StreamingMd returned <><Md/><Md/></> — a bare Fragment with two <Md>
children. Each <Md> returns a <Box flexDirection="column">, but its
parent in messageLine.tsx (line 169) is `<Box width={...}>` with no
flexDirection, which Ink defaults to 'row'. So during streaming the
two column boxes rendered side-by-side, producing the visible "tokens
jumble into two columns until it fixes itself" bug — the "fix" was
message.complete flipping isStreaming→false, which swaps the
StreamingMd subtree for a single DeferredMd/Md child (no siblings → row
direction is harmless).
Wrap the two <Md> siblings in a flexDirection="column" Box so they
stack. Localized fix so the non-streaming path (single-child, works
fine in a row parent) is untouched.
Reported by user:
> "tokens streaming... going into 2 columns randomly and jumbling
> together until it fixes itself"
No test changes — findStableBoundary tests still pass (the layout
change is parent-structural, not in the boundary logic). Build clean,
tsc clean, 352 tests pass.
Adds scrollFastPathStats counters to render-node-to-output.ts: captures
every time a ScrollBox's DECSTBM scroll hint is generated, records
whether the fast path took it (blit+shift from prevScreen) or declined,
and why. Exposed through hermes-ink's public exports and snapshotted on
every FrameEvent so the profiler harness can correlate decline reasons
with the actual patch/renderer cost per frame.
This is pure observation — no behaviour change. Preparing for the
virtual-history rewrite: the hypothesis was that our topSpacer/
bottomSpacer scheme disqualifies every scroll via heightDelta
mismatch, but the data shows the fast path is actually taken on most
scrolls (19/23 over a 6s PageUp hold through 1100 messages) — the
remaining steady-state renderer cost is Yoga tree traversal, not
the per-frame full redraw I initially suspected.
Declines that do happen correlate with React commits that changed the
mounted range mid-scroll (heightDelta=±3 to ±35). Those are the rarer
cases the virtualization rewrite still needs to address.
No test diffs — instrumentation-only. Build verified: `tsc --noEmit`
plus the full `npm run build` compiler post-pass pass cleanly.
Extends HERMES_DEV_PERF to capture the complete render pipeline, not
just React commits. Adds scripts/profile-tui.py to drive repeatable
hold-PageUp stress tests against a real long session.
perfPane.tsx:
Wires ink's onFrame callback (already plumbed through the fork) into
the same perf.log as the React.Profiler samples. Captures per-phase
timing (yoga calculateLayout, renderNodeToOutput, screen diff, patch
optimize, stdout write) plus yoga counters (visited/measured/cache-
Hits/live) and patch counts per frame. Events are tagged
{src: 'react'|'frame'} so jq can split them. logFrameEvent is
undefined when HERMES_DEV_PERF is unset, so ink doesn't even attach
the callback.
entry.tsx:
Passes logFrameEvent into render().
types/hermes-ink.d.ts:
Declares FrameEvent + onFrame on RenderOptions so the ui-tui side
type-checks against the plumbed-through ink option.
scripts/profile-tui.py:
New harness. Launches the built TUI under a PTY with the longest
session in state.db resumed, holds PageUp/PageDown/etc at a
configurable Hz for N seconds, then parses perf.log and prints
per-phase p50/p95/p99/max plus yoga-counter summaries. Zero deps
beyond stdlib. Exit 2 if nothing was captured (wiring broken).
Initial findings (1106-msg session, 6s PageUp hold at 30Hz):
- Steady state: 10 fps; renderer phase p99=63ms, write p99=0.2ms
- 4/107 heavy frames (>=16ms), all dominated by renderNodeToOutput
- One pathological 97ms frame with yoga measuring 70,415 text cells
and Yoga visiting 225k nodes — the cold-unmeasured-region hit
- Ink's scroll fast-path (DECSTBM blit from prevScreen) is
disqualified because our spacer-based virtual history doesn't
keep heightDelta in sync with scroll.delta, so every PageUp step
falls through to a full 2000-4800 patch re-render instead of ~40
Split in-flight assistant text at the last stable block boundary so only
the unclosed tail re-tokenizes per stream delta. Previously the full
text was rendered as plain <Text> during streaming and only flipped to
<Md> at message.complete — cheap per delta but loses live markdown
formatting.
New StreamingMd component holds a monotonically-growing stablePrefix
in a ref (idempotent under StrictMode double-render), renders it as
one <Md> that memoizes across deltas, and renders the unstable suffix
as a second <Md> that re-parses on each delta. Cost per delta drops
from O(total length) to O(unstable length).
findStableBoundary walks back to the last "\n\n" outside an open
fenced code block — splitting inside an open fence would orphan the
opener and break highlighting in the prefix.
Adapted from claude-code's src/components/Markdown.tsx:186 but built
on our line-based tokenizer instead of marked.lexer. 9 new tests cover
fence balance, boundary walk, and empty input.
Part of the --tui perf audit (see audit #7).
Slack's modern composer sends messages with a 'blocks' array that
contains rich_text elements. When a user forwards or quotes another
message, the quoted content shows up in the rich_text_quote children
of that array — and is NOT included in the plain 'text' field. The
agent saw only the lossy plain text and was blind to forwarded /
quoted content. Same story for link unfurl previews (Notion, docs,
GitHub, etc.) which Slack puts in the 'attachments' array.
Two fixes in the inbound handler:
1. _extract_text_from_slack_blocks walks rich_text / rich_text_quote /
rich_text_list / rich_text_preformatted trees and renders readable
text ('> quoted', '• bullet', code fences), dedupes against the
plain text field, and appends the extracted content so the agent
sees everything.
2. Link unfurl / attachment preview extraction reads title, url,
body, and footer from the 'attachments' array and appends a
'📎 [title](url)\n body\n _footer_' section per preview.
Skips is_msg_unfurl to avoid echoing our own Slack replies back.
Routing is careful not to trust augmented text: mention gating
(is_mentioned) and slash-command detection both run against the
original 'text' field, so forwarded content containing '<@bot>' or
'/deploy' in a quote can't trick the bot into responding in a
channel it shouldn't or classifying a normal message as a command.
Adjustment from original PR: dropped _serialize_slack_blocks_for_agent,
which inlined a redacted JSON dump of non-rich_text blocks (section,
accessory, actions, etc.) — the agent would see the raw Block Kit
structure for UI-heavy alerts. It added up to 6000 characters to the
prompt context on every qualifying message with no opt-out. The
rich_text extraction and attachment unfurls cover the common bug-fix
case (quoted/forwarded content + link previews) without the prefill
tax. If a user needs block inspection later, it can return as a
config opt-in.
Also updates the Slack platform notes in session.py to accurately
describe what the gateway inlines.
After #14798 made cron honor per-platform `hermes tools` config, the
`_DEFAULT_OFF_TOOLSETS` filter silently stripped `homeassistant` from
cron jobs for users who'd been relying on the previous blanket toolset.
Norbert's HA cron reports regressed as a result.
The HA toolset is already runtime-gated by its `check_fn` (requires
HASS_TOKEN to register any tools). When HASS_TOKEN is set the user has
explicitly opted in — `_DEFAULT_OFF_TOOLSETS` adds nothing in that case,
so stop double-gating and restore HA for cron / cli / other platforms
without an explicit saved toolset list.
moa and rl stay off by default (original #14798 goal preserved).
Fixes HA cron regression reported by Norbert.
HindsightEmbedded.close() delegates to its sync client.close(). When Hermes
created/used that client on the shared async loop, closing it from the main
thread raises 'attached to a different loop' before aiohttp releases the
session — so the ClientSession / TCPConnector leak past provider teardown.
Close the embedded inner async client on the shared loop first via
_run_sync(inner_client.aclose()), then let the wrapper's sync close()
do its daemon/UI bookkeeping.
Salvage of #14605: test placement rebased — appended TestShutdown class
after TestSharedEventLoopLifecycle (which landed on main after the PR was
written). Original author attribution preserved.
Translate Slack attachment failures into actionable user-facing notices
instead of generic download errors. When a scope/auth/permission issue
breaks attachment processing, the user sees:
[Slack attachment notice]
- Slack attachment access failed for photo.jpg. Missing scope:
files:read. Update the Slack app scopes/settings and reinstall
the app to the workspace.
Two helpers do the translation:
_describe_slack_api_error — handles SlackApiError responses
(missing_scope, invalid_auth, file_not_found, access_denied, etc.)
_describe_slack_download_failure — handles httpx.HTTPStatusError
(401/403/404) and Slack-returns-HTML-sign-in fallbacks
Wired into three existing call sites:
- the Slack Connect files.info path (PR #11111) so scope errors
surface instead of being logged as generic "files.info failed"
- the image, audio, and document download paths so 401/403 and
HTML-body responses translate into actionable notices
Adjustment from original PR: dropped _probe_slack_file_access_issue,
the proactive pre-download files.info probe. It added one extra
Slack API call per attachment even on healthy ones, and overlapped
with the existing files.info call from PR #11111. The post-failure
translation path covers the same user-facing diagnostic value
without the per-message tax.
Also documents files:read scope more prominently in the Slack setup
guide and troubleshooting table.
Contributed back from https://github.com/xinbenlv/zn-hermes-agent.
Closes#7015.
Co-authored-by: xinbenlv <zzn+pa@zzn.im>
Background review fork now inherits session_id, credential_pool, and
status_callback from the parent (added in #16099 after this PR was
written). Extend the bare-agent helper so the regression test keeps
reaching the cleanup assertions instead of failing in the runtime
resolver.
Signed-off-by: Teknium <8425893+teknium1@users.noreply.github.com>
Temporary background review agents can initialize Hindsight-backed memory clients, but close() alone skips provider teardown. Shut the memory provider down before closing so aiohttp sessions do not leak at process exit.
Made-with: Cursor
Slack Connect channels return file objects with file_access="check_file_info"
and no url_private_download field (see
https://docs.slack.dev/reference/objects/file-object/#slack_connect_files).
These stub objects must be resolved via files.info before download can
proceed. Without this the agent silently skips attachments posted in
Slack Connect channels.
Call files.info on every file whose file_access is check_file_info,
replace the stub with the full file object, and let the existing
download path continue. Warn and skip on files.info failures.
Closes#11095.
The Slack thread-context fetcher used to drop every message with a
bot_id, which silently erased the thread parent whenever a cron job (or
any other bot) had posted it. As a result, replies to a cron-posted
summary lost all context and the agent answered as if from a blank
thread.
Changes:
1. gateway/platforms/slack.py::_fetch_thread_context
- Keep the thread parent even when it was posted by a bot
(e.g. cron summaries, third-party integrations).
- Only skip *our own* prior bot replies to avoid circular context,
matching the per-workspace bot user id via _team_bot_user_ids so
multi-workspace deployments stay correct.
- Keep non-self bot children (useful third-party context).
2. gateway/platforms/slack.py::_handle_slack_message
- Populate MessageEvent.reply_to_text for thread replies (parity
with Telegram/Discord/Feishu/WeCom). gateway.run uses this field
to inject a [Replying to: "..."] prefix when the parent is not
already in the session history, which is exactly the scenario
triggered by cron-generated thread parents.
- New helper _fetch_thread_parent_text reuses the existing thread-
context cache (and its 60s TTL) to avoid duplicate
conversations.replies calls; falls back to a cheap limit=1 fetch
when the cache is cold.
Tests:
- Updated TestSlackThreadContext::test_skips_bot_messages to reflect
the new behaviour (self-bot child dropped, third-party bot kept).
- Added:
* test_fetch_thread_context_includes_bot_parent
* test_fetch_thread_context_excludes_self_bot_replies
* test_fetch_thread_context_multi_workspace
* test_fetch_thread_context_current_ts_excluded (regression guard)
* test_fetch_thread_parent_text_from_cache
* test_slack_reply_to_text_set_on_thread_reply
* test_slack_reply_to_text_none_for_top_level_message
Full Slack suite: 176 passed (was 169).
Slack's chat.postMessage API rejects user IDs (U...) and workspace
IDs (W...) — they are not valid conversation IDs. Posting to them
fails because the API requires a channel ID (C/G/D). To DM a user,
the sender must first call conversations.open to obtain a D... ID.
Tighten _SLACK_TARGET_RE from [CGDUW] to [CGD] so the send path rejects
U/W values as explicit targets and instead falls through to channel-
name resolution (where they'll fail with a clear 'could not resolve'
error rather than silently getting stuck in a retry loop on the API).
Flip the corresponding regression test to assert U/W values are not
explicit. Matches the narrower regex briandevans proposed in #15939.
Co-authored-by: briandevans <brian@bde.io>
send_message(target='slack:<channel_id>') failed with "Could not
resolve" because _parse_target_ref had no Slack branch — Slack's
uppercase alphanumeric IDs fell through to channel-name resolution,
which only matched by name. As a fallback, the agent would retry with
bare target='slack' and post to the home channel instead.
Three fixes:
- _parse_target_ref recognizes Slack IDs (C/G/D/U/W prefix) as
explicit targets so the name-resolver is bypassed entirely.
- resolve_channel_name tries a case-sensitive raw-ID match before
the existing name match, so any platform's IDs resolve cleanly.
- _build_slack now actually calls users.conversations against each
workspace's AsyncWebClient (paginated), instead of only returning
session-history entries. This populates the directory with public
and private channels the bot has joined, so action='list' shows
them and they can also be addressed by name. Errors from one
workspace don't block others.
build_channel_directory becomes async (Slack web calls require it).
The two async-context callers in gateway/run.py are awaited; the
cron ticker thread call bridges via asyncio.run_coroutine_threadsafe.
Slack bot needs channels:read and groups:read scopes for full
enumeration; missing scopes degrade gracefully per-workspace.
addressing #15927
Removes deepseek/deepseek-v4-pro and deepseek/deepseek-v4-flash from
OPENROUTER_MODELS and _PROVIDER_MODELS['nous'], then regenerates
website/static/api/model-catalog.json so the hosted picker JSON drops
them too. Direct-API deepseek provider support is unchanged.
load_gateway_config() has a side effect: when config.yaml contains
platform-gating keys (slack.require_mention, slack.strict_mention,
slack.free_response_channels, slack.allow_bots, slack.reactions, plus
analogous keys for discord/telegram/whatsapp/dingtalk/matrix), it calls
os.environ[KEY] = ... to bridge them to env-var form.
monkeypatch.delenv doesn't track direct os.environ mutations made
inside the test body, so tests that call load_gateway_config() leak
those env vars into later tests on the same xdist worker. The failure
mode is flaky seed-dependent: test_top_level_message_requires_mention_
even_with_session (and siblings in TestThreadReplyHandling) pass when
SLACK_REQUIRE_MENTION is unset but fail when a leaked value of 'false'
is present.
Add the gating env vars to _HERMES_BEHAVIORAL_VARS so the hermetic
autouse fixture blanks them on every test setup, closing the leak
regardless of which test sets them.
Extends the strict_mention feature so an @mention in strict mode no
longer persistently tags the thread as 'mentioned'. Without this, the
thread's first mention would permanently auto-trigger the bot on every
subsequent message — which is exactly what strict_mention is designed
to prevent. Closes the agent-to-agent ack loop hole hhhonzik identified
in #14117.
Co-authored-by: hhhonzik <me@janstepanovsky.cz>
Adds a strict_mention config option that, when enabled, requires an
explicit @-mention on every message in channel threads. Disables the
'once mentioned, forever in the thread' and session-presence auto-triggers.
- New _slack_strict_mention() helper (config.extra + SLACK_STRICT_MENTION env)
- Bridged top-level slack.strict_mention yaml to SLACK_STRICT_MENTION env,
matching require_mention/allow_bots bridging
- Unit tests for the helper + config bridge
* fix(install): add /usr/local/bin PATH guard for RHEL root non-login shells
The FHS-layout branch assumed /usr/local/bin is on PATH for every
standard shell. That holds for login shells (via /etc/profile's
pathmunge) but breaks on RHEL/CentOS/Rocky/Alma 8+ root in non-login
interactive shells (su, sudo -s, tmux panes, some web terminals) —
/etc/bashrc does not add /usr/local/bin and /root/.bash_profile
doesn't either. Result: hermes command links to /usr/local/bin/hermes
but the user has to type the absolute path each time.
Probe a fresh 'bash -i -c' (non-login interactive, matching the user
scenario) after symlinking. If hermes isn't resolvable, append an
idempotent PATH guard to /root/.bashrc and /root/.bash_profile, same
grep pattern already used by the ~/.local/bin branch below. No change
on distros where /usr/local/bin is already inherited.
* fix(update): repair RHEL root PATH on hermes update
Existing RHEL/CentOS/Rocky/Alma root installs won't be repaired by the
install.sh fix alone because 'hermes update' is an in-place git pull, not
a rerun of install.sh. Port the same probe + idempotent .bashrc write
into cmd_update so affected users get fixed automatically on next update.
_ensure_fhs_path_guard() runs after 'Update complete!':
- Linux + root + FHS-layout install (command at /usr/local/bin/hermes) only
- Probe: env -i bash -i -c 'command -v hermes' — fresh non-login interactive
shell, same scenario the user reports
- On failure, append PATH guard to /root/.bashrc and /root/.bash_profile,
skipping if any uncommented PATH line already mentions /usr/local/bin
- Silent no-op on macOS, non-root, legacy layout, or shells that already
resolve hermes
Top-level channel messages arrive at _resolve_thread_ts with
metadata.thread_id set to the message's own ts, because the inbound
handler in _handle_message_event uses 'event.ts' as a session-keying
fallback when event.thread_ts is absent. That made metadata alone
insufficient to distinguish a real thread reply from a top-level
message, so reply_in_thread=false only took effect in DMs.
Use reply_to (== incoming message_id == ts for top-level messages) as
the tiebreaker: when metadata.thread_id == reply_to the 'thread' is the
synthetic session-keying fallback, not a real parent, so we reply
directly in the channel. Real thread replies (reply_to != thread_id)
still resolve to the parent thread and preserve conversation context.
Closes#9268.
Parameterize the test helpers in test_status_command.py to accept a
Platform and add two regression tests ensuring the first-run home-channel
onboarding uses '/hermes sethome' on Slack and '/sethome' everywhere else.
Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com>
Slack's adapter registers a single parent slash command /hermes and
dispatches subcommands via slack_subcommand_map(). Bare /sethome is
not a registered command on Slack and fails with 'app did not
respond', logging 'Unhandled request' in slack_bolt.AsyncApp.
Show /hermes sethome in the first-run onboarding hint when the
source platform is Slack; keep /sethome for Telegram, Discord,
Matrix, Mattermost, and other platforms that register it directly.
Fixes#14632
Repeated /queue commands now each produce a full agent turn, in order,
with no merging. Previously the second /queue overwrote the first
because the handler wrote directly into the adapter's single-slot
_pending_messages dict.
- GatewayRunner grows a _queued_events overflow buffer (dict of list).
- /queue puts new items in the adapter's next-up slot when free,
otherwise appends to the overflow. After each run's drain consumes
the slot, the next overflow item is promoted so the recursive run
picks it up.
- /new and /reset clear the overflow.
- /status now reports queue depth when non-zero.
- Ack message shows the depth once it exceeds 1.
Helpers (_enqueue_fifo, _promote_queued_event, _queue_depth) use the
getattr default-fallback pattern so existing tests that build bare
GatewayRunner instances via object.__new__ keep working.
Before: delegate_task children each allocated their own terminal
sandbox keyed by child task_id. Starting extra containers (or Modal
sandboxes / Daytona workspaces) is expensive, and the subagent's work
is invisible to the parent — files written by the child in its
container don't exist in the parent's when the subagent returns.
After: a single `_resolve_container_task_id` helper maps any
tool-call task_id to "default" UNLESS an env override is registered
for it. The parent agent and all delegate_task children therefore
share one long-lived sandbox — installed packages, cwd, /workspace
files, and /tmp scratch carry over freely between them.
RL and benchmark environments (TerminalBench2, HermesSweEnv, ...)
opt in to isolation via `register_task_env_overrides(task_id, {...})`;
those task_ids survive the collapse and get their own sandbox,
preserving the per-task Docker image behavior these benchmarks rely on.
file_state / active-subagents registry / TUI events still key off the
original child task_id, so the 'subagent wrote a file the parent read'
warning and UI per-subagent panels keep working.
Tradeoff: parallel delegate_task children (tasks=[...]) now share one
bash/container. Concurrent cd, env-var mutations, and writes to the
same path will collide. If that bites a specific workflow, the
subagent can opt back into isolation via register_task_env_overrides.
Applied at four lookup sites:
- tools/terminal_tool.py terminal_tool() and get_active_env()
- tools/file_tools.py _get_file_ops() and _get_live_tracking_cwd()
- tools/code_execution_tool.py _get_or_create_environment()
Docs: website/docs/user-guide/configuration.md updated to reflect the
shared-container reality and document the RL/benchmark carve-out.
Tests: tests/tools/test_shared_container_task_id.py (9 cases).
Every command in COMMAND_REGISTRY (/btw, /stop, /model, /help, /new,
/bg, /reset, ...) is now a first-class Slack slash command instead of
a /hermes <subcommand>. Users get the same autocomplete-driven slash
picker experience Slack users expect and that Discord and Telegram
already provide.
Previously Slack registered ONE native slash (/hermes) and split on
the first word, so typing /btw in Slack's composer got 'couldn't find
an app for /btw' because the workspace manifest never declared it.
Changes
- hermes_cli/commands.py: slack_native_slashes() + slack_app_manifest()
generate a Slack manifest from the registry (canonical names +
aliases + plugin commands), clamped to Slack's 50-slash cap with
/hermes reserved as the catch-all.
- gateway/platforms/slack.py: single regex matcher dispatches every
registered slash to _handle_slash_command, which dispatches on
command['command']. Legacy /hermes <subcommand> keeps working for
backward compat with older workspace manifests.
- hermes_cli/slack_cli.py + hermes_cli/main.py: new 'hermes slack
manifest' command prints/writes a full manifest (display info,
OAuth scopes, event subs, socket mode, slash commands) ready to
paste into 'Create from manifest' or Features → App Manifest.
- hermes_cli/setup.py: _setup_slack() now writes the manifest up-front
and points users at the 'From an app manifest' flow; also offers
to refresh the manifest on reconfigure for picking up new commands.
- Tests: 14 new tests covering native-slash dispatch (/btw, /stop,
/model), legacy /hermes <sub> compat, manifest structure, and
telegram<->slack parity (every Telegram command must also register
as a Slack slash). Existing /hermes-registration test updated to
assert the new regex matches /hermes, /btw, /stop, /model, /help.
- Docs: slack.md gains a 'Slash Commands' section + Option A manifest
flow in Step 1; cli-commands.md documents 'hermes slack manifest'.
Users pick up the new slashes by running 'hermes slack manifest --write'
and pasting into Features → App Manifest → Edit in their Slack app
config, then Save (Slack prompts for reinstall if scopes changed).
The Docker terminal-backend docs said 'each session starts a long-lived
container', implying a fresh container per chat session. That hasn't been
true for a while: for the top-level agent, task_id defaults to 'default'
and the container is cached in _active_environments for the lifetime of
the Hermes process. /new, /reset, and switching sessions all reuse the
same container. Only delegate_task subagents and RL rollouts get isolated
containers keyed by their own task_id.
skills/feeds/ only contained a category-marker DESCRIPTION.md with no
actual skills in it. Removing the directory and the 'feeds' -> 'Feeds'
display-label mapping in website/scripts/extract-skills.py (the only
other reference in the repo).
* fix(tui): call maybe_auto_title for TUI sessions (#15961)
The maybe_auto_title() helper is called from cli.py and gateway/run.py
but was never wired into tui_gateway/server.py, so every session started
via 'hermes --tui' landed in state.db with an empty title. Evidence from
the issue reporter: 0/154 TUI sessions titled vs 91/383 CLI.
Mirror the CLI/Gateway pattern: after emitting message.complete, when the
turn finished cleanly, fire-and-forget title generation using the session
key, user prompt, agent response, and current history.
Fixes#15949.
Co-authored-by: math0r-be <math0r-be@github.com>
* chore(release): map math0r-be placeholder email in AUTHOR_MAP
---------
Co-authored-by: math0r-be <math0r-be@github.com>
* fix(/branch): redirect session_log_file and expose branch sessions in list
Two bugs when using /branch:
1. cli.py _handle_branch_command updated agent.session_id but not
agent.session_log_file, so all messages written after branching
landed in the original session's JSON file and the branch never
got its own session_{id}.json on disk.
Fix: mirror the compression-split path (run_agent.py:7579) and
update session_log_file immediately after changing session_id.
2. hermes_state.py list_sessions_rich filtered out every session
with parent_session_id IS NOT NULL to hide sub-agent runs and
compression continuations. Branch sessions share this column, so
they became invisible to `hermes sessions list` and `sessions browse`.
Fix: also include branch children — those whose parent ended with
end_reason='branched' AND whose started_at >= parent.ended_at
(the same timing condition that get_compression_tip uses to
distinguish continuations from live-spawned subagents).
Fixes#14854
Co-Authored-By: Octopus <liyuan851277048@icloud.com>
* chore(release): map octo-patch placeholder email in AUTHOR_MAP
---------
Co-authored-by: octo-patch <octo-patch@github.com>
Co-authored-by: Octopus <liyuan851277048@icloud.com>
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
'hermes skills list' now shows every skill's enabled/disabled status
and accepts --enabled-only to filter down to what will actually load
for the active profile:
hermes -p dario skills list --enabled-only
Previously the command was a flat catalog — it did not apply
skills.disabled from config.yaml, so there was no way to see the
live skill set for a profile without reading config by hand.
Profile switching already works via -p (swaps HERMES_HOME); this
just surfaces the result visibly.
Changes:
- hermes_cli/skills_hub.py: do_list adds a Status column and an
enabled_only filter; summary reports enabled/disabled split
- hermes_cli/main.py: --enabled-only flag on 'skills list'
- /skills list slash command accepts --enabled-only too
- tests: 4 new (status column, disabled marking, enabled-only
hiding, no platform leakage into get_disabled_skill_names);
existing fixtures updated to accept skip_disabled kwarg
Reported by @mochizukimr on X.
* feat(cli,tui): surface /queue, /bg, /steer in agent-running placeholder
While the agent loop is running, the input placeholder previously only
hinted at Enter-to-interrupt. Surface the full set of busy-time actions
(interrupt via new message, /queue, /bg, /steer) so users discover them
without hunting through docs or Teknium's tweets.
- cli.py: "msg=interrupt · /queue · /bg · /steer · Ctrl+C cancel"
- ui-tui/src/components/appLayout.tsx: same string (was "Ctrl+C to interrupt…")
* revert tui placeholder change (cli-only per review)
Address Copilot review findings:
1. Gate _last_activity_desc on interrupt_depth == 0 alongside _last_activity_ts.
Both fields are semantically paired — desc describes the activity *at* ts.
Updating desc without ts made get_activity_summary() report "starting new
turn (cached)" for 20+ minutes while the timestamp showed the true stale
duration, producing misleading diagnostic output.
2. Monkeypatch gateway.run.time.time to a fixed epoch in tests that assert
on _last_activity_ts values. Real time.time() comparisons were latently
flaky under slow CI or NTP adjustments. _FAKE_NOW = 10_000.0 is used
as the reference; assertions are now exact equality rather than >=.
3. Add test_fresh_turn_resets_desc and test_interrupt_turn_preserves_desc to
directly cover the gated desc behaviour introduced by (1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_last_activity_ts was unconditionally reset to time.time() on every
_agent_cache hit. For interrupt-recursive _run_agent calls
(_interrupt_depth > 0) this silently reset the inactivity watchdog's
idle clock on each re-entry, preventing the 30-min timeout from ever
firing when a turn got stuck in an interrupt loop. A stuck session
would emit "Still working... iteration 0/60, starting new turn (cached)"
heartbeats indefinitely instead of timing out.
Gate the reset on _interrupt_depth == 0 only. Fresh external turns
still receive the reset so a session idle for 29 min doesn't trip the
watchdog before the new turn makes its first API call (#9051).
The per-turn reset logic is extracted into a static helper
_init_cached_agent_for_turn() to make it directly testable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to #6616 covering the remaining user-injected prompt markers that
the original PR did not touch (reporter's second comment on #6576 explicitly
flagged these). Azure OpenAI Default/DefaultV2 content filters treat any
bracketed [SYSTEM: ...] as prompt-injection and reject with HTTP 400.
Remaining call sites renamed:
- cli.py: background-process notifications (watch_disabled, watch_match,
completion), MCP reload notice (4 live + 1 docstring)
- gateway/run.py: same notification paths + auto-loaded skill banner +
MCP reload notice (5 live + 1 docstring)
- tools/process_registry.py: comment reference
Not renamed:
- environments/hermes_base_env.py '[SYSTEM]\n{content}' — RL training
trajectory rendering only, never sent to Azure, part of a symmetric
[USER]/[ASSISTANT]/[TOOL] scheme.
AUTHOR_MAP: buraysandro9@gmail.com -> ygd58.
Azure OpenAI content filters (Default/DefaultV2) treat bracketed
[SYSTEM: ...] meta-instructions as prompt-injection attempts and
reject requests with HTTP 400.
Replacing [SYSTEM: with [IMPORTANT: preserves the same semantic
meaning for the model while bypassing the Azure heuristic.
Fixes#6576
Follow-up to cherry-picked PR #15920:
- agent/credential_pool.py: hoist 'from hermes_cli.config import get_env_value'
to module top instead of inline try/except in each seed site (3 sites).
No import cycle — hermes_cli/config.py doesn't depend on agent.credential_pool.
- hermes_cli/auth.py: same hoist for the _resolve_api_key_provider_secret loop.
- tests/tools/test_credential_pool_env_fallback.py: replace smoke-only tests
with real .env file I/O. Each test writes a temp ~/.hermes/.env, verifies
_seed_from_env / _resolve_api_key_provider_secret read from it, and asserts
the full priority chain: os.environ > .env > credential_pool. Uses
'deepseek' as the test provider since 'openai' isn't in PROVIDER_REGISTRY
and _seed_from_env's generic path requires a real pconfig lookup.
_resolve_api_key_provider_secret() and _seed_from_env() only checked
os.environ for provider API keys. When keys exist in ~/.hermes/.env but
are not loaded into the process environment (e.g. ACP adapter entry
point, post-session-start .env edits, or non-CLI entry points), the
resolution returns an empty string, causing HTTP 401 failures.
Changes:
- credential_pool._seed_from_env: use get_env_value() which checks both
os.environ and ~/.hermes/.env file, preventing _prune_stale_seeded_entries
from removing valid entries whose env var isn't in os.environ
- credential_pool._seed_from_env: same fix for openrouter and
base_url_env_var resolution
- auth._resolve_api_key_provider_secret: use get_env_value() instead of
os.getenv(), and add credential_pool fallback when env resolution fails
Fixes#15914
The background memory/skill review (_spawn_background_review) has always
forked a new AIAgent passing only model and provider, then relied on
AIAgent.__init__ to re-resolve credentials from env vars. This works for
users with keys in ~/.hermes/.env but silently falls back to env-var
auto-resolution in all cases, which fails for OAuth-only providers,
session-scoped creds, and credential-pool setups where auth can't be
reconstructed from env.
This used to be invisible -- failures were swallowed via logger.debug().
PR 8a2506af4 (Apr 24) surfaced auxiliary failures to the user, which
made the stale bug visible as:
"Auxiliary background review failed: No LLM provider configured"
Fix: pass api_key, base_url, api_mode, and credential_pool from the
parent's live runtime into the fork -- matching how every other
auxiliary path (compression, memory flush, vision, session search)
already inherits the parent's credentials via _current_main_runtime().
The chown/chmod block on config.yaml was added in b24d239ce to keep the
file readable by the hermes runtime user, but it sat in the post-gosu
'running as hermes' section of the entrypoint. That meant:
1. Default `docker run <image>` — container starts as root, entrypoint
drops to hermes via gosu, then non-root hermes tries to chown the
file to hermes. Works by coincidence because the file was just
created by root during volume setup and gosu target == target owner.
2. `docker run -u $(id -u):$(id -g) <image>` (#15865) — container
starts as the caller's UID. The root block is skipped entirely, we
land in the hermes section as some arbitrary non-root user, and
chown to 'hermes' fails with 'Operation not permitted'. Script
aborts under `set -e`.
Move the chown/chmod into the root block (before the gosu exec) where
it actually has privilege, and guard with `2>/dev/null || true` so
rootless Podman (where even in-container root lacks host-side chown
rights) doesn't abort either.
Closes#15865
Salvage PR #15883 cherry-picked FocusFlow Dev's commit; release-notes
CI needs the AUTHOR_MAP entry to attribute to the PR author's GitHub
login rather than a placeholder.
New `hermes kanban` CLI subcommand + `/kanban` slash command + skills for
worker and orchestrator profiles. SQLite-backed task board
(~/.hermes/kanban.db) shared across all profiles on the host. Zero
changes to run_agent.py, no new core tools, no tool-schema bloat.
Motivation: delegate_task is a function call — sync fork/join, anonymous
subagent, no resumability, no human-in-the-loop. Kanban is the durable
shape needed for research triage, scheduled ops, digital twins,
engineering pipelines, and fleet work. They coexist (workers may call
delegate_task internally).
What this adds
- hermes_cli/kanban_db.py — schema, CAS claim, dependency resolution,
dispatcher, workspace resolution, worker-context builder.
- hermes_cli/kanban.py — 15-verb CLI surface and shared run_slash()
entry point used by both CLI and gateway.
- skills/devops/kanban-worker — how a profile should work a claimed task.
- skills/devops/kanban-orchestrator — "you are a dispatcher, not a
worker" template with anti-temptation rules.
- /kanban slash command wired into cli.py and gateway/run.py. Bypasses
the running-agent guard (board writes don't touch agent state), so
/kanban unblock can free a stuck worker mid-conversation.
- Design spec at docs/hermes-kanban-v1-spec.pdf — comparative analysis
vs Cline Kanban, Paperclip, NanoClaw, Gemini Enterprise; 8 patterns;
4 user stories; implementation plan; concurrency correctness.
- Docs: website/docs/user-guide/features/kanban.md, CLI reference
updated, sidebar entry added.
Architecture highlights
- Three planes: control (user + gateway), state (board + dispatcher),
execution (pool of profile processes).
- Every worker is a full OS process, spawned as `hermes -p <profile>`.
No in-process subagent swarms — solves NanoClaw's SDK-lifecycle
failure class.
- Atomic claim via SQLite CAS in a BEGIN IMMEDIATE transaction; stale
claims reclaimed 15 min after their TTL expires.
- Tenant namespacing via one nullable column — one specialist fleet
can serve many businesses with data isolation by workspace path.
Tests: 60 targeted tests (schema, CAS atomicity, dependency resolution,
dispatcher, workspace kinds, tenancy, CLI + slash surface). All pass
hermetic via scripts/run_tests.sh.
Follow-up to PR #16053 (/btw as /background alias). Cleans up the
plumbing added exclusively for the old ephemeral /btw handler and
repairs a broken btw bypass that landed between my refactor and this
follow-up.
run_agent.py:
- Remove persist_session kwarg, instance attr, and _persist_session
short-circuit. Only /btw ever passed persist_session=False; with
/btw gone the default (always persist) is the only behavior anyone
ever wanted.
gateway/run.py:
- Remove the unreachable 'if _cmd_def_inner.name == "btw"' block
(PR #16059). Canonical name for a /btw message is 'background' after
alias resolution — the comparison could never be true, and it called
_handle_btw_command which no longer exists. The /background branch
above it already dispatches /btw correctly.
tests/gateway/test_running_agent_session_toggles.py:
- Fix test_btw_dispatches_mid_run to mock _handle_background_command
(the real dispatch target for /btw) instead of the deleted
_handle_btw_command.
/btw spawns a parallel ephemeral side-question task (self-guarded against
concurrent /btw on the same chat) — exactly like /background. But it was
missing from the running-agent bypass list in _handle_message(), so it
fell through to the catch-all and returned:
⏳ Agent is running — /btw can't run mid-turn. Wait for the current
response or /stop first.
That's the opposite of what /btw is for — asking a side question while
the main turn is still working. Add the bypass next to /background and a
regression test covering the mid-turn dispatch path.
Reported by @IuriiTiunov on Telegram.
The ephemeral no-tools side-question variant of /btw confused users who
expected 'by-the-way' to mean 'run this off to the side with tools' —
they'd type /btw and get a toolless agent that couldn't do the work.
/bg worked because it was /background with full tools.
Collapse the two: /btw and /bg both alias to /background. One command,
one behavior, no more gotchas about which variant has tools.
Removed:
- _handle_btw_command in cli.py and gateway/run.py
- _run_btw_task + _active_btw_tasks state in gateway/run.py
- prompt.btw JSON-RPC method + btw.complete event in tui_gateway
- BtwStartResponse type + btw.complete case in ui-tui
- Standalone /btw slash tree registration in Discord
- Standalone btw CommandDef in hermes_cli/commands.py
Updated:
- background CommandDef aliases: (bg,) -> (bg, btw)
- TUI session.ts: local btw handler merged into background
- Docs and tips updated to describe /btw as a /background alias
PR #16046 added /busy and /verbose hints to the classic CLI and the
gateway runner but skipped the Ink TUI (and therefore the dashboard
/chat page, which embeds the TUI via PTY). This extends the same
latch to the TUI with TUI-native wording.
The TUI's busy-input model is not the /busy knob from the CLI —
single Enter while busy auto-queues, double Enter on an empty line
interrupts. The new busy-input hint teaches THAT gesture instead of
telling the user to flip a config that does not apply.
Changes:
- agent/onboarding.py — add busy_input_hint_tui() + tool_progress_hint_tui()
- tui_gateway/server.py — onboarding.claim JSON-RPC (Ink triggers busy
hint on enqueue) + _maybe_emit_onboarding_hint helper hooked into
_on_tool_complete for the 30s/tool_progress=all path. Same
config.yaml latch so each hint fires at most once per install across
CLI, gateway, and TUI combined.
- ui-tui/src/gatewayTypes.ts — OnboardingClaimResponse + onboarding.hint event
- ui-tui/src/app/createGatewayEventHandler.ts — render the hint event as sys()
- ui-tui/src/app/useSubmission.ts — claim busy_input_prompt on first
busy enqueue
- tests/agent/test_onboarding.py — +3 cases for TUI hint shape
- tests/tui_gateway/test_protocol.py — +4 cases for onboarding.claim
- website/docs/user-guide/tui.md — new 'Interrupting and queueing'
section explaining the TUI's double-Enter model and the hints
Validation:
scripts/run_tests.sh tests/agent/test_onboarding.py \
tests/tui_gateway/test_protocol.py \
tests/gateway/test_busy_session_ack.py
-> 66 passed
npm --prefix ui-tui run type-check -> clean
npm --prefix ui-tui run lint -> clean
npm --prefix ui-tui run build -> clean
Manage the fallback_providers chain from the CLI instead of hand-editing
config.yaml. The picker reuses select_provider_and_model() from 'hermes
model' — same provider list, same credential prompts, same model picker.
hermes fallback [list] Show the current chain (primary + fallbacks)
hermes fallback add Run the model picker, append selection to chain
hermes fallback remove Pick an entry to delete (arrow-key menu)
hermes fallback clear Remove all entries (with confirmation)
'add' snapshots config['model'] before calling the picker, extracts the
user's selection from the post-picker state, then restores the primary
and appends {provider, model, base_url?, api_mode?} to fallback_providers.
Auth store's active_provider is snapshot/restored too so OAuth-provider
fallbacks don't silently deactivate the user's primary. Duplicates and
self-as-fallback are rejected. Legacy single-dict 'fallback_model' entries
are auto-migrated to the list format on first write.
Instead of a blocking first-run questionnaire, show a one-time hint the first
time the user hits each behavior fork:
1. First message while the agent is working — appends a hint to the busy-ack
explaining the /busy queue vs /busy interrupt knob, phrased to match the
mode that was just applied (don't tell a queue-mode user to switch to
queue).
2. First tool that runs for >= 30s in the noisiest progress mode
(tool_progress: all) — prints a hint about /verbose to cycle display
modes (all -> new -> off -> verbose). Gated on /verbose actually being
usable on the surface: always shown on CLI; on gateway only shown when
display.tool_progress_command is enabled.
Each hint is latched in config.yaml under onboarding.seen.<flag>, so it
fires exactly once per install across CLI, gateway, and cron, then never
again. Users can wipe the section to re-see hints.
New:
- agent/onboarding.py — is_seen / mark_seen / hint strings, shared by
both CLI and gateway.
- onboarding.seen in DEFAULT_CONFIG (hermes_cli/config.py) and in
load_cli_config defaults (cli.py). No _config_version bump — deep
merge handles new keys.
Wired:
- gateway/run.py: _handle_active_session_busy_message appends the hint
after building the ack. progress_callback tracks tool.completed
duration and queues the tool-progress hint into the progress bubble.
- cli.py: CLI input loop appends the busy-input hint on the first busy
Enter; _on_tool_progress appends the tool-progress hint on the first
>=30s tool completion. In-memory CLI_CONFIG is also updated so
subsequent fires in the same process are suppressed immediately.
All writes go through atomic_yaml_write and are wrapped in try/except
so onboarding can never break the input/busy-ack paths.
The base adapter's auto-TTS path fired on any voice message unless the
chat had explicitly run /voice off — it never read voice.auto_tts from
config.yaml, so users who set auto_tts: false still got audio replies.
Gate the base adapter on a three-layer decision instead:
1. chat in _auto_tts_enabled_chats (explicit /voice on|tts) → fire
2. chat in _auto_tts_disabled_chats (explicit /voice off) → suppress
3. else → voice.auto_tts global default
Runner now pushes voice.auto_tts onto the adapter as _auto_tts_default
and mirrors /voice on|tts chats into _auto_tts_enabled_chats via the
existing _sync_voice_mode_state_to_adapter path. /voice off still wins.
Closes#16007.
Users who run `hermes setup` get `cli-config.yaml.example` copied verbatim
(including comments) to ~/.hermes/config.yaml. But several display settings
had thin comments that didn't enumerate the valid options, so users couldn't
tell from reading their config what values each key accepts.
- busy_input_mode: widen from 'CLI' to 'CLI and gateway platforms';
note /stop as gateway equivalent of Ctrl+C; add /busy_input_mode runtime hint
- compact, interim_assistant_messages, bell_on_complete, show_reasoning,
streaming: add true/false option lines showing effect of each value
- skin: refresh the built-in skin list (was missing daylight, warm-lightmode,
poseidon, sisyphus, charizard — 5 of 9 built-ins undocumented)
When the LLM response carries N parallel tool calls, the agent fires
N tool.started events back-to-back before its interrupt check runs.
A user sending /stop mid-batch would see the '⚡ Interrupting current
task' ack followed by a trail of 🔍 web_search bubbles for the remaining
events in the batch — making the interrupt feel ignored.
progress_callback and the drain loop in send_progress_messages now
check agent.is_interrupted (via agent_holder[0], the existing
cross-scope handle). Events that arrive after interrupt are dropped
at both the queueing and rendering stages. The '⚡ Interrupting'
message is sent through a separate adapter path and is unaffected.
Follow-up on #16020 salvage. Three corrections:
1. Truth signal for /copy
Before: success was 'OSC 52 sequence was emitted to stdout'. That's
false on local Linux inside tmux (emitSequence=false), so /copy kept
printing 'clipboard copy failed' to users whose xclip/wl-copy had
already succeeded fire-and-forget.
Fix: setClipboard() now returns { sequence, success } where success =
native-fired OR tmux-buffer-loaded OR osc52-emitted. copyNative()
returns a boolean telling setClipboard whether a native attempt was
made. /copy only shows 'failed' when literally no path was taken.
2. Dashboard keybinding
Before: Ctrl+C for copy on non-Mac (Ctrl+Shift+C for paste).
That swallows SIGINT when a stale selection is present and breaks
the xterm/gnome-terminal/konsole/Windows-Terminal convention where
Ctrl+C in a terminal emulator is always SIGINT. The real bug was
that clipboard writes lost user-gesture through OSC-52 round-trips,
which the direct writeText already fixes.
Fix: revert copyModifier to Ctrl+Shift+C on non-Mac. Direct
writeText in the keydown handler preserves user gesture. term.write
Escape replaced with term.clearSelection() (works without relying
on TUI input mode).
3. Error toast text
Before: 'see HERMES_TUI_DEBUG_CLIPBOARD' — tells users how to
debug but not how to fix.
Fix: point users at HERMES_TUI_FORCE_OSC52=1 first (the actual
escape hatch), mention the debug var second.
- Dashboard copy: direct Clipboard API on Ctrl+C/Cmd+C (user gesture);
send Escape to TUI to clear selection; Ctrl+Shift+C kept as fallback.
- TUI /copy: copySelection() async; only reports success if OSC52 emitted.
- Add HERMES_TUI_FORCE_OSC52 env var to override native-tool detection.
- Fixes "copied N chars" false-positive when clipboard backend absent.
Changes:
web/src/pages/ChatPage.tsx — direct navigator.clipboard.writeText
ui-tui/packages/hermes-ink/src/ink/ink.tsx — async copySelection
ui-tui/packages/hermes-ink/src/ink/termio/osc.ts — HERMES_TUI_FORCE_OSC52
ui-tui/src/app/slash/commands/core.ts — async /copy with honest feedback
Problem: Ctrl+C in Hermes TUI shows 'copied' but clipboard often empty.
Root causes:
- Native Linux tools (xclip, wl-copy) require DISPLAY/WAYLAND_DISPLAY; in
headless Docker/SSH they fail or hang.
- OSC 52 fallback requires terminal emulator support; when absent, sequence
is dropped silently.
- Dashboard OSC 52 → Clipboard API path fails due to missing user gesture;
errors were silently caught.
- User feedback 'copied selection' was shown unconditionally, regardless of
success.
Solution implemented:
- Short-circuit Linux native clipboard probing when no display server is
present (no DISPLAY and no WAYLAND_DISPLAY). Avoids futile attempts and
timeouts.
- Add HERMES_TUI_DEBUG_CLIPBOARD env var (1/true). When set, TUI logs to
stderr which clipboard path is used, probe results on Linux, and whether
OSC 52 was emitted. Greatly improves diagnosability.
- Improve dashboard clipboard error handling: replace empty catch blocks
with console.warn messages for OSC 52 decode/Write failures and direct
copy/paste errors. Makes browser permission/user-gesture failures visible
in DevTools.
- Add comprehensive clipboard troubleshooting documentation to README and
AGENTS, covering OSC 52 verification, tmux config, Docker/headless
constraints, env vars, dashboard caveats, and fallback strategies.
Technical details:
- in ui-tui/packages/hermes-ink/src/ink/termio/osc.ts:
- Early return on Linux if both DISPLAY and WAYLAND_DISPLAY unset.
- Refactor probe sequence to async with 500ms timeout,
caching result; subsequent copies use cached tool immediately.
- Emit debug logs when HERMES_TUI_DEBUG_CLIPBOARD=1.
- in ink.tsx: log when OSC 52 not emitted (native
or tmux path in use) in debug mode.
- : OSC 52 handler and Ctrl+Shift+C handler now
log warnings to console on Clipboard API rejection with error message.
- Documentation: new 'Clipboard Troubleshooting' section in README; new
'Clipboard environment variables and pitfalls' subsection in AGENTS.md
(Known Pitfalls).
Tests: full ui-tui test suite (292 tests) passes; clipboard and OSC tests
unaffected. No breaking changes.
Files changed:
- ui-tui/packages/hermes-ink/src/ink/termio/osc.ts
- ui-tui/packages/hermes-ink/src/ink/ink.tsx
- web/src/pages/ChatPage.tsx
- README.md
- AGENTS.md
- CHANGELOG.md (new)
OpenRouter and Nous Portal curated picker lists now resolve via a JSON
manifest served by the docs site, falling back to the in-repo snapshot
when unreachable. Lets us update model lists without shipping a release.
Live URL: https://hermes-agent.nousresearch.com/docs/api/model-catalog.json
(source at website/static/api/model-catalog.json; auto-deploys via the
existing deploy-site.yml GitHub Pages pipeline on every merge to main).
Schema (v1) carries id + optional description + free-form metadata at
manifest, provider, and model levels. Pricing and context length stay
live-fetched via existing machinery (/v1/models endpoints, models.dev).
Config (new model_catalog section, default enabled):
model_catalog.url master manifest URL
model_catalog.ttl_hours disk cache TTL (default 24h)
model_catalog.providers.<name>.url optional per-provider override
Fetch pipeline: in-process cache -> disk cache (fresh < TTL) -> HTTP
fetch -> disk-cache-on-failure fallback -> in-repo snapshot as last
resort. Never raises to callers; at worst returns the bundled list.
Changes:
- website/static/api/model-catalog.json initial manifest (35 OR + 31 Nous)
- scripts/build_model_catalog.py regenerator from in-repo lists
- hermes_cli/model_catalog.py fetch + validate + cache module
- hermes_cli/models.py fetch_openrouter_models() +
new get_curated_nous_model_ids()
- hermes_cli/main.py, hermes_cli/auth.py Nous flows use the helper
- hermes_cli/config.py model_catalog defaults
- website/docs/reference/model-catalog.md + sidebars.ts
- tests/hermes_cli/test_model_catalog.py 21 tests (validation, fetch
success/failure, accessors,
disabled, overrides, integration)
Stop pre-stripping the path from the configured MCP server URL before
constructing OAuthClientProvider. The MCP SDK strips the path itself via
OAuthContext.get_authorization_base_url() for authorization-server
discovery, but uses the full server_url through
resource_url_from_server_url() + check_resource_allowed() to validate
against the server's RFC 9728 Protected Resource Metadata.
For servers whose PRM advertises a path-scoped resource (e.g. Notion's
https://mcp.notion.com/mcp), our _parse_base_url() collapsed the URL to
the origin, so check_resource_allowed() saw requested='/' vs
configured='/mcp/' and refused the token. Fixes OAuth against Notion MCP
(and any other path-scoped resource).
Closes#16015.
`_apply_model_switch_result` (the interactive `/model` picker's
confirmation path) printed `ModelInfo.context_window` straight from
models.dev, which reports the vendor-wide value (1.05M for gpt-5.5 on
openai). ChatGPT Codex OAuth caps the same slug at 272K, so the picker
showed 1M while the runtime (compressor, gateway `/model`, typed
`/model <name>`) correctly used 272K — the classic 'sometimes 1M,
sometimes 272K' mismatch on a single model.
Both display paths now go through `resolve_display_context_length()`,
matching the fix that `_handle_model_switch` received earlier.
Also bump the stale last-resort fallback in DEFAULT_CONTEXT_LENGTHS
(`gpt-5.5: 400000 -> 1050000`) to match the real OpenAI API value; the
272K Codex cap is already enforced via the Codex-OAuth branch, so the
fallback now reflects what every non-Codex probe-miss should see.
Tests: adds `test_apply_model_switch_result_context.py` with three
scenarios (Codex cap wins, OpenRouter shows 1.05M, resolver-empty falls
back to ModelInfo). Updates the existing non-Codex fallback test to
assert 1.05M (the correct value).
## Validation
| path | before | after |
|-------------------------------|-----------|-----------|
| picker -> gpt-5.5 on Codex | 1,050,000 | 272,000 |
| picker -> gpt-5.5 on OpenAI | 1,050,000 | 1,050,000 |
| picker -> gpt-5.5 on OpenRouter | 1,050,000 | 1,050,000 |
| typed /model gpt-5.5 on Codex | 272,000 | 272,000 |
#14934 added deepseek-v4-pro / deepseek-v4-flash to the DeepSeek native
provider but the context-window lookup still falls back to the existing
"deepseek" substring entry (128K). DeepSeek V4 ships with a 1M context
window, so any caller relying on get_model_context_length() for
pre-flight token budgeting (compression, context warnings) under-counts
by ~8x.
Add explicit lowercase entries for the four DeepSeek model ids that
ship 1M context:
- deepseek-v4-pro
- deepseek-v4-flash
- deepseek-chat (legacy alias, server-side maps to v4-flash non-thinking)
- deepseek-reasoner (legacy alias, server-side maps to v4-flash thinking)
Longest-key-first substring matching means these explicit entries also
cover the vendor-prefixed forms (deepseek/deepseek-v4-pro on OpenRouter
and Nous Portal) without regressing the existing 128K fallback for
older / unknown DeepSeek model ids on custom endpoints.
Source: https://api-docs.deepseek.com/zh-cn/quick_start/pricing
The background skill-review prompt (spawned after N user turns) now instructs
the reviewer to SURVEY existing skills first, identify the CLASS of task, and
PREFER updating/generalizing an existing skill over creating a new narrow one.
This reduces near-duplicate skill accumulation at the source. Catches the
common failure mode where repeated tasks of the same class each spawn their
own specific skill ("fix-my-tauri-error", "fix-my-electron-error") instead
of a single class-level skill ("desktop-app-build-troubleshooting").
Applied to both _SKILL_REVIEW_PROMPT and the **Skills** half of
_COMBINED_REVIEW_PROMPT. Memory-only review prompt unchanged.
Groundwork for the Curator feature (issue #7816) — the creation-side fix.
Curator handles the retirement/consolidation side in a follow-up PR.
Tests assert the behavioral instructions are present (survey, class, update-
over-create, overlap-flagging, opt-out clause) rather than snapshotting the
full prompt text.
Nous Portal multiplexes multiple upstream providers (DeepSeek, Kimi,
MiMo, Hermes) behind one endpoint. Before this fix, any 429 on any of
those models recorded a cross-session file breaker that blocked EVERY
model on Nous for the cooldown window -- even though the caller's
own RPM/RPH/TPM/TPH buckets were healthy. Users hit a DeepSeek V4 Pro
capacity error, restarted, switched to Kimi 2.6, and still got
'Nous Portal rate limit active -- resets in 46m 53s'.
Nous already emits the full x-ratelimit-* header suite on every
response (captured by rate_limit_tracker into agent._rate_limit_state).
We now gate the breaker on that data: trip it only when either the
429's own headers or the last-known-good state show a bucket with
remaining == 0 AND a reset window >= 60s. Upstream-capacity 429s
(healthy buckets everywhere, but upstream out of capacity) fall
through to normal retry/fallback and the breaker is never written.
Note: the in-memory 'restart TUI/gateway to clear' workaround
circulated in Discord does NOT work -- the breaker is file-backed at
~/.hermes/rate_limits/nous.json. The workaround for users still
affected by a bad state file is to delete it.
Reported in Discord by CrazyDok1 and KYSIV (Apr 2026).
Plugin hooks fired after a tool dispatch now receive an integer
duration_ms kwarg measuring how long the tool's registry.dispatch()
call took (time.monotonic() before/after). Inspired by Claude Code
2.1.119 which added the same field to PostToolUse hook inputs.
Wire points:
- model_tools.py: measure dispatch latency, pass duration_ms to
invoke_hook("post_tool_call", ...) and invoke_hook("transform_tool_result", ...)
- hermes_cli/hooks.py: include duration_ms in the synthetic payload
used by 'hermes hooks test' and 'hermes hooks doctor' so shell-hook
authors see the same shape at development time as runtime
- shell hooks (agent/shell_hooks.py): no code change needed;
_serialize_payload already surfaces non-top-level kwargs under
payload['extra'], so duration_ms lands at extra.duration_ms for
shell-hook scripts
Plugin authors can now build latency dashboards, per-tool SLO alerts,
and regression canaries without having to wrap every tool manually.
Test: tests/test_model_tools.py::test_post_tool_call_receives_non_negative_integer_duration_ms
E2E: real PluginManager + dispatch monkey-patched with a 50ms sleep,
hook callback observes duration_ms=50 (int).
Refs: https://code.claude.com/docs/en/changelog (2.1.119, Apr 23 2026)
Adds a floor below --yolo: a tiny set of commands so catastrophic they
should never run via the agent, regardless of --yolo, gateway /yolo,
approvals.mode=off, or cron approve mode. Opting into yolo is trusting
the agent with your files and services — not trusting it to wipe the
disk or power the box off.
The list is deliberately small (12 patterns), covering only
unrecoverable ops:
- rm -rf targeting /, /home, /etc, /usr, /var, /boot, /bin, /sbin,
/lib, ~, $HOME
- mkfs (any variant)
- dd + redirection to raw block devices (/dev/sd*, /dev/nvme*, etc.)
- fork bomb
- kill -1 / kill -9 -1
- shutdown, reboot, halt, poweroff, init 0/6, telinit 0/6,
systemctl poweroff/reboot/halt/kexec
Recoverable-but-costly commands (git reset --hard, rm -rf /tmp/x,
chmod -R 777, curl | sh) stay in DANGEROUS_PATTERNS where yolo can
still pass them through — that's what yolo is for.
Container backends (docker/singularity/modal/daytona) continue to
bypass both hardline and dangerous checks, since nothing they do can
touch the host.
Inspired by Mercury Agent's permission-hardened blocklist.
Bare `hermes setup` on a returning user now drops straight into the
full reconfigure wizard — every prompt shows the current value as its
default, press Enter to keep or type a new value to change it. The
returning-user menu is gone.
Behavior:
- First-time user: first-time wizard (unchanged)
- Returning user, bare command: full reconfigure wizard (new default)
- Returning user, `--quick`: only prompt for missing/unset items
- Returning user, one section: `hermes setup model|terminal|gateway|tools|agent`
- `--reconfigure`: preserved as backwards-compat alias (no-op since it's now default)
The section functions already used current values as prompt defaults —
this change just removes the extra click to get to them.
The 'Quick Setup - configure missing items only' menu option is now
exposed as the explicit `--quick` flag; it's the narrow case of
filling in missing config (e.g. after a partial OpenClaw migration or
when a required API key got cleared).
Inspired by Mercury Agent's `mercury doctor` UX.
Also removes:
- RETURNING_USER_MENU_SECTION_KEYS (orphaned constant)
- Two returning-user menu tests in test_setup_noninteractive.py
(guarding behavior that no longer exists — covered by
test_setup_reconfigure.py instead)
- New website/docs/guides/azure-foundry.md covering both OpenAI-style
and Anthropic-style endpoints, auto-detection behaviour, gpt-5.x
routing, /v1 stripping, api-version query forwarding, and the
provider: anthropic + Azure URL alternative setup.
- environment-variables.md picks up AZURE_FOUNDRY_API_KEY,
AZURE_FOUNDRY_BASE_URL, AZURE_ANTHROPIC_KEY.
- cli-commands.md includes azure-foundry in the provider choices list.
- configuration.md lists azure-foundry among auxiliary-task providers.
- sidebars.ts wires the new guide into the Guides section.
- scripts/release.py AUTHOR_MAP entries for TechPrototyper,
HangGlidersRule (noreply), and pein892 so the contributor-attribution
CI check does not reject the salvage.
The azure-foundry wizard now probes the endpoint before asking the user
to pick anything by hand:
1. URL path sniff — endpoints ending in /anthropic are Azure Foundry
Claude routes and skip to anthropic_messages.
2. GET <base>/models probe — if the endpoint returns an OpenAI-shaped
model list, we switch to chat_completions and prefill the picker
with the returned deployment/model IDs.
3. Anthropic Messages probe — fallback for endpoints that don't expose
/models but do speak the Anthropic Messages shape.
4. Manual fallback — private endpoints / custom routes still work;
the user picks API mode + types a deployment name.
Context length for the selected model is resolved through the existing
agent.model_metadata.get_model_context_length chain (models.dev,
provider metadata, hardcoded family fallbacks) and stored in
model.context_length when a non-default value is found.
Also refactors runtime_provider so Azure Foundry resolution is reused
between the explicit-credentials path and the default top-level path —
previously the /v1 strip for Anthropic-style Azure only ran when the
caller passed explicit_* args, which meant config-driven sessions
hit a double-/v1 URL.
New module hermes_cli/azure_detect.py with 19 unit tests covering:
- path sniff, model ID extraction, probe fallbacks
- HTTP error handling (URLError, HTTPError)
- context-length lookup passthrough
- DEFAULT_FALLBACK_CONTEXT rejection
New runtime tests cover:
- OpenAI-style Azure Foundry
- Anthropic-style Azure Foundry with /v1 stripping
- Missing base_url / API key raising AuthError
Rationale: Microsoft confirms there's no pure-API-key endpoint to list
Azure deployments (that requires ARM management auth). The v1 Azure
OpenAI endpoint does expose /models with the resource's available
model catalog, which is good enough for picker prefill in the common
case. Users on private/gated endpoints fall through to manual entry.
Azure OpenAI exposes an OpenAI-compatible endpoint at
`{resource}.openai.azure.com/openai/v1` that accepts the standard
`openai` Python client. Two issues prevented gpt-5.x models from working:
1. `_max_tokens_param()` only sent `max_completion_tokens` for
`api.openai.com` URLs. Azure also requires `max_completion_tokens`
for gpt-5.x models.
2. The `codex_responses` upgrade gate unconditionally upgraded gpt-5.x
to Responses API. Azure does NOT support the Responses API — it serves
gpt-5.x on the regular `/chat/completions` path, causing a 404.
Fix: add `_is_azure_openai_url()` that matches `openai.azure.com` URLs.
- `_max_tokens_param()` now returns `max_completion_tokens` for Azure.
- The `codex_responses` upgrade gate skips Azure so gpt-5.x stays on
`chat_completions` where Azure actually serves it.
- The fallback-provider api_mode picker also recognises Azure and stays
on chat_completions.
- Tests cover max_tokens routing, api_mode behaviour, and URL detection.
gpt-4.x models on Azure are unaffected (already used chat_completions +
max_tokens, which Azure accepts for those models).
Salvage of PR #10086 — rewritten against current main where the
codex_responses upgrade gate gained copilot-acp / explicit-api_mode
exclusions.
Azure OpenAI requires an `api-version` query parameter on every request.
When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`),
the OpenAI SDK silently drops it during URL construction, causing 404 errors.
Extract query params from base_url and pass them via `default_query` so the
SDK appends them to every request. This is a generic solution that works for
any custom endpoint requiring query parameters, not just Azure.
No-op for URLs without query params — fully backward compatible.
Add support for Azure Foundry as a new inference provider. Azure Foundry
endpoints can use either OpenAI-style (/v1/chat/completions) or
Anthropic-style (/v1/messages) API formats.
Changes:
- Add azure-foundry to PROVIDER_REGISTRY (auth.py)
- Add azure-foundry overlay in HERMES_OVERLAYS (providers.py)
- Add empty model list for azure-foundry (models.py)
- Add _model_flow_azure_foundry() interactive setup (main.py)
- Add azure-foundry runtime resolution with api_mode support (runtime_provider.py)
- Add AZURE_FOUNDRY_API_KEY and AZURE_FOUNDRY_BASE_URL env vars (config.py)
Usage:
hermes model -> More providers -> Azure Foundry
The setup wizard prompts for:
- Endpoint URL
- API format (OpenAI or Anthropic-style)
- API key
- Model name
Configuration is saved to config.yaml (model.provider, model.base_url,
model.api_mode, model.default) and ~/.hermes/.env (AZURE_FOUNDRY_API_KEY).
Fixes#15779. Custom-provider per-model context_length (`custom_providers[].models.<id>.context_length`) is now honored across every resolution path, not just agent startup. Also adds 256K as the top probe tier and default fallback.
## What changed
New helper `hermes_cli.config.get_custom_provider_context_length()` — single source of truth for the per-model override lookup, with trailing-slash-insensitive base-url matching.
`agent.model_metadata.get_model_context_length()` gains an optional `custom_providers=` kwarg (step 0b — runs after explicit `config_context_length` but before every other probe).
Wired through five call sites that previously either duplicated the lookup or ignored it entirely:
- `run_agent.py` startup — refactored to use the new helper (dedups legacy inline loop, keeps invalid-value warning)
- `AIAgent.switch_model()` — re-reads custom_providers from live config on every /model switch
- `hermes_cli.model_switch.resolve_display_context_length()` — new `custom_providers=` kwarg
- `gateway/run.py` /model confirmation (picker callback + text path)
- `gateway/run.py` `_format_session_info` (/info)
## Context probe tiers
`CONTEXT_PROBE_TIERS = [256_000, 128_000, 64_000, 32_000, 16_000, 8_000]` — was `[128_000, ...]`. `DEFAULT_FALLBACK_CONTEXT` follows tier[0], so unknown models now default to 256K. The stale `128000` literal in the OpenRouter metadata-miss path is replaced with `DEFAULT_FALLBACK_CONTEXT` for consistency.
## Repro (from #15779)
```yaml
custom_providers:
- name: my-custom-endpoint
base_url: https://example.invalid/v1
model: gpt-5.5
models:
gpt-5.5:
context_length: 1050000
```
`/model gpt-5.5 --provider custom:my-custom-endpoint` → previously "Context: 128,000", now "Context: 1,050,000".
## Tests
- `tests/hermes_cli/test_custom_provider_context_length.py` — new file, 19 tests covering the helper, step-0b integration, and the 256K tier invariants
- `tests/hermes_cli/test_model_switch_context_display.py` — added regression tests for #15779 through the display resolver
- `tests/gateway/test_session_info.py` — updated default-fallback assertion (128K → 256K)
- `tests/agent/test_model_metadata.py` — updated tier assertions for the new top tier
task.cancel() can't preempt the run_in_executor thread running
run_conversation(), so we rely on agent.interrupt() to wake the loop.
Without a timeout, a slow/unresponsive interrupt blocks the HTTP
response indefinitely. Wrap the await in wait_for(shield(task), 5.0)
and log a warning on timeout.
Also tidy one extra space in the module docstring's /stop entry.
Add ability to interrupt a running agent via the runs API. Previously
/v1/runs could start a run and subscribe to events, but there was no
way to cancel it. The new endpoint stores agent and task references
during execution, calls agent.interrupt() to stop LLM calls, then
cancels the asyncio task.
Includes 15 tests covering start, events, and stop scenarios.
- resolveEditor() now returns argv (string[]) so EDITOR='code --wait'
and VISUAL='emacsclient -t' tokenize correctly into spawnSync's
separate command + args. Previously the whole string was passed as
argv[0] and would ENOENT.
- Skip the POSIX X_OK PATH walk on Windows; return ['notepad.exe']
there since fs.constants.X_OK is not meaningful and PATHEXT-based
resolution would need its own implementation.
- Surface openEditor() rejections via actions.sys instead of letting
them become unhandled promise rejections in the useInput callback.
- Hotkey docs/comment now say Cmd/Ctrl+G to match isAction()'s
platform-action-modifier behavior (Cmd on macOS, Ctrl elsewhere).
When the user interrupts a long-running task, prompt_toolkit tries to
flush stdout during emergency shutdown. If stdout is in a broken state
(redirected to /dev/null, pipe closed, terminal gone), the flush raises
`OSError: [Errno 5] Input/output error` which propagates unhandled and
crashes the CLI.
Two defense layers:
1. `_suppress_closed_loop_errors`: add `OSError` with `errno.EIO` to
the asyncio exception handler, matching the existing pattern for
`RuntimeError("Event loop is closed")` and `KeyError("is not
registered")`.
2. Outer `except (KeyError, OSError)` block: add `errno.EIO` check
before the existing string-match guards, silently suppressing the
error instead of printing a misleading stdin-related message.
Fixes#13710.
- editor.ts: collapse two private helpers into one flatMap-driven lookup,
keep `isExecutable` as the only named primitive, document the fallback
chain with prompt_toolkit parity
- editor.test.ts: hoist the `exe` helper out of `describe`, drop the
empty afterEach + dead mkdir branch, materialize expected paths before
the resolveEditor call so argument evaluation order doesn't bite
- useComposerState.openEditor: rmSync the mkdtemp dir (was leaking),
early-return on bad exit / empty buffer, run cleanup in finally
- useInputHandlers: cheap `ch.toLowerCase() === 'g'` guard before the
modifier check
- hermes-ink/screen.ts: pick up `npm run fix` import-sort cleanup so
lint passes
Base CLI's editor UX was better because prompt_toolkit picks the system
editor first, then friendly terminal editors before vi. Do not override
that with a vim-first chain.
Keep the CLI on prompt_toolkit's picker and only set tempfile_suffix='.md'
to avoid the complex-tempfile EEXIST path. Update the TUI resolver to
match prompt_toolkit's fallback order: $VISUAL, $EDITOR, editor, nano,
pico, vi, emacs.
Setting buffer.tempfile = 'prompt.md' pushed prompt_toolkit into its
complex-tempfile path, which creates a temp dir and then calls
os.makedirs() on that same path when no subdirectory is present. That
raises EEXIST before the editor can launch.
Keep prompt_toolkit on the simple tempfile path with .md suffix, and
make the editor fallback chain explicit on both surfaces:
$VISUAL -> $EDITOR -> nvim -> vim -> vi -> nano.
The cherry-picked approach serialized the UI-shaped transcript on the Node
side, producing a third JSON format alongside cli.py save_conversation and
tui_gateway session.save. Simpler to call the existing session.save method,
which already writes the canonical agent history (raw OpenAI messages +
model) to an absolute-path file.
- /save still short-circuits before the slash worker
- Empty transcript -> 'no conversation yet'
- No active session -> 'no active session - nothing to save'
- Otherwise: rpc('session.save', {session_id}) and echo back the file path
- Tests updated to assert RPC contract; new test covers the no-sid case
prompt_toolkit's default editor list is: $VISUAL, $EDITOR, /usr/bin/editor,
/usr/bin/nano, /usr/bin/pico, /usr/bin/vi, /usr/bin/emacs — so when
neither env var is set, the base CLI launched nano. The TUI fell back
to a literal 'vi'. Same Ctrl+G keystroke, two different editors.
Pick the same chain on both surfaces:
$VISUAL → $EDITOR → vim → vi → nano
CLI: override input_area.buffer._open_file_in_editor on the TextArea
once at app build time. Local to that buffer; doesn't touch
os.environ or affect other subprocesses.
TUI: extract resolveEditor() into ui-tui/src/lib/editor.ts. PATH walk
with accessSync(X_OK), no shelling out. Six-line unit test verifies
the priority order and the multi-entry PATH walk.
The raw-template lookup added in PR #15817 went through
`get_compatible_custom_providers(read_raw_config())`, which calls
`_normalize_custom_provider_entry` → `urlparse(base_url)`. Any
entry whose `base_url` is itself an env-ref (`${NEURALWATT_API_BASE}`)
was dropped as 'not a valid URL', so `api_key_ref` stayed empty and the
resolved secret was still written to `model.api_key` — the exact case
the original Discord report described.
Replace the normalizer-gated lookup with a direct read of
`raw['custom_providers']` and `raw['providers']`, indexed by name
(case-insensitive, optionally qualified by model) so the loaded
(expanded) entry can be matched regardless of how `base_url` is
written.
Add an integration regression test driving the real
`select_provider_and_model` entry point with the Discord-reported
NeuralWatt config (`${VAR}` in both `base_url` and `api_key`).
This test fails on the PR-only fix and passes with the broadened
lookup.
Base CLI was handing prompt_toolkit's Buffer.open_in_editor() a default
config — Buffer.tempfile_suffix and .tempfile both empty — so it
created /tmp/tmpXXXXXX with no extension. nano/vim/helix all key
syntax highlighting off the file extension, so the buffer rendered
plain.
The TUI already writes to <mkdtemp>/prompt.md and gets full markdown
highlighting + a sensible title bar. Set buffer.tempfile = 'prompt.md'
on the TextArea so prompt_toolkit's complex-tempfile path produces
<mkdtemp>/prompt.md to match. shutil.rmtree cleanup is built-in.
When switching models on a custom endpoint (ollama-launch):
- Same-provider switches no longer re-resolve credentials (fixes base_url
being lost for 'custom' provider on subsequent switches)
- Named providers (ollama-launch) are resolved via user_providers so
switch_model can find their base_url from config
- Models not in the /v1/models probe but present in the user's saved
provider config are accepted with a warning instead of rejected
- CLI /model and TUI /model both pass user_providers/custom_providers
to switch_model so the config model list is available for validation
Closes#15088
Same problem as the TUI: Cursor and VSCode bind Ctrl+G to "Find Next"
at the editor level, so the keystroke never reaches the terminal and
the prompt_toolkit-driven Hermes CLI sees nothing.
Register ('escape', 'g') alongside the existing 'c-g' on the same
handler so the editor handoff works inside Cursor/VSCode too. The
filter (no clarify/approval/sudo/secret prompt active) is unchanged.
VSCode and Cursor bind Ctrl+G to "Find Next" at the editor level, so
the keystroke never reaches the embedded terminal — Ctrl+G to open
\$EDITOR was effectively dead inside those IDEs.
Alt+G is unbound in both editors and reaches the TUI cleanly as
`\x1bg` → `key.meta && ch === 'g'` after parse-keypress. Accept it
alongside the existing isAction(key, ch, 'g') check, and document the
fallback in README + the hotkeys panel.
The Ctrl+G handler was toggling the alt-screen by hand
(`\x1b[?1049l` ... `\x1b[?1049h`) without releasing stdin or kitty
keyboard mode, so the launched editor would lose keystrokes (Ink kept
swallowing them) and editors that don't speak CSI-u (e.g. nano) would
print "Unknown sequence" for every Ctrl-key.
Switch to `withInkSuspended` from @hermes/ink, the same helper
`/setup` already uses. It pauses Ink, removes stdin listeners, drops
raw mode, disables kitty/modifyOtherKeys + mouse + focus reporting,
runs the editor, then restores everything with a full repaint.
Previously _copy_reasoning_content_for_api only padded reasoning_content
when the assistant message had tool_calls. DeepSeek V4 thinking mode
requires the field on every assistant turn, including plain text replies
without tool_calls.
- Remove the 'source_msg.get("tool_calls") and' guard
- Update test: plain assistant turns now get padded for DeepSeek/Kimi
Fixes#15213
- webhooks.md: adds a Video Tutorial section under the intro with a
responsive YouTube iframe (WNYe5mD4fY8).
- configuration.md: adds a Video Tutorial subsection under Auxiliary
Models with a responsive YouTube iframe (NoF-YajElIM).
Both use a 16:9 aspect-ratio wrapper so the embeds scale cleanly on
mobile. Verified with `npm run build` — MDX parses clean, no new
warnings or broken links introduced.
Adds a 'Video Guide' section pointing at the walkthrough of a Hermes agent
abliterating Gemma with OBLITERATUS, so the agent can surface it when the
user wants a visual overview before running the workflow.
- add a written-cell bitmap so selection can distinguish rendered spaces from blank padding
- preserve code indentation without markdown-specific rendering hacks
- clamp selection highlight to real row content so blank drag margins do not render or copy
- keep successful copy actions quiet while preserving usage and failure feedback
- accept forwarded Cmd+C for selection copy in SSH sessions even when Hermes runs on Linux
- keep local Linux Alt+C from acting as copy and update TUI hotkey hints for remote shells
- add reusable overlay key and help-text helpers for picker-style overlays
- make model, session, skills, and pager hints consistently support Esc/q close behavior
- expand short model aliases like sonnet/opus via static catalogs during startup runtime resolution
- keep startup alias resolution network-free and add regression tests in models and tui gateway suites
- run the requested ui-tui lint+format pass and include resulting formatting updates
- guard text-measure cache eviction key in hermes-ink so ui-tui type-check stays green
The Codex Responses API rejects input_text inside assistant messages —
only output_text and refusal are valid content types for assistant role.
_chat_content_to_responses_parts() previously hardcoded all text content
to input_text regardless of the message role. When an assistant message
had list-format content (multimodal or structured), this produced invalid
input_text parts that the API rejected with:
Invalid value: 'input_text'. Supported values are: 'output_text' and 'refusal'.
Fix: add a role parameter to _chat_content_to_responses_parts() that
selects output_text for assistant messages and input_text for user
messages. Thread this through _chat_messages_to_responses_input() and
_preflight_codex_input_items().
Fixes#15687
When a user sends /stop during a streaming API call, the outer poll loop
detects _interrupt_requested and closes the HTTP connection. However, the
inner _call() thread catches the connection error and enters its retry
loop — opening a FRESH connection without checking the interrupt flag.
On slow providers like ollama-cloud, each retry attempt blocks for the
full stream-read timeout (120s+). With 3 retry attempts this caused
510+ second delays between /stop and actual response — the agent appeared
completely unresponsive despite the stop being acknowledged.
Fix: add an _interrupt_requested check at the top of the streaming retry
loop so the agent exits immediately instead of retrying.
Also fix log truncation: all session key logging in gateway/run.py used
[:20] or [:30] slices, which truncated 'agent:main:telegram:dm:5690190437'
(33 chars) to 'agent:main:telegram:' — losing the identifying chat type
and user ID. Replace with full keys to make logs debuggable.
Reported by user Sidharth Pulipaka via Telegram on ollama-cloud provider.
The post-graceful-drain is-active poll used a fixed 10s timeout, but
systemd's hermes-gateway.service has RestartSec=30 — so systemd won't
respawn the unit for 30s after exit-75, and our poll gives up during
the cooldown. Result: every 'hermes update' printed
⚠ hermes-gateway drained but didn't relaunch — forcing restart
followed by a redundant 'systemctl restart' that kicked the newly-
respawning gateway again (and re-started WhatsApp / Discord a second
time in the process).
Fix: read RestartUSec from the unit via 'systemctl show' and set the
poll budget to max(10s, RestartSec + 10s slack). Units without
RestartSec set (or value=infinity) fall back to the original 10s.
Observed timeline from journalctl before fix:
08:56:22.262 old PID exits 75
08:56:32.707 systemd logs Stopped -> Started (10.4s gap, > 10s budget)
After fix the poll covers 40s — comfortably inside RestartSec + slack.
Validation:
- RestartUSec parser tested against '30s', '100ms', '1min 30s',
'infinity', '', 'garbage', '500us', '2min' — all correct.
- Against the live hermes-gateway.service: parses to 30.0s.
- tests/hermes_cli/test_update_gateway_restart.py: 41/41 pass.
Makes hermes -z usable by sweeper without mutating user config.
- Top-level -m/--model and --provider flags that apply to -z/--oneshot
(mirrors hermes chat's plumbing).
- HERMES_INFERENCE_MODEL env var as the parallel to HERMES_INFERENCE_PROVIDER
for CI / scripted invocations.
- resolve_runtime_provider() gets the requested provider; when --model is
given without --provider, detect_provider_for_model() auto-selects the
provider that serves it (same semantic as /model in an interactive session).
- --provider without --model errors out with exit 2 — carrying a config
model across to a different provider is usually wrong, and silently
picking the provider's catalog default hides the mismatch.
Config defaults still used when both flags are omitted (existing behavior).
Validation (all live against OpenRouter):
-z 'x' ....................... uses config default (opus-4.7)
-z 'x' --model haiku-4.5 ..... haiku-4.5 via auto-detected openrouter
-z 'x' --model ... --provider pair as given
HERMES_INFERENCE_MODEL=... -z haiku-4.5 via env var
-z 'x' --provider anthropic .. exits 2 with error to stderr
* feat: add `hermes -z <prompt>` one-shot mode
Top-level flag that runs a single prompt and prints ONLY the final
response text to stdout. No banner, no spinner, no tool previews, no
session_id line — stdout is machine-readable, stderr is silent.
Tools, memory, rules, and AGENTS.md in the CWD are loaded as normal.
Approvals are auto-bypassed (sets HERMES_YOLO_MODE=1 for the call).
Bypasses cli.py entirely — goes straight to AIAgent.chat().
* feat(oneshot): handle interactive-callback gaps explicitly
Document (and where needed, patch) the interactive surfaces that have
no user to answer in oneshot mode:
- clarify — inject a callback that tells the agent to pick the
best default and continue (previously returned a
generic 'not available in this execution context'
error that wastes a tool call)
- sudo password — terminal_tool already gates on HERMES_INTERACTIVE
(we don't set it); sudo fails gracefully
- shell hooks — HERMES_ACCEPT_HOOKS=1 auto-approves; also falls
back to deny on non-tty stdin
- dangerous cmd — HERMES_YOLO_MODE=1 short-circuits before input()
- secret capture— tool returns gracefully when no callback wired
Live-tested: agent asked clarify(['red','blue']) and got 'red' back,
replied with only 'red'.
The AIAgent.flush_memories pre-compression save, the gateway
_flush_memories_for_session, and everything feeding them are
obsolete now that the background memory/skill review handles
persistent memory extraction.
Problems with flush_memories:
- Pre-dates the background review loop. It was the only memory-save
path when introduced; the background review now fires every 10 user
turns on CLI and gateway alike, which is far more frequent than
compression or session reset ever triggered flush.
- Blocking and synchronous. Pre-compression flush ran on the live agent
before compression, blocking the user-visible response.
- Cache-breaking. Flush built a temporary conversation prefix
(system prompt + memory-only tool list) that diverged from the live
conversation's cached prefix, invalidating prompt caching. The
gateway variant spawned a fresh AIAgent with its own clean prompt
for each finalized session — still cache-breaking, just in a
different process.
- Redundant. Background review runs in the live conversation's
session context, gets the same content, writes to the same memory
store, and doesn't break the cache. Everything flush_memories
claimed to preserve is already covered.
What this removes:
- AIAgent.flush_memories() method (~248 LOC in run_agent.py)
- Pre-compression flush call in _compress_context
- flush_memories call sites in cli.py (/new + exit)
- GatewayRunner._flush_memories_for_session + _async_flush_memories
(and the 3 call sites: session expiry watcher, /new, /resume)
- 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks,
hermes tools UI task list, auxiliary_client docstrings
- _memory_flush_min_turns config + init
- #15631's headroom-deduction math in
_check_compression_model_feasibility (headroom was only needed
because flush dragged the full main-agent system prompt along;
the compression summariser sends a single user-role prompt so
new_threshold = aux_context is safe again)
- The dedicated test files and assertions that exercised
flush-specific paths
What this renames (with read-time backcompat on sessions.json):
- SessionEntry.memory_flushed -> SessionEntry.expiry_finalized.
The session-expiry watcher still uses the flag to avoid re-running
finalize/eviction on the same expired session; the new name
reflects what it now actually gates. from_dict() reads
'expiry_finalized' first, falls back to the legacy 'memory_flushed'
key so existing sessions.json files upgrade seamlessly.
Supersedes #15631 and #15638.
Tested: 383 targeted tests pass across run_agent/, agent/, cli/,
and gateway/ session-boundary suites. No behavior regressions —
background memory review continues to handle persistent memory
extraction on both CLI and gateway.
_check_compression_model_feasibility calls get_model_context_length
without provider=, so Codex OAuth users get 1,050,000 (from models.dev
for 'openai') instead of the actual 272,000 limit. This happens because
_infer_provider_from_url maps chatgpt.com → 'openai' (not 'openai-codex'),
skipping the Codex-specific resolution branch entirely.
Result: compression threshold set at 85% of 1.05M = 892K — conversations
never trigger compression, the context grows unbounded, and when gateway
hygiene eventually forces compression, the Codex endpoint drops the
oversized streaming request ('peer closed connection without sending
complete message body').
Fix: forward self.provider to get_model_context_length so provider-
specific resolution branches (Codex OAuth 272K, Copilot live /models,
Nous suffix-match) fire correctly.
Reported by user on GPT 5.5 via Codex OAuth Pro (paste.rs/vsra3).
Follow-up to PR #15658. The feature PR introduced page-scoped slots
(<page>:top / <page>:bottom inside every built-in page) but only
touched the Shell slots catalogue. Adds proper narrative coverage so
plugin authors find the feature.
Changes
- extending-the-dashboard.md:
- Frontmatter description + intro bullet now mention page-scoped slots
- New TOC entry "Augmenting built-in pages (page-scoped slots)"
- New dedicated subsection after "Replacing built-in pages"
explaining the heavy-vs-light tradeoff, listing the pages that
expose slots, and showing a worked manifest + IIFE example with
tab.hidden: true
- Cross-link from the tab.override section pointing readers to the
lighter augmentation option
- web-dashboard.md:
- Bullet mentioning "page-scoped slots (inject widgets into
built-in pages without overriding them)"
Validation
- TOC anchor "#augmenting-built-in-pages-page-scoped-slots" matches
the generated heading slug
- Code fences balanced (64, even)
- Pre-existing docusaurus build errors (skills.json, api-server.md
link) reproduce on bare main -- not introduced here
* fix(terminal): three-layer defense against watch_patterns notification spam
Background processes that stack notify_on_complete=True with watch_patterns
can flood the user with duplicate, delayed notifications — matches deliver
asynchronously via the completion queue and continue arriving minutes after
the process has exited. The docstring warning against this (PR #12113) has
proven insufficient; agents still misuse the combination.
Three layered defenses, each sufficient on its own:
1. Mutual exclusion (terminal_tool.py): When both flags are set on a
background process, drop watch_patterns with a warning. notify_on_complete
wins because 'let me know when it's done' is the more useful signal and
fires exactly once. Extracted as _resolve_notification_flag_conflict() so
the rule is testable in isolation.
2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now
bails the moment session.exited is True. Post-exit chunks (buffered reads
draining after the process is gone) no longer produce notifications. This
is the fix flagged as future work in session 20260418_020302_79881c.
3. Global circuit breaker (process_registry.py): Per-session rate limits don't
catch the sibling-flood case — N concurrent processes can each stay under
8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap
trips a 30-second cooldown across ALL sessions, emits a single
watch_overflow_tripped event, silently counts dropped events, and emits a
watch_overflow_released summary when the cooldown ends.
Also updates the tool schema + docstring to document the new behavior.
Tests: 8 new tests covering all three fixes (suppress-after-exit x2,
mutual-exclusion resolver x4, global breaker trip/cooldown/release x2).
All 60 tests across test_watch_patterns.py, test_notify_on_complete.py,
test_terminal_tool.py pass.
Real-world trigger: self-inflicted in session 20260425_051924 — three
concurrent hermes-sweeper review subprocesses each set watch_patterns=
['failed validation', 'errored'] AND notify_on_complete=True, then iterated
over multiple items, producing enough matches per process to defeat the
per-session cap while staying under the global cap that didn't yet exist.
* fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion
Per Teknium's direction, the watch_patterns rate limit is now much more
aggressive and self-healing.
## New rule — per session
- HARD cap: 1 watch-match notification per 15 seconds per process.
- Any match arriving inside the cooldown window is dropped and counts as
ONE strike for that window (many drops in the same window still = 1 strike).
- After 3 consecutive strike windows, watch_patterns is permanently disabled
for the session and the session is auto-promoted to notify_on_complete
semantics — exactly one notification when the process actually exits.
- A cooldown window that expires with zero drops resets the consecutive
strike counter — healthy cadence is forgiven.
## Schema + docstring rewritten
The tool schema description now gives the model explicit guidance:
- notify_on_complete is 'the right choice for almost every long-running task'
- watch_patterns is for RARE one-shot signals on LONG-LIVED processes
- Do NOT use watch_patterns with loops/batch jobs — error patterns fire every
iteration and will hit the strike limit fast
- Mutual exclusion is stated on both parameter descriptions
- 1/15s cooldown and 3-strike promotion are stated in the watch_patterns
description so the model sees the contract every turn
## Removed
- WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the
new 1/15s limit subsumes both; keeping them would double-count.
- _watch_window_hits / _watch_window_start / _watch_overload_since fields
on ProcessSession. Replaced by _watch_last_emit_at / _watch_cooldown_until
/ _watch_strike_candidate / _watch_consecutive_strikes.
## Kept
- Global circuit breaker across all sessions (15/10s → 30s cooldown) as a
secondary safety net for concurrent siblings. Still valuable when 20
short-lived processes each fire once — none individually violates the
per-session limit.
- Suppress-after-exit guard.
- Mutual exclusion resolver at the tool entry point.
## Tests
- 6 new tests in TestPerSessionRateLimit covering: first match delivers,
second in cooldown suppressed, multi-drop = single strike, 3 strikes
disables + promotes, clean window resets counter, suppressed count
carried to next emit.
- Global circuit breaker tests rewritten to use fresh sessions instead of
hacking removed per-window fields.
- 50/50 watch_patterns + notify_on_complete tests pass.
- 60/60 including test_terminal_tool.py pass.
* feat(dashboard): page-scoped plugin slots for built-in pages
Dashboard plugins can now inject components into specific built-in
pages (Sessions, Analytics, Logs, Cron, Skills, Config, Env, Docs,
Chat) without overriding the whole route.
Previously, plugins could only:
1. Add new tabs (tab.path)
2. Replace whole built-in pages (tab.override)
3. Inject into global shell slots (header-*, footer-*, pre-main, ...)
None of those let a plugin add a banner, card, or widget to an
existing page. The new <page>:top / <page>:bottom slots close that
gap, reusing the existing registerSlot() API.
Changes
- web/src/plugins/slots.ts: 18 new KNOWN_SLOT_NAMES entries
(sessions:top, sessions:bottom, analytics:top, ..., chat:bottom),
grouped under "Shell-wide" vs "Page-scoped" in the docblock
- web/src/pages/*: each built-in page now renders
<PluginSlot name="<page>:top" />
as the first child of its outer wrapper and
<PluginSlot name="<page>:bottom" />
as the last child -- zero visual cost when no plugin registers
- plugins/example-dashboard: registers a demo banner into
sessions:top via registerSlot(), with matching slots entry in
the manifest -- so freshly-setup users can see what page-scoped
slots look like without writing any plugin code
- website/docs: new "Page-scoped slots" table in the plugin
authoring guide, with a worked example
- tests/hermes_cli/test_web_server.py: round-trip test for
colon-bearing slot names (sessions:top, analytics:bottom, ...)
Validation
- npm run build: clean (tsc -b + vite build, 2761 modules)
- scripts/run_tests.sh tests/hermes_cli/test_web_server.py::TestDashboardPluginManifestExtensions: 5/5 pass
* fix(terminal): three-layer defense against watch_patterns notification spam
Background processes that stack notify_on_complete=True with watch_patterns
can flood the user with duplicate, delayed notifications — matches deliver
asynchronously via the completion queue and continue arriving minutes after
the process has exited. The docstring warning against this (PR #12113) has
proven insufficient; agents still misuse the combination.
Three layered defenses, each sufficient on its own:
1. Mutual exclusion (terminal_tool.py): When both flags are set on a
background process, drop watch_patterns with a warning. notify_on_complete
wins because 'let me know when it's done' is the more useful signal and
fires exactly once. Extracted as _resolve_notification_flag_conflict() so
the rule is testable in isolation.
2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now
bails the moment session.exited is True. Post-exit chunks (buffered reads
draining after the process is gone) no longer produce notifications. This
is the fix flagged as future work in session 20260418_020302_79881c.
3. Global circuit breaker (process_registry.py): Per-session rate limits don't
catch the sibling-flood case — N concurrent processes can each stay under
8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap
trips a 30-second cooldown across ALL sessions, emits a single
watch_overflow_tripped event, silently counts dropped events, and emits a
watch_overflow_released summary when the cooldown ends.
Also updates the tool schema + docstring to document the new behavior.
Tests: 8 new tests covering all three fixes (suppress-after-exit x2,
mutual-exclusion resolver x4, global breaker trip/cooldown/release x2).
All 60 tests across test_watch_patterns.py, test_notify_on_complete.py,
test_terminal_tool.py pass.
Real-world trigger: self-inflicted in session 20260425_051924 — three
concurrent hermes-sweeper review subprocesses each set watch_patterns=
['failed validation', 'errored'] AND notify_on_complete=True, then iterated
over multiple items, producing enough matches per process to defeat the
per-session cap while staying under the global cap that didn't yet exist.
* fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion
Per Teknium's direction, the watch_patterns rate limit is now much more
aggressive and self-healing.
## New rule — per session
- HARD cap: 1 watch-match notification per 15 seconds per process.
- Any match arriving inside the cooldown window is dropped and counts as
ONE strike for that window (many drops in the same window still = 1 strike).
- After 3 consecutive strike windows, watch_patterns is permanently disabled
for the session and the session is auto-promoted to notify_on_complete
semantics — exactly one notification when the process actually exits.
- A cooldown window that expires with zero drops resets the consecutive
strike counter — healthy cadence is forgiven.
## Schema + docstring rewritten
The tool schema description now gives the model explicit guidance:
- notify_on_complete is 'the right choice for almost every long-running task'
- watch_patterns is for RARE one-shot signals on LONG-LIVED processes
- Do NOT use watch_patterns with loops/batch jobs — error patterns fire every
iteration and will hit the strike limit fast
- Mutual exclusion is stated on both parameter descriptions
- 1/15s cooldown and 3-strike promotion are stated in the watch_patterns
description so the model sees the contract every turn
## Removed
- WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the
new 1/15s limit subsumes both; keeping them would double-count.
- _watch_window_hits / _watch_window_start / _watch_overload_since fields
on ProcessSession. Replaced by _watch_last_emit_at / _watch_cooldown_until
/ _watch_strike_candidate / _watch_consecutive_strikes.
## Kept
- Global circuit breaker across all sessions (15/10s → 30s cooldown) as a
secondary safety net for concurrent siblings. Still valuable when 20
short-lived processes each fire once — none individually violates the
per-session limit.
- Suppress-after-exit guard.
- Mutual exclusion resolver at the tool entry point.
## Tests
- 6 new tests in TestPerSessionRateLimit covering: first match delivers,
second in cooldown suppressed, multi-drop = single strike, 3 strikes
disables + promotes, clean window resets counter, suppressed count
carried to next emit.
- Global circuit breaker tests rewritten to use fresh sessions instead of
hacking removed per-window fields.
- 50/50 watch_patterns + notify_on_complete tests pass.
- 60/60 including test_terminal_tool.py pass.
The auto-restart path in `hermes update` verifies systemd unit health with
`time.sleep(3)` + a single `systemctl is-active` call. The unit's
Stopped -> Started transition after a graceful SIGUSR1 exit (or a hard
restart) is not always complete inside that 3s window, so the verify
races and reports 'drained but didn't relaunch' even though systemd is
about to bring the unit back up a fraction of a second later. Users
then see a spurious warning, a redundant fallback `systemctl restart`
fires, and adapters (Discord, WhatsApp) get restarted twice.
Replace the three sleep+oneshot sites with a small `_wait_for_service_active()`
closure that polls `is-active` every 0.5s for up to 10s. Behaviour
is unchanged when the unit is healthy or truly dead — only the race
window around a clean restart is now handled correctly.
Tests: tests/hermes_cli/test_update_gateway_restart.py (41/41).
`hermes tools` → "reconfigure existing" listed Spotify twice because
the Apr 24 refactor that moved Spotify into plugins/spotify/ (PR #15174)
left the entry in CONFIGURABLE_TOOLSETS. _get_effective_configurable_toolsets()
unconditionally appended get_plugin_toolsets() on top, so the same
'spotify' key showed up from both sources.
Dedupe by key — built-in CONFIGURABLE_TOOLSETS entry wins (it has the
nicer label and description). Also guards against future bundled plugins
that share a toolset key with a built-in.
Generalize the temperature-specific 400 retry that shipped in PR #15621 so
the same reactive strategy covers any provider that rejects an arbitrary
request parameter — — not just temperature.
- agent/auxiliary_client.py:
* New _is_unsupported_parameter_error(exc, param): matches the same six
phrasings the old temperature detector did plus 'unrecognized parameter'
and 'invalid parameter', against any named param.
* _is_unsupported_temperature_error is now a thin back-compat wrapper so
existing imports and tests keep working.
* The max_tokens → max_completion_tokens retry branch in call_llm and
async_call_llm now (a) gates on 'max_tokens is not None' so we do not
pop a key that was never set and silently substitute a None value on
the retry, and (b) also matches the generic helper in addition to the
legacy 'max_tokens' / 'unsupported_parameter' substring checks — picking
up phrasings like 'Unknown parameter: max_tokens' that previously slipped
through.
- tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering
the generic detector across params, the back-compat wrapper, and the two
hardenings to the max_tokens retry branch (None gate + generic phrasing).
Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR
also proposed the reactive temperature retry which landed independently via
PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages
the remaining hardening ideas onto current main.
When the auxiliary compression model's context is smaller than the main
model's compression threshold, _check_compression_model_feasibility
auto-lowers the session threshold. Previously it set:
new_threshold = aux_context
This let the raw message list grow to exactly aux_context tokens. But
compression and flush_memories actually send system_prompt + tool_schemas
+ messages to the aux model. With 50+ tools that overhead is 25-30K
tokens, so the full request overflowed aux with HTTP 400.
Subtract a headroom estimate from aux_context before setting the new
threshold: the actual tool-schema token count (from
estimate_request_tokens_rough) plus a 12K allowance for the system
prompt (not yet built at __init__ time) and flush-instruction overhead.
Clamp to MINIMUM_CONTEXT_LENGTH so the session still starts even with
an unusually heavy tool schema.
This fixes the 'flush_memories overflow on busy toolsets' path that
Teknium flagged — where main and aux can be nominally the same model
but still 400 because the threshold left no room for the request
overhead. Same fix also protects the normal compression summarisation
request on the same binding aux.
Tests: two new regression tests cover the headroom reservation and the
MINIMUM_CONTEXT_LENGTH floor. Two existing tests updated for the new
(lower) threshold values now that empty-tools still produces a 12K
static headroom deduction.
Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature'
across all providers/models — not just Codex Responses.
The same backend can accept temperature for some models and reject it for
others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI
endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and
Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does
not scale.
call_llm / async_call_llm now detect the concrete 'unsupported parameter:
temperature' 400 and transparently retry once without temperature. Kimi's
server-managed omission and Opus 4.7+'s proactive strip stay in place —
this is the safety net for everything else.
Changes:
- agent/auxiliary_client.py: add _is_unsupported_temperature_error helper;
wire into both sync and async call_llm paths before the existing
max_tokens/payment/auth retry ladder
- tests/agent/test_unsupported_temperature_retry.py: 19 tests covering
detector phrasings, sync + async retry, no-retry-without-temperature,
and non-temperature 400s not triggering the retry
Builds on PR #15620 (codex_responses fallback) which stripped temperature
up front for that one api_mode. This PR closes the gap for every other
provider/model combo via reactive retry.
Credit: retry approach and detector originate from @BlueBirdBack's PR #15578.
Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>
The memory-flush fallback for api_mode='codex_responses' was unconditionally
adding `temperature` to codex_kwargs before calling _run_codex_stream. The
Responses API does not accept temperature on any supported backend:
- chatgpt.com/backend-api/codex rejects it outright
- api.openai.com + gpt-5/o-series reasoning models reject it
- Copilot Responses rejects it on reasoning models
The CodexAuxiliaryClient adapter and the codex_responses transport both
correctly omit temperature — the flush fallback was the only path putting
it back. On errors from the primary aux path (e.g. expired OAuth token),
users saw `⚠ Auxiliary memory flush failed: HTTP 400: Unsupported parameter:
temperature`.
Reported by Garik [NOUS] on GPT-5.5 via Codex OAuth Pro.
Both discord (read/participate) and discord_admin (server admin) are now
configurable via `hermes tools` with default-OFF. Previously the core
discord tool (fetch_messages, search_members, create_thread) auto-loaded
on every Discord install with DISCORD_BOT_TOKEN set — 19 tools the user
never opted into.
Adds a platform-scoping mechanism (_TOOLSET_PLATFORM_RESTRICTIONS) so
the discord toolsets only show up in the Discord platform's checklist,
not on CLI/Telegram/Slack/etc. Applied at four gates:
- _prompt_toolset_checklist: checklist filter
- _get_platform_tools: resolution filter (both branches)
- _save_platform_tools: save-time filter (covers 'Configure all
platforms' and hand-edited config.yaml)
- tools_disable_enable_command: rejects `hermes tools enable discord`
on non-Discord platforms with a clear error
build_session_context_prompt now injects the Discord IDs block only
when both conditions hold: the discord/discord_admin toolset is
enabled AND DISCORD_BOT_TOKEN is set. Toolset alone isn't enough —
the tool's check_fn gates on the token at registry time, so opting
in without a token yields no tools and the IDs block would lie.
Otherwise keep the stale-API disclaimer.
When DISCORD_BOT_TOKEN is set — meaning the discord tool actually
loads — emit a dedicated IDs block in the session context prompt so
the agent can call ``fetch_messages``, ``pin_message``, etc. with
real identifiers instead of probing.
Currently only ``thread_id`` was exposed as a raw ID (via the
``description`` string). The agent in a Discord thread had to guess
that the thread ID doubles as a channel ID for the REST API (it
does), and it had no way to reference the parent channel, the guild,
or the triggering message at all.
The block adapts to context:
- Thread: guild / parent channel / thread / message
- Channel: guild / channel / message
- (DM has no guild/channel IDs worth listing; only message)
Discord isn't in _PII_SAFE_PLATFORMS, so IDs ship unredacted.
The Discord platform note in the session context prompt claimed the
agent has no server-management APIs — pre-dating the discord tool.
With a bot token configured the agent actually has fetch_messages,
search_members, create_thread, and optionally the discord_admin tool;
telling the model otherwise causes it to refuse or apologise for
calls it is fully able to make.
Gate the disclaimer on DISCORD_BOT_TOKEN being unset, matching the
tool's own ``check_fn``. Without a token the note still appears and
remains accurate; with a token the model is no longer gaslit into
refusing valid tool calls.
Discord knows all four identifiers for every inbound message — guild,
channel (or thread), parent channel when in a thread, and the
triggering message. Pass them into ``SessionSource`` via the new
``build_source()`` kwargs so downstream code (context-prompt builder,
delivery, logging) can use them without re-resolving from discord.py
objects.
For auto-threaded messages, remember the original channel as the
parent before swapping ``chat_id`` to the freshly created thread.
Behavioural: still a no-op — nothing consumes these fields yet.
Groundwork for injecting raw platform identifiers into the agent's
system prompt. Currently only `thread_id` is exposed as a raw ID —
callers in a Discord thread had to guess `channel_id == thread_id`
(which happens to work because threads are channels in Discord's REST
API) and had no way to reference the parent channel, guild, or the
triggering message.
Adds three optional fields:
- `guild_id` — Discord guild / Slack workspace / Matrix server scope
- `parent_chat_id` — parent channel when chat_id refers to a thread
- `message_id` — ID of the triggering message (pin/reply/react)
Extends `BasePlatformAdapter.build_source()` to accept + forward them
and teaches `to_dict`/`from_dict` to serialize them. Behaviourally a
no-op: nothing reads the fields yet and they default to None.
The feishu_doc and feishu_drive tools were registered in the tool
registry but never added to the hermes-feishu composite toolset.
The pipeline fix from the prior commit now recovers them automatically
once they are in the composite.
Split the monolithic discord_server tool (14 actions) into two:
- discord: core actions (fetch_messages, search_members, create_thread)
that are useful for the agent's normal operation. Auto-enabled on
the discord platform via the pipeline fix.
- discord_admin: server management actions (list channels/roles, pins,
role assignment) that require explicit opt-in via hermes tools.
Added to CONFIGURABLE_TOOLSETS and _DEFAULT_OFF_TOOLSETS.
The reverse-mapping loop in _get_platform_tools only checked
CONFIGURABLE_TOOLSETS, silently dropping platform-specific toolsets
like discord and feishu_doc whose tools were in the composite but
had no configurable key. Add a second pass over TOOLSETS that picks
up unclaimed toolsets whose tools are present in the resolved
composite.
The tool schema promised 'On update, pass an empty array to clear' but the
update branch ignored the context_from kwarg entirely — users could set
the field at create time and never modify or clear it afterward.
- tools/cronjob_tools.py: handle context_from in the update branch the
same way script/enabled_toolsets/workdir are handled: normalize str/list
to refs, validate each referenced job exists (same check the create
branch does), store as list-or-None to match create_job()'s shape.
Empty string or empty list clears the field.
- tests/cron/test_cron_context_from.py: 6 new tests covering add/change/
clear (both shapes)/bad-ref/preserve-across-unrelated-update.
Root installs on Linux now put the code at /usr/local/lib/hermes-agent and
the hermes command at /usr/local/bin/hermes. HERMES_HOME (~/.hermes) stays
state-only. Matches Claude Code / Codex CLI / OpenClaw, keeps Docker
bind-mounted /root/ volumes lean, and puts the command on every shell's
default PATH without touching shell RC files.
- Non-root users and macOS root: unchanged
- Existing root installs at $HERMES_HOME/hermes-agent: preserved in-place
(detected via .git dir) — no auto-migration, no breakage
- Explicit --dir / $HERMES_INSTALL_DIR: always wins, never overridden
- Termux: unchanged (package manager manages /data/data/...)
Requested by @souly9999 (Discord). Our own Dockerfile already uses this
split (code at /opt/hermes, data at /opt/data volume); the user-install
path now matches.
YAML parses bare numeric toolset names (e.g. 12306:) as int, causing
TypeError in sorted() since the read path normalizes to str but the
save path did not.
The no_mcp sentinel was preserved in existing entries even when the
user re-enabled MCP servers, causing MCP to stay silently disabled.
update_model() recalculated threshold_tokens but left tail_token_budget
and max_summary_tokens at their __init__ values. When switching from a
200K model to 32K, the tail budget stayed at ~20K tokens (62% of 32K)
instead of the intended ~10%.
Adds budget recalculation in update_model() and 2 regression tests.
The web-dashboard.md and dashboard-plugins.md pages had overlapping,
partial coverage of the theme and plugin systems. Themes were split
across two pages; the plugin docs had a minimal manifest reference but
no step-by-step guide, no slot catalog, and no theme+plugin demo.
New: user-guide/features/extending-the-dashboard.md — single navigable
reference for all three extension layers (themes, UI plugins, backend
plugins). Includes:
- Theme quick-start + full schema (palette, typography, layout, layout
variants, assets, componentStyles, colorOverrides, customCSS)
- Plugin quick-start + full schema (manifest, SDK, slots, tab.override,
tab.hidden, backend routes, custom CSS)
- 10-slot shell catalog with locations
- Plugin discovery + load lifecycle
- Combined theme+plugin walkthrough (Strike Freedom cockpit demo)
- API reference + troubleshooting
web-dashboard.md: trimmed to core tool docs (pages, REST API, CORS,
development). Theme/plugin content now points to the new page with a
built-in themes summary table.
dashboard-plugins.md: deleted (merged into extending-the-dashboard.md).
sidebars.ts: swap 'dashboard-plugins' → 'extending-the-dashboard' under
the Management group.
No user-facing behavior change; docs-only.
Subagents run inside a ThreadPoolExecutor. The CLI's interactive approval
callback lives in tools/terminal_tool.py's threading.local(), which worker
threads do not inherit. When a subagent hits a dangerous-command guard,
prompt_dangerous_approval() falls back to input() from the worker thread,
deadlocking against the parent's prompt_toolkit TUI that owns stdin.
Fix: install a non-interactive callback into every subagent worker thread
via ThreadPoolExecutor(initializer=set_approval_callback, initargs=(cb,)).
The callback is config-gated by delegation.subagent_auto_approve:
false (default) -> _subagent_auto_deny (safe; matches leaf tool blocklist)
true -> _subagent_auto_approve (opt-in YOLO for cron/batch)
Both emit a logger.warning audit line. Gateway sessions are unaffected
because they resolve approvals via tools/approval.py's per-session queue,
not through these TLS callbacks. Diagnosis credit: @MorAlekss (#14685).
- hermes_cli/config.py: DEFAULT_CONFIG.delegation.subagent_auto_approve: False
- cli-config.yaml.example: documented, commented (default)
- tools/delegate_tool.py: _subagent_auto_deny, _subagent_auto_approve,
_get_subagent_approval_callback, wired into the child timeout executor
- tests/tools/test_delegate.py: 7 tests covering defaults, truthy coercion,
and TLS scoping in the worker thread
On Windows WSL2, ConPTY implicitly enables mouse event injection when
the alternate screen buffer (DEC 1049) is entered, causing raw escape
sequences to appear in the transcript as ghost characters.
Fix (two parts):
1. ConPTY fix: send DISABLE_MOUSE_TRACKING immediately after entering
alt screen when mouse tracking is off (AlternateScreen.tsx)
2. Runtime toggle: add /mouse [on|off|toggle] slash command with config
persistence (display.tui_mouse) so users can manage this at runtime
The env var HERMES_TUI_DISABLE_MOUSE continues to work as the initial
default, but can now be overridden via /mouse and persisted to config.
Closes: upstream ConPTY mouse injection issue
Credits: OutThisLife / PR #13716 for the toggle concept
Two adjustments to make CI pass:
- In gateway/platforms/matrix.py: `DeviceID` is `NewType("DeviceID", str)`,
so passing `client.device_id` directly (already a str) works identically
at runtime. The explicit import was cosmetic and tripped CI environments
where `mautrix.types` doesn't re-export DeviceID at the expected path
("cannot import name 'DeviceID' from 'mautrix.types' (unknown location)").
- In tests/gateway/test_matrix.py: add `put_device_id` to the hand-written
`PgCryptoStore` fake so the three encryption-path tests
(test_connect_with_access_token_and_encryption,
test_connect_uses_configured_device_id_over_whoami,
test_connect_registers_encrypted_event_handler_when_encryption_on) can
exercise the new crypto-store binding without AttributeError.
PgCryptoStore.__init__ defaults _device_id to "" and put_account writes
that blank value into crypto_account. The UPSERT's ON CONFLICT DO UPDATE
clause deliberately does not touch device_id, so once the row is written
blank it stays blank forever — breaking every downstream device-scoped
olm operation. Peers' to-device olm ciphertext can't match our identity
key, no megolm sessions ever land, and the user sees "hermes is in the
room but never responds to encrypted messages".
Fix: call put_device_id(client.device_id) immediately after
crypto_store.open() and before olm.load(). This sets the store's
in-memory _device_id so the first put_account INSERT writes the correct
value from the start.
Observable symptoms without the fix, on a fresh crypto.db:
- crypto_account.device_id = ""
- crypto_tracked_user: 0 rows
- crypto_device: 0 rows
- crypto_olm_session: 0 rows
- crypto_megolm_inbound_session: 0 rows
- "No one-time keys nor device keys got when trying to share keys"
warning on every startup
- "olm event doesn't contain ciphertext for this device" DecryptionError
on any inbound to-device event
- Encrypted room messages arrive but never decrypt
After the fix (wiped crypto.db + restart):
- device_id populated with actual runtime device (e.g. CZIKTRFLOV)
- all counts populate from sync as expected
- encrypted DMs flow normally
Who hits this: anyone with a fresh crypto.db — includes first-time matrix
E2EE setup, nio→mautrix migrations (since matrix.py removes the legacy
pickle on startup, creating a fresh SQLite store), and anyone who wipes
crypto.db to start over. Existing installs that somehow already have a
non-blank device_id would be unaffected, but no prior code path writes
it correctly, so that set is likely empty.
* fix(nix): use --rebuild in fix-lockfiles to bypass cached FOD store paths
fix-lockfiles checked npm lockfile hashes by running
`nix build .#<attr>.npmDeps`, but fetchNpmDeps is a fixed-output
derivation — if the old store path exists locally, Nix returns it from
cache without re-fetching. This caused the script to report "ok" even
when hashes were stale, while CI (with no cache) failed with a hash
mismatch.
Adding --rebuild forces Nix to re-derive and verify the output hash
against the declared one, catching staleness regardless of local cache
state. Also updates the tui and web npm deps hashes that were stale.
* fix(nix): regenerate ui-tui lockfile to add missing @emnapi entries
npm ci was failing because @emnapi/core and @emnapi/runtime were
missing from ui-tui/package-lock.json despite being required as peer
deps by @napi-rs/wasm-runtime (via @rolldown/binding-wasm32-wasi).
Running npm install --package-lock-only adds the missing entries.
The npmDepsHash reverts to its previous value since fetchNpmDeps was
already fetching these packages as transitive dependencies.
/model gpt-5.5 on openai-codex showed 'Context: 1,050,000 tokens' because
the display block used ModelInfo.context_window directly from models.dev.
Codex OAuth actually enforces 272K for the same slug, and the agent's
compressor already runs at 272K via get_model_context_length() — so the
banner + real context budget said 272K while /model lied with 1M.
Route the display context through a new resolve_display_context_length()
helper that always prefers agent.model_metadata.get_model_context_length
(which knows about Codex OAuth, Copilot, Nous caps) and only falls back
to models.dev when that returns nothing.
Fix applied to all 3 /model display sites:
cli.py _handle_model_switch
gateway/run.py picker on_model_selected callback
gateway/run.py text-fallback confirmation
Reported by @emilstridell (Telegram, April 2026).
Closes#13626.
Two follow-ups on top of the _hermes_home helper from @jerome-benoit's #12729:
1. Declare a [google] optional extra in pyproject.toml
(google-api-python-client, google-auth-oauthlib, google-auth-httplib2) and
include it in [all]. Packagers (Nix flake, Homebrew) now ship the deps by
default, so `setup.py --check` does not need to shell out to pip at
runtime — the imports succeed and install_deps() is never reached.
This fixes the Nix breakage where pip/ensurepip are stripped.
2. Add `from __future__ import annotations` to setup.py so the PEP 604
`str | None` annotation parses on Python 3.9 (macOS system python).
Previously system python3 SyntaxError'd before any code ran.
install_deps() error message now also points users at the extra instead of
just the raw pip command.
The three google-workspace scripts (setup.py, google_api.py, gws_bridge.py)
each had their own way of resolving HERMES_HOME:
- setup.py imported hermes_constants (crashes outside Hermes process)
- google_api.py used os.getenv inline (no strip, no empty handling)
- gws_bridge.py defined its own local get_hermes_home() (duplicate)
Extract the common logic into _hermes_home.py which:
- Delegates to hermes_constants when available (profile support, etc.)
- Falls back to os.getenv with .strip() + empty-as-unset handling
- Provides display_hermes_home() with ~/ shortening for profiles
All three scripts now import from _hermes_home instead of duplicating.
7 regression tests cover the fallback path: env var override, default
~/.hermes, empty env var, display shortening, profile paths, and
custom non-home paths.
Closes#12722
Extracts _needs_kimi_tool_reasoning() for symmetry with the existing
_needs_deepseek_tool_reasoning() helper, so _copy_reasoning_content_for_api
uses the same detection logic as _build_assistant_message. Future changes
to either provider's signals now only touch one function.
Adds tests/run_agent/test_deepseek_reasoning_content_echo.py covering:
- All 3 DeepSeek detection signals (provider, model, host)
- Poisoned history replay (empty string fallback)
- Plain assistant turns NOT padded
- Explicit reasoning_content preserved
- Reasoning field promoted to reasoning_content
- Existing Kimi/Moonshot detection intact
- Non-thinking providers left alone
21 tests, all pass.
DeepSeek V4 thinking mode requires reasoning_content on every
assistant message that includes tool_calls. When this field is
missing from persisted history, replaying the session causes
HTTP 400: 'The reasoning_content in the thinking mode must be
passed back to the API.'
Two-part fix (refs #15250):
1. _copy_reasoning_content_for_api: Merge the Kimi-only and
DeepSeek detection into a single needs_tool_reasoning_echo
check. This handles already-poisoned persisted sessions by
injecting an empty reasoning_content on replay.
2. _build_assistant_message: Store reasoning_content='' on new
DeepSeek tool-call messages at creation time, preventing
future session poisoning at the source.
Additional fix:
3. _handle_max_iterations: Add missing call to
_copy_reasoning_content_for_api in the max-iterations flush
path (previously only main loop and flush_memories had it).
Detection covers:
- provider == 'deepseek'
- model name containing 'deepseek' (case-insensitive)
- base URL matching api.deepseek.com (for custom provider)
``run_conversation`` was calling ``memory_manager.sync_all(
original_user_message, final_response)`` at the end of every turn
where both args were present. That gate didn't consider the
``interrupted`` local flag, so an external memory backend received
partial assistant output, aborted tool chains, or mid-stream resets as
durable conversational truth. Downstream recall then treated the
not-yet-real state as if the user had seen it complete, poisoning the
trust boundary between "what the user took away from the turn" and
"what Hermes was in the middle of producing when the interrupt hit".
Extracted the inline sync block into a new private method
``AIAgent._sync_external_memory_for_turn(original_user_message,
final_response, interrupted)`` so the interrupt guard is a single
visible check at the top of the method instead of hidden in a
boolean-and at the call site. That also gives tests a clean seam to
assert on — the pre-fix layout buried the logic inside the 3,000-line
``run_conversation`` function where no focused test could reach it.
The new method encodes three independent skip conditions:
1. ``interrupted`` → skip entirely (the #15218 fix). Applies even
when ``final_response`` and ``original_user_message`` happen to
be populated — an interrupt may have landed between a streamed
reply and the next tool call, so the strings on disk are not
actually the turn the user took away.
2. No memory manager / no final_response / no user message →
preserve existing skip behaviour (nothing new for providerless
sessions, system-initiated refreshes, tool-only turns that never
resolved, etc.).
3. Sync_all / queue_prefetch_all exceptions → swallow. External
memory providers are strictly best-effort; a misconfigured or
offline backend must never block the user from seeing their
response.
The prefetch side-effect is gated on the same interrupt flag: the
user's next message is almost certainly a retry of the same intent,
and a prefetch keyed on the interrupted turn would fire against stale
context.
### Tests (16 new, all passing on py3.11 venv)
``tests/run_agent/test_memory_sync_interrupted.py`` exercises the
helper directly on a bare ``AIAgent`` (``__new__`` pattern that the
interrupt-propagation tests already use). Coverage:
- Interrupted turn with full-looking response → no sync (the fix)
- Interrupted turn with long assistant output → no sync (the interrupt
could have landed mid-stream; strings-on-disk lie)
- Normal completed turn → sync_all + queue_prefetch_all both called
with the right args (regression guard for the positive path)
- No final_response / no user_message / no memory manager → existing
pre-fix skip paths still apply
- sync_all raises → exception swallowed, prefetch still attempted
- queue_prefetch_all raises → exception swallowed after sync succeeded
- 8-case parametrised matrix across (interrupted × final_response ×
original_user_message) asserts sync fires iff interrupted=False AND
both strings are non-empty
Closes#15218
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_load_auth_store() caught all parse/read exceptions and silently
returned an empty store, making corruption look like a logout with
no diagnostic information and no way to recover the original file.
Now copies the corrupt file to auth.json.corrupt before resetting,
and logs a warning with the exception and backup path.
Extends PR #15171 to also cover the server-side cancellation path (aiohttp
shutdown, request-level timeout) — previously only ConnectionResetError
triggered the incomplete-snapshot write, so cancellations left the store
stuck at the in_progress snapshot written on response.created.
Factors the incomplete-snapshot build into a _persist_incomplete_if_needed()
helper called from both the ConnectionResetError and CancelledError
branches; the CancelledError handler re-raises so cooperative cancellation
semantics are preserved.
Adds two regression tests that drive _write_sse_responses directly (the
TestClient disconnect path races the server handler, which makes the
end-to-end assertion flaky).
_submit_anthropic_pkce() retrieved sess under _oauth_sessions_lock but
wrote back to sess["status"] and sess["error_message"] outside the lock.
A concurrent session GC or cancel could race with these writes, producing
inconsistent session state.
Wrap all 4 sess write sites in _oauth_sessions_lock:
- network exception path (Token exchange failed)
- missing access_token path
- credential save failure path
- success path (approved)
Before this, typing during /compress was accepted by the classic CLI
prompt and landed in the next prompt after compression finished,
effectively consuming a keystroke for a prompt that was about to be
replaced. Wrapping the body in self._busy_command('Compressing
context...') blocks input rendering for the duration, matching the
pattern /skills install and other slow commands already use.
Salvages the useful part of #10303 (@iRonin). The `_compressing` flag
added to run_agent.py in the original PR was dead code (set in 3 spots,
read nowhere — not by cli.py, not by run_agent.py, not by the Ink TUI
which doesn't use _busy_command at all) and was dropped.
When display.busy_input_mode is 'queue', the runner-level PRIORITY block
in _handle_message was still calling running_agent.interrupt() for every
text follow-up to an active session. The adapter-level busy handler
already honors queue mode (commit 9d147f7fd), but this runner-level path
was an unconditional interrupt regardless of config.
Adds a queue-mode branch that queues the follow-up via
_queue_or_replace_pending_event() and returns without interrupting.
Salvages the useful part of #12070 (@knockyai). The config fan-out to
per-platform extra was redundant — runner already loads busy_input_mode
directly via _load_busy_input_mode().
Follow-up to @iRonin's Ctrl+D EOF fix. If the input text is empty but
the user has pending attached images, do nothing rather than exiting —
otherwise a stray Ctrl+D silently discards the attachments.
skill_view response went to the model verbatim; duplicating the SKILL.md
body as raw_content on every tool call added token cost with no agent-facing
benefit. Remove the field and update tests to assert on content only.
The slash/preload caller (agent/skill_commands.py) already falls back to
content when raw_content is absent, and it calls skill_view(preprocess=False)
anyway, so content is already unrendered on that path.
- update faq answer with new `backup` command in release 0.9.0
- move profile export section together with backup section so related information can be read more easily
- add table comparison between `profile export` and `backup` to assist users if understanding the nuances between both
Extends _repair_tool_call_arguments() to cover the most common local-model
JSON corruption pattern: llama.cpp/Ollama backends emit literal tabs and
newlines inside JSON string values (memory save summaries, file contents,
etc.). Previously fell through to '{}' replacement, losing the call.
Adds two repair passes:
- Pass 0: json.loads(strict=False) + re-serialise to canonical wire form
- Pass 4: escape 0x00-0x1F control chars inside string values, then retry
Ports the core utility from #12068 / PR #12093 without the larger plumbing
change (that PR also replaced json.loads at 8 call sites; current main's
_repair_tool_call_arguments is already the single chokepoint, so the
upgrade happens transparently for every existing caller).
Credit: @truenorth-lj for the original utility design.
4 new regression tests covering literal newlines, tabs, re-serialisation
to strict=True-valid output, and the trailing-comma + control-char
combination case.
When the streaming path (chat completions) assembled tool call deltas and
detected malformed JSON arguments, it set has_truncated_tool_args=True but
passed the broken args through unchanged. This triggered the truncation
handler which returned a partial result and killed the session (/new required).
_many_ malformations are repairable: trailing commas, unclosed brackets,
Python None, empty strings. _repair_tool_call_arguments() already existed
for the pre-API-request path but wasn't called during streaming assembly.
Now when JSON parsing fails during streaming assembly, we attempt repair
via _repair_tool_call_arguments() before flagging as truncated. If repair
succeeds (returns valid JSON), the tool call proceeds normally. Only truly
unrepairable args fall through to the truncation handler.
This prevents the most common session-killing failure mode for models like
GLM-5.1 that produce trailing commas or unclosed brackets.
Tests: 12 new streaming assembly repair tests, all 29 existing repair
tests still passing.
The web UI schema advertised 'block' as a busy_input_mode choice, but
no implementation ever existed — the gateway and CLI both silently
collapsed 'block' (and anything other than 'queue') to 'interrupt'.
Users who picked 'block' in the dashboard got interrupts anyway.
Drop 'block' from the select options. The two supported modes are
'interrupt' (default) and 'queue'.
When a session is split by context compression mid-tool-call, an assistant
message may end up with truncated/invalid JSON in tool_calls[*].function.arguments.
On the next turn this is replayed verbatim and providers reject the entire request
with HTTP 400 invalid_tool_call_format, bricking the conversation in a loop that
cannot recover without manual session quarantine.
This patch adds a defensive sanitizer that runs immediately before
client.chat.completions.create() in AIAgent.run_conversation():
- Validates each assistant tool_calls[*].function.arguments via json.loads
- Replaces invalid/empty arguments with '{}'
- Injects a synthetic tool response (or prepends a marker to the existing one)
so downstream messages keep valid tool_call_id pairing
- Logs each repair with session_id / message_index / preview for observability
Defense in depth: corruption can originate from compression splits, manual edits,
or plugin bugs. Sanitizing at the send chokepoint catches all sources.
Adds 7 unit tests covering: truncated JSON, empty string, None, non-string args,
existing matching tool response (no duplicate injection), non-assistant messages
ignored, multiple repairs.
Fixes#15236
gpt-5.x on the Codex Responses API sometimes degenerates and emits
Harmony-style `to=functions.<name> {json}` serialization as plain
assistant-message text instead of a structured `function_call` item.
The intent never makes it into `response.output` as a function_call,
so `tool_calls` is empty and `_normalize_codex_response()` returns
the leaked text as the final content. Downstream (e.g. delegate_task),
this surfaces as a confident-looking summary with `tool_trace: []`
because no tools actually ran — the Taiwan-embassy-email bug report.
Detect the pattern, scrub the content, and return finish_reason=
'incomplete' so the existing Codex-incomplete continuation path
(run_agent.py:11331, 3 retries) gets a chance to re-elicit a proper
function_call item. Encrypted reasoning items are preserved so the
model keeps its chain-of-thought on the retry.
Regression tests: leaked text triggers incomplete, real tool calls
alongside leak-looking text are preserved, clean responses pass
through unchanged.
Reported on Discord (gpt-5.4 / openai-codex).
Covers the two bugs salvaged from PR #15161:
- test_batch_runner_checkpoint: TestFinalCheckpointNoDuplicates asserts
the final aggregated completed_prompts list has no duplicate indices,
and keeps a sanity anchor test documenting the pre-fix pattern so a
future refactor that re-introduces it is caught immediately.
- test_model_tools: TestCoerceNumberInfNan asserts _coerce_number
returns the original string for inf/-inf/nan/Infinity inputs and that
the result round-trips through strict (allow_nan=False) json.dumps.
batch_runner: completed_prompts_set is already fully populated by the
time the aggregation loop runs (incremental updates happen at result
collection time), so the subsequent extend() call re-added every
completed prompt index a second time. Removed the redundant variable
and extend, and write sorted(completed_prompts_set) directly to the
final checkpoint instead.
model_tools: _coerce_number returned Python float('inf')/float('nan')
for inf/nan strings rather than the original string. json.dumps raises
ValueError for these values, so any tool call where the model emitted
"inf" or "nan" for a numeric parameter would crash at serialization.
Changed the guard to return the original string, matching the
function's documented "returns original string on failure" contract.
Replaces gpt-5.4 / gpt-5.4-pro entries in the OpenRouter fallback snapshot
and the Nous Portal curated list. Other aggregators (Vercel AI Gateway)
and provider-native lists are unchanged.
Inline diff segments were anchored relative to assistant narration, but the
turn details pane still rendered after streamSegments. On completion that put
the diff before the tool telemetry that produced it. When a turn has anchored
diff segments, commit the accumulated thinking/tool trail as a pre-diff trail
message, then render the diff and final summary.
TUI auto-resolves `display.personality` at session init, unlike the base CLI.
If config contains `agent.personalities: null`, `_resolve_personality_prompt`
called `.get()` on None and failed before model/provider selection.
Normalize null personalities to `{}` and surface a targeted config warning.
Tolerating null top-level keys silently drops user settings (e.g.
`agent.system_prompt` next to a bare `agent:` line is gone). Probe at
session create, log via `logger.warning`, and surface in the boot info
under `config_warning` — rendered in the TUI feed alongside the existing
`credential_warning` banner.
YAML parses bare keys like `agent:` or `display:` as None. `dict.get(key, {})`
returns that None instead of the default (defaults only fire on missing keys),
so every `cfg.get("agent", {}).get(...)` chain in tui_gateway/server.py
crashed agent init with `'NoneType' object has no attribute 'get'`.
Guard all 21 sites with `(cfg.get(X) or {})`. Regression test covers the
null-section init path reported on Twitter against the new TUI.
Recovers the manual click on the details accordion: with #14968's new
SECTION_DEFAULTS (thinking/tools start `expanded`), every panel render
was OR-ing the local open toggle against `visible.X === 'expanded'`.
That pinned `open=true` for the default-expanded sections, so clicking
the chevron flipped the local state but the panel never collapsed.
Local toggle is now the sole source of truth at render time; the
useState init still seeds from the resolved visibility (so first paint
is correct) and the existing useEffect still re-syncs when the user
mutates visibility at runtime via `/details`.
Same OR-lock cleared inside SubagentAccordion (`showChildren ||
openX`) — pre-existing but the same shape, so expand-all on the
spawn tree no longer makes inner sections un-collapsible either.
Follow-up to the canonical-identity session-key fix: pull the
JID/LID normalize/expand/canonical helpers into gateway/whatsapp_identity.py
instead of living in two places. gateway/session.py (session-key build) and
gateway/run.py (authorisation allowlist) now both import from the shared
module, so the two resolution paths can't drift apart.
Also switches the auth path from module-level _hermes_home (cached at
import time) to dynamic get_hermes_home() lookup, which matches the
session-key path and correctly reflects HERMES_HOME env overrides. The
lone test that monkeypatched gateway.run._hermes_home for the WhatsApp
auth path is updated to set HERMES_HOME env var instead; all other
tests that monkeypatch _hermes_home for unrelated paths (update,
restart drain, shutdown marker, etc.) still work — the module-level
_hermes_home is untouched.
Hermes' WhatsApp bridge routinely surfaces the same person under either
a phone-format JID (60123456789@s.whatsapp.net) or a LID (…@lid),
and may flip between the two for a single human within the same
conversation. Before this change, build_session_key used the raw
identifier verbatim, so the bridge reshuffling an alias form produced
two distinct session keys for the same person — in two places:
1. DM chat_id — a user's DM sessions split in half, transcripts and
per-sender state diverge.
2. Group participant_id (with group_sessions_per_user enabled) — a
member's per-user session inside a group splits in half for the
same reason.
Add a canonicalizer that walks the bridge's lid-mapping-*.json files
and picks the shortest/numeric-preferred alias as the stable identity.
build_session_key now routes both the DM chat_id and the group
participant_id through this helper when the platform is WhatsApp.
All other platforms and chat types are untouched.
Expose canonical_whatsapp_identifier and normalize_whatsapp_identifier
as public helpers. Plugins that need per-sender behaviour (role-based
routing, per-contact authorization, policy gating) need the same
identity resolution Hermes uses internally; without a public helper,
each plugin would have to re-implement the walker against the bridge's
internal on-disk format. Keeping this alongside build_session_key
makes it authoritative and one refactor away if the bridge ever
changes shape.
_expand_whatsapp_aliases stays private — it's an implementation detail
of how the mapping files are walked, not a contract callers should
depend on.
Exposes hermes --tui over a PTY-backed WebSocket so the dashboard can
embed the real TUI rather than reimplement its surface. The browser
attaches xterm.js to the socket; keystrokes flow in, PTY output bytes
flow out.
Architecture:
browser <Terminal> (xterm.js)
│ onData ───► ws.send(keystrokes)
│ onResize ► ws.send('\x1b[RESIZE:cols;rows]')
│ write ◄── ws.onmessage (PTY bytes)
▼
FastAPI /api/pty (token-gated, loopback-only)
▼
PtyBridge (ptyprocess) ── spawns node ui-tui/dist/entry.js ──► tui_gateway + AIAgent
Components
----------
hermes_cli/pty_bridge.py
Thin wrapper around ptyprocess.PtyProcess: byte-safe read/write on the
master fd via os.read/os.write (not PtyProcessUnicode — ANSI is
inherently byte-oriented and UTF-8 boundaries may land mid-read),
non-blocking select-based reads, TIOCSWINSZ resize, idempotent
SIGHUP→SIGTERM→SIGKILL teardown, platform guard (POSIX-only; Windows
is WSL-supported only).
hermes_cli/web_server.py
@app.websocket("/api/pty") endpoint gated by the existing
_SESSION_TOKEN (via ?token= query param since browsers can't set
Authorization on WS upgrades). Loopback-only enforcement. Reader task
uses run_in_executor to pump PTY bytes without blocking the event
loop. Writer loop intercepts a custom \x1b[RESIZE:cols;rows] escape
before forwarding to the PTY. The endpoint resolves the TUI argv
through a _resolve_chat_argv hook so tests can inject fake commands
without building the real TUI.
Tests
-----
tests/hermes_cli/test_pty_bridge.py — 12 unit tests: spawn, stdout,
stdin round-trip, EOF, resize (via TIOCSWINSZ + tput readback), close
idempotency, cwd, env forwarding, unavailable-platform error.
tests/hermes_cli/test_web_server.py — TestPtyWebSocket adds 7 tests:
missing/bad token rejection (close code 4401), stdout streaming,
stdin round-trip, resize escape forwarding, unavailable-platform ANSI
error frame + 1011 close, resume parameter forwarding to argv.
96 tests pass under scripts/run_tests.sh.
(cherry picked from commit 29b337bca7)
feat(web): add Chat tab with xterm.js terminal + Sessions resume button
(cherry picked from commit 3d21aee8 by emozilla, conflicts resolved
against current main: BUILTIN_ROUTES table + plugin slot layout)
fix(tui): replace OSC 52 jargon in /copy confirmation
When the user ran /copy successfully, Ink confirmed with:
sent OSC52 copy sequence (terminal support required)
That reads like a protocol spec to everyone who isn't a terminal
implementer. The caveat was a historical artifact — OSC 52 wasn't
universally supported when this message was written, so the TUI
honestly couldn't guarantee the copy had landed anywhere.
Today every modern terminal (including the dashboard's embedded
xterm.js) handles OSC 52 reliably. Say what the user actually wants
to know — that it copied, and how much — matching the message the
TUI already uses for selection copy:
copied 1482 chars
(cherry picked from commit a0701b1d5a)
docs: document the dashboard Chat tab
AGENTS.md — new subsection under TUI Architecture explaining that the
dashboard embeds the real hermes --tui rather than rewriting it,
with pointers to the pty_bridge + WebSocket endpoint and the rule
'never add a parallel chat surface in React.'
website/docs/user-guide/features/web-dashboard.md — user-facing Chat
section inside the existing Web Dashboard page, covering how it works
(WebSocket + PTY + xterm.js), the Sessions-page resume flow, and
prerequisites (Node.js, ptyprocess, POSIX kernel / WSL on Windows).
(cherry picked from commit 2c2e32cc45)
feat(tui-gateway): transport-aware dispatch + WebSocket sidecar
Decouples the JSON-RPC dispatcher from its I/O sink so the same handler
surface can drive multiple transports concurrently. The PTY chat tab
already speaks to the TUI binary as bytes — this adds a structured
event channel alongside it for dashboard-side React widgets that need
typed events (tool.start/complete, model picker state, slash catalog)
that PTY can't surface.
- `tui_gateway/transport.py` — `Transport` protocol + `contextvars` binding
+ module-level `StdioTransport` fallback. The stdio stream resolves
through a lambda so existing tests that monkey-patch `_real_stdout`
keep passing without modification.
- `tui_gateway/ws.py` — WebSocket transport implementation; FastAPI
endpoint mounting lives in hermes_cli/web_server.py.
- `tui_gateway/server.py`:
- `write_json` routes via session transport (for async events) →
contextvar transport (for in-request writes) → stdio fallback.
- `dispatch(req, transport=None)` binds the transport for the request
lifetime and propagates it to pool workers via `contextvars.copy_context`
so async handlers don't lose their sink.
- `_init_session` and the manual-session create path stash the
request's transport so out-of-band events (subagent.complete, etc.)
fan out to the right peer.
`tui_gateway.entry` (Ink's stdio handshake) is unchanged externally —
it falls through every precedence step into the stdio fallback, byte-
identical to the previous behaviour.
feat(web): ChatSidebar — JSON-RPC sidecar next to xterm.js terminal
Composes the two transports into a single Chat tab:
┌─────────────────────────────────────────┬──────────────┐
│ xterm.js / PTY (emozilla #13379) │ ChatSidebar │
│ the literal hermes --tui process │ /api/ws │
└─────────────────────────────────────────┴──────────────┘
terminal bytes structured events
The terminal pane stays the canonical chat surface — full TUI fidelity,
slash commands, model picker, mouse, skin engine, wide chars all paint
inside the terminal. The sidebar opens a parallel JSON-RPC WebSocket
to the same gateway and renders metadata that PTY can't surface to
React chrome:
• model + provider badge with connection state (click → switch)
• running tool-call list (driven by tool.start / tool.progress /
tool.complete events)
• model picker dialog (gateway-driven, reuses ModelPickerDialog)
The sidecar is best-effort. If the WS can't connect (older gateway,
network hiccup, missing token) the terminal pane keeps working
unimpaired — sidebar just shows the connection-state badge in the
appropriate tone.
- `web/src/components/ChatSidebar.tsx` — new component (~270 lines).
Owns its GatewayClient, drives the model picker through
`slash.exec`, fans tool events into a capped tool list.
- `web/src/pages/ChatPage.tsx` — split layout: terminal pane
(`flex-1`) + sidebar (`w-80`, `lg+` only).
- `hermes_cli/web_server.py` — mount `/api/ws` (token + loopback
guards mirror /api/pty), delegate to `tui_gateway.ws.handle_ws`.
Co-authored-by: emozilla <emozilla@nousresearch.com>
refactor(web): /clean pass on ChatSidebar + ChatPage lint debt
- ChatSidebar: lift gw out of useRef into a useMemo derived from a
reconnect counter. React 19's react-hooks/refs and react-hooks/
set-state-in-effect rules both fire when you touch a ref during
render or call setState from inside a useEffect body. The
counter-derived gw is the canonical pattern for "external resource
that needs to be replaceable on user action" — re-creating the
client comes from bumping `version`, the effect just wires + tears
down. Drops the imperative `gwRef.current = …` reassign in
reconnect, drops the truthy ref guard in JSX. modelLabel +
banner inlined as derived locals (one-off useMemo was overkill).
- ChatPage: lazy-init the banner state from the missing-token check
so the effect body doesn't have to setState on first run. Drops
the unused react-hooks/exhaustive-deps eslint-disable. Adds a
scoped no-control-regex disable on the SGR mouse parser regex
(the \\x1b is intentional for xterm escape sequences).
All my-touched files now lint clean. Remaining warnings on web/
belong to pre-existing files this PR doesn't touch.
Verified: vitest 249/249, ui-tui eslint clean, web tsc clean,
python imports clean.
chore: uptick
fix(web): drop ChatSidebar tool list — events can't cross PTY/WS boundary
The /api/pty endpoint spawns `hermes --tui` as a child process with its
own tui_gateway and _sessions dict; /api/ws runs handle_ws in-process in
the dashboard server with a separate _sessions dict. Tool events fire on
the child's gateway and never reach the WS sidecar, so the sidebar's
tool.start/progress/complete listeners always observed an empty list.
Drop the misleading list (and the now-orphaned ToolCall primitive),
keep model badge + connection state + model picker + error banner —
those work because they're sidecar-local concerns. Surfacing tool calls
in the sidebar requires cross-process forwarding (PTY child opens a
back-WS to the dashboard, gateway tees emits onto stdio + sidecar
transport) — proper feature for a follow-up.
feat(web): wire ChatSidebar tool list to PTY child via /api/pub broadcast
The dashboard's /api/pty spawns hermes --tui as a child process; tool
events fire in the python tui_gateway grandchild and never crossed the
process boundary into the in-process WS sidecar — so the sidebar tool
list was always empty.
Cross-process forwarding:
- tui_gateway: TeeTransport (transport.py) + WsPublisherTransport
(event_publisher.py, sync websockets client). entry.py installs the
tee on _stdio_transport when HERMES_TUI_SIDECAR_URL is set, mirroring
every dispatcher emit to a back-WS without disturbing Ink's stdio
handshake.
- hermes_cli/web_server.py: new /api/pub (publisher) + /api/events
(subscriber) endpoints with a per-channel registry. /api/pty now
accepts ?channel= and propagates the sidecar URL via env. start_server
also stashes app.state.bound_port so the URL is constructable.
- web/src/pages/ChatPage.tsx: generates a channel UUID per mount,
passes it to /api/pty and as a prop to ChatSidebar.
- web/src/components/ChatSidebar.tsx: opens /api/events?channel=, fans
tool.start/progress/complete back into the ToolCall list. Restores
the ToolCall primitive.
Tests: 4 new TestPtyWebSocket cases cover channel propagation,
broadcast fan-out, and missing-channel rejection (10 PTY tests pass,
120 web_server tests overall).
fix(web): address Copilot review on #14890
Five threads, all real:
- gatewayClient.ts: register `message`/`close` listeners BEFORE awaiting
the open handshake. Server emits `gateway.ready` immediately after
accept, so a listener attached after the open promise could race past
the initial skin payload and lose it.
- ChatSidebar.tsx: wire `error`/`close` on the /api/events subscriber
WS into the existing error banner. 4401/4403 (auth/loopback reject)
surface as a "reload the page" message; mid-stream drops surface as
"events feed disconnected" with the existing reconnect button. Clean
unmount closes (1000/1001) stay silent.
- web-dashboard.md: install hint was `pip install hermes-agent[web]` but
ptyprocess lives in the `pty` extra, not `web`. Switch to
`hermes-agent[web,pty]` in both prerequisite blocks.
- AGENTS.md: previous "never add a parallel React chat surface" guidance
was overbroad and contradicted this PR's sidebar. Tightened to forbid
re-implementing the transcript/composer/PTY terminal while explicitly
allowing structured supporting widgets (sidebar / model picker /
inspectors), matching the actual architecture.
- web/package-lock.json: regenerated cleanly so the wterm sibling
workspace paths (extraneous machine-local entries) stop polluting CI.
Tests: 249/249 vitest, 10/10 PTY/events, web tsc clean.
refactor(web): /clean pass on ChatSidebar events handler
Spotted in the round-2 review:
- Banner flashed on clean unmount: `ws.close()` from the effect cleanup
fires `close` with code 1005, opened=true, neither 1000 nor 1001 —
hit the "unexpected drop" branch. Track `unmounting` in the effect
scope and gate the banner through a `surface()` helper so cleanup
closes stay silent.
- DRY the duplicated "events feed disconnected" string into a local
const used by both the error and close handlers.
- Drop the `opened` flag (no longer needed once the unmount guard is
the source of truth for "is this an expected close?").
Three interrupt-recovery sites in run_agent.py rebuilt self._anthropic_client
with build_anthropic_client(self._anthropic_api_key, ...) unconditionally.
When provider=bedrock + api_mode=anthropic_messages (AnthropicBedrock SDK
path), self._anthropic_api_key is the sentinel 'aws-sdk' — build_anthropic_client
doesn't accept that and the rebuild either crashed or produced a non-functional
client.
Extract a _rebuild_anthropic_client() helper that dispatches to
build_anthropic_bedrock_client(region) when provider='bedrock', falling back
to build_anthropic_client() for native Anthropic and other anthropic_messages
providers (MiniMax, Kimi, Alibaba, etc.). Three inline rebuild sites now call
the helper.
Partial salvage of #14680 by @bsgdigital — only the _rebuild_anthropic_client
helper. The normalize_model_name Bedrock-prefix piece was subsumed by #14664,
and the aux client aws_sdk branch was subsumed by #14770 (both in the same
salvage PR as this commit).
## Problem
When a pooled HTTPS connection to the Bedrock runtime goes stale (NAT
timeout, VPN flap, server-side TCP RST, proxy idle cull), the next
Converse call surfaces as one of:
* botocore.exceptions.ConnectionClosedError / ReadTimeoutError /
EndpointConnectionError / ConnectTimeoutError
* urllib3.exceptions.ProtocolError
* A bare AssertionError raised from inside urllib3 or botocore
(internal connection-pool invariant check)
The agent loop retries the request 3x, but the cached boto3 client in
_bedrock_runtime_client_cache is reused across retries — so every
attempt hits the same dead connection pool and fails identically.
Only a process restart clears the cache and lets the user keep working.
The bare-AssertionError variant is particularly user-hostile because
str(AssertionError()) is an empty string, so the retry banner shows:
⚠️ API call failed: AssertionError
📝 Error:
with no hint of what went wrong.
## Fix
Add two helpers to agent/bedrock_adapter.py:
* is_stale_connection_error(exc) — classifies exceptions that
indicate dead-client/dead-socket state. Matches botocore
ConnectionError + HTTPClientError subtrees, urllib3
ProtocolError / NewConnectionError, and AssertionError
raised from a frame whose module name starts with urllib3.,
botocore., or boto3.. Application-level AssertionErrors are
intentionally excluded.
* invalidate_runtime_client(region) — per-region counterpart to
the existing reset_client_cache(). Evicts a single cached
client so the next call rebuilds it (and its connection pool).
Wire both into the Converse call sites:
* call_converse() / call_converse_stream() in
bedrock_adapter.py (defense-in-depth for any future caller)
* The two direct client.converse(**kwargs) /
client.converse_stream(**kwargs) call sites in run_agent.py
(the paths the agent loop actually uses)
On a stale-connection exception, the client is evicted and the
exception re-raised unchanged. The agent's existing retry loop then
builds a fresh client on the next attempt and recovers without
requiring a process restart.
## Tests
tests/agent/test_bedrock_adapter.py gets three new classes (14 tests):
* TestInvalidateRuntimeClient — per-region eviction correctness;
non-cached region returns False.
* TestIsStaleConnectionError — classifies botocore
ConnectionClosedError / EndpointConnectionError /
ReadTimeoutError, urllib3 ProtocolError, library-internal
AssertionError (both urllib3.* and botocore.* frames), and
correctly ignores application-level AssertionError and
unrelated exceptions (ValueError, KeyError).
* TestCallConverseInvalidatesOnStaleError — end-to-end: stale
error evicts the cached client, non-stale error (validation)
leaves it alone, successful call leaves it cached.
All 116 tests in test_bedrock_adapter.py pass.
Signed-off-by: Andre Kurait <andrekurait@gmail.com>
Bedrock's aws_sdk auth_type had no matching branch in
resolve_provider_client(), causing it to fall through to the
"unhandled auth_type" warning and return (None, None). This broke
all auxiliary tasks (compression, memory, summarization) for Bedrock
users — the main conversation loop worked fine, but background
context management silently failed.
Add an aws_sdk branch that creates an AnthropicAuxiliaryClient via
build_anthropic_bedrock_client(), using boto3's default credential
chain (IAM roles, SSO, env vars, instance metadata). Default
auxiliary model is Haiku for cost efficiency.
Closes#13919
## Problem
`get_model_context_length()` in `agent/model_metadata.py` had a resolution
order bug that caused every Bedrock model to fall back to the 128K default
context length instead of reaching the static Bedrock table (200K for
Claude, etc.).
The root cause: `bedrock-runtime.<region>.amazonaws.com` is not listed in
`_URL_TO_PROVIDER`, so `_is_known_provider_base_url()` returned False.
The resolution order then ran the custom-endpoint probe (step 2) *before*
the Bedrock branch (step 4b), which:
1. Treated Bedrock as a custom endpoint (via `_is_custom_endpoint`).
2. Called `fetch_endpoint_model_metadata()` → `GET /models` on the
bedrock-runtime URL (Bedrock doesn't serve this shape).
3. Fell through to `return DEFAULT_FALLBACK_CONTEXT` (128K) at the
"probe-down" branch — never reaching the Bedrock static table.
Result: users on Bedrock saw 128K context for Claude models that
actually support 200K on Bedrock, causing premature auto-compression.
## Fix
Promote the Bedrock branch from step 4b to step 1b, so it runs *before*
the custom-endpoint probe at step 2. The static table in
`bedrock_adapter.py::get_bedrock_context_length()` is the authoritative
source for Bedrock (the ListFoundationModels API doesn't expose context
window sizes), so there's no reason to probe `/models` first.
The original step 4b is replaced with a one-line breadcrumb comment
pointing to the new location, to make the resolution-order docstring
accurate.
## Changes
- `agent/model_metadata.py`
- Add step 1b: Bedrock static-table branch (unchanged predicate, moved).
- Remove dead step 4b block, replace with breadcrumb comment.
- Update resolution-order docstring to include step 1b.
- `tests/agent/test_model_metadata.py`
- New `TestBedrockContextResolution` class (3 tests):
- `test_bedrock_provider_returns_static_table_before_probe`:
confirms `provider="bedrock"` hits the static table and does NOT
call `fetch_endpoint_model_metadata` (regression guard).
- `test_bedrock_url_without_provider_hint`: confirms the
`bedrock-runtime.*.amazonaws.com` host match works without an
explicit `provider=` hint.
- `test_non_bedrock_url_still_probes`: confirms the probe still
fires for genuinely-custom endpoints (no over-reach).
## Testing
pytest tests/agent/test_model_metadata.py -q
# 83 passed in 1.95s (3 new + 80 existing)
## Risk
Very low.
- Predicate is identical to the original step 4b — no behaviour change
for non-Bedrock paths.
- Original step 4b was dead code for the user-facing case (always hit
the 128K fallback first), so removing it cannot regress behaviour.
- Bedrock path now short-circuits before any network I/O — faster too.
- `ImportError` fall-through preserved so users without `boto3`
installed are unaffected.
## Related
- This is a prerequisite for accurate context-window accounting on
Bedrock — the fix for #14710 (stale-connection client eviction)
depends on correct context sizing to know when to compress.
Signed-off-by: Andre Kurait <andrekurait@gmail.com>
Bedrock model IDs use dots as namespace separators (anthropic.claude-opus-4-7,
us.anthropic.claude-sonnet-4-5-v1:0), not version separators.
normalize_model_name() was unconditionally converting all dots to hyphens,
producing invalid IDs that Bedrock rejects with HTTP 400/404.
This affected both the main agent loop (partially mitigated by
_anthropic_preserve_dots in run_agent.py) and all auxiliary client calls
(compression, session_search, vision, etc.) which go through
_AnthropicCompletionsAdapter and never pass preserve_dots=True.
Fix: add _is_bedrock_model_id() to detect Bedrock namespace prefixes
(anthropic., us., eu., ap., jp., global.) and skip dot-to-hyphen
conversion for these IDs regardless of the preserve_dots flag.
A child running a legitimately long-running tool (terminal command,
browser fetch, big file read) holds current_tool set and keeps
api_call_count frozen while the tool runs. The previous stale check
treated that as idle after 5 heartbeat cycles (~150s), stopped
touching the parent, and let the gateway kill the session.
Split the threshold in two:
- _HEARTBEAT_STALE_CYCLES_IDLE=5 (~150s) — applied only when
current_tool is None (child wedged between turns)
- _HEARTBEAT_STALE_CYCLES_IN_TOOL=20 (~600s) — applied when the child
is inside a tool call
Stale counter also resets when current_tool changes (new tool =
progress). The hard child_timeout_seconds (default 600s) is still
the final cap, so genuinely stuck tools don't get to block forever.
A — 'hermes tools' activation now runs the full Spotify wizard.
Previously a user had to (1) toggle the Spotify toolset on in 'hermes
tools' AND (2) separately run 'hermes auth spotify' to actually use
it. The second step was a discovery gap — the docs mentioned it but
nothing in the TUI pointed users there.
Now toggling Spotify on calls login_spotify_command as a post_setup
hook. If the user has no client_id yet, the interactive wizard walks
them through Spotify app creation; if they do, it skips straight to
PKCE. Either way, one 'hermes tools' pass leaves Spotify toggled on
AND authenticated. SystemExit from the wizard (user abort) leaves the
toolset enabled and prints a 'run: hermes auth spotify' hint — it
does NOT fail the toolset toggle.
Dropped the TOOL_CATEGORIES env_vars list for Spotify. The wizard
handles HERMES_SPOTIFY_CLIENT_ID persistence itself, and asking users
to type env var names before the wizard fires was UX-backwards — the
point of the wizard is that they don't HAVE a client_id yet.
B — Docs page now covers cron + Spotify.
New 'Scheduling: Spotify + cron' section with two working examples
(morning playlist, wind-down pause) using the real 'hermes cron add'
CLI surface (verified via 'cron add --help'). Covers the active-device
gotcha, Premium gating, memory isolation, and links to the cron docs.
Also fixed a stale '9 Spotify tools' reference in the setup copy —
we consolidated to 7 tools in #15154.
Validation:
- scripts/run_tests.sh tests/hermes_cli/test_tools_config.py
tests/hermes_cli/test_spotify_auth.py
tests/tools/test_spotify_client.py
→ 54 passed
- website: node scripts/prebuild.mjs && npx docusaurus build
→ SUCCESS, no new warnings
Bug 3 — Stale OAuth token not detected in 'hermes model':
- _model_flow_anthropic used 'has_creds = bool(existing_key)' which treats
any non-empty token (including expired OAuth tokens) as valid.
- Added existing_is_stale_oauth check: if the only credential is an OAuth
token (sk-ant- prefix) with no valid cc_creds fallback, mark it stale
and force the re-auth menu instead of silently accepting a broken token.
Bug 4 — macOS Keychain credentials never read:
- Claude Code >=2.1.114 migrated from ~/.claude/.credentials.json to the
macOS Keychain under service 'Claude Code-credentials'.
- Added _read_claude_code_credentials_from_keychain() using the 'security'
CLI tool; read_claude_code_credentials() now tries Keychain first then
falls back to JSON file.
- Non-Darwin platforms return None from Keychain read immediately.
Tests:
- tests/agent/test_anthropic_keychain.py: 11 cases covering Darwin-only
guard, security command failures, JSON parsing, fallback priority.
- tests/hermes_cli/test_anthropic_model_flow_stale_oauth.py: 8 cases
covering stale OAuth detection, API key passthrough, cc_creds fallback.
Refs: #12905
FixesNousResearch/hermes-agent#9813
Root cause: _is_oauth_token() only recognized sk-ant-* and eyJ* patterns,
but Claude Code OAuth tokens from CLAUDE_CODE_OAUTH_TOKEN use cc- prefix
Fix: Add cc- prefix detection so these tokens route through Bearer auth
Moves the Spotify integration from tools/ into plugins/spotify/,
matching the existing pattern established by plugins/image_gen/ for
third-party service integrations.
Why:
- tools/ should be reserved for foundational capabilities (terminal,
read_file, web_search, etc.). tools/providers/ was a one-off
directory created solely for spotify_client.py.
- plugins/ is already the home for image_gen backends, memory
providers, context engines, and standalone hook-based plugins.
Spotify is a third-party service integration and belongs alongside
those, not in tools/.
- Future service integrations (eventually: Deezer, Apple Music, etc.)
now have a pattern to copy.
Changes:
- tools/spotify_tool.py → plugins/spotify/tools.py (handlers + schemas)
- tools/providers/spotify_client.py → plugins/spotify/client.py
- tools/providers/ removed (was only used for Spotify)
- New plugins/spotify/__init__.py with register(ctx) calling
ctx.register_tool() × 7. The handler/check_fn wiring is unchanged.
- New plugins/spotify/plugin.yaml (kind: backend, bundled, auto-load).
- tests/tools/test_spotify_client.py: import paths updated.
tools_config fix — _DEFAULT_OFF_TOOLSETS now wins over plugin auto-enable:
- _get_platform_tools() previously auto-enabled unknown plugin
toolsets for new platforms. That was fine for image_gen (which has
no toolset of its own) but bad for Spotify, which explicitly
requires opt-in (don't ship 7 tool schemas to users who don't use
it). Added a check: if a plugin toolset is in _DEFAULT_OFF_TOOLSETS,
it stays off until the user picks it in 'hermes tools'.
Pre-existing test bug fix:
- tests/hermes_cli/test_plugins.py::test_list_returns_sorted
asserted names were sorted, but list_plugins() sorts by key
(path-derived, e.g. image_gen/openai). With only image_gen plugins
bundled, name and key order happened to agree. Adding plugins/spotify
broke that coincidence (spotify sorts between openai-codex and xai
by name but after xai by key). Updated test to assert key order,
which is what the code actually documents.
Validation:
- scripts/run_tests.sh tests/hermes_cli/test_plugins.py \
tests/hermes_cli/test_tools_config.py \
tests/hermes_cli/test_spotify_auth.py \
tests/tools/test_spotify_client.py \
tests/tools/test_registry.py
→ 143 passed
- E2E plugin load: 'spotify' appears in loaded plugins, all 7 tools
register into the spotify toolset, check_fn gating intact.
Three quality improvements on top of #15121 / #15130 / #15135:
1. Tool consolidation (9 → 7)
- spotify_saved_tracks + spotify_saved_albums → spotify_library with
kind='tracks'|'albums'. Handler code was ~90 percent identical
across the two old tools; the merge is a behavioral no-op.
- spotify_activity dropped. Its 'now_playing' action was a duplicate
of spotify_playback.get_currently_playing (both return identical
204/empty payloads). Its 'recently_played' action moves onto
spotify_playback as a new action — history belongs adjacent to
live state.
- Net: each API call ships 2 fewer tool schemas when the Spotify
toolset is enabled, and the action surface is more discoverable
(everything playback-related is on one tool).
2. Spotify skill (skills/media/spotify/SKILL.md)
Teaches the agent canonical usage patterns so common requests don't
balloon into 4+ tool calls:
- 'play X' = one search, then play by URI (not search + scan +
describe + play)
- 'what's playing' = single get_currently_playing (no preflight
get_state chain)
- Don't retry on '403 Premium required' or '403 No active device' —
both require user action
- URI/URL/bare-ID format normalization
- Full failure-mode reference for 204/401/403/429
3. Surfaced in 'hermes setup' tool status
Adds 'Spotify (PKCE OAuth)' to the tool status list when
auth.json has a Spotify access/refresh token. Matches the
homeassistant pattern but reads from auth.json (OAuth-based) rather
than env vars.
Docs updated to reflect the new 7-tool surface, and mention the
companion skill in the 'Using it' section.
Tests: 54 passing (client 22, auth 15, tools_config 35 — 18 = 54 after
renaming/replacing the spotify_activity tests with library +
recently_played coverage). Docusaurus build clean.
Salvage of the Gemini-specific piece from PR #12585 by @briandevans.
Gemini's OpenAI-compat /v1beta/openai/models endpoint returns IDs prefixed
with 'models/' (native Gemini-API convention), so set-membership against
curated bare IDs drops every model. Strip the prefix before comparison.
The Anthropic static-catalog piece of #12585 was subsumed by #12618's
_fetch_anthropic_models() branch landing earlier in the same salvage PR.
Full branch cherry-pick was skipped because it also carried unrelated
catalog-version regressions.
The generic /v1/models probe in validate_requested_model() sent a plain
'Authorization: Bearer <key>' header, which works for OpenAI-compatible
endpoints but results in a 401 Unauthorized from Anthropic's API.
Anthropic requires x-api-key + anthropic-version headers (or Bearer for
OAuth tokens from Claude Code).
Add a provider-specific branch for normalized == 'anthropic' that calls
the existing _fetch_anthropic_models() helper, which already handles
both regular API keys and Claude Code OAuth tokens correctly. This
mirrors the pattern already used for openai-codex, copilot, and bedrock.
The branch also includes:
- fuzzy auto-correct (cutoff 0.9) for near-exact model ID typos
- fuzzy suggestions (cutoff 0.5) when the model is not listed
- graceful fall-through when the token cannot be resolved or the
network is unreachable (accepts with a warning rather than hard-fail)
- a note that newer/preview/snapshot model IDs can be gate-listed
and may still work even if not returned by /v1/models
Fixes Anthropic provider users seeing 'service unreachable' errors
when running /model <claude-model> because every probe 401'd.
- probe_api_models: add api_mode param; use x-api-key + anthropic-version
headers for anthropic_messages mode (Anthropic's native Models API auth)
- probe_api_models: add User-Agent header to avoid Cloudflare 403 blocks
on third-party OpenAI-compatible endpoints
- validate_requested_model: pass api_mode through from switch_model
- validate_requested_model: for anthropic_messages mode, attempt probe with
correct auth; if probe fails (many proxies don't implement /v1/models),
accept the model with an informational warning instead of rejecting
- fetch_api_models: propagate api_mode to probe_api_models
Regression test for #14981. Verifies that _session_expiry_watcher fires
on_session_finalize for each session swept out of the store, matching
the contract documented for /new, /reset, CLI shutdown, and gateway stop.
Verified the test fails cleanly on pre-fix code (hook call list missing
sess-expired) and passes with the fix applied.
The initial Spotify docs page shipped in #15130 was a setup guide. This
expands it into a full feature reference:
- Per-tool parameter table for all 9 tools, extracted from the real
schemas in tools/spotify_tool.py (actions, required/optional args,
premium gating).
- Free vs Premium feature matrix — which actions work on which tier,
so Free users don't assume Spotify tools are useless to them.
- Active-device prerequisite called out at the top; this is the #1
cause of '403 no active device' reports for every Spotify
integration.
- SSH / headless section explaining that browser auto-open is skipped
when SSH_CLIENT/SSH_TTY is set, and how to tunnel the callback port.
- Token lifecycle: refresh on 401, persistence across restarts, how
to revoke server-side via spotify.com/account/apps.
- Example prompt list so users know what to ask the agent.
- Troubleshooting expanded: no-active-device, Premium-required, 204
now_playing, INVALID_CLIENT, 429, 401 refresh-revoked, wizard not
opening browser.
- 'Where things live' table mapping auth.json / .env / Spotify app.
Verified with 'node scripts/prebuild.mjs && npx docusaurus build'
— page compiles, no new warnings.
When the primary provider raises AuthError (expired OAuth token,
revoked API key), the error was re-raised before AIAgent was created,
so fallback_model was never consulted. Now both gateway/run.py and
cron/scheduler.py catch AuthError specifically and attempt to resolve
credentials from the fallback_providers/fallback_model config chain
before propagating the error.
Closes#7230
Try to activate fallback model after errors was calling get_model_context_length()
without the config_context_length parameter, causing it to fall through to
DEFAULT_FALLBACK_CONTEXT (128K) even when config.yaml has an explicit
model.context_length value (e.g. 204800 for MiniMax-M2.7).
This mirrors the fix already present in switch_model() at line 1988, which
correctly passes config_context_length. The fallback path was missed.
Fixes: context_length forced to 128K on fallback activation
ssl.SSLError (and its subclass ssl.SSLCertVerificationError) inherits from
OSError *and* ValueError via Python's MRO. The is_local_validation_error
check used isinstance(api_error, (ValueError, TypeError)) to detect
programming bugs that should abort immediately — but this inadvertently
caught ssl.SSLError, treating a TLS transport failure as a non-retryable
client error.
The error classifier already maps SSLCertVerificationError to
FailoverReason.timeout with retryable=True (its type name is in
_TRANSPORT_ERROR_TYPES), but the inline isinstance guard was overriding
that classification and triggering an unnecessary abort.
Fix: add ssl.SSLError to the exclusion list alongside the existing
UnicodeEncodeError carve-out so TLS errors fall through to the
classifier's retryable path.
Closes#14367
Two small fixes triggered by a support report where the user saw a
cryptic 'HTTP 400 - Error 400 (Bad Request)!!1' (Google's GFE HTML
error page, not a real API error) on every gemini-2.5-pro request.
The underlying cause was an empty GOOGLE_API_KEY / GEMINI_API_KEY, but
nothing in our output made that diagnosable:
1. hermes_cli/dump.py: the api_keys section enumerated 23 providers but
omitted Google entirely, so users had no way to verify from 'hermes
dump' whether the key was set. Added GOOGLE_API_KEY and GEMINI_API_KEY
rows.
2. agent/gemini_native_adapter.py: GeminiNativeClient.__init__ accepted
an empty/whitespace api_key and stamped it into the x-goog-api-key
header, which made Google's frontend return a generic HTML 400 long
before the request reached the Generative Language backend. Now we
raise RuntimeError at construction with an actionable message
pointing at GOOGLE_API_KEY/GEMINI_API_KEY and aistudio.google.com.
Added a regression test that covers '', ' ', and None.
Claude-style and some Anthropic-tuned models occasionally emit tool
names as class-like identifiers: TodoTool_tool, Patch_tool,
BrowserClick_tool, PatchTool. These failed strict-dict lookup in
valid_tool_names and triggered the 'Unknown tool' self-correction
loop, wasting a full turn of iteration and tokens.
_repair_tool_call already handled lowercase / separator / fuzzy
matches but couldn't bridge the CamelCase-to-snake_case gap or the
trailing '_tool' suffix that Claude sometimes tacks on. Extend it
with two bounded normalization passes:
1. CamelCase -> snake_case (via regex lookbehind).
2. Strip trailing _tool / -tool / tool suffix (case-insensitive,
applied twice so TodoTool_tool reduces all the way: strip
_tool -> TodoTool, snake -> todo_tool, strip 'tool' -> todo).
Cheap fast-paths (lowercase / separator-normalized) still run first
so the common case stays zero-cost. Fuzzy match remains the last
resort unchanged.
Tests: tests/run_agent/test_repair_tool_call_name.py covers the
three original reports (TodoTool_tool, Patch_tool, BrowserClick_tool),
plus PatchTool, WriteFileTool, ReadFile_tool, write-file_Tool,
patch-tool, and edge cases (empty, None, '_tool' alone, genuinely
unknown names).
18 new tests + 17 existing arg-repair tests = 35/35 pass.
Closes#14784
Previously 'hermes auth spotify' crashed with 'HERMES_SPOTIFY_CLIENT_ID
is required' if the user hadn't manually created a Spotify developer
app and set env vars. Now the command detects a missing client_id and
walks the user through the one-time app registration inline:
- Opens https://developer.spotify.com/dashboard in the browser
- Tells the user exactly what to paste into the Spotify form
(including the correct default redirect URI, 127.0.0.1:43827)
- Prompts for the Client ID
- Persists HERMES_SPOTIFY_CLIENT_ID to ~/.hermes/.env so subsequent
runs skip the wizard
- Continues straight into the PKCE OAuth flow
Also prints the docs URL at both the start of the wizard and the end
of a successful login so users can find the full guide.
Adds website/docs/user-guide/features/spotify.md with the complete
setup walkthrough, tool reference, and troubleshooting, and wires it
into the sidebar under User Guide > Features > Advanced.
Fixes a stale redirect URI default in the hermes_cli/tools_config.py
TOOL_CATEGORIES entry (was 8888/callback from the PR description
instead of the actual DEFAULT_SPOTIFY_REDIRECT_URI value
43827/spotify/callback defined in auth.py).
Streamable HTTP MCP servers may garbage-collect their server-side
session state while the OAuth token remains valid — idle TTL, server
restart, pod rotation, etc. Before this fix, the tool-call handler
treated the resulting "Invalid or expired session" error as a plain
tool failure with no recovery path, so **every subsequent call on
the affected server failed until the gateway was manually
restarted**. Reporter: #13383.
The OAuth-based recovery path (``_handle_auth_error_and_retry``)
already exists for 401s, but it only fires on auth errors. Session
expiry slipped through because the access token is still valid —
nothing 401'd, so the existing recovery branch was skipped.
Fix
---
Add a sibling function ``_handle_session_expired_and_retry`` that
detects MCP session-expiry via ``_is_session_expired_error`` (a
narrow allow-list of known-stable substrings: ``"invalid or expired
session"``, ``"session expired"``, ``"session not found"``,
``"unknown session"``, etc.) and then uses the existing transport
reconnect mechanism:
* Sets ``MCPServerTask._reconnect_event`` — the server task's
lifecycle loop already interprets this as "tear down the current
``streamablehttp_client`` + ``ClientSession`` and rebuild them,
reusing the existing OAuth provider instance".
* Waits up to 15 s for the new session to come back ready.
* Retries the original call once. If the retry succeeds, returns
its result and resets the circuit-breaker error count. If the
retry raises, or if the reconnect doesn't ready in time, falls
through to the caller's generic error path.
Unlike the 401 path, this does **not** call ``handle_401`` — the
access token is already valid and running an OAuth refresh would be
a pointless round-trip.
All 5 MCP handlers (``call_tool``, ``list_resources``, ``read_resource``,
``list_prompts``, ``get_prompt``) now consult both recovery paths
before falling through:
recovered = _handle_auth_error_and_retry(...) # 401 path
if recovered is not None: return recovered
recovered = _handle_session_expired_and_retry(...) # new
if recovered is not None: return recovered
# generic error response
Narrow scope — explicitly not changed
-------------------------------------
* **Detection is string-based on a 5-entry allow-list.** The MCP
SDK wraps JSON-RPC errors in ``McpError`` whose exception type +
attributes vary across SDK versions, so matching on message
substrings is the durable path. Kept narrow to avoid false
positives — a regular ``RuntimeError("Tool failed")`` will NOT
trigger spurious reconnects (pinned by
``test_is_session_expired_rejects_unrelated_errors``).
* **No change to the existing 401 recovery flow.** The new path is
consulted only after the auth path declines (returns ``None``).
* **Retry count stays at 1.** If the reconnect-then-retry also
fails, we don't loop — the error surfaces normally so the model
sees a failed tool call rather than a hang.
* **``InterruptedError`` is explicitly excluded** from session-expired
detection so user-cancel signals always short-circuit the same
way they did before (pinned by
``test_is_session_expired_rejects_interrupted_error``).
Regression coverage
-------------------
``tests/tools/test_mcp_tool_session_expired.py`` (new, 16 cases):
Unit tests for ``_is_session_expired_error``:
* ``test_is_session_expired_detects_invalid_or_expired_session`` —
reporter's exact wpcom-mcp text.
* ``test_is_session_expired_detects_expired_session_variant`` —
"Session expired" / "expired session" variants.
* ``test_is_session_expired_detects_session_not_found`` — server GC
variant ("session not found", "unknown session").
* ``test_is_session_expired_is_case_insensitive``.
* ``test_is_session_expired_rejects_unrelated_errors`` — narrow-scope
canary: random RuntimeError / ValueError / 401 don't trigger.
* ``test_is_session_expired_rejects_interrupted_error`` — user cancel
must never route through reconnect.
* ``test_is_session_expired_rejects_empty_message``.
Handler integration tests:
* ``test_call_tool_handler_reconnects_on_session_expired`` — reporter's
full repro: first call raises "Invalid or expired session", handler
signals ``_reconnect_event``, retries once, returns the retry's
success result with no ``error`` key.
* ``test_call_tool_handler_non_session_expired_error_falls_through``
— preserved-behaviour canary: random tool failures do NOT trigger
reconnect.
* ``test_session_expired_handler_returns_none_without_loop`` —
defensive: cold-start / shutdown race.
* ``test_session_expired_handler_returns_none_without_server_record``
— torn-down server falls through cleanly.
* ``test_session_expired_handler_returns_none_when_retry_also_fails``
— no retry loop on repeated failure.
Parametrised across all 4 non-``tools/call`` handlers:
* ``test_non_tool_handlers_also_reconnect_on_session_expired``
[list_resources / read_resource / list_prompts / get_prompt].
**15 of 16 fail on clean ``origin/main`` (``6fb69229``)** with
``ImportError: cannot import name '_is_session_expired_error'``
— the fix's surface symbols don't exist there yet. The 1 passing
test is an ordering artefact of pytest-xdist worker collection.
Validation
----------
``source venv/bin/activate && python -m pytest
tests/tools/test_mcp_tool_session_expired.py -q`` → **16 passed**.
Broader MCP suite (5 files:
``test_mcp_tool.py``, ``test_mcp_tool_401_handling.py``,
``test_mcp_tool_session_expired.py``, ``test_mcp_reconnect_signal.py``,
``test_mcp_oauth.py``) → **230 passed, 0 regressions**.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add event hook to httpx.AsyncClient in MCP HTTP transport that strips
Authorization headers when a redirect targets a different origin,
preventing credential leakage to third-party servers.
``tools/mcp_oauth.py`` relied on ``assert _oauth_port is not None`` to
guard the module-level port set by ``build_oauth_auth``. Python's
``-O`` / ``-OO`` optimization flags strip ``assert`` statements
entirely, so a deployment that runs ``python -O -m hermes ...``
silently loses the check: ``_oauth_port`` stays ``None`` and the
failure surfaces much later as an obscure ``int()`` or
``http.server.HTTPServer((host, None))`` TypeError rather than the
intended "OAuth callback port not set" signal.
Replace with an explicit ``if … raise RuntimeError(...)`` so the
invariant is preserved regardless of the interpreter's optimization
level. Docstring updated to document the new exception.
Found during a proactive audit of ``assert`` statements in
non-test code paths.
OAuth client information and token responses from the MCP SDK contain
Pydantic AnyUrl fields (client_uri, redirect_uris, etc.). The previous
model_dump() call returned a dict with these AnyUrl objects still as
their native Python type, which then crashed json.dumps with:
TypeError: Object of type AnyUrl is not JSON serializable
This caused any OAuth-based MCP server (e.g. alphaxiv) to fail
registration with an "OAuth flow error" traceback during startup.
Adding mode="json" tells Pydantic to serialize all fields to
JSON-compatible primitives (AnyUrl -> str, datetime -> ISO string, etc.)
before returning the dict, so the standard json.dumps can handle it.
Three call sites fixed:
- HermesTokenStorage.set_tokens
- HermesTokenStorage.set_client_info
- build_oauth_auth pre-registration write
`_normalize_for_deepseek` was mapping every non-reasoner input into
`deepseek-chat` on the assumption that DeepSeek's API accepts only two
model IDs. That assumption no longer holds — `deepseek-v4-pro` and
`deepseek-v4-flash` are first-class IDs accepted by the direct API,
and on aggregators `deepseek-chat` routes explicitly to V3 (DeepInfra
backend returns `deepseek-chat-v3`). So a user picking V4 Pro through
the model picker was being silently downgraded to V3.
Verified 2026-04-24 against Nous portal's OpenAI-compat surface:
- `deepseek/deepseek-v4-flash` → provider: DeepSeek,
model: deepseek-v4-flash-20260423
- `deepseek/deepseek-chat` → provider: DeepInfra,
model: deepseek/deepseek-chat-v3
Fix:
- Add `deepseek-v4-pro` and `deepseek-v4-flash` to
`_DEEPSEEK_CANONICAL_MODELS` so exact matches pass through.
- Add `_DEEPSEEK_V_SERIES_RE` (`^deepseek-v\d+(...)?$`) so future
V-series IDs (`deepseek-v5-*`, dated variants) keep passing through
without another code change.
- Update docstring + module header to reflect the new rule.
Tests:
- New `TestDeepseekVSeriesPassThrough` — 8 parametrized cases covering
bare, vendor-prefixed, case-variant, dated, and future V-series IDs
plus end-to-end `normalize_model_for_provider(..., "deepseek")`.
- New `TestDeepseekCanonicalAndReasonerMapping` — regression coverage
for canonical pass-through, reasoner-keyword folding, and
fall-back-to-chat behaviour.
- 77/77 pass.
Reported on Discord (Ufonik, Don Piedro): `/model > Deepseek >
deepseek-v4-pro` surfaced
`Normalized 'deepseek-v4-pro' to 'deepseek-chat'`. Picker listing
showed the v4 names, so validation also rejected the post-normalize
`deepseek-chat` as "not in provider listing" — the contradiction
users saw. Normalizer now respects the picker's choice.
Install tini in the container image and route ENTRYPOINT through
`/usr/bin/tini -g -- /opt/hermes/docker/entrypoint.sh`.
Without a PID-1 init, orphans reparented to hermes (MCP stdio servers,
git, bun, browser daemons) never get waited() on and accumulate as
zombies. Long-running gateway containers eventually exhaust the PID
table and hit "fork: cannot allocate memory".
tini is the standard container init (same pattern Docker's --init flag
and Kubernetes pause container use). It handles SIGCHLD, reaps orphans,
and forwards SIGTERM/SIGINT to the entrypoint so hermes's existing
graceful-shutdown handlers still run. The -g flag sends signals to the
whole process group so `docker stop` cleanly terminates hermes and its
descendants, not just direct children.
Closes#15012.
E2E-verified with a minimal reproducer image: spawning 5 orphans that
reparent to PID 1 leaves 5 zombies without tini and 0 with tini.
Follow-up on top of #15096 cherry-pick:
- Remove spotify_* from _HERMES_CORE_TOOLS (keep only in the 'spotify'
toolset, so the 9 Spotify tool schemas are not shipped to every user).
- Add 'spotify' to CONFIGURABLE_TOOLSETS + _DEFAULT_OFF_TOOLSETS so new
installs get it opt-in via 'hermes tools', matching homeassistant/rl.
- Wire TOOL_CATEGORIES entry pointing at 'hermes auth spotify' for the
actual PKCE login (optional HERMES_SPOTIFY_CLIENT_ID /
HERMES_SPOTIFY_REDIRECT_URI env vars).
- scripts/release.py: map contributor email to GitHub login.
Concurrent Hermes processes (e.g. cron jobs) refreshing a Nous OAuth token
via resolve_nous_runtime_credentials() write the rotated tokens to auth.json.
The calling process's pool entry becomes stale, and the next refresh against
the already-rotated token triggers a 'refresh token reuse' revocation on
the Nous Portal.
_sync_nous_entry_from_auth_store() reads auth.json under the same lock used
by resolve_nous_runtime_credentials, and adopts the newer token pair before
refreshing the pool entry. This complements #15111 (which preserved the
obtained_at timestamps through seeding).
Partial salvage of #10160 by @konsisumer — only the agent/credential_pool.py
changes + the 3 Nous-specific regression tests. The PR also touched 10
unrelated files (Dockerfile, tips.py, various tool tests) which were
dropped as scope creep.
Regression tests:
- test_sync_nous_entry_from_auth_store_adopts_newer_tokens
- test_sync_nous_entry_noop_when_tokens_match
- test_nous_exhausted_entry_recovers_via_auth_store_sync
Extracts pool-rotation-room logic into `_pool_may_recover_from_rate_limit`
so single-credential pools no longer block the eager-fallback path on 429.
The existing check `pool is not None and pool.has_available()` lets
fallback fire only after the pool marks every entry as exhausted. With
exactly one credential in the pool (the common shape for Gemini OAuth,
Vertex service accounts, and any personal-key setup), `has_available()`
flips back to True as soon as the cooldown expires — Hermes retries
against the same entry, hits the same daily-quota 429, and burns the
retry budget in a tight loop before ever reaching the configured
`fallback_model`. Observed in the wild as 4+ hours of 429 noise on a
single Gemini key instead of falling through to Vertex as configured.
Rotation is only meaningful with more than one credential — gate on
`len(pool.entries()) > 1`. Multi-credential pools keep the current
wait-for-rotation behaviour unchanged.
Fixes#11314. Related to #8947, #10210, #7230. Narrower scope than
open PRs #8023 (classifier change) and #11492 (503/529 credential-pool
bypass) — this addresses the single-credential 429 case specifically
and does not conflict with either.
Tests: 6 new unit tests in tests/run_agent/test_provider_fallback.py
covering (a) None pool, (b) single-cred available, (c) single-cred in
cooldown, (d) 2-cred available rotates, (e) multi-cred all cooling-down
falls back, (f) many-cred available rotates. All 18 tests in the file
pass.
Previously _handle_credential_pool_error handled 401, 402, and 429
but silently ignored 403. When a provider returns 403 for a revoked or
unauthorised credential (e.g. Nous agent_key invalidated by a newer
login), the pool was never rotated and every subsequent request
continued to use the same failing credential.
Treat 403 the same as 402: immediately mark the current credential
exhausted and rotate to the next pool entry, since a Forbidden response
will not resolve itself with a retry.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The least_used strategy selected entries via min(request_count) but
never incremented the counter. All entries stayed at count=0, so the
strategy degenerated to fill_first behavior with no actual load balancing.
Now increments request_count after each selection and persists the update.
The Copilot provider resolved context windows via models.dev static data,
which does not include account-specific models (e.g. claude-opus-4.6-1m
with 1M context). This adds the live Copilot /models API as a higher-
priority source for copilot/copilot-acp/github-copilot providers.
New helper get_copilot_model_context() in hermes_cli/models.py extracts
capabilities.limits.max_prompt_tokens from the cached catalog. Results
are cached in-process for 1 hour.
In agent/model_metadata.py, step 5a queries the live API before falling
through to models.dev (step 5b). This ensures account-specific models
get correct context windows while standard models still have a fallback.
Part 1 of #7731.
Refs: #7272
Raw GitHub tokens (gho_/github_pat_/ghu_) are now exchanged for
short-lived Copilot API tokens via /copilot_internal/v2/token before
being used as Bearer credentials. This is required to access
internal-only models (e.g. claude-opus-4.6-1m with 1M context).
Implementation:
- exchange_copilot_token(): calls the token exchange endpoint with
in-process caching (dict keyed by SHA-256 fingerprint), refreshed
2 minutes before expiry. No disk persistence — gateway is long-running
so in-memory cache is sufficient.
- get_copilot_api_token(): convenience wrapper with graceful fallback —
returns exchanged token on success, raw token on failure.
- Both callers (hermes_cli/auth.py and agent/credential_pool.py) now
pipe the raw token through get_copilot_api_token() before use.
12 new tests covering exchange, caching, expiry, error handling,
fingerprinting, and caller integration. All 185 existing copilot/auth
tests pass.
Part 2 of #7731.
When using GitHub Copilot as provider, HTTP 401 errors could cause
Hermes to silently fall back to the next model in the chain instead
of recovering. This adds a one-shot retry mechanism that:
1. Re-resolves the Copilot token via the standard priority chain
(COPILOT_GITHUB_TOKEN -> GH_TOKEN -> GITHUB_TOKEN -> gh auth token)
2. Rebuilds the OpenAI client with fresh credentials and Copilot headers
3. Retries the failed request before falling back
The fix handles the common case where the gho_* OAuth token remains
valid but the httpx client state becomes stale (e.g. after startup
race conditions or long-lived sessions).
Key design decisions:
- Always rebuild client even if token string unchanged (recovers stale state)
- Uses _apply_client_headers_for_base_url() for canonical header management
- One-shot flag guard prevents infinite 401 loops (matches existing pattern
used by Codex/Nous/Anthropic providers)
- No token exchange via /copilot_internal/v2/token (returns 404 for some
account types; direct gho_* auth works reliably)
Tests: 3 new test cases covering end-to-end 401->refresh->retry,
client rebuild verification, and same-token rebuild scenarios.
Docs: Updated providers.md with Copilot auth behavior section.
Pass an explicit HOME into Copilot ACP child processes so delegated ACP runs do not fail when the ambient environment is missing HOME.
Prefer the per-profile subprocess home when available, then fall back to HOME, expanduser('~'), pwd.getpwuid(...), and /home/openclaw. Add regression tests for both profile-home preference and clean HOME fallback.
Refs #11068.
Two narrow fixes motivated by #15099.
1. _seed_from_singletons() was dropping obtained_at, agent_key_obtained_at,
expires_in, and friends when seeding device_code pool entries from the
providers.nous singleton. Fresh credentials showed up with
obtained_at=None, which broke downstream freshness-sensitive consumers
(self-heal hooks, pool pruning by age) — they treated just-minted
credentials as older than they actually were and evicted them.
2. When the Nous Portal OAuth 2.1 server returns invalid_grant with
'Refresh token reuse detected' in the error_description, rewrite the
message to explain the likely cause (an external process consumed the
rotated RT without persisting it back) and the mitigation. The generic
reuse message led users to report this as a Hermes persistence bug when
the actual trigger was typically a third-party monitoring script calling
/api/oauth/token directly. Non-reuse errors keep their original server
description untouched.
Closes#15099.
Regression tests:
- tests/agent/test_credential_pool.py::test_nous_seed_from_singletons_preserves_obtained_at_timestamps
- tests/hermes_cli/test_auth_nous_provider.py::test_refresh_token_reuse_detection_surfaces_actionable_message
- tests/hermes_cli/test_auth_nous_provider.py::test_refresh_non_reuse_error_keeps_original_description
Cron jobs can now specify a per-job working directory. When set, the job
runs as if launched from that directory: AGENTS.md / CLAUDE.md /
.cursorrules from that dir are injected into the system prompt, and the
terminal / file / code-exec tools use it as their cwd (via TERMINAL_CWD).
When unset, old behaviour is preserved (no project context files, tools
use the scheduler's cwd).
Requested by @bluthcy.
## Mechanism
- cron/jobs.py: create_job / update_job accept 'workdir'; validated to
be an absolute existing directory at create/update time.
- cron/scheduler.py run_job: if job.workdir is set, point TERMINAL_CWD
at it and flip skip_context_files to False before building the agent.
Restored in finally on every exit path.
- cron/scheduler.py tick: workdir jobs run sequentially (outside the
thread pool) because TERMINAL_CWD is process-global. Workdir-less jobs
still run in the parallel pool unchanged.
- tools/cronjob_tools.py + hermes_cli/cron.py + hermes_cli/main.py:
expose 'workdir' via the cronjob tool and 'hermes cron create/edit
--workdir ...'. Empty string on edit clears the field.
## Validation
- tests/cron/test_cron_workdir.py (21 tests): normalize, create, update,
JSON round-trip via cronjob tool, tick partition (workdir jobs run on
the main thread, not the pool), run_job env toggle + restore in finally.
- Full targeted suite (tests/cron/, test_cronjob_tools.py, test_cron.py,
test_config_cwd_bridge.py, test_worktree.py): 314/314 passed.
- Live smoke: hermes cron create --workdir $(pwd) works; relative path
rejected; list shows 'Workdir:'; edit --workdir '' clears.
agent/redact.py snapshots _REDACT_ENABLED from HERMES_REDACT_SECRETS at
module-import time. hermes_cli/main.py calls setup_logging() early, which
transitively imports agent.redact — BEFORE any config bridge has run. So
users who set 'security.redact_secrets: false' in config.yaml (instead of
HERMES_REDACT_SECRETS=false in .env) had the toggle silently ignored in
both 'hermes chat' and 'hermes gateway run'.
Bridge config.yaml -> env var in hermes_cli/main.py BEFORE setup_logging.
.env still wins (only set env when unset) — config.yaml is the fallback.
Regression tests in tests/hermes_cli/test_redact_config_bridge.py spawn
fresh subprocesses to verify:
- redact_secrets: false in config.yaml disables redaction
- default (key absent) leaves redaction enabled
- .env HERMES_REDACT_SECRETS=true overrides config.yaml
json.JSONDecodeError inherits from ValueError. The agent loop's
non-retryable classifier at run_agent.py ~L10782 treated any
ValueError/TypeError as a local programming bug and short-circuited
retry. Without a carve-out, a transient JSONDecodeError from a
provider that returned a malformed response body, a truncated stream,
or a router-layer corruption would fail the turn immediately.
Add JSONDecodeError to the existing UnicodeEncodeError exclusion
tuple so the classified-retry logic (which already handles 429/529/
context-overflow/etc.) gets to run on bad-JSON errors.
Tests (tests/run_agent/test_jsondecodeerror_retryable.py):
- JSONDecodeError: NOT local validation
- UnicodeEncodeError: NOT local validation (existing carve-out)
- bare ValueError: IS local validation (programming bug)
- bare TypeError: IS local validation (programming bug)
- source-level assertion that run_agent.py still carries the carve-out
(guards against accidental revert)
Closes#14782
/model kimi-k2.6 on opencode-zen (or glm-5.1 on opencode-go) returned OpenCode's
website 404 HTML page when the user's persisted model.default was a Claude or
MiniMax model. The switched-to chat_completions request hit
https://opencode.ai/zen (or /zen/go) with no /v1 suffix.
Root cause: resolve_runtime_provider() computed api_mode from
model_cfg.get('default') instead of the model being requested. With a Claude
default, it resolved api_mode=anthropic_messages, stripped /v1 from base_url
(required for the Anthropic SDK), then switch_model()'s opencode_model_api_mode
override flipped api_mode back to chat_completions without restoring /v1.
Fix: thread an optional target_model kwarg through resolve_runtime_provider
and _resolve_runtime_from_pool_entry. When the caller is performing an explicit
mid-session model switch (i.e. switch_model()), the target model drives both
api_mode selection and the conditional /v1 strip. Other callers (CLI init,
gateway init, cron, ACP, aux client, delegate, account_usage, tui_gateway) pass
nothing and preserve the existing config-default behavior.
Regression tests added in test_model_switch_opencode_anthropic.py use the REAL
resolver (not a mock) to guard the exact Quentin-repro scenario. Existing tests
that mocked resolve_runtime_provider with 'lambda requested:' had their mock
signatures widened to '**kwargs' to accept the new kwarg.
When a subagent in delegate_task times out before making its first LLM
request, write a structured diagnostic file under
~/.hermes/logs/subagent-timeout-<sid>-<ts>.log capturing enough state
for the user (and us) to debug the hang. The old error message —
'Subagent timed out after Ns with no response. The child may be stuck
on a slow API call or unresponsive network request.' — gave no
observability for the 0-API-call case, which is the hardest to reason
about remotely.
The diagnostic captures:
- timeout config vs actual duration
- goal (truncated to 1000 chars)
- child config: model, provider, api_mode, base_url, max_iterations,
quiet_mode, platform, _delegate_role, _delegate_depth
- enabled_toolsets + loaded tool names
- system prompt byte/char count (catches oversized prompts that
providers silently choke on)
- tool schema count + byte size
- child's get_activity_summary() snapshot
- Python stack of the worker thread at the moment of timeout
(reveals whether the hang is in credential resolution, transport,
prompt construction, etc.)
Wiring:
- _run_single_child captures the worker thread via a small wrapper
around child.run_conversation so we can look up its stack at
timeout.
- After a FuturesTimeoutError, we pull child.get_activity_summary()
to read api_call_count. If 0 AND it was a timeout (not a raise),
_dump_subagent_timeout_diagnostic() is invoked.
- The returned path is surfaced in the error string so the parent
agent (and therefore the user / gateway) sees exactly where to look.
- api_calls > 0 timeouts keep the old 'stuck on slow API call'
phrasing since that's the correct diagnosis for those.
This does NOT change any behavior for successful subagent runs,
non-timeout errors, or subagents that made at least one API call
before hanging.
Tests: 7 cases (tests/tools/test_delegate_subagent_timeout_diagnostic.py)
- output format + required sections + field values
- long-goal truncation with [truncated] marker
- missing / already-exited worker thread branches
- unwritable HERMES_HOME/logs/ returns None without raising
- _run_single_child wiring: 0 API calls → dump + diagnostic_path in error
- _run_single_child wiring: N>0 API calls → no dump, old message
Refs: #14726
When /model selects Custom but model.provider in YAML still reflects a prior provider, trust model.base_url only for loopback hosts or when provider is custom. Consult CUSTOM_BASE_URL before OpenRouter defaults (#14676).
Two related paths where Codex auth failures silently swallowed the
fallback chain instead of switching to the next provider:
1. cli.py — _ensure_runtime_credentials() calls resolve_runtime_provider()
before each turn. When provider is explicitly configured (not "auto"),
an AuthError from token refresh is re-raised and printed as a bold-red
error, returning False before the agent ever starts. The fallback chain
was never tried. Fix: on AuthError, iterate fallback_providers and
switch to the first one that resolves successfully.
2. run_agent.py — inside the codex_responses validity gate (inner retry
loop), response.status in {"failed","cancelled"} with non-empty output
items was treated as a valid response and broke out of the retry loop,
reaching _normalize_codex_response() outside the fallback machinery.
That function raises RuntimeError on status="failed", which propagates
to the outer except with no fallback logic. Fix: detect terminal status
codes before the output_items check and set response_invalid=True so
the existing fallback chain fires normally.
OpenAI's OAuth token endpoint returns errors in a nested shape —
{"error": {"code": "refresh_token_reused", "message": "..."}} —
not the OAuth spec's flat {"error": "...", "error_description": "..."}.
The existing parser only handled the flat shape, so:
- `err.get("error")` returned a dict, the `isinstance(str)` guard
rejected it, and `code` stayed `"codex_refresh_failed"`.
- The dedicated `refresh_token_reused` branch (with its actionable
"re-run codex + hermes auth" message and `relogin_required=True`)
never fired.
- Users saw the generic "Codex token refresh failed with status 401"
when another Codex client (CLI, VS Code extension) had consumed
their single-use refresh token — giving no hint that re-auth was
required.
Parse both shapes, mapping OpenAI's nested `code`/`type` onto the
existing `code` variable so downstream branches (`refresh_token_reused`,
`invalid_grant`, etc.) fire correctly.
Add regression tests covering:
- nested `refresh_token_reused` → actionable message + relogin_required
- nested generic code → code + message surfaced
- flat OAuth-spec `invalid_grant` still handled (back-compat)
- unparseable body → generic fallback message, relogin_required=False
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Follow-up to salvaged PR #13483:
- Default HERMES_UID/HERMES_GID to 10000 (matches Dockerfile's useradd
and the entrypoint's default) instead of 1001. Users should set these
to their own id -u / id -g; document that in the header.
- Dashboard service: bind to 127.0.0.1 without --insecure by default.
The dashboard stores API keys; the original compose file exposed it on
0.0.0.0 with auth explicitly disabled, which the dashboard's own
--insecure help text flags as DANGEROUS.
- Add header comments explaining HERMES_UID usage, the dashboard
security posture, and how to expose the API server safely.
- Remove 'USER hermes' from Dockerfile so entrypoint runs as root and can
usermod/groupmod before gosu drop. Add chmod -R a+rX /opt/hermes so any
remapped UID can read the install directory.
- Fix entrypoint chown logic: always chown -R when HERMES_UID is remapped
from default 10000, not just when top-level dir ownership mismatches.
- Add docker-compose.yml with gateway + dashboard services.
- Add .hermes to .gitignore.
Google AI Studio's free tier (<= 250 req/day for gemini-2.5-flash) is
exhausted in a handful of agent turns, so the setup wizard now refuses
to wire up Gemini when the supplied key is on the free tier, and the
runtime 429 handler appends actionable billing guidance.
Setup-time probe (hermes_cli/main.py):
- `_model_flow_api_key_provider` fires one minimal generateContent call
when provider_id == 'gemini' and classifies the response as
free/paid/unknown via x-ratelimit-limit-requests-per-day header or
429 body containing 'free_tier'.
- Free -> print block message, refuse to save the provider, return.
- Paid -> 'Tier check: paid' and proceed.
- Unknown (network/auth error) -> 'could not verify', proceed anyway.
Runtime 429 handler (agent/gemini_native_adapter.py):
- `gemini_http_error` appends billing guidance when the 429 error body
mentions 'free_tier', catching users who bypass setup by putting
GOOGLE_API_KEY directly in .env.
Tests: 21 unit tests for the probe + error path, 4 tests for the
setup-flow block. All 67 existing gemini tests still pass.
PR #14935 added a Codex-aware context resolver but only new lookups
hit the live /models probe. Users who had run Hermes on gpt-5.5 / 5.4
BEFORE that PR already had the wrong value (e.g. 1,050,000 from
models.dev) persisted in ~/.hermes/context_length_cache.yaml, and the
cache-first lookup in get_model_context_length() returns it forever.
Symptom (reported in the wild by Ludwig, min heo, Gaoge on current
main at 6051fba9d, which is AFTER #14935):
* Startup banner shows context usage against 1M
* Compression fires late and then OpenAI hard-rejects with
'context length will be reduced from 1,050,000 to 128,000'
around the real 272k boundary.
Fix: when the step-1 cache returns a value for an openai-codex lookup,
check whether it's >= 400k. Codex OAuth caps every slug at 272k (live
probe values) so anything at or above 400k is definitionally a
pre-#14935 leftover. Drop that entry from the on-disk cache and fall
through to step 5, which runs the live /models probe and repersists
the correct value (or 272k from the hardcoded fallback if the probe
fails). Non-Codex providers and legitimately-cached Codex entries at
272k are untouched.
Changes:
- agent/model_metadata.py:
* _invalidate_cached_context_length() — drop a single entry from
context_length_cache.yaml and rewrite the file.
* Step-1 cache check in get_model_context_length() now gates
provider=='openai-codex' entries >= 400k through invalidation
instead of returning them.
Tests (3 new in TestCodexOAuthContextLength):
- stale 1.05M Codex entry is dropped from disk AND re-resolved
through the live probe to 272k; unrelated cache entries survive.
- fresh 272k Codex entry is respected (no probe call, no invalidation).
- non-Codex 1M entries (e.g. anthropic/claude-opus-4.6 on OpenRouter)
are unaffected — the guard is strictly scoped to openai-codex.
Full tests/agent/test_model_metadata.py: 88 passed.
Make the main-branch test suite pass again. Most failures were tests
still asserting old shapes after recent refactors; two were real source
bugs.
Source fixes:
- tools/mcp_tool.py: _kill_orphaned_mcp_children() slept 2s on every
shutdown even when no tracked PIDs existed, making test_shutdown_is_parallel
measure ~3s for 3 parallel 1s shutdowns. Early-return when pids is empty.
- hermes_cli/tips.py: tip 105 was 157 chars; corpus max is 150.
Test fixes (mostly stale mock targets / missing fixture fields):
- test_zombie_process_cleanup, test_agent_cache: patch run_agent.cleanup_vm
(the local name bound at import), not tools.terminal_tool.cleanup_vm.
- test_browser_camofox: patch tools.browser_camofox.load_config, not
hermes_cli.config.load_config (the source module, not the resolved one).
- test_flush_memories_codex._chat_response_with_memory_call: add
finish_reason, tool_call.id, tool_call.type so the chat_completions
transport normalizer doesn't AttributeError.
- test_concurrent_interrupt: polling_tool signature now accepts
messages= kwarg that _invoke_tool() passes through.
- test_minimax_provider: add _fallback_chain=[] to the __new__'d agent
so switch_model() doesn't AttributeError.
- test_skills_config: SKILLS_DIR MagicMock + .rglob stopped working
after the scanner switched to agent.skill_utils.iter_skill_index_files
(os.walk-based). Point SKILLS_DIR at a real tmp_path and patch
agent.skill_utils.get_external_skills_dirs.
- test_browser_cdp_tool: browser_cdp toolset was intentionally split into
'browser-cdp' (commit 96b0f3700) so its stricter check_fn doesn't gate
the whole browser toolset; test now expects 'browser-cdp'.
- test_registry: add tools.browser_dialog_tool to the expected
builtin-discovery set (PR #14540 added it).
- test_file_tools TestPatchHints: patch_tool surfaces hints as a '_hint'
key on the JSON payload, not inline '[Hint: ...' text.
- test_write_deny test_hermes_env: resolve .env via get_hermes_home() so
the path matches the profile-aware denylist under hermetic HERMES_HOME.
- test_checkpoint_manager test_falls_back_to_parent: guard the walk-up
so a stray /tmp/pyproject.toml on the host doesn't pick up /tmp as the
project root.
- test_quick_commands: set cli.session_id in the __new__'d CLI so the
alias-args path doesn't trip AttributeError when fuzzy-matching leaks
a skill command across xdist test distribution.
Gemini's Schema validator requires every `enum` entry to be a string,
even when the parent `type` is integer/number/boolean. Discord's
`auto_archive_duration` parameter (`type: integer, enum: [60, 1440,
4320, 10080]`) tripped this on every request that shipped the full
tool catalog to generativelanguage.googleapis.com, surfacing as
`Gateway: Non-retryable client error: Gemini HTTP 400 (INVALID_ARGUMENT)
Invalid value ... (TYPE_STRING), 60` and aborting the turn.
Sanitize by dropping the `enum` key when the declared type is numeric
or boolean and any entry is non-string. The `type` and `description`
survive, so the model still knows the allowed values; the tool handler
keeps its own runtime validation. Other providers (OpenAI,
OpenRouter, Anthropic) are unaffected — the sanitizer only runs for
native Gemini / cloudcode adapters.
Reported by @selfhostedsoul on Discord with hermes debug share.
Adds an optional bank_id_template config that derives the bank name at
initialize() time from runtime context. Existing users with a static
bank_id keep the current behavior (template is empty by default).
Supported placeholders:
{profile} — active Hermes profile (agent_identity kwarg)
{workspace} — Hermes workspace (agent_workspace kwarg)
{platform} — cli, telegram, discord, etc.
{user} — platform user id (gateway sessions)
{session} — session id
Unsafe characters in placeholder values are sanitized, and empty
placeholders collapse cleanly (e.g. "hermes-{user}" with no user
becomes "hermes"). If the template renders empty, the static bank_id
is used as a fallback.
Common uses:
bank_id_template: hermes-{profile} # isolate per Hermes profile
bank_id_template: {workspace}-{profile} # workspace + profile scoping
bank_id_template: hermes-{user} # per-user banks for gateway
Reusing session_id as document_id caused data loss on /resume: when
the session is loaded again, _session_turns starts empty and the next
retain replaces the entire previously stored content.
Now each process lifecycle gets its own document_id formed as
{session_id}-{startup_timestamp}, so:
- Same session, same process: turns accumulate into one document (existing behavior)
- Resume (new process, same session): writes a new document, old one preserved
- Forks: child process gets its own document; parent's doc is untouched
Also adds session lineage tags so all processes for the same session
(or its parent) can still be filtered together via recall:
- session:<session_id> on every retain
- parent:<parent_session_id> when initialized with parent_session_id
Closes#6602
The existing test_local_embedded_setup_materializes_profile_env expected
exact equality on ~/.hermes/.env content; the new HINDSIGHT_TIMEOUT=120
line from the timeout feature now appears in that file. Append it to the
expected string so the test reflects the new post_setup output.
The previous commit added HINDSIGHT_TIMEOUT as a configurable env var,
but _run_sync still used the hardcoded _DEFAULT_TIMEOUT (120s). All
async operations (recall, retain, reflect, aclose) now go through an
instance method that uses self._timeout, so the configured value is
actually applied.
Also: added backward-compatible alias comment for the module-level
function.
The Hindsight Cloud API can take 30-40 seconds per request. The
hardcoded 30s timeout was too aggressive and caused frequent
timeout errors. This patch:
1. Adds HINDSIGHT_TIMEOUT environment variable (default: 120s)
2. Adds timeout to the config schema for setup wizard visibility
3. Uses the configurable timeout in both _run_sync() and client creation
4. Reads from config.json or env var, falling back to 120s default
This makes the timeout upgrade-proof — users can set it via env var
or config without patching source code.
Signed-off-by: Kumar <kumar@tekgnosis.net>
The module-global `_loop` / `_loop_thread` pair is shared across every
`HindsightMemoryProvider` instance in the process — the plugin loader
creates one provider per `AIAgent`, and the gateway creates one `AIAgent`
per concurrent chat session (Telegram/Discord/Slack/CLI).
`HindsightMemoryProvider.shutdown()` stopped the shared loop when any one
session ended. That stranded the aiohttp `ClientSession` and `TCPConnector`
owned by every sibling provider on a now-dead loop — they were never
reachable for close and surfaced as the `Unclosed client session` /
`Unclosed connector` warnings reported in #11923.
Fix: stop stopping the shared loop in `shutdown()`. Per-provider cleanup
still closes that provider's own client via `self._client.aclose()`. The
loop runs on a daemon thread and is reclaimed on process exit; keeping
it alive between provider shutdowns means sibling providers can drain
their own sessions cleanly.
Regression tests in `tests/plugins/memory/test_hindsight_provider.py`
(`TestSharedEventLoopLifecycle`):
- `test_shutdown_does_not_stop_shared_event_loop` — two providers share
the loop; shutting down one leaves the loop live for the other. This
test reproduces the #11923 leak on `main` and passes with the fix.
- `test_client_aclose_called_on_cloud_mode_shutdown` — each provider's
own aiohttp session is still closed via `aclose()`.
Fixes#11923.
The cherry-picked model_picker test installed its own discord mock at
module-import time via a local _ensure_discord_mock(), overwriting
sys.modules['discord'] with a mock that lacked attributes other
gateway tests needed (Intents.default(), File, app_commands.Choice).
On pytest-xdist workers that collected test_discord_model_picker.py
first, the shared mock in tests/gateway/conftest.py got clobbered and
downstream tests failed with AttributeError / TypeError against
missing mock attrs. Classic sys.modules cross-test pollution (see
xdist-cross-test-pollution skill).
Fix:
- Extend the canonical _ensure_discord_mock() in tests/gateway/conftest.py
to cover everything the model_picker test needs: real View/Select/
Button/SelectOption classes (not MagicMock sentinels), an Embed
class that preserves title/description/color kwargs for assertion,
and Color.greyple.
- Strip the duplicated mock-setup block from test_discord_model_picker.py
and rely on the shared mock that conftest installs at collection
time.
Regression check:
scripts/run_tests.sh tests/gateway/ tests/hermes_cli/ -k 'discord or model or copilot or provider' -o 'addopts='
1291 passed (was 1288 passed + 3 xdist-ordered failures before this commit).
Keep Discord Copilot model switching responsive and current by refreshing picker data from the live catalog when possible, correcting the curated fallback list, and clearing stale controls before the switch completes.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Keep auxiliary provider resolution aligned with the switch and persisted main-provider paths when models.dev returns github-copilot slugs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Xiaomi's API (api.xiaomimimo.com) requires lowercase model IDs like
"mimo-v2.5-pro" but rejects mixed-case names like "MiMo-V2.5-Pro"
that users copy from marketing docs or the ProviderEntry description.
Add _LOWERCASE_MODEL_PROVIDERS set and apply .lower() to model names
for providers in this set (currently just xiaomi) after stripping the
provider prefix. This ensures any case variant in config.yaml is
normalized before hitting the API.
Other providers (minimax, zai, etc.) are NOT affected — their APIs
accept mixed case (e.g. MiniMax-M2.7).
When user runs
✓ Memory provider: built-in only
Saved to config.yaml and leaves the API key blank,
the old code skipped writing it entirely. This caused the uvx daemon
launcher to fail at startup because it couldn't distinguish between
"key not configured" and "explicitly blank key."
Now HINDSIGHT_LLM_API_KEY is always written to .env so the value
is either set or explicitly empty.
- Load prompt_caching.cache_ttl in AIAgent (5m default, 1h opt-in)
- Document DEFAULT_CONFIG and developer guide example
- Add unit tests for default, 1h, and invalid TTL fallback
Made-with: Cursor
Auxiliary tasks (session_search, flush_memories, approvals, compression,
vision, etc.) that route to a named custom provider declared under
config.yaml 'providers:' with 'api_mode: anthropic_messages' were
silently building a plain OpenAI client and POSTing to
{base_url}/chat/completions, which returns 404 on Anthropic-compatible
gateways that only expose /v1/messages.
Two gaps caused this:
1. hermes_cli/runtime_provider.py::_get_named_custom_provider — the
providers-dict branch (new-style) returned only name/base_url/api_key/
model and dropped api_mode. The legacy custom_providers-list branch
already propagated it correctly. The dict branch now parses and
returns api_mode via _parse_api_mode() in both match paths.
2. agent/auxiliary_client.py::resolve_provider_client — the named
custom provider block at ~L1740 ignored custom_entry['api_mode']
and unconditionally built an OpenAI client (only wrapping for
Codex/Responses). It now mirrors _try_custom_endpoint()'s three-way
dispatch: anthropic_messages → AnthropicAuxiliaryClient (async wrapped
in AsyncAnthropicAuxiliaryClient), codex_responses → CodexAuxiliaryClient,
otherwise plain OpenAI. An explicit task-level api_mode override
still wins over the provider entry's declared api_mode.
Fixes#15033
Tests: tests/agent/test_auxiliary_named_custom_providers.py gains a
TestProvidersDictApiModeAnthropicMessages class covering
- providers-dict preserves valid api_mode
- invalid api_mode values are dropped
- missing api_mode leaves the entry unchanged (no regression)
- resolve_provider_client returns (Async)AnthropicAuxiliaryClient for
api_mode=anthropic_messages
- full chain via get_text_auxiliary_client / get_async_text_auxiliary_client
with an auxiliary.<task> override
- providers without api_mode still use the OpenAI-wire path
When context compression fires mid-session, run_agent's _compress_context
ends the current session, creates a new child session linked by
parent_session_id, and resets the SQLite flush cursor. New messages land
in the child; the parent row ends up with message_count = 0. A user who
runs 'hermes --resume <original_id>' sees a blank chat even though the
transcript exists — just under a descendant id.
PR #12920 already fixed the exit banner to print the live descendant id
at session end, but that didn't help users who resume by a session id
captured BEFORE the banner update (scripts, sessions list, old terminal
scrollback) or who type the parent id manually.
Fix: add SessionDB.resolve_resume_session_id() which walks the
parent→child chain forward and returns the first descendant with at
least one message row. Wire it into all three resume entry points:
- HermesCLI._preload_resumed_session() (early resume at run() time)
- HermesCLI._init_agent() (the classical resume path)
- /resume slash command
Semantics preserved when the chain has no descendants with messages,
when the requested session already has messages, or when the id is
unknown. A depth cap of 32 guards against malformed loops.
This does NOT concatenate the pre-compression parent transcript into
the child — the whole point of compression is to shrink that, so
replaying it would blow the cache budget we saved. We just jump to
the post-compression child. The summary already reflects what was
compressed away.
Tests: tests/hermes_state/test_resolve_resume_session_id.py covers
- the exact 6-session shape from the issue
- passthrough when session has messages / no descendants
- passthrough for nonexistent / empty / None input
- middle-of-chain redirects
- fork resolution (prefers most-recent child)
Closes#15000
Pin the behaviour added in the preceding commit — `_get_proxy_for_base_url()`
must return None for hosts covered by NO_PROXY and the HTTPS_PROXY otherwise,
and the full `_create_openai_client()` path must NOT mount HTTPProxy for a
NO_PROXY host.
Refs: #14966
Follow-up to the allowed_channels wildcard fix in the preceding commit.
The same '*' literal trap affected two other Discord channel config lists:
- DISCORD_IGNORED_CHANNELS: '*' was stored as the literal string in the
ignored set, and the intersection check never matched real channel IDs,
so '*' was a no-op instead of silencing every channel.
- DISCORD_FREE_RESPONSE_CHANNELS: same shape — '*' never matched, so
the bot still required a mention everywhere.
Add a '*' short-circuit to both checks, matching the allowed_channels
semantics. Extend tests/gateway/test_discord_allowed_channels.py with
regression coverage for all three lists.
Refs: #14920
allowed_channels: "*" in config (or DISCORD_ALLOWED_CHANNELS="*" env var)
is meant to allow all channels, but the check was comparing numeric channel
IDs against the literal string set {"*"} via set intersection — always empty,
so every message was silently dropped.
Add a "*" short-circuit before the set intersection, consistent with every
other platform's allowlist handling (Signal, Slack, Telegram all do this).
Fixes#14920
Follow-up to PR #14533 — applies the same _resolve_requests_verify()
treatment to the one requests.get() site the PR missed (Codex OAuth
chatgpt.com /models probe). Keeps all seven requests.get() callsites
in model_metadata.py consistent so HERMES_CA_BUNDLE / REQUESTS_CA_BUNDLE /
SSL_CERT_FILE are honored everywhere.
Co-authored-by: teknium1 <teknium@hermes-agent>
Follow-up to aeff6dfe:
- Fix semantic error in VALID_HOOKS inline comment ("after core auth" ->
"before auth"). Hook intentionally runs BEFORE auth so plugins can
handle unauthorized senders without triggering the pairing flow.
- Fix wrong class name in the same comment (HermesGateway ->
GatewayRunner, matching gateway/run.py).
- Add a full ### pre_gateway_dispatch section in
website/docs/user-guide/features/hooks.md (matches the pattern of
every other plugin hook: signature, params table, fires-where,
return-value table, use cases, two worked examples) plus a row in
the quick-reference table.
- Add the anchor link on the plugins.md table row so it matches the
other hook entries.
No code behavior change.
Introduces a new plugin hook `pre_gateway_dispatch` fired once per
incoming MessageEvent in `_handle_message`, after the internal-event
guard but before the auth / pairing chain. Plugins may return a dict
to influence flow:
{"action": "skip", "reason": "..."} -> drop (no reply)
{"action": "rewrite", "text": "..."} -> replace event.text
{"action": "allow"} / None -> normal dispatch
Motivation: gateway-level message-flow patterns that don't fit cleanly
into any single adapter — e.g. listen-only group-chat windows (buffer
ambient messages, collapse on @mention), or human-handover silent
ingest (record messages while an owner handles the chat manually).
Today these require forking core; with this hook they can live in a
single profile-agnostic plugin.
Hook runs BEFORE auth so plugins can handle unauthorized senders
(e.g. customer-service handover ingest) without triggering the
pairing-code flow. Exceptions in plugin callbacks are caught and
logged; the first non-None action dict wins, remaining results are
ignored.
Includes:
- `VALID_HOOKS` entry + inline doc in `hermes_cli/plugins.py`
- Invocation block in `gateway/run.py::_handle_message`
- 5 new tests in `tests/gateway/test_pre_gateway_dispatch.py`
(skip, rewrite, allow, exception safety, internal-event bypass)
- 2 additional tests in `tests/hermes_cli/test_plugins.py`
- Table entry in `website/docs/user-guide/features/plugins.md`
Made-with: Cursor
- hermes_cli/auth.py: add _default_verify() with macOS Homebrew certifi
fallback (mirrors weixin 3a0ec1d93). Extend env var chain to include
REQUESTS_CA_BUNDLE so one env var works across httpx + requests paths.
- agent/model_metadata.py: add _resolve_requests_verify() reading
HERMES_CA_BUNDLE / REQUESTS_CA_BUNDLE / SSL_CERT_FILE in priority
order. Apply explicit verify= to all 6 requests.get callsites.
- Tests: 18 new unit tests + autouse platform pin on existing
TestResolveVerifyFallback to keep its "returns True" assertions
platform-independent.
Empirically verified against self-signed HTTPS server: requests honors
REQUESTS_CA_BUNDLE only; httpx honors SSL_CERT_FILE only. Hermes now
honors all three everywhere.
Triggered by Discord reports — Nous OAuth SSL failure on macOS
Homebrew Python; custom provider self-signed cert ignored despite
REQUESTS_CA_BUNDLE set in env.
The aliases were added to hermes_cli/providers.py but auth.py has its own
_PROVIDER_ALIASES table inside resolve_provider() that is consulted before
PROVIDER_REGISTRY lookup. Without this, provider: alibaba_coding in
config.yaml (the exact repro from #14940) raised 'Unknown provider'.
Mirror the three aliases into auth.py so resolve_provider() accepts them.
The alibaba-coding-plan provider (coding-intl.dashscope.aliyuncs.com/v1)
was not registered in providers.py or auth.py. When users set
provider: alibaba_coding or provider: alibaba-coding-plan in config.yaml,
Hermes could not resolve the credentials and fell back to OpenRouter
or rejected the request with HTTP 401/402 (issue #14940).
Changes:
- providers.py: add HermesOverlay for alibaba-coding-plan with
ALIBABA_CODING_PLAN_BASE_URL env var support
- providers.py: add aliases alibaba_coding, alibaba-coding,
alibaba_coding_plan -> alibaba-coding-plan
- auth.py: add ProviderConfig for alibaba-coding-plan with:
- inference_base_url: https://coding-intl.dashscope.aliyuncs.com/v1
- api_key_env_vars: ALIBABA_CODING_PLAN_API_KEY, DASHSCOPE_API_KEY
Fixes#14940
Manual /compress crashed with 'LCMEngine' object has no attribute
'_align_boundary_forward' when any context-engine plugin was active.
The gateway handler reached into _align_boundary_forward and
_find_tail_cut_by_tokens on tmp_agent.context_compressor, but those
are ContextCompressor-specific — not part of the generic ContextEngine
ABC — so every plugin engine (LCM, etc.) raised AttributeError.
- Add optional has_content_to_compress(messages) to ContextEngine ABC
with a safe default of True (always attempt).
- Override it in the built-in ContextCompressor using the existing
private helpers — preserves exact prior behavior for 'compressor'.
- Rewrite gateway /compress preflight to call the ABC method, deleting
the private-helper reach-in.
- Add focus_topic to the ABC compress() signature. Make _compress_context
retry without focus_topic on TypeError so older strict-sig plugins
don't crash on manual /compress <focus>.
- Regression test with a fake ContextEngine subclass that only
implements the ABC (mirrors LCM's surface).
Reported by @selfhostedsoul (Discord, Apr 22).
faster-whisper's device="auto" picks CUDA when ctranslate2's wheel
ships CUDA shared libs, even on hosts without the NVIDIA runtime
(libcublas.so.12 / libcudnn*). On those hosts the model often loads
fine but transcribe() fails at first dlopen, and the broken model
stays cached in the module-global — every subsequent voice message
in the gateway process fails identically until restart.
- Add _load_local_whisper_model() wrapper: try auto, catch missing-lib
errors, retry on device=cpu compute_type=int8.
- Wrap transcribe() with the same fallback: evict cached model, reload
on CPU, retry once. Required because the dlopen failure only surfaces
at first kernel launch, not at model construction.
- Narrow marker list (libcublas, libcudnn, libcudart, 'cannot be loaded',
'no kernel image is available', 'no CUDA-capable device', driver
mismatch). Deliberately excludes 'CUDA out of memory' and similar —
those are real runtime failures that should surface, not be silently
retried on CPU.
- Tests for load-time fallback, runtime fallback (with cached-model
eviction verified), and the OOM non-fallback path.
Reported via Telegram voice-message dumps on WSL2 hosts where libcublas
isn't installed by default.
Local llama.cpp servers (e.g. ggml-org/llama.cpp:full-cuda) fail the entire
request with HTTP 400 'Unable to generate parser for this template. ...
Unrecognized schema: "object"' when any tool schema contains shapes its
json-schema-to-grammar converter can't handle:
* 'type': 'object' without 'properties'
* bare string schema values ('additionalProperties: "object"')
* 'type': ['X', 'null'] arrays (nullable form)
Cloud providers accept these silently, so they ship from external MCP
servers (Atlassian, GCloud, Datadog) and from a couple of our own tools.
Changes
- tools/schema_sanitizer.py: walks the finalized tool list right before it
leaves get_tool_definitions() and repairs the hostile shapes in a deep
copy. No-op on well-formed schemas. Recurses into properties, items,
additionalProperties, anyOf/oneOf/allOf, and $defs.
- model_tools.get_tool_definitions(): invoke the sanitizer as the last
step so all paths (built-in, MCP, plugin, dynamically-rebuilt) get
covered uniformly.
- tools/browser_cdp_tool.py, tools/mcp_tool.py: fix our own bare-object
schemas so sanitization isn't load-bearing for in-repo tools.
- tui_gateway/server.py: _load_enabled_toolsets() was passing
include_default_mcp_servers=False at runtime. That's the config-editing
variant (see PR #3252) — it silently drops every default MCP server
from the TUI's enabled_toolsets, which is why the TUI didn't hit the
llama.cpp crash (no MCP tools sent at all). Switch to True so TUI
matches CLI behavior.
Tests
tests/tools/test_schema_sanitizer.py (17 tests) covers the individual
failure modes, well-formed pass-through, deep-copy isolation, and
required-field pruning.
E2E: loaded the default 'hermes-cli' toolset with MCP discovery and
confirmed all 27 resolved tool schemas pass a llama.cpp-compatibility
walk (no 'object' node missing 'properties', no bare-string schema
values).
Round-2 Copilot review on #14968 caught two leftover spots that didn't
fully respect per-section overrides:
- messageLine.tsx (trail branch): the previous fix gated on
`SECTION_NAMES.some(...)`, which stayed true whenever any section was
visible. With `thinking: 'expanded'` as the new built-in default,
that meant `display.sections.tools: hidden` left an empty wrapper Box
alive for trail messages. Now gates on the actual content-bearing
sections for a trail message — `tools` OR `activity` — so a
tools-hidden config drops the wrapper cleanly.
- messageLine.tsx (showDetails): still keyed off the global
`detailsMode !== 'hidden'`, so per-section overrides like
`sections.thinking: expanded` couldn't escape global hidden for
assistant messages with reasoning + tool metadata. Recomputed via
resolved per-section modes (`thinkingMode`/`toolsMode`).
- types.ts: rewrote the SectionVisibility doc comment to reflect the
actual resolution order (explicit override → SECTION_DEFAULTS →
global), so the docstring stops claiming "missing keys fall back to
the global mode" when SECTION_DEFAULTS now layers in between.
All three lookups (thinking/tools/activity) are computed once at the
top of MessageLine and shared by every branch.
Extends SECTION_DEFAULTS so the out-of-the-box TUI shows the turn as
a live transcript (reasoning + tool calls streaming inline) instead of
a wall of `▸` chevrons the user has to click every turn.
Final default matrix:
- thinking: expanded
- tools: expanded
- activity: hidden (unchanged from the previous commit)
- subagents: falls through to details_mode (collapsed by default)
Everything explicit in `display.sections` still wins, so anyone who
already pinned an override keeps their layout. One-line revert is
`display.sections.<name>: collapsed`.
Copilot review on #14968 caught that the early returns gated on the
global `detailsMode === 'hidden'` short-circuited every render path
before sectionMode() got a chance to apply per-section overrides — so
`details_mode: hidden` + `sections.tools: expanded` was silently a no-op.
Three call sites had the same bug shape; all now key off the resolved
section modes:
- ToolTrail: replace the `detailsMode === 'hidden'` early return with
an `allHidden = every section resolved to hidden` check. When that's
true, fall back to the floating-alert backstop (errors/warnings) so
quiet-mode users aren't blind to ambient failures, and update the
comment block to match the actual condition.
- messageLine.tsx: drop the same `detailsMode === 'hidden'` pre-check
on `msg.kind === 'trail'`; only skip rendering the wrapper when every
section resolves to hidden (`SECTION_NAMES.some(...) !== 'hidden'`).
- useMainApp.ts: rebuild `showProgressArea` around `anyPanelVisible`
instead of branching on the global mode. This also fixes the
suppressed Copilot concern about an empty wrapper Box rendering above
the streaming area when ToolTrail returns null.
Regression test in details.test.ts pins the override-escapes-hidden
behaviour for tools/thinking/activity. 271/271 vitest, lints clean.
- domain/details: extract `norm()`, fold parseDetailsMode + resolveSections
into terser functional form, reject array values for resolveSections
- slash /details: destructure tokens, factor reset/mode into one dispatch,
drop DETAIL_MODES set + DetailsMode/SectionName imports (parseDetailsMode
+ isSectionName narrow + return), centralize usage strings
- ToolTrail: collapse 4 separate xxxSection vars into one memoized
`visible` map; effect deps stabilize on the memo identity instead of
4 primitives
The activity panel (gateway hints, terminal-parity nudges, background
notifications) is noise for the typical day-to-day user, who only cares
about thinking + tools + streamed content. Make `hidden` the built-in
default for that section so users land on the quiet mode out of the box.
Tool failures still render inline on the failing tool row, so this
default suppresses the noise feed without losing the signal.
Opt back in with `display.sections.activity: collapsed` (chevron) or
`expanded` (always open) in `~/.hermes/config.yaml`, or live with
`/details activity collapsed`.
Implementation: SECTION_DEFAULTS in domain/details.ts, applied as the
fallback in `sectionMode()` between the explicit override and the
global details_mode. Existing `display.sections.activity` overrides
take precedence — no migration needed for users who already set it.
Wrap the existing version label in the welcome-banner panel title
('Hermes Agent v… · upstream … · local …') with an OSC-8 terminal
hyperlink pointing at the latest git tag's GitHub release page
(https://github.com/NousResearch/hermes-agent/releases/tag/<tag>).
Clickable in modern terminals (iTerm2, WezTerm, Windows Terminal,
GNOME Terminal, Kitty, etc.); degrades to plain text on terminals
without OSC-8 support. No new line added to the banner.
New get_latest_release_tag() helper runs 'git describe --tags
--abbrev=0' in the Hermes checkout (3s timeout, per-process cache,
silent fallback for non-git/pip installs and forks without tags).
OpenRouter returns a 404 with the specific message
'No endpoints available matching your guardrail restrictions and data
policy. Configure: https://openrouter.ai/settings/privacy'
when a user's account-level privacy setting excludes the only endpoint
serving a model (e.g. DeepSeek V4 Pro, which today is hosted only by
DeepSeek's own endpoint that may log inputs).
Before this change we classified it as model_not_found, which was
misleading (the model exists) and triggered provider fallback (useless —
the same account setting applies to every OpenRouter call).
Now it classifies as a new FailoverReason.provider_policy_blocked with
retryable=False, should_fallback=False. The error body already contains
the fix URL, so the user still gets actionable guidance.
- disable ANSI dim on VTE terminals by default so dark-background reasoning and accents stay readable
- suppress local multiplexer OSC52 echo while preserving remote passthrough and add regression coverage
On ChatGPT Codex OAuth every gpt-5.x slug actually caps at 272,000 tokens,
but Hermes was resolving gpt-5.5 / gpt-5.4 to 1,050,000 (from models.dev)
because openai-codex aliases to the openai entry there. At 1.05M the
compressor never fires and requests hard-fail with 'context window
exceeded' around the real 272k boundary.
Verified live against chatgpt.com/backend-api/codex/models:
gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.3-codex, gpt-5.2-codex,
gpt-5.2, gpt-5.1-codex-max → context_window = 272000
Changes:
- agent/model_metadata.py:
* _fetch_codex_oauth_context_lengths() — probe the Codex /models
endpoint with the OAuth bearer token and read context_window per
slug (1h in-memory TTL).
* _resolve_codex_oauth_context_length() — prefer the live probe,
fall back to hardcoded _CODEX_OAUTH_CONTEXT_FALLBACK (all 272k).
* Wire into get_model_context_length() when provider=='openai-codex',
running BEFORE the models.dev lookup (which returns 1.05M). Result
persists via save_context_length() so subsequent lookups skip the
probe entirely.
* Fixed the now-wrong comment on the DEFAULT_CONTEXT_LENGTHS gpt-5.5
entry (400k was never right for Codex; it's the catch-all for
providers we can't probe live).
Tests (4 new in TestCodexOAuthContextLength):
- fallback table used when no token is available (no models.dev leakage)
- live probe overrides the fallback
- probe failure (non-200) falls back to hardcoded 272k
- non-codex providers (openrouter, direct openai) unaffected
Non-codex context resolution is unchanged — the Codex branch only fires
when provider=='openai-codex'.
* docs: browser CDP supervisor design (for upcoming PR)
Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.
Supersedes #12550.
No code changes in this commit.
* feat(browser): add persistent CDP supervisor for dialog + frame detection
Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.
Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.
Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.
Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.
Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.
E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.
No agent-facing tool wiring in this commit (comes next).
* feat(browser): add browser_dialog tool wired to CDP supervisor
Agent-facing response-only tool. Schema:
action: 'accept' | 'dismiss' (required)
prompt_text: response for prompt() dialogs (optional)
dialog_id: disambiguate when multiple dialogs queued (optional)
Handler:
SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)
check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.
Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.
* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot
Supervisor lifecycle:
* _get_session_info lazy-starts the supervisor after a session row is
materialized — covers every backend code path (Browserbase, cdp_url
override, /browser connect, future providers) with one hook.
* cleanup_browser(task_id) stops the supervisor for that task first
(before the backend tears down CDP).
* cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
* /browser connect eagerly starts the supervisor for task 'default'
so the first snapshot already shows pending_dialogs.
* /browser disconnect stops the supervisor.
CDP URL resolution for the supervisor:
1. BROWSER_CDP_URL / browser.cdp_url override.
2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).
browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.
Config defaults:
* browser.dialog_policy: 'must_respond' (new)
* browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.
Deadlock fix in supervisor event dispatch:
* _on_dialog_opening and _on_target_attached used to await CDP calls
while the reader was still processing an event — but only the reader
can set the response Future, so the call timed out.
* Both now fire asyncio.create_task(...) so the reader stays pumping.
* auto_dismiss/auto_accept now actually close the dialog immediately.
Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
* supervisor start/snapshot
* main-frame alert detection + dismiss
* iframe.contentWindow alert
* prompt() with prompt_text reply
* respond with no pending dialog -> clean error
* auto_dismiss clears on event
* registry idempotency
* registry stop -> snapshot reports inactive
* browser_dialog tool no-supervisor error
* browser_dialog invalid action
* browser_dialog end-to-end via tool handler
xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.
* docs(browser): document browser_dialog tool + CDP supervisor
- user-guide/features/browser.md: new browser_dialog section with
workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
toolset row with note on pending_dialogs / frame_tree snapshot fields
Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).
* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility
Found via Browserbase E2E test that revealed two production-critical issues:
1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
CDP proxy tears down our long-lived WebSocket whenever a short-lived
client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
Fixed with a reconnecting _run loop that re-attaches with exponential
backoff on drops. _page_session_id and _child_sessions are reset on each
reconnect; pending_dialogs and frames are preserved across reconnects.
2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
Playwright-based CDP proxy dismisses alert/confirm/prompt before our
Page.handleJavaScriptDialog call can respond. So pending_dialogs is
empty by the time the agent reads a snapshot on Browserbase.
Added a recent_dialogs ring buffer (capacity 20) that retains a
DialogRecord for every dialog that opened, with a closed_by tag:
* 'agent' — agent called browser_dialog
* 'auto_policy' — local auto_dismiss/auto_accept fired
* 'watchdog' — must_respond timeout auto-dismissed (300s default)
* 'remote' — browser/backend closed it on us (Browserbase)
Agents on Browserbase now see the dialog history with closed_by='remote'
so they at least know a dialog fired, even though they couldn't respond.
3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
'message' field (CDP spec has only 'result' and 'userInput') but our
_on_dialog_closed was matching on message. Fixed to match by session_id
+ oldest-first, with a safety assumption that only one dialog is in
flight per session (the JS thread is blocked while a dialog is up).
Docs + tests updated:
* browser.md: new availability matrix showing the three backends and
which mode (pending / recent / response) each supports
* developer-guide/browser-supervisor.md: three-field snapshot schema
with closed_by semantics
* test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
passing against real Chrome)
E2E verified both backends:
* Local Chrome via /browser connect: detect + respond full workflow
(smoke_supervisor.py all 7 scenarios pass)
* Browserbase: detect via recent_dialogs with closed_by='remote'
(smoke_supervisor_browserbase_v2.py passes)
Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.
* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)
Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.
The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.
Flow when a page calls alert('hi'):
1. window.alert override intercepts, builds XHR GET to
http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
it as a pending dialog with bridge_request_id set
4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
5. Supervisor calls Fetch.fulfillRequest with JSON body:
{accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
6. The injected script parses the body, returns the appropriate value
from the override (undefined for alert, bool for confirm, string|null
for prompt)
This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.
Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.
Native-dialog path kept as a fallback for backends that don't auto-dismiss
(so a page that somehow bypasses our override — e.g. iframes that load
after Fetch.enable but before the init-script runs — still gets observed
via Page.javascriptDialogOpening).
E2E VERIFIED:
* Local Chrome: 13/13 pytest tests green (12 original + new
test_bridge_captures_prompt_and_returns_reply_text that asserts
window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
* Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
- alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
- prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
→ page.prompt_ret === 'AGENT-REPLY' ✓
- confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
- confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓
Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.
* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)
Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).
Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.
Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.
Agent workflow:
1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
params={'expression': 'document.title', 'returnByValue': True})
3. Supervisor dispatches the call on the OOPIF's child session
Supervisor state fixes needed along the way:
* _on_frame_detached now skips reason='swap' (frame migrating processes)
* _on_frame_detached also skips when the frame is an OOPIF with a live
child session — Browserbase fires spurious remove events when a
same-origin iframe gets promoted to OOPIF
* _on_target_detached clears cdp_session_id but KEEPS the frame record
so the agent still sees the OOPIF in frame_tree during transient
session flaps
E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
browser_cdp(method='Runtime.evaluate',
params={'expression': 'document.title', 'returnByValue': True},
frame_id=<OOPIF>)
→ {'success': True, 'result': {'value': 'Example Domain'}}
The iframe is <iframe src='https://example.com/'> inside a top-level
data: URL page on a real Browserbase session. The agent Runtime.evaluates
INSIDE the cross-origin iframe and gets example.com's title back.
Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
* test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
verifies routing via supervisor, Runtime.evaluate returns 1+1=2
* test_browser_cdp_frame_id_missing_supervisor — clean error when no
supervisor attached
* test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
frame_id
Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.
* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process
When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:
* 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
* Chrome with --site-per-process so the cross-origin iframe becomes a
real OOPIF in its own process
* Navigate, find OOPIF in supervisor.frame_tree, call
browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
through the supervisor's child session
* Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
inner page, retrieved via OOPIF eval)
PASSED on 2026-04-23.
Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.
chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.
Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.
* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count
Pre-merge docs audit revealed two gaps:
1. user-guide/configuration.md browser config example was missing the
two new dialog_* knobs. Added with a short table explaining
must_respond / auto_dismiss / auto_accept semantics and a link to
the feature page for the full workflow.
2. reference/tools-reference.md header said '54 built-in tools' — real
count on main is 54, this branch adds browser_dialog so it's 55.
Fixed the header. (browser count was already correctly bumped
11 -> 12 in the earlier docs commit.)
No code changes.
Generates a full dedicated Docusaurus page for every one of the 132 skills
(73 bundled + 59 optional) under website/docs/user-guide/skills/{bundled,optional}/<category>/.
Each page carries the skill's description, metadata (version, author, license,
dependencies, platform gating, tags, related skills cross-linked to their own
pages), and the complete SKILL.md body that Hermes loads at runtime.
Previously the two catalog pages just listed skills with a one-line blurb and
no way to see what the skill actually did — users had to go read the source
repo. Now every skill has a browsable, searchable, cross-linked reference in
the docs.
- website/scripts/generate-skill-docs.py — generator that reads skills/ and
optional-skills/, writes per-skill pages, regenerates both catalog indexes,
and rewrites the Skills section of sidebars.ts. Handles MDX escaping
(outside fenced code blocks: curly braces, unsafe HTML-ish tags) and
rewrites relative references/*.md links to point at the GitHub source.
- website/docs/reference/skills-catalog.md — regenerated; each row links to
the new dedicated page.
- website/docs/reference/optional-skills-catalog.md — same.
- website/sidebars.ts — Skills section now has Bundled / Optional subtrees
with one nested category per skill folder.
- .github/workflows/{docs-site-checks,deploy-site}.yml — run the generator
before docusaurus build so CI stays in sync with the source SKILL.md files.
Build verified locally with `npx docusaurus build`. Only remaining warnings
are pre-existing broken link/anchor issues in unrelated pages.
* feat(config): make tool output truncation limits configurable
Port from anomalyco/opencode#23770: expose a new `tool_output` config
section so users can tune the hardcoded truncation caps that apply to
terminal output and read_file pagination.
Three knobs under `tool_output`:
- max_bytes (default 50_000) — terminal stdout/stderr cap
- max_lines (default 2000) — read_file pagination cap
- max_line_length (default 2000) — per-line cap in line-numbered view
All three keep their existing hardcoded values as defaults, so behaviour
is unchanged when the section is absent. Power users on big-context
models can raise them; small-context local models can lower them.
Implementation:
- New `tools/tool_output_limits.py` reads the section with defensive
fallback (missing/invalid values → defaults, never raises).
- `tools/terminal_tool.py` MAX_OUTPUT_CHARS now comes from
get_max_bytes().
- `tools/file_operations.py` normalize_read_pagination() and
_add_line_numbers() now pull the limits at call time.
- `hermes_cli/config.py` DEFAULT_CONFIG gains the `tool_output` section
so `hermes setup` writes defaults into fresh configs.
- Docs page `user-guide/configuration.md` gains a "Tool Output
Truncation Limits" section with large-context and small-context
example configs.
Tests (18 new in tests/tools/test_tool_output_limits.py):
- Default resolution with missing / malformed / non-dict config.
- Full and partial user overrides.
- Coercion of bad values (None, negative, wrong type, str int).
- Shortcut accessors delegate correctly.
- DEFAULT_CONFIG exposes the section with the right defaults.
- Integration: normalize_read_pagination clamps to the configured
max_lines.
* feat(skills): add design-md skill for Google's DESIGN.md spec
Built-in skill under skills/creative/ that teaches the agent to author,
lint, diff, and export DESIGN.md files — Google's open-source
(Apache-2.0) format for describing a visual identity to coding agents.
Covers:
- YAML front matter + markdown body anatomy
- Full token schema (colors, typography, rounded, spacing, components)
- Canonical section order + duplicate-heading rejection
- Component property whitelist + variants-as-siblings pattern
- CLI workflow via 'npx @google/design.md' (lint/diff/export/spec)
- Lint rule reference including WCAG contrast checks
- Common YAML pitfalls (quoted hex, negative dimensions, dotted refs)
- Starter template at templates/starter.md
Package verified live on npm (@google/design.md@0.1.1).
MCP stdio servers' stderr was being dumped directly onto the user's
terminal during hermes launch. Servers like FastMCP-based ones print a
large ASCII banner at startup; slack-mcp-server emits JSON logs; etc.
With prompt_toolkit / Rich rendering the TUI concurrently, these
unsolicited writes corrupt the terminal state — hanging the session
~80% of the time for one user with Google Ads Tools + slack-mcp
configured, forcing Ctrl+C and restart loops.
Root cause: `stdio_client(server_params)` in tools/mcp_tool.py was
called without `errlog=`, and the SDK's default is `sys.stderr` —
i.e. the real parent-process stderr, which is the TTY.
Fix: open a shared, append-mode log at $HERMES_HOME/logs/mcp-stderr.log
(created once per process, line-buffered, real fd required by asyncio's
subprocess machinery) and pass it as `errlog` to every stdio_client.
Each server's spawn writes a timestamped header so the shared log stays
readable when multiple servers are running. Falls back to /dev/null if
the log file cannot be opened.
Verified by E2E spawning a subprocess with the log fd as its stderr:
banner lines land in the log file, nothing reaches the calling TTY.
FloatingOverlays (SessionPicker, ModelPicker, SkillsHub, pager,
completions) was nested inside the !isBlocked guard in ComposerPane.
When any overlay opened, isBlocked became true, which removed the
entire composer box from the tree — including the overlay that was
trying to render. This made /resume with no args appear to do nothing
(the input line vanished and no picker appeared).
Since 99d859ce (feat: refactor by splitting up app and doing proper
state), isBlocked gated only the text input lines so that
approval/clarify prompts and pickers rendered above a hidden composer.
The regression happened in 408fc893 (fix(tui): tighten composer — status
sits directly above input, overlays anchor to input) when
FloatingOverlays was moved into the input row for anchoring but
accidentally kept inside the !isBlocked guard.
so here, we render FloatingOverlays outside the !isBlocked guard inside
the same position:relative Box, so overlays
stay visible even when text input is hidden. Only the actual input
buffer lines and TextInput are gated now.
Fixes: /resume, /history, /logs, /model, /skills, and completion
dropdowns when blocked overlays are active.
Two fixes on top of the fuzzy-@ branch:
(1) Rebase artefact: re-apply only the fuzzy additions on top of
fresh `tui_gateway/server.py`. The earlier commit was cut from a
base 58 commits behind main and clobbered ~170 lines of
voice.toggle / voice.record handlers and the gateway crash hooks
(`_panic_hook`, `_thread_panic_hook`). Reset server.py to
origin/main and re-add only:
- `_FUZZY_*` constants + `_list_repo_files` + `_fuzzy_basename_rank`
- the new fuzzy branch in the `complete.path` handler
(2) Path scoping (Copilot review): `git ls-files` returns repo-root-
relative paths, but completions need to resolve under the gateway's
cwd. When hermes is launched from a subdirectory, the previous
code surfaced `@file:apps/web/src/foo.tsx` even though the agent
would resolve that relative to `apps/web/` and miss. Fix:
- `git -C root rev-parse --show-toplevel` to get repo top
- `git -C top ls-files …` for the listing
- `os.path.relpath(top + p, root)` per result, dropping anything
starting with `../` so the picker stays scoped to cwd-and-below
(matches Cmd-P workspace semantics)
`apps/web/src/foo.tsx` ends up as `@file:src/foo.tsx` from inside
`apps/web/`, and sibling subtrees + parent-of-cwd files don't leak.
New test `test_fuzzy_paths_relative_to_cwd_inside_subdir` builds a
3-package mono-repo, runs from `apps/web/`, and verifies completion
paths are subtree-relative + outside-of-cwd files don't appear.
Copilot review threads addressed: #3134675504 (path scoping),
#3134675532 (`voice.toggle` regression), #3134675541 (`voice.record`
regression — both were stale-base artefacts, not behavioural changes).
Rebase-artefact cleanup on this branch:
- Restore `voice.status` and `voice.transcript` cases in
createGatewayEventHandler plus the `voice` / `submission` /
`composer.setInput` ctx destructuring. They were added to main in
the 58-commit gap that this branch was originally cut behind;
dropping them was unintentional.
- Rebase the test ctx shape to match main (voice.* fakes,
submission.submitRef, composer.setInput) and apply the same
segment-anchor test rewrites on top.
- Drop the `#14XXX` placeholder from the tool.complete comment;
replace with a plain-English rationale.
- Rewrite the broken mid-word "pushInlineDiff- Segment" in
turnController's dedupe comment to refer to
pushInlineDiffSegment and `kind: 'diff'` plainly.
- Collapse the filter predicate in recordMessageComplete from a
4-line if/return into one boolean expression — same semantics,
reads left-to-right as a single predicate.
Copilot review threads resolved: #3134668789, #3134668805,
#3134668822.
Visual polish on top of the segment-anchor change: diff blocks were
butting up against the narration around them. Tag diff-only segments
with `kind: 'diff'` (extended on Msg) and give them `marginTop={1}` +
`marginBottom={1}` in MessageLine, matching the spacing we already
use for user messages. Also swaps the regex-based `diffSegmentBody`
check for an explicit `kind === 'diff'` guard so the dedupe path is
clearer.
Revisits #13729. That PR buffered each `tool.complete`'s inline_diff
and merged them into the final assistant message body as a fenced
```diff block. The merge-at-end placement reads as "the agent wrote
this after the summary", even when the edit fired mid-turn — which
is both misleading and (per blitz feedback) feels like noise tacked
onto the end of every task.
Segment-anchored placement instead:
- On tool.complete with inline_diff, `pushInlineDiffSegment` calls
`flushStreamingSegment` first (so any in-progress narration lands
as its own segment), then pushes the ```diff block as its own
segment into segmentMessages. The diff is now anchored BETWEEN the
narration that preceded the edit and whatever the agent streams
afterwards, which is where the edit actually happened.
- `recordMessageComplete` no longer merges buffered diffs. The only
remaining dedupe is "drop diff-only segments whose body the final
assistant text narrates verbatim (or whose diff fence the final
text already contains)" — same tradeoff as before, kept so an
agent that narrates its own diff doesn't render two stacked copies.
- Drops `pendingInlineDiffs` and `queueInlineDiff` — buffer + end-
merge machinery is gone; segmentMessages is now the only source
of truth.
Side benefit: Ctrl+C interrupt (`interruptTurn`) iterates
segmentMessages, so diff segments are now preserved in the
transcript when the user cancels after an edit. Previously the
pending buffer was silently dropped on interrupt.
Reported by Teknium during blitz usage: "no diffs are ever at the
end because it didn't make this file edit after the final message".
Typing `@appChrome` in the composer should surface
`ui-tui/src/components/appChrome.tsx` without requiring the user to
first type the full directory path — matches the Cmd-P behaviour
users expect from modern editors.
The gateway's `complete.path` handler was doing a plain
`os.listdir(".")` + `startswith` prefix match, so basenames only
resolved inside the current working directory. This reworks it to:
- enumerate repo files via `git ls-files -z --cached --others
--exclude-standard` (fast, honours `.gitignore`); fall back to a
bounded `os.walk` that skips common vendor / build dirs when the
working dir isn't a git repo. Results cached per-root with a 5s
TTL so rapid keystrokes don't respawn git processes.
- rank basenames with a 5-tier scorer: exact → prefix → camelCase
/ word-boundary → substring → subsequence. Shorter basenames win
ties; shorter rel paths break basename-length ties.
- only take the fuzzy branch when the query is bare (no `/`), is a
context reference (`@...`), and isn't `@folder:` — path-ish
queries and folder tags fall through to the existing
directory-listing path so explicit navigation intent is
preserved.
Completion rows now carry `display = basename`,
`meta = directory`, so the picker renders
`appChrome.tsx ui-tui/src/components` on one row (basename bold,
directory dim) — the meta column was previously "dir" / "" and is
a more useful signal for fuzzy hits.
Reported by Ben Barclay during the TUI v2 blitz test.
Adds a per-ink-text measurement cache keyed by width|widthMode to avoid
re-squashing and re-wrapping the same text when yoga calls measureFunc
multiple times per frame with different widths during flex layout re-pass.
When a tool schema declares `type: array` or `type: object` and the model
emits the value as a JSON string (common with complex oneOf discriminated
unions), the MCP server rejects it with -32602 "expected array, received
string". Extend `_coerce_value` to attempt `json.loads` for these types
and replace the string with the parsed value before dispatch.
Root cause confirmed via live testing: `add_reminders.reminders` uses a
oneOf discriminated union (relative/absolute/location) that triggers model
output drift. Sending a real array passes validation; sending a string
reproduces the exact error.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The voice.toggle handler was persisting display.voice_enabled /
display.voice_tts to config.yaml, so a TUI session that ever turned
voice on would re-open with it already on (and the mic badge lit) on
every subsequent launch. cli.py treats voice strictly as runtime
state: _voice_mode = False at __init__, only /voice on flips it, and
nothing writes it back to disk.
Drop the _write_config_key calls in voice.toggle on/off/tts and the
config.yaml fallback in _voice_mode_enabled / _voice_tts_enabled.
State is now env-var-only (HERMES_VOICE / HERMES_VOICE_TTS), scoped to
the live gateway subprocess — the next launch starts clean.
Crash-log stack trace (tui_gateway_crash.log) from the user's session
pinned the regression: SIGPIPE arrived while main thread was blocked on
for-raw-in-sys.stdin — i.e., a background thread (debug print to stderr,
most likely from HERMES_VOICE_DEBUG=1) wrote to a pipe whose buffer the
TUI hadn't drained yet, and SIG_DFL promptly killed the process.
Two fixes that together restore CLI parity:
- entry.py: SIGPIPE → SIG_IGN instead of the _log_signal handler that
then exited. With SIG_IGN, Python raises BrokenPipeError on the
offending write, which write_json already handles with a clean exit
via _log_exit. SIGTERM / SIGHUP still route through _log_signal so
real termination signals remain diagnosable.
- hermes_cli/voice.py:_debug: wrap the stderr print in a BrokenPipeError
/ OSError try/except. This runs from daemon threads (silence callback,
TTS playback, beep), so a broken stderr must not escape and ride up
into the main event loop.
Verified by spawning the gateway subprocess locally:
voice.toggle status → 200 OK, process stays alive, clean exit on
stdin close logs "reason=stdin EOF" instead of a silent reap.
SIG_DFL for SIGPIPE means the kernel reaps the gateway subprocess the
instant a background thread (TTS playback, silence callback, voice
status emitter) writes to a stdout the TUI stopped reading — before
the Python interpreter can run excepthook, threading.excepthook,
atexit, or the entry.py post-loop _log_exit.
Replace the three SIG_DFL / SIG_IGN bindings with a _log_signal
handler that:
- records which signal (SIGPIPE / SIGTERM / SIGHUP) fired and when;
- dumps the main-thread stack at signal delivery AND every live
thread's stack via sys._current_frames — the background-thread
write that provoked SIGPIPE is almost always visible here;
- writes everything to ~/.hermes/logs/tui_gateway_crash.log and prints
a [gateway-signal] breadcrumb to stderr so the TUI Activity surfaces
it as well.
SIGINT stays ignored (TUI handles Ctrl+C for the user).
Gateway exits weren't reaching the panic hook because entry.py calls
sys.exit(0) on broken stdout — clean termination, no exception. That
left "gateway exited" in the TUI with zero forensic trail when pipe
breaks happened mid-turn.
Entry.py now tags each exit path — startup-write failure, parse-error-
response write failure, per-method response write failure, stdin EOF —
with a one-line entry in ~/.hermes/logs/tui_gateway_crash.log and a
gateway.stderr breadcrumb. Includes the JSON-RPC method name on the
dispatch path, which is the only way to tell "died right after handling
voice.toggle on" from "died emitting the second message.complete".
When the gateway subprocess raises an unhandled exception during a
voice-mode turn, nothing survives: stdout is the JSON-RPC pipe, stderr
flushes but the process is already exiting, and no log file catches
Python's default traceback print. The user is left with an
undiagnosable "gateway exited" banner.
Install:
- sys.excepthook → write full traceback to tui_gateway_crash.log +
echo the first line to stderr (which the TUI pumps into
Activity as a gateway.stderr event). Chains to the default hook so
the process still terminates.
- threading.excepthook → same, tagged with the thread name so it's
clear when the crash came from a daemon thread (beep playback, TTS,
silence callback, etc.).
- Turn-dispatcher except block now also appends a traceback to the
crash log before emitting the user-visible error event — str(e)
alone was too terse to identify where in the voice pipeline the
failure happened.
Zero behavioural change on the happy path; purely forensics.
TTS feedback loop (hermes_cli/voice.py)
The VAD loop kept the microphone live while speak_text played the
agent's reply over the speakers, so the reply itself was picked up,
transcribed, and submitted — the agent then replied to its own echo
("Ha, looks like we're in a loop").
Ported cli.py:_voice_tts_done synchronisation:
- _tts_playing: threading.Event (initially set = "not playing").
- speak_text cancels the active recorder before opening the speakers,
clears _tts_playing, and on exit waits 300 ms before re-starting the
recorder — long enough for the OS audio device to settle so afplay
and sounddevice don't race for it.
- _continuous_on_silence now waits on _tts_playing (up to 60 s) before
re-arming the mic with another 300 ms gap, mirroring
cli.py:10619-10621. If the user flips voice off during the wait the
loop exits cleanly instead of fighting for the device.
Without both halves the loop races: if the silence callback fires
before TTS starts it re-arms immediately; if TTS is already playing
the pause-and-resume path catches it.
Red REC badge (ui-tui appChrome + useMainApp)
Classic CLI (cli.py:_get_voice_status_fragments) renders "● REC" in
red and "◉ STT" in amber. TUI was showing a dim "REC" with no dot,
making it hard to spot at a glance. voiceLabel now emits the same
glyphs and appChrome colours them via t.color.error / t.color.warn,
falling back to dim for the idle label.
Three issues surfaced during end-to-end testing of the CLI-parity voice
loop and are fixed together because they all blocked "speak → agent
responds → TTS reads it back" from working at all:
1. Wrong result key (hermes_cli/voice.py)
transcribe_recording() returns {"success": bool, "transcript": str},
matching cli.py:_voice_stop_and_transcribe. The wrapper was reading
result.get("text"), which is None, so every successful Groq / local
STT response was thrown away and the 3-strikes halt fired after
three silent-looking cycles. Fixed by reading "transcript" and also
honouring "success" like the CLI does. Updated the loop simulation
tests to return the correct shape.
2. TTS speak-back was missing (tui_gateway/server.py + hermes_cli/voice.py)
The TUI had a voice.toggle "tts" subcommand but nothing downstream
actually read the flag — agent replies never spoke. Mirrored
cli.py:8747-8754's dispatch: on message.complete with status ==
"complete", if _voice_tts_enabled() is true, spawn a daemon thread
running speak_text(response). Rewrote speak_text as a full port of
cli.py:_voice_speak_response — same markdown-strip regex pipeline
(code blocks, links, bold/italic, inline code, headers, list bullets,
horizontal rules, excessive newlines), same 4000-char cap, same
explicit mp3 output path, same MP3-over-OGG playback choice (afplay
misbehaves on OGG), same cleanup of both extensions. Keeps TUI TTS
audible output byte-for-byte identical to the classic CLI.
3. Auto-submit swallowed on non-empty composer (createGatewayEventHandler.ts)
The voice.transcript handler branched on prev input via a setInput
updater and fired submitRef.current inside the updater when prev was
empty. React strict mode double-invokes state updaters, which would
queue the submit twice; and when the composer had any content the
transcript was merely appended — the agent never saw it. CLI
_pending_input.put(transcript) unconditionally feeds the transcript
as the next turn, so match that: always clear the composer and
setTimeout(() => submitRef.current(text), 0) outside any updater.
Side effect can't run twice this way, and a half-typed draft on the
rare occasion is a fair trade vs. silently dropping the turn.
Also added peak_rms to the rec.stop debug line so "recording too quiet"
is diagnosable at a glance when HERMES_VOICE_DEBUG=1.
The TUI had drifted from the CLI's voice model in two ways:
- /voice on was lighting up the microphone immediately and Ctrl+B was
interpreted as a mode toggle. The CLI separates the two: /voice on
just flips the umbrella bit, recording only starts once the user
presses Ctrl+B, which also sets _voice_continuous so the VAD loop
auto-restarts until the user presses Ctrl+B again or three silent
cycles pass.
- /voice tts was missing entirely, so users couldn't turn agent reply
speech on/off from inside the TUI.
This commit brings the TUI to parity.
Python
- hermes_cli/voice.py: continuous-mode API (start_continuous,
stop_continuous, is_continuous_active) layered on the existing PTT
wrappers. The silence callback transcribes, fires on_transcript,
tracks consecutive no-speech cycles, and auto-restarts — mirroring
cli.py:_voice_stop_and_transcribe + _restart_recording.
- tui_gateway/server.py:
- voice.toggle now supports on / off / tts / status. The umbrella
bit lives in HERMES_VOICE + display.voice_enabled; tts lives in
HERMES_VOICE_TTS + display.voice_tts. /voice off also tears down
any active continuous loop so a toggle-off really releases the
microphone.
- voice.record start/stop now drives start_continuous/stop_continuous.
start is refused with a clear error when the mode is off, matching
cli.py:handle_voice_record's early return on `not _voice_mode`.
- New voice.transcript / voice.status events emit through
_voice_emit (remembers the sid that last enabled the mode so
events land in the right session).
TypeScript
- gatewayTypes.ts: voice.status + voice.transcript event
discriminants; VoiceToggleResponse gains tts; VoiceRecordResponse
gains status for the new "started/stopped" responses.
- interfaces.ts: GatewayEventHandlerContext gains composer.setInput +
submission.submitRef + voice.{setRecording, setProcessing,
setVoiceEnabled}; InputHandlerContext.voice gains enabled +
setVoiceEnabled for the mode-aware Ctrl+B handler.
- createGatewayEventHandler.ts: voice.status drives REC/STT badges;
voice.transcript auto-submits when the composer is empty (CLI
_pending_input.put parity) and appends when a draft is in flight.
no_speech_limit flips voice off + sys line.
- useInputHandlers.ts: Ctrl+B now calls voice.record (start/stop),
not voice.toggle, and nudges the user with a sys line when the
mode is off instead of silently flipping it on.
- useMainApp.ts: wires the new event-handler context fields.
- slash/commands/session.ts: /voice handles on / off / tts / status
with CLI-matching output ("voice: mode on · tts off").
Backward compat preserved for voice.record (was always PTT shape;
gateway still honours start/stop with mode-gating added).
tui_gateway/server.py:3486/3491/3509 imports start_recording,
stop_and_transcribe, and speak_text from hermes_cli.voice, but the
module never existed (not in git history — never shipped, never
deleted). Every voice.record / voice.tts RPC call hit the ImportError
branch and the TUI surfaced it as "voice module not available — install
audio dependencies" even on boxes with sounddevice / faster-whisper /
numpy installed.
Adds a thin wrapper on top of tools.voice_mode (recording +
transcription) and tools.tts_tool (text-to-speech):
- start_recording() — idempotent; stores the active AudioRecorder in a
module-global guarded by a Lock so repeat Ctrl+B presses don't fight
over the mic.
- stop_and_transcribe() — returns None for no-op / no-speech /
Whisper-hallucination cases so the TUI's existing "no speech detected"
path keeps working unchanged.
- speak_text(text) — lazily imports tts_tool (optional provider SDKs
stay unloaded until the first /voice tts call), parses the tool's
JSON result, and plays the audio via play_audio_file.
Paired with the Ctrl+B keybinding fix in the prior commit, the TUI
voice pipeline now works end-to-end for the first time.
When the user runs /voice and then presses Ctrl+B in the TUI, three
handlers collaborate to consume the chord and none of them dispatch
voice.record:
- isAction() is platform-aware — on macOS it requires Cmd (meta/super),
so Ctrl+B fails the match in useInputHandlers and never triggers
voiceStart/voiceStop.
- TextInput's Ctrl+B pass-through list doesn't include 'b', so the
keystroke falls through to the wordMod backward-word branch on Linux
and to the printable-char insertion branch on macOS — the latter is
exactly what timmie reported ("enters a b into the tui").
- /voice emits "voice: on" with no hint, so the user has no way to
know Ctrl+B is the recording toggle.
Introduces isVoiceToggleKey(key, ch) in lib/platform.ts that matches
raw Ctrl+B on every platform (mirrors tips.py and config.yaml's
voice.record_key default) and additionally accepts Cmd+B on macOS so
existing muscle memory keeps working. Wires it into useInputHandlers,
adds Ctrl+B to TextInput's pass-through list so the global handler
actually receives the chord, and appends "press Ctrl+B to record" to
the /voice on message.
Empirically verified with hermes --tui: Ctrl+B no longer leaks 'b'
into the composer and now dispatches the voice.record RPC (the
downstream ImportError for hermes_cli.voice is a separate upstream
bug — follow-up patch).
The 300s default was too tight for high-reasoning models on non-trivial
delegated tasks — e.g. gpt-5.5 xhigh reviewing 12 files would burn >5min
on reasoning tokens before issuing its first tool call, tripping the
hard wall-clock timeout with 0 api_calls logged.
- tools/delegate_tool.py: DEFAULT_CHILD_TIMEOUT 300 -> 600
- hermes_cli/config.py: surface delegation.child_timeout_seconds in
DEFAULT_CONFIG so it's discoverable (previously the key was read by
_get_child_timeout() but absent from the default config schema)
Users can still override via config.yaml delegation.child_timeout_seconds
or DELEGATION_CHILD_TIMEOUT_SECONDS env var (floor 30s, no ceiling).
Fixes a broader class of 'tools.function.parameters is not a valid
moonshot flavored json schema' errors on Nous / OpenRouter aggregators
routing to moonshotai/kimi-k2.6 with MCP tools loaded.
## Moonshot sanitizer (agent/moonshot_schema.py, new)
Model-name-routed (not base-URL-routed) so Nous / OpenRouter users are
covered alongside api.moonshot.ai. Applied in
ChatCompletionsTransport.build_kwargs when is_moonshot_model(model).
Two repairs:
1. Fill missing 'type' on every property / items / anyOf-child schema
node (structural walk — only schema-position dicts are touched, not
container maps like properties/$defs).
2. Strip 'type' at anyOf parents; Moonshot rejects it.
## MCP normalizer hardened (tools/mcp_tool.py)
Draft-07 $ref rewrite from PR #14802 now also does:
- coerce missing / null 'type' on object-shaped nodes (salvages #4897)
- prune 'required' arrays to names that exist in 'properties'
(salvages #4651; Gemini 400s on dangling required)
- apply recursively, not just top-level
These repairs are provider-agnostic so the same MCP schema is valid on
OpenAI, Anthropic, Gemini, and Moonshot in one pass.
## Crash fix: safe getattr for Tool.inputSchema
_convert_mcp_schema now uses getattr(t, 'inputSchema', None) so MCP
servers whose Tool objects omit the attribute entirely no longer abort
registration (salvages #3882).
## Validation
- tests/agent/test_moonshot_schema.py: 27 new tests (model detection,
missing-type fill, anyOf-parent strip, non-mutation, real-world MCP
shape)
- tests/tools/test_mcp_tool.py: 7 new tests (missing / null type,
required pruning, nested repair, safe getattr)
- tests/agent/transports/test_chat_completions.py: 2 new integration
tests (Moonshot route sanitizes, non-Moonshot route doesn't)
- Targeted suite: 49 passed
- E2E via execute_code with a realistic MCP tool carrying all three
Moonshot rejection modes + dangling required + draft-07 refs:
sanitizer produces a schema valid on Moonshot and Gemini
Cron now resolves its toolset from the same per-platform config the
gateway uses — `_get_platform_tools(cfg, 'cron')` — instead of blindly
loading every default toolset. Existing cron jobs without a per-job
override automatically lose `moa`, `homeassistant`, and `rl` (the
`_DEFAULT_OFF_TOOLSETS` set), which stops the "surprise $4.63
mixture_of_agents run" class of bug (Norbert, Discord).
Precedence inside `run_job`:
1. per-job `enabled_toolsets` (PR #14767 / #6130) — wins if set
2. `_get_platform_tools(cfg, 'cron')` — new, the blanket gate
3. `None` fallback (legacy) — only on resolver exception
Changes:
- hermes_cli/platforms.py: register 'cron' with default_toolset
'hermes-cron'
- toolsets.py: add 'hermes-cron' toolset (mirrors 'hermes-cli';
`_get_platform_tools` then filters via `_DEFAULT_OFF_TOOLSETS`)
- cron/scheduler.py: add `_resolve_cron_enabled_toolsets(job, cfg)`,
call it at the `AIAgent(...)` kwargs site
- tests/cron/test_scheduler.py: replace the 'None when not set' test
(outdated contract) with an invariant ('moa not in default cron
toolset') + new per-job-wins precedence test
- tests/hermes_cli/test_tools_config.py: mark 'cron' as non-messaging
in the gateway-toolset-coverage test
Themes and plugins can now pull off arbitrary dashboard reskins (cockpit
HUD, retro terminal, etc.) without touching core code.
Themes gain four new fields:
- layoutVariant: standard | cockpit | tiled — shell layout selector
- assets: {bg, hero, logo, crest, sidebar, header, custom: {...}} —
artwork URLs exposed as --theme-asset-* CSS vars
- customCSS: raw CSS injected as a scoped <style> tag on theme apply
(32 KiB cap, cleaned up on theme switch)
- componentStyles: per-component CSS-var overrides (clipPath,
borderImage, background, boxShadow, ...) for card/header/sidebar/
backdrop/tab/progress/badge/footer/page
Plugin manifests gain three new fields:
- tab.override: replaces a built-in route instead of adding a tab
- tab.hidden: register component + slots without adding a nav entry
- slots: declares shell slots the plugin populates
10 named shell slots: backdrop, header-left/right/banner, sidebar,
pre-main, post-main, footer-left/right, overlay. Plugins register via
window.__HERMES_PLUGINS__.registerSlot(name, slot, Component). A
<PluginSlot> React helper is exported on the plugin SDK.
Ships a full demo at plugins/strike-freedom-cockpit/ — theme YAML +
slot-only plugin that reproduces a Gundam cockpit dashboard: MS-STATUS
sidebar with live telemetry, COMPASS crest in header, notched card
corners via componentStyles, scanline overlay via customCSS, gold/cyan
palette, Orbitron typography.
Validation:
- 15 new tests in test_web_server.py covering every extended field
- tests/hermes_cli/: 2615 passed (3 pre-existing unrelated failures)
- tsc -b --noEmit: clean
- vite build: 418 kB bundle, ~2 kB delta for slots/theme extensions
Co-authored-by: Teknium <p@nousresearch.com>
- AUTHOR_MAP entry for 130918800+devorun for #6636 attribution
- test_moa_defaults: was a change-detector tied to the exact frontier
model list — flips red every OpenRouter churn. Rewritten as an
invariant (non-empty, valid vendor/model slugs).
The agent-facing image_generate tool only passes prompt + aspect_ratio to
provider.generate() (see tools/image_generation_tool.py:953). The editing
block (reference_images / edit_image kwargs) could never fire from the
tool surface, and the xAI edits endpoint is /images/edits with a
different payload shape anyway — not /images/generations as submitted.
- Remove reference_images / edit_image kwargs handling from generate()
- Remove matching test_with_reference_images case
- Update docstring + plugin.yaml description to text-to-image only
- Surface resolution in the success extras
Follow-up to PR #14547. Tests: 18/18 pass.
Fixes several outright-wrong facts and gaps vs current main:
- venv activation: .venv is preferred, venv is fallback (per run_tests.sh)
- AIAgent default model is "" (empty, resolved from config), not hardcoded opus
- Test suite is ~15k tests / ~700 files, not ~3000
- tools/mcp_tool.py is 2.6k LOC, not 1050
- Remove stale "currently 5" config_version note; the real bump-trigger rule
is migration-only, not every new key
- Remove MESSAGING_CWD as the messaging cwd — it's been removed in favor of
terminal.cwd in config.yaml (gateway bridges to TERMINAL_CWD env var)
- .env is secrets-only; non-secret settings belong in config.yaml
- simple_term_menu pitfall: existing sites are legacy fallback, rule is
no new usage
Incomplete/missing sections filled in:
- Gateway platforms list updated to reflect actual adapters (matrix,
mattermost, email, sms, dingtalk, wecom, weixin, feishu, bluebubbles,
webhook, api_server, etc.)
- New 'Plugins' section covering general plugins, memory-provider plugins,
and dashboard/context-engine/image-gen plugin directories — including
the May 2026 rule that plugins must not touch core files
- New 'Skills' section covering skills/ vs optional-skills/ split and
SKILL.md frontmatter fields
- Logs section pointing at ~/.hermes/logs/ and 'hermes logs' CLI
- Prompt-cache policy now explicitly mentions --now / deferred slash-command
invalidation pattern
- Two new pitfalls: gateway two-guard dispatch rule, squash-merge-from-stale
branch silent revert, don't-wire-dead-code rule
Tree layout trimmed to load-bearing entry points — per-file subtrees were
~70% stale so replaced with directory-level notes pointing readers at the
filesystem as the source of truth.
- Drop broken tinker-atropos submodule instructions: no .gitmodules exists,
tinker-atropos/ is empty, and atroposlib + tinker are regular pip deps in
pyproject.toml pulled in by .[all,dev]. Replace with a one-line note.
- CLI vs Messaging table: /skills is cli_only=True in COMMAND_REGISTRY, so
remove it from the messaging column. /<skill-name> still works there.
- Point contributors at scripts/run_tests.sh (the canonical runner enforcing
CI-parity env) instead of bare pytest.
Follow-up to Magaav's safe sync policy. Two gaps in the canonicalizer
caused false diffs or silent drift:
1. discord.py's AppCommand.to_dict() omits nsfw, dm_permission, and
default_member_permissions — those live only on attributes. The
canonicalizer was reading them via payload.get() and getting defaults
(False/True/None), while the desired side from Command.to_dict(tree)
had the real values. Any command using non-default permissions
false-diffed on every startup. Pull them from the AppCommand
attributes via _existing_command_to_payload().
2. contexts and integration_types weren't canonicalized at all, so
drift in either was silently ignored. Added both to
_canonicalize_app_command_payload (sorted for stable compare).
Also normalized default_member_permissions to str-or-None since the
server emits strings but discord.py stores ints locally.
Added regression tests for both gaps.
Replaces blind tree.sync() on every Discord reconnect with a diff-based
reconcile. In safe mode (default), fetch existing global commands,
compare desired vs existing payloads, skip unchanged, PATCH changed,
recreate when non-patchable metadata differs, POST missing, and delete
stale commands one-by-one. Keeps 'bulk' for legacy behavior and 'off'
to skip startup sync entirely.
Fixes restart-heavy workflows that burn Discord's command write budget
and can surface 429s when iterating on native slash commands.
Env var: DISCORD_COMMAND_SYNC_POLICY (safe|bulk|off), default 'safe'.
Co-authored-by: Codex <codex@openai.invalid>
- _stdio_pids: set → Dict[int,str] tracks pid→server_name
- SIGTERM-first with 2s grace before SIGKILL escalation
- hasattr guard for SIGKILL on platforms without it
- Updated tests for dict-based tracking and 3-phase kill sequence
float(os.getenv(...)) at module level raises ValueError on any
non-numeric value, crashing the web server at import before it starts.
Wrap in try/except with a warning log and fallback to 3.0s.
The original regex only matched relative paths (./foo/.env or bare
.env), so the exact command from the bug report —
`cp /opt/data/.env.local /opt/data/.env` — did not trigger approval.
Broaden the leading-path prefix to accept an absolute leading slash
alongside ./ and ../, and add regressions for the bug-report command
and its redirection variant.
cmd_update no longer SIGKILLs in-flight agent runs, and users get
'still working' status every 3 min instead of 10. Two long-standing
sources of '@user — agent gives up mid-task' reports on Telegram and
other gateways.
Drain-aware update:
- New helper hermes_cli.gateway._graceful_restart_via_sigusr1(pid,
drain_timeout) sends SIGUSR1 to the gateway and polls os.kill(pid,
0) until the process exits or the budget expires.
- cmd_update's systemd loop now reads MainPID via 'systemctl show
--property=MainPID --value' and tries the graceful path first. The
gateway's existing SIGUSR1 handler -> request_restart(via_service=
True) -> drain -> exit(75) is wired in gateway/run.py and is
respawned by systemd's Restart=on-failure (and the explicit
RestartForceExitStatus=75 on newer units).
- Falls back to 'systemctl restart' when MainPID is unknown, the
drain budget elapses, or the unit doesn't respawn after exit (older
units missing Restart=on-failure). Old install behavior preserved.
- Drain budget = max(restart_drain_timeout, 30s) + 15s margin so the
drain loop in run_agent + final exit have room before fallback
fires. Composes with #14728's tool-subprocess reaping.
Notification interval:
- agent.gateway_notify_interval default 600 -> 180.
- HERMES_AGENT_NOTIFY_INTERVAL env-var fallback in gateway/run.py
matched.
- 9-minute weak-model spinning runs now ping at 3 min and 6 min
instead of 27 seconds before completion, removing the 'is the bot
dead?' reflex that drives gateway-restart cycles.
Tests:
- Two new tests in tests/hermes_cli/test_update_gateway_restart.py:
one asserts SIGUSR1 is sent and 'systemctl restart' is NOT called
when MainPID is known and the helper succeeds; one asserts the
fallback fires when the helper returns False.
- E2E: spawned detached bash processes confirm the helper returns
True on SIGUSR1-handling exit (~0.5s) and False on SIGUSR1-ignoring
processes (timeout). Verified non-existent PID and pid=0 edge cases.
- 41/41 in test_update_gateway_restart.py (was 39, +2 new).
- 154/154 in shutdown-related suites including #14728's new tests.
Reported by @GeoffWellman and @ANT_1515 on X.
Closes#11616.
The agent's API retry loop hardcoded max_retries = 3, so users with
fallback providers on flaky primaries burned through ~3 × provider
timeout (e.g. 3 × 180s = 9 minutes) before their fallback chain got a
chance to kick in.
Expose a new config key:
agent:
api_max_retries: 3 # default unchanged
Set it to 1 for fast failover when you have fallback providers, or
raise it if you prefer longer tolerance on a single provider. Values
< 1 are clamped to 1 (single attempt, no retry); non-integer values
fall back to the default.
This wraps the Hermes-level retry loop only — the OpenAI SDK's own
low-level retries (max_retries=2 default) still run beneath this for
transient network errors.
Changes:
- hermes_cli/config.py: add agent.api_max_retries default 3 with comment.
- run_agent.py: read self._api_max_retries in AIAgent.__init__; replace
hardcoded max_retries = 3 in the retry loop with self._api_max_retries.
- cli-config.yaml.example: documented example entry.
- hermes_cli/tips.py: discoverable tip line.
- tests/run_agent/test_api_max_retries_config.py: 4 tests covering
default, override, clamp-to-one, and invalid-value fallback.
Closes#8202.
Root cause: stop() reclaimed tool-call bash/sleep children only at the
very end of the shutdown sequence — after a 60s drain, 5s interrupt
grace, and per-adapter disconnect. Under systemd (TimeoutStopSec bounded
by drain_timeout), that meant the cgroup SIGKILL escalation fired first,
and systemd reaped the bash/sleep children instead of us.
Fix:
- Extract tool-subprocess cleanup into a local helper
_kill_tool_subprocesses() in _stop_impl().
- Invoke it eagerly right after _interrupt_running_agents() on the
drain-timeout path, before adapter disconnect.
- Keep the existing catch-all call at the end for the graceful path
and defense in depth against mid-teardown respawns.
- Bump generated systemd unit TimeoutStopSec to drain_timeout + 30s
so cleanup + disconnect + DB close has headroom above the drain
budget, matching the 'subprocess timeout > TimeoutStopSec + margin'
rule from the skill.
Tests:
- New: test_gateway_stop_kills_tool_subprocesses_before_adapter_disconnect_on_timeout
asserts kill_all() runs before disconnect() when drain times out.
- New: test_gateway_stop_kills_tool_subprocesses_on_graceful_path
guards that the final catch-all still fires when drain succeeds
(regression guard against accidental removal during refactor).
- Updated: existing systemd unit generator tests expect TimeoutStopSec=90
(= 60s drain + 30s headroom) with explanatory comment.
Previously delegate_task exposed 'max_iterations' in its JSON schema and used
`max_iterations or default_max_iter` — so a model guessing conservatively (or
copy-pasting a docstring hint like 'Only set lower for simple tasks') could
silently shrink a subagent's budget below the user's configured
delegation.max_iterations. One such call this session capped a deep forensic
audit at 40 iterations while the user's config was set to 250.
Changes:
- Drop 'max_iterations' from DELEGATE_TASK_SCHEMA['parameters']['properties'].
Models can no longer emit it.
- In delegate_task(): ignore any caller-supplied max_iterations, always use
delegation.max_iterations from config. Log at debug if a stale schema or
internal caller still passes one through.
- Keep the Python kwarg on the function signature for internal callers
(_build_child_agent tests pass it through the plumbing layer).
- Update test_schema_valid to assert the param is now absent (intentional
contract change, not a change-detector).
A test in tests/agent/test_credential_pool.py
(test_try_refresh_current_updates_only_current_entry) monkeypatched
refresh_codex_oauth_pure() to return the literal fixture strings
'access-new'/'refresh-new', then executed the real production code path
in agent/credential_pool.py::try_refresh_current which calls
_sync_device_code_entry_to_auth_store → _save_provider_state → writes
to `providers.openai-codex.tokens`. That writer resolves the target via
get_hermes_home()/auth.json. If the test ran with HERMES_HOME unset (direct
pytest invocation, IDE runner bypassing conftest discovery, or any other
sandbox escape), it would overwrite the real user's auth store with the
fixture strings.
Observed in the wild: Teknium's ~/.hermes/auth.json providers.openai-codex.tokens
held 'access-new'/'refresh-new' for five days. His CLI kept working because
the credential_pool entries still held real JWTs, but `hermes model`'s live
discovery path (which reads via resolve_codex_runtime_credentials →
_read_codex_tokens → providers.tokens) was silently 401-ing.
Fixes:
- Delete test_try_refresh_current_updates_only_current_entry. It was the
only test that exercised a writer hitting providers.openai-codex.tokens
with literal stub tokens. The entry-level rotation behavior it asserted
is still covered by test_mark_exhausted_and_rotate_persists_status above.
- Add a seat belt in hermes_cli.auth._auth_file_path(): if PYTEST_CURRENT_TEST
is set AND the resolved path equals the real ~/.hermes/auth.json, raise
with a clear message. In production (no PYTEST_CURRENT_TEST), a single
dict lookup. Any future test that forgets to monkeypatch HERMES_HOME
fails loudly instead of corrupting the user's credentials.
Validation:
- production (no PYTEST_CURRENT_TEST): returns real path, unchanged behavior
- pytest + HERMES_HOME unset (points at real home): raises with message
- pytest + HERMES_HOME=/tmp/...: returns tmp path, tests pass normally
Dashboard themes now control typography and layout, not just colors.
Each built-in theme picks its own fonts, base size, radius, and density
so switching produces visible changes beyond hue.
Schema additions (per theme):
- typography — fontSans, fontMono, fontDisplay, fontUrl, baseSize,
lineHeight, letterSpacing. fontUrl is injected as <link> on switch
so Google/Bunny/self-hosted stylesheets all work.
- layout — radius (any CSS length) and density
(compact | comfortable | spacious, multiplies Tailwind spacing).
- colorOverrides (optional) — pin individual shadcn tokens that would
otherwise derive from the palette.
Built-in themes are now distinct beyond palette:
- default — system stack, 15px, 0.5rem radius, comfortable
- midnight — Inter + JetBrains Mono, 14px, 0.75rem, comfortable
- ember — Spectral (serif) + IBM Plex Mono, 15px, 0.25rem
- mono — IBM Plex Sans + Mono, 13px, 0 radius, compact
- cyberpunk— Share Tech Mono everywhere, 14px, 0 radius, compact
- rose — Fraunces (serif) + DM Mono, 16px, 1rem, spacious
Also fixes two bugs:
1. Custom user themes silently fell back to default. ThemeProvider
only applied BUILTIN_THEMES[name], so YAML files in
~/.hermes/dashboard-themes/ showed in the picker but did nothing.
Server now ships the full normalised definition; client applies it.
2. Docs documented a 21-token flat colors schema that never matched
the code (applyPalette reads a 3-layer palette). Rewrote the
Themes section against the actual shape.
Implementation:
- web/src/themes/types.ts: extend DashboardTheme with typography,
layout, colorOverrides; ThemeListEntry carries optional definition.
- web/src/themes/presets.ts: 6 built-ins with distinct typography+layout.
- web/src/themes/context.tsx: applyTheme() writes palette+typography+
layout+overrides as CSS vars, injects fontUrl stylesheet, fixes the
fallback-to-default bug via resolveTheme(name).
- web/src/index.css: html/body/code read the new theme-font vars;
--radius-sm/md/lg/xl derive from --theme-radius; --spacing scales
with --theme-spacing-mul so Tailwind utilities shift with density.
- hermes_cli/web_server.py: _normalise_theme_definition() parses loose
YAML (bare hex strings, partial blocks) into the canonical wire
shape; /api/dashboard/themes ships full definitions for user themes.
- tests/hermes_cli/test_web_server.py: 16 new tests covering the
normaliser and discovery (rejection cases, clamping, defaults).
- website/docs/user-guide/features/web-dashboard.md: rewrite Themes
section with real schema, per-model tables, full YAML example.
OpenAI launched GPT-5.5 on Codex today (Apr 23 2026). Adds it to the static
catalog and pipes the user's OAuth access token into the openai-codex path of
provider_model_ids() so /model mid-session and the gateway picker hit the
live ChatGPT codex/models endpoint — new models appear for each user
according to what ChatGPT actually lists for their account, without a Hermes
release.
Verified live: 'gpt-5.5' returns priority 0 (featured) from the endpoint,
400k context per OpenAI's launch article. 'hermes chat --provider
openai-codex --model gpt-5.5' completes end-to-end.
Changes:
- hermes_cli/codex_models.py: add gpt-5.5 to DEFAULT_CODEX_MODELS + forward-compat
- agent/model_metadata.py: 400k context length entry
- hermes_cli/models.py: resolve codex OAuth token before calling
get_codex_model_ids() in provider_model_ids('openai-codex')
Trim comment noise, remove redundant typing, normalize sticky prompt viewport args to top→bottom order, and reuse one sticky viewport helper instead of duplicating the math.
Sticky prompt selection only considered the top edge of the viewport, so it could keep showing an older user prompt even when a newer one was already visible lower down. Suppress sticky output whenever a user message is visible in the viewport and cover it with a regression test.
Renderer-driven follow-to-bottom was restoring the viewport to the tail without notifying ScrollBox subscribers, so StickyPromptTracker could stay stale-visible. Notify on render-time scroll/sticky changes and treat near-bottom as bottom for prompt hiding.
When the viewport is away from the bottom, keep the last visible progress snapshot instead of rebuilding the streaming/thinking subtree on every turn-store update. This cuts scroll-time churn while preserving live updates near the tail and on turn completion.
Commit 43de1ca8 removed the _nr_to_assistant_message shim in favor of
duck-typed properties on the ToolCall dataclass. However, the
extra_content property (which carries the Gemini thought_signature) was
omitted from the ToolCall definition. This caused _build_assistant_message
to silently drop the signature via getattr(tc, 'extra_content', None)
returning None, leading to HTTP 400 errors on subsequent turns for all
Gemini 3 thinking models.
Add the extra_content property to ToolCall (matching the existing
call_id and response_item_id pattern) so the thought_signature round-trips
correctly through the transport → agent loop → API replay path.
Credit to @celttechie for identifying the root cause and providing the fix.
Closes#14488
Broaden the settle repaint from xterm.js-only to all alt-screen terminals. Ink upstream and ConPTY/xterm reports point to resize/reflow desync as a general stale-cell class, not a host-specific quirk.
Copilot flagged the variable as unused. LogUpdate.render only sees prev/next, so a simulated "physical terminal" has no hook in the public API. Kept the narrative in the comment and tightened the assertion to demonstrate the test's actual invariant: identical prev/next emits no heal patches.
Replace 28-line guard + nested queueMicrotask + pendingResizeRender flag-reuse with a named canAltScreenRepaint predicate and a single flat paint. setTimeout already drained the burst coalescer; the nested defer and flag dance were paranoia.
Three bugs fixed in model alias resolution:
1. resolve_alias() returned the FIRST catalog match with no version
preference. '/model mimo' picked mimo-v2-omni (index 0 in dict)
instead of mimo-v2.5-pro. Now collects all prefix matches, sorts
by version descending with pro/max ranked above bare names, and
returns the highest.
2. models.dev registry missing newly added models (e.g. v2.5 for
native xiaomi). resolve_alias() now merges static _PROVIDER_MODELS
entries into the catalog so models resolve immediately without
waiting for models.dev to sync.
3. hermes model picker showed only models.dev results (3 xiaomi models),
hiding curated entries (5 total). The picker now merges curated
models into the models.dev list so all models appear.
Also fixes a trailing-dot float parsing edge case in _model_sort_key
where '5.4.' failed float() and multi-dot versions like '5.4.1'
weren't parsed correctly.
- run the xterm.js settle-heal pass through a full render commit instead of diff-only scheduleRender
- guard against overlapping resize renders and clear settle timers on unmount
## Merged
Adds MiMo v2.5-pro and v2.5 support to Xiaomi native provider, OpenCode Go, and setup wizard.
### Changes
- Context lengths: added v2.5-pro (1M) and v2.5 (1M), corrected existing MiMo entries to exact values (262144)
- Provider lists: xiaomi, opencode-go, setup wizard
- Vision: upgraded from mimo-v2-omni to mimo-v2.5 (omnimodal)
- Config description updated for XIAOMI_API_KEY
- Tests updated for new vision model preference
### Verification
- 4322 tests passed, 0 new regressions
- Live API tested on Xiaomi portal: basic, reasoning, tool calling, multi-tool, file ops, system prompt, vision — all pass
- Self-review found and fixed 2 issues (redundant vision check, stale HuggingFace context length)
- force full alt-screen damage in xterm.js hosts to avoid stale glyph artifacts
- skip incremental scroll optimization there and repaint from a cleared screen atomically
Replaces the blanket 'always allow' change from the previous commit with
an opt-in config flag so users who want belt-and-suspenders security can
still get the keyword scan on skill_manage output.
## Default behavior (flag off)
skill_manage(action='create'|'edit'|'patch') no longer runs the keyword
scanner. The agent can write skills that mention risky keywords in prose
(documenting what reviewers should watch for, describing cache-bust
semantics in a PR-review skill, referencing AGENTS.md, etc.) without
getting blocked.
Rationale: the agent can already execute the same code paths via
terminal() with no gate, so the scan adds friction without meaningful
security against a compromised or malicious agent.
## Opt-in behavior (flag on)
Set skills.guard_agent_created: true in config.yaml to get the original
behavior back. Scanner runs on every skill_manage write; dangerous
verdicts surface as a tool error the agent can react to (retry without
the flagged content).
## External hub installs unaffected
trusted/community sources (hermes skills install) always get scanned
regardless of this flag. The gate is specifically for skill_manage,
which only agents call.
## Changes
- hermes_cli/config.py: add skills.guard_agent_created: False to DEFAULT_CONFIG
- tools/skill_manager_tool.py: _guard_agent_created_enabled() reads the flag;
_security_scan_skill() short-circuits to None when the flag is off
- tools/skills_guard.py: restore INSTALL_POLICY['agent-created'] =
('allow', 'allow', 'ask') so the scan remains strict when it does run
- tests/tools/test_skills_guard.py: restore original ask/force tests
- tests/tools/test_skill_manager_tool.py: new TestSecurityScanGate class
covering both flag states + config error handling
## Validation
- tests/tools/test_skills_guard.py + test_skill_manager_tool.py: 115/115 pass
- E2E: flagged-keyword skill creates with default config, blocks with flag on
The security scanner is meant to protect against hostile external skills
pulled from GitHub via hermes skills install — trusted/community policies
block or ask on dangerous verdicts accordingly. But agent-created skills
(from skill_manage) run in the same process as the agent that wrote them.
The agent can already execute the same code paths via terminal() with no
gate, so the ask-on-dangerous policy adds friction without meaningful
security.
Concrete trigger: an agent writing a PR-review skill that describes
cache-busting or persistence semantics in prose gets blocked because
those words appear in the patterns list. The skill isn't actually doing
anything dangerous — it's just documenting what reviewers should watch
for in other PRs.
Change: agent-created dangerous verdict maps to 'allow' instead of 'ask'.
External hub installs (trusted/community) keep their stricter policies
intact. Tests updated: renamed test_dangerous_agent_created_asks →
test_dangerous_agent_created_allowed; renamed force-override test and
updated assertion since force is now a no-op for agent-created (the allow
branch returns first).
Before this, _process_message_background's finally did an unconditional
'del self._active_sessions[session_key]' — even if a /stop/ /new
command had already swapped in its own command_guard via
_dispatch_active_session_command and cancelled us. The old task's
unwind would clobber the newer guard, opening a race for follow-ups.
Replace with _release_session_guard(session_key, guard=interrupt_event)
so the delete only fires when the guard we captured is still the one
installed. The sibling _session_tasks pop already had equivalent
ownership matching via asyncio.current_task() identity; this closes the
asymmetry.
Adds two direct regressions in test_session_split_brain_11016:
- stale guard reference must not clobber a newer guard by identity
- guard=None default still releases unconditionally (for callers that
don't have a captured guard to match against)
Refs #11016
Covers all three layers of the salvaged fix:
1. Adapter-side cancellation: /stop, /new, /reset cancel the in-flight
adapter task, release the guard, and let follow-up messages through;
/new keeps the guard installed until the runner response lands, then
drains the queued follow-up in order.
2. Adapter-side self-heal: a split-brain guard (done owner task, lock
still live) is healed on the next inbound message and the user gets
a reply instead of being trapped in infinite busy acks. A guard
with no recorded owner task is NOT auto-healed (protects fixtures
that install guards directly).
3. Runner-side generation guard: stale async runs whose generation was
bumped by /stop or /new cannot clear a newer run's _running_agents
slot on the way out.
11 tests, all green.
Refs #11016
Closes the runner-side half of the split-brain described in issue #11016
by wiring the existing _session_run_generation counter through the
session-slot promotion and release paths.
Without this, an older async run could still:
- promote itself from sentinel to real agent after /stop or /new
invalidated its run generation
- clear _running_agents on the way out, deleting a newer run's slot
Both races leave _running_agents desynced from what the user actually
has in flight, which is half of what shows up as 'No active task to
stop' followed by late 'Interrupting current task...' acks.
Changes:
- track_agent() in _run_agent now calls _is_session_run_current() before
writing the real agent into _running_agents[session_key]; if /stop or
/new bumped the generation while the agent was spinning up, the slot
is left alone (the newer run owns it).
- _release_running_agent_state() gained an optional run_generation
keyword. When provided, it only clears the slot if the generation is
still current. The final cleanup at the tail of _run_agent passes the
run's generation so an old unwind can't blow away a newer run's state.
- Returns bool so callers can tell when a release was blocked.
All the existing call sites that do NOT pass run_generation behave
exactly as before — this is a strict additive guard.
Refs #11016
Closes the adapter-side half of the split-brain described in issue #11016
where _active_sessions stays live but nothing is processing, trapping the
chat in repeated 'Interrupting current task...' while /stop reports no
active task.
Changes on BasePlatformAdapter:
- Add _session_tasks: Dict[str, asyncio.Task] mapping session -> owner task
so session-terminating commands can cancel the right task and old task
finally blocks can't clobber a newer task's guard.
- Add _release_session_guard(guard=...) that only releases if the guard
Event still matches, preventing races where /stop or /new swaps in a
temporary guard while the old task unwinds.
- Add _session_task_is_stale() and _heal_stale_session_lock() for
on-entry self-heal: when handle_message() sees an _active_sessions
entry whose RECORDED owner task is done/cancelled, clear it and fall
through to normal dispatch. No owner task recorded = not stale (some
tests install guards directly and shouldn't be auto-healed).
- Add cancel_session_processing() as the explicit adapter-side cancel
API so /stop/ /new/ /reset can cleanly tear down in-flight work.
- Route /stop, /new, /reset through _dispatch_active_session_command():
1. install a temporary command guard so follow-ups stay queued
2. let the runner process the command
3. cancel the old adapter task AFTER the runner response is ready
4. release the command guard and drain the latest pending follow-up
- _start_session_processing() replaces the inline create_task + guard
setup in handle_message() so guard + owner-task entry land atomically.
- cancel_background_tasks() also clears _session_tasks.
Combined, this means:
- /stop / /new / /reset actually cancel stuck work instead of leaving
adapter state desynced from runner state.
- A dead session lock self-heals on the next inbound message rather than
persisting until gateway restart.
- Follow-up messages after /new are processed in order, after the reset
command's runner response lands.
Refs #11016
The environment-snapshot login shell was auto-sourcing only ~/.bashrc when
building the PATH snapshot. On Debian/Ubuntu the default ~/.bashrc starts
with a non-interactive short-circuit:
case $- in *i*) ;; *) return;; esac
Sourcing it from a non-interactive shell returns before any PATH export
below that guard runs. Node version managers like n and nvm append their
PATH line under that guard, so Hermes was capturing a PATH without
~/n/bin — and the terminal tool saw 'node: command not found' even when
node was on the user's interactive shell PATH.
Expand the auto-source list (when auto_source_bashrc is on) to:
~/.profile → ~/.bash_profile → ~/.bashrc
~/.profile and ~/.bash_profile have no interactivity guard — installers
that write their PATH there (n's n-install, nvm's curl installer on most
setups) take effect. ~/.bashrc still runs last to preserve behaviour for
users who put PATH logic there without the guard.
Added two tests covering the new behaviour plus an E2E test that spins up
a real LocalEnvironment with a guard-prefixed ~/.bashrc and a ~/.profile
PATH export, and verifies the captured snapshot PATH contains the profile
entry.
On fresh RHEL/Debian SSH sessions without linger, `systemctl --user
start hermes-gateway` fails with 'Failed to connect to bus: No medium
found' because /run/user/$UID/bus doesn't exist. Setup previously
showed a raw CalledProcessError and continued claiming success, so the
gateway never actually started.
systemd_start() and systemd_restart() now call _preflight_user_systemd()
for the user scope first:
- Bus socket already there → no-op (desktop / linger-enabled servers)
- Linger off → try loginctl enable-linger (works when polkit permits,
needs sudo otherwise), wait for socket
- Still unreachable → raise UserSystemdUnavailableError with a clean
remediation message pointing to sudo loginctl + hermes gateway run
as the foreground fallback
Setup's start/restart handlers and gateway_command() catch the new
exception and render the multi-line guidance instead of a traceback.
When a newly-bundled skill's name collides with a pre-existing user
skill, sync silently kept the user's copy. Users never learned that
a bundled version shipped by that name.
Now (on non-quiet sync only) print:
⚠ <name>: bundled version shipped but you already have a local
skill by this name — yours was kept. Run `hermes skills reset
<name>` to replace it with the bundled version.
No behavior change to manifest writes or to the kept user copy —
purely additive warning on the existing collision-skip path.
When a new bundled skill's name collided with a pre-existing user skill
(from hub, custom, or leftover), sync_skills() recorded the bundled hash
in the manifest even though the on-disk copy was unrelated to bundled.
On the next sync, user_hash != origin_hash (bundled_hash) marked the
skill as "user-modified" permanently, blocking all bundled updates for
that skill until the user ran `hermes skills reset`.
Fix: only baseline the manifest entry when the user's on-disk copy is
byte-identical to bundled (safe to track — this is the reset re-sync or
coincidentally-identical install case). Otherwise skip the manifest
write entirely: the on-disk skill is unrelated to bundled and shouldn't
be tracked as if it were.
This preserves reset_bundled_skill()'s re-baseline flow (its post-delete
sync still writes to the manifest when user copy matches bundled) while
fixing the poisoning scenario for genuinely unrelated collisions.
Adds two tests following the existing test_failed_copy_does_not_poison_manifest
pattern: one verifying the manifest stays clean after a collision with
differing content, one verifying no false user_modified flag on resync.
Adds ruff (fast Python linter from Astral) as a dev dependency and sets
up initial config with all files excluded — ruff is entirely disabled
for now, this just lands the config for slow rollout enabling it
module-by-module in follow-up PRs.
Multiple custom_providers entries sharing the same base_url + api_key
are now grouped into a single picker row. A local Ollama host with
per-model display names ("Ollama — GLM 5.1", "Ollama — Qwen3-coder",
"Ollama — Kimi K2", "Ollama — MiniMax M2.7") previously produced four
near-duplicate picker rows that differed only by suffix; now it appears
as one "Ollama" row with four models.
Key changes:
- Grouping key changed from slug-by-name to (base_url, api_key). Names
frequently differ per model while the endpoint stays the same.
- When the grouped endpoint matches current_base_url, the row's slug is
set to current_provider so picker-driven switches route through the
live credential pipeline (no re-resolution needed).
- Per-model suffix is stripped from the display name ("Ollama — X" →
"Ollama") via em-dash / " - " separators.
- Two groups with different api_keys at the same base_url (or otherwise
colliding on cleaned name) are disambiguated with a numeric suffix
(custom:openai, custom:openai-2) so both stay visible.
- current_base_url parameter plumbed through both gateway call sites.
Existing #8216, #11499, #13509 regressions covered (dict/list shapes
of models:, section-3/section-4 dedup, normalized list-format entries).
Salvaged from @davidvv's PR #9210 — the underlying code had diverged
~1400 commits since that PR was opened, so this is a reconstruction of
the same approach on current main rather than a clean cherry-pick.
Authorship preserved via --author on this commit.
Closes#9210
When _resolve_tirith_path() returns None (e.g. install failed on
unsupported platform or all resolution paths exhausted), the function
passed None directly to subprocess.run(), causing a TypeError instead
of respecting the fail_open config.
Add a None check before the subprocess call that allows or blocks
according to the configured fail_open policy, matching the existing
error handling behavior for OSError and TimeoutExpired.
On Windows, os.kill(nonexistent_pid, 0) raises OSError with WinError 87
("The parameter is incorrect") instead of ProcessLookupError. Without
catching OSError, the acquire_scoped_lock() and get_running_pid() paths
crash on any invalid PID check — preventing gateway startup on Windows
whenever a stale PID file survives from a prior run.
Adapted @phpoh's fix in #12490 onto current main. The main file was
refactored in the interim (get_running_pid now iterates over
(primary_record, fallback_record) with a per-iteration try/except),
so the OSError catch is added as a new except clause after
PermissionError (which is a subclass of OSError, so order matters:
PermissionError must match first).
Co-authored-by: phpoh <1352808998@qq.com>
When override_acp_command was passed to _build_child_agent, it failed to
override effective_provider to 'copilot-acp' and effective_api_mode to
'chat_completions'. This caused the child AIAgent to inherit the parent's
native API configuration (e.g. Anthropic) and attempt real HTTP requests
using the parent's API key, leading to HTTP 401 errors and completely
bypassing the ACP subprocess.
Ensure that if an ACP command override is provided, the child agent
correctly routes through CopilotACPClient.
Refs #2653
_normalize_custom_provider_entry silently drops the models field when it's
a list. Hand-edited configs (and the shape used by older Hermes versions)
still write models as a plain list of ids, so after the normalize pass the
entry reaches list_authenticated_providers() with no models and /model
shows the provider with (0) models — even though the underlying picker
code handles lists fine.
Convert list-format models into the empty-value dict shape the rest of
the pipeline already expects. Dict-format entries keep passing through
unchanged.
Repro (before the fix):
custom_providers:
- name: acme
base_url: https://api.example.com/v1
models: [foo, bar, baz]
/model shows "acme (0)"; bypassing normalize in list_authenticated_providers
returns three models, confirming the drop happens in normalize.
Adds four unit tests covering list→dict conversion, dict pass-through,
filtering of empty/non-string entries, and the empty-list case.
Adds MiniMax-AI/cli to the default taps list so the mmx-cli skill
is discoverable and installable out of the box via /skills browse
and /skills install. The skill definition lives upstream at
github.com/MiniMax-AI/cli/skill/SKILL.md, keeping updates decoupled.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When user_name is stored as None (e.g. Telegram users without a
display name), dict.get('user_name', '') returns None because the
key exists — the default is only used for missing keys. This causes
a TypeError when the format specifier :<20 is applied to None.
Use `or ''` to coerce None to an empty string.
Fixes#7392
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
NormalizedResponse and ToolCall now have backward-compat properties
so the agent loop can read them directly without the shim:
ToolCall: .type, .function (returns self), .call_id, .response_item_id
NormalizedResponse: .reasoning_content, .reasoning_details,
.codex_reasoning_items
This eliminates the 35-line shim and its 4 call sites in run_agent.py.
Also changes flush_memories guard from hasattr(response, 'choices')
to self.api_mode in ('chat_completions', 'bedrock_converse') so it
works with raw boto3 dicts too.
WS1 items 3+4 of Cycle 2 (#14418).
3-layer chain (transport → v2 → v1) was collapsed to 2-layer in PR 7.
This collapses the remaining 2-layer (transport → v1 → NR mapping in
transport) to 1-layer: v1 now returns NormalizedResponse directly.
Before: adapter returns (SimpleNamespace, finish_reason) tuple,
transport unpacks and maps to NormalizedResponse (22 lines).
After: adapter returns NormalizedResponse, transport is a
1-line passthrough.
Also updates ToolCall construction — adapter now creates ToolCall
dataclass directly instead of SimpleNamespace(id, type, function).
WS1 item 1 of Cycle 2 (#14418).
Replace direct normalize_anthropic_response() call in
_AnthropicCompletionsAdapter.create() with
AnthropicTransport.normalize_response() via get_transport().
Before: auxiliary_client called adapter v1 directly, bypassing
the transport layer entirely.
After: auxiliary_client → get_transport('anthropic_messages') →
transport.normalize_response() → adapter v1 → NormalizedResponse.
The adapter v1 function (normalize_anthropic_response) now has
zero callers outside agent/anthropic_adapter.py and the transport.
This unblocks collapsing v1 to return NormalizedResponse directly
in a follow-up (the remaining 2-layer chain becomes 1-layer).
WS1 item 2 of Cycle 2 (#14418).
When run_conversation returns a non-dict value (e.g. an int under
error conditions), the subsequent result.get("final_response", "")
raises an opaque "'int' object has no attribute 'get'" AttributeError.
Add a type guard that converts this into a clear RuntimeError, which
is properly caught by the outer except Exception handler that marks
the job as failed and delivers the error message.
FixesNousResearch/hermes-agent#9433
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In WeCom group chats, messages sent as "@BotName /command" arrive with
the @mention prefix intact. This causes is_command() to return False
since the text does not start with "/".
Strip the leading @mention in group messages before creating the
MessageEvent, mirroring the existing behavior in the Telegram adapter.
- Replace hardcoded 'fr' default with DEFAULT_LOCAL_STT_LANGUAGE ('en')
— removes locale leak, matches other providers
- Drop redundant default=True on is_truthy_value (dict .get already defaults)
- Update auto-detect comment to include 'xai' in the chain
- Fix docstring: 21 languages (match PR body + actual xAI API)
- Update test_sends_language_and_format to set HERMES_LOCAL_STT_LANGUAGE=fr
explicitly, since default is no longer 'fr'
All 18 xAI STT tests pass locally.
* feat(agent): add PLATFORM_HINTS for matrix, mattermost, and feishu
These platform adapters fully support media delivery (send_image,
send_document, send_voice, send_video) but were missing from
PLATFORM_HINTS, leaving agents unaware of their platform context,
markdown rendering, and MEDIA: tag support.
Salvaged from PR #7370 by Rutimka — wecom excluded since main already
has a more detailed version.
Co-Authored-By: Marco Rutsch <marco@rutimka.de>
* test: add missing Markdown assertion for feishu platform hint
---------
Co-authored-by: Marco Rutsch <marco@rutimka.de>
iOS auto-corrects -- to — (em dash) and - to – (en dash), causing
commands like /model glm-4.7 —provider zai to fail with
'Model names cannot contain spaces'. Normalize at get_command_args().
The documentation claimed fallback activates 'at most once per session',
but the actual implementation restores the primary model at the start of
every run_conversation() call via _restore_primary_runtime().
Relevant source: run_agent.py lines 1666-1694 (snapshot), 6454-6517
(restore), 8681-8684 (called each turn).
Updated the One-Shot info box and the summary table to accurately
describe the per-turn restoration behavior.
In list_authenticated_providers(), providers like qwen-oauth that use
OAuth authentication were incorrectly flagged as authenticated because
the env-var check fell back to models.dev provider env vars (e.g.
DASHSCOPE_API_KEY for alibaba). Any user with an alibaba API key would
see a ghost qwen-oauth entry in /model picker with 0 models listed.
Fix: skip providers whose auth_type is not api_key in the env-var
detection section (step 1). OAuth/external-process providers are
properly handled in step 2 (HERMES_OVERLAYS) which checks the auth store.
In _detect_target(), platform.system() returns "Android" on Termux,
not "Linux". Without this change tirith's auto-installer skips
Android even though the Linux GNU binaries are ABI-compatible.
When an MCP server config has ssl_verify: false (e.g. local dev with
a self-signed cert), the setting was read from config.yaml but never
passed to the httpx client, causing CERTIFICATE_VERIFY_FAILED errors
and silent connection failures.
Fix: read ssl_verify from config and pass it as the 'verify' kwarg to
both code paths:
- New API (mcp >= 1.24.0): httpx.AsyncClient(verify=ssl_verify)
- Legacy API (mcp < 1.24.0): streamablehttp_client(..., verify=ssl_verify)
Fixes local dev setups using ServBay, LocalWP, MAMP, or any stack with
a self-signed TLS certificate.
The QQCloseError (non-4008) reconnect path in _listen_loop was
missing the MAX_RECONNECT_ATTEMPTS upper-bound check that exists
in both the Exception handler (line 546) and the 4008 rate-limit
handler (line 486). Without this check, if _reconnect() fails
permanently for any non-4008 close code, backoff_idx grows
indefinitely and the bot retries forever at 60-second intervals
instead of giving up cleanly.
Fix: add the same guard after backoff_idx += 1 in the general
QQCloseError branch, consistent with the existing Exception path.
The Dockerfile declares VOLUME /opt/data and the published
docker-compose flow bind-mounts ./data:/opt/data for runtime
state. Because .dockerignore did not list data/, any file the
container writes under /opt/data leaks back into the build
context on the next `docker compose build`.
This becomes a hard failure when the container writes a
dangling symlink there — e.g. PulseAudio's XDG runtime entry
(data/.config/pulse/<host>-runtime -> /tmp/pulse-*) whose
target only exists inside the container. Docker's tar packer
cannot resolve the broken symlink on the host and aborts
context load with `invalid file request`.
Excluding data/ keeps build context clean, shrinks the context
tarball (logs/, sessions/, memories/ no longer shipped), and
matches the intent already expressed in .gitignore.
New built-in image_gen backend at plugins/image_gen/openai-codex/ that
exposes the same gpt-image-2 low/medium/high tier catalog as the
existing 'openai' plugin, but routes generation through the ChatGPT/
Codex Responses image_generation tool path. Available whenever the user
has Codex OAuth signed in; no OPENAI_API_KEY required.
The two plugins are independent — users select between them via
'hermes tools' → Image Generation, and image_gen.provider in
config.yaml. The existing 'openai' (API-key) plugin is unchanged.
Reuses _read_codex_access_token() and _codex_cloudflare_headers() from
agent.auxiliary_client so token expiry / cred-pool / Cloudflare
originator handling stays in one place.
Inspired by #14047 by @Hygaard, but re-implemented as a separate
plugin instead of an in-place fork of the openai plugin.
Closes#11195
Add discord.slash_commands config option (default: true) to allow
users to disable Discord slash command registration when running
alongside other bots that use the same command names.
When set to false in config.yaml:
discord:
slash_commands: false
The _register_slash_commands() call is skipped while text-based
parsing of /commands continues to work normally.
Fixes#4881
Adds an Exa-specific setup note next to the Parallel search-modes line
documenting EXA_API_KEY, category filtering (company, research paper,
news, people, personal site, pdf), and domain/date filters.
Reapplied onto current main from @10ishq's PR #6697 — the original branch
was too far behind main to cherry-pick directly (touched 1,456 unrelated
files from deleted/renamed paths).
Co-authored-by: 10ishq <tanishq@exa.ai>
Add epub, pdf, zip, rar, 7z, docx, xlsx, pptx, txt, csv, apk, ipa to
the MEDIA: path regex in extract_media(). These file types were already
routed to send_document() in the delivery loop (base.py:1705), but the
extraction regex only matched media extensions (audio/video/image),
causing document paths to fall through to the generic \S+ branch which
could fail silently in some cases. This explicit list ensures reliable
matching and delivery for all common document formats.
Port from openai/codex#18646.
Adds two flags to 'hermes chat' that fully isolate a run from user-level
configuration and rules:
* --ignore-user-config: skip ~/.hermes/config.yaml and fall back to
built-in defaults. Credentials in .env are still loaded so the agent
can actually call a provider.
* --ignore-rules: skip auto-injection of AGENTS.md, SOUL.md,
.cursorrules, and persistent memory (maps to AIAgent(skip_context_files=True,
skip_memory=True)).
Primary use cases:
- Reproducible CI runs that should not pick up developer-local config
- Third-party integrations (e.g. Chronicle in Codex) that bring their
own config and don't want user preferences leaking in
- Bug-report reproduction without the reporter's personal overrides
- Debugging: bisect 'was it my config?' vs 'real bug' in one command
Both flags are registered on the parent parser AND the 'chat' subparser
(with argparse.SUPPRESS on the subparser to avoid overwriting the parent
value when the flag is placed before the subcommand, matching the
existing --yolo/--worktree/--pass-session-id pattern).
Env vars HERMES_IGNORE_USER_CONFIG=1 and HERMES_IGNORE_RULES=1 are set
by cmd_chat BEFORE 'from cli import main' runs, which is critical
because cli.py evaluates CLI_CONFIG = load_cli_config() at module import
time. The cli.py / hermes_cli.config.load_cli_config() function checks
the env var and skips ~/.hermes/config.yaml when set.
Tests: 11 new tests in tests/hermes_cli/test_ignore_user_config_flags.py
covering the env gate, constructor wiring, cmd_chat simulation, and
argparse flag registration. All pass; existing hermes_cli + cli suites
unaffected (3005 pass, 2 pre-existing unrelated failures).
Follow-up for #13862 — the post-init api_mode upgrade at __init__ (direct OpenAI /
gpt-5-requires-responses path) runs AFTER the eager transport warm. Clear the cache
so the stale chat_completions entry is evicted.
Cosmetic: correctness was already fine since _get_transport() keys by current
api_mode, but this avoids leaving unused cache state behind.
Consolidate 4 per-transport lazy singleton helpers (_get_anthropic_transport,
_get_codex_transport, _get_chat_completions_transport, _get_bedrock_transport)
into one generic _get_transport(api_mode) with a shared dict cache.
Collapse the 65-line main normalize block (3 api_mode branches, each with
its own SimpleNamespace shim) into 7 lines: one _get_transport() call +
one _nr_to_assistant_message() shared shim. The shim extracts provider_data
fields (codex_reasoning_items, reasoning_details, call_id, response_item_id)
into the SimpleNamespace shape downstream code expects.
Wire chat_completions and bedrock_converse normalize through their transports
for the first time — these were previously falling into the raw
response.choices[0].message else branch.
Remove 8 dead codex adapter imports that have zero callers after PRs 1-6.
Transport lifecycle improvements:
- Eagerly warm transport cache at __init__ (surfaces import errors early)
- Invalidate transport cache on api_mode change (switch_model, fallback
activation, fallback restore, transport recovery) — prevents stale
transport after mid-session provider switch
run_agent.py: -32 net lines (11,988 -> 11,956).
PR 7 of the provider transport refactor.
Follow-up to the /resume and /branch cleanup in the previous commit:
/new is a conversation-boundary operation too, so session-scoped
dangerous-command approvals and /yolo state must not survive it.
Adds a scoped unit test for _clear_session_boundary_security_state that
also covers the /new path (which calls the same helper).
On Windows, Path.open() defaults to the system ANSI code page (cp1252).
If the .env file contains UTF-8 characters, decoding fails with
'gbk codec can't decode byte 0x94'. Specify encoding='utf-8'
explicitly to ensure consistent behavior across platforms.
When a user manually sets fallback_model as a YAML list instead of a
dict, save_config_value() crashes with:
AttributeError: 'list' object has no attribute 'get'
at the fb.get('provider') call on hermes_cli/config.py.
The fix adds isinstance(fb, dict) so list-format values are treated as
unconfigured — the fallback_model comment block is appended to guide
correct usage — instead of crashing.
Fixes#4091
Co-authored-by: [AI-assisted — Claude Sonnet 4.6 via Milo/Hermes]
Commit 8254b820 ("--init for zombie reaping + sleep infinity for
idle-based lifetime") made the Docker terminal backend launch
sandbox containers with `sleep infinity` as the command, so the
lifetime is controlled by an external idle reaper instead of a
fixed timeout.
But `docker/entrypoint.sh` unconditionally wraps its args with
`hermes`:
exec hermes "$@"
Result: `hermes sleep infinity` → argparse rejects `sleep` as a
subcommand and the container exits immediately with code 2:
hermes: error: argument command: invalid choice: 'sleep'
(choose from chat, model, gateway, setup, ...)
Every sandbox container launched by the docker backend dies at
startup, breaking terminal/file tool execution end-to-end.
Fix: dispatch at the tail of the entrypoint. If the first arg is
an executable on PATH (sleep, bash, sh, etc.) run it raw; otherwise
preserve the legacy `hermes <subcommand>` wrapping behavior. Both
invocation styles below keep working:
docker run <image> -> hermes (interactive)
docker run <image> chat -q "hi" -> hermes chat -q "hi"
docker run <image> sleep infinity -> sleep infinity
docker run <image> bash -> bash
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Docker terminal backend runs containers with `--cap-drop ALL`
and re-adds only DAC_OVERRIDE, CHOWN, FOWNER. Since commit fee0e0d3
("run as non-root user, use virtualenv") the image entrypoint drops
from root to the `hermes` user via `gosu`, which requires CAP_SETUID
and CAP_SETGID. Without them every sandbox container exits
immediately with:
Dropping root privileges
error: failed switching to 'hermes': operation not permitted
Breaking every terminal/file tool invocation in `terminal.backend: docker`
mode.
Fix: add SETUID and SETGID to the cap-add list. The `no-new-privileges`
security-opt is kept, so gosu still cannot escalate back to root after
the one-way drop — the hardening posture is preserved.
Reproduction
------------
With any image whose ENTRYPOINT calls `gosu <user>`, the container
exits immediately under the pre-fix cap set. Post-fix, the drop
succeeds and the container proceeds normally.
docker run --rm \
--cap-drop ALL \
--cap-add DAC_OVERRIDE --cap-add CHOWN --cap-add FOWNER \
--security-opt no-new-privileges \
--entrypoint /usr/local/bin/gosu \
hermes-claude:latest hermes id
# -> error: failed switching to 'hermes': operation not permitted
# Same command with SETUID+SETGID added:
# -> uid=10000(hermes) gid=10000(hermes) groups=10000(hermes)
Tests
-----
Added `test_security_args_include_setuid_setgid_for_gosu_drop` that
asserts both caps are present and the overall hardening posture
(cap-drop ALL + no-new-privileges) is preserved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Port from openclaw/openclaw#67318. Some open models (notably Gemma
variants served via OpenRouter) emit tool calls as XML blocks inside
assistant content instead of via the structured tool_calls field:
<function name="read_file"><parameter name="path">/tmp/x</parameter></function>
<tool_call>{"name":"x"}</tool_call>
<function_calls>[{...}]</function_calls>
Left unstripped, this raw XML leaked to gateway users (Discord, Telegram,
Matrix, Feishu, Signal, WhatsApp, etc.) and the CLI, since hermes-agent's
existing reasoning-tag stripper handled only <think>/<thinking>/<thought>
variants.
Extend _strip_think_blocks (run_agent.py) and _strip_reasoning_tags
(cli.py) to cover:
* <tool_call>, <tool_calls>, <tool_result>
* <function_call>, <function_calls>
* <function name="..."> ... </function> (Gemma-style)
The <function> variant is boundary-gated (only strips when the tag sits
at start-of-line or after sentence punctuation AND carries a name="..."
attribute) so prose mentions like 'Use <function> declarations in JS'
are preserved. Dangling <function name="..."> with no close is
intentionally left visible — matches OpenClaw's asymmetry so a truncated
streaming tail still reaches the user.
Tests: 9 new cases in TestStripThinkBlocks (run_agent) + 9 in new file
tests/run_agent/test_strip_reasoning_tags_cli.py. Covers Qwen-style
<tool_call>, Gemma-style <function name="...">, multi-line payloads,
prose preservation, stray close tags, dangling open tags, and mixed
reasoning+tool_call content.
Note: this port covers the post-streaming final-text path, which is what
gateway adapters and CLI display consume. Extending the per-delta stream
filter in gateway/stream_consumer.py to hide these tags live as they
stream is a separate follow-up; for now users may see raw XML briefly
during a stream before the final cleaned text replaces it.
Refs: openclaw/openclaw#67318
Feishu's open_id is app-scoped (same user gets different open_ids per
bot app), not a canonical identity. Functionally correct for single-bot
mode but semantically misleading.
- Add comprehensive Feishu identity model documentation to module docstring
- Prefer user_id (tenant-scoped) over open_id (app-scoped) in
_resolve_sender_profile when both are available
- Document bot_open_id usage for @mention matching
- Update user_id_alt comment in SessionSource to be platform-generic
Ref: closes analysis from PR #8388 (closed as over-scoped)
Port from openclaw/openclaw#66664. The build_anthropic_kwargs call site
used 'max_tokens or _get_anthropic_max_output(model)', which correctly
falls back when max_tokens is 0 or None (falsy) but lets negative ints
(-1, -500), fractional floats (0.5, 8192.7), NaN, and infinity leak
through to the Anthropic API. Anthropic rejects these with HTTP 400
('max_tokens: must be greater than or equal to 1'), turning a local
config error into a surprise mid-conversation failure.
Add two resolver helpers matching OpenClaw's:
_resolve_positive_anthropic_max_tokens — returns int(value) only if
value is a finite positive number; excludes bools, strings, NaN,
infinity, sub-one positives (floor to 0).
_resolve_anthropic_messages_max_tokens — prefers a positive requested
value, else falls back to the model's output ceiling; raises
ValueError only if no positive budget can be resolved.
The context-window clamp at the call site (max_tokens > context_length)
is preserved unchanged — it handles oversized values; the new resolver
handles non-positive values. These concerns are now cleanly separated.
Tests: 17 new cases covering positive/zero/negative ints, fractional
floats (both >1 and <1), NaN, infinity, booleans, strings, None, and
integration via build_anthropic_kwargs.
Refs: openclaw/openclaw#66664
_generate_summary() takes (turns_to_summarize, focus_topic) but the
summary model fallback path passed (messages, summary_budget) — where
'messages' is not even in scope, causing a NameError.
Fix the recursive call to pass the correct variables so the fallback
to the main model actually works when the summary model is unavailable.
Fixes: #10721
The Anthropic provider entry in PROVIDER_REGISTRY is the only standard
API-key provider missing a base_url_env_var. This causes the credential
pool to hardcode base_url to https://api.anthropic.com, ignoring
ANTHROPIC_BASE_URL from the environment.
When using a proxy (e.g. LiteLLM, custom gateway), subagent delegation
fails with 401 because:
1. _seed_from_env() creates pool entries with the hardcoded base_url
2. On error recovery, _swap_credential() overwrites the child agent's
proxy URL with the pool entry's api.anthropic.com
3. The proxy API key is sent to real Anthropic → authentication_error
Adding base_url_env_var="ANTHROPIC_BASE_URL" aligns Anthropic with the
20+ other providers that already have this field set (alibaba, gemini,
deepseek, xai, etc.).
LocalEnvironment._run_bash() spawned subprocess.Popen without a cwd
argument, so init_session()'s pwd -P ran in the gateway process's
startup directory and overwrote self.cwd. Pass cwd=self.cwd so the
initial snapshot captures the user-configured working directory.
Tested:
- pytest tests/ -q (255 env-related tests passed)
- Full suite: 13,537 passed; 70 pre-existing failures unrelated to local env
Running 'hermes profile create' inside the container creates wrappers at
/opt/data/.local/bin but that directory isn't on PATH by default.
Add ENV PATH so wrappers are discoverable without touching shell configs.
The container_config builder in terminal_tool.py was missing
docker_forward_env and docker_env keys, causing config.yaml's
docker_forward_env setting to be silently ignored. Environment
variables listed in docker_forward_env were never injected into
Docker containers.
This fix adds both keys to the container_config dict so they are
properly passed to _create_environment().
Follow-up on helix4u's PR #14211:
- Flip default to true: narrowing toolsets=['web','browser'] expresses
'I want these extras', not 'silently strip MCP'. Parent MCP tools
(registered at runtime) should survive narrowing by default.
- Drop _config_version bump (22->23); additive nested key under
delegation.* is handled by _deep_merge, no migration needed.
- Update tests to reflect new default behavior.
browser_cdp_tool.py registers before browser_tool.py (alphabetical
import order), so its stricter check_fn (requires CDP endpoint) becomes
the toolset-level check for all 11 browser tools. This causes
'hermes doctor' to report the entire browser toolset as unavailable
even when agent-browser is correctly installed.
Move browser_cdp to toolset='browser-cdp' so it is evaluated
independently. browser_navigate et al. only need agent-browser;
browser_cdp additionally requires a reachable CDP endpoint.
Mid-stream SSL alerts (bad_record_mac, tls_alert_internal_error, handshake
failures) previously fell through the classifier pipeline to the 'unknown'
bucket because:
- ssl.SSLError type names weren't in _TRANSPORT_ERROR_TYPES (the
isinstance(OSError) catch picks up some but not all SDK-wrapped forms)
- the message-pattern list had no SSL alert substrings
The 'unknown' bucket is still retryable, but: (a) logs tell the user
'unknown' instead of identifying the cause, (b) it bypasses the
transport-specific backoff/fallback logic, and (c) if the SSL error
happens on a large session with a generic 'connection closed' wrapper,
the existing disconnect-on-large-session heuristic would incorrectly
trigger context compression — expensive, and never fixes a transport
hiccup.
Changes:
- Add ssl.SSLError and its subclass type names to _TRANSPORT_ERROR_TYPES
- New _SSL_TRANSIENT_PATTERNS list (separate from _SERVER_DISCONNECT_PATTERNS
so SSL alerts route to timeout, not context_overflow+compress)
- New step 5 in the classifier pipeline: SSL pattern check runs BEFORE
the disconnect check to pre-empt the large-session-compress path
Patterns cover both space-separated ('ssl alert', 'bad record mac')
and underscore-separated ('ERR_SSL_SSL/TLS_ALERT_BAD_RECORD_MAC')
forms. This is load-bearing because OpenSSL 3.x changed the error-code
separator from underscore to slash (e.g. SSLV3_ALERT_BAD_RECORD_MAC →
SSL/TLS_ALERT_BAD_RECORD_MAC) and will likely churn again — matching on
stable alert reason substrings survives future format changes.
Tests (8 new):
- BAD_RECORD_MAC in Python ssl.c format
- OpenSSL 3.x underscore format
- TLSV1_ALERT_INTERNAL_ERROR
- ssl handshake failure
- [SSL: ...] prefix fallback
- Real ssl.SSLError instance
- REGRESSION GUARD: SSL on large session does NOT compress
- REGRESSION GUARD: plain disconnect on large session STILL compresses
os.walk() by default does not follow symlinks, causing skills
linked via symlinks to be invisible to the skill discovery system.
Add followlinks=True so that symlinked skill directories are scanned.
Port from cline/cline#10266.
When OpenAI-compatible proxies (OpenRouter, Vercel AI Gateway, Cline)
route Claude models, they sometimes surface the Anthropic-native cache
counters (`cache_read_input_tokens`, `cache_creation_input_tokens`) at
the top level of the `usage` object instead of nesting them inside
`prompt_tokens_details`. Our chat-completions branch of
`normalize_usage()` only read the nested `prompt_tokens_details` fields,
so those responses:
- reported `cache_write_tokens = 0` even when the model actually did a
prompt-cache write,
- reported only some of the cache-read tokens when the proxy exposed them
top-level only,
- overstated `input_tokens` by the missed cache-write amount, which in
turn made cost estimation and the status-bar cache-hit percentage wrong
for Claude traffic going through these gateways.
Now the chat-completions branch tries the OpenAI-standard
`prompt_tokens_details` first and falls back to the top-level
Anthropic-shape fields only if the nested values are absent/zero. The
Anthropic and Codex Responses branches are unchanged.
Regression guards added for three shapes: top-level write + nested read,
top-level-only, and both-present (nested wins).
Version managers like frum (Ruby), rvm, nvm, and others commonly alias
cd to a wrapper function that runs additional logic after directory
changes. When Hermes captures the shell environment into a session
snapshot, these aliases are preserved. If the wrapper function fails
in the subprocess context (e.g. frum not on PATH), every cd fails,
causing all terminal commands to exit with code 126.
Using builtin cd bypasses any aliases or functions, ensuring the
directory change always uses the real bash builtin regardless of
what version managers are installed.
Make Tavily client respect a TAVILY_BASE_URL environment variable,
defaulting to https://api.tavily.com for backward compatibility.
Consistent with FIRECRAWL_API_URL pattern already used in this module.
Zhipu AI (智谱) serves both international users via api.z.ai and
China-based users via open.bigmodel.cn. The domestic endpoint was not
mapped in _URL_TO_PROVIDER, causing Hermes to treat it as an unknown
custom endpoint and fall back to the default 128K context length
instead of resolving the correct 200K+ context via models.dev or the
hardcoded GLM defaults.
This affects users of both the standard API
(https://open.bigmodel.cn/api/paas/v4) and the Coding Plan
(https://open.bigmodel.cn/api/coding/paas/v4).
New and newer models from models.dev now surface automatically in
/model (both hermes model CLI and the gateway Telegram/Discord picker)
for a curated set of secondary providers — no Hermes release required
when the registry publishes a new model.
Primary user-visible fix: on OpenCode Go, typing '/model mimo-v2.5-pro'
no longer silently fuzzy-corrects to 'mimo-v2-pro'. The exact match
against the merged models.dev catalog wins.
Scope (opt-in frozenset _MODELS_DEV_PREFERRED in hermes_cli/models.py):
opencode-go, opencode-zen, deepseek, kilocode, fireworks, mistral,
togetherai, cohere, perplexity, groq, nvidia, huggingface, zai,
gemini, google.
Explicitly NOT merged:
- openrouter and nous (never): curated list is already a hand-picked
subset / Portal is source of truth.
- xai, xiaomi, minimax, minimax-cn, kimi-coding, kimi-coding-cn,
alibaba, qwen-oauth (per-project decision to keep curated-only).
- providers with dedicated live-endpoint paths (copilot, anthropic,
ai-gateway, ollama-cloud, custom, stepfun, openai-codex) — those
paths already handle freshness themselves.
Changes:
- hermes_cli/models.py: add _MODELS_DEV_PREFERRED + _merge_with_models_dev
helper. provider_model_ids() branches on the set at its curated-fallback
return. Merge is models.dev-first, curated-only extras appended,
case-insensitive dedup, graceful fallback when models.dev is offline.
- hermes_cli/model_switch.py: list_authenticated_providers() calls the
same merge in both its code paths (PROVIDER_TO_MODELS_DEV loop +
HERMES_OVERLAYS loop). Picker AND validation-fallback both see
fresh entries.
- tests/hermes_cli/test_models_dev_preferred_merge.py (new): 13 tests —
merge-helper unit tests (empty/raise/order/dedup), opencode-go/zen
behavior, openrouter+nous explicitly guarded from merge.
- tests/hermes_cli/test_opencode_go_in_model_list.py: converted from
snapshot-style assertion to a behavior-based floor check, so it
doesn't break when models.dev publishes additional opencode-go
entries.
Addresses a report from @pfanis via Telegram: newer Xiaomi variants
on OpenCode Go weren't appearing in the /model picker, and /model
was silently routing requests for new variants to older ones.
The code execution sandbox creates a Unix domain socket in /tmp with
default permissions, allowing any local user to connect and execute
tool calls. Restrict to 0o600 after bind.
Closes#6230
- Adds 'ctx_size' field to _CONTEXT_LENGTH_KEYS tuple
- Enables hermes agent to correctly detect context size from custom LLMs
running on Lemonade server that use this field name instead of the
standard keys (max_seq_len, n_ctx_train, n_ctx)
The _looks_like_gateway_process function was missing the
hermes-gateway script pattern, causing dashboard to report gateway
as not running even when the process was active.
Patterns now cover all entry points:
- hermes_cli.main gateway
- hermes_cli/main.py gateway
- hermes gateway
- hermes-gateway (new)
- gateway/run.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Follow-up for salvaged PR #14179.
`_cleanup_invalid_pid_path` previously called `remove_pid_file()` for the
default PID path, but that helper defensively refuses to delete a PID file
whose pid field differs from `os.getpid()` (to protect --replace handoffs).
Every realistic stale-PID scenario is exactly that case: a crashed/Ctrl+C'd
gateway left behind a PID file owned by a now-dead foreign PID.
Once `get_running_pid()` has confirmed the runtime lock is inactive, the
on-disk metadata is known to belong to a dead process, so we can force-unlink
both the PID file and the sibling `gateway.lock` directly instead of going
through the defensive helper.
Also adds a regression test with a dead foreign PID that would have failed
against the previous cleanup logic.
Upgrades agent-browser from 0.13.0 to 0.26.0, picking up 13 releases of
daemon reliability fixes:
- Daemon hang on Linux from waitpid(-1) race in SIGCHLD handler (#1098)
- Chrome killed after ~10s idle due to PR_SET_PDEATHSIG thread tracking (#1157)
- Orphaned Chrome processes via process-group kill on shutdown (#1137)
- Stale daemon after upgrade via .version sidecar and auto-restart (#1134)
- Idle timeout not firing (sleep future recreated each loop) (#1110)
- Navigation hanging on lifecycle events that never fire (#1059, #1092)
- CDP attach hang on Chrome 144+ (#1133)
- Windows daemon TCP bind with Hyper-V port conflicts (#1041)
- Shadow DOM traversal in accessibility tree snapshots
- doctor command for user self-diagnosis
Also wires AGENT_BROWSER_IDLE_TIMEOUT_MS into the browser subprocess
environment so the daemon self-terminates after our configured inactivity
timeout (default 300s). This is the daemon-side counterpart to the
Python-side inactivity reaper — the daemon kills itself and its Chrome
children when no commands arrive, preventing orphan accumulation even
when the Python process dies without running atexit handlers.
Addresses #7343 (daemon socket hangs, shadow DOM) and #13793 (orphan
accumulation from force-killed sessions).
Fixes#12976
The generic "gemma": 8192 fallback was incorrectly matching gemma4:31b-cloud
before the more specific Gemma 4 entries could match, causing Hermes to assign
only 8K context instead of 262K. Added "gemma-4" and "gemma4" entries before
the fallback to correctly handle Gemma 4 model naming conventions.
Plugin slash commands now surface as first-class commands in every gateway
enumerator — Discord native slash picker, Telegram BotCommand menu, Slack
/hermes subcommand map — without a separate per-platform plugin API.
The existing 'command:<name>' gateway hook gains a decision protocol via
HookRegistry.emit_collect(): handlers that return a dict with
{'decision': 'deny'|'handled'|'rewrite'|'allow'} can intercept slash
command dispatch before core handling runs, unifying what would otherwise
have been a parallel 'pre_gateway_command' hook surface.
Changes:
- gateway/hooks.py: add HookRegistry.emit_collect() that fires the same
handler set as emit() but collects non-None return values. Backward
compatible — fire-and-forget telemetry hooks still work via emit().
- hermes_cli/plugins.py: add optional 'args_hint' param to
register_command() so plugins can opt into argument-aware native UI
registration (Discord arg picker, future platforms).
- hermes_cli/commands.py: add _iter_plugin_command_entries() helper and
merge plugin commands into telegram_bot_commands() and
slack_subcommand_map(). New is_gateway_known_command() recognizes both
built-in and plugin commands so the gateway hook fires for either.
- gateway/platforms/discord.py: extract _build_auto_slash_command helper
from the COMMAND_REGISTRY auto-register loop and reuse it for
plugin-registered commands. Built-in name conflicts are skipped.
- gateway/run.py: before normal slash dispatch, call emit_collect on
command:<canonical> and honor deny/handled/rewrite/allow decisions.
Hook now fires for plugin commands too.
- scripts/release.py: AUTHOR_MAP entry for @Magaav.
- Tests: emit_collect semantics, plugin command surfacing per platform,
decision protocol (deny/handled/rewrite/allow + non-dict tolerance),
Discord plugin auto-registration + conflict skipping, is_gateway_known_command.
Salvaged from #14131 (@Magaav). Original PR added a parallel
'pre_gateway_command' hook and a platform-keyed plugin command
registry; this re-implementation reuses the existing 'command:<name>'
hook and treats plugin commands as platform-agnostic so the same
capability reaches Telegram and Slack without new API surface.
Co-authored-by: Magaav <73175452+Magaav@users.noreply.github.com>
Replace xiaomi/mimo-v2-pro with xiaomi/mimo-v2.5-pro and xiaomi/mimo-v2.5
in the OpenRouter fallback catalog and the nous provider model list.
Add matching DEFAULT_CONTEXT_LENGTHS entries (1M tokens each).
- appLayout.tsx: restore the 1-row placeholder when `showStickyPrompt`
is false. Dropping it saved a row but the composer height shifted by
one as the prompt appeared/disappeared, jumping the input vertically
on scroll.
- useInputHandlers: gateway.rpc (from useMainApp) already catches errors
with its own sys() message and resolves to null. The previous `.catch`
was dead code and on RPC failures the user saw both 'error: ...' (from
rpc) and 'failed to toggle yolo'. Drop the catch and gate 'failed to
toggle yolo' on a non-null response so null (= rpc already spoke)
stays silent.
`is_local_endpoint()` leaned on `ipaddress.is_private`, which classifies
RFC-1918 ranges and link-local as private but deliberately excludes the
RFC 6598 CGNAT block (100.64.0.0/10) — the range Tailscale uses for its
mesh IPs. As a result, Ollama reached over Tailscale (e.g.
`http://100.77.243.5:11434`) was treated as remote and missed the
automatic stream-read / stale-stream timeout bumps, so cold model load
plus long prefill would trip the 300 s watchdog before the first token.
Add a module-level `_TAILSCALE_CGNAT = ipaddress.IPv4Network("100.64.0.0/10")`
(built once) and extend `is_local_endpoint()` to match the block both
via the parsed-`IPv4Address` path and the existing bare-string fallback
(for symmetry with the 10/172/192 checks). Also hoist the previously
function-local `import ipaddress` to module scope now that it's used by
the constant.
Extend `TestIsLocalEndpoint` with a CGNAT positive set (lower bound,
representative host, MagicDNS anchor, upper bound) and a near-miss
negative set (just below 100.64.0.0, just above 100.127.255.255, well
outside the block, and first-octet-wrong).
Resolve Feishu @_user_N / @_all placeholders into display names plus a
structured [Mentioned: Name (open_id=...), ...] hint so agents can both
reason about who was mentioned and call Feishu OpenAPI tools with stable
open_ids. Strip bot self-mentions only at message edges (leading
unconditionally, trailing only before whitespace/terminal punctuation)
so commands parse cleanly while mid-text references are preserved.
Covers both plain-text and rich-post payloads.
Also fixes a pre-existing hydration bug: Client.request no longer accepts
the 'method' kwarg on lark-oapi 1.5.3, so bot identity silently failed
to hydrate and self-filtering never worked. Migrate to the
BaseRequest.builder() pattern and accept the 'app_name' field the API
actually returns. Tighten identity matching precedence so open_id is
authoritative when present on both sides.
Adds security.allow_private_urls / HERMES_ALLOW_PRIVATE_URLS toggle so
users on OpenWrt routers, TUN-mode proxies (Clash/Mihomo/Sing-box),
corporate split-tunnel VPNs, and Tailscale networks — where DNS resolves
public domains to 198.18.0.0/15 or 100.64.0.0/10 — can use web_extract,
browser, vision URL fetching, and gateway media downloads.
Single toggle in tools/url_safety.py; all 23 is_safe_url() call sites
inherit automatically. Cached for process lifetime.
Cloud metadata endpoints stay ALWAYS blocked regardless of the toggle:
169.254.169.254 (AWS/GCP/Azure/DO/Oracle), 169.254.170.2 (AWS ECS task
IAM creds), 169.254.169.253 (Azure IMDS wire server), 100.100.100.200
(Alibaba), fd00:ec2::254 (AWS IPv6), the entire 169.254.0.0/16
link-local range, and the metadata.google.internal / metadata.goog
hostnames (checked pre-DNS so they can't be bypassed on networks where
those names resolve to local IPs).
Supersedes #3779 (narrower HERMES_ALLOW_RFC2544 for the same class of
users).
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
- normalizeStatusBar: replace Set + early-returns + cast with a single
alias lookup table. Handles legacy `false`, trims/lowercases strings,
maps `on` → `top` in one pass. One expression, no `as` hacks.
- Tab title block: drop the narrative comment, fold
blockedOnInput/titleStatus/cwdTag/terminalTitle into inline expressions
inside useTerminalTitle. Avoids shadowing the outer `cwd`.
- tui_gateway statusbar set branch: read `display` once instead of
`cfg0.get("display")` twice.
Previously the terminal tab title was `{⏳/✓} {model} — Hermes` which
only distinguished busy vs idle. Users juggling multiple Hermes tabs had
no way to tell which one was waiting on them for approval/clarify/sudo/
secret, and no cue for which workspace the tab was attached to.
- 3-state marker: `⚠` when an overlay prompt is open, `⏳` busy, `✓` idle.
- Append `· {shortCwd}` (28-char budget, $HOME → ~) so the tab surfaces
the workspace directly.
- Drop the `— Hermes` suffix — the marker already signals what this is,
and tab titles are tight.
Anthropic's API can legitimately return content=[] with stop_reason="end_turn"
when the model has nothing more to add after a turn that already delivered the
user-facing text alongside a trivial tool call (e.g. memory write). The transport
validator was treating that as an invalid response, triggering 3 retries that
each returned the same valid-but-empty response, then failing the run with
"Invalid API response after 3 retries."
The downstream normalizer already handles empty content correctly (empty loop
over response.content, content=None, finish_reason="stop"), so the only fix
needed is at the validator boundary.
Tests:
- Empty content + stop_reason="end_turn" → valid (the fix)
- Empty content + stop_reason="tool_use" → still invalid (regression guard)
- Empty content without stop_reason → still invalid (existing behavior preserved)
Copilot on #14145 flagged that the shift+tab yolo handler treated any
non-null RPC result as valid, so a response shape like {value: undefined}
or {value: 'weird'} would incorrectly echo 'yolo off'. Now only '1' and
'0' map to on/off; anything else (including missing value) surfaces as
'failed to toggle yolo', matching the null/catch branches.
When the streaming connection dropped AFTER user-visible text was
delivered but a tool call was in flight, we stubbed the turn with a
'⚠ Stream stalled mid tool-call; Ask me to retry' warning — costing
an iteration and breaking the flow. Users report this happening
increasingly often on long SSE streams through flaky provider routes.
Fix: in the existing inner stream-retry loop, relax the
deltas_were_sent short-circuit. If a tool call was in flight
(partial_tool_names populated) AND the error is a transient connection
error (timeout, RemoteProtocolError, SSE 'connection lost', etc.),
silently retry instead of bailing out. Fire a brief 'Connection
dropped mid tool-call; reconnecting…' marker so the user understands
the preamble is about to be re-streamed.
Researched how Claude Code (tombstone + non-streaming fallback),
OpenCode (blind Effect.retry wrapping whole stream), and Clawdbot
(4-way gate: stopReason==error + output==0 + !hadPotentialSideEffects)
handle this. Chose the narrow Clawdbot-style gate: retry only when
(a) a tool call was actually in flight (otherwise the existing
stub-with-recovered-text is correct for pure-text stalls) and
(b) the error is transient. Side-effect safety is automatic — no
tool has been dispatched within this single API call yet.
UX trade-off: user sees preamble text twice on retry (OpenCode-style).
Strictly better than a lost action with a 'retry manually' message.
If retries exhaust, falls through to the existing stub-with-warning
path so the user isn't left with zero signal.
Tests: 3 new tests in TestSilentRetryMidToolCall covering
(1) silent retry recovers tool call; (2) exhausted retries fall back
to stub; (3) text-only stalls don't trigger retry. 30/30 pass.
Copilot on #14138 flagged that the share report says '(file not found)'
when the log exists but is empty (either because the primary is empty
and no .1 rotation exists, or in the rare race where the file is
truncated between _resolve_log_path() and stat()).
- Split _primary_log_path() out of _resolve_log_path so both can share
the LOG_FILES/home math without duplication.
- _capture_log_snapshot now reports '(file empty)' when the primary
path exists on disk with zero bytes, and keeps '(file not found)'
for the truly-missing case.
Tests: rename test_returns_none_for_empty → test_empty_primary_reports_file_empty
with the new assertion, plus a race-path test that monkeypatches
_resolve_log_path to exercise the size==0 branch directly.
- entry.tsx no longer writes bootBanner() to the main screen before the
alt-screen enters. The <Banner> renders inside the alt screen via the
seeded intro row, so nothing is lost — just the flash that preceded it.
Fixes the torn first frame reported on Alacritty (blitz row 5 #17) and
shaves the 'starting agent' hang perception (row 5 #1) since the UI
paints straight into the steady-state view
- AlternateScreen prefixes ERASE_SCROLLBACK (\x1b[3J) to its entry so
strict emulators start from a pristine grid; named constants replace
the inline sequences for clarity
- bootBanner.ts deleted — dead code
- normalizeStatusBar: trim/lowercase + 'on' → 'top' alias so user-edited
YAML variants (Top, " bottom ", on) coerce correctly
- shift-tab yolo: no-op with sys note when no live session; success-gated
echo and catch fallback so RPC failures don't report as 'yolo off'
- tui_gateway config.set/get statusbar: isinstance(display, dict) guards
mirroring the compact branch so a malformed display scalar in config.yaml
can't raise
Tests: +1 vitest for trim/case/on, +2 pytest for non-dict display survival.
- normalizeStatusBar collapses to one ternary expression
- /statusbar slash hoists the toggle value and flattens the branch tree
- shift-tab yolo comment reduced to one line
- cursorLayout/offsetFromPosition lose paragraph-length comments
- appLayout collapses the three {!overlay.agents && …} into one fragment
- StatusRule drops redundant flexShrink={0} (Yoga default)
- server.py uses a walrus + frozenset and trims the compat helper
Net -43 LoC. 237 vitest + 46 pytest green, layouts unchanged.
The heart was rendered as a literal space when inactive. Because it's
absolutely positioned at right:0 inside the composer row, that blank
still overpainted the rightmost input cell. On wrapped 2-line drafts,
editing near the boundary made the final visible character appear to
jump in/out as it crossed the overpainted column.
When inactive, render nothing; only mount the heart while it's actually
animating.
The 'columns' prop passed to TextInput was cols - pw, but the actual
render width is cols - pw - 2 (NoSelect's paddingX={1} on each side
subtracts two cols from the composer area). cursorLayout thought it
had two extra cols, so wrap-ansi wrapped at render col N while the
declared cursor sat at col N+2 on the same row. The render and the
declared cursor disagreed right at the wrap boundary — the last
letter of a sentence spanning two lines flickered in/out as each
keystroke flipped which cell the cursor claimed.
Also polish the /help hotkeys panel — the !cmd / {!cmd} placeholders
read as literal commands to type, so show them with angle-bracket
syntax and a concrete example (blitz row 5 sub-item 4).
Previous revision added marginTop={1} to the input which stacked as a
phantom gap BETWEEN status and input. The breathing row should sit
ABOVE the status-in-top cluster, not inside it.
- StatusRulePane at="top" now carries its own marginTop={1} so it
always has a one-row gap above (separating it from transcript or,
when queue is present, from the last queue item)
- Input Box marginTop flips: 0 in top mode (status is the separator),
1 in bottom/off mode (input itself caps the composer cluster)
- Net: status and input are tight together in 'top'; input and status
are tight together at the bottom in 'bottom'; one-row breathing room
above whichever element sits on top of the cluster
Three bugs rolled together, all in the composer area:
- StatusRule was measuring as 2 rows in Yoga due to a quirk with the
complex nested <Text wrap="truncate-end"> content. Lock the outer box
to height={1} so 'top' mode actually abuts the input instead of
leaving a phantom blank row between them
- FloatingOverlays (slash completions, /model picker, /resume, /skills
browser, pager) was anchored to the status box. In 'bottom' mode the
status box moved away, so overlays vanished. Move the overlays into
the input row (which is position:relative) so they always pop up
above the input regardless of status position
- Drop the <Text> </Text> fallback in the sticky-prompt slot (only
render a row when there's an actual sticky prompt to show) and
collapse the now-unused Box column wrapping the input. Saves two
rows of dead vertical space in the default layout
'top' and 'bottom' are positions relative to the input row, not the alt
screen viewport:
- top (default) → inline above the input, where the bar originally lived
(what 'on' used to mean)
- bottom → below the input, pinned to the last row
- off → hidden
Drops the literal top-of-screen placement; 'on' is kept as a backward-
compat alias that resolves to 'top' at both the config layer
(normalizeStatusBar, _coerce_statusbar) and the slash command.
Default is back to 'on' (inline, above the input) — bottom was too far
from the input and felt disconnected. Users who want it pinned can
opt in explicitly.
- UiState.statusBar: boolean → 'on' | 'off' | 'bottom' | 'top'
- /statusbar [on|off|bottom|top|toggle]; no-arg still binary-toggles
between off and on (preserves muscle memory)
- appLayout renders StatusRulePane in three slots (inline inside
ComposerPane for 'on', above transcript row for 'top', after
ComposerPane for 'bottom'); only the slot matching ui.statusBar
actually mounts
- drop the input's marginBottom when 'bottom' so the rule sits tight
against the input instead of floating a row below
- useConfigSync.normalizeStatusBar coerces legacy bool (true→on,
false→off) and unknown shapes to 'on' for forward-compat reads
- tui_gateway: split compact from statusbar config handlers; persist
string enum with _coerce_statusbar helper for legacy bool configs
- input wrap: add <Text wrap="wrap-char"> mode that drives wrap-ansi with
wordWrap:false, and align cursorLayout/offsetFromPosition to that same
boundary (w=cols, trailing-cell overflow). Word-wrap's whitespace
reshuffle was causing the cursor to jump a word left/right on each
keystroke near the right edge — blitz row 9
- shift-tab: toggle per-session yolo without submitting a turn (mirrors
Claude Code's in-place dangerously-approve); slash /yolo still works
for discoverability — blitz row 5 sub-item 11
- statusline: lift StatusRule out of ComposerPane to a new StatusRulePane
anchored at the bottom of AppLayout, below the input — blitz row 5
sub-item 12
The salvaged status-bar skin keys were seeded on the default skin, but
_build_skin_config merges default.colors into every skin — so daylight
and warm-lightmode silently inherited silver status_bar_text (#C0C0C0)
on their light backgrounds, rendering as low-contrast gray on gray.
Drop the seven status_bar_{text,strong,dim,good,warn,bad,critical}
entries from the default skin's colors and let get_prompt_toolkit_style
_overrides fall back to banner_text / banner_title / banner_dim /
ui_ok / ui_warn / ui_error. Dark skins keep their explicit overrides
and render identically; light skins now inherit their own dark banner
colors for readable status-bar text.
Drop rebased test assumptions about theme-mode helpers removed on main and keep the status bar skin integration aligned with the current skin engine model.
Route prompt_toolkit status bar colors through the skin engine so /skin updates the status bar alongside the rest of the interactive TUI.
Add regression coverage for the new status bar style override keys and CLI style composition.
These thin wrappers around _capture_log_snapshot had zero production
callers after the snapshot refactor — run_debug_share uses snapshots
directly and collect_debug_report captures internally. The wrappers
also caused a performance regression: _read_log_tail read up to 512KB
and built full_text just to return tail_text.
Remove both wrappers and migrate TestReadFullLog → TestCaptureLogSnapshot
to test _capture_log_snapshot directly. Same coverage, tests the real
API instead of dead indirection.
Add missing AUTHOR_MAP entry for taosiyuan163 whose truncation boundary
fix was adapted into _capture_log_snapshot().
Add regression tests proving: line-boundary truncation keeps the full
first line, mid-line truncation correctly drops the partial fragment.
Adapt the byte-boundary-safe truncation fix from PR #14040 by
taosiyuan163 into the new _capture_log_snapshot() code path: when
the truncation cut lands exactly on a line boundary, keep the first
retained line instead of unconditionally dropping it.
Also add a 2x max_bytes safety cap to the backward-reading loop to
prevent unbounded memory consumption when log files contain very long
lines (e.g. JSON blobs) with few newlines.
Based on #14040 by @taosiyuan163.
Pull duplicated rules into ui-tui/src/lib/subagentTree so the live overlay,
disk snapshot label, and diff pane all speak one dialect:
- export fmtDuration(seconds) — was a private helper in subagentTree;
agentsOverlay's local secLabel/fmtDur/fmtElapsedLabel now wrap the same
core (with UI-only empty-string policy).
- export topLevelSubagents(items) — matches buildSubagentTree's orphan
semantics (no parent OR parent not in snapshot). Replaces three hand-
rolled copies across createGatewayEventHandler (disk label), agentsOverlay
DiffPane, and prior inline filters.
Also collapse agentsOverlay boilerplate:
- replace IIFE title + inner `delta` helper with straight expressions;
- introduce module-level diffMetricLine for replay-diff rows;
- tighten OverlayScrollbar (single thumbColor expression, vBar/thumbBody).
Adds unit coverage for the new exports (fmtDuration + topLevelSubagents).
No behaviour change; 221 tests pass.
- delegate_task: use shared tool_error() for the paused-spawn early return
so the error envelope matches the rest of the tool.
- Disk snapshot label: treat orphaned nodes (parentId missing from the
snapshot) as top-level, matching buildSubagentTree / summarizeLabel.
- List rows: pad the status dot with space before (heat-marker gap or
matching 2-space filler) and after (3 spaces to goal) so `●` / `○` /
`✓` / `■` / `✗` don't read glued to the heat bar or the goal text.
- Gantt rows: bump id→bar separator from 1 to 2 spaces; widen the id
gutter from 4 to 5 cols and re-align the ruler lead to match.
Four real issues Copilot flagged:
1. delegate_tool: `_build_child_agent` never passed `toolsets` to the
progress callback, so the event payload's `toolsets` field (wired
through every layer) was always empty and the overlay's toolsets
row never populated. Thread `child_toolsets` through.
2. event handler: the race-protection on subagent.spawn_requested /
subagent.start only preserved `completed`, so a late-arriving queued
event could clobber `failed` / `interrupted` too. Preserve any
terminal status (`completed | failed | interrupted`).
3. SpawnHud: comment claimed concurrency was approximated by "widest
level in the tree" but code used `totals.activeCount` (total across
all parents). `max_concurrent_children` is a per-parent cap, so
activeCount over-warns for multi-orchestrator runs. Switch to
`max(widthByDepth(tree))`; the label now reads `⚡W/cap+extra` where
W is the widest level (drives the ratio) and `+extra` is the rest.
4. spawn_tree.list: comment said "peek header without parsing full list"
but the code json.loads()'d every snapshot. Adds a per-session
`_index.jsonl` sidecar written on save; list() reads only the index
(with a full-scan fallback for pre-index sessions). O(1) per
snapshot now vs O(file-size).
Adds _reactions_enabled() gating to match Discord (DISCORD_REACTIONS) and
Telegram (TELEGRAM_REACTIONS) pattern. Defaults to true to preserve existing
behavior. Gates at three levels:
- _handle_slack_message: skips _reacting_message_ids registration
- on_processing_start: early return
- on_processing_complete: early return
Also adds config.yaml bridge (slack.reactions) and two new tests.
Slack reactions were placed around handle_message(), which returns
immediately after spawning a background task. This caused the 👀
→ ✅ swap to happen before any real work began.
Fix: implement on_processing_start / on_processing_complete callbacks
(matching Discord/Telegram) so reactions bracket actual _message_handler
work driven by the base class.
Also fixes missing stop_typing() for Slack's assistant thread status
indicator, which left 'is thinking...' stuck in the UI after processing
completed.
- Add _reacting_message_ids set for DM/@mention-only gating
- Add _active_status_threads dict for stop_typing lookup
- Update test_reactions_in_message_flow for new callback pattern
- Add test_reactions_failure_outcome and test_reactions_skipped_for_non_dm_non_mention
- createGatewayEventHandler: remove dead `return` after a block that
always returns (tool.complete case). The inner block exits via
both branches so the outer statement was never reachable. Was
pre-existing on main; fixed here because it was the only thing
blocking `npm run fix` on this branch.
- agentsOverlay + ops: prettier reformatting.
`npm run fix` / `npm run type-check` / `npm test` all clean.
The Write tool that wrote the cleaned overlay split the `if` keyword
across two lines in 9 places (` i\nf (cond) {`), which silently
passed one typecheck run but actually left the handler as broken
JS — every keystroke threw. Input froze in the /agents overlay
(j/k/arrows/q/etc. all no-ops) while the 500ms now-tick kept
rendering, so the UI looked "frozen but the timeline moves".
Reflows the handler as-intended with no behaviour change.
Adds a live + post-hoc audit surface for recursive delegate_task fan-out.
None of cc/oc/oclaw tackle nested subagent trees inside an Ink overlay;
this ships a view-switched dashboard that handles arbitrary depth + width.
Python
- delegate_tool: every subagent event now carries subagent_id, parent_id,
depth, model, tool_count; subagent.complete also ships input/output/
reasoning tokens, cost, api_calls, files_read/files_written, and a
tail of tool-call outputs
- delegate_tool: new subagent.spawn_requested event + _active_subagents
registry so the overlay can kill a branch by id and pause new spawns
- tui_gateway: new RPCs delegation.status, delegation.pause,
subagent.interrupt, spawn_tree.save/list/load (disk under
\$HERMES_HOME/spawn-trees/<session>/<ts>.json)
TUI
- /agents overlay: full-width list mode (gantt strip + row picker) and
Enter-to-drill full-width scrollable detail mode; inverse+amber
selection, heat-coloured branch markers, wall-clock gantt with tick
ruler, per-branch rollups
- Detail pane: collapsible accordions (Budget, Files, Tool calls, Output,
Progress, Summary); open-state persists across agents + mode switches
via a shared atom
- /replay [N|last|list|load <path>] for in-memory + disk history;
/replay-diff <a> <b> for side-by-side tree comparison
- Status-bar SpawnHud warns as depth/concurrency approaches caps;
overlay auto-follows the just-finished turn onto history[1]
- Theme: bump DARK dim #B8860B → #CC9B1F for readable secondary text
globally; keep LIGHT untouched
Tests: +29 new subagentTree unit tests; 215/215 passing.
Follow-up to the cherry-picked PR #13897 fix. Three issues found:
1. CRITICAL: The thinking block synthesised from reasoning_content was
immediately stripped by the third-party signature management code
(Kimi is classified as _is_third_party_anthropic_endpoint). Added a
Kimi-specific carve-out that preserves unsigned thinking blocks while
still stripping Anthropic-signed blocks Kimi can't validate.
2. Empty-string reasoning_content was silently dropped because the
truthiness check ('if reasoning_content and ...') evaluates to False
for ''. Changed to 'isinstance(reasoning_content, str)' so the
tier-3 fallback from _copy_reasoning_content_for_api (which injects
'' for Kimi tool-call messages with no reasoning) actually produces
a thinking block.
3. The thinking block was appended AFTER tool_use blocks. Anthropic
protocol requires thinking -> text -> tool_use ordering. Changed to
blocks.insert(0, ...) to prepend.
FixesNousResearch/hermes-agent#13848
Kimi's /coding endpoint speaks the Anthropic Messages protocol but has its
own thinking semantics: when thinking is enabled, Kimi validates message
history and requires every prior assistant tool-call message to carry
OpenAI-style reasoning_content.
The Anthropic path never populated that field, and
convert_messages_to_anthropic strips all Anthropic thinking blocks on
third-party endpoints — so the request failed with HTTP 400:
"thinking is enabled but reasoning_content is missing in assistant
tool call message at index N"
Now, when an assistant message contains tool_calls and a
reasoning_content string, we append a {"type": "thinking", ...} block
to the Anthropic content so Kimi can validate the history. This only
affects assistant messages with tool_calls + reasoning_content; plain
text assistant messages are unchanged.
The 404 branch in _classify_by_status had dead code: the generic
fallback below the _MODEL_NOT_FOUND_PATTERNS check returned the
exact same classification (model_not_found + should_fallback=True),
so every 404 — regardless of message — was treated as a missing model.
This bites local-endpoint users (llama.cpp, Ollama, vLLM) whose 404s
usually mean a wrong endpoint path, proxy routing glitch, or transient
backend issue — not a missing model. Claiming 'model not found' misleads
the next turn and silently falls back to another provider when the real
problem was a URL typo the user should see.
Fix: only classify 404 as model_not_found when the message actually
matches _MODEL_NOT_FOUND_PATTERNS ("invalid model", "model not found",
etc.). Otherwise fall through as unknown (retryable) so the real error
surfaces in the retry loop.
Test updated to match the new behavior. 103 error_classifier tests pass.
* fix(plugins): auto-coerce user-installed memory plugins to kind=exclusive
User-installed memory provider plugins at $HERMES_HOME/plugins/<name>/
were being dispatched to the general PluginManager, which has no
register_memory_provider method on PluginContext. Every startup logged:
Failed to load plugin 'mempalace': 'PluginContext' object has no
attribute 'register_memory_provider'
Bundled memory providers were already skipped via skip_names={memory,
context_engine} in discover_and_load, but user-installed ones weren't.
Fix: _parse_manifest now scans the plugin's __init__.py source for
'register_memory_provider' or 'MemoryProvider' (same heuristic as
plugins/memory/__init__.py:_is_memory_provider_dir) and auto-coerces
kind to 'exclusive' when the manifest didn't declare one explicitly.
This routes the plugin to plugins/memory discovery instead of the
general loader.
The escape hatch: if a manifest explicitly declares kind: standalone,
the heuristic doesn't override it.
Reported by Uncle HODL on Discord.
* fix(nous): actionable CLI message when Nous 401 refresh fails
Mirrors the Anthropic 401 diagnostic pattern. When Nous returns 401
and the credential refresh (_try_refresh_nous_client_credentials)
also fails, the user used to see only the raw APIError. Now prints:
🔐 Nous 401 — Portal authentication failed.
Response: <truncated body>
Most likely: Portal OAuth expired, account out of credits, or
agent key revoked.
Troubleshooting:
• Re-authenticate: hermes login --provider nous
• Check credits / billing: https://portal.nousresearch.com
• Verify stored credentials: $HERMES_HOME/auth.json
• Switch providers temporarily: /model <model> --provider openrouter
Addresses the common 'my hermes model hangs' pattern where the user's
Portal OAuth expired and the CLI gave no hint about the next step.
Adds schema v7 'api_call_count' column. run_agent.py increments it by 1
per LLM API call, web_server analytics SQL aggregates it, frontend uses
the real counter instead of summing sessions.
The 'API Calls' card on the analytics dashboard previously displayed
COUNT(*) from the sessions table — the number of conversations, not
LLM requests. Each session makes 10-90 API calls through the tool loop,
so the reported number was ~30x lower than real.
Salvaged from PR #10140 (@kshitijk4poor). The cache-token accuracy
portions of the original PR were deferred — per-provider analytics is
the better path there, since cache_write_tokens and actual_cost_usd
are only reliably available from a subset of providers (Anthropic
native, Codex Responses, OpenRouter with usage.include).
Tests:
- schema_version v7 assertion
- migration v2 -> v7 adds api_call_count column with default 0
- update_token_counts increments api_call_count by provided delta
- absolute=True sets api_call_count directly
- /api/analytics/usage exposes total_api_calls in totals
- Replace async create_bind_task/poll_bind_result with synchronous
httpx.Client equivalents, eliminating manual event loop management
- Move _render_qr and full qr_register() entry-point into onboard.py,
mirroring the Feishu onboarding pattern
- Remove _qqbot_render_qr and _qqbot_qr_flow from gateway.py (~90 lines);
call site becomes a single qr_register() import
- Fix potential segfault: previous code called loop.close() in the EXPIRED
branch and again in the finally block (double-close crashed under uvloop)
- Add configurable retain_tags / retain_source / retain_user_prefix /
retain_assistant_prefix knobs for native Hindsight.
- Thread gateway session identity (user_name, chat_id, chat_name,
chat_type, thread_id) through AIAgent and MemoryManager into
MemoryProvider.initialize kwargs so providers can scope and tag
retained memories.
- Hindsight attaches the new identity fields as retain metadata,
merges per-call tool tags with configured default tags, and uses
the configurable transcript labels for auto-retained turns.
Co-authored-by: Abner <abner.the.foreman@agentmail.to>
* feat(state): auto-prune old sessions + VACUUM state.db at startup
state.db accumulates every session, message, and FTS5 index entry forever.
A heavy user (gateway + cron) reported 384MB with 982 sessions / 68K messages
causing slowdown; manual 'hermes sessions prune --older-than 7' + VACUUM
brought it to 43MB. The prune command and VACUUM are not wired to run
automatically anywhere — sessions grew unbounded until users noticed.
Changes:
- hermes_state.py: new state_meta key/value table, vacuum() method, and
maybe_auto_prune_and_vacuum() — idempotent via last-run timestamp in
state_meta so it only actually executes once per min_interval_hours
across all Hermes processes for a given HERMES_HOME. Never raises.
- hermes_cli/config.py: new 'sessions:' block in DEFAULT_CONFIG
(auto_prune=True, retention_days=90, vacuum_after_prune=True,
min_interval_hours=24). Added to _KNOWN_ROOT_KEYS.
- cli.py: call maintenance once at HermesCLI init (shared helper
_run_state_db_auto_maintenance reads config and delegates to DB).
- gateway/run.py: call maintenance once at GatewayRunner init.
- Docs: user-guide/sessions.md rewrites 'Automatic Cleanup' section.
Why VACUUM matters: SQLite does NOT shrink the file on DELETE — freed
pages get reused on next INSERT. Without VACUUM, a delete-heavy DB stays
bloated forever. VACUUM only runs when the prune actually removed rows,
so tight DBs don't pay the I/O cost.
Tests: 10 new tests in tests/test_hermes_state.py covering state_meta,
vacuum, idempotency, interval skipping, VACUUM-only-when-needed,
corrupt-marker recovery. All 246 existing state/config/gateway tests
still pass.
Verified E2E with real imports + isolated HERMES_HOME: DEFAULT_CONFIG
exposes the new block, load_config() returns it for fresh installs,
first call prunes+vacuums, second call within min_interval_hours skips,
and the state_meta marker persists across connection close/reopen.
* sessions.auto_prune defaults to false (opt-in)
Session history powers session_search recall across past conversations,
so silently pruning on startup could surprise users. Ship the machinery
disabled and let users opt in when they notice state.db is hurting
performance.
- DEFAULT_CONFIG.sessions.auto_prune: True → False
- Call-site fallbacks in cli.py and gateway/run.py match the new default
(so unmigrated configs still see off)
- Docs: flip 'Enable in config.yaml' framing + tip explains the tradeoff
Follow-ups on top of salvaged #13923 (@keifergu):
- Print QR poll dot every 3s instead of every 18s so "Fetching
configuration results..." doesn't look hung.
- On "status=success but no bot_info" from the WeCom query endpoint,
log the full payload at WARNING and tell the user we're falling
back to manual entry (was previously a single opaque line).
- Document in the qr_scan_for_bot_info() docstring that the
work.weixin.qq.com/ai/qc/* endpoints are the admin-console web-UI
flow, not the public developer API, and may change without notice.
Also add keifergu@tencent.com to scripts/release.py AUTHOR_MAP so
release notes attribute the feature correctly.
Adds an optional skill that walks users through installing and using
alibaba/page-agent — a pure-JS in-page GUI agent that web developers
embed into their own webapps so end users can drive the UI with
natural language.
Three install paths: CDN demo (30s, no install), npm install into an
existing app with provider config table (Qwen/OpenAI/Ollama/OpenRouter),
and clone-from-source for dev/contributor workflow.
Clear use-case framing up front (embed AI copilot in SaaS/admin/B2B,
modernize legacy UIs, accessibility via natural language) and an
explicit NOT-for list that points users wanting server-side browser
automation back to Hermes' built-in browser tool.
Live-verified: repo builds on Node 22.22 + npm 10.9, dev:demo serves
at localhost:5174, API surface (new PageAgent{...}, panel.show(),
execute(task)) matches what the skill documents. Also verified
discovery end-to-end via OptionalSkillSource with isolated
HERMES_HOME — search/inspect/fetch all resolve
official/web-development/page-agent correctly.
New category directory: optional-skills/web-development/ with a
DESCRIPTION.md explaining the distinction from Hermes' own browser
automation (outside-in vs inside-out).
The transport refactor (PRs #13862 ff.) added agent/transports/ as a
sub-package but the setuptools packages.find include list only had
"agent" (top-level files), not "agent.*" (sub-packages).
pip install / Nix builds therefore ship run_agent.py (which now imports
from agent.transports on every API call) but omit the transports
directory entirely, causing:
ModuleNotFoundError: No module named 'agent.transports'
on every LLM call for packaged installs.
Adds "agent.*" to match the existing pattern used by tools, gateway,
tui_gateway, and plugins.
Adds a first-class 'stepfun' API-key provider surfaced as Step Plan:
- Support Step Plan setup for both International and China regions
- Discover Step Plan models live from /step_plan/v1/models, with a
small coding-focused fallback catalog when discovery is unavailable
- Thread StepFun through provider metadata, setup persistence, status
and doctor output, auxiliary routing, and model normalization
- Add tests for provider resolution, model validation, metadata
mapping, and StepFun region/model persistence
Based on #6005 by @hengm3467.
Co-authored-by: hengm3467 <100685635+hengm3467@users.noreply.github.com>
_make_agent() was not calling resolve_runtime_provider(), so bare-slug
models (e.g. 'claude-opus-4-6' with provider: anthropic) left provider,
base_url, and api_key empty in AIAgent — causing HTTP 404 at
api.anthropic.com.
Now mirrors cli.py: calls resolve_runtime_provider(requested=None) and
forwards all 7 resolved fields to AIAgent.
Adds regression test.
2026-04-18 22:01:07 -07:00
1742 changed files with 195269 additions and 32106 deletions
### TUI in the Dashboard (`hermes dashboard` → `/chat`)
The dashboard embeds the real `hermes --tui` — **not** a rewrite. See `hermes_cli/pty_bridge.py` + the `@app.websocket("/api/pty")` endpoint in `hermes_cli/web_server.py`.
- Browser loads `web/src/pages/ChatPage.tsx`, which mounts xterm.js's `Terminal` with the WebGL renderer, `@xterm/addon-fit` for container-driven resize, and `@xterm/addon-unicode11` for modern wide-character widths.
-`/api/pty?token=…` upgrades to a WebSocket; auth uses the same ephemeral `_SESSION_TOKEN` as REST, via query param (browsers can't set `Authorization` on WS upgrade).
- The server spawns whatever `hermes --tui` would spawn, through `ptyprocess` (POSIX PTY — WSL works, native Windows does not).
- Frames: raw PTY bytes each direction; resize via `\x1b[RESIZE:<cols>;<rows>]` intercepted on the server and applied with `TIOCSWINSZ`.
**Do not re-implement the primary chat experience in React.** The main transcript, composer/input flow (including slash-command behavior), and PTY-backed terminal belong to the embedded `hermes --tui` — anything new you add to Ink shows up in the dashboard automatically. If you find yourself rebuilding the transcript or composer for the dashboard, stop and extend Ink instead.
**Structured React UI around the TUI is allowed when it is not a second chat surface.** Sidebar widgets, inspectors, summaries, status panels, and similar supporting views (e.g. `ChatSidebar`, `ModelPickerDialog`, `ToolCall`) are fine when they complement the embedded TUI rather than replacing the transcript / composer / terminal. Keep their state independent of the PTY child's session and surface their failures non-destructively so the terminal pane keeps working unimpaired.
---
## Adding New Tools
@@ -263,7 +282,7 @@ registry.register(
**2. Add to `toolsets.py`** — either `_HERMES_CORE_TOOLS` (all platforms) or a new toolset.
Auto-discovery: any `hermes_agent/tools/*.py` file with a top-level `registry.register()` call is imported automatically — no manual import list to maintain.
Auto-discovery: any `tools/*.py` file with a top-level `registry.register()` call is imported automatically — no manual import list to maintain.
The registry handles schema collection, dispatch, availability checking, and error wrapping. All handlers MUST return a JSON string.
@@ -271,7 +290,7 @@ The registry handles schema collection, dispatch, availability checking, and err
**State files**: If a tool stores persistent state (caches, logs, checkpoints), use `get_hermes_home()` for the base directory — never `Path.home() / ".hermes"`. This ensures each profile gets its own state.
**Agent-level tools** (todo, memory): intercepted by `run_agent.py` before `handle_function_call()`. See `todo_tool.py` for the pattern.
**Agent-level tools** (todo, memory): intercepted by `run_agent.py` before `handle_function_call()`. See `tools/todo_tool.py` for the pattern.
---
@@ -279,9 +298,13 @@ The registry handles schema collection, dispatch, availability checking, and err
### config.yaml options:
1. Add to `DEFAULT_CONFIG` in `hermes_cli/config.py`
2. Bump `_config_version` (currently 5) to trigger migration for existing users
2. Bump `_config_version` (check the current value at the top of `DEFAULT_CONFIG`)
ONLY if you need to actively migrate/transform existing user config
(renaming keys, changing structure). Adding a new key to an existing
section is handled automatically by the deep-merge and does NOT require
a version bump.
### .env variables:
### .env variables (SECRETS ONLY — API keys, tokens, passwords):
1. Add to `OPTIONAL_ENV_VARS` in `hermes_cli/config.py` with metadata:
```python
"NEW_API_KEY":{
@@ -293,13 +316,29 @@ The registry handles schema collection, dispatch, availability checking, and err
`metadata.hermes.config` (config.yaml settings the skill needs — stored
under `skills.config.<key>`, prompted during setup, injected at load time).
---
## Important Policies
### Prompt Caching Must Not Break
Hermes-Agent ensures caching remains valid throughout a conversation. **Do NOT implement changes that would:**
@@ -402,9 +529,10 @@ Hermes-Agent ensures caching remains valid throughout a conversation. **Do NOT i
Cache-breaking forces dramatically higher costs. The ONLY time we alter context is during context compression.
### Working Directory Behavior
- **CLI**: Uses current directory (`.` → `os.getcwd()`)
- **Messaging**: Uses `MESSAGING_CWD` env var (default: homedirectory)
Slash commands that mutate system-prompt state (skills, tools, memory, etc.)
must be **cache-aware**: default to deferred invalidation (change takes
effect next session), with an opt-in `--now` flag for immediate
invalidation. See `/skills install --now` for the canonical pattern.
### Background Process Notifications (Gateway)
@@ -426,7 +554,7 @@ Hermes supports **profiles** — multiple fully isolated instances, each with it
`HERMES_HOME` directory (config, API keys, memory, sessions, skills, gateway, etc.).
The core mechanism: `_apply_profile_override()` in `hermes_cli/main.py` sets
`HERMES_HOME` before any module imports. All 119+ references to `get_hermes_home()`
`HERMES_HOME` before any module imports. All `get_hermes_home()` references
automatically scope to the active profile.
### Rules for profile-safe code
@@ -483,17 +611,45 @@ Use `get_hermes_home()` from `hermes_constants` for code paths. Use `display_her
for user-facing print/log messages. Hardcoding `~/.hermes` breaks profiles — each profile
has its own `HERMES_HOME` directory. This was the source of 5 bugs fixed in PR #3575.
### DO NOT use `simple_term_menu` for interactive menus
Rendering bugs in tmux/iTerm2 — ghosting on scroll. Use `curses` (stdlib) instead. See `hermes_cli/tools_config.py` for the pattern.
### DO NOT introduce new `simple_term_menu` usage
Existing call sites in `hermes_cli/main.py` remain for legacy fallback only;
the preferred UI is curses (stdlib) because `simple_term_menu` has
ghost-duplication rendering bugs in tmux/iTerm2 with arrow keys. New
interactive menus must use `hermes_cli/curses_ui.py` — see
`hermes_cli/tools_config.py` for the canonical pattern.
### DO NOT use `\033[K` (ANSI erase-to-EOL) in spinner/display code
Leaks as literal `?[K` text under `prompt_toolkit`'s `patch_stdout`. Use space-padding: `f"\r{line}{' ' * pad}"`.
### `_last_resolved_tool_names` is a process-global in `hermes_agent/tools/dispatch.py`
### `_last_resolved_tool_names` is a process-global in `model_tools.py`
`_run_single_child()` in `delegate_tool.py` saves and restores this global around subagent execution. If you add new code that reads this global, be aware it may be temporarily stale during child agent runs.
### DO NOT hardcode cross-tool references in schema descriptions
Tool schema descriptions must not mention tools from other toolsets by name (e.g., `browser_navigate` saying "prefer web_search"). Those tools may be unavailable (missing API keys, disabled toolset), causing the model to hallucinate calls to non-existent tools. If a cross-reference is needed, add it dynamically in `get_tool_definitions()` in `hermes_agent/tools/dispatch.py` — see the `browser_navigate` / `execute_code` post-processing blocks for the pattern.
Tool schema descriptions must not mention tools from other toolsets by name (e.g., `browser_navigate` saying "prefer web_search"). Those tools may be unavailable (missing API keys, disabled toolset), causing the model to hallucinate calls to non-existent tools. If a cross-reference is needed, add it dynamically in `get_tool_definitions()` in `model_tools.py` — see the `browser_navigate` / `execute_code` post-processing blocks for the pattern.
### The gateway has TWO message guards — both must bypass approval/control commands
When an agent is running, messages pass through two sequential guards:
(1) **base adapter** (`gateway/platforms/base.py`) queues messages in
`_pending_messages` when `session_key in self._active_sessions`, and
| Browse skills | `/skills` or `/<skill-name>` | `/skills` or `/<skill-name>` |
| Browse skills | `/skills` or `/<skill-name>` | `/<skill-name>` |
| Interrupt current work | `Ctrl+C` or send a new message | `/stop` or send a new message |
| Platform-specific status | `/platforms` | `/status`, `/sethome` |
@@ -157,14 +157,10 @@ curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv venv --python 3.11
source venv/bin/activate
uv pip install -e ".[all,dev]"
python -m pytest tests/ -q
scripts/run_tests.sh
```
> **RL Training (optional):** To work on the RL/Tinker-Atropos integration:
> ```bash
> git submodule update --init tinker-atropos
> uv pip install -e "./tinker-atropos"
> ```
> **RL Training (optional):** The RL/Atropos integration (`environments/`) ships via the `atroposlib` and `tinker` dependencies pulled in by `.[all,dev]` — no submodule setup required.
> The Interface release — a full React/Ink rewrite of the interactive CLI, a pluggable transport architecture underneath every provider, native AWS Bedrock support, five new inference paths, a 17th messaging platform (QQBot), a dramatically expanded plugin surface, and GPT-5.5 via Codex OAuth.
This release also folds in all the highlights deferred from v0.10.0 (which shipped only the Nous Tool Gateway) — so it covers roughly two weeks of work across the whole stack.
---
## ✨ Highlights
- **New Ink-based TUI** — `hermes --tui` is now a full React/Ink rewrite of the interactive CLI, with a Python JSON-RPC backend (`tui_gateway`). Sticky composer, live streaming with OSC-52 clipboard support, stable picker keys, status bar with per-turn stopwatch and git branch, `/clear` confirm, light-theme preset, and a subagent spawn observability overlay. ~310 commits to `ui-tui/` + `tui_gateway/`. (@OutThisLife + Teknium)
- **Transport ABC + Native AWS Bedrock** — Format conversion and HTTP transport were extracted from `run_agent.py` into a pluggable `agent/transports/` layer. `AnthropicTransport`, `ChatCompletionsTransport`, `ResponsesApiTransport`, and `BedrockTransport` each own their own format conversion and API shape. Native AWS Bedrock support via the Converse API ships on top of the new abstraction. ([#10549](https://github.com/NousResearch/hermes-agent/pull/10549), [#13347](https://github.com/NousResearch/hermes-agent/pull/13347), [#13366](https://github.com/NousResearch/hermes-agent/pull/13366), [#13430](https://github.com/NousResearch/hermes-agent/pull/13430), [#13805](https://github.com/NousResearch/hermes-agent/pull/13805), [#13814](https://github.com/NousResearch/hermes-agent/pull/13814) — @kshitijk4poor + Teknium)
- **Five new inference paths** — Native NVIDIA NIM ([#11774](https://github.com/NousResearch/hermes-agent/pull/11774)), Arcee AI ([#9276](https://github.com/NousResearch/hermes-agent/pull/9276)), Step Plan ([#13893](https://github.com/NousResearch/hermes-agent/pull/13893)), Google Gemini CLI OAuth ([#11270](https://github.com/NousResearch/hermes-agent/pull/11270)), and Vercel ai-gateway with pricing + dynamic discovery ([#13223](https://github.com/NousResearch/hermes-agent/pull/13223) — @jerilynzheng). Plus Gemini routed through the native AI Studio API for better performance ([#12674](https://github.com/NousResearch/hermes-agent/pull/12674)).
- **GPT-5.5 over Codex OAuth** — OpenAI's new GPT-5.5 reasoning model is now available through your ChatGPT Codex OAuth, with live model discovery wired into the model picker so new OpenAI releases show up without catalog updates. ([#14720](https://github.com/NousResearch/hermes-agent/pull/14720))
- **QQBot — 17th supported platform** — Native QQBot adapter via QQ Official API v2, with QR scan-to-configure setup wizard, streaming cursor, emoji reactions, and DM/group policy gating that matches WeCom/Weixin parity. ([#9364](https://github.com/NousResearch/hermes-agent/pull/9364), [#11831](https://github.com/NousResearch/hermes-agent/pull/11831))
- **Plugin surface expanded** — Plugins can now register slash commands (`register_command`), dispatch tools directly (`dispatch_tool`), block tool execution from hooks (`pre_tool_call` can veto), rewrite tool results (`transform_tool_result`), transform terminal output (`transform_terminal_output`), ship image_gen backends, and add custom dashboard tabs. The bundled disk-cleanup plugin is opt-in by default as a reference implementation. ([#9377](https://github.com/NousResearch/hermes-agent/pull/9377), [#10626](https://github.com/NousResearch/hermes-agent/pull/10626), [#10763](https://github.com/NousResearch/hermes-agent/pull/10763), [#10951](https://github.com/NousResearch/hermes-agent/pull/10951), [#12929](https://github.com/NousResearch/hermes-agent/pull/12929), [#12944](https://github.com/NousResearch/hermes-agent/pull/12944), [#12972](https://github.com/NousResearch/hermes-agent/pull/12972), [#13799](https://github.com/NousResearch/hermes-agent/pull/13799), [#14175](https://github.com/NousResearch/hermes-agent/pull/14175))
- **`/steer` — mid-run agent nudges** — `/steer <prompt>` injects a note that the running agent sees after its next tool call, without interrupting the turn or breaking prompt cache. For when you want to course-correct an agent in-flight. ([#12116](https://github.com/NousResearch/hermes-agent/pull/12116))
- **Shell hooks** — Wire any shell script as a Hermes lifecycle hook (pre_tool_call, post_tool_call, on_session_start, etc.) without writing a Python plugin. ([#13296](https://github.com/NousResearch/hermes-agent/pull/13296))
- **Webhook direct-delivery mode** — Webhook subscriptions can now forward payloads straight to a platform chat without going through the agent — zero-LLM push notifications for alerting, uptime checks, and event streams. ([#12473](https://github.com/NousResearch/hermes-agent/pull/12473))
- **Smarter delegation** — Subagents now have an explicit `orchestrator` role that can spawn their own workers, with configurable `max_spawn_depth` (default flat). Concurrent sibling subagents share filesystem state through a file-coordination layer so they don't clobber each other's edits. ([#13691](https://github.com/NousResearch/hermes-agent/pull/13691), [#13718](https://github.com/NousResearch/hermes-agent/pull/13718))
- **Auxiliary models — configurable UI + main-model-first** — `hermes model` has a dedicated "Configure auxiliary models" screen for per-task overrides (compression, vision, session_search, title_generation). `auto` routing now defaults to the main model for side tasks across all users (previously aggregator users were silently routed to a cheap provider-side default). ([#11891](https://github.com/NousResearch/hermes-agent/pull/11891), [#11900](https://github.com/NousResearch/hermes-agent/pull/11900))
- **Dashboard plugin system + live theme switching** — The web dashboard is now extensible. Third-party plugins can add custom tabs, widgets, and views without forking. Paired with a live-switching theme system — themes now control colors, fonts, layout, and density — so users can hot-swap the dashboard look without a reload. Same theming discipline the CLI has, now on the web. ([#10951](https://github.com/NousResearch/hermes-agent/pull/10951), [#10687](https://github.com/NousResearch/hermes-agent/pull/10687), [#14725](https://github.com/NousResearch/hermes-agent/pull/14725))
- **Transport ABC** abstracts format conversion and HTTP transport from `run_agent.py` into `agent/transports/` ([#13347](https://github.com/NousResearch/hermes-agent/pull/13347))
- **AnthropicTransport** — Anthropic Messages API path ([#13366](https://github.com/NousResearch/hermes-agent/pull/13366), @kshitijk4poor)
- **ChatCompletionsTransport** — default path for OpenAI-compatible providers ([#13805](https://github.com/NousResearch/hermes-agent/pull/13805))
- **Kimi K2.6** across OpenRouter, Nous Portal, native Kimi, and HuggingFace ([#13148](https://github.com/NousResearch/hermes-agent/pull/13148), [#13152](https://github.com/NousResearch/hermes-agent/pull/13152), [#13169](https://github.com/NousResearch/hermes-agent/pull/13169))
- **Kimi K2.5** promoted to first position in all model suggestion lists ([#11745](https://github.com/NousResearch/hermes-agent/pull/11745), @kshitijk4poor)
- **Xiaomi MiMo v2.5-pro + v2.5** on OpenRouter, Nous Portal, and native ([#14184](https://github.com/NousResearch/hermes-agent/pull/14184), [#14635](https://github.com/NousResearch/hermes-agent/pull/14635), @kshitijk4poor)
- **GLM-5V-Turbo** for coding plan ([#9907](https://github.com/NousResearch/hermes-agent/pull/9907))
- **Claude Opus 4.7** in Nous Portal catalog ([#11398](https://github.com/NousResearch/hermes-agent/pull/11398))
- **OpenRouter elephant-alpha** in curated lists ([#9378](https://github.com/NousResearch/hermes-agent/pull/9378))
- **OpenCode-Go** — Kimi K2.6 and Qwen3.5/3.6 Plus in curated catalog ([#13429](https://github.com/NousResearch/hermes-agent/pull/13429))
- **minimax/minimax-m2.5:free** in OpenRouter catalog ([#13836](https://github.com/NousResearch/hermes-agent/pull/13836))
- **`/model` merges models.dev entries** for lesser-loved providers ([#14221](https://github.com/NousResearch/hermes-agent/pull/14221))
- Fix: preserve `session_id` across `previous_response_id` chains in `/v1/responses` ([#10059](https://github.com/NousResearch/hermes-agent/pull/10059))
---
## 🖥️ New Ink-based TUI
A full React/Ink rewrite of the interactive CLI — invoked via `hermes --tui` or `HERMES_TUI=1`. Shipped across ~310 commits to `ui-tui/` and `tui_gateway/`.
### TUI Foundations
- New TUI based on Ink + Python JSON-RPC backend
- Prettier + ESLint + vitest tooling for `ui-tui/`
- Entry split between `src/entry.tsx` (TTY gate) and `src/app.tsx` (state machine)
- Persistent `_SlashWorker` subprocess for slash command dispatch
- **QQBot (17th platform)** — QQ Official API v2 adapter with QR setup, streaming, package split ([#9364](https://github.com/NousResearch/hermes-agent/pull/9364), [#11831](https://github.com/NousResearch/hermes-agent/pull/11831))
- **concept-diagrams** (salvage of #11045, @v1k22) ([#11363](https://github.com/NousResearch/hermes-agent/pull/11363))
- **architecture-diagram** (Cocoon AI port) ([#9906](https://github.com/NousResearch/hermes-agent/pull/9906))
- **pixel-art** with hardware palettes and video animation ([#12663](https://github.com/NousResearch/hermes-agent/pull/12663), [#12725](https://github.com/NousResearch/hermes-agent/pull/12725))
- Context `session_search` coerces limit to int (prevents TypeError) ([#10522](https://github.com/NousResearch/hermes-agent/pull/10522))
- Memory tool stays available when `fcntl` is unavailable (Windows) ([#9783](https://github.com/NousResearch/hermes-agent/pull/9783))
- Trajectory compressor credentials load from `HERMES_HOME/.env` ([#9632](https://github.com/NousResearch/hermes-agent/pull/9632), @Dusk1e)
-`@_context_completions` no longer crashes on `@` mention ([#9683](https://github.com/NousResearch/hermes-agent/pull/9683), @kshitijk4poor)
- Group session `user_id` no longer treated as `thread_id` in shutdown notifications ([#10546](https://github.com/NousResearch/hermes-agent/pull/10546))
- Telegram `platform_hint` — markdown is supported (closes #8261) ([#10612](https://github.com/NousResearch/hermes-agent/pull/10612))
- Doctor checks for Kimi China credentials fixed
- Streaming: don't suppress final response when commentary message is sent ([#10540](https://github.com/NousResearch/hermes-agent/pull/10540))
- Rapid Telegram follow-ups no longer get cut off
---
## 🧪 Testing & CI
- **Contributor attribution CI check** on PRs ([#9376](https://github.com/NousResearch/hermes-agent/pull/9376))
- Hermetic test parity (`scripts/run_tests.sh`) held across this window
- Test count stabilized post-Transport refactor; CI matrix held green through the transport rollout
# Reset per-call summary failure state — callers inspect these fields
# after compress() returns to decide whether to surface a warning.
self._last_summary_dropped_count=0
self._last_summary_fallback_used=False
self._last_summary_error=None
self._last_aux_model_failure_error=None
self._last_aux_model_failure_model=None
n_messages=len(messages)
# Only need head + 3 tail messages minimum (token budget decides the real tail size)
_min_for_compress=self.protect_first_n+3+1
@@ -1144,10 +1321,13 @@ The user has requested that this compaction PRIORITISE preserving all informatio
foriinrange(compress_start):
msg=messages[i].copy()
ifi==0andmsg.get("role")=="system":
existing=msg.get("content")or""
existing=msg.get("content")
_compression_note="[Note: Some earlier conversation turns have been compacted into a handoff summary to preserve context space. The current session state may still reflect earlier work, so build on that summary and state rather than re-doing work.]"
# Supports single tasks and batch mode (default 3 parallel, configurable).
delegation:
max_iterations: 50 # Max tool-calling turns per child (default: 50)
# max_concurrent_children: 3 # Max parallel child agents (default: 3)
# max_spawn_depth: 1# Tree depth cap (1-3, default: 1 = flat). Raise to 2 or 3 to allow orchestrator children to spawn their own workers.
# max_concurrent_children: 3 # Max parallel child agents per batch (default: 3, floor: 1, no ceiling).
# WARNING: values above 10 multiply API cost linearly.
# max_spawn_depth: 1 # Delegation tree depth cap (range: 1-3, default: 1 = flat).
# Raise to 2 to allow workers to spawn their own subagents.
# Requires role="orchestrator" on intermediate agents.
# orchestrator_enabled: true # Kill switch for role="orchestrator" children (default: true).
# subagent_auto_approve: false # When a subagent hits a dangerous-command approval prompt, auto-deny (default: false)
# or auto-approve "once" (true) instead of blocking on stdin.
# The parent TUI owns stdin, so blocking would deadlock; non-interactive resolution is required.
# Both choices emit a logger.warning audit line. Flip to true only for cron/batch pipelines.
# inherit_mcp_toolsets: true # When explicit child toolsets are narrowed, also keep the parent's MCP toolsets (default: true). Set false for strict intersection.
# model: "google/gemini-3-flash-preview" # Override model for subagents (empty = inherit parent)
f'[SYSTEM: The user has invoked the "{skill_name}" skill, indicating they want you to follow its instructions. The full skill content is loaded below.]',
f'[IMPORTANT: The user has invoked the "{skill_name}" skill, indicating they want you to follow its instructions. The full skill content is loaded below.]',
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.