Compare commits

158 Commits

Author SHA1 Message Date
Brooklyn Nicholson b431ae73ef fix(cli): address Copilot review #1 (4 threads)
Thread 1 (cli.py:1488): Fix broken skin hook — class is SkinConfig
not Skin. The previous code silently no-op'd via the broad except,
so SkinConfig.get_color() calls weren't actually remapped. Verified
the hook fires now: in light mode, banner_text returns #1A1A1A
instead of #FFF8DC.

Thread 2 (cli.py:1328): Align comment with actual timeout. The OSC 11
read deadline is 100ms (time.monotonic() + 0.1), not 50ms. Fixed
the docstring.

Thread 3 (cli.py:13389): Remove unused imports of Point and Screen
in the _output_screen_diff monkey-patch block. Leftover from earlier
experiments — the wrapper only needs previous_screen mutation.

Thread 4 (cli.py:11422): Skip light-mode remap entirely when a pt
style string already specifies its own bg (e.g. 'bg:#1a1a2e #FFF8DC'
for status-bar / completion-menu). Those colors were tuned for that
specific dark bg; remapping the FG to #1A1A1A would produce
dark-on-dark (invisible). Now we detect the explicit 'bg:' token
and leave the whole value untouched.
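
A minimal sketch of the Thread 4 rule, for illustration only. The helper name `remap_style_for_light_mode` and the `LIGHT_REMAP` table are hypothetical; the three color pairs are the ones listed in the original light-mode commit message, and the real cli.py code may be structured differently.

```python
# Hypothetical remap table; only these three pairs are stated in the commit log.
LIGHT_REMAP = {
    "#FFF8DC": "#1A1A1A",
    "#FFD700": "#9A6B00",
    "#B8860B": "#5C4500",
}


def remap_style_for_light_mode(style: str) -> str:
    """Remap foreground hex colors in a prompt_toolkit-style string.

    A value that carries an explicit 'bg:' token is returned untouched,
    because its foreground was tuned for that specific background.
    """
    tokens = style.split()
    if any(tok.lower().startswith("bg:") for tok in tokens):
        return style
    remapped = [
        LIGHT_REMAP.get(tok.upper(), tok) if tok.startswith("#") else tok
        for tok in tokens
    ]
    return " ".join(remapped)
```

With this shape, `'bg:#1a1a2e #FFF8DC'` passes through unchanged, while a bare `'#FFF8DC bold'` has only its foreground color rewritten.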

Also dropped the stale comment block at the resize-handler that
described the old 'force \x1b[2J\x1b[H clear-screen on resize'
recovery — replaced with the actual current strategy
(monkey-patch _output_screen_diff).
2026-05-15 00:21:19 -05:00
Brooklyn Nicholson 1d109f5be3 feat(cli): light-mode color remap covers all skin reads (Rich Panel borders, etc)
Three changes that together make the response Panel readable in light
Terminal.app mode:

1. Hook Skin.get_color() at module load so EVERY skin color read goes
   through _maybe_remap_for_light_mode(). Previously only _hex_to_ansi()
   and pt's style strings were remapped — Rich Panel borders and body
   text bypassed the remap and stayed as #FFF8DC (cornsilk on cream).

2. Prime the light-mode detection cache at import time when stdin is
   a tty. Ensures OSC 11 query happens before any banner/Panel render.

3. Drop status-bar fg colors (#C0C0C0 silver, #888888, #555555, #8B8682)
   from the remap table — those are paired with a dark navy bg, so
   remapping them to dark gray would make them invisible the OTHER
   direction (dark on dark).
2026-05-15 00:01:16 -05:00
Brooklyn Nicholson 97b407cedd fix(cli): prime light-mode detection at run() start, before pt grabs tty
OSC 11 background query needs raw tty access; running it from inside
pt's render path could race with pt's own tty handling.  Call
_detect_light_mode() once in HermesCLI.run() at startup so the result
is cached before pt's Application starts.
2026-05-14 23:41:13 -05:00
Brooklyn Nicholson 61e63cbaa8 feat(cli): light/dark terminal mode detection + automatic color remap
Mirrors ui-tui/src/theme.ts detectLightMode() in Python so the base
hermes CLI also adapts to light Terminal.app backgrounds.

Detection priority (first match wins):
  1. HERMES_LIGHT / HERMES_TUI_LIGHT env (true/false)
  2. HERMES_TUI_THEME=light|dark
  3. HERMES_TUI_BACKGROUND=#RRGGBB
  4. COLORFGBG env (xterm/Konsole/urxvt)
  5. OSC 11 query (\x1b]11;?\x1b\\) — asks the terminal directly
     with a 100ms timeout
  6. Default: dark
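
The priority order above could be sketched roughly as follows. This is an illustration under assumptions, not the code in cli.py: the function names, the 0.5 luminance threshold, and the COLORFGBG convention (a final field of 7 or 15 means a light background) are all assumed here, and the terminal I/O for the OSC 11 query is left out, so the reply is passed in as a string.

```python
import re
from typing import Mapping, Optional


def _luminance(r: float, g: float, b: float) -> float:
    # Relative luminance weights for channels in the range 0..1.
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def parse_osc11_reply(reply: str) -> Optional[bool]:
    # Expects a reply containing 'rgb:RRRR/GGGG/BBBB' (1 to 4 hex digits each).
    m = re.search(
        r"rgb:([0-9a-fA-F]{1,4})/([0-9a-fA-F]{1,4})/([0-9a-fA-F]{1,4})", reply
    )
    if not m:
        return None
    chans = [int(h, 16) / (16 ** len(h) - 1) for h in m.groups()]
    return _luminance(*chans) > 0.5  # threshold is an assumption


def detect_light_mode(env: Mapping[str, str], osc11_reply: Optional[str] = None) -> bool:
    # 1. Explicit boolean overrides.
    for name in ("HERMES_LIGHT", "HERMES_TUI_LIGHT"):
        value = env.get(name, "").strip().lower()
        if value in ("true", "false"):
            return value == "true"
    # 2. Named theme.
    theme = env.get("HERMES_TUI_THEME", "").strip().lower()
    if theme in ("light", "dark"):
        return theme == "light"
    # 3. Explicit background color.
    bg = env.get("HERMES_TUI_BACKGROUND", "").strip()
    if re.fullmatch(r"#[0-9a-fA-F]{6}", bg):
        r, g, b = (int(bg[i:i + 2], 16) / 255 for i in (1, 3, 5))
        return _luminance(r, g, b) > 0.5
    # 4. COLORFGBG, e.g. '15;0' (assumed: last field is the background index).
    last = env.get("COLORFGBG", "").split(";")[-1]
    if last.isdigit():
        return int(last) in (7, 15)
    # 5. Reply to the OSC 11 query, if the caller obtained one.
    if osc11_reply is not None:
        parsed = parse_osc11_reply(osc11_reply)
        if parsed is not None:
            return parsed
    # 6. Default: dark.
    return False
```

Passing the environment and the reply as arguments keeps the ordering testable without a real tty.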

When light mode is detected, dark-mode-tuned skin colors are remapped
to higher-contrast equivalents:
  #FFF8DC (cornsilk) -> #1A1A1A (near-black)
  #FFD700 (gold)     -> #9A6B00 (dark goldenrod)
  #B8860B (dim)      -> #5C4500 (deeper brown)
  ... etc

Hooked at two points:
  - _hex_to_ansi() — auto-remaps any color emitted via the ANSI helper
  - _build_tui_style_dict() — rewrites pt style strings (chrome bg/fg)

Set HERMES_TUI_THEME=light to force light-mode behavior; otherwise
the OSC 11 query at startup auto-detects in most modern terminals.
2026-05-14 23:39:12 -05:00
Brooklyn Nicholson 07d4a172cc fix(cli): use ANSI dim+italic for [thinking] text (light/dark mode)
The _DIM ANSI escape was a SkinAwareAnsi bound to banner_dim (#B8860B
dark goldenrod). On light cream Terminal.app backgrounds this rendered
the [thinking] reasoning preview essentially invisible (dark goldenrod
on cream is very low contrast).

Replace _DIM with a fixed ANSI dim+italic escape (\x1b[2;3m) so dim
text inherits the terminal's default foreground color and stays
readable in both light and dark Terminal.app modes.
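
As a small illustration of the replacement escape: SGR 2 is faint and SGR 3 is italic, and neither sets a color, so the terminal's default foreground is kept. The `dim` helper and the `RESET` constant below are hypothetical names, not taken from cli.py.

```python
DIM_ITALIC = "\x1b[2;3m"  # SGR 2 (faint) + SGR 3 (italic); no color parameter
RESET = "\x1b[0m"         # SGR 0 resets all attributes (assumed reset choice)


def dim(text: str) -> str:
    """Wrap text in dim+italic without choosing a foreground color."""
    return f"{DIM_ITALIC}{text}{RESET}"
```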

Updated the /skin command to no longer call _DIM.reset() since _DIM
is now a plain str.
2026-05-14 23:24:30 -05:00
Brooklyn Nicholson 8033b9cf0d fix(skin): always use terminal default for typed input (light/dark mode)
Skin engine was setting 'input-area' style to the skin's 'prompt' color
(near-white #FFF8DC for default and most other skins). On light-mode
Terminal.app this made typed text invisible (white-on-white).

Decouple the prompt symbol color (still skin-controlled) from the typed
input color (now always inherits terminal default fg). The user's typed
text is now readable in both light and dark Terminal.app modes
regardless of which skin is active.
2026-05-14 23:10:51 -05:00
Brooklyn Nicholson dabe459617 fix(cli): default input/prompt color to terminal foreground (light mode visibility)
Hardcoded #FFF8DC (cornsilk) for the input area and prompt made typed
text invisible on light-mode Terminal.app (white-on-white).

Default to empty style string '' so the input/prompt inherit the
terminal's default foreground color. Skins can still opt into a
colored prompt by setting the 'prompt' color explicitly in their YAML.
banner_text default kept at #FFF8DC since the banner has its own
background and the legacy default was working there.
2026-05-14 22:56:24 -05:00
Brooklyn Nicholson 4a1303d7e4 fix(cli): tighten _output_screen_diff patch to preserve ANSI styles
Previous version (ba3822a64) replaced None previous_screen with a
fresh Screen() before passing to pt's renderer. That changed the
behavior of pt's `if not previous_screen` guard at L178-185, which
fires reset_attributes() + erase_down() on first-paint and after
width changes. With that reset suppressed, ANSI styles can leak
between renders and chat text loses its color/bold/italic styling.

Fix: only mutate previous_screen.height when previous_screen is
already non-None AND its current height is genuinely smaller than
the new screen's height. Don't touch the None case at all — let pt's
own first-paint reset path run as designed.

The reserve-vertical-space scroll suppression (the actual bug fix)
still works because that branch only matters when previous_screen
exists with a height that's less than current_height — which is
exactly the case we now handle.
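
The tightened wrapper could look roughly like the sketch below. It is written against any callable that has parameters named `screen` and `previous_screen`; that prompt_toolkit's `_output_screen_diff` uses exactly those parameter names is an assumption here, and `wrap_screen_diff` is a hypothetical name.

```python
import functools
import inspect


def wrap_screen_diff(original):
    """Wrap a screen-diff function that takes `screen` and `previous_screen`."""
    sig = inspect.signature(original)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        screen = bound.arguments.get("screen")
        previous = bound.arguments.get("previous_screen")
        # Leave the None case alone so the first-paint reset path still runs.
        # Only inflate when the previous screen is genuinely shorter.
        if (
            previous is not None
            and screen is not None
            and previous.height < screen.height
        ):
            previous.height = screen.height
        return original(*args, **kwargs)

    return wrapper
```

Under that assumption, installation would be a single assignment of the wrapped function back onto the renderer module attribute. Looking the arguments up by name rather than by position is a design choice that tolerates positional or keyword call styles.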

# Verified empirically

- Before/after resize: colors preserved (status bar yellow, rules
  orange, "26 commits behind" warning in caution yellow)
- After widen back: colors still correct
- 10-resize stress test: ZERO scrollback delta, full content preserved
2026-05-14 22:48:19 -05:00
Brooklyn Nicholson ba3822a643 fix(cli): monkey-patch pt's _output_screen_diff to skip reserve-vertical-scroll
# What changed

Replaced DECSTBM scroll region + chrome-row erase approach with a
direct monkey-patch of prompt_toolkit's module-level
`_output_screen_diff` function.

The DECSTBM approach had two killer bugs:
1. Scroll region leaked into the user's shell after hermes quit
   (atexit firing semantics + the region persists across processes
   in macOS Terminal.app)
2. Chrome-row erase wiped chat content / streaming responses if user
   resized mid-stream

# Root cause (re-verified by reading pt/renderer.py)

`_output_screen_diff` (renderer.py L232-242) deliberately moves the
cursor to the bottom of the canvas after painting:

```python
# Correctly reserve vertical space as required by the layout.
# When this is a new screen (drawn for the first time), or for some
# reason higher than the previous one. Move the cursor once to the
# bottom of the output. That way, we're sure that the terminal
# scrolls up, even when the lower lines of the canvas just contain
# whitespace.
if current_height > previous_screen.height:
    current_pos = move_cursor(Point(x=0, y=current_height - 1))
```

In non-fullscreen mode this scrolls chrome content into terminal
scrollback EVERY render — not just on resize. The `move_cursor`
walks down via `\r\n` which scrolls when at the bottom row.

# Fix

Wrap `_output_screen_diff` and inflate `previous_screen.height` to
match `screen.height` before passing through. This makes the
`if current_height > previous_screen.height` guard fall through and
skip the bottom-cursor-move entirely. Without that move, pt's render
only writes within the layout's actual rows. `\r\n` between rows
inside the layout body never reaches the bottom of the viewport
(because `move_cursor(0,0)` walks UP first to layout-top, then
`\r\n*N` walks DOWN only as far as the layout actually spans).

# Verified empirically in real Terminal.app

10-resize stress test (mixed shrink+widen) during streaming:
- ZERO scrollback delta (0 status bars added)
- Full streaming response preserved
- User input preserved
- Banner preserved in scrollback
- Status bar correctly anchored at bottom
- No visible duplicates anywhere
- No shell breakage after quit (no scroll region to leak)

# Reverted

- DECSTBM scroll region (shell-leak risk gone)
- atexit handler for scroll region restore (no longer needed)
- Chrome-row erase (\x1b[2K walking) — no longer needed
- _hermes_resize_clear function — back to vanilla _schedule_resize_recovery
2026-05-14 22:27:55 -05:00
Brooklyn Nicholson eac40204c2 fix(cli): erase only chrome rows on resize, preserve chat output
Previous version (fef97aee5) used `\x1b[J` (erase from cursor to end of
screen) which WIPED the entire viewport — losing the user's just-typed
message and any streaming agent response if they resized mid-stream.

Fix: erase ONLY the bottom chrome rows (`CHROME_ROWS = 8`, generous
slack for status bar + 2 rules + input + reflow extras).  Walk up
from the bottom; for each row emit `\x1b[<row>;1H\x1b[2K` (move
to row, erase line).  `\x1b[2K` does NOT push to scrollback.
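
For illustration, the per-row erase can be generated like this. The function name is hypothetical; the escape sequences and `CHROME_ROWS = 8` come from the message above, and the clamp for terminals shorter than the chrome band is an added assumption.

```python
CHROME_ROWS = 8


def chrome_erase_sequence(terminal_rows: int, chrome_rows: int = CHROME_ROWS) -> str:
    """Build the escape string that clears the bottom chrome rows only."""
    # Clamp so a very short terminal never produces row numbers below 1.
    first = max(1, terminal_rows - chrome_rows + 1)
    parts = [
        f"\x1b[{row};1H\x1b[2K"  # move to (row, col 1), then erase that line
        for row in range(terminal_rows, first - 1, -1)  # walk up from the bottom
    ]
    return "".join(parts)
```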

Chat content above the chrome band stays untouched.

# Verified empirically in real Terminal.app

Test sequence:
  1. Start hermes (170 cols)
  2. Send message "Tell me a 4 sentence story about a cat"
  3. While agent is streaming, shrink to 98 cols
  4. Widen back to 170 cols

Result after this fix:
- User's message still visible
- "Initializing agent..." still visible
- Full agent response still visible (the cat story)
- Status bar at bottom, no duplicates
- Banner preserved in scrollback above
- Zero scrollback pollution (delta = 0 across 2 resizes)
2026-05-14 22:06:59 -05:00
Brooklyn Nicholson fef97aee59 fix(cli): DECSTBM scroll region + \x1b[J erase for clean resize
# Verified empirically in real Terminal.app with real shell scrollback above

After 6 column shrinks:
- ZERO status bars accumulated in scrollback (delta = 0)
- Status bar correctly anchored at bottom of viewport
- No visible duplicate chrome
- Chat responses display correctly after fix
- Layout matches normal hermes UX

# Root cause (verified by reading prompt_toolkit/renderer.py source)

pt's `_output_screen_diff` (renderer.py:106) emits `write("\r\n" * N)` to
advance the cursor between rows during paint. At the bottom row of the
terminal, each `\r\n` SCROLLS the viewport, pushing content into terminal
scrollback. pt does this *deliberately*; see the comment at lines 232-242:
"Move the cursor once to the bottom of the output. That way, we're sure
that the terminal scrolls up". This is the actual mechanism behind pt
issues #29 (open since 2014), #1675, #1933. aider/xonsh/ipython all hit
this wall and gave up; nobody on GitHub has shipped a fix.

# The fix

DECSTBM `\x1b[<top>;<bottom>r` sets a SCROLL REGION on the terminal.
When pt's `\r\n` scrolls within the region, rows that fall off the top
of the region are DISCARDED instead of being pushed to terminal
scrollback. Region top must be > 1 — when region starts at row 1, the
terminal treats it semantically as "no region" and scrolled content
still goes to scrollback. Above row 2 it gets discarded.

Same trick used by vim's status line, tmux, weechat, htop.

Three more critical details:

1. **DECSTBM resets cursor to (1,1).** We follow it with an explicit
   `\x1b[<rows>;1H` to move the cursor back to the bottom row, so pt's
   render anchors the chrome at the bottom of the viewport.

2. **`\x1b[J` (erase from cursor to end of screen) does NOT push to
   scrollback.** `\x1b[2J` does. So on resize we use `\x1b[J` to wipe
   the old reflowed chrome WITHOUT polluting history.

3. **Skip `_schedule_resize_recovery`.** Its
   `_status_bar_suppressed_after_resize=True` flag hides the chrome
   until next user input, which makes resize feel broken with this fix
   in place. Call pt's native `_on_resize` directly instead.
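
A rough sketch of the first detail (set the region, then re-anchor the cursor). The function and constant names are hypothetical; the sequences are the DECSTBM and cursor-position escapes quoted above, plus the parameterless DECSTBM form, which restores the full-screen region.

```python
def scroll_region_sequence(rows: int, top: int = 2) -> str:
    """Set a scroll region from `top` to `rows`, then park the cursor at the bottom."""
    if top <= 1:
        # Per the message above, a region starting at row 1 behaves like no region.
        raise ValueError("region top must be greater than 1")
    if rows <= top:
        raise ValueError("terminal too short for a scroll region")
    set_region = f"\x1b[{top};{rows}r"   # DECSTBM; this resets the cursor to (1,1)
    move_to_bottom = f"\x1b[{rows};1H"   # so move it back to the bottom row
    return set_region + move_to_bottom


# DECSTBM with no parameters restores the full-screen scroll region.
RESET_SCROLL_REGION = "\x1b[r"
```

Emitting `RESET_SCROLL_REGION` on exit matters for this approach, since a region left in place persists in the user's shell.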

# Reverts

- transcript widget (alt-screen-only path, was an earlier attempt)
- alt-screen mode (broke chat output rendering)
- HERMES_DEBUG_RESIZE / HERMES_RESIZE_STRATEGY env-var paths
2026-05-14 21:57:39 -05:00
Teknium 2844c888f1 fix(cli): clamp scrollback box widths + suppress status bar after resize (#25975)
When the terminal shrinks, already-printed box-drawing rules (response,
reasoning, streaming TTS, background-task Panels) reflow into multiple
narrower rows — visible as duplicated horizontal separators / ghost
lines in scrollback. Similarly, prompt_toolkit redraws a fresh status
bar on SIGWINCH on top of one the terminal just reflowed, producing
double-bar artifacts on column shrink.

Two surgical changes:

1. Decorative scrollback boxes now use a new
   `HermesCLI._scrollback_box_width()` helper that clamps to
   `max(32, min(width, 56))`. The live TUI footer is unaffected and still
   uses the full width. Covers: streaming response box (open + close),
   reasoning box (open + close, both streaming and post-stream paths),
   streaming-TTS box close, final-response Rich Panel, and the
   background-task Rich Panel.

2. `_recover_after_resize()` now also sets a new
   `_status_bar_suppressed_after_resize` flag so the dynamic status bar
   and both input separator rules stay hidden until the next user input.
   The flag is cleared in the process loop the moment the user submits
   their next prompt, restoring chrome cleanly.
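
The clamp in item 1 is small enough to show directly. This standalone function is only an illustration of the stated formula; the real helper is a method on `HermesCLI` and may obtain the width differently.

```python
def scrollback_box_width(width: int) -> int:
    """Clamp a terminal width to the resize-safe range [32, 56]."""
    return max(32, min(width, 56))
```

A box no wider than 56 columns does not reflow until the terminal shrinks below that, which is why already-printed rules stop duplicating on moderate shrinks.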

Tests:
- New `test_input_rules_hide_after_resize_until_next_input` covers the
  flag's effect on rule heights.
- New `test_scrollback_box_width_caps_to_resize_safe_value` covers the
  helper at floor / cap / mid-range / overflow.
- Existing resize-recovery test extended to assert the flag flips.

Refs: #18449 #19280 #22976
Salvage of #24403.

Co-authored-by: Szymonclawd <szymonclawd@mac.home>
2026-05-14 15:22:44 -07:00
teknium1 f491b07cb2 chore(release): map @LeonSGP43 commit email in AUTHOR_MAP 2026-05-14 15:14:29 -07:00
LeonSGP43 ac64d0c2ca fix: preserve ansi output history on resize replay 2026-05-14 15:14:29 -07:00
Teknium 6244535682 fix(voice): remove per-tool-call beep in CLI voice mode (#25967)
The spinner already shows tool activity visually; the 1.2 kHz tone on
every tool.started event was unwanted noise (especially on WSL2, where
each beep also triggers Windows Terminal's bell notification).

Removed the play_beep call in _on_tool_progress entirely. Record
start/stop beeps (gated by voice.beep_enabled) are unaffected.
2026-05-14 15:12:10 -07:00
teknium1 7bf66a07bd chore(release): map @1000Delta in AUTHOR_MAP 2026-05-14 15:11:51 -07:00
Xu Zhizhong 06c6c1f0f2 fix(cli): batch resize history replay 2026-05-14 15:11:51 -07:00
Teknium fe83c4001b fix(codex-app-server): attach redacted stderr tail to generic failures (#25929)
When codex app-server fails outside the OAuth-classified path
(non-auth turn/start errors, plain TimeoutErrors, generic turn-ended
status, subprocess silently exits, hard deadline timeout), the user
got a bare 'Internal error' / 'turn/start failed: ...' with no
context. Diagnosing config/provider/auth-bridge issues forced a
re-run with verbose codex flags.

Add a _format_error_with_stderr helper that appends the last few
stderr lines via agent.redact.redact_sensitive_text(force=True),
and use it at every catch-all error site:

- ensure_started() failures (codex init / thread/start) now return
  a TurnResult.error with should_retire=True instead of bubbling
- non-OAuth turn/start CodexAppServerError / TimeoutError
- subprocess-died branch (previously dumped raw stderr_blob[-300:]
  with no redaction — a leak risk)
- turn ended with non-completed status
- hard turn-timeout deadline

OAuth-classified failures and the post-tool quiet watchdog already
produce clean hints and stay unchanged. The redactor catches sk-*,
gh*_*, Authorization: Bearer, query-string tokens, JWTs, private
keys, etc., so provider error payloads can't leak into chat output
or trajectories.
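
A rough sketch of the "append a redacted stderr tail" idea. Everything here is a stand-in: the real helper calls `agent.redact.redact_sensitive_text(force=True)`, whereas the two regexes below are a deliberately tiny, hypothetical substitute that covers only `sk-*` keys and Bearer tokens, and the output layout is assumed.

```python
import re

# Stand-in patterns only; the project's redactor covers far more cases.
_SK_KEY = re.compile(r"sk-[A-Za-z0-9_-]{8,}")
_BEARER = re.compile(r"(?i)(Authorization:\s*Bearer\s+)\S+")


def redact(text: str) -> str:
    text = _SK_KEY.sub("[REDACTED]", text)
    return _BEARER.sub(r"\1[REDACTED]", text)


def format_error_with_stderr(message: str, stderr_lines: list, tail: int = 5) -> str:
    """Append the last few non-empty stderr lines, redacted, to an error message."""
    tail_lines = [
        redact(line.rstrip()) for line in stderr_lines[-tail:] if line.strip()
    ]
    if not tail_lines:
        return message
    return message + "\nstderr (last lines):\n" + "\n".join(tail_lines)
```

Redacting before formatting, rather than after, means no unredacted tail ever exists as a return value that another call site could log by accident.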

Inspired by openclaw#80718, adapted for our app-server transport.
2026-05-14 14:55:23 -07:00
helix4u a28add199d fix(agent): keep image tool results from poisoning text-only sessions 2026-05-14 14:52:15 -07:00
VTRiot bc42e62b17 fix(gateway): prevent duplicate final send when only cosmetic edit failed
When the stream consumer's got_done handler successfully delivers the
final response content via _send_or_edit but the subsequent edit
(e.g. cursor removal) fails, final_response_sent remains False even
though the user has already received the final answer. The gateway's
fallback send path then re-delivers the same content, causing the
user to see the response twice on Telegram.

Introduce a new _final_content_delivered flag on the stream consumer,
set by the got_done handler when the final content has reached the
user. The _run_agent suppression logic now treats this flag as an
additional signal (alongside final_response_sent and
response_previewed) that final delivery is already complete.

This preserves the existing behavior for intermediate-text-only
streams (where already_sent=True but no final content has been
delivered) — those still receive the gateway's fallback send, matching
the test expectation in test_partial_stream_output_does_not_set_already_sent.

Adds TestFinalContentDeliveredSuppression with two cases covering
both the suppression (content delivered + edit failed) and the
non-suppression (intermediate text only) branches.
2026-05-14 14:51:07 -07:00
luyao618 b4b8509fe8 fix(gateway): load streaming config from nested gateway.streaming key
`hermes config set gateway.streaming.*` writes the streaming block
nested under a `gateway:` key in config.yaml, but the config loader
only checked for a top-level `streaming:` key — silently ignoring
the nested variant.

Fall back to `yaml_cfg['gateway']['streaming']` when the top-level
key is absent, matching the pattern already used for other nested
config sections.
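
The fallback can be sketched as below. `load_streaming_config` is a hypothetical name, and returning an empty dict when neither key is present is an assumption about the loader's defaults.

```python
def load_streaming_config(yaml_cfg: dict) -> dict:
    """Prefer top-level `streaming:`, else fall back to `gateway.streaming`."""
    streaming = yaml_cfg.get("streaming")
    if streaming is None:
        gateway = yaml_cfg.get("gateway")
        if isinstance(gateway, dict):
            streaming = gateway.get("streaming")
    # Assumed default: an empty mapping when nothing usable is configured.
    return streaming if isinstance(streaming, dict) else {}
```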

Closes #25676
2026-05-14 14:51:07 -07:00
luyao618 d44dafdb4e fix(telegram): set REQUIRES_EDIT_FINALIZE so final MarkdownV2 edit is not skipped
When the final streamed text is identical to the last plain-text edit,
stream_consumer._send_or_edit short-circuits and never calls
adapter.edit_message(finalize=True).  For Telegram, this skips the
plain-text → MarkdownV2 conversion, leaving raw Markdown syntax visible
to the user.

Set REQUIRES_EDIT_FINALIZE = True on TelegramAdapter so the finalize
edit is always delivered, matching the existing DingTalk pattern.

Fixes #25710
2026-05-14 14:51:07 -07:00
ethernet cd64bed55e Merge pull request #21012 from stephenschoettler/fix/ci-pr-check-unblock
fix(ci): unblock shared PR checks
2026-05-14 16:16:42 -04:00
Teknium 9ed751b967 fix(whatsapp): drop status broadcasts and channel newsletters before agent dispatch (#25845)
WhatsApp pseudo-chats (Status updates / Stories, Channels / Newsletters,
broadcast lists) were being routed through the full agent pipeline. A
user's gateway.log showed the agent replying to a contact's Story
('status@broadcast') with 345 chars plus title-generation cost, which
also shows up in the contact's status feed.

Drop these JIDs at _should_process_message() before the policy gate so
they're filtered regardless of dm_policy or allowlist state. Covers:
- status@broadcast (Stories)
- *@newsletter (Channels)
- *@broadcast (broadcast lists, future-proofing)
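
The three JID patterns above reduce to two suffix checks, since `status@broadcast` already ends in `@broadcast`. The helper name below is hypothetical; the case-insensitive, whitespace-tolerant behavior follows the test description in this commit.

```python
from typing import Optional


def is_pseudo_chat_jid(jid: Optional[str]) -> bool:
    """True for Status, Channel/Newsletter, and broadcast-list JIDs."""
    normalized = (jid or "").strip().lower()
    return normalized.endswith("@broadcast") or normalized.endswith("@newsletter")
```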

The bridge.js already filters these on the fromMe outbound path, but
inbound events on self-chat mode skipped that check.

Tests:
- status@broadcast dropped on open policy
- broadcast filter wins over allowlisted senders
- real DMs still pass through
- helper unit cases (case-insensitive, whitespace-tolerant)

26/26 tests/gateway/test_whatsapp_group_gating.py pass; 59/59 adjacent
WhatsApp test suites pass.
2026-05-14 09:59:03 -07:00
Teknium b08f53a758 skill(comfyui): add template-integrity reference from @purzbeats (#25828)
Adds references/template-integrity.md covering safe conversion of the
official comfyui-workflow-templates package from editor format to API
format — Reroute bypass via link tracing, dotted dynamic-input keys
(values.a, resize_type.width) that must NOT be flattened, server-error
"patch don't rebuild" loop, Cloud quirks (302 redirect to signed GCS
URL, free-tier 1 concurrent job, 1920x1080 OOM on RTX 5090), and a
Discord-compatible ffmpeg stitch recipe (yuv420p + xfade/acrossfade).

SKILL.md lists the new reference so the agent loads it when starting
from an official template. purzbeats added to author list and to
scripts/release.py AUTHOR_MAP.

Co-authored-by: purzbeats <97489706+purzbeats@users.noreply.github.com>
2026-05-14 09:34:10 -07:00
Teknium 78b842c995 fix(install): support non-sudo service-user installs on apt distros (#25814)
The Debian/Ubuntu branch of install_node_deps() ran 'npx playwright install
--with-deps chromium' unconditionally. Playwright invokes sudo interactively
to apt-install Chromium's system libraries, which blocks the installer for
non-sudo users (systemd service accounts, unprivileged operator users) on
an unsatisfiable password prompt.

Changes:
- install.sh: gate --with-deps behind a sudo capability check on the apt
  branch (matches the existing Arch/pacman branch pattern). Non-sudo users
  fall back to 'npx playwright install chromium' alone and the installer
  prints the exact 'sudo npx playwright install-deps chromium' command an
  administrator can run separately.
- install.sh: add --skip-browser (alias --no-playwright) to skip the
  Playwright step entirely for headless installs that don't need browser
  automation. Mirrors the existing --no-venv / --skip-setup shape.
- installation.md: add a 'Non-Sudo / System Service User Installs' section
  covering the admin/service-user split, the --skip-browser flag, and the
  ~/.local/bin PATH gotcha (the root cause of the 'No module named dotenv'
  error users hit when running the repo source 'hermes' script with system
  Python instead of the venv launcher).
- test_install_sh_browser_install.py: regression coverage for the
  --skip-browser flag and the sudo-gate on the apt branch.

Reported by @ssilver in Discord.
2026-05-14 09:05:31 -07:00
EthanGuo-coder 26933c2f59 fix(agent/gemini-cloudcode): seed delta defaults for reasoning-only stream chunks
_make_stream_chunk built delta_kwargs with only `role`, so a reasoning-only
chunk produced a SimpleNamespace without a `.content` attribute. Downstream
consumers that read `delta.content` then raised AttributeError on Gemini 2.5
Flash, where the thinking delta arrives before any content delta.

Seed `content`, `tool_calls`, `reasoning`, and `reasoning_content` as None
up front, matching the pattern already used in gemini_native_adapter.py.
Key-present arguments still override the defaults.
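
A minimal sketch of the seeding pattern. `make_delta` is a hypothetical stand-in for the delta-building part of `_make_stream_chunk`, and the `"assistant"` role value is assumed.

```python
from types import SimpleNamespace


def make_delta(**overrides) -> SimpleNamespace:
    """Build a stream delta whose optional fields always exist."""
    delta_kwargs = {
        "role": "assistant",  # assumed role value
        "content": None,
        "tool_calls": None,
        "reasoning": None,
        "reasoning_content": None,
    }
    # Keys the caller supplies override the seeded defaults.
    delta_kwargs.update(overrides)
    return SimpleNamespace(**delta_kwargs)
```

A reasoning-only chunk such as `make_delta(reasoning="...")` then exposes `delta.content` as `None` instead of raising `AttributeError`.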

Fixes #24974
References: Related open PR #24984 (luyao618) applies the same 1-line fix; this PR adds a regression test that #24984 omits
Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-14 08:03:56 -07:00
Teknium 72b5dd8658 fix(update): refresh lazy-installed backends on hermes update (#25766)
Pyproject's [all] extra was slimmed down in May 2026 — ~20 optional
backends moved to tools/lazy_deps.py and only install on first use.
hermes update runs uv pip install -e .[all] which doesn't touch any of
them, so pin bumps in LAZY_DEPS (CVE response, transitive fixes) were
silently ignored on already-activated backends.

Two changes:

1. _is_satisfied() now parses the spec and checks the installed version
   against the constraint via packaging.specifiers. Previously it
   returned True the moment the package name was importable, which made
   ensure() a name-presence gate rather than a version-pin gate.

2. New active_features() / refresh_active_features() pair: lists every
   feature with at least one of its packages currently installed, then
   re-runs ensure() on each. Refresh is invoked at the end of
   _cmd_update_impl, right after the [all] install completes. Cold
   backends (never activated) stay quiet — no churn for them.

Output during update is one summary block:
  → Refreshing 4 active lazy backend(s)...
    ↑ 1 refreshed: provider.anthropic
    ✓ 3 already current
or
    ⚠ memory.honcho failed to refresh: <pip stderr>

Failures never raise out of update — backends keep their previously-
installed version and we tell the user to rerun once upstream is fixed.
security.allow_lazy_installs=false is honored: features get marked
"skipped" with the reason shown.

Tests: 18 new unit tests covering version-aware satisfaction (exact pin,
range, extras blocks, missing package, malformed spec), active feature
discovery, and refresh status reporting. All 61 lazy_deps tests pass.
2026-05-14 08:03:40 -07:00
wesleysimplicio 436a0a271e test(toolsets): lock web search into default platform coverage
Adds regression tests pinning web search into the WhatsApp and api-server
default platform-coverage toolsets. Pure test additions, no runtime change.

Salvage of the test-addition commit from #25692 by @wesleysimplicio.
(The AUTHOR_MAP fixup commit from the same PR landed separately as
529ec85c7.)
2026-05-14 08:03:33 -07:00
wesleysimplicio 529ec85c77 chore(release): map oswaldb22 noreply email for AUTHOR_MAP
Co-Authored-By: Oswald <oswaldb22@users.noreply.github.com>
2026-05-14 08:02:25 -07:00
wesleysimplicio 364ddd45e8 fix(terminal): prevent safety filter false positives on keywords inside quoted strings
The _foreground_background_guidance() function matched background-wrapper
keywords (nohup/disown/setsid) anywhere in the command text, including
inside quoted strings, Python -c code, commit messages, and PR body text.

Two-layer fix:
1. Strip single-quoted, double-quoted, and backtick-quoted content before
   pattern matching via _strip_quotes() helper.
2. Tighten the regex to only match keywords at command-start positions
   (after `^`, `;`, `&`, `&&`, `||`, or `$(`), not mid-argument.

Both layers are needed: quote stripping handles the common case of keywords
in string literals, and the position-aware regex handles unquoted cases
like 'export FOO=setsid' (word boundary match, wrong position).
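
The two layers might be sketched as follows. The regexes and function names are illustrative assumptions, not the patterns in the actual patch; in particular this quote stripper does not handle nested or unterminated quotes.

```python
import re

# Layer 1: single-quoted, double-quoted (with backslash escapes), backtick-quoted.
_QUOTED = re.compile(r"""'[^']*'|"(?:\\.|[^"\\])*"|`[^`]*`""")

# Layer 2: keyword only at a command-start position.
_BG_WRAPPER = re.compile(r"(?:^|;|&&?|\|\||\$\()\s*(?:nohup|disown|setsid)\b")


def strip_quotes(command: str) -> str:
    """Remove quoted spans so keywords inside string literals are ignored."""
    return _QUOTED.sub("", command)


def uses_background_wrapper(command: str) -> bool:
    """True when nohup/disown/setsid appears where a command can start."""
    return bool(_BG_WRAPPER.search(strip_quotes(command)))
```

For example, `git commit -m "run under nohup"` is cleared by layer 1, while `export FOO=setsid` is cleared by layer 2 because the keyword does not follow a command separator.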

Fixes #20064
2026-05-14 08:02:01 -07:00
oxngon 3adde245b7 fix(gateway): forward image attachments to background agent tasks
When the gateway spawned a background agent (e.g. for delegation), media
URLs and types from the originating message weren't forwarded — the bg
agent saw the prompt but no attached images. Vision-enabled tasks
effectively lost their inputs.

Forwards media_urls/media_types through the bg-task spawn path and
runs the same vision-enrichment step the main flow uses, so the bg
agent gets image descriptions inlined into its prompt.

Closes #25614.

Salvage of #25603 by @oxngon (manually re-applied — original branch
was severely stale against current main).
2026-05-14 08:01:34 -07:00
vanthinh6886 a952ca3ff6 fix: restrict .env file permissions to 0600
Set file mode 0600 on ~/.hermes/.env after creation in the installer and
after every write via memory_setup._write_env_vars(). This ensures only
the file owner can read/write API keys and tokens, matching standard
practice for credential files (.netrc, .aws/credentials, .ssh/config).
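
A minimal sketch of "write, then restrict". `write_env_file` is a hypothetical name rather than the project's `_write_env_vars`, and on non-POSIX platforms `os.chmod` only partially applies.

```python
import os
import stat
from pathlib import Path


def write_env_file(path: Path, content: str) -> None:
    """Write the file, then restrict it to owner read/write (0600)."""
    path.write_text(content, encoding="utf-8")
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # equivalent to 0o600
```

Re-applying the mode after every write covers files that were created earlier with a more permissive mode.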

Fixes #25477
2026-05-14 07:59:38 -07:00
zccyman f26098e22f fix(gateway): enable text-intercept for multi-choice clarify fallback (#25567) 2026-05-14 07:59:12 -07:00
AsoTora 1247ff2dca fix: stop retrying initial MCP auth failures 2026-05-14 07:58:43 -07:00
evgyur 1dd33988e2 docs: clarify media impact on session context 2026-05-14 07:58:20 -07:00
yifengingit c03acca508 fix: use AUTOINCREMENT id for message ordering instead of timestamp
On WSL2 (and similar environments), time.time() is not strictly monotonic
due to NTP sync or host clock adjustments. When clock regression occurs
during a multi-tool flush, later-inserted rows get earlier timestamps,
causing ORDER BY timestamp, id to sort them before rows that were written
first. This breaks the tool_calls/tool_response adjacency invariant and
triggers HTTP 400 from the API.

Use ORDER BY id instead, since id (INTEGER PRIMARY KEY AUTOINCREMENT)
always reflects true insertion order regardless of system clock behavior.
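
The ordering problem can be reproduced with an in-memory SQLite database. The table and column names below are hypothetical; the second row is given an earlier timestamp to simulate a clock regression between two inserts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " role TEXT NOT NULL,"
    " timestamp REAL NOT NULL)"
)

# Inserted in this order, but the clock went backwards before the second row.
rows = [("assistant_tool_call", 100.0), ("tool_response", 99.5)]
conn.executemany("INSERT INTO messages (role, timestamp) VALUES (?, ?)", rows)

# Old ordering: the tool response sorts before the call that produced it.
by_timestamp = [
    r[0] for r in conn.execute("SELECT role FROM messages ORDER BY timestamp, id")
]

# New ordering: follows insertion order regardless of the clock.
by_id = [r[0] for r in conn.execute("SELECT role FROM messages ORDER BY id")]
```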
2026-05-14 07:57:54 -07:00
Arkmusn 8ae65d5c8c fix: read approvals.timeout from config in CLI approval callback
The _approval_callback method in HermesCLI hardcoded timeout=60
instead of reading the approvals.timeout config value. This meant
the config setting was silently ignored for CLI interactive prompts.

Other approval paths (callbacks.py, tools/approval.py) already read
the config correctly — only cli.py was missed.
2026-05-14 07:57:31 -07:00
teknium1 d8fdec16d5 chore(release): add AUTHOR_MAP entries for second new-contributor batch
Pre-stages AUTHOR_MAP for 7 new contributors in the upcoming batch:

- HxT9          (#25760)
- evgyur        (#25651)
- AsoTora       (#25624)
- oxngon        (#25603)
- yifengingit   (#25589)
- vanthinh6886  (#25562)
- Arkmusn       (#25559)

EthanGuo-coder, wesleysimplicio, and zccyman are already in the map.
2026-05-14 07:57:06 -07:00
Teknium 12f755c9eb fix(codex-runtime): retire wedged sessions + post-tool watchdog + OAuth refresh classify (#25769)
Mirrors openclaw beta.8's app-server resilience fixes so a stuck codex
subprocess can't burn the full turn deadline and so users get a
`codex login` pointer instead of raw RPC errors when their token expires.

- TurnResult.should_retire signals the caller to drop+respawn codex.
- Deadline-hit path and dead-subprocess detection set should_retire so
  the next turn doesn't ride a CPU-spinning or auth-broken process.
- Post-tool watchdog (post_tool_quiet_timeout=90s): if a tool item
  completes and codex goes silent past the threshold without further
  output or turn/completed, fast-fail instead of waiting the full 600s.
  Resets on any non-tool activity so normal think-after-tool flows are
  not affected.
- <turn_aborted> and <turn_aborted/> in agent text are treated as
  terminal — some codex builds tear down a turn that way without
  emitting turn/completed.
- _classify_oauth_failure() inspects RPC error message + stderr tail
  for invalid_grant / token refresh / 401 / etc. and rewrites
  user-facing errors to 'run codex login'. Conservative: generic
  failures still surface verbatim. Fires at turn/start failure,
  turn/completed failure, and dead-subprocess paths.
- thread/start cross-fill: tolerate thread.id, thread.sessionId,
  top-level sessionId/threadId so future codex schema drift doesn't
  KeyError us at handshake.
- run_agent.py: when run_turn returns should_retire=True OR raises,
  close + null self._codex_session so the next turn respawns.

Tests: +30 cases across session + integration suites.
  tests/agent/transports/test_codex_app_server_session.py 50/50 pass
  tests/run_agent/test_codex_app_server_integration.py 27/27 pass
  Broader codex scope (transports + cli runtime/migration) 376/376 pass
2026-05-14 07:55:09 -07:00
binhnt92 63991bbd97 fix(memory): skip OpenViking upload symlinks 2026-05-14 07:48:03 -07:00
teknium1 26deeea830 fix(telegram): restore model-switch success path + author map
The cherry-picked PR over-indented the edit_message_text block for
the mm: (model selected → switch) success path so the confirmation
edit lived inside the preceding 'except Exception as exc' branch and
only fired when the callback raised. Dedent the try/except back to
12-space indent so it runs after the callback succeeds, restoring
the original flow that removes the inline buttons and shows the
'Switched to ...' confirmation.

Add a regression test (test_model_selected_edits_message_on_success)
that asserts edit_message_text is awaited and the result text is
routed through format_message (MARKDOWN_V2 + backtick survival).

Add phuongvm to scripts/release.py AUTHOR_MAP.
2026-05-14 07:47:52 -07:00
Phuong Lambert a694040520 fix(telegram): escape dynamic markdown in callback flows
Use MarkdownV2 formatting for Telegram callback follow-ups and interactive prompts where dynamic names or user text can break legacy Markdown parsing. Add regression coverage for reload-mcp, model picker, approval callbacks, and update prompts.
2026-05-14 07:47:52 -07:00
Teknium 524490a409 fix(install.ps1): pin uv sync to venv\, verify baseline imports on Windows (#25755)
* fix(cli): allow rotating broken OpenRouter / AI Gateway key in `hermes model` flow

Before: when `OPENROUTER_API_KEY` (or `AI_GATEWAY_API_KEY`) was already
set in ~/.hermes/.env, `hermes model openrouter` / `hermes model
ai-gateway` skipped the API-key prompt entirely and jumped straight to
the model picker. Users with a broken / expired / wrong key had no way
to replace it without editing ~/.hermes/.env by hand or re-running
`hermes setup` from scratch.

Both flows now route through the existing `_prompt_api_key()` helper,
which surfaces [K]eep / [R]eplace / [C]lear when a key is already
configured — the same UX the generic API-key providers (z.ai, MiniMax,
Gemini, etc.) and the Daytona setup already use.

* fix(install.ps1): pin uv sync target to venv\, verify baseline imports

Two related Windows-installer bugs that produce a broken venv with
`ModuleNotFoundError: No module named 'dotenv'` on first `hermes` run.

## Bug 1: uv sync ignores VIRTUAL_ENV, syncs into .venv\ instead of venv\

`Install-Dependencies` creates the venv at `venv\` via `uv venv venv`,
sets `$env:VIRTUAL_ENV = "$InstallDir\venv"`, then runs
`uv sync --extra all --locked`. Modern uv (>=0.5) ignores `VIRTUAL_ENV`
for the `sync` subcommand and uses the project default `.venv\`
instead. Result: deps land in `$InstallDir\.venv\`, `venv\` stays
empty except for the python.exe stub from the earlier `uv venv` call,
`hermes.exe` ends up wired to the wrong site-packages.

The bash installer (`scripts/install.sh`) already worked around this in
`install_deps()` line 1127 by setting `UV_PROJECT_ENVIRONMENT`. That
environment variable tells uv exactly where to put the project env
regardless of `VIRTUAL_ENV`. Port the same fix to PowerShell.

## Bug 2: no post-install verification

If the sync still misdirects for any other reason (uv version drift,
filesystem quirk, user re-run scenarios), the installer reports success
and the user only finds out by running `hermes` and getting an
unhelpful traceback. Add a baseline-import probe that runs the venv's
own python against the four packages every `hermes` invocation needs
(`dotenv`, `openai`, `rich`, `prompt_toolkit`). On failure, throw
with a recovery command tailored to whether a sibling `.venv\` exists.
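
The baseline-import probe described above can be sketched in Python as
follows. The real probe lives in the PowerShell installer; the function
name missing_modules and the probe script here are hypothetical and
only illustrate the idea of asking the venv's own interpreter which of
the required packages it can locate.

```python
import subprocess
import sys
from typing import List, Sequence

# Script run by the target interpreter; prints one missing module per line.
_PROBE = (
    "import importlib.util, sys\n"
    "for name in sys.argv[1:]:\n"
    "    if importlib.util.find_spec(name) is None:\n"
    "        print(name)\n"
)


def missing_modules(python_exe: str, modules: Sequence[str]) -> List[str]:
    """Return the subset of `modules` that `python_exe` cannot locate."""
    proc = subprocess.run(
        [python_exe, "-c", _PROBE, *modules],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout.split()


if __name__ == "__main__":
    # The four baseline packages named in the commit message.
    baseline = ["dotenv", "openai", "rich", "prompt_toolkit"]
    print(missing_modules(sys.executable, baseline))
```

Pointing python_exe at the venv's interpreter rather than the one
running the installer is the important part: it checks the site-packages
that `hermes` will actually use.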

User report (Windows 11, Python 3.13.5, Hermes v0.13.0): manual repro
steps were exactly this — `uv sync` landed in `.venv\`, recovered by
junctioning `venv\` → `.venv\` to bridge the path mismatch.
2026-05-14 07:39:13 -07:00
Teknium 17e0e9d174 fix(cli): allow rotating broken OpenRouter / AI Gateway key in hermes model flow (#25750)
Before: when `OPENROUTER_API_KEY` (or `AI_GATEWAY_API_KEY`) was already
set in ~/.hermes/.env, `hermes model openrouter` / `hermes model
ai-gateway` skipped the API-key prompt entirely and jumped straight to
the model picker. Users with a broken / expired / wrong key had no way
to replace it without editing ~/.hermes/.env by hand or re-running
`hermes setup` from scratch.

Both flows now route through the existing `_prompt_api_key()` helper,
which surfaces [K]eep / [R]eplace / [C]lear when a key is already
configured — the same UX the generic API-key providers (z.ai, MiniMax,
Gemini, etc.) and the Daytona setup already use.
2026-05-14 07:31:43 -07:00
teknium1 1dca6a6960 feat(discord): render clarify choices as buttons
Brings Discord to parity with Telegram on the clarify tool's interactive
UX. Overrides BasePlatformAdapter.send_clarify on DiscordAdapter to attach
a button view when choices are present.

  - ClarifyChoiceView: one discord.ui.Button per choice (max 24, Discord's
    25-component view cap leaves one slot for Other) plus a final
    'Other (type answer)' button.
  - Numeric click -> tools.clarify_gateway.resolve_gateway_clarify(
    clarify_id, choice_text) using the canonical choice text from the
    gateway entry (falls back to the button label if the entry vanished).
  - Other click -> tools.clarify_gateway.mark_awaiting_text(clarify_id) so
    the gateway's text-intercept captures the next user message in this
    session as the response.
  - Auth via the shared _component_check_auth helper (same OR-semantics as
    ExecApprovalView / SlashConfirmView / UpdatePromptView / ModelPickerView).
  - Open-ended (no choices) path renders the prompt as a plain embed and
    relies on the existing text-intercept resolution.
  - Single-use: first valid click disables every button and updates the
    embed footer with who answered and what they chose.

No changes to BasePlatformAdapter.send_clarify or the gateway's
clarify_callback wiring -- the existing scaffolding already drives all
adapters; Discord just inherits the default text fallback today and gains
buttons by virtue of this override.

Test conftest extended: _FakeEmbed gains add_field() / set_footer() stubs
so tests can construct embedded views without monkey-patching per-test.

Original PR: #19249 by @LeonSGP43. This is a reshape of the contributor's
work onto current main's clarify infrastructure (clarify_id + entry-based
resolution shared with Telegram, instead of a parallel on_answer-closure
mechanism). The button view structure and UX shape are preserved.

Tests: 14 new tests in tests/gateway/test_discord_clarify_buttons.py.
391/391 existing Discord gateway tests still pass.

Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
2026-05-14 07:26:43 -07:00
Tranquil-Flow c75e1a03f9 fix(install): preserve pip entry point when re-running on symlinked install
setup_path() writes the user-facing hermes shim with `cat >`, which
follows existing symlinks. Older installs created
`$command_link_dir/hermes` as a symlink to `$HERMES_BIN`
(`venv/bin/hermes`), so re-running install.sh stomped the pip entry
point with a bash shim that exec'd itself in an infinite loop.

`rm -f` the link target before writing so the shim lands at
`$command_link_dir/hermes` and the venv entry point is left intact.
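
The same hazard exists outside the shell: opening a path for writing
follows an existing symlink. A minimal Python sketch of the
remove-then-write fix, with hypothetical file names standing in for the
venv entry point and the command link:

```python
import os
import tempfile


def write_shim(link_path: str, body: str) -> None:
    """Write `body` at `link_path` itself, never through an existing symlink."""
    # Equivalent of `rm -f`: drop whatever is there, ignore absence.
    try:
        os.unlink(link_path)
    except FileNotFoundError:
        pass
    with open(link_path, "w") as fh:
        fh.write(body)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        entry_point = os.path.join(tmp, "venv_hermes")  # stands in for venv/bin/hermes
        link = os.path.join(tmp, "hermes")              # stands in for the command link
        with open(entry_point, "w") as fh:
            fh.write("pip entry point")
        os.symlink(entry_point, link)                   # the old-style install layout
        write_shim(link, "shim")
        with open(entry_point) as fh:
            survived = fh.read()
        # The entry point body is intact and the link is now a regular file.
        return survived, os.path.islink(link)
```

Without the unlink step, the write would land in venv_hermes and
overwrite the entry point, which is the stomp described above.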

Adds a regression test that reproduces the symlink-stomp end-to-end
(creates the symlink, drives the real shim-write block from setup_path,
asserts the venv pip script body survives and the shim is now a regular
file). Both new assertions fail on origin/main and pass with the fix.

Closes #21454.
2026-05-14 07:08:45 -07:00
Alex ddb8d8fa84 docs: update NovitaAI provider positioning (#25532) 2026-05-14 01:31:12 -07:00
kshitijk4poor 0f0e20ef81 test(novita): cache pricing, add provider test coverage, AUTHOR_MAP entry
Follow-up to Alex-wuhu's NovitaAI provider commit. Adds:

- _pricing_cache hit/write in _fetch_novita_pricing (was missing — every
  pricing fetch was re-hitting the network), mirroring the
  fetch_ai_gateway_pricing pattern. force_refresh now also propagates
  from get_pricing_for_provider.
- TestNovitaProvider in tests/hermes_cli/test_api_key_providers.py
  covering profile load, alias resolution, registry auto-registration,
  model list parity between main.py and models.py, _URL_TO_PROVIDER,
  _PROVIDER_PREFIXES, context_size in _CONTEXT_LENGTH_KEYS, pricing
  unit conversion, and pricing cache behavior.
- AUTHOR_MAP entry for yanglongwei06@gmail.com → @Alex-yang00.
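
The cache hit/write behavior in the first bullet can be sketched as
below. This is an assumption-laden illustration: cached_pricing and its
signature are hypothetical, and the real _pricing_cache may carry extra
state that the commit message does not describe.

```python
from typing import Callable, Dict

# Hypothetical module-level cache keyed by provider name.
_pricing_cache: Dict[str, dict] = {}


def cached_pricing(key: str, fetch: Callable[[], dict],
                   force_refresh: bool = False) -> dict:
    """Return cached pricing for `key`, fetching only on a miss or forced refresh."""
    if not force_refresh and key in _pricing_cache:
        return _pricing_cache[key]      # cache hit: no network call
    pricing = fetch()                   # cache miss or forced refresh
    _pricing_cache[key] = pricing       # cache write
    return pricing
```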
2026-05-13 23:51:15 -07:00
Alex-wuhu 1551ce46a4 docs: update NovitaAI description to "90+ models, pay-per-use" 2026-05-13 23:51:15 -07:00
Alex-wuhu c76e879574 feat: add NovitaAI as LLM provider
Add NovitaAI as a first-class provider with dedicated model selection
flow, live pricing, and authoritative context length resolution.

- Register provider in PROVIDER_REGISTRY, HERMES_OVERLAYS, and all
  alias/label maps (ID: novita, aliases: novita-ai, novitaai)
- Add dedicated _model_flow_novita() with 3-tier model list fallback:
  Novita API → models.dev → static curated list
- Fetch live pricing from /v1/models with correct unit conversion
  (input_token_price_per_m is in units of 0.0001 USD per Mtok)
- Add Novita-specific context length resolution (step 4b) in
  get_model_context_length(), prioritized over models.dev/OpenRouter
- Register api.novita.ai in _URL_TO_PROVIDER to prevent early return
  from the custom-endpoint code path
- Add models.dev mapping (novita → novita-ai)
- Add default auxiliary model (deepseek/deepseek-v3-0324)
- Add NOVITA_API_KEY to test isolation (conftest.py)
- Update docs: providers page, env vars reference, CLI reference,
  .env.example, README, and landing page
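
The unit conversion mentioned in the pricing bullet can be sketched as
follows, assuming the raw field counts units of 0.0001 USD per million
tokens, which is how the bullet reads. The function names are
hypothetical and the sample value in the usage note is invented for
illustration.

```python
UNIT_USD = 0.0001  # value of one raw unit, per the bullet above


def price_per_mtok_usd(raw: float) -> float:
    """Convert a raw per-Mtok price field to USD per million tokens."""
    return raw * UNIT_USD


def price_per_token_usd(raw: float) -> float:
    """Convert a raw per-Mtok price field to USD per single token."""
    return price_per_mtok_usd(raw) / 1_000_000
```

Under that assumption a raw value of 2700 would mean 0.27 USD per
million tokens.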
2026-05-13 23:51:15 -07:00
ayushere 55ba02befb fix(background-review): silence memory provider teardown output leak
Background review fork redirected stdout/stderr around run_conversation()
so its iteration messages stay silent.  But the memory-provider teardown
(shutdown_memory_provider() and review_agent.close()) fired in the outer
finally block AFTER the redirect_stdout context exited — so provider
teardown prints (Honcho disconnect, Hindsight sync, etc.) leaked into
the parent terminal at end of every turn.

Moves the teardown inside the redirect_stdout scope on the success path
(and nulls review_agent so the finally safety-net skips double-shutdown).
The finally block is rewritten as an exception-path safety net that
re-opens a devnull redirect, since the original 'with' context has
already exited by the time finally runs.
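
A minimal sketch of that shape, assuming hypothetical work and teardown
callables in place of run_conversation() and the provider shutdown
calls. The helper names are illustrative, not taken from the code.

```python
import contextlib
import os


@contextlib.contextmanager
def _silenced():
    """Discard stdout and stderr for the duration of the block."""
    with open(os.devnull, "w") as sink:
        with contextlib.redirect_stdout(sink), contextlib.redirect_stderr(sink):
            yield


def run_silently(work, teardown) -> None:
    """Run `work` then `teardown` with output discarded on every path."""
    torn_down = False
    try:
        with _silenced():
            work()
            teardown()        # success path: teardown prints are swallowed too
            torn_down = True
    finally:
        if not torn_down:
            # Exception path: the first redirect has already exited,
            # so open a fresh one around the safety-net teardown.
            with _silenced():
                teardown()
```

The torn_down flag plays the role of nulling review_agent: it stops the
finally block from shutting down a second time after a clean run.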

Salvage of #25342 by @ayushere (manually re-applied + merged conflict
with current main's set_thread_tool_whitelist wiring).
2026-05-13 23:17:22 -07:00
PaTTeeL 7becb19ea0 fix(auxiliary): forward custom_providers to compression model context-length detection
When auxiliary.compression.provider is "auto", the compression model
reuses the main model's provider and base_url.  The main model's
context_length was correctly picking up custom_providers per-model
overrides (via _custom_providers stored during __init__), but the
auxiliary compression model's context-length detection path in
_check_compression_model_feasibility was not passing custom_providers,
causing it to skip step 0b and fall through to models.dev.

This meant that for providers like NVIDIA NIM where the user has a
per-model context_length in custom_providers (e.g. 196608 for
minimax-m2.7), the auxiliary model would use the models.dev value
(204800) instead of the user-configured one — a subtle discrepancy
that could lead to silent compression issues when the auxiliary model
doesn't actually support the detected context length.

Fix: pass self._custom_providers (already stored as an instance attr
during __init__) to the get_model_context_length() call for the
auxiliary compression model.
2026-05-13 23:13:51 -07:00
magic524 8199ec3803 fix(gateway): keep QQBot reconnect loop alive 2026-05-13 23:13:25 -07:00
fu576 f0e46c5e9e fix: do not inherit api_mode when delegating across providers
Cross-provider delegation (e.g. MiniMax parent → DeepSeek child) must not
inherit the parent's api_mode, because each provider uses a different API
surface: MiniMax uses 'anthropic_messages' while DeepSeek uses
'chat_completions'. Inheriting the wrong mode causes 404 errors.

When the effective provider differs from the parent's provider, derive
api_mode from the target provider's defaults instead (None triggers
re-derivation).
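
The rule reduces to a small decision, sketched here with a hypothetical
helper name and signature (the real delegation code has more inputs):

```python
from typing import Optional


def child_api_mode(parent_provider: str,
                   parent_api_mode: Optional[str],
                   child_provider: str) -> Optional[str]:
    """Pick the api_mode to hand to a delegated child agent."""
    if child_provider != parent_provider:
        # Cross-provider: return None so the mode is re-derived
        # from the target provider's defaults.
        return None
    # Same provider: inheriting the parent's mode is safe.
    return parent_api_mode
```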

Refs: Bug #20558, PR #20563
2026-05-13 23:12:57 -07:00
pearjelly 71191b7e8e fix(gateway): make Feishu ws connect override sync to preserve context manager
The Feishu adapter wrapped lark-oapi's Connect() callable to inject
ping_interval/ping_timeout overrides, but made the wrapper async. The
underlying library uses Connect() as an async context manager (async
with Connect(...) as ws:), which requires the call itself to be sync
and return an async context manager. With an async wrapper the call
returned a coroutine instead, so ws never bound.

Restoring the sync wrapper preserves the protocol while still injecting
the overrides.
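
The difference can be reproduced without lark-oapi. In this sketch
connect() is a hypothetical stand-in for the library callable and the
ping values are illustrative. It shows one way an async wrapper breaks
`async with`: the call yields a coroutine, which does not implement the
async context manager protocol.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def connect(url, **kwargs):
    """Stand-in for the library's connect callable (hypothetical)."""
    yield {"url": url, **kwargs}


def connect_with_overrides(url, **kwargs):
    # Plain def: the call returns the async context manager itself.
    kwargs["ping_interval"] = 30  # illustrative values, not the adapter's
    kwargs["ping_timeout"] = 10
    return connect(url, **kwargs)


async def async_connect_with_overrides(url, **kwargs):
    # async def: the call returns a coroutine, which has no __aenter__.
    return connect(url, **kwargs)


async def main():
    async with connect_with_overrides("wss://example.invalid") as ws:
        interval = ws["ping_interval"]

    coro = async_connect_with_overrides("wss://example.invalid")
    try:
        async with coro:
            rejected = False
    except (TypeError, AttributeError):
        # The exception type depends on the Python version.
        rejected = True
    finally:
        coro.close()  # avoid a "never awaited" warning
    return interval, rejected
```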

Salvage of #25388 by @pearjelly (manually re-applied — original branch
was severely stale against current main).
2026-05-13 23:12:34 -07:00
raymaylee 00ad3d3c9c fix: show context compaction status 2026-05-13 23:11:43 -07:00
kfa-ai bd33a48a58 feat(whatsapp): surface quoted reply metadata 2026-05-13 23:11:20 -07:00
Tianyu199509 fd9c1504da fix: gateway PID detection fails on Windows (two issues)
- _read_process_cmdline: /proc and 'ps' are unavailable on Windows,
  so process cmdline was always empty. Add psutil fallback (already
  a hard dependency used by _pid_exists in the same module).

- _record_looks_like_gateway: argv paths use backslashes on Windows
  but patterns use forward slashes/dots, so the fallback record check
  always failed. Normalize backslashes to forward slashes before
  matching.
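
The backslash normalization in the second bullet can be sketched as
below. The pattern list and function name are hypothetical; the real
patterns live in the gateway status module.

```python
from typing import Sequence

# Hypothetical forward-slash patterns identifying a gateway process.
GATEWAY_PATTERNS = ("gateway/run.py", "hermes_cli/main.py gateway")


def looks_like_gateway(argv: Sequence[str],
                       patterns: Sequence[str] = GATEWAY_PATTERNS) -> bool:
    """Match a recorded argv against forward-slash patterns on any OS."""
    # Normalize Windows separators before substring matching.
    cmdline = " ".join(argv).replace("\\", "/")
    return any(pattern in cmdline for pattern in patterns)
```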

Together these caused get_running_pid() to return None on Windows
even when the gateway process was alive, so the dashboard reported the
gateway as 'stopped' despite it functioning normally.
2026-05-13 23:10:57 -07:00
AllynSheep 057f5a31d1 fix(auxiliary): skip providers without credentials immediately
When the auxiliary client fallback chain reaches a provider that has no
credentials configured (no API key, no pool entry), the current code
just returns (None, None) which counts toward the per-call timeout
budget on the next attempt. Mark the provider unhealthy with a short
TTL so the chain advances quickly to the next viable option.

Closes #25384.

Salvage of #25395 by @AllynSheep.
2026-05-13 23:10:33 -07:00
1RB b59ed9c6bc fix(discord): handle forwarded messages via message_snapshots
Discord introduced message_snapshots for forwarded messages — text and
attachments live inside snap.content / snap.attachments rather than on
the parent message. _handle_message wasn't reading them, so forwards
showed up empty.

Defensively extracts snapshot text (when raw_content is empty) and
appends snapshot attachments to the working all_attachments list used
for type detection and media routing. hasattr/getattr guards keep this
safe on older discord.py installs without the field.
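
A rough sketch of the defensive extraction, using SimpleNamespace
objects in place of real discord.py messages. The helper name
merge_forwarded is hypothetical; only the message_snapshots, content,
and attachments attribute names come from the text above.

```python
from types import SimpleNamespace


def merge_forwarded(message):
    """Return (text, attachments), falling back to forwarded snapshots."""
    text = getattr(message, "content", "") or ""
    attachments = list(getattr(message, "attachments", None) or [])
    # getattr guard: older installs have no message_snapshots attribute.
    for snap in getattr(message, "message_snapshots", None) or []:
        if not text:
            text = getattr(snap, "content", "") or ""
        attachments.extend(getattr(snap, "attachments", None) or [])
    return text, attachments


forwarded = SimpleNamespace(
    content="",
    attachments=[],
    message_snapshots=[SimpleNamespace(content="hi", attachments=["img.png"])],
)
```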

Salvage of #25462 by @1RB (manually re-applied — original branch was
stale against current main).
2026-05-13 23:08:53 -07:00
ephron-ren efa97af7e2 fix(agent): add Xiaomi MiMo to reasoning_content echo-back providers
Xiaomi MiMo emits reasoning via OpenAI's reasoning_content field and
requires reasoning_content on every assistant tool-call message when
replaying history. Without echo-back, subsequent API calls fail with
HTTP 400 — same shape as DeepSeek and Kimi/Moonshot thinking modes.

Adds _needs_mimo_tool_reasoning() detection (provider == 'xiaomi',
'mimo' in model, or xiaomimimo.com base url) and wires it into the
_needs_thinking_reasoning_pad() check.
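
The three detection conditions can be sketched as a standalone function.
The signature and the case-insensitive matching are assumptions; the
real _needs_mimo_tool_reasoning() reads these values from the agent.

```python
def needs_mimo_tool_reasoning(provider: str = "", model: str = "",
                              base_url: str = "") -> bool:
    """True when any of the three MiMo signals from the text above is present."""
    return (
        provider.lower() == "xiaomi"
        or "mimo" in model.lower()
        or "xiaomimimo.com" in base_url.lower()
    )
```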

Salvage of #25358 by @ephron-ren (manually re-applied — original branch
was severely stale against current main).
2026-05-13 23:07:09 -07:00
freqyfreqy 8de26e280e docs(lsp): replace "git worktree" with "git repository" in LSP docs
The word "worktree" (a git subcommand feature for parallel checkouts)
was used interchangeably with "repository" in the LSP docs, causing
confusion. LSP only requires a git-initialized directory, not an actual
worktree.

Fixes two instances: section "When LSP runs" and the troubleshooting
"Editing a file outside any git repo" heading.
2026-05-13 23:05:20 -07:00
domtriola 796c8a2d63 docs(user-guide): point tirith link to correct repo 2026-05-13 23:04:57 -07:00
teknium1 2ff744ae2c chore(release): add AUTHOR_MAP entries for 25-PR new-contributor batch
Pre-stages AUTHOR_MAP for 12 new contributors whose PRs are being salvaged
in the upcoming batch:

- 1RB        (#25462)
- ayushere   (#25342)
- domtriola  (#25424)
- ephron-ren (#25358)
- freqyfreqy (#25423)
- fu576      (#25369)
- kfa-ai     (#25398)
- magic524   (#25361)
- PaTTeeL    (#25359)
- pearjelly  (#25388)
- raymaylee  (#25394)
- Tianyu199509 (#25421)
2026-05-13 23:04:35 -07:00
teknium1 16796acc84 chore(release): add AUTHOR_MAP entry for mrshu
Maps mr@shu.io to the mrshu GitHub handle so the release script
attributes the salvaged ACP approval bridging commit correctly.
2026-05-13 22:59:39 -07:00
mr.Shu 31b4721791 fix: simplify ACP approval bridging
Previously ACP dangerous-command approvals mixed an invalid ACP
payload shape with partial Hermes option mapping, and the callback
plumbing was shared across worker threads. This commit uses ACP
tool-call updates, preserves Hermes once/session/always semantics,
and scopes approval callbacks to the current worker thread.

- Build permission requests with `update_tool_call` and unique
  `perm-check-*` ids in `acp_adapter/permissions.py`
- Keep ACP option mapping explicit and fail closed on unknown outcomes
  or request failures
- Set approval callbacks inside the ACP executor worker and read them
  from thread-local state in `tools/terminal_tool.py`
- Replace duplicated ACP bridge coverage with focused tests in
  `tests/acp/test_permissions.py` and add a thread-local callback test
2026-05-13 22:59:39 -07:00
teknium1 35ce94a2f8 fix(tests): correct skin engine test API call
The salvaged regression test called skin.get_spinner_list() which
doesn't exist on SkinConfig. Replace with direct dict access on
skin.spinner — same intent (verify default empty spinner is preserved
when user override is invalid).
2026-05-13 22:55:52 -07:00
Dusk1e 5f234d4057 fix(cli): harden skin yaml parsing for invalid section types 2026-05-13 22:55:52 -07:00
Teknium 8f19078c6a feat(goals): /subgoal — user-added criteria appended to active /goal (#25449)
* feat(goals): /subgoal — user-added criteria appended to active /goal

Layers a /subgoal command on top of the existing freeform Ralph judge
loop. The user can append extra criteria mid-loop; the judge factors
them into its done/continue verdict and the continuation prompt
surfaces them to the agent. No new tool, no agent self-judging — the
existing judge model just sees a richer prompt.

Forms:
  /subgoal                  show current subgoals
  /subgoal <text>           append a criterion
  /subgoal remove <n>       drop subgoal n (1-based)
  /subgoal clear            wipe all subgoals

How it integrates:

- GoalState gains `subgoals: List[str]` (default []), backwards-compat
  for existing state_meta rows.
- judge_goal accepts an optional subgoals kwarg; non-empty switches to
  JUDGE_USER_PROMPT_WITH_SUBGOALS_TEMPLATE which lists them as
  numbered criteria and asks 'is the goal AND every additional
  criterion satisfied?'
- next_continuation_prompt picks CONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATE
  when non-empty so the agent sees what to target.
- /subgoal is allowed mid-run on the gateway since it only touches the
  state the judge reads at turn boundary — no race with the running
  turn.
- Status line shows '... , N subgoals' when present.
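
The backwards-compatible field and the 1-based removal can be sketched
as follows. This GoalState is a cut-down, hypothetical stand-in: the
real class has more fields, and from_dict and remove_subgoal are
illustrative names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GoalState:
    goal: str
    subgoals: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, row: dict) -> "GoalState":
        # Rows written before this change have no "subgoals" key.
        return cls(goal=row["goal"], subgoals=list(row.get("subgoals") or []))

    def remove_subgoal(self, n: int) -> str:
        """Drop subgoal n (1-based), as in `/subgoal remove <n>`."""
        if not 1 <= n <= len(self.subgoals):
            raise IndexError(f"no subgoal {n}")
        return self.subgoals.pop(n - 1)
```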

Surface:
- hermes_cli/goals.py — field, prompt blocks, manager methods, judge weave
- hermes_cli/commands.py — /subgoal CommandDef
- cli.py — _handle_subgoal_command
- gateway/run.py — _handle_subgoal_command + mid-run dispatch
- tests/hermes_cli/test_goals.py — 15 new tests (backcompat, mutation,
  persistence, prompt template selection, judge-prompt content via mock,
  status-line rendering)

77 goal-related tests passing across goals + cli + gateway + tui.

* fix(goals): slash commands don't preempt the goal-continuation hook

Two findings from live-testing /subgoal:

1. Slash commands queued while the agent is running landed in
   _pending_input (same queue as real user messages). The goal hook's
   'is a real user message pending?' check returned True and silently
   skipped — but the slash command consumes its queue slot via
   process_command() which never re-fires the goal hook, so the loop
   stalls indefinitely. Now the hook peeks the queue and only defers
   when a non-slash payload is present.

2. The with-subgoals judge prompt was too soft — opus 4.7 said 'done,
   implying all requirements met' without verifying. Tightened to
   demand specific per-criterion evidence (file contents, output line,
   command result) and explicitly reject phrases like 'implying it was
   done.'
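
The queue peek from the first finding can be sketched like this,
assuming pending payloads are plain strings (an assumption; the real
queue entries may be richer objects) and using a hypothetical function
name:

```python
from collections import deque
from typing import Iterable


def has_real_user_message(pending: Iterable[str]) -> bool:
    """True only if some queued payload is not a slash command."""
    return any(not item.lstrip().startswith("/") for item in pending)


queue = deque(["/subgoal add a regression test"])
# Only a slash command is pending, so the goal hook should not defer.
defer = has_real_user_message(queue)
```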

Live verified: /subgoal injected mid-loop now correctly forces the
judge to refuse done until the new criterion is met. Agent gets the
continuation prompt with subgoals listed, updates the script, judge
confirms done with specific evidence cited.
2026-05-13 22:55:09 -07:00
teknium1 d110ce4493 fix(clipboard): only read PNG signature bytes, not entire file
Tighten _is_png_file() to read just the 8-byte PNG magic via path.open()
+ read(8), instead of slurping the entire image into memory only to check
the prefix.
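
A minimal sketch of the 8-byte check. The signature bytes are the
standard PNG magic; the function below is an illustration and not the
exact body of _is_png_file().

```python
from pathlib import Path

# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png_file(path: Path) -> bool:
    """Check the PNG magic by reading only the first 8 bytes."""
    try:
        with path.open("rb") as fh:
            return fh.read(8) == PNG_SIGNATURE
    except OSError:
        # Missing or unreadable files are treated as "not a PNG".
        return False
```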
2026-05-13 22:54:21 -07:00
Dusk1e 8db544b4d0 fix(clipboard): reject non-png clipboard images when png normalization fails 2026-05-13 22:54:21 -07:00
teknium1 c872f07c47 fix(tests): exercise profile-mode HERMES_HOME for honcho fallback
The cherry-picked tests from #6173 set HERMES_HOME outside Path.home()/.hermes,
which forces get_default_hermes_root() down its Docker branch and returns
HERMES_HOME directly — so _get_default_hermes_home() never resolves to the
~/.hermes directory the tests were trying to assert about.

Rewire both tests to use the real profile layout (HERMES_HOME pointing at
~/.hermes/profiles/<name>) so _get_default_hermes_home() resolves back to
~/.hermes and the default-profile fallback is actually exercised.
2026-05-13 22:53:01 -07:00
Billard d18618f48f fix(honcho): respect HOME-anchored default profile fallback 2026-05-13 22:53:01 -07:00
kshitijk4poor 4ca5e72444 fix(web): preserve top-level error envelope on unconfigured systems
Surfaced by local E2E behavior-parity testing of PR vs origin/main: the
plugin-migrated dispatchers were quietly changing the error envelope
shape returned to function-calling models on unconfigured systems.

Two findings, both from per-result error wrapping bleeding into the
pre-flight configuration error path:

1. **search**: ``firecrawl.search()`` caught the
   ``ValueError("Web tools are not configured...")`` from
   ``_get_firecrawl_client()`` and returned it as
   ``{"success": False, "error": ...}``, losing the legacy
   ``{"error": "Error searching web: ..."}`` envelope that
   ``tool_error()`` emits on main. Models that special-case the
   ``error`` key still detect the failure, but the prefix is part of
   the legacy contract some users rely on.

2. **crawl**: ``firecrawl.crawl()`` caught the same pre-flight
   ``ValueError`` and wrapped it as a per-page error inside
   ``results[0]``. Main short-circuits on ``check_firecrawl_api_key()``
   BEFORE dispatching, so its unconfigured response is
   ``{"success": False, "error": "web_crawl requires Firecrawl..."}``
   at the top level. The PR's per-page burying hid the failure inside
   ``results[]`` where models that check ``result.get("error")`` would
   miss it.

Fix:
- ``plugins/web/firecrawl/provider.py``: pull
  ``_get_firecrawl_client()`` outside the broad ``try`` in
  ``search()``. Pre-flight ``ValueError`` / ``ImportError`` propagate
  to the dispatcher's top-level exception handler. In-flight SDK
  errors still get wrapped as ``{"success": False, ...}``.
- ``tools/web_tools.py``: mirror main's upstream availability gate in
  ``web_crawl_tool``. When the resolved crawl provider is
  ``is_available()==False``, short-circuit BEFORE dispatching with the
  same top-level error shape main emits.
- ``tests/tools/test_web_providers.py``: 2 regression tests
  (``TestUnconfiguredErrorEnvelopeParity``) lock in the behavior so
  future plugin work can't undo this.
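
The search-side fix comes down to where the pre-flight call sits
relative to the try block. A toy sketch, with hypothetical get_client,
search, and dispatch functions standing in for the plugin and the
dispatcher:

```python
def get_client(configured: bool):
    """Pre-flight step: raises when nothing is configured."""
    if not configured:
        raise ValueError("Web tools are not configured")
    return lambda query: [query]


def search(query: str, configured: bool, fail_in_flight: bool = False) -> dict:
    # Pre-flight lives OUTSIDE the try, so its errors propagate upward.
    client = get_client(configured)
    try:
        if fail_in_flight:
            raise RuntimeError("upstream 500")
        return {"success": True, "results": client(query)}
    except Exception as exc:
        # In-flight errors are still wrapped by the plugin.
        return {"success": False, "error": str(exc)}


def dispatch(query: str, configured: bool, **kwargs) -> dict:
    try:
        return search(query, configured, **kwargs)
    except Exception as exc:
        # Top-level handler restores the legacy envelope and prefix.
        return {"error": f"Error searching web: {exc}"}
```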

Verified via local subprocess-based parity test (14/14 scenarios match
origin/main shape exactly) and full 210/210 web test suite green.
2026-05-13 22:31:28 -07:00
kshitijk4poor 657e6d87cc fix(web): align _LEGACY_PREFERENCE with legacy 7-provider order + doc cleanup
Self-review of the plugin migration surfaced one warning and a handful of
doc/dead-code cleanups. None affect production behaviour through the main
dispatcher (which always calls `tools.web_tools._get_backend()` first and
preserves the full 7-provider walk), but direct callers of
`agent.web_search_registry.get_active_*_provider()` previously diverged
from the legacy order and could return `None` for users with credentials
but no explicit `web.backend` config key.

Changes
-------
1. `_LEGACY_PREFERENCE` was shipped as a 4-tuple
   `("brave-free", "firecrawl", "searxng", "ddgs")` while the PR
   description and the legacy `_get_backend()` candidate order both
   call for the 7-tuple
   `(firecrawl, parallel, tavily, exa, searxng, brave-free, ddgs)`.
   Replaced with the 7-tuple. Verified empirically: with TAVILY+EXA keys
   and no config, `get_active_search_provider()` now returns tavily
   (was None); with EXA+PARALLEL it returns parallel (was None); with
   BRAVE+FIRECRAWL it returns firecrawl (was brave-free).

2. `agent/web_search_registry.py` — module docstring, `_resolve` step-3
   docstring, and inline comment all listed the old 4-tuple and claimed
   "brave-free first because it was the shipped default". The legacy
   default is `"firecrawl"`. Rewritten to match the new ordering and
   reference `tools.web_tools._get_backend()` as the source of truth.

3. `agent/web_search_registry.py` — `get_active_crawl_provider`
   docstring said "only Tavily implements it among built-in providers".
   Firecrawl also advertises `supports_crawl=True` after the previous
   commit. Updated to "Tavily and Firecrawl".

4. `plugins/web/tavily/provider.py` — module docstring said "Tavily is
   the only built-in backend that natively crawls". Updated.

5. `agent/web_search_provider.py` — ABC docstring mentioned only
   `search` / `extract` capabilities. Added `crawl` for accuracy.

6. `plugins/web/{firecrawl,parallel,exa}/provider.py` — dead plugin-level
   cache globals (`_firecrawl_client`, `_parallel_client`,
   `_async_parallel_client`, `_exa_client`) were declared but never read
   (all reads/writes go through `_wt.*` per the
   `extracting-inline-helpers-to-plugins` recipe). Removed the dead declarations; the
   reset-for-tests helpers in firecrawl + parallel now clear the
   canonical `_wt._<name>` slots, matching the pattern exa already used.
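
The effect of the 7-tuple in change 1 can be checked with a small
sketch. first_available is a hypothetical helper; the real resolution
in _resolve() also honors explicit config and capability flags.

```python
from typing import Iterable, Optional

# The legacy order quoted in change 1 above.
LEGACY_PREFERENCE = (
    "firecrawl", "parallel", "tavily", "exa", "searxng", "brave-free", "ddgs",
)


def first_available(available: Iterable[str]) -> Optional[str]:
    """Return the first provider in legacy order that has credentials."""
    available = set(available)
    for name in LEGACY_PREFERENCE:
        if name in available:
            return name
    return None
```

With this ordering the three empirical cases in change 1 come out as
described: tavily for TAVILY+EXA, parallel for EXA+PARALLEL, and
firecrawl for BRAVE+FIRECRAWL.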

Tests
-----
218/218 web-targeted tests still pass (no test changes needed). 4910/4910
in `tests/tools/` still green.
2026-05-13 22:31:28 -07:00
kshitijk4poor 21e3a863bb feat(web): firecrawl plugin natively supports crawl; delete legacy inline path
The web-provider migration originally left firecrawl crawl as the only
provider-specific code remaining inline in tools/web_tools.py (~250
lines of Firecrawl-specific crawl orchestration that didn't fit the
plugin's existing surface). This commit closes that gap.

What this adds
--------------
1. plugins/web/firecrawl/provider.py: implement async ``crawl(url, **kwargs)``
   - Accepts the same kwargs as the dispatcher passes to any crawl
     provider (``instructions``, ``depth``, ``limit``); Firecrawl's
     /crawl endpoint ignores ``instructions`` and ``depth`` so we log
     and drop with a clear info message.
   - Wraps the sync SDK ``crawl()`` call in asyncio.to_thread so the
     gateway event loop isn't blocked on a multi-page crawl.
   - Preserves the response-shape normalization across pydantic /
     typed-object / dict variants that the legacy inline code did.
   - Preserves per-page website-policy re-check (catches blocked
     redirects after the SDK returns).
   - Returns the same {"results": [...]} shape so the dispatcher's
     shared LLM-summarization post-processing path works unchanged.
   - Sets supports_crawl() to True so the dispatcher routes through
     the plugin instead of the legacy fallthrough.

2. tools/web_tools.py: delete the entire legacy firecrawl crawl block
   that used to run after "No registered provider supports crawl" —
   ~270 lines including:
   - check_firecrawl_api_key gate + typed error
   - inline SSRF + website-policy seed-URL gate (dispatcher already
     does this)
   - Firecrawl client setup with crawl_params
   - 100+ lines of pydantic/dict/typed-object normalization
   - Per-page LLM-processing loop (kept in the dispatcher's shared
     post-processing path; that's where it always belonged)
   - trimming + base64 image cleanup (still done in the dispatcher's
     shared path)

   Replaced with a single typed-error branch when no crawl-capable
   provider is available: "web_crawl has no available backend. Set
   FIRECRAWL_API_KEY (or FIRECRAWL_API_URL for self-hosted), or set
   TAVILY_API_KEY for Tavily."

Test updates
------------
- tests/tools/test_website_policy.py:
  - test_web_crawl_short_circuits_blocked_url: dispatcher seed-URL
    gate still runs on web_tools.check_website_access (no change to
    that patch), but the firecrawl client lockdown moved to the
    plugin module — patch firecrawl_provider._get_firecrawl_client
    instead of web_tools._get_firecrawl_client. The dispatcher
    short-circuits before the plugin runs, so the test still passes.
  - test_web_crawl_blocks_redirected_final_url: patch the per-page
    policy gate at plugins.web.firecrawl.provider.check_website_access
    (where it now runs) AND on web_tools (where the seed-URL gate
    still runs). Patch firecrawl_provider._get_firecrawl_client for
    the FakeCrawlClient injection. Both checks flow through the same
    fake_check function.
- tests/plugins/web/test_web_search_provider_plugins.py:
  - Update parametrized capability-flag spec: firecrawl supports_crawl
    is now True.
  - Add test_firecrawl_crawl_returns_error_dict_when_unconfigured —
    verifies inspect.iscoroutinefunction(p.crawl) is True and that
    the async crawl returns a per-page error dict (not a raise) when
    FIRECRAWL_API_KEY is missing.

Verified
--------
- 218/218 web tests pass (was 173, +44 plugin tests + 1 new firecrawl
  crawl test from this commit = 218 with the test deduplication).
- Compile-clean (py_compile passes on both files).
- Provider capabilities matrix confirmed end-to-end:
    name        search  extract  crawl   async-extract?  async-crawl?
    firecrawl   True    True     True    True            True
    tavily      True    True     True    False           False
  Both crawl-capable providers exercise the dispatcher's
  inspect.iscoroutinefunction async-or-sync detection.
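
The async-or-sync detection can be sketched as below. call_provider is
a hypothetical name for the dispatcher step; the two toy crawl functions
stand in for an async provider and a sync one.

```python
import asyncio
import inspect


async def call_provider(fn, *args, **kwargs):
    """Await coroutine functions directly; run sync ones in a worker thread."""
    if inspect.iscoroutinefunction(fn):
        return await fn(*args, **kwargs)
    # Sync SDK calls go to a thread so the event loop is not blocked.
    return await asyncio.to_thread(fn, *args, **kwargs)


async def async_crawl(url):
    return {"results": [{"url": url, "mode": "async"}]}


def sync_crawl(url):
    return {"results": [{"url": url, "mode": "sync"}]}
```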

Net diff
--------
- tools/web_tools.py: -254 lines (legacy inline crawl gone)
- plugins/web/firecrawl/provider.py: +185 lines (crawl method)
- test_website_policy.py: +14/-9 lines (patch locations)
- test_web_search_provider_plugins.py: +22/-1 lines (capability flag
  + new firecrawl crawl test)
- Total: -43 net LoC; tools/web_tools.py is now 1509 lines (was 1763
  before this commit, 2227 before the migration started).
2026-05-13 22:31:28 -07:00
kshitijk4poor e8cee87e85 test(plugins): tests/plugins/web/ — coverage for the 7-plugin migration
Adds 44 focused tests under tests/plugins/web/ covering the surface that
the PR #25182 web-provider migration introduced. Complements the
existing tests/tools/ coverage which is dispatcher-centric; this file is
plugin-centric and tests each plugin + the registry directly.

Test classes (44 tests, ~1.1s on 4 workers)
-------------------------------------------

TestBundledPluginsRegister (16 tests)
  - All seven plugins present in the registry after
    _ensure_plugins_discovered()
  - Per-plugin parametrized capability-flag assertions
    (brave-free / ddgs / searxng: search-only;
     exa / parallel / firecrawl: search + extract;
     tavily: search + extract + crawl)
  - Every plugin exposes name + display_name properties
  - Every plugin returns a picker-compatible get_setup_schema() dict

TestIsAvailable (7 tests)
  - Each premium plugin reports is_available()==False when its env var is
    absent and True once set (brave-free / searxng / tavily / exa /
    parallel)
  - firecrawl recognizes either FIRECRAWL_API_KEY or FIRECRAWL_API_URL
    as a "configured" signal
  - ddgs is the always-on fallback and must not raise from is_available()

TestRegistryResolution (4 tests)
  - Option B semantics validated end-to-end:
    1. Explicit configured provider wins even when is_available()==False
       (dispatcher surfaces typed credential errors, no silent switch)
    2. Unknown/typo name falls back to first available legacy-preference
       provider
    3. Asking for extract via a search-only backend falls back to an
       extract-capable available provider (capability-incompatible
       branch in _resolve())
    4. No config + no credentials → None (or ddgs if installed)

TestAsyncExtractDispatch (4 tests)
  - parallel + firecrawl extract() are coroutine functions (async path
    in dispatcher uses await)
  - exa + tavily extract() are sync (dispatcher wraps in
    asyncio.to_thread)

TestErrorResponseShapes (7 tests)
  - Plugins return typed error dicts (success=False + "error" key) when
    credentials are missing, never raise
  - async extract() returns list of per-URL error dicts
  - tavily crawl() returns {"results": [{"error": ...}]} on missing
    credentials

Design notes
------------
- All tests use real imports of plugin modules — no mocking of provider
  classes themselves — so they catch drift in the ABC, registry, and
  glue layer simultaneously, per the hermes-agent-dev skill's E2E
  testing guidance.
- The autouse _isolate_env fixture clears every web-provider env var
  before each test so is_available() reflects the test's setup.
- Resolution tests use the lower-level _resolve() directly rather than
  rebuilding the HERMES_HOME config dance — same observable behavior,
  no sys.modules.pop side-effects that would break the ABC isinstance
  check inside ctx.register_web_search_provider().
2026-05-13 22:31:28 -07:00
kshitijk4poor 39b4ebfcea refactor(web): delete legacy tools/web_providers/ directory + migrate ABC tests
Removes the legacy in-tree provider scaffolding that PR #25182 fully
replaced with the plugin architecture:

  tools/web_providers/__init__.py        (6 lines)
  tools/web_providers/base.py            (89 lines — old ABCs)
  tools/web_providers/ARCHITECTURE.md    (73 lines — old design doc)

These were the staging-ground ABCs and provider modules that the
plugin migration absorbed. All seven web providers now implement the
single :class:`agent.web_search_provider.WebSearchProvider` ABC and
live under ``plugins/web/<vendor>/``. Nothing else in the tree imports
``tools.web_providers`` — verified via grep before deletion.

Test migration (tests/tools/test_web_providers.py)
--------------------------------------------------
Rewrote ``TestWebProviderABCs`` to test the new unified ABC at
:mod:`agent.web_search_provider`:

  - test_cannot_instantiate_abc_directly — abstract ``name`` + ``is_available``
  - test_concrete_search_only_provider_works — exercise default
    ``supports_extract=False`` / ``supports_crawl=False`` flags
  - test_concrete_multi_capability_provider_works — exercise all three
    capabilities, async extract supported (declared sync here for
    simplicity; real plugins like parallel + firecrawl use async)
  - test_search_only_provider_skips_extract_and_crawl — verify
    ``supports_*()`` flags default to False so search-only providers
    don't have to implement extract() or crawl()
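
The contract these four tests exercise can be sketched roughly as
follows. This is an illustration only, not the real class in
agent/web_search_provider.py: ProviderSketch, SearchOnly, and the
search() signature are hypothetical, and the sketch assumes only
``name`` and ``is_available`` are abstract, as the first test suggests.

```python
from abc import ABC, abstractmethod


class ProviderSketch(ABC):
    """Illustrative ABC; not the real agent.web_search_provider class."""

    @property
    @abstractmethod
    def name(self) -> str:
        ...

    @abstractmethod
    def is_available(self) -> bool:
        ...

    # Capability flags default to False so search-only providers
    # never need to implement extract() or crawl().
    def supports_extract(self) -> bool:
        return False

    def supports_crawl(self) -> bool:
        return False


class SearchOnly(ProviderSketch):
    """Hypothetical search-only provider that overrides nothing optional."""

    @property
    def name(self) -> str:
        return "search-only"

    def is_available(self) -> bool:
        return True

    def search(self, query: str, limit: int = 5) -> dict:
        return {"success": True, "data": {"web": []}}


# The ABC itself cannot be instantiated because of its abstract members.
try:
    ProviderSketch()
    abc_is_instantiable = True
except TypeError:
    abc_is_instantiable = False

provider = SearchOnly()
flags = (provider.supports_extract(), provider.supports_crawl())
```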

The 9 other tests in the file (per-capability backend selection,
DEFAULT_CONFIG merge, dispatcher routing) test public helpers in
``tools.web_tools`` that still exist and pass unchanged.

agent/web_search_provider.py docstring updated to reflect that the
legacy ABCs no longer exist; the response-shape contract is preserved
bit-for-bit so external consumers see no behavioral change.

Net diff
--------
- tools/web_providers/ removed (-168 lines)
- tests/tools/test_web_providers.py rewritten ABC section (+78/-30 net,
  same coverage, new API)
- agent/web_search_provider.py docstring (-3/+5 lines)

Verified
--------
- 173/173 targeted web tests pass
- 12/12 ABC contract tests pass with the new interface
- No remaining grep hits for ``tools.web_providers`` outside of
  intentional historical references in plugin docstrings.
2026-05-13 22:31:28 -07:00
kshitijk4poor 24fe60faa2 refactor(tools): drop hardcoded web picker rows + skiplist; plugins are sole source
Removes the seven hardcoded TOOL_CATEGORIES["web"] provider rows that
duplicated the plugin-registered providers, and deletes the
_WEB_PLUGIN_SKIPLIST that existed to prevent duplicate picker rows
during the migration. The Web Search & Extract category now derives its
provider rows entirely from agent.web_search_registry via
_plugin_web_search_providers(), matching how Spotify, Google Meet, and
the image_gen plugins are surfaced.

Removed (deduplicated against plugin schemas):
  - Firecrawl Cloud         → plugins.web.firecrawl
  - Exa                     → plugins.web.exa
  - Parallel                → plugins.web.parallel
  - Tavily                  → plugins.web.tavily
  - SearXNG                 → plugins.web.searxng
  - Brave Search (Free Tier) → plugins.web.brave_free
  - DuckDuckGo (ddgs)       → plugins.web.ddgs (post_setup hook preserved)

Retained in TOOL_CATEGORIES["web"]:
  - Nous Subscription   — requires requires_nous_auth +
                          managed_nous_feature + override_env_vars
                          to drive the managed-gateway UX. Not a
                          provider — a different *setup flow* for the
                          firecrawl backend.
  - Firecrawl Self-Hosted — points firecrawl at a private Docker URL
                            via FIRECRAWL_API_URL only. Same reason:
                            UX setup-flow row, not a provider.

These two rows describe alternative auth/billing paths for the
firecrawl backend; they intentionally share web_backend="firecrawl"
with the plugin row but light up different env-var prompts.

Plugin schema extensions
------------------------
- ddgs plugin's get_setup_schema() now emits `post_setup: "ddgs"` so
  selection still triggers the pip-install hook in _run_post_setup().
- _plugin_web_search_providers() passes `post_setup` through verbatim
  when present in the schema (other future plugins like camofox / a
  hypothetical playwright-web plugin can opt in the same way).
- Picker rows now carry both `web_backend` (legacy field consumed by
  setup + selection helpers) and `web_search_plugin_name`
  (informational marker), so behavior is identical between hardcoded
  and plugin-registered rows.

Net diff
--------
- hermes_cli/tools_config.py: -141/+50 lines (net -91 lines)
- plugins/web/ddgs/provider.py: +7/-4 (post_setup field + badge polish)

Verified
--------
- Compile-clean for both files
- Picker shows: 2 hardcoded rows (Nous Subscription, Firecrawl
  Self-Hosted) + 7 plugin rows (alphabetically: Brave Search,
  DuckDuckGo, Exa, Firecrawl, Parallel, SearXNG, Tavily). DuckDuckGo
  row carries post_setup="ddgs" for first-time install.
- 173 web-specific tests still pass.
2026-05-13 22:31:28 -07:00
kshitijk4poor 748f3e016b refactor(web): delete inline vendor helpers, re-export from plugins
Removes ~580 lines of dead code from tools/web_tools.py that were
superseded by the plugin migration but kept around in the cutover commit
to keep the diff focused. Replaces them with thin re-export shims so
existing tests and external callers that reach for the legacy
``tools.web_tools.<name>`` paths continue to work transparently.

Deleted from tools/web_tools.py
--------------------------------
- Lazy Firecrawl SDK proxy (_load_firecrawl_cls, _FirecrawlProxy,
  _FIRECRAWL_CLS_CACHE, the Firecrawl singleton)
- Firecrawl client section (_get_direct_firecrawl_config,
  _get_firecrawl_gateway_url, _is_tool_gateway_ready,
  _has_direct_firecrawl_config, _raise_web_backend_configuration_error,
  _firecrawl_backend_help_suffix, _get_firecrawl_client)
- Parallel client section (_get_parallel_client,
  _get_async_parallel_client, _parallel_client, _async_parallel_client)
- Tavily client section (_TAVILY_BASE_URL, _tavily_request,
  _normalize_tavily_search_results, _normalize_tavily_documents)
- Generic SDK normalizers (_to_plain_object, _normalize_result_list,
  _extract_web_search_results, _extract_scrape_payload)
- Exa client section (_get_exa_client, _exa_client, _exa_search,
  _exa_extract)
- Parallel helpers (_parallel_search, _parallel_extract)
- Duplicate inline check_firecrawl_api_key

Net: tools/web_tools.py drops from 2227 → 1613 lines (-614 lines).

Re-exports added at top of tools/web_tools.py
---------------------------------------------
- From plugins.web.firecrawl.provider:
  Firecrawl, _FirecrawlProxy, _FIRECRAWL_CLS_CACHE, _load_firecrawl_cls,
  _get_direct_firecrawl_config, _get_firecrawl_gateway_url,
  _is_tool_gateway_ready, _has_direct_firecrawl_config,
  _firecrawl_backend_help_suffix, _raise_web_backend_configuration_error,
  _get_firecrawl_client, _to_plain_object, _normalize_result_list,
  _extract_web_search_results, _extract_scrape_payload,
  check_firecrawl_api_key
- From plugins.web.tavily.provider:
  _tavily_request, _normalize_tavily_search_results,
  _normalize_tavily_documents
- From plugins.web.parallel.provider:
  _get_parallel_client, _get_async_parallel_client
- From plugins.web.exa.provider:
  _get_exa_client

Plus retained module-level imports for backward-compat with tests:
- httpx (tests patch tools.web_tools.httpx for tavily request mocking)
- build_vendor_gateway_url, _read_nous_access_token,
  resolve_managed_tool_gateway, managed_nous_tools_enabled,
  prefers_gateway (tests patch tools.web_tools.<name>)

Plugin indirection pattern (key technique)
------------------------------------------
For functions inside the firecrawl/parallel/exa plugins to honor
unit-test patches that target ``tools.web_tools.<name>``, the plugin
implementations now do ``import tools.web_tools as _wt`` at call time
and read helper names through that module (``_wt._read_nous_access_token``,
``_wt.Firecrawl``, ``_wt.prefers_gateway``, etc.). This makes the
existing test patches transparently reach the plugin code without any
test changes.

The cached client globals (_firecrawl_client, _firecrawl_client_config,
_parallel_client, _async_parallel_client, _exa_client) also now live on
tools.web_tools so existing test setup_method handlers that reset
``tools.web_tools._<vendor>_client = None`` between cases keep working.
The plugins read/write the cache via getattr/setattr on the web_tools
module.
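
A minimal sketch of why the call-time lookup matters, assuming a
stand-in module named fake_web_tools in place of tools.web_tools. The
helper names read_token, plugin_get_token, and plugin_get_client are
hypothetical; only the technique (import the module at call time, read
names and caches through it) comes from the description above.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical stand-in for tools.web_tools, registered so that both
# `import fake_web_tools` and patch("fake_web_tools.<name>") resolve.
fake_web_tools = types.ModuleType("fake_web_tools")
fake_web_tools.read_token = lambda: "real-token"
fake_web_tools._client = None
sys.modules["fake_web_tools"] = fake_web_tools


def plugin_get_token():
    """Plugin-side helper: look the name up on the module at call time."""
    import fake_web_tools as _wt

    return _wt.read_token()


def plugin_get_client():
    """Read and write the cached client on the shared module."""
    import fake_web_tools as _wt

    client = getattr(_wt, "_client", None)
    if client is None:
        client = object()
        setattr(_wt, "_client", client)
    return client


# A patch on the legacy module path reaches the plugin code.
with patch("fake_web_tools.read_token", return_value="patched-token"):
    patched = plugin_get_token()
unpatched = plugin_get_token()

# Resetting the cache on the module forces a fresh client.
first = plugin_get_client()
fake_web_tools._client = None  # what a test's setup_method would do
second = plugin_get_client()
```

Had the plugin bound read_token with a module-level
``from fake_web_tools import read_token``, the patch would replace the
attribute on the module but not the plugin's own copy of the name.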

Verified
--------
- 173/173 targeted web tests pass:
  test_web_providers.py, test_web_providers_brave_free.py,
  test_web_providers_ddgs.py, test_web_providers_searxng.py,
  test_web_tools_config.py, test_web_tools_tavily.py,
  test_website_policy.py, test_config_null_guard.py
- Compile-clean (py_compile.compile passes)
- All inline implementations now exist in exactly one place
  (plugins.web.<vendor>.provider)

Follow-up clean-up
------------------
- Drop _WEB_PLUGIN_SKIPLIST + hardcoded TOOL_CATEGORIES["web"] rows
  (next commit)
- Delete tools/web_providers/ directory entirely
- Add tests/plugins/web/ coverage
- Full tests/tools/ + tests/gateway/ regression sweep before promoting PR
2026-05-13 22:31:28 -07:00
kshitijk4poor 5e54330e27 fix(web): preserve firecrawl crawl + website-policy gate after migration
Two regressions discovered by running the full tests/tools/ suite after
the dispatcher cutover, both fixed in this commit:

1. web_crawl_tool incorrectly errored "search-only" for firecrawl
---------------------------------------------------------------------
The cutover treated any provider with supports_crawl()==False as a
search-only backend and returned the typed search-only error. But
firecrawl can crawl via the legacy multi-page-extract path inside
web_crawl_tool — it just doesn't expose supports_crawl on the plugin
(adding native firecrawl crawl is a clean follow-up).

Fix: only emit the search-only error when the provider supports
NEITHER crawl NOR extract (brave-free / ddgs / searxng). When the
provider supports extract but not crawl (firecrawl), fall through to
the legacy firecrawl-via-extract path below.
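
The corrected routing amounts to a three-way decision on the two
capability flags. A minimal sketch, assuming a hypothetical
route_crawl() helper and made-up route labels (the real logic lives
inline in web_crawl_tool):

```python
def route_crawl(supports_crawl: bool, supports_extract: bool) -> str:
    """Pick the crawl path from a provider's capability flags."""
    if supports_crawl:
        return "provider-crawl"
    if supports_extract:
        # Extract-capable but not crawl-capable: use the legacy path.
        return "legacy-extract-path"
    # Neither crawl nor extract: a genuinely search-only backend.
    return "search-only-error"


routes = {
    "tavily": route_crawl(supports_crawl=True, supports_extract=True),
    "firecrawl": route_crawl(supports_crawl=False, supports_extract=True),
    "ddgs": route_crawl(supports_crawl=False, supports_extract=False),
}
```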

2. firecrawl plugin's check_website_access wasn't patchable
---------------------------------------------------------------------
The plugin imported `from tools.website_policy import check_website_access`
INSIDE the extract() function body, so monkeypatching the name on
plugins.web.firecrawl.provider had no effect — the inner import re-bound
the name on every call.

Fix: hoist the import to module level. Cheap (website_policy itself
has no heavy deps) and makes the standard
monkeypatch.setattr(firecrawl_provider, "check_website_access", ...)
pattern work.
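
The difference between the two import placements can be reproduced in
isolation. The sketch below uses stand-in modules fake_policy and
fake_provider (both hypothetical) and unittest.mock.patch.object in
place of pytest's monkeypatch; the name-binding behavior is the same.

```python
import sys
import types
from unittest.mock import patch

# Hypothetical stand-in for tools.website_policy.
policy = types.ModuleType("fake_policy")
policy.check_access = lambda url: "real"
sys.modules["fake_policy"] = policy

# Hypothetical stand-in for the plugin's provider module.
provider_src = '''
from fake_policy import check_access  # hoisted: bound once at import time


def extract_hoisted(url):
    # Resolves check_access from this module's globals on every call.
    return check_access(url)


def extract_inner_import(url):
    from fake_policy import check_access  # re-bound locally on every call
    return check_access(url)
'''
provider = types.ModuleType("fake_provider")
exec(provider_src, provider.__dict__)
sys.modules["fake_provider"] = provider

# Patch the name on the provider module, as the tests do.
with patch.object(provider, "check_access", lambda url: "patched"):
    hoisted = provider.extract_hoisted("https://example.com")
    inner = provider.extract_inner_import("https://example.com")
```

The hoisted variant sees the patched function; the inner-import variant
still calls the original because its local import shadows the module
attribute.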

Test updates (tests/tools/test_website_policy.py — 4 tests):
  - test_web_extract_short_circuits_blocked_url
  - test_web_extract_blocks_redirected_final_url
    Both: patch the gate at plugins.web.firecrawl.provider (where it
    runs after migration) and force the firecrawl plugin to be the
    active extract provider via FIRECRAWL_API_KEY.
  - test_web_crawl_short_circuits_blocked_url
  - test_web_crawl_blocks_redirected_final_url
    Both: unchanged; the dispatcher-level gate at tools/web_tools.py
    line 1651 still uses the imported `check_website_access` name and
    the firecrawl-fallthrough path is exercised as before.

Verified: 22/22 tests/tools/test_website_policy.py pass.
2026-05-13 22:31:28 -07:00
kshitijk4poor b05253ceed refactor(web): dispatch all three tools through web_search_registry
Cuts over web_search_tool, web_extract_tool, and web_crawl_tool in
tools/web_tools.py to dispatch through agent.web_search_registry
instead of the legacy hardcoded if-elif backend chains.

Per-tool changes:

  web_search_tool (sync)
    Replace 5 backend branches (parallel, exa, registry-3-providers,
    tavily, firecrawl-fallthrough) with a single registry path:
      1. _get_search_backend() resolves the configured name
      2. _wsp_get_provider(name) for explicit-config-wins semantics
      3. get_active_search_provider() fallback for typo / unknown name
      4. provider.search(query, limit) — sync for all 7 providers

  web_extract_tool (async)
    Replace 5 backend branches (parallel-async, exa-sync, tavily-sync,
    search-only-error, firecrawl-perurl-loop) with:
      1. Same provider resolution as search.
      2. When configured backend IS registered but doesn't support
         extract (search-only providers like brave-free), surface a
         typed "search-only" error matching the legacy text — tests
         assert that wording.
      3. inspect.iscoroutinefunction(provider.extract) detects sync vs
         async: parallel + firecrawl are async; exa + tavily are sync.
         Sync extracts run in asyncio.to_thread() so we don't block.

  web_crawl_tool (async)
    Replace tavily-specific branch + search-only-error block with:
      1. _wsp_get_provider(backend) — explicit config first
      2. Search-only typed error when the configured name doesn't
         support crawl (matches legacy phrasing)
      3. get_active_crawl_provider() fallback otherwise
      4. provider.crawl(url, **kwargs) — async-or-sync dispatch as above
      5. Response post-processing (LLM summarization, trimming) stays
         unchanged — it's not provider-specific.
    When no plugin advertises supports_crawl, falls through to the
    existing Firecrawl-via-web-summarize path below (unchanged).
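
The async-or-sync dispatch used by web_extract_tool and web_crawl_tool
can be sketched as follows. This is an illustration, not the actual
dispatcher: call_extract, SyncProvider, and AsyncProvider are
hypothetical names; inspect.iscoroutinefunction and asyncio.to_thread
are the standard-library calls named above.

```python
import asyncio
import inspect


class SyncProvider:
    """Hypothetical stand-in for a sync plugin (exa / tavily style)."""

    def extract(self, urls):
        return [{"url": u, "content": "sync"} for u in urls]


class AsyncProvider:
    """Hypothetical stand-in for an async plugin (parallel / firecrawl style)."""

    async def extract(self, urls):
        return [{"url": u, "content": "async"} for u in urls]


async def call_extract(provider, urls):
    """Await async extract() directly; run sync extract() in a worker thread."""
    if inspect.iscoroutinefunction(provider.extract):
        return await provider.extract(urls)
    # A sync implementation would block the event loop if called inline.
    return await asyncio.to_thread(provider.extract, urls)


sync_result = asyncio.run(call_extract(SyncProvider(), ["https://example.com"]))
async_result = asyncio.run(call_extract(AsyncProvider(), ["https://example.com"]))
```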

Test updates (2 tests in tests/tools/test_web_tools_config.py):
  - test_web_search_clamps_limit_before_backend_call:
      patch("tools.web_tools._parallel_search") -> patch the registry
      provider returned by agent.web_search_registry.get_provider
  - test_search_error_response_does_not_expose_diagnostics:
      patch("tools.web_tools._get_firecrawl_client") -> same pattern

Tests unchanged (still pass):
  - All TestXBackendWiring classes (test _get_backend / _is_backend_available
    config-resolution, independent of dispatch)
  - All TestXSearchOnlyErrors classes (test the search-only error path
    via web_extract_tool / web_crawl_tool — error text preserved)
  - 141 passing web tests total, 0 regressions.

Dead-code cleanup deferred to a follow-up commit so this diff stays
focused on the cutover. After this commit:
  - tools.web_tools._exa_search / _exa_extract / _parallel_search /
    _parallel_extract / _tavily_request / _normalize_tavily_* /
    _get_firecrawl_client / _extract_web_search_results /
    _extract_scrape_payload / _to_plain_object / _normalize_result_list
    are no longer called by the dispatchers, but still exist.
  - The config-resolution layer (_get_backend, _is_backend_available,
    _is_tool_gateway_ready, _has_direct_firecrawl_config) IS still in
    use and must stay.
  - The Firecrawl proxy and check_firecrawl_api_key are still imported
    by integration tests and patched by unit tests — must stay (or be
    re-exported from the plugin).
2026-05-13 22:31:28 -07:00
kshitijk4poor 143184e943 feat(web): firecrawl plugin — largest migration (search + async extract + dual auth)
Migrates Firecrawl from inline code in tools/web_tools.py to a bundled
plugin at plugins/web/firecrawl/. By line count this is the largest of
the seven provider migrations: the firecrawl path captured most of the
file's vendor-specific complexity.

What moved into the plugin (all previously in tools/web_tools.py):

  Lazy Firecrawl SDK proxy
    - _load_firecrawl_cls() — caches the imported SDK class
    - _FirecrawlProxy + Firecrawl singleton — defers ~200ms of SDK
      imports until first construction or isinstance check.

  Client construction (dual auth)
    - _get_direct_firecrawl_config()  — direct FIRECRAWL_API_KEY/URL path
    - _get_firecrawl_gateway_url()    — managed Nous tool-gateway URL
    - _is_tool_gateway_ready()        — gateway URL + Nous token check
    - _has_direct_firecrawl_config()  — direct config present?
    - _get_firecrawl_client()         — combined client construction
                                        honoring web.use_gateway
    - check_firecrawl_api_key()       — top-level "is firecrawl usable"
    - _firecrawl_backend_help_suffix() — managed-gateway help string
    - _raise_web_backend_configuration_error() — typed misconfig error

  Response shape normalization (vendor-specific)
    - _to_plain_object(), _normalize_result_list() — SDK→dict helpers
    - _extract_web_search_results() — handles SDK/direct/gateway shapes
    - _extract_scrape_payload()     — nested-data unwrap for scrape

  Per-URL extract loop
    - 60s asyncio.wait_for timeout per URL
    - Pre-scrape website-policy gate
    - Post-scrape redirect-aware SSRF re-check
    - Format-aware content selection (markdown / html / auto)
    - Per-URL errors returned as {"error": str} entries, no raises

Extract is declared `async def` — each URL is scraped in
asyncio.to_thread(...). This is the second async-extract plugin after
parallel.
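
A simplified sketch of that per-URL loop, assuming a hypothetical
blocking scrape_blocking() helper in place of the Firecrawl SDK call.
The policy gate, SSRF re-check, and format selection are omitted, and
the "url" key on each entry is an assumption added for readability.

```python
import asyncio
import time


def scrape_blocking(url):
    """Hypothetical stand-in for a blocking SDK scrape call."""
    if "slow" in url:
        time.sleep(0.5)
    if "bad" in url:
        raise RuntimeError("scrape failed")
    return {"url": url, "content": f"content of {url}"}


async def extract(urls, timeout=60.0):
    """Scrape each URL in a worker thread; report failures as entries."""
    results = []
    for url in urls:
        try:
            page = await asyncio.wait_for(
                asyncio.to_thread(scrape_blocking, url), timeout=timeout
            )
            results.append(page)
        except asyncio.TimeoutError:
            results.append({"url": url, "error": f"timed out after {timeout}s"})
        except Exception as exc:
            # Per-URL errors become result entries rather than raising.
            results.append({"url": url, "error": str(exc)})
    return results


# A short timeout here only keeps the demonstration fast.
results = asyncio.run(
    extract(
        ["https://ok.example", "https://bad.example", "https://slow.example"],
        timeout=0.1,
    )
)
```

Note that asyncio.wait_for stops waiting at the deadline, but the worker
thread itself keeps running until the blocking call returns.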

The plugin re-exports `Firecrawl` (the lazy proxy) and
`check_firecrawl_api_key()` so existing tests doing
`patch("tools.web_tools.Firecrawl")` or
`monkeypatch.setattr(web_tools, "check_firecrawl_api_key", ...)` keep
working — tools/web_tools.py re-exports both names in the next
dispatcher-cutover commit.

Note: web_crawl_tool still has its own Firecrawl crawl path inline
(separate from extract); the Firecrawl SDK supports /crawl but we don't
expose supports_crawl=True on this plugin yet. Tavily handles crawl
today. Adding Firecrawl crawl is a clean follow-up.

Adds "firecrawl" to _WEB_PLUGIN_SKIPLIST.

E2E verified:
  - All 7 providers register: brave-free, ddgs, exa, firecrawl,
    parallel, searxng, tavily
  - inspect.iscoroutinefunction(firecrawl.extract) -> True
  - Firecrawl proxy is a callable lazy proxy at module level
  - check_firecrawl_api_key reflects FIRECRAWL_API_KEY presence
2026-05-13 22:31:28 -07:00
kshitijk4poor 31fcde876c feat(web): tavily plugin — first three-capability plugin (search + extract + crawl)
Migrates Tavily from inline _tavily_request() / _normalize_tavily_*
helpers in tools/web_tools.py to a bundled plugin at plugins/web/tavily/.

First plugin in the codebase to advertise supports_crawl=True. Tavily is
unique among built-in backends in offering a native /crawl endpoint that
walks linked pages from a seed URL with optional natural-language
instructions and depth ("basic" or "advanced").

Capabilities:
  - supports_search()  -> True (Tavily /search)
  - supports_extract() -> True (Tavily /extract)
  - supports_crawl()   -> True (Tavily /crawl)
  All sync (httpx.post under the hood).

The crawl method accepts forward-compat kwargs (instructions, depth,
limit) and is gated against unsafe URLs/policy by the dispatcher in
web_crawl_tool — exactly as before.

Behavior preserved:
  - TAVILY_API_KEY required (ValueError → typed error response)
  - TAVILY_BASE_URL env override honored
  - /crawl requires both body auth AND Bearer header — preserved
  - failed_results[] and failed_urls[] response keys mapped to per-URL
    items with error fields rather than raising
  - max_results capped at 20 server-side

Adds "tavily" to _WEB_PLUGIN_SKIPLIST.

The legacy inline _tavily_request / _normalize_tavily_search_results /
_normalize_tavily_documents / _TAVILY_BASE_URL in tools/web_tools.py are
NOT deleted yet — search/extract dispatch and the entire web_crawl_tool
function still reference them. They go away when those dispatchers are
cut over to the registry.

E2E verified:
  - Tavily registers with all 3 capabilities
  - Provider list now: brave-free, ddgs, exa, parallel, searxng, tavily
2026-05-13 22:31:28 -07:00
kshitijk4poor 4816646109 feat(web): parallel plugin — first async-extract plugin
Migrates Parallel.ai from inline `_parallel_search()` / `_parallel_extract()`
in tools/web_tools.py to a bundled plugin at plugins/web/parallel/.

First plugin in the codebase to expose an async :meth:`extract`:

  - search() is sync — Parallel.beta.search
  - extract() is **async def** — AsyncParallel.beta.extract

The ABC's docstring on supports_extract() already permits sync-or-async;
this commit is the first to exercise the async path. The web_extract_tool
dispatcher (next commit) detects coroutines via
inspect.iscoroutinefunction and awaits accordingly.

Behavior preserved:
  - PARALLEL_API_KEY required (raises ValueError if missing → surfaced
    as {"success": False, "error": "..."} instead)
  - PARALLEL_SEARCH_MODE env var honored (agentic|fast|one-shot, default
    agentic), validated via _resolve_search_mode()
  - Limit capped at 20 server-side via min(limit, 20)
  - Per-URL failure mode preserved: response.errors[] each become a
    result dict with an "error" field rather than raising
  - Module-level _parallel_client / _async_parallel_client caches kept
    (mirrors legacy singleton pattern)

Adds "parallel" to _WEB_PLUGIN_SKIPLIST in hermes_cli/tools_config.py so
the picker doesn't double-list.

The legacy inline _parallel_search, _parallel_extract, _get_parallel_client,
_get_async_parallel_client in tools/web_tools.py are NOT deleted yet — the
dispatcher still calls them. They go away when the dispatcher cuts over.

E2E verified:
  - inspect.iscoroutinefunction(p.search) -> False
  - inspect.iscoroutinefunction(p.extract) -> True
  - extract() returns a coroutine (not a list)
  - 5 providers register correctly (brave-free, ddgs, exa, parallel, searxng)
2026-05-13 22:31:28 -07:00
kshitijk4poor ec8449e9c6 feat(web): exa plugin — first multi-capability migration (search + extract)
Migrates Exa from the inline `_exa_search()` / `_exa_extract()` helpers in
tools/web_tools.py to a bundled plugin at plugins/web/exa/.

This is the first plugin in this PR to advertise supports_extract=True,
exercising the multi-capability ABC path that the initial three migrations
(brave_free, ddgs, searxng — all search-only) did not cover.

Both Exa methods are sync — the SDK is sync-only. The web_extract_tool
dispatcher in tools/web_tools.py will continue to call them inline until
Task "dispatch-extract-all" cuts it over to the registry.

Behavior preserved bit-for-bit aside from the ABC method-name change:
  - is_configured()  -> is_available()
  - provider_name()  -> name (property)
  - "exa" stays as the registered name
  - Module-level `_exa_client` cache + lazy `from exa_py import Exa`
    preserved at the new location.
  - Errors (ValueError for missing API key, ImportError for missing SDK,
    generic Exception) caught and surfaced as {"success": False, "error": ...}
    instead of raising.

Adds "exa" to _WEB_PLUGIN_SKIPLIST in hermes_cli/tools_config.py so the
hardcoded TOOL_CATEGORIES["web"] row and the plugin-injected row don't
duplicate during the spike. The skip-list goes away in the cleanup phase
along with the hardcoded row.

The legacy inline `_exa_search` / `_exa_extract` / `_get_exa_client` /
`_exa_client` in tools/web_tools.py are NOT deleted yet — the dispatcher
still references them. They go away in the next dispatcher-cutover commit.

E2E verified:
  - Plugin discovers + registers
  - .supports_search/.supports_extract/.supports_crawl = (True, True, False)
  - .get_setup_schema() returns the picker row shape
  - resolve(): explicit exa + EXA_API_KEY -> exa; without key -> exa (registered
    but unavailable, dispatcher surfaces "EXA_API_KEY not set" error)
2026-05-13 22:31:28 -07:00
kshitijk4poor e3f0a88891 feat(web): extend ABC with supports_crawl and async-extract semantics
Two ABC additions to cover the surface area of the remaining four
providers (exa, parallel, tavily, firecrawl) which were untouched by the
initial spike:

1. supports_crawl() + crawl() — Tavily natively crawls a seed URL via
   its /crawl endpoint. Exposing supports_crawl=True lets the crawl
   tool's dispatcher route to Tavily when configured, falling back to
   the auxiliary-model summarization path otherwise. Firecrawl could
   add this in a follow-up (the SDK supports it; we just don't surface
   it as a tool today).

2. Async-or-sync extract() — Parallel's SDK is natively async
   (AsyncParallel.beta.extract); Exa and Tavily are sync; Firecrawl is
   sync but called inside asyncio.to_thread() with a 60s timeout. The
   ABC docstring now permits either shape: implementations declare
   their own sync/async signature and the dispatcher uses
   inspect.iscoroutinefunction to detect and await.

Also adds get_active_crawl_provider() to web_search_registry mirroring
the search/extract resolvers, with web.crawl_backend as the explicit
override config key.

No behavior change on its own — these are scaffolds for the four
remaining provider migrations.
2026-05-13 22:31:28 -07:00
kshitijk4poor 0a7cbd3342 fix(plugins): filter resolution by is_available() in web + image_gen registries
Both web_search_registry._resolve() and image_gen_registry.get_active_provider()
walked their registered providers and returned the first one matching the
capability flag — without checking whether that provider was actually
usable. On a fresh install with no credentials at all, this meant
get_active_search_provider() returned `brave-free` (legacy preference
order) even though BRAVE_SEARCH_API_KEY was unset, leading the
dispatcher to surface a "BRAVE_SEARCH_API_KEY is not set" error for a
provider the user never chose. Same bug shape in image_gen for FAL.

Resolution semantics now match tools.web_tools._get_backend():

  1. Explicit config name wins, ignoring is_available() — the dispatcher
     surfaces a precise "X_API_KEY is not set" error rather than silently
     switching backends. Matches user expectation: "I configured X, tell
     me what's wrong with X."
  2. Fallback (no explicit config) walks the legacy preference order
     filtered by is_available() — pick the highest-priority backend the
     user actually has credentials for.

is_available() is wrapped in a try/except so a buggy provider doesn't
brick resolution.
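
The two rules can be sketched as follows. This is a simplified model
rather than the registry code: FakeProvider, resolve(), the dict-based
registry, and the preference order shown are all hypothetical, and the
capability-flag filtering is left out.

```python
class FakeProvider:
    """Hypothetical provider; real plugins implement the WebSearchProvider ABC."""

    def __init__(self, name, available):
        self.name = name
        self._available = available

    def is_available(self):
        if isinstance(self._available, Exception):
            raise self._available
        return self._available


def _safe_is_available(provider):
    """Treat a provider whose is_available() raises as unavailable."""
    try:
        return bool(provider.is_available())
    except Exception:
        return False


def resolve(registry, preference_order, configured=None):
    # 1. An explicitly configured, registered name wins even when the
    #    provider is unavailable, so the caller can report what is wrong.
    if configured and configured in registry:
        return registry[configured]
    # 2. Otherwise walk the preference order, skipping unusable providers.
    for name in preference_order:
        provider = registry.get(name)
        if provider is not None and _safe_is_available(provider):
            return provider
    return None


registry = {
    "brave-free": FakeProvider("brave-free", False),
    "broken": FakeProvider("broken", RuntimeError("boom")),
    "ddgs": FakeProvider("ddgs", True),
}
order = ["brave-free", "broken", "ddgs"]

explicit = resolve(registry, order, configured="brave-free")
fallback = resolve(registry, order)
nothing = resolve({"brave-free": registry["brave-free"]}, order)
```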

E2E verified:
  - No creds + no config: get_active_search_provider() -> None
  - Explicit brave-free + no key: get_active_search_provider() -> brave-free
    (and .is_available() correctly reports False)

This fix was identified during the spike (#25182 finding #1) and is
folded into the same PR rather than deferred to a follow-up.
2026-05-13 22:31:28 -07:00
kshitijk4poor 6b219f5af6 refactor(web): remove legacy in-tree provider modules
Deletes tools/web_providers/{brave_free,ddgs,searxng}.py — the three
providers that moved to plugins/web/ in prior commits. tools/web_tools.py
no longer imports them (registry dispatch as of d8735963f), so removing
them is purely a cleanup pass.

Also migrates the existing tests to the new import paths:
  tests/tools/test_web_providers_brave_free.py
  tests/tools/test_web_providers_ddgs.py
  tests/tools/test_web_providers_searxng.py

Mechanical rewrites:
  - `from tools.web_providers.X import YSearchProvider`
      -> `from plugins.web.X.provider import YWebSearchProvider`
  - `.is_configured()` -> `.is_available()`        (legacy method  -> new method)
  - `.provider_name()` -> `.name`                  (legacy method  -> new property)
  - `from tools.web_providers.base import WebSearchProvider`
      -> `from agent.web_search_provider import WebSearchProvider`
      (the subclass-check asserts membership in the new plugin-facing ABC)
  - `sys.modules.delitem("tools.web_providers.ddgs")` updated to point at
    `plugins.web.ddgs.provider` (cache-busting for lazy ddgs imports)

The TestXBackendWiring / TestXSearchOnlyErrors classes (covering
_is_backend_available, _get_backend, check_web_api_key, and the
"search-only" error paths in web_extract/web_crawl) are untouched —
those still test web_tools.py's backend-selection logic, which continues
to recognize the names "brave-free" / "ddgs" / "searxng" even after the
modules behind them moved to plugins.

tools/web_providers/base.py is intentionally NOT deleted by this commit
— it's the parent ABC of the legacy modules and shares its name with
agent/web_search_provider.py::WebSearchProvider. Removing it surfaces the
naming collision (see PR description Finding 0); the real migration PR
deletes it in the same commit that drops the _WEB_PLUGIN_SKIPLIST
guards in hermes_cli/tools_config.py.

Test results:
  bash scripts/run_tests.sh tests/tools/test_web_providers_*.py
  -> 65 passed in 3.41s (all rewritten unit tests + unchanged integration tests)
  bash scripts/run_tests.sh tests/tools/test_web_*.py
  -> 141 passed in 4.70s (full web test set, post-deletion)
2026-05-13 22:31:28 -07:00
kshitijk4poor 714630110b feat(tools): mirror image_gen plugin-injection in Web Search picker
Adds _plugin_web_search_providers() and wires it into _visible_providers()
for the "Web Search & Extract" category. Mirrors the existing image_gen
pattern at the same site exactly.

Spike scope: while the three migrated providers (brave-free, ddgs, searxng)
still have hardcoded TOOL_CATEGORIES rows, _WEB_PLUGIN_SKIPLIST excludes
them so the picker doesn't show duplicates. The migration PR drops both the
hardcoded rows and the skip-list; after that, this helper is the only
source of web-provider picker rows.

E2E verified: helper returns [] today (skip-list covers all 3 migrated
providers); injection point is sound and ready for the post-migration state.
2026-05-13 22:31:28 -07:00
kshitijk4poor 6bd16a645b refactor(web): dispatch brave-free/ddgs/searxng via web_search_registry
The three migrated providers (brave-free, ddgs, searxng) are now dispatched
through agent.web_search_registry.get_provider() instead of importing
their concrete classes directly. The four inline providers (parallel, exa,
tavily, firecrawl) keep their existing branches — they live in
tools/web_tools.py itself and aren't part of this spike's plugin extraction.

The legacy tools/web_providers/{brave_free,ddgs,searxng}.py modules are
still in place (untouched by this commit) — Task 10 deletes them once the
real migration PR is ready. Keeping them alive during the spike means
revertibility is trivial.

E2E verified:
  1. Plugin discovery registers ['brave-free','ddgs','searxng']
  2. Config web.search_backend: brave-free resolves to the plugin instance
  3. Dispatch result matches the original {success, data.web[]} contract
  4. compile OK; no new LSP errors beyond pre-existing ones in web_tools.py
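
The dispatch split can be sketched roughly as follows. This is an
illustration under assumptions, not the code in tools/web_tools.py: the
registry here is a plain dict, providers are bare callables, and the
inline branches are elided. Only the provider names and the
{success, data.web[]} result shape come from the commit message.

```python
# Hypothetical registry: maps a provider name to a search callable.
_REGISTRY = {}

# Inline providers named in the commit; their real branches are not shown.
INLINE_PROVIDERS = {"parallel", "exa", "tavily", "firecrawl"}


def register_provider(name, search_fn):
    _REGISTRY[name] = search_fn


def get_provider(name):
    return _REGISTRY.get(name)


def web_search(backend, query):
    """Route inline providers to their own branch, everything else to the registry."""
    if backend in INLINE_PROVIDERS:
        return {"success": False, "error": f"inline branch for {backend!r} elided in this sketch"}
    provider = get_provider(backend)
    if provider is None:
        return {"success": False, "error": f"unknown backend {backend!r}"}
    return {"success": True, "data": {"web": provider(query)}}


# Stub provider standing in for a discovered plugin.
register_provider("ddgs", lambda q: [{"title": "stub", "url": "https://example.com", "description": q}])
result = web_search("ddgs", "hermes agent")
```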
2026-05-13 22:31:28 -07:00
kshitijk4poor 0d085d9454 feat(web): searxng plugin (search-only, third migration)
Adds plugins/web/searxng/. SearXNG aggregates results from upstream engines
via its JSON API (/search?format=json) — search-only, no extract capability
(supports_extract() returns False).

E2E verified — registry now has ['brave-free', 'ddgs', 'searxng'].
2026-05-13 22:31:28 -07:00
kshitijk4poor 5c7d098bee feat(web): ddgs plugin (second migration)
Adds plugins/web/ddgs/ following the same plugins/image_gen/ pattern as
brave_free. DuckDuckGo search via the community ddgs package; no API key,
package is an optional dep gated by is_available().

E2E verified — registry now has ['brave-free', 'ddgs'].
2026-05-13 22:31:28 -07:00
kshitijk4poor d403cf018c feat(web): brave_free plugin (first migration from tools/web_providers/)
Adds plugins/web/brave_free/ as the first plugin built against the new
WebSearchProvider ABC. Mirrors the plugins/image_gen/openai/ layout exactly:

  plugins/web/brave_free/
    plugin.yaml      kind: backend, provides_web_providers: [brave-free]
    __init__.py      register(ctx) -> ctx.register_web_search_provider(...)
    provider.py      BraveFreeWebSearchProvider(WebSearchProvider)

Behavior preserved: same name ("brave-free" with hyphen), same env var
(BRAVE_SEARCH_API_KEY), same HTTP request shape, same response normalization.

The legacy tools/web_providers/brave_free.py is left in place — the
dispatcher in tools/web_tools.py still references it. Task 7 cuts over the
dispatcher to the new registry; Task 10 deletes the legacy file.

E2E verified:
  HERMES_PLUGINS_DEBUG=1 python -c "
  from hermes_cli.plugins import _ensure_plugins_discovered
  _ensure_plugins_discovered()
  from agent.web_search_registry import list_providers
  print([p.name for p in list_providers()])
  "
  # -> ['brave-free']
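
For orientation, a self-contained sketch of the plugin shape listed above.
The ABC's real method set and signatures are not shown in this commit, so
everything below is an assumption apart from the names WebSearchProvider,
register(ctx), ctx.register_web_search_provider, is_available and
supports_extract. The stub provider and FakeContext are hypothetical.

```python
from abc import ABC, abstractmethod


class WebSearchProvider(ABC):
    """Assumed shape of the ABC; the real one may differ."""

    name: str = ""

    def is_available(self) -> bool:
        return True

    def supports_extract(self) -> bool:
        return False

    @abstractmethod
    def search(self, query: str, limit: int = 5) -> dict:
        ...


class StubWebSearchProvider(WebSearchProvider):
    """Hypothetical provider returning canned results instead of calling an API."""

    name = "stub-search"

    def search(self, query, limit=5):
        results = [
            {"title": f"Result {i} for {query}", "url": f"https://example.com/{i}"}
            for i in range(limit)
        ]
        return {"success": True, "data": {"web": results}}


class FakeContext:
    """Stand-in for the plugin context passed to register()."""

    def __init__(self):
        self.providers = {}

    def register_web_search_provider(self, provider):
        self.providers[provider.name] = provider


def register(ctx):
    # Mirrors the __init__.py entry point named in the layout above.
    ctx.register_web_search_provider(StubWebSearchProvider())


ctx = FakeContext()
register(ctx)
```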
2026-05-13 22:31:28 -07:00
kshitijk4poor f29f02a73f feat(plugins): add ctx.register_web_search_provider() facade 2026-05-13 22:31:28 -07:00
kshitijk4poor 007a630b16 feat(web): add web search provider registry mirroring image_gen pattern 2026-05-13 22:31:28 -07:00
kshitijk4poor 2cea98e143 feat(web): add WebSearchProvider ABC mirroring image_gen template 2026-05-13 22:31:28 -07:00
teknium1 563077a47a refactor(cli): route /model picker through shared inventory module
The interactive CLI /model picker was the third call-site duplicating
the inline config-slice + list_authenticated_providers pattern that
PR #23666 consolidated for the dashboard and TUI. Route it through
load_picker_context() + build_models_payload() too so all surfaces
that show authenticated providers share one substrate.

Side effect: cli.py now also benefits from the latent v12+ keyed
providers fix (custom_providers populated via
get_compatible_custom_providers, not cfg.get raw).

The aux-task switcher (hermes_cli/main.py) and gateway model
switcher (gateway/run.py) deliberately stay on the legacy path —
they use different config sections (auxiliary.<task>.*) and a
different config loader (_load_gateway_config) respectively, so
forcing them through ConfigContext would either overload its
semantics or grow the module past the clean refactor scope.
2026-05-13 22:31:11 -07:00
kshitijk4poor efc32ab639 refactor(inventory): extract shared ConfigContext + build_models_payload
Three call-sites in the codebase each duplicated the same config-slice
+ list_authenticated_providers + post-processing pattern:

- hermes_cli/web_server.py /api/model/options
- tui_gateway/server.py model.options JSON-RPC
- tui_gateway/server.py model.save_key JSON-RPC

This consolidates them onto hermes_cli/inventory.py:

  load_picker_context() -> ConfigContext
      Replaces the 17-LOC config-slice (model.{default,name,provider,
      base_url}, providers:, custom_providers:) every consumer did
      inline.

  ConfigContext.with_overrides(*, current_provider=, current_model=,
                               current_base_url=) -> ConfigContext
      Truthy-only overlay for TUI agent-session state on top of disk
      config. Empty getattr(agent, ...) attrs MUST NOT clobber disk.

  build_models_payload(ctx, *, include_unconfigured, picker_hints,
                       canonical_order, max_models) -> dict
      Single payload builder. Delegates curation to
      list_authenticated_providers (does not call provider_model_ids
      per row, because that pulls non-agentic models). picker_hints +
      canonical_order produce the TUI ModelPickerDialog shape;
      defaults match the dashboard's existing /api/model/options
      contract.
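
The truthy-only overlay in with_overrides can be sketched like this. The
keyword-only signature is taken from the description above; the field set
and the frozen-dataclass representation are assumptions for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ConfigContext:
    # Assumed fields; the real ConfigContext carries more than this.
    current_provider: str = ""
    current_model: str = ""
    current_base_url: str = ""

    def with_overrides(self, *, current_provider=None, current_model=None,
                       current_base_url=None):
        """Overlay only truthy values so empty session attrs never clobber disk config."""
        candidates = {
            "current_provider": current_provider,
            "current_model": current_model,
            "current_base_url": current_base_url,
        }
        truthy = {key: value for key, value in candidates.items() if value}
        return replace(self, **truthy)


disk = ConfigContext(
    current_provider="openrouter",
    current_model="model-a",
    current_base_url="https://example.invalid/v1",
)
# Empty string and None stand in for unset getattr(agent, ...) values.
live = disk.with_overrides(current_provider="", current_model="model-b",
                           current_base_url=None)
```

Returning a new object keeps the disk-derived context reusable across
requests.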

Two latent bugs fixed by consolidation:

1. The dashboard read cfg.get('custom_providers') directly, missing
   the v12+ keyed providers: form. Now both surfaces go through
   get_compatible_custom_providers().

2. The TUI's canonical-merge keyed on is_user_defined to decide order.
   Section 3 of list_authenticated_providers sets is_user_defined=True
   on rows from the providers: config dict even when the slug is
   canonical, which silently demoted them to the picker tail.
   _reorder_canonical now keys on slug membership instead.
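
A sketch of ordering by slug membership, as in the second fix. The row
shape, the function name and the slugs in the example order are
illustrative assumptions.

```python
def reorder_canonical(rows, canonical_order):
    """Put rows whose slug is canonical first, in canonical order; keep the rest after."""
    rank = {slug: index for index, slug in enumerate(canonical_order)}
    canonical = sorted(
        (row for row in rows if row["slug"] in rank),
        key=lambda row: rank[row["slug"]],
    )
    rest = [row for row in rows if row["slug"] not in rank]
    return canonical + rest


rows = [
    {"slug": "my-local", "is_user_defined": True},
    {"slug": "openai", "is_user_defined": False},
    # Canonical slug configured via providers:, so is_user_defined is True.
    {"slug": "anthropic", "is_user_defined": True},
]
ordered = [row["slug"] for row in reorder_canonical(rows, ["anthropic", "openai"])]
```

Because is_user_defined is never consulted, a canonical provider
configured under providers: keeps its canonical position.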

Stats: +666 / -145 (net +521). Module 240 LOC; 18 behavior tests.

This PR replaces the rejected #23369, which bundled the consolidation
with new scriptable CLI surfaces (hermes models list/status, hermes
providers list) and a JSON contract that have no external user demand.
This PR is just the refactor; the CLI surface is deferred to a separate
PR gated on actual demand.

Refs #23359.
2026-05-13 22:31:11 -07:00
teknium1 4ceab16893 fix(compression): keep default protect_first_n at 3 + align ABC
Follow-up on the salvaged feat commit:

- Keep the constructor / config / yaml-example default at 3 so existing
  gateway and CLI users see no behavioural change. PR #13754 (which this
  builds on) had lowered the default to 2 to chase pre-feature parity in
  the system-prompt-present case, at the cost of quietly halving the
  protected head for the gateway path (which strips the system prompt
  before calling compress()). With the new "system prompt is implicit"
  semantics, default 3 gives every caller a stable head shape.
- agent/context_engine.py: bring the ABC's protect_first_n docstring in
  line with the new semantics so plugin context engines interpret the
  config key the same way the built-in compressor does.
- tests: adjust the default-value test (3, not 2) and a stale comment;
  per-test protect_first_n=2/3/1 values added in PR #13754 stay as-is
  since those tests fix concrete head shapes.
2026-05-13 22:25:16 -07:00
snav dee71a31e5 feat(compression): make protect_first_n configurable
The number of head messages preserved verbatim across context compactions
was previously hardcoded to 3 in AIAgent.__init__. Expose it as
`compression.protect_first_n` in config, matching the existing
`protect_last_n` pattern.

Motivation: users who rely on rolling compaction for long-running sessions
had the opening user/assistant exchange pinned as head forever, which
doesn't always match how they want the session framed after many
compactions. Lowering to 1 preserves the system prompt + first non-system
message; lowering to 0 preserves only the system prompt and lets the
entire first exchange age out naturally through the summary.

Semantics: `protect_first_n` counts non-system head messages protected
**in addition to** the system prompt, which is always implicitly protected
when present. Same meaning across both code paths:

  protect_first_n=0 → system prompt only (or nothing if no system message)
  protect_first_n=2 → system prompt + first 2 non-system messages (default)

This unifies the CLI path (which reads messages with the system prompt at
position 0) and the gateway path (where the gateway /compress handler
strips the system prompt before calling compress() — see
gateway/run.py L9150-9154 on the parent fork). Previously these two paths
disagreed:

  CLI path:     protect_first_n=1 → protect system prompt only
  Gateway path: protect_first_n=1 → protect first USER turn forever

In practice on long-running gateway sessions the old semantics pinned
whatever stale aside happened to be the first user message, reinserting
it into every compaction summary indefinitely.
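
The shared semantics can be sketched as a head split. The function name
and message dicts are illustrative; the real logic lives in
agent/context_compressor.py and is not reproduced here.

```python
def split_protected_head(messages, protect_first_n):
    """Return (protected_head, remainder) under the shared semantics."""
    remainder = list(messages)
    head = []
    # The system prompt is implicitly protected whenever it is present.
    if remainder and remainder[0].get("role") == "system":
        head.append(remainder.pop(0))
    # protect_first_n counts only non-system messages.
    head.extend(remainder[:protect_first_n])
    return head, remainder[protect_first_n:]


system = {"role": "system", "content": "You are an agent."}
user_1 = {"role": "user", "content": "first"}
assistant_1 = {"role": "assistant", "content": "reply"}

# CLI path: system prompt at position 0.
cli_head, _ = split_protected_head([system, user_1, assistant_1], 0)
# Gateway path: system prompt already stripped before compress().
gateway_head, _ = split_protected_head([user_1, assistant_1], 0)
```

With protect_first_n=0 the CLI path keeps only the system prompt and the
gateway path keeps nothing, so the first user turn is no longer pinned.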

Default chosen as 2 (not 3) so that the effective protected head count
remains 3 messages in the common case — assuming a system prompt is
present, default protection becomes system + 2 non-system = 3 total,
matching the pre-feature behaviour where `protect_first_n` was hardcoded
to protect 3 messages total. Sessions without a system prompt will see a
small behaviour change (2 protected head messages instead of 3), but this
is the rare path and the new semantics make the system-prompt-present
case the well-defined one.

Changes:

- agent/context_compressor.py: redefine protect_first_n as the count of
  non-system head messages protected beyond the implicit system-prompt
  guarantee; both paths converge. Constructor default updated to 2.
- hermes_cli/config.py: add `compression.protect_first_n` default (2),
  matching the new semantics. `show_config` label tweaked to
  'Protect first: N non-system head messages' for clarity.
- run_agent.py: read protect_first_n from config; 0 is now valid (system
  prompt is always implicitly protected).
- cli-config.yaml.example: document the new key and rationale.
- tests/agent/test_context_compressor.py: cover default, override, the
  end-to-end `protect_first_n=0` and `protect_first_n=1` behaviour,
  the no-system-prompt (gateway) path, and the new shared-semantics
  regression test.

Fixes #13751
Tested on Ubuntu 24.04.
2026-05-13 22:25:16 -07:00
teknium1 ffbc21100d chore(release): map jake@nousresearch.com → simpolism 2026-05-13 22:21:43 -07:00
snav d863773c81 feat(discord): add thread_require_mention for multi-bot threads
By default, once Hermes participates in a Discord thread (auto-created on
@mention or replied in once) it auto-responds to every subsequent message
in that thread without requiring further @mentions. That's the right default
for one-on-one conversations and isolated channel threads.

But it's a confirmed footgun in multi-bot threads. When a user invokes one
bot per turn — addressing Codex first, then Hermes — every other bot in the
thread also fires on every message, burning credits and spamming the channel.
Author has hit this personally in active multi-bot research-team threads.

Add a new `discord.thread_require_mention` config key (env:
`DISCORD_THREAD_REQUIRE_MENTION`), default `false` to preserve existing
behavior. When `true`, the in-thread mention shortcut is disabled and
threads are gated the same way channels are. Explicit @mentions still pass
through as expected.

Mirrors the existing helper shape (config.extra > env > default) and the
existing yaml→env bridge pattern used by `require_mention`.
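
A sketch of the config.extra > env > default lookup and the resulting
gate. The function names, the argument passing and the set of accepted
truthy strings are assumptions; only the key name, the env var name and
the precedence order come from this message.

```python
import os

_TRUTHY = {"1", "true", "yes", "on"}  # assumed parsing rule


def thread_require_mention(extra, environ=None):
    """Resolve the flag with precedence config.extra > env > default (False)."""
    environ = os.environ if environ is None else environ
    value = extra.get("thread_require_mention")
    if value is None:
        value = environ.get("DISCORD_THREAD_REQUIRE_MENTION")
    if value is None:
        return False
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in _TRUTHY


def should_respond(*, mentioned, in_bot_thread, extra, environ=None):
    # The in-thread shortcut only applies when the flag is off.
    thread_shortcut = in_bot_thread and not thread_require_mention(extra, environ)
    return mentioned or thread_shortcut
```

An explicit @mention always passes, which matches the behaviour described
above for threads with the flag enabled.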

Changes:

- gateway/platforms/discord.py: new `_discord_thread_require_mention()`
  helper; the in_bot_thread shortcut is now ANDed with `not _discord_thread_require_mention()`
- gateway/config.py: bridge `discord.thread_require_mention` from config.yaml
  to `DISCORD_THREAD_REQUIRE_MENTION` env var (mirrors the existing
  `require_mention` bridge two lines above)
- hermes_cli/config.py: add `thread_require_mention: False` default to
  DEFAULT_CONFIG['discord']
- tests/gateway/test_discord_free_response.py: 4 new tests covering default
  behaviour (in-thread shortcut still works), enabled behaviour (mention
  required in threads), enabled+mentioned (mention still passes through),
  and yaml-via-config.extra path. Also clears DISCORD_* env vars in the
  `adapter` fixture so process-env state from the contributor's shell
  doesn't leak into per-test behaviour.
- tests/gateway/test_config.py: 2 new tests covering the yaml→env bridge
  (both the apply-from-yaml and env-precedence-over-yaml paths)
- website/docs/user-guide/messaging/discord.md: document the new env var
  + config key with multi-bot rationale; cross-link from `auto_thread`
  section

Tested on Ubuntu 24.04.
2026-05-13 22:21:43 -07:00
simpolism d557544560 fix(discord): keep free-response channels inline
Free-response channels are intended as lightweight chat surfaces — the bot
responds to every message without requiring an @mention. But the auto-thread
gate only checked DISCORD_NO_THREAD_CHANNELS, not DISCORD_FREE_RESPONSE_CHANNELS,
so every message in a free-response channel still spawned a brand-new thread.
That turns a chat channel into a thread-spawning machine: 1 thread per message.

The user-facing docs at website/docs/user-guide/messaging/discord.md already
describe the intended behavior ("Free-response channels also skip auto-threading
— the bot replies inline rather than spinning off a new thread per message"),
so this is a code-vs-docs gap, not a design change.

Fix: OR is_free_channel into skip_thread alongside the existing no_thread_channels
check. One-line production change.

Regression test added at tests/gateway/test_discord_free_response.py:
test_discord_free_response_channel_skips_auto_thread asserts that a message
in a free-response channel never calls _auto_create_thread. Reverting the
one-line fix causes the test to fail with 'Expected mock to not have been
awaited. Awaited 1 times.' — i.e. the test demonstrates the bug concretely.
2026-05-13 22:21:18 -07:00
kshitijk4poor 3633c8690b refactor(plugins): add apply_yaml_config_fn registry hook
Lets platform plugins own their YAML→env config bridge instead of forcing
core gateway/config.py to know every platform's schema.

The hook receives the full parsed config.yaml and the platform's own
sub-dict, may mutate os.environ (env > YAML precedence preserved via the
standard `not os.getenv(...)` guards), and may return a dict to merge
into PlatformConfig.extra. It runs during load_gateway_config() after
the existing generic shared-key loop and before _apply_env_overrides(),
mirroring the env_enablement_fn dispatch pattern (#21306, #21331).

Pure addition — no behavior change for existing platforms. Each of the
eight platforms with hardcoded YAML→env blocks today (discord, telegram,
whatsapp, slack, dingtalk, mattermost, matrix, feishu, ~252 LOC in
gateway/config.py) can migrate in independent follow-up PRs; the
hardcoded blocks remain functional in the meantime, and their
`not os.getenv(...)` guards make them no-ops for any env var the hook
already set.
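
A hook of this kind might look like the sketch below. The platform, key
and env var names (example, require_mention, EXAMPLE_REQUIRE_MENTION) are
hypothetical, and the two-argument signature is inferred from the
description above rather than copied from the registry code.

```python
import os


def example_apply_yaml_config(full_config, platform_section):
    """Bridge one YAML key to an env var and return extras to merge."""
    if not isinstance(platform_section, dict):
        return {}
    value = platform_section.get("require_mention")
    if value is None:
        return {}
    # env > YAML: only fill the variable when nothing is set already.
    if not os.getenv("EXAMPLE_REQUIRE_MENTION"):
        os.environ["EXAMPLE_REQUIRE_MENTION"] = str(value).lower()
    return {"require_mention": value}
```

The `not os.getenv(...)` guard is what keeps a later hardcoded block from
overwriting a value the hook already set, and vice versa.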

Test coverage: 10 new tests in tests/gateway/test_platform_registry.py
covering field default, callable acceptance, env mutation, extras
merge, both signature args, exception swallowing, missing/non-dict
sections, and env > YAML precedence.

Refs #3823, #24356.
Closes #24836.
2026-05-13 22:20:30 -07:00
Teknium d5775fe988 feat(codex-runtime): skip unavailable plugins during migration (#25437)
Followup to PR #24182 — caught when scanning OpenClaw for recent codex
fixes we hadn't considered. OpenClaw learned the hard way (#80815) that
migrating plugins which codex itself reports as unavailable produces
config that fails at activation time.

Our /codex-runtime codex_app_server enable path queries codex's
plugin/list and migrates everything where installed=true. We were
trusting codex's installation state and ignoring its availability
field. So a plugin that's installed=true but availability=UNAVAILABLE
(broken local install) or REQUIRES_AUTH (OAuth expired or never
completed) would get a [plugins."<n>@openai-curated"] entry in
~/.codex/config.toml — and the user's first codex turn after enabling
the runtime would fail because codex refuses to activate it.

Fix: filter on availability in _query_codex_plugins(). Only emit
plugins where availability is empty (older codex versions without the
field — preserve backward compat) or explicitly AVAILABLE.

Tests:
  test_plugin_discovery_skips_unavailable_plugins — verifies 4 cases:
    - good-plugin (installed=True, availability=AVAILABLE) → migrated
    - broken-plugin (installed=True, availability=UNAVAILABLE) → skipped
    - auth-pending (installed=True, availability=REQUIRES_AUTH) → skipped
    - legacy-plugin (installed=True, no availability field) → migrated
      (older codex versions; preserve backward compat)
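
The four cases can be sketched as a filter over plugin records. The
record shape (plain dicts with name, installed, availability) and the
function name are assumptions about what plugin/list returns.

```python
def select_migratable(plugins):
    """Keep installed plugins whose availability is missing, empty, or AVAILABLE."""
    selected = []
    for plugin in plugins:
        if not plugin.get("installed"):
            continue
        availability = plugin.get("availability") or ""
        if availability in ("", "AVAILABLE"):
            selected.append(plugin["name"])
    return selected


plugins = [
    {"name": "good-plugin", "installed": True, "availability": "AVAILABLE"},
    {"name": "broken-plugin", "installed": True, "availability": "UNAVAILABLE"},
    {"name": "auth-pending", "installed": True, "availability": "REQUIRES_AUTH"},
    {"name": "legacy-plugin", "installed": True},  # older codex: no field
]
migrated = select_migratable(plugins)
```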

Docs:
  Added bullet to 'What's NOT migrated' list in the docs page calling
  out the availability filter and why.

Other OpenClaw codex PRs I reviewed but did NOT apply (with reasoning):
  - #81591 (load Codex for selectable models): we resolve runtime
    per-call already, no startup-time gating to fix
  - #81510 (cron compatibility): we documented cron as untested; their
    fix is for OpenClaw-specific cron orchestration shape
  - #81223 (rotate incompatible context-engine threads): we don't
    have a Lossless context engine equivalent
  - #80688 (constrain sandbox): we don't have an outer-sandbox concept
  - #80616 (release on turn_aborted): we already handle status=
    interrupted in turn/completed correctly
  - #80278 (expose activeModel in plugin SDK): not our surface
  - #80792 (default destructive_actions on): we don't expose that knob

56 codex-runtime migration tests still green (+1 new).
2026-05-13 22:20:27 -07:00
Teknium f7ad2f1115 feat(dashboard): hide token/cost analytics behind config flag (default off) (#25438)
The Analytics page and the token/cost surfaces on the Models page show
local debug estimates only. They count input+output (and a bar viz adds
cache_read+reasoning, missing cache_write entirely) from successful
main-agent responses that returned a usable usage block.

Excluded silently:
- All auxiliary calls — context compression, title generation, vision,
  session search, web extract, smart approvals, MCP routing, plugin LLM
  access (13 production call sites bypass update_token_counts)
- Provider-side retries, fallback attempts
- Any call whose usage block didn't come back
- cache_write_tokens (column exists in sessions table but not returned
  by /api/analytics/models)

Real-world impact: a user on Kimi K2.6 saw 150K local vs 27M on the
OpenRouter side over the same window. Precise-looking numbers next to
provider billing create false confidence and support load.

This change adds dashboard.show_token_analytics (default False) to gate:
- The Analytics nav item (hidden from sidebar when off)
- The Analytics page (renders an explanation card instead of charts)
- Token bars, totals, cost figures, avg/api_calls on the Models page

The Models page keeps capability metadata (context window, vision,
tools, reasoning), the use-as-main/aux menu, sessions count, and
last-used timestamps when the flag is off.

Set dashboard.show_token_analytics: true in config.yaml to opt back in
to the local debug estimate. Fixing the underlying accounting (issue
#23270) is a separate, larger workstream.

Refs: #23270, #21705
2026-05-13 22:20:25 -07:00
snav e90508103c chore(release): map jake@nousresearch.com and simpolism@gmail.com to @simpolism
Both addresses route to the same GitHub account (@simpolism / snav). Adding
the mappings here keeps release notes from showing two separate contributors
for what is one person's work, and unblocks subsequent PRs from this account
that would otherwise each need their own scripts/release.py noise.
2026-05-13 22:17:13 -07:00
teknium1 8c6b0c9ecd test(memory): cover cache-parity + runtime whitelist on background review fork
- test_background_review_does_not_narrow_toolset_schema: review fork must
  NOT pass enabled_toolsets to AIAgent (full parent schema = matching
  Anthropic cache key on the 'tools' field).
- test_background_review_installs_thread_local_whitelist: the runtime
  whitelist that replaces schema-level narrowing must contain memory +
  skills tools and exclude terminal / send_message / delegate_task /
  web_search / execute_code.
- test_review_fork_inherits_parent_cached_system_prompt: new test for
  PR #17276's first root cause — the fork's _cached_system_prompt must
  equal the parent's byte-for-byte.
- test_review_fork_pins_session_start_and_session_id: defensive belt-and-
  suspenders for the cached-prompt inheritance.

Inverted the original test_background_review_agent_uses_restricted_toolsets
(which asserted the schema-level narrowing) — that narrowing was the
direct cause of #25322's cache miss, and the runtime whitelist replaces
its safety claim without breaking cache parity.

Refs #25322, #15204, PR #17276.
2026-05-13 22:12:47 -07:00
teknium1 07349ce4df fix(memory): pin session_start + session_id on background review fork
Belt-and-suspenders complement to the cached-system-prompt inheritance:
pin session_start and session_id to the parent's so any code path that
re-renders parts of the system prompt (compression, plugin hooks)
still produces byte-identical output. The cached-prompt assignment
already short-circuits the normal rebuild path, but these pins
guarantee parity even if a future code path bypasses the cache.

Idea from simpolism's reference PR #25427 for #25322.

Co-Authored-By: simpolism <32201324+simpolism@users.noreply.github.com>
2026-05-13 22:12:47 -07:00
teknium1 95d074cdb2 chore(release): map WorldWriter for PR #17276 salvage 2026-05-13 22:12:47 -07:00
WorldWriter 5fe0672260 fix(memory): hit prefix cache in background review fork
Background review fork is supposed to hit Anthropic's prefix cache on the
parent's messages_snapshot, but currently doesn't (cache_read=0 on every
fork). Two root causes, fixed in this commit:

1. System prompt is rebuilt at fork time. _cached_system_prompt starts as
   None, so run_conversation calls _build_system_prompt, which embeds a
   minute-precision "Conversation started: ..." timestamp. Reviews fire
   10+ turns after session start, so the minute differs from main's,
   producing a 1-character diff that invalidates the byte-exact cache key.
   Fix: inherit the parent's _cached_system_prompt directly (same idea as
   #17089, which was self-closed for only fixing this half).

2. Tools schema was narrowed via enabled_toolsets=["memory","skills"] for
   safety. Anthropic's cache key includes `tools`, which sits before
   `system` in the cache hierarchy, so even byte-identical `system` won't
   hit when `tools` differs from main's full set.
   Fix: drop the schema-level restriction so `tools` matches main, and
   deny non-whitelisted tools at runtime via the existing
   get_pre_tool_call_block_message gate (hermes_cli/plugins.py:1085,
   already called at all three dispatch sites). Install/clear a thread-
   local whitelist (added in the previous commit) on the daemon thread.
   Append a soft constraint to the review prompt so the model knows.
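
Root cause 1 can be reproduced in miniature. The prompt text, class and
function names below are illustrative only; the point is that a rebuilt
minute-precision timestamp changes the bytes, while an inherited cached
prompt does not.

```python
from datetime import datetime, timedelta


def build_system_prompt(now):
    # Minute-precision timestamp, as described for the real prompt builder.
    return "You are an agent.\nConversation started: " + now.strftime("%Y-%m-%d %H:%M")


class Agent:
    def __init__(self, cached_system_prompt=None):
        self._cached_system_prompt = cached_system_prompt

    def system_prompt(self, now):
        # Rebuild only when nothing is cached.
        if self._cached_system_prompt is None:
            self._cached_system_prompt = build_system_prompt(now)
        return self._cached_system_prompt


session_start = datetime(2026, 5, 13, 22, 0)
review_time = session_start + timedelta(minutes=12)

parent = Agent()
parent_prompt = parent.system_prompt(session_start)

rebuilt_fork = Agent()  # old behaviour: starts with no cached prompt
inheriting_fork = Agent(cached_system_prompt=parent_prompt)  # the fix
```

Only the inheriting fork produces a prompt that is byte-identical to the
parent's, which is what a byte-exact prefix cache requires.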

Real E2E on Sonnet 4.5 (12-tool task + auto-triggered review):
- Per review-call cost: $0.331 → $0.035 (~89% reduction)
- End-to-end per run:   $0.848 → $0.629 (~26% reduction)
- Review fork cache_create / cache_read: 88,385 / 0  →  1,234 / 94,404

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 22:12:47 -07:00
WorldWriter 3a30c605b3 feat(plugins): add thread-local tool whitelist to pre_tool_call gate
Adds set_thread_tool_whitelist / clear_thread_tool_whitelist to
hermes_cli/plugins.py. When set on the current thread, restricts which
tools can pass through get_pre_tool_call_block_message; non-whitelisted
tools are blocked with a configurable deny message.

Mirrors the per-thread approval-callback pattern already used by
set_approval_callback (tools/terminal_tool.py:190). Used by
_spawn_background_review to deny non-memory/non-skill tools at runtime
while inheriting the parent agent's full tools schema for prefix-cache
parity (see follow-up commit).

Tests cover allow / deny / clear / cross-thread isolation.
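
A minimal sketch of a thread-local whitelist. The three function names
match the ones in this message, but their signatures, the default deny
text and the storage layout are assumptions, and the real gate in
hermes_cli/plugins.py also consults plugin hooks that are omitted here.

```python
import threading

_thread_state = threading.local()


def set_thread_tool_whitelist(allowed, deny_message="Tool is not allowed in this context."):
    """Restrict tool calls on the current thread to the given names."""
    _thread_state.allowed = frozenset(allowed)
    _thread_state.deny_message = deny_message


def clear_thread_tool_whitelist():
    _thread_state.allowed = None


def get_pre_tool_call_block_message(tool_name):
    """Return a deny message when the tool is blocked on this thread, else None."""
    allowed = getattr(_thread_state, "allowed", None)
    if allowed is not None and tool_name not in allowed:
        return _thread_state.deny_message
    return None
```

threading.local gives each thread its own attribute namespace, so a
whitelist installed on the review daemon thread does not affect the main
agent thread.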

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 22:12:47 -07:00
Siddharth Balyan d898e0eb7f fix(gateway): complete lazy-install rebind for slack/feishu/matrix + add ensure_and_bind helper (#25038)
Fixes #25028.

The lazy-install hooks added in #25014 installed packages correctly but
failed to rebind module-level globals after install:

- Slack: missing aiohttp rebind → NameError on file uploads
- Feishu: none of the ~25 lark_oapi symbols rebound → TypeError on
  adapter instantiation
- Matrix: mautrix.types enums stayed as stubs → mismatched values at
  runtime

Introduces tools.lazy_deps.ensure_and_bind() — a DRY helper that
combines ensure() + importer-callable + globals().update(). This
eliminates the error-prone pattern of manually listing every global
that needs updating after lazy-install. Each platform adapter now
defines a single _import() function returning all bindings.
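
The ensure + import + rebind pattern can be sketched as below. This is
not the signature of tools.lazy_deps.ensure_and_bind, which is not shown
here; the sketch function, its arguments and the json stand-in are all
hypothetical.

```python
import importlib


def ensure_and_bind_sketch(ensure, importer, namespace):
    """Run the install step, import the symbols, then rebind them in one update."""
    ensure()
    bindings = importer()
    namespace.update(bindings)
    return bindings


def _ensure_json():
    """Stand-in for the real lazy-install step; json ships with Python."""


def _import():
    # One function returns every binding, so none can be forgotten.
    module = importlib.import_module("json")
    return {"json": module, "dumps": module.dumps}


# Stubs as they would look before the optional dependency is installed.
namespace = {"json": None, "dumps": None}
ensure_and_bind_sketch(_ensure_json, _import, namespace)
```

In an adapter module the namespace argument would be globals(), so every
stubbed name is replaced together.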

Also fixes: pyproject.toml [slack] extra was missing aiohttp (needed
by slack-bolt's async path).
2026-05-14 10:41:46 +05:30
helix4u 52521c937a fix(install): skip browser download when system chromium exists 2026-05-13 22:07:02 -07:00
Teknium 7f08cb5941 fix(tts): align MiniMax TTS defaults with current API and add GroupId support
Follow-up on @pty819's t2a_v2 endpoint fix:

- Default model: speech-02 -> speech-02-hd (bare 'speech-02' is not in the
  supported enum; t2a_v2 rejects it with 400). Official enum: speech-01-hd,
  speech-01-turbo, speech-02-hd, speech-02-turbo, speech-2.6-hd/turbo,
  speech-2.8-hd/turbo.
- Default voice: female-shaonv -> English_expressive_narrator. The
  legacy speech-01-series short ID doesn't resolve cleanly on the
  speech-02+ models that are now the default.
- Default base URL: api.minimaxi.com -> api.minimax.io (matches the
  canonical host in the published docs; api-uw.minimax.io is the
  reduced-latency alt).
- Add GroupId support via tts.minimax.group_id config or MINIMAX_GROUP_ID
  env var. Some MiniMax accounts scope TTS requests by group; without it,
  requests fail with a 401. The GroupId is only appended when not already in the user's base_url.
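
One way to append the group id only when it is absent, sketched with the
standard library. Treating GroupId as a query parameter is an assumption
based on the wording above, and the function name is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_group_id(base_url, group_id):
    """Append GroupId as a query parameter unless the URL already carries one."""
    if not group_id:
        return base_url
    parts = urlsplit(base_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if any(key == "GroupId" for key, _ in query):
        return base_url
    query.append(("GroupId", group_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


url = with_group_id("https://api.minimax.io/v1/t2a_v2", "123")
```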

Tests rewritten to cover both the default t2a_v2 path (hex-encoded audio
in JSON, nested voice_setting/audio_setting) and the legacy
text_to_speech path (raw audio bytes, flat payload). Adds coverage for
GroupId config/env wiring and error surfacing.

Also adds AUTHOR_MAP entry for pty819's GitHub-noreply email.
2026-05-13 22:04:28 -07:00
pty819 c875c0dc11 fix(tts): update MiniMax default model to speech-02 and correct API endpoint
The MiniMax TTS defaults were outdated:
- DEFAULT_MINIMAX_MODEL was 'speech-01' but MiniMax now uses 'speech-02'
- DEFAULT_MINIMAX_BASE_URL was 'https://api.minimax.chat/v1/text_to_speech'
  which no longer works; the correct endpoint is
  'https://api.minimaxi.com/v1/t2a_v2'

Users who configured tts.provider: minimax were getting model-not-supported
errors because the hardcoded defaults did not match available API permissions.
2026-05-13 22:04:28 -07:00
Teknium 6122a79aab feat(slack): support !cmd as alternate prefix for slash commands in threads (#25355)
Slack blocks native slash commands inside thread replies at the platform
level ("/queue is not supported in threads. Sorry!"), and there is no
app-side setting to re-enable them. As a workaround, rewrite a leading
'!' to '/' for any known
gateway command before downstream processing — so '!queue', '!stop',
'!model gpt-5.4' etc. work inside Slack threads (and anywhere else).

Only the first token is checked against is_gateway_known_command(), so
casual messages like '!nice work' pass through to the agent unchanged.
Downstream pipeline (MessageType.COMMAND tagging, gateway dispatcher,
thread reply routing) is unchanged.
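
The rewrite rule can be sketched as follows. The command set and the
handling of an '@bot' suffix are assumptions; is_gateway_known_command is
stubbed here and is not the real implementation.

```python
_KNOWN_COMMANDS = {"queue", "stop", "model"}  # illustrative subset


def is_gateway_known_command(name):
    # Stub standing in for the real lookup.
    return name in _KNOWN_COMMANDS


def rewrite_bang_command(text):
    """Turn a leading '!' into '/' when the first token is a known command."""
    if not text.startswith("!"):
        return text
    first_token = text[1:].partition(" ")[0]
    # Assumed: a trailing '@bot' is ignored when looking up the command.
    name = first_token.split("@", 1)[0].lower()
    if not is_gateway_known_command(name):
        return text
    return "/" + text[1:]
```

Checking only the first token is what lets casual messages that happen to
start with '!' reach the agent unchanged.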

Adds 6 tests covering rewrite, args preservation, thread routing,
casual-message passthrough, '@bot' suffix, and plain '/' still-works.
2026-05-13 18:58:14 -07:00
Teknium 3f13d78088 perf(tools): cache get_nous_auth_status() and load_env() to fix slow hermes tools menus (#25341)
`hermes tools` -> "All Platforms" took ~14s to render the checklist
because building the toolset labels called `get_nous_auth_status()` ~31x
transitively (`_toolset_has_keys` -> `_visible_providers` ->
`get_nous_subscription_features` -> `managed_nous_tools_enabled`).
Each call did a synchronous OAuth refresh POST to
portal.nousresearch.com (~350ms even on the failure path), so one menu
paint burned >13s of HTTP and 31 single-use Nous refresh tokens.

Secondary hot spot: every `get_env_value()` re-read and re-sanitised
the entire .env file. 116 reads with O(lines x known-keys) scanning
added ~300ms of CPU per render.

Fix is two process-level caches, both mtime-keyed so login/logout/edit
invalidate naturally:

* `hermes_cli/auth.py`: memoise `get_nous_auth_status()` for 15s keyed
  on auth.json mtime. Splits `_compute_nous_auth_status()` as the
  uncached impl. Adds `invalidate_nous_auth_status_cache()`.
* `hermes_cli/config.py`: memoise `load_env()` keyed on .env
  (path, mtime, size). Adds `invalidate_env_cache()`, wired into
  `save_env_value`, `remove_env_value`, and the sanitize-on-load
  writer so writers don't return stale dicts on same-second writes.
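
An mtime-keyed cache with explicit invalidation can be sketched like
this. The parser is deliberately simplified, the cache key uses
(mtime_ns, size) per path, and the function bodies are illustrative
rather than the code in hermes_cli/config.py. The sketch assumes the file
exists.

```python
import os

_env_cache = {}  # path -> ((mtime_ns, size), parsed values)


def _signature(path):
    stat = os.stat(path)
    return (stat.st_mtime_ns, stat.st_size)


def load_env(path):
    """Parse KEY=VALUE lines, reusing the cached dict while the file is unchanged."""
    signature = _signature(path)
    cached = _env_cache.get(path)
    if cached is not None and cached[0] == signature:
        return cached[1]
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                values[key.strip()] = value.strip()
    _env_cache[path] = (signature, values)
    return values


def invalidate_env_cache():
    # Called by writers so a same-second edit cannot be served stale.
    _env_cache.clear()
```

The explicit invalidation matters because two writes within the same
timestamp granularity and of equal size would otherwise share a
signature.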

Before/after on Teknium's box (real HERMES_HOME, no Nous login):

* "All Platforms" cold path: ~13,874ms -> ~691ms label-build
* Warm re-open within the same process: ~122ms -> ~17ms

Side benefit: stops burning a Nous refresh token on every menu paint,
which was risking the portal's reuse-detection revocation logic.
2026-05-13 18:40:14 -07:00
Stephen Schoettler 3c106c89a1 test(ci): stabilize shared optional dependency baselines 2026-05-13 17:32:22 -07:00
Teknium dd5a9502e3 fix(tools-config): write video_gen.provider on Reconfigure tool path (#25307)
`_reconfigure_provider()` handled `image_gen_plugin_name` in both
branches (no-env-vars early return and post-env-vars) but never mirrored
the same handling for `video_gen_plugin_name`. The first-time
`_configure_provider()` path correctly routes to
`_select_plugin_video_gen_provider()`; reconfigure forgot to.

Repro:
1. Enable video_gen in `hermes tools` → Configure for All Platforms.
2. Go back into `hermes tools` → Reconfigure tool → Video Generation.
3. Pick xAI (with XAI_API_KEY already set).
4. Hit Enter at the "keep current key?" prompt.

Expected: `video_gen.provider: xai` written to config.yaml.
Actual: function returns silently; no `video_gen:` block ever written;
`video_generate` tool fails with "No video generation backend is
configured."

Fix: add the missing `video_gen_plugin_name` branch in both code paths
of `_reconfigure_provider()`, mirroring the existing
`image_gen_plugin_name` handling and the first-time configure logic.

Tests: `tests/hermes_cli/test_video_gen_picker.py` covers both branches
(env-vars-set keep-current and no-env-vars paths).
2026-05-13 17:31:54 -07:00
Teknium ef98e3f9e6 docs: close in-tree memory plugins to new PRs and codify skill standards (#25302)
AGENTS.md and CONTRIBUTING.md both now state:

1. No new memory providers in the repo. The set under plugins/memory/
   (honcho, mem0, supermemory, byterover, hindsight, holographic,
   openviking, retaindb) is closed. New backends ship as standalone
   plugin repos that users install into ~/.hermes/plugins/ via the
   same MemoryProvider ABC, discovery path, and hermes memory setup
   integration. PRs adding a new plugins/memory/<name>/ directory get
   closed with a pointer to publish as their own repo.

2. Skill authoring standards (hardline) — applies to all new or
   modernized skills (bundled, optional, contributed):
   - description <= 60 chars, one sentence, ends with period, no
     marketing words, no name repetition (verification snippet
     included)
   - tools referenced in SKILL.md prose must be native Hermes tools
     or MCP servers the skill expects — no grep/cat/sed/find etc.
     when search_files/read_file/patch already cover them
   - platforms: gating audited against actual POSIX-only primitives
   - author credits the human contributor first, not 'Hermes Agent'
   - SKILL.md uses modern section order with line targets
   - scripts/references/templates layout for non-trivial logic
   - tests at tests/skills/test_<skill>_skill.py, stdlib + mock only
   - .env.example edits isolated to a delimited block

CONTRIBUTING.md includes a good/bad description example and a
'don't say / say' table mapping shell utilities to native tools.
AGENTS.md points the agent at references/new-skill-pr-salvage.md
for the full salvage checklist.
2026-05-13 17:19:50 -07:00
teknium1 66c70966cd chore(skills/evm): tighten SKILL.md to modern format
- description ≤60 chars (was 346)
- platforms: [linux, macos, windows] — script is pure stdlib (urllib, json, argparse), no POSIX-only primitives
- author: credit @Mibayy + @youssefea + @ethernet8023 + Hermes Agent (was just Mibayy)
- regenerated auto-gen docs page
2026-05-13 17:18:39 -07:00
ethernet e3fc081499 feat(skills): merge blockchain/base into blockchain/evm; salvage PR #2010
Salvages the closed PR #2010 (Mibayy's EVM multi-chain skill) and folds the
existing optional-skills/blockchain/base/ skill into it, so we ship one
unified EVM skill instead of two overlapping ones.

Pulled in from base/:
  - 8 missing Base-specific tokens (AERO, DEGEN, TOSHI, BRETT, WELL,
    cbETH, cbBTC, wstETH, rETH) added to KNOWN_TOKENS['base'] —
    base/ had 11, evm/ only had 3 (USDC/DAI/WETH).
  - L1 data-fee pitfall note for rollups (Base, Arbitrum, Optimism, zkSync).
  - Batch-size chunking in rpc_batch (Base RPC caps batches at 10 calls
    per JSON-RPC request; adding more known tokens tripped that limit
    and broke 'wallet --chain base' with a 'list index out of range'
    error). Ported the chunking pattern from base/_rpc_batch_chunk.
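
The chunking pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code ported from base/_rpc_batch_chunk: `chunk_calls`, `rpc_batch_chunked` and the `send` callable are hypothetical names, and the cap of 10 is the Base RPC limit quoted above.

```python
from typing import Callable, Iterator, List

MAX_BATCH = 10  # Base RPC cap per JSON-RPC batch request, as noted above


def chunk_calls(calls: List[dict], size: int = MAX_BATCH) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` calls."""
    for start in range(0, len(calls), size):
        yield calls[start:start + size]


def rpc_batch_chunked(calls: List[dict],
                      send: Callable[[List[dict]], List[dict]]) -> List[dict]:
    """Send `calls` in chunks and concatenate the responses in order.

    `send` is a hypothetical callable that posts one JSON-RPC batch and
    returns its list of responses.
    """
    results: List[dict] = []
    for chunk in chunk_calls(calls):
        results.extend(send(chunk))
    return results
```

Keeping the chunk size in one constant means adding more known tokens only changes how many requests are sent, not whether a single batch exceeds the cap.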

Latent bugs found and fixed while smoke-testing the merge:
  - cmd_multichain and cmd_allowance both iterated KNOWN_TOKENS[chain]
    with 'for contract, (symbol, _name) in known.items()' — but the dict
    shape is {symbol: contract_str}, not {addr: (sym, name)}. This raised
    'too many values to unpack (expected 2)' on every non-zero balance.
    Now iterates as 'for symbol, contract in known.items()'.
  - Input validation: added is_valid_address / is_valid_txhash /
    require_address / require_txhash helpers and wired them into
    cmd_wallet, cmd_tx, cmd_token, cmd_activity, cmd_allowance,
    cmd_decode, cmd_contract, cmd_multichain. Fails fast with exit 2
    on malformed input instead of burning an RPC round-trip on garbage.
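
A rough sketch of the two fixes above, assuming the dict shape {symbol: contract_str} and the usual 20-byte address / 32-byte tx hash hex encodings. The helper names come from this commit message, but their bodies, the placeholder addresses and `token_contracts` are illustrative assumptions, not the shipped code.

```python
import re
import sys

# Shape described above: {symbol: contract_str}. Addresses are placeholders.
KNOWN_TOKENS = {
    "base": {
        "USDC": "0x" + "11" * 20,
        "WETH": "0x" + "22" * 20,
    }
}

_ADDRESS_RE = re.compile(r"^0x[0-9a-fA-F]{40}$")  # 20 bytes, hex-encoded
_TXHASH_RE = re.compile(r"^0x[0-9a-fA-F]{64}$")   # 32 bytes, hex-encoded


def is_valid_address(value) -> bool:
    return isinstance(value, str) and bool(_ADDRESS_RE.match(value))


def is_valid_txhash(value) -> bool:
    return isinstance(value, str) and bool(_TXHASH_RE.match(value))


def require_address(value):
    """Fail fast with exit code 2 before any RPC call is made."""
    if not is_valid_address(value):
        print(f"error: invalid address: {value!r}", file=sys.stderr)
        sys.exit(2)
    return value


def token_contracts(chain: str):
    """Iterate the dict as (symbol, contract), matching its real shape."""
    return [(symbol, contract) for symbol, contract in KNOWN_TOKENS[chain].items()]
```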

Documentation:
  - SKILL.md now flags that this skill supersedes optional-skills/blockchain/base.
  - Pitfalls expanded for ENS (single-endpoint dependency on
    ensideas.com), tx decoding (single-endpoint dependency on
    4byte.directory), and rollup L1 fees.
  - Regenerated website/docs/user-guide/skills/optional/blockchain/
    blockchain-evm.md and removed the old blockchain-base.md page;
    catalog updated.

Removed:
  - optional-skills/blockchain/base/SKILL.md
  - optional-skills/blockchain/base/scripts/base_client.py
  - website/docs/user-guide/skills/optional/blockchain/blockchain-base.md

Smoke-tested live against Base mainnet: stats, price, token, wallet
(vitalik.eth — 3.12 ETH + 13.88 USDC + 4.23 DAI + 0.06 WETH on Base)
and allowance (ethereum, 7 unlimited approvals to Uniswap/Permit2).

Original PR #2010 author: Mibayy.
Original base/ skill author: youssefea.
2026-05-13 17:18:39 -07:00
Mibayy aa1e2edd35 feat: add EVM multi-chain skill (8 chains, 14 commands)
Adds a comprehensive EVM blockchain skill with 14 commands:
- stats, wallet, tx, token, activity, gas, price (core queries)
- compare: gas + prices across all 8 chains simultaneously
- whale: scan recent blocks for large transfers (configurable min USD)
- multichain: scan same wallet across all 8 chains in parallel
- allowance: check dangerous ERC-20 approvals (Permit2, Uniswap, 1inch...)
- decode: decode tx input data via 4byte.directory
- ens: resolve ENS names <-> addresses (bidirectional)
- contract: inspect contracts (proxy detection, ERC-20/721, bytecode size)

Chains: Ethereum, BNB Chain, Base, Arbitrum One, Polygon, Optimism, Avalanche, zkSync Era

Zero external dependencies. Python stdlib only (urllib, json, argparse, threading).

Co-authored-by: Mibayy <mibay@clawhub.io>
2026-05-13 17:18:39 -07:00
Teknium 091d8e1030 feat(codex-runtime): optional codex app-server runtime for OpenAI/Codex models (#24182)
* feat(codex-runtime): scaffold optional codex app-server runtime

Foundational commit for an opt-in alternate runtime that hands OpenAI/Codex
turns to a 'codex app-server' subprocess instead of Hermes' tool dispatch.
Default behavior is unchanged.

Lands in three pieces:

1. agent/transports/codex_app_server.py — JSON-RPC 2.0 over stdio speaker
   for codex's app-server protocol (codex-rs/app-server). Spawn, init
   handshake, request/response, notification queue, server-initiated
   request queue (for approval round-trips), interrupt-friendly blocking
   reads. Tested against real codex 0.130.0 binary end-to-end during
   development.

2. hermes_cli/runtime_provider.py:
   - Adds 'codex_app_server' to _VALID_API_MODES.
   - Adds _maybe_apply_codex_app_server_runtime() helper, called at the
     end of _resolve_runtime_from_pool_entry(). Inert unless
     'model.openai_runtime: codex_app_server' is set in config.yaml AND
     provider in {openai, openai-codex}. Other providers cannot be
     rerouted (anthropic, openrouter, etc. preserved).

3. tests/agent/transports/test_codex_app_server_runtime.py — 24 tests
   covering api_mode registration, the rewriter helper (default-off,
   case-insensitive, opt-in, non-eligible providers preserved), version
   parser, missing-binary handling, error class. Does NOT require the codex
   CLI to be installed.

This commit is wire-only: the api_mode is recognized but AIAgent does
not yet branch on it. Followup commits add the session adapter, event
projector, approval bridge, transcript projection (so memory/skill
review still works), plugin migration, and slash command.

Existing tests remain green:
- tests/cli/test_cli_provider_resolution.py (29 passed)
- tests/agent/test_credential_pool_routing.py (included above)

* feat(codex-runtime): add codex item projector for memory/skill review

The translator that lets Hermes' self-improvement loop keep working under the
Codex runtime: converts codex 'item/*' notifications into Hermes' standard
{role, content, tool_calls, tool_call_id} message shape that
agent/curator.py already knows how to read.

Item taxonomy (matches codex-rs/app-server-protocol/src/protocol/v2/item.rs):
  - userMessage          → {role: user, content}
  - agentMessage         → {role: assistant, content: text}
  - reasoning            → stashed in next assistant's 'reasoning' field
  - commandExecution     → assistant tool_call(name='exec_command') + tool result
  - fileChange           → assistant tool_call(name='apply_patch') + tool result
  - mcpToolCall          → assistant tool_call(name='mcp.<server>.<tool>') + tool result
  - dynamicToolCall      → assistant tool_call(name=<tool>) + tool result
  - plan/hookPrompt/etc  → opaque assistant note, no fabricated tool_calls

Invariants preserved:
  - Message role alternation never violated: each tool item produces at most
    one assistant + one tool message in that order, correlated by call_id.
  - Streaming deltas (item/<type>/outputDelta, item/agentMessage/delta)
    don't materialize messages — only item/completed does. Mirrors how
    Hermes already only writes the assistant message after streaming ends.
  - Tool call ids are deterministic (codex item id-based) so replays produce
    identical messages and prefix caches stay valid (AGENTS.md pitfall #16).
  - JSON args use sorted_keys for the same reason.
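
The last two invariants can be illustrated with a small sketch. It assumes a simplified commandExecution item with `id`, `command`, `cwd` and `output` fields; those field names, the `codex_` id prefix and the function names are hypothetical and not taken from the projector itself.

```python
import json


def tool_call_id(item_id: str) -> str:
    """Derive the tool call id purely from the codex item id (no randomness)."""
    return f"codex_{item_id}"


def encode_args(args: dict) -> str:
    """Serialize with sorted keys so replays yield byte-identical JSON."""
    return json.dumps(args, sort_keys=True, separators=(",", ":"))


def project_command(item: dict) -> list:
    """Project one completed item into an assistant message plus a tool message."""
    call_id = tool_call_id(item["id"])
    assistant = {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": call_id,
            "type": "function",
            "function": {
                "name": "exec_command",
                "arguments": encode_args({"command": item["command"], "cwd": item["cwd"]}),
            },
        }],
    }
    tool = {"role": "tool", "tool_call_id": call_id, "content": item.get("output", "")}
    return [assistant, tool]
```

Because neither the id nor the argument string depends on insertion order or on a random source, projecting the same item twice gives identical messages.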

Real wire formats verified against codex 0.130.0 by capturing live
notifications from thread/shellCommand and including one as a fixture
(COMMAND_EXEC_COMPLETED).

23 new tests, all green:
  - Streaming deltas don't materialize (3 paths)
  - Turn/thread frame events are silent
  - commandExecution: 5 tests including non-zero exit annotation +
    deterministic id stability across replays
  - agentMessage + reasoning attachment + reasoning consumption
  - fileChange: summary without inlined content
  - mcpToolCall: namespaced naming + error surfacing
  - userMessage: text fragments only (drops images/etc)
  - opaque items: no fabricated tool_calls
  - Helpers: deterministic id stability + sorted JSON args
  - Role alternation invariant across all four tool-shaped item types

This commit is a pure addition. AIAgent integration (the wire that uses the
projector) is the next commit.

* feat(codex-runtime): add session adapter + approval bridge

The third self-contained module: CodexAppServerSession owns one Codex
thread per Hermes session, drives turn/start, consumes streaming
notifications via CodexEventProjector, handles server-initiated approval
requests, and translates cancellation into turn/interrupt.

The adapter has a single public per-turn method:

    result = session.run_turn(user_input='...', turn_timeout=600)
    # result.final_text          → assistant text for the caller
    # result.projected_messages  → list ready to splice into AIAgent.messages
    # result.tool_iterations     → tick count for _iters_since_skill nudge
    # result.interrupted         → True on Ctrl+C / deadline / interrupt
    # result.error               → error string when the turn cannot complete
    # result.turn_id, thread_id  → for sessions DB / resume

Behavior:

  - ensure_started() spawns codex, does the initialize handshake, and
    issues thread/start with cwd + permissions profile. Idempotent.
  - run_turn() blocks until turn/completed, drains server-initiated
    requests (approvals) before reading notifications so codex never
    deadlocks waiting for us, projects every item/completed via the
    projector, and increments tool_iterations for the skill nudge gate.
  - request_interrupt() is thread-safe (threading.Event); the next loop
    iteration issues turn/interrupt and unwinds.
  - turn_timeout deadlock guard issues turn/interrupt and records an
    error if the turn never completes.
  - close() escalates terminate → kill via the underlying client.

Approval bridge:

  Codex emits server-initiated requests for execCommandApproval and
  applyPatchApproval. The adapter translates Hermes' approval choice
  vocabulary onto codex's decision vocabulary:

    Hermes 'once'                → codex 'approved'
    Hermes 'session' or 'always' → codex 'approvedForSession'
    Hermes 'deny' / anything else → codex 'denied'

  Routing precedence:
    1. _ServerRequestRouting.auto_approve_* flags (cron / non-interactive)
    2. approval_callback wired by the CLI (defers to
       tools.approval.prompt_dangerous_approval())
    3. Fail-closed denial when neither is wired

  Unknown server-request methods are answered with JSON-RPC error -32601
  so codex doesn't hang waiting for us.
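
A minimal sketch of the routing precedence and the unknown-method reply described above. The function names and the choice returned on auto-approve ('once') are assumptions for illustration; -32601 is the standard JSON-RPC 2.0 "Method not found" code.

```python
import json
from typing import Callable, Optional

METHOD_NOT_FOUND = -32601  # JSON-RPC 2.0 "Method not found"


def resolve_choice(auto_approve: bool,
                   approval_callback: Optional[Callable[[dict], str]],
                   params: dict) -> str:
    """Return a Hermes approval choice following the precedence above."""
    if auto_approve:
        return "once"  # assumption: auto-approve behaves like a one-off allow
    if approval_callback is not None:
        return approval_callback(params)
    return "deny"  # fail closed when nothing is wired


def unknown_method_response(request_id, method: str) -> str:
    """Build the JSON-RPC error reply for an unrecognized server request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {"code": METHOD_NOT_FOUND, "message": f"Method not found: {method}"},
    })
```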

Permission profile mapping mirrors AGENTS.md:
    Hermes 'auto'              → codex 'workspace-write'
    Hermes 'approval-required' → codex 'read-only-with-approval'
    Hermes 'unrestricted/yolo' → codex 'full-access'

20 new tests, all green. Combined with prior commits this PR now has
67 tests across three modules:
  - test_codex_app_server_runtime.py: 24 (api_mode + transport surface)
  - test_codex_event_projector.py: 23 (item taxonomy projections)
  - test_codex_app_server_session.py: 20 (turn loop + approvals + interrupts)

Full tests/agent/transports/ directory: 249/249 pass — no regressions
to existing transport tests.

Still no wire into AIAgent.run_conversation(); that integration commit
is small and goes next.

* feat(codex-runtime): wire codex_app_server runtime into AIAgent

The integration commit. AIAgent.run_conversation() now early-returns to a
new helper _run_codex_app_server_turn() when self.api_mode ==
'codex_app_server', bypassing the chat_completions tool loop entirely.

Three small surgical edits to run_agent.py (~105 LOC total):

1. Line ~1204 (constructor api_mode validation set):
   Add 'codex_app_server' so an explicit api_mode='codex_app_server'
   passed to AIAgent() isn't silently rewritten to 'chat_completions'.

2. Line ~12048 (run_conversation, just before the while loop):
   Early-return to _run_codex_app_server_turn() when self.api_mode is
   'codex_app_server'. Placed AFTER all standard pre-loop setup —
   logging context, session DB, surrogate sanitization, _user_turn_count
   and _turns_since_memory increments, _ext_prefetch_cache, memory
   manager on_turn_start — so behavior outside the model-call loop is
   identical between paths. Default Hermes flow is unchanged when the
   flag is off.

3. End-of-class (line ~15497):
   New method _run_codex_app_server_turn(). Lazy-instantiates one
   CodexAppServerSession per AIAgent (reused across turns), runs the
   turn, splices projected_messages into messages, increments
   _iters_since_skill by tool_iterations (since the chat_completions
   loop normally does that per iteration), fires
   _spawn_background_review on the same cadence as the default path.

Counter accounting:

  _turns_since_memory  ← already incremented at run_conversation:11817
                         (gated on memory store configured) — codex
                         helper does NOT touch it (would double-count).
  _user_turn_count     ← already incremented at run_conversation:11793
                         — codex helper does NOT touch it.
  _iters_since_skill   ← incremented in the chat_completions loop per
                         tool iteration. Codex helper increments by
                         turn.tool_iterations since the loop is bypassed.

User message:

  ALREADY appended to messages by run_conversation pre-loop (line 11823)
  before the early-return reaches us. Helper does NOT append again.
  Regression test test_user_message_not_duplicated guards this.

Approval callback wiring:

  Lazy-fetches tools.terminal_tool._get_approval_callback at session
  spawn time, passes to CodexAppServerSession. CLI threads with
  prompt_toolkit get interactive approvals; gateway/cron contexts get
  the codex-side fail-closed deny.

Error path:

  Codex session exceptions become a 'partial' result with completed=False
  and a final_response that explicitly tells the user how to switch back:
  'Codex app-server turn failed: ... Fall back to default runtime with
  /codex-runtime auto.' Same return-dict shape as the chat_completions
  path so all callers (gateway, CLI, batch_runner, ACP) work unchanged.

9 new integration tests in tests/run_agent/test_codex_app_server_integration.py:
  - api_mode='codex_app_server' is accepted on AIAgent construction
  - run_conversation returns the expected codex shape
    (final_response, codex_thread_id, codex_turn_id, completed, partial)
  - Projected messages are spliced into messages list
  - _iters_since_skill ticks per tool iteration
  - _user_turn_count delegated to standard flow (not double-counted)
  - User message appears exactly once (regression guard)
  - _spawn_background_review IS invoked (memory/skill review keeps working)
  - chat.completions.create is NEVER called (loop fully bypassed)
  - Session exception → partial result with /codex-runtime auto hint
  - Interrupted turn → partial result with error preserved

Adjacent test runs confirm no regressions:
  - tests/run_agent/test_memory_nudge_counter_hydration.py: green
  - tests/run_agent/test_background_review.py: green
  - tests/run_agent/test_fallback_model.py: green
  - tests/agent/transports/: 249/249 green

Still missing for full feature: /codex-runtime slash command, plugin
migration helper, docs page, live e2e test gated on codex binary. Those
are the remaining followup commits.

* feat(codex-runtime): add /codex-runtime slash command (CLI + gateway)

User-facing toggle for the optional codex app-server runtime. Follows the
'Adding a Slash Command (All Platforms)' pattern from AGENTS.md exactly:
single CommandDef in the central registry → CLI handler → gateway handler
→ running-agent guard → all surfaces (autocomplete, /help, Telegram menu,
Slack subcommands) update automatically.

Surface:
    /codex-runtime                    — show current state + codex CLI status
    /codex-runtime auto               — Hermes default runtime
    /codex-runtime codex_app_server   — codex subprocess runtime
    /codex-runtime on / off           — synonyms

Files changed:

  hermes_cli/codex_runtime_switch.py (new):
    Pure-Python state machine shared by CLI and gateway. Parse args,
    read/write model.openai_runtime in the config dict, gate enabling
    behind a codex --version check (don't let users opt in to a runtime
    they have no binary for; print npm install hint instead).
    Returns a CodexRuntimeStatus dataclass that callers render however
    suits their surface.

  hermes_cli/commands.py:
    Single CommandDef entry, no aliases (codex-runtime is its own thing).

  cli.py:
    Dispatch in process_command() + _handle_codex_runtime() handler that
    delegates to the shared module and renders results via _cprint.

  gateway/run.py:
    Dispatch in _handle_message() + _handle_codex_runtime_command() that
    returns a string (gateway sends as message). On a successful change
    that requires a new session, _evict_cached_agent() forces the next
    inbound message to construct a fresh AIAgent with the new api_mode —
    avoids prompt-cache invalidation mid-session.

  gateway/run.py running-agent guard:
    /codex-runtime joins /model in the early-intercept block so a runtime
    flip mid-turn can't split a turn across two transports.

Tests:
  tests/hermes_cli/test_codex_runtime_switch.py — 25 tests covering the
  state machine: arg parsing (10 cases incl. case-insensitive and
  synonyms), reading current runtime (5 cases incl. malformed configs),
  writing runtime (3 cases), apply() entry point covering read-only,
  no-op, codex-missing-blocked, codex-present-success, disable-no-binary-check,
  and persist-failure paths (8 cases). All green.

Adjacent test suites confirm no regressions:
  - tests/hermes_cli/test_commands.py + test_codex_runtime_switch.py:
    167/167 green
  - tests/agent/transports/: 283/283 green when combined with prior commits

Still missing: plugin migration helper, docs page, live e2e test gated on
codex binary. Followup commits.

* feat(codex-runtime): auto-migrate Hermes MCP servers to ~/.codex/config.toml

Translates the user's mcp_servers config from ~/.hermes/config.yaml into
the TOML format codex's MCP client expects. Wired into the
/codex-runtime codex_app_server enable path so users get their MCP tool
surface in the spawned subprocess automatically.

The migration runs on every enable. Failures are non-fatal — the runtime
change still proceeds and the user gets a warning so they can fix the
codex config manually.

What translates (mapping verified against codex-rs/core/src/config/edit.rs):
  Hermes mcp_servers.<n>.command/args/env  → codex stdio transport
  Hermes mcp_servers.<n>.url/headers       → codex streamable_http transport
  Hermes mcp_servers.<n>.timeout           → codex tool_timeout_sec
  Hermes mcp_servers.<n>.connect_timeout   → codex startup_timeout_sec
  Hermes mcp_servers.<n>.cwd               → codex stdio cwd
  Hermes mcp_servers.<n>.enabled: false    → codex enabled = false

What does NOT translate (warned + skipped per server):
  Hermes-specific keys (sampling, etc.) — codex's MCP client has no
  equivalent. Listed in the per-server skipped[] field of the report.
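
The per-server translation above could look roughly like this. It is a sketch under assumptions, not the migration helper itself: `translate_server` is a hypothetical name, the codex-side key for headers (`http_headers`) is assumed, and letting `command` win when both `command` and `url` are present is an illustrative choice.

```python
from typing import List, Tuple

# Hermes key -> codex key, per the mapping listed above.
RENAMED = {"timeout": "tool_timeout_sec", "connect_timeout": "startup_timeout_sec"}
STDIO_KEYS = ("command", "args", "env", "cwd")


def translate_server(cfg: dict) -> Tuple[dict, List[str]]:
    """Return (codex_entry, skipped_keys) for one Hermes mcp_servers entry."""
    entry: dict = {}
    handled = set()

    if "command" in cfg:  # assumption: stdio wins over url
        for key in STDIO_KEYS:
            if key in cfg:
                entry[key] = cfg[key]
                handled.add(key)
    elif "url" in cfg:
        entry["url"] = cfg["url"]
        handled.add("url")
        if "headers" in cfg:
            entry["http_headers"] = cfg["headers"]  # assumed codex key name
            handled.add("headers")

    for src, dst in RENAMED.items():
        if src in cfg:
            entry[dst] = cfg[src]
            handled.add(src)

    if "enabled" in cfg:
        handled.add("enabled")
        if cfg["enabled"] is False:
            entry["enabled"] = False

    # Anything not consumed (e.g. sampling) is reported, not silently dropped.
    skipped = sorted(key for key in cfg if key not in handled)
    return entry, skipped
```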

What's NOT migrated (intentional):
  AGENTS.md — codex respects this file natively in its cwd. Hermes' own
  AGENTS.md (project-level) is already in the worktree, so codex picks
  it up without translation. No code needed.

Idempotency design:
  All managed content lives between a 'managed by hermes-agent' marker
  and the next non-mcp_servers section header. _strip_existing_managed_block
  removes the prior managed region cleanly, preserving any user-added
  codex config (model, providers.openai, sandbox profiles, etc.) above
  or below.

Files added:
  hermes_cli/codex_runtime_plugin_migration.py — pure-Python migration
    helper. Public API: migrate(hermes_config, codex_home=None,
    dry_run=False) returns MigrationReport with .migrated/.errors/
    .skipped_keys_per_server. No external TOML dependency — minimal
    formatter handles strings/numbers/booleans/lists/inline-tables.

  tests/hermes_cli/test_codex_runtime_plugin_migration.py — 39 tests
  covering:
    - per-server translation (12): stdio/http/sse, cwd, timeouts,
      enabled flag, command+url precedence, sampling drop, unknown keys
    - TOML formatter (8): types, escaping, inline tables, error case
    - existing-block stripping (4): no marker, alone, with user content
      above, with user content below
    - end-to-end migrate() (8): empty, dry-run, round-trip, idempotent
      re-run, preserves user config, error reporting, invalid input,
      summary formatting

Files changed:
  hermes_cli/codex_runtime_switch.py — apply() now calls migrate() in
    the codex_app_server enable branch. Migration failure logs a warning
    in the result message but does NOT fail the runtime change. Disable
    path (auto) explicitly skips migration.

  tests/hermes_cli/test_codex_runtime_switch.py — 3 new tests:
    test_enable_triggers_mcp_migration, test_disable_does_not_trigger_migration,
    test_migration_failure_does_not_block_enable.

All 325 feature tests green:
  - tests/agent/transports/: 249 (incl. 67 new)
  - tests/run_agent/test_codex_app_server_integration.py: 9
  - tests/hermes_cli/test_codex_runtime_switch.py: 28 (3 new)
  - tests/hermes_cli/test_codex_runtime_plugin_migration.py: 39 (new)

* perf(codex-runtime): cache codex --version check within apply()

Single /codex-runtime invocation could spawn 'codex --version' up to 3
times (state report, enable gate, success message). Each spawn is ~50ms,
so the cumulative cost wasn't a crisis, but it was wasteful and turned a
trivial slash command into something noticeably laggy on slower systems.

Refactored to lazy-once via a closure over a nonlocal cache. First call
spawns; subsequent calls in the same apply() reuse the result.

Behavior unchanged — same return shape, same error handling, same install
hint when codex is missing. Just one subprocess per call instead of three.
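
The lazy-once pattern described above, sketched with a closure over nonlocal state. The names `make_cached_check` and `codex_version` are hypothetical; only the 'codex --version' invocation comes from this commit message.

```python
import subprocess
from typing import Callable, Optional


def make_cached_check(run: Callable[[], Optional[str]]) -> Callable[[], Optional[str]]:
    """Wrap `run` so it executes at most once per wrapper instance."""
    checked = False
    cached: Optional[str] = None

    def check() -> Optional[str]:
        nonlocal checked, cached
        if not checked:
            cached = run()
            checked = True  # a None result (binary missing) is cached too
        return cached

    return check


def codex_version() -> Optional[str]:
    """Return the 'codex --version' output, or None if it cannot be run."""
    try:
        proc = subprocess.run(["codex", "--version"],
                              capture_output=True, text=True, timeout=5)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return proc.stdout.strip() if proc.returncode == 0 else None
```

A separate `checked` flag is used instead of testing `cached is None`, so a missing binary does not trigger a second spawn.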

Two regression-guard tests added:
  - test_binary_check_cached_within_apply: enable path → call_count == 1
  - test_binary_check_cached_on_read_only_call: state-report path → call_count == 1

Total tests for /codex-runtime now 30 (was 28); all 143 codex-runtime
tests still green.

* fix(codex-runtime): correct protocol field names found via live e2e test

Three real bugs caught only by running a turn end-to-end against codex
0.130.0 with a real ChatGPT subscription. Unit tests passed because they
asserted on our own (incorrect) wire shapes; the wire format from
codex-rs/app-server-protocol/src/protocol/v2/* is the source of truth and
my initial reading of the README was incomplete.

Bug 1: thread/start.permissions wire format

Was sending {"profileId": "workspace-write"}.
Real format per PermissionProfileSelectionParams enum (tagged union):
  {"type": "profile", "id": "workspace-write"}
AND requires the experimentalApi capability declared during initialize.
AND requires a matching [permissions] table in ~/.codex/config.toml or
codex fails the request with 'default_permissions requires a [permissions]
table'.

Fix: stop overriding permissions on thread/start. Codex picks its default
profile (read-only unless user configures otherwise), which matches what
codex CLI users expect — they configure their default permission profile
in ~/.codex/config.toml the standard way. Trying to be clever about
profile selection broke every turn we tested.

Live error before fix: 'Invalid request: missing field type' on every
turn/start, even though our turn/start payload was correct — the field
codex was complaining about was inside the permissions sub-object we
shouldn't have been sending.

Bug 2: server-request method names

Was matching 'execCommandApproval' and 'applyPatchApproval'.
Real names per common.rs ServerRequest enum:
  item/commandExecution/requestApproval
  item/fileChange/requestApproval
  item/permissions/requestApproval (new third method)

Fix: match the documented names. Added handler for
item/permissions/requestApproval that always declines — codex sometimes
asks to escalate permissions mid-turn and silent acceptance would surprise
users.

Live symptom before fix: agent.log showed
'Unknown codex server request: item/commandExecution/requestApproval'
and codex stalled because we replied with -32601 (method not found)
instead of an approval decision. The agent reported back 'The write
command was rejected' even though Hermes never showed the user an
approval prompt.

Bug 3: approval decision values

Was sending decision strings 'approved'/'approvedForSession'/'denied'.
Real values per CommandExecutionApprovalDecision enum (camelCase):
  accept, acceptForSession, decline, cancel
(also AcceptWithExecpolicyAmendment and ApplyNetworkPolicyAmendment
variants we don't currently use).

Fix: rename _approval_choice_to_codex_decision return values; update
auto_approve_* fallbacks; update fail-closed default from 'denied' to
'decline'. Test mapping table updated to match.
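
With the corrected values, the choice mapping reads roughly as below. This is a sketch of the table implied by this commit (Hermes choices from the approval bridge, codex decisions from the enum above), not the actual body of _approval_choice_to_codex_decision.

```python
def approval_choice_to_codex_decision(choice) -> str:
    """Map a Hermes approval choice onto a codex decision string."""
    if choice == "once":
        return "accept"
    if choice in ("session", "always"):
        return "acceptForSession"
    # 'deny' and anything unrecognized fail closed.
    return "decline"
```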

Live test verified after fixes:
  $ hermes (with model.openai_runtime: codex_app_server)
  > Run the shell command: echo hermes-codex-livetest > .../proof.txt
    then read it back

  Approval prompt fired with 'Codex requests exec in <cwd>'.
  User chose 'Allow once'. Codex executed the command, wrote the file,
  read it back. Final response: 'Read back from proof.txt:
  hermes-codex-livetest'. File contents on disk match.

agent.log confirms:
  codex app-server thread started: id=019e200e profile=workspace-write
                                    cwd=/tmp/hermes-codex-livetest/workspace

All 20 session tests still green after wire-format updates.

* fix(codex-runtime): correct apply_patch approval params + ship docs

Live e2e revealed FileChangeRequestApprovalParams doesn't carry the
changeset (just itemId, threadId, turnId, reason, grantRoot) — Codex's
'reason' field describes what the patch wants to do. Test config and
display logic updated to use it. The first 'apply_patch (0 change(s))'
display from the live test is now 'apply_patch: <reason>'.

Adds website/docs/user-guide/features/codex-app-server-runtime.md
covering enable/disable, prerequisites, approval UX, MCP migration
behavior, permission profile delegation to ~/.codex/config.toml, known
limitations, and the architecture diagram. Wired into the Automation
category in sidebars.ts.

Live e2e validation across the path matrix:
  ✓ thread/start handshake
  ✓ turn/start with text input
  ✓ commandExecution items + projection
  ✓ item/commandExecution/requestApproval → Hermes UI → response
  ✓ Approve once → command runs
  ✓ Deny → command rejected, codex falls back to read-only message
  ✓ Multi-turn (codex remembers prior turn's results)
  ✓ apply_patch via Codex's fileChange path
  ✓ item/fileChange/requestApproval → Hermes UI
  ✓ MCP server migration loads inside spawned codex (verified via
    'use the filesystem MCP tool' prompt)
  ✓ /codex-runtime auto → codex_app_server toggle cycle
  ✓ Disable doesn't trigger migration
  ✓ Enable with codex CLI present succeeds + migrates
  ✓ Hermes-side interrupt path (turn/interrupt request issued cleanly
    even if codex finishes before the interrupt lands)

Known live-validated limitations now documented in the docs page:
  - delegate_task subagents unavailable on this runtime
  - permission profile selection delegated to ~/.codex/config.toml
  - apply_patch approval prompt has no inline changeset (codex protocol
    doesn't expose it)

145/145 codex-runtime tests still green.

* feat(codex-runtime): native plugin migration + UX polish (quirks 2/4/5/10/11)

Major: migrate native Codex plugins (#7 in OpenClaw's PR list)

Discovers installed curated plugins via codex's plugin/list RPC and
writes [plugins."<name>@<marketplace>"] entries to ~/.codex/config.toml
so they're enabled in the spawned Codex sessions. This is the
'YouTube-video-worthy' bit Pash highlighted: when a user has
google-calendar, github, etc. installed in their Codex CLI, those
plugins activate automatically when they enable Hermes' codex runtime.

Implementation:
  - hermes_cli/codex_runtime_plugin_migration.py: new _query_codex_plugins()
    helper spawns 'codex app-server' briefly and walks plugin/list. Returns
    (plugins, error) — failures are non-fatal so MCP migration still works.
  - render_codex_toml_section() now takes plugins + permissions args.
  - migrate() defaults: discover_plugins=True, default_permission_profile=
    'workspace-write'. Explicit None on either disables that side.
  - _strip_existing_managed_block() now also strips [plugins.*] and
    [permissions]/[permissions.*] sections inside the managed block, so
    re-runs replace plugins cleanly without touching codex's own config.

Quirk fixes:

#2 Default permissions profile written on enable.
   Without this, Codex's read-only default kicks in and EVERY write
   triggers an approval prompt. Now writes [permissions] default =
   'workspace-write' so the runtime feels normal out of the box. Set
   default_permission_profile=None to opt out.

#4 apply_patch approval prompt now shows what's changing.
   Codex's FileChangeRequestApprovalParams doesn't carry the changeset.
   Session adapter now caches the fileChange item from item/started
   notifications and looks it up by itemId when codex requests approval.
   Prompt shows '1 add, 1 update: /tmp/new.py, /tmp/old.py' instead of
   'apply_patch (0 change(s))'.

   Side benefit: also drains pending notifications BEFORE handling a
   server request, so the projector and per-turn caches are up to date
   when the approval decision fires. Bounded to 8 notifications per
   loop iter to avoid starving codex's response.

#5/#10 Exec approval prompt never shows empty cwd.
   When codex omits cwd in CommandExecutionRequestApprovalParams, fall
   back to the session's cwd. If somehow neither is available, show
   '<unknown>' explicitly instead of an empty string.

   Also surfaces 'reason' from the approval params when codex provides
   it — gives users more context on why codex wants to run something.

#11 Banner indicates the codex_app_server runtime when active.
   New 'Runtime: codex app-server (terminal/file ops/MCP run inside
   codex)' line appears in the welcome banner only when the runtime is
   on. Default banner is unchanged.

Tests:
  - 7 new tests in test_codex_runtime_plugin_migration.py covering
    plugin discovery (mocked), failure handling, dry-run skip, opt-out
    flag, idempotent re-runs, and permissions writing.
  - 3 new tests in test_codex_app_server_session.py covering the
    enriched approval prompts: cwd fallback, change summary on
    apply_patch, fallback when no item/started cache exists.
  - All 26 session tests + 46 migration tests green; 153 total in PR.

* feat(codex-runtime): hermes-tools MCP callback + native plugin migration

The big architectural addition: when codex_app_server runtime is on,
Hermes registers its own tool surface as an MCP server in
~/.codex/config.toml so the codex subprocess can call back into Hermes
for tools codex doesn't ship with — web_search, browser_*, vision,
image_generate, skills, TTS.

Also: 'migrate native codex plugins' (Pash's YouTube-video-worthy bit) —
when the user has plugins like Linear, GitHub, Gmail, Calendar, Canva
installed via 'codex plugin', Hermes discovers them via plugin/list and
writes [plugins.<name>@openai-curated] entries so they activate
automatically.

New module: agent/transports/hermes_tools_mcp_server.py
  FastMCP stdio server exposing 17 Hermes tools. Each call dispatches
  through model_tools.handle_function_call() — same code path as the
  Hermes default runtime. Run with:
    python -m agent.transports.hermes_tools_mcp_server [--verbose]

  Exposed: web_search, web_extract, browser_navigate / _click / _type /
    _press / _snapshot / _scroll / _back / _get_images / _console /
    _vision, vision_analyze, image_generate, skill_view, skills_list,
    text_to_speech.

  NOT exposed (deliberately):
    - terminal/shell/read_file/write_file/patch — codex has built-ins
    - delegate_task/memory/session_search/todo — _AGENT_LOOP_TOOLS in
      model_tools.py:493, require running AIAgent context. Documented
      as a limitation and surfaced in the slash command output.

Migration changes (hermes_cli/codex_runtime_plugin_migration.py):
  - _query_codex_plugins() spawns 'codex app-server' briefly to walk
    plugin/list and pull installed openai-curated plugins. Failures are
    non-fatal — MCP migration still completes.
  - render_codex_toml_section() now takes plugins + permissions args
    AND wraps the managed block with a MIGRATION_END_MARKER comment so
    the stripper can reliably find both ends, even when the block
    contains top-level keys (default_permissions = ...).
  - migrate() defaults: discover_plugins=True, expose_hermes_tools=True,
    default_permission_profile=':workspace' (built-in codex profile name
    — must be prefixed with ':'). All three opt-out via explicit args.
  - _build_hermes_tools_mcp_entry() builds the codex stdio entry with
    HERMES_HOME and PYTHONPATH passthrough so a worktree-launched
    Hermes points the MCP subprocess at the same module layout.

Live-caught wire bugs fixed during this turn:
  1. Permission profile config key is the top-level default_permissions
     key, NOT a [permissions] table. The [permissions] table is
     for *user-defined* profiles with structured fields. Built-in
     profile names start with ':' (':workspace', ':read-only',
     ':danger-no-sandbox'). Was emitting the profile name as a plain
     string under [permissions], which codex rejected with 'invalid
     type: string "X", expected struct PermissionProfileToml'.
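  2. The built-in profile name must match codex's own list exactly
     (':workspace' is the one written here). Codex rejected the name
     emitted at first with 'unknown built-in profile'.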
  3. Codex's MCP layer sends mcpServer/elicitation/request for
     tool-call confirmation. We weren't handling it, so codex stalled
     and returned 'MCP tool call was rejected'. Now: auto-accept for
     our own hermes-tools server (user already opted in by enabling
     the runtime), decline for third-party servers.
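
The elicitation policy from item 3 can be sketched as below. The
function name and the bare string return value are assumptions; the
real handler wraps the decision in whatever response shape codex
expects on the wire:

```python
# Name of the MCP server Hermes registers for itself (from this commit).
OWN_SERVER_NAME = "hermes-tools"


def elicitation_decision(server_name: str) -> str:
    """Auto-accept our own server, decline everything else.

    Hypothetical helper: the user already opted in to hermes-tools by
    enabling the runtime, so only third-party servers are declined.
    """
    return "accept" if server_name == OWN_SERVER_NAME else "decline"
```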

Quirk fixes shipped (from the limitations list):
  #2 default permissions: workspace profile written on enable. No more
     approval prompt on every write.
  #4 apply_patch approval shows what's changing: cache fileChange
     items from item/started, look up by itemId when codex sends
     item/fileChange/requestApproval. Prompt: '1 add, 1 update:
     /tmp/new.py, /tmp/old.py' instead of '0 change(s)'.
  #5/#10 exec approval cwd never empty: fall back to session cwd, then
     '<unknown>'. Also surfaces 'reason' from codex when present.
  #11 banner shows 'Runtime: codex app-server' line when active so
     users understand why tool counts may not match what's reachable.

Tests:
  - 5 new tests in test_codex_runtime_plugin_migration.py covering
    plugin discovery, expose_hermes_tools entry generation, idempotent
    re-runs, opt-out flag, permissions profile.
  - 3 new tests in test_codex_app_server_session.py covering enriched
    approval prompts (cwd fallback, fileChange summary).
  - 2 new tests for mcpServer/elicitation/request handling (accept
    hermes-tools, decline others).
  - New test file test_hermes_tools_mcp_server.py covering module
    surface, EXPOSED_TOOLS safety invariants (no shell/file_ops,
    no agent-loop tools), and main() error paths.
  - 166 codex-runtime tests total, all green.

Live e2e validated against codex 0.130.0 + ChatGPT subscription:
  ✓ /codex-runtime codex_app_server enables, migrates filesystem MCP,
    registers hermes-tools, writes default_permissions = ':workspace'
  ✓ Banner shows 'Runtime: codex app-server' line in subsequent sessions
  ✓ Shell command runs without approval prompt (workspace profile works)
  ✓ Multi-turn — codex remembers prior turn's results
  ✓ apply_patch path via fileChange request approval
  ✓ web_search via hermes-tools MCP callback returns real Firecrawl
    results: 'OpenAI Codex CLI – Getting Started' end-to-end in 13s
  ✓ Disable cycle clean

Docs updated: website/docs/user-guide/features/codex-app-server-runtime.md
  Full re-write covering native plugin migration, the hermes-tools
  callback architecture, the prerequisites change ('codex login is
  separate from hermes auth login codex'), the trade-off table now
  reflecting which Hermes tools work via callback, and the limitations
  list updated with what's actually unavailable on this runtime.

* feat(codex-runtime): pin user-config preservation invariant for quirk #6

Quirk #6 from the limitations list — user MCP servers / overrides /
codex-only sections in ~/.codex/config.toml that live OUTSIDE the
hermes-managed block must survive re-migration verbatim.

This already worked thanks to the MIGRATION_MARKER + MIGRATION_END_MARKER
pair I added when fixing the default_permissions wire format (so the
strip can find both ends of the managed region even with top-level
keys like default_permissions). But it was an emergent property
without a test pinning it.

Now explicitly tested:
  - User MCP server above the managed block survives migration
  - User MCP server below the managed block survives migration
  - Both above + below survive a second re-migration
  - User content (model, providers, sandbox, otel, etc.) outside our
    region is left untouched
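
A rough sketch of the marker contract being pinned here. The marker
strings and function names are placeholders (the real values live in
codex_runtime_plugin_migration.py), and this version re-appends the
managed block at the end of the file, which may differ from the real
placement:

```python
# Placeholder marker text, NOT the strings Hermes actually writes.
MIGRATION_MARKER = "# >>> hermes-managed block (assumed marker)"
MIGRATION_END_MARKER = "# <<< hermes-managed block (assumed marker)"


def strip_managed_block(text: str) -> str:
    """Drop everything between the two markers, keep all other lines."""
    kept, inside = [], False
    for line in text.splitlines():
        if line.strip() == MIGRATION_MARKER:
            inside = True
        elif line.strip() == MIGRATION_END_MARKER:
            inside = False
        elif not inside:
            kept.append(line)
    return "\n".join(kept)


def remigrate(text: str, managed_body: str) -> str:
    """Replace the managed region while leaving user content alone."""
    user_text = strip_managed_block(text).rstrip("\n")
    block = "\n".join([MIGRATION_MARKER, managed_body, MIGRATION_END_MARKER])
    prefix = user_text + "\n\n" if user_text else ""
    return prefix + block + "\n"
```

Because the end marker is explicit, top-level keys such as
default_permissions inside the block cannot be mistaken for user
content when the block is stripped.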

Docs added a section "Editing ~/.codex/config.toml safely" explaining
the marker contract — so users know they can add their own MCP
servers, override permissions, configure codex-only options, etc.
without fear of Hermes overwriting their work.

167 codex-runtime tests, all green.

* docs(codex-runtime): clarify the actual tool surface — shell covers terminal/read/write/find

Previous docs and PR description undersold what codex's built-in
toolset actually provides. apply_patch alone made it sound like the
runtime could only edit files in patch format — implying you'd lose
terminal use, read_file, write_file, search/find. That was wrong.

Codex's 'shell' tool runs arbitrary shell commands inside the sandbox,
which covers everything you'd do in bash: cat/head/tail (read), echo>
or heredocs (write), find/rg/grep (search), ls/cd (navigate), build/
test/git/etc. apply_patch is for structured multi-file edits on top
of that. update_plan is its in-runtime todo. view_image loads images.
And codex has its own web_search built in (in addition to the
Firecrawl-backed one Hermes exposes via MCP callback).

Docs now have a 'What tools the model actually has' section right
after Why, breaking the surface into three clearly-labeled buckets:

  1. Codex's built-in toolset (always on) — shell, apply_patch,
     update_plan, view_image, web_search; covers everything terminal-
     adjacent.
  2. Native Codex plugins (auto-migrated from your codex plugin
     install) — Linear, GitHub, Gmail, Calendar, Outlook, Canva, etc.
  3. Hermes tool callback (MCP server in ~/.codex/config.toml) —
     web_search/web_extract via Firecrawl, browser_*, vision_analyze,
     image_generate, skill_view/skills_list, text_to_speech.

Plus a 'What's NOT available' callout listing the four agent-loop tools
(delegate_task, memory, session_search, todo) that need running
AIAgent context and can't reach the codex runtime.

Trade-offs table broken out: shell, apply_patch, update_plan,
view_image, sandbox each get their own row with a one-line description
so users can see at a glance what's available natively.

Architecture diagram updated to list the codex built-ins by name
instead of 'apply_patch + shell + sandbox'.

No code changes — purely docs clarification. 167 codex-runtime tests
still green.

* fix(codex-runtime): _spawn_background_review signature + review fork api_mode downgrade

Two real bugs in the self-improvement loop integration that the previous
test mocked away.

Bug 1: wrong call signature

The codex helper was calling self._spawn_background_review() with no
args after every turn. That function actually requires:
  messages_snapshot=list   (positional or keyword)
  review_memory=bool       (at least one trigger must be True)
  review_skills=bool

So the call would have raised TypeError at runtime — except the only
test that exercised this path mocked _spawn_background_review entirely
and just asserted spawn.called, so the wrong-arg shape never surfaced.

Bug 2: review fork inherits codex_app_server api_mode

The review fork is constructed with:
  api_mode = _parent_runtime.get('api_mode')

So when the parent is codex_app_server, the review fork ALSO runs as
codex_app_server. But the review fork's whole job is to call agent-loop
tools (memory, skill_manage) which require Hermes' own dispatch — they
short-circuit with 'must be handled by the agent loop' on the codex
runtime. So the review fork would have run, decided to save something,
called memory or skill_manage, and silently no-op'd.

Fixed in run_agent.py:_spawn_background_review() — when the parent
api_mode is 'codex_app_server', the review fork is downgraded to
'codex_responses' (same OAuth credentials, same openai-codex provider,
but talks to OpenAI's Responses API directly so Hermes owns the loop).
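
In sketch form, the downgrade amounts to the mapping below. The helper
name is hypothetical; the real change is inline in
_spawn_background_review():

```python
def review_fork_api_mode(parent_api_mode):
    """Pick the api_mode for the review fork.

    codex_app_server cannot run agent-loop tools, so the fork falls
    back to codex_responses; every other mode is inherited unchanged.
    """
    if parent_api_mode == "codex_app_server":
        return "codex_responses"
    return parent_api_mode
```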

Also rewrote the codex helper's review wiring to match the
chat_completions path:
  - Computes _should_review_memory in the pre-loop block (was already
    being computed; now passed through to the helper as an arg).
  - Computes _should_review_skills AFTER the codex turn returns +
    counters tick (line ~15432 pattern in chat_completions).
  - Calls _spawn_background_review(messages_snapshot=, review_memory=,
    review_skills=) only when at least one trigger fires.
  - Adds the external memory provider sync (_sync_external_memory_for_turn)
    that the chat_completions path runs after every turn.
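
The trigger wiring above, reduced to a sketch. The helper names are
made up for illustration, and the 'threshold > 0' guard is an
assumption about how a disabled threshold is represented; only the
keyword-only call shape and the 'at least one trigger' rule come from
this commit:

```python
def should_review_skills(tool_iterations: int, threshold: int) -> bool:
    """True once the tool-iteration counter reaches the threshold."""
    return threshold > 0 and tool_iterations >= threshold


def maybe_spawn_review(spawn, messages, review_memory, review_skills):
    """Call spawn() with keyword args only, and only if a trigger fired.

    `spawn` stands in for _spawn_background_review. Returns whether it
    was called so the caller knows when to reset its counters.
    """
    if not (review_memory or review_skills):
        return False
    spawn(
        messages_snapshot=list(messages),
        review_memory=review_memory,
        review_skills=review_skills,
    )
    return True
```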

Tests:

  Replaced the broken test_background_review_invoked (which only
  asserted spawn.called) with three sharper tests:
    - test_background_review_NOT_invoked_below_threshold:
      single turn at default thresholds → no review fires (would have
      caught the original 'every turn calls spawn with no args' bug)
    - test_background_review_skill_trigger_fires_above_threshold:
      10 tool_iterations at threshold=10 → review fires with
      messages_snapshot=list, review_skills=True, counter resets
    - test_background_review_signature_never_breaks: regression guard
      asserting positional args are always empty and kwargs include
      messages_snapshot

  New TestReviewForkApiModeDowngrade class:
    - test_codex_app_server_parent_downgrades_review_fork: drives the
      real _spawn_background_review function (no mock at that level),
      asserts the review_agent gets api_mode='codex_responses' when
      the parent was codex_app_server.

Live-validated against real run_conversation:
  - Counter ticked from 0 to 5 after a 5-tool-iteration turn
  - _spawn_background_review fired exactly once with kwargs-only signature
  - review_skills=True, review_memory=False
  - messages_snapshot was 12 entries (5 assistant tool_calls + 5 tool
    results + 1 final assistant + initial system/user)
  - Counter reset to 0 after fire

170 codex-runtime tests, all green.

Docs: added a Self-improvement loop section to the codex runtime page
explaining both how the trigger logic stays equivalent and that the
review fork is auto-downgraded to codex_responses for the agent-loop
tools. Also clarified that apply_patch and update_plan ARE codex's
built-in tools (the previous version made it sound like they were
separate from 'codex's stuff' — they're not, all five tools listed
in 'What tools the model actually has' section 1 are codex built-ins).

* feat(codex-runtime): expose kanban tools through Hermes MCP callback

Kanban workers spawn as separate hermes chat -q subprocesses that read
the user's config.yaml. If model.openai_runtime: codex_app_server is set
globally (which is the whole point of opt-in), every dispatched worker
ALSO comes up on the codex runtime.

That mostly works — codex's built-in shell + apply_patch + update_plan
do the actual task work fine — but it had one critical break: the
worker handoff tools (kanban_complete, kanban_block, kanban_comment,
kanban_heartbeat) are Hermes-registered tools, not codex built-ins.
On the codex runtime, codex builds its own tool list and these never
reach the model, so the worker would do the work but not be able to
report back, hanging until the dispatcher's timeout escalates it as
a zombie.

Fix: add all 9 kanban tools to the EXPOSED_TOOLS list in the Hermes
MCP callback. They dispatch statelessly through handle_function_call()
just like web_search and the others — they read HERMES_KANBAN_TASK
from env (set by the dispatcher), gate correctly (worker tools require
the env var, orchestrator tools require it unset), and write to
~/.hermes/kanban.db.

Why kanban tools work via stateless dispatch when delegate_task/memory/
session_search/todo don't: those four are listed in _AGENT_LOOP_TOOLS
(model_tools.py:493) and short-circuit in handle_function_call() with
'must be handled by the agent loop' — they need to mutate AIAgent's
mid-loop state. Kanban tools have no such requirement; they're pure
side-effect functions against the kanban.db plus state_meta.

Tools exposed:
  Worker handoff (require HERMES_KANBAN_TASK):
    kanban_complete, kanban_block, kanban_comment, kanban_heartbeat
  Read-only board queries:
    kanban_show, kanban_list
  Orchestrator (require HERMES_KANBAN_TASK unset):
    kanban_create, kanban_unblock, kanban_link
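
The gating rule for the three groups above could look roughly like
this. The set names and the helper are illustrative only; the actual
checks live in the kanban tool handlers:

```python
import os

WORKER_TOOLS = {
    "kanban_complete", "kanban_block", "kanban_comment", "kanban_heartbeat",
}
READ_ONLY_TOOLS = {"kanban_show", "kanban_list"}
ORCHESTRATOR_TOOLS = {"kanban_create", "kanban_unblock", "kanban_link"}


def kanban_tool_allowed(tool, env=None):
    """Gate a kanban tool on HERMES_KANBAN_TASK (hypothetical helper)."""
    env = os.environ if env is None else env
    in_worker = bool(env.get("HERMES_KANBAN_TASK"))
    if tool in WORKER_TOOLS:
        return in_worker          # workers must have a task assigned
    if tool in ORCHESTRATOR_TOOLS:
        return not in_worker      # orchestrators must not be inside a task
    return tool in READ_ONLY_TOOLS
```

Since the decision depends only on the tool name and the environment,
it needs no AIAgent state, which is why these tools can be dispatched
statelessly through the MCP callback.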

Tests:
  - test_kanban_worker_tools_exposed: complete/block/comment/heartbeat
    in EXPOSED_TOOLS (regression guard for the would-hang-worker bug)
  - test_kanban_orchestrator_tools_exposed: create/show/list/unblock/link

Docs:
  - New 'Workflow features' section in the docs page covering /goal,
    kanban, and cron behavior on this runtime
  - /goal: works fully via run_conversation feedback; only caveat is
    approval-prompt noise on long write-heavy goals (mitigated by
    the default :workspace permission profile)
  - Kanban: enumerated which tools are reachable via the callback and
    why the env var propagates correctly through the codex subprocess
    to the MCP server subprocess
  - Cron: documented as 'not specifically tested' — same rules as the
    CLI apply since cron runs through AIAgent.run_conversation
  - Trade-offs table gained rows for /goal, kanban worker, kanban
    orchestrator

172/172 codex-runtime tests green (+2 from kanban tests).

* docs(codex-runtime): wire /codex-runtime into slash-commands ref + flag aux token cost

Three docs gaps caught during a final audit:

1. /codex-runtime was only in the feature docs page, not in the
   slash-commands reference. Added rows to both the CLI section and
   the Messaging section so users discover it where they'd look for
   slash command syntax.

2. CODEX_HOME and HERMES_KANBAN_TASK weren't in environment-variables.md.
   CODEX_HOME lets users redirect Codex CLI's config dir (the migration
   honors it). HERMES_KANBAN_TASK is set by the kanban dispatcher and
   propagates to the codex subprocess + the hermes-tools MCP subprocess
   so kanban worker tools gate correctly — documented as 'don't set
   manually' since it's an internal handoff.

3. Aux client behavior on this runtime. When openai_runtime=
   codex_app_server is on with the openai-codex provider, every aux
   task (title generation, context compression, vision auto-detect,
   session search summarization, the background self-improvement review
   fork) flows through the user's ChatGPT subscription by default.

   This is true for the existing codex_responses path too, but it's
   more visible / important here because users explicitly opted in for
   subscription billing. Added an 'Auxiliary tasks and ChatGPT
   subscription token cost' section to the docs page with a YAML
   example showing how to override specific aux tasks to a cheaper
   model (typically google/gemini-3-flash-preview via OpenRouter).

   Also documents how the self-improvement review fork gets
   auto-downgraded from codex_app_server to codex_responses by the
   fix earlier in this PR.

No code changes — pure docs. 172 codex-runtime tests still green.

* docs+test(codex-runtime): pin HOME passthrough, document multi-profile + CODEX_HOME

OpenClaw hit a real footgun in openclaw/openclaw#81562: when spawning
codex app-server they were synthesizing a per-agent HOME alongside
CODEX_HOME. That made every subprocess codex's shell tool launches
(gh, git, aws, npm, gcloud, ...) see a fake $HOME and miss the user's
real config files. They had to back it out in PR #81562 — keep
CODEX_HOME isolation, leave HOME alone.

Audit confirms Hermes' codex spawn doesn't have this problem. We do
os.environ.copy() and only overlay CODEX_HOME (when provided) and
RUST_LOG. HOME passes through unchanged. But it was an emergent
property without a test pinning it, so adding a regression guard:

  test_spawn_env_preserves_HOME — confirms parent HOME survives intact
                                  in the subprocess env
  test_spawn_env_sets_CODEX_HOME_when_provided — confirms codex_home
                                                  arg still isolates
                                                  codex state correctly
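
What the two guards pin down, as a sketch. The function name and its
parameters are assumptions (in particular how RUST_LOG gets its
value); the copy-then-overlay shape is what the audit describes:

```python
import os


def build_spawn_env(codex_home=None, rust_log=None, base_env=None):
    """Copy the parent env and overlay only CODEX_HOME and RUST_LOG.

    HOME is never touched, so tools launched by codex's shell (gh, git,
    aws, ...) still find the user's real config files.
    """
    env = dict(os.environ if base_env is None else base_env)
    if codex_home:
        env["CODEX_HOME"] = str(codex_home)
    if rust_log:
        env["RUST_LOG"] = rust_log
    return env
```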

Docs additions:

  'HOME environment variable passthrough' section — calls out the
  contract explicitly: CODEX_HOME isolates codex's own state, HOME
  stays user-real so gh/git/aws/npm/etc. find their normal config.
  Cites openclaw#81562 as the cautionary tale.

  'Multi-profile / multi-tenant setups' section — addresses the
  related concern: profiles share ~/.codex/ by default. For users who
  want per-profile codex isolation (separate auth, separate plugins),
  documents the manual CODEX_HOME=<profile-scoped-dir> approach.

  Explains why we DON'T auto-scope CODEX_HOME per profile: doing so
  would silently invalidate existing codex login state for anyone
  upgrading to this PR with tokens already at ~/.codex/auth.json.
  Opt-in is safer than surprising users.

174 codex-runtime tests (+2 from HOME guards), all green.

* fix(codex-runtime): TOML control-char escapes + atomic config.toml write

Two footguns caught in a final audit pass before merge.

Bug 1: TOML control characters not escaped

The _format_toml_value() helper escaped backslashes and double quotes
but passed literal control characters (\n, \t, \r, \f, \b) through
unchanged. TOML basic strings don't allow literal control characters
other than tab, so a path or env var containing a newline would
produce invalid TOML that codex refuses to load.

Realistic exposure: pathological cases like a HERMES_HOME with a
trailing newline (env var concatenation accident), or a PYTHONPATH
with a tab from a multi-line shell heredoc.

Fix: escape all five TOML basic-string control sequences (\b \t \n
\f \r) in addition to \\ and \" that we already did. Order
matters — backslash must come first or the other escapes get
re-escaped.
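
A sketch of the escaping order described above. The function name is
hypothetical (the real helper is _format_toml_value()), and it only
covers the seven sequences this commit mentions; other control
characters would need \uXXXX escapes that are not shown here:

```python
def escape_toml_basic_string(value: str) -> str:
    """Return `value` as a quoted TOML basic string."""
    replacements = [
        ("\\", "\\\\"),  # backslash first, or later escapes get doubled
        ('"', '\\"'),
        ("\b", "\\b"),
        ("\t", "\\t"),
        ("\n", "\\n"),
        ("\f", "\\f"),
        ("\r", "\\r"),
    ]
    for raw, escaped in replacements:
        value = value.replace(raw, escaped)
    return f'"{value}"'
```

If the backslash replacement ran last instead, the backslash that
'\n' introduces would itself be doubled and the value would decode to
a literal backslash followed by 'n'.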

Bug 2: config.toml write wasn't atomic

If the python process crashed between target.mkdir() and the
write_text() finishing, a half-written config.toml could be left
behind. On NFS / Windows / some FUSE mounts this is a real concern;
on ext4/APFS small writes are usually atomic in practice but not
guaranteed.

Fix: write to a tempfile.mkstemp() temp file in the same directory,
then Path.replace() (atomic same-dir rename on POSIX, MoveFileEx with
MOVEFILE_REPLACE_EXISTING on Windows). On rename failure, clean up the
temp file so repeated
failed migrations don't pile up .config.toml.* files.
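
The write-then-rename pattern, as a minimal sketch. The function name
and the temp-file prefix are assumptions; only the mkstemp() in the
same directory, Path.replace(), and the cleanup on failure come from
this commit:

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, text: str) -> None:
    """Write `text` to `target` via a same-directory temp file."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as the target so the final rename stays on one
    # filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(target.parent), prefix=f".{target.name}."
    )
    tmp_path = Path(tmp_name)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        tmp_path.replace(target)
    except BaseException:
        # Don't leave .config.toml.* files behind after a failed write.
        tmp_path.unlink(missing_ok=True)
        raise
```

Readers of config.toml see either the old file or the new one, never
a partially written mix.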

Tests:
  - test_string_with_newline_escaped — literal newline in value → \n escape in output
  - test_string_with_tab_escaped — literal tab in value → \t escape in output
  - test_string_with_other_controls_escaped — \r, \f, \b
  - test_windows_path_escaped_correctly — backslash doubling
  - test_atomic_write_no_temp_leak_on_success — no .config.toml.*
    left over after a successful write
  - test_atomic_write_cleanup_on_rename_failure — temp file removed
    when Path.replace raises (simulated disk full)

180 codex-runtime tests, all green (+6 from this commit).

Footguns audited but NOT fixed (with rationale):

- Concurrent migrations race. Two Hermes processes hitting
  /codex-runtime codex_app_server within seconds of each other could
  cause one writer to lose entries. Low probability (you'd have to
  enable from two surfaces simultaneously) and low impact (just re-run
  migration). Adding fcntl/msvcrt locking is more code than it's
  worth here. The atomic rename above means each individual write is
  consistent — only the merge step is racy.

- Codex protocol version drift. We pin MIN_CODEX_VERSION=0.125 and
  check at runtime but don't reject too-new versions. Right call —
  the protocol has been stable through 0.125 → 0.130. If OpenAI
  breaks it later we'd see the error in test_codex_app_server_runtime
  on CI before users hit it.
2026-05-13 17:18:15 -07:00
Teknium 9d42c2c286 feat(video_gen): unified video_generate tool with pluggable provider backends (#25126)
* feat(video_gen): unified video_generate tool with pluggable provider backends

One core video_generate tool, every backend a plugin. Mirrors the
image_gen + memory_provider + context_engine architecture: ABC, registry,
plugin-context registration hook, and per-plugin model catalogs surfaced
through hermes tools.

Surface (one schema, every backend):
- operation: generate / edit / extend
- modalities: text-to-video (prompt only), image-to-video (prompt +
  image_url), video edit (prompt + video_url), video extend (video_url)
- reference_image_urls, duration, aspect_ratio, resolution,
  negative_prompt, audio, seed, model override
- Providers ignore unknown kwargs and declare what they support via
  VideoGenProvider.capabilities() — backend-specific quirks stay in the
  backend, the agent learns one tool

Backends shipped:
- plugins/video_gen/xai/  — Grok-Imagine, full generate/edit/extend +
  image-to-video + reference images (salvaged from PR #10600 by
  @Jaaneek, reshaped into the plugin interface)
- plugins/video_gen/fal/  — Veo 3.1 (t2v + i2v), Kling O3 i2v,
  Pixverse v6 i2v with model-aware payload building that drops keys a
  model doesn't declare

Wiring:
- agent/video_gen_provider.py — VideoGenProvider ABC, normalize_operation,
  success_response / error_response, save_b64_video / save_bytes_video,
  $HERMES_HOME/cache/videos/
- agent/video_gen_registry.py — thread-safe register/get/list +
  get_active_provider() reading video_gen.provider from config.yaml
- hermes_cli/plugins.py — PluginContext.register_video_gen_provider()
- hermes_cli/tools_config.py — Video Generation category in
  hermes tools, plugin-only providers list, model picker per plugin,
  config write to video_gen.{provider,model}
- toolsets.py — new video_gen toolset
- tests: 31 new tests covering ABC, registry, tool dispatch, both plugins
- docs: developer-guide/video-gen-provider-plugin.md (parallel to the
  image-gen guide), sidebar + toolsets-reference + plugin guides updated
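
For orientation, the ABC plus registry pairing might look like the
sketch below. Method signatures, the module-level dict and the
function names are assumptions, not the contents of
agent/video_gen_provider.py or agent/video_gen_registry.py:

```python
import threading
from abc import ABC, abstractmethod


class VideoGenProvider(ABC):
    """Assumed minimal shape of a video backend plugin."""

    name = "base"

    @abstractmethod
    def capabilities(self) -> dict:
        """Declare what this backend supports."""

    @abstractmethod
    def generate(self, prompt: str, **kwargs) -> dict:
        """Run a generation; unknown kwargs should be ignored."""


_lock = threading.Lock()
_providers = {}


def register_provider(provider: VideoGenProvider) -> None:
    with _lock:
        _providers[provider.name] = provider


def get_provider(name: str):
    with _lock:
        return _providers.get(name)


def list_providers() -> list:
    with _lock:
        return sorted(_providers)
```

A single lock around the dict keeps registration safe when plugins
load from more than one thread.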

Supersedes: #25035 (FAL), #17972 (FAL), #14543 (xAI), #13847 (HappyHorse),
#10458 (provider categories), #10786 (xAI media+search bundle), #2984
(FAL duplicate), #19086 (Google Veo standalone — easy port to plugin
interface).

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>

* feat(video_gen): dynamic schema reflects active backend's capabilities

Address the 'capability variance' question — instead of one tool with a
static schema that lies about what every backend supports, the
video_generate tool now rebuilds its description at get_definitions()
time based on the configured video_gen.provider and video_gen.model.

The agent sees backend-specific guidance up-front:
- 'fal-ai/veo3.1/image-to-video': 'image-to-video only — image_url is
  REQUIRED; text-only prompts will be rejected'
- 'fal-ai/veo3.1' (t2v): no image_url restriction shown
- xAI grok-imagine-video: 'operations: generate, edit, extend; up to 7
  reference_image_urls'
- Backends without edit/extend: 'not supported on this backend — surface
  that they need to switch backends via hermes tools'

This is the same pattern PR #22694 used for delegate_task self-capping —
documented in the dynamic-tool-schemas skill. Cache invalidation is
free: get_tool_definitions() already memoizes on config.yaml mtime, so a
mid-session backend swap rebuilds the schema automatically.

Tested:
- Empirical FAL OpenAPI schema check confirms image-to-video models
  require image_url (FAL returns HTTP 422 otherwise) — client-side
  rejection in FALVideoGenProvider.generate() now prevents the wasted
  round-trip
- Live E2E: fal-ai/veo3.1/image-to-video + prompt-only → clean
  missing_image_url error; fal-ai/veo3.1 + prompt-only → dispatches
- 6 new tests cover the builder (no config / image-only / full-surface /
  text-only / unknown provider / registry wiring), all passing
- 37/37 in the slice, 134/134 in the broader regression set

* test(video_gen/xai): full surface integration tests + cleaner schema

Verified end-to-end that the xAI plugin handles every documented mode
from PR #10600's surface: text-to-video, image-to-video,
reference-images-to-video, video edit, video extend (with and without
prompt). All five modes route to the correct xAI endpoint
(/videos/generations, /videos/edits, /videos/extensions) with the right
payload shape (image / reference_images / video keys), and all five
client-side rejections fire before the network: edit-without-prompt,
extend-without-video_url, image+refs conflict, >7 references, and
duration/aspect_ratio clamping.

15 new integration tests grouped into four classes (endpoint routing,
modalities, validation, clamping). httpx is stubbed via a small fake
AsyncClient that records POSTs so the tests assert the actual payload
the plugin would send to xAI — not just the success/error envelope.

Also cleaned up a description redundancy: when a model's operations
match the backend's overall set, we no longer print the duplicate
'operations supported by this model' line. xAI's description now reads:

    Active backend: xAI . model: grok-imagine-video
    - operations supported by this backend: edit, extend, generate
    - modalities supported by this backend: image, reference_images, text
    - aspect_ratio choices: 16:9, 1:1, 2:3, 3:2, 3:4, 4:3, 9:16
    - resolution choices: 480p, 720p
    - duration range: 1-15s
    - reference_image_urls: up to 7 images

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>

* feat(video_gen): collapse surface to t2v + i2v, family-based auto-routing

Two design changes per Teknium:

1) Drop edit/extend from the tool surface entirely. Only text-to-video
and image-to-video remain. The agent sees a clean tool with two
modalities; backend-specific quirks like xAI's edit/extend endpoints
stay out of the unified schema.

2) FAL: pick a model FAMILY once, the plugin routes between the
family's text-to-video and image-to-video endpoints based on whether
image_url was passed. Users no longer pick 'fal-ai/veo3.1' AND
'fal-ai/veo3.1/image-to-video' as separate options — they pick
'veo3.1', and the plugin handles the rest.

Catalog rewritten as families:

    veo3.1            fal-ai/veo3.1                                /  fal-ai/veo3.1/image-to-video
    pixverse-v6       fal-ai/pixverse/v6/text-to-video             /  fal-ai/pixverse/v6/image-to-video
    kling-o3-standard fal-ai/kling-video/o3/standard/text-to-video /  fal-ai/kling-video/o3/standard/image-to-video
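
Family-based routing over the catalog above, in sketch form. The dict
layout and the helper name are assumptions; the endpoint strings are
the ones listed in the table:

```python
FAL_FAMILIES = {
    "veo3.1": {
        "text_endpoint": "fal-ai/veo3.1",
        "image_endpoint": "fal-ai/veo3.1/image-to-video",
    },
    "pixverse-v6": {
        "text_endpoint": "fal-ai/pixverse/v6/text-to-video",
        "image_endpoint": "fal-ai/pixverse/v6/image-to-video",
    },
    "kling-o3-standard": {
        "text_endpoint": "fal-ai/kling-video/o3/standard/text-to-video",
        "image_endpoint": "fal-ai/kling-video/o3/standard/image-to-video",
    },
}


def resolve_endpoint(family: str, image_url=None):
    """Return (endpoint, modality) for a family (hypothetical helper)."""
    entry = FAL_FAMILIES[family]
    if image_url:
        return entry["image_endpoint"], "image"
    return entry["text_endpoint"], "text"
```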

xAI uses a single endpoint (/videos/generations) for both modes,
routed by the presence of the 'image' field in the payload — no
edit/extend exposure.

Schema changes:
- VIDEO_GENERATE_SCHEMA: drop operation, drop video_url. Final params:
  prompt (required), image_url, reference_image_urls, duration,
  aspect_ratio, resolution, negative_prompt, audio, seed, model.
- VideoGenProvider ABC: drop normalize_operation, VALID_OPERATIONS,
  DEFAULT_OPERATION. capabilities() drops 'operations' key.
- success_response: add 'modality' field ('text' | 'image') so the
  agent and logs can see which endpoint was actually hit.

Dynamic schema builder simplified — no operations bullet, no
'switch backends if you need edit/extend' guidance. When the active
backend supports both modalities (the common case), description reads:

    Active backend: FAL . model: pixverse-v6
    - supports both text-to-video (omit image_url) and image-to-video
      (pass image_url) - routes automatically
    - aspect_ratio choices: 16:9, 9:16, 1:1
    - resolution choices: 360p, 540p, 720p, 1080p
    - duration range: 1-15s
    - audio: pass audio=true to enable native audio (pricing tier)
    - negative_prompt: supported

Tests: 51 in the video_gen slice, 216 across the broader image+video
sweep, all passing. New FAL routing tests prove pixverse-v6 + no image
hits text-to-video endpoint, pixverse-v6 + image_url hits
image-to-video endpoint, same for veo3.1 and kling-o3-standard.

Docs updated: developer-guide page rewrites the 'model families' pattern
as a first-class section so external plugin authors know the convention.
toolsets-reference and toolsets.py descriptions match the new surface.

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>

* feat(video_gen/fal): expand catalog to 6 families, cheap + premium tiers

Catalog now covers everything Teknium specced from FAL:

  Cheap tier:
    ltx-2.3        fal-ai/ltx-2.3-22b/text-to-video       / image-to-video
    pixverse-v6    fal-ai/pixverse/v6/text-to-video       / image-to-video

  Premium tier:
    veo3.1         fal-ai/veo3.1                          / fal-ai/veo3.1/image-to-video
    seedance-2.0   bytedance/seedance-2.0/text-to-video   / image-to-video
    kling-v3-4k    fal-ai/kling-video/v3/4k/text-to-video / image-to-video
    happy-horse    fal-ai/happy-horse/text-to-video       / image-to-video

DEFAULT_MODEL moved from veo3.1 (premium) to pixverse-v6 (cheap, sane
defaults, both modalities) — better first-run UX for users who haven't
explicitly picked a model.

New family-entry knob: image_param_key. Kling v3 4K's image-to-video
endpoint expects start_image_url instead of image_url; declaring
image_param_key='start_image_url' on the family lets _build_payload
remap correctly. Other families default to plain image_url.
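
The remap reads roughly as follows. This is a simplified stand-in for
_build_payload with a hypothetical name and a bare-bones payload; only
the image_param_key lookup and its image_url default come from this
commit:

```python
def build_image_payload(family_entry: dict, prompt: str, image_url=None):
    """Place the image under the key the family's endpoint expects."""
    payload = {"prompt": prompt}
    if image_url:
        # Families without an explicit knob fall back to 'image_url'.
        key = family_entry.get("image_param_key", "image_url")
        payload[key] = image_url
    return payload
```

For example, passing {'image_param_key': 'start_image_url'} as the
family entry puts the image under start_image_url, while an entry
without the knob keeps the plain image_url key.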

Per-family capability flags reflect each model's docs:
- LTX 2.3 + Happy Horse: minimal payloads (no duration/aspect/resolution
  enum exposed by FAL — let endpoint apply defaults)
- Seedance: 6 aspect ratios incl 21:9, durations 4-15, audio supported,
  negative prompts NOT supported per docs
- Kling v3 4K: 16:9/9:16/1:1, 3-15s, audio + negative
- Veo 3.1: unchanged, 16:9/9:16, 4/6/8s

Tests: +5 covering the new families (full catalog, Kling 4K
start_image_url remap, Seedance routing, LTX payload minimality, Happy
Horse minimality). 56/56 in the slice green.

Note: I did NOT add the FAL-hosted xAI Grok-Imagine variant. Hermes
already has a direct xAI plugin that talks to xAI's own API; routing
the same model through FAL's wrapper would duplicate the surface
without adding capabilities. Users on FAL who want Grok-Imagine should
use the xAI plugin directly; flag if you want both routes available.

* test(video_gen): tool-surface routing matrix — every model x modality

End-to-end matrix test driven through _handle_video_generate() — the
actual function the agent's video_generate tool call lands in. Writes
config.yaml, invokes the registered handler with a raw args dict, then
asserts the outbound HTTP/SDK call hit the right endpoint with the right
payload shape.

Parametrized over FAL_FAMILIES.keys() so the matrix auto-discovers new
families as they're added (add a family to FAL_FAMILIES and you get
both modalities tested for free).

Coverage:
- All 6 FAL families x {text-only, text+image} = 12 cases
- xAI x {text-only, text+image} = 2 cases
- tool-level model= arg overrides config = 2 cases

For each case, verifies:
- result['success'] is True
- result['modality'] matches input shape ('text' if no image_url, 'image' otherwise)
- outbound endpoint URL matches the family's text_endpoint or image_endpoint
- text-only payloads carry no image-shaped keys
- text+image payloads carry the family's image key (image_url for most,
  start_image_url for kling-v3-4k, wrapped 'image' object for xAI)

All 16 cases passing. Confirms the tool surface routes every
(provider, model, modality) combination correctly with zero leakage.

* feat(video_gen): keep video_gen out of first-run setup, surface in status

Two changes:

1. video_gen joins _DEFAULT_OFF_TOOLSETS, so it is NOT pre-selected in
   the first-run toolset checklist. Video gen is niche, paid, and slow —
   most users don't want it nagging them during initial setup. Anyone
   who wants it opts in via 'hermes tools' -> Video Generation, which
   already routes to the provider+model picker.

2. The 'hermes setup' status panel learns about video_gen — but only
   shows the row when a plugin reports available. Users without
   FAL_KEY/XAI_API_KEY see nothing about video gen; users with one of
   those keys see 'Video Generation (FAL) ✓' as confirmation it's wired.

Verified live:
- Fresh install (no creds): zero video_gen mentions in wizard.
- With FAL_KEY: status row appears with active backend name.
- 160/160 in the setup + tools_config + video_gen test slice.

Rationale: image_gen is on by default because it's a featured creative
tool used in casual chat (telegrams, etc). Video gen is heavier — long
wait, paid per-second pricing. Default-off matches user intent better.

---------

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
2026-05-13 16:39:41 -07:00
teknium1 b833d85019 chore(release): map mgongzai author for PR #25183 salvage 2026-05-13 14:53:04 -07:00
Kong cc64a04f61 test(gateway): make queued follow-up regression generic
Replace tenant-specific example text in the transcript offset regression with generic follow-up turns so the upstream test documents the bug without customer-specific wording.
2026-05-13 14:53:04 -07:00
Kong 9a815b6c8c fix(gateway): preserve queued follow-up transcript history
Keep the outer history_offset when _run_agent drains queued follow-ups recursively so transcript persistence includes every queued turn in the chain instead of only the last one.
2026-05-13 14:53:04 -07:00
brooklyn! 08671d8771 tui: make URLs clickable + hover-highlight in any terminal (#25071)
* tui: make URLs clickable + hover-highlight in any terminal

Problem
-------
URLs printed by `hermes --tui` were not clickable in basic macOS Terminal.app.
Cmd+click did nothing, the cursor didn't change shape — like nothing was
detected — even though arrow buttons and other Box onClick handlers worked
fine.

Root cause
----------
Two layers of dead plumbing:

1. `<Link>` only emitted the underlying `<ink-link>` (which carries the
   hyperlink metadata into the screen buffer) when `supportsHyperlinks()`
   said yes. On Apple_Terminal that's false, so the per-cell hyperlink
   field stayed empty, so `Ink.getHyperlinkAt()` had nothing to return on
   click. The visible underline was just decorative.

2. `Ink.openHyperlink()` calls `this.onHyperlinkClick?.(url)`, but
   `onHyperlinkClick` was never assigned anywhere in the codebase. The
   click pipeline (`App.tsx → onOpenHyperlink → Ink.openHyperlink`) ran
   but bailed silently on the optional chain.

Bonus discovery: even when wired up, there was no hover affordance —
terminal apps can't change the system mouse cursor, so users had no
visual signal that a cell was clickable. Arrow buttons in the chrome
worked because they had explicit `<Box onClick>` styling; inline link
URLs didn't.

Fix
---
- `Link.tsx`: always emit `<ink-link>` regardless of terminal capability.
  The renderer's `wrapWithOsc8Link` already gates the actual OSC 8 escape
  on `supportsHyperlinks()` further down — so terminals that don't
  understand OSC 8 still don't see the escape, but the screen-buffer
  metadata (which the click dispatcher reads) is now populated everywhere.

- `ink.tsx + root.ts`: add `onHyperlinkClick?: (url: string) => void` to
  `Options` / `RenderOptions`, wire it to the existing `Ink.onHyperlinkClick`
  field in the constructor.

- `src/lib/openExternalUrl.ts`: small platform-aware opener using
  `child_process.spawn` with arg-array (no shell) — http(s) only, rejects
  `file:`, `javascript:`, `data:`, etc., so a hostile model can't trigger
  arbitrary local handlers via `<Link url="file:///...">`. Detached + stdio
  ignore so closing the TUI doesn't kill the browser and Chrome stderr
  doesn't leak into the alt screen.

- `entry.tsx`: pass `onHyperlinkClick: openExternalUrl` to `ink.render`.

- `hyperlinkHover.ts` + Ink hover wiring: track the URL under the pointer
  in `Ink.hoveredHyperlink`, update it from `dispatchHover`, and inverse-
  highlight every cell of the matching link in the render-pass overlay
  (same pattern as `applySearchHighlight`). This is the cursor-hover
  affordance for clickable links — terminals don't expose cursor shape,
  so we light up the link itself.

- `types/hermes-ink.d.ts`: add `onHyperlinkClick` to the `RenderOptions`
  shim so consumers (`entry.tsx`) type-check against the new option.

Tests
-----
- `src/lib/openExternalUrl.test.ts` (15 cases): http(s) accepted; file/js/
  data/mailto/ftp/ssh rejected; macOS open(1), Windows cmd.exe start with
  empty title slot, Linux xdg-open dispatch; shell-metacharacter URLs
  pass through unmolested as a single argv element; synchronous spawn
  failure returns false.

Verified empirically in Apple Terminal 455.1 (macOS 15.7.3): clicking a
URL opens in default browser, hovering inverts the link cells, and
moving away clears the highlight. Full TUI suite: 713 passing, 0
type errors.

Reverts
-------
The earlier attempt that version-gated Apple_Terminal in
`supports-hyperlinks.ts` was based on a wrong assumption — Terminal.app
silently strips OSC 8 sequences but does not render them as clickable
hyperlinks. Reverted to the original allowlist.

* tui: address Copilot review — explorer.exe on win32 + comment fixes

- openExternalUrl: switch win32 from `cmd.exe /c start` to `explorer.exe`.
  cmd.exe's `start` builtin reparses the URL through cmd's tokenizer, so
  `&`, `|`, `^`, `<`, `>` either split the command or get reinterpreted —
  breaking both the protocol-allowlist safety story AND plain http(s) URLs
  with `&` in query strings. `explorer.exe <url>` invokes the registered
  protocol handler directly with no shell.

- openExternalUrl.test.ts: rename the win32 test to reflect the new
  contract and add two regression tests — one with `&|^<>` metachars,
  one with the common analytics-URL `&` query-param pattern — both pinned
  to single-argv-element delivery via explorer.exe.

- Link.tsx: fix misleading comment. OSC 8 escapes are emitted
  unconditionally by the renderer (`wrapWithOsc8Link` in
  render-node-to-output.ts, `oscLink` in log-update.ts). Non-supporting
  terminals silently strip the sequence, which is why hover/click
  affordance has to come from the in-process overlay rather than the
  terminal's own link rendering.

Verified: 715/715 tests pass, type-check + build clean.

* tui: address Copilot review #2 — async spawn errors + hover scope + docs

1. openExternalUrl: attach a no-op `'error'` listener on the spawned
   child BEFORE unref(). spawn() returns a ChildProcess synchronously
   even when the binary is missing (ENOENT on xdg-open / explorer.exe),
   unreachable, or otherwise unusable; the failure surfaces later as
   an 'error' event. An unhandled 'error' on an EventEmitter crashes
   Node, which would tear down the whole TUI. The listener is a
   deliberate no-op — we already returned `true` synchronously and the
   user just doesn't see the browser pop.

2. openExternalUrl.test.ts: add a regression test using a real
   EventEmitter to simulate the async-error path. Pins both the
   listener-attached contract and the "doesn't throw on emit" behavior.
   Was 17/17, now 18/18.

3. ink.tsx dispatchHover: bypass `getHyperlinkAt()` and read
   `cellAt(...).hyperlink` directly. `getHyperlinkAt` falls back to
   `findPlainTextUrlAt` for cells without an OSC 8 hyperlink, but the
   render-pass overlay (`applyHyperlinkHoverHighlight`) only matches on
   `cell.hyperlink === hoveredUrl` — so plain-text URLs would burn
   re-renders without ever producing the highlight. Hover is now a
   strictly 1:1 fit for what the overlay can paint. Plain-text URLs
   still get the click action via the existing dispatch path.

4. root.ts + ink.tsx doc comments: replace the misleading "typically
   `open` / `xdg-open` / `start` shell" wording with the actual safe
   recipe — argv-array spawn into `open` / `xdg-open` / `explorer.exe`,
   with an explicit warning that `cmd.exe /c start` reparses the URL
   through cmd's tokenizer and is unsafe + breaks `&`-query URLs.

Verified: 716/716 tests pass, type-check + build clean.

* tui: address Copilot review #3 — hover damage, alt-screen cleanup, opener allowlist

1. ink.tsx onRender: stop folding steady-state hover into hlActive.
   hlActive forces a full-screen damage diff so previous-frame inverted
   cells get re-emitted when the highlight set changes. The transition
   IS the trigger — enter / leave / change-to-other-link. While the
   pointer just sits on a link the painted cells don't change and the
   per-cell diff handles the no-op. Folding the steady state in would
   burn a full-screen diff on every frame. Added a
   lastRenderedHoveredHyperlink tracker and gate the hlActive bump on
   `hovered !== lastRendered`.

2. ink.tsx setAltScreenActive: clear hoveredHyperlink (and the tracker)
   when toggling alt-screen state. Hover dispatch is alt-screen-gated,
   so once we leave there's no path to clear it. Without this, remounting
   <AlternateScreen> would paint a phantom hover from the previous
   session until the next mouse-move arrived.

3. openExternalUrl.ts openCommand: allowlist linux + the BSD family for
   xdg-open and return null for everything else (aix, sunos, cygwin,
   haiku, etc.). Previously the default-fallback always returned
   xdg-open, which made the caller's `if (!command) return false` dead
   and yielded a misleading `true` on platforms that probably don't
   have xdg-open. New tests cover the null path AND the
   openExternalUrl-returns-false-without-spawning behavior.

Verified: 718/718 tests pass, type-check + build clean.

* tui: address Copilot review #4 — doc comment accuracy

1. openExternalUrl return-value doc: now lists all three false paths
   (URL rejected / no opener for platform / synchronous spawn throw)
   plus a note that async 'error' events still return true because the
   spawn was attempted.

2. ink.tsx onHyperlinkClick field doc: clarifies the callback receives
   either an OSC 8 hyperlink OR a plain-text URL detected by
   findPlainTextUrlAt — App.tsx routes both into the same callback.

3. hyperlinkHover applyHyperlinkHoverHighlight doc: drops the misleading
   'caller forces full-frame damage' promise. Caller decides; for hover
   the current caller only forces full damage on transitions.

No behavior change. 718/718 tests pass.

* tui: address Copilot review #5 — lint fixes

1. ink.tsx: reorder `./hyperlinkHover.js` import before `./screen.js` to
   satisfy perfectionist/sort-imports.

2. Link.tsx: drop unused `fallback` parameter destructuring + the
   trailing `void (null as ...)` dead-statement (would trip
   no-unused-expressions). Kept `fallback?: ReactNode` on the Props
   interface as a documented compat shim so existing call sites still
   compile, with a comment explaining why it's no longer wired up.

3. openExternalUrl.test.ts: replace `typeof import('node:child_process').spawn`
   inline annotations (forbidden by @typescript-eslint/consistent-type-imports)
   with a `SpawnLike` type alias backed by a real `import type { spawn as SpawnFn }`.

No behavior change. 718/718 tests pass, type-check clean, lint clean on
all modified files.
2026-05-13 13:52:10 -07:00
vominh1919 e2b2d48610 fix(cli): preserve startup banner on terminal resize
Recover from SIGWINCH without clearing the physical screen or scrollback
buffer. The startup banner and tool summary are printed before
prompt_toolkit owns the live chrome, so they live in normal terminal
scrollback. Calling erase_screen() + \x1b[3J on every resize removed
that UI permanently — _replay_output_history cannot reconstruct it
because the banner was never added to _OUTPUT_HISTORY.

Instead, just reset prompt_toolkit's renderer cache and invalidate so
the next incremental redraw starts from a clean slate, then let the
original on_resize handler recalculate layout for the new terminal
size. This matches the behaviour of bash/zsh/fish on SIGWINCH.

Fixes NousResearch/hermes-agent#22999
2026-05-13 13:36:31 -07:00
teknium1 59da8ec4ec fix(tools): refuse skill_view name collisions instead of guessing
skill_view ran the direct-path strategy across every skill dir before
the recursive strategy, so a top-level skill in an external dir could
silently shadow a same-named nested local skill. /skills correctly
listed the local version (deduped local-first by _find_all_skills) but
skill_view loaded the external one — confusing, and a real bug class
for users with skills.external_dirs registered alongside categorized
local skills.

Pick a louder fix than @polkn's PR #6136 proposed: collect every match
across all dirs (direct path, recursive by parent dir name, legacy
flat <name>.md), and if there's more than one, refuse with an error
that surfaces every matching path plus a hint to load by the
categorized form. Local-first precedence would have replaced silent
external-shadowing with silent same-name collisions between two
externals, or made an externally-shadowed-by-local skill unreachable
by bare name with no signal. Refusing forces the user to disambiguate
once and never wonder which skill ran.

Recovery: pass the full categorized path
("foundations/runtime/explore-codebase" instead of
"explore-codebase"), or rename one of the colliding skills.
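
A rough sketch of the collect-then-refuse strategy, under simplifying assumptions: skills are represented as relative path strings per directory rather than files on disk, and `find_matches` / `resolve_skill` are hypothetical names, not the real skill_view helpers.

```python
def find_matches(name, skills_by_dir):
    """Return every (dir, relative_path) that a given name could refer to.

    A bare name matches any skill whose last path component equals it;
    a categorized name only matches that exact relative path.
    """
    matches = []
    for skill_dir, rel_paths in skills_by_dir.items():
        for rel in rel_paths:
            if rel == name or rel.endswith("/" + name):
                matches.append((skill_dir, rel))
    return matches


def resolve_skill(name, skills_by_dir):
    """Resolve a skill name, refusing to guess when it is ambiguous."""
    matches = find_matches(name, skills_by_dir)
    if not matches:
        raise LookupError(f"skill {name!r} not found")
    if len(matches) > 1:
        listing = "\n".join(f"  - {d}/{rel}" for d, rel in matches)
        raise LookupError(
            f"skill name {name!r} is ambiguous, matches:\n{listing}\n"
            "Load it by its full categorized path or rename one of them."
        )
    return matches[0]


if __name__ == "__main__":
    index = {
        "local": ["foundations/runtime/explore-codebase"],
        "external": ["explore-codebase"],
    }
    print(resolve_skill("foundations/runtime/explore-codebase", index))
    try:
        resolve_skill("explore-codebase", index)
    except LookupError as exc:
        print(exc)
```

Collecting first and deciding afterwards is what makes the error message useful: every colliding path is known at the point of refusal.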

Co-authored-by: pol <pol.kuijken@gmail.com>
2026-05-13 13:29:28 -07:00
Teknium 256bedb632 fix(setup): drop post-setup chat handoff (#25067)
Removes the 'Launch hermes chat now? (Y/n)' prompt at the end of
hermes setup. The summary already prints 'Ready to go! → hermes'
so the auto-launch was redundant, and on macOS 26+ it could crash
in prompt_toolkit when setup was invoked from the curl install
script with stdin redirected from /dev/tty (#5884, #6128).

After setup, users run 'hermes' themselves like every other CLI
tool. Same pattern applies to the Windows installer.

Closes #6128 (narrower env-var-guarded fix superseded by removing
the prompt outright).
2026-05-13 13:28:25 -07:00
littlewwwhite 6f2d1c88b7 feat(custom): prompt and persist explicit api_mode for custom providers
Adds an explicit API compatibility mode prompt to the `hermes model -> custom`
flow so Codex-compatible third-party endpoints (and any other non-default
backend whose URL doesn't match the existing heuristics in
`_detect_api_mode_for_url`) can be selected explicitly instead of silently
falling back to chat_completions.

Choices: Auto-detect / chat_completions / codex_responses / anthropic_messages.

Persists `api_mode` to:
  - `model.api_mode` (active session config)
  - the matching `custom_providers[*]` entry (so re-activating the named
    provider next time replays the same transport)

Salvaged from PR #6125 onto current main: kept the new prompt and the
`_save_custom_provider(api_mode=...)` plumbing; the named-custom flow
already extracts and applies `api_mode` from the saved entry on current
main so those changes are preserved as-is. Test fixtures updated for the
new prompt and the existing display-name prompt.

Co-authored-by: littlewwwhite <1095245867@qq.com>
2026-05-13 13:21:33 -07:00
Teknium 1979ef5802 chore(release): map iuyup author for PR #6155 salvage 2026-05-13 10:31:22 -07:00
iuyup d6c9711ba8 fix(security): reduce unnecessary shell=True in subprocess calls
- memory_setup.py: use shlex.split() for plugin dep checks instead of shell=True
- transcription_tools.py: avoid shell=True for auto-detected whisper commands
  (user-provided templates via env var still use shell=True for compatibility)
- cli.py: add comment clarifying intentional shell=True for user quick_commands
- Add test verifying auto-detected template is shlex-safe

Addresses CONTRIBUTING.md Priority #3 (Security hardening — shell injection).
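
For illustration, the general shape of the shlex.split() change is sketched below. The `run_check` helper is a hypothetical name and the command strings are made up; only the use of shlex.split() in place of shell=True reflects the commit.

```python
import shlex
import subprocess
import sys


def run_check(command):
    """Run a command string as an argv list, without invoking a shell."""
    argv = shlex.split(command)
    # With an argv list and no shell, characters such as ';' or '&&' are
    # passed to the program as literal arguments instead of being parsed.
    return subprocess.run(argv, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # shlex.quote guards against spaces in the interpreter path.
    result = run_check(shlex.quote(sys.executable) + " -c 'print(40 + 2)'")
    print(result.stdout.strip())
    # The ';' stays attached to the preceding token; nothing is chained.
    print(shlex.split("echo hi; rm -rf /"))
```

The trade-off is that shell features (pipes, globbing, variable expansion) stop working, which is presumably why user-provided templates keep shell=True for compatibility.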
2026-05-13 10:31:22 -07:00
teknium1 a9b8254e5f chore(release): map anton.kuenzi@gmail.com -> ZeterMordio
For PR #11754 salvage (zsh completion compdef registration + _arguments
syntax tests). CI release script blocks unmapped emails.
2026-05-13 09:34:15 -07:00
Teknium a43d7e67b4 refactor(profiles): remove dead generate_bash_completion / generate_zsh_completion
These two functions in hermes_cli/profiles.py have no callers — the live
`hermes completion {bash,zsh}` command uses hermes_cli/completion.py's
generate_bash() / generate_zsh() instead. Multiple PRs (incl. #6141) tried
to fix the trailing-`_hermes "$@"` zsh bug here, only to discover the
patch never reached users. Delete the dead code so future contributors
patch the right file.

The actual user-facing fix lives in the preceding cherry-picked commits
to hermes_cli/completion.py.
2026-05-13 09:34:15 -07:00
Anton Künzi 6d30b4a7e3 test(cli): strengthen zsh completion regression coverage 2026-05-13 09:34:15 -07:00
Anton Künzi 8c4bec6155 fix(cli): repair broken zsh completion generation 2026-05-13 09:34:15 -07:00
ethernet 4fdfdf6749 Merge pull request #25045 from NousResearch/hermes/hermes-852727b9
ci(docker): split :latest (releases only) from :main
2026-05-13 10:47:30 -04:00
ethernet 1149e75db2 ci(docker): split :latest (releases only) from :main (main HEAD)
Previously :latest tracked the tip of main, which meant pulling :latest
got you whatever was last merged — fine for development, surprising for
users who expect :latest to mean 'the most recent stable release'.

Reshape the publish flow so the floating tags carry their conventional
meaning:

  - :sha-<sha>      every main commit (unchanged, immutable)
  - :main           tip of main (NEW; what :latest used to do)
  - :<release_tag>  every published release, e.g. :v1.2.3 (unchanged)
  - :latest         most recent release (CHANGED; release-only now)

Implementation:

  - Rename the move-latest job to move-main; it still gates on push to
    main, still ancestor-checks the existing :main label before
    retagging, still uses cancel-in-progress: false so queued moves run
    serially.

  - Add a new move-latest job gated on release: published. Reads the
    OCI revision label off the existing :latest and only advances if
    the release commit is a strict descendant. This keeps backport
    releases on older branches (e.g. patching v1.1.5 after v1.2.3 has
    already shipped) from dragging :latest backwards.

  - merge job exposes pushed_release_tag and release_tag outputs so
    move-latest knows when to fire and what to retag from.
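
The strict-descendant guard can be illustrated with a small graph walk. This is only a sketch of the idea in Python: the real job runs inside the workflow and reads the OCI revision label, and the `parents` mapping and commit names here are made up for the example.

```python
def is_strict_descendant(candidate, current, parents):
    """True if `current` is a proper ancestor of `candidate`."""
    if candidate == current:
        return False
    stack = list(parents.get(candidate, ()))
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == current:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return False


def should_advance_latest(release_commit, latest_commit, parents):
    """Only move the floating tag forward along history, never sideways."""
    return is_strict_descendant(release_commit, latest_commit, parents)


if __name__ == "__main__":
    # Hypothetical history: v1.2.3 is on main, v1.1.5 is a later backport
    # cut from an older branch point.
    parents = {
        "v1.2.3": ["a"],
        "a": ["base"],
        "v1.1.5": ["base"],
        "base": [],
    }
    print(should_advance_latest("v1.1.5", "v1.2.3", parents))
    print(should_advance_latest("v1.2.3", "base", parents))
```

In this toy history the backport release is not a descendant of the commit behind :latest, so the tag stays where it is.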
2026-05-13 10:30:42 -04:00
Siddharth Balyan 5d90386baa fix(gateway): add lazy_deps.ensure() to slack, matrix, dingtalk, feishu adapters (#25014)
Only Discord and Telegram had lazy-install hooks in their
check_*_requirements() functions. The remaining four platforms that were
moved to lazy_deps (Slack, Matrix, DingTalk, Feishu) would just return
False immediately if their packages weren't pre-installed — no attempt
to install them at runtime.

This means even with the .venv permissions fix (#24841), these four
platforms would still fail to load in Docker (or any fresh install)
unless the user manually ran pip install.

Add the same lazy_deps.ensure() pattern to all four, matching the
existing Discord/Telegram implementation.
2026-05-13 19:28:50 +05:30
kshitijk4poor c3094b46e9 refactor: import FILE_MUTATING_TOOL_NAMES from shared module
Drops the duplicate _FILE_MUTATING_TOOLS frozenset in run_agent.py and
imports the canonical FILE_MUTATING_TOOL_NAMES from
agent/tool_result_classification.py (aliased as _FILE_MUTATING_TOOLS to
avoid renaming the existing call sites). Prevents future drift if
another file-mutating tool is added — only one set needs updating.

No behavior change: same frozenset({'write_file', 'patch'}), and the
117 PR-scoped tests still pass.
2026-05-13 06:46:23 -07:00
GodsBoy da0ddbf88a fix: classify landed file mutations with diagnostics 2026-05-13 06:46:23 -07:00
briandevans 71c6dd0dcf fix(cli): add 'lsp' to _BUILTIN_SUBCOMMANDS so plugin discovery is skipped
`lsp` is registered as a top-level subparser in `main()` (lines 9539-9545)
via `agent.lsp.cli.register_subparser`, so it shows up in `hermes --help`
output alongside the other built-ins. The `_BUILTIN_SUBCOMMANDS` set used
by `_plugin_cli_discovery_needed` to short-circuit the ~500-650ms plugin
import pass did not list it, so every `hermes lsp ...` invocation paid
the full discovery cost despite being a fully-built-in command.

This is also caught by the parity guard added in #22120:
`tests/hermes_cli/test_startup_plugin_gating.py::test_builtin_set_covers_every_registered_subcommand`
has been failing on clean origin/main with:

    AssertionError: _BUILTIN_SUBCOMMANDS is missing these live
    subcommands: ['lsp']. Add them to hermes_cli/main.py::_BUILTIN_SUBCOMMANDS
    so plugin discovery can be skipped when the user targets them.

Fix: add `"lsp"` to the frozenset (alphabetical position between `logs`
and `mcp`). The accompanying `test_builtin_set_has_no_phantom_entries`
guard still passes because `lsp` is genuinely live — registered via the
guarded `try/except Exception` in main() since #24168.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 06:46:23 -07:00
Siddharth Balyan 942adf6179 fix(docker): chown .venv to hermes so lazy_deps can install platform packages (#24841)
The Dockerfile permissions section made /opt/hermes/.venv readable but not
writable by the hermes runtime user.  Since the 2026-05-12 policy change
moved messaging packages (discord.py, telegram, slack, etc.) out of [all]
and into lazy_deps.py, the Docker image no longer ships with them
pre-installed.  At first gateway boot, lazy_deps.ensure() tries to
`uv pip install` them into the venv but fails with EACCES because
site-packages is root-owned.

The result: every messaging platform adapter silently fails to load inside
Docker containers, producing only a cryptic "discord.py not installed"
warning despite the gateway being correctly configured.

Two-part fix:

1. Dockerfile: add /opt/hermes/.venv to the existing chown -R hermes:hermes
   line so the default (UID 10000) case works out of the box.

2. docker/entrypoint.sh: extend the needs_chown block to also re-chown the
   .venv when HERMES_UID is remapped. Without this, the build-time chown
   becomes stale when someone uses the documented HERMES_UID override in
   docker-compose.yml.

Fixes #21536
Related: #17674, #21543, #21755
2026-05-13 11:55:07 +05:30
Teknium 1e01b25e76 feat(providers): rename Alibaba Cloud to Qwen Cloud, reorder picker (#24835)
- Rename 'Alibaba Cloud (DashScope)' display label to 'Qwen Cloud'
  in CANONICAL_PROVIDERS (model picker, /model, hermes model TUI) and
  PROVIDER_REGISTRY (setup wizard prompts, status output).
- Move Qwen Cloud (alibaba) up to position 6 — directly below
  OpenAI Codex and above Xiaomi MiMo.
- Move Qwen OAuth (Portal) (qwen-oauth) to the bottom of the
  canonical provider list.

Provider slug 'alibaba' is unchanged — only the display label
moved. DashScope env var (DASHSCOPE_API_KEY) and base URL are
unchanged. The separate 'alibaba-coding-plan' plugin provider is
not affected.
2026-05-12 22:43:41 -07:00
Teknium 486b692ddd feat(nous): unified client=hermes-client-v<version> tag on every Portal request (#24779)
* feat(nous): unified client=hermes-client-v<version> tag on every Portal request

Every Hermes request to Nous Portal now carries the same
client=hermes-client-v<__version__> tag (e.g. client=hermes-client-v0.13.0
on this release), sourced live from hermes_cli.__version__. The release
script's regex bump auto-aligns it on every release.

Centralized in agent/portal_tags.py and wired into all four call sites:
- NousProfile.build_extra_body (main agent loop, every chat completion)
- auxiliary_client.NOUS_EXTRA_BODY + _build_call_kwargs (aux client)
- run_agent.py compression-summary fallback path
- tools/web_tools.py web_extract fallback

Replaces the client=aux marker added in #24194 with the unified version
tag. Tests assert against the helper output (invariant) rather than the
literal string, so they don't need updating on every release.

* feat(nous): cover /goal judge and kanban specify aux paths

Two aux-using surfaces bypassed call_llm by invoking
client.chat.completions.create() directly without extra_body, so they
were missing the unified Portal client tag:

- hermes_cli/goals.py — /goal standing-goal judge
- hermes_cli/kanban_specify.py — kanban triage specifier

Both now pass extra_body=get_auxiliary_extra_body() or None so they
inherit the version tag when the aux client points at Nous Portal, and
emit nothing otherwise (no tag leak to OpenRouter/Anthropic auxes).
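
A minimal sketch of the `... or None` pattern follows. The shape of the extra body (a `tags` list) and the `points_at_portal` flag are assumptions made for the example; the commit message only fixes the tag string format and the `extra_body=get_auxiliary_extra_body() or None` call shape.

```python
def client_tag(version):
    """Format the unified client tag from a version string."""
    return f"client=hermes-client-v{version}"


def auxiliary_extra_body(points_at_portal, version):
    """Return tag data for Portal-backed clients, an empty dict otherwise.

    The {"tags": [...]} layout is an assumption for illustration.
    """
    if not points_at_portal:
        return {}
    return {"tags": [client_tag(version)]}


def completion_kwargs(points_at_portal, version):
    """Build kwargs for a direct completions call."""
    # An empty dict is falsy, so `or None` means non-Portal providers
    # receive no extra_body at all rather than an empty object.
    return {"extra_body": auxiliary_extra_body(points_at_portal, version) or None}


if __name__ == "__main__":
    print(completion_kwargs(True, "0.13.0"))
    print(completion_kwargs(False, "0.13.0"))
```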
2026-05-12 20:49:20 -07:00
Teknium b06e999302 fix(cache): kill long-lived prefix layout — system prompt is now byte-static within a session (#24778)
The long-lived prefix-cache layout split the system prompt into stable/
context/volatile blocks and re-derived them on every API call. The
volatile tier (timestamp + memory snapshot + USER profile) ticks per
turn, so the system message bytes mutated mid-conversation and broke
upstream prompt caches (OpenRouter, Nous Portal, Anthropic).

Diagnosed via live wire-format diffing: an 8-turn conversation showed
OLD layout flipping system block[1] sha mid-session at the minute
boundary, dropping cached_tokens to 0 on that turn (cumulative
66.6% vs 83.3% for the single-block layout). Hermes invariant:
history (system + all but the last 1-2 messages) must be static.

Fix: drop the long-lived layout entirely. Single layout everywhere —
system_and_3 with one cached system string built once on first turn,
replayed verbatim on every subsequent turn. Loses cross-session 1h
prefix caching for Claude (the feature that motivated the split), but
within-session caching now actually works on every provider.
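
The build-once, replay-verbatim idea can be sketched as below. The `Session` class, its fields, and the prompt contents are hypothetical; the point is only that volatile data such as a timestamp is captured once and the resulting string is never re-derived.

```python
import hashlib
from datetime import datetime, timezone


class Session:
    """Toy session that freezes its system prompt on first use."""

    def __init__(self, persona):
        self.persona = persona
        self._system_prompt = None  # built lazily, then never rebuilt

    def system_prompt(self):
        if self._system_prompt is None:
            # Volatile data is read exactly once, at first-turn build time.
            started = datetime.now(timezone.utc).isoformat(timespec="seconds")
            self._system_prompt = f"{self.persona}\nSession started: {started}"
        return self._system_prompt

    def build_messages(self, history, user_turn):
        return [
            {"role": "system", "content": self.system_prompt()},
            *history,
            {"role": "user", "content": user_turn},
        ]


def system_digest(messages):
    """Hash the system message so byte-level drift is easy to spot."""
    return hashlib.sha256(messages[0]["content"].encode("utf-8")).hexdigest()


if __name__ == "__main__":
    session = Session("You are a helpful agent.")
    first = session.build_messages([], "hello")
    second = session.build_messages(first[1:], "and again")
    print(system_digest(first) == system_digest(second))
```

Comparing digests across turns is a cheap way to assert the invariant in a test, similar in spirit to the wire-format diffing mentioned above.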

Removed:
- run_agent.py: _use_long_lived_prefix_cache flag, _long_lived_cache_ttl,
  _supports_long_lived_anthropic_cache method, the long-lived branch in
  run_conversation, mark_tools_for_long_lived_cache call site
- agent/prompt_caching.py: apply_anthropic_cache_control_long_lived,
  mark_tools_for_long_lived_cache, _mark_system_stable_block helper
- hermes_cli/config.py: prompt_caching.long_lived_prefix and
  prompt_caching.long_lived_ttl config keys
- tests/agent/test_prompt_caching_live.py (entire file)
- tests/agent/test_prompt_caching.py: TestMarkToolsForLongLivedCache,
  TestApplyAnthropicCacheControlLongLived
- tests/run_agent/test_anthropic_prompt_cache_policy.py:
  TestSupportsLongLivedAnthropicCache

Targeted tests: 62/62 pass.
2026-05-12 20:46:04 -07:00
amathxbt 80374d4dd9 fix: approval DELETE pattern DOTALL flag allows newline bypass 2026-05-12 18:50:37 -07:00
AgentArcLab 8ac351407e fix(agent): clear stale config context_length on model switch
When switching models via /model, AIAgent._config_context_length was
never cleared, so the new model inherited the previous model's context
window instead of auto-detecting the correct one via
get_model_context_length().

Clear _config_context_length to None before the runtime field swap so
the full resolution chain (custom_providers per-model, endpoint probe,
models.dev, etc.) is re-evaluated for the newly selected model.
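
In sketch form, with a stand-in `Agent` class and a caller-supplied `detect` function in place of the real resolution chain (both are assumptions for illustration):

```python
class Agent:
    """Toy agent holding an optional configured context length."""

    def __init__(self, model, config_context_length=None):
        self.model = model
        self._config_context_length = config_context_length

    def context_length(self, detect):
        # A configured value wins; otherwise fall back to auto-detection.
        if self._config_context_length is not None:
            return self._config_context_length
        return detect(self.model)

    def switch_model(self, new_model):
        # Clear the stale override first so the new model is re-detected
        # instead of inheriting the previous model's window.
        self._config_context_length = None
        self.model = new_model


if __name__ == "__main__":
    windows = {"model-a": 128000, "model-b": 200000}  # made-up values
    agent = Agent("model-a", config_context_length=32000)
    print(agent.context_length(windows.get))
    agent.switch_model("model-b")
    print(agent.context_length(windows.get))
```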

Closes #21509
2026-05-12 18:50:04 -07:00
alblez a4289d74ac fix(test): use i18n t() for restart drain assertion
The test_restart_command_while_busy_requests_drain_without_interrupt test
was asserting against a hardcoded emoji string that was valid before the
i18n migration. After gateway/run.py switched to t("gateway.draining",
count=N), the test sees the translated output (or the raw key when the
locale catalog isn't resolved in xdist workers).

Fix by asserting against t("gateway.draining", count=1) — this produces
the correct expected value regardless of whether the locale file is
available in the test environment.
2026-05-12 18:49:33 -07:00
liuhao1024 1a4e8f7041 fix(gateway): make WhatsApp npm install timeout configurable
Default timeout raised from 60s to 300s (5 minutes) to accommodate
slower systems like Unraid NAS. Configurable via WHATSAPP_NPM_INSTALL_TIMEOUT
environment variable.
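
One way such an override might be read is sketched below. The helper name and the fallback behavior for invalid or non-positive values are assumptions; only the variable name and the 300s default come from the commit message.

```python
import os

DEFAULT_NPM_INSTALL_TIMEOUT = 300  # seconds, per the commit message


def npm_install_timeout(environ=None):
    """Read the timeout override, falling back to the default.

    Falling back on invalid or non-positive values is an assumption
    of this sketch, not confirmed behavior.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("WHATSAPP_NPM_INSTALL_TIMEOUT", "").strip()
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_NPM_INSTALL_TIMEOUT
    return value if value > 0 else DEFAULT_NPM_INSTALL_TIMEOUT


if __name__ == "__main__":
    print(npm_install_timeout({}))
    print(npm_install_timeout({"WHATSAPP_NPM_INSTALL_TIMEOUT": "600"}))
```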
2026-05-12 18:49:07 -07:00
AhmetArif0 420762f867 fix(tools): forward thread_id via metadata in _send_via_adapter live path
The live adapter path in _send_via_adapter called adapter.send() without
passing thread_id, while the standalone fallback path correctly forwarded
it. For plugin platforms (google_chat, teams, irc, line) running with the
gateway in-process, this caused every threaded reply to land as a new
top-level message instead of continuing the thread.

Matches the pattern already used by _send_matrix_via_adapter and
_send_feishu: build metadata={"thread_id": thread_id} and pass it through.
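
The forwarding pattern might look roughly like this. `RecordingAdapter` and the `send(chat_id, text, metadata=...)` signature are stand-ins invented for the sketch; only the `metadata={"thread_id": thread_id}` shape is taken from the commit message.

```python
import asyncio


class RecordingAdapter:
    """Stand-in adapter that records what it was asked to send."""

    def __init__(self):
        self.calls = []

    async def send(self, chat_id, text, metadata=None):
        self.calls.append((chat_id, text, metadata))


async def send_via_adapter(adapter, chat_id, text, thread_id=None):
    """Forward thread_id through metadata so replies stay in the thread."""
    metadata = {"thread_id": thread_id} if thread_id else None
    await adapter.send(chat_id, text, metadata=metadata)


if __name__ == "__main__":
    adapter = RecordingAdapter()
    asyncio.run(send_via_adapter(adapter, "room-1", "hello", thread_id="t-42"))
    asyncio.run(send_via_adapter(adapter, "room-1", "top-level"))
    print(adapter.calls)
```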
2026-05-12 18:48:44 -07:00
02356abc e77fd75c44 fix(wecom): update connection status after WebSocket reconnection
The WeCom adapter's _listen_loop() automatically reconnects when the
WebSocket drops, but it never called _mark_connected() after a successful
reconnection. This left the runtime status file (gateway_state.json) stuck
in "disconnected" even though the adapter was fully operational again.

Add self._mark_connected() right after _open_connection() succeeds so
that the dashboard and health probes report the correct state.

Tested by forcing a WebSocket close via the heartbeat loop and verifying
that the status file updated from "disconnected" back to "connected".
2026-05-12 18:48:17 -07:00
265 changed files with 25547 additions and 4902 deletions
+8
@@ -14,6 +14,14 @@
# LLM_MODEL is no longer read from .env — this line is kept for reference only.
# LLM_MODEL=anthropic/claude-opus-4.6
# =============================================================================
# LLM PROVIDER (NovitaAI)
# =============================================================================
# NovitaAI — 90+ models, pay-per-use
# Get your key at: https://novita.ai/settings/key-management
# NOVITA_API_KEY=
# NOVITA_BASE_URL=https://api.novita.ai/openai/v1 # Override default base URL
# =============================================================================
# LLM PROVIDER (Google AI Studio / Gemini)
# =============================================================================
+161 -34
@@ -28,9 +28,10 @@ permissions:
contents: read
# Concurrency: push/release runs are NEVER cancelled so every merge gets its
-# own SHA-tagged image; :latest is guarded separately by the move-latest job.
-# PR runs reuse a PR-scoped group with cancel-in-progress: true so rapid
-# pushes to the same PR collapse to the latest commit.
+# own SHA-tagged image; :main and :latest are guarded separately by the
+# move-main and move-latest jobs. PR runs reuse a PR-scoped group with
+# cancel-in-progress: true so rapid pushes to the same PR collapse to the
+# latest commit.
concurrency:
group: docker-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.event_name == 'pull_request' }}
@@ -91,10 +92,10 @@ jobs:
# pattern for multi-runner multi-platform builds.
#
# We apply the OCI revision label here (and again on arm64) because
# the move-latest job reads it off the linux/amd64 sub-manifest config
# of `:latest` to decide whether it's safe to advance. The label must
# be on each per-arch image — manifest lists themselves don't carry
# image config labels.
# the move-main / move-latest jobs read it off the linux/amd64
# sub-manifest config of the floating tag to decide whether it's safe
# to advance. The label must be on each per-arch image — manifest
# lists themselves don't carry image config labels.
- name: Push amd64 by digest
id: push
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
@@ -217,6 +218,8 @@ jobs:
timeout-minutes: 10
outputs:
pushed_sha_tag: ${{ steps.mark_pushed.outputs.pushed }}
pushed_release_tag: ${{ steps.mark_release_pushed.outputs.pushed }}
release_tag: ${{ steps.tag.outputs.tag }}
steps:
- name: Download digests
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
@@ -271,33 +274,43 @@ jobs:
IMAGE_NAME: ${{ env.IMAGE_NAME }}
TAG: ${{ steps.tag.outputs.tag }}
# Signal to move-latest that the SHA tag is live. Only on main pushes;
# releases don't trigger move-latest (they use their own release tag).
# Signal to move-main that the SHA tag is live. Only on main pushes;
# releases set pushed_release_tag instead.
- name: Mark SHA tag pushed
id: mark_pushed
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
run: echo "pushed=true" >> "$GITHUB_OUTPUT"
# Signal to move-latest that the release tag is live.
- name: Mark release tag pushed
id: mark_release_pushed
if: github.event_name == 'release'
run: echo "pushed=true" >> "$GITHUB_OUTPUT"
# ---------------------------------------------------------------------------
# Move :latest to point at the SHA tag the merge job pushed.
# Move :main to point at the SHA tag the merge job pushed.
#
# :main is the floating tag that tracks the tip of the main branch. Every
# merge to main retags :main forward. Users who want "latest dev build"
# pull :main; users who want stable releases pull :latest.
#
# The real serialization guarantee comes from the top-level concurrency
# group (`docker-${{ github.ref }}` with `cancel-in-progress: false`),
# which ensures at most one workflow run for this ref executes at a time.
# That means two move-latest steps for the same ref cannot overlap.
# That means two move-main steps for the same ref cannot overlap.
#
# This job has its own concurrency group as defense-in-depth: if the
# top-level group is ever loosened, queued move-latests will run serially
# top-level group is ever loosened, queued move-mains will run serially
# in arrival order, each one running the ancestor check below and either
# advancing :latest or skipping. `cancel-in-progress: false` matches the
# advancing :main or skipping. `cancel-in-progress: false` matches the
# top-level setting — we don't want rapid pushes to cancel a queued
# move-latest, because the ancestor check is the real safety mechanism
# and queueing is cheap (move-latest is a ~30s registry op).
# move-main, because the ancestor check is the real safety mechanism
# and queueing is cheap (move-main is a ~30s registry op).
#
# Combined with the ancestor check, this means :latest only ever moves
# Combined with the ancestor check, this means :main only ever moves
# forward in git history.
# ---------------------------------------------------------------------------
move-latest:
move-main:
if: |
github.repository == 'NousResearch/hermes-agent'
&& github.event_name == 'push'
@@ -307,7 +320,7 @@ jobs:
runs-on: ubuntu-latest
timeout-minutes: 10
concurrency:
group: docker-move-latest-${{ github.ref }}
group: docker-move-main-${{ github.ref }}
cancel-in-progress: false
steps:
- name: Checkout code
@@ -324,13 +337,13 @@ jobs:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
# Read the git revision label off the current :latest manifest, then
# Read the git revision label off the current :main manifest, then
# use `git merge-base --is-ancestor` to check whether our commit is a
# descendant of it. If :latest doesn't exist yet, or its label is
# descendant of it. If :main doesn't exist yet, or its label is
# missing, we treat that as "safe to publish". If another run already
# advanced :latest past us (or diverged), we skip and leave it alone.
- name: Decide whether to move :latest
id: latest_check
# advanced :main past us (or diverged), we skip and leave it alone.
- name: Decide whether to move :main
id: main_check
run: |
set -euo pipefail
image=nousresearch/hermes-agent
@@ -338,6 +351,119 @@ jobs:
# Pull the JSON for the linux/amd64 sub-manifest's config and extract
# the OCI revision label with jq — Go template field access can't
# handle dots in map keys, so using json+jq is the robust route.
image_json=$(
docker buildx imagetools inspect "${image}:main" \
--format '{{ json (index .Image "linux/amd64") }}' \
2>/dev/null || true
)
if [ -z "${image_json}" ]; then
echo "No existing :main (or inspect failed) — safe to publish."
echo "push_main=true" >> "$GITHUB_OUTPUT"
exit 0
fi
current_sha=$(
printf '%s' "${image_json}" \
| jq -r '.config.Labels."org.opencontainers.image.revision" // ""'
)
if [ -z "${current_sha}" ]; then
echo "Registry :main has no revision label — safe to publish."
echo "push_main=true" >> "$GITHUB_OUTPUT"
exit 0
fi
echo "Registry :main is at ${current_sha}"
echo "This run is at ${GITHUB_SHA}"
if [ "${current_sha}" = "${GITHUB_SHA}" ]; then
echo ":main already points at our SHA — nothing to do."
echo "push_main=false" >> "$GITHUB_OUTPUT"
exit 0
fi
# Make sure we have the :main commit locally for merge-base.
if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
git fetch --no-tags --prune origin \
"+refs/heads/main:refs/remotes/origin/main" \
|| true
fi
if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
echo "Registry :main points at an unknown commit (${current_sha}); refusing to overwrite."
echo "push_main=false" >> "$GITHUB_OUTPUT"
exit 0
fi
# Our SHA must be a descendant of the current :main to be safe.
if git merge-base --is-ancestor "${current_sha}" "${GITHUB_SHA}"; then
echo "Our commit is a descendant of :main — safe to advance."
echo "push_main=true" >> "$GITHUB_OUTPUT"
else
echo "Another run advanced :main past us (or diverged) — leaving it alone."
echo "push_main=false" >> "$GITHUB_OUTPUT"
fi
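The decision logic of the `Decide whether to move :main` step can be sketched as a pure function. This is a minimal sketch, not the workflow's implementation: `should_move_tag`, `commit_exists`, and `is_ancestor` are hypothetical names, and the two callables stand in for `git cat-file -e` and `git merge-base --is-ancestor`.

```python
from typing import Callable, Optional


def should_move_tag(
    current_sha: Optional[str],
    our_sha: str,
    commit_exists: Callable[[str], bool],
    is_ancestor: Callable[[str, str], bool],
) -> bool:
    """Decide whether a floating tag may advance to ``our_sha``.

    ``current_sha`` is the revision label read from the registry, or
    None / "" when the tag or its label is missing.
    """
    if not current_sha:
        return True   # no tag or no label yet: safe to publish
    if current_sha == our_sha:
        return False  # tag already points at our commit: nothing to do
    if not commit_exists(current_sha):
        return False  # unknown commit: refuse to overwrite
    # Advance only when the tagged commit is an ancestor of ours.
    return is_ancestor(current_sha, our_sha)
```

With a linear history `a -> b -> c`, moving the tag from `a` to `c` is allowed, while a stale run at `a` is refused once the tag is at `c`, so the tag never moves backwards.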
# Retag the already-pushed SHA manifest as :main. This is a registry-side
# operation (no rebuild, no layer re-push), so it's quick and atomic
# per-tag. The ancestor check above plus this job's
# `cancel-in-progress: false` concurrency group together guarantee we
# only ever move :main forward in git history.
- name: Move :main to this SHA
if: steps.main_check.outputs.push_main == 'true'
run: |
set -euo pipefail
image=nousresearch/hermes-agent
docker buildx imagetools create \
--tag "${image}:main" \
"${image}:sha-${GITHUB_SHA}"
# ---------------------------------------------------------------------------
# Move :latest to point at the release tag the merge job pushed.
#
# :latest is the floating tag that tracks the most recent stable release.
# Only `release: published` events advance it — never main pushes.
#
# We still run an ancestor check against the existing :latest so that a
# backport release on an older branch (e.g. patching v1.1.5 after v1.2.3
# is out) doesn't drag :latest backwards. The check is the same shape as
# move-main: read the OCI revision label off the current :latest, look up
# that commit in git, and only advance if our release commit is a strict
# descendant.
# ---------------------------------------------------------------------------
move-latest:
if: |
github.repository == 'NousResearch/hermes-agent'
&& github.event_name == 'release'
&& needs.merge.outputs.pushed_release_tag == 'true'
needs: merge
runs-on: ubuntu-latest
timeout-minutes: 10
concurrency:
group: docker-move-latest
cancel-in-progress: false
steps:
- name: Checkout code
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
fetch-depth: 1000
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
- name: Log in to Docker Hub
uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Decide whether to move :latest
id: latest_check
run: |
set -euo pipefail
image=nousresearch/hermes-agent
image_json=$(
docker buildx imagetools inspect "${image}:latest" \
--format '{{ json (index .Image "linux/amd64") }}' \
@@ -362,7 +488,7 @@ jobs:
fi
echo "Registry :latest is at ${current_sha}"
echo "This run is at ${GITHUB_SHA}"
echo "This release is at ${GITHUB_SHA}"
if [ "${current_sha}" = "${GITHUB_SHA}" ]; then
echo ":latest already points at our SHA — nothing to do."
@@ -371,6 +497,7 @@ jobs:
fi
# Make sure we have the :latest commit locally for merge-base.
# Releases can be cut from any branch, so fetch broadly.
if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
git fetch --no-tags --prune origin \
"+refs/heads/main:refs/remotes/origin/main" \
@@ -383,25 +510,25 @@ jobs:
exit 0
fi
# Our SHA must be a descendant of the current :latest to be safe.
# Our release SHA must be a descendant of the current :latest.
# Backport releases on older branches won't satisfy this and will
# be left alone — :latest stays on the newer release.
if git merge-base --is-ancestor "${current_sha}" "${GITHUB_SHA}"; then
echo "Our commit is a descendant of :latest — safe to advance."
echo "Our release commit is a descendant of :latest — safe to advance."
echo "push_latest=true" >> "$GITHUB_OUTPUT"
else
echo "Another run advanced :latest past us (or diverged) — leaving it alone."
echo "Existing :latest is newer than this release (likely a backport) — leaving it alone."
echo "push_latest=false" >> "$GITHUB_OUTPUT"
fi
# Retag the already-pushed SHA manifest as :latest. This is a registry-
# side operation — no rebuild, no layer re-push — so it's quick and
# atomic per-tag. The ancestor check above plus the cancel-in-progress
# concurrency on this job together guarantee we only ever move :latest
# forward in git history.
- name: Move :latest to this SHA
# Retag the already-pushed release manifest as :latest.
- name: Move :latest to this release tag
if: steps.latest_check.outputs.push_latest == 'true'
env:
RELEASE_TAG: ${{ needs.merge.outputs.release_tag }}
run: |
set -euo pipefail
image=nousresearch/hermes-agent
docker buildx imagetools create \
--tag "${image}:latest" \
"${image}:sha-${GITHUB_SHA}"
"${image}:${RELEASE_TAG}"
+91
@@ -513,6 +513,17 @@ generic plugin surface (new hook, new ctx method) — never hardcode
plugin-specific logic into core. PR #5295 removed 95 lines of hardcoded
honcho argparse from `main.py` for exactly this reason.
**No new in-tree memory providers (policy, May 2026):** the set of
built-in memory providers under `plugins/memory/` is closed. New memory
backends must ship as **standalone plugin repos** that users install
into `~/.hermes/plugins/` (or via pip entry points) — they implement
the same `MemoryProvider` ABC, register through the same discovery
path, and integrate via `hermes memory setup` / `post_setup()` without
landing in this tree. PRs that add a new directory under
`plugins/memory/` will be closed with a pointer to publish the
provider as its own repo. Existing in-tree providers stay; bug fixes
to them are welcome.
### Model-provider plugins (`plugins/model-providers/<name>/`)
Every inference backend (openrouter, anthropic, gmi, deepseek, nvidia, …)
@@ -580,6 +591,86 @@ during setup, injected at load time).
Top-level `tags:` and `category:` are also accepted and mirrored from
`metadata.hermes.*` by the loader.
### Skill authoring standards (HARDLINE)
Every new or modernized skill — bundled, optional, or contributed —
must meet these standards before merge. Reviewers reject PRs that
violate them.
1. **`description` ≤ 60 characters, one sentence, ends with a period.**
Long descriptions bloat skill listings and dilute the model's
attention when many skills are loaded. State the capability, not
the implementation. No marketing words ("powerful",
"comprehensive", "seamless", "advanced"). Don't repeat the skill
name. Verify with:
```python
import re, pathlib
m = re.search(r'^description: (.*)$',
    pathlib.Path('skills/<cat>/<name>/SKILL.md').read_text(),
    re.MULTILINE)
assert m, "no description line found"
assert len(m.group(1)) <= 60, len(m.group(1))
```
2. **Tools referenced in SKILL.md prose must be native Hermes tools or
MCP servers the skill explicitly expects.** When the skill needs a
capability, point at the proper tool by name in backticks
(`` `terminal` ``, `` `web_extract` ``, `` `read_file` ``,
`` `patch` ``, `` `search_files` ``, `` `vision_analyze` ``,
`` `browser_navigate` ``, `` `delegate_task` ``, etc.). Do NOT
name shell utilities the agent already has wrapped — `grep` →
`search_files`, `cat`/`head`/`tail` → `read_file`, `sed`/`awk` →
`patch`, `find`/`ls` → `search_files target='files'`. If the skill
depends on an MCP server, name the MCP server and document the
expected setup in `## Prerequisites`. Anything else (third-party
CLIs, shell pipelines, etc.) is fair game inside script files but
should not be the headline interaction surface in the prose.
3. **`platforms:` gating audited against actual script imports.**
Skills that use POSIX-only primitives (`fcntl`, `termios`,
`os.setsid`, `os.kill(pid, 0)` for liveness, `/proc`, `/tmp`
hardcoded, `signal.SIGKILL`, bash heredocs, `osascript`, `apt`,
`systemctl`) must declare their supported platforms. Default
posture: try to fix it cross-platform first — `tempfile.gettempdir`,
`pathlib.Path`, `psutil.pid_exists`, Python-level filtering instead
of `grep`. Gate to a narrower set only when the dependency is
genuinely platform-bound.
4. **`author` credits the human contributor first.** For external
contributions, the contributor's real name + GitHub handle goes
first; "Hermes Agent" is the secondary collaborator. If the
contributor's commit shows "Hermes Agent" as author (because they
used Hermes to draft the skill), replace it with their actual name
— credit the human, not the tool.
5. **SKILL.md body uses the modern section order.** `# <Skill> Skill`
title, 2-3 sentence intro stating what it does and doesn't do,
`## When to Use`, `## Prerequisites`, `## How to Run`,
`## Quick Reference`, `## Procedure`, `## Pitfalls`,
`## Verification`. Target ~200 lines for a complex skill,
~100 lines for a simple one. Cut redundant intro fluff, marketing
prose, and re-explanations of env vars already in
`## Prerequisites`.
6. **Scripts go in `scripts/`, references in `references/`,
templates in `templates/`.** Don't expect the model to inline-write
parsers, XML walkers, or non-trivial logic every call — ship a
helper script. Reference it from SKILL.md by path relative to the
skill directory.
7. **Tests live at `tests/skills/test_<skill>_skill.py`** and use only
stdlib + pytest + `unittest.mock`. No live network calls. Run via
`scripts/run_tests.sh tests/skills/test_<skill>_skill.py -q`.
8. **`.env.example` additions are isolated to a clearly delimited
block.** Don't touch the surrounding file — contributor-supplied
`.env.example` versions are usually stale and edits outside the
skill's own block must be dropped during salvage.
The full salvage / modernization checklist for external skill PRs
lives in the `hermes-agent-dev` skill at
`references/new-skill-pr-salvage.md` — load it before polishing
contributor skill PRs.
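The `platforms:` audit in standard 3 can be partly automated. The following is a minimal sketch under stated assumptions: `POSIX_ONLY_PATTERNS` and `find_posix_only` are hypothetical names, the pattern table covers only a subset of the primitives listed above, and a regex scan is a heuristic rather than a replacement for reading the script.

```python
import re
from typing import List

# Illustrative subset of the POSIX-only markers named in standard 3.
POSIX_ONLY_PATTERNS = {
    "fcntl": r"^\s*(?:import|from)\s+fcntl\b",
    "termios": r"^\s*(?:import|from)\s+termios\b",
    "os.setsid": r"\bos\.setsid\b",
    "signal.SIGKILL": r"\bsignal\.SIGKILL\b",
    "/proc": r"[\"']/proc/",
    "/tmp": r"[\"']/tmp/",
}


def find_posix_only(source: str) -> List[str]:
    """Return the names of POSIX-only markers found in script source."""
    hits = []
    for name, pattern in POSIX_ONLY_PATTERNS.items():
        if re.search(pattern, source, re.MULTILINE):
            hits.append(name)
    return hits
```

A script with no hits may still be platform-bound for other reasons; a script with hits should either be rewritten with the cross-platform alternatives above or declare `platforms:`.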
---
## Toolsets
+70
@@ -49,6 +49,24 @@ If your skill is specialized, community-contributed, or niche, it's better suite
---
## Memory Providers: Ship as a Standalone Plugin
**We are no longer accepting new memory providers into this repo.** The set of built-in providers under `plugins/memory/` (honcho, mem0, supermemory, byterover, hindsight, holographic, openviking, retaindb) is closed. If you want to add a new memory backend, publish it as a **standalone plugin repo** that users install into `~/.hermes/plugins/` (or via a pip entry point).
Standalone memory plugins:
- Implement the same `MemoryProvider` ABC (`agent/memory_provider.py`) — `sync_turn`, `prefetch`, `shutdown`, and optionally `post_setup(hermes_home, config)` for setup-wizard integration
- Use the same discovery system — `discover_memory_providers()` picks them up from user/project plugin directories and pip entry points
- Integrate with `hermes memory setup` via `post_setup()` — no need to touch core code
- Can register their own CLI subcommands via `register_cli(subparser)` in a `cli.py` file
- Get all the same lifecycle hooks and config plumbing as in-tree providers
PRs that add a new directory under `plugins/memory/` will be closed with a pointer to publish the provider as its own repo. Existing in-tree providers stay; bug fixes to them are welcome.
This isn't a quality bar — it's a coupling-and-maintenance decision. Memory providers are the most common plugin type and they shouldn't all live in this tree.
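A standalone provider might look roughly like the sketch below. The `MemoryProvider` class here is a stand-in so the example runs on its own; the real ABC lives in `agent/memory_provider.py`. The method names come from the list above, but the parameter lists and `InMemoryProvider` itself are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class MemoryProvider(ABC):
    """Stand-in for the real ABC in agent/memory_provider.py.

    Method names follow the list above; the parameters are assumptions.
    """

    @abstractmethod
    def sync_turn(self, user_message: str, assistant_message: str) -> None: ...

    @abstractmethod
    def prefetch(self, query: str) -> List[str]: ...

    @abstractmethod
    def shutdown(self) -> None: ...


class InMemoryProvider(MemoryProvider):
    """Hypothetical backend that keeps turns in a Python list."""

    def __init__(self) -> None:
        self._turns: List[Dict[str, str]] = []

    def sync_turn(self, user_message: str, assistant_message: str) -> None:
        # Record one completed conversation turn.
        self._turns.append({"user": user_message, "assistant": assistant_message})

    def prefetch(self, query: str) -> List[str]:
        # Return assistant replies whose user message mentions the query.
        q = query.lower()
        return [t["assistant"] for t in self._turns if q in t["user"].lower()]

    def shutdown(self) -> None:
        # Release resources; here that just means dropping stored turns.
        self._turns.clear()
```

A real plugin would subclass the actual ABC and persist to its backend instead of a list.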
---
## Development Setup
### Prerequisites
@@ -461,6 +479,58 @@ Gateway and messaging sessions never collect secrets in-band; they instruct the
See `skills/gifs/gif-search/` and `skills/email/himalaya/` for examples.
### Skill authoring standards (HARDLINE)
Every new or modernized skill — bundled, optional, or contributed — must meet these standards before merge. Reviewers reject PRs that violate them.
1. **`description` ≤ 60 characters, one sentence, ends with a period.** Long descriptions bloat the skill listing UI and dilute the model's attention when many skills are loaded. State the capability, not the implementation. No marketing words ("powerful", "comprehensive", "seamless", "advanced"). Don't repeat the skill name. Verify with:
```python
import re, pathlib
m = re.search(r'^description: (.*)$',
    pathlib.Path('skills/<cat>/<name>/SKILL.md').read_text(),
    re.MULTILINE)
assert m, "no description line found"
assert len(m.group(1)) <= 60, len(m.group(1))
```
Good: `Search arXiv papers by keyword, author, category, or ID.`
Bad: `A powerful and comprehensive skill that allows the agent to search arXiv for relevant academic papers using various criteria including keywords, authors, and categories.`
2. **Tools referenced in SKILL.md prose must be native Hermes tools or MCP servers the skill explicitly expects.** When the skill needs a capability, point at the proper tool by name in backticks: `` `terminal` ``, `` `web_extract` ``, `` `web_search` ``, `` `read_file` ``, `` `write_file` ``, `` `patch` ``, `` `search_files` ``, `` `vision_analyze` ``, `` `browser_navigate` ``, `` `delegate_task` ``, `` `image_generate` ``, `` `text_to_speech` ``, `` `cronjob` ``, `` `memory` ``, `` `skill_view` ``, `` `todo` ``, `` `execute_code` ``.
Do NOT name shell utilities the agent already has wrapped:
| Don't say | Say |
|---|---|
| `grep`, `rg` | `search_files` |
| `cat`, `head`, `tail` | `read_file` |
| `sed`, `awk` | `patch` |
| `find`, `ls` | `search_files` (with `target='files'`) |
| `curl` for content extraction | `web_extract` |
| `echo > file`, `cat <<EOF` | `write_file` |
If the skill depends on an MCP server, name the MCP server and document its setup in `## Prerequisites`. Third-party CLIs (e.g. `ffmpeg`, `gh`, a specific SDK) are fine to invoke from inside script files, but the prose should frame the interaction as "invoke through the `terminal` tool", not as a manual shell session.
3. **`platforms:` gating audited against actual script imports.** Skills that use POSIX-only primitives (`fcntl`, `termios`, `os.setsid`, `os.kill(pid, 0)` for liveness, `/proc`, hardcoded `/tmp` paths, `signal.SIGKILL`, bash heredocs, `osascript`, `apt`, `systemctl`) must declare their supported platforms via the `platforms:` frontmatter. Default posture is to fix it cross-platform first — `tempfile.gettempdir()`, `pathlib.Path`, `psutil.pid_exists()`, Python-level filtering instead of `grep`. Gate to a narrower set only when the dependency is genuinely platform-bound (e.g. `osascript` is macOS-only, `/proc` is Linux-only).
4. **`author` credits the human contributor first.** For external contributions, the contributor's real name + GitHub handle goes first (`Jane Doe (jane-doe)`); "Hermes Agent" is the secondary collaborator. If the contributor's commit shows "Hermes Agent" as author because they used Hermes to draft the skill, replace it with their actual name — credit the human, not the tool.
5. **SKILL.md body uses the modern section order.** `# <Skill> Skill` title, 2-3 sentence intro stating what it does and what it doesn't do, then:
- `## When to Use` — trigger conditions
- `## Prerequisites` — env vars, install steps, MCP setup, API key sourcing
- `## How to Run` — canonical invocation through the `terminal` tool
- `## Quick Reference` — flat command/API reference
- `## Procedure` — numbered steps with copy-paste commands
- `## Pitfalls` — known limits, rate limits, things that look broken but aren't
- `## Verification` — single command that proves the skill works
Target ~200 lines for a complex skill, ~100 lines for a simple one. Cut redundant intro fluff, marketing prose, and re-explanations of env vars already documented in `## Prerequisites`.
6. **Scripts go in `scripts/`, references in `references/`, templates in `templates/`.** Don't expect the model to inline-write parsers, XML walkers, or non-trivial logic every call — ship a helper script. Reference scripts from SKILL.md by path relative to the skill directory.
7. **Tests live at `tests/skills/test_<skill>_skill.py`** and use only stdlib + pytest + `unittest.mock`. No live network calls. Run via `scripts/run_tests.sh tests/skills/test_<skill>_skill.py -q`. Must pass under the hermetic CI env (no API keys leaking through). Use `monkeypatch` and `tmp_path` for any env-var or filesystem dependencies.
8. **`.env.example` additions are isolated to a clearly delimited block.** Don't touch the surrounding file — contributor-supplied `.env.example` versions are usually stale, and edits outside the skill's own block will be dropped during salvage. Comment all values with `#` (it's documentation, not live config).
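The checks in standard 1 can be combined into one helper. This is a minimal sketch, assuming the description sits on a single unquoted `description:` line; `lint_description` and `MARKETING_WORDS` are hypothetical names, not part of the repo.

```python
import re
from typing import List

MARKETING_WORDS = ("powerful", "comprehensive", "seamless", "advanced")


def lint_description(skill_md: str, max_len: int = 60) -> List[str]:
    """Return a list of problems with the frontmatter ``description``."""
    match = re.search(r"^description: (.*)$", skill_md, re.MULTILINE)
    if match is None:
        return ["missing description"]
    text = match.group(1).strip()
    problems = []
    if len(text) > max_len:
        problems.append(f"too long ({len(text)} > {max_len})")
    if not text.endswith("."):
        problems.append("does not end with a period")
    for word in MARKETING_WORDS:
        if re.search(rf"\b{word}\b", text, re.IGNORECASE):
            problems.append(f"marketing word: {word}")
    return problems
```

An empty list means the description passes these three checks; the "one sentence" and "don't repeat the skill name" rules still need a human reviewer.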
### Skill guidelines
- **No external dependencies unless absolutely necessary.** Prefer stdlib Python, curl, and existing Hermes tools (`web_extract`, `terminal`, `read_file`).
+5 -1
@@ -94,9 +94,13 @@ RUN cd web && npm run build && \
# hermes_cli/main.py succeeds (see #18800). /opt/hermes/web is build-time
# only (HERMES_WEB_DIST points at hermes_cli/web_dist) and is intentionally
# not chowned here.
# The .venv MUST be hermes-writable so lazy_deps.py can install platform
# packages (discord.py, telegram, slack, etc.) at first gateway boot.
# Without this, `uv pip install` fails with EACCES and all messaging
# adapters silently fail to load. See tools/lazy_deps.py.
USER root
RUN chmod -R a+rX /opt/hermes && \
chown -R hermes:hermes /opt/hermes/ui-tui /opt/hermes/node_modules
chown -R hermes:hermes /opt/hermes/.venv /opt/hermes/ui-tui /opt/hermes/node_modules
# Start as root so the entrypoint can usermod/groupmod + gosu.
# If HERMES_UID is unset, the entrypoint drops to the default hermes user (10000).
+1 -1
@@ -14,7 +14,7 @@
**The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.
Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.
Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [NovitaAI](https://novita.ai) (AI-native cloud for Model API, Agent Sandbox, and GPU Cloud), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.
<table>
<tr><td><b>A real terminal interface</b></td><td>Full TUI with multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output.</td></tr>
+94 -33
@@ -1,10 +1,11 @@
"""ACP permission bridging — maps ACP approval requests to hermes approval callbacks."""
"""ACP permission bridging for Hermes dangerous-command approvals."""
from __future__ import annotations
import asyncio
import logging
from concurrent.futures import TimeoutError as FutureTimeout
from itertools import count
from typing import Callable
from acp.schema import (
@@ -14,24 +15,87 @@ from acp.schema import (
logger = logging.getLogger(__name__)
# Maps ACP PermissionOptionKind -> hermes approval result strings
_KIND_TO_HERMES = {
# Maps ACP permission option ids to Hermes approval result strings.
# Option ids are stable across both the ``allow_permanent=True`` and
# ``allow_permanent=False`` paths even though the option list differs.
_OPTION_ID_TO_HERMES = {
"allow_once": "once",
"allow_session": "session",
"allow_always": "always",
"reject_once": "deny",
"reject_always": "deny",
"deny": "deny",
}
_PERMISSION_REQUEST_IDS = count(1)
def _build_permission_options(*, allow_permanent: bool) -> list[PermissionOption]:
"""Return ACP options that match Hermes approval semantics."""
options = [
PermissionOption(option_id="allow_once", kind="allow_once", name="Allow once"),
PermissionOption(
option_id="allow_session",
# ACP has no session-scoped kind, so use the closest persistent
# hint while keeping Hermes semantics in the option id.
kind="allow_always",
name="Allow for session",
),
]
if allow_permanent:
options.append(
PermissionOption(
option_id="allow_always",
kind="allow_always",
name="Allow always",
),
)
options.append(PermissionOption(option_id="deny", kind="reject_once", name="Deny"))
return options
def _build_permission_tool_call(command: str, description: str):
"""Return the ACP tool-call update attached to a permission request.
``request_permission`` expects a ``ToolCallUpdate`` payload — produced
by ``_acp.update_tool_call`` — not a ``ToolCallStart``. Each request
gets a unique ``perm-check-N`` id so concurrent requests don't collide.
"""
import acp as _acp
tool_call_id = f"perm-check-{next(_PERMISSION_REQUEST_IDS)}"
return _acp.update_tool_call(
tool_call_id,
title=description,
kind="execute",
status="pending",
content=[_acp.tool_content(_acp.text_block(f"$ {command}"))],
raw_input={"command": command, "description": description},
)
def _map_outcome_to_hermes(outcome: object, *, allowed_option_ids: set[str]) -> str:
"""Map an ACP permission outcome into Hermes approval strings."""
if not isinstance(outcome, AllowedOutcome):
return "deny"
option_id = outcome.option_id
if option_id not in allowed_option_ids:
logger.warning("Permission request returned unknown option_id: %s", option_id)
return "deny"
return _OPTION_ID_TO_HERMES.get(option_id, "deny")
def make_approval_callback(
request_permission_fn: Callable,
loop: asyncio.AbstractEventLoop,
session_id: str,
timeout: float = 60.0,
) -> Callable[[str, str], str]:
) -> Callable[..., str]:
"""
Return a hermes-compatible ``approval_callback(command, description) -> str``
that bridges to the ACP client's ``request_permission`` call.
Return a Hermes-compatible approval callback that bridges to ACP.
The callback accepts ``command`` and ``description`` plus optional
keyword arguments such as ``allow_permanent`` used by
``tools.approval.prompt_dangerous_approval()``.
Args:
request_permission_fn: The ACP connection's ``request_permission`` coroutine.
@@ -40,41 +104,38 @@ def make_approval_callback(
timeout: Seconds to wait for a response before auto-denying.
"""
def _callback(command: str, description: str) -> str:
options = [
PermissionOption(option_id="allow_once", kind="allow_once", name="Allow once"),
PermissionOption(option_id="allow_always", kind="allow_always", name="Allow always"),
PermissionOption(option_id="deny", kind="reject_once", name="Deny"),
]
import acp as _acp
tool_call = _acp.start_tool_call("perm-check", command, kind="execute")
coro = request_permission_fn(
session_id=session_id,
tool_call=tool_call,
options=options,
)
def _callback(
command: str,
description: str,
*,
allow_permanent: bool = True,
**_: object,
) -> str:
options = _build_permission_options(allow_permanent=allow_permanent)
future = None
try:
tool_call = _build_permission_tool_call(command, description)
coro = request_permission_fn(
session_id=session_id,
tool_call=tool_call,
options=options,
)
future = asyncio.run_coroutine_threadsafe(coro, loop)
response = future.result(timeout=timeout)
except (FutureTimeout, Exception) as exc:
if future is not None:
future.cancel()
logger.warning("Permission request timed out or failed: %s", exc)
return "deny"
if response is None:
return "deny"
outcome = response.outcome
if isinstance(outcome, AllowedOutcome):
option_id = outcome.option_id
# Look up the kind from our options list
for opt in options:
if opt.option_id == option_id:
return _KIND_TO_HERMES.get(opt.kind, "deny")
return "once" # fallback for unknown option_id
else:
return "deny"
allowed_option_ids = {option.option_id for option in options}
return _map_outcome_to_hermes(
response.outcome,
allowed_option_ids=allowed_option_ids,
)
return _callback
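The effect of mapping by option id, and of rejecting ids that were not offered in the current request, can be shown in isolation. This is a sketch only: the dataclasses are stand-ins for the `acp.schema` types (only `option_id` is modeled), and `map_outcome` and `DeniedOutcome` are hypothetical names that mirror `_map_outcome_to_hermes` above without the logging.

```python
from dataclasses import dataclass
from typing import Set

# Option ids built by _build_permission_options, mapped to Hermes results.
OPTION_ID_TO_HERMES = {
    "allow_once": "once",
    "allow_session": "session",
    "allow_always": "always",
    "deny": "deny",
}


@dataclass
class AllowedOutcome:
    """Stand-in for acp.schema.AllowedOutcome (only option_id is modeled)."""
    option_id: str


@dataclass
class DeniedOutcome:
    """Hypothetical stand-in for any non-allowed outcome."""


def map_outcome(outcome: object, allowed_option_ids: Set[str]) -> str:
    """Map an outcome to a Hermes approval string, denying by default."""
    if not isinstance(outcome, AllowedOutcome):
        return "deny"
    if outcome.option_id not in allowed_option_ids:
        return "deny"  # this id was never offered in this request
    return OPTION_ID_TO_HERMES.get(outcome.option_id, "deny")
```

When `allow_permanent=False`, `allow_always` is not in the offered set, so a client that returns it anyway is denied rather than granted a permanent approval. The removed code returned `"once"` for an unknown option id.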
+2 -3
@@ -1305,9 +1305,8 @@ def convert_tools_to_anthropic(tools: List[Dict]) -> List[Dict]:
),
}
# Forward cache_control marker when present on the OpenAI-format
# tool dict (set by ``mark_tools_for_long_lived_cache``). Anthropic's
# tools array supports cache_control on the last tool to cache the
# entire schema cross-session.
# tool dict. Anthropic's tools array supports cache_control on the
# last tool to cache the entire schema cross-session.
cache_control = t.get("cache_control")
if isinstance(cache_control, dict):
anthropic_tool["cache_control"] = dict(cache_control)
+29 -4
@@ -382,7 +382,28 @@ _AI_GATEWAY_HEADERS = {
# Nous Portal extra_body for product attribution.
# Callers should pass this as extra_body in chat.completions.create()
# when the auxiliary client is backed by Nous Portal.
NOUS_EXTRA_BODY = {"tags": ["product=hermes-agent", "client=aux"]}
#
# The tags are computed from agent.portal_tags so the client= marker stays
# in lockstep with hermes_cli.__version__ across every Portal call site
# (main loop, aux, compression, web_extract). Do not inline a literal here;
# see agent/portal_tags.py for the rationale.
from agent.portal_tags import nous_portal_tags as _nous_portal_tags
def _nous_extra_body() -> dict:
"""Return a fresh Nous Portal ``extra_body`` dict.
Computed at call time so a hot-reloaded ``hermes_cli.__version__`` is
reflected without restarting long-running processes.
"""
return {"tags": _nous_portal_tags()}
# Backwards-compatible module attribute. Some callers (tests, third-party
# plugins) read ``NOUS_EXTRA_BODY`` directly; keep it as a snapshot of the
# current tags. Callers that need the freshest value should call
# ``_nous_extra_body()`` or import ``nous_portal_tags`` directly.
NOUS_EXTRA_BODY = _nous_extra_body()
# Set at resolve time — True if the auxiliary client points to Nous Portal
auxiliary_is_nous: bool = False
@@ -1386,6 +1407,7 @@ def _try_openrouter(explicit_api_key: str = None) -> Tuple[Optional[OpenAI], Opt
if pool_present:
or_key = explicit_api_key or _pool_runtime_api_key(entry)
if not or_key:
_mark_provider_unhealthy("openrouter", ttl=60)
return None, None
base_url = _pool_runtime_base_url(entry, OPENROUTER_BASE_URL) or OPENROUTER_BASE_URL
logger.debug("Auxiliary client: OpenRouter via pool")
@@ -1394,6 +1416,7 @@ def _try_openrouter(explicit_api_key: str = None) -> Tuple[Optional[OpenAI], Opt
or_key = explicit_api_key or os.getenv("OPENROUTER_API_KEY")
if not or_key:
_mark_provider_unhealthy("openrouter", ttl=60)
return None, None
logger.debug("Auxiliary client: OpenRouter")
return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL,
@@ -1425,6 +1448,7 @@ def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
"Auxiliary: skipping Nous Portal (rate-limited, resets in %.0fs)",
_remaining,
)
_mark_provider_unhealthy("nous", ttl=_remaining)
return None, None
except Exception:
pass
@@ -1432,6 +1456,7 @@ def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
nous = _read_nous_auth()
runtime = _resolve_nous_runtime_api(force_refresh=False)
if runtime is None and not nous:
_mark_provider_unhealthy("nous", ttl=60)
return None, None
global auxiliary_is_nous
auxiliary_is_nous = True
@@ -3437,7 +3462,7 @@ def get_auxiliary_extra_body() -> dict:
Includes Nous Portal product tags when the auxiliary client is backed
by Nous Portal. Returns empty dict otherwise.
"""
return dict(NOUS_EXTRA_BODY) if auxiliary_is_nous else {}
return _nous_extra_body() if auxiliary_is_nous else {}
def auxiliary_max_tokens_param(value: int) -> dict:
@@ -4026,7 +4051,7 @@ def _build_call_kwargs(
# Provider-specific extra_body
merged_extra = dict(extra_body or {})
if provider == "nous" or auxiliary_is_nous:
merged_extra.setdefault("tags", []).extend(NOUS_EXTRA_BODY["tags"])
merged_extra.setdefault("tags", []).extend(_nous_portal_tags())
if merged_extra:
kwargs["extra_body"] = merged_extra
@@ -4411,7 +4436,7 @@ def extract_content_or_reasoning(response) -> str:
1. ``message.content`` — strip inline think/reasoning blocks, check for
remaining non-whitespace text.
2. ``message.reasoning`` / ``message.reasoning_content`` — direct
structured reasoning fields (DeepSeek, Moonshot, Novita, etc.).
structured reasoning fields (DeepSeek, Moonshot, NovitaAI, etc.).
3. ``message.reasoning_details`` — OpenRouter unified array format.
Returns the best available text, or ``""`` if nothing found.
+23 -3
@@ -1185,6 +1185,26 @@ The user has requested that this compaction PRIORITISE preserving all informatio
idx += 1
return idx
def _protect_head_size(self, messages: List[Dict[str, Any]]) -> int:
"""Total count of head messages to protect.
``protect_first_n`` is defined as *additional* messages protected
beyond the system prompt. The system prompt (if present at index 0)
is always implicitly protected — it's load-bearing context that
must never be summarised away. This keeps semantics stable across
call paths where the system prompt may or may not be included in
the ``messages`` list (e.g. the gateway ``/compress`` handler
strips it before calling compress()).
Examples:
protect_first_n=0 → system prompt only (or nothing if no system msg)
protect_first_n=3 → system + first 3 non-system messages
"""
head = 0
if messages and messages[0].get("role") == "system":
head = 1
return head + self.protect_first_n
def _align_boundary_backward(self, messages: List[Dict[str, Any]], idx: int) -> int:
"""Pull a compress-end boundary backward to avoid splitting a
tool_call / result group.
@@ -1343,7 +1363,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
skip the LLM call when the transcript is still entirely inside
the protected head/tail.
"""
compress_start = self._align_boundary_forward(messages, self.protect_first_n)
compress_start = self._align_boundary_forward(messages, self._protect_head_size(messages))
compress_end = self._find_tail_cut_by_tokens(messages, compress_start)
return compress_start < compress_end
@@ -1379,7 +1399,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
self._last_aux_model_failure_model = None
n_messages = len(messages)
# Only need head + 3 tail messages minimum (token budget decides the real tail size)
_min_for_compress = self.protect_first_n + 3 + 1
_min_for_compress = self._protect_head_size(messages) + 3 + 1
if n_messages <= _min_for_compress:
if not self.quiet_mode:
logger.warning(
@@ -1399,7 +1419,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
logger.info("Pre-compression: pruned %d old tool result(s)", pruned_count)
# Phase 2: Determine boundaries
compress_start = self.protect_first_n
compress_start = self._protect_head_size(messages)
compress_start = self._align_boundary_forward(messages, compress_start)
# Use token-budget tail protection instead of fixed message count
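The head-size rule that `_protect_head_size` introduces can be illustrated in isolation. This is a minimal sketch, assuming a free function `protect_head_size` (a hypothetical stand-in for the method) that takes `protect_first_n` as an argument instead of reading it from `self`:

```python
from typing import Any, Dict, List


def protect_head_size(messages: List[Dict[str, Any]], protect_first_n: int) -> int:
    """Hypothetical free-function version of ``_protect_head_size``."""
    # A system prompt at index 0 is always protected, on top of protect_first_n.
    head = 1 if messages and messages[0].get("role") == "system" else 0
    return head + protect_first_n


with_system = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
]
# A caller that strips the system prompt before compressing.
without_system = with_system[1:]

print(protect_head_size(with_system, 3))     # system + 3 non-system messages
print(protect_head_size(without_system, 3))  # 3 non-system messages only
print(protect_head_size([], 0))              # nothing to protect
```

The point of the design is that the same `protect_first_n` value protects the same non-system messages whether or not the system prompt is part of the list.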
+5
@@ -55,6 +55,11 @@ class ContextEngine(ABC):
# These control the preflight compression check. Subclasses may
# override via __init__ or property; defaults are sensible for most
# engines.
#
# protect_first_n semantics (since PR #13754): count of non-system head
# messages always preserved verbatim, IN ADDITION to the system prompt
# which is always implicitly protected. Default 3 keeps the
# historical "system + first 3 non-system messages" head shape.
threshold_percent: float = 0.75
protect_first_n: int = 3
+3
@@ -14,6 +14,7 @@ from difflib import unified_diff
from pathlib import Path
from utils import safe_json_loads
from agent.tool_result_classification import file_mutation_result_landed
# ANSI escape codes for coloring tool failure indicators
_RED = "\033[31m"
@@ -810,6 +811,8 @@ def _detect_tool_failure(tool_name: str, result: str | None) -> tuple[bool, str]
"""
if result is None:
return False, ""
if file_mutation_result_landed(tool_name, result):
return False, ""
if tool_name == "terminal":
data = safe_json_loads(result)
+7 -1
@@ -450,7 +450,13 @@ def _make_stream_chunk(
finish_reason: Optional[str] = None,
reasoning: str = "",
) -> _GeminiStreamChunk:
delta_kwargs: Dict[str, Any] = {"role": "assistant"}
delta_kwargs: Dict[str, Any] = {
"role": "assistant",
"content": None,
"tool_calls": None,
"reasoning": None,
"reasoning_content": None,
}
if content:
delta_kwargs["content"] = content
if tool_call_delta is not None:
+31 -6
@@ -77,6 +77,17 @@ def get_active_provider() -> Optional[ImageGenProvider]:
Reads ``image_gen.provider`` from config.yaml; falls back per the
module docstring.
**Availability semantics** (mirrors :mod:`agent.web_search_registry`):
- When ``image_gen.provider`` is explicitly set, the configured
provider is returned even if :meth:`ImageGenProvider.is_available`
reports False; the dispatcher surfaces a precise "X_API_KEY is not
set" error rather than silently switching backends.
- When ``image_gen.provider`` is unset, the fallback path (single-
provider shortcut and the FAL legacy preference) is filtered by
``is_available()`` so we don't pick a provider the user has no
credentials for.
"""
configured: Optional[str] = None
try:
@@ -94,6 +105,17 @@ def get_active_provider() -> Optional[ImageGenProvider]:
with _lock:
snapshot = dict(_providers)
def _is_available_safe(p: ImageGenProvider) -> bool:
"""Wrap ``is_available()`` so a buggy provider doesn't kill resolution."""
try:
return bool(p.is_available())
except Exception as exc: # noqa: BLE001
logger.debug("image_gen provider %s.is_available() raised %s", p.name, exc)
return False
# 1. Explicit config wins — return regardless of is_available() so the
# user gets a precise downstream error message rather than a silent
# backend switch.
if configured:
provider = snapshot.get(configured)
if provider is not None:
@@ -103,13 +125,16 @@ def get_active_provider() -> Optional[ImageGenProvider]:
configured,
)
# Fallback: single-provider case
if len(snapshot) == 1:
return next(iter(snapshot.values()))
# 2. Fallback: single registered provider — but only if it's actually
# available (no credentials = don't surface it as "active").
available = [p for p in snapshot.values() if _is_available_safe(p)]
if len(available) == 1:
return available[0]
# Fallback: prefer legacy FAL for backward compat
if "fal" in snapshot:
return snapshot["fal"]
# 3. Fallback: prefer legacy FAL for backward compat, when available.
fal = snapshot.get("fal")
if fal is not None and _is_available_safe(fal):
return fal
return None
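The three-step resolution order in this hunk can be sketched without the registry, lock, or config loading. `FakeProvider` and `resolve` below are hypothetical names used only for illustration; the real function reads `image_gen.provider` from config and wraps `is_available()` in a try/except, which this sketch omits:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FakeProvider:
    """Hypothetical stand-in for ImageGenProvider."""
    name: str
    available: bool

    def is_available(self) -> bool:
        return self.available


def resolve(providers: Dict[str, FakeProvider], configured: Optional[str]) -> Optional[FakeProvider]:
    # 1. Explicit config wins, even when the provider reports unavailable.
    if configured and configured in providers:
        return providers[configured]
    # 2. Exactly one available provider: use it.
    available = [p for p in providers.values() if p.is_available()]
    if len(available) == 1:
        return available[0]
    # 3. Legacy preference for "fal", but only when it is available.
    fal = providers.get("fal")
    if fal is not None and fal.is_available():
        return fal
    return None


registry = {
    "fal": FakeProvider("fal", available=False),
    "other": FakeProvider("other", available=True),
}
print(resolve(registry, "fal").name)  # explicit config, returned despite being unavailable
print(resolve(registry, None).name)   # only "other" is available
print(resolve({"fal": FakeProvider("fal", False)}, None))  # no usable fallback
```

Returning the configured provider even when unavailable lets the caller report the missing credential instead of silently switching backends.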
+23 -2
@@ -47,7 +47,7 @@ def _resolve_requests_verify() -> bool | str:
_PROVIDER_PREFIXES: frozenset[str] = frozenset({
"openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
"gemini", "ollama-cloud", "zai", "kimi-coding", "kimi-coding-cn", "stepfun", "minimax", "minimax-oauth", "minimax-cn", "anthropic", "deepseek",
"opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba",
"opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba", "novita",
"qwen-oauth",
"xiaomi",
"arcee",
@@ -66,7 +66,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
"gmi-cloud", "gmicloud",
"xai", "x-ai", "x.ai", "grok",
"nvidia", "nim", "nvidia-nim", "nemotron",
"qwen-portal",
"qwen-portal", "novita-ai", "novitaai",
})
@@ -104,6 +104,8 @@ def _strip_provider_prefix(model: str) -> str:
_model_metadata_cache: Dict[str, Dict[str, Any]] = {}
_model_metadata_cache_time: float = 0
_novita_metadata_cache: Dict[str, Dict[str, Any]] = {}
_novita_metadata_cache_time: float = 0
_MODEL_CACHE_TTL = 3600
_endpoint_model_metadata_cache: Dict[str, Dict[str, Dict[str, Any]]] = {}
_endpoint_model_metadata_cache_time: Dict[str, float] = {}
@@ -285,6 +287,7 @@ def grok_supports_reasoning_effort(model: str) -> bool:
_CONTEXT_LENGTH_KEYS = (
"context_length",
"context_window",
"context_size",
"max_context_length",
"max_position_embeddings",
"max_model_len",
@@ -361,6 +364,7 @@ _URL_TO_PROVIDER: Dict[str, str] = {
"api.xiaomimimo.com": "xiaomi",
"xiaomimimo.com": "xiaomi",
"api.gmi-serving.com": "gmi",
"api.novita.ai": "novita",
"tokenhub.tencentmaas.com": "tencent-tokenhub",
"ollama.com": "ollama-cloud",
}
@@ -557,6 +561,16 @@ def _extract_max_completion_tokens(payload: Dict[str, Any]) -> Optional[int]:
def _extract_pricing(payload: Dict[str, Any]) -> Dict[str, Any]:
novita_input = payload.get("input_token_price_per_m")
novita_output = payload.get("output_token_price_per_m")
if novita_input is not None or novita_output is not None:
pricing: Dict[str, Any] = {}
if novita_input is not None:
pricing["prompt"] = str(float(novita_input) / 10_000 / 1_000_000)
if novita_output is not None:
pricing["completion"] = str(float(novita_output) / 10_000 / 1_000_000)
return pricing
alias_map = {
"prompt": ("prompt", "input", "input_cost_per_token", "prompt_token_cost"),
"completion": ("completion", "output", "output_cost_per_token", "completion_token_cost"),
@@ -1527,6 +1541,13 @@ def get_model_context_length(
except ImportError:
pass # boto3 not installed — fall through to generic resolution
if provider == "novita" or (base_url and base_url_host_matches(base_url, "api.novita.ai")):
ctx = _resolve_endpoint_context_length(model, base_url or "https://api.novita.ai/openai/v1", api_key=api_key)
if ctx is not None:
if base_url:
save_context_length(model, base_url, ctx)
return ctx
# 2. Active endpoint metadata for truly custom/unknown endpoints.
# Known providers (Copilot, OpenAI, Anthropic, etc.) skip this — their
# /models endpoint may report a provider-imposed limit (e.g. Copilot
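The Novita branch added to `_extract_pricing` in this file converts two payload fields into per-token price strings. The sketch below repeats that arithmetic in a standalone function; `novita_pricing` is a hypothetical name, and reading the input as "1/10,000 USD per million tokens" is an inference from the two divisions, not something the diff states:

```python
from typing import Any, Dict


def novita_pricing(payload: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical standalone copy of the Novita branch in ``_extract_pricing``."""
    pricing: Dict[str, str] = {}
    for source_key, target_key in (
        ("input_token_price_per_m", "prompt"),
        ("output_token_price_per_m", "completion"),
    ):
        value = payload.get(source_key)
        if value is not None:
            # Same arithmetic as the diff: divide by 10_000, then by 1_000_000.
            pricing[target_key] = str(float(value) / 10_000 / 1_000_000)
    return pricing


# Made-up payload values, chosen only to make the arithmetic easy to follow.
print(novita_pricing({"input_token_price_per_m": 10000}))
print(novita_pricing({}))
```

Only the keys that are present produce an entry, so a payload with just one of the two fields yields a one-key dict.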
+1
@@ -141,6 +141,7 @@ class ProviderInfo:
# Hermes provider names → models.dev provider IDs
PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
"openrouter": "openrouter",
"novita": "novita-ai",
"anthropic": "anthropic",
"openai": "openai",
"openai-codex": "openai",
+64
@@ -0,0 +1,64 @@
"""Centralized Nous Portal request tags.
Every Hermes request that hits the Nous Portal, whether from the main agent loop,
the auxiliary client (compression / titles / vision / web_extract / session_search / etc.),
or any future code path, must carry the same product-attribution tags so
Nous can attribute usage to Hermes Agent and bucket it by client release.
Tag shape (sent in OpenAI-compatible ``extra_body['tags']``):
[
"product=hermes-agent",
"client=hermes-client-v<__version__>",
]
The version is sourced live from ``hermes_cli.__version__`` so it auto-aligns
to whatever release is installed; the release script
(``scripts/release.py``) regex-bumps that single string, and every Portal
request picks up the new tag on the next process start.
Why one helper instead of inlining the literal at each site:
* Four call sites (main loop profile, aux client, run_agent compression
fallback, web_tools fallback) used to drift apart; see PR #24194, which
only got the aux site, leaving the main loop sending a different tag set.
* Tests should assert the same tag list everywhere; centralizing makes that
assertion a one-liner against this module.
Do NOT pre-compute these as module-level constants in the consumers. The
version can change at runtime (editable installs, hot-reload tooling), and
``hermes_cli.__version__`` is the canonical source of truth.
"""
from __future__ import annotations
from typing import List
def _hermes_version() -> str:
"""Return the current Hermes release version, e.g. ``"0.13.0"``.
Falls back to ``"unknown"`` if ``hermes_cli`` cannot be imported (should
never happen in a real install; guarded for defensive testing).
"""
try:
from hermes_cli import __version__
return __version__
except Exception:
return "unknown"
def hermes_client_tag() -> str:
"""Return the ``client=...`` tag for Nous Portal requests.
Format: ``client=hermes-client-v<MAJOR>.<MINOR>.<PATCH>``.
"""
return f"client=hermes-client-v{_hermes_version()}"
def nous_portal_tags() -> List[str]:
"""Return the canonical list of Nous Portal product tags.
Always returns a fresh list so callers can mutate it freely
(e.g. ``merged_extra.setdefault("tags", []).extend(nous_portal_tags())``).
"""
return ["product=hermes-agent", hermes_client_tag()]
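The "fresh list on every call" contract matters because call sites extend an existing `tags` list in place. The sketch below shows that pattern with the version passed in explicitly; `client_tag` and `portal_tags` are hypothetical simplifications of `hermes_client_tag` and `nous_portal_tags`, which read the version from `hermes_cli` instead:

```python
from typing import Dict, List


def client_tag(version: str) -> str:
    # Same format string as hermes_client_tag(), with the version injected.
    return f"client=hermes-client-v{version}"


def portal_tags(version: str) -> List[str]:
    # A new list per call, so callers can extend or mutate it freely.
    return ["product=hermes-agent", client_tag(version)]


# "custom=1" is a made-up caller-supplied tag.
merged_extra: Dict[str, List[str]] = {"tags": ["custom=1"]}
merged_extra.setdefault("tags", []).extend(portal_tags("0.13.0"))
print(merged_extra["tags"])

# Mutating one result does not leak into the next call.
first = portal_tags("0.13.0")
first.append("mutated")
print(portal_tags("0.13.0"))
```

Had the helper returned a shared module-level list, the `append` above would have changed the tags sent by every later request.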
+6 -128
@@ -1,25 +1,15 @@
"""Anthropic prompt caching strategies.
"""Anthropic prompt caching strategy.
Two layouts:
* ``system_and_3`` (default, used everywhere except the long-lived path):
4 cache_control breakpoints: system prompt + last 3 non-system messages.
All at the same TTL (5m or 1h). Reduces input token costs by ~75% on
multi-turn conversations within a single session.
* ``prefix_and_2`` (Claude on Anthropic / OpenRouter / Nous Portal):
4 breakpoints split across two TTL tiers: tools[-1] (1h) +
stable system prefix (1h) + last 2 non-system messages (5m). The
long-lived prefix is byte-stable across sessions for a given user
config, so every fresh session reads the cached system+tools instead
of re-paying for them. Within-session rolling window shrinks from 3
messages to 2 to free the breakpoint budget.
Single layout: ``system_and_3``. 4 cache_control breakpoints: system
prompt + last 3 non-system messages, all at the same TTL (5m or 1h).
Reduces input token costs by ~75% on multi-turn conversations within a
single session.
Pure functions -- no class state, no AIAgent dependency.
"""
import copy
from typing import Any, Dict, List, Optional
from typing import Any, Dict, List
def _apply_cache_marker(msg: dict, cache_marker: dict, native_anthropic: bool = False) -> None:
@@ -87,115 +77,3 @@ def apply_anthropic_cache_control(
_apply_cache_marker(messages[idx], marker, native_anthropic=native_anthropic)
return messages
def _mark_system_stable_block(
messages: List[Dict[str, Any]],
long_lived_marker: Dict[str, str],
) -> bool:
"""Mark the *first* content block of the system message with the 1h marker.
The system message is expected to have been split into multiple content
blocks beforehand by the caller: block[0] is the cross-session-stable
prefix, subsequent blocks carry context files + volatile suffix.
Falls back to marking the whole system message as a single block when
the message hasn't been split (preserves correctness on the fallback path).
Returns True when a marker was placed.
"""
if not messages or messages[0].get("role") != "system":
return False
sys_msg = messages[0]
content = sys_msg.get("content")
# Already a list of blocks → mark the first block.
if isinstance(content, list) and content:
first = content[0]
if isinstance(first, dict):
first["cache_control"] = long_lived_marker
return True
return False
# String content (no split) → cannot place a stable-prefix breakpoint
# without changing the byte content. Caller is responsible for
# splitting; if they didn't, fall through to envelope marker so we still
# cache *something* for this turn.
if isinstance(content, str) and content:
sys_msg["content"] = [
{"type": "text", "text": content, "cache_control": long_lived_marker}
]
return True
return False
def apply_anthropic_cache_control_long_lived(
api_messages: List[Dict[str, Any]],
long_lived_ttl: str = "1h",
rolling_ttl: str = "5m",
native_anthropic: bool = False,
) -> List[Dict[str, Any]]:
"""Apply prefix_and_2 caching: long-lived stable prefix + rolling window.
Layout (4 breakpoints total):
* Stable system prefix (block[0]): ``long_lived_ttl`` TTL
* Last 2 non-system messages: ``rolling_ttl`` TTL each
NOTE: this function does NOT mark the tools array. Tools cache_control
is attached separately (see ``mark_tools_for_long_lived_cache``) because
tools live outside the messages list in the API payload.
The caller MUST have split the system message into ordered content
blocks where block[0] is the cross-session-stable portion. If the system
message is still a single string, it is wrapped into a single block and
marked; this is correct, just less effective (the volatile suffix is
not isolated, so the prefix invalidates per-session).
Returns:
Deep copy of messages with cache_control breakpoints injected.
"""
messages = copy.deepcopy(api_messages)
if not messages:
return messages
long_marker = _build_marker(long_lived_ttl)
rolling_marker = _build_marker(rolling_ttl)
placed_prefix = _mark_system_stable_block(messages, long_marker)
# Reserve 1 breakpoint for the system prefix (when placed); spend the
# remaining 3 on the rolling tail. Anthropic max is 4 total —
# tools[-1] (when marked) consumes the 4th, so we cap rolling at 2 here.
rolling_budget = 2 if placed_prefix else 3
non_sys = [i for i in range(len(messages)) if messages[i].get("role") != "system"]
for idx in non_sys[-rolling_budget:]:
_apply_cache_marker(messages[idx], rolling_marker, native_anthropic=native_anthropic)
return messages
def mark_tools_for_long_lived_cache(
tools: Optional[List[Dict[str, Any]]],
long_lived_ttl: str = "1h",
) -> Optional[List[Dict[str, Any]]]:
"""Attach cache_control to the last tool in the OpenAI-format tools list.
Anthropic prefix-cache order is ``tools -> system -> messages``. Marking
the last tool dict caches the entire tools array (Anthropic's docs:
"the marker is placed on the last block you want included in the cached
prefix"). Marker is preserved across the OpenAI-wire boundary on
OpenRouter and Nous Portal (which proxies to OpenRouter); on native
Anthropic the marker is forwarded by ``convert_tools_to_anthropic``.
Returns a deep copy of the tools list with the marker attached, or the
input unchanged when tools is empty/None. Pure function; does not
mutate the input.
"""
if not tools:
return tools
out = copy.deepcopy(tools)
last = out[-1]
if isinstance(last, dict):
last["cache_control"] = _build_marker(long_lived_ttl)
return out
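With the long-lived path removed, the module docstring describes a single `system_and_3` layout. The sketch below shows which message indices such a layout would mark. `system_and_3_indices` is a hypothetical helper; the body of `apply_anthropic_cache_control` is not shown in this diff, so details such as the behavior without a system message are assumptions:

```python
from typing import Any, Dict, List


def system_and_3_indices(messages: List[Dict[str, Any]]) -> List[int]:
    """Hypothetical helper: indices that would receive a cache breakpoint."""
    indices: List[int] = []
    # Breakpoint 1: the system prompt, when it leads the conversation.
    if messages and messages[0].get("role") == "system":
        indices.append(0)
    # Breakpoints 2-4: the last three non-system messages.
    non_sys = [i for i, m in enumerate(messages) if m.get("role") != "system"]
    indices.extend(non_sys[-3:])
    return indices


conversation = [
    {"role": "system", "content": "s"},
    {"role": "user", "content": "u1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "u2"},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "u3"},
]
print(system_and_3_indices(conversation))
```

As the conversation grows, the three trailing breakpoints slide forward while the system-prompt breakpoint stays at index 0.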
+3
@@ -14,6 +14,7 @@ from dataclasses import dataclass, field
from typing import Any, Mapping
from utils import safe_json_loads
from agent.tool_result_classification import file_mutation_result_landed
IDEMPOTENT_TOOL_NAMES = frozenset(
@@ -196,6 +197,8 @@ def classify_tool_failure(tool_name: str, result: str | None) -> tuple[bool, str
"""
if result is None:
return False, ""
if file_mutation_result_landed(tool_name, result):
return False, ""
if tool_name == "terminal":
data = safe_json_loads(result)
+26
@@ -0,0 +1,26 @@
"""Shared helpers for classifying tool result payloads."""
from __future__ import annotations
import json
from typing import Any
FILE_MUTATING_TOOL_NAMES = frozenset({"write_file", "patch"})
def file_mutation_result_landed(tool_name: str, result: Any) -> bool:
"""Return True when a file mutation result proves the write landed."""
if tool_name not in FILE_MUTATING_TOOL_NAMES or not isinstance(result, str):
return False
try:
data = json.loads(result.strip())
except Exception:
return False
if not isinstance(data, dict) or data.get("error"):
return False
if tool_name == "write_file":
return "bytes_written" in data
if tool_name == "patch":
return data.get("success") is True
return False
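A few sample payloads make the helper's decision table easier to see. The function is repeated in condensed form so the example runs standalone; the JSON payloads are made up for illustration and may not match what the real `write_file` and `patch` tools emit:

```python
import json
from typing import Any

FILE_MUTATING_TOOL_NAMES = frozenset({"write_file", "patch"})


def file_mutation_result_landed(tool_name: str, result: Any) -> bool:
    # Condensed copy of the helper in this file, repeated for a runnable example.
    if tool_name not in FILE_MUTATING_TOOL_NAMES or not isinstance(result, str):
        return False
    try:
        data = json.loads(result.strip())
    except Exception:
        return False
    if not isinstance(data, dict) or data.get("error"):
        return False
    if tool_name == "write_file":
        return "bytes_written" in data
    return data.get("success") is True


samples = [
    ("write_file", '{"bytes_written": 12}'),
    ("write_file", '{"bytes_written": 12, "error": "disk full"}'),
    ("patch", '{"success": true}'),
    ("patch", '{"success": "yes"}'),   # truthy, but not the boolean True
    ("terminal", '{"success": true}'), # not a file-mutating tool
    ("patch", "not json"),
]
for tool, payload in samples:
    print(tool, payload, file_mutation_result_landed(tool, payload))
```

Both failure classifiers in this PR call the helper first and return `(False, "")` when it reports that the write landed.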
+368
@@ -0,0 +1,368 @@
"""Codex app-server JSON-RPC client.
Speaks the protocol documented in codex-rs/app-server/README.md (codex 0.125+).
Transport is newline-delimited JSON-RPC 2.0 over stdio: spawn `codex app-server`,
do an `initialize` handshake, then drive `thread/start` + `turn/start` and
consume streaming `item/*` notifications until `turn/completed`.
This module is the wire-level speaker only. Higher-level concerns (event
projection into Hermes' display, approval bridging, transcript projection into
AIAgent.messages, plugin migration) live in sibling modules.
Status: optional opt-in runtime gated behind `model.openai_runtime ==
"codex_app_server"`. Hermes' default tool dispatch is unchanged when this
runtime is not selected.
"""
from __future__ import annotations
import json
import os
import queue
import subprocess
import threading
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional
# Default minimum codex version we test against. The PR sets this from the
# `codex --version` parsed at install time; bumping is a one-line change here.
MIN_CODEX_VERSION = (0, 125, 0)
@dataclass
class CodexAppServerError(RuntimeError):
"""Raised on JSON-RPC errors from the app-server."""
code: int
message: str
data: Optional[Any] = None
def __str__(self) -> str: # pragma: no cover - trivial
return f"codex app-server error {self.code}: {self.message}"
@dataclass
class _Pending:
queue: queue.Queue
method: str
sent_at: float = field(default_factory=time.time)
class CodexAppServerClient:
"""Minimal JSON-RPC 2.0 client for `codex app-server` over stdio.
Threading model:
- Spawning thread (caller) drives request/response pairs synchronously.
- One reader thread parses stdout, dispatches replies to the right
pending future, and routes notifications + server-initiated requests
to queues that the caller drains on their own cadence.
- One reader thread captures stderr for diagnostics; codex emits
tracing logs there at RUST_LOG-controlled levels.
Intentionally NOT async. AIAgent.run_conversation() is synchronous and
runs on the main thread; layering asyncio just to drive a stdio child
creates surprising interrupt semantics. We use blocking queues with
timeouts and rely on `turn/interrupt` for cancellation.
"""
def __init__(
self,
codex_bin: str = "codex",
codex_home: Optional[str] = None,
extra_args: Optional[list[str]] = None,
env: Optional[dict[str, str]] = None,
) -> None:
self._codex_bin = codex_bin
cmd = [codex_bin, "app-server"] + list(extra_args or [])
spawn_env = os.environ.copy()
if env:
spawn_env.update(env)
if codex_home:
spawn_env["CODEX_HOME"] = codex_home
# Codex emits tracing to stderr; default WARN keeps it quiet for users.
spawn_env.setdefault("RUST_LOG", "warn")
self._proc = subprocess.Popen(
cmd,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
bufsize=0,
env=spawn_env,
)
self._next_id = 1
self._pending: dict[int, _Pending] = {}
self._pending_lock = threading.Lock()
self._notifications: queue.Queue = queue.Queue()
self._server_requests: queue.Queue = queue.Queue()
self._stderr_lines: list[str] = []
self._stderr_lock = threading.Lock()
self._closed = False
self._initialized = False
self._reader = threading.Thread(target=self._read_stdout, daemon=True)
self._reader.start()
self._stderr_reader = threading.Thread(target=self._read_stderr, daemon=True)
self._stderr_reader.start()
# ---------- lifecycle ----------
def initialize(
self,
client_name: str = "hermes",
client_title: str = "Hermes Agent",
client_version: str = "0.1",
capabilities: Optional[dict] = None,
timeout: float = 10.0,
) -> dict:
"""Send `initialize` + `initialized` handshake. Returns the server's
InitializeResponse (userAgent, codexHome, platformFamily, platformOs)."""
if self._initialized:
raise RuntimeError("already initialized")
params = {
"clientInfo": {
"name": client_name,
"title": client_title,
"version": client_version,
},
"capabilities": capabilities or {},
}
result = self.request("initialize", params, timeout=timeout)
self.notify("initialized")
self._initialized = True
return result
def close(self, timeout: float = 3.0) -> None:
"""Close stdin, terminate the subprocess, and escalate to kill if it does not exit in time."""
if self._closed:
return
self._closed = True
try:
if self._proc.stdin and not self._proc.stdin.closed:
self._proc.stdin.close()
except Exception:
pass
try:
self._proc.terminate()
self._proc.wait(timeout=timeout)
except subprocess.TimeoutExpired:
try:
self._proc.kill()
self._proc.wait(timeout=1.0)
except Exception:
pass
def __enter__(self) -> "CodexAppServerClient":
return self
def __exit__(self, *exc: Any) -> None:
self.close()
# ---------- send/receive ----------
def request(
self,
method: str,
params: Optional[dict] = None,
timeout: float = 30.0,
) -> dict:
"""Send a JSON-RPC request and block on the response. Returns `result`,
raises CodexAppServerError on `error`."""
rid = self._take_id()
q: queue.Queue = queue.Queue(maxsize=1)
with self._pending_lock:
self._pending[rid] = _Pending(queue=q, method=method)
self._send({"id": rid, "method": method, "params": params or {}})
try:
msg = q.get(timeout=timeout)
except queue.Empty:
with self._pending_lock:
self._pending.pop(rid, None)
raise TimeoutError(
f"codex app-server method {method!r} timed out after {timeout}s"
)
if "error" in msg:
err = msg["error"]
raise CodexAppServerError(
code=err.get("code", -1),
message=err.get("message", ""),
data=err.get("data"),
)
return msg.get("result", {})
def notify(self, method: str, params: Optional[dict] = None) -> None:
"""Send a JSON-RPC notification (no id, no response expected)."""
self._send({"method": method, "params": params or {}})
def respond(self, request_id: Any, result: dict) -> None:
"""Reply to a server-initiated request (e.g. approval prompts)."""
self._send({"id": request_id, "result": result})
def respond_error(
self, request_id: Any, code: int, message: str, data: Optional[Any] = None
) -> None:
"""Reply to a server-initiated request with an error."""
err: dict[str, Any] = {"code": code, "message": message}
if data is not None:
err["data"] = data
self._send({"id": request_id, "error": err})
def take_notification(self, timeout: float = 0.0) -> Optional[dict]:
"""Pop the next streaming notification, or return None on timeout.
timeout=0.0 means non-blocking. Use small positive timeouts inside the
AIAgent turn loop to interleave reads with interrupt checks."""
try:
if timeout <= 0:
return self._notifications.get_nowait()
return self._notifications.get(timeout=timeout)
except queue.Empty:
return None
def take_server_request(self, timeout: float = 0.0) -> Optional[dict]:
"""Pop the next server-initiated request (e.g. exec/applyPatch approval)."""
try:
if timeout <= 0:
return self._server_requests.get_nowait()
return self._server_requests.get(timeout=timeout)
except queue.Empty:
return None
# ---------- diagnostics ----------
def stderr_tail(self, n: int = 20) -> list[str]:
"""Return last n lines of codex's stderr (for error reports)."""
with self._stderr_lock:
return list(self._stderr_lines[-n:])
def is_alive(self) -> bool:
return self._proc.poll() is None
# ---------- internals ----------
def _take_id(self) -> int:
# JSON-RPC ids only need to be unique per-connection. A simple
# monotonically increasing int is the common choice and matches what
# codex's own clients use.
rid = self._next_id
self._next_id += 1
return rid
def _send(self, obj: dict) -> None:
if self._closed:
raise RuntimeError("codex app-server client is closed")
if self._proc.stdin is None:
raise RuntimeError("codex app-server stdin not available")
try:
self._proc.stdin.write((json.dumps(obj) + "\n").encode("utf-8"))
self._proc.stdin.flush()
except (BrokenPipeError, ValueError) as exc:
raise RuntimeError(
f"codex app-server stdin closed unexpectedly: {exc}"
) from exc
def _read_stdout(self) -> None:
if self._proc.stdout is None:
return
try:
for line in iter(self._proc.stdout.readline, b""):
if not line:
break
line = line.strip()
if not line:
continue
try:
msg = json.loads(line)
except json.JSONDecodeError:
# Non-JSON output is unexpected on stdout; tracing belongs
# on stderr. Surface it via stderr buffer for diagnostics.
with self._stderr_lock:
self._stderr_lines.append(
f"<non-json on stdout> {line[:200]!r}"
)
continue
self._dispatch(msg)
except Exception as exc:
with self._stderr_lock:
self._stderr_lines.append(f"<stdout reader error> {exc}")
def _dispatch(self, msg: dict) -> None:
# Reply (has id + result/error, no method)
if "id" in msg and ("result" in msg or "error" in msg):
with self._pending_lock:
pending = self._pending.pop(msg["id"], None)
if pending is not None:
try:
pending.queue.put_nowait(msg)
except queue.Full: # pragma: no cover - defensive
pass
return
# Server-initiated request (has id + method)
if "id" in msg and "method" in msg:
self._server_requests.put(msg)
return
# Notification (no id)
if "method" in msg:
self._notifications.put(msg)
def _read_stderr(self) -> None:
if self._proc.stderr is None:
return
try:
for line in iter(self._proc.stderr.readline, b""):
if not line:
break
with self._stderr_lock:
self._stderr_lines.append(
line.decode("utf-8", "replace").rstrip()
)
# Bound memory: keep last 500 lines.
if len(self._stderr_lines) > 500:
self._stderr_lines = self._stderr_lines[-500:]
except Exception: # pragma: no cover
pass
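The routing in `_dispatch` depends only on which JSON-RPC keys a message carries, and on the order of the checks. A minimal sketch of that classification follows; `classify` is a hypothetical helper, and the method name `example/approval` is made up for illustration:

```python
from typing import Any, Dict


def classify(msg: Dict[str, Any]) -> str:
    """Hypothetical helper mirroring the order of checks in ``_dispatch``."""
    # Reply: has an id plus result or error.
    if "id" in msg and ("result" in msg or "error" in msg):
        return "reply"
    # Server-initiated request: has an id and a method.
    if "id" in msg and "method" in msg:
        return "server_request"
    # Notification: a method without an id.
    if "method" in msg:
        return "notification"
    return "ignored"


print(classify({"id": 1, "result": {}}))
print(classify({"id": 2, "error": {"code": -1, "message": "boom"}}))
print(classify({"id": 7, "method": "example/approval", "params": {}}))
print(classify({"method": "turn/completed", "params": {}}))
print(classify({}))
```

Checking for replies first means a message that carries both `id` and `result` is never mistaken for a server-initiated request.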
def parse_codex_version(output: str) -> Optional[tuple[int, int, int]]:
"""Parse `codex --version` output. Returns (major, minor, patch) or None."""
# Output format: "codex-cli 0.130.0" possibly followed by metadata.
import re
match = re.search(r"(\d+)\.(\d+)\.(\d+)", output or "")
if not match:
return None
return (int(match.group(1)), int(match.group(2)), int(match.group(3)))
def check_codex_binary(
codex_bin: str = "codex", min_version: tuple[int, int, int] = MIN_CODEX_VERSION
) -> tuple[bool, str]:
"""Verify codex CLI is installed and meets minimum version.
Returns (ok, message). Used by setup wizard and runtime startup."""
try:
proc = subprocess.run(
[codex_bin, "--version"],
capture_output=True,
text=True,
timeout=10,
)
except FileNotFoundError:
return False, (
f"codex CLI not found at {codex_bin!r}. Install with: "
f"npm i -g @openai/codex"
)
except subprocess.TimeoutExpired:
return False, "codex --version timed out"
if proc.returncode != 0:
return False, f"codex --version exited {proc.returncode}: {proc.stderr.strip()}"
version = parse_codex_version(proc.stdout)
if version is None:
return False, f"could not parse codex version from: {proc.stdout!r}"
if version < min_version:
return False, (
f"codex {'.'.join(map(str, version))} is older than required "
f"{'.'.join(map(str, min_version))}. Run: npm i -g @openai/codex"
)
return True, ".".join(map(str, version))
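The version gate relies on comparing integer tuples rather than strings. The sketch below repeats the parsing regex in a standalone function (`parse_version` is a hypothetical name) and shows why the tuple form is needed:

```python
import re
from typing import Optional, Tuple

MIN_VERSION = (0, 125, 0)  # same value as MIN_CODEX_VERSION in the diff


def parse_version(output: str) -> Optional[Tuple[int, ...]]:
    # Same regex as parse_codex_version: the first MAJOR.MINOR.PATCH triple wins.
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output or "")
    if not match:
        return None
    return tuple(int(group) for group in match.groups())


print(parse_version("codex-cli 0.130.0"))
print(parse_version("codex-cli dev"))

# Tuples compare numerically, element by element.
print(parse_version("codex-cli 0.99.5") < MIN_VERSION)
# Strings compare character by character, which gets this case wrong.
print("0.99.5" < "0.125.0")
```

A plain string comparison would treat 0.99.5 as newer than 0.125.0, so the check would wrongly accept an outdated binary.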
@@ -0,0 +1,810 @@
"""Session adapter for codex app-server runtime.
Owns one Codex thread per Hermes session. Drives `turn/start`, consumes
streaming notifications via CodexEventProjector, handles server-initiated
approval requests (apply_patch, exec command), translates cancellation,
and returns a clean turn result that AIAgent.run_conversation() can splice
into its `messages` list.
Lifecycle:
session = CodexAppServerSession(cwd="/home/x/proj")
session.ensure_started() # spawns + handshake + thread/start
result = session.run_turn(user_input="hello") # blocks until turn/completed
# result.final_text → assistant text returned to caller
# result.projected_messages → list of {role, content, ...} for messages list
# result.tool_iterations → how many tool-shaped items completed (skill nudge counter)
# result.interrupted → True if Ctrl+C / interrupt_requested fired mid-turn
session.close() # tears down subprocess
Threading model: the adapter is single-threaded from the caller's perspective.
The underlying CodexAppServerClient owns its own reader threads but exposes
blocking-with-timeout queues that this adapter polls in a loop, so the run_turn
call is synchronous and behaves like AIAgent's existing chat_completions loop.
"""
from __future__ import annotations
import logging
import os
import threading
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional
from agent.redact import redact_sensitive_text
from agent.transports.codex_app_server import (
CodexAppServerClient,
CodexAppServerError,
)
from agent.transports.codex_event_projector import CodexEventProjector
logger = logging.getLogger(__name__)
# How many tailing stderr lines from the codex subprocess to attach to a
# user-facing error when we don't have a more specific classification (OAuth,
# wedge watchdog, etc.). Small enough to keep error messages legible, large
# enough to surface a config/provider/auth diagnostic.
_STDERR_TAIL_LINES = 12
# Permission profile mapping mirrors the docstring in PR proposal:
# Hermes' tools.terminal.security_mode → Codex's permissions profile id.
# Defaults if config is missing → workspace-write (matches Codex's own default).
_HERMES_TO_CODEX_PERMISSION_PROFILE = {
"auto": "workspace-write",
"approval-required": "read-only-with-approval",
"unrestricted": "full-access",
# Backstop alias used by some skills/tests.
"yolo": "full-access",
}
@dataclass
class TurnResult:
"""Result of one user→assistant→tool turn through the codex app-server."""
final_text: str = ""
projected_messages: list[dict] = field(default_factory=list)
tool_iterations: int = 0
interrupted: bool = False
error: Optional[str] = None # Set if turn ended in a non-recoverable error
turn_id: Optional[str] = None
thread_id: Optional[str] = None
# Hint to the caller that the underlying codex subprocess is likely
# wedged (turn-level timeout fired, post-tool watchdog tripped, or
# token-refresh failure killed the child). The caller should retire
# the session so the next turn respawns codex from scratch instead
# of riding a CPU-spinning or auth-broken process. Mirrors openclaw
# beta.8's "retire timed-out app-server clients" fix.
should_retire: bool = False
# Markers we accept as terminal even when codex never emits turn/completed.
# Some codex versions stream `<turn_aborted>` as raw text in agentMessage
# items when an interrupt or upstream error tears the turn down before the
# normal completion path fires. Mirrors openclaw beta.8 fix.
_TURN_ABORTED_MARKERS = ("<turn_aborted>", "<turn_aborted/>")
# Substrings in codex stderr / JSON-RPC error messages that signal the
# subprocess died because its OAuth credentials are no longer valid.
# Kept conservative: we only redirect users to `codex login` when we're
# reasonably sure that's the actual failure, otherwise we surface the
# original error verbatim. Mirrors openclaw beta.8's auth-refresh
# classification.
_OAUTH_REFRESH_FAILURE_HINTS = (
"invalid_grant",
"invalid grant",
"refresh token",
"refresh_token",
"token refresh",
"token_refresh",
"token has expired",
"expired_token",
"expired token",
"not authenticated",
"unauthenticated",
"unauthorized",
"401 unauthorized",
"re-authenticate",
"reauthenticate",
"please log in",
"please login",
"auth profile",
"no auth profile",
"oauth",
)
def _classify_oauth_failure(*parts: str) -> Optional[str]:
"""Return a user-friendly re-auth hint if any of the provided strings
look like a codex OAuth/token-refresh failure; otherwise None.
Used for both `turn/start` JSON-RPC errors and post-mortem stderr
inspection when the subprocess exits unexpectedly. Conservative on
purpose: we only redirect users to `codex login` when the signal
is strong, so unrelated runtime failures still surface verbatim.
"""
haystack = " ".join(p for p in parts if p).lower()
if not haystack:
return None
for needle in _OAUTH_REFRESH_FAILURE_HINTS:
if needle in haystack:
return (
"Codex authentication failed — your ChatGPT/Codex login "
"looks expired or invalid. Run `codex login` to refresh, "
"then retry. (Fall back to default runtime with "
"`/codex-runtime auto` if the issue persists.)"
)
return None
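The classification above is a case-insensitive substring scan over every string the caller passes in. A small sketch of the same technique, assuming a reduced hint list (`SAMPLE_HINTS` and `looks_like_auth_failure` are illustrative names, not part of this change):

```python
from typing import Optional, Tuple

# Illustrative subset of the hint list; the real tuple is longer.
SAMPLE_HINTS: Tuple[str, ...] = ("invalid_grant", "refresh token", "unauthorized")


def looks_like_auth_failure(*parts: Optional[str]) -> bool:
    """Join the non-empty parts, lowercase them, and scan for any hint."""
    haystack = " ".join(p for p in parts if p).lower()
    if not haystack:
        return False
    return any(needle in haystack for needle in SAMPLE_HINTS)
```

Because the match is a plain substring test, short needles can fire on unrelated text, so the length and specificity of each hint decides how conservative the classifier really is.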
@dataclass
class _ServerRequestRouting:
"""Default policies for codex-side approval requests when no interactive
callback is wired in. These are only used by tests + cron / non-interactive
contexts; the live CLI path passes an approval_callback that defers to
tools.approval.prompt_dangerous_approval()."""
auto_approve_exec: bool = False
auto_approve_apply_patch: bool = False
class CodexAppServerSession:
"""One Codex thread per Hermes session, lifetime owned by AIAgent.
Not thread-safe: one caller drives it at a time, matching how AIAgent's
run_conversation() loop is structured today. The codex client itself can
handle interleaved reads/writes via its own threads, but the adapter's
state (projector, thread_id, turn counter) is owned by the caller thread.
"""
def __init__(
self,
*,
cwd: Optional[str] = None,
codex_bin: str = "codex",
codex_home: Optional[str] = None,
permission_profile: Optional[str] = None,
approval_callback: Optional[Callable[..., str]] = None,
on_event: Optional[Callable[[dict], None]] = None,
request_routing: Optional[_ServerRequestRouting] = None,
client_factory: Optional[Callable[..., CodexAppServerClient]] = None,
) -> None:
self._cwd = cwd or os.getcwd()
self._codex_bin = codex_bin
self._codex_home = codex_home
self._permission_profile = (
permission_profile or _HERMES_TO_CODEX_PERMISSION_PROFILE.get(
os.environ.get("HERMES_TERMINAL_SECURITY_MODE", "auto"),
"workspace-write",
)
)
self._approval_callback = approval_callback
self._on_event = on_event # Display hook (kawaii spinner ticks etc.)
self._routing = request_routing or _ServerRequestRouting()
self._client_factory = client_factory or CodexAppServerClient
self._client: Optional[CodexAppServerClient] = None
self._thread_id: Optional[str] = None
self._interrupt_event = threading.Event()
# Pending file-change items, keyed by item id. Populated on
# item/started for fileChange items; consumed by the approval
# bridge when codex sends item/fileChange/requestApproval. The
# approval params don't carry the changeset, so we cache here
# to surface a real summary in the approval prompt (quirk #4).
self._pending_file_changes: dict[str, str] = {}
self._closed = False
# ---------- lifecycle ----------
def ensure_started(self) -> str:
"""Spawn the subprocess, do the initialize handshake, and start a
thread. Returns the codex thread id. Idempotent: repeated calls
return the same thread id."""
if self._thread_id is not None:
return self._thread_id
if self._client is None:
self._client = self._client_factory(
codex_bin=self._codex_bin, codex_home=self._codex_home
)
self._client.initialize(
client_name="hermes",
client_title="Hermes Agent",
client_version=_get_hermes_version(),
)
# Permission selection is intentionally NOT sent on thread/start.
# Two reasons (live-tested against codex 0.130.0):
# 1. `thread/start.permissions` is gated behind the experimentalApi
# capability on this codex version — we'd have to opt in during
# initialize and accept the unstable surface.
# 2. Even with experimentalApi declared and the correct shape
# (`{"type": "profile", "id": "..."}`, not `{"profileId": ...}`),
# codex requires a matching `[permissions]` table in
# ~/.codex/config.toml or it fails the request with
# 'default_permissions requires a [permissions] table'.
# Letting codex pick its default (`:read-only` unless the user has
# configured otherwise in their codex config.toml) is the standard
# codex CLI workflow and avoids fighting codex's own validation.
# Users who want a write-capable profile configure it in their
# ~/.codex/config.toml the same way they would for any codex usage.
params: dict[str, Any] = {"cwd": self._cwd}
result = self._client.request("thread/start", params, timeout=15)
# Cross-fill thread.id/sessionId — different codex versions have
# serialized this under either key. Mirrors openclaw beta.8's
# tolerance fix so future codex drops/renames don't KeyError us
# at handshake time.
thread_obj = result.get("thread") or {}
thread_id = (
thread_obj.get("id")
or thread_obj.get("sessionId")
or result.get("sessionId")
or result.get("threadId")
)
if not thread_id:
raise CodexAppServerError(
code=-32603,
message=(
"codex thread/start returned no thread id "
f"(payload keys: {sorted(result.keys())})"
),
)
self._thread_id = thread_id
logger.info(
"codex app-server thread started: id=%s profile=%s cwd=%s",
self._thread_id[:8],
self._permission_profile,
self._cwd,
)
return self._thread_id
def close(self) -> None:
if self._closed:
return
self._closed = True
if self._client is not None:
try:
self._client.close()
except Exception: # pragma: no cover - best-effort cleanup
pass
self._client = None
self._thread_id = None
def __enter__(self) -> "CodexAppServerSession":
return self
def __exit__(self, *exc: Any) -> None:
self.close()
# ---------- interrupt ----------
def request_interrupt(self) -> None:
"""Idempotent: signal the active turn loop to issue turn/interrupt
and unwind. Called by AIAgent's _interrupt_requested path."""
self._interrupt_event.set()
# ---------- diagnostics ----------
def _format_error_with_stderr(
self,
prefix: str,
exc: Any = "",
*,
tail_lines: int = _STDERR_TAIL_LINES,
) -> str:
"""Build a user-facing error string for codex failures.
Appends the last few lines of codex's stderr buffer when available,
passed through agent.redact with force=True so secrets in provider
error responses (auth headers, query-string tokens, sk-* keys) never
leak into chat output or trajectories. The codex CLI's own error
text ('Internal error', 'turn/start failed: ...') is otherwise
opaque and forces users to re-run with verbose flags to diagnose
config / provider / auth-bridge problems.
Use this for the generic / catch-all branches. Specific
classifications (OAuth via _classify_oauth_failure, post-tool wedge
watchdog) already produce a clean hint and should be used instead.
"""
exc_str = str(exc) if exc != "" and exc is not None else ""
base = f"{prefix}: {exc_str}" if exc_str else prefix
if self._client is None:
return base
try:
tail = self._client.stderr_tail(tail_lines)
except Exception: # pragma: no cover - diagnostic best-effort
return base
if not tail:
return base
joined = "\n".join(line.rstrip() for line in tail if line)
if not joined.strip():
return base
redacted = redact_sensitive_text(joined, force=True)
return f"{base}\ncodex stderr (last {len(tail)} lines):\n{redacted}"
# ---------- per-turn ----------
def run_turn(
self,
user_input: str,
*,
turn_timeout: float = 600.0,
notification_poll_timeout: float = 0.25,
post_tool_quiet_timeout: float = 90.0,
) -> TurnResult:
"""Send a user message and block until turn/completed, while
forwarding server-initiated approval requests and projecting items
into Hermes' messages shape.
post_tool_quiet_timeout: if codex emits a tool completion and then
goes quiet for this many seconds without emitting another item or
`turn/completed`, fast-fail and mark the session for retirement.
Mirrors openclaw beta.8's post-tool completion watchdog (#81697)
so a wedged codex doesn't burn the full turn deadline.
"""
# Pre-create the result so startup failures (codex subprocess can't
# spawn, initialize handshake rejects, thread/start blows up) surface
# the same way per-turn failures do — with a TurnResult.error string
# the caller can render — instead of bubbling raw codex exceptions
# up to AIAgent.run_conversation.
result = TurnResult()
try:
self.ensure_started()
except (CodexAppServerError, TimeoutError) as exc:
result.error = self._format_error_with_stderr(
"codex app-server startup failed", exc
)
# Subprocess almost certainly unhealthy — retire so the next
# turn re-spawns cleanly.
result.should_retire = True
return result
assert self._client is not None and self._thread_id is not None
result.thread_id = self._thread_id
self._interrupt_event.clear()
projector = CodexEventProjector()
# Send turn/start with the user input. Text-only for now (codex
# supports rich content but Hermes' text path is the common case).
try:
ts = self._client.request(
"turn/start",
{
"threadId": self._thread_id,
"input": [{"type": "text", "text": user_input}],
},
timeout=10,
)
except CodexAppServerError as exc:
# Classify auth/refresh failures so the user gets a clear
# `codex login` pointer instead of a raw RPC error string.
stderr_blob = "\n".join(self._client.stderr_tail(40))
hint = _classify_oauth_failure(exc.message, stderr_blob)
if hint is not None:
result.error = hint
# Subprocess is fine on a JSON-RPC level here, but the
# token store is broken — retire so the next turn does a
# clean handshake (and the user has a chance to re-auth
# via `codex login` between turns).
result.should_retire = True
else:
result.error = self._format_error_with_stderr(
"turn/start failed", exc
)
return result
except TimeoutError as exc:
# turn/start hanging is a strong signal the subprocess is wedged.
stderr_blob = "\n".join(self._client.stderr_tail(40))
hint = _classify_oauth_failure(stderr_blob)
result.error = hint or self._format_error_with_stderr(
"turn/start timed out", exc
)
result.should_retire = True
return result
result.turn_id = (ts.get("turn") or {}).get("id")
deadline = time.time() + turn_timeout
turn_complete = False
# Post-tool watchdog state. last_tool_completion_at is set whenever
# a tool-shaped item completes; if no further notification arrives
# within post_tool_quiet_timeout and the turn hasn't completed, we
# fast-fail and retire the session.
last_tool_completion_at: Optional[float] = None
while time.time() < deadline and not turn_complete:
if self._interrupt_event.is_set():
self._issue_interrupt(result.turn_id)
result.interrupted = True
break
# Detect a dead subprocess between iterations. If codex exited
# (e.g. crashed, segfaulted, or its auth refresh thread killed
# the process), we won't get any more notifications — bail out
# rather than waiting for the full turn deadline.
if not self._client.is_alive():
stderr_blob = "\n".join(self._client.stderr_tail(60))
hint = _classify_oauth_failure(stderr_blob)
if hint is not None:
result.error = hint
else:
result.error = self._format_error_with_stderr(
"codex app-server subprocess exited unexpectedly",
tail_lines=20,
)
result.should_retire = True
break
# Post-tool watchdog: if a tool completion was the most recent
# signal and codex has been silent past the quiet timeout, give
# up on this turn instead of waiting for the outer deadline.
if (
last_tool_completion_at is not None
and (time.time() - last_tool_completion_at)
> post_tool_quiet_timeout
):
self._issue_interrupt(result.turn_id)
result.interrupted = True
result.error = (
f"codex went silent for "
f"{post_tool_quiet_timeout:.0f}s after a tool result; "
f"retiring app-server session."
)
result.should_retire = True
break
# Drain any server-initiated requests (approvals) before
# reading notifications, so the codex side isn't blocked.
sreq = self._client.take_server_request(timeout=0)
if sreq is not None:
# Drain any pending notifications first so per-turn state
# (e.g. _pending_file_changes for fileChange approvals) is
# up to date when we make the approval decision. Bounded
# to avoid starving the server-request response.
for _ in range(8):
pending = self._client.take_notification(timeout=0)
if pending is None:
break
self._track_pending_file_change(pending)
proj = projector.project(pending)
if proj.messages:
result.projected_messages.extend(proj.messages)
if proj.is_tool_iteration:
result.tool_iterations += 1
last_tool_completion_at = time.time()
if proj.final_text is not None:
result.final_text = proj.final_text
if _has_turn_aborted_marker(proj.final_text):
turn_complete = True
result.interrupted = True
result.error = (
result.error
or "codex reported turn_aborted"
)
self._handle_server_request(sreq)
# Activity counts as live signal — reset the post-tool
# quiet timer so an approval round-trip doesn't trip it.
last_tool_completion_at = None
continue
note = self._client.take_notification(
timeout=notification_poll_timeout
)
if note is None:
continue
method = note.get("method", "")
if self._on_event is not None:
try:
self._on_event(note)
except Exception: # pragma: no cover - display callback
logger.debug("on_event callback raised", exc_info=True)
# Track in-progress fileChange items so the approval bridge
# can surface a real change summary when codex requests
# approval (the approval params themselves don't carry the
# changeset). Quirk #4 fix.
self._track_pending_file_change(note)
# Project into messages
projection = projector.project(note)
if projection.messages:
result.projected_messages.extend(projection.messages)
if projection.is_tool_iteration:
result.tool_iterations += 1
# Arm/refresh the post-tool quiet watchdog whenever a
# tool-shaped item completes.
last_tool_completion_at = time.time()
else:
# Any non-tool projected activity (assistant message,
# status update, etc.) means codex is still producing
# output — clear the quiet timer so we don't fast-fail.
if projection.messages or projection.final_text is not None:
last_tool_completion_at = None
if projection.final_text is not None:
# Codex can emit multiple agentMessage items in one turn
# (e.g. partial then final). Take the last one as canonical.
result.final_text = projection.final_text
# Some codex builds tear a turn down by emitting a
# `<turn_aborted>` marker in the agent message text and
# never sending turn/completed. Treat the marker itself
# as terminal so we don't burn the full deadline.
if _has_turn_aborted_marker(projection.final_text):
turn_complete = True
result.interrupted = True
result.error = (
result.error or "codex reported turn_aborted"
)
if method == "turn/completed":
turn_complete = True
turn_status = (
(note.get("params") or {}).get("turn") or {}
).get("status")
if turn_status and turn_status not in ("completed", "interrupted"):
err_obj = (
(note.get("params") or {}).get("turn") or {}
).get("error")
if err_obj:
err_msg = err_obj.get("message") or str(err_obj)
# If the turn failed for an auth/refresh reason,
# rewrite the error into a re-auth hint AND mark
# the session for retirement.
stderr_blob = "\n".join(
self._client.stderr_tail(40)
)
hint = _classify_oauth_failure(err_msg, stderr_blob)
if hint is not None:
result.error = hint
result.should_retire = True
else:
result.error = self._format_error_with_stderr(
f"turn ended status={turn_status}", err_msg
)
if not turn_complete and not result.interrupted:
# Hit the deadline. Issue interrupt to stop wasted compute, and
# tell the caller to retire the session — a turn that never
# finished is a strong sign codex is wedged in a way the next
# turn shouldn't inherit.
self._issue_interrupt(result.turn_id)
result.interrupted = True
if not result.error:
result.error = self._format_error_with_stderr(
f"turn timed out after {turn_timeout}s"
)
result.should_retire = True
return result
# ---------- internals ----------
def _issue_interrupt(self, turn_id: Optional[str]) -> None:
if self._client is None or self._thread_id is None or turn_id is None:
return
try:
self._client.request(
"turn/interrupt",
{"threadId": self._thread_id, "turnId": turn_id},
timeout=5,
)
except CodexAppServerError as exc:
# "no active turn to interrupt" is fine — already done.
logger.debug("turn/interrupt non-fatal: %s", exc)
except TimeoutError:
logger.warning("turn/interrupt timed out")
def _handle_server_request(self, req: dict) -> None:
"""Translate a codex server request (approval) into Hermes' approval
flow, then send the response.
Method names verified live against codex 0.130.0 (Apr 2026):
item/commandExecution/requestApproval exec approvals
item/fileChange/requestApproval apply_patch approvals
item/permissions/requestApproval permissions changes
(we decline; user controls
permission profile in
~/.codex/config.toml).
"""
if self._client is None:
return
method = req.get("method", "")
rid = req.get("id")
params = req.get("params") or {}
if method == "item/commandExecution/requestApproval":
decision = self._decide_exec_approval(params)
self._client.respond(rid, {"decision": decision})
elif method == "item/fileChange/requestApproval":
decision = self._decide_apply_patch_approval(params)
self._client.respond(rid, {"decision": decision})
elif method == "item/permissions/requestApproval":
# Codex sometimes asks to escalate permissions mid-turn. We
# always decline — the user already chose their permission
# profile in ~/.codex/config.toml and surprise escalations
# shouldn't be silently accepted.
self._client.respond(rid, {"decision": "decline"})
elif method == "mcpServer/elicitation/request":
# Codex's MCP layer asks the user for structured input on
# behalf of an MCP server (e.g. tool-call confirmation,
# OAuth, form data). For our own hermes-tools callback we
# auto-accept — the user already approved Hermes' tools
# by enabling the runtime, and we never expose anything
# codex's built-in shell can't already do. For other MCP
# servers we decline so the user explicitly opts in via
# codex's own auth flow.
server_name = params.get("serverName") or ""
if server_name == "hermes-tools":
self._client.respond(
rid,
{"action": "accept", "content": None, "_meta": None},
)
else:
self._client.respond(
rid,
{"action": "decline", "content": None, "_meta": None},
)
else:
# Unknown server request — codex can extend this surface. Reject
# cleanly so codex doesn't hang waiting for us.
logger.warning("Unknown codex server request: %s", method)
self._client.respond_error(
rid, code=-32601, message=f"Unsupported method: {method}"
)
def _decide_exec_approval(self, params: dict) -> str:
if self._routing.auto_approve_exec:
return "accept"
command = params.get("command") or ""
# Codex's CommandExecutionRequestApprovalParams has cwd as Optional —
# fall back to the session's cwd when codex doesn't include it so the
# approval prompt is never empty (quirk #10 fix).
cwd = params.get("cwd") or self._cwd or "<unknown>"
reason = params.get("reason")
description = f"Codex requests exec in {cwd}"
if reason:
description += f": {reason}"
if self._approval_callback is not None:
try:
choice = self._approval_callback(
command, description, allow_permanent=False
)
return _approval_choice_to_codex_decision(choice)
except Exception:
logger.exception("approval_callback raised on exec request")
return "decline"
return "decline" # fail-closed when no callback wired
def _decide_apply_patch_approval(self, params: dict) -> str:
if self._routing.auto_approve_apply_patch:
return "accept"
if self._approval_callback is not None:
# FileChangeRequestApprovalParams gives us reason + grantRoot.
# The actual changeset lives on the corresponding fileChange
# item, which _track_pending_file_change() has already cached for us. Look it
# up by item_id so the user sees what's actually changing.
reason = params.get("reason")
grant_root = params.get("grantRoot")
item_id = params.get("itemId") or ""
change_summary = self._lookup_pending_file_change(item_id)
description_parts = []
if reason:
description_parts.append(reason)
if change_summary:
description_parts.append(change_summary)
if grant_root:
description_parts.append(f"grants write to {grant_root}")
description = (
"; ".join(description_parts)
if description_parts
else "Codex requests to apply a patch"
)
command_label = (
f"apply_patch: {change_summary}" if change_summary
else f"apply_patch: {reason}" if reason
else "apply_patch"
)
try:
choice = self._approval_callback(
command_label,
description,
allow_permanent=False,
)
return _approval_choice_to_codex_decision(choice)
except Exception:
logger.exception("approval_callback raised on apply_patch")
return "decline"
return "decline"
def _track_pending_file_change(self, note: dict) -> None:
"""Maintain self._pending_file_changes from item/started + item/completed
notifications. Lets the apply_patch approval prompt show what's
actually changing, because codex's approval params don't carry the data."""
method = note.get("method", "")
params = note.get("params") or {}
item = params.get("item") or {}
if item.get("type") != "fileChange":
return
item_id = item.get("id") or ""
if not item_id:
return
if method == "item/started":
changes = item.get("changes") or []
if not changes:
self._pending_file_changes[item_id] = "1 change pending"
return
kinds: dict[str, int] = {}
paths: list[str] = []
for ch in changes:
if not isinstance(ch, dict):
continue
kind = (ch.get("kind") or {}).get("type") or "update"
kinds[kind] = kinds.get(kind, 0) + 1
p = ch.get("path") or ""
if p:
paths.append(p)
counts = ", ".join(f"{n} {k}" for k, n in sorted(kinds.items()))
preview = ", ".join(paths[:3])
if len(paths) > 3:
preview += f", +{len(paths) - 3} more"
self._pending_file_changes[item_id] = (
f"{counts}: {preview}" if preview else counts
)
elif method == "item/completed":
self._pending_file_changes.pop(item_id, None)
def _lookup_pending_file_change(self, item_id: str) -> Optional[str]:
"""Look up an in-progress fileChange item by id and summarize its
changes for the approval prompt. Returns None when we don't have
the item cached (e.g. approval arrived before item/started, or
fileChange item content not tracked yet)."""
if not item_id:
return None
cached = self._pending_file_changes.get(item_id)
if not cached:
return None
return cached
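The summary string cached for the approval prompt can be reproduced in isolation. The sketch below mirrors the counting and preview logic of `_track_pending_file_change`, assuming each change is a dict with an optional `kind.type` and `path`; the function name `summarize_changes` and the `max_paths` parameter are illustrative:

```python
from collections import Counter
from typing import Any, List


def summarize_changes(changes: List[Any], max_paths: int = 3) -> str:
    """Build a short 'N kind, ...: path, path, +K more' summary."""
    if not changes:
        # Mirrors the placeholder used when the item carries no change list.
        return "1 change pending"
    kinds: Counter = Counter()
    paths: List[str] = []
    for change in changes:
        if not isinstance(change, dict):
            continue  # Skip malformed entries instead of failing the prompt.
        kind = (change.get("kind") or {}).get("type") or "update"
        kinds[kind] += 1
        path = change.get("path") or ""
        if path:
            paths.append(path)
    counts = ", ".join(f"{n} {kind}" for kind, n in sorted(kinds.items()))
    preview = ", ".join(paths[:max_paths])
    if len(paths) > max_paths:
        preview += f", +{len(paths) - max_paths} more"
    return f"{counts}: {preview}" if preview else counts
```

Sorting the kinds keeps the summary stable across runs, and capping the path preview keeps the approval prompt to one line even for large patches.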
def _approval_choice_to_codex_decision(choice: str) -> str:
"""Map Hermes approval choices onto codex's CommandExecutionApprovalDecision
/ FileChangeApprovalDecision wire values.
Hermes returns 'once', 'session', 'always', or 'deny'.
Codex expects 'accept', 'acceptForSession', 'decline', or 'cancel'
(verified against codex-rs/app-server-protocol/src/protocol/v2/item.rs
on codex 0.130.0).
"""
if choice in ("once",):
return "accept"
if choice in ("session", "always"):
return "acceptForSession"
return "decline"
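The same mapping can be written as a lookup table with a fail-closed default, which makes the "anything unrecognized declines" rule explicit. This is a sketch only; `CHOICE_TO_DECISION` and `to_codex_decision` are illustrative names, and the wire values are the ones listed in the docstring above:

```python
from typing import Dict

# Hermes approval choice -> codex decision wire value.
CHOICE_TO_DECISION: Dict[str, str] = {
    "once": "accept",
    "session": "acceptForSession",
    "always": "acceptForSession",
}


def to_codex_decision(choice: str) -> str:
    """Map a Hermes choice to a codex decision, declining anything unknown."""
    return CHOICE_TO_DECISION.get(choice, "decline")
```

Defaulting to "decline" means a new or misspelled Hermes choice can never widen what codex is allowed to do.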
def _has_turn_aborted_marker(text: str) -> bool:
"""Return True if `text` contains any of the raw markers codex uses
to signal a turn was aborted without emitting `turn/completed`.
Codex emits `<turn_aborted>` (and sometimes `<turn_aborted/>`) as raw
text inside agentMessage items when an interrupt or upstream error
tears the turn down before the normal completion path fires. Mirrors
openclaw beta.8's terminal-marker fix so we don't burn the full turn
deadline waiting for a turn/completed that never comes.
"""
if not text:
return False
for marker in _TURN_ABORTED_MARKERS:
if marker in text:
return True
return False
def _get_hermes_version() -> str:
"""Best-effort Hermes version string for codex's userAgent line."""
try:
from importlib.metadata import version
return version("hermes-agent")
except Exception: # pragma: no cover
return "0.0.0"
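The post-tool quiet watchdog that `run_turn()` tracks inline through `last_tool_completion_at` can be sketched as a small standalone helper. `QuietWatchdog` and its method names are hypothetical; it takes timestamps as arguments so the behavior can be checked without sleeping:

```python
from typing import Optional


class QuietWatchdog:
    """Trips when no activity follows a tool completion for too long."""

    def __init__(self, quiet_timeout: float) -> None:
        self.quiet_timeout = quiet_timeout
        self._armed_at: Optional[float] = None

    def arm(self, now: float) -> None:
        """Call when a tool-shaped item completes; refreshes the timer."""
        self._armed_at = now

    def clear(self) -> None:
        """Call on any other sign of life (message, approval round-trip)."""
        self._armed_at = None

    def tripped(self, now: float) -> bool:
        """True once the quiet period strictly exceeds the timeout."""
        if self._armed_at is None:
            return False
        return (now - self._armed_at) > self.quiet_timeout
```

A caller would pass a clock reading on each loop iteration. `time.monotonic()` is one option worth considering for that clock, since it is unaffected by system clock adjustments, whereas the loop above uses `time.time()`.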
@@ -0,0 +1,312 @@
"""Projects codex app-server events into Hermes' messages list.
The translator that lets Hermes' memory/skill review keep working under the
Codex runtime: it converts Codex `item/*` notifications into the standard
OpenAI-shaped `{role, content, tool_calls, tool_call_id}` entries that
`agent/curator.py` already knows how to read.
Codex emits items with a discriminator field `type`:
- userMessage {role: "user", content}
- agentMessage {role: "assistant", content}
- reasoning stashed in the assistant's "reasoning" field
- commandExecution assistant tool_call(name="exec_command") + tool result
- fileChange assistant tool_call(name="apply_patch") + tool result
- mcpToolCall assistant tool_call(name=f"mcp.{server}.{tool}") + tool result
- dynamicToolCall assistant tool_call(name=tool) + tool result
- plan/hookPrompt/collabAgentToolCall recorded as opaque assistant notes
Each item maps to AT MOST one assistant entry + one tool entry, preserving
Hermes' message-alternation invariants (system → user → assistant → user/tool
→ assistant ...). Multiple Codex tool calls within one Codex turn produce
multiple consecutive (assistant, tool) pairs, which is the same shape Hermes
already produces for parallel tool calls.
Counters tracked alongside projection:
- tool_iterations: ticks once per completed tool-shaped item. Used by
AIAgent._iters_since_skill (skill nudge gate, default threshold 10).
"""
from __future__ import annotations
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Optional
def _deterministic_call_id(item_type: str, item_id: str) -> str:
"""Stable id for tool_call message correlation.
Uses the codex item id directly when present (already a uuid); falls back
to a content hash so replay produces the same id across sessions and
prefix caches stay valid. See AGENTS.md Pitfall #16 (deterministic IDs in
tool call history)."""
if item_id:
return f"codex_{item_type}_{item_id}"
digest = hashlib.sha256(f"{item_type}".encode()).hexdigest()[:16]
return f"codex_{item_type}_{digest}"
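To make the replay-stability goal concrete, here is a sketch of a deterministic id whose fallback hashes a canonical JSON form of the item, so equal content always yields an equal id. This is an assumption about what a content hash could look like, not the function above: `stable_call_id` and its `payload` parameter are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def stable_call_id(
    item_type: str, item_id: str, payload: Optional[Dict[str, Any]] = None
) -> str:
    """Prefer the upstream item id; otherwise derive one from the content."""
    if item_id:
        return f"codex_{item_type}_{item_id}"
    # sort_keys gives a canonical encoding, so key order cannot change the id.
    blob = json.dumps(
        {"type": item_type, "payload": payload or {}},
        sort_keys=True,
        ensure_ascii=False,
    )
    digest = hashlib.sha256(blob.encode("utf-8")).hexdigest()[:16]
    return f"codex_{item_type}_{digest}"
```

No random or time-based component is involved, so replaying the same item in a later session produces the same id.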
def _format_tool_args(d: dict) -> str:
"""Format a dict as JSON the way Hermes' existing tool_calls path does."""
return json.dumps(d, ensure_ascii=False, sort_keys=True)
@dataclass
class ProjectionResult:
"""Output of projecting one Codex item.
`messages` is a list because some Codex items produce two messages
(assistant tool_call + tool result). Empty list = item ignored (e.g. a
streaming `outputDelta` that doesn't materialize into messages until the
`item/completed` event)."""
messages: list[dict] = field(default_factory=list)
is_tool_iteration: bool = False
final_text: Optional[str] = None # Set when an agentMessage completes
class CodexEventProjector:
"""Stateful projector consuming Codex notifications in arrival order.
Owns the in-progress reasoning content (codex emits reasoning as separate
items but Hermes stashes it on the next assistant message)."""
def __init__(self) -> None:
self._pending_reasoning: list[str] = []
def project(self, notification: dict) -> ProjectionResult:
"""Project a single notification. Idempotent for non-completion events;
only `item/completed` materializes messages."""
method = notification.get("method", "")
params = notification.get("params", {}) or {}
# We only materialize messages on `item/completed`. Streaming deltas
# (`item/<type>/outputDelta`, `item/<type>/delta`) are display-only and
# don't enter the messages list — same way Hermes already only writes
# the assistant message after the streaming completion event.
if method != "item/completed":
return ProjectionResult()
item = params.get("item") or {}
item_type = item.get("type") or ""
item_id = item.get("id") or ""
if item_type == "agentMessage":
return self._project_agent_message(item)
if item_type == "reasoning":
self._pending_reasoning.extend(item.get("summary") or [])
self._pending_reasoning.extend(item.get("content") or [])
return ProjectionResult()
if item_type == "commandExecution":
return self._project_command(item, item_id)
if item_type == "fileChange":
return self._project_file_change(item, item_id)
if item_type == "mcpToolCall":
return self._project_mcp_tool_call(item, item_id)
if item_type == "dynamicToolCall":
return self._project_dynamic_tool_call(item, item_id)
if item_type == "userMessage":
return self._project_user_message(item)
# Unknown / rare items (plan, hookPrompt, collabAgentToolCall, etc.)
# — record as opaque assistant note so memory review can still see
# *something* happened, but don't fabricate tool_call structure.
return self._project_opaque(item, item_type)
# ---------- per-type projections ----------
def _project_agent_message(self, item: dict) -> ProjectionResult:
text = item.get("text") or ""
msg: dict[str, Any] = {"role": "assistant", "content": text}
if self._pending_reasoning:
msg["reasoning"] = "\n".join(self._pending_reasoning)
self._pending_reasoning = []
return ProjectionResult(messages=[msg], final_text=text)
def _project_user_message(self, item: dict) -> ProjectionResult:
# codex's userMessage content is a list of UserInput variants. For
# projection purposes we flatten any text fragments and ignore
# non-text parts (images, etc.) — Hermes' messages store text only.
text_parts: list[str] = []
for fragment in item.get("content") or []:
if isinstance(fragment, dict):
if fragment.get("type") == "text":
text_parts.append(fragment.get("text") or "")
elif "text" in fragment:
text_parts.append(str(fragment["text"]))
return ProjectionResult(
messages=[{"role": "user", "content": "\n".join(text_parts)}]
)
def _project_command(self, item: dict, item_id: str) -> ProjectionResult:
call_id = _deterministic_call_id("exec", item_id)
args = {
"command": item.get("command") or "",
"cwd": item.get("cwd") or "",
}
assistant_msg = {
"role": "assistant",
"content": None,
"tool_calls": [
{
"id": call_id,
"type": "function",
"function": {
"name": "exec_command",
"arguments": _format_tool_args(args),
},
}
],
}
if self._pending_reasoning:
assistant_msg["reasoning"] = "\n".join(self._pending_reasoning)
self._pending_reasoning = []
output = item.get("aggregatedOutput") or ""
exit_code = item.get("exitCode")
if exit_code is not None and exit_code != 0:
output = f"[exit {exit_code}]\n{output}"
tool_msg = {
"role": "tool",
"tool_call_id": call_id,
"content": output,
}
return ProjectionResult(
messages=[assistant_msg, tool_msg], is_tool_iteration=True
)
def _project_file_change(self, item: dict, item_id: str) -> ProjectionResult:
call_id = _deterministic_call_id("apply_patch", item_id)
# Reduce the codex changes array to a digest the agent loop will
# find readable. We record per-file change kinds (Add/Update/Delete)
# without inlining full file contents — those can be huge.
changes_summary = []
for change in item.get("changes") or []:
kind = (change.get("kind") or {}).get("type") or "update"
path = change.get("path") or ""
changes_summary.append({"kind": kind, "path": path})
args = {"changes": changes_summary}
assistant_msg = {
"role": "assistant",
"content": None,
"tool_calls": [
{
"id": call_id,
"type": "function",
"function": {
"name": "apply_patch",
"arguments": _format_tool_args(args),
},
}
],
}
if self._pending_reasoning:
assistant_msg["reasoning"] = "\n".join(self._pending_reasoning)
self._pending_reasoning = []
status = item.get("status") or "unknown"
n = len(changes_summary)
tool_msg = {
"role": "tool",
"tool_call_id": call_id,
"content": f"apply_patch status={status}, {n} change(s)",
}
return ProjectionResult(
messages=[assistant_msg, tool_msg], is_tool_iteration=True
)
def _project_mcp_tool_call(self, item: dict, item_id: str) -> ProjectionResult:
server = item.get("server") or "mcp"
tool = item.get("tool") or "unknown"
call_id = _deterministic_call_id(f"mcp_{server}_{tool}", item_id)
args = item.get("arguments") or {}
if not isinstance(args, dict):
args = {"arguments": args}
assistant_msg = {
"role": "assistant",
"content": None,
"tool_calls": [
{
"id": call_id,
"type": "function",
"function": {
"name": f"mcp.{server}.{tool}",
"arguments": _format_tool_args(args),
},
}
],
}
if self._pending_reasoning:
assistant_msg["reasoning"] = "\n".join(self._pending_reasoning)
self._pending_reasoning = []
result = item.get("result")
error = item.get("error")
if error:
content = f"[error] {json.dumps(error, ensure_ascii=False)[:1000]}"
elif result is not None:
content = json.dumps(result, ensure_ascii=False)[:4000]
else:
content = ""
tool_msg = {
"role": "tool",
"tool_call_id": call_id,
"content": content,
}
return ProjectionResult(
messages=[assistant_msg, tool_msg], is_tool_iteration=True
)
def _project_dynamic_tool_call(
self, item: dict, item_id: str
) -> ProjectionResult:
tool = item.get("tool") or "unknown"
call_id = _deterministic_call_id(f"dyn_{tool}", item_id)
args = item.get("arguments") or {}
if not isinstance(args, dict):
args = {"arguments": args}
assistant_msg = {
"role": "assistant",
"content": None,
"tool_calls": [
{
"id": call_id,
"type": "function",
"function": {
"name": tool,
"arguments": _format_tool_args(args),
},
}
],
}
if self._pending_reasoning:
assistant_msg["reasoning"] = "\n".join(self._pending_reasoning)
self._pending_reasoning = []
content_items = item.get("contentItems") or []
if isinstance(content_items, list) and content_items:
content = json.dumps(content_items, ensure_ascii=False)[:4000]
else:
success = item.get("success")
content = f"success={success}"
tool_msg = {
"role": "tool",
"tool_call_id": call_id,
"content": content,
}
return ProjectionResult(
messages=[assistant_msg, tool_msg], is_tool_iteration=True
)
def _project_opaque(self, item: dict, item_type: str) -> ProjectionResult:
# Record the existence of the item without inventing tool_calls.
# Memory review will see this and may or may not save anything.
try:
payload = json.dumps(item, ensure_ascii=False)[:1500]
except (TypeError, ValueError):
payload = repr(item)[:1500]
return ProjectionResult(
messages=[
{
"role": "assistant",
"content": f"[codex {item_type}] {payload}",
}
]
)
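The reasoning stash above can be sketched in isolation. This is a reduced illustration, not the projector itself: `MiniProjector` and `MiniResult` are hypothetical stand-ins that handle only `reasoning` and `agentMessage` items, and the delta method name in the sample events is an assumed example of the `item/<type>/delta` pattern.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class MiniResult:
    """Hypothetical stand-in for ProjectionResult (only the fields used here)."""
    messages: List[Dict[str, Any]] = field(default_factory=list)
    final_text: Optional[str] = None


class MiniProjector:
    """Reduced illustration: handles only `reasoning` and `agentMessage` items."""

    def __init__(self) -> None:
        self._pending_reasoning: List[str] = []

    def project(self, notification: dict) -> MiniResult:
        # Streaming deltas are display-only and never produce messages.
        if notification.get("method") != "item/completed":
            return MiniResult()
        item = (notification.get("params") or {}).get("item") or {}
        if item.get("type") == "reasoning":
            # Stash reasoning until the next assistant message is materialized.
            self._pending_reasoning.extend(item.get("summary") or [])
            self._pending_reasoning.extend(item.get("content") or [])
            return MiniResult()
        if item.get("type") == "agentMessage":
            text = item.get("text") or ""
            msg: Dict[str, Any] = {"role": "assistant", "content": text}
            if self._pending_reasoning:
                msg["reasoning"] = "\n".join(self._pending_reasoning)
                self._pending_reasoning = []
            return MiniResult(messages=[msg], final_text=text)
        return MiniResult()


def demo() -> List[Dict[str, Any]]:
    """Feed a delta, a reasoning item, and an agent message; collect messages."""
    projector = MiniProjector()
    events = [
        {"method": "item/agentMessage/delta", "params": {"delta": "Do"}},
        {"method": "item/completed",
         "params": {"item": {"type": "reasoning", "summary": ["check cwd"], "content": []}}},
        {"method": "item/completed",
         "params": {"item": {"type": "agentMessage", "text": "Done."}}},
    ]
    out: List[Dict[str, Any]] = []
    for event in events:
        out.extend(projector.project(event).messages)
    return out
```

Only the final `agentMessage` yields a message, and it carries the reasoning that arrived before it; the stash is then cleared so later messages do not repeat it.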
+225
@@ -0,0 +1,225 @@
"""Hermes-tools-as-MCP server for the codex_app_server runtime.
When the user runs `openai/*` turns through the codex app-server, codex
owns the loop and builds its own tool list. By default, that means
Hermes' richer tool surface (web search, browser automation,
delegate_task subagents, vision analysis, persistent memory, skills,
cross-session search, image generation, TTS) is unreachable.
This module exposes a curated subset of those Hermes tools to the
spawned codex subprocess via stdio MCP. Codex registers it as a normal
MCP server (per `~/.codex/config.toml [mcp_servers.hermes-tools]`) and
the user gets full Hermes capability inside a Codex turn.
Scope (what we expose; see EXPOSED_TOOLS for the authoritative list):
- web_search, web_extract              Firecrawl, no codex equivalent
- browser_navigate / _click / _type /  Camofox/Browserbase automation
  _press / _snapshot / _scroll / _back /
  _get_images / _console / _vision
- vision_analyze                       image inspection by vision model
- image_generate                       image generation
- skill_view, skills_list              Hermes' skill library
- text_to_speech                       TTS
- kanban_*                             kanban worker / orchestrator handoff
What we DO NOT expose:
- terminal / shell                     codex's own shell tool
- read_file / write_file / patch       codex's apply_patch + shell
- search_files / process               codex's shell
- clarify, todo                        codex's own UX
- delegate_task / memory /             need the running AIAgent context,
  session_search                       which a stateless MCP callback lacks
Run with: python -m agent.transports.hermes_tools_mcp_server
Spawned by: CodexAppServerSession.ensure_started() when the runtime is
active and config opts in.
"""
from __future__ import annotations
import json
import logging
import os
import sys
from typing import Any, Optional
logger = logging.getLogger(__name__)
# Tools we expose. Each name MUST match a registered Hermes tool that
# `model_tools.handle_function_call()` can dispatch.
#
# What we deliberately DO NOT expose:
# - terminal / shell / read_file / write_file / patch / search_files /
# process — codex's built-ins cover these and approval routes through
# codex's own UI.
# - delegate_task / memory / session_search / todo — these are
# `_AGENT_LOOP_TOOLS` in Hermes (model_tools.py:493). They require
# the running AIAgent context to dispatch (mid-loop state), so a
# stateless MCP callback can't drive them. Hermes' default runtime
# keeps these working; the codex_app_server runtime cannot.
EXPOSED_TOOLS: tuple[str, ...] = (
"web_search",
"web_extract",
"browser_navigate",
"browser_click",
"browser_type",
"browser_press",
"browser_snapshot",
"browser_scroll",
"browser_back",
"browser_get_images",
"browser_console",
"browser_vision",
"vision_analyze",
"image_generate",
"skill_view",
"skills_list",
"text_to_speech",
# Kanban worker handoff tools — gated on HERMES_KANBAN_TASK env var
# (set by the kanban dispatcher when spawning a worker). Without these
# in the callback, a worker spawned with openai_runtime=codex_app_server
# could do the work but couldn't report completion back to the kernel,
# making it hang until timeout. Stateless dispatch — they just read
# the env var and write to ~/.hermes/kanban.db.
"kanban_complete",
"kanban_block",
"kanban_comment",
"kanban_heartbeat",
"kanban_show",
"kanban_list",
# NOTE: kanban_create / kanban_unblock / kanban_link are orchestrator-
# only — the kanban tool gates them on HERMES_KANBAN_TASK being unset.
# They're exposed here for orchestrator agents running on the codex
# runtime that need to dispatch new tasks.
"kanban_create",
"kanban_unblock",
"kanban_link",
)
def _build_server() -> Any:
"""Create the FastMCP server with Hermes tools attached. Lazy imports
so the module can be imported without the mcp package installed
(we degrade to a clear error only when actually run)."""
try:
from mcp.server.fastmcp import FastMCP
except ImportError as exc: # pragma: no cover - install hint
raise ImportError(
f"hermes-tools MCP server requires the 'mcp' package: {exc}"
) from exc
# Discover Hermes tools so dispatch works.
from model_tools import (
get_tool_definitions,
handle_function_call,
)
mcp = FastMCP(
"hermes-tools",
instructions=(
"Hermes Agent's tool surface, exposed for use inside a Codex "
"session. Use these for capabilities Codex's built-in toolset "
"doesn't cover: web search/extract, browser automation, "
"vision, image generation, skills, text-to-speech, and "
"kanban task handoff."
),
)
# Pull authoritative Hermes tool schemas for the ones we expose, so
# MCP clients see the same parameter docs Hermes gives the model.
all_defs = {
td["function"]["name"]: td["function"]
for td in (get_tool_definitions(quiet_mode=True) or [])
if isinstance(td, dict) and td.get("type") == "function"
}
exposed_count = 0
for name in EXPOSED_TOOLS:
spec = all_defs.get(name)
if spec is None:
logger.debug(
"skipping %s — not registered in this Hermes process", name
)
continue
description = spec.get("description") or f"Hermes {name} tool"
params_schema = spec.get("parameters") or {"type": "object", "properties": {}}
# FastMCP wants a Python callable. Build a closure that takes the
# arguments dict, dispatches via handle_function_call, and returns
# the result string. We use add_tool() for full control over the
# input schema (FastMCP's @tool() decorator inspects type hints,
# which we can't get from a JSON schema at runtime).
def _make_handler(tool_name: str):
def _dispatch(**kwargs: Any) -> str:
try:
return handle_function_call(tool_name, kwargs or {})
except Exception as exc:
logger.exception("tool %s raised", tool_name)
return json.dumps({"error": str(exc), "tool": tool_name})
_dispatch.__name__ = tool_name
_dispatch.__doc__ = description
return _dispatch
try:
mcp.add_tool(
_make_handler(name),
name=name,
description=description,
# NOTE: params_schema (built above) is not passed here, so
# FastMCP derives the input schema from the handler's
# **kwargs signature rather than from the Hermes schema.
)
except TypeError:
# Older mcp SDK signature — fall back to decorator-style.
handler = _make_handler(name)
handler = mcp.tool(name=name, description=description)(handler)
exposed_count += 1
logger.info(
"hermes-tools MCP server registered %d/%d tools",
exposed_count,
len(EXPOSED_TOOLS),
)
return mcp
def main(argv: Optional[list[str]] = None) -> int:
"""Entry point for `python -m agent.transports.hermes_tools_mcp_server`."""
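# Only fall back to sys.argv when no argv was supplied; an explicit
# empty list (e.g. from tests) must stay empty.
argv = sys.argv[1:] if argv is None else argv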
verbose = "--verbose" in argv or "-v" in argv
log_level = logging.INFO if verbose else logging.WARNING
logging.basicConfig(
level=log_level,
stream=sys.stderr, # MCP uses stdio for protocol — logs MUST go to stderr
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
)
# Quiet mode: keep Hermes' own banners off stdout (which is the MCP wire).
os.environ.setdefault("HERMES_QUIET", "1")
os.environ.setdefault("HERMES_REDACT_SECRETS", "true")
try:
server = _build_server()
except ImportError as exc:
sys.stderr.write(f"hermes-tools MCP server cannot start: {exc}\n")
return 2
# FastMCP runs with stdio transport by default when launched as a
# subprocess.
try:
server.run()
except KeyboardInterrupt:
return 0
except Exception as exc:
logger.exception("hermes-tools MCP server crashed")
sys.stderr.write(f"hermes-tools MCP server error: {exc}\n")
return 1
return 0
if __name__ == "__main__":
sys.exit(main())
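The handler factory used in `_build_server()` can be sketched on its own. This is a minimal illustration under assumptions: `fake_dispatch` is a hypothetical stand-in for `handle_function_call()`, and no MCP server is involved. It shows why each tool gets its own closure (the tool name is bound per call to the factory, not read from the loop variable later) and how a raised exception is turned into a JSON error string rather than propagating.

```python
import json
from typing import Any, Callable, Dict


def fake_dispatch(tool_name: str, args: Dict[str, Any]) -> str:
    """Hypothetical stand-in for handle_function_call()."""
    if tool_name == "boom":
        raise RuntimeError("backend down")
    return json.dumps({"tool": tool_name, "args": args})


def make_handler(tool_name: str) -> Callable[..., str]:
    """Bind tool_name now, so each handler dispatches to its own tool."""

    def _dispatch(**kwargs: Any) -> str:
        try:
            return fake_dispatch(tool_name, kwargs or {})
        except Exception as exc:
            # Errors are reported as a JSON payload instead of being raised.
            return json.dumps({"error": str(exc), "tool": tool_name})

    _dispatch.__name__ = tool_name
    return _dispatch


handlers = {name: make_handler(name) for name in ("web_search", "boom")}
ok_payload = json.loads(handlers["web_search"](query="hermes"))
err_payload = json.loads(handlers["boom"]())
```

Returning the error as a string keeps the stdio protocol stream intact even when a tool fails, which matches the intent of the `try`/`except` in `_dispatch` above.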
+299
@@ -0,0 +1,299 @@
"""
Video Generation Provider ABC
=============================
Defines the pluggable-backend interface for video generation. Providers register
instances via ``PluginContext.register_video_gen_provider()``; the active one
(selected via ``video_gen.provider`` in ``config.yaml``) services every
``video_generate`` tool call.
Providers live in ``<repo>/plugins/video_gen/<name>/`` (built-in, auto-loaded
as ``kind: backend``) or ``~/.hermes/plugins/video_gen/<name>/`` (user, opt-in
via ``plugins.enabled``).
Mirrors the ``image_gen`` provider design (``agent/image_gen_provider.py``) so
the two surfaces stay learnable together.
Unified surface
---------------
One tool ``video_generate`` covers **text-to-video** and **image-to-video**.
The router is the presence of ``image_url``: if it's set, the provider routes
to its image-to-video endpoint; if it's omitted, the provider routes to
text-to-video. Users pick one **model family** (e.g. Pixverse v6, Veo 3.1,
Kling O3 Standard); the provider handles which underlying FAL/xAI endpoint
to hit.
Video edit and video extend are intentionally NOT exposed in this surface:
the inconsistency across backends is too large for one unified tool. If
those use cases warrant attention later they can ship as separate tools.
Response shape
--------------
All providers return a dict built by :func:`success_response` /
:func:`error_response`. Keys:
success bool
video str | None URL or absolute file path
model str provider-specific model identifier
prompt str echoed prompt
modality str "text" | "image" (which mode was used)
aspect_ratio str provider-native (e.g. "16:9") or ""
duration int seconds (0 if not applicable)
provider str provider name (for diagnostics)
error str only when success=False
error_type str only when success=False
"""
from __future__ import annotations
import abc
import base64
import datetime
import logging
import uuid
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
logger = logging.getLogger(__name__)
# Common aspect ratios across providers (Veo / Kling / xAI / Pixverse). The
# tool schema advertises this set as an enum hint, but providers may accept
# a narrower or wider set — they are responsible for clamping.
COMMON_ASPECT_RATIOS: Tuple[str, ...] = ("16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3")
DEFAULT_ASPECT_RATIO = "16:9"
COMMON_RESOLUTIONS: Tuple[str, ...] = ("480p", "540p", "720p", "1080p")
DEFAULT_RESOLUTION = "720p"
# ---------------------------------------------------------------------------
# ABC
# ---------------------------------------------------------------------------
class VideoGenProvider(abc.ABC):
"""Abstract base class for a video generation backend.
Subclasses must implement :meth:`generate`. Everything else has sane
defaults; override only what your provider needs.
"""
@property
@abc.abstractmethod
def name(self) -> str:
"""Stable short identifier used in ``video_gen.provider`` config.
Lowercase, no spaces. Examples: ``xai``, ``fal``, ``google``.
"""
@property
def display_name(self) -> str:
"""Human-readable label shown in ``hermes tools``. Defaults to ``name.title()``."""
return self.name.title()
def is_available(self) -> bool:
"""Return True when this provider can service calls.
Typically checks for a required API key and optional-dependency
import. Default: True.
"""
return True
def list_models(self) -> List[Dict[str, Any]]:
"""Return catalog entries for ``hermes tools`` model picker.
Each entry represents a **model family** that supports text-to-video
and/or image-to-video routing internally::
{
"id": "veo-3.1", # required
"display": "Veo 3.1", # optional; defaults to id
"speed": "~60s", # optional
"strengths": "...", # optional
"price": "$0.20/s", # optional
"modalities": ["text", "image"], # optional, advisory
}
Default: empty list (provider has no user-selectable models).
"""
return []
def get_setup_schema(self) -> Dict[str, Any]:
"""Return provider metadata for the ``hermes tools`` picker."""
return {
"name": self.display_name,
"badge": "",
"tag": "",
"env_vars": [],
}
def default_model(self) -> Optional[str]:
"""Return the default model id, or None if not applicable."""
models = self.list_models()
if models:
return models[0].get("id")
return None
def capabilities(self) -> Dict[str, Any]:
"""Return what this provider supports.
Returned dict (all keys optional)::
{
"modalities": ["text", "image"], # which inputs the backend accepts
"aspect_ratios": ["16:9", "9:16", ...],
"resolutions": ["720p", "1080p"],
"max_duration": 15, # seconds
"min_duration": 1,
"supports_audio": True,
"supports_negative_prompt": True,
"max_reference_images": 7,
}
Used by the tool layer for soft validation and by ``hermes tools``
for the picker. Default: text-only.
"""
return {
"modalities": ["text"],
"aspect_ratios": list(COMMON_ASPECT_RATIOS),
"resolutions": list(COMMON_RESOLUTIONS),
"max_duration": 10,
"min_duration": 1,
"supports_audio": False,
"supports_negative_prompt": False,
"max_reference_images": 0,
}
@abc.abstractmethod
def generate(
self,
prompt: str,
*,
model: Optional[str] = None,
image_url: Optional[str] = None,
reference_image_urls: Optional[List[str]] = None,
duration: Optional[int] = None,
aspect_ratio: str = DEFAULT_ASPECT_RATIO,
resolution: str = DEFAULT_RESOLUTION,
negative_prompt: Optional[str] = None,
audio: Optional[bool] = None,
seed: Optional[int] = None,
**kwargs: Any,
) -> Dict[str, Any]:
"""Generate a video from a prompt (text-to-video) or animate an image
(image-to-video).
Routing: if ``image_url`` is provided, the provider should route to
its image-to-video endpoint; otherwise text-to-video. The plugin
is responsible for picking the right underlying endpoint within
the user's chosen model family.
Implementations should return the dict from :func:`success_response`
or :func:`error_response`. ``kwargs`` may contain forward-compat
parameters that future versions of the schema will expose;
implementations MUST ignore unknown keys (no TypeError).
"""
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _videos_cache_dir() -> Path:
"""Return ``$HERMES_HOME/cache/videos/``, creating parents as needed."""
from hermes_constants import get_hermes_home
path = get_hermes_home() / "cache" / "videos"
path.mkdir(parents=True, exist_ok=True)
return path
def save_b64_video(
b64_data: str,
*,
prefix: str = "video",
extension: str = "mp4",
) -> Path:
"""Decode base64 video data and write under ``$HERMES_HOME/cache/videos/``.
Returns the absolute :class:`Path` to the saved file.
Filename format: ``<prefix>_<YYYYMMDD_HHMMSS>_<short-uuid>.<ext>``.
"""
raw = base64.b64decode(b64_data)
ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
short = uuid.uuid4().hex[:8]
path = _videos_cache_dir() / f"{prefix}_{ts}_{short}.{extension}"
path.write_bytes(raw)
return path
def save_bytes_video(
raw: bytes,
*,
prefix: str = "video",
extension: str = "mp4",
) -> Path:
"""Write raw video bytes (e.g. an HTTP download body) to the cache."""
ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
short = uuid.uuid4().hex[:8]
path = _videos_cache_dir() / f"{prefix}_{ts}_{short}.{extension}"
path.write_bytes(raw)
return path
def success_response(
*,
video: str,
model: str,
prompt: str,
modality: str = "text",
aspect_ratio: str = "",
duration: int = 0,
provider: str,
extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
"""Build a uniform success response dict.
``video`` may be an HTTP URL or an absolute filesystem path.
``modality`` is ``"text"`` (text-to-video) or ``"image"`` (image-to-video);
it indicates which endpoint was actually hit, which is useful for diagnostics.
"""
payload: Dict[str, Any] = {
"success": True,
"video": video,
"model": model,
"prompt": prompt,
"modality": modality,
"aspect_ratio": aspect_ratio,
"duration": int(duration) if duration else 0,
"provider": provider,
}
if extra:
for k, v in extra.items():
payload.setdefault(k, v)
return payload
def error_response(
*,
error: str,
error_type: str = "provider_error",
provider: str = "",
model: str = "",
prompt: str = "",
aspect_ratio: str = "",
) -> Dict[str, Any]:
"""Build a uniform error response dict."""
return {
"success": False,
"video": None,
"error": error,
"error_type": error_type,
"model": model,
"prompt": prompt,
"aspect_ratio": aspect_ratio,
"provider": provider,
}
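The routing rule and response shape described in this module can be illustrated with a toy provider. This is a sketch under stated assumptions: `EchoVideoProvider`, the `echo-v1` model id, the `invalid_prompt` error type, and the `example.invalid` URLs are all made up for illustration. A real plugin would subclass `VideoGenProvider` and build its return value with `success_response()` / `error_response()`; the dicts are written out by hand here only so the block runs on its own.

```python
from typing import Any, Dict, Optional


class EchoVideoProvider:
    """Hypothetical provider used only for illustration."""

    name = "echo"

    def generate(
        self,
        prompt: str,
        *,
        model: Optional[str] = None,
        image_url: Optional[str] = None,
        duration: Optional[int] = None,
        aspect_ratio: str = "16:9",
        **kwargs: Any,  # forward-compat keys are accepted and ignored
    ) -> Dict[str, Any]:
        chosen = model or "echo-v1"  # made-up model id
        if not prompt.strip():
            return {
                "success": False,
                "video": None,
                "error": "prompt must not be empty",
                "error_type": "invalid_prompt",  # made-up error_type value
                "model": chosen,
                "prompt": prompt,
                "aspect_ratio": aspect_ratio,
                "provider": self.name,
            }
        # The router is the presence of image_url.
        modality = "image" if image_url else "text"
        return {
            "success": True,
            "video": f"https://example.invalid/{modality}.mp4",  # placeholder
            "model": chosen,
            "prompt": prompt,
            "modality": modality,
            "aspect_ratio": aspect_ratio,
            "duration": int(duration) if duration else 0,
            "provider": self.name,
        }


provider = EchoVideoProvider()
text_result = provider.generate("a red kite over dunes", duration=5)
image_result = provider.generate(
    "animate this",
    image_url="https://example.invalid/kite.png",
    future_flag=True,  # unknown key: must not raise
)
```

Accepting `**kwargs` is what lets the tool schema grow new parameters without breaking providers that were written against an older version.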
+117
@@ -0,0 +1,117 @@
"""
Video Generation Provider Registry
==================================
Central map of registered providers. Populated by plugins at import-time via
``PluginContext.register_video_gen_provider()``; consumed by the
``video_generate`` tool to dispatch each call to the active backend.
Active selection
----------------
The active provider is chosen by ``video_gen.provider`` in ``config.yaml``.
If unset, :func:`get_active_provider` applies fallback logic:
1. If exactly one provider is registered, use it.
2. Otherwise return ``None`` (the tool surfaces a helpful error pointing
the user at ``hermes tools``).
Mirrors ``agent/image_gen_registry.py`` so the two surfaces behave the
same.
"""
from __future__ import annotations
import logging
import threading
from typing import Dict, List, Optional
from agent.video_gen_provider import VideoGenProvider
logger = logging.getLogger(__name__)
_providers: Dict[str, VideoGenProvider] = {}
_lock = threading.Lock()
def register_provider(provider: VideoGenProvider) -> None:
"""Register a video generation provider.
Re-registration (same ``name``) overwrites the previous entry and logs
a debug message; this makes hot-reload scenarios (tests, dev loops)
behave predictably.
"""
if not isinstance(provider, VideoGenProvider):
raise TypeError(
f"register_provider() expects a VideoGenProvider instance, "
f"got {type(provider).__name__}"
)
name = provider.name
if not isinstance(name, str) or not name.strip():
raise ValueError("Video gen provider .name must be a non-empty string")
with _lock:
existing = _providers.get(name)
_providers[name] = provider
if existing is not None:
logger.debug("Video gen provider '%s' re-registered (was %r)", name, type(existing).__name__)
else:
logger.debug("Registered video gen provider '%s' (%s)", name, type(provider).__name__)
def list_providers() -> List[VideoGenProvider]:
"""Return all registered providers, sorted by name."""
with _lock:
items = list(_providers.values())
return sorted(items, key=lambda p: p.name)
def get_provider(name: str) -> Optional[VideoGenProvider]:
"""Return the provider registered under *name*, or None."""
if not isinstance(name, str):
return None
with _lock:
return _providers.get(name.strip())
def get_active_provider() -> Optional[VideoGenProvider]:
"""Resolve the currently-active provider.
Reads ``video_gen.provider`` from config.yaml; falls back per the
module docstring.
"""
configured: Optional[str] = None
try:
from hermes_cli.config import load_config
cfg = load_config()
section = cfg.get("video_gen") if isinstance(cfg, dict) else None
if isinstance(section, dict):
raw = section.get("provider")
if isinstance(raw, str) and raw.strip():
configured = raw.strip()
except Exception as exc:
logger.debug("Could not read video_gen.provider from config: %s", exc)
with _lock:
snapshot = dict(_providers)
if configured:
provider = snapshot.get(configured)
if provider is not None:
return provider
logger.debug(
"video_gen.provider='%s' configured but not registered; falling back",
configured,
)
# Fallback: single-provider case
if len(snapshot) == 1:
return next(iter(snapshot.values()))
return None
def _reset_for_tests() -> None:
"""Clear the registry. **Test-only.**"""
with _lock:
_providers.clear()
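The selection rule in `get_active_provider()` reduces to a small pure function once config loading and locking are set aside. The sketch below is an illustration, not the registry itself: `resolve_active` is a hypothetical helper, and plain strings stand in for provider instances.

```python
from typing import Dict, Optional, TypeVar

P = TypeVar("P")


def resolve_active(configured: Optional[str], providers: Dict[str, P]) -> Optional[P]:
    """Pick the configured provider, else the only registered one, else None."""
    key = configured.strip() if isinstance(configured, str) else ""
    if key:
        hit = providers.get(key)
        if hit is not None:
            return hit
        # Configured but not registered: fall through to the fallback.
    if len(providers) == 1:
        return next(iter(providers.values()))
    return None


one = {"fal": "FAL"}
two = {"fal": "FAL", "xai": "XAI"}
```

With two providers registered and no usable config value, the function returns `None`, which is the case where the tool is expected to point the user at `hermes tools`.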
+221
@@ -0,0 +1,221 @@
"""
Web Search Provider ABC
=======================
Defines the pluggable-backend interface for web search and content extraction.
Providers register instances via ``PluginContext.register_web_search_provider()``;
the active one (selected via ``web.search_backend`` / ``web.extract_backend`` /
``web.backend`` in ``config.yaml``) services every ``web_search`` /
``web_extract`` tool call.
Providers live in ``<repo>/plugins/web/<name>/`` (built-in, auto-loaded as
``kind: backend``) or ``~/.hermes/plugins/web/<name>/`` (user, opt-in via
``plugins.enabled``).
This ABC is the SINGLE plugin-facing surface for web providers: every
provider in the tree (brave-free, ddgs, searxng, exa, parallel, tavily,
firecrawl) implements it. The legacy in-tree ``tools.web_providers.base``
ABCs were deleted in PR #25182 along with the per-vendor inline helpers
in ``tools/web_tools.py``; the response-shape contract documented below
is preserved bit-for-bit so the tool wrapper does not have to translate.
Response shape (preserved from the legacy contract):
Search results::
{
"success": True,
"data": {
"web": [
{"title": str, "url": str, "description": str, "position": int},
...
]
}
}
Extract results::
{
"success": True,
"data": [
{"url": str, "title": str, "content": str,
"raw_content": str, "metadata": dict},
...
]
}
On failure (either capability)::
{"success": False, "error": str}
"""
from __future__ import annotations
import abc
from typing import Any, Dict, List
# ---------------------------------------------------------------------------
# ABC
# ---------------------------------------------------------------------------
class WebSearchProvider(abc.ABC):
"""Abstract base class for a web search/extract/crawl backend.
Subclasses must implement :meth:`is_available` and at least one of
:meth:`search` / :meth:`extract` / :meth:`crawl`. The
:meth:`supports_search` / :meth:`supports_extract` / :meth:`supports_crawl`
capability flags let the registry route each tool call to the right
provider, and let multi-capability providers (Firecrawl, Tavily, Exa,
etc.) advertise multiple capabilities from a single class.
"""
@property
@abc.abstractmethod
def name(self) -> str:
"""Stable short identifier used in ``web.search_backend`` /
``web.extract_backend`` / ``web.backend`` config keys.
Lowercase, no spaces; hyphens permitted to preserve existing
user-visible names. Examples: ``brave-free``, ``ddgs``,
``searxng``, ``firecrawl``.
"""
@property
def display_name(self) -> str:
"""Human-readable label shown in ``hermes tools``. Defaults to ``name``."""
return self.name
@abc.abstractmethod
def is_available(self) -> bool:
"""Return True when this provider can service calls.
Typically a cheap check (env var present, optional Python dep
importable, instance URL set). Must NOT make network calls: this
runs at tool-registration time and on every ``hermes tools`` paint.
"""
def supports_search(self) -> bool:
"""Return True if this provider implements :meth:`search`."""
return True
def supports_extract(self) -> bool:
"""Return True if this provider implements :meth:`extract`.
Both sync and async :meth:`extract` implementations are valid; the
dispatcher detects coroutine functions via
:func:`inspect.iscoroutinefunction` and awaits as needed. Sync
implementations that perform blocking I/O (HTTP, SDK calls) should
ideally wrap in :func:`asyncio.to_thread` at the call site; small
providers can keep their sync shape and let the dispatcher handle
threading.
"""
return False
def supports_crawl(self) -> bool:
"""Return True if this provider implements :meth:`crawl`.
Crawl differs from extract in that the agent provides a *seed URL*
and the provider walks linked pages on its own, which is useful for
documentation sites where the agent doesn't know all relevant
URLs upfront. Tavily is the only built-in backend that natively
crawls today; Firecrawl provides a similar capability that we
don't currently surface as a tool.
Providers that don't crawl should leave this as False; the
dispatcher in :func:`tools.web_tools.web_crawl_tool` will fall
back to its auxiliary-model summarization path.
"""
return False
def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
"""Execute a web search.
Override when :meth:`supports_search` returns True. The default
raises NotImplementedError; callers should gate on
:meth:`supports_search` before calling.
"""
raise NotImplementedError(
f"{self.name} does not support search (override supports_search)"
)
def extract(self, urls: List[str], **kwargs: Any) -> Any:
"""Extract content from one or more URLs.
Override when :meth:`supports_extract` returns True. The default
raises NotImplementedError; callers should gate on
:meth:`supports_extract` before calling.
Return shape: a list of result dicts matching what the legacy
:func:`tools.web_tools.web_extract_tool` post-processing pipeline
expects::
[
{
"url": str,
"title": str,
"content": str,
"raw_content": str,
"metadata": dict, # optional
"error": str, # optional, only on per-URL failure
},
...
]
Implementations MAY be ``async def``; the dispatcher detects
coroutines via :func:`inspect.iscoroutinefunction` and awaits.
``kwargs`` may carry forward-compat fields (``format``, ``include_raw``,
``max_chars``); implementations should ignore unknown keys.
"""
raise NotImplementedError(
f"{self.name} does not support extract (override supports_extract)"
)
def crawl(self, url: str, **kwargs: Any) -> Any:
"""Crawl a seed URL and return results.
Override when :meth:`supports_crawl` returns True. The default
raises NotImplementedError; callers should gate on
:meth:`supports_crawl` before calling.
Return shape: ``{"results": [{"url": str, "title": str,
"content": str, ...}, ...]}`` matching what
:func:`tools.web_tools.web_crawl_tool` post-processing expects.
Implementations MAY be ``async def``.
``kwargs`` may carry forward-compat fields (e.g. ``max_depth``,
``include_domains``); implementations should ignore unknown keys.
"""
raise NotImplementedError(
f"{self.name} does not support crawl (override supports_crawl)"
)
def get_setup_schema(self) -> Dict[str, Any]:
"""Return provider metadata for the ``hermes tools`` picker.
Used by ``hermes_cli/tools_config.py`` to inject this provider as a
row in the Web Search / Web Extract picker. Shape::
{
"name": "Brave Search (Free)",
"badge": "free",
"tag": "No paid tier needed — uses Brave's free API.",
"env_vars": [
{"key": "BRAVE_SEARCH_API_KEY",
"prompt": "Brave Search API key",
"url": "https://brave.com/search/api/"},
],
}
Default: minimal entry derived from ``display_name``. Override to
expose API key prompts, badges, and instance URL fields.
"""
return {
"name": self.display_name,
"badge": "",
"tag": "",
"env_vars": [],
}
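The sync/async handling that the `extract` docstrings describe can be sketched as follows. The real dispatcher lives in `tools.web_tools` and is not shown in this diff, so `call_extract` and both provider classes are hypothetical; the sketch only demonstrates the documented technique of checking `inspect.iscoroutinefunction` and pushing blocking implementations onto a worker thread with `asyncio.to_thread`.

```python
import asyncio
import inspect
from typing import Any, Dict, List


class SyncExtractor:
    """Hypothetical provider with a blocking extract()."""

    def extract(self, urls: List[str], **kwargs: Any) -> List[Dict[str, Any]]:
        return [
            {"url": u, "title": "", "content": "sync", "raw_content": "sync"}
            for u in urls
        ]


class AsyncExtractor:
    """Hypothetical provider with a coroutine extract()."""

    async def extract(self, urls: List[str], **kwargs: Any) -> List[Dict[str, Any]]:
        await asyncio.sleep(0)  # stands in for real async I/O
        return [
            {"url": u, "title": "", "content": "async", "raw_content": "async"}
            for u in urls
        ]


async def call_extract(provider: Any, urls: List[str], **kwargs: Any) -> List[Dict[str, Any]]:
    """Await coroutine implementations; run blocking ones in a thread."""
    if inspect.iscoroutinefunction(provider.extract):
        return await provider.extract(urls, **kwargs)
    return await asyncio.to_thread(provider.extract, urls, **kwargs)


def run_demo() -> List[str]:
    async def _both() -> List[str]:
        urls = ["https://example.invalid/a"]
        sync_out = await call_extract(SyncExtractor(), urls)
        # max_chars is an unknown key for these providers and is ignored.
        async_out = await call_extract(AsyncExtractor(), urls, max_chars=100)
        return [sync_out[0]["content"], async_out[0]["content"]]

    return asyncio.run(_both())
```

Either provider shape yields the same list-of-dicts result, so the caller does not need to know which kind it is talking to.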
+262
@@ -0,0 +1,262 @@
"""
Web Search Provider Registry
============================
Central map of registered web providers. Populated by plugins at import-time
via :meth:`PluginContext.register_web_search_provider`; consumed by the
``web_search`` and ``web_extract`` tool wrappers in :mod:`tools.web_tools` to
dispatch each call to the active backend.
Active selection
----------------
The active provider is chosen by configuration with this precedence:
1. ``web.search_backend`` / ``web.extract_backend`` / ``web.crawl_backend``
(per-capability override).
2. ``web.backend`` (shared fallback).
3. If exactly one capability-eligible provider is registered AND available,
use it.
4. Legacy preference order ``firecrawl`` -> ``parallel`` -> ``tavily`` ->
``exa`` -> ``searxng`` -> ``brave-free`` -> ``ddgs``, filtered by
availability. Matches the historic ``tools.web_tools._get_backend()``
candidate order so installs that never set a config key keep landing
on the same provider they did before the plugin migration.
5. Otherwise ``None``; the tool surfaces a helpful error pointing at
``hermes tools``.
The capability filter (``supports_search`` / ``supports_extract`` /
``supports_crawl``) is applied at every step so a search-only provider
(``brave-free``) configured as ``web.extract_backend`` correctly falls
through to an extract-capable backend.
"""
from __future__ import annotations
import logging
import threading
from typing import Dict, List, Optional
from agent.web_search_provider import WebSearchProvider
logger = logging.getLogger(__name__)
_providers: Dict[str, WebSearchProvider] = {}
_lock = threading.Lock()
def register_provider(provider: WebSearchProvider) -> None:
"""Register a web search/extract provider.
Re-registration (same ``name``) overwrites the previous entry and logs
a debug message, which makes hot-reload scenarios (tests, dev loops)
behave predictably.
"""
if not isinstance(provider, WebSearchProvider):
raise TypeError(
f"register_provider() expects a WebSearchProvider instance, "
f"got {type(provider).__name__}"
)
name = provider.name
if not isinstance(name, str) or not name.strip():
raise ValueError("Web provider .name must be a non-empty string")
with _lock:
existing = _providers.get(name)
_providers[name] = provider
if existing is not None:
logger.debug(
"Web provider '%s' re-registered (was %r)",
name, type(existing).__name__,
)
else:
logger.debug(
"Registered web provider '%s' (%s)",
name, type(provider).__name__,
)
def list_providers() -> List[WebSearchProvider]:
"""Return all registered providers, sorted by name."""
with _lock:
items = list(_providers.values())
return sorted(items, key=lambda p: p.name)
def get_provider(name: str) -> Optional[WebSearchProvider]:
"""Return the provider registered under *name*, or None."""
if not isinstance(name, str):
return None
with _lock:
return _providers.get(name.strip())
# ---------------------------------------------------------------------------
# Active-provider resolution
# ---------------------------------------------------------------------------
def _read_config_key(*path: str) -> Optional[str]:
"""Resolve a dotted config key from ``config.yaml``. Returns None on miss."""
try:
from hermes_cli.config import load_config
cfg = load_config()
cur = cfg
for segment in path:
if not isinstance(cur, dict):
return None
cur = cur.get(segment)
if isinstance(cur, str) and cur.strip():
return cur.strip()
except Exception as exc:
logger.debug("Could not read config %s: %s", ".".join(path), exc)
return None
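The dotted-key walk above can be tried in isolation. This is a minimal sketch, assuming the config is already loaded into a plain nested `dict`; `read_key` and the sample `cfg` are hypothetical names for illustration, not part of the module.

```python
from typing import Any, Dict, Optional


def read_key(cfg: Dict[str, Any], *path: str) -> Optional[str]:
    """Walk nested dicts; return a stripped non-empty string or None."""
    cur: Any = cfg
    for segment in path:
        if not isinstance(cur, dict):
            return None  # path descends into a non-mapping value
        cur = cur.get(segment)
    if isinstance(cur, str) and cur.strip():
        return cur.strip()
    return None  # missing key, empty string, or non-string value


# Hypothetical config: the per-capability key is empty, so the shared
# fallback is used, mirroring the `a or b` pattern in the getters.
cfg = {"web": {"backend": " tavily ", "search_backend": ""}}
explicit = read_key(cfg, "web", "search_backend") or read_key(cfg, "web", "backend")
```

An empty or whitespace-only value counts as a miss, which is what lets the `or` chain fall through to the shared key.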
# Legacy preference order — preserves behaviour for users who set no
# ``web.backend`` / ``web.<capability>_backend`` config key at all. Matches
# the historic candidate order in :func:`tools.web_tools._get_backend`
# (paid providers first so existing paid setups don't get downgraded to
# a free tier on upgrade). Filtered by ``is_available()`` at walk time so
# we don't surface a provider the user has no credentials for.
_LEGACY_PREFERENCE = (
"firecrawl",
"parallel",
"tavily",
"exa",
"searxng",
"brave-free",
"ddgs",
)
def _resolve(configured: Optional[str], *, capability: str) -> Optional[WebSearchProvider]:
"""Resolve the active provider for a capability ("search" | "extract" | "crawl").
Resolution rules (in order):
1. **Explicit config wins, ignoring availability.** If
``web.{capability}_backend`` or ``web.backend`` names a registered
provider that supports *capability*, return it even if its
:meth:`is_available` returns False; the dispatcher will surface a
precise "X_API_KEY is not set" error to the user instead of silently
routing somewhere else. Matches legacy
:func:`tools.web_tools._get_backend` behavior for configured names.
2. **Single-provider shortcut.** When only one registered provider
supports *capability* AND ``is_available()`` reports True, return it.
3. **Legacy preference walk, filtered by availability.** Walk the
:data:`_LEGACY_PREFERENCE` order (firecrawl -> parallel -> tavily ->
exa -> searxng -> brave-free -> ddgs) looking for a provider whose
``supports_<capability>()`` is True AND whose ``is_available()`` is
True. Matches the historic ``tools.web_tools._get_backend()``
candidate order so users with credentials but no explicit config
key keep landing on the same provider as pre-migration. This is
the path that fires when no config key is set: pick the
highest-priority backend the user actually has credentials for.
Returns None when no provider is configured AND no available provider
matches the legacy preference; the dispatcher then returns a "set up a
provider" error to the user.
"""
with _lock:
snapshot = dict(_providers)
def _capable(p: WebSearchProvider) -> bool:
if capability == "search":
return bool(p.supports_search())
if capability == "extract":
return bool(p.supports_extract())
if capability == "crawl":
return bool(p.supports_crawl())
return False
def _is_available_safe(p: WebSearchProvider) -> bool:
"""Wrap ``is_available()`` so a buggy provider doesn't kill resolution."""
try:
return bool(p.is_available())
except Exception as exc: # noqa: BLE001
logger.debug("provider %s.is_available() raised %s", p.name, exc)
return False
# 1. Explicit config wins — return regardless of is_available() so the
# user gets a precise downstream error message rather than a silent
# backend switch. Matches _get_backend() in web_tools.py.
if configured:
provider = snapshot.get(configured)
if provider is not None and _capable(provider):
return provider
if provider is None:
logger.debug(
"web backend '%s' configured but not registered; falling back",
configured,
)
else:
logger.debug(
"web backend '%s' configured but does not support '%s'; falling back",
configured, capability,
)
# 2. + 3. Fallback path — filter by availability so we don't surface
# a provider the user has no credentials for. Without this filter,
# a registered-but-unconfigured provider could end up "active" on
# a fresh install with no API keys at all.
eligible = [
p for p in snapshot.values()
if _capable(p) and _is_available_safe(p)
]
if len(eligible) == 1:
return eligible[0]
for legacy in _LEGACY_PREFERENCE:
provider = snapshot.get(legacy)
if (
provider is not None
and _capable(provider)
and _is_available_safe(provider)
):
return provider
return None
def get_active_search_provider() -> Optional[WebSearchProvider]:
"""Resolve the currently-active web search provider.
Reads ``web.search_backend`` (preferred) or ``web.backend`` (shared
fallback) from config.yaml; falls back per the module docstring.
"""
explicit = _read_config_key("web", "search_backend") or _read_config_key("web", "backend")
return _resolve(explicit, capability="search")
def get_active_extract_provider() -> Optional[WebSearchProvider]:
"""Resolve the currently-active web extract provider.
Reads ``web.extract_backend`` (preferred) or ``web.backend`` (shared
fallback) from config.yaml; falls back per the module docstring.
"""
explicit = _read_config_key("web", "extract_backend") or _read_config_key("web", "backend")
return _resolve(explicit, capability="extract")
def get_active_crawl_provider() -> Optional[WebSearchProvider]:
"""Resolve the currently-active web crawl provider.
Reads ``web.crawl_backend`` (preferred) or ``web.backend`` (shared
fallback) from config.yaml; falls back per the module docstring.
Crawl is a niche capability: among built-in providers, only Tavily and
Firecrawl implement it. Callers should expect ``None`` and fall back to
a different strategy (e.g. summarize-via-LLM) when neither is
configured.
"""
explicit = _read_config_key("web", "crawl_backend") or _read_config_key("web", "backend")
return _resolve(explicit, capability="crawl")
def _reset_for_tests() -> None:
"""Clear the registry. **Test-only.**"""
with _lock:
_providers.clear()
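The resolution precedence described in this module can be exercised without the real provider classes. The sketch below is an assumption-laden stand-in: `FakeProvider` and `resolve` are hypothetical names, capabilities are modelled as a set of strings rather than `supports_*()` methods, and config reading is replaced by a plain `configured` argument.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FakeProvider:
    """Hypothetical stand-in for a registered web provider."""
    name: str
    capabilities: frozenset
    available: bool = True


LEGACY_ORDER = ("firecrawl", "parallel", "tavily", "exa",
                "searxng", "brave-free", "ddgs")


def resolve(configured: Optional[str], capability: str,
            registry: Dict[str, FakeProvider]) -> Optional[FakeProvider]:
    # 1. Explicit config wins, even when the provider is unavailable.
    if configured:
        p = registry.get(configured)
        if p is not None and capability in p.capabilities:
            return p
    # 2. Single capable-and-available provider.
    eligible = [p for p in registry.values()
                if capability in p.capabilities and p.available]
    if len(eligible) == 1:
        return eligible[0]
    # 3. Legacy preference walk, filtered by availability.
    for name in LEGACY_ORDER:
        p = registry.get(name)
        if p is not None and capability in p.capabilities and p.available:
            return p
    return None


registry = {
    "brave-free": FakeProvider("brave-free", frozenset({"search"})),
    "tavily": FakeProvider("tavily", frozenset({"search", "extract", "crawl"})),
    "firecrawl": FakeProvider("firecrawl",
                              frozenset({"search", "extract", "crawl"}),
                              available=False),
}
```

With this registry, a search-only provider configured for extract falls through to `tavily`, while an explicitly configured but unavailable `firecrawl` is still returned so the caller can report the missing credential.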
+12
@@ -364,6 +364,18 @@ compression:
# compression of older turns.
protect_last_n: 20
# Number of non-system messages to protect at the head of the transcript, in
# ADDITION to the system prompt (which is always implicitly protected).
# Head messages are NEVER summarized — they survive every compression
# indefinitely. This gives stable early context for short/medium sessions,
# but in long-running sessions that rely on rolling compaction the pinned
# opening turns may not match how you want the session framed over time.
# Set to 0 to preserve ONLY the system prompt (plus the rolling summary
# and recent tail) — the cleanest configuration for long-running sessions.
# Default 3 preserves the system prompt plus the first three non-system
# head messages, matching the pre-feature behaviour.
protect_first_n: 3
# To pin a specific model/provider for compression summaries, use the
# auxiliary section below (auxiliary.compression.provider / model).
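To make the interaction between `protect_first_n` and `protect_last_n` concrete, here is a rough sketch of how a transcript might be split into pinned head, compressible middle, and protected tail. It is an illustration only: `partition_for_compression` is a hypothetical helper, and it assumes both settings count individual non-system messages, which the config comments suggest but do not spell out.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def partition_for_compression(
    messages: List[Message],
    protect_first_n: int = 3,
    protect_last_n: int = 20,
) -> Tuple[List[Message], List[Message], List[Message]]:
    """Split into (pinned head, compressible middle, protected tail)."""
    # System messages are always protected, independent of protect_first_n.
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    head = rest[:protect_first_n]
    remaining = rest[protect_first_n:]
    tail = remaining[-protect_last_n:] if protect_last_n > 0 else []
    middle = remaining[: len(remaining) - len(tail)]
    return system + head, middle, tail


# Hypothetical transcript: one system prompt plus 30 user turns.
msgs = [{"role": "system", "content": "s"}] + [
    {"role": "user", "content": str(i)} for i in range(30)
]
pinned, middle, tail = partition_for_compression(msgs)
pinned0, middle0, tail0 = partition_for_compression(msgs, protect_first_n=0)
```

Under these assumptions the default pins four messages and leaves seven to summarize, while `protect_first_n: 0` pins only the system prompt and leaves ten.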
+643 -81
@@ -1242,7 +1242,13 @@ _STREAM_PAD = " " # 4-space indent for streamed response text (matches Panel
def _hex_to_ansi(hex_color: str, *, bold: bool = False) -> str:
"""Convert a hex color like '#268bd2' to a true-color ANSI escape."""
"""Convert a hex color like '#268bd2' to a true-color ANSI escape.
Auto-remaps known dark-mode-tuned colors to readable light-mode
equivalents when running on a light terminal (see
_maybe_remap_for_light_mode + _LIGHT_MODE_REMAP).
"""
hex_color = _maybe_remap_for_light_mode(hex_color)
try:
r = int(hex_color[1:3], 16)
g = int(hex_color[3:5], 16)
@@ -1253,6 +1259,250 @@ def _hex_to_ansi(hex_color: str, *, bold: bool = False) -> str:
return _ACCENT_ANSI_DEFAULT if bold else "\033[38;2;184;134;11m"
# ────────────────────────────────────────────────────────────────────────
# Light/dark terminal mode detection.
#
# Mirrors ui-tui/src/theme.ts detectLightMode(). Used to decide whether
# to remap "near-white" skin colors (e.g. #FFF8DC banner_text, #B8860B
# banner_dim) to darker equivalents that are readable on a light
# Terminal.app / iTerm2 background.
#
# Detection priority:
# 1. HERMES_LIGHT / HERMES_TUI_LIGHT env (true/false) — explicit override
# 2. HERMES_TUI_THEME=light|dark — explicit theme
# 3. HERMES_TUI_BACKGROUND=#RRGGBB — explicit bg hint
# 4. COLORFGBG env (set by xterm/Konsole/urxvt) — bg slot 7/15 = light
# 5. OSC 11 query (\x1b]11;?\x1b\\) — ask the terminal directly
# 6. TERM_PROGRAM allow-list (currently empty)
# 7. Default: assume dark (matches the legacy Hermes assumption)
#
# Cached after first call so we don't query the terminal repeatedly.
_LIGHT_MODE_CACHE: bool | None = None
_TRUE_RE = re.compile(r"^(1|true|on|yes|y)$")
_FALSE_RE = re.compile(r"^(0|false|off|no|n)$")
_LIGHT_DEFAULT_TERM_PROGRAMS = frozenset() # Apple_Terminal doesn't reliably indicate; require explicit
def _luminance_from_hex(hex_str: str) -> float | None:
s = (hex_str or "").strip().lstrip("#")
if len(s) == 3:
s = "".join(c * 2 for c in s)
if len(s) != 6 or not all(c in "0123456789abcdefABCDEF" for c in s):
return None
try:
r, g, b = int(s[0:2], 16), int(s[2:4], 16), int(s[4:6], 16)
except ValueError:
return None
# Rec.709 luma
return (0.2126 * r + 0.7152 * g + 0.0722 * b) / 255.0
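As a quick check of the Rec.709 weighting, the sketch below recomputes the luma for two colors that appear in this diff. `luminance` is a hypothetical standalone copy of the same arithmetic, and the 0.5 cutoff is the one the detection code applies to background colors.

```python
from typing import Optional

_HEX_DIGITS = set("0123456789abcdefABCDEF")


def luminance(hex_str: str) -> Optional[float]:
    """Rec.709 luma in [0, 1] for '#RGB' or '#RRGGBB', else None."""
    s = (hex_str or "").strip().lstrip("#")
    if len(s) == 3:
        s = "".join(c * 2 for c in s)  # expand shorthand: 'abc' -> 'aabbcc'
    if len(s) != 6 or not all(c in _HEX_DIGITS for c in s):
        return None
    r, g, b = (int(s[i:i + 2], 16) for i in (0, 2, 4))
    return (0.2126 * r + 0.7152 * g + 0.0722 * b) / 255.0


cornsilk = luminance("#FFF8DC")   # banner text color, close to white
navy = luminance("#1a1a2e")       # status-bar background, close to black
```

Cornsilk lands well above 0.5 and the navy status-bar background well below it, so a background of either color would be classified as light and dark respectively.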
def _query_osc11_background() -> str | None:
"""Ask the terminal for its background color via OSC 11.
Most modern terminals reply with \x1b]11;rgb:RRRR/GGGG/BBBB\x1b\\
within a few ms. We wait up to 100ms total before giving up.
Returns "#RRGGBB" or None on timeout / non-tty.
"""
if not sys.stdin.isatty() or not sys.stdout.isatty():
return None
try:
import termios
import tty
fd = sys.stdin.fileno()
old = termios.tcgetattr(fd)
except Exception:
return None
try:
try:
tty.setcbreak(fd)
except Exception:
return None
try:
sys.stdout.write("\x1b]11;?\x1b\\")
sys.stdout.flush()
except Exception:
return None
# Read for up to ~100ms for the response
import select
deadline = time.monotonic() + 0.1
buf = b""
while time.monotonic() < deadline:
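# Clamp at 0: select() rejects a negative timeout if the deadline
# passes between the loop check and this call.
r, _, _ = select.select([fd], [], [], max(0.0, deadline - time.monotonic()))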
if not r:
continue
try:
chunk = os.read(fd, 64)
except OSError:
break
if not chunk:
break
buf += chunk
if b"\x1b\\" in buf or b"\x07" in buf:
break
# Parse: \x1b]11;rgb:RRRR/GGGG/BBBB\x1b\\
m = re.search(rb"rgb:([0-9a-fA-F]+)/([0-9a-fA-F]+)/([0-9a-fA-F]+)", buf)
if not m:
return None
# Each component is 1-4 hex digits — normalize to 8-bit
def norm(h: bytes) -> int:
v = int(h, 16)
# Scale to 0-255 based on hex length
bits = len(h) * 4
return (v * 255) // ((1 << bits) - 1) if bits else 0
r, g, b = norm(m.group(1)), norm(m.group(2)), norm(m.group(3))
return f"#{r:02X}{g:02X}{b:02X}"
finally:
try:
termios.tcsetattr(fd, termios.TCSANOW, old)
except Exception:
pass
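The parsing half of the OSC 11 query can be tested without a terminal. The sketch below isolates it as a pure function; `parse_osc11_reply` is a hypothetical name, and the sample replies are hand-written in the `rgb:R/G/B` shape the function above expects.

```python
import re
from typing import Optional

_OSC11_RGB = re.compile(
    rb"rgb:([0-9a-fA-F]{1,4})/([0-9a-fA-F]{1,4})/([0-9a-fA-F]{1,4})"
)


def parse_osc11_reply(buf: bytes) -> Optional[str]:
    """Turn an OSC 11 reply into '#RRGGBB', or None if it does not match."""
    m = _OSC11_RGB.search(buf)
    if not m:
        return None

    def to_8bit(component: bytes) -> int:
        # Each component is 1-4 hex digits; rescale its full range to 0-255.
        bits = len(component) * 4
        return (int(component, 16) * 255) // ((1 << bits) - 1)

    r, g, b = (to_8bit(m.group(i)) for i in (1, 2, 3))
    return f"#{r:02X}{g:02X}{b:02X}"


white = parse_osc11_reply(b"\x1b]11;rgb:ffff/ffff/ffff\x1b\\")
navy = parse_osc11_reply(b"\x1b]11;rgb:1a1a/1a1a/2e2e\x07")
```

Keeping the parser separate from the raw-mode read makes the terminal interaction the only part that needs a real tty to verify.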
def _detect_light_mode() -> bool:
global _LIGHT_MODE_CACHE
if _LIGHT_MODE_CACHE is not None:
return _LIGHT_MODE_CACHE
result = False
try:
# 1. Explicit env override
for var in ("HERMES_LIGHT", "HERMES_TUI_LIGHT"):
v = (os.environ.get(var) or "").strip().lower()
if _TRUE_RE.match(v):
result = True
_LIGHT_MODE_CACHE = result
return result
if _FALSE_RE.match(v):
_LIGHT_MODE_CACHE = result
return result
# 2. Theme hint
theme = (os.environ.get("HERMES_TUI_THEME") or "").strip().lower()
if theme == "light":
result = True
_LIGHT_MODE_CACHE = result
return result
if theme == "dark":
_LIGHT_MODE_CACHE = result
return result
# 3. Explicit bg hex
bg_hint = os.environ.get("HERMES_TUI_BACKGROUND") or ""
bg_lum = _luminance_from_hex(bg_hint)
if bg_lum is not None:
result = bg_lum >= 0.5
_LIGHT_MODE_CACHE = result
return result
# 4. COLORFGBG (xterm/Konsole/urxvt)
cfgbg = (os.environ.get("COLORFGBG") or "").strip()
if cfgbg:
last = cfgbg.split(";")[-1] if ";" in cfgbg else cfgbg
if last.isdigit():
bg = int(last)
if bg in (7, 15):
result = True
_LIGHT_MODE_CACHE = result
return result
if 0 <= bg < 16:
_LIGHT_MODE_CACHE = result
return result
# 5. OSC 11 query (best-effort, only when stdin/stdout are TTY)
bg_color = _query_osc11_background()
if bg_color:
lum = _luminance_from_hex(bg_color)
if lum is not None:
result = lum >= 0.5
_LIGHT_MODE_CACHE = result
return result
# 6. TERM_PROGRAM allow-list (currently empty)
tp = (os.environ.get("TERM_PROGRAM") or "").strip()
if tp in _LIGHT_DEFAULT_TERM_PROGRAMS:
result = True
except Exception:
result = False
_LIGHT_MODE_CACHE = result
return result
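The `COLORFGBG` step is the least obvious of the checks, so here it is as a standalone sketch. `light_from_colorfgbg` is a hypothetical helper that returns `None` when the variable is not decisive, which corresponds to falling through to the OSC 11 query in the function above.

```python
from typing import Optional


def light_from_colorfgbg(value: Optional[str]) -> Optional[bool]:
    """True/False when COLORFGBG names a 16-color background slot, else None."""
    value = (value or "").strip()
    if not value:
        return None
    last = value.split(";")[-1]      # background is the final field
    if not last.isdigit():
        return None                  # e.g. 'default'
    bg = int(last)
    if bg in (7, 15):                # white / bright white slots
        return True
    if 0 <= bg < 16:                 # any other ANSI slot counts as dark
        return False
    return None                      # outside the 16-color range: undecided
```

Typical values look like `0;15` (dark text on a light background) or `15;0` (the reverse); some terminals insert a middle field, which is why only the last one is read.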
# Light-mode equivalents of skin colors that are unreadable on cream
# Terminal.app backgrounds. Used by _SkinAwareAnsi to remap colors
# at resolution time when light mode is detected.
#
# IMPORTANT: only remap colors that are used as STANDALONE foregrounds
# on the terminal's background. Don't remap colors that are paired
# with a dark bg (e.g. status bar text on bg:#1a1a2e) — those would
# become invisible the OTHER direction (dark gray on dark navy).
_LIGHT_MODE_REMAP: dict[str, str] = {
# Original (dark-mode) -> Light-mode replacement (darker, readable)
"#FFF8DC": "#1A1A1A", # cornsilk -> near-black
"#FFD700": "#9A6B00", # gold -> dark goldenrod (readable on cream)
"#FFBF00": "#8A5A00", # amber -> dark amber
"#B8860B": "#5C4500", # dark goldenrod -> deeper brown (more contrast)
"#DAA520": "#6B4F00", # goldenrod -> dark olive
"#F1E6CF": "#1A1A1A", # cream -> near-black
"#c9d1d9": "#24292F", # github-light fg
"#EAF7FF": "#0F1B26", # ice
"#F5F5F5": "#1A1A1A",
"#FFF0D4": "#1A1A1A",
"#CD7F32": "#8A4F1A", # bronze -> darker bronze
"#FFEFB5": "#3A2A00",
# NOTE: skipping #C0C0C0/#888888/#555555/#8B8682 — those are
# status-bar foregrounds paired with dark navy bg, where dark
# remap values would become invisible.
}
def _maybe_remap_for_light_mode(hex_color: str) -> str:
"""If we're in light mode, remap a dark-mode-tuned color to a
higher-contrast equivalent. No-op in dark mode."""
if not _detect_light_mode():
return hex_color
if not hex_color or not hex_color.startswith("#"):
return hex_color
# Case-insensitive lookup
upper = hex_color.upper()
if upper in _LIGHT_MODE_REMAP_UPPER:
return _LIGHT_MODE_REMAP_UPPER[upper]
return hex_color
# Pre-uppercased lookup table for case-insensitive remapping
_LIGHT_MODE_REMAP_UPPER = {k.upper(): v for k, v in _LIGHT_MODE_REMAP.items()}
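A small standalone sketch of the case-insensitive lookup follows. It uses a three-entry subset of the table above and takes the light/dark decision as an explicit `light` argument instead of calling the detector; `remap` is a hypothetical name.

```python
from typing import Dict

# Subset of the remap table, including one lowercase key.
REMAP: Dict[str, str] = {
    "#FFF8DC": "#1A1A1A",
    "#FFD700": "#9A6B00",
    "#c9d1d9": "#24292F",
}
# Normalize keys once so lookups do not depend on the caller's casing.
REMAP_UPPER = {k.upper(): v for k, v in REMAP.items()}


def remap(hex_color: str, light: bool) -> str:
    """Return the light-mode replacement, or the input unchanged."""
    if not light or not hex_color or not hex_color.startswith("#"):
        return hex_color
    return REMAP_UPPER.get(hex_color.upper(), hex_color)
```

Colors that are not in the table pass through untouched, so skins with their own light-friendly palette are unaffected.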
def _install_skin_light_mode_hook() -> None:
"""Wrap SkinConfig.get_color at import time so EVERY skin color read goes
through the light-mode remap. Idempotent."""
try:
from hermes_cli.skin_engine import SkinConfig # type: ignore[import]
except Exception:
return
if getattr(SkinConfig, "_hermes_light_mode_hook_installed", False):
return
_orig_get_color = SkinConfig.get_color
def _wrapped_get_color(self, key, fallback=""):
value = _orig_get_color(self, key, fallback)
try:
return _maybe_remap_for_light_mode(value)
except Exception:
return value
SkinConfig.get_color = _wrapped_get_color # type: ignore[method-assign]
SkinConfig._hermes_light_mode_hook_installed = True # type: ignore[attr-defined]
_install_skin_light_mode_hook()
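The hook above is an idempotent monkey-patch guarded by a class-level flag. The sketch below shows the same pattern on a toy class; `Skin`, `install_hook`, and the flag name are hypothetical, and the transform is passed in rather than hard-wired to the light-mode remap.

```python
from typing import Callable, Dict


class Skin:
    """Toy stand-in for a skin object with a get_color() accessor."""

    def __init__(self, colors: Dict[str, str]) -> None:
        self._colors = colors

    def get_color(self, key: str, fallback: str = "") -> str:
        return self._colors.get(key, fallback)


def install_hook(cls: type, transform: Callable[[str], str],
                 flag: str = "_hook_installed") -> bool:
    """Wrap cls.get_color once; return False if already wrapped."""
    if getattr(cls, flag, False):
        return False
    original = cls.get_color

    def wrapped(self, key, fallback=""):
        value = original(self, key, fallback)
        try:
            return transform(value)
        except Exception:
            return value  # never let the transform break color reads

    cls.get_color = wrapped
    setattr(cls, flag, True)
    return True


first = install_hook(Skin, lambda v: v + "!")
second = install_hook(Skin, lambda v: v + "!")
```

The flag check is what keeps a second import or reload from stacking wrappers, which would otherwise apply the transform twice.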
# Prime the light-mode detection cache early (at module load) when
# we're running interactively so OSC 11 happens before pt grabs the
# tty. Skip for non-tty contexts (subagents, gateway, tests).
try:
if sys.stdin.isatty() and sys.stdout.isatty():
_detect_light_mode()
except Exception:
pass
class _SkinAwareAnsi:
"""Lazy ANSI escape that resolves from the skin engine on first use.
@@ -1290,7 +1540,12 @@ class _SkinAwareAnsi:
_ACCENT = _SkinAwareAnsi("response_border", "#FFD700", bold=True)
_DIM = _SkinAwareAnsi("banner_dim", "#B8860B")
# Use ANSI dim+italic attributes (\x1b[2;3m) instead of a hardcoded
# hex color so dim/thinking text inherits the terminal's default
# foreground color and stays readable in both light and dark
# Terminal.app modes. Hardcoded skin colors like #B8860B
# (dark goldenrod) become invisible against light cream backgrounds.
_DIM = "\x1b[2;3m"
def _accent_hex() -> str:
@@ -1415,9 +1670,6 @@ _OUTPUT_HISTORY_REPLAYING = False
_OUTPUT_HISTORY_SUPPRESSED = False
_OUTPUT_HISTORY_MAX_LINES = 200
_OUTPUT_HISTORY = deque(maxlen=_OUTPUT_HISTORY_MAX_LINES)
_ANSI_CONTROL_RE = re.compile(
r"\x1b(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~]|\][^\x07]*(?:\x07|\x1b\\))"
)
def _coerce_output_history_limit(value) -> int:
@@ -1459,10 +1711,10 @@ def _record_output_history_entry(entry) -> None:
def _record_output_history(text: str) -> None:
if not _OUTPUT_HISTORY_ENABLED or _OUTPUT_HISTORY_REPLAYING or _OUTPUT_HISTORY_SUPPRESSED:
return
clean = _ANSI_CONTROL_RE.sub("", str(text)).replace("\r", "").rstrip("\n")
if not clean:
normalized = str(text).replace("\r", "").rstrip("\n")
if not normalized:
return
for line in clean.splitlines():
for line in normalized.splitlines():
_record_output_history_entry(line)
@@ -1473,6 +1725,7 @@ def _replay_output_history() -> None:
return
_OUTPUT_HISTORY_REPLAYING = True
try:
rendered_lines = []
for entry in tuple(_OUTPUT_HISTORY):
if callable(entry):
try:
@@ -1483,8 +1736,15 @@ def _replay_output_history() -> None:
lines = lines.splitlines()
else:
lines = [entry]
for line in lines:
_pt_print(_PT_ANSI(str(line)))
rendered_lines.extend(str(line) for line in lines)
if rendered_lines:
# Replay after resize can contain hundreds of history lines. A
# per-line prompt_toolkit print forces one synchronous terminal I/O
# and redraw cycle per line, which users perceive as a waterfall of
# old output. Keep the existing history contents unchanged, but
# emit the replay as one ANSI payload so resize recovery does a
# single prompt_toolkit print/redraw.
_pt_print(_PT_ANSI("\n".join(rendered_lines)))
except Exception:
pass
finally:
@@ -2639,6 +2899,12 @@ class HermesCLI:
# Status bar visibility (toggled via /statusbar)
self._status_bar_visible = True
# When True, the input separator rules and the dynamic status bar are
# hidden until the next user input. Set by _recover_after_resize() so a
# SIGWINCH cannot stamp a freshly-drawn status bar on top of one that
# the terminal just reflowed into scrollback — the cause of duplicated
# bars / "blank line flooding" reports (#19280, #22976).
self._status_bar_suppressed_after_resize = False
self._resize_recovery_lock = threading.Lock()
self._resize_recovery_timer = None
self._resize_recovery_pending = False
@@ -2703,9 +2969,36 @@ class HermesCLI:
pass
def _recover_after_resize(self, app, original_on_resize) -> None:
"""Recover a resized classic CLI without desynchronizing cursor state."""
self._clear_prompt_toolkit_screen(app, rebuild_scrollback=True)
_replay_output_history()
"""Recover a resized classic CLI without desynchronizing cursor state.
Unlike _force_full_redraw, we do NOT clear the physical screen or
scrollback here. The startup banner and tool summary are printed
before prompt_toolkit owns the live chrome, so they live in normal
terminal scrollback. Erasing the screen on SIGWINCH removes that
startup UI and ``_replay_output_history`` cannot reconstruct it
(the banner was never added to ``_OUTPUT_HISTORY``).
Instead we just reset prompt_toolkit's renderer cache so the next
incremental redraw starts from a clean slate, then let
``original_on_resize`` recalculate layout for the new size.
We also flag ``_status_bar_suppressed_after_resize`` so the dynamic
status bar and input separator rules stay hidden until the next user
input. On column shrink the terminal reflows already-rendered status
bar rows into scrollback before prompt_toolkit can erase them; drawing
a fresh full-width bar immediately makes the old and new versions
look duplicated (#19280, #22976). Clearing the suppression on the
next prompt restores the bar cleanly.
"""
self._status_bar_suppressed_after_resize = True
try:
app.renderer.reset(leave_alternate_screen=False)
except Exception:
pass
try:
app.invalidate()
except Exception:
pass
original_on_resize()
def _schedule_resize_recovery(self, app, original_on_resize, delay: float = 0.12) -> None:
@@ -2940,10 +3233,34 @@ class HermesCLI:
width = self._get_tui_terminal_width()
return width < 64
@staticmethod
def _scrollback_box_width(width: Optional[int] = None) -> int:
"""Return a resize-safe width for printed scrollback box rules.
Lines already printed to terminal scrollback are reflowed by the
terminal emulator when the column count shrinks. A full-width response
border drawn at, say, 200 columns will wrap into two or three rows of
dashes after the user resizes to 80 columns, looking like duplicated
separator lines (the family of bugs tracked by #18449, #19280, #22976).
Keep decorative scrollback boxes intentionally narrower than the
viewport so a moderate resize never triggers reflow. The live TUI
footer (status bar, input rule) still uses the full width only
content that is *stamped into scrollback* needs this clamp.
"""
if width is None:
try:
width = shutil.get_terminal_size((80, 24)).columns
except Exception:
width = 80
return max(32, min(int(width or 80), 56))
def _tui_input_rule_height(self, position: str, width: Optional[int] = None) -> int:
"""Return the visible height for the top/bottom input separator rules."""
if position not in {"top", "bottom"}:
raise ValueError(f"Unknown input rule position: {position}")
if getattr(self, "_status_bar_suppressed_after_resize", False):
return 0
if position == "top":
return 1
return 0 if self._use_minimal_tui_chrome(width=width) else 1
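The width clamp for scrollback boxes is easy to check numerically. This sketch copies the 32 to 56 column clamp into a free function and adds a hypothetical ASCII `box_top` helper to show how a rule of that width would be assembled; the real code draws with box-drawing characters and skin colors.

```python
import shutil
from typing import Optional


def scrollback_box_width(width: Optional[int] = None,
                         lo: int = 32, hi: int = 56) -> int:
    """Clamp the box width so a moderate resize does not reflow it."""
    if width is None:
        width = shutil.get_terminal_size((80, 24)).columns
    return max(lo, min(int(width or 80), hi))


def box_top(label: str, width: Optional[int] = None) -> str:
    """Hypothetical ASCII top rule: '+-' + label + fill + '+'."""
    w = scrollback_box_width(width)
    fill = w - 2 - len(label)
    return "+-" + label + "-" * max(fill - 1, 0) + "+"
```

Note that the lower bound means a terminal narrower than 32 columns still gets a 32-column rule, so very narrow windows can wrap.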
@@ -3453,7 +3770,7 @@ class HermesCLI:
# Open reasoning box on first reasoning token
if not getattr(self, "_reasoning_box_opened", False):
self._reasoning_box_opened = True
w = shutil.get_terminal_size().columns
w = self._scrollback_box_width()
r_label = " Reasoning "
r_fill = w - 2 - len(r_label)
_cprint(f"\n{_DIM}┌─{r_label}{'─' * max(r_fill - 1, 0)}{_RST}")
@@ -3477,7 +3794,7 @@ class HermesCLI:
if buf:
_cprint(f"{_DIM}{buf}{_RST}")
self._reasoning_buf = ""
w = shutil.get_terminal_size().columns
w = self._scrollback_box_width()
_cprint(f"{_DIM}{'─' * (w - 2)}{_RST}")
self._reasoning_box_opened = False
@@ -3668,7 +3985,7 @@ class HermesCLI:
self._stream_text_ansi = ""
if self.show_timestamps:
label = f"{label} {datetime.now().strftime('%H:%M')}"
w = shutil.get_terminal_size().columns
w = self._scrollback_box_width()
fill = w - 2 - HermesCLI._status_bar_display_width(label)
_cprint(f"\n{_ACCENT}╭─{label}{'─' * max(fill - 1, 0)}{_RST}")
@@ -3769,7 +4086,7 @@ class HermesCLI:
# Close the response box
if self._stream_box_opened:
w = shutil.get_terminal_size().columns
w = self._scrollback_box_width()
_cprint(f"{_ACCENT}{'─' * (w - 2)}{_RST}")
def _reset_stream_state(self) -> None:
@@ -6596,7 +6913,7 @@ class HermesCLI:
/model <name> --provider <provider> switch provider + model
/model --provider <provider> switch to provider, auto-detect model
"""
from hermes_cli.model_switch import switch_model, parse_model_flags, list_authenticated_providers
from hermes_cli.model_switch import switch_model, parse_model_flags
from hermes_cli.providers import get_label
# Parse args from the original command
@@ -6606,16 +6923,25 @@ class HermesCLI:
# Parse --provider and --global flags
model_input, explicit_provider, persist_global = parse_model_flags(raw_args)
# Load providers for switch_model (picker path needs them below)
user_provs = None
custom_provs = None
# Single inventory context — replaces the inline config-slice the
# dashboard / TUI used to duplicate. Overlay live session state
# via with_overrides (truthy-only) so empty self.* attrs don't
# clobber disk config.
from hermes_cli.inventory import build_models_payload, load_picker_context
try:
from hermes_cli.config import get_compatible_custom_providers, load_config
cfg = load_config()
user_provs = cfg.get("providers")
custom_provs = get_compatible_custom_providers(cfg)
ctx = load_picker_context().with_overrides(
current_provider=self.provider or "",
current_model=self.model or "",
current_base_url=self.base_url or "",
)
except Exception:
pass
ctx = None
# switch_model() + _open_model_picker still need the raw provider
# dicts; ConfigContext is the canonical source for both.
user_provs = ctx.user_providers if ctx is not None else None
custom_provs = ctx.custom_providers if ctx is not None else None
# No args at all: open prompt_toolkit-native picker modal
if not model_input and not explicit_provider:
@@ -6623,14 +6949,9 @@ class HermesCLI:
provider_display = get_label(self.provider) if self.provider else "unknown"
try:
providers = list_authenticated_providers(
current_provider=self.provider or "",
current_base_url=self.base_url or "",
current_model=self.model or "",
user_providers=user_provs,
custom_providers=custom_provs,
max_models=50,
)
if ctx is None:
raise RuntimeError("inventory context unavailable")
providers = build_models_payload(ctx, max_models=50)["providers"]
except Exception:
providers = []
@@ -6756,6 +7077,46 @@ class HermesCLI:
else:
_cprint(" (session only — add --global to persist)")
def _handle_codex_runtime(self, cmd_original: str) -> None:
"""Handle /codex-runtime — toggle the codex app-server runtime opt-in.
Usage:
/codex-runtime show current state
/codex-runtime auto Hermes default (chat_completions)
/codex-runtime codex_app_server hand turns to codex subprocess
/codex-runtime on / off synonyms for the above
"""
from hermes_cli import codex_runtime_switch as crs
parts = cmd_original.split(None, 1)
raw_args = parts[1].strip() if len(parts) > 1 else ""
new_value, errors = crs.parse_args(raw_args)
if errors:
for err in errors:
_cprint(f"{err}")
return
# Load + persist via the existing config helpers
try:
from hermes_cli.config import load_config, save_config
except Exception as exc:
_cprint(f"❌ could not load config: {exc}")
return
cfg = load_config()
result = crs.apply(
cfg,
new_value,
persist_callback=(save_config if new_value is not None else None),
)
prefix = "" if result.success else ""
for line in result.message.splitlines():
_cprint(f" {prefix} {line}" if line.startswith("openai_runtime")
else f" {line}")
if result.success and result.requires_new_session:
_cprint(" Tip: `/reset` starts a new session immediately.")
def _should_handle_model_command_inline(self, text: str, has_images: bool = False) -> bool:
"""Return True when /model should be handled immediately on the UI thread."""
if not text or has_images or not _looks_like_slash_command(text):
@@ -7436,6 +7797,8 @@ class HermesCLI:
self._handle_resume_command(cmd_original)
elif canonical == "model":
self._handle_model_switch(cmd_original)
elif canonical == "codex-runtime":
self._handle_codex_runtime(cmd_original)
elif canonical == "gquota":
self._handle_gquota_command(cmd_original)
@@ -7583,6 +7946,8 @@ class HermesCLI:
_cprint(f" No agent running; queued as next turn: {payload[:80]}{'...' if len(payload) > 80 else ''}")
elif canonical == "goal":
self._handle_goal_command(cmd_original)
elif canonical == "subgoal":
self._handle_subgoal_command(cmd_original)
elif canonical == "skin":
self._handle_skin_command(cmd_original)
elif canonical == "voice":
@@ -7600,6 +7965,8 @@ class HermesCLI:
exec_cmd = qcmd.get("command", "")
if exec_cmd:
try:
# shell=True is intentional: quick_commands are user-defined
# shell snippets from config.yaml — not agent/LLM controlled.
result = subprocess.run(
exec_cmd, shell=True, capture_output=True,
text=True, timeout=30
@@ -7801,8 +8168,8 @@ class HermesCLI:
from hermes_cli.skin_engine import get_active_skin
_skin = get_active_skin()
label = _skin.get_branding("response_label", "⚕ Hermes")
_resp_color = _skin.get_color("response_border", "#CD7F32")
_resp_text = _skin.get_color("banner_text", "#FFF8DC")
_resp_color = _maybe_remap_for_light_mode(_skin.get_color("response_border", "#CD7F32"))
_resp_text = _maybe_remap_for_light_mode(_skin.get_color("banner_text", "#FFF8DC"))
except Exception:
label = "⚕ Hermes"
_resp_color = "#CD7F32"
@@ -7817,6 +8184,7 @@ class HermesCLI:
style=_resp_text,
box=rich_box.HORIZONTALS,
padding=(1, 4),
width=self._scrollback_box_width(),
))
else:
_cprint(" (No response generated)")
@@ -8179,6 +8547,81 @@ class HermesCLI:
except Exception:
pass
def _handle_subgoal_command(self, cmd: str) -> None:
"""Dispatch /subgoal subcommands.
Forms:
/subgoal show current subgoals
/subgoal <text> append a criterion
/subgoal remove <n> drop subgoal n (1-based)
/subgoal clear wipe all subgoals
Subgoals are extra criteria the user adds mid-loop. They get
appended to both the judge prompt (verdict must consider them)
and the continuation prompt (agent sees them) on the next turn
boundary. No special kick: the running turn finishes, and the next
judge call includes them.
"""
parts = (cmd or "").strip().split(None, 2)
arg = " ".join(parts[1:]).strip() if len(parts) > 1 else ""
mgr = self._get_goal_manager()
if mgr is None:
_cprint(f" {_DIM}Goals unavailable (no active session).{_RST}")
return
if not mgr.has_goal():
_cprint(f" {_DIM}No active goal. Set one with /goal <text>.{_RST}")
return
# No args → list current subgoals.
if not arg:
_cprint(f" {mgr.status_line()}")
_cprint(f" {mgr.render_subgoals()}")
return
tokens = arg.split(None, 1)
verb = tokens[0].lower()
rest = tokens[1].strip() if len(tokens) > 1 else ""
if verb == "remove":
if not rest:
_cprint(" Usage: /subgoal remove <n>")
return
try:
idx = int(rest.split()[0])
except ValueError:
_cprint(" /subgoal remove: <n> must be an integer (1-based index).")
return
try:
removed = mgr.remove_subgoal(idx)
except (IndexError, RuntimeError) as exc:
_cprint(f" /subgoal remove: {exc}")
return
_cprint(f" ✓ Removed subgoal {idx}: {removed}")
return
if verb == "clear":
try:
prev = mgr.clear_subgoals()
except RuntimeError as exc:
_cprint(f" /subgoal clear: {exc}")
return
if prev:
_cprint(f" ✓ Cleared {prev} subgoal{'s' if prev != 1 else ''}.")
else:
_cprint(f" {_DIM}No subgoals to clear.{_RST}")
return
# Otherwise — append the whole arg as a new subgoal.
try:
text = mgr.add_subgoal(arg)
except (ValueError, RuntimeError) as exc:
_cprint(f" /subgoal: {exc}")
return
idx = len(mgr.state.subgoals) if mgr.state else 0
_cprint(f" ✓ Added subgoal {idx}: {text}")
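The dispatch rules of `/subgoal` can be summarized as a pure parser. The sketch below is a simplified approximation: `parse_subgoal` is a hypothetical function that returns an `(action, payload)` tuple instead of calling the goal manager or printing, and it splits the command slightly differently from the handler above.

```python
from typing import Optional, Tuple, Union

Parsed = Tuple[str, Optional[Union[int, str]]]


def parse_subgoal(cmd: str) -> Parsed:
    """Map a /subgoal command line to (action, payload)."""
    parts = (cmd or "").strip().split(None, 1)
    arg = parts[1].strip() if len(parts) > 1 else ""
    if not arg:
        return ("show", None)
    tokens = arg.split(None, 1)
    verb = tokens[0].lower()
    rest = tokens[1].strip() if len(tokens) > 1 else ""
    if verb == "remove":
        if not rest:
            return ("usage", None)
        try:
            return ("remove", int(rest.split()[0]))  # 1-based index
        except ValueError:
            return ("error", "<n> must be an integer")
    if verb == "clear":
        return ("clear", None)
    return ("add", arg)  # anything else is a new criterion
```

One consequence of verb-first dispatch is that a criterion whose first word is `remove` or `clear` is read as a subcommand rather than as text to add.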
def _maybe_continue_goal_after_turn(self) -> None:
"""Hook run after every CLI turn. Judges + maybe re-queues.
@@ -8205,10 +8648,36 @@ class HermesCLI:
# If a real user message is already queued, don't inject a
# continuation prompt on top — let the user's turn go first.
# Slash commands don't count as "real user messages" for this
# check: they're inspection/mutation (e.g. /subgoal added mid-
# run) and the process_loop dispatches them via process_command,
# not via chat(). If we treat a queued /subgoal as preempting,
# the goal loop silently stalls — we'd return here, then the
# slash command consumes its queue slot via process_command()
# which never re-fires the goal hook. Peek at all queued entries
# and only defer when there's a non-slash payload.
try:
if getattr(self, "_pending_input", None) is not None \
and not self._pending_input.empty():
return
pending = getattr(self, "_pending_input", None)
if pending is not None and not pending.empty():
has_real_message = False
try:
# Queue.queue is the underlying deque — direct peek
# without disturbing FIFO order.
for entry in list(pending.queue):
# Bundled payloads are (text, images) tuples;
# unpack for inspection.
if isinstance(entry, tuple) and entry:
entry = entry[0]
if isinstance(entry, str) and _looks_like_slash_command(entry):
continue
has_real_message = True
break
except Exception:
# Fallback: if we can't introspect the queue, behave
# like the old check and defer to be safe.
has_real_message = True
if has_real_message:
return
except Exception:
pass
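The peek-and-classify step in this hunk can be sketched in isolation. This is a minimal illustration rather than the project's code: `looks_like_slash_command` is a hypothetical stand-in for `_looks_like_slash_command`, whose real rules are not shown in this diff, and the sketch copies the queue under its `mutex`, which the hunk does not do.

```python
import queue


def looks_like_slash_command(text: str) -> bool:
    # Hypothetical stand-in: treat any input starting with "/" as a command.
    return text.lstrip().startswith("/")


def has_real_message(pending: queue.Queue) -> bool:
    """Return True if any queued entry is a non-slash user payload."""
    # Snapshot the underlying deque under the queue's lock so FIFO order
    # and contents are left untouched.
    with pending.mutex:
        entries = list(pending.queue)
    for entry in entries:
        # Bundled payloads are (text, images) tuples; inspect the text part.
        if isinstance(entry, tuple) and entry:
            entry = entry[0]
        if isinstance(entry, str) and looks_like_slash_command(entry):
            continue
        return True
    return False
```

With only `/subgoal ...` entries queued, the function returns `False`, so the goal loop would keep going; one plain message is enough to make it defer.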
@@ -8301,7 +8770,8 @@ class HermesCLI:
set_active_skin(new_skin)
_ACCENT.reset() # Re-resolve ANSI color for the new skin
_DIM.reset() # Re-resolve dim/secondary ANSI color for the new skin
# _DIM is now a fixed dim+italic ANSI escape (terminal-default fg)
# so it doesn't need re-resolving on skin switch.
if save_config_value("display.skin", new_skin):
print(f" Skin set to: {new_skin} (saved)")
else:
@@ -9198,7 +9668,7 @@ class HermesCLI:
Updates the TUI spinner widget so the user can see what the agent
is doing during tool execution (fills the gap between thinking
spinner and next response). Also plays audio cue in voice mode.
spinner and next response).
On tool.started, records a monotonic timestamp so get_spinner_text()
can show a live elapsed timer (the TUI poll loop already invalidates
@@ -9277,20 +9747,6 @@ class HermesCLI:
)
self._invalidate()
if not self._voice_mode:
return
if not function_name or function_name.startswith("_"):
return
try:
from tools.voice_mode import play_beep
threading.Thread(
target=play_beep,
kwargs={"frequency": 1200, "duration": 0.06, "count": 1},
daemon=True,
).start()
except Exception:
pass
def _on_tool_start(self, tool_call_id: str, function_name: str, function_args: dict):
"""Capture local before-state for write-capable tools."""
try:
@@ -9895,7 +10351,7 @@ class HermesCLI:
import time as _time
with self._approval_lock:
timeout = 60
timeout = int(CLI_CONFIG.get("approvals", {}).get("timeout", 60))
response_queue = queue.Queue()
self._approval_state = {
@@ -10389,7 +10845,7 @@ class HermesCLI:
nonlocal _streaming_box_opened
if not _streaming_box_opened:
_streaming_box_opened = True
w = self.console.width
w = self._scrollback_box_width(getattr(self.console, "width", 80))
label = " ⚕ Hermes "
if self.show_timestamps:
label = f"{label}{datetime.now().strftime('%H:%M')} "
@@ -10674,7 +11130,7 @@ class HermesCLI:
if self.show_reasoning and result and not _reasoning_already_shown:
reasoning = result.get("last_reasoning")
if reasoning:
w = shutil.get_terminal_size().columns
w = self._scrollback_box_width()
r_label = " Reasoning "
r_fill = w - 2 - len(r_label)
r_top = f"{_DIM}┌─{r_label}{'─' * max(r_fill - 1, 0)}{_RST}"
@@ -10694,18 +11150,18 @@ class HermesCLI:
from hermes_cli.skin_engine import get_active_skin
_skin = get_active_skin()
label = _skin.get_branding("response_label", "⚕ Hermes")
_resp_color = _skin.get_color("response_border", "#CD7F32")
_resp_text = _skin.get_color("banner_text", "#FFF8DC")
_resp_color = _maybe_remap_for_light_mode(_skin.get_color("response_border", "#CD7F32"))
_resp_text = _maybe_remap_for_light_mode(_skin.get_color("banner_text", "#FFF8DC"))
except Exception:
label = "⚕ Hermes"
_resp_color = "#CD7F32"
_resp_text = "#FFF8DC"
_resp_color = _maybe_remap_for_light_mode("#CD7F32")
_resp_text = _maybe_remap_for_light_mode("#FFF8DC")
is_error_response = result and (result.get("failed") or result.get("partial"))
already_streamed = self._stream_started and self._stream_box_opened and not is_error_response
if use_streaming_tts and _streaming_box_opened and not is_error_response:
# Text was already printed sentence-by-sentence; just close the box
w = shutil.get_terminal_size().columns
w = self._scrollback_box_width()
_cprint(f"\n{_ACCENT}{'─' * (w - 2)}{_RST}")
elif already_streamed:
# Response was already streamed token-by-token with box framing;
@@ -10721,6 +11177,7 @@ class HermesCLI:
style=_resp_text,
box=rich_box.HORIZONTALS,
padding=(1, 4),
width=self._scrollback_box_width(),
))
@@ -10937,13 +11394,48 @@ class HermesCLI:
return "".join(text for _, text in self._get_tui_prompt_fragments())
def _build_tui_style_dict(self) -> dict[str, str]:
"""Layer the active skin's prompt_toolkit colors over the base TUI style."""
"""Layer the active skin's prompt_toolkit colors over the base TUI style.
Also rewrites any hex-color tokens in the resulting style strings
to their light-mode equivalents (via _LIGHT_MODE_REMAP) when the
terminal is detected as light. This makes the chrome readable
on cream Terminal.app backgrounds without per-skin overrides.
"""
style_dict = dict(getattr(self, "_tui_style_base", {}) or {})
try:
from hermes_cli.skin_engine import get_prompt_toolkit_style_overrides
style_dict.update(get_prompt_toolkit_style_overrides())
except Exception:
pass
# Light-mode remap on the style strings. Each value is a pt
# style string like "bg:#1a1a2e #C0C0C0 bold" — split on space,
# rewrite any "#XXX" tokens (including "bg:#XXX") through the
# light-mode remap, rejoin.
#
# CRITICAL: skip the remap entirely when a style string already
# specifies its own bg (e.g. status-bar / completion-menu styles
# with `bg:#1a1a2e ...`). Those colors were tuned for that
# specific dark bg and remapping the FG to a dark equivalent
# would produce dark-on-dark (invisible). The terminal's BG
# mode is irrelevant — what matters is the bg the style itself
# paints.
try:
if _detect_light_mode():
def _remap_value(v: str) -> str:
if not v:
return v
tokens = v.split()
has_explicit_bg = any(t.startswith("bg:") for t in tokens)
if has_explicit_bg:
# The style paints its own bg — leave its fg alone.
return v
return " ".join(
_maybe_remap_for_light_mode(t) if t.startswith("#") else t
for t in tokens
)
style_dict = {k: _remap_value(v or "") for k, v in style_dict.items()}
except Exception:
pass
return style_dict
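The explicit-background rule can be exercised on its own. The sketch below assumes a one-entry remap table taken from the commit message (`#FFF8DC` to `#1A1A1A`); the real `_LIGHT_MODE_REMAP` contents and `_maybe_remap_for_light_mode` are not shown in this diff, so `LIGHT_MODE_REMAP` and `remap_color` are hypothetical stand-ins.

```python
# Hypothetical remap table; the single entry mirrors the commit message.
LIGHT_MODE_REMAP = {"#fff8dc": "#1A1A1A"}


def remap_color(token: str) -> str:
    # Unknown colors pass through unchanged.
    return LIGHT_MODE_REMAP.get(token.lower(), token)


def remap_style_value(value: str) -> str:
    """Remap foreground hex tokens unless the style paints its own bg."""
    if not value:
        return value
    tokens = value.split()
    if any(t.startswith("bg:") for t in tokens):
        # Colors were tuned for this explicit background; leave them alone.
        return value
    return " ".join(remap_color(t) if t.startswith("#") else t for t in tokens)
```

A status-bar style such as `bg:#1a1a2e #FFF8DC` comes back untouched, while a bare `#FFF8DC bold` has only its color token rewritten.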
def _apply_tui_skin_style(self) -> bool:
@@ -11029,6 +11521,13 @@ class HermesCLI:
def run(self):
"""Run the interactive CLI loop with persistent input at bottom."""
# Detect light/dark terminal mode now (before pt grabs the tty).
# Caches the result so subsequent _hex_to_ansi / style calls
# don't risk re-querying mid-render.
try:
_detect_light_mode()
except Exception:
pass
# Push the entire TUI to the bottom of the terminal so the banner,
# responses, and prompt all appear pinned to the bottom — empty
# space stays above, not below. This prints enough blank lines to
@@ -12754,7 +13253,10 @@ class HermesCLI:
# guard against any future width mismatch.
wrap_lines=False,
),
filter=Condition(lambda: cli_ref._status_bar_visible),
filter=Condition(
lambda: cli_ref._status_bar_visible
and not getattr(cli_ref, "_status_bar_suppressed_after_resize", False)
),
)
# Allow wrapper CLIs to register extra keybindings.
@@ -12789,11 +13291,16 @@ class HermesCLI:
# Style for the application
self._tui_style_base = {
'input-area': '#FFF8DC',
'placeholder': '#555555 italic',
'prompt': '#FFF8DC',
# Input area / prompt: empty style strings inherit the
# terminal's default foreground/background, so the typed
# text is readable in both light and dark Terminal.app
# color schemes. (Hardcoding a near-white #FFF8DC made
# input invisible on light backgrounds.)
'input-area': '',
'placeholder': '#888888 italic',
'prompt': '',
'prompt-working': '#888888 italic',
'hint': '#555555 italic',
'hint': '#888888 italic',
'status-bar': 'bg:#1a1a2e #C0C0C0',
'status-bar-strong': 'bg:#1a1a2e #FFD700 bold',
'status-bar-dim': 'bg:#1a1a2e #8B8682',
@@ -12852,19 +13359,70 @@ class HermesCLI:
self._app = app # Store reference for clarify_callback
# ── Fix ghost status-bar lines on terminal resize ──────────────
# When the terminal shrinks (e.g. un-maximize), the emulator reflows
# the previously-rendered full-width rows (status bar, input rules)
# into multiple narrower rows. prompt_toolkit's _on_resize handler
# only cursor_up()s by the stored layout height, missing the extra
# rows created by reflow — leaving ghost duplicates visible.
# Resize handling: monkey-patch prompt_toolkit's _output_screen_diff
# to suppress the deliberate "reserve vertical space" scroll-up.
#
# It's not just column-shrink: widening, row-shrinking, and
# multiplexer-driven SIGWINCH-less redraws (cmux / tmux tab switch)
# all produce the same class of drift, where the renderer's tracked
# _cursor_pos.y no longer matches terminal reality. The only reliable
# recovery is a full screen-clear (\x1b[2J\x1b[H) before the next
# redraw, so we force one on every resize rather than trying to
# compute the exact drift.
# Background: prompt_toolkit's renderer (renderer.py L232-242)
# explicitly moves the cursor to the bottom of the canvas after
# painting "to make sure the terminal scrolls up, even when the
# lower lines of the canvas just contain whitespace". In
# non-fullscreen mode this scrolls chrome content (status bar,
# input rules) into terminal scrollback on every render. When
# the terminal column-shrinks, the emulator reflows the previously
# rendered full-width rows into multiple narrower rows that get
# pushed up — leaving ghost duplicates AND polluting scrollback.
# Same issue as pt #29 (open since 2014), #1675, #1933.
#
# Surgical fix: wrap _output_screen_diff so that when its internal
# `if current_height > previous_screen.height` branch fires (the
# one that does the bottom-cursor-move), we make it fall through
# by inflating previous_screen.height first.
try:
import prompt_toolkit.renderer as _pt_renderer
from prompt_toolkit.renderer import _output_screen_diff as _orig_osd
if not getattr(_pt_renderer, "_hermes_osd_patched", False):
def _patched_output_screen_diff(
app, output, screen, current_pos, color_depth,
previous_screen, last_style, is_done, full_screen,
attrs_for_style_string, style_string_has_style,
size, previous_width,
):
"""Wraps pt's _output_screen_diff to suppress the
reserve-vertical-space scroll (renderer.py L232-242).
Strategy: ONLY when previous_screen is non-None and
its current height is genuinely smaller than the new
screen's height, inflate it to match. This prevents
the bottom-cursor-move at L242 without changing any
other code path's behavior.
Critical: do NOT replace a None previous_screen with
a fresh Screen() that would skip the proper
reset_attributes()+erase_down() at L178-185 which
fires when previous_screen is None (first-paint /
width-change). Without that reset, ANSI styles
leak between renders.
"""
try:
if previous_screen is not None and hasattr(previous_screen, "height"):
if previous_screen.height < screen.height:
previous_screen.height = screen.height
except Exception:
pass
return _orig_osd(
app, output, screen, current_pos, color_depth,
previous_screen, last_style, is_done, full_screen,
attrs_for_style_string, style_string_has_style,
size, previous_width,
)
_pt_renderer._output_screen_diff = _patched_output_screen_diff
_pt_renderer._hermes_osd_patched = True
except Exception:
pass
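The sentinel-guarded wrap used for `_output_screen_diff` is a general pattern. The sketch below shows it against a throwaway module instead of prompt_toolkit; `patch_once`, `fake_renderer`, and `_demo_patched` are illustrative names, not part of this change.

```python
import types


def patch_once(module, func_name, make_wrapper, flag_name):
    """Wrap module.func_name exactly once, guarded by a sentinel attribute."""
    if getattr(module, flag_name, False):
        return False  # already patched; do not wrap the wrapper
    original = getattr(module, func_name)
    setattr(module, func_name, make_wrapper(original))
    setattr(module, flag_name, True)
    return True


# Throwaway module standing in for the library being patched.
fake_renderer = types.ModuleType("fake_renderer")
fake_renderer.render = lambda height: f"render({height})"


def make_wrapper(original):
    def wrapped(height):
        # Adjust the argument, then delegate to the original unchanged.
        return original(max(height, 10))
    return wrapped


first = patch_once(fake_renderer, "render", make_wrapper, "_demo_patched")
second = patch_once(fake_renderer, "render", make_wrapper, "_demo_patched")
```

The second call is a no-op, which matters when the patching code can run more than once in a process: without the flag, each run would stack another wrapper on top of the previous one.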
_original_on_resize = app._on_resize
def _resize_clear_ghosts():
@@ -12923,6 +13481,10 @@ class HermesCLI:
if not user_input:
continue
# The user has typed and submitted something, so any
# post-resize transient suppression should end here.
self._status_bar_suppressed_after_resize = False
# Unpack image payload: (text, [Path, ...]) or plain str
submit_images = []
if isinstance(user_input, tuple):
+4
@@ -39,6 +39,10 @@ if [ "$(id -u)" = "0" ]; then
# by the mapped user on the host side.
chown -R hermes:hermes "$HERMES_HOME" 2>/dev/null || \
echo "Warning: chown failed (rootless container?) — continuing anyway"
# The .venv must also be re-chowned when UID is remapped, otherwise
# lazy_deps.py cannot install platform packages (discord.py, etc.).
chown -R hermes:hermes "$INSTALL_DIR/.venv" 2>/dev/null || \
echo "Warning: chown .venv failed (rootless container?) — continuing anyway"
fi
# Ensure config.yaml is readable by the hermes runtime user even if it was
+71 -9
@@ -74,6 +74,24 @@ def _normalize_notice_delivery(value: Any, default: str = "public") -> str:
return default
def _ensure_platform_extra_dict(platforms_data: dict, name: str) -> tuple[dict, dict]:
"""Get-or-create ``platforms_data[name]`` and its nested ``extra`` dict.
Both slots are coerced to ``{}`` if a non-dict value is encountered, so
callers can safely write keys without type-checking. Returns
``(plat_data, extra)`` for in-place mutation.
"""
plat_data = platforms_data.setdefault(name, {})
if not isinstance(plat_data, dict):
plat_data = {}
platforms_data[name] = plat_data
extra = plat_data.setdefault("extra", {})
if not isinstance(extra, dict):
extra = {}
plat_data["extra"] = extra
return plat_data, extra
# Module-level cache for bundled platform plugin names (lives outside the
# enum so it doesn't become an accidental enum member).
_Platform__bundled_plugin_names: Optional[set] = None
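A quick check of the coercion behavior documented for `_ensure_platform_extra_dict`. The function body below is copied from the hunk under a public name so the snippet runs standalone; the sample `platforms` data is made up for illustration.

```python
def ensure_platform_extra_dict(platforms_data: dict, name: str) -> tuple[dict, dict]:
    """Get-or-create platforms_data[name] and its nested "extra" dict."""
    plat_data = platforms_data.setdefault(name, {})
    if not isinstance(plat_data, dict):
        plat_data = {}
        platforms_data[name] = plat_data
    extra = plat_data.setdefault("extra", {})
    if not isinstance(extra, dict):
        extra = {}
        plat_data["extra"] = extra
    return plat_data, extra


# Malformed inputs: a string where a dict is expected, and a null "extra".
platforms = {"slack": "enabled", "discord": {"extra": None}}

_, slack_extra = ensure_platform_extra_dict(platforms, "slack")
slack_extra["reply_in_thread"] = True

_, discord_extra = ensure_platform_extra_dict(platforms, "discord")
_, irc_extra = ensure_platform_extra_dict(platforms, "irc")
```

Because the returned dicts are the same objects stored in `platforms`, writes through `slack_extra` are visible in the parent mapping without reassigning anything.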
@@ -717,6 +735,10 @@ def load_gateway_config() -> GatewayConfig:
gw_data["thread_sessions_per_user"] = yaml_cfg["thread_sessions_per_user"]
streaming_cfg = yaml_cfg.get("streaming")
if not isinstance(streaming_cfg, dict):
# Fall back to nested gateway.streaming written by
# ``hermes config set gateway.streaming.*``
streaming_cfg = yaml_cfg.get("gateway", {}).get("streaming")
if isinstance(streaming_cfg, dict):
gw_data["streaming"] = streaming_cfg
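One edge case worth noting for the fallback: `yaml_cfg.get("gateway", {})` returns `None`, not `{}`, when the YAML file contains a bare `gateway:` key, and the chained `.get` would then raise `AttributeError`. Whether the surrounding code already guards against this is not visible in the diff. A defensive variant might look like the following; `resolve_streaming_cfg` is a hypothetical helper name.

```python
from typing import Optional


def resolve_streaming_cfg(yaml_cfg: dict) -> Optional[dict]:
    """Prefer top-level "streaming", else nested "gateway.streaming"."""
    streaming = yaml_cfg.get("streaming")
    if not isinstance(streaming, dict):
        gateway = yaml_cfg.get("gateway")
        # Guard against `gateway:` being null or a scalar in the YAML.
        streaming = gateway.get("streaming") if isinstance(gateway, dict) else None
    return streaming if isinstance(streaming, dict) else None
```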
@@ -755,7 +777,27 @@ def load_gateway_config() -> GatewayConfig:
merged["extra"] = merged_extra
platforms_data[plat_name] = merged
gw_data["platforms"] = platforms_data
for plat in Platform:
# Iterate built-in platforms plus any registered plugin platforms
# so plugin authors get the same shared-key bridging (#24836).
try:
from hermes_cli.plugins import discover_plugins
discover_plugins() # idempotent
from gateway.platform_registry import platform_registry as _pr
except Exception as e:
logger.debug("plugin discovery skipped: %s", e)
_pr = None
_shared_loop_targets: list = list(Platform)
if _pr is not None:
for _entry in _pr.plugin_entries():
try:
_plat = Platform(_entry.name)
except (ValueError, KeyError):
continue
if _plat not in _shared_loop_targets:
_shared_loop_targets.append(_plat)
for plat in _shared_loop_targets:
if plat == Platform.LOCAL:
continue
platform_cfg = yaml_cfg.get(plat.value)
@@ -810,20 +852,38 @@ def load_gateway_config() -> GatewayConfig:
enabled_was_explicit = "enabled" in platform_cfg
if not bridged and not enabled_was_explicit:
continue
plat_data = platforms_data.setdefault(plat.value, {})
if not isinstance(plat_data, dict):
plat_data = {}
platforms_data[plat.value] = plat_data
plat_data, extra = _ensure_platform_extra_dict(platforms_data, plat.value)
if enabled_was_explicit:
plat_data["enabled"] = platform_cfg["enabled"]
extra = plat_data.setdefault("extra", {})
if not isinstance(extra, dict):
extra = {}
plat_data["extra"] = extra
if plat == Platform.SLACK and enabled_was_explicit:
extra["_enabled_explicit"] = True
extra.update(bridged)
# Plugin-owned YAML→env config bridges (#24836). See
# ``PlatformEntry.apply_yaml_config_fn`` for the hook contract.
# Order: shared-key loop (above) → this dispatch → legacy hardcoded
# blocks (below; no-op when a hook already set their env var) →
# ``_apply_env_overrides()`` after ``GatewayConfig.from_dict``.
if _pr is not None:
for entry in _pr.all_entries():
if entry.apply_yaml_config_fn is None:
continue
platform_cfg = yaml_cfg.get(entry.name)
if not isinstance(platform_cfg, dict):
continue
try:
seeded = entry.apply_yaml_config_fn(yaml_cfg, platform_cfg)
except Exception as e:
logger.debug(
"apply_yaml_config_fn for %s raised: %s",
entry.name, e,
)
continue
if not isinstance(seeded, dict) or not seeded:
continue
_, extra = _ensure_platform_extra_dict(platforms_data, entry.name)
extra.update(seeded)
# Slack settings → env vars (env vars take precedence)
slack_cfg = yaml_cfg.get("slack", {})
if isinstance(slack_cfg, dict):
@@ -852,6 +912,8 @@ def load_gateway_config() -> GatewayConfig:
if isinstance(discord_cfg, dict):
if "require_mention" in discord_cfg and not os.getenv("DISCORD_REQUIRE_MENTION"):
os.environ["DISCORD_REQUIRE_MENTION"] = str(discord_cfg["require_mention"]).lower()
if "thread_require_mention" in discord_cfg and not os.getenv("DISCORD_THREAD_REQUIRE_MENTION"):
os.environ["DISCORD_THREAD_REQUIRE_MENTION"] = str(discord_cfg["thread_require_mention"]).lower()
frc = discord_cfg.get("free_response_channels")
if frc is not None and not os.getenv("DISCORD_FREE_RESPONSE_CHANNELS"):
if isinstance(frc, list):
+16
@@ -119,6 +119,22 @@ class PlatformEntry:
# Signature: () -> Optional[dict[str, Any]]
env_enablement_fn: Optional[Callable[[], Optional[dict]]] = None
# ── YAML→env config bridge ──
# Optional: translate this platform's ``config.yaml`` keys into env vars
# and/or seed ``PlatformConfig.extra`` directly. Lets a plugin own its
# YAML config translation instead of forcing core ``gateway/config.py``
# to know every platform's schema.
#
# Signature: (yaml_cfg: dict, platform_cfg: dict) -> Optional[dict]
# Called from ``load_gateway_config()`` after the generic shared-key loop
# and before ``_apply_env_overrides``. Mutating ``os.environ`` is allowed
# (use ``not os.getenv(...)`` guards to preserve env > YAML precedence);
# any returned dict is merged into ``PlatformConfig.extra``. Exceptions
# are caught and logged at debug level.
# See website/docs/developer-guide/adding-platform-adapters.md for the
# full contract and a worked example.
apply_yaml_config_fn: Optional[Callable[[dict, dict], Optional[dict]]] = None
# Optional: home-channel env var name for cron/notification delivery
# (e.g. ``"IRC_HOME_CHANNEL"``). When set, ``cron.scheduler`` treats this
# platform as a valid ``deliver=<name>`` target and reads the env var to
+8
@@ -21,6 +21,14 @@ status display, gateway setup, and more.
constructed. Without this, env-only setups don't surface in
`hermes gateway status` or `get_connected_platforms()` until the SDK
instantiates.
- `apply_yaml_config_fn: (yaml_cfg, platform_cfg) -> Optional[dict]`
translate this platform's `config.yaml` keys into env vars and/or seed
`PlatformConfig.extra` directly. Lets a plugin own its YAML schema
instead of growing core `gateway/config.py` boilerplate per platform.
Mutating `os.environ` is allowed (use `not os.getenv(...)` guards to
preserve env > YAML precedence); the returned dict is merged into
`PlatformConfig.extra`. Called during `load_gateway_config()` after
the generic shared-key loop and before `_apply_env_overrides()`.
- `cron_deliver_env_var: str` — name of the `*_HOME_CHANNEL` env var. When
set, `deliver=<name>` cron jobs route to this var without editing
`cron/scheduler.py`'s hardcoded sets.
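As a rough illustration of the `apply_yaml_config_fn` contract described above, a hook for a hypothetical IRC plugin might look like this. The config keys (`require_mention`, `nickname`) and the `IRC_REQUIRE_MENTION` variable are assumptions made up for the example; only the signature and the env > YAML guard come from the contract.

```python
import os
from typing import Optional


def apply_irc_yaml_config(yaml_cfg: dict, platform_cfg: dict) -> Optional[dict]:
    """Bridge the (hypothetical) `irc:` block of config.yaml."""
    # Env var wins over YAML: only set it when it is not already present.
    if "require_mention" in platform_cfg and not os.getenv("IRC_REQUIRE_MENTION"):
        os.environ["IRC_REQUIRE_MENTION"] = str(platform_cfg["require_mention"]).lower()

    # Anything returned here is merged into PlatformConfig.extra.
    seeded = {}
    if "nickname" in platform_cfg:
        seeded["nickname"] = platform_cfg["nickname"]
    return seeded or None
```

Returning `None` (or an empty dict) when there is nothing to seed keeps the hook a no-op for installs that configure the platform purely through environment variables.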
+10 -2
@@ -1774,8 +1774,12 @@ class BasePlatformAdapter(ABC):
The default implementation falls back to a numbered text list,
which works on every platform: the user replies with a number
("2") or with the literal choice text, and the gateway intercepts
and resolves. Adapters with native button UIs (Telegram, Discord)
SHOULD override this for a richer UX.
and resolves. For the text fallback path, the default calls
``mark_awaiting_text()`` so that the gateway text-intercept
(:meth:`GatewayRunner._maybe_intercept_clarify_text`) catches the
user's reply instead of timing out.
Adapters with native button UIs (Telegram, Discord) SHOULD
override this for a richer UX.
"""
if choices:
lines = [f"{question}", ""]
@@ -1784,6 +1788,10 @@ class BasePlatformAdapter(ABC):
lines.append("")
lines.append("Reply with the number, the option text, or your own answer.")
text = "\n".join(lines)
# Text fallback: enable text-capture so the gateway intercept
# picks up the user's typed reply (e.g. "2" or choice text).
from tools.clarify_gateway import mark_awaiting_text
mark_awaiting_text(clarify_id)
else:
text = f"{question}"
return await self.send(
+26 -2
@@ -111,9 +111,33 @@ DINGTALK_TYPE_MAPPING = {
def check_dingtalk_requirements() -> bool:
"""Check if DingTalk dependencies are available and configured."""
"""Check if DingTalk dependencies are available and configured.
Lazy-installs dingtalk-stream via ``tools.lazy_deps.ensure("platform.dingtalk")``
on first call if not present.
"""
global DINGTALK_STREAM_AVAILABLE, dingtalk_stream, ChatbotMessage, CallbackMessage, AckMessage
global HTTPX_AVAILABLE, httpx
if not DINGTALK_STREAM_AVAILABLE or not HTTPX_AVAILABLE:
return False
try:
from tools.lazy_deps import ensure as _lazy_ensure
_lazy_ensure("platform.dingtalk", prompt=False)
except Exception:
return False
try:
import dingtalk_stream as _ds
from dingtalk_stream import ChatbotMessage as _CM
from dingtalk_stream.frames import CallbackMessage as _CBM, AckMessage as _AM
import httpx as _httpx
except ImportError:
return False
dingtalk_stream = _ds
ChatbotMessage = _CM
CallbackMessage = _CBM
AckMessage = _AM
httpx = _httpx
DINGTALK_STREAM_AVAILABLE = True
HTTPX_AVAILABLE = True
if not os.getenv("DINGTALK_CLIENT_ID") or not os.getenv("DINGTALK_CLIENT_SECRET"):
return False
return True
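The check-install-retry flow used here can be sketched generically. This is an illustration only: the `install` callback is a hypothetical stand-in for `tools.lazy_deps.ensure`, whose behavior beyond the call shown in the hunk is not visible in this diff.

```python
import importlib
from typing import Callable, Optional


def check_optional_dependency(
    module_name: str,
    install: Optional[Callable[[], None]] = None,
) -> bool:
    """Return True if module_name imports, optionally installing it first."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        pass
    if install is None:
        return False
    try:
        install()
        # Make freshly installed packages visible to the import system.
        importlib.invalidate_caches()
        importlib.import_module(module_name)
    except Exception:
        return False
    return True
```

Any failure in the installer or the retried import is reported as "not available" instead of propagating, which matches how the requirement check above returns `False` on every error path.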
+308 -6
@@ -3577,6 +3577,25 @@ class DiscordAdapter(BasePlatformAdapter):
return {part.strip() for part in s.split(",") if part.strip()}
return set()
def _discord_thread_require_mention(self) -> bool:
"""Return whether thread participation requires @mention to follow up.
When ``False`` (default), once the bot has participated in a thread it
keeps responding to every message in that thread without needing to be
mentioned again, which is useful for one-on-one conversations.
When ``True``, the @mention requirement is enforced inside threads as
well. Set this when multiple bots share a thread and you want each
one to only fire on explicit @mention, avoiding bot-to-bot loops or
unwanted cross-replies.
"""
configured = self.config.extra.get("thread_require_mention")
if configured is not None:
if isinstance(configured, str):
return configured.lower() not in ("false", "0", "no", "off")
return bool(configured)
return os.getenv("DISCORD_THREAD_REQUIRE_MENTION", "false").lower() in ("true", "1", "yes", "on")
def _thread_parent_channel(self, channel: Any) -> Any:
"""Return the parent text channel when invoked from a thread."""
return getattr(channel, "parent", None) or channel
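The resolution order in `_discord_thread_require_mention` (config value first, then environment) can be reproduced standalone. The sketch below takes the `extra` dict and environment as plain arguments so it can be tested without an adapter instance; the function name is illustrative.

```python
import os
from typing import Mapping


def thread_require_mention(extra: Mapping, env: Mapping = os.environ) -> bool:
    """Config value wins; otherwise fall back to the env var (default off)."""
    configured = extra.get("thread_require_mention")
    if configured is not None:
        if isinstance(configured, str):
            # Config strings are truthy unless they spell a known "off" value.
            return configured.lower() not in ("false", "0", "no", "off")
        return bool(configured)
    # Env strings are falsy unless they spell a known "on" value.
    raw = env.get("DISCORD_THREAD_REQUIRE_MENTION", "false")
    return raw.lower() in ("true", "1", "yes", "on")
```

Note the asymmetry this makes visible: an unrecognized string such as `"maybe"` enables the gate when it comes from config but leaves it disabled when it comes from the environment.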
@@ -3877,6 +3896,84 @@ class DiscordAdapter(BasePlatformAdapter):
except Exception as e:
return SendResult(success=False, error=str(e))
async def send_clarify(
self,
chat_id: str,
question: str,
choices: Optional[list],
clarify_id: str,
session_key: str,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Render a clarify prompt with one Discord button per choice.
Multi-choice mode (``choices`` non-empty): renders a button per option
plus a final "✏️ Other (type answer)" button. Picking "Other" flips
the clarify entry into text-capture mode so the next user message in
the session becomes the response. Numeric clicks resolve immediately
via ``resolve_gateway_clarify(clarify_id, choice_text)``.
Open-ended mode (``choices`` empty/None): renders the question as
plain embed text with no buttons. The gateway's text-intercept captures
the next message in this session and resolves the clarify.
"""
if not self._client or not DISCORD_AVAILABLE:
return SendResult(success=False, error="Not connected")
try:
target_id = chat_id
if metadata and metadata.get("thread_id"):
target_id = metadata["thread_id"]
channel = self._client.get_channel(int(target_id))
if not channel:
channel = await self._client.fetch_channel(int(target_id))
# Discord embed description limit is 4096; trim conservatively.
max_desc = 4088
body = str(question or "").strip()
if len(body) > max_desc:
body = body[: max_desc - 3] + "..."
embed = discord.Embed(
title="❓ Hermes needs your input",
description=body,
color=discord.Color.orange(),
)
clean_choices = [
str(c).strip() for c in (choices or []) if c is not None and str(c).strip()
]
# Discord allows up to 5 buttons per row, 5 rows per view = 25.
# We reserve one slot for the "Other" button, so cap at 24 choices.
clean_choices = clean_choices[:24]
if clean_choices:
embed.add_field(
name="Choices",
value="Pick one below, or click ✏️ Other to type a custom answer.",
inline=False,
)
view = ClarifyChoiceView(
choices=clean_choices,
clarify_id=clarify_id,
allowed_user_ids=self._allowed_user_ids,
allowed_role_ids=self._allowed_role_ids,
)
else:
embed.add_field(
name="Reply",
value="Reply in this channel with your answer.",
inline=False,
)
view = None
msg = await channel.send(embed=embed, view=view) if view else await channel.send(embed=embed)
return SendResult(success=True, message_id=str(msg.id))
except Exception as e:
logger.warning("[%s] send_clarify failed: %s", self.name, e)
return SendResult(success=False, error=str(e))
async def send_update_prompt(
self, chat_id: str, prompt: str, default: str = "",
session_key: str = "",
@@ -4167,6 +4264,17 @@ class DiscordAdapter(BasePlatformAdapter):
raw_content = message.content.strip()
normalized_content = raw_content
mention_prefix = False
snapshot_attachments = []
if hasattr(message, "message_snapshots") and message.message_snapshots:
snapshot_text_parts = []
for snap in message.message_snapshots:
if getattr(snap, "content", None):
snapshot_text_parts.append(snap.content.strip())
snapshot_attachments.extend(getattr(snap, "attachments", []) or [])
if snapshot_text_parts and not raw_content:
raw_content = "\n".join(snapshot_text_parts)
normalized_content = raw_content
if self._client.user and self._client.user in message.mentions:
mention_prefix = True
normalized_content = normalized_content.replace(f"<@{self._client.user.id}>", "").strip()
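The snapshot handling above (forwarded messages carry their text and attachments in `message_snapshots`) can be tried against simple stand-in objects. The sketch uses `types.SimpleNamespace` in place of real discord.py message objects, and `merge_forwarded_snapshots` is a hypothetical helper name.

```python
from types import SimpleNamespace


def merge_forwarded_snapshots(message):
    """Return (text, attachments), folding in forwarded snapshot content."""
    raw_content = (getattr(message, "content", "") or "").strip()
    snapshot_text_parts = []
    snapshot_attachments = []
    for snap in getattr(message, "message_snapshots", None) or []:
        if getattr(snap, "content", None):
            snapshot_text_parts.append(snap.content.strip())
        snapshot_attachments.extend(getattr(snap, "attachments", []) or [])
    # Snapshot text only fills in when the message itself has no text.
    if snapshot_text_parts and not raw_content:
        raw_content = "\n".join(snapshot_text_parts)
    all_attachments = list(getattr(message, "attachments", []) or []) + snapshot_attachments
    return raw_content, all_attachments


forwarded = SimpleNamespace(
    content="",
    attachments=[],
    message_snapshots=[SimpleNamespace(content=" forwarded text ", attachments=["a.png"])],
)
text, attachments = merge_forwarded_snapshots(forwarded)
```

Snapshot attachments are always appended, while snapshot text is used only as a fallback, so a comment typed alongside a forward is never overwritten.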
@@ -4209,8 +4317,15 @@ class DiscordAdapter(BasePlatformAdapter):
)
# Skip the mention check if the message is in a thread where
# the bot has previously participated (auto-created or replied in).
in_bot_thread = is_thread and thread_id in self._threads
# the bot has previously participated (auto-created or replied in)
# — UNLESS thread_require_mention is enabled, in which case threads
# are gated the same as channels. Useful when multiple bots share
# a thread.
in_bot_thread = (
is_thread
and thread_id in self._threads
and not self._discord_thread_require_mention()
)
if require_mention and not is_free_channel and not in_bot_thread:
if self._client.user not in message.mentions and not mention_prefix:
@@ -4223,7 +4338,7 @@ class DiscordAdapter(BasePlatformAdapter):
if not is_thread and not isinstance(message.channel, discord.DMChannel):
no_thread_channels_raw = os.getenv("DISCORD_NO_THREAD_CHANNELS", "")
no_thread_channels = {ch.strip() for ch in no_thread_channels_raw.split(",") if ch.strip()}
skip_thread = bool(channel_ids & no_thread_channels)
skip_thread = bool(channel_ids & no_thread_channels) or is_free_channel
auto_thread = os.getenv("DISCORD_AUTO_THREAD", "true").lower() in {"true", "1", "yes"}
is_reply_message = getattr(message, "type", None) == discord.MessageType.reply
if auto_thread and not skip_thread and not is_voice_linked_channel and not is_reply_message:
@@ -4235,13 +4350,15 @@ class DiscordAdapter(BasePlatformAdapter):
auto_threaded_channel = thread
self._threads.mark(thread_id)
all_attachments = list(message.attachments) + snapshot_attachments
# Determine message type
msg_type = MessageType.TEXT
if normalized_content.startswith("/"):
msg_type = MessageType.COMMAND
elif message.attachments:
elif all_attachments:
# Check attachment types
for att in message.attachments:
for att in all_attachments:
if att.content_type:
if att.content_type.startswith("image/"):
msg_type = MessageType.PHOTO
@@ -4300,7 +4417,7 @@ class DiscordAdapter(BasePlatformAdapter):
media_urls = []
media_types = []
pending_text_injection: Optional[str] = None
for att in message.attachments:
for att in all_attachments:
content_type = att.content_type or "unknown"
if content_type.startswith("image/"):
try:
@@ -5099,3 +5216,188 @@ if DISCORD_AVAILABLE:
async def on_timeout(self):
self.resolved = True
self.clear_items()
class ClarifyChoiceView(discord.ui.View):
"""Interactive button view for the clarify tool's multiple-choice prompts.
Renders one button per choice (max 24) plus a final ``✏️ Other`` button.
Picking a numeric choice resolves the gateway clarify entry immediately;
picking ``Other`` flips the entry into text-capture mode so the next
user message in the session becomes the response (the gateway's
text-intercept handles the resolution).
Auth gating mirrors ``ExecApprovalView``: only users/roles in the
Discord adapter's allowlist may answer. Single-use: after the first
valid click all buttons disable and the embed updates to show who
answered and what they chose.
"""
def __init__(
self,
choices: List[str],
clarify_id: str,
allowed_user_ids: set,
allowed_role_ids: Optional[set] = None,
):
super().__init__(timeout=300) # 5-minute timeout
self.choices = list(choices)[:24]
self.clarify_id = clarify_id
self.allowed_user_ids = allowed_user_ids
self.allowed_role_ids = allowed_role_ids or set()
self.resolved = False
for index, choice in enumerate(self.choices):
# Discord button labels are capped at 80 chars.
label_body = choice if len(choice) <= 75 else choice[:72] + "..."
button = discord.ui.Button(
label=f"{index + 1}. {label_body}",
style=discord.ButtonStyle.primary,
custom_id=f"clarify:{clarify_id}:{index}",
)
button.callback = self._make_choice_callback(index, choice)
self.add_item(button)
other_btn = discord.ui.Button(
label="✏️ Other (type answer)",
style=discord.ButtonStyle.secondary,
custom_id=f"clarify:{clarify_id}:other",
)
other_btn.callback = self._on_other
self.add_item(other_btn)
def _check_auth(self, interaction: "discord.Interaction") -> bool:
return _component_check_auth(
interaction, self.allowed_user_ids, self.allowed_role_ids,
)
def _make_choice_callback(self, index: int, choice: str):
async def _callback(interaction: "discord.Interaction"):
await self._resolve_choice(interaction, index, choice)
return _callback
async def _resolve_choice(
self,
interaction: "discord.Interaction",
index: int,
choice: str,
) -> None:
"""Resolve the clarify with a chosen option."""
if self.resolved:
await interaction.response.send_message(
"This prompt has already been answered~", ephemeral=True,
)
return
if not self._check_auth(interaction):
await interaction.response.send_message(
"You're not authorized to answer this prompt~", ephemeral=True,
)
return
self.resolved = True
for child in self.children:
child.disabled = True
embed = interaction.message.embeds[0] if (
interaction.message and interaction.message.embeds
) else None
if embed:
user = getattr(interaction, "user", None)
display_name = getattr(user, "display_name", "user")
embed.color = discord.Color.green()
embed.set_footer(text=f"Answered by {display_name}: {choice}")
try:
await interaction.response.edit_message(embed=embed, view=self)
except Exception:
logger.debug(
"Discord clarify edit_message failed for %s",
self.clarify_id,
exc_info=True,
)
try:
await interaction.response.defer()
except Exception:
pass
# Resolve via the gateway clarify primitive — same mechanism as
# Telegram. Look up the canonical choice text from the entry so
# we round-trip the original value, not a button-label variant.
resolved_text: Optional[str] = None
try:
from tools.clarify_gateway import _entries as _clarify_entries # type: ignore
entry = _clarify_entries.get(self.clarify_id)
if entry and entry.choices and 0 <= index < len(entry.choices):
resolved_text = entry.choices[index]
except Exception:
resolved_text = None
if resolved_text is None:
resolved_text = choice
try:
from tools.clarify_gateway import resolve_gateway_clarify
resolved = resolve_gateway_clarify(self.clarify_id, resolved_text)
logger.info(
"Discord clarify button resolved (id=%s, choice=%r, user=%s, ok=%s)",
self.clarify_id, resolved_text,
getattr(getattr(interaction, "user", None), "display_name", "?"),
resolved,
)
except Exception as exc:
logger.error(
"Discord clarify resolve_gateway_clarify failed (id=%s): %s",
self.clarify_id, exc,
)
async def _on_other(self, interaction: "discord.Interaction") -> None:
"""Flip the clarify entry into text-capture mode."""
if self.resolved:
await interaction.response.send_message(
"This prompt has already been answered~", ephemeral=True,
)
return
if not self._check_auth(interaction):
await interaction.response.send_message(
"You're not authorized to answer this prompt~", ephemeral=True,
)
return
# Don't pop the entry — the gateway's text-intercept needs it
# until the user actually types. Just mark it as awaiting text
# and disable the buttons so the user can't double-click.
try:
from tools.clarify_gateway import mark_awaiting_text
mark_awaiting_text(self.clarify_id)
except Exception as exc:
logger.warning(
"Discord clarify mark_awaiting_text failed (id=%s): %s",
self.clarify_id, exc,
)
self.resolved = True
for child in self.children:
child.disabled = True
embed = interaction.message.embeds[0] if (
interaction.message and interaction.message.embeds
) else None
if embed:
user = getattr(interaction, "user", None)
display_name = getattr(user, "display_name", "user")
embed.color = discord.Color.blue()
embed.set_footer(
text=f"Awaiting typed response from {display_name}",
)
try:
await interaction.response.edit_message(embed=embed, view=self)
except Exception:
try:
await interaction.response.defer()
except Exception:
pass
async def on_timeout(self):
self.resolved = True
for child in self.children:
child.disabled = True
+61 -4
@@ -1300,12 +1300,12 @@ def _run_official_feishu_ws_client(ws_client: Any, adapter: Any) -> None:
except Exception:
logger.debug("[Feishu] Failed to apply websocket runtime overrides", exc_info=True)
async def _connect_with_overrides(*args: Any, **kwargs: Any) -> Any:
def _connect_with_overrides(*args: Any, **kwargs: Any) -> Any:
if adapter._ws_ping_interval is not None and "ping_interval" not in kwargs:
kwargs["ping_interval"] = adapter._ws_ping_interval
if adapter._ws_ping_timeout is not None and "ping_timeout" not in kwargs:
kwargs["ping_timeout"] = adapter._ws_ping_timeout
return await original_connect(*args, **kwargs)
return original_connect(*args, **kwargs)
def _configure_with_overrides(conf: Any) -> Any:
if original_configure is None:
@@ -1343,8 +1343,65 @@ def _run_official_feishu_ws_client(ws_client: Any, adapter: Any) -> None:
def check_feishu_requirements() -> bool:
"""Check if Feishu/Lark dependencies are available."""
return FEISHU_AVAILABLE
"""Check if Feishu/Lark dependencies are available.
Lazy-installs lark-oapi via ``tools.lazy_deps.ensure("platform.feishu")``
on first call if not present. Rebinds all module-level globals on success.
"""
if FEISHU_AVAILABLE:
return True
def _import():
import lark_oapi as lark
from lark_oapi.api.application.v6 import GetApplicationRequest
from lark_oapi.api.im.v1 import (
CreateFileRequest, CreateFileRequestBody,
CreateImageRequest, CreateImageRequestBody,
CreateMessageRequest, CreateMessageRequestBody,
GetChatRequest, GetMessageRequest, GetMessageResourceRequest,
P2ImMessageMessageReadV1,
ReplyMessageRequest, ReplyMessageRequestBody,
UpdateMessageRequest, UpdateMessageRequestBody,
)
from lark_oapi.core import AccessTokenType, HttpMethod
from lark_oapi.core.const import FEISHU_DOMAIN, LARK_DOMAIN
from lark_oapi.core.model import BaseRequest
from lark_oapi.event.callback.model.p2_card_action_trigger import (
CallBackCard, P2CardActionTriggerResponse,
)
from lark_oapi.event.dispatcher_handler import EventDispatcherHandler
from lark_oapi.ws import Client as FeishuWSClient
return {
"lark": lark,
"GetApplicationRequest": GetApplicationRequest,
"CreateFileRequest": CreateFileRequest,
"CreateFileRequestBody": CreateFileRequestBody,
"CreateImageRequest": CreateImageRequest,
"CreateImageRequestBody": CreateImageRequestBody,
"CreateMessageRequest": CreateMessageRequest,
"CreateMessageRequestBody": CreateMessageRequestBody,
"GetChatRequest": GetChatRequest,
"GetMessageRequest": GetMessageRequest,
"GetMessageResourceRequest": GetMessageResourceRequest,
"P2ImMessageMessageReadV1": P2ImMessageMessageReadV1,
"ReplyMessageRequest": ReplyMessageRequest,
"ReplyMessageRequestBody": ReplyMessageRequestBody,
"UpdateMessageRequest": UpdateMessageRequest,
"UpdateMessageRequestBody": UpdateMessageRequestBody,
"AccessTokenType": AccessTokenType,
"HttpMethod": HttpMethod,
"FEISHU_DOMAIN": FEISHU_DOMAIN,
"LARK_DOMAIN": LARK_DOMAIN,
"BaseRequest": BaseRequest,
"CallBackCard": CallBackCard,
"P2CardActionTriggerResponse": P2CardActionTriggerResponse,
"EventDispatcherHandler": EventDispatcherHandler,
"FeishuWSClient": FeishuWSClient,
"FEISHU_AVAILABLE": True,
}
from tools.lazy_deps import ensure_and_bind
return ensure_and_bind("platform.feishu", _import, globals(), prompt=False)
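The lazy-import pattern used by `check_feishu_requirements()` can be sketched in isolation. This is a minimal sketch under stated assumptions, not the real `tools.lazy_deps.ensure_and_bind`, whose implementation does not appear in this diff. The helper name `bind_lazy_imports` is hypothetical, and the sketch skips the install step entirely: it only shows how an importer callable that returns a dict of names can rebind module-level globals on success.

```python
from typing import Any, Callable, Dict


def bind_lazy_imports(
    importer: Callable[[], Dict[str, Any]],
    namespace: Dict[str, Any],
) -> bool:
    """Hypothetical helper: run ``importer`` and copy its names into ``namespace``.

    Returns False and leaves ``namespace`` untouched if the import fails.
    """
    try:
        bindings = importer()
    except ImportError:
        return False
    namespace.update(bindings)
    return True


def _import_json() -> Dict[str, Any]:
    import json  # stdlib stand-in for an optional dependency
    return {"json": json, "JSON_AVAILABLE": True}


def _import_missing() -> Dict[str, Any]:
    import module_that_does_not_exist_xyz  # noqa: F401  (always fails)
    return {}


# A plain dict stands in for the module's globals() here.
module_globals: Dict[str, Any] = {"JSON_AVAILABLE": False}
ok = bind_lazy_imports(_import_json, module_globals)
failed = bind_lazy_imports(_import_missing, module_globals)
```

Returning the availability flag in the same dict as the imported names, as the diff does, means the flag and the names are rebound in one step.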
class FeishuAdapter(BasePlatformAdapter):
+30 -5
@@ -224,7 +224,11 @@ def _check_e2ee_deps() -> bool:
def check_matrix_requirements() -> bool:
"""Return True if the Matrix adapter can be used."""
"""Return True if the Matrix adapter can be used.
Lazy-installs mautrix via ``tools.lazy_deps.ensure("platform.matrix")``
on first call if not present. Rebinds all module-level type globals on success.
"""
token = os.getenv("MATRIX_ACCESS_TOKEN", "")
password = os.getenv("MATRIX_PASSWORD", "")
homeserver = os.getenv("MATRIX_HOMESERVER", "")
@@ -238,10 +242,31 @@ def check_matrix_requirements() -> bool:
try:
import mautrix # noqa: F401
except ImportError:
logger.warning(
"Matrix: mautrix not installed. Run: pip install 'mautrix[encryption]'"
)
return False
def _import():
from mautrix.types import (
ContentURI, EventID, EventType, PaginationDirection,
PresenceState, RoomCreatePreset, RoomID, SyncToken,
TrustState, UserID,
)
return {
"ContentURI": ContentURI,
"EventID": EventID,
"EventType": EventType,
"PaginationDirection": PaginationDirection,
"PresenceState": PresenceState,
"RoomCreatePreset": RoomCreatePreset,
"RoomID": RoomID,
"SyncToken": SyncToken,
"TrustState": TrustState,
"UserID": UserID,
}
from tools.lazy_deps import ensure_and_bind
if not ensure_and_bind("platform.matrix", _import, globals(), prompt=False):
logger.warning(
"Matrix: mautrix not installed. Run: pip install 'mautrix[encryption]'"
)
return False
# If encryption is requested, verify E2EE deps are available at startup
# rather than silently degrading to plaintext-only at connect time.
+27 -2
@@ -176,6 +176,28 @@ class QQAdapter(BasePlatformAdapter):
fut.set_exception(RuntimeError(reason))
self._pending_responses.clear()
def _mark_transport_disconnected(self) -> None:
"""Mark QQ WS down without stopping the reconnect loop.
BasePlatformAdapter uses _running for both process lifecycle and
connection status. QQBot needs to keep the listener task alive across
transient transport drops so it can continue reconnect attempts after a
short-lived gateway or network failure.
"""
if self.has_fatal_error:
return
self._write_runtime_status_safe(
"disconnected",
platform_state="disconnected",
error_code=None,
error_message=None,
)
@property
def is_connected(self) -> bool:
"""Return True only when the QQ WebSocket transport is usable."""
return bool(self._running and self._ws and not self._ws.closed)
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.QQBOT)
@@ -509,7 +531,7 @@ class QQAdapter(BasePlatformAdapter):
else:
quick_disconnect_count = 0
self._mark_disconnected()
self._mark_transport_disconnected()
self._fail_pending("Connection closed")
# Stop reconnecting for fatal codes
@@ -531,6 +553,7 @@ class QQAdapter(BasePlatformAdapter):
RATE_LIMIT_DELAY,
)
if backoff_idx >= MAX_RECONNECT_ATTEMPTS:
self._mark_disconnected()
return
await asyncio.sleep(RATE_LIMIT_DELAY)
if await self._reconnect(backoff_idx):
@@ -584,17 +607,19 @@ class QQAdapter(BasePlatformAdapter):
backoff_idx += 1
if backoff_idx >= MAX_RECONNECT_ATTEMPTS:
logger.error("[%s] Max reconnect attempts reached (QQCloseError)", self._log_tag)
self._mark_disconnected()
return
except Exception as exc:
if not self._running:
return
logger.warning("[%s] WebSocket error: %s", self._log_tag, exc)
self._mark_disconnected()
self._mark_transport_disconnected()
self._fail_pending("Connection interrupted")
if backoff_idx >= MAX_RECONNECT_ATTEMPTS:
logger.error("[%s] Max reconnect attempts reached", self._log_tag)
self._mark_disconnected()
return
if await self._reconnect(backoff_idx):
+43 -2
@@ -73,8 +73,29 @@ class _ThreadContextCache:
def check_slack_requirements() -> bool:
"""Check if Slack dependencies are available."""
return SLACK_AVAILABLE
"""Check if Slack dependencies are available.
Lazy-installs slack-bolt/slack-sdk via ``tools.lazy_deps.ensure("platform.slack")``
on first call if not present. Rebinds all module-level globals on success.
"""
if SLACK_AVAILABLE:
return True
def _import():
from slack_bolt.async_app import AsyncApp
from slack_bolt.adapter.socket_mode.async_handler import AsyncSocketModeHandler
from slack_sdk.web.async_client import AsyncWebClient
import aiohttp
return {
"AsyncApp": AsyncApp,
"AsyncSocketModeHandler": AsyncSocketModeHandler,
"AsyncWebClient": AsyncWebClient,
"aiohttp": aiohttp,
"SLACK_AVAILABLE": True,
}
from tools.lazy_deps import ensure_and_bind
return ensure_and_bind("platform.slack", _import, globals(), prompt=False)
def _extract_text_from_slack_blocks(blocks: list) -> str:
@@ -1777,6 +1798,26 @@ class SlackAdapter(BasePlatformAdapter):
return
original_text = event.get("text", "")
# Slack blocks native slash commands inside threads ("/queue is not
# supported in threads. Sorry!"). As a workaround, recognise a
# leading ``!`` as an alternate command prefix and rewrite it to
# ``/`` so the rest of the pipeline (MessageType.COMMAND tagging,
# gateway dispatcher) handles it like a normal slash command. Only
# rewrite when the first token resolves to a known gateway command
# so casual messages like "!nice work" pass through unchanged.
if original_text.startswith("!"):
try:
from hermes_cli.commands import is_gateway_known_command
first_token = original_text[1:].split(maxsplit=1)[0]
# Strip "@suffix" the same way get_command() does, so
# forms like ``!stop@hermes`` still resolve.
cmd_name = first_token.split("@", 1)[0].lower()
if cmd_name and "/" not in cmd_name and is_gateway_known_command(cmd_name):
original_text = "/" + original_text[1:]
except Exception: # pragma: no cover - defensive
pass
text = original_text
# Extract quoted/forwarded content from Slack blocks.
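The `!` prefix rewrite in the Slack hunk above can be exercised on its own. The following is a minimal sketch, assuming a small hard-coded command set in place of `hermes_cli.commands.is_gateway_known_command`; `KNOWN_COMMANDS` and `rewrite_bang_command` are hypothetical names introduced for illustration.

```python
from typing import Callable, FrozenSet

# Hypothetical command set; the diff looks commands up via
# is_gateway_known_command() instead.
KNOWN_COMMANDS: FrozenSet[str] = frozenset({"stop", "queue", "model"})


def rewrite_bang_command(
    text: str,
    is_known: Callable[[str], bool] = KNOWN_COMMANDS.__contains__,
) -> str:
    """Rewrite a leading ``!cmd`` to ``/cmd`` only when ``cmd`` is known."""
    if not text.startswith("!"):
        return text
    parts = text[1:].split(maxsplit=1)
    if not parts:  # a bare "!" carries no command token
        return text
    # Drop any "@suffix" so forms like "!stop@hermes" still resolve.
    cmd_name = parts[0].split("@", 1)[0].lower()
    if cmd_name and "/" not in cmd_name and is_known(cmd_name):
        return "/" + text[1:]
    return text
```

The sketch checks for a bare `!` explicitly. In the diff the same input raises `IndexError` from `split(...)[0]`, which the broad `except Exception` absorbs, so the text passes through unchanged either way.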
+49 -34
@@ -332,6 +332,13 @@ class TelegramAdapter(BasePlatformAdapter):
MEDIA_GROUP_WAIT_SECONDS = 0.8
_GENERAL_TOPIC_THREAD_ID = "1"
# Telegram's edit_message applies MarkdownV2 formatting only on the
# finalize=True path. Without this flag, stream_consumer._send_or_edit
# short-circuits when the raw text is unchanged between the last streamed
# edit and the final edit, skipping the plain-text → MarkdownV2 conversion.
# Fixes #25710.
REQUIRES_EDIT_FINALIZE: bool = True
# Adaptive text-batch ingress: short messages need a tighter delay so the
# first token reaches the agent fast. Numbers tuned for "feels instant":
# ≤320 codepoints (one short paragraph) settles in ~180ms; ≤1024
@@ -2070,7 +2077,7 @@ class TelegramAdapter(BasePlatformAdapter):
return SendResult(success=False, error="Not connected")
try:
default_hint = f" (default: {default})" if default else ""
text = f"⚕ *Update needs your input:*\n\n{prompt}{default_hint}"
text = self.format_message(f"⚕ *Update needs your input:*\n\n{prompt}{default_hint}")
keyboard = InlineKeyboardMarkup([
[
InlineKeyboardButton("✓ Yes", callback_data="update_prompt:y"),
@@ -2082,7 +2089,7 @@ class TelegramAdapter(BasePlatformAdapter):
msg = await self._send_message_with_thread_fallback(
chat_id=int(chat_id),
text=text,
parse_mode=ParseMode.MARKDOWN,
parse_mode=ParseMode.MARKDOWN_V2,
reply_markup=keyboard,
reply_to_message_id=reply_to_id,
**self._thread_kwargs_for_send(
@@ -2334,11 +2341,13 @@ class TelegramAdapter(BasePlatformAdapter):
keyboard = InlineKeyboardMarkup(rows)
provider_label = get_label(current_provider)
text = (
f"⚙ *Model Configuration*\n\n"
f"Current model: `{current_model or 'unknown'}`\n"
f"Provider: {provider_label}\n\n"
f"Select a provider:"
text = self.format_message(
(
f"⚙ *Model Configuration*\n\n"
f"Current model: `{current_model or 'unknown'}`\n"
f"Provider: {provider_label}\n\n"
f"Select a provider:"
)
)
thread_id = metadata.get("thread_id") if metadata else None
@@ -2346,7 +2355,7 @@ class TelegramAdapter(BasePlatformAdapter):
msg = await self._send_message_with_thread_fallback(
chat_id=int(chat_id),
text=text,
parse_mode=ParseMode.MARKDOWN,
parse_mode=ParseMode.MARKDOWN_V2,
reply_markup=keyboard,
reply_to_message_id=reply_to_id,
**self._thread_kwargs_for_send(
@@ -2456,12 +2465,14 @@ class TelegramAdapter(BasePlatformAdapter):
extra = f"\n_{total - shown} more available — type `/model <name>` directly_" if total > shown else ""
await query.edit_message_text(
text=(
f"⚙ *Model Configuration*\n\n"
f"Provider: *{pname}*{page_info}\n"
f"Select a model:{extra}"
text=self.format_message(
(
f"⚙ *Model Configuration*\n\n"
f"Provider: *{pname}*{page_info}\n"
f"Select a model:{extra}"
)
),
parse_mode=ParseMode.MARKDOWN,
parse_mode=ParseMode.MARKDOWN_V2,
reply_markup=keyboard,
)
await query.answer()
@@ -2490,12 +2501,14 @@ class TelegramAdapter(BasePlatformAdapter):
extra = f"\n_{total - shown} more available — type `/model <name>` directly_" if total > shown else ""
await query.edit_message_text(
text=(
f"⚙ *Model Configuration*\n\n"
f"Provider: *{pname}*{page_info}\n"
f"Select a model:{extra}"
text=self.format_message(
(
f"⚙ *Model Configuration*\n\n"
f"Provider: *{pname}*{page_info}\n"
f"Select a model:{extra}"
)
),
parse_mode=ParseMode.MARKDOWN,
parse_mode=ParseMode.MARKDOWN_V2,
reply_markup=keyboard,
)
await query.answer()
@@ -2530,8 +2543,8 @@ class TelegramAdapter(BasePlatformAdapter):
# Edit message to show confirmation, remove buttons
try:
await query.edit_message_text(
text=result_text,
parse_mode=ParseMode.MARKDOWN,
text=self.format_message(result_text),
parse_mode=ParseMode.MARKDOWN_V2,
reply_markup=None,
)
except Exception:
@@ -2571,13 +2584,15 @@ class TelegramAdapter(BasePlatformAdapter):
provider_label = state["current_provider"]
await query.edit_message_text(
text=(
f"⚙ *Model Configuration*\n\n"
f"Current model: `{state['current_model'] or 'unknown'}`\n"
f"Provider: {provider_label}\n\n"
f"Select a provider:"
text=self.format_message(
(
f"⚙ *Model Configuration*\n\n"
f"Current model: `{state['current_model'] or 'unknown'}`\n"
f"Provider: {provider_label}\n\n"
f"Select a provider:"
)
),
parse_mode=ParseMode.MARKDOWN,
parse_mode=ParseMode.MARKDOWN_V2,
reply_markup=keyboard,
)
await query.answer()
@@ -2660,8 +2675,8 @@ class TelegramAdapter(BasePlatformAdapter):
# Edit message to show decision, remove buttons
try:
await query.edit_message_text(
text=f"{label} by {user_display}",
parse_mode=ParseMode.MARKDOWN,
text=self.format_message(f"{label} by {user_display}"),
parse_mode=ParseMode.MARKDOWN_V2,
reply_markup=None,
)
except Exception:
@@ -2714,8 +2729,8 @@ class TelegramAdapter(BasePlatformAdapter):
try:
await query.edit_message_text(
text=f"{label} by {user_display}",
parse_mode=ParseMode.MARKDOWN,
text=self.format_message(f"{label} by {user_display}"),
parse_mode=ParseMode.MARKDOWN_V2,
reply_markup=None,
)
except Exception:
@@ -2740,8 +2755,8 @@ class TelegramAdapter(BasePlatformAdapter):
prompt_message_id = getattr(query.message, "message_id", None)
send_kwargs: Dict[str, Any] = {
"chat_id": int(query.message.chat_id),
"text": result_text,
"parse_mode": ParseMode.MARKDOWN,
"text": self.format_message(result_text),
"parse_mode": ParseMode.MARKDOWN_V2,
**self._link_preview_kwargs(),
}
chat_type_value = getattr(chat_type, "value", chat_type)
@@ -2901,8 +2916,8 @@ class TelegramAdapter(BasePlatformAdapter):
label = "Yes" if answer == "y" else "No"
try:
await query.edit_message_text(
text=f"⚕ Update prompt answered: *{label}*",
parse_mode=ParseMode.MARKDOWN,
text=self.format_message(f"⚕ Update prompt answered: *{label}*"),
parse_mode=ParseMode.MARKDOWN_V2,
reply_markup=None,
)
except Exception:
+1
@@ -345,6 +345,7 @@ class WeComAdapter(BasePlatformAdapter):
try:
await self._open_connection()
backoff_idx = 0
self._mark_connected()
logger.info("[%s] Reconnected", self.name)
except Exception as reconnect_exc:
logger.warning("[%s] Reconnect failed: %s", self.name, reconnect_exc)
+32 -2
@@ -322,6 +322,26 @@ class WhatsAppAdapter(BasePlatformAdapter):
return {str(part).strip() for part in raw if str(part).strip()}
return {part.strip() for part in str(raw).split(",") if part.strip()}
@staticmethod
def _is_broadcast_chat(chat_id: str) -> bool:
"""True for WhatsApp pseudo-chats that aren't real conversations.
Covers Status updates (Stories) and Channel/Newsletter broadcasts.
These show up as inbound messages on Baileys but the agent should
never reply: answering a Story update spams the contact's status
feed, and Channel posts aren't addressable in the first place.
"""
if not chat_id:
return False
cid = chat_id.strip().lower()
if cid == "status@broadcast":
return True
# @broadcast suffix covers status@broadcast plus any future
# broadcast-list variants. @newsletter is the Channel JID suffix.
if cid.endswith("@broadcast") or cid.endswith("@newsletter"):
return True
return False
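A standalone version of the suffix check makes the intended behaviour easy to verify. This is a sketch for illustration, not the adapter method itself, and the sample chat IDs are made up.

```python
def is_broadcast_chat(chat_id: str) -> bool:
    """Standalone restatement of the suffix check, for illustration."""
    if not chat_id:
        return False
    cid = chat_id.strip().lower()
    return cid.endswith("@broadcast") or cid.endswith("@newsletter")


# Made-up sample IDs covering each branch.
samples = [
    "status@broadcast",
    " Status@Broadcast ",
    "12345@newsletter",
    "12345@s.whatsapp.net",
    "",
]
results = {jid: is_broadcast_chat(jid) for jid in samples}
```

The explicit `cid == "status@broadcast"` comparison in the diff is already covered by the `@broadcast` suffix test, so the sketch omits it without changing the result.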
def _is_dm_allowed(self, sender_id: str) -> bool:
"""Check whether a DM from the given sender should be processed."""
if self._dm_policy == "disabled":
@@ -432,9 +452,16 @@ class WhatsAppAdapter(BasePlatformAdapter):
return cleaned.strip() or text
def _should_process_message(self, data: Dict[str, Any]) -> bool:
chat_id_raw = str(data.get("chatId") or "")
# WhatsApp uses pseudo-chats for Status updates (Stories) and
# Channel/Newsletter broadcasts. These are not real conversations
# and the agent should never reply to them — even in self-chat mode
# where the bridge may surface them as "fromMe" events.
if self._is_broadcast_chat(chat_id_raw):
return False
is_group = data.get("isGroup", False)
if is_group:
chat_id = str(data.get("chatId") or "")
chat_id = chat_id_raw
if not self._is_group_allowed(chat_id):
return False
else:
@@ -494,12 +521,15 @@ class WhatsAppAdapter(BasePlatformAdapter):
# plain executable path.
_npm_bin = shutil.which("npm") or "npm"
try:
# Read timeout from environment variable, default to 300 seconds (5 minutes)
# to accommodate slower systems like Unraid NAS
npm_install_timeout = int(os.environ.get("WHATSAPP_NPM_INSTALL_TIMEOUT", "300"))
install_result = subprocess.run(
[_npm_bin, "install", "--silent"],
cwd=str(bridge_dir),
capture_output=True,
text=True,
timeout=60,
timeout=npm_install_timeout,
)
if install_result.returncode != 0:
print(f"[{self.name}] npm install failed: {install_result.stderr}")
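The hunk above reads `WHATSAPP_NPM_INSTALL_TIMEOUT` with a bare `int(...)`, which raises `ValueError` on a non-numeric value; how the surrounding `try` handles that is not visible here. A more forgiving reader could look like the sketch below. `read_timeout_seconds` is a hypothetical helper, and falling back to the default for invalid or non-positive values is an assumption, not behaviour taken from the diff.

```python
import os


def read_timeout_seconds(name: str, default: int = 300) -> int:
    """Hypothetical helper: parse a positive integer timeout from the environment."""
    raw = os.environ.get(name, "")
    try:
        value = int(raw)
    except ValueError:
        # Unset, empty, or non-numeric: keep the default.
        return default
    # Zero or negative timeouts are treated as invalid in this sketch.
    return value if value > 0 else default
```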
+184 -4
@@ -1139,6 +1139,38 @@ def _should_clear_resume_pending_after_turn(agent_result: dict) -> bool:
return True
def _preserve_queued_followup_history_offset(
current_result: dict,
followup_result: dict,
) -> dict:
"""Carry the outer history offset through queued follow-up drains.
``_process_message_background()`` persists transcript rows only once, after the
entire in-band queued-follow-up chain returns. Each recursive ``_run_agent()``
call advances ``history_offset`` to the history it received, so without
correction the outermost persistence step sees only the *last* queued turn as
"new" and silently drops earlier turns from the same drain chain.
Preserve the earliest (outermost) history offset so the final transcript slice
still includes every queued turn that ran during the chain.
"""
if not isinstance(followup_result, dict):
return followup_result
if not isinstance(current_result, dict):
return followup_result
current_offset = current_result.get("history_offset")
followup_offset = followup_result.get("history_offset")
if not isinstance(current_offset, int):
return followup_result
if isinstance(followup_offset, int) and followup_offset <= current_offset:
return followup_result
merged = dict(followup_result)
merged["history_offset"] = current_offset
return merged
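A worked example shows what the offset merge does. The function below is a compact restatement of the helper for illustration, and the offsets (10 and 14) are hypothetical numbers, not values from the gateway.

```python
def preserve_offset(current_result: dict, followup_result: dict) -> dict:
    """Compact restatement of the helper, used only for this worked example."""
    if not isinstance(followup_result, dict) or not isinstance(current_result, dict):
        return followup_result
    current_offset = current_result.get("history_offset")
    followup_offset = followup_result.get("history_offset")
    if not isinstance(current_offset, int):
        return followup_result
    if isinstance(followup_offset, int) and followup_offset <= current_offset:
        return followup_result
    # Copy before editing so the follow-up result is not mutated.
    merged = dict(followup_result)
    merged["history_offset"] = current_offset
    return merged


# Hypothetical chain: the outer turn received 10 history rows,
# the queued follow-up received 14.
outer = {"history_offset": 10, "final_response": "first"}
followup = {"history_offset": 14, "final_response": "second"}
merged = preserve_offset(outer, followup)
```

Here `merged` keeps the follow-up's response but carries offset 10, so a transcript slice taken from that offset still includes the outer turn's rows.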
class GatewayRunner:
"""
Main gateway controller.
@@ -6096,6 +6128,12 @@ class GatewayRunner:
if _cmd_def_inner and _cmd_def_inner.name == "model":
return "Agent is running — wait or /stop first, then switch models."
# /codex-runtime must not be used while the agent is running.
# Switching mid-turn would split a turn across two transports.
if _cmd_def_inner and _cmd_def_inner.name == "codex-runtime":
return ("Agent is running — wait or /stop first, then "
"change runtime.")
# /approve and /deny must bypass the running-agent interrupt path.
# The agent thread is blocked on a threading.Event inside
# tools/approval.py — sending an interrupt won't unblock it.
@@ -6135,6 +6173,12 @@ class GatewayRunner:
return await self._handle_goal_command(event)
return "Agent is running — use /goal status / pause / clear mid-run, or /stop before setting a new goal."
# /subgoal is safe mid-run — it only modifies the goal's
# subgoals list, which the judge reads at the next turn
# boundary. No race with the running turn.
if _cmd_def_inner and _cmd_def_inner.name == "subgoal":
return await self._handle_subgoal_command(event)
# Session-level toggles that are safe to run mid-agent —
# /yolo can unblock a pending approval prompt, /verbose cycles
# the tool-progress display mode for the ongoing stream.
@@ -6430,6 +6474,9 @@ class GatewayRunner:
if canonical == "model":
return await self._handle_model_command(event)
if canonical == "codex-runtime":
return await self._handle_codex_runtime_command(event)
if canonical == "personality":
return await self._handle_personality_command(event)
@@ -6513,6 +6560,9 @@ class GatewayRunner:
if canonical == "goal":
return await self._handle_goal_command(event)
if canonical == "subgoal":
return await self._handle_subgoal_command(event)
if canonical == "voice":
return await self._handle_voice_command(event)
@@ -9210,6 +9260,51 @@ class GatewayRunner:
return "\n".join(lines)
async def _handle_codex_runtime_command(self, event: MessageEvent) -> str:
"""Handle /codex-runtime command in the gateway.
Same surface as the CLI handler in cli.py:
/codex-runtime show current state
/codex-runtime auto Hermes default runtime
/codex-runtime codex_app_server codex subprocess runtime
/codex-runtime on / off synonyms
On change, the cached agent for this session is evicted so the next
message creates a fresh AIAgent with the new api_mode wired in
(avoids prompt-cache invalidation mid-session)."""
from hermes_cli import codex_runtime_switch as crs
raw_args = event.get_command_args().strip() if event else ""
new_value, errors = crs.parse_args(raw_args)
if errors:
return "" + "\n".join(errors)
# Load + persist via the same helpers used for /model and /yolo
try:
from hermes_cli.config import load_config, save_config
except Exception as exc:
return f"❌ Could not load config: {exc}"
cfg = load_config()
result = crs.apply(
cfg,
new_value,
persist_callback=(save_config if new_value is not None else None),
)
# On a real change, evict the cached agent so the new runtime takes
# effect on the next message rather than waiting for cache TTL.
if result.success and new_value is not None and result.requires_new_session:
try:
session_key = self._session_key_for_source(event.source)
self._evict_cached_agent(session_key)
except Exception:
logger.debug("could not evict cached agent after codex-runtime change",
exc_info=True)
prefix = "" if result.success else ""
return f"{prefix} {result.message}"
async def _handle_personality_command(self, event: MessageEvent) -> str:
"""Handle /personality command - list or set a personality."""
from hermes_constants import display_hermes_home
@@ -9438,6 +9533,57 @@ class GatewayRunner:
return t("gateway.goal.set", budget=state.max_turns, goal=state.goal)
async def _handle_subgoal_command(self, event: "MessageEvent") -> str:
"""Handle /subgoal for gateway platforms (mirror of CLI handler).
Subgoals are extra criteria appended to the active goal mid-loop.
They modify state read at the next turn boundary, so this is safe
to invoke while the agent is running.
"""
args = (event.get_command_args() or "").strip()
mgr, _session_entry = self._get_goal_manager_for_event(event)
if mgr is None:
return t("gateway.goal.unavailable")
if not mgr.has_goal():
return "No active goal. Set one with /goal <text>."
# No args → list current subgoals.
if not args:
return f"{mgr.status_line()}\n{mgr.render_subgoals()}"
tokens = args.split(None, 1)
verb = tokens[0].lower()
rest = tokens[1].strip() if len(tokens) > 1 else ""
if verb == "remove":
if not rest:
return "Usage: /subgoal remove <n>"
try:
idx = int(rest.split()[0])
except ValueError:
return "/subgoal remove: <n> must be an integer (1-based index)."
try:
removed = mgr.remove_subgoal(idx)
except (IndexError, RuntimeError) as exc:
return f"/subgoal remove: {exc}"
return f"✓ Removed subgoal {idx}: {removed}"
if verb == "clear":
try:
prev = mgr.clear_subgoals()
except RuntimeError as exc:
return f"/subgoal clear: {exc}"
if prev:
return f"✓ Cleared {prev} subgoal{'s' if prev != 1 else ''}."
return "No subgoals to clear."
try:
text = mgr.add_subgoal(args)
except (ValueError, RuntimeError) as exc:
return f"/subgoal: {exc}"
idx = len(mgr.state.subgoals) if mgr.state else 0
return f"✓ Added subgoal {idx}: {text}"
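The argument handling in `_handle_subgoal_command` can be separated from the goal manager to make the branches easier to test. This is a minimal sketch; `parse_subgoal_args` and its `(action, payload)` return shape are hypothetical and do not appear in the diff.

```python
from typing import Optional, Tuple, Union


def parse_subgoal_args(args: Optional[str]) -> Tuple[str, Union[None, int, str]]:
    """Hypothetical parser: map raw /subgoal arguments to (action, payload)."""
    args = (args or "").strip()
    if not args:
        return ("list", None)
    tokens = args.split(None, 1)
    verb = tokens[0].lower()
    rest = tokens[1].strip() if len(tokens) > 1 else ""
    if verb == "remove":
        if not rest:
            return ("usage", None)
        try:
            return ("remove", int(rest.split()[0]))
        except ValueError:
            return ("error", "<n> must be an integer (1-based index)")
    if verb == "clear":
        return ("clear", None)
    # Anything else is the text of a new subgoal.
    return ("add", args)
```

One consequence of dispatching on the first token, in the sketch as in the diff, is that a subgoal whose text starts with `clear` or `remove` is treated as that verb rather than added as a subgoal.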
async def _send_goal_status_notice(self, source: Any, message: str) -> None:
"""Send a /goal judge status line back to the originating chat/thread."""
adapter = self.adapters.get(source.platform)
@@ -10209,6 +10355,10 @@ class GatewayRunner:
event_message_id = self._reply_anchor_for_event(event)
# Forward image/audio attachments so the background agent can see them.
media_urls = list(event.media_urls) if event.media_urls else []
media_types = list(event.media_types) if event.media_types else []
# Fire-and-forget the background task
_task = asyncio.create_task(
self._run_background_task(
@@ -10216,6 +10366,8 @@ class GatewayRunner:
source,
task_id,
event_message_id=event_message_id,
media_urls=media_urls,
media_types=media_types,
)
)
self._background_tasks.add(_task)
@@ -10230,10 +10382,15 @@ class GatewayRunner:
source: "SessionSource",
task_id: str,
event_message_id: Optional[str] = None,
media_urls: Optional[List[str]] = None,
media_types: Optional[List[str]] = None,
) -> None:
"""Execute a background agent task and deliver the result to the chat."""
from run_agent import AIAgent
media_urls = media_urls or []
media_types = media_types or []
adapter = self.adapters.get(source.platform)
if not adapter:
logger.warning("No adapter for platform %s in background task %s", source.platform, task_id)
@@ -10269,6 +10426,23 @@ class GatewayRunner:
self._service_tier = self._load_service_tier()
turn_route = self._resolve_turn_agent_config(prompt, model, runtime_kwargs)
# Enrich the prompt with image descriptions so the background
# agent can see user-attached images (same as the main flow).
enriched_prompt = prompt
if media_urls:
image_paths = []
for i, path in enumerate(media_urls):
mtype = media_types[i] if i < len(media_types) else ""
if mtype.startswith("image/"):
image_paths.append(path)
if image_paths:
try:
enriched_prompt = await self._enrich_message_with_vision(
prompt, image_paths,
)
except Exception as e:
logger.warning("Background task vision enrichment failed: %s", e)
def run_sync():
agent = AIAgent(
model=turn_route["model"],
@@ -10300,7 +10474,7 @@ class GatewayRunner:
)
try:
return agent.run_conversation(
user_message=prompt,
user_message=enriched_prompt,
task_id=task_id,
)
finally:
@@ -15957,6 +16131,7 @@ class GatewayRunner:
_already_streamed = bool(
(_sc and getattr(_sc, "final_response_sent", False))
or _previewed
or (_sc and getattr(_sc, "final_content_delivered", False))
)
first_response = result.get("final_response", "")
if first_response and not _already_streamed:
@@ -16042,7 +16217,7 @@ class GatewayRunner:
except Exception:
pass
return await self._run_agent(
followup_result = await self._run_agent(
message=next_message,
context_prompt=context_prompt,
history=updated_history,
@@ -16054,6 +16229,7 @@ class GatewayRunner:
event_message_id=next_message_id,
channel_prompt=next_channel_prompt,
)
return _preserve_queued_followup_history_offset(result, followup_result)
finally:
# Stop progress sender, interrupt monitor, and notification task
if progress_task:
@@ -16117,12 +16293,16 @@ class GatewayRunner:
# response_previewed means the interim_assistant_callback already
# sent the final text via the adapter (non-streaming path).
_previewed = bool(response.get("response_previewed"))
if not _is_empty_sentinel and (_streamed or _previewed):
_content_delivered = bool(
_sc and getattr(_sc, "final_content_delivered", False)
)
if not _is_empty_sentinel and (_streamed or _previewed or _content_delivered):
logger.info(
"Suppressing normal final send for session %s: final delivery already confirmed (streamed=%s previewed=%s).",
"Suppressing normal final send for session %s: final delivery already confirmed (streamed=%s previewed=%s content_delivered=%s).",
session_key or "?",
_streamed,
_previewed,
_content_delivered,
)
response["already_sent"] = True
+13 -1
@@ -128,6 +128,7 @@ def _read_process_cmdline(pid: int) -> Optional[str]:
On Linux, reads /proc/<pid>/cmdline directly. On macOS and other
platforms without /proc, falls back to ``ps -p <pid> -o command=``.
On Windows (no /proc, no ps), uses psutil.
"""
cmdline_path = Path(f"/proc/{pid}/cmdline")
try:
@@ -150,6 +151,16 @@ def _read_process_cmdline(pid: int) -> Optional[str]:
except (OSError, subprocess.TimeoutExpired):
pass
# Windows fallback: psutil (already used by _pid_exists)
try:
import psutil # type: ignore
proc = psutil.Process(pid)
cmdline_parts = proc.cmdline()
if cmdline_parts:
return " ".join(cmdline_parts)
except Exception:
pass
return None
@@ -178,7 +189,8 @@ def _record_looks_like_gateway(record: dict[str, Any]) -> bool:
if not isinstance(argv, list) or not argv:
return False
cmdline = " ".join(str(part) for part in argv)
# Normalize Windows backslashes so patterns match cross-platform.
cmdline = " ".join(str(part) for part in argv).replace("\\", "/")
patterns = (
"hermes_cli.main gateway",
"hermes_cli/main.py gateway",
+17
@@ -150,6 +150,10 @@ class GatewayStreamConsumer:
self._flood_strikes = 0 # Consecutive flood-control edit failures
self._current_edit_interval = self.cfg.edit_interval # Adaptive backoff
self._final_response_sent = False
# Set when the final response content was sent to the user via
# streaming, even if the final edit (cursor removal etc.)
# subsequently failed.
self._final_content_delivered = False
# Cache adapter lifecycle capability: only platforms that need an
# explicit finalize call (e.g. DingTalk AI Cards) force us to make
# a redundant final edit. Everyone else keeps the fast path.
@@ -187,6 +191,12 @@ class GatewayStreamConsumer:
"""True when the stream consumer delivered the final assistant reply."""
return self._final_response_sent
@property
def final_content_delivered(self) -> bool:
"""True when the final response content reached the user, even if
the subsequent cosmetic edit (cursor removal) failed."""
return self._final_content_delivered
def on_segment_break(self) -> None:
"""Finalize the current stream segment and start a fresh message."""
self._queue.put(_NEW_SEGMENT)
@@ -455,6 +465,8 @@ class GatewayStreamConsumer:
# tool-progress edits or fallback-mode promotion (#10748)
# — that doesn't mean the final answer reached the user.
self._final_response_sent = chunks_delivered
if chunks_delivered:
self._final_content_delivered = True
return
if got_segment_break:
self._message_id = None
@@ -505,6 +517,11 @@ class GatewayStreamConsumer:
self._last_edit_time = time.monotonic()
if got_done:
# Record that the final content reached the user even
# if the cosmetic final edit below fails.
if current_update_visible and self._accumulated:
self._final_content_delivered = True
# Final edit without cursor. If progressive editing failed
# mid-stream, send a single continuation/fallback message
# here instead of letting the base gateway path send the
+60 -2
@@ -35,7 +35,7 @@ from dataclasses import dataclass, field
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import parse_qs, urlencode, urlparse
import httpx
@@ -284,7 +284,7 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
),
"alibaba": ProviderConfig(
id="alibaba",
-name="Alibaba Cloud (DashScope)",
+name="Qwen Cloud",
auth_type="api_key",
inference_base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
api_key_env_vars=("DASHSCOPE_API_KEY",),
@@ -3870,6 +3870,39 @@ def _snapshot_nous_pool_status() -> Dict[str, Any]:
return _empty_nous_auth_status()
# ── Process-level memo for get_nous_auth_status() ──
# get_nous_auth_status() validates state by calling resolve_nous_runtime_credentials(),
# which does a synchronous OAuth refresh POST to portal.nousresearch.com. That can take
# ~350ms even on the failure path, and read-only UI surfaces (`hermes tools`, status panels,
# subscription-feature checks) call it many times per render — `hermes tools` → "All Platforms"
# was firing the refresh ~31× during one menu paint, racking up >13s of HTTP and burning
# single-use refresh tokens. Cache the snapshot for a few seconds, keyed on the auth.json
# mtime so that `hermes auth login/logout/add/remove` invalidate naturally on the next call.
_NOUS_AUTH_STATUS_CACHE_TTL = 15.0 # seconds
_nous_auth_status_cache: Optional[Tuple[float, Optional[float], Dict[str, Any]]] = None
def _auth_file_mtime() -> Optional[float]:
try:
return _auth_file_path().stat().st_mtime
except FileNotFoundError:
return None
except Exception:
return None
def invalidate_nous_auth_status_cache() -> None:
"""Clear the get_nous_auth_status() process-level memo.
Call this from any code path that mutates Nous auth state without going
through resolve_nous_runtime_credentials() (e.g. tests). Login/logout
flows touch auth.json, so the mtime check below invalidates them
automatically; explicit invalidation is the belt-and-braces option.
"""
global _nous_auth_status_cache
_nous_auth_status_cache = None
def get_nous_auth_status() -> Dict[str, Any]:
"""Status snapshot for Nous auth.
@@ -3878,7 +3911,32 @@ def get_nous_auth_status() -> Dict[str, Any]:
by resolving runtime credentials so revoked refresh sessions do not show up
as a healthy login. If provider state is absent, fall back to the credential
pool for the just-logged-in / not-yet-promoted case.
The returned snapshot is memoised for ~15s keyed on the auth.json mtime,
so menu/status surfaces that ask repeatedly don't trigger one refresh POST
per call. Login/logout flows write to auth.json and therefore invalidate
the cache automatically; tests can also call
``invalidate_nous_auth_status_cache()`` explicitly.
"""
global _nous_auth_status_cache
now = time.monotonic()
mtime = _auth_file_mtime()
cached = _nous_auth_status_cache
if cached is not None:
cached_at, cached_mtime, cached_status = cached
if (
cached_mtime == mtime
and (now - cached_at) < _NOUS_AUTH_STATUS_CACHE_TTL
):
return dict(cached_status)
status = _compute_nous_auth_status()
_nous_auth_status_cache = (now, mtime, dict(status))
return status
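Review note: the memo above combines a TTL with an mtime key, so a file change or an expired timer forces a recompute. A minimal, self-contained sketch of that pattern follows. The class name `MtimeTtlMemo` and its parameters are hypothetical and not part of this PR; the sketch only illustrates the invalidation rules, under the assumption that the cached value is a plain dict.

```python
import time
from pathlib import Path
from typing import Any, Callable, Dict, Optional, Tuple


class MtimeTtlMemo:
    """Cache one dict result for `ttl` seconds, keyed on a file's mtime.

    Hypothetical helper for illustration; not the PR's implementation.
    """

    def __init__(
        self,
        path: Path,
        compute: Callable[[], Dict[str, Any]],
        ttl: float = 15.0,
    ) -> None:
        self._path = path
        self._compute = compute
        self._ttl = ttl
        # (cached_at, cached_mtime, cached_value) or None when empty.
        self._entry: Optional[Tuple[float, Optional[float], Dict[str, Any]]] = None

    def _mtime(self) -> Optional[float]:
        try:
            return self._path.stat().st_mtime
        except OSError:
            # A missing or unreadable file is a valid cache key (None).
            return None

    def get(self) -> Dict[str, Any]:
        now = time.monotonic()
        mtime = self._mtime()
        if self._entry is not None:
            cached_at, cached_mtime, cached = self._entry
            if cached_mtime == mtime and (now - cached_at) < self._ttl:
                # Return a copy so callers cannot mutate the cached dict.
                return dict(cached)
        value = self._compute()
        self._entry = (now, mtime, dict(value))
        return value

    def invalidate(self) -> None:
        self._entry = None
```

Using `time.monotonic()` for the age check keeps the TTL immune to wall-clock adjustments, while the mtime comparison handles writes made by other code paths.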
def _compute_nous_auth_status() -> Dict[str, Any]:
"""Uncached implementation of get_nous_auth_status(). See that function."""
state = get_provider_auth_state("nous")
if state:
base_status = {
+13
@@ -581,6 +581,19 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
if mcp_connected:
summary_parts.append(f"{mcp_connected} MCP servers")
summary_parts.append("/help for commands")
# Indicate when the codex_app_server runtime is active so users
# understand why tool counts may not match what's actually reachable
# (codex builds its own tool list inside the spawned subprocess).
try:
from hermes_cli.codex_runtime_switch import get_current_runtime
from hermes_cli.config import load_config as _load_cfg
if get_current_runtime(_load_cfg()) == "codex_app_server":
right_lines.append(
f"[bold {accent}]Runtime:[/] [{text}]codex app-server[/] "
f"[dim {dim}](terminal/file ops/MCP run inside codex)[/]"
)
except Exception:
pass
# Show active profile name when not 'default'
try:
from hermes_cli.profiles import get_active_profile_name
+17 -4
@@ -22,6 +22,7 @@ from pathlib import Path
from hermes_constants import is_wsl as _is_wsl
logger = logging.getLogger(__name__)
_PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
def save_clipboard_image(dest: Path) -> bool:
@@ -378,10 +379,13 @@ def _wayland_save(dest: Path) -> bool:
dest.unlink(missing_ok=True)
return False
-# BMP needs conversion to PNG (common in WSLg where only BMP
-# is bridged from Windows clipboard via RDP).
-if mime == "image/bmp":
-return _convert_to_png(dest)
+# save_clipboard_image() promises a PNG output path. Wayland can offer
+# JPEG/GIF/WebP/BMP payloads, so normalize every non-PNG result before
+# returning success.
+if mime != "image/png":
+if not _convert_to_png(dest) or not _is_png_file(dest):
+dest.unlink(missing_ok=True)
+return False
return True
@@ -433,6 +437,15 @@ def _convert_to_png(path: Path) -> bool:
return path.exists() and path.stat().st_size > 0
def _is_png_file(path: Path) -> bool:
"""Return True when *path* starts with the PNG file signature."""
try:
with path.open("rb") as f:
return f.read(len(_PNG_SIGNATURE)) == _PNG_SIGNATURE
except OSError:
return False
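Review note: the signature check matters because a file extension says nothing about the bytes on disk. The sketch below is a standalone illustration with hypothetical names (`looks_like_png`, `demo`), assuming only the standard 8-byte PNG signature; it shows a JPEG payload saved under a `.png` name being rejected.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(path: Path) -> bool:
    """Return True when the file starts with the 8-byte PNG signature."""
    try:
        with path.open("rb") as f:
            return f.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE
    except OSError:
        return False


def demo() -> tuple[bool, bool, bool]:
    """Check a real PNG header, a JPEG header named .png, and a missing file."""
    with tempfile.TemporaryDirectory() as d:
        png = Path(d) / "real.png"
        png.write_bytes(PNG_SIGNATURE + b"rest-of-file")
        fake = Path(d) / "fake.png"
        fake.write_bytes(b"\xff\xd8\xff\xe0JFIF")  # JPEG magic bytes
        missing = Path(d) / "missing.png"
        return looks_like_png(png), looks_like_png(fake), looks_like_png(missing)
```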
# ── X11 (xclip) ─────────────────────────────────────────────────────────
def _xclip_has_image() -> bool:
@@ -0,0 +1,614 @@
"""Migrate Hermes' MCP server config and Codex's installed curated plugins
to the format Codex expects in ~/.codex/config.toml.
When the user enables the codex_app_server runtime, the codex subprocess
runs its own MCP client and its own plugin runtime (Linear, Atlassian,
Asana, plus per-account ChatGPT apps via app/list). For both of those to
be useful, the user's choices need to be visible to codex too. This
module:
1. Reads Hermes' YAML and writes equivalent [mcp_servers.<name>]
entries to ~/.codex/config.toml.
2. Queries codex's `plugin/list` for the openai-curated marketplace
and writes [plugins."<name>@<marketplace>"] entries for any plugin
the user has installed=true on their codex CLI. (This is what
OpenClaw calls "migrate native codex plugins", the YouTube-video-worthy
bit Pash highlighted: Canva, GitHub, Calendar, Gmail pre-configured.)
3. Writes a top-level `default_permissions` profile so users on this
runtime don't get an approval prompt on every write attempt.
What translates (MCP servers):
Hermes mcp_servers.<n>.command/args/env → codex stdio transport
Hermes mcp_servers.<n>.url/headers → codex streamable_http transport
Hermes mcp_servers.<n>.timeout → codex tool_timeout_sec
Hermes mcp_servers.<n>.connect_timeout → codex startup_timeout_sec
What does NOT translate (warned + skipped):
Hermes-specific keys (sampling, etc.): codex's MCP client has no
equivalent. Listed per server in the report's skipped_keys_per_server field.
What's NOT migrated (intentional):
AGENTS.md: codex respects this file natively in its cwd. Hermes' own
AGENTS.md (project-level) is already in the worktree, so codex picks
it up without translation. No code needed.
"""
from __future__ import annotations
import logging
import os
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Optional
logger = logging.getLogger(__name__)
# Marker comments wrapping the managed section so re-runs can detect
# what's ours and what's user-edited. Both must appear or strip is a no-op.
MIGRATION_MARKER = (
"# managed by hermes-agent — `hermes codex-runtime migrate` regenerates this section"
)
MIGRATION_END_MARKER = (
"# end hermes-agent managed section"
)
@dataclass
class MigrationReport:
"""Outcome of a migration pass."""
target_path: Optional[Path] = None
migrated: list[str] = field(default_factory=list)
skipped_keys_per_server: dict[str, list[str]] = field(default_factory=dict)
migrated_plugins: list[str] = field(default_factory=list)
plugin_query_error: Optional[str] = None
wrote_permissions_default: Optional[str] = None
errors: list[str] = field(default_factory=list)
written: bool = False
dry_run: bool = False
def summary(self) -> str:
lines = []
if self.dry_run:
lines.append(f"(dry run) Would write {self.target_path}")
elif self.written:
lines.append(f"Wrote {self.target_path}")
if self.migrated:
lines.append(f"Migrated {len(self.migrated)} MCP server(s):")
for name in self.migrated:
skipped = self.skipped_keys_per_server.get(name, [])
note = (
f" (skipped: {', '.join(skipped)})" if skipped else ""
)
lines.append(f" - {name}{note}")
else:
lines.append("No MCP servers found in Hermes config.")
if self.migrated_plugins:
lines.append(
f"Migrated {len(self.migrated_plugins)} native Codex plugin(s):"
)
for name in self.migrated_plugins:
lines.append(f" - {name}")
elif self.plugin_query_error:
lines.append(f"Codex plugin discovery skipped: {self.plugin_query_error}")
if self.wrote_permissions_default:
lines.append(
f"Wrote default_permissions = "
f"{self.wrote_permissions_default!r}"
)
for err in self.errors:
lines.append(f"{err}")
return "\n".join(lines)
# Hermes keys the migration recognizes (transport, timeouts, general).
# Any key outside this set and outside _KEYS_DROPPED_WITH_WARNING is
# reported in skipped as an unknown Hermes key.
_KNOWN_HERMES_KEYS = {
# transport — stdio
"command", "args", "env", "cwd",
# transport — http
"url", "headers", "transport",
# timeouts
"timeout", "connect_timeout",
# general
"enabled", "description",
}
# Hermes keys with no direct codex equivalent; each is reported in skipped.
_KEYS_DROPPED_WITH_WARNING = {
# Hermes' sampling subsection — codex MCP has no equivalent
"sampling",
}
def _translate_one_server(
name: str, hermes_cfg: dict
) -> tuple[Optional[dict], list[str]]:
"""Translate one Hermes MCP server config to the codex inline-table dict
representation. Returns (codex_entry, skipped_keys).
codex_entry is a dict ready for TOML serialization, or None when the
server can't be translated (e.g. neither command nor url present)."""
if not isinstance(hermes_cfg, dict):
return None, []
skipped: list[str] = []
out: dict[str, Any] = {}
has_command = bool(hermes_cfg.get("command"))
has_url = bool(hermes_cfg.get("url"))
if has_command and has_url:
skipped.append("url (both command and url set; preferring stdio)")
has_url = False
if has_command:
# Stdio transport
out["command"] = str(hermes_cfg["command"])
args = hermes_cfg.get("args") or []
if args:
out["args"] = [str(a) for a in args]
env = hermes_cfg.get("env") or {}
if env:
# Codex expects string values
out["env"] = {str(k): str(v) for k, v in env.items()}
cwd = hermes_cfg.get("cwd")
if cwd:
out["cwd"] = str(cwd)
elif has_url:
# streamable_http transport (codex covers both http and SSE here)
out["url"] = str(hermes_cfg["url"])
headers = hermes_cfg.get("headers") or {}
if headers:
out["http_headers"] = {str(k): str(v) for k, v in headers.items()}
# Hermes' transport: sse hint is informational; codex auto-negotiates
if hermes_cfg.get("transport") == "sse":
skipped.append("transport=sse (codex auto-negotiates)")
else:
return None, ["no command or url field"]
# Timeouts
if "timeout" in hermes_cfg:
try:
out["tool_timeout_sec"] = float(hermes_cfg["timeout"])
except (TypeError, ValueError):
skipped.append("timeout (not numeric)")
if "connect_timeout" in hermes_cfg:
try:
out["startup_timeout_sec"] = float(hermes_cfg["connect_timeout"])
except (TypeError, ValueError):
skipped.append("connect_timeout (not numeric)")
# Enabled flag (codex defaults to true so we only emit when explicitly false)
if hermes_cfg.get("enabled") is False:
out["enabled"] = False
# Detect keys we explicitly drop with warning
for key in hermes_cfg:
if key in _KEYS_DROPPED_WITH_WARNING:
skipped.append(f"{key} (no codex equivalent)")
elif key not in _KNOWN_HERMES_KEYS:
skipped.append(f"{key} (unknown Hermes key)")
return out, skipped
def _format_toml_value(value: Any) -> str:
"""Minimal TOML value formatter for the value types we emit.
We only emit strings, numbers, booleans, and lists or inline tables of
those; no nested arrays of tables. This covers everything codex's MCP
schema accepts."""
if isinstance(value, bool):
return "true" if value else "false"
if isinstance(value, (int, float)):
return repr(value)
if isinstance(value, str):
# Escape per TOML basic-string rules. Order matters: backslash
# first so the other escapes don't get re-escaped.
# Control characters (newline, tab, etc.) must use \-escapes
# because TOML basic strings don't allow literal control chars
# — passing them through would produce invalid TOML that codex
# would refuse to load. Paths usually don't contain control
# chars but env-var passthrough (HERMES_HOME, PYTHONPATH) could
# in pathological cases.
escaped = (
value
.replace("\\", "\\\\")
.replace('"', '\\"')
.replace("\b", "\\b")
.replace("\t", "\\t")
.replace("\n", "\\n")
.replace("\f", "\\f")
.replace("\r", "\\r")
)
return f'"{escaped}"'
if isinstance(value, list):
items = ", ".join(_format_toml_value(v) for v in value)
return f"[{items}]"
if isinstance(value, dict):
items = ", ".join(
f'{_quote_key(k)} = {_format_toml_value(v)}' for k, v in value.items()
)
return "{ " + items + " }" if items else "{}"
raise ValueError(f"Unsupported TOML value type: {type(value).__name__}")
def _quote_key(key: str) -> str:
"""Return key bare-or-quoted depending on whether it's a valid bare key."""
if all(c.isalnum() or c in "-_" for c in key) and key:
return key
escaped = key.replace("\\", "\\\\").replace('"', '\\"')
return f'"{escaped}"'
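Review note: `_format_toml_value` escapes `\b`, `\t`, `\n`, `\f`, and `\r`, but TOML basic strings also disallow the remaining control characters (U+0000 to U+001F and U+007F) unless they are written as `\uXXXX`. The sketch below is one possible way to cover them; the function name `toml_basic_string` is hypothetical and this is not the PR's code.

```python
# Short escapes defined by TOML for basic strings.
_SHORT_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "\b": "\\b",
    "\t": "\\t",
    "\n": "\\n",
    "\f": "\\f",
    "\r": "\\r",
}


def toml_basic_string(value: str) -> str:
    """Render `value` as a TOML basic string, escaping every control char.

    Hypothetical helper: walks the string once, so a backslash is never
    re-escaped by a later substitution.
    """
    parts = []
    for ch in value:
        if ch in _SHORT_ESCAPES:
            parts.append(_SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            # Remaining control characters use the \uXXXX form.
            parts.append(f"\\u{ord(ch):04X}")
        else:
            parts.append(ch)
    return '"' + "".join(parts) + '"'
```

A single pass over the characters avoids the ordering concern that the chained `.replace()` calls have to handle by doing the backslash first.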
def render_codex_toml_section(
servers: dict[str, dict],
plugins: Optional[list[dict]] = None,
default_permission_profile: Optional[str] = None,
) -> str:
"""Render the managed [mcp_servers.<n>] / [plugins.<id>] / [permissions]
block for ~/.codex/config.toml.
Args:
servers: dict of MCP server name → translated codex inline-table
plugins: optional list of {name, marketplace, enabled} for native
Codex plugins to enable. (E.g. the Linear / Atlassian / Asana
curated plugins, or per-account ChatGPT apps.)
default_permission_profile: when set, write a top-level
`default_permissions` key (a leading ":" is added if missing)
so the user doesn't get an approval prompt on every write
attempt. Common values: "workspace-write", "read-only",
"full-access".
"""
out = [MIGRATION_MARKER]
if not servers and not plugins and not default_permission_profile:
out.append("# (no MCP servers, plugins, or permissions configured by Hermes)")
out.append(MIGRATION_END_MARKER)
return "\n".join(out) + "\n"
if default_permission_profile:
# Codex's config schema: `default_permissions` is a top-level
# string referencing a profile name. Built-in profile names start
# with ":" (":workspace-write", ":read-only", ":full-access"). The
# [permissions] table is for *user-defined* named profiles with
# structured fields — not what we want.
normalized = (
default_permission_profile
if default_permission_profile.startswith(":")
else f":{default_permission_profile}"
)
out.append("")
out.append(f"default_permissions = {_format_toml_value(normalized)}")
if servers:
for name in sorted(servers.keys()):
cfg = servers[name]
out.append("")
out.append(f"[mcp_servers.{_quote_key(name)}]")
for k, v in cfg.items():
out.append(f"{_quote_key(k)} = {_format_toml_value(v)}")
if plugins:
for plugin in sorted(plugins, key=lambda p: f"{p.get('name','')}@{p.get('marketplace','')}"):
name = plugin.get("name") or ""
marketplace = plugin.get("marketplace") or "openai-curated"
enabled = bool(plugin.get("enabled", True))
qualified = f"{name}@{marketplace}"
out.append("")
out.append(f'[plugins.{_quote_key(qualified)}]')
out.append(f"enabled = {_format_toml_value(enabled)}")
out.append("")
out.append(MIGRATION_END_MARKER)
return "\n".join(out) + "\n"
def _strip_existing_managed_block(toml_text: str) -> str:
"""Remove any prior managed section so re-runs idempotently replace it.
The managed section is everything between MIGRATION_MARKER (start) and
MIGRATION_END_MARKER (end), inclusive of both markers. User-edited
sections above or below are preserved verbatim.
Backward compatibility: if the start marker is found but no end marker
follows, we fall back to the heuristic that swallows lines until we
hit a section that's not [mcp_servers.*]/[plugins.*]/[permissions]/
a `default_permissions =` key. This matches what older versions of
this code wrote so re-runs don't break configs from prior Hermes
versions."""
lines = toml_text.splitlines(keepends=True)
out: list[str] = []
in_managed = False
saw_end_marker = False
for line in lines:
line_stripped_nl = line.rstrip("\n")
if line_stripped_nl == MIGRATION_MARKER:
in_managed = True
saw_end_marker = False
continue
if in_managed:
if line_stripped_nl == MIGRATION_END_MARKER:
in_managed = False
saw_end_marker = True
continue
stripped = line.lstrip()
if not saw_end_marker and stripped.startswith("[") and not (
stripped.startswith("[mcp_servers")
or stripped.startswith("[plugins")
or stripped.startswith("[permissions]")
or stripped.startswith("[permissions.")
):
# Old-format managed block without end marker: bail back
# to user content as soon as we see a non-managed section.
in_managed = False
out.append(line)
continue
# Otherwise swallow the line.
continue
out.append(line)
return "".join(out)
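Review note: the strip-then-append flow is what makes re-runs idempotent. The sketch below isolates that idea under simplifying assumptions: both markers are always present, and the marker strings and function names (`START`, `END`, `strip_managed`, `replace_managed`) are placeholders rather than the PR's identifiers. It omits the backward-compatibility heuristic for blocks that lack an end marker.

```python
# Placeholder markers for illustration only.
START = "# managed: begin"
END = "# managed: end"


def strip_managed(text: str) -> str:
    """Drop every line from START through END, inclusive."""
    out = []
    inside = False
    for line in text.splitlines(keepends=True):
        bare = line.rstrip("\n")
        if bare == START:
            inside = True
            continue
        if inside:
            if bare == END:
                inside = False
            continue  # swallow managed lines and the end marker
        out.append(line)
    return "".join(out)


def replace_managed(text: str, body: str) -> str:
    """Return `text` with exactly one managed block holding `body`."""
    user = strip_managed(text).rstrip("\n")
    block = f"{START}\n{body}\n{END}\n"
    # One blank line separates user content from the managed block.
    return f"{user}\n\n{block}" if user.strip() else block
```

Applying `replace_managed` twice with the same body yields the same text, which is the property the real function relies on when `hermes codex-runtime migrate` is run repeatedly.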
def _query_codex_plugins(
codex_home: Optional[Path] = None,
timeout: float = 8.0,
) -> tuple[list[dict], Optional[str]]:
"""Query codex's `plugin/list` for installed curated plugins.
Spawns `codex app-server` briefly, sends initialize + plugin/list,
extracts plugins where installed=true. Returns (plugins, error).
Plugins is a list of {name, marketplace, enabled} dicts ready for
render_codex_toml_section().
On any failure (codex not installed, RPC error, timeout) returns
([], error_message). Migration treats this as non-fatal; MCP
servers and permissions still write through.
"""
try:
from agent.transports.codex_app_server import CodexAppServerClient
except Exception as exc:
return [], f"transport unavailable: {exc}"
try:
with CodexAppServerClient(
codex_home=str(codex_home) if codex_home else None
) as client:
client.initialize(client_name="hermes-migration")
resp = client.request("plugin/list", {}, timeout=timeout)
except Exception as exc:
return [], f"plugin/list query failed: {exc}"
out: list[dict] = []
seen: set[tuple[str, str]] = set()
marketplaces = resp.get("marketplaces") or []
if not isinstance(marketplaces, list):
return [], "plugin/list response missing 'marketplaces'"
for marketplace in marketplaces:
if not isinstance(marketplace, dict):
continue
market_name = str(marketplace.get("name") or "openai-curated")
plugins = marketplace.get("plugins") or []
if not isinstance(plugins, list):
continue
for plugin in plugins:
if not isinstance(plugin, dict):
continue
installed = bool(plugin.get("installed", False))
if not installed:
continue
# Skip plugins codex itself reports as unavailable (broken
# install, missing OAuth, removed from marketplace, etc.).
# Cf. openclaw/openclaw#80815 — OpenClaw learned to gate
# migration on app readiness to avoid writing config that
# would fail at activation time. Our migration writes to
# codex's config.toml directly, so a broken plugin would
# surface as a codex error on first use. Skipping it here
# keeps the migrated config clean and the user's first
# codex turn from failing.
availability = str(plugin.get("availability") or "").upper()
if availability and availability != "AVAILABLE":
logger.debug(
"skipping plugin %s: availability=%s",
plugin.get("name"), availability,
)
continue
name = str(plugin.get("name") or "")
if not name:
continue
key = (name, market_name)
if key in seen:
continue
seen.add(key)
# Carry forward whatever 'enabled' codex reports — defaults to
# true for installed plugins. This is the same shape OpenClaw
# writes when migrating native codex plugins.
out.append({
"name": name,
"marketplace": market_name,
"enabled": bool(plugin.get("enabled", True)),
})
return out, None
def _build_hermes_tools_mcp_entry() -> dict:
"""Build the codex stdio-transport entry that launches Hermes' own
tool surface as an MCP server. Codex's subprocess will call back into
this for browser/web/delegate_task/vision/memory/skills tools.
The command runs the worktree's Python via the current sys.executable
so a hermes installed under /opt/, /usr/local/, or a venv all work.
HERMES_HOME and PYTHONPATH are passed through so the spawned process
sees the same config + module layout the user is running."""
import sys
env: dict[str, str] = {}
# HERMES_HOME passes through if set so the MCP subprocess sees the
# same config / auth / sessions DB as the parent CLI.
hermes_home = os.environ.get("HERMES_HOME")
if hermes_home:
env["HERMES_HOME"] = hermes_home
# PYTHONPATH passes through so a worktree-launched hermes finds the
# branch's modules instead of the installed package.
pythonpath = os.environ.get("PYTHONPATH")
if pythonpath:
env["PYTHONPATH"] = pythonpath
# Quiet mode + redaction defaults so the MCP wire stays clean.
env["HERMES_QUIET"] = "1"
# Honor a parent-process override; the local env dict never has this key yet.
env["HERMES_REDACT_SECRETS"] = os.environ.get("HERMES_REDACT_SECRETS", "true")
out: dict[str, Any] = {
"command": sys.executable,
"args": ["-m", "agent.transports.hermes_tools_mcp_server"],
}
if env:
out["env"] = env
# Generous timeouts — browser_navigate or delegate_task can take a
# while; we don't want codex's MCP client to give up too early.
out["startup_timeout_sec"] = 30.0
out["tool_timeout_sec"] = 600.0
return out
def migrate(
hermes_config: dict,
*,
codex_home: Optional[Path] = None,
dry_run: bool = False,
discover_plugins: bool = True,
default_permission_profile: Optional[str] = ":workspace",
expose_hermes_tools: bool = True,
) -> MigrationReport:
"""Translate Hermes mcp_servers config + Codex curated plugins into
~/.codex/config.toml.
Args:
hermes_config: full ~/.hermes/config.yaml dict
codex_home: override CODEX_HOME (defaults to ~/.codex)
dry_run: skip the actual write; report what would happen
discover_plugins: when True (default), query `plugin/list` against
the live codex CLI to migrate any installed curated plugins
into [plugins."<name>@<marketplace>"] entries. Set False to
skip the subprocess spawn (for tests or restricted environments).
default_permission_profile: when set (default ":workspace"), write
top-level `default_permissions = "<name>"` so users on this
runtime don't get an approval prompt on every write attempt.
Built-in codex profile names are ":workspace", ":read-only",
":danger-no-sandbox" (note the leading ":"). Also accepts a
user-defined profile name (no leading ":") that the user has
configured in their own [permissions.<name>] table. Set None
to leave permissions unset and let codex use its compiled-in
default (which is read-only).
expose_hermes_tools: when True (default), register Hermes' own
tool surface (web_search, browser_*, delegate_task, vision,
memory, skills, etc.) as an MCP server in ~/.codex/config.toml
so the codex subprocess can call back into Hermes for tools
codex doesn't have built in. Set False to opt out.
"""
report = MigrationReport(dry_run=dry_run)
codex_home = codex_home or Path.home() / ".codex"
target = codex_home / "config.toml"
report.target_path = target
hermes_servers = (hermes_config or {}).get("mcp_servers") or {}
if not isinstance(hermes_servers, dict):
report.errors.append(
"mcp_servers in Hermes config is not a dict; cannot migrate."
)
return report
translated: dict[str, dict] = {}
for name, cfg in hermes_servers.items():
out, skipped = _translate_one_server(str(name), cfg or {})
if out is None:
report.errors.append(
f"server {name!r} skipped: {', '.join(skipped) or 'no transport configured'}"
)
continue
translated[str(name)] = out
if skipped:
report.skipped_keys_per_server[str(name)] = skipped
report.migrated.append(str(name))
# Discover installed Codex curated plugins. Best-effort — never blocks
# the migration if codex is unreachable or the RPC fails.
plugins: list[dict] = []
if discover_plugins and not dry_run:
plugins, plugin_err = _query_codex_plugins(codex_home=codex_home)
if plugin_err:
report.plugin_query_error = plugin_err
for p in plugins:
report.migrated_plugins.append(f"{p['name']}@{p['marketplace']}")
# Track whether we wrote a default permission profile so the report
# surfaces it to the user.
if default_permission_profile:
report.wrote_permissions_default = default_permission_profile
# Inject Hermes' own tool surface as an MCP server so the spawned
# codex subprocess can call back into Hermes for the tools codex
# doesn't ship with — web_search, browser_*, delegate_task, vision,
# memory, skills, session_search, image_generate, text_to_speech.
# The server itself is agent/transports/hermes_tools_mcp_server.py
# and is launched on demand by codex (stdio MCP).
if expose_hermes_tools:
translated["hermes-tools"] = _build_hermes_tools_mcp_entry()
if "hermes-tools" not in report.migrated:
report.migrated.append("hermes-tools")
# Build the new managed block
managed_block = render_codex_toml_section(
translated, plugins=plugins,
default_permission_profile=default_permission_profile,
)
# Read existing codex config if any, strip the prior managed block,
# append the new one.
if target.exists():
try:
existing = target.read_text(encoding="utf-8")
except Exception as exc:
report.errors.append(f"could not read {target}: {exc}")
return report
without_managed = _strip_existing_managed_block(existing)
# Ensure exactly one blank line between user content and managed block
if without_managed and not without_managed.endswith("\n"):
without_managed += "\n"
new_text = (
without_managed.rstrip("\n") + "\n\n" + managed_block
if without_managed.strip()
else managed_block
)
else:
new_text = managed_block
if dry_run:
return report
try:
codex_home.mkdir(parents=True, exist_ok=True)
# Atomic write: write to a temp file in the same directory then
# rename. A same-directory rename is atomic on POSIX, and
# Path.replace() also overwrites an existing target on Windows.
# Avoids leaving a half-written config.toml that codex would
# refuse to load if we crash mid-write.
import tempfile
tmp_fd, tmp_path_str = tempfile.mkstemp(
prefix=".config.toml.", dir=str(codex_home)
)
tmp_path = Path(tmp_path_str)
try:
with os.fdopen(tmp_fd, "w", encoding="utf-8") as fh:
fh.write(new_text)
tmp_path.replace(target)
except Exception:
# Clean up the temp file if the rename didn't happen.
try:
if tmp_path.exists():
tmp_path.unlink()
except Exception:
pass
raise
report.written = True
except Exception as exc:
report.errors.append(f"could not write {target}: {exc}")
return report
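Review note: the write path above follows the usual temp-file-then-rename pattern. A standalone sketch of that pattern is below; `atomic_write_text` is a hypothetical helper name, and the added `fsync` call is an optional extra that the PR code does not perform.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, text: str) -> None:
    """Write `text` to `target` so readers never see a partial file."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        prefix=f".{target.name}.", dir=str(target.parent)
    )
    tmp = Path(tmp_name)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())  # optional: push bytes to disk first
        os.replace(tmp, target)
    except BaseException:
        # Remove the temp file if the rename did not happen.
        tmp.unlink(missing_ok=True)
        raise
```

Because the temp file is created next to the target, a crash between the write and the rename leaves the previous `config.toml` untouched.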
+266
@@ -0,0 +1,266 @@
"""Shared logic for the /codex-runtime slash command.
Toggles `model.openai_runtime` between "auto" (= chat_completions, Hermes'
default) and "codex_app_server" (= hand turns to a codex subprocess).
Both CLI (cli.py) and gateway (gateway/run.py) call into this module so the
behavior stays identical across surfaces.
The actual runtime resolution happens in hermes_cli.runtime_provider's
_maybe_apply_codex_app_server_runtime() helper, which reads the persisted
config value. This module just persists the value and reports the change.
"""
from __future__ import annotations
import logging
from dataclasses import dataclass
from typing import Optional
logger = logging.getLogger(__name__)
VALID_RUNTIMES = ("auto", "codex_app_server")
@dataclass
class CodexRuntimeStatus:
"""Result of a /codex-runtime invocation. Callers render this however
suits their surface (CLI uses Rich panels, gateway sends a text message)."""
success: bool
new_value: Optional[str] = None
old_value: Optional[str] = None
message: str = ""
requires_new_session: bool = False
codex_binary_ok: bool = True
codex_version: Optional[str] = None
def parse_args(arg_string: str) -> tuple[Optional[str], list[str]]:
"""Parse the slash-command argument string. Returns (value, errors).
No args → return current state (value=None)
'auto' / 'codex_app_server' / 'on' / 'off' → return that value
anything else → error
"""
raw = (arg_string or "").strip().lower()
if not raw:
return None, []
# Accept human-friendly synonyms
if raw in ("on", "codex", "enable"):
return "codex_app_server", []
if raw in ("off", "default", "disable", "hermes"):
return "auto", []
if raw in VALID_RUNTIMES:
return raw, []
return None, [
f"Unknown runtime {raw!r}. Use one of: auto, codex_app_server, on, off"
]
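Review note: the synonym handling in `parse_args` could also be expressed as a lookup table, which keeps the accepted words in one place. The sketch below is an alternative shape with hypothetical names (`ALIASES`, `parse_runtime_arg`), not a proposed change to the PR; it is intended to return the same results for the inputs the function above accepts.

```python
from __future__ import annotations

from typing import Optional

VALID = ("auto", "codex_app_server")

# Human-friendly synonyms mapped to canonical runtime values.
ALIASES = {
    "on": "codex_app_server",
    "codex": "codex_app_server",
    "enable": "codex_app_server",
    "off": "auto",
    "default": "auto",
    "disable": "auto",
    "hermes": "auto",
}


def parse_runtime_arg(arg_string: Optional[str]) -> tuple[Optional[str], list[str]]:
    """Return (value, errors); value is None for 'show state' or on error."""
    raw = (arg_string or "").strip().lower()
    if not raw:
        return None, []
    value = ALIASES.get(raw, raw)
    if value in VALID:
        return value, []
    return None, [
        f"Unknown runtime {raw!r}. Use one of: auto, codex_app_server, on, off"
    ]
```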
def get_current_runtime(config: dict) -> str:
"""Read the current `model.openai_runtime` value from a config dict.
Returns 'auto' for unset / empty / unrecognized values."""
if not isinstance(config, dict):
return "auto"
model_cfg = config.get("model") or {}
if not isinstance(model_cfg, dict):
return "auto"
value = str(model_cfg.get("openai_runtime") or "").strip().lower()
if value in VALID_RUNTIMES:
return value
return "auto"
def set_runtime(config: dict, new_value: str) -> str:
"""Mutate the config dict in place to persist the new runtime value.
Returns the previous value for callers that want to report a delta."""
if new_value not in VALID_RUNTIMES:
raise ValueError(
f"invalid runtime {new_value!r}; must be one of {VALID_RUNTIMES}"
)
old = get_current_runtime(config)
if not isinstance(config.get("model"), dict):
config["model"] = {}
config["model"]["openai_runtime"] = new_value
return old
def check_codex_binary_ok() -> tuple[bool, Optional[str]]:
"""Best-effort verification that codex CLI is installed at acceptable
version. Returns (ok, version_or_message)."""
try:
from agent.transports.codex_app_server import check_codex_binary
return check_codex_binary()
except Exception as exc: # pragma: no cover
return False, f"codex check failed: {exc}"
def apply(
config: dict,
new_value: Optional[str],
*,
persist_callback=None,
) -> CodexRuntimeStatus:
"""Top-level entry point used by both CLI and gateway handlers.
Args:
config: in-memory config dict (will be mutated when new_value is set)
new_value: desired runtime; None means "show current state only"
persist_callback: optional callable taking the mutated config dict
and persisting it to disk. Skipped when None (used by tests).
Returns: CodexRuntimeStatus describing the outcome.
"""
current = get_current_runtime(config)
# Cache the codex binary check for this apply() call. Subprocess spawn
# is cheap (~50ms for `codex --version`), but we'd otherwise call it up
# to 3 times in the enable path (read-only/state, gate, success message).
# None = not yet checked; (bool, str) = result.
_binary_check: Optional[tuple[bool, Optional[str]]] = None
def _check_binary_cached() -> tuple[bool, Optional[str]]:
nonlocal _binary_check
if _binary_check is None:
_binary_check = check_codex_binary_ok()
return _binary_check
# Read-only call: just report state
if new_value is None:
ok, ver = _check_binary_cached()
msg = (
f"openai_runtime: {current}\n"
f"codex CLI: {'OK ' + ver if ok else 'not available — ' + (ver or 'install with `npm i -g @openai/codex`')}"
)
return CodexRuntimeStatus(
success=True,
new_value=current,
old_value=current,
message=msg,
codex_binary_ok=ok,
codex_version=ver if ok else None,
)
# No change requested
if new_value == current:
return CodexRuntimeStatus(
success=True,
new_value=current,
old_value=current,
message=f"openai_runtime already set to {current}",
)
# If switching ON, verify codex CLI is installed before persisting —
# an opt-in toggle that silently fails on the first turn is the
# worst possible UX. Block here with a clear install hint.
if new_value == "codex_app_server":
ok, ver_or_msg = _check_binary_cached()
if not ok:
return CodexRuntimeStatus(
success=False,
new_value=None,
old_value=current,
message=(
"Cannot enable codex_app_server runtime: "
f"{ver_or_msg or 'codex CLI not available'}\n"
"Install with: npm i -g @openai/codex"
),
codex_binary_ok=False,
codex_version=None,
)
set_runtime(config, new_value)
if persist_callback is not None:
try:
persist_callback(config)
except Exception as exc:
logger.exception("failed to persist openai_runtime change")
return CodexRuntimeStatus(
success=False,
new_value=new_value,
old_value=current,
message=f"updated config in memory but persist failed: {exc}",
)
msg_lines = [
f"openai_runtime: {current} → {new_value}",
]
if new_value == "codex_app_server":
ok, ver = _check_binary_cached()
if ok:
msg_lines.append(f"codex CLI: {ver}")
# Auto-migrate Hermes' MCP servers + Codex's installed curated
# plugins into ~/.codex/config.toml so the spawned codex subprocess
# sees the same tool surface AND can call back into Hermes for
# browser/web/delegate_task/vision/memory tools (#7 fix).
# Failures are non-fatal — the runtime change still proceeds.
try:
from hermes_cli.codex_runtime_plugin_migration import migrate
mig_report = migrate(config)
# Tools/MCP servers (excluding the hermes-tools callback,
# which is internal plumbing — surface separately).
user_servers = [
s for s in mig_report.migrated if s != "hermes-tools"
]
if user_servers:
msg_lines.append(
f"Migrated {len(user_servers)} MCP server(s): "
f"{', '.join(user_servers)}"
)
# Native Codex plugin migration (Linear, GitHub, etc.)
if mig_report.migrated_plugins:
msg_lines.append(
f"Migrated {len(mig_report.migrated_plugins)} native "
f"Codex plugin(s): {', '.join(mig_report.migrated_plugins)}"
)
elif mig_report.plugin_query_error:
msg_lines.append(
f"Codex plugin discovery skipped: "
f"{mig_report.plugin_query_error}"
)
# Permissions + Hermes tool callback are always-on production
# bits the user benefits from knowing about.
if mig_report.wrote_permissions_default:
msg_lines.append(
f"Default sandbox: {mig_report.wrote_permissions_default} "
f"(no approval prompt on every write)"
)
if "hermes-tools" in mig_report.migrated:
msg_lines.append(
"Hermes tool callback registered: codex can now use "
"web_search, web_extract, browser_*, vision_analyze, "
"image_generate, skill_view, skills_list, text_to_speech, "
"kanban_* (worker + orchestrator) via MCP."
)
msg_lines.append(
" (delegate_task, memory, session_search, todo run "
"only on the default Hermes runtime — they need the "
"agent loop context.)"
)
msg_lines.append(f" (config: {mig_report.target_path})")
for err in mig_report.errors:
msg_lines.append(f"⚠ MCP migration: {err}")
except Exception as exc:
msg_lines.append(f"⚠ MCP migration skipped: {exc}")
msg_lines.append(
"OpenAI/Codex turns now run through `codex app-server` "
"(terminal/file ops/patching inside Codex; "
"Hermes tools available via MCP callback)."
)
msg_lines.append(
"Effective on next session — current cached agent keeps "
"the prior runtime to preserve prompt cache."
)
else:
msg_lines.append("OpenAI/Codex turns will use the default Hermes runtime.")
msg_lines.append("Effective on next session.")
return CodexRuntimeStatus(
success=True,
new_value=new_value,
old_value=current,
message="\n".join(msg_lines),
requires_new_session=True,
)
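The `_check_binary_cached` closure above memoises one expensive probe for the lifetime of a single call, using `None` as the "not yet checked" sentinel. A minimal standalone sketch of that pattern follows. The names `make_cached_check` and `fake_probe` and the placeholder version string are illustrative assumptions, not part of the Hermes codebase.

```python
from typing import Callable, Optional, Tuple

CheckResult = Tuple[bool, Optional[str]]


def make_cached_check(check: Callable[[], CheckResult]) -> Callable[[], CheckResult]:
    """Wrap ``check`` so the underlying probe runs at most once."""
    cached: Optional[CheckResult] = None  # None = not yet checked

    def cached_check() -> CheckResult:
        nonlocal cached
        if cached is None:
            cached = check()
        return cached

    return cached_check


# Hypothetical stand-in for the real binary probe; it counts its invocations.
calls = {"n": 0}


def fake_probe() -> CheckResult:
    calls["n"] += 1
    return True, "placeholder-version"


check_once = make_cached_check(fake_probe)
first = check_once()   # runs the probe
second = check_once()  # served from the closure variable
```

Because the sentinel is `None` and a result is always a tuple, a failed check such as `(False, "not found")` is cached as well, so the read-only path, the gate, and the success message all see the same answer.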
+4
@@ -104,6 +104,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
args_hint="<prompt>"),
CommandDef("goal", "Set a standing goal Hermes works on across turns until achieved", "Session",
args_hint="[text | pause | resume | clear | status]"),
CommandDef("subgoal", "Add or manage extra criteria on the active goal", "Session",
args_hint="[text | remove N | clear]"),
CommandDef("status", "Show session info", "Session"),
CommandDef("whoami", "Show your slash command access (admin / user)", "Info"),
CommandDef("profile", "Show active profile name and home directory", "Info"),
@@ -120,6 +122,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
cli_only=True),
CommandDef("model", "Switch model for this session", "Configuration",
aliases=("provider",), args_hint="[model] [--provider name] [--global]"),
CommandDef("codex-runtime", "Toggle codex app-server runtime for OpenAI/Codex models",
"Configuration", args_hint="[auto|codex_app_server]"),
CommandDef("gquota", "Show Google Gemini Code Assist quota usage", "Info",
cli_only=True),
+1 -1
@@ -238,7 +238,7 @@ _hermes() {{
esac
}}
_hermes "$@"
compdef _hermes hermes
"""
+78 -12
@@ -731,19 +731,18 @@ DEFAULT_CONFIG = {
"target_ratio": 0.20, # fraction of threshold to preserve as recent tail
"protect_last_n": 20, # minimum recent messages to keep uncompressed
"hygiene_hard_message_limit": 400, # gateway session-hygiene force-compress threshold by message count
"protect_first_n": 3, # non-system head messages always preserved
# verbatim, in ADDITION to the system prompt
# (which is always implicitly protected). Set to
# 0 for long-running rolling-compaction sessions
# where you want nothing pinned except the
# system prompt + rolling summary + recent tail.
},
# Anthropic prompt caching (Claude via OpenRouter or native Anthropic API).
# cache_ttl must be "5m" or "1h" (Anthropic-supported tiers); other values are ignored.
# long_lived_prefix: when true (default), Claude on Anthropic / OpenRouter / Nous
# Portal uses a split layout: tools[-1] + stable system prefix at long_lived_ttl
# (cross-session cache), last 2 messages at cache_ttl (within-session rolling).
# Set false to keep the legacy "system + last 3 messages" single-tier layout.
# long_lived_ttl: TTL for the cross-session prefix tier ("5m" or "1h"; default "1h").
"prompt_caching": {
"cache_ttl": "5m",
"long_lived_prefix": True,
"long_lived_ttl": "1h",
},
# OpenRouter-specific settings.
@@ -978,6 +977,21 @@ DEFAULT_CONFIG = {
# Web dashboard settings
"dashboard": {
"theme": "default", # Dashboard visual theme: "default", "midnight", "ember", "mono", "cyberpunk", "rose"
# Hide the token/cost analytics surfaces (Analytics page, token bars and
# cost figures on the Models page) by default. The numbers shown there
# are a local debug estimate: they only count successful main-agent
# responses with a usable ``response.usage``, and silently exclude every
# auxiliary call (context compression, title generation, vision,
# session search, web extract, smart approval, MCP routing, plugin LLM
# access) plus provider-side retries, fallback attempts, and any call
# whose usage block didn't come back. Cache writes are also missing
# from the API response. On models with heavy auxiliary traffic
# (Kimi K2.6, MiniMax M2.7) the local total can be 10x-100x lower than
# the provider bill, which is worse than hiding the numbers entirely
# because they look precise enough to compare against the provider.
# Set this to True to re-enable the surfaces with the understanding
# that the numbers are a local lower-bound estimate, not billing.
"show_token_analytics": False,
},
# Privacy settings
@@ -1236,6 +1250,7 @@ DEFAULT_CONFIG = {
"free_response_channels": "", # Comma-separated channel IDs where bot responds without mention
"allowed_channels": "", # If set, bot ONLY responds in these channel IDs (whitelist)
"auto_thread": True, # Auto-create threads on @mention in channels (like Slack)
"thread_require_mention": False, # If True, require @mention in threads too (multi-bot threads)
"reactions": True, # Add 👀/✅/❌ reactions to messages during processing
"channel_prompts": {}, # Per-channel ephemeral system prompts (forum parents apply to child threads)
# Opt-in DM role-based auth (#12136). By default, DISCORD_ALLOWED_ROLES
@@ -2114,10 +2129,10 @@ OPTIONAL_ENV_VARS = {
"category": "tool",
},
"FAL_KEY": {
"description": "FAL API key for image generation",
"description": "FAL API key for image and video generation",
"prompt": "FAL API key",
"url": "https://fal.ai/",
"tools": ["image_generate"],
"tools": ["image_generate", "video_generate"],
"password": True,
"category": "tool",
},
@@ -4326,10 +4341,34 @@ def load_env() -> Dict[str, str]:
concatenated KEY=VALUE pairs on a single line) are handled
gracefully instead of producing mangled values such as duplicated
bot tokens. See #8908.
The parsed dict is memoised keyed on the .env file mtime, because
``get_env_value()`` is called dozens-to-hundreds of times per
interactive menu render (`hermes tools`, `hermes setup`, status
panels). Sanitisation is O(lines × known-keys), so re-parsing the
same file on every call was burning ~300ms of CPU per `hermes tools`
menu paint on top of the OAuth-refresh slowness. The mtime check
invalidates the cache when the user edits .env mid-process.
"""
global _env_cache
env_path = get_env_path()
env_vars = {}
try:
mtime = env_path.stat().st_mtime
size = env_path.stat().st_size
cache_key = (str(env_path), mtime, size)
except FileNotFoundError:
cache_key = (str(env_path), None, None)
except Exception:
cache_key = None
if cache_key is not None and _env_cache is not None:
cached_key, cached_vars = _env_cache
if cached_key == cache_key:
return dict(cached_vars)
env_vars: Dict[str, str] = {}
if env_path.exists():
# On Windows, open() defaults to the system locale (cp1252) which can
# fail on UTF-8 .env files. Always use explicit UTF-8; tolerate BOM
@@ -4345,10 +4384,33 @@ def load_env() -> Dict[str, str]:
if line and not line.startswith('#') and '=' in line:
key, _, value = line.partition('=')
env_vars[key.strip()] = value.strip().strip('"\'')
if cache_key is not None:
_env_cache = (cache_key, dict(env_vars))
return env_vars
# Module-level memo for load_env(), keyed on (path, mtime, size).
# Editing .env bumps mtime → next load_env() rebuilds. invalidate_env_cache()
# is the explicit knob for writers that update .env via this module
# (set_env_value, save_env, etc.) without relying on filesystem mtime
# resolution.
_env_cache: Optional[Tuple[Tuple[str, Optional[float], Optional[int]], Dict[str, str]]] = None
def invalidate_env_cache() -> None:
"""Clear the load_env() process-level memo.
Writers that mutate .env (set_env_value, save_env, etc.) call this
to guarantee the next load_env() sees their change even on
filesystems with coarse mtime resolution. Reads invalidate naturally
via the mtime/size check.
"""
global _env_cache
_env_cache = None
def _sanitize_env_lines(lines: list) -> list:
"""Fix corrupted .env lines before reading or writing.
@@ -4451,6 +4513,7 @@ def sanitize_env_file() -> int:
pass
raise
_secure_file(env_path)
invalidate_env_cache()
return fixes
@@ -4562,6 +4625,7 @@ def save_env_value(key: str, value: str):
_secure_file(env_path)
os.environ[key] = value
invalidate_env_cache()
def remove_env_value(key: str) -> bool:
@@ -4617,6 +4681,7 @@ def remove_env_value(key: str) -> bool:
_secure_file(env_path)
os.environ.pop(key, None)
invalidate_env_cache()
return found
@@ -4803,6 +4868,7 @@ def show_config():
print(f" Threshold: {compression.get('threshold', 0.50) * 100:.0f}%")
print(f" Target ratio: {compression.get('target_ratio', 0.20) * 100:.0f}% of threshold preserved")
print(f" Protect last: {compression.get('protect_last_n', 20)} messages")
print(f" Protect first: {compression.get('protect_first_n', 3)} non-system head messages")
_aux_comp = config.get('auxiliary', {}).get('compression', {})
_sm = _aux_comp.get('model', '') or '(auto)'
print(f" Model: {_sm}")
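The `load_env()` memo in this file's diff is keyed on `(path, mtime, size)` and returns copies, so callers cannot mutate the cached dict. The sketch below illustrates the same idea in isolation. `load_env_cached`, `_parse`, and `invalidate` are hypothetical names, and the parser is a simplified assumption that omits the sanitisation step mentioned in the docstring.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

CacheKey = Tuple[str, Optional[float], Optional[int]]
_cache: Optional[Tuple[CacheKey, Dict[str, str]]] = None


def _parse(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    out: Dict[str, str] = {}
    if path.exists():
        # utf-8-sig tolerates a leading BOM.
        for raw in path.read_text(encoding="utf-8-sig").splitlines():
            line = raw.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                out[key.strip()] = value.strip().strip("\"'")
    return out


def load_env_cached(path: Path) -> Dict[str, str]:
    """Return parsed values, re-reading only when mtime or size changes."""
    global _cache
    try:
        st = path.stat()
        key: CacheKey = (str(path), st.st_mtime, st.st_size)
    except FileNotFoundError:
        key = (str(path), None, None)
    if _cache is not None and _cache[0] == key:
        return dict(_cache[1])  # copy so callers cannot mutate the memo
    parsed = _parse(path)
    _cache = (key, dict(parsed))
    return parsed


def invalidate() -> None:
    """Explicit reset for writers, independent of mtime resolution."""
    global _cache
    _cache = None
```

Including the size in the key catches most edits that land within one mtime tick. A same-size rewrite can still slip through on a coarse-resolution filesystem, which is why writers call the explicit invalidation hook.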
+141 -12
@@ -33,8 +33,8 @@ import json
import logging
import re
import time
from dataclasses import dataclass, asdict
from typing import Any, Dict, Optional, Tuple
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, List, Optional, Tuple
logger = logging.getLogger(__name__)
@@ -65,6 +65,21 @@ CONTINUATION_PROMPT_TEMPLATE = (
"If you are blocked and need input from the user, say so clearly and stop."
)
# Used when the user has added one or more /subgoal criteria. Surfaced
# to the agent verbatim so it sees what to target on the next turn,
# and surfaced to the judge so the verdict considers them too.
CONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATE = (
"[Continuing toward your standing goal]\n"
"Goal: {goal}\n\n"
"Additional criteria the user added mid-loop:\n"
"{subgoals_block}\n\n"
"Continue working toward the goal AND all additional criteria. Take "
"the next concrete step. If you believe the goal and every "
"additional criterion are complete, state so explicitly and stop. "
"If you are blocked and need input from the user, say so clearly "
"and stop."
)
JUDGE_SYSTEM_PROMPT = (
"You are a strict judge evaluating whether an autonomous agent has "
@@ -88,6 +103,23 @@ JUDGE_USER_PROMPT_TEMPLATE = (
"Is the goal satisfied?"
)
# Used when the user has added /subgoal criteria. The judge must
# evaluate ALL of them being met, not just the original goal.
JUDGE_USER_PROMPT_WITH_SUBGOALS_TEMPLATE = (
"Goal:\n{goal}\n\n"
"Additional criteria the user added mid-loop (all must also be "
"satisfied for the goal to be DONE):\n{subgoals_block}\n\n"
"Agent's most recent response:\n{response}\n\n"
"Decision: For each numbered criterion above, find concrete "
"evidence in the agent's response that the criterion is "
"satisfied. Do not accept generic phrases like 'all requirements "
"met' or 'implying it was done' — require specific evidence (a "
"file contents excerpt, an output line, a command result). If "
"ANY criterion lacks specific evidence in the response, the goal "
"is NOT done — return CONTINUE.\n\n"
"Is the goal AND every additional criterion satisfied?"
)
# ──────────────────────────────────────────────────────────────────────
# Dataclass
@@ -108,6 +140,12 @@ class GoalState:
last_reason: Optional[str] = None
paused_reason: Optional[str] = None # why we auto-paused (budget, etc.)
consecutive_parse_failures: int = 0 # judge-output parse failures in a row
# User-added criteria appended mid-loop via the /subgoal command.
# When non-empty the judge prompt and continuation prompt both
# include them so the agent works toward them and the judge factors
# them into the verdict. Backwards-compatible: defaults to empty so
# old state_meta rows load unchanged.
subgoals: List[str] = field(default_factory=list)
def to_json(self) -> str:
return json.dumps(asdict(self), ensure_ascii=False)
@@ -115,6 +153,10 @@ class GoalState:
@classmethod
def from_json(cls, raw: str) -> "GoalState":
data = json.loads(raw)
raw_subgoals = data.get("subgoals") or []
subgoals: List[str] = []
if isinstance(raw_subgoals, list):
subgoals = [str(s).strip() for s in raw_subgoals if str(s).strip()]
return cls(
goal=data.get("goal", ""),
status=data.get("status", "active"),
@@ -126,8 +168,18 @@ class GoalState:
last_reason=data.get("last_reason"),
paused_reason=data.get("paused_reason"),
consecutive_parse_failures=int(data.get("consecutive_parse_failures", 0) or 0),
subgoals=subgoals,
)
# --- subgoals helpers -------------------------------------------------
def render_subgoals_block(self) -> str:
"""Render the subgoals as a numbered ``- N. text`` block. Empty
when no subgoals exist."""
if not self.subgoals:
return ""
return "\n".join(f"- {i}. {text}" for i, text in enumerate(self.subgoals, start=1))
# ──────────────────────────────────────────────────────────────────────
# Persistence (SessionDB state_meta)
@@ -284,6 +336,7 @@ def judge_goal(
last_response: str,
*,
timeout: float = DEFAULT_JUDGE_TIMEOUT,
subgoals: Optional[List[str]] = None,
) -> Tuple[str, str, bool]:
"""Ask the auxiliary model whether the goal is satisfied.
@@ -296,6 +349,11 @@ def judge_goal(
auto-pause after N consecutive parse failures (see
``DEFAULT_MAX_CONSECUTIVE_PARSE_FAILURES``).
``subgoals`` is an optional list of user-added criteria (from
``/subgoal``) that the judge must also factor into its DONE/CONTINUE
decision. When non-empty the prompt switches to the with-subgoals
template; otherwise behavior is identical to the original judge.
This is deliberately fail-open: any error returns ``("continue", "...", False)``
so a broken judge doesn't wedge progress — the turn budget and the
consecutive-parse-failures auto-pause are the backstops.
@@ -307,7 +365,7 @@ def judge_goal(
return "continue", "empty response (nothing to evaluate)", False
try:
from agent.auxiliary_client import get_text_auxiliary_client
from agent.auxiliary_client import get_auxiliary_extra_body, get_text_auxiliary_client
except Exception as exc:
logger.debug("goal judge: auxiliary client import failed: %s", exc)
return "continue", "auxiliary client unavailable", False
@@ -321,10 +379,22 @@ def judge_goal(
if client is None or not model:
return "continue", "no auxiliary client configured", False
prompt = JUDGE_USER_PROMPT_TEMPLATE.format(
goal=_truncate(goal, 2000),
response=_truncate(last_response, _JUDGE_RESPONSE_SNIPPET_CHARS),
)
# Build the prompt — pick the with-subgoals variant when applicable.
clean_subgoals = [s.strip() for s in (subgoals or []) if s and s.strip()]
if clean_subgoals:
subgoals_block = "\n".join(
f"- {i}. {text}" for i, text in enumerate(clean_subgoals, start=1)
)
prompt = JUDGE_USER_PROMPT_WITH_SUBGOALS_TEMPLATE.format(
goal=_truncate(goal, 2000),
subgoals_block=_truncate(subgoals_block, 2000),
response=_truncate(last_response, _JUDGE_RESPONSE_SNIPPET_CHARS),
)
else:
prompt = JUDGE_USER_PROMPT_TEMPLATE.format(
goal=_truncate(goal, 2000),
response=_truncate(last_response, _JUDGE_RESPONSE_SNIPPET_CHARS),
)
try:
resp = client.chat.completions.create(
@@ -336,6 +406,7 @@ def judge_goal(
temperature=0,
max_tokens=200,
timeout=timeout,
extra_body=get_auxiliary_extra_body() or None,
)
except Exception as exc:
logger.info("goal judge: API call failed (%s) — falling through to continue", exc)
@@ -396,14 +467,15 @@ class GoalManager:
if s is None or s.status in {"cleared",}:
return "No active goal. Set one with /goal <text>."
turns = f"{s.turns_used}/{s.max_turns} turns"
sub = f", {len(s.subgoals)} subgoal{'s' if len(s.subgoals) != 1 else ''}" if s.subgoals else ""
if s.status == "active":
return f"⊙ Goal (active, {turns}): {s.goal}"
return f"⊙ Goal (active, {turns}{sub}): {s.goal}"
if s.status == "paused":
extra = f", {s.paused_reason}" if s.paused_reason else ""
return f"⏸ Goal (paused, {turns}{extra}): {s.goal}"
return f"⏸ Goal (paused, {turns}{sub}{extra}): {s.goal}"
if s.status == "done":
return f"✓ Goal done ({turns}): {s.goal}"
return f"Goal ({s.status}, {turns}): {s.goal}"
return f"✓ Goal done ({turns}{sub}): {s.goal}"
return f"Goal ({s.status}, {turns}{sub}): {s.goal}"
# --- mutation -----------------------------------------------------
@@ -456,6 +528,53 @@ class GoalManager:
self._state.last_reason = reason
save_goal(self.session_id, self._state)
# --- /subgoal user controls ---------------------------------------
def add_subgoal(self, text: str) -> str:
"""Append a user-added criterion to the active goal. Requires
``has_goal()``; raises ``RuntimeError`` otherwise.
Returns the cleaned text so the caller can show it back to the user.
"""
if self._state is None or not self.has_goal():
raise RuntimeError("no active goal")
text = (text or "").strip()
if not text:
raise ValueError("subgoal text is empty")
self._state.subgoals.append(text)
save_goal(self.session_id, self._state)
return text
def remove_subgoal(self, index_1based: int) -> str:
"""Remove a subgoal by 1-based index. Returns the removed text."""
if self._state is None or not self.has_goal():
raise RuntimeError("no active goal")
idx = int(index_1based) - 1
if idx < 0 or idx >= len(self._state.subgoals):
raise IndexError(
f"index out of range (1..{len(self._state.subgoals)})"
)
removed = self._state.subgoals.pop(idx)
save_goal(self.session_id, self._state)
return removed
def clear_subgoals(self) -> int:
"""Wipe all subgoals. Returns the previous count."""
if self._state is None or not self.has_goal():
raise RuntimeError("no active goal")
prev = len(self._state.subgoals)
self._state.subgoals = []
save_goal(self.session_id, self._state)
return prev
def render_subgoals(self) -> str:
"""Public helper for the /subgoal slash command."""
if self._state is None:
return "(no active goal)"
if not self._state.subgoals:
return "(no subgoals — use /subgoal <text> to add criteria)"
return self._state.render_subgoals_block()
# --- the main entry point called after every turn -----------------
def evaluate_after_turn(
@@ -493,7 +612,9 @@ class GoalManager:
state.turns_used += 1
state.last_turn_at = time.time()
verdict, reason, parse_failed = judge_goal(state.goal, last_response)
verdict, reason, parse_failed = judge_goal(
state.goal, last_response, subgoals=state.subgoals or None
)
state.last_verdict = verdict
state.last_reason = reason
@@ -578,6 +699,11 @@ class GoalManager:
def next_continuation_prompt(self) -> Optional[str]:
if not self._state or self._state.status != "active":
return None
if self._state.subgoals:
return CONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATE.format(
goal=self._state.goal,
subgoals_block=self._state.render_subgoals_block(),
)
return CONTINUATION_PROMPT_TEMPLATE.format(goal=self._state.goal)
@@ -585,6 +711,9 @@ __all__ = [
"GoalState",
"GoalManager",
"CONTINUATION_PROMPT_TEMPLATE",
"CONTINUATION_PROMPT_WITH_SUBGOALS_TEMPLATE",
"JUDGE_USER_PROMPT_TEMPLATE",
"JUDGE_USER_PROMPT_WITH_SUBGOALS_TEMPLATE",
"DEFAULT_MAX_TURNS",
"load_goal",
"save_goal",
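The subgoal changes above share one pattern: normalise the stored list, render it as a numbered `- N. text` block, and switch to the with-subgoals template only when at least one criterion survives cleaning. A minimal sketch under simplified assumptions follows. The two short templates and the helper names `clean_subgoals`, `render_block`, and `build_prompt` are illustrative stand-ins, not the real prompt constants.

```python
from typing import List

# Shortened stand-ins for the real templates (assumption).
BASE_TEMPLATE = "Goal: {goal}"
WITH_SUBGOALS_TEMPLATE = "Goal: {goal}\nAdditional criteria:\n{subgoals_block}"


def clean_subgoals(raw: object) -> List[str]:
    """Keep only non-empty, stripped strings; tolerate missing or bad data."""
    if not isinstance(raw, list):
        return []
    return [str(s).strip() for s in raw if str(s).strip()]


def render_block(subgoals: List[str]) -> str:
    """Render criteria as a numbered ``- N. text`` block."""
    return "\n".join(f"- {i}. {text}" for i, text in enumerate(subgoals, start=1))


def build_prompt(goal: str, raw_subgoals: object) -> str:
    """Pick the with-subgoals template only when criteria exist."""
    subgoals = clean_subgoals(raw_subgoals)
    if subgoals:
        return WITH_SUBGOALS_TEMPLATE.format(
            goal=goal, subgoals_block=render_block(subgoals)
        )
    return BASE_TEMPLATE.format(goal=goal)


with_criteria = build_prompt("ship it", [" add tests ", "", "update docs"])
without_criteria = build_prompt("ship it", None)
```

Falling back to the base template when the list is empty or malformed keeps the no-subgoal path identical to the original behavior, which is what lets old persisted state load unchanged.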
+240
@@ -0,0 +1,240 @@
"""Provider/model inventory context — shared substrate for the dashboard
``/api/model/options``, the TUI ``model.options``/``model.save_key``
JSON-RPC handlers, and the interactive picker.
Before this module the three call-sites each duplicated:
1. The 17-LOC config-slice that pulls ``model.{default,name,provider,base_url}``,
``providers:``, and ``custom_providers:`` out of ``load_config()``;
2. The call into ``list_authenticated_providers`` with the resulting kwargs;
3. (TUI only) a 45-LOC post-pass that merges authenticated rows with
unconfigured ``CANONICAL_PROVIDERS`` rows and emits ``authenticated``/
``auth_type``/``key_env``/``warning`` hints for the picker UI.
Consolidating those three steps into one entry point eliminates two bugs
the duplicates were hiding:
- The dashboard read ``cfg.get("custom_providers")`` directly, missing the
v12+ keyed ``providers:`` form (which the TUI handled via
``get_compatible_custom_providers``).
- The TUI's canonical-merge keyed on ``is_user_defined`` to decide
ordering. Section 3 of ``list_authenticated_providers`` sets
``is_user_defined=True`` even for canonical slugs that appear in the
``providers:`` config dict, which silently demoted them to the tail of
the picker. ``_reorder_canonical`` keys on slug membership instead.
Substrate facts (verified May 2026):
- ``list_authenticated_providers`` already populates each row's
``models`` from the curated catalog (same source as the picker). Do
NOT call ``provider_model_ids()`` per row to "freshen" it: that bypasses
curation and pulls in non-agentic models (Nous /models returns ~400
IDs including TTS, embeddings, rerankers, image/video generators).
"""
from __future__ import annotations
from dataclasses import dataclass, replace
from typing import Optional
# ─── Public types ───────────────────────────────────────────────────────
@dataclass(frozen=True)
class ConfigContext:
"""Snapshot of the model + provider config every inventory caller
needs. Built once via ``load_picker_context()``; the TUI overlays
live agent state via ``with_overrides()`` before passing through.
"""
current_provider: str
current_model: str
current_base_url: str
user_providers: dict
custom_providers: list
def with_overrides(
self,
*,
current_provider: Optional[str] = None,
current_model: Optional[str] = None,
current_base_url: Optional[str] = None,
) -> "ConfigContext":
"""Return a copy with truthy overrides applied.
Truthy-only because the TUI reads agent attributes that may be
empty strings before an agent is spawned; empties must NOT
clobber the disk-config values.
"""
kw: dict = {}
if current_provider:
kw["current_provider"] = current_provider
if current_model:
kw["current_model"] = current_model
if current_base_url:
kw["current_base_url"] = current_base_url
return replace(self, **kw) if kw else self
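The truthy-only override in `with_overrides()` can be sketched on a reduced dataclass. `Ctx` and its two fields are a hypothetical simplification of `ConfigContext`, and the provider and model strings are placeholders.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Ctx:
    provider: str
    model: str

    def with_overrides(
        self,
        *,
        provider: Optional[str] = None,
        model: Optional[str] = None,
    ) -> "Ctx":
        """Return a copy with only truthy overrides applied."""
        kw: dict = {}
        if provider:
            kw["provider"] = provider
        if model:
            kw["model"] = model
        # No truthy override: hand back the same frozen instance.
        return replace(self, **kw) if kw else self


disk = Ctx(provider="openrouter", model="some-model")
same = disk.with_overrides(provider="", model=None)  # empties are ignored
live = disk.with_overrides(model="other-model")      # only the model changes
```

Since the dataclass is frozen, returning `self` when nothing changes is safe and avoids an allocation. `dataclasses.replace` builds a new instance otherwise, so the disk snapshot is never modified.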
def load_picker_context() -> ConfigContext:
"""Load the disk-config snapshot every consumer needs.
Replaces the inline 17-LOC config-slice that ``web_server.py`` and
``tui_gateway/server.py`` (×2 sites) used to do.
"""
from hermes_cli.config import get_compatible_custom_providers, load_config
cfg = load_config()
model_cfg = cfg.get("model", {})
if isinstance(model_cfg, dict):
current_model = model_cfg.get("default", model_cfg.get("name", "")) or ""
current_provider = model_cfg.get("provider", "") or ""
current_base_url = model_cfg.get("base_url", "") or ""
else:
# config.model can be a bare string in older configs.
current_model = str(model_cfg) if model_cfg else ""
current_provider = ""
current_base_url = ""
raw = cfg.get("providers")
return ConfigContext(
current_provider=current_provider,
current_model=current_model,
current_base_url=current_base_url,
user_providers=raw if isinstance(raw, dict) else {},
custom_providers=get_compatible_custom_providers(cfg),
)
# ─── Public: payload builder ────────────────────────────────────────────
def build_models_payload(
ctx: ConfigContext,
*,
include_unconfigured: bool = False,
picker_hints: bool = False,
canonical_order: bool = False,
max_models: int = 50,
) -> dict:
"""Build the ``{providers, model, provider}`` shape every consumer
needs from a single substrate call.
Flags:
- ``include_unconfigured``: append ``CANONICAL_PROVIDERS`` rows that
``list_authenticated_providers`` didn't emit (TUI uses this to show
the full provider universe in the picker).
- ``picker_hints``: add ``authenticated``/``auth_type``/``key_env``/
``warning`` per row (TUI ``ModelPickerDialog`` shape).
- ``canonical_order``: reorder canonical-slug rows to
``CANONICAL_PROVIDERS`` declaration order; truly-custom rows go
last (TUI display order).
"""
from hermes_cli.model_switch import list_authenticated_providers
rows = list_authenticated_providers(
current_provider=ctx.current_provider,
current_base_url=ctx.current_base_url,
current_model=ctx.current_model,
user_providers=ctx.user_providers,
custom_providers=ctx.custom_providers,
max_models=max_models,
)
if include_unconfigured:
rows = list(rows) + _append_unconfigured_rows(rows, ctx)
if picker_hints:
_apply_picker_hints(rows)
if canonical_order:
rows = _reorder_canonical(rows)
return {
"providers": rows,
"model": ctx.current_model,
"provider": ctx.current_provider,
}
# ─── Internal: row post-processing ──────────────────────────────────────
def _append_unconfigured_rows(rows: list[dict], ctx: ConfigContext) -> list[dict]:
"""Build skeleton rows for canonical providers missing from ``rows``."""
from hermes_cli.models import CANONICAL_PROVIDERS, _PROVIDER_LABELS
seen = {r["slug"].lower() for r in rows}
cur = (ctx.current_provider or "").lower()
extras: list[dict] = []
for entry in CANONICAL_PROVIDERS:
if entry.slug.lower() in seen:
continue
extras.append(
{
"slug": entry.slug,
"name": _PROVIDER_LABELS.get(entry.slug, entry.label),
"is_current": entry.slug.lower() == cur,
"is_user_defined": False,
"models": [],
"total_models": 0,
"source": "canonical",
}
)
return extras
def _apply_picker_hints(rows: list[dict]) -> None:
"""Add ``authenticated``/``auth_type``/``key_env``/``warning`` per row.
Mutates ``rows`` in-place. Rows already from
``list_authenticated_providers`` are marked ``authenticated=True``;
the unconfigured skeleton rows from ``_append_unconfigured_rows`` get
the picker's setup-hint shape.
"""
from hermes_cli.auth import PROVIDER_REGISTRY
for row in rows:
if "authenticated" in row:
continue
# Distinguish authenticated rows (returned by
# list_authenticated_providers) from skeleton rows (from
# _append_unconfigured_rows). The skeleton rows have empty
# `models` AND source="canonical"; authenticated rows have
# populated `models` OR a non-canonical source.
is_skeleton = row.get("source") == "canonical" and not row.get("models")
row["authenticated"] = not is_skeleton
if not is_skeleton or row.get("is_user_defined"):
continue
cfg = PROVIDER_REGISTRY.get(row["slug"])
auth_type = cfg.auth_type if cfg else "api_key"
key_env = (
cfg.api_key_env_vars[0]
if (cfg and cfg.api_key_env_vars)
else ""
)
row["auth_type"] = auth_type
row["key_env"] = key_env
row["warning"] = (
f"paste {key_env} to activate"
if auth_type == "api_key" and key_env
else f"run `hermes model` to configure ({auth_type})"
)
def _reorder_canonical(rows: list[dict]) -> list[dict]:
"""Canonical slugs in ``CANONICAL_PROVIDERS`` declaration order;
truly-custom rows last.
Keys on slug membership, NOT ``is_user_defined``: section 3 of
``list_authenticated_providers`` sets ``is_user_defined=True`` on
rows from the ``providers:`` config dict even when the slug is
canonical. Keying on the flag would silently demote canonical
providers configured via the new keyed schema.
"""
from hermes_cli.models import CANONICAL_PROVIDERS
order = {e.slug: i for i, e in enumerate(CANONICAL_PROVIDERS)}
canon = sorted(
(r for r in rows if r["slug"] in order),
key=lambda r: order[r["slug"]],
)
extras = [r for r in rows if r["slug"] not in order]
return canon + extras
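The ordering rule in `_reorder_canonical` can be shown on toy data. In this sketch `CANONICAL_SLUGS` is a hypothetical three-entry stand-in for `CANONICAL_PROVIDERS`, and the rows carry only the fields needed to illustrate why the `is_user_defined` flag is ignored.

```python
from typing import Dict, List

# Hypothetical declaration order; the real list lives in hermes_cli.models.
CANONICAL_SLUGS = ["openrouter", "anthropic", "openai"]


def reorder_canonical(rows: List[Dict]) -> List[Dict]:
    """Canonical slugs first, in declaration order; everything else after."""
    order = {slug: i for i, slug in enumerate(CANONICAL_SLUGS)}
    canon = sorted(
        (r for r in rows if r["slug"] in order),
        key=lambda r: order[r["slug"]],
    )
    extras = [r for r in rows if r["slug"] not in order]
    return canon + extras


rows = [
    {"slug": "my-local-llm", "is_user_defined": True},
    # Canonical slug configured via the keyed schema: flag is True anyway.
    {"slug": "openai", "is_user_defined": True},
    {"slug": "openrouter", "is_user_defined": False},
]
ordered = [r["slug"] for r in reorder_canonical(rows)]
```

Here `openai` keeps its canonical position even though its flag is `True`, while the custom row moves to the tail. Non-canonical rows retain their relative input order because the list comprehension preserves it.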
+2 -1
@@ -155,7 +155,7 @@ def specify_task(
)
try:
from agent.auxiliary_client import get_text_auxiliary_client
from agent.auxiliary_client import get_auxiliary_extra_body, get_text_auxiliary_client
except Exception as exc: # pragma: no cover — import smoke test
logger.debug("specify: auxiliary client import failed: %s", exc)
return SpecifyOutcome(task_id, False, "auxiliary client unavailable")
@@ -187,6 +187,7 @@ def specify_task(
temperature=0.3,
max_tokens=1500,
timeout=timeout or 120,
extra_body=get_auxiliary_extra_body() or None,
)
except Exception as exc:
logger.info(
+240 -44
@@ -2414,30 +2414,31 @@ def _prompt_provider_choice(choices, *, default=0):
def _model_flow_openrouter(config, current_model=""):
"""OpenRouter provider: ensure API key, then pick model."""
from hermes_cli.auth import (
ProviderConfig,
_prompt_model_selection,
_save_model_choice,
deactivate_provider,
)
from hermes_cli.config import get_env_value, save_env_value
from hermes_cli.config import get_env_value
api_key = get_env_value("OPENROUTER_API_KEY")
if not api_key:
print("No OpenRouter API key configured.")
# Route through _prompt_api_key so users can replace a stale/broken key
# in-flow (K/R/C) instead of having to edit ~/.hermes/.env by hand. The
# previous bypass-when-key-exists branch left no way to recover from a
# bad paste short of re-running `hermes setup` from scratch. OpenRouter
# isn't in PROVIDER_REGISTRY so we synthesize a minimal pconfig.
pconfig = ProviderConfig(
id="openrouter",
name="OpenRouter",
auth_type="api_key",
api_key_env_vars=("OPENROUTER_API_KEY",),
)
existing_key = get_env_value("OPENROUTER_API_KEY") or ""
if not existing_key:
print("Get one at: https://openrouter.ai/keys")
print()
try:
import getpass
key = getpass.getpass("OpenRouter API key (or Enter to cancel): ").strip()
except (KeyboardInterrupt, EOFError):
print()
return
if not key:
print("Cancelled.")
return
save_env_value("OPENROUTER_API_KEY", key)
print("API key saved.")
print()
_resolved, abort = _prompt_api_key(pconfig, existing_key, provider_id="openrouter")
if abort:
return
from hermes_cli.models import model_ids, get_pricing_for_provider
@@ -2473,33 +2474,26 @@ def _model_flow_openrouter(config, current_model=""):
def _model_flow_ai_gateway(config, current_model=""):
"""Vercel AI Gateway provider: ensure API key, then pick model with pricing."""
from hermes_cli.auth import (
PROVIDER_REGISTRY,
_prompt_model_selection,
_save_model_choice,
deactivate_provider,
)
from hermes_cli.config import get_env_value, save_env_value
from hermes_cli.config import get_env_value
api_key = get_env_value("AI_GATEWAY_API_KEY")
if not api_key:
print("No Vercel AI Gateway API key configured.")
# Route through _prompt_api_key so users can replace a stale/broken key
# in-flow (K/R/C) instead of having to edit ~/.hermes/.env by hand.
pconfig = PROVIDER_REGISTRY["ai-gateway"]
existing_key = get_env_value("AI_GATEWAY_API_KEY") or ""
if not existing_key:
print(
"Create API key here: https://vercel.com/d?to=%2F%5Bteam%5D%2F%7E%2Fai-gateway&title=AI+Gateway"
)
print("Add a payment method to get $5 in free credits.")
print()
try:
import getpass
key = getpass.getpass("AI Gateway API key (or Enter to cancel): ").strip()
except (KeyboardInterrupt, EOFError):
print()
return
if not key:
print("Cancelled.")
return
save_env_value("AI_GATEWAY_API_KEY", key)
print("API key saved.")
print()
_resolved, abort = _prompt_api_key(pconfig, existing_key, provider_id="ai-gateway")
if abort:
return
from hermes_cli.models import ai_gateway_model_ids, get_pricing_for_provider
@@ -3079,6 +3073,21 @@ def _model_flow_custom(config):
else:
print(f" If /v1 should not be in the base URL, try: {suggested}")
# Prompt for API compatibility mode explicitly so codex-compatible custom
# providers don't silently fall back to chat_completions.
current_model_cfg = config.get("model")
current_api_mode = ""
if isinstance(current_model_cfg, dict):
current_api_mode = str(current_model_cfg.get("api_mode") or "").strip()
api_mode = _prompt_custom_api_mode_selection(
effective_url,
current_api_mode=current_api_mode,
)
if api_mode:
print(f" API mode: {api_mode}")
else:
print(" API mode: auto-detect")
# Select model — use probe results when available, fall back to manual input
model_name = ""
detected_models = probe.get("models") or []
@@ -3142,7 +3151,10 @@ def _model_flow_custom(config):
model["base_url"] = effective_url
if effective_key:
model["api_key"] = effective_key
model.pop("api_mode", None) # let runtime auto-detect from URL
if api_mode:
model["api_mode"] = api_mode
else:
model.pop("api_mode", None)
save_config(cfg)
deactivate_provider()
@@ -3165,7 +3177,10 @@ def _model_flow_custom(config):
_caller_model["base_url"] = effective_url
if effective_key:
_caller_model["api_key"] = effective_key
_caller_model.pop("api_mode", None)
if api_mode:
_caller_model["api_mode"] = api_mode
else:
_caller_model.pop("api_mode", None)
config["model"] = _caller_model
print("Endpoint saved. Use `/model` in chat or `hermes model` to set a model.")
@@ -3176,9 +3191,80 @@ def _model_flow_custom(config):
model_name or "",
context_length=context_length,
name=display_name,
api_mode=api_mode,
)
def _prompt_custom_api_mode_selection(base_url: str, current_api_mode: str = "") -> Optional[str]:
"""Prompt for a custom provider API mode.
Returns an explicit mode string, or None to keep auto-detect behavior.
"""
from hermes_cli.runtime_provider import _detect_api_mode_for_url
detected_mode = _detect_api_mode_for_url(base_url)
normalized_current = str(current_api_mode or "").strip().lower()
default_mode = normalized_current or detected_mode or ""
mode_options = [
(
"",
"Auto-detect",
"Use Hermes URL heuristics; best for standard OpenAI-compatible endpoints.",
),
(
"chat_completions",
"Chat Completions",
"Use /chat/completions for standard OpenAI-compatible servers.",
),
(
"codex_responses",
"Responses / Codex",
"Use /responses for Codex-compatible tool-calling backends.",
),
(
"anthropic_messages",
"Anthropic Messages",
"Use /v1/messages for Anthropic-compatible endpoints.",
),
]
print()
print("Select API compatibility mode:")
for idx, (value, label, description) in enumerate(mode_options, 1):
markers = []
if value == detected_mode:
markers.append("detected")
if value == default_mode:
markers.append("current")
suffix = f" [{' / '.join(markers)}]" if markers else ""
print(f" {idx}. {label}{suffix}")
print(f" {description}")
try:
raw = input(
"Choice [1-4, Enter to keep current/detected]: "
).strip().lower()
except (KeyboardInterrupt, EOFError):
print("\nCancelled.")
raise
if not raw:
return default_mode or None
if raw in {"1", "auto", "detect", "auto-detect"}:
return None
if raw in {"2", "chat", "chat_completions", "completions"}:
return "chat_completions"
if raw in {"3", "responses", "codex", "codex_responses"}:
return "codex_responses"
if raw in {"4", "anthropic", "anthropic_messages", "messages"}:
return "anthropic_messages"
print(f"Invalid API mode choice: {raw}. Falling back to auto-detect.")
return None
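The answer parsing at the end of `_prompt_custom_api_mode_selection` can be sketched on its own, without the interactive prompt. This is a minimal illustration only: the helper name `parse_api_mode_choice` and the `_ALIASES` table are hypothetical, and the alias sets are copied from the hunk above.

```python
from typing import Optional

# Alias sets copied from the hunk; the table layout itself is an assumption.
_ALIASES = [
    (None, {"1", "auto", "detect", "auto-detect"}),
    ("chat_completions", {"2", "chat", "chat_completions", "completions"}),
    ("codex_responses", {"3", "responses", "codex", "codex_responses"}),
    ("anthropic_messages", {"4", "anthropic", "anthropic_messages", "messages"}),
]


def parse_api_mode_choice(raw: str, default_mode: str = "") -> Optional[str]:
    """Map a raw answer to an explicit mode, or None for auto-detect."""
    answer = raw.strip().lower()
    if not answer:
        # Empty input keeps the current/detected default, if there is one.
        return default_mode or None
    for mode, aliases in _ALIASES:
        if answer in aliases:
            return mode
    # Unrecognized input falls back to auto-detect, as in the hunk.
    return None
```

Returning `None` for both "auto" and invalid input means callers only need one check (`if api_mode:`) to decide whether to persist an explicit `api_mode` key.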
def _auto_provider_name(base_url: str) -> str:
"""Generate a display name from a custom endpoint URL.
@@ -3214,12 +3300,12 @@ def _custom_provider_api_key_config_value(provider_info, resolved_api_key=""):
def _save_custom_provider(
base_url, api_key="", model="", context_length=None, name=None
base_url, api_key="", model="", context_length=None, name=None, api_mode=None
):
"""Save a custom endpoint to custom_providers in config.yaml.
Deduplicates by base_url if the URL already exists, updates the
model name and context_length but doesn't add a duplicate entry.
model name, context_length, and api_mode but doesn't add a duplicate entry.
Uses *name* when provided, otherwise auto-generates from the URL.
"""
from hermes_cli.config import load_config, save_config
@@ -3245,6 +3331,13 @@ def _save_custom_provider(
models_cfg[model] = {"context_length": context_length}
entry["models"] = models_cfg
changed = True
if api_mode:
if entry.get("api_mode") != api_mode:
entry["api_mode"] = api_mode
changed = True
elif "api_mode" in entry:
entry.pop("api_mode", None)
changed = True
if changed:
cfg["custom_providers"] = providers
save_config(cfg)
@@ -3259,6 +3352,8 @@ def _save_custom_provider(
entry["api_key"] = api_key
if model:
entry["model"] = model
if api_mode:
entry["api_mode"] = api_mode
if model and context_length:
entry["models"] = {model: {"context_length": context_length}}
@@ -3712,7 +3807,7 @@ def _model_flow_named_custom(config, provider_info):
save_config(cfg)
else:
# Save model name to the custom_providers entry for next time
_save_custom_provider(base_url, config_api_key, model_name)
_save_custom_provider(base_url, config_api_key, model_name, api_mode=api_mode)
print(f"\n✅ Model set to: {model_name}")
print(f" Provider: {name} ({base_url})")
@@ -4869,6 +4964,37 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
)
if model_list:
print(f" Found {len(model_list)} model(s) from Ollama Cloud")
elif provider_id == "novita":
from hermes_cli.models import fetch_api_models
api_key_for_probe = existing_key or (get_env_value(key_env) if key_env else "")
curated = _PROVIDER_MODELS.get(provider_id, [])
live_models = fetch_api_models(api_key_for_probe, effective_base)
if live_models:
model_list = live_models
print(f" Found {len(model_list)} model(s) from {pconfig.name} API")
else:
mdev_models: list = []
try:
from agent.models_dev import list_agentic_models
mdev_models = list_agentic_models(provider_id)
except Exception:
pass
if mdev_models:
seen = {m.lower() for m in mdev_models}
model_list = list(mdev_models)
for m in curated:
if m.lower() not in seen:
model_list.append(m)
seen.add(m.lower())
print(f" Found {len(model_list)} model(s) from models.dev registry")
else:
model_list = curated
if model_list:
print(
f' Showing {len(model_list)} curated models — use "Enter custom model name" for others.'
)
else:
curated = _PROVIDER_MODELS.get(provider_id, [])
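The models.dev fallback in the `novita` branch merges two lists, dropping case-insensitive duplicates while keeping the first list's order. A minimal sketch of that merge, assuming a hypothetical helper name `merge_model_lists` (the hunk does this inline):

```python
from typing import Iterable, List


def merge_model_lists(primary: Iterable[str], curated: Iterable[str]) -> List[str]:
    """Append curated ids to primary, skipping case-insensitive duplicates."""
    merged = list(primary)
    seen = {m.lower() for m in merged}
    for model_id in curated:
        key = model_id.lower()
        if key not in seen:
            merged.append(model_id)
            seen.add(key)
    return merged
```

The primary list keeps its original casing; a curated entry is only added when no entry with the same lowercase form is already present.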
@@ -6701,6 +6827,74 @@ def _cleanup_quarantined_exes(scripts_dir: Path | None = None) -> None:
pass
def _refresh_active_lazy_features() -> None:
"""Refresh lazy-installed backends after a code update.
When pyproject.toml's ``[all]`` extra was slimmed down (May 2026), most
optional backends moved to ``tools/lazy_deps.py`` and only install on
first use. ``hermes update`` runs ``uv pip install -e .[all]``, which
leaves those packages untouched, so if we bump a pin in
:data:`LAZY_DEPS` (CVE response, transitive bug fix), users who already
activated the backend keep the stale version forever.
This function asks lazy_deps which features the user has previously
activated and reinstalls them under the current pins. Features the
user never enabled stay quiet: no churn for cold backends.
Never raises. A failure here must not block the rest of the update.
"""
try:
from tools import lazy_deps
except Exception as exc:
logger.debug("Lazy refresh skipped (import failed): %s", exc)
return
try:
active = lazy_deps.active_features()
except Exception as exc:
logger.debug("Lazy refresh skipped (active_features failed): %s", exc)
return
if not active:
return
print()
print(f"→ Refreshing {len(active)} active lazy backend(s)...")
try:
results = lazy_deps.refresh_active_features(prompt=False)
except Exception as exc:
# refresh_active_features is documented as never-raise, but defend
# the update flow against future regressions.
print(f" ⚠ Lazy refresh failed unexpectedly: {exc}")
return
refreshed = [f for f, s in results.items() if s == "refreshed"]
current = [f for f, s in results.items() if s == "current"]
failed = [(f, s) for f, s in results.items() if s.startswith("failed:")]
skipped = [(f, s) for f, s in results.items() if s.startswith("skipped:")]
if refreshed:
print(f"{len(refreshed)} refreshed: {', '.join(refreshed)}")
if current:
print(f"{len(current)} already current")
if skipped:
# Most common reason: security.allow_lazy_installs=false. Show one
# line so the user knows why; not an error.
names = ", ".join(f for f, _ in skipped)
reason = skipped[0][1].split(": ", 1)[-1]
print(f" · {len(skipped)} skipped ({reason}): {names}")
if failed:
for feature, status in failed:
reason = status.split(": ", 1)[-1]
# Clip noisy pip stderr to keep update output legible.
if len(reason) > 200:
reason = reason[:200] + "..."
print(f"{feature} failed to refresh: {reason}")
print(" Backends keep their previously-installed version; rerun")
print(" `hermes update` once the upstream issue is resolved.")
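The reporting half of `_refresh_active_lazy_features` groups status strings by prefix. The sketch below isolates that grouping; the function name `bucket_refresh_results` is hypothetical, and the status format (`"refreshed"`, `"current"`, `"failed: <reason>"`, `"skipped: <reason>"`) is inferred from the string checks in the hunk rather than from `lazy_deps` itself.

```python
from typing import Dict, List


def bucket_refresh_results(results: Dict[str, str]) -> Dict[str, List]:
    """Group per-feature status strings the way the update summary reads them."""
    buckets: Dict[str, List] = {
        "refreshed": [], "current": [], "failed": [], "skipped": [],
    }
    for feature, status in results.items():
        if status in ("refreshed", "current"):
            buckets[status].append(feature)
        elif status.startswith("failed:"):
            # Keep only the reason text after the "failed: " prefix.
            buckets["failed"].append((feature, status.split(": ", 1)[-1]))
        elif status.startswith("skipped:"):
            buckets["skipped"].append((feature, status.split(": ", 1)[-1]))
    return buckets
```

Statuses that match none of the four shapes are silently ignored here, which mirrors the list comprehensions in the hunk.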
def _install_python_dependencies_with_optional_fallback(
install_cmd_prefix: list[str],
*,
@@ -7623,6 +7817,8 @@ def _cmd_update_impl(args, gateway_mode: bool):
_install_psutil_android_compat(pip_cmd)
_install_python_dependencies_with_optional_fallback(pip_cmd, group=install_group)
_refresh_active_lazy_features()
_update_node_dependencies()
_build_web_ui(PROJECT_ROOT / "web")
@@ -9168,7 +9364,7 @@ def _build_provider_choices() -> list[str]:
"auto", "openrouter", "nous", "openai-codex", "copilot-acp", "copilot",
"anthropic", "gemini", "google-gemini-cli", "xai", "bedrock", "azure-foundry",
"ollama-cloud", "huggingface", "zai", "kimi-coding", "kimi-coding-cn",
"stepfun", "minimax", "minimax-cn", "kilocode", "xiaomi", "arcee",
"stepfun", "minimax", "minimax-cn", "kilocode", "novita", "xiaomi", "arcee",
"nvidia", "deepseek", "alibaba", "qwen-oauth", "opencode-zen", "opencode-go",
]
@@ -9188,10 +9384,10 @@ _BUILTIN_SUBCOMMANDS = frozenset(
"computer-use",
"config", "cron", "curator", "dashboard", "debug", "doctor",
"dump", "fallback", "gateway", "hooks", "import", "insights",
"kanban", "login", "logout", "logs", "mcp", "memory", "model",
"pairing", "plugins", "profile", "sessions", "setup", "skills",
"slack", "status", "tools", "uninstall", "update", "version",
"webhook", "whatsapp", "chat",
"kanban", "login", "logout", "logs", "lsp", "mcp", "memory",
"model", "pairing", "plugins", "profile", "sessions", "setup",
"skills", "slack", "status", "tools", "uninstall", "update",
"version", "webhook", "whatsapp", "chat",
# Help-ish invocations — plugin commands not being listed in
# top-level --help is an acceptable trade-off for skipping an
# expensive eager import of every bundled plugin module.
+8 -1
@@ -10,6 +10,7 @@ from __future__ import annotations
import getpass
import os
import sys
import shlex
from pathlib import Path
from hermes_constants import get_hermes_home
@@ -134,7 +135,7 @@ def _install_dependencies(provider_name: str) -> None:
if check_cmd:
try:
subprocess.run(
check_cmd, shell=True, capture_output=True, timeout=5
shlex.split(check_cmd), check=True, capture_output=True, timeout=5
)
except Exception:
if install_cmd:
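The hunk above swaps `shell=True` for an argument list built with `shlex.split`, and adds `check=True` so a non-zero exit raises and reaches the `except` branch. A minimal sketch of that pattern, assuming a hypothetical wrapper `check_passes` (the real code runs the call inline):

```python
import shlex
import subprocess
import sys

# Quoted interpreter path, used only to build portable demo commands.
PYTHON = shlex.quote(sys.executable)


def check_passes(check_cmd: str) -> bool:
    """Run a check command without a shell; any failure counts as 'not installed'."""
    try:
        subprocess.run(
            shlex.split(check_cmd),  # tokenized argv, no shell interpretation
            check=True,              # non-zero exit raises CalledProcessError
            capture_output=True,
            timeout=5,
        )
    except Exception:
        # Covers non-zero exit, missing binary, and timeout alike.
        return False
    return True
```

One consequence worth noting: with an argument list, shell features such as pipes, `&&`, and globbing in `check_cmd` are passed as literal arguments rather than interpreted.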
@@ -378,6 +379,12 @@ def _write_env_vars(env_path: Path, env_writes: dict) -> None:
new_lines.append(f"{key}={val}")
env_path.write_text("\n".join(new_lines) + "\n", encoding="utf-8")
# Restrict permissions — .env holds API keys and tokens.
try:
import stat
env_path.chmod(stat.S_IRUSR | stat.S_IWUSR) # 0600
except OSError:
pass # Windows or read-only FS
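The permission tightening added to `_write_env_vars` can be tried in isolation. This is a sketch under stated assumptions: `write_private_file` is a hypothetical name, and the demo writes a placeholder value to a temporary directory rather than a real `.env`.

```python
import os
import stat
import tempfile
from pathlib import Path


def write_private_file(path: Path, text: str) -> None:
    """Write text, then restrict the file to owner read/write (0600)."""
    path.write_text(text, encoding="utf-8")
    try:
        path.chmod(stat.S_IRUSR | stat.S_IWUSR)
    except OSError:
        pass  # best effort: some platforms or filesystems reject chmod


is_posix = os.name == "posix"
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    write_private_file(env_path, "EXAMPLE_KEY=placeholder\n")
    mode = stat.S_IMODE(env_path.stat().st_mode)
```

Because the chmod happens after the write, there is a short window in which the file carries the process's default permissions; the hunk accepts the same trade-off.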
# ---------------------------------------------------------------------------
+75 -3
@@ -445,6 +445,14 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
# Azure Foundry: user-provided endpoint and model.
# Empty list because models depend on the endpoint configuration.
"azure-foundry": [],
"novita": [
"moonshotai/kimi-k2.5",
"minimax/minimax-m2.7",
"zai-org/glm-5",
"deepseek/deepseek-v3-0324",
"deepseek/deepseek-r1-0528",
"qwen/qwen3-235b-a22b-fp8",
],
}
# Vercel AI Gateway: derive the bare-model-id catalog from the curated
@@ -905,13 +913,14 @@ class ProviderEntry(NamedTuple):
CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("nous", "Nous Portal", "Nous Portal (Nous Research subscription)"),
ProviderEntry("openrouter", "OpenRouter", "OpenRouter (100+ models, pay-per-use)"),
ProviderEntry("novita", "NovitaAI", "NovitaAI (AI-native cloud: Model API, Agent Sandbox, GPU Cloud)"),
ProviderEntry("lmstudio", "LM Studio", "LM Studio (local desktop app with built-in model server)"),
ProviderEntry("anthropic", "Anthropic", "Anthropic (Claude models — API key or Claude Code)"),
ProviderEntry("openai-codex", "OpenAI Codex", "OpenAI Codex"),
ProviderEntry("alibaba", "Qwen Cloud", "Qwen Cloud / DashScope Coding (Qwen + multi-provider)"),
ProviderEntry("xiaomi", "Xiaomi MiMo", "Xiaomi MiMo (MiMo-V2.5 and V2 models — pro, omni, flash)"),
ProviderEntry("tencent-tokenhub", "Tencent TokenHub", "Tencent TokenHub (Hy3 Preview — direct API via tokenhub.tencentmaas.com)"),
ProviderEntry("nvidia", "NVIDIA NIM", "NVIDIA NIM (Nemotron models — build.nvidia.com or local NIM)"),
ProviderEntry("qwen-oauth", "Qwen OAuth (Portal)", "Qwen OAuth (reuses local Qwen CLI login)"),
ProviderEntry("copilot", "GitHub Copilot", "GitHub Copilot (uses GITHUB_TOKEN or gh auth token)"),
ProviderEntry("copilot-acp", "GitHub Copilot ACP", "GitHub Copilot ACP (spawns `copilot --acp --stdio`)"),
ProviderEntry("huggingface", "Hugging Face", "Hugging Face Inference Providers (20+ open models)"),
@@ -926,7 +935,6 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("minimax", "MiniMax", "MiniMax (global direct API)"),
ProviderEntry("minimax-oauth", "MiniMax (OAuth)", "MiniMax via OAuth browser login (Coding Plan, minimax.io)"),
ProviderEntry("minimax-cn", "MiniMax (China)", "MiniMax China (domestic direct API)"),
ProviderEntry("alibaba", "Alibaba Cloud (DashScope)","Alibaba Cloud / DashScope Coding (Qwen + multi-provider)"),
ProviderEntry("ollama-cloud", "Ollama Cloud", "Ollama Cloud (cloud-hosted open models — ollama.com)"),
ProviderEntry("arcee", "Arcee AI", "Arcee AI (Trinity models — direct API)"),
ProviderEntry("gmi", "GMI Cloud", "GMI Cloud (multi-model direct API)"),
@@ -936,6 +944,7 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("bedrock", "AWS Bedrock", "AWS Bedrock (Claude, Nova, Llama, DeepSeek — IAM or API key)"),
ProviderEntry("azure-foundry", "Azure Foundry", "Azure Foundry (OpenAI-style or Anthropic-style endpoint — your Azure AI deployment)"),
ProviderEntry("ai-gateway", "Vercel AI Gateway", "Vercel AI Gateway"),
ProviderEntry("qwen-oauth", "Qwen OAuth (Portal)", "Qwen OAuth (reuses local Qwen CLI login)"),
]
# Auto-extend CANONICAL_PROVIDERS with any provider registered in providers/
@@ -1014,6 +1023,8 @@ _PROVIDER_ALIASES = {
"hf": "huggingface",
"hugging-face": "huggingface",
"huggingface-hub": "huggingface",
"novita-ai": "novita",
"novitaai": "novita",
"mimo": "xiaomi",
"xiaomi-mimo": "xiaomi",
"tencent": "tencent-tokenhub",
@@ -1494,7 +1505,7 @@ def _resolve_nous_pricing_credentials() -> tuple[str, str]:
def get_pricing_for_provider(provider: str, *, force_refresh: bool = False) -> dict[str, dict[str, str]]:
"""Return live pricing for providers that support it (openrouter, nous, ai-gateway)."""
"""Return live pricing for providers that support it (openrouter, nous, ai-gateway, novita)."""
normalized = normalize_provider(provider)
if normalized == "openrouter":
return fetch_models_with_pricing(
@@ -1504,6 +1515,8 @@ def get_pricing_for_provider(provider: str, *, force_refresh: bool = False) -> d
)
if normalized == "ai-gateway":
return fetch_ai_gateway_pricing(force_refresh=force_refresh)
if normalized == "novita":
return _fetch_novita_pricing(force_refresh=force_refresh)
if normalized == "nous":
api_key, base_url = _resolve_nous_pricing_credentials()
if base_url:
@@ -1520,6 +1533,65 @@ def get_pricing_for_provider(provider: str, *, force_refresh: bool = False) -> d
return {}
def _fetch_novita_pricing(
timeout: float = 8.0,
*,
force_refresh: bool = False,
) -> dict[str, dict[str, str]]:
"""Fetch pricing from NovitaAI /v1/models.
NovitaAI returns input/output prices per million tokens in units of
0.0001 USD. Convert them to the per-token strings used by the shared
pricing formatter.
Results are cached in ``_pricing_cache`` keyed on the resolved base URL,
matching the pattern used by ``fetch_ai_gateway_pricing``; without this,
every menu render or pricing lookup re-hits the network.
"""
api_key = os.getenv("NOVITA_API_KEY", "").strip()
if not api_key:
return {}
base_url = os.getenv("NOVITA_BASE_URL", "").strip() or "https://api.novita.ai/openai/v1"
cache_key = base_url.rstrip("/")
if not force_refresh and cache_key in _pricing_cache:
return _pricing_cache[cache_key]
url = cache_key + "/models"
headers = {
"Authorization": f"Bearer {api_key}",
"Accept": "application/json",
"User-Agent": _HERMES_USER_AGENT,
}
try:
req = urllib.request.Request(url, headers=headers)
with urllib.request.urlopen(req, timeout=timeout) as resp:
payload = json.loads(resp.read().decode())
except Exception:
_pricing_cache[cache_key] = {}
return {}
result: dict[str, dict[str, str]] = {}
for item in payload.get("data", []):
if not isinstance(item, dict):
continue
mid = item.get("id")
if not mid:
continue
inp = item.get("input_token_price_per_m")
out = item.get("output_token_price_per_m")
if inp is None and out is None:
continue
result[str(mid)] = {
"prompt": str(float(inp or 0) / 10_000 / 1_000_000),
"completion": str(float(out or 0) / 10_000 / 1_000_000),
}
_pricing_cache[cache_key] = result
return result
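As a worked example of the unit conversion described in the docstring above: a raw value of 5000, in units of 0.0001 USD per million tokens, is 0.50 USD per million tokens, or 5e-7 USD per token. The helper name below is hypothetical and the raw value is made up for illustration; only the arithmetic is taken from the hunk.

```python
def novita_price_to_per_token(price_per_m) -> str:
    """Convert 0.0001-USD units per million tokens to a USD-per-token string."""
    # / 10_000 turns 0.0001-USD units into USD; / 1_000_000 goes per-million -> per-token.
    return str(float(price_per_m or 0) / 10_000 / 1_000_000)


per_token = float(novita_price_to_per_token(5000))  # hypothetical raw API value
per_million_usd = per_token * 1_000_000
```

A missing price (`None`) converts to zero, which matches the `inp or 0` handling in the hunk.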
# All provider IDs and aliases that are valid for the provider:model syntax.
_KNOWN_PROVIDER_NAMES: set[str] = (
set(_PROVIDER_LABELS.keys())
+75
@@ -542,6 +542,61 @@ class PluginContext:
self.manifest.name, provider.name,
)
# -- video gen provider registration -------------------------------------
def register_video_gen_provider(self, provider) -> None:
"""Register a video generation backend.
``provider`` must be an instance of
:class:`agent.video_gen_provider.VideoGenProvider`. The
``provider.name`` attribute is what ``video_gen.provider`` in
``config.yaml`` matches against when routing ``video_generate``
tool calls.
"""
from agent.video_gen_provider import VideoGenProvider
from agent.video_gen_registry import register_provider as _register_video_provider
if not isinstance(provider, VideoGenProvider):
logger.warning(
"Plugin '%s' tried to register a video_gen provider that does "
"not inherit from VideoGenProvider. Ignoring.",
self.manifest.name,
)
return
_register_video_provider(provider)
logger.info(
"Plugin '%s' registered video_gen provider: %s",
self.manifest.name, provider.name,
)
# -- web search/extract provider registration ----------------------------
def register_web_search_provider(self, provider) -> None:
"""Register a web search/extract backend.
``provider`` must be an instance of
:class:`agent.web_search_provider.WebSearchProvider`. The
``provider.name`` attribute is what ``web.search_backend`` /
``web.extract_backend`` / ``web.backend`` in ``config.yaml``
matches against when routing ``web_search`` / ``web_extract``
tool calls.
"""
from agent.web_search_provider import WebSearchProvider
from agent.web_search_registry import register_provider as _register_web_provider
if not isinstance(provider, WebSearchProvider):
logger.warning(
"Plugin '%s' tried to register a web provider that does "
"not inherit from WebSearchProvider. Ignoring.",
self.manifest.name,
)
return
_register_web_provider(provider)
logger.info(
"Plugin '%s' registered web provider: %s",
self.manifest.name, provider.name,
)
# -- platform adapter registration ---------------------------------------
def register_platform(
@@ -1312,6 +1367,21 @@ def invoke_hook(hook_name: str, **kwargs: Any) -> List[Any]:
_thread_tool_whitelist = threading.local()
def set_thread_tool_whitelist(
allowed: Optional[Set[str]],
deny_msg_fmt: str = "Tool '{tool_name}' denied: not in this thread's tool whitelist",
) -> None:
_thread_tool_whitelist.allowed = allowed
_thread_tool_whitelist.fmt = deny_msg_fmt
def clear_thread_tool_whitelist() -> None:
_thread_tool_whitelist.allowed = None
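The two helpers above store the whitelist in a `threading.local()`, so each thread sees only its own setting. The sketch below illustrates that isolation with simplified, hypothetical names (`set_whitelist`, `block_message`, `run_in_thread`) and made-up tool names; it is not the plugin module's API.

```python
import threading
from typing import Callable, Optional, Set

_local = threading.local()


def set_whitelist(allowed: Optional[Set[str]]) -> None:
    """Set (or clear, with None) the whitelist for the calling thread only."""
    _local.allowed = allowed


def block_message(tool_name: str) -> Optional[str]:
    """Return a denial message if this thread's whitelist excludes the tool."""
    allowed = getattr(_local, "allowed", None)
    if allowed is not None and tool_name not in allowed:
        return f"Tool '{tool_name}' denied"
    return None


def run_in_thread(fn: Callable[[], Optional[str]]) -> Optional[str]:
    """Run fn in a fresh thread and return its result."""
    out = []
    worker = threading.Thread(target=lambda: out.append(fn()))
    worker.start()
    worker.join()
    return out[0]
```

A whitelist set in one thread does not leak into others: a new thread starts with no `allowed` attribute, so `getattr(..., None)` treats it as unrestricted.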
def get_pre_tool_call_block_message(
tool_name: str,
args: Optional[Dict[str, Any]],
@@ -1330,6 +1400,11 @@ def get_pre_tool_call_block_message(
directive wins. Invalid or irrelevant hook return values are
silently ignored so existing observer-only hooks are unaffected.
"""
allowed = getattr(_thread_tool_whitelist, "allowed", None)
if allowed is not None and tool_name not in allowed:
fmt = getattr(_thread_tool_whitelist, "fmt", "Tool '{tool_name}' denied")
return fmt.format(tool_name=tool_name)
hook_results = invoke_hook(
"pre_tool_call",
tool_name=tool_name,
-85
@@ -1295,91 +1295,6 @@ def rename_profile(old_name: str, new_name: str) -> Path:
return new_dir
# ---------------------------------------------------------------------------
# Tab completion
# ---------------------------------------------------------------------------
def generate_bash_completion() -> str:
"""Generate a bash completion script for hermes profile names."""
return '''# Hermes Agent profile completion
# Add to ~/.bashrc: eval "$(hermes completion bash)"
_hermes_profiles() {
local profiles_dir="$HOME/.hermes/profiles"
local profiles="default"
if [ -d "$profiles_dir" ]; then
profiles="$profiles $(ls "$profiles_dir" 2>/dev/null)"
fi
echo "$profiles"
}
_hermes_completion() {
local cur prev
cur="${COMP_WORDS[COMP_CWORD]}"
prev="${COMP_WORDS[COMP_CWORD-1]}"
# Complete profile names after -p / --profile
if [[ "$prev" == "-p" || "$prev" == "--profile" ]]; then
COMPREPLY=($(compgen -W "$(_hermes_profiles)" -- "$cur"))
return
fi
# Complete profile subcommands
if [[ "${COMP_WORDS[1]}" == "profile" ]]; then
case "$prev" in
profile)
COMPREPLY=($(compgen -W "list use create delete show alias rename export import" -- "$cur"))
return
;;
use|delete|show|alias|rename|export)
COMPREPLY=($(compgen -W "$(_hermes_profiles)" -- "$cur"))
return
;;
esac
fi
# Top-level subcommands
if [[ "$COMP_CWORD" == 1 ]]; then
local commands="chat model gateway setup status cron doctor dump config skills tools mcp sessions profile update version"
COMPREPLY=($(compgen -W "$commands" -- "$cur"))
fi
}
complete -F _hermes_completion hermes
'''
def generate_zsh_completion() -> str:
"""Generate a zsh completion script for hermes profile names."""
return '''#compdef hermes
# Hermes Agent profile completion
# Add to ~/.zshrc: eval "$(hermes completion zsh)"
_hermes() {
local -a profiles
profiles=(default)
if [[ -d "$HOME/.hermes/profiles" ]]; then
profiles+=("${(@f)$(ls $HOME/.hermes/profiles 2>/dev/null)}")
fi
_arguments \\
'-p[Profile name]:profile:($profiles)' \\
'--profile[Profile name]:profile:($profiles)' \\
'1:command:(chat model gateway setup status cron doctor dump config skills tools mcp sessions profile update version)' \\
'*::arg:->args'
case $words[1] in
profile)
_arguments '1:action:(list use create delete show alias rename export import)' \\
'2:profile:($profiles)'
;;
esac
}
_hermes "$@"
'''
# ---------------------------------------------------------------------------
# Profile env resolution (called from _apply_profile_override)
# ---------------------------------------------------------------------------
+9
@@ -156,6 +156,11 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
is_aggregator=True,
base_url_env_var="HF_BASE_URL",
),
"novita": HermesOverlay(
transport="openai_chat",
is_aggregator=True,
base_url_env_var="NOVITA_BASE_URL",
),
"xai": HermesOverlay(
transport="codex_responses",
base_url_override="https://api.x.ai/v1",
@@ -309,6 +314,10 @@ ALIASES: Dict[str, str] = {
"hugging-face": "huggingface",
"huggingface-hub": "huggingface",
# novita
"novita-ai": "novita",
"novitaai": "novita",
# xiaomi
"mimo": "xiaomi",
"xiaomi-mimo": "xiaomi",
+44 -1
@@ -164,7 +164,18 @@ def _copilot_runtime_api_mode(model_cfg: Dict[str, Any], api_key: str) -> str:
return "chat_completions"
_VALID_API_MODES = {"chat_completions", "codex_responses", "anthropic_messages", "bedrock_converse"}
_VALID_API_MODES = {
"chat_completions",
"codex_responses",
"anthropic_messages",
"bedrock_converse",
# Optional opt-in: hand the entire turn to a `codex app-server` subprocess
# so terminal/file-ops/patching/sandboxing run inside Codex's own runtime
# instead of Hermes' tool dispatch. Gated behind config key
# `model.openai_runtime == "codex_app_server"` AND provider in
# {"openai", "openai-codex"}. Default is unchanged.
"codex_app_server",
}
def _parse_api_mode(raw: Any) -> Optional[str]:
@@ -176,6 +187,32 @@ def _parse_api_mode(raw: Any) -> Optional[str]:
return None
def _maybe_apply_codex_app_server_runtime(
*,
provider: str,
api_mode: str,
model_cfg: Optional[Dict[str, Any]],
) -> str:
"""Optional opt-in: rewrite api_mode → "codex_app_server" for OpenAI/Codex
providers when the user has explicitly enabled that runtime via
`model.openai_runtime: codex_app_server` in config.yaml.
Default behavior is preserved: when the key is unset, "auto", or empty,
this function is a no-op. Only providers in {"openai", "openai-codex"}
are eligible; other providers (anthropic, openrouter, etc.) cannot be
rerouted through codex.
Returns the (possibly-rewritten) api_mode."""
if not model_cfg:
return api_mode
if provider not in ("openai", "openai-codex"):
return api_mode
runtime = str(model_cfg.get("openai_runtime") or "").strip().lower()
if runtime == "codex_app_server":
return "codex_app_server"
return api_mode
def _resolve_runtime_from_pool_entry(
*,
provider: str,
@@ -293,6 +330,12 @@ def _resolve_runtime_from_pool_entry(
if api_mode == "anthropic_messages" and provider in {"opencode-zen", "opencode-go"}:
base_url = re.sub(r"/v1/?$", "", base_url)
# Optional opt-in: route OpenAI/Codex turns through `codex app-server`.
# Inert when `model.openai_runtime` is unset or "auto".
api_mode = _maybe_apply_codex_app_server_runtime(
provider=provider, api_mode=api_mode, model_cfg=model_cfg
)
return {
"provider": provider,
"api_mode": api_mode,
+20 -14
@@ -454,6 +454,26 @@ def _print_setup_summary(config: dict, hermes_home):
else:
tool_status.append(("Image Generation", False, "FAL_KEY or OPENAI_API_KEY"))
# Video generation — opt-in via `hermes tools` → Video Generation.
# Only show the row when a plugin reports available so we don't badger
# users who don't care about video gen with a "missing" status line.
try:
from agent.video_gen_registry import list_providers as _list_video_providers
from hermes_cli.plugins import _ensure_plugins_discovered as _ensure_plugins
_ensure_plugins()
_video_backend = None
for _vp in _list_video_providers():
try:
if _vp.is_available():
_video_backend = _vp.display_name
break
except Exception:
continue
except Exception:
_video_backend = None
if _video_backend:
tool_status.append((f"Video Generation ({_video_backend})", True, None))
# TTS — show configured provider
tts_provider = cfg_get(config, "tts", "provider", default="edge")
if subscription_features.tts.managed_by_nous:
@@ -3246,18 +3266,6 @@ def run_setup_wizard(args):
print_info(f" cp {_backup_path} {config_path}")
_print_setup_summary(config, hermes_home)
_offer_launch_chat()
def _offer_launch_chat():
"""Prompt the user to jump straight into chat after setup."""
print()
if not prompt_yes_no("Launch hermes chat now?", True):
return
from hermes_cli.relaunch import relaunch
relaunch(["chat"])
def _run_first_time_quick_setup(config: dict, hermes_home, is_existing: bool):
"""Streamlined first-time setup: provider, model, terminal & messaging.
@@ -3301,8 +3309,6 @@ def _run_first_time_quick_setup(config: dict, hermes_home, is_existing: bool):
_print_setup_summary(config, hermes_home)
_offer_launch_chat()
def _run_quick_setup(config: dict, hermes_home):
"""Quick setup — only configure items that are missing."""
+37 -8
@@ -666,25 +666,46 @@ def _load_skin_from_yaml(path: Path) -> Optional[Dict[str, Any]]:
return None
def _mapping_or_empty(value: Any, *, section: str, skin_name: str) -> Dict[str, Any]:
"""Return a mapping value or an empty dict when the section type is invalid."""
if isinstance(value, dict):
return value
if value is None:
return {}
logger.warning(
"Skin '%s' has invalid '%s' section type (%s); ignoring section",
skin_name,
section,
type(value).__name__,
)
return {}
def _build_skin_config(data: Dict[str, Any]) -> SkinConfig:
"""Build a SkinConfig from a raw dict (built-in or loaded from YAML)."""
# Start with default values as base for missing keys
default = _BUILTIN_SKINS["default"]
skin_name = str(data.get("name", "unknown"))
color_overrides = _mapping_or_empty(data.get("colors"), section="colors", skin_name=skin_name)
spinner_overrides = _mapping_or_empty(data.get("spinner"), section="spinner", skin_name=skin_name)
branding_overrides = _mapping_or_empty(data.get("branding"), section="branding", skin_name=skin_name)
emoji_overrides = _mapping_or_empty(data.get("tool_emojis"), section="tool_emojis", skin_name=skin_name)
colors = dict(default.get("colors", {}))
colors.update(data.get("colors", {}))
colors.update(color_overrides)
spinner = dict(default.get("spinner", {}))
spinner.update(data.get("spinner", {}))
spinner.update(spinner_overrides)
branding = dict(default.get("branding", {}))
branding.update(data.get("branding", {}))
branding.update(branding_overrides)
return SkinConfig(
name=data.get("name", "unknown"),
name=skin_name,
description=data.get("description", ""),
colors=colors,
spinner=spinner,
branding=branding,
tool_prefix=data.get("tool_prefix", default.get("tool_prefix", "")),
tool_emojis=data.get("tool_emojis", {}),
tool_emojis=emoji_overrides,
banner_logo=data.get("banner_logo", ""),
banner_hero=data.get("banner_hero", ""),
)
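The reason for routing each section through `_mapping_or_empty` is that `dict.update()` raises when handed a non-mapping such as a string or number, which a hand-edited skin YAML can easily produce. A minimal sketch of the guarded merge, with hypothetical names and without the warning log:

```python
from typing import Any, Dict


def mapping_or_empty(value: Any) -> Dict[str, Any]:
    """Return value if it is a dict, otherwise an empty dict."""
    return value if isinstance(value, dict) else {}


def merge_section(defaults: Dict[str, Any], override: Any) -> Dict[str, Any]:
    """Overlay a user-supplied section on defaults, ignoring invalid types."""
    merged = dict(defaults)
    merged.update(mapping_or_empty(override))
    return merged
```

With this guard, a malformed section such as `colors: "red"` leaves the defaults intact instead of aborting skin loading.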
@@ -828,10 +849,14 @@ def get_prompt_toolkit_style_overrides() -> Dict[str, str]:
except Exception:
return {}
prompt = skin.get_color("prompt", "#FFF8DC")
# Input/prompt: leave unset by default so the typed text inherits
# the terminal's foreground color (readable in both light and dark
# color schemes). Skins can opt into a colored prompt by setting
# `prompt` explicitly in their YAML.
prompt = skin.get_color("prompt", "")
input_rule = skin.get_color("input_rule", "#CD7F32")
title = skin.get_color("banner_title", "#FFD700")
text = skin.get_color("banner_text", prompt)
text = skin.get_color("banner_text", "#FFF8DC")
dim = skin.get_color("banner_dim", "#555555")
label = skin.get_color("ui_label", title)
warn = skin.get_color("ui_warn", "#FF8C00")
@@ -851,7 +876,11 @@ def get_prompt_toolkit_style_overrides() -> Dict[str, str]:
menu_meta_current_bg = skin.get_color("completion_menu_meta_current_bg", menu_current_bg)
return {
"input-area": prompt,
# Typed input always uses terminal default fg/bg so it's
# readable in both light and dark Terminal.app modes. The
# skin's `prompt` color (if any) only styles the prompt symbol,
# NOT the user's typed text.
"input-area": "",
"placeholder": f"{dim} italic",
"prompt": prompt,
"prompt-working": f"{dim} italic",
+270 -63
@@ -60,6 +60,7 @@ CONFIGURABLE_TOOLSETS = [
("vision", "👁️ Vision / Image Analysis", "vision_analyze"),
("video", "🎬 Video Analysis", "video_analyze (requires video-capable model)"),
("image_gen", "🎨 Image Generation", "image_generate"),
("video_gen", "🎬 Video Generation", "video_generate (text-to-video + image-to-video)"),
("moa", "🧠 Mixture of Agents", "mixture_of_agents"),
("tts", "🔊 Text-to-Speech", "text_to_speech"),
("skills", "📚 Skills", "list, view, manage"),
@@ -82,7 +83,11 @@ CONFIGURABLE_TOOLSETS = [
# Toolsets that are OFF by default for new installs.
# They're still in _HERMES_CORE_TOOLS (available at runtime if enabled),
# but the setup checklist won't pre-select them for first-time users.
_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl", "spotify", "discord", "discord_admin", "video"}
#
# Video gen is off by default — it's a niche, paid, slow feature. Users
# who want it opt in via `hermes tools` → Video Generation, which walks
# them through provider + model selection.
_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl", "spotify", "discord", "discord_admin", "video", "video_gen"}
# Platform-scoped toolsets: only appear in the `hermes tools` checklist for
# these platforms, and only resolve/save for these platforms. A toolset
@@ -240,6 +245,15 @@ TOOL_CATEGORIES = {
"setup_title": "Select Search Provider",
"setup_note": "A free DuckDuckGo search skill is also included — skip this if you don't need a premium provider.",
"icon": "🔍",
# Per-provider rows are injected at runtime from
# plugins.web.<vendor>.provider via _plugin_web_search_providers()
# in _visible_providers(). Only non-provider UX setup-flow rows
# for the firecrawl backend are listed here:
# - "Nous Subscription" — managed Firecrawl billed via Nous
# subscription (requires_nous_auth + override_env_vars).
# - "Firecrawl Self-Hosted" — points firecrawl at a private
# Docker instance via FIRECRAWL_API_URL only.
# See PR #25182 for the migration rationale.
"providers": [
{
"name": "Nous Subscription",
@@ -251,42 +265,6 @@ TOOL_CATEGORIES = {
"managed_nous_feature": "web",
"override_env_vars": ["FIRECRAWL_API_KEY", "FIRECRAWL_API_URL"],
},
{
"name": "Firecrawl Cloud",
"badge": "★ recommended",
"tag": "Full-featured search, extract, and crawl",
"web_backend": "firecrawl",
"env_vars": [
{"key": "FIRECRAWL_API_KEY", "prompt": "Firecrawl API key", "url": "https://firecrawl.dev"},
],
},
{
"name": "Exa",
"badge": "paid",
"tag": "Neural search with semantic understanding",
"web_backend": "exa",
"env_vars": [
{"key": "EXA_API_KEY", "prompt": "Exa API key", "url": "https://exa.ai"},
],
},
{
"name": "Parallel",
"badge": "paid",
"tag": "AI-powered search and extract",
"web_backend": "parallel",
"env_vars": [
{"key": "PARALLEL_API_KEY", "prompt": "Parallel API key", "url": "https://parallel.ai"},
],
},
{
"name": "Tavily",
"badge": "free tier",
"tag": "Search, extract, and crawl — 1000 free searches/mo",
"web_backend": "tavily",
"env_vars": [
{"key": "TAVILY_API_KEY", "prompt": "Tavily API key", "url": "https://app.tavily.com/home"},
],
},
{
"name": "Firecrawl Self-Hosted",
"badge": "free · self-hosted",
@@ -296,32 +274,6 @@ TOOL_CATEGORIES = {
{"key": "FIRECRAWL_API_URL", "prompt": "Your Firecrawl instance URL (e.g., http://localhost:3002)"},
],
},
{
"name": "SearXNG",
"badge": "free · self-hosted · search only",
"tag": "Privacy-respecting metasearch engine — search only (pair with any extract provider)",
"web_backend": "searxng",
"env_vars": [
{"key": "SEARXNG_URL", "prompt": "Your SearXNG instance URL (e.g., http://localhost:8080)", "url": "https://searxng.github.io/searxng/"},
],
},
{
"name": "Brave Search (Free Tier)",
"badge": "free tier · search only",
"tag": "2,000 queries/mo free — search only (pair with any extract provider)",
"web_backend": "brave-free",
"env_vars": [
{"key": "BRAVE_SEARCH_API_KEY", "prompt": "Brave Search subscription token", "url": "https://brave.com/search/api/"},
],
},
{
"name": "DuckDuckGo (ddgs)",
"badge": "free · no key · search only",
"tag": "Search via the ddgs Python package — no API key (pair with any extract provider)",
"web_backend": "ddgs",
"env_vars": [],
"post_setup": "ddgs",
},
],
},
"image_gen": {
@@ -349,6 +301,15 @@ TOOL_CATEGORIES = {
},
],
},
"video_gen": {
"name": "Video Generation",
"icon": "🎬",
# Providers list is intentionally empty — every video gen backend
# is a plugin, surfaced by ``_plugin_video_gen_providers()`` and
# injected by ``_visible_providers``. Mirrors the design we'll
# converge image_gen toward.
"providers": [],
},
"browser": {
"name": "Browser Automation",
"icon": "🌐",
@@ -1525,6 +1486,101 @@ def _plugin_image_gen_providers() -> list[dict]:
return rows
def _plugin_video_gen_providers() -> list[dict]:
"""Build picker-row dicts from plugin-registered video gen providers.
Mirrors ``_plugin_image_gen_providers`` exactly. Every video backend
is a plugin, so this function is the *only* source of provider rows
for the Video Generation category. The hardcoded ``TOOL_CATEGORIES``
entry for ``video_gen`` keeps an empty providers list.
"""
try:
from agent.video_gen_registry import list_providers
from hermes_cli.plugins import _ensure_plugins_discovered
_ensure_plugins_discovered()
providers = list_providers()
except Exception:
return []
rows: list[dict] = []
for provider in providers:
try:
schema = provider.get_setup_schema()
except Exception:
continue
if not isinstance(schema, dict):
continue
rows.append(
{
"name": schema.get("name", provider.display_name),
"badge": schema.get("badge", ""),
"tag": schema.get("tag", ""),
"env_vars": schema.get("env_vars", []),
"video_gen_plugin_name": provider.name,
}
)
return rows
# Mirror of _plugin_image_gen_providers for web search backends. Surfaces
# every plugin-registered web provider so it appears in the
# "Web Search & Extract" picker. All seven providers (brave-free, ddgs,
# searxng, exa, parallel, tavily, firecrawl) live as plugins after
# PR #25182 — this helper is the sole source of truth for the category's
# provider rows. The hardcoded entries that used to drive the category
# were deleted in the same PR; only the two non-provider UX rows
# ("Nous Subscription" managed-gateway entry, "Firecrawl Self-Hosted")
# remain in TOOL_CATEGORIES because they describe alternative *setup
# flows* for the firecrawl backend rather than distinct providers.
def _plugin_web_search_providers() -> list[dict]:
"""Build picker-row dicts from plugin-registered web search providers.
Each returned dict is a regular ``TOOL_CATEGORIES`` provider row. It
populates both ``web_backend`` (legacy field consumed by setup +
selection helpers) and ``web_search_plugin_name`` (informational
marker) so the picker behaves identically whether a provider is
hardcoded or plugin-registered.
After PR #25182, all seven web providers (brave-free, ddgs, searxng,
exa, parallel, tavily, firecrawl) are plugins; this helper is the sole
source of provider rows for the Web Search & Extract category.
"""
try:
from agent.web_search_registry import list_providers as _list_web_providers
from hermes_cli.plugins import _ensure_plugins_discovered
_ensure_plugins_discovered()
providers = _list_web_providers()
except Exception:
return []
rows: list[dict] = []
for provider in providers:
name = getattr(provider, "name", None)
if not name:
continue
try:
schema = provider.get_setup_schema()
except Exception:
continue
if not isinstance(schema, dict):
continue
row = {
"name": schema.get("name", provider.display_name),
"badge": schema.get("badge", ""),
"tag": schema.get("tag", ""),
"env_vars": schema.get("env_vars", []),
"web_backend": name,
"web_search_plugin_name": name,
}
# Optional pass-through fields the schema can opt into.
if schema.get("post_setup"):
row["post_setup"] = schema["post_setup"]
rows.append(row)
return rows
def _visible_providers(cat: dict, config: dict) -> list[dict]:
"""Return provider entries visible for the current auth/config state."""
features = get_nous_subscription_features(config)
@@ -1541,6 +1597,19 @@ def _visible_providers(cat: dict, config: dict) -> list[dict]:
if cat.get("name") == "Image Generation":
visible.extend(_plugin_image_gen_providers())
# Inject plugin-registered video_gen backends. Unlike image_gen,
# video_gen has NO hardcoded providers — every backend is a plugin.
if cat.get("name") == "Video Generation":
visible.extend(_plugin_video_gen_providers())
# Inject plugin-registered web search backends. After PR #25182, this
# is the SOLE source of provider rows for the Web Search & Extract
# category — the per-provider hardcoded entries were deleted. The two
# remaining hardcoded rows ("Nous Subscription", "Firecrawl
# Self-Hosted") are non-provider UX setup-flow rows for firecrawl.
if cat.get("name") == "Web Search & Extract":
visible.extend(_plugin_web_search_providers())
return visible
@@ -1608,6 +1677,23 @@ def _toolset_needs_configuration_prompt(ts_key: str, config: dict) -> bool:
from agent.image_gen_registry import list_providers
from hermes_cli.plugins import _ensure_plugins_discovered
_ensure_plugins_discovered()
for provider in list_providers():
try:
if provider.is_available():
return False
except Exception:
continue
except Exception:
pass
return True
if ts_key == "video_gen":
# Satisfied when any plugin-registered video gen provider reports
# available — no in-tree fallback (every backend is a plugin).
try:
from agent.video_gen_registry import list_providers
from hermes_cli.plugins import _ensure_plugins_discovered
_ensure_plugins_discovered()
for provider in list_providers():
try:
@@ -1952,6 +2038,106 @@ def _select_plugin_image_gen_provider(plugin_name: str, config: dict) -> None:
_configure_imagegen_model_for_plugin(plugin_name, config)
# ─── Video Generation Model Pickers ───────────────────────────────────────────
def _plugin_video_gen_catalog(plugin_name: str):
"""Return ``(catalog_dict, default_model_id)`` for a video gen plugin.
Mirrors :func:`_plugin_image_gen_catalog`. Returns ``({}, None)`` when
the plugin isn't registered or has no models.
"""
try:
from agent.video_gen_registry import get_provider
from hermes_cli.plugins import _ensure_plugins_discovered
_ensure_plugins_discovered()
provider = get_provider(plugin_name)
except Exception:
return {}, None
if provider is None:
return {}, None
try:
models = provider.list_models() or []
default = provider.default_model()
except Exception:
return {}, None
catalog = {m["id"]: m for m in models if isinstance(m, dict) and "id" in m}
return catalog, default
def _configure_videogen_model_for_plugin(plugin_name: str, config: dict) -> None:
"""Prompt for a video gen model from a plugin's catalog.
Mirrors :func:`_configure_imagegen_model_for_plugin`. Writes the
selection to ``video_gen.model``.
"""
catalog, default_model = _plugin_video_gen_catalog(plugin_name)
if not catalog:
return
cur_cfg = config.setdefault("video_gen", {})
if not isinstance(cur_cfg, dict):
cur_cfg = {}
config["video_gen"] = cur_cfg
current_model = cur_cfg.get("model") or default_model
if current_model not in catalog:
current_model = default_model
model_ids = list(catalog.keys())
ordered = [current_model] + [m for m in model_ids if m != current_model]
widths = {
"model": max(len(m) for m in model_ids),
"speed": max((len(catalog[m].get("speed", "")) for m in model_ids), default=6),
"strengths": max((len(catalog[m].get("strengths", "")) for m in model_ids), default=0),
}
print()
header = (
f" {'Model':<{widths['model']}} "
f"{'Speed':<{widths['speed']}} "
f"{'Strengths':<{widths['strengths']}} "
f"Price"
)
print(color(header, Colors.CYAN))
rows = []
for mid in ordered:
meta = catalog[mid]
row = (
f" {mid:<{widths['model']}} "
f"{meta.get('speed', ''):<{widths['speed']}} "
f"{meta.get('strengths', ''):<{widths['strengths']}} "
f"{meta.get('price', '')}"
)
if mid == current_model:
row += " ← currently in use"
rows.append(row)
idx = _prompt_choice(
f" Choose {plugin_name} model:",
rows,
default=0,
)
chosen = ordered[idx]
cur_cfg["model"] = chosen
_print_success(f" Model set to: {chosen}")
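The catalog build and the "current model first" ordering used by the two helpers above can be sketched in isolation. This is a minimal illustration, not the plugin API: the model ids, the `speed`/`price` values, and the `order_models` helper are hypothetical, and only the dict-comprehension filter and the fallback-to-default rule are taken from the diff.

```python
# Hypothetical plugin output: two valid entries and two malformed ones.
models = [
    {"id": "fast-v1", "speed": "~30s", "price": "$0.10"},
    {"id": "quality-v2", "speed": "~2min", "price": "$0.50"},
    "not-a-dict",        # skipped: not a dict
    {"speed": "n/a"},    # skipped: no "id" key
]

# Same filter as _plugin_video_gen_catalog: keep dict entries that carry an id.
catalog = {m["id"]: m for m in models if isinstance(m, dict) and "id" in m}


def order_models(catalog, current, default):
    """Put the configured model first; fall back to the default if it is stale."""
    if current not in catalog:
        current = default
    return [current] + [m for m in catalog if m != current]


ordered = order_models(catalog, "quality-v2", "fast-v1")
fallback = order_models(catalog, "removed-model", "fast-v1")
```

Because the picker is called with `default=0`, the first row is always the model that stays selected if the user just presses Enter. The sketch assumes the plugin's default id is present in the catalog; a default that is missing from the catalog is not handled here.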
def _select_plugin_video_gen_provider(plugin_name: str, config: dict) -> None:
"""Persist a plugin-backed video generation provider selection."""
vid_cfg = config.setdefault("video_gen", {})
if not isinstance(vid_cfg, dict):
vid_cfg = {}
config["video_gen"] = vid_cfg
vid_cfg["provider"] = plugin_name
vid_cfg["use_gateway"] = False
_print_success(f" video_gen.provider set to: {plugin_name}")
_configure_videogen_model_for_plugin(plugin_name, config)
def _configure_provider(provider: dict, config: dict):
"""Configure a single provider - prompt for API keys and set config."""
env_vars = provider.get("env_vars", [])
@@ -2014,6 +2200,12 @@ def _configure_provider(provider: dict, config: dict):
if plugin_name:
_select_plugin_image_gen_provider(plugin_name, config)
return
# Plugin-registered video_gen provider — same flow, different
# registry.
video_plugin = provider.get("video_gen_plugin_name")
if video_plugin:
_select_plugin_video_gen_provider(video_plugin, config)
return
# Imagegen backends prompt for model selection after backend pick.
backend = provider.get("imagegen_backend")
if backend:
@@ -2062,6 +2254,10 @@ def _configure_provider(provider: dict, config: dict):
if plugin_name:
_select_plugin_image_gen_provider(plugin_name, config)
return
video_plugin = provider.get("video_gen_plugin_name")
if video_plugin:
_select_plugin_video_gen_provider(video_plugin, config)
return
# Imagegen backends prompt for model selection after env vars are in.
backend = provider.get("imagegen_backend")
if backend:
@@ -2286,6 +2482,11 @@ def _reconfigure_provider(provider: dict, config: dict):
if plugin_name:
_select_plugin_image_gen_provider(plugin_name, config)
return
# Plugin-registered video_gen provider — same flow, different registry.
video_plugin = provider.get("video_gen_plugin_name")
if video_plugin:
_select_plugin_video_gen_provider(video_plugin, config)
return
# Imagegen backends prompt for model selection on reconfig too.
backend = provider.get("imagegen_backend")
if backend:
@@ -2318,6 +2519,12 @@ def _reconfigure_provider(provider: dict, config: dict):
_select_plugin_image_gen_provider(plugin_name, config)
return
# Plugin-registered video_gen provider — same flow, different registry.
video_plugin = provider.get("video_gen_plugin_name")
if video_plugin:
_select_plugin_video_gen_provider(video_plugin, config)
return
backend = provider.get("imagegen_backend")
if backend:
_configure_imagegen_model(backend, config)
+2 -32
@@ -994,39 +994,9 @@ def get_model_options():
can share the same types.
"""
try:
from hermes_cli.model_switch import list_authenticated_providers
from hermes_cli.inventory import build_models_payload, load_picker_context
cfg = load_config()
model_cfg = cfg.get("model", {})
if isinstance(model_cfg, dict):
current_model = model_cfg.get("default", model_cfg.get("name", "")) or ""
current_provider = model_cfg.get("provider", "") or ""
current_base_url = model_cfg.get("base_url", "") or ""
else:
current_model = str(model_cfg) if model_cfg else ""
current_provider = ""
current_base_url = ""
user_providers = cfg.get("providers") if isinstance(cfg.get("providers"), dict) else {}
custom_providers = (
cfg.get("custom_providers")
if isinstance(cfg.get("custom_providers"), list)
else []
)
providers = list_authenticated_providers(
current_provider=current_provider,
current_base_url=current_base_url,
current_model=current_model,
user_providers=user_providers,
custom_providers=custom_providers,
max_models=50,
)
return {
"providers": providers,
"model": current_model,
"provider": current_provider,
}
return build_models_payload(load_picker_context(), max_models=50)
except Exception:
_log.exception("GET /api/model/options failed")
raise HTTPException(status_code=500, detail="Failed to list model options")
+3 -3
View File
@@ -1597,10 +1597,10 @@ class SessionDB:
self._execute_write(_do)
def get_messages(self, session_id: str) -> List[Dict[str, Any]]:
"""Load all messages for a session, ordered by timestamp."""
"""Load all messages for a session, ordered by insertion order."""
with self._lock:
cursor = self._conn.execute(
"SELECT * FROM messages WHERE session_id = ? ORDER BY timestamp, id",
"SELECT * FROM messages WHERE session_id = ? ORDER BY id",
(session_id,),
)
rows = cursor.fetchall()
@@ -1700,7 +1700,7 @@ class SessionDB:
"SELECT role, content, tool_call_id, tool_calls, tool_name, "
"finish_reason, reasoning, reasoning_content, reasoning_details, "
"codex_reasoning_items, codex_message_items "
f"FROM messages WHERE session_id IN ({placeholders}) ORDER BY timestamp, id",
f"FROM messages WHERE session_id IN ({placeholders}) ORDER BY id",
tuple(session_ids),
).fetchall()
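Why `ORDER BY id` and `ORDER BY timestamp, id` can disagree is easiest to see with a timestamp that goes backwards. The following is a minimal sketch against an in-memory SQLite database; the three-column `messages` table is an assumption made for illustration and is not the real SessionDB schema, which this diff does not show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "session_id TEXT, content TEXT, timestamp REAL)"
)

# The second message carries an earlier timestamp than the first,
# as could happen if the system clock were stepped back mid-session.
conn.executemany(
    "INSERT INTO messages (session_id, content, timestamp) VALUES (?, ?, ?)",
    [("s1", "first", 100.0), ("s1", "second", 99.0), ("s1", "third", 101.0)],
)


def load(order_by):
    """Return message contents for session s1 under the given ORDER BY clause."""
    cur = conn.execute(
        f"SELECT content FROM messages WHERE session_id = ? ORDER BY {order_by}",
        ("s1",),
    )
    return [row[0] for row in cur.fetchall()]


by_timestamp = load("timestamp, id")  # old query: second, first, third
by_id = load("id")                    # new query: first, second, third
```

With an autoincrementing primary key, `id` reflects insertion order regardless of what the clock reported, which is what the updated docstring promises.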
-232
@@ -1,232 +0,0 @@
---
name: base
description: Query Base (Ethereum L2) blockchain data with USD pricing — wallet balances, token info, transaction details, gas analysis, contract inspection, whale detection, and live network stats. Uses Base RPC + CoinGecko. No API key required.
version: 0.1.0
author: youssefea
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [Base, Blockchain, Crypto, Web3, RPC, DeFi, EVM, L2, Ethereum]
related_skills: []
---
# Base Blockchain Skill
Query Base (Ethereum L2) on-chain data enriched with USD pricing via CoinGecko.
8 commands: wallet portfolio, token info, transactions, gas analysis,
contract inspection, whale detection, network stats, and price lookup.
No API key needed. Uses only Python standard library (urllib, json, argparse).
---
## When to Use
- User asks for a Base wallet balance, token holdings, or portfolio value
- User wants to inspect a specific transaction by hash
- User wants ERC-20 token metadata, price, supply, or market cap
- User wants to understand Base gas costs and L1 data fees
- User wants to inspect a contract (ERC type detection, proxy resolution)
- User wants to find large ETH transfers (whale detection)
- User wants Base network health, gas price, or ETH price
- User asks "what's the price of USDC/AERO/DEGEN/ETH?"
---
## Prerequisites
The helper script uses only Python standard library (urllib, json, argparse).
No external packages required.
Pricing data comes from CoinGecko's free API (no key needed, rate-limited
to ~10-30 requests/minute). For faster lookups, use `--no-prices` flag.
---
## Quick Reference
RPC endpoint (default): https://mainnet.base.org
Override: export BASE_RPC_URL=https://your-private-rpc.com
Helper script path: ~/.hermes/skills/blockchain/base/scripts/base_client.py
```
python3 base_client.py wallet <address> [--limit N] [--all] [--no-prices]
python3 base_client.py tx <hash>
python3 base_client.py token <contract_address>
python3 base_client.py gas
python3 base_client.py contract <address>
python3 base_client.py whales [--min-eth N]
python3 base_client.py stats
python3 base_client.py price <contract_address_or_symbol>
```
---
## Procedure
### 0. Setup Check
```bash
python3 --version
# Optional: set a private RPC for better rate limits
export BASE_RPC_URL="https://mainnet.base.org"
# Confirm connectivity
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py stats
```
### 1. Wallet Portfolio
Get ETH balance and ERC-20 token holdings with USD values.
Checks ~15 well-known Base tokens (USDC, WETH, AERO, DEGEN, etc.)
via on-chain `balanceOf` calls. Tokens sorted by value, dust filtered.
```bash
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
wallet 0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045
```
Flags:
- `--limit N` — show top N tokens (default: 20)
- `--all` — show all tokens, no dust filter, no limit
- `--no-prices` — skip CoinGecko price lookups (faster, RPC-only)
Output includes: ETH balance + USD value, token list with prices sorted
by value, dust count, total portfolio value in USD.
Note: Only checks known tokens. Unknown ERC-20s are not discovered.
Use the `token` command with a specific contract address for any token.
### 2. Transaction Details
Inspect a full transaction by its hash. Shows ETH value transferred,
gas used, fee in ETH/USD, status, and decoded ERC-20/ERC-721 transfers.
```bash
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
tx 0xabc123...your_tx_hash_here
```
Output: hash, block, from, to, value (ETH + USD), gas price, gas used,
fee, status, contract creation address (if any), token transfers.
### 3. Token Info
Get ERC-20 token metadata: name, symbol, decimals, total supply, price,
market cap, and contract code size.
```bash
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
token 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
```
Output: name, symbol, decimals, total supply, price, market cap.
Reads name/symbol/decimals directly from the contract via eth_call.
### 4. Gas Analysis
Detailed gas analysis with cost estimates for common operations.
Shows current gas price, base fee trends over 10 blocks, block
utilization, and estimated costs for ETH transfers, ERC-20 transfers,
and swaps.
```bash
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py gas
```
Output: current gas price, base fee, block utilization, 10-block trend,
cost estimates in ETH and USD.
Note: Base is an L2 — actual transaction costs include an L1 data
posting fee that depends on calldata size and L1 gas prices. The
estimates shown are for L2 execution only.
### 5. Contract Inspection
Inspect an address: determine if it's an EOA or contract, detect
ERC-20/ERC-721/ERC-1155 interfaces, resolve EIP-1967 proxy
implementation addresses.
```bash
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
contract 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
```
Output: is_contract, code size, ETH balance, detected interfaces
(ERC-20, ERC-721, ERC-1155), ERC-20 metadata, proxy implementation
address.
### 6. Whale Detector
Scan the most recent block for large ETH transfers with USD values.
```bash
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py \
whales --min-eth 1.0
```
Note: scans the latest block only — point-in-time snapshot, not historical.
Default threshold is 1.0 ETH (lower than Solana's default since ETH
values are higher).
### 7. Network Stats
Live Base network health: latest block, chain ID, gas price, base fee,
block utilization, transaction count, and ETH price.
```bash
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py stats
```
### 8. Price Lookup
Quick price check for any token by contract address or known symbol.
```bash
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price ETH
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price USDC
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price AERO
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price DEGEN
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py price 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
```
Known symbols: ETH, WETH, USDC, cbETH, AERO, DEGEN, TOSHI, BRETT,
WELL, wstETH, rETH, cbBTC.
---
## Pitfalls
- **CoinGecko rate-limits** — free tier allows ~10-30 requests/minute.
Price lookups use 1 request per token. Use `--no-prices` for speed.
- **Public RPC rate-limits** — Base's public RPC limits requests.
For production use, set BASE_RPC_URL to a private endpoint
(Alchemy, QuickNode, Infura).
- **Wallet shows known tokens only** — unlike Solana, EVM chains have no
built-in "get all tokens" RPC. The wallet command checks ~15 popular
Base tokens via `balanceOf`. Unknown ERC-20s won't appear. Use the
`token` command for any specific contract.
- **Token names read from contract** — if a contract doesn't implement
`name()` or `symbol()`, these fields may be empty. Known tokens have
hardcoded labels as fallback.
- **Gas estimates are L2 only** — Base transaction costs include an L1
data posting fee (depends on calldata size and L1 gas prices). The gas
command estimates L2 execution cost only.
- **Whale detector scans latest block only** — not historical. Results
vary by the moment you query. Default threshold is 1.0 ETH.
- **Proxy detection** — only EIP-1967 proxies are detected. Other proxy
patterns (EIP-1167 minimal proxy, custom storage slots) are not checked.
- **Retry on 429** — both RPC and CoinGecko calls retry up to 2 times
with exponential backoff on rate-limit errors.
---
## Verification
```bash
# Should print Base chain ID (8453), latest block, gas price, and ETH price
python3 ~/.hermes/skills/blockchain/base/scripts/base_client.py stats
```
File diff suppressed because it is too large
+211
@@ -0,0 +1,211 @@
---
name: evm
description: "Read-only EVM client: wallets, tokens, gas across 8 chains."
version: 1.0.0
author: Mibayy (@Mibayy), youssefea (@youssefea), ethernet8023 (@ethernet8023), Hermes Agent
license: MIT
platforms: [linux, macos, windows]
metadata:
hermes:
tags: [EVM, Ethereum, BNB, BSC, Base, Arbitrum, Polygon, Optimism, Avalanche, zkSync, Blockchain, Crypto, Web3, DeFi, NFT, ENS, Whale, Security]
category: blockchain
related_skills: [solana]
requires_toolsets: [terminal]
---
# EVM Blockchain Skill
Query EVM-compatible blockchain data across 8 chains with USD pricing.
14 commands: wallet portfolio, token info, transactions, activity, gas tracker,
network stats, chain comparison, price lookup, multi-chain scan, whale detection,
ENS resolution, allowance checker, contract inspector, and transaction decoder.
Supports 8 chains: Ethereum, BNB Chain (BSC), Base, Arbitrum One, Polygon,
Optimism, Avalanche (C-Chain), zkSync Era.
No API key needed. Zero external dependencies — Python standard library only
(urllib, json, argparse, threading).
> **Supersedes the standalone `base` skill.** Base-specific tokens (AERO, DEGEN,
> TOSHI, BRETT, WELL, cbETH, cbBTC, wstETH, rETH) and all Base RPC functionality
> previously living under `optional-skills/blockchain/base/` have been folded
> into this skill. Pass `--chain base` to any command for Base coverage.
---
## When to Use
- User asks for a wallet balance or portfolio on any EVM chain
- User wants to check the same wallet across ALL chains at once
- User wants to inspect a transaction by hash (or decode what it did)
- User wants ERC-20 token metadata, price, supply, or market cap
- User wants recent transaction history for an address
- User wants current gas prices or to compare fees across chains
- User wants to find large whale transfers in recent blocks
- User asks to resolve an ENS name (vitalik.eth) or reverse-lookup an address
- User wants to check if a contract has dangerous token approvals
- User wants to inspect a smart contract (proxy? ERC-20? ERC-721? bytecode size?)
- User wants to compare gas costs across chains before a transaction
---
## Prerequisites
Python 3.8+ standard library only. No pip installs required.
Pricing: CoinGecko free API (rate-limited, ~10-30 req/min).
ENS: ensideas.com public API.
Tx decoding: 4byte.directory public API.
Override RPC endpoint: `export EVM_RPC_URL=https://your-rpc.com`
Helper script path: `~/.hermes/skills/blockchain/evm/scripts/evm_client.py`
---
## Quick Reference
```
SCRIPT=~/.hermes/skills/blockchain/evm/scripts/evm_client.py
# Network & prices
python3 $SCRIPT stats # Ethereum stats
python3 $SCRIPT stats --chain arbitrum # Arbitrum stats
python3 $SCRIPT compare # Gas + prices ALL 8 chains
# Wallet
python3 $SCRIPT wallet 0xd8dA...96045 # Portfolio (ETH + ERC-20)
python3 $SCRIPT wallet 0xd8dA...96045 --chain bsc
python3 $SCRIPT multichain 0xd8dA...96045 # Same wallet on ALL chains
# Tokens & prices
python3 $SCRIPT price ETH
python3 $SCRIPT price 0xdAC1...1ec7 # By contract address
python3 $SCRIPT token 0xdAC1...1ec7 # ERC-20 metadata + market cap
# Transactions
python3 $SCRIPT tx 0x5c50...f060 # Transaction details
python3 $SCRIPT decode 0x5c50...f060 # Decode input data (4byte.directory)
python3 $SCRIPT activity 0xd8dA...96045 # Recent transactions
# Gas
python3 $SCRIPT gas # Gas prices + cost estimates
python3 $SCRIPT gas --chain optimism
# Security
python3 $SCRIPT allowance 0xd8dA...96045 # Dangerous ERC-20 approvals
python3 $SCRIPT contract 0xdAC1...1ec7 # Contract inspection (proxy? standards?)
# ENS
python3 $SCRIPT ens vitalik.eth # Name -> address + profile
python3 $SCRIPT ens 0xd8dA...96045 # Address -> ENS name
# Whale detection
python3 $SCRIPT whale # Large transfers (last 20 blocks, >$10k)
python3 $SCRIPT whale --blocks 50 --min-usd 100000 --chain arbitrum
```
---
## Procedure
### 0. Setup Check
```bash
python3 --version # 3.8+ required
python3 ~/.hermes/skills/blockchain/evm/scripts/evm_client.py stats
```
### 1. Wallet Portfolio
Native balance + known ERC-20 tokens, sorted by USD value.
```bash
python3 $SCRIPT wallet 0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045
python3 $SCRIPT wallet 0xd8dA... --chain bsc --no-prices # faster
```
### 2. Multi-Chain Scan
Scans all 8 chains simultaneously for the same address using threads.
```bash
python3 $SCRIPT multichain 0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045
```
Output: per-chain native balance + token holdings + grand total USD.
### 3. Compare (Gas + Prices)
All 8 chains queried in parallel. Shows cheapest/most expensive chain.
```bash
python3 $SCRIPT compare
```
### 4. Transaction Details & Decode
```bash
python3 $SCRIPT tx 0x5c504ed432cb51138bcf09aa5e8a410dd4a1e204ef84bfed1be16dfba1b22060
python3 $SCRIPT decode 0x5c504ed... # Shows human-readable function signature
```
Decode uses 4byte.directory to translate 0xa9059cbb -> transfer(address,uint256).
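The selector that gets looked up is simply the first 4 bytes of the transaction's input data; the remaining bytes are 32-byte argument words. A minimal sketch of that split, assuming a standard ABI-encoded `transfer(address,uint256)` call (the `split_calldata` helper is hypothetical and is not part of `evm_client.py`; the 4byte.directory lookup itself is not shown):

```python
from typing import List, Tuple


def split_calldata(data: str) -> Tuple[str, List[str]]:
    """Split hex calldata into its 4-byte selector and 32-byte argument words."""
    raw = data[2:] if data.startswith("0x") else data
    selector = "0x" + raw[:8].lower()
    body = raw[8:]
    words = [body[i:i + 64] for i in range(0, len(body), 64)]
    return selector, words


# Build example calldata: transfer(address,uint256) sending 1_000_000 base units.
address_hex = "d8da6bf26964af9d7eed9e03e53415d37aa96045"
calldata = (
    "0xa9059cbb"
    + address_hex.rjust(64, "0")   # address, left-padded to 32 bytes
    + format(1_000_000, "064x")    # uint256 amount
)

selector, words = split_calldata(calldata)
recipient = "0x" + words[0][-40:]  # the address is the low 20 bytes of the word
amount = int(words[1], 16)
```

The amount is in the token's base units; dividing by `10 ** decimals` gives the human-readable figure.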
### 5. ENS Resolution
```bash
python3 $SCRIPT ens vitalik.eth # -> 0xd8dA... + avatar + social links
python3 $SCRIPT ens 0xd8dA...96045 # -> vitalik.eth
```
### 6. Allowance Checker (Security)
Checks ERC-20 approvals granted to known DEX/bridge contracts.
```bash
python3 $SCRIPT allowance 0xYourWallet
```
Flags UNLIMITED approvals as HIGH risk.
### 7. Contract Inspector
```bash
python3 $SCRIPT contract 0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48 # USDC (proxy)
python3 $SCRIPT contract 0xdAC17F958D2ee523a2206206994597C13D831ec7 # USDT (ERC-20)
```
Detects: proxy (EIP-1967/EIP-1167), ERC-20, ERC-721, ERC-165. Shows bytecode size and implementation address for proxies.
### 8. Whale Detection
```bash
python3 $SCRIPT whale # ETH, last 20 blocks, >$10k
python3 $SCRIPT whale --blocks 50 --min-usd 50000 --chain bsc
```
### 9. Gas Tracker
```bash
python3 $SCRIPT gas
python3 $SCRIPT gas --chain polygon
```
Shows gwei price + USD cost for: transfer, ERC-20 transfer, approve, swap, NFT mint, NFT transfer.
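The arithmetic behind such an estimate is gas units times gas price, converted from gwei to the native token and then to USD. A rough sketch, assuming illustrative inputs: the 20 gwei gas price and the $3,000 native-token price are placeholders, `estimate_cost` is a hypothetical helper rather than a function from `evm_client.py`, and 21,000 is the fixed gas cost of a plain native transfer.

```python
GWEI_PER_NATIVE = 10**9  # 1 ETH (or other native token) = 1e9 gwei


def estimate_cost(gas_units, gas_price_gwei, native_usd):
    """Return (cost in native token, cost in USD) for an execution-only estimate."""
    cost_native = gas_units * gas_price_gwei / GWEI_PER_NATIVE
    return cost_native, cost_native * native_usd


# Plain native transfer: 21,000 gas at a placeholder 20 gwei and $3,000 per ETH.
eth_cost, usd_cost = estimate_cost(21_000, 20.0, 3_000.0)
```

On rollups this covers only the L2 execution part of the fee; the L1 data-posting component is not included in this calculation.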
---
## Supported Chains
| Key | Name | Native | Chain ID |
|-----------|----------------|--------|----------|
| ethereum | Ethereum | ETH | 1 |
| bsc | BNB Chain | BNB | 56 |
| base | Base | ETH | 8453 |
| arbitrum | Arbitrum One | ETH | 42161 |
| polygon | Polygon | POL | 137 |
| optimism | Optimism | ETH | 10 |
| avalanche | Avalanche C | AVAX | 43114 |
| zksync | zkSync Era | ETH | 324 |
---
## Pitfalls
- CoinGecko free tier: ~10-30 req/min. Use `--no-prices` for faster wallet scans.
- Public RPCs may throttle. Set EVM_RPC_URL to a private endpoint for production.
- `wallet` and `allowance` only check known token list (~30 tokens per chain). Use a block explorer for complete token discovery.
- `activity` scans recent blocks only (max 200). For full history, use Etherscan API.
- `multichain` runs 8 parallel threads — can trigger rate limits on public RPCs.
- ENS resolution depends on a single public endpoint (ensideas.com / ens.vitalik.ca) with no fallback. If that endpoint is down, `ens` will fail — re-run later or use a block explorer.
- Tx decoding depends on a single public endpoint (4byte.directory) with no fallback. Selectors not in their database show up as `unknown`.
- **L2 gas estimates are L2-execution only.** On rollups like Base, Arbitrum, Optimism, and zkSync, the actual transaction cost also includes an L1 data-posting fee that depends on calldata size and current L1 gas prices. The `gas` command does not estimate that L1 component. For Base specifically, see the network's L1 fee oracle (contract `0x420000000000000000000000000000000000000F`).
- Address / tx-hash inputs are validated for 0x-prefix + correct length + hex, but EIP-55 checksum casing is **not** enforced (RPC endpoints accept any-case hex).
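
The input check described in the last bullet (0x prefix, correct length, hex characters, no checksum enforcement) can be approximated with two regular expressions. This is a sketch of the idea under those stated rules; the pattern and function names are hypothetical and the actual validation code in `evm_client.py` may differ.

```python
import re

# 20-byte address = 40 hex chars; 32-byte tx hash = 64 hex chars.
_ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")
_TX_HASH_RE = re.compile(r"0x[0-9a-fA-F]{64}")


def is_address(value: str) -> bool:
    """True for a 0x-prefixed 40-hex-char string, in any letter case."""
    return _ADDRESS_RE.fullmatch(value) is not None


def is_tx_hash(value: str) -> bool:
    """True for a 0x-prefixed 64-hex-char string, in any letter case."""
    return _TX_HASH_RE.fullmatch(value) is not None


mixed_case_ok = is_address("0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045")
lower_case_ok = is_address("0xd8da6bf26964af9d7eed9e03e53415d37aa96045")
missing_prefix = is_address("d8dA6BF26964aF9D7eEd9e03E53415D37aA96045")
hash_ok = is_tx_hash("0x" + "ab" * 32)
```

Both the mixed-case and the all-lowercase forms pass, which matches the note that EIP-55 casing is not enforced.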
---
## Verification
```bash
# Should print current block, gas price, ETH price
python3 ~/.hermes/skills/blockchain/evm/scripts/evm_client.py stats
# Should resolve vitalik.eth to 0xd8dA...
python3 ~/.hermes/skills/blockchain/evm/scripts/evm_client.py ens vitalik.eth
```
File diff suppressed because it is too large
+2 -1
@@ -21,6 +21,7 @@ from dataclasses import dataclass, field
from pathlib import Path
from hermes_constants import get_hermes_home
from hermes_cli.profiles import _get_default_hermes_home
from typing import Any, TYPE_CHECKING
if TYPE_CHECKING:
@@ -73,7 +74,7 @@ def resolve_config_path() -> Path:
return local_path
# Default profile's config — host blocks accumulate here via setup/clone
default_path = Path.home() / ".hermes" / "honcho.json"
default_path = _get_default_hermes_home() / "honcho.json"
if default_path != local_path and default_path.exists():
return default_path
+7
@@ -336,10 +336,17 @@ ADD_RESOURCE_SCHEMA = {
 def _zip_directory(dir_path: Path) -> Path:
     """Create a temporary zip file containing a directory tree."""
+    root = dir_path.resolve()
     zip_path = Path(tempfile.gettempdir()) / f"openviking_upload_{uuid.uuid4().hex}.zip"
     with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zipf:
         for file_path in dir_path.rglob("*"):
+            if file_path.is_symlink():
+                continue
             if file_path.is_file():
+                try:
+                    file_path.resolve().relative_to(root)
+                except ValueError:
+                    continue
                 arcname = str(file_path.relative_to(dir_path)).replace("\\", "/")
                 zipf.write(file_path, arcname=arcname)
     return zip_path
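The containment check added in this hunk (resolve the file, then require that it sits under the resolved root) can be tried in isolation. The sketch below is an illustration under stated assumptions: `is_inside` is a hypothetical helper name, not a function from the repository.

```python
from pathlib import Path


def is_inside(root: Path, candidate: Path) -> bool:
    """Return True when ``candidate`` resolves to a location under ``root``.

    Hypothetical helper: it isolates the resolve() + relative_to() test,
    which raises ValueError when the resolved path escapes the root.
    """
    try:
        candidate.resolve().relative_to(root.resolve())
    except ValueError:
        return False
    return True
```

Resolving both sides first matters: a path such as `root / ".." / "secret.txt"` looks like it starts under `root`, but after resolution it no longer does, so it is rejected.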
+2 -1
@@ -2,6 +2,7 @@
 from typing import Any
+from agent.portal_tags import nous_portal_tags
 from providers import register_provider
 from providers.base import ProviderProfile
@@ -12,7 +13,7 @@ class NousProfile(ProviderProfile):
     def build_extra_body(
         self, *, session_id: str | None = None, **context
     ) -> dict[str, Any]:
-        return {"tags": ["product=hermes-agent"]}
+        return {"tags": nous_portal_tags()}
     def build_api_kwargs_extras(
         self,
@@ -0,0 +1,27 @@
"""NovitaAI provider profile."""

from providers import register_provider
from providers.base import ProviderProfile

novita = ProviderProfile(
    name="novita",
    aliases=("novita-ai", "novitaai"),
    display_name="NovitaAI",
    description="NovitaAI — AI-native cloud for builders and agents",
    signup_url="https://novita.ai/settings/key-management",
    env_vars=("NOVITA_API_KEY", "NOVITA_BASE_URL"),
    base_url="https://api.novita.ai/openai/v1",
    auth_type="api_key",
    default_aux_model="deepseek/deepseek-v3-0324",
    fallback_models=(
        "moonshotai/kimi-k2.5",
        "minimax/minimax-m2.7",
        "zai-org/glm-5",
        "deepseek/deepseek-v3-0324",
        "deepseek/deepseek-r1-0528",
        "qwen/qwen3-235b-a22b-fp8",
    ),
)

register_provider(novita)
@@ -0,0 +1,5 @@
name: novita-provider
kind: model-provider
version: 1.0.0
description: NovitaAI AI-native cloud for builders and agents
author: Nous Research
+523
@@ -0,0 +1,523 @@
"""FAL.ai video generation backend.
User-facing surface: pick a **model family** (e.g. "Pixverse v6",
"Veo 3.1", "Seedance 2.0", "Kling v3 4K", "LTX 2.3", "Happy Horse").
The plugin auto-routes to the family's text-to-video endpoint when
called without ``image_url``, and to its image-to-video endpoint when
``image_url`` is provided. The agent never sees the routing; it just
calls ``video_generate(prompt=..., image_url=...)``.

Model families (each with t2v + i2v endpoints):

  Cheap tier:
    ltx-2.3       fal-ai/ltx-2.3-22b/text-to-video  /  fal-ai/ltx-2.3-22b/image-to-video
    pixverse-v6   fal-ai/pixverse/v6/text-to-video  /  fal-ai/pixverse/v6/image-to-video

  Premium tier:
    veo3.1        fal-ai/veo3.1  /  fal-ai/veo3.1/image-to-video
    seedance-2.0  bytedance/seedance-2.0/text-to-video  /  bytedance/seedance-2.0/image-to-video
    kling-v3-4k   fal-ai/kling-video/v3/4k/text-to-video  /  fal-ai/kling-video/v3/4k/image-to-video
    happy-horse   fal-ai/happy-horse/text-to-video  /  fal-ai/happy-horse/image-to-video

Selection precedence for the active family:

  1. ``model=`` arg from the tool call
  2. ``FAL_VIDEO_MODEL`` env var
  3. ``video_gen.fal.model`` in ``config.yaml``
  4. ``video_gen.model`` in ``config.yaml`` (when it's one of our family IDs)
  5. ``DEFAULT_MODEL``

Authentication via ``FAL_KEY``. Output is an HTTPS URL from FAL's CDN; the
gateway downloads and delivers it.
"""
from __future__ import annotations
import logging
import os
from typing import Any, Dict, List, Optional, Tuple
from agent.video_gen_provider import (
VideoGenProvider,
error_response,
success_response,
)
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Family catalog
# ---------------------------------------------------------------------------
#
# Each family declares both endpoints (when available) plus a per-family
# capability sheet derived from FAL's OpenAPI schemas. Capability flags
# drive which keys get added to the request payload — keys a family doesn't
# advertise are dropped before send.
#
# Capabilities:
# aspect_ratios : tuple of supported ratios (None = endpoint decides)
# resolutions : tuple of supported resolutions (None = endpoint decides)
# durations : tuple of supported durations OR (min, max) range
# (heuristic: 2-element with gap > 1 is a range)
# audio : True if generate_audio is supported
# negative : True if negative_prompt is supported
FAL_FAMILIES: Dict[str, Dict[str, Any]] = {
# ─── Cheap / fast tier ─────────────────────────────────────────────
"ltx-2.3": {
"display": "LTX 2.3 (22B)",
"speed": "~30-60s",
"price": "cheap",
"strengths": "22B model with native audio generation. Affordable.",
"tier": "cheap",
"text_endpoint": "fal-ai/ltx-2.3-22b/text-to-video",
"image_endpoint": "fal-ai/ltx-2.3-22b/image-to-video",
# LTX docs don't expose duration/aspect/resolution enums — leave
# blank so we don't send unrecognized payload keys.
"aspect_ratios": None,
"resolutions": None,
"durations": None,
"audio": True,
"negative": True,
},
"pixverse-v6": {
"display": "Pixverse v6",
"speed": "~30-90s",
"price": "cheap",
"strengths": "Affordable. Negative prompts. 1-15s durations.",
"tier": "cheap",
"text_endpoint": "fal-ai/pixverse/v6/text-to-video",
"image_endpoint": "fal-ai/pixverse/v6/image-to-video",
"aspect_ratios": None,
"resolutions": ("360p", "540p", "720p", "1080p"),
"durations": (1, 15),
"audio": True,
"negative": True,
},
# ─── Expensive / premium tier ──────────────────────────────────────
"veo3.1": {
"display": "Veo 3.1",
"speed": "~60-120s",
"price": "premium",
"strengths": "Google DeepMind. Cinematic, native audio, strong prompt adherence.",
"tier": "premium",
"text_endpoint": "fal-ai/veo3.1",
"image_endpoint": "fal-ai/veo3.1/image-to-video",
"aspect_ratios": ("16:9", "9:16"),
"resolutions": ("720p", "1080p"),
"durations": (4, 6, 8),
"audio": True,
"negative": True,
},
"seedance-2.0": {
"display": "Seedance 2.0",
"speed": "~60-120s",
"price": "premium",
"strengths": "ByteDance. Cinematic, synchronized audio + lip-sync, 4-15s.",
"tier": "premium",
"text_endpoint": "bytedance/seedance-2.0/text-to-video",
"image_endpoint": "bytedance/seedance-2.0/image-to-video",
# Seedance accepts "auto" too — we omit it from the enum so the
# agent can't pass it; the endpoint defaults handle the rest.
"aspect_ratios": ("21:9", "16:9", "4:3", "1:1", "3:4", "9:16"),
"resolutions": ("480p", "720p", "1080p"),
"durations": (4, 15),
"audio": True,
"negative": False,
},
"kling-v3-4k": {
"display": "Kling v3 4K",
"speed": "~120-300s",
"price": "premium",
"strengths": "4K output, native audio (Chinese/English), 3-15s.",
"tier": "premium",
"text_endpoint": "fal-ai/kling-video/v3/4k/text-to-video",
"image_endpoint": "fal-ai/kling-video/v3/4k/image-to-video",
# Kling 4K image-to-video uses `start_image_url` instead of
# `image_url`. Handled in _build_payload via image_param_key.
"image_param_key": "start_image_url",
"aspect_ratios": ("16:9", "9:16", "1:1"),
"resolutions": None, # 4K is implicit
"durations": (3, 15),
"audio": True,
"negative": True,
},
"happy-horse": {
"display": "Happy Horse 1.0",
"speed": "~60-120s",
"price": "premium",
"strengths": "Alibaba. New model, sparse public docs — conservative defaults.",
"tier": "premium",
"text_endpoint": "fal-ai/happy-horse/text-to-video",
"image_endpoint": "fal-ai/happy-horse/image-to-video",
# Docs don't expose duration/aspect/resolution — let the endpoint
# apply its own defaults.
"aspect_ratios": None,
"resolutions": None,
"durations": None,
"audio": False,
"negative": False,
},
}
DEFAULT_MODEL = "pixverse-v6" # cheap, both modalities, sane defaults


def _is_duration_range(durations: Any) -> bool:
    """Heuristic: a 2-tuple of ints with a gap > 1 is treated as ``(min, max)``."""
    if not isinstance(durations, tuple) or len(durations) != 2:
        return False
    if not all(isinstance(d, int) for d in durations):
        return False
    return durations[1] - durations[0] > 1


def _clamp_duration(family: Dict[str, Any], duration: Optional[int]) -> Optional[int]:
    durations = family.get("durations")
    if not durations:
        return duration
    if duration is None:
        return durations[0]
    if _is_duration_range(durations):
        lo, hi = durations
        return max(lo, min(hi, duration))
    # enum
    if duration in durations:
        return duration
    return min(durations, key=lambda d: abs(d - duration))
# ---------------------------------------------------------------------------
# Config / model resolution
# ---------------------------------------------------------------------------
def _load_video_gen_section() -> Dict[str, Any]:
    try:
        from hermes_cli.config import load_config

        cfg = load_config()
        section = cfg.get("video_gen") if isinstance(cfg, dict) else None
        return section if isinstance(section, dict) else {}
    except Exception as exc:
        logger.debug("Could not load video_gen config: %s", exc)
        return {}


def _resolve_family(explicit: Optional[str]) -> Tuple[str, Dict[str, Any]]:
    """Decide which FAL family to use. Returns ``(family_id, meta)``."""
    candidates: List[Optional[str]] = []
    candidates.append(explicit)
    candidates.append(os.environ.get("FAL_VIDEO_MODEL"))
    cfg = _load_video_gen_section()
    fal_cfg = cfg.get("fal") if isinstance(cfg.get("fal"), dict) else {}
    if isinstance(fal_cfg, dict):
        candidates.append(fal_cfg.get("model"))
    top = cfg.get("model")
    if isinstance(top, str):
        candidates.append(top)
    for c in candidates:
        if isinstance(c, str) and c.strip() and c.strip() in FAL_FAMILIES:
            fid = c.strip()
            return fid, FAL_FAMILIES[fid]
    return DEFAULT_MODEL, FAL_FAMILIES[DEFAULT_MODEL]
# ---------------------------------------------------------------------------
# Payload construction
# ---------------------------------------------------------------------------
def _build_payload(
    family: Dict[str, Any],
    *,
    prompt: str,
    image_url: Optional[str],
    duration: Optional[int],
    aspect_ratio: str,
    resolution: str,
    negative_prompt: Optional[str],
    audio: Optional[bool],
    seed: Optional[int],
) -> Dict[str, Any]:
    """Build a family-specific payload, dropping keys the family doesn't declare."""
    payload: Dict[str, Any] = {}
    if prompt:
        payload["prompt"] = prompt
    if image_url:
        # Some endpoints (e.g. Kling v3 4K image-to-video) expect
        # `start_image_url` instead of `image_url`. The family entry can
        # declare an override.
        key = family.get("image_param_key") or "image_url"
        payload[key] = image_url
    if seed is not None:
        payload["seed"] = seed
    if family.get("aspect_ratios"):
        if aspect_ratio in family["aspect_ratios"]:
            payload["aspect_ratio"] = aspect_ratio
        # otherwise let the endpoint auto-crop / use its default
    if family.get("resolutions"):
        if resolution in family["resolutions"]:
            payload["resolution"] = resolution
        # else: let the endpoint default
    clamped = _clamp_duration(family, duration)
    if clamped is not None and family.get("durations"):
        # FAL exposes duration as a string in the queue API ("8" not 8).
        payload["duration"] = str(clamped)
    if family.get("audio") and audio is not None:
        payload["generate_audio"] = bool(audio)
    if family.get("negative") and negative_prompt:
        payload["negative_prompt"] = negative_prompt
    return payload
# ---------------------------------------------------------------------------
# fal_client lazy import (same pattern as image_generation_tool)
# ---------------------------------------------------------------------------
_fal_client: Any = None


def _load_fal_client() -> Any:
    global _fal_client
    if _fal_client is not None:
        return _fal_client
    import fal_client  # type: ignore

    _fal_client = fal_client
    return fal_client
# ---------------------------------------------------------------------------
# Provider
# ---------------------------------------------------------------------------
class FALVideoGenProvider(VideoGenProvider):
"""FAL.ai multi-family video generation backend.
Routes between text-to-video and image-to-video endpoints automatically
based on whether ``image_url`` was provided.
"""
@property
def name(self) -> str:
return "fal"
@property
def display_name(self) -> str:
return "FAL"
def is_available(self) -> bool:
if not os.environ.get("FAL_KEY", "").strip():
return False
try:
import fal_client # noqa: F401
except ImportError:
return False
return True
def list_models(self) -> List[Dict[str, Any]]:
out: List[Dict[str, Any]] = []
for fid, meta in FAL_FAMILIES.items():
modalities: List[str] = []
if meta.get("text_endpoint"):
modalities.append("text")
if meta.get("image_endpoint"):
modalities.append("image")
out.append({
"id": fid,
"display": meta["display"],
"speed": meta["speed"],
"strengths": meta["strengths"],
"price": meta["price"],
"tier": meta.get("tier", "premium"),
"modalities": modalities,
})
return out
def default_model(self) -> Optional[str]:
return DEFAULT_MODEL
def get_setup_schema(self) -> Dict[str, Any]:
return {
"name": "FAL",
"badge": "paid",
"tag": "LTX, Pixverse, Veo 3.1, Seedance 2.0, Kling 4K, Happy Horse — text-to-video & image-to-video",
"env_vars": [
{
"key": "FAL_KEY",
"prompt": "FAL.ai API key",
"url": "https://fal.ai/dashboard/keys",
},
],
}
def capabilities(self) -> Dict[str, Any]:
return {
"modalities": ["text", "image"],
"aspect_ratios": ["16:9", "9:16", "1:1"],
"resolutions": ["360p", "540p", "720p", "1080p"],
"max_duration": 15,
"min_duration": 1,
"supports_audio": True,
"supports_negative_prompt": True,
"max_reference_images": 0,
}
def generate(
self,
prompt: str,
*,
model: Optional[str] = None,
image_url: Optional[str] = None,
reference_image_urls: Optional[List[str]] = None,
duration: Optional[int] = None,
aspect_ratio: str = "16:9",
resolution: str = "720p",
negative_prompt: Optional[str] = None,
audio: Optional[bool] = None,
seed: Optional[int] = None,
**kwargs: Any,
) -> Dict[str, Any]:
if not os.environ.get("FAL_KEY", "").strip():
return error_response(
error=(
"FAL_KEY not set. Run `hermes tools` → Video Generation "
"→ FAL to configure."
),
error_type="auth_required",
provider="fal",
prompt=prompt,
)
try:
fal_client = _load_fal_client()
except ImportError:
return error_response(
error="fal_client Python package not installed (pip install fal-client)",
error_type="missing_dependency",
provider="fal",
prompt=prompt,
)
prompt = (prompt or "").strip()
family_id, family = _resolve_family(model)
# Route: image_url → image-to-video endpoint; else → text-to-video.
image_url_norm = (image_url or "").strip() or None
if image_url_norm:
endpoint = family.get("image_endpoint")
modality_used = "image"
if not endpoint:
return error_response(
error=(
f"FAL family {family_id} has no image-to-video "
f"endpoint. Pick a family with image-to-video support "
f"via `hermes tools` → Video Generation."
),
error_type="modality_unsupported",
provider="fal", model=family_id, prompt=prompt,
)
else:
endpoint = family.get("text_endpoint")
modality_used = "text"
if not endpoint:
return error_response(
error=(
f"FAL family {family_id} has no text-to-video "
f"endpoint. Pass an image_url to use its "
f"image-to-video endpoint, or pick a different family."
),
error_type="modality_unsupported",
provider="fal", model=family_id, prompt=prompt,
)
if not prompt:
return error_response(
error="prompt is required.",
error_type="missing_prompt",
provider="fal", model=family_id, prompt=prompt,
)
payload = _build_payload(
family,
prompt=prompt,
image_url=image_url_norm,
duration=duration,
aspect_ratio=aspect_ratio,
resolution=resolution,
negative_prompt=negative_prompt,
audio=audio,
seed=seed,
)
try:
result = fal_client.subscribe(
endpoint,
arguments=payload,
with_logs=False,
)
except Exception as exc:
logger.warning(
"FAL video gen failed (family=%s, endpoint=%s): %s",
family_id, endpoint, exc, exc_info=True,
)
return error_response(
error=f"FAL video generation failed: {exc}",
error_type="api_error",
provider="fal", model=family_id, prompt=prompt,
aspect_ratio=aspect_ratio,
)
video = (result or {}).get("video") if isinstance(result, dict) else None
url: Optional[str] = None
if isinstance(video, dict):
url = video.get("url")
elif isinstance(video, str):
url = video
if not url:
return error_response(
error="FAL returned no video URL in response",
error_type="empty_response",
provider="fal", model=family_id, prompt=prompt,
)
extra: Dict[str, Any] = {"endpoint": endpoint}
if isinstance(video, dict):
if video.get("file_size"):
extra["file_size"] = video["file_size"]
if video.get("content_type"):
extra["content_type"] = video["content_type"]
return success_response(
video=url,
model=family_id,
prompt=prompt,
modality=modality_used,
aspect_ratio=aspect_ratio if "aspect_ratio" in payload else "",
duration=int(payload["duration"]) if "duration" in payload else 0,
provider="fal",
extra=extra,
)
# ---------------------------------------------------------------------------
# Plugin entry point
# ---------------------------------------------------------------------------
def register(ctx) -> None:
"""Plugin entry point — wire ``FALVideoGenProvider`` into the registry."""
ctx.register_video_gen_provider(FALVideoGenProvider())
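The duration handling in this file treats a family's `durations` tuple either as a `(min, max)` range or as an enum of allowed values. The following standalone sketch mirrors that logic so the two behaviors can be compared side by side. The names `is_range` and `clamp` are illustrative stand-ins, and the sketch omits the `isinstance` checks of the original.

```python
from typing import Optional, Tuple


def is_range(durations: Tuple[int, ...]) -> bool:
    """Two elements with a gap greater than 1 are read as (min, max)."""
    return len(durations) == 2 and durations[1] - durations[0] > 1


def clamp(durations: Optional[Tuple[int, ...]], duration: Optional[int]) -> Optional[int]:
    """Clamp into a range, or snap to the nearest allowed enum value."""
    if not durations:
        return duration          # family declares nothing: pass through
    if duration is None:
        return durations[0]      # default to the first (smallest) entry
    if is_range(durations):
        lo, hi = durations
        return max(lo, min(hi, duration))
    if duration in durations:
        return duration
    # Nearest enum value; on a tie the earlier entry wins.
    return min(durations, key=lambda d: abs(d - duration))


if __name__ == "__main__":
    print(clamp((1, 15), 20))    # range: clamped to the upper bound
    print(clamp((4, 6, 8), 7))   # enum: snapped to a neighbouring value
```

One consequence of the heuristic is worth keeping in mind: a family that genuinely supports only two durations, say `(5, 10)`, would be read as a range, so a request for 7 seconds would be sent as 7 rather than snapped to 5 or 10.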
+7
@@ -0,0 +1,7 @@
name: fal
version: 1.0.0
description: "FAL.ai video generation backend. Multi-model — Veo 3.1, Kling, Pixverse — covering text-to-video and image-to-video via fal_client's queue API."
author: NousResearch
kind: backend
requires_env:
- FAL_KEY
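The selection precedence documented for the FAL backend (tool-call argument, then `FAL_VIDEO_MODEL`, then the two config keys, then the default) amounts to "the first recognized candidate wins". A minimal sketch of that rule follows. `pick_family` is a hypothetical name, and the family set is shortened for illustration.

```python
from typing import Iterable, Optional

# Shortened, illustrative family set; the real catalog has more entries.
KNOWN_FAMILIES = {"ltx-2.3", "pixverse-v6", "veo3.1"}
DEFAULT_FAMILY = "pixverse-v6"


def pick_family(candidates: Iterable[Optional[str]]) -> str:
    """Return the first candidate that names a known family, else the default.

    Candidates are expected in precedence order: explicit argument,
    environment variable, provider-specific config, shared config.
    """
    for candidate in candidates:
        if isinstance(candidate, str) and candidate.strip() in KNOWN_FAMILIES:
            return candidate.strip()
    return DEFAULT_FAMILY
```

Unknown or empty values are skipped rather than treated as errors, so a stale config entry falls through to the next source instead of breaking generation.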
+402
@@ -0,0 +1,402 @@
"""xAI Grok-Imagine video generation backend.
Surface: text-to-video and image-to-video (animate an input image)
through xAI's ``/videos/generations`` endpoint. Edit and extend are not
exposed in this unified surface: xAI is the only backend that supports
them, and the inconsistency would force per-backend prose in the agent's
tool description.

Originally salvaged from PR #10600 by @Jaaneek; reshaped into the
:class:`VideoGenProvider` plugin interface and trimmed to the
generate-only surface.

Authentication via ``XAI_API_KEY``. Output is an HTTPS URL from xAI's
CDN; the gateway downloads and delivers it.
"""
from __future__ import annotations
import asyncio
import logging
import os
import uuid
from typing import Any, Dict, List, Optional
import httpx
from agent.video_gen_provider import (
VideoGenProvider,
error_response,
success_response,
)
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
DEFAULT_XAI_BASE_URL = "https://api.x.ai/v1"
DEFAULT_MODEL = "grok-imagine-video"
DEFAULT_DURATION = 8
DEFAULT_ASPECT_RATIO = "16:9"
DEFAULT_RESOLUTION = "720p"
DEFAULT_TIMEOUT_SECONDS = 240
DEFAULT_POLL_INTERVAL_SECONDS = 5
VALID_ASPECT_RATIOS = {"1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3"}
VALID_RESOLUTIONS = {"480p", "720p"}
MAX_REFERENCE_IMAGES = 7
_MODELS: Dict[str, Dict[str, Any]] = {
"grok-imagine-video": {
"display": "Grok Imagine Video",
"speed": "~60-240s",
"strengths": "Text-to-video + image-to-video; up to 7 reference images for style/character.",
"price": "see https://docs.x.ai/docs/models",
"modalities": ["text", "image"],
},
}
# ---------------------------------------------------------------------------
# HTTP helpers
# ---------------------------------------------------------------------------
def _xai_base_url() -> str:
    return (os.getenv("XAI_BASE_URL") or DEFAULT_XAI_BASE_URL).strip().rstrip("/")


def _xai_headers() -> Dict[str, str]:
    api_key = os.getenv("XAI_API_KEY", "").strip()
    if not api_key:
        raise ValueError("XAI_API_KEY not set. Get one at https://console.x.ai/")
    try:
        from tools.xai_http import hermes_xai_user_agent

        ua = hermes_xai_user_agent()
    except Exception:
        ua = "hermes-agent/video_gen"
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "User-Agent": ua,
    }


def _normalize_reference_images(reference_image_urls: Optional[List[str]]):
    refs = []
    for url in reference_image_urls or []:
        normalized = (url or "").strip()
        if normalized:
            refs.append({"url": normalized})
    return refs or None


def _clamp_duration(duration: Optional[int], has_reference_images: bool) -> int:
    value = duration if duration is not None else DEFAULT_DURATION
    if value < 1:
        value = 1
    if value > 15:
        value = 15
    if has_reference_images and value > 10:
        value = 10
    return value


async def _submit(
    client: httpx.AsyncClient,
    payload: Dict[str, Any],
) -> str:
    """POST to /videos/generations, xAI's only public endpoint for our
    text-to-video and image-to-video surface."""
    response = await client.post(
        f"{_xai_base_url()}/videos/generations",
        headers={**_xai_headers(), "x-idempotency-key": str(uuid.uuid4())},
        json=payload,
        timeout=60,
    )
    response.raise_for_status()
    body = response.json()
    request_id = body.get("request_id")
    if not request_id:
        raise RuntimeError("xAI video response did not include request_id")
    return request_id


async def _poll(
    client: httpx.AsyncClient,
    request_id: str,
    *,
    timeout_seconds: int,
    poll_interval: int,
) -> Dict[str, Any]:
    elapsed = 0.0
    last_status = "queued"
    while elapsed < timeout_seconds:
        response = await client.get(
            f"{_xai_base_url()}/videos/{request_id}",
            headers=_xai_headers(),
            timeout=30,
        )
        response.raise_for_status()
        body = response.json()
        last_status = (body.get("status") or "").lower()
        if last_status == "done":
            return {"status": "done", "body": body}
        if last_status in {"failed", "error", "expired", "cancelled"}:
            return {"status": last_status, "body": body}
        await asyncio.sleep(poll_interval)
        elapsed += poll_interval
    return {"status": "timeout", "body": {"status": last_status}}
# ---------------------------------------------------------------------------
# Provider
# ---------------------------------------------------------------------------
class XAIVideoGenProvider(VideoGenProvider):
"""xAI grok-imagine-video backend (text-to-video + image-to-video)."""
@property
def name(self) -> str:
return "xai"
@property
def display_name(self) -> str:
return "xAI"
def is_available(self) -> bool:
return bool(os.environ.get("XAI_API_KEY", "").strip())
def list_models(self) -> List[Dict[str, Any]]:
return [{"id": mid, **meta} for mid, meta in _MODELS.items()]
def default_model(self) -> Optional[str]:
return DEFAULT_MODEL
def get_setup_schema(self) -> Dict[str, Any]:
return {
"name": "xAI",
"badge": "paid",
"tag": "grok-imagine-video — text-to-video & image-to-video with reference images",
"env_vars": [
{
"key": "XAI_API_KEY",
"prompt": "xAI API key",
"url": "https://console.x.ai/",
},
],
}
def capabilities(self) -> Dict[str, Any]:
return {
"modalities": ["text", "image"],
"aspect_ratios": sorted(VALID_ASPECT_RATIOS),
"resolutions": sorted(VALID_RESOLUTIONS),
"max_duration": 15,
"min_duration": 1,
"supports_audio": False,
"supports_negative_prompt": False,
"max_reference_images": MAX_REFERENCE_IMAGES,
}
def generate(
self,
prompt: str,
*,
model: Optional[str] = None,
image_url: Optional[str] = None,
reference_image_urls: Optional[List[str]] = None,
duration: Optional[int] = None,
aspect_ratio: str = DEFAULT_ASPECT_RATIO,
resolution: str = DEFAULT_RESOLUTION,
negative_prompt: Optional[str] = None,
audio: Optional[bool] = None,
seed: Optional[int] = None,
**kwargs: Any,
) -> Dict[str, Any]:
try:
loop = asyncio.new_event_loop()
try:
return loop.run_until_complete(self._generate_async(
prompt=prompt,
model=model,
image_url=image_url,
reference_image_urls=reference_image_urls,
duration=duration,
aspect_ratio=aspect_ratio,
resolution=resolution,
))
finally:
loop.close()
except Exception as exc:
logger.warning("xAI video gen unexpected failure: %s", exc, exc_info=True)
return error_response(
error=f"xAI video generation failed: {exc}",
error_type="api_error",
provider="xai",
model=model or DEFAULT_MODEL,
prompt=prompt,
aspect_ratio=aspect_ratio,
)
async def _generate_async(
self,
*,
prompt: str,
model: Optional[str],
image_url: Optional[str],
reference_image_urls: Optional[List[str]],
duration: Optional[int],
aspect_ratio: str,
resolution: str,
) -> Dict[str, Any]:
if not os.environ.get("XAI_API_KEY", "").strip():
return error_response(
error="XAI_API_KEY not set. Get one at https://console.x.ai/",
error_type="auth_required",
provider="xai", prompt=prompt,
)
prompt = (prompt or "").strip()
image_url_norm = (image_url or "").strip() or None
normalized_aspect_ratio = (aspect_ratio or DEFAULT_ASPECT_RATIO).strip()
normalized_resolution = (resolution or DEFAULT_RESOLUTION).strip().lower()
modality_used = "image" if image_url_norm else "text"
if not prompt:
return error_response(
error=(
"prompt is required for xAI video generation "
"(text-to-video or image-to-video)"
),
error_type="missing_prompt",
provider="xai", prompt=prompt,
)
refs = _normalize_reference_images(reference_image_urls)
if refs and len(refs) > MAX_REFERENCE_IMAGES:
return error_response(
error=f"reference_image_urls supports at most {MAX_REFERENCE_IMAGES} images on xAI",
error_type="too_many_references",
provider="xai", prompt=prompt,
)
if image_url_norm and refs:
return error_response(
error="image_url and reference_image_urls cannot be combined on xAI",
error_type="conflicting_inputs",
provider="xai", prompt=prompt,
)
clamped_duration = _clamp_duration(duration, has_reference_images=bool(refs))
if normalized_aspect_ratio not in VALID_ASPECT_RATIOS:
normalized_aspect_ratio = DEFAULT_ASPECT_RATIO
if normalized_resolution not in VALID_RESOLUTIONS:
normalized_resolution = DEFAULT_RESOLUTION
payload: Dict[str, Any] = {
"model": model or DEFAULT_MODEL,
"prompt": prompt,
"duration": clamped_duration,
"aspect_ratio": normalized_aspect_ratio,
"resolution": normalized_resolution,
}
if image_url_norm:
payload["image"] = {"url": image_url_norm}
if refs:
payload["reference_images"] = refs
async with httpx.AsyncClient() as client:
try:
request_id = await _submit(client, payload)
except httpx.HTTPStatusError as exc:
detail = ""
try:
detail = exc.response.text[:500]
except Exception:
pass
return error_response(
error=f"xAI submit failed ({exc.response.status_code}): {detail or exc}",
error_type="api_error",
provider="xai",
model=model or DEFAULT_MODEL,
prompt=prompt,
)
poll_result = await _poll(
client, request_id,
timeout_seconds=DEFAULT_TIMEOUT_SECONDS,
poll_interval=DEFAULT_POLL_INTERVAL_SECONDS,
)
status = poll_result["status"]
body = poll_result["body"]
if status == "done":
video = body.get("video") or {}
url = video.get("url")
if not url:
return error_response(
error="xAI video generation completed without a video URL",
error_type="empty_response",
provider="xai",
model=body.get("model") or model or DEFAULT_MODEL,
prompt=prompt,
)
extra: Dict[str, Any] = {
"request_id": request_id,
"resolution": normalized_resolution,
}
if body.get("usage"):
extra["usage"] = body["usage"]
return success_response(
video=url,
model=body.get("model") or model or DEFAULT_MODEL,
prompt=prompt,
modality=modality_used,
aspect_ratio=normalized_aspect_ratio,
duration=video.get("duration") or clamped_duration,
provider="xai",
extra=extra,
)
if status == "timeout":
return error_response(
error=f"Timed out waiting for video generation after {DEFAULT_TIMEOUT_SECONDS}s",
error_type="timeout",
provider="xai",
model=model or DEFAULT_MODEL,
prompt=prompt,
)
message = (
(body.get("error", {}) or {}).get("message")
or body.get("message")
or f"xAI video generation ended with status '{status}'"
)
return error_response(
error=message,
error_type=f"xai_{status}",
provider="xai",
model=model or DEFAULT_MODEL,
prompt=prompt,
)
# ---------------------------------------------------------------------------
# Plugin entry point
# ---------------------------------------------------------------------------
def register(ctx) -> None:
"""Plugin entry point — wire ``XAIVideoGenProvider`` into the registry."""
ctx.register_video_gen_provider(XAIVideoGenProvider())
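The submit-then-poll pattern used by this backend can be exercised without any network access. The sketch below is a variation, not the file's own code: `poll_until_terminal` and the fake status source are hypothetical, and the loop uses a `time.monotonic()` deadline (so time spent inside each status request counts toward the timeout) instead of summing the sleep intervals.

```python
import asyncio
import time
from typing import Awaitable, Callable

# Statuses that end polling; taken from the set the backend checks for.
TERMINAL = {"done", "failed", "error", "expired", "cancelled"}


async def poll_until_terminal(
    fetch_status: Callable[[], Awaitable[str]],
    *,
    timeout_seconds: float,
    poll_interval: float,
) -> str:
    """Call ``fetch_status`` until it reports a terminal state or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = (await fetch_status()).lower()
        if status in TERMINAL:
            return status
        await asyncio.sleep(poll_interval)
    return "timeout"


if __name__ == "__main__":
    # Fake status source standing in for the HTTP status request.
    replies = iter(["queued", "processing", "done"])

    async def fake_fetch() -> str:
        return next(replies)

    print(asyncio.run(poll_until_terminal(fake_fetch, timeout_seconds=1.0, poll_interval=0.01)))
```

Passing the status source as a callable keeps the loop testable: the same function can be driven by a real HTTP request in production and by a canned iterator in tests.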
+7
@@ -0,0 +1,7 @@
name: xai
version: 1.0.0
description: "xAI Grok-Imagine video generation backend. Supports text-to-video, image-to-video, and reference-image-guided generation via the xAI async videos API."
author: NousResearch
kind: backend
requires_env:
- XAI_API_KEY
+7
@@ -0,0 +1,7 @@
# Bundled web search providers — plugins/web/.
#
# Each subdirectory follows the image_gen plugin layout:
# plugins/web/<name>/{plugin.yaml, __init__.py, provider.py}
#
# They auto-load via kind: backend and register via
# ctx.register_web_search_provider() into agent.web_search_registry.
+14
@@ -0,0 +1,14 @@
"""Brave Search (free tier) plugin — bundled, auto-loaded.
Mirrors the ``plugins/image_gen/openai/`` layout: ``provider.py`` holds the
provider class, ``__init__.py::register(ctx)`` registers an instance.
"""
from __future__ import annotations
from plugins.web.brave_free.provider import BraveFreeWebSearchProvider
def register(ctx) -> None:
"""Register the Brave-free provider with the plugin context."""
ctx.register_web_search_provider(BraveFreeWebSearchProvider())
+7
@@ -0,0 +1,7 @@
name: web-brave-free
version: 1.0.0
description: "Brave Search (free tier) — web search via Brave's Data-for-Search API. Requires BRAVE_SEARCH_API_KEY (free signup at https://brave.com/search/api/, 2k queries/month)."
author: NousResearch
kind: backend
provides_web_providers:
- brave-free
@@ -1,23 +1,20 @@
"""Brave Search web search provider (free tier).
"""Brave Search (free tier) — plugin form.
Brave Search's Data-for-Search API offers a free tier (2,000 queries/mo at the
time of writing) after signing up at https://brave.com/search/api/. This
provider implements ``WebSearchProvider`` only: the Data-for-Search endpoint
returns search results; it does not extract or crawl arbitrary URLs.
Subclasses :class:`agent.web_search_provider.WebSearchProvider` (the
plugin-facing ABC). The legacy in-tree module
``tools.web_providers.brave_free`` was removed in the same commit that
moved this code under ``plugins/``; this file is now the canonical
implementation.
Configuration::
Config keys this provider responds to::
# ~/.hermes/.env
BRAVE_SEARCH_API_KEY=your-subscription-token
# ~/.hermes/config.yaml
web:
search_backend: "brave-free"
extract_backend: "firecrawl" # pair with an extract provider if needed
search_backend: "brave-free" # explicit per-capability
backend: "brave-free" # shared fallback
The API uses the ``X-Subscription-Token`` header. Free-tier keys are rate
limited (1 qps) and capped at 2k queries/month; see the Brave dashboard for
current quotas.
Auth env var::
BRAVE_SEARCH_API_KEY=... # https://brave.com/search/api/ (free tier)
"""
from __future__ import annotations
@@ -26,49 +23,45 @@ import logging
import os
from typing import Any, Dict
from tools.web_providers.base import WebSearchProvider
from agent.web_search_provider import WebSearchProvider
logger = logging.getLogger(__name__)
_BRAVE_ENDPOINT = "https://api.search.brave.com/res/v1/web/search"
class BraveFreeSearchProvider(WebSearchProvider):
"""Search via the Brave Search API (free tier).
class BraveFreeWebSearchProvider(WebSearchProvider):
"""Search-only Brave provider using the free-tier Data-for-Search API.
Requires ``BRAVE_SEARCH_API_KEY`` to be set. The value is passed as the
``X-Subscription-Token`` header. No extract capability; pair with
Firecrawl/Tavily/Exa/Parallel when you also need ``web_extract``.
Free tier is 2,000 queries/month (1 qps). No content-extraction capability;
users pair this with Firecrawl/Tavily/Exa for ``web_extract``.
"""
def provider_name(self) -> str:
@property
def name(self) -> str:
# Hyphen form preserved for backward compat with the existing
# ``web.search_backend: "brave-free"`` config keys users have set.
return "brave-free"
def is_configured(self) -> bool:
@property
def display_name(self) -> str:
return "Brave Search (Free)"
def is_available(self) -> bool:
"""Return True when ``BRAVE_SEARCH_API_KEY`` is set to a non-empty value."""
return bool(os.getenv("BRAVE_SEARCH_API_KEY", "").strip())
def supports_search(self) -> bool:
return True
def supports_extract(self) -> bool:
return False
def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
"""Execute a search against the Brave Search API.
Returns normalized results::
{
"success": True,
"data": {
"web": [
{
"title": str,
"url": str,
"description": str,
"position": int,
},
...
]
}
}
On failure returns ``{"success": False, "error": str}``.
Returns ``{"success": True, "data": {"web": [{"title", "url", "description", "position"}]}}``
on success, or ``{"success": False, "error": str}`` on failure.
"""
import httpx
@@ -128,3 +121,17 @@ class BraveFreeSearchProvider(WebSearchProvider):
)
return {"success": True, "data": {"web": web_results}}
def get_setup_schema(self) -> Dict[str, Any]:
return {
"name": "Brave Search (Free)",
"badge": "free",
"tag": "Free-tier API key — 2k queries/mo, search only.",
"env_vars": [
{
"key": "BRAVE_SEARCH_API_KEY",
"prompt": "Brave Search API key (free tier)",
"url": "https://brave.com/search/api/",
},
],
}
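The normalized envelope described in the `search()` docstring can be sketched as below. This is a minimal illustration, not the provider's actual code: `normalize_web_results` and `failure` are hypothetical helper names, and the raw field names (`title`, `url`, `description`) are assumed to match what the upstream API returns.

```python
from typing import Any, Dict, List


def normalize_web_results(raw_items: List[Dict[str, Any]], limit: int = 5) -> Dict[str, Any]:
    """Map raw result dicts onto the normalized search envelope.

    Hypothetical helper; the raw field names are an assumption.
    """
    web_results = []
    # Positions are 1-based and follow the order the upstream API returned.
    for position, item in enumerate(raw_items[:limit], start=1):
        web_results.append(
            {
                "title": item.get("title", ""),
                "url": item.get("url", ""),
                "description": item.get("description", ""),
                "position": position,
            }
        )
    return {"success": True, "data": {"web": web_results}}


def failure(message: str) -> Dict[str, Any]:
    """Build the failure envelope instead of raising."""
    return {"success": False, "error": message}
```

Missing fields default to empty strings so that downstream consumers can index every key without guarding.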
@@ -0,0 +1,15 @@
"""DuckDuckGo search plugin — bundled, auto-loaded.
Backed by the community ``ddgs`` Python package which scrapes DDG's HTML
results page. No API key required, but the package itself must be installed
(it's an optional dep — gated via :meth:`is_available`).
"""
from __future__ import annotations
from plugins.web.ddgs.provider import DDGSWebSearchProvider
def register(ctx) -> None:
"""Register the DDGS provider with the plugin context."""
ctx.register_web_search_provider(DDGSWebSearchProvider())
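What `register(ctx)` relies on can be sketched with a stand-in context. Everything here is hypothetical: `SketchPluginContext` and `FakeProvider` are illustration-only names, and rejecting duplicate provider names is an assumed policy, not something the diff confirms about the real plugin context.

```python
from typing import Any, Dict


class SketchPluginContext:
    """Hypothetical stand-in for the real plugin context object."""

    def __init__(self) -> None:
        self.web_search_providers: Dict[str, Any] = {}

    def register_web_search_provider(self, provider: Any) -> None:
        # Providers are keyed by their ``name`` property (e.g. "ddgs").
        if provider.name in self.web_search_providers:
            # Assumed policy: refuse to overwrite an existing registration.
            raise ValueError(f"provider already registered: {provider.name}")
        self.web_search_providers[provider.name] = provider


class FakeProvider:
    """Illustration-only provider exposing just the ``name`` property."""

    @property
    def name(self) -> str:
        return "ddgs"


def register(ctx: Any) -> None:
    """Same shape as the plugin entry point in the diff."""
    ctx.register_web_search_provider(FakeProvider())
```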
@@ -0,0 +1,7 @@
name: web-ddgs
version: 1.0.0
description: "DuckDuckGo web search via the ddgs Python package — no API key required. Install with `pip install ddgs`."
author: NousResearch
kind: backend
provides_web_providers:
- ddgs
@@ -1,28 +1,13 @@
"""DuckDuckGo web search provider via the ``ddgs`` Python package.
"""DuckDuckGo search — plugin form (via the ``ddgs`` package).
DuckDuckGo does not provide an official programmatic search API. The
community-maintained `ddgs <https://pypi.org/project/ddgs/>`_ package (the
renamed successor of ``duckduckgo-search``) scrapes DuckDuckGo's HTML results
page and normalizes them. It implements ``WebSearchProvider`` only; there is
no extract capability.
Subclasses the plugin-facing :class:`agent.web_search_provider.WebSearchProvider`.
The legacy in-tree module ``tools.web_providers.ddgs`` was removed in the
same commit that moved this code under ``plugins/``; this file is now the
canonical implementation.
Configuration::
# No API key required. Enable by installing the package and pointing the
# web backend at ddgs:
pip install ddgs
# ~/.hermes/config.yaml
web:
search_backend: "ddgs"
extract_backend: "firecrawl" # pair with an extract provider if needed
Rate limits are enforced server-side by DuckDuckGo. Expect intermittent
``DuckDuckGoSearchException`` / 202 responses under heavy use; this provider
surfaces them as ``{"success": False, "error": ...}`` rather than crashing
the tool call.
See https://duckduckgo.com/?q=duckduckgo+tos for terms of use.
The ``ddgs`` package is an optional dependency. ``is_available()`` reflects
whether the package is importable; the plugin still registers either way so
``hermes tools`` can prompt the user to install it.
"""
from __future__ import annotations
@@ -30,39 +15,49 @@ from __future__ import annotations
import logging
from typing import Any, Dict
from tools.web_providers.base import WebSearchProvider
from agent.web_search_provider import WebSearchProvider
logger = logging.getLogger(__name__)
class DDGSSearchProvider(WebSearchProvider):
"""Search via the ``ddgs`` package (DuckDuckGo HTML scrape).
class DDGSWebSearchProvider(WebSearchProvider):
"""DuckDuckGo HTML-scrape search provider.
No API key required. The provider is considered "configured" when the
``ddgs`` package is importable; there is nothing else to set up.
No API key needed. Rate limits are enforced server-side by DuckDuckGo;
the provider surfaces ``DuckDuckGoSearchException`` and other ddgs errors
as ``{"success": False, "error": ...}`` rather than raising.
"""
def provider_name(self) -> str:
@property
def name(self) -> str:
return "ddgs"
def is_configured(self) -> bool:
@property
def display_name(self) -> str:
return "DuckDuckGo (ddgs)"
def is_available(self) -> bool:
"""Return True when the ``ddgs`` package is importable.
Called at tool-registration time; must not perform network I/O.
Probes the import once; cheap because Python caches the import. Must
NOT perform network I/O; it runs at tool-registration time and on every
``hermes tools`` paint.
"""
try:
import ddgs # noqa: F401
return True
except ImportError:
return False
def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
"""Execute a DuckDuckGo search and return normalized results.
def supports_search(self) -> bool:
return True
Returns ``{"success": True, "data": {"web": [...]}}`` on success or
``{"success": False, "error": str}`` on failure (missing package,
rate-limited, network error, etc.).
"""
def supports_extract(self) -> bool:
return False
def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
"""Execute a DuckDuckGo search and return normalized results."""
try:
from ddgs import DDGS # type: ignore
except ImportError:
@@ -96,3 +91,14 @@ class DDGSSearchProvider(WebSearchProvider):
logger.info("DDGS search '%s': %d results (limit %d)", query, len(web_results), limit)
return {"success": True, "data": {"web": web_results}}
def get_setup_schema(self) -> Dict[str, Any]:
return {
"name": "DuckDuckGo (ddgs)",
"badge": "free · no key · search only",
"tag": "Search via the ddgs Python package — no API key (pair with any extract provider)",
"env_vars": [],
# Trigger `_run_post_setup("ddgs")` after the user picks this row
# so the ddgs Python package gets pip-installed on first selection.
"post_setup": "ddgs",
}
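The `is_available()` probe above imports the package outright. A possible alternative, shown here only as a sketch and not as what the plugin does, is to ask the import system whether the module can be found without executing it. `package_is_importable` is a hypothetical name.

```python
import importlib.util


def package_is_importable(module_name: str) -> bool:
    """Return True when ``module_name`` can be located by the import system.

    Swapped-in technique: ``importlib.util.find_spec`` locates the module
    without running its top-level code, so no network I/O can happen here.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised for a missing parent package or a module with a broken spec.
        return False
```

The trade-off is that a package can be locatable yet still fail at import time, which the `try: import ddgs` form in the diff would catch.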
@@ -0,0 +1,15 @@
"""Exa web search + extract plugin — bundled, auto-loaded.
Backed by the official Exa SDK (``exa-py``). Both search and extract are
sync; the dispatcher in :mod:`tools.web_tools` handles the wrap when the
caller is async.
"""
from __future__ import annotations
from plugins.web.exa.provider import ExaWebSearchProvider
def register(ctx) -> None:
"""Register the Exa provider with the plugin context."""
ctx.register_web_search_provider(ExaWebSearchProvider())
@@ -0,0 +1,7 @@
name: web-exa
version: 1.0.0
description: "Exa web search and content extraction. Requires EXA_API_KEY — sign up at https://exa.ai."
author: NousResearch
kind: backend
provides_web_providers:
- exa
@@ -0,0 +1,212 @@
"""Exa web search + content extraction — plugin form.
Subclasses :class:`agent.web_search_provider.WebSearchProvider`. Uses the
official Exa SDK (``exa-py``) which is lazy-loaded via
:func:`tools.lazy_deps.ensure` so that cold-start CLI users don't pay the
SDK import cost when Exa isn't configured.
Config keys this provider responds to::
web:
search_backend: "exa" # explicit per-capability
extract_backend: "exa" # explicit per-capability
backend: "exa" # shared fallback for both
Env var::
EXA_API_KEY=... # https://exa.ai (paid tier; free trial available)
The previous in-tree implementation lived at
``tools.web_tools._exa_search`` / ``_exa_extract``; this file is the
canonical replacement. Behavior is bit-for-bit identical aside from the
ABC method-name change.
"""
from __future__ import annotations
import logging
import os
from typing import Any, Dict, List
from agent.web_search_provider import WebSearchProvider
logger = logging.getLogger(__name__)
# Module-level note: the canonical ``_exa_client`` cache slot lives on
# :mod:`tools.web_tools` so tests that do ``tools.web_tools._exa_client =
# None`` between cases see fresh state. The plugin reads/writes through
# that public module (see :func:`_get_exa_client`).
def _get_exa_client() -> Any:
"""Lazy-import and cache an Exa SDK client.
Cache lives on :mod:`tools.web_tools` (as ``_exa_client``) so unit
tests that reset that name between cases keep working. Raises
``ValueError`` when ``EXA_API_KEY`` is unset.
"""
import tools.web_tools as _wt
cached = getattr(_wt, "_exa_client", None)
if cached is not None:
return cached
api_key = os.getenv("EXA_API_KEY")
if not api_key:
raise ValueError(
"EXA_API_KEY environment variable not set. "
"Get your API key at https://exa.ai"
)
try:
from tools.lazy_deps import ensure as _lazy_ensure
_lazy_ensure("search.exa", prompt=False)
except ImportError:
pass
except Exception as exc: # noqa: BLE001 — lazy_deps surfaces install hints
raise ImportError(str(exc))
from exa_py import Exa # noqa: WPS433 — deliberately lazy
client = Exa(api_key=api_key)
client.headers["x-exa-integration"] = "hermes-agent"
_wt._exa_client = client
return client
def _reset_client_for_tests() -> None:
"""Drop the cached Exa client so tests can re-instantiate cleanly."""
import tools.web_tools as _wt
_wt._exa_client = None
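The reason the cache slot lives on another module can be illustrated with a small sketch. Here `shared` is a hypothetical stand-in for `tools.web_tools`, and `get_client` is an illustration-only name. The point is that a test resetting the slot on the shared module forces a rebuild on the next call.

```python
import types
from typing import Any, Callable

# Stand-in for the shared module that owns the cache slot (hypothetical).
shared = types.ModuleType("shared_state")
shared._client = None


def get_client(factory: Callable[[], Any]) -> Any:
    """Return the cached client, building it through ``factory`` on a miss."""
    cached = getattr(shared, "_client", None)
    if cached is not None:
        return cached
    client = factory()
    # Write through the shared module so external resets are honored.
    shared._client = client
    return client
```

Resetting with `shared._client = None` between test cases is enough to get a fresh client, which mirrors the `_wt._exa_client = None` reset above.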
class ExaWebSearchProvider(WebSearchProvider):
"""Exa search + extract provider.
Both methods are sync; Exa's SDK is sync-only. The web_extract_tool
dispatcher wraps sync extracts via ``asyncio.to_thread`` when it
needs to keep the event loop responsive.
"""
@property
def name(self) -> str:
return "exa"
@property
def display_name(self) -> str:
return "Exa"
def is_available(self) -> bool:
"""Return True when ``EXA_API_KEY`` is set to a non-empty value."""
return bool(os.getenv("EXA_API_KEY", "").strip())
def supports_search(self) -> bool:
return True
def supports_extract(self) -> bool:
return True
def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
"""Execute an Exa search.
Returns ``{"success": True, "data": {"web": [{...}, ...]}}`` on
success, ``{"success": False, "error": str}`` on failure (incl.
missing API key and SDK install errors).
"""
try:
from tools.interrupt import is_interrupted
if is_interrupted():
return {"success": False, "error": "Interrupted"}
logger.info("Exa search: '%s' (limit=%d)", query, limit)
response = _get_exa_client().search(
query,
num_results=limit,
contents={"highlights": True},
)
web_results = []
for i, result in enumerate(response.results or []):
highlights = result.highlights or []
web_results.append(
{
"url": result.url or "",
"title": result.title or "",
"description": " ".join(highlights) if highlights else "",
"position": i + 1,
}
)
return {"success": True, "data": {"web": web_results}}
except ValueError as exc:
# Raised by _get_exa_client when EXA_API_KEY missing
return {"success": False, "error": str(exc)}
except ImportError as exc:
return {"success": False, "error": f"Exa SDK not installed: {exc}"}
except Exception as exc: # noqa: BLE001 — surface as failure
logger.warning("Exa search error: %s", exc)
return {"success": False, "error": f"Exa search failed: {exc}"}
def extract(self, urls: List[str], **kwargs: Any) -> List[Dict[str, Any]]:
"""Extract content from one or more URLs via Exa.
Returns a list of result dicts shaped for the legacy LLM
post-processing pipeline. On per-URL or whole-batch failure,
results carry an ``error`` field rather than raising.
"""
try:
from tools.interrupt import is_interrupted
if is_interrupted():
return [
{"url": u, "error": "Interrupted", "title": ""} for u in urls
]
logger.info("Exa extract: %d URL(s)", len(urls))
response = _get_exa_client().get_contents(urls, text=True)
results: List[Dict[str, Any]] = []
for result in response.results or []:
content = result.text or ""
url = result.url or ""
title = result.title or ""
results.append(
{
"url": url,
"title": title,
"content": content,
"raw_content": content,
"metadata": {"sourceURL": url, "title": title},
}
)
return results
except ValueError as exc:
return [{"url": u, "title": "", "content": "", "error": str(exc)} for u in urls]
except ImportError as exc:
return [
{"url": u, "title": "", "content": "", "error": f"Exa SDK not installed: {exc}"}
for u in urls
]
except Exception as exc: # noqa: BLE001
logger.warning("Exa extract error: %s", exc)
return [
{"url": u, "title": "", "content": "", "error": f"Exa extract failed: {exc}"}
for u in urls
]
def get_setup_schema(self) -> Dict[str, Any]:
return {
"name": "Exa",
"badge": "paid",
"tag": "Semantic + neural web search with content extraction.",
"env_vars": [
{
"key": "EXA_API_KEY",
"prompt": "Exa API key",
"url": "https://exa.ai",
},
],
}
@@ -0,0 +1,28 @@
"""Firecrawl web search + extract plugin — bundled, auto-loaded.
Largest single plugin in this PR. Captures everything the previous
inline implementation in tools/web_tools.py did:
- Lazy import of the firecrawl SDK (~200ms cold-start cost) via a
callable proxy that defers the actual import to first use.
- Dual client paths: direct (FIRECRAWL_API_KEY / FIRECRAWL_API_URL)
OR Nous-hosted tool-gateway routing for subscribers, with
web.use_gateway as the tie-breaker.
- Per-URL scrape loop with 60s timeout, SSRF re-check after redirect,
website-policy gating, and format-aware content selection.
- Robust response shape normalization across SDK / direct API /
gateway variants (search returns differ by transport).
The plugin re-exports ``Firecrawl`` (the lazy proxy) and
``check_firecrawl_api_key`` for backward-compatibility with tests and
external code that imports those names from ``tools.web_tools``.
"""
from __future__ import annotations
from plugins.web.firecrawl.provider import FirecrawlWebSearchProvider
def register(ctx) -> None:
"""Register the Firecrawl provider with the plugin context."""
ctx.register_web_search_provider(FirecrawlWebSearchProvider())
@@ -0,0 +1,7 @@
name: web-firecrawl
version: 1.0.0
description: "Firecrawl web search + content extraction. Supports direct API and Nous-hosted tool-gateway routing for subscribers. Requires FIRECRAWL_API_KEY (or FIRECRAWL_API_URL for self-hosted), or an active Nous subscription with FIRECRAWL_GATEWAY_URL."
author: NousResearch
kind: backend
provides_web_providers:
- firecrawl
@@ -0,0 +1,773 @@
"""Firecrawl web search + extract — plugin form.
Subclasses :class:`agent.web_search_provider.WebSearchProvider`. This is
the largest provider migrated in this PR; it captures the full inline
firecrawl implementation that previously lived in tools/web_tools.py:
- :data:`Firecrawl` lazy proxy that defers the ~200ms SDK import to
first use (re-exported by tools.web_tools for backward compat with
existing tests that mock that name).
- :func:`_get_firecrawl_client` with direct + managed-gateway dual
mode, controlled by ``web.use_gateway`` config when both are
configured.
- :func:`check_firecrawl_api_key` re-exported (tests + tools_config
setup hint depend on this name living in tools.web_tools).
- :func:`_extract_web_search_results` / :func:`_extract_scrape_payload`
response-shape normalizers that handle SDK / direct API / gateway
response variants.
- Per-URL extract loop with 60s timeout, redirect-aware SSRF re-check,
website-policy gating, and format-aware content selection.
Async note: the underlying SDK is sync. ``extract()`` is declared
``async def`` because it performs per-URL I/O that benefits from
running in an executor; the implementation wraps each scrape in
:func:`asyncio.to_thread` under :func:`asyncio.wait_for` (``timeout=60``) to
guard against hung fetches.
Config keys this provider responds to::
web:
search_backend: "firecrawl" # explicit per-capability
extract_backend: "firecrawl" # explicit per-capability
backend: "firecrawl" # shared fallback (default)
use_gateway: false # prefer managed gateway when both
# direct + gateway credentials exist
Env vars::
FIRECRAWL_API_KEY=... # direct cloud auth
FIRECRAWL_API_URL=... # self-hosted Firecrawl
FIRECRAWL_GATEWAY_URL=... # Nous tool-gateway (subscribers)
TOOL_GATEWAY_DOMAIN=... # alternate gateway env
TOOL_GATEWAY_SCHEME=...
TOOL_GATEWAY_USER_TOKEN=...
"""
from __future__ import annotations
import asyncio
import logging
import os
from typing import Any, Dict, List, Optional, TYPE_CHECKING
from agent.web_search_provider import WebSearchProvider
from tools.website_policy import check_website_access
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Lazy Firecrawl SDK proxy
# ---------------------------------------------------------------------------
# The firecrawl SDK pulls ~200ms of imports (httpcore, firecrawl.v1/v2 type
# trees) on a cold CLI. We only need it when the backend is actually
# "firecrawl", so defer the import to first use via a callable proxy.
#
# Tests that do ``patch("tools.web_tools.Firecrawl", ...)`` continue to
# work because tools/web_tools.py re-exports ``Firecrawl`` from this
# module — so the patched name still references the same proxy instance.
if TYPE_CHECKING:
from firecrawl import Firecrawl as FirecrawlSDK # noqa: F401 — type hints only
_FIRECRAWL_CLS_CACHE: Optional[type] = None
def _load_firecrawl_cls() -> type:
"""Import and cache ``firecrawl.Firecrawl``."""
global _FIRECRAWL_CLS_CACHE
if _FIRECRAWL_CLS_CACHE is None:
try:
from tools.lazy_deps import ensure as _lazy_ensure
_lazy_ensure("search.firecrawl", prompt=False)
except ImportError:
pass
except Exception as exc: # noqa: BLE001 — surface install hint
raise ImportError(str(exc))
from firecrawl import Firecrawl as _cls # noqa: WPS433 — deliberately lazy
_FIRECRAWL_CLS_CACHE = _cls
return _FIRECRAWL_CLS_CACHE
class _FirecrawlProxy:
"""Callable proxy that looks like ``firecrawl.Firecrawl`` but imports lazily."""
__slots__ = ()
def __call__(self, *args: Any, **kwargs: Any) -> Any:
return _load_firecrawl_cls()(*args, **kwargs)
def __instancecheck__(self, obj: Any) -> bool:
return isinstance(obj, _load_firecrawl_cls())
def __repr__(self) -> str:
return "<lazy firecrawl.Firecrawl proxy>"
Firecrawl = _FirecrawlProxy()
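The lazy-proxy idea can be tried in isolation with a standard-library class. The sketch below is a generic variant, not the plugin's code: `LazyClassProxy` and `LazyFraction` are hypothetical names, and `fractions.Fraction` merely stands in for an expensive SDK class.

```python
import importlib
from typing import Any, Optional


class LazyClassProxy:
    """Callable stand-in for a class whose module is imported on first use."""

    def __init__(self, module_name: str, class_name: str) -> None:
        self._module_name = module_name
        self._class_name = class_name
        self._cls: Optional[type] = None

    def _load(self) -> type:
        # Import once, then reuse the cached class object.
        if self._cls is None:
            module = importlib.import_module(self._module_name)
            self._cls = getattr(module, self._class_name)
        return self._cls

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Constructing through the proxy builds a real instance.
        return self._load()(*args, **kwargs)

    def __instancecheck__(self, obj: Any) -> bool:
        # isinstance(obj, proxy) looks this hook up on type(proxy).
        return isinstance(obj, self._load())


LazyFraction = LazyClassProxy("fractions", "Fraction")
```

Calling `LazyFraction(1, 2)` triggers the import and returns a real `Fraction`, and `isinstance(value, LazyFraction)` keeps working because Python resolves `__instancecheck__` on the proxy's type.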
# ---------------------------------------------------------------------------
# Client construction (direct vs managed-gateway)
# ---------------------------------------------------------------------------
#
# The canonical cache slots live on :mod:`tools.web_tools` so tests that do
# ``tools.web_tools._firecrawl_client = None`` between cases see fresh
# state. The plugin reads/writes through that public module — see
# :func:`_get_firecrawl_client` below.
def _get_direct_firecrawl_config() -> Optional[tuple]:
"""Return explicit direct Firecrawl kwargs + cache key, or None when unset."""
api_key = os.getenv("FIRECRAWL_API_KEY", "").strip()
api_url = os.getenv("FIRECRAWL_API_URL", "").strip().rstrip("/")
if not api_key and not api_url:
return None
kwargs: Dict[str, str] = {}
if api_key:
kwargs["api_key"] = api_key
if api_url:
kwargs["api_url"] = api_url
return kwargs, ("direct", api_url or None, api_key or None)
def _get_firecrawl_gateway_url() -> str:
"""Return the configured Firecrawl gateway URL."""
import tools.web_tools as _wt
return _wt.build_vendor_gateway_url("firecrawl")
def _is_tool_gateway_ready() -> bool:
"""Return True when gateway URL + Nous Subscriber token are available.
Reads ``_read_nous_access_token`` and ``resolve_managed_tool_gateway``
via :mod:`tools.web_tools` rather than direct imports, so unit tests
that ``patch("tools.web_tools._read_nous_access_token", ...)`` see
their patches honored. The names are re-exported on
:mod:`tools.web_tools` for exactly this reason.
"""
import tools.web_tools as _wt
return _wt.resolve_managed_tool_gateway(
"firecrawl", token_reader=_wt._read_nous_access_token
) is not None
def _has_direct_firecrawl_config() -> bool:
"""Return True when direct Firecrawl config is explicitly configured."""
return _get_direct_firecrawl_config() is not None
def check_firecrawl_api_key() -> bool:
"""Return True when Firecrawl backend (direct or gateway) is usable.
Re-exported by :mod:`tools.web_tools` for backward compatibility with
existing tests and the ``hermes tools`` setup flow.
"""
return _has_direct_firecrawl_config() or _is_tool_gateway_ready()
def _firecrawl_backend_help_suffix() -> str:
"""Return optional managed-gateway guidance for Firecrawl help text."""
import tools.web_tools as _wt
if not _wt.managed_nous_tools_enabled():
return ""
return (
", or use the Nous Tool Gateway via your subscription "
"(FIRECRAWL_GATEWAY_URL or TOOL_GATEWAY_DOMAIN)"
)
def _raise_web_backend_configuration_error() -> None:
"""Raise a clear error for unsupported web backend configuration."""
import tools.web_tools as _wt
message = (
"Web tools are not configured. "
"Set FIRECRAWL_API_KEY for cloud Firecrawl or set FIRECRAWL_API_URL "
"for a self-hosted Firecrawl instance."
)
if _wt.managed_nous_tools_enabled():
message += (
" With your Nous subscription you can also use the Tool Gateway — "
"run `hermes tools` and select Nous Subscription as the web provider."
)
raise ValueError(message)
def _get_firecrawl_client() -> Any:
"""Get or create the cached Firecrawl client.
When ``web.use_gateway`` is set in config, the managed Tool Gateway is
preferred even if direct Firecrawl credentials are present. Otherwise
direct Firecrawl takes precedence when explicitly configured.
Raises ValueError when neither path is usable.
The cached client is stored on :mod:`tools.web_tools` (as
``_firecrawl_client`` and ``_firecrawl_client_config``) rather than on
this plugin module so that unit tests that reset the cache via
``tools.web_tools._firecrawl_client = None`` keep working. Helper
functions (``prefers_gateway``, ``resolve_managed_tool_gateway``,
``_read_nous_access_token``, ``Firecrawl``) are also looked up via
:mod:`tools.web_tools` for the same reason; see
:func:`_is_tool_gateway_ready`.
"""
import tools.web_tools as _wt
direct_config = _get_direct_firecrawl_config()
if direct_config is not None and not _wt.prefers_gateway("web"):
kwargs, client_config = direct_config
else:
managed_gateway = _wt.resolve_managed_tool_gateway(
"firecrawl", token_reader=_wt._read_nous_access_token
)
if managed_gateway is None:
logger.error(
"Firecrawl client initialization failed: "
"missing direct config and tool-gateway auth."
)
_raise_web_backend_configuration_error()
kwargs = {
"api_key": managed_gateway.nous_user_token,
"api_url": managed_gateway.gateway_origin,
}
client_config = (
"tool-gateway",
kwargs["api_url"],
managed_gateway.nous_user_token,
)
cached = getattr(_wt, "_firecrawl_client", None)
cached_config = getattr(_wt, "_firecrawl_client_config", None)
if cached is not None and cached_config == client_config:
return cached
# Construct via the re-exported Firecrawl proxy on tools.web_tools so
# unit tests patching ``tools.web_tools.Firecrawl`` see their mock.
_wt._firecrawl_client = _wt.Firecrawl(**kwargs)
_wt._firecrawl_client_config = client_config
return _wt._firecrawl_client
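The precedence between the direct and gateway paths in `_get_firecrawl_client` can be reduced to a small decision function. This is a sketch under stated assumptions: `choose_transport` is a hypothetical name, and the three booleans stand in for `_get_direct_firecrawl_config() is not None`, a non-`None` `resolve_managed_tool_gateway(...)` result, and `prefers_gateway("web")`.

```python
def choose_transport(direct_configured: bool, gateway_ready: bool, prefers_gateway: bool) -> str:
    """Mirror the branch order used when building the Firecrawl client."""
    # Direct credentials win unless the config explicitly prefers the gateway.
    if direct_configured and not prefers_gateway:
        return "direct"
    # Otherwise the managed gateway is the only remaining option.
    if gateway_ready:
        return "tool-gateway"
    raise ValueError("Web tools are not configured.")
```

One consequence worth noting: as the branches read in the diff, preferring the gateway while it is unavailable raises the configuration error even when direct credentials exist.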
def _reset_client_for_tests() -> None:
"""Drop the cached Firecrawl client so tests can re-instantiate cleanly.
Clears the canonical slots on :mod:`tools.web_tools` (where
:func:`_get_firecrawl_client` reads/writes them).
"""
import tools.web_tools as _wt
_wt._firecrawl_client = None
_wt._firecrawl_client_config = None
# ---------------------------------------------------------------------------
# Response shape normalization (SDK / direct / gateway differ)
# ---------------------------------------------------------------------------
def _to_plain_object(value: Any) -> Any:
"""Convert SDK objects to plain python data structures when possible."""
if value is None:
return None
if isinstance(value, (dict, list, str, int, float, bool)):
return value
if hasattr(value, "model_dump"):
try:
return value.model_dump()
except Exception: # noqa: BLE001
pass
if hasattr(value, "__dict__"):
try:
return {k: v for k, v in value.__dict__.items() if not k.startswith("_")}
except Exception: # noqa: BLE001
pass
return value
def _normalize_result_list(values: Any) -> List[Dict[str, Any]]:
"""Normalize mixed SDK/list payloads into a list of dicts."""
if not isinstance(values, list):
return []
normalized: List[Dict[str, Any]] = []
for item in values:
plain = _to_plain_object(item)
if isinstance(plain, dict):
normalized.append(plain)
return normalized
def _extract_web_search_results(response: Any) -> List[Dict[str, Any]]:
"""Extract Firecrawl search results across SDK/direct/gateway response shapes."""
response_plain = _to_plain_object(response)
if isinstance(response_plain, dict):
data = response_plain.get("data")
if isinstance(data, list):
return _normalize_result_list(data)
if isinstance(data, dict):
data_web = _normalize_result_list(data.get("web"))
if data_web:
return data_web
data_results = _normalize_result_list(data.get("results"))
if data_results:
return data_results
top_web = _normalize_result_list(response_plain.get("web"))
if top_web:
return top_web
top_results = _normalize_result_list(response_plain.get("results"))
if top_results:
return top_results
if hasattr(response, "web"):
return _normalize_result_list(getattr(response, "web", []))
return []
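The lookup order in `_extract_web_search_results` can be condensed for plain-dict responses as below. This is a simplified sketch, not a replacement: `pick_result_list` and `_dicts_only` are hypothetical names, and SDK objects (handled by `_to_plain_object` in the diff) are deliberately left out.

```python
from typing import Any, Dict, List


def _dicts_only(values: Any) -> List[Dict[str, Any]]:
    """Keep only dict items; anything that is not a list yields []."""
    if not isinstance(values, list):
        return []
    return [item for item in values if isinstance(item, dict)]


def pick_result_list(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Find the result list across the response shapes named in the diff."""
    data = response.get("data")
    if isinstance(data, list):
        # Shape 1: {"data": [...]}
        return _dicts_only(data)
    if isinstance(data, dict):
        # Shape 2: {"data": {"web": [...]}} or {"data": {"results": [...]}}
        for key in ("web", "results"):
            found = _dicts_only(data.get(key))
            if found:
                return found
    # Shape 3: top-level "web" or "results".
    for key in ("web", "results"):
        found = _dicts_only(response.get(key))
        if found:
            return found
    return []
```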
def _extract_scrape_payload(scrape_result: Any) -> Dict[str, Any]:
"""Normalize Firecrawl scrape payload shape across SDK and gateway variants."""
result_plain = _to_plain_object(scrape_result)
if not isinstance(result_plain, dict):
return {}
nested = result_plain.get("data")
if isinstance(nested, dict):
return nested
return result_plain
# ---------------------------------------------------------------------------
# Provider class
# ---------------------------------------------------------------------------
class FirecrawlWebSearchProvider(WebSearchProvider):
"""Firecrawl search + extract provider with dual auth paths."""
@property
def name(self) -> str:
return "firecrawl"
@property
def display_name(self) -> str:
return "Firecrawl"
def is_available(self) -> bool:
"""Return True when direct Firecrawl OR managed-gateway path is configured."""
return check_firecrawl_api_key()
def supports_search(self) -> bool:
return True
def supports_extract(self) -> bool:
return True
def supports_crawl(self) -> bool:
return True
def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
"""Execute a Firecrawl search.
Sync; matches the legacy ``_get_firecrawl_client().search(...)``
call directly. Normalizes the response across SDK/direct/gateway
shapes via :func:`_extract_web_search_results`.
Pre-flight errors (``ValueError`` from configuration check,
``ImportError`` from missing SDK) propagate to the dispatcher's
top-level handler, which wraps them as ``tool_error(...)``
matching the legacy ``{"error": "Error searching web: ..."}``
envelope. Only in-flight errors are caught and surfaced as
``{"success": False, "error": ...}``.
"""
from tools.interrupt import is_interrupted
if is_interrupted():
return {"success": False, "error": "Interrupted"}
logger.info("Firecrawl search: '%s' (limit=%d)", query, limit)
# _get_firecrawl_client() raises ValueError on unconfigured systems —
# let it propagate so the dispatcher emits the legacy envelope shape.
client = _get_firecrawl_client()
try:
response = client.search(query=query, limit=limit)
web_results = _extract_web_search_results(response)
logger.info("Firecrawl: found %d search results", len(web_results))
return {"success": True, "data": {"web": web_results}}
except Exception as exc: # noqa: BLE001
logger.warning("Firecrawl search error: %s", exc)
return {"success": False, "error": f"Firecrawl search failed: {exc}"}
async def extract(self, urls: List[str], **kwargs: Any) -> List[Dict[str, Any]]:
"""Extract content from one or more URLs via Firecrawl.
Async; each URL is scraped in a background thread with a 60s
timeout. After scraping, the final URL (post-redirect) is
re-checked against website-access policy.
Accepted kwargs (others ignored for forward compat):
- ``format``: ``"markdown"`` or ``"html"``; default is both
(request both, return markdown when available).
Returns the legacy per-URL list-of-results shape. Per-URL failures
(timeout, SSRF block, scrape error, policy block) become items
with an ``error`` field rather than raising.
"""
from tools.interrupt import is_interrupted as _is_interrupted
if _is_interrupted():
return [{"url": u, "error": "Interrupted", "title": ""} for u in urls]
format = kwargs.get("format")
formats: List[str] = []
if format == "markdown":
formats = ["markdown"]
elif format == "html":
formats = ["html"]
else:
formats = ["markdown", "html"]
# check_website_access is the legacy policy gate; imported at
# module level (lazy-friendly because the website_policy import is
# cheap) so monkeypatching it in tests works as expected.
results: List[Dict[str, Any]] = []
for url in urls:
if _is_interrupted():
results.append({"url": url, "error": "Interrupted", "title": ""})
continue
# Pre-scrape website policy gate
blocked = check_website_access(url)
if blocked:
logger.info(
"Blocked web_extract for %s by rule %s",
blocked["host"],
blocked["rule"],
)
results.append(
{
"url": url,
"title": "",
"content": "",
"error": blocked["message"],
"blocked_by_policy": {
"host": blocked["host"],
"rule": blocked["rule"],
"source": blocked["source"],
},
}
)
continue
try:
logger.info("Firecrawl scraping: %s", url)
try:
scrape_result = await asyncio.wait_for(
asyncio.to_thread(
_get_firecrawl_client().scrape,
url=url,
formats=formats,
),
timeout=60,
)
except asyncio.TimeoutError:
logger.warning("Firecrawl scrape timed out for %s", url)
results.append(
{
"url": url,
"title": "",
"content": "",
"error": (
"Scrape timed out after 60s — page may be too large "
"or unresponsive. Try browser_navigate instead."
),
}
)
continue
scrape_payload = _extract_scrape_payload(scrape_result)
metadata = scrape_payload.get("metadata", {})
content_markdown = scrape_payload.get("markdown")
content_html = scrape_payload.get("html")
# Ensure metadata is a dict (SDK may return a typed object)
if not isinstance(metadata, dict):
if hasattr(metadata, "model_dump"):
metadata = metadata.model_dump()
elif hasattr(metadata, "__dict__"):
metadata = metadata.__dict__
else:
metadata = {}
title = metadata.get("title", "")
final_url = metadata.get("sourceURL", url)
# Re-check website-access policy after any redirect
final_blocked = check_website_access(final_url)
if final_blocked:
logger.info(
"Blocked redirected web_extract for %s by rule %s",
final_blocked["host"],
final_blocked["rule"],
)
results.append(
{
"url": final_url,
"title": title,
"content": "",
"raw_content": "",
"error": final_blocked["message"],
"blocked_by_policy": {
"host": final_blocked["host"],
"rule": final_blocked["rule"],
"source": final_blocked["source"],
},
}
)
continue
# Choose markdown vs html according to the requested format
if format == "markdown" or (format is None and content_markdown):
chosen_content = content_markdown
else:
chosen_content = content_html or content_markdown or ""
results.append(
{
"url": final_url,
"title": title,
"content": chosen_content,
"raw_content": chosen_content,
"metadata": metadata,
}
)
except Exception as scrape_err: # noqa: BLE001
logger.debug("Firecrawl scrape failed for %s: %s", url, scrape_err)
results.append(
{
"url": url,
"title": "",
"content": "",
"raw_content": "",
"error": str(scrape_err),
}
)
return results
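The per-URL timeout guard used in `extract()` follows a general pattern: run the sync call in a worker thread and bound the wait. The sketch below isolates that pattern. `call_with_timeout` and `slow` are hypothetical names, and `asyncio.to_thread` requires Python 3.9 or newer.

```python
import asyncio
import time
from typing import Any, Callable, Dict


async def call_with_timeout(func: Callable[[], Any], timeout: float) -> Dict[str, Any]:
    """Run a sync callable in a thread; turn a timeout into an error item."""
    try:
        value = await asyncio.wait_for(asyncio.to_thread(func), timeout=timeout)
    except asyncio.TimeoutError:
        # Report the failure as data so the caller can continue with other URLs.
        return {"error": f"timed out after {timeout}s"}
    return {"value": value}


def slow() -> str:
    """Illustration-only stand-in for a scrape that takes too long."""
    time.sleep(0.3)
    return "done"
```

A caveat of this pattern: `wait_for` stops waiting, but the worker thread is not killed and runs to completion in the background, so the timeout bounds the caller's latency rather than the work itself.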
async def crawl(self, url: str, **kwargs: Any) -> Dict[str, Any]:
"""Crawl a seed URL via Firecrawl's ``/crawl`` endpoint.
Sync SDK call wrapped in ``asyncio.to_thread`` because the dispatcher
in :func:`tools.web_tools.web_crawl_tool` is async and runs LLM
post-processing on the response. The dispatcher gates the seed URL
against SSRF + website-access policy before calling us; this method
re-checks every crawled page's URL against the policy after the
crawl returns to catch redirected pages that map to a blocked host.
Accepted kwargs (others ignored for forward compat):
- ``instructions``: str, logged then dropped. Firecrawl's /crawl
endpoint does NOT accept natural-language instructions (that's
an /extract feature), so we record the value for debugging and
proceed without it. Tavily's crawl IS instruction-aware; this
divergence is documented in both plugins' docstrings.
- ``limit``: int, max pages to crawl (default 20).
- ``depth``: str, accepted for API parity with Tavily; ignored
by Firecrawl's crawl endpoint.
Returns ``{"results": [...]}`` matching the shape that
:func:`tools.web_tools.web_crawl_tool`'s shared LLM-summarization
path expects. Per-page failures (policy block on redirected URL,
bad response shape) are included as items with an ``error`` field
rather than raising.
"""
try:
from tools.interrupt import is_interrupted
if is_interrupted():
return {"results": [{"url": url, "title": "", "content": "", "error": "Interrupted"}]}
instructions = kwargs.get("instructions")
limit = kwargs.get("limit", 20)
# Firecrawl's /crawl endpoint does not accept natural-language
# instructions (that's an /extract feature). Log + drop.
if instructions:
logger.info(
"Firecrawl crawl: 'instructions' parameter ignored "
"(not supported by Firecrawl /crawl)"
)
logger.info("Firecrawl crawl: %s (limit=%d)", url, limit)
crawl_params = {
"limit": limit,
"scrape_options": {"formats": ["markdown"]},
}
# The SDK call is sync; run in a thread so we don't block the
# gateway event loop on a multi-page crawl.
crawl_result = await asyncio.to_thread(
_get_firecrawl_client().crawl,
url=url,
**crawl_params,
)
# CrawlJob normalization across SDK + direct + gateway shapes.
data_list: List[Any] = []
if hasattr(crawl_result, "data"):
data_list = crawl_result.data if crawl_result.data else []
logger.info(
"Firecrawl crawl status: %s, %d pages",
getattr(crawl_result, "status", "unknown"),
len(data_list),
)
elif isinstance(crawl_result, dict) and "data" in crawl_result:
data_list = crawl_result.get("data", []) or []
else:
logger.warning(
"Firecrawl crawl: unexpected result type %r",
type(crawl_result).__name__,
)
pages: List[Dict[str, Any]] = []
for item in data_list:
# Pydantic model | typed object | dict — handle all shapes.
content_markdown = None
content_html = None
metadata: Any = {}
if hasattr(item, "model_dump"):
item_dict = item.model_dump()
content_markdown = item_dict.get("markdown")
content_html = item_dict.get("html")
metadata = item_dict.get("metadata", {})
elif hasattr(item, "__dict__"):
content_markdown = getattr(item, "markdown", None)
content_html = getattr(item, "html", None)
metadata_obj = getattr(item, "metadata", {})
if hasattr(metadata_obj, "model_dump"):
metadata = metadata_obj.model_dump()
elif hasattr(metadata_obj, "__dict__"):
metadata = metadata_obj.__dict__
elif isinstance(metadata_obj, dict):
metadata = metadata_obj
else:
metadata = {}
elif isinstance(item, dict):
content_markdown = item.get("markdown")
content_html = item.get("html")
metadata = item.get("metadata", {})
# Ensure metadata is a plain dict.
if not isinstance(metadata, dict):
if hasattr(metadata, "model_dump"):
metadata = metadata.model_dump()
elif hasattr(metadata, "__dict__"):
metadata = metadata.__dict__
else:
metadata = {}
page_url = metadata.get(
"sourceURL", metadata.get("url", "Unknown URL")
)
title = metadata.get("title", "")
# Per-page policy re-check (catches blocked redirects).
page_blocked = check_website_access(page_url)
if page_blocked:
logger.info(
"Blocked crawled page %s by rule %s",
page_blocked["host"],
page_blocked["rule"],
)
pages.append(
{
"url": page_url,
"title": title,
"content": "",
"raw_content": "",
"error": page_blocked["message"],
"blocked_by_policy": {
"host": page_blocked["host"],
"rule": page_blocked["rule"],
"source": page_blocked["source"],
},
}
)
continue
content = content_markdown or content_html or ""
pages.append(
{
"url": page_url,
"title": title,
"content": content,
"raw_content": content,
"metadata": metadata,
}
)
return {"results": pages}
except ValueError as exc:
return {"results": [{"url": url, "title": "", "content": "", "error": str(exc)}]}
except ImportError as exc:
return {
"results": [
{
"url": url,
"title": "",
"content": "",
"error": f"Firecrawl SDK not installed: {exc}",
}
]
}
except Exception as exc: # noqa: BLE001
logger.warning("Firecrawl crawl error: %s", exc)
return {
"results": [
{
"url": url,
"title": "",
"content": "",
"error": f"Firecrawl crawl failed: {exc}",
}
]
}
def get_setup_schema(self) -> Dict[str, Any]:
return {
"name": "Firecrawl",
"badge": "paid · optional gateway",
"tag": (
"Full search + extract + crawl; supports direct API and "
"Nous tool-gateway routing."
),
"env_vars": [
{
"key": "FIRECRAWL_API_KEY",
"prompt": "Firecrawl API key (or leave blank for self-hosted)",
"url": "https://docs.firecrawl.dev/introduction",
},
],
}
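The per-item handling in ``crawl`` above reduces three possible shapes (Pydantic-style model, plain object, dict) to a plain dict before reading ``sourceURL``. A minimal standalone sketch of that idea follows. The names ``to_plain_dict``, ``page_url``, ``FakeModel`` and ``FakeObject`` are hypothetical and exist only for illustration; they are not part of the plugin or of the Firecrawl SDK.

```python
from typing import Any, Dict


def to_plain_dict(obj: Any) -> Dict[str, Any]:
    """Coerce a dict, a Pydantic-style model, or a plain object into a dict."""
    if isinstance(obj, dict):
        return obj
    if hasattr(obj, "model_dump"):
        # Pydantic v2 models expose model_dump(); checked before __dict__.
        return obj.model_dump()
    if hasattr(obj, "__dict__"):
        return dict(vars(obj))
    return {}


class FakeModel:
    """Hypothetical stand-in for a Pydantic model (only model_dump matters)."""

    def model_dump(self) -> Dict[str, Any]:
        return {"sourceURL": "https://example.com/a", "title": "A"}


class FakeObject:
    """Hypothetical typed object that carries its fields as attributes."""

    def __init__(self) -> None:
        self.url = "https://example.com/b"


def page_url(metadata: Any) -> str:
    """Same lookup order as the diff: sourceURL, then url, then a placeholder."""
    meta = to_plain_dict(metadata)
    return meta.get("sourceURL", meta.get("url", "Unknown URL"))
```

Folding the shape checks into one helper is only one way to structure this; the code in the diff keeps them inline so that markdown, HTML and metadata are read in the same branch.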
+16
@@ -0,0 +1,16 @@
"""Parallel.ai web search + extract plugin — bundled, auto-loaded.
First plugin in this repo to expose an async :meth:`extract`, because Parallel's
SDK is async-native (``AsyncParallel.beta.extract``). The web_extract_tool
dispatcher detects coroutines via :func:`inspect.iscoroutinefunction` and
awaits.
"""
from __future__ import annotations
from plugins.web.parallel.provider import ParallelWebSearchProvider
def register(ctx) -> None:
"""Register the Parallel provider with the plugin context."""
ctx.register_web_search_provider(ParallelWebSearchProvider())
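The docstring above says the dispatcher detects coroutine functions with :func:`inspect.iscoroutinefunction` and awaits them. The following is a rough sketch of that dispatch pattern, not the actual ``web_extract_tool`` code. ``dispatch_extract``, ``SyncProvider`` and ``AsyncProvider`` are hypothetical names, and running the sync branch through ``asyncio.to_thread`` (Python 3.9+) is an assumption about how a sync provider might be handled.

```python
import asyncio
import inspect
from typing import Any, Dict, List


class SyncProvider:
    """Hypothetical provider whose extract() is a plain function."""

    def extract(self, urls: List[str]) -> List[Dict[str, Any]]:
        return [{"url": u, "content": "sync"} for u in urls]


class AsyncProvider:
    """Hypothetical provider whose extract() is declared async def."""

    async def extract(self, urls: List[str]) -> List[Dict[str, Any]]:
        return [{"url": u, "content": "async"} for u in urls]


async def dispatch_extract(provider: Any, urls: List[str]) -> List[Dict[str, Any]]:
    """Await async providers directly; run sync ones off the event loop."""
    if inspect.iscoroutinefunction(provider.extract):
        return await provider.extract(urls)
    # Assumption: sync providers are pushed to a worker thread.
    return await asyncio.to_thread(provider.extract, urls)
```

``inspect.iscoroutinefunction`` works on bound methods, so the check can be made on ``provider.extract`` without calling it first.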
+7
@@ -0,0 +1,7 @@
name: web-parallel
version: 1.0.0
description: "Parallel.ai web search + content extraction. Search returns objective-tuned results; extract uses the async SDK for parallel page fetches. Requires PARALLEL_API_KEY — sign up at https://parallel.ai."
author: NousResearch
kind: backend
provides_web_providers:
- parallel
+291
@@ -0,0 +1,291 @@
"""Parallel.ai web search + content extraction — plugin form.
Subclasses :class:`agent.web_search_provider.WebSearchProvider`. Uses two
distinct Parallel SDK clients:
- ``Parallel`` (sync) for :meth:`search`
- ``AsyncParallel`` (async) for :meth:`extract`
This is the first plugin to exercise the **async-extract** code path in
the ABC: :meth:`extract` is declared ``async def``, and the dispatcher
in :func:`tools.web_tools.web_extract_tool` detects coroutines via
:func:`inspect.iscoroutinefunction` and awaits.
Config keys this provider responds to::
web:
search_backend: "parallel" # explicit per-capability
extract_backend: "parallel" # explicit per-capability
backend: "parallel" # shared fallback
# Optional: search mode (default "agentic"; also "fast" or "one-shot")
# via the PARALLEL_SEARCH_MODE env var.
Env vars::
PARALLEL_API_KEY=... # https://parallel.ai (required)
PARALLEL_SEARCH_MODE=agentic # optional: agentic|fast|one-shot
"""
from __future__ import annotations
import logging
import os
from typing import Any, Dict, List
from agent.web_search_provider import WebSearchProvider
logger = logging.getLogger(__name__)
# Module-level note: the canonical cache slots ``_parallel_client`` and
# ``_async_parallel_client`` live on :mod:`tools.web_tools` so tests that do
# ``tools.web_tools._parallel_client = None`` between cases see fresh state.
# The plugin reads/writes through that public module (see
# :func:`_get_sync_client` / :func:`_get_async_client`).
def _ensure_parallel_sdk_installed() -> None:
"""Trigger lazy install of the parallel SDK if it isn't present.
Mirrors the lazy-deps pattern used by the legacy implementation.
Swallows benign ImportError from the lazy_deps helper itself; if the
SDK is genuinely missing the subsequent ``from parallel import ...``
raises ImportError that the caller can handle.
"""
try:
from tools.lazy_deps import ensure as _lazy_ensure
_lazy_ensure("search.parallel", prompt=False)
except ImportError:
pass
except Exception as exc: # noqa: BLE001 — surface install hint as ImportError
raise ImportError(str(exc)) from exc
def _get_sync_client() -> Any:
"""Lazy-load + cache the sync Parallel client.
Cache lives on :mod:`tools.web_tools` (as ``_parallel_client``) so unit
tests that reset that name between cases keep working.
"""
import tools.web_tools as _wt
cached = getattr(_wt, "_parallel_client", None)
if cached is not None:
return cached
api_key = os.getenv("PARALLEL_API_KEY")
if not api_key:
raise ValueError(
"PARALLEL_API_KEY environment variable not set. "
"Get your API key at https://parallel.ai"
)
_ensure_parallel_sdk_installed()
from parallel import Parallel # noqa: WPS433 — deliberately lazy
client = Parallel(api_key=api_key)
_wt._parallel_client = client
return client
def _get_async_client() -> Any:
"""Lazy-load + cache the async Parallel client.
Cache lives on :mod:`tools.web_tools` (as ``_async_parallel_client``).
"""
import tools.web_tools as _wt
cached = getattr(_wt, "_async_parallel_client", None)
if cached is not None:
return cached
api_key = os.getenv("PARALLEL_API_KEY")
if not api_key:
raise ValueError(
"PARALLEL_API_KEY environment variable not set. "
"Get your API key at https://parallel.ai"
)
_ensure_parallel_sdk_installed()
from parallel import AsyncParallel # noqa: WPS433 — deliberately lazy
client = AsyncParallel(api_key=api_key)
_wt._async_parallel_client = client
return client
def _reset_clients_for_tests() -> None:
"""Drop both cached clients so tests can re-instantiate cleanly.
Clears the canonical slots on :mod:`tools.web_tools` (where
:func:`_get_sync_client` / :func:`_get_async_client` read/write them).
"""
import tools.web_tools as _wt
_wt._parallel_client = None
_wt._async_parallel_client = None
# Backward-compatible aliases for the names that lived in tools.web_tools
# before the migration (matches existing tests + external callers).
_get_parallel_client = _get_sync_client
_get_async_parallel_client = _get_async_client
def _resolve_search_mode() -> str:
"""Return the validated PARALLEL_SEARCH_MODE value (default "agentic")."""
mode = os.getenv("PARALLEL_SEARCH_MODE", "agentic").lower().strip()
if mode not in {"fast", "one-shot", "agentic"}:
mode = "agentic"
return mode
class ParallelWebSearchProvider(WebSearchProvider):
"""Parallel.ai search + async extract provider."""
@property
def name(self) -> str:
return "parallel"
@property
def display_name(self) -> str:
return "Parallel"
def is_available(self) -> bool:
"""Return True when ``PARALLEL_API_KEY`` is set to a non-empty value."""
return bool(os.getenv("PARALLEL_API_KEY", "").strip())
def supports_search(self) -> bool:
return True
def supports_extract(self) -> bool:
return True
def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
"""Execute a Parallel search (sync).
Uses the ``beta.search`` endpoint with the configured mode
(``PARALLEL_SEARCH_MODE`` env var, default "agentic"). ``limit`` is
clamped to at most 20 before the request is sent.
"""
try:
from tools.interrupt import is_interrupted
if is_interrupted():
return {"success": False, "error": "Interrupted"}
mode = _resolve_search_mode()
logger.info(
"Parallel search: '%s' (mode=%s, limit=%d)", query, mode, limit
)
response = _get_sync_client().beta.search(
search_queries=[query],
objective=query,
mode=mode,
max_results=min(limit, 20),
)
web_results = []
for i, result in enumerate(response.results or []):
excerpts = result.excerpts or []
web_results.append(
{
"url": result.url or "",
"title": result.title or "",
"description": " ".join(excerpts) if excerpts else "",
"position": i + 1,
}
)
return {"success": True, "data": {"web": web_results}}
except ValueError as exc:
return {"success": False, "error": str(exc)}
except ImportError as exc:
return {
"success": False,
"error": f"Parallel SDK not installed: {exc}",
}
except Exception as exc: # noqa: BLE001
logger.warning("Parallel search error: %s", exc)
return {"success": False, "error": f"Parallel search failed: {exc}"}
async def extract(
self, urls: List[str], **kwargs: Any
) -> List[Dict[str, Any]]:
"""Extract content from one or more URLs via the async SDK.
Returns the legacy list-of-results shape that
:func:`tools.web_tools.web_extract_tool` expects: one entry per
successful URL plus one entry per failed URL with an ``error``
field. Errors are not raised; they're returned as per-URL items.
"""
try:
from tools.interrupt import is_interrupted
if is_interrupted():
return [
{"url": u, "error": "Interrupted", "title": ""} for u in urls
]
logger.info("Parallel extract: %d URL(s)", len(urls))
response = await _get_async_client().beta.extract(
urls=urls,
full_content=True,
)
results: List[Dict[str, Any]] = []
for result in response.results or []:
content = result.full_content or ""
if not content:
content = "\n\n".join(result.excerpts or [])
url = result.url or ""
title = result.title or ""
results.append(
{
"url": url,
"title": title,
"content": content,
"raw_content": content,
"metadata": {"sourceURL": url, "title": title},
}
)
for error in response.errors or []:
results.append(
{
"url": error.url or "",
"title": "",
"content": "",
"error": error.content or error.error_type or "extraction failed",
"metadata": {"sourceURL": error.url or ""},
}
)
return results
except ValueError as exc:
return [{"url": u, "title": "", "content": "", "error": str(exc)} for u in urls]
except ImportError as exc:
return [
{"url": u, "title": "", "content": "", "error": f"Parallel SDK not installed: {exc}"}
for u in urls
]
except Exception as exc: # noqa: BLE001
logger.warning("Parallel extract error: %s", exc)
return [
{"url": u, "title": "", "content": "", "error": f"Parallel extract failed: {exc}"}
for u in urls
]
def get_setup_schema(self) -> Dict[str, Any]:
return {
"name": "Parallel",
"badge": "paid",
"tag": "Objective-tuned search + parallel page extraction.",
"env_vars": [
{
"key": "PARALLEL_API_KEY",
"prompt": "Parallel API key",
"url": "https://parallel.ai",
},
],
}
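``_get_sync_client`` and ``_get_async_client`` above keep their cache on a shared module so that tests can reset it by assigning ``None`` to the slot. A small sketch of that pattern follows, with ``types.SimpleNamespace`` standing in for :mod:`tools.web_tools`. The names ``shared``, ``_client`` and ``get_client`` are hypothetical and used only for illustration.

```python
import types
from typing import Any, Callable

# Hypothetical stand-in for the shared module that owns the cache slot.
shared = types.SimpleNamespace(_client=None)


def get_client(factory: Callable[[], Any]) -> Any:
    """Return the cached client, building it through factory on first use."""
    cached = getattr(shared, "_client", None)
    if cached is not None:
        return cached
    client = factory()
    # Write back to the shared slot so every caller sees the same instance.
    shared._client = client
    return client


first = get_client(object)
second = get_client(object)  # served from the cache, factory not called again

shared._client = None        # what a test would do between cases
third = get_client(object)   # a fresh instance is built after the reset
```

Because the slot lives outside the function that fills it, resetting it does not require importing or patching the plugin module itself.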
+15
@@ -0,0 +1,15 @@
"""SearXNG search plugin — bundled, auto-loaded.
Backed by a user-hosted SearXNG instance (URL configured via ``SEARXNG_URL``).
Search-only: pair with an extract provider (firecrawl/tavily/exa) for
``web_extract`` calls.
"""
from __future__ import annotations
from plugins.web.searxng.provider import SearXNGWebSearchProvider
def register(ctx) -> None:
"""Register the SearXNG provider with the plugin context."""
ctx.register_web_search_provider(SearXNGWebSearchProvider())
