Compare commits

..

660 Commits

Author SHA1 Message Date
emozilla ddd2542ba5 fix(dashboard): persist chat tab state across tab switches
The dashboard's Chat tab (hermes dashboard --tui) lost its session
whenever the user navigated to another tab and came back.  React Router
unmounted ChatPage on path change, which ran the cleanup function,
closed the PTY WebSocket, and terminated the underlying TUI child -
so the next mount generated a fresh channel id, spawned a new PTY, and
started a brand-new conversation.

Rather than rebuild the destroyed state (session id capture + resume
via HERMES_TUI_RESUME would reload history from disk but drop in-flight
tool state, scrollback, and picker position), keep the component tree
alive.

* Pull ChatPage out of Routes into a sibling always-mounted host that
  toggles visibility via display:none keyed off the current route.  A
  tiny ChatRouteSink still claims /chat so the catch-all redirect
  does not fire.
* xterm instance, WebSocket, PTY child, and TUI/agent state all
  survive; returning to /chat shows the exact conversation the user
  left.
* Respect plugin `/chat` overrides: if a plugin manifest declares
  `tab.override: "/chat"`, the Routes tree already swaps the element
  for <PluginPage /> — we additionally suppress the persistent host
  so the two don't paint on top of each other.  Preserves the
  pre-persistence contract that a plugin owning /chat replaces the
  built-in chat UI entirely.
* Wait for usePlugins() to finish loading before mounting the
  persistent host.  Manifests arrive asynchronously from
  /api/dashboard/plugins, so without the `!pluginsLoading` gate the
  host would mount with manifests=[], spawn a PTY, and then unmount
  mid-session when the manifest list resolves and reveals a /chat
  override.  Typical delay is <50ms; worst case is the 2s plugin-
  registration safety timeout.  Cheaper than killing someone's
  conversation underneath them.
* Gate page-header slot (`setEnd`), the mobile sheet's portalled
  render, and body-scroll lock on a new `isActive` prop so the hidden
  ChatPage doesn't fight the active page for shared state.  The
  scroll-lock effect keys on the *derived* `mobilePanelOpen` (which is
  `isActive && mobilePanelOpenRaw`) rather than the raw state — that
  way tab-switch flips the dep false, fires the cleanup, and releases
  `document.body.style.overflow`.  Keying on the raw state would leave
  body.overflow="hidden" stuck on /sessions and every other tab until
  the user navigated back to /chat and explicitly closed the sheet.
* When isActive flips false to true, force a double-rAF fit:
  display:none collapses the host box and ResizeObserver does not fire
  on display changes, so xterm would otherwise stay at a stale or 1x1
  grid.  Also early-return from syncTerminalMetrics when the host has
  zero area, since fit() on a zero-sized element produces a 1x1
  terminal.
* Focus handling on tab return: only steal focus into the terminal if
  focus wasn't already parked somewhere inside ChatPage (e.g. the
  sidebar model picker, a tool-call entry).  Yanking focus away from
  whatever the user last clicked is surprising and a screen-reader
  foot-gun; the typical "first activation" case still focuses the
  terminal because document.activeElement is <body> at that point.

Trade-off worth flagging, deliberately not mitigated in this change:
while hidden, ChatPage still holds a PTY child + WebSocket + xterm
instance for the dashboard's full lifetime.  The WS keeps delivering
bytes and xterm keeps parsing them into a display:none host (cheap —
no paint work, but not free).  Reasonable costs to pay for the session
preservation; if they become a problem we can pause `term.write` when
!isActive or idle-disconnect after N minutes hidden.

Lint clean on touched files.  tsc -b && vite build pass.
2026-04-28 02:40:25 -04:00
iamagenius00 f2fcc087f7 test(gateway): cover /compress summary-failure warning path
PR #16333 added a warning to the manual /compress reply when the
auxiliary summariser fails and the static fallback placeholder is
used, but only the gateway-hygiene path had a test
(test_session_hygiene_warns_user_when_summary_generation_fails).
The /compress branch in _handle_compress_command was uncovered.

New test test_compress_command_appends_warning_when_summary_generation_fails
mocks the compressor's _last_summary_fallback_used /
_last_summary_dropped_count / _last_summary_error fields and
verifies the /compress reply contains the ⚠️ marker, the underlying
error string, the dropped message count, and the 'historical
message(s) were removed' wording — i.e. the same contract the
hygiene-path test enforces.
2026-04-27 19:18:13 -07:00
iamagenius00 e7f2204a07 fix(compression): reset _last_summary_error at start of compress()
The per-call reset block at the top of compress() cleared
_last_summary_dropped_count and _last_summary_fallback_used but
not _last_summary_error. Functionally this didn't break the
gateway warning path (callers gate on _last_summary_fallback_used
first, and _last_summary_error is overwritten on the next failure),
but it left the three tracking fields inconsistent — anyone
reading _last_summary_error standalone after a successful compress
would see a stale value from a previous failed compress.

Reset all three together so the per-call contract is uniform.
2026-04-27 19:18:13 -07:00
iamagenius00 5c56805a74 fix(compression): align fallback placeholder wording with gateway warning
The fallback placeholder said "N conversation turns were removed" while the
gateway warning said "N historical message(s) were removed". Use "messages"
in both so users don't wonder if the two counters refer to different things.
2026-04-27 19:18:13 -07:00
iamagenius00 c61bc3f72c fix(compression): pass thread_id metadata + add gateway test for warning delivery
Address review feedback on PR #16333:

1. The hygiene-path warning send was missing metadata=_hyg_meta. On
   Telegram topics / Slack threads / Discord threads the warning would
   land in the main channel instead of the originating thread. Now
   reuses the same _hyg_meta dict already computed for the hygiene
   compaction itself.

2. New gateway-level test
   test_session_hygiene_warns_user_when_summary_generation_fails
   verifies end-to-end:
   - When the compressor's _last_summary_fallback_used flag is True,
     the gateway invokes adapter.send() exactly once.
   - The warning message includes the dropped count and the underlying
     error string.
   - metadata={'thread_id': ...} is propagated so the warning lands
     in the originating topic/thread.

Tests: 20 gateway hygiene + 54 context_compressor — all pass.
2026-04-27 19:18:13 -07:00
iamagenius00 dfdc4276e8 fix(compression): notify gateway users when summary generation fails
When auxiliary compression's summary LLM call fails (e.g. model 404,
auxiliary model misconfigured), the compressor still drops the selected
turns and inserts a static fallback placeholder — the dropped context
is unrecoverable.

Previously the only signal of this was a WARNING in agent.log. Gateway
users (Telegram/Discord/etc.) had no way to know context was lost
because the existing _emit_warning path requires a status_callback,
and the gateway hygiene path uses a temporary _hyg_agent with
quiet_mode=True and no callback wired up.

Changes:
- ContextCompressor: track _last_summary_fallback_used and
  _last_summary_dropped_count on each compress() call. Cleared at the
  start of compress() and on session reset.
- gateway/run.py hygiene: after auto-compress, inspect the temp
  agent's compressor; if fallback was used, send a visible ⚠️ warning
  to the user via the platform adapter (TG/Discord/etc.) including
  dropped count and the underlying error.
- gateway/run.py /compress: append the same warning to the manual
  compress reply so users running /compress see the failure too.

Acceptance:
- Summary success: no user-visible warning (unchanged).
- Summary failure on gateway hygiene: user receives a TG/Discord
  message with dropped count + error + remediation hint.
- Summary failure on /compress: warning appended to the command reply.
- CLI status_callback / _emit_warning path is untouched.
- Test coverage: two new tests verify the tracking fields are set on
  failure and cleared on subsequent success.
2026-04-27 19:18:13 -07:00
Teknium f40b20d13c fix(gateway): keep typing indicator alive across slow send_typing calls (#16763)
The typing-indicator refresh loop in BasePlatformAdapter._keep_typing
awaited each send_typing call unconditionally. Each call is an HTTP
round-trip to the platform API (Telegram/Discord), normally ~100ms. When
the same network instability that causes upstream provider timeouts
(e.g. Anthropic capacity blips slowing first-token latency past the
120s stream-read timeout) also slows the platform typing API to
multi-second response times, the refresh loop stalls inside the await.
Platform-side typing expires at ~5s, so the bubble dies and stays dead
until the stuck send_typing call returns — right when the user most
needs the 'still working' signal and instead sees a bot that looks
dead, then asks 'wtf are you doing' which itself interrupts the
eventually-recovering turn.

Bound each send_typing with asyncio.wait_for (1.5s cap, derived from
interval so it's always below the 2s cadence). Slow calls get abandoned
so the next scheduled tick fires a fresh send_typing on schedule. As
long as any one of them reaches the platform within its ~5s
typing-expiry window, the bubble stays visible across the stall.

Also catches non-timeout send_typing exceptions (transient HTTP errors)
so one bad tick doesn't terminate the whole loop.

Tests: 4 new in tests/gateway/test_keep_typing_timeout.py covering
slow-send non-blocking, fast-send still-awaited, exception resilience,
and paused-chat regression guard.
2026-04-27 19:09:32 -07:00
kshitijk4poor 853ed609a1 feat(skills): bundle touchdesigner-mcp by default 2026-04-27 18:22:58 -07:00
helix4u 49fb75463f fix(gateway): keep env-token Slack enabled 2026-04-27 18:19:14 -07:00
brooklyn! e0e67a99bb fix(tui): address copilot follow-up review on PR #16732 (#16740)
- moveCursor(extend=true) now collapses to the bare cursor when the
  computed offset equals the existing anchor instead of leaving a
  zero-length sel. Without this, Shift+Left at col 0 / Shift+Home at
  start would silently hide the hardware cursor (selected truthy)
  without rendering any highlight.
- _tui_need_npm_install also catches UnicodeDecodeError so a corrupted
  / non-UTF8 lockfile falls back to the mtime path the docstring
  promises instead of crashing.

Made-with: Cursor
2026-04-27 16:54:25 -07:00
brooklyn! e7091bb326 fix(tui): mouse + keyboard text selection in the composer (#16732)
* feat(tui): auto copy-on-select for transcript text

Drag in the transcript already highlighted but you had to press Cmd+C to
land it on the clipboard, and the highlight cleared on copy — most users
never realised selection existed. Now drag-release fires copySelectionNoClear
so the text is on the clipboard immediately while the highlight stays put,
matching iTerm2's "Copy to pasteboard on selection" default. Esc clears.

Behaviour:
- Single click in the input still positions the cursor (TextInput onClick).
- Single click in the transcript still does nothing destructive.
- Double / triple click select word / line, then drag extends.
- /copyselect [on|off|toggle] (alias /cos) flips the setting at runtime,
  HERMES_TUI_DISABLE_COPY_ON_SELECT=1 disables at startup, persists via
  display.tui_copy_on_select in config.yaml.

Help overlay now lists drag-select, multi-click, and click-to-position
so the gestures are discoverable.

Made-with: Cursor

* fix(tui): support prompt text selection gestures

Add mouse drag selection and Shift+Arrow/Home/End extension inside the TUI composer so prompt text behaves like a normal editable field while keeping click-to-position and right-click paste intact.

Made-with: Cursor

* Revert "feat(tui): auto copy-on-select for transcript text"

This reverts commit 6701288fe0.

* fix(tui): allow composer selection from prompt whitespace

Give the composer a one-cell mouse capture pad before the editable text. The prompt glyph/gutter still does not become selectable, but dragging from the edge now anchors at input offset 0 so users do not need to hit the first character precisely.

Made-with: Cursor

* fix(tui): clear selections from blank composer space

Clicking blank space in the transcript or composer now clears active TUI/input selections like a normal text surface. TextInput clicks stop bubbling so cursor placement and selection gestures keep their local behavior.

Made-with: Cursor

* fix(tui): delegate prompt gutter drags to composer text

The prompt gutter is now an input gesture region, not selectable content. Dragging from the whitespace or prompt area anchors the composer selection at offset 0, while selection highlight/copy remains limited to actual input text.

Made-with: Cursor

* fix(tui): move composer cursor to end on selection clear

External clear actions now collapse the composer selection to the end of the input, matching normal text-field behavior after dismissing a selection.

Made-with: Cursor

* fix(tui): capture composer padding before prompt

Add an explicit mouse capture cell over the left padding before the prompt glyph. Drags starting there now delegate to the composer input at offset 0 instead of starting terminal-level selection over the prompt chrome.

Made-with: Cursor

* fix(tui): avoid npm install on lockfile mtime churn

Compare package-lock.json against npm's hidden node_modules lock by content instead of mtimes. Git checkouts and npm lock rewrites can make the root lockfile newer even when installed dependencies already match, causing hermes --tui to print Installing TUI dependencies on every launch.

Made-with: Cursor

* fix(tui): include prompt leading cell in gesture region

Use the prompt box's real layout region to cover the leading whitespace cell before the glyph. The cell now participates in mouse hit testing and delegates to composer selection instead of starting terminal-level selection.

Made-with: Cursor

* fix(tui): widen prompt-side gesture capture band

Capture a wider left-side band around the composer prompt row so drags starting in terminal gutter/padding cells are consumed and delegated to input selection, instead of triggering terminal-level selection chrome.

Made-with: Cursor

* fix(tui): make pre-prompt spacer non-selectable content

Replace the sticky-prompt fallback `Text(' ')` with an empty spacer box so the visual gap remains but no literal space character is rendered/copyable before the composer prompt.

Made-with: Cursor

* fix(tui): capture pre-prompt spacer without shifting prompt layout

Revert the widened negative-margin prompt capture band and instead capture drags on the dedicated spacer row above the prompt. This keeps prompt/text alignment stable while still delegating whitespace-start drags to composer selection.

Made-with: Cursor

* fix(tui): align prompt with status bar and capture full input row

Drop the leading prompt column from 3 to 2 so the input first character lines up with the status bar text. Wrap the prompt+input row in a single mouse-capture box and stop event propagation from TextInput's own handlers so any drag in that row delegates to composer selection without leaking to terminal-level selection.

Made-with: Cursor

* fix(tui): anchor hardware cursor during composer selection

When a composer selection covers a row exactly the column width, the rendered text fills the row and the terminal auto-wraps the hardware cursor to col 0 of the next row, leaving a ghost block beneath the prompt. Park the cursor at the start of the input box during selection so it can't escape the input region.

Made-with: Cursor

* fix(tui): hide hardware cursor during composer selection

Stop fighting auto-wrap by hiding the hardware cursor outright while the
composer has an active selection. This prevents both the ghost block under
the prompt (cursor wrapping past the last cell) and the parked-cursor block
on the first selected character. The cursor restores as soon as the
selection clears or focus changes.

Made-with: Cursor

* chore(tui): /clean — drop dead capture-pad path, dedupe gutter handlers

- TextInput: remove unused leftCaptureColumns prop and capture-pad math, drop
  unused mouseApi.startAt, fold mouse offset into a single offsetAt helper,
  share a MouseEventLite type across the four handlers.
- appLayout: hoist a GutterMouseEvent type and an endInputDrag callback so the
  spacer/prompt/input rows share one shape.
- _tui_need_npm_install: lift the runtime-only key set to a module constant,
  collapse nested isinstance checks, and document the mtime fallback.

Made-with: Cursor

* fix(tui): address copilot review on PR #16732

- Split InputSelection.clear() into clear() (cursor-preserving) and
  collapseToEnd() (clear + jump to end). Cmd+C copy paths keep using
  clear() so the cursor stays put; the blank-area click in useMainApp
  switches to collapseToEnd() to match the requested UX.
- Spacer-row drags now force row=0 when forwarding into the input,
  since the spacer's vertical origin doesn't align with the input box
  and Ink mouse-capture keeps dispatching motion to the original
  target. Prompt+input row drag keeps localRow because origins match.

Made-with: Cursor

* fix(tui): give TextInput Box an explicit width

After the /clean pass dropped the unused capture-pad math, the wrapping
Box also lost its explicit width and started sizing to its rendered
content. Clicks past the last character missed TextInput and fell
through to the parent prompt-row Box, which collapsed the cursor to
offset 0. Pin the Box back to `columns` so the input owns its full
column span regardless of value length.

Made-with: Cursor

* feat(tui): double-click select-all + hide cursor on terminal blur

- Track click time/offset in TextInput so a quick second click on the
  same offset triggers select-all. Ink's screen-level multi-click is
  bypassed once our onMouseDown captures, so the gesture has to be
  detected locally.
- Extend the cursor-hide effect to also fire when the terminal loses
  focus, so the hollow-rect ghost most terminals draw at the parked
  cursor position disappears too.

Made-with: Cursor

* chore(tui): /clean — extract isMultiClickAt helper

Pull the click-recurrence math out of TextInput's onMouseDown into a
small isMultiClickAt(offset) helper so the handler reads as the gesture
list it actually is (multi-click → select-all, otherwise start).
Drop the redundant length>0 guard now that selectAll() already noops on
an empty value.

Made-with: Cursor

* docs(tui): explain _tui_need_npm_install content-vs-mtime comparison

Expand the docstring so future readers understand why we parse the
lockfiles instead of comparing mtimes, what the optional/peer skip
covers, how stale hidden-lock entries are handled, and when we fall
back to mtime.
2026-04-27 16:43:48 -07:00
Ben Barclay bebc10528f Merge pull request #16728 from NousResearch/docs/docker-multi-profile-section
docs(docker): add "Multi-profile support" section recommending one container per profile
2026-04-28 09:29:24 +10:00
Ben Barclay 273be93499 docs(docker): restore accidentally-redacted placeholder strings
The previous commit on this branch went through a layer that redacted
strings matching API-key patterns. Restore the original placeholder
values (sk-ant-..., ${ANTHROPIC_API_KEY}, etc.) that were already in
main so the diff is scoped strictly to the new Multi-profile support
section.
2026-04-28 08:21:40 +10:00
Ben Barclay adc2856ffb docs(docker): add "Multi-profile support" section
Clarifies that Hermes' built-in multi-profile feature is not recommended
when running under Docker. Recommends instead running one container per
profile, each bind-mounting its own host data directory as /opt/data.
Includes docker run examples, a rationale list (isolation, independent
lifecycle, port separation, concurrent-write safety), and a Compose
snippet showing two profile services side by side.
2026-04-28 08:20:01 +10:00
brooklyn! 46b4cf8d21 Merge pull request #16707 from NousResearch/bb/tui-queue-delete
feat(tui): delete queued message while editing with ctrl-x / cancel with esc
2026-04-27 15:56:46 -05:00
Brooklyn Nicholson 718088c382 fix(tui): copilot review on #16707 — naming, label consistency, esc priority
- Rename `removeAt` → `removeAtInPlace` and document the mutation
  contract; the old name read like a non-mutating helper.
- Hotkey table + queue header: use `Ctrl+X` / `Esc` to match the
  rest of the UI (was `⌃X` / `esc`).
- Render the queued header as a single template literal so JSX
  text-node whitespace can't sneak into the rendered line.
- Make `Esc` while editing beat the `terminal.hasSelection` clear:
  the header promises 'Esc cancel', so an active selection
  shouldn't silently consume the keystroke.
2026-04-27 15:37:54 -05:00
Brooklyn Nicholson 32b068560d fix(tui): stop ctrl+x from leaking a literal 'x' into the composer
The text input's ctrl-passthrough whitelist only listed Ctrl+C and
Ctrl+B.  Ctrl+X fell through to the printable-char branch and got
inserted as 'x' alongside the queue-delete action firing in
useInputHandlers.

Add Ctrl+X to the same whitelist so it bypasses the readline-style
fallback and reaches the app-level handler unchanged.  When not in
queue-edit mode it's a no-op, which is fine — typing 'x' on Ctrl+X
was the wrong default anyway.
2026-04-27 15:32:16 -05:00
Brooklyn Nicholson ea1012f59f feat(tui): delete queued message while editing with ctrl-x / cancel with esc
Today there's no way to remove a queued message — ↑ loads it for edit,
ctrl-K dispatches the head, but a draft you no longer want stays put
forever. ctrl-C just clears the composer and exits edit mode without
touching the queue.

Two new bindings, both gated on queueEditIdx !== null so they're
inert when the user isn't pointing at a queue item:

- ctrl-X — delete the queue item being edited, clear composer, exit
  edit mode.  "cut" matches the mental model and doesn't collide with
  any existing binding.
- esc — cancel the edit (composer clears, item stays in queue).
  Mirrors ctrl-C's existing behavior so muscle memory has two paths.

Header line now reads `queued (3) · editing 2 · ⌃X delete · esc cancel`
when in edit mode, so the affordance is discoverable without /help.
The /help hotkey table also gets a Ctrl+X entry.

ctrl-C is intentionally unchanged: it should never destroy queued
content.  Cancel is non-destructive (esc / ctrl-C); only ctrl-X
removes the item.
2026-04-27 15:24:14 -05:00
Erosika 4a9ac5c355 fix(memory): drop scrub from interim commentary + final response
Same layering concern as the persisted-assistant scrub already removed:
_emit_interim_assistant_message and the final_response return path were
mutating model output broadly.  Streaming scrubber covers real leaks
delta-by-delta; these post-stream scrubs were redundant.
2026-04-27 12:37:33 -07:00
Erosika 49e3a1d8ee style: trim verbose comment blocks added by previous commit 2026-04-27 12:37:33 -07:00
Erosika e553f6f3e4 fix(memory): narrow scrub surface to known wrapper boundaries
Reviewer pushback on the original boundary-hardening commits — three
overreach points pulled plugin-specific policy into shared core paths:

1. gateway/run.py hardcoded a '## Honcho Context' literal split for
   vision-LLM output.  Plugin-format heading in framework code; could
   truncate legitimate output naturally containing that header.
   Drop the literal split; keep generic sanitize_context (the wrapper
   strip is plugin-agnostic).  Plugin-specific cleanup belongs at the
   provider boundary, not the shared gateway path.

2. run_agent.run_conversation scrubbed user_message and
   persist_user_message before the conversation loop.  User text is
   sacred — if a user types a literal <memory-context> tag we must
   not silently delete it.  The producer (build_memory_context_block)
   is the only legitimate emitter; user input should never need the
   reverse op.

3. _build_assistant_message scrubbed model output before persistence.
   Same hazard: would silently mutate legitimate documentation/code
   the model emits containing the literal markers.  The streaming
   scrubber catches real leaks delta-by-delta before content is
   concatenated; persist-time scrub was redundant belt-and-suspenders.

4. _fire_stream_delta stripped leading newlines from every delta unless
   a paragraph break flag was set.  Mid-stream '\n' is legitimate
   markdown — lists, code fences, paragraph breaks — and chunk
   boundaries are arbitrary.  Narrow lstrip to the very first delta
   of the stream only (so stale provider preamble still gets cleaned
   on turn start, but mid-stream formatting survives).

Plus: build_memory_context_block now logs a warning when its defensive
sanitize_context strips something — surfaces buggy providers returning
pre-wrapped text instead of silently double-fencing.

Net architectural change: scrub surface collapses from 8 sites to 3
(StreamingContextScrubber on output deltas, plugin→backend send,
build_memory_context_block input-validation).  Plugin-specific strings
stay out of shared runtime paths.  User input and persisted assistant
output are no longer mutated.

Tests: rescoped TestMemoryContextSanitization (helper-correctness only,
no source-inspection of removed call sites), updated vision tests to
drop '## Honcho Context' literal-split assertions, updated
_build_assistant_message persistence test to assert preservation.
Added: cross-turn scrubber reset, build_memory_context_block warn-on-
violation, mid-stream newline preservation (plain + code fence).
2026-04-27 12:37:33 -07:00
Erosika 05435a35ed chore(release): map honcho-consolidation contributor emails
Adds AUTHOR_MAP entries for the 5 cherry-picked authors in #15381
so the contributor-attribution CI check passes.
2026-04-27 12:37:33 -07:00
Erosika 894e0b935b feat(honcho): explain why when honcho_profile returns an empty card
Closed PR #5137 addressed the retrieval path (peer cards via get_card()
instead of the session-scoped lookup that returned empty for per-session
messaging flows) — that architectural fix is already in main as
_fetch_peer_card / _fetch_peer_context.

What never got fixed is the user-visible side: honcho_profile returning
a flat 'No profile facts available yet.' leaves the model to guess at
why.  The model then often surfaces it to the user as a cryptic error.

Adds a diagnostic hint next to the existing 'result' message, enumerating
the likely causes in rough order of frequency:

  1. Observation disabled for this peer (user_observe_me/others off)
  2. Peer card hasn't accumulated yet (fresh peer / dialectic cadence
     hasn't fired enough turns — cards build over time)
  3. Generic fallback: self-hosted Honcho < 3.x lacks peer cards

The hint also suggests alternative tools (honcho_reasoning / honcho_search)
so the model can route around the empty card rather than giving up.

Schema description updated so the model knows the hint field exists and
that an empty card is NOT an error state.

7 tests cover the hint paths: warmup, observation-disabled for user + ai,
generic fallback, populated card still returns plain result (no hint),
alternative-tool suggestion present.
2026-04-27 12:37:33 -07:00
Erosika 5883df5574 fix(honcho): keep legacy schemeless baseUrl configs working
The scheme-validation commit (e77a3f2c) was too strict: a user with
legacy ''baseUrl: localhost:8000'' (no ''http://'' prefix) in their
''~/.honcho/config.json'' would get ''No API key configured'' from the
CLI after that change, even though their setup worked before.

urlparse on a schemeless host:port treats the host segment as the
scheme and leaves netloc empty, so the http/https check rejected it.

Falls back to a lenient check for schemeless strings that look like
hosts: contain '.' or ':', aren't a boolean/null literal, aren't pure
digits. The SDK still rejects truly malformed URLs at connect time
with a clearer error than ours.

Three new tests: legacy schemeless hosts accepted; obvious garbage
literals (''true'', ''null'', ''12345'') still rejected.  Reviewer
noted concern #1: schemeless regression for self-hosters with old
configs.
2026-04-27 12:37:33 -07:00
Erosika cd276eef78 compat(honcho): accept metadata kwarg on on_memory_write ABC bump
main's 6a957a74 added an optional 'metadata' kwarg to
MemoryProvider.on_memory_write so providers can distinguish tool-driven
memory writes from background-review writes.  MemoryManager already
does a getfullargspec-based introspection, so the old 3-arg signature
didn't break at runtime — but it missed the origin hint entirely.

Updates HonchoMemoryProvider.on_memory_write to accept the kwarg.  The
metadata isn't yet threaded into Honcho's create_conclusion payload —
that's worth its own PR once the consolidation lands and the new
metadata shape stabilises.
2026-04-27 12:37:33 -07:00
Erosika 02ab255a0d style(honcho): hoist hashlib import; validate baseUrl scheme before 'local' sentinel
Two small follow-ups to the PR review:

- Hoist hashlib import from _enforce_session_id_limit() to module top.
  stdlib imports are free after first cache, but keeping all imports at
  module top matches the rest of the codebase.

- _resolve_api_key now URL-parses baseUrl and requires http/https +
  non-empty netloc before returning the 'local' sentinel.  A typo like
  baseUrl: 'true' (or bare 'localhost') no longer silently passes the
  credential guard; the CLI correctly reports 'not configured'.

Three new tests cover the new validation (garbage strings, non-http
schemes, valid https).
2026-04-27 12:37:33 -07:00
Erosika 3b2edb347d fix(gateway): scrub memory-context leaks from vision auto-analysis output
fixes #5719

The auxiliary vision LLM called by gateway._enrich_message_with_vision
can echo its injected Honcho system prompt back into the image
description.  That description gets embedded verbatim into the enriched
user message, so recalled memory (personal facts, dialectic output)
surfaces into a user-visible bubble.

Strips both forms of leak before embedding:
  - <memory-context>...</memory-context> fenced blocks (sanitize_context)
  - trailing '## Honcho Context' sections (header + everything after)

Plus regression tests:
  - tests/agent/test_streaming_context_scrubber.py — 13 tests on the
    stateful scrubber (whole block, split tags, false-positive partial
    tags, unterminated span, reset, case-insensitivity)
  - tests/run_agent/test_run_agent_codex_responses.py — 2 new tests on
    _fire_stream_delta covering the realistic 7-chunk leak scenario and
    the cross-turn scrubber reset
  - tests/gateway/test_vision_memory_leak.py — 4 tests covering the
    vision auto-analysis boundary (clean pass-through, '## Honcho Context'
    header, fenced block, both patterns together)
2026-04-27 12:37:33 -07:00
Erosika 5ce5b17a42 fix(honcho): buffer partial memory-context spans across stream deltas
sanitize_context() uses a non-greedy block regex that needs both
<memory-context> open and close tags present in a single string. When a
provider streams the fenced memory block across multiple deltas (typical
for recalled-context leaks — the payload often arrives in 10+ 1-80 char
chunks), the per-delta sanitize stripped the lone open/close tags via
_FENCE_TAG_RE but let the payload in between flow straight to the UI.

Adds StreamingContextScrubber: a small stateful scrubber that tracks
open/close tag pairs across deltas, holds back partial-tag tails at
chunk boundaries, and discards span contents wholesale (including the
system-note line that fragments across deltas).

Wired into _fire_stream_delta; reset per user turn; benign trailing
partial-tag tails are flushed at the end of each model call.  Mid-span
interruption (provider drops closing tag) drops the orphaned content
rather than leaking it — truncated answer > leaked memory.

Follow-up to #13672 (@dontcallmejames).
2026-04-27 12:37:33 -07:00
Erosika 5d349ea857 fix(honcho): hold RLock across new_session's get_or_create to close race
new_session() was popping the old cached session, releasing the lock,
calling get_or_create, then re-acquiring the lock to insert. A concurrent
caller could observe the empty-cache window and race-create its own
session, producing two divergent session objects for the same key.

_cache_lock is an RLock, so nested reacquisition inside get_or_create is
safe. Hold it across the whole pop/create/insert sequence.

Follow-up to #13510 (@hekaru-agent).
2026-04-27 12:37:33 -07:00
twozle 82205276c1 fix(plugins/memory/honcho): default Honcho SDK HTTP timeout to 30s
When no explicit timeout is configured (HonchoClientConfig.timeout,
honcho.timeout / requestTimeout, or HONCHO_TIMEOUT), get_honcho_client
previously constructed the SDK with no timeout kwarg, letting the
underlying httpx client hang indefinitely if the Honcho backend
became unreachable mid-request.

This is a silent-failure hazard on the post-response path of
run_conversation: the memory_manager.sync_all() / queue_prefetch_all()
calls fire after the agent has already generated its final reply, so
a stalled Honcho request blocks run_conversation from returning.
The gateway never logs "response ready" and never delivers the
response to the platform (Telegram, etc.), even though the text is
already saved to the session file.

Repro: unplug the network or block app.honcho.dev mid-turn after
the model has produced its final message. Without this change,
_run_agent never returns. With it, the call aborts after 30s,
run_conversation returns, and the gateway delivers the response
(Honcho sync failure is logged and swallowed as before).

The default applies only when nothing is configured, so any
deployment that has explicitly set timeout / HONCHO_TIMEOUT /
honcho.timeout / honcho.requestTimeout keeps its existing value.
Self-hosted deployments that genuinely need a longer ceiling can
still override via any of those knobs.
2026-04-27 12:37:33 -07:00
Alexander Yususpov 36d6b643f6 fix(honcho): CLI credential guard rejects self-hosted baseUrl configs
_resolve_api_key() only checks for apiKey / HONCHO_API_KEY, so all
CLI subcommands (identity --show, status, migrate, etc.) bail with
"No API key configured" on self-hosted instances that use baseUrl
without an API key.

Return "local" when baseUrl or HONCHO_BASE_URL is set, matching the
client.py behavior that already handles this case for the SDK.

Tested on: macOS, self-hosted Honcho (Docker, localhost:8000).
2026-04-27 12:37:33 -07:00
HiddenPuppy 5d36871d92 Fix Honcho HOME-aware global config fallback 2026-04-27 12:37:33 -07:00
dontcallmejames f1ba4014e1 fix: harden memory-context leak boundaries 2026-04-27 12:37:33 -07:00
dontcallmejames 39713ba2ae fix: strip leaked memory context from commentary 2026-04-27 12:37:33 -07:00
hekaru-agent dad0217450 fix(honcho): thread-safe session cache via RLock
Wraps _session_cache mutations in threading.RLock. Without this, concurrent
gateway sessions (e.g., Telegram + Discord hitting Honcho at the same time)
can race on the cache and silently lose conclusions or memory writes.

Adopted from #13510 by @hekaru-agent; the off-topic cron/jobs.py cleanup
hunk from that PR is dropped here for scope isolation. Resolved a small
conflict with the pinPeerName guard (kept both).
2026-04-27 12:37:33 -07:00
Sanjays2402 cd1c4812ab fix(honcho): truncate resolve_session_name output to Honcho's 100-char limit (#13868)
Gateway session keys (Matrix "!room:server" + thread event IDs, Telegram
supergroup reply chains, Slack thread IDs with long workspace prefixes) can
exceed Honcho's 100-character session ID limit after sanitization. Every
Honcho API call for those sessions then 400s with "session_id too long".

Add a helper that enforces the 100-char limit after sanitization:
short keys (the common case) short-circuit unchanged; over-limit keys
keep a prefix and append a deterministic `-<8 hex>` SHA-256 suffix over
the original key so two long keys sharing a leading segment can't
collide onto the same truncated ID.

Adds 7 regression tests in tests/honcho_plugin/test_client.py covering
short / exact-limit / long / deterministic / collision-resistant /
allowlist-preserving / hash-suffix-present cases.
2026-04-27 12:37:33 -07:00
Brian D. Evans 326c9daa69 fix(honcho): require strict True for pin_peer_name to survive MagicMock configs (#15162)
CI caught that ``test_session_manager_prefers_runtime_user_id_over_config_peer_name``
in ``tests/agent/test_memory_user_id.py`` failed after this branch: that
test passes a ``MagicMock`` for ``config``, where
``mock.pin_peer_name`` silently returns another ``MagicMock`` — truthy by
default.  My ``getattr(..., "pin_peer_name", False)`` fallback was
supposed to guard against callers that haven't added the new attr, but
MagicMock *does* have the attr — it just returns a live mock for it.

Tightened the gate to ``getattr(..., False) is True``.  Real configs
built via ``HonchoClientConfig.from_global_config`` always yield a
proper boolean, so strict equality matches the pinned case and rejects
both the unset-attr fallback and MagicMock stand-ins.  Added a comment
explaining why ``is True`` is intentional, not paranoid.

Also tightened the ``peer_name`` existence check to
``getattr(..., None)`` so a MagicMock with ``peer_name`` left at its
default (also truthy) doesn't spuriously enable pinning either.

Verified against both the new ``test_pin_peer_name.py`` suite (13/13
pass) and the previously-failing
``TestHonchoUserIdScoping`` (3/3 pass).  Zero behaviour change for real
``HonchoClientConfig`` values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:37:33 -07:00
Brian D. Evans d03c6fcc45 fix(honcho): pinPeerName opt-in keeps memory unified across platforms (#14984)
When a gateway drives Hermes (Telegram, Discord, Slack, ...), it passes the
platform-native user ID as ``runtime_user_peer_name`` into the Honcho
session manager.  That ID wins over ``peer_name`` in ``honcho.json``, so a
single user who connects over three platforms ends up as three separate
Honcho peers — one per platform — with fragmented memory and no cross-
platform context continuity.

For multi-user bots this is correct (and must not change): each user gets
their own peer scope.  For the vast majority of personal Hermes deployments
the configured ``peer_name`` is an unambiguous identity, though, so the
reporter asked for an opt-in knob that pins the user peer to that value.

Fix: new ``pinPeerName`` boolean on the host config, default ``false``.
When ``true`` AND ``peerName`` is set, the configured peer_name beats the
gateway's runtime identity; every other resolution case is unchanged.

  honcho.json:
  {
    "peerName": "Igor",
    "hosts": {
      "hermes": { "pinPeerName": true }
    }
  }

  session.py (resolution order, pinned case):
    runtime_user_peer_name  →  skipped (opt-in flag active)
    config.peer_name        →  WINS   "Igor"
    session-key fallback    →  unreached

Parsing follows the same host-block-overrides-root pattern as every other
flag in HonchoClientConfig.from_global_config (``_resolve_bool`` helper).

Tests (tests/honcho_plugin/test_pin_peer_name.py — 13 cases, 5 groups):
- Config parsing: default, root true, host-block true, host overrides
  root, explicit false.
- Peer resolution: runtime wins by default (regression guard for multi-
  user bots), config wins when pinned, pin-without-peer_name is a no-op
  (prevents silent peer-id collapse to session-key fallback), CLI path
  where runtime is absent, deepest fallback intact, assistant peer
  untouched by the flag.
- Cross-platform unification: Telegram UID + Discord snowflake collapse
  to one peer when pinned; negative control confirms two distinct
  runtime IDs still produce two peers when unpinned.

244 honcho_plugin tests pass, 3 pre-existing skips, zero regressions.

Defensive detail: session.py uses ``getattr(self._config, "pin_peer_name",
False)`` so callers building partial config objects (several test fixtures
across the codebase do this) don't break if they haven't updated yet.
Runtime cost: one attr lookup per new session.

Closes #14984

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:37:33 -07:00
Siddharth Balyan ef41d3bd45 feat(nix): declarative plugin installation for NixOS module (#15953)
* feat(nix): parameterize dependency-groups in python.nix

* refactor(nix): extract package to callPackage-able hermes-agent.nix

Makes the package overridable via .override{} and adds
extraPythonPackages parameter for PYTHONPATH injection.
Includes build-time collision check using PEP 503 name
canonicalization.

* feat(nix): add overlay for external NixOS consumption

External flakes can now add overlays = [ inputs.hermes-agent.overlays.default ]
to get pkgs.hermes-agent with full .override support.

* test(nix): add check for extraPythonPackages PYTHONPATH injection

Verifies wrapper has PYTHONPATH when extras provided, and
base package has no PYTHONPATH without extras.

* feat(nix): add extraPlugins option for directory-based plugins

Symlinks plugin packages into HERMES_HOME/plugins/ at activation time.
Validates plugin.yaml presence. Asserts unique plugin names at eval time.
Hermes discovers them automatically via its directory scan.

* feat(nix): add extraPythonPackages option for entry-point plugins

Overrides the hermes package with PYTHONPATH injection when
extraPythonPackages is non-empty. Plugin .dist-info directories
become visible to importlib.metadata for entry-point discovery.
Works in both native systemd and container modes.

* docs: add NixOS declarative plugin installation to nix-setup, plugins, and build-a-plugin guides

- nix-setup.md: new Plugins section with extraPlugins/extraPythonPackages
  examples, overlay usage, collision checking note, options reference rows
- plugins.md: Nix row in discovery table, NixOS declarative plugins section
- build-a-hermes-plugin.md: Distribute for NixOS section after pip section

* fix: address review feedback — remove unrelated umask, fix fetchFromGitHub naming, simplify checks

- Remove accidentally introduced umask/migration changes (unrelated to plugins)
- Add pluginName helper, fix fetchFromGitHub producing name='source'
- Show name= in extraPlugins example docs
- Simplify checks.nix: use hermes-agent.override instead of re-callPackage
- Fix fragile grep shell logic in checks

* refactor: address simplify feedback — lib.getName, drop unused inputs', Python list for extras

- Use lib.getName instead of custom pluginName helper
- Drop unused inputs' from checks.nix perSystem args
- Pass extraPythonPackages as Python list literal instead of colon-split string

* fix: walk propagatedBuildInputs for plugin PYTHONPATH and collision check

Uses python312.pkgs.requiredPythonModules to resolve the full transitive
closure of extraPythonPackages. Without this, a plugin with third-party
deps (e.g. requests) would fail at runtime if those deps weren't already
in the sealed uv2nix venv. The collision check now also scans the full
closure, catching transitive conflicts.

* cleanup: fold plugins into subdir loop, use find for symlink cleanup, inline lib.getName

- Add 'plugins' to the existing cron/sessions/logs/memories subdir loop
  instead of a separate mkdir/chown/chmod block
- Replace fragile for-glob with find -delete for stale symlink cleanup
- Inline lib.getName at both call sites, remove pluginName wrapper
2026-04-28 00:18:32 +05:30
Siddharth Balyan 1fa76607c0 feat: trigram FTS5 index for CJK search, replace LIKE fallback (#16651)
* fix: bypass FTS5 for CJK queries in session_search

FTS5 default tokenizer splits CJK characters into individual tokens,
so multi-character queries like "大别山项目" become AND of single chars.
This produces few/no results compared to LIKE substring search.

For CJK queries, skip FTS5 entirely and use LIKE for accurate
phrase matching.

Fixes NousResearch/hermes-agent#15500

* fix: cache _contains_cjk, escape LIKE wildcards, add regression tests

On top of the CJK FTS5 bypass from #15509:

- Cache _contains_cjk() result in a local var to avoid redundant O(n)
  scans on every CJK query
- Escape %, _ in LIKE queries so literal wildcards in user input are
  not treated as SQL wildcards (consistent with other LIKE queries in
  hermes_state.py that use ESCAPE '\')
- Fix misleading comment ('or CJK fallback' → accurate description)
- Add 3 regression tests:
  - test_cjk_partial_fts5_results_supplemented_by_like (#15500 / #14829)
  - test_cjk_like_dedup_no_duplicates
  - test_cjk_like_escapes_wildcards (new wildcard escaping)

* feat: trigram FTS5 index for CJK search, replace LIKE fallback

Replace the LIKE '%query%' full-table-scan fallback for CJK queries with
a proper trigram FTS5 index (messages_fts_trigram).  The trigram tokenizer
creates overlapping 3-byte sequences so substring matching works natively
for any script — CJK, Thai, etc.

For queries with 3+ CJK characters: uses the trigram FTS5 table with
proper ranking, snippets, and indexed lookups.  For shorter queries
(1-2 CJK chars): falls back to LIKE since the trigram tokenizer needs
≥9 UTF-8 bytes (3 CJK chars) minimum.

Schema v10 migration creates the trigram table and backfills existing
messages.  Triggers keep the index in sync on INSERT/UPDATE/DELETE.

Builds on top of #16276 (bypass FTS5 for CJK, escape LIKE wildcards).

---------

Co-authored-by: vominh1919 <vominh1919@gmail.com>
2026-04-28 00:12:07 +05:30
brooklyn! e80504b088 Merge pull request #16656 from NousResearch/bb/tui-parity-mutating-commands
fix(tui): route mutating slash commands through live gateway state
2026-04-27 13:30:19 -05:00
Brooklyn Nicholson ed4f7f0ba3 test(tui): skip slash parity matrix when Python registry is unavailable
Keep the parity test backed by the real Python command registry while avoiding hard failures in Node-only Vitest environments that cannot import hermes_cli.commands.
2026-04-27 13:19:11 -05:00
kshitijk4poor 56724147ef fix(providers/gmi): post-salvage review fixes
- config.py: remove dead ENV_VARS_BY_VERSION[17] entry (current _config_version
  is 22, so all users are past version 17 and would never be prompted for
  GMI_API_KEY on upgrade — consistent with how arcee was added)
- auxiliary_client.py: use google/gemini-3.1-flash-lite-preview as GMI aux
  model instead of anthropic/claude-opus-4.6 (matches cheap fast-model pattern
  used by all other providers: zai→glm-4.5-flash, kimi→kimi-k2-turbo-preview,
  stepfun→step-3.5-flash, kilocode→google/gemini-3-flash-preview)
- test_gmi_provider.py: fix malformed write_text() call in doctor test
  (was: write_text("GMI_API_KEY=*** encoding="utf-8") → missing closing quote,
  wrote literal string 'GMI_API_KEY=*** encoding=' to .env file)
- test_gmi_provider.py + test_auxiliary_client.py: update aux model assertions
  to match new cheaper default
- docs/integrations/providers.md: add 'gmi' to inline 'Supported providers'
  fallback list (was only in the table, not the inline list at line ~1181)
- docs/reference/cli-commands.md: add 'gmi' to --provider choices list
2026-04-27 11:17:59 -07:00
Isaac Huang c53fcb0173 feat(providers): add GMI Cloud as a first-class API-key provider (#11955)
Add GMI Cloud (api.gmi-serving.com) as a full first-class API-key provider
with built-in auth, aliases, model catalog, CLI entry points, auxiliary client
routing, context length resolution, doctor checks, env var tracking, and docs.

- auth.py: ProviderConfig for 'gmi' (api_key, GMI_API_KEY / GMI_BASE_URL)
- providers.py: HermesOverlay with extra_env_vars for models.dev detection
- models.py: curated slash-form model catalog; live /v1/models fetch
- main.py: 'gmi' in _named_custom_provider_map and --provider choices
- model_metadata.py: _URL_TO_PROVIDER, _PROVIDER_PREFIXES, dedicated
  context-length probe block (GMI's /models has authoritative data)
- auxiliary_client.py: alias entries; _compat_model fix for slash-form
  models on cached aggregator-style clients; gmi aux default model
- doctor.py: GMI in provider connectivity checks
- config.py: GMI_API_KEY / GMI_BASE_URL in OPTIONAL_ENV_VARS
- conftest.py: explicit GMI_BASE_URL clearing (not caught by _API_KEY suffix)
- docs: providers.md, environment-variables.md, fallback-providers.md,
  configuration.md, quickstart.md (expands provider table)

Co-authored-by: Isaac Huang <isaachuang@Isaacs-MacBook-Pro.local>
2026-04-27 11:17:59 -07:00
Brooklyn Nicholson 8a33ed6136 fix(tui): address rollback guard and parity registry review
Load slash command names from the Python registry instead of regex-parsing source, and guard native rollback when no TUI session is active.
2026-04-27 13:10:13 -05:00
brooklyn! 41f70e6fc4 Merge pull request #16664 from NousResearch/bb/fix-tui-forceredraw-export
fix(tui): expose forceRedraw in Ink type shim
2026-04-27 13:08:16 -05:00
Brooklyn Nicholson adbd173ddd fix(tui): expose forceRedraw in Ink type shim 2026-04-27 13:07:48 -05:00
Brooklyn Nicholson 4f59510dd4 fix(tui): tighten fast-mode support validation
Distinguish missing model from unsupported model before enabling fast mode and cover both cases so config and live agent state remain untouched on invalid fast toggles.
2026-04-27 13:00:11 -05:00
Brooklyn Nicholson 4a08f1015a fix(tui): reject fast mode for unsupported live models
Match classic CLI parity by refusing to enable fast mode when the active model cannot produce fast request overrides, avoiding a misleading fast status with no runtime effect.
2026-04-27 12:55:41 -05:00
Brooklyn Nicholson 8bd5d0667a Merge origin/main into bb/tui-parity-mutating-commands
Resolve session command merge conflict and keep the branch current with main so PR #16656 is mergeable.
2026-04-27 12:51:11 -05:00
brooklyn! 6d24880604 Merge pull request #16657 from NousResearch/bb/tui-keybinding-model-parity
fix(tui): align Ctrl+L and /model default scope with classic CLI
2026-04-27 12:49:37 -05:00
Brooklyn Nicholson b8556eb15e fix(tui): address fast-mode live sync review feedback
Make `config.set fast status` read-only and keep live agent request overrides in sync with fast-mode toggles so runtime API kwargs match the selected mode.
2026-04-27 12:47:42 -05:00
Brooklyn Nicholson b3e7a412e2 fix(tui): wire Ctrl+L to Ink forceRedraw path
Expose a small forceRedraw API from @hermes/ink and use it for Ctrl/Cmd+L so the hotkey performs a real terminal clear + full repaint instead of a no-op state patch.
2026-04-27 12:44:24 -05:00
Brooklyn Nicholson da6f8449a5 test(tui): tighten redraw hotkey review follow-ups
Use explicit repaint patch semantics for Ctrl/Cmd+L and narrow the hotkey assertion to the actual +L entry so unrelated descriptions do not cause false failures.
2026-04-27 12:30:40 -05:00
Brooklyn Nicholson a13449a40a fix(tui): address Copilot review feedback on mutating command parity
Harden busy mode config reads against invalid display config shapes and align /fast help+usage text with accepted aliases, with regression coverage for non-dict display values.
2026-04-27 12:30:30 -05:00
Brooklyn Nicholson 17029a64e8 chore(ui-tui): apply npm run fix formatting pass
Run ui-tui lint autofix + prettier and commit the resulting formatting-only changes for the keybinding/model parity branch.
2026-04-27 12:25:27 -05:00
Brooklyn Nicholson 487da4b72b chore(ui-tui): apply npm run fix formatting pass
Run ui-tui lint autofix + prettier and commit the resulting formatting-only changes for the parity PR branch.
2026-04-27 12:25:21 -05:00
Brooklyn Nicholson 4909b94f99 fix(tui): align Ctrl+L and /model with classic CLI semantics
Make Ctrl+L non-destructive by redrawing the current screen state instead of starting a new session, and stop auto-appending --global for typed /model commands so session scope remains the default unless explicitly requested.
2026-04-27 12:23:56 -05:00
Brooklyn Nicholson a4cb3ef66c fix(tui): make mutating slash paths native and lifecycle-safe
Route /browser, /reload-mcp, /rollback, /stop, /fast, and /busy through direct TUI RPC handlers so state changes hit the live gateway session instead of slash-worker fallback. Add TUI session finalize/reset parity hooks (memory commit + plugin boundaries) and parity matrix tests to keep mutating commands off fallback.
2026-04-27 12:20:08 -05:00
brooklyn! d5a89283b7 Merge pull request #16625 from NousResearch/bb/fix-tui-title-session-sync
fix(tui): keep /title session names in sync
2026-04-27 12:05:54 -05:00
Brooklyn Nicholson 633f74504f fix(ci): resolve follow-up title edge case and flaky checks
Handle queued-title ValueError cleanup during session init, harden Discord message source building for test stubs, and fix the Dockerfile contract test syntax error. Also refresh the TUI lockfile and Nix build flags so nix ubuntu-latest no longer fails on npm lock/peer resolution drift.
2026-04-27 11:49:02 -05:00
Brooklyn Nicholson 27936ee02d fix(tui-gateway): keep queued user titles from being dropped
Retry queued pending titles even when the DB already has a non-empty title so explicit user title intents are not silently lost (for example after auto-title). Includes regression coverage.
2026-04-27 11:31:49 -05:00
Brooklyn Nicholson 3aa86717b6 fix(tui-gateway): harden pending-title retry and user errors
Retry persisting queued titles on session.title reads and map title validation failures to a user-facing 4022 code instead of generic 5007.
2026-04-27 11:27:51 -05:00
Brooklyn Nicholson 492c4c6573 fix(tui-gateway): address follow-up Copilot title threads
Tighten pending-title flush during session init and treat row lookup failures during title-set no-op detection as RPC errors instead of silently queueing.
2026-04-27 11:15:37 -05:00
Brooklyn Nicholson 3824b03237 fix(tui-gateway): harden session title RPC edge cases
Handle session.title read failures without crashing, distinguish no-op title writes from missing session rows, and use a distinct empty-title error code with regression coverage.
2026-04-27 11:05:10 -05:00
Brooklyn Nicholson 42b917c92c chore: uptick 2026-04-27 08:52:12 -07:00
Brooklyn Nicholson 7ccfb97fee test(cli): assert active-session file lifecycle in launch_tui
Validate that the temp active-session file exists while the TUI subprocess runs and is removed after launch cleanup to match mkstemp semantics.
2026-04-27 08:52:12 -07:00
Brooklyn Nicholson 7a6128cc4f fix(tui): harden active-session temp file handling
- create HERMES_TUI_ACTIVE_SESSION_FILE with mkstemp instead of a predictable tmp path and always cleanup in finally
- add assertions that launch wiring uses a randomized session file path and removes it on exit
2026-04-27 08:52:12 -07:00
Brooklyn Nicholson 4b28140912 fix(cli): tighten MRU lookup and session DB cleanup
- use a grouped last_active join in search_sessions to avoid per-row correlated max lookups
- always close SessionDB in _resolve_last_session via finally and add regression coverage for search failure cleanup
2026-04-27 08:52:12 -07:00
Brooklyn Nicholson 653b5ec128 fix(tui): report actual session on exit 2026-04-27 08:52:12 -07:00
Brooklyn Nicholson 164e33aa46 fix(cli): resolve -c by true MRU session
- order session listing by computed last_active in SessionDB so callers get MRU rows directly
- keep _resolve_last_session as a single-row lookup and add regression coverage for >20 session sampling
2026-04-27 08:52:12 -07:00
Brooklyn Nicholson cdfbd89ea5 fix(tui): keep /title session names in sync
Route TUI /title through session.title RPC and queue titles when the session DB row is still initializing, so renamed sessions reliably appear in /resume and browse flows.
2026-04-27 10:51:14 -05:00
kshitijk4poor 730347e38f feat(skills): expand touchdesigner-mcp with GLSL, post-FX, audio, geometry references (#13664)
Add 6 new reference files with generic reusable patterns:
- glsl.md: uniforms, built-in functions, shader templates, Bayer dither
- postfx.md: bloom, CRT scanlines, chromatic aberration, feedback glow
- layout-compositor.md: layoutTOP, overTOP grids, panel dividers
- operator-tips.md: wireframe rendering, feedback TOP setup
- geometry-comp.md: instancing, POP vs SOP rendering, shape morphing
- audio-reactive.md: band extraction (audiofilterCHOP), beat detection, MIDI

Expand pitfalls.md (#46-63):
- Connection syntax, moviefileoutTOP bug, batch frame capture
- TOP.save() time advancement, feedback masking, incremental builds
- MCP reconnection after project.load(), TOX reverse-engineering
- sliderCOMP naming, create() suffix requirement
- COMP reparenting (copyOPs), expressionCHOP crash
- Strip session-specific names in earlier pitfalls (promo_ -> my_)
- Audio device CHOP at FPS=0: active=False is the fix, not volume=0

All content is generic — no session-specific paths, hardware, aesthetics,
or param-name-only entries (those belong in td_get_par_info).
Bumps version 1.0.0 -> 1.1.0.

Salvaged from @kshitijk4poor's original PR #13664; dropped setup.sh and
troubleshooting.md changes that reverted subsequent HERMES_HOME and pgrep
fixes already on main, and preserved original author frontmatter.
2026-04-27 08:46:36 -07:00
Teknium 628ca99d9b fix(compression): show main + aux model and provider in feasibility warning (#16619)
The auto-lowered-threshold warning only named the compression model,
making it confusing when the main and aux models are configured with
the same slug but end up with different resolved context lengths (e.g.
OpenRouter's stepfun/step-3.5-flash catalog value vs. a main-model
context_length override). Users couldn't tell whether the warning
reflected two different models or a context-resolution mismatch.

Now includes both 'model (provider)' labels. The aux provider falls
back to the client's base_url hostname when the configured provider
is 'auto', so users see where compression is actually being called.
2026-04-27 08:43:24 -07:00
Teknium 460a8ce5d9 chore(release): map hermes-agent-dhabibi bot -> dhabibi 2026-04-27 08:35:50 -07:00
hermes-agent-dhabibi aa53fb661a fix(copilot): mark native image requests as vision
Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>
2026-04-27 08:35:50 -07:00
hermes-agent-dhabibi 8402ba150e fix(copilot): send vision header for Copilot vision requests
Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>
2026-04-27 08:35:50 -07:00
brooklyn! 512c610058 Merge pull request #16605 from NousResearch/bb/fix-tui-docker-ink-build
fix(docker): prebuild TUI assets in image
2026-04-27 10:17:58 -05:00
Brooklyn Nicholson b479205396 fix(docker): tighten TUI build contract 2026-04-27 10:15:00 -05:00
Austin Pickett 60f2415a4a Merge pull request #16600 from NousResearch/austin/fix/model-provider
fix(models): consolidate provider and model into /model command
2026-04-27 08:14:27 -07:00
Austin Pickett 082acc75b0 fix(review): address copilot review 2026-04-27 11:06:28 -04:00
Brooklyn Nicholson 4424a0e0f7 fix(docker): prebuild TUI assets in image 2026-04-27 10:05:07 -05:00
kshitij 98d75dea5a perf(tui): lazily seed virtual history heights (#16523) 2026-04-27 07:55:45 -07:00
Teknium 9b55365f6f fix(gateway,cron): close ephemeral agents + reap stale aux clients (salvage #13979) (#16598)
* fix: clean gateway auxiliary client caches on teardown

* fix(gateway): recover from stale pid files and close cron agents

Two issues were keeping the gateway from surviving long runs:

1. `_cleanup_invalid_pid_path` delegated to `remove_pid_file`, which
   refuses to unlink when the file's pid differs from our own. That
   safety check exists for the --replace atexit handoff, but it also
   applied to stale-record cleanup, so after a crashy exit the pid
   file was orphaned: `write_pid_file()`'s O_EXCL create then failed
   with `FileExistsError`, and systemd looped on "PID file race lost
   to another gateway instance". Unlink unconditionally from this
   helper since the caller has already verified the record is dead.

2. The cron scheduler never closed the ephemeral `AIAgent` it creates
   per tick, and never swept the process-global auxiliary-client
   cache. Over days of 10-minute ticks this leaked subprocesses and
   async httpx transports until the gateway hit EMFILE. Release the
   agent and call `cleanup_stale_async_clients()` in `run_job`'s
   outer `finally`, matching the gateway's own per-turn cleanup.

* chore(release): map bloodcarter@gmail.com -> bloodcarter

---------

Co-authored-by: bloodcarter <bloodcarter@gmail.com>
2026-04-27 07:41:42 -07:00
Austin Pickett a0b62e0c5a fix(models): consolidate provider and model into /model command 2026-04-27 10:38:36 -04:00
Teknium ac0325c257 diagnostic(cli): log slow bracketed-paste handler (>500ms) for #16263 (#16575)
When a paste takes longer than 500ms to process on the prompt_toolkit
event-loop thread, emit a logger.warning with elapsed time, byte size,
line count, and sys.platform. Gives us concrete repro data for the
recurring 'CLI freezes after paste on macOS' class of reports (issue
#16263, plus sibling reports across Claude Code / Cursor / Lightroom
against macOS Tahoe 26).

Pure diagnostic — no behavior change. Two time.perf_counter() calls
and one conditional per paste event. Log line only fires when the
handler is actually slow, so normal pastes add no log noise.
2026-04-27 06:44:36 -07:00
Teknium 817633bc5d feat(backup): exclude SQLite WAL/SHM/journal sidecars (#16576)
The backup takes a consistent snapshot of each .db via sqlite3.backup(),
so shipping the live .db-wal / .db-shm / .db-journal alongside pairs the
fresh snapshot with stale sidecar state and produces a torn restore on
first open. Sidecars are transient and SQLite regenerates them on next
connection anyway.

This also trims multi-MB of junk from every zip — state.db-wal alone was
~9 MB here, doubled by the fact the WAL is the live write-ahead log, not
data.
2026-04-27 06:43:52 -07:00
Teknium 9692ce2072 chore(release): map andrewho.sf@gmail.com -> andrewhosf
Release-notes contributor attribution for the salvaged PR #13734 fix.
2026-04-27 06:42:32 -07:00
Teknium 008860a23f fix(approval): close remaining prompt_toolkit deadlock vectors (#15216)
PR #13734 fixed the concurrent-tool-executor vector (ThreadPoolExecutor
workers didn't inherit the CLI's TLS approval callback). Two vectors
remained that could still land in the deadlocking input() fallback:

1. _spawn_background_review spawns a raw threading.Thread with no
   approval callback installed, so any dangerous-command guard the
   review agent trips falls back to input() -> deadlock against the
   parent's prompt_toolkit TUI (same class as delegate_task subagents,
   fixed in 023b1bff1 / #15491). Install a _bg_review_auto_deny
   callback at thread start, clear on finally.

2. prompt_dangerous_approval's fallback unconditionally spawned a
   daemon thread calling input() when approval_callback was None.
   That fallback can never succeed under prompt_toolkit because the
   user's Enter goes to pt's raw-mode stdin capture. Detect an active
   pt Application via get_app_or_none() and fail closed (deny + log)
   instead, so future threads that forget to install a callback
   degrade gracefully instead of hanging 60s invisibly.

Regression guards:
- tests/run_agent/test_background_review.py verifies the review
  worker thread sees a callable auto-deny callback mid-run and that
  the slot is cleared in the finally block.
- tests/tools/test_approval.py TestFailClosedUnderPromptToolkit
  verifies prompt_dangerous_approval returns 'deny' fast under a
  mocked pt Application, and that a real callback still wins over
  the guard.
2026-04-27 06:42:32 -07:00
Andrew Ho 0046d170dc fix(agent): propagate approval callbacks to concurrent tool worker threads
When tools execute concurrently via ThreadPoolExecutor, worker threads
could not see the thread-local approval/sudo callbacks registered by
the CLI. This caused dangerous-command prompts to fall back to plain
input(), which deadlocks against prompt_toolkit's raw terminal mode.

Capture parent-thread callbacks before launching workers, register
them locally in each _run_tool thread, and clear them on exit.

Mirrors the existing fix pattern from cli.py run_agent() for the
main agent worker thread (GHSA-qg5c-hvr5-hjgr / #13617).
2026-04-27 06:42:32 -07:00
luyao618 8ad29a938a fix(agent): restrict background review agent to memory and skills toolsets
The background skill/memory review agent was created without toolset
restrictions, inheriting the full default tool set. This allowed it to
use terminal, send_message, delegate_task, and other tools outside its
intended scope, potentially performing unrelated side effects after
skill creation.

Restrict the review agent to only memory and skills toolsets by passing
enabled_toolsets=['memory', 'skills'] during AIAgent construction.

Fixes #15204
2026-04-27 06:41:23 -07:00
Teknium a59a98b180 fix(cli): pass session messages to shutdown_memory_provider (#15165 sibling)
The gateway fix in the previous commit forwards _session_messages on
gateway session teardown.  The CLI exit cleanup path had the same bug:
it read getattr(agent, 'conversation_history', None) or [] — but AIAgent
has no conversation_history attribute, so providers always received [].

Switch to _session_messages (same attribute the gateway now uses),
guarded by isinstance(..., list) to preserve the no-arg fallback for
MagicMock-based CLI test stubs.

Adds tests/cli/test_cli_shutdown_memory_messages.py (4 cases mirroring
the gateway suite).
2026-04-27 06:41:16 -07:00
briandevans 500774e30e fix(gateway): pass session messages to shutdown_memory_provider (#15165)
``_cleanup_agent_resources`` previously invoked
``agent.shutdown_memory_provider()`` with no arguments, so every memory
provider's ``on_session_end`` hook received an empty list. Providers
with an early-return guard on empty input (Holographic, Hindsight) never
extracted facts from the conversation, and users hit
"抱歉,找不到相關的對話記錄" on the first turn after any gateway
restart, session reset, or idle expiry.

Forward ``agent._session_messages`` — the transcript the agent itself
maintains and refreshes every turn via ``_persist_session`` — so
providers see the actual conversation. Falls back to the legacy no-arg
call whenever the attribute is absent or not a list (test stubs built
via ``object.__new__`` or ``MagicMock``) to preserve backward
compatibility with existing suites. ``AIAgent.shutdown_memory_provider``
already accepts ``messages: list = None`` (run_agent.py:4126), so this
is a pure caller-side fix.

Paths that use ``skip_memory=True`` temporary agents (memory flush,
hygiene auto-compress, ``/compress``) are no-ops inside
``shutdown_memory_provider`` because ``self._memory_manager`` is None —
no behaviour change for them.

Covers Part A of the bug report. Part B (adding ``on_session_end`` to
the Hindsight plugin) is a separate concern that would benefit from
this fix landing first.

Regression test added at
``tests/gateway/test_shutdown_memory_provider_messages.py`` covering:
populated messages forwarded, empty list still forwarded, attribute
missing falls back, non-list (MagicMock) falls back, provider
exceptions don't block ``close()``, None agent no-op, and agent
without ``shutdown_memory_provider`` tolerated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 06:41:16 -07:00
teknium1 c4ad2c33f4 chore(release): map christian@scheid.tech -> scheidti 2026-04-27 06:41:11 -07:00
Christian Scheid 75b460bc94 fix(email): add required Date header to outbound mail 2026-04-27 06:41:11 -07:00
Teknium a9033c9220 feat(backup): exclude checkpoints/ from backups (#16572)
Session-local trajectory cache — keyed by session hash, regenerated
per-session, won't port to another machine anyway. On a large install
this was multiple GB of pure noise in every zip.

Also adds a regression test for the pre-existing backups/ exclusion
so the two machine-local dirs share coverage.
2026-04-27 06:40:18 -07:00
Teknium ea3c5a14c3 feat(update): make pre-update backup opt-in (off by default) (#16566)
The zip backup could add minutes to every 'hermes update' on large
HERMES_HOME directories. Flip the default to off and add a --backup
flag for one-off opt-in runs.

- updates.pre_update_backup default: True -> False
- hermes update: new --backup flag (opposite of existing --no-backup)
- Silent no-op when disabled (no message spam on every update)
- Existing --no-backup still works and wins over --backup
- Users who explicitly set pre_update_backup: true keep the old behavior
- Tests updated to cover default-off, --backup opt-in, and config-enabled paths
2026-04-27 06:36:35 -07:00
Teknium ec671c4154 feat(image-input): native multimodal routing based on model vision capability (#16506)
* feat(image-input): native multimodal routing based on model vision capability

Attach user-sent images as OpenAI-style content parts on the user turn when
the active model supports native vision, so vision-capable models see real
pixels instead of a lossy text description from vision_analyze.

Routing decision (agent/image_routing.py::decide_image_input_mode):

  agent.image_input_mode = auto | native | text  (default: auto)

In auto mode:
  - If auxiliary.vision.provider/model is explicitly configured, keep the
    text pipeline (user paid for a dedicated vision backend).
  - Else if models.dev reports supports_vision=True for the active
    provider/model, attach natively.
  - Else fall back to text (current behaviour).

Call sites updated: gateway/run.py (all messaging platforms), tui_gateway
(dashboard/Ink), cli.py (interactive /attach + drag-drop).

run_agent.py changes:
  - _prepare_anthropic_messages_for_api now passes image parts through
    unchanged when the model supports vision — the Anthropic adapter
    translates them to native image blocks. Previous behaviour
    (vision_analyze → text) only runs for non-vision Anthropic models.
  - New _prepare_messages_for_non_vision_model mirrors the same contract
    for chat.completions and codex_responses paths, so non-vision models
    on any provider get text-fallback instead of failing at the provider.
  - New _model_supports_vision() helper reads models.dev caps.

vision_analyze description rewritten: positions it as a tool for images
NOT already visible in the conversation (URLs, tool output, deeper
inspection). Prevents the model from redundantly calling it on images
already attached natively.

Config default: agent.image_input_mode = auto.

Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py),
all existing tests that reference _prepare_anthropic_messages_for_api
still pass (198 targeted + new tests green).

* feat(image-input): size-cap + resize oversized images, charge image tokens in compressor

Two follow-ups that make the native image routing safer for long / heavy
sessions:

1) Oversize handling in build_native_content_parts:
   - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES,
     the most restrictive provider — Gemini inline data).
   - Delegates to vision_tools._resize_image_for_vision (Pillow-based,
     already battle-tested) to downscale to 5 MB first-try.
   - If Pillow is missing or resize still overshoots, the image is
     dropped and reported back in skipped[]; caller falls back to text
     enrichment for that image.

2) Image-token accounting in context_compressor:
   - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant;
     within the realistic range for Anthropic/GPT-4o/Gemini billing).
   - _content_length_for_budget() helper: sums text-part lengths and
     charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/
     input_image part.  Base64 payload inside image_url is NOT counted
     as chars — dimensions don't matter, only image-presence.
   - Both tail-cut sites (_prune_old_tool_results L527 and
     _find_tail_cut_by_tokens L1126) now call the helper so multi-image
     conversations don't slip past compression budget.

Tests: 9 new in test_image_routing.py (oversize triggers resize,
resize-fails-returns-None, oversize-skipped-reported), 11 new in
test_compressor_image_tokens.py (flat charge per image, multiple images,
Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on
raw base64, bounds-check on the constant, integration test that an
image-heavy tail actually gets trimmed).

* fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits

The previous commit imposed a hardcoded 20 MB base64 ceiling on all
providers, triggering auto-resize on anything larger. This was wrong in
both directions:

  * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400
    'image exceeds 5 MB maximum' above that).
  * Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without
    complaint (empirically verified April 2026 with progressive PNG
    sizes).

New behaviour:

  * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a
    ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder).
  * Providers NOT in the table get no ceiling — images attach at native
    size and we trust the provider to return its own error if it
    disagrees. A provider-specific 400 message is clearer than us
    guessing wrong and silently degrading image quality.
  * build_native_content_parts() gains a keyword-only provider arg;
    gateway/CLI/TUI pass the active provider so Anthropic users get
    auto-resize protection while OpenAI users don't pay it.
  * Resize target dropped from 5 MB to 4 MB to slide safely under
    Anthropic's boundary with header overhead.

Empirical measurements (direct API, no Hermes in the loop):

    image b64     anthropic   openrouter/gpt5.5   codex-oauth/gpt5.5
    0.19 MB       ✓           ✓                   ✓
    12.37 MB      ✗ 400 5MB   ✓                   ✓
    23.85 MB      ✗ 400 5MB   ✓                   ✓
    49.46 MB      ✗ 413       ✓                   ✓

Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through,
Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts
routes ceiling by provider, unknown provider gets no ceiling. All 52
targeted tests pass.

* refactor(image-input): attempt native, shrink-and-retry on provider reject

Replace proactive per-provider size ceilings with a reactive shrink path
on the provider's actual rejection. All providers now attempt native
full-size attachment first; if the provider returns an image-too-large
error, the agent silently shrinks and retries once.

Why the previous design was wrong: hardcoding provider ceilings
(anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image
paid no tax, but Anthropic users lost quality on anything >5MB even
though the empirical behaviour at provider-reject time is the same
(shrink + retry). Baking the table into the routing layer also
requires updating Hermes every time a provider's limit changes.

Reactive design:
  - image_routing.py: _file_to_data_url encodes native size, no ceiling.
    build_native_content_parts drops its provider kwarg.
  - error_classifier.py: new FailoverReason.image_too_large + pattern
    match ("image exceeds", "image too large", etc.) checked BEFORE
    context_overflow so Anthropic's 5MB rejection lands in the right
    bucket.
  - run_agent.py: new _try_shrink_image_parts_in_messages walks api
    messages in-place, re-encodes oversized data: URL image parts
    through vision_tools._resize_image_for_vision to fit under 4MB,
    handles both chat.completions (dict image_url) and Responses
    (string image_url) shapes, ignores http URLs (provider-fetched).
    New image_shrink_retry_attempted flag in the retry loop fires the
    shrink exactly once per turn after credential-pool recovery but
    before auth retries.

E2E verified live against Anthropic claude-sonnet-4-6:
  - 17.9MB PNG (23.9MB b64) attached at native size
  - Anthropic returns 400 "image exceeds 5 MB maximum"
  - Agent logs '📐 Image(s) exceeded provider size limit — shrank and
    retrying...'
  - Retry succeeds, correct response delivered in 6.8s total.

Tests: 12 new (8 shrink-helper shapes + 4 classifier signals),
replaces 5 proactive-ceiling tests with 3 simpler 'native attach works'
tests. 181 targeted tests pass. test_enum_members_exist in
test_error_classifier.py updated for the new enum value.
2026-04-27 06:27:59 -07:00
Teknium df3c9593f8 feat(plugins): google_meet \u2014 join, transcribe, speak, follow up (#16364)
* feat(plugins): google_meet — bundled plugin for join+transcribe Meet calls

v1 shipping transcribe-only. Spawns headless Chromium via Playwright,
joins an explicit https://meet.google.com/ URL, enables live captions,
and scrapes them into a transcript file the agent can read across turns.
The agent then has the meeting content in context and can do followup
work (send recap, file issues, schedule followups) with its regular tools.

Surface:
  - Tools: meet_join, meet_status, meet_transcript, meet_leave, meet_say
    (meet_say is a v1 stub — returns not-implemented; v2 will wire
    realtime duplex audio via OpenAI Realtime / Gemini Live +
    BlackHole / PulseAudio null-sink.)
  - CLI: hermes meet setup | auth | join | status | transcript | stop
  - Lifecycle: on_session_end auto-leaves any still-running bot.

Safety:
  - URL regex rejects anything that isn't https://meet.google.com/...
  - No calendar scanning, no auto-dial, no auto-consent announcement.
  - Single active meeting per install; a second meet_join leaves the first.
  - Platform-gated to Linux + macOS (Windows audio routing for v2 untested).
  - Opt-in: standalone plugin, user must add 'google_meet' to
    plugins.enabled in config.yaml.

Zero core changes. Plugin uses existing register_tool /
register_cli_command / register_hook surfaces. 21 new unit tests cover the
URL safety gate, transcript dedup + status round-trip, process-manager
refusals/start/stop paths, tool-handler JSON shape under each branch,
session-end cleanup, and platform-gated register().

* feat(plugins/google_meet): v2 realtime audio + v3 remote node host

v2 \u2014 agent speaks in-meeting
  audio_bridge.py: PulseAudio null-sink (Linux) + BlackHole probe (macOS).
    On Linux we load pactl module-null-sink + module-virtual-source, track
    module ids for teardown; Chrome gets PULSE_SOURCE=<virt src> env so its
    fake mic reads what we write to the sink. macOS just probes BlackHole
    2ch and returns its device name \u2014 the plugin refuses to switch the
    user's default audio input (that would surprise them).
  realtime/openai_client.py: sync WebSocket client for the OpenAI Realtime
    API. RealtimeSession.speak(text) sends conversation.item.create +
    response.create, accumulates response.audio.delta PCM bytes, appends
    them to a file. RealtimeSpeaker runs a JSONL-queue loop consuming
    meet_say calls. 'websockets' is an optional dep imported lazily.
  meet_bot.py: when HERMES_MEET_MODE=realtime, provisions AudioBridge,
    starts RealtimeSession + speaker thread, spawns paplay to pump PCM
    into the null-sink, then cleans everything up on SIGTERM. If any
    realtime setup step fails, falls back cleanly to transcribe mode
    with an error flagged in status.json.
  process_manager.enqueue_say(): writes a JSONL line to say_queue.jsonl;
    refuses when no active meeting or active meeting is transcribe-only.
  tools.meet_say: real implementation; requires active mode='realtime'.
  meet_join: adds mode='transcribe'|'realtime' param.

v3 \u2014 remote node host
  node/protocol.py: JSON envelope (type, id, token, payload) + validate.
  node/registry.py: $HERMES_HOME/workspace/meetings/nodes.json, with
    resolve() auto-selecting the sole registered node when name is None.
  node/server.py: NodeServer \u2014 websockets.serve, bearer-token auth,
    dispatches start_bot/stop/status/transcript/say/ping onto the local
    process_manager. Token auto-generated + persisted on first run.
  node/client.py: NodeClient \u2014 short-lived sync WS per RPC, raises
    RuntimeError on error envelopes, clean API matching the server.
  node/cli.py: 'hermes meet node {run,list,approve,remove,status,ping}'
    subtree; wired into the main meet CLI by cli.py so 'hermes meet node'
    Just Works.
  tools.py: every meet_* tool accepts node='<name>'|'auto'; when set,
    routes through NodeClient to the remote bot instead of running
    locally. Unknown node \u2192 clear 'no registered meet node matches ...'
    error.
  cli.py: 'hermes meet join --node my-mac --mode realtime' and
    'hermes meet say "..." --node my-mac' route to the node; 'hermes
    meet node approve <name> <url> <token>' registers one.

Tests
  21 v1 tests updated (meet_say is no longer a stub; active-record now
    carries mode).
  20 new audio_bridge + realtime tests.
  42 new node tests (protocol/registry/server/client/cli).
  17 new v1/v2/v3 integration tests at the plugin level covering
    enqueue_say edge cases, env var passthrough, mode validation, node
    routing (known/unknown/auto/ambiguous), and argparse wiring for
    `hermes meet say` + `hermes meet node` + --mode/--node flags.
  Total: 100 plugin tests + 58 plugin-system tests = 158 passing.

E2E verified on Linux with fresh HERMES_HOME: plugin loads, 5 tools
register, on_session_end hook wires, 'hermes meet' CLI tree wires
including the node subtree, NodeRegistry round-trips, meet_join routes
correctly to NodeClient under node='my-mac' with mode='realtime',
enqueue_say accepts realtime/rejects transcribe, argparse parses every
new flag cleanly.

Zero changes to core. All new code lives under plugins/google_meet/.

* feat(plugins/google_meet): auto-install, admission detect, mac PCM pump, barge-in, richer status

Ready-for-live-test follow-up on PR #16364. Five additions that matter for
the first live run on a real Meet, in priority order:

1. hermes meet install [--realtime] [--yes]
   pip install playwright websockets + python -m playwright install chromium
   --realtime: installs platform audio deps (pulseaudio-utils on Linux via
   sudo apt, blackhole-2ch + ffmpeg on macOS via brew). Prompts before
   sudo/brew unless --yes. Refuses on Windows. Refuses to auto-flip the
   macOS default input — user still selects BlackHole in System Settings
   (deliberate; surprise audio rerouting is worse than a manual step).

2. Admission detection
   _detect_admission(page): Leave-button visible OR caption region
   attached OR participants list present → we're in-call.
   _detect_denied(page): 'You can\'t join this video call' / 'You were
   removed' / 'No one responded to your request' → bail out.
   HERMES_MEET_LOBBY_TIMEOUT (default 300s) caps how long we sit in
   the lobby before giving up. in_call stays False until admitted.
   Status surfaces leaveReason: duration_expired | lobby_timeout |
   denied | page_closed.

3. macOS PCM pump
   ffmpeg reads speaker.pcm (24kHz s16le mono) and writes to the
   BlackHole AVFoundation output via -f audiotoolbox
   -audio_device_index <N>. _mac_audio_device_index() probes
   ffmpeg -f avfoundation -list_devices true to resolve 'BlackHole 2ch'
   → numeric index. Falls back to index 0 on probe failure. Linux
   paplay pump unchanged.

4. Richer status dict
   _BotState now tracks realtime, realtimeReady, realtimeDevice,
   audioBytesOut, lastAudioOutAt, lastBargeInAt, joinAttemptedAt,
   leaveReason. RealtimeSession.audio_bytes_out / last_audio_out_at
   counters fold into the status file once a second so meet_status()
   can show the agent's voice activity in near-real-time.

5. Barge-in
   RealtimeSession.cancel_response() sends type='response.cancel' over
   the same WS (lock-guarded so it's safe to call from the caption
   thread while speak() is reading frames). Handles response.cancelled
   as a terminal frame type. _looks_like_human_speaker() gates triggers
   so the bot's own name, 'You', 'Unknown', and blanks don't self-cancel.
   Called from the caption drain loop: when a new caption arrives
   attributed to a real participant while rt.session exists, we fire
   cancel_response() and stamp lastBargeInAt.

Tests: 20 new unit tests across _BotState telemetry, barge-in gating,
admission/denied probe error handling, cancel_response with and without
a connected WS, and `hermes meet install` CLI wiring (flag parsing +
end-to-end subprocess.run verification + Linux-already-installed fast
path). Total 171 passing across all google_meet test files + the
plugin-system regression suite.

E2E verified on Linux: plugin loads, all 5 tools register,
`hermes meet install --realtime --yes` parses, fresh-bot status.json
has every new telemetry key, cancel_response on a disconnected session
returns False without raising, barge-in helper gates the bot's own
name correctly.

Still out of scope (for a future PR, not blocking live test):
mic → Realtime duplex (the agent listening to meeting audio via
WebRTC), node-host TLS/pairing UX, Windows audio, Meet create+Twilio.

Docs updated: SKILL.md now lists the installer subcommand, lobby
timeout, barge-in caveat, and the full status-dict reference table.
README.md quick-start uses hermes meet install.
2026-04-27 06:22:25 -07:00
Teknium 8ed599dc05 feat(update): auto-backup HERMES_HOME before hermes update (#16539)
Every 'hermes update' now runs a full backup of ~/.hermes/ first, so
users can always roll back to the exact state they had before the
update if anything goes wrong (corrupted sessions.db, broken skills,
config migrations that don't round-trip, etc.).

Changes:
- hermes_cli/backup.py: new create_pre_update_backup() helper. Writes
  to <HERMES_HOME>/backups/pre-update-<stamp>.zip using the same
  exclusion rules and SQLite safe-copy as 'hermes backup'. Auto-rotates
  (keep last N, pre-update-*.zip only — hand-dropped zips in backups/
  are untouched). Adds 'backups' to _EXCLUDED_DIRS so subsequent backups
  don't nest prior ones.
- hermes_cli/main.py: _run_pre_update_backup() wired into
  _cmd_update_impl before any git operation. Prints save path, restore
  command, and how to disable. Swallows failures so a broken backup
  never blocks the update itself. New --no-backup flag on 'hermes
  update' for one-off override.
- hermes_cli/config.py: new 'updates' section in DEFAULT_CONFIG with
  pre_update_backup (default true) and backup_keep (default 5).
  Auto-surfaces in the dashboard config UI.
- tests/hermes_cli/test_backup.py: +11 tests covering backup location,
  content parity with 'hermes backup', no-recursion, rotation, manual
  file preservation, config gate, --no-backup flag, flag-wins-over-config.
2026-04-27 05:36:19 -07:00
Teknium 920ebd8303 feat(prompt): point agent at hermes-agent skill + docs site for Hermes questions (#16535)
Adds a short always-on pointer to the system prompt: when the user asks
about configuring, setting up, troubleshooting, or using Hermes Agent
itself, load the hermes-agent skill via skill_view(name='hermes-agent')
and fall back to https://hermes-agent.nousresearch.com/docs via
web_extract. Keeps sessions without skill_view loaded useful too — the
docs URL + web_extract is enough to answer most questions.

The guidance is appended right after DEFAULT_AGENT_IDENTITY (or SOUL.md)
so it ships regardless of which toolset profile is active. Footprint is
~560 chars, behind the existing prompt cache.
2026-04-27 05:35:55 -07:00
Teknium bb00b783fb fix(cli): eliminate ghost status-bar + DSR input leaks from terminal drift
The CLI renders through prompt_toolkit in non-full-screen mode, so every
repaint uses the renderer's tracked _cursor_pos.y to cursor_up() + erase
before drawing the new frame. Any time that tracked position drifts from
terminal reality, redraws stack on top of stale content instead of
overwriting it. Four user-visible bugs share this root cause.

Fixes:

- #5474 (SIGWINCH ghosts): the resize wrapper previously only handled
  column-shrink reflow. Generalize it to force a full screen-clear
  (erase_screen + cursor_goto(0,0)) and renderer.reset() on every resize
  — covers widen, row-shrink, and multiplexer SIGWINCH-less redraws.

- #8688 (cmux/tmux tab switch): no SIGWINCH fires on focus regain, so
  prompt_toolkit has no signal to recover. Add a _force_full_redraw()
  helper, bound to Ctrl+L (standard bash/zsh/vim convention) and exposed
  as /redraw. Users can manually clear drift without restarting Hermes.

- #14692 (DSR response leaks — ^[[53;1R): resize storms make
  prompt_toolkit's CSI 6n queries race past the input parser; the
  terminal's reply ends up as literal input text. Add a sibling of the
  bracketed-paste sanitizer that strips \x1b[<row>;<col>R and the
  caret-escape visible form from paste text, buffer text-filter, and
  the input-processing loop.

The idle-redraw removal (#12641) is in the preceding commit from
@foxion37 — keeping them as separate commits preserves attribution.
2026-04-27 05:31:47 -07:00
Q 5e92b67807 fix: stop idle CLI redraws 2026-04-27 05:31:47 -07:00
Teknium ee1a07f9e9 fix(agent): block cross-provider reasoning leak to DeepSeek/Kimi (#15748) (#16500)
On provider switches mid-session (e.g. MiniMax -> DeepSeek), the source
assistant turn carries a 'reasoning' field written by the prior provider
but no 'reasoning_content' key. _copy_reasoning_content_for_api would
promote that foreign 'reasoning' to 'reasoning_content' on the outbound
DeepSeek request, leaking a cross-provider chain of thought and in
practice causing HTTP 400.

DeepSeek's own _build_assistant_message always pins reasoning_content=''
at creation time for tool-call turns, so the shape (reasoning set,
reasoning_content absent, tool_calls present) is unreachable from
same-provider DeepSeek history — it can only come from a prior provider.
Pad with '' in that case instead of promoting.

Healthy same-provider 'reasoning' promotion (no tool_calls, or on
providers that do not require the empty-string pin) is unchanged.
2026-04-27 04:06:23 -07:00
Teknium 65f648ee84 fix(website): auto-wrap ASCII-art code blocks in generated skill pages (#16497)
Defensive: when the generator encounters a fenced code block containing
Unicode box-drawing characters, wrap it in `<!-- ascii-guard-ignore -->`
markers so the docs-site-checks lint (which scans inside code fences)
can't reject the page for a skill's own diagram.

Plain bash/python code blocks stay uncluttered — only blocks with box
chars get wrapped. Skill authors no longer have to remember to add the
ignore markers in every SKILL.md with ASCII art.

Fixes #15305.
2026-04-27 03:38:39 -07:00
Wysie 64a497bfa9 fix(hindsight): preserve setup config on blank input 2026-04-27 03:34:58 -07:00
Teknium 90a3e73daf fix(debug): sweep expired paste.rs uploads on a real timer (#16431)
Previously 'hermes debug share' uploads only got DELETEd when the user
ran 'hermes debug share' again — opportunistic-sweep-on-invoke was the
only cleanup path. A user who uploaded once and never ran debug again
left pastes up until paste.rs's retention kicked in (which, empirically,
never actually expires them).

Hook _sweep_expired_pastes into the gateway cron ticker at the same
hourly cadence as the image/document cache cleanups. The opportunistic
sweep in 'hermes debug share' stays as a fallback for CLI-only users
who never start the gateway.
2026-04-27 00:36:33 -07:00
vominh1919 2e6699b319 fix: strip leaked declare-x env dump from terminal output on macOS (#15459)
On macOS (bash 3.2 and some Homebrew bash builds) `source`ing a file that
contains `declare -x` statements prints each declaration to stdout. The
persistent-shell wrapper in tools/environments/base.py was only redirecting
stderr when sourcing the session snapshot, so ~60 lines of env vars leaked
into every terminal tool response — blowing out context and triggering
HTTP 400s on context-limited providers.

Fix: redirect both stdout and stderr when sourcing the snapshot. Linux
bash is silent here, so the redirect is harmless there; macOS no longer
leaks.

Closes #15459

Co-authored-by: Sanjays2402 <51058514+Sanjays2402@users.noreply.github.com>
2026-04-27 00:19:48 -07:00
Teknium 21f503c23c feat(update): snapshot pairing data before git pull (#16383)
Quick state snapshot now includes pairing JSONs (generic + legacy +
Feishu comment pairing), and `hermes update` takes a pre-update
snapshot labeled `pre-update` before pulling.

Pairing data lives outside state.db in platform-specific JSONs under
~/.hermes/pairing/, ~/.hermes/platforms/pairing/, and
~/.hermes/feishu_comment_pairing.json.  The update command already
couldn't touch $HERMES_HOME, but #15733 reports lost pairing after
an update — this gives users something to restore from via
`/snapshot list` / `/snapshot restore <id>` if anything clobbers
the approved-user lists.

- Extend _QUICK_STATE_FILES with pairing paths (files + dirs)
- Snapshot walks directories recursively and records each file in the
  manifest individually so restore logic is unchanged
- _cmd_update_impl calls create_quick_snapshot(label='pre-update')
  after 'Found N new commits' and before 'Pulling updates'
- Snapshot failures are logged at debug and never block the update

Refs #15733.
2026-04-27 00:19:12 -07:00
Teknium a32d07529c fix(file-tools): escalate to BLOCKED on repeated read_file dedup stubs (#16382)
read_file's dedup path returned a lightweight stub on re-reads of an
unchanged file, then returned early — so the consecutive-read loop
guard (hard block at count>=4) at the bottom of read_file_tool never
ran for stub-looped calls. Weaker tool-following models (local Qwen3.6
variants in the reported case) ignore the passive 'refer to earlier
result' hint and hammer the same read_file call until iteration budget
runs out.

Track per-key stub returns in task_data['dedup_hits'] and, on the
second stub for the same (path, offset, limit), return a hard BLOCKED
error mirroring the wording the real-read path already uses. A real
read, an intervening non-read tool call (notify_other_tool_call), or
reset_file_dedup (on context compression) all clear the counter so
the guard never stays engaged longer than the actual loop.

Closes #15759
2026-04-27 00:17:26 -07:00
alberto 3ff3dfb5ac fix(telegram): accept /cmd@botname from bot menu in groups
Telegram groups emit a single bot_command entity covering the whole
/cmd@botname span with no accompanying mention entity, so the existing
mention gate in _message_mentions_bot dropped slash commands sent via
the bot-menu autocomplete whenever require_mention is enabled.

Recognise bot_command entities whose @botname suffix matches the bot
username (case-insensitive) as a direct mention, and keep rejecting
commands addressed at other bots. Fixes #15415.
2026-04-26 22:00:18 -07:00
Teknium 8258f4dcb7 fix(model): avoid persisting key_env-resolved secrets to providers entry (#16372)
When 'hermes model' runs against a providers: (keyed-schema) entry that
relies only on key_env, the picker resolves the env var for the live
/models request and then wrote a synthesized 'api_key: ${KEY_ENV}' back
to the providers.<key> entry. That's redundant — the runtime already
resolves from key_env directly — and it clutters configs that
intentionally keep credentials out of config.yaml.

Only persist provider_entry['api_key'] when the user originally had an
inline value (literal secret or ${VAR} template). Entries that declared
only key_env stay clean on save.

Fixes #15803.
2026-04-26 21:52:12 -07:00
Teknium 9f1b1977bc docs(skills): salvage dropped trigger content into skill bodies
For 14 of 74 compressed skills, the original description contained
trigger keywords, technique counts, attribution, or use-case phrases
not covered by the existing body content. Prepends a 'When to use' /
'What's inside' block near the top so the agent still has the full
context when the skill is loaded.

Skills salvaged:
- codex, ascii-video, creative-ideation, excalidraw, manim-video, p5js
- gif-search, heartmula, youtube-content
- lm-evaluation-harness, obliteratus, vllm, axolotl
- powerpoint

Remaining 60 skills were verified to already cover the dropped content
in their existing body sections (When to Use, overview, intro prose)
or had short descriptions fully captured by the new compressed form.
2026-04-26 21:50:56 -07:00
Teknium e3921e7ca4 docs(skills): compress 74 built-in skill descriptions to <=60 chars
Target: every skill's description fits in a one-line gateway menu and
leads with trigger keywords an agent would match on. Drops filler like
'Use this skill to', 'A skill for', 'This skill provides'.

Before: max description length was 791 chars (architecture-diagram),
74 of 81 built-in skills were >60 chars.

After: max 60, mean 54, all 81 built-in skills <=60.

Rewritten with double-quoted YAML scalars to preserve Chinese/arrow
glyphs (baoyu-comic, yuanbao, youtube-content).
2026-04-26 21:50:56 -07:00
Teknium 7d586ddb42 docs(skills): trim design skill descriptions to <=60 chars + inline cross-ref
- claude-design: 'Design one-off HTML artifacts (landing, deck, prototype).' (57)
- popular-web-designs: '54 real design systems (Stripe, Linear, Vercel) as HTML/CSS.' (60)
- design-md: "Author/validate/export Google's DESIGN.md token spec files." (59)

Also adds an inline callout near the top of claude-design pointing to
popular-web-designs and design-md so the cross-reference lands even
without reading the full decision table.
2026-04-26 21:50:56 -07:00
Teknium a131c134bc chore(release): map BadTechBandit in AUTHOR_MAP 2026-04-26 21:50:56 -07:00
Teknium 55be532369 docs(skills): clarify when to use claude-design vs popular-web-designs vs design-md
- claude-design: design process + taste for one-off HTML artifacts
- popular-web-designs: 54 ready-to-paste design systems (Stripe/Linear/etc.)
- design-md: formal DESIGN.md token spec file authoring

Adds a comparison table to claude-design's 'When To Use' section and
reciprocal pointers in design-md and popular-web-designs. Also corrects
claude-design author attribution to BadTechBandit.
2026-04-26 21:50:56 -07:00
CREWorx 8c5d3a99d6 feat(skills): add claude-design HTML artifact skill 2026-04-26 21:50:56 -07:00
Teknium af3d5150c1 fix(matrix): close 'hall of mirrors' pairing + echo loop (#15763) (#16374)
Harden the Matrix adapter's sender-drop guards so bot-self events and
appservice/bridge identities never reach the gateway's pairing flow or
the agent loop.

Two filters, applied as early as possible in _on_room_message (and
_on_reaction for the self-filter):

1. _is_self_sender(sender) — case-insensitive + whitespace-trimmed
   equality with self._user_id.  When self._user_id is still empty
   (whoami has not resolved, or login failed), returns True
   defensively: an unidentified bot dropping its own events is always
   preferable to falling into an echo loop.  The previous byte-for-byte
   equality check let differently-cased copies of the bot's MXID slip
   through, and an unresolved self-ID silently disabled the guard.

2. _is_system_or_bridge_sender(sender) — drops appservice namespace
   puppets (conventional @_bridge_...:server form) and malformed
   senders with an empty localpart.  These identities used to fall
   through to the gateway's unauthorized-user path, trigger a pairing
   code, and — once an operator approved the bridge — every outbound
   message the bridge relayed would loop back as an authorized user
   message.  This was the root of the 'hall of mirrors' symptom.

Fixes #15763

Test plan
---------
scripts/run_tests.sh tests/gateway/test_matrix.py
scripts/run_tests.sh tests/gateway/test_matrix_mention.py tests/gateway/test_matrix_voice.py
All 182 tests pass.  14 new regression tests cover exact / case-insensitive
/ whitespace / unresolved-self-id matches, bridge prefix detection, empty
sender, and the full _on_room_message drop path.
2026-04-26 21:50:28 -07:00
Teknium 4a2ee6c162 fix(title-gen): surface auxiliary failures via _emit_auxiliary_failure
Closes #15775.

Title generation swallowed exceptions at debug level and returned None,
so a depleted auxiliary provider (e.g. OpenRouter 402) silently left
sessions with NULL titles. Reporter observed 45 untitled sessions
accumulated over 19 days with no user-visible indication.

- agent/title_generator.py: accept optional failure_callback, bump log
  to WARNING, invoke callback on call_llm exception (swallowing callback
  errors so nothing can crash the fire-and-forget worker thread).
- cli.py, gateway/run.py: pass agent._emit_auxiliary_failure as the
  callback so failures route through the existing user-visible warning
  channel.
- tests: cover callback fires / errors are swallowed / no-callback
  legacy behavior / maybe_auto_title forwards kwarg to worker.
2026-04-26 21:49:34 -07:00
briandevans bda2dbc29e fix(compressor): apply bare-string guard to protect-tail boundary scan
The bare-string isinstance guard added in 80ae2621 covered _find_tail_cut_by_tokens
(line 1084) but missed the identical pattern in _calculate_protect_tail_boundary
(line 487, the protect-tail scan loop).  Both loops call .get("text", "") on every
list item in message["content"]; both crash with AttributeError when that list
contains a bare string.

Apply the same dict/str/fallback isinstance guard to the protect-tail path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 21:48:09 -07:00
briandevans 943465235e fix(compressor): guard against bare-string items in multimodal content list
raw_content from message["content"] can be a list that contains bare
strings, not only dicts.  The previous `p.get("text", "")` call raised
AttributeError on string items, crashing context compression for any
session that had a message with mixed content.

Guard with isinstance checks: dict → .get("text"), str → len(p),
fallback → len(str(p)).  Adds a regression test covering the bare-string
case that would have AttributeError'd on the pre-fix code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 21:48:09 -07:00
briandevans cfc8befe65 fix(compressor): use text char sum for multimodal token estimation in _find_tail_cut_by_tokens
_find_tail_cut_by_tokens called len(content) to estimate message tokens.
When content is a list of blocks (multimodal: text + image_url), len()
returns block count (e.g. 2) rather than character count, so a message
with 500 chars of text was counted as ~10 tokens instead of ~135.

This caused the backward walk to exhaust all messages before hitting the
budget ceiling; the head_end safeguard then forced cut = n - min_tail,
shrinking the protected tail to the bare minimum and preventing effective
compression of long multimodal conversations.

Fix mirrors the existing pattern in _prune_old_tool_results (line 487):
  sum(len(p.get("text", "")) for p in raw_content)
  if isinstance(raw_content, list) else len(raw_content)

Tests: 3 new cases in TestTokenBudgetTailProtection — regression guard
(confirms the test fails with the bug), plain-string regression guard,
and image-only block edge case.

Fixes #16087.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 21:48:09 -07:00
Teknium 3e68809fe0 chore(release): map romanornr noreply email 2026-04-26 21:47:40 -07:00
romanornr a0fe73bada fix(cli): strip leaked bracketed-paste wrappers 2026-04-26 21:47:40 -07:00
Teknium 7c63c24613 fix(cron): don't silently disable recurring cron jobs when croniter is missing (#16368)
If the gateway's Python env loses access to 'croniter' between when a
cron job was created and when mark_job_run() fires, compute_next_run()
returns None for cron schedules. mark_job_run() treated that as terminal
completion and wrote enabled=false, state=completed — turning a missing
runtime dep into a silent, permanent job-off.

That behaviour is safe for one-shot jobs but wrong for recurring ones. A
missing dep should surface as an error the user can see, not as successful
completion of a job that is about to stop firing.

mark_job_run() now only disables the job on next_run_at=None when the
schedule is one-shot. For recurring (cron/interval) schedules it keeps
enabled=true, sets state=error, and records last_error so the user can
see why the job isn't advancing. compute_next_run() also logs a warning
the first time cron+no-croniter hits, so the underlying cause is visible
in the gateway log.

Tests cover:
- recurring cron job stays enabled with state=error when HAS_CRONITER=False
- recurring interval stays enabled when compute_next_run returns None
- one-shot jobs still flip to enabled=false, state=completed (no regression)

Fixes #16265
2026-04-26 21:47:32 -07:00
Teknium c5781d50c7 fix(azure-foundry): auto-route gpt-5.x / codex / o-series to Responses API (#16361)
Azure Foundry deploys GPT-5.x, codex-*, and o1/o3/o4 reasoning models as
Responses-API-only.  Calling /chat/completions against these deployments
returns 400 'The requested operation is unsupported.', which broke any
user who ran 'hermes model' on Azure, picked a gpt-5/codex deployment,
and kept the default api_mode: chat_completions.  Verified in a user
debug bundle on 2026-04-26: gpt-5.3-codex failed on synopsisse.openai.azure.com
with that exact payload while gpt-4o-pure on the same endpoint worked.

Adds azure_foundry_model_api_mode(model_name) that returns
codex_responses when the model name starts with gpt-5, codex, o1, o3,
or o4 — otherwise None so chat_completions / anthropic_messages stay
untouched for gpt-4o, Llama, Claude-via-Anthropic, etc.

Resolver (both the direct Azure Foundry path and the pool-entry path)
consults it and upgrades api_mode unless the user explicitly picked
anthropic_messages.  target_model (from /model mid-session switch)
takes precedence over the persisted default so switching from gpt-4o
to gpt-5.3-codex routes correctly before the next request.

Docs: correct the azure-foundry guide which previously claimed Azure
keeps gpt-5.x on chat completions — that was only true for early Azure
OpenAI, not Azure Foundry codex/o-series deployments.

Tests: 14 unit tests for azure_foundry_model_api_mode + 6 integration
tests in TestAzureFoundryResolution covering Bob's exact scenario,
target_model override, anthropic_messages guard, and o3-mini.
2026-04-26 21:33:31 -07:00
Teknium 235bfb192b docs(skills): document URL install across features, reference, guide, and hermes-agent skill (#16355)
Follow-up to #16323 — the UrlSource adapter is shipped but four
user-facing docs surfaces still only listed the hub-identifier forms.

- user-guide/features/skills.md: add ``url`` to the Supported-hub-sources
  table; add a new "#### 8. Direct URL (`url`)" section explaining scope
  (single-file SKILL.md only), name-resolution order (frontmatter → URL
  slug → interactive prompt → --name flag), and both TTY and
  non-interactive usage. Add two URL examples to the install-examples
  block near the top of the page.
- reference/cli-commands.md: two URL install examples + one note
  explaining the name-resolution fallback chain.
- guides/work-with-skills.md: one URL-install example alongside the
  existing hub-identifier examples.
- skills/autonomous-ai-agents/hermes-agent/SKILL.md: Quick Reference
  block's ``hermes skills install`` line now spells out that ID can be
  a hub identifier OR a direct SKILL.md URL, and mentions --name for
  frontmatter-less skills.

No code changes. No new dependencies. Website builds via the usual
Docusaurus pipeline.

Co-authored-by: teknium1 <teknium@noreply.github.com>
2026-04-26 21:27:59 -07:00
brooklyn! e63929d4f3 Merge pull request #15926 from NousResearch/bb/tui-long-session-perf
perf(tui): stabilize long-session scrolling
2026-04-26 23:10:08 -05:00
Teknium 859e09b7ce chore(release): map xiahu889889@proton.me to xiahu88988 2026-04-26 21:08:19 -07:00
xiahu88988 898ccfd667 fix(skills): honor scope query from Google OAuth redirect URL
Parse scope from the raw callback URL before stripping the auth code so Flow.fetch_token matches user-granted scopes. Add regression test for dual-scope callbacks.

Made-with: Cursor
2026-04-26 21:08:19 -07:00
Teknium 6c87371815 fix(openclaw-migration): case-preserving brand rewrite + one-time ~/.openclaw residue banner (#16327)
Two related fixes for OpenClaw-residue problems after an OpenClaw→Hermes
migration (especially migrations done via OpenClaw's own tool, which
doesn't archive the source directory).

1. optional-skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py:
   rebrand_text() was rewriting ~/.openclaw/config.yaml → ~/.Hermes/config.yaml
   (capital H — a directory that doesn't exist). Now case-preserving:
   "OpenClaw" → "Hermes" (prose), but "openclaw" → "hermes" (so filesystem
   paths land on the real Hermes home). Regex logic unchanged — replacement
   function now checks if the matched text was all-lowercase and emits the
   replacement in the matching case.

2. agent/onboarding.py + cli.py: one-time startup banner the first time
   Hermes launches and finds ~/.openclaw/. Tells the user to run
   `hermes claw cleanup` to archive it, gated on the existing onboarding
   seen-flag framework (onboarding.seen.openclaw_residue_cleanup in
   config.yaml). Fires once per install; re-running requires wiping that
   flag or running cleanup directly.

Tests:
- 4 new TestDetectOpenclawResidue tests (present / absent / file-instead-
  of-dir / default-home smoke)
- 2 TestOpenclawResidueHint tests (content check)
- 2 TestOpenclawResidueSeenFlag tests (flag isolation + round-trip)
- test_rebrand_text_preserves_filesystem_path_casing regression test
  with 4 scenarios including the exact ~/.openclaw/config.yaml case
- Existing test_rebrand_text_* tests updated to the new case-preserving
  contract (lowercase input → lowercase output)

Co-authored-by: teknium1 <teknium@noreply.github.com>
2026-04-26 20:57:26 -07:00
Teknium 517f30b043 improve(agent): guidance for plain-text URLs, subagent language/verification, hermes-config routing (#16325)
Four small tool-description / skill-content tweaks addressing recurring
model mistakes seen in @versun's docx feedback (Kimi 2.6, but the patterns
apply to every model):

1. browser_navigate description: call out .md/.txt/.json/.yaml/.csv/.xml,
   raw.githubusercontent.com, and API endpoints as specifically preferring
   curl or web_extract. The generic "prefer web_search or web_extract" was
   too weak; models kept firing up the browser for plain-text URLs.

2. delegate_task description: two additions.
   (a) Pass user language / output-style preferences in 'context' when they
   differ from English — otherwise subagents default to English and their
   summaries contaminate the final reply (caused the bilingual digest bug).
   (b) Subagent summaries are self-reports, not verified facts. For
   operations with external side-effects (HTTP uploads, remote writes,
   file creation at shared paths), require a verifiable handle (URL, ID,
   path) and verify it yourself before claiming success.

3. agent/prompt_builder.py Skills-mandatory block: new explicit line
   "Whenever the user asks to configure / set up / modify / install /
   enable / disable / troubleshoot Hermes Agent itself, load the
   `hermes-agent` skill first." The generic "load what's relevant" didn't
   route Hermes-meta questions (like "how do I turn off redaction?") to
   the one skill that has the answer.

4. skills/autonomous-ai-agents/hermes-agent/SKILL.md: new "Security &
   Privacy Toggles" section covering security.redact_secrets (with the
   import-time-snapshot restart-required caveat), privacy.redact_pii,
   approvals.mode (manual/smart/off) + --yolo + HERMES_YOLO_MODE, shell
   hooks allowlist, and how to disable network/media tools entirely.
   Every command verified against the actual config keys — no invented
   knobs.

Co-authored-by: teknium1 <teknium@noreply.github.com>
2026-04-26 20:57:19 -07:00
Teknium 9c416e20ab feat(skills): install skills from a direct HTTP(S) URL (#16323)
* feat(skills): install skills from a direct HTTP(S) URL

Adds UrlSource adapter so `hermes skills install <url-to-SKILL.md>` and
`/skills install <url>` work as first-class operations — no more
improvising with curl + patch + cp.

- Claims identifiers that start with http(s):// and end in .md
- Skips /.well-known/skills/ URLs (WellKnownSkillSource handles those)
- Skill name from YAML frontmatter, URL-slug fallback
- Single-file SKILL.md only (v1 scope — multi-file skills need a manifest)
- Trust level 'community'; full security scan still runs
- Lock file stores the URL as identifier so `hermes skills update`
  re-fetches from the same URL cleanly

Scope matches real user need from @versun's docx feedback where
`https://sharethis.chat/SKILL.md` had no first-class install path.

* feat(skills): interactive name/category for URL installs + --name override

Follow-up to the UrlSource adapter. The previous commit fell back to weak
heuristics when frontmatter had no ``name:`` and could produce garbage names
like ``SKILL`` or ``unnamed-skill``. Now:

tools/skills_hub.py
- ``UrlSource._is_valid_skill_name()`` — strict identifier check
  (``^[a-z][a-z0-9_-]*$``), rejects sentinel values (``SKILL``, ``README``,
  ``INDEX``, ``unnamed-skill``, empty, non-strings).
- ``_resolve_skill_name()`` returns ``Optional[str]`` — ``None`` when
  nothing valid is resolvable. Also ignores unsafe frontmatter names
  (``../evil``) and falls through to URL slug instead of returning None
  immediately, so a URL with a bad frontmatter but a good path still
  works.
- ``fetch()``/``inspect()`` carry an ``awaiting_name=True`` marker in
  metadata/extra when resolution fails, letting ``do_install`` decide
  whether to prompt, apply an override, or error out.

hermes_cli/skills_hub.py
- ``do_install`` gains a ``name_override`` parameter.
- On URL-sourced bundles with ``awaiting_name=True``:
  1. If ``name_override`` is valid → use it.
  2. If ``name_override`` is invalid → refuse with a clear error.
  3. Else if ``skip_confirm=True`` (non-interactive: slash / TUI /
     gateway / scripts) → refuse with an actionable retry hint pointing
     at ``--name <your-name>`` on both CLI and slash forms.
  4. Else (interactive TTY) → prompt for the name.
- Interactive TTY also prompts for a category when none is given for a
  URL-sourced install, hinting existing category buckets so users can
  reuse ``productivity``, ``devops``, etc. Empty input → flat install.
- ``_existing_categories()`` scans ``~/.hermes/skills/`` for subdirs that
  look like category buckets (contain nested SKILL.md files); skips
  top-level skills and hidden dirs.
- ``_prompt_for_skill_name()`` / ``_prompt_for_category()`` helpers
  (EOF/Ctrl-C-safe, match the existing ``Confirm [y/N]`` prompt style).

hermes_cli/main.py
- ``hermes skills install`` argparse gains ``--name <name>``.

hermes_cli/skills_hub.py (slash)
- ``/skills install <url> --name <x>`` parsing added.

Tests
- tests/tools/test_skills_hub.py: updated ``UrlSource`` tests to assert
  the new ``awaiting_name`` metadata; added 4 new tests for
  ``_is_valid_skill_name`` rejection sets and the awaiting-name marker.
- tests/hermes_cli/test_skills_hub.py: 8 new tests covering --name
  override accept/reject, non-interactive error, interactive name prompt,
  interactive category prompt, cancel-aborts-install, and
  ``_existing_categories`` scan behavior (buckets vs flat skills).
- E2E verified all four paths (no-name/no-override → error;
  --name override → install; frontmatter name → install;
  invalid --name → rejection).

---------

Co-authored-by: teknium1 <teknium@noreply.github.com>
2026-04-26 20:57:10 -07:00
Brooklyn Nicholson d308ae27e1 fix(nix): refresh tui npm deps hash
Update nix/tui.nix npmDeps hash to match the current ui-tui package-lock inputs so nix builds and CI lockfile checks pass.
2026-04-26 22:56:36 -05:00
sprmn24 b288934dff fix(discord_tool): coerce limit parameter to int before min() call
_search_members() and _fetch_messages() call min(limit, 100) assuming
limit is int. Models can pass limit as a string (e.g. "10"), causing
TypeError: '<' not supported between instances of 'str' and 'int'.

Add try/except int() coercion with safe defaults at the top of both
functions, matching the pattern used in session_search fix (#10522).
2026-04-26 20:48:38 -07:00
Teknium e19854d893 fix(shell_hooks): parse hooks_auto_accept as strict bool/string, not bool() (#16322)
`_resolve_effective_accept()` used `return bool(cfg_val)` for the
`hooks_auto_accept` config key. In Python, `bool("false")` is `True`,
so a user setting `hooks_auto_accept: "false"` (quoted YAML string)
in `config.yaml` would silently enable auto-approval of every shell
hook, bypassing the consent prompt entirely.

Replace the coercion with the same type-aware parsing already used for
the HERMES_ACCEPT_HOOKS env var three lines above: bool passthrough,
strings checked against {1,true,yes,on} case-insensitively, everything
else (including "false", None, 0, ints) rejected.

Add TestHooksAutoAcceptParsing guarding the regression across all four
value shapes (bool, string-truthy, string-falsy, missing/None).

Reported by @sprmn24 in #16244.
2026-04-26 20:48:35 -07:00
Teknium 6993e566ba fix(whatsapp_identity): pin identifier regex to ASCII, clarify it's defense-in-depth
Follow-up on top of #16243. Two small tweaks:

- Compile the regex once as `_SAFE_IDENTIFIER_RE` and pin it to
  `[A-Za-z0-9@.+\-]`. The previous `\w` accepts Unicode word chars
  (full-width digits, accented letters) which aren't valid WhatsApp
  identifiers and shouldn't reach the mapping-file lookup.
- Add a comment clarifying this is defense-in-depth, not a live
  traversal. The hardcoded `lid-mapping-{current}{suffix}.json`
  prefix already prevents escape via pathlib's component split —
  with `current='../secrets'`, the first path component under
  `session/` is the literal directory name `lid-mapping-..`,
  which the attacker cannot create.

E2E verified: legit mapping chains still resolve, all probed attack
shapes (`../`, absolute paths, shell metacharacters, Unicode digit
tricks) are rejected before any file access.
2026-04-26 20:48:31 -07:00
sprmn24 91512b8210 fix(whatsapp_identity): guard against path traversal and silent mapping errors
expand_whatsapp_aliases() interpolated untrusted identifiers directly
into filenames (lid-mapping-{current}.json) without validation.
An identifier containing ../ or / could escape the session directory.

Also replaced bare except Exception: continue with targeted
(OSError, json.JSONDecodeError) and a debug log so mapping
corruption is diagnosable instead of silently skipped.

Fixes:
- Reject identifiers with unsafe characters via re.match guard
- Replace broad exception swallow with specific catch + debug log
2026-04-26 20:48:31 -07:00
Teknium 366351b94d refactor(timeouts): drop redundant ImportError in except clause
Exception already covers ImportError; (ImportError, Exception) was a
cosmetic wart from the bugfix. Pure no-op.
2026-04-26 20:48:20 -07:00
sprmn24 16e243e067 fix(timeouts): guard load_config() call against runtime exceptions
Both get_provider_request_timeout() and get_provider_stale_timeout()
wrapped the load_config import in try/except ImportError but left the
actual load_config() call unprotected. A corrupt config file, YAML
parse error, or permission failure would raise instead of returning
None safely.

Move load_config() inside the try block so any exception returns None.
2026-04-26 20:48:20 -07:00
Brooklyn Nicholson 3e1664923d Revert "fix(tui): report actual session on exit"
This reverts commit 1566f1eecc.
2026-04-26 22:43:34 -05:00
Brooklyn Nicholson c23463fce9 chore(tui): keep MRU resume split out of perf PR
- remove the temporary -c MRU logic and companion test from this branch so PR #15926 stays focused on TUI perf work
- keep the resume-ordering change isolated in the dedicated follow-up PR
2026-04-26 22:40:35 -05:00
Brooklyn Nicholson de790eaceb test(tui): align viewport snapshot key test with quantization
- keep 8-row key binning for scroll jitter stability and update the assertion to match runtime behavior
2026-04-26 22:35:55 -05:00
Brooklyn Nicholson d81b1cd86c chore: uptick 2026-04-26 22:22:31 -05:00
Brooklyn Nicholson 7945fcef21 Merge branch 'main' of github.com:NousResearch/hermes-agent into bb/tui-long-session-perf 2026-04-26 22:17:22 -05:00
Brooklyn Nicholson ffa33e53f6 chore(tui): remove dead branch cleanup code
- drop unused TUI helpers, test-only layout scaffolding, and stale public debug exports
- remove an unused profiler import and trim test-only coverage for deleted helpers
2026-04-26 21:54:24 -05:00
Brooklyn Nicholson 635948d0e0 chore(tui): tighten todo-fix comments, drop dead archive call
- gateway handler: turnController always archives in recordMessageComplete,
  so the post-complete archiveTodosAtTurnEnd().forEach is dead code. Drop
  it and the now-unused import.
- turnController: collapse archive prepend into a single spread expression.
- gateway server: one-line comment for the tool.start todo skip.
2026-04-26 21:46:50 -05:00
Brooklyn Nicholson c2ca02fcff fix(tui): stabilize live todo panel count and anchor position
Two bugs surfaced together while the model fired the todo tool:

1. Count flickered (e.g. 3 → 1 → 3) because tool.start echoed
   args.todos as the live state. With merge=true (or any partial
   replacement) args.todos is just the items being updated, not the
   full list. Drop the early echo — tool.complete already carries the
   canonical full list from the tool result.

2. After turn end the panel jumped from under the user prompt to below
   thinking/tools because archiveDoneTodos() was pushed AFTER segments
   in finalMessages. Prepend the archive trail msg so it sits right
   after the user prompt — same visual slot the live panel occupied
   during streaming.
2026-04-26 21:45:18 -05:00
Brooklyn Nicholson b51c528613 fix(tui): address virtual row and perf log review notes
Keep transcript row keys stable across capped-history trims and rename React Profiler timestamp fields so JSONL consumers don't confuse absolute timestamps with durations.
2026-04-26 21:37:43 -05:00
Brooklyn Nicholson 625c31fcea fix(tui): run built TUI with production React by default
CPU profiling showed the built TUI loading React development modules unless NODE_ENV was set. Default CLI and dashboard TUI children to production while preserving explicit user overrides.
2026-04-26 21:34:31 -05:00
Brooklyn Nicholson dda12775f2 fix(tui): address Copilot review follow-ups
Keep history metadata consistent with lineage replay, globally order replayed lineage messages, and make Ink cache eviction report post-eviction sizes. Also keys TUI config cache by path to avoid cross-home test leakage.
2026-04-26 21:24:54 -05:00
Brooklyn Nicholson 2e4b65b9f5 chore(tui): clean remaining Ink perf scaffolding
Trim narration comments and collapse small one-off helpers in the remaining ui-tui perf support files while preserving behaviour.
2026-04-26 21:20:54 -05:00
Teknium cb51baeceb chore(release): map Tosko4 in AUTHOR_MAP 2026-04-26 19:07:18 -07:00
Tosko4 e85b752516 fix: signal compression boundary to context engine
When _compress_context rotates session_id (compression split), fire
on_session_start(new_sid, boundary_reason="compression",
old_session_id=<old>) on the active context engine. Plugin engines
(e.g. hermes-lcm) use this to preserve DAG lineage across the rollover
instead of re-initializing fresh per-session state.

Built-in ContextCompressor.on_session_start accepts **kwargs and ignores
them — no behavior change for default users.

Closes hermes-lcm#68 symptom: after Hermes compressed and minted a new
physical session, LCM was treating the split as a fresh /new and losing
continuity (compression_count: 1, store_messages: 0, dag_nodes: 0).

Credit: @Tosko4 (PR #13370) — minimized scope to the boundary_reason
signal only; the broader session-lifecycle refactor will be taken in
separate PRs if justified by concrete plugin need.
2026-04-26 19:07:18 -07:00
Brooklyn Nicholson 7da2f07641 Merge remote-tracking branch 'origin/main' into bb/tui-long-session-perf 2026-04-26 21:07:15 -05:00
Teknium 478444c262 feat(checkpoints): auto-prune orphan and stale shadow repos at startup (#16303)
Every working dir hermes ever touches gets its own shadow git repo under
~/.hermes/checkpoints/{sha256(abs_dir)[:16]}/.  The per-repo _prune is a
no-op (comment in CheckpointManager._prune says so), so abandoned repos
from deleted/moved projects or one-off tmp dirs pile up forever.  Field
reports put the typical offender at 1000+ repos / ~12 GB on active
contributor machines.

Adds an opt-in startup sweep that mirrors the sessions.auto_prune
pattern from #13861 / #16286:

- tools/checkpoint_manager.py: new prune_checkpoints() and
  maybe_auto_prune_checkpoints() helpers.  Deletes shadow repos that
  are orphan (HERMES_WORKDIR marker points to a path that no longer
  exists) or stale (newest in-repo mtime older than retention_days).
  Idempotent via a CHECKPOINT_BASE/.last_prune marker file so it only
  runs once per min_interval_hours regardless of how many hermes
  processes start up.
- hermes_cli/config.py: new checkpoints.auto_prune /
  retention_days / delete_orphans / min_interval_hours knobs.
  Default auto_prune: false so users who rely on /rollback against
  long-ago sessions never lose data silently.
- cli.py / gateway/run.py: startup hooks gated on checkpoints.auto_prune,
  called right next to the existing state.db maintenance block.
- Docs updated with the new config knobs.
- 11 regression tests: orphan/stale deletion, precedence, byte-freed
  tracking, non-shadow dir skip, interval gating, corrupt marker
  recovery.

Refs #3015 (session-file disk growth was fixed in #16286; this covers
the checkpoint side noted out-of-scope there).
2026-04-26 19:05:52 -07:00
Teknium ced8f44cd2 fix(file-tools): broaden dedup-status write guard to cover small wrappers
The write_file guard added in #16223 used strict equality against the
internal dedup status message. In practice, the model sometimes
prepends a short note or appends a trailing comment before calling
write_file, which slipped past the strict check.

Broaden the heuristic: reject writes whose stripped content equals
the status message OR contains it and is <=2x its length. Short,
status-dominated writes are always corruption; legitimate docs that
quote the message verbatim are always much longer.

Adds two tests: one for the small-wrapper corruption shape, one
confirming large legitimate files that quote the status still write.
2026-04-26 19:05:36 -07:00
helix4u 977d5f56c9 fix(file-tools): keep read dedup status out of file content 2026-04-26 19:05:36 -07:00
voidborne-d a32b325d06 fix(tools): invalidate read_file dedup cache on write_file and patch
write_file_tool and patch_tool both call _update_read_timestamp to
refresh the staleness tracker after writing, but they never invalidate
the dedup cache entries for the written path.  The dedup cache keys are
(resolved_path, offset, limit) → mtime tuples populated by read_file_tool.

On filesystems where a read and write land in the same mtime second (or
when mtime granularity is 1s), the cached and current mtime are equal,
so the dedup check incorrectly returns a 'File unchanged since last
read' stub — even though the file was just overwritten.

The agent then sees stale content (or a stale 'File not found' error)
and enters expensive error-recovery loops, burning API calls.

Fix: add _invalidate_dedup_for_path(filepath, task_id) that removes all
dedup entries whose resolved path matches the written file.  Called from
_update_read_timestamp so both write_file_tool and patch_tool benefit
automatically.  Scoped to the writing task_id — other tasks' caches are
not affected.

6 regression tests added covering:
- read→write→read within same mtime second (core #13144 scenario)
- invalidation across all offset/limit combinations
- isolation: writing file A does not invalidate file B's cache
- isolation: writing in task A does not invalidate task B's cache
- _invalidate_dedup_for_path safety on missing task / empty dedup

All 25 tests pass (19 existing + 6 new).

Fixes #13144
2026-04-26 19:05:36 -07:00
0z! 419535f07f Update maps_client.py 2026-04-26 19:03:54 -07:00
0z! e504a599fe Update maps_client.py
fix: include seconds in timezone UTC offset output
2026-04-26 19:03:54 -07:00
Yukipukii1 dbe5015566 fix(session-search): exclude current lineage root deterministically in recent mode 2026-04-26 19:03:17 -07:00
teknium ebad6d3f1e chore(release): map yoimexex@gmail.com -> Yoimex 2026-04-26 19:02:55 -07:00
Teknium 87610ce380 fix(tools): coerce quoted use_gateway in image_gen UI detection
Follow-up to #15960 — the provider-active detection in tools_config.py
also read use_gateway with raw truthiness (is False, not dict.get), so
quoted 'false' caused the FAL-direct row to show wrong active status in
the hermes tools picker. Route both sites through is_truthy_value().
2026-04-26 19:02:55 -07:00
Yoimex f66ebe64e8 fix(cli): coerce use_gateway config flags in tool routing 2026-04-26 19:02:55 -07:00
Teknium 36b13709f5 chore(release): map johnncenae in AUTHOR_MAP 2026-04-26 19:01:50 -07:00
Teknium 77d4766602 fix(gateway): clear pending model note on auto-reset paths too
PR #16013 plugged the leak in `/new`, but two sibling session-boundary
resets had the same bug:

1. Inactivity / suspended-session auto-reset (top of `_handle_message`)
   previously cleared only reasoning. Now drops model override and the
   queued "/model switched" note as well.
2. Compression-exhaustion auto-reset now also drops the pending note
   alongside the existing model/reasoning cleanup.

All three session-boundary sites now use the identical cleanup idiom.
2026-04-26 19:01:50 -07:00
johnncenae 00c6480a05 fix(gateway): clear stale pending model note on session reset 2026-04-26 19:01:50 -07:00
helix4u 88a85d30c1 fix(logging): attach gateway log after cli init 2026-04-26 19:01:26 -07:00
simbam99 cebf95854b Fix MessageDeduplicator max_size enforcement 2026-04-26 18:51:51 -07:00
Teknium 34eb1aaa9a fix(update): use npm ci to stop rewriting package-lock on every update (#16295)
`npm install --silent` (used by `_build_web_ui` and `_update_node_dependencies`)
silently rewrites package-lock.json on npm ≥ 10 (strips "peer": true etc.),
leaving the working tree dirty after every `hermes update`. The next update
then detects the dirty lockfile and stashes it — producing a trail of
hermes-update-autostash entries for web/package-lock.json, ui-tui/package-lock.json,
and root package-lock.json.

Switch to `npm ci` (strict, lockfile-preserving) via a new
`_run_npm_install_deterministic` helper that falls back to `npm install`
when the lockfile is missing or out of sync (WIP forks).

Verified locally: all three lockfiles stay byte-identical after the real
_build_web_ui / _update_node_dependencies run twice back-to-back. Fallback
path tested with a deliberately out-of-sync lockfile and a no-lockfile case.
2026-04-26 18:51:31 -07:00
Teknium ab6879634e yuanbao platform (#16298)
Co-authored-by: loongzhao <loongzhao@tencent.com>
2026-04-26 18:50:49 -07:00
Teknium 5eb6cd82b2 fix(sessions): /save lands under $HERMES_HOME, widen browse+TUI picker, force-refresh ollama-cloud on setup (#16296)
Four independent session-UX bugs reported by an external user (#16294).

/save wrote hermes_conversation_<ts>.json to CWD — invisible to
'hermes sessions browse' and easy to lose. Snapshots now write under
~/.hermes/sessions/saved/ and the command prints the absolute path plus
a 'hermes --resume <id>' hint for the live DB-indexed session.

'hermes sessions browse' default --limit raised from 50 to 500. With the
old ceiling, users with moderately long histories saw only the most
recent 50 rows and assumed older sessions had been lost.

TUI session.list (`/resume` picker) switched from a hardcoded allow-list
of 13 gateway source names to a deny-list of just { 'tool' }. Sessions
tagged acp / webhook / user-defined HERMES_SESSION_SOURCE values and
any newly-added platform now surface. Default limit 20 → 200.

ollama-cloud provider setup passes force_refresh=True to
fetch_ollama_cloud_models() so a user entering their API key sees the
fresh catalog (e.g. deepseek v4 flash, kimi k2.6) immediately instead
of waiting up to an hour for the disk cache TTL to expire.

Closes #16294.
2026-04-26 18:49:48 -07:00
Teknium 7e3c8a31f0 feat(skills/airtable): tailor skill to Hermes idioms + expand cookbook
Expand the airtable skill from bare CRUD to a full Hermes-shaped
cookbook matching the linear/notion neighbors, and trim the
description to fit the 60-char system-prompt cutoff.

Hermes-specific additions:
- Explicit 'use the terminal tool with curl — not web_extract or
  browser_navigate' guidance, matching the same note in linear.
- Note that AIRTABLE_API_KEY flows from ~/.hermes/.env into the
  subprocess automatically via env_passthrough, so curl calls don't
  need to re-export it.
- Prefer 'python3 -m json.tool' (always present) over jq (optional)
  for pretty-printing, with -s on every curl to keep output clean.
- Read-before-write workflow that resolves record IDs via
  filterByFormula instead of guessing.

Cookbook expansion (new vs original):
- Field-type reference table (text, select, multi-select, attachment,
  linked record, user) with the exact write-shape Airtable expects.
- typecast flag for auto-coercing values / auto-creating select options.
- performUpsert PATCH for idempotent sync by merge field.
- Batch create/delete endpoints (10-record cap per call).
- Sort + fields query params with URL-encoding (%5B / %5D).
- Named-view query that applies saved filter/sort server-side.
- Full pagination loop template (while loop with offset).
- Common filterByFormula patterns (exact match, contains, AND/OR,
  date comparison, NOT empty).
- Rate-limit backoff guidance (Retry-After header, per-base budget).
- Airtable error-code reference (AUTHENTICATION_REQUIRED,
  INVALID_PERMISSIONS, MODEL_ID_NOT_FOUND,
  INVALID_MULTIPLE_CHOICE_OPTIONS) so the agent can map failures to
  user-actionable fixes instead of just retrying.

Also: description trimmed from 183 chars (truncated to 60 in system
prompt, losing 'filter/upsert/delete' trigger terms) down to 59 chars
that render whole: 'Airtable REST API via curl. Records CRUD, filters,
upserts.' Catalog row updated to match.

SKILL.md grew from 115 to 228 lines — still under the 500-line soft
cap and below the linear skill (297 lines) which serves the same
role for GraphQL.
2026-04-26 18:45:15 -07:00
Teknium 0bef0b9416 chore: docs + attribution for airtable skill
- scripts/release.py: map sonoyuncudmr@gmail.com -> Sonoyunchu so the
  check-attribution CI job and release notes credit Soynchu correctly.
- website/docs/reference/skills-catalog.md: add the airtable row to
  the productivity bundled-skills table.
2026-04-26 18:45:15 -07:00
Teknium 55e9329ee6 feat(config): register bundled-skill API keys in OPTIONAL_ENV_VARS
Adds NOTION_API_KEY, LINEAR_API_KEY, TENOR_API_KEY, and AIRTABLE_API_KEY
to OPTIONAL_ENV_VARS so:

- They persist to ~/.hermes/.env via save_env_value like every other
  key Hermes knows about, instead of being ad-hoc variables the user
  has to hand-edit the dotfile for.
- load_env() / reload_env() populate os.environ from .env on every
  startup — the user sets the key once, skills keep working across
  restarts without losing access.
- hermes setup / hermes config show surface them as known optional
  vars with the correct signup URL (linear.app/settings/api,
  airtable.com/create/tokens, etc.).

These four entries use category="skill" (new) rather than "tool".
tools/environments/local.py auto-adds every category=tool/messaging
entry to _HERMES_PROVIDER_ENV_BLOCKLIST, which stops env passthrough
from leaking provider credentials into the execute_code sandbox
(GHSA-rhgp-j443-p4rf). Skill API keys are the opposite case — the
point is for the agent's subprocess to see them so curl can read
Authorization headers — so they must be outside the blocklist. The
new category is inert for that check.

All four entries are advanced=True: they show up in 'hermes config'
and 'hermes status' displays, but do not nag users who have never
touched those skills during setup checklists.

E2E verified: save_env_value → reload_env → os.environ populated →
skill_view reports setup_needed=False → env_passthrough registers
the key for subprocess inheritance.
2026-04-26 18:45:15 -07:00
Teknium 0d4247d9bf fix(skills/airtable): use .env credential pattern matching notion/linear
Convert the airtable skill from 'skills.config.airtable.api_key'
(config.yaml, wrong bucket for a secret) to 'prerequisites.env_vars:
[AIRTABLE_API_KEY]' (~/.hermes/.env), matching every other bundled
skill that authenticates with an API token.

Why the original shape was wrong:
- metadata.hermes.config is for non-secret skill settings (paths,
  preferences) per references/skill-config-interface.md. Storing a
  bearer token under skills.config.* also triggered the documented
  'hermes config migrate' nag-on-every-run problem.
- The Quick Reference's 'AIRTABLE_API_KEY=...' bash line couldn't
  read skills.config.airtable.api_key anyway — it's a yaml path, not
  an env var.

Follow-up polish on the same pass:
- Added version/author/license frontmatter to match notion/linear.
- Added prerequisites.commands: [curl].
- Setup section now specifies the PAT format (pat...) that replaced
  legacy 'key...' API keys in Feb 2024, plus the three required scopes
  (data.records:read/write, schema.bases:read) and the per-base Access
  list requirement.
- Clarified PATCH vs PUT and pagination (100 records/page cap).
- Swapped verification from 'hermes -q ...' (non-deterministic) to a
  curl /v0/meta/bases call that returns a verifiable HTTP status code.
2026-04-26 18:45:15 -07:00
Sonoyunchu c997183f53 feat(skills): add bundled Airtable productivity skill 2026-04-26 18:45:15 -07:00
Teknium f01e4402a9 chore(release): map georgeglessner in AUTHOR_MAP 2026-04-26 18:43:57 -07:00
George Glessner 5b5a53a155 fix(cli): check hermes_cli/web_dist/ not web/dist/ for build staleness
_web_ui_build_needed() in PR #14914 checked web_dir/"dist" as the
sentinel, but vite.config.ts sets outDir: "../hermes_cli/web_dist" so
the build output lands in hermes_cli/web_dist/, never in web/dist/.
The sentinel was therefore always missing → _web_ui_build_needed always
returned True → npm install + Vite build ran on every startup → OOM on
low-memory VPS persisted unchanged.

Fix: derive dist_dir as web_dir.parent / "hermes_cli" / "web_dist" so
the sentinel points to the actual build output directory.

Fixes #14898
2026-04-26 18:43:57 -07:00
Teknium 90c84c6dba fix(gateway): unblock update subprocess on recognized-command bypass
When the gateway intercepts a pending /update prompt and the user sends
a recognized slash command (/new, /help, ...), the command now dispatches
normally AND the detached update subprocess is unblocked by writing a
blank .update_response. _gateway_prompt reads '' → strips → returns the
prompt's default (typically a safe 'n' / skip), so the update process
exits cleanly instead of blocking on stdin until the 30-minute watcher
timeout.

Also clears _update_prompt_pending[session_key] on this path so stray
future input for the same session isn't re-intercepted.

Extends PR #15849 with tests for the new cancel-write + a regression
test pinning the legacy behavior of unrecognized /foo slash commands
still being consumed as the response.
2026-04-26 18:39:44 -07:00
Yukipukii1 bdaf56a94d fix(gateway): bypass slash commands during pending update prompts 2026-04-26 18:39:44 -07:00
Brooklyn Nicholson b1c49d5e73 chore(tui): /clean recent perf work — KISS/DRY pass
24 files, -319 LoC. Behaviour preserved, 369/369 tests green.

- hermes-ink caches: shared lruEvict helper for the four parallel LRU
  caches (stringWidth, wrapText, sliceAnsi, lineWidth); touch-on-read
  stays inlined per cache; tightened output.ts skip-slice fast path.
- wheelAccel: trimmed provenance header, collapsed env parsing, ternary
  dispatch in computeWheelStep.
- perfPane: folded ensureLogDir into once-flag, spread-with-overrides
  for fastPath/phases instead of full rebuilds.
- env: extracted truthy() (used 4×).
- virtualHeights: collapsed user/diff/slash height bumps; trail+todos
  estimate.
- useInputHandlers: scrollIdleTimer cleanup on unmount, ?? undefined
  shorthand.
- useMainApp: dropped dead liveTailVisible IIFE and liveProgress
  indirection.
- appLayout, markdown, messageLine, entry: vertical rhythm, dropped
  narration comments, inlined one-shot vars.
- fix: empty catch blocks → /* best-effort */ for no-empty lint.
2026-04-26 20:38:47 -05:00
Teknium bdc1adf711 chore(release): map haru398801, badgerbees, xnbi in AUTHOR_MAP 2026-04-26 18:33:35 -07:00
Badgerbees 55f212a7a2 fix(slack): honor NO_PROXY for Slack transport 2026-04-26 18:33:35 -07:00
Xnbi 7eaad06a87 fix(gateway): default Slack tool_progress to off
Slack Bolt posts are not editable like CLI spinners; medium-tier new still emitted a permanent line per tool start (issue #14663).

- Built-in slack default: off; other tier-2 platforms unchanged.

- Adjust /verbose isolation test for off to new cycle.

- Migration tests: read/write config.yaml as UTF-8 (Windows locale).
2026-04-26 18:33:35 -07:00
haru398801 a01e767b24 fix(gateway): respect config.yaml slack.enabled when SLACK_BOT_TOKEN env var is set
Previously, setting SLACK_BOT_TOKEN in .env would unconditionally enable
the Slack gateway adapter regardless of `slack.enabled: false` in config.yaml.
This caused spurious "SLACK_APP_TOKEN not set" errors when the token was
used only by skills (e.g. cron jobs that send Slack messages) rather than
for the Hermes messaging gateway.

Now, enabled: false in config.yaml is respected — the token is stored so
skills can still use it, but the gateway adapter is not activated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-26 18:33:35 -07:00
hharry11 fd474d0f00 fix(gateway): avoid cross-user mirror writes in per-user group sessions 2026-04-26 18:31:24 -07:00
Teknium cd2aee36ca test(sessions): wire sessions_dir through auto-prune + file-cleanup regression tests
- TestAutoMaintenance gains 3 tests: auto-prune deletes transcript files
  when sessions_dir is passed, preserves them when it isn't (backward-
  compat), and never touches active-session files during prune.
- FakeDB helpers in test_sessions_delete.py accept **kwargs so they
  don't break when delete_session signature gains sessions_dir.
2026-04-26 18:31:07 -07:00
Yang Zhi 3b60abb6bb fix(sessions): delete on-disk transcript files during prune and delete (#3015)
`delete_session()` and `prune_sessions()` only removed SQLite records,
leaving .json/.jsonl transcript files on disk forever. Over time this
causes unbounded disk growth (~27MB/day observed).

Changes:
- Add `_remove_session_files()` static helper that cleans up
  `{session_id}.json`, `.jsonl`, and `request_dump_{session_id}_*.json`
- `delete_session()` accepts optional `sessions_dir` param and removes
  files for the deleted session and its children
- `prune_sessions()` accepts optional `sessions_dir` param and removes
  files for all pruned sessions after the DB transaction
- Wire up CLI `hermes sessions delete` and `hermes sessions prune` to
  pass `sessions_dir`
- File cleanup is best-effort (OSError silenced) so DB operations are
  never blocked by filesystem issues
- Fully backward-compatible: `sessions_dir=None` (default) preserves
  existing behavior
2026-04-26 18:31:07 -07:00
Wysie 0ba6471dd1 fix: recover hindsight embedded daemon after idle shutdown 2026-04-26 18:29:11 -07:00
Yukipukii1 7317d69f19 fix(security): treat quoted false as false in browser SSRF guards 2026-04-26 18:27:13 -07:00
Teknium 2a0fc97c76 chore(release): map mewwts in AUTHOR_MAP 2026-04-26 18:25:41 -07:00
mewwts 8fb861ea6e feat(gateway/slack): support channel_skill_bindings
Extends the existing channel_skill_bindings mechanism (previously
Discord-only) to Slack, so a channel or DM can auto-load one or more
skills at session start without relying on the model's skill selector
for every short reply.

Motivation: Mats's German flashcards DM pushes a cron-driven card
5x/day; he responds with one-word guesses like 'work'. Previously each
reply required the main agent to decide whether to load german-flashcards
(full opus turn just to pick a skill). With the binding configured per
Slack channel, the skill is injected at session start and grading runs
directly.

Changes:
- Extract resolve_channel_skills() from DiscordAdapter._resolve_channel_skills
  into gateway.platforms.base (now shared across adapters).
- DiscordAdapter._resolve_channel_skills delegates to the shared helper
  (behavior preserved — existing test suite still passes unchanged).
- SlackAdapter: resolve channel_skill_bindings on each message and attach
  auto_skill to MessageEvent. gateway/run.py already handles auto-skill
  injection on new sessions; this just wires Slack through it.
- gateway/config.py: accept channel_skill_bindings in slack: block of
  config.yaml (was Discord-only).
- Tests: new tests/gateway/test_slack_channel_skills.py with 11 cases
  covering DM/thread/parent resolution, single-vs-list skills, dedup,
  malformed entries. Discord suite unchanged.
- Docs: add 'Per-Channel Skill Bindings' section to Slack user guide.

Config example:
  slack:
    channel_skill_bindings:
      - id: "D0ATH9TQ0G6"
        skills: ["german-flashcards"]
2026-04-26 18:25:41 -07:00
Teknium 635253b918 feat(busy): add 'steer' as a third display.busy_input_mode option (#16279)
Enter while the agent is busy can now inject the typed text via /steer —
arriving at the agent after the next tool call — instead of interrupting
(current default) or queueing for the next turn.

Changes:
- cli.py: keybinding honors busy_input_mode='steer' by calling
  agent.steer(text) on the UI thread (thread-safe), with automatic
  fallback to 'queue' when the agent is missing, steer() is unavailable,
  images are attached, or steer() rejects the payload. /busy accepts
  'steer' as a fourth argument alongside queue/interrupt/status.
- gateway/run.py: busy-message handler and the PRIORITY running-agent
  path both route through running_agent.steer() when the mode is 'steer',
  with the same fallback-to-queue safety net. Ack wording tells users
  their message was steered into the current run. Restart-drain queueing
  now also activates for 'steer' so messages aren't lost across restarts.
- agent/onboarding.py: first-touch hint has a steer branch for both
  CLI and gateway.
- hermes_cli/commands.py: /busy args_hint updated to include steer,
  and 'steer' is registered as a subcommand (completions).
- hermes_cli/web_server.py: dashboard select widget offers steer.
- hermes_cli/config.py, cli-config.yaml.example, hermes_cli/tips.py:
  inline docs updated.
- website/docs/user-guide/cli.md + messaging/index.md: documented.
- Tests: steer set/status path for /busy; onboarding hints;
  _load_busy_input_mode accepts steer; busy-session ack exercises
  steer success + two fallback-to-queue branches.

Requested on X by @CodingAcct.

Default is unchanged (interrupt).
2026-04-26 18:21:29 -07:00
Teknium 87477756fd chore(release): map Ito-69 in AUTHOR_MAP 2026-04-26 18:21:20 -07:00
Ivan Tonov 930494d687 fix(cron): reap orphaned MCP stdio subprocesses after each tick
MCP stdio servers are spawned via the SDK's stdio_client, which on
Linux uses start_new_session=True (setsid).  When a cron job is
cancelled mid-way (timeout, agent finish, exception), the subprocess
often escapes the SDK's teardown and survives as a session leader.
Because setsid() detaches the child from the gateway's process group
/ cgroup tree, systemd does not reap it on service restart either —
so every cron tick that touches an MCP tool leaks a dangling server
process.

Fix:

* tools/mcp_tool.py — _run_stdio now wraps the whole stdio+session
  context in try/finally.  On any exit path (clean, exception,
  cancellation), PIDs still alive are moved from the active
  _stdio_pids set into a new _orphan_stdio_pids set.  Orphan
  detection is done via os.kill(pid, 0) — a cheap liveness probe
  that never signals the target.

* tools/mcp_tool.py — _kill_orphaned_mcp_children gains an
  include_active=False flag.  Default behaviour now only reaps the
  orphan set so concurrent sessions (other parallel cron jobs or
  live user chats) are never disrupted.  The existing shutdown path
  passes include_active=True to keep the previous "kill everything"
  semantics after the MCP loop is stopped.

* cron/scheduler.py — the cleanup hook is moved from run_job()'s
  finally (which would race with parallel siblings after #13021)
  into tick() after the ThreadPoolExecutor has joined every future.
  At that point there are no in-flight sessions from this tick, so
  sweeping the orphan set is always safe.

Net effect: zero regression for healthy sessions, and orphan MCP
servers no longer accumulate between gateway restarts.

Made-with: Cursor
2026-04-26 18:21:20 -07:00
Teknium 5db6db891c chore(release): map ghostmfr in AUTHOR_MAP 2026-04-26 18:20:17 -07:00
ghostmfr e818ec520a fix(slack): harden attachment handling
Multiple overlapping Slack attachment improvements:

1. Upload retry with backoff on transient errors (429, 5xx, connection
   reset, rate_limited, service unavailable). New _is_retryable_upload_error
   helper covers three upload paths: _upload_file, send_video,
   send_document. Up to 3 attempts with 1.5s * attempt backoff.

2. Thread participation tracking: successful file uploads now add the
   thread_ts to _bot_message_ts, mirroring how text replies are tracked.
   This lets follow-up thread messages auto-trigger the bot (same
   engagement rules as replied threads).

3. Thread metadata preservation in the image redirect-guard fallback
   (send_image → send text fallback) and in two gateway.run.py send
   paths (image + document fallback calls).

4. HTML response rejection in _download_slack_file_bytes. Parallels
   the existing check in _download_slack_file. Guards against Slack
   returning a sign-in / redirect page as document bytes when scopes
   are missing, so the agent doesn't get HTML-as-a-PDF.

5. File lifecycle event acks (file_shared / file_created / file_change).
   These events arrive around snippet uploads. Acking them silences the
   slack_bolt 'Unhandled request' 404 warnings without changing behavior.

6. Post-loop message type classification so a mixed image+document upload
   classifies as PHOTO (or VOICE if no image), falling back to DOCUMENT.
   Previously, the per-file classification in the inbound loop could be
   overwritten unpredictably.

7. Expanded text-inject whitelist in inbound document handling to cover
   .csv, .json, .xml, .yaml, .yml, .toml, .ini, .cfg (up to 100KB) so
   snippets and config files are directly visible to the agent, not just
   cached as opaque uploads. Paired with new MIME entries in
   SUPPORTED_DOCUMENT_TYPES in base.py.

Squashed from two commits in #11819 so the single commit carries the
contributor's GitHub attribution (the original commits were authored
under a local dev hostname).
2026-04-26 18:20:17 -07:00
Brooklyn Nicholson 527ac351b4 fix(tui): address Copilot review comments
- stringWidth: true LRU on cache hit (touch-on-read via delete+set) so
  hot strings stay resident under long sessions; was insertion-order
  FIFO before
- virtualHeights: include todos, panel sections, and intro version in
  messageHeightKey so height-cache reuse correctly invalidates when
  todo content / panel sections change
- virtualHeights: estimate trail+todos rows at todos.length+2 (or 2
  collapsed) instead of the generic ~1-line fallback, so initial
  virtualization offsets are closer to reality
- useInputHandlers: clearTimeout on unmount for scrollIdleTimer so
  pending relaxStreaming() never fires after teardown
- render-node-to-output: drop unused declined.noHint counter from
  scrollFastPathStats; it was always 0 (the "hint missing" branch is
  outside the diagnostics block)
- perfPane / hermes-ink.d.ts: follow the noHint removal
- wheelAccel: replace ~/claude-code path comment with generic
  attribution that doesn't reference a developer-local checkout
2026-04-26 20:07:41 -05:00
Brooklyn Nicholson b115ea62da feat(tui): anchor LiveTodoPanel to latest user message row
TodoPanel now renders as a child of the most recent user message's
virtualized row container, so it visually belongs to that prompt and
follows it during scroll. Falls back gracefully when no user message
exists yet (panel just doesn't render).
2026-04-26 20:07:29 -05:00
Brooklyn Nicholson 25767513f2 perf(tui): unified Ink cache eviction on memory pressure + session reset
Adds an `evictInkCaches(level)` API that prunes the four hot module-level
caches (`widthCache`, `wrapCache`, `sliceCache`, `lineWidthCache`) with
either a half-keep LRU pass or a full clear. Wired into:

- memoryMonitor: half-prune on 'high', full drop on 'critical', before
  the heap dump / auto-restart path. Gives long sessions a shot at
  recovering RSS instead of hard-exiting.
- useSessionLifecycle.resetSession: half-prune so a /new session starts
  with a half-warm pool and the prior session can resume cheaply.

Also: lineWidthCache now uses LRU half-eviction on overflow instead of a
full `cache.clear()`, matching the other three caches.

Comparison vs claude-code: both forks now share the same `prevScreen`
blit + dirty-cascade machinery in render-node-to-output. Their smoothness
came from sibling-memo discipline (every chrome pane memo'd so dirty
cascade doesn't disable transcript blit) — already in place in our
appLayout.tsx (TranscriptPane / ComposerPane / StatusRulePane all memo'd).
Alt-screen is not the cause; both use it. The remaining gap was per-row
CPU on width/wrap/slice, which the previous commit closed.
2026-04-26 19:41:53 -05:00
Brooklyn Nicholson c370e2e1e5 perf(tui): cache stringWidth/wrapText/sliceAnsi + skip-slice when line fits clip
CPU profile (Apr 2026, real-user scroll on 11k-line session) showed three
hot loops in the per-frame render path:

  Output.get() per-frame walk:                 24% total
  └─ sliceAnsi(line, from, to) per write:     18% total
  stringWidth(line) chain (cached + JS):      14% total

All three were re-doing identical work every frame: same string → same
clipped slice → same width.

Fixes:

1. Memoize stringWidth (8k-entry LRU) for non-ASCII strings; ASCII fast-path
   skips the cache (inline scan beats Map.get for short ASCII, the >90%
   case). String.charCodeAt scan up to 64 chars is cheaper than the regex
   fallback.

2. Memoize wrapText (4k-entry LRU keyed by maxWidth|wrapType|text) — wrapAnsi
   is pure and the same content reflows identically every frame.

3. Memoize sliceAnsi (4k-entry LRU keyed by start|end|str) for the
   end-defined hot path used by Output.get().

4. Skip the slice entirely in Output.get() when the line already fits the
   clip box (startsBefore=false && endsAfter=false). Most transcript lines
   never exceed their container width, and tokenizing them just to slice
   (line, 0, width) was pure overhead. This single fast-path drops
   sliceAnsi from 18% → ~0% in the profile.

Also tighten virtualization constants (MAX_MOUNTED 260→120, OVERSCAN 40→20,
SLIDE_STEP 25→12) and cap historical-message render at 800 chars / 16
lines via HISTORY_RENDER_MAX_*; messages inside the FULL_RENDER_TAIL_ITEMS
window still render in full so reading-zone behavior is unchanged.

Validation, real-user CPU profile, page-up scroll on 11k-line session:

  Output.get() self-time:     24%   →   0.3%
  sliceAnsi total:            18%   →   not in top 25
  stringWidth family:         14%   →   ~3%
  idle:                     60.7%   →  77.3%

Frame timings (synthetic page-up profile harness):
  dur p95:   ~10ms   →  4.87ms
  dur p99:   25ms+   → 12.80ms
  yoga p99:  ~20ms   →  1.87ms

The remaining CPU in the profile is Yoga layoutNode + React commit,
which is the irreducible work for this UI tree size.
2026-04-26 19:28:09 -05:00
Teknium b16f9d438b feat(telegram): send fresh finals for stale preview streams (port openclaw#72038) (#16261)
Ports openclaw/openclaw#72038 to hermes-agent.

Telegram's `editMessageText` preserves the original message timestamp,
so a long-running streamed reply (reasoning models that take 60+ seconds
to finish) would keep the first-token timestamp even after completion.
Users can't tell how long a task actually took.

When a preview message has been visible for >= 60s (configurable via
`streaming.fresh_final_after_seconds`), finalize by sending a fresh
message instead of editing in place, then best-effort delete the stale
preview. Short previews still edit in place (the existing fast path).

Implementation notes adapted from OpenClaw's TypeScript original:
- `StreamConsumerConfig` gains `fresh_final_after_seconds` (default 0 =
  legacy edit-in-place). Gateway-level `StreamingConfig` defaults to 60.
- `GatewayStreamConsumer` tracks `_message_created_ts` at first-send and
  checks it in `_send_or_edit` on `finalize=True`. New helpers
  `_should_send_fresh_final` + `_try_fresh_final`.
- `BasePlatformAdapter` gains optional `delete_message(chat_id, message_id)`
  returning False by default. `TelegramAdapter` implements it via
  `_bot.delete_message`.
- `gateway/run.py` only enables fresh-final for `Platform.TELEGRAM`;
  other platforms ignore the setting (they don't have the stale-edit
  timestamp problem or edit-then-read works cheaply).
- Fallback to normal edit on any fresh-send failure — no user-visible
  regression if Telegram rate-limits a send or the message is gone.

Tests: 15 new cases in tests/gateway/test_stream_consumer_fresh_final.py
covering short/long previews, config plumbing, delete-support absent,
send-failure fallback, __no_edit__ sentinel safety, and StreamingConfig
round-trip.

Co-authored-by: Hermes Agent <agent@nousresearch.com>
2026-04-26 17:26:37 -07:00
Brooklyn Nicholson 85e9a23efb feat(tui): HERMES_TUI_FPS=1 shows live fps counter
Adds a corner-overlay FPS readout gated on HERMES_TUI_FPS, fed by
ink's onFrame callback (so it's the REAL render rate, not a timer).
Displays fps, last-frame duration, and total frame count, colored by
threshold (green ≥50, yellow ≥30, red below).

Implementation:
  * lib/fpsStore.ts — nanostore atom updated from a trackFrame()
    sink.  Ring buffer of last 30 frame timestamps; fps = 29/elapsed.
    trackFrame is undefined when SHOW_FPS is off so ink's onFrame
    short-circuits at the optional chain.
  * components/fpsOverlay.tsx — tiny <Text> subscriber; returns null
    when SHOW_FPS is off (React skips the subtree entirely).
  * entry.tsx — composes onFrame from logFrameEvent (dev-perf) and
    trackFrame (fps) so both flags can coexist.  When both are off,
    onFrame is undefined and ink never attaches the handler.
  * appLayout.tsx — mounts the overlay as a flex-shrink=0 right-
    aligned Box below the composer, conditional on SHOW_FPS.

Usage:
  HERMES_TUI_FPS=1 hermes --tui
  # bottom right: "  62.3fps ·   0.8ms · #1234" (green/yellow/red)

Intended as a user-facing diagnostic during the scroll-perf tuning
pass — watch the counter drop while holding PageUp to see where
frames go silent, without having to run scripts/profile-tui.py in a
side terminal.

126 files post-compile with React Compiler; 352 tests still pass.
2026-04-26 17:20:47 -05:00
Brooklyn Nicholson 4395c2b007 feat(tui): port claude-code's wheel accel state machine
Replaces the static WHEEL_SCROLL_STEP=1 multiplier on wheel events
with an adaptive accel state machine that infers user intent from
inter-event timing.

Algorithm ported straight from claude-code's
src/components/ScrollKeybindingHandler.tsx.  All tuning constants,
the native/xterm.js path split, the encoder-bounce detection, the
trackpad-burst signature → all theirs.  This file is a mechanical
port into our module structure.

What it does:

  precision click (>500ms gap)   1 row/event   (deliberate scan)
  sustained mouse (40-200ms)     2-6 rows      (decay curve)
  detected wheel bounce          ramps to 15   (sticky wheel-mode)
  trackpad flick (5+ <5ms)       1 row/event   (burst detect)
  direction reversal             reset to base

Two implementation paths:

  * native terminals (ghostty, iTerm2, Kitty, WezTerm) — linear
    window-ramp + optional wheel-mode curve triggered by detected
    encoder bounce.  SGR proportional reporting handled via the
    burst-count guard.

  * xterm.js (VS Code / Cursor / browser terminals) — pure
    exponential-decay curve with fractional carry.  Events arrive
    1-per-notch with no pre-amplification, so the curve is more
    aggressive.

Selected at construction via isXtermJs() from @hermes/ink (now
exported).  Per-user tune via HERMES_TUI_SCROLL_SPEED (alias
CLAUDE_CODE_SCROLL_SPEED for portability).

13 unit tests covering direction flip/bounce/reversal, idle
disengage, trackpad-burst disengage, frac invariants, and the
native vs xterm.js branches.

Profiled under --rate 30 (stress test) and --rate 10 (realistic
sustained scroll): accel ramps to cap=6 at 30Hz burst, decays to
1-3 rows at sparse 10Hz clicks.  Perf is comparable to baseline
because accel IS multiplying step — the win is perceptual (fast
flicks cover distance, slow clicks keep precision), not raw fps.

Companion to the earlier WHEEL_SCROLL_STEP=1 change: that set the
base; this modulates around it.
2026-04-26 17:16:11 -05:00
Brooklyn Nicholson 0cd98499bb Promote debugging-hermes-tui-commands to in-repo skill
Was user-local in ~/.hermes/skills/. Ported into skills/software-development/
so other Hermes users get it and so the related_skills links from
node-inspect-debugger and python-debugpy resolve in-repo.

Frontmatter upgraded to match repo convention (version/author/license/
metadata.hermes.{tags,related_skills}, description rewritten as "Use when ...").
Body expanded with debugging-tactics section pointing at the two new
debugger skills, and additional common-issues / pitfalls entries.
2026-04-26 17:13:12 -05:00
Brooklyn Nicholson 4cdb6962ca Add hermes-agent-skill-authoring skill
Class-level skill for writing SKILL.md files inside this repo: required
frontmatter per tools/skill_manager_tool.py validator, size limits,
peer-matched structure, directory placement, write_file vs skill_manage,
caching pitfalls, cross-reference caveats.
2026-04-26 17:12:25 -05:00
Brooklyn Nicholson 9a46feb9bd experiment(tui): HERMES_TUI_INLINE flag to skip AlternateScreen
Adds a gate so we can A/B test whether bypassing the alt-screen +
viewport constraint lets the terminal's native scrollback beat our
virtualization on scroll perf.

Result: definitively NO.  Inline mode is 40x worse on every metric
that moves, because AlternateScreen is what constrains the ScrollBox
to the viewport height.  Without it, the ScrollBox grows to contain
every child of the transcript and every frame re-renders all 1100
messages.

Profile under hold-wheel_up (1106-msg session, 30Hz for 6s):

  metric                    fullscreen       inline       delta
  patches_total              28,864         1,111,574     +3751%
  writeBytes_total           42 KB          1.6 MB        +3881%
  fps_throughput             15.8 fps       1.75 fps      -89%
  frames                     179            18            -90%
  gap_p50_ms                 17 (~60fps)    726 (~1fps)   +4170%
  yoga_p99                   34 ms          405 ms        +1083%
  renderer_p99               14 ms          169 ms        +1062%
  flickers                   0              5 offscreen   —

This is actually the cleanest data we've gotten so far:

  * AlternateScreen is LOAD-BEARING for perf — its viewport height
    constraint is what lets useVirtualHistory's culling work.  No
    constraint → ScrollBox grows unbounded → every fiber mounts.

  * The outer terminal (Cursor's xterm.js) parsed 1.6 MB of ANSI in
    under 10 seconds with drain p99 = 8.83 ms and 0 backpressure
    frames.  Our terminal-write hypothesis from last session was
    wrong: the bottleneck is React + Yoga, not the wire.

  * Doing proper inline mode (non-virtualized transcript in
    scrollback, composer pinned below) is not a flag flip — it's a
    different UI architecture.  Leaving this flag in so anyone
    re-running the experiment gets the same numbers, but not
    building the architecture until we're sure the perf win is
    worth the UX loss (it probably isn't — the fullscreen + virt
    path is the one we should optimize, not replace).

Keeping the flag as an experiment gate.  Flip HERMES_TUI_INLINE=1
and run scripts/profile-tui.py --compare to reproduce.
2026-04-26 17:11:49 -05:00
Brooklyn Nicholson 8d2b08342c Add node-inspect-debugger and python-debugpy skills
Two new skills under skills/software-development/ for real breakpoint-driven
debugging from the terminal:

- node-inspect-debugger: node --inspect / --inspect-brk, node inspect REPL,
  CDP scripting via chrome-remote-interface, attaching to running Node
  processes (SIGUSR1), ui-tui-specific recipes, Vitest under debugger,
  CPU profiles + heap snapshots.

- python-debugpy: pdb quick reference, breakpoint() workflow, pytest --pdb
  (with xdist caveat for scripts/run_tests.sh), post-mortem, debugpy for
  remote/attach, remote-pdb as the agent-friendly alternative to DAP,
  recipes for tui_gateway/_SlashWorker/subprocess debugging.
2026-04-26 17:10:11 -05:00
Brooklyn Nicholson 82f842277e perf(tui): profile harness gains --loop, --save, --compare
Before: change code → build → run profile → manually compare to
mental model of last run.  After: `--loop` watches ui-tui/src and
packages/hermes-ink/src for .ts(x) changes, rebuilds on change,
re-runs the same scenario, prints a side-by-side A/B diff against
the previous iteration — so each edit's impact is quantified
instantly.  Ctrl+C to stop.

Also added:
  --save LABEL     saves metrics snapshot to /tmp/perf-<LABEL>.json
  --compare LABEL  diffs the current run vs that snapshot
  --extra-flag X   pass-through to node dist/entry.js (prepping for
                   --no-fullscreen below)

key_metrics() flattens a full run into scalar numbers across
frames, React commits, and per-phase timings.  format_diff() prints
a table with ↑/↓ markers denoting regressions vs improvements based
on whether the metric is lower-is-better (p99, max, patches, drain)
or higher-is-better (fps, gaps_under_16ms).

Run-to-run noise on static code is ~5-15% on most metrics — big
signal (>30% change on renderer_p99 / fps) cuts through cleanly.
Useful both for validating a single fix and for detecting subtle
regressions during the wheel-accel port.

Usage during the next perf session:

  # one-shot with a baseline for later comparison
  scripts/profile-tui.py --seconds 6 --hold wheel_up --save pre-accel

  # after porting the wheel handler
  scripts/profile-tui.py --seconds 6 --hold wheel_up --compare pre-accel

  # continuous iteration
  scripts/profile-tui.py --seconds 6 --hold wheel_up --loop
2026-04-26 17:08:07 -05:00
Brooklyn Nicholson f823535db2 perf(tui): instrument stdout drain — rule out terminal parse bottleneck
Adds four fields to FrameEvent.phases and the matching profile
summary:

  optimizedPatches  post-optimize patch count (what's actually
                    written to stdout; the .patches field is
                    pre-optimize)
  writeBytes        UTF-8 byte count of the write this frame
  backpressure      true when Node's stdout.write returned false
                    (Writable buffer full — outer terminal can't
                    keep up)
  prevFrameDrainMs  end-to-end drain time of the PREVIOUS frame's
                    write, captured from stdout.write's 2-arg
                    callback.  Reported on the next frame so the
                    measurement reflects "time until OS flushed
                    the bytes to the terminal fd", not "time until
                    queued in Node".

writeDiffToTerminal() now returns { bytes, backpressure } and
accepts an optional onDrain callback.  Only attached on TTY with
diff; piped/non-TTY stdout bypasses flow control so the callback
would fire synchronously anyway.

Initial measurements under hold-wheel_up against 1106-msg session
(30Hz for 6s):

  patches total    28,888
  optimized total  16,700   (ratio 0.58 — optimizer cuts ~42%)
  writeBytes       42 KB / 10s = 4.2 KB/s throughput
  drainMs p50      0.14 ms   terminal accepts bytes instantly
  drainMs p99      0.85 ms
  backpressure     0% of frames

This rules out the terminal-parse hypothesis — Cursor's xterm.js
drains our output in sub-millisecond time at only 4 KB/s.  The
remaining lag has to be in the render pipeline, not the wire.
Profile output now includes the bytes+drain+backpressure lines to
keep this visible on every subsequent iteration.
2026-04-26 17:06:22 -05:00
Brooklyn Nicholson d3dedf10aa revert(tui): drop DeferredMd, profiling showed it was neutral
Profiled with scripts/profile-tui.py under hold-PageUp + hold-wheel.
The placeholder → microtask-upgrade pattern did not reduce renderer
p99 (63ms → 63ms) or max (96ms → 142ms, slightly worse).  Each fresh
row still pays the Md cost — just on a follow-up commit instead of
inline — and the follow-up commit shows up as a second heavy frame
a few ms later.

The real bottlenecks turned out to be:

  1. wheel step too large (fixed in 7ca16eea)
  2. outer terminal ANSI parse throughput (diagnosing next)
  3. React commit frequency during hold-scroll (needs coalescing)

None of which DeferredMd addresses.  Clearing the complexity so the
next experiments land on a simpler substrate.
2026-04-26 17:03:38 -05:00
Brooklyn Nicholson 7ca16eea56 perf(tui): scroll one row at a time per wheel event, half-viewport per pageUp
User observation: "it doesn't scroll line by line/row by row."

Was right.  Two places hardcoded big deltas:

1. WHEEL_SCROLL_STEP = 6 (config/limits.ts)
   Each wheel event scrolled 6 rows.  A mechanical wheel notch emits
   3-5 events → 18-30 rows per click, which visually teleports past
   content instead of smooth-scrolling it.  Drop to 1.  Trackpads
   emit 50-100 events per flick — at step=1 that's still a fast flick
   (a whole viewport in one flick) but each intermediate frame is
   visible.  Porting claude-code's wheel accel state machine is the
   right next step if this feels sluggish on precision scrolls.

2. pageUp/pageDown = viewport - 2 (useInputHandlers.ts)
   Full-viewport jumps replace the entire screen — no visual
   continuity, can't scan content — AND land right at Ink's fast-path
   threshold (`delta < innerHeight`), which disqualifies the DECSTBM
   blit on every press.  Half-viewport keeps 50% continuity AND
   drops well under the threshold.  Two presses still cover the same
   total distance.

Profiled against the 1106-msg session, holding the key at 30Hz for
6s:

  wheel_up (step 6 → 1):
    frames       142  →  163    (+15%)
    throughput   10.7 → 15.8 fps (+48%)
    patches tot  53018→ 36562   (-31%)
    gap p50      5ms  → 16ms    (actual rendering ~60fps now)
    <16ms frames 93   → 76
    16-33ms      82   → 76
    hitches      3    → 1

  pageUp (viewport-2 → viewport/2):
    throughput   10.7 → 9.5 fps  (same ballpark — smaller delta × same
                                  event rate = less total scroll)

Ink's proportional drain caps at `innerHeight - 1` per frame to keep
the DECSTBM fast path firing.  With these smaller deltas every event
comfortably fits under that cap, so fast-path hit rate goes up and
patch volume per frame drops — the measured 31% reduction in total
patches-sent correlates with users perceiving smoother scrolling
because the outer terminal (VS Code / xterm.js / tmux) isn't drowning
in ANSI between paints.

Tests/type-check/build clean; 352 tests pass.
2026-04-26 17:01:22 -05:00
Brooklyn Nicholson 4a9070c9ac perf(tui): defer Md upgrade for fresh-mounted assistant rows
Adds DeferredMd — a wrapper around <Md> that renders a lightweight
<Text> placeholder on first mount and upgrades to the full markdown
subtree on a queueMicrotask follow-up. Rationale: fresh MessageLine
mounts during PageUp hold run our markdown tokenizer + syntax
highlighter synchronously, producing the 63-112ms renderer spikes
profiled earlier. A plain <Text> placeholder only needs Yoga to wrap
the pre-stripped string (no tokenizer, no highlight), then the Md
subtree builds in a follow-up React commit.

Upgrade cache: once a (theme, compact, text) tuple has been upgraded,
a WeakMap-keyed Set remembers it so remounts (scroll-out then
scroll-back) mount straight into <Md> — no placeholder round-trip.
WeakMap on theme means palette swaps re-upgrade naturally.

Honesty note: profiling under hold-PageUp showed this didn't reduce
renderer p99 measurably — the upgrade commit just pays the Md cost on
a follow-up frame instead of inline. The bigger bottleneck turned out
to be React commit frequency (3.5 commits/sec during 30Hz scroll
input, with 200ms+ silent gaps between commits dominating perceived
FPS), which this change doesn't address. Keeping the deferred path
anyway because:

  1. It's correct and tested — no regressions across 352 tests
  2. Defensive for pathological fresh-mount cases (giant code blocks,
     wide tables) that aren't in the current profile fixture
  3. Pairs naturally with useVirtualHistory's useDeferredValue to keep
     React's concurrent scheduler able to interrupt upgrade commits

If the follow-up perf investigation (terminal write throughput / patch
volume / commit frequency) shows DeferredMd is net-neutral-or-worse in
practice, this can be reverted with a one-line swap back to <Md> in
messageLine.tsx:115.

Companion to the streaming 2-column fix in 7242361a — these two
touched messageLine.tsx together so they land as a pair.
2026-04-26 16:56:09 -05:00
Brooklyn Nicholson 7242361a69 fix(tui): wrap streaming markdown split in column Box
StreamingMd returned <><Md/><Md/></> — a bare Fragment with two <Md>
children. Each <Md> returns a <Box flexDirection="column">, but its
parent in messageLine.tsx (line 169) is `<Box width={...}>` with no
flexDirection, which Ink defaults to 'row'. So during streaming the
two column boxes rendered side-by-side, producing the visible "tokens
jumble into two columns until it fixes itself" bug — the "fix" was
message.complete flipping isStreaming→false, which swaps the
StreamingMd subtree for a single DeferredMd/Md child (no siblings → row
direction is harmless).

Wrap the two <Md> siblings in a flexDirection="column" Box so they
stack. Localized fix so the non-streaming path (single-child, works
fine in a row parent) is untouched.

Reported by user:
> "tokens streaming... going into 2 columns randomly and jumbling
>  together until it fixes itself"

No test changes — findStableBoundary tests still pass (the layout
change is parent-structural, not in the boundary logic). Build clean,
tsc clean, 352 tests pass.
2026-04-26 16:55:56 -05:00
Brooklyn Nicholson cd7a200e6c perf(tui): instrument scroll fast-path decline reasons
Adds scrollFastPathStats counters to render-node-to-output.ts: captures
every time a ScrollBox's DECSTBM scroll hint is generated, records
whether the fast path took it (blit+shift from prevScreen) or declined,
and why. Exposed through hermes-ink's public exports and snapshotted on
every FrameEvent so the profiler harness can correlate decline reasons
with the actual patch/renderer cost per frame.

This is pure observation — no behaviour change. Preparing for the
virtual-history rewrite: the hypothesis was that our topSpacer/
bottomSpacer scheme disqualifies every scroll via heightDelta
mismatch, but the data shows the fast path is actually taken on most
scrolls (19/23 over a 6s PageUp hold through 1100 messages) — the
remaining steady-state renderer cost is Yoga tree traversal, not
the per-frame full redraw I initially suspected.

Declines that do happen correlate with React commits that changed the
mounted range mid-scroll (heightDelta=±3 to ±35). Those are the rarer
cases the virtualization rewrite still needs to address.

No test diffs — instrumentation-only.  Build verified: `tsc --noEmit`
plus the full `npm run build` compiler post-pass pass cleanly.
2026-04-26 16:45:53 -05:00
Brooklyn Nicholson 71eee26640 perf(tui): full-pipeline instrumentation + profiling harness
Extends HERMES_DEV_PERF to capture the complete render pipeline, not
just React commits. Adds scripts/profile-tui.py to drive repeatable
hold-PageUp stress tests against a real long session.

perfPane.tsx:
  Wires ink's onFrame callback (already plumbed through the fork) into
  the same perf.log as the React.Profiler samples. Captures per-phase
  timing (yoga calculateLayout, renderNodeToOutput, screen diff, patch
  optimize, stdout write) plus yoga counters (visited/measured/cache-
  Hits/live) and patch counts per frame.  Events are tagged
  {src: 'react'|'frame'} so jq can split them.  logFrameEvent is
  undefined when HERMES_DEV_PERF is unset, so ink doesn't even attach
  the callback.

entry.tsx:
  Passes logFrameEvent into render().

types/hermes-ink.d.ts:
  Declares FrameEvent + onFrame on RenderOptions so the ui-tui side
  type-checks against the plumbed-through ink option.

scripts/profile-tui.py:
  New harness. Launches the built TUI under a PTY with the longest
  session in state.db resumed, holds PageUp/PageDown/etc at a
  configurable Hz for N seconds, then parses perf.log and prints
  per-phase p50/p95/p99/max plus yoga-counter summaries. Zero deps
  beyond stdlib. Exit 2 if nothing was captured (wiring broken).

Initial findings (1106-msg session, 6s PageUp hold at 30Hz):
  - Steady state: 10 fps; renderer phase p99=63ms, write p99=0.2ms
  - 4/107 heavy frames (>=16ms), all dominated by renderNodeToOutput
  - One pathological 97ms frame with yoga measuring 70,415 text cells
    and Yoga visiting 225k nodes — the cold-unmeasured-region hit
  - Ink's scroll fast-path (DECSTBM blit from prevScreen) is
    disqualified because our spacer-based virtual history doesn't
    keep heightDelta in sync with scroll.delta, so every PageUp step
    falls through to a full 2000-4800 patch re-render instead of ~40
2026-04-26 16:36:25 -05:00
Brooklyn Nicholson 69ff201050 feat(tui): anchor todo panel above streaming output 2026-04-26 16:26:50 -05:00
Brooklyn Nicholson 2259eac49e feat(tui): collapse completed todo panel on turn end 2026-04-26 16:24:15 -05:00
Brooklyn Nicholson cb7cfba6de fix(cli): surface last_active in search_sessions so -c works 2026-04-26 16:21:57 -05:00
Brooklyn Nicholson debae25f1c perf(tui): incremental markdown during streaming
Split in-flight assistant text at the last stable block boundary so only
the unclosed tail re-tokenizes per stream delta. Previously the full
text was rendered as plain <Text> during streaming and only flipped to
<Md> at message.complete — cheap per delta but loses live markdown
formatting.

New StreamingMd component holds a monotonically-growing stablePrefix
in a ref (idempotent under StrictMode double-render), renders it as
one <Md> that memoizes across deltas, and renders the unstable suffix
as a second <Md> that re-parses on each delta. Cost per delta drops
from O(total length) to O(unstable length).

findStableBoundary walks back to the last "\n\n" outside an open
fenced code block — splitting inside an open fence would orphan the
opener and break highlighting in the prefix.

Adapted from claude-code's src/components/Markdown.tsx:186 but built
on our line-based tokenizer instead of marked.lexer. 9 new tests cover
fence balance, boundary walk, and empty input.

Part of the --tui perf audit (see audit #7).
2026-04-26 16:21:34 -05:00
Brooklyn Nicholson bde89c169b fix(cli): -c picks the most recently used session 2026-04-26 16:17:39 -05:00
Brooklyn Nicholson b36007b246 feat(tui): allow collapsing archived todo panels 2026-04-26 16:15:59 -05:00
Brooklyn Nicholson c78b528125 feat(tui): archive todos at turn end with incomplete hint 2026-04-26 16:14:58 -05:00
Brooklyn Nicholson 319c1c1691 fix(tui): inline todo in transcript, group across thinking 2026-04-26 16:09:28 -05:00
Brooklyn Nicholson 4943ea2a7c fix(tui): merge tools into contextual shelves 2026-04-26 16:00:38 -05:00
Brooklyn Nicholson 4d3e3a738d chore(tui): sort imports 2026-04-26 15:56:47 -05:00
Brooklyn Nicholson a5319fb7af test(tui): cover live todo completion flow 2026-04-26 15:56:08 -05:00
Brooklyn Nicholson f5552f92e2 fix(tui): stabilize live todo progress 2026-04-26 15:55:38 -05:00
Brooklyn Nicholson 1566f1eecc fix(tui): report actual session on exit 2026-04-26 15:55:01 -05:00
Brooklyn Nicholson a30db69dd5 chore(tui): clean live progress lint 2026-04-26 15:42:07 -05:00
Brooklyn Nicholson f6846205cc fix(tui): isolate turn state from app render 2026-04-26 15:40:38 -05:00
Brooklyn Nicholson 6a3873942f fix(tui): format thinking paragraphs 2026-04-26 15:38:18 -05:00
Brooklyn Nicholson 64de685d3f test(tui): remove stale turn freeze experiment 2026-04-26 15:35:41 -05:00
Brooklyn Nicholson cee4036e8b fix(tui): merge tool shelves in transcript 2026-04-26 15:35:38 -05:00
Brooklyn Nicholson cf8439263a fix(tui): keep todo pinned outside transcript 2026-04-26 15:33:01 -05:00
Brooklyn Nicholson 3271ffbd80 fix(tui): pin todo panel above live output 2026-04-26 15:27:31 -05:00
Brooklyn Nicholson a7831b63db fix(tui): stabilize live progress rendering 2026-04-26 15:23:43 -05:00
Brooklyn Nicholson d4dde6b5f2 fix(tui): restore resumed transcript lineage 2026-04-26 15:16:12 -05:00
Teknium 755a280424 chore(release): map Wang-tianhao in AUTHOR_MAP 2026-04-26 13:02:51 -07:00
Wang-tianhao 6087e04043 fix(slack): extract rich_text quotes/lists and link unfurl previews
Slack's modern composer sends messages with a 'blocks' array that
contains rich_text elements. When a user forwards or quotes another
message, the quoted content shows up in the rich_text_quote children
of that array — and is NOT included in the plain 'text' field. The
agent saw only the lossy plain text and was blind to forwarded /
quoted content. Same story for link unfurl previews (Notion, docs,
GitHub, etc.) which Slack puts in the 'attachments' array.

Two fixes in the inbound handler:

1. _extract_text_from_slack_blocks walks rich_text / rich_text_quote /
   rich_text_list / rich_text_preformatted trees and renders readable
   text ('> quoted', '• bullet', code fences), dedupes against the
   plain text field, and appends the extracted content so the agent
   sees everything.

2. Link unfurl / attachment preview extraction reads title, url,
   body, and footer from the 'attachments' array and appends a
   '📎 [title](url)\n   body\n   _footer_' section per preview.
   Skips is_msg_unfurl to avoid echoing our own Slack replies back.

Routing is careful not to trust augmented text: mention gating
(is_mentioned) and slash-command detection both run against the
original 'text' field, so forwarded content containing '<@bot>' or
'/deploy' in a quote can't trick the bot into responding in a
channel it shouldn't or classifying a normal message as a command.

Adjustment from original PR: dropped _serialize_slack_blocks_for_agent,
which inlined a redacted JSON dump of non-rich_text blocks (section,
accessory, actions, etc.) — the agent would see the raw Block Kit
structure for UI-heavy alerts. It added up to 6000 characters to the
prompt context on every qualifying message with no opt-out. The
rich_text extraction and attachment unfurls cover the common bug-fix
case (quoted/forwarded content + link previews) without the prefill
tax. If a user needs block inspection later, it can return as a
config opt-in.

Also updates the Slack platform notes in session.py to accurately
describe what the gateway inlines.
2026-04-26 13:02:51 -07:00
Teknium 4921b26945 fix(cron): keep homeassistant toolset enabled when HASS_TOKEN is set (#16208)
After #14798 made cron honor per-platform `hermes tools` config, the
`_DEFAULT_OFF_TOOLSETS` filter silently stripped `homeassistant` from
cron jobs for users who'd been relying on the previous blanket toolset.
Norbert's HA cron reports regressed as a result.

The HA toolset is already runtime-gated by its `check_fn` (requires
HASS_TOKEN to register any tools). When HASS_TOKEN is set the user has
explicitly opted in — `_DEFAULT_OFF_TOOLSETS` adds nothing in that case,
so stop double-gating and restore HA for cron / cli / other platforms
without an explicit saved toolset list.

moa and rl stay off by default (original #14798 goal preserved).

Fixes HA cron regression reported by Norbert.
2026-04-26 12:55:58 -07:00
Teknium 822b507a72 chore(release): map maxims-oss in AUTHOR_MAP 2026-04-26 12:54:46 -07:00
maxims-oss 18beb69b49 fix(memory): close embedded Hindsight async client cleanly
HindsightEmbedded.close() delegates to its sync client.close(). When Hermes
created/used that client on the shared async loop, closing it from the main
thread raises 'attached to a different loop' before aiohttp releases the
session — so the ClientSession / TCPConnector leak past provider teardown.

Close the embedded inner async client on the shared loop first via
_run_sync(inner_client.aclose()), then let the wrapper's sync close()
do its daemon/UI bookkeeping.

Salvage of #14605: test placement rebased — appended TestShutdown class
after TestSharedEventLoopLifecycle (which landed on main after the PR was
written). Original author attribution preserved.
2026-04-26 12:54:46 -07:00
Tranquil-Flow bf05b8f4a2 fix(gateway): clean up cached agents on shutdown (#11205) 2026-04-26 12:51:53 -07:00
Zainan Victor Zhou 778fd1898e fix(slack): surface attachment access diagnostics
Translate Slack attachment failures into actionable user-facing notices
instead of generic download errors. When a scope/auth/permission issue
breaks attachment processing, the user sees:

  [Slack attachment notice]
  - Slack attachment access failed for photo.jpg. Missing scope:
    files:read. Update the Slack app scopes/settings and reinstall
    the app to the workspace.

Two helpers do the translation:

  _describe_slack_api_error — handles SlackApiError responses
    (missing_scope, invalid_auth, file_not_found, access_denied, etc.)

  _describe_slack_download_failure — handles httpx.HTTPStatusError
    (401/403/404) and Slack-returns-HTML-sign-in fallbacks

Wired into three existing call sites:
 - the Slack Connect files.info path (PR #11111) so scope errors
   surface instead of being logged as generic "files.info failed"
 - the image, audio, and document download paths so 401/403 and
   HTML-body responses translate into actionable notices

Adjustment from original PR: dropped _probe_slack_file_access_issue,
the proactive pre-download files.info probe. It added one extra
Slack API call per attachment even on healthy ones, and overlapped
with the existing files.info call from PR #11111. The post-failure
translation path covers the same user-facing diagnostic value
without the per-message tax.

Also documents files:read scope more prominently in the Slack setup
guide and troubleshooting table.

Contributed back from https://github.com/xinbenlv/zn-hermes-agent.

Closes #7015.
Co-authored-by: xinbenlv <zzn+pa@zzn.im>
2026-04-26 12:47:43 -07:00
Teknium 45bfcb9e71 test: update bare-agent helper for live-runtime attrs added by #16099
Background review fork now inherits session_id, credential_pool, and
status_callback from the parent (added in #16099 after this PR was
written). Extend the bare-agent helper so the regression test keeps
reaching the cleanup assertions instead of failing in the runtime
resolver.

Signed-off-by: Teknium <8425893+teknium1@users.noreply.github.com>
2026-04-26 12:45:39 -07:00
MRHwick aa7b5acfcd pass attribution check 2026-04-26 12:45:39 -07:00
MRHwick 36e352afa7 preserve the original comment 2026-04-26 12:45:39 -07:00
MRHwick 2d86e97a7e fix(run_agent): shut down background review memory providers
Temporary background review agents can initialize Hindsight-backed memory clients, but close() alone skips provider teardown. Shut the memory provider down before closing so aiohttp sessions do not leak at process exit.

Made-with: Cursor
2026-04-26 12:45:39 -07:00
Teknium edadeaf495 chore(release): map Satoshi-agi and kunlabs in AUTHOR_MAP 2026-04-26 12:35:16 -07:00
kunlabs f9885130b4 fix(slack): download files in Slack Connect channels
Slack Connect channels return file objects with file_access="check_file_info"
and no url_private_download field (see
https://docs.slack.dev/reference/objects/file-object/#slack_connect_files).
These stub objects must be resolved via files.info before download can
proceed. Without this the agent silently skips attachments posted in
Slack Connect channels.

Call files.info on every file whose file_access is check_file_info,
replace the stub with the full file object, and let the existing
download path continue. Warn and skip on files.info failures.

Closes #11095.
2026-04-26 12:35:16 -07:00
flobo3 f414df3a56 fix(slack): include team_id in thread-context cache key 2026-04-26 12:35:16 -07:00
Satoshi-agi c0d25df311 fix(slack): preserve thread-parent context when cron/bot posted the parent
The Slack thread-context fetcher used to drop every message with a
bot_id, which silently erased the thread parent whenever a cron job (or
any other bot) had posted it. As a result, replies to a cron-posted
summary lost all context and the agent answered as if from a blank
thread.

Changes:

1. gateway/platforms/slack.py::_fetch_thread_context
   - Keep the thread parent even when it was posted by a bot
     (e.g. cron summaries, third-party integrations).
   - Only skip *our own* prior bot replies to avoid circular context,
     matching the per-workspace bot user id via _team_bot_user_ids so
     multi-workspace deployments stay correct.
   - Keep non-self bot children (useful third-party context).

2. gateway/platforms/slack.py::_handle_slack_message
   - Populate MessageEvent.reply_to_text for thread replies (parity
     with Telegram/Discord/Feishu/WeCom). gateway.run uses this field
     to inject a [Replying to: "..."] prefix when the parent is not
     already in the session history, which is exactly the scenario
     triggered by cron-generated thread parents.
   - New helper _fetch_thread_parent_text reuses the existing thread-
     context cache (and its 60s TTL) to avoid duplicate
     conversations.replies calls; falls back to a cheap limit=1 fetch
     when the cache is cold.

Tests:

- Updated TestSlackThreadContext::test_skips_bot_messages to reflect
  the new behaviour (self-bot child dropped, third-party bot kept).
- Added:
    * test_fetch_thread_context_includes_bot_parent
    * test_fetch_thread_context_excludes_self_bot_replies
    * test_fetch_thread_context_multi_workspace
    * test_fetch_thread_context_current_ts_excluded (regression guard)
    * test_fetch_thread_parent_text_from_cache
    * test_slack_reply_to_text_set_on_thread_reply
    * test_slack_reply_to_text_none_for_top_level_message

Full Slack suite: 176 passed (was 169).
2026-04-26 12:35:16 -07:00
helix4u 10e36188da fix(cli): wire approvals in background tasks 2026-04-26 12:29:48 -07:00
Teknium 6a3102f9d4 chore(release): map hhuang91 in AUTHOR_MAP 2026-04-26 12:29:02 -07:00
bde3249023 75d3eaa0e4 fix(slack): exclude U/W user IDs from explicit target regex
Slack's chat.postMessage API rejects user IDs (U...) and workspace
IDs (W...) — they are not valid conversation IDs. Posting to them
fails because the API requires a channel ID (C/G/D). To DM a user,
the sender must first call conversations.open to obtain a D... ID.

Tighten _SLACK_TARGET_RE from [CGDUW] to [CGD] so the send path rejects
U/W values as explicit targets and instead falls through to channel-
name resolution (where they'll fail with a clear 'could not resolve'
error rather than silently getting stuck in a retry loop on the API).

Flip the corresponding regression test to assert U/W values are not
explicit. Matches the narrower regex briandevans proposed in #15939.

Co-authored-by: briandevans <brian@bde.io>
2026-04-26 12:29:02 -07:00
hhuang91 802c7acb81 fix(Slack): resolve Slack channels by raw ID and enumerate joined channels
send_message(target='slack:<channel_id>') failed with "Could not
resolve" because _parse_target_ref had no Slack branch — Slack's
uppercase alphanumeric IDs fell through to channel-name resolution,
which only matched by name. As a fallback, the agent would retry with
bare target='slack' and post to the home channel instead.

Three fixes:

- _parse_target_ref recognizes Slack IDs (C/G/D/U/W prefix) as
  explicit targets so the name-resolver is bypassed entirely.
- resolve_channel_name tries a case-sensitive raw-ID match before
  the existing name match, so any platform's IDs resolve cleanly.
- _build_slack now actually calls users.conversations against each
  workspace's AsyncWebClient (paginated), instead of only returning
  session-history entries. This populates the directory with public
  and private channels the bot has joined, so action='list' shows
  them and they can also be addressed by name. Errors from one
  workspace don't block others.

build_channel_directory becomes async (Slack web calls require it).
The two async-context callers in gateway/run.py are awaited; the
cron ticker thread call bridges via asyncio.run_coroutine_threadsafe.

Slack bot needs channels:read and groups:read scopes for full
enumeration; missing scopes degrade gracefully per-workspace.

addressing #15927
2026-04-26 12:29:02 -07:00
Teknium 541cd732e8 chore(models): drop deepseek from OpenRouter and Nous Portal curated picker lists (#16197)
Removes deepseek/deepseek-v4-pro and deepseek/deepseek-v4-flash from
OPENROUTER_MODELS and _PROVIDER_MODELS['nous'], then regenerates
website/static/api/model-catalog.json so the hosted picker JSON drops
them too. Direct-API deepseek provider support is unchanged.
2026-04-26 12:28:17 -07:00
Teknium 4d119bb62a test: blank platform-gating env vars in hermetic fixture
load_gateway_config() has a side effect: when config.yaml contains
platform-gating keys (slack.require_mention, slack.strict_mention,
slack.free_response_channels, slack.allow_bots, slack.reactions, plus
analogous keys for discord/telegram/whatsapp/dingtalk/matrix), it calls
os.environ[KEY] = ... to bridge them to env-var form.

monkeypatch.delenv doesn't track direct os.environ mutations made
inside the test body, so tests that call load_gateway_config() leak
those env vars into later tests on the same xdist worker. The failure
mode is flaky seed-dependent: test_top_level_message_requires_mention_
even_with_session (and siblings in TestThreadReplyHandling) pass when
SLACK_REQUIRE_MENTION is unset but fail when a leaked value of 'false'
is present.

Add the gating env vars to _HERMES_BEHAVIORAL_VARS so the hermetic
autouse fixture blanks them on every test setup, closing the leak
regardless of which test sets them.
2026-04-26 12:23:20 -07:00
Teknium 878c196738 chore(release): map hhhonzik in AUTHOR_MAP 2026-04-26 12:23:20 -07:00
Honza Stepanovsky 50dd67c680 fix(slack): skip _mentioned_threads registration when strict_mention is on
Extends the strict_mention feature so an @mention in strict mode no
longer persistently tags the thread as 'mentioned'. Without this, the
thread's first mention would permanently auto-trigger the bot on every
subsequent message — which is exactly what strict_mention is designed
to prevent. Closes the agent-to-agent ack loop hole hhhonzik identified
in #14117.

Co-authored-by: hhhonzik <me@janstepanovsky.cz>
2026-04-26 12:23:20 -07:00
Ching aea4a90f0e feat(slack): add opt-in slack.strict_mention gate for channel threads
Adds a strict_mention config option that, when enabled, requires an
explicit @-mention on every message in channel threads. Disables the
'once mentioned, forever in the thread' and session-presence auto-triggers.

- New _slack_strict_mention() helper (config.extra + SLACK_STRICT_MENTION env)
- Bridged top-level slack.strict_mention yaml to SLACK_STRICT_MENTION env,
  matching require_mention/allow_bots bridging
- Unit tests for the helper + config bridge
2026-04-26 12:23:20 -07:00
Teknium 897dc3a2bb fix(install+update): add /usr/local/bin PATH guard for RHEL root non-login shells (#16191)
* fix(install): add /usr/local/bin PATH guard for RHEL root non-login shells

The FHS-layout branch assumed /usr/local/bin is on PATH for every
standard shell. That holds for login shells (via /etc/profile's
pathmunge) but breaks on RHEL/CentOS/Rocky/Alma 8+ root in non-login
interactive shells (su, sudo -s, tmux panes, some web terminals) —
/etc/bashrc does not add /usr/local/bin and /root/.bash_profile
doesn't either. Result: hermes command links to /usr/local/bin/hermes
but the user has to type the absolute path each time.

Probe a fresh 'bash -i -c' (non-login interactive, matching the user
scenario) after symlinking. If hermes isn't resolvable, append an
idempotent PATH guard to /root/.bashrc and /root/.bash_profile, same
grep pattern already used by the ~/.local/bin branch below. No change
on distros where /usr/local/bin is already inherited.

* fix(update): repair RHEL root PATH on hermes update

Existing RHEL/CentOS/Rocky/Alma root installs won't be repaired by the
install.sh fix alone because 'hermes update' is an in-place git pull, not
a rerun of install.sh. Port the same probe + idempotent .bashrc write
into cmd_update so affected users get fixed automatically on next update.

_ensure_fhs_path_guard() runs after 'Update complete!':
- Linux + root + FHS-layout install (command at /usr/local/bin/hermes) only
- Probe: env -i bash -i -c 'command -v hermes' — fresh non-login interactive
  shell, same scenario the user reports
- On failure, append PATH guard to /root/.bashrc and /root/.bash_profile,
  skipping if any uncommented PATH line already mentions /usr/local/bin
- Silent no-op on macOS, non-root, legacy layout, or shells that already
  resolve hermes
2026-04-26 12:22:37 -07:00
Brooklyn Nicholson 350ee1bf23 refactor(tui): render progress in ordered stream timeline 2026-04-26 14:12:43 -05:00
Brooklyn Nicholson 3d21f97422 fix(tui): keep live tool state before stream segments 2026-04-26 14:06:42 -05:00
Teknium 4b5a88d714 fix(slack): honor reply_in_thread=false for top-level channel messages
Top-level channel messages arrive at _resolve_thread_ts with
metadata.thread_id set to the message's own ts, because the inbound
handler in _handle_message_event uses 'event.ts' as a session-keying
fallback when event.thread_ts is absent. That made metadata alone
insufficient to distinguish a real thread reply from a top-level
message, so reply_in_thread=false only took effect in DMs.

Use reply_to (== incoming message_id == ts for top-level messages) as
the tiebreaker: when metadata.thread_id == reply_to the 'thread' is the
synthetic session-keying fallback, not a real parent, so we reply
directly in the channel. Real thread replies (reply_to != thread_id)
still resolve to the parent thread and preserve conversation context.

Closes #9268.
2026-04-26 12:04:46 -07:00
bde3249023 b1be86ef96 fix(gateway): bridge slack.reply_in_thread config 2026-04-26 12:04:46 -07:00
Brooklyn Nicholson 7b5b524fc7 refactor(tui): clean thinking and viewport helpers 2026-04-26 14:03:36 -05:00
Brooklyn Nicholson a30ffbe1d4 fix(tui): show queued prompts when drained 2026-04-26 14:01:14 -05:00
Brooklyn Nicholson c9f7b703dd fix(tui): filter thinking status noise 2026-04-26 13:59:56 -05:00
Brooklyn Nicholson a8bfe72d35 fix(tui): address latest review feedback 2026-04-26 13:56:26 -05:00
Teknium ae7687cdc5 chore(release): map zhiyanliu in AUTHOR_MAP 2026-04-26 11:56:23 -07:00
sgaofen c730f6cc0b test(gateway): cover Slack vs non-Slack home-channel onboarding hint
Parameterize the test helpers in test_status_command.py to accept a
Platform and add two regression tests ensuring the first-run home-channel
onboarding uses '/hermes sethome' on Slack and '/sethome' everywhere else.

Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com>
2026-04-26 11:56:23 -07:00
Zhi Yan Liu d993a3f450 fix(gateway): use /hermes sethome in onboarding hint on Slack
Slack's adapter registers a single parent slash command /hermes and
dispatches subcommands via slack_subcommand_map(). Bare /sethome is
not a registered command on Slack and fails with 'app did not
respond', logging 'Unhandled request' in slack_bolt.AsyncApp.

Show /hermes sethome in the first-run onboarding hint when the
source platform is Slack; keep /sethome for Telegram, Discord,
Matrix, Mattermost, and other platforms that register it directly.

Fixes #14632
2026-04-26 11:56:23 -07:00
Teknium 1dfcc2ffc3 fix(gateway): /queue is now a true FIFO — each invocation gets its own turn (#16175)
Repeated /queue commands now each produce a full agent turn, in order,
with no merging.  Previously the second /queue overwrote the first
because the handler wrote directly into the adapter's single-slot
_pending_messages dict.

- GatewayRunner grows a _queued_events overflow buffer (dict of list).
- /queue puts new items in the adapter's next-up slot when free,
  otherwise appends to the overflow.  After each run's drain consumes
  the slot, the next overflow item is promoted so the recursive run
  picks it up.
- /new and /reset clear the overflow.
- /status now reports queue depth when non-zero.
- Ack message shows the depth once it exceeds 1.

Helpers (_enqueue_fifo, _promote_queued_event, _queue_depth) use the
getattr default-fallback pattern so existing tests that build bare
GatewayRunner instances via object.__new__ keep working.
2026-04-26 11:55:09 -07:00
Teknium 5b2c59559a feat(terminal): collapse subagent task_ids to shared container (#16177)
Before: delegate_task children each allocated their own terminal
sandbox keyed by child task_id. Starting extra containers (or Modal
sandboxes / Daytona workspaces) is expensive, and the subagent's work
is invisible to the parent — files written by the child in its
container don't exist in the parent's when the subagent returns.

After: a single `_resolve_container_task_id` helper maps any
tool-call task_id to "default" UNLESS an env override is registered
for it. The parent agent and all delegate_task children therefore
share one long-lived sandbox — installed packages, cwd, /workspace
files, and /tmp scratch carry over freely between them.

RL and benchmark environments (TerminalBench2, HermesSweEnv, ...)
opt in to isolation via `register_task_env_overrides(task_id, {...})`;
those task_ids survive the collapse and get their own sandbox,
preserving the per-task Docker image behavior these benchmarks rely on.

file_state / active-subagents registry / TUI events still key off the
original child task_id, so the 'subagent wrote a file the parent read'
warning and UI per-subagent panels keep working.

Tradeoff: parallel delegate_task children (tasks=[...]) now share one
bash/container. Concurrent cd, env-var mutations, and writes to the
same path will collide. If that bites a specific workflow, the
subagent can opt back into isolation via register_task_env_overrides.

Applied at four lookup sites:
- tools/terminal_tool.py terminal_tool() and get_active_env()
- tools/file_tools.py _get_file_ops() and _get_live_tracking_cwd()
- tools/code_execution_tool.py _get_or_create_environment()

Docs: website/docs/user-guide/configuration.md updated to reflect the
shared-container reality and document the RL/benchmark carve-out.
Tests: tests/tools/test_shared_container_task_id.py (9 cases).
2026-04-26 11:55:02 -07:00
Brooklyn Nicholson 2be5e181a9 fix(tui): keep thinking color theme-neutral 2026-04-26 13:54:12 -05:00
Brooklyn Nicholson 015f6c825d fix(tui): support modified enter for multiline input 2026-04-26 13:52:54 -05:00
Brooklyn Nicholson bb59d3bac2 fix(tui): preserve completed thinking panel 2026-04-26 13:49:41 -05:00
Brooklyn Nicholson 4a21920b5e fix(tui): address copilot review nits 2026-04-26 13:43:08 -05:00
Brooklyn Nicholson cc16d0ef77 Merge remote-tracking branch 'origin/main' into bb/tui-long-session-perf
# Conflicts:
#	ui-tui/src/app/interfaces.ts
2026-04-26 13:39:57 -05:00
Teknium 087e74d4d7 feat(slack): register every gateway command as a native slash (Discord/Telegram parity) (#16164)
Every command in COMMAND_REGISTRY (/btw, /stop, /model, /help, /new,
/bg, /reset, ...) is now a first-class Slack slash command instead of
a /hermes <subcommand>. Users get the same autocomplete-driven slash
picker experience Slack users expect and that Discord and Telegram
already provide.

Previously Slack registered ONE native slash (/hermes) and split on
the first word, so typing /btw in Slack's composer got 'couldn't find
an app for /btw' because the workspace manifest never declared it.

Changes
- hermes_cli/commands.py: slack_native_slashes() + slack_app_manifest()
  generate a Slack manifest from the registry (canonical names +
  aliases + plugin commands), clamped to Slack's 50-slash cap with
  /hermes reserved as the catch-all.
- gateway/platforms/slack.py: single regex matcher dispatches every
  registered slash to _handle_slash_command, which dispatches on
  command['command']. Legacy /hermes <subcommand> keeps working for
  backward compat with older workspace manifests.
- hermes_cli/slack_cli.py + hermes_cli/main.py: new 'hermes slack
  manifest' command prints/writes a full manifest (display info,
  OAuth scopes, event subs, socket mode, slash commands) ready to
  paste into 'Create from manifest' or Features → App Manifest.
- hermes_cli/setup.py: _setup_slack() now writes the manifest up-front
  and points users at the 'From an app manifest' flow; also offers
  to refresh the manifest on reconfigure for picking up new commands.
- Tests: 14 new tests covering native-slash dispatch (/btw, /stop,
  /model), legacy /hermes <sub> compat, manifest structure, and
  telegram<->slack parity (every Telegram command must also register
  as a Slack slash). Existing /hermes-registration test updated to
  assert the new regex matches /hermes, /btw, /stop, /model, /help.
- Docs: slack.md gains a 'Slash Commands' section + Option A manifest
  flow in Step 1; cli-commands.md documents 'hermes slack manifest'.

Users pick up the new slashes by running 'hermes slack manifest --write'
and pasting into Features → App Manifest → Edit in their Slack app
config, then Save (Slack prompts for reinstall if scopes changed).
2026-04-26 11:38:32 -07:00
Brooklyn Nicholson a8fcd1c742 fix(tui): apply details mode live 2026-04-26 13:34:33 -05:00
Teknium 9be83728a6 docs(docker-backend): clarify container is shared across sessions, not per-session (#16158)
The Docker terminal-backend docs said 'each session starts a long-lived
container', implying a fresh container per chat session. That hasn't been
true for a while: for the top-level agent, task_id defaults to 'default'
and the container is cached in _active_environments for the lifetime of
the Hermes process. /new, /reset, and switching sessions all reuse the
same container. Only delegate_task subagents and RL rollouts get isolated
containers keyed by their own task_id.
2026-04-26 10:46:08 -07:00
Teknium 9397767513 chore(skills): remove empty feeds category (#16153)
skills/feeds/ only contained a category-marker DESCRIPTION.md with no
actual skills in it. Removing the directory and the 'feeds' -> 'Feeds'
display-label mapping in website/scripts/extract-skills.py (the only
other reference in the repo).
2026-04-26 10:44:56 -07:00
Teknium 9662e3218a fix(tui): call maybe_auto_title for TUI sessions (#15949) (#16151)
* fix(tui): call maybe_auto_title for TUI sessions (#15961)

The maybe_auto_title() helper is called from cli.py and gateway/run.py
but was never wired into tui_gateway/server.py, so every session started
via 'hermes --tui' landed in state.db with an empty title. Evidence from
the issue reporter: 0/154 TUI sessions titled vs 91/383 CLI.

Mirror the CLI/Gateway pattern: after emitting message.complete, when the
turn finished cleanly, fire-and-forget title generation using the session
key, user prompt, agent response, and current history.

Fixes #15949.

Co-authored-by: math0r-be <math0r-be@github.com>

* chore(release): map math0r-be placeholder email in AUTHOR_MAP

---------

Co-authored-by: math0r-be <math0r-be@github.com>
2026-04-26 10:44:22 -07:00
Teknium 0824ba6a9d fix(/branch): redirect session_log_file and expose branch sessions in list (#14854) (#16150)
* fix(/branch): redirect session_log_file and expose branch sessions in list

Two bugs when using /branch:

1. cli.py _handle_branch_command updated agent.session_id but not
   agent.session_log_file, so all messages written after branching
   landed in the original session's JSON file and the branch never
   got its own session_{id}.json on disk.

   Fix: mirror the compression-split path (run_agent.py:7579) and
   update session_log_file immediately after changing session_id.

2. hermes_state.py list_sessions_rich filtered out every session
   with parent_session_id IS NOT NULL to hide sub-agent runs and
   compression continuations. Branch sessions share this column, so
   they became invisible to `hermes sessions list` and `sessions browse`.

   Fix: also include branch children — those whose parent ended with
   end_reason='branched' AND whose started_at >= parent.ended_at
   (the same timing condition that get_compression_tip uses to
   distinguish continuations from live-spawned subagents).

Fixes #14854

Co-Authored-By: Octopus <liyuan851277048@icloud.com>

* chore(release): map octo-patch placeholder email in AUTHOR_MAP

---------

Co-authored-by: octo-patch <octo-patch@github.com>
Co-authored-by: Octopus <liyuan851277048@icloud.com>
2026-04-26 10:28:19 -07:00
Teknium 42c076d349 feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.

Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.

Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.

Feature is on by default. Opt out via:
  browser:
    auto_local_for_private_urls: false

The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
Teknium 0e2a53eab2 feat(skills): show enabled/disabled status in 'skills list' (#16129)
'hermes skills list' now shows every skill's enabled/disabled status
and accepts --enabled-only to filter down to what will actually load
for the active profile:

    hermes -p dario skills list --enabled-only

Previously the command was a flat catalog — it did not apply
skills.disabled from config.yaml, so there was no way to see the
live skill set for a profile without reading config by hand.
Profile switching already works via -p (swaps HERMES_HOME); this
just surfaces the result visibly.

Changes:
- hermes_cli/skills_hub.py: do_list adds a Status column and an
  enabled_only filter; summary reports enabled/disabled split
- hermes_cli/main.py: --enabled-only flag on 'skills list'
- /skills list slash command accepts --enabled-only too
- tests: 4 new (status column, disabled marking, enabled-only
  hiding, no platform leakage into get_disabled_skill_names);
  existing fixtures updated to accept skip_disabled kwarg

Reported by @mochizukimr on X.
2026-04-26 09:20:53 -07:00
Brooklyn Nicholson 6814646b36 fix(tui): avoid duplicating flushed stream text 2026-04-26 10:58:18 -05:00
Teknium eaa7e2db67 feat(cli,tui): surface /queue, /bg, /steer in agent-running placeholder (#16118)
* feat(cli,tui): surface /queue, /bg, /steer in agent-running placeholder

While the agent loop is running, the input placeholder previously only
hinted at Enter-to-interrupt. Surface the full set of busy-time actions
(interrupt via new message, /queue, /bg, /steer) so users discover them
without hunting through docs or Teknium's tweets.

- cli.py: "msg=interrupt · /queue · /bg · /steer · Ctrl+C cancel"
- ui-tui/src/components/appLayout.tsx: same string (was "Ctrl+C to interrupt…")

* revert tui placeholder change (cli-only per review)
2026-04-26 08:50:30 -07:00
briandevans 4e356098d2 fixup! fix(gateway): preserve inactivity clock on interrupt-recursive cached-agent turns (#15654)
Address Copilot review findings:

1. Gate _last_activity_desc on interrupt_depth == 0 alongside _last_activity_ts.
   Both fields are semantically paired — desc describes the activity *at* ts.
   Updating desc without ts made get_activity_summary() report "starting new
   turn (cached)" for 20+ minutes while the timestamp showed the true stale
   duration, producing misleading diagnostic output.

2. Monkeypatch gateway.run.time.time to a fixed epoch in tests that assert
   on _last_activity_ts values.  Real time.time() comparisons were latently
   flaky under slow CI or NTP adjustments.  _FAKE_NOW = 10_000.0 is used
   as the reference; assertions are now exact equality rather than >=.

3. Add test_fresh_turn_resets_desc and test_interrupt_turn_preserves_desc to
   directly cover the gated desc behaviour introduced by (1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 08:45:44 -07:00
briandevans de24315978 fix(gateway): preserve inactivity clock on interrupt-recursive cached-agent turns (#15654)
_last_activity_ts was unconditionally reset to time.time() on every
_agent_cache hit.  For interrupt-recursive _run_agent calls
(_interrupt_depth > 0) this silently reset the inactivity watchdog's
idle clock on each re-entry, preventing the 30-min timeout from ever
firing when a turn got stuck in an interrupt loop.  A stuck session
would emit "Still working... iteration 0/60, starting new turn (cached)"
heartbeats indefinitely instead of timing out.

Gate the reset on _interrupt_depth == 0 only.  Fresh external turns
still receive the reset so a session idle for 29 min doesn't trip the
watchdog before the new turn makes its first API call (#9051).

The per-turn reset logic is extracted into a static helper
_init_cached_agent_for_turn() to make it directly testable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 08:45:44 -07:00
Teknium 20cb706e03 chore: extend [SYSTEM:→[IMPORTANT: rename + AUTHOR_MAP
Follow-up to #6616 covering the remaining user-injected prompt markers that
the original PR did not touch (reporter's second comment on #6576 explicitly
flagged these). Azure OpenAI Default/DefaultV2 content filters treat any
bracketed [SYSTEM: ...] as prompt-injection and reject with HTTP 400.

Remaining call sites renamed:
- cli.py: background-process notifications (watch_disabled, watch_match,
  completion), MCP reload notice (4 live + 1 docstring)
- gateway/run.py: same notification paths + auto-loaded skill banner +
  MCP reload notice (5 live + 1 docstring)
- tools/process_registry.py: comment reference

Not renamed:
- environments/hermes_base_env.py '[SYSTEM]\n{content}' — RL training
  trajectory rendering only, never sent to Azure, part of a symmetric
  [USER]/[ASSISTANT]/[TOOL] scheme.

AUTHOR_MAP: buraysandro9@gmail.com -> ygd58.
2026-04-26 08:44:58 -07:00
ygd58 d7a3468246 fix(prompts): replace [SYSTEM: with [IMPORTANT: to avoid Azure content filter
Azure OpenAI content filters (Default/DefaultV2) treat bracketed
[SYSTEM: ...] meta-instructions as prompt-injection attempts and
reject requests with HTTP 400.

Replacing [SYSTEM: with [IMPORTANT: preserves the same semantic
meaning for the model while bypassing the Azure heuristic.

Fixes #6576
2026-04-26 08:44:58 -07:00
Teknium f2d655529a fix(auth): hoist get_env_value import + strengthen .env fallback tests
Follow-up to cherry-picked PR #15920:

- agent/credential_pool.py: hoist 'from hermes_cli.config import get_env_value'
  to module top instead of inline try/except in each seed site (3 sites).
  No import cycle — hermes_cli/config.py doesn't depend on agent.credential_pool.
- hermes_cli/auth.py: same hoist for the _resolve_api_key_provider_secret loop.
- tests/tools/test_credential_pool_env_fallback.py: replace smoke-only tests
  with real .env file I/O. Each test writes a temp ~/.hermes/.env, verifies
  _seed_from_env / _resolve_api_key_provider_secret read from it, and asserts
  the full priority chain: os.environ > .env > credential_pool. Uses
  'deepseek' as the test provider since 'openai' isn't in PROVIDER_REGISTRY
  and _seed_from_env's generic path requires a real pconfig lookup.
2026-04-26 08:32:09 -07:00
阿泥豆 27f4dba5ce test: add unit tests for credential pool env fallback 2026-04-26 08:32:09 -07:00
阿泥豆 8443998dc3 fix(auth): resolve API keys from ~/.hermes/.env and credential_pool
_resolve_api_key_provider_secret() and _seed_from_env() only checked
os.environ for provider API keys. When keys exist in ~/.hermes/.env but
are not loaded into the process environment (e.g. ACP adapter entry
point, post-session-start .env edits, or non-CLI entry points), the
resolution returns an empty string, causing HTTP 401 failures.

Changes:
- credential_pool._seed_from_env: use get_env_value() which checks both
  os.environ and ~/.hermes/.env file, preventing _prune_stale_seeded_entries
  from removing valid entries whose env var isn't in os.environ
- credential_pool._seed_from_env: same fix for openrouter and
  base_url_env_var resolution
- auth._resolve_api_key_provider_secret: use get_env_value() instead of
  os.getenv(), and add credential_pool fallback when env resolution fails

Fixes #15914
2026-04-26 08:32:09 -07:00
Teknium e3901d5b25 fix(run_agent): background review fork inherits parent's live runtime (#16099)
The background memory/skill review (_spawn_background_review) has always
forked a new AIAgent passing only model and provider, then relied on
AIAgent.__init__ to re-resolve credentials from env vars. This works for
users with keys in ~/.hermes/.env but silently falls back to env-var
auto-resolution in all cases, which fails for OAuth-only providers,
session-scoped creds, and credential-pool setups where auth can't be
reconstructed from env.

This used to be invisible -- failures were swallowed via logger.debug().
PR 8a2506af4 (Apr 24) surfaced auxiliary failures to the user, which
made the stale bug visible as:
    "Auxiliary background review failed: No LLM provider configured"

Fix: pass api_key, base_url, api_mode, and credential_pool from the
parent's live runtime into the fork -- matching how every other
auxiliary path (compression, memory flush, vision, session search)
already inherits the parent's credentials via _current_main_runtime().
2026-04-26 08:29:40 -07:00
Teknium 06f81752ed Revert "feat(kanban): durable multi-profile collaboration board (#16081)" (#16098)
This reverts commit 15937a6b46.
2026-04-26 08:29:37 -07:00
Teknium 9ef1ae138a fix(docker): don't chown config.yaml after gosu drop (#15865) (#16096)
The chown/chmod block on config.yaml was added in b24d239ce to keep the
file readable by the hermes runtime user, but it sat in the post-gosu
'running as hermes' section of the entrypoint. That meant:

1. Default `docker run <image>` — container starts as root, entrypoint
   drops to hermes via gosu, then non-root hermes tries to chown the
   file to hermes. Works by coincidence because the file was just
   created by root during volume setup and gosu target == target owner.
2. `docker run -u $(id -u):$(id -g) <image>` (#15865) — container
   starts as the caller's UID. The root block is skipped entirely, we
   land in the hermes section as some arbitrary non-root user, and
   chown to 'hermes' fails with 'Operation not permitted'. Script
   aborts under `set -e`.

Move the chown/chmod into the root block (before the gosu exec) where
it actually has privilege, and guard with `2>/dev/null || true` so
rootless Podman (where even in-container root lacks host-side chown
rights) doesn't abort either.

Closes #15865
2026-04-26 08:27:39 -07:00
Teknium c5196f1fc2 chore(release): map focusflow.app.help@gmail.com to yes999zc
Salvage PR #15883 cherry-picked FocusFlow Dev's commit; release-notes
CI needs the AUTHOR_MAP entry to attribute to the PR author's GitHub
login rather than a placeholder.
2026-04-26 08:25:22 -07:00
FocusFlow Dev 63bf7a29b6 fix(run_agent): prevent reasoning_content regression in DeepSeek/Kimi tool-call replay
PR #15478 fixed missing reasoning_content for DeepSeek API but introduced
a regression: tool-call messages with genuine 'reasoning' field were
overwritten by empty-string fallback before promotion.

Re-order _copy_reasoning_content_for_api steps:
  1. Preserve explicit reasoning_content
  2. Promote 'reasoning' field (MOVED UP)
  3. DeepSeek/Kimi tool-call empty-string fallback (MOVED DOWN)
  4. Non-thinking provider cleanup

Fixes #15812, relates #15749, #15478.
2026-04-26 08:25:22 -07:00
Teknium 15937a6b46 feat(kanban): durable multi-profile collaboration board (#16081)
New `hermes kanban` CLI subcommand + `/kanban` slash command + skills for
worker and orchestrator profiles. SQLite-backed task board
(~/.hermes/kanban.db) shared across all profiles on the host. Zero
changes to run_agent.py, no new core tools, no tool-schema bloat.

Motivation: delegate_task is a function call — sync fork/join, anonymous
subagent, no resumability, no human-in-the-loop. Kanban is the durable
shape needed for research triage, scheduled ops, digital twins,
engineering pipelines, and fleet work. They coexist (workers may call
delegate_task internally).

What this adds
- hermes_cli/kanban_db.py — schema, CAS claim, dependency resolution,
  dispatcher, workspace resolution, worker-context builder.
- hermes_cli/kanban.py — 15-verb CLI surface and shared run_slash()
  entry point used by both CLI and gateway.
- skills/devops/kanban-worker — how a profile should work a claimed task.
- skills/devops/kanban-orchestrator — "you are a dispatcher, not a
  worker" template with anti-temptation rules.
- /kanban slash command wired into cli.py and gateway/run.py. Bypasses
  the running-agent guard (board writes don't touch agent state), so
  /kanban unblock can free a stuck worker mid-conversation.
- Design spec at docs/hermes-kanban-v1-spec.pdf — comparative analysis
  vs Cline Kanban, Paperclip, NanoClaw, Gemini Enterprise; 8 patterns;
  4 user stories; implementation plan; concurrency correctness.
- Docs: website/docs/user-guide/features/kanban.md, CLI reference
  updated, sidebar entry added.

Architecture highlights
- Three planes: control (user + gateway), state (board + dispatcher),
  execution (pool of profile processes).
- Every worker is a full OS process, spawned as `hermes -p <profile>`.
  No in-process subagent swarms — solves NanoClaw's SDK-lifecycle
  failure class.
- Atomic claim via SQLite CAS in a BEGIN IMMEDIATE transaction; stale
  claims reclaimed 15 min after their TTL expires.
- Tenant namespacing via one nullable column — one specialist fleet
  can serve many businesses with data isolation by workspace path.

Tests: 60 targeted tests (schema, CAS atomicity, dependency resolution,
dispatcher, workspace kinds, tenancy, CLI + slash surface). All pass
hermetic via scripts/run_tests.sh.
2026-04-26 08:24:26 -07:00
Teknium 454d883e69 refactor: drop persist_session plumbing + fix broken btw mid-turn bypass (#16075)
Follow-up to PR #16053 (/btw as /background alias). Cleans up the
plumbing added exclusively for the old ephemeral /btw handler and
repairs a broken btw bypass that landed between my refactor and this
follow-up.

run_agent.py:
- Remove persist_session kwarg, instance attr, and _persist_session
  short-circuit. Only /btw ever passed persist_session=False; with
  /btw gone the default (always persist) is the only behavior anyone
  ever wanted.

gateway/run.py:
- Remove the unreachable 'if _cmd_def_inner.name == "btw"' block
  (PR #16059). Canonical name for a /btw message is 'background' after
  alias resolution — the comparison could never be true, and it called
  _handle_btw_command which no longer exists. The /background branch
  above it already dispatches /btw correctly.

tests/gateway/test_running_agent_session_toggles.py:
- Fix test_btw_dispatches_mid_run to mock _handle_background_command
  (the real dispatch target for /btw) instead of the deleted
  _handle_btw_command.
2026-04-26 07:15:23 -07:00
Teknium 70f56e7605 fix(gateway): let /btw dispatch mid-turn instead of being rejected
/btw spawns a parallel ephemeral side-question task (self-guarded against
concurrent /btw on the same chat) — exactly like /background. But it was
missing from the running-agent bypass list in _handle_message(), so it
fell through to the catch-all and returned:

   Agent is running — /btw can't run mid-turn. Wait for the current
  response or /stop first.

That's the opposite of what /btw is for — asking a side question while
the main turn is still working. Add the bypass next to /background and a
regression test covering the mid-turn dispatch path.

Reported by @IuriiTiunov on Telegram.
2026-04-26 07:11:10 -07:00
Teknium 7fa70b6c87 refactor: /btw is now an alias for /background (#16053)
The ephemeral no-tools side-question variant of /btw confused users who
expected 'by-the-way' to mean 'run this off to the side with tools' —
they'd type /btw and get a toolless agent that couldn't do the work.
/bg worked because it was /background with full tools.

Collapse the two: /btw and /bg both alias to /background. One command,
one behavior, no more gotchas about which variant has tools.

Removed:
- _handle_btw_command in cli.py and gateway/run.py
- _run_btw_task + _active_btw_tasks state in gateway/run.py
- prompt.btw JSON-RPC method + btw.complete event in tui_gateway
- BtwStartResponse type + btw.complete case in ui-tui
- Standalone /btw slash tree registration in Discord
- Standalone btw CommandDef in hermes_cli/commands.py

Updated:
- background CommandDef aliases: (bg,) -> (bg, btw)
- TUI session.ts: local btw handler merged into background
- Docs and tips updated to describe /btw as a /background alias
2026-04-26 07:11:08 -07:00
Teknium 9a70260490 Revert "feat(onboarding): port first-touch hints to the TUI (#16054)" (#16062)
This reverts commit ffd2621039.
2026-04-26 06:31:37 -07:00
Teknium ffd2621039 feat(onboarding): port first-touch hints to the TUI (#16054)
PR #16046 added /busy and /verbose hints to the classic CLI and the
gateway runner but skipped the Ink TUI (and therefore the dashboard
/chat page, which embeds the TUI via PTY).  This extends the same
latch to the TUI with TUI-native wording.

The TUI's busy-input model is not the /busy knob from the CLI —
single Enter while busy auto-queues, double Enter on an empty line
interrupts.  The new busy-input hint teaches THAT gesture instead of
telling the user to flip a config that does not apply.

Changes:
- agent/onboarding.py — add busy_input_hint_tui() + tool_progress_hint_tui()
- tui_gateway/server.py — onboarding.claim JSON-RPC (Ink triggers busy
  hint on enqueue) + _maybe_emit_onboarding_hint helper hooked into
  _on_tool_complete for the 30s/tool_progress=all path.  Same
  config.yaml latch so each hint fires at most once per install across
  CLI, gateway, and TUI combined.
- ui-tui/src/gatewayTypes.ts — OnboardingClaimResponse + onboarding.hint event
- ui-tui/src/app/createGatewayEventHandler.ts — render the hint event as sys()
- ui-tui/src/app/useSubmission.ts — claim busy_input_prompt on first
  busy enqueue
- tests/agent/test_onboarding.py — +3 cases for TUI hint shape
- tests/tui_gateway/test_protocol.py — +4 cases for onboarding.claim
- website/docs/user-guide/tui.md — new 'Interrupting and queueing'
  section explaining the TUI's double-Enter model and the hints

Validation:
scripts/run_tests.sh tests/agent/test_onboarding.py \
  tests/tui_gateway/test_protocol.py \
  tests/gateway/test_busy_session_ack.py
  -> 66 passed
npm --prefix ui-tui run type-check -> clean
npm --prefix ui-tui run lint       -> clean
npm --prefix ui-tui run build      -> clean
2026-04-26 06:24:19 -07:00
Teknium 1e37ddc929 feat(cli): add 'hermes fallback' command to manage fallback providers (#16052)
Manage the fallback_providers chain from the CLI instead of hand-editing
config.yaml. The picker reuses select_provider_and_model() from 'hermes
model' — same provider list, same credential prompts, same model picker.

  hermes fallback [list]   Show the current chain (primary + fallbacks)
  hermes fallback add      Run the model picker, append selection to chain
  hermes fallback remove   Pick an entry to delete (arrow-key menu)
  hermes fallback clear    Remove all entries (with confirmation)

'add' snapshots config['model'] before calling the picker, extracts the
user's selection from the post-picker state, then restores the primary
and appends {provider, model, base_url?, api_mode?} to fallback_providers.
Auth store's active_provider is snapshot/restored too so OAuth-provider
fallbacks don't silently deactivate the user's primary. Duplicates and
self-as-fallback are rejected. Legacy single-dict 'fallback_model' entries
are auto-migrated to the list format on first write.
2026-04-26 06:19:04 -07:00
Teknium 83c1c201f6 feat(onboarding): contextual first-touch hints for /busy and /verbose (#16046)
Instead of a blocking first-run questionnaire, show a one-time hint the first
time the user hits each behavior fork:

1. First message while the agent is working — appends a hint to the busy-ack
   explaining the /busy queue vs /busy interrupt knob, phrased to match the
   mode that was just applied (don't tell a queue-mode user to switch to
   queue).

2. First tool that runs for >= 30s in the noisiest progress mode
   (tool_progress: all) — prints a hint about /verbose to cycle display
   modes (all -> new -> off -> verbose). Gated on /verbose actually being
   usable on the surface: always shown on CLI; on gateway only shown when
   display.tool_progress_command is enabled.

Each hint is latched in config.yaml under onboarding.seen.<flag>, so it
fires exactly once per install across CLI, gateway, and cron, then never
again. Users can wipe the section to re-see hints.

New:
- agent/onboarding.py — is_seen / mark_seen / hint strings, shared by
  both CLI and gateway.
- onboarding.seen in DEFAULT_CONFIG (hermes_cli/config.py) and in
  load_cli_config defaults (cli.py). No _config_version bump — deep
  merge handles new keys.

Wired:
- gateway/run.py: _handle_active_session_busy_message appends the hint
  after building the ack.  progress_callback tracks tool.completed
  duration and queues the tool-progress hint into the progress bubble.
- cli.py: CLI input loop appends the busy-input hint on the first busy
  Enter; _on_tool_progress appends the tool-progress hint on the first
  >=30s tool completion.  In-memory CLI_CONFIG is also updated so
  subsequent fires in the same process are suppressed immediately.

All writes go through atomic_yaml_write and are wrapped in try/except
so onboarding can never break the input/busy-ack paths.
2026-04-26 06:06:27 -07:00
Teknium 4bda9dcade fix(gateway): honor voice.auto_tts config in auto-TTS gate (#16007) (#16039)
The base adapter's auto-TTS path fired on any voice message unless the
chat had explicitly run /voice off — it never read voice.auto_tts from
config.yaml, so users who set auto_tts: false still got audio replies.

Gate the base adapter on a three-layer decision instead:
  1. chat in _auto_tts_enabled_chats (explicit /voice on|tts) → fire
  2. chat in _auto_tts_disabled_chats (explicit /voice off)  → suppress
  3. else → voice.auto_tts global default

Runner now pushes voice.auto_tts onto the adapter as _auto_tts_default
and mirrors /voice on|tts chats into _auto_tts_enabled_chats via the
existing _sync_voice_mode_state_to_adapter path. /voice off still wins.

Closes #16007.
2026-04-26 05:52:05 -07:00
Teknium 67dcace412 docs(config): show options in comments for display settings (#16038)
Users who run `hermes setup` get `cli-config.yaml.example` copied verbatim
(including comments) to ~/.hermes/config.yaml. But several display settings
had thin comments that didn't enumerate the valid options, so users couldn't
tell from reading their config what values each key accepts.

- busy_input_mode: widen from 'CLI' to 'CLI and gateway platforms';
  note /stop as gateway equivalent of Ctrl+C; add /busy_input_mode runtime hint
- compact, interim_assistant_messages, bell_on_complete, show_reasoning,
  streaming: add true/false option lines showing effect of each value
- skin: refresh the built-in skin list (was missing daylight, warm-lightmode,
  poseidon, sisyphus, charizard — 5 of 9 built-ins undocumented)
2026-04-26 05:51:37 -07:00
Teknium 35c57cc46b fix(gateway): suppress tool-progress bubbles after interrupt (#16034)
When the LLM response carries N parallel tool calls, the agent fires
N tool.started events back-to-back before its interrupt check runs.
A user sending /stop mid-batch would see the ' Interrupting current
task' ack followed by a trail of 🔍 web_search bubbles for the remaining
events in the batch — making the interrupt feel ignored.

progress_callback and the drain loop in send_progress_messages now
check agent.is_interrupted (via agent_holder[0], the existing
cross-scope handle). Events that arrive after interrupt are dropped
at both the queueing and rendering stages. The ' Interrupting'
message is sent through a separate adapter path and is unaffected.
2026-04-26 05:47:37 -07:00
Teknium e8441c4c0f fix(clipboard): report native/tmux success, keep Ctrl+Shift+C on dashboard
Follow-up on #16020 salvage. Three corrections:

1. Truth signal for /copy
   Before: success was 'OSC 52 sequence was emitted to stdout'. That's
   false on local Linux inside tmux (emitSequence=false), so /copy kept
   printing 'clipboard copy failed' to users whose xclip/wl-copy had
   already succeeded fire-and-forget.
   Fix: setClipboard() now returns { sequence, success } where success =
   native-fired OR tmux-buffer-loaded OR osc52-emitted. copyNative()
   returns a boolean telling setClipboard whether a native attempt was
   made. /copy only shows 'failed' when literally no path was taken.

2. Dashboard keybinding
   Before: Ctrl+C for copy on non-Mac (Ctrl+Shift+C for paste).
   That swallows SIGINT when a stale selection is present and breaks
   the xterm/gnome-terminal/konsole/Windows-Terminal convention where
   Ctrl+C in a terminal emulator is always SIGINT. The real bug was
   that clipboard writes lost user-gesture through OSC-52 round-trips,
   which the direct writeText already fixes.
   Fix: revert copyModifier to Ctrl+Shift+C on non-Mac. Direct
   writeText in the keydown handler preserves user gesture. term.write
   Escape replaced with term.clearSelection() (works without relying
   on TUI input mode).

3. Error toast text
   Before: 'see HERMES_TUI_DEBUG_CLIPBOARD' — tells users how to
   debug but not how to fix.
   Fix: point users at HERMES_TUI_FORCE_OSC52=1 first (the actual
   escape hatch), mention the debug var second.
2026-04-26 05:46:45 -07:00
Harry Riddle 2511207cb0 chore: revert docs 2026-04-26 05:46:45 -07:00
Harry Riddle 0f3a6f0fb3 fix(clipboard): dashboard Ctrl+C direct copy; TUI honest feedback; HERMES_TUI_FORCE_OSC52
- Dashboard copy: direct Clipboard API on Ctrl+C/Cmd+C (user gesture);
  send Escape to TUI to clear selection; Ctrl+Shift+C kept as fallback.
- TUI /copy: copySelection() async; only reports success if OSC52 emitted.
- Add HERMES_TUI_FORCE_OSC52 env var to override native-tool detection.
- Fixes "copied N chars" false-positive when clipboard backend absent.

Changes:
  web/src/pages/ChatPage.tsx — direct navigator.clipboard.writeText
  ui-tui/packages/hermes-ink/src/ink/ink.tsx — async copySelection
  ui-tui/packages/hermes-ink/src/ink/termio/osc.ts — HERMES_TUI_FORCE_OSC52
  ui-tui/src/app/slash/commands/core.ts — async /copy with honest feedback
2026-04-26 05:46:45 -07:00
Harry Riddle a562420383 fix(tui): robust clipboard handling with debug logging and headless detection
Problem: Ctrl+C in Hermes TUI shows 'copied' but clipboard often empty.
Root causes:
- Native Linux tools (xclip, wl-copy) require DISPLAY/WAYLAND_DISPLAY; in
  headless Docker/SSH they fail or hang.
- OSC 52 fallback requires terminal emulator support; when absent, sequence
  is dropped silently.
- Dashboard OSC 52 → Clipboard API path fails due to missing user gesture;
  errors were silently caught.
- User feedback 'copied selection' was shown unconditionally, regardless of
  success.

Solution implemented:
- Short-circuit Linux native clipboard probing when no display server is
  present (no DISPLAY and no WAYLAND_DISPLAY). Avoids futile attempts and
  timeouts.
- Add HERMES_TUI_DEBUG_CLIPBOARD env var (1/true). When set, TUI logs to
  stderr which clipboard path is used, probe results on Linux, and whether
  OSC 52 was emitted. Greatly improves diagnosability.
- Improve dashboard clipboard error handling: replace empty catch blocks
  with console.warn messages for OSC 52 decode/Write failures and direct
  copy/paste errors. Makes browser permission/user-gesture failures visible
  in DevTools.
- Add comprehensive clipboard troubleshooting documentation to README and
  AGENTS, covering OSC 52 verification, tmux config, Docker/headless
  constraints, env vars, dashboard caveats, and fallback strategies.

Technical details:
-  in ui-tui/packages/hermes-ink/src/ink/termio/osc.ts:
  - Early return on Linux if both DISPLAY and WAYLAND_DISPLAY unset.
  - Refactor probe sequence to async  with 500ms timeout,
    caching result; subsequent copies use cached tool immediately.
  - Emit debug logs when HERMES_TUI_DEBUG_CLIPBOARD=1.
-  in ink.tsx: log when OSC 52 not emitted (native
  or tmux path in use) in debug mode.
- : OSC 52 handler and Ctrl+Shift+C handler now
  log warnings to console on Clipboard API rejection with error message.
- Documentation: new 'Clipboard Troubleshooting' section in README; new
  'Clipboard environment variables and pitfalls' subsection in AGENTS.md
  (Known Pitfalls).

Tests: full ui-tui test suite (292 tests) passes; clipboard and OSC tests
unaffected. No breaking changes.

Files changed:
- ui-tui/packages/hermes-ink/src/ink/termio/osc.ts
- ui-tui/packages/hermes-ink/src/ink/ink.tsx
- web/src/pages/ChatPage.tsx
- README.md
- AGENTS.md
- CHANGELOG.md (new)
2026-04-26 05:46:45 -07:00
Teknium 855366909f feat(models): remote model catalog manifest for OpenRouter + Nous Portal (#16033)
OpenRouter and Nous Portal curated picker lists now resolve via a JSON
manifest served by the docs site, falling back to the in-repo snapshot
when unreachable. Lets us update model lists without shipping a release.

Live URL: https://hermes-agent.nousresearch.com/docs/api/model-catalog.json
(source at website/static/api/model-catalog.json; auto-deploys via the
existing deploy-site.yml GitHub Pages pipeline on every merge to main).

Schema (v1) carries id + optional description + free-form metadata at
manifest, provider, and model levels. Pricing and context length stay
live-fetched via existing machinery (/v1/models endpoints, models.dev).

Config (new model_catalog section, default enabled):
  model_catalog.url       master manifest URL
  model_catalog.ttl_hours disk cache TTL (default 24h)
  model_catalog.providers.<name>.url   optional per-provider override

Fetch pipeline: in-process cache -> disk cache (fresh < TTL) -> HTTP
fetch -> disk-cache-on-failure fallback -> in-repo snapshot as last
resort. Never raises to callers; at worst returns the bundled list.

Changes:
- website/static/api/model-catalog.json    initial manifest (35 OR + 31 Nous)
- scripts/build_model_catalog.py           regenerator from in-repo lists
- hermes_cli/model_catalog.py              fetch + validate + cache module
- hermes_cli/models.py                     fetch_openrouter_models() +
                                           new get_curated_nous_model_ids()
- hermes_cli/main.py, hermes_cli/auth.py   Nous flows use the helper
- hermes_cli/config.py                     model_catalog defaults
- website/docs/reference/model-catalog.md  + sidebars.ts
- tests/hermes_cli/test_model_catalog.py   21 tests (validation, fetch
                                           success/failure, accessors,
                                           disabled, overrides, integration)
2026-04-26 05:46:43 -07:00
Teknium d09ab8ff13 fix(mcp-oauth): preserve server_url path for protected-resource validation (#16031)
Stop pre-stripping the path from the configured MCP server URL before
constructing OAuthClientProvider. The MCP SDK strips the path itself via
OAuthContext.get_authorization_base_url() for authorization-server
discovery, but uses the full server_url through
resource_url_from_server_url() + check_resource_allowed() to validate
against the server's RFC 9728 Protected Resource Metadata.

For servers whose PRM advertises a path-scoped resource (e.g. Notion's
https://mcp.notion.com/mcp), our _parse_base_url() collapsed the URL to
the origin, so check_resource_allowed() saw requested='/' vs
configured='/mcp/' and refused the token. Fixes OAuth against Notion MCP
(and any other path-scoped resource).

Closes #16015.
2026-04-26 05:43:54 -07:00
Teknium 438db0c7b0 fix(cli): /model picker honors provider-specific context caps (#16030)
`_apply_model_switch_result` (the interactive `/model` picker's
confirmation path) printed `ModelInfo.context_window` straight from
models.dev, which reports the vendor-wide value (1.05M for gpt-5.5 on
openai). ChatGPT Codex OAuth caps the same slug at 272K, so the picker
showed 1M while the runtime (compressor, gateway `/model`, typed
`/model <name>`) correctly used 272K — the classic 'sometimes 1M,
sometimes 272K' mismatch on a single model.

Both display paths now go through `resolve_display_context_length()`,
matching the fix that `_handle_model_switch` received earlier.

Also bump the stale last-resort fallback in DEFAULT_CONTEXT_LENGTHS
(`gpt-5.5: 400000 -> 1050000`) to match the real OpenAI API value; the
272K Codex cap is already enforced via the Codex-OAuth branch, so the
fallback now reflects what every non-Codex probe-miss should see.

Tests: adds `test_apply_model_switch_result_context.py` with three
scenarios (Codex cap wins, OpenRouter shows 1.05M, resolver-empty falls
back to ModelInfo). Updates the existing non-Codex fallback test to
assert 1.05M (the correct value).

## Validation
| path                          | before    | after     |
|-------------------------------|-----------|-----------|
| picker -> gpt-5.5 on Codex    | 1,050,000 | 272,000   |
| picker -> gpt-5.5 on OpenAI   | 1,050,000 | 1,050,000 |
| picker -> gpt-5.5 on OpenRouter | 1,050,000 | 1,050,000 |
| typed /model gpt-5.5 on Codex | 272,000   | 272,000   |
2026-04-26 05:43:31 -07:00
zkl 2ccdadcca6 fix(deepseek): bump V4 family context window to 1M tokens
#14934 added deepseek-v4-pro / deepseek-v4-flash to the DeepSeek native
provider but the context-window lookup still falls back to the existing
"deepseek" substring entry (128K). DeepSeek V4 ships with a 1M context
window, so any caller relying on get_model_context_length() for
pre-flight token budgeting (compression, context warnings) under-counts
by ~8x.

Add explicit lowercase entries for the four DeepSeek model ids that
ship 1M context:

- deepseek-v4-pro
- deepseek-v4-flash
- deepseek-chat (legacy alias, server-side maps to v4-flash non-thinking)
- deepseek-reasoner (legacy alias, server-side maps to v4-flash thinking)

Longest-key-first substring matching means these explicit entries also
cover the vendor-prefixed forms (deepseek/deepseek-v4-pro on OpenRouter
and Nous Portal) without regressing the existing 128K fallback for
older / unknown DeepSeek model ids on custom endpoints.

Source: https://api-docs.deepseek.com/zh-cn/quick_start/pricing
2026-04-26 05:32:54 -07:00
Teknium 76042f5867 feat(review): class-first skill review prompt (#16026)
The background skill-review prompt (spawned after N user turns) now instructs
the reviewer to SURVEY existing skills first, identify the CLASS of task, and
PREFER updating/generalizing an existing skill over creating a new narrow one.

This reduces near-duplicate skill accumulation at the source. Catches the
common failure mode where repeated tasks of the same class each spawn their
own specific skill ("fix-my-tauri-error", "fix-my-electron-error") instead
of a single class-level skill ("desktop-app-build-troubleshooting").

Applied to both _SKILL_REVIEW_PROMPT and the **Skills** half of
_COMBINED_REVIEW_PROMPT. Memory-only review prompt unchanged.

Groundwork for the Curator feature (issue #7816) — the creation-side fix.
Curator handles the retirement/consolidation side in a follow-up PR.

Tests assert the behavioral instructions are present (survey, class, update-
over-create, overlap-flagging, opt-out clause) rather than snapshotting the
full prompt text.
2026-04-26 05:17:10 -07:00
Teknium 192e7eb21f fix(nous): don't trip cross-session rate breaker on upstream-capacity 429s (#15898)
Nous Portal multiplexes multiple upstream providers (DeepSeek, Kimi,
MiMo, Hermes) behind one endpoint. Before this fix, any 429 on any of
those models recorded a cross-session file breaker that blocked EVERY
model on Nous for the cooldown window -- even though the caller's
own RPM/RPH/TPM/TPH buckets were healthy. Users hit a DeepSeek V4 Pro
capacity error, restarted, switched to Kimi 2.6, and still got
'Nous Portal rate limit active -- resets in 46m 53s'.

Nous already emits the full x-ratelimit-* header suite on every
response (captured by rate_limit_tracker into agent._rate_limit_state).
We now gate the breaker on that data: trip it only when either the
429's own headers or the last-known-good state show a bucket with
remaining == 0 AND a reset window >= 60s. Upstream-capacity 429s
(healthy buckets everywhere, but upstream out of capacity) fall
through to normal retry/fallback and the breaker is never written.

Note: the in-memory 'restart TUI/gateway to clear' workaround
circulated in Discord does NOT work -- the breaker is file-backed at
~/.hermes/rate_limits/nous.json. The workaround for users still
affected by a bad state file is to delete it.

Reported in Discord by CrazyDok1 and KYSIV (Apr 2026).
2026-04-26 04:53:42 -07:00
Brooklyn Nicholson d91e24547c fix(tui): attach inline diffs to tool timeline 2026-04-26 05:17:26 -05:00
Brooklyn Nicholson 05dc2eec36 fix(tui): tighten timeline detail spacing 2026-04-26 05:13:21 -05:00
Brooklyn Nicholson 2e6c3c7d23 fix(tui): address follow-up review nits 2026-04-26 05:06:57 -05:00
Brooklyn Nicholson a0aebad673 fix(tui): anchor details to stream timeline 2026-04-26 04:59:44 -05:00
Brooklyn Nicholson 7143d22a83 fix(tui): keep queued sends in queue UI 2026-04-26 04:49:56 -05:00
Brooklyn Nicholson 5ac4088856 fix(tui): keep live progress visible while scrolling 2026-04-26 04:46:44 -05:00
Brooklyn Nicholson e16e196c7e fix(tui): keep selection drag responsive 2026-04-26 04:44:19 -05:00
Brooklyn Nicholson 7d68ea9501 fix(tui): stream legacy thinking deltas visibly 2026-04-26 04:42:04 -05:00
Brooklyn Nicholson bc17310442 fix(tui): smooth selection drag behavior 2026-04-26 04:39:25 -05:00
Brooklyn Nicholson 8f0fa0836f fix(tui): preserve composer width on narrow panes 2026-04-26 04:35:54 -05:00
Brooklyn Nicholson bbd950efcf fix(tui): keep stream cadence responsive while typing 2026-04-26 04:32:55 -05:00
Brooklyn Nicholson 381121025e fix(tui): address review feedback 2026-04-26 04:28:55 -05:00
Brooklyn Nicholson 355e0ae960 fix(tui): keep streaming progress stable during interaction 2026-04-26 04:23:57 -05:00
Brooklyn Nicholson 1c964ed43f fix(tui): rely on native cursor for input 2026-04-26 03:47:05 -05:00
Brooklyn Nicholson cd7c5e5606 perf(tui): defer local input render during echo 2026-04-26 03:38:56 -05:00
Brooklyn Nicholson ee7ef33b02 fix(tui): queue busy submissions gracefully 2026-04-26 03:27:45 -05:00
Brooklyn Nicholson 5cd41d2b3b perf(tui): widen native input echo 2026-04-26 03:22:50 -05:00
Brooklyn Nicholson 9bb3bc422d perf(tui): optimistically echo simple input 2026-04-26 03:07:15 -05:00
Brooklyn Nicholson 19d75d1797 perf(tui): coalesce composer echo updates 2026-04-26 02:21:22 -05:00
Brooklyn Nicholson 458ce792d2 fix(tui): persist model switches by default 2026-04-26 02:15:10 -05:00
Brooklyn Nicholson 14fcff60c9 style(tui): apply formatter 2026-04-26 01:48:10 -05:00
Brooklyn Nicholson db4e4acca0 perf(tui): stabilize long-session scrolling 2026-04-26 01:47:05 -05:00
Teknium 59b56d445c feat(hooks): add duration_ms to post_tool_call + transform_tool_result (#15429)
Plugin hooks fired after a tool dispatch now receive an integer
duration_ms kwarg measuring how long the tool's registry.dispatch()
call took (time.monotonic() before/after). Inspired by Claude Code
2.1.119 which added the same field to PostToolUse hook inputs.

Wire points:
- model_tools.py: measure dispatch latency, pass duration_ms to
  invoke_hook("post_tool_call", ...) and invoke_hook("transform_tool_result", ...)
- hermes_cli/hooks.py: include duration_ms in the synthetic payload
  used by 'hermes hooks test' and 'hermes hooks doctor' so shell-hook
  authors see the same shape at development time as runtime
- shell hooks (agent/shell_hooks.py): no code change needed;
  _serialize_payload already surfaces non-top-level kwargs under
  payload['extra'], so duration_ms lands at extra.duration_ms for
  shell-hook scripts

Plugin authors can now build latency dashboards, per-tool SLO alerts,
and regression canaries without having to wrap every tool manually.

Test: tests/test_model_tools.py::test_post_tool_call_receives_non_negative_integer_duration_ms
E2E: real PluginManager + dispatch monkey-patched with a 50ms sleep,
hook callback observes duration_ms=50 (int).

Refs: https://code.claude.com/docs/en/changelog (2.1.119, Apr 23 2026)
2026-04-25 22:13:12 -07:00
Teknium eb28145f36 feat(approval): hardline blocklist for unrecoverable commands (#15878)
Adds a floor below --yolo: a tiny set of commands so catastrophic they
should never run via the agent, regardless of --yolo, gateway /yolo,
approvals.mode=off, or cron approve mode.  Opting into yolo is trusting
the agent with your files and services — not trusting it to wipe the
disk or power the box off.

The list is deliberately small (12 patterns), covering only
unrecoverable ops:
- rm -rf targeting /, /home, /etc, /usr, /var, /boot, /bin, /sbin,
  /lib, ~, $HOME
- mkfs (any variant)
- dd + redirection to raw block devices (/dev/sd*, /dev/nvme*, etc.)
- fork bomb
- kill -1 / kill -9 -1
- shutdown, reboot, halt, poweroff, init 0/6, telinit 0/6,
  systemctl poweroff/reboot/halt/kexec

Recoverable-but-costly commands (git reset --hard, rm -rf /tmp/x,
chmod -R 777, curl | sh) stay in DANGEROUS_PATTERNS where yolo can
still pass them through — that's what yolo is for.

Container backends (docker/singularity/modal/daytona) continue to
bypass both hardline and dangerous checks, since nothing they do can
touch the host.

Inspired by Mercury Agent's permission-hardened blocklist.
2026-04-25 22:07:12 -07:00
Teknium a55de5bcd0 feat(setup): auto-reconfigure on existing installs (#15879)
Bare `hermes setup` on a returning user now drops straight into the
full reconfigure wizard — every prompt shows the current value as its
default, press Enter to keep or type a new value to change it. The
returning-user menu is gone.

Behavior:
- First-time user: first-time wizard (unchanged)
- Returning user, bare command: full reconfigure wizard (new default)
- Returning user, `--quick`: only prompt for missing/unset items
- Returning user, one section: `hermes setup model|terminal|gateway|tools|agent`
- `--reconfigure`: preserved as backwards-compat alias (no-op since it's now default)

The section functions already used current values as prompt defaults —
this change just removes the extra click to get to them.

The 'Quick Setup - configure missing items only' menu option is now
exposed as the explicit `--quick` flag; it's the narrow case of
filling in missing config (e.g. after a partial OpenClaw migration or
when a required API key got cleared).

Inspired by Mercury Agent's `mercury doctor` UX.

Also removes:
- RETURNING_USER_MENU_SECTION_KEYS (orphaned constant)
- Two returning-user menu tests in test_setup_noninteractive.py
  (guarding behavior that no longer exists — covered by
  test_setup_reconfigure.py instead)
2026-04-25 22:02:02 -07:00
brooklyn! cec0af02ad Merge pull request #15870 from NousResearch/bb/fix-skills-search
fix(tui): restore skills search RPC
2026-04-25 22:13:28 -05:00
Brooklyn Nicholson 91a7a0acbe fix(tui): restore skills search RPC 2026-04-25 22:11:52 -05:00
Teknium 7c50ed707c docs(azure-foundry): add provider guide, env vars, release AUTHOR_MAP
- New website/docs/guides/azure-foundry.md covering both OpenAI-style
  and Anthropic-style endpoints, auto-detection behaviour, gpt-5.x
  routing, /v1 stripping, api-version query forwarding, and the
  provider: anthropic + Azure URL alternative setup.
- environment-variables.md picks up AZURE_FOUNDRY_API_KEY,
  AZURE_FOUNDRY_BASE_URL, AZURE_ANTHROPIC_KEY.
- cli-commands.md includes azure-foundry in the provider choices list.
- configuration.md lists azure-foundry among auxiliary-task providers.
- sidebars.ts wires the new guide into the Guides section.
- scripts/release.py AUTHOR_MAP entries for TechPrototyper,
  HangGlidersRule (noreply), and pein892 so the contributor-attribution
  CI check does not reject the salvage.
2026-04-25 18:48:43 -07:00
Teknium 731e1ef8cb feat(azure-foundry): auto-detect transport, models, context length
The azure-foundry wizard now probes the endpoint before asking the user
to pick anything by hand:

  1. URL path sniff — endpoints ending in /anthropic are Azure Foundry
     Claude routes and skip to anthropic_messages.
  2. GET <base>/models probe — if the endpoint returns an OpenAI-shaped
     model list, we switch to chat_completions and prefill the picker
     with the returned deployment/model IDs.
  3. Anthropic Messages probe — fallback for endpoints that don't expose
     /models but do speak the Anthropic Messages shape.
  4. Manual fallback — private endpoints / custom routes still work;
     the user picks API mode + types a deployment name.

Context length for the selected model is resolved through the existing
agent.model_metadata.get_model_context_length chain (models.dev,
provider metadata, hardcoded family fallbacks) and stored in
model.context_length when a non-default value is found.

Also refactors runtime_provider so Azure Foundry resolution is reused
between the explicit-credentials path and the default top-level path —
previously the /v1 strip for Anthropic-style Azure only ran when the
caller passed explicit_* args, which meant config-driven sessions
hit a double-/v1 URL.

New module hermes_cli/azure_detect.py with 19 unit tests covering:
- path sniff, model ID extraction, probe fallbacks
- HTTP error handling (URLError, HTTPError)
- context-length lookup passthrough
- DEFAULT_FALLBACK_CONTEXT rejection

New runtime tests cover:
- OpenAI-style Azure Foundry
- Anthropic-style Azure Foundry with /v1 stripping
- Missing base_url / API key raising AuthError

Rationale: Microsoft confirms there's no pure-API-key endpoint to list
Azure deployments (that requires ARM management auth).  The v1 Azure
OpenAI endpoint does expose /models with the resource's available
model catalog, which is good enough for picker prefill in the common
case.  Users on private/gated endpoints fall through to manual entry.
2026-04-25 18:48:43 -07:00
akhater ac57114284 fix(agent): support Azure OpenAI gpt-5.x on chat/completions endpoint
Azure OpenAI exposes an OpenAI-compatible endpoint at
`{resource}.openai.azure.com/openai/v1` that accepts the standard
`openai` Python client. Two issues prevented gpt-5.x models from working:

1. `_max_tokens_param()` only sent `max_completion_tokens` for
   `api.openai.com` URLs. Azure also requires `max_completion_tokens`
   for gpt-5.x models.

2. The `codex_responses` upgrade gate unconditionally upgraded gpt-5.x
   to Responses API. Azure does NOT support the Responses API — it serves
   gpt-5.x on the regular `/chat/completions` path, causing a 404.

Fix: add `_is_azure_openai_url()` that matches `openai.azure.com` URLs.
- `_max_tokens_param()` now returns `max_completion_tokens` for Azure.
- The `codex_responses` upgrade gate skips Azure so gpt-5.x stays on
  `chat_completions` where Azure actually serves it.
- The fallback-provider api_mode picker also recognises Azure and stays
  on chat_completions.
- Tests cover max_tokens routing, api_mode behaviour, and URL detection.

gpt-4.x models on Azure are unaffected (already used chat_completions +
max_tokens, which Azure accepts for those models).

Salvage of PR #10086 — rewritten against current main where the
codex_responses upgrade gate gained copilot-acp / explicit-api_mode
exclusions.
2026-04-25 18:48:43 -07:00
pein892 24b4b24d79 fix: preserve URL query params for Azure OpenAI and custom endpoints
Azure OpenAI requires an `api-version` query parameter on every request.
When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`),
the OpenAI SDK silently drops it during URL construction, causing 404 errors.

Extract query params from base_url and pass them via `default_query` so the
SDK appends them to every request. This is a generic solution that works for
any custom endpoint requiring query parameters, not just Azure.

No-op for URLs without query params — fully backward compatible.
2026-04-25 18:48:43 -07:00
HangGlidersRule c15064fa37 fix: pass api-version as default_query param, not in base_url — SDK was producing malformed URLs like /anthropic?api-version=.../v1/messages 2026-04-25 18:48:43 -07:00
HangGlidersRule 7bfa9442de fix: skip OAuth token refresh for Azure Anthropic endpoints — prevents ~/.claude/.credentials.json from overwriting Azure key mid-session 2026-04-25 18:48:43 -07:00
HangGlidersRule d8e4c7214e fix: Azure Anthropic short-circuit in resolve_runtime_provider — bypass custom runtime when provider=anthropic + azure.com URL 2026-04-25 18:48:43 -07:00
HangGlidersRule 6ef3a47ce5 fix: use Azure API key directly for Azure endpoints, bypass OAuth token priority chain 2026-04-25 18:48:43 -07:00
TechPrototyper 3a7653dd1f feat: Add Azure Foundry provider with OpenAI/Anthropic API mode selection
Add support for Azure Foundry as a new inference provider. Azure Foundry
endpoints can use either OpenAI-style (/v1/chat/completions) or
Anthropic-style (/v1/messages) API formats.

Changes:
- Add azure-foundry to PROVIDER_REGISTRY (auth.py)
- Add azure-foundry overlay in HERMES_OVERLAYS (providers.py)
- Add empty model list for azure-foundry (models.py)
- Add _model_flow_azure_foundry() interactive setup (main.py)
- Add azure-foundry runtime resolution with api_mode support (runtime_provider.py)
- Add AZURE_FOUNDRY_API_KEY and AZURE_FOUNDRY_BASE_URL env vars (config.py)

Usage:
  hermes model -> More providers -> Azure Foundry

The setup wizard prompts for:
- Endpoint URL
- API format (OpenAI or Anthropic-style)
- API key
- Model name

Configuration is saved to config.yaml (model.provider, model.base_url,
model.api_mode, model.default) and ~/.hermes/.env (AZURE_FOUNDRY_API_KEY).
2026-04-25 18:48:43 -07:00
Teknium 125de02056 fix(context): honor custom_providers context_length on /model switch + bump probe tier to 256K (#15844)
Fixes #15779. Custom-provider per-model context_length (`custom_providers[].models.<id>.context_length`) is now honored across every resolution path, not just agent startup. Also adds 256K as the top probe tier and default fallback.

## What changed

New helper `hermes_cli.config.get_custom_provider_context_length()` — single source of truth for the per-model override lookup, with trailing-slash-insensitive base-url matching.

`agent.model_metadata.get_model_context_length()` gains an optional `custom_providers=` kwarg (step 0b — runs after explicit `config_context_length` but before every other probe).

Wired through five call sites that previously either duplicated the lookup or ignored it entirely:
- `run_agent.py` startup — refactored to use the new helper (dedups legacy inline loop, keeps invalid-value warning)
- `AIAgent.switch_model()` — re-reads custom_providers from live config on every /model switch
- `hermes_cli.model_switch.resolve_display_context_length()` — new `custom_providers=` kwarg
- `gateway/run.py` /model confirmation (picker callback + text path)
- `gateway/run.py` `_format_session_info` (/info)

## Context probe tiers

`CONTEXT_PROBE_TIERS = [256_000, 128_000, 64_000, 32_000, 16_000, 8_000]` — was `[128_000, ...]`. `DEFAULT_FALLBACK_CONTEXT` follows tier[0], so unknown models now default to 256K. The stale `128000` literal in the OpenRouter metadata-miss path is replaced with `DEFAULT_FALLBACK_CONTEXT` for consistency.

## Repro (from #15779)

```yaml
custom_providers:
  - name: my-custom-endpoint
    base_url: https://example.invalid/v1
    model: gpt-5.5
    models:
      gpt-5.5:
        context_length: 1050000
```

`/model gpt-5.5 --provider custom:my-custom-endpoint` → previously "Context: 128,000", now "Context: 1,050,000".

## Tests

- `tests/hermes_cli/test_custom_provider_context_length.py` — new file, 19 tests covering the helper, step-0b integration, and the 256K tier invariants
- `tests/hermes_cli/test_model_switch_context_display.py` — added regression tests for #15779 through the display resolver
- `tests/gateway/test_session_info.py` — updated default-fallback assertion (128K → 256K)
- `tests/agent/test_model_metadata.py` — updated tier assertions for the new top tier
2026-04-25 18:47:53 -07:00
Teknium 4c591c2819 chore(release): map fqsy1416@gmail.com to EKKOLearnAI 2026-04-25 18:40:35 -07:00
Teknium 01535a4732 fix(api_server): cap stop-run wait at 5s so interrupt can't hang handler
task.cancel() can't preempt the run_in_executor thread running
run_conversation(), so we rely on agent.interrupt() to wake the loop.
Without a timeout, a slow/unresponsive interrupt blocks the HTTP
response indefinitely. Wrap the await in wait_for(shield(task), 5.0)
and log a warning on timeout.

Also tidy one extra space in the module docstring's /stop entry.
2026-04-25 18:40:35 -07:00
ekko 0a15dbdc43 feat(api_server): add POST /v1/runs/{run_id}/stop endpoint
Add ability to interrupt a running agent via the runs API. Previously
/v1/runs could start a run and subscribe to events, but there was no
way to cancel it. The new endpoint stores agent and task references
during execution, calls agent.interrupt() to stop LLM calls, then
cancels the asyncio task.

Includes 15 tests covering start, events, and stop scenarios.
2026-04-25 18:40:35 -07:00
Teknium ce0513dd2e chore(release): map Feranmi10 personal email 2026-04-25 18:39:55 -07:00
Oluwadare Feranmi dc5e02ea7f feat(cli): implement hermes update --check flag (fixes #10318) 2026-04-25 18:39:55 -07:00
brooklyn! ff851ba7b9 Merge pull request #15821 from NousResearch/fix/tui-ctrl-g-editor
fix: external editor handoff in CLI/TUI
2026-04-25 20:37:05 -05:00
Brooklyn Nicholson 14dd8e9a72 fix(tui): address Copilot review on editor handoff
- resolveEditor() now returns argv (string[]) so EDITOR='code --wait'
  and VISUAL='emacsclient -t' tokenize correctly into spawnSync's
  separate command + args. Previously the whole string was passed as
  argv[0] and would ENOENT.
- Skip the POSIX X_OK PATH walk on Windows; return ['notepad.exe']
  there since fs.constants.X_OK is not meaningful and PATHEXT-based
  resolution would need its own implementation.
- Surface openEditor() rejections via actions.sys instead of letting
  them become unhandled promise rejections in the useInput callback.
- Hotkey docs/comment now say Cmd/Ctrl+G to match isAction()'s
  platform-action-modifier behavior (Cmd on macOS, Ctrl elsewhere).
2026-04-25 20:34:24 -05:00
Wysie 1d80e92c7e test(discord): add guild to fake e2e messages 2026-04-25 18:25:56 -07:00
Teknium edce7522a5 chore(release): add AUTHOR_MAP entry for voidborne-d personal email 2026-04-25 18:25:13 -07:00
voidborne-d 45e1228a8a fix(cli): suppress OSError EIO on interrupt shutdown
When the user interrupts a long-running task, prompt_toolkit tries to
flush stdout during emergency shutdown.  If stdout is in a broken state
(redirected to /dev/null, pipe closed, terminal gone), the flush raises
`OSError: [Errno 5] Input/output error` which propagates unhandled and
crashes the CLI.

Two defense layers:

1. `_suppress_closed_loop_errors`: add `OSError` with `errno.EIO` to
   the asyncio exception handler, matching the existing pattern for
   `RuntimeError("Event loop is closed")` and `KeyError("is not
   registered")`.

2. Outer `except (KeyError, OSError)` block: add `errno.EIO` check
   before the existing string-match guards, silently suppressing the
   error instead of printing a misleading stdin-related message.

Fixes #13710.
2026-04-25 18:25:13 -07:00
Brooklyn Nicholson 83129e72de refactor(tui): tighten editor handoff helpers
- editor.ts: collapse two private helpers into one flatMap-driven lookup,
  keep `isExecutable` as the only named primitive, document the fallback
  chain with prompt_toolkit parity
- editor.test.ts: hoist the `exe` helper out of `describe`, drop the
  empty afterEach + dead mkdir branch, materialize expected paths before
  the resolveEditor call so argument evaluation order doesn't bite
- useComposerState.openEditor: rmSync the mkdtemp dir (was leaking),
  early-return on bad exit / empty buffer, run cleanup in finally
- useInputHandlers: cheap `ch.toLowerCase() === 'g'` guard before the
  modifier check
- hermes-ink/screen.ts: pick up `npm run fix` import-sort cleanup so
  lint passes
2026-04-25 20:24:06 -05:00
Teknium 4d170134ef chore(release): map nerijusn76@gmail.com to Nerijusas (#15833) 2026-04-25 18:22:49 -07:00
nerijusas 81e01f6ee9 fix(agent): preserve Codex message items for replay 2026-04-25 18:22:06 -07:00
Brooklyn Nicholson 7fd8dc0bfb fix: preserve prompt_toolkit editor picker and mirror it in TUI
Base CLI's editor UX was better because prompt_toolkit picks the system
editor first, then friendly terminal editors before vi. Do not override
that with a vim-first chain.

Keep the CLI on prompt_toolkit's picker and only set tempfile_suffix='.md'
to avoid the complex-tempfile EEXIST path. Update the TUI resolver to
match prompt_toolkit's fallback order: $VISUAL, $EDITOR, editor, nano,
pico, vi, emacs.
2026-04-25 20:20:05 -05:00
Brooklyn Nicholson d056b610b7 fix: avoid prompt_toolkit complex tempfile bug and prefer nvim first
Setting buffer.tempfile = 'prompt.md' pushed prompt_toolkit into its
complex-tempfile path, which creates a temp dir and then calls
os.makedirs() on that same path when no subdirectory is present. That
raises EEXIST before the editor can launch.

Keep prompt_toolkit on the simple tempfile path with .md suffix, and
make the editor fallback chain explicit on both surfaces:
$VISUAL -> $EDITOR -> nvim -> vim -> vi -> nano.
2026-04-25 20:16:50 -05:00
Teknium 2536a36f6f fix(tui): route /save through session.save JSON-RPC
The cherry-picked approach serialized the UI-shaped transcript on the Node
side, producing a third JSON format alongside cli.py save_conversation and
tui_gateway session.save. Simpler to call the existing session.save method,
which already writes the canonical agent history (raw OpenAI messages +
model) to an absolute-path file.

- /save still short-circuits before the slash worker
- Empty transcript -> 'no conversation yet'
- No active session -> 'no active session - nothing to save'
- Otherwise: rpc('session.save', {session_id}) and echo back the file path
- Tests updated to assert RPC contract; new test covers the no-sid case
2026-04-25 18:11:37 -07:00
helix4u 1b8ca9254f fix(tui): save live transcript from slash command 2026-04-25 18:11:37 -07:00
Brooklyn Nicholson db7c5735f0 fix: prefer vim over nano for $EDITOR fallback (CLI + TUI)
prompt_toolkit's default editor list is: $VISUAL, $EDITOR, /usr/bin/editor,
/usr/bin/nano, /usr/bin/pico, /usr/bin/vi, /usr/bin/emacs — so when
neither env var is set, the base CLI launched nano. The TUI fell back
to a literal 'vi'. Same Ctrl+G keystroke, two different editors.

Pick the same chain on both surfaces:
  $VISUAL → $EDITOR → vim → vi → nano

CLI: override input_area.buffer._open_file_in_editor on the TextArea
once at app build time. Local to that buffer; doesn't touch
os.environ or affect other subprocesses.

TUI: extract resolveEditor() into ui-tui/src/lib/editor.ts. PATH walk
with accessSync(X_OK), no shelling out. Six-line unit test verifies
the priority order and the multi-entry PATH walk.
2026-04-25 20:11:25 -05:00
Teknium 8bbeaea6c7 fix(config): broaden api-key ref lookup to templated base_url
The raw-template lookup added in PR #15817 went through
`get_compatible_custom_providers(read_raw_config())`, which calls
`_normalize_custom_provider_entry` → `urlparse(base_url)`. Any
entry whose `base_url` is itself an env-ref (`${NEURALWATT_API_BASE}`)
was dropped as 'not a valid URL', so `api_key_ref` stayed empty and the
resolved secret was still written to `model.api_key` — the exact case
the original Discord report described.

Replace the normalizer-gated lookup with a direct read of
`raw['custom_providers']` and `raw['providers']`, indexed by name
(case-insensitive, optionally qualified by model) so the loaded
(expanded) entry can be matched regardless of how `base_url` is
written.

Add an integration regression test driving the real
`select_provider_and_model` entry point with the Discord-reported
NeuralWatt config (`${VAR}` in both `base_url` and `api_key`).
This test fails on the PR-only fix and passes with the broadened
lookup.
2026-04-25 18:10:52 -07:00
helix4u 1fdc31b214 fix(config): preserve custom provider api key refs 2026-04-25 18:10:52 -07:00
Brooklyn Nicholson 5fac6c3440 fix(cli): write editor draft to prompt.md so syntax highlighting works
Base CLI was handing prompt_toolkit's Buffer.open_in_editor() a default
config — Buffer.tempfile_suffix and .tempfile both empty — so it
created /tmp/tmpXXXXXX with no extension. nano/vim/helix all key
syntax highlighting off the file extension, so the buffer rendered
plain.

The TUI already writes to <mkdtemp>/prompt.md and gets full markdown
highlighting + a sensible title bar. Set buffer.tempfile = 'prompt.md'
on the TextArea so prompt_toolkit's complex-tempfile path produces
<mkdtemp>/prompt.md to match. shutil.rmtree cleanup is built-in.
2026-04-25 20:04:04 -05:00
kshitijk4poor 2c56dce0ed fix(model): preserve custom endpoint credentials and accept cloud models not in /v1/models
When switching models on a custom endpoint (ollama-launch):
- Same-provider switches no longer re-resolve credentials (fixes base_url
  being lost for 'custom' provider on subsequent switches)
- Named providers (ollama-launch) are resolved via user_providers so
  switch_model can find their base_url from config
- Models not in the /v1/models probe but present in the user's saved
  provider config are accepted with a warning instead of rejected
- CLI /model and TUI /model both pass user_providers/custom_providers
  to switch_model so the config model list is available for validation

Closes #15088
2026-04-25 18:03:47 -07:00
Teknium 01cf2c65cc chore(release): map iris@growthpillars.co to irispillars (#15825)
Follow-up to #15533 (merged). Prevents release notes CI from
attributing the contributor to the placeholder.
2026-04-25 18:02:13 -07:00
helix4u b2d3308f98 fix(doctor): accept bare custom provider 2026-04-25 18:01:36 -07:00
Iris Jin 25ba6a4a74 fix(gateway): make reasoning session-scoped by default 2026-04-25 18:01:31 -07:00
Brooklyn Nicholson 4c797bfae9 fix(cli): accept Alt+G as Ctrl+G fallback in VSCode/Cursor terminals
Same problem as the TUI: Cursor and VSCode bind Ctrl+G to "Find Next"
at the editor level, so the keystroke never reaches the terminal and
the prompt_toolkit-driven Hermes CLI sees nothing.

Register ('escape', 'g') alongside the existing 'c-g' on the same
handler so the editor handoff works inside Cursor/VSCode too. The
filter (no clarify/approval/sudo/secret prompt active) is unchanged.
2026-04-25 20:01:03 -05:00
Brooklyn Nicholson c58956a9a2 fix(tui): accept Alt+G as Ctrl+G fallback in VSCode/Cursor terminals
VSCode and Cursor bind Ctrl+G to "Find Next" at the editor level, so
the keystroke never reaches the embedded terminal — Ctrl+G to open
\$EDITOR was effectively dead inside those IDEs.

Alt+G is unbound in both editors and reaches the TUI cleanly as
`\x1bg` → `key.meta && ch === 'g'` after parse-keypress. Accept it
alongside the existing isAction(key, ch, 'g') check, and document the
fallback in README + the hotkeys panel.
2026-04-25 19:57:17 -05:00
Brooklyn Nicholson 3944b22506 fix(tui): suspend Ink properly when opening $EDITOR via Ctrl+G
The Ctrl+G handler was toggling the alt-screen by hand
(`\x1b[?1049l` ... `\x1b[?1049h`) without releasing stdin or kitty
keyboard mode, so the launched editor would lose keystrokes (Ink kept
swallowing them) and editors that don't speak CSI-u (e.g. nano) would
print "Unknown sequence" for every Ctrl-key.

Switch to `withInkSuspended` from @hermes/ink, the same helper
`/setup` already uses. It pauses Ink, removes stdin listeners, drops
raw mode, disables kitty/modifyOtherKeys + mouse + focus reporting,
runs the editor, then restores everything with a full repaint.
2026-04-25 19:54:06 -05:00
brooklyn! 489bed6f96 Merge pull request #15478 from yes999zc/fix-deepseek-reasoning-all-assistant-messages
fix: DeepSeek/Kimi thinking mode requires reasoning_content on ALL assistant messages
2026-04-25 19:19:33 -05:00
FocusFlow Dev ad0ac89478 fix: DeepSeek/Kimi thinking mode requires reasoning_content on ALL assistant messages
Previously _copy_reasoning_content_for_api only padded reasoning_content
when the assistant message had tool_calls. DeepSeek V4 thinking mode
requires the field on every assistant turn, including plain text replies
without tool_calls.

- Remove the 'source_msg.get("tool_calls") and' guard
- Update test: plain assistant turns now get padded for DeepSeek/Kimi

Fixes #15213
2026-04-26 07:47:13 +08:00
Teknium dc4d92f131 docs: embed tutorial videos on webhooks + auxiliary models pages (#15809)
- webhooks.md: adds a Video Tutorial section under the intro with a
  responsive YouTube iframe (WNYe5mD4fY8).
- configuration.md: adds a Video Tutorial subsection under Auxiliary
  Models with a responsive YouTube iframe (NoF-YajElIM).

Both use a 16:9 aspect-ratio wrapper so the embeds scale cleanly on
mobile. Verified with `npm run build` — MDX parses clean, no new
warnings or broken links introduced.
2026-04-25 16:44:53 -07:00
Teknium 47420a84b9 docs(obliteratus): link YouTube video guide in SKILL.md (#15808)
Adds a 'Video Guide' section pointing at the walkthrough of a Hermes agent
abliterating Gemma with OBLITERATUS, so the agent can surface it when the
user wants a visual overview before running the workflow.
2026-04-25 16:30:38 -07:00
brooklyn! f93d4624bf Merge pull request #15749 from Zjianru/fix/copy-reasoning-content-ordering-and-cross-provider-isolation
fix(agent): ordering fix in _copy_reasoning_content_for_api — cross-provider reasoning isolation
2026-04-25 17:21:49 -05:00
codez 5ae608152e fix: remove has_reasoning guard — inject empty reasoning_content for DeepSeek/Kimi tool_calls unconditionally 2026-04-26 06:08:54 +08:00
brooklyn! 88b65cc82a Update run_agent.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-26 05:49:38 +08:00
brooklyn! edc78e258c Merge pull request #15766 from NousResearch/bb/tui-ssh-copy
fix(tui): honor client copy shortcut over ssh
2026-04-25 15:33:17 -05:00
Brooklyn Nicholson 31d7f1951a fix(tui): clamp copied selection bounds
Clamp copied selection columns to the screen width before scanning rendered cells.
2026-04-25 15:32:45 -05:00
Brooklyn Nicholson b1c18e5a41 refactor(tui): format screen imports
Keep screen.ts import ordering aligned with the ui-tui formatter.
2026-04-25 15:26:51 -05:00
Brooklyn Nicholson bd66e55a02 fix(tui): track rendered spaces for selection copy
- add a written-cell bitmap so selection can distinguish rendered spaces from blank padding
- preserve code indentation without markdown-specific rendering hacks
2026-04-25 15:21:26 -05:00
Brooklyn Nicholson 1735ced93b fix(tui): preserve code block indentation in selection
Render code indentation spaces as selectable cells so copied fenced code keeps its leading whitespace.
2026-04-25 15:17:36 -05:00
Brooklyn Nicholson bba16943f6 fix(tui): preserve rendered indentation in selections
- trim only empty edge rows instead of full selected text
- bound selection paint using unwritten cells so rendered indentation remains copyable
2026-04-25 15:14:26 -05:00
Brooklyn Nicholson 132620ba3d refactor(tui): simplify remote copy hotkey hints
Use an explicit conditional table instead of spread casting for SSH copy hint rows.
2026-04-25 15:09:12 -05:00
Brooklyn Nicholson 876bb60044 fix(tui): trim whitespace-only selection chrome
- clamp selection highlight to real row content so blank drag margins do not render or copy
- keep successful copy actions quiet while preserving usage and failure feedback
2026-04-25 15:07:29 -05:00
Brooklyn Nicholson a68793b6c4 refactor(tui): share remote shell detection
Reuse the platform helper for SSH-aware copy hints so hotkey display and input handling cannot drift.
2026-04-25 14:55:28 -05:00
Brooklyn Nicholson bcc5362432 fix(tui): honor client copy shortcut over ssh
- accept forwarded Cmd+C for selection copy in SSH sessions even when Hermes runs on Linux
- keep local Linux Alt+C from acting as copy and update TUI hotkey hints for remote shells
2026-04-25 14:44:39 -05:00
brooklyn! 283c8fd6e2 Merge pull request #15755 from NousResearch/bb/tui-model-flag
fix(tui): honor launch model overrides
2026-04-25 14:30:26 -05:00
Brooklyn Nicholson 919274b60e fix(tui): align overlay q shortcut casing
Keep shared overlay close behavior consistent with pager and agents overlays by binding lowercase q only.
2026-04-25 14:26:35 -05:00
Brooklyn Nicholson 6e83d90eb4 refactor(tui): tighten overlay helpers
- rename overlay help text component to match its role
- share picker window math across model, session, and skills overlays
2026-04-25 14:23:45 -05:00
Brooklyn Nicholson c6fdf48b79 fix(tui): sync inference model after switches
- keep HERMES_INFERENCE_MODEL aligned with HERMES_MODEL after in-TUI model switches
- clarify static provider detection remapping docs
2026-04-25 14:17:57 -05:00
Brooklyn Nicholson a046483e86 fix(tui): share overlay close controls
- add reusable overlay key and help-text helpers for picker-style overlays
- make model, session, skills, and pager hints consistently support Esc/q close behavior
2026-04-25 14:17:04 -05:00
Brooklyn Nicholson fdcbd2257b fix(tui): resolve startup model aliases statically
- expand short model aliases like sonnet/opus via static catalogs during startup runtime resolution
- keep startup alias resolution network-free and add regression tests in models and tui gateway suites
2026-04-25 14:13:02 -05:00
Brooklyn Nicholson 48bdd2445e fix(tui): apply ui-tui fix pass and restore type-check
- run the requested ui-tui lint+format pass and include resulting formatting updates
- guard text-measure cache eviction key in hermes-ink so ui-tui type-check stays green
2026-04-25 14:08:54 -05:00
Brooklyn Nicholson 5e52011de3 fix(tui): bind provider as model alias 2026-04-25 13:58:59 -05:00
Brooklyn Nicholson e48a497d16 fix(tui): share static model detection 2026-04-25 13:56:16 -05:00
Brooklyn Nicholson 2dfcc8087a fix(tui): avoid network lookup during startup 2026-04-25 13:47:18 -05:00
Brooklyn Nicholson 4db58d45d4 fix(tui): address startup provider review 2026-04-25 13:29:15 -05:00
Brooklyn Nicholson 57b43fdd4b fix(tui): preserve provider precedence on startup 2026-04-25 13:25:43 -05:00
Brooklyn Nicholson e9c47c7042 fix(tui): honor launch model overrides 2026-04-25 13:21:59 -05:00
brooklyn! ee0728c6c4 Merge pull request #15351 from helix4u/fix/tui-rebuild-missing-ink-bundle
fix(tui): rebuild when ink bundle is missing
2026-04-25 13:14:23 -05:00
codez 9daa0620a6 fix(agent): ordering fix in _copy_reasoning_content_for_api — cross-provider reasoning isolation
Fix logic-ordering bug where normalized_reasoning promotion returns
before the DeepSeek/Kimi needs_empty_reasoning guard, causing
cross-provider reasoning content (MiniMax → DeepSeek) to leak into
reasoning_content and trigger HTTP 400.

Changes:
- Reorder branching: existing reasoning_content check first
- Add 'not has_reasoning' guard so poisoned histories (no reasoning)
  still get '' injected for DeepSeek/Kimi
- Healthy same-provider reasoning promotion path unchanged

Refs: #15250, #15213
2026-04-26 02:04:52 +08:00
kshitij 648b89911f fix: use output_text for assistant message content in Codex Responses API (#15690)
The Codex Responses API rejects input_text inside assistant messages —
only output_text and refusal are valid content types for assistant role.

_chat_content_to_responses_parts() previously hardcoded all text content
to input_text regardless of the message role. When an assistant message
had list-format content (multimodal or structured), this produced invalid
input_text parts that the API rejected with:

  Invalid value: 'input_text'. Supported values are: 'output_text' and 'refusal'.

Fix: add a role parameter to _chat_content_to_responses_parts() that
selects output_text for assistant messages and input_text for user
messages. Thread this through _chat_messages_to_responses_input() and
_preflight_codex_input_items().

Fixes #15687
2026-04-25 10:13:29 -07:00
kshitijk4poor 7c17accb29 fix: /stop now immediately aborts streaming retry loop
When a user sends /stop during a streaming API call, the outer poll loop
detects _interrupt_requested and closes the HTTP connection. However, the
inner _call() thread catches the connection error and enters its retry
loop — opening a FRESH connection without checking the interrupt flag.

On slow providers like ollama-cloud, each retry attempt blocks for the
full stream-read timeout (120s+). With 3 retry attempts this caused
510+ second delays between /stop and actual response — the agent appeared
completely unresponsive despite the stop being acknowledged.

Fix: add an _interrupt_requested check at the top of the streaming retry
loop so the agent exits immediately instead of retrying.

Also fix log truncation: all session key logging in gateway/run.py used
[:20] or [:30] slices, which truncated 'agent:main:telegram:dm:5690190437'
(33 chars) to 'agent:main:telegram:' — losing the identifying chat type
and user ID. Replace with full keys to make logs debuggable.

Reported by user Sidharth Pulipaka via Telegram on ollama-cloud provider.
2026-04-25 09:51:39 -07:00
Teknium 5006b2204b fix(update): honor RestartSec when polling for gateway respawn (#15707)
The post-graceful-drain is-active poll used a fixed 10s timeout, but
systemd's hermes-gateway.service has RestartSec=30 — so systemd won't
respawn the unit for 30s after exit-75, and our poll gives up during
the cooldown. Result: every 'hermes update' printed

  ⚠ hermes-gateway drained but didn't relaunch — forcing restart

followed by a redundant 'systemctl restart' that kicked the newly-
respawning gateway again (and re-started WhatsApp / Discord a second
time in the process).

Fix: read RestartUSec from the unit via 'systemctl show' and set the
poll budget to max(10s, RestartSec + 10s slack). Units without
RestartSec set (or value=infinity) fall back to the original 10s.

Observed timeline from journalctl before fix:
  08:56:22.262  old PID exits 75
  08:56:32.707  systemd logs Stopped -> Started  (10.4s gap, > 10s budget)

After fix the poll covers 40s — comfortably inside RestartSec + slack.

Validation:
- RestartUSec parser tested against '30s', '100ms', '1min 30s',
  'infinity', '', 'garbage', '500us', '2min' — all correct.
- Against the live hermes-gateway.service: parses to 30.0s.
- tests/hermes_cli/test_update_gateway_restart.py: 41/41 pass.
2026-04-25 09:08:27 -07:00
Teknium a9fa73a620 feat(oneshot): add --model / --provider / HERMES_INFERENCE_MODEL (#15704)
Makes hermes -z usable by sweeper without mutating user config.

- Top-level -m/--model and --provider flags that apply to -z/--oneshot
  (mirrors hermes chat's plumbing).
- HERMES_INFERENCE_MODEL env var as the parallel to HERMES_INFERENCE_PROVIDER
  for CI / scripted invocations.
- resolve_runtime_provider() gets the requested provider; when --model is
  given without --provider, detect_provider_for_model() auto-selects the
  provider that serves it (same semantic as /model in an interactive session).
- --provider without --model errors out with exit 2 — carrying a config
  model across to a different provider is usually wrong, and silently
  picking the provider's catalog default hides the mismatch.

Config defaults still used when both flags are omitted (existing behavior).

Validation (all live against OpenRouter):
  -z 'x' ....................... uses config default (opus-4.7)
  -z 'x' --model haiku-4.5 ..... haiku-4.5 via auto-detected openrouter
  -z 'x' --model ... --provider  pair as given
  HERMES_INFERENCE_MODEL=... -z  haiku-4.5 via env var
  -z 'x' --provider anthropic .. exits 2 with error to stderr
2026-04-25 08:55:36 -07:00
Teknium 7c8c031f60 feat: add hermes -z <prompt> one-shot mode (#15702)
* feat: add `hermes -z <prompt>` one-shot mode

Top-level flag that runs a single prompt and prints ONLY the final
response text to stdout. No banner, no spinner, no tool previews, no
session_id line — stdout is machine-readable, stderr is silent.

Tools, memory, rules, and AGENTS.md in the CWD are loaded as normal.
Approvals are auto-bypassed (sets HERMES_YOLO_MODE=1 for the call).
Bypasses cli.py entirely — goes straight to AIAgent.chat().

* feat(oneshot): handle interactive-callback gaps explicitly

Document (and where needed, patch) the interactive surfaces that have
no user to answer in oneshot mode:

  - clarify       — inject a callback that tells the agent to pick the
                    best default and continue (previously returned a
                    generic 'not available in this execution context'
                    error that wastes a tool call)
  - sudo password — terminal_tool already gates on HERMES_INTERACTIVE
                    (we don't set it); sudo fails gracefully
  - shell hooks   — HERMES_ACCEPT_HOOKS=1 auto-approves; also falls
                    back to deny on non-tty stdin
  - dangerous cmd — HERMES_YOLO_MODE=1 short-circuits before input()
  - secret capture— tool returns gracefully when no callback wired

Live-tested: agent asked clarify(['red','blue']) and got 'red' back,
replied with only 'red'.
2026-04-25 08:44:38 -07:00
Teknium ea01bdcebe refactor(memory): remove flush_memories entirely (#15696)
The AIAgent.flush_memories pre-compression save, the gateway
_flush_memories_for_session, and everything feeding them are
obsolete now that the background memory/skill review handles
persistent memory extraction.

Problems with flush_memories:

- Pre-dates the background review loop.  It was the only memory-save
  path when introduced; the background review now fires every 10 user
  turns on CLI and gateway alike, which is far more frequent than
  compression or session reset ever triggered flush.
- Blocking and synchronous.  Pre-compression flush ran on the live agent
  before compression, blocking the user-visible response.
- Cache-breaking.  Flush built a temporary conversation prefix
  (system prompt + memory-only tool list) that diverged from the live
  conversation's cached prefix, invalidating prompt caching.  The
  gateway variant spawned a fresh AIAgent with its own clean prompt
  for each finalized session — still cache-breaking, just in a
  different process.
- Redundant.  Background review runs in the live conversation's
  session context, gets the same content, writes to the same memory
  store, and doesn't break the cache.  Everything flush_memories
  claimed to preserve is already covered.

What this removes:

- AIAgent.flush_memories() method (~248 LOC in run_agent.py)
- Pre-compression flush call in _compress_context
- flush_memories call sites in cli.py (/new + exit)
- GatewayRunner._flush_memories_for_session + _async_flush_memories
  (and the 3 call sites: session expiry watcher, /new, /resume)
- 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks,
  hermes tools UI task list, auxiliary_client docstrings
- _memory_flush_min_turns config + init
- #15631's headroom-deduction math in
  _check_compression_model_feasibility (headroom was only needed
  because flush dragged the full main-agent system prompt along;
  the compression summariser sends a single user-role prompt so
  new_threshold = aux_context is safe again)
- The dedicated test files and assertions that exercised
  flush-specific paths

What this renames (with read-time backcompat on sessions.json):

- SessionEntry.memory_flushed -> SessionEntry.expiry_finalized.
  The session-expiry watcher still uses the flag to avoid re-running
  finalize/eviction on the same expired session; the new name
  reflects what it now actually gates.  from_dict() reads
  'expiry_finalized' first, falls back to the legacy 'memory_flushed'
  key so existing sessions.json files upgrade seamlessly.

Supersedes #15631 and #15638.

Tested: 383 targeted tests pass across run_agent/, agent/, cli/,
and gateway/ session-boundary suites.  No behavior regressions —
background memory review continues to handle persistent memory
extraction on both CLI and gateway.
2026-04-25 08:21:14 -07:00
kshitijk4poor d635e2df3f fix(compression): pass provider to context length resolver in feasibility check
_check_compression_model_feasibility calls get_model_context_length
without provider=, so Codex OAuth users get 1,050,000 (from models.dev
for 'openai') instead of the actual 272,000 limit. This happens because
_infer_provider_from_url maps chatgpt.com → 'openai' (not 'openai-codex'),
skipping the Codex-specific resolution branch entirely.

Result: compression threshold set at 85% of 1.05M = 892K — conversations
never trigger compression, the context grows unbounded, and when gateway
hygiene eventually forces compression, the Codex endpoint drops the
oversized streaming request ('peer closed connection without sending
complete message body').

Fix: forward self.provider to get_model_context_length so provider-
specific resolution branches (Codex OAuth 272K, Copilot live /models,
Nous suffix-match) fire correctly.

Reported by user on GPT 5.5 via Codex OAuth Pro (paste.rs/vsra3).
2026-04-25 07:09:47 -07:00
Teknium cf2fabc40f docs(dashboard): document page-scoped plugin slots (#15662)
Follow-up to PR #15658. The feature PR introduced page-scoped slots
(<page>:top / <page>:bottom inside every built-in page) but only
touched the Shell slots catalogue. Adds proper narrative coverage so
plugin authors find the feature.

Changes
- extending-the-dashboard.md:
  - Frontmatter description + intro bullet now mention page-scoped slots
  - New TOC entry "Augmenting built-in pages (page-scoped slots)"
  - New dedicated subsection after "Replacing built-in pages"
    explaining the heavy-vs-light tradeoff, listing the pages that
    expose slots, and showing a worked manifest + IIFE example with
    tab.hidden: true
  - Cross-link from the tab.override section pointing readers to the
    lighter augmentation option
- web-dashboard.md:
  - Bullet mentioning "page-scoped slots (inject widgets into
    built-in pages without overriding them)"

Validation
- TOC anchor "#augmenting-built-in-pages-page-scoped-slots" matches
  the generated heading slug
- Code fences balanced (64, even)
- Pre-existing docusaurus build errors (skills.json, api-server.md
  link) reproduce on bare main -- not introduced here
2026-04-25 06:59:24 -07:00
Teknium af22421e87 feat(dashboard): page-scoped plugin slots for built-in pages (#15658)
* fix(terminal): three-layer defense against watch_patterns notification spam

Background processes that stack notify_on_complete=True with watch_patterns
can flood the user with duplicate, delayed notifications — matches deliver
asynchronously via the completion queue and continue arriving minutes after
the process has exited. The docstring warning against this (PR #12113) has
proven insufficient; agents still misuse the combination.

Three layered defenses, each sufficient on its own:

1. Mutual exclusion (terminal_tool.py): When both flags are set on a
   background process, drop watch_patterns with a warning. notify_on_complete
   wins because 'let me know when it's done' is the more useful signal and
   fires exactly once. Extracted as _resolve_notification_flag_conflict() so
   the rule is testable in isolation.

2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now
   bails the moment session.exited is True. Post-exit chunks (buffered reads
   draining after the process is gone) no longer produce notifications. This
   is the fix flagged as future work in session 20260418_020302_79881c.

3. Global circuit breaker (process_registry.py): Per-session rate limits don't
   catch the sibling-flood case — N concurrent processes can each stay under
   8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap
   trips a 30-second cooldown across ALL sessions, emits a single
   watch_overflow_tripped event, silently counts dropped events, and emits a
   watch_overflow_released summary when the cooldown ends.

Also updates the tool schema + docstring to document the new behavior.

Tests: 8 new tests covering all three fixes (suppress-after-exit x2,
mutual-exclusion resolver x4, global breaker trip/cooldown/release x2).
All 60 tests across test_watch_patterns.py, test_notify_on_complete.py,
test_terminal_tool.py pass.

Real-world trigger: self-inflicted in session 20260425_051924 — three
concurrent hermes-sweeper review subprocesses each set watch_patterns=
['failed validation', 'errored'] AND notify_on_complete=True, then iterated
over multiple items, producing enough matches per process to defeat the
per-session cap while staying under the global cap that didn't yet exist.

* fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion

Per Teknium's direction, the watch_patterns rate limit is now much more
aggressive and self-healing.

## New rule — per session

- HARD cap: 1 watch-match notification per 15 seconds per process.
- Any match arriving inside the cooldown window is dropped and counts as
  ONE strike for that window (many drops in the same window still = 1 strike).
- After 3 consecutive strike windows, watch_patterns is permanently disabled
  for the session and the session is auto-promoted to notify_on_complete
  semantics — exactly one notification when the process actually exits.
- A cooldown window that expires with zero drops resets the consecutive
  strike counter — healthy cadence is forgiven.

## Schema + docstring rewritten

The tool schema description now gives the model explicit guidance:
- notify_on_complete is 'the right choice for almost every long-running task'
- watch_patterns is for RARE one-shot signals on LONG-LIVED processes
- Do NOT use watch_patterns with loops/batch jobs — error patterns fire every
  iteration and will hit the strike limit fast
- Mutual exclusion is stated on both parameter descriptions
- 1/15s cooldown and 3-strike promotion are stated in the watch_patterns
  description so the model sees the contract every turn

## Removed

- WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the
  new 1/15s limit subsumes both; keeping them would double-count.
- _watch_window_hits / _watch_window_start / _watch_overload_since fields
  on ProcessSession. Replaced by _watch_last_emit_at / _watch_cooldown_until
  / _watch_strike_candidate / _watch_consecutive_strikes.

## Kept

- Global circuit breaker across all sessions (15/10s → 30s cooldown) as a
  secondary safety net for concurrent siblings. Still valuable when 20
  short-lived processes each fire once — none individually violates the
  per-session limit.
- Suppress-after-exit guard.
- Mutual exclusion resolver at the tool entry point.

## Tests

- 6 new tests in TestPerSessionRateLimit covering: first match delivers,
  second in cooldown suppressed, multi-drop = single strike, 3 strikes
  disables + promotes, clean window resets counter, suppressed count
  carried to next emit.
- Global circuit breaker tests rewritten to use fresh sessions instead of
  hacking removed per-window fields.
- 50/50 watch_patterns + notify_on_complete tests pass.
- 60/60 including test_terminal_tool.py pass.

* feat(dashboard): page-scoped plugin slots for built-in pages

Dashboard plugins can now inject components into specific built-in
pages (Sessions, Analytics, Logs, Cron, Skills, Config, Env, Docs,
Chat) without overriding the whole route.

Previously, plugins could only:
  1. Add new tabs (tab.path)
  2. Replace whole built-in pages (tab.override)
  3. Inject into global shell slots (header-*, footer-*, pre-main, ...)

None of those let a plugin add a banner, card, or widget to an
existing page. The new <page>:top / <page>:bottom slots close that
gap, reusing the existing registerSlot() API.

Changes
- web/src/plugins/slots.ts: 18 new KNOWN_SLOT_NAMES entries
  (sessions:top, sessions:bottom, analytics:top, ..., chat:bottom),
  grouped under "Shell-wide" vs "Page-scoped" in the docblock
- web/src/pages/*: each built-in page now renders
    <PluginSlot name="<page>:top" />
  as the first child of its outer wrapper and
    <PluginSlot name="<page>:bottom" />
  as the last child -- zero visual cost when no plugin registers
- plugins/example-dashboard: registers a demo banner into
  sessions:top via registerSlot(), with matching slots entry in
  the manifest -- so freshly-setup users can see what page-scoped
  slots look like without writing any plugin code
- website/docs: new "Page-scoped slots" table in the plugin
  authoring guide, with a worked example
- tests/hermes_cli/test_web_server.py: round-trip test for
  colon-bearing slot names (sessions:top, analytics:bottom, ...)

Validation
- npm run build: clean (tsc -b + vite build, 2761 modules)
- scripts/run_tests.sh tests/hermes_cli/test_web_server.py::TestDashboardPluginManifestExtensions: 5/5 pass
2026-04-25 06:55:35 -07:00
Teknium 97d54f0e4d fix(terminal): three-layer defense against watch_patterns notification spam (#15642)
* fix(terminal): three-layer defense against watch_patterns notification spam

Background processes that stack notify_on_complete=True with watch_patterns
can flood the user with duplicate, delayed notifications — matches deliver
asynchronously via the completion queue and continue arriving minutes after
the process has exited. The docstring warning against this (PR #12113) has
proven insufficient; agents still misuse the combination.

Three layered defenses, each sufficient on its own:

1. Mutual exclusion (terminal_tool.py): When both flags are set on a
   background process, drop watch_patterns with a warning. notify_on_complete
   wins because 'let me know when it's done' is the more useful signal and
   fires exactly once. Extracted as _resolve_notification_flag_conflict() so
   the rule is testable in isolation.

2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now
   bails the moment session.exited is True. Post-exit chunks (buffered reads
   draining after the process is gone) no longer produce notifications. This
   is the fix flagged as future work in session 20260418_020302_79881c.

3. Global circuit breaker (process_registry.py): Per-session rate limits don't
   catch the sibling-flood case — N concurrent processes can each stay under
   8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap
   trips a 30-second cooldown across ALL sessions, emits a single
   watch_overflow_tripped event, silently counts dropped events, and emits a
   watch_overflow_released summary when the cooldown ends.

Also updates the tool schema + docstring to document the new behavior.

Tests: 8 new tests covering all three fixes (suppress-after-exit x2,
mutual-exclusion resolver x4, global breaker trip/cooldown/release x2).
All 60 tests across test_watch_patterns.py, test_notify_on_complete.py,
test_terminal_tool.py pass.

Real-world trigger: self-inflicted in session 20260425_051924 — three
concurrent hermes-sweeper review subprocesses each set watch_patterns=
['failed validation', 'errored'] AND notify_on_complete=True, then iterated
over multiple items, producing enough matches per process to defeat the
per-session cap while staying under the global cap that didn't yet exist.

* fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion

Per Teknium's direction, the watch_patterns rate limit is now much more
aggressive and self-healing.

## New rule — per session

- HARD cap: 1 watch-match notification per 15 seconds per process.
- Any match arriving inside the cooldown window is dropped and counts as
  ONE strike for that window (many drops in the same window still = 1 strike).
- After 3 consecutive strike windows, watch_patterns is permanently disabled
  for the session and the session is auto-promoted to notify_on_complete
  semantics — exactly one notification when the process actually exits.
- A cooldown window that expires with zero drops resets the consecutive
  strike counter — healthy cadence is forgiven.

## Schema + docstring rewritten

The tool schema description now gives the model explicit guidance:
- notify_on_complete is 'the right choice for almost every long-running task'
- watch_patterns is for RARE one-shot signals on LONG-LIVED processes
- Do NOT use watch_patterns with loops/batch jobs — error patterns fire every
  iteration and will hit the strike limit fast
- Mutual exclusion is stated on both parameter descriptions
- 1/15s cooldown and 3-strike promotion are stated in the watch_patterns
  description so the model sees the contract every turn

## Removed

- WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the
  new 1/15s limit subsumes both; keeping them would double-count.
- _watch_window_hits / _watch_window_start / _watch_overload_since fields
  on ProcessSession. Replaced by _watch_last_emit_at / _watch_cooldown_until
  / _watch_strike_candidate / _watch_consecutive_strikes.

## Kept

- Global circuit breaker across all sessions (15/10s → 30s cooldown) as a
  secondary safety net for concurrent siblings. Still valuable when 20
  short-lived processes each fire once — none individually violates the
  per-session limit.
- Suppress-after-exit guard.
- Mutual exclusion resolver at the tool entry point.

## Tests

- 6 new tests in TestPerSessionRateLimit covering: first match delivers,
  second in cooldown suppressed, multi-drop = single strike, 3 strikes
  disables + promotes, clean window resets counter, suppressed count
  carried to next emit.
- Global circuit breaker tests rewritten to use fresh sessions instead of
  hacking removed per-window fields.
- 50/50 watch_patterns + notify_on_complete tests pass.
- 60/60 including test_terminal_tool.py pass.
2026-04-25 06:41:58 -07:00
Teknium 6e561ffa6d fix(update): poll is-active instead of one-shot sleep(3) after gateway restart (#15639)
The auto-restart path in `hermes update` verifies systemd unit health with
`time.sleep(3)` + a single `systemctl is-active` call.  The unit's
Stopped -> Started transition after a graceful SIGUSR1 exit (or a hard
restart) is not always complete inside that 3s window, so the verify
races and reports 'drained but didn't relaunch' even though systemd is
about to bring the unit back up a fraction of a second later.  Users
then see a spurious warning, a redundant fallback `systemctl restart`
fires, and adapters (Discord, WhatsApp) get restarted twice.

Replace the three sleep+oneshot sites with a small `_wait_for_service_active()`
closure that polls `is-active` every 0.5s for up to 10s.  Behaviour
is unchanged when the unit is healthy or truly dead — only the race
window around a clean restart is now handled correctly.

Tests: tests/hermes_cli/test_update_gateway_restart.py (41/41).
2026-04-25 06:11:22 -07:00
Teknium ac05daa189 fix(tools): dedupe bundled plugin toolsets with built-in entries (#15634)
`hermes tools` → "reconfigure existing" listed Spotify twice because
the Apr 24 refactor that moved Spotify into plugins/spotify/ (PR #15174)
left the entry in CONFIGURABLE_TOOLSETS. _get_effective_configurable_toolsets()
unconditionally appended get_plugin_toolsets() on top, so the same
'spotify' key showed up from both sources.

Dedupe by key — built-in CONFIGURABLE_TOOLSETS entry wins (it has the
nicer label and description). Also guards against future bundled plugins
that share a toolset key with a built-in.
2026-04-25 05:53:08 -07:00
Teknium 3c1c65e754 fix(auxiliary): generalize unsupported-parameter detector and harden max_tokens retry (#15633)
Generalize the temperature-specific 400 retry that shipped in PR #15621 so
the same reactive strategy covers any provider that rejects an arbitrary
request parameter —  — not just temperature.

- agent/auxiliary_client.py:
  * New _is_unsupported_parameter_error(exc, param): matches the same six
    phrasings the old temperature detector did plus 'unrecognized parameter'
    and 'invalid parameter', against any named param.
  * _is_unsupported_temperature_error is now a thin back-compat wrapper so
    existing imports and tests keep working.
  * The max_tokens → max_completion_tokens retry branch in call_llm and
    async_call_llm now (a) gates on 'max_tokens is not None' so we do not
    pop a key that was never set and silently substitute a None value on
    the retry, and (b) also matches the generic helper in addition to the
    legacy 'max_tokens' / 'unsupported_parameter' substring checks — picking
    up phrasings like 'Unknown parameter: max_tokens' that previously slipped
    through.

- tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering
  the generic detector across params, the back-compat wrapper, and the two
  hardenings to the max_tokens retry branch (None gate + generic phrasing).

Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR
also proposed the reactive temperature retry which landed independently via
PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages
the remaining hardening ideas onto current main.
2026-04-25 05:50:34 -07:00
Teknium f92006ce1c fix(compression): reserve system+tools headroom when aux binds threshold (#15631)
When the auxiliary compression model's context is smaller than the main
model's compression threshold, _check_compression_model_feasibility
auto-lowers the session threshold. Previously it set:

    new_threshold = aux_context

This let the raw message list grow to exactly aux_context tokens. But
compression and flush_memories actually send system_prompt + tool_schemas
+ messages to the aux model. With 50+ tools that overhead is 25-30K
tokens, so the full request overflowed aux with HTTP 400.

Subtract a headroom estimate from aux_context before setting the new
threshold: the actual tool-schema token count (from
estimate_request_tokens_rough) plus a 12K allowance for the system
prompt (not yet built at __init__ time) and flush-instruction overhead.
Clamp to MINIMUM_CONTEXT_LENGTH so the session still starts even with
an unusually heavy tool schema.

This fixes the 'flush_memories overflow on busy toolsets' path that
Teknium flagged — where main and aux can be nominally the same model
but still 400 because the threshold left no room for the request
overhead. Same fix also protects the normal compression summarisation
request on the same binding aux.

Tests: two new regression tests cover the headroom reservation and the
MINIMUM_CONTEXT_LENGTH floor. Two existing tests updated for the new
(lower) threshold values now that empty-tools still produces a 12K
static headroom deduction.
2026-04-25 05:41:56 -07:00
Teknium b35d692f45 chore(release): map ash@users.noreply.github.com to ash 2026-04-25 05:27:17 -07:00
Ash Rowan Vale 🌿 facea84559 fix(auxiliary): retry without temperature when any provider rejects it
Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature'
across all providers/models — not just Codex Responses.

The same backend can accept temperature for some models and reject it for
others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI
endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and
Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does
not scale.

call_llm / async_call_llm now detect the concrete 'unsupported parameter:
temperature' 400 and transparently retry once without temperature. Kimi's
server-managed omission and Opus 4.7+'s proactive strip stay in place —
this is the safety net for everything else.

Changes:
- agent/auxiliary_client.py: add _is_unsupported_temperature_error helper;
  wire into both sync and async call_llm paths before the existing
  max_tokens/payment/auth retry ladder
- tests/agent/test_unsupported_temperature_retry.py: 19 tests covering
  detector phrasings, sync + async retry, no-retry-without-temperature,
  and non-temperature 400s not triggering the retry

Builds on PR #15620 (codex_responses fallback) which stripped temperature
up front for that one api_mode. This PR closes the gap for every other
provider/model combo via reactive retry.

Credit: retry approach and detector originate from @BlueBirdBack's PR #15578.

Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>
2026-04-25 05:27:17 -07:00
Teknium f67a61dc93 fix(flush_memories): strip temperature from codex_responses fallback (#15620)
The memory-flush fallback for api_mode='codex_responses' was unconditionally
adding `temperature` to codex_kwargs before calling _run_codex_stream. The
Responses API does not accept temperature on any supported backend:

- chatgpt.com/backend-api/codex rejects it outright
- api.openai.com + gpt-5/o-series reasoning models reject it
- Copilot Responses rejects it on reasoning models

The CodexAuxiliaryClient adapter and the codex_responses transport both
correctly omit temperature — the flush fallback was the only path putting
it back. On errors from the primary aux path (e.g. expired OAuth token),
users saw `⚠ Auxiliary memory flush failed: HTTP 400: Unsupported parameter:
temperature`.

Reported by Garik [NOUS] on GPT-5.5 via Codex OAuth Pro.
2026-04-25 05:01:25 -07:00
Teknium 6ed37e0f42 feat(tools): make discord/discord_admin opt-in, Discord-only
Both discord (read/participate) and discord_admin (server admin) are now
configurable via `hermes tools` with default-OFF. Previously the core
discord tool (fetch_messages, search_members, create_thread) auto-loaded
on every Discord install with DISCORD_BOT_TOKEN set — 19 tools the user
never opted into.

Adds a platform-scoping mechanism (_TOOLSET_PLATFORM_RESTRICTIONS) so
the discord toolsets only show up in the Discord platform's checklist,
not on CLI/Telegram/Slack/etc. Applied at four gates:
  - _prompt_toolset_checklist: checklist filter
  - _get_platform_tools: resolution filter (both branches)
  - _save_platform_tools: save-time filter (covers 'Configure all
    platforms' and hand-edited config.yaml)
  - tools_disable_enable_command: rejects `hermes tools enable discord`
    on non-Discord platforms with a clear error

build_session_context_prompt now injects the Discord IDs block only
when both conditions hold: the discord/discord_admin toolset is
enabled AND DISCORD_BOT_TOKEN is set. Toolset alone isn't enough —
the tool's check_fn gates on the token at registry time, so opting
in without a token yields no tools and the IDs block would lie.
Otherwise keep the stale-API disclaimer.
2026-04-25 04:51:11 -07:00
alt-glitch 591deeb928 feat(session): inject Discord IDs block when discord tool is loaded
When DISCORD_BOT_TOKEN is set — meaning the discord tool actually
loads — emit a dedicated IDs block in the session context prompt so
the agent can call ``fetch_messages``, ``pin_message``, etc. with
real identifiers instead of probing.

Currently only ``thread_id`` was exposed as a raw ID (via the
``description`` string).  The agent in a Discord thread had to guess
that the thread ID doubles as a channel ID for the REST API (it
does), and it had no way to reference the parent channel, the guild,
or the triggering message at all.

The block adapts to context:

  - Thread:     guild / parent channel / thread / message
  - Channel:    guild / channel / message
  - (DM has no guild/channel IDs worth listing; only message)

Discord isn't in _PII_SAFE_PLATFORMS, so IDs ship unredacted.
2026-04-25 04:51:11 -07:00
alt-glitch 5ae07e7b5c fix(session): gate stale "no Discord APIs" note on DISCORD_BOT_TOKEN
The Discord platform note in the session context prompt claimed the
agent has no server-management APIs — pre-dating the discord tool.
With a bot token configured the agent actually has fetch_messages,
search_members, create_thread, and optionally the discord_admin tool;
telling the model otherwise causes it to refuse or apologise for
calls it is fully able to make.

Gate the disclaimer on DISCORD_BOT_TOKEN being unset, matching the
tool's own ``check_fn``.  Without a token the note still appears and
remains accurate; with a token the model is no longer gaslit into
refusing valid tool calls.
2026-04-25 04:51:11 -07:00
alt-glitch 47b02e961c feat(discord): populate guild_id, parent_chat_id, message_id on SessionSource
Discord knows all four identifiers for every inbound message — guild,
channel (or thread), parent channel when in a thread, and the
triggering message.  Pass them into ``SessionSource`` via the new
``build_source()`` kwargs so downstream code (context-prompt builder,
delivery, logging) can use them without re-resolving from discord.py
objects.

For auto-threaded messages, remember the original channel as the
parent before swapping ``chat_id`` to the freshly created thread.

Behavioural: still a no-op — nothing consumes these fields yet.
2026-04-25 04:51:11 -07:00
alt-glitch 0702231dd8 feat(session): add guild_id/parent_chat_id/message_id to SessionSource
Groundwork for injecting raw platform identifiers into the agent's
system prompt.  Currently only `thread_id` is exposed as a raw ID —
callers in a Discord thread had to guess `channel_id == thread_id`
(which happens to work because threads are channels in Discord's REST
API) and had no way to reference the parent channel, guild, or the
triggering message.

Adds three optional fields:

- `guild_id` — Discord guild / Slack workspace / Matrix server scope
- `parent_chat_id` — parent channel when chat_id refers to a thread
- `message_id` — ID of the triggering message (pin/reply/react)

Extends `BasePlatformAdapter.build_source()` to accept + forward them
and teaches `to_dict`/`from_dict` to serialize them.  Behaviourally a
no-op: nothing reads the fields yet and they default to None.
2026-04-25 04:51:11 -07:00
alt-glitch db09477b77 feat(feishu): wire feishu doc/drive tools into hermes-feishu composite
The feishu_doc and feishu_drive tools were registered in the tool
registry but never added to the hermes-feishu composite toolset.
The pipeline fix from the prior commit now recovers them automatically
once they are in the composite.
2026-04-25 04:50:14 -07:00
alt-glitch 81987f0350 feat(discord): split discord_server into discord + discord_admin tools
Split the monolithic discord_server tool (14 actions) into two:

- discord: core actions (fetch_messages, search_members, create_thread)
  that are useful for the agent's normal operation. Auto-enabled on
  the discord platform via the pipeline fix.

- discord_admin: server management actions (list channels/roles, pins,
  role assignment) that require explicit opt-in via hermes tools.
  Added to CONFIGURABLE_TOOLSETS and _DEFAULT_OFF_TOOLSETS.
2026-04-25 04:50:14 -07:00
alt-glitch 9830905dab fix(tools): recover non-configurable toolsets from composite resolution
The reverse-mapping loop in _get_platform_tools only checked
CONFIGURABLE_TOOLSETS, silently dropping platform-specific toolsets
like discord and feishu_doc whose tools were in the composite but
had no configurable key. Add a second pass over TOOLSETS that picks
up unclaimed toolsets whose tools are present in the resolved
composite.
2026-04-25 04:50:14 -07:00
Teknium 0d548d1db9 fix(cron): wire context_from through the update action
The tool schema promised 'On update, pass an empty array to clear' but the
update branch ignored the context_from kwarg entirely — users could set
the field at create time and never modify or clear it afterward.

- tools/cronjob_tools.py: handle context_from in the update branch the
  same way script/enabled_toolsets/workdir are handled: normalize str/list
  to refs, validate each referenced job exists (same check the create
  branch does), store as list-or-None to match create_job()'s shape.
  Empty string or empty list clears the field.
- tests/cron/test_cron_context_from.py: 6 new tests covering add/change/
  clear (both shapes)/bad-ref/preserve-across-unrelated-update.
2026-04-25 04:49:28 -07:00
MorAlekss eb92222811 fix(cron): silent skip when context_from job has no output yet 2026-04-25 04:49:28 -07:00
MorAlekss e4a91ccb76 test(cron): add PermissionError coverage for context_from 2026-04-25 04:49:28 -07:00
MorAlekss 5ac5365923 feat(cron): add context_from field for cron job output chaining 2026-04-25 04:49:28 -07:00
Teknium f433197f23 feat(installer): FHS layout for root installs on Linux (#15608)
Root installs on Linux now put the code at /usr/local/lib/hermes-agent and
the hermes command at /usr/local/bin/hermes.  HERMES_HOME (~/.hermes) stays
state-only.  Matches Claude Code / Codex CLI / OpenClaw, keeps Docker
bind-mounted /root/ volumes lean, and puts the command on every shell's
default PATH without touching shell RC files.

- Non-root users and macOS root: unchanged
- Existing root installs at $HERMES_HOME/hermes-agent: preserved in-place
  (detected via .git dir) — no auto-migration, no breakage
- Explicit --dir / $HERMES_INSTALL_DIR: always wins, never overridden
- Termux: unchanged (package manager manages /data/data/...)

Requested by @souly9999 (Discord). Our own Dockerfile already uses this
split (code at /opt/hermes, data at /opt/data volume); the user-install
path now matches.
2026-04-25 04:49:16 -07:00
Teknium df485628ce chore(release): map Readon's git email to GitHub login 2026-04-25 04:49:07 -07:00
Yindong 9fde22d233 fix the reset of model change by /model. 2026-04-25 04:49:07 -07:00
alt-glitch 9d7b64b5dd fix(tools): normalize numeric entries and clear stale no_mcp in _save_platform_tools
YAML parses bare numeric toolset names (e.g. 12306:) as int, causing
TypeError in sorted() since the read path normalizes to str but the
save path did not.

The no_mcp sentinel was preserved in existing entries even when the
user re-enabled MCP servers, causing MCP to stay silently disabled.
2026-04-25 04:49:02 -07:00
vominh1919 5401a0080d fix: recalculate token budgets on model switch in ContextCompressor
update_model() recalculated threshold_tokens but left tail_token_budget
and max_summary_tokens at their __init__ values. When switching from a
200K model to 32K, the tail budget stayed at ~20K tokens (62% of 32K)
instead of the intended ~10%.

Adds budget recalculation in update_model() and 2 regression tests.
2026-04-25 15:07:56 +05:30
Teknium e5647d7863 docs: consolidate dashboard themes and plugins into Extending the Dashboard (#15530)
The web-dashboard.md and dashboard-plugins.md pages had overlapping,
partial coverage of the theme and plugin systems. Themes were split
across two pages; the plugin docs had a minimal manifest reference but
no step-by-step guide, no slot catalog, and no theme+plugin demo.

New: user-guide/features/extending-the-dashboard.md — single navigable
reference for all three extension layers (themes, UI plugins, backend
plugins). Includes:

- Theme quick-start + full schema (palette, typography, layout, layout
  variants, assets, componentStyles, colorOverrides, customCSS)
- Plugin quick-start + full schema (manifest, SDK, slots, tab.override,
  tab.hidden, backend routes, custom CSS)
- 10-slot shell catalog with locations
- Plugin discovery + load lifecycle
- Combined theme+plugin walkthrough (Strike Freedom cockpit demo)
- API reference + troubleshooting

web-dashboard.md: trimmed to core tool docs (pages, REST API, CORS,
development). Theme/plugin content now points to the new page with a
built-in themes summary table.

dashboard-plugins.md: deleted (merged into extending-the-dashboard.md).

sidebars.ts: swap 'dashboard-plugins' → 'extending-the-dashboard' under
the Management group.

No user-facing behavior change; docs-only.
2026-04-24 23:26:51 -07:00
Teknium 023b1bff11 fix(delegate): resolve subagent approval prompts without deadlocking parent TUI (#15491)
Subagents run inside a ThreadPoolExecutor. The CLI's interactive approval
callback lives in tools/terminal_tool.py's threading.local(), which worker
threads do not inherit. When a subagent hits a dangerous-command guard,
prompt_dangerous_approval() falls back to input() from the worker thread,
deadlocking against the parent's prompt_toolkit TUI that owns stdin.

Fix: install a non-interactive callback into every subagent worker thread
via ThreadPoolExecutor(initializer=set_approval_callback, initargs=(cb,)).
The callback is config-gated by delegation.subagent_auto_approve:

  false (default) -> _subagent_auto_deny (safe; matches leaf tool blocklist)
  true            -> _subagent_auto_approve (opt-in YOLO for cron/batch)

Both emit a logger.warning audit line. Gateway sessions are unaffected
because they resolve approvals via tools/approval.py's per-session queue,
not through these TLS callbacks. Diagnosis credit: @MorAlekss (#14685).

- hermes_cli/config.py: DEFAULT_CONFIG.delegation.subagent_auto_approve: False
- cli-config.yaml.example: documented, commented (default)
- tools/delegate_tool.py: _subagent_auto_deny, _subagent_auto_approve,
  _get_subagent_approval_callback, wired into the child timeout executor
- tests/tools/test_delegate.py: 7 tests covering defaults, truthy coercion,
  and TLS scoping in the worker thread
2026-04-24 22:37:22 -07:00
brooklyn! 6407b3d5b3 Merge pull request #15488 from kevin-ho/fix/tui-mouse-toggle
fix(tui): proactive mouse disable on ConPTY + /mouse toggle command
2026-04-24 22:43:47 -05:00
Teknium 0a59994030 fix(cli-config): keep delegation overrides commented in example 2026-04-24 20:38:58 -07:00
MorAlekss 0ed37c0ca4 docs(delegate): document max_concurrent_children and max_spawn_depth + cost warning 2026-04-24 20:38:58 -07:00
Vesper (on behalf of Director) 1c8ce33d51 fix(tui): proactive mouse disable on ConPTY + /mouse toggle command
On Windows WSL2, ConPTY implicitly enables mouse event injection when
the alternate screen buffer (DEC 1049) is entered, causing raw escape
sequences to appear in the transcript as ghost characters.

Fix (two parts):
1. ConPTY fix: send DISABLE_MOUSE_TRACKING immediately after entering
   alt screen when mouse tracking is off (AlternateScreen.tsx)
2. Runtime toggle: add /mouse [on|off|toggle] slash command with config
   persistence (display.tui_mouse) so users can manage this at runtime

The env var HERMES_TUI_DISABLE_MOUSE continues to work as the initial
default, but can now be overridden via /mouse and persisted to config.

Closes: upstream ConPTY mouse injection issue
Credits: OutThisLife / PR #13716 for the toggle concept
2026-04-24 20:32:12 -07:00
Clifford Garwood 2182de55bb fix(matrix): drop needless DeviceID import + mock put_device_id in tests
Two adjustments to make CI pass:

- In gateway/platforms/matrix.py: `DeviceID` is `NewType("DeviceID", str)`,
  so passing `client.device_id` directly (already a str) works identically
  at runtime. The explicit import was cosmetic and tripped CI environments
  where `mautrix.types` doesn't re-export DeviceID at the expected path
  ("cannot import name 'DeviceID' from 'mautrix.types' (unknown location)").

- In tests/gateway/test_matrix.py: add `put_device_id` to the hand-written
  `PgCryptoStore` fake so the three encryption-path tests
  (test_connect_with_access_token_and_encryption,
  test_connect_uses_configured_device_id_over_whoami,
  test_connect_registers_encrypted_event_handler_when_encryption_on) can
  exercise the new crypto-store binding without AttributeError.
2026-04-25 07:17:03 +05:30
Clifford Garwood 3cf13747b7 fix(matrix): bind PgCryptoStore device_id so fresh E2EE installs work
PgCryptoStore.__init__ defaults _device_id to "" and put_account writes
that blank value into crypto_account. The UPSERT's ON CONFLICT DO UPDATE
clause deliberately does not touch device_id, so once the row is written
blank it stays blank forever — breaking every downstream device-scoped
olm operation. Peers' to-device olm ciphertext can't match our identity
key, no megolm sessions ever land, and the user sees "hermes is in the
room but never responds to encrypted messages".

Fix: call put_device_id(client.device_id) immediately after
crypto_store.open() and before olm.load(). This sets the store's
in-memory _device_id so the first put_account INSERT writes the correct
value from the start.

Observable symptoms without the fix, on a fresh crypto.db:
  - crypto_account.device_id = ""
  - crypto_tracked_user: 0 rows
  - crypto_device: 0 rows
  - crypto_olm_session: 0 rows
  - crypto_megolm_inbound_session: 0 rows
  - "No one-time keys nor device keys got when trying to share keys"
    warning on every startup
  - "olm event doesn't contain ciphertext for this device" DecryptionError
    on any inbound to-device event
  - Encrypted room messages arrive but never decrypt

After the fix (wiped crypto.db + restart):
  - device_id populated with actual runtime device (e.g. CZIKTRFLOV)
  - all counts populate from sync as expected
  - encrypted DMs flow normally

Who hits this: anyone with a fresh crypto.db — includes first-time matrix
E2EE setup, nio→mautrix migrations (since matrix.py removes the legacy
pickle on startup, creating a fresh SQLite store), and anyone who wipes
crypto.db to start over. Existing installs that somehow already have a
non-blank device_id would be unaffected, but no prior code path writes
it correctly, so that set is likely empty.
2026-04-25 07:17:03 +05:30
Siddharth Balyan 3e61703b08 fix(nix): use --rebuild in fix-lockfiles to bypass cached FOD store paths (#15444)
* fix(nix): use --rebuild in fix-lockfiles to bypass cached FOD store paths

fix-lockfiles checked npm lockfile hashes by running
`nix build .#<attr>.npmDeps`, but fetchNpmDeps is a fixed-output
derivation — if the old store path exists locally, Nix returns it from
cache without re-fetching. This caused the script to report "ok" even
when hashes were stale, while CI (with no cache) failed with a hash
mismatch.

Adding --rebuild forces Nix to re-derive and verify the output hash
against the declared one, catching staleness regardless of local cache
state. Also updates the tui and web npm deps hashes that were stale.

* fix(nix): regenerate ui-tui lockfile to add missing @emnapi entries

npm ci was failing because @emnapi/core and @emnapi/runtime were
missing from ui-tui/package-lock.json despite being required as peer
deps by @napi-rs/wasm-runtime (via @rolldown/binding-wasm32-wasi).

Running npm install --package-lock-only adds the missing entries.
The npmDepsHash reverts to its previous value since fetchNpmDeps was
already fetching these packages as transitive dependencies.
2026-04-25 06:14:32 +05:30
Teknium 05d8f11085 fix(/model): show provider-enforced context length, not raw models.dev (#15438)
/model gpt-5.5 on openai-codex showed 'Context: 1,050,000 tokens' because
the display block used ModelInfo.context_window directly from models.dev.
Codex OAuth actually enforces 272K for the same slug, and the agent's
compressor already runs at 272K via get_model_context_length() — so the
banner + real context budget said 272K while /model lied with 1M.

Route the display context through a new resolve_display_context_length()
helper that always prefers agent.model_metadata.get_model_context_length
(which knows about Codex OAuth, Copilot, Nous caps) and only falls back
to models.dev when that returns nothing.

Fix applied to all 3 /model display sites:
  cli.py _handle_model_switch
  gateway/run.py picker on_model_selected callback
  gateway/run.py text-fallback confirmation

Reported by @emilstridell (Telegram, April 2026).
2026-04-24 17:21:38 -07:00
Teknium 13038dc747 fix(skills): ship google-workspace deps as [google] extra; make setup.py 3.9-parseable
Closes #13626.

Two follow-ups on top of the _hermes_home helper from @jerome-benoit's #12729:

1. Declare a [google] optional extra in pyproject.toml
   (google-api-python-client, google-auth-oauthlib, google-auth-httplib2) and
   include it in [all]. Packagers (Nix flake, Homebrew) now ship the deps by
   default, so `setup.py --check` does not need to shell out to pip at
   runtime — the imports succeed and install_deps() is never reached.
   This fixes the Nix breakage where pip/ensurepip are stripped.

2. Add `from __future__ import annotations` to setup.py so the PEP 604
   `str | None` annotation parses on Python 3.9 (macOS system python).
   Previously system python3 SyntaxError'd before any code ran.

install_deps() error message now also points users at the extra instead of
just the raw pip command.
2026-04-24 16:45:27 -07:00
Teknium 629e108ee2 chore(release): map jerome.benoit@sap.com to jerome-benoit 2026-04-24 16:45:27 -07:00
Jérôme Benoit c34d3f4807 fix(skills): factor HERMES_HOME resolution into shared _hermes_home helper
The three google-workspace scripts (setup.py, google_api.py, gws_bridge.py)
each had their own way of resolving HERMES_HOME:

- setup.py imported hermes_constants (crashes outside Hermes process)
- google_api.py used os.getenv inline (no strip, no empty handling)
- gws_bridge.py defined its own local get_hermes_home() (duplicate)

Extract the common logic into _hermes_home.py which:
- Delegates to hermes_constants when available (profile support, etc.)
- Falls back to os.getenv with .strip() + empty-as-unset handling
- Provides display_hermes_home() with ~/ shortening for profiles

All three scripts now import from _hermes_home instead of duplicating.

7 regression tests cover the fallback path: env var override, default
~/.hermes, empty env var, display shortening, profile paths, and
custom non-home paths.

Closes #12722
2026-04-24 16:45:27 -07:00
Teknium f14264c438 chore(release): map simbamax99@gmail.com to @simbam99 2026-04-24 16:42:31 -07:00
simbam99 19a3e2ce8e fix(gateway): follow compression continuations during /resume 2026-04-24 16:42:31 -07:00
Teknium d58b305adf refactor(deepseek-reasoning): consolidate detection into helpers + regression tests
Extracts _needs_kimi_tool_reasoning() for symmetry with the existing
_needs_deepseek_tool_reasoning() helper, so _copy_reasoning_content_for_api
uses the same detection logic as _build_assistant_message. Future changes
to either provider's signals now only touch one function.

Adds tests/run_agent/test_deepseek_reasoning_content_echo.py covering:
- All 3 DeepSeek detection signals (provider, model, host)
- Poisoned history replay (empty string fallback)
- Plain assistant turns NOT padded
- Explicit reasoning_content preserved
- Reasoning field promoted to reasoning_content
- Existing Kimi/Moonshot detection intact
- Non-thinking providers left alone

21 tests, all pass.
2026-04-24 16:38:29 -07:00
Teknium e93cc934c7 chore(release): map chenzeshi@live.com -> chen1749144759 in AUTHOR_MAP 2026-04-24 16:38:29 -07:00
chen1749144759 93a2d6b307 fix: add DeepSeek reasoning_content echo for tool-call messages
DeepSeek V4 thinking mode requires reasoning_content on every
assistant message that includes tool_calls. When this field is
missing from persisted history, replaying the session causes
HTTP 400: 'The reasoning_content in the thinking mode must be
passed back to the API.'

Two-part fix (refs #15250):

1. _copy_reasoning_content_for_api: Merge the Kimi-only and
   DeepSeek detection into a single needs_tool_reasoning_echo
   check. This handles already-poisoned persisted sessions by
   injecting an empty reasoning_content on replay.

2. _build_assistant_message: Store reasoning_content='' on new
   DeepSeek tool-call messages at creation time, preventing
   future session poisoning at the source.

Additional fix:
3. _handle_max_iterations: Add missing call to
   _copy_reasoning_content_for_api in the max-iterations flush
   path (previously only main loop and flush_memories had it).

Detection covers:
- provider == 'deepseek'
- model name containing 'deepseek' (case-insensitive)
- base URL matching api.deepseek.com (for custom provider)
2026-04-24 16:38:29 -07:00
Teknium 4fade39c90 chore(release): map benjaminsehl noreply email in AUTHOR_MAP 2026-04-24 16:04:37 -07:00
Benjamin Sehl f731c2c2bd fix(gateway/bluebubbles): align iMessage delivery with non-editable UX 2026-04-24 16:04:37 -07:00
Brian D. Evans 00c3d848d8 fix(memory): skip external-provider sync on interrupted turns (#15218)
``run_conversation`` was calling ``memory_manager.sync_all(
original_user_message, final_response)`` at the end of every turn
where both args were present.  That gate didn't consider the
``interrupted`` local flag, so an external memory backend received
partial assistant output, aborted tool chains, or mid-stream resets as
durable conversational truth.  Downstream recall then treated the
not-yet-real state as if the user had seen it complete, poisoning the
trust boundary between "what the user took away from the turn" and
"what Hermes was in the middle of producing when the interrupt hit".

Extracted the inline sync block into a new private method
``AIAgent._sync_external_memory_for_turn(original_user_message,
final_response, interrupted)`` so the interrupt guard is a single
visible check at the top of the method instead of hidden in a
boolean-and at the call site.  That also gives tests a clean seam to
assert on — the pre-fix layout buried the logic inside the 3,000-line
``run_conversation`` function where no focused test could reach it.

The new method encodes three independent skip conditions:

  1. ``interrupted`` → skip entirely (the #15218 fix).  Applies even
     when ``final_response`` and ``original_user_message`` happen to
     be populated — an interrupt may have landed between a streamed
     reply and the next tool call, so the strings on disk are not
     actually the turn the user took away.
  2. No memory manager / no final_response / no user message →
     preserve existing skip behaviour (nothing new for providerless
     sessions, system-initiated refreshes, tool-only turns that never
     resolved, etc.).
  3. Sync_all / queue_prefetch_all exceptions → swallow.  External
     memory providers are strictly best-effort; a misconfigured or
     offline backend must never block the user from seeing their
     response.

The prefetch side-effect is gated on the same interrupt flag: the
user's next message is almost certainly a retry of the same intent,
and a prefetch keyed on the interrupted turn would fire against stale
context.

### Tests (16 new, all passing on py3.11 venv)

``tests/run_agent/test_memory_sync_interrupted.py`` exercises the
helper directly on a bare ``AIAgent`` (``__new__`` pattern that the
interrupt-propagation tests already use).  Coverage:

- Interrupted turn with full-looking response → no sync (the fix)
- Interrupted turn with long assistant output → no sync (the interrupt
  could have landed mid-stream; strings-on-disk lie)
- Normal completed turn → sync_all + queue_prefetch_all both called
  with the right args (regression guard for the positive path)
- No final_response / no user_message / no memory manager → existing
  pre-fix skip paths still apply
- sync_all raises → exception swallowed, prefetch still attempted
- queue_prefetch_all raises → exception swallowed after sync succeeded
- 8-case parametrised matrix across (interrupted × final_response ×
  original_user_message) asserts sync fires iff interrupted=False AND
  both strings are non-empty

Closes #15218

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 15:30:18 -07:00
Yukipukii1 fd10463069 fix(env): safely quote ~/ subpaths in wrapped cd commands 2026-04-24 15:25:12 -07:00
sprmn24 c599a41b84 fix(auth): preserve corrupt auth.json and warn instead of silently resetting
_load_auth_store() caught all parse/read exceptions and silently
returned an empty store, making corruption look like a logout with
no diagnostic information and no way to recover the original file.

Now copies the corrupt file to auth.json.corrupt before resetting,
and logs a warning with the exception and backup path.
2026-04-24 15:22:44 -07:00
Teknium c7d62b3fe3 chore(release): map ebukau84@gmail.com -> UgwujaGeorge in AUTHOR_MAP 2026-04-24 15:22:19 -07:00
Teknium 36d68bcb82 fix(api-server): persist incomplete snapshot on asyncio.CancelledError too
Extends PR #15171 to also cover the server-side cancellation path (aiohttp
shutdown, request-level timeout) — previously only ConnectionResetError
triggered the incomplete-snapshot write, so cancellations left the store
stuck at the in_progress snapshot written on response.created.

Factors the incomplete-snapshot build into a _persist_incomplete_if_needed()
helper called from both the ConnectionResetError and CancelledError
branches; the CancelledError handler re-raises so cooperative cancellation
semantics are preserved.

Adds two regression tests that drive _write_sse_responses directly (the
TestClient disconnect path races the server handler, which makes the
end-to-end assertion flaky).
2026-04-24 15:22:19 -07:00
UgwujaGeorge a29bad2a3c fix(api-server): persist response snapshot on client disconnect when store=True 2026-04-24 15:22:19 -07:00
sprmn24 7957da7a1d fix(web_server): hold _oauth_sessions_lock during PKCE session state writes
_submit_anthropic_pkce() retrieved sess under _oauth_sessions_lock but
wrote back to sess["status"] and sess["error_message"] outside the lock.
A concurrent session GC or cancel could race with these writes, producing
inconsistent session state.

Wrap all 4 sess write sites in _oauth_sessions_lock:
- network exception path (Token exchange failed)
- missing access_token path
- credential save failure path
- success path (approved)
2026-04-24 15:22:04 -07:00
Cyprian Kowalczyk fd3864d8bd feat(cli): wrap /compress in _busy_command to block input during compression
Before this, typing during /compress was accepted by the classic CLI
prompt and landed in the next prompt after compression finished,
effectively consuming a keystroke for a prompt that was about to be
replaced. Wrapping the body in self._busy_command('Compressing
context...') blocks input rendering for the duration, matching the
pattern /skills install and other slow commands already use.

Salvages the useful part of #10303 (@iRonin). The `_compressing` flag
added to run_agent.py in the original PR was dead code (set in 3 spots,
read nowhere — not by cli.py, not by run_agent.py, not by the Ink TUI
which doesn't use _busy_command at all) and was dropped.
2026-04-24 15:21:22 -07:00
Yukipukii1 8ea389a7f8 fix(gateway/config): coerce quoted boolean values in config parsing 2026-04-24 15:20:05 -07:00
knockyai 3e6c108565 fix(gateway): honor queue mode in runner PRIORITY interrupt path
When display.busy_input_mode is 'queue', the runner-level PRIORITY block
in _handle_message was still calling running_agent.interrupt() for every
text follow-up to an active session. The adapter-level busy handler
already honors queue mode (commit 9d147f7fd), but this runner-level path
was an unconditional interrupt regardless of config.

Adds a queue-mode branch that queues the follow-up via
_queue_or_replace_pending_event() and returns without interrupting.

Salvages the useful part of #12070 (@knockyai). The config fan-out to
per-platform extra was redundant — runner already loads busy_input_mode
directly via _load_busy_input_mode().
2026-04-24 15:18:34 -07:00
Teknium e3a1a9c24d chore(release): map julia@alexland.us -> alexg0bot in AUTHOR_MAP (#15384) 2026-04-24 15:18:09 -07:00
Teknium e3697e20a6 chore(release): map iRonin personal email to GitHub login 2026-04-24 15:17:09 -07:00
Teknium ed91b79b7e fix(cli): keep Ctrl+D no-op when only attachments pending
Follow-up to @iRonin's Ctrl+D EOF fix. If the input text is empty but
the user has pending attached images, do nothing rather than exiting —
otherwise a stray Ctrl+D silently discards the attachments.
2026-04-24 15:17:09 -07:00
CK iRonin.IT 08d5c9c539 fix: Ctrl+D deletes char under cursor, only exits on empty input (bash/zsh behaviour) 2026-04-24 15:17:09 -07:00
Julia Bennet 1dcf79a864 feat: add slash command for busy input mode 2026-04-24 15:15:26 -07:00
teknium1 2de8a7a229 fix(skills): drop raw_content to avoid doubling skill payload
skill_view response went to the model verbatim; duplicating the SKILL.md
body as raw_content on every tool call added token cost with no agent-facing
benefit. Remove the field and update tests to assert on content only.

The slash/preload caller (agent/skill_commands.py) already falls back to
content when raw_content is absent, and it calls skill_view(preprocess=False)
anyway, so content is already unrendered on that path.
2026-04-24 15:15:07 -07:00
helix4u ead66f0c92 fix(skills): apply inline shell in skill_view 2026-04-24 15:15:07 -07:00
Allard 0bcbc9e316 docs(faq): Update docs on backups
- update faq answer with new `backup` command in release 0.9.0
- move profile export section together with backup section so related information can be read more easily
- add table comparison between `profile export` and `backup` to assist users if understanding the nuances between both
2026-04-24 15:14:08 -07:00
Teknium 2d444fc84d fix(run_agent): handle unescaped control chars in tool_call arguments (#15356)
Extends _repair_tool_call_arguments() to cover the most common local-model
JSON corruption pattern: llama.cpp/Ollama backends emit literal tabs and
newlines inside JSON string values (memory save summaries, file contents,
etc.). Previously fell through to '{}' replacement, losing the call.

Adds two repair passes:
  - Pass 0: json.loads(strict=False) + re-serialise to canonical wire form
  - Pass 4: escape 0x00-0x1F control chars inside string values, then retry

Ports the core utility from #12068 / PR #12093 without the larger plumbing
change (that PR also replaced json.loads at 8 call sites; current main's
_repair_tool_call_arguments is already the single chokepoint, so the
upgrade happens transparently for every existing caller).

Credit: @truenorth-lj for the original utility design.

4 new regression tests covering literal newlines, tabs, re-serialisation
to strict=True-valid output, and the trailing-comma + control-char
combination case.
2026-04-24 15:06:41 -07:00
Teknium bb53d79d26 chore(release): map q19dcp@gmail.com -> aj-nt in AUTHOR_MAP 2026-04-24 15:03:07 -07:00
AJ 17fc84c256 fix: repair malformed tool call args in streaming assembly before flagging as truncated
When the streaming path (chat completions) assembled tool call deltas and
detected malformed JSON arguments, it set has_truncated_tool_args=True but
passed the broken args through unchanged. This triggered the truncation
handler which returned a partial result and killed the session (/new required).

_many_ malformations are repairable: trailing commas, unclosed brackets,
Python None, empty strings. _repair_tool_call_arguments() already existed
for the pre-API-request path but wasn't called during streaming assembly.

Now when JSON parsing fails during streaming assembly, we attempt repair
via _repair_tool_call_arguments() before flagging as truncated. If repair
succeeds (returns valid JSON), the tool call proceeds normally. Only truly
unrepairable args fall through to the truncation handler.

This prevents the most common session-killing failure mode for models like
GLM-5.1 that produce trailing commas or unclosed brackets.

Tests: 12 new streaming assembly repair tests, all 29 existing repair
tests still passing.
2026-04-24 15:03:07 -07:00
Teknium b7c1d77e55 fix(dashboard): remove unimplemented 'block' busy_input_mode option
The web UI schema advertised 'block' as a busy_input_mode choice, but
no implementation ever existed — the gateway and CLI both silently
collapsed 'block' (and anything other than 'queue') to 'interrupt'.
Users who picked 'block' in the dashboard got interrupts anyway.

Drop 'block' from the select options. The two supported modes are
'interrupt' (default) and 'queue'.
2026-04-24 15:01:38 -07:00
luyao618 7a192b124e fix(run_agent): repair corrupted tool_call arguments before sending to provider
When a session is split by context compression mid-tool-call, an assistant
message may end up with truncated/invalid JSON in tool_calls[*].function.arguments.
On the next turn this is replayed verbatim and providers reject the entire request
with HTTP 400 invalid_tool_call_format, bricking the conversation in a loop that
cannot recover without manual session quarantine.

This patch adds a defensive sanitizer that runs immediately before
client.chat.completions.create() in AIAgent.run_conversation():

- Validates each assistant tool_calls[*].function.arguments via json.loads
- Replaces invalid/empty arguments with '{}'
- Injects a synthetic tool response (or prepends a marker to the existing one)
  so downstream messages keep valid tool_call_id pairing
- Logs each repair with session_id / message_index / preview for observability

Defense in depth: corruption can originate from compression splits, manual edits,
or plugin bugs. Sanitizing at the send chokepoint catches all sources.

Adds 7 unit tests covering: truncated JSON, empty string, None, non-string args,
existing matching tool response (no duplicate injection), non-assistant messages
ignored, multiple repairs.

Fixes #15236
2026-04-24 14:55:47 -07:00
helix4u 0738b80833 fix(tui): rebuild when ink bundle is missing 2026-04-24 15:51:38 -06:00
Teknium 4093ee9c62 fix(codex): detect leaked tool-call text in assistant content (#15347)
gpt-5.x on the Codex Responses API sometimes degenerates and emits
Harmony-style `to=functions.<name> {json}` serialization as plain
assistant-message text instead of a structured `function_call` item.
The intent never makes it into `response.output` as a function_call,
so `tool_calls` is empty and `_normalize_codex_response()` returns
the leaked text as the final content. Downstream (e.g. delegate_task),
this surfaces as a confident-looking summary with `tool_trace: []`
because no tools actually ran — the Taiwan-embassy-email bug report.

Detect the pattern, scrub the content, and return finish_reason=
'incomplete' so the existing Codex-incomplete continuation path
(run_agent.py:11331, 3 retries) gets a chance to re-elicit a proper
function_call item. Encrypted reasoning items are preserved so the
model keeps its chain-of-thought on the retry.

Regression tests: leaked text triggers incomplete, real tool calls
alongside leak-looking text are preserved, clean responses pass
through unchanged.

Reported on Discord (gpt-5.4 / openai-codex).
2026-04-24 14:39:59 -07:00
helix4u 6a957a74bc fix(memory): add write origin metadata 2026-04-24 14:37:55 -07:00
Teknium 14b27bb68c chore(release): map @tochukwuada in AUTHOR_MAP
Contributor email for PR #15161 salvage (debthemelon
<thomasgeorgevii09@gmail.com>).
2026-04-24 14:32:21 -07:00
Teknium ef9355455b test: regression coverage for checkpoint dedup and inf/nan coercion
Covers the two bugs salvaged from PR #15161:

- test_batch_runner_checkpoint: TestFinalCheckpointNoDuplicates asserts
  the final aggregated completed_prompts list has no duplicate indices,
  and keeps a sanity anchor test documenting the pre-fix pattern so a
  future refactor that re-introduces it is caught immediately.

- test_model_tools: TestCoerceNumberInfNan asserts _coerce_number
  returns the original string for inf/-inf/nan/Infinity inputs and that
  the result round-trips through strict (allow_nan=False) json.dumps.
2026-04-24 14:32:21 -07:00
debthemelon dbdefa43c8 fix: eliminate duplicate checkpoint entries and JSON-unsafe coercion
batch_runner: completed_prompts_set is already fully populated by the
time the aggregation loop runs (incremental updates happen at result
collection time), so the subsequent extend() call re-added every
completed prompt index a second time. Removed the redundant variable
and extend, and write sorted(completed_prompts_set) directly to the
final checkpoint instead.

model_tools: _coerce_number returned Python float('inf')/float('nan')
for inf/nan strings rather than the original string. json.dumps raises
ValueError for these values, so any tool call where the model emitted
"inf" or "nan" for a numeric parameter would crash at serialization.
Changed the guard to return the original string, matching the
function's documented "returns original string on failure" contract.
2026-04-24 14:32:21 -07:00
Teknium db9d6375fb feat(models): add openai/gpt-5.5 and gpt-5.5-pro to OpenRouter + Nous Portal (#15343)
Replaces gpt-5.4 / gpt-5.4-pro entries in the OpenRouter fallback snapshot
and the Nous Portal curated list. Other aggregators (Vercel AI Gateway)
and provider-native lists are unchanged.
2026-04-24 14:31:47 -07:00
helix4u 8a2506af43 fix(aux): surface auxiliary failures in UI 2026-04-24 14:31:21 -07:00
helix4u e7590f92a2 fix(telegram): honor no_proxy for explicit proxy setup 2026-04-24 14:31:04 -07:00
brooklyn! a5129c72ef Merge pull request #15337 from NousResearch/bb/tui-kawaii-default-off
fix(tui): keep default personality neutral
2026-04-24 16:23:00 -05:00
Brooklyn Nicholson 53fc10fc9a fix(tui): keep default personality neutral 2026-04-24 16:19:23 -05:00
brooklyn! 93ddff53e3 Merge pull request #15321 from NousResearch/bb/tui-inline-diff-tooltrail-order
fix(tui): render tool trail before anchored inline diffs
2026-04-24 15:20:42 -05:00
Brooklyn Nicholson de596aca1c fix(tui): render tool trail before anchored inline diffs
Inline diff segments were anchored relative to assistant narration, but the
turn details pane still rendered after streamSegments. On completion that put
the diff before the tool telemetry that produced it. When a turn has anchored
diff segments, commit the accumulated thinking/tool trail as a pre-diff trail
message, then render the diff and final summary.
2026-04-24 15:07:02 -05:00
brooklyn! 6f1eed3968 Merge pull request #15274 from NousResearch/bb/tui-null-config-guard
fix(tui): tolerate + warn on null sections in config.yaml
2026-04-24 13:02:12 -05:00
Brooklyn Nicholson e3940f9807 fix(tui): guard personality overlay when personalities is null
TUI auto-resolves `display.personality` at session init, unlike the base CLI.
If config contains `agent.personalities: null`, `_resolve_personality_prompt`
called `.get()` on None and failed before model/provider selection.
Normalize null personalities to `{}` and surface a targeted config warning.
2026-04-24 12:57:51 -05:00
Brooklyn Nicholson bfa60234c8 feat(tui): warn on bare null sections in config.yaml
Tolerating null top-level keys silently drops user settings (e.g.
`agent.system_prompt` next to a bare `agent:` line is gone). Probe at
session create, log via `logger.warning`, and surface in the boot info
under `config_warning` — rendered in the TUI feed alongside the existing
`credential_warning` banner.
2026-04-24 12:49:02 -05:00
Brooklyn Nicholson fd9b692d33 fix(tui): tolerate null top-level sections in config.yaml
YAML parses bare keys like `agent:` or `display:` as None. `dict.get(key, {})`
returns that None instead of the default (defaults only fire on missing keys),
so every `cfg.get("agent", {}).get(...)` chain in tui_gateway/server.py
crashed agent init with `'NoneType' object has no attribute 'get'`.

Guard all 21 sites with `(cfg.get(X) or {})`. Regression test covers the
null-section init path reported on Twitter against the new TUI.
2026-04-24 12:43:09 -05:00
Austin Pickett c61547c067 Merge pull request #14890 from NousResearch/bb/tui-web-chat-unified
feat(web): dashboard Chat tab — xterm.js + JSON-RPC sidecar (supersedes #12710 + #13379)
2026-04-24 10:35:43 -07:00
brooklyn! 7f0f67d5f7 Merge pull request #15266 from NousResearch/bb/fix-tui-section-toggle
fix(tui): chevrons re-toggle even when section default is expanded
2026-04-24 12:24:27 -05:00
Brooklyn Nicholson f5e2a77a80 fix(tui): chevrons re-toggle even when section default is expanded
Recovers the manual click on the details accordion: with #14968's new
SECTION_DEFAULTS (thinking/tools start `expanded`), every panel render
was OR-ing the local open toggle against `visible.X === 'expanded'`.
That pinned `open=true` for the default-expanded sections, so clicking
the chevron flipped the local state but the panel never collapsed.

Local toggle is now the sole source of truth at render time; the
useState init still seeds from the resolved visibility (so first paint
is correct) and the existing useEffect still re-syncs when the user
mutates visibility at runtime via `/details`.

Same OR-lock cleared inside SubagentAccordion (`showChildren ||
openX`) — pre-existing but the same shape, so expand-all on the
spawn tree no longer makes inner sections un-collapsible either.
2026-04-24 12:22:20 -05:00
Austin Pickett 850fac14e3 chore: address copilot comments 2026-04-24 12:51:04 -04:00
Austin Pickett 5500b51800 chore: fix lint 2026-04-24 12:32:10 -04:00
Austin Pickett 63975aa75b fix: mobile chat in new layout 2026-04-24 12:07:46 -04:00
Teknium 62c14d5513 refactor(gateway): extract WhatsApp identity helpers into shared module
Follow-up to the canonical-identity session-key fix: pull the
JID/LID normalize/expand/canonical helpers into gateway/whatsapp_identity.py
instead of living in two places. gateway/session.py (session-key build) and
gateway/run.py (authorisation allowlist) now both import from the shared
module, so the two resolution paths can't drift apart.

Also switches the auth path from module-level _hermes_home (cached at
import time) to dynamic get_hermes_home() lookup, which matches the
session-key path and correctly reflects HERMES_HOME env overrides. The
lone test that monkeypatched gateway.run._hermes_home for the WhatsApp
auth path is updated to set HERMES_HOME env var instead; all other
tests that monkeypatch _hermes_home for unrelated paths (update,
restart drain, shutdown marker, etc.) still work — the module-level
_hermes_home is untouched.
2026-04-24 07:55:55 -07:00
Keira Voss 10deb1b87d fix(gateway): canonicalize WhatsApp identity in session keys
Hermes' WhatsApp bridge routinely surfaces the same person under either
a phone-format JID (60123456789@s.whatsapp.net) or a LID (…@lid),
and may flip between the two for a single human within the same
conversation. Before this change, build_session_key used the raw
identifier verbatim, so the bridge reshuffling an alias form produced
two distinct session keys for the same person — in two places:

  1. DM chat_id — a user's DM sessions split in half, transcripts and
     per-sender state diverge.
  2. Group participant_id (with group_sessions_per_user enabled) — a
     member's per-user session inside a group splits in half for the
     same reason.

Add a canonicalizer that walks the bridge's lid-mapping-*.json files
and picks the shortest/numeric-preferred alias as the stable identity.
build_session_key now routes both the DM chat_id and the group
participant_id through this helper when the platform is WhatsApp.
All other platforms and chat types are untouched.

Expose canonical_whatsapp_identifier and normalize_whatsapp_identifier
as public helpers. Plugins that need per-sender behaviour (role-based
routing, per-contact authorization, policy gating) need the same
identity resolution Hermes uses internally; without a public helper,
each plugin would have to re-implement the walker against the bridge's
internal on-disk format. Keeping this alongside build_session_key
makes it authoritative and one refactor away if the bridge ever
changes shape.

_expand_whatsapp_aliases stays private — it's an implementation detail
of how the mapping files are walked, not a contract callers should
depend on.
2026-04-24 07:55:55 -07:00
emozilla f49afd3122 feat(web): add /api/pty WebSocket bridge to embed TUI in dashboard
Exposes hermes --tui over a PTY-backed WebSocket so the dashboard can
embed the real TUI rather than reimplement its surface. The browser
attaches xterm.js to the socket; keystrokes flow in, PTY output bytes
flow out.

Architecture:

    browser <Terminal> (xterm.js)
           │  onData ───► ws.send(keystrokes)
           │  onResize ► ws.send('\x1b[RESIZE:cols;rows]')
           │  write   ◄── ws.onmessage (PTY bytes)
           ▼
    FastAPI /api/pty (token-gated, loopback-only)
           ▼
    PtyBridge (ptyprocess) ── spawns node ui-tui/dist/entry.js ──► tui_gateway + AIAgent

Components
----------

hermes_cli/pty_bridge.py
  Thin wrapper around ptyprocess.PtyProcess: byte-safe read/write on the
  master fd via os.read/os.write (not PtyProcessUnicode — ANSI is
  inherently byte-oriented and UTF-8 boundaries may land mid-read),
  non-blocking select-based reads, TIOCSWINSZ resize, idempotent
  SIGHUP→SIGTERM→SIGKILL teardown, platform guard (POSIX-only; Windows
  is WSL-supported only).

hermes_cli/web_server.py
  @app.websocket("/api/pty") endpoint gated by the existing
  _SESSION_TOKEN (via ?token= query param since browsers can't set
  Authorization on WS upgrades). Loopback-only enforcement. Reader task
  uses run_in_executor to pump PTY bytes without blocking the event
  loop. Writer loop intercepts a custom \x1b[RESIZE:cols;rows] escape
  before forwarding to the PTY. The endpoint resolves the TUI argv
  through a _resolve_chat_argv hook so tests can inject fake commands
  without building the real TUI.

Tests
-----

tests/hermes_cli/test_pty_bridge.py — 12 unit tests: spawn, stdout,
stdin round-trip, EOF, resize (via TIOCSWINSZ + tput readback), close
idempotency, cwd, env forwarding, unavailable-platform error.

tests/hermes_cli/test_web_server.py — TestPtyWebSocket adds 7 tests:
missing/bad token rejection (close code 4401), stdout streaming,
stdin round-trip, resize escape forwarding, unavailable-platform ANSI
error frame + 1011 close, resume parameter forwarding to argv.

96 tests pass under scripts/run_tests.sh.

(cherry picked from commit 29b337bca7)

feat(web): add Chat tab with xterm.js terminal + Sessions resume button

(cherry picked from commit 3d21aee8 by emozilla, conflicts resolved
 against current main: BUILTIN_ROUTES table + plugin slot layout)

fix(tui): replace OSC 52 jargon in /copy confirmation

When the user ran /copy successfully, Ink confirmed with:

  sent OSC52 copy sequence (terminal support required)

That reads like a protocol spec to everyone who isn't a terminal
implementer. The caveat was a historical artifact — OSC 52 wasn't
universally supported when this message was written, so the TUI
honestly couldn't guarantee the copy had landed anywhere.

Today every modern terminal (including the dashboard's embedded
xterm.js) handles OSC 52 reliably. Say what the user actually wants
to know — that it copied, and how much — matching the message the
TUI already uses for selection copy:

  copied 1482 chars

(cherry picked from commit a0701b1d5a)

docs: document the dashboard Chat tab

AGENTS.md — new subsection under TUI Architecture explaining that the
dashboard embeds the real hermes --tui rather than rewriting it,
with pointers to the pty_bridge + WebSocket endpoint and the rule
'never add a parallel chat surface in React.'

website/docs/user-guide/features/web-dashboard.md — user-facing Chat
section inside the existing Web Dashboard page, covering how it works
(WebSocket + PTY + xterm.js), the Sessions-page resume flow, and
prerequisites (Node.js, ptyprocess, POSIX kernel / WSL on Windows).

(cherry picked from commit 2c2e32cc45)

feat(tui-gateway): transport-aware dispatch + WebSocket sidecar

Decouples the JSON-RPC dispatcher from its I/O sink so the same handler
surface can drive multiple transports concurrently. The PTY chat tab
already speaks to the TUI binary as bytes — this adds a structured
event channel alongside it for dashboard-side React widgets that need
typed events (tool.start/complete, model picker state, slash catalog)
that PTY can't surface.

- `tui_gateway/transport.py` — `Transport` protocol + `contextvars` binding
  + module-level `StdioTransport` fallback. The stdio stream resolves
  through a lambda so existing tests that monkey-patch `_real_stdout`
  keep passing without modification.
- `tui_gateway/ws.py` — WebSocket transport implementation; FastAPI
  endpoint mounting lives in hermes_cli/web_server.py.
- `tui_gateway/server.py`:
  - `write_json` routes via session transport (for async events) →
    contextvar transport (for in-request writes) → stdio fallback.
  - `dispatch(req, transport=None)` binds the transport for the request
    lifetime and propagates it to pool workers via `contextvars.copy_context`
    so async handlers don't lose their sink.
  - `_init_session` and the manual-session create path stash the
    request's transport so out-of-band events (subagent.complete, etc.)
    fan out to the right peer.

`tui_gateway.entry` (Ink's stdio handshake) is unchanged externally —
it falls through every precedence step into the stdio fallback, byte-
identical to the previous behaviour.

feat(web): ChatSidebar — JSON-RPC sidecar next to xterm.js terminal

Composes the two transports into a single Chat tab:

  ┌─────────────────────────────────────────┬──────────────┐
  │  xterm.js / PTY  (emozilla #13379)      │ ChatSidebar  │
  │  the literal hermes --tui process       │  /api/ws     │
  └─────────────────────────────────────────┴──────────────┘
        terminal bytes                          structured events

The terminal pane stays the canonical chat surface — full TUI fidelity,
slash commands, model picker, mouse, skin engine, wide chars all paint
inside the terminal. The sidebar opens a parallel JSON-RPC WebSocket
to the same gateway and renders metadata that PTY can't surface to
React chrome:

  • model + provider badge with connection state (click → switch)
  • running tool-call list (driven by tool.start / tool.progress /
    tool.complete events)
  • model picker dialog (gateway-driven, reuses ModelPickerDialog)

The sidecar is best-effort. If the WS can't connect (older gateway,
network hiccup, missing token) the terminal pane keeps working
unimpaired — sidebar just shows the connection-state badge in the
appropriate tone.

- `web/src/components/ChatSidebar.tsx` — new component (~270 lines).
  Owns its GatewayClient, drives the model picker through
  `slash.exec`, fans tool events into a capped tool list.
- `web/src/pages/ChatPage.tsx` — split layout: terminal pane
  (`flex-1`) + sidebar (`w-80`, `lg+` only).
- `hermes_cli/web_server.py` — mount `/api/ws` (token + loopback
  guards mirror /api/pty), delegate to `tui_gateway.ws.handle_ws`.

Co-authored-by: emozilla <emozilla@nousresearch.com>

refactor(web): /clean pass on ChatSidebar + ChatPage lint debt

- ChatSidebar: lift gw out of useRef into a useMemo derived from a
  reconnect counter. React 19's react-hooks/refs and react-hooks/
  set-state-in-effect rules both fire when you touch a ref during
  render or call setState from inside a useEffect body. The
  counter-derived gw is the canonical pattern for "external resource
  that needs to be replaceable on user action" — re-creating the
  client comes from bumping `version`, the effect just wires + tears
  down. Drops the imperative `gwRef.current = …` reassign in
  reconnect, drops the truthy ref guard in JSX. modelLabel +
  banner inlined as derived locals (one-off useMemo was overkill).
- ChatPage: lazy-init the banner state from the missing-token check
  so the effect body doesn't have to setState on first run. Drops
  the unused react-hooks/exhaustive-deps eslint-disable. Adds a
  scoped no-control-regex disable on the SGR mouse parser regex
  (the \\x1b is intentional for xterm escape sequences).

All my-touched files now lint clean. Remaining warnings on web/
belong to pre-existing files this PR doesn't touch.

Verified: vitest 249/249, ui-tui eslint clean, web tsc clean,
python imports clean.

chore: uptick

fix(web): drop ChatSidebar tool list — events can't cross PTY/WS boundary

The /api/pty endpoint spawns `hermes --tui` as a child process with its
own tui_gateway and _sessions dict; /api/ws runs handle_ws in-process in
the dashboard server with a separate _sessions dict. Tool events fire on
the child's gateway and never reach the WS sidecar, so the sidebar's
tool.start/progress/complete listeners always observed an empty list.

Drop the misleading list (and the now-orphaned ToolCall primitive),
keep model badge + connection state + model picker + error banner —
those work because they're sidecar-local concerns. Surfacing tool calls
in the sidebar requires cross-process forwarding (PTY child opens a
back-WS to the dashboard, gateway tees emits onto stdio + sidecar
transport) — proper feature for a follow-up.

feat(web): wire ChatSidebar tool list to PTY child via /api/pub broadcast

The dashboard's /api/pty spawns hermes --tui as a child process; tool
events fire in the python tui_gateway grandchild and never crossed the
process boundary into the in-process WS sidecar — so the sidebar tool
list was always empty.

Cross-process forwarding:

- tui_gateway: TeeTransport (transport.py) + WsPublisherTransport
  (event_publisher.py, sync websockets client). entry.py installs the
  tee on _stdio_transport when HERMES_TUI_SIDECAR_URL is set, mirroring
  every dispatcher emit to a back-WS without disturbing Ink's stdio
  handshake.

- hermes_cli/web_server.py: new /api/pub (publisher) + /api/events
  (subscriber) endpoints with a per-channel registry. /api/pty now
  accepts ?channel= and propagates the sidecar URL via env. start_server
  also stashes app.state.bound_port so the URL is constructable.

- web/src/pages/ChatPage.tsx: generates a channel UUID per mount,
  passes it to /api/pty and as a prop to ChatSidebar.

- web/src/components/ChatSidebar.tsx: opens /api/events?channel=, fans
  tool.start/progress/complete back into the ToolCall list. Restores
  the ToolCall primitive.

Tests: 4 new TestPtyWebSocket cases cover channel propagation,
broadcast fan-out, and missing-channel rejection (10 PTY tests pass,
120 web_server tests overall).

fix(web): address Copilot review on #14890

Five threads, all real:

- gatewayClient.ts: register `message`/`close` listeners BEFORE awaiting
  the open handshake.  Server emits `gateway.ready` immediately after
  accept, so a listener attached after the open promise could race past
  the initial skin payload and lose it.

- ChatSidebar.tsx: wire `error`/`close` on the /api/events subscriber
  WS into the existing error banner.  4401/4403 (auth/loopback reject)
  surface as a "reload the page" message; mid-stream drops surface as
  "events feed disconnected" with the existing reconnect button.  Clean
  unmount closes (1000/1001) stay silent.

- web-dashboard.md: install hint was `pip install hermes-agent[web]` but
  ptyprocess lives in the `pty` extra, not `web`.  Switch to
  `hermes-agent[web,pty]` in both prerequisite blocks.

- AGENTS.md: previous "never add a parallel React chat surface" guidance
  was overbroad and contradicted this PR's sidebar.  Tightened to forbid
  re-implementing the transcript/composer/PTY terminal while explicitly
  allowing structured supporting widgets (sidebar / model picker /
  inspectors), matching the actual architecture.

- web/package-lock.json: regenerated cleanly so the wterm sibling
  workspace paths (extraneous machine-local entries) stop polluting CI.

Tests: 249/249 vitest, 10/10 PTY/events, web tsc clean.

refactor(web): /clean pass on ChatSidebar events handler

Spotted in the round-2 review:

- Banner flashed on clean unmount: `ws.close()` from the effect cleanup
  fires `close` with code 1005, opened=true, neither 1000 nor 1001 —
  hit the "unexpected drop" branch.  Track `unmounting` in the effect
  scope and gate the banner through a `surface()` helper so cleanup
  closes stay silent.

- DRY the duplicated "events feed disconnected" string into a local
  const used by both the error and close handlers.

- Drop the `opened` flag (no longer needed once the unmount guard is
  the source of truth for "is this an expected close?").
2026-04-24 10:51:49 -04:00
Austin Pickett 1143f234e3 Merge pull request #14899 from NousResearch/feat/dashboard-layout
Feat/dashboard layout
2026-04-24 07:48:31 -07:00
Teknium c4627f4933 chore(release): map Group G contributors in AUTHOR_MAP 2026-04-24 07:26:07 -07:00
bsgdigital 7c3e5706d8 fix(bedrock): Bedrock-aware _rebuild_anthropic_client helper on interrupt
Three interrupt-recovery sites in run_agent.py rebuilt self._anthropic_client
with build_anthropic_client(self._anthropic_api_key, ...) unconditionally.
When provider=bedrock + api_mode=anthropic_messages (AnthropicBedrock SDK
path), self._anthropic_api_key is the sentinel 'aws-sdk' — build_anthropic_client
doesn't accept that and the rebuild either crashed or produced a non-functional
client.

Extract a _rebuild_anthropic_client() helper that dispatches to
build_anthropic_bedrock_client(region) when provider='bedrock', falling back
to build_anthropic_client() for native Anthropic and other anthropic_messages
providers (MiniMax, Kimi, Alibaba, etc.). Three inline rebuild sites now call
the helper.

Partial salvage of #14680 by @bsgdigital — only the _rebuild_anthropic_client
helper. The normalize_model_name Bedrock-prefix piece was subsumed by #14664,
and the aux client aws_sdk branch was subsumed by #14770 (both in the same
salvage PR as this commit).
2026-04-24 07:26:07 -07:00
Andre Kurait a9ccb03ccc fix(bedrock): evict cached boto3 client on stale-connection errors
## Problem

When a pooled HTTPS connection to the Bedrock runtime goes stale (NAT
timeout, VPN flap, server-side TCP RST, proxy idle cull), the next
Converse call surfaces as one of:

  * botocore.exceptions.ConnectionClosedError / ReadTimeoutError /
    EndpointConnectionError / ConnectTimeoutError
  * urllib3.exceptions.ProtocolError
  * A bare AssertionError raised from inside urllib3 or botocore
    (internal connection-pool invariant check)

The agent loop retries the request 3x, but the cached boto3 client in
_bedrock_runtime_client_cache is reused across retries — so every
attempt hits the same dead connection pool and fails identically.
Only a process restart clears the cache and lets the user keep working.

The bare-AssertionError variant is particularly user-hostile because
str(AssertionError()) is an empty string, so the retry banner shows:

    ⚠️  API call failed: AssertionError
       📝 Error:

with no hint of what went wrong.

## Fix

Add two helpers to agent/bedrock_adapter.py:

  * is_stale_connection_error(exc) — classifies exceptions that
    indicate dead-client/dead-socket state. Matches botocore
    ConnectionError + HTTPClientError subtrees, urllib3
    ProtocolError / NewConnectionError, and AssertionError
    raised from a frame whose module name starts with urllib3.,
    botocore., or boto3.. Application-level AssertionErrors are
    intentionally excluded.

  * invalidate_runtime_client(region) — per-region counterpart to
    the existing reset_client_cache(). Evicts a single cached
    client so the next call rebuilds it (and its connection pool).

Wire both into the Converse call sites:

  * call_converse() / call_converse_stream() in
    bedrock_adapter.py (defense-in-depth for any future caller)
  * The two direct client.converse(**kwargs) /
    client.converse_stream(**kwargs) call sites in run_agent.py
    (the paths the agent loop actually uses)

On a stale-connection exception, the client is evicted and the
exception re-raised unchanged. The agent's existing retry loop then
builds a fresh client on the next attempt and recovers without
requiring a process restart.

## Tests

tests/agent/test_bedrock_adapter.py gets three new classes (14 tests):

  * TestInvalidateRuntimeClient — per-region eviction correctness;
    non-cached region returns False.
  * TestIsStaleConnectionError — classifies botocore
    ConnectionClosedError / EndpointConnectionError /
    ReadTimeoutError, urllib3 ProtocolError, library-internal
    AssertionError (both urllib3.* and botocore.* frames), and
    correctly ignores application-level AssertionError and
    unrelated exceptions (ValueError, KeyError).
  * TestCallConverseInvalidatesOnStaleError — end-to-end: stale
    error evicts the cached client, non-stale error (validation)
    leaves it alone, successful call leaves it cached.

All 116 tests in test_bedrock_adapter.py pass.

Signed-off-by: Andre Kurait <andrekurait@gmail.com>
2026-04-24 07:26:07 -07:00
Tranquil-Flow 7dc6eb9fbf fix(agent): handle aws_sdk auth type in resolve_provider_client
Bedrock's aws_sdk auth_type had no matching branch in
resolve_provider_client(), causing it to fall through to the
"unhandled auth_type" warning and return (None, None).  This broke
all auxiliary tasks (compression, memory, summarization) for Bedrock
users — the main conversation loop worked fine, but background
context management silently failed.

Add an aws_sdk branch that creates an AnthropicAuxiliaryClient via
build_anthropic_bedrock_client(), using boto3's default credential
chain (IAM roles, SSO, env vars, instance metadata).  Default
auxiliary model is Haiku for cost efficiency.

Closes #13919
2026-04-24 07:26:07 -07:00
Andre Kurait b290297d66 fix(bedrock): resolve context length via static table before custom-endpoint probe
## Problem

`get_model_context_length()` in `agent/model_metadata.py` had a resolution
order bug that caused every Bedrock model to fall back to the 128K default
context length instead of reaching the static Bedrock table (200K for
Claude, etc.).

The root cause: `bedrock-runtime.<region>.amazonaws.com` is not listed in
`_URL_TO_PROVIDER`, so `_is_known_provider_base_url()` returned False.
The resolution order then ran the custom-endpoint probe (step 2) *before*
the Bedrock branch (step 4b), which:

  1. Treated Bedrock as a custom endpoint (via `_is_custom_endpoint`).
  2. Called `fetch_endpoint_model_metadata()` → `GET /models` on the
     bedrock-runtime URL (Bedrock doesn't serve this shape).
  3. Fell through to `return DEFAULT_FALLBACK_CONTEXT` (128K) at the
     "probe-down" branch — never reaching the Bedrock static table.

Result: users on Bedrock saw 128K context for Claude models that
actually support 200K on Bedrock, causing premature auto-compression.

## Fix

Promote the Bedrock branch from step 4b to step 1b, so it runs *before*
the custom-endpoint probe at step 2. The static table in
`bedrock_adapter.py::get_bedrock_context_length()` is the authoritative
source for Bedrock (the ListFoundationModels API doesn't expose context
window sizes), so there's no reason to probe `/models` first.

The original step 4b is replaced with a one-line breadcrumb comment
pointing to the new location, to make the resolution-order docstring
accurate.

## Changes

- `agent/model_metadata.py`
  - Add step 1b: Bedrock static-table branch (unchanged predicate, moved).
  - Remove dead step 4b block, replace with breadcrumb comment.
  - Update resolution-order docstring to include step 1b.

- `tests/agent/test_model_metadata.py`
  - New `TestBedrockContextResolution` class (3 tests):
    - `test_bedrock_provider_returns_static_table_before_probe`:
      confirms `provider="bedrock"` hits the static table and does NOT
      call `fetch_endpoint_model_metadata` (regression guard).
    - `test_bedrock_url_without_provider_hint`: confirms the
      `bedrock-runtime.*.amazonaws.com` host match works without an
      explicit `provider=` hint.
    - `test_non_bedrock_url_still_probes`: confirms the probe still
      fires for genuinely-custom endpoints (no over-reach).

## Testing

  pytest tests/agent/test_model_metadata.py -q
  # 83 passed in 1.95s (3 new + 80 existing)

## Risk

Very low.

- Predicate is identical to the original step 4b — no behaviour change
  for non-Bedrock paths.
- Original step 4b was dead code for the user-facing case (always hit
  the 128K fallback first), so removing it cannot regress behaviour.
- Bedrock path now short-circuits before any network I/O — faster too.
- `ImportError` fall-through preserved so users without `boto3`
  installed are unaffected.

## Related

- This is a prerequisite for accurate context-window accounting on
  Bedrock — the fix for #14710 (stale-connection client eviction)
  depends on correct context sizing to know when to compress.

Signed-off-by: Andre Kurait <andrekurait@gmail.com>
2026-04-24 07:26:07 -07:00
Qi Ke f2fba4f9a1 fix(anthropic): auto-detect Bedrock model IDs in normalize_model_name (#12295)
Bedrock model IDs use dots as namespace separators (anthropic.claude-opus-4-7,
us.anthropic.claude-sonnet-4-5-v1:0), not version separators.
normalize_model_name() was unconditionally converting all dots to hyphens,
producing invalid IDs that Bedrock rejects with HTTP 400/404.

This affected both the main agent loop (partially mitigated by
_anthropic_preserve_dots in run_agent.py) and all auxiliary client calls
(compression, session_search, vision, etc.) which go through
_AnthropicCompletionsAdapter and never pass preserve_dots=True.

Fix: add _is_bedrock_model_id() to detect Bedrock namespace prefixes
(anthropic., us., eu., ap., jp., global.) and skip dot-to-hyphen
conversion for these IDs regardless of the preserve_dots flag.
2026-04-24 07:26:07 -07:00
Teknium fcc05284fc fix(delegate): tool-activity-aware heartbeat stale detection (#13041) (#15183)
A child running a legitimately long-running tool (terminal command,
browser fetch, big file read) holds current_tool set and keeps
api_call_count frozen while the tool runs. The previous stale check
treated that as idle after 5 heartbeat cycles (~150s), stopped
touching the parent, and let the gateway kill the session.

Split the threshold in two:
- _HEARTBEAT_STALE_CYCLES_IDLE=5 (~150s)  — applied only when
  current_tool is None (child wedged between turns)
- _HEARTBEAT_STALE_CYCLES_IN_TOOL=20 (~600s) — applied when the child
  is inside a tool call

Stale counter also resets when current_tool changes (new tool =
progress). The hard child_timeout_seconds (default 600s) is still
the final cap, so genuinely stuck tools don't get to block forever.
2026-04-24 07:25:19 -07:00
Teknium 1840c6a57d feat(spotify): wire setup wizard into 'hermes tools' + document cron usage (#15180)
A — 'hermes tools' activation now runs the full Spotify wizard.

Previously a user had to (1) toggle the Spotify toolset on in 'hermes
tools' AND (2) separately run 'hermes auth spotify' to actually use
it. The second step was a discovery gap — the docs mentioned it but
nothing in the TUI pointed users there.

Now toggling Spotify on calls login_spotify_command as a post_setup
hook. If the user has no client_id yet, the interactive wizard walks
them through Spotify app creation; if they do, it skips straight to
PKCE. Either way, one 'hermes tools' pass leaves Spotify toggled on
AND authenticated. SystemExit from the wizard (user abort) leaves the
toolset enabled and prints a 'run: hermes auth spotify' hint — it
does NOT fail the toolset toggle.

Dropped the TOOL_CATEGORIES env_vars list for Spotify. The wizard
handles HERMES_SPOTIFY_CLIENT_ID persistence itself, and asking users
to type env var names before the wizard fires was UX-backwards — the
point of the wizard is that they don't HAVE a client_id yet.

B — Docs page now covers cron + Spotify.

New 'Scheduling: Spotify + cron' section with two working examples
(morning playlist, wind-down pause) using the real 'hermes cron add'
CLI surface (verified via 'cron add --help'). Covers the active-device
gotcha, Premium gating, memory isolation, and links to the cron docs.

Also fixed a stale '9 Spotify tools' reference in the setup copy —
we consolidated to 7 tools in #15154.

Validation:
- scripts/run_tests.sh tests/hermes_cli/test_tools_config.py
    tests/hermes_cli/test_spotify_auth.py
    tests/tools/test_spotify_client.py
  → 54 passed
- website: node scripts/prebuild.mjs && npx docusaurus build
  → SUCCESS, no new warnings
2026-04-24 07:24:28 -07:00
Blind Dev 591aa159aa feat: allow Telegram chat allowlists for groups and forums (#15027)
* feat: allow Telegram chat allowlists for groups and forums

* chore: map web3blind noreply email for release attribution

---------

Co-authored-by: web3blind <web3blind@users.noreply.github.com>
2026-04-24 07:23:14 -07:00
Austin Pickett d3e56b9f39 chore: refac 2026-04-24 10:17:57 -04:00
Teknium c6b734e24d chore(release): map Group B contributors in AUTHOR_MAP 2026-04-24 07:14:00 -07:00
Wooseong Kim 54146ae07c fix(aux): refresh cached auth after 401 2026-04-24 07:14:00 -07:00
Wooseong Kim be6b83562d fix(aux): force anthropic oauth refresh after 401
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-24 07:14:00 -07:00
5park1e e1106772d9 fix: re-auth on stale OAuth token; read Claude Code credentials from macOS Keychain
Bug 3 — Stale OAuth token not detected in 'hermes model':
- _model_flow_anthropic used 'has_creds = bool(existing_key)' which treats
  any non-empty token (including expired OAuth tokens) as valid.
- Added existing_is_stale_oauth check: if the only credential is an OAuth
  token (sk-ant- prefix) with no valid cc_creds fallback, mark it stale
  and force the re-auth menu instead of silently accepting a broken token.

Bug 4 — macOS Keychain credentials never read:
- Claude Code >=2.1.114 migrated from ~/.claude/.credentials.json to the
  macOS Keychain under service 'Claude Code-credentials'.
- Added _read_claude_code_credentials_from_keychain() using the 'security'
  CLI tool; read_claude_code_credentials() now tries Keychain first then
  falls back to JSON file.
- Non-Darwin platforms return None from Keychain read immediately.

Tests:
- tests/agent/test_anthropic_keychain.py: 11 cases covering Darwin-only
  guard, security command failures, JSON parsing, fallback priority.
- tests/hermes_cli/test_anthropic_model_flow_stale_oauth.py: 8 cases
  covering stale OAuth detection, API key passthrough, cc_creds fallback.

Refs: #12905
2026-04-24 07:14:00 -07:00
nightq 5383615db5 fix: recognize Claude Code OAuth tokens (cc- prefix) in _is_oauth_token
Fixes NousResearch/hermes-agent#9813

Root cause: _is_oauth_token() only recognized sk-ant-* and eyJ* patterns,
but Claude Code OAuth tokens from CLAUDE_CODE_OAUTH_TOKEN use cc- prefix
Fix: Add cc- prefix detection so these tokens route through Bearer auth
2026-04-24 07:14:00 -07:00
Maymun 56086e3fd7 fix(auth): write Anthropic OAuth token files atomically to prevent corruption 2026-04-24 07:14:00 -07:00
Teknium 8d12fb1e6b refactor(spotify): convert to built-in bundled plugin under plugins/spotify (#15174)
Moves the Spotify integration from tools/ into plugins/spotify/,
matching the existing pattern established by plugins/image_gen/ for
third-party service integrations.

Why:
- tools/ should be reserved for foundational capabilities (terminal,
  read_file, web_search, etc.). tools/providers/ was a one-off
  directory created solely for spotify_client.py.
- plugins/ is already the home for image_gen backends, memory
  providers, context engines, and standalone hook-based plugins.
  Spotify is a third-party service integration and belongs alongside
  those, not in tools/.
- Future service integrations (eventually: Deezer, Apple Music, etc.)
  now have a pattern to copy.

Changes:
- tools/spotify_tool.py → plugins/spotify/tools.py (handlers + schemas)
- tools/providers/spotify_client.py → plugins/spotify/client.py
- tools/providers/ removed (was only used for Spotify)
- New plugins/spotify/__init__.py with register(ctx) calling
  ctx.register_tool() × 7. The handler/check_fn wiring is unchanged.
- New plugins/spotify/plugin.yaml (kind: backend, bundled, auto-load).
- tests/tools/test_spotify_client.py: import paths updated.

tools_config fix — _DEFAULT_OFF_TOOLSETS now wins over plugin auto-enable:
- _get_platform_tools() previously auto-enabled unknown plugin
  toolsets for new platforms. That was fine for image_gen (which has
  no toolset of its own) but bad for Spotify, which explicitly
  requires opt-in (don't ship 7 tool schemas to users who don't use
  it). Added a check: if a plugin toolset is in _DEFAULT_OFF_TOOLSETS,
  it stays off until the user picks it in 'hermes tools'.

Pre-existing test bug fix:
- tests/hermes_cli/test_plugins.py::test_list_returns_sorted
  asserted names were sorted, but list_plugins() sorts by key
  (path-derived, e.g. image_gen/openai). With only image_gen plugins
  bundled, name and key order happened to agree. Adding plugins/spotify
  broke that coincidence (spotify sorts between openai-codex and xai
  by name but after xai by key). Updated test to assert key order,
  which is what the code actually documents.

Validation:
- scripts/run_tests.sh tests/hermes_cli/test_plugins.py \
    tests/hermes_cli/test_tools_config.py \
    tests/hermes_cli/test_spotify_auth.py \
    tests/tools/test_spotify_client.py \
    tests/tools/test_registry.py
  → 143 passed
- E2E plugin load: 'spotify' appears in loaded plugins, all 7 tools
  register into the spotify toolset, check_fn gating intact.
2026-04-24 07:06:11 -07:00
Teknium e5d41f05d4 feat(spotify): consolidate tools (9→7), add spotify skill, surface in hermes setup (#15154)
Three quality improvements on top of #15121 / #15130 / #15135:

1. Tool consolidation (9 → 7)
   - spotify_saved_tracks + spotify_saved_albums → spotify_library with
     kind='tracks'|'albums'. Handler code was ~90 percent identical
     across the two old tools; the merge is a behavioral no-op.
   - spotify_activity dropped. Its 'now_playing' action was a duplicate
     of spotify_playback.get_currently_playing (both return identical
     204/empty payloads). Its 'recently_played' action moves onto
     spotify_playback as a new action — history belongs adjacent to
     live state.
   - Net: each API call ships 2 fewer tool schemas when the Spotify
     toolset is enabled, and the action surface is more discoverable
     (everything playback-related is on one tool).

2. Spotify skill (skills/media/spotify/SKILL.md)
   Teaches the agent canonical usage patterns so common requests don't
   balloon into 4+ tool calls:
   - 'play X' = one search, then play by URI (not search + scan +
     describe + play)
   - 'what's playing' = single get_currently_playing (no preflight
     get_state chain)
   - Don't retry on '403 Premium required' or '403 No active device' —
     both require user action
   - URI/URL/bare-ID format normalization
   - Full failure-mode reference for 204/401/403/429

3. Surfaced in 'hermes setup' tool status
   Adds 'Spotify (PKCE OAuth)' to the tool status list when
   auth.json has a Spotify access/refresh token. Matches the
   homeassistant pattern but reads from auth.json (OAuth-based) rather
   than env vars.

Docs updated to reflect the new 7-tool surface, and mention the
companion skill in the 'Using it' section.

Tests: 54 passing (client 22, auth 15, tools_config 35 — 18 = 54 after
renaming/replacing the spotify_activity tests with library +
recently_played coverage). Docusaurus build clean.
2026-04-24 06:14:51 -07:00
Austin Pickett 0fdbfad2b0 feat: embed docs 2026-04-24 09:04:11 -04:00
Teknium 9d1b277e1d chore(release): map Group H contributors in AUTHOR_MAP 2026-04-24 05:48:15 -07:00
XieNBi 4a51ab61eb fix(cli): non-zero /model counts for native OpenAI and direct API rows 2026-04-24 05:48:15 -07:00
Brian D. Evans 7f26cea390 fix(models): strip models/ prefix in Gemini validator (#12532)
Salvage of the Gemini-specific piece from PR #12585 by @briandevans.
Gemini's OpenAI-compat /v1beta/openai/models endpoint returns IDs prefixed
with 'models/' (native Gemini-API convention), so set-membership against
curated bare IDs drops every model. Strip the prefix before comparison.

The Anthropic static-catalog piece of #12585 was subsumed by #12618's
_fetch_anthropic_models() branch landing earlier in the same salvage PR.
Full branch cherry-pick was skipped because it also carried unrelated
catalog-version regressions.
2026-04-24 05:48:15 -07:00
H-Ali13381 2303dd8686 fix(models): use Anthropic-native headers for model validation
The generic /v1/models probe in validate_requested_model() sent a plain
'Authorization: Bearer <key>' header, which works for OpenAI-compatible
endpoints but results in a 401 Unauthorized from Anthropic's API.
Anthropic requires x-api-key + anthropic-version headers (or Bearer for
OAuth tokens from Claude Code).

Add a provider-specific branch for normalized == 'anthropic' that calls
the existing _fetch_anthropic_models() helper, which already handles
both regular API keys and Claude Code OAuth tokens correctly.  This
mirrors the pattern already used for openai-codex, copilot, and bedrock.

The branch also includes:
- fuzzy auto-correct (cutoff 0.9) for near-exact model ID typos
- fuzzy suggestions (cutoff 0.5) when the model is not listed
- graceful fall-through when the token cannot be resolved or the
  network is unreachable (accepts with a warning rather than hard-fail)
- a note that newer/preview/snapshot model IDs can be gate-listed
  and may still work even if not returned by /v1/models

Fixes Anthropic provider users seeing 'service unreachable' errors
when running /model <claude-model> because every probe 401'd.
2026-04-24 05:48:15 -07:00
wangshengyang2004 647900e813 fix(cli): support model validation for anthropic_messages and cloudflare-protected endpoints
- probe_api_models: add api_mode param; use x-api-key + anthropic-version
  headers for anthropic_messages mode (Anthropic's native Models API auth)
- probe_api_models: add User-Agent header to avoid Cloudflare 403 blocks
  on third-party OpenAI-compatible endpoints
- validate_requested_model: pass api_mode through from switch_model
- validate_requested_model: for anthropic_messages mode, attempt probe with
  correct auth; if probe fails (many proxies don't implement /v1/models),
  accept the model with an informational warning instead of rejecting
- fetch_api_models: propagate api_mode to probe_api_models
2026-04-24 05:48:15 -07:00
Teknium 25465fd8d7 test(gateway): on_session_finalize fires on idle-expiry + AUTHOR_MAP
Regression test for #14981. Verifies that _session_expiry_watcher fires
on_session_finalize for each session swept out of the store, matching
the contract documented for /new, /reset, CLI shutdown, and gateway stop.

Verified the test fails cleanly on pre-fix code (hook call list missing
sess-expired) and passes with the fix applied.
2026-04-24 05:40:52 -07:00
Stefan Dimitrov 260ae62134 Invoke session finalize hooks on expiry flush 2026-04-24 05:40:52 -07:00
Teknium 9be17bb84f docs(spotify): expand feature page with tool reference, Free/Premium matrix, troubleshooting (#15135)
The initial Spotify docs page shipped in #15130 was a setup guide. This
expands it into a full feature reference:

- Per-tool parameter table for all 9 tools, extracted from the real
  schemas in tools/spotify_tool.py (actions, required/optional args,
  premium gating).
- Free vs Premium feature matrix — which actions work on which tier,
  so Free users don't assume Spotify tools are useless to them.
- Active-device prerequisite called out at the top; this is the #1
  cause of '403 no active device' reports for every Spotify
  integration.
- SSH / headless section explaining that browser auto-open is skipped
  when SSH_CLIENT/SSH_TTY is set, and how to tunnel the callback port.
- Token lifecycle: refresh on 401, persistence across restarts, how
  to revoke server-side via spotify.com/account/apps.
- Example prompt list so users know what to ask the agent.
- Troubleshooting expanded: no-active-device, Premium-required, 204
  now_playing, INVALID_CLIENT, 429, 401 refresh-revoked, wizard not
  opening browser.
- 'Where things live' table mapping auth.json / .env / Spotify app.

Verified with 'node scripts/prebuild.mjs && npx docusaurus build'
— page compiles, no new warnings.
2026-04-24 05:38:02 -07:00
Teknium fe9d9a26d8 chore(release): map Group F contributors in AUTHOR_MAP 2026-04-24 05:35:43 -07:00
Tranquil-Flow ee83a710f0 fix(gateway,cron): activate fallback_model when primary provider auth fails
When the primary provider raises AuthError (expired OAuth token,
revoked API key), the error was re-raised before AIAgent was created,
so fallback_model was never consulted. Now both gateway/run.py and
cron/scheduler.py catch AuthError specifically and attempt to resolve
credentials from the fallback_providers/fallback_model config chain
before propagating the error.

Closes #7230
2026-04-24 05:35:43 -07:00
vlwkaos f7f7588893 fix(agent): only set rate-limit cooldown when leaving primary; add tests 2026-04-24 05:35:43 -07:00
LeonSGP43 a9fd8d7c88 fix(agent): default missing fallback chain on switch 2026-04-24 05:35:43 -07:00
CruxExperts 46451528a5 fix(agent): pass config_context_length in fallback activation path
Try to activate fallback model after errors was calling get_model_context_length()
without the config_context_length parameter, causing it to fall through to
DEFAULT_FALLBACK_CONTEXT (128K) even when config.yaml has an explicit
model.context_length value (e.g. 204800 for MiniMax-M2.7).

This mirrors the fix already present in switch_model() at line 1988, which
correctly passes config_context_length. The fallback path was missed.

Fixes: context_length forced to 128K on fallback activation
2026-04-24 05:35:43 -07:00
Bartok9 4e27e498f1 fix(agent): exclude ssl.SSLError from is_local_validation_error to prevent non-retryable abort
ssl.SSLError (and its subclass ssl.SSLCertVerificationError) inherits from
OSError *and* ValueError via Python's MRO. The is_local_validation_error
check used isinstance(api_error, (ValueError, TypeError)) to detect
programming bugs that should abort immediately — but this inadvertently
caught ssl.SSLError, treating a TLS transport failure as a non-retryable
client error.

The error classifier already maps SSLCertVerificationError to
FailoverReason.timeout with retryable=True (its type name is in
_TRANSPORT_ERROR_TYPES), but the inline isinstance guard was overriding
that classification and triggering an unnecessary abort.

Fix: add ssl.SSLError to the exclusion list alongside the existing
UnicodeEncodeError carve-out so TLS errors fall through to the
classifier's retryable path.

Closes #14367
2026-04-24 05:35:43 -07:00
Teknium ba44a3d256 fix(gemini): fail fast on missing API key + surface it in hermes dump (#15133)
Two small fixes triggered by a support report where the user saw a
cryptic 'HTTP 400 - Error 400 (Bad Request)!!1' (Google's GFE HTML
error page, not a real API error) on every gemini-2.5-pro request.

The underlying cause was an empty GOOGLE_API_KEY / GEMINI_API_KEY, but
nothing in our output made that diagnosable:

1. hermes_cli/dump.py: the api_keys section enumerated 23 providers but
   omitted Google entirely, so users had no way to verify from 'hermes
   dump' whether the key was set. Added GOOGLE_API_KEY and GEMINI_API_KEY
   rows.

2. agent/gemini_native_adapter.py: GeminiNativeClient.__init__ accepted
   an empty/whitespace api_key and stamped it into the x-goog-api-key
   header, which made Google's frontend return a generic HTML 400 long
   before the request reached the Generative Language backend. Now we
   raise RuntimeError at construction with an actionable message
   pointing at GOOGLE_API_KEY/GEMINI_API_KEY and aistudio.google.com.

Added a regression test that covers '', '   ', and None.
2026-04-24 05:35:17 -07:00
Teknium a1caec1088 fix(agent): repair CamelCase + _tool suffix tool-call emissions (#15124)
Claude-style and some Anthropic-tuned models occasionally emit tool
names as class-like identifiers: TodoTool_tool, Patch_tool,
BrowserClick_tool, PatchTool. These failed strict-dict lookup in
valid_tool_names and triggered the 'Unknown tool' self-correction
loop, wasting a full turn of iteration and tokens.

_repair_tool_call already handled lowercase / separator / fuzzy
matches but couldn't bridge the CamelCase-to-snake_case gap or the
trailing '_tool' suffix that Claude sometimes tacks on. Extend it
with two bounded normalization passes:

  1. CamelCase -> snake_case (via regex lookbehind).
  2. Strip trailing _tool / -tool / tool suffix (case-insensitive,
     applied twice so TodoTool_tool reduces all the way: strip
     _tool -> TodoTool, snake -> todo_tool, strip 'tool' -> todo).

Cheap fast-paths (lowercase / separator-normalized) still run first
so the common case stays zero-cost. Fuzzy match remains the last
resort unchanged.

Tests: tests/run_agent/test_repair_tool_call_name.py covers the
three original reports (TodoTool_tool, Patch_tool, BrowserClick_tool),
plus PatchTool, WriteFileTool, ReadFile_tool, write-file_Tool,
patch-tool, and edge cases (empty, None, '_tool' alone, genuinely
unknown names).

18 new tests + 17 existing arg-repair tests = 35/35 pass.

Closes #14784
2026-04-24 05:32:08 -07:00
Teknium 05394f2f28 feat(spotify): interactive setup wizard + docs page (#15130)
Previously 'hermes auth spotify' crashed with 'HERMES_SPOTIFY_CLIENT_ID
is required' if the user hadn't manually created a Spotify developer
app and set env vars. Now the command detects a missing client_id and
walks the user through the one-time app registration inline:

- Opens https://developer.spotify.com/dashboard in the browser
- Tells the user exactly what to paste into the Spotify form
  (including the correct default redirect URI, 127.0.0.1:43827)
- Prompts for the Client ID
- Persists HERMES_SPOTIFY_CLIENT_ID to ~/.hermes/.env so subsequent
  runs skip the wizard
- Continues straight into the PKCE OAuth flow

Also prints the docs URL at both the start of the wizard and the end
of a successful login so users can find the full guide.

Adds website/docs/user-guide/features/spotify.md with the complete
setup walkthrough, tool reference, and troubleshooting, and wires it
into the sidebar under User Guide > Features > Advanced.

Fixes a stale redirect URI default in the hermes_cli/tools_config.py
TOOL_CATEGORIES entry (was 8888/callback from the PR description
instead of the actual DEFAULT_SPOTIFY_REDIRECT_URI value
43827/spotify/callback defined in auth.py).
2026-04-24 05:30:05 -07:00
Teknium 0d32411310 chore(release): map Group D contributors in AUTHOR_MAP 2026-04-24 05:28:45 -07:00
Brian D. Evans e87a2100f6 fix(mcp): auto-reconnect + retry once when the transport session expires (#13383)
Streamable HTTP MCP servers may garbage-collect their server-side
session state while the OAuth token remains valid — idle TTL, server
restart, pod rotation, etc.  Before this fix, the tool-call handler
treated the resulting "Invalid or expired session" error as a plain
tool failure with no recovery path, so **every subsequent call on
the affected server failed until the gateway was manually
restarted**.  Reporter: #13383.

The OAuth-based recovery path (``_handle_auth_error_and_retry``)
already exists for 401s, but it only fires on auth errors.  Session
expiry slipped through because the access token is still valid —
nothing 401'd, so the existing recovery branch was skipped.

Fix
---
Add a sibling function ``_handle_session_expired_and_retry`` that
detects MCP session-expiry via ``_is_session_expired_error`` (a
narrow allow-list of known-stable substrings: ``"invalid or expired
session"``, ``"session expired"``, ``"session not found"``,
``"unknown session"``, etc.) and then uses the existing transport
reconnect mechanism:

* Sets ``MCPServerTask._reconnect_event`` — the server task's
  lifecycle loop already interprets this as "tear down the current
  ``streamablehttp_client`` + ``ClientSession`` and rebuild them,
  reusing the existing OAuth provider instance".
* Waits up to 15 s for the new session to come back ready.
* Retries the original call once.  If the retry succeeds, returns
  its result and resets the circuit-breaker error count.  If the
  retry raises, or if the reconnect doesn't ready in time, falls
  through to the caller's generic error path.

Unlike the 401 path, this does **not** call ``handle_401`` — the
access token is already valid and running an OAuth refresh would be
a pointless round-trip.

All 5 MCP handlers (``call_tool``, ``list_resources``, ``read_resource``,
``list_prompts``, ``get_prompt``) now consult both recovery paths
before falling through:

    recovered = _handle_auth_error_and_retry(...)          # 401 path
    if recovered is not None: return recovered
    recovered = _handle_session_expired_and_retry(...)     # new
    if recovered is not None: return recovered
    # generic error response

Narrow scope — explicitly not changed
-------------------------------------
* **Detection is string-based on a 5-entry allow-list.**  The MCP
  SDK wraps JSON-RPC errors in ``McpError`` whose exception type +
  attributes vary across SDK versions, so matching on message
  substrings is the durable path.  Kept narrow to avoid false
  positives — a regular ``RuntimeError("Tool failed")`` will NOT
  trigger spurious reconnects (pinned by
  ``test_is_session_expired_rejects_unrelated_errors``).
* **No change to the existing 401 recovery flow.**  The new path is
  consulted only after the auth path declines (returns ``None``).
* **Retry count stays at 1.**  If the reconnect-then-retry also
  fails, we don't loop — the error surfaces normally so the model
  sees a failed tool call rather than a hang.
* **``InterruptedError`` is explicitly excluded** from session-expired
  detection so user-cancel signals always short-circuit the same
  way they did before (pinned by
  ``test_is_session_expired_rejects_interrupted_error``).

Regression coverage
-------------------
``tests/tools/test_mcp_tool_session_expired.py`` (new, 16 cases):

Unit tests for ``_is_session_expired_error``:
* ``test_is_session_expired_detects_invalid_or_expired_session`` —
  reporter's exact wpcom-mcp text.
* ``test_is_session_expired_detects_expired_session_variant`` —
  "Session expired" / "expired session" variants.
* ``test_is_session_expired_detects_session_not_found`` — server GC
  variant ("session not found", "unknown session").
* ``test_is_session_expired_is_case_insensitive``.
* ``test_is_session_expired_rejects_unrelated_errors`` — narrow-scope
  canary: random RuntimeError / ValueError / 401 don't trigger.
* ``test_is_session_expired_rejects_interrupted_error`` — user cancel
  must never route through reconnect.
* ``test_is_session_expired_rejects_empty_message``.

Handler integration tests:
* ``test_call_tool_handler_reconnects_on_session_expired`` — reporter's
  full repro: first call raises "Invalid or expired session", handler
  signals ``_reconnect_event``, retries once, returns the retry's
  success result with no ``error`` key.
* ``test_call_tool_handler_non_session_expired_error_falls_through``
  — preserved-behaviour canary: random tool failures do NOT trigger
  reconnect.
* ``test_session_expired_handler_returns_none_without_loop`` —
  defensive: cold-start / shutdown race.
* ``test_session_expired_handler_returns_none_without_server_record``
  — torn-down server falls through cleanly.
* ``test_session_expired_handler_returns_none_when_retry_also_fails``
  — no retry loop on repeated failure.

Parametrised across all 4 non-``tools/call`` handlers:
* ``test_non_tool_handlers_also_reconnect_on_session_expired``
  [list_resources / read_resource / list_prompts / get_prompt].

**15 of 16 fail on clean ``origin/main`` (``6fb69229``)** with
``ImportError: cannot import name '_is_session_expired_error'``
— the fix's surface symbols don't exist there yet.  The 1 passing
test is an ordering artefact of pytest-xdist worker collection.

Validation
----------
``source venv/bin/activate && python -m pytest
tests/tools/test_mcp_tool_session_expired.py -q`` → **16 passed**.

Broader MCP suite (5 files:
``test_mcp_tool.py``, ``test_mcp_tool_401_handling.py``,
``test_mcp_tool_session_expired.py``, ``test_mcp_reconnect_signal.py``,
``test_mcp_oauth.py``) → **230 passed, 0 regressions**.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-24 05:28:45 -07:00
AntAISecurityLab 8c2732a9f9 fix(security): strip MCP auth on cross-origin redirect
Add event hook to httpx.AsyncClient in MCP HTTP transport that strips
Authorization headers when a redirect targets a different origin,
preventing credential leakage to third-party servers.
2026-04-24 05:28:45 -07:00
Alexazhu 15050fd965 fix(mcp_oauth): raise RuntimeError instead of asserting OAuth port is set
``tools/mcp_oauth.py`` relied on ``assert _oauth_port is not None`` to
guard the module-level port set by ``build_oauth_auth``. Python's
``-O`` / ``-OO`` optimization flags strip ``assert`` statements
entirely, so a deployment that runs ``python -O -m hermes ...``
silently loses the check: ``_oauth_port`` stays ``None`` and the
failure surfaces much later as an obscure ``int()`` or
``http.server.HTTPServer((host, None))`` TypeError rather than the
intended "OAuth callback port not set" signal.

Replace with an explicit ``if … raise RuntimeError(...)`` so the
invariant is preserved regardless of the interpreter's optimization
level. Docstring updated to document the new exception.

Found during a proactive audit of ``assert`` statements in
non-test code paths.
2026-04-24 05:28:45 -07:00
Amanuel Tilahun Bogale 5fa2f4258a fix: serialize Pydantic AnyUrl fields when persisting MCP OAuth state
OAuth client information and token responses from the MCP SDK contain
Pydantic AnyUrl fields (client_uri, redirect_uris, etc.). The previous
model_dump() call returned a dict with these AnyUrl objects still as
their native Python type, which then crashed json.dumps with:

  TypeError: Object of type AnyUrl is not JSON serializable

This caused any OAuth-based MCP server (e.g. alphaxiv) to fail
registration with an "OAuth flow error" traceback during startup.

Adding mode="json" tells Pydantic to serialize all fields to
JSON-compatible primitives (AnyUrl -> str, datetime -> ISO string, etc.)
before returning the dict, so the standard json.dumps can handle it.

Three call sites fixed:
- HermesTokenStorage.set_tokens
- HermesTokenStorage.set_client_info
- build_oauth_auth pre-registration write
2026-04-24 05:28:45 -07:00
0xbyt4 4ac731c841 fix(model-normalize): pass DeepSeek V-series IDs through instead of folding to deepseek-chat
`_normalize_for_deepseek` was mapping every non-reasoner input into
`deepseek-chat` on the assumption that DeepSeek's API accepts only two
model IDs. That assumption no longer holds — `deepseek-v4-pro` and
`deepseek-v4-flash` are first-class IDs accepted by the direct API,
and on aggregators `deepseek-chat` routes explicitly to V3 (DeepInfra
backend returns `deepseek-chat-v3`). So a user picking V4 Pro through
the model picker was being silently downgraded to V3.

Verified 2026-04-24 against Nous portal's OpenAI-compat surface:
  - `deepseek/deepseek-v4-flash` → provider: DeepSeek,
    model: deepseek-v4-flash-20260423
  - `deepseek/deepseek-chat`     → provider: DeepInfra,
    model: deepseek/deepseek-chat-v3

Fix:
- Add `deepseek-v4-pro` and `deepseek-v4-flash` to
  `_DEEPSEEK_CANONICAL_MODELS` so exact matches pass through.
- Add `_DEEPSEEK_V_SERIES_RE` (`^deepseek-v\d+(...)?$`) so future
  V-series IDs (`deepseek-v5-*`, dated variants) keep passing through
  without another code change.
- Update docstring + module header to reflect the new rule.

Tests:
- New `TestDeepseekVSeriesPassThrough` — 8 parametrized cases covering
  bare, vendor-prefixed, case-variant, dated, and future V-series IDs
  plus end-to-end `normalize_model_for_provider(..., "deepseek")`.
- New `TestDeepseekCanonicalAndReasonerMapping` — regression coverage
  for canonical pass-through, reasoner-keyword folding, and
  fall-back-to-chat behaviour.
- 77/77 pass.

Reported on Discord (Ufonik, Don Piedro): `/model > Deepseek >
deepseek-v4-pro` surfaced
`Normalized 'deepseek-v4-pro' to 'deepseek-chat'`. Picker listing
showed the v4 names, so validation also rejected the post-normalize
`deepseek-chat` as "not in provider listing" — the contradiction
users saw. Normalizer now respects the picker's choice.
2026-04-24 05:24:54 -07:00
Austin Pickett 4f5669a569 feat: add docs link 2026-04-24 08:22:44 -04:00
Teknium acd78a457e fix(docker): reap orphaned subprocesses via tini as PID 1 (#15116)
Install tini in the container image and route ENTRYPOINT through
`/usr/bin/tini -g -- /opt/hermes/docker/entrypoint.sh`.

Without a PID-1 init, orphans reparented to hermes (MCP stdio servers,
git, bun, browser daemons) never get waited() on and accumulate as
zombies. Long-running gateway containers eventually exhaust the PID
table and hit "fork: cannot allocate memory".

tini is the standard container init (same pattern Docker's --init flag
and Kubernetes pause container use). It handles SIGCHLD, reaps orphans,
and forwards SIGTERM/SIGINT to the entrypoint so hermes's existing
graceful-shutdown handlers still run. The -g flag sends signals to the
whole process group so `docker stop` cleanly terminates hermes and its
descendants, not just direct children.

Closes #15012.

E2E-verified with a minimal reproducer image: spawning 5 orphans that
reparent to PID 1 leaves 5 zombies without tini and 0 with tini.
2026-04-24 05:22:34 -07:00
Teknium 4ff7950f7f chore(spotify): gate toolset off by default, add to hermes tools UI
Follow-up on top of #15096 cherry-pick:
- Remove spotify_* from _HERMES_CORE_TOOLS (keep only in the 'spotify'
  toolset, so the 9 Spotify tool schemas are not shipped to every user).
- Add 'spotify' to CONFIGURABLE_TOOLSETS + _DEFAULT_OFF_TOOLSETS so new
  installs get it opt-in via 'hermes tools', matching homeassistant/rl.
- Wire TOOL_CATEGORIES entry pointing at 'hermes auth spotify' for the
  actual PKCE login (optional HERMES_SPOTIFY_CLIENT_ID /
  HERMES_SPOTIFY_REDIRECT_URI env vars).
- scripts/release.py: map contributor email to GitHub login.
2026-04-24 05:20:38 -07:00
Dilee 7e9dd9ca45 Add native Spotify tools with PKCE auth 2026-04-24 05:20:38 -07:00
Teknium 3392d1e422 chore(release): map Group E contributors in AUTHOR_MAP 2026-04-24 05:20:05 -07:00
konsisumer 785d168d50 fix(credential_pool): add Nous OAuth cross-process auth-store sync
Concurrent Hermes processes (e.g. cron jobs) refreshing a Nous OAuth token
via resolve_nous_runtime_credentials() write the rotated tokens to auth.json.
The calling process's pool entry becomes stale, and the next refresh against
the already-rotated token triggers a 'refresh token reuse' revocation on
the Nous Portal.

_sync_nous_entry_from_auth_store() reads auth.json under the same lock used
by resolve_nous_runtime_credentials, and adopts the newer token pair before
refreshing the pool entry. This complements #15111 (which preserved the
obtained_at timestamps through seeding).

Partial salvage of #10160 by @konsisumer — only the agent/credential_pool.py
changes + the 3 Nous-specific regression tests. The PR also touched 10
unrelated files (Dockerfile, tips.py, various tool tests) which were
dropped as scope creep.

Regression tests:
- test_sync_nous_entry_from_auth_store_adopts_newer_tokens
- test_sync_nous_entry_noop_when_tokens_match
- test_nous_exhausted_entry_recovers_via_auth_store_sync
2026-04-24 05:20:05 -07:00
Michael Steuer cd221080ec fix: validate nous auth status against runtime credentials 2026-04-24 05:20:05 -07:00
Prasad Subrahmanya 1fc77f995b fix(agent): fall back on rate limit when pool has no rotation room
Extracts pool-rotation-room logic into `_pool_may_recover_from_rate_limit`
so single-credential pools no longer block the eager-fallback path on 429.

The existing check `pool is not None and pool.has_available()` lets
fallback fire only after the pool marks every entry as exhausted.  With
exactly one credential in the pool (the common shape for Gemini OAuth,
Vertex service accounts, and any personal-key setup), `has_available()`
flips back to True as soon as the cooldown expires — Hermes retries
against the same entry, hits the same daily-quota 429, and burns the
retry budget in a tight loop before ever reaching the configured
`fallback_model`.  Observed in the wild as 4+ hours of 429 noise on a
single Gemini key instead of falling through to Vertex as configured.

Rotation is only meaningful with more than one credential — gate on
`len(pool.entries()) > 1`.  Multi-credential pools keep the current
wait-for-rotation behaviour unchanged.

Fixes #11314.  Related to #8947, #10210, #7230.  Narrower scope than
open PRs #8023 (classifier change) and #11492 (503/529 credential-pool
bypass) — this addresses the single-credential 429 case specifically
and does not conflict with either.

Tests: 6 new unit tests in tests/run_agent/test_provider_fallback.py
covering (a) None pool, (b) single-cred available, (c) single-cred in
cooldown, (d) 2-cred available rotates, (e) multi-cred all cooling-down
falls back, (f) many-cred available rotates.  All 18 tests in the file
pass.
2026-04-24 05:20:05 -07:00
jakubkrcmar 1af44a13c0 fix(model_picker): detect mapped-provider auth-store credentials 2026-04-24 05:20:05 -07:00
Andy fff7ee31ae fix: clarify auth retry guidance 2026-04-24 05:20:05 -07:00
YueLich 6fcaf5ebc2 fix: rotate credential pool on 403 (Forbidden) responses
Previously _handle_credential_pool_error handled 401, 402, and 429
but silently ignored 403. When a provider returns 403 for a revoked or
unauthorised credential (e.g. Nous agent_key invalidated by a newer
login), the pool was never rotated and every subsequent request
continued to use the same failing credential.

Treat 403 the same as 402: immediately mark the current credential
exhausted and rotate to the next pool entry, since a Forbidden response
will not resolve itself with a retry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 05:20:05 -07:00
vominh1919 461899894e fix: increment request_count in least_used pool strategy
The least_used strategy selected entries via min(request_count) but
never incremented the counter. All entries stayed at count=0, so the
strategy degenerated to fill_first behavior with no actual load balancing.

Now increments request_count after each selection and persists the update.
2026-04-24 05:20:05 -07:00
Teknium b3aed6cfd8 chore(release): map l0hde and difujia in AUTHOR_MAP 2026-04-24 05:09:08 -07:00
NiuNiu Xia 76329196c1 fix(copilot): wire live /models max_prompt_tokens into context-window resolver
The Copilot provider resolved context windows via models.dev static data,
which does not include account-specific models (e.g. claude-opus-4.6-1m
with 1M context). This adds the live Copilot /models API as a higher-
priority source for copilot/copilot-acp/github-copilot providers.

New helper get_copilot_model_context() in hermes_cli/models.py extracts
capabilities.limits.max_prompt_tokens from the cached catalog. Results
are cached in-process for 1 hour.

In agent/model_metadata.py, step 5a queries the live API before falling
through to models.dev (step 5b). This ensures account-specific models
get correct context windows while standard models still have a fallback.

Part 1 of #7731.
Refs: #7272
2026-04-24 05:09:08 -07:00
NiuNiu Xia d7ad07d6fe fix(copilot): exchange raw GitHub token for Copilot API JWT
Raw GitHub tokens (gho_/github_pat_/ghu_) are now exchanged for
short-lived Copilot API tokens via /copilot_internal/v2/token before
being used as Bearer credentials. This is required to access
internal-only models (e.g. claude-opus-4.6-1m with 1M context).

Implementation:
- exchange_copilot_token(): calls the token exchange endpoint with
  in-process caching (dict keyed by SHA-256 fingerprint), refreshed
  2 minutes before expiry. No disk persistence — gateway is long-running
  so in-memory cache is sufficient.
- get_copilot_api_token(): convenience wrapper with graceful fallback —
  returns exchanged token on success, raw token on failure.
- Both callers (hermes_cli/auth.py and agent/credential_pool.py) now
  pipe the raw token through get_copilot_api_token() before use.

12 new tests covering exchange, caching, expiry, error handling,
fingerprinting, and caller integration. All 185 existing copilot/auth
tests pass.

Part 2 of #7731.
2026-04-24 05:09:08 -07:00
l0hde 2cab8129d1 feat(copilot): add 401 auth recovery with automatic token refresh and client rebuild
When using GitHub Copilot as provider, HTTP 401 errors could cause
Hermes to silently fall back to the next model in the chain instead
of recovering. This adds a one-shot retry mechanism that:

1. Re-resolves the Copilot token via the standard priority chain
   (COPILOT_GITHUB_TOKEN -> GH_TOKEN -> GITHUB_TOKEN -> gh auth token)
2. Rebuilds the OpenAI client with fresh credentials and Copilot headers
3. Retries the failed request before falling back

The fix handles the common case where the gho_* OAuth token remains
valid but the httpx client state becomes stale (e.g. after startup
race conditions or long-lived sessions).

Key design decisions:
- Always rebuild client even if token string unchanged (recovers stale state)
- Uses _apply_client_headers_for_base_url() for canonical header management
- One-shot flag guard prevents infinite 401 loops (matches existing pattern
  used by Codex/Nous/Anthropic providers)
- No token exchange via /copilot_internal/v2/token (returns 404 for some
  account types; direct gho_* auth works reliably)

Tests: 3 new test cases covering end-to-end 401->refresh->retry,
client rebuild verification, and same-token rebuild scenarios.
Docs: Updated providers.md with Copilot auth behavior section.
2026-04-24 05:09:08 -07:00
MestreY0d4-Uninter 7d2f93a97f fix: set HOME for Copilot ACP subprocesses
Pass an explicit HOME into Copilot ACP child processes so delegated ACP runs do not fail when the ambient environment is missing HOME.

Prefer the per-profile subprocess home when available, then fall back to HOME, expanduser('~'), pwd.getpwuid(...), and /home/openclaw. Add regression tests for both profile-home preference and clean HOME fallback.

Refs #11068.
2026-04-24 05:09:08 -07:00
Teknium 78450c4bd6 fix(nous-oauth): preserve obtained_at in pool + actionable message on RT reuse (#15111)
Two narrow fixes motivated by #15099.

1. _seed_from_singletons() was dropping obtained_at, agent_key_obtained_at,
   expires_in, and friends when seeding device_code pool entries from the
   providers.nous singleton. Fresh credentials showed up with
   obtained_at=None, which broke downstream freshness-sensitive consumers
   (self-heal hooks, pool pruning by age) — they treated just-minted
   credentials as older than they actually were and evicted them.

2. When the Nous Portal OAuth 2.1 server returns invalid_grant with
   'Refresh token reuse detected' in the error_description, rewrite the
   message to explain the likely cause (an external process consumed the
   rotated RT without persisting it back) and the mitigation. The generic
   reuse message led users to report this as a Hermes persistence bug when
   the actual trigger was typically a third-party monitoring script calling
   /api/oauth/token directly. Non-reuse errors keep their original server
   description untouched.

Closes #15099.

Regression tests:
- tests/agent/test_credential_pool.py::test_nous_seed_from_singletons_preserves_obtained_at_timestamps
- tests/hermes_cli/test_auth_nous_provider.py::test_refresh_token_reuse_detection_surfaces_actionable_message
- tests/hermes_cli/test_auth_nous_provider.py::test_refresh_non_reuse_error_keeps_original_description
2026-04-24 05:08:46 -07:00
Teknium 852c7f3be3 feat(cron): per-job workdir for project-aware cron runs (#15110)
Cron jobs can now specify a per-job working directory. When set, the job
runs as if launched from that directory: AGENTS.md / CLAUDE.md /
.cursorrules from that dir are injected into the system prompt, and the
terminal / file / code-exec tools use it as their cwd (via TERMINAL_CWD).
When unset, old behaviour is preserved (no project context files, tools
use the scheduler's cwd).

Requested by @bluthcy.

## Mechanism

- cron/jobs.py: create_job / update_job accept 'workdir'; validated to
  be an absolute existing directory at create/update time.
- cron/scheduler.py run_job: if job.workdir is set, point TERMINAL_CWD
  at it and flip skip_context_files to False before building the agent.
  Restored in finally on every exit path.
- cron/scheduler.py tick: workdir jobs run sequentially (outside the
  thread pool) because TERMINAL_CWD is process-global. Workdir-less jobs
  still run in the parallel pool unchanged.
- tools/cronjob_tools.py + hermes_cli/cron.py + hermes_cli/main.py:
  expose 'workdir' via the cronjob tool and 'hermes cron create/edit
  --workdir ...'. Empty string on edit clears the field.

## Validation

- tests/cron/test_cron_workdir.py (21 tests): normalize, create, update,
  JSON round-trip via cronjob tool, tick partition (workdir jobs run on
  the main thread, not the pool), run_job env toggle + restore in finally.
- Full targeted suite (tests/cron/, test_cronjob_tools.py, test_cron.py,
  test_config_cwd_bridge.py, test_worktree.py): 314/314 passed.
- Live smoke: hermes cron create --workdir $(pwd) works; relative path
  rejected; list shows 'Workdir:'; edit --workdir '' clears.
2026-04-24 05:07:01 -07:00
Teknium 0e235947b9 fix(redact): honor security.redact_secrets from config.yaml (#15109)
agent/redact.py snapshots _REDACT_ENABLED from HERMES_REDACT_SECRETS at
module-import time. hermes_cli/main.py calls setup_logging() early, which
transitively imports agent.redact — BEFORE any config bridge has run. So
users who set 'security.redact_secrets: false' in config.yaml (instead of
HERMES_REDACT_SECRETS=false in .env) had the toggle silently ignored in
both 'hermes chat' and 'hermes gateway run'.

Bridge config.yaml -> env var in hermes_cli/main.py BEFORE setup_logging.
.env still wins (only set env when unset) — config.yaml is the fallback.

Regression tests in tests/hermes_cli/test_redact_config_bridge.py spawn
fresh subprocesses to verify:
- redact_secrets: false in config.yaml disables redaction
- default (key absent) leaves redaction enabled
- .env HERMES_REDACT_SECRETS=true overrides config.yaml
2026-04-24 05:03:26 -07:00
Teknium c2b3db48f5 fix(agent): retry on json.JSONDecodeError instead of treating it as a local validation error (#15107)
json.JSONDecodeError inherits from ValueError. The agent loop's
non-retryable classifier at run_agent.py ~L10782 treated any
ValueError/TypeError as a local programming bug and short-circuited
retry. Without a carve-out, a transient JSONDecodeError from a
provider that returned a malformed response body, a truncated stream,
or a router-layer corruption would fail the turn immediately.

Add JSONDecodeError to the existing UnicodeEncodeError exclusion
tuple so the classified-retry logic (which already handles 429/529/
context-overflow/etc.) gets to run on bad-JSON errors.

Tests (tests/run_agent/test_jsondecodeerror_retryable.py):
  - JSONDecodeError: NOT local validation
  - UnicodeEncodeError: NOT local validation (existing carve-out)
  - bare ValueError: IS local validation (programming bug)
  - bare TypeError: IS local validation (programming bug)
  - source-level assertion that run_agent.py still carries the carve-out
    (guards against accidental revert)

Closes #14782
2026-04-24 05:02:58 -07:00
Teknium 1eb29e6452 fix(opencode): derive api_mode from target model, not stale config default (#15106)
/model kimi-k2.6 on opencode-zen (or glm-5.1 on opencode-go) returned OpenCode's
website 404 HTML page when the user's persisted model.default was a Claude or
MiniMax model. The switched-to chat_completions request hit
https://opencode.ai/zen (or /zen/go) with no /v1 suffix.

Root cause: resolve_runtime_provider() computed api_mode from
model_cfg.get('default') instead of the model being requested. With a Claude
default, it resolved api_mode=anthropic_messages, stripped /v1 from base_url
(required for the Anthropic SDK), then switch_model()'s opencode_model_api_mode
override flipped api_mode back to chat_completions without restoring /v1.

Fix: thread an optional target_model kwarg through resolve_runtime_provider
and _resolve_runtime_from_pool_entry. When the caller is performing an explicit
mid-session model switch (i.e. switch_model()), the target model drives both
api_mode selection and the conditional /v1 strip. Other callers (CLI init,
gateway init, cron, ACP, aux client, delegate, account_usage, tui_gateway) pass
nothing and preserve the existing config-default behavior.

Regression tests added in test_model_switch_opencode_anthropic.py use the REAL
resolver (not a mock) to guard the exact Quentin-repro scenario. Existing tests
that mocked resolve_runtime_provider with 'lambda requested:' had their mock
signatures widened to '**kwargs' to accept the new kwarg.
2026-04-24 04:58:46 -07:00
Teknium 7634c1386f feat(delegate): diagnostic dump when a subagent times out with 0 API calls (#15105)
When a subagent in delegate_task times out before making its first LLM
request, write a structured diagnostic file under
~/.hermes/logs/subagent-timeout-<sid>-<ts>.log capturing enough state
for the user (and us) to debug the hang. The old error message —
'Subagent timed out after Ns with no response. The child may be stuck
on a slow API call or unresponsive network request.' — gave no
observability for the 0-API-call case, which is the hardest to reason
about remotely.

The diagnostic captures:
  - timeout config vs actual duration
  - goal (truncated to 1000 chars)
  - child config: model, provider, api_mode, base_url, max_iterations,
    quiet_mode, platform, _delegate_role, _delegate_depth
  - enabled_toolsets + loaded tool names
  - system prompt byte/char count (catches oversized prompts that
    providers silently choke on)
  - tool schema count + byte size
  - child's get_activity_summary() snapshot
  - Python stack of the worker thread at the moment of timeout
    (reveals whether the hang is in credential resolution, transport,
    prompt construction, etc.)

Wiring:
  - _run_single_child captures the worker thread via a small wrapper
    around child.run_conversation so we can look up its stack at
    timeout.
  - After a FuturesTimeoutError, we pull child.get_activity_summary()
    to read api_call_count. If 0 AND it was a timeout (not a raise),
    _dump_subagent_timeout_diagnostic() is invoked.
  - The returned path is surfaced in the error string so the parent
    agent (and therefore the user / gateway) sees exactly where to look.
  - api_calls > 0 timeouts keep the old 'stuck on slow API call'
    phrasing since that's the correct diagnosis for those.

This does NOT change any behavior for successful subagent runs,
non-timeout errors, or subagents that made at least one API call
before hanging.

Tests: 7 cases (tests/tools/test_delegate_subagent_timeout_diagnostic.py)
  - output format + required sections + field values
  - long-goal truncation with [truncated] marker
  - missing / already-exited worker thread branches
  - unwritable HERMES_HOME/logs/ returns None without raising
  - _run_single_child wiring: 0 API calls → dump + diagnostic_path in error
  - _run_single_child wiring: N>0 API calls → no dump, old message

Refs: #14726
2026-04-24 04:58:32 -07:00
Teknium 3cb43df2cd chore(release): add georgex8001 to AUTHOR_MAP 2026-04-24 04:54:16 -07:00
georgex8001 1dca2e0a28 fix(runtime): resolve bare custom provider to loopback or CUSTOM_BASE_URL
When /model selects Custom but model.provider in YAML still reflects a prior provider, trust model.base_url only for loopback hosts or when provider is custom. Consult CUSTOM_BASE_URL before OpenRouter defaults (#14676).
2026-04-24 04:54:16 -07:00
Teknium 2f39dbe471 chore(release): map j3ffffff and A-FdL-Prog in AUTHOR_MAP 2026-04-24 04:53:32 -07:00
Matt Maximo 271f0e6eb0 fix(model): let Codex setup reuse or reauthenticate 2026-04-24 04:53:32 -07:00
Devzo 813dbd9b40 fix(codex): route auth failures to fallback provider chain
Two related paths where Codex auth failures silently swallowed the
fallback chain instead of switching to the next provider:

1. cli.py — _ensure_runtime_credentials() calls resolve_runtime_provider()
   before each turn. When provider is explicitly configured (not "auto"),
   an AuthError from token refresh is re-raised and printed as a bold-red
   error, returning False before the agent ever starts. The fallback chain
   was never tried. Fix: on AuthError, iterate fallback_providers and
   switch to the first one that resolves successfully.

2. run_agent.py — inside the codex_responses validity gate (inner retry
   loop), response.status in {"failed","cancelled"} with non-empty output
   items was treated as a valid response and broke out of the retry loop,
   reaching _normalize_codex_response() outside the fallback machinery.
   That function raises RuntimeError on status="failed", which propagates
   to the outer except with no fallback logic. Fix: detect terminal status
   codes before the output_items check and set response_invalid=True so
   the existing fallback chain fires normally.
2026-04-24 04:53:32 -07:00
j3ffffff f76df30e08 fix(auth): parse OpenAI nested error shape in Codex token refresh
OpenAI's OAuth token endpoint returns errors in a nested shape —
{"error": {"code": "refresh_token_reused", "message": "..."}} —
not the OAuth spec's flat {"error": "...", "error_description": "..."}.
The existing parser only handled the flat shape, so:

- `err.get("error")` returned a dict, the `isinstance(str)` guard
  rejected it, and `code` stayed `"codex_refresh_failed"`.
- The dedicated `refresh_token_reused` branch (with its actionable
  "re-run codex + hermes auth" message and `relogin_required=True`)
  never fired.
- Users saw the generic "Codex token refresh failed with status 401"
  when another Codex client (CLI, VS Code extension) had consumed
  their single-use refresh token — giving no hint that re-auth was
  required.

Parse both shapes, mapping OpenAI's nested `code`/`type` onto the
existing `code` variable so downstream branches (`refresh_token_reused`,
`invalid_grant`, etc.) fire correctly.

Add regression tests covering:
- nested `refresh_token_reused` → actionable message + relogin_required
- nested generic code → code + message surfaced
- flat OAuth-spec `invalid_grant` still handled (back-compat)
- unparseable body → generic fallback message, relogin_required=False

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-24 04:53:32 -07:00
Teknium 227afcd80f chore(release): map jiechengwu@pony.ai to Jason2031
AUTHOR_MAP entry for the cherry-picked commit in salvaged PR #13483
so release notes attribute correctly.
2026-04-24 04:52:11 -07:00
Teknium 06b60b76cd fix(docker): safer docker-compose defaults for UID and dashboard bind
Follow-up to salvaged PR #13483:
- Default HERMES_UID/HERMES_GID to 10000 (matches Dockerfile's useradd
  and the entrypoint's default) instead of 1001. Users should set these
  to their own id -u / id -g; document that in the header.
- Dashboard service: bind to 127.0.0.1 without --insecure by default.
  The dashboard stores API keys; the original compose file exposed it on
  0.0.0.0 with auth explicitly disabled, which the dashboard's own
  --insecure help text flags as DANGEROUS.
- Add header comments explaining HERMES_UID usage, the dashboard
  security posture, and how to expose the API server safely.
2026-04-24 04:52:11 -07:00
Jiecheng Wu 14c9f7272c fix(docker): fix HERMES_UID permission handling and add docker-compose.yml
- Remove 'USER hermes' from Dockerfile so entrypoint runs as root and can
  usermod/groupmod before gosu drop. Add chmod -R a+rX /opt/hermes so any
  remapped UID can read the install directory.
- Fix entrypoint chown logic: always chown -R when HERMES_UID is remapped
  from default 10000, not just when top-level dir ownership mismatches.
- Add docker-compose.yml with gateway + dashboard services.
- Add .hermes to .gitignore.
2026-04-24 04:52:11 -07:00
LeonSGP43 ccc8fccf77 fix(cli): validate user-defined providers consistently 2026-04-24 04:48:56 -07:00
Teknium 3aa1a41e88 feat(gemini): block free-tier keys at setup + surface guidance on 429 (#15100)
Google AI Studio's free tier (<= 250 req/day for gemini-2.5-flash) is
exhausted in a handful of agent turns, so the setup wizard now refuses
to wire up Gemini when the supplied key is on the free tier, and the
runtime 429 handler appends actionable billing guidance.

Setup-time probe (hermes_cli/main.py):
- `_model_flow_api_key_provider` fires one minimal generateContent call
  when provider_id == 'gemini' and classifies the response as
  free/paid/unknown via x-ratelimit-limit-requests-per-day header or
  429 body containing 'free_tier'.
- Free  -> print block message, refuse to save the provider, return.
- Paid  -> 'Tier check: paid' and proceed.
- Unknown (network/auth error) -> 'could not verify', proceed anyway.

Runtime 429 handler (agent/gemini_native_adapter.py):
- `gemini_http_error` appends billing guidance when the 429 error body
  mentions 'free_tier', catching users who bypass setup by putting
  GOOGLE_API_KEY directly in .env.

Tests: 21 unit tests for the probe + error path, 4 tests for the
setup-flow block. All 67 existing gemini tests still pass.
2026-04-24 04:46:17 -07:00
Teknium 346601ca8d fix(context): invalidate stale Codex OAuth cache entries >= 400k (#15078)
PR #14935 added a Codex-aware context resolver but only new lookups
hit the live /models probe. Users who had run Hermes on gpt-5.5 / 5.4
BEFORE that PR already had the wrong value (e.g. 1,050,000 from
models.dev) persisted in ~/.hermes/context_length_cache.yaml, and the
cache-first lookup in get_model_context_length() returns it forever.

Symptom (reported in the wild by Ludwig, min heo, Gaoge on current
main at 6051fba9d, which is AFTER #14935):
  * Startup banner shows context usage against 1M
  * Compression fires late and then OpenAI hard-rejects with
    'context length will be reduced from 1,050,000 to 128,000'
    around the real 272k boundary.

Fix: when the step-1 cache returns a value for an openai-codex lookup,
check whether it's >= 400k. Codex OAuth caps every slug at 272k (live
probe values) so anything at or above 400k is definitionally a
pre-#14935 leftover. Drop that entry from the on-disk cache and fall
through to step 5, which runs the live /models probe and repersists
the correct value (or 272k from the hardcoded fallback if the probe
fails). Non-Codex providers and legitimately-cached Codex entries at
272k are untouched.

Changes:
- agent/model_metadata.py:
  * _invalidate_cached_context_length() — drop a single entry from
    context_length_cache.yaml and rewrite the file.
  * Step-1 cache check in get_model_context_length() now gates
    provider=='openai-codex' entries >= 400k through invalidation
    instead of returning them.

Tests (3 new in TestCodexOAuthContextLength):
- stale 1.05M Codex entry is dropped from disk AND re-resolved
  through the live probe to 272k; unrelated cache entries survive.
- fresh 272k Codex entry is respected (no probe call, no invalidation).
- non-Codex 1M entries (e.g. anthropic/claude-opus-4.6 on OpenRouter)
  are unaffected — the guard is strictly scoped to openai-codex.

Full tests/agent/test_model_metadata.py: 88 passed.
2026-04-24 04:46:07 -07:00
Teknium 18f3fc8a6f fix(tests): resolve 17 persistent CI test failures (#15084)
Make the main-branch test suite pass again. Most failures were tests
still asserting old shapes after recent refactors; two were real source
bugs.

Source fixes:
- tools/mcp_tool.py: _kill_orphaned_mcp_children() slept 2s on every
  shutdown even when no tracked PIDs existed, making test_shutdown_is_parallel
  measure ~3s for 3 parallel 1s shutdowns. Early-return when pids is empty.
- hermes_cli/tips.py: tip 105 was 157 chars; corpus max is 150.

Test fixes (mostly stale mock targets / missing fixture fields):
- test_zombie_process_cleanup, test_agent_cache: patch run_agent.cleanup_vm
  (the local name bound at import), not tools.terminal_tool.cleanup_vm.
- test_browser_camofox: patch tools.browser_camofox.load_config, not
  hermes_cli.config.load_config (the source module, not the resolved one).
- test_flush_memories_codex._chat_response_with_memory_call: add
  finish_reason, tool_call.id, tool_call.type so the chat_completions
  transport normalizer doesn't AttributeError.
- test_concurrent_interrupt: polling_tool signature now accepts
  messages= kwarg that _invoke_tool() passes through.
- test_minimax_provider: add _fallback_chain=[] to the __new__'d agent
  so switch_model() doesn't AttributeError.
- test_skills_config: SKILLS_DIR MagicMock + .rglob stopped working
  after the scanner switched to agent.skill_utils.iter_skill_index_files
  (os.walk-based). Point SKILLS_DIR at a real tmp_path and patch
  agent.skill_utils.get_external_skills_dirs.
- test_browser_cdp_tool: browser_cdp toolset was intentionally split into
  'browser-cdp' (commit 96b0f3700) so its stricter check_fn doesn't gate
  the whole browser toolset; test now expects 'browser-cdp'.
- test_registry: add tools.browser_dialog_tool to the expected
  builtin-discovery set (PR #14540 added it).
- test_file_tools TestPatchHints: patch_tool surfaces hints as a '_hint'
  key on the JSON payload, not inline '[Hint: ...' text.
- test_write_deny test_hermes_env: resolve .env via get_hermes_home() so
  the path matches the profile-aware denylist under hermetic HERMES_HOME.
- test_checkpoint_manager test_falls_back_to_parent: guard the walk-up
  so a stray /tmp/pyproject.toml on the host doesn't pick up /tmp as the
  project root.
- test_quick_commands: set cli.session_id in the __new__'d CLI so the
  alias-args path doesn't trip AttributeError when fuzzy-matching leaks
  a skill command across xdist test distribution.
2026-04-24 03:46:46 -07:00
Teknium 1f9c368622 fix(gemini): drop integer/number/boolean enums from tool schemas (#15082)
Gemini's Schema validator requires every `enum` entry to be a string,
even when the parent `type` is integer/number/boolean. Discord's
`auto_archive_duration` parameter (`type: integer, enum: [60, 1440,
4320, 10080]`) tripped this on every request that shipped the full
tool catalog to generativelanguage.googleapis.com, surfacing as
`Gateway: Non-retryable client error: Gemini HTTP 400 (INVALID_ARGUMENT)
Invalid value ... (TYPE_STRING), 60` and aborting the turn.

Sanitize by dropping the `enum` key when the declared type is numeric
or boolean and any entry is non-string. The `type` and `description`
survive, so the model still knows the allowed values; the tool handler
keeps its own runtime validation. Other providers (OpenAI,
OpenRouter, Anthropic) are unaffected — the sanitizer only runs for
native Gemini / cloudcode adapters.

Reported by @selfhostedsoul on Discord with hermes debug share.
2026-04-24 03:40:00 -07:00
Nicolò Boschi edff2fbe7e feat(hindsight): optional bank_id_template for per-agent / per-user banks
Adds an optional bank_id_template config that derives the bank name at
initialize() time from runtime context. Existing users with a static
bank_id keep the current behavior (template is empty by default).

Supported placeholders:
  {profile}   — active Hermes profile (agent_identity kwarg)
  {workspace} — Hermes workspace (agent_workspace kwarg)
  {platform}  — cli, telegram, discord, etc.
  {user}      — platform user id (gateway sessions)
  {session}   — session id

Unsafe characters in placeholder values are sanitized, and empty
placeholders collapse cleanly (e.g. "hermes-{user}" with no user
becomes "hermes"). If the template renders empty, the static bank_id
is used as a fallback.

Common uses:
  bank_id_template: hermes-{profile}            # isolate per Hermes profile
  bank_id_template: {workspace}-{profile}       # workspace + profile scoping
  bank_id_template: hermes-{user}               # per-user banks for gateway
2026-04-24 03:38:17 -07:00
Nicolò Boschi f9c6c5ab84 fix(hindsight): scope document_id per process to avoid resume overwrite (#6602)
Reusing session_id as document_id caused data loss on /resume: when
the session is loaded again, _session_turns starts empty and the next
retain replaces the entire previously stored content.

Now each process lifecycle gets its own document_id formed as
{session_id}-{startup_timestamp}, so:
- Same session, same process: turns accumulate into one document (existing behavior)
- Resume (new process, same session): writes a new document, old one preserved
- Forks: child process gets its own document; parent's doc is untouched

Also adds session lineage tags so all processes for the same session
(or its parent) can still be filtered together via recall:
- session:<session_id> on every retain
- parent:<parent_session_id> when initialized with parent_session_id

Closes #6602
2026-04-24 03:38:17 -07:00
Teknium 3a86f70969 test(hindsight): update materialize-profile-env test for HINDSIGHT_TIMEOUT
The existing test_local_embedded_setup_materializes_profile_env expected
exact equality on ~/.hermes/.env content; the new HINDSIGHT_TIMEOUT=120
line from the timeout feature now appears in that file. Append it to the
expected string so the test reflects the new post_setup output.
2026-04-24 03:36:02 -07:00
tekgnosis-net f1ba2f0c0b fix(hindsight): use configured timeout in _run_sync for all async operations
The previous commit added HINDSIGHT_TIMEOUT as a configurable env var,
but _run_sync still used the hardcoded _DEFAULT_TIMEOUT (120s). All
async operations (recall, retain, reflect, aclose) now go through an
instance method that uses self._timeout, so the configured value is
actually applied.

Also: added backward-compatible alias comment for the module-level
function.
2026-04-24 03:36:02 -07:00
tekgnosis-net 403c82b6b6 feat(hindsight): add configurable HINDSIGHT_TIMEOUT env var
The Hindsight Cloud API can take 30-40 seconds per request. The
hardcoded 30s timeout was too aggressive and caused frequent
timeout errors. This patch:

1. Adds HINDSIGHT_TIMEOUT environment variable (default: 120s)
2. Adds timeout to the config schema for setup wizard visibility
3. Uses the configurable timeout in both _run_sync() and client creation
4. Reads from config.json or env var, falling back to 120s default

This makes the timeout upgrade-proof — users can set it via env var
or config without patching source code.

Signed-off-by: Kumar <kumar@tekgnosis.net>
2026-04-24 03:36:02 -07:00
Jason Perlow 93a74f74bf fix(hindsight): preserve shared event loop across provider shutdowns
The module-global `_loop` / `_loop_thread` pair is shared across every
`HindsightMemoryProvider` instance in the process — the plugin loader
creates one provider per `AIAgent`, and the gateway creates one `AIAgent`
per concurrent chat session (Telegram/Discord/Slack/CLI).

`HindsightMemoryProvider.shutdown()` stopped the shared loop when any one
session ended. That stranded the aiohttp `ClientSession` and `TCPConnector`
owned by every sibling provider on a now-dead loop — they were never
reachable for close and surfaced as the `Unclosed client session` /
`Unclosed connector` warnings reported in #11923.

Fix: stop stopping the shared loop in `shutdown()`. Per-provider cleanup
still closes that provider's own client via `self._client.aclose()`. The
loop runs on a daemon thread and is reclaimed on process exit; keeping
it alive between provider shutdowns means sibling providers can drain
their own sessions cleanly.

Regression tests in `tests/plugins/memory/test_hindsight_provider.py`
(`TestSharedEventLoopLifecycle`):

- `test_shutdown_does_not_stop_shared_event_loop` — two providers share
  the loop; shutting down one leaves the loop live for the other. This
  test reproduces the #11923 leak on `main` and passes with the fix.
- `test_client_aclose_called_on_cloud_mode_shutdown` — each provider's
  own aiohttp session is still closed via `aclose()`.

Fixes #11923.
2026-04-24 03:34:12 -07:00
Teknium b4c030025f chore(release): map Nicecsh in AUTHOR_MAP
Required by CI for the #15030 salvage — Nicecsh's commits
(cshong2017@outlook.com) carry their authorship into main.
2026-04-24 03:33:29 -07:00
Teknium 42d6ab5082 test(gateway): unify discord mock via shared conftest; drop duplicated mock in model_picker test
The cherry-picked model_picker test installed its own discord mock at
module-import time via a local _ensure_discord_mock(), overwriting
sys.modules['discord'] with a mock that lacked attributes other
gateway tests needed (Intents.default(), File, app_commands.Choice).
On pytest-xdist workers that collected test_discord_model_picker.py
first, the shared mock in tests/gateway/conftest.py got clobbered and
downstream tests failed with AttributeError / TypeError against
missing mock attrs. Classic sys.modules cross-test pollution (see
xdist-cross-test-pollution skill).

Fix:
- Extend the canonical _ensure_discord_mock() in tests/gateway/conftest.py
  to cover everything the model_picker test needs: real View/Select/
  Button/SelectOption classes (not MagicMock sentinels), an Embed
  class that preserves title/description/color kwargs for assertion,
  and Color.greyple.
- Strip the duplicated mock-setup block from test_discord_model_picker.py
  and rely on the shared mock that conftest installs at collection
  time.

Regression check:
  scripts/run_tests.sh tests/gateway/ tests/hermes_cli/ -k 'discord or model or copilot or provider' -o 'addopts='
  1291 passed (was 1288 passed + 3 xdist-ordered failures before this commit).
2026-04-24 03:33:29 -07:00
Nicecsh fe34741f32 fix(model): repair Discord Copilot /model flow
Keep Discord Copilot model switching responsive and current by refreshing picker data from the live catalog when possible, correcting the curated fallback list, and clearing stale controls before the switch completes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 03:33:29 -07:00
Nicecsh 2e2de124af fix(aux): normalize GitHub Copilot provider slugs
Keep auxiliary provider resolution aligned with the switch and persisted main-provider paths when models.dev returns github-copilot slugs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 03:33:29 -07:00
LeonSGP43 df55660e3c fix(hindsight): disable broken local runtime on unsupported CPUs 2026-04-24 03:33:14 -07:00
kshitij 7897f65a94 fix(normalize): lowercase Xiaomi model IDs for case-insensitive config (#15066)
Xiaomi's API (api.xiaomimimo.com) requires lowercase model IDs like
"mimo-v2.5-pro" but rejects mixed-case names like "MiMo-V2.5-Pro"
that users copy from marketing docs or the ProviderEntry description.

Add _LOWERCASE_MODEL_PROVIDERS set and apply .lower() to model names
for providers in this set (currently just xiaomi) after stripping the
provider prefix. This ensures any case variant in config.yaml is
normalized before hitting the API.

Other providers (minimax, zai, etc.) are NOT affected — their APIs
accept mixed case (e.g. MiniMax-M2.7).
2026-04-24 03:33:05 -07:00
bwjoke 3e994e38f7 [verified] fix: materialize hindsight profile env during setup 2026-04-24 03:30:11 -07:00
JC的AI分身 127048e643 fix(hindsight): accept snake_case api_key config 2026-04-24 03:30:03 -07:00
harryplusplus d6b65bbc47 fix(hindsight): preserve non-ASCII text in retained conversation turns 2026-04-24 03:29:58 -07:00
Chris Danis a5c7422f23 fix(hindsight): always write HINDSIGHT_LLM_API_KEY to .env, even when empty
When user runs
  ✓ Memory provider: built-in only
  Saved to config.yaml and leaves the API key blank,
the old code skipped writing it entirely. This caused the uvx daemon
launcher to fail at startup because it couldn't distinguish between
"key not configured" and "explicitly blank key."

Now HINDSIGHT_LLM_API_KEY is always written to .env so the value
is either set or explicitly empty.
2026-04-24 03:29:53 -07:00
Teknium 3c0a728607 chore(release): map hindsight PR contributors in AUTHOR_MAP (#15070)
Adds AUTHOR_MAP entries for perlowja, tangyuanjc, harryplusplus
ahead of merging PRs #14109, #13153, #13090.
2026-04-24 03:29:46 -07:00
Teknium 339123481e chore(release): map ericnicolaides (wildcat.local commit email) in AUTHOR_MAP 2026-04-24 03:21:29 -07:00
WildCat Eng Manager 9e6f34a76e docs: document prompt_caching.cache_ttl in cli-config example
Made-with: Cursor
2026-04-24 03:21:29 -07:00
WildCat Eng Manager 7626f3702e feat: read prompt caching cache_ttl from config
- Load prompt_caching.cache_ttl in AIAgent (5m default, 1h opt-in)
- Document DEFAULT_CONFIG and developer guide example
- Add unit tests for default, 1h, and invalid TTL fallback

Made-with: Cursor
2026-04-24 03:21:29 -07:00
Teknium 9de555f3e3 chore(release): add 0xharryriddle to AUTHOR_MAP 2026-04-24 03:17:18 -07:00
Harry Riddle ac25e6c99a feat(auth-codex): add config-provider fallback detection for logout in hermes-agent/hermes_cli/auth.py 2026-04-24 03:17:18 -07:00
Teknium b2e124d082 refactor(commands): drop /provider, /plan handler, and clean up slash registry (#15047)
* refactor(commands): drop /provider and clean up slash registry

* refactor(commands): drop /plan special handler — use plain skill dispatch
2026-04-24 03:10:52 -07:00
Teknium b29287258a fix(aux-client): honor api_mode: anthropic_messages for named custom providers (#15059)
Auxiliary tasks (session_search, flush_memories, approvals, compression,
vision, etc.) that route to a named custom provider declared under
config.yaml 'providers:' with 'api_mode: anthropic_messages' were
silently building a plain OpenAI client and POSTing to
{base_url}/chat/completions, which returns 404 on Anthropic-compatible
gateways that only expose /v1/messages.

Two gaps caused this:

1. hermes_cli/runtime_provider.py::_get_named_custom_provider — the
   providers-dict branch (new-style) returned only name/base_url/api_key/
   model and dropped api_mode. The legacy custom_providers-list branch
   already propagated it correctly. The dict branch now parses and
   returns api_mode via _parse_api_mode() in both match paths.

2. agent/auxiliary_client.py::resolve_provider_client — the named
   custom provider block at ~L1740 ignored custom_entry['api_mode']
   and unconditionally built an OpenAI client (only wrapping for
   Codex/Responses). It now mirrors _try_custom_endpoint()'s three-way
   dispatch: anthropic_messages → AnthropicAuxiliaryClient (async wrapped
   in AsyncAnthropicAuxiliaryClient), codex_responses → CodexAuxiliaryClient,
   otherwise plain OpenAI. An explicit task-level api_mode override
   still wins over the provider entry's declared api_mode.

Fixes #15033

Tests: tests/agent/test_auxiliary_named_custom_providers.py gains a
TestProvidersDictApiModeAnthropicMessages class covering

  - providers-dict preserves valid api_mode
  - invalid api_mode values are dropped
  - missing api_mode leaves the entry unchanged (no regression)
  - resolve_provider_client returns (Async)AnthropicAuxiliaryClient for
    api_mode=anthropic_messages
  - full chain via get_text_auxiliary_client / get_async_text_auxiliary_client
    with an auxiliary.<task> override
  - providers without api_mode still use the OpenAI-wire path
2026-04-24 03:10:30 -07:00
luyao618 bc15f526fb fix(agent): exclude prior-history tool messages from background review summary
Cherry-pick-of: 27b6a217b (PR #14967 by @luyao618)

Co-authored-by: luyao618 <364939526@qq.com>
2026-04-24 03:10:19 -07:00
Teknium ba3284f34a chore(release): map salvage-batch contributors in AUTHOR_MAP
Adds three contributors whose commits land via this batch of salvage PRs:

- @mrunmayee17 (mrunmayeerane17@gmail.com) — Discord wildcard fix #14920
- @camaragon   (69489633+camaragon@users.noreply.github.com) — ACP MCP fix #14986
- @shamork     (shamork@outlook.com) — NO_PROXY bypass fix #14966

Required by CI, which rejects PRs with unmapped personal emails.
2026-04-24 03:04:42 -07:00
Teknium f24956ba12 fix(resume): redirect --resume to the descendant that actually holds the messages
When context compression fires mid-session, run_agent's _compress_context
ends the current session, creates a new child session linked by
parent_session_id, and resets the SQLite flush cursor. New messages land
in the child; the parent row ends up with message_count = 0. A user who
runs 'hermes --resume <original_id>' sees a blank chat even though the
transcript exists — just under a descendant id.

PR #12920 already fixed the exit banner to print the live descendant id
at session end, but that didn't help users who resume by a session id
captured BEFORE the banner update (scripts, sessions list, old terminal
scrollback) or who type the parent id manually.

Fix: add SessionDB.resolve_resume_session_id() which walks the
parent→child chain forward and returns the first descendant with at
least one message row. Wire it into all three resume entry points:

  - HermesCLI._preload_resumed_session() (early resume at run() time)
  - HermesCLI._init_agent() (the classical resume path)
  - /resume slash command

Semantics preserved when the chain has no descendants with messages,
when the requested session already has messages, or when the id is
unknown. A depth cap of 32 guards against malformed loops.

This does NOT concatenate the pre-compression parent transcript into
the child — the whole point of compression is to shrink that, so
replaying it would blow the cache budget we saved. We just jump to
the post-compression child. The summary already reflects what was
compressed away.

Tests: tests/hermes_state/test_resolve_resume_session_id.py covers
  - the exact 6-session shape from the issue
  - passthrough when session has messages / no descendants
  - passthrough for nonexistent / empty / None input
  - middle-of-chain redirects
  - fork resolution (prefers most-recent child)

Closes #15000
2026-04-24 03:04:42 -07:00
Teknium 166b960fe4 test(proxy): regression tests for NO_PROXY bypass on keepalive client
Pin the behaviour added in the preceding commit — `_get_proxy_for_base_url()`
must return None for hosts covered by NO_PROXY and the HTTPS_PROXY otherwise,
and the full `_create_openai_client()` path must NOT mount HTTPProxy for a
NO_PROXY host.

Refs: #14966
2026-04-24 03:04:42 -07:00
shamork cbc39a8672 fix(proxy): honor no_proxy for local custom endpoints 2026-04-24 03:04:42 -07:00
Cameron Aragon dfc5563641 fix(acp): include MCP toolsets in ACP sessions 2026-04-24 03:04:42 -07:00
Teknium 8a1e247c6c fix(discord): honor wildcard '*' in ignored_channels and free_response_channels
Follow-up to the allowed_channels wildcard fix in the preceding commit.
The same '*' literal trap affected two other Discord channel config lists:

- DISCORD_IGNORED_CHANNELS: '*' was stored as the literal string in the
  ignored set, and the intersection check never matched real channel IDs,
  so '*' was a no-op instead of silencing every channel.
- DISCORD_FREE_RESPONSE_CHANNELS: same shape — '*' never matched, so
  the bot still required a mention everywhere.

Add a '*' short-circuit to both checks, matching the allowed_channels
semantics. Extend tests/gateway/test_discord_allowed_channels.py with
regression coverage for all three lists.

Refs: #14920
2026-04-24 03:04:42 -07:00
Mrunmayee Rane 8598746e86 fix(discord): honor wildcard '*' in DISCORD_ALLOWED_CHANNELS
allowed_channels: "*" in config (or DISCORD_ALLOWED_CHANNELS="*" env var)
is meant to allow all channels, but the check was comparing numeric channel
IDs against the literal string set {"*"} via set intersection — always empty,
so every message was silently dropped.

Add a "*" short-circuit before the set intersection, consistent with every
other platform's allowlist handling (Signal, Slack, Telegram all do this).

Fixes #14920
2026-04-24 03:04:42 -07:00
Teknium f58a16f520 fix(auth): apply verify= to Codex OAuth /models probe (#15049)
Follow-up to PR #14533 — applies the same _resolve_requests_verify()
treatment to the one requests.get() site the PR missed (Codex OAuth
chatgpt.com /models probe). Keeps all seven requests.get() callsites
in model_metadata.py consistent so HERMES_CA_BUNDLE / REQUESTS_CA_BUNDLE /
SSL_CERT_FILE are honored everywhere.

Co-authored-by: teknium1 <teknium@hermes-agent>
2026-04-24 03:02:24 -07:00
Teknium 621fd348dc chore(release): add ReginaldasR to AUTHOR_MAP 2026-04-24 03:02:16 -07:00
Reginaldas 3e10f339fd fix(providers): send user agent to routermint endpoints 2026-04-24 03:02:16 -07:00
Teknium 5fdba79eb4 chore(release): add keiravoss94 AUTHOR_MAP entry 2026-04-24 03:02:03 -07:00
Keira Voss 2ba9b29f37 docs(plugins): correct pre_gateway_dispatch doc text and add hooks.md section
Follow-up to aeff6dfe:

- Fix semantic error in VALID_HOOKS inline comment ("after core auth" ->
  "before auth"). Hook intentionally runs BEFORE auth so plugins can
  handle unauthorized senders without triggering the pairing flow.
- Fix wrong class name in the same comment (HermesGateway ->
  GatewayRunner, matching gateway/run.py).
- Add a full ### pre_gateway_dispatch section in
  website/docs/user-guide/features/hooks.md (matches the pattern of
  every other plugin hook: signature, params table, fires-where,
  return-value table, use cases, two worked examples) plus a row in
  the quick-reference table.
- Add the anchor link on the plugins.md table row so it matches the
  other hook entries.

No code behavior change.
2026-04-24 03:02:03 -07:00
Keira Voss 1ef1e4c669 feat(plugins): add pre_gateway_dispatch hook
Introduces a new plugin hook `pre_gateway_dispatch` fired once per
incoming MessageEvent in `_handle_message`, after the internal-event
guard but before the auth / pairing chain. Plugins may return a dict
to influence flow:

    {"action": "skip",    "reason": "..."}  -> drop (no reply)
    {"action": "rewrite", "text":   "..."}  -> replace event.text
    {"action": "allow"}  /  None             -> normal dispatch

Motivation: gateway-level message-flow patterns that don't fit cleanly
into any single adapter — e.g. listen-only group-chat windows (buffer
ambient messages, collapse on @mention), or human-handover silent
ingest (record messages while an owner handles the chat manually).
Today these require forking core; with this hook they can live in a
single profile-agnostic plugin.

Hook runs BEFORE auth so plugins can handle unauthorized senders
(e.g. customer-service handover ingest) without triggering the
pairing-code flow. Exceptions in plugin callbacks are caught and
logged; the first non-None action dict wins, remaining results are
ignored.

Includes:
- `VALID_HOOKS` entry + inline doc in `hermes_cli/plugins.py`
- Invocation block in `gateway/run.py::_handle_message`
- 5 new tests in `tests/gateway/test_pre_gateway_dispatch.py`
  (skip, rewrite, allow, exception safety, internal-event bypass)
- 2 additional tests in `tests/hermes_cli/test_plugins.py`
- Table entry in `website/docs/user-guide/features/plugins.md`

Made-with: Cursor
2026-04-24 03:02:03 -07:00
0xbyt4 8aa37a0cf9 fix(auth): honor SSL CA env vars across httpx + requests callsites
- hermes_cli/auth.py: add _default_verify() with macOS Homebrew certifi
  fallback (mirrors weixin 3a0ec1d93). Extend env var chain to include
  REQUESTS_CA_BUNDLE so one env var works across httpx + requests paths.
- agent/model_metadata.py: add _resolve_requests_verify() reading
  HERMES_CA_BUNDLE / REQUESTS_CA_BUNDLE / SSL_CERT_FILE in priority
  order. Apply explicit verify= to all 6 requests.get callsites.
- Tests: 18 new unit tests + autouse platform pin on existing
  TestResolveVerifyFallback to keep its "returns True" assertions
  platform-independent.

Empirically verified against self-signed HTTPS server: requests honors
REQUESTS_CA_BUNDLE only; httpx honors SSL_CERT_FILE only. Hermes now
honors all three everywhere.

Triggered by Discord reports — Nous OAuth SSL failure on macOS
Homebrew Python; custom provider self-signed cert ignored despite
REQUESTS_CA_BUNDLE set in env.
2026-04-24 03:00:33 -07:00
Teknium b0cb81a089 fix(auth): route alibaba_coding* aliases through resolve_provider
The aliases were added to hermes_cli/providers.py but auth.py has its own
_PROVIDER_ALIASES table inside resolve_provider() that is consulted before
PROVIDER_REGISTRY lookup. Without this, provider: alibaba_coding in
config.yaml (the exact repro from #14940) raised 'Unknown provider'.

Mirror the three aliases into auth.py so resolve_provider() accepts them.
2026-04-24 02:59:32 -07:00
ygd58 727d1088c4 fix(providers): register alibaba-coding-plan as a first-class provider
The alibaba-coding-plan provider (coding-intl.dashscope.aliyuncs.com/v1)
was not registered in providers.py or auth.py. When users set
provider: alibaba_coding or provider: alibaba-coding-plan in config.yaml,
Hermes could not resolve the credentials and fell back to OpenRouter
or rejected the request with HTTP 401/402 (issue #14940).

Changes:
- providers.py: add HermesOverlay for alibaba-coding-plan with
  ALIBABA_CODING_PLAN_BASE_URL env var support
- providers.py: add aliases alibaba_coding, alibaba-coding,
  alibaba_coding_plan -> alibaba-coding-plan
- auth.py: add ProviderConfig for alibaba-coding-plan with:
  - inference_base_url: https://coding-intl.dashscope.aliyuncs.com/v1
  - api_key_env_vars: ALIBABA_CODING_PLAN_API_KEY, DASHSCOPE_API_KEY

Fixes #14940
2026-04-24 02:59:32 -07:00
Teknium a9a4416c7c fix(compress): don't reach into ContextCompressor privates from /compress (#15039)
Manual /compress crashed with 'LCMEngine' object has no attribute
'_align_boundary_forward' when any context-engine plugin was active.
The gateway handler reached into _align_boundary_forward and
_find_tail_cut_by_tokens on tmp_agent.context_compressor, but those
are ContextCompressor-specific — not part of the generic ContextEngine
ABC — so every plugin engine (LCM, etc.) raised AttributeError.

- Add optional has_content_to_compress(messages) to ContextEngine ABC
  with a safe default of True (always attempt).
- Override it in the built-in ContextCompressor using the existing
  private helpers — preserves exact prior behavior for 'compressor'.
- Rewrite gateway /compress preflight to call the ABC method, deleting
  the private-helper reach-in.
- Add focus_topic to the ABC compress() signature. Make _compress_context
  retry without focus_topic on TypeError so older strict-sig plugins
  don't crash on manual /compress <focus>.
- Regression test with a fake ContextEngine subclass that only
  implements the ABC (mirrors LCM's surface).

Reported by @selfhostedsoul (Discord, Apr 22).
2026-04-24 02:55:43 -07:00
Teknium 4350668ae4 fix(transcription): fall back to CPU when CUDA runtime libs are missing
faster-whisper's device="auto" picks CUDA when ctranslate2's wheel
ships CUDA shared libs, even on hosts without the NVIDIA runtime
(libcublas.so.12 / libcudnn*). On those hosts the model often loads
fine but transcribe() fails at first dlopen, and the broken model
stays cached in the module-global — every subsequent voice message
in the gateway process fails identically until restart.

- Add _load_local_whisper_model() wrapper: try auto, catch missing-lib
  errors, retry on device=cpu compute_type=int8.
- Wrap transcribe() with the same fallback: evict cached model, reload
  on CPU, retry once. Required because the dlopen failure only surfaces
  at first kernel launch, not at model construction.
- Narrow marker list (libcublas, libcudnn, libcudart, 'cannot be loaded',
  'no kernel image is available', 'no CUDA-capable device', driver
  mismatch). Deliberately excludes 'CUDA out of memory' and similar —
  those are real runtime failures that should surface, not be silently
  retried on CPU.
- Tests for load-time fallback, runtime fallback (with cached-model
  eviction verified), and the OOM non-fallback path.

Reported via Telegram voice-message dumps on WSL2 hosts where libcublas
isn't installed by default.
2026-04-24 02:50:14 -07:00
Teknium 34c3e67109 fix: sanitize tool schemas for llama.cpp backends; restore MCP in TUI (#15032)
Local llama.cpp servers (e.g. ggml-org/llama.cpp:full-cuda) fail the entire
request with HTTP 400 'Unable to generate parser for this template. ...
Unrecognized schema: "object"' when any tool schema contains shapes its
json-schema-to-grammar converter can't handle:

  * 'type': 'object' without 'properties'
  * bare string schema values ('additionalProperties: "object"')
  * 'type': ['X', 'null'] arrays (nullable form)

Cloud providers accept these silently, so they ship from external MCP
servers (Atlassian, GCloud, Datadog) and from a couple of our own tools.

Changes

- tools/schema_sanitizer.py: walks the finalized tool list right before it
  leaves get_tool_definitions() and repairs the hostile shapes in a deep
  copy. No-op on well-formed schemas. Recurses into properties, items,
  additionalProperties, anyOf/oneOf/allOf, and $defs.
- model_tools.get_tool_definitions(): invoke the sanitizer as the last
  step so all paths (built-in, MCP, plugin, dynamically-rebuilt) get
  covered uniformly.
- tools/browser_cdp_tool.py, tools/mcp_tool.py: fix our own bare-object
  schemas so sanitization isn't load-bearing for in-repo tools.
- tui_gateway/server.py: _load_enabled_toolsets() was passing
  include_default_mcp_servers=False at runtime. That's the config-editing
  variant (see PR #3252) — it silently drops every default MCP server
  from the TUI's enabled_toolsets, which is why the TUI didn't hit the
  llama.cpp crash (no MCP tools sent at all). Switch to True so TUI
  matches CLI behavior.

Tests

tests/tools/test_schema_sanitizer.py (17 tests) covers the individual
failure modes, well-formed pass-through, deep-copy isolation, and
required-field pruning.

E2E: loaded the default 'hermes-cli' toolset with MCP discovery and
confirmed all 27 resolved tool schemas pass a llama.cpp-compatibility
walk (no 'object' node missing 'properties', no bare-string schema
values).
2026-04-24 02:44:46 -07:00
brooklyn! 5dda4cab41 Merge pull request #14968 from NousResearch/bb/tui-section-visibility
feat(tui): per-section visibility for the details accordion
2026-04-24 03:02:26 -05:00
Brooklyn Nicholson 6604e94c75 fix(tui): gate messageLine on content-bearing sections, not all sections
Round-2 Copilot review on #14968 caught two leftover spots that didn't
fully respect per-section overrides:

- messageLine.tsx (trail branch): the previous fix gated on
  `SECTION_NAMES.some(...)`, which stayed true whenever any section was
  visible.  With `thinking: 'expanded'` as the new built-in default,
  that meant `display.sections.tools: hidden` left an empty wrapper Box
  alive for trail messages.  Now gates on the actual content-bearing
  sections for a trail message — `tools` OR `activity` — so a
  tools-hidden config drops the wrapper cleanly.

- messageLine.tsx (showDetails): still keyed off the global
  `detailsMode !== 'hidden'`, so per-section overrides like
  `sections.thinking: expanded` couldn't escape global hidden for
  assistant messages with reasoning + tool metadata.  Recomputed via
  resolved per-section modes (`thinkingMode`/`toolsMode`).

- types.ts: rewrote the SectionVisibility doc comment to reflect the
  actual resolution order (explicit override → SECTION_DEFAULTS →
  global), so the docstring stops claiming "missing keys fall back to
  the global mode" when SECTION_DEFAULTS now layers in between.

All three lookups (thinking/tools/activity) are computed once at the
top of MessageLine and shared by every branch.
2026-04-24 03:01:06 -05:00
Brooklyn Nicholson 67bfd4b828 feat(tui): stream thinking + tools expanded by default
Extends SECTION_DEFAULTS so the out-of-the-box TUI shows the turn as
a live transcript (reasoning + tool calls streaming inline) instead of
a wall of `▸` chevrons the user has to click every turn.

Final default matrix:

  - thinking: expanded
  - tools:    expanded
  - activity: hidden    (unchanged from the previous commit)
  - subagents: falls through to details_mode (collapsed by default)

Everything explicit in `display.sections` still wins, so anyone who
already pinned an override keeps their layout.  One-line revert is
`display.sections.<name>: collapsed`.
2026-04-24 02:53:44 -05:00
Brooklyn Nicholson 70925363b6 fix(tui): per-section overrides escape global details_mode: hidden
Copilot review on #14968 caught that the early returns gated on the
global `detailsMode === 'hidden'` short-circuited every render path
before sectionMode() got a chance to apply per-section overrides — so
`details_mode: hidden` + `sections.tools: expanded` was silently a no-op.

Three call sites had the same bug shape; all now key off the resolved
section modes:

- ToolTrail: replace the `detailsMode === 'hidden'` early return with
  an `allHidden = every section resolved to hidden` check.  When that's
  true, fall back to the floating-alert backstop (errors/warnings) so
  quiet-mode users aren't blind to ambient failures, and update the
  comment block to match the actual condition.

- messageLine.tsx: drop the same `detailsMode === 'hidden'` pre-check
  on `msg.kind === 'trail'`; only skip rendering the wrapper when every
  section resolves to hidden (`SECTION_NAMES.some(...) !== 'hidden'`).

- useMainApp.ts: rebuild `showProgressArea` around `anyPanelVisible`
  instead of branching on the global mode.  This also fixes the
  suppressed Copilot concern about an empty wrapper Box rendering above
  the streaming area when ToolTrail returns null.

Regression test in details.test.ts pins the override-escapes-hidden
behaviour for tools/thinking/activity.  271/271 vitest, lints clean.
2026-04-24 02:49:58 -05:00
Brooklyn Nicholson 005cc29e98 refactor(tui): /clean pass on per-section visibility plumbing
- domain/details: extract `norm()`, fold parseDetailsMode + resolveSections
  into terser functional form, reject array values for resolveSections
- slash /details: destructure tokens, factor reset/mode into one dispatch,
  drop DETAIL_MODES set + DetailsMode/SectionName imports (parseDetailsMode
  + isSectionName narrow + return), centralize usage strings
- ToolTrail: collapse 4 separate xxxSection vars into one memoized
  `visible` map; effect deps stabilize on the memo identity instead of
  4 primitives
2026-04-24 02:42:03 -05:00
Brooklyn Nicholson 728767e910 feat(tui): hide the activity panel by default
The activity panel (gateway hints, terminal-parity nudges, background
notifications) is noise for the typical day-to-day user, who only cares
about thinking + tools + streamed content.  Make `hidden` the built-in
default for that section so users land on the quiet mode out of the box.

Tool failures still render inline on the failing tool row, so this
default suppresses the noise feed without losing the signal.

Opt back in with `display.sections.activity: collapsed` (chevron) or
`expanded` (always open) in `~/.hermes/config.yaml`, or live with
`/details activity collapsed`.

Implementation: SECTION_DEFAULTS in domain/details.ts, applied as the
fallback in `sectionMode()` between the explicit override and the
global details_mode.  Existing `display.sections.activity` overrides
take precedence — no migration needed for users who already set it.
2026-04-24 02:37:42 -05:00
Brooklyn Nicholson 78481ac124 feat(tui): per-section visibility for the details accordion
Adds optional per-section overrides on top of the existing global
details_mode (hidden | collapsed | expanded).  Lets users keep the
accordion collapsed by default while auto-expanding tools, or hide the
activity panel entirely without touching thinking/tools/subagents.

Config (~/.hermes/config.yaml):

    display:
      details_mode: collapsed
      sections:
        thinking: expanded
        tools:    expanded
        activity: hidden

Slash command:

  /details                              show current global + overrides
  /details [hidden|collapsed|expanded]  set global mode (existing)
  /details <section> <mode|reset>       per-section override (new)
  /details <section> reset              clear override

Sections: thinking, tools, subagents, activity.

Implementation:

- ui-tui/src/types.ts             SectionName + SectionVisibility
- ui-tui/src/domain/details.ts    parseSectionMode / resolveSections /
                                  sectionMode + SECTION_NAMES
- ui-tui/src/app/uiStore.ts +
  app/interfaces.ts +
  app/useConfigSync.ts            sections threaded into UiState
- ui-tui/src/components/
  thinking.tsx                    ToolTrail consults per-section mode for
                                  hidden/expanded behaviour; expandAll
                                  skips hidden sections; floating-alert
                                  fallback respects activity:hidden
- ui-tui/src/components/
  messageLine.tsx + appLayout.tsx pass sections through render tree
- ui-tui/src/app/slash/
  commands/core.ts                /details <section> <mode|reset> syntax
- tui_gateway/server.py           config.set details_mode.<section>
                                  writes to display.sections.<section>
                                  (empty value clears the override)
- website/docs/user-guide/tui.md  documented

Tests: 14 new (4 domain, 4 useConfigSync, 3 slash, 3 gateway).
Total: 269/269 vitest, all gateway tests pass.
2026-04-24 02:34:32 -05:00
Austin Pickett 809868e628 feat: refac 2026-04-24 01:04:19 -04:00
Austin Pickett e5d2815b41 feat: add sidebar 2026-04-24 00:56:19 -04:00
750 changed files with 85677 additions and 8580 deletions
+1
View File
@@ -69,3 +69,4 @@ mini-swe-agent/
.nix-stamps/
result
website/static/api/skills-index.json
models-dev-upstream/
+13
View File
@@ -240,6 +240,19 @@ npm run fmt # prettier
npm test # vitest
```
### TUI in the Dashboard (`hermes dashboard` → `/chat`)
The dashboard embeds the real `hermes --tui`**not** a rewrite. See `hermes_cli/pty_bridge.py` + the `@app.websocket("/api/pty")` endpoint in `hermes_cli/web_server.py`.
- Browser loads `web/src/pages/ChatPage.tsx`, which mounts xterm.js's `Terminal` with the WebGL renderer, `@xterm/addon-fit` for container-driven resize, and `@xterm/addon-unicode11` for modern wide-character widths.
- `/api/pty?token=…` upgrades to a WebSocket; auth uses the same ephemeral `_SESSION_TOKEN` as REST, via query param (browsers can't set `Authorization` on WS upgrade).
- The server spawns whatever `hermes --tui` would spawn, through `ptyprocess` (POSIX PTY — WSL works, native Windows does not).
- Frames: raw PTY bytes each direction; resize via `\x1b[RESIZE:<cols>;<rows>]` intercepted on the server and applied with `TIOCSWINSZ`.
**Do not re-implement the primary chat experience in React.** The main transcript, composer/input flow (including slash-command behavior), and PTY-backed terminal belong to the embedded `hermes --tui` — anything new you add to Ink shows up in the dashboard automatically. If you find yourself rebuilding the transcript or composer for the dashboard, stop and extend Ink instead.
**Structured React UI around the TUI is allowed when it is not a second chat surface.** Sidebar widgets, inspectors, summaries, status panels, and similar supporting views (e.g. `ChatSidebar`, `ModelPickerDialog`, `ToolCall`) are fine when they complement the embedded TUI rather than replacing the transcript / composer / terminal. Keep their state independent of the PTY child's session and surface their failures non-destructively so the terminal pane keeps working unimpaired.
---
## Adding New Tools
+18 -6
View File
@@ -10,9 +10,11 @@ ENV PYTHONUNBUFFERED=1
ENV PLAYWRIGHT_BROWSERS_PATH=/opt/hermes/.playwright
# Install system dependencies in one layer, clear APT cache
# tini reaps orphaned zombie processes (MCP stdio subprocesses, git, bun, etc.)
# that would otherwise accumulate when hermes runs as PID 1. See #15012.
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli && \
build-essential nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli tini && \
rm -rf /var/lib/apt/lists/*
# Non-root user for runtime; UID can be overridden via HERMES_UID at runtime
@@ -28,22 +30,32 @@ WORKDIR /opt/hermes
# unless the lockfiles themselves change.
COPY package.json package-lock.json ./
COPY web/package.json web/package-lock.json web/
COPY ui-tui/package.json ui-tui/package-lock.json ui-tui/
COPY ui-tui/packages/hermes-ink/package.json ui-tui/packages/hermes-ink/package-lock.json ui-tui/packages/hermes-ink/
RUN npm install --prefer-offline --no-audit && \
npx playwright install --with-deps chromium --only-shell && \
(cd web && npm install --prefer-offline --no-audit) && \
(cd ui-tui && npm install --prefer-offline --no-audit) && \
npm cache clean --force
# ---------- Source code ----------
# .dockerignore excludes node_modules, so the installs above survive.
COPY --chown=hermes:hermes . .
# Build web dashboard (Vite outputs to hermes_cli/web_dist/)
RUN cd web && npm run build
# Build browser dashboard and terminal UI assets.
RUN cd web && npm run build && \
cd ../ui-tui && npm run build
# ---------- Permissions ----------
# Make install dir world-readable so any HERMES_UID can read it at runtime.
# The venv needs to be traversable too.
USER root
RUN chmod -R a+rX /opt/hermes
# Start as root so the entrypoint can usermod/groupmod + gosu.
# If HERMES_UID is unset, the entrypoint drops to the default hermes user (10000).
# ---------- Python virtualenv ----------
RUN chown hermes:hermes /opt/hermes
USER hermes
RUN uv venv && \
uv pip install --no-cache-dir -e ".[all]"
@@ -52,4 +64,4 @@ ENV HERMES_WEB_DIST=/opt/hermes/hermes_cli/web_dist
ENV HERMES_HOME=/opt/data
ENV PATH="/opt/data/.local/bin:${PATH}"
VOLUME [ "/opt/data" ]
ENTRYPOINT [ "/opt/hermes/docker/entrypoint.sh" ]
ENTRYPOINT [ "/usr/bin/tini", "-g", "--", "/opt/hermes/docker/entrypoint.sh" ]
+9 -3
View File
@@ -60,7 +60,7 @@ from acp_adapter.events import (
make_tool_progress_cb,
)
from acp_adapter.permissions import make_approval_callback
from acp_adapter.session import SessionManager, SessionState
from acp_adapter.session import SessionManager, SessionState, _expand_acp_enabled_toolsets
logger = logging.getLogger(__name__)
@@ -287,7 +287,11 @@ class HermesACPAgent(acp.Agent):
try:
from model_tools import get_tool_definitions
enabled_toolsets = getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
enabled_toolsets = _expand_acp_enabled_toolsets(
getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"],
mcp_server_names=[server.name for server in mcp_servers],
)
state.agent.enabled_toolsets = enabled_toolsets
disabled_toolsets = getattr(state.agent, "disabled_toolsets", None)
state.agent.tools = get_tool_definitions(
enabled_toolsets=enabled_toolsets,
@@ -754,7 +758,9 @@ class HermesACPAgent(acp.Agent):
def _cmd_tools(self, args: str, state: SessionState) -> str:
try:
from model_tools import get_tool_definitions
toolsets = getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
toolsets = _expand_acp_enabled_toolsets(
getattr(state.agent, "enabled_toolsets", None) or ["hermes-acp"]
)
tools = get_tool_definitions(enabled_toolsets=toolsets, quiet_mode=True)
if not tools:
return "No tools available."
+28 -1
View File
@@ -106,6 +106,24 @@ def _register_task_cwd(task_id: str, cwd: str) -> None:
logger.debug("Failed to register ACP task cwd override", exc_info=True)
def _expand_acp_enabled_toolsets(
toolsets: List[str] | None = None,
mcp_server_names: List[str] | None = None,
) -> List[str]:
"""Return ACP toolsets plus explicit MCP server toolsets for this session."""
expanded: List[str] = []
for name in list(toolsets or ["hermes-acp"]):
if name and name not in expanded:
expanded.append(name)
for server_name in list(mcp_server_names or []):
toolset_name = f"mcp-{server_name}"
if server_name and toolset_name not in expanded:
expanded.append(toolset_name)
return expanded
def _clear_task_cwd(task_id: str) -> None:
"""Remove task-specific cwd overrides for an ACP session."""
if not task_id:
@@ -537,9 +555,18 @@ class SessionManager:
elif isinstance(model_cfg, str) and model_cfg.strip():
default_model = model_cfg.strip()
configured_mcp_servers = [
name
for name, cfg in (config.get("mcp_servers") or {}).items()
if not isinstance(cfg, dict) or cfg.get("enabled", True) is not False
]
kwargs = {
"platform": "acp",
"enabled_toolsets": ["hermes-acp"],
"enabled_toolsets": _expand_acp_enabled_toolsets(
["hermes-acp"],
mcp_server_names=configured_mcp_servers,
),
"quiet_mode": True,
"session_id": session_id,
"model": model or default_model,
+122 -8
View File
@@ -14,6 +14,8 @@ import copy
import json
import logging
import os
import platform
import subprocess
from pathlib import Path
from hermes_constants import get_hermes_home
@@ -277,8 +279,9 @@ def _is_oauth_token(key: str) -> bool:
Positively identifies Anthropic OAuth tokens by their key format:
- ``sk-ant-`` prefix (but NOT ``sk-ant-api``) → setup tokens, managed keys
- ``eyJ`` prefix → JWTs from the Anthropic OAuth flow
- ``cc-`` prefix → Claude Code OAuth access tokens (from CLAUDE_CODE_OAUTH_TOKEN)
Non-Anthropic keys (MiniMax, Alibaba, etc.) don't match either pattern
Non-Anthropic keys (MiniMax, Alibaba, etc.) don't match any pattern
and correctly return False.
"""
if not key:
@@ -292,6 +295,9 @@ def _is_oauth_token(key: str) -> bool:
# JWTs from Anthropic OAuth flow
if key.startswith("eyJ"):
return True
# Claude Code OAuth access tokens (opaque, from CLAUDE_CODE_OAUTH_TOKEN)
if key.startswith("cc-"):
return True
return False
@@ -384,7 +390,16 @@ def build_anthropic_client(api_key: str, base_url: str = None, timeout: float =
"timeout": Timeout(timeout=float(_read_timeout), connect=10.0),
}
if normalized_base_url:
kwargs["base_url"] = normalized_base_url
# Azure Anthropic endpoints require an ``api-version`` query parameter.
# Pass it via default_query so the SDK appends it to every request URL
# without corrupting the base_url (appending it directly produces
# malformed paths like /anthropic?api-version=.../v1/messages).
_is_azure_endpoint = "azure.com" in normalized_base_url.lower()
if _is_azure_endpoint and "api-version" not in normalized_base_url:
kwargs["base_url"] = normalized_base_url.rstrip("/")
kwargs["default_query"] = {"api-version": "2025-04-15"}
else:
kwargs["base_url"] = normalized_base_url
common_betas = _common_betas_for_base_url(normalized_base_url)
if _is_kimi_coding_endpoint(base_url):
@@ -461,8 +476,72 @@ def build_anthropic_bedrock_client(region: str):
)
def _read_claude_code_credentials_from_keychain() -> Optional[Dict[str, Any]]:
"""Read Claude Code OAuth credentials from the macOS Keychain.
Claude Code >=2.1.114 stores credentials in the macOS Keychain under the
service name "Claude Code-credentials" rather than (or in addition to)
the JSON file at ~/.claude/.credentials.json.
The password field contains a JSON string with the same claudeAiOauth
structure as the JSON file.
Returns dict with {accessToken, refreshToken?, expiresAt?} or None.
"""
import platform
import subprocess
if platform.system() != "Darwin":
return None
try:
# Read the "Claude Code-credentials" generic password entry
result = subprocess.run(
["security", "find-generic-password",
"-s", "Claude Code-credentials",
"-w"],
capture_output=True,
text=True,
timeout=5,
)
except (OSError, subprocess.TimeoutExpired):
logger.debug("Keychain: security command not available or timed out")
return None
if result.returncode != 0:
logger.debug("Keychain: no entry found for 'Claude Code-credentials'")
return None
raw = result.stdout.strip()
if not raw:
return None
try:
data = json.loads(raw)
except json.JSONDecodeError:
logger.debug("Keychain: credentials payload is not valid JSON")
return None
oauth_data = data.get("claudeAiOauth")
if oauth_data and isinstance(oauth_data, dict):
access_token = oauth_data.get("accessToken", "")
if access_token:
return {
"accessToken": access_token,
"refreshToken": oauth_data.get("refreshToken", ""),
"expiresAt": oauth_data.get("expiresAt", 0),
"source": "macos_keychain",
}
return None
def read_claude_code_credentials() -> Optional[Dict[str, Any]]:
"""Read refreshable Claude Code OAuth credentials from ~/.claude/.credentials.json.
"""Read refreshable Claude Code OAuth credentials.
Checks two sources in order:
1. macOS Keychain (Darwin only) — "Claude Code-credentials" entry
2. ~/.claude/.credentials.json file
This intentionally excludes ~/.claude.json primaryApiKey. Opencode's
subscription flow is OAuth/setup-token based with refreshable credentials,
@@ -471,6 +550,12 @@ def read_claude_code_credentials() -> Optional[Dict[str, Any]]:
Returns dict with {accessToken, refreshToken?, expiresAt?} or None.
"""
# Try macOS Keychain first (covers Claude Code >=2.1.114)
kc_creds = _read_claude_code_credentials_from_keychain()
if kc_creds:
return kc_creds
# Fall back to JSON file
cred_path = Path.home() / ".claude" / ".credentials.json"
if cred_path.exists():
try:
@@ -641,7 +726,9 @@ def _write_claude_code_credentials(
existing["claudeAiOauth"] = oauth_data
cred_path.parent.mkdir(parents=True, exist_ok=True)
cred_path.write_text(json.dumps(existing, indent=2), encoding="utf-8")
_tmp_cred = cred_path.with_suffix(".tmp")
_tmp_cred.write_text(json.dumps(existing, indent=2), encoding="utf-8")
_tmp_cred.replace(cred_path)
# Restrict permissions (credentials file)
cred_path.chmod(0o600)
except (OSError, IOError) as e:
@@ -908,6 +995,26 @@ def read_hermes_oauth_credentials() -> Optional[Dict[str, Any]]:
# ---------------------------------------------------------------------------
def _is_bedrock_model_id(model: str) -> bool:
"""Detect AWS Bedrock model IDs that use dots as namespace separators.
Bedrock model IDs come in two forms:
- Bare: ``anthropic.claude-opus-4-7``
- Regional (inference profiles): ``us.anthropic.claude-sonnet-4-5-v1:0``
In both cases the dots separate namespace components, not version
numbers, and must be preserved verbatim for the Bedrock API.
"""
lower = model.lower()
# Regional inference-profile prefixes
if any(lower.startswith(p) for p in ("global.", "us.", "eu.", "ap.", "jp.")):
return True
# Bare Bedrock model IDs: provider.model-family
if lower.startswith("anthropic."):
return True
return False
def normalize_model_name(model: str, preserve_dots: bool = False) -> str:
"""Normalize a model name for the Anthropic API.
@@ -915,11 +1022,19 @@ def normalize_model_name(model: str, preserve_dots: bool = False) -> str:
- Converts dots to hyphens in version numbers (OpenRouter uses dots,
Anthropic uses hyphens: claude-opus-4.6 → claude-opus-4-6), unless
preserve_dots is True (e.g. for Alibaba/DashScope: qwen3.5-plus).
- Preserves Bedrock model IDs (``anthropic.claude-opus-4-7``) and
regional inference profiles (``us.anthropic.claude-*``) whose dots
are namespace separators, not version separators.
"""
lower = model.lower()
if lower.startswith("anthropic/"):
model = model[len("anthropic/"):]
if not preserve_dots:
# Bedrock model IDs use dots as namespace separators
# (e.g. "anthropic.claude-opus-4-7", "us.anthropic.claude-*").
# These must not be converted to hyphens. See issue #12295.
if _is_bedrock_model_id(model):
return model
# OpenRouter uses dots for version separators (claude-opus-4.6),
# Anthropic uses hyphens (claude-opus-4-6). Convert dots to hyphens.
model = model.replace(".", "-")
@@ -1574,9 +1689,9 @@ def build_anthropic_kwargs(
# ── Strip sampling params on 4.7+ ─────────────────────────────────
# Opus 4.7 rejects any non-default temperature/top_p/top_k with a 400.
# Callers (auxiliary_client, flush_memories, etc.) may set these for
# older models; drop them here as a safety net so upstream 4.6 → 4.7
# migrations don't require coordinated edits everywhere.
# Callers (auxiliary_client, etc.) may set these for older models;
# drop them here as a safety net so upstream 4.6 → 4.7 migrations
# don't require coordinated edits everywhere.
if _forbids_sampling_params(model):
for _sampling_key in ("temperature", "top_p", "top_k"):
kwargs.pop(_sampling_key, None)
@@ -1598,4 +1713,3 @@ def build_anthropic_kwargs(
return kwargs
+437 -50
View File
@@ -42,6 +42,7 @@ import time
from pathlib import Path # noqa: F401 — used by test mocks
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import urlparse, parse_qs, urlunparse
from openai import OpenAI
@@ -52,6 +53,17 @@ from utils import base_url_host_matches, base_url_hostname, normalize_proxy_env_
logger = logging.getLogger(__name__)
def _extract_url_query_params(url: str):
"""Extract query params from URL, return (clean_url, default_query dict or None)."""
parsed = urlparse(url)
if parsed.query:
clean = urlunparse(parsed._replace(query=""))
params = {k: v[0] for k, v in parse_qs(parsed.query).items()}
return clean, params
return url, None
# Module-level flag: only warn once per process about stale OPENAI_BASE_URL.
_stale_base_url_warned = False
@@ -70,10 +82,18 @@ _PROVIDER_ALIASES = {
"moonshot": "kimi-coding",
"kimi-cn": "kimi-coding-cn",
"moonshot-cn": "kimi-coding-cn",
"gmi-cloud": "gmi",
"gmicloud": "gmi",
"minimax-china": "minimax-cn",
"minimax_cn": "minimax-cn",
"claude": "anthropic",
"claude-code": "anthropic",
"github": "copilot",
"github-copilot": "copilot",
"github-model": "copilot",
"github-models": "copilot",
"github-copilot-acp": "copilot-acp",
"copilot-acp-agent": "copilot-acp",
}
@@ -89,10 +109,11 @@ def _normalize_aux_provider(provider: Optional[str]) -> str:
if normalized == "main":
# Resolve to the user's actual main provider so named custom providers
# and non-aggregator providers (DeepSeek, Alibaba, etc.) work correctly.
main_prov = _read_main_provider()
main_prov = (_read_main_provider() or "").strip().lower()
if main_prov and main_prov not in ("auto", "main", ""):
return main_prov
return "custom"
normalized = main_prov
else:
return "custom"
return _PROVIDER_ALIASES.get(normalized, normalized)
@@ -136,6 +157,7 @@ _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
"kimi-coding": "kimi-k2-turbo-preview",
"stepfun": "step-3.5-flash",
"kimi-coding-cn": "kimi-k2-turbo-preview",
"gmi": "google/gemini-3.1-flash-lite-preview",
"minimax": "MiniMax-M2.7",
"minimax-cn": "MiniMax-M2.7",
"anthropic": "claude-haiku-4-5-20251001",
@@ -383,7 +405,7 @@ class _CodexCompletionsAdapter:
# Note: the Codex endpoint (chatgpt.com/backend-api/codex) does NOT
# support max_output_tokens or temperature — omit to avoid 400 errors.
# Tools support for flush_memories and similar callers
# Tools support for auxiliary callers (e.g. skills_hub) that pass function schemas
tools = kwargs.get("tools")
if tools:
converted = []
@@ -1150,8 +1172,10 @@ def _try_custom_endpoint() -> Tuple[Optional[Any], Optional[str]]:
return None, None
model = _read_main_model() or "gpt-4o-mini"
logger.debug("Auxiliary client: custom endpoint (%s, api_mode=%s)", model, custom_mode or "chat_completions")
_clean_base, _dq = _extract_url_query_params(custom_base)
_extra = {"default_query": _dq} if _dq else {}
if custom_mode == "codex_responses":
real_client = OpenAI(api_key=custom_key, base_url=custom_base)
real_client = OpenAI(api_key=custom_key, base_url=_clean_base, **_extra)
return CodexAuxiliaryClient(real_client, model), model
if custom_mode == "anthropic_messages":
# Third-party Anthropic-compatible gateway (MiniMax, Zhipu GLM,
@@ -1165,12 +1189,12 @@ def _try_custom_endpoint() -> Tuple[Optional[Any], Optional[str]]:
"Custom endpoint declares api_mode=anthropic_messages but the "
"anthropic SDK is not installed — falling back to OpenAI-wire."
)
return OpenAI(api_key=custom_key, base_url=custom_base), model
return OpenAI(api_key=custom_key, base_url=_clean_base, **_extra), model
return (
AnthropicAuxiliaryClient(real_client, model, custom_key, custom_base, is_oauth=False),
model,
)
return OpenAI(api_key=custom_key, base_url=custom_base), model
return OpenAI(api_key=custom_key, base_url=_clean_base, **_extra), model
def _try_codex() -> Tuple[Optional[Any], Optional[str]]:
@@ -1342,6 +1366,111 @@ def _is_auth_error(exc: Exception) -> bool:
return "error code: 401" in err_lower or "authenticationerror" in type(exc).__name__.lower()
def _is_unsupported_parameter_error(exc: Exception, param: str) -> bool:
"""Detect provider 400s for an unsupported request parameter.
Different OpenAI-compatible endpoints phrase the same class of error a few
ways: ``Unsupported parameter: X``, ``unsupported_parameter`` with a
``param`` field, ``X is not supported``, ``unknown parameter: X``,
``unrecognized request argument: X``. We match on both the parameter
name and a generic "unsupported/unknown/unrecognized parameter" marker so
call sites can reactively retry without the offending key instead of
surfacing a noisy auxiliary failure.
Generalizes the temperature-specific detector that originally shipped
with PR #15621 so the same retry strategy can cover ``max_tokens``,
``seed``, ``top_p``, and any future quirk. Credit @nicholasrae (PR #15416)
for the generalization pattern.
"""
param_lower = (param or "").lower()
if not param_lower:
return False
err_lower = str(exc).lower()
if param_lower not in err_lower:
return False
return any(marker in err_lower for marker in (
"unsupported parameter",
"unsupported_parameter",
"not supported",
"does not support",
"unknown parameter",
"unrecognized request argument",
"unrecognized parameter",
"invalid parameter",
))
def _is_unsupported_temperature_error(exc: Exception) -> bool:
"""Back-compat wrapper: detect API errors where the model rejects ``temperature``.
Delegates to :func:`_is_unsupported_parameter_error`; kept as a separate
public symbol because existing tests and call sites import it by name.
"""
return _is_unsupported_parameter_error(exc, "temperature")
def _evict_cached_clients(provider: str) -> None:
"""Drop cached auxiliary clients for a provider so fresh creds are used."""
normalized = _normalize_aux_provider(provider)
with _client_cache_lock:
stale_keys = [
key for key in _client_cache
if _normalize_aux_provider(str(key[0])) == normalized
]
for key in stale_keys:
client = _client_cache.get(key, (None, None, None))[0]
if client is not None:
_force_close_async_httpx(client)
try:
close_fn = getattr(client, "close", None)
if callable(close_fn):
close_fn()
except Exception:
pass
_client_cache.pop(key, None)
def _refresh_provider_credentials(provider: str) -> bool:
"""Refresh short-lived credentials for OAuth-backed auxiliary providers."""
normalized = _normalize_aux_provider(provider)
try:
if normalized == "openai-codex":
from hermes_cli.auth import resolve_codex_runtime_credentials
creds = resolve_codex_runtime_credentials(force_refresh=True)
if not str(creds.get("api_key", "") or "").strip():
return False
_evict_cached_clients(normalized)
return True
if normalized == "nous":
from hermes_cli.auth import resolve_nous_runtime_credentials
creds = resolve_nous_runtime_credentials(
min_key_ttl_seconds=max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800"))),
timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
force_mint=True,
)
if not str(creds.get("api_key", "") or "").strip():
return False
_evict_cached_clients(normalized)
return True
if normalized == "anthropic":
from agent.anthropic_adapter import read_claude_code_credentials, _refresh_oauth_token, resolve_anthropic_token
creds = read_claude_code_credentials()
token = _refresh_oauth_token(creds) if isinstance(creds, dict) and creds.get("refreshToken") else None
if not str(token or "").strip():
token = resolve_anthropic_token()
if not str(token or "").strip():
return False
_evict_cached_clients(normalized)
return True
except Exception as exc:
logger.debug("Auxiliary provider credential refresh failed for %s: %s", normalized, exc)
return False
return False
def _try_payment_fallback(
failed_provider: str,
task: str = None,
@@ -1491,8 +1620,14 @@ def _resolve_auto(main_runtime: Optional[Dict[str, Any]] = None) -> Tuple[Option
# below — never look up auth env vars ad-hoc.
def _to_async_client(sync_client, model: str):
"""Convert a sync client to its async counterpart, preserving Codex routing."""
def _to_async_client(sync_client, model: str, is_vision: bool = False):
"""Convert a sync client to its async counterpart, preserving Codex routing.
When ``is_vision=True`` and the underlying base URL is Copilot, the
resulting async client carries the ``Copilot-Vision-Request: true``
header so the request is routed to Copilot's vision-capable
infrastructure (otherwise vision payloads silently time out).
"""
from openai import AsyncOpenAI
if isinstance(sync_client, CodexAuxiliaryClient):
@@ -1521,9 +1656,11 @@ def _to_async_client(sync_client, model: str):
if base_url_host_matches(sync_base_url, "openrouter.ai"):
async_kwargs["default_headers"] = dict(_OR_HEADERS)
elif base_url_host_matches(sync_base_url, "api.githubcopilot.com"):
from hermes_cli.models import copilot_default_headers
from hermes_cli.copilot_auth import copilot_request_headers
async_kwargs["default_headers"] = copilot_default_headers()
async_kwargs["default_headers"] = copilot_request_headers(
is_agent_turn=True, is_vision=is_vision
)
elif base_url_host_matches(sync_base_url, "api.kimi.com"):
async_kwargs["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
return AsyncOpenAI(**async_kwargs), model
@@ -1550,6 +1687,7 @@ def resolve_provider_client(
explicit_api_key: str = None,
api_mode: str = None,
main_runtime: Optional[Dict[str, Any]] = None,
is_vision: bool = False,
) -> Tuple[Optional[Any], Optional[str]]:
"""Central router: given a provider name and optional model, return a
configured client with the correct auth, base URL, and API format.
@@ -1633,7 +1771,7 @@ def resolve_provider_client(
"auxiliary provider (using %r instead)", model, resolved)
model = None
final_model = model or resolved
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
# ── OpenRouter ───────────────────────────────────────────────────
@@ -1646,7 +1784,7 @@ def resolve_provider_client(
)
return None, None
final_model = _normalize_resolved_model(model or default, provider)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
# ── Nous Portal (OAuth) ──────────────────────────────────────────
@@ -1663,7 +1801,7 @@ def resolve_provider_client(
"but Nous Portal not configured (run: hermes auth)")
return None, None
final_model = _normalize_resolved_model(model or default, provider)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
# ── OpenAI Codex (OAuth → Responses API) ─────────────────────────
@@ -1690,7 +1828,7 @@ def resolve_provider_client(
"but no Codex OAuth token found (run: hermes model)")
return None, None
final_model = _normalize_resolved_model(model or default, provider)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
# ── Custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY) ───────────
@@ -1713,14 +1851,19 @@ def resolve_provider_client(
provider,
)
extra = {}
_clean_base, _dq = _extract_url_query_params(custom_base)
if _dq:
extra["default_query"] = _dq
if base_url_host_matches(custom_base, "api.kimi.com"):
extra["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
elif base_url_host_matches(custom_base, "api.githubcopilot.com"):
from hermes_cli.models import copilot_default_headers
extra["default_headers"] = copilot_default_headers()
client = OpenAI(api_key=custom_key, base_url=custom_base, **extra)
from hermes_cli.copilot_auth import copilot_request_headers
extra["default_headers"] = copilot_request_headers(
is_agent_turn=True, is_vision=is_vision
)
client = OpenAI(api_key=custom_key, base_url=_clean_base, **extra)
client = _wrap_if_needed(client, final_model, custom_base)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
# Try custom first, then codex, then API-key providers
for try_fn in (_try_custom_endpoint, _try_codex,
@@ -1730,13 +1873,13 @@ def resolve_provider_client(
final_model = _normalize_resolved_model(model or default, provider)
_cbase = str(getattr(client, "base_url", "") or "")
client = _wrap_if_needed(client, final_model, _cbase)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
logger.warning("resolve_provider_client: custom/main requested "
"but no endpoint credentials found")
return None, None
# ── Named custom providers (config.yaml custom_providers list) ───
# ── Named custom providers (config.yaml providers dict / custom_providers list) ───
try:
from hermes_cli.runtime_provider import _get_named_custom_provider
custom_entry = _get_named_custom_provider(provider)
@@ -1747,17 +1890,54 @@ def resolve_provider_client(
if not custom_key and custom_key_env:
custom_key = os.getenv(custom_key_env, "").strip()
custom_key = custom_key or "no-key-required"
# An explicit per-task api_mode override (from _resolve_task_provider_model)
# wins; otherwise fall back to what the provider entry declared.
entry_api_mode = (api_mode or custom_entry.get("api_mode") or "").strip()
if custom_base:
final_model = _normalize_resolved_model(
model or custom_entry.get("model") or _read_main_model() or "gpt-4o-mini",
provider,
)
client = OpenAI(api_key=custom_key, base_url=custom_base)
client = _wrap_if_needed(client, final_model, custom_base)
_clean_base2, _dq2 = _extract_url_query_params(custom_base)
_extra2 = {"default_query": _dq2} if _dq2 else {}
logger.debug(
"resolve_provider_client: named custom provider %r (%s)",
provider, final_model)
return (_to_async_client(client, final_model) if async_mode
"resolve_provider_client: named custom provider %r (%s, api_mode=%s)",
provider, final_model, entry_api_mode or "chat_completions")
# anthropic_messages: route through the Anthropic Messages API
# via AnthropicAuxiliaryClient. Mirrors the anonymous-custom
# branch in _try_custom_endpoint(). See #15033.
if entry_api_mode == "anthropic_messages":
try:
from agent.anthropic_adapter import build_anthropic_client
real_client = build_anthropic_client(custom_key, custom_base)
except ImportError:
logger.warning(
"Named custom provider %r declares api_mode="
"anthropic_messages but the anthropic SDK is not "
"installed — falling back to OpenAI-wire.",
provider,
)
client = OpenAI(api_key=custom_key, base_url=_clean_base2, **_extra2)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
sync_anthropic = AnthropicAuxiliaryClient(
real_client, final_model, custom_key, custom_base, is_oauth=False,
)
if async_mode:
return AsyncAnthropicAuxiliaryClient(sync_anthropic), final_model
return sync_anthropic, final_model
client = OpenAI(api_key=custom_key, base_url=_clean_base2, **_extra2)
# codex_responses or inherited auto-detect (via _wrap_if_needed).
# _wrap_if_needed reads the closed-over `api_mode` (the task-level
# override). Named-provider entry api_mode=codex_responses also
# flows through here.
if entry_api_mode == "codex_responses" and not isinstance(
client, CodexAuxiliaryClient
):
client = CodexAuxiliaryClient(client, final_model)
else:
client = _wrap_if_needed(client, final_model, custom_base)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
logger.warning(
"resolve_provider_client: named custom provider %r has no base_url",
@@ -1789,7 +1969,7 @@ def resolve_provider_client(
logger.warning("resolve_provider_client: anthropic requested but no Anthropic credentials found")
return None, None
final_model = _normalize_resolved_model(model or default_model, provider)
return (_to_async_client(client, final_model) if async_mode else (client, final_model))
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode else (client, final_model))
creds = resolve_api_key_provider_credentials(provider)
api_key = str(creds.get("api_key", "")).strip()
@@ -1815,7 +1995,7 @@ def resolve_provider_client(
if is_native_gemini_base_url(base_url):
client = GeminiNativeClient(api_key=api_key, base_url=base_url)
logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
# Provider-specific headers
@@ -1823,9 +2003,11 @@ def resolve_provider_client(
if base_url_host_matches(base_url, "api.kimi.com"):
headers["User-Agent"] = "claude-code/0.1.0"
elif base_url_host_matches(base_url, "api.githubcopilot.com"):
from hermes_cli.models import copilot_default_headers
from hermes_cli.copilot_auth import copilot_request_headers
headers.update(copilot_default_headers())
headers.update(copilot_request_headers(
is_agent_turn=True, is_vision=is_vision
))
client = OpenAI(api_key=api_key, base_url=base_url,
**({"default_headers": headers} if headers else {}))
@@ -1851,7 +2033,7 @@ def resolve_provider_client(
client = _wrap_if_needed(client, final_model, base_url)
logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
if pconfig.auth_type == "external_process":
@@ -1883,12 +2065,45 @@ def resolve_provider_client(
args=args,
)
logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
return (_to_async_client(client, final_model) if async_mode
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
logger.warning("resolve_provider_client: external-process provider %s not "
"directly supported", provider)
return None, None
elif pconfig.auth_type == "aws_sdk":
# AWS SDK providers (Bedrock) — use the Anthropic Bedrock client via
# boto3's credential chain (IAM roles, SSO, env vars, instance metadata).
try:
from agent.bedrock_adapter import has_aws_credentials, resolve_bedrock_region
from agent.anthropic_adapter import build_anthropic_bedrock_client
except ImportError:
logger.warning("resolve_provider_client: bedrock requested but "
"boto3 or anthropic SDK not installed")
return None, None
if not has_aws_credentials():
logger.debug("resolve_provider_client: bedrock requested but "
"no AWS credentials found")
return None, None
region = resolve_bedrock_region()
default_model = "anthropic.claude-haiku-4-5-20251001-v1:0"
final_model = _normalize_resolved_model(model or default_model, provider)
try:
real_client = build_anthropic_bedrock_client(region)
except ImportError as exc:
logger.warning("resolve_provider_client: cannot create Bedrock "
"client: %s", exc)
return None, None
client = AnthropicAuxiliaryClient(
real_client, final_model, api_key="aws-sdk",
base_url=f"https://bedrock-runtime.{region}.amazonaws.com",
)
logger.debug("resolve_provider_client: bedrock (%s, %s)", final_model, region)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
elif pconfig.auth_type in ("oauth_device_code", "oauth_external"):
# OAuth providers — route through their specific try functions
if provider == "nous":
@@ -1961,8 +2176,13 @@ def _normalize_vision_provider(provider: Optional[str]) -> str:
return _normalize_aux_provider(provider)
def _resolve_strict_vision_backend(provider: str) -> Tuple[Optional[Any], Optional[str]]:
def _resolve_strict_vision_backend(
provider: str,
model: Optional[str] = None,
) -> Tuple[Optional[Any], Optional[str]]:
provider = _normalize_vision_provider(provider)
if provider == "copilot":
return resolve_provider_client("copilot", model, is_vision=True)
if provider == "openrouter":
return _try_openrouter()
if provider == "nous":
@@ -2030,7 +2250,7 @@ def resolve_vision_provider_client(
return resolved_provider, None, None
final_model = resolved_model or default_model
if async_mode:
async_client, async_model = _to_async_client(sync_client, final_model)
async_client, async_model = _to_async_client(sync_client, final_model, is_vision=True)
return resolved_provider, async_client, async_model
return resolved_provider, sync_client, final_model
@@ -2062,8 +2282,11 @@ def resolve_vision_provider_client(
main_provider = _read_main_provider()
main_model = _read_main_model()
if main_provider and main_provider not in ("auto", ""):
vision_model = _PROVIDER_VISION_MODELS.get(main_provider, main_model)
if main_provider == "nous":
sync_client, default_model = _resolve_strict_vision_backend(main_provider)
sync_client, default_model = _resolve_strict_vision_backend(
main_provider, vision_model
)
if sync_client is not None:
logger.info(
"Vision auto-detect: using main provider %s (%s)",
@@ -2071,10 +2294,10 @@ def resolve_vision_provider_client(
)
return _finalize(main_provider, sync_client, default_model)
else:
vision_model = _PROVIDER_VISION_MODELS.get(main_provider, main_model)
rpc_client, rpc_model = resolve_provider_client(
main_provider, vision_model,
api_mode=resolved_api_mode)
api_mode=resolved_api_mode,
is_vision=True)
if rpc_client is not None:
logger.info(
"Vision auto-detect: using main provider %s (%s)",
@@ -2096,11 +2319,14 @@ def resolve_vision_provider_client(
return None, None, None
if requested in _VISION_AUTO_PROVIDER_ORDER:
sync_client, default_model = _resolve_strict_vision_backend(requested)
sync_client, default_model = _resolve_strict_vision_backend(
requested, resolved_model
)
return _finalize(requested, sync_client, default_model)
client, final_model = _get_cached_client(requested, resolved_model, async_mode,
api_mode=resolved_api_mode)
api_mode=resolved_api_mode,
is_vision=True)
if client is None:
return requested, None, None
return requested, client, final_model
@@ -2164,10 +2390,11 @@ def _client_cache_key(
api_key: Optional[str] = None,
api_mode: Optional[str] = None,
main_runtime: Optional[Dict[str, Any]] = None,
is_vision: bool = False,
) -> tuple:
runtime = _normalize_main_runtime(main_runtime)
runtime_key = tuple(runtime.get(field, "") for field in _MAIN_RUNTIME_FIELDS) if provider == "auto" else ()
return (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key)
return (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key, is_vision)
def _store_cached_client(cache_key: tuple, client: Any, default_model: Optional[str], *, bound_loop: Any = None) -> None:
@@ -2193,6 +2420,7 @@ def _refresh_nous_auxiliary_client(
api_key: Optional[str] = None,
api_mode: Optional[str] = None,
main_runtime: Optional[Dict[str, Any]] = None,
is_vision: bool = False,
) -> Tuple[Optional[Any], Optional[str]]:
"""Refresh Nous runtime creds, rebuild the client, and replace the cache entry."""
runtime = _resolve_nous_runtime_api(force_refresh=True)
@@ -2210,7 +2438,7 @@ def _refresh_nous_auxiliary_client(
current_loop = _aio.get_event_loop()
except RuntimeError:
pass
client, final_model = _to_async_client(sync_client, final_model or "")
client, final_model = _to_async_client(sync_client, final_model or "", is_vision=is_vision)
else:
client = sync_client
@@ -2221,6 +2449,7 @@ def _refresh_nous_auxiliary_client(
api_key=api_key,
api_mode=api_mode,
main_runtime=main_runtime,
is_vision=is_vision,
)
_store_cached_client(cache_key, client, final_model, bound_loop=current_loop)
return client, final_model
@@ -2332,12 +2561,19 @@ def _is_openrouter_client(client: Any) -> bool:
return False
def _cached_client_accepts_slash_models(client: Any, cached_default: Optional[str]) -> bool:
"""Best-effort check for cached clients that accept ``vendor/model`` IDs."""
if _is_openrouter_client(client):
return True
return bool(cached_default and "/" in cached_default)
def _compat_model(client: Any, model: Optional[str], cached_default: Optional[str]) -> Optional[str]:
"""Drop OpenRouter-format model slugs (with '/') for non-OpenRouter clients.
"""Keep slash-bearing model IDs only for cached clients that support them.
Mirrors the guard in resolve_provider_client() which is skipped on cache hits.
"""
if model and "/" in model and not _is_openrouter_client(client):
if model and "/" in model and not _cached_client_accepts_slash_models(client, cached_default):
return cached_default
return model or cached_default
@@ -2350,6 +2586,7 @@ def _get_cached_client(
api_key: str = None,
api_mode: str = None,
main_runtime: Optional[Dict[str, Any]] = None,
is_vision: bool = False,
) -> Tuple[Optional[Any], Optional[str]]:
"""Get or create a cached client for the given provider.
@@ -2386,6 +2623,7 @@ def _get_cached_client(
api_key=api_key,
api_mode=api_mode,
main_runtime=main_runtime,
is_vision=is_vision,
)
with _client_cache_lock:
if cache_key in _client_cache:
@@ -2417,6 +2655,7 @@ def _get_cached_client(
explicit_api_key=api_key,
api_mode=api_mode,
main_runtime=runtime,
is_vision=is_vision,
)
if client is not None:
# For async clients, remember which loop they were created on so we
@@ -2623,8 +2862,8 @@ def _build_call_kwargs(
temperature = fixed_temperature
# Opus 4.7+ rejects any non-default temperature/top_p/top_k — silently
# drop here so auxiliary callers that hardcode temperature (e.g. 0.3 on
# flush_memories, 0 on structured-JSON extraction) don't 400 the moment
# drop here so auxiliary callers that hardcode temperature (e.g. 0 on
# structured-JSON extraction) don't 400 the moment
# the aux model is flipped to 4.7.
if temperature is not None:
from agent.anthropic_adapter import _forbids_sampling_params
@@ -2712,7 +2951,7 @@ def call_llm(
Args:
task: Auxiliary task name ("compression", "vision", "web_extract",
"session_search", "skills_hub", "mcp", "flush_memories").
"session_search", "skills_hub", "mcp", "title_generation").
Reads provider:model from config/env. Ignored if provider is set.
provider: Explicit provider override.
model: Explicit model override.
@@ -2815,13 +3054,45 @@ def call_llm(
if _is_anthropic_compat_endpoint(resolved_provider, _client_base):
kwargs["messages"] = _convert_openai_images_to_anthropic(kwargs["messages"])
# Handle max_tokens vs max_completion_tokens retry, then payment fallback.
# Handle unsupported temperature, max_tokens vs max_completion_tokens retry,
# then payment fallback.
try:
return _validate_llm_response(
client.chat.completions.create(**kwargs), task)
except Exception as first_err:
if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
retry_kwargs = dict(kwargs)
retry_kwargs.pop("temperature", None)
logger.info(
"Auxiliary %s: provider rejected temperature; retrying once without it",
task or "call",
)
try:
return _validate_llm_response(
client.chat.completions.create(**retry_kwargs), task)
except Exception as retry_err:
retry_err_str = str(retry_err)
# If retry still fails, fall through to the max_tokens /
# payment / auth chains below using the temperature-stripped
# kwargs. Re-raise only if the retry hit something those
# chains won't handle.
if not (
_is_payment_error(retry_err)
or _is_connection_error(retry_err)
or _is_auth_error(retry_err)
or "max_tokens" in retry_err_str
or "unsupported_parameter" in retry_err_str
):
raise
first_err = retry_err
kwargs = retry_kwargs
err_str = str(first_err)
if "max_tokens" in err_str or "unsupported_parameter" in err_str:
if max_tokens is not None and (
"max_tokens" in err_str
or "unsupported_parameter" in err_str
or _is_unsupported_parameter_error(first_err, "max_tokens")
):
kwargs.pop("max_tokens", None)
kwargs["max_completion_tokens"] = max_tokens
try:
@@ -2848,6 +3119,7 @@ def call_llm(
api_key=resolved_api_key,
api_mode=resolved_api_mode,
main_runtime=main_runtime,
is_vision=(task == "vision"),
)
if refreshed_client is not None:
logger.info("Auxiliary %s: refreshed Nous runtime credentials after 401, retrying",
@@ -2857,6 +3129,49 @@ def call_llm(
return _validate_llm_response(
refreshed_client.chat.completions.create(**kwargs), task)
# ── Auth refresh retry ───────────────────────────────────────
if (_is_auth_error(first_err)
and resolved_provider not in ("auto", "", None)
and not client_is_nous):
if _refresh_provider_credentials(resolved_provider):
logger.info(
"Auxiliary %s: refreshed %s credentials after auth error, retrying",
task or "call", resolved_provider,
)
retry_client, retry_model = (
resolve_vision_provider_client(
provider=resolved_provider,
model=final_model,
async_mode=False,
)[1:]
if task == "vision"
else _get_cached_client(
resolved_provider,
resolved_model,
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
main_runtime=main_runtime,
)
)
if retry_client is not None:
retry_kwargs = _build_call_kwargs(
resolved_provider,
retry_model or final_model,
messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
timeout=effective_timeout,
extra_body=effective_extra_body,
base_url=resolved_base_url,
)
_retry_base = str(getattr(retry_client, "base_url", "") or "")
if _is_anthropic_compat_endpoint(resolved_provider, _retry_base):
retry_kwargs["messages"] = _convert_openai_images_to_anthropic(retry_kwargs["messages"])
return _validate_llm_response(
retry_client.chat.completions.create(**retry_kwargs), task)
# ── Payment / credit exhaustion fallback ──────────────────────
# When the resolved provider returns 402 or a credit-related error,
# try alternative providers instead of giving up. This handles the
@@ -3041,8 +3356,35 @@ async def async_call_llm(
return _validate_llm_response(
await client.chat.completions.create(**kwargs), task)
except Exception as first_err:
if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
retry_kwargs = dict(kwargs)
retry_kwargs.pop("temperature", None)
logger.info(
"Auxiliary %s (async): provider rejected temperature; retrying once without it",
task or "call",
)
try:
return _validate_llm_response(
await client.chat.completions.create(**retry_kwargs), task)
except Exception as retry_err:
retry_err_str = str(retry_err)
if not (
_is_payment_error(retry_err)
or _is_connection_error(retry_err)
or _is_auth_error(retry_err)
or "max_tokens" in retry_err_str
or "unsupported_parameter" in retry_err_str
):
raise
first_err = retry_err
kwargs = retry_kwargs
err_str = str(first_err)
if "max_tokens" in err_str or "unsupported_parameter" in err_str:
if max_tokens is not None and (
"max_tokens" in err_str
or "unsupported_parameter" in err_str
or _is_unsupported_parameter_error(first_err, "max_tokens")
):
kwargs.pop("max_tokens", None)
kwargs["max_completion_tokens"] = max_tokens
try:
@@ -3068,6 +3410,7 @@ async def async_call_llm(
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
is_vision=(task == "vision"),
)
if refreshed_client is not None:
logger.info("Auxiliary %s (async): refreshed Nous runtime credentials after 401, retrying",
@@ -3077,6 +3420,48 @@ async def async_call_llm(
return _validate_llm_response(
await refreshed_client.chat.completions.create(**kwargs), task)
# ── Auth refresh retry (mirrors sync call_llm) ───────────────
if (_is_auth_error(first_err)
and resolved_provider not in ("auto", "", None)
and not client_is_nous):
if _refresh_provider_credentials(resolved_provider):
logger.info(
"Auxiliary %s (async): refreshed %s credentials after auth error, retrying",
task or "call", resolved_provider,
)
if task == "vision":
_, retry_client, retry_model = resolve_vision_provider_client(
provider=resolved_provider,
model=final_model,
async_mode=True,
)
else:
retry_client, retry_model = _get_cached_client(
resolved_provider,
resolved_model,
async_mode=True,
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
)
if retry_client is not None:
retry_kwargs = _build_call_kwargs(
resolved_provider,
retry_model or final_model,
messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
timeout=effective_timeout,
extra_body=effective_extra_body,
base_url=resolved_base_url,
)
_retry_base = str(getattr(retry_client, "base_url", "") or "")
if _is_anthropic_compat_endpoint(resolved_provider, _retry_base):
retry_kwargs["messages"] = _convert_openai_images_to_anthropic(retry_kwargs["messages"])
return _validate_llm_response(
await retry_client.chat.completions.create(**retry_kwargs), task)
# ── Payment / connection fallback (mirrors sync call_llm) ─────
should_fallback = _is_payment_error(first_err) or _is_connection_error(first_err)
is_auto = resolved_provider in ("auto", "", None)
@@ -3094,7 +3479,9 @@ async def async_call_llm(
extra_body=effective_extra_body,
base_url=str(getattr(fb_client, "base_url", "") or ""))
# Convert sync fallback client to async
async_fb, async_fb_model = _to_async_client(fb_client, fb_model or "")
async_fb, async_fb_model = _to_async_client(
fb_client, fb_model or "", is_vision=(task == "vision")
)
if async_fb_model and async_fb_model != fb_kwargs.get("model"):
fb_kwargs["model"] = async_fb_model
return _validate_llm_response(
+130 -2
View File
@@ -87,6 +87,114 @@ def reset_client_cache():
_bedrock_control_client_cache.clear()
def invalidate_runtime_client(region: str) -> bool:
"""Evict the cached ``bedrock-runtime`` client for a single region.
Per-region counterpart to :func:`reset_client_cache`. Used by the converse
call wrappers to discard clients whose underlying HTTP connection has
gone stale, so the next call allocates a fresh client (with a fresh
connection pool) instead of reusing a dead socket.
Returns True if a cached entry was evicted, False if the region was not
cached.
"""
existed = region in _bedrock_runtime_client_cache
_bedrock_runtime_client_cache.pop(region, None)
return existed
# ---------------------------------------------------------------------------
# Stale-connection detection
# ---------------------------------------------------------------------------
#
# boto3 caches its HTTPS connection pool inside the client object. When a
# pooled connection is killed out from under us (NAT timeout, VPN flap,
# server-side TCP RST, proxy idle cull, etc.), the next use surfaces as
# one of a handful of low-level exceptions — most commonly
# ``botocore.exceptions.ConnectionClosedError`` or
# ``urllib3.exceptions.ProtocolError``. urllib3 also trips an internal
# ``assert`` in a couple of paths (connection pool state checks, chunked
# response readers) which bubbles up as a bare ``AssertionError`` with an
# empty ``str(exc)``.
#
# In all of these cases the client is the problem, not the request: retrying
# with the same cached client reproduces the failure until the process
# restarts. The fix is to evict the region's cached client so the next
# attempt builds a new one.
_STALE_LIB_MODULE_PREFIXES = (
"urllib3.",
"botocore.",
"boto3.",
)
def _traceback_frames_modules(exc: BaseException):
"""Yield ``__name__``-style module strings for each frame in exc's traceback."""
tb = getattr(exc, "__traceback__", None)
while tb is not None:
frame = tb.tb_frame
module = frame.f_globals.get("__name__", "")
yield module or ""
tb = tb.tb_next
def is_stale_connection_error(exc: BaseException) -> bool:
"""Return True if ``exc`` indicates a dead/stale Bedrock HTTP connection.
Matches:
* ``botocore.exceptions.ConnectionError`` and subclasses
(``ConnectionClosedError``, ``EndpointConnectionError``,
``ReadTimeoutError``, ``ConnectTimeoutError``).
* ``urllib3.exceptions.ProtocolError`` / ``NewConnectionError`` /
``ConnectionError`` (best-effort import — urllib3 is a transitive
dependency of botocore so it is always available in practice).
* Bare ``AssertionError`` raised from a frame inside urllib3, botocore,
or boto3. These are internal-invariant failures (typically triggered
by corrupted connection-pool state after a dropped socket) and are
recoverable by swapping the client.
Non-library ``AssertionError``s (from application code or tests) are
intentionally not matched — only library-internal asserts signal stale
connection state.
"""
# botocore: the canonical signal — HTTPClientError is the umbrella for
# ConnectionClosedError, ReadTimeoutError, EndpointConnectionError,
# ConnectTimeoutError, and ProxyConnectionError. ConnectionError covers
# the same family via a different branch of the hierarchy.
try:
from botocore.exceptions import (
ConnectionError as BotoConnectionError,
HTTPClientError,
)
botocore_errors: tuple = (BotoConnectionError, HTTPClientError)
except ImportError: # pragma: no cover — botocore always present with boto3
botocore_errors = ()
if botocore_errors and isinstance(exc, botocore_errors):
return True
# urllib3: low-level transport failures
try:
from urllib3.exceptions import (
ProtocolError,
NewConnectionError,
ConnectionError as Urllib3ConnectionError,
)
urllib3_errors = (ProtocolError, NewConnectionError, Urllib3ConnectionError)
except ImportError: # pragma: no cover
urllib3_errors = ()
if urllib3_errors and isinstance(exc, urllib3_errors):
return True
# Library-internal AssertionError (urllib3 / botocore / boto3)
if isinstance(exc, AssertionError):
for module in _traceback_frames_modules(exc):
if any(module.startswith(prefix) for prefix in _STALE_LIB_MODULE_PREFIXES):
return True
return False
# ---------------------------------------------------------------------------
# AWS credential detection
# ---------------------------------------------------------------------------
@@ -787,7 +895,17 @@ def call_converse(
guardrail_config=guardrail_config,
)
response = client.converse(**kwargs)
try:
response = client.converse(**kwargs)
except Exception as exc:
if is_stale_connection_error(exc):
logger.warning(
"bedrock: stale-connection error on converse(region=%s, model=%s): "
"%s — evicting cached client so the next call reconnects.",
region, model, type(exc).__name__,
)
invalidate_runtime_client(region)
raise
return normalize_converse_response(response)
@@ -819,7 +937,17 @@ def call_converse_stream(
guardrail_config=guardrail_config,
)
response = client.converse_stream(**kwargs)
try:
response = client.converse_stream(**kwargs)
except Exception as exc:
if is_stale_connection_error(exc):
logger.warning(
"bedrock: stale-connection error on converse_stream(region=%s, "
"model=%s): %s — evicting cached client so the next call reconnects.",
region, model, type(exc).__name__,
)
invalidate_runtime_client(region)
raise
return normalize_converse_stream_events(response)
+197 -11
View File
@@ -23,26 +23,52 @@ from agent.prompt_builder import DEFAULT_AGENT_IDENTITY
logger = logging.getLogger(__name__)
# Matches Codex/Harmony tool-call serialization that occasionally leaks into
# assistant-message content when the model fails to emit a structured
# ``function_call`` item. Accepts the common forms:
#
# to=functions.exec_command
# assistant to=functions.exec_command
# <|channel|>commentary to=functions.exec_command
#
# ``to=functions.<name>`` is the stable marker — the optional ``assistant`` or
# Harmony channel prefix varies by degeneration mode. Case-insensitive to
# cover lowercase/uppercase ``assistant`` variants.
_TOOL_CALL_LEAK_PATTERN = re.compile(
r"(?:^|[\s>|])to=functions\.[A-Za-z_][\w.]*",
re.IGNORECASE,
)
# ---------------------------------------------------------------------------
# Multimodal content helpers
# ---------------------------------------------------------------------------
def _chat_content_to_responses_parts(content: Any) -> List[Dict[str, Any]]:
def _chat_content_to_responses_parts(content: Any, *, role: str = "user") -> List[Dict[str, Any]]:
"""Convert chat-style multimodal content to Responses API input parts.
Input: ``[{"type":"text"|"image_url", ...}]`` (native OpenAI Chat format)
Output: ``[{"type":"input_text"|"input_image", ...}]`` (Responses format)
Output: ``[{"type":"input_text"|"output_text"|"input_image", ...}]`` (Responses format)
The ``role`` parameter controls the text content type:
- ``"user"`` (default) → ``"input_text"``
- ``"assistant"`` → ``"output_text"``
The Responses API rejects ``input_text`` inside assistant messages and
``output_text`` inside user messages, so callers MUST pass the correct
role for the message being converted.
Returns an empty list when ``content`` is not a list or contains no
recognized parts — callers fall back to the string path.
"""
text_type = "output_text" if role == "assistant" else "input_text"
if not isinstance(content, list):
return []
converted: List[Dict[str, Any]] = []
for part in content:
if isinstance(part, str):
if part:
converted.append({"type": "input_text", "text": part})
converted.append({"type": text_type, "text": part})
continue
if not isinstance(part, dict):
continue
@@ -50,7 +76,7 @@ def _chat_content_to_responses_parts(content: Any) -> List[Dict[str, Any]]:
if ptype in {"text", "input_text", "output_text"}:
text = part.get("text")
if isinstance(text, str) and text:
converted.append({"type": "input_text", "text": text})
converted.append({"type": text_type, "text": text})
continue
if ptype in {"image_url", "input_image"}:
image_ref = part.get("image_url")
@@ -201,6 +227,23 @@ def _responses_tools(tools: Optional[List[Dict[str, Any]]] = None) -> Optional[L
# Message format conversion
# ---------------------------------------------------------------------------
_RESPONSE_MESSAGE_STATUSES = {"completed", "incomplete", "in_progress"}
def _normalize_responses_message_status(value: Any, *, default: str = "completed") -> str:
"""Normalize a Responses assistant message status for replay.
The API accepts completed/incomplete/in_progress on replayed assistant
output messages. Preserve those exactly (modulo case/hyphen spelling) so
incomplete Codex continuation turns don't get falsely marked completed.
"""
if isinstance(value, str):
status = value.strip().lower().replace("-", "_").replace(" ", "_")
if status in _RESPONSE_MESSAGE_STATUSES:
return status
return default
def _chat_messages_to_responses_input(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Convert internal chat-style messages to Responses input items."""
items: List[Dict[str, Any]] = []
@@ -216,9 +259,10 @@ def _chat_messages_to_responses_input(messages: List[Dict[str, Any]]) -> List[Di
if role in {"user", "assistant"}:
content = msg.get("content", "")
if isinstance(content, list):
content_parts = _chat_content_to_responses_parts(content)
content_parts = _chat_content_to_responses_parts(content, role=role)
text_type = "output_text" if role == "assistant" else "input_text"
content_text = "".join(
p.get("text", "") for p in content_parts if p.get("type") == "input_text"
p.get("text", "") for p in content_parts if p.get("type") == text_type
)
else:
content_parts = []
@@ -245,7 +289,57 @@ def _chat_messages_to_responses_input(messages: List[Dict[str, Any]]) -> List[Di
seen_item_ids.add(item_id)
has_codex_reasoning = True
if content_parts:
# Replay exact assistant message items (with id/phase) from
# previous turns so the API can maintain prefix-cache hits.
# OpenAI docs: "preserve and resend phase on all assistant
# messages — dropping it can degrade performance."
codex_message_items = msg.get("codex_message_items")
replayed_message_items = 0
if isinstance(codex_message_items, list):
for raw_item in codex_message_items:
if not isinstance(raw_item, dict):
continue
if raw_item.get("type") != "message" or raw_item.get("role") != "assistant":
continue
raw_content_parts = raw_item.get("content")
if not isinstance(raw_content_parts, list):
continue
normalized_content_parts = []
for part in raw_content_parts:
if not isinstance(part, dict):
continue
part_type = str(part.get("type") or "").strip()
if part_type not in {"output_text", "text"}:
continue
text = part.get("text", "")
if text is None:
text = ""
if not isinstance(text, str):
text = str(text)
normalized_content_parts.append({"type": "output_text", "text": text})
if not normalized_content_parts:
continue
replay_item = {
"type": "message",
"role": "assistant",
"status": _normalize_responses_message_status(raw_item.get("status")),
"content": normalized_content_parts,
}
item_id = raw_item.get("id")
if isinstance(item_id, str) and item_id.strip():
replay_item["id"] = item_id.strip()
phase = raw_item.get("phase")
if isinstance(phase, str) and phase.strip():
replay_item["phase"] = phase.strip()
items.append(replay_item)
replayed_message_items += 1
if replayed_message_items > 0:
pass
elif content_parts:
items.append({"role": "assistant", "content": content_parts})
elif content_text.strip():
items.append({"role": "assistant", "content": content_text})
@@ -405,6 +499,47 @@ def _preflight_codex_input_items(raw_items: Any) -> List[Dict[str, Any]]:
normalized.append(reasoning_item)
continue
if item_type == "message":
role = item.get("role")
if role != "assistant":
raise ValueError(f"Codex Responses input[{idx}] message items must have role='assistant'.")
content = item.get("content")
if not isinstance(content, list):
raise ValueError(f"Codex Responses input[{idx}] message item must have content list.")
normalized_content = []
for part_idx, part in enumerate(content):
if not isinstance(part, dict):
raise ValueError(
f"Codex Responses input[{idx}] message content[{part_idx}] must be an object."
)
part_type = part.get("type")
if part_type not in {"output_text", "text"}:
raise ValueError(
f"Codex Responses input[{idx}] message content[{part_idx}] has unsupported type {part_type!r}."
)
text = part.get("text", "")
if text is None:
text = ""
if not isinstance(text, str):
text = str(text)
normalized_content.append({"type": "output_text", "text": text})
if not normalized_content:
raise ValueError(f"Codex Responses input[{idx}] message item must contain at least one text part.")
normalized_item: Dict[str, Any] = {
"type": "message",
"role": "assistant",
"status": _normalize_responses_message_status(item.get("status")),
"content": normalized_content,
}
item_id = item.get("id")
if isinstance(item_id, str) and item_id.strip():
normalized_item["id"] = item_id.strip()
phase = item.get("phase")
if isinstance(phase, str) and phase.strip():
normalized_item["phase"] = phase.strip()
normalized.append(normalized_item)
continue
role = item.get("role")
if role in {"user", "assistant"}:
content = item.get("content", "")
@@ -412,13 +547,16 @@ def _preflight_codex_input_items(raw_items: Any) -> List[Dict[str, Any]]:
content = ""
if isinstance(content, list):
# Multimodal content from ``_chat_messages_to_responses_input``
# is already in Responses format (``input_text`` / ``input_image``).
# Validate each part and pass through.
# is already in Responses format (``input_text`` / ``output_text``
# / ``input_image``). Validate each part and pass through.
# Use the correct text type for the role — ``output_text`` for
# assistant messages, ``input_text`` for user messages.
text_type = "output_text" if role == "assistant" else "input_text"
validated: List[Dict[str, Any]] = []
for part_idx, part in enumerate(content):
if isinstance(part, str):
if part:
validated.append({"type": "input_text", "text": part})
validated.append({"type": text_type, "text": part})
continue
if not isinstance(part, dict):
raise ValueError(
@@ -429,7 +567,7 @@ def _preflight_codex_input_items(raw_items: Any) -> List[Dict[str, Any]]:
text = part.get("text", "")
if not isinstance(text, str):
text = str(text or "")
validated.append({"type": "input_text", "text": text})
validated.append({"type": text_type, "text": text})
elif ptype in {"input_image", "image_url"}:
image_ref = part.get("image_url", "")
detail = part.get("detail")
@@ -686,6 +824,7 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
content_parts: List[str] = []
reasoning_parts: List[str] = []
reasoning_items_raw: List[Dict[str, Any]] = []
message_items_raw: List[Dict[str, Any]] = []
tool_calls: List[Any] = []
has_incomplete_items = response_status in {"queued", "in_progress", "incomplete"}
saw_commentary_phase = False
@@ -704,6 +843,7 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
if item_type == "message":
item_phase = getattr(item, "phase", None)
normalized_phase = None
if isinstance(item_phase, str):
normalized_phase = item_phase.strip().lower()
if normalized_phase in {"commentary", "analysis"}:
@@ -713,6 +853,18 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
message_text = _extract_responses_message_text(item)
if message_text:
content_parts.append(message_text)
raw_message_item: Dict[str, Any] = {
"type": "message",
"role": "assistant",
"status": _normalize_responses_message_status(item_status),
"content": [{"type": "output_text", "text": message_text}],
}
item_id = getattr(item, "id", None)
if isinstance(item_id, str) and item_id:
raw_message_item["id"] = item_id
if normalized_phase:
raw_message_item["phase"] = normalized_phase
message_items_raw.append(raw_message_item)
elif item_type == "reasoning":
reasoning_text = _extract_responses_reasoning_text(item)
if reasoning_text:
@@ -787,6 +939,37 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
if isinstance(out_text, str):
final_text = out_text.strip()
# ── Tool-call leak recovery ──────────────────────────────────
# gpt-5.x on the Codex Responses API sometimes degenerates and emits
# what should be a structured `function_call` item as plain assistant
# text using the Harmony/Codex serialization (``to=functions.foo
# {json}`` or ``assistant to=functions.foo {json}``). The model
# intended to call a tool, but the intent never made it into
# ``response.output`` as a ``function_call`` item, so ``tool_calls``
# is empty here. If we pass this through, the parent sees a
# confident-looking summary with no audit trail (empty ``tool_trace``)
# and no tools actually ran — the Taiwan-embassy-email incident.
#
# Detection: leaked tokens always contain ``to=functions.<name>`` and
# the assistant message has no real tool calls. Treat it as incomplete
# so the existing Codex-incomplete continuation path (3 retries,
# handled in run_agent.py) gets a chance to re-elicit a proper
# ``function_call`` item. The existing loop already handles message
# append, dedup, and retry budget.
leaked_tool_call_text = False
if final_text and not tool_calls and _TOOL_CALL_LEAK_PATTERN.search(final_text):
leaked_tool_call_text = True
logger.warning(
"Codex response contains leaked tool-call text in assistant content "
"(no structured function_call items). Treating as incomplete so the "
"continuation path can re-elicit a proper tool call. Leaked snippet: %r",
final_text[:300],
)
# Clear the text so downstream code doesn't surface the garbage as
# a summary. The encrypted reasoning items (if any) are preserved
# so the model keeps its chain-of-thought on the retry.
final_text = ""
assistant_message = SimpleNamespace(
content=final_text,
tool_calls=tool_calls,
@@ -794,10 +977,13 @@ def _normalize_codex_response(response: Any) -> tuple[Any, str]:
reasoning_content=None,
reasoning_details=None,
codex_reasoning_items=reasoning_items_raw or None,
codex_message_items=message_items_raw or None,
)
if tool_calls:
finish_reason = "tool_calls"
elif leaked_tool_call_text:
finish_reason = "incomplete"
elif has_incomplete_items or (saw_commentary_phase and not saw_final_answer_phase):
finish_reason = "incomplete"
elif reasoning_items_raw and not final_text:
+93 -5
View File
@@ -61,9 +61,52 @@ _PRUNED_TOOL_PLACEHOLDER = "[Old tool output cleared to save context space]"
# Chars per token rough estimate
_CHARS_PER_TOKEN = 4
# Flat token cost per attached image part. Real cost varies by provider and
# dimensions (Anthropic ≈ width×height/750, GPT-4o up to ~1700 for
# high-detail 2048×2048, Gemini 258/tile), but 1600 is a realistic ceiling
# that keeps compression budgeting honest for multi-image conversations.
# Matches Claude Code's IMAGE_TOKEN_ESTIMATE constant.
_IMAGE_TOKEN_ESTIMATE = 1600
# Same figure expressed in the char-budget currency the rest of the
# compressor speaks in. Used when accumulating message "content length"
# for tail-cut decisions.
_IMAGE_CHAR_EQUIVALENT = _IMAGE_TOKEN_ESTIMATE * _CHARS_PER_TOKEN
_SUMMARY_FAILURE_COOLDOWN_SECONDS = 600
def _content_length_for_budget(raw_content: Any) -> int:
"""Return the effective char-length of a message's content for token budgeting.
Plain strings: ``len(content)``. Multimodal lists: sum of text-part
``len(text)`` plus a flat ``_IMAGE_CHAR_EQUIVALENT`` per image part
(``image_url`` / ``input_image`` / Anthropic-style ``image``). This
keeps the compressor from treating a turn with 5 attached images as
near-zero tokens just because the text part is empty.
"""
if isinstance(raw_content, str):
return len(raw_content)
if not isinstance(raw_content, list):
return len(str(raw_content or ""))
total = 0
for p in raw_content:
if isinstance(p, str):
total += len(p)
continue
if not isinstance(p, dict):
total += len(str(p))
continue
ptype = p.get("type")
if ptype in {"image_url", "input_image", "image"}:
total += _IMAGE_CHAR_EQUIVALENT
else:
# text / input_text / tool_result-with-text / anything else with
# a text field. Ignore the raw base64 payload inside image_url
# dicts — dimensions don't matter, only whether it's an image.
total += len(p.get("text", "") or "")
return total
def _content_text_for_contains(content: Any) -> str:
"""Return a best-effort text view of message content.
@@ -294,6 +337,9 @@ class ContextCompressor(ContextEngine):
self._context_probed = False
self._context_probe_persistable = False
self._previous_summary = None
self._last_summary_error = None
self._last_summary_dropped_count = 0
self._last_summary_fallback_used = False
self._last_compression_savings_pct = 100.0
self._ineffective_compression_count = 0
@@ -317,6 +363,13 @@ class ContextCompressor(ContextEngine):
int(context_length * self.threshold_percent),
MINIMUM_CONTEXT_LENGTH,
)
# Recalculate token budgets for the new context length so the
# compressor stays calibrated after a model switch (e.g. 200K → 32K).
target_tokens = int(self.threshold_tokens * self.summary_target_ratio)
self.tail_token_budget = target_tokens
self.max_summary_tokens = min(
int(context_length * 0.05), _SUMMARY_TOKENS_CEILING,
)
def __init__(
self,
@@ -389,6 +442,12 @@ class ContextCompressor(ContextEngine):
self._last_compression_savings_pct: float = 100.0
self._ineffective_compression_count: int = 0
self._summary_failure_cooldown_until: float = 0.0
self._last_summary_error: Optional[str] = None
# When summary generation fails and a static fallback is inserted,
# record how many turns were unrecoverably dropped so callers
# (gateway hygiene, /compress) can surface a visible warning.
self._last_summary_dropped_count: int = 0
self._last_summary_fallback_used: bool = False
def update_from_response(self, usage: Dict[str, Any]):
"""Update tracked token usage from API response."""
@@ -475,7 +534,7 @@ class ContextCompressor(ContextEngine):
for i in range(len(result) - 1, -1, -1):
msg = result[i]
raw_content = msg.get("content") or ""
content_len = sum(len(p.get("text", "")) for p in raw_content) if isinstance(raw_content, list) else len(raw_content)
content_len = _content_length_for_budget(raw_content)
msg_tokens = content_len // _CHARS_PER_TOKEN + 10
for tc in msg.get("tool_calls") or []:
if isinstance(tc, dict):
@@ -812,10 +871,12 @@ The user has requested that this compaction PRIORITISE preserving all informatio
self._previous_summary = summary
self._summary_failure_cooldown_until = 0.0
self._summary_model_fallen_back = False
self._last_summary_error = None
return self._with_summary_prefix(summary)
except RuntimeError:
# No provider configured — long cooldown, unlikely to self-resolve
self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
self._last_summary_error = "no auxiliary LLM provider configured"
logging.warning("Context compression: no provider available for "
"summary. Middle turns will be dropped without summary "
"for %d seconds.",
@@ -853,6 +914,10 @@ The user has requested that this compaction PRIORITISE preserving all informatio
# Transient errors (timeout, rate limit, network) — shorter cooldown
_transient_cooldown = 60
self._summary_failure_cooldown_until = time.monotonic() + _transient_cooldown
err_text = str(e).strip() or e.__class__.__name__
if len(err_text) > 220:
err_text = err_text[:217].rstrip() + "..."
self._last_summary_error = err_text
logging.warning(
"Failed to generate context summary: %s. "
"Further summary attempts paused for %d seconds.",
@@ -1067,8 +1132,9 @@ The user has requested that this compaction PRIORITISE preserving all informatio
for i in range(n - 1, head_end - 1, -1):
msg = messages[i]
content = msg.get("content") or ""
msg_tokens = len(content) // _CHARS_PER_TOKEN + 10 # +10 for role/metadata
raw_content = msg.get("content") or ""
content_len = _content_length_for_budget(raw_content)
msg_tokens = content_len // _CHARS_PER_TOKEN + 10 # +10 for role/metadata
# Include tool call arguments in estimate
for tc in msg.get("tool_calls") or []:
if isinstance(tc, dict):
@@ -1099,6 +1165,21 @@ The user has requested that this compaction PRIORITISE preserving all informatio
return max(cut_idx, head_end + 1)
# ------------------------------------------------------------------
# ContextEngine: manual /compress preflight
# ------------------------------------------------------------------
def has_content_to_compress(self, messages: List[Dict[str, Any]]) -> bool:
"""Return True if there is a non-empty middle region to compact.
Overrides the ABC default so the gateway ``/compress`` guard can
skip the LLM call when the transcript is still entirely inside
the protected head/tail.
"""
compress_start = self._align_boundary_forward(messages, self.protect_first_n)
compress_end = self._find_tail_cut_by_tokens(messages, compress_start)
return compress_start < compress_end
# ------------------------------------------------------------------
# Main compression entry point
# ------------------------------------------------------------------
@@ -1122,6 +1203,11 @@ The user has requested that this compaction PRIORITISE preserving all informatio
related to this topic and be more aggressive about compressing
everything else. Inspired by Claude Code's ``/compact``.
"""
# Reset per-call summary failure state — callers inspect these fields
# after compress() returns to decide whether to surface a warning.
self._last_summary_dropped_count = 0
self._last_summary_fallback_used = False
self._last_summary_error = None
n_messages = len(messages)
# Only need head + 3 tail messages minimum (token budget decides the real tail size)
_min_for_compress = self.protect_first_n + 3 + 1
@@ -1200,11 +1286,13 @@ The user has requested that this compaction PRIORITISE preserving all informatio
if not self.quiet_mode:
logger.warning("Summary generation failed — inserting static fallback context marker")
n_dropped = compress_end - compress_start
self._last_summary_dropped_count = n_dropped
self._last_summary_fallback_used = True
summary = (
f"{SUMMARY_PREFIX}\n"
f"Summary generation was unavailable. {n_dropped} conversation turns were "
f"Summary generation was unavailable. {n_dropped} message(s) were "
f"removed to free context space but could not be summarized. The removed "
f"turns contained earlier work in this session. Continue based on the "
f"messages contained earlier work in this session. Continue based on the "
f"recent messages below and the current state of any files or resources."
)
+22
View File
@@ -78,6 +78,7 @@ class ContextEngine(ABC):
self,
messages: List[Dict[str, Any]],
current_tokens: int = None,
focus_topic: str = None,
) -> List[Dict[str, Any]]:
"""Compact the message list and return the new message list.
@@ -86,6 +87,12 @@ class ContextEngine(ABC):
context budget. The implementation is free to summarize, build a
DAG, or do anything else — as long as the returned list is a valid
OpenAI-format message sequence.
Args:
focus_topic: Optional topic string from manual ``/compress <focus>``.
Engines that support guided compression should prioritise
preserving information related to this topic. Engines that
don't support it may simply ignore this argument.
"""
# -- Optional: pre-flight check ----------------------------------------
@@ -98,6 +105,21 @@ class ContextEngine(ABC):
"""
return False
# -- Optional: manual /compress preflight ------------------------------
def has_content_to_compress(self, messages: List[Dict[str, Any]]) -> bool:
"""Quick check: is there anything in ``messages`` that can be compacted?
Used by the gateway ``/compress`` command as a preflight guard —
returning False lets the gateway report "nothing to compress yet"
without making an LLM call.
Default returns True (always attempt). Engines with a cheap way
to introspect their own head/tail boundaries should override this
to return False when the transcript is still entirely protected.
"""
return True
# -- Optional: session lifecycle ---------------------------------------
def on_session_start(self, session_id: str, **kwargs) -> None:
+42
View File
@@ -46,6 +46,47 @@ def _resolve_args() -> list[str]:
return shlex.split(raw)
def _resolve_home_dir() -> str:
"""Return a stable HOME for child ACP processes."""
try:
from hermes_constants import get_subprocess_home
profile_home = get_subprocess_home()
if profile_home:
return profile_home
except Exception:
pass
home = os.environ.get("HOME", "").strip()
if home:
return home
expanded = os.path.expanduser("~")
if expanded and expanded != "~":
return expanded
try:
import pwd
resolved = pwd.getpwuid(os.getuid()).pw_dir.strip()
if resolved:
return resolved
except Exception:
pass
# Last resort: /tmp (writable on any POSIX system). Avoids crashing the
# subprocess with no HOME; callers can set HERMES_HOME explicitly if they
# need a different writable dir.
return "/tmp"
def _build_subprocess_env() -> dict[str, str]:
env = os.environ.copy()
env["HOME"] = _resolve_home_dir()
return env
def _jsonrpc_error(message_id: Any, code: int, message: str) -> dict[str, Any]:
return {
"jsonrpc": "2.0",
@@ -382,6 +423,7 @@ class CopilotACPClient:
text=True,
bufsize=1,
cwd=self._acp_cwd,
env=_build_subprocess_env(),
)
except FileNotFoundError as exc:
raise RuntimeError(
+114 -6
View File
@@ -14,6 +14,7 @@ from datetime import datetime
from typing import Any, Dict, List, Optional, Set, Tuple
from hermes_constants import OPENROUTER_BASE_URL
from hermes_cli.config import get_env_value
import hermes_cli.auth as auth_mod
from hermes_cli.auth import (
CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
@@ -455,6 +456,61 @@ class CredentialPool:
logger.debug("Failed to sync from credentials file: %s", exc)
return entry
def _sync_nous_entry_from_auth_store(self, entry: PooledCredential) -> PooledCredential:
"""Sync a Nous pool entry from auth.json if tokens differ.
Nous OAuth refresh tokens are single-use. When another process
(e.g. a concurrent cron) refreshes the token via
``resolve_nous_runtime_credentials``, it writes fresh tokens to
auth.json under ``_auth_store_lock``. The pool entry's tokens
become stale. This method detects that and adopts the newer pair,
avoiding a "refresh token reuse" revocation on the Nous Portal.
"""
if self.provider != "nous" or entry.source != "device_code":
return entry
try:
with _auth_store_lock():
auth_store = _load_auth_store()
state = _load_provider_state(auth_store, "nous")
if not state:
return entry
store_refresh = state.get("refresh_token", "")
store_access = state.get("access_token", "")
if store_refresh and store_refresh != entry.refresh_token:
logger.debug(
"Pool entry %s: syncing tokens from auth.json (Nous refresh token changed)",
entry.id,
)
field_updates: Dict[str, Any] = {
"access_token": store_access,
"refresh_token": store_refresh,
"last_status": None,
"last_status_at": None,
"last_error_code": None,
}
if state.get("expires_at"):
field_updates["expires_at"] = state["expires_at"]
if state.get("agent_key"):
field_updates["agent_key"] = state["agent_key"]
if state.get("agent_key_expires_at"):
field_updates["agent_key_expires_at"] = state["agent_key_expires_at"]
if state.get("inference_base_url"):
field_updates["inference_base_url"] = state["inference_base_url"]
extra_updates = dict(entry.extra)
for extra_key in ("obtained_at", "expires_in", "agent_key_id",
"agent_key_expires_in", "agent_key_reused",
"agent_key_obtained_at"):
val = state.get(extra_key)
if val is not None:
extra_updates[extra_key] = val
updated = replace(entry, extra=extra_updates, **field_updates)
self._replace_entry(entry, updated)
self._persist()
return updated
except Exception as exc:
logger.debug("Failed to sync Nous entry from auth.json: %s", exc)
return entry
def _sync_device_code_entry_to_auth_store(self, entry: PooledCredential) -> None:
"""Write refreshed pool entry tokens back to auth.json providers.
@@ -561,6 +617,9 @@ class CredentialPool:
last_refresh=refreshed.get("last_refresh"),
)
elif self.provider == "nous":
synced = self._sync_nous_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
nous_state = {
"access_token": entry.access_token,
"refresh_token": entry.refresh_token,
@@ -635,6 +694,26 @@ class CredentialPool:
# Credentials file had a valid (non-expired) token — use it directly
logger.debug("Credentials file has valid token, using without refresh")
return synced
# For nous: another process may have consumed the refresh token
# between our proactive sync and the HTTP call. Re-sync from
# auth.json and adopt the fresh tokens if available.
if self.provider == "nous":
synced = self._sync_nous_entry_from_auth_store(entry)
if synced.refresh_token != entry.refresh_token:
logger.debug("Nous refresh failed but auth.json has newer tokens — adopting")
updated = replace(
synced,
last_status=STATUS_OK,
last_status_at=None,
last_error_code=None,
last_error_reason=None,
last_error_message=None,
last_error_reset_at=None,
)
self._replace_entry(synced, updated)
self._persist()
self._sync_device_code_entry_to_auth_store(updated)
return updated
self._mark_exhausted(entry, None)
return None
@@ -698,6 +777,17 @@ class CredentialPool:
if synced is not entry:
entry = synced
cleared_any = True
# For nous entries, sync from auth.json before status checks.
# Another process may have successfully refreshed via
# resolve_nous_runtime_credentials(), making this entry's
# exhausted status stale.
if (self.provider == "nous"
and entry.source == "device_code"
and entry.last_status == STATUS_EXHAUSTED):
synced = self._sync_nous_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
cleared_any = True
if entry.last_status == STATUS_EXHAUSTED:
exhausted_until = _exhausted_until(entry)
if exhausted_until is not None and now < exhausted_until:
@@ -739,8 +829,11 @@ class CredentialPool:
if self._strategy == STRATEGY_LEAST_USED and len(available) > 1:
entry = min(available, key=lambda e: e.request_count)
# Increment usage counter so subsequent selections distribute load
updated = replace(entry, request_count=entry.request_count + 1)
self._replace_entry(entry, updated)
self._current_id = entry.id
return entry
return updated
if self._strategy == STRATEGY_ROUND_ROBIN and len(available) > 1:
entry = available[0]
@@ -1056,6 +1149,18 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
"inference_base_url": state.get("inference_base_url"),
"agent_key": state.get("agent_key"),
"agent_key_expires_at": state.get("agent_key_expires_at"),
# Carry the mint/refresh timestamps into the pool so
# freshness-sensitive consumers (self-heal hooks, pool
# pruning by age) can distinguish just-minted credentials
# from stale ones. Without these, fresh device_code
# entries get obtained_at=None and look older than they
# are (#15099).
"obtained_at": state.get("obtained_at"),
"expires_in": state.get("expires_in"),
"agent_key_id": state.get("agent_key_id"),
"agent_key_expires_in": state.get("agent_key_expires_in"),
"agent_key_reused": state.get("agent_key_reused"),
"agent_key_obtained_at": state.get("agent_key_obtained_at"),
"tls": state.get("tls") if isinstance(state.get("tls"), dict) else None,
"label": seeded_label,
},
@@ -1066,9 +1171,10 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
# env vars (COPILOT_GITHUB_TOKEN / GH_TOKEN). They don't live in
# the auth store or credential pool, so we resolve them here.
try:
from hermes_cli.copilot_auth import resolve_copilot_token
from hermes_cli.copilot_auth import resolve_copilot_token, get_copilot_api_token
token, source = resolve_copilot_token()
if token:
api_token = get_copilot_api_token(token)
source_name = "gh_cli" if "gh" in source.lower() else f"env:{source}"
if not _is_suppressed(provider, source_name):
active_sources.add(source_name)
@@ -1080,7 +1186,7 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
{
"source": source_name,
"auth_type": AUTH_TYPE_API_KEY,
"access_token": token,
"access_token": api_token,
"base_url": pconfig.inference_base_url if pconfig else "",
"label": source,
},
@@ -1168,7 +1274,8 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
def _is_source_suppressed(_p, _s): # type: ignore[misc]
return False
if provider == "openrouter":
token = os.getenv("OPENROUTER_API_KEY", "").strip()
# Check both os.environ and ~/.hermes/.env file
token = (get_env_value("OPENROUTER_API_KEY") or "").strip()
if token:
source = "env:OPENROUTER_API_KEY"
if _is_source_suppressed(provider, source):
@@ -1194,7 +1301,7 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
env_url = ""
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip().rstrip("/")
env_url = (get_env_value(pconfig.base_url_env_var) or "").strip().rstrip("/")
env_vars = list(pconfig.api_key_env_vars)
if provider == "anthropic":
@@ -1205,7 +1312,8 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
]
for env_var in env_vars:
token = os.getenv(env_var, "").strip()
# Check both os.environ and ~/.hermes/.env file
token = (get_env_value(env_var) or "").strip()
if not token:
continue
source = f"env:{env_var}"
+36
View File
@@ -42,6 +42,7 @@ class FailoverReason(enum.Enum):
# Context / payload
context_overflow = "context_overflow" # Context too large — compress, not failover
payload_too_large = "payload_too_large" # 413 — compress payload
image_too_large = "image_too_large" # Native image part exceeds provider's per-image limit — shrink and retry
# Model
model_not_found = "model_not_found" # 404 or invalid model — fallback to different model
@@ -147,6 +148,20 @@ _PAYLOAD_TOO_LARGE_PATTERNS = [
"error code: 413",
]
# Image-size patterns. Matched against 400 bodies (not 413) because most
# providers return a 400 with a specific image-too-big message before the
# whole request hits the 413 size limit. Anthropic's wording is the most
# important here (hard 5 MB per image, returned as
# "messages.N.content.K.image.source.base64: image exceeds 5 MB maximum").
_IMAGE_TOO_LARGE_PATTERNS = [
"image exceeds", # Anthropic: "image exceeds 5 MB maximum"
"image too large", # generic
"image_too_large", # error_code variant
"image size exceeds", # variant
# "request_too_large" on a request known to contain an image → image is
# the likely culprit; we still try the shrink path before giving up.
]
# Context overflow patterns
_CONTEXT_OVERFLOW_PATTERNS = [
"context length",
@@ -343,6 +358,11 @@ def classify_api_error(
"""
status_code = _extract_status_code(error)
error_type = type(error).__name__
# Copilot/GitHub Models RateLimitError may not set .status_code; force 429
# so downstream rate-limit handling (classifier reason, pool rotation,
# fallback gating) fires correctly instead of misclassifying as generic.
if status_code is None and error_type == "RateLimitError":
status_code = 429
body = _extract_error_body(error)
error_code = _extract_error_code(body)
@@ -666,6 +686,15 @@ def _classify_400(
) -> ClassifiedError:
"""Classify 400 Bad Request — context overflow, format error, or generic."""
# Image-too-large from 400 (Anthropic's 5 MB per-image check fires this way).
# Must be checked BEFORE context_overflow because messages can trip both
# patterns ("exceeds" + "image") and image-shrink is a cheaper recovery.
if any(p in error_msg for p in _IMAGE_TOO_LARGE_PATTERNS):
return result_fn(
FailoverReason.image_too_large,
retryable=True,
)
# Context overflow from 400
if any(p in error_msg for p in _CONTEXT_OVERFLOW_PATTERNS):
return result_fn(
@@ -793,6 +822,13 @@ def _classify_by_message(
should_compress=True,
)
# Image-too-large patterns (from message text when no status_code)
if any(p in error_msg for p in _IMAGE_TOO_LARGE_PATTERNS):
return result_fn(
FailoverReason.image_too_large,
retryable=True,
)
# Usage-limit patterns need the same disambiguation as 402: some providers
# surface "usage limit" errors without an HTTP status code. A transient
# signal ("try again", "resets at", …) means it's a periodic quota, not
+104
View File
@@ -44,6 +44,97 @@ def is_native_gemini_base_url(base_url: str) -> bool:
return not normalized.endswith("/openai")
def probe_gemini_tier(
api_key: str,
base_url: str = DEFAULT_GEMINI_BASE_URL,
*,
model: str = "gemini-2.5-flash",
timeout: float = 10.0,
) -> str:
"""Probe a Google AI Studio API key and return its tier.
Returns one of:
- ``"free"`` -- key is on the free tier (unusable with Hermes)
- ``"paid"`` -- key is on a paid tier
- ``"unknown"`` -- probe failed; callers should proceed without blocking.
"""
key = (api_key or "").strip()
if not key:
return "unknown"
normalized_base = str(base_url or DEFAULT_GEMINI_BASE_URL).strip().rstrip("/")
if not normalized_base:
normalized_base = DEFAULT_GEMINI_BASE_URL
if normalized_base.lower().endswith("/openai"):
normalized_base = normalized_base[: -len("/openai")]
url = f"{normalized_base}/models/{model}:generateContent"
payload = {
"contents": [{"role": "user", "parts": [{"text": "hi"}]}],
"generationConfig": {"maxOutputTokens": 1},
}
try:
with httpx.Client(timeout=timeout) as client:
resp = client.post(
url,
params={"key": key},
json=payload,
headers={"Content-Type": "application/json"},
)
except Exception as exc:
logger.debug("probe_gemini_tier: network error: %s", exc)
return "unknown"
headers_lower = {k.lower(): v for k, v in resp.headers.items()}
rpd_header = headers_lower.get("x-ratelimit-limit-requests-per-day")
if rpd_header:
try:
rpd_val = int(rpd_header)
except (TypeError, ValueError):
rpd_val = None
# Published free-tier daily caps (Dec 2025):
# gemini-2.5-pro: 100, gemini-2.5-flash: 250, flash-lite: 1000
# Tier 1 starts at ~1500+ for Flash. We treat <= 1000 as free.
if rpd_val is not None and rpd_val <= 1000:
return "free"
if rpd_val is not None and rpd_val > 1000:
return "paid"
if resp.status_code == 429:
body_text = ""
try:
body_text = resp.text or ""
except Exception:
body_text = ""
if "free_tier" in body_text.lower():
return "free"
return "paid"
if 200 <= resp.status_code < 300:
return "paid"
return "unknown"
def is_free_tier_quota_error(error_message: str) -> bool:
"""Return True when a Gemini 429 message indicates free-tier exhaustion."""
if not error_message:
return False
return "free_tier" in error_message.lower()
_FREE_TIER_GUIDANCE = (
"\n\nYour Google API key is on the free tier (<= 250 requests/day for "
"gemini-2.5-flash). Hermes typically makes 3-10 API calls per user turn, "
"so the free tier is exhausted in a handful of messages and cannot sustain "
"an agent session. Enable billing on your Google Cloud project and "
"regenerate the key in a billing-enabled project: "
"https://aistudio.google.com/apikey"
)
class GeminiAPIError(Exception):
"""Error shape compatible with Hermes retry/error classification."""
@@ -650,6 +741,12 @@ def gemini_http_error(response: httpx.Response) -> GeminiAPIError:
else:
message = f"Gemini returned HTTP {status}: {body_text[:500]}"
# Free-tier quota exhaustion -> append actionable guidance so users who
# bypassed the setup wizard (direct GOOGLE_API_KEY in .env) still learn
# that the free tier cannot sustain an agent session.
if status == 429 and is_free_tier_quota_error(err_message or body_text):
message = message + _FREE_TIER_GUIDANCE
return GeminiAPIError(
message,
code=code,
@@ -704,6 +801,13 @@ class GeminiNativeClient:
http_client: Optional[httpx.Client] = None,
**_: Any,
) -> None:
if not (api_key or "").strip():
raise RuntimeError(
"Gemini native client requires an API key, but none was provided. "
"Set GOOGLE_API_KEY or GEMINI_API_KEY in your environment / ~/.hermes/.env "
"(get one at https://aistudio.google.com/app/apikey), or run `hermes setup` "
"to configure the Google provider."
)
self.api_key = api_key
normalized_base = (base_url or DEFAULT_GEMINI_BASE_URL).rstrip("/")
if normalized_base.endswith("/openai"):
+14
View File
@@ -73,6 +73,20 @@ def sanitize_gemini_schema(schema: Any) -> Dict[str, Any]:
]
continue
cleaned[key] = value
# Gemini's Schema validator requires every ``enum`` entry to be a string,
# even when the parent ``type`` is ``integer`` / ``number`` / ``boolean``.
# OpenAI / OpenRouter / Anthropic accept typed enums (e.g. Discord's
# ``auto_archive_duration: {type: integer, enum: [60, 1440, 4320, 10080]}``),
# so we only drop the ``enum`` when it would collide with Gemini's rule.
# Keeping ``type: integer`` plus the human-readable description gives the
# model enough guidance; the tool handler still validates the value.
enum_val = cleaned.get("enum")
type_val = cleaned.get("type")
if isinstance(enum_val, list) and type_val in {"integer", "number", "boolean"}:
if any(not isinstance(item, str) for item in enum_val):
cleaned.pop("enum", None)
return cleaned
+236
View File
@@ -0,0 +1,236 @@
"""Routing helpers for inbound user-attached images.
Two modes:
native attach images as OpenAI-style ``image_url`` content parts on the
user turn. Provider adapters (Anthropic, Gemini, Bedrock, Codex,
OpenAI chat.completions) already translate these into their
vendor-specific multimodal formats.
text run ``vision_analyze`` on each image up-front and prepend the
description to the user's text. The model never sees the pixels;
it only sees a lossy text summary. This is the pre-existing
behaviour and still the right choice for non-vision models.
The decision is made once per message turn by :func:`decide_image_input_mode`.
It reads ``agent.image_input_mode`` from config.yaml (``auto`` | ``native``
| ``text``, default ``auto``) and the active model's capability metadata.
In ``auto`` mode:
- If the user has explicitly configured ``auxiliary.vision.provider``
(i.e. not ``auto`` and not empty), we assume they want the text pipeline
regardless of the main model they've opted in to a specific vision
backend for a reason (cost, quality, local-only, etc.).
- Otherwise, if the active model reports ``supports_vision=True`` in its
models.dev metadata, we attach natively.
- Otherwise (non-vision model, no explicit override), we fall back to text.
This keeps ``vision_analyze`` surfaced as a tool in every session skills
and agent flows that chain it (browser screenshots, deeper inspection of
URL-referenced images, style-gating loops) keep working. The routing only
affects *how user-attached images on the current turn* are presented to the
main model.
"""
from __future__ import annotations
import base64
import logging
import mimetypes
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
logger = logging.getLogger(__name__)
_VALID_MODES = frozenset({"auto", "native", "text"})
def _coerce_mode(raw: Any) -> str:
"""Normalize a config value into one of the valid modes."""
if not isinstance(raw, str):
return "auto"
val = raw.strip().lower()
if val in _VALID_MODES:
return val
return "auto"
def _explicit_aux_vision_override(cfg: Optional[Dict[str, Any]]) -> bool:
"""True when the user configured a specific auxiliary vision backend.
An explicit override means the user *wants* the text pipeline (they're
paying for a dedicated vision model), so we don't silently bypass it.
"""
if not isinstance(cfg, dict):
return False
aux = cfg.get("auxiliary") or {}
if not isinstance(aux, dict):
return False
vision = aux.get("vision") or {}
if not isinstance(vision, dict):
return False
provider = str(vision.get("provider") or "").strip().lower()
model = str(vision.get("model") or "").strip()
base_url = str(vision.get("base_url") or "").strip()
# "auto" / "" / blank = not explicit
if provider in ("", "auto") and not model and not base_url:
return False
return True
def _lookup_supports_vision(provider: str, model: str) -> Optional[bool]:
"""Return True/False if we can resolve caps, None if unknown."""
if not provider or not model:
return None
try:
from agent.models_dev import get_model_capabilities
caps = get_model_capabilities(provider, model)
except Exception as exc: # pragma: no cover - defensive
logger.debug("image_routing: caps lookup failed for %s:%s%s", provider, model, exc)
return None
if caps is None:
return None
return bool(caps.supports_vision)
def decide_image_input_mode(
provider: str,
model: str,
cfg: Optional[Dict[str, Any]],
) -> str:
"""Return ``"native"`` or ``"text"`` for the given turn.
Args:
provider: active inference provider ID (e.g. ``"anthropic"``, ``"openrouter"``).
model: active model slug as it would be sent to the provider.
cfg: loaded config.yaml dict, or None. When None, behaves as auto.
"""
mode_cfg = "auto"
if isinstance(cfg, dict):
agent_cfg = cfg.get("agent") or {}
if isinstance(agent_cfg, dict):
mode_cfg = _coerce_mode(agent_cfg.get("image_input_mode"))
if mode_cfg == "native":
return "native"
if mode_cfg == "text":
return "text"
# auto
if _explicit_aux_vision_override(cfg):
return "text"
supports = _lookup_supports_vision(provider, model)
if supports is True:
return "native"
return "text"
# Image size handling is REACTIVE rather than proactive: we attempt native
# attachment at full size regardless of provider, and rely on
# ``run_agent._try_shrink_image_parts_in_messages`` to shrink + retry if
# the provider rejects the request (e.g. Anthropic's hard 5 MB per-image
# ceiling returned as HTTP 400 "image exceeds 5 MB maximum").
#
# Why reactive: our knowledge of provider ceilings is partial and evolving
# (OpenAI accepts 49 MB+, Anthropic 5 MB, Gemini 100 MB, others unknown).
# A proactive per-provider table would be stale the moment a provider raises
# or lowers its limit, and silently degrading quality for users on providers
# that would have accepted the full image is the worse failure mode.
# The shrink-on-reject path loses 1 API call + maybe 1s of Pillow work when
# it fires, which is cheaper than permanent quality loss.
def _guess_mime(path: Path) -> str:
mime, _ = mimetypes.guess_type(str(path))
if mime and mime.startswith("image/"):
return mime
# mimetypes on some Linux distros mis-maps .jpg; default to jpeg when
# the suffix looks imagey.
suffix = path.suffix.lower()
return {
".jpg": "image/jpeg",
".jpeg": "image/jpeg",
".png": "image/png",
".gif": "image/gif",
".webp": "image/webp",
".bmp": "image/bmp",
}.get(suffix, "image/jpeg")
def _file_to_data_url(path: Path) -> Optional[str]:
"""Encode a local image as a base64 data URL at its native size.
Size limits are NOT enforced here the agent retry loop
(``run_agent._try_shrink_image_parts_in_messages``) shrinks on the
provider's first rejection. Keeping this simple means providers that
accept large images (OpenAI 49 MB+, Gemini 100 MB) don't pay a silent
quality tax just because one other provider is stricter.
Returns None only if the file can't be read (missing, permission
denied, etc.); the caller reports those paths in ``skipped``.
"""
try:
raw = path.read_bytes()
except Exception as exc:
logger.warning("image_routing: failed to read %s%s", path, exc)
return None
mime = _guess_mime(path)
b64 = base64.b64encode(raw).decode("ascii")
return f"data:{mime};base64,{b64}"
def build_native_content_parts(
user_text: str,
image_paths: List[str],
) -> Tuple[List[Dict[str, Any]], List[str]]:
"""Build an OpenAI-style ``content`` list for a user turn.
Shape:
[{"type": "text", "text": "..."},
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
...]
Images are attached at their native size. If a provider rejects the
request because an image is too large (e.g. Anthropic's 5 MB per-image
ceiling), the agent's retry loop transparently shrinks and retries
once see ``run_agent._try_shrink_image_parts_in_messages``.
Returns (content_parts, skipped_paths). Skipped paths are files that
couldn't be read from disk.
"""
parts: List[Dict[str, Any]] = []
skipped: List[str] = []
text = (user_text or "").strip()
if text:
parts.append({"type": "text", "text": text})
for raw_path in image_paths:
p = Path(raw_path)
if not p.exists() or not p.is_file():
skipped.append(str(raw_path))
continue
data_url = _file_to_data_url(p)
if not data_url:
skipped.append(str(raw_path))
continue
parts.append({
"type": "image_url",
"image_url": {"url": data_url},
})
# If the text was empty, add a neutral prompt so the turn isn't just images.
if not text and any(p.get("type") == "image_url" for p in parts):
parts.insert(0, {"type": "text", "text": "What do you see in this image?"})
return parts, skipped
__all__ = [
"decide_image_input_mode",
"build_native_content_parts",
]
+156 -6
View File
@@ -31,6 +31,7 @@ from __future__ import annotations
import json
import logging
import re
import inspect
from typing import Any, Dict, List, Optional
from agent.memory_provider import MemoryProvider
@@ -62,15 +63,124 @@ def sanitize_context(text: str) -> str:
return text
def build_memory_context_block(raw_context: str) -> str:
"""Wrap prefetched memory in a fenced block with system note.
class StreamingContextScrubber:
"""Stateful scrubber for streaming text that may contain split memory-context spans.
The fence prevents the model from treating recalled context as user
discourse. Injected at API-call time only never persisted.
The one-shot ``sanitize_context`` regex cannot survive chunk boundaries:
a ``<memory-context>`` opened in one delta and closed in a later delta
leaks its payload to the UI because the non-greedy block regex needs
both tags in one string. This scrubber runs a small state machine
across deltas, holding back partial-tag tails and discarding
everything inside a span (including the system-note line).
Usage::
scrubber = StreamingContextScrubber()
for delta in stream:
visible = scrubber.feed(delta)
if visible:
emit(visible)
trailing = scrubber.flush() # at end of stream
if trailing:
emit(trailing)
The scrubber is re-entrant per agent instance. Callers building new
top-level responses (new turn) should create a fresh scrubber or call
``reset()``.
"""
_OPEN_TAG = "<memory-context>"
_CLOSE_TAG = "</memory-context>"
def __init__(self) -> None:
self._in_span: bool = False
self._buf: str = ""
def reset(self) -> None:
self._in_span = False
self._buf = ""
def feed(self, text: str) -> str:
"""Return the visible portion of ``text`` after scrubbing.
Any trailing fragment that could be the start of an open/close tag
is held back in the internal buffer and surfaced on the next
``feed()`` call or discarded/emitted by ``flush()``.
"""
if not text:
return ""
buf = self._buf + text
self._buf = ""
out: list[str] = []
while buf:
if self._in_span:
idx = buf.lower().find(self._CLOSE_TAG)
if idx == -1:
# Hold back a potential partial close tag; drop the rest
held = self._max_partial_suffix(buf, self._CLOSE_TAG)
self._buf = buf[-held:] if held else ""
return "".join(out)
# Found close — skip span content + tag, continue
buf = buf[idx + len(self._CLOSE_TAG):]
self._in_span = False
else:
idx = buf.lower().find(self._OPEN_TAG)
if idx == -1:
# No open tag — hold back a potential partial open tag
held = self._max_partial_suffix(buf, self._OPEN_TAG)
if held:
out.append(buf[:-held])
self._buf = buf[-held:]
else:
out.append(buf)
return "".join(out)
# Emit text before the tag, enter span
if idx > 0:
out.append(buf[:idx])
buf = buf[idx + len(self._OPEN_TAG):]
self._in_span = True
return "".join(out)
def flush(self) -> str:
"""Emit any held-back buffer at end-of-stream.
If we're still inside an unterminated span the remaining content is
discarded (safer: leaking partial memory context is worse than a
truncated answer). Otherwise the held-back partial-tag tail is
emitted verbatim (it turned out not to be a real tag).
"""
if self._in_span:
self._buf = ""
self._in_span = False
return ""
tail = self._buf
self._buf = ""
return tail
@staticmethod
def _max_partial_suffix(buf: str, tag: str) -> int:
"""Return the length of the longest buf-suffix that is a tag-prefix.
Case-insensitive. Returns 0 if no suffix could start the tag.
"""
tag_lower = tag.lower()
buf_lower = buf.lower()
max_check = min(len(buf_lower), len(tag_lower) - 1)
for i in range(max_check, 0, -1):
if tag_lower.startswith(buf_lower[-i:]):
return i
return 0
def build_memory_context_block(raw_context: str) -> str:
"""Wrap prefetched memory in a fenced block with system note."""
if not raw_context or not raw_context.strip():
return ""
clean = sanitize_context(raw_context)
if clean != raw_context:
logger.warning("memory provider returned pre-wrapped context; stripped")
return (
"<memory-context>\n"
"[System note: The following is recalled memory context, "
@@ -312,7 +422,39 @@ class MemoryManager:
)
return "\n\n".join(parts)
def on_memory_write(self, action: str, target: str, content: str) -> None:
@staticmethod
def _provider_memory_write_metadata_mode(provider: MemoryProvider) -> str:
"""Return how to pass metadata to a provider's memory-write hook."""
try:
signature = inspect.signature(provider.on_memory_write)
except (TypeError, ValueError):
return "keyword"
params = list(signature.parameters.values())
if any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params):
return "keyword"
if "metadata" in signature.parameters:
return "keyword"
accepted = [
p for p in params
if p.kind in (
inspect.Parameter.POSITIONAL_ONLY,
inspect.Parameter.POSITIONAL_OR_KEYWORD,
inspect.Parameter.KEYWORD_ONLY,
)
]
if len(accepted) >= 4:
return "positional"
return "legacy"
def on_memory_write(
self,
action: str,
target: str,
content: str,
metadata: Optional[Dict[str, Any]] = None,
) -> None:
"""Notify external providers when the built-in memory tool writes.
Skips the builtin provider itself (it's the source of the write).
@@ -321,7 +463,15 @@ class MemoryManager:
if provider.name == "builtin":
continue
try:
provider.on_memory_write(action, target, content)
metadata_mode = self._provider_memory_write_metadata_mode(provider)
if metadata_mode == "keyword":
provider.on_memory_write(
action, target, content, metadata=dict(metadata or {})
)
elif metadata_mode == "positional":
provider.on_memory_write(action, target, content, dict(metadata or {}))
else:
provider.on_memory_write(action, target, content)
except Exception as e:
logger.debug(
"Memory provider '%s' on_memory_write failed: %s",
+12 -3
View File
@@ -26,7 +26,7 @@ Optional hooks (override to opt in):
on_turn_start(turn, message, **kwargs) per-turn tick with runtime context
on_session_end(messages) end-of-session extraction
on_pre_compress(messages) -> str extract before context compression
on_memory_write(action, target, content) mirror built-in memory writes
on_memory_write(action, target, content, metadata=None) mirror built-in memory writes
on_delegation(task, result, **kwargs) parent-side observation of subagent work
"""
@@ -34,7 +34,7 @@ from __future__ import annotations
import logging
from abc import ABC, abstractmethod
from typing import Any, Dict, List
from typing import Any, Dict, List, Optional
logger = logging.getLogger(__name__)
@@ -220,12 +220,21 @@ class MemoryProvider(ABC):
should all have ``env_var`` set and this method stays no-op).
"""
def on_memory_write(self, action: str, target: str, content: str) -> None:
def on_memory_write(
self,
action: str,
target: str,
content: str,
metadata: Optional[Dict[str, Any]] = None,
) -> None:
"""Called when the built-in memory tool writes an entry.
action: 'add', 'replace', or 'remove'
target: 'memory' or 'user'
content: the entry content
metadata: structured provenance for the write, when available. Common
keys include ``write_origin``, ``execution_context``, ``session_id``,
``parent_session_id``, ``platform``, and ``tool_name``.
Use to mirror built-in memory writes to your backend.
"""
+168 -43
View File
@@ -6,6 +6,7 @@ and run_agent.py for pre-flight context checks.
import ipaddress
import logging
import os
import re
import time
from pathlib import Path
@@ -21,6 +22,25 @@ from hermes_constants import OPENROUTER_MODELS_URL
logger = logging.getLogger(__name__)
def _resolve_requests_verify() -> bool | str:
"""Resolve SSL verify setting for `requests` calls from env vars.
The `requests` library only honours REQUESTS_CA_BUNDLE / CURL_CA_BUNDLE
by default. Hermes also honours HERMES_CA_BUNDLE (its own convention)
and SSL_CERT_FILE (used by the stdlib `ssl` module and by httpx), so
that a single env var can cover both `requests` and `httpx` callsites
inside the same process.
Returns either a filesystem path to a CA bundle, or True to defer to
the requests default (certifi).
"""
for env_var in ("HERMES_CA_BUNDLE", "REQUESTS_CA_BUNDLE", "SSL_CERT_FILE"):
val = os.getenv(env_var)
if val and os.path.isfile(val):
return val
return True
# Provider names that can appear as a "provider:" prefix before a model ID.
# Only these are stripped — Ollama-style "model:tag" colons (e.g. "qwen3.5:27b")
# are preserved so the full model name reaches cache lookups and server queries.
@@ -31,6 +51,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
"qwen-oauth",
"xiaomi",
"arcee",
"gmi",
"custom", "local",
# Common aliases
"google", "google-gemini", "google-ai-studio",
@@ -40,6 +61,7 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
"stepfun", "opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
"mimo", "xiaomi-mimo",
"arcee-ai", "arceeai",
"gmi-cloud", "gmicloud",
"xai", "x-ai", "x.ai", "grok",
"nvidia", "nim", "nvidia-nim", "nemotron",
"qwen-portal",
@@ -86,9 +108,11 @@ _endpoint_model_metadata_cache_time: Dict[str, float] = {}
_ENDPOINT_MODEL_CACHE_TTL = 300
# Descending tiers for context length probing when the model is unknown.
# We start at 128K (a safe default for most modern models) and step down
# on context-length errors until one works.
# We start at 256K (covers GPT-5.x, many current large-context models) and
# step down on context-length errors until one works. Tier[0] is also the
# default fallback when no detection method succeeds.
CONTEXT_PROBE_TIERS = [
256_000,
128_000,
64_000,
32_000,
@@ -123,10 +147,11 @@ DEFAULT_CONTEXT_LENGTHS = {
"claude": 200000,
# OpenAI — GPT-5 family (most have 400k; specific overrides first)
# Source: https://developers.openai.com/api/docs/models
# GPT-5.5 (launched Apr 23 2026). 400k is the fallback for providers we
# can't probe live. ChatGPT Codex OAuth actually caps lower (272k as of
# Apr 2026) and is resolved via _resolve_codex_oauth_context_length().
"gpt-5.5": 400000,
# GPT-5.5 (launched Apr 23 2026) is 1.05M on the direct OpenAI API and
# ChatGPT Codex OAuth caps it at 272K; both paths resolve via their own
# provider-aware branches (_resolve_codex_oauth_context_length + models.dev).
# This hardcoded value is only reached when every probe misses.
"gpt-5.5": 1050000,
"gpt-5.4-nano": 400000, # 400k (not 1.05M like full 5.4)
"gpt-5.4-mini": 400000, # 400k (not 1.05M like full 5.4)
"gpt-5.4": 1050000, # GPT-5.4, GPT-5.4 Pro (1.05M context)
@@ -142,7 +167,17 @@ DEFAULT_CONTEXT_LENGTHS = {
"gemma-4-31b": 256000,
"gemma-3": 131072,
"gemma": 8192, # fallback for older gemma models
# DeepSeek
# DeepSeek — V4 family ships with a 1M context window. The legacy
# aliases ``deepseek-chat`` / ``deepseek-reasoner`` are server-side
# mapped to the non-thinking / thinking modes of ``deepseek-v4-flash``
# and inherit the same 1M window. The ``deepseek`` substring entry
# below remains as a 128K fallback for older / unknown DeepSeek model
# ids (e.g. via custom endpoints).
# https://api-docs.deepseek.com/zh-cn/quick_start/pricing
"deepseek-v4-pro": 1_000_000,
"deepseek-v4-flash": 1_000_000,
"deepseek-chat": 1_000_000,
"deepseek-reasoner": 1_000_000,
"deepseek": 128000,
# Meta
"llama": 131072,
@@ -274,6 +309,7 @@ _URL_TO_PROVIDER: Dict[str, str] = {
"integrate.api.nvidia.com": "nvidia",
"api.xiaomimimo.com": "xiaomi",
"xiaomimimo.com": "xiaomi",
"api.gmi-serving.com": "gmi",
"ollama.com": "ollama-cloud",
}
@@ -495,7 +531,7 @@ def fetch_model_metadata(force_refresh: bool = False) -> Dict[str, Dict[str, Any
return _model_metadata_cache
try:
response = requests.get(OPENROUTER_MODELS_URL, timeout=10)
response = requests.get(OPENROUTER_MODELS_URL, timeout=10, verify=_resolve_requests_verify())
response.raise_for_status()
data = response.json()
@@ -562,6 +598,7 @@ def fetch_endpoint_model_metadata(
server_url.rstrip("/") + "/api/v1/models",
headers=headers,
timeout=10,
verify=_resolve_requests_verify(),
)
response.raise_for_status()
payload = response.json()
@@ -610,7 +647,7 @@ def fetch_endpoint_model_metadata(
for candidate in candidates:
url = candidate.rstrip("/") + "/models"
try:
response = requests.get(url, headers=headers, timeout=10)
response = requests.get(url, headers=headers, timeout=10, verify=_resolve_requests_verify())
response.raise_for_status()
payload = response.json()
cache: Dict[str, Dict[str, Any]] = {}
@@ -641,9 +678,10 @@ def fetch_endpoint_model_metadata(
try:
# Try /v1/props first (current llama.cpp); fall back to /props for older builds
base = candidate.rstrip("/").replace("/v1", "")
props_resp = requests.get(base + "/v1/props", headers=headers, timeout=5)
_verify = _resolve_requests_verify()
props_resp = requests.get(base + "/v1/props", headers=headers, timeout=5, verify=_verify)
if not props_resp.ok:
props_resp = requests.get(base + "/props", headers=headers, timeout=5)
props_resp = requests.get(base + "/props", headers=headers, timeout=5, verify=_verify)
if props_resp.ok:
props = props_resp.json()
gen_settings = props.get("default_generation_settings", {})
@@ -667,6 +705,29 @@ def fetch_endpoint_model_metadata(
return {}
def _resolve_endpoint_context_length(
model: str,
base_url: str,
api_key: str = "",
) -> Optional[int]:
"""Resolve context length from an endpoint's live ``/models`` metadata."""
endpoint_metadata = fetch_endpoint_model_metadata(base_url, api_key=api_key)
matched = endpoint_metadata.get(model)
if not matched:
if len(endpoint_metadata) == 1:
matched = next(iter(endpoint_metadata.values()))
else:
for key, entry in endpoint_metadata.items():
if model in key or key in model:
matched = entry
break
if matched:
context_length = matched.get("context_length")
if isinstance(context_length, int):
return context_length
return None
def _get_context_cache_path() -> Path:
"""Return path to the persistent context length cache file."""
from hermes_constants import get_hermes_home
@@ -715,6 +776,22 @@ def get_cached_context_length(model: str, base_url: str) -> Optional[int]:
return cache.get(key)
def _invalidate_cached_context_length(model: str, base_url: str) -> None:
"""Drop a stale cache entry so it gets re-resolved on the next lookup."""
key = f"{model}@{base_url}"
cache = _load_context_cache()
if key not in cache:
return
del cache[key]
path = _get_context_cache_path()
try:
path.parent.mkdir(parents=True, exist_ok=True)
with open(path, "w") as f:
yaml.dump({"context_lengths": cache}, f, default_flow_style=False)
except Exception as e:
logger.debug("Failed to invalidate context length cache entry %s: %s", key, e)
def get_next_probe_tier(current_length: int) -> Optional[int]:
"""Return the next lower probe tier, or None if already at minimum."""
for tier in CONTEXT_PROBE_TIERS:
@@ -992,7 +1069,7 @@ def _query_anthropic_context_length(model: str, base_url: str, api_key: str) ->
"x-api-key": api_key,
"anthropic-version": "2023-06-01",
}
resp = requests.get(url, headers=headers, timeout=10)
resp = requests.get(url, headers=headers, timeout=10, verify=_resolve_requests_verify())
if resp.status_code != 200:
return None
data = resp.json()
@@ -1054,6 +1131,7 @@ def _fetch_codex_oauth_context_lengths(access_token: str) -> Dict[str, int]:
"https://chatgpt.com/backend-api/codex/models?client_version=1.0.0",
headers={"Authorization": f"Bearer {access_token}"},
timeout=10,
verify=_resolve_requests_verify(),
)
if resp.status_code != 200:
logger.debug(
@@ -1154,12 +1232,14 @@ def get_model_context_length(
api_key: str = "",
config_context_length: int | None = None,
provider: str = "",
custom_providers: list | None = None,
) -> int:
"""Get the context length for a model.
Resolution order:
0. Explicit config override (model.context_length or custom_providers per-model)
1. Persistent cache (previously discovered via probing)
1b. AWS Bedrock static table (must precede custom-endpoint probe)
2. Active endpoint metadata (/models for explicit custom endpoints)
3. Local server query (for local endpoints)
4. Anthropic /v1/models API (API-key users only, not OAuth)
@@ -1173,6 +1253,23 @@ def get_model_context_length(
if config_context_length is not None and isinstance(config_context_length, int) and config_context_length > 0:
return config_context_length
# 0b. custom_providers per-model override — check before any probe.
# This closes the gap where /model switch and display paths used to fall
# back to 128K despite the user having a per-model context_length set.
# See #15779.
if custom_providers and base_url and model:
try:
from hermes_cli.config import get_custom_provider_context_length
cp_ctx = get_custom_provider_context_length(
model=model,
base_url=base_url,
custom_providers=custom_providers,
)
if cp_ctx:
return cp_ctx
except Exception:
pass # fall through to probing
# Normalise provider-prefixed model names (e.g. "local:model-name" →
# "model-name") so cache lookups and server queries use the bare ID that
# local servers actually know about. Ollama "model:tag" colons are preserved.
@@ -1182,7 +1279,41 @@ def get_model_context_length(
if base_url:
cached = get_cached_context_length(model, base_url)
if cached is not None:
return cached
# Invalidate stale Codex OAuth cache entries: pre-PR #14935 builds
# resolved gpt-5.x to the direct-API value (e.g. 1.05M) via
# models.dev and persisted it. Codex OAuth caps at 272K for every
# slug, so any cached Codex entry at or above 400K is a leftover
# from the old resolution path. Drop it and fall through to the
# live /models probe in step 5 below.
if provider == "openai-codex" and cached >= 400_000:
logger.info(
"Dropping stale Codex cache entry %s@%s -> %s (pre-fix value); "
"re-resolving via live /models probe",
model, base_url, f"{cached:,}",
)
_invalidate_cached_context_length(model, base_url)
else:
return cached
# 1b. AWS Bedrock — use static context length table.
# Bedrock's ListFoundationModels API doesn't expose context window sizes,
# so we maintain a curated table in bedrock_adapter.py that reflects
# AWS-imposed limits (e.g. 200K for Claude models vs 1M on the native
# Anthropic API). This must run BEFORE the custom-endpoint probe at
# step 2 — bedrock-runtime.<region>.amazonaws.com is not in
# _URL_TO_PROVIDER, so it would otherwise be treated as a custom endpoint,
# fail the /models probe (Bedrock doesn't expose that shape), and fall
# back to the 128K default before reaching the original step 4b branch.
if provider == "bedrock" or (
base_url
and base_url_hostname(base_url).startswith("bedrock-runtime.")
and base_url_host_matches(base_url, "amazonaws.com")
):
try:
from agent.bedrock_adapter import get_bedrock_context_length
return get_bedrock_context_length(model)
except ImportError:
pass # boto3 not installed — fall through to generic resolution
# 2. Active endpoint metadata for truly custom/unknown endpoints.
# Known providers (Copilot, OpenAI, Anthropic, etc.) skip this — their
@@ -1190,22 +1321,9 @@ def get_model_context_length(
# returns 128k) instead of the model's full context (400k). models.dev
# has the correct per-provider values and is checked at step 5+.
if _is_custom_endpoint(base_url) and not _is_known_provider_base_url(base_url):
endpoint_metadata = fetch_endpoint_model_metadata(base_url, api_key=api_key)
matched = endpoint_metadata.get(model)
if not matched:
# Single-model servers: if only one model is loaded, use it
if len(endpoint_metadata) == 1:
matched = next(iter(endpoint_metadata.values()))
else:
# Fuzzy match: substring in either direction
for key, entry in endpoint_metadata.items():
if model in key or key in model:
matched = entry
break
if matched:
context_length = matched.get("context_length")
if isinstance(context_length, int):
return context_length
context_length = _resolve_endpoint_context_length(model, base_url, api_key=api_key)
if context_length is not None:
return context_length
if not _is_known_provider_base_url(base_url):
# 3. Try querying local server directly
if is_local_endpoint(base_url):
@@ -1229,19 +1347,7 @@ def get_model_context_length(
if ctx:
return ctx
# 4b. AWS Bedrock — use static context length table.
# Bedrock's ListFoundationModels doesn't expose context window sizes,
# so we maintain a curated table in bedrock_adapter.py.
if provider == "bedrock" or (
base_url
and base_url_hostname(base_url).startswith("bedrock-runtime.")
and base_url_host_matches(base_url, "amazonaws.com")
):
try:
from agent.bedrock_adapter import get_bedrock_context_length
return get_bedrock_context_length(model)
except ImportError:
pass # boto3 not installed — fall through to generic resolution
# 4b. (Bedrock handled earlier at step 1b — before custom-endpoint probe.)
# 5. Provider-aware lookups (before generic OpenRouter cache)
# These are provider-specific and take priority over the generic OR cache,
@@ -1255,6 +1361,19 @@ def get_model_context_length(
if inferred:
effective_provider = inferred
# 5a. Copilot live /models API — max_prompt_tokens from the user's account.
# This catches account-specific models (e.g. claude-opus-4.6-1m) that
# don't exist in models.dev. For models that ARE in models.dev, this
# returns the provider-enforced limit which is what users can actually use.
if effective_provider in ("copilot", "copilot-acp", "github-copilot"):
try:
from hermes_cli.models import get_copilot_model_context
ctx = get_copilot_model_context(model, api_key=api_key)
if ctx:
return ctx
except Exception:
pass # Fall through to models.dev
if effective_provider == "nous":
ctx = _resolve_nous_context_length(model)
if ctx:
@@ -1268,6 +1387,12 @@ def get_model_context_length(
if base_url:
save_context_length(model, base_url, codex_ctx)
return codex_ctx
if effective_provider == "gmi" and base_url:
# GMI exposes authoritative context_length via /models, but it is not
# in models.dev yet. Preserve that higher-fidelity endpoint lookup.
ctx = _resolve_endpoint_context_length(model, base_url, api_key=api_key)
if ctx is not None:
return ctx
if effective_provider:
from agent.models_dev import lookup_models_dev_context
ctx = lookup_models_dev_context(effective_provider, model)
@@ -1277,7 +1402,7 @@ def get_model_context_length(
# 6. OpenRouter live API metadata (provider-unaware fallback)
metadata = fetch_model_metadata()
if model in metadata:
return metadata[model].get("context_length", 128000)
return metadata[model].get("context_length", DEFAULT_FALLBACK_CONTEXT)
# 8. Hardcoded defaults (fuzzy match — longest key first for specificity)
# Only check `default_model in model` (is the key a substring of the input).
+142
View File
@@ -180,3 +180,145 @@ def format_remaining(seconds: float) -> str:
h, remainder = divmod(s, 3600)
m = remainder // 60
return f"{h}h {m}m" if m else f"{h}h"
# Buckets with reset windows shorter than this are treated as transient
# (upstream jitter, secondary throttling) rather than a genuine quota
# exhaustion worth a cross-session breaker trip.
_MIN_RESET_FOR_BREAKER_SECONDS = 60.0
def is_genuine_nous_rate_limit(
*,
headers: Optional[Mapping[str, str]] = None,
last_known_state: Optional[Any] = None,
) -> bool:
"""Decide whether a 429 from Nous Portal is a real account rate limit.
Nous Portal multiplexes multiple upstream providers (DeepSeek, Kimi,
MiMo, Hermes, ...) behind one endpoint. A 429 can mean either:
(a) The caller's own RPM / RPH / TPM / TPH bucket on Nous is
exhausted a genuine rate limit that will last until the
bucket resets.
(b) The upstream provider is out of capacity for a specific model
transient, clears in seconds, and has nothing to do with
the caller's quota on Nous.
Tripping the cross-session breaker on (b) blocks ALL Nous requests
(and all models, since Nous is one provider key) for minutes even
though the caller's account is healthy and a different model would
have worked. That's the bug users hit when DeepSeek V4 Pro 429s
trigger a breaker that then blocks Kimi 2.6 and MiMo V2.5 Pro.
We tell the two apart by looking at:
1. The 429 response's own ``x-ratelimit-*`` headers. Nous emits
the full suite on every response including 429s. An exhausted
bucket (``remaining == 0`` with a reset window >= 60s) is
proof of (a).
2. The last-known-good rate-limit state captured by
``_capture_rate_limits()`` on the previous successful
response. If any bucket there was already near-exhausted with
a substantial reset window, the current 429 is almost
certainly (a) continuing from that condition.
If neither signal fires, we treat the 429 as (b): fail the single
request, let the retry loop or model-switch proceed, and do NOT
write the cross-session breaker file.
Returns True when the evidence points at (a).
"""
# Signal 1: current 429 response headers.
state = _parse_buckets_from_headers(headers)
if _has_exhausted_bucket(state):
return True
# Signal 2: last-known-good state from a recent successful response.
# Accepts either a RateLimitState (dataclass from rate_limit_tracker)
# or a dict of bucket snapshots.
if last_known_state is not None and _has_exhausted_bucket_in_object(last_known_state):
return True
return False
def _parse_buckets_from_headers(
headers: Optional[Mapping[str, str]],
) -> dict[str, tuple[Optional[int], Optional[float]]]:
"""Extract (remaining, reset_seconds) per bucket from x-ratelimit-* headers.
Returns empty dict when no rate-limit headers are present.
"""
if not headers:
return {}
lowered = {k.lower(): v for k, v in headers.items()}
if not any(k.startswith("x-ratelimit-") for k in lowered):
return {}
def _maybe_int(raw: Optional[str]) -> Optional[int]:
if raw is None:
return None
try:
return int(float(raw))
except (TypeError, ValueError):
return None
def _maybe_float(raw: Optional[str]) -> Optional[float]:
if raw is None:
return None
try:
return float(raw)
except (TypeError, ValueError):
return None
result: dict[str, tuple[Optional[int], Optional[float]]] = {}
for tag in ("requests", "requests-1h", "tokens", "tokens-1h"):
remaining = _maybe_int(lowered.get(f"x-ratelimit-remaining-{tag}"))
reset = _maybe_float(lowered.get(f"x-ratelimit-reset-{tag}"))
if remaining is not None or reset is not None:
result[tag] = (remaining, reset)
return result
def _has_exhausted_bucket(
buckets: Mapping[str, tuple[Optional[int], Optional[float]]],
) -> bool:
"""Return True when any bucket has remaining == 0 AND a meaningful reset window."""
for remaining, reset in buckets.values():
if remaining is None or remaining > 0:
continue
if reset is None:
continue
if reset >= _MIN_RESET_FOR_BREAKER_SECONDS:
return True
return False
def _has_exhausted_bucket_in_object(state: Any) -> bool:
"""Check a RateLimitState-like object for an exhausted bucket.
Accepts the dataclass from ``agent.rate_limit_tracker`` (buckets
exposed as attributes ``requests_min``, ``requests_hour``,
``tokens_min``, ``tokens_hour``) and falls back gracefully for any
object missing those attributes.
"""
for attr in ("requests_min", "requests_hour", "tokens_min", "tokens_hour"):
bucket = getattr(state, attr, None)
if bucket is None:
continue
limit = getattr(bucket, "limit", 0) or 0
remaining = getattr(bucket, "remaining", 0) or 0
# Prefer the adjusted "remaining_seconds_now" property when present;
# fall back to raw reset_seconds.
reset = getattr(bucket, "remaining_seconds_now", None)
if reset is None:
reset = getattr(bucket, "reset_seconds", 0.0) or 0.0
if limit <= 0:
continue
if remaining > 0:
continue
if reset >= _MIN_RESET_FOR_BREAKER_SECONDS:
return True
return False
+191
View File
@@ -0,0 +1,191 @@
"""
Contextual first-touch onboarding hints.
Instead of blocking first-run questionnaires, show a one-time hint the *first*
time a user hits a behavior fork message-while-running, first long-running
tool, etc. Each hint is shown once per install (tracked in ``config.yaml`` under
``onboarding.seen.<flag>``) and then never again.
Keep this module tiny and dependency-free so both the CLI and gateway can import
it without pulling in heavy modules.
"""
from __future__ import annotations
import logging
from pathlib import Path
from typing import Any, Mapping, Optional
logger = logging.getLogger(__name__)
# -------------------------------------------------------------------------
# Flag names (stable — used as config.yaml keys under onboarding.seen)
# -------------------------------------------------------------------------
BUSY_INPUT_FLAG = "busy_input_prompt"
TOOL_PROGRESS_FLAG = "tool_progress_prompt"
OPENCLAW_RESIDUE_FLAG = "openclaw_residue_cleanup"
# -------------------------------------------------------------------------
# Hint content
# -------------------------------------------------------------------------
def busy_input_hint_gateway(mode: str) -> str:
"""Hint shown the first time a user messages while the agent is busy.
``mode`` is the effective busy_input_mode that was just applied, so the
message matches reality ("I just interrupted…" vs "I just queued…").
"""
if mode == "queue":
return (
"💡 First-time tip — I queued your message instead of interrupting. "
"Send `/busy interrupt` to make new messages stop the current task "
"immediately, or `/busy status` to check. This notice won't appear again."
)
if mode == "steer":
return (
"💡 First-time tip — I steered your message into the current run; "
"it will arrive after the next tool call instead of interrupting. "
"Send `/busy interrupt` or `/busy queue` to change this, or "
"`/busy status` to check. This notice won't appear again."
)
return (
"💡 First-time tip — I just interrupted my current task to answer you. "
"Send `/busy queue` to queue follow-ups for after the current task instead, "
"`/busy steer` to inject them mid-run without interrupting, or "
"`/busy status` to check. This notice won't appear again."
)
def busy_input_hint_cli(mode: str) -> str:
"""CLI version of the busy-input hint (plain text, no markdown)."""
if mode == "queue":
return (
"(tip) Your message was queued for the next turn. "
"Use /busy interrupt to make Enter stop the current run instead, "
"or /busy steer to inject mid-run. This tip only shows once."
)
if mode == "steer":
return (
"(tip) Your message was steered into the current run; it arrives "
"after the next tool call. Use /busy interrupt or /busy queue to "
"change this. This tip only shows once."
)
return (
"(tip) Your message interrupted the current run. "
"Use /busy queue to queue messages for the next turn instead, "
"or /busy steer to inject mid-run. This tip only shows once."
)
def tool_progress_hint_gateway() -> str:
return (
"💡 First-time tip — that tool took a while and I'm streaming every step. "
"If the progress messages feel noisy, send `/verbose` to cycle modes "
"(all → new → off). This notice won't appear again."
)
def tool_progress_hint_cli() -> str:
return (
"(tip) That tool ran for a while. Use /verbose to cycle tool-progress "
"display modes (all -> new -> off -> verbose). This tip only shows once."
)
def openclaw_residue_hint_cli() -> str:
"""Banner shown the first time Hermes starts and finds ``~/.openclaw/``.
OpenClaw-era config, memory, and skill paths in ``~/.openclaw/`` will
otherwise attract the agent (memory entries like ``~/.openclaw/config.yaml``
get carried forward and the agent dutifully reads them). ``hermes claw
cleanup`` renames the directory so the agent stops finding it.
"""
return (
"Heads up — an OpenClaw workspace was detected at ~/.openclaw/.\n"
"After migrating, the agent can still get confused and read that "
"directory's config/memory instead of Hermes's.\n"
"Run `hermes claw cleanup` to archive it (rename → .openclaw.pre-migration). "
"This tip only shows once; rerun it any time with `hermes claw cleanup`."
)
def detect_openclaw_residue(home: Optional[Path] = None) -> bool:
"""Return True if an OpenClaw workspace directory is present in ``$HOME``.
Pure filesystem check no side effects. ``home`` override exists for tests.
"""
base = home or Path.home()
try:
return (base / ".openclaw").is_dir()
except OSError:
return False
# -------------------------------------------------------------------------
# State read / write
# -------------------------------------------------------------------------
def _get_seen_dict(config: Mapping[str, Any]) -> Mapping[str, Any]:
onboarding = config.get("onboarding") if isinstance(config, Mapping) else None
if not isinstance(onboarding, Mapping):
return {}
seen = onboarding.get("seen")
return seen if isinstance(seen, Mapping) else {}
def is_seen(config: Mapping[str, Any], flag: str) -> bool:
"""Return True if the user has already been shown this first-touch hint."""
return bool(_get_seen_dict(config).get(flag))
def mark_seen(config_path: Path, flag: str) -> bool:
"""Persist ``onboarding.seen.<flag> = True`` to ``config_path``.
Uses the atomic YAML writer so a concurrent process can't observe a
partially-written file. Returns True on success, False on any error
(including the config file being absent onboarding is best-effort).
"""
try:
import yaml
from utils import atomic_yaml_write
except Exception as e: # pragma: no cover — dependency issue
logger.debug("onboarding: failed to import yaml/utils: %s", e)
return False
try:
cfg: dict = {}
if config_path.exists():
with open(config_path, encoding="utf-8") as f:
cfg = yaml.safe_load(f) or {}
if not isinstance(cfg.get("onboarding"), dict):
cfg["onboarding"] = {}
seen = cfg["onboarding"].get("seen")
if not isinstance(seen, dict):
seen = {}
cfg["onboarding"]["seen"] = seen
if seen.get(flag) is True:
return True # already marked — nothing to do
seen[flag] = True
atomic_yaml_write(config_path, cfg)
return True
except Exception as e:
logger.debug("onboarding: failed to mark flag %s: %s", flag, e)
return False
__all__ = [
"BUSY_INPUT_FLAG",
"TOOL_PROGRESS_FLAG",
"OPENCLAW_RESIDUE_FLAG",
"busy_input_hint_gateway",
"busy_input_hint_cli",
"tool_progress_hint_gateway",
"tool_progress_hint_cli",
"openclaw_residue_hint_cli",
"detect_openclaw_residue",
"is_seen",
"mark_seen",
]
+34
View File
@@ -141,6 +141,12 @@ DEFAULT_AGENT_IDENTITY = (
"Be targeted and efficient in your exploration and investigations."
)
HERMES_AGENT_HELP_GUIDANCE = (
"If the user asks about configuring, setting up, or using Hermes Agent "
"itself, load the `hermes-agent` skill with skill_view(name='hermes-agent') "
"before answering. Docs: https://hermes-agent.nousresearch.com/docs"
)
MEMORY_GUIDANCE = (
"You have persistent memory across sessions. Save durable facts using the memory "
"tool: user preferences, environment details, tool quirks, and stable conventions. "
@@ -422,6 +428,29 @@ PLATFORM_HINTS = {
"your response. Images are sent as native photos, and other files arrive as downloadable "
"documents."
),
"yuanbao": (
"You are on Yuanbao (腾讯元宝), a Chinese AI assistant platform. "
"Markdown formatting is supported (code blocks, tables, bold/italic). "
"You CAN send media files natively — to deliver a file to the user, include "
"MEDIA:/absolute/path/to/file in your response. The file will be sent as a native "
"Yuanbao attachment: images (.jpg, .png, .webp, .gif) are sent as photos, "
"and other files (.pdf, .docx, .txt, .zip, etc.) arrive as downloadable documents "
"(max 50 MB). You can also include image URLs in markdown format ![alt](url) and "
"they will be downloaded and sent as native photos. "
"Do NOT tell the user you lack file-sending capability — use MEDIA: syntax "
"whenever a file delivery is appropriate.\n\n"
"Stickers (贴纸 / 表情包 / TIM face): Yuanbao has a built-in sticker catalogue. "
"When the user sends a sticker (you see '[emoji: 名称]' in their message) or asks "
"you to send/reply-with a 贴纸/表情/表情包, you MUST use the sticker tools:\n"
" 1. Call yb_search_sticker with a Chinese keyword (e.g. '666', '比心', '吃瓜', "
" '捂脸', '合十') to discover matching sticker_ids.\n"
" 2. Call yb_send_sticker with the chosen sticker_id or name — this sends a real "
" TIMFaceElem that renders as a native sticker in the chat.\n"
"DO NOT draw sticker-like PNGs with execute_code/Pillow/matplotlib and then send "
"them via MEDIA: or send_image_file. That produces a fake low-quality 'sticker' "
"image and is the WRONG path. Bare Unicode emoji in text is also not a substitute "
"— when a sticker is the right response, use yb_send_sticker."
),
}
# ---------------------------------------------------------------------------
@@ -825,6 +854,11 @@ def build_skills_system_prompt(
"Skills also encode the user's preferred approach, conventions, and quality standards "
"for tasks like code review, planning, and testing — load them even for tasks you "
"already know how to do, because the skill defines how it should be done here.\n"
"Whenever the user asks you to configure, set up, install, enable, disable, modify, "
"or troubleshoot Hermes Agent itself — its CLI, config, models, providers, tools, "
"skills, voice, gateway, plugins, or any feature — load the `hermes-agent` skill "
"first. It has the actual commands (e.g. `hermes config set …`, `hermes tools`, "
"`hermes setup`) so you don't have to guess or invent workarounds.\n"
"If a skill has issues, fix it with skill_manage(action='patch').\n"
"After difficult/iterative tasks, offer to save as a skill. "
"If a skill you loaded was missing steps, had wrong commands, or needed "
+5 -1
View File
@@ -754,7 +754,11 @@ def _resolve_effective_accept(
if env in ("1", "true", "yes", "on"):
return True
cfg_val = cfg.get("hooks_auto_accept", False)
return bool(cfg_val)
if isinstance(cfg_val, bool):
return cfg_val
if isinstance(cfg_val, str):
return cfg_val.strip().lower() in ("1", "true", "yes", "on")
return False
# ---------------------------------------------------------------------------
+12 -135
View File
@@ -1,154 +1,29 @@
"""Shared slash command helpers for skills and built-in prompt-style modes.
"""Shared slash command helpers for skills.
Shared between CLI (cli.py) and gateway (gateway/run.py) so both surfaces
can invoke skills via /skill-name commands and prompt-only built-ins like
/plan.
can invoke skills via /skill-name commands.
"""
import json
import logging
import re
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Optional
from hermes_constants import display_hermes_home
from agent.skill_preprocessing import (
expand_inline_shell as _expand_inline_shell,
load_skills_config as _load_skills_config,
substitute_template_vars as _substitute_template_vars,
)
logger = logging.getLogger(__name__)
_skill_commands: Dict[str, Dict[str, Any]] = {}
_PLAN_SLUG_RE = re.compile(r"[^a-z0-9]+")
# Patterns for sanitizing skill names into clean hyphen-separated slugs.
_SKILL_INVALID_CHARS = re.compile(r"[^a-z0-9-]")
_SKILL_MULTI_HYPHEN = re.compile(r"-{2,}")
# Matches ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in SKILL.md.
# Tokens that don't resolve (e.g. ${HERMES_SESSION_ID} with no session) are
# left as-is so the user can debug them.
_SKILL_TEMPLATE_RE = re.compile(r"\$\{(HERMES_SKILL_DIR|HERMES_SESSION_ID)\}")
# Matches inline shell snippets like: !`date +%Y-%m-%d`
# Non-greedy, single-line only — no newlines inside the backticks.
_INLINE_SHELL_RE = re.compile(r"!`([^`\n]+)`")
# Cap inline-shell output so a runaway command can't blow out the context.
_INLINE_SHELL_MAX_OUTPUT = 4000
def _load_skills_config() -> dict:
"""Load the ``skills`` section of config.yaml (best-effort)."""
try:
from hermes_cli.config import load_config
cfg = load_config() or {}
skills_cfg = cfg.get("skills")
if isinstance(skills_cfg, dict):
return skills_cfg
except Exception:
logger.debug("Could not read skills config", exc_info=True)
return {}
def _substitute_template_vars(
content: str,
skill_dir: Path | None,
session_id: str | None,
) -> str:
"""Replace ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} in skill content.
Only substitutes tokens for which a concrete value is available
unresolved tokens are left in place so the author can spot them.
"""
if not content:
return content
skill_dir_str = str(skill_dir) if skill_dir else None
def _replace(match: re.Match) -> str:
token = match.group(1)
if token == "HERMES_SKILL_DIR" and skill_dir_str:
return skill_dir_str
if token == "HERMES_SESSION_ID" and session_id:
return str(session_id)
return match.group(0)
return _SKILL_TEMPLATE_RE.sub(_replace, content)
def _run_inline_shell(command: str, cwd: Path | None, timeout: int) -> str:
"""Execute a single inline-shell snippet and return its stdout (trimmed).
Failures return a short ``[inline-shell error: ...]`` marker instead of
raising, so one bad snippet can't wreck the whole skill message.
"""
try:
completed = subprocess.run(
["bash", "-c", command],
cwd=str(cwd) if cwd else None,
capture_output=True,
text=True,
timeout=max(1, int(timeout)),
check=False,
)
except subprocess.TimeoutExpired:
return f"[inline-shell timeout after {timeout}s: {command}]"
except FileNotFoundError:
return f"[inline-shell error: bash not found]"
except Exception as exc:
return f"[inline-shell error: {exc}]"
output = (completed.stdout or "").rstrip("\n")
if not output and completed.stderr:
output = completed.stderr.rstrip("\n")
if len(output) > _INLINE_SHELL_MAX_OUTPUT:
output = output[:_INLINE_SHELL_MAX_OUTPUT] + "…[truncated]"
return output
def _expand_inline_shell(
content: str,
skill_dir: Path | None,
timeout: int,
) -> str:
"""Replace every !`cmd` snippet in ``content`` with its stdout.
Runs each snippet with the skill directory as CWD so relative paths in
the snippet work the way the author expects.
"""
if "!`" not in content:
return content
def _replace(match: re.Match) -> str:
cmd = match.group(1).strip()
if not cmd:
return ""
return _run_inline_shell(cmd, skill_dir, timeout)
return _INLINE_SHELL_RE.sub(_replace, content)
def build_plan_path(
user_instruction: str = "",
*,
now: datetime | None = None,
) -> Path:
"""Return the default workspace-relative markdown path for a /plan invocation.
Relative paths are intentional: file tools are task/backend-aware and resolve
them against the active working directory for local, docker, ssh, modal,
daytona, and similar terminal backends. That keeps the plan with the active
workspace instead of the Hermes host's global home directory.
"""
slug_source = (user_instruction or "").strip().splitlines()[0] if user_instruction else ""
slug = _PLAN_SLUG_RE.sub("-", slug_source.lower()).strip("-")
if slug:
slug = "-".join(part for part in slug.split("-")[:8] if part)[:48].strip("-")
slug = slug or "conversation-plan"
timestamp = (now or datetime.now()).strftime("%Y-%m-%d_%H%M%S")
return Path(".hermes") / "plans" / f"{timestamp}-{slug}.md"
def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tuple[dict[str, Any], Path | None, str] | None:
"""Load a skill by name/path and return (loaded_payload, skill_dir, display_name)."""
raw_identifier = (skill_identifier or "").strip()
@@ -167,7 +42,9 @@ def _load_skill_payload(skill_identifier: str, task_id: str | None = None) -> tu
else:
normalized = raw_identifier.lstrip("/")
loaded_skill = json.loads(skill_view(normalized, task_id=task_id))
loaded_skill = json.loads(
skill_view(normalized, task_id=task_id, preprocess=False)
)
except Exception:
return None
@@ -452,7 +329,7 @@ def build_skill_invocation_message(
loaded_skill, skill_dir, skill_name = loaded
activation_note = (
f'[SYSTEM: The user has invoked the "{skill_name}" skill, indicating they want '
f'[IMPORTANT: The user has invoked the "{skill_name}" skill, indicating they want '
"you to follow its instructions. The full skill content is loaded below.]"
)
return _build_skill_message(
@@ -491,7 +368,7 @@ def build_preloaded_skills_prompt(
loaded_skill, skill_dir, skill_name = loaded
activation_note = (
f'[SYSTEM: The user launched this CLI session with the "{skill_name}" skill '
f'[IMPORTANT: The user launched this CLI session with the "{skill_name}" skill '
"preloaded. Treat its instructions as active guidance for the duration of this "
"session unless the user overrides them.]"
)
+131
View File
@@ -0,0 +1,131 @@
"""Shared SKILL.md preprocessing helpers."""
import logging
import re
import subprocess
from pathlib import Path
logger = logging.getLogger(__name__)
# Matches ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in SKILL.md.
# Tokens that don't resolve (e.g. ${HERMES_SESSION_ID} with no session) are
# left as-is so the user can debug them.
_SKILL_TEMPLATE_RE = re.compile(r"\$\{(HERMES_SKILL_DIR|HERMES_SESSION_ID)\}")
# Matches inline shell snippets like: !`date +%Y-%m-%d`
# Non-greedy, single-line only -- no newlines inside the backticks.
_INLINE_SHELL_RE = re.compile(r"!`([^`\n]+)`")
# Cap inline-shell output so a runaway command can't blow out the context.
_INLINE_SHELL_MAX_OUTPUT = 4000
def load_skills_config() -> dict:
"""Load the ``skills`` section of config.yaml (best-effort)."""
try:
from hermes_cli.config import load_config
cfg = load_config() or {}
skills_cfg = cfg.get("skills")
if isinstance(skills_cfg, dict):
return skills_cfg
except Exception:
logger.debug("Could not read skills config", exc_info=True)
return {}
def substitute_template_vars(
content: str,
skill_dir: Path | None,
session_id: str | None,
) -> str:
"""Replace ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} in skill content.
Only substitutes tokens for which a concrete value is available --
unresolved tokens are left in place so the author can spot them.
"""
if not content:
return content
skill_dir_str = str(skill_dir) if skill_dir else None
def _replace(match: re.Match) -> str:
token = match.group(1)
if token == "HERMES_SKILL_DIR" and skill_dir_str:
return skill_dir_str
if token == "HERMES_SESSION_ID" and session_id:
return str(session_id)
return match.group(0)
return _SKILL_TEMPLATE_RE.sub(_replace, content)
def run_inline_shell(command: str, cwd: Path | None, timeout: int) -> str:
"""Execute a single inline-shell snippet and return its stdout (trimmed).
Failures return a short ``[inline-shell error: ...]`` marker instead of
raising, so one bad snippet can't wreck the whole skill message.
"""
try:
completed = subprocess.run(
["bash", "-c", command],
cwd=str(cwd) if cwd else None,
capture_output=True,
text=True,
timeout=max(1, int(timeout)),
check=False,
)
except subprocess.TimeoutExpired:
return f"[inline-shell timeout after {timeout}s: {command}]"
except FileNotFoundError:
return "[inline-shell error: bash not found]"
except Exception as exc:
return f"[inline-shell error: {exc}]"
output = (completed.stdout or "").rstrip("\n")
if not output and completed.stderr:
output = completed.stderr.rstrip("\n")
if len(output) > _INLINE_SHELL_MAX_OUTPUT:
output = output[:_INLINE_SHELL_MAX_OUTPUT] + "...[truncated]"
return output
def expand_inline_shell(
content: str,
skill_dir: Path | None,
timeout: int,
) -> str:
"""Replace every !`cmd` snippet in ``content`` with its stdout.
Runs each snippet with the skill directory as CWD so relative paths in
the snippet work the way the author expects.
"""
if "!`" not in content:
return content
def _replace(match: re.Match) -> str:
cmd = match.group(1).strip()
if not cmd:
return ""
return run_inline_shell(cmd, skill_dir, timeout)
return _INLINE_SHELL_RE.sub(_replace, content)
def preprocess_skill_content(
content: str,
skill_dir: Path | None,
session_id: str | None = None,
skills_cfg: dict | None = None,
) -> str:
"""Apply configured SKILL.md template and inline-shell preprocessing."""
if not content:
return content
cfg = skills_cfg if isinstance(skills_cfg, dict) else load_skills_config()
if cfg.get("template_vars", True):
content = substitute_template_vars(content, skill_dir, session_id)
if cfg.get("inline_shell", False):
timeout = int(cfg.get("inline_shell_timeout", 10) or 10)
content = expand_inline_shell(content, skill_dir, timeout)
return content
+33 -4
View File
@@ -6,12 +6,18 @@ adds latency to the user-facing reply.
import logging
import threading
from typing import Optional
from typing import Callable, Optional
from agent.auxiliary_client import call_llm
logger = logging.getLogger(__name__)
# Callback signature: (task_name, exception) -> None. Used to surface
# auxiliary failures to the user through AIAgent._emit_auxiliary_failure
# so silent-drops (e.g. OpenRouter 402 exhausting the fallback chain)
# become visible instead of piling up as NULL session titles.
FailureCallback = Callable[[str, BaseException], None]
_TITLE_PROMPT = (
"Generate a short, descriptive title (3-7 words) for a conversation that starts with the "
"following exchange. The title should capture the main topic or intent. "
@@ -19,11 +25,21 @@ _TITLE_PROMPT = (
)
def generate_title(user_message: str, assistant_response: str, timeout: float = 30.0) -> Optional[str]:
def generate_title(
user_message: str,
assistant_response: str,
timeout: float = 30.0,
failure_callback: Optional[FailureCallback] = None,
) -> Optional[str]:
"""Generate a session title from the first exchange.
Uses the auxiliary LLM client (cheapest/fastest available model).
Returns the title string or None on failure.
``failure_callback`` is invoked with ``(task, exception)`` when the
auxiliary call raises the caller typically wires this to
``AIAgent._emit_auxiliary_failure`` so the user sees a warning instead
of silently accumulating untitled sessions.
"""
# Truncate long messages to keep the request small
user_snippet = user_message[:500] if user_message else ""
@@ -52,7 +68,15 @@ def generate_title(user_message: str, assistant_response: str, timeout: float =
title = title[:77] + "..."
return title if title else None
except Exception as e:
logger.debug("Title generation failed: %s", e)
# Log at WARNING so this shows up in agent.log without debug mode.
# Full detail at debug level for operators who need the stack.
logger.warning("Title generation failed: %s", e)
logger.debug("Title generation traceback", exc_info=True)
if failure_callback is not None:
try:
failure_callback("title generation", e)
except Exception:
logger.debug("Title generation failure_callback raised", exc_info=True)
return None
@@ -61,6 +85,7 @@ def auto_title_session(
session_id: str,
user_message: str,
assistant_response: str,
failure_callback: Optional[FailureCallback] = None,
) -> None:
"""Generate and set a session title if one doesn't already exist.
@@ -81,7 +106,9 @@ def auto_title_session(
except Exception:
return
title = generate_title(user_message, assistant_response)
title = generate_title(
user_message, assistant_response, failure_callback=failure_callback
)
if not title:
return
@@ -98,6 +125,7 @@ def maybe_auto_title(
user_message: str,
assistant_response: str,
conversation_history: list,
failure_callback: Optional[FailureCallback] = None,
) -> None:
"""Fire-and-forget title generation after the first exchange.
@@ -119,6 +147,7 @@ def maybe_auto_title(
thread = threading.Thread(
target=auto_title_session,
args=(session_db, session_id, user_message, assistant_response),
kwargs={"failure_callback": failure_callback},
daemon=True,
name="auto-title",
)
+7 -2
View File
@@ -23,9 +23,14 @@ def get_transport(api_mode: str):
This allows gradual migration call sites can check for None
and fall back to the legacy code path.
"""
if not _REGISTRY:
_discover_transports()
cls = _REGISTRY.get(api_mode)
if cls is None:
# The registry can be partially populated when a specific transport
# module was imported directly (for example chat_completions before
# codex). Discover on misses, not only when the registry is empty, so
# test/order-dependent imports do not make valid api_modes unavailable.
_discover_transports()
cls = _REGISTRY.get(api_mode)
if cls is None:
return None
return cls()
+5 -4
View File
@@ -31,15 +31,15 @@ class ChatCompletionsTransport(ProviderTransport):
def convert_messages(self, messages: List[Dict[str, Any]], **kwargs) -> List[Dict[str, Any]]:
"""Messages are already in OpenAI format — sanitize Codex leaks only.
Strips Codex Responses API fields (``codex_reasoning_items`` on the
message, ``call_id``/``response_item_id`` on tool_calls) that strict
chat-completions providers reject with 400/422.
Strips Codex Responses API fields (``codex_reasoning_items`` /
``codex_message_items`` on the message, ``call_id``/``response_item_id``
on tool_calls) that strict chat-completions providers reject with 400/422.
"""
needs_sanitize = False
for msg in messages:
if not isinstance(msg, dict):
continue
if "codex_reasoning_items" in msg:
if "codex_reasoning_items" in msg or "codex_message_items" in msg:
needs_sanitize = True
break
tool_calls = msg.get("tool_calls")
@@ -59,6 +59,7 @@ class ChatCompletionsTransport(ProviderTransport):
if not isinstance(msg, dict):
continue
msg.pop("codex_reasoning_items", None)
msg.pop("codex_message_items", None)
tool_calls = msg.get("tool_calls")
if isinstance(tool_calls, list):
for tc in tool_calls:
+20
View File
@@ -120,6 +120,24 @@ class ResponsesApiTransport(ProviderTransport):
if request_overrides:
kwargs.update(request_overrides)
if is_codex_backend:
prompt_cache_key = kwargs.get("prompt_cache_key")
cache_scope_id = str(prompt_cache_key or session_id or "").strip()
if cache_scope_id:
existing_extra_headers = kwargs.get("extra_headers")
merged_extra_headers: Dict[str, str] = {}
if isinstance(existing_extra_headers, dict):
merged_extra_headers.update(
{
str(key): str(value)
for key, value in existing_extra_headers.items()
if key and value is not None
}
)
merged_extra_headers["session_id"] = cache_scope_id
merged_extra_headers["x-client-request-id"] = cache_scope_id
kwargs["extra_headers"] = merged_extra_headers
max_tokens = params.get("max_tokens")
if max_tokens is not None and not is_codex_backend:
kwargs["max_output_tokens"] = max_tokens
@@ -160,6 +178,8 @@ class ResponsesApiTransport(ProviderTransport):
provider_data = {}
if msg and hasattr(msg, "codex_reasoning_items") and msg.codex_reasoning_items:
provider_data["codex_reasoning_items"] = msg.codex_reasoning_items
if msg and hasattr(msg, "codex_message_items") and msg.codex_message_items:
provider_data["codex_message_items"] = msg.codex_message_items
if msg and hasattr(msg, "reasoning_details") and msg.reasoning_details:
provider_data["reasoning_details"] = msg.reasoning_details
+6 -1
View File
@@ -97,7 +97,7 @@ class NormalizedResponse:
Response-level ``provider_data`` examples:
* Anthropic: ``{"reasoning_details": [...]}``
* Codex: ``{"codex_reasoning_items": [...]}``
* Codex: ``{"codex_reasoning_items": [...], "codex_message_items": [...]}``
* Others: ``None``
"""
@@ -126,6 +126,11 @@ class NormalizedResponse:
pd = self.provider_data or {}
return pd.get("codex_reasoning_items")
@property
def codex_message_items(self):
pd = self.provider_data or {}
return pd.get("codex_message_items")
# ---------------------------------------------------------------------------
# Factory helpers
+2 -6
View File
@@ -951,13 +951,9 @@ class BatchRunner:
root_logger.setLevel(original_level)
# Aggregate all batch statistics and update checkpoint
all_completed_prompts = list(completed_prompts_set)
total_reasoning_stats = {"total_assistant_turns": 0, "turns_with_reasoning": 0, "turns_without_reasoning": 0}
for batch_result in results:
# Add newly completed prompts
all_completed_prompts.extend(batch_result.get("completed_prompts", []))
# Aggregate tool stats
for tool_name, stats in batch_result.get("tool_stats", {}).items():
if tool_name not in total_tool_stats:
@@ -977,7 +973,7 @@ class BatchRunner:
# Save final checkpoint (best-effort; incremental writes already happened)
try:
checkpoint_data["completed_prompts"] = all_completed_prompts
checkpoint_data["completed_prompts"] = sorted(completed_prompts_set)
self._save_checkpoint(checkpoint_data, lock=checkpoint_lock)
except Exception as ckpt_err:
print(f"⚠️ Warning: Failed to save final checkpoint: {ckpt_err}")
+47 -10
View File
@@ -326,6 +326,16 @@ compression:
# To pin a specific model/provider for compression summaries, use the
# auxiliary section below (auxiliary.compression.provider / model).
# =============================================================================
# Anthropic prompt caching TTL
# =============================================================================
# When prompt caching is active (Claude via OpenRouter or native Anthropic),
# Anthropic supports two TTL tiers for cached prefixes: "5m" (default) and
# "1h". Other values are ignored and "5m" is used.
#
prompt_caching:
cache_ttl: "5m" # use "1h" for long sessions with pauses between turns
# =============================================================================
# Auxiliary Models (Advanced — Experimental)
# =============================================================================
@@ -596,6 +606,7 @@ platform_toolsets:
signal: [hermes-signal]
homeassistant: [hermes-homeassistant]
qqbot: [hermes-qqbot]
yuanbao: [hermes-yuanbao]
# =============================================================================
# Gateway Platform Settings
@@ -780,9 +791,16 @@ code_execution:
# Supports single tasks and batch mode (default 3 parallel, configurable).
delegation:
max_iterations: 50 # Max tool-calling turns per child (default: 50)
# max_concurrent_children: 3 # Max parallel child agents (default: 3)
# max_spawn_depth: 1 # Tree depth cap (1-3, default: 1 = flat). Raise to 2 or 3 to allow orchestrator children to spawn their own workers.
# max_concurrent_children: 3 # Max parallel child agents per batch (default: 3, floor: 1, no ceiling).
# WARNING: values above 10 multiply API cost linearly.
# max_spawn_depth: 1 # Delegation tree depth cap (range: 1-3, default: 1 = flat).
# Raise to 2 to allow workers to spawn their own subagents.
# Requires role="orchestrator" on intermediate agents.
# orchestrator_enabled: true # Kill switch for role="orchestrator" children (default: true).
# subagent_auto_approve: false # When a subagent hits a dangerous-command approval prompt, auto-deny (default: false)
# or auto-approve "once" (true) instead of blocking on stdin.
# The parent TUI owns stdin, so blocking would deadlock; non-interactive resolution is required.
# Both choices emit a logger.warning audit line. Flip to true only for cron/batch pipelines.
# inherit_mcp_toolsets: true # When explicit child toolsets are narrowed, also keep the parent's MCP toolsets (default: true). Set false for strict intersection.
# model: "google/gemini-3-flash-preview" # Override model for subagents (empty = inherit parent)
# provider: "openrouter" # Override provider for subagents (empty = inherit parent)
@@ -807,7 +825,9 @@ delegation:
# Display
# =============================================================================
display:
# Use compact banner mode
# Use compact banner mode (hides the ASCII-art banner, shows a single line).
# true: Compact single-line banner
# false: Full ASCII banner with tool/skill summary (default)
compact: false
# Tool progress display level (CLI and gateway)
@@ -821,12 +841,19 @@ display:
# Gateway-only natural mid-turn assistant updates.
# When true, completed assistant status messages are sent as separate chat
# messages. This is independent of tool_progress and gateway streaming.
# true: Send mid-turn assistant updates as separate messages (default)
# false: Only send the final response
interim_assistant_messages: true
# What Enter does when Hermes is already busy in the CLI.
# What Enter does when Hermes is already busy (CLI and gateway platforms).
# interrupt: Interrupt the current run and redirect Hermes (default)
# queue: Queue your message for the next turn
# Ctrl+C always interrupts regardless of this setting.
# steer: Inject your message mid-run via /steer, arriving at the agent
# after the next tool call — no interrupt, no role violation.
# Falls back to 'queue' if the agent isn't running yet or if
# images are attached (steer only carries text).
# Ctrl+C (or /stop in gateway) always interrupts regardless of this setting.
# Toggle at runtime with /busy <interrupt|queue|steer>.
busy_input_mode: interrupt
# Background process notifications (gateway/messaging only).
@@ -842,17 +869,22 @@ display:
# Play terminal bell when agent finishes a response.
# Useful for long-running tasks — your terminal will ding when the agent is done.
# Works over SSH. Most terminals can be configured to flash the taskbar or play a sound.
# true: Ring the terminal bell on each response
# false: Silent (default)
bell_on_complete: false
# Show model reasoning/thinking before each response.
# When enabled, a dim box shows the model's thought process above the response.
# Toggle at runtime with /reasoning show or /reasoning hide.
# true: Show the reasoning box
# false: Hide reasoning (default)
show_reasoning: false
# Stream tokens to the terminal as they arrive instead of waiting for the
# full response. The response box opens on first token and text appears
# line-by-line. Tool calls are still captured silently.
# Stream tokens to the terminal in real-time. Disable to wait for full responses.
# true: Stream tokens as they arrive (default)
# false: Wait for the full response before rendering
streaming: true
# ───────────────────────────────────────────────────────────────────────────
@@ -862,10 +894,15 @@ display:
# response box label, and branding text. Change at runtime with /skin <name>.
#
# Built-in skins:
# default — Classic Hermes gold/kawaii
# ares — Crimson/bronze war-god theme with spinner wings
# mono — Clean grayscale monochrome
# slate — Cool blue developer-focused
# default — Classic Hermes gold/kawaii
# ares — Crimson/bronze war-god theme with spinner wings
# mono — Clean grayscale monochrome
# slate — Cool blue developer-focused
# daylight — Bright light-mode theme
# warm-lightmode — Warm paper-tone light-mode theme
# poseidon — Sea-green/teal Olympian theme
# sisyphus — Earthy stone-and-moss theme
# charizard — Fiery orange dragon theme
#
# Custom skins: drop a YAML file in ~/.hermes/skins/<name>.yaml
# Schema (all fields optional, missing values inherit from default):
+671 -387
View File
File diff suppressed because it is too large Load Diff
+96 -4
View File
@@ -16,7 +16,7 @@ import uuid
from datetime import datetime, timedelta
from pathlib import Path
from hermes_constants import get_hermes_home
from typing import Optional, Dict, List, Any
from typing import Optional, Dict, List, Any, Union
logger = logging.getLogger(__name__)
@@ -311,6 +311,12 @@ def compute_next_run(schedule: Dict[str, Any], last_run_at: Optional[str] = None
elif schedule["kind"] == "cron":
if not HAS_CRONITER:
logger.warning(
"Cannot compute next run for cron schedule %r: 'croniter' "
"is not installed. Install the 'cron' extra (pip install "
"'hermes-agent[cron]') to re-enable recurring cron jobs.",
schedule.get("expr"),
)
return None
cron = croniter(schedule["expr"], now)
next_run = cron.get_next(datetime)
@@ -371,6 +377,39 @@ def save_jobs(jobs: List[Dict[str, Any]]):
raise
def _normalize_workdir(workdir: Optional[str]) -> Optional[str]:
"""Normalize and validate a cron job workdir.
Rules:
- Empty / None None (feature off, preserves old behaviour).
- ``~`` is expanded. Relative paths are rejected cron jobs run detached
from any shell cwd, so relative paths have no stable meaning.
- The path must exist and be a directory at create/update time. We do
NOT re-check at run time (a user might briefly unmount the dir; the
scheduler will just fall back to old behaviour with a logged warning).
Returns the absolute path string, or None when disabled.
Raises ValueError on invalid input.
"""
if workdir is None:
return None
raw = str(workdir).strip()
if not raw:
return None
expanded = Path(raw).expanduser()
if not expanded.is_absolute():
raise ValueError(
f"Cron workdir must be an absolute path (got {raw!r}). "
f"Cron jobs run detached from any shell cwd, so relative paths are ambiguous."
)
resolved = expanded.resolve()
if not resolved.exists():
raise ValueError(f"Cron workdir does not exist: {resolved}")
if not resolved.is_dir():
raise ValueError(f"Cron workdir is not a directory: {resolved}")
return str(resolved)
def create_job(
prompt: str,
schedule: str,
@@ -384,7 +423,9 @@ def create_job(
provider: Optional[str] = None,
base_url: Optional[str] = None,
script: Optional[str] = None,
context_from: Optional[Union[str, List[str]]] = None,
enabled_toolsets: Optional[List[str]] = None,
workdir: Optional[str] = None,
) -> Dict[str, Any]:
"""
Create a new cron job.
@@ -404,9 +445,18 @@ def create_job(
script: Optional path to a Python script whose stdout is injected into the
prompt each run. The script runs before the agent turn, and its output
is prepended as context. Useful for data collection / change detection.
context_from: Optional job ID (or list of job IDs) whose most recent output
is injected into the prompt as context before each run.
Useful for chaining cron jobs: job A finds data, job B processes it.
enabled_toolsets: Optional list of toolset names to restrict the agent to.
When set, only tools from these toolsets are loaded, reducing
token overhead. When omitted, all default tools are loaded.
workdir: Optional absolute path. When set, the job runs as if launched
from that directory: AGENTS.md / CLAUDE.md / .cursorrules from
that directory are injected into the system prompt, and the
terminal/file/code_exec tools use it as their working directory
(via TERMINAL_CWD). When unset, the old behaviour is preserved
(no context files injected, tools use the scheduler's cwd).
Returns:
The created job dict
@@ -439,6 +489,15 @@ def create_job(
normalized_script = normalized_script or None
normalized_toolsets = [str(t).strip() for t in enabled_toolsets if str(t).strip()] if enabled_toolsets else None
normalized_toolsets = normalized_toolsets or None
normalized_workdir = _normalize_workdir(workdir)
# Normalize context_from: accept str or list of str, store as list or None
if isinstance(context_from, str):
context_from = [context_from.strip()] if context_from.strip() else None
elif isinstance(context_from, list):
context_from = [str(j).strip() for j in context_from if str(j).strip()] or None
else:
context_from = None
label_source = (prompt or (normalized_skills[0] if normalized_skills else None)) or "cron job"
job = {
@@ -451,6 +510,7 @@ def create_job(
"provider": normalized_provider,
"base_url": normalized_base_url,
"script": normalized_script,
"context_from": context_from,
"schedule": parsed_schedule,
"schedule_display": parsed_schedule.get("display", schedule),
"repeat": {
@@ -471,6 +531,7 @@ def create_job(
"deliver": deliver,
"origin": origin, # Tracks where job was created for "origin" delivery
"enabled_toolsets": normalized_toolsets,
"workdir": normalized_workdir,
}
jobs = load_jobs()
@@ -504,6 +565,15 @@ def update_job(job_id: str, updates: Dict[str, Any]) -> Optional[Dict[str, Any]]
if job["id"] != job_id:
continue
# Validate / normalize workdir if present in updates. Empty string or
# None both mean "clear the field" (restore old behaviour).
if "workdir" in updates:
_wd = updates["workdir"]
if _wd in (None, "", False):
updates["workdir"] = None
else:
updates["workdir"] = _normalize_workdir(_wd)
updated = _apply_skill_fields({**job, **updates})
schedule_changed = "schedule" in updates
@@ -634,10 +704,32 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None,
# Compute next run
job["next_run_at"] = compute_next_run(job["schedule"], now)
# If no next run (one-shot completed), disable
# If no next run, decide whether this is terminal completion
# (one-shot) or a transient failure (recurring schedule couldn't
# compute — e.g. 'croniter' missing from the runtime env).
# Recurring jobs must NEVER be silently disabled: that turns a
# missing runtime dep into "job completed" and the user's
# schedule quietly goes off. See issue #16265.
if job["next_run_at"] is None:
job["enabled"] = False
job["state"] = "completed"
kind = job.get("schedule", {}).get("kind")
if kind in ("cron", "interval"):
job["state"] = "error"
if not job.get("last_error"):
job["last_error"] = (
"Failed to compute next run for recurring "
"schedule (is the 'croniter' package "
"installed in the gateway's Python env?)"
)
logger.error(
"Job '%s' (%s) could not compute next_run_at; "
"leaving enabled and marking state=error so the "
"job is not silently disabled.",
job.get("name", job["id"]),
kind,
)
else:
job["enabled"] = False
job["state"] = "completed"
elif job.get("state") != "paused":
job["state"] = "scheduled"
+158 -13
View File
@@ -77,7 +77,7 @@ _KNOWN_DELIVERY_PLATFORMS = frozenset({
"telegram", "discord", "slack", "whatsapp", "signal",
"matrix", "mattermost", "homeassistant", "dingtalk", "feishu",
"wecom", "wecom_callback", "weixin", "sms", "email", "webhook", "bluebubbles",
"qqbot",
"qqbot", "yuanbao",
})
# Platforms that support a configured cron/notification home target, mapped to
@@ -337,6 +337,7 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
"sms": Platform.SMS,
"bluebubbles": Platform.BLUEBUBBLES,
"qqbot": Platform.QQBOT,
"yuanbao": Platform.YUANBAO,
}
# Optionally wrap the content with a header/footer so the user knows this
@@ -671,10 +672,51 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
f"{prompt}"
)
# Inject output from referenced cron jobs as context.
context_from = job.get("context_from")
if context_from:
from cron.jobs import OUTPUT_DIR
if isinstance(context_from, str):
context_from = [context_from]
for source_job_id in context_from:
# Guard against path traversal — valid job IDs are 12-char hex strings
if not source_job_id or not all(c in "0123456789abcdef" for c in source_job_id):
logger.warning("context_from: skipping invalid job_id %r", source_job_id)
continue
try:
job_output_dir = OUTPUT_DIR / source_job_id
if not job_output_dir.exists():
continue # silent skip — no output yet
output_files = sorted(
job_output_dir.glob("*.md"),
key=lambda f: f.stat().st_mtime,
reverse=True,
)
if not output_files:
continue # silent skip — no output yet
latest_output = output_files[0].read_text(encoding="utf-8").strip()
# Truncate to 8K characters to avoid prompt bloat
_MAX_CONTEXT_CHARS = 8000
if len(latest_output) > _MAX_CONTEXT_CHARS:
latest_output = latest_output[:_MAX_CONTEXT_CHARS] + "\n\n[... output truncated ...]"
if latest_output:
prompt = (
f"## Output from job '{source_job_id}'\n"
"The following is the most recent output from a preceding "
"cron job. Use it as context for your analysis.\n\n"
f"```\n{latest_output}\n```\n\n"
f"{prompt}"
)
else:
continue # silent skip — empty output
except (OSError, PermissionError) as e:
logger.warning("context_from: failed to read output for job %r: %s", source_job_id, e)
# silent skip — do not pollute the prompt with error messages
# Always prepend cron execution guidance so the agent knows how
# delivery works and can suppress delivery when appropriate.
cron_hint = (
"[SYSTEM: You are running as a scheduled cron job. "
"[IMPORTANT: You are running as a scheduled cron job. "
"DELIVERY: Your final response will be automatically delivered "
"to the user — do NOT use send_message or try to deliver "
"the output yourself. Just produce your report/output as your "
@@ -710,7 +752,7 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
parts.append("")
parts.extend(
[
f'[SYSTEM: The user has invoked the "{skill_name}" skill, indicating they want you to follow its instructions. The full skill content is loaded below.]',
f'[IMPORTANT: The user has invoked the "{skill_name}" skill, indicating they want you to follow its instructions. The full skill content is loaded below.]',
"",
content,
]
@@ -718,7 +760,7 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
if skipped:
notice = (
f"[SYSTEM: The following skill(s) were listed for this job but could not be found "
f"[IMPORTANT: The following skill(s) were listed for this job but could not be found "
f"and were skipped: {', '.join(skipped)}. "
f"Start your response with a brief notice so the user is aware, e.g.: "
f"'⚠️ Skill(s) not found and skipped: {', '.join(skipped)}']"
@@ -780,6 +822,8 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
logger.info("Running job '%s' (ID: %s)", job_name, job_id)
logger.info("Prompt: %s", prompt[:100])
agent = None
# Mark this as a cron session so the approval system can apply cron_mode.
# This env var is process-wide and persists for the lifetime of the
# scheduler process — every job this process runs is a cron job.
@@ -795,6 +839,30 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
chat_name=origin.get("chat_name", "") if origin else "",
)
# Per-job working directory. When set (and validated at create/update
# time), we point TERMINAL_CWD at it so:
# - build_context_files_prompt() picks up AGENTS.md / CLAUDE.md /
# .cursorrules from the job's project dir, AND
# - the terminal, file, and code-exec tools run commands from there.
#
# tick() serializes workdir-jobs outside the parallel pool, so mutating
# os.environ["TERMINAL_CWD"] here is safe for those jobs. For workdir-less
# jobs we leave TERMINAL_CWD untouched — preserves the original behaviour
# (skip_context_files=True, tools use whatever cwd the scheduler has).
_job_workdir = (job.get("workdir") or "").strip() or None
if _job_workdir and not Path(_job_workdir).is_dir():
# Directory was removed between create-time validation and now. Log
# and drop back to old behaviour rather than crashing the job.
logger.warning(
"Job '%s': configured workdir %r no longer exists — running without it",
job_id, _job_workdir,
)
_job_workdir = None
_prior_terminal_cwd = os.environ.get("TERMINAL_CWD", "_UNSET_")
if _job_workdir:
os.environ["TERMINAL_CWD"] = _job_workdir
logger.info("Job '%s': using workdir %s", job_id, _job_workdir)
try:
# Re-read .env and config.yaml fresh every run so provider/key
# changes take effect without a gateway restart.
@@ -871,6 +939,7 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
resolve_runtime_provider,
format_runtime_provider_error,
)
from hermes_cli.auth import AuthError
try:
runtime_kwargs = {
"requested": job.get("provider") or os.getenv("HERMES_INFERENCE_PROVIDER"),
@@ -878,6 +947,28 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
if job.get("base_url"):
runtime_kwargs["explicit_base_url"] = job.get("base_url")
runtime = resolve_runtime_provider(**runtime_kwargs)
except AuthError as auth_exc:
# Primary provider auth failed — try fallback chain before giving up.
logger.warning("Job '%s': primary auth failed (%s), trying fallback", job_id, auth_exc)
fb = _cfg.get("fallback_providers") or _cfg.get("fallback_model")
fb_list = (fb if isinstance(fb, list) else [fb]) if fb else []
runtime = None
for entry in fb_list:
if not isinstance(entry, dict):
continue
try:
fb_kwargs = {"requested": entry.get("provider")}
if entry.get("base_url"):
fb_kwargs["explicit_base_url"] = entry["base_url"]
if entry.get("api_key"):
fb_kwargs["explicit_api_key"] = entry["api_key"]
runtime = resolve_runtime_provider(**fb_kwargs)
logger.info("Job '%s': fallback resolved to %s", job_id, runtime.get("provider"))
break
except Exception as fb_exc:
logger.debug("Job '%s': fallback %s failed: %s", job_id, entry.get("provider"), fb_exc)
if runtime is None:
raise RuntimeError(format_runtime_provider_error(auth_exc)) from auth_exc
except Exception as exc:
message = format_runtime_provider_error(exc)
raise RuntimeError(message) from exc
@@ -920,7 +1011,10 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
enabled_toolsets=_resolve_cron_enabled_toolsets(job, _cfg),
disabled_toolsets=["cronjob", "messaging", "clarify"],
quiet_mode=True,
skip_context_files=True, # Don't inject SOUL.md/AGENTS.md from scheduler cwd
# When a workdir is configured, inject AGENTS.md / CLAUDE.md /
# .cursorrules from that directory; otherwise preserve the old
# behaviour (don't inject SOUL.md/AGENTS.md from the scheduler cwd).
skip_context_files=not bool(_job_workdir),
skip_memory=True, # Cron system prompts would corrupt user representations
platform="cron",
session_id=_cron_session_id,
@@ -1059,6 +1153,14 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
return False, output, "", error_msg
finally:
# Restore TERMINAL_CWD to whatever it was before this job ran. We
# only ever mutate it when the job has a workdir; see the setup block
# at the top of run_job for the serialization guarantee.
if _job_workdir:
if _prior_terminal_cwd == "_UNSET_":
os.environ.pop("TERMINAL_CWD", None)
else:
os.environ["TERMINAL_CWD"] = _prior_terminal_cwd
# Clean up ContextVar session/delivery state for this job.
clear_session_vars(_ctx_tokens)
if _session_db:
@@ -1070,6 +1172,24 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
_session_db.close()
except (Exception, KeyboardInterrupt) as e:
logger.debug("Job '%s': failed to close SQLite session store: %s", job_id, e)
# Release subprocesses, terminal sandboxes, browser daemons, and the
# main OpenAI/httpx client held by this ephemeral cron agent. Without
# this, a gateway that ticks cron every N minutes leaks fds per job
# until it hits EMFILE (#10200 / "too many open files").
try:
if agent is not None:
agent.close()
except (Exception, KeyboardInterrupt) as e:
logger.debug("Job '%s': failed to close agent resources: %s", job_id, e)
# Each cron run spins up a short-lived worker thread whose event loop
# dies as soon as the ``ThreadPoolExecutor`` shuts down. Any async
# httpx clients cached under that loop are now unusable — reap them
# so their transports don't accumulate in the process-global cache.
try:
from agent.auxiliary_client import cleanup_stale_async_clients
cleanup_stale_async_clients()
except Exception as e:
logger.debug("Job '%s': failed to reap stale auxiliary clients: %s", job_id, e)
def tick(verbose: bool = True, adapters=None, loop=None) -> int:
@@ -1186,14 +1306,39 @@ def tick(verbose: bool = True, adapters=None, loop=None) -> int:
mark_job_run(job["id"], False, str(e))
return False
# Run all due jobs concurrently, each in its own ContextVar copy
# so session/delivery state stays isolated per-thread.
with concurrent.futures.ThreadPoolExecutor(max_workers=_max_workers) as _tick_pool:
_futures = []
for job in due_jobs:
_ctx = contextvars.copy_context()
_futures.append(_tick_pool.submit(_ctx.run, _process_job, job))
_results = [f.result() for f in _futures]
# Partition due jobs: those with a per-job workdir mutate
# os.environ["TERMINAL_CWD"] inside run_job, which is process-global —
# so they MUST run sequentially to avoid corrupting each other. Jobs
# without a workdir leave env untouched and stay parallel-safe.
workdir_jobs = [j for j in due_jobs if (j.get("workdir") or "").strip()]
parallel_jobs = [j for j in due_jobs if not (j.get("workdir") or "").strip()]
_results: list = []
# Sequential pass for workdir jobs.
for job in workdir_jobs:
_ctx = contextvars.copy_context()
_results.append(_ctx.run(_process_job, job))
# Parallel pass for the rest — same behaviour as before.
if parallel_jobs:
with concurrent.futures.ThreadPoolExecutor(max_workers=_max_workers) as _tick_pool:
_futures = []
for job in parallel_jobs:
_ctx = contextvars.copy_context()
_futures.append(_tick_pool.submit(_ctx.run, _process_job, job))
_results.extend(f.result() for f in _futures)
# Best-effort sweep of MCP stdio subprocesses that survived their
# session teardown during this tick. Runs AFTER every job has
# finished so active sessions (including live user chats) are
# never touched — only PIDs explicitly detected as orphans in
# tools.mcp_tool._run_stdio's finally block are reaped.
try:
from tools.mcp_tool import _kill_orphaned_mcp_children
_kill_orphaned_mcp_children()
except Exception as _e:
logger.debug("Post-tick MCP orphan cleanup failed: %s", _e)
return sum(_results)
finally:
+52
View File
@@ -0,0 +1,52 @@
#
# docker-compose.yml for Hermes Agent
#
# Usage:
# HERMES_UID=$(id -u) HERMES_GID=$(id -g) docker compose up -d
#
# Set HERMES_UID / HERMES_GID to the host user that owns ~/.hermes so
# files created inside the container stay readable/writable on the host.
# The entrypoint remaps the internal `hermes` user to these values via
# usermod/groupmod + gosu.
#
# Security notes:
# - The dashboard service binds to 127.0.0.1 by default. It stores API
# keys; exposing it on LAN without auth is unsafe. If you want remote
# access, use an SSH tunnel or put it behind a reverse proxy that
# adds authentication — do NOT pass --insecure --host 0.0.0.0.
# - The gateway's API server is off unless you uncomment API_SERVER_KEY
# and API_SERVER_HOST. See docs/user-guide/api-server.md before doing
# this on an internet-facing host.
#
services:
gateway:
build: .
image: hermes-agent
container_name: hermes
restart: unless-stopped
network_mode: host
volumes:
- ~/.hermes:/opt/data
environment:
- HERMES_UID=${HERMES_UID:-10000}
- HERMES_GID=${HERMES_GID:-10000}
# To expose the OpenAI-compatible API server beyond localhost,
# uncomment BOTH lines (API_SERVER_KEY is mandatory for auth):
# - API_SERVER_HOST=0.0.0.0
# - API_SERVER_KEY=${API_SERVER_KEY}
command: ["gateway", "run"]
dashboard:
image: hermes-agent
container_name: hermes-dashboard
restart: unless-stopped
network_mode: host
depends_on:
- gateway
volumes:
- ~/.hermes:/opt/data
environment:
- HERMES_UID=${HERMES_UID:-10000}
- HERMES_GID=${HERMES_GID:-10000}
# Localhost-only. For remote access, tunnel via `ssh -L 9119:localhost:9119`.
command: ["dashboard", "--host", "127.0.0.1", "--no-open"]
+20 -9
View File
@@ -22,9 +22,18 @@ if [ "$(id -u)" = "0" ]; then
groupmod -o -g "$HERMES_GID" hermes 2>/dev/null || true
fi
# Fix ownership of the data volume. When HERMES_UID remaps the hermes user,
# files created by previous runs (under the old UID) become inaccessible.
# Always chown -R when UID was remapped; otherwise only if top-level is wrong.
actual_hermes_uid=$(id -u hermes)
if [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; then
echo "$HERMES_HOME is not owned by $actual_hermes_uid, fixing"
needs_chown=false
if [ -n "$HERMES_UID" ] && [ "$HERMES_UID" != "10000" ]; then
needs_chown=true
elif [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; then
needs_chown=true
fi
if [ "$needs_chown" = true ]; then
echo "Fixing ownership of $HERMES_HOME to hermes ($actual_hermes_uid)"
# In rootless Podman the container's "root" is mapped to an unprivileged
# host UID — chown will fail. That's fine: the volume is already owned
# by the mapped user on the host side.
@@ -32,6 +41,15 @@ if [ "$(id -u)" = "0" ]; then
echo "Warning: chown failed (rootless container?) — continuing anyway"
fi
# Ensure config.yaml is readable by the hermes runtime user even if it was
# edited on the host after initial ownership setup. Must run here (as root)
# rather than after the gosu drop, otherwise a non-root caller like
# `docker run -u $(id -u):$(id -g)` hits "Operation not permitted" (#15865).
if [ -f "$HERMES_HOME/config.yaml" ]; then
chown hermes:hermes "$HERMES_HOME/config.yaml" 2>/dev/null || true
chmod 640 "$HERMES_HOME/config.yaml" 2>/dev/null || true
fi
echo "Dropping root privileges"
exec gosu hermes "$0" "$@"
fi
@@ -58,13 +76,6 @@ if [ ! -f "$HERMES_HOME/config.yaml" ]; then
cp "$INSTALL_DIR/cli-config.yaml.example" "$HERMES_HOME/config.yaml"
fi
# Ensure the main config file remains accessible to the hermes runtime user
# even if it was edited on the host after initial ownership setup.
if [ -f "$HERMES_HOME/config.yaml" ]; then
chown hermes:hermes "$HERMES_HOME/config.yaml"
chmod 640 "$HERMES_HOME/config.yaml"
fi
# SOUL.md
if [ ! -f "$HERMES_HOME/SOUL.md" ]; then
cp "$INSTALL_DIR/docker/SOUL.md" "$HERMES_HOME/SOUL.md"
+1
View File
@@ -36,6 +36,7 @@
imports = [
./nix/packages.nix
./nix/overlays.nix
./nix/nixosModules.nix
./nix/checks.nix
./nix/devShell.nix
+67 -14
View File
@@ -57,7 +57,7 @@ def _session_entry_name(origin: Dict[str, Any]) -> str:
# Build / refresh
# ---------------------------------------------------------------------------
def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
async def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
"""
Build a channel directory from connected platform adapters and session data.
@@ -72,7 +72,7 @@ def build_channel_directory(adapters: Dict[Any, Any]) -> Dict[str, Any]:
if platform == Platform.DISCORD:
platforms["discord"] = _build_discord(adapter)
elif platform == Platform.SLACK:
platforms["slack"] = _build_slack(adapter)
platforms["slack"] = await _build_slack(adapter)
except Exception as e:
logger.warning("Channel directory: failed to build %s: %s", platform.value, e)
@@ -136,21 +136,66 @@ def _build_discord(adapter) -> List[Dict[str, str]]:
return channels
def _build_slack(adapter) -> List[Dict[str, str]]:
"""List Slack channels the bot has joined."""
# Slack adapter may expose a web client
client = getattr(adapter, "_app", None) or getattr(adapter, "_client", None)
if not client:
async def _build_slack(adapter) -> List[Dict[str, Any]]:
"""List Slack channels the bot has joined across all workspaces.
Uses ``users.conversations`` against each workspace's web client. Pulls
public + private channels the bot is a member of, then merges in DMs
discovered from session history (IMs aren't useful to enumerate
proactively).
"""
team_clients = getattr(adapter, "_team_clients", None) or {}
if not team_clients:
return _build_from_sessions("slack")
try:
from tools.send_message_tool import _send_slack # noqa: F401
# Use the Slack Web API directly if available
except Exception:
pass
channels: List[Dict[str, Any]] = []
seen_ids: set = set()
# Fallback to session data
return _build_from_sessions("slack")
for team_id, client in team_clients.items():
try:
cursor: Optional[str] = None
for _page in range(20): # safety cap on pagination
response = await client.users_conversations(
types="public_channel,private_channel",
exclude_archived=True,
limit=200,
cursor=cursor,
)
if not response.get("ok"):
logger.warning(
"Channel directory: users.conversations not ok for team %s: %s",
team_id,
response.get("error", "unknown"),
)
break
for ch in response.get("channels", []):
cid = ch.get("id")
name = ch.get("name")
if not cid or not name or cid in seen_ids:
continue
seen_ids.add(cid)
channels.append({
"id": cid,
"name": name,
"type": "private" if ch.get("is_private") else "channel",
})
cursor = (response.get("response_metadata") or {}).get("next_cursor")
if not cursor:
break
except Exception as e:
logger.warning(
"Channel directory: failed to list Slack channels for team %s: %s",
team_id, e,
)
continue
# Merge in DM/group entries discovered from session history.
for entry in _build_from_sessions("slack"):
if entry.get("id") not in seen_ids:
channels.append(entry)
seen_ids.add(entry.get("id"))
return channels
def _build_from_sessions(platform_name: str) -> List[Dict[str, str]]:
@@ -223,6 +268,14 @@ def resolve_channel_name(platform_name: str, name: str) -> Optional[str]:
if not channels:
return None
# 0. Exact ID match — case-sensitive, no normalization. Lets callers pass
# raw platform IDs (e.g. Slack "C0B0QV5434G") even when the format guard
# in _parse_target_ref hasn't recognized them as explicit.
raw = name.strip()
for ch in channels:
if ch.get("id") == raw:
return ch["id"]
query = _normalize_channel_query(name)
# 1. Exact name match, including the display labels shown by send_message(action="list")
+92 -6
View File
@@ -67,6 +67,7 @@ class Platform(Enum):
WEIXIN = "weixin"
BLUEBUBBLES = "bluebubbles"
QQBOT = "qqbot"
YUANBAO = "yuanbao"
@dataclass
@@ -135,7 +136,7 @@ class SessionResetPolicy:
mode=mode if mode is not None else "both",
at_hour=at_hour if at_hour is not None else 4,
idle_minutes=idle_minutes if idle_minutes is not None else 1440,
notify=notify if notify is not None else True,
notify=_coerce_bool(notify, True),
notify_exclude_platforms=tuple(exclude) if exclude is not None else ("api_server", "webhook"),
)
@@ -178,7 +179,7 @@ class PlatformConfig:
home_channel = HomeChannel.from_dict(data["home_channel"])
return cls(
enabled=data.get("enabled", False),
enabled=_coerce_bool(data.get("enabled"), False),
token=data.get("token"),
api_key=data.get("api_key"),
home_channel=home_channel,
@@ -195,6 +196,14 @@ class StreamingConfig:
edit_interval: float = 1.0 # Seconds between message edits (Telegram rate-limits at ~1/s)
buffer_threshold: int = 40 # Chars before forcing an edit
cursor: str = "" # Cursor shown during streaming
# Ported from openclaw/openclaw#72038. When >0, the final edit for
# a long-running streamed response is delivered as a fresh message
# if the original preview has been visible for at least this many
# seconds, so the platform's visible timestamp reflects completion
# time instead of the preview creation time. Currently applied to
# Telegram only (other platforms ignore the setting). Default 60s
# matches the OpenClaw rollout. Set to 0 to disable.
fresh_final_after_seconds: float = 60.0
def to_dict(self) -> Dict[str, Any]:
return {
@@ -203,6 +212,7 @@ class StreamingConfig:
"edit_interval": self.edit_interval,
"buffer_threshold": self.buffer_threshold,
"cursor": self.cursor,
"fresh_final_after_seconds": self.fresh_final_after_seconds,
}
@classmethod
@@ -215,6 +225,9 @@ class StreamingConfig:
edit_interval=float(data.get("edit_interval", 1.0)),
buffer_threshold=int(data.get("buffer_threshold", 40)),
cursor=data.get("cursor", ""),
fresh_final_after_seconds=float(
data.get("fresh_final_after_seconds", 60.0)
),
)
@@ -314,6 +327,9 @@ class GatewayConfig:
# QQBot uses extra dict for app credentials
elif platform == Platform.QQBOT and config.extra.get("app_id") and config.extra.get("client_secret"):
connected.append(platform)
# Yuanbao uses extra dict for app credentials
elif platform == Platform.YUANBAO and config.extra.get("app_id") and config.extra.get("app_secret"):
connected.append(platform)
# DingTalk uses client_id/client_secret from config.extra or env vars
elif platform == Platform.DINGTALK and (
config.extra.get("client_id") or os.getenv("DINGTALK_CLIENT_ID")
@@ -435,7 +451,7 @@ class GatewayConfig:
reset_triggers=data.get("reset_triggers", ["/new", "/reset"]),
quick_commands=quick_commands,
sessions_dir=sessions_dir,
always_log_local=data.get("always_log_local", True),
always_log_local=_coerce_bool(data.get("always_log_local"), True),
stt_enabled=_coerce_bool(stt_enabled, True),
group_sessions_per_user=_coerce_bool(group_sessions_per_user, True),
thread_sessions_per_user=_coerce_bool(thread_sessions_per_user, False),
@@ -550,6 +566,8 @@ def load_gateway_config() -> GatewayConfig:
existing = {}
# Deep-merge extra dicts so gateway.json defaults survive
merged_extra = {**existing.get("extra", {}), **plat_block.get("extra", {})}
if plat_name == Platform.SLACK.value and "enabled" in plat_block:
merged_extra["_enabled_explicit"] = True
merged = {**existing, **plat_block}
if merged_extra:
merged["extra"] = merged_extra
@@ -570,6 +588,8 @@ def load_gateway_config() -> GatewayConfig:
)
if "reply_prefix" in platform_cfg:
bridged["reply_prefix"] = platform_cfg["reply_prefix"]
if "reply_in_thread" in platform_cfg:
bridged["reply_in_thread"] = platform_cfg["reply_in_thread"]
if "require_mention" in platform_cfg:
bridged["require_mention"] = platform_cfg["require_mention"]
if "free_response_channels" in platform_cfg:
@@ -584,7 +604,7 @@ def load_gateway_config() -> GatewayConfig:
bridged["group_policy"] = platform_cfg["group_policy"]
if "group_allow_from" in platform_cfg:
bridged["group_allow_from"] = platform_cfg["group_allow_from"]
if plat == Platform.DISCORD and "channel_skill_bindings" in platform_cfg:
if plat in (Platform.DISCORD, Platform.SLACK) and "channel_skill_bindings" in platform_cfg:
bridged["channel_skill_bindings"] = platform_cfg["channel_skill_bindings"]
if "channel_prompts" in platform_cfg:
channel_prompts = platform_cfg["channel_prompts"]
@@ -592,16 +612,21 @@ def load_gateway_config() -> GatewayConfig:
bridged["channel_prompts"] = {str(k): v for k, v in channel_prompts.items()}
else:
bridged["channel_prompts"] = channel_prompts
if not bridged:
enabled_was_explicit = "enabled" in platform_cfg
if not bridged and not enabled_was_explicit:
continue
plat_data = platforms_data.setdefault(plat.value, {})
if not isinstance(plat_data, dict):
plat_data = {}
platforms_data[plat.value] = plat_data
if enabled_was_explicit:
plat_data["enabled"] = platform_cfg["enabled"]
extra = plat_data.setdefault("extra", {})
if not isinstance(extra, dict):
extra = {}
plat_data["extra"] = extra
if plat == Platform.SLACK and enabled_was_explicit:
extra["_enabled_explicit"] = True
extra.update(bridged)
# Slack settings → env vars (env vars take precedence)
@@ -609,6 +634,8 @@ def load_gateway_config() -> GatewayConfig:
if isinstance(slack_cfg, dict):
if "require_mention" in slack_cfg and not os.getenv("SLACK_REQUIRE_MENTION"):
os.environ["SLACK_REQUIRE_MENTION"] = str(slack_cfg["require_mention"]).lower()
if "strict_mention" in slack_cfg and not os.getenv("SLACK_STRICT_MENTION"):
os.environ["SLACK_STRICT_MENTION"] = str(slack_cfg["strict_mention"]).lower()
if "allow_bots" in slack_cfg and not os.getenv("SLACK_ALLOW_BOTS"):
os.environ["SLACK_ALLOW_BOTS"] = str(slack_cfg["allow_bots"]).lower()
frc = slack_cfg.get("free_response_channels")
@@ -687,6 +714,11 @@ def load_gateway_config() -> GatewayConfig:
os.environ["TELEGRAM_REACTIONS"] = str(telegram_cfg["reactions"]).lower()
if "proxy_url" in telegram_cfg and not os.getenv("TELEGRAM_PROXY"):
os.environ["TELEGRAM_PROXY"] = str(telegram_cfg["proxy_url"]).strip()
if "group_allowed_chats" in telegram_cfg and not os.getenv("TELEGRAM_GROUP_ALLOWED_USERS"):
gac = telegram_cfg["group_allowed_chats"]
if isinstance(gac, list):
gac = ",".join(str(v) for v in gac)
os.environ["TELEGRAM_GROUP_ALLOWED_USERS"] = str(gac)
if "disable_link_previews" in telegram_cfg:
plat_data = platforms_data.setdefault(Platform.TELEGRAM.value, {})
if not isinstance(plat_data, dict):
@@ -913,8 +945,20 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
slack_token = os.getenv("SLACK_BOT_TOKEN")
if slack_token:
if Platform.SLACK not in config.platforms:
# No yaml config for Slack — env-only setup, enable it
config.platforms[Platform.SLACK] = PlatformConfig()
config.platforms[Platform.SLACK].enabled = True
config.platforms[Platform.SLACK].enabled = True
else:
slack_config = config.platforms[Platform.SLACK]
enabled_was_explicit = bool(slack_config.extra.pop("_enabled_explicit", False))
if not slack_config.enabled and not enabled_was_explicit:
# Top-level Slack settings such as channel prompts should not
# turn an env-token setup into a disabled platform. Only an
# explicit slack.enabled/platforms.slack.enabled false should.
slack_config.enabled = True
# If yaml config exists, respect its enabled flag (don't override
# explicit enabled: false). Token is still stored so skills that
# send Slack messages can use it without activating the gateway adapter.
config.platforms[Platform.SLACK].token = slack_token
slack_home = os.getenv("SLACK_HOME_CHANNEL")
if slack_home and Platform.SLACK in config.platforms:
@@ -1271,6 +1315,48 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
name=os.getenv("QQBOT_HOME_CHANNEL_NAME") or os.getenv(qq_home_name_env, "Home"),
)
# Yuanbao — YUANBAO_APP_ID preferred
yuanbao_app_id = os.getenv("YUANBAO_APP_ID") or os.getenv("YUANBAO_APP_KEY")
yuanbao_app_secret = os.getenv("YUANBAO_APP_SECRET")
if yuanbao_app_id and yuanbao_app_secret:
if Platform.YUANBAO not in config.platforms:
config.platforms[Platform.YUANBAO] = PlatformConfig()
config.platforms[Platform.YUANBAO].enabled = True
extra = config.platforms[Platform.YUANBAO].extra
extra["app_id"] = yuanbao_app_id
extra["app_secret"] = yuanbao_app_secret
yuanbao_bot_id = os.getenv("YUANBAO_BOT_ID")
if yuanbao_bot_id:
extra["bot_id"] = yuanbao_bot_id
yuanbao_ws_url = os.getenv("YUANBAO_WS_URL")
if yuanbao_ws_url:
extra["ws_url"] = yuanbao_ws_url
yuanbao_api_domain = os.getenv("YUANBAO_API_DOMAIN")
if yuanbao_api_domain:
extra["api_domain"] = yuanbao_api_domain
yuanbao_route_env = os.getenv("YUANBAO_ROUTE_ENV")
if yuanbao_route_env:
extra["route_env"] = yuanbao_route_env
yuanbao_home = os.getenv("YUANBAO_HOME_CHANNEL")
if yuanbao_home:
config.platforms[Platform.YUANBAO].home_channel = HomeChannel(
platform=Platform.YUANBAO,
chat_id=yuanbao_home,
name=os.getenv("YUANBAO_HOME_CHANNEL_NAME", "Home"),
)
yuanbao_dm_policy = os.getenv("YUANBAO_DM_POLICY")
if yuanbao_dm_policy:
extra["dm_policy"] = yuanbao_dm_policy.strip().lower()
yuanbao_dm_allow_from = os.getenv("YUANBAO_DM_ALLOW_FROM")
if yuanbao_dm_allow_from:
extra["dm_allow_from"] = yuanbao_dm_allow_from
yuanbao_group_policy = os.getenv("YUANBAO_GROUP_POLICY")
if yuanbao_group_policy:
extra["group_policy"] = yuanbao_group_policy.strip().lower()
yuanbao_group_allow_from = os.getenv("YUANBAO_GROUP_ALLOW_FROM")
if yuanbao_group_allow_from:
extra["group_allow_from"] = yuanbao_group_allow_from
# Session settings
idle_minutes = os.getenv("SESSION_IDLE_MINUTES")
if idle_minutes:
+3 -1
View File
@@ -79,7 +79,9 @@ _PLATFORM_DEFAULTS: dict[str, dict[str, Any]] = {
"discord": _TIER_HIGH,
# Tier 2 — edit support, often customer/workspace channels
"slack": _TIER_MEDIUM,
# Slack: tool_progress off by default — Bolt posts cannot be edited like CLI;
# "new"/"all" spam permanent lines in channels (hermes-agent#14663).
"slack": {**_TIER_MEDIUM, "tool_progress": "off"},
"mattermost": _TIER_MEDIUM,
"matrix": _TIER_MEDIUM,
"feishu": _TIER_MEDIUM,
+57 -11
View File
@@ -28,6 +28,7 @@ def mirror_to_session(
message_text: str,
source_label: str = "cli",
thread_id: Optional[str] = None,
user_id: Optional[str] = None,
) -> bool:
"""
Append a delivery-mirror message to the target session's transcript.
@@ -39,9 +40,20 @@ def mirror_to_session(
All errors are caught -- this is never fatal.
"""
try:
session_id = _find_session_id(platform, str(chat_id), thread_id=thread_id)
session_id = _find_session_id(
platform,
str(chat_id),
thread_id=thread_id,
user_id=user_id,
)
if not session_id:
logger.debug("Mirror: no session found for %s:%s:%s", platform, chat_id, thread_id)
logger.debug(
"Mirror: no session found for %s:%s:%s:%s",
platform,
chat_id,
thread_id,
user_id,
)
return False
mirror_msg = {
@@ -59,17 +71,33 @@ def mirror_to_session(
return True
except Exception as e:
logger.debug("Mirror failed for %s:%s:%s: %s", platform, chat_id, thread_id, e)
logger.debug(
"Mirror failed for %s:%s:%s:%s: %s",
platform,
chat_id,
thread_id,
user_id,
e,
)
return False
def _find_session_id(platform: str, chat_id: str, thread_id: Optional[str] = None) -> Optional[str]:
def _find_session_id(
platform: str,
chat_id: str,
thread_id: Optional[str] = None,
user_id: Optional[str] = None,
) -> Optional[str]:
"""
Find the active session_id for a platform + chat_id pair.
Scans sessions.json entries and matches where origin.chat_id == chat_id
on the right platform. DM session keys don't embed the chat_id
(e.g. "agent:main:telegram:dm"), so we check the origin dict.
When *user_id* is provided, prefer exact sender matches. If multiple
same-chat candidates exist and none matches the user, return None instead
of guessing and contaminating another participant's session.
"""
if not _SESSIONS_INDEX.exists():
return None
@@ -81,8 +109,7 @@ def _find_session_id(platform: str, chat_id: str, thread_id: Optional[str] = Non
return None
platform_lower = platform.lower()
best_match = None
best_updated = ""
candidates = []
for _key, entry in data.items():
origin = entry.get("origin") or {}
@@ -96,12 +123,31 @@ def _find_session_id(platform: str, chat_id: str, thread_id: Optional[str] = Non
origin_thread_id = origin.get("thread_id")
if thread_id is not None and str(origin_thread_id or "") != str(thread_id):
continue
updated = entry.get("updated_at", "")
if updated > best_updated:
best_updated = updated
best_match = entry.get("session_id")
candidates.append(entry)
return best_match
if not candidates:
return None
if user_id:
exact_user_matches = [
entry for entry in candidates
if str((entry.get("origin") or {}).get("user_id") or "") == str(user_id)
]
if exact_user_matches:
candidates = exact_user_matches
elif len(candidates) > 1:
return None
elif len(candidates) > 1:
distinct_user_ids = {
str((entry.get("origin") or {}).get("user_id") or "").strip()
for entry in candidates
if str((entry.get("origin") or {}).get("user_id") or "").strip()
}
if len(distinct_user_ids) > 1:
return None
best_entry = max(candidates, key=lambda entry: entry.get("updated_at", ""))
return best_entry.get("session_id")
def _append_to_jsonl(session_id: str, message: dict) -> None:
+2
View File
@@ -10,10 +10,12 @@ Each adapter handles:
from .base import BasePlatformAdapter, MessageEvent, SendResult
from .qqbot import QQAdapter
from .yuanbao import YuanbaoAdapter
__all__ = [
"BasePlatformAdapter",
"MessageEvent",
"SendResult",
"QQAdapter",
"YuanbaoAdapter",
]
+150 -22
View File
@@ -9,6 +9,7 @@ Exposes an HTTP server with endpoints:
- GET /v1/models lists hermes-agent as an available model
- POST /v1/runs start a run, returns run_id immediately (202)
- GET /v1/runs/{run_id}/events SSE stream of structured lifecycle events
- POST /v1/runs/{run_id}/stop interrupt a running agent
- GET /health health check
- GET /health/detailed rich status for cross-container dashboard probing
@@ -586,6 +587,9 @@ class APIServerAdapter(BasePlatformAdapter):
self._run_streams: Dict[str, "asyncio.Queue[Optional[Dict]]"] = {}
# Creation timestamps for orphaned-run TTL sweep
self._run_streams_created: Dict[str, float] = {}
# Active run agent/task references for stop support
self._active_run_agents: Dict[str, Any] = {}
self._active_run_tasks: Dict[str, "asyncio.Task"] = {}
self._session_db: Optional[Any] = None # Lazy-init SessionDB for session continuity
@staticmethod
@@ -1204,10 +1208,12 @@ class APIServerAdapter(BasePlatformAdapter):
If the client disconnects mid-stream, ``agent.interrupt()`` is
called so the agent stops issuing upstream LLM calls, then the
asyncio task is cancelled. When ``store=True`` the full response
is persisted to the ResponseStore in a ``finally`` block so GET
/v1/responses/{id} and ``previous_response_id`` chaining work the
same as the batch path.
asyncio task is cancelled. When ``store=True`` an initial
``in_progress`` snapshot is persisted immediately after
``response.created`` and disconnects update it to an
``incomplete`` snapshot so GET /v1/responses/{id} and
``previous_response_id`` chaining still have something to
recover from.
"""
import queue as _q
@@ -1269,6 +1275,60 @@ class APIServerAdapter(BasePlatformAdapter):
final_response_text = ""
agent_error: Optional[str] = None
usage: Dict[str, int] = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
terminal_snapshot_persisted = False
def _persist_response_snapshot(
response_env: Dict[str, Any],
*,
conversation_history_snapshot: Optional[List[Dict[str, Any]]] = None,
) -> None:
if not store:
return
if conversation_history_snapshot is None:
conversation_history_snapshot = list(conversation_history)
conversation_history_snapshot.append({"role": "user", "content": user_message})
self._response_store.put(response_id, {
"response": response_env,
"conversation_history": conversation_history_snapshot,
"instructions": instructions,
"session_id": session_id,
})
if conversation:
self._response_store.set_conversation(conversation, response_id)
def _persist_incomplete_if_needed() -> None:
"""Persist an ``incomplete`` snapshot if no terminal one was written.
Called from both the client-disconnect (``ConnectionResetError``)
and server-cancellation (``asyncio.CancelledError``) paths so
GET /v1/responses/{id} and ``previous_response_id`` chaining keep
working after abrupt stream termination.
"""
if not store or terminal_snapshot_persisted:
return
incomplete_text = "".join(final_text_parts) or final_response_text
incomplete_items: List[Dict[str, Any]] = list(emitted_items)
if incomplete_text:
incomplete_items.append({
"type": "message",
"role": "assistant",
"content": [{"type": "output_text", "text": incomplete_text}],
})
incomplete_env = _envelope("incomplete")
incomplete_env["output"] = incomplete_items
incomplete_env["usage"] = {
"input_tokens": usage.get("input_tokens", 0),
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
incomplete_history = list(conversation_history)
incomplete_history.append({"role": "user", "content": user_message})
if incomplete_text:
incomplete_history.append({"role": "assistant", "content": incomplete_text})
_persist_response_snapshot(
incomplete_env,
conversation_history_snapshot=incomplete_history,
)
try:
# response.created — initial envelope, status=in_progress
@@ -1278,6 +1338,7 @@ class APIServerAdapter(BasePlatformAdapter):
"type": "response.created",
"response": created_env,
})
_persist_response_snapshot(created_env)
last_activity = time.monotonic()
async def _open_message_item() -> None:
@@ -1534,6 +1595,18 @@ class APIServerAdapter(BasePlatformAdapter):
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
_failed_history = list(conversation_history)
_failed_history.append({"role": "user", "content": user_message})
if final_response_text or agent_error:
_failed_history.append({
"role": "assistant",
"content": final_response_text or agent_error,
})
_persist_response_snapshot(
failed_env,
conversation_history_snapshot=_failed_history,
)
terminal_snapshot_persisted = True
await _write_event("response.failed", {
"type": "response.failed",
"response": failed_env,
@@ -1546,30 +1619,24 @@ class APIServerAdapter(BasePlatformAdapter):
"output_tokens": usage.get("output_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
}
full_history = list(conversation_history)
full_history.append({"role": "user", "content": user_message})
if isinstance(result, dict) and result.get("messages"):
full_history.extend(result["messages"])
else:
full_history.append({"role": "assistant", "content": final_response_text})
_persist_response_snapshot(
completed_env,
conversation_history_snapshot=full_history,
)
terminal_snapshot_persisted = True
await _write_event("response.completed", {
"type": "response.completed",
"response": completed_env,
})
# Persist for future chaining / GET retrieval, mirroring
# the batch path behavior.
if store:
full_history = list(conversation_history)
full_history.append({"role": "user", "content": user_message})
if isinstance(result, dict) and result.get("messages"):
full_history.extend(result["messages"])
else:
full_history.append({"role": "assistant", "content": final_response_text})
self._response_store.put(response_id, {
"response": completed_env,
"conversation_history": full_history,
"instructions": instructions,
"session_id": session_id,
})
if conversation:
self._response_store.set_conversation(conversation, response_id)
except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError, OSError):
_persist_incomplete_if_needed()
# Client disconnected — interrupt the agent so it stops
# making upstream LLM calls, then cancel the task.
agent = agent_ref[0] if agent_ref else None
@@ -1585,6 +1652,22 @@ class APIServerAdapter(BasePlatformAdapter):
except (asyncio.CancelledError, Exception):
pass
logger.info("SSE client disconnected; interrupted agent task %s", response_id)
except asyncio.CancelledError:
# Server-side cancellation (e.g. shutdown, request timeout) —
# persist an incomplete snapshot so GET /v1/responses/{id} and
# previous_response_id chaining still work, then re-raise so the
# runtime's cancellation semantics are respected.
_persist_incomplete_if_needed()
agent = agent_ref[0] if agent_ref else None
if agent is not None:
try:
agent.interrupt("SSE task cancelled")
except Exception:
pass
if not agent_task.done():
agent_task.cancel()
logger.info("SSE task cancelled; persisted incomplete snapshot for %s", response_id)
raise
return response
@@ -2362,6 +2445,7 @@ class APIServerAdapter(BasePlatformAdapter):
stream_delta_callback=_text_cb,
tool_progress_callback=event_cb,
)
self._active_run_agents[run_id] = agent
def _run_sync():
r = agent.run_conversation(
user_message=user_message,
@@ -2401,8 +2485,11 @@ class APIServerAdapter(BasePlatformAdapter):
q.put_nowait(None)
except Exception:
pass
self._active_run_agents.pop(run_id, None)
self._active_run_tasks.pop(run_id, None)
task = asyncio.create_task(_run_and_close())
self._active_run_tasks[run_id] = task
try:
self._background_tasks.add(task)
except TypeError:
@@ -2461,6 +2548,44 @@ class APIServerAdapter(BasePlatformAdapter):
return response
async def _handle_stop_run(self, request: "web.Request") -> "web.Response":
"""POST /v1/runs/{run_id}/stop — interrupt a running agent."""
auth_err = self._check_auth(request)
if auth_err:
return auth_err
run_id = request.match_info["run_id"]
agent = self._active_run_agents.get(run_id)
task = self._active_run_tasks.get(run_id)
if agent is None and task is None:
return web.json_response(_openai_error(f"Run not found: {run_id}", code="run_not_found"), status=404)
if agent is not None:
try:
agent.interrupt("Stop requested via API")
except Exception:
pass
if task is not None and not task.done():
task.cancel()
# Bounded wait: run_conversation() executes in the default
# executor thread which task.cancel() cannot preempt — we rely on
# agent.interrupt() above to break the loop. Cap the wait so a
# slow/unresponsive interrupt can't hang this handler.
try:
await asyncio.wait_for(asyncio.shield(task), timeout=5.0)
except asyncio.TimeoutError:
logger.warning(
"[api_server] stop for run %s timed out after 5s; "
"agent may still be finishing the current step",
run_id,
)
except (asyncio.CancelledError, Exception):
pass
return web.json_response({"run_id": run_id, "status": "stopping"})
async def _sweep_orphaned_runs(self) -> None:
"""Periodically clean up run streams that were never consumed."""
while True:
@@ -2475,6 +2600,8 @@ class APIServerAdapter(BasePlatformAdapter):
logger.debug("[api_server] sweeping orphaned run %s", run_id)
self._run_streams.pop(run_id, None)
self._run_streams_created.pop(run_id, None)
self._active_run_agents.pop(run_id, None)
self._active_run_tasks.pop(run_id, None)
# ------------------------------------------------------------------
# BasePlatformAdapter interface
@@ -2510,6 +2637,7 @@ class APIServerAdapter(BasePlatformAdapter):
# Structured event streaming
self._app.router.add_post("/v1/runs", self._handle_runs)
self._app.router.add_get("/v1/runs/{run_id}/events", self._handle_run_events)
self._app.router.add_post("/v1/runs/{run_id}/stop", self._handle_stop_run)
# Start background sweep to clean up orphaned (unconsumed) run streams
sweep_task = asyncio.create_task(self._sweep_orphaned_runs())
try:
+287 -9
View File
@@ -148,7 +148,102 @@ def _detect_macos_system_proxy() -> str | None:
return None
def resolve_proxy_url(platform_env_var: str | None = None) -> str | None:
def _split_host_port(value: str) -> tuple[str, int | None]:
raw = str(value or "").strip()
if not raw:
return "", None
if "://" in raw:
parsed = urlsplit(raw)
return (parsed.hostname or "").lower().rstrip("."), parsed.port
if raw.startswith("[") and "]" in raw:
host, _, rest = raw[1:].partition("]")
port = None
if rest.startswith(":") and rest[1:].isdigit():
port = int(rest[1:])
return host.lower().rstrip("."), port
if raw.count(":") == 1:
host, _, maybe_port = raw.rpartition(":")
if maybe_port.isdigit():
return host.lower().rstrip("."), int(maybe_port)
return raw.lower().strip("[]").rstrip("."), None
def _no_proxy_entries() -> list[str]:
entries: list[str] = []
for key in ("NO_PROXY", "no_proxy"):
raw = os.environ.get(key, "")
entries.extend(part.strip() for part in raw.split(",") if part.strip())
return entries
def _no_proxy_entry_matches(entry: str, host: str, port: int | None = None) -> bool:
token = str(entry or "").strip().lower()
if not token:
return False
if token == "*":
return True
token_host, token_port = _split_host_port(token)
if token_port is not None and port is not None and token_port != port:
return False
if token_port is not None and port is None:
return False
if not token_host:
return False
try:
network = ipaddress.ip_network(token_host, strict=False)
try:
return ipaddress.ip_address(host) in network
except ValueError:
return False
except ValueError:
pass
try:
token_ip = ipaddress.ip_address(token_host)
try:
return ipaddress.ip_address(host) == token_ip
except ValueError:
return False
except ValueError:
pass
if token_host.startswith("*."):
suffix = token_host[1:]
return host.endswith(suffix)
if token_host.startswith("."):
return host == token_host[1:] or host.endswith(token_host)
return host == token_host or host.endswith(f".{token_host}")
def should_bypass_proxy(target_hosts: str | list[str] | tuple[str, ...] | set[str] | None) -> bool:
"""Return True when NO_PROXY/no_proxy matches at least one target host.
Supports exact hosts, domain suffixes, wildcard suffixes, IP literals,
CIDR ranges, optional host:port entries, and ``*``.
"""
entries = _no_proxy_entries()
if not entries or not target_hosts:
return False
if isinstance(target_hosts, str):
candidates = [target_hosts]
else:
candidates = list(target_hosts)
for candidate in candidates:
host, port = _split_host_port(str(candidate))
if not host:
continue
if any(_no_proxy_entry_matches(entry, host, port) for entry in entries):
return True
return False
def resolve_proxy_url(
platform_env_var: str | None = None,
*,
target_hosts: str | list[str] | tuple[str, ...] | set[str] | None = None,
) -> str | None:
"""Return a proxy URL from env vars, or macOS system proxy.
Check order:
@@ -156,18 +251,26 @@ def resolve_proxy_url(platform_env_var: str | None = None) -> str | None:
1. HTTPS_PROXY / HTTP_PROXY / ALL_PROXY (and lowercase variants)
2. macOS system proxy via ``scutil --proxy`` (auto-detect)
Returns *None* if no proxy is found.
Returns *None* if no proxy is found, or if NO_PROXY/no_proxy matches one
of ``target_hosts``.
"""
if platform_env_var:
value = (os.environ.get(platform_env_var) or "").strip()
if value:
if should_bypass_proxy(target_hosts):
return None
return normalize_proxy_url(value)
for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
"https_proxy", "http_proxy", "all_proxy"):
value = (os.environ.get(key) or "").strip()
if value:
if should_bypass_proxy(target_hosts):
return None
return normalize_proxy_url(value)
return normalize_proxy_url(_detect_macos_system_proxy())
detected = normalize_proxy_url(_detect_macos_system_proxy())
if detected and should_bypass_proxy(target_hosts):
return None
return detected
def proxy_kwargs_for_bot(proxy_url: str | None) -> dict:
@@ -233,6 +336,39 @@ def proxy_kwargs_for_aiohttp(proxy_url: str | None) -> tuple[dict, dict]:
return {}, {"proxy": proxy_url}
def is_host_excluded_by_no_proxy(hostname: str, no_proxy_value: str | None = None) -> bool:
"""Return True when ``hostname`` matches a ``NO_PROXY`` entry.
Supports comma- or whitespace-separated entries with optional leading dots
and ``*.`` wildcards, which match both the apex domain and subdomains.
"""
raw = no_proxy_value
if raw is None:
raw = os.environ.get("NO_PROXY") or os.environ.get("no_proxy") or ""
raw = raw.strip()
if not raw:
return False
lower_hostname = hostname.lower()
for entry in re.split(r"[\s,]+", raw):
normalized = entry.strip().lower()
if not normalized:
continue
if normalized == "*":
return True
if normalized.startswith("*."):
normalized = normalized[2:]
elif normalized.startswith("."):
normalized = normalized[1:]
if lower_hostname == normalized or lower_hostname.endswith(f".{normalized}"):
return True
return False
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
@@ -590,7 +726,15 @@ SUPPORTED_DOCUMENT_TYPES = {
".pdf": "application/pdf",
".md": "text/markdown",
".txt": "text/plain",
".csv": "text/csv",
".log": "text/plain",
".json": "application/json",
".xml": "application/xml",
".yaml": "application/yaml",
".yml": "application/yaml",
".toml": "application/toml",
".ini": "text/plain",
".cfg": "text/plain",
".zip": "application/zip",
".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
@@ -879,6 +1023,61 @@ def resolve_channel_prompt(
return None
def resolve_channel_skills(
config_extra: dict,
channel_id: str,
parent_id: str | None = None,
) -> list[str] | None:
"""Resolve auto-loaded skill(s) for a channel/thread from platform config.
Looks up ``channel_skill_bindings`` in the adapter's ``config.extra`` dict.
Config format::
channel_skill_bindings:
- id: "C0123" # Slack channel ID or Discord channel/forum ID
skills: ["skill-a", "skill-b"]
- id: "D0ABCDE"
skill: "solo-skill" # single string also accepted
Prefers an exact match on *channel_id*; falls back to *parent_id*
(useful for forum threads / Slack threads inheriting the parent channel's
binding).
Returns a deduplicated list of skill names (order preserved), or None if
no match is found.
"""
bindings = config_extra.get("channel_skill_bindings") or []
if not isinstance(bindings, list) or not bindings:
return None
ids_to_check: set[str] = set()
if channel_id:
ids_to_check.add(str(channel_id))
if parent_id:
ids_to_check.add(str(parent_id))
if not ids_to_check:
return None
for entry in bindings:
if not isinstance(entry, dict):
continue
entry_id = str(entry.get("id", ""))
if entry_id in ids_to_check:
skills = entry.get("skills") or entry.get("skill")
if isinstance(skills, str):
s = skills.strip()
return [s] if s else None
if isinstance(skills, list) and skills:
seen: list[str] = []
for name in skills:
if not isinstance(name, str):
continue
nm = name.strip()
if nm and nm not in seen:
seen.append(nm)
return seen or None
return None
class BasePlatformAdapter(ABC):
"""
Base class for platform adapters.
@@ -922,7 +1121,20 @@ class BasePlatformAdapter(ABC):
self._post_delivery_callbacks: Dict[str, Any] = {}
self._expected_cancelled_tasks: set[asyncio.Task] = set()
self._busy_session_handler: Optional[Callable[[MessageEvent, str], Awaitable[bool]]] = None
# Chats where auto-TTS on voice input is disabled (set by /voice off)
# Auto-TTS on voice input: ``_auto_tts_default`` is the global default
# (``voice.auto_tts`` in config.yaml, pushed by GatewayRunner on connect).
# Per-chat overrides live in two sets populated from ``_voice_mode``:
# - ``_auto_tts_enabled_chats``: chat explicitly opted in via ``/voice on``
# or ``/voice tts`` (mode is ``voice_only`` or ``all``). Fires even when
# the global default is False.
# - ``_auto_tts_disabled_chats``: chat explicitly opted out via
# ``/voice off`` (mode is ``off``). Suppresses auto-TTS even when the
# global default is True.
# The gate in _process_message() is:
# fire if chat in _auto_tts_enabled_chats
# OR (_auto_tts_default and chat not in _auto_tts_disabled_chats)
self._auto_tts_default: bool = False
self._auto_tts_enabled_chats: set = set()
self._auto_tts_disabled_chats: set = set()
# Chats where typing indicator is paused (e.g. during approval waits).
# _keep_typing skips send_typing when the chat_id is in this set.
@@ -944,6 +1156,21 @@ class BasePlatformAdapter(ABC):
def fatal_error_retryable(self) -> bool:
return self._fatal_error_retryable
def _should_auto_tts_for_chat(self, chat_id: str) -> bool:
"""Whether auto-TTS on voice input should fire for ``chat_id``.
Decision layers (Issue #16007):
1. Explicit ``/voice on`` or ``/voice tts`` always fire (even if
``voice.auto_tts`` is False).
2. Explicit ``/voice off`` never fire.
3. Fall back to the global ``voice.auto_tts`` config default.
"""
if chat_id in self._auto_tts_enabled_chats:
return True
if chat_id in self._auto_tts_disabled_chats:
return False
return bool(self._auto_tts_default)
def set_fatal_error_handler(self, handler: Callable[["BasePlatformAdapter"], Awaitable[None] | None]) -> None:
self._fatal_error_handler = handler
@@ -1127,6 +1354,27 @@ class BasePlatformAdapter(ABC):
"""
return SendResult(success=False, error="Not supported")
async def delete_message(
self,
chat_id: str,
message_id: str,
) -> bool:
"""
Delete a previously sent message. Optional platforms that don't
support deletion return ``False`` and callers fall back to leaving
the message in place.
Used by the stream consumer's fresh-final cleanup path (see
openclaw/openclaw#72038) to remove long-lived preview messages
after sending the completed reply as a fresh message so the
platform's visible timestamp reflects completion time.
Returns ``True`` on successful deletion, ``False`` otherwise.
Subclasses should override for platforms with a deletion API
(e.g. Telegram ``deleteMessage``).
"""
return False
async def send_typing(self, chat_id: str, metadata=None) -> None:
"""
Send a typing indicator.
@@ -1454,13 +1702,41 @@ class BasePlatformAdapter(ABC):
the agent is waiting for dangerous-command approval). This is critical
for Slack's Assistant API where ``assistant_threads_setStatus`` disables
the compose box pausing lets the user type ``/approve`` or ``/deny``.
Each ``send_typing`` call is bounded by a ~1.5s timeout so a slow
network round-trip can't stall the refresh cadence. Telegram- and
Discord-side typing expire after ~5s; if any individual send_typing
takes longer than the refresh interval, the bubble would die and
stay dead until that call returns. Abandoning the slow call lets
the next tick fire a fresh send_typing on schedule as long as
one of them succeeds within the 5s platform-side window, the bubble
stays visible across provider stalls / upstream API timeouts.
"""
# Bound each send_typing round-trip so the refresh cadence isn't
# gated on network health. Must stay below ``interval`` so a slow
# call gets abandoned before the next scheduled tick.
_send_typing_timeout = max(0.25, min(1.5, interval - 0.25))
try:
while True:
if stop_event is not None and stop_event.is_set():
return
if chat_id not in self._typing_paused:
await self.send_typing(chat_id, metadata=metadata)
try:
await asyncio.wait_for(
self.send_typing(chat_id, metadata=metadata),
timeout=_send_typing_timeout,
)
except asyncio.TimeoutError:
# Slow network — abandon this tick, keep the loop
# on schedule so the next send_typing fires fresh.
pass
except asyncio.CancelledError:
raise
except Exception as typing_err:
logger.debug(
"[%s] send_typing error (non-fatal): %s",
self.name, typing_err,
)
if stop_event is None:
await asyncio.sleep(interval)
continue
@@ -2111,12 +2387,14 @@ class BasePlatformAdapter(ABC):
logger.info("[%s] extract_local_files found %d file(s) in response", self.name, len(local_files))
# Auto-TTS: if voice message, generate audio FIRST (before sending text)
# Skipped when the chat has voice mode disabled (/voice off)
# Gated via ``_should_auto_tts_for_chat``: fires when the chat has
# an explicit ``/voice on|tts`` opt-in OR when ``voice.auto_tts`` is
# True globally and no ``/voice off`` has been issued.
_tts_path = None
if (event.message_type == MessageType.VOICE
if (self._should_auto_tts_for_chat(event.source.chat_id)
and event.message_type == MessageType.VOICE
and text_content
and not media_files
and event.source.chat_id not in self._auto_tts_disabled_chats):
and not media_files):
try:
from tools.tts_tool import text_to_speech_tool, check_tts_requirements
if check_tts_requirements():
+19 -2
View File
@@ -99,6 +99,7 @@ def _normalize_server_url(raw: str) -> str:
class BlueBubblesAdapter(BasePlatformAdapter):
platform = Platform.BLUEBUBBLES
SUPPORTS_MESSAGE_EDITING = False
MAX_MESSAGE_LENGTH = MAX_TEXT_LENGTH
def __init__(self, config: PlatformConfig):
@@ -391,6 +392,13 @@ class BlueBubblesAdapter(BasePlatformAdapter):
# Text sending
# ------------------------------------------------------------------
@staticmethod
def truncate_message(content: str, max_length: int = MAX_TEXT_LENGTH) -> List[str]:
# Use the base splitter but skip pagination indicators — iMessage
# bubbles flow naturally without "(1/3)" suffixes.
chunks = BasePlatformAdapter.truncate_message(content, max_length)
return [re.sub(r"\s*\(\d+/\d+\)$", "", c) for c in chunks]
async def send(
self,
chat_id: str,
@@ -398,10 +406,19 @@ class BlueBubblesAdapter(BasePlatformAdapter):
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
text = strip_markdown(content or "")
text = self.format_message(content)
if not text:
return SendResult(success=False, error="BlueBubbles send requires text")
chunks = self.truncate_message(text, max_length=self.MAX_MESSAGE_LENGTH)
# Split on paragraph breaks first (double newlines) so each thought
# becomes its own iMessage bubble, then truncate any that are still
# too long.
paragraphs = [p.strip() for p in re.split(r'\n\s*\n', text) if p.strip()]
chunks: List[str] = []
for para in (paragraphs or [text]):
if len(para) <= self.MAX_MESSAGE_LENGTH:
chunks.append(para)
else:
chunks.extend(self.truncate_message(para, max_length=self.MAX_MESSAGE_LENGTH))
last = SendResult(success=True)
for chunk in chunks:
guid = await self._resolve_chat_guid(chat_id)
+28 -32
View File
@@ -2246,10 +2246,6 @@ class DiscordAdapter(BasePlatformAdapter):
async def slash_usage(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/usage")
@tree.command(name="provider", description="Show available providers")
async def slash_provider(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/provider")
@tree.command(name="help", description="Show available commands")
async def slash_help(interaction: discord.Interaction):
await self._run_simple_slash(interaction, "/help")
@@ -2319,11 +2315,6 @@ class DiscordAdapter(BasePlatformAdapter):
async def slash_background(interaction: discord.Interaction, prompt: str):
await self._run_simple_slash(interaction, f"/background {prompt}", "Background task started~")
@tree.command(name="btw", description="Ephemeral side question using session context")
@discord.app_commands.describe(question="Your side question (no tools, not persisted)")
async def slash_btw(interaction: discord.Interaction, question: str):
await self._run_simple_slash(interaction, f"/btw {question}")
# ── Auto-register any gateway-available commands not yet on the tree ──
# This ensures new commands added to COMMAND_REGISTRY in
# hermes_cli/commands.py automatically appear as Discord slash
@@ -2688,21 +2679,8 @@ class DiscordAdapter(BasePlatformAdapter):
skills: ["skill-a", "skill-b"]
Also checks parent_id so forum threads inherit the forum's bindings.
"""
bindings = self.config.extra.get("channel_skill_bindings", [])
if not bindings:
return None
ids_to_check = {channel_id}
if parent_id:
ids_to_check.add(parent_id)
for entry in bindings:
entry_id = str(entry.get("id", ""))
if entry_id in ids_to_check:
skills = entry.get("skills") or entry.get("skill")
if isinstance(skills, str):
return [skills]
if isinstance(skills, list) and skills:
return list(dict.fromkeys(skills)) # dedup, preserve order
return None
from gateway.platforms.base import resolve_channel_skills
return resolve_channel_skills(self.config.extra, channel_id, parent_id)
def _resolve_channel_prompt(self, channel_id: str, parent_id: str | None = None) -> str | None:
"""Resolve a Discord per-channel prompt, preferring the exact channel over its parent."""
@@ -2719,7 +2697,12 @@ class DiscordAdapter(BasePlatformAdapter):
return os.getenv("DISCORD_REQUIRE_MENTION", "true").lower() not in ("false", "0", "no", "off")
def _discord_free_response_channels(self) -> set:
"""Return Discord channel IDs where no bot mention is required."""
"""Return Discord channel IDs where no bot mention is required.
A single ``"*"`` entry (either from a list or a comma-separated
string) is preserved in the returned set so callers can short-circuit
on wildcard membership, consistent with ``allowed_channels``.
"""
raw = self.config.extra.get("free_response_channels")
if raw is None:
raw = os.getenv("DISCORD_FREE_RESPONSE_CHANNELS", "")
@@ -3212,14 +3195,14 @@ class DiscordAdapter(BasePlatformAdapter):
allowed_channels_raw = os.getenv("DISCORD_ALLOWED_CHANNELS", "")
if allowed_channels_raw:
allowed_channels = {ch.strip() for ch in allowed_channels_raw.split(",") if ch.strip()}
if not (channel_ids & allowed_channels):
if "*" not in allowed_channels and not (channel_ids & allowed_channels):
logger.debug("[%s] Ignoring message in non-allowed channel: %s", self.name, channel_ids)
return
# Check ignored channels - never respond even when mentioned
ignored_channels_raw = os.getenv("DISCORD_IGNORED_CHANNELS", "")
ignored_channels = {ch.strip() for ch in ignored_channels_raw.split(",") if ch.strip()}
if channel_ids & ignored_channels:
if "*" in ignored_channels or (channel_ids & ignored_channels):
logger.debug("[%s] Ignoring message in ignored channel: %s", self.name, channel_ids)
return
@@ -3233,7 +3216,11 @@ class DiscordAdapter(BasePlatformAdapter):
voice_linked_ids = {str(ch_id) for ch_id in self._voice_text_channels.values()}
current_channel_id = str(message.channel.id)
is_voice_linked_channel = current_channel_id in voice_linked_ids
is_free_channel = bool(channel_ids & free_channels) or is_voice_linked_channel
is_free_channel = (
"*" in free_channels
or bool(channel_ids & free_channels)
or is_voice_linked_channel
)
# Skip the mention check if the message is in a thread where
# the bot has previously participated (auto-created or replied in).
@@ -3307,6 +3294,7 @@ class DiscordAdapter(BasePlatformAdapter):
chat_topic = self._get_effective_topic(message.channel, is_thread=is_thread)
# Build source
guild = getattr(message, "guild", None)
source = self.build_source(
chat_id=str(effective_channel.id),
chat_name=chat_name,
@@ -3316,7 +3304,7 @@ class DiscordAdapter(BasePlatformAdapter):
thread_id=thread_id,
chat_topic=chat_topic,
is_bot=getattr(message.author, "bot", False),
guild_id=str(message.guild.id) if message.guild else None,
guild_id=str(guild.id) if guild else None,
parent_chat_id=parent_channel_id,
message_id=str(message.id),
)
@@ -3870,6 +3858,15 @@ if DISCORD_AVAILABLE:
self.resolved = True
model_id = interaction.data["values"][0]
self.clear_items()
await interaction.response.edit_message(
embed=discord.Embed(
title="⚙ Switching Model",
description=f"Switching to `{model_id}`...",
color=discord.Color.blue(),
),
view=None,
)
try:
result_text = await self.on_model_selected(
@@ -3880,14 +3877,13 @@ if DISCORD_AVAILABLE:
except Exception as exc:
result_text = f"Error switching model: {exc}"
self.clear_items()
await interaction.response.edit_message(
await interaction.edit_original_response(
embed=discord.Embed(
title="⚙ Model Switched",
description=result_text,
color=discord.Color.green(),
),
view=self,
view=None,
)
async def _on_back(self, interaction: discord.Interaction):
+3
View File
@@ -28,6 +28,7 @@ from email.header import decode_header
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.base import MIMEBase
from email.utils import formatdate
from email import encoders
from pathlib import Path
from typing import Any, Dict, List, Optional
@@ -504,6 +505,7 @@ class EmailAdapter(BasePlatformAdapter):
msg["In-Reply-To"] = original_msg_id
msg["References"] = original_msg_id
msg["Date"] = formatdate(localtime=True)
msg_id = f"<hermes-{uuid.uuid4().hex[:12]}@{self._address.split('@')[1]}>"
msg["Message-ID"] = msg_id
@@ -586,6 +588,7 @@ class EmailAdapter(BasePlatformAdapter):
msg["In-Reply-To"] = original_msg_id
msg["References"] = original_msg_id
msg["Date"] = formatdate(localtime=True)
msg_id = f"<hermes-{uuid.uuid4().hex[:12]}@{self._address.split('@')[1]}>"
msg["Message-ID"] = msg_id
+9
View File
@@ -57,6 +57,15 @@ class MessageDeduplicator:
if len(self._seen) > self._max_size:
cutoff = now - self._ttl
self._seen = {k: v for k, v in self._seen.items() if v > cutoff}
if len(self._seen) > self._max_size:
# TTL pruning alone does not cap the cache when every entry is
# still fresh. Keep the newest entries so the helper's
# max_size bound is enforced under sustained traffic.
newest = sorted(
self._seen.items(),
key=lambda item: item[1],
)[-self._max_size:]
self._seen = dict(newest)
return False
def clear(self):
+87 -3
View File
@@ -532,6 +532,20 @@ class MatrixAdapter(BasePlatformAdapter):
)
await crypto_store.open()
# Bind the store to the runtime device_id before any
# put_account() runs. PgCryptoStore defaults _device_id
# to "" and its crypto_account UPSERT never updates the
# device_id column on conflict — so once put_account
# writes blank, it stays blank forever. That breaks
# every downstream device-scoped olm operation: peer
# to-device ciphertext can't find our identity key and
# no megolm sessions ever land. Setting _device_id here
# (in-memory; the on-disk row may not exist yet) makes
# the first put_account write the correct value.
# DeviceID is a NewType(str) so plain str works at runtime.
if client.device_id:
await crypto_store.put_device_id(client.device_id)
crypto_state = _CryptoStateStore(state_store, self._joined_rooms)
olm = OlmMachine(client, crypto_store, crypto_state)
@@ -1164,13 +1178,83 @@ class MatrixAdapter(BasePlatformAdapter):
# Event callbacks
# ------------------------------------------------------------------
def _is_self_sender(self, sender: str) -> bool:
"""Return True if the sender refers to the bot's own account.
Matrix user IDs are byte-compared after trimming whitespace and
lowercasing some homeservers normalize the localpart case
differently at different API surfaces, and the reply-loop tail
of the "hall of mirrors" bug (#15763) has been observed with the
bot's own account bypassing a case-sensitive equality check.
When ``self._user_id`` is empty (whoami hasn't resolved yet, or
login failed), we cannot prove a sender is NOT us, so we return
True defensively an unidentified bot dropping its own events
is always preferable to falling into an echo loop.
"""
own = (self._user_id or "").strip().lower()
if not own:
return True
return sender.strip().lower() == own
@staticmethod
def _is_system_or_bridge_sender(sender: str) -> bool:
"""Return True if the sender looks like a system / bridge / appservice
identity rather than a real user.
Appservice namespaces on Matrix conventionally prefix bot / puppet
user IDs with an underscore (e.g. ``@_telegram_12345:server``,
``@_discord_999:server``, ``@_slack_...:server``). Server-notices
bots and bridge-controller bots on many homeservers use the same
pattern.
We treat these as system identities for pairing purposes: they
should never be offered a pairing code, because an operator
approving the code would hand the bridge itself permanent
authorization and every outbound message relayed by the bridge
would then loop back into the agent as an "authorized user
message", which is the root of issue #15763.
Matches:
``@_something:server`` appservice namespace convention
``@:server`` malformed / empty localpart
``:server`` malformed, no leading ``@``
"""
s = (sender or "").strip()
if not s:
return True
# Localpart is everything between leading '@' and ':'
if s.startswith("@"):
s = s[1:]
if ":" in s:
localpart, _, _ = s.partition(":")
else:
localpart = s
if not localpart:
return True
return localpart.startswith("_")
async def _on_room_message(self, event: Any) -> None:
"""Handle incoming room message events (text, media)."""
room_id = str(getattr(event, "room_id", ""))
sender = str(getattr(event, "sender", ""))
# Ignore own messages.
if sender == self._user_id:
# Ignore own messages (case-insensitive; also drops when our own
# user_id hasn't been resolved yet — see _is_self_sender docstring
# and issue #15763).
if self._is_self_sender(sender):
return
# Ignore appservice / bridge / system identities so they never
# trigger the pairing flow. Once a bridge user is paired, every
# outbound message it relays would loop back as an authorized
# user message (the "hall of mirrors" in #15763).
if self._is_system_or_bridge_sender(sender):
logger.debug(
"Matrix: ignoring system/bridge sender %s in %s",
sender,
room_id,
)
return
# Deduplicate by event ID.
@@ -1640,7 +1724,7 @@ class MatrixAdapter(BasePlatformAdapter):
async def _on_reaction(self, event: Any) -> None:
"""Handle incoming reaction events."""
sender = str(getattr(event, "sender", ""))
if sender == self._user_id:
if self._is_self_sender(sender):
return
event_id = str(getattr(event, "event_id", ""))
if self._is_duplicate_event(event_id):
File diff suppressed because it is too large Load Diff
+47 -1
View File
@@ -703,7 +703,6 @@ class TelegramAdapter(BasePlatformAdapter):
"write_timeout": _env_float("HERMES_TELEGRAM_HTTP_WRITE_TIMEOUT", 20.0),
}
proxy_url = resolve_proxy_url("TELEGRAM_PROXY")
disable_fallback = (os.getenv("HERMES_TELEGRAM_DISABLE_FALLBACK_IPS", "").strip().lower() in ("1", "true", "yes", "on"))
fallback_ips = self._fallback_ips()
if not fallback_ips:
@@ -714,6 +713,8 @@ class TelegramAdapter(BasePlatformAdapter):
", ".join(fallback_ips),
)
proxy_targets = ["api.telegram.org", *fallback_ips]
proxy_url = resolve_proxy_url("TELEGRAM_PROXY", target_hosts=proxy_targets)
if fallback_ips and not proxy_url and not disable_fallback:
logger.info(
"[%s] Telegram fallback IPs active: %s",
@@ -1208,6 +1209,31 @@ class TelegramAdapter(BasePlatformAdapter):
)
return SendResult(success=False, error=str(e))
async def delete_message(self, chat_id: str, message_id: str) -> bool:
"""Delete a previously sent Telegram message.
Used by the stream consumer's fresh-final cleanup path (ported
from openclaw/openclaw#72038) to remove long-lived preview
messages after sending the completed reply as a fresh message.
Telegram's Bot API ``deleteMessage`` works for bot-posted
messages in the last 48 hours. Failures are non-fatal the
caller leaves the preview in place and logs at debug level.
"""
if not self._bot:
return False
try:
await self._bot.delete_message(
chat_id=int(chat_id),
message_id=int(message_id),
)
return True
except Exception as e:
logger.debug(
"[%s] Failed to delete Telegram message %s: %s",
self.name, message_id, e,
)
return False
async def send_update_prompt(
self, chat_id: str, prompt: str, default: str = "",
session_key: str = "",
@@ -2327,6 +2353,26 @@ class TelegramAdapter(BasePlatformAdapter):
user = getattr(entity, "user", None)
if user and getattr(user, "id", None) == bot_id:
return True
elif entity_type == "bot_command" and expected:
# Telegram's official group-disambiguation form for slash
# commands (``/cmd@botname``) is emitted as a single
# ``bot_command`` entity covering the whole span — there
# is no accompanying ``mention`` entity. Treat it as a
# direct address to this bot when the ``@botname`` suffix
# matches. This is the form Telegram's own command menu
# autocomplete produces in groups, so dropping it at the
# mention gate would break /new, /reset, /help, ... for
# every group that has ``require_mention`` enabled (#15415).
offset = int(getattr(entity, "offset", -1))
length = int(getattr(entity, "length", 0))
if offset < 0 or length <= 0:
continue
command_text = source_text[offset:offset + length]
at_index = command_text.find("@")
if at_index < 0:
continue
if command_text[at_index:].strip().lower() == expected:
return True
return False
def _message_matches_mention_patterns(self, message: Message) -> bool:
+3 -3
View File
@@ -43,10 +43,10 @@ _DOH_PROVIDERS: list[dict] = [
_SEED_FALLBACK_IPS: list[str] = ["149.154.167.220"]
def _resolve_proxy_url() -> str | None:
def _resolve_proxy_url(target_hosts=None) -> str | None:
# Delegate to shared implementation (env vars + macOS system proxy detection)
from gateway.platforms.base import resolve_proxy_url
return resolve_proxy_url("TELEGRAM_PROXY")
return resolve_proxy_url("TELEGRAM_PROXY", target_hosts=target_hosts)
class TelegramFallbackTransport(httpx.AsyncBaseTransport):
@@ -60,7 +60,7 @@ class TelegramFallbackTransport(httpx.AsyncBaseTransport):
def __init__(self, fallback_ips: Iterable[str], **transport_kwargs):
self._fallback_ips = [ip for ip in dict.fromkeys(_normalize_fallback_ips(fallback_ips))]
proxy_url = _resolve_proxy_url()
proxy_url = _resolve_proxy_url(target_hosts=[_TELEGRAM_API_HOST, *self._fallback_ips])
if proxy_url and "proxy" not in transport_kwargs:
transport_kwargs["proxy"] = proxy_url
self._primary = httpx.AsyncHTTPTransport(**transport_kwargs)
File diff suppressed because it is too large Load Diff
+647
View File
@@ -0,0 +1,647 @@
"""
yuanbao_media.py 元宝平台媒体处理模块
提供 COS 上传文件下载TIM 媒体消息构建等功能
移植自 TypeScript media.tsyuanbao-openclaw-plugin
使用 httpx 替代 cos-nodejs-sdk-v5避免引入额外 SDK 依赖
COS 上传流程
1. 调用 genUploadInfo 获取临时凭证tmpSecretId/tmpSecretKey/sessionToken
2. 用临时凭证通过 HMAC-SHA1 签名构建 Authorization
3. HTTP PUT 上传到 COS
TIM 消息体构建
- buildImageMsgBody() TIMImageElem
- buildFileMsgBody() TIMFileElem
"""
from __future__ import annotations
import hashlib
import hmac
import logging
import os
import re
import secrets
import struct
import time
import urllib.parse
from datetime import datetime, timezone, timedelta
from typing import Optional, Any
import httpx
logger = logging.getLogger(__name__)
# ============ 常量 ============
UPLOAD_INFO_PATH = "/api/resource/genUploadInfo"
DEFAULT_API_DOMAIN = "yuanbao.tencent.com"
DEFAULT_MAX_SIZE_MB = 50
# COS 加速域名后缀(优先使用全球加速)
COS_USE_ACCELERATE = True
# ============ 类型映射 ============
# MIME → image_format 数字(TIM 协议字段)
_MIME_TO_IMAGE_FORMAT: dict[str, int] = {
"image/jpeg": 1,
"image/jpg": 1,
"image/gif": 2,
"image/png": 3,
"image/bmp": 4,
"image/webp": 255,
"image/heic": 255,
"image/tiff": 255,
}
# 文件扩展名 → MIME
_EXT_TO_MIME: dict[str, str] = {
".jpg": "image/jpeg",
".jpeg": "image/jpeg",
".png": "image/png",
".gif": "image/gif",
".webp": "image/webp",
".bmp": "image/bmp",
".heic": "image/heic",
".tiff": "image/tiff",
".ico": "image/x-icon",
".pdf": "application/pdf",
".doc": "application/msword",
".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
".xls": "application/vnd.ms-excel",
".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
".ppt": "application/vnd.ms-powerpoint",
".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
".txt": "text/plain",
".zip": "application/zip",
".tar": "application/x-tar",
".gz": "application/gzip",
".mp3": "audio/mpeg",
".mp4": "video/mp4",
".wav": "audio/wav",
".ogg": "audio/ogg",
".webm": "video/webm",
}
# ============ 工具函数 ============
def guess_mime_type(filename: str) -> str:
"""根据文件扩展名猜测 MIME 类型。"""
ext = os.path.splitext(filename)[-1].lower()
return _EXT_TO_MIME.get(ext, "application/octet-stream")
def is_image(filename: str, mime_type: str = "") -> bool:
"""判断是否为图片类型。"""
if mime_type.startswith("image/"):
return True
ext = os.path.splitext(filename)[-1].lower()
return ext in {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".heic", ".tiff", ".ico"}
def get_image_format(mime_type: str) -> int:
"""获取 TIM 图片格式编号。"""
return _MIME_TO_IMAGE_FORMAT.get(mime_type.lower(), 255)
def md5_hex(data: bytes) -> str:
"""计算 MD5 十六进制摘要。"""
return hashlib.md5(data).hexdigest()
def generate_file_id() -> str:
"""生成随机文件 ID(32 位 hex)。"""
return secrets.token_hex(16)
# ============ 图片尺寸解析(纯 Python,无需 Pillow ============
def parse_image_size(data: bytes) -> Optional[dict[str, int]]:
"""
解析图片宽高支持 JPEG/PNG/GIF/WebP无需第三方依赖
返回 {"width": w, "height": h} None无法识别
"""
return (
_parse_png_size(data)
or _parse_jpeg_size(data)
or _parse_gif_size(data)
or _parse_webp_size(data)
)
def _parse_png_size(buf: bytes) -> Optional[dict[str, int]]:
if len(buf) < 24:
return None
if buf[:4] != b"\x89PNG":
return None
w = struct.unpack(">I", buf[16:20])[0]
h = struct.unpack(">I", buf[20:24])[0]
return {"width": w, "height": h}
def _parse_jpeg_size(buf: bytes) -> Optional[dict[str, int]]:
if len(buf) < 4 or buf[0] != 0xFF or buf[1] != 0xD8:
return None
i = 2
while i < len(buf) - 9:
if buf[i] != 0xFF:
i += 1
continue
marker = buf[i + 1]
if marker in (0xC0, 0xC2):
h = struct.unpack(">H", buf[i + 5: i + 7])[0]
w = struct.unpack(">H", buf[i + 7: i + 9])[0]
return {"width": w, "height": h}
if i + 3 < len(buf):
i += 2 + struct.unpack(">H", buf[i + 2: i + 4])[0]
else:
break
return None
def _parse_gif_size(buf: bytes) -> Optional[dict[str, int]]:
if len(buf) < 10:
return None
sig = buf[:6].decode("ascii", errors="replace")
if sig not in ("GIF87a", "GIF89a"):
return None
w = struct.unpack("<H", buf[6:8])[0]
h = struct.unpack("<H", buf[8:10])[0]
return {"width": w, "height": h}
def _parse_webp_size(buf: bytes) -> Optional[dict[str, int]]:
if len(buf) < 16:
return None
if buf[:4] != b"RIFF" or buf[8:12] != b"WEBP":
return None
chunk = buf[12:16].decode("ascii", errors="replace")
if chunk == "VP8 ":
if len(buf) >= 30 and buf[23] == 0x9D and buf[24] == 0x01 and buf[25] == 0x2A:
w = struct.unpack("<H", buf[26:28])[0] & 0x3FFF
h = struct.unpack("<H", buf[28:30])[0] & 0x3FFF
return {"width": w, "height": h}
elif chunk == "VP8L":
if len(buf) >= 25 and buf[20] == 0x2F:
bits = struct.unpack("<I", buf[21:25])[0]
w = (bits & 0x3FFF) + 1
h = ((bits >> 14) & 0x3FFF) + 1
return {"width": w, "height": h}
elif chunk == "VP8X":
if len(buf) >= 30:
w = (buf[24] | (buf[25] << 8) | (buf[26] << 16)) + 1
h = (buf[27] | (buf[28] << 8) | (buf[29] << 16)) + 1
return {"width": w, "height": h}
return None
# ============ URL 下载 ============
async def download_url(
url: str,
max_size_mb: int = DEFAULT_MAX_SIZE_MB,
) -> tuple[bytes, str]:
"""
下载 URL 内容返回 (bytes, content_type)
Args:
url: HTTP(S) URL
max_size_mb: 最大允许大小MB超过则抛出异常
Returns:
(data_bytes, content_type_string)
Raises:
ValueError: 内容超过大小限制
httpx.HTTPError: 网络/HTTP 错误
"""
max_bytes = max_size_mb * 1024 * 1024
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
# 先 HEAD 检查大小
try:
head = await client.head(url)
content_length = int(head.headers.get("content-length", 0) or 0)
if content_length > 0 and content_length > max_bytes:
raise ValueError(
f"文件过大: {content_length / 1024 / 1024:.1f} MB > {max_size_mb} MB"
)
except httpx.HTTPStatusError:
pass # 部分服务器不支持 HEAD,忽略
# GET 下载(流式读取,防止超限)
async with client.stream("GET", url) as resp:
resp.raise_for_status()
content_type = resp.headers.get("content-type", "").split(";")[0].strip()
chunks: list[bytes] = []
downloaded = 0
async for chunk in resp.aiter_bytes(65536):
downloaded += len(chunk)
if downloaded > max_bytes:
raise ValueError(
f"文件过大: 已超过 {max_size_mb} MB 限制"
)
chunks.append(chunk)
data = b"".join(chunks)
return data, content_type
# ============ COS 鉴权(HMAC-SHA1 ============
def _cos_sign(
method: str,
path: str,
params: dict[str, str],
headers: dict[str, str],
secret_id: str,
secret_key: str,
start_time: Optional[int] = None,
expire_seconds: int = 3600,
) -> str:
"""
构建 COS 请求签名q-sign-algorithm=sha1 方案
参考https://cloud.tencent.com/document/product/436/7778
Args:
method: HTTP 方法小写 "put"
path: URL 路径URL encode 后的小写
params: URL 查询参数 dict用于签名
headers: 参与签名的请求头 dictkey 需小写
secret_id: 临时 SecretIdtmpSecretId
secret_key: 临时 SecretKeytmpSecretKey
start_time: 签名起始 Unix 时间戳默认 now
expire_seconds: 签名有效期默认 3600
Returns:
Authorization header 完整字符串
"""
now = int(time.time())
q_sign_time = f"{start_time or now};{(start_time or now) + expire_seconds}"
# Step 1: SignKey = HMAC-SHA1(SecretKey, q-sign-time)
sign_key = hmac.new(
secret_key.encode("utf-8"),
q_sign_time.encode("utf-8"),
hashlib.sha1,
).hexdigest()
# Step 2: HttpString
# 参数和头部需按字典序排列,key 小写
sorted_params = sorted((k.lower(), urllib.parse.quote(str(v), safe="") ) for k, v in params.items())
sorted_headers = sorted((k.lower(), urllib.parse.quote(str(v), safe="") ) for k, v in headers.items())
url_param_list = ";".join(k for k, _ in sorted_params)
url_params = "&".join(f"{k}={v}" for k, v in sorted_params)
header_list = ";".join(k for k, _ in sorted_headers)
header_str = "&".join(f"{k}={v}" for k, v in sorted_headers)
http_string = "\n".join([
method.lower(),
path,
url_params,
header_str,
"",
])
# Step 3: StringToSign = sha1 hash of HttpString
sha1_of_http = hashlib.sha1(http_string.encode("utf-8")).hexdigest()
string_to_sign = "\n".join([
"sha1",
q_sign_time,
sha1_of_http,
"",
])
# Step 4: Signature = HMAC-SHA1(SignKey, StringToSign)
signature = hmac.new(
sign_key.encode("utf-8"),
string_to_sign.encode("utf-8"),
hashlib.sha1,
).hexdigest()
return (
f"q-sign-algorithm=sha1"
f"&q-ak={secret_id}"
f"&q-sign-time={q_sign_time}"
f"&q-key-time={q_sign_time}"
f"&q-header-list={header_list}"
f"&q-url-param-list={url_param_list}"
f"&q-signature={signature}"
)
# ============ 主要公开 API ============
async def get_cos_credentials(
app_key: str,
api_domain: str,
token: str,
filename: str = "file",
file_id: Optional[str] = None,
bot_id: str = "",
route_env: str = "",
) -> dict:
"""
调用 genUploadInfo 接口获取 COS 临时密钥及上传配置
Args:
app_key: 应用 Key用于 X-ID
api_domain: API 域名 https://bot.yuanbao.tencent.com
token: 当前有效的签票 tokenX-Token
filename: 待上传的文件名含扩展名
file_id: 客户端生成的唯一文件 ID不传则自动生成
bot_id: Bot 账号 ID用于 X-ID
Returns:
COS 上传配置 dict包含以下字段
bucketName (str) COS Bucket 名称
region (str) COS 地域
location (str) 上传 Key对象路径
encryptTmpSecretId (str) 临时 SecretId
encryptTmpSecretKey(str) 临时 SecretKey
encryptToken (str) SessionToken
startTime (int) 凭证起始时间戳Unix
expiredTime (int) 凭证过期时间戳Unix
resourceUrl (str) 上传后的公网访问 URL
resourceID (str) 资源 ID可选
Raises:
RuntimeError: 接口返回非 0 code 或字段缺失
"""
if file_id is None:
file_id = generate_file_id()
upload_url = f"{api_domain.rstrip('/')}{UPLOAD_INFO_PATH}"
headers = {
"Content-Type": "application/json",
"X-Token": token,
"X-ID": bot_id or app_key,
"X-Source": "web",
}
if route_env:
headers["X-Route-Env"] = route_env
body = {
"fileName": filename,
"fileId": file_id,
"docFrom": "localDoc",
"docOpenId": "",
}
async with httpx.AsyncClient(timeout=15.0) as client:
resp = await client.post(upload_url, json=body, headers=headers)
resp.raise_for_status()
result: dict[str, Any] = resp.json()
code = result.get("code")
if code != 0 and code is not None:
raise RuntimeError(
f"genUploadInfo 失败: code={code}, msg={result.get('msg', '')}"
)
data = result.get("data") or result
required_fields = ["bucketName", "location"]
missing = [f for f in required_fields if not data.get(f)]
if missing:
raise RuntimeError(
f"genUploadInfo 返回字段不完整: 缺少字段 {missing}"
)
return data
async def upload_to_cos(
file_bytes: bytes,
filename: str,
content_type: str,
credentials: dict,
bucket: str,
region: str,
) -> dict:
"""
通过 httpx PUT 请求将文件上传到 COS
使用临时凭证tmpSecretId/tmpSecretKey/sessionToken构建 HMAC-SHA1 签名
Args:
file_bytes: 文件二进制内容
filename: 文件名用于辅助计算 MIMEUUID
content_type: MIME 类型 "image/jpeg"
credentials: get_cos_credentials() 返回的 dict包含
encryptTmpSecretId tmpSecretId
encryptTmpSecretKey tmpSecretKey
encryptToken sessionToken
location COS key对象路径
resourceUrl 上传后公网 URL
startTime 凭证起始时间Unix
expiredTime 凭证过期时间Unix
bucket: COS Bucket 名称 chatbot-1234567890
region: COS 地域 ap-guangzhou
Returns:
上传结果 dict包含
url (str) COS 公网访问 URL
uuid (str) 文件内容 MD5
size (int) 文件大小字节
width (int, optional) 图片宽度仅图片
height (int, optional) 图片高度仅图片
Raises:
httpx.HTTPStatusError: COS 返回非 2xx 状态
RuntimeError: credentials 字段缺失
"""
secret_id: str = credentials.get("encryptTmpSecretId", "")
secret_key: str = credentials.get("encryptTmpSecretKey", "")
session_token: str = credentials.get("encryptToken", "")
cos_key: str = credentials.get("location", "")
resource_url: str = credentials.get("resourceUrl", "")
start_time: Optional[int] = credentials.get("startTime")
expired_time: Optional[int] = credentials.get("expiredTime")
if not secret_id or not secret_key or not cos_key:
raise RuntimeError(
f"COS credentials 不完整: secretId={bool(secret_id)}, "
f"secretKey={bool(secret_key)}, location={bool(cos_key)}"
)
# 构建 COS 上传 URL(优先使用全球加速域名)
if COS_USE_ACCELERATE:
cos_host = f"{bucket}.cos.accelerate.myqcloud.com"
else:
cos_host = f"{bucket}.cos.{region}.myqcloud.com"
# URL encode cos_key(保留 /
encoded_key = urllib.parse.quote(cos_key, safe="/")
cos_url = f"https://{cos_host}/{encoded_key.lstrip('/')}"
# 确定 Content-Type
if not content_type or content_type == "application/octet-stream":
if is_image(filename):
content_type = guess_mime_type(filename)
else:
content_type = "application/octet-stream"
# 计算文件 MD5 + size
file_uuid = md5_hex(file_bytes)
file_size = len(file_bytes)
# 参与签名的请求头
sign_headers = {
"host": cos_host,
"content-type": content_type,
"x-cos-security-token": session_token,
}
# 计算签名有效期
now = int(time.time())
sign_start = start_time if start_time else now
sign_expire = (expired_time - now) if expired_time and expired_time > now else 3600
authorization = _cos_sign(
method="put",
path=f"/{encoded_key.lstrip('/')}",
params={},
headers=sign_headers,
secret_id=secret_id,
secret_key=secret_key,
start_time=sign_start,
expire_seconds=sign_expire,
)
put_headers = {
"Authorization": authorization,
"Content-Type": content_type,
"x-cos-security-token": session_token,
}
logger.info(
"COS PUT: bucket=%s region=%s key=%s size=%d mime=%s",
bucket, region, cos_key, file_size, content_type,
)
async with httpx.AsyncClient(timeout=120.0) as client:
resp = await client.put(
cos_url,
content=file_bytes,
headers=put_headers,
)
resp.raise_for_status()
# 解析图片尺寸(仅图片类型)
result: dict[str, Any] = {
"url": resource_url or cos_url,
"uuid": file_uuid,
"size": file_size,
}
if content_type.startswith("image/"):
size_info = parse_image_size(file_bytes)
if size_info:
result["width"] = size_info["width"]
result["height"] = size_info["height"]
logger.info(
"COS 上传成功: url=%s size=%d",
result["url"], file_size,
)
return result
# ============ TIM 媒体消息构建 ============
def build_image_msg_body(
url: str,
uuid: Optional[str] = None,
filename: Optional[str] = None,
size: int = 0,
width: int = 0,
height: int = 0,
mime_type: str = "",
) -> list[dict]:
"""
构建腾讯 IM TIMImageElem 消息体
参考https://cloud.tencent.com/document/product/269/2720
Args:
url: 图片公网访问 URLCOS resourceUrl
uuid: 文件 UUIDMD5 或其他唯一标识
filename: 文件名uuid 为空时作为备用
size: 文件大小字节
width: 图片宽度像素
height: 图片高度像素
mime_type: MIME 类型用于确定 image_format
Returns:
TIMImageElem 消息体列表适合直接放入 msg_body
"""
_uuid = uuid or filename or _basename_from_url(url) or "image"
image_format = get_image_format(mime_type) if mime_type else 255
return [
{
"msg_type": "TIMImageElem",
"msg_content": {
"uuid": _uuid,
"image_format": image_format,
"image_info_array": [
{
"type": 1, # 1 = 原图
"size": size,
"width": width,
"height": height,
"url": url,
}
],
},
}
]
def build_file_msg_body(
url: str,
filename: str,
uuid: Optional[str] = None,
size: int = 0,
) -> list[dict]:
"""
构建腾讯 IM TIMFileElem 消息体
参考https://cloud.tencent.com/document/product/269/2720
Args:
url: 文件公网访问 URLCOS resourceUrl
filename: 文件名含扩展名
uuid: 文件 UUIDMD5 或其他唯一标识不传则使用 filename
size: 文件大小字节
Returns:
TIMFileElem 消息体列表适合直接放入 msg_body
"""
_uuid = uuid or filename
return [
{
"msg_type": "TIMFileElem",
"msg_content": {
"uuid": _uuid,
"file_name": filename,
"file_size": size,
"url": url,
},
}
]
# ============ 内部工具 ============
def _basename_from_url(url: str) -> str:
"""从 URL 提取文件名。"""
try:
parsed = urllib.parse.urlparse(url)
return os.path.basename(parsed.path)
except Exception:
return ""
File diff suppressed because it is too large Load Diff
+558
View File
@@ -0,0 +1,558 @@
"""
Yuanbao sticker (TIMFaceElem) support.
Ported from yuanbao-openclaw-plugin/src/sticker/.
TIMFaceElem wire format:
{
"msg_type": "TIMFaceElem",
"msg_content": {
"index": 0, # always 0 per Yuanbao convention
"data": "<json>", # serialised sticker metadata
}
}
The `data` field carries a JSON string with the sticker's metadata so the
receiver can look up the correct asset in the emoji pack.
"""
from __future__ import annotations
import json
import random
import re
import unicodedata
from typing import Optional
# ---------------------------------------------------------------------------
# Sticker catalogue ported from builtin-stickers.json
# Key : canonical name (Chinese)
# Value : {sticker_id, package_id, name, description, width, height, formats}
# ---------------------------------------------------------------------------
STICKER_MAP: dict[str, dict] = {
"六六六": {
"sticker_id": "278", "package_id": "1003", "name": "六六六",
"description": "666 厉害 牛 棒 绝了 好强 awesome",
"width": 128, "height": 128, "formats": "png",
},
"我想开了": {
"sticker_id": "262", "package_id": "1003", "name": "我想开了",
"description": "想开 佛系 释怀 顿悟 看淡了 无所谓",
"width": 128, "height": 128, "formats": "png",
},
"害羞": {
"sticker_id": "130", "package_id": "1003", "name": "害羞",
"description": "腼腆 不好意思 脸红 娇羞 羞涩 捂脸",
"width": 128, "height": 128, "formats": "png",
},
"比心": {
"sticker_id": "252", "package_id": "1003", "name": "比心",
"description": "笔芯 爱你 爱心手势 love heart 喜欢你",
"width": 128, "height": 128, "formats": "png",
},
"委屈": {
"sticker_id": "125", "package_id": "1003", "name": "委屈",
"description": "难过 想哭 可怜巴巴 瘪嘴 受伤 被欺负",
"width": 128, "height": 128, "formats": "png",
},
"亲亲": {
"sticker_id": "146", "package_id": "1003", "name": "亲亲",
"description": "么么 mua 亲一下 kiss 飞吻 啵",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "131", "package_id": "1003", "name": "",
"description": "帅 墨镜 cool 高冷 有型 swagger",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "145", "package_id": "1003", "name": "",
"description": "睡觉 困 zzZ 打盹 躺平 休眠 sleepy",
"width": 128, "height": 128, "formats": "png",
},
"发呆": {
"sticker_id": "152", "package_id": "1003", "name": "发呆",
"description": "懵 愣住 放空 呆滞 出神 脑子空白",
"width": 128, "height": 128, "formats": "png",
},
"可怜": {
"sticker_id": "157", "package_id": "1003", "name": "可怜",
"description": "卖萌 求饶 委屈巴巴 弱小 拜托 眼巴巴",
"width": 128, "height": 128, "formats": "png",
},
"摊手": {
"sticker_id": "200", "package_id": "1003", "name": "摊手",
"description": "无奈 没办法 耸肩 随便 那咋整 whatever",
"width": 128, "height": 128, "formats": "png",
},
"头大": {
"sticker_id": "213", "package_id": "1003", "name": "头大",
"description": "头疼 烦恼 郁闷 难搞 崩溃 一团乱",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "256", "package_id": "1003", "name": "",
"description": "害怕 惊恐 震惊 吓一跳 恐怖 怂",
"width": 128, "height": 128, "formats": "png",
},
"吐血": {
"sticker_id": "203", "package_id": "1003", "name": "吐血",
"description": "无语 崩溃 被雷 内伤 一口老血 屮",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "185", "package_id": "1003", "name": "",
"description": "傲娇 生气 不满 撇嘴 不理 赌气",
"width": 128, "height": 128, "formats": "png",
},
"嘿嘿": {
"sticker_id": "220", "package_id": "1003", "name": "嘿嘿",
"description": "坏笑 猥琐笑 偷笑 憨笑 得意 你懂的",
"width": 128, "height": 128, "formats": "png",
},
"头秃": {
"sticker_id": "218", "package_id": "1003", "name": "头秃",
"description": "程序员 加班 焦虑 没头发 秃了 肝爆",
"width": 128, "height": 128, "formats": "png",
},
"暗中观察": {
"sticker_id": "221", "package_id": "1003", "name": "暗中观察",
"description": "窥屏 潜水 偷偷看 角落 围观 屏住呼吸",
"width": 128, "height": 128, "formats": "png",
},
"我酸了": {
"sticker_id": "224", "package_id": "1003", "name": "我酸了",
"description": "嫉妒 柠檬精 羡慕 吃柠檬 眼红 恰柠檬",
"width": 128, "height": 128, "formats": "png",
},
"打call": {
"sticker_id": "246", "package_id": "1003", "name": "打call",
"description": "应援 加油 支持 喝彩 助威 call",
"width": 128, "height": 128, "formats": "png",
},
"庆祝": {
"sticker_id": "251", "package_id": "1003", "name": "庆祝",
"description": "祝贺 开心 耶 party 胜利 干杯",
"width": 128, "height": 128, "formats": "png",
},
"奋斗": {
"sticker_id": "151", "package_id": "1003", "name": "奋斗",
"description": "努力 加油 拼搏 冲 干劲 卷起来",
"width": 128, "height": 128, "formats": "png",
},
"惊讶": {
"sticker_id": "143", "package_id": "1003", "name": "惊讶",
"description": "震惊 哇 不敢相信 OMG 居然 这么离谱",
"width": 128, "height": 128, "formats": "png",
},
"疑问": {
"sticker_id": "144", "package_id": "1003", "name": "疑问",
"description": "问号 不懂 啥 为什么 啥情况 懵逼问",
"width": 128, "height": 128, "formats": "png",
},
"仔细分析": {
"sticker_id": "248", "package_id": "1003", "name": "仔细分析",
"description": "思考 推敲 认真 研究 琢磨 让我想想",
"width": 128, "height": 128, "formats": "png",
},
"撅嘴": {
"sticker_id": "184", "package_id": "1003", "name": "撅嘴",
"description": "嘟嘴 卖萌 不高兴 撒娇 嘴翘",
"width": 128, "height": 128, "formats": "png",
},
"泪奔": {
"sticker_id": "199", "package_id": "1003", "name": "泪奔",
"description": "大哭 伤心 破防 感动哭 泪流满面 呜呜",
"width": 128, "height": 128, "formats": "png",
},
"尊嘟假嘟": {
"sticker_id": "276", "package_id": "1003", "name": "尊嘟假嘟",
"description": "真的假的 真假 可爱问 你骗我 是不是",
"width": 128, "height": 128, "formats": "png",
},
"略略略": {
"sticker_id": "113", "package_id": "1003", "name": "略略略",
"description": "调皮 吐舌 不服 略 气死你 鬼脸",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "180", "package_id": "1003", "name": "",
"description": "想睡 倦 打哈欠 睁不开眼 好困啊 sleepy",
"width": 128, "height": 128, "formats": "png",
},
"折磨": {
"sticker_id": "181", "package_id": "1003", "name": "折磨",
"description": "难受 痛苦 煎熬 蚌埠住了 受不了 要命",
"width": 128, "height": 128, "formats": "png",
},
"抠鼻": {
"sticker_id": "182", "package_id": "1003", "name": "抠鼻",
"description": "不屑 无聊 淡定 无所谓 鄙视 挖鼻",
"width": 128, "height": 128, "formats": "png",
},
"鼓掌": {
"sticker_id": "183", "package_id": "1003", "name": "鼓掌",
"description": "拍手 叫好 赞同 666 喝彩 掌声",
"width": 128, "height": 128, "formats": "png",
},
"斜眼笑": {
"sticker_id": "204", "package_id": "1003", "name": "斜眼笑",
"description": "滑稽 坏笑 doge 意味深长 阴阳怪气 嘿嘿嘿",
"width": 128, "height": 128, "formats": "png",
},
"辣眼睛": {
"sticker_id": "216", "package_id": "1003", "name": "辣眼睛",
"description": "看不下去 cringe 毁三观 太丑了 瞎了",
"width": 128, "height": 128, "formats": "png",
},
"哦哟": {
"sticker_id": "217", "package_id": "1003", "name": "哦哟",
"description": "惊讶 起哄 哇哦 有戏 不简单 哟",
"width": 128, "height": 128, "formats": "png",
},
"吃瓜": {
"sticker_id": "222", "package_id": "1003", "name": "吃瓜",
"description": "围观 看戏 八卦 路人 看热闹 板凳",
"width": 128, "height": 128, "formats": "png",
},
"狗头": {
"sticker_id": "225", "package_id": "1003", "name": "狗头",
"description": "doge 保命 开玩笑 滑稽 反讽 懂的都懂",
"width": 128, "height": 128, "formats": "png",
},
"敬礼": {
"sticker_id": "227", "package_id": "1003", "name": "敬礼",
"description": "salute 尊重 收到 遵命 致敬 报告",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "231", "package_id": "1003", "name": "",
"description": "知道了 明白 敷衍 嗯 这样啊 收到",
"width": 128, "height": 128, "formats": "png",
},
"拿到红包": {
"sticker_id": "236", "package_id": "1003", "name": "拿到红包",
"description": "红包 谢谢老板 发财 开心 抢到了 欧气",
"width": 128, "height": 128, "formats": "png",
},
"牛吖": {
"sticker_id": "239", "package_id": "1003", "name": "牛吖",
"description": "牛 厉害 强 666 佩服 大佬",
"width": 128, "height": 128, "formats": "png",
},
"贴贴": {
"sticker_id": "272", "package_id": "1003", "name": "贴贴",
"description": "抱抱 亲昵 蹭蹭 亲密 靠靠 撒娇贴",
"width": 128, "height": 128, "formats": "png",
},
"爱心": {
"sticker_id": "138", "package_id": "1003", "name": "爱心",
"description": "心 love 喜欢你 红心 示爱 么么哒",
"width": 128, "height": 128, "formats": "png",
},
"晚安": {
"sticker_id": "170", "package_id": "1003", "name": "晚安",
"description": "好梦 睡了 night 早点休息 安啦 moon",
"width": 128, "height": 128, "formats": "png",
},
"太阳": {
"sticker_id": "176", "package_id": "1003", "name": "太阳",
"description": "晴天 早上好 阳光 morning 好天气 日",
"width": 128, "height": 128, "formats": "png",
},
"柠檬": {
"sticker_id": "266", "package_id": "1003", "name": "柠檬",
"description": "酸 嫉妒 柠檬精 羡慕 我酸 恰柠檬",
"width": 128, "height": 128, "formats": "png",
},
"大冤种": {
"sticker_id": "267", "package_id": "1003", "name": "大冤种",
"description": "倒霉 吃亏 自嘲 好心没好报 背锅 工具人",
"width": 128, "height": 128, "formats": "png",
},
"吐了": {
"sticker_id": "132", "package_id": "1003", "name": "吐了",
"description": "恶心 yue 受不了 嫌弃 想吐 生理不适",
"width": 128, "height": 128, "formats": "png",
},
"": {
"sticker_id": "134", "package_id": "1003", "name": "",
"description": "生气 愤怒 火大 暴躁 气炸 怼",
"width": 128, "height": 128, "formats": "png",
},
"玫瑰": {
"sticker_id": "165", "package_id": "1003", "name": "玫瑰",
"description": "花 示爱 表白 浪漫 送你花 情人节",
"width": 128, "height": 128, "formats": "png",
},
"凋谢": {
"sticker_id": "119", "package_id": "1003", "name": "凋谢",
"description": "花谢 失恋 难过 枯萎 心碎 凉了",
"width": 128, "height": 128, "formats": "png",
},
"点赞": {
"sticker_id": "159", "package_id": "1003", "name": "点赞",
"description": "赞 认同 好棒 good like 大拇指 顶",
"width": 128, "height": 128, "formats": "png",
},
"握手": {
"sticker_id": "164", "package_id": "1003", "name": "握手",
"description": "合作 你好 商务 hello deal 成交 友好",
"width": 128, "height": 128, "formats": "png",
},
"抱拳": {
"sticker_id": "163", "package_id": "1003", "name": "抱拳",
"description": "谢谢 失敬 江湖 承让 拜托 有礼",
"width": 128, "height": 128, "formats": "png",
},
"ok": {
"sticker_id": "169", "package_id": "1003", "name": "ok",
"description": "好的 收到 没问题 okay 行 可以 懂了",
"width": 128, "height": 128, "formats": "png",
},
"拳头": {
"sticker_id": "174", "package_id": "1003", "name": "拳头",
"description": "加油 干 冲 fight 力量 击拳 硬气",
"width": 128, "height": 128, "formats": "png",
},
"鞭炮": {
"sticker_id": "191", "package_id": "1003", "name": "鞭炮",
"description": "过年 喜庆 爆竹 春节 噼里啪啦 红",
"width": 128, "height": 128, "formats": "png",
},
"烟花": {
"sticker_id": "258", "package_id": "1003", "name": "烟花",
"description": "庆典 漂亮 新年 嘭 绽放 节日快乐",
"width": 128, "height": 128, "formats": "png",
},
}
def get_sticker_by_name(name: str) -> Optional[dict]:
"""
按名称查找贴纸支持模糊匹配
匹配优先级
1. 完全相等name
2. name 包含查询词前缀/子串
3. description 包含查询词同义词搜索
4. 通用模糊评分 sticker-search 同算法命中即返回得分最高的一条
返回 sticker dict找不到返回 None
"""
if not name:
return None
query = name.strip()
if query in STICKER_MAP:
return STICKER_MAP[query]
for key, sticker in STICKER_MAP.items():
if query in key or key in query:
return sticker
for sticker in STICKER_MAP.values():
desc = sticker.get("description", "")
if query in desc:
return sticker
matches = search_stickers(query, limit=1)
return matches[0] if matches else None
def get_random_sticker(category: str = None) -> dict:
"""
随机返回一个贴纸
若指定 category则在 description 中含有该关键词的贴纸里随机选取
category None 时从全表随机
"""
if category:
candidates = [
s for s in STICKER_MAP.values()
if category in s.get("description", "") or category in s.get("name", "")
]
if candidates:
return random.choice(candidates)
return random.choice(list(STICKER_MAP.values()))
def get_sticker_by_id(sticker_id: str) -> Optional[dict]:
"""按 sticker_id 精确查找贴纸。"""
if not sticker_id:
return None
sid = str(sticker_id).strip()
for sticker in STICKER_MAP.values():
if sticker.get("sticker_id") == sid:
return sticker
return None
# ---------------------------------------------------------------------------
# 模糊搜索(对齐 chatbot-web yuanbao-openclaw-plugin/sticker-cache.ts.searchStickers
# ---------------------------------------------------------------------------
_PUNCT_RE = re.compile(r"[\s\u3000\-_·.,,。!?\"“”'‘’、/\\]+")
def _normalize_text(raw: str) -> str:
return unicodedata.normalize("NFKC", str(raw or "")).strip().lower()
def _compact_text(raw: str) -> str:
return _PUNCT_RE.sub("", _normalize_text(raw))
def _multiset_char_hit_ratio(needle: str, haystack: str) -> float:
if not needle:
return 0.0
bag: dict[str, int] = {}
for ch in haystack:
bag[ch] = bag.get(ch, 0) + 1
hits = 0
for ch in needle:
n = bag.get(ch, 0)
if n > 0:
hits += 1
bag[ch] = n - 1
return hits / len(needle)
def _bigram_jaccard(a: str, b: str) -> float:
if len(a) < 2 or len(b) < 2:
return 0.0
A = {a[i:i + 2] for i in range(len(a) - 1)}
B = {b[i:i + 2] for i in range(len(b) - 1)}
inter = len(A & B)
union = len(A) + len(B) - inter
return inter / union if union else 0.0
def _longest_subsequence_ratio(needle: str, haystack: str) -> float:
if not needle:
return 0.0
j = 0
for ch in haystack:
if j >= len(needle):
break
if ch == needle[j]:
j += 1
return j / len(needle)
def _score_field(haystack: str, query: str) -> float:
hay = _normalize_text(haystack)
q = _normalize_text(query)
if not hay or not q:
return 0.0
hay_c = _compact_text(haystack)
q_c = _compact_text(query)
best = 0.0
if hay == q:
best = max(best, 100.0)
if q in hay:
best = max(best, 92 + min(6, len(q)))
if len(q) >= 2 and hay.startswith(q):
best = max(best, 88.0)
if q_c and q_c in hay_c:
best = max(best, 86.0)
best = max(best, _multiset_char_hit_ratio(q_c, hay_c) * 62)
best = max(best, _bigram_jaccard(q_c, hay_c) * 58)
best = max(best, _longest_subsequence_ratio(q_c, hay_c) * 52)
if len(q) == 1 and q in hay:
best = max(best, 68.0)
return best
def search_stickers(query: str, limit: int = 10) -> list[dict]:
"""
在内置贴纸表中按模糊匹配排序返回前 N 条结果
评分综合 name/description 字段的子串字符多重集覆盖bigram Jaccard子序列比例
name 权重略高于 description×0.88 query 时按字典顺序返回前 N
"""
safe_limit = max(1, min(500, int(limit) if limit else 10))
if not query or not _normalize_text(query):
return list(STICKER_MAP.values())[:safe_limit]
scored: list[tuple[float, dict]] = []
for sticker in STICKER_MAP.values():
name_s = _score_field(sticker.get("name", ""), query)
desc_s = _score_field(sticker.get("description", ""), query) * 0.88
sid = str(sticker.get("sticker_id", "")).strip()
q_norm = _normalize_text(query)
id_s = 0.0
if sid and q_norm:
sid_norm = _normalize_text(sid)
if sid_norm == q_norm:
id_s = 100.0
elif q_norm in sid_norm:
id_s = 84.0
scored.append((max(name_s, desc_s, id_s), sticker))
scored.sort(key=lambda x: x[0], reverse=True)
top = scored[0][0] if scored else 0
if top <= 0:
return [s for _, s in scored[:safe_limit]]
if top >= 22:
floor = 18.0
elif top >= 12:
floor = max(10.0, top * 0.5)
else:
floor = max(6.0, top * 0.35)
filtered = [pair for pair in scored if pair[0] >= floor]
out = filtered if filtered else scored
return [s for _, s in out[:safe_limit]]
def build_face_msg_body(
face_index: int,
face_type: int = 1,
data: Optional[str] = None,
) -> list:
"""
构造 TIMFaceElem 消息体
Yuanbao 约定
- index 固定传 0服务端通过 data 字段识别具体表情
- data JSON 字符串包含 sticker_id / package_id 等字段
Args:
face_index: 保留字段暂时不影响 wire formatYuanbao 固定 index=0
face_index > 0 时视为旧版 QQ 表情 ID直接放入 index
face_type: 保留字段兼容旧接口当前未使用
data: 已序列化的 JSON 字符串 None 时仅传 index
Returns:
符合 Yuanbao TIM 协议的 msg_body list::
[{"msg_type": "TIMFaceElem", "msg_content": {"index": 0, "data": "..."}}]
"""
msg_content: dict = {"index": face_index}
if data is not None:
msg_content["data"] = data
return [{"msg_type": "TIMFaceElem", "msg_content": msg_content}]
def build_sticker_msg_body(sticker: dict) -> list:
"""
STICKER_MAP 中的 sticker dict 直接构造 TIMFaceElem 消息体
这是 send_sticker() 的内部辅助确保 data 字段与原始 JS 插件一致
"""
data_payload = json.dumps(
{
"sticker_id": sticker["sticker_id"],
"package_id": sticker["package_id"],
"width": sticker.get("width", 128),
"height": sticker.get("height", 128),
"formats": sticker.get("formats", "png"),
"name": sticker["name"],
},
ensure_ascii=False,
separators=(",", ":"),
)
return build_face_msg_body(face_index=0, data=data_payload)
+1041 -599
View File
File diff suppressed because it is too large Load Diff
+79 -19
View File
@@ -60,6 +60,10 @@ from .config import (
SessionResetPolicy, # noqa: F401 — re-exported via gateway/__init__.py
HomeChannel,
)
from .whatsapp_identity import (
canonical_whatsapp_identifier,
normalize_whatsapp_identifier,
)
@dataclass
@@ -198,6 +202,31 @@ that requires raw IDs). Discord is excluded because mentions use ``<@user_id>``
and the LLM needs the real ID to tag users."""
def _discord_tools_loaded() -> bool:
"""True iff the agent will actually have Discord tools this session.
Two conditions must hold:
1. The `discord` or `discord_admin` toolset is enabled for the
Discord platform via `hermes tools` (opt-in, default OFF).
2. `DISCORD_BOT_TOKEN` is set the tool's `check_fn` gates on it
at registry time, so the toolset being enabled in config is not
enough if the token isn't configured.
Returns False (safe default keeps the stale-API disclaimer) on any
error so a bad config can't silently promise tools the agent lacks.
"""
if not (os.environ.get("DISCORD_BOT_TOKEN") or "").strip():
return False
try:
from hermes_cli.config import load_config
from hermes_cli.tools_config import _get_platform_tools
cfg = load_config()
enabled = _get_platform_tools(cfg, "discord", include_default_mcp_servers=False)
return "discord" in enabled or "discord_admin" in enabled
except Exception:
return False
def build_session_context_prompt(
context: SessionContext,
*,
@@ -281,17 +310,17 @@ def build_session_context_prompt(
"**Platform notes:** You are running inside Slack. "
"You do NOT have access to Slack-specific APIs — you cannot search "
"channel history, pin/unpin messages, manage channels, or list users. "
"Do not promise to perform these actions. If the user asks, explain "
"that you can only read messages sent directly to you and respond."
"Do not promise to perform these actions. The gateway may inline the "
"current message's Slack block/attachment payload when available, but "
"you still cannot call Slack APIs yourself."
)
elif context.source.platform == Platform.DISCORD:
# The discord tool self-gates on DISCORD_BOT_TOKEN at registry
# check time. Match that condition so the prompt stays honest:
# with a token the agent has fetch_messages/search_members/
# create_thread (and optionally discord_admin) and should know
# the IDs it can call them with; without one it really is
# limited to reading/replying via the gateway.
if (os.environ.get("DISCORD_BOT_TOKEN") or "").strip():
# Inject the Discord IDs block only when the agent actually has
# Discord tools loaded this session — i.e. the user opted into
# `discord` / `discord_admin` via `hermes tools` AND the bot
# token is configured. Otherwise keep the stale-API disclaimer
# honest so we never promise tools the agent lacks.
if _discord_tools_loaded():
src = context.source
id_lines = ["", "**Discord IDs (for the `discord` / `discord_admin` tools):**"]
if src.guild_id:
@@ -313,6 +342,26 @@ def build_session_context_prompt(
"Do not promise to perform these actions. If the user asks, explain "
"that you can only read messages sent directly to you and respond."
)
elif context.source.platform == Platform.BLUEBUBBLES:
lines.append("")
lines.append(
"**Platform notes:** You are responding via iMessage. "
"Keep responses short and conversational — think texts, not essays. "
"Structure longer replies as separate short thoughts, each separated "
"by a blank line (double newline). Each block between blank lines "
"will be delivered as its own iMessage bubble, so write accordingly: "
"one idea per bubble, 13 sentences each. "
"If the user needs a detailed answer, give the short version first "
"and offer to elaborate."
)
elif context.source.platform == Platform.YUANBAO:
lines.append("")
lines.append(
"**Platform notes:** You are running inside Yuanbao. "
"You CAN send private (DM) messages via the send_message tool. "
"Use target='yuanbao:direct:<account_id>' for DM "
"and target='yuanbao:group:<group_code>' for group chat."
)
# Connected platforms
platforms_list = ["local (files on this machine)"]
@@ -399,11 +448,11 @@ class SessionEntry:
auto_reset_reason: Optional[str] = None # "idle" or "daily"
reset_had_activity: bool = False # whether the expired session had any messages
# Set by the background expiry watcher after it successfully flushes
# memories for this session. Persisted to sessions.json so the flag
# survives gateway restarts (the old in-memory _pre_flushed_sessions
# set was lost on restart, causing redundant re-flushes).
memory_flushed: bool = False
# Set by the background expiry watcher after it finalizes an expired
# session (invoking on_session_finalize hooks and evicting the cached
# agent). Persisted to sessions.json so the flag survives gateway
# restarts — prevents redundant finalization runs.
expiry_finalized: bool = False
# When True the next call to get_or_create_session() will auto-reset
# this session (create a new session_id) so the user starts fresh.
@@ -439,7 +488,7 @@ class SessionEntry:
"last_prompt_tokens": self.last_prompt_tokens,
"estimated_cost_usd": self.estimated_cost_usd,
"cost_status": self.cost_status,
"memory_flushed": self.memory_flushed,
"expiry_finalized": self.expiry_finalized,
"suspended": self.suspended,
"resume_pending": self.resume_pending,
"resume_reason": self.resume_reason,
@@ -491,7 +540,7 @@ class SessionEntry:
last_prompt_tokens=data.get("last_prompt_tokens", 0),
estimated_cost_usd=data.get("estimated_cost_usd", 0.0),
cost_status=data.get("cost_status", "unknown"),
memory_flushed=data.get("memory_flushed", False),
expiry_finalized=data.get("expiry_finalized", data.get("memory_flushed", False)),
suspended=data.get("suspended", False),
resume_pending=data.get("resume_pending", False),
resume_reason=data.get("resume_reason"),
@@ -550,15 +599,24 @@ def build_session_key(
"""
platform = source.platform.value
if source.chat_type == "dm":
if source.chat_id:
dm_chat_id = source.chat_id
if source.platform == Platform.WHATSAPP:
dm_chat_id = canonical_whatsapp_identifier(source.chat_id)
if dm_chat_id:
if source.thread_id:
return f"agent:main:{platform}:dm:{source.chat_id}:{source.thread_id}"
return f"agent:main:{platform}:dm:{source.chat_id}"
return f"agent:main:{platform}:dm:{dm_chat_id}:{source.thread_id}"
return f"agent:main:{platform}:dm:{dm_chat_id}"
if source.thread_id:
return f"agent:main:{platform}:dm:{source.thread_id}"
return f"agent:main:{platform}:dm"
participant_id = source.user_id_alt or source.user_id
if participant_id and source.platform == Platform.WHATSAPP:
# Same JID/LID-flip bug as the DM case: without canonicalisation, a
# single group member gets two isolated per-user sessions when the
# bridge reshuffles alias forms.
participant_id = canonical_whatsapp_identifier(str(participant_id)) or participant_id
key_parts = ["agent:main", platform, source.chat_type]
if source.chat_id:
@@ -1183,6 +1241,7 @@ class SessionStore:
reasoning_content=message.get("reasoning_content") if message.get("role") == "assistant" else None,
reasoning_details=message.get("reasoning_details") if message.get("role") == "assistant" else None,
codex_reasoning_items=message.get("codex_reasoning_items") if message.get("role") == "assistant" else None,
codex_message_items=message.get("codex_message_items") if message.get("role") == "assistant" else None,
)
except Exception as e:
logger.debug("Session DB operation failed: %s", e)
@@ -1215,6 +1274,7 @@ class SessionStore:
reasoning_content=msg.get("reasoning_content") if role == "assistant" else None,
reasoning_details=msg.get("reasoning_details") if role == "assistant" else None,
codex_reasoning_items=msg.get("codex_reasoning_items") if role == "assistant" else None,
codex_message_items=msg.get("codex_message_items") if role == "assistant" else None,
)
except Exception as e:
logger.debug("Failed to rewrite transcript in DB: %s", e)
+110
View File
@@ -44,6 +44,14 @@ class StreamConsumerConfig:
buffer_threshold: int = 40
cursor: str = ""
buffer_only: bool = False
# When >0, the final edit for a streamed response is delivered as a
# fresh message if the original preview has been visible for at least
# this many seconds. This makes the platform's visible timestamp
# reflect completion time instead of first-token time for long-running
# responses (e.g. reasoning models that stream slowly). Ported from
# openclaw/openclaw#72038. Default 0 = always edit in place (legacy
# behavior). The gateway enables this selectively per-platform.
fresh_final_after_seconds: float = 0.0
class GatewayStreamConsumer:
@@ -91,6 +99,12 @@ class GatewayStreamConsumer:
self._queue: queue.Queue = queue.Queue()
self._accumulated = ""
self._message_id: Optional[str] = None
# Wall-clock timestamp (time.monotonic) when ``_message_id`` was
# first assigned from a successful first-send. Used by the
# fresh-final logic to detect long-lived previews whose edit
# timestamps would be stale by completion time. Ported from
# openclaw/openclaw#72038.
self._message_created_ts: Optional[float] = None
self._already_sent = False
self._edit_supported = True # Disabled when progressive edits are no longer usable
self._last_edit_time = 0.0
@@ -136,6 +150,7 @@ class GatewayStreamConsumer:
if preserve_no_edit and self._message_id == "__no_edit__":
return
self._message_id = None
self._message_created_ts = None
self._accumulated = ""
self._last_sent_text = ""
self._fallback_final_send = False
@@ -734,6 +749,81 @@ class GatewayStreamConsumer:
logger.error("Commentary send error: %s", e)
return False
def _should_send_fresh_final(self) -> bool:
"""Return True when a long-lived preview should be replaced with a
fresh final message instead of an edit.
Conditions:
- Fresh-final is enabled (``fresh_final_after_seconds > 0``).
- We have a real preview message id (not the ``__no_edit__`` sentinel
and not ``None``).
- The preview has been visible for at least the configured threshold.
Ported from openclaw/openclaw#72038.
"""
threshold = getattr(self.cfg, "fresh_final_after_seconds", 0.0) or 0.0
if threshold <= 0:
return False
if not self._message_id or self._message_id == "__no_edit__":
return False
if self._message_created_ts is None:
return False
age = time.monotonic() - self._message_created_ts
return age >= threshold
async def _try_fresh_final(self, text: str) -> bool:
"""Send ``text`` as a brand-new message (best-effort delete the old
preview) so the platform's visible timestamp reflects completion
time. Returns True on successful delivery, False on any failure so
the caller falls back to the normal edit path.
Ported from openclaw/openclaw#72038.
"""
old_message_id = self._message_id
try:
result = await self.adapter.send(
chat_id=self.chat_id,
content=text,
metadata=self.metadata,
)
except Exception as e:
logger.debug("Fresh-final send failed, falling back to edit: %s", e)
return False
if not getattr(result, "success", False):
return False
# Successful fresh send — try to delete the stale preview so the
# user doesn't see the old edit-stuck message underneath. Cleanup
# is best-effort; platforms that don't implement ``delete_message``
# just leave the preview behind (still an acceptable outcome —
# the visible final timestamp is the important part).
if old_message_id and old_message_id != "__no_edit__":
delete_fn = getattr(self.adapter, "delete_message", None)
if delete_fn is not None:
try:
await delete_fn(self.chat_id, old_message_id)
except Exception as e:
logger.debug(
"Fresh-final preview cleanup failed (%s): %s",
old_message_id, e,
)
# Adopt the new message id as the current message so subsequent
# callers (e.g. overflow split loops, finalize retries) see a
# consistent state.
new_message_id = getattr(result, "message_id", None)
if new_message_id:
self._message_id = new_message_id
self._message_created_ts = time.monotonic()
else:
# Send succeeded but platform didn't return an id — treat the
# delivery as final-only and fall back to "__no_edit__" so we
# don't try to edit something we can't address.
self._message_id = "__no_edit__"
self._message_created_ts = None
self._already_sent = True
self._last_sent_text = text
self._final_response_sent = True
return True
async def _send_or_edit(self, text: str, *, finalize: bool = False) -> bool:
"""Send or edit the streaming message.
@@ -786,6 +876,22 @@ class GatewayStreamConsumer:
finalize and self._adapter_requires_finalize
):
return True
# Fresh-final for long-lived previews: when finalizing
# the last edit in a streaming sequence, if the
# original preview has been visible for at least
# ``fresh_final_after_seconds``, send the completed
# reply as a fresh message so the platform's visible
# timestamp reflects completion time instead of the
# preview creation time. Best-effort cleanup of the
# old preview follows. Ported from
# openclaw/openclaw#72038. Gated by config so the
# legacy edit-in-place path stays the default.
if (
finalize
and self._should_send_fresh_final()
and await self._try_fresh_final(text)
):
return True
# Edit existing message
result = await self.adapter.edit_message(
chat_id=self.chat_id,
@@ -852,6 +958,10 @@ class GatewayStreamConsumer:
if result.success:
if result.message_id:
self._message_id = result.message_id
# Track when the preview first became visible to
# the user so fresh-final logic can detect stale
# preview timestamps on long-running responses.
self._message_created_ts = time.monotonic()
else:
self._edit_supported = False
self._already_sent = True
+155
View File
@@ -0,0 +1,155 @@
"""Shared helpers for canonicalising WhatsApp sender identity.
WhatsApp's bridge can surface the same human under two different JID shapes
within a single conversation:
- LID form: ``999999999999999@lid``
- Phone form: ``15551234567@s.whatsapp.net``
Both the authorisation path (:mod:`gateway.run`) and the session-key path
(:mod:`gateway.session`) need to collapse these aliases to a single stable
identity. This module is the single source of truth for that resolution so
the two paths can never drift apart.
Public helpers:
- :func:`normalize_whatsapp_identifier` strip JID/LID/device/plus syntax
down to the bare numeric identifier.
- :func:`canonical_whatsapp_identifier` walk the bridge's
``lid-mapping-*.json`` files and return a stable canonical identity
across phone/LID variants.
- :func:`expand_whatsapp_aliases` return the full alias set for an
identifier. Used by authorisation code that needs to match any known
form of a sender against an allow-list.
Plugins that need per-sender behaviour on WhatsApp (role-based routing,
per-contact authorisation, policy gating in a gateway hook) should use
``canonical_whatsapp_identifier`` so their bookkeeping lines up with
Hermes' own session keys.
"""
from __future__ import annotations
import json
import logging
import re
from typing import Set
logger = logging.getLogger(__name__)
# WhatsApp JIDs are numeric (or plus-prefixed numeric) with optional
# ``@``, ``.`` and ``:`` separators. ``\w`` is pinned to ASCII so
# full-width digits / Unicode word chars can't sneak through.
_SAFE_IDENTIFIER_RE = re.compile(r"^[A-Za-z0-9@.+\-]+$")
from hermes_constants import get_hermes_home
def normalize_whatsapp_identifier(value: str) -> str:
"""Strip WhatsApp JID/LID syntax down to its stable numeric identifier.
Accepts any of the identifier shapes the WhatsApp bridge may emit:
``"60123456789@s.whatsapp.net"``, ``"60123456789:47@s.whatsapp.net"``,
``"60123456789@lid"``, or a bare ``"+601****6789"`` / ``"60123456789"``.
Returns just the numeric identifier (``"60123456789"``) suitable for
equality comparisons.
Useful for plugins that want to match sender IDs against
user-supplied config (phone numbers in ``config.yaml``) without
worrying about which variant the bridge happens to deliver.
"""
return (
str(value or "")
.strip()
.replace("+", "", 1)
.split(":", 1)[0]
.split("@", 1)[0]
)
def expand_whatsapp_aliases(identifier: str) -> Set[str]:
"""Resolve WhatsApp phone/LID aliases via bridge session mapping files.
Returns the set of all identifiers transitively reachable through the
bridge's ``$HERMES_HOME/whatsapp/session/lid-mapping-*.json`` files,
starting from ``identifier``. The result always includes the
normalized input itself, so callers can safely ``in`` check against
the return value without a separate fallback branch.
Returns an empty set if ``identifier`` normalizes to empty.
"""
normalized = normalize_whatsapp_identifier(identifier)
if not normalized:
return set()
session_dir = get_hermes_home() / "whatsapp" / "session"
resolved: Set[str] = set()
queue = [normalized]
while queue:
current = queue.pop(0)
if not current or current in resolved:
continue
# Defense-in-depth: reject identifiers that could sneak path
# separators / traversal segments into the ``lid-mapping-{current}``
# filename below. The hardcoded ``lid-mapping-`` prefix already
# prevents escape via pathlib's component split (an attacker can't
# create ``lid-mapping-..`` as a real directory in session_dir), but
# this keeps the identifier space to the characters WhatsApp JIDs
# actually use and avoids depending on that filesystem-layout
# invariant.
if not _SAFE_IDENTIFIER_RE.match(current):
continue
resolved.add(current)
for suffix in ("", "_reverse"):
mapping_path = session_dir / f"lid-mapping-{current}{suffix}.json"
if not mapping_path.exists():
continue
try:
mapped = normalize_whatsapp_identifier(
json.loads(mapping_path.read_text(encoding="utf-8"))
)
except (OSError, json.JSONDecodeError) as exc:
logger.debug("whatsapp_identity: failed to read %s: %s", mapping_path, exc)
continue
if mapped and mapped not in resolved:
queue.append(mapped)
return resolved
def canonical_whatsapp_identifier(identifier: str) -> str:
"""Return a stable WhatsApp sender identity across phone-JID/LID variants.
WhatsApp may surface the same person under either a phone-format JID
(``60123456789@s.whatsapp.net``) or a LID (``1234567890@lid``). This
applies to a DM ``chat_id`` *and* to the ``participant_id`` of a
member inside a group chat both represent a user identity, and the
bridge may flip between the two for the same human.
This helper reads the bridge's ``whatsapp/session/lid-mapping-*.json``
files, walks the mapping transitively, and picks the shortest
(numeric-preferred) alias as the canonical identity.
:func:`gateway.session.build_session_key` uses this for both WhatsApp
DM chat_ids and WhatsApp group participant_ids, so callers get the
same session-key identity Hermes itself uses.
Plugins that need per-sender behaviour (role-based routing,
authorisation, per-contact policy) should use this so their
bookkeeping lines up with Hermes' session bookkeeping even when
the bridge reshuffles aliases.
Returns an empty string if ``identifier`` normalizes to empty. If no
mapping files exist yet (fresh bridge install), returns the
normalized input unchanged.
"""
normalized = normalize_whatsapp_identifier(identifier)
if not normalized:
return ""
# expand_whatsapp_aliases always includes `normalized` itself in the
# returned set, so the min() below degrades gracefully to `normalized`
# when no lid-mapping files are present.
aliases = expand_whatsapp_aliases(normalized)
return min(aliases, key=lambda candidate: (len(candidate), candidate))
+963 -107
View File
File diff suppressed because it is too large Load Diff
+72 -3
View File
@@ -110,18 +110,40 @@ def _display_source(source: str) -> str:
return source.split(":", 1)[1] if source.startswith("manual:") else source
def _classify_exhausted_status(entry) -> tuple[str, bool]:
code = getattr(entry, "last_error_code", None)
reason = str(getattr(entry, "last_error_reason", "") or "").strip().lower()
message = str(getattr(entry, "last_error_message", "") or "").strip().lower()
if code == 429 or any(token in reason for token in ("rate_limit", "usage_limit", "quota", "exhausted")) or any(
token in message for token in ("rate limit", "usage limit", "quota", "too many requests")
):
return "rate-limited", True
if code in {401, 403} or any(token in reason for token in ("invalid_token", "invalid_grant", "unauthorized", "forbidden", "auth")) or any(
token in message for token in ("unauthorized", "forbidden", "expired", "revoked", "invalid token", "authentication")
):
return "auth failed", False
return "exhausted", True
def _format_exhausted_status(entry) -> str:
if entry.last_status != STATUS_EXHAUSTED:
return ""
label, show_retry_window = _classify_exhausted_status(entry)
reason = getattr(entry, "last_error_reason", None)
reason_text = f" {reason}" if isinstance(reason, str) and reason.strip() else ""
code = f" ({entry.last_error_code})" if entry.last_error_code else ""
if not show_retry_window:
return f" {label}{reason_text}{code} (re-auth may be required)"
exhausted_until = _exhausted_until(entry)
if exhausted_until is None:
return f" exhausted{reason_text}{code}"
return f" {label}{reason_text}{code}"
remaining = max(0, int(math.ceil(exhausted_until - time.time())))
if remaining <= 0:
return f" exhausted{reason_text}{code} (ready to retry)"
return f" {label}{reason_text}{code} (ready to retry)"
minutes, seconds = divmod(remaining, 60)
hours, minutes = divmod(minutes, 60)
days, hours = divmod(hours, 24)
@@ -133,7 +155,7 @@ def _format_exhausted_status(entry) -> str:
wait = f"{minutes}m {seconds}s"
else:
wait = f"{seconds}s"
return f" exhausted{reason_text}{code} ({wait} left)"
return f" {label}{reason_text}{code} ({wait} left)"
def auth_add_command(args) -> None:
@@ -386,6 +408,44 @@ def auth_reset_command(args) -> None:
print(f"Reset status on {count} {provider} credentials")
def auth_status_command(args) -> None:
provider = _normalize_provider(getattr(args, "provider", "") or "")
if not provider:
raise SystemExit("Provider is required. Example: `hermes auth status spotify`.")
status = auth_mod.get_auth_status(provider)
if not status.get("logged_in"):
reason = status.get("error")
if reason:
print(f"{provider}: logged out ({reason})")
else:
print(f"{provider}: logged out")
return
print(f"{provider}: logged in")
for key in ("auth_type", "client_id", "redirect_uri", "scope", "expires_at", "api_base_url"):
value = status.get(key)
if value:
print(f" {key}: {value}")
def auth_logout_command(args) -> None:
auth_mod.logout_command(SimpleNamespace(provider=getattr(args, "provider", None)))
def auth_spotify_command(args) -> None:
action = str(getattr(args, "spotify_action", "") or "login").strip().lower()
if action in {"", "login"}:
auth_mod.login_spotify_command(args)
return
if action == "status":
auth_status_command(SimpleNamespace(provider="spotify"))
return
if action == "logout":
auth_logout_command(SimpleNamespace(provider="spotify"))
return
raise SystemExit(f"Unknown Spotify auth action: {action}")
def _interactive_auth() -> None:
"""Interactive credential pool management when `hermes auth` is called bare."""
# Show current pool status first
@@ -583,5 +643,14 @@ def auth_command(args) -> None:
if action == "reset":
auth_reset_command(args)
return
if action == "status":
auth_status_command(args)
return
if action == "logout":
auth_logout_command(args)
return
if action == "spotify":
auth_spotify_command(args)
return
# No subcommand — launch interactive mode
_interactive_auth()
+300
View File
@@ -0,0 +1,300 @@
"""Azure Foundry endpoint auto-detection.
Inspect an Azure AI Foundry / Azure OpenAI endpoint to determine:
- API transport (OpenAI-style ``chat_completions`` vs
Anthropic-style ``anthropic_messages``)
- Available models (best effort Azure does not expose a deployment
listing via the inference API key, but Azure OpenAI v1 endpoints
return the resource's model catalog via ``GET /models``)
- Context length for each discovered/entered model, via the existing
:func:`agent.model_metadata.get_model_context_length` resolver.
Rationale:
Azure has no pure-API-key deployment-listing endpoint per Microsoft,
deployment enumeration requires ARM management-plane auth. Azure
OpenAI v1 endpoints ``{resource}.openai.azure.com/openai/v1`` do return
a ``/models`` list, but it reflects the resource's *available* models
rather than the user's *deployed* deployment names. In practice it is
still a useful hint the user picks a familiar model name and we look
up its context length from the catalog.
The detector never crashes on errors (every HTTP call is wrapped in a
broad try/except). Callers get a :class:`DetectionResult` with whatever
information could be gathered, and fall back to manual entry for the
rest.
"""
from __future__ import annotations
import json
import logging
import re
from dataclasses import dataclass, field
from typing import Optional
from urllib import request as urllib_request
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse, urlunparse
logger = logging.getLogger(__name__)
# Default Azure OpenAI ``api-version`` to probe with. The v1 GA endpoint
# accepts requests without ``api-version`` entirely, so this is only used
# as a fallback for pre-v1 resources that still require it.
_AZURE_OPENAI_PROBE_API_VERSIONS = (
"2025-04-01-preview",
"2024-10-21", # oldest GA that supports /models
)
# Default Azure Anthropic ``api-version``. Matches the value used by
# ``agent/anthropic_adapter.py`` when building the Anthropic client.
_AZURE_ANTHROPIC_API_VERSION = "2025-04-15"
@dataclass
class DetectionResult:
"""Everything auto-detection could gather from a base URL + API key."""
#: Detected API transport: ``"chat_completions"``,
#: ``"anthropic_messages"``, or ``None`` when detection failed.
api_mode: Optional[str] = None
#: Deployment / model IDs returned by ``/models`` (best effort).
#: Empty when the endpoint doesn't expose the list with an API key.
models: list[str] = field(default_factory=list)
#: Lowercased host from the base URL (used for display messages).
hostname: str = ""
#: Human-readable reason the detector chose ``api_mode``. Useful
#: for explaining auto-detection to the user in the wizard.
reason: str = ""
#: ``True`` when ``/models`` returned a valid OpenAI-shaped payload.
models_probe_ok: bool = False
#: ``True`` when the URL was determined to be an Anthropic-style
#: endpoint (from path suffix or live probe).
is_anthropic: bool = False
def _http_get_json(url: str, api_key: str, timeout: float = 6.0) -> tuple[int, Optional[dict]]:
"""GET a URL with ``api-key`` + ``Authorization`` headers. Return
``(status_code, parsed_json_or_None)``. Never raises."""
req = urllib_request.Request(url, method="GET")
# Azure OpenAI uses ``api-key``. Some Azure deployments (and
# Anthropic-style routes) use ``Authorization: Bearer``. Send both
# so we probe once per URL rather than twice.
req.add_header("api-key", api_key)
req.add_header("Authorization", f"Bearer {api_key}")
req.add_header("User-Agent", "hermes-agent/azure-detect")
try:
with urllib_request.urlopen(req, timeout=timeout) as resp:
body = resp.read()
try:
return resp.status, json.loads(body.decode("utf-8", errors="replace"))
except Exception:
return resp.status, None
except HTTPError as exc:
return exc.code, None
except (URLError, TimeoutError, OSError) as exc:
logger.debug("azure_detect: GET %s failed: %s", url, exc)
return 0, None
except Exception as exc: # pragma: no cover — defensive
logger.debug("azure_detect: GET %s unexpected error: %s", url, exc)
return 0, None
def _strip_trailing_v1(url: str) -> str:
"""Strip trailing ``/v1`` or ``/v1/`` so we can construct sub-paths."""
return re.sub(r"/v1/?$", "", url.rstrip("/"))
def _looks_like_anthropic_path(url: str) -> bool:
"""Return True when the URL's path ends in ``/anthropic`` or
contains a ``/anthropic/`` segment. Used by Azure Foundry
resources that route Claude traffic through a dedicated path."""
try:
parsed = urlparse(url)
path = (parsed.path or "").lower().rstrip("/")
return path.endswith("/anthropic") or "/anthropic/" in path + "/"
except Exception:
return False
def _extract_model_ids(payload: dict) -> list[str]:
"""Extract a list of model IDs from an OpenAI-shaped ``/models``
response. Returns ``[]`` on any shape mismatch."""
data = payload.get("data") if isinstance(payload, dict) else None
if not isinstance(data, list):
return []
ids: list[str] = []
for item in data:
if not isinstance(item, dict):
continue
# OpenAI shape: {"id": "gpt-5.4", "object": "model", ...}
mid = item.get("id") or item.get("model") or item.get("name")
if isinstance(mid, str) and mid:
ids.append(mid)
return ids
def _probe_openai_models(base_url: str, api_key: str) -> tuple[bool, list[str]]:
"""Probe ``<base>/models`` for an OpenAI-shaped response.
Returns ``(ok, models)``. ``ok`` is True iff the endpoint accepted
us as an OpenAI-style caller (200 OK + OpenAI-shaped JSON body).
"""
base_url = base_url.rstrip("/")
# Azure OpenAI v1: {resource}.openai.azure.com/openai/v1 — no
# api-version required for GA paths, so probe without first.
candidates = [f"{base_url}/models"]
# Fallback: explicit api-version for pre-v1 resources
for v in _AZURE_OPENAI_PROBE_API_VERSIONS:
candidates.append(f"{base_url}/models?api-version={v}")
for url in candidates:
status, body = _http_get_json(url, api_key)
if status == 200 and body is not None:
ids = _extract_model_ids(body)
if ids:
logger.info(
"azure_detect: /models probe OK at %s (%d models)",
url, len(ids),
)
return True, ids
# 200 + empty list still counts as "OpenAI shape, no models
# listed" — let the user proceed with manual entry.
if isinstance(body, dict) and "data" in body:
return True, []
return False, []
def _probe_anthropic_messages(base_url: str, api_key: str) -> bool:
"""Send a zero-token request to ``<base>/v1/messages`` and check
whether the endpoint at least *recognises* the Anthropic Messages
shape (any 4xx that mentions ``messages`` or ``model``, or a 400
``invalid_request`` with an Anthropic error shape). Never completes
a real chat.
"""
base = _strip_trailing_v1(base_url)
url = f"{base}/v1/messages?api-version={_AZURE_ANTHROPIC_API_VERSION}"
payload = json.dumps({
"model": "probe",
"max_tokens": 1,
"messages": [{"role": "user", "content": "ping"}],
}).encode("utf-8")
req = urllib_request.Request(url, method="POST", data=payload)
req.add_header("api-key", api_key)
req.add_header("Authorization", f"Bearer {api_key}")
req.add_header("anthropic-version", "2023-06-01")
req.add_header("content-type", "application/json")
req.add_header("User-Agent", "hermes-agent/azure-detect")
try:
with urllib_request.urlopen(req, timeout=6.0) as resp:
# Should never 200 — "probe" isn't a real deployment. But
# if it does, the endpoint definitely speaks Anthropic.
return resp.status < 500
except HTTPError as exc:
# 4xx with an Anthropic-shaped error body = Anthropic endpoint.
try:
body = exc.read().decode("utf-8", errors="replace")
lowered = body.lower()
if "anthropic" in lowered or '"type"' in lowered and '"error"' in lowered:
return True
# Pre-Azure-v1 Azure Foundry returns a plain 404 for
# Anthropic-style calls on non-Anthropic deployments. A
# 400 "model not found" IS Anthropic though.
if exc.code == 400 and ("messages" in lowered or "model" in lowered):
return True
return False
except Exception:
return False
except (URLError, TimeoutError, OSError):
return False
except Exception: # pragma: no cover
return False
def detect(base_url: str, api_key: str) -> DetectionResult:
"""Inspect an Azure endpoint and describe its transport + models.
Call this from the wizard before asking the user to pick an API
mode manually. The caller should treat the returned
:class:`DetectionResult` as *advisory* if ``api_mode`` is None,
fall back to asking the user.
"""
result = DetectionResult()
try:
parsed = urlparse(base_url)
result.hostname = (parsed.hostname or "").lower()
except Exception:
result.hostname = ""
# 1. Path sniff. Azure Foundry exposes Anthropic-style deployments
# under a dedicated ``/anthropic`` path.
if _looks_like_anthropic_path(base_url):
result.is_anthropic = True
result.api_mode = "anthropic_messages"
result.reason = "URL path ends in /anthropic → Anthropic Messages API"
return result
# 2. Try the OpenAI-style /models probe. If this works, the
# endpoint definitely speaks OpenAI wire.
ok, models = _probe_openai_models(base_url, api_key)
if ok:
result.models_probe_ok = True
result.models = models
result.api_mode = "chat_completions"
result.reason = (
f"GET /models returned {len(models)} model(s) — OpenAI-style endpoint"
if models
else "GET /models returned an OpenAI-shaped empty list — OpenAI-style endpoint"
)
return result
# 3. Fallback: probe the Anthropic Messages shape. Slower and more
# intrusive than /models, so only run it when the OpenAI probe
# failed.
if _probe_anthropic_messages(base_url, api_key):
result.is_anthropic = True
result.api_mode = "anthropic_messages"
result.reason = "Endpoint accepts Anthropic Messages shape"
return result
# Nothing matched. Caller falls back to manual selection.
result.reason = (
"Could not probe endpoint (private network, missing model list, or "
"non-standard path) — falling back to manual API-mode selection"
)
return result
def lookup_context_length(model: str, base_url: str, api_key: str) -> Optional[int]:
"""Thin wrapper around :func:`agent.model_metadata.get_model_context_length`
that returns ``None`` when only the fallback default (128k) would
fire, so the wizard can distinguish "we actually know this" from
"we guessed."""
try:
from agent.model_metadata import (
DEFAULT_FALLBACK_CONTEXT,
get_model_context_length,
)
except Exception:
return None
try:
n = get_model_context_length(model, base_url=base_url, api_key=api_key)
except Exception as exc:
logger.debug("azure_detect: context length lookup failed: %s", exc)
return None
if isinstance(n, int) and n > 0 and n != DEFAULT_FALLBACK_CONTEXT:
return n
return None
__all__ = ["DetectionResult", "detect", "lookup_context_length"]
+177 -1
View File
@@ -36,12 +36,23 @@ _EXCLUDED_DIRS = {
"__pycache__", # bytecode caches — regenerated on import
".git", # nested git dirs (profiles shouldn't have these, but safety)
"node_modules", # js deps if website/ somehow leaks in
"backups", # prior auto-backups — don't nest backups exponentially
"checkpoints", # session-local trajectory caches — regenerated per-session,
# session-hash-keyed so they don't port to another machine anyway
}
# File-name suffixes to skip
_EXCLUDED_SUFFIXES = (
".pyc",
".pyo",
# SQLite sidecar files — the backup takes a consistent snapshot of ``*.db``
# via ``sqlite3.backup()``, so shipping the live WAL / shared-memory /
# rollback-journal alongside would pair a fresh snapshot with stale sidecar
# state and produce a torn restore on the next open. They're transient and
# regenerated on first connection anyway.
".db-wal",
".db-shm",
".db-journal",
)
# File names to skip (runtime state that's meaningless on another machine)
@@ -454,6 +465,12 @@ def run_import(args) -> None:
# Critical state files to include in quick snapshots (relative to HERMES_HOME).
# Everything else is either regeneratable (logs, cache) or managed separately
# (skills, repo, sessions/).
#
# Entries may be individual files OR directories. Directories are captured
# recursively; missing entries are silently skipped. Pairing data lives in
# platform-specific JSON blobs outside state.db, so it's listed here explicitly
# — `hermes update` snapshots this set before pulling so approved-user lists
# are recoverable if anything goes wrong (issue #15733).
_QUICK_STATE_FILES = (
"state.db",
"config.yaml",
@@ -463,6 +480,10 @@ _QUICK_STATE_FILES = (
"gateway_state.json",
"channel_directory.json",
"processes.json",
# Pairing stores (generic + per-platform JSONs outside state.db)
"pairing", # legacy location (gateway/pairing.py)
"platforms/pairing", # new location (gateway/pairing.py)
"feishu_comment_pairing.json", # Feishu comment subscription pairings
)
_QUICK_SNAPSHOTS_DIR = "state-snapshots"
@@ -498,7 +519,27 @@ def create_quick_snapshot(
for rel in _QUICK_STATE_FILES:
src = home / rel
if not src.exists() or not src.is_file():
if not src.exists():
continue
if src.is_dir():
# Walk the directory and record each file individually in the
# manifest so restore can treat them uniformly. Empty dirs are
# skipped (nothing to snapshot).
for sub in src.rglob("*"):
if not sub.is_file():
continue
sub_rel = sub.relative_to(home).as_posix()
dst = snap_dir / sub_rel
dst.parent.mkdir(parents=True, exist_ok=True)
try:
shutil.copy2(sub, dst)
manifest[sub_rel] = dst.stat().st_size
except (OSError, PermissionError) as exc:
logger.warning("Could not snapshot %s: %s", sub_rel, exc)
continue
if not src.is_file():
continue
dst = snap_dir / rel
@@ -653,3 +694,138 @@ def run_quick_backup(args) -> None:
print(f" Restore with: /snapshot restore {snap_id}")
else:
print("No state files found to snapshot.")
# ---------------------------------------------------------------------------
# Pre-update auto-backup
# ---------------------------------------------------------------------------
_PRE_UPDATE_BACKUPS_DIR = "backups"
_PRE_UPDATE_PREFIX = "pre-update-"
_PRE_UPDATE_DEFAULT_KEEP = 5
def _pre_update_backup_dir(hermes_home: Optional[Path] = None) -> Path:
home = hermes_home or get_hermes_home()
return home / _PRE_UPDATE_BACKUPS_DIR
def _prune_pre_update_backups(backup_dir: Path, keep: int) -> int:
"""Remove oldest pre-update backups beyond the keep limit.
Returns the number of files deleted. Only touches files matching
``pre-update-*.zip`` so hand-made zips dropped in the same directory
are never touched.
"""
if keep < 0:
keep = 0
if not backup_dir.exists():
return 0
backups = sorted(
(p for p in backup_dir.iterdir()
if p.is_file() and p.name.startswith(_PRE_UPDATE_PREFIX) and p.suffix.lower() == ".zip"),
key=lambda p: p.name,
reverse=True,
)
deleted = 0
for p in backups[keep:]:
try:
p.unlink()
deleted += 1
except OSError as exc:
logger.warning("Failed to prune backup %s: %s", p.name, exc)
return deleted
def create_pre_update_backup(
hermes_home: Optional[Path] = None,
keep: int = _PRE_UPDATE_DEFAULT_KEEP,
) -> Optional[Path]:
"""Create a full zip backup of HERMES_HOME under ``backups/``.
Mirrors :func:`run_backup` (same exclusion rules, same SQLite safe-copy)
but writes to ``<HERMES_HOME>/backups/pre-update-<timestamp>.zip`` and
auto-prunes old pre-update backups.
Returns the path to the created zip, or ``None`` if no files were
found or the backup could not be created. Never raises the caller
(``hermes update``) should continue even if the backup fails.
"""
hermes_root = hermes_home or get_default_hermes_root()
if not hermes_root.is_dir():
return None
backup_dir = _pre_update_backup_dir(hermes_root)
try:
backup_dir.mkdir(parents=True, exist_ok=True)
except OSError as exc:
logger.warning("Could not create pre-update backup dir %s: %s", backup_dir, exc)
return None
stamp = datetime.now().strftime("%Y-%m-%d-%H%M%S")
out_path = backup_dir / f"{_PRE_UPDATE_PREFIX}{stamp}.zip"
# Collect files (same logic as run_backup, minus the chatty progress prints)
files_to_add: list[tuple[Path, Path]] = []
try:
for dirpath, dirnames, filenames in os.walk(hermes_root, followlinks=False):
dp = Path(dirpath)
# Prune excluded directories in-place so os.walk doesn't descend
dirnames[:] = [d for d in dirnames if d not in _EXCLUDED_DIRS]
for fname in filenames:
fpath = dp / fname
try:
rel = fpath.relative_to(hermes_root)
except ValueError:
continue
if _should_exclude(rel):
continue
# Skip the output zip itself if it already exists
try:
if fpath.resolve() == out_path.resolve():
continue
except (OSError, ValueError):
pass
files_to_add.append((fpath, rel))
except OSError as exc:
logger.warning("Pre-update backup: walk failed: %s", exc)
return None
if not files_to_add:
return None
try:
with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED, compresslevel=6) as zf:
for abs_path, rel_path in files_to_add:
try:
if abs_path.suffix == ".db":
with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp:
tmp_db = Path(tmp.name)
try:
if _safe_copy_db(abs_path, tmp_db):
zf.write(tmp_db, arcname=str(rel_path))
finally:
tmp_db.unlink(missing_ok=True)
else:
zf.write(abs_path, arcname=str(rel_path))
except (PermissionError, OSError, ValueError) as exc:
logger.debug("Skipping %s in pre-update backup: %s", rel_path, exc)
continue
except OSError as exc:
logger.warning("Pre-update backup: zip write failed: %s", exc)
# Best-effort cleanup of partial file
try:
out_path.unlink(missing_ok=True)
except OSError:
pass
return None
_prune_pre_update_backups(backup_dir, keep=keep)
return out_path
+123 -11
View File
@@ -62,6 +62,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
aliases=("reset",)),
CommandDef("clear", "Clear screen and start a new session", "Session",
cli_only=True),
CommandDef("redraw", "Force a full UI repaint (recovers from terminal drift)", "Session",
cli_only=True),
CommandDef("history", "Show conversation history", "Session",
cli_only=True),
CommandDef("save", "Save the current conversation", "Session",
@@ -77,16 +79,14 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("rollback", "List or restore filesystem checkpoints", "Session",
args_hint="[number]"),
CommandDef("snapshot", "Create or restore state snapshots of Hermes config/state", "Session",
aliases=("snap",), args_hint="[create|restore <id>|prune]"),
cli_only=True, aliases=("snap",), args_hint="[create|restore <id>|prune]"),
CommandDef("stop", "Kill all running background processes", "Session"),
CommandDef("approve", "Approve a pending dangerous command", "Session",
gateway_only=True, args_hint="[session|always]"),
CommandDef("deny", "Deny a pending dangerous command", "Session",
gateway_only=True),
CommandDef("background", "Run a prompt in the background", "Session",
aliases=("bg",), args_hint="<prompt>"),
CommandDef("btw", "Ephemeral side question using session context (no tools, not persisted)", "Session",
args_hint="<question>"),
aliases=("bg", "btw"), args_hint="<prompt>"),
CommandDef("agents", "Show active agents and running tasks", "Session",
aliases=("tasks",)),
CommandDef("queue", "Queue a prompt for the next turn (doesn't interrupt)", "Session",
@@ -103,10 +103,10 @@ COMMAND_REGISTRY: list[CommandDef] = [
# Configuration
CommandDef("config", "Show current configuration", "Configuration",
cli_only=True),
CommandDef("model", "Switch model for this session", "Configuration", args_hint="[model] [--provider name] [--global]"),
CommandDef("provider", "Show available providers and current provider",
"Configuration"),
CommandDef("gquota", "Show Google Gemini Code Assist quota usage", "Info"),
CommandDef("model", "Switch model for this session", "Configuration",
aliases=("provider",), args_hint="[model] [--provider name] [--global]"),
CommandDef("gquota", "Show Google Gemini Code Assist quota usage", "Info",
cli_only=True),
CommandDef("personality", "Set a predefined personality", "Configuration",
args_hint="[name]"),
@@ -124,9 +124,12 @@ COMMAND_REGISTRY: list[CommandDef] = [
args_hint="[normal|fast|status]",
subcommands=("normal", "fast", "status", "on", "off")),
CommandDef("skin", "Show or change the display skin/theme", "Configuration",
args_hint="[name]"),
cli_only=True, args_hint="[name]"),
CommandDef("voice", "Toggle voice mode", "Configuration",
args_hint="[on|off|tts|status]", subcommands=("on", "off", "tts", "status")),
CommandDef("busy", "Control what Enter does while Hermes is working", "Configuration",
cli_only=True, args_hint="[queue|steer|interrupt|status]",
subcommands=("queue", "steer", "interrupt", "status")),
# Tools & Skills
CommandDef("tools", "Manage tools: /tools [list|disable|enable] [name...]", "Tools & Skills",
@@ -139,7 +142,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
CommandDef("cron", "Manage scheduled tasks", "Tools & Skills",
cli_only=True, args_hint="[subcommand]",
subcommands=("list", "add", "create", "edit", "pause", "resume", "run", "remove")),
CommandDef("reload", "Reload .env variables into the running session", "Tools & Skills"),
CommandDef("reload", "Reload .env variables into the running session", "Tools & Skills",
cli_only=True),
CommandDef("reload-mcp", "Reload MCP servers from config", "Tools & Skills",
aliases=("reload_mcp",)),
CommandDef("browser", "Connect browser tools to your live Chrome via CDP", "Tools & Skills",
@@ -317,7 +321,7 @@ def should_bypass_active_session(command_name: str | None) -> bool:
safety net in gateway.run discards any command text that reaches
the pending queue which meant a mid-run /model (or /reasoning,
/voice, /insights, /title, /resume, /retry, /undo, /compress,
/usage, /provider, /reload-mcp, /sethome, /reset) would silently
/usage, /reload-mcp, /sethome, /reset) would silently
interrupt the agent AND get discarded, producing a zero-char
response. See issue #5057 / PRs #6252, #10370, #4665.
@@ -804,6 +808,114 @@ def discord_skill_commands_by_category(
return trimmed_categories, uncategorized, hidden
# ---------------------------------------------------------------------------
# Slack native slash commands
# ---------------------------------------------------------------------------
# Slack slash command name constraints: lowercase a-z, 0-9, hyphens,
# underscores. Max 32 chars. Slack app manifest accepts up to 50 slash
# commands per app.
_SLACK_MAX_SLASH_COMMANDS = 50
_SLACK_NAME_LIMIT = 32
_SLACK_INVALID_CHARS = re.compile(r"[^a-z0-9_\-]")
def _sanitize_slack_name(raw: str) -> str:
"""Convert a command name to a valid Slack slash command name.
Slack allows lowercase a-z, digits, hyphens, and underscores. Max 32
chars. Uppercase is lowercased; invalid chars are stripped.
"""
name = raw.lower()
name = _SLACK_INVALID_CHARS.sub("", name)
name = name.strip("-_")
return name[:_SLACK_NAME_LIMIT]
def slack_native_slashes() -> list[tuple[str, str, str]]:
"""Return (slash_name, description, usage_hint) triples for Slack.
Every gateway-available command in ``COMMAND_REGISTRY`` is surfaced as
a standalone Slack slash command (e.g. ``/btw``, ``/stop``, ``/model``),
matching Discord's and Telegram's model where every command is a
first-class slash and not a ``/hermes <verb>`` subcommand.
Both canonical names and aliases are included so users can type any
documented form (e.g. ``/background``, ``/bg``, and ``/btw`` all work).
Plugin-registered slash commands are included too.
Results are clamped to Slack's 50-command limit with duplicate-name
avoidance. ``/hermes`` is always reserved as the first entry so the
legacy ``/hermes <subcommand>`` form keeps working for anything that
gets dropped by the clamp or for free-form questions.
"""
overrides = _resolve_config_gates()
entries: list[tuple[str, str, str]] = []
seen: set[str] = set()
# Reserve /hermes as the catch-all top-level command.
entries.append(("hermes", "Talk to Hermes or run a subcommand", "[subcommand] [args]"))
seen.add("hermes")
def _add(name: str, desc: str, hint: str) -> None:
slack_name = _sanitize_slack_name(name)
if not slack_name or slack_name in seen:
return
if len(entries) >= _SLACK_MAX_SLASH_COMMANDS:
return
# Slack description cap is 2000 chars; keep it short.
entries.append((slack_name, desc[:140], hint[:100]))
seen.add(slack_name)
# First pass: canonical names (so they win slots if we hit the cap).
for cmd in COMMAND_REGISTRY:
if not _is_gateway_available(cmd, overrides):
continue
_add(cmd.name, cmd.description, cmd.args_hint or "")
# Second pass: aliases.
for cmd in COMMAND_REGISTRY:
if not _is_gateway_available(cmd, overrides):
continue
for alias in cmd.aliases:
# Skip aliases that only differ from canonical by case/punctuation
# normalization (already covered by _add dedup).
_add(alias, f"Alias for /{cmd.name}{cmd.description}", cmd.args_hint or "")
# Third pass: plugin commands.
for name, description, args_hint in _iter_plugin_command_entries():
_add(name, description, args_hint or "")
return entries
def slack_app_manifest(request_url: str = "https://hermes-agent.local/slack/commands") -> dict[str, Any]:
"""Generate a Slack app manifest with all gateway commands as slashes.
``request_url`` is required by Slack's manifest schema for every slash
command, but in Socket Mode (which we use) Slack ignores it and routes
the command event through the WebSocket. A placeholder URL is fine.
The returned dict is the ``features.slash_commands`` portion only
callers compose it into a full manifest (or merge into an existing
one). Keeping it narrow avoids coupling us to the rest of the manifest
schema (display_information, oauth_config, settings, etc.) which users
set up once in the Slack UI and rarely change.
"""
slashes = []
for name, desc, usage in slack_native_slashes():
entry = {
"command": f"/{name}",
"description": desc or f"Run /{name}",
"should_escape": False,
"url": request_url,
}
if usage:
entry["usage_hint"] = usage
slashes.append(entry)
return {"features": {"slash_commands": slashes}}
def slack_subcommand_map() -> dict[str, str]:
"""Return subcommand -> /command mapping for Slack /hermes handler.
+220 -9
View File
@@ -389,6 +389,20 @@ DEFAULT_CONFIG = {
# (60+ tool iterations with tiny output) before users assume the
# bot is dead and /restart.
"gateway_notify_interval": 180,
# How user-attached images are presented to the main model on each turn.
# "auto" — attach natively when the active model reports
# supports_vision=True AND the user hasn't explicitly
# configured auxiliary.vision.provider. Otherwise fall
# back to text (vision_analyze pre-analysis).
# "native" — always attach natively; non-vision models will either
# error at the provider or get a last-chance text fallback
# (see run_agent._prepare_messages_for_api).
# "text" — always pre-analyze with vision_analyze and prepend the
# description as text; the main model never sees pixels.
# Affects gateway platforms, the TUI, and CLI /attach. vision_analyze
# remains available as a tool regardless of this setting — the routing
# only controls how inbound user images are presented.
"image_input_mode": "auto",
},
"terminal": {
@@ -465,6 +479,7 @@ DEFAULT_CONFIG = {
"command_timeout": 30, # Timeout for browser commands in seconds (screenshot, navigate, etc.)
"record_sessions": False, # Auto-record browser sessions as WebM videos
"allow_private_urls": False, # Allow navigating to private/internal IPs (localhost, 192.168.x.x, etc.)
"auto_local_for_private_urls": True, # When a cloud provider is set, auto-spawn local Chromium for LAN/localhost URLs instead of sending them to the cloud
"cdp_url": "", # Optional persistent CDP endpoint for attaching to an existing Chromium/Chrome
# CDP supervisor — dialog + frame detection via a persistent WebSocket.
# Active only when a CDP-capable backend is attached (Browserbase or
@@ -486,6 +501,19 @@ DEFAULT_CONFIG = {
"checkpoints": {
"enabled": True,
"max_snapshots": 50, # Max checkpoints to keep per directory
# Auto-maintenance: shadow repos accumulate forever under
# ~/.hermes/checkpoints/ (one per cd'd working directory). Field
# reports put the typical offender at 1000+ repos / ~12 GB. When
# auto_prune is on, hermes sweeps at startup (at most once per
# min_interval_hours) and deletes:
# * orphan repos: HERMES_WORKDIR no longer exists on disk
# * stale repos: newest mtime older than retention_days
# Opt-in so users who rely on /rollback against long-ago sessions
# never lose data silently.
"auto_prune": False,
"retention_days": 7,
"delete_orphans": True,
"min_interval_hours": 24,
},
# Maximum characters returned by a single read_file call. Reads that
@@ -521,6 +549,12 @@ DEFAULT_CONFIG = {
},
# Anthropic prompt caching (Claude via OpenRouter or native Anthropic API).
# cache_ttl must be "5m" or "1h" (Anthropic-supported tiers); other values are ignored.
"prompt_caching": {
"cache_ttl": "5m",
},
# AWS Bedrock provider configuration.
# Only used when model.provider is "bedrock".
"bedrock": {
@@ -606,14 +640,6 @@ DEFAULT_CONFIG = {
"timeout": 30,
"extra_body": {},
},
"flush_memories": {
"provider": "auto",
"model": "",
"base_url": "",
"api_key": "",
"timeout": 30,
"extra_body": {},
},
"title_generation": {
"provider": "auto",
"model": "",
@@ -628,7 +654,7 @@ DEFAULT_CONFIG = {
"compact": False,
"personality": "kawaii",
"resume_display": "full",
"busy_input_mode": "interrupt",
"busy_input_mode": "interrupt", # interrupt | queue | steer
"bell_on_complete": False,
"show_reasoning": False,
"streaming": False,
@@ -777,6 +803,15 @@ DEFAULT_CONFIG = {
# warning log if out of range.
"max_spawn_depth": 1, # depth cap (1 = flat [default], 2 = orchestrator→leaf, 3 = three-level)
"orchestrator_enabled": True, # kill switch for role="orchestrator"
# When a subagent hits a dangerous-command approval prompt, the parent's
# prompt_toolkit TUI owns stdin — a thread-local input() call from the
# subagent worker would deadlock the parent UI. To avoid the deadlock,
# subagent threads ALWAYS resolve approvals non-interactively:
# false (default) → auto-deny with a logger.warning audit line (safe)
# true → auto-approve "once" with a logger.warning audit line
# Flip to true only if you trust delegated work to run dangerous cmds
# without human review (cron pipelines, batch automation, etc.).
"subagent_auto_approve": False,
},
# Ephemeral prefill messages file — JSON list of {role, content} dicts
@@ -952,6 +987,27 @@ DEFAULT_CONFIG = {
"backup_count": 3, # Number of rotated backup files to keep
},
# Remotely-hosted model catalog manifest. When enabled, the CLI fetches
# curated model lists for OpenRouter and Nous Portal from this URL,
# falling back to the in-repo snapshot on network failure. Lets us
# update model picker lists without shipping a hermes-agent release.
# The default URL is served by the docs site GitHub Pages deploy.
"model_catalog": {
"enabled": True,
"url": "https://hermes-agent.nousresearch.com/docs/api/model-catalog.json",
# Disk cache TTL in hours. Beyond this, the CLI refetches on the
# next /model or `hermes model` invocation; network failures
# silently fall back to the stale cache.
"ttl_hours": 24,
# Optional per-provider override URLs for third parties that want
# to self-host their own curation list using the same schema.
# Example:
# providers:
# openrouter:
# url: https://example.com/my-curation.json
"providers": {},
},
# Network settings — workarounds for connectivity issues.
"network": {
# Force IPv4 connections. On servers with broken or unreachable IPv6,
@@ -988,6 +1044,27 @@ DEFAULT_CONFIG = {
"min_interval_hours": 24,
},
# Contextual first-touch onboarding hints (see agent/onboarding.py).
# Each hint is shown once per install and then latched here so it
# never fires again. Users can wipe the section to re-see all hints.
"onboarding": {
"seen": {},
},
# ``hermes update`` behaviour.
"updates": {
# Run a full ``hermes backup``-style zip of HERMES_HOME before every
# ``hermes update``. Backups land in ``<HERMES_HOME>/backups/`` and
# can be restored with ``hermes import <path>``. Off by default —
# on large HERMES_HOME directories the zip can add minutes to every
# update. Set to true to re-enable, or pass ``--backup`` to opt in
# for a single update run.
"pre_update_backup": False,
# How many pre-update backup zips to retain. Older ones are pruned
# automatically after each successful backup.
"backup_keep": 5,
},
# Config schema version - bump this when adding new required fields
"_config_version": 22,
}
@@ -1177,6 +1254,22 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"GMI_API_KEY": {
"description": "GMI Cloud API key",
"prompt": "GMI Cloud API key",
"url": "https://www.gmicloud.ai/",
"password": True,
"category": "provider",
"advanced": True,
},
"GMI_BASE_URL": {
"description": "GMI Cloud base URL override",
"prompt": "GMI Cloud base URL (leave empty for default)",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
"MINIMAX_API_KEY": {
"description": "MiniMax API key (international)",
"prompt": "MiniMax API key",
@@ -1364,6 +1457,21 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"AZURE_FOUNDRY_API_KEY": {
"description": "Azure Foundry API key for custom Azure endpoints",
"prompt": "Azure Foundry API Key",
"url": "https://ai.azure.com/",
"password": True,
"category": "provider",
},
"AZURE_FOUNDRY_BASE_URL": {
"description": "Azure Foundry base URL (set via 'hermes model' for endpoint-specific config)",
"prompt": "Azure Foundry base URL",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
# ── Tool API keys ──
"EXA_API_KEY": {
@@ -1531,6 +1639,44 @@ OPTIONAL_ENV_VARS = {
"category": "tool",
},
# ── Bundled skills (opt-in: only needed if the user uses that skill) ──
# These use category="skill" (distinct from "tool") so the sandbox
# env blocklist in tools/environments/local.py does NOT rewrite them —
# skills legitimately need these passed through to curl via
# tools/env_passthrough.py when the user's skill calls out.
"NOTION_API_KEY": {
"description": "Notion integration token (used by the `notion` skill)",
"prompt": "Notion API key",
"url": "https://www.notion.so/my-integrations",
"password": True,
"category": "skill",
"advanced": True,
},
"LINEAR_API_KEY": {
"description": "Linear personal API key (used by the `linear` skill)",
"prompt": "Linear API key",
"url": "https://linear.app/settings/api",
"password": True,
"category": "skill",
"advanced": True,
},
"AIRTABLE_API_KEY": {
"description": "Airtable personal access token (used by the `airtable` skill)",
"prompt": "Airtable API key",
"url": "https://airtable.com/create/tokens",
"password": True,
"category": "skill",
"advanced": True,
},
"TENOR_API_KEY": {
"description": "Tenor API key for GIF search (used by the `gif-search` skill)",
"prompt": "Tenor API key",
"url": "https://developers.google.com/tenor/guides/quickstart",
"password": True,
"category": "skill",
"advanced": True,
},
# ── Honcho ──
"HONCHO_API_KEY": {
"description": "Honcho API key for AI-native persistent memory",
@@ -2199,6 +2345,71 @@ def get_compatible_custom_providers(
return compatible
def get_custom_provider_context_length(
model: str,
base_url: str,
custom_providers: Optional[List[Dict[str, Any]]] = None,
config: Optional[Dict[str, Any]] = None,
) -> Optional[int]:
"""Look up a per-model ``context_length`` override from ``custom_providers``.
Matches any entry whose ``base_url`` equals ``base_url`` (trailing-slash
insensitive) and returns ``custom_providers[i].models.<model>.context_length``
if present and valid. Returns ``None`` when no override applies.
This is the single source of truth for custom-provider context overrides,
used by:
* ``AIAgent.__init__`` (startup resolution)
* ``AIAgent.switch_model`` (mid-session ``/model`` switch)
* ``hermes_cli.model_switch.resolve_display_context_length`` (``/model`` confirmation display)
* ``gateway.run._format_session_info`` (``/info`` display)
* ``agent.model_metadata.get_model_context_length`` (when custom_providers is threaded through)
Before this helper existed, the lookup was duplicated in ``run_agent.py``'s
startup path only; every other path (notably ``/model`` switch) fell back
to the 128K default. See #15779.
"""
if not model or not base_url:
return None
if custom_providers is None:
try:
custom_providers = get_compatible_custom_providers(config)
except Exception:
if config is None:
return None
raw = config.get("custom_providers")
custom_providers = raw if isinstance(raw, list) else []
if not isinstance(custom_providers, list):
return None
target_url = (base_url or "").rstrip("/")
if not target_url:
return None
for entry in custom_providers:
if not isinstance(entry, dict):
continue
entry_url = (entry.get("base_url") or "").rstrip("/")
if not entry_url or entry_url != target_url:
continue
models = entry.get("models")
if not isinstance(models, dict):
continue
model_cfg = models.get(model)
if not isinstance(model_cfg, dict):
continue
raw_ctx = model_cfg.get("context_length")
if raw_ctx is None:
continue
try:
ctx = int(raw_ctx)
except (TypeError, ValueError):
continue
if ctx > 0:
return ctx
return None
def check_config_version() -> Tuple[int, int]:
"""
Check config version.
+93
View File
@@ -275,6 +275,99 @@ def copilot_device_code_login(
return None
# ─── Copilot Token Exchange ────────────────────────────────────────────────
# Module-level cache for exchanged Copilot API tokens.
# Maps raw_token_fingerprint -> (api_token, expires_at_epoch).
_jwt_cache: dict[str, tuple[str, float]] = {}
_JWT_REFRESH_MARGIN_SECONDS = 120 # refresh 2 min before expiry
# Token exchange endpoint and headers (matching VS Code / Copilot CLI)
_TOKEN_EXCHANGE_URL = "https://api.github.com/copilot_internal/v2/token"
_EDITOR_VERSION = "vscode/1.104.1"
_EXCHANGE_USER_AGENT = "GitHubCopilotChat/0.26.7"
def _token_fingerprint(raw_token: str) -> str:
"""Short fingerprint of a raw token for cache keying (avoids storing full token)."""
import hashlib
return hashlib.sha256(raw_token.encode()).hexdigest()[:16]
def exchange_copilot_token(raw_token: str, *, timeout: float = 10.0) -> tuple[str, float]:
"""Exchange a raw GitHub token for a short-lived Copilot API token.
Calls ``GET https://api.github.com/copilot_internal/v2/token`` with
the raw GitHub token and returns ``(api_token, expires_at)``.
The returned token is a semicolon-separated string (not a standard JWT)
used as ``Authorization: Bearer <token>`` for Copilot API requests.
Results are cached in-process and reused until close to expiry.
Raises ``ValueError`` on failure.
"""
import urllib.request
fp = _token_fingerprint(raw_token)
# Check cache first
cached = _jwt_cache.get(fp)
if cached:
api_token, expires_at = cached
if time.time() < expires_at - _JWT_REFRESH_MARGIN_SECONDS:
return api_token, expires_at
req = urllib.request.Request(
_TOKEN_EXCHANGE_URL,
method="GET",
headers={
"Authorization": f"token {raw_token}",
"User-Agent": _EXCHANGE_USER_AGENT,
"Accept": "application/json",
"Editor-Version": _EDITOR_VERSION,
},
)
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
data = json.loads(resp.read().decode())
except Exception as exc:
raise ValueError(f"Copilot token exchange failed: {exc}") from exc
api_token = data.get("token", "")
expires_at = data.get("expires_at", 0)
if not api_token:
raise ValueError("Copilot token exchange returned empty token")
# Convert expires_at to float if needed
expires_at = float(expires_at) if expires_at else time.time() + 1800
_jwt_cache[fp] = (api_token, expires_at)
logger.debug(
"Copilot token exchanged, expires_at=%s",
expires_at,
)
return api_token, expires_at
def get_copilot_api_token(raw_token: str) -> str:
"""Exchange a raw GitHub token for a Copilot API token, with fallback.
Convenience wrapper: returns the exchanged token on success, or the
raw token unchanged if the exchange fails (e.g. network error, unsupported
account type). This preserves existing behaviour for accounts that don't
need exchange while enabling access to internal-only models for those that do.
"""
if not raw_token:
return raw_token
try:
api_token, _ = exchange_copilot_token(raw_token)
return api_token
except Exception as exc:
logger.debug("Copilot token exchange failed, using raw token: %s", exc)
return raw_token
# ─── Copilot API Headers ───────────────────────────────────────────────────
def copilot_request_headers(
+9
View File
@@ -93,6 +93,9 @@ def cron_list(show_all: bool = False):
script = job.get("script")
if script:
print(f" Script: {script}")
workdir = job.get("workdir")
if workdir:
print(f" Workdir: {workdir}")
# Execution history
last_status = job.get("last_status")
@@ -168,6 +171,7 @@ def cron_create(args):
skill=getattr(args, "skill", None),
skills=_normalize_skills(getattr(args, "skill", None), getattr(args, "skills", None)),
script=getattr(args, "script", None),
workdir=getattr(args, "workdir", None),
)
if not result.get("success"):
print(color(f"Failed to create job: {result.get('error', 'unknown error')}", Colors.RED))
@@ -180,6 +184,8 @@ def cron_create(args):
job_data = result.get("job", {})
if job_data.get("script"):
print(f" Script: {job_data['script']}")
if job_data.get("workdir"):
print(f" Workdir: {job_data['workdir']}")
print(f" Next run: {result['next_run_at']}")
return 0
@@ -218,6 +224,7 @@ def cron_edit(args):
repeat=getattr(args, "repeat", None),
skills=final_skills,
script=getattr(args, "script", None),
workdir=getattr(args, "workdir", None),
)
if not result.get("success"):
print(color(f"Failed to update job: {result.get('error', 'unknown error')}", Colors.RED))
@@ -233,6 +240,8 @@ def cron_edit(args):
print(" Skills: none")
if updated.get("script"):
print(f" Script: {updated['script']}")
if updated.get("workdir"):
print(f" Workdir: {updated['workdir']}")
return 0
+11 -5
View File
@@ -45,8 +45,13 @@ def _pending_file() -> Path:
Each entry: ``{"url": "...", "expire_at": <unix_ts>}``. Scheduled
DELETEs used to be handled by spawning a detached Python process per
paste that slept for 6 hours; those accumulated forever if the user
ran ``hermes debug share`` repeatedly. We now persist the schedule
to disk and sweep expired entries on the next debug invocation.
ran ``hermes debug share`` repeatedly.
Deletion is now driven by the gateway's cron ticker
(``gateway/run.py::_start_cron_ticker``) which calls
``_sweep_expired_pastes`` once per hour. ``hermes debug share`` also
runs an opportunistic sweep on entry as a fallback for CLI-only users
who never start the gateway.
"""
return get_hermes_home() / "pastes" / "pending.json"
@@ -223,9 +228,10 @@ def _schedule_auto_delete(urls: list[str], delay_seconds: int = _AUTO_DELETE_SEC
interpreters that never exited until the sleep completed.
The replacement is stateless: we append to ``~/.hermes/pastes/pending.json``
and rely on opportunistic sweeps (``_sweep_expired_pastes``) called from
every ``hermes debug`` invocation. If the user never runs ``hermes debug``
again, paste.rs's own retention policy handles cleanup.
and the gateway's cron ticker sweeps expired entries once per hour.
``hermes debug share`` also runs an opportunistic sweep as a fallback
for CLI-only users. If neither runs again, paste.rs's own retention
policy handles cleanup.
"""
_record_pending(urls, delay_seconds=delay_seconds)
+35 -8
View File
@@ -29,6 +29,7 @@ if _env_path.exists():
load_dotenv(PROJECT_ROOT / ".env", override=False, encoding="utf-8")
from hermes_cli.colors import Colors, color
from hermes_cli.models import _HERMES_USER_AGENT
from hermes_constants import OPENROUTER_MODELS_URL
from utils import base_url_host_matches
@@ -45,6 +46,7 @@ _PROVIDER_ENV_HINTS = (
"Z_AI_API_KEY",
"KIMI_API_KEY",
"KIMI_CN_API_KEY",
"GMI_API_KEY",
"MINIMAX_API_KEY",
"MINIMAX_CN_API_KEY",
"KILOCODE_API_KEY",
@@ -295,16 +297,37 @@ def run_doctor(args):
except Exception:
pass
try:
from hermes_cli.auth import resolve_provider as _resolve_provider
from hermes_cli.config import get_compatible_custom_providers as _compatible_custom_providers
from hermes_cli.providers import resolve_provider_full as _resolve_provider_full
except Exception:
_resolve_provider = None
_compatible_custom_providers = None
_resolve_provider_full = None
custom_providers = []
if _compatible_custom_providers is not None:
try:
custom_providers = _compatible_custom_providers(cfg)
except Exception:
custom_providers = []
user_providers = cfg.get("providers")
if isinstance(user_providers, dict):
known_providers.update(str(name).strip().lower() for name in user_providers if str(name).strip())
for entry in custom_providers:
if not isinstance(entry, dict):
continue
name = str(entry.get("name") or "").strip()
if name:
known_providers.add("custom:" + name.lower().replace(" ", "-"))
canonical_provider = provider
if provider and _resolve_provider is not None and provider != "auto":
try:
canonical_provider = _resolve_provider(provider)
except Exception:
canonical_provider = None
if (
provider
and _resolve_provider_full is not None
and provider not in ("auto", "custom")
):
provider_def = _resolve_provider_full(provider, user_providers, custom_providers)
canonical_provider = provider_def.id if provider_def is not None else None
if provider and provider != "auto":
if canonical_provider is None or (known_providers and canonical_provider not in known_providers):
@@ -915,6 +938,7 @@ def run_doctor(args):
("StepFun Step Plan", ("STEPFUN_API_KEY",), "https://api.stepfun.ai/step_plan/v1/models", "STEPFUN_BASE_URL", True),
("Kimi / Moonshot (China)", ("KIMI_CN_API_KEY",), "https://api.moonshot.cn/v1/models", None, True),
("Arcee AI", ("ARCEEAI_API_KEY",), "https://api.arcee.ai/api/v1/models", "ARCEE_BASE_URL", True),
("GMI Cloud", ("GMI_API_KEY",), "https://api.gmi-serving.com/v1/models", "GMI_BASE_URL", True),
("DeepSeek", ("DEEPSEEK_API_KEY",), "https://api.deepseek.com/v1/models", "DEEPSEEK_BASE_URL", True),
("Hugging Face", ("HF_TOKEN",), "https://router.huggingface.co/v1/models", "HF_BASE_URL", True),
("NVIDIA NIM", ("NVIDIA_API_KEY",), "https://integrate.api.nvidia.com/v1/models", "NVIDIA_BASE_URL", True),
@@ -957,7 +981,10 @@ def run_doctor(args):
if base_url_host_matches(_base, "api.kimi.com") and _base.rstrip("/").endswith("/coding"):
_base = _base.rstrip("/") + "/v1"
_url = (_base.rstrip("/") + "/models") if _base else _default_url
_headers = {"Authorization": f"Bearer {_key}"}
_headers = {
"Authorization": f"Bearer {_key}",
"User-Agent": _HERMES_USER_AGENT,
}
if base_url_host_matches(_base, "api.kimi.com"):
_headers["User-Agent"] = "claude-code/0.1.0"
_resp = httpx.get(
+2
View File
@@ -267,6 +267,8 @@ def run_dump(args):
("ANTHROPIC_API_KEY", "anthropic"),
("ANTHROPIC_TOKEN", "anthropic_token"),
("NOUS_API_KEY", "nous"),
("GOOGLE_API_KEY", "google/gemini"),
("GEMINI_API_KEY", "gemini"),
("GLM_API_KEY", "glm/zai"),
("ZAI_API_KEY", "zai"),
("KIMI_API_KEY", "kimi"),
+361
View File
@@ -0,0 +1,361 @@
"""
hermes fallback manage the fallback provider chain.
Fallback providers are tried in order when the primary model fails with
rate-limit, overload, or connection errors. See:
https://hermes-agent.nousresearch.com/docs/user-guide/features/fallback-providers
Subcommands:
hermes fallback [list] Show the current fallback chain (default when no subcommand)
hermes fallback add Pick provider + model via the same picker as `hermes model`,
then append the selection to the chain
hermes fallback remove Pick an entry to delete from the chain
hermes fallback clear Remove all fallback entries
Storage: ``fallback_providers`` in ``~/.hermes/config.yaml`` (top-level, list of
``{provider, model, base_url?, api_mode?}`` dicts). The legacy single-dict
``fallback_model`` format is migrated to the new list format on first add.
"""
from __future__ import annotations
import copy
from typing import Any, Dict, List, Optional
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _read_chain(config: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Return the normalized fallback chain as a list of dicts.
Accepts both the new list format (``fallback_providers``) and the legacy
single-dict format (``fallback_model``). The returned list is always a
fresh copy callers can mutate without touching the config dict.
"""
chain = config.get("fallback_providers") or []
if isinstance(chain, list):
result = [dict(e) for e in chain if isinstance(e, dict) and e.get("provider") and e.get("model")]
if result:
return result
legacy = config.get("fallback_model")
if isinstance(legacy, dict) and legacy.get("provider") and legacy.get("model"):
return [dict(legacy)]
if isinstance(legacy, list):
return [dict(e) for e in legacy if isinstance(e, dict) and e.get("provider") and e.get("model")]
return []
def _write_chain(config: Dict[str, Any], chain: List[Dict[str, Any]]) -> None:
"""Persist the chain to ``fallback_providers`` and clear legacy key."""
config["fallback_providers"] = chain
# Drop the legacy single-dict key on write so there's only one source of truth.
if "fallback_model" in config:
config.pop("fallback_model", None)
def _format_entry(entry: Dict[str, Any]) -> str:
"""One-line human-readable rendering of a fallback entry."""
provider = entry.get("provider", "?")
model = entry.get("model", "?")
base = entry.get("base_url")
suffix = f" [{base}]" if base else ""
return f"{model} (via {provider}){suffix}"
def _extract_fallback_from_model_cfg(model_cfg: Any) -> Optional[Dict[str, Any]]:
"""Pull the ``{provider, model, base_url?, api_mode?}`` dict from a ``config["model"]`` snapshot."""
if not isinstance(model_cfg, dict):
return None
provider = (model_cfg.get("provider") or "").strip()
# The picker writes the selected model to ``model.default``.
model = (model_cfg.get("default") or model_cfg.get("model") or "").strip()
if not provider or not model:
return None
entry: Dict[str, Any] = {"provider": provider, "model": model}
base_url = (model_cfg.get("base_url") or "").strip()
if base_url:
entry["base_url"] = base_url
api_mode = (model_cfg.get("api_mode") or "").strip()
if api_mode:
entry["api_mode"] = api_mode
return entry
def _snapshot_auth_active_provider() -> Any:
"""Return the current ``active_provider`` in auth.json, or a sentinel if unavailable."""
try:
from hermes_cli.auth import _load_auth_store
store = _load_auth_store()
return store.get("active_provider")
except Exception:
return None
def _restore_auth_active_provider(value: Any) -> None:
"""Write back a previously snapshotted ``active_provider`` value."""
try:
from hermes_cli.auth import _auth_store_lock, _load_auth_store, _save_auth_store
with _auth_store_lock():
store = _load_auth_store()
store["active_provider"] = value
_save_auth_store(store)
except Exception:
# Best-effort — if auth.json can't be restored, the user's primary
# provider may have been deactivated by the picker. They can re-run
# `hermes model` to fix it. Don't fail the fallback add.
pass
# ---------------------------------------------------------------------------
# Subcommand handlers
# ---------------------------------------------------------------------------
def cmd_fallback_list(args) -> None: # noqa: ARG001
"""Print the current fallback chain."""
from hermes_cli.config import load_config
config = load_config()
chain = _read_chain(config)
print()
if not chain:
print(" No fallback providers configured.")
print()
print(" Add one with: hermes fallback add")
print()
return
primary = _describe_primary(config)
if primary:
print(f" Primary: {primary}")
print()
print(f" Fallback chain ({len(chain)} {'entry' if len(chain) == 1 else 'entries'}):")
for i, entry in enumerate(chain, 1):
print(f" {i}. {_format_entry(entry)}")
print()
print(" Tried in order when the primary fails (rate-limit, 5xx, connection errors).")
print(" Docs: https://hermes-agent.nousresearch.com/docs/user-guide/features/fallback-providers")
print()
def _describe_primary(config: Dict[str, Any]) -> Optional[str]:
"""One-line description of the primary model for display purposes."""
model_cfg = config.get("model")
if isinstance(model_cfg, dict):
provider = (model_cfg.get("provider") or "?").strip() or "?"
model = (model_cfg.get("default") or model_cfg.get("model") or "?").strip() or "?"
return f"{model} (via {provider})"
if isinstance(model_cfg, str) and model_cfg.strip():
return model_cfg.strip()
return None
def cmd_fallback_add(args) -> None:
"""Launch the same picker as `hermes model`, then append the selection to the chain."""
from hermes_cli.main import _require_tty, select_provider_and_model
from hermes_cli.config import load_config, save_config
_require_tty("fallback add")
# Snapshot BEFORE the picker runs so we can distinguish "user actually
# picked something" from "user cancelled" by comparing before/after.
before_cfg = load_config()
model_before = copy.deepcopy(before_cfg.get("model"))
active_provider_before = _snapshot_auth_active_provider()
print()
print(" Adding a fallback provider. The picker below is the same one used by")
print(" `hermes model` — select the provider + model you want as a fallback.")
print()
try:
select_provider_and_model(args=args)
except SystemExit:
# Some provider flows exit on auth failure — restore state and re-raise.
_restore_model_cfg(model_before)
_restore_auth_active_provider(active_provider_before)
raise
# Read the post-picker state to see what the user selected.
after_cfg = load_config()
model_after = after_cfg.get("model")
new_entry = _extract_fallback_from_model_cfg(model_after)
if not new_entry:
# Picker didn't complete (user cancelled or flow bailed). Nothing to do.
_restore_model_cfg(model_before)
_restore_auth_active_provider(active_provider_before)
print()
print(" No fallback added.")
return
# Picker picked the same thing that's already the primary → nothing changed,
# and there's nothing useful to add as a fallback to itself.
primary_entry = _extract_fallback_from_model_cfg(model_before)
if primary_entry and primary_entry["provider"] == new_entry["provider"] \
and primary_entry["model"] == new_entry["model"]:
_restore_model_cfg(model_before)
_restore_auth_active_provider(active_provider_before)
print()
print(f" Selected model matches the current primary ({_format_entry(new_entry)}).")
print(" A provider cannot be a fallback for itself — no change.")
return
# Reload the config with the primary restored, then append the new entry
# to ``fallback_providers``. We deliberately re-load (rather than mutating
# ``after_cfg``) because the picker may have touched other top-level keys
# (custom_providers, providers credentials) that we want to keep.
_restore_model_cfg(model_before)
_restore_auth_active_provider(active_provider_before)
final_cfg = load_config()
chain = _read_chain(final_cfg)
# Reject exact-duplicate fallback entries.
for existing in chain:
if existing.get("provider") == new_entry["provider"] \
and existing.get("model") == new_entry["model"]:
print()
print(f" {_format_entry(new_entry)} is already in the fallback chain — skipped.")
return
chain.append(new_entry)
_write_chain(final_cfg, chain)
save_config(final_cfg)
print()
print(f" Added fallback: {_format_entry(new_entry)}")
print(f" Chain is now {len(chain)} {'entry' if len(chain) == 1 else 'entries'} long.")
print()
print(" Run `hermes fallback list` to view, or `hermes fallback remove` to delete.")
def _restore_model_cfg(model_before: Any) -> None:
"""Restore ``config["model"]`` to a previously-captured snapshot."""
from hermes_cli.config import load_config, save_config
cfg = load_config()
if model_before is None:
cfg.pop("model", None)
else:
cfg["model"] = copy.deepcopy(model_before)
save_config(cfg)
def cmd_fallback_remove(args) -> None: # noqa: ARG001
"""Pick an entry from the chain and remove it."""
from hermes_cli.config import load_config, save_config
config = load_config()
chain = _read_chain(config)
if not chain:
print()
print(" No fallback providers configured — nothing to remove.")
print()
return
choices = [_format_entry(e) for e in chain]
choices.append("Cancel")
try:
from hermes_cli.setup import _curses_prompt_choice
idx = _curses_prompt_choice("Select a fallback to remove:", choices, 0)
except Exception:
idx = _numbered_pick("Select a fallback to remove:", choices)
if idx is None or idx < 0 or idx >= len(chain):
print()
print(" Cancelled — no change.")
return
removed = chain.pop(idx)
_write_chain(config, chain)
save_config(config)
print()
print(f" Removed fallback: {_format_entry(removed)}")
if chain:
print(f" Chain is now {len(chain)} {'entry' if len(chain) == 1 else 'entries'} long.")
else:
print(" Fallback chain is now empty.")
print()
def cmd_fallback_clear(args) -> None: # noqa: ARG001
"""Remove all fallback entries (with confirmation)."""
from hermes_cli.config import load_config, save_config
config = load_config()
chain = _read_chain(config)
if not chain:
print()
print(" No fallback providers configured — nothing to clear.")
print()
return
print()
print(f" Current fallback chain ({len(chain)} {'entry' if len(chain) == 1 else 'entries'}):")
for i, entry in enumerate(chain, 1):
print(f" {i}. {_format_entry(entry)}")
print()
try:
resp = input(" Clear all entries? [y/N]: ").strip().lower()
except (KeyboardInterrupt, EOFError):
print()
print(" Cancelled.")
return
if resp not in ("y", "yes"):
print(" Cancelled — no change.")
return
_write_chain(config, [])
save_config(config)
print()
print(" Fallback chain cleared.")
print()
def _numbered_pick(question: str, choices: List[str]) -> Optional[int]:
"""Fallback numbered-list picker when curses is unavailable."""
print(question)
for i, c in enumerate(choices, 1):
print(f" {i}. {c}")
print()
while True:
try:
val = input(f"Choice [1-{len(choices)}]: ").strip()
if not val:
return None
idx = int(val) - 1
if 0 <= idx < len(choices):
return idx
print(f"Please enter 1-{len(choices)}")
except ValueError:
print("Please enter a number")
except (KeyboardInterrupt, EOFError):
print()
return None
# ---------------------------------------------------------------------------
# Dispatch
# ---------------------------------------------------------------------------
def cmd_fallback(args) -> None:
"""Top-level dispatcher for ``hermes fallback [subcommand]``."""
sub = getattr(args, "fallback_command", None)
if sub in (None, "", "list", "ls"):
cmd_fallback_list(args)
elif sub == "add":
cmd_fallback_add(args)
elif sub in ("remove", "rm"):
cmd_fallback_remove(args)
elif sub == "clear":
cmd_fallback_clear(args)
else:
print(f"Unknown fallback subcommand: {sub}")
print("Use one of: list, add, remove, clear")
raise SystemExit(2)
+24
View File
@@ -2724,6 +2724,24 @@ _PLATFORMS = [
"help": "OpenID to deliver cron results and notifications to."},
],
},
{
"key": "yuanbao",
"label": "Yuanbao",
"emoji": "💎",
"token_var": "YUANBAO_APP_ID",
"setup_instructions": [
"1. Download the Yuanbao app from https://yuanbao.tencent.com/",
"2. In the app, go to PAI → My Bot and create a new bot",
"3. After the bot is created, copy the App ID and App Secret",
"4. Enter them below and Hermes will connect automatically over WebSocket",
],
"vars": [
{"name": "YUANBAO_APP_ID", "prompt": "App ID", "password": False,
"help": "The App ID from your Yuanbao IM Bot credentials."},
{"name": "YUANBAO_APP_SECRET", "prompt": "App Secret", "password": True,
"help": "The App Secret (used for HMAC signing) from your Yuanbao IM Bot."},
],
},
]
@@ -3108,6 +3126,12 @@ def _setup_wecom():
print_success("💬 WeCom configured!")
def _setup_yuanbao():
"""Configure Yuanbao via the standard platform setup."""
yuanbao_platform = next(p for p in _PLATFORMS if p["key"] == "yuanbao")
_setup_standard_platform(yuanbao_platform)
def _is_service_installed() -> bool:
"""Check if the gateway is installed as a system service."""
if supports_systemd_services():
+1
View File
@@ -125,6 +125,7 @@ _DEFAULT_PAYLOADS = {
"task_id": "test-task",
"tool_call_id": "test-call",
"result": '{"output": "hello"}',
"duration_ms": 42,
},
"pre_llm_call": {
"session_id": "test-session",
+1275 -71
View File
File diff suppressed because it is too large Load Diff
+329
View File
@@ -0,0 +1,329 @@
"""Remote model catalog fetcher.
The Hermes docs site hosts a JSON manifest of curated models for providers
we want to update without shipping a release (currently OpenRouter and
Nous Portal). This module fetches, validates, and caches that manifest,
falling back to the in-repo hardcoded lists when the network is unavailable.
Pipeline
--------
1. ``get_catalog()`` returns a parsed manifest dict.
- Checks in-process cache (invalidated by TTL).
- Reads disk cache at ``~/.hermes/cache/model_catalog.json``.
- Fetches the master URL if disk cache is stale or missing.
- On any fetch failure, keeps using the stale cache (or empty dict).
2. ``get_curated_openrouter_models()`` / ``get_curated_nous_models()``
thin accessors returning the shapes existing callers expect. Each
falls back to the in-repo hardcoded list on any lookup failure.
Schema (version 1)
------------------
::
{
"version": 1,
"updated_at": "2026-04-25T22:00:00Z",
"metadata": {...}, # free-form
"providers": {
"openrouter": {
"metadata": {...}, # free-form
"models": [
{"id": "vendor/model", "description": "recommended",
"metadata": {...}} # free-form, model-level
]
},
"nous": {...}
}
}
Unknown fields are ignored extra metadata can be added at either level
without bumping ``version``. ``version`` bumps are reserved for
breaking changes (renaming ``providers``, changing ``models`` shape).
"""
from __future__ import annotations
import json
import logging
import os
import time
import urllib.error
import urllib.request
from pathlib import Path
from typing import Any
from hermes_cli import __version__ as _HERMES_VERSION
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Constants
# ---------------------------------------------------------------------------
DEFAULT_CATALOG_URL = (
"https://hermes-agent.nousresearch.com/docs/api/model-catalog.json"
)
DEFAULT_TTL_HOURS = 24
DEFAULT_FETCH_TIMEOUT = 8.0
SUPPORTED_SCHEMA_VERSION = 1
_HERMES_USER_AGENT = f"hermes-cli/{_HERMES_VERSION}"
# In-process cache to avoid repeated disk + parse work across multiple
# calls within the same session. Invalidated by TTL against the disk file's
# mtime, so calling code never has to think about this.
_catalog_cache: dict[str, Any] | None = None
_catalog_cache_source_mtime: float = 0.0
# ---------------------------------------------------------------------------
# Config
# ---------------------------------------------------------------------------
def _load_catalog_config() -> dict[str, Any]:
"""Load the ``model_catalog`` config block with defaults filled in."""
try:
from hermes_cli.config import load_config
cfg = load_config() or {}
except Exception:
cfg = {}
raw = cfg.get("model_catalog")
if not isinstance(raw, dict):
raw = {}
return {
"enabled": bool(raw.get("enabled", True)),
"url": str(raw.get("url") or DEFAULT_CATALOG_URL),
"ttl_hours": float(raw.get("ttl_hours") or DEFAULT_TTL_HOURS),
"providers": raw.get("providers") if isinstance(raw.get("providers"), dict) else {},
}
def _cache_path() -> Path:
"""Return the disk cache path. Import lazily so tests can monkeypatch home."""
from hermes_constants import get_hermes_home
return get_hermes_home() / "cache" / "model_catalog.json"
# ---------------------------------------------------------------------------
# Fetch + validate + cache
# ---------------------------------------------------------------------------
def _fetch_manifest(url: str, timeout: float) -> dict[str, Any] | None:
"""HTTP GET the manifest URL and return a parsed dict, or None on failure."""
try:
req = urllib.request.Request(
url,
headers={
"Accept": "application/json",
"User-Agent": _HERMES_USER_AGENT,
},
)
with urllib.request.urlopen(req, timeout=timeout) as resp:
data = json.loads(resp.read().decode())
except (urllib.error.URLError, TimeoutError, json.JSONDecodeError, OSError) as exc:
logger.info("model catalog fetch failed (%s): %s", url, exc)
return None
except Exception as exc: # pragma: no cover — defensive
logger.info("model catalog fetch errored (%s): %s", url, exc)
return None
if not _validate_manifest(data):
logger.info("model catalog at %s failed schema validation", url)
return None
return data
def _validate_manifest(data: Any) -> bool:
"""Return True when ``data`` matches the minimum manifest shape."""
if not isinstance(data, dict):
return False
version = data.get("version")
if not isinstance(version, int) or version > SUPPORTED_SCHEMA_VERSION:
# Future schema version we don't understand — refuse rather than
# guess. Older schemas (version < 1) aren't supported either.
return False
providers = data.get("providers")
if not isinstance(providers, dict):
return False
for pname, pblock in providers.items():
if not isinstance(pname, str) or not isinstance(pblock, dict):
return False
models = pblock.get("models")
if not isinstance(models, list):
return False
for m in models:
if not isinstance(m, dict):
return False
if not isinstance(m.get("id"), str) or not m["id"].strip():
return False
return True
def _read_disk_cache() -> tuple[dict[str, Any] | None, float]:
"""Return ``(data_or_none, mtime)``. mtime is 0 if file is missing."""
path = _cache_path()
try:
mtime = path.stat().st_mtime
except (OSError, FileNotFoundError):
return (None, 0.0)
try:
with open(path) as fh:
data = json.load(fh)
except (OSError, json.JSONDecodeError):
return (None, 0.0)
if not _validate_manifest(data):
return (None, 0.0)
return (data, mtime)
def _write_disk_cache(data: dict[str, Any]) -> None:
path = _cache_path()
try:
path.parent.mkdir(parents=True, exist_ok=True)
tmp = path.with_suffix(path.suffix + ".tmp")
with open(tmp, "w") as fh:
json.dump(data, fh, indent=2)
fh.write("\n")
os.replace(tmp, path)
except OSError as exc:
logger.info("model catalog cache write failed: %s", exc)
# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------
def get_catalog(*, force_refresh: bool = False) -> dict[str, Any]:
"""Return the parsed model catalog manifest, or an empty dict on failure.
Callers should treat a missing provider/model as "use the in-repo fallback"
never raise from this function so the CLI keeps working offline.
"""
global _catalog_cache, _catalog_cache_source_mtime
cfg = _load_catalog_config()
if not cfg["enabled"]:
return {}
ttl_seconds = max(0.0, cfg["ttl_hours"] * 3600.0)
disk_data, disk_mtime = _read_disk_cache()
now = time.time()
disk_fresh = disk_data is not None and (now - disk_mtime) < ttl_seconds
# In-process cache hit: disk hasn't changed since we loaded it and still fresh.
if (
not force_refresh
and _catalog_cache is not None
and disk_data is not None
and disk_mtime == _catalog_cache_source_mtime
and disk_fresh
):
return _catalog_cache
# Disk is fresh enough — use it without a network hit.
if not force_refresh and disk_fresh and disk_data is not None:
_catalog_cache = disk_data
_catalog_cache_source_mtime = disk_mtime
return disk_data
# Need to (re)fetch. If it fails, fall back to any stale disk copy.
fetched = _fetch_manifest(cfg["url"], DEFAULT_FETCH_TIMEOUT)
if fetched is not None:
_write_disk_cache(fetched)
new_disk_data, new_mtime = _read_disk_cache()
if new_disk_data is not None:
_catalog_cache = new_disk_data
_catalog_cache_source_mtime = new_mtime
return new_disk_data
_catalog_cache = fetched
_catalog_cache_source_mtime = now
return fetched
if disk_data is not None:
_catalog_cache = disk_data
_catalog_cache_source_mtime = disk_mtime
return disk_data
return {}
def _fetch_provider_override(provider: str) -> dict[str, Any] | None:
"""If ``model_catalog.providers.<name>.url`` is set, fetch that instead."""
cfg = _load_catalog_config()
if not cfg["enabled"]:
return None
provider_cfg = cfg["providers"].get(provider)
if not isinstance(provider_cfg, dict):
return None
override_url = provider_cfg.get("url")
if not isinstance(override_url, str) or not override_url.strip():
return None
# Override fetches skip the disk cache because they're usually
# third-party self-hosted. Re-request on every call but with a short
# timeout so they don't block the picker.
return _fetch_manifest(override_url.strip(), DEFAULT_FETCH_TIMEOUT)
def _get_provider_block(provider: str) -> dict[str, Any] | None:
"""Return the provider's manifest block, respecting per-provider overrides."""
override = _fetch_provider_override(provider)
if override is not None:
block = override.get("providers", {}).get(provider)
if isinstance(block, dict):
return block
catalog = get_catalog()
if not catalog:
return None
block = catalog.get("providers", {}).get(provider)
return block if isinstance(block, dict) else None
def get_curated_openrouter_models() -> list[tuple[str, str]] | None:
"""Return OpenRouter's curated ``[(id, description), ...]`` from the manifest.
Returns ``None`` when the manifest is unavailable, so callers can fall
back to their hardcoded list.
"""
block = _get_provider_block("openrouter")
if not block:
return None
out: list[tuple[str, str]] = []
for m in block.get("models", []):
mid = str(m.get("id") or "").strip()
if not mid:
continue
desc = str(m.get("description") or "")
out.append((mid, desc))
return out or None
def get_curated_nous_models() -> list[str] | None:
"""Return Nous Portal's curated list of model ids from the manifest.
Returns ``None`` when the manifest is unavailable.
"""
block = _get_provider_block("nous")
if not block:
return None
out: list[str] = []
for m in block.get("models", []):
mid = str(m.get("id") or "").strip()
if mid:
out.append(mid)
return out or None
def reset_cache() -> None:
"""Clear the in-process cache. Used by tests and ``hermes model --refresh``."""
global _catalog_cache, _catalog_cache_source_mtime
_catalog_cache = None
_catalog_cache_source_mtime = 0.0
+48 -9
View File
@@ -12,8 +12,12 @@ Different LLM providers expect model identifiers in different formats:
model IDs, but Claude still uses hyphenated native names like
``claude-sonnet-4-6``.
- **OpenCode Go** preserves dots in model names: ``minimax-m2.7``.
- **DeepSeek** only accepts two model identifiers:
``deepseek-chat`` and ``deepseek-reasoner``.
- **DeepSeek** accepts ``deepseek-chat`` (V3), ``deepseek-reasoner``
(R1-family), and the first-class V-series IDs (``deepseek-v4-pro``,
``deepseek-v4-flash``, and any future ``deepseek-v<N>-*``). Older
Hermes revisions folded every non-reasoner input into
``deepseek-chat``, which on aggregators routes to V3 so a user
picking V4 Pro was silently downgraded.
- **Custom** and remaining providers pass the name through as-is.
This module centralises that translation so callers can simply write::
@@ -25,6 +29,7 @@ Inspired by Clawdbot's ``normalizeAnthropicModelId`` pattern.
from __future__ import annotations
import re
from typing import Optional
# ---------------------------------------------------------------------------
@@ -100,6 +105,15 @@ _MATCHING_PREFIX_STRIP_PROVIDERS: frozenset[str] = frozenset({
"custom",
})
# Providers whose APIs require lowercase model IDs. Xiaomi's
# ``api.xiaomimimo.com`` rejects mixed-case names like ``MiMo-V2.5-Pro``
# that users might copy from marketing docs — it only accepts
# ``mimo-v2.5-pro``. After stripping a matching provider prefix, these
# providers also get ``.lower()`` applied.
_LOWERCASE_MODEL_PROVIDERS: frozenset[str] = frozenset({
"xiaomi",
})
# ---------------------------------------------------------------------------
# DeepSeek special handling
# ---------------------------------------------------------------------------
@@ -115,17 +129,30 @@ _DEEPSEEK_REASONER_KEYWORDS: frozenset[str] = frozenset({
})
_DEEPSEEK_CANONICAL_MODELS: frozenset[str] = frozenset({
"deepseek-chat",
"deepseek-reasoner",
"deepseek-chat", # V3 on DeepSeek direct and most aggregators
"deepseek-reasoner", # R1-family reasoning model
"deepseek-v4-pro", # V4 Pro — first-class model ID
"deepseek-v4-flash", # V4 Flash — first-class model ID
})
# First-class V-series IDs (``deepseek-v4-pro``, ``deepseek-v4-flash``,
# future ``deepseek-v5-*``, dated variants like ``deepseek-v4-flash-20260423``).
# Verified empirically 2026-04-24: DeepSeek's Chat Completions API returns
# ``provider: DeepSeek`` / ``model: deepseek-v4-flash-20260423`` when called
# with ``model=deepseek/deepseek-v4-flash``, so these names are not aliases
# of ``deepseek-chat`` and must not be folded into it.
_DEEPSEEK_V_SERIES_RE = re.compile(r"^deepseek-v\d+([-.].+)?$")
def _normalize_for_deepseek(model_name: str) -> str:
"""Map any model input to one of DeepSeek's two accepted identifiers.
"""Map a model input to a DeepSeek-accepted identifier.
Rules:
- Already ``deepseek-chat`` or ``deepseek-reasoner`` -> pass through.
- Contains any reasoner keyword (r1, think, reasoning, cot, reasoner)
- Already a known canonical (``deepseek-chat``/``deepseek-reasoner``/
``deepseek-v4-pro``/``deepseek-v4-flash``) -> pass through.
- Matches the V-series pattern ``deepseek-v<digit>...`` -> pass through
(covers future ``deepseek-v5-*`` and dated variants without a release).
- Contains a reasoner keyword (r1, think, reasoning, cot, reasoner)
-> ``deepseek-reasoner``.
- Everything else -> ``deepseek-chat``.
@@ -133,13 +160,17 @@ def _normalize_for_deepseek(model_name: str) -> str:
model_name: The bare model name (vendor prefix already stripped).
Returns:
One of ``"deepseek-chat"`` or ``"deepseek-reasoner"``.
A DeepSeek-accepted model identifier.
"""
bare = _strip_vendor_prefix(model_name).lower()
if bare in _DEEPSEEK_CANONICAL_MODELS:
return bare
# V-series first-class IDs (v4-pro, v4-flash, future v5-*, dated variants)
if _DEEPSEEK_V_SERIES_RE.match(bare):
return bare
# Check for reasoner-like keywords anywhere in the name
for keyword in _DEEPSEEK_REASONER_KEYWORDS:
if keyword in bare:
@@ -347,6 +378,9 @@ def normalize_model_for_provider(model_input: str, target_provider: str) -> str:
>>> normalize_model_for_provider("claude-sonnet-4.6", "zai")
'claude-sonnet-4.6'
>>> normalize_model_for_provider("MiMo-V2.5-Pro", "xiaomi")
'mimo-v2.5-pro'
"""
name = (model_input or "").strip()
if not name:
@@ -410,7 +444,12 @@ def normalize_model_for_provider(model_input: str, target_provider: str) -> str:
# --- Direct providers: repair matching provider prefixes only ---
if provider in _MATCHING_PREFIX_STRIP_PROVIDERS:
return _strip_matching_provider_prefix(name, provider)
result = _strip_matching_provider_prefix(name, provider)
# Some providers require lowercase model IDs (e.g. Xiaomi's API
# rejects "MiMo-V2.5-Pro" but accepts "mimo-v2.5-pro").
if provider in _LOWERCASE_MODEL_PROVIDERS:
result = result.lower()
return result
# --- Authoritative native providers: preserve user-facing slugs as-is ---
if provider in _AUTHORITATIVE_NATIVE_PROVIDERS:
+110 -20
View File
@@ -527,6 +527,49 @@ def _resolve_alias_fallback(
return None
def resolve_display_context_length(
model: str,
provider: str,
base_url: str = "",
api_key: str = "",
model_info: Optional[ModelInfo] = None,
custom_providers: list | None = None,
) -> Optional[int]:
"""Resolve the context length to show in /model output.
models.dev reports per-vendor context (e.g. gpt-5.5 = 1.05M on openai)
but provider-enforced limits can be lower (e.g. Codex OAuth caps the
same slug at 272k). The authoritative source is
``agent.model_metadata.get_model_context_length`` which already knows
about Codex OAuth, Copilot, Nous, and falls back to models.dev for the
rest.
When ``custom_providers`` is provided, per-model ``context_length``
overrides from ``custom_providers[].models.<id>.context_length`` are
honored this closes #15779 where ``/model`` switch ignored user-set
overrides.
Prefer the provider-aware value; fall back to ``model_info.context_window``
only if the resolver returns nothing.
"""
try:
from agent.model_metadata import get_model_context_length
ctx = get_model_context_length(
model,
base_url=base_url or "",
api_key=api_key or "",
provider=provider or None,
custom_providers=custom_providers,
)
if ctx:
return int(ctx)
except Exception:
pass
if model_info is not None and model_info.context_window:
return int(model_info.context_window)
return None
# ---------------------------------------------------------------------------
# Core model-switching pipeline
# ---------------------------------------------------------------------------
@@ -771,7 +814,10 @@ def switch_model(
if provider_changed or explicit_provider:
try:
runtime = resolve_runtime_provider(requested=target_provider)
runtime = resolve_runtime_provider(
requested=target_provider,
target_model=new_model,
)
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
api_mode = runtime.get("api_mode", "")
@@ -788,10 +834,18 @@ def switch_model(
)
else:
try:
runtime = resolve_runtime_provider(requested=current_provider)
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
api_mode = runtime.get("api_mode", "")
runtime = resolve_runtime_provider(
requested=current_provider,
target_model=new_model,
)
# If resolution fell through to "custom" (e.g. named custom provider like
# "ollama-launch" that resolve_runtime_provider doesn't know), keep existing
# credentials. Otherwise use the resolved values (picks up credential rotation,
# base_url adjustments for OpenCode, etc.).
if runtime.get("provider") != "custom":
api_key = runtime.get("api_key", "")
base_url = runtime.get("base_url", "")
api_mode = runtime.get("api_mode", "")
except Exception:
pass
@@ -815,6 +869,7 @@ def switch_model(
target_provider,
api_key=api_key,
base_url=base_url,
api_mode=api_mode or None,
)
except Exception as e:
validation = {
@@ -824,16 +879,31 @@ def switch_model(
"message": f"Could not validate `{new_model}`: {e}",
}
# Override rejection if model is in the user's saved provider config.
# API /v1/models may not list cloud/aliased models even though the server supports them.
if not validation.get("accepted"):
msg = validation.get("message", "Invalid model")
return ModelSwitchResult(
success=False,
new_model=new_model,
target_provider=target_provider,
provider_label=provider_label,
is_global=is_global,
error_message=msg,
)
override = False
if user_providers:
for up in user_providers:
if isinstance(up, dict) and up.get("provider") == target_provider:
cfg_models = up.get("models", [])
if new_model in cfg_models or any(
m.get("name") == new_model for m in cfg_models if isinstance(m, dict)
):
override = True
break
if override:
validation = {"accepted": True, "persist": True, "recognized": False, "message": validation.get("message", "")}
else:
msg = validation.get("message", "Invalid model")
return ModelSwitchResult(
success=False,
new_model=new_model,
target_provider=target_provider,
provider_label=provider_label,
is_global=is_global,
error_message=msg,
)
# Apply auto-correction if validation found a closer match
if validation.get("corrected_model"):
@@ -936,7 +1006,7 @@ def list_authenticated_providers(
from hermes_cli.auth import PROVIDER_REGISTRY
from hermes_cli.models import (
OPENROUTER_MODELS, _PROVIDER_MODELS,
_MODELS_DEV_PREFERRED, _merge_with_models_dev,
_MODELS_DEV_PREFERRED, _merge_with_models_dev, provider_model_ids,
)
results: List[dict] = []
@@ -984,6 +1054,14 @@ def list_authenticated_providers(
# Check if any env var is set
has_creds = any(os.environ.get(ev) for ev in env_vars)
if not has_creds:
try:
from hermes_cli.auth import _load_auth_store
store = _load_auth_store()
if store and hermes_id in store.get("credential_pool", {}):
has_creds = True
except Exception:
pass
if not has_creds:
continue
@@ -1095,11 +1173,14 @@ def list_authenticated_providers(
if not has_creds:
continue
# Use curated list — look up by Hermes slug, fall back to overlay key
model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
# Merge with models.dev for preferred providers (same rationale as above).
if hermes_slug in _MODELS_DEV_PREFERRED:
model_ids = _merge_with_models_dev(hermes_slug, model_ids)
if hermes_slug in {"copilot", "copilot-acp"}:
model_ids = provider_model_ids(hermes_slug)
else:
# Use curated list — look up by Hermes slug, fall back to overlay key
model_ids = curated.get(hermes_slug, []) or curated.get(pid, [])
# Merge with models.dev for preferred providers (same rationale as above).
if hermes_slug in _MODELS_DEV_PREFERRED:
model_ids = _merge_with_models_dev(hermes_slug, model_ids)
total = len(model_ids)
top = model_ids[:max_models]
@@ -1222,6 +1303,15 @@ def list_authenticated_providers(
if m and m not in models_list:
models_list.append(m)
# Official OpenAI API rows in providers: often have base_url but no
# explicit models: dict — avoid a misleading zero count in /model.
if not models_list:
url_lower = str(api_url).strip().lower()
if "api.openai.com" in url_lower:
fb = curated.get("openai") or []
if fb:
models_list = list(fb)
# Try to probe /v1/models if URL is set (but don't block on it)
# For now just show what we know from config
results.append({
+415 -77
View File
@@ -33,8 +33,6 @@ COPILOT_REASONING_EFFORTS_O_SERIES = ["low", "medium", "high"]
# (model_id, display description shown in menus)
OPENROUTER_MODELS: list[tuple[str, str]] = [
("moonshotai/kimi-k2.6", "recommended"),
("deepseek/deepseek-v4-pro", ""),
("deepseek/deepseek-v4-flash", ""),
("anthropic/claude-opus-4.7", ""),
("anthropic/claude-opus-4.6", ""),
("anthropic/claude-sonnet-4.6", ""),
@@ -42,7 +40,7 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
("anthropic/claude-sonnet-4.5", ""),
("anthropic/claude-haiku-4.5", ""),
("openrouter/elephant-alpha", "free"),
("openai/gpt-5.4", ""),
("openai/gpt-5.5", ""),
("openai/gpt-5.4-mini", ""),
("xiaomi/mimo-v2.5-pro", ""),
("xiaomi/mimo-v2.5", ""),
@@ -65,7 +63,7 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
("nvidia/nemotron-3-super-120b-a12b:free", "free"),
("arcee-ai/trinity-large-preview:free", "free"),
("arcee-ai/trinity-large-thinking", ""),
("openai/gpt-5.4-pro", ""),
("openai/gpt-5.5-pro", ""),
("openai/gpt-5.4-nano", ""),
]
@@ -111,8 +109,6 @@ def _codex_curated_models() -> list[str]:
_PROVIDER_MODELS: dict[str, list[str]] = {
"nous": [
"moonshotai/kimi-k2.6",
"deepseek/deepseek-v4-pro",
"deepseek/deepseek-v4-flash",
"xiaomi/mimo-v2.5-pro",
"xiaomi/mimo-v2.5",
"anthropic/claude-opus-4.7",
@@ -120,7 +116,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"anthropic/claude-sonnet-4.6",
"anthropic/claude-sonnet-4.5",
"anthropic/claude-haiku-4.5",
"openai/gpt-5.4",
"openai/gpt-5.5",
"openai/gpt-5.4-mini",
"openai/gpt-5.3-codex",
"google/gemini-3-pro-preview",
@@ -139,9 +135,21 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"x-ai/grok-4.20-beta",
"nvidia/nemotron-3-super-120b-a12b",
"arcee-ai/trinity-large-thinking",
"openai/gpt-5.4-pro",
"openai/gpt-5.5-pro",
"openai/gpt-5.4-nano",
],
# Native OpenAI Chat Completions (api.openai.com). Used by /model counts and
# provider_model_ids fallback when /v1/models is unavailable.
"openai": [
"gpt-5.4",
"gpt-5.4-mini",
"gpt-5-mini",
"gpt-5.3-codex",
"gpt-5.2-codex",
"gpt-4.1",
"gpt-4o",
"gpt-4o-mini",
],
"openai-codex": _codex_curated_models(),
"copilot-acp": [
"copilot-acp",
@@ -155,10 +163,13 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"gpt-4.1",
"gpt-4o",
"gpt-4o-mini",
"claude-opus-4.6",
"claude-sonnet-4.6",
"claude-sonnet-4",
"claude-sonnet-4.5",
"claude-haiku-4.5",
"gemini-3.1-pro-preview",
"gemini-3-pro-preview",
"gemini-3-flash-preview",
"gemini-2.5-pro",
"grok-code-fast-1",
],
@@ -267,6 +278,14 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"trinity-large-preview",
"trinity-mini",
],
"gmi": [
"zai-org/GLM-5.1-FP8",
"deepseek-ai/DeepSeek-V3.2",
"moonshotai/Kimi-K2.5",
"google/gemini-3.1-flash-lite-preview",
"anthropic/claude-sonnet-4.6",
"openai/gpt-5.4",
],
"opencode-zen": [
"kimi-k2.5",
"gpt-5.4-pro",
@@ -368,6 +387,9 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"us.meta.llama4-maverick-17b-instruct-v1:0",
"us.meta.llama4-scout-17b-instruct-v1:0",
],
# Azure Foundry: user-provided endpoint and model.
# Empty list because models depend on the endpoint configuration.
"azure-foundry": [],
}
# Vercel AI Gateway: derive the bare-model-id catalog from the curated
@@ -682,7 +704,7 @@ def get_nous_recommended_aux_model(
# ---------------------------------------------------------------------------
# Canonical provider list — single source of truth for provider identity.
# Every code path that lists, displays, or iterates providers derives from
# this list: hermes model, /model, /provider, list_authenticated_providers.
# this list: hermes model, /model, list_authenticated_providers.
#
# Fields:
# slug — internal provider ID (used in config.yaml, --provider flag)
@@ -695,7 +717,6 @@ class ProviderEntry(NamedTuple):
label: str
tui_desc: str # detailed description for `hermes model` TUI
CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("nous", "Nous Portal", "Nous Portal (Nous Research subscription)"),
ProviderEntry("openrouter", "OpenRouter", "OpenRouter (100+ models, pay-per-use)"),
@@ -721,10 +742,12 @@ CANONICAL_PROVIDERS: list[ProviderEntry] = [
ProviderEntry("alibaba", "Alibaba Cloud (DashScope)","Alibaba Cloud / DashScope Coding (Qwen + multi-provider)"),
ProviderEntry("ollama-cloud", "Ollama Cloud", "Ollama Cloud (cloud-hosted open models — ollama.com)"),
ProviderEntry("arcee", "Arcee AI", "Arcee AI (Trinity models — direct API)"),
ProviderEntry("gmi", "GMI Cloud", "GMI Cloud (multi-model direct API)"),
ProviderEntry("kilocode", "Kilo Code", "Kilo Code (Kilo Gateway API)"),
ProviderEntry("opencode-zen", "OpenCode Zen", "OpenCode Zen (35+ curated models, pay-as-you-go)"),
ProviderEntry("opencode-go", "OpenCode Go", "OpenCode Go (open models, $10/month subscription)"),
ProviderEntry("bedrock", "AWS Bedrock", "AWS Bedrock (Claude, Nova, Llama, DeepSeek — IAM or API key)"),
ProviderEntry("azure-foundry", "Azure Foundry", "Azure Foundry (OpenAI-style or Anthropic-style endpoint — your Azure AI deployment)"),
]
# Derived dicts — used throughout the codebase
@@ -754,6 +777,8 @@ _PROVIDER_ALIASES = {
"stepfun-coding-plan": "stepfun",
"arcee-ai": "arcee",
"arceeai": "arcee",
"gmi-cloud": "gmi",
"gmicloud": "gmi",
"minimax-china": "minimax-cn",
"minimax_cn": "minimax-cn",
"claude": "anthropic",
@@ -857,7 +882,16 @@ def fetch_openrouter_models(
if _openrouter_catalog_cache is not None and not force_refresh:
return list(_openrouter_catalog_cache)
fallback = list(OPENROUTER_MODELS)
# Prefer the remotely-hosted catalog manifest; fall back to the in-repo
# snapshot when the manifest is unreachable. Both are curated lists that
# drive the picker; the OpenRouter live /v1/models filter (tool support,
# free pricing) is applied on top either way.
try:
from hermes_cli.model_catalog import get_curated_openrouter_models
remote = get_curated_openrouter_models()
except Exception:
remote = None
fallback = list(remote) if remote else list(OPENROUTER_MODELS)
preferred_ids = [mid for mid, _ in fallback]
try:
@@ -910,6 +944,24 @@ def model_ids(*, force_refresh: bool = False) -> list[str]:
return [mid for mid, _ in fetch_openrouter_models(force_refresh=force_refresh)]
def get_curated_nous_model_ids() -> list[str]:
"""Return the curated Nous Portal model-id list.
Prefers the remotely-hosted catalog manifest (published under
``website/static/api/model-catalog.json``); falls back to the in-repo
snapshot in ``_PROVIDER_MODELS["nous"]`` when the manifest is
unreachable. Always returns a list (never None).
"""
try:
from hermes_cli.model_catalog import get_curated_nous_models
remote = get_curated_nous_models()
except Exception:
remote = None
if remote:
return list(remote)
return list(_PROVIDER_MODELS.get("nous", []))
def _ai_gateway_model_is_free(pricing: Any) -> bool:
"""Return True if an AI Gateway model has $0 input AND output pricing."""
if not isinstance(pricing, dict):
@@ -1110,7 +1162,10 @@ def fetch_models_with_pricing(
return _pricing_cache[cache_key]
url = cache_key.rstrip("/") + "/v1/models"
headers: dict[str, str] = {"Accept": "application/json"}
headers: dict[str, str] = {
"Accept": "application/json",
"User-Agent": _HERMES_USER_AGENT,
}
if api_key:
headers["Authorization"] = f"Bearer {api_key}"
@@ -1361,27 +1416,93 @@ def curated_models_for_provider(
return [(m, "") for m in models]
def detect_provider_for_model(
def _provider_keys(provider: str) -> set[str]:
key = (provider or "").strip().lower()
normalized = normalize_provider(provider)
return {k for k in (key, normalized) if k}
def _model_in_provider_catalog(name_lower: str, providers: set[str]) -> bool:
return any(
name_lower == model.lower()
for provider in providers
for model in _PROVIDER_MODELS.get(provider, [])
)
_AGGREGATOR_PROVIDERS = frozenset(
{"nous", "openrouter", "ai-gateway", "copilot", "kilocode"}
)
def _resolve_static_model_alias(
name_lower: str,
current_keys: set[str],
) -> Optional[tuple[str, str]]:
"""Resolve short aliases (e.g. sonnet/opus) using static catalogs only."""
try:
from hermes_cli.model_switch import MODEL_ALIASES
except Exception:
return None
identity = MODEL_ALIASES.get(name_lower)
if identity is None:
return None
vendor = identity.vendor
family = identity.family
def _match(provider: str) -> Optional[str]:
models = _PROVIDER_MODELS.get(provider, [])
if not models:
return None
prefix = (
f"{vendor}/{family}"
if provider in _AGGREGATOR_PROVIDERS
else family
).lower()
for model in models:
if model.lower().startswith(prefix):
return model
return None
for provider in current_keys:
if matched := _match(provider):
return provider, matched
for provider in _PROVIDER_MODELS:
if provider in current_keys or provider in _AGGREGATOR_PROVIDERS:
continue
if matched := _match(provider):
return provider, matched
for provider in _AGGREGATOR_PROVIDERS:
if provider in current_keys and (matched := _match(provider)):
return provider, matched
return None
def detect_static_provider_for_model(
model_name: str,
current_provider: str,
) -> Optional[tuple[str, str]]:
"""Auto-detect the best provider for a model name.
"""Auto-detect a provider from static catalogs only.
Returns ``(provider_id, model_name)`` the model name may be remapped
(e.g. bare ``deepseek-chat`` ``deepseek/deepseek-chat`` for OpenRouter).
Returns ``(provider_id, model_name)``. The model name may be remapped
when a static alias or bare provider name resolves to a catalog default.
Returns ``None`` when no confident match is found.
Priority:
0. Bare provider name switch to that provider's default model
1. Direct provider with credentials (highest)
2. Direct provider without credentials remap to OpenRouter slug
3. OpenRouter catalog match
"""
name = (model_name or "").strip()
if not name:
return None
name_lower = name.lower()
current_keys = _provider_keys(current_provider)
alias_match = _resolve_static_model_alias(name_lower, current_keys)
if alias_match:
return alias_match
# --- Step 0: bare provider name typed as model ---
# If someone types `/model nous` or `/model anthropic`, treat it as a
@@ -1394,64 +1515,49 @@ def detect_provider_for_model(
if (
resolved_provider in _PROVIDER_LABELS
and default_models
and resolved_provider != normalize_provider(current_provider)
and resolved_provider not in current_keys
):
return (resolved_provider, default_models[0])
# Aggregators list other providers' models — never auto-switch TO them
_AGGREGATORS = {"nous", "openrouter", "ai-gateway", "copilot", "kilocode"}
# If the model belongs to the current provider's catalog, don't suggest switching
current_models = _PROVIDER_MODELS.get(current_provider, [])
if any(name_lower == m.lower() for m in current_models):
if _model_in_provider_catalog(name_lower, current_keys):
return None
# --- Step 1: check static provider catalogs for a direct match ---
direct_match: Optional[str] = None
for pid, models in _PROVIDER_MODELS.items():
if pid == current_provider or pid in _AGGREGATORS:
if pid in current_keys or pid in _AGGREGATOR_PROVIDERS:
continue
if any(name_lower == m.lower() for m in models):
direct_match = pid
break
return (pid, name)
if direct_match:
# Check if we have credentials for this provider — env vars,
# credential pool, or auth store entries.
has_creds = False
try:
from hermes_cli.auth import PROVIDER_REGISTRY
pconfig = PROVIDER_REGISTRY.get(direct_match)
if pconfig:
for env_var in pconfig.api_key_env_vars:
if os.getenv(env_var, "").strip():
has_creds = True
break
except Exception:
pass
# Also check credential pool and auth store — covers OAuth,
# Claude Code tokens, and other non-env-var credentials (#10300).
if not has_creds:
try:
from agent.credential_pool import load_pool
pool = load_pool(direct_match)
if pool.has_credentials():
has_creds = True
except Exception:
pass
if not has_creds:
try:
from hermes_cli.auth import _load_auth_store
store = _load_auth_store()
if direct_match in store.get("providers", {}) or direct_match in store.get("credential_pool", {}):
has_creds = True
except Exception:
pass
return None
# Always return the direct provider match. If credentials are
# missing, the client init will give a clear error rather than
# silently routing through the wrong provider (#10300).
return (direct_match, name)
def detect_provider_for_model(
model_name: str,
current_provider: str,
) -> Optional[tuple[str, str]]:
"""Auto-detect the best provider for a model name.
Returns ``(provider_id, model_name)`` the model name may be remapped
(e.g. bare ``deepseek-chat`` ``deepseek/deepseek-chat`` for OpenRouter).
Returns ``None`` when no confident match is found.
Priority:
0. Bare provider name switch to that provider's default model
1. Direct provider static catalog match
2. OpenRouter catalog match
"""
name = (model_name or "").strip()
if not name:
return None
static_match = detect_static_provider_for_model(name, current_provider)
if static_match:
return static_match
if _model_in_provider_catalog(name.lower(), _provider_keys(current_provider)):
return None
# --- Step 2: check OpenRouter catalog ---
# First try exact match (handles provider/model format)
@@ -1742,6 +1848,30 @@ def provider_model_ids(provider: Optional[str], *, force_refresh: bool = False)
live = fetch_ollama_cloud_models(force_refresh=force_refresh)
if live:
return live
if normalized == "openai":
api_key = os.getenv("OPENAI_API_KEY", "").strip()
if api_key:
base_raw = os.getenv("OPENAI_BASE_URL", "").strip().rstrip("/")
base = base_raw or "https://api.openai.com/v1"
try:
live = fetch_api_models(api_key, base)
if live:
return live
except Exception:
pass
if normalized == "gmi":
try:
from hermes_cli.auth import resolve_api_key_provider_credentials
creds = resolve_api_key_provider_credentials("gmi")
api_key = str(creds.get("api_key") or "").strip()
base_url = str(creds.get("base_url") or "").strip()
if api_key and base_url:
live = fetch_api_models(api_key, base_url)
if live:
return live
except Exception:
pass
if normalized == "custom":
base_url = _get_custom_base_url()
if base_url:
@@ -1896,6 +2026,51 @@ def fetch_github_model_catalog(
return None
# ─── Copilot catalog context-window helpers ─────────────────────────────────
# Module-level cache: {model_id: max_prompt_tokens}
_copilot_context_cache: dict[str, int] = {}
_copilot_context_cache_time: float = 0.0
_COPILOT_CONTEXT_CACHE_TTL = 3600 # 1 hour
def get_copilot_model_context(model_id: str, api_key: Optional[str] = None) -> Optional[int]:
"""Look up max_prompt_tokens for a Copilot model from the live /models API.
Results are cached in-process for 1 hour to avoid repeated API calls.
Returns the token limit or None if not found.
"""
global _copilot_context_cache, _copilot_context_cache_time
# Serve from cache if fresh
if _copilot_context_cache and (time.time() - _copilot_context_cache_time < _COPILOT_CONTEXT_CACHE_TTL):
if model_id in _copilot_context_cache:
return _copilot_context_cache[model_id]
# Cache is fresh but model not in it — don't re-fetch
return None
# Fetch and populate cache
catalog = fetch_github_model_catalog(api_key=api_key)
if not catalog:
return None
cache: dict[str, int] = {}
for item in catalog:
mid = str(item.get("id") or "").strip()
if not mid:
continue
caps = item.get("capabilities") or {}
limits = caps.get("limits") or {}
max_prompt = limits.get("max_prompt_tokens")
if isinstance(max_prompt, int) and max_prompt > 0:
cache[mid] = max_prompt
_copilot_context_cache = cache
_copilot_context_cache_time = time.time()
return cache.get(model_id)
def _is_github_models_base_url(base_url: Optional[str]) -> bool:
normalized = (base_url or "").strip().rstrip("/").lower()
return (
@@ -1929,6 +2104,7 @@ _COPILOT_MODEL_ALIASES = {
"openai/o4-mini": "gpt-5-mini",
"anthropic/claude-opus-4.6": "claude-opus-4.6",
"anthropic/claude-sonnet-4.6": "claude-sonnet-4.6",
"anthropic/claude-sonnet-4": "claude-sonnet-4",
"anthropic/claude-sonnet-4.5": "claude-sonnet-4.5",
"anthropic/claude-haiku-4.5": "claude-haiku-4.5",
# Dash-notation fallbacks: Hermes' default Claude IDs elsewhere use
@@ -1938,10 +2114,12 @@ _COPILOT_MODEL_ALIASES = {
# "model_not_supported". See issue #6879.
"claude-opus-4-6": "claude-opus-4.6",
"claude-sonnet-4-6": "claude-sonnet-4.6",
"claude-sonnet-4-0": "claude-sonnet-4",
"claude-sonnet-4-5": "claude-sonnet-4.5",
"claude-haiku-4-5": "claude-haiku-4.5",
"anthropic/claude-opus-4-6": "claude-opus-4.6",
"anthropic/claude-sonnet-4-6": "claude-sonnet-4.6",
"anthropic/claude-sonnet-4-0": "claude-sonnet-4",
"anthropic/claude-sonnet-4-5": "claude-sonnet-4.5",
"anthropic/claude-haiku-4-5": "claude-haiku-4.5",
}
@@ -2071,6 +2249,52 @@ def copilot_model_api_mode(
return "chat_completions"
# Azure Foundry model families that require the Responses API. Azure
# rejects /chat/completions against these deployments with
# ``400 "The requested operation is unsupported."`` — the same payload Bob
# Dobolina hit in April 2026 on ``gpt-5.3-codex`` while ``gpt-4o-pure`` on
# the same endpoint worked fine. Keep the patterns broad enough to cover
# vendor-renamed deployments (e.g. ``gpt-5.3-codex``, ``gpt-5-codex``,
# ``gpt-5.4``, ``o1-preview``) but tight enough to leave GPT-4 / 3.5 / Llama /
# Mistral / Grok deployments on chat completions.
_AZURE_FOUNDRY_RESPONSES_PREFIXES = (
"codex", # codex-*, codex-mini
"gpt-5", # gpt-5, gpt-5.x, gpt-5-codex, gpt-5.x-codex
"o1", # o1, o1-preview, o1-mini
"o3", # o3, o3-mini
"o4", # o4, o4-mini
)
def azure_foundry_model_api_mode(model_name: Optional[str]) -> Optional[str]:
"""Infer Azure Foundry api_mode from a deployment/model name.
Returns ``"codex_responses"`` when the model name matches a family that
only accepts the Responses API on Azure Foundry (GPT-5.x, codex, o1/o3/o4
reasoning models). Returns ``None`` otherwise the caller should fall
back to the configured/default api_mode (typically ``chat_completions``)
so GPT-4o, GPT-4 Turbo, Llama, Mistral, etc. keep working.
Intentionally does NOT return ``anthropic_messages``; Anthropic-style
Azure endpoints are disambiguated by URL (``/anthropic`` suffix) in
``runtime_provider._detect_api_mode_for_url`` and by the user setting
``model.api_mode: anthropic_messages`` explicitly.
"""
raw = str(model_name or "").strip().lower()
if not raw:
return None
# Strip any vendor/ prefix a user may have copied from OpenRouter / Copilot.
if "/" in raw:
raw = raw.rsplit("/", 1)[-1]
# gpt-5-mini speaks chat completions on Copilot but Azure Foundry deploys
# the full gpt-5 family uniformly on Responses API — don't carve an
# exception here.
for prefix in _AZURE_FOUNDRY_RESPONSES_PREFIXES:
if raw.startswith(prefix):
return "codex_responses"
return None
def normalize_opencode_model_id(provider_id: Optional[str], model_id: Optional[str]) -> str:
"""Normalize OpenCode config IDs to the bare model slug used in API requests."""
provider = normalize_provider(provider_id)
@@ -2166,8 +2390,15 @@ def probe_api_models(
api_key: Optional[str],
base_url: Optional[str],
timeout: float = 5.0,
api_mode: Optional[str] = None,
) -> dict[str, Any]:
"""Probe an OpenAI-compatible ``/models`` endpoint with light URL heuristics."""
"""Probe a ``/models`` endpoint with light URL heuristics.
For ``anthropic_messages`` mode, uses ``x-api-key`` and
``anthropic-version`` headers (Anthropic's native auth) instead of
``Authorization: Bearer``. The response shape (``data[].id``) is
identical, so the same parser works for both.
"""
normalized = (base_url or "").strip().rstrip("/")
if not normalized:
return {
@@ -2199,7 +2430,10 @@ def probe_api_models(
tried: list[str] = []
headers: dict[str, str] = {"User-Agent": _HERMES_USER_AGENT}
if api_key:
if api_key and api_mode == "anthropic_messages":
headers["x-api-key"] = api_key
headers["anthropic-version"] = "2023-06-01"
elif api_key:
headers["Authorization"] = f"Bearer {api_key}"
if normalized.startswith(COPILOT_BASE_URL):
headers.update(copilot_default_headers())
@@ -2241,7 +2475,10 @@ def _fetch_ai_gateway_models(timeout: float = 5.0) -> Optional[list[str]]:
base_url = AI_GATEWAY_BASE_URL
url = base_url.rstrip("/") + "/models"
headers: dict[str, str] = {"Authorization": f"Bearer {api_key}"}
headers: dict[str, str] = {
"Authorization": f"Bearer {api_key}",
"User-Agent": _HERMES_USER_AGENT,
}
req = urllib.request.Request(url, headers=headers)
try:
with urllib.request.urlopen(req, timeout=timeout) as resp:
@@ -2261,13 +2498,14 @@ def fetch_api_models(
api_key: Optional[str],
base_url: Optional[str],
timeout: float = 5.0,
api_mode: Optional[str] = None,
) -> Optional[list[str]]:
"""Fetch the list of available model IDs from the provider's ``/models`` endpoint.
Returns a list of model ID strings, or ``None`` if the endpoint could not
be reached (network error, timeout, auth failure, etc.).
"""
return probe_api_models(api_key, base_url, timeout=timeout).get("models")
return probe_api_models(api_key, base_url, timeout=timeout, api_mode=api_mode).get("models")
# ---------------------------------------------------------------------------
@@ -2395,6 +2633,7 @@ def validate_requested_model(
*,
api_key: Optional[str] = None,
base_url: Optional[str] = None,
api_mode: Optional[str] = None,
) -> dict[str, Any]:
"""
Validate a ``/model`` value for the active provider.
@@ -2436,7 +2675,11 @@ def validate_requested_model(
}
if normalized == "custom":
probe = probe_api_models(api_key, base_url)
# Try probing with correct auth for the api_mode.
if api_mode == "anthropic_messages":
probe = probe_api_models(api_key, base_url, api_mode=api_mode)
else:
probe = probe_api_models(api_key, base_url)
api_models = probe.get("models")
if api_models is not None:
if requested_for_lookup in set(api_models):
@@ -2475,8 +2718,8 @@ def validate_requested_model(
)
return {
"accepted": False,
"persist": False,
"accepted": True,
"persist": True,
"recognized": False,
"message": message,
}
@@ -2485,12 +2728,17 @@ def validate_requested_model(
f"Note: could not reach this custom endpoint's model listing at `{probe.get('probed_url')}`. "
f"Hermes will still save `{requested}`, but the endpoint should expose `/models` for verification."
)
if api_mode == "anthropic_messages":
message += (
"\n Many Anthropic-compatible proxies do not implement the Models API "
"(GET /v1/models). The model name has been accepted without verification."
)
if probe.get("suggested_base_url"):
message += f"\n If this server expects `/v1`, try base URL: `{probe.get('suggested_base_url')}`"
return {
"accepted": False,
"persist": False,
"accepted": api_mode == "anthropic_messages",
"persist": True,
"recognized": False,
"message": message,
}
@@ -2578,10 +2826,100 @@ def validate_requested_model(
),
}
# Native Anthropic provider: /v1/models requires x-api-key (or Bearer for
# OAuth) plus anthropic-version headers. The generic OpenAI-style probe
# below uses plain Bearer auth and 401s against Anthropic, so dispatch to
# the native fetcher which handles both API keys and Claude-Code OAuth
# tokens. (The api_mode=="anthropic_messages" branch below handles the
# Messages-API transport case separately.)
if normalized == "anthropic":
anthropic_models = _fetch_anthropic_models()
if anthropic_models is not None:
if requested_for_lookup in set(anthropic_models):
return {
"accepted": True,
"persist": True,
"recognized": True,
"message": None,
}
auto = get_close_matches(requested_for_lookup, anthropic_models, n=1, cutoff=0.9)
if auto:
return {
"accepted": True,
"persist": True,
"recognized": True,
"corrected_model": auto[0],
"message": f"Auto-corrected `{requested}` → `{auto[0]}`",
}
suggestions = get_close_matches(requested, anthropic_models, n=3, cutoff=0.5)
suggestion_text = ""
if suggestions:
suggestion_text = "\n Similar models: " + ", ".join(f"`{s}`" for s in suggestions)
# Accept anyway — Anthropic sometimes gates newer/preview models
# (e.g. snapshot IDs, early-access releases) behind accounts
# even though they aren't listed on /v1/models.
return {
"accepted": True,
"persist": True,
"recognized": False,
"message": (
f"Note: `{requested}` was not found in Anthropic's /v1/models listing. "
f"It may still work if you have early-access or snapshot IDs."
f"{suggestion_text}"
),
}
# _fetch_anthropic_models returned None — no token resolvable or
# network failure. Fall through to the generic warning below.
# Anthropic Messages API: many proxies don't implement /v1/models.
# Try probing with correct auth; if it fails, accept with a warning.
if api_mode == "anthropic_messages":
api_models = fetch_api_models(api_key, base_url, api_mode=api_mode)
if api_models is not None:
if requested_for_lookup in set(api_models):
return {
"accepted": True,
"persist": True,
"recognized": True,
"message": None,
}
auto = get_close_matches(requested_for_lookup, api_models, n=1, cutoff=0.9)
if auto:
return {
"accepted": True,
"persist": True,
"recognized": True,
"corrected_model": auto[0],
"message": f"Auto-corrected `{requested}` → `{auto[0]}`",
}
# Probe failed or model not found — accept anyway (proxy likely
# doesn't implement the Anthropic Models API).
return {
"accepted": True,
"persist": True,
"recognized": False,
"message": (
f"Note: could not verify `{requested}` against this endpoint's "
f"model listing. Many Anthropic-compatible proxies do not "
f"implement GET /v1/models. The model name has been accepted "
f"without verification."
),
}
# Probe the live API to check if the model actually exists
api_models = fetch_api_models(api_key, base_url)
if api_models is not None:
# Gemini's OpenAI-compat /v1beta/openai/models endpoint returns IDs
# prefixed with "models/" (e.g. "models/gemini-2.5-flash") — native
# Gemini-API convention. Our curated list and user input both use
# the bare ID, so a direct set-membership check drops every known
# Gemini model. Strip the prefix before comparison. See #12532.
if normalized == "gemini":
api_models = [
m[len("models/"):] if isinstance(m, str) and m.startswith("models/") else m
for m in api_models
]
if requested_for_lookup in set(api_models):
# API confirmed the model exists
return {
+16 -8
View File
@@ -9,6 +9,7 @@ from typing import Dict, Iterable, Optional, Set
from hermes_cli.auth import get_nous_auth_status
from hermes_cli.config import get_env_value, load_config
from tools.managed_tool_gateway import is_managed_tool_gateway_ready
from utils import is_truthy_value
from tools.tool_backend_helpers import (
fal_key_is_configured,
has_direct_modal_credentials,
@@ -25,6 +26,13 @@ _DEFAULT_PLATFORM_TOOLSETS = {
}
def _uses_gateway(section: object) -> bool:
"""Return True when a config section explicitly opts into the gateway."""
if not isinstance(section, dict):
return False
return is_truthy_value(section.get("use_gateway"), default=False)
@dataclass(frozen=True)
class NousFeatureState:
key: str
@@ -262,11 +270,11 @@ def get_nous_subscription_features(
# use_gateway flags — when True, the user explicitly opted into the
# Tool Gateway via `hermes model`, so direct credentials should NOT
# prevent gateway routing.
web_use_gateway = bool(web_cfg.get("use_gateway"))
tts_use_gateway = bool(tts_cfg.get("use_gateway"))
browser_use_gateway = bool(browser_cfg.get("use_gateway"))
web_use_gateway = _uses_gateway(web_cfg)
tts_use_gateway = _uses_gateway(tts_cfg)
browser_use_gateway = _uses_gateway(browser_cfg)
image_gen_cfg = config.get("image_gen") if isinstance(config.get("image_gen"), dict) else {}
image_use_gateway = bool(image_gen_cfg.get("use_gateway"))
image_use_gateway = _uses_gateway(image_gen_cfg)
direct_exa = bool(get_env_value("EXA_API_KEY"))
direct_firecrawl = bool(get_env_value("FIRECRAWL_API_KEY") or get_env_value("FIRECRAWL_API_URL"))
@@ -601,10 +609,10 @@ def get_gateway_eligible_tools(
# no direct keys exist — we only skip the prompt for tools where
# use_gateway was explicitly set.
opted_in = {
"web": bool((config.get("web") if isinstance(config.get("web"), dict) else {}).get("use_gateway")),
"image_gen": bool((config.get("image_gen") if isinstance(config.get("image_gen"), dict) else {}).get("use_gateway")),
"tts": bool((config.get("tts") if isinstance(config.get("tts"), dict) else {}).get("use_gateway")),
"browser": bool((config.get("browser") if isinstance(config.get("browser"), dict) else {}).get("use_gateway")),
"web": _uses_gateway(config.get("web")),
"image_gen": _uses_gateway(config.get("image_gen")),
"tts": _uses_gateway(config.get("tts")),
"browser": _uses_gateway(config.get("browser")),
}
unconfigured: list[str] = []
+202
View File
@@ -0,0 +1,202 @@
"""Oneshot (-z) mode: send a prompt, get the final content block, exit.
Bypasses cli.py entirely. No banner, no spinner, no session_id line,
no stderr chatter. Just the agent's final text to stdout.
Toolsets = whatever the user has configured for "cli" in `hermes tools`.
Rules / memory / AGENTS.md / preloaded skills = same as a normal chat turn.
Approvals = auto-bypassed (HERMES_YOLO_MODE=1 is set for the call).
Working directory = the user's CWD (AGENTS.md etc. resolve from there as usual).
Model / provider selection mirrors `hermes chat`:
- Both optional. If omitted, use the user's configured default.
- If both given, pair them exactly as given.
- If only --model given, auto-detect the provider that serves it.
- If only --provider given, error out (ambiguous caller must pick a model).
Env var fallbacks (used when the corresponding arg is not passed):
- HERMES_INFERENCE_MODEL
- HERMES_INFERENCE_PROVIDER (already read by resolve_runtime_provider)
"""
from __future__ import annotations
import logging
import os
import sys
from contextlib import redirect_stderr, redirect_stdout
from typing import Optional
def run_oneshot(
prompt: str,
model: Optional[str] = None,
provider: Optional[str] = None,
) -> int:
"""Execute a single prompt and print only the final content block.
Args:
prompt: The user message to send.
model: Optional model override. Falls back to HERMES_INFERENCE_MODEL
env var, then config.yaml's model.default / model.model.
provider: Optional provider override. Falls back to
HERMES_INFERENCE_PROVIDER env var, then config.yaml's model.provider,
then "auto".
Returns the exit code. Caller should sys.exit() with the return.
"""
# Silence every stdlib logger for the duration. AIAgent, tools, and
# provider adapters all log to stderr through the root logger; file
# handlers added by setup_logging() keep working (they're attached to
# the root logger's handler list, not affected by level), but no
# bytes reach the terminal.
logging.disable(logging.CRITICAL)
# --provider without --model is ambiguous: carrying the user's configured
# model across to a different provider is usually wrong (that provider may
# not host it), and silently picking the provider's catalog default hides
# the mismatch. Require the caller to be explicit. Validate BEFORE the
# stderr redirect so the message actually reaches the terminal.
env_model_early = os.getenv("HERMES_INFERENCE_MODEL", "").strip()
if provider and not ((model or "").strip() or env_model_early):
sys.stderr.write(
"hermes -z: --provider requires --model (or HERMES_INFERENCE_MODEL). "
"Pass both explicitly, or neither to use your configured defaults.\n"
)
return 2
# Auto-approve any shell / tool approvals. Non-interactive by
# definition — a prompt would hang forever.
os.environ["HERMES_YOLO_MODE"] = "1"
os.environ["HERMES_ACCEPT_HOOKS"] = "1"
# Redirect stderr AND stdout to devnull for the entire call tree.
# We'll print the final response to the real stdout at the end.
real_stdout = sys.stdout
devnull = open(os.devnull, "w")
try:
with redirect_stdout(devnull), redirect_stderr(devnull):
response = _run_agent(prompt, model=model, provider=provider)
finally:
try:
devnull.close()
except Exception:
pass
if response:
real_stdout.write(response)
if not response.endswith("\n"):
real_stdout.write("\n")
real_stdout.flush()
return 0
def _run_agent(
prompt: str,
model: Optional[str] = None,
provider: Optional[str] = None,
) -> str:
"""Build an AIAgent exactly like a normal CLI chat turn would, then
run a single conversation. Returns the final response string."""
# Imports are local so they don't run when hermes is invoked for
# other commands (keeps top-level CLI startup cheap).
from hermes_cli.config import load_config
from hermes_cli.models import detect_provider_for_model
from hermes_cli.runtime_provider import resolve_runtime_provider
from hermes_cli.tools_config import _get_platform_tools
from run_agent import AIAgent
cfg = load_config()
# Resolve effective model: explicit arg → env var → config.
model_cfg = cfg.get("model") or {}
if isinstance(model_cfg, str):
cfg_model = model_cfg
else:
cfg_model = model_cfg.get("default") or model_cfg.get("model") or ""
env_model = os.getenv("HERMES_INFERENCE_MODEL", "").strip()
effective_model = (model or "").strip() or env_model or cfg_model
# Resolve effective provider: explicit arg → (auto-detect from model if
# model was explicit) → env / config (handled inside resolve_runtime_provider).
#
# When --model is given without --provider, auto-detect the provider that
# serves that model — same semantic as `/model <name>` in an interactive
# session. Without this, resolve_runtime_provider() would fall back to
# the user's configured default provider, which may not host the model
# the caller just asked for.
effective_provider = (provider or "").strip() or None
if effective_provider is None and (model or env_model):
# Only auto-detect when the model was explicitly requested via arg or
# env var (not when it came from config — that's the "use my defaults"
# path and the configured provider is already correct).
explicit_model = (model or "").strip() or env_model
if explicit_model:
cfg_provider = ""
if isinstance(model_cfg, dict):
cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
current_provider = (
cfg_provider
or os.getenv("HERMES_INFERENCE_PROVIDER", "").strip().lower()
or "auto"
)
detected = detect_provider_for_model(explicit_model, current_provider)
if detected:
effective_provider, effective_model = detected
runtime = resolve_runtime_provider(
requested=effective_provider,
target_model=effective_model or None,
)
# Pull in whatever toolsets the user has enabled for "cli".
# sorted() gives stable ordering; set→list for AIAgent's signature.
toolsets_list = sorted(_get_platform_tools(cfg, "cli"))
agent = AIAgent(
api_key=runtime.get("api_key"),
base_url=runtime.get("base_url"),
provider=runtime.get("provider"),
api_mode=runtime.get("api_mode"),
model=effective_model,
enabled_toolsets=toolsets_list,
quiet_mode=True,
platform="cli",
credential_pool=runtime.get("credential_pool"),
# Interactive callbacks are intentionally NOT wired beyond this
# one. In oneshot mode there's no user sitting at a terminal:
# - clarify → returns a synthetic "pick a default" instruction
# so the agent continues instead of stalling on
# the tool's built-in "not available" error
# - sudo password prompt → terminal_tool gates on
# HERMES_INTERACTIVE which we never set
# - shell-hook approval → auto-approved via HERMES_ACCEPT_HOOKS=1
# (set above); also falls back to deny on non-tty
# - dangerous-command approval → bypassed via HERMES_YOLO_MODE=1
# - skill secret capture → returns gracefully when no callback set
clarify_callback=_oneshot_clarify_callback,
)
# Belt-and-braces: make sure AIAgent doesn't invoke any streaming
# display callbacks that would bypass our stdout capture.
agent.suppress_status_output = True
agent.stream_delta_callback = None
agent.tool_gen_callback = None
return agent.chat(prompt) or ""
def _oneshot_clarify_callback(question: str, choices=None) -> str:
"""Clarify is disabled in oneshot mode — tell the agent to pick a
default and proceed instead of stalling or erroring."""
if choices:
return (
f"[oneshot mode: no user available. Pick the best option from "
f"{choices} using your own judgment and continue.]"
)
return (
"[oneshot mode: no user available. Make the most reasonable "
"assumption you can and continue.]"
)
+1
View File
@@ -36,6 +36,7 @@ PLATFORMS: OrderedDict[str, PlatformInfo] = OrderedDict([
("wecom_callback", PlatformInfo(label="💬 WeCom Callback", default_toolset="hermes-wecom-callback")),
("weixin", PlatformInfo(label="💬 Weixin", default_toolset="hermes-weixin")),
("qqbot", PlatformInfo(label="💬 QQBot", default_toolset="hermes-qqbot")),
("yuanbao", PlatformInfo(label="🤖 Yuanbao", default_toolset="hermes-yuanbao")),
("webhook", PlatformInfo(label="🔗 Webhook", default_toolset="hermes-webhook")),
("api_server", PlatformInfo(label="🌐 API Server", default_toolset="hermes-api-server")),
("cron", PlatformInfo(label="⏰ Cron", default_toolset="hermes-cron")),
+8
View File
@@ -71,6 +71,14 @@ VALID_HOOKS: Set[str] = {
"on_session_finalize",
"on_session_reset",
"subagent_stop",
# Gateway pre-dispatch hook. Fired once per incoming MessageEvent
# after the internal-event guard but BEFORE auth/pairing and agent
# dispatch. Plugins may return a dict to influence flow:
# {"action": "skip", "reason": "..."} -> drop message (no reply)
# {"action": "rewrite", "text": "..."} -> replace event.text, continue
# {"action": "allow"} / None -> normal dispatch
# Kwargs: event: MessageEvent, gateway: GatewayRunner, session_store.
"pre_gateway_dispatch",
}
ENTRY_POINTS_GROUP = "hermes_agent.plugins"
+24
View File
@@ -116,6 +116,10 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
transport="openai_chat",
base_url_env_var="DASHSCOPE_BASE_URL",
),
"alibaba-coding-plan": HermesOverlay(
transport="openai_chat",
base_url_env_var="ALIBABA_CODING_PLAN_BASE_URL",
),
"vercel": HermesOverlay(
transport="openai_chat",
is_aggregator=True,
@@ -159,10 +163,22 @@ HERMES_OVERLAYS: Dict[str, HermesOverlay] = {
base_url_override="https://api.arcee.ai/api/v1",
base_url_env_var="ARCEE_BASE_URL",
),
"gmi": HermesOverlay(
transport="openai_chat",
extra_env_vars=("GMI_API_KEY",),
base_url_override="https://api.gmi-serving.com/v1",
base_url_env_var="GMI_BASE_URL",
),
"ollama-cloud": HermesOverlay(
transport="openai_chat",
base_url_env_var="OLLAMA_BASE_URL",
),
# Azure Foundry: supports both OpenAI-style and Anthropic-style endpoints.
# The transport is determined at runtime from config.yaml model.api_mode.
"azure-foundry": HermesOverlay(
transport="openai_chat", # default; overridden by api_mode in config
base_url_env_var="AZURE_FOUNDRY_BASE_URL",
),
}
@@ -259,6 +275,9 @@ ALIASES: Dict[str, str] = {
"aliyun": "alibaba",
"qwen": "alibaba",
"alibaba-cloud": "alibaba",
"alibaba_coding": "alibaba-coding-plan",
"alibaba-coding": "alibaba-coding-plan",
"alibaba_coding_plan": "alibaba-coding-plan",
# google-gemini-cli (OAuth + Code Assist)
"gemini-cli": "google-gemini-cli",
@@ -284,6 +303,10 @@ ALIASES: Dict[str, str] = {
"arcee-ai": "arcee",
"arceeai": "arcee",
# gmi
"gmi-cloud": "gmi",
"gmicloud": "gmi",
# Local server aliases → virtual "local" concept (resolved via user config)
"lmstudio": "lmstudio",
"lm-studio": "lmstudio",
@@ -306,6 +329,7 @@ _LABEL_OVERRIDES: Dict[str, str] = {
"copilot-acp": "GitHub Copilot ACP",
"stepfun": "StepFun Step Plan",
"xiaomi": "Xiaomi MiMo",
"gmi": "GMI Cloud",
"local": "Local endpoint",
"bedrock": "AWS Bedrock",
"ollama-cloud": "Ollama Cloud",
+229
View File
@@ -0,0 +1,229 @@
"""PTY bridge for `hermes dashboard` chat tab.
Wraps a child process behind a pseudo-terminal so its ANSI output can be
streamed to a browser-side terminal emulator (xterm.js) and typed
keystrokes can be fed back in. The only caller today is the
``/api/pty`` WebSocket endpoint in ``hermes_cli.web_server``.
Design constraints:
* **POSIX-only.** Hermes Agent supports Windows exclusively via WSL, which
exposes a native POSIX PTY via ``openpty(3)``. Native Windows Python
has no PTY; :class:`PtyUnavailableError` is raised with a user-readable
install/platform message so the dashboard can render a banner instead of
crashing.
* **Zero Node dependency on the server side.** We use :mod:`ptyprocess`,
which is a pure-Python wrapper around the OS calls. The browser talks
to the same ``hermes --tui`` binary it would launch from the CLI, so
every TUI feature (slash popover, model picker, tool rows, markdown,
skin engine, clarify/sudo/approval prompts) ships automatically.
* **Byte-safe I/O.** Reads and writes go through the PTY master fd
directly we avoid :class:`ptyprocess.PtyProcessUnicode` because
streaming ANSI is inherently byte-oriented and UTF-8 boundaries may land
mid-read.
"""
from __future__ import annotations
import errno
import fcntl
import os
import select
import signal
import struct
import sys
import termios
import time
from typing import Optional, Sequence
try:
import ptyprocess # type: ignore
_PTY_AVAILABLE = not sys.platform.startswith("win")
except ImportError: # pragma: no cover - dev env without ptyprocess
ptyprocess = None # type: ignore
_PTY_AVAILABLE = False
__all__ = ["PtyBridge", "PtyUnavailableError"]
class PtyUnavailableError(RuntimeError):
"""Raised when a PTY cannot be created on this platform.
Today this means native Windows (no ConPTY bindings) or a dev
environment missing the ``ptyprocess`` dependency. The dashboard
surfaces the message to the user as a chat-tab banner.
"""
class PtyBridge:
"""Thin wrapper around ``ptyprocess.PtyProcess`` for byte streaming.
Not thread-safe. A single bridge is owned by the WebSocket handler
that spawned it; the reader runs in an executor thread while writes
happen on the event-loop thread. Both sides are OK because the
kernel PTY is the actual synchronization point we never call
:mod:`ptyprocess` methods concurrently, we only call ``os.read`` and
``os.write`` on the master fd, which is safe.
"""
def __init__(self, proc: "ptyprocess.PtyProcess"): # type: ignore[name-defined]
self._proc = proc
self._fd: int = proc.fd
self._closed = False
# -- lifecycle --------------------------------------------------------
@classmethod
def is_available(cls) -> bool:
"""True if a PTY can be spawned on this platform."""
return bool(_PTY_AVAILABLE)
@classmethod
def spawn(
cls,
argv: Sequence[str],
*,
cwd: Optional[str] = None,
env: Optional[dict] = None,
cols: int = 80,
rows: int = 24,
) -> "PtyBridge":
"""Spawn ``argv`` behind a new PTY and return a bridge.
Raises :class:`PtyUnavailableError` if the platform can't host a
PTY. Raises :class:`FileNotFoundError` or :class:`OSError` for
ordinary exec failures (missing binary, bad cwd, etc.).
"""
if not _PTY_AVAILABLE:
if sys.platform.startswith("win"):
raise PtyUnavailableError(
"Pseudo-terminals are unavailable on this platform. "
"Hermes Agent supports Windows only via WSL."
)
if ptyprocess is None:
raise PtyUnavailableError(
"The `ptyprocess` package is missing. "
"Install with: pip install ptyprocess "
"(or pip install -e '.[pty]')."
)
raise PtyUnavailableError("Pseudo-terminals are unavailable.")
# Let caller-supplied env fully override inheritance; if they pass
# None we inherit the server's env (same semantics as subprocess).
spawn_env = os.environ.copy() if env is None else env
proc = ptyprocess.PtyProcess.spawn( # type: ignore[union-attr]
list(argv),
cwd=cwd,
env=spawn_env,
dimensions=(rows, cols),
)
return cls(proc)
@property
def pid(self) -> int:
return int(self._proc.pid)
def is_alive(self) -> bool:
if self._closed:
return False
try:
return bool(self._proc.isalive())
except Exception:
return False
# -- I/O --------------------------------------------------------------
def read(self, timeout: float = 0.2) -> Optional[bytes]:
"""Read up to 64 KiB of raw bytes from the PTY master.
Returns:
* bytes zero or more bytes of child output
* empty bytes (``b""``) no data available within ``timeout``
* None child has exited and the master fd is at EOF
Never blocks longer than ``timeout`` seconds. Safe to call after
:meth:`close`; returns ``None`` in that case.
"""
if self._closed:
return None
try:
readable, _, _ = select.select([self._fd], [], [], timeout)
except (OSError, ValueError):
return None
if not readable:
return b""
try:
data = os.read(self._fd, 65536)
except OSError as exc:
# EIO on Linux = slave side closed. EBADF = already closed.
if exc.errno in (errno.EIO, errno.EBADF):
return None
raise
if not data:
return None
return data
def write(self, data: bytes) -> None:
"""Write raw bytes to the PTY master (i.e. the child's stdin)."""
if self._closed or not data:
return
# os.write can return a short write under load; loop until drained.
view = memoryview(data)
while view:
try:
n = os.write(self._fd, view)
except OSError as exc:
if exc.errno in (errno.EIO, errno.EBADF, errno.EPIPE):
return
raise
if n <= 0:
return
view = view[n:]
def resize(self, cols: int, rows: int) -> None:
"""Forward a terminal resize to the child via ``TIOCSWINSZ``."""
if self._closed:
return
# struct winsize: rows, cols, xpixel, ypixel (all unsigned short)
winsize = struct.pack("HHHH", max(1, rows), max(1, cols), 0, 0)
try:
fcntl.ioctl(self._fd, termios.TIOCSWINSZ, winsize)
except OSError:
pass
# -- teardown ---------------------------------------------------------
def close(self) -> None:
"""Terminate the child (SIGTERM → 0.5s grace → SIGKILL) and close fds.
Idempotent. Reaping the child is important so we don't leak
zombies across the lifetime of the dashboard process.
"""
if self._closed:
return
self._closed = True
# SIGHUP is the conventional "your terminal went away" signal.
# We escalate if the child ignores it.
for sig in (signal.SIGHUP, signal.SIGTERM, signal.SIGKILL):
if not self._proc.isalive():
break
try:
self._proc.kill(sig)
except Exception:
pass
deadline = time.monotonic() + 0.5
while self._proc.isalive() and time.monotonic() < deadline:
time.sleep(0.02)
try:
self._proc.close(force=True)
except Exception:
pass
# Context-manager sugar — handy in tests and ad-hoc scripts.
def __enter__(self) -> "PtyBridge":
return self
def __exit__(self, *_exc) -> None:
self.close()
+243 -13
View File
@@ -36,6 +36,29 @@ def _normalize_custom_provider_name(value: str) -> str:
return value.strip().lower().replace(" ", "-")
def _loopback_hostname(host: str) -> bool:
h = (host or "").lower().rstrip(".")
return h in {"localhost", "127.0.0.1", "::1", "0.0.0.0"}
def _config_base_url_trustworthy_for_bare_custom(cfg_base_url: str, cfg_provider: str) -> bool:
"""Decide whether ``model.base_url`` may back bare ``custom`` runtime resolution.
GitHub #14676: the model picker can select Custom while ``model.provider`` still reflects a
previous provider. Reject non-loopback URLs unless the YAML provider is already ``custom``,
so a stale OpenRouter/Z.ai base_url cannot hijack local ``custom`` sessions.
"""
cfg_provider_norm = (cfg_provider or "").strip().lower()
bu = (cfg_base_url or "").strip()
if not bu:
return False
if cfg_provider_norm == "custom":
return True
if base_url_host_matches(bu, "openrouter.ai"):
return False
return _loopback_hostname(base_url_hostname(bu))
def _detect_api_mode_for_url(base_url: str) -> Optional[str]:
"""Auto-detect api_mode from the resolved base URL.
@@ -160,8 +183,16 @@ def _resolve_runtime_from_pool_entry(
requested_provider: str,
model_cfg: Optional[Dict[str, Any]] = None,
pool: Optional[CredentialPool] = None,
target_model: Optional[str] = None,
) -> Dict[str, Any]:
model_cfg = model_cfg or _get_model_config()
# When the caller is resolving for a specific target model (e.g. a /model
# mid-session switch), prefer that over the persisted model.default. This
# prevents api_mode being computed from a stale config default that no
# longer matches the model actually being used — the bug that caused
# opencode-zen /v1 to be stripped for chat_completions requests when
# config.default was still a Claude model.
effective_model = (target_model or model_cfg.get("default") or "")
base_url = (getattr(entry, "runtime_base_url", None) or getattr(entry, "base_url", None) or "").rstrip("/")
api_key = getattr(entry, "runtime_api_key", None) or getattr(entry, "access_token", "")
api_mode = "chat_completions"
@@ -190,6 +221,32 @@ def _resolve_runtime_from_pool_entry(
elif provider == "copilot":
api_mode = _copilot_runtime_api_mode(model_cfg, getattr(entry, "runtime_api_key", ""))
base_url = base_url or PROVIDER_REGISTRY["copilot"].inference_base_url
elif provider == "azure-foundry":
# Azure Foundry: read api_mode and base_url from config
cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
if cfg_provider == "azure-foundry":
cfg_base_url = str(model_cfg.get("base_url") or "").strip().rstrip("/")
if cfg_base_url:
base_url = cfg_base_url
configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
if configured_mode:
api_mode = configured_mode
# Model-family inference for GPT-5.x / codex / o1-o4: Azure rejects
# /chat/completions on these with 400 "operation unsupported" — see
# azure_foundry_model_api_mode() for rationale. Skip when the user
# explicitly picked anthropic_messages (Anthropic-style endpoint).
if effective_model and api_mode != "anthropic_messages":
try:
from hermes_cli.models import azure_foundry_model_api_mode
inferred = azure_foundry_model_api_mode(effective_model)
except Exception:
inferred = None
if inferred:
api_mode = inferred
# For Anthropic-style endpoints, strip /v1 suffix
if api_mode == "anthropic_messages":
base_url = re.sub(r"/v1/?$", "", base_url)
else:
configured_provider = str(model_cfg.get("provider") or "").strip().lower()
# Honour model.base_url from config.yaml when the configured provider
@@ -207,7 +264,7 @@ def _resolve_runtime_from_pool_entry(
api_mode = configured_mode
elif provider in ("opencode-zen", "opencode-go"):
from hermes_cli.models import opencode_model_api_mode
api_mode = opencode_model_api_mode(provider, model_cfg.get("default", ""))
api_mode = opencode_model_api_mode(provider, effective_model)
else:
# Auto-detect Anthropic-compatible endpoints (/anthropic suffix,
# Kimi /coding, api.openai.com → codex_responses, api.x.ai →
@@ -323,12 +380,16 @@ def _get_named_custom_provider(requested_provider: str) -> Optional[Dict[str, An
# Found match by provider key
base_url = entry.get("api") or entry.get("url") or entry.get("base_url") or ""
if base_url:
return {
result = {
"name": entry.get("name", ep_name),
"base_url": base_url.strip(),
"api_key": resolved_api_key,
"model": entry.get("default_model", ""),
}
api_mode = _parse_api_mode(entry.get("api_mode"))
if api_mode:
result["api_mode"] = api_mode
return result
# Also check the 'name' field if present
display_name = entry.get("name", "")
if display_name:
@@ -337,12 +398,16 @@ def _get_named_custom_provider(requested_provider: str) -> Optional[Dict[str, An
# Found match by display name
base_url = entry.get("api") or entry.get("url") or entry.get("base_url") or ""
if base_url:
return {
result = {
"name": display_name,
"base_url": base_url.strip(),
"api_key": resolved_api_key,
"model": entry.get("default_model", ""),
}
api_mode = _parse_api_mode(entry.get("api_mode"))
if api_mode:
result["api_mode"] = api_mode
return result
# Fall back to custom_providers: list (legacy format)
custom_providers = config.get("custom_providers")
@@ -464,6 +529,7 @@ def _resolve_openrouter_runtime(
cfg_provider = cfg_provider.strip().lower()
env_openrouter_base_url = os.getenv("OPENROUTER_BASE_URL", "").strip()
env_custom_base_url = os.getenv("CUSTOM_BASE_URL", "").strip()
# Use config base_url when available and the provider context matches.
# OPENAI_BASE_URL env var is no longer consulted — config.yaml is
@@ -473,11 +539,14 @@ def _resolve_openrouter_runtime(
if requested_norm == "auto":
if not cfg_provider or cfg_provider == "auto":
use_config_base_url = True
elif requested_norm == "custom" and cfg_provider == "custom":
elif requested_norm == "custom" and _config_base_url_trustworthy_for_bare_custom(
cfg_base_url, cfg_provider
):
use_config_base_url = True
base_url = (
(explicit_base_url or "").strip()
or env_custom_base_url
or (cfg_base_url.strip() if use_config_base_url else "")
or env_openrouter_base_url
or OPENROUTER_BASE_URL
@@ -546,6 +615,88 @@ def _resolve_openrouter_runtime(
}
def _resolve_azure_foundry_runtime(
*,
requested_provider: str,
model_cfg: Dict[str, Any],
explicit_api_key: Optional[str] = None,
explicit_base_url: Optional[str] = None,
target_model: Optional[str] = None,
) -> Dict[str, Any]:
"""Resolve an Azure Foundry runtime entry.
Reads ``model.base_url`` + ``model.api_mode`` from config.yaml (or
explicit overrides), pulls the API key from ``.env`` / env var, and
strips a trailing ``/v1`` for Anthropic-style endpoints because the
Anthropic SDK appends ``/v1/messages`` internally.
Raises :class:`AuthError` when required values are missing.
"""
explicit_api_key = str(explicit_api_key or "").strip()
explicit_base_url_clean = str(explicit_base_url or "").strip().rstrip("/")
cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
cfg_base_url = ""
cfg_api_mode = "chat_completions"
if cfg_provider == "azure-foundry":
cfg_base_url = str(model_cfg.get("base_url") or "").strip().rstrip("/")
cfg_api_mode = _parse_api_mode(model_cfg.get("api_mode")) or "chat_completions"
# Model-family inference: Azure Foundry deploys GPT-5.x / codex / o1-o4
# reasoning models as Responses-API-only. Calling /chat/completions
# against them returns 400 "The requested operation is unsupported."
# Upgrade api_mode when the model name matches, unless the user has
# explicitly chosen anthropic_messages (Anthropic-style endpoint).
effective_model = str(target_model or model_cfg.get("default") or "").strip()
if effective_model and cfg_api_mode != "anthropic_messages":
try:
from hermes_cli.models import azure_foundry_model_api_mode
inferred = azure_foundry_model_api_mode(effective_model)
except Exception:
inferred = None
if inferred:
cfg_api_mode = inferred
env_base_url = os.getenv("AZURE_FOUNDRY_BASE_URL", "").strip().rstrip("/")
base_url = explicit_base_url_clean or cfg_base_url or env_base_url
if not base_url:
raise AuthError(
"Azure Foundry requires a base URL. Set it via 'hermes model' or "
"the AZURE_FOUNDRY_BASE_URL environment variable."
)
api_key = explicit_api_key
if not api_key:
try:
from hermes_cli.config import get_env_value
api_key = get_env_value("AZURE_FOUNDRY_API_KEY") or ""
except Exception:
api_key = ""
if not api_key:
api_key = os.getenv("AZURE_FOUNDRY_API_KEY", "").strip()
if not api_key:
raise AuthError(
"Azure Foundry requires an API key. Set AZURE_FOUNDRY_API_KEY in "
"~/.hermes/.env or run 'hermes model' to configure."
)
# Anthropic SDK appends /v1/messages itself, so strip any trailing /v1
# we inherited from the configured base_url to avoid double-/v1 paths.
if cfg_api_mode == "anthropic_messages":
base_url = re.sub(r"/v1/?$", "", base_url)
source = "explicit" if (explicit_api_key or explicit_base_url) else "config"
return {
"provider": "azure-foundry",
"api_mode": cfg_api_mode,
"base_url": base_url,
"api_key": api_key,
"source": source,
"requested_provider": requested_provider,
}
def _resolve_explicit_runtime(
*,
provider: str,
@@ -635,6 +786,15 @@ def _resolve_explicit_runtime(
"requested_provider": requested_provider,
}
# Azure Foundry: user-configured endpoint with selectable API mode
if provider == "azure-foundry":
return _resolve_azure_foundry_runtime(
requested_provider=requested_provider,
model_cfg=model_cfg,
explicit_api_key=explicit_api_key,
explicit_base_url=explicit_base_url,
)
pconfig = PROVIDER_REGISTRY.get(provider)
if pconfig and pconfig.auth_type == "api_key":
env_url = ""
@@ -689,10 +849,55 @@ def resolve_runtime_provider(
requested: Optional[str] = None,
explicit_api_key: Optional[str] = None,
explicit_base_url: Optional[str] = None,
target_model: Optional[str] = None,
) -> Dict[str, Any]:
"""Resolve runtime provider credentials for agent execution."""
"""Resolve runtime provider credentials for agent execution.
target_model: Optional override for model_cfg.get("default") when
computing provider-specific api_mode (e.g. OpenCode Zen/Go where different
models route through different API surfaces). Callers performing an
explicit mid-session model switch should pass the new model here so
api_mode is derived from the model they are switching TO, not the stale
persisted default. Other callers can leave it None to preserve existing
behavior (api_mode derived from config).
"""
requested_provider = resolve_requested_provider(requested)
# Azure Anthropic short-circuit: when explicitly targeting an Azure endpoint
# with provider="anthropic", bypass _resolve_named_custom_runtime (which would
# return provider="custom" with chat_completions api_mode and no valid key).
# Instead, use the Azure key directly with anthropic_messages api_mode.
_eff_base = (explicit_base_url or "").strip()
if requested_provider == "anthropic" and "azure.com" in _eff_base:
_azure_key = (
(explicit_api_key or "").strip()
or os.getenv("AZURE_ANTHROPIC_KEY", "").strip()
or os.getenv("ANTHROPIC_API_KEY", "").strip()
)
return {
"provider": "anthropic",
"api_mode": "anthropic_messages",
"base_url": _eff_base.rstrip("/"),
"api_key": _azure_key,
"source": "azure-explicit",
"requested_provider": requested_provider,
}
# Azure Foundry: user-configured endpoint with selectable API mode
# (OpenAI-style chat_completions or Anthropic-style anthropic_messages).
# Resolve before the custom-runtime / pool / generic paths so Azure
# config is always picked up from model.base_url + model.api_mode,
# regardless of whether the caller passed explicit_* args.
if requested_provider == "azure-foundry":
azure_runtime = _resolve_azure_foundry_runtime(
requested_provider=requested_provider,
model_cfg=_get_model_config(),
explicit_api_key=explicit_api_key,
explicit_base_url=explicit_base_url,
target_model=target_model,
)
return azure_runtime
custom_runtime = _resolve_named_custom_runtime(
requested_provider=requested_provider,
explicit_api_key=explicit_api_key,
@@ -772,6 +977,7 @@ def resolve_runtime_provider(
requested_provider=requested_provider,
model_cfg=model_cfg,
pool=pool,
target_model=target_model,
)
if provider == "nous":
@@ -870,13 +1076,6 @@ def resolve_runtime_provider(
# Anthropic (native Messages API)
if provider == "anthropic":
from agent.anthropic_adapter import resolve_anthropic_token
token = resolve_anthropic_token()
if not token:
raise AuthError(
"No Anthropic credentials found. Set ANTHROPIC_TOKEN or ANTHROPIC_API_KEY, "
"run 'claude setup-token', or authenticate with 'claude /login'."
)
# Allow base URL override from config.yaml model.base_url, but only
# when the configured provider is anthropic — otherwise a non-Anthropic
# base_url (e.g. Codex endpoint) would leak into Anthropic requests.
@@ -885,6 +1084,33 @@ def resolve_runtime_provider(
if cfg_provider == "anthropic":
cfg_base_url = (model_cfg.get("base_url") or "").strip().rstrip("/")
base_url = cfg_base_url or "https://api.anthropic.com"
# For Azure AI Foundry endpoints, use ANTHROPIC_API_KEY directly —
# Claude Code OAuth tokens (sk-ant-oat01) are not accepted by Azure.
# Azure keys don't start with "sk-ant-" so resolve_anthropic_token()
# would find the Claude Code OAuth token first (priority 3) and return
# that instead, causing 401s. Detect Azure endpoints and use the env
# key directly to bypass the OAuth priority chain.
_is_azure_endpoint = "azure.com" in base_url.lower() or (
cfg_base_url and "azure.com" in cfg_base_url.lower()
)
if _is_azure_endpoint:
token = (
os.getenv("AZURE_ANTHROPIC_KEY", "").strip()
or os.getenv("ANTHROPIC_API_KEY", "").strip()
)
if not token:
raise AuthError(
"No Azure Anthropic API key found. Set AZURE_ANTHROPIC_KEY or ANTHROPIC_API_KEY."
)
else:
from agent.anthropic_adapter import resolve_anthropic_token
token = resolve_anthropic_token()
if not token:
raise AuthError(
"No Anthropic credentials found. Set ANTHROPIC_TOKEN or ANTHROPIC_API_KEY, "
"run 'claude setup-token', or authenticate with 'claude /login'."
)
return {
"provider": "anthropic",
"api_mode": "anthropic_messages",
@@ -990,7 +1216,11 @@ def resolve_runtime_provider(
api_mode = configured_mode
elif provider in ("opencode-zen", "opencode-go"):
from hermes_cli.models import opencode_model_api_mode
api_mode = opencode_model_api_mode(provider, model_cfg.get("default", ""))
# Prefer the target_model from the caller (explicit mid-session
# switch) over the stale model.default; see _resolve_runtime_from_pool_entry
# for the same rationale.
_effective = target_model or model_cfg.get("default", "")
api_mode = opencode_model_api_mode(provider, _effective)
else:
# Auto-detect Anthropic-compatible endpoints by URL convention
# (e.g. https://api.minimax.io/anthropic, https://dashscope.../anthropic)
+105 -63
View File
@@ -500,6 +500,15 @@ def _print_setup_summary(config: dict, hermes_home):
if get_env_value("HASS_TOKEN"):
tool_status.append(("Smart Home (Home Assistant)", True, None))
# Spotify (OAuth via hermes auth spotify — check auth.json, not env vars)
try:
from hermes_cli.auth import get_provider_auth_state
_spotify_state = get_provider_auth_state("spotify") or {}
if _spotify_state.get("access_token") or _spotify_state.get("refresh_token"):
tool_status.append(("Spotify (PKCE OAuth)", True, None))
except Exception:
pass
# Skills Hub
if get_env_value("GITHUB_TOKEN"):
tool_status.append(("Skills Hub (GitHub)", True, None))
@@ -1847,27 +1856,32 @@ def _setup_slack():
if existing:
print_info("Slack: already configured")
if not prompt_yes_no("Reconfigure Slack?", False):
# Even without reconfiguring, offer to refresh the manifest so
# new commands (e.g. /btw, /stop, ...) get registered in Slack.
if prompt_yes_no(
"Regenerate the Slack app manifest with the latest command "
"list? (recommended after `hermes update`)",
True,
):
_write_slack_manifest_and_instruct()
return
print_info("Steps to create a Slack app:")
print_info(" 1. Go to https://api.slack.com/apps → Create New App (from scratch)")
print_info(" 1. Go to https://api.slack.com/apps → Create New App")
print_info(" Pick 'From an app manifest' — we'll generate one for you below.")
print_info(" 2. Enable Socket Mode: Settings → Socket Mode → Enable")
print_info(" • Create an App-Level Token with 'connections:write' scope")
print_info(" 3. Add Bot Token Scopes: Features → OAuth & Permissions")
print_info(" Required scopes: chat:write, app_mentions:read,")
print_info(" channels:history, channels:read, im:history,")
print_info(" im:read, im:write, users:read, files:read, files:write")
print_info(" Optional for private channels: groups:history")
print_info(" 4. Subscribe to Events: Features → Event Subscriptions → Enable")
print_info(" Required events: message.im, message.channels, app_mention")
print_info(" Optional for private channels: message.groups")
print_warning(" ⚠ Without message.channels the bot will ONLY work in DMs,")
print_warning(" not public channels.")
print_info(" 5. Install to Workspace: Settings → Install App")
print_info(" 6. Reinstall the app after any scope or event changes")
print_info(" 7. After installing, invite the bot to channels: /invite @YourBot")
print_info(" 3. Install to Workspace: Settings → Install App")
print_info(" 4. After installing, invite the bot to channels: /invite @YourBot")
print()
print_info(" Full guide: https://hermes-agent.nousresearch.com/docs/user-guide/messaging/slack/")
print()
# Generate and write manifest up-front so the user can paste it into
# the "Create from manifest" flow instead of clicking through scopes /
# events / slash commands one at a time.
_write_slack_manifest_and_instruct()
print()
bot_token = prompt("Slack Bot Token (xoxb-...)", password=True)
if not bot_token:
@@ -1893,6 +1907,49 @@ def _setup_slack():
print_info(" Set SLACK_ALLOW_ALL_USERS=true or GATEWAY_ALLOW_ALL_USERS=true only if you intentionally want open workspace access.")
def _write_slack_manifest_and_instruct():
"""Generate the Slack manifest, write it under HERMES_HOME, and print
paste-into-Slack instructions.
Exposed as its own helper so both the initial setup flow and the
"reconfigure? → no" branch can refresh the manifest without the user
re-entering tokens. Failures are non-fatal if the manifest write
fails for any reason, we print a warning and skip rather than abort
the whole Slack setup.
"""
try:
from hermes_cli.slack_cli import _build_full_manifest
from hermes_constants import get_hermes_home
manifest = _build_full_manifest(
bot_name="Hermes",
bot_description="Your Hermes agent on Slack",
)
target = Path(get_hermes_home()) / "slack-manifest.json"
target.parent.mkdir(parents=True, exist_ok=True)
import json as _json
target.write_text(
_json.dumps(manifest, indent=2, ensure_ascii=False) + "\n",
encoding="utf-8",
)
print_success(f"Slack app manifest written to: {target}")
print_info(
" Paste it into https://api.slack.com/apps → your app → Features "
"→ App Manifest → Edit, then Save. Slack will prompt to "
"reinstall if scopes or slash commands changed."
)
print_info(
" Re-run `hermes slack manifest --write` anytime to refresh after "
"Hermes adds new commands."
)
except Exception as exc: # pragma: no cover - best-effort UX helper
print_warning(f"Couldn't write Slack manifest: {exc}")
print_info(
" You can generate it manually later with: "
"hermes slack manifest --write"
)
def _setup_matrix():
"""Configure Matrix credentials."""
print_header("Matrix")
@@ -2076,6 +2133,12 @@ def _setup_feishu():
_gateway_setup_feishu()
def _setup_yuanbao():
"""Configure Yuanbao via gateway setup."""
from hermes_cli.gateway import _setup_yuanbao as _gateway_setup_yuanbao
_gateway_setup_yuanbao()
def _setup_wecom():
"""Configure WeCom (Enterprise WeChat) via gateway setup."""
from hermes_cli.gateway import _setup_wecom as _gateway_setup_wecom
@@ -2220,6 +2283,7 @@ _GATEWAY_PLATFORMS = [
("WhatsApp", "WHATSAPP_ENABLED", _setup_whatsapp),
("DingTalk", "DINGTALK_CLIENT_ID", _setup_dingtalk),
("Feishu / Lark", "FEISHU_APP_ID", _setup_feishu),
("Yuanbao", "YUANBAO_APP_ID", _setup_yuanbao),
("WeCom (Enterprise WeChat)", "WECOM_BOT_ID", _setup_wecom),
("WeCom Callback (Self-Built App)", "WECOM_CALLBACK_CORP_ID", _setup_wecom_callback),
("Weixin (WeChat)", "WEIXIN_ACCOUNT_ID", _setup_weixin),
@@ -2854,17 +2918,6 @@ SETUP_SECTIONS = [
("agent", "Agent Settings", setup_agent_settings),
]
# The returning-user menu intentionally omits standalone TTS because model setup
# already includes TTS selection and tools setup covers the rest of the provider
# configuration. Keep this list in the same order as the visible menu entries.
RETURNING_USER_MENU_SECTION_KEYS = [
"model",
"terminal",
"gateway",
"tools",
"agent",
]
def run_setup_wizard(args):
"""Run the interactive setup wizard.
@@ -2889,6 +2942,9 @@ def run_setup_wizard(args):
save_config(copy.deepcopy(DEFAULT_CONFIG))
print_success("Configuration reset to defaults.")
reconfigure_requested = bool(getattr(args, "reconfigure", False))
quick_requested = bool(getattr(args, "quick", False))
config = load_config()
hermes_home = get_hermes_home()
@@ -2980,50 +3036,36 @@ def run_setup_wizard(args):
migration_ran = False
if is_existing:
# ── Returning User Menu ──
print()
print_header("Welcome Back!")
print_success("You already have Hermes configured.")
print()
menu_choices = [
"Quick Setup - configure missing items only",
"Full Setup - reconfigure everything",
"Model & Provider",
"Terminal Backend",
"Messaging Platforms (Gateway)",
"Tools",
"Agent Settings",
"Exit",
]
choice = prompt_choice("What would you like to do?", menu_choices, 0)
if choice == 0:
# Quick setup
# Existing install — default is the full-wizard reconfigure flow.
# Every prompt shows the current value as its default, so pressing
# Enter keeps it. Opt into `--quick` for the narrow "just fill in
# missing items" flow (useful after a partial OpenClaw migration
# or when a required API key got cleared).
if quick_requested:
_run_quick_setup(config, hermes_home)
return
elif choice == 1:
# Full setup — fall through to run all sections
pass
elif choice == 7:
print_info("Exiting. Run 'hermes setup' again when ready.")
return
elif 2 <= choice <= 6:
# Individual section — map by key, not by position.
# SETUP_SECTIONS includes TTS but the returning-user menu skips it,
# so positional indexing (choice - 2) would dispatch the wrong section.
section_key = RETURNING_USER_MENU_SECTION_KEYS[choice - 2]
section = next((s for s in SETUP_SECTIONS if s[0] == section_key), None)
if section:
_, label, func = section
func(config)
save_config(config)
_print_setup_summary(config, hermes_home)
return
print()
print_header("Reconfigure")
print_success("You already have Hermes configured.")
print_info("Running the full wizard — each prompt shows your current value.")
print_info("Press Enter to keep it, or type a new value to change it.")
print_info("")
print_info("Tip: jump straight to a section with 'hermes setup model|terminal|")
print_info(" gateway|tools|agent', or fill only missing items with --quick.")
# Fall through to the "Full Setup — run all sections" block below.
# --reconfigure is now the default on existing installs; the flag
# is preserved for backwards compatibility but is a no-op here.
else:
# ── First-Time Setup ──
print()
# --reconfigure / --quick on a fresh install are meaningless — fall
# through to the normal first-time flow.
if reconfigure_requested or quick_requested:
print_info("No existing configuration found — running first-time setup.")
print()
# Offer OpenClaw migration before configuration begins
migration_ran = _offer_openclaw_migration(hermes_home)
if migration_ran:
+230 -20
View File
@@ -11,9 +11,10 @@ handler are thin wrappers that parse args and delegate.
"""
import json
import re
import shutil
from pathlib import Path
from typing import Any, Dict, Optional
from typing import Any, Dict, List, Optional
from rich.console import Console
from rich.panel import Panel
@@ -141,6 +142,103 @@ def _derive_category_from_install_path(install_path: str) -> str:
return "" if parent == "." else parent
# ---------------------------------------------------------------------------
# Interactive name/category resolution for URL-installed skills
# ---------------------------------------------------------------------------
_VALID_NAME_RE = re.compile(r"^[a-z][a-z0-9_-]*$")
_VALID_CATEGORY_RE = re.compile(r"^[a-z][a-z0-9_/-]*$")
def _is_valid_installed_skill_name(name: str) -> bool:
"""Accept identifier-shaped names, reject empty / sentinel-y values."""
if not isinstance(name, str):
return False
candidate = name.strip().lower()
if not candidate or candidate in {"skill", "readme", "index", "unnamed-skill"}:
return False
return bool(_VALID_NAME_RE.match(candidate))
def _existing_categories() -> List[str]:
"""Return sorted subdirectory names under ``~/.hermes/skills/`` that look
like category buckets (contain at least one ``SKILL.md`` somewhere below).
Used to suggest reusable categories when interactively installing from a
URL. Hidden dirs (``.hub``, ``.trash``) are skipped.
"""
from tools.skills_hub import SKILLS_DIR
out: List[str] = []
try:
for entry in SKILLS_DIR.iterdir():
if not entry.is_dir() or entry.name.startswith("."):
continue
# Only count as a category if it contains skills, not if it IS a skill.
# Heuristic: if ``<entry>/SKILL.md`` exists, it's a skill at the
# top level (no category); otherwise treat as a category bucket.
if (entry / "SKILL.md").exists():
continue
# Has at least one nested SKILL.md?
try:
if any(entry.rglob("SKILL.md")):
out.append(entry.name)
except OSError:
continue
except (FileNotFoundError, OSError):
return []
return sorted(set(out))
def _prompt_for_skill_name(c: Console, url: str, default: str = "") -> Optional[str]:
"""Prompt interactively for a skill name. Returns None on cancel/EOF."""
c.print()
c.print(
f"[yellow]The SKILL.md at {url} doesn't declare a `name:` in its "
f"frontmatter,[/]\n[yellow]and the URL path doesn't produce a valid "
f"identifier either.[/]"
)
default_hint = f" [{default}]" if default else ""
c.print(
f"[bold]Enter a skill name{default_hint}:[/] "
f"[dim](lowercase letters, digits, hyphens, underscores; starts with a letter)[/]"
)
try:
answer = input("Name: ").strip()
except (EOFError, KeyboardInterrupt):
return None
if not answer and default:
answer = default
if not _is_valid_installed_skill_name(answer):
c.print(f"[bold red]Invalid name:[/] {answer!r}. Aborting install.\n")
return None
return answer
def _prompt_for_category(c: Console, existing: List[str]) -> str:
"""Prompt interactively for a category. Empty/None input means flat install."""
c.print()
if existing:
c.print(
"[bold]Pick a category[/] "
"[dim](reuse an existing bucket, type a new one, or press Enter to install flat)[/]"
)
c.print(f"[dim]Existing: {', '.join(existing)}[/]")
else:
c.print(
"[bold]Category[/] [dim](optional — press Enter to install flat at ~/.hermes/skills/<name>/)[/]"
)
try:
answer = input("Category: ").strip()
except (EOFError, KeyboardInterrupt):
return ""
if not answer:
return ""
if not _VALID_CATEGORY_RE.match(answer):
c.print(f"[dim]Invalid category {answer!r} — installing flat.[/]")
return ""
return answer
def do_search(query: str, source: str = "all", limit: int = 10,
console: Optional[Console] = None) -> None:
"""Search registries and display results as a Rich table."""
@@ -309,8 +407,17 @@ def do_browse(page: int = 1, page_size: int = 20, source: str = "all",
def do_install(identifier: str, category: str = "", force: bool = False,
console: Optional[Console] = None, skip_confirm: bool = False,
invalidate_cache: bool = True) -> None:
"""Fetch, quarantine, scan, confirm, and install a skill."""
invalidate_cache: bool = True,
name_override: str = "") -> None:
"""Fetch, quarantine, scan, confirm, and install a skill.
``name_override`` lets non-interactive callers (slash commands, gateway,
scripts) supply a skill name when the upstream SKILL.md lacks a valid
``name:`` frontmatter field. On interactive TTY surfaces, a missing name
triggers a prompt instead; ``skip_confirm=True`` means "non-interactive"
(so pair it with ``name_override`` when installing from a URL that has
no frontmatter).
"""
from tools.skills_hub import (
GitHubAuth, create_source_router, ensure_hub_dirs,
quarantine_bundle, install_from_quarantine, HubLockFile,
@@ -354,6 +461,58 @@ def do_install(identifier: str, category: str = "", force: bool = False,
c.print()
return
# URL-sourced skills may arrive with an empty name when SKILL.md has no
# ``name:`` in frontmatter AND the URL path doesn't yield a valid
# identifier. Resolve by (1) --name override, (2) interactive prompt on
# a TTY, (3) refuse with an actionable error on non-interactive surfaces.
bundle_meta = getattr(bundle, "metadata", {}) or {}
if bundle.source == "url" and (not bundle.name or bundle_meta.get("awaiting_name")):
if name_override and _is_valid_installed_skill_name(name_override):
bundle.name = name_override.strip()
bundle_meta["awaiting_name"] = False
elif name_override:
c.print(
f"[bold red]Invalid --name:[/] {name_override!r}. "
"Must be a lowercase identifier (letters, digits, hyphens, "
"underscores; starts with a letter).\n"
)
return
elif skip_confirm:
# Non-interactive surface (slash command / TUI / gateway). Can't
# prompt — emit an actionable error.
url = bundle_meta.get("url") or identifier
c.print(
f"[bold red]Cannot install from URL:[/] {url}\n"
"[yellow]The SKILL.md has no `name:` in its frontmatter, "
"and the URL path doesn't produce a valid identifier.[/]\n\n"
"Retry with an explicit name:\n"
f" [bold]/skills install {url} --name <your-name>[/]\n"
f" [bold]hermes skills install {url} --name <your-name>[/]\n\n"
"[dim]Or ask the SKILL.md's author to add a `name:` field to "
"its YAML frontmatter.[/]\n"
)
return
else:
# Interactive TTY — prompt.
url = bundle_meta.get("url") or identifier
chosen = _prompt_for_skill_name(c, url)
if not chosen:
c.print("[dim]Installation cancelled.[/]\n")
return
bundle.name = chosen
bundle_meta["awaiting_name"] = False
# Keep SkillMeta in sync so downstream "already installed" checks,
# audit logs, and display all see the final name.
if meta is not None:
meta.name = bundle.name
meta.path = bundle.name
# URL-sourced skills: offer to pick a category interactively when the
# caller didn't specify one (TTY only — non-interactive installs fall
# through to flat install, matching all other sources).
if bundle.source == "url" and not category and not skip_confirm:
category = _prompt_for_category(c, _existing_categories())
# Auto-detect category for official skills (e.g. "official/autonomous-ai-agents/blackbox")
if bundle.source == "official" and not category:
id_parts = bundle.identifier.split("/") # ["official", "category", "skill"]
@@ -599,11 +758,24 @@ def inspect_skill(identifier: str) -> Optional[dict]:
return out
def do_list(source_filter: str = "all", console: Optional[Console] = None) -> None:
"""List installed skills, distinguishing hub, builtin, and local skills."""
def do_list(source_filter: str = "all",
enabled_only: bool = False,
console: Optional[Console] = None) -> None:
"""List installed skills, distinguishing hub, builtin, and local skills.
Args:
source_filter: ``all`` | ``hub`` | ``builtin`` | ``local``.
enabled_only: If True, hide disabled skills from the output.
Enabled/disabled state is resolved against the currently active profile's
config ``hermes -p <profile> skills list`` reads that profile's
``skills.disabled`` list because ``-p`` swaps ``HERMES_HOME`` at process
start. No explicit profile flag needed here.
"""
from tools.skills_hub import HubLockFile, ensure_hub_dirs
from tools.skills_sync import _read_manifest
from tools.skills_tool import _find_all_skills
from agent.skill_utils import get_disabled_skill_names
c = console or _console
ensure_hub_dirs()
@@ -611,17 +783,26 @@ def do_list(source_filter: str = "all", console: Optional[Console] = None) -> No
hub_installed = {e["name"]: e for e in lock.list_installed()}
builtin_names = set(_read_manifest())
all_skills = _find_all_skills()
# Pull ALL skills (including disabled ones) so we can annotate status.
all_skills = _find_all_skills(skip_disabled=True)
disabled_names = get_disabled_skill_names()
table = Table(title="Installed Skills")
title = "Installed Skills"
if enabled_only:
title += " (enabled only)"
table = Table(title=title)
table.add_column("Name", style="bold cyan")
table.add_column("Category", style="dim")
table.add_column("Source", style="dim")
table.add_column("Trust", style="dim")
table.add_column("Status", style="dim")
hub_count = 0
builtin_count = 0
local_count = 0
enabled_count = 0
disabled_count = 0
for skill in sorted(all_skills, key=lambda s: (s.get("category") or "", s["name"])):
name = skill["name"]
@@ -632,29 +813,48 @@ def do_list(source_filter: str = "all", console: Optional[Console] = None) -> No
source_type = "hub"
source_display = hub_entry.get("source", "hub")
trust = hub_entry.get("trust_level", "community")
hub_count += 1
elif name in builtin_names:
source_type = "builtin"
source_display = "builtin"
trust = "builtin"
builtin_count += 1
else:
source_type = "local"
source_display = "local"
trust = "local"
local_count += 1
if source_filter != "all" and source_filter != source_type:
continue
is_enabled = name not in disabled_names
if enabled_only and not is_enabled:
continue
if source_type == "hub":
hub_count += 1
elif source_type == "builtin":
builtin_count += 1
else:
local_count += 1
if is_enabled:
enabled_count += 1
status_cell = "[bold green]enabled[/]"
else:
disabled_count += 1
status_cell = "[dim red]disabled[/]"
trust_style = {"builtin": "bright_cyan", "trusted": "green", "community": "yellow", "local": "dim"}.get(trust, "dim")
trust_label = "official" if source_display == "official" else trust
table.add_row(name, category, source_display, f"[{trust_style}]{trust_label}[/]")
table.add_row(name, category, source_display, f"[{trust_style}]{trust_label}[/]", status_cell)
c.print(table)
c.print(
f"[dim]{hub_count} hub-installed, {builtin_count} builtin, {local_count} local[/]\n"
)
summary = f"[dim]{hub_count} hub-installed, {builtin_count} builtin, {local_count} local"
if enabled_only:
summary += f"{enabled_count} enabled shown"
else:
summary += f"{enabled_count} enabled, {disabled_count} disabled"
summary += "[/]\n"
c.print(summary)
def do_check(name: Optional[str] = None, console: Optional[Console] = None) -> None:
@@ -1123,11 +1323,15 @@ def skills_command(args) -> None:
do_search(args.query, source=args.source, limit=args.limit)
elif action == "install":
do_install(args.identifier, category=args.category, force=args.force,
skip_confirm=getattr(args, "yes", False))
skip_confirm=getattr(args, "yes", False),
name_override=getattr(args, "name", "") or "")
elif action == "inspect":
do_inspect(args.identifier)
elif action == "list":
do_list(source_filter=args.source)
do_list(
source_filter=args.source,
enabled_only=getattr(args, "enabled_only", False),
)
elif action == "check":
do_check(name=getattr(args, "name", None))
elif action == "update":
@@ -1177,6 +1381,7 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:
/skills search kubernetes
/skills install openai/skills/skill-creator
/skills install openai/skills/skill-creator --force
/skills install https://example.com/path/SKILL.md
/skills inspect openai/skills/skill-creator
/skills list
/skills list --source hub
@@ -1253,10 +1458,11 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:
elif action == "install":
if not args:
c.print("[bold red]Usage:[/] /skills install <identifier> [--category <cat>] [--force] [--now]\n")
c.print("[bold red]Usage:[/] /skills install <identifier-or-url> [--name <name>] [--category <cat>] [--force] [--now]\n")
return
identifier = args[0]
category = ""
name_override = ""
# Slash commands run inside prompt_toolkit where input() hangs.
# Always skip confirmation — the user typing the command is implicit consent.
skip_confirm = True
@@ -1267,9 +1473,11 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:
for i, a in enumerate(args):
if a == "--category" and i + 1 < len(args):
category = args[i + 1]
elif a == "--name" and i + 1 < len(args):
name_override = args[i + 1]
do_install(identifier, category=category, force=force,
skip_confirm=skip_confirm, invalidate_cache=invalidate_cache,
console=c)
name_override=name_override, console=c)
elif action == "inspect":
if not args:
@@ -1279,11 +1487,12 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:
elif action == "list":
source_filter = "all"
enabled_only = "--enabled-only" in args or "--enabled" in args
if "--source" in args:
idx = args.index("--source")
if idx + 1 < len(args):
source_filter = args[idx + 1]
do_list(source_filter=source_filter, console=c)
do_list(source_filter=source_filter, enabled_only=enabled_only, console=c)
elif action == "check":
name = args[0] if args else None
@@ -1371,7 +1580,8 @@ def _print_skills_help(console: Console) -> None:
" [cyan]search[/] <query> Search registries for skills\n"
" [cyan]install[/] <identifier> Install a skill (with security scan)\n"
" [cyan]inspect[/] <identifier> Preview a skill without installing\n"
" [cyan]list[/] [--source hub|builtin|local] List installed skills\n"
" [cyan]list[/] [--source hub|builtin|local] [--enabled-only]\n"
" List installed skills; --enabled-only filters to the active profile's live set\n"
" [cyan]check[/] [name] Check hub skills for upstream updates\n"
" [cyan]update[/] [name] Update hub skills with upstream changes\n"
" [cyan]audit[/] [name] Re-scan hub skills for security\n"
+152
View File
@@ -0,0 +1,152 @@
"""``hermes slack ...`` CLI subcommands.
Today only ``hermes slack manifest`` is implemented it generates the
Slack app manifest JSON for registering every gateway command as a native
Slack slash (``/btw``, ``/stop``, ``/model``, ) so users get the same
first-class slash UX Discord and Telegram already have.
Typical workflow::
$ hermes slack manifest > slack-manifest.json
# or:
$ hermes slack manifest --write
Then paste the printed JSON into the Slack app config (Features App
Manifest Edit) and click Save. Slack diffs the manifest and prompts
for reinstall when scopes/commands change.
"""
from __future__ import annotations
import json
import sys
from pathlib import Path
def _build_full_manifest(bot_name: str, bot_description: str) -> dict:
"""Build a full Slack manifest merging display info + our slash list.
The slash-command list is always generated from ``COMMAND_REGISTRY`` so
it stays in sync with the rest of Hermes. Other manifest sections
(display info, OAuth scopes, socket mode) are set to sensible defaults
for a Hermes deployment users can tweak them in the Slack UI after
pasting.
"""
from hermes_cli.commands import slack_app_manifest
partial = slack_app_manifest()
slashes = partial["features"]["slash_commands"]
return {
"_metadata": {
"major_version": 1,
"minor_version": 1,
},
"display_information": {
"name": bot_name[:35],
"description": (bot_description or "Your Hermes agent on Slack")[:140],
"background_color": "#1a1a2e",
},
"features": {
"bot_user": {
"display_name": bot_name[:80],
"always_online": True,
},
"slash_commands": slashes,
"assistant_view": {
"assistant_description": "Chat with Hermes in threads and DMs.",
},
},
"oauth_config": {
"scopes": {
"bot": [
"app_mentions:read",
"assistant:write",
"channels:history",
"channels:read",
"chat:write",
"commands",
"files:read",
"files:write",
"groups:history",
"im:history",
"im:read",
"im:write",
"users:read",
],
},
},
"settings": {
"event_subscriptions": {
"bot_events": [
"app_mention",
"assistant_thread_context_changed",
"assistant_thread_started",
"message.channels",
"message.groups",
"message.im",
],
},
"interactivity": {
"is_enabled": True,
},
"org_deploy_enabled": False,
"socket_mode_enabled": True,
"token_rotation_enabled": False,
},
}
def slack_manifest_command(args) -> int:
"""Print or write a Slack app manifest JSON.
Flags (all parsed in ``hermes_cli/main.py``):
--write [PATH] Write to file instead of stdout (default path:
``$HERMES_HOME/slack-manifest.json``)
--name NAME Override the bot display name (default: "Hermes")
--description DESC Override the bot description
--slashes-only Emit only the ``features.slash_commands`` array (for
merging into an existing manifest manually)
"""
name = getattr(args, "name", None) or "Hermes"
description = getattr(args, "description", None) or "Your Hermes agent on Slack"
if getattr(args, "slashes_only", False):
from hermes_cli.commands import slack_app_manifest
manifest = slack_app_manifest()["features"]["slash_commands"]
else:
manifest = _build_full_manifest(name, description)
payload = json.dumps(manifest, indent=2, ensure_ascii=False) + "\n"
write_target = getattr(args, "write", None)
if write_target is not None:
if isinstance(write_target, bool) and write_target:
# --write with no value → default location
try:
from hermes_constants import get_hermes_home
target = Path(get_hermes_home()) / "slack-manifest.json"
except Exception:
target = Path.home() / ".hermes" / "slack-manifest.json"
else:
target = Path(write_target).expanduser()
target.parent.mkdir(parents=True, exist_ok=True)
target.write_text(payload, encoding="utf-8")
print(f"Slack manifest written to: {target}", file=sys.stderr)
print(
"\nNext steps:\n"
" 1. Open https://api.slack.com/apps and pick your Hermes app\n"
" (or create a new one: Create New App → From an app manifest).\n"
f" 2. Features → App Manifest → paste the contents of\n"
f" {target}\n"
" 3. Save; Slack will prompt to reinstall the app if scopes or\n"
" slash commands changed.\n"
" 4. Make sure Socket Mode is enabled and you have a bot token\n"
" (xoxb-...) and app token (xapp-...) configured via\n"
" `hermes setup`.\n",
file=sys.stderr,
)
else:
sys.stdout.write(payload)
return 0
+15 -7
View File
@@ -164,19 +164,26 @@ def show_status(args):
qwen_status = {}
nous_logged_in = bool(nous_status.get("logged_in"))
nous_error = nous_status.get("error")
nous_label = "logged in" if nous_logged_in else "not logged in (run: hermes auth add nous --type oauth)"
print(
f" {'Nous Portal':<12} {check_mark(nous_logged_in)} "
f"{'logged in' if nous_logged_in else 'not logged in (run: hermes model)'}"
f"{nous_label}"
)
if nous_logged_in:
portal_url = nous_status.get("portal_base_url") or "(unknown)"
access_exp = _format_iso_timestamp(nous_status.get("access_expires_at"))
key_exp = _format_iso_timestamp(nous_status.get("agent_key_expires_at"))
refresh_label = "yes" if nous_status.get("has_refresh_token") else "no"
portal_url = nous_status.get("portal_base_url") or "(unknown)"
access_exp = _format_iso_timestamp(nous_status.get("access_expires_at"))
key_exp = _format_iso_timestamp(nous_status.get("agent_key_expires_at"))
refresh_label = "yes" if nous_status.get("has_refresh_token") else "no"
if nous_logged_in or portal_url != "(unknown)" or nous_error:
print(f" Portal URL: {portal_url}")
if nous_logged_in or nous_status.get("access_expires_at"):
print(f" Access exp: {access_exp}")
if nous_logged_in or nous_status.get("agent_key_expires_at"):
print(f" Key exp: {key_exp}")
if nous_logged_in or nous_status.get("has_refresh_token"):
print(f" Refresh: {refresh_label}")
if nous_error and not nous_logged_in:
print(f" Error: {nous_error}")
codex_logged_in = bool(codex_status.get("logged_in"))
print(
@@ -319,7 +326,8 @@ def show_status(args):
"WeCom Callback": ("WECOM_CALLBACK_CORP_ID", None),
"Weixin": ("WEIXIN_ACCOUNT_ID", "WEIXIN_HOME_CHANNEL"),
"BlueBubbles": ("BLUEBUBBLES_SERVER_URL", "BLUEBUBBLES_HOME_CHANNEL"),
"QQBot": ("QQ_APP_ID", "QQBOT_HOME_CHANNEL"),
"QQBot": ("QQ_APP_ID", "QQ_HOME_CHANNEL"),
"Yuanbao": ("YUANBAO_APP_ID", "YUANBAO_HOME_CHANNEL"),
}
for name, (token_var, home_var) in platforms.items():
+4 -4
View File
@@ -20,10 +20,10 @@ def get_provider_request_timeout(
try:
from hermes_cli.config import load_config
except ImportError:
config = load_config()
except Exception:
return None
config = load_config()
providers = config.get("providers", {}) if isinstance(config, dict) else {}
provider_config = (
providers.get(provider_id, {}) if isinstance(providers, dict) else {}
@@ -49,10 +49,10 @@ def get_provider_stale_timeout(
try:
from hermes_cli.config import load_config
except ImportError:
config = load_config()
except Exception:
return None
config = load_config()
providers = config.get("providers", {}) if isinstance(config, dict) else {}
provider_config = (
providers.get(provider_id, {}) if isinstance(providers, dict) else {}
+3 -4
View File
@@ -10,8 +10,7 @@ import random
TIPS = [
# --- Slash Commands ---
"/btw <question> asks a quick side question without tools or history — great for clarifications.",
"/background <prompt> runs a task in a separate session while your current one stays free.",
"/background <prompt> (alias /bg or /btw) runs a task in a separate session while your current one stays free.",
"/branch forks the current session so you can explore a different direction without losing progress.",
"/compress manually compresses conversation context when things get long.",
"/rollback lists filesystem checkpoints — restore files the agent modified to any prior state.",
@@ -107,7 +106,7 @@ TIPS = [
"Set display.streaming: true to see tokens appear in real time as the model generates.",
"Set display.show_reasoning: true to watch the model's chain-of-thought reasoning.",
"Set display.compact: true to reduce whitespace in output for denser information.",
"Set display.busy_input_mode: queue to queue messages instead of interrupting the agent.",
"Set display.busy_input_mode: queue to queue messages instead of interrupting the agent, or steer to inject them mid-run via /steer.",
"Set display.resume_display: minimal to skip the full conversation recap on session resume.",
"Set compression.threshold: 0.50 to control when auto-compression fires (default: 50% of context).",
"Set agent.max_turns: 200 to let the agent take more tool-calling steps per turn.",
@@ -127,7 +126,7 @@ TIPS = [
# --- Tools & Capabilities ---
"execute_code runs Python scripts that call Hermes tools programmatically — results stay out of context.",
"delegate_task spawns up to 3 concurrent sub-agents by default (configurable via delegation.max_concurrent_children) with isolated contexts for parallel work.",
"delegate_task spawns up to 3 concurrent sub-agents by default (delegation.max_concurrent_children) with isolated contexts for parallel work.",
"web_extract works on PDF URLs — pass any PDF link and it converts to markdown.",
"search_files is ripgrep-backed and faster than grep — use it instead of terminal grep.",
"patch uses 9 fuzzy matching strategies so minor whitespace differences won't break edits.",
+150 -15
View File
@@ -11,6 +11,7 @@ the `platform_toolsets` key.
import json as _json
import logging
import os
import sys
from pathlib import Path
from typing import Dict, List, Optional, Set
@@ -25,7 +26,7 @@ from hermes_cli.nous_subscription import (
get_nous_subscription_features,
)
from tools.tool_backend_helpers import fal_key_is_configured, managed_nous_tools_enabled
from utils import base_url_hostname
from utils import base_url_hostname, is_truthy_value
logger = logging.getLogger(__name__)
@@ -67,26 +68,60 @@ CONFIGURABLE_TOOLSETS = [
("messaging", "📨 Cross-Platform Messaging", "send_message"),
("rl", "🧪 RL Training", "Tinker-Atropos training tools"),
("homeassistant", "🏠 Home Assistant", "smart home device control"),
("spotify", "🎵 Spotify", "playback, search, playlists, library"),
("discord", "💬 Discord (read/participate)", "fetch messages, search members, create thread"),
("discord_admin", "🛡️ Discord Server Admin", "list channels/roles, pin, assign roles"),
("yuanbao", "🤖 Yuanbao", "group info, member queries, DM"),
]
# Toolsets that are OFF by default for new installs.
# They're still in _HERMES_CORE_TOOLS (available at runtime if enabled),
# but the setup checklist won't pre-select them for first-time users.
_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl", "discord_admin"}
_DEFAULT_OFF_TOOLSETS = {"moa", "homeassistant", "rl", "spotify", "discord", "discord_admin"}
# Platform-scoped toolsets: only appear in the `hermes tools` checklist for
# these platforms, and only resolve/save for these platforms. A toolset
# absent from this map is available on every platform (current behaviour).
#
# Use this for tools whose APIs only make sense on one platform (Discord
# server admin, Slack workspace admin, etc.). Keeps every other platform's
# checklist from filling up with irrelevant toggles.
_TOOLSET_PLATFORM_RESTRICTIONS: Dict[str, Set[str]] = {
"discord": {"discord"},
"discord_admin": {"discord"},
}
def _toolset_allowed_for_platform(ts_key: str, platform: str) -> bool:
"""Return True if ``ts_key`` is configurable on ``platform``.
Toolsets without a restriction entry are allowed everywhere (the default).
"""
allowed = _TOOLSET_PLATFORM_RESTRICTIONS.get(ts_key)
return allowed is None or platform in allowed
def _get_effective_configurable_toolsets():
"""Return CONFIGURABLE_TOOLSETS + any plugin-provided toolsets.
Plugin toolsets are appended at the end so they appear after the
built-in toolsets in the TUI checklist.
built-in toolsets in the TUI checklist. A plugin whose toolset key
already appears in ``CONFIGURABLE_TOOLSETS`` is skipped bundled
plugins (e.g. ``plugins/spotify``) share their toolset key with the
built-in entry, and we want the built-in label/description to win.
Without the dedupe, ``hermes tools`` "reconfigure existing" would
list the same toolset twice.
"""
result = list(CONFIGURABLE_TOOLSETS)
seen = {ts_key for ts_key, _, _ in result}
try:
from hermes_cli.plugins import discover_plugins, get_plugin_toolsets
discover_plugins() # idempotent — ensures plugins are loaded
result.extend(get_plugin_toolsets())
for entry in get_plugin_toolsets():
if entry[0] in seen:
continue
seen.add(entry[0])
result.append(entry)
except Exception:
pass
return result
@@ -362,6 +397,18 @@ TOOL_CATEGORIES = {
},
],
},
"spotify": {
"name": "Spotify",
"icon": "🎵",
"providers": [
{
"name": "Spotify Web API",
"tag": "PKCE OAuth — opens the setup wizard",
"env_vars": [],
"post_setup": "spotify",
},
],
},
"rl": {
"name": "RL Training",
"icon": "🧪",
@@ -462,6 +509,35 @@ def _run_post_setup(post_setup_key: str):
_print_warning(" kittentts install timed out (>5min)")
_print_info(f" Run manually: python -m pip install -U '{wheel_url}' soundfile")
elif post_setup_key == "spotify":
# Run the full `hermes auth spotify` flow — if the user has no
# client_id yet, this drops them into the interactive wizard
# (opens the Spotify dashboard, prompts for client_id, persists
# to ~/.hermes/.env), then continues straight into PKCE. If they
# already have an app, it skips the wizard and just does OAuth.
from types import SimpleNamespace
try:
from hermes_cli.auth import login_spotify_command
except Exception as exc:
_print_warning(f" Could not load Spotify auth: {exc}")
_print_info(" Run manually: hermes auth spotify")
return
_print_info(" Starting Spotify login...")
try:
login_spotify_command(SimpleNamespace(
client_id=None, redirect_uri=None, scope=None,
no_browser=False, timeout=None,
))
_print_success(" Spotify authenticated")
except SystemExit as exc:
# User aborted the wizard, or OAuth failed — don't fail the
# toolset enable; they can retry with `hermes auth spotify`.
_print_warning(f" Spotify login did not complete: {exc}")
_print_info(" Run later: hermes auth spotify")
except Exception as exc:
_print_warning(f" Spotify login failed: {exc}")
_print_info(" Run manually: hermes auth spotify")
elif post_setup_key == "rl_training":
try:
__import__("tinker_atropos")
@@ -575,7 +651,10 @@ def _get_platform_tools(
has_explicit_config = any(ts in configurable_keys for ts in toolset_names)
if has_explicit_config:
enabled_toolsets = {ts for ts in toolset_names if ts in configurable_keys}
enabled_toolsets = {
ts for ts in toolset_names
if ts in configurable_keys and _toolset_allowed_for_platform(ts, platform)
}
else:
# No explicit config — fall back to resolving composite toolset names
# (e.g. "hermes-cli") to individual tool names and reverse-mapping.
@@ -585,13 +664,29 @@ def _get_platform_tools(
enabled_toolsets = set()
for ts_key, _, _ in CONFIGURABLE_TOOLSETS:
if not _toolset_allowed_for_platform(ts_key, platform):
continue
ts_tools = set(resolve_toolset(ts_key))
if ts_tools and ts_tools.issubset(all_tool_names):
enabled_toolsets.add(ts_key)
default_off = set(_DEFAULT_OFF_TOOLSETS)
if platform in default_off:
# Legacy safety: if the platform's own name matches a default-off
# toolset (e.g. `homeassistant` platform + `homeassistant` toolset),
# keep that toolset enabled on first install. Skip this dodge for
# platform-restricted toolsets — those are always opt-in even on
# their own platform (e.g. `discord` + `discord` should stay OFF).
if platform in default_off and platform not in _TOOLSET_PLATFORM_RESTRICTIONS:
default_off.remove(platform)
# Home Assistant is already runtime-gated by its check_fn (requires
# HASS_TOKEN to register any tools). When a user has configured
# HASS_TOKEN, they've explicitly opted in — don't also strip it via
# _DEFAULT_OFF_TOOLSETS, which would silently drop HA from platforms
# (e.g. cron) that run through _get_platform_tools without an
# explicit saved toolset list. Without this, Norbert's HA cron jobs
# regressed after #14798 made cron honor per-platform tool config.
if "homeassistant" in default_off and os.getenv("HASS_TOKEN"):
default_off.remove("homeassistant")
enabled_toolsets -= default_off
# Recover non-configurable platform toolsets (e.g. discord, feishu_doc,
@@ -624,7 +719,10 @@ def _get_platform_tools(
enabled_toolsets.add(ts_key)
claimed.update(ts_tools)
# Plugin toolsets: enabled by default unless explicitly disabled.
# Plugin toolsets: enabled by default unless explicitly disabled, or
# unless the toolset is in _DEFAULT_OFF_TOOLSETS (e.g. spotify —
# shipped as a bundled plugin but user must opt in via `hermes tools`
# so we don't ship 7 Spotify tool schemas to users who don't use it).
# A plugin toolset is "known" for a platform once `hermes tools`
# has been saved for that platform (tracked via known_plugin_toolsets).
# Unknown plugins default to enabled; known-but-absent = disabled.
@@ -635,6 +733,9 @@ def _get_platform_tools(
if pts in toolset_names:
# Explicitly listed in config — enabled
enabled_toolsets.add(pts)
elif pts in _DEFAULT_OFF_TOOLSETS:
# Opt-in plugin toolset — stay off until user picks it
continue
elif pts not in known_for_platform:
# New plugin not yet seen by hermes tools — default enabled
enabled_toolsets.add(pts)
@@ -687,6 +788,14 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
"""
config.setdefault("platform_toolsets", {})
# Drop platform-scoped toolsets that don't apply here. Prevents the
# "Configure all platforms" checklist (or a hand-edited config.yaml)
# from turning on, say, the `discord` toolset for Telegram.
enabled_toolset_keys = {
ts for ts in enabled_toolset_keys
if _toolset_allowed_for_platform(ts, platform)
}
# Get the set of all configurable toolset keys (built-in + plugin)
configurable_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
plugin_keys = _get_plugin_toolset_keys()
@@ -709,8 +818,11 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
entry for entry in existing_toolsets
if entry not in configurable_keys and entry not in platform_default_keys
}
if "no_mcp" not in enabled_toolset_keys:
preserved_entries.discard("no_mcp")
# Opening `hermes tools` is the user's opt-in to reconfigure tools, so treat
# saving from the picker as consent to clear the "no_mcp" sentinel. The
# picker has no checkbox for no_mcp, so without this users who once set it
# by hand could never re-enable MCP servers through the UI.
preserved_entries.discard("no_mcp")
# Merge preserved entries with new enabled toolsets
config["platform_toolsets"][platform] = sorted(enabled_toolset_keys | preserved_entries)
@@ -818,7 +930,7 @@ def _estimate_tool_tokens() -> Dict[str, int]:
return _tool_token_cache
def _prompt_toolset_checklist(platform_label: str, enabled: Set[str]) -> Set[str]:
def _prompt_toolset_checklist(platform_label: str, enabled: Set[str], platform: str = "cli") -> Set[str]:
"""Multi-select checklist of toolsets. Returns set of selected toolset keys."""
from hermes_cli.curses_ui import curses_checklist
from toolsets import resolve_toolset
@@ -826,7 +938,12 @@ def _prompt_toolset_checklist(platform_label: str, enabled: Set[str]) -> Set[str
# Pre-compute per-tool token counts (cached after first call).
tool_tokens = _estimate_tool_tokens()
effective = _get_effective_configurable_toolsets()
effective_all = _get_effective_configurable_toolsets()
# Drop platform-scoped toolsets that don't apply to this platform.
effective = [
(k, l, d) for (k, l, d) in effective_all
if _toolset_allowed_for_platform(k, platform)
]
labels = []
for ts_key, ts_label, ts_desc in effective:
@@ -1071,7 +1188,7 @@ def _is_provider_active(provider: dict, config: dict) -> bool:
configured_provider = image_cfg.get("provider")
if configured_provider not in (None, "", "fal"):
return False
if image_cfg.get("use_gateway") is False:
if image_cfg.get("use_gateway") is not None and not is_truthy_value(image_cfg.get("use_gateway"), default=False):
return False
return feature.managed_by_nous
if provider.get("tts_provider"):
@@ -1103,7 +1220,7 @@ def _is_provider_active(provider: dict, config: dict) -> bool:
return (
provider["imagegen_backend"] == "fal"
and configured_provider in (None, "", "fal")
and not image_cfg.get("use_gateway")
and not is_truthy_value(image_cfg.get("use_gateway"), default=False)
)
return False
@@ -1740,7 +1857,7 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
checklist_preselected = current_enabled - _DEFAULT_OFF_TOOLSETS
# Show checklist
new_enabled = _prompt_toolset_checklist(pinfo["label"], checklist_preselected)
new_enabled = _prompt_toolset_checklist(pinfo["label"], checklist_preselected, pkey)
added = new_enabled - current_enabled
removed = current_enabled - new_enabled
@@ -2096,7 +2213,11 @@ def _apply_mcp_change(config: dict, targets: List[str], action: str) -> Set[str]
def _print_tools_list(enabled_toolsets: set, mcp_servers: dict, platform: str = "cli"):
"""Print a summary of enabled/disabled toolsets and MCP tool filters."""
effective = _get_effective_configurable_toolsets()
effective_all = _get_effective_configurable_toolsets()
effective = [
(k, l, d) for (k, l, d) in effective_all
if _toolset_allowed_for_platform(k, platform)
]
builtin_keys = {ts_key for ts_key, _, _ in CONFIGURABLE_TOOLSETS}
print(f"Built-in toolsets ({platform}):")
@@ -2162,6 +2283,20 @@ def tools_disable_enable_command(args):
_print_error(f"Unknown toolset '{name}'")
toolset_targets = [t for t in toolset_targets if t in valid_toolsets]
# Reject platform-scoped toolsets on platforms that don't allow them.
restricted_targets = [
t for t in toolset_targets
if not _toolset_allowed_for_platform(t, platform)
]
if restricted_targets:
for name in restricted_targets:
allowed = sorted(_TOOLSET_PLATFORM_RESTRICTIONS.get(name) or set())
_print_error(
f"Toolset '{name}' is not available on platform '{platform}' "
f"(only: {', '.join(allowed)})"
)
toolset_targets = [t for t in toolset_targets if t not in restricted_targets]
if toolset_targets:
_apply_toolset_change(config, platform, toolset_targets, action)
+349 -10
View File
@@ -49,7 +49,7 @@ from hermes_cli.config import (
from gateway.status import get_running_pid, read_runtime_status
try:
from fastapi import FastAPI, HTTPException, Request
from fastapi import FastAPI, HTTPException, Request, WebSocket, WebSocketDisconnect
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import FileResponse, HTMLResponse, JSONResponse
from fastapi.staticfiles import StaticFiles
@@ -73,6 +73,10 @@ app = FastAPI(title="Hermes Agent", version=__version__)
_SESSION_TOKEN = secrets.token_urlsafe(32)
_SESSION_HEADER_NAME = "X-Hermes-Session-Token"
# In-browser Chat tab (/chat, /api/pty, …). Off unless ``hermes dashboard --tui``
# or HERMES_DASHBOARD_TUI=1. Set from :func:`start_server`.
_DASHBOARD_EMBEDDED_CHAT_ENABLED = False
# Simple rate limiter for the reveal endpoint
_reveal_timestamps: List[float] = []
_REVEAL_MAX_PER_WINDOW = 5
@@ -283,7 +287,7 @@ _SCHEMA_OVERRIDES: Dict[str, Dict[str, Any]] = {
"display.busy_input_mode": {
"type": "select",
"description": "Input behavior while agent is running",
"options": ["queue", "interrupt", "block"],
"options": ["interrupt", "queue", "steer"],
},
"memory.provider": {
"type": "select",
@@ -1529,26 +1533,30 @@ def _submit_anthropic_pkce(session_id: str, code_input: str) -> Dict[str, Any]:
with urllib.request.urlopen(req, timeout=20) as resp:
result = json.loads(resp.read().decode())
except Exception as e:
sess["status"] = "error"
sess["error_message"] = f"Token exchange failed: {e}"
with _oauth_sessions_lock:
sess["status"] = "error"
sess["error_message"] = f"Token exchange failed: {e}"
return {"ok": False, "status": "error", "message": sess["error_message"]}
access_token = result.get("access_token", "")
refresh_token = result.get("refresh_token", "")
expires_in = int(result.get("expires_in") or 3600)
if not access_token:
sess["status"] = "error"
sess["error_message"] = "No access token returned"
with _oauth_sessions_lock:
sess["status"] = "error"
sess["error_message"] = "No access token returned"
return {"ok": False, "status": "error", "message": sess["error_message"]}
expires_at_ms = int(time.time() * 1000) + (expires_in * 1000)
try:
_save_anthropic_oauth_creds(access_token, refresh_token, expires_at_ms)
except Exception as e:
sess["status"] = "error"
sess["error_message"] = f"Save failed: {e}"
with _oauth_sessions_lock:
sess["status"] = "error"
sess["error_message"] = f"Save failed: {e}"
return {"ok": False, "status": "error", "message": sess["error_message"]}
sess["status"] = "approved"
with _oauth_sessions_lock:
sess["status"] = "approved"
_log.info("oauth/pkce: anthropic login completed (session=%s)", session_id)
return {"ok": True, "status": "approved"}
@@ -2263,6 +2271,327 @@ async def get_usage_analytics(days: int = 30):
db.close()
# ---------------------------------------------------------------------------
# /api/pty — PTY-over-WebSocket bridge for the dashboard "Chat" tab.
#
# The endpoint spawns the same ``hermes --tui`` binary the CLI uses, behind
# a POSIX pseudo-terminal, and forwards bytes + resize escapes across a
# WebSocket. The browser renders the ANSI through xterm.js (see
# web/src/pages/ChatPage.tsx).
#
# Auth: ``?token=<session_token>`` query param (browsers can't set
# Authorization on the WS upgrade). Same ephemeral ``_SESSION_TOKEN`` as
# REST. Localhost-only — we defensively reject non-loopback clients even
# though uvicorn binds to 127.0.0.1.
# ---------------------------------------------------------------------------
import re
import asyncio
from hermes_cli.pty_bridge import PtyBridge, PtyUnavailableError
_RESIZE_RE = re.compile(rb"\x1b\[RESIZE:(\d+);(\d+)\]")
_PTY_READ_CHUNK_TIMEOUT = 0.2
_VALID_CHANNEL_RE = re.compile(r"^[A-Za-z0-9._-]{1,128}$")
# Starlette's TestClient reports the peer as "testclient"; treat it as
# loopback so tests don't need to rewrite request scope.
_LOOPBACK_HOSTS = frozenset({"127.0.0.1", "::1", "localhost", "testclient"})
# Per-channel subscriber registry used by /api/pub (PTY-side gateway → dashboard)
# and /api/events (dashboard → browser sidebar). Keyed by an opaque channel id
# the chat tab generates on mount; entries auto-evict when the last subscriber
# drops AND the publisher has disconnected.
_event_channels: dict[str, set] = {}
_event_lock = asyncio.Lock()
def _resolve_chat_argv(
resume: Optional[str] = None,
sidecar_url: Optional[str] = None,
) -> tuple[list[str], Optional[str], Optional[dict]]:
"""Resolve the argv + cwd + env for the chat PTY.
Default: whatever ``hermes --tui`` would run. Tests monkeypatch this
function to inject a tiny fake command (``cat``, ``sh -c 'printf …'``)
so nothing has to build Node or the TUI bundle.
Session resume is propagated via the ``HERMES_TUI_RESUME`` env var
matching what ``hermes_cli.main._launch_tui`` does for the CLI path.
Appending ``--resume <id>`` to argv doesn't work because ``ui-tui`` does
not parse its argv.
`sidecar_url` (when set) is forwarded as ``HERMES_TUI_SIDECAR_URL`` so
the spawned ``tui_gateway.entry`` can mirror dispatcher emits to the
dashboard's ``/api/pub`` endpoint (see :func:`pub_ws`).
"""
from hermes_cli.main import PROJECT_ROOT, _make_tui_argv
argv, cwd = _make_tui_argv(PROJECT_ROOT / "ui-tui", tui_dev=False)
env = os.environ.copy()
env.setdefault("NODE_ENV", "production")
if resume:
env["HERMES_TUI_RESUME"] = resume
if sidecar_url:
env["HERMES_TUI_SIDECAR_URL"] = sidecar_url
return list(argv), str(cwd) if cwd else None, env
def _build_sidecar_url(channel: str) -> Optional[str]:
"""ws:// URL the PTY child should publish events to, or None when unbound."""
host = getattr(app.state, "bound_host", None)
port = getattr(app.state, "bound_port", None)
if not host or not port:
return None
netloc = f"[{host}]:{port}" if ":" in host and not host.startswith("[") else f"{host}:{port}"
qs = urllib.parse.urlencode({"token": _SESSION_TOKEN, "channel": channel})
return f"ws://{netloc}/api/pub?{qs}"
async def _broadcast_event(channel: str, payload: str) -> None:
"""Fan out one publisher frame to every subscriber on `channel`."""
async with _event_lock:
subs = list(_event_channels.get(channel, ()))
for sub in subs:
try:
await sub.send_text(payload)
except Exception:
# Subscriber went away mid-send; the /api/events finally clause
# will remove it from the registry on its next iteration.
pass
def _channel_or_close_code(ws: WebSocket) -> Optional[str]:
"""Return the channel id from the query string or None if invalid."""
channel = ws.query_params.get("channel", "")
return channel if _VALID_CHANNEL_RE.match(channel) else None
@app.websocket("/api/pty")
async def pty_ws(ws: WebSocket) -> None:
if not _DASHBOARD_EMBEDDED_CHAT_ENABLED:
await ws.close(code=4403)
return
# --- auth + loopback check (before accept so we can close cleanly) ---
token = ws.query_params.get("token", "")
expected = _SESSION_TOKEN
if not hmac.compare_digest(token.encode(), expected.encode()):
await ws.close(code=4401)
return
client_host = ws.client.host if ws.client else ""
if client_host and client_host not in _LOOPBACK_HOSTS:
await ws.close(code=4403)
return
await ws.accept()
# --- spawn PTY ------------------------------------------------------
resume = ws.query_params.get("resume") or None
channel = _channel_or_close_code(ws)
sidecar_url = _build_sidecar_url(channel) if channel else None
try:
argv, cwd, env = _resolve_chat_argv(resume=resume, sidecar_url=sidecar_url)
except SystemExit as exc:
# _make_tui_argv calls sys.exit(1) when node/npm is missing.
await ws.send_text(f"\r\n\x1b[31mChat unavailable: {exc}\x1b[0m\r\n")
await ws.close(code=1011)
return
try:
bridge = PtyBridge.spawn(argv, cwd=cwd, env=env)
except PtyUnavailableError as exc:
await ws.send_text(f"\r\n\x1b[31mChat unavailable: {exc}\x1b[0m\r\n")
await ws.close(code=1011)
return
except (FileNotFoundError, OSError) as exc:
await ws.send_text(f"\r\n\x1b[31mChat failed to start: {exc}\x1b[0m\r\n")
await ws.close(code=1011)
return
loop = asyncio.get_running_loop()
# --- reader task: PTY master → WebSocket ----------------------------
async def pump_pty_to_ws() -> None:
while True:
chunk = await loop.run_in_executor(
None, bridge.read, _PTY_READ_CHUNK_TIMEOUT
)
if chunk is None: # EOF
return
if not chunk: # no data this tick; yield control and retry
await asyncio.sleep(0)
continue
try:
await ws.send_bytes(chunk)
except Exception:
return
reader_task = asyncio.create_task(pump_pty_to_ws())
# --- writer loop: WebSocket → PTY master ----------------------------
try:
while True:
msg = await ws.receive()
msg_type = msg.get("type")
if msg_type == "websocket.disconnect":
break
raw = msg.get("bytes")
if raw is None:
text = msg.get("text")
raw = text.encode("utf-8") if isinstance(text, str) else b""
if not raw:
continue
# Resize escape is consumed locally, never written to the PTY.
match = _RESIZE_RE.match(raw)
if match and match.end() == len(raw):
cols = int(match.group(1))
rows = int(match.group(2))
bridge.resize(cols=cols, rows=rows)
continue
bridge.write(raw)
except WebSocketDisconnect:
pass
finally:
reader_task.cancel()
try:
await reader_task
except (asyncio.CancelledError, Exception):
pass
bridge.close()
# ---------------------------------------------------------------------------
# /api/ws — JSON-RPC WebSocket sidecar for the dashboard "Chat" tab.
#
# Drives the same `tui_gateway.dispatch` surface Ink uses over stdio, so the
# dashboard can render structured metadata (model badge, tool-call sidebar,
# slash launcher, session info) alongside the xterm.js terminal that PTY
# already paints. Both transports bind to the same session id when one is
# active, so a tool.start emitted by the agent fans out to both sinks.
# ---------------------------------------------------------------------------
@app.websocket("/api/ws")
async def gateway_ws(ws: WebSocket) -> None:
if not _DASHBOARD_EMBEDDED_CHAT_ENABLED:
await ws.close(code=4403)
return
token = ws.query_params.get("token", "")
if not hmac.compare_digest(token.encode(), _SESSION_TOKEN.encode()):
await ws.close(code=4401)
return
client_host = ws.client.host if ws.client else ""
if client_host and client_host not in _LOOPBACK_HOSTS:
await ws.close(code=4403)
return
from tui_gateway.ws import handle_ws
await handle_ws(ws)
# ---------------------------------------------------------------------------
# /api/pub + /api/events — chat-tab event broadcast.
#
# The PTY-side ``tui_gateway.entry`` opens /api/pub at startup (driven by
# HERMES_TUI_SIDECAR_URL set in /api/pty's PTY env) and writes every
# dispatcher emit through it. The dashboard fans those frames out to any
# subscriber that opened /api/events on the same channel id. This is what
# gives the React sidebar its tool-call feed without breaking the PTY
# child's stdio handshake with Ink.
# ---------------------------------------------------------------------------
@app.websocket("/api/pub")
async def pub_ws(ws: WebSocket) -> None:
if not _DASHBOARD_EMBEDDED_CHAT_ENABLED:
await ws.close(code=4403)
return
token = ws.query_params.get("token", "")
if not hmac.compare_digest(token.encode(), _SESSION_TOKEN.encode()):
await ws.close(code=4401)
return
client_host = ws.client.host if ws.client else ""
if client_host and client_host not in _LOOPBACK_HOSTS:
await ws.close(code=4403)
return
channel = _channel_or_close_code(ws)
if not channel:
await ws.close(code=4400)
return
await ws.accept()
try:
while True:
await _broadcast_event(channel, await ws.receive_text())
except WebSocketDisconnect:
pass
@app.websocket("/api/events")
async def events_ws(ws: WebSocket) -> None:
if not _DASHBOARD_EMBEDDED_CHAT_ENABLED:
await ws.close(code=4403)
return
token = ws.query_params.get("token", "")
if not hmac.compare_digest(token.encode(), _SESSION_TOKEN.encode()):
await ws.close(code=4401)
return
client_host = ws.client.host if ws.client else ""
if client_host and client_host not in _LOOPBACK_HOSTS:
await ws.close(code=4403)
return
channel = _channel_or_close_code(ws)
if not channel:
await ws.close(code=4400)
return
await ws.accept()
async with _event_lock:
_event_channels.setdefault(channel, set()).add(ws)
try:
while True:
# Subscribers don't speak — the receive() just blocks until
# disconnect so the connection stays open as long as the
# browser holds it.
await ws.receive_text()
except WebSocketDisconnect:
pass
finally:
async with _event_lock:
subs = _event_channels.get(channel)
if subs is not None:
subs.discard(ws)
if not subs:
_event_channels.pop(channel, None)
def mount_spa(application: FastAPI):
"""Mount the built SPA. Falls back to index.html for client-side routing.
@@ -2284,8 +2613,10 @@ def mount_spa(application: FastAPI):
def _serve_index():
"""Return index.html with the session token injected."""
html = _index_path.read_text()
chat_js = "true" if _DASHBOARD_EMBEDDED_CHAT_ENABLED else "false"
token_script = (
f'<script>window.__HERMES_SESSION_TOKEN__="{_SESSION_TOKEN}";</script>'
f'<script>window.__HERMES_SESSION_TOKEN__="{_SESSION_TOKEN}";'
f"window.__HERMES_DASHBOARD_EMBEDDED_CHAT__={chat_js};</script>"
)
html = html.replace("</head>", f"{token_script}</head>", 1)
return HTMLResponse(
@@ -2798,10 +3129,15 @@ def start_server(
port: int = 9119,
open_browser: bool = True,
allow_public: bool = False,
*,
embedded_chat: bool = False,
):
"""Start the web UI server."""
import uvicorn
global _DASHBOARD_EMBEDDED_CHAT_ENABLED
_DASHBOARD_EMBEDDED_CHAT_ENABLED = embedded_chat
_LOCALHOST = ("127.0.0.1", "localhost", "::1")
if host not in _LOCALHOST and not allow_public:
raise SystemExit(
@@ -2817,7 +3153,10 @@ def start_server(
# Record the bound host so host_header_middleware can validate incoming
# Host headers against it. Defends against DNS rebinding (GHSA-ppp5-vxwm-4cf7).
# bound_port is also stashed so /api/pty can build the back-WS URL the
# PTY child uses to publish events to the dashboard sidebar.
app.state.bound_host = host
app.state.bound_port = port
if open_browser:
import webbrowser
+3 -4
View File
@@ -195,10 +195,6 @@ def setup_logging(
The ``logs/`` directory where files are written.
"""
global _logging_initialized
if _logging_initialized and not force:
home = hermes_home or get_hermes_home()
return home / "logs"
home = hermes_home or get_hermes_home()
log_dir = home / "logs"
log_dir.mkdir(parents=True, exist_ok=True)
@@ -248,6 +244,9 @@ def setup_logging(
log_filter=_ComponentFilter(COMPONENT_PREFIXES["gateway"]),
)
if _logging_initialized and not force:
return log_dir
# Ensure root logger level is low enough for the handlers to fire.
if root.level == logging.NOTSET or root.level > level:
root.setLevel(level)
+412 -67
View File
@@ -22,6 +22,8 @@ import sqlite3
import threading
import time
from pathlib import Path
from agent.memory_manager import sanitize_context
from hermes_constants import get_hermes_home
from typing import Any, Callable, Dict, List, Optional, TypeVar
@@ -31,7 +33,7 @@ T = TypeVar("T")
DEFAULT_DB_PATH = get_hermes_home() / "state.db"
SCHEMA_VERSION = 8
SCHEMA_VERSION = 10
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS schema_version (
@@ -83,7 +85,8 @@ CREATE TABLE IF NOT EXISTS messages (
reasoning TEXT,
reasoning_content TEXT,
reasoning_details TEXT,
codex_reasoning_items TEXT
codex_reasoning_items TEXT,
codex_message_items TEXT
);
CREATE TABLE IF NOT EXISTS state_meta (
@@ -118,6 +121,32 @@ CREATE TRIGGER IF NOT EXISTS messages_fts_update AFTER UPDATE ON messages BEGIN
END;
"""
# Trigram FTS5 table for CJK substring search. The default unicode61
# tokenizer splits CJK characters into individual tokens, breaking phrase
# matching. The trigram tokenizer creates overlapping 3-byte sequences so
# substring queries work natively for any script (CJK, Thai, etc.).
FTS_TRIGRAM_SQL = """
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts_trigram USING fts5(
content,
content=messages,
content_rowid=id,
tokenize='trigram'
);
CREATE TRIGGER IF NOT EXISTS messages_fts_trigram_insert AFTER INSERT ON messages BEGIN
INSERT INTO messages_fts_trigram(rowid, content) VALUES (new.id, new.content);
END;
CREATE TRIGGER IF NOT EXISTS messages_fts_trigram_delete AFTER DELETE ON messages BEGIN
INSERT INTO messages_fts_trigram(messages_fts_trigram, rowid, content) VALUES('delete', old.id, old.content);
END;
CREATE TRIGGER IF NOT EXISTS messages_fts_trigram_update AFTER UPDATE ON messages BEGIN
INSERT INTO messages_fts_trigram(messages_fts_trigram, rowid, content) VALUES('delete', old.id, old.content);
INSERT INTO messages_fts_trigram(rowid, content) VALUES (new.id, new.content);
END;
"""
class SessionDB:
"""
@@ -356,6 +385,27 @@ class SessionDB:
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 8")
if current_version < 9:
# v9: preserve replayable Codex assistant message ids/phases so
# follow-up turns can rebuild Responses API message items instead
# of flattening everything to plain assistant text.
try:
cursor.execute('ALTER TABLE messages ADD COLUMN "codex_message_items" TEXT')
except sqlite3.OperationalError:
pass # Column already exists
cursor.execute("UPDATE schema_version SET version = 9")
if current_version < 10:
# v10: trigram FTS5 table for CJK/substring search.
# Created via FTS_TRIGRAM_SQL below; backfill existing messages.
try:
cursor.execute("SELECT * FROM messages_fts_trigram LIMIT 0")
except sqlite3.OperationalError:
cursor.executescript(FTS_TRIGRAM_SQL)
cursor.execute(
"INSERT INTO messages_fts_trigram(rowid, content) "
"SELECT id, content FROM messages WHERE content IS NOT NULL"
)
cursor.execute("UPDATE schema_version SET version = 10")
# Unique title index — always ensure it exists (safe to run after migrations
# since the title column is guaranteed to exist at this point)
@@ -373,6 +423,12 @@ class SessionDB:
except sqlite3.OperationalError:
cursor.executescript(FTS_SQL)
# Trigram FTS5 for CJK/substring search
try:
cursor.execute("SELECT * FROM messages_fts_trigram LIMIT 0")
except sqlite3.OperationalError:
cursor.executescript(FTS_TRIGRAM_SQL)
self._conn.commit()
# =========================================================================
@@ -822,7 +878,18 @@ class SessionDB:
params = []
if not include_children:
where_clauses.append("s.parent_session_id IS NULL")
# Show root sessions and branch sessions (whose parent ended with
# end_reason='branched' before the child was created), while still
# hiding sub-agent runs and compression continuations (which also
# carry a parent_session_id but were spawned while the parent was
# still live — i.e., started_at < parent.ended_at).
where_clauses.append(
"(s.parent_session_id IS NULL"
" OR EXISTS (SELECT 1 FROM sessions p"
" WHERE p.id = s.parent_session_id"
" AND p.end_reason = 'branched'"
" AND s.started_at >= p.ended_at))"
)
if source:
where_clauses.append("s.source = ?")
@@ -956,6 +1023,7 @@ class SessionDB:
reasoning_content: str = None,
reasoning_details: Any = None,
codex_reasoning_items: Any = None,
codex_message_items: Any = None,
) -> int:
"""
Append a message to a session. Returns the message row ID.
@@ -972,6 +1040,10 @@ class SessionDB:
json.dumps(codex_reasoning_items)
if codex_reasoning_items else None
)
codex_message_items_json = (
json.dumps(codex_message_items)
if codex_message_items else None
)
tool_calls_json = json.dumps(tool_calls) if tool_calls else None
# Pre-compute tool call count
@@ -983,8 +1055,9 @@ class SessionDB:
cursor = conn.execute(
"""INSERT INTO messages (session_id, role, content, tool_call_id,
tool_calls, tool_name, timestamp, token_count, finish_reason,
reasoning, reasoning_content, reasoning_details, codex_reasoning_items)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
reasoning, reasoning_content, reasoning_details, codex_reasoning_items,
codex_message_items)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(
session_id,
role,
@@ -999,6 +1072,7 @@ class SessionDB:
reasoning_content,
reasoning_details_json,
codex_items_json,
codex_message_items_json,
),
)
msg_id = cursor.lastrowid
@@ -1039,22 +1113,98 @@ class SessionDB:
result.append(msg)
return result
def get_messages_as_conversation(self, session_id: str) -> List[Dict[str, Any]]:
def resolve_resume_session_id(self, session_id: str) -> str:
"""Redirect a resume target to the descendant session that holds the messages.
Context compression ends the current session and forks a new child session
(linked via ``parent_session_id``). The flush cursor is reset, so the
child is where new messages actually land the parent ends up with
``message_count = 0`` rows unless messages had already been flushed to
it before compression. See #15000.
This helper walks ``parent_session_id`` forward from ``session_id`` and
returns the first descendant in the chain that has at least one message
row. If the original session already has messages, or no descendant
has any, the original ``session_id`` is returned unchanged.
The chain is always walked via the child whose ``started_at`` is
latest; that matches the single-chain shape that compression creates.
A depth cap (32) guards against accidental loops in malformed data.
"""
if not session_id:
return session_id
with self._lock:
# If this session already has messages, nothing to redirect.
try:
row = self._conn.execute(
"SELECT 1 FROM messages WHERE session_id = ? LIMIT 1",
(session_id,),
).fetchone()
except Exception:
return session_id
if row is not None:
return session_id
# Walk descendants: at each step, pick the most-recently-started
# child session; stop once we find one with messages.
current = session_id
seen = {current}
for _ in range(32):
try:
child_row = self._conn.execute(
"SELECT id FROM sessions "
"WHERE parent_session_id = ? "
"ORDER BY started_at DESC, id DESC LIMIT 1",
(current,),
).fetchone()
except Exception:
return session_id
if child_row is None:
return session_id
child_id = child_row["id"] if hasattr(child_row, "keys") else child_row[0]
if not child_id or child_id in seen:
return session_id
seen.add(child_id)
try:
msg_row = self._conn.execute(
"SELECT 1 FROM messages WHERE session_id = ? LIMIT 1",
(child_id,),
).fetchone()
except Exception:
return session_id
if msg_row is not None:
return child_id
current = child_id
return session_id
def get_messages_as_conversation(
self, session_id: str, include_ancestors: bool = False
) -> List[Dict[str, Any]]:
"""
Load messages in the OpenAI conversation format (role + content dicts).
Used by the gateway to restore conversation history.
"""
session_ids = [session_id]
if include_ancestors:
session_ids = self._session_lineage_root_to_tip(session_id)
with self._lock:
cursor = self._conn.execute(
placeholders = ",".join("?" for _ in session_ids)
rows = self._conn.execute(
"SELECT role, content, tool_call_id, tool_calls, tool_name, "
"reasoning, reasoning_content, reasoning_details, codex_reasoning_items "
"FROM messages WHERE session_id = ? ORDER BY timestamp, id",
(session_id,),
)
rows = cursor.fetchall()
"reasoning, reasoning_content, reasoning_details, codex_reasoning_items, "
"codex_message_items "
f"FROM messages WHERE session_id IN ({placeholders}) ORDER BY timestamp, id",
tuple(session_ids),
).fetchall()
messages = []
for row in rows:
msg = {"role": row["role"], "content": row["content"]}
content = row["content"]
if row["role"] in {"user", "assistant"} and isinstance(content, str):
content = sanitize_context(content).strip()
msg = {"role": row["role"], "content": content}
if row["tool_call_id"]:
msg["tool_call_id"] = row["tool_call_id"]
if row["tool_name"]:
@@ -1085,9 +1235,53 @@ class SessionDB:
except (json.JSONDecodeError, TypeError):
logger.warning("Failed to deserialize codex_reasoning_items, falling back to None")
msg["codex_reasoning_items"] = None
if row["codex_message_items"]:
try:
msg["codex_message_items"] = json.loads(row["codex_message_items"])
except (json.JSONDecodeError, TypeError):
logger.warning("Failed to deserialize codex_message_items, falling back to None")
msg["codex_message_items"] = None
if include_ancestors and self._is_duplicate_replayed_user_message(messages, msg):
continue
messages.append(msg)
return messages
def _session_lineage_root_to_tip(self, session_id: str) -> List[str]:
if not session_id:
return [session_id]
chain = []
current = session_id
seen = set()
with self._lock:
for _ in range(100):
if not current or current in seen:
break
seen.add(current)
chain.append(current)
row = self._conn.execute(
"SELECT parent_session_id FROM sessions WHERE id = ?",
(current,),
).fetchone()
if row is None:
break
current = row["parent_session_id"] if hasattr(row, "keys") else row[0]
return list(reversed(chain)) or [session_id]
@staticmethod
def _is_duplicate_replayed_user_message(messages: List[Dict[str, Any]], msg: Dict[str, Any]) -> bool:
if msg.get("role") != "user":
return False
content = msg.get("content")
if not isinstance(content, str) or not content:
return False
for prev in reversed(messages):
if prev.get("role") == "user" and prev.get("content") == content:
return True
if prev.get("role") == "assistant" and (prev.get("content") or prev.get("tool_calls")):
return False
return False
# =========================================================================
# Search
# =========================================================================
@@ -1146,6 +1340,16 @@ class SessionDB:
return sanitized.strip()
@staticmethod
def _is_cjk_codepoint(cp: int) -> bool:
return (0x4E00 <= cp <= 0x9FFF or # CJK Unified Ideographs
0x3400 <= cp <= 0x4DBF or # CJK Extension A
0x20000 <= cp <= 0x2A6DF or # CJK Extension B
0x3000 <= cp <= 0x303F or # CJK Symbols
0x3040 <= cp <= 0x309F or # Hiragana
0x30A0 <= cp <= 0x30FF or # Katakana
0xAC00 <= cp <= 0xD7AF) # Hangul Syllables
@staticmethod
def _contains_cjk(text: str) -> bool:
"""Check if text contains CJK (Chinese, Japanese, Korean) characters."""
@@ -1161,6 +1365,11 @@ class SessionDB:
return True
return False
@classmethod
def _count_cjk(cls, text: str) -> int:
"""Count CJK characters in text."""
return sum(1 for ch in text if cls._is_cjk_codepoint(ord(ch)))
def search_messages(
self,
query: str,
@@ -1231,52 +1440,113 @@ class SessionDB:
LIMIT ? OFFSET ?
"""
with self._lock:
try:
cursor = self._conn.execute(sql, params)
except sqlite3.OperationalError:
# FTS5 query syntax error despite sanitization — return empty
# unless query contains CJK (fall back to LIKE below)
if not self._contains_cjk(query):
return []
matches = []
else:
matches = [dict(row) for row in cursor.fetchall()]
# LIKE fallback for CJK queries: FTS5 default tokenizer splits CJK
# characters individually, causing multi-character queries to fail.
if not matches and self._contains_cjk(query):
# CJK queries bypass the unicode61 FTS5 table. The default tokenizer
# splits CJK characters into individual tokens, so "大别山项目" becomes
# "大 AND 别 AND 山 AND 项 AND 目" — producing false positives and
# missing exact phrase matches.
#
# For queries with 3+ CJK characters, we use the trigram FTS5 table
# (indexed substring matching with ranking and snippets). For shorter
# CJK queries (1-2 chars), trigram can't match (it needs ≥9 UTF-8
# bytes = 3 CJK chars), so we fall back to LIKE.
is_cjk = self._contains_cjk(query)
if is_cjk:
raw_query = query.strip('"').strip()
like_where = ["m.content LIKE ?"]
like_params: list = [f"%{raw_query}%"]
if source_filter is not None:
like_where.append(f"s.source IN ({','.join('?' for _ in source_filter)})")
like_params.extend(source_filter)
if exclude_sources is not None:
like_where.append(f"s.source NOT IN ({','.join('?' for _ in exclude_sources)})")
like_params.extend(exclude_sources)
if role_filter:
like_where.append(f"m.role IN ({','.join('?' for _ in role_filter)})")
like_params.extend(role_filter)
like_sql = f"""
SELECT m.id, m.session_id, m.role,
substr(m.content,
max(1, instr(m.content, ?) - 40),
120) AS snippet,
m.content, m.timestamp, m.tool_name,
s.source, s.model, s.started_at AS session_started
FROM messages m
JOIN sessions s ON s.id = m.session_id
WHERE {' AND '.join(like_where)}
ORDER BY m.timestamp DESC
LIMIT ? OFFSET ?
"""
like_params.extend([limit, offset])
# instr() parameter goes first in the bound list
like_params = [raw_query] + like_params
cjk_count = self._count_cjk(raw_query)
if cjk_count >= 3:
# Trigram FTS5 path — quote each non-operator token to handle
# FTS5 special chars (%, *, etc.) while preserving boolean
# operators (AND, OR, NOT) for multi-term queries.
tokens = raw_query.split()
parts = []
for tok in tokens:
if tok.upper() in ("AND", "OR", "NOT"):
parts.append(tok)
else:
parts.append('"' + tok.replace('"', '""') + '"')
trigram_query = " ".join(parts)
tri_where = ["messages_fts_trigram MATCH ?"]
tri_params: list = [trigram_query]
if source_filter is not None:
tri_where.append(f"s.source IN ({','.join('?' for _ in source_filter)})")
tri_params.extend(source_filter)
if exclude_sources is not None:
tri_where.append(f"s.source NOT IN ({','.join('?' for _ in exclude_sources)})")
tri_params.extend(exclude_sources)
if role_filter:
tri_where.append(f"m.role IN ({','.join('?' for _ in role_filter)})")
tri_params.extend(role_filter)
tri_sql = f"""
SELECT
m.id,
m.session_id,
m.role,
snippet(messages_fts_trigram, 0, '>>>', '<<<', '...', 40) AS snippet,
m.content,
m.timestamp,
m.tool_name,
s.source,
s.model,
s.started_at AS session_started
FROM messages_fts_trigram
JOIN messages m ON m.id = messages_fts_trigram.rowid
JOIN sessions s ON s.id = m.session_id
WHERE {' AND '.join(tri_where)}
ORDER BY rank
LIMIT ? OFFSET ?
"""
tri_params.extend([limit, offset])
with self._lock:
try:
tri_cursor = self._conn.execute(tri_sql, tri_params)
except sqlite3.OperationalError:
matches = []
else:
matches = [dict(row) for row in tri_cursor.fetchall()]
else:
# Short CJK query (1-2 chars) — trigram needs ≥3 CJK chars.
# Fall back to LIKE substring search.
escaped = raw_query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
like_where = ["m.content LIKE ? ESCAPE '\\'"]
like_params: list = [f"%{escaped}%"]
if source_filter is not None:
like_where.append(f"s.source IN ({','.join('?' for _ in source_filter)})")
like_params.extend(source_filter)
if exclude_sources is not None:
like_where.append(f"s.source NOT IN ({','.join('?' for _ in exclude_sources)})")
like_params.extend(exclude_sources)
if role_filter:
like_where.append(f"m.role IN ({','.join('?' for _ in role_filter)})")
like_params.extend(role_filter)
like_sql = f"""
SELECT m.id, m.session_id, m.role,
substr(m.content,
max(1, instr(m.content, ?) - 40),
120) AS snippet,
m.content, m.timestamp, m.tool_name,
s.source, s.model, s.started_at AS session_started
FROM messages m
JOIN sessions s ON s.id = m.session_id
WHERE {' AND '.join(like_where)}
ORDER BY m.timestamp DESC
LIMIT ? OFFSET ?
"""
like_params.extend([limit, offset])
# instr() parameter goes first in the bound list
like_params = [raw_query] + like_params
with self._lock:
like_cursor = self._conn.execute(like_sql, like_params)
matches = [dict(row) for row in like_cursor.fetchall()]
else:
with self._lock:
like_cursor = self._conn.execute(like_sql, like_params)
matches = [dict(row) for row in like_cursor.fetchall()]
try:
cursor = self._conn.execute(sql, params)
except sqlite3.OperationalError:
# FTS5 query syntax error despite sanitization — return empty
return []
else:
matches = [dict(row) for row in cursor.fetchall()]
# Add surrounding context (1 message before + after each match).
# Done outside the lock so we don't hold it across N sequential queries.
@@ -1336,16 +1606,32 @@ class SessionDB:
limit: int = 20,
offset: int = 0,
) -> List[Dict[str, Any]]:
"""List sessions, optionally filtered by source."""
"""List sessions, optionally filtered by source.
Returns rows enriched with a computed ``last_active`` column (latest
message timestamp for the session, falling back to ``started_at``),
ordered by most-recently-used first.
"""
select_with_last_active = (
"SELECT s.*, COALESCE(m.last_active, s.started_at) AS last_active "
"FROM sessions s "
"LEFT JOIN ("
"SELECT session_id, MAX(timestamp) AS last_active "
"FROM messages GROUP BY session_id"
") m ON m.session_id = s.id "
)
with self._lock:
if source:
cursor = self._conn.execute(
"SELECT * FROM sessions WHERE source = ? ORDER BY started_at DESC LIMIT ? OFFSET ?",
f"{select_with_last_active}"
"WHERE s.source = ? "
"ORDER BY last_active DESC, s.started_at DESC, s.id DESC LIMIT ? OFFSET ?",
(source, limit, offset),
)
else:
cursor = self._conn.execute(
"SELECT * FROM sessions ORDER BY started_at DESC LIMIT ? OFFSET ?",
f"{select_with_last_active}"
"ORDER BY last_active DESC, s.started_at DESC, s.id DESC LIMIT ? OFFSET ?",
(limit, offset),
)
return [dict(row) for row in cursor.fetchall()]
@@ -1412,12 +1698,45 @@ class SessionDB:
)
self._execute_write(_do)
def delete_session(self, session_id: str) -> bool:
@staticmethod
def _remove_session_files(sessions_dir: Optional[Path], session_id: str) -> None:
"""Remove on-disk transcript files for a session.
Cleans up ``{session_id}.json``, ``{session_id}.jsonl``, and any
``request_dump_{session_id}_*.json`` files left by the gateway.
Silently skips files that don't exist and swallows OSError so a
filesystem hiccup never blocks a DB operation.
"""
if sessions_dir is None:
return
for suffix in (".json", ".jsonl"):
p = sessions_dir / f"{session_id}{suffix}"
try:
p.unlink(missing_ok=True)
except OSError:
pass
# request_dump files use session_id as a prefix component
try:
for p in sessions_dir.glob(f"request_dump_{session_id}_*.json"):
try:
p.unlink(missing_ok=True)
except OSError:
pass
except OSError:
pass
def delete_session(
self,
session_id: str,
sessions_dir: Optional[Path] = None,
) -> bool:
"""Delete a session and all its messages.
Child sessions are orphaned (parent_session_id set to NULL) rather
than cascade-deleted, so they remain accessible independently.
Returns True if the session was found and deleted.
When *sessions_dir* is provided, also removes on-disk transcript
files (``.json`` / ``.jsonl`` / ``request_dump_*``) for the deleted
session. Returns True if the session was found and deleted.
"""
def _do(conn):
cursor = conn.execute(
@@ -1434,16 +1753,29 @@ class SessionDB:
conn.execute("DELETE FROM messages WHERE session_id = ?", (session_id,))
conn.execute("DELETE FROM sessions WHERE id = ?", (session_id,))
return True
return self._execute_write(_do)
def prune_sessions(self, older_than_days: int = 90, source: str = None) -> int:
deleted = self._execute_write(_do)
if deleted:
self._remove_session_files(sessions_dir, session_id)
return deleted
def prune_sessions(
self,
older_than_days: int = 90,
source: str = None,
sessions_dir: Optional[Path] = None,
) -> int:
"""Delete sessions older than N days. Returns count of deleted sessions.
Only prunes ended sessions (not active ones). Child sessions outside
the prune window are orphaned (parent_session_id set to NULL) rather
than cascade-deleted.
than cascade-deleted. When *sessions_dir* is provided, also removes
on-disk transcript files (``.json`` / ``.jsonl`` /
``request_dump_*``) for every pruned session, outside the DB
transaction.
"""
cutoff = time.time() - (older_than_days * 86400)
removed_ids: list[str] = []
def _do(conn):
if source:
@@ -1473,9 +1805,14 @@ class SessionDB:
for sid in session_ids:
conn.execute("DELETE FROM messages WHERE session_id = ?", (sid,))
conn.execute("DELETE FROM sessions WHERE id = ?", (sid,))
removed_ids.append(sid)
return len(session_ids)
return self._execute_write(_do)
count = self._execute_write(_do)
# Clean up on-disk files outside the DB transaction
for sid in removed_ids:
self._remove_session_files(sessions_dir, sid)
return count
# ── Meta key/value (for scheduler bookkeeping) ──
@@ -1529,6 +1866,7 @@ class SessionDB:
retention_days: int = 90,
min_interval_hours: int = 24,
vacuum: bool = True,
sessions_dir: Optional[Path] = None,
) -> Dict[str, Any]:
"""Idempotent auto-maintenance: prune old sessions + optional VACUUM.
@@ -1536,6 +1874,10 @@ class SessionDB:
within ``min_interval_hours`` no-op. Designed to be called once at
startup from long-lived entrypoints (CLI, gateway, cron scheduler).
When *sessions_dir* is provided, on-disk transcript files
(``.json`` / ``.jsonl`` / ``request_dump_*``) for pruned sessions
are removed as part of the same sweep (issue #3015).
Never raises. On any failure, logs a warning and returns a dict
with ``"error"`` set.
@@ -1559,7 +1901,10 @@ class SessionDB:
except (TypeError, ValueError):
pass # corrupt meta; treat as no prior run
pruned = self.prune_sessions(older_than_days=retention_days)
pruned = self.prune_sessions(
older_than_days=retention_days,
sessions_dir=sessions_dir,
)
result["pruned"] = pruned
# Only VACUUM if we actually freed rows — VACUUM on a tight DB
+26 -2
View File
@@ -24,6 +24,7 @@ import json
import asyncio
import logging
import threading
import time
from typing import Dict, Any, List, Optional, Tuple
from tools.registry import discover_builtin_tools, registry
@@ -347,6 +348,18 @@ def get_tool_definitions(
global _last_resolved_tool_names
_last_resolved_tool_names = [t["function"]["name"] for t in filtered_tools]
# Sanitize schemas for broad backend compatibility. llama.cpp's
# json-schema-to-grammar converter (used by its OAI server to build
# GBNF tool-call parsers) rejects some shapes that cloud providers
# silently accept — bare "type": "object" with no properties,
# string-valued schema nodes from malformed MCP servers, etc. This
# is a no-op for schemas that are already well-formed.
try:
from tools.schema_sanitizer import sanitize_tool_schemas
filtered_tools = sanitize_tool_schemas(filtered_tools)
except Exception as e: # pragma: no cover — defensive
logger.warning("Schema sanitization skipped: %s", e)
return filtered_tools
@@ -456,9 +469,9 @@ def _coerce_number(value: str, integer_only: bool = False):
f = float(value)
except (ValueError, OverflowError):
return value
# Guard against inf/nan before int() conversion
# Guard against inf/nan — not JSON-serializable, keep original string
if f != f or f == float("inf") or f == float("-inf"):
return f
return value
# If it looks like an integer (no fractional part), return int
if f == int(f):
return int(f)
@@ -555,6 +568,14 @@ def handle_function_call(
except Exception:
pass # file_tools may not be loaded yet
# Measure tool dispatch latency so post_tool_call and
# transform_tool_result hooks can observe per-tool duration.
# Inspired by Claude Code 2.1.119, which added ``duration_ms`` to
# PostToolUse hook inputs so plugin authors can build latency
# dashboards, budget alerts, and regression canaries without having
# to wrap every tool manually. We use monotonic() so the value is
# unaffected by wall-clock adjustments during the call.
_dispatch_start = time.monotonic()
if function_name == "execute_code":
# Prefer the caller-provided list so subagents can't overwrite
# the parent's tool set via the process-global.
@@ -570,6 +591,7 @@ def handle_function_call(
task_id=task_id,
user_task=user_task,
)
duration_ms = int((time.monotonic() - _dispatch_start) * 1000)
try:
from hermes_cli.plugins import invoke_hook
@@ -581,6 +603,7 @@ def handle_function_call(
task_id=task_id or "",
session_id=session_id or "",
tool_call_id=tool_call_id or "",
duration_ms=duration_ms,
)
except Exception:
pass
@@ -601,6 +624,7 @@ def handle_function_call(
task_id=task_id or "",
session_id=session_id or "",
tool_call_id=tool_call_id or "",
duration_ms=duration_ms,
)
for hook_result in hook_results:
if isinstance(hook_result, str):
+30 -3
View File
@@ -7,9 +7,7 @@
perSystem = { pkgs, system, lib, ... }:
let
hermes-agent = inputs.self.packages.${system}.default;
hermesVenv = pkgs.callPackage ./python.nix {
inherit (inputs) uv2nix pyproject-nix pyproject-build-systems;
};
hermesVenv = hermes-agent.hermesVenv;
configMergeScript = pkgs.callPackage ./configMergeScript.nix { };
@@ -193,6 +191,35 @@ json.dump(sorted(leaf_paths(DEFAULT_CONFIG)), sys.stdout, indent=2)
echo "ok" > $out/result
'';
# Verify extraPythonPackages PYTHONPATH injection
extra-python-packages = let
testPkg = pkgs.python312Packages.pyfiglet;
hermesWithExtra = hermes-agent.override {
extraPythonPackages = [ testPkg ];
};
in pkgs.runCommand "hermes-extra-python-packages" { } ''
set -e
echo "=== Checking extraPythonPackages PYTHONPATH injection ==="
grep -q "PYTHONPATH" ${hermesWithExtra}/bin/hermes || \
(echo "FAIL: PYTHONPATH not in wrapper"; exit 1)
echo "PASS: PYTHONPATH present in wrapper"
grep -q "${testPkg}" ${hermesWithExtra}/bin/hermes || \
(echo "FAIL: test package path not in PYTHONPATH"; exit 1)
echo "PASS: test package path found in wrapper"
echo "=== Checking base package has no PYTHONPATH ==="
if grep -q "PYTHONPATH" ${hermes-agent}/bin/hermes; then
echo "FAIL: base package should not have PYTHONPATH"; exit 1
fi
echo "PASS: base package clean"
echo "=== All extraPythonPackages checks passed ==="
mkdir -p $out
echo "ok" > $out/result
'';
# ── Config merge + round-trip test ────────────────────────────────
# Tests the merge script (Nix activation behavior) across 7
# scenarios, then verifies Python's load_config() reads correctly.

Some files were not shown because too many files have changed in this diff Show More