Compare commits

...

145 Commits

Author SHA1 Message Date
Brooklyn Nicholson 696a7143fb feat(tui): portable newline keys in the composer
Shift+Enter, Alt+Enter and Ctrl+Enter only reach the app when the
terminal forwards them via the kitty keyboard or modifyOtherKeys
protocols. On Windows + WSL2, plain Apple Terminal, and SSH-without-
extended-keys, every Enter variant collapses onto a bare '\r' (or
'\x1b\r' for Alt) and submits the message with no way to insert a
newline. Three small changes give every terminal a working newline
binding without any capability detection:

1. parse-keypress now recognises the legacy single/two-byte Enter
   encodings:
     '\r'      -> return
     '\n'      -> Ctrl+Enter (Ctrl+J)
     '\x1b\r'  -> Alt+Enter
     '\x1b\n'  -> Alt+Ctrl+Enter
   Modern CSI-u sequences still parse via the kitty path; this only
   fills the legacy gap.

2. The composer's Enter handler picks up `\<Enter>` as an inline
   line continuation: when the char left of the cursor is a literal
   backslash, the '\\' is consumed and replaced with '\n'. Mirrors
   Claude Code's PromptInput behaviour — discoverable, universal,
   in-place (vs. the existing submit-time buffered fallback).
   Extracted as a pure `resolveReturn(value, cursor, modifier)`
   helper so the contract is unit-tested.

3. Hotkeys help documents the trio: Shift+Enter / Alt+Enter / Ctrl+J
   plus `\\+Enter` inline.

Prior art: Claude Code (`\\+Enter` inline backspace), OpenAI Codex
(Ctrl+J / Ctrl+M / Enter / Shift+Enter / Alt+Enter as alternates),
Aider (`/multiline` modal toggle). This PR takes the union of the
discoverable variants without the modal mode.
2026-05-23 18:27:14 -05:00
brooklyn! 874c2b1fe6 fix(tui): ignore late thinking deltas after completion (#31055)
* fix(tui): ignore late thinking deltas after completion

Prevent stale reasoning events from repainting the TUI status after a turn has already completed and the UI is idle.

* test(tui): restore timers after thinking delta assertion

Keep fake timer cleanup in a finally block so assertion failures cannot leak timer mode into later tests.
2026-05-23 13:31:06 -05:00
brooklyn! e6ca730a22 fix(tui): log parent gateway lifecycle exits (#31051)
* fix(tui): log parent gateway lifecycle exits

Add parent-side breadcrumbs for TUI gateway shutdown and transport exits so future backend EOF/SIGTERM reports identify the parent action that caused them.

* chore(tui): retrigger lifecycle logging checks

Retry transient GitHub checkout failures on the lifecycle logging PR.
2026-05-23 13:28:40 -05:00
brooklyn! 026f64f8e0 fix(tui): commit composer input bursts immediately (#31053)
* fix(tui): commit composer input bursts immediately

Salvage the WSL/terminal multi-character input burst fix with focused regression coverage so delayed pseudo-paste buffers cannot reorder later edits.

* fix(tui): keep newline input bursts on paste path

Preserve paste handling for multi-character chunks with newlines while keeping repeated printable key bursts on the immediate composer path.

* refactor(tui): share composer frame batch interval

Use one frame-sized batching constant for parent updates, local renders, and input burst flushes.
2026-05-23 13:27:16 -05:00
Teknium ad11327db0 feat(kanban): warn users that scratch workspaces are deleted on completion (#30949)
First scratch workspace creation on an install now emits a one-shot
warning log + a 'tip_scratch_workspace' event on the task. Sentinel
file at ~/.hermes/kanban/.scratch_tip_shown silences subsequent
creations across the whole install.

Behavior unchanged — scratch is still ephemeral by design. This just
makes the design visible to new users (reported in user community:
'progress files vanished, no warning anywhere').

Docs (en + ko) updated to spell out 'Deleted when the task completes'
on the scratch bullet and 'Preserved on completion' on worktree/dir.
2026-05-23 11:27:00 -07:00
Teknium cae7537359 infographic: kanban.db corruption defense (#30858 + #30862) (#30952) 2026-05-23 05:55:25 -07:00
teknium1 c4b8f5efee fix(kanban): harden corrupt-db backup against CodeQL path-injection findings
Path.resolve() before any I/O and confine backup writes to the resolved
parent directory. Adds explicit parent-equality assertions so static
analyzers see the containment guarantee, and walks WAL/SHM sidecars
through the same resolved-parent path so accidental .. segments are
collapsed before shutil.copy2.

Functionally equivalent to the original PR; preserves the corrupt bytes
to <db>.corrupt.<ts>.bak in the same directory, still raises
KanbanDbCorruptError from connect(). E2E with Stefan's exact hex header
+ malformed pages still passes. 163/163 kanban tests still pass.
2026-05-23 05:51:33 -07:00
teknium1 4f835f7e43 chore(release): map NickLarcombe author email for #30707 salvage 2026-05-23 05:51:33 -07:00
Nick 39fe4ecee3 fix(kanban): refuse corrupt db auto-init 2026-05-23 05:51:33 -07:00
Teknium e97a4c8f37 docs(readme): add Nous Portal section between Getting Started and CLI/Messaging reference (#30941)
A small, self-contained section under 'Skip the API-key collection —
Nous Portal' explaining what Portal gives you (300+ models + Tool
Gateway), the one-shot install command, and how to inspect routing.
No buzzwords, no comparison tables, no overselling.

Positioned right after 'Getting Started' so it lands where someone
scanning the README has just seen the install steps and is deciding
their next move. Skippable by anyone who already knows their provider.

The line 'You can still bring your own keys per-tool whenever you
want' is the deliberate honesty rail — Portal is an option, not a
funnel. Existing per-provider language elsewhere in the README is
unchanged.

Mirrored to README.zh-CN.md to keep the two READMEs in sync.
2026-05-23 05:25:46 -07:00
QuenVix 7245bc77eb fix(fallback): merge fallback_providers with legacy fallback_model configurations 2026-05-23 05:24:57 -07:00
Teknium 7f1b2b4569 fix(approval): pin 'silence is not consent' contract on timeout/deny (#24912) (#30879)
User incident (Slack, 2026-05-13): user walked away mid-conversation,
agent requested approval to run `rm -rf .git`, the prompt timed out
after the gateway_timeout (default 300s), and the agent removed the
.git folder on its own. Corroborated by an independent report from a
Telegram user.

The underlying code path was correct — `check_all_command_guards`
returns `approved=False` with a BLOCKED message on both timeout and
explicit deny, and `terminal_tool` surfaces that as `status=blocked`
to the agent. The bug is at the model-interface layer: the message
"BLOCKED: Command timed out. Do NOT retry this command." reads to
some models as "try a different command achieving the same outcome."

This commit changes only the model-facing message + the structured
return shape:

  - Timeout message now explicitly names the three evasion paths the
    agent must avoid: retry, rephrase, AND achieve the same outcome
    via a different command. Ends with "Silence is not consent."
  - Explicit deny gets the same shape minus the silence-is-not-consent
    line (it WAS an explicit deny, not silence).
  - New structured fields on the return dict: `outcome` ("timeout"
    or "denied") and `user_consent` (always False on this branch)
    so plugins, hooks, and audit pipelines don't have to string-parse
    the message to distinguish the two cases.

The mechanism that should already have prevented the original incident
— timeout treated as deny, BLOCKED result, post hook fires with
`choice="timeout"` — is unchanged. This commit hardens only the
agent's reading of the result.

Tests:
  - test_timeout_returns_approved_false_with_no_consent — pins the
    return shape on the Slack-shaped notify_cb-registered path
  - test_timeout_message_is_emphatic_against_retry_and_rephrase —
    pins the exact phrases the message must contain
  - test_explicit_deny_carries_same_no_consent_shape — same contract
    on explicit /deny
  - test_timeout_emits_post_hook_with_timeout_outcome — pins the
    post_approval_response hook payload so audit plugins can act

329 approval tests passing (4 new + 325 existing).

Fixes #24912
2026-05-23 02:59:13 -07:00
Teknium 6855d17753 fix(memory): guard against external drift in MEMORY.md/USER.md (#26045) (#30877)
Reproduction (production, 2026-05-14): two concurrent sessions on the
same agent. Session A patches MEMORY.md directly via the patch tool,
appending ~8KB of structured content (Vendor Master, Standing Orders,
Pin Board) — none of it through the memory tool, so no § delimiters.
Session B starts later with stale in-memory state (1 entry, ~331
chars). Session B calls memory(action=replace) on its one known
entry. The tool's _read_file parses A's content as a single 8KB
'entry' (no § splits), then replace truncates that entry to B's new
333-byte content. ~8KB of structured content silently destroyed.

The atomic-rename write path is fine in isolation. The bug is the
implicit contract: the tool assumes MEMORY.md is exclusively a
§-delimited list of small entries it wrote, but the v0.13 install
runbook itself uses 'cat >> MEMORY.md' for onboarding, the patch tool
edits the file directly, and operators do too.

Fix: a drift guard in MemoryStore._detect_external_drift that fires
on either signal:

  1. Re-parse + re-serialize doesn't produce identical bytes
     (catches oddly-encoded delimiters / partial writes).
  2. Any single parsed entry exceeds the store's whole-file char
     limit. The tool budgets the ENTIRE store against that limit
     (2200 chars for memory, 1375 for user), so no tool-written
     entry can legitimately be larger. An entry bigger than the
     store limit means an external writer dropped free-form content
     into what the tool will treat as one entry.

When drift fires, _reload_target writes a .bak.<ts> snapshot of the
on-disk file, then add/replace/remove refuse to flush. The original
file stays untouched. The error dict surfaces the .bak path AND a
remediation string ('integrate missing entries via memory(add=...)
one at a time, then rewrite the file clean') so the model can act on
it without escalating to the operator.

Tests:
  - test_replace_refuses_on_drift, test_add_refuses_on_drift,
    test_remove_refuses_on_drift — all three mutators refuse
  - test_clean_file_does_not_trigger_drift — false-positive check
  - test_error_message_points_at_remediation — error string shape
  - test_drift_guard_also_protects_user_target — USER.md too
  - test_drift_backup_filename_is_unique_per_invocation — bak.<ts>
    naming pin

144 memory tests passing (was 137; +7).

Fixes #26045
2026-05-23 02:51:29 -07:00
Teknium cc93053b42 fix(xai-oauth): apply WKE disambiguator to recovery-path catch-all (#29344)
_recover_with_credential_pool had a second classification site that blanket-
treated any 403 against xai-oauth as entitlement (defense-in-depth for
#26847).  That override defeated the new _is_entitlement_failure
disambiguator from the parent commit — bad-credentials 403s still
short-circuited the refresh path.

Apply the same WKE-unauthenticated / OAuth2-validation-phrase guard at
the override site so xAI's authoritative 'this is auth, not entitlement'
signal wins there too.  The #26847 catch-all still triggers for genuine
entitlement bodies that don't carry the disambiguator.

Closes the end-to-end gap exposed by
test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403.
2026-05-23 02:48:13 -07:00
xxxigm b5ea6a5c80 test(xai-oauth): regression coverage for the bad-credentials disambiguator (#29344)
Eleven new tests pinning the #29344 fix.  Layout mirrors the existing
"Fix D" entitlement section so the bad-credentials disambiguator
sits alongside the entitlement-block tests it complements.

Classifier-level coverage:

* ``test_is_entitlement_failure_false_for_bad_credentials_wke_suffix``
  — verbatim shape from the reporter's wire capture
  (``{code: 'caller does not have permission', error: 'OAuth2 access
  token could not be validated. [WKE=unauthenticated:bad-credentials]'}``)
  ↦ classifier must return False so the refresh path runs.
* ``test_is_entitlement_failure_false_for_wke_suffix_in_normalized_shape``
  — same body after ``_extract_api_error_context`` has rewritten it
  to ``{reason, message}``.  The disambiguator must fire in BOTH
  shapes; without this guard the production call site at
  ``_recover_with_credential_pool`` (which goes through the
  normalised extractor) would still misclassify.
* ``test_is_entitlement_failure_false_for_any_wke_unauthenticated_variant``
  — parametrised forward-compat: ``bad-credentials``,
  ``expired-token``, ``revoked``, ``some-future-reason``.  xAI
  documents the prefix as stable, the suffix after the colon as a
  reason code that can grow; every variant under
  ``unauthenticated:`` must route to refresh.
* ``test_is_entitlement_failure_false_via_oauth2_validation_phrase_alone``
  — belt-and-braces guard: if a future API revision drops the WKE
  suffix but keeps "OAuth2 access token could not be validated", we
  still classify correctly.
* ``test_is_entitlement_failure_wke_signal_overrides_entitlement_keywords``
  — defensive: if a body ever carries BOTH the WKE suffix and
  entitlement language, the WKE signal wins.  Auth is recoverable;
  entitlement isn't, and a refreshed token will resurface the
  entitlement message on the next request.
* ``test_is_entitlement_failure_case_insensitive_wke_match`` —
  pins that the classifier lowercases the haystack so a future xAI
  build that uppercases the prefix doesn't reintroduce the bug.

Recovery-path coverage (end-to-end through
``_recover_with_credential_pool``):

* ``test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403``
  — the headline test the reporter requested: a bad-credentials 403
  with the exact wire body must call ``try_refresh_current()``
  exactly once and ``_swap_credential`` once.  Pre-fix this returned
  ``(False, _)`` because the entitlement classifier over-matched and
  short-circuited the refresh path.
* ``test_recover_with_credential_pool_still_blocks_real_entitlement``
  — companion regression guard for #26847: a pure unsubscribed-
  account body (no WKE suffix, no OAuth2-validation phrase) must
  still surface as entitlement and skip refresh.  The new
  disambiguator must not weaken the original loop-protection it
  was added to preserve.

The scaffolding reuses ``_make_codex_agent``, ``_FakePool``, and the
existing ``MagicMock`` patterns from the surrounding tests so the
new section reads as a natural extension of "Fix D" rather than a
separate test file.
2026-05-23 02:48:13 -07:00
xxxigm 8b3cb930c9 fix(xai-oauth): honor [WKE=unauthenticated:...] disambiguator in entitlement classifier (#29344)
``_is_entitlement_failure`` over-matched on xAI 403s.  xAI returns the
same permission-denied ``code`` text for two distinct conditions:

  1. Unsubscribed account ("active Grok subscription. Manage at
     https://grok.com" in the ``error`` field).
  2. Stale OAuth access token ("OAuth2 access token could not be
     validated. [WKE=unauthenticated:bad-credentials]" in the ``error``
     field).

The classifier's "does not have permission + grok" substring heuristic
treated both identically, so the credential-pool refresh path was
short-circuited for case (2) — long-running TUI sessions stuck on a
stale OAuth token surfaced a non-retryable client error and the user
had to exit + reopen the TUI to recover (the startup-resolve path
bypasses the classifier entirely, which is why bridge adapters with
proactive refresh cadences didn't see this in practice).

This patch adopts the reporter's recommended fix (option 1, tightest):
honor xAI's explicit ``[WKE=unauthenticated:...]`` suffix and the
``OAuth2 access token could not be validated`` phrasing as
authoritative "this is auth, not entitlement" signals.  When either
appears anywhere in the body's text fields, the classifier returns
False eagerly — *before* the entitlement keyword checks run — so the
refresh-on-401 path takes over and the existing loop-protection still
guards against runaway refresh storms if the refresh itself fails.

Two small adjustments fall out of this:

* The haystack now also covers ``code`` and ``error`` keys directly,
  not just the ``message``/``reason`` shape ``_extract_api_error_context``
  produces.  Real runtime paths use the normalised shape, but the test
  suite and any future call sites that pass raw bodies get the same
  treatment.  Backwards compatible: missing keys default to empty
  strings, the haystack still skips when everything is blank.

* Both disambiguator checks fire BEFORE the entitlement keyword
  checks.  If a future xAI body somehow lands with both an entitlement
  message AND the WKE suffix, the WKE suffix wins (correct — auth is
  recoverable; entitlement is not, and a refreshed token will surface
  the entitlement message on the next request anyway).

Existing tests (``test_is_entitlement_failure_matches_real_xai_bodies``,
``test_is_entitlement_failure_false_for_unrelated_auth_errors``,
``test_recover_with_credential_pool_skips_refresh_on_entitlement_403``,
``test_recover_with_credential_pool_still_refreshes_genuine_auth_failure``)
continue to pass unchanged — the unsubscribed-account path, the
generic auth-error path, and the refresh-on-401 path are all left
intact.
2026-05-23 02:48:13 -07:00
Teknium 64b3eb0dd7 docs: surface Nous Portal on pages where it solves a real problem the page describes (#30874)
Follow-up to #30869. Adds Portal mentions on user-facing pages that
naturally call for an LLM + tool credentials but didn't previously
acknowledge Portal as a one-stop option.

- getting-started/installation.md: tip after the 'after install' block
  pointing at 'hermes setup --portal' for users who want everything wired
  at once instead of piecewise via 'hermes model' + 'hermes tools'.
- user-guide/configuring-models.md: small tip near the top — the page is
  literally about provider/model choice and previously had zero Portal
  mention.
- user-guide/features/voice-mode.md: Prerequisites need both an LLM and
  TTS — a Portal subscription is the single setup that covers both.
- user-guide/features/batch-processing.md: highlights Portal as a
  predictable-cost option for parallel agent runs that hit many APIs.
- user-guide/features/api-server.md: backend needs models + tools; one
  Portal sub gives a fully-equipped OpenAI-compatible endpoint.
- user-guide/windows-native.md: early-beta users on Windows benefit most
  from skipping per-tool Windows-key-juggling.
- integrations/providers.md: updates the existing Tool Gateway tip and
  the Nous Portal section to mention the new commands.
- user-guide/features/fallback-providers.md: Nous row in the provider
  table now lists 'hermes setup --portal' as the fresh-install path.

Tone discipline: one Portal mention per page, concrete CLI commands
(no marketing copy), always solving a problem the page itself sets up.
2026-05-23 02:47:53 -07:00
Teknium f3fb7899d0 docs: surface 'hermes setup --portal' and 'hermes portal' across user-facing pages (#30869)
PR #30860 added a one-shot Portal setup command and a small portal CLI
surface. Update the docs so the new commands are discoverable without
upgrading the tone of existing Portal mentions.

- getting-started/quickstart.md: small tip near Choose a Provider
  pointing at 'hermes setup --portal' as the easiest fresh-install path.
- user-guide/features/tool-gateway.md: lead the Get-Started section
  with 'hermes setup --portal' for fresh installs, keep 'hermes model'
  for already-configured users, and add 'hermes portal status / tools'
  to the activity-check commands.
- user-guide/features/{web-search,image-generation,tts,browser}.md: the
  existing 'Nous Subscribers' tip blocks now name the one-shot command
  for new installs, keeping the existing 'hermes tools' path for users
  who only want to swap a single backend.
- reference/cli-commands.md: register 'hermes portal' in the top-level
  command table, add a 'hermes portal' section with subcommands, and
  add '--portal' to the 'hermes setup' options table.

Tone: each page already had a Portal mention. This PR keeps the per-page
count to one and uses concrete CLI commands rather than promotional copy.
Tool Gateway page is the one exception (the whole doc is about Portal).
2026-05-23 02:42:31 -07:00
Teknium 9acf949e34 feat(telegram): edit status messages in place instead of appending (#30864)
Closes #30045. Based on @qike-ms's PR #30141.

Telegram status callbacks (lifecycle, compression, context-pressure)
used to append a fresh bubble on every emit. Now adapter tracks
{(chat_id, status_key) -> message_id}; first call sends, subsequent
calls edit. Failed edits drop the cache entry and fall through to a
fresh send.

- gateway/platforms/telegram.py: send_or_update_status() (+34 LOC)
- gateway/run.py: route _status_callback_sync through it when the
  adapter supports it; plain adapter.send() otherwise (+15 LOC)
- 5 tests covering first send / edit-in-place / edit-failure fallback
  / distinct key & chat isolation
2026-05-23 02:42:10 -07:00
Teknium 4b6d68bd64 test(fast-command): stub _load_gateway_runtime_config too
PR 2362cc468 ("fix(gateway): enforce env variable template expansion
on runtime config loaders") refactored `_load_service_tier` to read
config via the new `_load_gateway_runtime_config` wrapper instead of
opening `_hermes_home/config.yaml` directly. The
`test_run_agent_passes_priority_processing_to_gateway_agent` test still
only stubbed `_load_gateway_config` (the inner loader), so the runtime
wrapper saw an empty config and `_load_service_tier` returned None,
breaking the test:

  FAILED tests/gateway/test_fast_command.py::test_run_agent_passes_priority_processing_to_gateway_agent
   - AssertionError: assert None == 'priority'

Fix: also stub `_load_gateway_runtime_config` to return the expected
`agent.service_tier=fast` config, so the test once again drives the
priority routing path it was written to verify.

Confirmed reproducing on current main before the patch and passing
after.
2026-05-23 02:40:33 -07:00
Zyrixtrex 61ac118724 fix(webhook): enforce INSECURE_NO_AUTH safety rail on dynamic route reloads 2026-05-23 02:39:12 -07:00
Teknium b4cf5b65dd feat(portal): one-shot setup, status CLI, and Nous-included markers (#30860)
* feat(portal): one-shot setup, status CLI, and Nous-included markers

Four small Portal-aware surfaces that drive subscription value without
adding friction for non-Portal users.

  - hermes setup --portal: one-shot Nous OAuth + provider switch + Tool
    Gateway opt-in. Shareable as a single command from docs/social.
  - hermes portal {status,open,tools}: small surface over Portal auth +
    Tool Gateway routing. Defaults to 'status' when no subcommand.
  - Tool picker (hermes tools): when the user is logged into Nous, mark
    Nous-managed provider rows with a star and 'Included with your Nous
    subscription'. Suppressed when not authed — non-subscribers see the
    picker unchanged.
  - BYOK setup hint: a single dim line 'Available through Nous Portal
    subscription.' appears when the user is being prompted for a paid
    API key (Firecrawl, FAL, ElevenLabs, Browserbase, etc.) AND the
    category has a Nous-managed sibling AND the user is not already
    authed to Nous. Suppressed in all other cases.

Tested live end-to-end in an isolated HERMES_HOME with a simulated
authed and unauthed user. Targeted suite (tests/hermes_cli/
test_tools_config.py + test_setup.py) passes 97/97.

* fix: add portal to _BUILTIN_SUBCOMMANDS so plugin discovery fast-path skips it
2026-05-23 02:39:09 -07:00
Teknium 6942b1836e fix(skills_guard): explain why --force is rejected on dangerous verdicts
Follow-up to @sprmn24's verdict-logic fix. The previous block-message
ended in 'Use --force to override' regardless of verdict — but as of
the --force fix above, dangerous community/trusted skills can't be
overridden by --force at all. The misleading hint sends users in a
loop. Replace it with a specific message that tells them what the
documented behavior actually is.

Adds two regression tests covering the dangerous-verdict message
shape and one that pins the existing --force hint for non-dangerous
blocks.
2026-05-23 02:37:30 -07:00
sprmn24 789043b691 fix(security): update tests for verdict and --force changes 2026-05-23 02:37:30 -07:00
sprmn24 0f8215f633 fix(security): correct verdict logic and enforce --force limitation in skills_guard
- _determine_verdict() returned 'caution' for medium/low-only findings,
  causing community skills with harmless patterns (e.g. path traversal
  notation, unpinned pip install) to be incorrectly blocked. Now returns
  'safe' when only medium/low severity findings are present.

- should_allow_install() allowed --force to override 'dangerous' verdict,
  contradicting documented behavior that --force does NOT override dangerous
  scan results. Added explicit check to prevent force-installing skills
  with dangerous verdict.
2026-05-23 02:37:30 -07:00
Teknium db489a315f fix(tests): allowlist tmp_path for kanban_notify artifact delivery (#30852)
`_deliver_kanban_artifacts` routes candidates through
`BasePlatformAdapter.filter_local_delivery_paths` (added in 41d2c758c),
which rejects paths outside `MEDIA_DELIVERY_SAFE_ROOTS`. The two
artifact-delivery tests create fixtures under `tmp_path`, which lives
outside the cache roots — so under CI's hermetic HOME the filter
silently dropped both fake files and the assertions on
`images_uploaded` / `documents_uploaded` failed.

Fix: monkeypatch `HERMES_MEDIA_ALLOW_DIRS=str(tmp_path)` in both tests
so the safety filter accepts the fixtures. Production behaviour
unchanged; test-side fix only.

CI fail repro on origin/main: test (6) shard, both
test_notifier_uploads_artifacts_on_completion and
test_notifier_artifact_delivery_skips_missing_files.
2026-05-23 02:34:34 -07:00
xxxigm 5b6f0b695b test(tls-fd-recycle): pin shutdown-only + thread-aware close contract (#29507)
Ten regressions across both prongs of the #29507 fix, organised so each
test names exactly which way the bug could come back:

Prong 1 — ``force_close_tcp_sockets``:
* ``shutdown_only_no_close`` is the smoking-gun assertion. If a future
  refactor adds back ``sock.close()`` to this helper, the FD-recycling
  race that wrote TLS bytes on top of ``kanban.db`` is back, and this
  trips.
* ``uses_shut_rdwr`` pins that both halves are shut down (a half-close
  wouldn't unblock a worker stuck in ``recv``).
* ``swallows_oserror_on_shutdown`` covers the already-shutdown case.
* ``handles_multiple_pool_entries`` walks all pool connections.

Prong 2 — thread-aware ``_close_request_client_once``:
* ``stranger_thread_aborts_only_no_close`` simulates the asyncio_0 →
  Thread-1616 interrupt path: stranger drives abort, holder stays
  populated for the worker's eventual finally.
* ``owner_thread_pops_and_full_close`` is the worker-thread path: pops
  + full close.
* ``stranger_then_owner_close_sequence_runs_full_close_exactly_once``
  replays the reporter's exact timeline at object level: abort runs
  once, full close runs once, holder ends empty.

Agent surface:
* ``_abort_request_openai_client_does_not_call_client_close`` pins
  that the new entrypoint shuts sockets and emits the
  ``deferred_close=stranger_thread`` marker but never calls
  ``client.close()``.
* ``_abort_request_openai_client_null_client_is_noop`` defensive.

End-to-end:
* ``fd_recycle_window_closed_by_shutdown_only`` reproduces the race
  at object level — runs the abort path from a stranger thread and
  asserts that no ``close()`` ever fires, so the kernel can never
  recycle the FD under the owner's still-active reference.
2026-05-23 02:31:10 -07:00
xxxigm 30c22f1158 fix(api-call): defer client.close() to owning worker thread on interrupt (#29507)
Layer-2 defense for the FD-recycling race: even with
``force_close_tcp_sockets`` reduced to shutdown-only, the followup
``client.close()`` in ``_close_openai_client`` still walks the httpx
pool and closes sockets — and if called from a stranger thread (the
interrupt-check loop, the stale-call detector) it has the same
FD-recycling exposure that wrote a TLS record on top of ``kanban.db``.

Stamp the request_client_holder with the owning thread's ident at
``_set_request_client`` time. In ``_close_request_client_once``:

* Owning thread (the worker's ``finally``) → pop + ``client.close()``
  via ``_close_request_openai_client``, exactly as before.
* Stranger thread → ``_abort_request_openai_client`` (new): only
  ``shutdown(SHUT_RDWR)`` the pool sockets and log a deferred-close
  marker. The holder stays populated so the worker's eventual
  ``finally`` performs the real close from its own thread context,
  where the FD release races nothing.

Applied symmetrically to both the non-streaming
``interruptible_api_call`` and the streaming variant — both routinely
get hit by stranger-thread interrupts.

The log field ``tcp_force_closed=N`` keeps its existing shape; the new
abort path adds ``deferred_close=stranger_thread`` so production
triage can distinguish the two close kinds.
2026-05-23 02:31:10 -07:00
xxxigm e2a7d73a66 fix(force_close_tcp_sockets): shutdown only, do not release FD (#29507)
The helper used to call ``socket.shutdown(SHUT_RDWR)`` followed by
``socket.close()`` to drop CLOSE-WAIT entries immediately. On its own
``shutdown()`` is safe from any thread — it only sends FIN and breaks
pending ``recv``/``send`` — but ``close()`` releases the FD integer to
the kernel. When the helper runs on a stranger thread (the interrupt
loop, the stale-call detector) the FD release races the owning httpx
worker thread that still has the same integer cached inside the SSL
BIO. The kernel then recycles that integer to the next ``open()`` call
— in production, kanban dispatcher's ``kanban.db`` — and the worker's
delayed TLS flush writes a 24-byte TLS application-data record on top
of the SQLite header.

Restrict the helper to ``shutdown(SHUT_RDWR)`` only. The owning httpx
worker's own unwind will close the underlying socket via the same
Python ``socket.socket`` object, which atomically swaps ``_fd`` to -1
before issuing ``close(2)`` — no FD-aliasing window.

The log field ``tcp_force_closed=N`` is kept (now counts shutdowns) so
existing dashboards / log parsers keep working.
2026-05-23 02:31:10 -07:00
sprmn24 53cb6d32be fix(agent): use atomic_json_write for request debug dumps instead of bare write_text 2026-05-23 02:30:57 -07:00
sprmn24 b183be95a2 fix(gateway-windows): atomic write for .cmd and startup launcher scripts 2026-05-23 02:30:41 -07:00
walli 60b0a0e006 fix(qqbot): fix SILK magic byte detection slice length
_guess_ext_from_data: data[:5] == b"#!SILK" -> data[:6] (6-byte string)
_looks_like_silk: data[:4] == b"#!SILK" -> data[:6]

The previous slices were too short to ever match the 6-byte "#!SILK"
literal, relying entirely on the "#!SILK_V3" (9-byte) and 0x02! (2-byte)
fallback paths for SILK format detection.
2026-05-23 02:27:17 -07:00
walli 0e7448d63a fix(qqbot): use original attachment filename for cached files
Add original_name parameter to _download_and_cache, preferring the
attachment metadata filename over the CDN URL path basename. Previously
files were cached with meaningless QQ CDN hash names (e.g.
qqdownload_...oadftnv5), causing ugly filenames when sent back to users.

Aligns with qqbot-agent-sdk's AttachmentDownloader.download_document.
2026-05-23 02:27:17 -07:00
walli a54f5afc70 fix(qqbot): handle op 7/9 and expand fatal close code set
1. Handle op 7 (Server Reconnect): close WS to trigger reconnect loop
   while preserving session for Resume
2. Handle op 9 (Invalid Session): check d value to determine if session
   is resumable; clear session only when not resumable
3. Remove 4009 from session-clearing set (connection timeout is resumable)
4. Expand fatal close codes: 4001/4002/4010-4014 now stop reconnect
   immediately instead of retrying uselessly
5. Add unit tests
2026-05-23 02:27:17 -07:00
walli bbd77d165c fix(qqbot): add INTERACTION intent and expose video/file cached paths
1. Add INTERACTION intent bit (1<<26) to _send_identify, fixing approval
   button clicks not being received (INTERACTION_CREATE events were never
   dispatched by the gateway)
2. Include local cached path in video/file attachment descriptions so the
   LLM can reference files for re-sending to users
3. Add unit tests (TestIdentifyIntents, TestProcessAttachmentsPathExposure)
2026-05-23 02:27:17 -07:00
teknium1 66d81f9e14 fix(gateway): don't swallow expansion errors in runtime config helper
A bare except in _load_gateway_runtime_config would silently return the
unexpanded dict on any _expand_env_vars failure — masking the very bug
this helper exists to fix. Drop it; let the caller see real errors.
2026-05-23 02:27:08 -07:00
QuenVix 2362cc4688 fix(gateway): enforce env variable template expansion on runtime config loaders 2026-05-23 02:27:08 -07:00
QuenVix d21ac579e9 fix(gateway): honor key_env in auth-failure fallback resolution 2026-05-23 02:25:53 -07:00
Teknium 99671a8634 test(kanban): allow tmp_path artifacts past media-delivery validator
PR #41d2c758c ("Fix unsafe gateway media path delivery") tightened
`validate_media_delivery_path` so that artifacts emitted by the agent
must live inside `MEDIA_DELIVERY_SAFE_ROOTS` (Hermes-managed cache
dirs) or an operator-allowlisted root via `HERMES_MEDIA_ALLOW_DIRS`.

Two kanban-notifier tests put their PDFs and PNGs under pytest's
`tmp_path`, which is correctly rejected by the new validator. They
started failing on main as soon as that PR landed:

  FAILED tests/hermes_cli/test_kanban_notify.py::test_notifier_uploads_artifacts_on_completion
  FAILED tests/hermes_cli/test_kanban_notify.py::test_notifier_artifact_delivery_skips_missing_files

Symptom in logs: "Skipping unsafe local file path outside allowed
roots". The validator is doing exactly what it should — the tests were
relying on the looser pre-fix behaviour.

Fix: add `HERMES_MEDIA_ALLOW_DIRS=tmp_path` to the `kanban_home`
fixture so artifacts under `tmp_path` are recognised as safe. This is
the same allowlist mechanism the operator-facing env var documents.
2026-05-23 02:25:09 -07:00
Teknium 5772e638c9 chore: drop in-repo infographic/ directory; keep PR-body URLs only (#30854)
PR infographics belong in PR descriptions, not committed to the repo.
Removes the 13 archived directories under infographic/ and adds the path
to .gitignore so future generations don't accidentally land in-tree.

The fal.media URLs embedded in each PR's body remain the canonical
artifact — those PR descriptions are the storage.
2026-05-23 02:25:03 -07:00
sprmn24 b2e6fdd3bf fix(agent): log warning when fallback model normalization fails instead of silently swallowing 2026-05-23 02:23:24 -07:00
teknium1 70aaa774be fix(opencode-go): emit Kimi reasoning_effort, match KimiProfile shape
The Kimi K2 branch added in the prior commit only emitted extra_body.thinking
and dropped reasoning_effort entirely. KimiProfile (api.moonshot.ai/v1) sends
both fields, and OpenCode Go proxies to the same Moonshot backend. Mirror that
shape on the Go path so /reasoning effort actually reaches Kimi.

- low/medium/high pass through verbatim
- xhigh/max clamp to high (Moonshot's max supported value)
- minimal / unknown effort → omit reasoning_effort, keep thinking on
- disabled / no config → unchanged
- DeepSeek branch unchanged
2026-05-23 02:20:28 -07:00
Harish Kukreja 3589960e03 fix(provider): expose OpenCode Go reasoning controls 2026-05-23 02:20:28 -07:00
helix4u 71291d83cd test: keep tirith checks hermetic 2026-05-23 02:20:14 -07:00
QuenVix 52a368fa72 fix(gateway): preserve WhatsApp pairing approvals across JID/LID alias flips 2026-05-23 01:46:34 -07:00
Teknium 3127a41cb1 test(acp): pin parse_model_input in slash-command tests
The two ACP slash-command tests that exercise `provider:model` routing
(`test_set_session_model_accepts_provider_prefixed_choice` and
`test_model_switch_uses_requested_provider`) relied on the live
`hermes_cli.models._KNOWN_PROVIDER_NAMES` / `_PROVIDER_ALIASES` module
state to parse `anthropic:claude-sonnet-4-6` into
`("anthropic", "claude-sonnet-4-6")`. If any earlier test in the same
xdist worker registers a custom provider that shadows `anthropic` or
otherwise mutates those globals, the parser falls into the
`detect_provider_for_model` branch and resolves to `custom` instead.

Observed once in CI on run 26326728502 / job 77505732299 as
`AssertionError: assert 'custom' == 'anthropic'` — could not reproduce
locally under per-file isolation, so the failing in-file order was
specific to a particular xdist scheduling.

Monkeypatching `parse_model_input` + `detect_provider_for_model` for
both tests removes the global-catalog dependency, so the tests now only
exercise what they were written to verify (the `requested_provider ->
runtime -> AIAgent kwargs` plumbing).
2026-05-23 01:44:56 -07:00
xxxigm 6a2df9f451 docs(env): clarify HERMES_ENABLE_PROJECT_PLUGINS contract (#29156)
The reference entry now documents the truthy set
(``1`` / ``true`` / ``yes`` / ``on``) explicitly, matches the
falsy half (``0`` / ``false`` / ``no`` / ``off`` / empty string)
that the GHSA-5qr3-c538-wm9j fix re-aligned both the agent loader
and the dashboard web server around, and points readers at the
defence-in-depth rule that project plugins never have their
Python ``api`` file auto-imported by the dashboard regardless of
the env var.
2026-05-23 01:43:52 -07:00
xxxigm 8bf99227f0 fix(plugins): block plugin-api path traversal + project RCE (#29156)
GHSA-5qr3-c538-wm9j — half two of the bypass chain.

``_mount_plugin_api_routes`` imports each dashboard plugin's
manifest ``api`` field as a Python module via
``importlib.util.spec_from_file_location`` — arbitrary code
execution by design.  Two primitives in the surrounding code
turned that "by design" RCE into a usable attack:

1. Absolute paths in the manifest swallow the plugin directory.
   ``Path('safe/dashboard') / '/tmp/evil.py'`` resolves to
   ``/tmp/evil.py``, so a single manifest line
   ``{"api": "/tmp/payload.py"}`` was enough to redirect the
   importer at any Python file on disk.
2. ``..`` traversal in the manifest climbs out of the dashboard
   directory.  ``Path('plugins/safe/dashboard') /
   '../../../tmp/evil.py'`` lands in ``/tmp/evil.py`` after
   ``resolve()`` — the static-asset handler
   (``serve_plugin_asset``) already defends against this via
   ``is_relative_to``; the api-mount path didn't.

Fix at three layers so a regression in any one can't re-open the
advisory:

* New ``_safe_plugin_api_relpath`` validator runs at *discovery*
  time and stores only sanitised relative paths on the plugin
  entry's ``_api_file`` field.  Absolute paths, ``..`` traversal,
  empty / non-string values, and paths that ``resolve()`` outside
  the plugin's ``dashboard/`` directory are rejected with a
  warning naming the plugin.  ``has_api`` follows the sanitised
  value so the dashboard frontend doesn't render a fake "Backend
  API" badge for plugins whose api was scrubbed.
* ``_mount_plugin_api_routes`` re-validates the resolved path
  against the live filesystem just before the import — defence in
  depth in case ``_dir`` is tampered with post-cache or a future
  caller bypasses the discovery-time validator.
* Project plugins (``source == "project"``) are refused outright
  for backend import.  ``./.hermes/plugins/`` ships with the CWD,
  so any threat model that includes "user opens a malicious repo"
  treats it as attacker-controlled; project plugins can still
  extend the UI via static JS/CSS but their Python ``api`` is no
  longer auto-imported.  Combined with the truthy env-gate fix
  from the previous commit, the original advisory chain now
  fails at two distinct choke points.
2026-05-23 01:43:52 -07:00
xxxigm da636e982b test(plugins): regression coverage for project-plugin RCE chain (#29156)
35 new tests across 5 classes covering every layer of the
GHSA-5qr3-c538-wm9j defence.  Each class corresponds to one chokepoint
so a regression in any single layer is caught by the named class:

* ``TestProjectPluginsEnvGate`` (13 cases) — parametrised over both
  the documented truthy values (``1`` / ``true`` / ``yes`` / ``on``
  + uppercase variants) and the previously-bypassing falsy strings
  (``0`` / ``false`` / ``no`` / ``off`` / ``""`` / ``False``).  The
  falsy half is the direct env-bypass repro: pre-fix any non-empty
  string enabled the project source.
* ``TestApiPathSanitizer`` (16 cases) — unit-level coverage of the
  new ``_safe_plugin_api_relpath`` helper.  Absolute paths
  (``/etc/passwd``, ``/tmp/payload.py``, ``/usr/bin/python``),
  ``..``-traversal payloads (including nested ``subdir/../../..``),
  and non-string / empty / whitespace-only values must all return
  ``None``.  Safe relative paths (``api.py``, ``backend/routes.py``)
  round-trip unchanged so legitimate plugins keep working.
* ``TestDiscoveryScrubsApiField`` (3 cases) — end-to-end through
  ``_discover_dashboard_plugins`` with a real manifest on disk.
  Verifies that the cached plugin entry's ``_api_file`` is
  scrubbed *at discovery time* (``None`` + ``has_api: False``) so
  any downstream consumer can't be tricked into re-deriving the
  unsafe path from cache.
* ``TestMountApiRoutesRefusesUntrusted`` (3 cases) — pokes
  synthetic plugin entries with each refusal vector directly into
  the cache and patches ``importlib.util.spec_from_file_location``
  to assert it is *not* invoked for project-source / traversal
  payloads, and *is* invoked normally for bundled / user plugins.
* ``TestEndToEndPocBlocked`` (1 case) — reproduces the original
  advisory PoC: operator sets ``HERMES_ENABLE_PROJECT_PLUGINS=0``
  believing project plugins are off, attacker plants a manifest in
  CWD's ``.hermes/plugins/`` with ``api`` pointing at an absolute
  payload path.  Asserts that the importer is never called against
  the payload path *and* that ``hermes_dashboard_plugin_evil`` is
  not in ``sys.modules`` after the mount routine runs.

An autouse fixture busts ``_dashboard_plugins_cache`` before and
after each test so the production cache (populated by the
import-time ``_mount_plugin_api_routes()`` call) can't bleed in.
All 12 pre-existing dashboard-plugin tests in
``test_web_server.py`` still pass unchanged.
2026-05-23 01:43:52 -07:00
xxxigm 09f85f2cf7 fix(plugins): apply truthy env semantics to project-plugin gate (#29156)
GHSA-5qr3-c538-wm9j — half one of the bypass chain.

``_discover_dashboard_plugins`` opted into the untrusted ``./.hermes/
plugins/`` source via ``if os.environ.get("HERMES_ENABLE_PROJECT_
PLUGINS"):`` — which is True for any non-empty string.  ``=0``,
``=false``, ``=no``, ``=off`` all return non-empty strings and so
*enabled* the project source even though every operator (and the
agent loader, ``hermes_cli/plugins.py`` line 815) reads those values
as "disabled".  An attacker who can land a manifest under the CWD's
``.hermes/plugins/`` directory — a malicious cloned repo, a worktree
checked out from a forked PR, a CI runner workspace — was therefore
guaranteed to get their manifest discovered the moment the user ran
``hermes dashboard`` from that directory, regardless of whether the
user thought they had project plugins disabled.

Switch to the shared ``utils.env_var_enabled`` helper used by the
agent loader so the gate accepts the documented truthy set (``1`` /
``true`` / ``yes`` / ``on``, case-insensitive) and treats everything
else — including ``0`` / ``false`` / ``no`` — as off.

Half two (path-traversal + project-source ``api`` import) lands in
the next commit.  Together they break the RCE chain at two distinct
choke points so a future regression in either one alone can't
re-open the advisory.
2026-05-23 01:43:52 -07:00
Teknium 11e6dd3c60 chore(release): add AUTHOR_MAP entry for egilewski (PR #30432) (#30833) 2026-05-23 01:41:31 -07:00
Eugeniusz Gilewski 41d2c758c3 Fix unsafe gateway media path delivery 2026-05-23 01:40:35 -07:00
Markus 4a91e36495 fix(gateway): separate observed Telegram group context 2026-05-23 01:33:42 -07:00
Teknium 729a778af0 infographic: PR #17659 read-deny credentials salvage 2026-05-22 20:15:09 -07:00
Teknium 97e975edd2 fix(file-safety): widen read-deny to .env, mcp-tokens/, webhook secrets, root
Extends @briandevans's PR #17659 from {auth.json, auth.lock,
.anthropic_oauth.json} to also cover:

  - HERMES_HOME/.env                       (provider API keys)
  - HERMES_HOME/webhook_subscriptions.json (per-route HMAC secrets)
  - HERMES_HOME/mcp-tokens/                (OAuth token directory; dir
                                            + everything inside)

…AND iterates over both _hermes_home_path() AND _hermes_root_path()
so profile-mode runs (HERMES_HOME = <root>/profiles/<name>) also block
<root>/{auth.json, .env, mcp-tokens/, ...}. Same widening shape as the
write-deny side already does (#15981, #14157).

Explicitly NOT a security boundary. Per the personal-assistant trust
model, the terminal tool runs as the same OS user and can `cat
auth.json` directly. This read-deny exists as defense-in-depth:

  - Models that respect tool denials empirically tend to stop rather
    than reach for the shell.
  - The denial surfaces an audit trail when something tries to read
    credentials — easier to spot in logs than a generic `cat`.

Docstring + error message both flag this as defense-in-depth so future
contributors don't mistake it for a real security boundary and don't
re-decline reports that propose the same fix shape.

Absorbs the .env and mcp-tokens/ coverage from @tomqiaozc's parallel
PR #8055 (closed-as-duplicate, credited).

Co-authored-by: Tom Qiao <zqiao@microsoft.com>
2026-05-22 20:15:09 -07:00
briandevans 567ea61298 fix(file-safety): block auth.json read via TERMINAL_CWD relative path
read_file_tool resolves relative paths against TERMINAL_CWD (or the
task's live terminal cwd), but the prior call passed the original
unresolved string to get_read_block_error. That function's own
resolve() is anchored at the Python process cwd, so when a task's
TERMINAL_CWD pointed at HERMES_HOME and the agent issued read_file
on the relative path "auth.json", the credential-store denylist was
never reached and the file was read normally.

Pass the already-resolved absolute path string at the file_tools call
site, document the contract on get_read_block_error, and add a
read_file_tool-level regression test that pins the relative-path
case under TERMINAL_CWD == HERMES_HOME.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 20:15:09 -07:00
briandevans 056e00a77e fix(file-safety): block read_file on HERMES_HOME credential stores (#17656)
`get_read_block_error` previously only denied reads inside
`${HERMES_HOME}/skills/.hub`, which left `auth.json` (provider OAuth
state + plaintext API keys) and `.anthropic_oauth.json` (Anthropic PKCE
tokens) directly readable by the agent. A prompt-injection reaching
`read_file` could exfiltrate active provider credentials in plaintext.

Mode-0600 file permissions only protect against *other Unix users* —
the agent runs as the file's owner, so `read_file` is unaffected.

Extend the existing deny list with the three credential paths
identified in #17656 (`auth.json`, `auth.lock`, `.anthropic_oauth.json`).
The check uses the same `Path.resolve()` pattern as `skills/.hub`, so
symlink/path-traversal indirection is caught too. The agent doesn't
need to read these directly — `auxiliary_client` and `credential_pool`
consume them through process env / OAuth flows that bypass `read_file`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 20:15:09 -07:00
Teknium 7f7245bf62 infographic: PR #6656 skill hub safety audit salvage 2026-05-22 19:59:24 -07:00
Teknium 3f78d8073c fix(skills): make content_hash filename-sensitive too (symmetric with bundle_content_hash)
PR #6656 added rel_path + \x00 prefixing to ``bundle_content_hash`` so a
filename swap between two files in a bundle changes the digest. But it
only patched the in-memory side — ``content_hash`` in ``tools/skills_guard.py``
(the on-disk equivalent) still hashed file contents only.

These two functions need to stay symmetric: ``check_for_skill_updates``
compares the disk hash of an installed skill against the bundle hash
of the upstream copy. With the asymmetric fix, every clean install
showed as drifted because the digests no longer matched
(2 existing tests in ``test_skills_hub.py`` started failing as soon as
the contributor's change landed).

Apply the same ``rel_path + \x00 + content`` shape to the disk-side
function. Both functions now produce the same digest for the same skill
content laid out two ways. Documented the symmetry invariant in the
docstring so a future change to either function knows to touch both.

Also adds tests/tools/test_pr_6656_regressions.py with 10 regression
tests covering all three fixes salvaged in PR #6656:
  - uninstall_skill path traversal (4 cases: parent segments, absolute
    paths, symlink escape, legitimate skill)
  - bundle_content_hash filename swap detection (4 cases: in-memory
    swap, identity, disk-side swap, bundle↔disk symmetry)
  - list_pending lock contract (2 cases: source-grep contract, smoke)

Also fixes AUTHOR_MAP entry for @aaronlab — their commit email
(1115117931@qq.com) maps to "aaronagent" which isn't a real GitHub
login, so changelog @mentions would 404.
2026-05-22 19:59:24 -07:00
aaronagent b82608a6f5 fix(skills,pairing): path traversal guard in uninstall, lock list_pending, hash file paths
- skills_hub: validate that uninstall_skill's install_path resolves
  inside SKILLS_DIR before calling shutil.rmtree, preventing recursive
  deletion of arbitrary directories via poisoned lock.json entries
- skills_hub: include file paths (not just contents) in
  bundle_content_hash so swapping filenames between files changes the
  hash, strengthening update-detection integrity
- pairing: wrap list_pending() in self._lock so _cleanup_expired() file
  writes don't race with concurrent generate_code()/approve_code() calls

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-22 19:59:24 -07:00
teknium1 8cf977c8b1 fix(plugins): widen _sanitize_plugin_name for category-namespaced names
Follow-up to PR #28832 — the dashboard plugin routes now accept slashed
names like `observability/langfuse` and `image_gen/openai`, but
`_sanitize_plugin_name` still rejected forward slash and so dashboard
update + remove on those plugins fell through to '404 not found' even
though they exist on disk.

Adds an opt-in `allow_subdir=True` flag that:
- Permits internal forward slashes (category-namespaced plugin keys
  emitted by `_discover_all_plugins`).
- Strips leading and trailing slashes.
- Still rejects `..` and backslash, and still asserts the resolved
  target lives inside `plugins_dir`.

Opted in at the two read-paths that operate on installed plugins:
`_require_installed_plugin` (CLI update/remove) and
`_user_installed_plugin_dir` (dashboard update/remove). The install
path keeps the default (`allow_subdir=False`) because freshly-cloned
plugins always land top-level under `~/.hermes/plugins/<name>/`.

Adds 6 targeted unit tests covering the new flag's allow/reject matrix.
2026-05-22 19:50:32 -07:00
Austin Pickett 487c398dcf refactor(web): dashboard typography & contrast pass
Removes the global `uppercase` + `font-mondwest` from the App.tsx root
that forced every page to opt-out, replaces stacked-alpha text colors
with semantic tokens for WCAG-AA contrast across all 7 themes, and
applies the new `text-display` utility from @nous-research/ui@0.16.0
on intentional brand chrome (page titles, sidebar headings, segmented
filters) only. Bumps every sub-12px arbitrary text size to text-xs.

Also widens the dashboard plugin routes (/api/dashboard/agent-plugins/
{name:path}/...) so category-namespaced plugins like observability/
langfuse and image_gen/openai can be enable/disabled from the dashboard
— previously the FE encodeURIComponent-ed the slash and the backend
{name} route rejected it. _validate_plugin_name still blocks .. and
backslash, and strips leading/trailing slash.

Touches sessions/env/keys page chrome and adds two new i18n keys
(`overview`, `showMore`/`showLess`) across all 18 locales.

Squashes 19 commits from PR #28832.

Co-authored-by: Hermes <noreply@nousresearch.com>
2026-05-22 19:50:32 -07:00
ethernet dc4b0465b5 feat(ci): use 6-way slicing based on benchmark results
Benchmarked 4/5/6/7/8 slices with LPT duration-balanced distribution:
- 4 slices: 4.8m wall, 135s spread
- 5 slices: 3.4m wall, 46s spread
- 6 slices: 3.3m wall, 26s spread ← optimal
- 7 slices: 3.9m wall, 109s spread
- 8 slices: 3.7m wall, 96s spread

6 slices is the sweet spot: lowest wall time, tightest spread.
7+ gets slower due to per-slice startup overhead dominating.

Also removes benchmark branch markers from save-durations condition.
2026-05-22 19:46:18 -07:00
ethernet e7cb5d4b68 fix: clean push triggers 2026-05-22 19:46:18 -07:00
ethernet f89afdbd17 fix(test): deflake two intermittent CI failures
- test_browser_secret_exfil: mock _run_browser_command instead of
  launching real Chrome (secret check is pre-launch, browser is
  irrelevant to the assertion)
- test_web_server: add time.sleep(0.05) after pub.send_text() to
  yield the event loop before receive_text(). TestClient's sync mode
  can race the broadcast handler otherwise, hanging the test.
2026-05-22 19:46:18 -07:00
ethernet 510df6eaf4 test: 4-way slice benchmark (with cache save) 2026-05-22 19:46:18 -07:00
ethernet b689624aee feat(ci): 4-way matrix slicing with LPT duration-balanced distribution
run_tests_parallel.py:
  - --slice I/N flag (also HERMES_TEST_SLICE env var) runs only the
    I-th slice of N, distributing files across slices by cached
    duration using LPT (Longest Processing Time first) greedy
    algorithm so each slice gets roughly equal wall time
  - Duration cache (test_durations.json): maps relative file paths to
    last-observed subprocess wall time. _save_durations merges with
    existing cache so entries from other slices are preserved.
  - Per-file subprocess timing in progress output + end-of-run
    distribution summary (percentiles, top-10 slowest, <1s/<2s counts)
  - Unknown files default to 2.0s estimate (~P50), spread evenly by LPT

.github/workflows/tests.yml:
  - Matrix strategy: slice [1, 2, 3, 4] with fail-fast: false
  - Each slice restores duration cache from main (stable key, no SHA),
    runs its portion, uploads per-slice durations as artifacts
  - save-durations job (main only, if: always()) downloads all 4
    artifacts, merges into single cache entry for future PRs
  - Timeout reduced from 60min to 30min per slice (~1/4 the work)

Cache design:
  - Stable key (test-durations) not keyed by commit SHA — durations
    are about files, not commits, and SHA-keyed caches miss on every
    new commit and on PR merge commits
  - actions/cache scoping: main's cache is visible to all PRs targeting
    main; feature branches without a cache still work (default 2.0s)
  - No dotfile prefix (upload-artifact v7 skips hidden files)
2026-05-22 19:46:18 -07:00
Teknium a84cec61ca fix(minimax-oauth): refresh short-lived access tokens per request (#30619)
* fix(minimax-oauth): refresh short-lived access tokens per request

MiniMax OAuth issues ~15-minute access tokens. The Anthropic SDK caches
api_key as a static string at client construction, so a session that
resolves credentials once at startup keeps sending the same bearer until
MiniMax returns 401 mid-session.

Swap the static string for a callable token provider, reusing the existing
Entra-ID bearer-hook infrastructure in build_anthropic_client. The callable
re-reads auth.json on each invocation and calls _refresh_minimax_oauth_state,
which is a no-op when the token still has more than 60s of life left and
refreshes proactively otherwise. Refreshes persist to auth.json so other
processes (gateway, cron) see them immediately.

The wire-up lives at the agent-init / model-switch boundary rather than in
resolve_runtime_provider, so aux client paths that hand the api_key string
to OpenAI(api_key=...) are unaffected.

* docs: add infographic for minimax-oauth token refresh
2026-05-22 15:16:15 -07:00
ethernet 2f320cb35a fix(ci): supply-chain-audit uses two-dot diff, causing false positives on stale-branch PRs
The workflow diffs base.sha..head.sha (two-dot), which compares the
tip-of-main tree directly against the PR tip. When files land on main
after a PR branched off, they appear in the diff even though the PR
never touched them — triggering false-positive findings.

Example: PR #30609 was flagged for hermes_cli/setup.py, a file added
to main by an unrelated commit after the PR branched.

Switch to three-dot diff (base.sha...head.sha), which diffs from the
merge base to the PR tip — only changes introduced by this PR are
included. Applied to all four diff commands in both jobs (scan and
dep-bounds).
2026-05-22 15:15:53 -07:00
Teknium 2233b8b244 infographic: PR #30609 Termux cold-start salvage (#30618) 2026-05-22 14:32:41 -07:00
adybag14-cyber a3beee475b perf(termux): speed up bare cli prompt startup 2026-05-22 14:27:38 -07:00
adybag14-cyber 6c3fd9714f perf(termux): fast-path cli version startup 2026-05-22 14:27:38 -07:00
Teknium d11cbb1032 infographic: PR #30591 Discord adapter → bundled plugin salvage (#30614) 2026-05-22 14:24:03 -07:00
Teknium 7849a3d73f fix(gateway,discord-plugin): _platform_status must respect is_connected=False, not silently fall back to check_fn
Two bugs surfaced by PR #24356 migrating Discord into the registry:

1. plugins/platforms/discord/adapter.py::_is_connected — read DISCORD_BOT_TOKEN
   via hermes_cli.gateway.get_env_value (the abstraction tests patch) instead
   of os.getenv directly. The legacy non-registry path used get_env_value;
   bypassing it broke test_setup_openclaw_migration which patches
   gateway_mod.get_env_value to simulate a hermetic env.

2. hermes_cli/gateway.py::_platform_status — when entry.is_connected is
   defined and returns False, return 'not configured' immediately. Don't
   fall back to entry.check_fn(), which would let 'SDK is installed'
   override 'no token configured' and incorrectly report the platform as
   ready. The fallback to check_fn is the right behaviour only when
   is_connected is None (not registered).

Fixes 5 test failures observed on CI for PR #24356:
- tests/hermes_cli/test_setup.py::test_setup_gateway_skips_service_install_when_systemctl_missing
- tests/hermes_cli/test_setup.py::test_setup_gateway_in_container_shows_docker_guidance
- tests/hermes_cli/test_setup_irc.py::TestIRCGatewaySetupFreshInstall::test_setup_gateway_irc_counts_as_messaging_platform
- tests/hermes_cli/test_setup_openclaw_migration.py::TestGetSectionConfigSummary::test_gateway_returns_none_without_tokens
- tests/hermes_cli/test_setup_openclaw_migration.py::TestSetupWizardSkipsConfiguredSections::test_sections_skipped_when_migration_imported_settings

Same _platform_status bug exists for sibling plugin platforms (teams,
google_chat) whose check_fn returns true on SDK install alone; their
tests just never exercised the registry path before. The bug only became
test-visible when Discord migrated into the registry.

Validation: 11,167 tests across tests/gateway/ + tests/cron/ +
tests/tools/test_send_message_tool.py + tests/hermes_cli/ pass with zero
failures.
2026-05-22 14:21:41 -07:00
kshitijk4poor cc8e5ec2af refactor(gateway): migrate Discord adapter to bundled plugin (full Teams parity)
First migration of an existing built-in platform adapter to the plugin
system established by IRC / Teams / LINE / Google Chat. Closes #24325;
advances the umbrella refactor in #3823.

Matches Teams' shape exactly — adapter under ``plugins/platforms/discord/``
with the standard ``__init__.py`` / ``adapter.py`` / ``plugin.yaml``
shell, ``register(ctx)`` entry point, **no back-compat shim** at the old
import path, and full parity for the four hooks Teams uses plus the
``apply_yaml_config_fn`` hook that landed in #25443 (the Discord plugin
is the first consumer of that hook):

* ``standalone_sender_fn`` — out-of-process cron delivery via REST API
* ``setup_fn`` — interactive ``hermes setup gateway`` wizard
* ``apply_yaml_config_fn`` — translate ``config.yaml`` ``discord:`` keys
  into ``DISCORD_*`` env vars (replaces the hardcoded block in
  ``gateway/config.py``)
* ``is_connected`` — declares connection state from ``DISCORD_BOT_TOKEN``
* ``check_fn`` — lazy-installs ``discord.py`` on demand
* plus ``allowed_users_env``, ``allow_all_env``, ``cron_deliver_env_var``,
  ``max_message_length``, ``emoji``, ``required_env``, ``install_hint``

* ``gateway/platforms/discord.py`` (5,101 LOC) →
  ``plugins/platforms/discord/adapter.py`` (git rename, R090).
* New ``plugins/platforms/discord/{__init__.py, plugin.yaml}`` with
  ``requires_env`` / ``optional_env`` declarations.
* Append ``register(ctx)`` block + new hook implementations
  (``_standalone_send``, ``interactive_setup``, ``_apply_yaml_config``,
  ``_clean_discord_user_ids``, ``_is_connected``, ``_build_adapter``,
  plus helpers ``_DISCORD_CHANNEL_TYPE_PROBE_CACHE`` etc.) to the
  adapter.

* Replace the ``Platform.DISCORD elif`` branch in
  ``GatewayRunner._create_adapter()`` (−9 LOC) with a generic post-creation
  hook (+6 LOC) in the registry path: any plugin adapter that declares a
  ``gateway_runner`` attribute now gets it auto-injected. Webhook's
  built-in branch is unchanged (it doesn't go through the registry path).

* Move ``_send_discord`` (190 LOC) and helpers
  (``_DISCORD_CHANNEL_TYPE_PROBE_CACHE``, ``_remember_channel_is_forum``,
  ``_probe_is_forum_cached``, ``_derive_forum_thread_name``) from
  ``tools/send_message_tool.py`` into the plugin as ``_standalone_send``.
* Wire via ``standalone_sender_fn=_standalone_send`` (Teams pattern; same
  gap fixed in #21804 for other plugin platforms).
* Replace the Discord ``elif`` in ``tools/send_message_tool.py``
  ``_send_to_platform`` with a 10-line registry-hook dispatch.
* Drop the ``DiscordAdapter`` import and the
  ``Platform.DISCORD: DiscordAdapter.MAX_MESSAGE_LENGTH`` ``_MAX_LENGTHS``
  entry — the registry's ``max_message_length=2000`` covers it.

* Move ``_setup_discord`` and ``_clean_discord_user_ids`` (68 LOC) from
  ``hermes_cli/setup.py`` into the plugin as ``interactive_setup``.
* Wire via ``setup_fn=interactive_setup``.  CLI helpers (``prompt``,
  ``print_info``, etc.) are lazy-imported so the plugin's module-load
  surface stays minimal.
* Remove ``"discord": _s._setup_discord`` from
  ``hermes_cli/gateway.py::_builtin_setup_fn``.
* Remove the entire 32-line ``_PLATFORMS["discord"]`` static dict entry —
  Discord's setup metadata is now discovered dynamically via
  ``_all_platforms()`` from the registry entry.

* Move the 59-line ``discord_cfg`` YAML→env bridge from
  ``gateway/config.py::load_gateway_config()`` into the plugin as
  ``_apply_yaml_config``.  Covers ``require_mention``,
  ``thread_require_mention``, ``free_response_channels``, ``auto_thread``,
  ``reactions``, ``ignored_channels``, ``allowed_channels``,
  ``no_thread_channels``, ``allow_mentions.{everyone,roles,users,
  replied_user}``, and ``reply_to_mode`` (including the YAML 1.1
  ``off``-as-False coercion and the ``extra.reply_to_mode`` fallback).
* Wire via ``apply_yaml_config_fn=_apply_yaml_config``.
* The hook runs BEFORE ``_apply_env_overrides`` and after the generic
  shared-key loop, exactly as documented in
  ``website/docs/developer-guide/adding-platform-adapters.md``.
* Behavior is preserved exactly — every assignment still uses
  ``not os.getenv(...)`` guards so env vars take precedence over YAML.

All 78 references to the old import path are rewritten — no back-compat
shim:

* 51 ``from gateway.platforms.discord import X`` →
  ``from plugins.platforms.discord.adapter import X``
* 5 ``import gateway.platforms.discord as discord_platform`` →
  ``import plugins.platforms.discord.adapter as discord_platform``
* 1 ``from gateway.platforms import discord as discord_mod`` →
  ``from plugins.platforms.discord import adapter as discord_mod``
* 21 ``mock.patch("gateway.platforms.discord.X")`` strings →
  ``mock.patch("plugins.platforms.discord.adapter.X")``
* 1 docstring reference in ``hermes_cli/commands.py``
* 1 import in ``tools/send_message_tool.py`` (now removed entirely)

The import-safety test in ``tests/gateway/test_discord_imports.py`` is
updated to purge the new canonical module name from ``sys.modules``.

**38 files changed, +621 / −473** — net positive due to the YAML hook
implementation (89 new LOC in the plugin trading for 59 deleted in core),
but every line moved has a clear plugin home now.  The git rename is
detected at R090 because the adapter gained ~340 LOC of moved-in hook
implementations (``_standalone_send`` + ``interactive_setup`` +
``_apply_yaml_config`` + helpers).

* All 568 Discord-specific tests pass across 25 ``test_discord_*.py``
  files plus voice/send/text-batching/reload-skills/stream-consumer/
  integration tests.
* All 147 tests in the YAML-touching subset
  (``test_discord_reply_mode``, ``test_discord_free_response``,
  ``test_discord_allowed_channels``, ``test_discord_allowed_mentions``,
  ``test_discord_channel_controls``, ``test_discord_reactions``,
  ``test_discord_thread_persistence``, ``test_runtime_footer``) pass —
  this is the strongest signal that the YAML→env hook behaves
  identically to the legacy block.
* Broader gateway/cron/integration sweep (1297 tests) introduces zero
  new failures vs ``main``.  Pre-existing failures in
  ``tests/gateway/test_tts_media_routing.py`` and
  ``tests/e2e/test_platform_commands.py`` reproduce identically on the
  unchanged ``main`` revision.
* Plugin discovery sanity check confirms Discord registers alongside the
  other four platform plugins:

    Registered platforms: ['discord', 'google_chat', 'irc', 'line', 'teams']

These Discord-shaped tendrils in core were **deliberately not moved** —
they are generic platform-registry concerns affecting every platform,
not Discord-specific:

* ``gateway/config.py:1205`` ``DISCORD_BOT_TOKEN → config.token`` env
  enablement — same shape Telegram has.  The existing
  ``env_enablement_fn`` registry hook only seeds ``extra``, not
  ``.token``, so it can't replace this without an adapter refactor to
  read from ``extra["bot_token"]``.
* ``gateway/run.py`` voice-mode hooks
  (``self.adapters.get(Platform.DISCORD)`` for
  ``start_voice_mode``/``stop_voice_mode``), role-based auth,
  ``DISCORD_ALLOW_BOTS`` branch in ``_is_user_authorized``,
  ``_UPDATE_ALLOWED_PLATFORMS`` frozenset, and the per-platform
  allowlist maps — generic platform-registry concerns.
* ``Platform.DISCORD`` enum literal — stable identifier used as dict
  keys throughout the codebase; removing it is a separate refactor with
  no real benefit.
* ``tools/discord_tool.py`` and ``tools/environments/local.py`` —
  first-class agent tools and env-passthrough config, neither is the
  gateway adapter.

Each of these is worth its own scoping issue when the time comes.
2026-05-22 14:21:41 -07:00
Teknium 4f988634f8 infographic: PR #27612 Nous URL allowlist salvage 2026-05-22 14:17:40 -07:00
Teknium e32d2ffc1d fix(security): wire Nous URL allowlist into refresh / mint persistence sites
@memosr's PR #27612 put the inference_base_url allowlist check only at the
Nous proxy adapter forward boundary. The poisoned URL, however, lands in
``auth.json`` upstream of that — at five refresh / agent-key-mint payload
read sites inside ``resolve_nous_runtime_credentials`` and
``_extend_state_from_refresh``. Without gating those sites, a single MITM
on a refresh response persists the attacker's URL across restarts, even
if the proxy adapter's defense-in-depth check would later catch it on
the way out.

Replace ``_optional_base_url`` with ``_validate_nous_inference_url_from_network``
at all five Portal-network reads:

  - hermes_cli/auth.py L4840  (refresh-only access-token path)
  - hermes_cli/auth.py L4876  (mint payload path)
  - hermes_cli/auth.py L5154  (terminal-runtime access-token refresh)
  - hermes_cli/auth.py L5262  (cross-process serialized refresh)
  - hermes_cli/auth.py L5317  (terminal-runtime mint payload)

The state-read path at L5025 (``state.get("inference_base_url")``) is
deliberately NOT gated — pre-existing state in ``auth.json`` is either
already validated (it came from one of the five network sites above) or
set by a trusted local actor (manual edit, ``_setup_nous_auth`` test
fixture, ``hermes login nous`` against a staging endpoint via the
documented ``NOUS_INFERENCE_BASE_URL`` env override). Direct write_file /
patch tampering with auth.json is independently blocked by PR #14157.

Adds tests/hermes_cli/test_nous_inference_url_validation.py covering:
  - validator https + host + edge-case rules (12 cases)
  - all 5 network call sites grep contracts (no _optional_base_url
    regression possible without test failure)
  - proxy adapter defense-in-depth check still present
  - env override path NOT gated (documented dev/staging behaviour)

18 new tests, all 119 Nous-auth tests green.
2026-05-22 14:17:40 -07:00
memosr d33c99bbb1 fix(security): validate Nous Portal inference_base_url against host allowlist
The Nous Portal proxy adapter forwards minted ``agent_key`` bearer tokens
to whatever ``base_url`` ``resolve_nous_runtime_credentials()`` returns,
which is read directly from the refresh / agent-key-mint response and
persisted to ``~/.hermes/auth.json``. With no validation beyond a
trailing-slash strip, a poisoned URL (Portal-side MITM, or local write
to auth.json) gets forwarded the legitimate bearer on every subsequent
proxy request — exfiltrating the user's inference budget and opening a
response-injection channel back into the IDE / chat client.

Add ``_validate_nous_inference_url_from_network()`` in ``hermes_cli.auth``:
an https + host-allowlist check that returns None for anything outside
``inference-api.nousresearch.com``, so callers fall back to the
documented default rather than ship the bearer to an attacker.

This commit wires the validator into the proxy adapter at
``nous_portal.py``. A follow-up commit wires it into the four refresh /
mint sites in ``auth.py`` so the poisoned URL never lands in auth.json
in the first place.

The env-var override path (``NOUS_INFERENCE_BASE_URL``) bypasses
validation by design — that's the documented staging/dev escape hatch
and the env source is already trusted (the user set it themselves).

Co-authored-by: memosr <mehmet.sr35@gmail.com>
2026-05-22 14:17:40 -07:00
Julien Talbot 09afafb87e fix(xai): resolve Grok Build context for OAuth 2026-05-22 13:05:36 -07:00
Teknium 1e71b7180e infographic: PR #14157 control-plane write-deny salvage 2026-05-22 04:32:14 -07:00
Teknium 42104218e0 fix(file-safety): also write-deny <root>/control-files in profile mode
PR #14157 added control-plane write-deny against the ACTIVE HERMES_HOME,
which is fine in non-profile mode but leaves a gap once a profile is
active: HERMES_HOME points at <root>/profiles/<name>, so the global
<root>/auth.json + <root>/config.yaml + <root>/webhook_subscriptions.json
+ <root>/mcp-tokens/ remain writable. Same shape as the .env gap PR
#15981 closed via _hermes_root_path().

Apply the same widening pattern here. The control-file/mcp-tokens check
now iterates BOTH _hermes_home_path() and _hermes_root_path() (dedupes
when they coincide in non-profile mode). Also tightens the mcp-tokens
check from "startswith dir + os.sep" to "==dir OR startswith dir + os.sep"
so writing the directory entry itself is blocked, not just files inside.

Regression tests cover both protections in a real profile-mode layout
(<tmp>/hermes/profiles/coder as HERMES_HOME, <tmp>/hermes as root).
2026-05-22 04:32:14 -07:00
Pratik Rai 1f5219fda5 fix(security): protect Hermes control-plane files from prompt injection
Adds active-HERMES_HOME control-plane files to the write deny list:
auth.json, config.yaml, webhook_subscriptions.json, and any path
under mcp-tokens/. realpath() resolves before comparison so
directory-traversal and symlink targets are normalised, preventing
trivial deny-list bypass via ../ tricks.

Without this, a prompt-injected agent could rewrite Hermes' own
auth state or routing config via write_file / patch — without
triggering the terminal dangerous-command approval — and persist
attacker-controlled behaviour across sessions.

Fixes #14072
2026-05-22 04:32:14 -07:00
Teknium 6f436a463e infographic: PR #27784 anthropic adapter refactor salvage 2026-05-22 04:23:02 -07:00
kshitijk4poor 9d61408837 refactor: extract 7 helpers from convert_messages_to_anthropic
Split convert_messages_to_anthropic (complexity 79) into 7 focused helpers:

- _convert_assistant_message    — assistant msg to content blocks
- _convert_tool_message_to_result — tool msg to tool_result + merge
- _convert_user_message         — user msg validation + conversion
- _strip_orphaned_tool_blocks   — orphan tool_use + tool_result removal
- _merge_consecutive_roles      — role alternation enforcement
- _manage_thinking_signatures   — strip/preserve/downgrade by endpoint
- _evict_old_screenshots        — keep only 3 most recent images

Main function complexity: 79 → 10 (below C901 threshold).
Zero logic changes — pure extraction. Net -4 lines (refactor itself);
+45/-17 follow-up polish for annotation tightening (List[Dict] →
List[Dict[str, Any]]), restored rationale comments in
_manage_thinking_signatures (third-party endpoint examples, #13848/#16748
issue refs, redacted_thinking 'data'-as-signature note), and "Mutates
``result`` in place." docstring lines on the four mutating helpers.
2026-05-22 04:23:02 -07:00
Teknium ec2ab5bfaf infographic: PR #8056 hash pairing codes salvage 2026-05-22 04:11:49 -07:00
Teknium 82c2035823 fix(pairing): handle legacy plaintext pending entries during upgrade
When an existing install upgrades to the hashed-pending schema, its
on-disk pending.json still has the old {code: entry} format with no
hash/salt fields. The original PR #8056 assumed every entry had both
fields and would have KeyErrored in approve_code, list_pending, and
_cleanup_expired.

Guard each consumer:
  - approve_code: skip entries that are not a dict, lack salt/hash,
    or have a non-hex salt. Legacy entries simply fail to match.
  - list_pending: tolerate missing 'hash' (show "legacy" placeholder)
    and non-numeric created_at (skip the row).
  - _cleanup_expired: treat malformed/legacy entries as expired so
    they get pruned on the next call rather than wedging the file.

Regression tests cover all three consumers plus a mixed-malformed
case.
2026-05-22 04:11:49 -07:00
Tom Qiao 2e509422ef fix(security): hash gateway pairing codes instead of storing plaintext
Pairing codes were stored as plaintext keys in JSON files. Now uses
sha256 + random salt hashing with constant-time comparison.

Fixes #8036

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-22 04:11:49 -07:00
0xDevNinja 3ac2125140 refactor(image_gen): port FAL backend to plugins/image_gen/fal
Mirrors the architecture established by the web (#25182), browser
(#25214), and video_gen (#25126) plugin migrations:

* `tools/fal_common.py` — stateless atoms shared by both FAL-backed
  plugins (image_gen + video_gen). Holds the lazy `fal_client` import
  helper, `_ManagedFalSyncClient`, `_normalize_fal_queue_url_format`,
  `_extract_http_status`. Stateful pieces (`fal_client` module global,
  `_managed_fal_client*` cache, `_submit_fal_request`,
  `_resolve_managed_fal_gateway`, `_get_managed_fal_client`)
  intentionally stay on `tools.image_generation_tool` so the existing
  `monkeypatch.setattr(image_tool, ...)` patch sites keep working
  unchanged.

* `plugins/video_gen/fal/__init__.py` — drops its inline
  `_load_fal_client` duplicate; consumes `tools.fal_common.import_fal_client`.

* `plugins/image_gen/fal/{plugin.yaml,__init__.py}` — new plugin.
  `FalImageGenProvider` is a thin registration adapter that resolves
  the legacy module via `import tools.image_generation_tool as _it`
  and calls `_it.image_generate_tool` + `_it._resolve_fal_model` at
  call time. The 18-model catalog, `_build_fal_payload`, managed-
  gateway selection, and Clarity Upscaler chaining all remain in
  `tools.image_generation_tool` as the single source of truth —
  the plugin is a registration adapter, not a parallel implementation.

* `tools/image_generation_tool.py::_dispatch_to_plugin_provider` —
  drops the `configured == "fal"` skip. Setting `image_gen.provider:
  fal` now routes through the registry like any other provider; the
  plugin re-enters this module's pipeline so behavior is identical.
  Unset `image_gen.provider` still falls through to the in-tree
  pipeline (preserves no-config-with-FAL_KEY UX from #15696).

* `hermes_cli/tools_config.py` — drops the hardcoded "FAL.ai" row from
  `TOOL_CATEGORIES["image_gen"]["providers"]` (now injected by
  `_plugin_image_gen_providers` like every other backend) and the
  `getattr(provider, "name") == "fal"` skip that protected against
  duplication with the hardcoded row. The "Nous Subscription" row
  stays as a setup-flow entry — same shape browser kept "Nous
  Subscription (Browser Use cloud)" after #25214.

* `tests/plugins/image_gen/test_fal_provider.py` — 14 cases covering
  the ABC surface, call-time indirection (verifying
  `monkeypatch.setattr(image_tool, "image_generate_tool", ...)` takes
  effect through the plugin), response-shape stamping, exception
  handling, and registry wiring.

* `tests/plugins/image_gen/check_parity_vs_main.py` — subprocess
  harness mirroring `tests/plugins/browser/check_parity_vs_main.py`.
  Pins one path to origin/main, one to the worktree; runs six
  scenarios (unset, explicit-fal-no-creds, explicit-fal-with-creds,
  explicit-fal-with-model, typo provider, managed-gateway-only) and
  diffs the reduced shape `{dispatch_kind, provider_name, model}`
  per scenario. The only acceptable diff is "legacy_fal → plugin
  (fal)" for explicit-FAL paths — every other delta is flagged as
  a regression.

* `tests/hermes_cli/test_image_gen_picker.py::test_fal_surfaced_alongside_other_plugins`
  — flips the previous `test_fal_skipped_to_avoid_duplicate` to
  match the new shape (FAL is a plugin now, no dedup needed).

Verified: 195/195 tests across
`tests/{tools/test_image_generation*,tools/test_managed_media_gateways,plugins/image_gen,plugins/video_gen,hermes_cli/test_image_gen_picker}.py`
pass on this branch with no test patches modified outside the picker
test that asserted the old skip behaviour.

Fixes #26241
2026-05-22 04:10:45 -07:00
Teknium 7dea33303a infographic: PR #30373 aux model picker parity salvage 2026-05-22 04:10:38 -07:00
Teknium d246f9a278 fix(aux-picker): drop stale session_search slot
PR #27590 removed auxiliary.session_search from DEFAULT_CONFIG (single-shape
tool now returns DB content directly without an aux LLM), but the slot
remained in _AUX_TASK_SLOTS (web_server.py) and AUX_TASKS (ModelsPage.tsx).
Removing the dead entries while we're touching these tables.
2026-05-22 04:10:38 -07:00
flooryyyy c1e93aa331 fix: add missing aux model slots to model picker
triage_specifier, kanban_decomposer, profile_describer exist in
DEFAULT_CONFIG auxiliary section but weren't in _AUX_TASK_SLOTS,
_AUX_TASKS, or the dashboard AUX_TASKS array — so users couldn't
configure them through hermes model or the web dashboard.

9â\x86\x9212 aux slots across all three UI surfaces.
2026-05-22 04:10:38 -07:00
Teknium 8b49012a0a infographic: PR #8306 webhook HMAC bypass salvage 2026-05-22 03:45:21 -07:00
Teknium 3fc715ddf5 test(webhook): regression cases for empty-secret HMAC bypass
Covers _reload_dynamic_routes() rejecting empty or missing per-route
secrets when no global fallback exists, preserving the INSECURE_NO_AUTH
opt-in, inheriting a global secret when only the per-route value is
missing, and partial-skip when only one of multiple routes is bad.
2026-05-22 03:45:21 -07:00
memosr 9c90b3a597 fix(security): validate secret in _reload_dynamic_routes to prevent HMAC bypass 2026-05-22 03:45:21 -07:00
briandevans 22b0d6dc1a test(tools): centralize disable_lazy_stt_install fixture in conftest
Move the autouse `_disable_lazy_stt_install` fixture out of the three
transcription test files and into `tests/tools/conftest.py` as a regular
(non-autouse) fixture. Each transcription test module opts in once at
the top via `pytestmark = pytest.mark.usefixtures(...)`.

Why: addresses three Copilot inline review comments on this PR that
flagged the verbatim duplication across files. Centralizing also keeps
the patch target in a single place, so a future rename of
`_try_lazy_install_stt` only updates one location.

Why opt-in (not autouse in conftest): other `tests/tools/` files do not
patch `_HAS_FASTER_WHISPER` and have no reason to bypass the runtime
lazy-install probe; making the fixture autouse globally would silently
mask any future test that wants to exercise the real lazy-install path.
2026-05-22 03:33:01 -07:00
briandevans 5dc232a6e2 test(tools): disarm lazy-install probe so _HAS_FASTER_WHISPER patches work
`b5c6d9ac0` ("fix: wire STT lazy-install into transcription_tools.py")
added `_try_lazy_install_stt()`, which calls
`importlib.util.find_spec("faster_whisper")` after `ensure()` runs.
In the dev / CI environment `faster_whisper` is already installed, so
the probe returns truthy and `_get_provider()` returns "local" even
when the test has patched `_HAS_FASTER_WHISPER=False` to simulate
"not installed".

Add a per-file autouse fixture that patches `_try_lazy_install_stt`
to return False so the simulation stays accurate. The 16 baseline
failures across `test_transcription_tools.py`,
`test_transcription.py`, and `test_transcription_dotenv_fallback.py`
disappear; the production lazy-install path is unaffected at runtime.
2026-05-22 03:33:01 -07:00
Teknium c25f9d1d36 feat(secrets): label detected credentials with their source (Bitwarden) (#30364)
When Bitwarden Secrets Manager supplies a provider key, 'hermes model'
and the setup wizard show 'credentials ✓' with no hint of where the
key came from — identical to the .env case. Users assume the integration
isn't wired up and re-enter the key (or hit Enter and cancel).

env_loader now tracks which env vars were injected by an external secret
source and exposes get_secret_source() / format_secret_source_suffix() so
the provider flows can render 'Anthropic credentials: sk-ant-... ✓
(from Bitwarden)' instead of an unlabeled checkmark.

Wired into _prompt_api_key (kimi, z.ai, minimax, opencode, ...), the
Anthropic provider flow, the Bedrock flow, and the GitHub Copilot token
display.

Future secret sources (Vault, 1Password, etc.) drop in by setting their
own label in _SECRET_SOURCES; format_secret_source_suffix() has a generic
fallback so no call sites need updating.
2026-05-22 03:32:58 -07:00
teknium1 d617858896 fix(openviking): target-aware mirror subdir, drop private-attr access, dedupe URI builder
- on_memory_write: map target='memory' -> patterns/, 'user' -> preferences/
  (was hardcoded to preferences/ for both)
- Replace client._user with self._user (no private-attr leakage)
- Extract _build_memory_uri() helper + module-level subdir maps
- Restore on_memory_write signature parity with MemoryProvider base
  (metadata kwarg; eliminates Pyright incompatible-override warning)
- AUTHOR_MAP entry for chrisdlc119@outlook.com
2026-05-22 01:27:52 -07:00
Christian de la Cruz 2d587c5662 fix(openviking): store memories via content/write API instead of session messages
_tool_remember and on_memory_write were posting memories as session
messages that depend on commit-time VLM extraction to persist. With
extraction_enabled: false (no VLM configured), the extraction pipeline
never processes these messages, causing memories to be silently lost.

Replace both paths with direct POST to /api/v1/content/write?mode=create,
which creates the file, stores the content, and queues vector indexing
in a single API call. Error reporting is immediate — no silent failures.

- Maps viking_remember category to viking:// subdirectory
- Generates UUID-based URIs via uuid4().hex[:12]
- Returns byte count in confirmation message
2026-05-22 01:27:52 -07:00
Teknium caf0f30eab chore(release): add sgtworkman to AUTHOR_MAP 2026-05-22 01:24:11 -07:00
sgtworkman 70d53d8b75 fix: run computer use post-setup when enabling tool 2026-05-22 01:24:11 -07:00
Rodrigo fbdca64f73 fix(computer-use): skip capture_after when action failed (ok=False)
_maybe_follow_capture() issued a follow-up screenshot unconditionally
when capture_after=True, even when res.ok=False. The model then received
a normal-looking screenshot alongside an error message, and in practice
it often ignored ok=False and proceeded as if the action had succeeded.

Fix: return _text_response(res) early when res.ok is False so the model
receives only the error and can decide how to recover.

Tests added:
- test_capture_after_skipped_when_action_failed: patches click to return
  ok=False and asserts no capture call is issued.
- test_capture_after_fires_when_action_succeeds: ensures the happy path
  still triggers the follow-up capture.
2026-05-22 01:19:01 -07:00
Teknium 07b7cf6fe4 chore(release): add rodrigoeqnit to AUTHOR_MAP 2026-05-22 01:14:15 -07:00
Rodrigo c52cd48e25 fix(computer-use): add set_value to ComputerUseBackend ABC and _NoopBackend stub
_dispatch() routes action="set_value" to backend.set_value(), but:
- ComputerUseBackend did not declare set_value as @abstractmethod, so
  subclasses could silently omit it without a TypeError at class load time.
- _NoopBackend (the test/CI stub) had no set_value method at all, causing
  AttributeError in any test that exercises the set_value action path.

Fix:
- Add set_value as @abstractmethod to ComputerUseBackend in backend.py.
- Add a recording stub in _NoopBackend in tool.py.
- Add two TestDispatch cases: one verifying the call reaches the backend,
  one verifying the missing-value guard returns a clean error.
2026-05-22 01:14:15 -07:00
Tranquil-Flow d3f62c6913 fix(cli): clamp curses color 8 for 8-color terminals (Docker)
curses.init_pair(N, 8, -1) uses extended color 8 ("bright black" /
dim gray) which does not exist on 8-color terminals (COLORS == 8,
valid range 0-7).  This crashes the entire plugins UI, session
browser, and radio picker in Docker containers with:

    curses.error: init_pair() : color number is greater than COLORS-1

Replace all 5 occurrences across plugins_cmd.py, main.py, and
curses_ui.py with min(8, curses.COLORS - 1), which falls back to
COLOR_WHITE (7) on 8-color terminals.

Closes #13688
2026-05-21 23:40:58 -07:00
Teknium c769be344a fix(agent): recover from providers rejecting list-type tool content (#27344) (#30259)
Some providers (Xiaomi MiMo, some Alibaba endpoints, a long tail of
OpenAI-compatible servers) follow the OpenAI spec strictly and require
tool message `content` to be a string — they reject our list-type
content (text + image_url parts) with HTTP 400 'text is not set' /
'tool message content must be a string'.

Instead of an allowlist of known-good providers (maintenance burden,
guaranteed to miss aggregators like OpenRouter where the underlying
model determines support, not the aggregator name), this lands a
reactive recovery:

1. New `FailoverReason.multimodal_tool_content_unsupported` with a
   small pattern list covering the common 400 wordings.
2. `AIAgent._try_strip_image_parts_from_tool_messages` walks the API
   message list, downgrades any `role:tool` message whose content is
   list-with-image to a plain text summary (preserves text parts) in
   place, AND records the active (provider, model) in a session-scoped
   `_no_list_tool_content_models` set.
3. `_tool_result_content_for_active_model` short-circuits to a text
   summary when (provider, model) is in the cache — so after the first
   400 + retry, subsequent screenshots in the same session skip the
   round trip entirely.
4. Retry hook in `agent.conversation_loop` mirrors the existing
   `image_too_large` recovery: detect the reason, run the helper,
   retry once, fall through to the normal error path if no list-type
   tool content was actually present.

Cache is transient (per-session) by design — next session retries in
case the provider added support, no persistent state to maintain.

Fixes #27344. Closes #27351 (allowlist approach superseded by reactive
recovery).
2026-05-21 23:40:16 -07:00
teknium1 372e9a18cd fixup: log lazy-install errors at debug + AUTHOR_MAP for CipherFrame
Co-authored-by: CipherFrame <cipherframe@users.noreply.github.com>
2026-05-21 23:36:18 -07:00
CipherFrame b5c6d9ac08 fix: wire STT lazy-install into transcription_tools.py
The ensure('stt.faster_whisper') lazy-install mechanism was defined in
lazy_deps.py but never called from the STT code path. When
_HAS_FASTER_WHISPER (a module-level constant) evaluated to False at
import time, _get_provider() returned 'none' immediately without
attempting installation. On fresh container builds or venv recreations,
this meant voice message transcription broke silently until someone
manually installed faster-whisper.

Add _try_lazy_install_stt() helper that calls ensure() and
re-checks dynamically via importlib.util.find_spec. Wire it into
all three gates in transcription_tools.py:

- _get_provider() explicit 'local' path (line 221)
- _get_provider() auto-detect path (line 287)
- _transcribe_local() guard (line 405)

This ensures the first voice message after any fresh install triggers
auto-installation instead of failing permanently until a process restart.
2026-05-21 23:36:18 -07:00
helix4u f6f25b9449 fix(agent): fail fast on small Ollama runtime context 2026-05-21 23:25:01 -07:00
Teknium e77f1ed5f7 fix(agent): widen toolset gate to context engine tools (#5544 sibling)
The memory-provider gate added in the prior commit closes one of two
blind-injection sites in agent_init.py. The context engine block (lines
~1445) follows the identical pattern: agent.context_compressor.get_tool_schemas()
(lcm_grep, lcm_describe, lcm_expand) was appended to agent.tools unconditionally,
ignoring enabled_toolsets.

Same bug class, same local-model latency penalty, same one-line gate — using
'context_engine' as the toolset name (matches the existing plugin-system
convention in plugins.py, plugins_cmd.py, etc.).

Also adds Lempkey to scripts/release.py AUTHOR_MAP for the prior commit's
authorship.
2026-05-21 23:18:37 -07:00
lempkey 4c61fb6cf6 fix(agent): gate memory tool injection on enabled_toolsets (#5544)
MemoryManager.get_all_tool_schemas() output was appended to AIAgent.tools
unconditionally — bypassing the enabled_toolsets / platform_toolsets filter.
Setting `platform_toolsets: telegram: []` had no effect: fact_store and other
memory provider tools still leaked into the tool surface on every session.

Impact on local models (per @thundercat49's benchmarks on Qwen3-30B-A3B Q4_K_M /
RTX 3090): tool-formatted prompts process at 134 tok/s vs 1,230 tok/s for plain
text. With 8 memory tool schemas injected, a simple 'hello' on Telegram took
~42s instead of ~1.7s. Small models also entered tool-call loops when memory
tools were the only tools present.

Gate condition (matches the natural meaning of enabled_toolsets):
  None                       → no filter, inject (backward compat)
  contains 'memory'          → user opted in, inject
  otherwise (including [])   → skip injection

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-21 23:18:37 -07:00
brooklyn! 1264fab156 fix(tui): surface verbose tool details (#30225)
* fix(tui): surface verbose tool details

Emit redacted structured verbose args/results to the TUI so /verbose verbose can show full tool detail without reopening stdout, and fail closed if redaction is unavailable.

Salvages #29011.

Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>

* fix(tui): address verbose detail review

Label verbose tool failures as errors, cover forced verbose reasoning, and avoid new diff type warnings from the redaction regression tests.

* fix(tui): bound verbose tool payloads

Cap verbose tool detail text before emitting JSON-RPC events and preserve verbose results on inline diff completions.

* fix(tui): align termux argv test with gc flag

Update the stale TUI launch expectation so the Termux freshness path matches the current direct Node argv.

---------

Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
2026-05-22 00:16:52 -05:00
Teknium 4e2c66a098 chore(release): add AUTHOR_MAP entry for Stark-X 2026-05-21 19:17:51 -07:00
Stark-X eb51fb6f50 fix(ssh): keep bulk sync extraction scoped to .hermes 2026-05-21 19:17:51 -07:00
liuhao1024 4a2fa77c15 fix(cli): pre-check CUA release asset for Intel macOS before install
The upstream cua-driver installer resolves the latest release and attempts
to download an architecture-specific asset. When the release only ships
arm64 builds (as of v0.1.6), the installer fails with a raw 404 on Intel
macOS with no clear path forward.

Add _check_cua_driver_asset_for_arch() that probes the GitHub Releases API
before running the installer. If the latest release has no x86_64/amd64
asset, print a clear warning and link to the upstream issue. On arm64 or
API failure, fail open and let the installer proceed as before.

Fixes #24530
2026-05-21 19:17:45 -07:00
teknium1 9896e43db5 fix(skills): load Linux-tagged skills on Termux (android sys.platform)
Reported by @LikiusInik in Discord: on Termux only 3 built-in skills
appeared and /gh-pr-workflow + every other slash-skill from
github/productivity/mlops was missing.

Root cause: skill_matches_platform() compares sys.platform.startswith()
against the skill's platforms list. Termux is a Linux userland on
Android, but Python 3.13+ reports sys.platform == "android" instead of
"linux" — so the ~60 built-in skills tagged platforms:[linux,macos,
windows] (github-pr-workflow, google-workspace, github-auth,
huggingface-hub, etc.) all got filtered out at the listing step in
tools/skills_tool.py:_find_all_skills and never appeared as /slash
commands or in skill_view.

Fix: when is_termux() detects we're running inside Termux, accept
"linux" platform tags regardless of whether sys.platform is "linux"
(pre-3.13) or "android" (3.13+). Also accept explicit
platforms:[termux] / [android] tags. macOS-only and Windows-only
skills correctly remain excluded.

E2E (simulated TERMUX_VERSION=set + sys.platform="android"):
  Before: _find_all_skills() returned ~3 skills.
  After:  _find_all_skills() returns 84 skills including
          github-pr-workflow, google-workspace, github-auth,
          huggingface-hub. Apple-only skills remain excluded.

Non-Termux Linux/macOS/Windows behavior unchanged (verified).

Tests: tests/agent/test_skill_utils.py — 9 new cases covering
android-as-Termux, the [linux,macos,windows] case, macOS-only
exclusion, explicit termux/android tags, non-Termux Android safety,
and unchanged behavior on real Linux/macOS.
2026-05-21 19:08:38 -07:00
adybag14-cyber d08c2a016a fix(tui): termux-gate composer rendering tweaks for Ink TUI
Salvaged from #28942 (adybag14-cyber). Only the Ink TUI half is taken
here — the bundled "termux compatibility note" added to skills_tool.py
in the original PR did not address the actual user-reported bug
(skill_matches_platform() filtering Linux skills out on Termux) and
also regressed the EXCLUDED_SKILL_DIRS set used to prune nested
.venv/site-packages skills.

Changes:
- ui-tui/src/lib/prompt.ts: single-cell ASCII '>' marker in Termux mode
  to avoid ambiguous-width glyph artifacts while typing.
- ui-tui/src/components/appLayout.tsx: suppress profile prefix on
  narrow Termux panes (>=90 cols still shows it).
- ui-tui/src/lib/inputMetrics.ts + components/messageLine.tsx +
  lib/virtualHeights.ts: termux-aware transcript body width — drop
  the desktop 20-col floor on narrow mobile layouts, align virtual
  heights with actual rendered width.
- ui-tui/src/components/textInput.tsx: disable fast-echo bypass by
  default in Termux to avoid ghosting at soft-wrap boundaries.
  HERMES_TUI_TERMUX_FAST_ECHO=1 opts back in.

Tests: ui-tui/src/__tests__/{prompt,termuxComposerLayout,textInputFastEcho}.test.ts
(12 PR-added tests pass; 3 pre-existing wrapAnsi-bundling failures on
main are unrelated.)

The real skill-listing fix on Termux ('android' platform matching
Linux skills) ships as a follow-up commit on this branch.
2026-05-21 19:08:38 -07:00
Teknium 0e2873a77d fix(computer_use): build summary once before aux-vision routing branch
The cherry-pick of #22891 (max_elements cap) reshuffled _capture_response
so summary was assigned inside both the multimodal and AX branches,
but #30126's aux-vision routing call (_route_capture_through_aux_vision)
fires BEFORE either branch and references the not-yet-bound name.

Compute summary once up-front, keep the AX-branch rebuild for the
truncation note.
2026-05-21 19:07:32 -07:00
briandevans 280dd4513a fix(computer-use): address Copilot review on max_elements cap
Four findings from Copilot's review on PR #22891, all in the AX
elements-array cap added by 22fa1ed:

1. The truncation note ("response truncated to N of M elements") was
   appended unconditionally — including in the som/vision multimodal
   path, whose response carries a screenshot rather than an `elements`
   array. The note described a payload field that wasn't present.
   Moved the note into the AX-text branch where the array actually
   appears.

2. `_format_elements(cap.elements)` ran on the full untrimmed list with
   its own `max_lines=40` cap, so a caller passing `max_elements=10`
   would see summary lines referencing `#11..#40` even though the JSON
   `elements` array only held #1..#10. Format on `visible_elements`
   instead so the summary indices always exist in the response.

3. `_coerce_max_elements` enforced a lower bound but no upper bound,
   so `max_elements=10_000_000` silently disabled the safeguard and
   reintroduced the original context-blow-up. Added a hard cap
   (`_MAX_ALLOWED_MAX_ELEMENTS = 1000`) that clamps oversized values.

4. The schema string said "Default 100" but the property carried no
   `default` field, and claimed `max_elements` had no effect on som/
   vision while the image-missing fallback path can still return an
   elements array. Added `"default": 100`, `"maximum": 1000`, and
   clarified the fallback-path wording.

Each finding gets a regression test:

- test_capture_ax_clamps_oversized_max_elements_to_hard_cap
- test_capture_ax_summary_indices_match_returned_elements
- test_capture_multimodal_summary_omits_truncation_note
- test_schema_max_elements_documents_default_and_upper_bound

Verified with `pytest tests/tools/test_computer_use.py` (53 passed,
including the 5 new cases). Confirmed each new test fails on the
pre-fix code path before applying the production change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 19:07:32 -07:00
briandevans bb694bad42 fix(computer-use): cap AX elements array to prevent context blowup (#22865)
`computer_use(action='capture', mode='ax')` returned the full AX element
list verbatim in the JSON response. Dense Electron / Obsidian / JetBrains
UIs publish 500+ AX nodes (one reproduction in #22865 returned 597
elements against Obsidian), so a single capture could consume enough
context to trigger compression failures or render the session unusable.
The human-readable `_format_elements` summary is already capped at 40
lines, so the truncation gap was invisible to anyone reading the summary
output.

Add a `max_elements` argument to the tool schema, default 100, that
trims the AX `elements` array. When the cap fires, the response surfaces
`total_elements` and `truncated_elements` and appends a "raise
max_elements or pass app= to narrow" hint to the summary so the model
knows the JSON view is partial and can re-issue with a tighter scope.

Validation is centralized in `_coerce_max_elements`: missing /
non-integer / sub-1 inputs fall back to the default cap, so the
protection can never be silently disabled by a malformed tool-call
argument. The cap only affects AX-mode JSON; `mode='som'` and
`mode='vision'` keep returning a screenshot + image-aware summary
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 19:07:32 -07:00
brooklyn! 9e30ef224d fix(tui): preserve scrollback when branching sessions (#30162)
Keep the visible transcript mounted after /branch switches to the new session, since the backend already carries the copied history forward.
2026-05-21 21:01:04 -05:00
brooklyn! a7cd254c29 feat(tui): mouse_tracking DEC mode presets (salvage of #26681) (#30084)
* feat(tui): make display.mouse_tracking pick which DEC modes to enable

Previously the boolean flag was all-or-nothing across modes 1000+1002+1003+1006.
Inside tmux, mode 1003 (any-motion) makes every mouse cross of the prompt row
fire a clipboard probe that surfaces as "No image in clipboard" — sometimes
dozens in a row. Disabling tracking entirely killed scroll-wheel scrolling too,
since tmux's own scrollback is preempted by the alt-screen TUI.

`display.mouse_tracking` (and `/mouse <preset>`) now accepts `off | wheel |
buttons | all` in addition to the legacy booleans. `wheel` is 1000+1006:
scroll wheel + click only, no drag, no hover — the tmux-friendly subset.
`buttons` adds 1002 for drag-to-select. `all` (= legacy `true`) keeps the
hover-driven UI (scrollbar paginate-on-hover, link mouseenter, etc.).

* fix(tui): repaint + sync mouse mode when display.mouse_tracking changes

Two interacting bugs left the TUI blank when `display.mouse_tracking`
switched at runtime (config edit, /mouse <preset>):

1. AlternateScreen's effect re-runs on every `mouseTracking` change,
   tearing down and re-entering the alt screen. After re-entry, ink's
   frame buffers are reset by `resetFramesForAltScreen()` but nothing
   schedules the follow-up render — the alt screen sits blank until
   some other state change happens to trigger one. Add a
   `scheduleRender()` in `setAltScreenActive`'s active=true branch so
   the freshly-entered alt screen gets a full repaint immediately.

2. `setAltScreenActive` early-returns when `active` hasn't changed,
   which silently drops a `mouseTracking` change if the cleanup→setup
   pair somehow leaves `altScreenActive` already true. Call
   `setAltScreenMouseTracking` explicitly from the AlternateScreen
   effect so the in-memory mode and terminal DECSET sequence stay in
   sync regardless of how `setAltScreenActive` resolved (the call is a
   no-op when the mode is unchanged).

* fix(tui): address copilot review #4341269705

- tui_gateway/server.py: drop the never-referenced _MOUSE_TRACKING_MODES
  frozenset (comment #3284802434). _MOUSE_TRACKING_ALIASES already
  centralizes the canonical preset set via its values; the separate
  constant added no behavior.
- tests/test_tui_gateway_server.py: update the existing
  test_config_mouse_uses_documented_key_with_legacy_fallback to assert
  the new preset strings ('all'/'off' instead of 'on'/'off',
  display.mouse_tracking persisted as 'all' instead of True) and add
  test_config_mouse_accepts_preset_strings_and_aliases covering /mouse
  set with wheel/click/unknown (comment #3284802453). The on/off legacy
  config.set return shape was an implementation detail of the boolean
  flag, not a stable API — the slash command, gateway help text, and
  docs all advertise the preset values now.
- ui-tui/packages/hermes-ink/src/ink/ink.tsx: schedule a render at the
  end of reenterAltScreen() (comment #3284802461). Mirrors the same fix
  in setAltScreenActive() from ece0a2f4c — without it, SIGCONT/resize
  self-heal/stdin-gap re-entry leaves the alt screen blank because
  every caller returns early after invoking us.

* fix(tui): address copilot review #4341308478 round 2

- ui-tui/src/config/env.ts (comment #3284837577): the precedence
  comment was misleading. Actual behavior on origin/main is
  HERMES_TUI_MOUSE_TRACKING (explicit override) > Termux default >
  HERMES_TUI_DISABLE_MOUSE legacy kill-switch. This is preserved from
  main; the only change here was the wrong comment that claimed
  DISABLE_MOUSE kept kill-switch semantics. Rewrote the comment block
  to document the actual precedence ladder.
- tui_gateway/server.py /mouse set (comment #3284837607): replaced
  'str(value or "").strip().lower()' with the explicit None idiom
  already used for /indicator, so programmatic callers can pass 0 /
  False and have them route through _MOUSE_TRACKING_ALIASES → 'off'
  instead of collapsing to '' and triggering the toggle path.
- ui-tui/packages/hermes-ink/src/ink/components/AlternateScreen.tsx
  (comment #3284837620): always prepend DISABLE_MOUSE_TRACKING before
  enableMouseTrackingFor(...) on mount. Otherwise selecting
  'wheel'/'buttons' from a state where DEC 1003 was already asserted
  (crash, another app, debugger) would silently leave hover on. Also
  unconditionally DISABLE on unmount so a crash mid-mount can't leak
  DEC modes back to the host shell.

* chore(release): map nat@nthrow.io to @nthrow for #26681 salvage

* fix(tui): drop redundant setAltScreenMouseTracking in AlternateScreen

Copilot review #4341356637 (comment #3284880417). The explicit
setAltScreenMouseTracking(mouseTracking) after setAltScreenActive(true,
mouseTracking) was defensive paranoia added in the previous fix commit
that's not actually reachable in practice:

- React's cleanup always runs before the next setup, so on any prop
  change (mouseTracking or writeRaw) the cleanup sets active=false
  first. Setup then sees active was false and applies the new mode
  via setAltScreenActive without early-returning.
- On the impossible 'active stayed true' path, the writeRaw above has
  already sent DISABLE_MOUSE_TRACKING + enableMouseTrackingFor(newMode)
  to the terminal, so the in-memory mode would lag but the visible
  state is already correct.

Removing the redundant call means a single DEC sequence per mount.
If the 'active stayed true' path ever manifests in practice, the
right fix is in setAltScreenActive (track mode regardless of the
active early-return), not here.

* fix(tui): always DISABLE before enableMouseTrackingFor in ink.tsx

Copilot review #4341379994 (comments #3284900825, #3284900840,
#3284900852). Three remaining call sites in ink.tsx still re-enabled
mouse tracking without first sending DISABLE_MOUSE_TRACKING:

- handleResize alt-screen recovery (line ~577)
- reassertTerminalModes stdin-gap re-assertion (line ~1351)
- reenterAltScreen SIGCONT/resize/stdin-gap self-heal (line ~1408)

For 'wheel'/'buttons' presets, omitting DISABLE leaves any externally-
asserted DEC 1003 (other apps, prior crash, tmux state) still active
and the hover-free preset silently has hover on. DISABLE_MOUSE_TRACKING
is idempotent and safe to send unconditionally — it resets all four
modes. Matches the pattern already in setAltScreenMouseTracking and
the AlternateScreen mount path.

* fix(tui): always DISABLE before enableMouseTrackingFor in exitAlternateScreen

Copilot review #4341452823 (comment #3284959762). exitAlternateScreen()
was the last call site in ink.tsx still re-enabling mouse tracking
without DISABLE first. Editors (vim/nvim/less) and tmux can leave
DEC 1003 hover asserted across the handoff back; without DISABLE,
'wheel'/'buttons' presets silently kept hover on after the editor
quit. Now all five enableMouseTrackingFor() call sites in ink.tsx
prepend DISABLE_MOUSE_TRACKING — handleResize, reassertTerminalModes,
reenterAltScreen, setAltScreenMouseTracking, exitAlternateScreen.

* fix(tui): add defensive default to enableMouseTrackingFor switch

Copilot review #4341485231 (comment #3284979323). TS exhaustive switch
returns string per the type system, but a JS caller / corrupted config
/ hot-reload-in-dev could reach the function with an unknown value at
runtime. Without a default, that path returns undefined which then
concatenates as the literal string 'undefined' into the terminal byte
stream — visibly garbling output. Treat unknown as 'off' (no DEC
sequences) so the worst case is silent input loss rather than a
wrecked screen.

---------

Co-authored-by: Nat Thrower <nat@nthrow.io>
2026-05-21 20:25:52 -05:00
Ben Barclay 4d58e48cdb Merge pull request #29387 from NousResearch/fix/no-docker-tag
fix(ci): stop pushing per-commit SHA tags to Docker Hub
2026-05-22 10:38:32 +10:00
xxxigm bec2250d2c test(computer_use): end-to-end regression for capture routing (#24015)
Add tests/tools/test_computer_use_capture_routing.py — 13 integration
tests that drive _capture_response end-to-end with deterministic stubs
for the routing helper, _run_async, vision_analyze_tool, and
get_hermes_dir, so the full code path is exercised without a live
cua-driver, real auxiliary client, or network access.

Coverage:

  * TestCaptureResponseDefaultPath (3 cases)
    - SOM PNG capture returns the legacy multimodal envelope when the
      routing helper says 'native' (image/png MIME).
    - Same path returns image/jpeg MIME for JPEG payloads (cua-driver
      can return either).
    - AX-only mode never even consults the routing helper because no
      PNG is present.

  * TestCaptureResponseRoutedToAuxVision (5 cases)
    - SOM capture with routing on returns a JSON string with the
      vision_analysis embedded, the AX/SOM index preserved, and NO
      image_url parts. Verifies the aux call receives a path under
      the configured cache and a prompt that grounds itself against
      the AX summary.
    - Temp screenshot file is unlinked after _capture_response returns,
      including when the aux call raises (the finally block runs).
    - Empty / malformed aux analysis falls back to the multimodal
      envelope so the user always gets *something* useful.

  * TestRoutingDecisionWiring (4 cases)
    - Explicit auxiliary.vision in config flips routing on regardless of
      main-model vision capability.
    - Vision-capable main + native tool-result support keeps multimodal.
    - Config load failure fails open (returns False, multimodal path
      continues to work).
    - Helper exception is swallowed and routes to legacy behaviour.

  * TestBugReproductionAnchor (1 case) - directly pins the #24015
    contract: when routing is on, the response must NEVER contain a
    'data:image' or 'image_url' substring. That is exactly what tripped
    the reporter's HTTP 404 ('No endpoints found that support image
    input') on tencent/hy3-preview before the fix.

Bug-reproduction proof:
  $ git checkout upstream/main -- tools/computer_use/tool.py
  $ scripts/run_tests.sh tests/tools/test_computer_use_capture_routing.py
  ============================== 13 failed in 1.29s ==============================

  $ # restore tool.py to this branch's HEAD
  $ scripts/run_tests.sh tests/tools/test_computer_use_capture_routing.py
  ============================== 13 passed in 1.04s ==============================

Total branch coverage:
  85 passed across test_computer_use.py, test_computer_use_vision_routing.py,
  test_computer_use_capture_routing.py
2026-05-21 17:38:19 -07:00
xxxigm e02a7e5e1c fix(computer_use): route SOM/vision captures via auxiliary.vision (#24015)
When the active main model has no vision capability — or when the user
explicitly configured auxiliary.vision in config.yaml — sending the
captured screenshot back to the main model in a multimodal tool-result
envelope is the wrong move: it trips HTTP 404 / 400 at the provider
boundary (e.g. 'No endpoints found that support image input') and the
agent loop reports a hard tool failure for what should have been a
simple capture.

The reporter on #24015 hit this with:

  model:
    default: tencent/hy3-preview      # no vision support
    provider: openrouter
  auxiliary:
    vision:
      provider: openrouter
      model: google/gemini-2.5-flash  # explicitly configured

…and observed:

  computer_use(action='capture', mode='som')
  → ⚠️ API call failed (attempt1/3): NotFoundError [HTTP 404]
     🔌 Provider: openrouter  Model: tencent/hy3-preview
     📝 Error: HTTP 404: No endpoints found that support image input

Fix: in tools/computer_use/tool.py::_capture_response, after a
screenshot is captured (modes 'som' / 'vision'), consult the routing
helper introduced earlier in this branch. When it says 'route to aux',
materialise the PNG to $HERMES_HOME/cache/vision/, run vision_analyze
on it (which honours auxiliary.vision via the standard async_call_llm
task='vision' router), and return a text-only JSON tool result that
embeds the analysis alongside the existing AX/SOM index. The main
model never sees the pixels — it sees an actionable text description
plus the same set-of-mark element index it normally uses.

The two new helpers (_should_route_through_aux_vision,
_route_capture_through_aux_vision) keep the policy and the IO
separated so each can be tested in isolation. Both fail open: if the
config import fails, if the aux call raises, or if the analysis is
empty, we fall back to the existing multimodal envelope so the
behaviour is at worst the pre-fix status quo. Temp screenshot files
are cleaned up unconditionally in a finally block — even on aux call
failure — to avoid leaving residue under cache/vision/.

The end-to-end regression for #24015 is added in the next commit.
2026-05-21 17:38:19 -07:00
xxxigm 5ce5fe3181 test(computer_use): cover capture vision-routing helper
Add tests/tools/test_computer_use_vision_routing.py — 28 unit tests
that pin the contract of the new vision-routing helper introduced in
the previous commit:

  * TestExplicitAuxVisionOverride (12 cases): mirror the
    auxiliary.vision detection rules used by agent.image_routing so
    the capture path and the user-attached-image path agree on what
    counts as an explicit override (provider/model/base_url with
    non-blank, non-'auto' values).
  * TestRouteDecision (7 cases): pin the policy itself — explicit
    override always wins, vision-capable + native-tool-result keeps
    multimodal, everything else fails closed and routes to aux.
  * TestLookupHelpers (5 cases): defensive paths for the models.dev /
    tool-result-support lookups (blank inputs, exceptions, missing
    caps).
  * TestModuleSurface (4 cases): pin the public/__all__ surface and
    keep internal helpers addressable so the integration test in the
    next commit can monkeypatch them deterministically.

Run with:
  scripts/run_tests.sh tests/tools/test_computer_use_vision_routing.py
2026-05-21 17:38:19 -07:00
xxxigm 531efe7208 fix(computer_use): add helper to decide capture vision routing
Add tools/computer_use/vision_routing.py with
should_route_capture_to_aux_vision(provider, model, cfg) — a small
policy helper that decides whether a captured screenshot should be
returned as a multimodal envelope (main model has native vision) or
pre-analysed through the auxiliary.vision pipeline so the main model
only sees text.

The decision mirrors agent.image_routing.decide_image_input_mode for
user-attached images, so the capture path and the user-turn path agree
on what counts as an explicit aux vision override:
  * provider/model/base_url under auxiliary.vision => explicit override
    => route through aux vision
  * provider+model accepts multimodal tool results AND main model
    reports supports_vision=True => keep multimodal envelope
  * everything else (no tool-result image support, non-vision model,
    metadata lookup failure) => fail closed and route through aux

No call sites are changed in this commit; the helper is added in
isolation so the routing decision can be unit-tested before it is
plumbed into _capture_response().
2026-05-21 17:38:19 -07:00
Teknium 2a474bcf72 fix(termux): resolve packed-refs and worktree refs in skill-sync fingerprint
The bundled-skill sync stamp added in the cherry-picked salvage commit
parsed .git/HEAD and looked for a loose ref file in the worktree gitdir
only, so two real cases hit the unresolved branch:

- repos after `git gc` where active refs live in packed-refs
- linked worktrees, whose branch ref lives in <commondir>/refs/heads/
  (verified on the worktree this salvage was built in)

Both fell back to a constant-string fingerprint, so post-commit launches
would never re-run the real skill sync. Now we resolve packed-refs and
check both the worktree gitdir and the common dir for loose refs.

Adds three tests covering: packed-refs resolution, worktree common-dir
packed lookup, worktree common-dir loose lookup, and the explicit
'unresolved' marker (still stable + version-fallback-safe).
2026-05-21 17:19:05 -07:00
adybag14-cyber 6dbbf20ff4 perf(termux): speed up non-tui cli startup 2026-05-21 17:19:05 -07:00
briandevans 5aa4727f34 fix(computer-use): surface app=… filter no-match instead of silently using frontmost (#24170 bug 1)
`CuaDriverBackend.capture(app=X)` and `focus_app(app=X)` silently fell back
to the frontmost on-screen window when X matched no app — typically a
menu-bar utility (e.g. "Fuwari" in the bug reporter's case) rather than
the requested app. The agent then received UI elements for the wrong app
and clicked / typed into it.

The root cause is a localized macOS app name mismatch: `list_windows`
returns the localized `app_name` (e.g. "計算機" on a Japanese/Chinese
system) but callers naturally pass the English name ("Calculator"). The
substring filter doesn't match, and the code falls through to picking the
frontmost window with no signal that the filter was effectively dropped.

Fix:

- `capture(app=…)`: when the filter matches nothing, return a
  `CaptureResult` with empty `app`/`elements` and a diagnostic
  `window_title` pointing the caller at `list_apps` and noting the
  localized-name convention. `_active_pid` / `_active_window_id` are left
  untouched so a subsequent action doesn't inadvertently hit the wrong
  process.
- `focus_app(app=…)`: when the filter matches nothing, set `target = None`
  and let the existing `return ActionResult(ok=False, …, "No on-screen
  window found for app …")` path fire instead of falsely reporting success
  on the frontmost window.

This addresses bug 1 only from #24170. Bugs 2 & 5 are addressed in #30046;
bugs 3 & 4 in #30032.
2026-05-21 17:15:35 -07:00
Bartok9 4cc18877c6 fix(computer_use): preserve app context for capture_after; fix element label parsing (#24170 bugs 2 & 5)
Bug 2 (capture_after=True loses app context):
_maybe_follow_capture called backend.capture(mode='som') with no app=,
causing cua-driver to capture the frontmost window instead of the app
targeted by the preceding capture/focus_app. Fix: track _last_app on
CuaDriverBackend and thread it through the follow-up capture call so
the same app is re-captured regardless of which window has OS focus.

Bug 5 (element labels stripped in capture results):
_ELEMENT_LINE_RE matched the classic '  - [N] AXRole "label"' format
but not the '[N] AXRole (order) id=Label' format introduced in
cua-driver v0.1.6. All element labels were silently dropped as empty
strings, making element identification impossible.

Fix: extend regex to capture both group(3) (quoted label) and group(4)
(id= label), and update _parse_elements_from_tree to use group(4) as
fallback. Both old and new cua-driver output now produce populated
UIElement.label values.

focus_app() now also sets _last_app so that capture_after= on any
subsequent action re-targets the focused app.

5 new regression tests added.

Part of #24170 (bugs 1 and 3/4 addressed separately).
2026-05-21 14:19:09 -07:00
Teknium 3fde8c153d fix(skills): prune dependency/venv dirs from all skill scanners (#30042)
* fix(skills): skip dependency dirs in skill scan

* fix(skills): widen sibling rglob scanners to use shared exclusion set

Follow-up to PR #29968. The contributor's PR widened EXCLUDED_SKILL_DIRS
in the canonical walker (iter_skill_index_files), which fixes the
user-visible discovery path. This commit sweeps the ~12 other
rglob('SKILL.md') sites that did their own ad-hoc filtering — most only
checked .git/.hub, some had no filter at all — so dependency dirs
(.venv, node_modules, site-packages, etc.) cannot leak ghost skills
through the secondary paths.

Adds agent.skill_utils.is_excluded_skill_path(path) helper. Migrates
all 13 sites to use it. Removes 3 hardcoded duplicate filter sets.

Sites touched:
  agent/curator_backup.py        - skill backup file count
  gateway/run.py                 - disabled-skill response (2 sites)
  hermes_cli/dump.py             - skill count in env dump
  hermes_cli/profile_describer.py- profile description (2 sites)
  hermes_cli/profile_distribution.py - profile install count
  hermes_cli/profiles.py         - profile skill count
  hermes_cli/skills_hub.py       - category detection
  tools/skill_manager_tool.py    - skill name lookup (already used set, now uses helper)
  tools/skill_usage.py           - usage tracking + skill dir lookup (2 sites)
  tools/skills_hub.py            - optional skills find + scan (2 sites)
  tools/skills_sync.py           - bundled skills sync

E2E verified with the exact reported shape
(bring/scripts/.venv/.../typer/.agents/skills/typer/SKILL.md): no
sibling site picks up the ghost skill, all five legit-skill counts
still return 1.

* chore(infographic): retro-pop-grid bento for PR #30042 skill-scanner sweep

---------

Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
2026-05-21 14:18:02 -07:00
helix4u 3462b097e2 fix(voice): chunk oversized CLI recordings 2026-05-21 14:17:39 -07:00
Teknium 552e9c7881 feat(secrets): Bitwarden Secrets Manager integration with lazy bws install (#30035)
* feat(secrets): Bitwarden Secrets Manager integration with lazy bws install

Pull API keys from Bitwarden Secrets Manager at process startup
instead of storing them all in plaintext in ~/.hermes/.env.  One
bootstrap token (BWS_ACCESS_TOKEN) replaces N per-provider keys, and
rotating a credential becomes a single change in the Bitwarden web
app.

Bitwarden defaults to source of truth: secrets pulled from BSM
overwrite any matching env vars on startup so rotations actually
take effect.  Set secrets.bitwarden.override_existing: false in
config.yaml to invert.

The bws binary is auto-downloaded into ~/.hermes/bin/bws on first
use (pinned to v2.0.0, SHA-256 verified against the GitHub release
checksum file).  No apt, brew, or sudo required.

New surfaces:
  hermes secrets bitwarden setup    — interactive wizard
  hermes secrets bitwarden status   — config + binary + token state
  hermes secrets bitwarden sync     — dry-run fetch / --apply exports
  hermes secrets bitwarden disable  — flip enabled: false
  hermes secrets bitwarden install  — just download the binary

Failures (missing binary, bad token, no network) never block Hermes
startup — they emit a one-line warning to stderr and continue with
whatever credentials .env already had.

Docs: website/docs/user-guide/secrets/{index,bitwarden}.md
Tests: tests/test_bitwarden_secrets.py (26 tests, hermetic — bws
       subprocess and HTTP downloads fully mocked)

* chore(infographic): add bitwarden-secrets-manager bento-grid retro-pop-grid

Generated for PR #30035 — Bitwarden Secrets Manager integration.
Style picked via pick_pr_infographic_style.py rotation:
  layout: bento-grid
  style:  retro-pop-grid
  aspect: 1:1 square

Saved at infographic/bitwarden-secrets-manager/infographic.png
2026-05-21 14:10:34 -07:00
liuhao1024 18cd1e5c72 fix(computer_use): correct type_text MCP tool name and implement drag action
Bug 3: The cua_backend type_text() method called MCP tool 'type_text_chars'
which does not exist in current cua-driver. Changed to 'type_text' which is
the correct MCP tool name.

Bug 4: The drag() method returned a hardcoded 'not supported' error even
though cua-driver exposes a 'drag' MCP tool. Implemented proper drag
dispatching with coordinate-based and element-based targeting.

Added dispatch-level validation for drag to ensure from/to coordinates
or elements are provided before calling any backend.

Fixes #24170 (bugs 3 and 4)
2026-05-21 14:08:28 -07:00
github-actions[bot] 0ce12a9241 fix(nix): auto-refresh npm lockfile hashes
Source: 56b79f12ac

Run: https://github.com/NousResearch/hermes-agent/actions/runs/26250404490
2026-05-21 20:11:48 +00:00
Teknium 56b79f12ac fix(dashboard): remove country flags from language picker (#29997)
Closes #29750. Reporter flagged that 繁體中文 displayed the TW flag
instead of the PRC flag. Rather than picking a side, drop the
language-flag pairings entirely — languages aren't countries
(English ≠ GB, Portuguese ≠ PT, Mandarin variants ≠ any single
jurisdiction), and endonyms are unambiguous.

- LOCALE_META: strip flagCountryCode field
- LanguageSwitcher: remove LocaleFlagIcon component + both call sites
- main.tsx: drop flag-icons CSS import
- package.json: uninstall flag-icons
2026-05-21 13:10:52 -07:00
teknium1 3d2f146460 fix(tui): also pass --expose-gc on the wheel-bundled launch path
The original PR fixed the ext_dir and built-tui paths but missed the
sibling pip-wheel path at line 1155. Without this, wheel installs would
lose --expose-gc entirely (the env-var append at the call site was
already removed). All three production node-launch sites now pass
--expose-gc via argv consistently.
2026-05-21 13:10:34 -07:00
teknium1 2e3f576298 chore(release): map yichengqiao21 to YarrowQiao 2026-05-21 13:10:34 -07:00
YarrowQiao 2ea7cf287e fix(tui): pass --expose-gc as node argv instead of NODE_OPTIONS
Node refuses to start when NODE_OPTIONS contains --expose-gc:

    node: --expose-gc is not allowed in NODE_OPTIONS

NODE_OPTIONS is restricted to a small allowlist of flags that are safe
to inject via env (since any process able to set env vars on a node
child could otherwise enable arbitrary capabilities). --expose-gc is
not on that list and never has been -- it must be passed as a direct
CLI flag.

_launch_tui() was appending --expose-gc to NODE_OPTIONS before spawning
the TUI's node process, which made `hermes --tui` fail to start on
every modern node release. The intent (manual GC for long sessions to
avoid fatal-OOM) is preserved by inserting --expose-gc directly into
the node argv in _make_tui_argv() -- same effect, but actually allowed.

--max-old-space-size=8192 stays in NODE_OPTIONS: it *is* allowlisted,
and keeping it there means downstream node spawns inherit the same
heap cap without having to re-thread the flag through every spawn site.

The dev paths (`tsx src/entry.tsx` and `npm start` fallback) are left
alone -- they don't accept node flags directly, and the production
dist path is the one users actually hit via `hermes --tui`.

Repro before fix:

    $ hermes --tui
    /usr/bin/node: --expose-gc is not allowed in NODE_OPTIONS
2026-05-21 13:10:34 -07:00
helix4u ba9964ff0d fix(custom): pass custom provider extra body
Allow custom OpenAI-compatible providers declared under `custom_providers:`
to set provider-specific `extra_body` fields and have Hermes merge them into
chat-completions requests when the matching custom endpoint is active.

This is a manual per-provider override rather than a model-name heuristic.
OpenAI-compatible Gemma thinking support is real, but the on-wire payload
shape is backend-specific: some servers want top-level `enable_thinking`,
while vLLM Gemma and NIM-style endpoints expect `chat_template_kwargs`.
A per-provider override is safer than picking one assumed payload.

Example config:

```yaml
custom_providers:
  - name: gemma-local
    base_url: http://localhost:8080/v1
    model: google/gemma-4-31b-it
    extra_body:
      enable_thinking: true
      reasoning_effort: high
```

For vLLM Gemma or NIM-style endpoints, use the nested shape those servers
expect:

```yaml
extra_body:
  chat_template_kwargs:
    enable_thinking: true
```

Changes:

- `hermes_cli/config.py`: preserve `extra_body` in normalized
  `custom_providers:` entries and allow it in the validated field set.
- `hermes_cli/runtime_provider.py`: propagate custom-provider `extra_body`
  as `request_overrides.extra_body` for named custom runtime resolution,
  including credential-pool paths.
- `agent/agent_init.py`: at agent init, locate the matching custom-provider
  entry by `base_url` (+ optional model) and merge its `extra_body` into
  `AIAgent.request_overrides`, with caller-provided overrides winning on
  conflicting top-level keys.
- `plugins/model-providers/custom/__init__.py`: keep existing CustomProfile
  behavior (Ollama `num_ctx`, `think=False` when reasoning disabled);
  user-configured `extra_body` flows through `request_overrides`.
- `website/docs/integrations/providers.md`: document the explicit
  `extra_body` override and the vLLM/Gemma `chat_template_kwargs` variant.
- Tests cover config normalization, runtime propagation, model matching,
  trailing-slash equivalence, fallback when no `model` field is set, and
  caller-override merging precedence.

Verified end-to-end against `CustomProfile` via `ChatCompletionsTransport`:
configured `extra_body` reaches `kwargs.extra_body` on the wire request,
and coexists with profile-generated entries (Ollama `num_ctx`, `think=False`)
without clobber.

Salvaged from #29022 onto current `main`. Cosmetic typing edit in
`plugins/model-providers/custom/__init__.py` and a stale-base docs revert
in `providers.md` were dropped during cherry-pick.

Closes #29022
2026-05-21 07:48:53 -07:00
ethernet 2fdefca570 Merge pull request #28269 from cresslank/chore/tui-remove-unused-babel-deps
chore(tui): remove unused Babel build deps
2026-05-21 10:21:31 -04:00
ethernet 48be2e0e4d test: use subprocesses for each test file (#29016)
* ci(tests): install ripgrep from prebuilt tarball instead of apt

apt-get update + install of ripgrep takes ~4 min on the GHA Ubuntu
runners (the apt-get update against archive.ubuntu.com is the slow
part; ripgrep itself is small). Switching to the upstream musl
binary tarball cuts the step to a few seconds.

- Pinned to ripgrep 15.1.0 with sha256 verification (same hash as
  published in the releases sha256 sidecar file).
- Drops the `rg` binary into /usr/local/bin so it is on PATH for
  every subsequent step without GITHUB_PATH manipulation.
- Applied to both the test and e2e jobs in tests.yml.

* fix(cli): compile syntax check to tempdir, not source __pycache__

`_validate_critical_files_syntax` runs `py_compile.compile()` on each
critical bootstrap file after a successful `git pull`. The default
`py_compile` writes the resulting `.pyc` next to the source under
`__pycache__/`, which causes two real problems:

1. Parallel test workers walking the same source tree (e.g. running
   the suite under per-file process isolation) can race against each
   other on the `__pycache__` write — manifests as flaky 'directory
   not empty' errors during teardown.
2. In production, the post-pull syntax check leaves a `.pyc` behind
   that the next interpreter run might pick up — fine when the
   interpreter version matches, sketchy if it doesn't.

Fix: write the compiled output to a `tempfile.TemporaryDirectory()`
that's discarded on function exit. We only care about the compile-or-not
signal, not the artifact.

* test(runner): per-file process isolation, drop manual state reset + xdist

Replace fragile manual _reset_module_state test fixtures with robust
per-file subprocess isolation. Each test file runs in a fresh
`python -m pytest <file>` subprocess via ThreadPoolExecutor. No xdist,
no custom pytest plugin, no shared worker state.

Key changes:
  * scripts/run_tests_parallel.py — new runner: discovers test files,
    runs N in parallel via ThreadPoolExecutor, captures stdout per file,
    treats exit code 5 (no tests collected) as pass, kills all children
    on exit. Change from cpu_count to cpu_count*2. The runner is
    I/O-bound (waiting on subprocess.communicate() from pytest children)
    The parent process does almost no CPU work, so 2x oversubscription
    keeps more pipes full. When a file fails, immediately show the last
    30 lines of pytest output (stack traces + FAILED summary) plus a
    ready-to-copy repro command:
      python -m pytest tests/agent/test_auxiliary_client.py
  * scripts/run_tests.sh — delegates to run_tests_parallel.py
  * .github/workflows/tests.yml — test step: python
scripts/run_tests_parallel.py
  * pyproject.toml — drop pytest-xdist, pytest-split; simplify addopts
  * tests/conftest.py — remove ~200 lines of manual state-reset fixtures
  * AGENTS.md — update Testing section for per-file design

* test(runner): speed gateway test antipattern scan up

* fix(test): web search provider plugin test missing xai

* fix(tests): make 14 test files pass under per-file subprocess isolation

Tests that relied on cross-file state pollution from xdist workers
fail when run in isolation (per-file subprocess model). Root causes
and fixes:

Tool registry not populated:
  - test_video_generation_tool_surface_matrix: add discover_builtin_tools()
  - test_web_providers_brave_free/ddgs/searxng/general: autouse fixtures
    registering all 8 bundled web providers, reset after each test
  - test_website_policy: same provider registration pattern
  - test_web_tools_tavily: same pattern across 3 dispatch test classes
  - Also add is_safe_url/check_website_access mocks where SSRF check
    blocks example.com (DNS resolution fails in isolated envs)

Stale check_fn cache:
  - test_kanban_tools: invalidate_check_fn_cache() + _clear_tool_defs_cache()
    in both kanban guidance tests (prior test cached False for kanban_show)
  - test_discord_tool: cache invalidation in setup/teardown
  - test_homeassistant_tool: invalidate_check_fn_cache() before registry queries

Module-level state pollution:
  - test_auxiliary_client: autouse fixture clearing _aux_unhealthy_until cache
  - test_skill_commands: set_session_vars() instead of patch.dict(os.environ)
    (ContextVar takes precedence over os.environ)
  - test_dm_topics: overwrite sys.modules + separate telegram.constants mock
    + force-reimport of gateway.platforms.telegram
  - test_terminal_tool_requirements: removed duplicate class declaration,
    autouse _clear_caches fixture

* change(tests): run_tests.sh explicitly includes env vars

instead of manually dropping some vars, now we just only include some

* fix(tests): 5 more isolation/NixOS fixes

- test_approval_plugin_hooks: isolate HERMES_HOME so real user's
  command_allowlist doesn't short-circuit the approval path
- test_google_chat: skipif when Platform.GOOGLE_CHAT not in enum
  (feature not merged on this branch)
- test_write_deny: test systemd prefix against tmp_path instead of
  /etc/systemd which resolves to /nix/store on NixOS
- test_pty_bridge: use shutil.which('cat') instead of /bin/cat
  (doesn't exist on NixOS)
- profiles.py: rmtree onexc handler chmod's parent dirs too, fixing
  profile deletion when copytree preserved read-only modes from
  nix store

* fix(tests): clear unhealthy cache in autouse fixture for auxiliary_client

* fix(tests): skip send_message when telegram not installed; handle missing worker_id in browser_supervisor

* fix: py3.11 rmtree onexc compat + belt-and-suspenders unhealthy cache clear for expired codex test

* fix: address PR #29016 review feedback

- Remove tracked .pytest-cache/ artifact and add to .gitignore
- Fix stale 'xdist worker' comment in conftest.py
- Deduplicate web provider registration into tests/tools/conftest.py
  shared helper (register_all_web_providers), replacing 8 copy-pasted
  blocks across 6 test files
- Update PR description: remove stale recovered-test-files claim,
  fix worker count to match code (cpu_count*2)

* fix: eliminate race in stale-cache achievements test

The background scan thread could complete and overwrite _SNAPSHOT_CACHE
before evaluate_all() returned the stale data — only 10 fake sessions
made the scan finish instantly. Added scan_delay param to _FakeSessionDB
and set it to 2s in the stale-cache test so the background thread can't
win the race.
2026-05-21 16:40:04 +05:30
cresslank 8ad34db551 chore(tui): remove unused Babel build deps
Remove the stale Babel compiler config and direct Babel dev dependencies from the TUI package.

Regenerate the npm lockfile and refresh the Nix fetchNpmDeps hash for the trimmed dependency graph.
2026-05-20 17:23:36 -05:00
ethernet ef43938e2b fix(ci): stop pushing per-commit SHA tags to Docker Hub
Only push named tags (:main on merge, <release_tag> on release)
instead of creating a sha-<sha> tag for every commit to main.
The :main floating tag is still advanced on every merge with
the same ancestor-check safety guarantee, but there are no
longer individual immutable tags per commit.
2026-05-20 12:42:18 -04:00
338 changed files with 19935 additions and 3773 deletions
+70 -133
View File
@@ -27,9 +27,9 @@ on:
permissions:
contents: read
# Concurrency: push/release runs are NEVER cancelled so every merge gets its
# own SHA-tagged image; :main and :latest are guarded separately by the
# move-main and move-latest jobs. PR runs reuse a PR-scoped group with
# Concurrency: push/release runs are NEVER cancelled so every merge gets
# its own :main or release-tagged image. :latest is guarded separately
# by the move-latest job. PR runs reuse a PR-scoped group with
# cancel-in-progress: true so rapid pushes to the same PR collapse to the
# latest commit.
concurrency:
@@ -92,10 +92,10 @@ jobs:
# pattern for multi-runner multi-platform builds.
#
# We apply the OCI revision label here (and again on arm64) because
# the move-main / move-latest jobs read it off the linux/amd64
# sub-manifest config of the floating tag to decide whether it's safe
# to advance. The label must be on each per-arch image — manifest
# lists themselves don't carry image config labels.
# the move-latest job reads it off the linux/amd64 sub-manifest
# config of the floating tag to decide whether it's safe to advance.
# The label must be on each per-arch image — manifest lists themselves
# don't carry image config labels.
- name: Push amd64 by digest
id: push
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
@@ -208,8 +208,14 @@ jobs:
# ---------------------------------------------------------------------------
# Stitch both per-arch digests into a single tagged multi-arch manifest.
# This is a registry-side operation — no building, no layer re-push —
# so it runs in ~30 seconds. On main pushes it produces :sha-<sha>.
# On releases it produces :<release_tag_name>.
# so it runs in ~30 seconds. On main pushes it produces :main; on
# releases it produces :<release_tag_name>.
#
# For main pushes the ancestor check runs BEFORE the manifest push so
# we never overwrite :main with an older commit. The top-level
# concurrency group (`docker-${{ github.ref }}` with
# `cancel-in-progress: false`) already serialises runs per ref; the
# ancestor check is defense-in-depth.
# ---------------------------------------------------------------------------
merge:
if: github.repository == 'NousResearch/hermes-agent' && (github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release')
@@ -217,10 +223,15 @@ jobs:
needs: [build-amd64, build-arm64]
timeout-minutes: 10
outputs:
pushed_sha_tag: ${{ steps.mark_pushed.outputs.pushed }}
pushed_release_tag: ${{ steps.mark_release_pushed.outputs.pushed }}
release_tag: ${{ steps.tag.outputs.tag }}
steps:
- name: Checkout code
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 1000
- name: Download digests
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
with:
@@ -237,120 +248,19 @@ jobs:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
# Compute the tag for this run. Main pushes use sha-<sha> (so every
# commit gets its own immutable tag); releases use the release tag name.
- name: Compute tag
id: tag
run: |
if [ "${{ github.event_name }}" = "release" ]; then
echo "tag=${{ github.event.release.tag_name }}" >> "$GITHUB_OUTPUT"
else
echo "tag=sha-${{ github.sha }}" >> "$GITHUB_OUTPUT"
fi
- name: Create manifest list and push
working-directory: /tmp/digests
run: |
set -euo pipefail
# Build the arg array from each digest file (filename = the digest
# hex, with no sha256: prefix; empty file content, only the name
# matters). Using an array avoids shellcheck SC2046 and keeps
# every digest a single argv token even under pathological names.
args=()
for digest_file in *; do
args+=("${IMAGE_NAME}@sha256:${digest_file}")
done
docker buildx imagetools create \
-t "${IMAGE_NAME}:${TAG}" \
"${args[@]}"
env:
IMAGE_NAME: ${{ env.IMAGE_NAME }}
TAG: ${{ steps.tag.outputs.tag }}
- name: Inspect image
run: |
docker buildx imagetools inspect "${IMAGE_NAME}:${TAG}"
env:
IMAGE_NAME: ${{ env.IMAGE_NAME }}
TAG: ${{ steps.tag.outputs.tag }}
# Signal to move-main that the SHA tag is live. Only on main pushes;
# releases set pushed_release_tag instead.
- name: Mark SHA tag pushed
id: mark_pushed
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
run: echo "pushed=true" >> "$GITHUB_OUTPUT"
# Signal to move-latest that the release tag is live.
- name: Mark release tag pushed
id: mark_release_pushed
if: github.event_name == 'release'
run: echo "pushed=true" >> "$GITHUB_OUTPUT"
# ---------------------------------------------------------------------------
# Move :main to point at the SHA tag the merge job pushed.
#
# :main is the floating tag that tracks the tip of the main branch. Every
# merge to main retags :main forward. Users who want "latest dev build"
# pull :main; users who want stable releases pull :latest.
#
# The real serialization guarantee comes from the top-level concurrency
# group (`docker-${{ github.ref }}` with `cancel-in-progress: false`),
# which ensures at most one workflow run for this ref executes at a time.
# That means two move-main steps for the same ref cannot overlap.
#
# This job has its own concurrency group as defense-in-depth: if the
# top-level group is ever loosened, queued move-mains will run serially
# in arrival order, each one running the ancestor check below and either
# advancing :main or skipping. `cancel-in-progress: false` matches the
# top-level setting — we don't want rapid pushes to cancel a queued
# move-main, because the ancestor check is the real safety mechanism
# and queueing is cheap (move-main is a ~30s registry op).
#
# Combined with the ancestor check, this means :main only ever moves
# forward in git history.
# ---------------------------------------------------------------------------
move-main:
if: |
github.repository == 'NousResearch/hermes-agent'
&& github.event_name == 'push'
&& github.ref == 'refs/heads/main'
&& needs.merge.outputs.pushed_sha_tag == 'true'
needs: merge
runs-on: ubuntu-latest
timeout-minutes: 10
concurrency:
group: docker-move-main-${{ github.ref }}
cancel-in-progress: false
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 1000
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
- name: Log in to Docker Hub
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
# Read the git revision label off the current :main manifest, then
# use `git merge-base --is-ancestor` to check whether our commit is a
# descendant of it. If :main doesn't exist yet, or its label is
# missing, we treat that as "safe to publish". If another run already
# advanced :main past us (or diverged), we skip and leave it alone.
# use `git merge-base --is-ancestor` to check whether our commit is
# a descendant of it. If :main doesn't exist yet, or its label is
# missing, we treat that as "safe to publish". If another run
# already advanced :main past us (or diverged), we skip and leave
# it alone.
- name: Decide whether to move :main
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
id: main_check
run: |
set -euo pipefail
image=nousresearch/hermes-agent
# Pull the JSON for the linux/amd64 sub-manifest's config and extract
# the OCI revision label with jq — Go template field access can't
# handle dots in map keys, so using json+jq is the robust route.
image_json=$(
docker buildx imagetools inspect "${image}:main" \
--format '{{ json (index .Image "linux/amd64") }}' \
@@ -383,7 +293,6 @@ jobs:
exit 0
fi
# Make sure we have the :main commit locally for merge-base.
if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
git fetch --no-tags --prune origin \
"+refs/heads/main:refs/remotes/origin/main" \
@@ -396,7 +305,6 @@ jobs:
exit 0
fi
# Our SHA must be a descendant of the current :main to be safe.
if git merge-base --is-ancestor "${current_sha}" "${GITHUB_SHA}"; then
echo "Our commit is a descendant of :main — safe to advance."
echo "push_main=true" >> "$GITHUB_OUTPUT"
@@ -405,19 +313,48 @@ jobs:
echo "push_main=false" >> "$GITHUB_OUTPUT"
fi
# Retag the already-pushed SHA manifest as :main. This is a registry-
# side operation — no rebuild, no layer re-push — so it's quick and
# atomic per-tag. The ancestor check above plus the cancel-in-progress
# concurrency on this job together guarantee we only ever move :main
# forward in git history.
- name: Move :main to this SHA
if: steps.main_check.outputs.push_main == 'true'
# Compute the tag for this run. Main pushes tag directly as :main
# (no per-commit SHA tags); releases use the release tag name.
- name: Compute tag
id: tag
run: |
if [ "${{ github.event_name }}" = "release" ]; then
echo "tag=${{ github.event.release.tag_name }}" >> "$GITHUB_OUTPUT"
else
echo "tag=main" >> "$GITHUB_OUTPUT"
fi
# Gate the manifest push on the ancestor check for main pushes.
# For releases there is no gate — the check doesn't even run.
- name: Create manifest list and push
if: github.event_name != 'push' || steps.main_check.outputs.push_main == 'true'
working-directory: /tmp/digests
run: |
set -euo pipefail
image=nousresearch/hermes-agent
args=()
for digest_file in *; do
args+=("${IMAGE_NAME}@sha256:${digest_file}")
done
docker buildx imagetools create \
--tag "${image}:main" \
"${image}:sha-${GITHUB_SHA}"
-t "${IMAGE_NAME}:${TAG}" \
"${args[@]}"
env:
IMAGE_NAME: ${{ env.IMAGE_NAME }}
TAG: ${{ steps.tag.outputs.tag }}
- name: Inspect image
if: github.event_name != 'push' || steps.main_check.outputs.push_main == 'true'
run: |
docker buildx imagetools inspect "${IMAGE_NAME}:${TAG}"
env:
IMAGE_NAME: ${{ env.IMAGE_NAME }}
TAG: ${{ steps.tag.outputs.tag }}
# Signal to move-latest that the release tag is live.
- name: Mark release tag pushed
id: mark_release_pushed
if: github.event_name == 'release'
run: echo "pushed=true" >> "$GITHUB_OUTPUT"
# ---------------------------------------------------------------------------
# Move :latest to point at the release tag the merge job pushed.
@@ -427,10 +364,10 @@ jobs:
#
# We still run an ancestor check against the existing :latest so that a
# backport release on an older branch (e.g. patching v1.1.5 after v1.2.3
# is out) doesn't drag :latest backwards. The check is the same shape as
# move-main: read the OCI revision label off the current :latest, look up
# that commit in git, and only advance if our release commit is a strict
# descendant.
# is out) doesn't drag :latest backwards. The check is the same shape
# as the ancestor check in the merge job for :main: read the OCI
# revision label off the current :latest, look up that commit in git,
# and only advance if our release commit is a strict descendant.
# ---------------------------------------------------------------------------
move-latest:
if: |
+7 -4
View File
@@ -47,14 +47,17 @@ jobs:
HEAD="${{ github.event.pull_request.head.sha }}"
# Added lines only, excluding lockfiles.
DIFF=$(git diff "$BASE".."$HEAD" -- . ':!uv.lock' ':!*.lock' ':!package-lock.json' ':!yarn.lock' || true)
# Three-dot diff (base...head) diffs from the merge base to HEAD,
# so only changes introduced by this PR are included — not changes
# that landed on main after the PR branched off.
DIFF=$(git diff "$BASE"..."$HEAD" -- . ':!uv.lock' ':!*.lock' ':!package-lock.json' ':!yarn.lock' || true)
FINDINGS=""
# --- .pth files (auto-execute on Python startup) ---
# The exact mechanism used in the litellm supply chain attack:
# https://github.com/BerriAI/litellm/issues/24512
PTH_FILES=$(git diff --name-only "$BASE".."$HEAD" | grep '\.pth$' || true)
PTH_FILES=$(git diff --name-only "$BASE"..."$HEAD" | grep '\.pth$' || true)
if [ -n "$PTH_FILES" ]; then
FINDINGS="${FINDINGS}
### 🚨 CRITICAL: .pth file added or modified
@@ -97,7 +100,7 @@ jobs:
# --- Install-hook files (setup.py/sitecustomize/usercustomize/__init__.pth) ---
# These execute during pip install or interpreter startup.
SETUP_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -E '(^|/)(setup\.py|setup\.cfg|sitecustomize\.py|usercustomize\.py|__init__\.pth)$' || true)
SETUP_HITS=$(git diff --name-only "$BASE"..."$HEAD" | grep -E '(^|/)(setup\.py|setup\.cfg|sitecustomize\.py|usercustomize\.py|__init__\.pth)$' || true)
if [ -n "$SETUP_HITS" ]; then
FINDINGS="${FINDINGS}
### 🚨 CRITICAL: Install-hook file added or modified
@@ -158,7 +161,7 @@ jobs:
HEAD="${{ github.event.pull_request.head.sha }}"
# Only check added lines in pyproject.toml
ADDED=$(git diff "$BASE".."$HEAD" -- pyproject.toml | grep '^+' | grep -v '^+++' || true)
ADDED=$(git diff "$BASE"..."$HEAD" -- pyproject.toml | grep '^+' | grep -v '^+++' || true)
if [ -z "$ADDED" ]; then
echo "found=false" >> "$GITHUB_OUTPUT"
+103 -7
View File
@@ -24,12 +24,34 @@ jobs:
test:
runs-on: ubuntu-latest
timeout-minutes: 30
strategy:
fail-fast: false
matrix:
slice: [1, 2, 3, 4, 5, 6]
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Install system dependencies
run: sudo apt-get update && sudo apt-get install -y ripgrep
- name: Restore duration cache
uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
with:
path: test_durations.json
# Single stable key. main always overwrites, PRs always find it.
key: test-durations
- name: Install ripgrep (prebuilt binary)
run: |
set -euo pipefail
RG_VERSION=15.1.0
RG_SHA256=1c9297be4a084eea7ecaedf93eb03d058d6faae29bbc57ecdaf5063921491599
RG_TARBALL=ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl.tar.gz
curl -sSfL -o "$RG_TARBALL" \
"https://github.com/BurntSushi/ripgrep/releases/download/${RG_VERSION}/${RG_TARBALL}"
echo "${RG_SHA256} ${RG_TARBALL}" | sha256sum -c -
tar -xzf "$RG_TARBALL"
sudo mv "ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl/rg" /usr/local/bin/rg
rm -rf "$RG_TARBALL" "ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl"
rg --version
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
@@ -43,16 +65,79 @@ jobs:
source .venv/bin/activate
uv pip install -e ".[all,dev]"
- name: Run tests
- name: Run tests (slice ${{ matrix.slice }}/6)
# Per-file isolation via scripts/run_tests_parallel.py: discovers
# every test_*.py file under tests/ (excluding integration/ + e2e/),
# then runs `python -m pytest <file>` in a freshly-spawned subprocess
# with bounded parallelism. No xdist, no shared workers, no
# module-level state leakage between files.
#
# Why per-file (not per-test): per-test spawn cost (~250ms × 17k
# tests = 70min CPU minimum) blew the wall-clock budget. Per-file
# spawn (~250ms × ~850 files = ~3.5min) fits while still giving
# every file a fresh interpreter — the only isolation boundary
# that matters in practice (cross-file leakage was the original
# flake source; intra-file is the test author's responsibility).
#
# Why drop xdist entirely: xdist's persistent workers accumulate
# state across files, which is exactly the leakage we wanted to
# fix. ThreadPoolExecutor + subprocess.run is ~60 lines and does
# the job with cleaner semantics.
#
# Matrix slicing (--slice I/N): files are distributed across 6
# jobs by cached duration (LPT algorithm) so each job gets
# roughly equal wall time. Without a cache, files default to 2s
# estimate and get split roughly evenly by count — still correct,
# just not perfectly balanced.
run: |
source .venv/bin/activate
python -m pytest tests/ -q --ignore=tests/integration --ignore=tests/e2e --tb=short -n auto --timeout=30 --timeout-method=signal
python scripts/run_tests_parallel.py --slice ${{ matrix.slice }}/6
env:
# Ensure tests don't accidentally call real APIs
OPENROUTER_API_KEY: ""
OPENAI_API_KEY: ""
NOUS_API_KEY: ""
- name: Upload per-slice durations
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: test-durations-slice-${{ matrix.slice }}
path: test_durations.json
retention-days: 1
# Merge per-slice duration data into a single cache, so future runs
# (including PRs) get balanced slicing.
save-durations:
needs: test
if: always() && github.ref == 'refs/heads/main'
runs-on: ubuntu-latest
steps:
- name: Download all slice durations
uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
with:
pattern: test-durations-slice-*
path: durations
merge-multiple: true
- name: Merge into single durations file
run: |
python3 -c "
import json, glob, os
merged = {}
for f in glob.glob('durations/*test_durations.json'):
with open(f) as fh:
merged.update(json.load(fh))
with open('test_durations.json', 'w') as fh:
json.dump(merged, fh, indent=2, sort_keys=True)
print(f'Merged {len(merged)} file durations')
"
- name: Save merged duration cache
uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
with:
path: test_durations.json
key: test-durations
e2e:
runs-on: ubuntu-latest
timeout-minutes: 15
@@ -60,8 +145,19 @@ jobs:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Install system dependencies
run: sudo apt-get update && sudo apt-get install -y ripgrep
- name: Install ripgrep (prebuilt binary)
run: |
set -euo pipefail
RG_VERSION=15.1.0
RG_SHA256=1c9297be4a084eea7ecaedf93eb03d058d6faae29bbc57ecdaf5063921491599
RG_TARBALL=ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl.tar.gz
curl -sSfL -o "$RG_TARBALL" \
"https://github.com/BurntSushi/ripgrep/releases/download/${RG_VERSION}/${RG_TARBALL}"
echo "${RG_SHA256} ${RG_TARBALL}" | sha256sum -c -
tar -xzf "$RG_TARBALL"
sudo mv "ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl/rg" /usr/local/bin/rg
rm -rf "$RG_TARBALL" "ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl"
rg --version
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
@@ -82,4 +178,4 @@ jobs:
env:
OPENROUTER_API_KEY: ""
OPENAI_API_KEY: ""
NOUS_API_KEY: ""
NOUS_API_KEY: ""
+2
View File
@@ -18,6 +18,8 @@ __pycache__/web_tools.cpython-310.pyc
logs/
data/
.pytest_cache/
test_durations.json
.pytest-cache/
tmp/
temp_vision_images/
hermes-*/*
+36 -8
View File
@@ -1013,17 +1013,39 @@ def profile_env(tmp_path, monkeypatch):
**ALWAYS use `scripts/run_tests.sh`** — do not call `pytest` directly. The script enforces
hermetic environment parity with CI (unset credential vars, TZ=UTC, LANG=C.UTF-8,
4 xdist workers matching GHA ubuntu-latest). Direct `pytest` on a 16+ core
developer machine with API keys set diverges from CI in ways that have caused
multiple "works locally, fails in CI" incidents (and the reverse).
`-n auto` xdist workers, in-tree subprocess-isolation plugin). Direct `pytest`
on a 16+ core developer machine with API keys set diverges from CI in ways
that have caused multiple "works locally, fails in CI" incidents (and the reverse).
```bash
scripts/run_tests.sh # full suite, CI-parity
scripts/run_tests.sh tests/gateway/ # one directory
scripts/run_tests.sh tests/agent/test_foo.py::test_x # one test
scripts/run_tests.sh -v --tb=long # pass-through pytest flags
scripts/run_tests.sh --no-isolate tests/foo/ # disable subprocess isolation (faster, for debugging)
```
### Subprocess-per-test isolation
Every test runs in a freshly-spawned Python subprocess via the in-tree plugin
at `tests/_isolate_plugin.py`. This means module-level dicts/sets and
ContextVars from one test cannot leak into the next — the historic
`_reset_module_state` autouse fixture is gone.
Implementation notes:
- The plugin uses `multiprocessing.get_context("spawn")`, which works on
Linux, macOS, and Windows alike (POSIX `fork` is not used).
- Per-test overhead is ~0.51.0s (Python startup + pytest collection). xdist
parallelism amortizes this across cores; on a 20-core box the full suite
finishes in roughly the same wall time as before, but flake-free.
- `isolate_timeout` (configured in `pyproject.toml`) caps each test at 30s.
Hangs are killed and surfaced as a failure report.
- Pass `--no-isolate` to disable isolation — useful when debugging a single
test interactively, or when you specifically want to verify state leakage.
- The plugin disables itself in child processes (sentinel envvar
`HERMES_ISOLATE_CHILD=1`), so there's no fork-bomb risk.
### Why the wrapper (and why the old "just call pytest" doesn't work)
Five real sources of local-vs-CI drift the script closes:
@@ -1034,7 +1056,7 @@ Five real sources of local-vs-CI drift the script closes:
| HOME / `~/.hermes/` | Your real config+auth.json | Temp dir per test |
| Timezone | Local TZ (PDT etc.) | UTC |
| Locale | Whatever is set | C.UTF-8 |
| xdist workers | `-n auto` = all cores (20+ on a workstation) | `-n 4` matching CI |
| xdist workers | `-n auto` = all cores | `-n auto` (safe — subprocess isolation prevents cross-worker flakes) |
`tests/conftest.py` also enforces points 1-4 as an autouse fixture so ANY pytest
invocation (including IDE integrations) gets hermetic behavior — but the wrapper
@@ -1042,15 +1064,21 @@ is belt-and-suspenders.
### Running without the wrapper (only if you must)
If you can't use the wrapper (e.g. on Windows or inside an IDE that shells
pytest directly), at minimum activate the venv and pass `-n 4`:
If you can't use the wrapper (e.g. inside an IDE that shells pytest directly),
at minimum activate the venv. The isolation plugin loads automatically from
`addopts` in `pyproject.toml`, so you get the same per-test process isolation
either way.
```bash
source .venv/bin/activate # or: source venv/bin/activate
python -m pytest tests/ -q -n 4
python -m pytest tests/ -q
```
Worker count above 4 will surface test-ordering flakes that CI never sees.
If you need to bypass isolation for fast feedback while debugging:
```bash
python -m pytest tests/agent/test_foo.py -q --no-isolate
```
Always run the full suite before pushing changes.
+21
View File
@@ -79,6 +79,27 @@ hermes doctor # Diagnose any issues
📖 **[Full documentation →](https://hermes-agent.nousresearch.com/docs/)**
---
## Skip the API-key collection — Nous Portal
Hermes works with whatever provider you want — that's not changing. But if you'd rather not collect five separate API keys for the model, web search, image generation, TTS, and a cloud browser, **[Nous Portal](https://portal.nousresearch.com)** covers all of them under one subscription:
- **300+ models** — pick any of them with `/model <name>`
- **Tool Gateway** — web search (Firecrawl), image generation (FAL), text-to-speech (OpenAI), cloud browser (Browser Use), all routed through your sub. No extra accounts.
One command from a fresh install:
```bash
hermes setup --portal
```
That logs you in via OAuth, sets Nous as your provider, and turns on the Tool Gateway. Check what's wired up any time with `hermes portal status`. Full details on the [Tool Gateway docs page](https://hermes-agent.nousresearch.com/docs/user-guide/features/tool-gateway).
You can still bring your own keys per-tool whenever you want — the gateway is per-backend, not all-or-nothing.
---
## CLI vs Messaging Quick Reference
Hermes has two entry points: start the terminal UI with `hermes`, or run the gateway and talk to it from Telegram, Discord, Slack, WhatsApp, Signal, or Email. Once you're in a conversation, many slash commands are shared across both interfaces.
+21
View File
@@ -65,6 +65,27 @@ hermes doctor # 诊断问题
📖 **[完整文档 →](https://hermes-agent.nousresearch.com/docs/)**
---
## 省去到处收集 API Key — Nous Portal
Hermes 始终允许你使用任意服务商,这点不会改变。但如果你不想为模型、网页搜索、图像生成、TTS、云浏览器分别去申请五个不同的 API Key,**[Nous Portal](https://portal.nousresearch.com)** 用一个订阅就能覆盖全部:
- **300+ 模型** — 用 `/model <name>` 随时切换
- **Tool Gateway** — 网页搜索(Firecrawl)、图像生成(FAL)、文本转语音(OpenAI)、云浏览器(Browser Use),全部通过订阅托管。无需额外注册任何账户。
全新安装时一条命令即可:
```bash
hermes setup --portal
```
它会通过 OAuth 登录、把 Nous 设为推理服务商,并启用 Tool Gateway。随时用 `hermes portal status` 查看路由状态。完整说明见 [Tool Gateway 文档](https://hermes-agent.nousresearch.com/docs/user-guide/features/tool-gateway)。
你随时可以按工具单独切回自己的 API Key — Gateway 是按工具粒度生效的,不是一刀切。
---
## CLI 与消息平台 快速对照
Hermes 有两种入口:用 `hermes` 启动终端 UI,或运行网关从 Telegram、Discord、Slack、WhatsApp、Signal 或 Email 与之对话。进入对话后,许多斜杠命令在两种界面中通用。
+119 -3
View File
@@ -71,6 +71,71 @@ def _ra():
return run_agent
def _normalized_custom_base_url(value: Any) -> str:
if not isinstance(value, str):
return ""
return value.strip().rstrip("/")
def _custom_provider_model_matches(agent_model: str, entry: Dict[str, Any]) -> bool:
provider_model = str(entry.get("model", "") or "").strip().lower()
if not provider_model:
return True
return provider_model == str(agent_model or "").strip().lower()
def _custom_provider_extra_body_for_agent(
*,
provider: str,
model: str,
base_url: str,
custom_providers: List[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
if (provider or "").strip().lower() != "custom":
return None
target_url = _normalized_custom_base_url(base_url)
if not target_url:
return None
fallback: Optional[Dict[str, Any]] = None
for entry in custom_providers or []:
if not isinstance(entry, dict):
continue
if _normalized_custom_base_url(entry.get("base_url")) != target_url:
continue
extra_body = entry.get("extra_body")
if not isinstance(extra_body, dict) or not extra_body:
continue
provider_model = str(entry.get("model", "") or "").strip()
if provider_model:
if _custom_provider_model_matches(model, entry):
return dict(extra_body)
elif fallback is None:
fallback = dict(extra_body)
return fallback
def _merge_custom_provider_extra_body(agent, custom_providers: List[Dict[str, Any]]) -> None:
extra_body = _custom_provider_extra_body_for_agent(
provider=agent.provider,
model=agent.model,
base_url=agent.base_url,
custom_providers=custom_providers,
)
if not extra_body:
return
overrides = dict(getattr(agent, "request_overrides", {}) or {})
merged_extra_body = dict(extra_body)
existing_extra_body = overrides.get("extra_body")
if isinstance(existing_extra_body, dict):
merged_extra_body.update(existing_extra_body)
overrides["extra_body"] = merged_extra_body
agent.request_overrides = overrides
def init_agent(
agent,
base_url: str = None,
@@ -542,6 +607,31 @@ def init_agent(
# Falling back would send Anthropic credentials to third-party endpoints (Fixes #1739, #minimax-401).
_is_native_anthropic = agent.provider == "anthropic"
effective_key = (api_key or resolve_anthropic_token() or "") if _is_native_anthropic else (api_key or "")
# MiniMax OAuth issues short-lived (~15-min) access tokens. The
# Anthropic SDK caches ``api_key`` as a static string at client
# construction time, so a session that resolves the bearer once
# at startup will keep sending the same token until MiniMax
# returns 401 mid-session. Swap the static string for a callable
# token provider — ``build_anthropic_client`` recognizes the
# callable and installs an httpx event hook that mints a fresh
# bearer per outbound request (re-reading auth.json so a refresh
# persisted by another process is visible immediately).
# The cached refresh path is a no-op when the token still has
# ``MINIMAX_OAUTH_REFRESH_SKEW_SECONDS`` of life left, so steady-
# state cost is one file read + one timestamp compare per request.
if agent.provider == "minimax-oauth" and isinstance(effective_key, str) and effective_key:
try:
from hermes_cli.auth import build_minimax_oauth_token_provider
effective_key = build_minimax_oauth_token_provider()
except Exception as _mm_exc: # noqa: BLE001 — never block startup on this
import logging as _logging
_logging.getLogger(__name__).warning(
"MiniMax OAuth: failed to install per-request token provider "
"(%s); falling back to static bearer that will expire ~15min in.",
_mm_exc,
)
agent.api_key = effective_key
agent._anthropic_api_key = effective_key
agent._anthropic_base_url = base_url
@@ -553,7 +643,7 @@ def init_agent(
# that cause 401/403 on their endpoints. Guards #1739 and
# the third-party identity-injection bug.
from agent.anthropic_adapter import _is_oauth_token as _is_oat
agent._is_anthropic_oauth = _is_oat(effective_key) if _is_native_anthropic else False
agent._is_anthropic_oauth = _is_oat(effective_key) if (_is_native_anthropic and isinstance(effective_key, str)) else False
agent._anthropic_client = build_anthropic_client(effective_key, base_url, timeout=_provider_timeout)
# No OpenAI client needed for Anthropic mode
agent.client = None
@@ -1060,7 +1150,18 @@ def init_agent(
# through _ra().get_tool_definitions()). Duplicate function names cause
# 400 errors on providers that enforce unique names (e.g. Xiaomi
# MiMo via Nous Portal).
if agent._memory_manager and agent.tools is not None:
#
# Respect the platform's enabled_toolsets configuration (#5544):
# enabled_toolsets is None → no filter, inject (backward compat)
# "memory" in enabled_toolsets → user opted in, inject
# otherwise (incl. []) → user excluded memory, skip injection
#
# Without this gate, `platform_toolsets: telegram: []` still leaks memory
# provider tools (fact_store, etc.) into the tool surface — a 10x latency
# penalty on local models and a frequent trigger of tool-call loops.
if agent._memory_manager and agent.tools is not None and (
agent.enabled_toolsets is None or "memory" in agent.enabled_toolsets
):
_existing_tool_names = {
t.get("function", {}).get("name")
for t in agent.tools
@@ -1213,6 +1314,7 @@ def init_agent(
# Store for reuse by _check_compression_model_feasibility (auxiliary
# compression model context-length detection needs the same list).
agent._custom_providers = _custom_providers
_merge_custom_provider_extra_body(agent, _custom_providers)
# Check custom_providers per-model context_length
if _config_context_length is None and _custom_providers:
@@ -1369,8 +1471,22 @@ def init_agent(
# errors. Even with the cache fix, dedup is the right defense
# against plugin paths that may register the same schemas via
# ctx.register_tool(). Mirrors the memory tools dedup above.
#
# Respect the platform's enabled_toolsets configuration (#5544):
# context engine tools follow the same gating pattern as memory
# provider tools — without the gate, `platform_toolsets: telegram: []`
# would still leak lcm_* tools into the tool surface and incur the
# same local-model latency penalty.
agent._context_engine_tool_names: set = set()
if hasattr(agent, "context_compressor") and agent.context_compressor and agent.tools is not None:
if (
hasattr(agent, "context_compressor")
and agent.context_compressor
and agent.tools is not None
and (
agent.enabled_toolsets is None
or "context_engine" in agent.enabled_toolsets
)
):
_existing_tool_names = {
t.get("function", {}).get("name")
for t in agent.tools
+75 -20
View File
@@ -617,9 +617,28 @@ def recover_with_credential_pool(
# existing entitlement keyword set in ``_is_entitlement_failure``.
# Any 403 against ``xai-oauth`` is treated as entitlement here so
# the refresh loop can't spin in those cases either.
#
# Exception (#29344): xAI's ``[WKE=unauthenticated:...]`` suffix and
# the ``OAuth2 access token could not be validated`` phrasing are
# xAI's authoritative "this is a stale token, not entitlement"
# signal. When either fires we must NOT apply the catch-all
# override — refresh is the recoverable path for these bodies, and
# blanket-classifying them as entitlement was the bug that left
# long-running TUI sessions stuck on stale tokens until the user
# exited and reopened.
is_entitlement = agent._is_entitlement_failure(error_context, status_code)
if not is_entitlement and status_code == 403 and (agent.provider or "") == "xai-oauth":
is_entitlement = True
_disambiguator_haystack = " ".join(
str(error_context.get(k) or "").lower()
for k in ("message", "reason", "code", "error")
if isinstance(error_context, dict)
)
_is_xai_auth_failure = (
"[wke=unauthenticated:" in _disambiguator_haystack
or "oauth2 access token could not be validated" in _disambiguator_haystack
)
if not _is_xai_auth_failure:
is_entitlement = True
if is_entitlement:
_ra().logger.info(
"Credential %s — entitlement-shaped 403 from %s; "
@@ -1064,10 +1083,7 @@ def dump_api_request_debug(
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
dump_file = agent.logs_dir / f"request_dump_{agent.session_id}_{timestamp}.json"
dump_file.write_text(
json.dumps(dump_payload, ensure_ascii=False, indent=2, default=str),
encoding="utf-8",
)
atomic_json_write(dump_file, dump_payload, default=str)
agent._vprint(f"{agent.log_prefix}🧾 Request debug dump written to: {dump_file}")
@@ -1352,6 +1368,22 @@ def switch_model(agent, new_model, new_provider, api_key='', base_url='', api_mo
# API key — falling back would send Anthropic credentials to third-party endpoints.
_is_native_anthropic = new_provider == "anthropic"
effective_key = (api_key or agent.api_key or resolve_anthropic_token() or "") if _is_native_anthropic else (api_key or agent.api_key or "")
# MiniMax OAuth: swap static string for a per-request callable token
# provider so the rebuilt client survives 15-min token expiry. See
# the matching block in agent_init.py for the full rationale.
if new_provider == "minimax-oauth" and isinstance(effective_key, str) and effective_key:
try:
from hermes_cli.auth import build_minimax_oauth_token_provider
effective_key = build_minimax_oauth_token_provider()
except Exception as _mm_exc: # noqa: BLE001
import logging as _logging
_logging.getLogger(__name__).warning(
"MiniMax OAuth: failed to install per-request token provider "
"on switch (%s); using static bearer.",
_mm_exc,
)
agent.api_key = effective_key
agent._anthropic_api_key = effective_key
agent._anthropic_base_url = base_url or getattr(agent, "_anthropic_base_url", None)
@@ -1359,7 +1391,7 @@ def switch_model(agent, new_model, new_provider, api_key='', base_url='', api_mo
effective_key, agent._anthropic_base_url,
timeout=get_provider_request_timeout(agent.provider, agent.model),
)
agent._is_anthropic_oauth = _is_oauth_token(effective_key) if _is_native_anthropic else False
agent._is_anthropic_oauth = _is_oauth_token(effective_key) if (_is_native_anthropic and isinstance(effective_key, str)) else False
agent.client = None
agent._client_kwargs = {}
else:
@@ -2116,33 +2148,56 @@ def apply_pending_steer_to_tool_results(agent, messages: list, num_tool_msgs: in
def force_close_tcp_sockets(client: Any) -> int:
"""Force-close underlying TCP sockets to prevent CLOSE-WAIT accumulation.
"""Abort in-flight TCP I/O by shutting down sockets WITHOUT closing FDs.
When a provider drops a connection mid-stream, httpx's ``client.close()``
performs a graceful shutdown which leaves sockets in CLOSE-WAIT until the
OS times them out (often minutes). This method walks the httpx transport
pool and issues ``socket.shutdown(SHUT_RDWR)`` + ``socket.close()`` to
force an immediate TCP RST, freeing the file descriptors.
When a provider drops a connection mid-stream — or the user issues an
interrupt — we want to unblock httpx's reader/writer immediately rather
than waiting for the kernel's per-connection timeout. ``shutdown(SHUT_RDWR)``
achieves that: it sends FIN, breaks any pending ``recv``/``send`` with EOF
or ``EPIPE``, but does NOT release the file descriptor.
Returns the number of sockets force-closed.
Historically this helper also called ``socket.close()`` so the FD got
released immediately, but that's unsafe when (as is the case for both the
interrupt-abort path and stale-call kill path) the helper runs on a
different thread than the one driving the request:
* The Python ``socket.socket`` we close here is the SAME object held by
httpx's pool, so closing it via Python sets its ``_fd`` to -1 and
future operations on that Python object fail safely.
* BUT the SSL wrapper (``ssl.SSLSocket``'s underlying OpenSSL ``BIO``)
caches the raw integer FD. Once ``os.close(fd)`` runs, the kernel may
immediately recycle that integer to the next ``open()`` call — e.g.
the kanban dispatcher opening ``kanban.db``.
* The owning worker thread then unwinds httpx, the SSL layer flushes a
pending TLS record, and the encrypted bytes get written into the
wrong file (issue #29507: 24-byte TLS application-data record
clobbering SQLite header bytes 5..28).
The fix is to let the owning thread own the close. ``shutdown()`` from any
thread is FD-safe; ``close()`` is not. The httpx connection's own close
path — which runs from the worker thread when it unwinds — will release
the FD via the same ``socket.socket`` object, and because Python's socket
close atomically swaps ``_fd`` to -1 *before* issuing ``os.close``, there
is no FD-aliasing window when only one thread closes.
Returns the number of sockets shut down. (Field kept as
``tcp_force_closed=N`` in the log line for backwards-compatible parsing.)
"""
import socket as _socket
closed = 0
shutdown_count = 0
try:
for sock in _iter_pool_sockets(client):
try:
sock.shutdown(_socket.SHUT_RDWR)
except OSError:
# Already shut down / not connected / FD invalid — all benign.
pass
try:
sock.close()
except OSError:
pass
closed += 1
# IMPORTANT (#29507): do NOT call sock.close() here. See docstring.
shutdown_count += 1
except Exception as exc:
_ra().logger.debug("Force-close TCP sockets sweep error: %s", exc)
return closed
return shutdown_count
+254 -230
View File
@@ -1606,182 +1606,155 @@ def _content_parts_to_anthropic_blocks(parts: Any) -> List[Dict[str, Any]]:
return out
def convert_messages_to_anthropic(
messages: List[Dict],
base_url: str | None = None,
model: str | None = None,
) -> Tuple[Optional[Any], List[Dict]]:
"""Convert OpenAI-format messages to Anthropic format.
def _convert_assistant_message(m: Dict[str, Any]) -> Dict[str, Any]:
"""Convert an assistant message to Anthropic content blocks.
Returns (system_prompt, anthropic_messages).
System messages are extracted since Anthropic takes them as a separate param.
system_prompt is a string or list of content blocks (when cache_control present).
When *base_url* is provided and points to a third-party Anthropic-compatible
endpoint, all thinking block signatures are stripped. Signatures are
Anthropic-proprietary — third-party endpoints cannot validate them and will
reject them with HTTP 400 "Invalid signature in thinking block".
When *model* is provided and matches the Kimi / Moonshot family (or
*base_url* is a Kimi / Moonshot host), unsigned thinking blocks
synthesised from ``reasoning_content`` are preserved on replayed
assistant tool-call messages — Kimi requires the field to exist, even
if empty.
Handles thinking blocks, regular content, tool calls, and
reasoning_content injection for Kimi/DeepSeek endpoints.
"""
system = None
result = []
for m in messages:
role = m.get("role", "user")
content = m.get("content", "")
if role == "system":
if isinstance(content, list):
# Preserve cache_control markers on content blocks
has_cache = any(
p.get("cache_control") for p in content if isinstance(p, dict)
)
if has_cache:
system = [p for p in content if isinstance(p, dict)]
else:
system = "\n".join(
p["text"] for p in content if p.get("type") == "text"
)
else:
system = content
continue
if role == "assistant":
blocks = _extract_preserved_thinking_blocks(m)
if content:
if isinstance(content, list):
converted_content = _convert_content_to_anthropic(content)
if isinstance(converted_content, list):
blocks.extend(converted_content)
else:
blocks.append({"type": "text", "text": str(content)})
for tc in m.get("tool_calls", []):
if not tc or not isinstance(tc, dict):
continue
fn = tc.get("function", {})
args = fn.get("arguments", "{}")
try:
parsed_args = json.loads(args) if isinstance(args, str) else args
except (json.JSONDecodeError, ValueError):
parsed_args = {}
blocks.append({
"type": "tool_use",
"id": _sanitize_tool_id(tc.get("id", "")),
"name": fn.get("name", ""),
"input": parsed_args,
})
# Kimi's /coding endpoint (Anthropic protocol) requires assistant
# tool-call messages to carry reasoning_content when thinking is
# enabled server-side. Preserve it as a thinking block so Kimi
# can validate the message history. See hermes-agent#13848.
#
# Accept empty string "" — _copy_reasoning_content_for_api()
# injects "" as a tier-3 fallback for Kimi tool-call messages
# that had no reasoning. Kimi requires the field to exist, even
# if empty.
#
# Prepend (not append): Anthropic protocol requires thinking
# blocks before text and tool_use blocks.
#
# Guard: only add when reasoning_details didn't already contribute
# thinking blocks. On native Anthropic, reasoning_details produces
# signed thinking blocks — adding another unsigned one from
# reasoning_content would create a duplicate (same text) that gets
# downgraded to a spurious text block on the last assistant message.
reasoning_content = m.get("reasoning_content")
_already_has_thinking = any(
isinstance(b, dict) and b.get("type") in {"thinking", "redacted_thinking"}
for b in blocks
)
if isinstance(reasoning_content, str) and not _already_has_thinking:
blocks.insert(0, {"type": "thinking", "thinking": reasoning_content})
# Anthropic rejects empty assistant content
effective = blocks or content
if not effective or effective == "":
effective = [{"type": "text", "text": "(empty)"}]
result.append({"role": "assistant", "content": effective})
continue
if role == "tool":
# Sanitize tool_use_id and ensure non-empty content.
# Computer-use (and other multimodal) tool results arrive as
# either a list of OpenAI-style content parts, or a dict
# marked `_multimodal` with an embedded `content` list. Convert
# both into Anthropic `tool_result` inner blocks (text + image).
multimodal_blocks: Optional[List[Dict[str, Any]]] = None
if isinstance(content, dict) and content.get("_multimodal"):
multimodal_blocks = _content_parts_to_anthropic_blocks(
content.get("content") or []
)
# Fallback text if the conversion produced nothing usable.
if not multimodal_blocks and content.get("text_summary"):
multimodal_blocks = [
{"type": "text", "text": str(content["text_summary"])}
]
elif isinstance(content, list):
converted = _content_parts_to_anthropic_blocks(content)
if any(b.get("type") == "image" for b in converted):
multimodal_blocks = converted
# Back-compat: some callers stash blocks under a private key.
if multimodal_blocks is None:
stashed = m.get("_anthropic_content_blocks")
if isinstance(stashed, list) and stashed:
text_content = content if isinstance(content, str) and content.strip() else None
multimodal_blocks = (
[{"type": "text", "text": text_content}] + stashed
if text_content else list(stashed)
)
if multimodal_blocks:
result_content: Any = multimodal_blocks
elif isinstance(content, str):
result_content = content
else:
result_content = json.dumps(content) if content else "(no output)"
if not result_content:
result_content = "(no output)"
tool_result = {
"type": "tool_result",
"tool_use_id": _sanitize_tool_id(m.get("tool_call_id", "")),
"content": result_content,
}
if isinstance(m.get("cache_control"), dict):
tool_result["cache_control"] = dict(m["cache_control"])
# Merge consecutive tool results into one user message
if (
result
and result[-1]["role"] == "user"
and isinstance(result[-1]["content"], list)
and result[-1]["content"]
and result[-1]["content"][0].get("type") == "tool_result"
):
result[-1]["content"].append(tool_result)
else:
result.append({"role": "user", "content": [tool_result]})
continue
# Regular user message — validate non-empty content (Anthropic rejects empty)
content = m.get("content", "")
blocks = _extract_preserved_thinking_blocks(m)
if content:
if isinstance(content, list):
converted_blocks = _convert_content_to_anthropic(content)
# Check if all text blocks are empty
if not converted_blocks or all(
b.get("text", "").strip() == ""
for b in converted_blocks
if isinstance(b, dict) and b.get("type") == "text"
):
converted_blocks = [{"type": "text", "text": "(empty message)"}]
result.append({"role": "user", "content": converted_blocks})
converted_content = _convert_content_to_anthropic(content)
if isinstance(converted_content, list):
blocks.extend(converted_content)
else:
# Validate string content is non-empty
if not content or (isinstance(content, str) and not content.strip()):
content = "(empty message)"
result.append({"role": "user", "content": content})
blocks.append({"type": "text", "text": str(content)})
for tc in m.get("tool_calls", []):
if not tc or not isinstance(tc, dict):
continue
fn = tc.get("function", {})
args = fn.get("arguments", "{}")
try:
parsed_args = json.loads(args) if isinstance(args, str) else args
except (json.JSONDecodeError, ValueError):
parsed_args = {}
blocks.append({
"type": "tool_use",
"id": _sanitize_tool_id(tc.get("id", "")),
"name": fn.get("name", ""),
"input": parsed_args,
})
# Kimi's /coding endpoint (Anthropic protocol) requires assistant
# tool-call messages to carry reasoning_content when thinking is
# enabled server-side. Preserve it as a thinking block so Kimi
# can validate the message history. See hermes-agent#13848.
#
# Accept empty string "" — _copy_reasoning_content_for_api()
# injects "" as a tier-3 fallback for Kimi tool-call messages
# that had no reasoning. Kimi requires the field to exist, even
# if empty.
#
# Prepend (not append): Anthropic protocol requires thinking
# blocks before text and tool_use blocks.
#
# Guard: only add when reasoning_details didn't already contribute
# thinking blocks. On native Anthropic, reasoning_details produces
# signed thinking blocks — adding another unsigned one from
# reasoning_content would create a duplicate (same text) that gets
# downgraded to a spurious text block on the last assistant message.
reasoning_content = m.get("reasoning_content")
_already_has_thinking = any(
isinstance(b, dict) and b.get("type") in {"thinking", "redacted_thinking"}
for b in blocks
)
if isinstance(reasoning_content, str) and not _already_has_thinking:
blocks.insert(0, {"type": "thinking", "thinking": reasoning_content})
# Anthropic rejects empty assistant content
effective = blocks or content
if not effective or effective == "":
effective = [{"type": "text", "text": "(empty)"}]
return {"role": "assistant", "content": effective}
def _convert_tool_message_to_result(
result: List[Dict[str, Any]], m: Dict[str, Any]
) -> None:
"""Convert a tool message to an Anthropic tool_result, merging consecutive
results into one user message.
Mutates ``result`` in place — either appends a new user message or extends
the trailing user message's tool_result list.
"""
content = m.get("content", "")
multimodal_blocks: Optional[List[Dict[str, Any]]] = None
if isinstance(content, dict) and content.get("_multimodal"):
multimodal_blocks = _content_parts_to_anthropic_blocks(
content.get("content") or []
)
# Fallback text if the conversion produced nothing usable.
if not multimodal_blocks and content.get("text_summary"):
multimodal_blocks = [
{"type": "text", "text": str(content["text_summary"])}
]
elif isinstance(content, list):
converted = _content_parts_to_anthropic_blocks(content)
if any(b.get("type") == "image" for b in converted):
multimodal_blocks = converted
# Back-compat: some callers stash blocks under a private key.
if multimodal_blocks is None:
stashed = m.get("_anthropic_content_blocks")
if isinstance(stashed, list) and stashed:
text_content = content if isinstance(content, str) and content.strip() else None
multimodal_blocks = (
[{"type": "text", "text": text_content}] + stashed
if text_content else list(stashed)
)
if multimodal_blocks:
result_content: Any = multimodal_blocks
elif isinstance(content, str):
result_content = content
else:
result_content = json.dumps(content) if content else "(no output)"
if not result_content:
result_content = "(no output)"
tool_result = {
"type": "tool_result",
"tool_use_id": _sanitize_tool_id(m.get("tool_call_id", "")),
"content": result_content,
}
if isinstance(m.get("cache_control"), dict):
tool_result["cache_control"] = dict(m["cache_control"])
# Merge consecutive tool results into one user message
if (
result
and result[-1]["role"] == "user"
and isinstance(result[-1]["content"], list)
and result[-1]["content"]
and result[-1]["content"][0].get("type") == "tool_result"
):
result[-1]["content"].append(tool_result)
else:
result.append({"role": "user", "content": [tool_result]})
def _convert_user_message(content: Any) -> Dict[str, Any]:
"""Validate and convert a user message to anthropic format."""
if isinstance(content, list):
converted_blocks = _convert_content_to_anthropic(content)
if not converted_blocks or all(
b.get("text", "").strip() == ""
for b in converted_blocks
if isinstance(b, dict) and b.get("type") == "text"
):
converted_blocks = [{"type": "text", "text": "(empty message)"}]
return {"role": "user", "content": converted_blocks}
else:
if not content or (isinstance(content, str) and not content.strip()):
content = "(empty message)"
return {"role": "user", "content": content}
def _strip_orphaned_tool_blocks(result: List[Dict[str, Any]]) -> None:
"""Strip tool_use blocks with no matching tool_result, and vice versa.
Context compression or session truncation can remove either side of a
tool-call pair. Anthropic rejects both orphans with HTTP 400.
Mutates ``result`` in place.
"""
# Strip orphaned tool_use blocks (no matching tool_result follows)
tool_result_ids = set()
for m in result:
@@ -1799,10 +1772,7 @@ def convert_messages_to_anthropic(
if not m["content"]:
m["content"] = [{"type": "text", "text": "(tool call removed)"}]
# Strip orphaned tool_result blocks (no matching tool_use precedes them).
# This is the mirror of the above: context compression or session truncation
# can remove an assistant message containing a tool_use while leaving the
# subsequent tool_result intact. Anthropic rejects these with a 400.
# Strip orphaned tool_result blocks (no matching tool_use precedes them)
tool_use_ids = set()
for m in result:
if m["role"] == "assistant" and isinstance(m["content"], list):
@@ -1819,12 +1789,16 @@ def convert_messages_to_anthropic(
if not m["content"]:
m["content"] = [{"type": "text", "text": "(tool result removed)"}]
# Enforce strict role alternation (Anthropic rejects consecutive same-role messages)
def _merge_consecutive_roles(result: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Merge consecutive same-role messages to enforce Anthropic alternation.
Returns a new list (caller must rebind ``result``).
"""
fixed = []
for m in result:
if fixed and fixed[-1]["role"] == m["role"]:
if m["role"] == "user":
# Merge consecutive user messages
prev_content = fixed[-1]["content"]
curr_content = m["content"]
if isinstance(prev_content, str) and isinstance(curr_content, str):
@@ -1832,7 +1806,6 @@ def convert_messages_to_anthropic(
elif isinstance(prev_content, list) and isinstance(curr_content, list):
fixed[-1]["content"] = prev_content + curr_content
else:
# Mixed types — wrap string in list
if isinstance(prev_content, str):
prev_content = [{"type": "text", "text": prev_content}]
if isinstance(curr_content, str):
@@ -1855,7 +1828,6 @@ def convert_messages_to_anthropic(
elif isinstance(prev_blocks, str) and isinstance(curr_blocks, str):
fixed[-1]["content"] = prev_blocks + "\n" + curr_blocks
else:
# Mixed types — normalize both to list and merge
if isinstance(prev_blocks, str):
prev_blocks = [{"type": "text", "text": prev_blocks}]
if isinstance(curr_blocks, str):
@@ -1863,37 +1835,34 @@ def convert_messages_to_anthropic(
fixed[-1]["content"] = prev_blocks + curr_blocks
else:
fixed.append(m)
result = fixed
return fixed
# ── Thinking block signature management ──────────────────────────
# Anthropic signs thinking blocks against the full turn content.
# Any upstream mutation (context compression, session truncation,
# orphan stripping, message merging) invalidates the signature,
# causing HTTP 400 "Invalid signature in thinking block".
#
# Signatures are Anthropic-proprietary. Third-party endpoints
# (MiniMax, Microsoft Foundry, self-hosted proxies) cannot validate
# them and will reject them outright. When targeting a third-party
# endpoint, strip ALL thinking/redacted_thinking blocks from every
# assistant message — the third-party will generate its own
# thinking blocks if it supports extended thinking.
#
# For direct Anthropic (strategy following clawdbot/OpenClaw):
# 1. Strip thinking/redacted_thinking from all assistant messages
# EXCEPT the last one — preserves reasoning continuity on the
# current tool-use chain while avoiding stale signature errors.
# 2. Downgrade unsigned thinking blocks (no signature) to text —
# Anthropic can't validate them and will reject them.
# 3. Strip cache_control from thinking/redacted_thinking blocks —
# cache markers can interfere with signature validation.
def _manage_thinking_signatures(
result: List[Dict[str, Any]], base_url: str | None, model: str | None
) -> None:
"""Strip or preserve thinking blocks based on endpoint type.
Anthropic signs thinking blocks against the full turn content.
Any upstream mutation (context compression, session truncation, orphan
stripping, message merging) invalidates the signature, causing HTTP 400
"Invalid signature in thinking block".
Signatures are Anthropic-proprietary. Third-party endpoints (MiniMax,
Azure AI Foundry, AWS Bedrock, self-hosted proxies) cannot validate them
and will reject them outright. Kimi's /coding and DeepSeek's /anthropic
endpoints speak the Anthropic protocol upstream but require unsigned
thinking blocks (synthesised from ``reasoning_content``) to round-trip on
replayed assistant tool-call messages. See hermes-agent#13848 (Kimi) and
hermes-agent#16748 (DeepSeek).
Mutates ``result`` in place.
"""
_THINKING_TYPES = frozenset(("thinking", "redacted_thinking"))
_is_third_party = _is_third_party_anthropic_endpoint(base_url)
# Kimi /coding and DeepSeek /anthropic share a contract: both speak the
# Anthropic Messages protocol upstream but require that thinking blocks
# synthesised from reasoning_content round-trip on subsequent turns when
# thinking is enabled. Signed Anthropic blocks still have to be stripped
# (neither endpoint can validate Anthropic's signatures); unsigned blocks
# are preserved. See hermes-agent#13848 (Kimi) and #16748 (DeepSeek).
# Kimi / DeepSeek share a contract: strip signed Anthropic blocks
# (neither upstream can validate Anthropic signatures), preserve unsigned
# ones synthesised from reasoning_content. See #13848, #16748.
_preserve_unsigned_thinking = (
_is_kimi_family_endpoint(base_url, model)
or _is_deepseek_anthropic_endpoint(base_url)
@@ -1910,26 +1879,19 @@ def convert_messages_to_anthropic(
continue
if _preserve_unsigned_thinking:
# Kimi's /coding and DeepSeek's /anthropic endpoints both enable
# thinking server-side and require unsigned thinking blocks on
# replayed assistant tool-call messages. Strip signed Anthropic
# blocks (neither upstream can validate Anthropic signatures) but
# preserve the unsigned ones we synthesised from reasoning_content.
# Kimi / DeepSeek: strip signed, preserve unsigned.
new_content = []
for b in m["content"]:
if not isinstance(b, dict) or b.get("type") not in _THINKING_TYPES:
new_content.append(b)
continue
if b.get("signature") or b.get("data"):
# Anthropic-signed block — upstream can't validate, strip
# Signed (or redacted-with-data) — upstream can't validate, strip.
continue
# Unsigned thinking (synthesised from reasoning_content) —
# keep it: the upstream needs it for message-history validation.
new_content.append(b)
m["content"] = new_content or [{"type": "text", "text": "(empty)"}]
elif _is_third_party or idx != last_assistant_idx:
# Third-party endpoint: strip ALL thinking blocks from every
# assistant message — signatures are Anthropic-proprietary.
# Third-party: strip ALL thinking blocks (signatures are proprietary).
# Direct Anthropic: strip from non-latest assistant messages only.
stripped = [
b for b in m["content"]
@@ -1937,24 +1899,21 @@ def convert_messages_to_anthropic(
]
m["content"] = stripped or [{"type": "text", "text": "(thinking elided)"}]
else:
# Latest assistant on direct Anthropic: keep signed thinking
# blocks for reasoning continuity; downgrade unsigned ones to
# plain text.
# Latest assistant on direct Anthropic: keep signed, downgrade unsigned
# to text so the reasoning isn't lost.
new_content = []
for b in m["content"]:
if not isinstance(b, dict) or b.get("type") not in _THINKING_TYPES:
new_content.append(b)
continue
if b.get("type") == "redacted_thinking":
# Redacted blocks use 'data' for the signature payload
# Redacted blocks use 'data' for the signature payload
# drop the block when 'data' is missing (can't be validated).
if b.get("data"):
new_content.append(b)
# else: drop — no data means it can't be validated
elif b.get("signature"):
# Signed thinking block — keep it
new_content.append(b)
else:
# Unsigned thinking — downgrade to text so it's not lost
thinking_text = b.get("thinking", "")
if thinking_text:
new_content.append({"type": "text", "text": thinking_text})
@@ -1966,12 +1925,15 @@ def convert_messages_to_anthropic(
if isinstance(b, dict) and b.get("type") in _THINKING_TYPES:
b.pop("cache_control", None)
# ── Image eviction: keep only the most recent N screenshots ─────
# computer_use screenshots (base64 images) sit inside tool_result
# blocks: they accumulate and are sent with every API call. Each
# costs ~1,465 tokens; after 10+ the conversation becomes slow
# even for simple text queries. Walk backward, keep the most recent
# _MAX_KEEP_IMAGES, replace older ones with a text placeholder.
def _evict_old_screenshots(result: List[Dict[str, Any]]) -> None:
"""Keep only the most recent ``_MAX_KEEP_IMAGES`` computer-use screenshots.
Base64 images cost ~1,465 tokens each and accumulate across tool calls.
Walk backward, keep the most recent N, replace older ones with a placeholder.
Mutates ``result`` in place.
"""
_MAX_KEEP_IMAGES = 3
_image_count = 0
for msg in reversed(result):
@@ -1998,6 +1960,68 @@ def convert_messages_to_anthropic(
for b in inner
]
def convert_messages_to_anthropic(
messages: List[Dict],
base_url: str | None = None,
model: str | None = None,
) -> Tuple[Optional[Any], List[Dict]]:
"""Convert OpenAI-format messages to Anthropic format.
Returns (system_prompt, anthropic_messages).
System messages are extracted since Anthropic takes them as a separate param.
system_prompt is a string or list of content blocks (when cache_control present).
When *base_url* is provided and points to a third-party Anthropic-compatible
endpoint, all thinking block signatures are stripped. Signatures are
Anthropic-proprietary — third-party endpoints cannot validate them and will
reject them with HTTP 400 "Invalid signature in thinking block".
When *model* is provided and matches the Kimi / Moonshot family (or
*base_url* is a Kimi / Moonshot host), unsigned thinking blocks
synthesised from ``reasoning_content`` are preserved on replayed
assistant tool-call messages — Kimi requires the field to exist, even
if empty.
"""
system = None
result: List[Dict[str, Any]] = []
for m in messages:
role = m.get("role", "user")
content = m.get("content", "")
if role == "system":
if isinstance(content, list):
# Preserve cache_control markers on content blocks
has_cache = any(
p.get("cache_control") for p in content if isinstance(p, dict)
)
if has_cache:
system = [p for p in content if isinstance(p, dict)]
else:
system = "\n".join(
p["text"] for p in content if p.get("type") == "text"
)
else:
system = content
continue
if role == "assistant":
result.append(_convert_assistant_message(m))
continue
if role == "tool":
_convert_tool_message_to_result(result, m)
continue
# Regular user message
result.append(_convert_user_message(content))
_strip_orphaned_tool_blocks(result)
result = _merge_consecutive_roles(result)
_manage_thinking_signatures(result, base_url, model)
_evict_old_screenshots(result)
return system, result
+64 -8
View File
@@ -91,23 +91,55 @@ def interruptible_api_call(agent, api_kwargs: dict):
provider fallback.
"""
result = {"response": None, "error": None}
request_client_holder = {"client": None}
request_client_holder = {"client": None, "owner_tid": None}
request_client_lock = threading.Lock()
def _set_request_client(client):
with request_client_lock:
request_client_holder["client"] = client
# #29507: stamp the owning thread so a stranger-thread interrupt
# only shuts the connection down rather than racing the worker
# for FD ownership during ``client.close()``.
request_client_holder["owner_tid"] = threading.get_ident()
return client
def _take_request_client():
with request_client_lock:
client = request_client_holder.get("client")
request_client_holder["client"] = None
request_client_holder["owner_tid"] = None
return client
def _close_request_client_once(reason: str) -> None:
request_client = _take_request_client()
if request_client is not None:
# #29507: dispatch on the calling thread.
#
# When ``_call`` (the worker) reaches its ``finally`` it owns the
# close and we pop + fully close as before. When a *stranger* thread
# (the interrupt-check loop, the stale-call detector) drives the
# close, only shut the sockets down so the worker's blocked
# ``recv``/``send`` unwinds with an ``EPIPE`` / EOF — and let the
# worker close ``client`` from its own thread on its way out. That
# avoids the FD-recycling race where the kernel reassigned a
# just-closed TLS socket FD to ``kanban.db``, and the still-live SSL
# BIO on the worker thread then wrote a 24-byte TLS application-data
# record into the SQLite header (#29507).
with request_client_lock:
request_client = request_client_holder.get("client")
owner_tid = request_client_holder.get("owner_tid")
stranger_thread = (
request_client is not None
and owner_tid is not None
and owner_tid != threading.get_ident()
)
if not stranger_thread:
# Owning thread (or no recorded owner) → pop and fully close.
request_client_holder["client"] = None
request_client_holder["owner_tid"] = None
if request_client is None:
return
if stranger_thread:
agent._abort_request_openai_client(request_client, reason=reason)
else:
agent._close_request_openai_client(request_client, reason=reason)
def _call():
@@ -776,8 +808,11 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
from hermes_cli.model_normalize import normalize_model_for_provider
fb_model = normalize_model_for_provider(fb_model, fb_provider)
except Exception:
pass
except Exception as _norm_err:
logger.warning(
"Could not normalize fallback model %r for provider %r: %s",
fb_model, fb_provider, _norm_err,
)
# Determine api_mode from provider / base URL / model
fb_api_mode = "chat_completions"
@@ -1271,23 +1306,44 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
return result["response"]
result = {"response": None, "error": None, "partial_tool_names": []}
request_client_holder = {"client": None, "diag": None}
request_client_holder = {"client": None, "diag": None, "owner_tid": None}
request_client_lock = threading.Lock()
def _set_request_client(client):
with request_client_lock:
request_client_holder["client"] = client
# See #29507 explanation in the non-streaming variant above.
request_client_holder["owner_tid"] = threading.get_ident()
return client
def _take_request_client():
with request_client_lock:
client = request_client_holder.get("client")
request_client_holder["client"] = None
request_client_holder["owner_tid"] = None
return client
def _close_request_client_once(reason: str) -> None:
request_client = _take_request_client()
if request_client is not None:
# See #29507 explanation in the non-streaming variant above. A
# stranger thread (the interrupt-check / stale-stream detector loop)
# only aborts sockets — never pops, never calls ``client.close()`` —
# so the worker thread retains ownership of the FD release.
with request_client_lock:
request_client = request_client_holder.get("client")
owner_tid = request_client_holder.get("owner_tid")
stranger_thread = (
request_client is not None
and owner_tid is not None
and owner_tid != threading.get_ident()
)
if not stranger_thread:
request_client_holder["client"] = None
request_client_holder["owner_tid"] = None
if request_client is None:
return
if stranger_thread:
agent._abort_request_openai_client(request_client, reason=reason)
else:
agent._close_request_openai_client(request_client, reason=reason)
first_delta_fired = {"done": False}
+98 -1
View File
@@ -46,6 +46,7 @@ from agent.message_sanitization import (
_strip_non_ascii,
)
from agent.model_metadata import (
MINIMUM_CONTEXT_LENGTH,
estimate_messages_tokens_rough,
estimate_request_tokens_rough,
get_next_probe_tier,
@@ -73,6 +74,50 @@ from utils import base_url_host_matches, env_var_enabled
logger = logging.getLogger(__name__)
def _ollama_context_limit_error(agent: Any, request_tokens: int) -> Optional[str]:
"""Return a user-facing error when Ollama is loaded with too little context."""
if not getattr(agent, "tools", None):
return None
runtime_ctx = getattr(agent, "_ollama_num_ctx", None)
if not isinstance(runtime_ctx, int) or runtime_ctx <= 0:
return None
if runtime_ctx >= MINIMUM_CONTEXT_LENGTH:
return None
model = getattr(agent, "model", "") or "the selected model"
base_url = getattr(agent, "base_url", "") or "unknown base URL"
provider = getattr(agent, "provider", "") or "unknown"
tool_count = len(getattr(agent, "tools", None) or [])
logger.warning(
"Ollama runtime context too small for Hermes tool use: "
"model=%s provider=%s base_url=%s runtime_context=%d "
"minimum_context=%d estimated_request_tokens=%d tool_count=%d "
"session=%s",
model,
provider,
base_url,
runtime_ctx,
MINIMUM_CONTEXT_LENGTH,
request_tokens,
tool_count,
getattr(agent, "session_id", None) or "none",
)
return (
f"Ollama loaded `{model}` with only {runtime_ctx:,} tokens of runtime "
f"context, but Hermes needs at least {MINIMUM_CONTEXT_LENGTH:,} tokens "
"for reliable tool use.\n\n"
"Increase the Ollama context for this model and restart/reload the "
"model before trying again. A known-good starting point is 65,536 "
"tokens. In Hermes config, set `model.ollama_num_ctx: 65536` "
"(and `model.context_length: 65536` if you also override the displayed "
"model context). If you manage the model through an Ollama Modelfile, "
"set `PARAMETER num_ctx 65536` there instead."
)
def _ra():
"""Lazy reference to ``run_agent`` so callers can patch
``run_agent.handle_function_call`` / ``run_agent._set_interrupt`` /
@@ -527,6 +572,7 @@ def run_conversation(
api_call_count = 0
final_response = None
interrupted = False
failed = False
codex_ack_continuations = 0
length_continue_retries = 0
truncated_tool_call_retries = 0
@@ -883,6 +929,26 @@ def run_conversation(
# Calculate approximate request size for logging
total_chars = sum(len(str(msg)) for msg in api_messages)
approx_tokens = estimate_messages_tokens_rough(api_messages)
approx_request_tokens = estimate_request_tokens_rough(
api_messages, tools=agent.tools or None
)
_runtime_context_error = _ollama_context_limit_error(
agent, approx_request_tokens
)
if _runtime_context_error:
final_response = _runtime_context_error
failed = True
_turn_exit_reason = "ollama_runtime_context_too_small"
messages.append({"role": "assistant", "content": final_response})
agent._emit_status("❌ Ollama runtime context is too small for Hermes tool use")
api_call_count -= 1
agent._api_call_count = api_call_count
try:
agent.iteration_budget.refund()
except Exception:
pass
break
# Thinking spinner for quiet mode (animated during API call)
thinking_spinner = None
@@ -923,6 +989,7 @@ def run_conversation(
copilot_auth_retry_attempted=False
thinking_sig_retry_attempted = False
image_shrink_retry_attempted = False
multimodal_tool_content_retry_attempted = False
oauth_1m_beta_retry_attempted = False
llama_cpp_grammar_retry_attempted = False
has_retried_429 = False
@@ -1994,6 +2061,31 @@ def run_conversation(
"or shrink didn't reduce size; surfacing original error."
)
# Multimodal-tool-content recovery: providers that follow
# the OpenAI spec strictly (tool message content must be a
# string) reject our list-type content with a 400. Strip
# image parts from any list-type tool messages, mark the
# (provider, model) as no-list-tool-content for the rest
# of this session so future tool results preemptively
# downgrade, and retry once. See issue #27344.
if (
classified.reason == FailoverReason.multimodal_tool_content_unsupported
and not multimodal_tool_content_retry_attempted
):
multimodal_tool_content_retry_attempted = True
if agent._try_strip_image_parts_from_tool_messages(api_messages):
agent._vprint(
f"{agent.log_prefix}📐 Provider rejected list-type tool content — "
f"downgraded screenshots to text and retrying...",
force=True,
)
continue
else:
logger.info(
"multimodal-tool-content recovery: no list-type tool "
"messages with image parts found; surfacing original error."
)
# Anthropic OAuth subscription rejected the 1M-context beta
# header ("long context beta is not yet available for this
# subscription"). Disable the beta for the rest of this
@@ -3848,7 +3940,11 @@ def run_conversation(
)
# Determine if conversation completed successfully
completed = final_response is not None and api_call_count < agent.max_iterations
completed = (
final_response is not None
and api_call_count < agent.max_iterations
and not failed
)
# Save trajectory if enabled. ``user_message`` may be a multimodal
# list of parts; the trajectory format wants a plain string.
@@ -3998,6 +4094,7 @@ def run_conversation(
"api_calls": api_call_count,
"completed": completed,
"turn_exit_reason": _turn_exit_reason,
"failed": failed,
"partial": False, # True only when stopped due to invalid tool calls
"interrupted": interrupted,
"response_previewed": getattr(agent, "_response_was_previewed", False),
+4 -1
View File
@@ -50,6 +50,7 @@ from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
from hermes_constants import get_hermes_home
from agent.skill_utils import is_excluded_skill_path
logger = logging.getLogger(__name__)
@@ -176,7 +177,9 @@ def get_keep() -> int:
def _count_skill_files(base: Path) -> int:
try:
return sum(1 for _ in base.rglob("SKILL.md"))
return sum(
1 for p in base.rglob("SKILL.md") if not is_excluded_skill_path(p)
)
except OSError:
return 0
+47
View File
@@ -50,6 +50,7 @@ class FailoverReason(enum.Enum):
# Request format
format_error = "format_error" # 400 bad request — abort or strip + retry
multimodal_tool_content_unsupported = "multimodal_tool_content_unsupported" # Provider rejected list-type content in tool messages (e.g. Xiaomi MiMo) — downgrade to text and retry
# Provider-specific
thinking_signature = "thinking_signature" # Anthropic thinking block sig invalid
@@ -165,6 +166,32 @@ _IMAGE_TOO_LARGE_PATTERNS = [
# the likely culprit; we still try the shrink path before giving up.
]
# Providers that follow the OpenAI spec strictly require tool message
# ``content`` to be a string. Some (Anthropic native, Codex Responses,
# Gemini native, first-party OpenAI) extend this to accept a content-parts
# list (text + image_url) so screenshots from computer_use survive. Others
# (Xiaomi MiMo, some Alibaba endpoints, a long tail of OpenAI-compatible
# providers) reject the list with a 400 — the patterns below are the most
# common error shapes we see. Recovery: strip image parts from tool
# messages in-place, record the (provider, model) for the rest of the
# session so we don't waste another call learning the same lesson, retry.
#
# See: https://github.com/NousResearch/hermes-agent/issues/27344
_MULTIMODAL_TOOL_CONTENT_PATTERNS = [
# Xiaomi MiMo: {"error":{"code":"400","message":"Param Incorrect","param":"text is not set"}}
"text is not set",
# Generic "tool message must be string" shapes
"tool message content must be a string",
"tool content must be a string",
"tool message must be a string",
# OpenAI-compat servers that reject list-type tool content with a
# schema-validation message
"expected string, got list",
"expected string, got array",
# Alibaba/DashScope variant
"tool_call.content must be string",
]
# Context overflow patterns
_CONTEXT_OVERFLOW_PATTERNS = [
"context length",
@@ -781,6 +808,19 @@ def _classify_400(
) -> ClassifiedError:
"""Classify 400 Bad Request — context overflow, format error, or generic."""
# Multimodal tool content rejected from 400. Must be checked BEFORE
# image_too_large because the recovery is different (strip image parts
# from tool messages, mark the model as no-list-tool-content for the
# rest of the session) and BEFORE context_overflow because some of the
# patterns ("text is not set") are ambiguous in isolation but become
# specific when combined with a 400 on a request known to contain
# multimodal tool content.
if any(p in error_msg for p in _MULTIMODAL_TOOL_CONTENT_PATTERNS):
return result_fn(
FailoverReason.multimodal_tool_content_unsupported,
retryable=True,
)
# Image-too-large from 400 (Anthropic's 5 MB per-image check fires this way).
# Must be checked BEFORE context_overflow because messages can trip both
# patterns ("exceeds" + "image") and image-shrink is a cheaper recovery.
@@ -922,6 +962,13 @@ def _classify_by_message(
should_compress=True,
)
# Multimodal tool content patterns (from message text when no status_code)
if any(p in error_msg for p in _MULTIMODAL_TOOL_CONTENT_PATTERNS):
return result_fn(
FailoverReason.multimodal_tool_content_unsupported,
retryable=True,
)
# Image-too-large patterns (from message text when no status_code)
if any(p in error_msg for p in _IMAGE_TOO_LARGE_PATTERNS):
return result_fn(
+142 -11
View File
@@ -97,6 +97,37 @@ def is_write_denied(path: str) -> bool:
if resolved.startswith(prefix):
return True
# Hermes control-plane files: block both the ACTIVE profile's view
# (hermes_home) AND the global root view. Without the root pass, a
# profile-mode session leaves <root>/auth.json + <root>/config.yaml
# writable — letting a prompt-injected write_file overwrite the global
# files that every profile inherits from (same shape as #15981).
control_file_names = ("auth.json", "config.yaml", "webhook_subscriptions.json")
mcp_tokens_dir_name = "mcp-tokens"
hermes_dirs = []
for base in (_hermes_home_path(), _hermes_root_path()):
try:
real = os.path.realpath(base)
if real not in hermes_dirs:
hermes_dirs.append(real)
except Exception:
continue
for base_real in hermes_dirs:
for name in control_file_names:
try:
if resolved == os.path.realpath(os.path.join(base_real, name)):
return True
except Exception:
continue
try:
mcp_real = os.path.realpath(os.path.join(base_real, mcp_tokens_dir_name))
if resolved == mcp_real or resolved.startswith(mcp_real + os.sep):
return True
except Exception:
pass
safe_root = get_safe_write_root()
if safe_root and not (resolved == safe_root or resolved.startswith(safe_root + os.sep)):
return True
@@ -105,21 +136,121 @@ def is_write_denied(path: str) -> bool:
def get_read_block_error(path: str) -> Optional[str]:
"""Return an error message when a read targets internal Hermes cache files."""
"""Return an error message when a read targets a denied Hermes path.
Two categories are blocked:
* Internal Hermes cache files under ``HERMES_HOME/skills/.hub``
readable metadata that an attacker could use as a prompt-injection
carrier.
* Credential / secret stores under HERMES_HOME and the global Hermes
root: ``auth.json``, ``auth.lock``, ``.anthropic_oauth.json``,
``.env``, ``webhook_subscriptions.json``, and anything under
``mcp-tokens/``. These hold plaintext provider keys, OAuth tokens,
and HMAC secrets that the agent never needs to read directly
provider tools / gateway adapters consume them through internal
channels.
**This is NOT a security boundary.** The terminal tool runs as the
same OS user with shell access; the agent can still ``cat auth.json``
or ``cat ~/.hermes/.env`` and exfiltrate the file. The read-deny exists
as defense-in-depth that:
* Returns a clear error to models that respect tool denials, which
empirically prompts most modern models to stop rather than reach
for the shell.
* Surfaces a visible audit trail when something tries to read
credentials easier to spot in logs than a generic ``cat``.
Treat any user-visible framing around this as "may help" rather than
"stops attackers." A determined model or malicious instruction can
always shell out.
Callers that resolve relative paths against a non-process cwd
(e.g. ``TERMINAL_CWD`` in ``tools/file_tools.py``) MUST pre-resolve
and pass the absolute path string. This function's own ``resolve()``
is anchored at the Python process cwd, so a relative input like
``"auth.json"`` would otherwise miss the denylist when the task's
terminal cwd differs from the process cwd.
"""
resolved = Path(path).expanduser().resolve()
hermes_home = _hermes_home_path().resolve()
blocked_dirs = [
hermes_home / "skills" / ".hub" / "index-cache",
hermes_home / "skills" / ".hub",
]
for blocked in blocked_dirs:
# Resolve BOTH the active HERMES_HOME (profile-aware) AND the global
# Hermes root so credential stores at <root>/auth.json etc. are also
# blocked when running under a profile (HERMES_HOME points at
# <root>/profiles/<name> in profile mode). Same shape as the write
# deny widening (#15981, #14157).
hermes_dirs: list[Path] = []
for base in (_hermes_home_path(), _hermes_root_path()):
try:
resolved.relative_to(blocked)
real = base.resolve()
if real not in hermes_dirs:
hermes_dirs.append(real)
except Exception:
continue
# Skills .hub: prompt-injection carriers.
for hd in hermes_dirs:
blocked_dirs = [
hd / "skills" / ".hub" / "index-cache",
hd / "skills" / ".hub",
]
for blocked in blocked_dirs:
try:
resolved.relative_to(blocked)
except ValueError:
continue
return (
f"Access denied: {path} is an internal Hermes cache file "
"and cannot be read directly to prevent prompt injection. "
"Use the skills_list or skill_view tools instead."
)
# Credential / secret stores. Exact-file matches under either
# HERMES_HOME or <root>.
credential_file_names = (
"auth.json",
"auth.lock",
".anthropic_oauth.json",
".env",
"webhook_subscriptions.json",
)
for hd in hermes_dirs:
for name in credential_file_names:
try:
blocked = (hd / name).resolve()
except Exception:
continue
if resolved == blocked:
return (
f"Access denied: {path} is a Hermes credential store "
"and cannot be read directly. Provider tools consume "
"these credentials through internal channels. "
"(Defense-in-depth — not a security boundary; the "
"terminal tool can still bypass.)"
)
# mcp-tokens/: directory prefix match — anything inside is OAuth
# token material.
for hd in hermes_dirs:
try:
mcp_tokens = (hd / "mcp-tokens").resolve()
except Exception:
continue
if resolved == mcp_tokens:
return (
f"Access denied: {path} is the Hermes MCP token directory "
"and cannot be read directly. (Defense-in-depth — not a "
"security boundary; the terminal tool can still bypass.)"
)
try:
resolved.relative_to(mcp_tokens)
except ValueError:
continue
return (
f"Access denied: {path} is an internal Hermes cache file "
"and cannot be read directly to prevent prompt injection. "
"Use the skills_list or skill_view tools instead."
f"Access denied: {path} is a Hermes MCP token file "
"and cannot be read directly. (Defense-in-depth — not a "
"security boundary; the terminal tool can still bypass.)"
)
return None
+1
View File
@@ -209,6 +209,7 @@ DEFAULT_CONTEXT_LENGTHS = {
# via a custom provider. Values sourced from models.dev (2026-04).
# Keys use substring matching (longest-first), so e.g. "grok-4.20"
# matches "grok-4.20-0309-reasoning" / "-non-reasoning" / "-multi-agent-0309".
"grok-build": 256000, # grok-build-0.1
"grok-code-fast": 256000, # grok-code-fast-1
"grok-4-1-fast": 2000000, # grok-4-1-fast-(non-)reasoning
"grok-2-vision": 8192, # grok-2-vision, -1212, -latest
+3
View File
@@ -167,6 +167,9 @@ PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
"gemini": "google",
"google": "google",
"xai": "xai",
# xAI OAuth is an authentication/transport path for the same xAI model
# catalog, so model metadata should resolve through the xAI provider.
"xai-oauth": "xai",
"xiaomi": "xiaomi",
"nvidia": "nvidia",
"groq": "groq",
+13
View File
@@ -0,0 +1,13 @@
"""External secret source integrations.
A secret source is anything that can supply environment-variable-shaped
credentials at process startup, _after_ ~/.hermes/.env has loaded. By
default sources are non-destructive: they only set values for env vars
that aren't already present, so .env and shell exports continue to win.
Currently shipped:
- ``bitwarden`` Bitwarden Secrets Manager (`bws` CLI). See
``agent.secret_sources.bitwarden`` for the integration and
``hermes_cli.secrets_cli`` for the user-facing setup wizard.
"""
+515
View File
@@ -0,0 +1,515 @@
"""Bitwarden Secrets Manager (`bws` CLI) integration.
Hermes pulls API keys from Bitwarden Secrets Manager at process startup
so they don't have to live in plaintext in ``~/.hermes/.env``.
Design summary
--------------
* The ``bws`` binary is auto-installed into ``<hermes_home>/bin/bws`` on
first use. Hermes pins one version (``_BWS_VERSION``) and downloads
the matching asset from the official GitHub Releases page, verifying
the SHA-256 against the release's published checksum file.
* The access token is stored in ``~/.hermes/.env`` as
``BWS_ACCESS_TOKEN`` (or whatever name the user picked in
``secrets.bitwarden.access_token_env``). This is the one
bootstrap secret every other provider key can live in Bitwarden.
* Pulling secrets is a single ``bws secret list <project_id>
--output json`` call. We cache the result in-process for
``cache_ttl_seconds`` so back-to-back ``hermes`` invocations don't
hammer the API.
* Failures NEVER block Hermes startup. Missing binary, no network,
expired token, etc. all emit a one-line warning and continue with
whatever credentials ``.env`` already had.
The module is intentionally subprocess-driven rather than going through
the ``bitwarden-sdk-secrets`` Python package: one cross-platform binary
is easier to lazy-install than a wheels-with-Rust-extension dependency.
"""
from __future__ import annotations
import hashlib
import json
import logging
import os
import platform
import shutil
import stat
import subprocess
import sys
import tempfile
import time
import urllib.error
import urllib.request
import zipfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Optional, Tuple
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Configuration constants
# ---------------------------------------------------------------------------
# Pinned upstream version. Bump in a follow-up PR — never auto-resolve
# "latest" because upstream release shape (asset names, CLI flags) is
# allowed to change between majors and we want updates to be deliberate.
_BWS_VERSION = "2.0.0"
_BWS_RELEASE_BASE = (
f"https://github.com/bitwarden/sdk-sm/releases/download/bws-v{_BWS_VERSION}"
)
_BWS_CHECKSUM_NAME = f"bws-sha256-checksums-{_BWS_VERSION}.txt"
# How long to wait for bws subprocesses and HTTP downloads, in seconds.
_BWS_DOWNLOAD_TIMEOUT = 60
_BWS_RUN_TIMEOUT = 30
# In-process cache so repeated load_hermes_dotenv() calls (CLI startup,
# gateway hot-reload, test suites) don't re-fetch from BSM.
_CacheKey = Tuple[str, str] # (access_token_fingerprint, project_id)
_CACHE: Dict[_CacheKey, "_CachedFetch"] = {}
@dataclass
class _CachedFetch:
secrets: Dict[str, str]
fetched_at: float
def is_fresh(self, ttl_seconds: float) -> bool:
if ttl_seconds <= 0:
return False
return (time.time() - self.fetched_at) < ttl_seconds
# ---------------------------------------------------------------------------
# Public dataclasses
# ---------------------------------------------------------------------------
@dataclass
class FetchResult:
"""Outcome of a single BSM pull."""
secrets: Dict[str, str] = field(default_factory=dict)
applied: List[str] = field(default_factory=list) # set into os.environ
skipped: List[str] = field(default_factory=list) # already set, not overridden
warnings: List[str] = field(default_factory=list) # non-fatal issues
error: Optional[str] = None # fatal: nothing was fetched
binary_path: Optional[Path] = None
@property
def ok(self) -> bool:
return self.error is None
# ---------------------------------------------------------------------------
# Binary discovery + lazy install
# ---------------------------------------------------------------------------
def _hermes_bin_dir() -> Path:
"""Where Hermes stores its managed binaries. Profile-aware."""
from hermes_constants import get_hermes_home
return get_hermes_home() / "bin"
def find_bws(*, install_if_missing: bool = False) -> Optional[Path]:
"""Return a path to a usable ``bws`` binary, or None.
Resolution order:
1. ``<hermes_home>/bin/bws`` (our managed copy preferred)
2. ``shutil.which("bws")`` (system PATH)
When ``install_if_missing`` is True and neither resolves, this calls
:func:`install_bws` to download and verify the pinned version.
"""
managed = _hermes_bin_dir() / _platform_binary_name()
if managed.exists() and os.access(managed, os.X_OK):
return managed
system = shutil.which("bws")
if system:
return Path(system)
if install_if_missing:
try:
return install_bws()
except Exception as exc: # noqa: BLE001 — never block startup
logger.warning("bws auto-install failed: %s", exc)
return None
return None
def _platform_binary_name() -> str:
return "bws.exe" if platform.system() == "Windows" else "bws"
def _platform_asset_name() -> str:
"""Map (uname, arch, libc) → the upstream asset filename.
Asset names follow Rust's target triple convention. Linux defaults
to gnu (glibc); we switch to musl only if ldd --version says so.
"""
system = platform.system()
machine = platform.machine().lower()
if system == "Darwin":
# Universal binary works on both Intel and Apple Silicon — no
# need to pick a per-arch asset.
return f"bws-macos-universal-{_BWS_VERSION}.zip"
if system == "Windows":
arch = "aarch64" if machine in ("arm64", "aarch64") else "x86_64"
return f"bws-{arch}-pc-windows-msvc-{_BWS_VERSION}.zip"
if system == "Linux":
arch = "aarch64" if machine in ("arm64", "aarch64") else "x86_64"
libc = "gnu"
# ldd --version writes to stderr on glibc, stdout on musl. We
# don't need bullet-proof detection — getting it wrong falls
# back to a clear error from the binary loader, which we catch.
try:
res = subprocess.run(
["ldd", "--version"],
capture_output=True,
text=True,
timeout=2,
)
if "musl" in (res.stdout + res.stderr).lower():
libc = "musl"
except (OSError, subprocess.TimeoutExpired):
pass
return f"bws-{arch}-unknown-linux-{libc}-{_BWS_VERSION}.zip"
raise RuntimeError(
f"Unsupported platform for bws auto-install: {system} {machine}"
)
def install_bws(*, force: bool = False) -> Path:
"""Download, verify, and install the pinned ``bws`` binary.
Returns the path to the installed executable. Raises on any
failure (network, checksum, extraction) callers in the auto-install
path catch these; the user-facing ``hermes secrets bitwarden setup``
surface lets them propagate so the wizard can show a clear error.
"""
bin_dir = _hermes_bin_dir()
bin_dir.mkdir(parents=True, exist_ok=True)
target = bin_dir / _platform_binary_name()
if target.exists() and not force:
return target
asset_name = _platform_asset_name()
asset_url = f"{_BWS_RELEASE_BASE}/{asset_name}"
checksum_url = f"{_BWS_RELEASE_BASE}/{_BWS_CHECKSUM_NAME}"
with tempfile.TemporaryDirectory(prefix="hermes-bws-") as tmpdir:
tmp = Path(tmpdir)
zip_path = tmp / asset_name
checksum_path = tmp / _BWS_CHECKSUM_NAME
logger.info("Downloading %s", asset_url)
_http_download(asset_url, zip_path)
_http_download(checksum_url, checksum_path)
expected = _expected_sha256(checksum_path, asset_name)
actual = _sha256_file(zip_path)
if expected.lower() != actual.lower():
raise RuntimeError(
f"Checksum mismatch for {asset_name}: "
f"expected {expected}, got {actual}"
)
with zipfile.ZipFile(zip_path) as zf:
member = _pick_zip_member(zf, _platform_binary_name())
zf.extract(member, tmp)
extracted = tmp / member
# Move into place atomically. We write to a sibling tempfile in
# the final directory so the rename can't cross filesystems.
fd, staged = tempfile.mkstemp(dir=str(bin_dir), prefix=".bws_")
os.close(fd)
shutil.copy2(extracted, staged)
os.chmod(
staged,
stat.S_IRUSR | stat.S_IWUSR | stat.S_IXUSR
| stat.S_IRGRP | stat.S_IXGRP
| stat.S_IROTH | stat.S_IXOTH,
)
os.replace(staged, target)
logger.info("Installed bws %s at %s", _BWS_VERSION, target)
return target
def _http_download(url: str, dest: Path) -> None:
req = urllib.request.Request(url, headers={"User-Agent": "hermes-agent"})
try:
with urllib.request.urlopen(req, timeout=_BWS_DOWNLOAD_TIMEOUT) as resp: # noqa: S310
with open(dest, "wb") as f:
shutil.copyfileobj(resp, f)
except urllib.error.URLError as exc:
raise RuntimeError(f"Failed to download {url}: {exc}") from exc
def _expected_sha256(checksum_file: Path, asset_name: str) -> str:
"""Parse the upstream ``bws-sha256-checksums-X.Y.Z.txt`` file.
Format is the standard ``sha256sum`` output: ``<hex> <filename>``,
one per line.
"""
text = checksum_file.read_text(encoding="utf-8", errors="replace")
for line in text.splitlines():
parts = line.strip().split()
if len(parts) >= 2 and parts[-1] == asset_name:
return parts[0]
raise RuntimeError(
f"No checksum entry for {asset_name} in {checksum_file.name}"
)
def _sha256_file(path: Path) -> str:
h = hashlib.sha256()
with open(path, "rb") as f:
for chunk in iter(lambda: f.read(65536), b""):
h.update(chunk)
return h.hexdigest()
def _pick_zip_member(zf: zipfile.ZipFile, binary_name: str) -> str:
"""Find the binary inside the upstream zip.
Historically the archive has been flat (``bws`` at the root) but we
tolerate a top-level directory just in case upstream changes.
"""
candidates = [n for n in zf.namelist() if n.split("/")[-1] == binary_name]
if not candidates:
raise RuntimeError(
f"Could not find {binary_name} inside downloaded archive "
f"(members: {zf.namelist()[:5]}...)"
)
# Prefer the shortest path (i.e. root over nested) for determinism.
candidates.sort(key=len)
return candidates[0]
# ---------------------------------------------------------------------------
# Secret fetch + apply
# ---------------------------------------------------------------------------
def _token_fingerprint(token: str) -> str:
"""SHA-256 prefix used as a cache key — never logged, never displayed."""
return hashlib.sha256(token.encode("utf-8")).hexdigest()[:16]
def fetch_bitwarden_secrets(
*,
access_token: str,
project_id: str,
binary: Optional[Path] = None,
cache_ttl_seconds: float = 300,
use_cache: bool = True,
) -> Tuple[Dict[str, str], List[str]]:
"""Pull the secrets for ``project_id`` from Bitwarden Secrets Manager.
Returns ``(secrets_dict, warnings_list)``.
Raises :class:`RuntimeError` for fatal conditions (missing binary,
auth failure, unparseable output). Callers in the env_loader path
catch this and emit a single warning; callers in the user-facing
setup wizard let it propagate.
"""
if not access_token:
raise RuntimeError("Bitwarden access token is empty")
if not project_id:
raise RuntimeError("Bitwarden project_id is empty")
cache_key = (_token_fingerprint(access_token), project_id)
if use_cache:
cached = _CACHE.get(cache_key)
if cached and cached.is_fresh(cache_ttl_seconds):
return cached.secrets, []
bws = binary or find_bws(install_if_missing=True)
if bws is None:
raise RuntimeError(
"bws binary not available — auto-install failed and `bws` is "
"not on PATH. Install manually from "
"https://github.com/bitwarden/sdk-sm/releases or re-run "
"`hermes secrets bitwarden setup`."
)
secrets, warnings = _run_bws_list(bws, access_token, project_id)
_CACHE[cache_key] = _CachedFetch(secrets=secrets, fetched_at=time.time())
return secrets, warnings
def _run_bws_list(
bws: Path, access_token: str, project_id: str
) -> Tuple[Dict[str, str], List[str]]:
cmd = [str(bws), "secret", "list", project_id, "--output", "json"]
env = os.environ.copy()
env["BWS_ACCESS_TOKEN"] = access_token
# Make sure we're not echoing telemetry / colour codes into json.
env.setdefault("NO_COLOR", "1")
try:
proc = subprocess.run( # noqa: S603 — bws path is trusted
cmd,
env=env,
capture_output=True,
text=True,
timeout=_BWS_RUN_TIMEOUT,
)
except subprocess.TimeoutExpired as exc:
raise RuntimeError(
f"bws timed out after {_BWS_RUN_TIMEOUT}s fetching secrets"
) from exc
except OSError as exc:
raise RuntimeError(f"failed to invoke bws: {exc}") from exc
if proc.returncode != 0:
# bws writes auth/network errors to stderr in plain English.
# Strip ANSI just in case and surface the first 200 chars.
err = (proc.stderr or proc.stdout or "").strip().replace("\x1b", "")
raise RuntimeError(
f"bws exited {proc.returncode}: {err[:200]}"
)
raw = proc.stdout.strip()
if not raw:
return {}, ["bws returned no output (empty project?)"]
try:
payload = json.loads(raw)
except json.JSONDecodeError as exc:
raise RuntimeError(f"bws returned non-JSON output: {exc}") from exc
if not isinstance(payload, list):
raise RuntimeError(
f"bws returned unexpected shape: {type(payload).__name__}"
)
secrets: Dict[str, str] = {}
warnings: List[str] = []
for item in payload:
if not isinstance(item, dict):
continue
key = item.get("key")
value = item.get("value")
if not isinstance(key, str) or not isinstance(value, str):
continue
if not _is_valid_env_name(key):
warnings.append(
f"Skipping secret {key!r}: not a valid env-var name"
)
continue
secrets[key] = value
return secrets, warnings
def _is_valid_env_name(name: str) -> bool:
if not name:
return False
if not (name[0].isalpha() or name[0] == "_"):
return False
return all(c.isalnum() or c == "_" for c in name)
# ---------------------------------------------------------------------------
# Public entry point — called from hermes_cli.env_loader
# ---------------------------------------------------------------------------
def apply_bitwarden_secrets(
*,
enabled: bool,
access_token_env: str = "BWS_ACCESS_TOKEN",
project_id: str = "",
override_existing: bool = False,
cache_ttl_seconds: float = 300,
auto_install: bool = True,
) -> FetchResult:
"""Pull secrets from BSM and set them on ``os.environ``.
This is the function ``load_hermes_dotenv()`` calls after the .env
files have loaded. It is intentionally defensive any failure
returns a :class:`FetchResult` with ``error`` set; it never raises.
Parameters mirror the ``secrets.bitwarden.*`` config keys so the
caller can just splat the dict in.
"""
result = FetchResult()
if not enabled:
return result
access_token = os.environ.get(access_token_env, "").strip()
if not access_token:
result.error = (
f"secrets.bitwarden.enabled is true but {access_token_env} is "
"not set. Run `hermes secrets bitwarden setup`."
)
return result
if not project_id:
result.error = (
"secrets.bitwarden.project_id is empty. "
"Run `hermes secrets bitwarden setup`."
)
return result
binary = find_bws(install_if_missing=auto_install)
result.binary_path = binary
if binary is None:
result.error = (
"bws binary not available and auto-install is disabled. "
"Run `hermes secrets bitwarden setup` to install."
)
return result
try:
secrets, warnings = fetch_bitwarden_secrets(
access_token=access_token,
project_id=project_id,
binary=binary,
cache_ttl_seconds=cache_ttl_seconds,
)
except RuntimeError as exc:
result.error = str(exc)
return result
result.secrets = secrets
result.warnings.extend(warnings)
for key, value in secrets.items():
if key == access_token_env:
# Don't let BSM clobber the very token we used to fetch
# itself — that would be a footgun if someone stored the
# token as a BSM secret too.
result.skipped.append(key)
continue
if not override_existing and os.environ.get(key):
result.skipped.append(key)
continue
os.environ[key] = value
result.applied.append(key)
return result
# ---------------------------------------------------------------------------
# Test hook — used by hermetic tests to flush the cache between cases.
# ---------------------------------------------------------------------------
def _reset_cache_for_tests() -> None:
_CACHE.clear()
+58 -3
View File
@@ -12,7 +12,7 @@ import sys
from pathlib import Path
from typing import Any, Dict, List, Optional, Set, Tuple
from hermes_constants import get_config_path, get_skills_dir
from hermes_constants import get_config_path, get_skills_dir, is_termux
logger = logging.getLogger(__name__)
@@ -24,7 +24,43 @@ PLATFORM_MAP = {
"windows": "win32",
}
EXCLUDED_SKILL_DIRS = frozenset((".git", ".github", ".hub", ".archive"))
EXCLUDED_SKILL_DIRS = frozenset(
(
".git",
".github",
".hub",
".archive",
".venv",
"venv",
"node_modules",
"site-packages",
"__pycache__",
".tox",
".nox",
".pytest_cache",
".mypy_cache",
".ruff_cache",
)
)
def is_excluded_skill_path(path) -> bool:
"""True if any component of *path* is in EXCLUDED_SKILL_DIRS.
Use this on every SKILL.md path produced by ``rglob`` to prune
dependency, virtualenv, VCS, and cache directories. Centralising the
check here keeps every skill-scanning site in sync with the shared
exclusion set.
Accepts a Path or string.
"""
try:
parts = path.parts # Path
except AttributeError:
from pathlib import PurePath
parts = PurePath(str(path)).parts
return any(part in EXCLUDED_SKILL_DIRS for part in parts)
# ── Lazy YAML loader ─────────────────────────────────────────────────────
@@ -100,6 +136,14 @@ def skill_matches_platform(frontmatter: Dict[str, Any]) -> bool:
If the field is absent or empty the skill is compatible with **all**
platforms (backward-compatible default).
Termux note: on Termux/Android, ``sys.platform`` is ``"linux"`` on
older Pythons but became ``"android"`` on Python 3.13+. Termux is a
Linux userland riding on the Android kernel, so skills tagged
``linux`` are treated as compatible in Termux regardless of which
``sys.platform`` value Python reports. Individual Linux commands
inside a skill may still misbehave (no systemd, BusyBox utils, no
apt/dnf, etc.) but that is on the skill, not on platform gating.
"""
platforms = frontmatter.get("platforms")
if not platforms:
@@ -107,11 +151,21 @@ def skill_matches_platform(frontmatter: Dict[str, Any]) -> bool:
if not isinstance(platforms, list):
platforms = [platforms]
current = sys.platform
running_in_termux = is_termux()
for platform in platforms:
normalized = str(platform).lower().strip()
mapped = PLATFORM_MAP.get(normalized, normalized)
if current.startswith(mapped):
return True
# Termux runs a Linux userland on Android. Accept linux-tagged
# skills regardless of whether sys.platform is "linux" (pre-3.13
# Termux) or "android" (Python 3.13+ Termux, and any other
# Android runtime).
if running_in_termux and mapped == "linux":
return True
# Explicit termux/android tags match a Termux session too.
if running_in_termux and mapped in ("termux", "android"):
return True
return False
@@ -478,7 +532,8 @@ def extract_skill_description(frontmatter: Dict[str, Any]) -> str:
def iter_skill_index_files(skills_dir: Path, filename: str):
"""Walk skills_dir yielding sorted paths matching *filename*.
Excludes ``.git``, ``.github``, ``.hub``, ``.archive`` directories.
Excludes Hermes metadata, VCS, virtualenv/dependency, and cache
directories so dependencies cannot register nested skills.
"""
matches = []
for root, dirs, files in os.walk(skills_dir, followlinks=True):
+313 -86
View File
@@ -51,6 +51,8 @@ os.environ["HERMES_QUIET"] = "1" # Our own modules
import yaml
from hermes_cli.fallback_config import get_fallback_chain
# prompt_toolkit for fixed input area TUI
from prompt_toolkit.history import FileHistory
from prompt_toolkit.styles import Style as PTStyle
@@ -81,17 +83,73 @@ except Exception:
import threading
import queue
from agent.usage_pricing import (
CanonicalUsage,
estimate_usage_cost,
format_duration_compact,
format_token_count_compact,
)
from agent.markdown_tables import (
is_table_divider,
looks_like_table_row,
realign_markdown_tables,
)
def CanonicalUsage(*args, **kwargs):
from agent.usage_pricing import CanonicalUsage as _CanonicalUsage
return _CanonicalUsage(*args, **kwargs)
def estimate_usage_cost(*args, **kwargs):
from agent.usage_pricing import estimate_usage_cost as _estimate_usage_cost
return _estimate_usage_cost(*args, **kwargs)
def format_duration_compact(*args, **kwargs):
seconds = float(args[0] if args else kwargs.get("seconds", 0.0))
if seconds < 60:
return f"{seconds:.0f}s"
minutes = seconds / 60
if minutes < 60:
return f"{minutes:.0f}m"
hours = minutes / 60
if hours < 24:
remaining_min = int(minutes % 60)
return f"{int(hours)}h {remaining_min}m" if remaining_min else f"{int(hours)}h"
days = hours / 24
return f"{days:.1f}d"
def format_token_count_compact(*args, **kwargs):
value = int(args[0] if args else kwargs.get("value", 0))
abs_value = abs(value)
if abs_value < 1_000:
return str(value)
sign = "-" if value < 0 else ""
units = ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K"))
for threshold, suffix in units:
if abs_value >= threshold:
scaled = abs_value / threshold
if scaled < 10:
text = f"{scaled:.2f}"
elif scaled < 100:
text = f"{scaled:.1f}"
else:
text = f"{scaled:.0f}"
if "." in text:
text = text.rstrip("0").rstrip(".")
return f"{sign}{text}{suffix}"
return f"{value:,}"
def is_table_divider(*args, **kwargs):
from agent.markdown_tables import is_table_divider as _is_table_divider
return _is_table_divider(*args, **kwargs)
def looks_like_table_row(*args, **kwargs):
from agent.markdown_tables import looks_like_table_row as _looks_like_table_row
return _looks_like_table_row(*args, **kwargs)
def realign_markdown_tables(*args, **kwargs):
from agent.markdown_tables import realign_markdown_tables as _realign_markdown_tables
return _realign_markdown_tables(*args, **kwargs)
# NOTE: `from agent.account_usage import ...` is deliberately NOT at module
# top — it transitively pulls the OpenAI SDK chain (~230 ms cold) and is only
# needed when the user runs `/limits`. Lazy-imported inside the handler below.
@@ -719,29 +777,135 @@ from rich.text import Text as _RichText
import fire
# Import the agent and tool systems
from run_agent import AIAgent
from model_tools import get_tool_definitions, get_toolset_for_tool
# Import agent and tool systems lazily. Bare interactive startup only needs the
# prompt; the full agent/tool registry is initialized on first use.
def AIAgent(*args, **kwargs):
from run_agent import AIAgent as _AIAgent
return _AIAgent(*args, **kwargs)
def get_tool_definitions(*args, **kwargs):
from model_tools import get_tool_definitions as _get_tool_definitions
return _get_tool_definitions(*args, **kwargs)
def get_toolset_for_tool(*args, **kwargs):
from model_tools import get_toolset_for_tool as _get_toolset_for_tool
return _get_toolset_for_tool(*args, **kwargs)
# Extracted CLI modules (Phase 3)
from hermes_cli.banner import build_welcome_banner
from hermes_cli.commands import SlashCommandCompleter, SlashCommandAutoSuggest
from toolsets import get_all_toolsets, get_toolset_info, validate_toolset
def get_all_toolsets(*args, **kwargs):
from toolsets import get_all_toolsets as _get_all_toolsets
return _get_all_toolsets(*args, **kwargs)
def get_toolset_info(*args, **kwargs):
from toolsets import get_toolset_info as _get_toolset_info
return _get_toolset_info(*args, **kwargs)
def validate_toolset(*args, **kwargs):
from toolsets import validate_toolset as _validate_toolset
return _validate_toolset(*args, **kwargs)
# Cron job system for scheduled tasks (execution is handled by the gateway)
from cron import get_job
def get_job(*args, **kwargs):
from cron import get_job as _get_job
return _get_job(*args, **kwargs)
# Resource cleanup imports for safe shutdown (terminal VMs, browser sessions)
from tools.terminal_tool import cleanup_all_environments as _cleanup_all_terminals
from tools.terminal_tool import set_sudo_password_callback, set_approval_callback
from tools.skills_tool import set_secret_capture_callback
from hermes_cli.callbacks import prompt_for_secret
from tools.browser_tool import _emergency_cleanup_all_sessions as _cleanup_all_browsers
def _cleanup_all_terminals(*args, **kwargs):
from tools.terminal_tool import cleanup_all_environments
return cleanup_all_environments(*args, **kwargs)
def set_sudo_password_callback(*args, **kwargs):
from tools.terminal_tool import set_sudo_password_callback as _set_sudo_password_callback
return _set_sudo_password_callback(*args, **kwargs)
def set_approval_callback(*args, **kwargs):
from tools.terminal_tool import set_approval_callback as _set_approval_callback
return _set_approval_callback(*args, **kwargs)
def set_secret_capture_callback(*args, **kwargs):
from tools.skills_tool import set_secret_capture_callback as _set_secret_capture_callback
return _set_secret_capture_callback(*args, **kwargs)
def _cleanup_all_browsers(*args, **kwargs):
from tools.browser_tool import _emergency_cleanup_all_sessions
return _emergency_cleanup_all_sessions(*args, **kwargs)
# Guard to prevent cleanup from running multiple times on exit
_cleanup_done = False
# Weak reference to the active AIAgent for memory provider shutdown at exit
_active_agent_ref = None
_deferred_agent_startup_done = False
def _prepare_deferred_agent_startup() -> None:
"""Run Termux-deferred agent discovery before the first real agent turn."""
global _deferred_agent_startup_done
if _deferred_agent_startup_done:
return
if os.environ.get("HERMES_DEFER_AGENT_STARTUP") != "1":
return
_deferred_agent_startup_done = True
_accept_hooks = os.environ.get("HERMES_ACCEPT_HOOKS", "").lower() in {
"1",
"true",
"yes",
"on",
}
try:
from hermes_cli.plugins import discover_plugins
discover_plugins()
except Exception:
logger.warning(
"plugin discovery failed at deferred CLI startup",
exc_info=True,
)
try:
from tools.mcp_tool import discover_mcp_tools
discover_mcp_tools()
except Exception:
logger.debug(
"MCP tool discovery failed at deferred CLI startup",
exc_info=True,
)
try:
from agent.shell_hooks import register_from_config
from hermes_cli.config import load_config
register_from_config(load_config(), accept_hooks=_accept_hooks)
except Exception:
logger.debug(
"shell-hook registration failed at deferred CLI startup",
exc_info=True,
)
def _run_cleanup():
"""Run resource cleanup exactly once."""
@@ -2455,7 +2619,13 @@ def _build_compact_banner() -> str:
line1 = f"{agent_name} - AI Agent Framework"
tiny_line = agent_name
version_line = format_banner_version_label()
if os.environ.get("HERMES_FAST_STARTUP_BANNER") == "1":
from hermes_cli import __release_date__ as _release_date
from hermes_cli import __version__ as _version
version_line = f"Hermes Agent v{_version} ({_release_date})"
else:
version_line = format_banner_version_label()
w = min(shutil.get_terminal_size().columns - 2, 88)
if w < 30:
@@ -2504,19 +2674,48 @@ def _looks_like_slash_command(text: str) -> bool:
# Skill Slash Commands — dynamic commands generated from installed skills
# ============================================================================
from agent.skill_commands import (
scan_skill_commands,
get_skill_commands,
build_skill_invocation_message,
build_preloaded_skills_prompt,
)
from agent.skill_bundles import (
get_skill_bundles,
build_bundle_invocation_message,
)
_skill_commands = None
_skill_bundles = None
_skill_commands = scan_skill_commands()
_skill_bundles = get_skill_bundles()
def _ensure_skill_commands() -> dict:
global _skill_commands
if _skill_commands is None:
from agent.skill_commands import scan_skill_commands
_skill_commands = scan_skill_commands()
return _skill_commands
def get_skill_commands() -> dict:
return _ensure_skill_commands()
def build_skill_invocation_message(*args, **kwargs):
from agent.skill_commands import build_skill_invocation_message as _impl
return _impl(*args, **kwargs)
def build_preloaded_skills_prompt(*args, **kwargs):
from agent.skill_commands import build_preloaded_skills_prompt as _impl
return _impl(*args, **kwargs)
def get_skill_bundles() -> dict:
global _skill_bundles
if _skill_bundles is None:
from agent.skill_bundles import get_skill_bundles as _impl
_skill_bundles = _impl()
return _skill_bundles
def build_bundle_invocation_message(*args, **kwargs):
from agent.skill_bundles import build_bundle_invocation_message as _impl
return _impl(*args, **kwargs)
def _get_plugin_cmd_handler_names() -> set:
@@ -2852,12 +3051,9 @@ class HermesCLI:
pass
# Fallback provider chain — tried in order when primary fails after retries.
# Supports new list format (fallback_providers) and legacy single-dict (fallback_model).
fb = CLI_CONFIG.get("fallback_providers") or CLI_CONFIG.get("fallback_model") or []
# Normalize legacy single-dict to a one-element list
if isinstance(fb, dict):
fb = [fb] if fb.get("provider") and fb.get("model") else []
self._fallback_model = fb
# Merge new ``fallback_providers`` entries with any legacy
# ``fallback_model`` entries so old configs still participate.
self._fallback_model = get_fallback_chain(CLI_CONFIG)
# Signature of the currently-initialised agent's runtime. Used to
# rebuild the agent when provider / model / base_url changes across
@@ -2865,7 +3061,9 @@ class HermesCLI:
self._active_agent_route_signature = None
# Agent will be initialized on first use
self.agent: Optional[AIAgent] = None
self.agent: Optional[Any] = None
self._tool_callbacks_installed = False
self._tirith_security_checked = False
self._app = None # prompt_toolkit Application (set in run())
# Conversation state
@@ -4488,6 +4686,41 @@ class HermesCLI:
route["request_overrides"] = overrides
return route
def _install_tool_callbacks(self) -> None:
"""Install tool callbacks that need the live prompt UI."""
if getattr(self, "_tool_callbacks_installed", False):
return
set_sudo_password_callback(self._sudo_password_callback)
set_approval_callback(self._approval_callback)
set_secret_capture_callback(self._secret_capture_callback)
try:
from tools.computer_use_tool import set_approval_callback as _set_cu_cb
_set_cu_cb(self._computer_use_approval_callback)
except ImportError:
pass
self._tool_callbacks_installed = True
def _ensure_tirith_security(self) -> None:
"""Check tirith availability once before tools can run terminal commands."""
if getattr(self, "_tirith_security_checked", False):
return
self._tirith_security_checked = True
try:
from tools.tirith_security import ensure_installed, is_platform_supported
tirith_path = ensure_installed(log_failures=False)
if tirith_path is None and is_platform_supported():
security_cfg = self.config.get("security", {}) or {}
tirith_enabled = security_cfg.get("tirith_enabled", True)
if tirith_enabled:
_cprint(
f" {_DIM}⚠ tirith security scanner enabled but not available "
f"— command scanning will use pattern matching only{_RST}"
)
except Exception:
pass
def _init_agent(self, *, model_override: str = None, runtime_override: dict = None, request_overrides: dict | None = None) -> bool:
"""
Initialize the agent on first use.
@@ -4499,6 +4732,10 @@ class HermesCLI:
if self.agent is not None:
return True
_prepare_deferred_agent_startup()
self._install_tool_callbacks()
self._ensure_tirith_security()
if not self._ensure_runtime_credentials():
return False
@@ -4713,8 +4950,10 @@ class HermesCLI:
context_length=ctx_len,
)
# Show tool availability warnings if any tools are disabled
self._show_tool_availability_warnings()
# Tool discovery is intentionally deferred on the Termux bare prompt
# path; availability warnings are shown once tools are initialized.
if os.environ.get("HERMES_DEFER_AGENT_STARTUP") != "1":
self._show_tool_availability_warnings()
# Warn about very low context lengths (common with local servers)
if ctx_len and ctx_len <= 8192:
@@ -5491,9 +5730,13 @@ class HermesCLI:
def _show_status(self):
"""Show compact startup status line."""
# Get tool count
tools = get_tool_definitions(enabled_toolsets=self.enabled_toolsets, quiet_mode=True)
tool_count = len(tools) if tools else 0
# Avoid pulling the full tool registry into the bare Termux prompt path.
if os.environ.get("HERMES_DEFER_AGENT_STARTUP") == "1":
tool_status = "tools deferred"
else:
tools = get_tool_definitions(enabled_toolsets=self.enabled_toolsets, quiet_mode=True)
tool_count = len(tools) if tools else 0
tool_status = f"{tool_count} tools"
# Format model name (shorten if needed)
model_short = self.model.split("/")[-1] if "/" in self.model else self.model
@@ -5525,7 +5768,7 @@ class HermesCLI:
self._console_print(
f" {api_indicator} [{accent_color}]{model_short}[/] "
f"[dim {separator_color}]·[/] [bold {label_color}]{tool_count} tools[/]"
f"[dim {separator_color}]·[/] [bold {label_color}]{tool_status}[/]"
f"{toolsets_info}{provider_info}"
)
@@ -5638,9 +5881,10 @@ class HermesCLI:
continue
ChatConsole().print(f" [bold {_accent_hex()}]{cmd:<15}[/] [dim]-[/] {_escape(desc)}")
if _skill_commands:
_cprint(f"\n{_BOLD}Skill Commands{_RST} ({len(_skill_commands)} installed):")
for cmd, info in sorted(_skill_commands.items()):
skill_commands = _ensure_skill_commands()
if skill_commands:
_cprint(f"\n{_BOLD}Skill Commands{_RST} ({len(skill_commands)} installed):")
for cmd, info in sorted(skill_commands.items()):
ChatConsole().print(
f" [bold {_accent_hex()}]{cmd:<22}[/] [dim]-[/] {_escape(info['description'])}"
)
@@ -8161,6 +8405,8 @@ class HermesCLI:
else:
# Check for user-defined quick commands (bypass agent loop, no LLM call)
base_cmd = cmd_lower.split()[0]
skill_commands = _ensure_skill_commands()
skill_bundles = get_skill_bundles()
quick_commands = self.config.get("quick_commands", {})
if base_cmd.lstrip("/") in quick_commands:
qcmd = quick_commands[base_cmd.lstrip("/")]
@@ -8216,14 +8462,14 @@ class HermesCLI:
_cprint(f"\033[1;31mPlugin command error: {e}{_RST}")
# Skill bundles take precedence over individual skills — /<bundle>
# loads multiple skills at once. Rescans cheaply when files change.
elif base_cmd in get_skill_bundles():
elif base_cmd in skill_bundles:
user_instruction = cmd_original[len(base_cmd):].strip()
bundle_result = build_bundle_invocation_message(
base_cmd, user_instruction, task_id=self.session_id
)
if bundle_result:
msg, loaded_names, missing = bundle_result
bundle_info = get_skill_bundles()[base_cmd]
bundle_info = skill_bundles[base_cmd]
print(
f"\n⚡ Loading bundle: {bundle_info['name']} "
f"({len(loaded_names)} skills)"
@@ -8239,13 +8485,13 @@ class HermesCLI:
f"[bold red]Failed to load bundle for {base_cmd}[/]"
)
# Check for skill slash commands (/gif-search, /axolotl, etc.)
elif base_cmd in _skill_commands:
elif base_cmd in skill_commands:
user_instruction = cmd_original[len(base_cmd):].strip()
msg = build_skill_invocation_message(
base_cmd, user_instruction, task_id=self.session_id
)
if msg:
skill_name = _skill_commands[base_cmd]["name"]
skill_name = skill_commands[base_cmd]["name"]
print(f"\n⚡ Loading skill: {skill_name}")
if hasattr(self, '_pending_input'):
self._pending_input.put(msg)
@@ -8257,7 +8503,7 @@ class HermesCLI:
# that execution-time resolution agrees with tab-completion.
from hermes_cli.commands import COMMANDS
typed_base = cmd_lower.split()[0]
all_known = set(COMMANDS) | set(_skill_commands) | set(get_skill_bundles())
all_known = set(COMMANDS) | set(skill_commands) | set(skill_bundles)
matches = [c for c in all_known if c.startswith(typed_base)]
if len(matches) > 1:
# Prefer an exact match (typed the full command name)
@@ -10221,6 +10467,7 @@ class HermesCLI:
self._voice_processing = True
submitted = False
transcription_failed = False
wav_path = None
try:
if self._voice_recorder is None:
@@ -10269,18 +10516,24 @@ class HermesCLI:
else:
error = result.get("error", "Unknown error")
_cprint(f"\n{_DIM}Transcription failed: {error}{_RST}")
transcription_failed = True
except Exception as e:
_cprint(f"\n{_DIM}Voice processing error: {e}{_RST}")
transcription_failed = wav_path is not None
finally:
with self._voice_lock:
self._voice_processing = False
if hasattr(self, '_app') and self._app:
self._app.invalidate()
# Clean up temp file
# Clean up temp file unless transcription failed. On failure, keep
# the source recording so long dictation is not lost.
try:
if wav_path and os.path.isfile(wav_path):
os.unlink(wav_path)
if transcription_failed:
_cprint(f"{_DIM}Recording preserved at: {wav_path}{_RST}")
else:
os.unlink(wav_path)
except Exception:
pass
@@ -12016,37 +12269,11 @@ class HermesCLI:
self._voice_tts_done = threading.Event() # Signals TTS playback finished
self._voice_tts_done.set() # Initially "done" (no TTS pending)
# Register callbacks so terminal_tool prompts route through our UI
set_sudo_password_callback(self._sudo_password_callback)
set_approval_callback(self._approval_callback)
set_secret_capture_callback(self._secret_capture_callback)
if os.environ.get("HERMES_DEFER_AGENT_STARTUP") != "1":
self._install_tool_callbacks()
# Computer-use shares the same approval UI (prompt_toolkit dialog).
# The tool handler expects a 3-arg callback (action, args, summary)
# and returns "approve_once" | "approve_session" | "always_approve"
# | "deny". Adapt our existing generic callback.
try:
from tools.computer_use_tool import set_approval_callback as _set_cu_cb
_set_cu_cb(self._computer_use_approval_callback)
except ImportError:
pass # computer_use extras not installed
# Ensure tirith security scanner is available (downloads if needed).
# Warn the user if tirith is enabled in config but not available,
# so they know command security scanning is degraded. Suppressed
# on platforms where tirith ships no binary (Windows etc.) — the
# user can't act on it and pattern-matching guards still run.
try:
from tools.tirith_security import ensure_installed, is_platform_supported
tirith_path = ensure_installed(log_failures=False)
if tirith_path is None and is_platform_supported():
security_cfg = self.config.get("security", {}) or {}
tirith_enabled = security_cfg.get("tirith_enabled", True)
if tirith_enabled:
_cprint(f" {_DIM}⚠ tirith security scanner enabled but not available "
f"— command scanning will use pattern matching only{_RST}")
except Exception:
pass # Non-fatal — fail-open at scan time if unavailable
if os.environ.get("HERMES_DEFER_AGENT_STARTUP") != "1":
self._ensure_tirith_security()
# Key bindings for the input area
kb = KeyBindings()
+4 -1
View File
@@ -529,7 +529,9 @@ def _send_media_via_adapter(
"""
from pathlib import Path
from gateway.platforms.base import should_send_media_as_audio
from gateway.platforms.base import BasePlatformAdapter, should_send_media_as_audio
media_files = BasePlatformAdapter.filter_media_delivery_paths(media_files)
for media_path, _is_voice in media_files:
try:
@@ -614,6 +616,7 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
# Extract MEDIA: tags so attachments are forwarded as files, not raw text
from gateway.platforms.base import BasePlatformAdapter
media_files, cleaned_delivery_content = BasePlatformAdapter.extract_media(delivery_content)
media_files = BasePlatformAdapter.filter_media_delivery_paths(media_files)
try:
config = load_gateway_config()
-67
View File
@@ -926,73 +926,6 @@ def load_gateway_config() -> GatewayConfig:
ac = ",".join(str(v) for v in ac)
os.environ["SLACK_ALLOWED_CHANNELS"] = str(ac)
# Discord settings → env vars (env vars take precedence)
discord_cfg = yaml_cfg.get("discord", {})
if isinstance(discord_cfg, dict):
if "require_mention" in discord_cfg and not os.getenv("DISCORD_REQUIRE_MENTION"):
os.environ["DISCORD_REQUIRE_MENTION"] = str(discord_cfg["require_mention"]).lower()
if "thread_require_mention" in discord_cfg and not os.getenv("DISCORD_THREAD_REQUIRE_MENTION"):
os.environ["DISCORD_THREAD_REQUIRE_MENTION"] = str(discord_cfg["thread_require_mention"]).lower()
frc = discord_cfg.get("free_response_channels")
if frc is not None and not os.getenv("DISCORD_FREE_RESPONSE_CHANNELS"):
if isinstance(frc, list):
frc = ",".join(str(v) for v in frc)
os.environ["DISCORD_FREE_RESPONSE_CHANNELS"] = str(frc)
if "auto_thread" in discord_cfg and not os.getenv("DISCORD_AUTO_THREAD"):
os.environ["DISCORD_AUTO_THREAD"] = str(discord_cfg["auto_thread"]).lower()
if "reactions" in discord_cfg and not os.getenv("DISCORD_REACTIONS"):
os.environ["DISCORD_REACTIONS"] = str(discord_cfg["reactions"]).lower()
# ignored_channels: channels where bot never responds (even when mentioned)
ic = discord_cfg.get("ignored_channels")
if ic is not None and not os.getenv("DISCORD_IGNORED_CHANNELS"):
if isinstance(ic, list):
ic = ",".join(str(v) for v in ic)
os.environ["DISCORD_IGNORED_CHANNELS"] = str(ic)
# allowed_channels: if set, bot ONLY responds in these channels (whitelist)
ac = discord_cfg.get("allowed_channels")
if ac is not None and not os.getenv("DISCORD_ALLOWED_CHANNELS"):
if isinstance(ac, list):
ac = ",".join(str(v) for v in ac)
os.environ["DISCORD_ALLOWED_CHANNELS"] = str(ac)
# no_thread_channels: channels where bot responds directly without creating thread
ntc = discord_cfg.get("no_thread_channels")
if ntc is not None and not os.getenv("DISCORD_NO_THREAD_CHANNELS"):
if isinstance(ntc, list):
ntc = ",".join(str(v) for v in ntc)
os.environ["DISCORD_NO_THREAD_CHANNELS"] = str(ntc)
# history_backfill: recover missed channel messages for shared sessions
# when require_mention is active. Fetches messages between bot turns
# and prepends them to the user message for context.
if "history_backfill" in discord_cfg and not os.getenv("DISCORD_HISTORY_BACKFILL"):
os.environ["DISCORD_HISTORY_BACKFILL"] = str(discord_cfg["history_backfill"]).lower()
hbl = discord_cfg.get("history_backfill_limit")
if hbl is not None and not os.getenv("DISCORD_HISTORY_BACKFILL_LIMIT"):
os.environ["DISCORD_HISTORY_BACKFILL_LIMIT"] = str(hbl)
# allow_mentions: granular control over what the bot can ping.
# Safe defaults (no @everyone/roles) are applied in the adapter;
# these YAML keys only override when set and let users opt back
# into unsafe modes (e.g. roles=true) if they actually want it.
allow_mentions_cfg = discord_cfg.get("allow_mentions")
if isinstance(allow_mentions_cfg, dict):
for yaml_key, env_key in (
("everyone", "DISCORD_ALLOW_MENTION_EVERYONE"),
("roles", "DISCORD_ALLOW_MENTION_ROLES"),
("users", "DISCORD_ALLOW_MENTION_USERS"),
("replied_user", "DISCORD_ALLOW_MENTION_REPLIED_USER"),
):
if yaml_key in allow_mentions_cfg and not os.getenv(env_key):
os.environ[env_key] = str(allow_mentions_cfg[yaml_key]).lower()
# reply_to_mode: top-level preferred, falls back to extra.reply_to_mode
# YAML 1.1 parses bare 'off' as boolean False — coerce to string "off".
_discord_extra = discord_cfg.get("extra") if isinstance(discord_cfg.get("extra"), dict) else {}
_discord_rtm = (
discord_cfg["reply_to_mode"] if "reply_to_mode" in discord_cfg
else _discord_extra.get("reply_to_mode")
)
if _discord_rtm is not None and not os.getenv("DISCORD_REPLY_TO_MODE"):
_rtm_str = "off" if _discord_rtm is False else str(_discord_rtm).lower()
os.environ["DISCORD_REPLY_TO_MODE"] = _rtm_str
# Bridge top-level require_mention to Telegram when the telegram: section
# does not already provide one. Users often write "require_mention: true"
# at the top level alongside group_sessions_per_user, expecting it to work
+168 -39
View File
@@ -18,6 +18,7 @@ Security features (based on OWASP + NIST SP 800-63-4 guidance):
Storage: ~/.hermes/pairing/
"""
import hashlib
import json
import os
import secrets
@@ -27,6 +28,10 @@ import time
from pathlib import Path
from typing import Optional
from gateway.whatsapp_identity import (
expand_whatsapp_aliases,
normalize_whatsapp_identifier,
)
from hermes_constants import get_hermes_dir
from utils import atomic_replace
@@ -109,12 +114,40 @@ class PairingStore:
def _save_json(self, path: Path, data: dict) -> None:
_secure_write(path, json.dumps(data, indent=2, ensure_ascii=False))
def _normalize_user_id(self, platform: str, user_id: str) -> str:
"""Normalize platform-specific user IDs before persisting them."""
raw_user_id = str(user_id or "").strip()
if platform == "whatsapp":
return normalize_whatsapp_identifier(raw_user_id) or raw_user_id
return raw_user_id
def _user_id_aliases(self, platform: str, user_id: str) -> set[str]:
"""Return all known equivalent user IDs for auth/rate-limit checks."""
raw_user_id = str(user_id or "").strip()
if not raw_user_id:
return set()
aliases = {raw_user_id, self._normalize_user_id(platform, raw_user_id)}
if platform == "whatsapp":
aliases.update(expand_whatsapp_aliases(raw_user_id))
aliases.discard("")
return aliases
def _user_ids_match(self, platform: str, left: str, right: str) -> bool:
"""Return True when two user IDs represent the same principal."""
left_aliases = self._user_id_aliases(platform, left)
right_aliases = self._user_id_aliases(platform, right)
return bool(left_aliases and right_aliases and (left_aliases & right_aliases))
# ----- Approved users -----
def is_approved(self, platform: str, user_id: str) -> bool:
"""Check if a user is approved (paired) on a platform."""
approved = self._load_json(self._approved_path(platform))
return user_id in approved
for approved_user_id in approved:
if self._user_ids_match(platform, approved_user_id, user_id):
return True
return False
def list_approved(self, platform: str = None) -> list:
"""List approved users, optionally filtered by platform."""
@@ -129,7 +162,16 @@ class PairingStore:
def _approve_user(self, platform: str, user_id: str, user_name: str = "") -> None:
"""Add a user to the approved list. Must be called under self._lock."""
approved = self._load_json(self._approved_path(platform))
approved[user_id] = {
normalized_user_id = self._normalize_user_id(platform, user_id)
duplicate_ids = [
approved_user_id
for approved_user_id in approved
if self._user_ids_match(platform, approved_user_id, normalized_user_id)
]
for approved_user_id in duplicate_ids:
del approved[approved_user_id]
approved[normalized_user_id] = {
"user_name": user_name,
"approved_at": time.time(),
}
@@ -140,14 +182,25 @@ class PairingStore:
path = self._approved_path(platform)
with self._lock:
approved = self._load_json(path)
if user_id in approved:
del approved[user_id]
matching_ids = [
approved_user_id
for approved_user_id in approved
if self._user_ids_match(platform, approved_user_id, user_id)
]
if matching_ids:
for approved_user_id in matching_ids:
del approved[approved_user_id]
self._save_json(path, approved)
return True
return False
# ----- Pending codes -----
@staticmethod
def _hash_code(code: str, salt: bytes) -> str:
"""Hash a pairing code with the given salt using SHA-256."""
return hashlib.sha256(salt + code.encode("utf-8")).hexdigest()
def generate_code(
self, platform: str, user_id: str, user_name: str = ""
) -> Optional[str]:
@@ -158,9 +211,13 @@ class PairingStore:
- User is rate-limited (too recent request)
- Max pending codes reached for this platform
- User/platform is in lockout due to failed attempts
The code is NOT stored in plaintext. Only a salted SHA-256 hash is
persisted so that reading the pending file does not reveal codes.
"""
with self._lock:
self._cleanup_expired(platform)
normalized_user_id = self._normalize_user_id(platform, user_id)
# Check lockout
if self._is_locked_out(platform):
@@ -178,9 +235,18 @@ class PairingStore:
# Generate cryptographically random code
code = "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))
# Store pending request
pending[code] = {
"user_id": user_id,
# Hash the code with a random salt before storing
salt = os.urandom(16)
code_hash = self._hash_code(code, salt)
# Use a unique entry id as the key (not the code itself)
entry_id = secrets.token_hex(8)
# Store pending request with hashed code
pending[entry_id] = {
"hash": code_hash,
"salt": salt.hex(),
"user_id": normalized_user_id,
"user_name": user_name,
"created_at": time.time(),
}
@@ -195,10 +261,16 @@ class PairingStore:
"""
Approve a pairing code. Adds the user to the approved list.
Returns {user_id, user_name} on success, None if code is
Returns ``{user_id, user_name}`` on success, ``None`` if the code is
invalid/expired OR the platform is currently locked out after
``MAX_FAILED_ATTEMPTS`` failed approvals (#10195). Callers can
disambiguate with ``_is_locked_out(platform)``.
Verification: the user-provided code is hashed with each stored
entry's salt and compared to the stored hash using constant-time
comparison. Pre-hash entries (legacy plaintext-key format from
pre-upgrade pending.json files) are silently ignored they get
pruned at TTL by ``_cleanup_expired``.
"""
with self._lock:
self._cleanup_expired(platform)
@@ -213,37 +285,77 @@ class PairingStore:
return None
pending = self._load_json(self._pending_path(platform))
if code not in pending:
# Find the entry whose hash matches the provided code.
# Tolerate legacy plaintext-key entries (no salt/hash) and
# malformed entries — skip them rather than KeyError, so an
# in-place upgrade across an existing pending.json doesn't
# crash on the first approve call. Legacy entries get pruned
# at their TTL by _cleanup_expired.
matched_key = None
matched_entry = None
for entry_id, entry in pending.items():
if not isinstance(entry, dict):
continue
if "salt" not in entry or "hash" not in entry:
continue
try:
salt = bytes.fromhex(entry["salt"])
except ValueError:
continue
candidate_hash = self._hash_code(code, salt)
if secrets.compare_digest(candidate_hash, entry["hash"]):
matched_key = entry_id
matched_entry = entry
break
if matched_key is None:
self._record_failed_attempt(platform)
return None
entry = pending.pop(code)
del pending[matched_key]
self._save_json(self._pending_path(platform), pending)
# Add to approved list
self._approve_user(platform, entry["user_id"], entry.get("user_name", ""))
self._approve_user(platform, matched_entry["user_id"],
matched_entry.get("user_name", ""))
return {
"user_id": entry["user_id"],
"user_name": entry.get("user_name", ""),
"user_id": matched_entry["user_id"],
"user_name": matched_entry.get("user_name", ""),
}
def list_pending(self, platform: str = None) -> list:
"""List pending pairing requests, optionally filtered by platform."""
"""List pending pairing requests, optionally filtered by platform.
Codes are stored hashed the ``code`` field is replaced with the
first 8 hex characters of the hash so admins can distinguish entries
without revealing the original code. Legacy plaintext-key entries
(pre-hash format) are shown with a "legacy" placeholder so admins
can see them age out without crashing on a missing ``hash`` field.
"""
results = []
platforms = [platform] if platform else self._all_platforms("pending")
for p in platforms:
self._cleanup_expired(p)
pending = self._load_json(self._pending_path(p))
for code, info in pending.items():
age_min = int((time.time() - info["created_at"]) / 60)
results.append({
"platform": p,
"code": code,
"user_id": info["user_id"],
"user_name": info.get("user_name", ""),
"age_minutes": age_min,
})
with self._lock:
platforms = [platform] if platform else self._all_platforms("pending")
for p in platforms:
self._cleanup_expired(p)
pending = self._load_json(self._pending_path(p))
for entry_id, info in pending.items():
if not isinstance(info, dict):
continue
created_at = info.get("created_at")
if not isinstance(created_at, (int, float)):
continue
age_min = int((time.time() - created_at) / 60)
hash_val = info.get("hash")
code_display = hash_val[:8] if isinstance(hash_val, str) else "legacy"
results.append({
"platform": p,
"code": code_display,
"user_id": info.get("user_id", ""),
"user_name": info.get("user_name", ""),
"age_minutes": age_min,
})
return results
def clear_pending(self, platform: str = None) -> int:
@@ -262,15 +374,20 @@ class PairingStore:
def _is_rate_limited(self, platform: str, user_id: str) -> bool:
"""Check if a user has requested a code too recently."""
limits = self._load_json(self._rate_limit_path())
key = f"{platform}:{user_id}"
last_request = limits.get(key, 0)
return (time.time() - last_request) < RATE_LIMIT_SECONDS
for alias in self._user_id_aliases(platform, user_id):
key = f"{platform}:{alias}"
last_request = limits.get(key, 0)
if (time.time() - last_request) < RATE_LIMIT_SECONDS:
return True
return False
def _record_rate_limit(self, platform: str, user_id: str) -> None:
"""Record the time of a pairing request for rate limiting."""
limits = self._load_json(self._rate_limit_path())
key = f"{platform}:{user_id}"
limits[key] = time.time()
now = time.time()
for alias in self._user_id_aliases(platform, user_id):
key = f"{platform}:{alias}"
limits[key] = now
self._save_json(self._rate_limit_path(), limits)
def _is_locked_out(self, platform: str) -> bool:
@@ -297,17 +414,29 @@ class PairingStore:
# ----- Cleanup -----
def _cleanup_expired(self, platform: str) -> None:
"""Remove expired pending codes."""
"""Remove expired pending codes.
Tolerant of malformed / legacy entries anything without a numeric
``created_at`` is treated as expired (it's effectively unusable
with the new hash-keyed schema anyway).
"""
path = self._pending_path(platform)
pending = self._load_json(path)
now = time.time()
expired = [
code for code, info in pending.items()
if (now - info["created_at"]) > CODE_TTL_SECONDS
]
expired = []
for entry_id, info in pending.items():
if not isinstance(info, dict):
expired.append(entry_id)
continue
created_at = info.get("created_at")
if not isinstance(created_at, (int, float)):
expired.append(entry_id)
continue
if (now - created_at) > CODE_TTL_SECONDS:
expired.append(entry_id)
if expired:
for code in expired:
del pending[code]
for entry_id in expired:
del pending[entry_id]
self._save_json(path, pending)
def _all_platforms(self, suffix: str) -> list:
+112 -1
View File
@@ -472,7 +472,7 @@ sys.path.insert(0, str(_Path(__file__).resolve().parents[2]))
from gateway.config import Platform, PlatformConfig
from gateway.session import SessionSource, build_session_key
from hermes_constants import get_hermes_dir
from hermes_constants import get_hermes_dir, get_hermes_home
GATEWAY_SECRET_CAPTURE_UNSUPPORTED_MESSAGE = (
@@ -813,6 +813,86 @@ def cache_video_from_bytes(data: bytes, ext: str = ".mp4") -> str:
# ---------------------------------------------------------------------------
DOCUMENT_CACHE_DIR = get_hermes_dir("cache/documents", "document_cache")
SCREENSHOT_CACHE_DIR = get_hermes_dir("cache/screenshots", "browser_screenshots")
_HERMES_HOME = get_hermes_home()
MEDIA_DELIVERY_ALLOW_DIRS_ENV = "HERMES_MEDIA_ALLOW_DIRS"
MEDIA_DELIVERY_SAFE_ROOTS = (
IMAGE_CACHE_DIR,
AUDIO_CACHE_DIR,
VIDEO_CACHE_DIR,
DOCUMENT_CACHE_DIR,
SCREENSHOT_CACHE_DIR,
_HERMES_HOME / "image_cache",
_HERMES_HOME / "audio_cache",
_HERMES_HOME / "video_cache",
_HERMES_HOME / "document_cache",
_HERMES_HOME / "browser_screenshots",
)
def _media_delivery_allowed_roots() -> List[Path]:
"""Return roots from which model-emitted local media may be delivered."""
roots = [Path(root) for root in MEDIA_DELIVERY_SAFE_ROOTS]
extra_roots = os.environ.get(MEDIA_DELIVERY_ALLOW_DIRS_ENV, "")
for chunk in extra_roots.split(os.pathsep):
for raw_root in chunk.split(","):
raw_root = raw_root.strip()
if not raw_root:
continue
root = Path(os.path.expanduser(raw_root))
if root.is_absolute():
roots.append(root)
return roots
def _path_is_within(path: Path, root: Path) -> bool:
try:
path.relative_to(root)
return True
except ValueError:
return False
def validate_media_delivery_path(path: str) -> Optional[str]:
"""Return a safe absolute file path for native media delivery, else None.
MEDIA tags and bare local paths in model output are untrusted text. Only
existing regular files under Hermes-managed media caches, or roots the
operator explicitly allowlists, may be uploaded as native attachments.
Symlinks are resolved before the containment check.
"""
if not path:
return None
candidate = str(path).strip()
if len(candidate) >= 2 and candidate[0] == candidate[-1] and candidate[0] in "`\"'":
candidate = candidate[1:-1].strip()
candidate = candidate.lstrip("`\"'").rstrip("`\"',.;:)}]")
if not candidate:
return None
expanded = Path(os.path.expanduser(candidate))
if not expanded.is_absolute():
return None
try:
resolved = expanded.resolve(strict=True)
except (OSError, RuntimeError, ValueError):
return None
if not resolved.is_file():
return None
for root in _media_delivery_allowed_roots():
try:
resolved_root = root.expanduser().resolve(strict=False)
except (OSError, RuntimeError, ValueError):
continue
if _path_is_within(resolved, resolved_root):
return str(resolved)
return None
SUPPORTED_DOCUMENT_TYPES = {
".pdf": "application/pdf",
@@ -2119,6 +2199,35 @@ class BasePlatformAdapter(ABC):
text = f"{caption}\n{text}"
return await self.send(chat_id=chat_id, content=text, reply_to=reply_to, metadata=metadata)
@staticmethod
def validate_media_delivery_path(path: str) -> Optional[str]:
"""Return a resolved path if it is safe for native attachment upload."""
return validate_media_delivery_path(path)
@staticmethod
def filter_media_delivery_paths(media_files) -> List[Tuple[str, bool]]:
"""Drop unsafe MEDIA paths and normalize accepted paths."""
safe_media: List[Tuple[str, bool]] = []
for media_path, is_voice in media_files or []:
safe_path = validate_media_delivery_path(str(media_path))
if safe_path:
safe_media.append((safe_path, bool(is_voice)))
else:
logger.warning("Skipping unsafe MEDIA directive path outside allowed roots")
return safe_media
@staticmethod
def filter_local_delivery_paths(file_paths) -> List[str]:
"""Drop unsafe bare local file paths and normalize accepted paths."""
safe_paths: List[str] = []
for file_path in file_paths or []:
safe_path = validate_media_delivery_path(str(file_path))
if safe_path:
safe_paths.append(safe_path)
else:
logger.warning("Skipping unsafe local file path outside allowed roots")
return safe_paths
@staticmethod
def extract_media(content: str) -> Tuple[List[Tuple[str, bool]], str]:
"""
@@ -3166,6 +3275,7 @@ class BasePlatformAdapter(ABC):
# Extract MEDIA:<path> tags (from TTS tool) before other processing
media_files, response = self.extract_media(response)
media_files = self.filter_media_delivery_paths(media_files)
# Extract image URLs and send them as native platform attachments
images, text_content = self.extract_images(response)
@@ -3179,6 +3289,7 @@ class BasePlatformAdapter(ABC):
# Auto-detect bare local file paths for native media delivery
# (helps small models that don't use MEDIA: syntax)
local_files, text_content = self.extract_local_files(text_content)
local_files = self.filter_local_delivery_paths(local_files)
if local_files:
logger.info("[%s] extract_local_files found %d file(s) in response", self.name, len(local_files))
+77 -16
View File
@@ -534,9 +534,30 @@ class QQAdapter(BasePlatformAdapter):
self._mark_transport_disconnected()
self._fail_pending("Connection closed")
# Stop reconnecting for fatal codes
if code in {4914, 4915}:
desc = "offline/sandbox-only" if code == 4914 else "banned"
# Stop reconnecting for fatal codes (unrecoverable errors)
if code in {
4001, # Invalid opcode
4002, # Invalid payload
4010, # Invalid shard
4011, # Sharding required
4012, # Invalid API version
4013, # Invalid intent
4014, # Intent not authorized
4914, # Offline/sandbox-only
4915, # Banned
}:
fatal_descriptions = {
4001: "invalid opcode",
4002: "invalid payload",
4010: "invalid shard",
4011: "sharding required",
4012: "invalid API version",
4013: "invalid intent",
4014: "intent not authorized",
4914: "offline/sandbox-only",
4915: "banned",
}
desc = fatal_descriptions.get(code, f"fatal error (code={code})")
logger.error(
"[%s] Bot is %s. Check QQ Open Platform.", self._log_tag, desc
)
@@ -573,10 +594,11 @@ class QQAdapter(BasePlatformAdapter):
self._token_expires_at = 0.0
# Session invalid → clear session, will re-identify on next Hello
# Note: 4009 (connection timeout) is NOT included here — it is
# resumable per the QQ protocol and should preserve session state.
if code in {
4006,
4007,
4009,
4900,
4901,
4902,
@@ -705,9 +727,8 @@ class QQAdapter(BasePlatformAdapter):
"token": f"QQBot {token}",
"intents": (1 << 25)
| (1 << 30)
| (
1 << 12
), # C2C_GROUP_AT_MESSAGES + PUBLIC_GUILD_MESSAGES + DIRECT_MESSAGE
| (1 << 12)
| (1 << 26), # C2C_GROUP_AT_MESSAGES + PUBLIC_GUILD_MESSAGES + DIRECT_MESSAGE + INTERACTION
"shard": [0, 1],
"properties": {
"$os": "macOS",
@@ -826,6 +847,32 @@ class QQAdapter(BasePlatformAdapter):
if op == 11:
return
# op 7 = Server Reconnect — server asks client to reconnect (e.g.
# load-balancing, maintenance). Close the WS so _read_events raises
# and the outer loop triggers a reconnect with Resume.
if op == 7:
logger.info("[%s] Server requested reconnect (op 7)", self._log_tag)
if self._ws and not self._ws.closed:
self._create_task(self._ws.close())
return
# op 9 = Invalid Session — d=True means session is resumable,
# d=False means we must re-identify from scratch.
if op == 9:
resumable = bool(d) if d is not None else False
if not resumable:
logger.info(
"[%s] Invalid session (op 9, not resumable), clearing session",
self._log_tag,
)
self._session_id = None
self._last_seq = None
else:
logger.info("[%s] Invalid session (op 9, resumable)", self._log_tag)
if self._ws and not self._ws.closed:
self._create_task(self._ws.close())
return
logger.debug("[%s] Unknown op: %s", self._log_tag, op)
def _handle_ready(self, d: Any) -> None:
@@ -1607,7 +1654,7 @@ class QQAdapter(BasePlatformAdapter):
elif ct.startswith("image/"):
# Image: download and cache locally.
try:
cached_path = await self._download_and_cache(url, ct)
cached_path = await self._download_and_cache(url, ct, filename)
if cached_path and os.path.isfile(cached_path):
image_urls.append(cached_path)
image_media_types.append(ct or "image/jpeg")
@@ -1620,11 +1667,15 @@ class QQAdapter(BasePlatformAdapter):
except Exception as exc:
logger.debug("[%s] Failed to cache image: %s", self._log_tag, exc)
else:
# Other attachments (video, file, etc.): record as text.
# Other attachments (video, file, etc.): download and record with path.
try:
cached_path = await self._download_and_cache(url, ct)
cached_path = await self._download_and_cache(url, ct, filename)
if cached_path:
other_attachments.append(f"[Attachment: {filename or ct}]")
name = filename or ct
if ct.startswith("video/"):
other_attachments.append(f"[video: {name} ({cached_path})]")
else:
other_attachments.append(f"[file: {name} ({cached_path})]")
except Exception as exc:
logger.debug("[%s] Failed to cache attachment: %s", self._log_tag, exc)
@@ -1636,8 +1687,14 @@ class QQAdapter(BasePlatformAdapter):
"attachment_info": attachment_info,
}
async def _download_and_cache(self, url: str, content_type: str) -> Optional[str]:
"""Download a URL and cache it locally."""
async def _download_and_cache(
self, url: str, content_type: str, original_name: str = "",
) -> Optional[str]:
"""Download a URL and cache it locally.
:param original_name: Preferred filename from attachment metadata.
Falls back to the URL path basename if empty.
"""
from tools.url_safety import is_safe_url
if not is_safe_url(url):
@@ -1668,7 +1725,11 @@ class QQAdapter(BasePlatformAdapter):
# Convert to .wav using ffmpeg so STT engines can process it.
return await self._convert_audio_to_wav(data, url)
else:
filename = Path(urlparse(url).path).name or "qq_attachment"
filename = (
original_name
or Path(urlparse(url).path).name
or "qq_attachment"
)
return cache_document_from_bytes(data, filename)
@staticmethod
@@ -1881,7 +1942,7 @@ class QQAdapter(BasePlatformAdapter):
@staticmethod
def _guess_ext_from_data(data: bytes) -> str:
"""Guess file extension from magic bytes."""
if data[:9] == b"#!SILK_V3" or data[:5] == b"#!SILK":
if data[:9] == b"#!SILK_V3" or data[:6] == b"#!SILK":
return ".silk"
if data[:2] == b"\x02!":
return ".silk"
@@ -1901,7 +1962,7 @@ class QQAdapter(BasePlatformAdapter):
@staticmethod
def _looks_like_silk(data: bytes) -> bool:
"""Check if bytes look like a SILK audio file."""
return data[:4] == b"#!SILK" or data[:2] == b"\x02!" or data[:9] == b"#!SILK_V3"
return data[:6] == b"#!SILK" or data[:2] == b"\x02!" or data[:9] == b"#!SILK_V3"
async def _convert_silk_to_wav(self, src_path: str, wav_path: str) -> Optional[str]:
"""Convert audio file to WAV using the pilk library.
+41 -3
View File
@@ -468,6 +468,10 @@ class TelegramAdapter(BasePlatformAdapter):
# "all" — every message triggers a push notification (legacy
# behavior; opt-in via display.platforms.telegram.notifications).
self._notifications_mode: str = "important"
# send_or_update_status() bookkeeping: {(chat_id, status_key) -> bot message_id}
# Tracks status bubbles owned by this adapter so subsequent calls with the
# same key edit the same message instead of appending new ones (#30045).
self._status_message_ids: Dict[tuple, str] = {}
def _notification_kwargs(
self, metadata: Optional[Dict[str, Any]]
@@ -1908,6 +1912,40 @@ class TelegramAdapter(BasePlatformAdapter):
is_connect_timeout = self._looks_like_connect_timeout(e)
return SendResult(success=False, error=str(e), retryable=(is_connect_timeout or not is_timeout))
async def send_or_update_status(
self,
chat_id: str,
status_key: str,
content: str,
*,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send a status message, or edit the previous one with the same key.
Issue #30045: progress/status callbacks (context-pressure, lifecycle,
compression, etc.) used to append a fresh bubble on every call. With
this method, the first call sends and the message id is remembered;
subsequent calls with the same (chat_id, status_key) edit that same
message in place. If the edit fails (message deleted, too old, etc.)
we drop the cached id and send fresh.
"""
key = (str(chat_id), str(status_key))
cached_id = self._status_message_ids.get(key)
if cached_id is not None:
result = await self.edit_message(
chat_id, cached_id, content, finalize=True, metadata=metadata,
)
if result.success:
if result.message_id:
self._status_message_ids[key] = str(result.message_id)
return result
# Edit failed — clear the cached id and fall through to a fresh send.
self._status_message_ids.pop(key, None)
result = await self.send(chat_id, content, metadata=metadata)
if result.success and result.message_id:
self._status_message_ids[key] = str(result.message_id)
return result
async def edit_message(
self,
chat_id: str,
@@ -4573,10 +4611,10 @@ class TelegramAdapter(BasePlatformAdapter):
return (
"You are handling a Telegram group chat message.\n"
f"- Your identity: user_id={bot_id}, @-mention name in this group=@{username}\n"
"- Lines in history prefixed with `[nickname|user_id]` are observed Telegram group context "
"and are not necessarily addressed to you.\n"
"- observed Telegram group context may be provided in a separate context-only block "
"before the current message; it is not necessarily addressed to you.\n"
"- Treat only the current new message as a request explicitly directed at you, "
"and answer it directly."
"and use observed context only when the current message asks for it."
)
def _apply_telegram_group_observe_attribution(self, event: MessageEvent) -> MessageEvent:
+31 -5
View File
@@ -308,11 +308,37 @@ class WebhookAdapter(BasePlatformAdapter):
data = json.loads(subs_path.read_text(encoding="utf-8"))
if not isinstance(data, dict):
return
# Merge: static routes take precedence over dynamic ones
self._dynamic_routes = {
k: v for k, v in data.items()
if k not in self._static_routes
}
# Merge: static routes take precedence over dynamic ones.
# Reject any dynamic route whose effective secret is empty —
# an empty secret would cause _handle_webhook to skip HMAC
# validation entirely, letting unauthenticated callers in.
new_dynamic: Dict[str, dict] = {}
for k, v in data.items():
if k in self._static_routes:
continue
effective_secret = v.get("secret", self._global_secret)
if not effective_secret:
logger.warning(
"[webhook] Dynamic route '%s' skipped: 'secret' is "
"missing or empty. Set a valid HMAC secret, or use "
"'%s' to explicitly disable auth (testing only).",
k,
_INSECURE_NO_AUTH,
)
continue
if (
effective_secret == _INSECURE_NO_AUTH
and not _is_loopback_host(self._host)
):
logger.warning(
"[webhook] Dynamic route '%s' skipped: INSECURE_NO_AUTH "
"is only allowed on loopback hosts. Current host: '%s'.",
k,
self._host,
)
continue
new_dynamic[k] = v
self._dynamic_routes = new_dynamic
self._routes = {**self._dynamic_routes, **self._static_routes}
self._dynamic_routes_mtime = mtime
logger.info(
+2
View File
@@ -1679,8 +1679,10 @@ class WeixinAdapter(BasePlatformAdapter):
# Extract MEDIA: tags and bare local file paths before text delivery.
media_files, cleaned_content = self.extract_media(content)
media_files = self.filter_media_delivery_paths(media_files)
_, image_cleaned = self.extract_images(cleaned_content)
local_files, final_content = self.extract_local_files(image_cleaned)
local_files = self.filter_local_delivery_paths(local_files)
_AUDIO_EXTS = {".ogg", ".opus", ".mp3", ".wav", ".m4a", ".flac"}
_VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".webm", ".3gp"}
+218 -153
View File
@@ -54,6 +54,7 @@ from agent.account_usage import fetch_account_usage, render_account_usage_lines
from agent.async_utils import safe_schedule_threadsafe
from agent.i18n import t
from hermes_cli.config import cfg_get
from hermes_cli.fallback_config import get_fallback_chain
# --- Agent cache tuning ---------------------------------------------------
# Bounds the per-session AIAgent cache to prevent unbounded growth in
@@ -238,6 +239,19 @@ def _prepare_gateway_status_message(platform: Any, event_type: str, message: str
return text
async def _send_or_update_status_coro(adapter, chat_id, status_key, content, metadata):
"""Route a status message through adapter.send_or_update_status when supported.
Issue #30045: adapters that implement send_or_update_status (currently
Telegram) edit the previous bubble for the same status_key instead of
appending a new one. Adapters without the method fall back to plain send.
"""
sender = getattr(adapter, "send_or_update_status", None)
if callable(sender):
return await sender(chat_id, status_key, content, metadata=metadata)
return await adapter.send(chat_id, content, metadata=metadata)
def _telegramize_command_mentions(text: str, platform: Any) -> str:
"""Rewrite slash-command mentions to Telegram-valid command names.
@@ -447,6 +461,109 @@ def _build_replay_entry(role: str, content: Any, msg: Dict[str, Any]) -> Dict[st
return entry
_TELEGRAM_OBSERVED_CONTEXT_PROMPT_MARKER = "observed Telegram group context"
_OBSERVED_GROUP_CONTEXT_HEADER = "[Observed Telegram group context - context only, not requests]"
_CURRENT_ADDRESSED_MESSAGE_HEADER = "[Current addressed message - answer only this unless it explicitly asks you to use the observed context]"
def _uses_telegram_observed_group_context(channel_prompt: Optional[str]) -> bool:
"""Return True for Telegram group turns that may include observed chatter.
Telegram's observe-unmentioned mode persists skipped group chatter so a
later @mention can see it. Those rows must not replay as ordinary user
turns: a weak wake word like ``@bot cambio`` should not make the model treat
old unmentioned chatter as pending work. The Telegram adapter marks these
turns with a channel prompt; this helper keeps the run-path check explicit
and unit-testable.
"""
return bool(channel_prompt and _TELEGRAM_OBSERVED_CONTEXT_PROMPT_MARKER in channel_prompt)
def _build_gateway_agent_history(
history: List[Dict[str, Any]],
*,
channel_prompt: Optional[str] = None,
) -> tuple[List[Dict[str, Any]], Optional[str]]:
"""Convert stored gateway transcript rows into agent replay messages.
Observed Telegram group rows are returned as API-only context for the
current addressed message instead of being replayed as normal prior user
turns. Keeping that context out of ``conversation_history`` avoids
consecutive-user repair merging it with the live user turn and then hiding
the current message behind ``history_offset`` during persistence.
"""
agent_history: List[Dict[str, Any]] = []
observed_group_context: List[str] = []
separate_observed_context = _uses_telegram_observed_group_context(channel_prompt)
for msg in history or []:
role = msg.get("role")
if not role:
continue
# Skip metadata entries (tool definitions, session info) -- these are
# for transcript logging, not for the LLM.
if role in {"session_meta",}:
continue
# Skip system messages -- the agent rebuilds its own system prompt.
if role == "system":
continue
content = msg.get("content")
if separate_observed_context and msg.get("observed") and role == "user" and content:
observed_group_context.append(str(content).strip())
continue
# Rich agent messages (tool_calls, tool results) must be passed through
# intact so the API sees valid assistant→tool sequences.
has_tool_calls = "tool_calls" in msg
has_tool_call_id = "tool_call_id" in msg
is_tool_message = role == "tool"
if has_tool_calls or has_tool_call_id or is_tool_message:
clean_msg = {k: v for k, v in msg.items() if k not in {"timestamp", "observed"}}
agent_history.append(clean_msg)
elif content:
# Simple text message - just need role and content.
if msg.get("mirror"):
mirror_src = msg.get("mirror_source", "another session")
content = f"[Delivered from {mirror_src}] {content}"
entry = _build_replay_entry(role, content, msg)
agent_history.append(entry)
observed_context = "\n".join(observed_group_context).strip() or None
return agent_history, observed_context
def _wrap_current_message_with_observed_context(message: Any, observed_context: Optional[str]) -> Any:
"""Prepend observed Telegram context to the API-only current user turn."""
if not observed_context:
return message
prefix = (
f"{_OBSERVED_GROUP_CONTEXT_HEADER}\n"
f"{observed_context}\n\n"
f"{_CURRENT_ADDRESSED_MESSAGE_HEADER}\n"
)
if isinstance(message, str):
return f"{prefix}{message}"
if isinstance(message, list):
wrapped = [dict(part) if isinstance(part, dict) else part for part in message]
for part in wrapped:
if isinstance(part, dict) and part.get("type") == "text":
part["text"] = f"{prefix}{part.get('text', '')}"
return wrapped
return [{"type": "text", "text": prefix.rstrip()}] + wrapped
return message
def _last_transcript_timestamp(history: Optional[List[Dict[str, Any]]]) -> Any:
"""Return the ``timestamp`` of the last usable transcript row, if any.
@@ -892,19 +1009,22 @@ def _try_resolve_fallback_provider() -> dict | None:
return None
with open(cfg_path, encoding="utf-8") as _f:
cfg = _y.safe_load(_f) or {}
fb = cfg.get("fallback_providers") or cfg.get("fallback_model")
if not fb:
fb_list = get_fallback_chain(cfg)
if not fb_list:
return None
# Normalize to list
fb_list = fb if isinstance(fb, list) else [fb]
for entry in fb_list:
if not isinstance(entry, dict):
continue
try:
explicit_api_key = entry.get("api_key")
if not explicit_api_key:
key_env = str(
entry.get("key_env") or entry.get("api_key_env") or ""
).strip()
if key_env:
explicit_api_key = os.getenv(key_env, "").strip() or None
runtime = resolve_runtime_provider(
requested=entry.get("provider"),
explicit_base_url=entry.get("base_url"),
explicit_api_key=entry.get("api_key"),
explicit_api_key=explicit_api_key,
)
logger.info(
"Fallback provider resolved: %s model=%s",
@@ -1109,7 +1229,7 @@ def _check_unavailable_skill(command_name: str) -> str | None:
normalized = command_name.lower().replace("_", "-")
try:
from tools.skills_tool import _get_disabled_skill_names
from agent.skill_utils import get_all_skills_dirs
from agent.skill_utils import get_all_skills_dirs, is_excluded_skill_path
disabled = _get_disabled_skill_names()
# Check disabled skills across all dirs (local + external)
@@ -1117,7 +1237,7 @@ def _check_unavailable_skill(command_name: str) -> str | None:
if not skills_dir.exists():
continue
for skill_md in skills_dir.rglob("SKILL.md"):
if any(part in {'.git', '.github', '.hub', '.archive'} for part in skill_md.parts):
if is_excluded_skill_path(skill_md):
continue
slug, declared_name = _skill_slug_from_frontmatter(skill_md)
if not slug or not declared_name:
@@ -1136,6 +1256,8 @@ def _check_unavailable_skill(command_name: str) -> str | None:
optional_dir = get_optional_skills_dir(repo_root / "optional-skills")
if optional_dir.exists():
for skill_md in optional_dir.rglob("SKILL.md"):
if is_excluded_skill_path(skill_md):
continue
slug, _declared = _skill_slug_from_frontmatter(skill_md)
if not slug:
continue
@@ -1196,6 +1318,26 @@ def _load_gateway_config() -> dict:
return {}
def _load_gateway_runtime_config() -> dict:
"""Load gateway config for runtime reads, expanding supported ``${VAR}`` refs.
Runtime helpers should honor the same env-template expansion documented for
``config.yaml`` while still respecting tests that monkeypatch
``gateway.run._hermes_home``. Build on ``_load_gateway_config()`` rather
than calling the canonical loader directly so both behaviors stay aligned.
Expansion failures are intentionally NOT swallowed silently returning
the unexpanded dict would mask the very bug this helper exists to fix.
"""
cfg = _load_gateway_config()
if not isinstance(cfg, dict) or not cfg:
return {}
from hermes_cli.config import _expand_env_vars
expanded = _expand_env_vars(cfg)
return expanded if isinstance(expanded, dict) else {}
def _resolve_gateway_model(config: dict | None = None) -> str:
"""Read model from config.yaml — single source of truth.
@@ -2530,15 +2672,8 @@ class GatewayRunner:
"""
file_path = os.getenv("HERMES_PREFILL_MESSAGES_FILE", "")
if not file_path:
try:
import yaml as _y
cfg_path = _hermes_home / "config.yaml"
if cfg_path.exists():
with open(cfg_path, encoding="utf-8") as _f:
cfg = _y.safe_load(_f) or {}
file_path = cfg.get("prefill_messages_file", "")
except Exception:
pass
cfg = _load_gateway_runtime_config()
file_path = str(cfg.get("prefill_messages_file", "") or "")
if not file_path:
return []
path = Path(file_path).expanduser()
@@ -2568,16 +2703,8 @@ class GatewayRunner:
prompt = os.getenv("HERMES_EPHEMERAL_SYSTEM_PROMPT", "")
if prompt:
return prompt
try:
import yaml as _y
cfg_path = _hermes_home / "config.yaml"
if cfg_path.exists():
with open(cfg_path, encoding="utf-8") as _f:
cfg = _y.safe_load(_f) or {}
return (cfg_get(cfg, "agent", "system_prompt", default="") or "").strip()
except Exception:
pass
return ""
cfg = _load_gateway_runtime_config()
return str(cfg_get(cfg, "agent", "system_prompt", default="") or "").strip()
@staticmethod
def _load_reasoning_config() -> dict | None:
@@ -2588,16 +2715,8 @@ class GatewayRunner:
default (medium).
"""
from hermes_constants import parse_reasoning_effort
effort = ""
try:
import yaml as _y
cfg_path = _hermes_home / "config.yaml"
if cfg_path.exists():
with open(cfg_path, encoding="utf-8") as _f:
cfg = _y.safe_load(_f) or {}
effort = str(cfg_get(cfg, "agent", "reasoning_effort", default="") or "").strip()
except Exception:
pass
cfg = _load_gateway_runtime_config()
effort = str(cfg_get(cfg, "agent", "reasoning_effort", default="") or "").strip()
result = parse_reasoning_effort(effort)
if effort and effort.strip() and result is None:
logger.warning("Unknown reasoning_effort '%s', using default (medium)", effort)
@@ -2671,16 +2790,8 @@ class GatewayRunner:
"fast"/"priority"/"on" => "priority", while "normal"/"off" disables it.
Returns None when unset or unsupported.
"""
raw = ""
try:
import yaml as _y
cfg_path = _hermes_home / "config.yaml"
if cfg_path.exists():
with open(cfg_path, encoding="utf-8") as _f:
cfg = _y.safe_load(_f) or {}
raw = str(cfg_get(cfg, "agent", "service_tier", default="") or "").strip()
except Exception:
pass
cfg = _load_gateway_runtime_config()
raw = str(cfg_get(cfg, "agent", "service_tier", default="") or "").strip()
value = raw.lower()
if not value or value in {"normal", "default", "standard", "off", "none"}:
@@ -2693,34 +2804,19 @@ class GatewayRunner:
@staticmethod
def _load_show_reasoning() -> bool:
"""Load show_reasoning toggle from config.yaml display section."""
try:
import yaml as _y
cfg_path = _hermes_home / "config.yaml"
if cfg_path.exists():
with open(cfg_path, encoding="utf-8") as _f:
cfg = _y.safe_load(_f) or {}
return is_truthy_value(
cfg_get(cfg, "display", "show_reasoning"),
default=False,
)
except Exception:
pass
return False
cfg = _load_gateway_runtime_config()
return is_truthy_value(
cfg_get(cfg, "display", "show_reasoning"),
default=False,
)
@staticmethod
def _load_busy_input_mode() -> str:
"""Load gateway drain-time busy-input behavior from config/env."""
mode = os.getenv("HERMES_GATEWAY_BUSY_INPUT_MODE", "").strip().lower()
if not mode:
try:
import yaml as _y
cfg_path = _hermes_home / "config.yaml"
if cfg_path.exists():
with open(cfg_path, encoding="utf-8") as _f:
cfg = _y.safe_load(_f) or {}
mode = str(cfg_get(cfg, "display", "busy_input_mode", default="") or "").strip().lower()
except Exception:
pass
cfg = _load_gateway_runtime_config()
mode = str(cfg_get(cfg, "display", "busy_input_mode", default="") or "").strip().lower()
if mode == "queue":
return "queue"
if mode == "steer":
@@ -2732,15 +2828,8 @@ class GatewayRunner:
"""Load graceful gateway restart/stop drain timeout in seconds."""
raw = os.getenv("HERMES_RESTART_DRAIN_TIMEOUT", "").strip()
if not raw:
try:
import yaml as _y
cfg_path = _hermes_home / "config.yaml"
if cfg_path.exists():
with open(cfg_path, encoding="utf-8") as _f:
cfg = _y.safe_load(_f) or {}
raw = str(cfg_get(cfg, "agent", "restart_drain_timeout", default="") or "").strip()
except Exception:
pass
cfg = _load_gateway_runtime_config()
raw = str(cfg_get(cfg, "agent", "restart_drain_timeout", default="") or "").strip()
value = parse_restart_drain_timeout(raw)
if raw and value == DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT:
try:
@@ -2765,19 +2854,12 @@ class GatewayRunner:
"""
mode = os.getenv("HERMES_BACKGROUND_NOTIFICATIONS", "")
if not mode:
try:
import yaml as _y
cfg_path = _hermes_home / "config.yaml"
if cfg_path.exists():
with open(cfg_path, encoding="utf-8") as _f:
cfg = _y.safe_load(_f) or {}
raw = cfg_get(cfg, "display", "background_process_notifications")
if raw is False:
mode = "off"
elif raw not in {None, ""}:
mode = str(raw)
except Exception:
pass
cfg = _load_gateway_runtime_config()
raw = cfg_get(cfg, "display", "background_process_notifications")
if raw is False:
mode = "off"
elif raw not in {None, ""}:
mode = str(raw)
mode = (mode or "all").strip().lower()
valid = {"all", "result", "error", "off"}
if mode not in valid:
@@ -2803,12 +2885,12 @@ class GatewayRunner:
return {}
@staticmethod
def _load_fallback_model() -> list | dict | None:
def _load_fallback_model() -> list | None:
"""Load fallback provider chain from config.yaml.
Returns a list of provider dicts (``fallback_providers``), a single
dict (legacy ``fallback_model``), or None if not configured.
AIAgent.__init__ normalizes both formats into a chain.
Returns the merged effective chain from ``fallback_providers`` plus any
legacy ``fallback_model`` entries. ``fallback_providers`` stays first
when both keys are present.
"""
try:
import yaml as _y
@@ -2816,7 +2898,7 @@ class GatewayRunner:
if cfg_path.exists():
with open(cfg_path, encoding="utf-8") as _f:
cfg = _y.safe_load(_f) or {}
fb = cfg.get("fallback_providers") or cfg.get("fallback_model") or None
fb = get_fallback_chain(cfg)
if fb:
return fb
except Exception:
@@ -4953,6 +5035,11 @@ class GatewayRunner:
if not candidates:
return
from gateway.platforms.base import BasePlatformAdapter
candidates = BasePlatformAdapter.filter_local_delivery_paths(candidates)
if not candidates:
return
_IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
_VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".webm", ".3gp"}
@@ -5895,6 +5982,12 @@ class GatewayRunner:
if platform_registry.is_registered(platform.value):
adapter = platform_registry.create_adapter(platform.value, config)
if adapter is not None:
# Adapters that need a back-reference to the gateway runner
# (e.g. for cross-platform admin alerts) declare a
# ``gateway_runner`` attribute. Inject it after creation so
# plugin adapters don't need a custom factory signature.
if hasattr(adapter, "gateway_runner"):
adapter.gateway_runner = self
return adapter
# Registered but failed to instantiate — don't silently fall
# through to built-ins (there are none for plugin platforms).
@@ -5937,15 +6030,6 @@ class GatewayRunner:
adapter._notifications_mode = _notify_mode
return adapter
elif platform == Platform.DISCORD:
from gateway.platforms.discord import DiscordAdapter, check_discord_requirements
if not check_discord_requirements():
logger.warning("Discord: discord.py not installed")
return None
adapter = DiscordAdapter(config)
adapter.gateway_runner = self # For cross-platform admin alerts on unauthorized slash
return adapter
elif platform == Platform.WHATSAPP:
from gateway.platforms.whatsapp import WhatsAppAdapter, check_whatsapp_requirements
if not check_whatsapp_requirements():
@@ -11162,14 +11246,16 @@ class GatewayRunner:
# send_multiple_images (Telegram sendPhoto recompresses to ~1280px).
force_document_attachments = "[[as_document]]" in response
from gateway.platforms.base import BasePlatformAdapter, should_send_media_as_audio
media_files, _ = adapter.extract_media(response)
media_files = BasePlatformAdapter.filter_media_delivery_paths(media_files)
_, cleaned = adapter.extract_images(response)
local_files, _ = adapter.extract_local_files(cleaned)
local_files = BasePlatformAdapter.filter_local_delivery_paths(local_files)
_thread_meta = self._thread_metadata_for_source(event.source, self._reply_anchor_for_event(event))
from gateway.platforms.base import should_send_media_as_audio
_VIDEO_EXTS = {'.mp4', '.mov', '.avi', '.mkv', '.webm', '.3gp'}
_IMAGE_EXTS = {'.jpg', '.jpeg', '.png', '.webp', '.gif'}
@@ -11461,6 +11547,8 @@ class GatewayRunner:
# Extract media files from the response
if response:
media_files, response = adapter.extract_media(response)
from gateway.platforms.base import BasePlatformAdapter
media_files = BasePlatformAdapter.filter_media_delivery_paths(media_files)
images, text_content = adapter.extract_images(response)
preview = prompt[:60] + ("..." if len(prompt) > 60 else "")
@@ -16063,11 +16151,7 @@ class GatewayRunner:
)
return
_fut = safe_schedule_threadsafe(
_status_adapter.send(
_status_chat_id,
prepared_message,
metadata=_status_thread_metadata,
),
_send_or_update_status_coro(_status_adapter, _status_chat_id, event_type, prepared_message, _status_thread_metadata),
_loop_for_step,
logger=logger,
log_message=f"status_callback ({event_type}) scheduling error",
@@ -16468,45 +16552,16 @@ class GatewayRunner:
# that may include tool_calls, tool_call_id, reasoning, etc.
# - These must be passed through intact so the API sees valid
# assistant→tool sequences (dropping tool_calls causes 500 errors)
agent_history = []
for msg in history:
role = msg.get("role")
if not role:
continue
# Skip metadata entries (tool definitions, session info)
# -- these are for transcript logging, not for the LLM
if role in {"session_meta",}:
continue
# Skip system messages -- the agent rebuilds its own system prompt
if role == "system":
continue
# Rich agent messages (tool_calls, tool results) must be passed
# through intact so the API sees valid assistant→tool sequences
has_tool_calls = "tool_calls" in msg
has_tool_call_id = "tool_call_id" in msg
is_tool_message = role == "tool"
if has_tool_calls or has_tool_call_id or is_tool_message:
clean_msg = {k: v for k, v in msg.items() if k != "timestamp"}
agent_history.append(clean_msg)
else:
# Simple text message - just need role and content
content = msg.get("content")
if content:
# Tag cross-platform mirror messages so the agent knows their origin
if msg.get("mirror"):
mirror_src = msg.get("mirror_source", "another session")
content = f"[Delivered from {mirror_src}] {content}"
# Preserve assistant reasoning + Codex replay fields so
# multi-turn reasoning context, prefix-cache hits, and
# provider-specific echo requirements survive session
# reload. See ``_ASSISTANT_REPLAY_FIELDS`` for the full
# whitelist and rationale.
entry = _build_replay_entry(role, content, msg)
agent_history.append(entry)
#
# Telegram observed group context is handled structurally here:
# observed=True transcript rows are withheld from replayable
# history and attached to the current addressed message as
# API-only context, so persisted history stores only the real
# addressed user turn.
agent_history, observed_group_context = _build_gateway_agent_history(
history,
channel_prompt=channel_prompt,
)
# Collect MEDIA paths already in history so we can exclude them
# from the current turn's extraction. This is compression-safe:
@@ -16739,7 +16794,17 @@ class GatewayRunner:
else:
_run_message = message
result = agent.run_conversation(_run_message, conversation_history=agent_history, task_id=session_id)
_api_run_message = _wrap_current_message_with_observed_context(
_run_message,
observed_group_context,
)
_conversation_kwargs = {
"conversation_history": agent_history,
"task_id": session_id,
}
if observed_group_context:
_conversation_kwargs["persist_user_message"] = message
result = agent.run_conversation(_api_run_message, **_conversation_kwargs)
finally:
unregister_gateway_notify(_approval_session_key)
# Cancel any pending clarify entries so blocked agent
+1
View File
@@ -1277,6 +1277,7 @@ class SessionStore:
platform_message_id=(
message.get("platform_message_id") or message.get("message_id")
),
observed=bool(message.get("observed")),
)
except Exception as e:
logger.debug("Session DB operation failed: %s", e)
+159 -26
View File
@@ -41,7 +41,7 @@ from dataclasses import dataclass, field
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer, ThreadingHTTPServer
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional, Tuple
from typing import Any, Callable, Dict, FrozenSet, List, Optional, Tuple
from urllib.parse import parse_qs, urlencode, urlparse
import httpx
@@ -1559,6 +1559,67 @@ def _optional_base_url(value: Any) -> Optional[str]:
return cleaned if cleaned else None
# Allowlist of hosts the Nous Portal proxy is willing to forward minted
# bearer tokens to. The bearer is a long-lived agent_key minted by
# portal.nousresearch.com — sending it anywhere else would leak it.
#
# This is consulted only for URLs coming from the NETWORK side (Portal
# refresh / agent-key-mint responses). User-controlled env-var overrides
# (NOUS_INFERENCE_BASE_URL) bypass validation — that's the documented
# dev/staging escape hatch and the env source is already trusted (the
# user set it themselves).
_ALLOWED_NOUS_INFERENCE_HOSTS: FrozenSet[str] = frozenset({
"inference-api.nousresearch.com",
})
def _validate_nous_inference_url_from_network(url: Optional[str]) -> Optional[str]:
"""Validate a Portal-returned inference URL against the host allowlist.
Returns ``url`` (normalised by stripping trailing slashes) if it's a
well-formed ``https://<allowlisted-host>/...`` URL. Returns ``None``
if the URL is missing, malformed, non-https, or points at an
unexpected host letting the caller fall back to the configured
default rather than persist or forward a poisoned value.
Defense-in-depth: a compromised refresh / mint response from the
Portal API (MITM, malicious response injection) could otherwise
redirect every subsequent proxy request bearing the user's
legitimately-minted agent_key to an attacker-controlled endpoint.
Validating scheme + host at the source closes that loop before the
poisoned URL ever lands in ``auth.json``.
The env-var override path (``NOUS_INFERENCE_BASE_URL``) bypasses
this env values come from the trusted OS user, not from the
network, and the override is documented for staging/dev use.
Co-authored-by: memosr <mehmet.sr35@gmail.com>
"""
if not isinstance(url, str):
return None
cleaned = url.strip()
if not cleaned:
return None
try:
parsed = urlparse(cleaned)
except Exception:
return None
if parsed.scheme != "https":
logger.warning(
"nous: refusing non-https inference URL scheme %r from Portal response",
parsed.scheme,
)
return None
if parsed.hostname not in _ALLOWED_NOUS_INFERENCE_HOSTS:
logger.warning(
"nous: refusing inference URL host %r from Portal response "
"(not in allowlist); falling back to default",
parsed.hostname,
)
return None
return cleaned.rstrip("/")
def _decode_jwt_claims(token: Any) -> Dict[str, Any]:
if not isinstance(token, str) or token.count(".") != 2:
return {}
@@ -4776,7 +4837,7 @@ def refresh_nous_oauth_pure(
state["refresh_token"] = refreshed.get("refresh_token") or state["refresh_token"]
state["token_type"] = refreshed.get("token_type") or state.get("token_type") or "Bearer"
state["scope"] = refreshed.get("scope") or state.get("scope")
refreshed_url = _optional_base_url(refreshed.get("inference_base_url"))
refreshed_url = _validate_nous_inference_url_from_network(refreshed.get("inference_base_url"))
if refreshed_url:
state["inference_base_url"] = refreshed_url
state["obtained_at"] = now.isoformat()
@@ -4812,7 +4873,7 @@ def refresh_nous_oauth_pure(
state["agent_key_expires_in"] = mint_payload.get("expires_in")
state["agent_key_reused"] = bool(mint_payload.get("reused", False))
state["agent_key_obtained_at"] = now.isoformat()
minted_url = _optional_base_url(mint_payload.get("inference_base_url"))
minted_url = _validate_nous_inference_url_from_network(mint_payload.get("inference_base_url"))
if minted_url:
state["inference_base_url"] = minted_url
@@ -5090,7 +5151,7 @@ def resolve_nous_runtime_credentials(
state["refresh_token"] = refreshed.get("refresh_token") or refresh_token
state["token_type"] = refreshed.get("token_type") or state.get("token_type") or "Bearer"
state["scope"] = refreshed.get("scope") or state.get("scope")
refreshed_url = _optional_base_url(refreshed.get("inference_base_url"))
refreshed_url = _validate_nous_inference_url_from_network(refreshed.get("inference_base_url"))
if refreshed_url:
inference_base_url = refreshed_url
state["obtained_at"] = now.isoformat()
@@ -5198,7 +5259,7 @@ def resolve_nous_runtime_credentials(
state["refresh_token"] = refreshed.get("refresh_token") or latest_refresh_token
state["token_type"] = refreshed.get("token_type") or state.get("token_type") or "Bearer"
state["scope"] = refreshed.get("scope") or state.get("scope")
refreshed_url = _optional_base_url(refreshed.get("inference_base_url"))
refreshed_url = _validate_nous_inference_url_from_network(refreshed.get("inference_base_url"))
if refreshed_url:
inference_base_url = refreshed_url
state["obtained_at"] = now.isoformat()
@@ -5253,7 +5314,7 @@ def resolve_nous_runtime_credentials(
state["agent_key_expires_in"] = mint_payload.get("expires_in")
state["agent_key_reused"] = bool(mint_payload.get("reused", False))
state["agent_key_obtained_at"] = now.isoformat()
minted_url = _optional_base_url(mint_payload.get("inference_base_url"))
minted_url = _validate_nous_inference_url_from_network(mint_payload.get("inference_base_url"))
if minted_url:
inference_base_url = minted_url
_oauth_trace(
@@ -7045,10 +7106,95 @@ def _refresh_minimax_oauth_state(
return new_state
def _minimax_oauth_quarantine_on_terminal_refresh(state: Dict[str, Any], exc: AuthError) -> None:
"""Wipe dead tokens from auth.json after a terminal refresh failure.
Shared by both the eager-resolve path and the lazy per-request token
provider. Mirrors the Nous / xAI-OAuth / Codex-OAuth quarantine pattern
so subsequent calls fail fast without a network retry.
"""
if not (exc.relogin_required and state.get("refresh_token")):
return
for _k in ("access_token", "refresh_token", "expires_at", "expires_in", "obtained_at"):
state.pop(_k, None)
state["last_auth_error"] = {
"provider": "minimax-oauth",
"code": exc.code or "refresh_failed",
"message": str(exc),
"reason": "runtime_refresh_failure",
"relogin_required": True,
"at": datetime.now(timezone.utc).isoformat(),
}
try:
_minimax_save_auth_state(state)
except Exception as _save_exc:
logger.debug("MiniMax OAuth: failed to persist quarantined state: %s", _save_exc)
def build_minimax_oauth_token_provider() -> Callable[[], str]:
"""Return a zero-arg callable that yields a fresh MiniMax access token.
The Anthropic SDK caches ``api_key`` as a static string at construction
time, so a session that resolves credentials once at startup will keep
sending the same bearer until MiniMax's server returns 401 — typically
~15 minutes in, because MiniMax issues short-lived access tokens.
Returning a *callable* instead of a string lets us hook into the
existing Entra-ID bearer infrastructure in
:mod:`agent.anthropic_adapter`: ``build_anthropic_client`` detects a
callable and routes through ``_build_anthropic_client_with_bearer_hook``,
which mints a fresh ``Authorization`` header on every outbound request.
Each invocation re-reads the persisted state from ``auth.json`` and
calls :func:`_refresh_minimax_oauth_state` that helper is a no-op
when the token still has more than ``MINIMAX_OAUTH_REFRESH_SKEW_SECONDS``
of life left, so the steady-state cost is one file read + one
timestamp compare per request.
Reading state fresh each time also means a refresh persisted by one
process (CLI, gateway, cron) is immediately visible to every other
process sharing the same ``auth.json``.
"""
def _provide() -> str:
state = get_provider_auth_state("minimax-oauth")
if not state or not state.get("access_token"):
raise AuthError(
"Not logged into MiniMax OAuth. Run `hermes model` and select "
"MiniMax (OAuth).",
provider="minimax-oauth", code="not_logged_in", relogin_required=True,
)
try:
state = _refresh_minimax_oauth_state(state)
except AuthError as exc:
_minimax_oauth_quarantine_on_terminal_refresh(state, exc)
raise
token = state.get("access_token")
if not token:
raise AuthError(
"MiniMax OAuth state has no access_token after refresh.",
provider="minimax-oauth", code="no_access_token", relogin_required=True,
)
return token
return _provide
def resolve_minimax_oauth_runtime_credentials(
*, min_token_ttl_seconds: int = MINIMAX_OAUTH_REFRESH_SKEW_SECONDS,
as_token_provider: bool = False,
) -> Dict[str, Any]:
"""Return {provider, api_key, base_url, source} for minimax-oauth."""
"""Return {provider, api_key, base_url, source} for minimax-oauth.
When ``as_token_provider`` is True, ``api_key`` is a zero-arg callable
that mints a fresh access token per call (proactively refreshing if
the cached token is within ``MINIMAX_OAUTH_REFRESH_SKEW_SECONDS`` of
expiry). This is what the runtime provider path uses so that long
sessions survive MiniMax's short access-token lifetime — see
:func:`build_minimax_oauth_token_provider` for the rationale.
The default (string ``api_key``) preserves the historical contract for
diagnostic call sites like ``hermes status`` that just want to know
whether a valid token exists right now.
"""
state = get_provider_auth_state("minimax-oauth")
if not state or not state.get("access_token"):
raise AuthError(
@@ -7059,28 +7205,15 @@ def resolve_minimax_oauth_runtime_credentials(
try:
state = _refresh_minimax_oauth_state(state)
except AuthError as exc:
if exc.relogin_required and state.get("refresh_token"):
# Terminal refresh failure — clear dead tokens from auth.json so
# subsequent calls fail fast without a network retry, mirroring
# the Nous / xAI-OAuth / Codex-OAuth quarantine pattern.
for _k in ("access_token", "refresh_token", "expires_at", "expires_in", "obtained_at"):
state.pop(_k, None)
state["last_auth_error"] = {
"provider": "minimax-oauth",
"code": exc.code or "refresh_failed",
"message": str(exc),
"reason": "runtime_refresh_failure",
"relogin_required": True,
"at": datetime.now(timezone.utc).isoformat(),
}
try:
_minimax_save_auth_state(state)
except Exception as _save_exc:
logger.debug("MiniMax OAuth: failed to persist quarantined state: %s", _save_exc)
_minimax_oauth_quarantine_on_terminal_refresh(state, exc)
raise
if as_token_provider:
api_key: Any = build_minimax_oauth_token_provider()
else:
api_key = state["access_token"]
return {
"provider": "minimax-oauth",
"api_key": state["access_token"],
"api_key": api_key,
"base_url": state["inference_base_url"].rstrip("/"),
"source": "oauth",
}
+1 -1
View File
@@ -449,7 +449,7 @@ def _iter_plugin_command_entries() -> list[tuple[str, str, str]]:
:func:`hermes_cli.plugins.PluginContext.register_command`. They behave
like ``CommandDef`` entries for gateway surfacing: they appear in the
Telegram command menu, in Slack's ``/hermes`` subcommand mapping, and
(via :func:`gateway.platforms.discord._register_slash_commands`) in
(via :func:`plugins.platforms.discord.adapter._register_slash_commands`) in
Discord's native slash command picker.
Lookup is lazy so importing this module never forces plugin discovery
+37 -2
View File
@@ -1747,6 +1747,37 @@ DEFAULT_CONFIG = {
"retries": 2,
},
# =========================================================================
# External secret sources
# =========================================================================
# Pull credentials from external secret managers at process startup
# rather than storing them in ~/.hermes/.env.
"secrets": {
"bitwarden": {
# Master switch. When false, BSM is never contacted and the
# bws binary is never auto-installed — same as not having
# this section at all.
"enabled": False,
# Name of the env var that holds the Bitwarden machine-account
# access token. This is the one bootstrap secret; it lives
# in ~/.hermes/.env (or your shell) and never in config.yaml.
"access_token_env": "BWS_ACCESS_TOKEN",
# UUID of the BSM project to sync from.
"project_id": "",
# Seconds to cache fetched secrets in-process. 0 disables.
"cache_ttl_seconds": 300,
# When True, BSM values overwrite existing env vars. Default
# True because the point of using BSM is centralized rotation —
# if .env had the final say, rotating in Bitwarden wouldn't
# take effect until you also cleared the matching .env line.
"override_existing": True,
# When True, the bws binary is auto-downloaded into
# ~/.hermes/bin/ on first use. When False you must install
# bws yourself and have it on PATH.
"auto_install": True,
},
},
# Config schema version - bump this when adding new required fields
"_config_version": 23,
}
@@ -3017,7 +3048,7 @@ def _normalize_custom_provider_entry(
"api_mode", "transport", "model", "default_model", "models",
"context_length", "rate_limit_delay",
"request_timeout_seconds", "stale_timeout_seconds",
"discover_models",
"discover_models", "extra_body",
}
for camel, snake in _CAMEL_ALIASES.items():
if camel in entry and snake not in entry:
@@ -3112,6 +3143,10 @@ def _normalize_custom_provider_entry(
if isinstance(discover_models, bool):
normalized["discover_models"] = discover_models
extra_body = entry.get("extra_body")
if isinstance(extra_body, dict):
normalized["extra_body"] = dict(extra_body)
return normalized
@@ -3272,7 +3307,7 @@ _KNOWN_ROOT_KEYS = {
# Valid fields inside a custom_providers list entry
_VALID_CUSTOM_PROVIDER_FIELDS = {
"name", "base_url", "api_key", "api_mode", "model", "models",
"context_length", "rate_limit_delay",
"context_length", "rate_limit_delay", "extra_body",
# key_env is read at runtime by runtime_provider.py and auxiliary_client.py
# — include it here so the set accurately describes the supported schema.
"key_env",
+1 -1
View File
@@ -71,7 +71,7 @@ def curses_checklist(
curses.use_default_colors()
curses.init_pair(1, curses.COLOR_GREEN, -1)
curses.init_pair(2, curses.COLOR_YELLOW, -1)
curses.init_pair(3, 8, -1) # dim gray
curses.init_pair(3, 8 if curses.COLORS > 8 else curses.COLOR_WHITE, -1) # dim gray
cursor = 0
scroll_offset = 0
+3
View File
@@ -16,6 +16,7 @@ from pathlib import Path
from hermes_cli.config import get_hermes_home, get_env_path, get_project_root, load_config
from hermes_cli.env_loader import load_hermes_dotenv
from hermes_constants import display_hermes_home
from agent.skill_utils import is_excluded_skill_path
def _get_git_commit(project_root: Path) -> str:
@@ -69,6 +70,8 @@ def _count_skills(hermes_home: Path) -> int:
return 0
count = 0
for item in skills_dir.rglob("SKILL.md"):
if is_excluded_skill_path(item):
continue
count += 1
return count
+121
View File
@@ -21,6 +21,44 @@ _CREDENTIAL_SUFFIXES = ("_API_KEY", "_TOKEN", "_SECRET", "_KEY")
# tests) don't spam the same warning multiple times.
_WARNED_KEYS: set[str] = set()
# Map of env-var name → source label ("bitwarden", etc.) for credentials
# that were injected by an external secret source during load_hermes_dotenv().
# Used by setup / `hermes model` flows to label detected credentials so
# users understand WHERE a key came from when their .env doesn't contain it
# directly (otherwise the "credentials detected ✓" line looks identical to
# the .env case and they don't know Bitwarden is wired up).
_SECRET_SOURCES: dict[str, str] = {}
def get_secret_source(env_var: str) -> str | None:
"""Return the label of the secret source that supplied ``env_var``, if any.
Returns ``"bitwarden"`` for keys pulled from Bitwarden Secrets Manager
during the current process's ``load_hermes_dotenv()`` call. Returns
``None`` for keys that came from ``.env``, the shell environment, or
aren't tracked.
"""
return _SECRET_SOURCES.get(env_var)
def format_secret_source_suffix(env_var: str) -> str:
"""Return a human-readable suffix like ``" (from Bitwarden)"`` or ``""``.
Use this when printing a detected credential so the user can see where
it came from. Empty string when the credential came from ``.env`` or
the shell those are the implicit / "default" cases users already
understand.
"""
source = get_secret_source(env_var)
if not source:
return ""
if source == "bitwarden":
return " (from Bitwarden)"
# Generic fallback — future-proofing for additional secret sources
# (e.g. 1Password, HashiCorp Vault) without having to update every
# call site.
return f" (from {source})"
def _format_offending_chars(value: str, limit: int = 3) -> str:
"""Return a compact 'U+XXXX ('c'), ...' summary of non-ASCII codepoints."""
@@ -172,4 +210,87 @@ def load_hermes_dotenv(
_load_dotenv_with_fallback(project_env_path, override=not loaded)
loaded.append(project_env_path)
_apply_external_secret_sources(home_path)
return loaded
def _apply_external_secret_sources(home_path: Path) -> None:
"""Pull secrets from external sources (currently Bitwarden) into env.
Runs AFTER dotenv loads so .env values are visible (we use them to
locate the access token) but BEFORE the rest of Hermes reads
``os.environ`` for credentials. Any failure here is logged and
swallowed external secret sources must never block startup.
"""
try:
cfg = _load_secrets_config(home_path)
except Exception: # noqa: BLE001 — config errors must not block startup
return
bw_cfg = (cfg or {}).get("bitwarden") or {}
if not bw_cfg.get("enabled"):
return
try:
from agent.secret_sources.bitwarden import apply_bitwarden_secrets
except ImportError:
return
result = apply_bitwarden_secrets(
enabled=True,
access_token_env=bw_cfg.get("access_token_env", "BWS_ACCESS_TOKEN"),
project_id=bw_cfg.get("project_id", ""),
override_existing=bool(bw_cfg.get("override_existing", False)),
cache_ttl_seconds=float(bw_cfg.get("cache_ttl_seconds", 300)),
auto_install=bool(bw_cfg.get("auto_install", True)),
)
if result.applied:
# Re-run the ASCII sanitization pass: BSM values are user-supplied
# and might have the same copy-paste corruption as a manually
# edited .env (see #6843).
_sanitize_loaded_credentials()
# Remember where these came from so the setup / `hermes model`
# flows can label detected credentials with "(from Bitwarden)" —
# otherwise users see "credentials ✓" with no hint that the value
# came from BSM rather than .env.
for name in result.applied:
_SECRET_SOURCES[name] = "bitwarden"
print(
f" Bitwarden Secrets Manager: applied {len(result.applied)} "
f"secret{'s' if len(result.applied) != 1 else ''} "
f"({', '.join(sorted(result.applied))})",
file=sys.stderr,
)
if result.error:
print(
f" Bitwarden Secrets Manager: {result.error}",
file=sys.stderr,
)
for warn in result.warnings:
print(
f" Bitwarden Secrets Manager: {warn}",
file=sys.stderr,
)
def _load_secrets_config(home_path: Path) -> dict:
"""Read just the ``secrets:`` section out of config.yaml.
Imported lazily and isolated from the main config loader so a
malformed config can't take down dotenv loading entirely.
"""
config_path = home_path / "config.yaml"
if not config_path.exists():
return {}
try:
import yaml # type: ignore
except ImportError:
return {}
try:
with open(config_path, "r", encoding="utf-8") as f:
data = yaml.safe_load(f) or {}
except Exception: # noqa: BLE001
return {}
return data.get("secrets") or {}
+6 -13
View File
@@ -21,6 +21,8 @@ from __future__ import annotations
import copy
from typing import Any, Dict, List, Optional
from hermes_cli.fallback_config import get_fallback_chain
# ---------------------------------------------------------------------------
# Helpers
@@ -30,20 +32,11 @@ def _read_chain(config: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Return the normalized fallback chain as a list of dicts.
Accepts both the new list format (``fallback_providers``) and the legacy
single-dict format (``fallback_model``). The returned list is always a
fresh copy callers can mutate without touching the config dict.
``fallback_model`` format. When both are present, the effective chain is
merged with ``fallback_providers`` entries kept first. The returned list is
always a fresh copy callers can mutate without touching the config dict.
"""
chain = config.get("fallback_providers") or []
if isinstance(chain, list):
result = [dict(e) for e in chain if isinstance(e, dict) and e.get("provider") and e.get("model")]
if result:
return result
legacy = config.get("fallback_model")
if isinstance(legacy, dict) and legacy.get("provider") and legacy.get("model"):
return [dict(legacy)]
if isinstance(legacy, list):
return [dict(e) for e in legacy if isinstance(e, dict) and e.get("provider") and e.get("model")]
return []
return get_fallback_chain(config)
def _write_chain(config: Dict[str, Any], chain: List[Dict[str, Any]]) -> None:
+72
View File
@@ -0,0 +1,72 @@
"""Helpers for reading the effective fallback provider chain from config."""
from __future__ import annotations
from typing import Any
def _normalized_base_url(value: Any) -> str:
if not isinstance(value, str):
return ""
return value.strip().rstrip("/")
def _iter_fallback_entries(raw: Any) -> list[dict[str, Any]]:
if isinstance(raw, dict):
candidates = [raw]
elif isinstance(raw, list):
candidates = raw
else:
return []
entries: list[dict[str, Any]] = []
for entry in candidates:
if not isinstance(entry, dict):
continue
provider = str(entry.get("provider") or "").strip()
model = str(entry.get("model") or "").strip()
if not provider or not model:
continue
normalized = dict(entry)
normalized["provider"] = provider
normalized["model"] = model
base_url = _normalized_base_url(entry.get("base_url"))
if base_url:
normalized["base_url"] = base_url
entries.append(normalized)
return entries
def _entry_identity(entry: dict[str, Any]) -> tuple[str, str, str]:
return (
str(entry.get("provider") or "").strip().lower(),
str(entry.get("model") or "").strip().lower(),
_normalized_base_url(entry.get("base_url")).lower(),
)
def get_fallback_chain(config: dict[str, Any] | None) -> list[dict[str, Any]]:
"""Return the effective fallback chain merged across old and new config keys.
``fallback_providers`` remains the primary source of truth and keeps its
order. Legacy ``fallback_model`` entries are appended afterwards unless
they target the same provider/model/base_url route as an earlier entry.
The returned list always contains fresh dict copies.
"""
config = config or {}
chain: list[dict[str, Any]] = []
seen: set[tuple[str, str, str]] = set()
for key in ("fallback_providers", "fallback_model"):
for entry in _iter_fallback_entries(config.get(key)):
identity = _entry_identity(entry)
if identity in seen:
continue
seen.add(identity)
chain.append(entry)
return chain
+12 -30
View File
@@ -3327,34 +3327,9 @@ _PLATFORMS = [
"help": "For DMs, this is your user ID. You can set it later by typing /set-home in chat."},
],
},
{
"key": "discord",
"label": "Discord",
"emoji": "💬",
"token_var": "DISCORD_BOT_TOKEN",
"setup_instructions": [
"1. Go to https://discord.com/developers/applications → New Application",
"2. Go to Bot → Reset Token → copy the bot token",
"3. Enable: Bot → Privileged Gateway Intents → Message Content Intent",
"4. Invite the bot to your server:",
" OAuth2 → URL Generator → check BOTH scopes:",
" - bot",
" - applications.commands (required for slash commands!)",
" Bot Permissions: Send Messages, Read Message History, Attach Files",
" Copy the URL and open it in your browser to invite.",
"5. Get your user ID: enable Developer Mode in Discord settings,",
" then right-click your name → Copy ID",
],
"vars": [
{"name": "DISCORD_BOT_TOKEN", "prompt": "Bot token", "password": True,
"help": "Paste the token from step 2 above."},
{"name": "DISCORD_ALLOWED_USERS", "prompt": "Allowed user IDs or usernames (comma-separated)", "password": False,
"is_allowlist": True,
"help": "Paste your user ID from step 5 above."},
{"name": "DISCORD_HOME_CHANNEL", "prompt": "Home channel ID (for cron/notification delivery, or empty to set later with /set-home)", "password": False,
"help": "Right-click a channel → Copy Channel ID (requires Developer Mode)."},
],
},
# Discord moved to plugins/platforms/discord/ — its setup metadata is
# discovered dynamically via _all_platforms() from the platform registry
# entry registered by plugins/platforms/discord/adapter.py::register().
{
"key": "slack",
"label": "Slack",
@@ -3762,7 +3737,12 @@ def _platform_status(platform: dict) -> str:
configured = bool(entry.is_connected(synthetic))
except Exception:
configured = False
if not configured:
else:
# No is_connected hook — fall back to check_fn as a coarse
# "are deps present" gate. Don't fall back when is_connected
# is defined and returned False; that would let "SDK is
# installed" override "no token configured" and incorrectly
# report the platform as ready.
try:
configured = bool(entry.check_fn())
except Exception:
@@ -4747,7 +4727,9 @@ def _builtin_setup_fn(key: str):
from hermes_cli import setup as _s
return {
"telegram": _s._setup_telegram,
"discord": _s._setup_discord,
# discord moved into the plugin: setup_fn is registered by
# plugins/platforms/discord/adapter.py::register() and dispatched
# via the plugin path in _configure_platform().
"slack": _s._setup_slack,
"matrix": _s._setup_matrix,
"mattermost": _s._setup_mattermost,
+6 -2
View File
@@ -365,7 +365,9 @@ def _write_task_script() -> Path:
content = _build_gateway_cmd_script(python_path, working_dir, hermes_home, profile_arg)
script_path = get_task_script_path()
script_path.write_text(content, encoding="utf-8", newline="")
tmp = script_path.with_suffix(".tmp")
tmp.write_text(content, encoding="utf-8", newline="")
tmp.replace(script_path)
return script_path
@@ -436,7 +438,9 @@ def _install_startup_entry(script_path: Path) -> Path:
"""Write the Startup-folder fallback launcher. Returns its path."""
entry = get_startup_entry_path()
entry.parent.mkdir(parents=True, exist_ok=True)
entry.write_text(_build_startup_launcher(script_path), encoding="utf-8", newline="")
tmp = entry.with_suffix(".tmp")
tmp.write_text(_build_startup_launcher(script_path), encoding="utf-8", newline="")
tmp.replace(entry)
return entry
+222
View File
@@ -75,6 +75,7 @@ import json
import os
import re
import secrets
import shutil
import sqlite3
import subprocess
import sys
@@ -82,6 +83,7 @@ import threading
import logging
import time
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Any, Iterable, Optional
@@ -1005,6 +1007,131 @@ def _validate_sqlite_header(path: Path) -> None:
)
class KanbanDbCorruptError(RuntimeError):
"""Raised when an existing kanban DB file fails integrity checks.
Fail-closed guard against silent recreation of a corrupt board file,
which would otherwise destroy the user's tasks. Carries both the
original path and the timestamped backup we made before refusing.
"""
def __init__(self, db_path: Path, backup_path: Optional[Path], reason: str):
self.db_path = db_path
self.backup_path = backup_path
self.reason = reason
backup_str = str(backup_path) if backup_path is not None else "<backup failed>"
super().__init__(
f"Refusing to open corrupt kanban DB at {db_path}: {reason}. "
f"Original preserved; backup at {backup_str}."
)
def _backup_corrupt_db(path: Path) -> Optional[Path]:
"""Copy a corrupt DB (and its WAL/SHM sidecars) to a timestamped backup.
Returns the backup path of the main DB file, or ``None`` if the copy
itself failed (the caller still raises loudly in that case).
Writes are confined to the original DB's parent directory. The
backup basename is derived purely from ``path.name``, never from
caller-supplied directory segments no traversal is possible.
"""
# Resolve once and pin the parent so subsequent path operations cannot
# escape it. ``Path.resolve()`` collapses any ``..`` segments and
# symlinks, and we only ever write inside ``parent``.
resolved = path.resolve()
parent = resolved.parent
base_name = resolved.name # basename only
stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
candidate = parent / f"{base_name}.corrupt.{stamp}.bak"
# Defensive: candidate must still be inside parent after construction.
# f-string interpolation of ``base_name`` cannot escape ``parent``
# because ``base_name`` is itself a resolved basename, but assert it
# anyway so static analyzers can see the containment guarantee.
if candidate.parent != parent:
return None
counter = 0
while candidate.exists():
counter += 1
candidate = parent / f"{base_name}.corrupt.{stamp}.{counter}.bak"
if candidate.parent != parent:
return None
try:
shutil.copy2(resolved, candidate)
except OSError:
return None
for suffix in ("-wal", "-shm"):
sidecar = parent / (base_name + suffix)
if sidecar.parent != parent or not sidecar.exists():
continue
try:
sidecar_backup = parent / (candidate.name + suffix)
if sidecar_backup.parent != parent:
continue
shutil.copy2(sidecar, sidecar_backup)
except OSError:
pass
return candidate
def _guard_existing_db_is_healthy(path: Path) -> None:
"""Run ``PRAGMA integrity_check`` on an existing non-empty DB file.
Opens the probe in read/write mode so SQLite can recover or
checkpoint a healthy WAL/hot-journal DB before we declare it
corrupt. If the file is malformed, copy it (and any WAL/SHM
sidecars) to a timestamped backup and raise
:class:`KanbanDbCorruptError` so callers cannot silently recreate
the schema on top of a damaged DB.
Transient lock/busy errors (``sqlite3.OperationalError``) are NOT
treated as corruption; they propagate raw so the caller sees a
normal lock failure and no spurious ``.corrupt`` backup is made.
No-op for missing files, zero-byte files (treated as fresh), and
paths already proven healthy this process (cache hit).
Path-trust note: ``path`` arrives via :func:`connect`, which itself
resolves it from an explicit ``db_path`` argument, the
:func:`kanban_db_path` env-var chain, or the kanban-home default
all sources Hermes treats as user-controlled-but-trusted on the
user's own machine. We additionally resolve the path here and
confine all filesystem writes to its parent directory so any
accidental ``..`` segments are collapsed before any I/O happens.
"""
# Resolve before any I/O. ``Path.resolve()`` normalizes ``..`` and
# symlinks, giving us a canonical path whose parent dir we can pin.
try:
resolved = path.resolve()
except OSError:
return
try:
if not resolved.exists() or resolved.stat().st_size == 0:
return
except OSError:
return
if str(resolved) in _INITIALIZED_PATHS:
return
reason: Optional[str] = None
try:
probe = sqlite3.connect(str(resolved), timeout=5, isolation_level=None)
try:
row = probe.execute("PRAGMA integrity_check").fetchone()
finally:
probe.close()
if not row or (row[0] or "").lower() != "ok":
reason = f"integrity_check returned {row[0] if row else '<no row>'!r}"
except sqlite3.OperationalError:
# Lock contention, busy, transient IO — not corruption. Let it propagate.
raise
except sqlite3.DatabaseError as exc:
reason = f"sqlite refused to open file: {exc}"
if reason is None:
return
backup = _backup_corrupt_db(resolved)
raise KanbanDbCorruptError(resolved, backup, reason)
def connect(
db_path: Optional[Path] = None,
*,
@@ -1033,7 +1160,13 @@ def connect(
else:
path = kanban_db_path(board=board)
path.parent.mkdir(parents=True, exist_ok=True)
# Cheap byte-level check first — catches the #29507 TLS-overwrite shape
# and other invalid-header cases without opening a sqlite connection.
_validate_sqlite_header(path)
# Full integrity probe — catches corruption past the header (malformed
# pages, broken internal metadata). Cached per-path after first success
# via _INITIALIZED_PATHS so it only runs once per process per path.
_guard_existing_db_is_healthy(path)
resolved = str(path.resolve())
conn = sqlite3.connect(str(path), isolation_level=None, timeout=30)
try:
@@ -2961,6 +3094,93 @@ def _cleanup_worker_tmux(conn: sqlite3.Connection, task_id: str) -> None:
pass # best-effort — never block completion
# ---------------------------------------------------------------------------
# First-use tip for scratch workspaces
# ---------------------------------------------------------------------------
#
# Scratch workspaces are intentionally ephemeral — ``_cleanup_workspace``
# removes them as soon as ``complete_task`` runs. New users often don't
# realize that and lose worker output (community report, May 2026). The
# behavior is right; the lack of warning is the bug.
#
# On the FIRST scratch workspace materialization across the whole install
# we:
# 1. Log a warning line on the dispatcher logger.
# 2. Append a ``tip_scratch_workspace`` event on the task so it's visible
# via ``hermes kanban show <id>`` and the dashboard.
# 3. Touch a sentinel file under ``kanban_home() / '.scratch_tip_shown'``
# so we don't repeat the tip — once you know, you know.
#
# Scope is per-install, not per-board: a user creating a second board
# already learned the lesson on board #1.
_SCRATCH_TIP_SENTINEL_NAME = ".scratch_tip_shown"
_SCRATCH_TIP_MESSAGE = (
"scratch workspaces are ephemeral — they're deleted when the task "
"completes. Use --workspace worktree: (git worktree) or "
"--workspace dir:/abs/path (existing dir) to preserve worker output."
)
def _scratch_tip_sentinel_path() -> Path:
"""Path to the per-install scratch-workspace-tip sentinel file."""
return kanban_home() / _SCRATCH_TIP_SENTINEL_NAME
def _scratch_tip_shown() -> bool:
"""True iff the scratch-workspace tip has already been emitted on this
install. Best-effort any error means we re-emit, which is the safer
failure mode for a help message."""
try:
return _scratch_tip_sentinel_path().exists()
except OSError:
return False
def _mark_scratch_tip_shown() -> None:
"""Touch the sentinel so future scratch workspaces stay silent.
Best-effort: a failure here just means the tip might appear once more,
which is preferable to crashing dispatch over a help message.
"""
try:
path = _scratch_tip_sentinel_path()
path.parent.mkdir(parents=True, exist_ok=True)
path.touch(exist_ok=True)
except OSError:
pass
def _maybe_emit_scratch_tip(
conn: sqlite3.Connection,
task_id: str,
workspace_kind: Optional[str],
) -> None:
"""Emit the first-use scratch-workspace tip exactly once per install.
Called from the dispatcher right after a scratch workspace is
materialized. No-op for ``worktree`` / ``dir`` workspaces (they're
preserved by design) and no-op after the sentinel exists.
"""
if (workspace_kind or "scratch") != "scratch":
return
if _scratch_tip_shown():
return
try:
_log.warning("kanban: %s (task %s)", _SCRATCH_TIP_MESSAGE, task_id)
with write_txn(conn):
_append_event(
conn, task_id, "tip_scratch_workspace",
{"message": _SCRATCH_TIP_MESSAGE},
)
except Exception:
# Best-effort — never block the spawn loop over a help message.
pass
finally:
_mark_scratch_tip_shown()
def edit_completed_task_result(
conn: sqlite3.Connection,
task_id: str,
@@ -4892,6 +5112,7 @@ def dispatch_once(
continue
# Persist the resolved workspace path so the worker can cd there.
set_workspace_path(conn, claimed.id, str(workspace))
_maybe_emit_scratch_tip(conn, claimed.id, claimed.workspace_kind)
_spawn = spawn_fn if spawn_fn is not None else _default_spawn
try:
# Back-compat: older spawn_fn signatures accept only
@@ -4970,6 +5191,7 @@ def dispatch_once(
continue
# Persist the resolved workspace path so the worker can cd there.
set_workspace_path(conn, claimed.id, str(workspace))
_maybe_emit_scratch_tip(conn, claimed.id, claimed.workspace_kind)
# Force-load sdlc-review skill for review agents. The
# _default_spawn function already auto-loads kanban-worker, and
# appends task.skills via --skills. Setting task.skills here
+475 -87
View File
@@ -61,12 +61,76 @@ try:
except ModuleNotFoundError:
pass
import os
import sys
def _is_termux_startup_environment_fast() -> bool:
"""Tiny Termux check for pre-import startup shortcuts."""
prefix = os.environ.get("PREFIX", "")
return bool(
os.environ.get("TERMUX_VERSION")
or "com.termux/files/usr" in prefix
or prefix.startswith("/data/data/com.termux/")
)
def _is_termux_fast_version_argv(argv: list[str]) -> bool:
return argv in (["--version"], ["-V"], ["version"])
def _read_openai_version_fast() -> str | None:
"""Read OpenAI SDK version without importing ``importlib.metadata``."""
for base in sys.path:
if not base:
base = os.getcwd()
version_file = os.path.join(base, "openai", "_version.py")
try:
with open(version_file, encoding="utf-8") as handle:
for line in handle:
stripped = line.strip()
if not stripped.startswith("__version__"):
continue
_key, _sep, value = stripped.partition("=")
value = value.split("#", 1)[0].strip().strip("\"'")
return value or None
except OSError:
continue
return None
def _print_fast_version_info() -> None:
from hermes_cli import __release_date__, __version__
project_root = os.path.abspath(os.path.join(os.path.dirname(__file__), os.pardir))
print(f"Hermes Agent v{__version__} ({__release_date__})")
print(f"Project: {project_root}")
print(f"Python: {sys.version.split()[0]}")
openai_version = _read_openai_version_fast()
print(f"OpenAI SDK: {openai_version}" if openai_version else "OpenAI SDK: Not installed")
def _try_termux_ultrafast_version() -> bool:
"""Handle ``hermes --version`` before config/logging imports on Termux."""
if os.environ.get("HERMES_TERMUX_DISABLE_FAST_CLI") == "1":
return False
if not _is_termux_startup_environment_fast():
return False
if not _is_termux_fast_version_argv(sys.argv[1:]):
return False
_print_fast_version_info()
return True
if _try_termux_ultrafast_version():
raise SystemExit(0)
import argparse
import json
import os
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Optional
@@ -275,6 +339,133 @@ def _is_termux_startup_environment(env: dict[str, str] | None = None) -> bool:
)
def _read_packed_ref(common_dir: Path, ref: str) -> str | None:
"""Look up a ref in .git/packed-refs without spawning git.
packed-refs lines look like ``<sha> <ref>`` with optional ``^<sha>``
peel lines and ``#``-prefixed comments / ``# pack-refs with:`` header.
"""
try:
text = (common_dir / "packed-refs").read_text(encoding="utf-8", errors="replace")
except OSError:
return None
for line in text.splitlines():
if not line or line.startswith("#") or line.startswith("^"):
continue
parts = line.split(" ", 1)
if len(parts) == 2 and parts[1].strip() == ref:
return parts[0].strip()
return None
def _read_git_revision_fingerprint(repo_root: Path) -> str | None:
"""Return a cheap checkout fingerprint without spawning git."""
git_dir = repo_root / ".git"
try:
if git_dir.is_file():
for line in git_dir.read_text(encoding="utf-8", errors="replace").splitlines():
key, _, value = line.partition(":")
if key.strip() == "gitdir" and value.strip():
git_dir = (repo_root / value.strip()).resolve()
break
# Worktrees point HEAD at a per-worktree gitdir but pack their refs
# in the main repo's gitdir (referenced via ``commondir``). Resolve
# that up front so packed-refs lookups hit the right file.
common_dir = git_dir
commondir_file = git_dir / "commondir"
if commondir_file.exists():
try:
rel = commondir_file.read_text(encoding="utf-8", errors="replace").strip()
if rel:
common_dir = (git_dir / rel).resolve()
except OSError:
pass
head_file = git_dir / "HEAD"
head = head_file.read_text(encoding="utf-8", errors="replace").strip()
if head.startswith("ref:"):
ref = head.split(":", 1)[1].strip()
# Loose refs may live in the worktree gitdir OR the common dir
# (branches created via `git worktree add` typically live in the
# common dir's refs/heads/).
for candidate in (git_dir, common_dir):
ref_file = candidate / ref
if ref_file.exists():
return f"git:{ref}:{ref_file.read_text(encoding='utf-8', errors='replace').strip()}"
packed_sha = _read_packed_ref(common_dir, ref)
if packed_sha:
return f"git:{ref}:{packed_sha}"
# Ref name is known but unresolved — still stable across launches,
# and the version/release fallback in the caller will invalidate
# after `hermes update`.
return f"git:{ref}:unresolved"
return f"git:HEAD:{head}"
except OSError:
return None
def _termux_bundled_skills_fingerprint() -> str:
"""Cheap invalidation key for Termux bundled-skill startup sync."""
git_fp = _read_git_revision_fingerprint(PROJECT_ROOT)
if git_fp:
return git_fp
skills_dir = PROJECT_ROOT / "skills"
try:
stat = skills_dir.stat()
return f"skills:{__version__}:{__release_date__}:{stat.st_mtime_ns}:{stat.st_size}"
except OSError:
return f"skills:{__version__}:{__release_date__}:missing"
def _termux_bundled_skills_stamp_path() -> Path:
return get_hermes_home() / "skills" / ".termux_bundled_sync_stamp"
def _termux_bundled_skills_sync_needed() -> bool:
if not _is_termux_startup_environment():
return True
if os.environ.get("HERMES_TERMUX_FORCE_SKILLS_SYNC") == "1":
return True
try:
stamp = _termux_bundled_skills_stamp_path()
return stamp.read_text(encoding="utf-8").strip() != _termux_bundled_skills_fingerprint()
except OSError:
return True
def _mark_termux_bundled_skills_synced() -> None:
if not _is_termux_startup_environment():
return
try:
stamp = _termux_bundled_skills_stamp_path()
stamp.parent.mkdir(parents=True, exist_ok=True)
stamp.write_text(_termux_bundled_skills_fingerprint() + "\n", encoding="utf-8")
except OSError:
pass
def _sync_bundled_skills_for_startup() -> bool:
"""Sync bundled skills, but skip unchanged Termux checkouts cheaply.
Hashing every bundled skill is safe but expensive on older Android
storage. The git/ref stamp keeps post-update correctness: a changed
checkout revision forces one real sync, then later starts skip it.
"""
if _is_termux_startup_environment() and not _termux_bundled_skills_sync_needed():
return False
from tools.skills_sync import sync_skills
sync_skills(quiet=True)
_mark_termux_bundled_skills_synced()
return True
def _termux_should_prefetch_update_check() -> bool:
if not _is_termux_startup_environment():
return True
return os.environ.get("HERMES_TERMUX_PREFETCH_UPDATES") == "1"
def _relative_time(ts) -> str:
"""Format a timestamp as relative time (e.g., '2h ago', 'yesterday')."""
if not ts:
@@ -464,7 +655,7 @@ def _session_browse_picker(sessions: list) -> Optional[str]:
curses.init_pair(1, curses.COLOR_GREEN, -1) # selected
curses.init_pair(2, curses.COLOR_YELLOW, -1) # header
curses.init_pair(3, curses.COLOR_CYAN, -1) # search
curses.init_pair(4, 8, -1) # dim
curses.init_pair(4, 8 if curses.COLORS > 8 else curses.COLOR_WHITE, -1) # dim
cursor = 0
scroll_offset = 0
@@ -1146,13 +1337,13 @@ def _make_tui_argv(tui_dir: Path, tui_dev: bool) -> tuple[list[str], Path]:
p = Path(ext_dir)
if (p / "dist" / "entry.js").is_file():
node = _node_bin("node")
return [node, str(p / "dist" / "entry.js")], p
return [node, "--expose-gc", str(p / "dist" / "entry.js")], p
# 1b. Bundled in wheel (pip install)
bundled = _find_bundled_tui()
if bundled is not None:
node = _node_bin("node")
return [node, str(bundled)], bundled.parent
return [node, "--expose-gc", str(bundled)], bundled.parent
# 2. Normal flow: npm install if needed, always esbuild, then node dist/entry.js.
# --dev flow: npm install if needed, then tsx src/entry.tsx.
@@ -1229,7 +1420,7 @@ def _make_tui_argv(tui_dir: Path, tui_dev: bool) -> tuple[list[str], Path]:
sys.exit(1)
node = _node_bin("node")
return [node, str(tui_dir / "dist" / "entry.js")], tui_dir
return [node, "--expose-gc", str(tui_dir / "dist" / "entry.js")], tui_dir
def _normalize_tui_toolsets(toolsets: object) -> list[str]:
@@ -1351,16 +1542,16 @@ def _launch_tui(
env["HERMES_TUI_TOOL_PROGRESS"] = "off"
if accept_hooks:
env["HERMES_ACCEPT_HOOKS"] = "1"
# Guarantee an 8GB V8 heap + exposed GC for the TUI. Default node cap is
# ~1.54GB depending on version and can fatal-OOM on long sessions with
# large transcripts / reasoning blobs. Token-level merge: respect any
# user-supplied --max-old-space-size (they may have set it higher) and
# avoid duplicating --expose-gc.
# Guarantee an 8GB V8 heap for the TUI. Default node cap is ~1.54GB
# depending on version and can fatal-OOM on long sessions with large
# transcripts / reasoning blobs. Token-level merge: respect any
# user-supplied --max-old-space-size (they may have set it higher).
# --expose-gc is *not* added here: Node rejects it in NODE_OPTIONS
# ("--expose-gc is not allowed in NODE_OPTIONS") and refuses to start.
# It is passed as a direct argv flag in _make_tui_argv() instead.
_tokens = env.get("NODE_OPTIONS", "").split()
if not any(t.startswith("--max-old-space-size=") for t in _tokens):
_tokens.append("--max-old-space-size=8192")
if "--expose-gc" not in _tokens:
_tokens.append("--expose-gc")
env["NODE_OPTIONS"] = " ".join(_tokens)
# HERMES_TUI_RESUME is an internal hand-off from the Python wrapper to the
# Ink app. Because we start from os.environ.copy(), an exported/stale value
@@ -1523,19 +1714,20 @@ def cmd_chat(args):
print("You can run 'hermes setup' at any time to configure.")
sys.exit(1)
# Start update check in background (runs while other init happens)
try:
from hermes_cli.banner import prefetch_update_check
# Start update check in background (runs while other init happens).
# On Termux this imports rich/prompt_toolkit in the foreground and then
# competes for CPU on single-core devices, so keep it opt-in there.
if _termux_should_prefetch_update_check():
try:
from hermes_cli.banner import prefetch_update_check
prefetch_update_check()
except Exception:
pass
prefetch_update_check()
except Exception:
pass
# Sync bundled skills on every CLI launch (fast -- skips unchanged skills)
try:
from tools.skills_sync import sync_skills
sync_skills(quiet=True)
_sync_bundled_skills_for_startup()
except Exception:
pass
@@ -1602,6 +1794,7 @@ def cmd_chat(args):
"max_turns": getattr(args, "max_turns", None),
"ignore_rules": getattr(args, "ignore_rules", False),
"ignore_user_config": getattr(args, "ignore_user_config", False),
"compact": getattr(args, "compact", False),
}
# Filter out None values
kwargs = {k: v for k, v in kwargs.items() if v is not None}
@@ -2305,6 +2498,9 @@ _AUX_TASKS: list[tuple[str, str, str]] = [
("mcp", "MCP", "MCP tool reasoning"),
("title_generation", "Title generation", "session titles"),
("skills_hub", "Skills hub", "skills search/install"),
("triage_specifier", "Triage specifier", "kanban spec fleshing"),
("kanban_decomposer", "Kanban decomposer", "task decomposition"),
("profile_describer", "Profile describer", "auto profile descriptions"),
("curator", "Curator", "skill-usage review pass"),
]
@@ -4534,7 +4730,9 @@ def _model_flow_copilot(config, current_model=""):
source = creds.get("source", "")
else:
if source in {"GITHUB_TOKEN", "GH_TOKEN"}:
print(f" GitHub token: {api_key[:8]}... ✓ ({source})")
from hermes_cli.env_loader import format_secret_source_suffix
bw_suffix = format_secret_source_suffix(source)
print(f" GitHub token: {api_key[:8]}... ✓ ({source}{bw_suffix})")
elif source == "gh auth token":
print(" GitHub token: ✓ (from `gh auth token`)")
else:
@@ -4791,7 +4989,10 @@ def _prompt_api_key(pconfig, existing_key: str, provider_id: str = "") -> tuple:
return new_key, False
# Already configured — offer K / R / C ────────────────────────────────
print(f" {pconfig.name} API key: {existing_key[:8]}... ✓")
from hermes_cli.env_loader import format_secret_source_suffix
source_suffix = format_secret_source_suffix(key_env) if key_env else ""
print(f" {pconfig.name} API key: {existing_key[:8]}... ✓{source_suffix}")
if not key_env:
# Nothing we can rewrite; just acknowledge and move on.
print()
@@ -5074,7 +5275,9 @@ def _model_flow_bedrock_api_key(config, region, current_model=""):
# Prompt for API key
existing_key = get_env_value("AWS_BEARER_TOKEN_BEDROCK") or ""
if existing_key:
print(f" Bedrock API Key: {existing_key[:12]}... ✓")
from hermes_cli.env_loader import format_secret_source_suffix
source_suffix = format_secret_source_suffix("AWS_BEARER_TOKEN_BEDROCK")
print(f" Bedrock API Key: {existing_key[:12]}... ✓{source_suffix}")
else:
print(f" Endpoint: {mantle_base_url}")
print()
@@ -5745,7 +5948,22 @@ def _model_flow_anthropic(config, current_model=""):
if has_creds:
# Show what we found
if existing_key:
print(f" Anthropic credentials: {existing_key[:12]}... ✓")
from hermes_cli.env_loader import format_secret_source_suffix
from hermes_cli.auth import PROVIDER_REGISTRY
# Surface which env var supplied the key so users with
# Bitwarden see "(from Bitwarden)" — without this, a detected
# BSM key looks identical to a key in .env and users assume
# nothing is wired up.
source_suffix = ""
for var in PROVIDER_REGISTRY["anthropic"].api_key_env_vars:
if os.getenv(var, "").strip() == existing_key:
source_suffix = format_secret_source_suffix(var)
if source_suffix:
break
print(
f" Anthropic credentials: {existing_key[:12]}... ✓{source_suffix}"
)
elif cc_available:
print(" Claude Code credentials: ✓ (auto-detected)")
print()
@@ -5879,6 +6097,13 @@ def cmd_webhook(args):
webhook_command(args)
def cmd_portal(args):
"""Nous Portal status and Tool Gateway routing surface."""
from hermes_cli.portal_cli import portal_command
return portal_command(args)
def cmd_slack(args):
"""Slack integration helpers.
@@ -5971,8 +6196,7 @@ def cmd_import(args):
run_import(args)
def cmd_version(args):
"""Show version."""
def _print_version_info(*, check_updates: bool = True) -> None:
print(f"Hermes Agent v{__version__} ({__release_date__})")
print(f"Project: {PROJECT_ROOT}")
@@ -5992,6 +6216,9 @@ def cmd_version(args):
except ImportError:
print("OpenAI SDK: Not installed")
if not check_updates:
return
# Show update status (synchronous — acceptable since user asked for version info)
try:
from hermes_cli.banner import check_for_updates
@@ -6010,6 +6237,11 @@ def cmd_version(args):
pass
def cmd_version(args):
"""Show version."""
_print_version_info(check_updates=True)
def cmd_uninstall(args):
"""Uninstall Hermes Agent."""
_require_tty("uninstall")
@@ -6086,24 +6318,36 @@ def _validate_critical_files_syntax(root) -> tuple[bool, str | None, str | None]
them after a successful ``git pull`` so we can auto-roll-back instead of
leaving the user with a bricked install.
The compiled ``.pyc`` is written to a temp directory rather than the
source tree's ``__pycache__/`` so we don't race with concurrent test
workers that walk the same dir, and so we don't leave a stale pyc
behind in production if the next interpreter run picks a different
Python version. The pyc is discarded on function return either way
we only care about the compile-or-not signal.
Returns ``(ok, failing_path, error_message)``. ``ok=True`` means every
file parsed cleanly.
"""
import py_compile
import tempfile
root = Path(root)
for relpath in _UPDATE_CRITICAL_FILES:
path = root / relpath
if not path.exists():
# Missing file is suspicious but not necessarily fatal — a future
# refactor may legitimately remove one of these. Skip and move on.
continue
try:
py_compile.compile(str(path), doraise=True)
except py_compile.PyCompileError as exc:
return False, str(path), str(exc)
except OSError as exc:
return False, str(path), f"could not read: {exc}"
with tempfile.TemporaryDirectory(prefix="hermes-syntax-check-") as tmpdir:
for relpath in _UPDATE_CRITICAL_FILES:
path = root / relpath
if not path.exists():
# Missing file is suspicious but not necessarily fatal — a future
# refactor may legitimately remove one of these. Skip and move on.
continue
# Mirror the relative path under the tmpdir so two different
# files with the same basename don't collide on the cfile name.
cfile = Path(tmpdir) / (relpath.replace("/", "__") + "c")
try:
py_compile.compile(str(path), cfile=str(cfile), doraise=True)
except py_compile.PyCompileError as exc:
return False, str(path), str(exc)
except OSError as exc:
return False, str(path), f"could not read: {exc}"
return True, None, None
@@ -10410,10 +10654,10 @@ _BUILTIN_SUBCOMMANDS = frozenset(
"config", "cron", "curator", "dashboard", "debug", "doctor",
"dump", "fallback", "gateway", "hooks", "import", "insights",
"kanban", "login", "logout", "logs", "lsp", "mcp", "memory", "migrate",
"model", "pairing", "plugins", "postinstall", "profile", "proxy",
"model", "pairing", "plugins", "portal", "postinstall", "profile", "proxy",
"send", "sessions", "setup",
"skills", "slack", "status", "tools", "uninstall", "update",
"version", "webhook", "whatsapp", "chat",
"version", "webhook", "whatsapp", "chat", "secrets",
# Help-ish invocations — plugin commands not being listed in
# top-level --help is an acceptable trade-off for skipping an
# expensive eager import of every bundled plugin module.
@@ -10503,6 +10747,143 @@ def _plugin_cli_discovery_needed() -> bool:
return True
_AGENT_COMMANDS = {None, "chat", "acp", "rl"}
_AGENT_SUBCOMMANDS = {
"cron": ("cron_command", {"run", "tick"}),
"gateway": ("gateway_command", {"run"}),
"mcp": ("mcp_action", {"serve"}),
}
def _prepare_agent_startup(args) -> None:
"""Discover plugins/MCP/hooks for commands that can run an agent turn."""
_sub_attr, _sub_set = _AGENT_SUBCOMMANDS.get(args.command, (None, None))
if not (
args.command in _AGENT_COMMANDS
or (_sub_attr and getattr(args, _sub_attr, None) in _sub_set)
):
return
_accept_hooks = bool(getattr(args, "accept_hooks", False))
try:
from hermes_cli.plugins import discover_plugins
discover_plugins()
except Exception:
logger.warning(
"plugin discovery failed at CLI startup",
exc_info=True,
)
try:
# MCP tool discovery — no event loop running in CLI/TUI startup,
# so inline is safe. Moved here from model_tools.py module scope
# to avoid freezing the gateway's event loop on its first message
# via the same lazy import path (#16856).
from tools.mcp_tool import discover_mcp_tools
discover_mcp_tools()
except Exception:
logger.debug(
"MCP tool discovery failed at CLI startup",
exc_info=True,
)
try:
from hermes_cli.config import load_config
from agent.shell_hooks import register_from_config
register_from_config(load_config(), accept_hooks=_accept_hooks)
except Exception:
logger.debug(
"shell-hook registration failed at CLI startup",
exc_info=True,
)
def _set_chat_arg_defaults(args) -> None:
for attr, default in [
("query", None),
("model", None),
("provider", None),
("toolsets", None),
("verbose", False),
("resume", None),
("continue_last", None),
("worktree", False),
]:
if not hasattr(args, attr):
setattr(args, attr, default)
def _try_termux_fast_cli_launch() -> bool:
"""Run obvious Termux non-TUI chat/oneshot/version paths on a light parser."""
if not _is_termux_startup_environment():
return False
if os.environ.get("HERMES_TERMUX_DISABLE_FAST_CLI") == "1":
return False
argv = sys.argv[1:]
if "-h" in argv or "--help" in argv:
return False
if os.environ.get("HERMES_TUI") == "1" or "--tui" in argv:
return False
if _is_termux_fast_version_argv(argv):
_print_version_info(check_updates=False)
return True
first = _first_positional_argv()
has_oneshot = any(
arg == "-z" or arg == "--oneshot" or arg.startswith("--oneshot=")
for arg in argv
)
if not has_oneshot and first not in {None, "chat"}:
return False
from hermes_cli._parser import build_top_level_parser
parser, _subparsers, chat_parser = build_top_level_parser()
chat_parser.set_defaults(func=cmd_chat)
args = parser.parse_args(_coalesce_session_name_args(argv))
if getattr(args, "version", False):
_print_version_info(check_updates=False)
return True
if getattr(args, "oneshot", None):
_prepare_agent_startup(args)
from hermes_cli.oneshot import run_oneshot
sys.exit(
run_oneshot(
args.oneshot,
model=getattr(args, "model", None),
provider=getattr(args, "provider", None),
toolsets=getattr(args, "toolsets", None),
)
)
if (args.resume or args.continue_last) and args.command is None:
args.command = "chat"
if args.command in {None, "chat"}:
_set_chat_arg_defaults(args)
interactive_prompt = not getattr(args, "query", None) and not getattr(args, "image", None)
if interactive_prompt:
# Bare Termux CLI should reach the prompt first and do agent-only
# discovery on the first submitted turn instead of before input.
setattr(args, "compact", True)
os.environ["HERMES_DEFER_AGENT_STARTUP"] = "1"
os.environ["HERMES_FAST_STARTUP_BANNER"] = "1"
if getattr(args, "accept_hooks", False):
os.environ["HERMES_ACCEPT_HOOKS"] = "1"
else:
_prepare_agent_startup(args)
cmd_chat(args)
return True
return False
def _try_termux_fast_tui_launch() -> bool:
"""Launch obvious Termux TUI invocations before building every subparser.
@@ -10563,6 +10944,8 @@ def main():
if _try_termux_fast_tui_launch():
return
if _try_termux_fast_cli_launch():
return
from hermes_cli._parser import build_top_level_parser
@@ -10660,6 +11043,42 @@ def main():
)
fallback_parser.set_defaults(func=cmd_fallback)
# =========================================================================
# secrets command — external secret managers (currently: Bitwarden)
# =========================================================================
secrets_parser = subparsers.add_parser(
"secrets",
help="Manage external secret sources (Bitwarden Secrets Manager)",
description=(
"Pull API keys from an external secret manager at process startup "
"instead of storing them in ~/.hermes/.env. Currently supports "
"Bitwarden Secrets Manager. See: "
"https://hermes-agent.nousresearch.com/docs/user-guide/secrets/bitwarden"
),
)
secrets_subparsers = secrets_parser.add_subparsers(dest="secrets_command")
secrets_bw = secrets_subparsers.add_parser(
"bitwarden",
aliases=["bw"],
help="Bitwarden Secrets Manager integration",
)
# Lazy import — only pays for itself when this subcommand is actually used.
from hermes_cli import secrets_cli as _secrets_cli
_secrets_cli.register_cli(secrets_bw)
def _dispatch_secrets(args): # noqa: ANN001
sub = getattr(args, "secrets_command", None)
bw_sub = getattr(args, "secrets_bw_command", None)
if sub in ("bitwarden", "bw") and bw_sub is not None:
return args.func(args)
secrets_parser.print_help()
return 0
secrets_parser.set_defaults(func=_dispatch_secrets)
# =========================================================================
# migrate command
# =========================================================================
@@ -10972,6 +11391,13 @@ def main():
help="On existing installs: only prompt for items that are missing "
"or unset, instead of running the full reconfigure wizard.",
)
setup_parser.add_argument(
"--portal",
action="store_true",
help="One-shot Nous Portal setup: log in via OAuth, set Nous as the "
"inference provider, and opt into the Tool Gateway. Skips the "
"rest of the wizard.",
)
setup_parser.set_defaults(func=cmd_setup)
# =========================================================================
@@ -11447,6 +11873,12 @@ def main():
webhook_parser.set_defaults(func=cmd_webhook)
# =========================================================================
# portal command — Nous Portal status + Tool Gateway routing
# =========================================================================
from hermes_cli.portal_cli import add_parser as _add_portal_parser
_add_portal_parser(subparsers)
# =========================================================================
# kanban command — multi-profile collaboration board
# =========================================================================
@@ -13325,51 +13757,7 @@ Examples:
# so introspection/management commands (hermes hooks list, cron
# list, gateway status, mcp add, ...) don't pay discovery cost or
# trigger consent prompts for hooks the user is still inspecting.
# Groups with mixed admin/CRUD vs. agent-running entries narrow via
# the nested subcommand (dest varies by parser).
_AGENT_COMMANDS = {None, "chat", "acp", "rl"}
_AGENT_SUBCOMMANDS = {
"cron": ("cron_command", {"run", "tick"}),
"gateway": ("gateway_command", {"run"}),
"mcp": ("mcp_action", {"serve"}),
}
_sub_attr, _sub_set = _AGENT_SUBCOMMANDS.get(args.command, (None, None))
if args.command in _AGENT_COMMANDS or (
_sub_attr and getattr(args, _sub_attr, None) in _sub_set
):
_accept_hooks = bool(getattr(args, "accept_hooks", False))
try:
from hermes_cli.plugins import discover_plugins
discover_plugins()
except Exception:
logger.warning(
"plugin discovery failed at CLI startup",
exc_info=True,
)
try:
# MCP tool discovery — no event loop running in CLI/TUI startup,
# so inline is safe. Moved here from model_tools.py module scope
# to avoid freezing the gateway's event loop on its first message
# via the same lazy import path (#16856).
from tools.mcp_tool import discover_mcp_tools
discover_mcp_tools()
except Exception:
logger.debug(
"MCP tool discovery failed at CLI startup",
exc_info=True,
)
try:
from hermes_cli.config import load_config
from agent.shell_hooks import register_from_config
register_from_config(load_config(), accept_hooks=_accept_hooks)
except Exception:
logger.debug(
"shell-hook registration failed at CLI startup",
exc_info=True,
)
_prepare_agent_startup(args)
# Handle top-level --oneshot / -z: single-shot mode, stdout = final
# response only, nothing else. Bypasses cli.py entirely.
+5 -8
View File
@@ -28,6 +28,8 @@ import sys
from contextlib import redirect_stderr, redirect_stdout
from typing import Optional
from hermes_cli.fallback_config import get_fallback_chain
def _normalize_toolsets(toolsets: object = None) -> list[str] | None:
if not toolsets:
@@ -301,14 +303,9 @@ def _run_agent(
toolsets_list = sorted(_get_platform_tools(cfg, "cli"))
session_db = _create_session_db_for_oneshot()
# Read fallback chain from profile config — supports both the new list
# format (fallback_providers) and the legacy single-dict (fallback_model).
# Mirrors the same normalization in cli.py so oneshot workers (e.g. kanban
# workers spawned via `hermes -p <profile> chat -q ...`) honour the
# profile's fallback chain just like interactive sessions do.
_fb = cfg.get("fallback_providers") or cfg.get("fallback_model") or []
if isinstance(_fb, dict):
_fb = [_fb] if _fb.get("provider") and _fb.get("model") else []
# Read the effective fallback chain from profile config so oneshot workers
# honour the same merge semantics as interactive CLI and gateway sessions.
_fb = get_fallback_chain(cfg)
agent = AIAgent(
api_key=runtime.get("api_key"),
+27 -7
View File
@@ -76,22 +76,42 @@ def _plugins_dir() -> Path:
return plugins
def _sanitize_plugin_name(name: str, plugins_dir: Path) -> Path:
def _sanitize_plugin_name(
name: str,
plugins_dir: Path,
*,
allow_subdir: bool = False,
) -> Path:
"""Validate a plugin name and return the safe target path inside *plugins_dir*.
Raises ``ValueError`` if the name contains path-traversal sequences or would
resolve outside the plugins directory.
``allow_subdir=True`` permits a single forward slash inside *name* so
category-namespaced plugin keys like ``observability/langfuse`` or
``image_gen/openai`` (the registry keys emitted by ``_discover_all_plugins``)
can be looked up. ``..`` and backslash are still rejected, leading and
trailing slashes are stripped, and the resolved target must still live
inside *plugins_dir*. Install paths leave this at the default ``False``
because a freshly-cloned plugin always lands top-level under
``~/.hermes/plugins/<name>/``.
"""
if not name:
raise ValueError("Plugin name must not be empty.")
if allow_subdir:
name = name.strip("/")
if not name:
raise ValueError("Plugin name must not be empty.")
if name in {".", ".."}:
raise ValueError(
f"Invalid plugin name '{name}': must not reference the plugins directory itself."
)
# Reject obvious traversal characters
for bad in ("/", "\\", ".."):
bad_chars = ("\\", "..") if allow_subdir else ("/", "\\", "..")
for bad in bad_chars:
if bad in name:
raise ValueError(f"Invalid plugin name '{name}': must not contain '{bad}'.")
@@ -326,7 +346,7 @@ def _display_removed(name: str, plugins_dir: Path) -> None:
def _require_installed_plugin(name: str, plugins_dir: Path, console) -> Path:
"""Return the plugin path if it exists, or exit with an error listing installed plugins."""
target = _sanitize_plugin_name(name, plugins_dir)
target = _sanitize_plugin_name(name, plugins_dir, allow_subdir=True)
if not target.exists():
installed = ", ".join(d.name for d in plugins_dir.iterdir() if d.is_dir()) or "(none)"
console.print(
@@ -1051,7 +1071,7 @@ def _run_composite_ui(curses, plugin_names, plugin_labels, plugin_selected,
curses.init_pair(1, curses.COLOR_GREEN, -1)
curses.init_pair(2, curses.COLOR_YELLOW, -1)
curses.init_pair(3, curses.COLOR_CYAN, -1)
curses.init_pair(4, 8, -1) # dim gray
curses.init_pair(4, 8 if curses.COLORS > 8 else curses.COLOR_WHITE, -1) # dim gray
cursor = 0
scroll_offset = 0
@@ -1196,7 +1216,7 @@ def _run_composite_ui(curses, plugin_names, plugin_labels, plugin_selected,
curses.init_pair(1, curses.COLOR_GREEN, -1)
curses.init_pair(2, curses.COLOR_YELLOW, -1)
curses.init_pair(3, curses.COLOR_CYAN, -1)
curses.init_pair(4, 8, -1)
curses.init_pair(4, 8 if curses.COLORS > 8 else curses.COLOR_WHITE, -1)
curses.curs_set(0)
elif key in {curses.KEY_ENTER, 10, 13}:
if cursor < n_plugins:
@@ -1228,7 +1248,7 @@ def _run_composite_ui(curses, plugin_names, plugin_labels, plugin_selected,
curses.init_pair(1, curses.COLOR_GREEN, -1)
curses.init_pair(2, curses.COLOR_YELLOW, -1)
curses.init_pair(3, curses.COLOR_CYAN, -1)
curses.init_pair(4, 8, -1)
curses.init_pair(4, 8 if curses.COLORS > 8 else curses.COLOR_WHITE, -1)
curses.curs_set(0)
elif key in {27, ord("q")}:
# Save plugin changes on exit
@@ -1508,7 +1528,7 @@ def _user_installed_plugin_dir(name: str) -> Optional[Path]:
"""Resolved path under ``~/.hermes/plugins/<name>`` if it exists."""
plugins_dir = _plugins_dir()
try:
target = _sanitize_plugin_name(name, plugins_dir)
target = _sanitize_plugin_name(name, plugins_dir, allow_subdir=True)
except ValueError:
return None
return target if target.is_dir() else None
+219
View File
@@ -0,0 +1,219 @@
"""``hermes portal`` — small CLI surface for Nous Portal users.
Subcommands:
status Show Portal auth state + which Tool Gateway tools are routed.
open Open the Portal subscription page in the user's default browser.
tools List Tool Gateway tools and which are active in the current config.
This command is intentionally minimal it does not duplicate functionality
already in ``hermes auth`` or ``hermes tools``. It's a discovery + status
surface for the Portal subscription itself.
"""
from __future__ import annotations
import sys
import webbrowser
from typing import Optional
from hermes_cli.colors import Colors, color
from hermes_cli.config import load_config
DEFAULT_PORTAL_URL = "https://portal.nousresearch.com"
SUBSCRIPTION_URL = "https://portal.nousresearch.com/manage-subscription"
DOCS_URL = "https://hermes-agent.nousresearch.com/docs/user-guide/features/tool-gateway"
def _nous_portal_base_url() -> str:
"""Resolve the Portal base URL from auth state or default."""
try:
from hermes_cli.auth import get_nous_auth_status
status = get_nous_auth_status() or {}
url = status.get("portal_base_url")
if isinstance(url, str) and url.strip():
return url.rstrip("/")
except Exception:
pass
return DEFAULT_PORTAL_URL
def _cmd_status(args) -> int:
"""Show Portal auth + Tool Gateway routing summary."""
from hermes_cli.auth import get_nous_auth_status
from hermes_cli.nous_subscription import get_nous_subscription_features
config = load_config() or {}
try:
auth = get_nous_auth_status() or {}
except Exception:
auth = {}
logged_in = bool(auth.get("logged_in"))
print()
print(color(" Nous Portal", Colors.MAGENTA))
print(color(" ───────────", Colors.MAGENTA))
if logged_in:
portal = auth.get("portal_base_url") or DEFAULT_PORTAL_URL
print(f" Auth: {color('✓ logged in', Colors.GREEN)}")
print(f" Portal: {portal}")
inference = auth.get("inference_base_url")
if inference:
print(f" API: {inference}")
else:
print(f" Auth: {color('not logged in', Colors.YELLOW)}")
print(f" Sign up: {SUBSCRIPTION_URL}")
print(f" Login: hermes auth add nous --type oauth")
# Provider selection (independent of auth)
model_cfg = config.get("model") if isinstance(config.get("model"), dict) else {}
provider = str(model_cfg.get("provider") or "").strip().lower()
if provider == "nous":
print(f" Model: {color('✓ using Nous as inference provider', Colors.GREEN)}")
elif provider:
print(f" Model: currently {provider} (switch with `hermes model`)")
# Tool Gateway routing
print()
print(color(" Tool Gateway", Colors.MAGENTA))
print(color(" ────────────", Colors.MAGENTA))
try:
features = get_nous_subscription_features(config)
except Exception:
features = None
if features is None:
print(" (could not resolve subscription state)")
return 0
rows = []
for feat in features.items():
if feat.managed_by_nous:
state = color("via Nous Portal", Colors.GREEN)
elif feat.active and feat.current_provider:
state = feat.current_provider
elif feat.active:
state = "active"
else:
state = color("not configured", Colors.DIM)
rows.append((feat.label, state))
width = max((len(r[0]) for r in rows), default=0)
for label, state in rows:
print(f" {label:<{width}} {state}")
if not logged_in:
print()
print(color(f" Docs: {DOCS_URL}", Colors.DIM))
return 0
def _cmd_open(args) -> int:
"""Open the Portal subscription page in the default browser."""
target = SUBSCRIPTION_URL
print(f"Opening {target}")
try:
opened = webbrowser.open(target)
except Exception:
opened = False
if not opened:
print()
print("Could not launch a browser. Visit the URL above manually.")
return 1
return 0
def _cmd_tools(args) -> int:
"""List the Tool Gateway catalog + current routing."""
from hermes_cli.nous_subscription import get_nous_subscription_features
config = load_config() or {}
try:
features = get_nous_subscription_features(config)
except Exception:
print("Could not resolve Tool Gateway state.", file=sys.stderr)
return 1
# Static catalog — the partners Tool Gateway routes to today.
catalog = [
("web", "Web search & extract", "Firecrawl"),
("image_gen", "Image generation", "FAL"),
("tts", "Text-to-speech", "OpenAI TTS"),
("browser", "Browser automation", "Browser Use"),
("modal", "Cloud terminal", "Modal"),
]
print()
print(color(" Tool Gateway catalog", Colors.MAGENTA))
print(color(" ────────────────────", Colors.MAGENTA))
if not features.nous_auth_present:
print(color(" Not logged into Nous Portal — sign in with `hermes auth add nous --type oauth`.", Colors.YELLOW))
print()
label_width = max(len(label) for _, label, _ in catalog)
for key, label, partner in catalog:
feat = features.features.get(key)
if feat is None:
state = color("unknown", Colors.DIM)
elif feat.managed_by_nous:
state = color("✓ via Nous Portal", Colors.GREEN)
elif feat.active and feat.current_provider:
state = feat.current_provider
elif feat.active:
state = "active"
else:
state = color("not configured", Colors.DIM)
print(f" {label:<{label_width}} partner: {partner:<14} {state}")
print()
print(color(f" Manage your subscription: {SUBSCRIPTION_URL}", Colors.DIM))
print(color(f" Docs: {DOCS_URL}", Colors.DIM))
return 0
def portal_command(args) -> int:
"""Top-level dispatch for `hermes portal <subcommand>`."""
sub = getattr(args, "portal_command", None)
if sub in {None, ""}:
# Default to status — matches gh / kubectl conventions where the
# subcommand-less form gives a useful overview.
return _cmd_status(args)
if sub == "status":
return _cmd_status(args)
if sub == "open":
return _cmd_open(args)
if sub == "tools":
return _cmd_tools(args)
print(f"Unknown portal subcommand: {sub}", file=sys.stderr)
print("Run `hermes portal -h` for usage.", file=sys.stderr)
return 1
def add_parser(subparsers) -> None:
"""Register `hermes portal` on the given argparse subparsers object."""
portal_parser = subparsers.add_parser(
"portal",
help="Nous Portal status, subscription, and Tool Gateway routing",
description=(
"Inspect Nous Portal auth, Tool Gateway routing, and open the "
"Portal subscription page. Subcommands: status (default), "
"open, tools."
),
)
portal_sub = portal_parser.add_subparsers(dest="portal_command")
portal_sub.add_parser(
"status",
help="Show Portal auth + Tool Gateway routing summary (default)",
)
portal_sub.add_parser(
"open",
help="Open the Portal subscription page in your default browser",
)
portal_sub.add_parser(
"tools",
help="List Tool Gateway tools and which are routed via Nous",
)
portal_parser.set_defaults(func=portal_command)
+3 -3
View File
@@ -35,6 +35,7 @@ from pathlib import Path
from typing import Optional
from hermes_cli import profiles as profiles_mod
from agent.skill_utils import is_excluded_skill_path
logger = logging.getLogger(__name__)
@@ -109,8 +110,7 @@ def _collect_skills(profile_dir: Path) -> list[str]:
return []
names: list[str] = []
for md in skills_dir.rglob("SKILL.md"):
path_str = str(md)
if "/.hub/" in path_str or "/.git/" in path_str:
if is_excluded_skill_path(md):
continue
try:
rel = md.relative_to(skills_dir)
@@ -201,7 +201,7 @@ def describe_profile(
skill_list = "\n".join(f" - {n}" for n in skill_names) or " (no skills installed)"
skill_count = sum(
1 for _ in (profile_dir / "skills").rglob("SKILL.md")
if "/.hub/" not in str(_) and "/.git/" not in str(_)
if not is_excluded_skill_path(_)
) if (profile_dir / "skills").is_dir() else 0
# Read model + provider from the profile's config.
+5 -1
View File
@@ -70,6 +70,8 @@ from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
from agent.skill_utils import is_excluded_skill_path
# ---------------------------------------------------------------------------
# Constants
@@ -463,7 +465,9 @@ def _count_skills(staged: Path) -> int:
skills_dir = staged / "skills"
if not skills_dir.is_dir():
return 0
return sum(1 for _ in skills_dir.rglob("SKILL.md"))
return sum(
1 for p in skills_dir.rglob("SKILL.md") if not is_excluded_skill_path(p)
)
def plan_install(
+48 -3
View File
@@ -30,6 +30,8 @@ from dataclasses import dataclass
from pathlib import Path, PurePosixPath, PureWindowsPath
from typing import List, Optional
from agent.skill_utils import is_excluded_skill_path
_PROFILE_ID_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{0,63}$")
# Directories bootstrapped inside every new profile
@@ -485,8 +487,9 @@ def _count_skills(profile_dir: Path) -> int:
return 0
count = 0
for md in skills_dir.rglob("SKILL.md"):
if "/.hub/" not in str(md) and "/.git/" not in str(md):
count += 1
if is_excluded_skill_path(md):
continue
count += 1
return count
@@ -902,7 +905,49 @@ def delete_profile(name: str, yes: bool = False) -> Path:
# 4. Remove profile directory
try:
shutil.rmtree(profile_dir)
def _make_writable(func, path, exc):
"""onexc/onerror handler: add +w on PermissionError so rmtree can proceed.
Handles two cases on NixOS (and other systems with read-only
copies from immutable stores):
1. The path itself isn't writable (e.g. a file with mode 0444)
2. The *parent* directory isn't writable (e.g. mode 0555)
Compatible with both the ``onexc`` API (3.12+, receives an
exception instance) and the ``onerror`` API (3.11-, receives
``sys.exc_info()`` tuple).
"""
import stat as _stat
import sys as _sys
# Normalise the two callback signatures:
# onexc(func, path, exc_instance) — 3.12+
# onerror(func, path, exc_info_tuple) — 3.11
if isinstance(exc, tuple):
exc = exc[1] # exc_info → actual exception object
if isinstance(exc, PermissionError):
# Make the path writable
try:
os.chmod(path, os.stat(path).st_mode | _stat.S_IWUSR)
except OSError:
pass
# Also make the parent writable (needed for unlink/rmdir)
parent = os.path.dirname(path)
if parent:
try:
os.chmod(parent, os.stat(parent).st_mode | _stat.S_IWUSR)
except OSError:
pass
func(path)
else:
raise
# ``onexc`` was added in 3.12; fall back to ``onerror`` on 3.11.
try:
shutil.rmtree(profile_dir, onexc=_make_writable)
except TypeError:
shutil.rmtree(profile_dir, onerror=_make_writable)
print(f"✓ Removed {profile_dir}")
except Exception as e:
print(f"⚠ Could not remove {profile_dir}: {e}")
+5 -1
View File
@@ -27,6 +27,7 @@ from hermes_cli.auth import (
_quarantine_nous_oauth_state,
_quarantine_nous_pool_entries,
_save_auth_store,
_validate_nous_inference_url_from_network,
_write_shared_nous_state,
resolve_nous_runtime_credentials,
)
@@ -137,7 +138,10 @@ class NousPortalAdapter(UpstreamAdapter):
"Try `hermes login nous` to re-authenticate."
)
base_url = refreshed.get("base_url") or DEFAULT_NOUS_INFERENCE_URL
base_url = (
_validate_nous_inference_url_from_network(refreshed.get("base_url"))
or DEFAULT_NOUS_INFERENCE_URL
)
base_url = base_url.rstrip("/")
return UpstreamCredential(
+25
View File
@@ -528,6 +528,9 @@ def _get_named_custom_provider(requested_provider: str) -> Optional[Dict[str, An
"api_key": resolved_api_key,
"model": entry.get("default_model", ""),
}
extra_body = entry.get("extra_body")
if isinstance(extra_body, dict):
result["extra_body"] = dict(extra_body)
# The v11→v12 migration writes the API mode under the new
# ``transport`` field, but hand-edited configs may still
# use the legacy ``api_mode`` spelling. Accept both —
@@ -553,6 +556,9 @@ def _get_named_custom_provider(requested_provider: str) -> Optional[Dict[str, An
"api_key": resolved_api_key,
"model": entry.get("default_model", ""),
}
extra_body = entry.get("extra_body")
if isinstance(extra_body, dict):
result["extra_body"] = dict(extra_body)
api_mode = _parse_api_mode(entry.get("api_mode") or entry.get("transport"))
if api_mode:
result["api_mode"] = api_mode
@@ -596,6 +602,9 @@ def _get_named_custom_provider(requested_provider: str) -> Optional[Dict[str, An
result["key_env"] = key_env
if provider_key:
result["provider_key"] = provider_key
extra_body = entry.get("extra_body")
if isinstance(extra_body, dict):
result["extra_body"] = dict(extra_body)
api_mode = _parse_api_mode(entry.get("api_mode"))
if api_mode:
result["api_mode"] = api_mode
@@ -607,6 +616,13 @@ def _get_named_custom_provider(requested_provider: str) -> Optional[Dict[str, An
return None
def _custom_provider_request_overrides(custom_provider: Dict[str, Any]) -> Dict[str, Any]:
extra_body = custom_provider.get("extra_body")
if not isinstance(extra_body, dict) or not extra_body:
return {}
return {"extra_body": dict(extra_body)}
def _resolve_named_custom_runtime(
*,
requested_provider: str,
@@ -683,6 +699,12 @@ def _resolve_named_custom_runtime(
model_name = custom_provider.get("model")
if model_name:
pool_result["model"] = model_name
request_overrides = _custom_provider_request_overrides(custom_provider)
if request_overrides:
pool_result["request_overrides"] = {
**dict(pool_result.get("request_overrides") or {}),
**request_overrides,
}
return pool_result
_cp_is_openai_url = base_url_host_matches(base_url, "openai.com") or base_url_host_matches(base_url, "openai.azure.com")
@@ -714,6 +736,9 @@ def _resolve_named_custom_runtime(
# provider name differs from the actual model string the API expects.
if custom_provider.get("model"):
result["model"] = custom_provider["model"]
request_overrides = _custom_provider_request_overrides(custom_provider)
if request_overrides:
result["request_overrides"] = request_overrides
return result
+445
View File
@@ -0,0 +1,445 @@
"""CLI handlers for ``hermes secrets bitwarden ...``.
Subcommands:
setup interactive wizard: install bws, prompt for token + project, test fetch
status show current config + binary version + last fetch outcome
sync run a fetch right now and show what would be applied (dry-run friendly)
disable flip ``secrets.bitwarden.enabled`` to False
install just download the bws binary (no token / project required)
"""
from __future__ import annotations
import argparse
import getpass
import json
import os
import subprocess
import sys
from pathlib import Path
from typing import List, Optional, Tuple
from rich.console import Console
from rich.panel import Panel
from rich.table import Table
from agent.secret_sources import bitwarden as bw
from hermes_cli.config import (
get_env_path,
load_config,
save_config,
save_env_value,
)
# ---------------------------------------------------------------------------
# Argparse wiring — called from hermes_cli.main
# ---------------------------------------------------------------------------
def register_cli(parent_parser: argparse.ArgumentParser) -> None:
"""Attach the ``bitwarden`` subcommand tree to a parent parser.
Called from ``hermes_cli.main`` as part of building the top-level
``hermes secrets`` parser.
"""
sub = parent_parser.add_subparsers(dest="secrets_bw_command")
setup = sub.add_parser(
"setup",
help="Interactive wizard: install bws, store access token, pick project",
)
setup.add_argument(
"--project-id",
help="Pre-select a project UUID instead of prompting",
)
setup.add_argument(
"--access-token",
help="Provide the access token non-interactively (will be stored in .env)",
)
setup.set_defaults(func=cmd_setup)
status = sub.add_parser("status", help="Show config + binary + last fetch")
status.set_defaults(func=cmd_status)
sync = sub.add_parser("sync", help="Fetch secrets now and report what changed")
sync.add_argument(
"--apply",
action="store_true",
help="Actually export the secrets into the current shell's env (default: dry-run)",
)
sync.set_defaults(func=cmd_sync)
disable = sub.add_parser("disable", help="Turn off the Bitwarden integration")
disable.set_defaults(func=cmd_disable)
install = sub.add_parser(
"install",
help=f"Download and verify the pinned bws binary (v{bw._BWS_VERSION})",
)
install.add_argument(
"--force",
action="store_true",
help="Re-download even if a managed copy already exists",
)
install.set_defaults(func=cmd_install)
# ---------------------------------------------------------------------------
# Handlers
# ---------------------------------------------------------------------------
def cmd_setup(args: argparse.Namespace) -> int:
console = Console()
console.print(
Panel.fit(
"[bold]Bitwarden Secrets Manager setup[/bold]\n\n"
"Need an access token? In the Bitwarden web app:\n"
" Secrets Manager → Machine accounts → [your account] →\n"
" Access tokens → Create access token\n\n"
"Copy the token (starts with [cyan]0.[/cyan]…) — it cannot be retrieved later.",
border_style="cyan",
)
)
# ------------------------------------------------------------------ binary
console.print()
console.print("[bold]Step 1[/bold] Install the bws CLI")
try:
binary = bw.find_bws(install_if_missing=False)
if binary is None:
console.print(" No bws on PATH — downloading…")
binary = bw.install_bws()
version = _bws_version(binary)
console.print(f" [green]✓[/green] {binary} ({version})")
except Exception as exc: # noqa: BLE001
console.print(f" [red]✗ Could not install bws: {exc}[/red]")
console.print(
" Manual install: "
"https://github.com/bitwarden/sdk-sm/releases"
)
return 1
# ------------------------------------------------------------------- token
console.print()
console.print("[bold]Step 2[/bold] Provide your access token")
cfg = load_config()
secrets_cfg = (cfg.setdefault("secrets", {})
.setdefault("bitwarden", {}))
token_env = secrets_cfg.get("access_token_env", "BWS_ACCESS_TOKEN")
token = (args.access_token or "").strip()
if not token:
token = getpass.getpass(f" Paste access token ({token_env}): ").strip()
if not token:
console.print(" [red]Empty token, aborting.[/red]")
return 1
if not token.startswith("0."):
console.print(
" [yellow]Warning: token doesn't start with '0.' — usually that means "
"you pasted something other than a BSM access token. Continuing anyway.[/yellow]"
)
save_env_value(token_env, token)
os.environ[token_env] = token # so the test fetch below sees it
console.print(f" [green]✓[/green] stored in {get_env_path()} as {token_env}")
# ------------------------------------------------------------------- project
if args.project_id and args.project_id.strip():
project_id = args.project_id.strip()
else:
console.print()
console.print("[bold]Step 3[/bold] Pick a project")
project_id = ""
projects = _list_projects(binary, token, console)
if projects is None:
return 1
if not projects:
console.print(" [yellow]No projects visible to this machine account.[/yellow]")
console.print(
" In the Bitwarden web app, open the machine account → Projects tab "
"and grant it access to at least one project."
)
return 1
table = Table(show_header=True, header_style="bold")
table.add_column("#", style="cyan", width=4)
table.add_column("Name")
table.add_column("ID", style="dim")
for i, p in enumerate(projects, 1):
table.add_row(str(i), p.get("name", "?"), p.get("id", "?"))
console.print(table)
while True:
choice = console.input(f" Select project [1-{len(projects)}]: ").strip()
if not choice:
continue
try:
idx = int(choice)
except ValueError:
console.print(" [red]Enter a number.[/red]")
continue
if 1 <= idx <= len(projects):
project_id = projects[idx - 1]["id"]
break
console.print(f" [red]Out of range — pick 1-{len(projects)}.[/red]")
# ------------------------------------------------------------------- test
console.print()
step_num = 4 if not (args.project_id and args.project_id.strip()) else 3
console.print(f"[bold]Step {step_num}[/bold] Test fetch")
try:
secrets, warnings = bw.fetch_bitwarden_secrets(
access_token=token,
project_id=project_id,
binary=binary,
use_cache=False,
)
except Exception as exc: # noqa: BLE001
console.print(f" [red]✗ Fetch failed: {exc}[/red]")
return 1
if not secrets:
console.print(" [yellow]Fetch succeeded but the project has no secrets.[/yellow]")
else:
table = Table(show_header=True, header_style="bold")
table.add_column("Name", style="cyan")
table.add_column("Status")
for key in sorted(secrets):
if key == token_env:
status = "[dim]bootstrap token — never overrides itself[/dim]"
elif os.environ.get(key):
status = "[yellow]already set in env (will be overwritten)[/yellow]"
else:
status = "[green]new[/green]"
table.add_row(key, status)
console.print(table)
for w in warnings:
console.print(f" [yellow]warning:[/yellow] {w}")
# ------------------------------------------------------------------- save
secrets_cfg["enabled"] = True
secrets_cfg["project_id"] = project_id
secrets_cfg.setdefault("access_token_env", token_env)
secrets_cfg.setdefault("cache_ttl_seconds", 300)
secrets_cfg.setdefault("override_existing", True)
secrets_cfg.setdefault("auto_install", True)
save_config(cfg)
console.print()
console.print(
"[green]✓ Bitwarden Secrets Manager is enabled.[/green] "
"Secrets will be pulled at the start of every Hermes process."
)
console.print(
" Status: [cyan]hermes secrets bitwarden status[/cyan]\n"
" Refresh: [cyan]hermes secrets bitwarden sync[/cyan]\n"
" Disable: [cyan]hermes secrets bitwarden disable[/cyan]"
)
return 0
def cmd_status(args: argparse.Namespace) -> int:
console = Console()
cfg = load_config()
bw_cfg = (cfg.get("secrets") or {}).get("bitwarden") or {}
enabled = bool(bw_cfg.get("enabled"))
token_env = bw_cfg.get("access_token_env", "BWS_ACCESS_TOKEN")
project_id = bw_cfg.get("project_id", "")
token_set = bool(os.environ.get(token_env))
table = Table(show_header=False, box=None, padding=(0, 2))
table.add_column("", style="bold")
table.add_column("")
table.add_row("Enabled", _yn(enabled))
table.add_row("Token env var", token_env)
table.add_row("Token in env", _yn(token_set))
table.add_row("Project ID", project_id or "[dim](unset)[/dim]")
table.add_row("Override existing", _yn(bool(bw_cfg.get("override_existing", False))))
table.add_row("Cache TTL (s)", str(bw_cfg.get("cache_ttl_seconds", 300)))
table.add_row("Auto-install", _yn(bool(bw_cfg.get("auto_install", True))))
binary = bw.find_bws(install_if_missing=False)
if binary:
table.add_row("bws binary", f"{binary} ({_bws_version(binary)})")
else:
table.add_row("bws binary", "[yellow]not installed[/yellow]")
console.print(Panel(table, title="Bitwarden Secrets Manager", border_style="cyan"))
if not enabled:
console.print("\n Run [cyan]hermes secrets bitwarden setup[/cyan] to enable.")
return 0
if not token_set:
console.print(
f"\n [yellow]Enabled but {token_env} is not set — Hermes will skip BSM "
"and warn on next startup.[/yellow]"
)
if not project_id:
console.print(
"\n [yellow]Enabled but no project_id — nothing to fetch.[/yellow]"
)
return 0
def cmd_sync(args: argparse.Namespace) -> int:
console = Console()
cfg = load_config()
bw_cfg = (cfg.get("secrets") or {}).get("bitwarden") or {}
if not bw_cfg.get("enabled"):
console.print(
"[yellow]Bitwarden integration is disabled. Run "
"`hermes secrets bitwarden setup` first.[/yellow]"
)
return 1
token_env = bw_cfg.get("access_token_env", "BWS_ACCESS_TOKEN")
token = os.environ.get(token_env, "").strip()
if not token:
console.print(f"[red]{token_env} is not set.[/red]")
return 1
project_id = bw_cfg.get("project_id", "")
if not project_id:
console.print("[red]No project_id configured.[/red]")
return 1
try:
secrets, warnings = bw.fetch_bitwarden_secrets(
access_token=token,
project_id=project_id,
use_cache=False,
)
except Exception as exc: # noqa: BLE001
console.print(f"[red]Fetch failed: {exc}[/red]")
return 1
if not secrets:
console.print("[yellow]No secrets in project.[/yellow]")
return 0
override = bool(bw_cfg.get("override_existing", False)) or args.apply
table = Table(show_header=True, header_style="bold")
table.add_column("Name", style="cyan")
table.add_column("Action")
applied = 0
for key in sorted(secrets):
if key == token_env:
table.add_row(key, "[dim]skip (bootstrap token)[/dim]")
continue
already = bool(os.environ.get(key))
if already and not override:
table.add_row(key, "[dim]skip (already set)[/dim]")
continue
if args.apply:
os.environ[key] = secrets[key]
applied += 1
table.add_row(key, "[green]exported[/green]" + (" (overrode)" if already else ""))
else:
table.add_row(key, "[green]would export[/green]" + (" (overrides)" if already else ""))
console.print(table)
for w in warnings:
console.print(f"[yellow]warning:[/yellow] {w}")
if not args.apply:
console.print(
"\n This was a dry-run — secrets are picked up automatically on the "
"next [cyan]hermes[/cyan] invocation. Re-run with [cyan]--apply[/cyan] "
"to export into the current shell instead."
)
else:
console.print(f"\n [green]Exported {applied} secret(s) into current process.[/green]")
return 0
def cmd_disable(args: argparse.Namespace) -> int:
console = Console()
cfg = load_config()
bw_cfg = (cfg.setdefault("secrets", {})
.setdefault("bitwarden", {}))
bw_cfg["enabled"] = False
save_config(cfg)
console.print(
"[green]Disabled.[/green] Bitwarden secrets will NOT be pulled on the next "
"Hermes invocation.\n"
" Your access token is left in .env — remove it manually if you also want "
"to revoke the credential."
)
return 0
def cmd_install(args: argparse.Namespace) -> int:
console = Console()
try:
path = bw.install_bws(force=bool(args.force))
console.print(f"[green]✓[/green] {path} ({_bws_version(path)})")
return 0
except Exception as exc: # noqa: BLE001
console.print(f"[red]Install failed: {exc}[/red]")
return 1
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _yn(b: bool) -> str:
return "[green]yes[/green]" if b else "[dim]no[/dim]"
def _bws_version(binary: Path) -> str:
try:
res = subprocess.run(
[str(binary), "--version"],
capture_output=True,
text=True,
timeout=5,
)
if res.returncode == 0:
return (res.stdout or res.stderr).strip().splitlines()[0]
except (OSError, subprocess.TimeoutExpired):
pass
return "version unknown"
def _list_projects(
binary: Path, token: str, console: Console
) -> Optional[List[dict]]:
"""Call ``bws project list`` and return the parsed list, or None on failure."""
env = os.environ.copy()
env["BWS_ACCESS_TOKEN"] = token
env.setdefault("NO_COLOR", "1")
try:
res = subprocess.run(
[str(binary), "project", "list", "--output", "json"],
env=env,
capture_output=True,
text=True,
timeout=15,
)
except (OSError, subprocess.TimeoutExpired) as exc:
console.print(f" [red]Couldn't list projects: {exc}[/red]")
return None
if res.returncode != 0:
err = (res.stderr or res.stdout).strip()[:300]
console.print(f" [red]bws project list failed: {err}[/red]")
if "authorization" in err.lower() or "invalid" in err.lower():
console.print(
" [yellow]This usually means the access token is wrong or revoked. "
"Double-check it in the Bitwarden web app.[/yellow]"
)
return None
try:
data = json.loads(res.stdout or "[]")
except json.JSONDecodeError as exc:
console.print(f" [red]bws returned non-JSON: {exc}[/red]")
return None
if not isinstance(data, list):
return []
return [p for p in data if isinstance(p, dict) and p.get("id")]
+118 -68
View File
@@ -2034,74 +2034,6 @@ def _setup_telegram():
save_env_value("TELEGRAM_HOME_CHANNEL", home_channel)
def _setup_discord():
"""Configure Discord bot credentials and allowlist."""
print_header("Discord")
existing = get_env_value("DISCORD_BOT_TOKEN")
if existing:
print_info("Discord: already configured")
if not prompt_yes_no("Reconfigure Discord?", False):
if not get_env_value("DISCORD_ALLOWED_USERS"):
print_info("⚠️ Discord has no user allowlist - anyone can use your bot!")
if prompt_yes_no("Add allowed users now?", True):
print_info(" To find Discord ID: Enable Developer Mode, right-click name → Copy ID")
allowed_users = prompt("Allowed user IDs (comma-separated)")
if allowed_users:
cleaned_ids = _clean_discord_user_ids(allowed_users)
save_env_value("DISCORD_ALLOWED_USERS", ",".join(cleaned_ids))
print_success("Discord allowlist configured")
return
print_info("Create a bot at https://discord.com/developers/applications")
token = prompt("Discord bot token", password=True)
if not token:
return
save_env_value("DISCORD_BOT_TOKEN", token)
print_success("Discord token saved")
print()
print_info("🔒 Security: Restrict who can use your bot")
print_info(" To find your Discord user ID:")
print_info(" 1. Enable Developer Mode in Discord settings")
print_info(" 2. Right-click your name → Copy ID")
print()
print_info(" You can also use Discord usernames (resolved on gateway start).")
print()
allowed_users = prompt(
"Allowed user IDs or usernames (comma-separated, leave empty for open access)"
)
if allowed_users:
cleaned_ids = _clean_discord_user_ids(allowed_users)
save_env_value("DISCORD_ALLOWED_USERS", ",".join(cleaned_ids))
print_success("Discord allowlist configured")
else:
print_info("⚠️ No allowlist set - anyone in servers with your bot can use it!")
print()
print_info("📬 Home Channel: where Hermes delivers cron job results,")
print_info(" cross-platform messages, and notifications.")
print_info(" To get a channel ID: right-click a channel → Copy Channel ID")
print_info(" (requires Developer Mode in Discord settings)")
print_info(" You can also set this later by typing /set-home in a Discord channel.")
home_channel = prompt("Home channel ID (leave empty to set later with /set-home)")
if home_channel:
save_env_value("DISCORD_HOME_CHANNEL", home_channel)
def _clean_discord_user_ids(raw: str) -> list:
"""Strip common Discord mention prefixes from a comma-separated ID string."""
cleaned = []
for uid in raw.replace(" ", "").split(","):
uid = uid.strip()
if uid.startswith("<@") and uid.endswith(">"):
uid = uid.lstrip("<@!").rstrip(">")
if uid.lower().startswith("user:"):
uid = uid[5:]
if uid:
cleaned.append(uid)
return cleaned
def _setup_slack():
"""Configure Slack bot credentials."""
print_header("Slack")
@@ -3128,6 +3060,119 @@ SETUP_SECTIONS = [
]
def _run_portal_one_shot(config: dict) -> None:
"""One-shot Nous Portal setup — OAuth + provider switch + Tool Gateway.
Wired into ``hermes setup --portal``. Does NOT prompt for anything
besides what the underlying OAuth + Tool Gateway prompts already need.
Designed to be shareable as a single command (``hermes setup --portal``)
that gets a brand-new user from zero to a fully working Hermes session
with web/image/tts/browser tools all routed via their Portal sub.
"""
from types import SimpleNamespace
from hermes_cli.auth_commands import auth_add_command
from hermes_cli.config import save_config
from hermes_cli.auth import get_nous_auth_status
from hermes_cli.nous_subscription import prompt_enable_tool_gateway
print()
print(
color(
"┌─────────────────────────────────────────────────────────┐",
Colors.MAGENTA,
)
)
print(color("│ ⚕ Hermes Setup — Nous Portal (one-shot) │", Colors.MAGENTA))
print(
color(
"└─────────────────────────────────────────────────────────┘",
Colors.MAGENTA,
)
)
print()
print_info(" One subscription, 300+ models, plus the Tool Gateway:")
print_info(" web search, image generation, TTS, browser automation")
print_info(" — all routed through your Nous Portal sub.")
print()
print_info(" Sign up: https://portal.nousresearch.com/manage-subscription")
print()
# Skip OAuth if already logged in (don't re-prompt every time the user
# runs `hermes setup --portal` after a successful first run).
already_logged_in = False
try:
already_logged_in = bool((get_nous_auth_status() or {}).get("logged_in"))
except Exception:
already_logged_in = False
if already_logged_in:
print_success(" Already logged into Nous Portal.")
else:
# Hand off to the shared auth wiring so the device-code flow is
# identical to `hermes auth add nous --type oauth`. SimpleNamespace
# mirrors the argparse Namespace contract that auth_add_command expects.
ns = SimpleNamespace(
provider="nous",
auth_type="oauth",
label=None,
api_key=None,
portal_url=None,
inference_url=None,
client_id=None,
scope=None,
no_browser=False,
timeout=None,
insecure=False,
ca_bundle=None,
min_key_ttl_seconds=5 * 60,
)
try:
auth_add_command(ns)
except SystemExit as e:
print()
print_error(f" Nous Portal login failed (exit {e.code}).")
print_info(" You can retry later with `hermes auth add nous --type oauth`.")
return
except (KeyboardInterrupt, EOFError):
print()
print_info(" Setup cancelled.")
return
except Exception as exc:
print()
print_error(f" Nous Portal login failed: {exc}")
print_info(" You can retry later with `hermes auth add nous --type oauth`.")
return
# Set provider → nous so the model picker, status surfaces, and
# managed-tool gating all light up. Leave model.model empty so the
# runtime picks Nous's default model; the user can change it later
# with `hermes model`.
model_cfg = config.get("model")
if not isinstance(model_cfg, dict):
model_cfg = {}
config["model"] = model_cfg
model_cfg["provider"] = "nous"
save_config(config)
print()
print_success(" Nous set as your inference provider.")
# Offer the Tool Gateway opt-in (single Y/n) — same flow that fires
# from `hermes model` after picking Nous.
print()
try:
prompt_enable_tool_gateway(config)
except (KeyboardInterrupt, EOFError):
pass
except Exception as exc:
print_warning(f" Tool Gateway prompt skipped: {exc}")
print()
print_success("Portal setup complete.")
print_info(" Run `hermes portal status` to inspect routing.")
print_info(" Run `hermes` to start chatting.")
def run_setup_wizard(args):
"""Run the interactive setup wizard.
@@ -3183,6 +3228,11 @@ def run_setup_wizard(args):
)
return
# --portal: one-shot Nous Portal setup. Skips the rest of the wizard.
if bool(getattr(args, "portal", False)):
_run_portal_one_shot(config)
return
# Check if a specific section was requested
section = getattr(args, "section", None)
if section:
+6 -2
View File
@@ -23,6 +23,7 @@ from rich.table import Table
# Lazy imports to avoid circular dependencies and slow startup.
# tools.skills_hub and tools.skills_guard are imported inside functions.
from hermes_constants import display_hermes_home
from agent.skill_utils import is_excluded_skill_path
_console = Console()
@@ -178,9 +179,12 @@ def _existing_categories() -> List[str]:
# top level (no category); otherwise treat as a category bucket.
if (entry / "SKILL.md").exists():
continue
# Has at least one nested SKILL.md?
# Has at least one nested SKILL.md (excluding dependency/cache dirs)?
try:
if any(entry.rglob("SKILL.md")):
if any(
not is_excluded_skill_path(p)
for p in entry.rglob("SKILL.md")
):
out.append(entry.name)
except OSError:
continue
+128 -30
View File
@@ -311,6 +311,16 @@ TOOL_CATEGORIES = {
"image_gen": {
"name": "Image Generation",
"icon": "🎨",
# Per-provider rows for FAL.ai (`plugins/image_gen/fal`), OpenAI,
# OpenAI Codex, and xAI are injected at runtime from each
# ``plugins.image_gen.<vendor>`` package via
# ``_plugin_image_gen_providers()`` in ``_visible_providers``.
# Only non-provider UX setup-flow rows remain here:
# - "Nous Subscription" — managed FAL billed via the Nous
# subscription (requires_nous_auth + override_env_vars).
# Uses the fal plugin as the underlying backend but has a
# distinct setup UX.
# Mirrors the shape browser/video_gen ship today.
"providers": [
{
"name": "Nous Subscription",
@@ -322,15 +332,6 @@ TOOL_CATEGORIES = {
"override_env_vars": ["FAL_KEY"],
"imagegen_backend": "fal",
},
{
"name": "FAL.ai",
"badge": "paid",
"tag": "Pick from flux-2-klein, flux-2-pro, gpt-image, nano-banana, etc.",
"env_vars": [
{"key": "FAL_KEY", "prompt": "FAL API key", "url": "https://fal.ai/dashboard/keys"},
],
"imagegen_backend": "fal",
},
],
},
"video_gen": {
@@ -482,6 +483,11 @@ TOOLSET_ENV_REQUIREMENTS = {
# ─── Post-Setup Hooks ─────────────────────────────────────────────────────────
def _cua_driver_cmd() -> str:
"""Return the cua-driver executable name/path, honoring non-empty overrides."""
return os.environ.get("HERMES_CUA_DRIVER_CMD", "").strip() or "cua-driver"
def _pip_install(
args: List[str],
*,
@@ -550,6 +556,55 @@ def _pip_install(
)
def _check_cua_driver_asset_for_arch() -> bool:
"""Check whether the latest CUA release ships an asset for this architecture.
Returns True if the asset likely exists (or if we cannot determine it).
Returns False and prints a warning when the asset is confirmed missing,
so callers can skip the install attempt and avoid a raw 404.
"""
import platform as _plat
import urllib.request
machine = _plat.machine() # "x86_64" or "arm64"
if machine == "arm64":
# arm64 (Apple Silicon) assets are always published.
return True
# x86_64 / Intel — probe the latest release for an architecture-specific
# asset before falling through to the upstream installer.
api_url = (
"https://api.github.com/repos/trycua/cua/releases/latest"
)
try:
req = urllib.request.Request(api_url, headers={"Accept": "application/vnd.github+json"})
with urllib.request.urlopen(req, timeout=10) as resp:
release = _json.loads(resp.read().decode())
tag = release.get("tag_name", "")
assets = release.get("assets", [])
arch_names = {"x86_64", "amd64"}
has_asset = any(
any(a in a_info.get("name", "").lower() for a in arch_names)
for a_info in assets
)
if not has_asset:
_print_warning(
f" Latest CUA release ({tag}) has no Intel (x86_64) asset."
)
_print_info(
" CUA Driver currently only ships Apple Silicon builds."
)
_print_info(
" See: https://github.com/trycua/cua/issues/1493"
)
return False
except Exception:
# Network / API failure — proceed and let the installer handle it.
pass
return True
def install_cua_driver(upgrade: bool = False) -> bool:
"""Install or refresh the cua-driver binary used by Computer Use.
@@ -579,7 +634,8 @@ def install_cua_driver(upgrade: bool = False) -> bool:
_print_warning(" Computer Use (cua-driver) is macOS-only; skipping.")
return False
binary = shutil.which("cua-driver")
driver_cmd = _cua_driver_cmd()
binary = shutil.which(driver_cmd)
# Not installed → fresh install path (only when caller asked for it).
if not binary and not upgrade:
@@ -587,18 +643,20 @@ def install_cua_driver(upgrade: bool = False) -> bool:
_print_warning(" curl not found — install manually:")
_print_info(" https://github.com/trycua/cua/blob/main/libs/cua-driver/README.md")
return False
if not _check_cua_driver_asset_for_arch():
return False
return _run_cua_driver_installer(label="Installing")
# Already installed and caller didn't ask to upgrade → just confirm.
if binary and not upgrade:
try:
version = subprocess.run(
["cua-driver", "--version"],
[driver_cmd, "--version"],
capture_output=True, text=True, timeout=5,
).stdout.strip()
_print_success(f" cua-driver already installed: {version or 'unknown version'}")
_print_success(f" {driver_cmd} already installed: {version or 'unknown version'}")
except Exception:
_print_success(" cua-driver already installed.")
_print_success(f" {driver_cmd} already installed.")
_print_info(" Grant macOS permissions if not done yet:")
_print_info(" System Settings > Privacy & Security > Accessibility")
_print_info(" System Settings > Privacy & Security > Screen Recording")
@@ -609,11 +667,14 @@ def install_cua_driver(upgrade: bool = False) -> bool:
_print_warning(" curl not found — cannot refresh cua-driver.")
return bool(binary)
if not _check_cua_driver_asset_for_arch():
return bool(binary)
if binary:
# Show before/after version when we have a baseline. Best-effort.
try:
before = subprocess.run(
["cua-driver", "--version"],
[driver_cmd, "--version"],
capture_output=True, text=True, timeout=5,
).stdout.strip()
except Exception:
@@ -625,13 +686,13 @@ def install_cua_driver(upgrade: bool = False) -> bool:
if ok and before:
try:
after = subprocess.run(
["cua-driver", "--version"],
[driver_cmd, "--version"],
capture_output=True, text=True, timeout=5,
).stdout.strip()
if after and after != before:
_print_success(f" cua-driver upgraded: {before}{after}")
_print_success(f" {driver_cmd} upgraded: {before}{after}")
elif after:
_print_info(f" cua-driver up to date: {after}")
_print_info(f" {driver_cmd} up to date: {after}")
except Exception:
pass
return ok
@@ -655,11 +716,12 @@ def _run_cua_driver_installer(label: str = "Installing", verbose: bool = True) -
_print_info(f" {label} cua-driver (macOS background computer-use)...")
else:
_print_info(f" {label} cua-driver...")
driver_cmd = _cua_driver_cmd()
try:
result = subprocess.run(install_cmd, shell=True, timeout=300)
if result.returncode == 0 and shutil.which("cua-driver"):
if result.returncode == 0 and shutil.which(driver_cmd):
if verbose:
_print_success(" cua-driver installed.")
_print_success(f" {driver_cmd} installed.")
_print_info(" IMPORTANT — grant macOS permissions now:")
_print_info(" System Settings > Privacy & Security > Accessibility")
_print_info(" System Settings > Privacy & Security > Screen Recording")
@@ -1506,12 +1568,9 @@ def _plugin_image_gen_providers() -> list[dict]:
Each returned dict looks like a regular ``TOOL_CATEGORIES`` provider
row but carries an ``image_gen_plugin_name`` marker so downstream
code (config writing, model picker) knows to route through the
plugin registry instead of the in-tree FAL backend.
FAL is skipped it's already exposed by the hardcoded
``TOOL_CATEGORIES["image_gen"]`` entries. When FAL gets ported to
a plugin in a follow-up PR, the hardcoded entries go away and this
function surfaces it alongside OpenAI automatically.
plugin registry. Every image-gen backend is a plugin now there
are no hardcoded rows left in ``TOOL_CATEGORIES["image_gen"]`` for
this function to dedupe against (see issue #26241).
"""
try:
from agent.image_gen_registry import list_providers
@@ -1524,9 +1583,6 @@ def _plugin_image_gen_providers() -> list[dict]:
rows: list[dict] = []
for provider in providers:
if getattr(provider, "name", None) == "fal":
# FAL has its own hardcoded rows today.
continue
try:
schema = provider.get_setup_schema()
except Exception:
@@ -1751,7 +1807,7 @@ _POST_SETUP_INSTALLED: dict = {
# entry when (a) the post_setup is the ONLY install side-effect for
# a no-key provider, and (b) an installed-state check is cheap and
# doesn't trigger a heavy import.
"cua_driver": lambda: bool(shutil.which("cua-driver")),
"cua_driver": lambda: bool(shutil.which(_cua_driver_cmd())),
}
@@ -1869,6 +1925,16 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
print()
# Plain text labels only (no ANSI codes in menu items)
# When the user is logged into Nous, surface a marker on providers
# whose access is included in their subscription so it's visually
# obvious which options cost extra vs. cost nothing on top of Nous.
try:
_nous_logged_in = bool(
get_nous_subscription_features(config).nous_auth_present
)
except Exception:
_nous_logged_in = False
provider_choices = []
for p in providers:
badge = f" [{p['badge']}]" if p.get("badge") else ""
@@ -1882,7 +1948,15 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
configured = ""
else:
configured = " [configured]"
provider_choices.append(f"{p['name']}{badge}{tag}{configured}")
# Highlight Nous-managed entries when the user has Portal auth.
# curses_radiolist can't render ANSI inside item strings, so we
# use a plain unicode star + parenthetical phrase. Suppressed
# when no Portal auth is present so non-subscribers see the
# picker unchanged.
sub_marker = ""
if _nous_logged_in and p.get("managed_nous_feature"):
sub_marker = " ★ Included with your Nous subscription"
provider_choices.append(f"{p['name']}{badge}{tag}{configured}{sub_marker}")
# Add skip option
provider_choices.append("Skip — keep defaults / configure later")
@@ -2349,6 +2423,30 @@ def _configure_provider(provider: dict, config: dict):
# Prompt for each required env var
all_configured = True
# If this BYOK provider lives in a category that ALSO has a
# Nous-managed sibling, show a single dim hint so users know
# they can avoid the key entirely via a Portal subscription.
# Suppressed when the user is already authed to Nous.
_show_portal_hint = False
if env_vars and not managed_feature and not provider.get("requires_nous_auth"):
try:
_has_managed_sibling = False
for _cat_key, _cat in TOOL_CATEGORIES.items():
_providers = _cat.get("providers", [])
if provider in _providers and any(
sib.get("managed_nous_feature") for sib in _providers
):
_has_managed_sibling = True
break
if _has_managed_sibling:
_features = get_nous_subscription_features(config)
_show_portal_hint = not _features.nous_auth_present
except Exception:
_show_portal_hint = False
if _show_portal_hint:
_print_info(" Available through Nous Portal subscription.")
for var in env_vars:
existing = get_env_value(var["key"])
if existing:
+110 -13
View File
@@ -48,6 +48,7 @@ from hermes_cli.config import (
redact_key,
)
from gateway.status import get_running_pid, read_runtime_status
from utils import env_var_enabled
try:
from fastapi import FastAPI, HTTPException, Request, WebSocket, WebSocketDisconnect
@@ -975,11 +976,13 @@ _AUX_TASK_SLOTS: Tuple[str, ...] = (
"vision",
"web_extract",
"compression",
"session_search",
"skills_hub",
"approval",
"mcp",
"title_generation",
"triage_specifier",
"kanban_decomposer",
"profile_describer",
"curator",
)
@@ -3389,7 +3392,7 @@ async def _broadcast_event(channel: str, payload: str) -> None:
except Exception:
# Subscriber went away mid-send; the /api/events finally clause
# will remove it from the registry on its next iteration.
pass
_log.warning("broadcast send failed for subscriber on %s", channel, exc_info=True)
def _channel_or_close_code(ws: WebSocket) -> Optional[str]:
@@ -4044,6 +4047,43 @@ async def set_dashboard_theme(body: ThemeSetBody):
# Dashboard plugin system
# ---------------------------------------------------------------------------
def _safe_plugin_api_relpath(api_field: Any, *, dashboard_dir: Path) -> Optional[str]:
"""Validate the manifest's ``api`` field for the plugin loader.
The web server later imports this file as a Python module via
``importlib.util.spec_from_file_location`` (arbitrary code
execution by design that's how plugins extend the backend).
Pre-#29156 the field was used as-is, which meant:
* An absolute path swallowed the plugin's dashboard directory
entirely ``Path('safe/dashboard') / '/tmp/evil.py'`` resolves
to ``/tmp/evil.py``, so any attacker-controlled manifest could
point the import at any Python file on disk (GHSA-5qr3-c538-wm9j).
* A ``../..`` traversal could climb out of the plugin into
neighbouring directories on the search path.
Return the original string when the resolved path stays under
``dashboard_dir``; return ``None`` (with a warning logged at the
call site) otherwise so the plugin still loads its static JS/CSS
but its backend ``api`` is rejected.
"""
if not isinstance(api_field, str) or not api_field.strip():
return None
candidate = Path(api_field)
if candidate.is_absolute():
return None
try:
resolved = (dashboard_dir / candidate).resolve()
base = dashboard_dir.resolve()
except (OSError, RuntimeError):
return None
try:
resolved.relative_to(base)
except ValueError:
return None
return api_field
def _discover_dashboard_plugins() -> list:
"""Scan plugins/*/dashboard/manifest.json for dashboard extensions.
@@ -4062,7 +4102,16 @@ def _discover_dashboard_plugins() -> list:
(bundled_root / "memory", "bundled"),
(bundled_root, "bundled"),
]
if os.environ.get("HERMES_ENABLE_PROJECT_PLUGINS"):
# GHSA-5qr3-c538-wm9j (#29156): the previous ``os.environ.get(...)``
# check treated *any* non-empty string as truthy, so ``=0``, ``=false``,
# and ``=no`` — all of which the agent loader and operators correctly
# read as "disabled" — silently *enabled* the untrusted project source
# in the web server. Combined with the absolute-path RCE primitive on
# the manifest's ``api`` field (now patched below), this turned the
# opt-in into a sticky always-on switch. Use the shared truthy
# semantics (``1`` / ``true`` / ``yes`` / ``on``) so the gate matches
# ``hermes_cli/plugins.py`` and the documented user contract.
if env_var_enabled("HERMES_ENABLE_PROJECT_PLUGINS"):
search_dirs.append((Path.cwd() / ".hermes" / "plugins", "project"))
for plugins_root, source in search_dirs:
@@ -4101,6 +4150,23 @@ def _discover_dashboard_plugins() -> list:
slots: List[str] = []
if isinstance(slots_src, list):
slots = [s for s in slots_src if isinstance(s, str) and s]
# Validate ``api`` at discovery time so the value cached
# on the plugin entry is already safe to feed into the
# importer. An attacker-controlled manifest can name
# any absolute path or ``..`` traversal here — the
# web server then imports that file as a Python module
# (RCE, GHSA-5qr3-c538-wm9j).
raw_api = data.get("api")
dashboard_dir = child / "dashboard"
safe_api = _safe_plugin_api_relpath(raw_api, dashboard_dir=dashboard_dir)
if raw_api and safe_api is None:
_log.warning(
"Plugin %s: refusing unsafe api path %r (must be a "
"relative file inside the plugin's dashboard/ "
"directory); backend routes from this plugin will "
"not be mounted",
name, raw_api,
)
plugins.append({
"name": name,
"label": data.get("label", name),
@@ -4111,10 +4177,10 @@ def _discover_dashboard_plugins() -> list:
"slots": slots,
"entry": data.get("entry", "dist/index.js"),
"css": data.get("css"),
"has_api": bool(data.get("api")),
"has_api": bool(safe_api),
"source": source,
"_dir": str(child / "dashboard"),
"_api_file": data.get("api"),
"_dir": str(dashboard_dir),
"_api_file": safe_api,
})
except Exception as exc:
_log.warning("Bad dashboard plugin manifest %s: %s", manifest_file, exc)
@@ -4317,12 +4383,13 @@ async def post_agent_plugin_install(request: Request, body: _AgentPluginInstallB
def _validate_plugin_name(name: str) -> str:
"""Reject path-traversal attempts in plugin name URL parameters."""
if not name or "/" in name or "\\" in name or ".." in name:
name = name.strip("/")
if not name or ".." in name or "\\" in name:
raise HTTPException(status_code=400, detail="Invalid plugin name.")
return name
@app.post("/api/dashboard/agent-plugins/{name}/enable")
@app.post("/api/dashboard/agent-plugins/{name:path}/enable")
async def post_agent_plugin_enable(request: Request, name: str):
_require_token(request)
name = _validate_plugin_name(name)
@@ -4334,7 +4401,7 @@ async def post_agent_plugin_enable(request: Request, name: str):
return result
@app.post("/api/dashboard/agent-plugins/{name}/disable")
@app.post("/api/dashboard/agent-plugins/{name:path}/disable")
async def post_agent_plugin_disable(request: Request, name: str):
_require_token(request)
name = _validate_plugin_name(name)
@@ -4346,7 +4413,7 @@ async def post_agent_plugin_disable(request: Request, name: str):
return result
@app.post("/api/dashboard/agent-plugins/{name}/update")
@app.post("/api/dashboard/agent-plugins/{name:path}/update")
async def post_agent_plugin_update(request: Request, name: str):
_require_token(request)
name = _validate_plugin_name(name)
@@ -4359,7 +4426,7 @@ async def post_agent_plugin_update(request: Request, name: str):
return result
@app.delete("/api/dashboard/agent-plugins/{name}")
@app.delete("/api/dashboard/agent-plugins/{name:path}")
async def delete_agent_plugin(request: Request, name: str):
_require_token(request)
name = _validate_plugin_name(name)
@@ -4397,7 +4464,7 @@ class _PluginVisibilityBody(BaseModel):
hidden: bool
@app.post("/api/dashboard/plugins/{name}/visibility")
@app.post("/api/dashboard/plugins/{name:path}/visibility")
async def post_plugin_visibility(request: Request, name: str, body: _PluginVisibilityBody):
"""Toggle a plugin's sidebar visibility (persists to config.yaml dashboard.hidden_plugins)."""
_require_token(request)
@@ -4468,12 +4535,42 @@ def _mount_plugin_api_routes():
Each plugin's ``api`` field points to a Python file that must expose
a ``router`` (FastAPI APIRouter). Routes are mounted under
``/api/plugins/<name>/``.
Backend import is restricted to ``bundled`` and ``user`` sources.
Project plugins (``./.hermes/plugins/``) ship with the CWD and are
therefore attacker-controlled in any threat model where the user
opens a malicious repo; they can extend the dashboard UI via
static JS/CSS but their Python ``api`` file is never auto-imported
by the web server. See GHSA-5qr3-c538-wm9j (#29156).
"""
for plugin in _get_dashboard_plugins():
api_file_name = plugin.get("_api_file")
if not api_file_name:
continue
api_path = Path(plugin["_dir"]) / api_file_name
if plugin.get("source") == "project":
_log.warning(
"Plugin %s: ignoring backend api=%s (project plugins may "
"not auto-import Python code; move the plugin to "
"~/.hermes/plugins/ if you trust it)",
plugin["name"], api_file_name,
)
continue
dashboard_dir = Path(plugin["_dir"])
api_path = dashboard_dir / api_file_name
try:
resolved_api = api_path.resolve()
resolved_base = dashboard_dir.resolve()
resolved_api.relative_to(resolved_base)
except (OSError, RuntimeError, ValueError):
# Discovery already filters this, but re-check here in case
# ``_dir`` was tampered with after caching or a future caller
# bypasses the validator. Defence in depth keeps the import
# primitive contained even if the upstream check regresses.
_log.warning(
"Plugin %s: refusing to import api file outside its "
"dashboard directory (%s)", plugin["name"], api_path,
)
continue
if not api_path.exists():
_log.warning("Plugin %s declares api=%s but file not found", plugin["name"], api_file_name)
continue
+13 -7
View File
@@ -33,7 +33,7 @@ T = TypeVar("T")
DEFAULT_DB_PATH = get_hermes_home() / "state.db"
SCHEMA_VERSION = 12
SCHEMA_VERSION = 13
# ---------------------------------------------------------------------------
# WAL-compatibility fallback
@@ -237,7 +237,8 @@ CREATE TABLE IF NOT EXISTS messages (
reasoning_details TEXT,
codex_reasoning_items TEXT,
codex_message_items TEXT,
platform_message_id TEXT
platform_message_id TEXT,
observed INTEGER DEFAULT 0
);
CREATE TABLE IF NOT EXISTS state_meta (
@@ -1460,6 +1461,7 @@ class SessionDB:
codex_reasoning_items: Any = None,
codex_message_items: Any = None,
platform_message_id: str = None,
observed: bool = False,
) -> int:
"""
Append a message to a session. Returns the message row ID.
@@ -1501,8 +1503,8 @@ class SessionDB:
"""INSERT INTO messages (session_id, role, content, tool_call_id,
tool_calls, tool_name, timestamp, token_count, finish_reason,
reasoning, reasoning_content, reasoning_details, codex_reasoning_items,
codex_message_items, platform_message_id)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
codex_message_items, platform_message_id, observed)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(
session_id,
role,
@@ -1519,6 +1521,7 @@ class SessionDB:
codex_items_json,
codex_message_items_json,
platform_message_id,
1 if observed else 0,
),
)
msg_id = cursor.lastrowid
@@ -1590,8 +1593,8 @@ class SessionDB:
"""INSERT INTO messages (session_id, role, content, tool_call_id,
tool_calls, tool_name, timestamp, token_count, finish_reason,
reasoning, reasoning_content, reasoning_details, codex_reasoning_items,
codex_message_items, platform_message_id)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
codex_message_items, platform_message_id, observed)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(
session_id,
role,
@@ -1608,6 +1611,7 @@ class SessionDB:
codex_items_json,
codex_message_items_json,
platform_msg_id,
1 if msg.get("observed") else 0,
),
)
total_messages += 1
@@ -1925,7 +1929,7 @@ class SessionDB:
rows = self._conn.execute(
"SELECT role, content, tool_call_id, tool_calls, tool_name, "
"finish_reason, reasoning, reasoning_content, reasoning_details, "
"codex_reasoning_items, codex_message_items, platform_message_id "
"codex_reasoning_items, codex_message_items, platform_message_id, observed "
f"FROM messages WHERE session_id IN ({placeholders}) ORDER BY id",
tuple(session_ids),
).fetchall()
@@ -1953,6 +1957,8 @@ class SessionDB:
# for backward compatibility with the JSONL transcript shape.
if row["platform_message_id"]:
msg["message_id"] = row["platform_message_id"]
if row["observed"]:
msg["observed"] = True
# Restore reasoning fields on assistant messages so providers
# that replay reasoning (OpenRouter, OpenAI, Nous) receive
# coherent multi-turn reasoning context.
Binary file not shown.

After

Width:  |  Height:  |  Size: 1.1 MiB

+1 -1
View File
@@ -4,7 +4,7 @@ let
src = ../ui-tui;
npmDeps = pkgs.fetchNpmDeps {
inherit src;
hash = "sha256-dNL/J4tyQQ7Ji3xfIE5b5Jdi6rQyCFjqYpzLYftJVdc=";
hash = "sha256-F6/MzZOWc0zhW9mIfnaY+PrllPvJcsA/OdFdEM+NpLY=";
};
npm = hermesNpmLib.mkNpmPassthru { folder = "ui-tui"; attr = "tui"; pname = "hermes-tui"; };
+1 -1
View File
@@ -4,7 +4,7 @@ let
src = ../web;
npmDeps = pkgs.fetchNpmDeps {
inherit src;
hash = "sha256-GxSmEpclOwmv94KmGMediPITxqXAsxqTEQOoDIbYkUw=";
hash = "sha256-6qhGuifHVtCeep1SiQdCUxBMr7UGhYpdMTvXhrQu/zA=";
};
npm = hermesNpmLib.mkNpmPassthru { folder = "web"; attr = "web"; pname = "hermes-web"; };
+182
View File
@@ -0,0 +1,182 @@
"""FAL.ai image generation backend.
Wraps the 18-model FAL catalog (FLUX 2, Z-Image, Nano Banana, GPT
Image 1.5, Recraft, Imagen 4, Qwen, Ideogram, ) as an
:class:`ImageGenProvider` implementation.
The heavy lifting model catalog, payload construction, request
submission, managed-Nous-gateway selection, Clarity Upscaler chaining
lives in :mod:`tools.image_generation_tool`. This plugin reaches into
that module via call-time indirection (``import tools.image_generation_tool as _it``)
so:
* the existing test suite (``tests/tools/test_image_generation.py``,
``tests/tools/test_managed_media_gateways.py``) keeps patching
``image_tool._submit_fal_request`` / ``image_tool.fal_client`` /
``image_tool._managed_fal_client`` without modification, and
* there's exactly one canonical FAL code path on disk — the plugin is a
registration adapter, not a parallel implementation.
See issue #26241 for the migration plan and the
``plugin-extraction-test-patch-compatibility.md`` rules this follows.
"""
from __future__ import annotations
import json
import logging
import os
from typing import Any, Dict, List, Optional
from agent.image_gen_provider import (
DEFAULT_ASPECT_RATIO,
ImageGenProvider,
resolve_aspect_ratio,
)
logger = logging.getLogger(__name__)
class FalImageGenProvider(ImageGenProvider):
"""FAL.ai image generation backend.
Delegates to ``tools.image_generation_tool.image_generate_tool`` so
the in-tree FAL implementation (model catalog, payload builder,
managed-gateway selection, Clarity Upscaler chaining) is the single
source of truth. Everything is resolved at call time via the
``_it`` indirection so tests can monkey-patch the legacy module.
"""
@property
def name(self) -> str:
return "fal"
@property
def display_name(self) -> str:
return "FAL.ai"
def is_available(self) -> bool:
# Available when direct FAL_KEY is set OR the managed Nous
# gateway resolves a fal-queue origin. Both checks come from the
# legacy module so this provider tracks whatever logic ships
# there.
import tools.image_generation_tool as _it
try:
return bool(_it.check_fal_api_key())
except Exception: # noqa: BLE001 — defensive; never break the picker
return False
def list_models(self) -> List[Dict[str, Any]]:
import tools.image_generation_tool as _it
return [
{
"id": model_id,
"display": meta.get("display", model_id),
"speed": meta.get("speed", ""),
"strengths": meta.get("strengths", ""),
"price": meta.get("price", ""),
}
for model_id, meta in _it.FAL_MODELS.items()
]
def default_model(self) -> Optional[str]:
import tools.image_generation_tool as _it
return _it.DEFAULT_MODEL
def get_setup_schema(self) -> Dict[str, Any]:
return {
"name": "FAL.ai",
"badge": "paid",
"tag": "Pick from flux-2-klein, flux-2-pro, gpt-image, nano-banana, etc.",
"env_vars": [
{
"key": "FAL_KEY",
"prompt": "FAL API key",
"url": "https://fal.ai/dashboard/keys",
},
],
}
def generate(
self,
prompt: str,
aspect_ratio: str = DEFAULT_ASPECT_RATIO,
**kwargs: Any,
) -> Dict[str, Any]:
"""Generate an image via the legacy FAL pipeline.
Forwards prompt + aspect_ratio (and any forward-compat extras
the schema supports) into :func:`tools.image_generation_tool.image_generate_tool`,
then reshapes its JSON-string response into the provider-ABC
dict format consumed by ``_dispatch_to_plugin_provider``.
"""
import tools.image_generation_tool as _it
aspect = resolve_aspect_ratio(aspect_ratio)
passthrough = {
key: kwargs[key]
for key in (
"num_inference_steps",
"guidance_scale",
"num_images",
"output_format",
"seed",
)
if key in kwargs and kwargs[key] is not None
}
try:
raw = _it.image_generate_tool(
prompt=prompt,
aspect_ratio=aspect,
**passthrough,
)
except Exception as exc: # noqa: BLE001 — never raise out of generate
logger.warning("FAL image_generate_tool raised: %s", exc, exc_info=True)
return {
"success": False,
"image": None,
"error": f"FAL image generation failed: {exc}",
"error_type": type(exc).__name__,
"provider": "fal",
"prompt": prompt,
"aspect_ratio": aspect,
}
try:
response = json.loads(raw) if isinstance(raw, str) else raw
except Exception: # noqa: BLE001
response = {"success": False, "image": None, "error": "Invalid JSON from FAL pipeline"}
if not isinstance(response, dict):
response = {
"success": False,
"image": None,
"error": "FAL pipeline returned a non-dict response",
"error_type": "provider_contract",
}
# Stamp provider/prompt/aspect_ratio so downstream consumers see
# the uniform shape declared in ``agent.image_gen_provider``.
response.setdefault("provider", "fal")
response.setdefault("prompt", prompt)
response.setdefault("aspect_ratio", aspect)
# Annotate model best-effort — the legacy pipeline resolves it
# internally, so query it after the fact for the response shape.
if "model" not in response:
try:
model_id, _meta = _it._resolve_fal_model()
response["model"] = model_id
except Exception: # noqa: BLE001
pass
return response
# ---------------------------------------------------------------------------
# Plugin entry point
# ---------------------------------------------------------------------------
def register(ctx) -> None:
"""Plugin entry point — wire ``FalImageGenProvider`` into the registry."""
ctx.register_image_gen_provider(FalImageGenProvider())
+7
View File
@@ -0,0 +1,7 @@
name: fal
version: 1.0.0
description: "FAL.ai image generation backend (flux-2-klein, flux-2-pro, nano-banana, gpt-image-1.5, recraft-v3, etc.)."
author: NousResearch
kind: backend
requires_env:
- FAL_KEY
+58 -25
View File
@@ -47,6 +47,25 @@ _DEFAULT_ENDPOINT = "http://127.0.0.1:1933"
_TIMEOUT = 30.0
_REMOTE_RESOURCE_PREFIXES = ("http://", "https://", "git@", "ssh://", "git://")
# Maps the viking_remember `category` enum to a viking:// subdirectory.
# Keep in sync with REMEMBER_SCHEMA.parameters.properties.category.enum.
_CATEGORY_SUBDIR_MAP = {
"preference": "preferences",
"entity": "entities",
"event": "events",
"case": "cases",
"pattern": "patterns",
}
_DEFAULT_MEMORY_SUBDIR = "preferences"
# Maps the built-in memory tool's `target` ("user" vs "memory") to a subdir
# for on_memory_write mirroring. User profile facts → preferences; agent
# notes / observations → patterns. Anything unknown falls back to the default.
_MEMORY_WRITE_TARGET_SUBDIR_MAP = {
"user": "preferences",
"memory": "patterns",
}
# ---------------------------------------------------------------------------
# Process-level atexit safety net — ensures pending sessions are committed
@@ -607,24 +626,35 @@ class OpenVikingMemoryProvider(MemoryProvider):
except Exception as e:
logger.warning("OpenViking session commit failed: %s", e)
def on_memory_write(self, action: str, target: str, content: str) -> None:
"""Mirror built-in memory writes to OpenViking as explicit memories."""
def _build_memory_uri(self, subdir: str) -> str:
"""Build a viking:// memory URI under the configured user/subdir."""
slug = uuid.uuid4().hex[:12]
return f"viking://user/{self._user}/memories/{subdir}/mem_{slug}.md"
def on_memory_write(
self,
action: str,
target: str,
content: str,
metadata: Optional[Dict[str, Any]] = None,
) -> None:
"""Mirror built-in memory writes to OpenViking via content/write."""
if not self._client or action != "add" or not content:
return
subdir = _MEMORY_WRITE_TARGET_SUBDIR_MAP.get(target, _DEFAULT_MEMORY_SUBDIR)
uri = self._build_memory_uri(subdir)
def _write():
try:
client = _VikingClient(
self._endpoint, self._api_key,
account=self._account, user=self._user, agent=self._agent,
)
# Add as a user message with memory context so the commit
# picks it up as an explicit memory during extraction
client.post(f"/api/v1/sessions/{self._session_id}/messages", {
"role": "user",
"parts": [
{"type": "text", "text": f"[Memory note — {target}] {content}"},
],
client.post("/api/v1/content/write", {
"uri": uri,
"content": content,
"mode": "create",
})
except Exception as e:
logger.debug("OpenViking memory mirror failed: %s", e)
@@ -858,24 +888,27 @@ class OpenVikingMemoryProvider(MemoryProvider):
if not content:
return tool_error("content is required")
# Store as a session message that will be extracted during commit.
# The category hint helps OpenViking's extraction classify correctly.
category = args.get("category", "")
text = f"[Remember] {content}"
if category:
text = f"[Remember — {category}] {content}"
subdir = _CATEGORY_SUBDIR_MAP.get(category, _DEFAULT_MEMORY_SUBDIR)
uri = self._build_memory_uri(subdir)
self._client.post(f"/api/v1/sessions/{self._session_id}/messages", {
"role": "user",
"parts": [
{"type": "text", "text": text},
],
})
return json.dumps({
"status": "stored",
"message": "Memory recorded. Will be extracted and indexed on session commit.",
})
# Write directly via content/write API.
# This creates the file, stores the content, and queues vector indexing
# in a single call — no dependency on session commit / VLM extraction.
try:
result = self._client.post("/api/v1/content/write", {
"uri": uri,
"content": content,
"mode": "create",
})
written = result.get("result", {}).get("written_bytes", 0)
return json.dumps({
"status": "stored",
"message": f"Memory stored ({written}b) and queued for vector indexing.",
})
except Exception as e:
logger.error("OpenViking content/write failed: %s", e)
return tool_error(f"Failed to store memory: {e}")
def _tool_add_resource(self, args: dict) -> str:
url = args.get("url", "")
@@ -7,9 +7,81 @@ Both use per-model api_mode routing:
(this profile)
"""
from __future__ import annotations
from typing import Any
from providers import register_provider
from providers.base import ProviderProfile
def _flat_model_name(model: str | None) -> str:
"""Return the bare OpenCode model ID, tolerating aggregator prefixes."""
return (model or "").strip().rsplit("/", 1)[-1].lower()
def _is_kimi_k2_model(model: str | None) -> bool:
return _flat_model_name(model).startswith("kimi-k2")
def _is_deepseek_thinking_model(model: str | None) -> bool:
m = _flat_model_name(model)
if m.startswith("deepseek-v") and not m.startswith("deepseek-v3"):
return True
return m == "deepseek-reasoner"
class OpenCodeGoProfile(ProviderProfile):
"""OpenCode Go - model-specific reasoning controls."""
def build_api_kwargs_extras(
self, *, reasoning_config: dict | None = None, model: str | None = None, **context
) -> tuple[dict[str, Any], dict[str, Any]]:
extra_body: dict[str, Any] = {}
top_level: dict[str, Any] = {}
if _is_kimi_k2_model(model):
# Kimi K2 on OpenCode Go uses Moonshot's native wire shape:
# extra_body.thinking (binary toggle) + top-level reasoning_effort
# (low|medium|high). Mirrors the KimiProfile (api.moonshot.ai/v1).
if not isinstance(reasoning_config, dict):
# No config → leave server defaults alone.
return extra_body, top_level
enabled = reasoning_config.get("enabled") is not False
extra_body["thinking"] = {"type": "enabled" if enabled else "disabled"}
if not enabled:
return extra_body, top_level
effort = (reasoning_config.get("effort") or "").strip().lower()
if effort in {"xhigh", "max"}:
top_level["reasoning_effort"] = "high"
elif effort in {"low", "medium", "high"}:
top_level["reasoning_effort"] = effort
return extra_body, top_level
if not _is_deepseek_thinking_model(model):
return extra_body, top_level
enabled = True
if isinstance(reasoning_config, dict) and reasoning_config.get("enabled") is False:
enabled = False
extra_body["thinking"] = {"type": "enabled" if enabled else "disabled"}
if not enabled:
return extra_body, top_level
if isinstance(reasoning_config, dict):
effort = (reasoning_config.get("effort") or "").strip().lower()
if effort in {"xhigh", "max"}:
top_level["reasoning_effort"] = "max"
elif effort in {"low", "medium", "high"}:
top_level["reasoning_effort"] = effort
return extra_body, top_level
opencode_zen = ProviderProfile(
name="opencode-zen",
aliases=("opencode", "opencode_zen", "zen"),
@@ -18,7 +90,7 @@ opencode_zen = ProviderProfile(
default_aux_model="gemini-3-flash",
)
opencode_go = ProviderProfile(
opencode_go = OpenCodeGoProfile(
name="opencode-go",
aliases=("opencode_go", "go", "opencode-go-sub"),
env_vars=("OPENCODE_GO_API_KEY",),
+3
View File
@@ -0,0 +1,3 @@
from .adapter import register
__all__ = ["register"]
@@ -1489,7 +1489,8 @@ class DiscordAdapter(BasePlatformAdapter):
reported in ``raw_response['warnings']`` so the caller can surface
partial-send issues.
"""
from tools.send_message_tool import _derive_forum_thread_name
# _derive_forum_thread_name is defined further down in this same
# module — no cross-module import needed.
formatted = self.format_message(content)
chunks = self.truncate_message(formatted, self.MAX_MESSAGE_LENGTH)
@@ -1551,7 +1552,8 @@ class DiscordAdapter(BasePlatformAdapter):
ForumChannel accepts the same file/files/content kwargs as
``channel.send``, creating the thread and starter message atomically.
"""
from tools.send_message_tool import _derive_forum_thread_name
# _derive_forum_thread_name is defined further down in this same
# module — no cross-module import needed.
if not thread_name:
# Prefer the text content, fall back to the first attached
@@ -5699,7 +5701,492 @@ def _define_discord_view_classes() -> None:
self.resolved = True
for child in self.children:
child.disabled = True
if DISCORD_AVAILABLE:
_define_discord_view_classes()
# ── Standalone (out-of-process) sender ────────────────────────────────────────
# Used by ``tools/send_message_tool._send_via_adapter`` when the gateway runner
# is not in this process (e.g. ``hermes cron`` running standalone) and no live
# DiscordAdapter instance is available. Implements the same forum/thread/
# multipart logic the live adapter would use, via Discord's REST API directly.
#
# This block was previously hosted in ``tools/send_message_tool.py`` as
# ``_send_discord``. It moved into the plugin so all Discord-specific HTTP
# logic lives next to the adapter — same shape as Teams' ``_standalone_send``.
# Process-local cache for Discord channel-type probes. Avoids re-probing the
# same channel on every send when the directory cache has no entry (e.g. fresh
# install, or channel created after the last directory build).
_DISCORD_CHANNEL_TYPE_PROBE_CACHE: Dict[str, bool] = {}
def _remember_channel_is_forum(chat_id: str, is_forum: bool) -> None:
_DISCORD_CHANNEL_TYPE_PROBE_CACHE[str(chat_id)] = bool(is_forum)
def _probe_is_forum_cached(chat_id: str) -> Optional[bool]:
return _DISCORD_CHANNEL_TYPE_PROBE_CACHE.get(str(chat_id))
def _derive_forum_thread_name(message: str) -> str:
"""Derive a thread name from the first line of the message, capped at 100 chars."""
first_line = message.strip().split("\n", 1)[0].strip()
# Strip common markdown heading prefixes
first_line = first_line.lstrip("#").strip()
if not first_line:
first_line = "New Post"
return first_line[:100]
def _standalone_sanitize_error(text) -> str:
"""Local copy of tools.send_message_tool._sanitize_error_text — strips bot
tokens from any error payload before bubbling it up. Inlined so the
plugin doesn't introduce a hard dependency on send_message_tool internals.
"""
s = str(text)
# Mask anything that looks like a Bot token in an Authorization header.
import re as _re_san
return _re_san.sub(
r"(Authorization:\s*Bot\s+)\S+",
r"\1***",
s,
flags=_re_san.IGNORECASE,
)
async def _standalone_send(
pconfig,
chat_id: str,
message: str,
*,
thread_id: Optional[str] = None,
media_files: Optional[list] = None,
force_document: bool = False,
) -> Dict[str, Any]:
"""Send via Discord REST API without a live gateway adapter.
Used by ``tools/send_message_tool._send_via_adapter`` when the gateway
runner is not in this process. Reads ``DISCORD_BOT_TOKEN`` from
``pconfig.token`` (set by the gateway config loader from env) and falls
back to the ``DISCORD_BOT_TOKEN`` env var.
Forum channels (type 15) reject ``POST /messages`` a thread post is
created automatically via ``POST /channels/{id}/threads``. Media files
are uploaded as multipart attachments on the starter message of the new
thread. Channel type is resolved from the channel directory first, then
a process-local probe cache, and only as a last resort with a live
``GET /channels/{id}`` probe (whose result is memoized).
``force_document`` is accepted for signature parity but unused Discord
treats every uploaded file as a generic attachment.
"""
try:
import aiohttp
except ImportError:
return {"error": "aiohttp not installed. Run: pip install aiohttp"}
token = (getattr(pconfig, "token", None) or os.getenv("DISCORD_BOT_TOKEN", "")).strip()
if not token:
return {"error": "Discord standalone send: DISCORD_BOT_TOKEN is not set"}
try:
from gateway.platforms.base import resolve_proxy_url, proxy_kwargs_for_aiohttp
_proxy = resolve_proxy_url(platform_env_var="DISCORD_PROXY")
_sess_kw, _req_kw = proxy_kwargs_for_aiohttp(_proxy)
auth_headers = {"Authorization": f"Bot {token}"}
json_headers = {**auth_headers, "Content-Type": "application/json"}
media_files = media_files or []
last_data = None
warnings = []
# Thread endpoint: Discord threads are channels; send directly to the thread ID.
if thread_id:
url = f"https://discord.com/api/v10/channels/{thread_id}/messages"
else:
# Check if the target channel is a forum channel (type 15).
# Forum channels reject POST /messages — create a thread post instead.
# Three-layer detection: directory cache → process-local probe
# cache → GET /channels/{id} probe (with result memoized).
_channel_type = None
try:
from gateway.channel_directory import lookup_channel_type
_channel_type = lookup_channel_type("discord", chat_id)
except Exception:
pass
if _channel_type == "forum":
is_forum = True
elif _channel_type is not None:
is_forum = False
else:
cached = _probe_is_forum_cached(chat_id)
if cached is not None:
is_forum = cached
else:
is_forum = False
try:
info_url = f"https://discord.com/api/v10/channels/{chat_id}"
async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=15), **_sess_kw) as info_sess:
async with info_sess.get(info_url, headers=json_headers, **_req_kw) as info_resp:
if info_resp.status == 200:
info = await info_resp.json()
is_forum = info.get("type") == 15
_remember_channel_is_forum(chat_id, is_forum)
except Exception:
logger.debug("Failed to probe channel type for %s", chat_id, exc_info=True)
if is_forum:
thread_name = _derive_forum_thread_name(message)
thread_url = f"https://discord.com/api/v10/channels/{chat_id}/threads"
# Filter to readable media files up front so we can pick the
# right code path (JSON vs multipart) before opening a session.
valid_media = []
for media_path, _is_voice in media_files:
if not os.path.exists(media_path):
warning = f"Media file not found, skipping: {media_path}"
logger.warning(warning)
warnings.append(warning)
continue
valid_media.append(media_path)
async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=60), **_sess_kw) as session:
if valid_media:
# Multipart: payload_json + files[N] creates a forum
# thread with the starter message plus attachments in
# a single API call.
attachments_meta = [
{"id": str(idx), "filename": os.path.basename(path)}
for idx, path in enumerate(valid_media)
]
starter_message = {"content": message, "attachments": attachments_meta}
payload_json = json.dumps({"name": thread_name, "message": starter_message})
form = aiohttp.FormData()
form.add_field("payload_json", payload_json, content_type="application/json")
try:
for idx, media_path in enumerate(valid_media):
with open(media_path, "rb") as fh:
form.add_field(
f"files[{idx}]",
fh.read(),
filename=os.path.basename(media_path),
)
async with session.post(thread_url, headers=auth_headers, data=form, **_req_kw) as resp:
if resp.status not in {200, 201}:
body = await resp.text()
return {"error": f"Discord forum thread creation error ({resp.status}): {body}"}
data = await resp.json()
except Exception as e:
return {"error": _standalone_sanitize_error(f"Discord forum thread upload failed: {e}")}
else:
# No media — simple JSON POST creates the thread with
# just the text starter.
async with session.post(
thread_url,
headers=json_headers,
json={
"name": thread_name,
"message": {"content": message},
},
**_req_kw,
) as resp:
if resp.status not in {200, 201}:
body = await resp.text()
return {"error": f"Discord forum thread creation error ({resp.status}): {body}"}
data = await resp.json()
thread_id_created = data.get("id")
starter_msg_id = (data.get("message") or {}).get("id", thread_id_created)
result = {
"success": True,
"platform": "discord",
"chat_id": chat_id,
"thread_id": thread_id_created,
"message_id": starter_msg_id,
}
if warnings:
result["warnings"] = warnings
return result
url = f"https://discord.com/api/v10/channels/{chat_id}/messages"
async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=30), **_sess_kw) as session:
# Send text message (skip if empty and media is present)
if message.strip() or not media_files:
async with session.post(url, headers=json_headers, json={"content": message}, **_req_kw) as resp:
if resp.status not in {200, 201}:
body = await resp.text()
return {"error": f"Discord API error ({resp.status}): {body}"}
last_data = await resp.json()
# Send each media file as a separate multipart upload
for media_path, _is_voice in media_files:
if not os.path.exists(media_path):
warning = f"Media file not found, skipping: {media_path}"
logger.warning(warning)
warnings.append(warning)
continue
try:
form = aiohttp.FormData()
filename = os.path.basename(media_path)
with open(media_path, "rb") as f:
form.add_field("files[0]", f, filename=filename)
async with session.post(url, headers=auth_headers, data=form, **_req_kw) as resp:
if resp.status not in {200, 201}:
body = await resp.text()
warning = _standalone_sanitize_error(f"Failed to send media {media_path}: Discord API error ({resp.status}): {body}")
logger.error(warning)
warnings.append(warning)
continue
last_data = await resp.json()
except Exception as e:
warning = _standalone_sanitize_error(f"Failed to send media {media_path}: {e}")
logger.error(warning)
warnings.append(warning)
if last_data is None:
error = "No deliverable text or media remained after processing"
if warnings:
return {"error": error, "warnings": warnings}
return {"error": error}
result = {"success": True, "platform": "discord", "chat_id": chat_id, "message_id": last_data.get("id")}
if warnings:
result["warnings"] = warnings
return result
except Exception as e:
return {"error": _standalone_sanitize_error(f"Discord send failed: {e}")}
# ── Plugin entry point ────────────────────────────────────────────────────────
def _clean_discord_user_ids(raw: str) -> list:
"""Strip common Discord mention prefixes from a comma-separated ID string."""
cleaned = []
for uid in raw.replace(" ", "").split(","):
uid = uid.strip()
if uid.startswith("<@") and uid.endswith(">"):
uid = uid.lstrip("<@!").rstrip(">")
if uid.lower().startswith("user:"):
uid = uid[5:]
if uid:
cleaned.append(uid)
return cleaned
def interactive_setup() -> None:
"""Guide the user through Discord bot setup.
Mirrors Teams' ``interactive_setup`` shape: lazy-imports CLI helpers so
the plugin's import surface stays small, prompts for the bot token,
captures an allowlist, and offers to set a home channel.
"""
from hermes_cli.config import get_env_value, save_env_value
from hermes_cli.cli_output import (
prompt,
prompt_yes_no,
print_header,
print_info,
print_success,
)
print_header("Discord")
existing = get_env_value("DISCORD_BOT_TOKEN")
if existing:
print_info("Discord: already configured")
if not prompt_yes_no("Reconfigure Discord?", False):
if not get_env_value("DISCORD_ALLOWED_USERS"):
print_info("⚠️ Discord has no user allowlist - anyone can use your bot!")
if prompt_yes_no("Add allowed users now?", True):
print_info(" To find Discord ID: Enable Developer Mode, right-click name → Copy ID")
allowed_users = prompt("Allowed user IDs (comma-separated)")
if allowed_users:
cleaned_ids = _clean_discord_user_ids(allowed_users)
save_env_value("DISCORD_ALLOWED_USERS", ",".join(cleaned_ids))
print_success("Discord allowlist configured")
return
print_info("Create a bot at https://discord.com/developers/applications")
token = prompt("Discord bot token", password=True)
if not token:
return
save_env_value("DISCORD_BOT_TOKEN", token)
print_success("Discord token saved")
print()
print_info("🔒 Security: Restrict who can use your bot")
print_info(" To find your Discord user ID:")
print_info(" 1. Enable Developer Mode in Discord settings")
print_info(" 2. Right-click your name → Copy ID")
print()
print_info(" You can also use Discord usernames (resolved on gateway start).")
print()
allowed_users = prompt(
"Allowed user IDs or usernames (comma-separated, leave empty for open access)"
)
if allowed_users:
cleaned_ids = _clean_discord_user_ids(allowed_users)
save_env_value("DISCORD_ALLOWED_USERS", ",".join(cleaned_ids))
print_success("Discord allowlist configured")
else:
print_info("⚠️ No allowlist set - anyone in servers with your bot can use it!")
print()
print_info("📬 Home Channel: where Hermes delivers cron job results,")
print_info(" cross-platform messages, and notifications.")
print_info(" To get a channel ID: right-click a channel → Copy Channel ID")
print_info(" (requires Developer Mode in Discord settings)")
print_info(" You can also set this later by typing /set-home in a Discord channel.")
home_channel = prompt("Home channel ID (leave empty to set later with /set-home)")
if home_channel:
save_env_value("DISCORD_HOME_CHANNEL", home_channel)
def _apply_yaml_config(yaml_cfg: dict, discord_cfg: dict) -> dict | None:
"""Translate ``config.yaml`` ``discord:`` keys into env vars.
Implements the ``apply_yaml_config_fn`` contract (#24836). Mirrors the
legacy ``discord_cfg`` block that used to live in
``gateway/config.py::load_gateway_config()`` before this migration.
The DiscordAdapter reads its runtime configuration via ``os.getenv()``
throughout the connect / handle code paths (``DISCORD_REQUIRE_MENTION``,
``DISCORD_FREE_RESPONSE_CHANNELS``, ``DISCORD_AUTO_THREAD``,
``DISCORD_REACTIONS``, ``DISCORD_IGNORED_CHANNELS``,
``DISCORD_ALLOWED_CHANNELS``, ``DISCORD_NO_THREAD_CHANNELS``,
``DISCORD_HISTORY_BACKFILL``, ``DISCORD_HISTORY_BACKFILL_LIMIT``,
``DISCORD_ALLOW_MENTION_*``, ``DISCORD_REPLY_TO_MODE``,
``DISCORD_THREAD_REQUIRE_MENTION``). Rather than rewrite ~50 call sites
inside the adapter to read from ``PlatformConfig.extra`` instead, this
hook keeps the existing env-driven model and merely owns the
YAMLenv translation here, next to the adapter that consumes it.
Env vars take precedence over YAML every assignment is guarded by
``not os.getenv(...)`` so explicit env vars survive a config.yaml
update. Returns ``None`` because no extras are seeded into
``PlatformConfig.extra`` directly (everything flows through env).
"""
if "require_mention" in discord_cfg and not os.getenv("DISCORD_REQUIRE_MENTION"):
os.environ["DISCORD_REQUIRE_MENTION"] = str(discord_cfg["require_mention"]).lower()
if "thread_require_mention" in discord_cfg and not os.getenv("DISCORD_THREAD_REQUIRE_MENTION"):
os.environ["DISCORD_THREAD_REQUIRE_MENTION"] = str(discord_cfg["thread_require_mention"]).lower()
frc = discord_cfg.get("free_response_channels")
if frc is not None and not os.getenv("DISCORD_FREE_RESPONSE_CHANNELS"):
if isinstance(frc, list):
frc = ",".join(str(v) for v in frc)
os.environ["DISCORD_FREE_RESPONSE_CHANNELS"] = str(frc)
if "auto_thread" in discord_cfg and not os.getenv("DISCORD_AUTO_THREAD"):
os.environ["DISCORD_AUTO_THREAD"] = str(discord_cfg["auto_thread"]).lower()
if "reactions" in discord_cfg and not os.getenv("DISCORD_REACTIONS"):
os.environ["DISCORD_REACTIONS"] = str(discord_cfg["reactions"]).lower()
# ignored_channels: channels where bot never responds (even when mentioned)
ic = discord_cfg.get("ignored_channels")
if ic is not None and not os.getenv("DISCORD_IGNORED_CHANNELS"):
if isinstance(ic, list):
ic = ",".join(str(v) for v in ic)
os.environ["DISCORD_IGNORED_CHANNELS"] = str(ic)
# allowed_channels: if set, bot ONLY responds in these channels (whitelist)
ac = discord_cfg.get("allowed_channels")
if ac is not None and not os.getenv("DISCORD_ALLOWED_CHANNELS"):
if isinstance(ac, list):
ac = ",".join(str(v) for v in ac)
os.environ["DISCORD_ALLOWED_CHANNELS"] = str(ac)
# no_thread_channels: channels where bot responds directly without creating thread
ntc = discord_cfg.get("no_thread_channels")
if ntc is not None and not os.getenv("DISCORD_NO_THREAD_CHANNELS"):
if isinstance(ntc, list):
ntc = ",".join(str(v) for v in ntc)
os.environ["DISCORD_NO_THREAD_CHANNELS"] = str(ntc)
# history_backfill: recover missed channel messages for shared sessions
# when require_mention is active. Fetches messages between bot turns
# and prepends them to the user message for context.
if "history_backfill" in discord_cfg and not os.getenv("DISCORD_HISTORY_BACKFILL"):
os.environ["DISCORD_HISTORY_BACKFILL"] = str(discord_cfg["history_backfill"]).lower()
hbl = discord_cfg.get("history_backfill_limit")
if hbl is not None and not os.getenv("DISCORD_HISTORY_BACKFILL_LIMIT"):
os.environ["DISCORD_HISTORY_BACKFILL_LIMIT"] = str(hbl)
# allow_mentions: granular control over what the bot can ping.
# Safe defaults (no @everyone/roles) are applied in the adapter;
# these YAML keys only override when set and let users opt back
# into unsafe modes (e.g. roles=true) if they actually want it.
allow_mentions_cfg = discord_cfg.get("allow_mentions")
if isinstance(allow_mentions_cfg, dict):
for yaml_key, env_key in (
("everyone", "DISCORD_ALLOW_MENTION_EVERYONE"),
("roles", "DISCORD_ALLOW_MENTION_ROLES"),
("users", "DISCORD_ALLOW_MENTION_USERS"),
("replied_user", "DISCORD_ALLOW_MENTION_REPLIED_USER"),
):
if yaml_key in allow_mentions_cfg and not os.getenv(env_key):
os.environ[env_key] = str(allow_mentions_cfg[yaml_key]).lower()
# reply_to_mode: top-level preferred, falls back to extra.reply_to_mode.
# YAML 1.1 parses bare 'off' as boolean False — coerce to string "off".
_discord_extra = discord_cfg.get("extra") if isinstance(discord_cfg.get("extra"), dict) else {}
_discord_rtm = (
discord_cfg["reply_to_mode"] if "reply_to_mode" in discord_cfg
else _discord_extra.get("reply_to_mode")
)
if _discord_rtm is not None and not os.getenv("DISCORD_REPLY_TO_MODE"):
_rtm_str = "off" if _discord_rtm is False else str(_discord_rtm).lower()
os.environ["DISCORD_REPLY_TO_MODE"] = _rtm_str
return None # all settings flow through env; nothing to merge into extras
def _is_connected(config) -> bool:
"""Discord is considered connected when DISCORD_BOT_TOKEN is set.
Looks up via ``hermes_cli.gateway.get_env_value`` at call time (not via
the plugin's own bound import) so tests that patch ``gateway_mod.get_env_value``
including ``test_setup_openclaw_migration`` can suppress ambient
``DISCORD_BOT_TOKEN`` env vars. Matches what the legacy
``_PLATFORMS["discord"]`` dispatch did before this migration.
"""
import hermes_cli.gateway as gateway_mod
return bool((gateway_mod.get_env_value("DISCORD_BOT_TOKEN") or "").strip())
def _build_adapter(config):
"""Factory wrapper that constructs DiscordAdapter from a PlatformConfig."""
return DiscordAdapter(config)
def register(ctx) -> None:
"""Plugin entry point — called by the Hermes plugin system."""
ctx.register_platform(
name="discord",
label="Discord",
adapter_factory=_build_adapter,
check_fn=check_discord_requirements,
is_connected=_is_connected,
required_env=["DISCORD_BOT_TOKEN"],
install_hint="pip install 'hermes-agent[discord]'",
# Interactive setup wizard — replaces the central
# hermes_cli/setup.py::_setup_discord function. Same shape as Teams.
setup_fn=interactive_setup,
# YAML→env config bridge — owns the translation of ``config.yaml``
# ``discord:`` keys (require_mention, free_response_channels,
# auto_thread, reactions, ignored_channels, allowed_channels,
# no_thread_channels, allow_mentions.*, reply_to_mode,
# thread_require_mention) into ``DISCORD_*`` env vars that the
# adapter reads via ``os.getenv()``. Replaces the hardcoded block
# that used to live in ``gateway/config.py``. Hook contract: #24836.
apply_yaml_config_fn=_apply_yaml_config,
# Auth env vars for _is_user_authorized() integration
allowed_users_env="DISCORD_ALLOWED_USERS",
allow_all_env="DISCORD_ALLOW_ALL_USERS",
# Cron home-channel delivery
cron_deliver_env_var="DISCORD_HOME_CHANNEL",
# Out-of-process cron delivery via Discord REST API. Without this
# hook, ``deliver=discord`` cron jobs fail with "No live adapter"
# when cron runs separately from the gateway. Mirrors Teams pattern.
standalone_sender_fn=_standalone_send,
# Discord hard limit per message
max_message_length=2000,
# Display
emoji="🎮",
allow_update_command=True,
)
+34
View File
@@ -0,0 +1,34 @@
name: discord-platform
label: Discord
kind: platform
version: 1.0.0
description: >
Discord gateway adapter for Hermes Agent.
Connects to Discord via the discord.py library and relays messages
between Discord guilds/DMs and the Hermes agent. Supports voice mode,
slash commands, free-response channels, role-based DM auth, threads,
reactions, and channel skill bindings.
author: NousResearch
requires_env:
- name: DISCORD_BOT_TOKEN
description: "Discord bot token"
prompt: "Discord bot token"
url: "https://discord.com/developers/applications"
password: true
optional_env:
- name: DISCORD_ALLOWED_USERS
description: "Comma-separated Discord user IDs allowed to talk to the bot"
prompt: "Allowed users (comma-separated)"
password: false
- name: DISCORD_ALLOW_ALL_USERS
description: "Allow any Discord user to trigger the bot (dev only)"
prompt: "Allow all users? (true/false)"
password: false
- name: DISCORD_HOME_CHANNEL
description: "Default channel ID for cron / notification delivery"
prompt: "Home channel ID"
password: false
- name: DISCORD_HOME_CHANNEL_NAME
description: "Display name for the Discord home channel"
prompt: "Home channel display name"
password: false
+9 -5
View File
@@ -282,20 +282,24 @@ def _build_payload(
# ---------------------------------------------------------------------------
# fal_client lazy import (same pattern as image_generation_tool)
# fal_client lazy import (shared with image_generation_tool via fal_common)
# ---------------------------------------------------------------------------
_fal_client: Any = None
def _load_fal_client() -> Any:
"""Lazy-load the ``fal_client`` SDK and cache it on this module.
Delegates the actual import to :func:`tools.fal_common.import_fal_client`
so the ``lazy_deps`` ensure-install handling stays in one place.
"""
global _fal_client
if _fal_client is not None:
return _fal_client
import fal_client # type: ignore
_fal_client = fal_client
return fal_client
from tools.fal_common import import_fal_client
_fal_client = import_fal_client()
return _fal_client
# ---------------------------------------------------------------------------
+7 -11
View File
@@ -84,7 +84,7 @@ modal = ["modal==1.3.4"]
daytona = ["daytona==0.155.0"]
vercel = ["vercel==0.5.7"]
hindsight = ["hindsight-client==0.6.1"]
dev = ["debugpy==1.8.20", "pytest==9.0.2", "pytest-asyncio==1.3.0", "pytest-xdist==3.8.0", "pytest-split==0.11.0", "pytest-timeout==2.4.0", "mcp==1.26.0", "ty==0.0.21", "ruff==0.15.10"]
dev = ["debugpy==1.8.20", "pytest==9.0.2", "pytest-asyncio==1.3.0", "pytest-timeout==2.4.0", "mcp==1.26.0", "ty==0.0.21", "ruff==0.15.10"]
messaging = ["python-telegram-bot[webhooks]==22.6", "discord.py[voice]==2.7.1", "aiohttp==3.13.3", "brotlicffi==1.2.0.1", "slack-bolt==1.27.0", "slack-sdk==3.40.1", "qrcode==7.4.2"]
cron = [] # croniter is now a core dependency; this extra kept for back-compat
slack = ["slack-bolt==1.27.0", "slack-sdk==3.40.1", "aiohttp==3.13.3"]
@@ -232,16 +232,12 @@ markers = [
"integration: marks tests requiring external services (API keys, Modal, etc.)",
"real_concurrent_gate: opt out of the autouse stub that disables _detect_concurrent_hermes_instances",
]
# pytest-timeout: per-test 60s hard cap with thread method.
# Discovered May 2026: the suite reliably hangs at ~96% on full runs even
# though every individual test completes in <30s. Root cause is leaked
# threads / atexit handlers accumulating across thousands of tests until
# something deadlocks at session teardown. Adding pytest-timeout (with
# thread method, which forces an interrupt into the test thread) breaks
# the deadlock — the suite then completes cleanly. The 60s cap is large
# enough that no legitimate test trips it; if a test exceeds it that's a
# real bug worth surfacing as a Timeout failure.
addopts = "-m 'not integration' -n auto --timeout=30 --timeout-method=signal"
# pytest-timeout: per-test 30s hard cap with signal method.
# This is the fallback inside each per-file pytest subprocess (see
# scripts/run_tests_parallel.py). Per-file isolation gives every test
# file a fresh Python interpreter; pytest-timeout catches Python-level
# hangs within a file.
addopts = "-m 'not integration' --timeout=30 --timeout-method=signal"
[tool.ty.environment]
python-version = "3.13"
+157 -1
View File
@@ -1368,6 +1368,18 @@ class AIAgent:
* xAI OAuth: "do not have an active Grok subscription" /
"out of available resources" / "does not have permission" + "grok"
Disambiguator for xAI (#29344): the same ``code`` text ("The caller
does not have permission to execute the specified operation") is
returned for BOTH an unsubscribed account AND a stale OAuth access
token. xAI ships an explicit signal in the ``error`` field that
tells the two apart: a ``[WKE=unauthenticated:...]`` suffix (and/or
the ``OAuth2 access token could not be validated`` phrasing) means
the credentials failed validation that's recoverable by refreshing
the token, NOT by surfacing an entitlement message. When either
signal is present we return False eagerly so the credential-pool
refresh path runs, letting long-running TUI sessions recover from
stale tokens without an exit/reopen cycle.
Extend here for new providers as we discover them (Anthropic's
Claude Max OAuth entitlement errors look distinct enough today that
the existing 1M-context-beta branch handles them; revisit if other
@@ -1377,11 +1389,29 @@ class AIAgent:
return False
if not isinstance(error_context, dict):
return False
# Build a single lowercase haystack covering every field shape the
# body might land in. ``_extract_api_error_context`` normalises to
# ``message``/``reason``, but callers (and the test suite) may also
# hand us the raw body with ``code``/``error`` keys; cover both so
# the WKE disambiguator below fires regardless of entry point.
message = str(error_context.get("message") or "").lower()
reason = str(error_context.get("reason") or "").lower()
haystack = f"{message} {reason}"
code = str(error_context.get("code") or "").lower()
err = str(error_context.get("error") or "").lower()
haystack = f"{message} {reason} {code} {err}"
if not haystack.strip():
return False
# xAI's authoritative disambiguator for "stale token" vs
# "unsubscribed account". Both conditions share the same
# permission-denied ``code`` text; only one carries this suffix.
# Bail out before the entitlement keyword checks so a stale OAuth
# token routes through the credential-refresh path instead of the
# surface-error-as-entitlement path. See #29344 for the long-
# running TUI failure mode this closes.
if "[wke=unauthenticated:" in haystack:
return False
if "oauth2 access token could not be validated" in haystack:
return False
if "do not have an active grok subscription" in haystack:
return True
if "out of available resources" in haystack and "grok" in haystack:
@@ -2563,6 +2593,39 @@ class AIAgent:
def _close_request_openai_client(self, client: Any, *, reason: str) -> None:
self._close_openai_client(client, reason=reason, shared=False)
def _abort_request_openai_client(self, client: Any, *, reason: str) -> None:
"""Cross-thread abort: shut sockets down without releasing FDs.
Companion to :meth:`_close_request_openai_client` for stranger-thread
callers (interrupt-check loop, stale-call detector). Calling
``client.close()`` from a thread that does not own the active httpx
connection raced the still-live SSL BIO and corrupted unrelated file
descriptors when the kernel recycled the just-freed TCP FD (#29507).
Here we only ``shutdown(SHUT_RDWR)`` the pool's sockets. That unblocks
the owning worker thread's pending ``recv``/``send`` with an EOF or
``EPIPE`` so it can unwind and close ``client`` from its own context
which is where the FD release belongs.
"""
if client is None:
return
try:
shutdown_count = self._force_close_tcp_sockets(client)
logger.info(
"OpenAI client aborted (%s, shared=False, tcp_force_closed=%d, "
"deferred_close=stranger_thread) %s",
reason,
shutdown_count,
self._client_log_context(),
)
except Exception as exc:
logger.debug(
"OpenAI client abort failed (%s, shared=False) %s error=%s",
reason,
self._client_log_context(),
exc,
)
def _run_codex_stream(self, api_kwargs: dict, client: Any = None, on_first_delta: callable = None):
"""Forwarder — see ``agent.codex_runtime.run_codex_stream``."""
from agent.codex_runtime import run_codex_stream
@@ -3357,6 +3420,25 @@ class AIAgent:
return content
if self._model_supports_vision():
# Vision-capable on paper — but if we've already learned in this
# session that the active (provider, model) rejects list-type
# tool content (e.g. Xiaomi MiMo's 400 "text is not set"),
# short-circuit to a text summary so we don't burn another
# round-trip relearning the same lesson. Cache populated by
# the 400 recovery path in agent.conversation_loop. Transient
# per-session; next session retries.
key = (
(getattr(self, "provider", "") or "").strip().lower(),
(getattr(self, "model", "") or "").strip(),
)
no_list = getattr(self, "_no_list_tool_content_models", None)
if no_list and key in no_list:
logger.debug(
"Tool %s: model %s/%s known to reject list-type tool "
"content this session — sending text summary",
tool_name, key[0], key[1],
)
return _multimodal_text_summary(result)
return content
summary = _multimodal_text_summary(result)
@@ -3385,6 +3467,80 @@ class AIAgent:
from agent.conversation_compression import try_shrink_image_parts_in_messages
return try_shrink_image_parts_in_messages(api_messages)
def _try_strip_image_parts_from_tool_messages(self, api_messages: list) -> bool:
"""Downgrade list-type tool messages to text summaries in-place.
Recovery path for providers that reject list-type tool message content
(e.g. Xiaomi MiMo's 400 "text is not set"; see issue #27344). Walks
``api_messages`` for any ``role: "tool"`` message whose ``content`` is
a list containing image parts, replaces the content with the existing
text part(s) (or a minimal placeholder if none survive), and records
the active (provider, model) in ``self._no_list_tool_content_models``
so subsequent ``_tool_result_content_for_active_model`` calls in this
session preemptively downgrade screenshots without a round-trip.
Returns True when at least one tool message was downgraded the
caller (the 400 recovery branch in ``agent.conversation_loop``) uses
this to decide whether to retry the API call with the modified
history or surface the original error.
"""
if not isinstance(api_messages, list):
return False
# Record (provider, model) so we don't relearn this lesson.
key = (
(getattr(self, "provider", "") or "").strip().lower(),
(getattr(self, "model", "") or "").strip(),
)
if not hasattr(self, "_no_list_tool_content_models"):
self._no_list_tool_content_models = set()
if key[1]: # only record when we actually have a model id
self._no_list_tool_content_models.add(key)
changed = False
for msg in api_messages:
if not isinstance(msg, dict) or msg.get("role") != "tool":
continue
content = msg.get("content")
if not isinstance(content, list):
continue
# Salvage any text parts so the model still sees some signal.
text_parts: List[str] = []
had_image = False
for part in content:
if not isinstance(part, dict):
if isinstance(part, str) and part.strip():
text_parts.append(part.strip())
continue
ptype = part.get("type")
if ptype == "image_url" or ptype == "input_image":
had_image = True
continue
if ptype in {"text", "input_text"}:
text = str(part.get("text") or "").strip()
if text:
text_parts.append(text)
if not had_image:
# List-type content but no image parts — leave alone (some
# providers reject ANY list content, but stripping a
# text-only list doesn't reduce ambiguity; let the caller
# surface the original error if this turns out to be the
# case).
continue
if text_parts:
msg["content"] = "\n\n".join(text_parts)
else:
msg["content"] = (
"[image content removed — provider does not accept "
"list-type tool message content]"
)
changed = True
return changed
def _anthropic_preserve_dots(self) -> bool:
"""True when using an anthropic-compatible endpoint that preserves dots in model names.
Alibaba/DashScope keeps dots (e.g. qwen3.5-plus).
+12 -1
View File
@@ -47,7 +47,9 @@ ACP_REGISTRY_MANIFEST = REPO_ROOT / "acp_registry" / "agent.json"
AUTHOR_MAP = {
# teknium (multiple emails)
"teknium1@gmail.com": "teknium1",
"cipherframe@users.noreply.github.com": "CipherFrame",
"me@promplate.dev": "CNSeniorious000",
"yichengqiao21@gmail.com": "YarrowQiao",
"erhanyasarx@gmail.com": "erhnysr",
"30366221+WorldWriter@users.noreply.github.com": "WorldWriter",
"dafeng@DafengdeMacBook-Pro.local": "WorldWriter",
@@ -58,13 +60,18 @@ AUTHOR_MAP = {
"mgongzai@gmail.com": "vKongv",
"0x.badfriend@gmail.com": "discodirector",
"altriatree@gmail.com": "TruaShamu",
"contact-me@stark-x.cn": "Stark-X",
"nat@nthrow.io": "nthrow",
"m@mobrienv.dev": "mikeyobrien",
"saeed919@pm.me": "falasi",
"chrisdlc119@outlook.com": "chdlc",
"omar@techdeveloper.site": "nycomar",
"qiyin.zuo@pcitc.com": "qiyin-code",
"mr.aashiz@gmail.com": "aashizpoudel",
"70629228+shaun0927@users.noreply.github.com": "shaun0927",
"98262967+Bihruze@users.noreply.github.com": "Bihruze",
"189280367+Lempkey@users.noreply.github.com": "Lempkey",
"leovillalbajr@gmail.com": "Lempkey",
"nidhi2894@gmail.com": "nidhi-singh02",
"30312689+aashizpoudel@users.noreply.github.com": "aashizpoudel",
"oleksii.lisikh@gmail.com": "olisikh",
@@ -639,7 +646,7 @@ AUTHOR_MAP = {
"beibei1988@proton.me": "beibi9966",
# ── bulk addition: 75 emails resolved via API, PR salvage bodies, noreply
# crossref, and GH contributor list matching (April 2026 audit) ──
"1115117931@qq.com": "aaronagent",
"1115117931@qq.com": "aaronlab",
"1506751656@qq.com": "hqhq1025",
"364939526@qq.com": "luyao618",
"hgk324@gmail.com": "houziershi",
@@ -801,6 +808,7 @@ AUTHOR_MAP = {
"xiayh17@gmail.com": "xiayh0107",
"zhujianxyz@gmail.com": "opriz",
"tuancanhnguyen706@gmail.com": "xxxigm",
"larcombe.n@gmail.com": "NickLarcombe",
"54813621+xxxigm@users.noreply.github.com": "xxxigm",
"asurla@nvidia.com": "anniesurla",
"kchantharuan@nvidia.com": "nv-kasikritc",
@@ -928,6 +936,8 @@ AUTHOR_MAP = {
"holynn@placeholder.local": "holynn-q",
"agent@hermes.local": "jacdevos",
"sunsky.lau@gmail.com": "liuhao1024",
"fabianoeq@gmail.com": "rodrigoeqnit",
"178342791+sgtworkman@users.noreply.github.com": "sgtworkman",
"qiuqfang98@qq.com": "keepcalmqqf",
"261867348+ai-ag2026@users.noreply.github.com": "ai-ag2026",
"yanzh.su@gmail.com": "YanzhongSu",
@@ -1261,6 +1271,7 @@ AUTHOR_MAP = {
"120500656+oooindefatigable@users.noreply.github.com": "ooovenenoso",
"vanthinh6886@gmail.com": "vanthinh6886", # PR #28018 salvage (yaml/flock/atomic write guards)
"erik.engervall@gmail.com": "erikengervall", # PR #28774 (firecrawl integration tag)
"egilewski@egilewski.com": "egilewski", # PR #30432 (MEDIA path traversal fix, GHSA-jmf9-9729-7pp8)
}
+40 -96
View File
@@ -3,29 +3,36 @@
# `pytest` directly to guarantee your local run matches CI behavior.
#
# What this script enforces:
# * -n 4 xdist workers (CI has 4 cores; -n auto diverges locally)
# * Per-file isolation via scripts/run_tests_parallel.py — each test
# file runs in its own freshly-spawned `python -m pytest <file>`
# subprocess. No xdist, no shared workers, no module-level leakage
# between files.
# * TZ=UTC, LANG=C.UTF-8, PYTHONHASHSEED=0 (deterministic)
# * Credential env vars blanked (conftest.py also does this, but this
# is belt-and-suspenders for anyone running `pytest` outside of
# our conftest path — e.g. calling pytest on a single file)
# * Proper venv activation
# * Env vars blanked (conftest.py also does this, but this
# is belt-and-suspenders for anyone running pytest outside our
# conftest path — e.g. on a single file)
# * Proper venv activation (probes .venv, venv, then ~/.hermes/...)
#
# Usage:
# scripts/run_tests.sh # full suite
# scripts/run_tests.sh tests/agent/ # one directory
# scripts/run_tests.sh tests/agent/test_foo.py::TestClass::test_method
# scripts/run_tests.sh --tb=long -v # pass-through pytest args
# scripts/run_tests.sh # full suite
# scripts/run_tests.sh -j 4 # cap parallelism
# scripts/run_tests.sh tests/agent/ # discover only here
# scripts/run_tests.sh tests/agent/ tests/acp/ # multiple roots
# scripts/run_tests.sh tests/foo.py # single file
# scripts/run_tests.sh tests/foo.py -- --tb=long # path + pytest args
# scripts/run_tests.sh -- -v --tb=long # pytest args only
#
# Everything after a literal '--' is passed through to each per-file
# pytest invocation. Positional path arguments before '--' override
# the default discovery root (tests/).
set -euo pipefail
# ── Locate repo root ────────────────────────────────────────────────────────
# Works whether this is the main checkout or a worktree.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
# ── Activate venv ───────────────────────────────────────────────────────────
# Prefer a .venv in the current tree, fall back to the main checkout's venv
# (useful for worktrees where we don't always duplicate the venv).
VENV=""
for candidate in "$REPO_ROOT/.venv" "$REPO_ROOT/venv" "$HOME/.hermes/hermes-agent/venv"; do
if [ -f "$candidate/bin/activate" ]; then
@@ -41,94 +48,31 @@ fi
PYTHON="$VENV/bin/python"
# ── Ensure pytest-split is installed (required for shard-equivalent runs) ──
if ! "$PYTHON" -c "import pytest_split" 2>/dev/null; then
echo "→ installing pytest-split into $VENV"
if command -v uv >/dev/null 2>&1; then
uv pip install --python "$PYTHON" --quiet "pytest-split>=0.9,<1"
elif "$PYTHON" -m pip --version >/dev/null 2>&1; then
"$PYTHON" -m pip install --quiet "pytest-split>=0.9,<1"
else
echo "error: neither uv nor pip is available in $VENV — pytest-split is missing" >&2
echo " fix: run uv pip install -e \".[dev]\" from $REPO_ROOT" >&2
exit 1
fi
fi
# ── Hermetic environment ────────────────────────────────────────────────────
# Mirror what CI does in .github/workflows/tests.yml + what conftest.py does.
# Unset every credential-shaped var currently in the environment.
while IFS='=' read -r name _; do
case "$name" in
*_API_KEY|*_TOKEN|*_SECRET|*_PASSWORD|*_CREDENTIALS|*_ACCESS_KEY| \
*_SECRET_ACCESS_KEY|*_PRIVATE_KEY|*_OAUTH_TOKEN|*_WEBHOOK_SECRET| \
*_ENCRYPT_KEY|*_APP_SECRET|*_CLIENT_SECRET|*_CORP_SECRET|*_AES_KEY| \
AWS_ACCESS_KEY_ID|AWS_SECRET_ACCESS_KEY|AWS_SESSION_TOKEN|FAL_KEY| \
GH_TOKEN|GITHUB_TOKEN)
unset "$name"
;;
esac
done < <(env)
# Unset HERMES_* behavioral vars too.
unset HERMES_YOLO_MODE HERMES_INTERACTIVE HERMES_QUIET HERMES_TOOL_PROGRESS \
HERMES_TOOL_PROGRESS_MODE HERMES_MAX_ITERATIONS HERMES_SESSION_PLATFORM \
HERMES_SESSION_CHAT_ID HERMES_SESSION_CHAT_NAME HERMES_SESSION_THREAD_ID \
HERMES_SESSION_SOURCE HERMES_SESSION_KEY HERMES_GATEWAY_SESSION \
HERMES_CRON_SESSION \
HERMES_PLATFORM HERMES_INFERENCE_PROVIDER HERMES_MANAGED HERMES_DEV \
HERMES_CONTAINER HERMES_EPHEMERAL_SYSTEM_PROMPT HERMES_TIMEZONE \
HERMES_REDACT_SECRETS HERMES_BACKGROUND_NOTIFICATIONS HERMES_EXEC_ASK \
HERMES_HOME_MODE 2>/dev/null || true
# Pin deterministic runtime.
export TZ=UTC
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
export PYTHONHASHSEED=0
# ── Live-gateway test guard (developer machines) ────────────────────────────
# If a system-wide hermes pytest_live_guard plugin is installed at
# $HOME/.hermes/pytest_live_guard.py, force-load it here so every test run
# from this script gets the protection regardless of which worktree is
# checked out (in-tree tests/conftest.py guard may be missing on stale
# branches). Harmless on CI / fresh machines that don't have the file.
# ── Live-gateway plugin (computed before we drop env) ───────────────────────
EXTRA_PYTHONPATH=""
EXTRA_PYTEST_PLUGINS=""
if [ -f "$HOME/.hermes/pytest_live_guard.py" ]; then
case ":${PYTHONPATH:-}:" in
*":$HOME/.hermes:"*) ;;
*) export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$HOME/.hermes" ;;
esac
if [[ ",${PYTEST_PLUGINS:-}," != *,pytest_live_guard,* ]]; then
export PYTEST_PLUGINS="${PYTEST_PLUGINS:+$PYTEST_PLUGINS,}pytest_live_guard"
fi
EXTRA_PYTHONPATH="$HOME/.hermes"
EXTRA_PYTEST_PLUGINS="pytest_live_guard"
fi
# ── Worker count ────────────────────────────────────────────────────────────
# CI uses `-n auto` on ubuntu-latest which gives 4 workers. A 20-core
# workstation with `-n auto` gets 20 workers and exposes test-ordering
# flakes that CI will never see. Pin to 4 so local matches CI.
WORKERS="${HERMES_TEST_WORKERS:-4}"
# ── Run pytest ──────────────────────────────────────────────────────────────
# ── Run in hermetic env ──────────────────────────────────────────────────────
# env -i: start with empty environment, opt-in only what we need.
# No credential var can leak — you'd have to explicitly add it here.
echo "▶ running per-file parallel test suite via run_tests_parallel.py"
echo " (TZ=UTC LANG=C.UTF-8 PYTHONHASHSEED=0; clean env)"
cd "$REPO_ROOT"
# If the first argument starts with `-` treat all args as pytest flags;
# otherwise treat them as test paths.
ARGS=("$@")
echo "▶ running pytest with $WORKERS workers, hermetic env, in $REPO_ROOT"
echo " (TZ=UTC LANG=C.UTF-8 PYTHONHASHSEED=0; all credential env vars unset)"
# -o "addopts=" clears pyproject.toml's `-n auto` so our -n wins.
# We re-add --timeout/--timeout-method here because pyproject.toml's
# addopts is wiped above. The 60s cap is essential: see pyproject.toml
# for why (suite deadlocks at session teardown without it).
exec "$PYTHON" -m pytest \
-o "addopts=" \
-n "$WORKERS" \
--timeout=30 \
--timeout-method=signal \
--ignore=tests/integration \
--ignore=tests/e2e \
-m "not integration" \
"${ARGS[@]}"
exec env -i \
PATH="$PATH" \
HOME="$HOME" \
TZ=UTC \
LANG=C.UTF-8 \
LC_ALL=C.UTF-8 \
PYTHONHASHSEED=0 \
${EXTRA_PYTHONPATH:+PYTHONPATH="$EXTRA_PYTHONPATH"} \
${EXTRA_PYTEST_PLUGINS:+PYTEST_PLUGINS="$EXTRA_PYTEST_PLUGINS"} \
"$PYTHON" "$SCRIPT_DIR/run_tests_parallel.py" "$@"
+841
View File
@@ -0,0 +1,841 @@
#!/usr/bin/env python3
"""Per-file parallel test runner.
The minimum-viable replacement for pytest-xdist + a subprocess-isolation
plugin. Discovers test files under ``tests/`` (excluding integration/e2e
unless explicitly requested), then runs one ``python -m pytest <file>``
subprocess per file, with bounded parallelism (default: ``os.cpu_count()``).
Why per-file rather than per-test?
Per-test spawn overhead (~250ms × 17k tests = 70min CPU minimum)
swamped the actual work. Per-file spawn (~250ms × ~850 files = ~3.5min)
fits in the budget while still giving every file a fresh Python
interpreter the only isolation boundary that actually matters
(cross-file module-level state leakage was the original flake source;
intra-file state is the test author's responsibility).
Why drop xdist entirely?
xdist's persistent workers accumulate state across files, which is
exactly the leakage we wanted to fix. xdist also adds complexity
(loadfile vs loadscope, --max-worker-restart, internal control plane)
that we don't need when the unit of work is "run pytest on one file".
A subprocess.Popen pool gated by a semaphore is ~60 lines and does
the job.
Usage:
python scripts/run_tests_parallel.py [pytest_args...]
Common pytest args pass through (e.g. ``-v``, ``-x``, ``--tb=long``,
``-k 'pattern'``, ``--lf``).
Environment:
HERMES_TEST_WORKERS Override worker count (default: os.cpu_count())
HERMES_TEST_PATHS Override discovery roots (colon-sep, default: 'tests')
Exit code: 0 if every file's pytest exited 0; 1 otherwise.
"""
from __future__ import annotations
import argparse
import json
import os
import subprocess
import sys
import threading
import time
from concurrent.futures import ThreadPoolExecutor, Future
from pathlib import Path
from typing import Dict, List, Tuple
# Default test discovery roots.
_DEFAULT_ROOTS = ["tests"]
# Directories to skip during discovery — the e2e + integration suites
# require real services and are run separately. Match exactly the
# ``--ignore=`` flags the previous CI command used.
_SKIP_PARTS = {"integration", "e2e"}
# Per-file wall-clock cap. Generous default — pytest-timeout still
# enforces per-test caps inside each subprocess; this is just an outer
# safety net so a single hung file can't stall the whole suite. Override
# via --file-timeout or HERMES_TEST_FILE_TIMEOUT.
_DEFAULT_FILE_TIMEOUT_SECONDS = 600.0 # 10 minutes
# Duration cache: maps relative file paths to last-observed subprocess
# wall-clock seconds. Used by ``--slice`` to distribute files across
# CI jobs by estimated total time, so no one job gets all the slow files.
_DURATIONS_FILE = "test_durations.json"
def _count_tests(
files: List[Path], repo_root: Path, pytest_passthrough: List[str]
) -> dict[Path, int]:
"""Run ``pytest --co -q`` once to count individual tests per file.
Returns a mapping ``{file_path: test_count}``. Files with zero
collected tests are omitted from the dict (not an error e.g. the
file only defines fixtures / conftest helpers).
This is a single subprocess call (~2-5s for ~1k files) that gives
us the total test count for the discovery announcement and
per-file counts for the progress lines.
``--ignore`` flags for directories in ``_SKIP_PARTS`` are added
automatically so that pytest's own collection machinery (conftest
walking, directory traversal) doesn't pull in tests we intend to
skip matching what the per-file runs will actually execute.
"""
# Build --ignore flags for skipped dirs so the --co collection
# mirrors what we'll actually run (not what pytest might find via
# conftest walking or directory traversal).
ignore_args: List[str] = []
for root in [repo_root / p for p in _DEFAULT_ROOTS]:
for part in _SKIP_PARTS:
d = root / part
if d.is_dir():
ignore_args.extend(["--ignore", str(d)])
cmd = [
sys.executable, "-m", "pytest",
"--co", "-q",
*ignore_args,
*[str(f) for f in files],
*pytest_passthrough,
]
try:
result = subprocess.run(
cmd,
cwd=repo_root,
capture_output=True,
text=True,
timeout=120,
)
except (subprocess.TimeoutExpired, OSError):
return {}
counts: dict[Path, int] = {}
for line in result.stdout.splitlines():
# Lines look like: tests/acp/test_auth.py::TestClass::test_name
if "::" not in line:
continue
file_part = line.split("::", 1)[0]
key = repo_root / file_part
counts[key] = counts.get(key, 0) + 1
return counts
def _discover_files(roots: List[Path]) -> List[Path]:
"""Return every ``test_*.py`` under the given roots (sorted).
Roots may be directories (recursed for ``test_*.py``) or explicit
``.py`` files (included as-is, even if they don't match the
``test_*`` prefix caller knows what they want).
Exclude any file whose path contains a component in ``_SKIP_PARTS``,
UNLESS the user explicitly named it as a root (in which case the
user's intent overrides the skip filter).
"""
seen: set[Path] = set()
out: List[Path] = []
for root in roots:
if not root.exists():
continue
if root.is_file():
# Explicit file: include it as-is, skip the _SKIP_PARTS filter
# since the user named it directly.
real = root.resolve()
if real not in seen:
seen.add(real)
out.append(root)
continue
for path in root.rglob("test_*.py"):
if any(part in _SKIP_PARTS for part in path.parts):
continue
real = path.resolve()
if real in seen:
continue
seen.add(real)
out.append(path)
return sorted(out)
def _kill_tree(proc: "subprocess.Popen", pgid: int | None = None) -> None:
"""Kill the pytest subprocess and every descendant it spawned.
A test run can spin up uvicorn servers, async runtimes, or other
long-running grandchildren that survive the pytest subprocess exit
if we don't kill the whole tree. ``subprocess.Popen.kill()`` only
targets the immediate child; grandchildren reparent to PID 1
(Linux) / get adopted by services.exe (Windows) and leak.
POSIX: the caller must pass ``pgid`` the process group id captured
immediately after Popen (via ``os.getpgid(proc.pid)``). We can't
look it up here in the happy path because by the time we get
called the leader process has already been reaped and its pid is
gone from the kernel's process table, even though descendants in
the group are still alive. SIGKILL'ing the captured pgid takes out
everything in that group atomically.
Windows: ``taskkill /F /T /PID`` walks the recorded ppid chain and
terminates the whole tree, even when the root has already exited.
Why not psutil: psutil walks the parent-child tree, but in the
happy path the root has already been reaped so ``psutil.Process(pid)``
can't find it; grandchildren reparented to PID 1 are also
unreachable by tree walk at that point. The platform-native
primitives (process groups / taskkill) handle both cases correctly
without an extra abstraction layer.
"""
if proc.pid is None:
return
if sys.platform == "win32":
try:
subprocess.run(
["taskkill", "/F", "/T", "/PID", str(proc.pid)],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
timeout=10,
) # windows-footgun: ok
except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
pass
else:
# POSIX: kill the captured pgid. Local-import signal so the
# SIGKILL attribute is never referenced on Windows.
if pgid is not None:
try:
import signal as _signal
os.killpg(pgid, _signal.SIGKILL) # windows-footgun: ok
except (ProcessLookupError, PermissionError, OSError):
pass
# Belt-and-suspenders: ensure subprocess.communicate() sees the exit.
try:
proc.kill()
except (ProcessLookupError, OSError):
pass
def _run_one_file(
file: Path,
pytest_args: List[str],
repo_root: Path,
file_timeout: float,
) -> Tuple[Path, int, str, dict[str, int], float]:
"""Run ``python -m pytest <file> <pytest_args>`` in a fresh subprocess.
Returns (file, returncode, captured_combined_output, summary_counts, subprocess_wall_seconds).
``summary_counts`` is the result of ``_parse_pytest_summary(output)``
pytest exit codes (https://docs.pytest.org/en/stable/reference/exit-codes.html):
0 = all tests passed
1 = some tests failed
2 = test execution interrupted
3 = internal error
4 = pytest CLI usage error
5 = no tests collected
We treat exit 5 as a pass: it just means every test in the file was
skipped or filtered by a marker (e.g. ``-m 'not integration'`` skips
files where every test is marked integration). That's intentional and
not a failure mode.
On per-file timeout (``file_timeout`` seconds) or any other exception
during ``communicate()``, we kill the whole process group / process
tree so grandchildren (uvicorn servers, async runtimes, etc.) do not
orphan onto PID 1. The pytest-timeout plugin enforces per-test
timeouts inside the subprocess; this outer timeout exists only to
bound a pathologically slow or hung file as a whole.
"""
cmd = [sys.executable, "-m", "pytest", str(file), *pytest_args]
subproc_start = time.monotonic()
proc = subprocess.Popen(
cmd,
cwd=repo_root,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
text=True,
# POSIX: place the child at the head of its own process group so
# _kill_tree can SIGKILL the group atomically.
# Windows: this maps to CREATE_NEW_PROCESS_GROUP in CPython 3.12+;
# _kill_tree handles the Windows path via taskkill /F /T.
start_new_session=True,
)
# Capture the pgid NOW, before the leader can exit and be reaped.
# Once the leader is reaped, os.getpgid(proc.pid) raises
# ProcessLookupError even though grandchildren in that group are
# still alive — defeating the whole cleanup. None on Windows where
# the pgid concept doesn't apply (taskkill walks ppid chain instead).
pgid: int | None = None
if sys.platform != "win32":
try:
pgid = os.getpgid(proc.pid)
except (ProcessLookupError, PermissionError):
# Astonishingly fast child? Already dead. _kill_tree's
# fallback will handle this case as a no-op.
pgid = None
try:
output, _ = proc.communicate(timeout=file_timeout)
rc = proc.returncode
except subprocess.TimeoutExpired:
_kill_tree(proc, pgid=pgid)
# Drain whatever the child wrote before we killed it so we have
# something to surface in the failure dump.
try:
output, _ = proc.communicate(timeout=10)
except subprocess.TimeoutExpired:
output = "(file timeout exceeded; output unavailable)"
rc = 124 # de facto convention for "killed by timeout".
output = (
f"(per-file timeout: {file_timeout:.0f}s exceeded; "
f"process tree SIGKILL'd)\n{output}"
)
except BaseException:
# KeyboardInterrupt / runner crash — make sure no zombie
# grandchildren outlive us.
_kill_tree(proc, pgid=pgid)
raise
else:
# Happy path: pytest exited on its own. The child process already
# cleaned up its grandchildren if it's well-behaved, but
# well-behaved is not universal — kill the group anyway. Already-
# dead processes are a no-op.
_kill_tree(proc, pgid=pgid)
if rc == 5:
# No tests collected — every test in the file was filtered out.
# Treat as a pass; surface info in a slightly distinct status
# so the operator can spot it.
rc = 0
summary = _parse_pytest_summary(output)
subproc_wall = time.monotonic() - subproc_start
return file, rc, output, summary, subproc_wall
def _parse_pytest_summary(output: str) -> dict[str, int]:
"""Extract per-file test pass/fail/skip counts from pytest output.
pytest prints a summary line like ``12 passed, 3 skipped, 1 failed in 2.1s``
as the last non-empty line before the short test summary. We scrape that
line for the individual counts so the progress display can show test-level
granularity instead of just file-level pass/fail.
Returns a dict with keys ``passed``, ``failed``, ``skipped``, ``errors``,
``xfailed``, ``xpassed`` (only keys found in the output are present).
"""
import re
result: dict[str, int] = {}
# Walk backwards from the end — the summary line is always near the tail.
for line in reversed(output.splitlines()):
line = line.strip()
if not line:
continue
# Match "N passed", "N failed", "N skipped", "N errors", "N xfailed", "N xpassed"
for m in re.finditer(r"(\d+)\s+(passed|failed|skipped|errors|xfailed|xpassed)", line):
result[m.group(2)] = int(m.group(1))
# Also match "N error" (singular — pytest uses this sometimes).
for m in re.finditer(r"(\d+)\s+error\b", line):
result.setdefault("errors", result.get("errors", 0) + int(m.group(1)))
if result:
# Found the counts line — done.
break
# Stop at the short test summary header (if any) — everything above
# that is individual failure details, not the counts line.
if line.startswith("FAILED") or line.startswith("SHORT TEST SUMMARY"):
break
return result
def _format_file(file: Path, repo_root: Path) -> str:
"""Render a test-file path for display: strip the repo-root prefix
when possible so output reads ``tests/acp/test_auth.py`` instead of
``/home/runner/work/hermes-agent/hermes-agent/tests/acp/test_auth.py``.
Falls back to the absolute path for anything outside the repo root.
"""
try:
return str(file.resolve().relative_to(repo_root.resolve()))
except ValueError:
return str(file)
def _print_progress(
tests_done: int,
total_tests: int,
file: Path,
rc: int,
dur: float,
repo_root: Path,
tests_passed: int,
tests_failed: int,
test_counts: dict[Path, int],
file_summary: dict[str, int] | None = None,
subproc_wall: float | None = None,
) -> None:
"""Single-line live progress.
When ``file_summary`` is provided (parsed from pytest output), the
per-file parenthetical shows individual test pass/fail counts instead
of just the total test count.
``subproc_wall`` is the actual subprocess wall-clock time (excluding
queue-wait). When available, the display shows both the subprocess
time and the queue-inclusive elapsed time.
"""
status = "" if rc == 0 else ""
pct = (tests_done / total_tests * 100) if total_tests else 0
# Digit width for left-side counter padding (derived from total file count).
fw = len(str(tests_passed + tests_failed))
# Build per-file test count string.
if file_summary:
parts = []
p = file_summary.get("passed", 0)
f = file_summary.get("failed", 0)
s = file_summary.get("skipped", 0)
e = file_summary.get("errors", 0)
if p:
parts.append(f"{p}")
if f:
parts.append(f"{f}")
if s:
parts.append(f"{s}s")
if e:
parts.append(f"{e}e")
# xfailed/xpassed are rare; include if present.
xf = file_summary.get("xfailed", 0)
xp = file_summary.get("xpassed", 0)
if xf:
parts.append(f"{xf}xf")
if xp:
parts.append(f"{xp}xp")
test_str = " ".join(parts) + ", " if parts else ""
else:
n_tests = test_counts.get(file, 0)
test_str = f"{n_tests} tests, " if n_tests else ""
# Show subprocess time when available; fall back to queue-inclusive dur.
if subproc_wall is not None:
time_str = f"{subproc_wall:.1f}s"
else:
time_str = f"{dur:.1f}s"
msg = (
f"[{pct:5.1f}% | {tests_done:>5}/{total_tests}"
f" | ✓{tests_passed:>{fw}} | ✗{tests_failed:>{fw}}] "
f"{status} {_format_file(file, repo_root)} ({test_str}{time_str})"
)
# Truncate to terminal width if available (no clobbering ANSI lines).
try:
cols = os.get_terminal_size().columns
if len(msg) > cols:
msg = msg[: cols - 1] + ""
except OSError:
pass
print(msg, flush=True)
def _print_inline_failure(
file: Path, output: str, repo_root: Path, pytest_passthrough: List[str]
) -> None:
"""Print a compact failure summary immediately when a file fails.
Shows the tail of the pytest output (the failure section with stack
traces) and a ready-to-run repro command, so the developer doesn't
have to wait for the full run to finish before seeing what broke.
"""
rel = _format_file(file, repo_root)
# Build a repro command the developer can copy-paste.
passthrough_str = " ".join(pytest_passthrough) if pytest_passthrough else ""
repro = f"python -m pytest {rel}"
if passthrough_str:
repro += f" {passthrough_str}"
# Grab just the failure lines (last ~30 lines of pytest output —
# typically the FAILED summary + short test info).
lines = output.rstrip().splitlines()
tail = "\n".join(lines[-30:])
print(flush=True)
print(f" ╔╍ Failed: {rel} ╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍", flush=True)
for line in tail.splitlines():
print(f"{line}", flush=True)
print(f"", flush=True)
print(f" ║ Repro: {repro}", flush=True)
print(f" ╚╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍", flush=True)
print(flush=True)
def _load_durations(repo_root: Path) -> dict[str, float]:
"""Read the duration cache from the repo root.
Returns a dict mapping relative file paths (e.g.
``tests/tools/test_code_execution.py``) to wall-clock seconds from
the last run. Missing or corrupt file empty dict (safe fallback).
"""
path = repo_root / _DURATIONS_FILE
if not path.is_file():
return {}
try:
return json.loads(path.read_text())
except (json.JSONDecodeError, OSError):
return {}
def _save_durations(
file_times: List[Tuple[Path, float]],
repo_root: Path,
) -> None:
"""Write the duration cache so future ``--slice`` runs can use it.
Merges with any existing cache so entries from files not in the
current run (e.g. from a different slice) are preserved. Keys are
repo-relative paths so the cache is portable across checkouts
and CI runners.
"""
data: dict[str, float] = _load_durations(repo_root)
for f, t in file_times:
key = _format_file(f, repo_root)
data[key] = round(t, 3)
path = repo_root / _DURATIONS_FILE
path.write_text(json.dumps(data, indent=2, sort_keys=True) + "\n")
def _slice_files(
files: List[Path],
slice_index: int,
slice_count: int,
durations: dict[str, float],
repo_root: Path,
) -> List[Path]:
"""Return the subset of *files* belonging to slice *slice_index*.
Uses **Longest Processing Time first** (LPT) distribution: sort files
by estimated duration descending, then greedily assign each file to
the slice with the smallest accumulated time so far. This minimizes
the makespan (max slice duration) and keeps CI jobs balanced.
Files with no cached duration get a default estimate of 2.0s (roughly
the P50 from profiling). This means first-time ``--slice`` runs
(no cache) still get reasonable distribution, and new files don't
all land in one slice.
``slice_index`` is 1-indexed (1..slice_count) for ergonomics
``--slice 1/4`` reads more naturally than ``--slice 0/4``.
"""
if slice_count < 2:
return files
if not (1 <= slice_index <= slice_count):
print(
f"error: --slice index must be 1..{slice_count}, got {slice_index}",
file=sys.stderr,
)
sys.exit(2)
# Build (file, estimated_duration) pairs.
default_dur = 2.0
file_durs: List[Tuple[Path, float]] = []
for f in files:
rel = _format_file(f, repo_root)
dur = durations.get(rel, default_dur)
file_durs.append((f, dur))
# Sort longest first (LPT).
file_durs.sort(key=lambda x: x[1], reverse=True)
# Greedy assignment: for each file, add it to the slice with the
# smallest current total.
bucket_files: List[List[Path]] = [[] for _ in range(slice_count)]
bucket_totals: List[float] = [0.0] * slice_count
for f, dur in file_durs:
# Find the least-loaded bucket.
min_idx = min(range(slice_count), key=lambda i: bucket_totals[i])
bucket_files[min_idx].append(f)
bucket_totals[min_idx] += dur
# Print slice summary for visibility.
target = bucket_files[slice_index - 1]
target_dur = bucket_totals[slice_index - 1]
total_dur = sum(bucket_totals)
print(
f"Slice {slice_index}/{slice_count}: {len(target)} files "
f"(~{target_dur:.0f}s estimated of {total_dur:.0f}s total)",
flush=True,
)
return target
def main() -> int:
parser = argparse.ArgumentParser(
description=__doc__,
formatter_class=argparse.RawDescriptionHelpFormatter,
)
parser.add_argument(
"-j",
"--jobs",
type=int,
default=int(os.environ.get("HERMES_TEST_WORKERS") or (os.cpu_count() or 4) * 2),
help="Parallel worker count (default: $HERMES_TEST_WORKERS or cpu_count*2)",
)
parser.add_argument(
"--paths",
default=os.environ.get("HERMES_TEST_PATHS", ":".join(_DEFAULT_ROOTS)),
help="Colon-separated discovery roots (default: 'tests')",
)
parser.add_argument(
"--include-integration",
action="store_true",
help="Don't skip integration/ e2e/ during discovery",
)
parser.add_argument(
"--file-timeout",
type=float,
default=float(
os.environ.get("HERMES_TEST_FILE_TIMEOUT", _DEFAULT_FILE_TIMEOUT_SECONDS)
),
help=(
"Per-file wall-clock cap in seconds. On timeout, the pytest "
"subprocess and its full process tree are SIGKILL'd. "
"Default: 600 (10 min), env: HERMES_TEST_FILE_TIMEOUT."
),
)
parser.add_argument(
"--slice",
metavar="I/N",
help=(
"Run only slice I of N (e.g. --slice 1/4). "
"Files are distributed across slices using cached durations "
"so each slice takes roughly equal wall time. "
"Without a duration cache, files are distributed by count. "
"Env: HERMES_TEST_SLICE (format: I/N)."
),
)
parser.add_argument(
"paths_positional",
nargs="*",
metavar="PATH",
help=(
"Restrict discovery to these paths (directories or .py files). "
"Mutually exclusive with --paths. Anything after a literal '--' "
"separator is passed through to each per-file pytest invocation."
),
)
# Manually split argv on '--' so positional paths and pytest passthrough
# args don't fight over each other. argparse's nargs="*" positional is
# greedy and will swallow everything after '--' including the pytest
# flags, defeating the convention.
argv = sys.argv[1:]
if "--" in argv:
sep = argv.index("--")
our_args, pytest_passthrough = argv[:sep], argv[sep + 1 :]
else:
our_args, pytest_passthrough = argv, []
args = parser.parse_args(our_args)
# Parse --slice (or HERMES_TEST_SLICE) early so we can exit on bad input
# before doing any expensive discovery.
slice_raw = args.slice or os.environ.get("HERMES_TEST_SLICE")
slice_index: int | None = None
slice_count: int = 1
if slice_raw:
try:
idx_s, count_s = slice_raw.split("/", 1)
slice_index = int(idx_s)
slice_count = int(count_s)
except (ValueError, AttributeError):
print(f"error: --slice must be I/N (e.g. 1/4), got: {slice_raw!r}", file=sys.stderr)
sys.exit(2)
repo_root = Path(__file__).resolve().parent.parent
# Resolve discovery roots: positional path args override --paths if any
# were supplied, otherwise --paths (which itself defaults to 'tests').
if args.paths_positional:
# Positionals can be directories OR explicit .py files. Either is
# fine — _discover_files handles both via rglob('test_*.py') for
# dirs and direct inclusion for files.
roots = [repo_root / p for p in args.paths_positional]
else:
roots = [repo_root / p for p in args.paths.split(":") if p]
if args.include_integration:
# Caller takes responsibility — typically used via explicit -k filter.
global _SKIP_PARTS # noqa: PLW0603 — config knob
_SKIP_PARTS = set()
files = _discover_files(roots)
if not files:
print(f"No test files discovered under {[str(r) for r in roots]}", file=sys.stderr)
return 1
# Count individual tests per file via a single pytest --co pass.
test_counts = _count_tests(files, repo_root, pytest_passthrough)
total_tests = sum(test_counts.values())
# Apply slicing if requested — distribute files across CI jobs by
# estimated duration so no one job gets all the slow files.
if slice_index is not None:
durations = _load_durations(repo_root)
files = _slice_files(files, slice_index, slice_count, durations, repo_root)
# Recount after slicing.
test_counts = {f: test_counts[f] for f in files if f in test_counts}
total_tests = sum(test_counts.values())
print(
f"Discovered {len(files)} test files ({total_tests} tests) under "
f"{[str(r.relative_to(repo_root)) if r.is_relative_to(repo_root) else str(r) for r in roots]}; "
f"running with -j {args.jobs}",
flush=True,
)
# Capture and print on completion (out-of-order is fine — keeps the
# terminal clean rather than interleaving N parallel pytest outputs).
failures: List[Tuple[Path, str, Dict[str, int]]] = []
file_times: List[Tuple[Path, float]] = [] # (file, subprocess_wall) for distribution
started = time.monotonic()
files_done = 0
tests_done = 0
pass_count = 0
fail_count = 0
tests_passed = 0
tests_failed = 0
lock = threading.Lock()
def _on_done(file: Path, started_at: float, fut: "Future[Tuple[Path, int, str, dict[str, int], float]]") -> None:
nonlocal files_done, tests_done, pass_count, fail_count, tests_passed, tests_failed
n_tests = test_counts.get(file, 0)
try:
fpath, rc, output, summary, subproc_wall = fut.result()
except Exception as exc: # noqa: BLE001 — must always advance counter
with lock:
files_done += 1
tests_done += n_tests
fail_count += 1
failures.append((file, f"runner crashed: {exc!r}", {}))
_print_progress(
tests_done, total_tests, file, 1,
time.monotonic() - started_at,
repo_root, tests_passed, tests_failed,
test_counts,
subproc_wall=0.0,
)
return
with lock:
files_done += 1
tests_done += n_tests
# Accumulate test-level counts from parsed summary.
tests_passed += summary.get("passed", 0)
tests_failed += summary.get("failed", 0)
file_times.append((fpath, subproc_wall))
if rc == 0:
pass_count += 1
else:
fail_count += 1
failures.append((fpath, output, summary))
_print_progress(
tests_done, total_tests, fpath, rc,
time.monotonic() - started_at,
repo_root, tests_passed, tests_failed,
test_counts,
file_summary=summary,
subproc_wall=subproc_wall,
)
if rc != 0:
_print_inline_failure(fpath, output, repo_root, pytest_passthrough)
with ThreadPoolExecutor(max_workers=args.jobs) as pool:
futures: List[Future] = []
for file in files:
t0 = time.monotonic()
fut = pool.submit(
_run_one_file, file, pytest_passthrough, repo_root, args.file_timeout
)
fut.add_done_callback(lambda f, file=file, t0=t0: _on_done(file, t0, f))
futures.append(fut)
# Block until everything's done. ThreadPoolExecutor.__exit__ waits
# for all submitted work, but doing it explicitly here makes the
# control flow obvious.
for fut in futures:
fut.result() if fut.exception() is None else None
elapsed = time.monotonic() - started
print()
pct = (tests_done / total_tests * 100) if total_tests else 0
print(f"=== Summary: {len(files)} files, {tests_passed} tests passed, {tests_failed} failed ({pct:.0f}% complete) in {elapsed:.1f}s ({args.jobs} workers) ===")
# Save durations for future --slice runs. Each slice writes its own
# partial test_durations.json; a CI merge step joins them later.
# Locally, _save_durations merges with any existing cache so entries
# from previous runs aren't lost.
if file_times:
_save_durations(file_times, repo_root)
print(f" Durations cached to {_DURATIONS_FILE} ({len(file_times)} files)")
# Per-file time distribution (throwaway diagnostic — shows how
# subprocess time is distributed so we can see if startup dominates).
if file_times:
times = sorted([t for _, t in file_times])
total_subproc = sum(times)
median_t = times[len(times) // 2]
p50 = median_t
p90 = times[int(len(times) * 0.90)]
p95 = times[int(len(times) * 0.95)]
p99 = times[min(int(len(times) * 0.99), len(times) - 1)]
max_t = times[-1]
# How many files finish in <1s? That's roughly "just startup".
fast = sum(1 for t in times if t < 1.0)
fast_2s = sum(1 for t in times if t < 2.0)
print()
print(f"=== Per-file subprocess time distribution ===")
print(f" Files: {len(times)}")
print(f" Total subprocess CPU-wall: {total_subproc:.1f}s (runner wall: {elapsed:.1f}s, parallelism: {args.jobs}x)")
print(f" P50: {p50:.2f}s P90: {p90:.2f}s P95: {p95:.2f}s P99: {p99:.2f}s Max: {max_t:.2f}s")
print(f" <1s: {fast} files ({fast/len(times)*100:.0f}%) <2s: {fast_2s} files ({fast_2s/len(times)*100:.0f}%)")
# Top 10 slowest files — likely the ones dragging the run.
slowest = sorted(file_times, key=lambda x: x[1], reverse=True)[:10]
print(f" Top 10 slowest:")
for f, t in slowest:
print(f" {t:>6.2f}s {_format_file(f, repo_root)}")
if failures:
print()
print("=== Failure output ===")
for file, output, _summary in failures:
print()
print(f"--- {_format_file(file, repo_root)} ---")
print(output.rstrip())
print()
# Split: files with actual test failures vs non-zero exit for other reasons
test_fail_files = [(f, s) for f, _o, s in failures if s.get("failed", 0) > 0]
all_passed_but_nonzero = [(f, s) for f, _o, s in failures
if s.get("failed", 0) == 0 and s.get("passed", 0) > 0]
no_tests_ran = [(f, s) for f, _o, s in failures
if s.get("failed", 0) == 0 and s.get("passed", 0) == 0]
if test_fail_files:
total_tf = sum(s.get("failed", 0) for _, s in test_fail_files)
print(f"=== {len(test_fail_files)} file{'s' if len(test_fail_files) != 1 else ''} with test failures ({total_tf} test{'s' if total_tf != 1 else ''} failed) ===")
for file, s in test_fail_files:
nf = s.get("failed", 0)
print(f" {_format_file(file, repo_root)} ({nf} test{'s' if nf != 1 else ''} failed)")
if all_passed_but_nonzero:
print(f"=== {len(all_passed_but_nonzero)} file{'s' if len(all_passed_but_nonzero) != 1 else ''} where all tests passed but pytest exited non-zero (warnings-as-errors, hook failures, etc.) ===")
for file, s in all_passed_but_nonzero:
print(f" {_format_file(file, repo_root)} ({s.get('passed', 0)} passed)")
if no_tests_ran:
print(f"=== {len(no_tests_ran)} file{'s' if len(no_tests_ran) != 1 else ''} where no tests ran (collection/import error, timeout before collection, etc.) ===")
for file, s in no_tests_ran:
print(f" {_format_file(file, repo_root)}")
return 1
return 0
if __name__ == "__main__":
sys.exit(main())
+26
View File
@@ -971,6 +971,18 @@ class TestSessionConfiguration:
"hermes_cli.runtime_provider.resolve_runtime_provider",
fake_resolve_runtime_provider,
)
# Pin the parser so this test doesn't depend on live
# ``_KNOWN_PROVIDER_NAMES`` / ``_PROVIDER_ALIASES`` module state
# (sibling of the same hardening on
# ``test_model_switch_uses_requested_provider``).
monkeypatch.setattr(
"hermes_cli.models.parse_model_input",
lambda raw, current: ("anthropic", "claude-sonnet-4-6"),
)
monkeypatch.setattr(
"hermes_cli.models.detect_provider_for_model",
lambda model, current: None,
)
manager = SessionManager(db=SessionDB(tmp_path / "state.db"))
with patch("run_agent.AIAgent", side_effect=fake_agent):
@@ -1543,6 +1555,20 @@ class TestSlashCommands:
"hermes_cli.runtime_provider.resolve_runtime_provider",
fake_resolve_runtime_provider,
)
# Pin the model-string parser independently of the live
# ``_KNOWN_PROVIDER_NAMES`` / ``_PROVIDER_ALIASES`` module state.
# Otherwise any test in the same xdist worker that mutates those
# globals (e.g. registers a custom provider that shadows
# ``anthropic``) flakes this one — observed once in CI as
# ``'custom' == 'anthropic'``.
monkeypatch.setattr(
"hermes_cli.models.parse_model_input",
lambda raw, current: ("anthropic", "claude-sonnet-4-6"),
)
monkeypatch.setattr(
"hermes_cli.models.detect_provider_for_model",
lambda model, current: None,
)
manager = SessionManager(db=SessionDB(tmp_path / "state.db"))
with patch("run_agent.AIAgent", side_effect=fake_agent):
+35
View File
@@ -40,6 +40,16 @@ def _clean_env(monkeypatch):
"ANTHROPIC_API_KEY", "ANTHROPIC_TOKEN", "CLAUDE_CODE_OAUTH_TOKEN",
):
monkeypatch.delenv(key, raising=False)
# Module-level unhealthy cache (10-min TTL) leaks between tests;
# earlier tests that call _mark_provider_unhealthy() poison the
# cache for later ones, causing _resolve_auto to skip providers
# that the test patched to return valid clients.
import agent.auxiliary_client as _aux_mod
_aux_mod._aux_unhealthy_until.clear()
_aux_mod._aux_unhealthy_logged_at.clear()
yield
_aux_mod._aux_unhealthy_until.clear()
_aux_mod._aux_unhealthy_logged_at.clear()
@pytest.fixture
@@ -461,6 +471,17 @@ class TestExpiredCodexFallback:
import base64
import time as _time
# Belt-and-suspenders: _try_openrouter marks openrouter unhealthy
# when OPENROUTER_API_KEY is absent (which the preceding test in
# this class exercises). The file-level _clean_env autouse fixture
# clears the cache, but fixture ordering with the conftest
# _hermetic_environment autouse can leave a narrow window where
# the mark reappears. Explicitly clear here so this test is
# independent of run order.
import agent.auxiliary_client as _aux_mod
_aux_mod._aux_unhealthy_until.clear()
_aux_mod._aux_unhealthy_logged_at.clear()
header = base64.urlsafe_b64encode(b'{"alg":"RS256","typ":"JWT"}').rstrip(b"=").decode()
payload_data = json.dumps({"exp": int(_time.time()) - 3600}).encode()
payload = base64.urlsafe_b64encode(payload_data).rstrip(b"=").decode()
@@ -1047,6 +1068,20 @@ class TestGetProviderChain:
class TestTryPaymentFallback:
"""_try_payment_fallback skips the failed provider and tries alternatives."""
@pytest.fixture(autouse=True)
def _clear_unhealthy_cache(self):
"""Earlier tests in this file call _mark_provider_unhealthy() which
pollutes the module-level ``_aux_unhealthy_until`` dict (10-min TTL).
Without this cleanup the fallback chain skips providers we've patched
to return valid clients the patched function is never called.
"""
from agent.auxiliary_client import _aux_unhealthy_until, _aux_unhealthy_logged_at
_aux_unhealthy_until.clear()
_aux_unhealthy_logged_at.clear()
yield
_aux_unhealthy_until.clear()
_aux_unhealthy_logged_at.clear()
def test_skips_failed_provider(self):
mock_client = MagicMock()
with patch("agent.auxiliary_client._try_openrouter", return_value=(None, None)), \
+4 -4
View File
@@ -65,11 +65,11 @@ class TestCompress:
assert result == msgs
def test_truncation_fallback_no_client(self, compressor):
# compressor has client=None and abort_on_summary_failure=False (default),
# so the LEGACY fallback path inserts a static "summary unavailable"
# placeholder and the middle window is dropped.
# Simulate "no summarizer available" explicitly. call_llm can otherwise
# discover the developer's real auxiliary credentials from auth state.
msgs = [{"role": "system", "content": "System prompt"}] + self._make_messages(10)
result = compressor.compress(msgs)
with patch("agent.context_compressor.call_llm", side_effect=RuntimeError("no provider")):
result = compressor.compress(msgs)
assert len(result) < len(msgs)
# Should keep system message and last N
assert result[0]["role"] == "system"
@@ -0,0 +1,93 @@
from types import SimpleNamespace
from agent.agent_init import _merge_custom_provider_extra_body
def test_custom_provider_extra_body_merges_into_request_overrides():
agent = SimpleNamespace(
provider="custom",
model="google/gemma-4-31b-it",
base_url="https://example.test/v1",
request_overrides={"service_tier": "priority"},
)
_merge_custom_provider_extra_body(
agent,
[
{
"name": "gemma",
"base_url": "https://example.test/v1/",
"model": "google/gemma-4-31b-it",
"extra_body": {
"enable_thinking": True,
"reasoning_effort": "high",
},
}
],
)
assert agent.request_overrides == {
"service_tier": "priority",
"extra_body": {
"enable_thinking": True,
"reasoning_effort": "high",
},
}
def test_custom_provider_extra_body_preserves_caller_override():
agent = SimpleNamespace(
provider="custom",
model="google/gemma-4-31b-it",
base_url="https://example.test/v1",
request_overrides={
"extra_body": {
"reasoning_effort": "low",
"caller_only": True,
}
},
)
_merge_custom_provider_extra_body(
agent,
[
{
"name": "gemma",
"base_url": "https://example.test/v1",
"model": "google/gemma-4-31b-it",
"extra_body": {
"enable_thinking": True,
"reasoning_effort": "high",
},
}
],
)
assert agent.request_overrides["extra_body"] == {
"enable_thinking": True,
"reasoning_effort": "low",
"caller_only": True,
}
def test_custom_provider_extra_body_ignores_other_custom_models():
agent = SimpleNamespace(
provider="custom",
model="other-model",
base_url="https://example.test/v1",
request_overrides={},
)
_merge_custom_provider_extra_body(
agent,
[
{
"name": "gemma",
"base_url": "https://example.test/v1",
"model": "google/gemma-4-31b-it",
"extra_body": {"enable_thinking": True},
}
],
)
assert agent.request_overrides == {}
+64
View File
@@ -56,6 +56,7 @@ class TestFailoverReason:
"overloaded", "server_error", "timeout",
"context_overflow", "payload_too_large", "image_too_large",
"model_not_found", "format_error",
"multimodal_tool_content_unsupported",
"provider_policy_blocked",
"thinking_signature", "long_context_tier",
"oauth_long_context_beta_forbidden",
@@ -1256,3 +1257,66 @@ class TestRateLimitErrorWithoutStatusCode:
e.status_code = None
result = classify_api_error(e, provider="copilot", model="gpt-4o")
assert result.reason != FailoverReason.rate_limit
# ── Test: multimodal_tool_content_unsupported pattern ───────────────────
class TestMultimodalToolContentUnsupported:
"""Issue #27344 — providers that reject list-type tool message content
should be classified as ``multimodal_tool_content_unsupported`` so the
retry loop can downgrade screenshots to text and try again.
"""
def test_xiaomi_mimo_text_is_not_set_pattern(self):
"""The actual Xiaomi MiMo 400 wording from the bug report."""
e = MockAPIError(
"Error code: 400 - {'error': {'code': '400', 'message': 'Param Incorrect', 'param': 'text is not set', 'type': ''}}",
status_code=400,
)
result = classify_api_error(e, provider="xiaomi", model="mimo-v2.5")
assert result.reason == FailoverReason.multimodal_tool_content_unsupported
assert result.retryable is True
def test_generic_tool_message_must_be_string(self):
e = MockAPIError(
"tool message content must be a string",
status_code=400,
)
result = classify_api_error(e, provider="custom", model="some-model")
assert result.reason == FailoverReason.multimodal_tool_content_unsupported
def test_expected_string_got_list(self):
e = MockAPIError(
"Schema validation failed: expected string, got list",
status_code=400,
)
result = classify_api_error(e, provider="custom", model="some-model")
assert result.reason == FailoverReason.multimodal_tool_content_unsupported
def test_multimodal_tool_content_takes_priority_over_context_overflow(self):
"""Some providers return a 400 whose message contains BOTH
'text is not set' and a length-shaped phrase; the tool-content
recovery is cheaper than compression so it must win the priority.
"""
e = MockAPIError(
"text is not set; context length exceeded",
status_code=400,
)
result = classify_api_error(e, provider="xiaomi", model="mimo-v2.5")
assert result.reason == FailoverReason.multimodal_tool_content_unsupported
def test_no_status_code_path_also_classifies(self):
"""When the error reaches us without a status code (transport
layer ate it) the message-only classifier branch must also
recognise the pattern.
"""
e = MockTransportError("tool_call.content must be string")
result = classify_api_error(e, provider="alibaba", model="qwen3.5-plus")
assert result.reason == FailoverReason.multimodal_tool_content_unsupported
def test_unrelated_400_is_not_misclassified(self):
"""Make sure the patterns don't false-positive on normal 400s."""
e = MockAPIError("bad request: missing field 'model'", status_code=400)
result = classify_api_error(e, provider="openrouter", model="anthropic/claude-sonnet-4")
assert result.reason != FailoverReason.multimodal_tool_content_unsupported
+275
View File
@@ -0,0 +1,275 @@
"""Tests for HERMES_HOME credential-file read blocking in file_safety.
Regression for https://github.com/NousResearch/hermes-agent/issues/17656
``read_file`` was previously only sandboxed against ``HERMES_HOME`` itself,
which left ``auth.json`` and ``.anthropic_oauth.json`` (plaintext provider
keys + OAuth tokens) readable by the agent. A prompt-injection reaching
``read_file`` could exfiltrate active credentials.
These tests verify that ``get_read_block_error`` returns a denial message
for the credential stores while leaving arbitrary ``HERMES_HOME`` files
readable, and that the existing ``skills/.hub`` deny still applies.
"""
from __future__ import annotations
import os
from pathlib import Path
import pytest
@pytest.fixture()
def fake_home(tmp_path, monkeypatch):
"""Point ``_hermes_home_path()`` at a tmp dir for isolated checks."""
import agent.file_safety as fs
home = tmp_path / "hermes_home"
home.mkdir()
monkeypatch.setattr(fs, "_hermes_home_path", lambda: home)
return home
def _create(home: Path, rel: str | Path) -> Path:
"""Create the file (with parents) so realpath() resolves it."""
p = home / rel
p.parent.mkdir(parents=True, exist_ok=True)
p.write_text("dummy", encoding="utf-8")
return p
def test_auth_json_blocked(fake_home):
from agent.file_safety import get_read_block_error
auth = _create(fake_home, "auth.json")
err = get_read_block_error(str(auth))
assert err is not None
assert "credential store" in err
assert "auth.json" in err
def test_auth_lock_blocked(fake_home):
from agent.file_safety import get_read_block_error
lock = _create(fake_home, "auth.lock")
err = get_read_block_error(str(lock))
assert err is not None
assert "credential store" in err
def test_anthropic_oauth_json_blocked(fake_home):
from agent.file_safety import get_read_block_error
oauth = _create(fake_home, ".anthropic_oauth.json")
err = get_read_block_error(str(oauth))
assert err is not None
assert "credential store" in err
def test_arbitrary_hermes_home_file_not_blocked(fake_home):
"""Non-credential files inside HERMES_HOME stay readable."""
from agent.file_safety import get_read_block_error
safe = _create(fake_home, "session_log.txt")
assert get_read_block_error(str(safe)) is None
def test_subdirectory_named_auth_json_not_blocked(fake_home):
"""Only the top-level auth.json is the credential store; a file with the
same name in a subdirectory (e.g., a skill mock) must remain readable."""
from agent.file_safety import get_read_block_error
nested = _create(fake_home, Path("skills") / "my-skill" / "auth.json")
assert get_read_block_error(str(nested)) is None
def test_skills_hub_block_still_applies(fake_home):
"""Regression guard: the original skills/.hub deny must keep working."""
from agent.file_safety import get_read_block_error
hub_file = _create(fake_home, "skills/.hub/manifest.json")
err = get_read_block_error(str(hub_file))
assert err is not None
assert "internal Hermes cache file" in err
def test_path_traversal_resolves_to_blocked(fake_home, tmp_path):
"""A path that traverses through a sibling dir back into HERMES_HOME's
auth.json must still be caught the check resolves through realpath."""
from agent.file_safety import get_read_block_error
_create(fake_home, "auth.json")
sibling = tmp_path / "elsewhere"
sibling.mkdir()
traversal = sibling / ".." / "hermes_home" / "auth.json"
err = get_read_block_error(str(traversal))
assert err is not None
assert "credential store" in err
def test_symlink_to_auth_json_blocked(fake_home, tmp_path):
"""A symlink pointing at HERMES_HOME/auth.json from outside the home
must be blocked readlink-resolution catches the indirection."""
from agent.file_safety import get_read_block_error
target = _create(fake_home, "auth.json")
link = tmp_path / "shim.json"
try:
os.symlink(target, link)
except (OSError, NotImplementedError):
pytest.skip("symlinks not supported on this platform/filesystem")
err = get_read_block_error(str(link))
assert err is not None
assert "credential store" in err
def test_read_file_tool_blocks_relative_path_under_terminal_cwd(
fake_home, tmp_path, monkeypatch
):
"""Bypass guard: a relative path like ``"auth.json"`` resolved by
``read_file_tool`` against ``TERMINAL_CWD == HERMES_HOME`` must still
be blocked, even though ``get_read_block_error``'s own ``resolve()``
is anchored at the (different) Python process cwd.
"""
import json
import tools.file_tools as ft
_create(fake_home, "auth.json")
# Force the file_tools resolver to anchor relative paths at HERMES_HOME
# while the Python process cwd remains tmp_path (a different directory).
monkeypatch.setenv("TERMINAL_CWD", str(fake_home))
monkeypatch.chdir(tmp_path)
monkeypatch.setattr(
ft, "_get_live_tracking_cwd", lambda task_id="default": None
)
out = json.loads(ft.read_file_tool("auth.json"))
assert "error" in out
assert "credential store" in out["error"]
# ---------------------------------------------------------------------------
# Widening: .env, webhook_subscriptions.json, mcp-tokens/
# ---------------------------------------------------------------------------
def test_dotenv_blocked(fake_home):
""".env in HERMES_HOME holds API keys — blocked."""
from agent.file_safety import get_read_block_error
env = _create(fake_home, ".env")
err = get_read_block_error(str(env))
assert err is not None
assert "credential store" in err
def test_webhook_subscriptions_blocked(fake_home):
"""webhook_subscriptions.json holds per-route HMAC secrets — blocked."""
from agent.file_safety import get_read_block_error
subs = _create(fake_home, "webhook_subscriptions.json")
err = get_read_block_error(str(subs))
assert err is not None
assert "credential store" in err
def test_mcp_tokens_file_blocked(fake_home):
"""Files under mcp-tokens/ hold OAuth tokens — blocked."""
from agent.file_safety import get_read_block_error
tok = _create(fake_home, Path("mcp-tokens") / "github.json")
err = get_read_block_error(str(tok))
assert err is not None
assert "MCP token" in err
def test_mcp_tokens_nested_blocked(fake_home):
"""Nested files inside mcp-tokens/ are also blocked."""
from agent.file_safety import get_read_block_error
tok = _create(fake_home, Path("mcp-tokens") / "providers" / "azure.json")
err = get_read_block_error(str(tok))
assert err is not None
assert "MCP token" in err
def test_mcp_tokens_dir_itself_blocked(fake_home):
"""The mcp-tokens directory itself is blocked (listing is exfiltrating)."""
from agent.file_safety import get_read_block_error
tokens_dir = fake_home / "mcp-tokens"
tokens_dir.mkdir(parents=True, exist_ok=True)
err = get_read_block_error(str(tokens_dir))
assert err is not None
assert "MCP token" in err
def test_identically_named_files_outside_hermes_home_not_blocked(
fake_home, tmp_path
):
"""A project's ``.env``, ``auth.json``, or ``mcp-tokens/`` outside
HERMES_HOME must remain readable the gate is per-location, not
per-filename."""
from agent.file_safety import get_read_block_error
project = tmp_path / "myproject"
project.mkdir()
for rel in (".env", "auth.json"):
p = project / rel
p.write_text("not secret here", encoding="utf-8")
assert get_read_block_error(str(p)) is None, (
f"{rel} outside HERMES_HOME should NOT be blocked"
)
tokens = project / "mcp-tokens"
tokens.mkdir()
tok_file = tokens / "token.json"
tok_file.write_text("not really a token", encoding="utf-8")
assert get_read_block_error(str(tok_file)) is None
def test_config_yaml_not_blocked(fake_home):
"""config.yaml is NOT a credential file — agent should still be
able to read it for debugging. (Writes are denied separately by
is_write_denied; reads stay allowed.)"""
from agent.file_safety import get_read_block_error
cfg = _create(fake_home, "config.yaml")
assert get_read_block_error(str(cfg)) is None
def test_profile_mode_blocks_root_credentials(tmp_path, monkeypatch):
"""Under a profile, HERMES_HOME = <root>/profiles/<name>, but
<root>/auth.json must ALSO be blocked credentials at root are
inherited by every profile."""
import agent.file_safety as fs
root = tmp_path / "hermes"
profile = root / "profiles" / "coder"
profile.mkdir(parents=True)
monkeypatch.setattr(fs, "_hermes_home_path", lambda: profile)
monkeypatch.setattr(fs, "_hermes_root_path", lambda: root)
from agent.file_safety import get_read_block_error
# Profile-local credential store: blocked
profile_auth = profile / "auth.json"
profile_auth.write_text("x")
assert "credential store" in (get_read_block_error(str(profile_auth)) or "")
# Root-level credential store: ALSO blocked (this is the widening)
root_auth = root / "auth.json"
root_auth.write_text("x")
assert "credential store" in (get_read_block_error(str(root_auth)) or "")
# Root-level .env: blocked too
root_env = root / ".env"
root_env.write_text("x")
assert "credential store" in (get_read_block_error(str(root_env)) or "")
# Root-level mcp-tokens: blocked
root_tok = root / "mcp-tokens" / "gh.json"
root_tok.parent.mkdir(parents=True, exist_ok=True)
root_tok.write_text("x")
assert "MCP token" in (get_read_block_error(str(root_tok)) or "")
+188
View File
@@ -1060,3 +1060,191 @@ class TestHonchoCadenceTracking:
p.on_turn_start(2, "second message")
should_skip = p._injection_frequency == "first-turn" and p._turn_count > 1
assert should_skip, "Second turn (turn 2) SHOULD be skipped"
class TestMemoryToolToolsetGate:
"""Issue #5544: memory provider tools must respect platform_toolsets.
Before the fix, MemoryManager.get_all_tool_schemas() output was appended
to AIAgent.tools unconditionally in agent_init.py bypassing the
enabled_toolsets filter. Result: `platform_toolsets: telegram: []`
still leaked fact_store and other memory tools into the tool surface,
causing 10x latency on local models (Qwen3-30B: 1.7s 42s) and
tool-call loops on small models.
These tests mirror the gate logic in agent/agent_init.py around the
memory provider tool injection block. The gate condition is:
enabled_toolsets is None no filter, inject (backward compat)
"memory" in enabled_toolsets user opted in, inject
otherwise (incl. []) skip injection
"""
@staticmethod
def _run_memory_injection(enabled_toolsets, memory_manager):
"""Simulate the gated memory-tool injection block from agent_init.py."""
tools = []
valid_tool_names = set()
if memory_manager and tools is not None and (
enabled_toolsets is None or "memory" in enabled_toolsets
):
_existing = {
t.get("function", {}).get("name")
for t in tools
if isinstance(t, dict)
}
for _schema in memory_manager.get_all_tool_schemas():
_tname = _schema.get("name", "")
if _tname and _tname in _existing:
continue
tools.append({"type": "function", "function": _schema})
if _tname:
valid_tool_names.add(_tname)
_existing.add(_tname)
return tools, valid_tool_names
def _mgr_with_tools(self, *tool_names):
"""Build a MemoryManager whose providers expose the named tool schemas."""
mgr = MemoryManager()
p = FakeMemoryProvider(
"ext",
tools=[{"name": n, "description": n, "parameters": {}} for n in tool_names],
)
mgr.add_provider(p)
return mgr
def test_none_toolsets_injects(self):
"""enabled_toolsets=None (no filter) injects memory tools — backward compat."""
mgr = self._mgr_with_tools("fact_store")
tools, names = self._run_memory_injection(None, mgr)
assert "fact_store" in names
assert any(t["function"]["name"] == "fact_store" for t in tools)
def test_memory_in_toolsets_injects(self):
"""enabled_toolsets including 'memory' injects memory tools."""
mgr = self._mgr_with_tools("fact_store")
tools, names = self._run_memory_injection(["terminal", "memory", "web"], mgr)
assert "fact_store" in names
def test_empty_toolsets_blocks_injection(self):
"""`platform_toolsets: telegram: []` must suppress memory tools. (#5544)"""
mgr = self._mgr_with_tools("fact_store")
tools, names = self._run_memory_injection([], mgr)
assert tools == []
assert names == set()
def test_toolsets_without_memory_blocks_injection(self):
"""Toolset list that doesn't name 'memory' must suppress injection."""
mgr = self._mgr_with_tools("fact_store")
tools, names = self._run_memory_injection(["terminal", "web"], mgr)
assert tools == []
assert names == set()
def test_no_memory_manager_no_injection(self):
"""Gate is moot without a memory manager."""
tools, names = self._run_memory_injection(None, None)
assert tools == []
def test_multiple_schemas_all_blocked_together(self):
"""When the gate is closed, no memory tools leak — not even partially."""
mgr = self._mgr_with_tools("fact_store", "memory_search", "memory_add")
tools, names = self._run_memory_injection(["terminal"], mgr)
assert tools == []
assert names == set()
def test_multiple_schemas_all_injected_when_enabled(self):
"""When the gate is open, every memory tool schema is injected."""
mgr = self._mgr_with_tools("fact_store", "memory_search", "memory_add")
tools, names = self._run_memory_injection(None, mgr)
assert names == {"fact_store", "memory_search", "memory_add"}
class TestContextEngineToolsetGate:
"""Issue #5544 (sibling): context engine tools follow the same gate.
`agent.context_compressor.get_tool_schemas()` (e.g. lcm_grep, lcm_describe,
lcm_expand) was appended to AIAgent.tools unconditionally. Same blind
injection class as the memory bug; same local-model penalty. Gate name:
"context_engine" (matches the existing plugin-system convention).
"""
@staticmethod
def _run_context_engine_injection(enabled_toolsets, compressor):
"""Simulate the gated context-engine injection block from agent_init.py."""
tools = []
valid_tool_names = set()
engine_tool_names = set()
if (
compressor is not None
and tools is not None
and (
enabled_toolsets is None
or "context_engine" in enabled_toolsets
)
):
_existing = {
t.get("function", {}).get("name")
for t in tools
if isinstance(t, dict)
}
for _schema in compressor.get_tool_schemas():
_tname = _schema.get("name", "")
if _tname and _tname in _existing:
continue
tools.append({"type": "function", "function": _schema})
if _tname:
valid_tool_names.add(_tname)
engine_tool_names.add(_tname)
_existing.add(_tname)
return tools, valid_tool_names, engine_tool_names
class _FakeCompressor:
def __init__(self, schemas):
self._schemas = schemas
def get_tool_schemas(self):
return list(self._schemas)
def _compressor_with(self, *tool_names):
return self._FakeCompressor(
[{"name": n, "description": n, "parameters": {}} for n in tool_names]
)
def test_none_toolsets_injects(self):
"""enabled_toolsets=None injects context-engine tools — backward compat."""
c = self._compressor_with("lcm_grep", "lcm_describe", "lcm_expand")
tools, names, engine_names = self._run_context_engine_injection(None, c)
assert engine_names == {"lcm_grep", "lcm_describe", "lcm_expand"}
def test_context_engine_in_toolsets_injects(self):
"""enabled_toolsets including 'context_engine' injects the tools."""
c = self._compressor_with("lcm_grep")
tools, names, engine_names = self._run_context_engine_injection(
["terminal", "context_engine"], c
)
assert "lcm_grep" in engine_names
def test_empty_toolsets_blocks_injection(self):
"""`platform_toolsets: telegram: []` must suppress context-engine tools."""
c = self._compressor_with("lcm_grep")
tools, names, engine_names = self._run_context_engine_injection([], c)
assert tools == []
assert engine_names == set()
def test_toolsets_without_context_engine_blocks_injection(self):
"""A toolset list that doesn't name 'context_engine' suppresses injection."""
c = self._compressor_with("lcm_grep", "lcm_describe")
tools, names, engine_names = self._run_context_engine_injection(
["terminal", "memory"], c
)
assert tools == []
assert engine_names == set()
def test_no_compressor_no_injection(self):
"""Gate is moot without a context_compressor."""
tools, names, engine_names = self._run_context_engine_injection(None, None)
assert tools == []
+28
View File
@@ -164,6 +164,7 @@ class TestDefaultContextLengths:
"grok-4-1-fast": 2000000,
"grok-4-fast": 2000000,
"grok-4": 256000,
"grok-build": 256000,
"grok-code-fast": 256000,
"grok-3": 131072,
"grok-2": 131072,
@@ -195,6 +196,7 @@ class TestDefaultContextLengths:
("grok-4-fast-non-reasoning", 2000000),
("grok-4", 256000),
("grok-4-0709", 256000),
("grok-build-0.1", 256000),
("grok-code-fast-1", 256000),
("grok-3", 131072),
("grok-3-mini", 131072),
@@ -210,6 +212,32 @@ class TestDefaultContextLengths:
f"{model_id}: expected {expected_ctx}, got {actual}"
)
def test_xai_oauth_grok_build_uses_xai_models_dev_context(self):
"""xAI OAuth should share the xAI provider metadata path.
The xAI /v1/models endpoint does not currently include context fields
for grok-build-0.1, so this guards against falling through to the
generic "grok" 131k fallback when using OAuth credentials.
"""
registry = {
"xai": {
"models": {
"grok-build-0.1": {
"limit": {"context": 256000, "output": 64000},
},
},
},
}
with patch("agent.model_metadata.get_cached_context_length", return_value=None), \
patch("agent.model_metadata._query_ollama_api_show", return_value=None), \
patch("agent.models_dev.fetch_models_dev", return_value=registry):
assert get_model_context_length(
"grok-build-0.1",
provider="xai-oauth",
base_url="https://api.x.ai/v1",
api_key="oauth-token",
) == 256000
def test_deepseek_v4_models_1m_context(self):
from agent.model_metadata import get_model_context_length
from unittest.mock import patch as mock_patch
+20
View File
@@ -41,6 +41,16 @@ SAMPLE_REGISTRY = {
},
},
},
"xai": {
"id": "xai",
"name": "xAI",
"models": {
"grok-build-0.1": {
"id": "grok-build-0.1",
"limit": {"context": 256000, "output": 64000},
},
},
},
"kilo": {
"id": "kilo",
"name": "Kilo Gateway",
@@ -86,6 +96,10 @@ class TestProviderMapping:
assert PROVIDER_TO_MODELS_DEV["kilocode"] == "kilo"
assert PROVIDER_TO_MODELS_DEV["ai-gateway"] == "vercel"
def test_xai_oauth_uses_xai_catalog(self):
assert PROVIDER_TO_MODELS_DEV["xai"] == "xai"
assert PROVIDER_TO_MODELS_DEV["xai-oauth"] == "xai"
def test_unmapped_provider_not_in_dict(self):
assert "nous" not in PROVIDER_TO_MODELS_DEV
@@ -144,6 +158,12 @@ class TestLookupModelsDevContext:
# GitHub Copilot: only 128K for same model
assert lookup_models_dev_context("copilot", "claude-opus-4.6") == 128000
@patch("agent.models_dev.fetch_models_dev")
def test_xai_oauth_resolves_xai_context(self, mock_fetch):
"""xAI OAuth is an auth path, not a separate model catalog."""
mock_fetch.return_value = SAMPLE_REGISTRY
assert lookup_models_dev_context("xai-oauth", "grok-build-0.1") == 256000
@patch("agent.models_dev.fetch_models_dev")
def test_zero_context_filtered(self, mock_fetch):
mock_fetch.return_value = SAMPLE_REGISTRY
+7 -4
View File
@@ -556,10 +556,11 @@ Generate some audio.
raising=False,
)
with patch.dict(
os.environ, {"HERMES_SESSION_PLATFORM": "telegram"}, clear=False
):
with patch("tools.skills_tool.SKILLS_DIR", tmp_path):
with patch("tools.skills_tool.SKILLS_DIR", tmp_path):
from gateway.session_context import clear_session_vars, set_session_vars
tokens = set_session_vars(platform="telegram")
try:
_make_skill(
tmp_path,
"test-skill",
@@ -571,6 +572,8 @@ Generate some audio.
)
scan_skill_commands()
msg = build_skill_invocation_message("/test-skill", "do stuff")
finally:
clear_session_vars(tokens)
assert msg is not None
assert "local cli" in msg.lower()
+143 -2
View File
@@ -1,6 +1,12 @@
"""Tests for agent/skill_utils.py — extract_skill_conditions metadata handling."""
"""Tests for agent/skill_utils.py."""
from agent.skill_utils import extract_skill_conditions
from unittest.mock import patch
from agent.skill_utils import (
extract_skill_conditions,
iter_skill_index_files,
skill_matches_platform,
)
def test_metadata_as_dict_with_hermes():
@@ -56,3 +62,138 @@ def test_metadata_missing_entirely():
"fallback_for_tools": [],
"requires_tools": [],
}
def test_iter_skill_index_files_prunes_dependency_dirs(tmp_path):
real = tmp_path / "real-skill"
real.mkdir()
(real / "SKILL.md").write_text("---\nname: real-skill\n---\n", encoding="utf-8")
nested = (
tmp_path
/ "bring"
/ "scripts"
/ ".venv"
/ "lib"
/ "python3.13"
/ "site-packages"
/ "typer"
/ ".agents"
/ "skills"
/ "typer"
)
nested.mkdir(parents=True)
(nested / "SKILL.md").write_text("---\nname: typer\n---\n", encoding="utf-8")
node_module = (
tmp_path
/ "web-skill"
/ "node_modules"
/ "dep"
/ ".agents"
/ "skills"
/ "dep"
)
node_module.mkdir(parents=True)
(node_module / "SKILL.md").write_text("---\nname: dep\n---\n", encoding="utf-8")
found = list(iter_skill_index_files(tmp_path, "SKILL.md"))
assert found == [real / "SKILL.md"]
# ── skill_matches_platform on Termux ──────────────────────────────────────
class TestSkillMatchesPlatformTermux:
"""Termux is Linux userland on Android. Skills tagged platforms:[linux]
must load there regardless of whether Python reports sys.platform as
"linux" (pre-3.13) or "android" (3.13+). Reported by user @LikiusInik
in May 2026 only 3 built-in skills appeared on Termux because every
github/productivity/mlops skill is tagged platforms:[linux,macos,windows]
and sys.platform=="android" did not start with "linux".
"""
def test_no_platforms_field_matches_everywhere(self):
# Backward-compat default — skills without a platforms tag load
# on any OS, Termux included.
with patch("agent.skill_utils.sys.platform", "android"), patch(
"agent.skill_utils.is_termux", return_value=True
):
assert skill_matches_platform({}) is True
assert skill_matches_platform({"name": "foo"}) is True
def test_linux_skill_loads_on_termux_android_platform(self):
# Python 3.13+ on Termux reports sys.platform == "android".
fm = {"platforms": ["linux"]}
with patch("agent.skill_utils.sys.platform", "android"), patch(
"agent.skill_utils.is_termux", return_value=True
):
assert skill_matches_platform(fm) is True
def test_linux_macos_windows_skill_loads_on_termux(self):
# The common "[linux, macos, windows]" tag used by github-*,
# productivity, mlops, etc.
fm = {"platforms": ["linux", "macos", "windows"]}
with patch("agent.skill_utils.sys.platform", "android"), patch(
"agent.skill_utils.is_termux", return_value=True
):
assert skill_matches_platform(fm) is True
def test_linux_skill_loads_on_termux_linux_platform(self):
# Pre-3.13 Termux reports sys.platform == "linux" already — this
# works without the Termux escape hatch but must still pass.
fm = {"platforms": ["linux"]}
with patch("agent.skill_utils.sys.platform", "linux"), patch(
"agent.skill_utils.is_termux", return_value=True
):
assert skill_matches_platform(fm) is True
def test_macos_only_skill_still_excluded_on_termux(self):
# macOS-only skills (apple-notes, imessage, ...) should NOT load
# on Termux. The Termux fallback only widens platforms:[linux,...].
fm = {"platforms": ["macos"]}
with patch("agent.skill_utils.sys.platform", "android"), patch(
"agent.skill_utils.is_termux", return_value=True
):
assert skill_matches_platform(fm) is False
def test_windows_only_skill_still_excluded_on_termux(self):
fm = {"platforms": ["windows"]}
with patch("agent.skill_utils.sys.platform", "android"), patch(
"agent.skill_utils.is_termux", return_value=True
):
assert skill_matches_platform(fm) is False
def test_explicit_termux_or_android_tag_matches(self):
# Skills can also opt in explicitly via platforms:[termux] or
# platforms:[android] — both should match a Termux session.
with patch("agent.skill_utils.sys.platform", "android"), patch(
"agent.skill_utils.is_termux", return_value=True
):
assert skill_matches_platform({"platforms": ["termux"]}) is True
assert skill_matches_platform({"platforms": ["android"]}) is True
def test_non_termux_android_does_not_widen(self):
# If we're somehow on a plain Android Python (not Termux), don't
# silently load Linux skills — Termux is the supported environment.
fm = {"platforms": ["linux"]}
with patch("agent.skill_utils.sys.platform", "android"), patch(
"agent.skill_utils.is_termux", return_value=False
):
assert skill_matches_platform(fm) is False
def test_linux_skill_on_real_linux_unaffected(self):
# The non-Termux Linux path must not change.
fm = {"platforms": ["linux"]}
with patch("agent.skill_utils.sys.platform", "linux"), patch(
"agent.skill_utils.is_termux", return_value=False
):
assert skill_matches_platform(fm) is True
def test_macos_skill_on_real_macos_unaffected(self):
fm = {"platforms": ["macos"]}
with patch("agent.skill_utils.sys.platform", "darwin"), patch(
"agent.skill_utils.is_termux", return_value=False
):
assert skill_matches_platform(fm) is True
+14
View File
@@ -102,6 +102,20 @@ class TestVerboseAndToolProgress:
assert cli.tool_progress_mode in {"off", "new", "all", "verbose"}
class TestFallbackChainInit:
def test_merges_new_and_legacy_fallback_config(self):
cli = _make_cli(config_overrides={
"fallback_providers": [
{"provider": "openrouter", "model": "anthropic/claude-sonnet-4.6"},
],
"fallback_model": {"provider": "nous", "model": "Hermes-4"},
})
assert cli._fallback_model == [
{"provider": "openrouter", "model": "anthropic/claude-sonnet-4.6"},
{"provider": "nous", "model": "Hermes-4"},
]
class TestBusyInputMode:
def test_default_busy_input_mode_is_interrupt(self):
cli = _make_cli()
+38 -184
View File
@@ -20,12 +20,9 @@ test runner at ``scripts/run_tests.sh``.
"""
import asyncio
import logging
import os
import re
import signal
import sys
import tempfile
from pathlib import Path
from unittest.mock import patch
@@ -37,6 +34,22 @@ if str(PROJECT_ROOT) not in sys.path:
sys.path.insert(0, str(PROJECT_ROOT))
# ── Per-file process isolation ──────────────────────────────────────────────
# Tests run via ``scripts/run_tests_parallel.py``, which spawns a fresh
# ``python -m pytest <file>`` subprocess per test file. Cross-file state
# leakage (module-level dicts, ContextVars, caches) is impossible: each
# file gets a clean Python interpreter. Intra-file ordering is the test
# author's responsibility — if test A in foo.py mutates state that test B
# in foo.py reads, that's a real bug to fix in the file (it would also
# bite anyone running ``pytest tests/foo.py`` directly).
#
# This replaces the historic _reset_module_state autouse fixture (manual
# state clearing) and the brief experiment with subprocess-per-test
# isolation (too slow at ~17k tests).
#
# See ``scripts/run_tests_parallel.py`` for the runner.
# ── Credential env-var filter ──────────────────────────────────────────────
#
# Any env var in the current process matching ONE of these patterns is
@@ -279,7 +292,7 @@ _HERMES_BEHAVIORAL_VARS = frozenset({
"WECOM_HOME_CHANNEL_NAME",
# Platform gating — set by load_gateway_config() as a side effect when
# a config.yaml is present, so individual test bodies that call the
# loader leak these values into later tests on the same xdist worker.
# loader leak these values into later tests in the same process.
# Force-clear on every test setup so the leak can't happen.
"SLACK_REQUIRE_MENTION",
"SLACK_STRICT_MENTION",
@@ -345,6 +358,10 @@ def _hermetic_environment(tmp_path, monkeypatch):
monkeypatch.setenv("AWS_EC2_METADATA_DISABLED", "true")
monkeypatch.setenv("AWS_METADATA_SERVICE_TIMEOUT", "1")
monkeypatch.setenv("AWS_METADATA_SERVICE_NUM_ATTEMPTS", "1")
# Tirith auto-installs from GitHub when enabled and missing. Unit tests
# should never perform that implicit network/bootstrap path; Tirith-specific
# tests opt back in by patching the security config directly.
monkeypatch.setenv("TIRITH_ENABLED", "false")
# 5. Reset plugin singleton so tests don't leak plugins from
# ~/.hermes/plugins/ (which, per step 3, is now empty — but the
@@ -368,144 +385,21 @@ def _isolate_hermes_home(_hermetic_environment):
return None
# ── Module-level state reset ───────────────────────────────────────────────
# ── Module-level state reset — replaced by per-file process isolation ──────
#
# Python modules are singletons per process, and pytest-xdist workers are
# long-lived. Module-level dicts/sets (tool registries, approval state,
# interrupt flags) and ContextVars persist across tests in the same worker,
# causing tests that pass alone to fail when run with siblings.
# Each test FILE runs in a freshly-spawned ``python -m pytest <file>``
# subprocess via ``scripts/run_tests_parallel.py``, so module-level dicts /
# sets / ContextVars from tests in one file cannot leak into tests in
# another file. No manual per-module clearing needed.
#
# Each entry in this fixture clears state that belongs to a specific module.
# New state buckets go here too — this is the single gate that prevents
# "works alone, flakes in CI" bugs from state leakage.
# Within a single file, ordering is the author's responsibility. If your
# tests in the same file share mutable state, either reset it explicitly
# in a fixture or split them across files.
#
# The skill `test-suite-cascade-diagnosis` documents the concrete patterns
# this closes; the running example was `test_command_guards` failing 12/15
# CI runs because ``tools.approval._session_approved`` carried approvals
# from one test's session into another's.
@pytest.fixture(autouse=True)
def _reset_module_state():
"""Clear module-level mutable state and ContextVars between tests.
Keeps state from leaking across tests on the same xdist worker. Modules
that don't exist yet (test collection before production import) are
skipped silently production import later creates fresh empty state.
"""
# --- logging — quiet/one-shot paths mutate process-global logger state ---
logging.disable(logging.NOTSET)
for _logger_name in ("tools", "run_agent", "trajectory_compressor", "cron", "hermes_cli"):
_logger = logging.getLogger(_logger_name)
_logger.disabled = False
_logger.setLevel(logging.NOTSET)
_logger.propagate = True
# --- tools.approval — the single biggest source of cross-test pollution ---
try:
from tools import approval as _approval_mod
_approval_mod._session_approved.clear()
_approval_mod._session_yolo.clear()
_approval_mod._permanent_approved.clear()
_approval_mod._pending.clear()
_approval_mod._gateway_queues.clear()
_approval_mod._gateway_notify_cbs.clear()
# ContextVar: reset to empty string so get_current_session_key()
# falls through to the env var / default path, matching a fresh
# process.
_approval_mod._approval_session_key.set("")
except Exception:
pass
# --- tools.interrupt — per-thread interrupt flag set ---
try:
from tools import interrupt as _interrupt_mod
with _interrupt_mod._lock:
_interrupt_mod._interrupted_threads.clear()
except Exception:
pass
# --- gateway.session_context — 9 ContextVars that represent
# the active gateway session. If set in one test and not reset,
# the next test's get_session_env() reads stale values.
try:
from gateway import session_context as _sc_mod
for _cv in (
_sc_mod._SESSION_PLATFORM,
_sc_mod._SESSION_CHAT_ID,
_sc_mod._SESSION_CHAT_NAME,
_sc_mod._SESSION_THREAD_ID,
_sc_mod._SESSION_USER_ID,
_sc_mod._SESSION_USER_NAME,
_sc_mod._SESSION_KEY,
_sc_mod._CRON_AUTO_DELIVER_PLATFORM,
_sc_mod._CRON_AUTO_DELIVER_CHAT_ID,
_sc_mod._CRON_AUTO_DELIVER_THREAD_ID,
):
_cv.set(_sc_mod._UNSET)
except Exception:
pass
# --- tools.env_passthrough — ContextVar<set[str]> with no default ---
# LookupError is normal if the test never set it. Setting it to an
# empty set unconditionally normalizes the starting state.
try:
from tools import env_passthrough as _envp_mod
_envp_mod._allowed_env_vars_var.set(set())
except Exception:
pass
# --- tools.terminal_tool — active environment/cwd cache ---
# File tools prefer a live terminal cwd when one is cached for the task.
# Clear terminal environments between tests so a prior terminal call can't
# override TERMINAL_CWD in path-resolution tests.
try:
from tools import terminal_tool as _term_mod
_envs_to_cleanup = []
with _term_mod._env_lock:
_envs_to_cleanup = list(_term_mod._active_environments.values())
_term_mod._active_environments.clear()
_term_mod._last_activity.clear()
_term_mod._creation_locks.clear()
for _env in _envs_to_cleanup:
try:
_env.cleanup()
except Exception:
pass
except Exception:
pass
# --- tools.credential_files — ContextVar<dict> ---
try:
from tools import credential_files as _credf_mod
_credf_mod._registered_files_var.set({})
except Exception:
pass
# --- agent.auxiliary_client — runtime main provider/model override and
# payment-error health cache. Both are process-global in production;
# reset them per test so one worker's fallback/402 test does not make
# later auxiliary-client tests skip otherwise-available providers.
try:
from agent import auxiliary_client as _aux_mod
_aux_mod.clear_runtime_main()
_aux_mod._reset_aux_unhealthy_cache()
except Exception:
pass
# --- tools.file_tools — per-task read history + file-ops cache ---
# _read_tracker accumulates per-task_id read history for loop detection,
# capped by _READ_HISTORY_CAP. If entries from a prior test persist, the
# cap is hit faster than expected and capacity-related tests flake.
try:
from tools import file_tools as _ft_mod
with _ft_mod._read_tracker_lock:
_ft_mod._read_tracker.clear()
with _ft_mod._file_ops_lock:
_ft_mod._file_ops_cache.clear()
except Exception:
pass
yield
# The skill ``test-suite-cascade-diagnosis`` documents the cascade patterns
# this replaces; the running example was ``test_command_guards`` failing
# 12/15 CI runs because ``tools.approval._session_approved`` carried
# approvals from one test's session into another's.
@pytest.fixture()
@@ -532,13 +426,12 @@ def mock_config():
}
# ── Global test timeout ─────────────────────────────────────────────────────
# Kill any individual test that takes longer than 30 seconds.
# Prevents hanging tests (subprocess spawns, blocking I/O) from stalling the
# entire test suite.
# ── Per-test timeout — handled by the isolation plugin ─────────────────────
#
# The subprocess-per-test plugin enforces the configured ``isolate_timeout``
# ini key by terminating the child if it overruns. The old SIGALRM-based
# fixture (POSIX-only, didn't work on Windows) is gone.
def _timeout_handler(signum, frame):
raise TimeoutError("Test exceeded 30 second timeout")
@pytest.fixture(autouse=True)
def _ensure_current_event_loop(request):
@@ -584,45 +477,6 @@ def _ensure_current_event_loop(request):
asyncio.set_event_loop(None)
@pytest.fixture(autouse=True)
def _enforce_test_timeout():
"""Kill any individual test that takes longer than 30 seconds.
SIGALRM is Unix-only; skip on Windows."""
if sys.platform == "win32":
yield
return
old = signal.signal(signal.SIGALRM, _timeout_handler)
signal.alarm(30)
yield
signal.alarm(0)
signal.signal(signal.SIGALRM, old)
@pytest.fixture(autouse=True)
def _reset_tool_registry_caches():
"""Clear tool-registry-level caches between tests.
The production registry caches ``check_fn()`` results for 30 s
(see tools/registry.py) and :func:`get_tool_definitions` memoizes
its result (see model_tools.py). Both are keyed on state that tests
routinely mutate (env vars, registry._generation, config.yaml mtime)
but a stale result from test A can still be served to test B
because 30 s covers the entire suite, and xdist worker reuse means
one test's cache lands in another's process. Clearing before every
test keeps hermetic behavior.
"""
try:
from tools.registry import invalidate_check_fn_cache
invalidate_check_fn_cache()
except ImportError:
pass
try:
from model_tools import _clear_tool_defs_cache
_clear_tool_defs_cache()
except ImportError:
pass
# ── Live-system guard ──────────────────────────────────────────────────────
#
# Several test files exercise the gateway-restart / kill code paths
+73 -35
View File
@@ -490,6 +490,17 @@ class TestRoutingIntents:
class TestDeliverResultWrapping:
"""Verify that cron deliveries are wrapped with header/footer and no longer mirrored."""
def _safe_media_path(self, tmp_path, monkeypatch, name, data=b"media"):
root = tmp_path / "media-cache"
media_file = root / name
media_file.parent.mkdir(parents=True, exist_ok=True)
media_file.write_bytes(data)
monkeypatch.setattr(
"gateway.platforms.base.MEDIA_DELIVERY_SAFE_ROOTS",
(root,),
)
return media_file.resolve()
def test_delivery_wraps_content_with_header_and_footer(self):
"""Delivered content should include task name header and agent-invisible note."""
from gateway.config import Platform
@@ -564,9 +575,10 @@ class TestDeliverResultWrapping:
assert "Cronjob Response" not in sent_content
assert "The agent cannot see" not in sent_content
def test_delivery_extracts_media_tags_before_send(self):
def test_delivery_extracts_media_tags_before_send(self, tmp_path, monkeypatch):
"""Cron delivery should pass MEDIA attachments separately to the send helper."""
from gateway.config import Platform
media_path = self._safe_media_path(tmp_path, monkeypatch, "test-voice.ogg")
pconfig = MagicMock()
pconfig.enabled = True
@@ -581,7 +593,7 @@ class TestDeliverResultWrapping:
"deliver": "origin",
"origin": {"platform": "telegram", "chat_id": "123"},
}
_deliver_result(job, "Title\nMEDIA:/tmp/test-voice.ogg")
_deliver_result(job, f"Title\nMEDIA:{media_path}")
send_mock.assert_called_once()
args, kwargs = send_mock.call_args
@@ -589,14 +601,15 @@ class TestDeliverResultWrapping:
assert "MEDIA:" not in args[3]
assert "Title" in args[3]
# Media files should be forwarded separately
assert kwargs["media_files"] == [("/tmp/test-voice.ogg", False)]
assert kwargs["media_files"] == [(str(media_path), False)]
def test_live_adapter_sends_media_as_attachments(self):
def test_live_adapter_sends_media_as_attachments(self, tmp_path, monkeypatch):
"""When a live adapter is available, MEDIA files should be sent as native
platform attachments (e.g., Discord voice, Telegram audio) rather than
as literal 'MEDIA:/path' text."""
from gateway.config import Platform
from concurrent.futures import Future
media_path = self._safe_media_path(tmp_path, monkeypatch, "cron-voice.mp3")
adapter = AsyncMock()
adapter.send.return_value = MagicMock(success=True)
@@ -628,7 +641,7 @@ class TestDeliverResultWrapping:
patch("asyncio.run_coroutine_threadsafe", side_effect=fake_run_coro):
_deliver_result(
job,
"Here is TTS\nMEDIA:/tmp/cron-voice.mp3",
f"Here is TTS\nMEDIA:{media_path}",
adapters={Platform.DISCORD: adapter},
loop=loop,
)
@@ -642,12 +655,13 @@ class TestDeliverResultWrapping:
# Audio file should be sent as a voice attachment
adapter.send_voice.assert_called_once()
voice_call = adapter.send_voice.call_args
assert voice_call[1]["audio_path"] == "/tmp/cron-voice.mp3"
assert voice_call[1]["audio_path"] == str(media_path)
def test_live_adapter_routes_image_to_send_image_file(self):
def test_live_adapter_routes_image_to_send_image_file(self, tmp_path, monkeypatch):
"""Image MEDIA files should be routed to send_image_file, not send_voice."""
from gateway.config import Platform
from concurrent.futures import Future
media_path = self._safe_media_path(tmp_path, monkeypatch, "chart.png")
adapter = AsyncMock()
adapter.send.return_value = MagicMock(success=True)
@@ -678,19 +692,20 @@ class TestDeliverResultWrapping:
patch("asyncio.run_coroutine_threadsafe", side_effect=fake_run_coro):
_deliver_result(
job,
"Chart attached\nMEDIA:/tmp/chart.png",
f"Chart attached\nMEDIA:{media_path}",
adapters={Platform.DISCORD: adapter},
loop=loop,
)
adapter.send_image_file.assert_called_once()
assert adapter.send_image_file.call_args[1]["image_path"] == "/tmp/chart.png"
assert adapter.send_image_file.call_args[1]["image_path"] == str(media_path)
adapter.send_voice.assert_not_called()
def test_live_adapter_media_only_no_text(self):
def test_live_adapter_media_only_no_text(self, tmp_path, monkeypatch):
"""When content is ONLY a MEDIA tag with no text, media should still be sent."""
from gateway.config import Platform
from concurrent.futures import Future
media_path = self._safe_media_path(tmp_path, monkeypatch, "voice.ogg")
adapter = AsyncMock()
adapter.send_voice.return_value = MagicMock(success=True)
@@ -720,7 +735,7 @@ class TestDeliverResultWrapping:
patch("asyncio.run_coroutine_threadsafe", side_effect=fake_run_coro):
_deliver_result(
job,
"[[audio_as_voice]]\nMEDIA:/tmp/voice.ogg",
f"[[audio_as_voice]]\nMEDIA:{media_path}",
adapters={Platform.TELEGRAM: adapter},
loop=loop,
)
@@ -2164,43 +2179,56 @@ class TestBuildJobPromptBumpUse:
class TestSendMediaViaAdapter:
"""Unit tests for _send_media_via_adapter — routes files to typed adapter methods."""
def _safe_media_path(self, tmp_path, monkeypatch, name, data=b"media"):
root = tmp_path / "media-cache"
media_file = root / name
media_file.parent.mkdir(parents=True, exist_ok=True)
media_file.write_bytes(data)
monkeypatch.setattr(
"gateway.platforms.base.MEDIA_DELIVERY_SAFE_ROOTS",
(root,),
)
return media_file.resolve()
@staticmethod
def _run_with_loop(adapter, chat_id, media_files, metadata, job):
"""Helper: run _send_media_via_adapter with a real running event loop."""
import asyncio
import threading
"""Helper: run _send_media_via_adapter with immediate scheduling."""
from concurrent.futures import Future
loop = asyncio.new_event_loop()
t = threading.Thread(target=loop.run_forever, daemon=True)
t.start()
try:
_send_media_via_adapter(adapter, chat_id, media_files, metadata, loop, job)
finally:
loop.call_soon_threadsafe(loop.stop)
t.join(timeout=5)
loop.close()
def fake_run_coro(coro, _loop):
coro.close()
completed = Future()
completed.set_result(MagicMock(success=True))
return completed
def test_video_dispatched_to_send_video(self):
with patch("asyncio.run_coroutine_threadsafe", side_effect=fake_run_coro):
_send_media_via_adapter(adapter, chat_id, media_files, metadata, MagicMock(), job)
def test_video_dispatched_to_send_video(self, tmp_path, monkeypatch):
adapter = MagicMock()
adapter.send_video = AsyncMock()
media_files = [("/tmp/clip.mp4", False)]
media_path = self._safe_media_path(tmp_path, monkeypatch, "clip.mp4")
media_files = [(str(media_path), False)]
self._run_with_loop(adapter, "123", media_files, None, {"id": "j1"})
adapter.send_video.assert_called_once()
assert adapter.send_video.call_args[1]["video_path"] == "/tmp/clip.mp4"
assert adapter.send_video.call_args[1]["video_path"] == str(media_path)
def test_unknown_ext_dispatched_to_send_document(self):
def test_unknown_ext_dispatched_to_send_document(self, tmp_path, monkeypatch):
adapter = MagicMock()
adapter.send_document = AsyncMock()
media_files = [("/tmp/report.pdf", False)]
media_path = self._safe_media_path(tmp_path, monkeypatch, "report.pdf")
media_files = [(str(media_path), False)]
self._run_with_loop(adapter, "123", media_files, None, {"id": "j2"})
adapter.send_document.assert_called_once()
assert adapter.send_document.call_args[1]["file_path"] == "/tmp/report.pdf"
assert adapter.send_document.call_args[1]["file_path"] == str(media_path)
def test_multiple_media_files_all_delivered(self):
def test_multiple_media_files_all_delivered(self, tmp_path, monkeypatch):
adapter = MagicMock()
adapter.send_voice = AsyncMock()
adapter.send_image_file = AsyncMock()
media_files = [("/tmp/voice.mp3", False), ("/tmp/photo.jpg", False)]
voice_path = self._safe_media_path(tmp_path, monkeypatch, "voice.mp3")
photo_path = self._safe_media_path(tmp_path, monkeypatch, "photo.jpg")
media_files = [(str(voice_path), False), (str(photo_path), False)]
self._run_with_loop(adapter, "123", media_files, None, {"id": "j3"})
adapter.send_voice.assert_called_once()
adapter.send_image_file.assert_called_once()
@@ -2462,7 +2490,7 @@ class TestSendMediaTimeoutCancelsFuture:
in-flight coroutine must be cancelled before the next file is tried.
"""
def test_media_send_timeout_cancels_future_and_continues(self):
def test_media_send_timeout_cancels_future_and_continues(self, tmp_path, monkeypatch):
"""End-to-end: _send_media_via_adapter with a future whose .result()
raises TimeoutError. Assert cancel() fires and the loop proceeds
to the next file rather than hanging or crashing."""
@@ -2493,9 +2521,19 @@ class TestSendMediaTimeoutCancelsFuture:
coro.close()
return next(futures_iter)
root = tmp_path / "media-cache"
slow = root / "slow.png"
fast = root / "fast.mp4"
slow.parent.mkdir(parents=True)
slow.write_bytes(b"slow")
fast.write_bytes(b"fast")
monkeypatch.setattr(
"gateway.platforms.base.MEDIA_DELIVERY_SAFE_ROOTS",
(root,),
)
media_files = [
("/tmp/slow.png", False), # times out
("/tmp/fast.mp4", False), # succeeds
(str(slow), False), # times out
(str(fast), False), # succeeds
]
loop = MagicMock()
@@ -2509,4 +2547,4 @@ class TestSendMediaTimeoutCancelsFuture:
assert timeout_cancel_calls == [True], "future.cancel() must fire on TimeoutError"
# 2. Second file still got dispatched — one timeout doesn't abort the batch
adapter.send_video.assert_called_once()
assert adapter.send_video.call_args[1]["video_path"] == "/tmp/fast.mp4"
assert adapter.send_video.call_args[1]["video_path"] == str(fast.resolve())
+1 -1
View File
@@ -119,7 +119,7 @@ _ensure_slack_mock()
import discord # noqa: E402 — mocked above
from gateway.platforms.telegram import TelegramAdapter # noqa: E402
from gateway.platforms.discord import DiscordAdapter # noqa: E402
from plugins.platforms.discord.adapter import DiscordAdapter # noqa: E402
import gateway.platforms.slack as _slack_mod # noqa: E402
_slack_mod.SLACK_AVAILABLE = True
+116 -17
View File
@@ -313,19 +313,30 @@ def _scan_for_plugin_adapter_antipattern(source: str) -> list[str]:
return offenses
def pytest_configure(config):
"""Reject plugin-adapter tests that use the sys.path anti-pattern.
def _fingerprint_gateway_tests() -> str:
"""Return a short fingerprint that changes when any gateway test file changes.
Runs once per pytest session on the controller, BEFORE any xdist
worker is spawned. If any file under ``tests/gateway/`` matches the
anti-pattern, we fail the whole session with a clear message
before a polluted ``sys.path`` can cascade across workers.
Uses (mtime, size) pairs instead of content hashing fast to compute
(stat-only, no reads) and sufficient for cache invalidation across
per-file subprocess runs.
"""
# Only run on the xdist controller (or in non-xdist runs). Skip on
# worker subprocesses so we don't scan the filesystem N times.
if hasattr(config, "workerinput"):
return
import hashlib
h = hashlib.sha256()
for path in sorted(_GATEWAY_DIR.rglob("test_*.py")):
try:
st = path.stat()
h.update(f"{path.name}:{st.st_mtime_ns}:{st.st_size}".encode())
except OSError:
h.update(f"{path.name}:missing".encode())
return h.hexdigest()[:16]
def _run_adapter_antipattern_scan() -> list[str]:
"""Scan gateway test files for the plugin-adapter anti-pattern.
Returns a list of violation strings (empty if clean).
"""
violations: list[str] = []
for path in _GATEWAY_DIR.rglob("test_*.py"):
if path.name in {"_plugin_adapter_loader.py", "conftest.py"}:
@@ -334,20 +345,108 @@ def pytest_configure(config):
source = path.read_text(encoding="utf-8")
except OSError:
continue
# Fast string pre-filter: skip files that can't possibly violate.
# A violating file MUST contain both (a) an adapter/plugins/platforms
# reference AND (b) either sys.path manipulation or a bare adapter import.
if "adapter" not in source and "plugins/platforms" not in source:
continue
if not (
"sys.path" in source
or "import adapter" in source
or "from adapter import" in source
):
continue
offenses = _scan_for_plugin_adapter_antipattern(source)
if offenses:
violations.append(
f" {path.relative_to(_GATEWAY_DIR.parent.parent)}:\n "
+ "\n ".join(offenses)
)
return violations
if violations:
raise pytest.UsageError(
"Plugin-adapter-import anti-pattern detected in gateway tests:\n"
+ "\n".join(violations)
+ "\n\n"
+ _GUARD_HINT
)
def pytest_configure(config):
"""Reject plugin-adapter tests that use the sys.path anti-pattern.
Runs once per pytest session on the controller, BEFORE any xdist
worker is spawned. If any file under ``tests/gateway/`` matches the
anti-pattern, we fail the whole session with a clear message
before a polluted ``sys.path`` can cascade across workers.
**Performance**: in the per-file subprocess isolation model (no xdist),
every subprocess is a "controller" so the naive scan would run 257
times, each costing ~1s of AST walking. We avoid this with two
strategies:
1. **Tight string pre-filter**: a file can only violate if it contains
*both* an adapter/plugins/platforms reference *and* a sys.path
manipulation or bare ``import adapter``. This drops ~95% of files
from needing AST parsing.
2. **File-locked cache**: the scan result is cached in
``.pytest-cache/gw-adapter-guard-<fingerprint>`` keyed on a
fingerprint of the gateway test file mtimes/sizes. Concurrent
subprocesses acquire a lock; only the first performs the scan;
the rest wait and read the cached result.
"""
# Only run on the xdist controller (or in non-xdist runs). Skip on
# worker subprocesses so we don't scan the filesystem N times.
if hasattr(config, "workerinput"):
return
fp = _fingerprint_gateway_tests()
cache_dir = Path.cwd() / ".pytest-cache"
cache_file = cache_dir / f"gw-adapter-guard-{fp}"
lock_file = cache_dir / f".gw-adapter-guard-{fp}.lock"
cache_dir.mkdir(parents=True, exist_ok=True)
# Evict stale cache entries from previous fingerprints (best-effort).
try:
for old in cache_dir.glob("gw-adapter-guard-*"):
if old.name != f"gw-adapter-guard-{fp}":
old.unlink(missing_ok=True)
for old in cache_dir.glob(".gw-adapter-guard-*.lock"):
if old.name != f".gw-adapter-guard-{fp}.lock":
old.unlink(missing_ok=True)
except OSError:
pass # Non-critical; old files are harmless.
# Use filelock to ensure only one process scans at a time.
# Concurrent subprocesses all hit pytest_configure simultaneously;
# without a lock they'd all find no cache and all run the scan.
try:
from filelock import FileLock
lock = FileLock(str(lock_file), timeout=120)
except ImportError:
# Fallback: no locking (still correct, just slower under contention).
import contextlib
class _NoLock:
def __enter__(self):
return self
def __exit__(self, *a):
pass
lock = _NoLock()
with lock:
if cache_file.exists():
cached = cache_file.read_text(encoding="utf-8")
if cached == "clean":
return
raise pytest.UsageError(cached)
# Slow path: this process is the first to acquire the lock.
violations = _run_adapter_antipattern_scan()
if violations:
msg = (
"Plugin-adapter-import anti-pattern detected in gateway tests:\n"
+ "\n".join(violations)
+ "\n\n"
+ _GUARD_HINT
)
cache_file.write_text(msg, encoding="utf-8")
raise pytest.UsageError(msg)
else:
cache_file.write_text("clean", encoding="utf-8")
+43
View File
@@ -71,3 +71,46 @@ class TestResolveRuntimeAgentKwargsAuthFallback:
from gateway.run import _resolve_runtime_agent_kwargs
with pytest.raises(RuntimeError):
_resolve_runtime_agent_kwargs()
def test_legacy_fallback_is_appended_after_fallback_providers(self, tmp_path, monkeypatch):
"""When both keys exist, the legacy entry still participates in resolution."""
config_path = tmp_path / "config.yaml"
config_path.write_text(
"fallback_providers:\n"
" - provider: openrouter\n"
" model: anthropic/claude-sonnet-4.6\n"
"fallback_model:\n"
" provider: nous\n"
" model: Hermes-4\n"
)
monkeypatch.setattr("gateway.run._hermes_home", tmp_path)
calls = []
def _mock_resolve(**kwargs):
requested = kwargs.get("requested")
calls.append(requested)
if requested == "openrouter":
raise RuntimeError("openrouter unavailable")
return {
"api_key": "nous-key",
"base_url": "https://portal.nousresearch.com/v1",
"provider": "nous",
"api_mode": "chat_completions",
"command": None,
"args": None,
"credential_pool": None,
}
with patch(
"hermes_cli.runtime_provider.resolve_runtime_provider",
side_effect=_mock_resolve,
):
from gateway.run import _try_resolve_fallback_provider
result = _try_resolve_fallback_provider()
assert calls == ["openrouter", "nous"]
assert result["provider"] == "nous"
assert result["model"] == "Hermes-4"
@@ -81,7 +81,7 @@ def _ensure_discord_mock():
_ensure_discord_mock()
from gateway.platforms.discord import _build_allowed_mentions # noqa: E402
from plugins.platforms.discord.adapter import _build_allowed_mentions # noqa: E402
# The four DISCORD_ALLOW_MENTION_* env vars that _build_allowed_mentions reads.
@@ -58,7 +58,7 @@ def _ensure_discord_mock():
_ensure_discord_mock()
from gateway.platforms.discord import DiscordAdapter # noqa: E402
from plugins.platforms.discord.adapter import DiscordAdapter # noqa: E402
from gateway.platforms.base import MessageType # noqa: E402
@@ -146,10 +146,10 @@ class TestCacheDiscordImage:
att = _make_attachment_with_read(_PNG_BYTES)
with patch(
"gateway.platforms.discord.cache_image_from_bytes",
"plugins.platforms.discord.adapter.cache_image_from_bytes",
return_value="/tmp/cached.png",
) as mock_bytes, patch(
"gateway.platforms.discord.cache_image_from_url",
"plugins.platforms.discord.adapter.cache_image_from_url",
new_callable=AsyncMock,
) as mock_url:
result = await adapter._cache_discord_image(att, ".png")
@@ -165,9 +165,9 @@ class TestCacheDiscordImage:
att = _make_attachment_without_read()
with patch(
"gateway.platforms.discord.cache_image_from_bytes",
"plugins.platforms.discord.adapter.cache_image_from_bytes",
) as mock_bytes, patch(
"gateway.platforms.discord.cache_image_from_url",
"plugins.platforms.discord.adapter.cache_image_from_url",
new_callable=AsyncMock,
return_value="/tmp/from_url.png",
) as mock_url:
@@ -186,10 +186,10 @@ class TestCacheDiscordImage:
att = _make_attachment_with_read(b"<html>forbidden</html>")
with patch(
"gateway.platforms.discord.cache_image_from_bytes",
"plugins.platforms.discord.adapter.cache_image_from_bytes",
side_effect=ValueError("not a valid image"),
), patch(
"gateway.platforms.discord.cache_image_from_url",
"plugins.platforms.discord.adapter.cache_image_from_url",
new_callable=AsyncMock,
return_value="/tmp/fallback.png",
) as mock_url:
@@ -210,10 +210,10 @@ class TestCacheDiscordAudio:
att = _make_attachment_with_read(_OGG_BYTES)
with patch(
"gateway.platforms.discord.cache_audio_from_bytes",
"plugins.platforms.discord.adapter.cache_audio_from_bytes",
return_value="/tmp/voice.ogg",
) as mock_bytes, patch(
"gateway.platforms.discord.cache_audio_from_url",
"plugins.platforms.discord.adapter.cache_audio_from_url",
new_callable=AsyncMock,
) as mock_url:
result = await adapter._cache_discord_audio(att, ".ogg")
@@ -228,7 +228,7 @@ class TestCacheDiscordAudio:
att = _make_attachment_without_read()
with patch(
"gateway.platforms.discord.cache_audio_from_url",
"plugins.platforms.discord.adapter.cache_audio_from_url",
new_callable=AsyncMock,
return_value="/tmp/from_url.ogg",
) as mock_url:
@@ -267,7 +267,7 @@ class TestCacheDiscordDocument:
att = _make_attachment_without_read() # no .read → forces fallback
with patch(
"gateway.platforms.discord.is_safe_url", return_value=False
"plugins.platforms.discord.adapter.is_safe_url", return_value=False
) as mock_safe, patch("aiohttp.ClientSession") as mock_session:
with pytest.raises(ValueError, match="SSRF"):
await adapter._cache_discord_document(att, ".pdf")
@@ -295,7 +295,7 @@ class TestCacheDiscordDocument:
session.__aexit__ = AsyncMock(return_value=False)
with patch(
"gateway.platforms.discord.is_safe_url", return_value=True
"plugins.platforms.discord.adapter.is_safe_url", return_value=True
), patch("aiohttp.ClientSession", return_value=session):
result = await adapter._cache_discord_document(att, ".pdf")
@@ -320,10 +320,10 @@ class TestHandleMessageUsesAuthenticatedRead:
adapter.handle_message = AsyncMock()
with patch(
"gateway.platforms.discord.cache_image_from_bytes",
"plugins.platforms.discord.adapter.cache_image_from_bytes",
return_value="/tmp/img_from_read.png",
), patch(
"gateway.platforms.discord.cache_image_from_url",
"plugins.platforms.discord.adapter.cache_image_from_url",
new_callable=AsyncMock,
) as mock_url_download:
att = SimpleNamespace(
@@ -342,7 +342,7 @@ class TestHandleMessageUsesAuthenticatedRead:
# Patch the DMChannel isinstance check so our fake counts as DM.
monkeypatch.setattr(
"gateway.platforms.discord.discord.DMChannel",
"plugins.platforms.discord.adapter.discord.DMChannel",
_FakeDMChannel,
)
chan = _FakeDMChannel()
@@ -368,7 +368,7 @@ class TestHandleMessageUsesAuthenticatedRead:
adapter.handle_message = AsyncMock()
with patch(
"gateway.platforms.discord.cache_audio_from_bytes",
"plugins.platforms.discord.adapter.cache_audio_from_bytes",
return_value="/tmp/voice_from_read.ogg",
):
att = SimpleNamespace(
@@ -386,7 +386,7 @@ class TestHandleMessageUsesAuthenticatedRead:
name = "dm"
monkeypatch.setattr(
"gateway.platforms.discord.discord.DMChannel",
"plugins.platforms.discord.adapter.discord.DMChannel",
_FakeDMChannel,
)
chan = _FakeDMChannel()
@@ -412,7 +412,7 @@ class TestHandleMessageUsesAuthenticatedRead:
adapter.handle_message = AsyncMock()
with patch(
"gateway.platforms.discord.cache_audio_from_bytes",
"plugins.platforms.discord.adapter.cache_audio_from_bytes",
return_value="/tmp/audio_from_read.ogg",
):
att = SimpleNamespace(
@@ -430,7 +430,7 @@ class TestHandleMessageUsesAuthenticatedRead:
name = "dm"
monkeypatch.setattr(
"gateway.platforms.discord.discord.DMChannel",
"plugins.platforms.discord.adapter.discord.DMChannel",
_FakeDMChannel,
)
chan = _FakeDMChannel()
@@ -45,8 +45,8 @@ def _ensure_discord_mock():
_ensure_discord_mock()
import gateway.platforms.discord as discord_platform # noqa: E402
from gateway.platforms.discord import DiscordAdapter # noqa: E402
import plugins.platforms.discord.adapter as discord_platform # noqa: E402
from plugins.platforms.discord.adapter import DiscordAdapter # noqa: E402
class FakeDMChannel:
@@ -58,7 +58,7 @@ def _install_fake_agent(monkeypatch):
def _make_adapter():
_ensure_discord_mock()
from gateway.platforms.discord import DiscordAdapter
from plugins.platforms.discord.adapter import DiscordAdapter
adapter = object.__new__(DiscordAdapter)
adapter.config = MagicMock()
+1 -1
View File
@@ -5,7 +5,7 @@ import pytest
def _make_adapter():
"""Create a minimal DiscordAdapter with mocked config."""
from gateway.platforms.discord import DiscordAdapter
from plugins.platforms.discord.adapter import DiscordAdapter
adapter = object.__new__(DiscordAdapter)
adapter.config = MagicMock()
adapter.config.extra = {}
@@ -26,7 +26,7 @@ if _repo not in sys.path:
# Triggers the shared discord mock from tests/gateway/conftest.py before
# importing the production module.
from gateway.platforms.discord import ( # noqa: E402
from plugins.platforms.discord.adapter import ( # noqa: E402
ClarifyChoiceView,
DiscordAdapter,
)
+1 -1
View File
@@ -18,7 +18,7 @@ import pytest
# Trigger the shared discord mock from tests/gateway/conftest.py before
# importing the production module.
from gateway.platforms.discord import ( # noqa: E402
from plugins.platforms.discord.adapter import ( # noqa: E402
ExecApprovalView,
ModelPickerView,
SlashConfirmView,
+2 -2
View File
@@ -67,8 +67,8 @@ def _ensure_discord_mock():
_ensure_discord_mock()
import gateway.platforms.discord as discord_platform # noqa: E402
from gateway.platforms.discord import DiscordAdapter # noqa: E402
import plugins.platforms.discord.adapter as discord_platform # noqa: E402
from plugins.platforms.discord.adapter import DiscordAdapter # noqa: E402
@pytest.fixture(autouse=True)
@@ -57,8 +57,8 @@ def _ensure_discord_mock():
_ensure_discord_mock()
import gateway.platforms.discord as discord_platform # noqa: E402
from gateway.platforms.discord import DiscordAdapter # noqa: E402
import plugins.platforms.discord.adapter as discord_platform # noqa: E402
from plugins.platforms.discord.adapter import DiscordAdapter # noqa: E402
# ---------------------------------------------------------------------------
@@ -371,7 +371,7 @@ class TestIncomingDocumentHandling:
async def test_image_attachment_unaffected(self, adapter):
"""Image attachments should still go through the image path, not the document path."""
with patch(
"gateway.platforms.discord.cache_image_from_url",
"plugins.platforms.discord.adapter.cache_image_from_url",
new_callable=AsyncMock,
return_value="/tmp/cached_image.png",
):
+2 -2
View File
@@ -45,8 +45,8 @@ def _ensure_discord_mock():
_ensure_discord_mock()
import gateway.platforms.discord as discord_platform # noqa: E402
from gateway.platforms.discord import DiscordAdapter # noqa: E402
import plugins.platforms.discord.adapter as discord_platform # noqa: E402
from plugins.platforms.discord.adapter import DiscordAdapter # noqa: E402
class FakeDMChannel:

Some files were not shown because too many files have changed in this diff Show More