Compare commits

...

341 Commits

Author SHA1 Message Date
dependabot[bot] 0fe0e499c0 chore(actions)(deps): bump actions/create-github-app-token
Bumps [actions/create-github-app-token](https://github.com/actions/create-github-app-token) from 1.9.3 to 3.2.0.
- [Release notes](https://github.com/actions/create-github-app-token/releases)
- [Changelog](https://github.com/actions/create-github-app-token/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/create-github-app-token/compare/7bfa3a4717ef143a604ee0a99d859b8886a96d00...bcd2ba49218906704ab6c1aa796996da409d3eb1)

---
updated-dependencies:
- dependency-name: actions/create-github-app-token
  dependency-version: 3.2.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-05-25 17:11:44 +00:00
alt-glitch b62af47da8 chore: drop stale line-number reference in PRIORITY path comment
The cherry-pick comment referenced 'line ~6771' for the /stop handler,
but on current main the handler is at a different offset. Remove the
hard-coded line number — the 'above' reference is sufficient.
2026-05-25 16:23:24 +00:00
xxxigm 737ee81167 test(gateway): regression tests for #30170 subagent interrupt protection
17 new tests in tests/gateway/test_subagent_protection_30170.py pin
down both the detection helper and the demotion behaviour:

  * TestAgentHasActiveSubagents — 11 cases covering the precision and
    defensiveness of _agent_has_active_subagents:
      - returns False for None, _AGENT_PENDING_SENTINEL, and stub
        agents that lack the _active_children attribute;
      - returns False for an empty list (the steady state of an idle
        AIAgent);
      - returns True for one or many children;
      - works when _active_children_lock is None (test stubs);
      - rejects truthy MagicMock auto-attributes — this is the
        regression-guard for "every MagicMock-based gateway test
        suddenly demotes to queue mode" (which is how this was
        originally found);
      - accepts list/tuple/set as the children container.

  * TestBusyHandlerDemotesInterruptForSubagents — 6 cases driving
    _handle_active_session_busy_message directly:
      - parent.interrupt is NOT called when subagents are active,
        message is still merged into the pending queue;
      - ack copy mentions "Subagent working", "queued", and the
        /stop escape hatch — and does NOT mention "Interrupting";
      - with no subagents, behaviour is byte-identical to the
        pre-#30170 interrupt path (parent.interrupt called with the
        user text, ack says "Interrupting");
      - configured queue mode keeps its vanilla "Queued for the next
        turn" ack (the #30170 demotion-specific copy must NOT fire);
      - configured steer mode still routes to running_agent.steer()
        even when subagents are active (the guard is interrupt-only);
      - _AGENT_PENDING_SENTINEL does not trigger demotion.

Refs #30170.
2026-05-25 16:23:24 +00:00
xxxigm 99d62f6ba1 fix(gateway): protect in-flight subagents from busy-mode interrupts (#30170)
When a user sends a conversational follow-up while delegate_task is
running, gateway/run.py calls running_agent.interrupt(event.text) on
the PARENT agent. AIAgent.interrupt() then cascades synchronously
through self._active_children and calls interrupt() on every child
subagent, aborting in-flight delegate_task work. The user sees the
fallback cascade with no root-cause in the gateway log, and minutes of
subagent progress are destroyed — the exact failure mode reported in

Add GatewayRunner._agent_has_active_subagents(running_agent) — a
static helper that returns True iff the parent is currently driving
subagents via delegate_task. The helper is type-defensive: it ignores
truthy MagicMock auto-attributes (so this doesn't accidentally fire
in every test mock that hits the busy path), the _AGENT_PENDING_SENTINEL
placeholder, and missing locks.

Wire the helper into both interrupt branches:

  1. _handle_active_session_busy_message — the adapter-level busy
     handler. When busy_input_mode == 'interrupt' AND the parent has
     active subagents, demote to 'queue' semantics: skip the
     parent.interrupt() call, merge the message into the pending
     queue, and surface a dedicated ack (" Subagent working — your
     message is queued for when it finishes (use /stop to cancel
     everything).") so the operator knows the message wasn't lost and
     discovers the explicit escape hatch.

  2. The PRIORITY interrupt branch inside _handle_message — the
     non-command fast path. Same rationale, same demotion. Routes
     through _queue_or_replace_pending_event so the next-turn pickup
     stays unchanged.

Explicit /stop and /new commands take a completely different path
(_interrupt_and_clear_session in the slash-command dispatch at line
~6771) and are NOT affected by this guard — the operator still has a
way to force-cancel everything when they actually mean it. Configured
'queue' and 'steer' modes are also untouched: 'queue' already does the
right thing, and 'steer' goes through running_agent.steer() which does
NOT cascade to children (so subagents survive a steer too).

This is Phase 1 of the fix outlined in #30170 — the minimum viable
change that stops subagent loss. Phase 2 (delegation-aware steer
forwarding to active children) and Phase 3 (async delegation, #11508)
are intentionally out of scope.

Refs #30170.
2026-05-25 16:23:24 +00:00
brooklyn! 50aaf0c4ad fix(tui): delineate assistant responses from details (#31087)
* fix(tui): delineate assistant responses from details

Add a muted Response marker before assistant text when thinking/tool details are visible so reasoning and final output do not visually run together.

* fix(tui): account for response separator height

Keep virtual transcript estimates aligned with the new response separator and avoid allocating trimmed copies of long assistant text.

* fix(tui): gate response separator estimate on details

Only add response-separator height when assistant details actually render, and use a non-allocating body-text check.

* fix(tui): skip empty detail height estimates

Do not add virtual transcript height for assistant details when no thinking or tool detail UI will render.

* fix(tui): estimate details by section visibility

Pass resolved thinking/tool visibility into virtual height estimates so hidden detail sections do not reserve response-separator rows.
2026-05-25 10:23:03 -05:00
brooklyn! 0ec0cafdd0 Merge pull request #31084 from NousResearch/bb/tui-right-click-copy-selection
fix(tui): right-click copies active transcript selection
2026-05-25 10:22:43 -05:00
Savanne Kham 4117fc3645 fix(credential-pool): correct pool rotation when weekly usage limit is reached
After key #1 is marked exhausted the retry still called the API with key #1
due to env-var bias in _get_cached_client / resolve_api_key_provider_credentials.
Fix: peek the pool and pass the active entry's key as explicit_api_key.
Secondary: api_key_hint in mark_exhausted_and_rotate pins the correct entry
under concurrent CLI+gateway calls; _is_payment_error matches GoUsageLimitError;
extract_api_error_context parses "Resets in Xhr Ymin".
2026-05-25 06:32:30 -07:00
Teknium 8f19485f53 chore(release): map kylekahraman email to GitHub login
Required by CI author validation after salvaging PR #29723.
2026-05-25 06:23:18 -07:00
kylekahraman ab42658dfc feat: configurable paste collapse thresholds (TUI + CLI)
Adds two new config keys:
- paste_collapse_threshold (default: 5) — line count threshold for
  bracketed paste collapse in both TUI and CLI
- paste_collapse_threshold_fallback (default: 0, disabled) — same for
  the fallback heuristic in terminals without bracketed paste support

TUI frontend reads these from config.get full via applyDisplay/patchUiState.
CLI reads from self.config at paste-handling time.

Closes #5626
Related: #5623
2026-05-25 06:23:18 -07:00
zccyman 973bb124a4 fix(credential-pool): rotate immediately when credential already exhausted
Closes #26145.

When the user interrupts the retry loop between two 429s (Ctrl-C in
interactive mode, /new, gateway disconnect), the local has_retried_429
flag dies with the recovery function. On the next user prompt the agent
restarts with has_retried_429=False, hits 429 on the exhausted credential,
sets the flag, returns 'retry once'. Repeat forever — the second 429 that
would trigger rotation is never reached, and healthy entries (priority>0
free/paid accounts) are never tried.

Fix: in recover_with_credential_pool's rate_limit branch, pre-check
pool.current().last_status before running the retry-once dance. If the
current entry is already STATUS_EXHAUSTED, rotate immediately. Uses
getattr() for the attribute read so existing tests with SimpleNamespace
mocks (which only set 'label') keep working.

Co-authored-by: zccyman <16263913+zccyman@users.noreply.github.com>
2026-05-25 06:21:28 -07:00
Teknium 0a6a0ba527 test(skills): widen assertion in PR#6656 regression to accept new validator msg
The new install-path validator from this PR raises 'Unsafe install path:
...' earlier in the pipeline than the previous resolve-then-check path.
Behavior is identical (ok=False, victim untouched, refused before
rmtree) — only the error string changed.
2026-05-25 06:13:36 -07:00
峯岸 亮 3b9b9a7ad7 fix(skills): guard uninstall lock paths
Validate Skills Hub lock-file install paths at both ends of the
lifecycle so a poisoned or malformed lock.json entry cannot drive
shutil.rmtree to a location outside SKILLS_DIR:

- HubLockFile.record_install rejects empty/'.'/absolute/traversal/
  Windows-drive paths at write time, and requires the final path
  component to match the skill name (shape: '<skill>' or
  '<category>/<skill>').
- install_from_quarantine resolves its destination through the same
  validator, catching symlink/junction redirects inside skills/.
- uninstall_skill resolves the lock entry through the new validator
  before rmtree. Refuses anything that resolves to SKILLS_DIR itself
  (empty/dot paths) or to a target outside SKILLS_DIR (absolute paths,
  traversal, symlinked dirs in skills/ pointing outward).
- 14 focused regression tests covering each rejection class plus a
  symlink-redirect case.

E2E verified: hand-crafted poisoned lock.json entries (absolute path,
empty install_path, traversal) all refuse and leave the targeted
victim untouched; legitimate uninstall still succeeds.

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-25 06:13:36 -07:00
Teknium 0d137f1039 feat(errors): actionable guidance for Nous OAuth 401s (#32082)
Nous Portal is OAuth-only (auth_type=oauth_device_code, no API key path),
but the non-retryable-401 guidance branch only covered openai-codex and
xai-oauth. A Nous 401 fell through to the generic 'Your API key was
rejected... run hermes setup' message, which is wrong advice — the user
needs hermes auth add nous --type oauth, not an API key.

Also flag the case where the failing model slug ends in :free (OpenRouter
syntax) while provider is nous. Without that hint, users re-OAuth
successfully and then hit the same 401 on the next message because Nous
Portal doesn't carry the OpenRouter free-tier slug.

Reported by ashh — debug dump showed Nous device_code exhausted +
deepseek/deepseek-v4-flash:free as the model.
2026-05-25 06:06:51 -07:00
wysie dbe5d84972 fix(auxiliary): universal main-model fallback for aux tasks (#31845)
Aux callers (title generation, vision, session search, etc.) can reach
resolve_provider_client() without an explicit model when the user
picked their main provider via 'hermes model' and didn't bother
configuring a per-task auxiliary.<task>.model override.  The
expectation in that case is universal: 'use my main model for side
tasks too.'

Before, the OAuth providers (xai-oauth, openai-codex) silently
returned (None, None) on an empty model — both lack a catalog default
because their accepted-model lists drift on the backend.  That caused
_resolve_auto to drop to its Step-2 fallback chain (OpenRouter /
Nous / etc.), so aux tasks billed against the wrong subscription
without warning.

The fix is at the top of resolve_provider_client() — a single
3-step universal fallback that runs before any provider branch, so
no provider-specific empty-model guards are needed (now or for any
future provider we add):

    1. caller-passed model (caller knew what they wanted)
    2. provider's catalog default (cheap aux model, if registered)
    3. user's main model from config.yaml

Behaviour by provider class:

- OAuth providers (xai-oauth, openai-codex) — no catalog default, so
  step 3 applies.  Title gen runs on grok-4.3 / gpt-5.4 against the
  user's actual subscription instead of leaking to OpenRouter.
- API-key providers (anthropic, gemini, kimi-coding, etc.) — catalog
  default wins at step 2, preserving the original 'cheap aux model'
  behaviour.  Anthropic users still get claude-haiku-4-5 for titles,
  not opus.
- Explicit-model callers (auxiliary.<task>.model config, programmatic
  callers) — caller wins at step 1, no surprise switching.

Salvaged from @wysie's PR #31845 which fixed the xai-oauth branch
specifically.  The universal shape supersedes the per-branch fix
and covers openai-codex (same bug class) plus any future OAuth
providers.

4 new tests in TestResolveProviderClientUniversalModelFallback:

- empty_model_for_oauth_provider_falls_back_to_main_model
- empty_model_for_codex_also_uses_main_model
- empty_model_for_catalog_provider_uses_catalog_default
- explicit_model_takes_precedence_over_fallbacks

365/365 across tests/agent/test_auxiliary_*, tests/run_agent/test_codex_xai_oauth_recovery.py, tests/hermes_cli/test_auth_xai_oauth_provider.py, and tests/hermes_cli/test_plugin_auxiliary_tasks.py.

Co-authored-by: wysie <wysie@users.noreply.github.com>
2026-05-25 05:50:56 -07:00
Teknium 46c1ae8b24 fix(tests): four pre-existing flakes from the security cluster merge (#32072)
All four failures were broken by the security cluster (#10082 / #10133 /
#4609 / symlink-reject batch) merging on May 25. They were red on
origin/main HEAD when #32042 and #32061 ran, gating PRs that touched
unrelated code.

1) tests/hermes_cli/test_update_zip_symlink_reject.py
   test_update_via_zip_accepts_normal_member called the real
   _update_via_zip without sandboxing PROJECT_ROOT — so the function's
   shutil.copytree() actually copied the fake README from the test ZIP
   over the real repo's README.md, which then made
   test_readme_mentions_powershell_installer fail in any test run that
   happened to pick this test up earlier. Mock PROJECT_ROOT to an
   isolated tmp_path / install_dir, stub subprocess so pip/uv reinstall
   doesn't actually run, and assert the fake README lands in the
   sandbox (not the real tree).

2) tests/tools/test_windows_native_support.py
   test_readme_mentions_powershell_installer was the victim of (1) —
   nothing wrong with the test itself, the fix in (1) clears it.

3) tests/tools/test_file_read_guards.py
   test_proc_fd_other_not_blocked called _is_blocked_device('/proc/self/fd/3')
   expecting False. But _is_blocked_device runs realpath() and on
   pytest xdist workers fd 3 happens to be dup'd to /dev/urandom
   (because the worker subprocess inherits open fds from pytest's
   collection pipe machinery). Switch to the lower-level
   _is_blocked_device_path which is the path-pattern check the test
   actually means to exercise; realpath-resolution coverage already
   lives in test_symlink_to_blocked_device_is_blocked.

4) tests/tools/test_transcription_tools.py
   Module installed a faster_whisper stub via sys.modules without
   setting __spec__, then later @pytest.mark.skipif called
   importlib.util.find_spec('faster_whisper') which raises
   'ValueError: __spec__ is None' for modules with a None spec attr.
   Set __spec__ on the stub to a real ModuleSpec.

Validation: 195/195 green across the 4 affected files.
2026-05-25 05:50:29 -07:00
alt-glitch f5bb595d51 chore(release): map 8bit64k + hclsys in AUTHOR_MAP 2026-05-25 12:48:46 +00:00
alt-glitch 85a0b3424e test(tui): regression test for /q alias resolving to queue (#31983)
Adapted from @hclsys's test in PR #31985. Asserts findSlashCommand('q')
resolves to the queue command, not quit.
2026-05-25 12:48:46 +00:00
8bit64K 064ac28cbd fix(tui): remove 'q' alias from /quit, add to /queue
The TUI frontend's slash command registry shadowed /queue's 'q' alias
with /quit's 'q' alias. Since /quit appeared later in the registry,
the flat lookup kept the later entry, making /q always quit instead
of queueing a prompt.

This mirrors the backend fix in PR #10538 (hermes_cli/commands.py)
but applies the same correction to the TUI TypeScript registry.

Fixes #10467
2026-05-25 12:48:46 +00:00
Teknium 8191f663dd feat(mcp-oauth): accept 'skip' at paste prompt to bypass auth without disabling server (#32069)
When an MCP server triggers OAuth at startup, the user can now type 'skip'
(or 'cancel', 's', 'n', 'no', 'q', 'quit') at the paste prompt + Enter to
exit the flow cleanly and continue agent startup without that server.

Previously the only ways to bypass an unwanted OAuth prompt were:
  - Wait the full 5-minute paste timeout
  - Ctrl+C (also kills the whole reload, may leave half-state)
  - Edit config.yaml to set 'enabled: false' on the server

Skip writes a sentinel to result['error'] which _wait_for_callback maps to
OAuthNonInteractiveError('user_skipped'). mcp_tool already classifies that
as an auth error in _is_auth_error() and the reconnect loop logs it as
'not retrying automatically' — server stays disconnected for the session,
other MCP servers continue normally, no infinite retry burn.

The skip message tells users how to re-auth later ('hermes mcp login') or
disable persistently ('enabled: false'), so they don't have to remember.

14 new tests covering: case-insensitive skip parsing, all 7 skip tokens,
skip not stomping an HTTP-listener win, skip routed to skip path rather
than URL-parse path, sentinel mapped to OAuthNonInteractiveError, prompt
mentions the skip option.
2026-05-25 05:37:30 -07:00
Teknium bdf3696705 docs(mcp-oauth): document paste-back flow and SSH options for remote MCP OAuth (#32067)
Follow-up to #32053. The OAuth-over-SSH guide and the MCP feature page
previously only covered xAI and Spotify. Now that MCP servers can complete
OAuth via stdin paste-back on remote/headless hosts, document it.

oauth-over-ssh.md:
- Add MCP servers to the 'Which Providers Need This' table.
- New 'MCP Servers' section covering: paste-back (no setup, works
  anywhere), SSH port forward (same pattern as xAI/Spotify), and the 30s
  config-auto-reload race pitfall (use 'hermes mcp login <server>' from a
  fresh terminal instead of editing config from inside a running session).

mcp.md:
- New 'OAuth-authenticated HTTP servers' section under HTTP servers,
  covering auth: oauth config, token cache path, paste-back vs SSH
  tunnel for headless hosts, and the same reload-race pitfall.
- Cross-links to the OAuth-over-SSH guide anchor.
2026-05-25 05:35:47 -07:00
Teknium 1c3c364287 feat(cli): show live background terminal-process count in status bar (#32061)
The CLI status bar tracked /background agent tasks (▶ N) but not shell
processes spawned via terminal(background=true). Both kinds of work can
run concurrently and a user has no in-bar signal for shell processes.

Add an independent indicator (⚙ N) sourced from
tools.process_registry.process_registry._running. The two indicators
render side-by-side when both are active (▶ 1 │ ⚙ 2), hidden when their
count is zero. Renders at all four status-bar tiers (text fallback +
prompt_toolkit fragments, narrow + wide widths). The narrow <52 tier
still drops both for space — unchanged.

New ProcessRegistry.count_running() returns len(_running) without
acquiring _lock; CPython dict len is atomic and we're polling on every
status-bar tick, so lock-free is the right tradeoff.
2026-05-25 05:35:02 -07:00
teknium1 2b16de0ec3 chore(release): map adam91holt for PR #31984 salvage 2026-05-25 05:34:42 -07:00
adam91holt 8601c4d44c fix(codex): add time-to-first-byte watchdog for stalled Codex streams
The chatgpt.com/backend-api/codex endpoint has an intermittent failure mode
where it accepts the connection but never emits a single stream event — the
socket just hangs. Direct sequential probing reproduces it (0 events, no HTTP
status), and a fresh reconnect then succeeds in ~2s. Today the only guard is
the wall-clock stale timeout in interruptible_api_call, so a dead-on-arrival
connection is held for the full stale window (90-900s depending on context /
config) before the retry loop can reconnect — minutes of wasted wall time per
stall, at a rate of ~20% of calls during affected windows.

Add a TTFB watchdog scoped to the codex_responses path:

- codex_runtime.run_codex_stream stamps agent._codex_stream_last_event_ts on
  *every* stream event (not just output-text deltas), so reasoning-only and
  tool-call-only turns are not mistaken for a stall.
- interruptible_api_call resets that marker before the worker starts and, while
  it is still None, kills the connection once elapsed exceeds the TTFB cutoff
  (default 45s, tunable via HERMES_CODEX_TTFB_TIMEOUT_SECONDS, 0 disables). The
  raised TimeoutError flows through the existing retry path unchanged.

Once any event has arrived the stream is healthy and only the existing
wall-clock stale timeout applies, so legitimate long generations are never
interrupted. Gated to codex_responses; the chat_completions non-stream,
anthropic and bedrock branches have no first-event signal and are untouched.

Adds tests/agent/test_codex_ttfb_watchdog.py covering the stall kill, the
events-flowing pass-through, and the env-disable path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-25 05:34:42 -07:00
Teknium a989a79c0c fix(gateway): allow native delivery of freshly-produced agent files (#32060)
The gateway's media delivery allowlist required files live inside
`~/.hermes/cache/{documents,images,...}`, which is the wrong shape for
real agent usage. Agents naturally produce artifacts via terminal tools
(`pandoc -o /tmp/report.pdf`, `matplotlib savefig`, etc.) or
write_file into project directories — these never land under the cache.
Result: users got a raw file path in chat instead of an attachment.

This is doubly bad in deployment shapes where the cache directories
aren't writable by the agent at all: Hermes running in Docker with a
read-only mount, or with a Docker/Modal/SSH terminal backend whose
filesystem isn't the gateway host's filesystem.

Layered trust model:

1. Cache-dir allowlist (unchanged) — Hermes-managed roots always trusted.
2. Operator allowlist — `HERMES_MEDIA_ALLOW_DIRS` env var, now also
   surfaced as `gateway.media_delivery_allow_dirs` in config.yaml.
3. Recency-based trust (new, default on) — files whose mtime is within
   `gateway.trust_recent_files_seconds` (default 600s) of "now" are
   trusted even outside the cache/operator allowlist. Old host files
   (`/etc/passwd`, `~/.bashrc`, `~/.ssh/id_rsa`) have mtimes measured
   in days/months, well outside the window — prompt-injection paths
   pointing at pre-existing files are still rejected.
4. Hard denylist — `/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`,
   `/var/{log,lib,run}`, plus `$HOME/.{ssh,aws,gnupg,kube,docker,config,
   azure,gcloud}` and `Library/Keychains`. Denylist blocks delivery
   even when recency would trust the file, in case an attacker
   somehow refreshes a sensitive file's mtime.

Operators who want strict-allowlist behavior set
`gateway.trust_recent_files: false` and the system reverts to
pre-existing behavior.

Tests: 6 new cases in test_platform_base.py cover the recency window,
disabled mode, system-path denylist, and the motivating PDF-in-project
scenario. 3 existing tests (test_platform_base, test_tts_media_routing,
test_send_message_tool) that exercised the strict-allowlist path are
updated to disable recency trust explicitly.

E2E validation: real `validate_media_delivery_path()` accepts fresh
PDFs in /tmp and project dirs, rejects /etc/passwd, ~/.ssh/id_rsa, and
files older than the window; config.yaml `gateway.*` keys bridge
correctly to the env vars the validator reads.
2026-05-25 05:34:31 -07:00
Teknium 0ff7c09e2f feat(mcp-oauth): stdin paste-back fallback for headless OAuth flow (#32053)
When the user runs OAuth on a remote/SSH machine without a port forward,
the OAuth provider redirects to http://127.0.0.1:<port>/callback which
only the listener on the remote machine can receive — the user's browser
on another box just shows a connection error.

_wait_for_callback() now races the HTTP listener against a stdin reader
on interactive TTYs. The user can copy the URL from the browser's address
bar after authorization (which contains code=...&state=...) and paste it
back at the prompt. Whichever fills the result dict first wins; the HTTP
listener remains the primary path for local sessions and SSH tunnels.

Accepts any of:
  - Full local redirect URL: http://127.0.0.1:N/callback?code=...&state=...
  - Provider URL after redirect: https://mcp.linear.app/callback?code=...&state=...
  - Just the query string: ?code=...&state=... or code=...&state=...

The paste thread only spawns when _is_interactive() is true, preserving
the existing 'no input() in headless runs' invariant — verified by
TestWaitForCallbackPasteIntegration.test_paste_prompt_NOT_shown_when_noninteractive.

The SSH-session hint in _redirect_handler is updated to surface the paste
option as the primary remedy, with ssh -L tunneling as the alternative.
2026-05-25 05:20:05 -07:00
teknium1 e9119e0eb8 chore(release): map dsr-restyn + WuKongAI-CMU + codeblackhole1024 for S04 cluster 2026-05-25 05:15:55 -07:00
codeblackhole1024 bd2756dd22 fix(update): reject symlink members in update ZIP
_update_via_zip downloads a source ZIP from GitHub and calls
zipfile.ZipFile.extractall. The existing zip-slip path guard validates
each member's path stays under tmp_dir, but does not check member type
— so a ZIP containing a symlink member would still be materialized by
extractall, and a symlink target could point outside the extracted
tree (or to a sensitive system path).

This isn't a high-likelihood threat for hermes-agent's actual GitHub
source ZIPs (we don't ship symlinks), but the extractall path runs as
the user's account and a compromised mirror could plant arbitrary files
via the symlink → target → write chain.

Reject any member whose Unix mode bits (upper 16 bits of external_attr)
are S_IFLNK before extractall. Hermes source ZIPs contain only regular
files and directories; a symlink member is unambiguously suspicious.

Regression tests cover: symlink member rejection (raises ValueError,
caught by the outer try/except as a clean SystemExit, no extraction),
and the happy-path verification that a normal ZIP doesn't trigger the
symlink reject message.

Salvaged from PR #15881 by @codeblackhole1024. The remaining pieces of
that PR were already on main or contradicted explicit design decisions:
- config.yaml write-deny: already in agent/file_safety.py's
  control_file_names denylist (the modern guard); the proposed addition
  to build_write_denied_paths was the legacy path.
- Quick commands danger detection: contradicts the explicit
  cli.py:8491-8492 comment 'shell=True is intentional: quick_commands
  are user-defined shell snippets from config.yaml — not agent/LLM
  controlled.'
- Memory plugin shlex.split for dep checks: already on main
  (hermes_cli/memory_setup.py:133).

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-25 05:15:55 -07:00
aaronlab 5f20322d23 fix(tts): reject '..' traversal in output_path
text_to_speech_tool accepts an explicit output_path. Without a traversal
guard, a path containing '..' components (whether prompt-injection-
controlled, from a confused skill, or just a buggy caller) could escape
its declared base and write the audio to a system location — e.g.
`output_path='audio/../../etc/cron.d/x'` lands the file outside the
intended audio cache.

Reject '..' components in the user-supplied path. Explicit absolute
paths are unchanged (the agent legitimately writes audio wherever the
user/caller asks); only traversal-style escapes are blocked. The
terminal tool can still write anywhere with approval — this just keeps
the unattended TTS surface from materializing files via traversal.

Regression tests cover: '..' in the middle (audio/../../etc/...),
bare '..' prefix, and the negative cases (absolute paths + relative
paths without '..' both pass through unchanged).

Salvaged from PR #6693 by @aaronlab. The original PR confined output to
DEFAULT_OUTPUT_DIR-or-cwd, which broke 9 existing tests that legitimately
write to tmp_path locations. The traversal-only check covers the actual
threat (path-escape via '..' from prompt injection) without restricting
where users can choose to write their audio.

The remaining pieces of #6693 (skill_commands rglob symlink rejection,
delegate_tool batch prefix display) are dropped:
- skill_commands rglob: breaks the documented design supporting
  ~/.hermes/skills/<name> as a symlink to a checked-out skill elsewhere
  (see comment at agent/skill_commands.py:73-75)
- delegate_tool batch prefix: pure UX, doesn't belong in a security PR

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-25 05:15:55 -07:00
daimon-nous[bot] ac5359a3f3 fix(streaming): route mid-tool-call partial-stream-stub through length continuation (#31998) (#32012)
* fix(streaming): route mid-tool-call partial-stream-stub through length continuation (#31998)

When a stream stalls mid-tool-call (e.g. a large write_file), the
partial-stream-stub recovery used finish_reason='stop' which caused the
conversation loop to treat the turn as complete, returning only the
warning text. When users said 'continue', the model retried the same
large tool call, hit the same stale timeout, and looped indefinitely.

Changes:
- chat_completion_helpers.py: change _stub_finish_reason from 'stop' to
  'length' for mid-tool-call partials. The stub still has tool_calls=None
  so no tool auto-executes — the model gets a fresh API call through the
  existing length-continuation machinery (bounded to 3 retries).
  Also attach _dropped_tool_names to the stub for downstream use.
- conversation_loop.py: add a third continuation prompt branch for
  partial-stream-stubs with dropped tool calls. Instead of the generic
  'continue where you left off' (which would retry the same large call),
  tell the model to break the output into smaller tool calls (~8K
  tokens each) to avoid stream timeouts.
- test_partial_stream_finish_reason.py: update existing test from
  finish_reason='stop' to 'length', add _dropped_tool_names assertion,
  add new test_dropped_tool_call_uses_chunking_prompt for the 3-way
  prompt branching.

Safety: tool_calls=None is preserved on the stub, so the conversation
loop enters the text-continuation branch (line 1513), NOT the tool-call
execution branch (line 3246). No tool auto-executes. The model simply
gets another API call with targeted guidance.

* refactor: extract constants and continuation prompt helper

- Move magic strings to hermes_constants.py (PARTIAL_STREAM_STUB_ID,
  FINISH_REASON_LENGTH)
- Extract _get_continuation_prompt() in conversation_loop.py — DRYs the
  3-way prompt branching and lets tests import the real function
- Trim verbose inline comments in chat_completion_helpers.py
- Tests import constants + helper instead of duplicating logic

---------

Co-authored-by: alt-glitch <balyan.sid@gmail.com>
2026-05-25 17:43:10 +05:30
nguyen binh 46d8b5dadf fix(profile): reject symlinks in distributions (#25292) 2026-05-25 05:07:58 -07:00
nguyen binh 0d55315c36 fix(backup): skip symlinked files in zip archives (#25289) 2026-05-25 05:07:52 -07:00
Teknium 79799c80f5 test(approval): patch _YOLO_MODE_FROZEN directly in test_yolo_overrides_cron_deny
The test set HERMES_YOLO_MODE=1 via monkeypatch.setenv, expecting
check_dangerous_command() to honor yolo and bypass cron_mode=deny. But
tools.approval._YOLO_MODE_FROZEN is intentionally frozen at module
import time (security: prevents prompt-injection runtime escalation).
When CI imports the module BEFORE the test sets the env, the frozen
value stays False and the yolo bypass never activates.

Local runs missed this because the conftest leaked a non-empty
HERMES_YOLO_MODE into the import-time env. CI's clean-env path exposed
the bug deterministically on test (3) / test (4) shards.

Fix: patch the module attribute directly via mock.patch.object so the
test simulates process-startup-with-yolo regardless of import order.
The behavior under test (yolo bypasses cron_mode=deny for non-hardline
commands) is unchanged; the security invariant (_YOLO_MODE_FROZEN can't
be set at runtime by skills) is preserved.

Reproduced locally with: env -i HOME=$HOME PATH=$PATH python3 -m pytest
  tests/tools/test_cron_approval_mode.py -o 'addopts=' -v
Without the fix: 1 failed, 23 passed. With the fix: 24 passed.
2026-05-25 05:07:49 -07:00
Peter 95848b1cbc fix(transcription): reject symlinked audio inputs (#10082)
* fix(transcription): reject symlinked audio inputs

Validation runs before provider selection, so rejecting symbolic-link paths there prevents supported-extension links from being treated as normal audio files. Use os.path.islink to avoid perturbing the existing Path.stat error path and to reject links before resolving targets.

Constraint: Keep validation platform-safe and avoid requiring symlink support where unavailable.
Rejected: Use Path.is_symlink | it consumes pathlib stat calls and broke the existing stat error regression.
Confidence: high
Scope-risk: narrow
Directive: Keep path hardening in _validate_audio_file before provider dispatch.
Tested: source venv/bin/activate && python -m pytest tests/tools/test_transcription_tools.py::TestValidateAudioFileEdgeCases -q (5 passed)
Tested: source venv/bin/activate && python -m pytest tests/tools/test_transcription_tools.py::TestValidateAudioFileEdgeCases tests/tools/test_transcription_tools.py::TestTranscribeAudioDispatch::test_invalid_file_short_circuits -q (6 passed)
Tested: source venv/bin/activate && python -m compileall tools/transcription_tools.py tests/tools/test_transcription_tools.py
Tested: git diff --check
Not-tested: Full tests/tools/test_transcription_tools.py under .[dev] only; existing faster_whisper optional dependency tests fail with ModuleNotFoundError.

* Keep transcription tests independent of optional whisper install

The transcription suite mocks faster-whisper directly, so a minimal test stub keeps the branch verifiable in environments where the optional package is not installed. This preserves the existing mock-based coverage without adding a dependency.

Constraint: faster-whisper is an optional local STT dependency and is absent from the current validation environment
Rejected: Install faster-whisper just for branch validation | would add heavyweight environment coupling outside the patch scope
Confidence: high
Scope-risk: narrow
Directive: Keep this as a test-only stub unless production import semantics change
Tested: pytest tests/tools/test_transcription_tools.py -q

---------

Co-authored-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com>
2026-05-25 05:07:45 -07:00
Peter ee59ef1946 fix: reject read_file symlinks to blocking devices (#10133)
* fix: reject read_file symlinks to blocking devices

The read_file guard already refused direct device paths such as /dev/zero, but a workspace symlink resolving to one of those devices could still reach the shell-backed read path and hang on wc/head/sed. Keep the literal alias check and add a resolved-path pass so local symlinks to blocked device/fd endpoints are rejected before I/O.

Constraint: Preserve literal /dev/stdin handling before terminal-specific realpath resolution

Confidence: high

Scope-risk: narrow

Tested: pytest tests/tools/test_file_read_guards.py tests/tools/test_file_tools.py -q; python -m compileall tools/file_tools.py tests/tools/test_file_read_guards.py; git diff --check
Signed-off-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com>

* Keep file guard tests off sensitive macOS temp paths

The branch now inherits a sensitive-path write guard from upstream main. On macOS, tempfile.mkdtemp() resolves under /private/var/folders, so the new write-path guard fired before the file read dedup assertions could exercise their intended behavior. The tests now create their scratch files inside the worktree temp checkout, outside those system-sensitive prefixes, without changing production behavior.

Constraint: Rebased branch must pass the expanded file read guard suite on macOS.

Rejected: Loosen the production sensitive-path prefix list | broader behavior change unrelated to this PR.

Confidence: high

Scope-risk: narrow

Tested: pytest tests/tools/test_file_read_guards.py -q

---------

Signed-off-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com>
Co-authored-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com>
2026-05-25 05:07:38 -07:00
Dakota Secula-Rosell b7b8bec800 fix(security): block /proc/*/environ, cmdline, maps from file read (#4609)
The read_file tool and terminal cat can access /proc/self/environ to
recover all process env vars including secrets stripped by the subprocess
blocklist. Output redaction partially mitigates (catches known-format
tokens) but misses custom/proprietary key formats, especially when
values are printed without their key names.

Add /proc/*/environ, /proc/*/cmdline, and /proc/*/maps to the blocked
device paths in _is_blocked_device():

- /proc/*/environ: leaks full process env (API keys, tokens)
- /proc/*/cmdline: leaks command-line args (may contain passwords)
- /proc/*/maps: leaks memory layout (ASLR bypass for exploitation)

Legitimate /proc reads (cpuinfo, meminfo, uptime, version) remain
accessible — the check only blocks per-pid pseudo-files with known
sensitive suffixes.

Complements PR #4432 (PID namespace isolation for child processes)
which prevents children from reading the parent's /proc, but does not
prevent the parent process itself from being read via file tools.

Partially addresses #4427

Changes:
  tools/file_tools.py                  | +6
  tests/tools/test_file_read_guards.py | +18 -1

Co-authored-by: dsr-restyn <dsr-restyn@users.noreply.github.com>
2026-05-25 05:07:31 -07:00
Teknium 4909dd84c1 chore(release): map 66773372+Tranquil-Flow@users.noreply.github.com to Tranquil-Flow (PR #27518) 2026-05-25 05:07:11 -07:00
Evi Nova 1b12cd5241 fix(cli): bracketed-paste timeout prevents permanent input freeze (#16263)
When the terminal drops the ESC[201~ end mark during a bracketed paste
(terminal race, torn write, SSH glitch, macOS sleep/wake), prompt_toolkit's
Vt100Parser keeps buffering all later input in _paste_buffer forever. From
the user's perspective, the CLI appears frozen — the only recovery was
closing the tab/session.

This patch monkey-patches Vt100Parser.feed() so that bracketed-paste mode
flushes buffered content as a normal BracketedPaste event after 2 seconds
without an end marker, then restores normal parsing.

Includes 8 regression tests covering normal paste, timeout recovery,
torn end marks, and edge cases.

Surgical reapply of PR #27518. Original branch was many months stale
(1193 files / 172k LOC of unrelated reverts); the substantive ~77 LOC
patch in cli.py plus the new 157-line test file were reapplied onto
current main with the contributor's authorship preserved via --author.
2026-05-25 05:07:11 -07:00
Teknium 8697471419 test(cli): cover KeyboardInterrupt guard around slash command dispatch
4 tests: KBI during slash command does not set _should_exit; truthy
return keeps session alive; falsy return still sets exit (legit
/exit path); non-KBI exceptions propagate normally.
2026-05-25 05:06:06 -07:00
ygd58 63d6b9e637 fix(cli): catch KeyboardInterrupt during slash commands to prevent session exit
A Ctrl+C during a slow slash command (e.g. /skills browse on a large
skill tree, /sessions list against a multi-GB SQLite DB) used to unwind
past self.process_command() to the outer prompt_toolkit event loop,
which killed the entire session — losing all conversation state.

Fix: wrap the slash-command dispatch in try/except KeyboardInterrupt
so Ctrl+C aborts the command but the prompt loop continues. Other
exceptions still propagate so real bugs aren't silently swallowed.

Surgical reapply of PR #5189. Original branch was many months stale
(3764 files / 1M+ LOC of unrelated reverts); the substantive ~6 LOC
change in cli.py was reapplied by hand onto current main with the
contributor's authorship preserved via --author.
2026-05-25 05:06:06 -07:00
Teknium ee7789e547 chore(release): map simo.kiihamaki@gmail.com to SimoKiihamaki (PR #30773) 2026-05-25 05:06:03 -07:00
simokiihamaki fae815adc2 fix(cli): prevent /reset and /new freeze on Windows by falling back to stdin prompt
On Windows (PowerShell/Windows Terminal), the queue-based modal used for
destructive slash command confirmations deadlocks because prompt_toolkit's
input channel becomes unresponsive when entered from the process_loop daemon
thread. Keystrokes never reach the key bindings, so response_queue.get()
blocks until the 120-second timeout expires.

Fix: fall back to _prompt_text_input (stdin-based) when:
1. sys.platform == 'win32' — Windows console doesn't support the modal reliably
2. Called from non-main thread — key bindings can't fire from daemon threads
3. self._app is not set — existing behavior for tests/non-interactive

This mirrors the thread-aware guard from _prompt_text_input (PR #23454).

9 new regression tests covering Windows detection, non-main thread fallback,
macOS/Linux modal preservation, and integration with _confirm_destructive_slash.

Fixes #30768

Surgical reapply of PR #30773. Original branch was many months stale (911
files / 146k LOC of unrelated reverts); the substantive ~30 LOC change in
cli.py plus the new test file were reapplied onto current main with the
contributor's authorship preserved via --author.
2026-05-25 05:06:03 -07:00
Tranquil-Flow b1adb95038 fix(codex): surface actionable hint when stale-call detector fires on known silent-reject pattern
The ChatGPT Codex backend (chatgpt.com/backend-api/codex) has historically
silently dropped certain model requests: the connection is accepted but no
stream events are emitted and no error is raised. PR #31967 lowered the
implicit stale-call default from 300s to 90s so fallbacks kick in faster,
but users still see an opaque "No response from provider for 90s
(non-streaming, ...)" message that gives no path forward.

This patch adds a narrow heuristic — gpt-5.5 family on the Codex backend
via codex_responses api_mode — that substitutes the generic timeout
message with actionable text naming the gpt-5.4-codex workaround and
pointing at #21444 for symptom history.

Changes:

- run_agent.py — new ``AIAgent._codex_silent_hang_hint(model=...)`` method.
  Returns ``None`` for any request that does not match all three guards
  (codex_responses api_mode, openai-codex provider or chatgpt.com Codex
  base URL, gpt-5.5-family model name with word-boundary regex anchoring
  to avoid false-positives on e.g. ``gpt-5.50``).
- agent/chat_completion_helpers.py — the non-stream stale-call site
  consults the hint via ``getattr(...)`` so the call site stays robust
  if the helper is ever removed or stubbed in tests. Hint is appended to
  both the ``_emit_status`` warning and the ``TimeoutError`` message so
  the user sees it in their terminal AND it lands in any retry-loop
  diagnostics.
- tests/run_agent/test_codex_silent_hang_hint.py — 10 regression tests
  covering positive cases (bare gpt-5.5, vendor-prefixed openai/gpt-5.5,
  gpt-5.5-codex SKU, model=None fallback to self.model) and negative
  cases (gpt-5.4-codex workaround, gpt-5.50 false-positive guard,
  non-codex api_mode, non-codex provider, empty/None model, unrelated
  models on Codex).

Does NOT fix the backend-side issue (that's an upstream OpenAI/ChatGPT
problem we cannot patch from here). Only converts an opaque timeout into
text that names the workaround so users do not have to dig through logs
or wait for a forum post to learn what to do.

Closes #22046
2026-05-25 04:49:22 -07:00
teknium1 4c64638897 chore(release): map liuhao1024 for PR #20778 salvage 2026-05-25 03:40:47 -07:00
liuhao1024 ba3c450914 fix(security): block read_file on project-local .env files
get_read_block_error() only blocked internal Hermes cache files but
allowed reading project-local secret-bearing environment files (.env,
.env.production, .env.local, etc.) through both read_file and ACP
fs/read_text_file paths.

Add a basename deny set for common secret-bearing .env variants.
.env.example remains readable as documentation.

Fixes #20734
2026-05-25 03:40:47 -07:00
teknium1 51c913caf7 chore(release): map dusterbloom for PR #25726 salvage 2026-05-25 03:40:47 -07:00
dusterbloom 79fc92e9cb fix(security): tighten .env file permissions to 0600 at all creation sites
.env holds API keys and secrets. Multiple creation sites used `cp` /
`touch` / `shutil.copy2` which obey the process umask — commonly
0o022, leaving the file at 0o644 (world-readable). Apply chmod 0o600
explicitly at every site that creates or copies .env.

Sites covered:
- docker/stage2-hook.sh: after the seed_one '.env' call, applied
  unconditionally (not just on first-seed) so a host-mounted .env with
  loose perms gets tightened on every container restart
- hermes_cli/doctor.py: 'hermes doctor --fix' touches an empty .env
  when missing
- hermes_cli/profiles.py: 'hermes profile create --clone' copies .env
  from the source profile; shutil.copy2 preserves source mode, so a
  source .env at 0o644 was being cloned into 0o644
- setup-hermes.sh: in-tree setup script's cp .env.example .env path,
  plus the already-exists branch (mirror of install.sh which already
  chmods 600 unconditionally on line 1442)

scripts/install.sh was NOT changed — it already chmod 600's the .env
unconditionally after the create/already-exists branches (line 1442).

Salvaged from PR #25726 by @dusterbloom. The docker/entrypoint.sh
portion of the original PR was dropped because main switched to an
s6-overlay shim — the .env creation logic moved to stage2-hook.sh,
which is where the chmod now lives.

Closes #25497 (subset — install.sh + setup-hermes.sh) and #8448
(subset — install.sh only) as superseded.

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-25 03:40:47 -07:00
Rodrigo 4cb3eb03c7 fix(approval): harden YOLO bypass, LLM parsing, auto-approve audit, pipe pattern (#23835)
* fix(approval): harden YOLO bypass, LLM parsing, auto-approve audit, pipe pattern

- BUG-009 (CRITICAL): freeze HERMES_YOLO_MODE at module import via
  _YOLO_MODE_FROZEN; prevents skills/prompt-injection from calling
  os.environ["HERMES_YOLO_MODE"]="true" at runtime to bypass all checks
- BUG-002 (HIGH): replace substring "APPROVE" in answer with exact
  answer == "APPROVE" in _smart_approve; prompt already requests exactly
  one word, substring match was exploitable via verbose LLM responses
- BUG-001 (MEDIUM): add logger.warning for every dangerous command that
  auto-approves in non-interactive non-gateway context; makes silent
  approvals visible in audit logs without breaking script behavior
- BUG-008 (LOW): expand curl/wget pipe pattern to cover | /bin/bash and
  | bash -c variants, not just | sh / | bash

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(approval): add missing is_truthy_value import + fix yolo test patches

_YOLO_MODE_FROZEN uses is_truthy_value() from utils — import was missing.
Tests that set HERMES_YOLO_MODE via monkeypatch.setenv() no longer work
because the value is frozen at import time; update them to patch the
module-level flag directly via monkeypatch.setattr().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 03:35:33 -07:00
Dennis Vorobyov 3ab7e2aa91 harden(env_passthrough): apply GHSA-rhgp-j443-p4rf filter to config.yaml path (#27794)
register_env_passthrough() (the skill-declared path) filters out names in
_HERMES_PROVIDER_ENV_BLOCKLIST and logs a warning citing GHSA-rhgp-j443-p4rf.
_load_config_passthrough() (the config.yaml path) did not. Both feed the
same is_env_passthrough() allowlist that local.py and code_execution_tool.py
consult before stripping a variable from the child env.

A skill that wanted to leak ANTHROPIC_API_KEY or OPENAI_API_KEY into
execute_code could no longer self-register the name (the GHSA fix
blocks it), but the same outcome was still reachable by asking the
operator to add the name to terminal.env_passthrough in config.yaml,
or by any in-process actor with write access to ~/.hermes/config.yaml.

Apply the same _is_hermes_provider_credential filter inside
_load_config_passthrough, mirroring the skill-path warning so operators
see the same explanation. Non-Hermes API keys (TENOR_API_KEY,
NOTION_TOKEN, etc.) are unaffected since they are not in the blocklist.
2026-05-25 03:35:23 -07:00
Teknium 0219b0408a perf(cli): cut hermes startup 63% — flip head-to-head vs codex (#31968)
* perf(bitwarden): persist secret-fetch cache across CLI invocations

Every `hermes` invocation paid a ~380ms tax for `bws secret list` to
Bitwarden Secrets Manager because the existing cache was in-process only.
Back-to-back `hermes chat -q`, gateway-spawned agents, and cron-launched
runs all re-fetched.

Adds a disk-persisted L2 cache at `<hermes_home>/cache/bws_cache.json`
(mode 0600, never contains the access token — only the SHA-256
fingerprint prefix). Same TTL as the in-process cache. Read on miss,
write on bws success, ignored on key mismatch / corruption / expiry.

Measured on a startup profile:
  load_hermes_dotenv() cold: 372ms → warm (disk cache hit): 20ms

End-to-end `hermes --version` cold→warm: 666ms → ~295ms.

In a hermes-vs-codex benchmark across 11 single- and multi-turn tasks
(framework overhead = wall − llm − tool_exec, median over 3 trials):

  cohort               before    after    saved
  single-turn (median)  2.96s    2.31s   -0.65s
  multi-turn  (5-turn)  9.40s    8.95s   -0.45s (≈0.3s/turn)

Hermes now wins head-to-head on 6/11 tasks vs codex (was 4/11 before).
The remaining ~0.6s single-turn delta is mostly Python's own import
cost in hermes_cli.main, which is a separate optimization.

* perf(cli): lazy-load model catalog + dedupe config.yaml reads at startup

Two import-time wins on top of the bws disk-cache fix:

1. Lazy-load `hermes_cli.models._PROVIDER_MODELS` via PEP 562
   module-level `__getattr__`. The catalog is ~55ms of work that was
   eagerly imported on every CLI invocation (line 4557 `if not
   _is_termux_startup_environment(): from hermes_cli.models import
   _PROVIDER_MODELS`). Audit showed every internal call site already
   does its own function-local import; only test code reads
   `hermes_cli.main._PROVIDER_MODELS` as a module attribute, and
   __getattr__ keeps that working transparently. First access triggers
   the import once and caches the result on the module via
   `globals()[name] = ...`, so subsequent reads are dict lookups.

2. Dedupe the double config.yaml read in the top-of-module bootstrap.
   Previously: one raw yaml.safe_load for the `security.redact_secrets`
   bridge, then a separate full `load_config()` (with deep-merge) for
   `network.force_ipv4`. Both keys come from the same file. Merged
   into one raw yaml load.

Combined with the bws cache fix in the previous commit:

  hermes --version wall time:
    original (cold):           666 ms
    after bws fix (warm):      295 ms
    after lazy-load + dedupe:  228 ms   (-67 ms additional, -66% from original)

Tests:
  - tests/hermes_cli/test_api_key_providers.py: 173/173 pass
    (lazy __getattr__ correctly handles
     `from hermes_cli.main import _PROVIDER_MODELS`)
  - tests/test_ipv4_preference.py + tests/hermes_cli/test_redact_config_bridge.py +
    tests/agent/test_redact.py: 93/93 pass (dedupe preserves both bridges)
  - tests/test_bitwarden_secrets.py + env_loader tests: 49/49 pass
2026-05-25 03:06:39 -07:00
teknium1 c0169496d0 chore(release): map jfuenmayor + Jiahui-Gu + YLChen-007 + AdamPlatin123 + waefrebeorn for S11 cluster salvage 2026-05-25 01:55:59 -07:00
waefrebeorn 5faea3f618 fix(file_tools): reject '..' traversal in V4A patch headers
V4A patch '*** Update File:', '*** Add File:', '*** Delete File:' headers
come from patch CONTENT, not the explicit `path=` argument. That makes
them attacker-influenceable through skill content, web extract output,
prompt injection, and other surfaces the agent processes. Headers like
'*** Update File: ../../../etc/shadow' would resolve relative to the
agent's cwd; in deployment configurations where that cwd is deep enough
to land outside Hermes' protected paths, the write could land somewhere
the agent operator did not intend.

Reject any V4A header containing a '..' path component before applying
the patch. The explicit `path=` argument on patch_tool is UNCHANGED —
the agent legitimately uses '..' there (e.g. `patch path='../other_module/x.py'`
from a worktree dir is normal cross-module editing).

Regression tests: V4A Update header with traversal rejected, V4A Add
header with traversal rejected, patch_v4a never invoked when rejection
fires.

Salvaged from PR #29395 by @waefrebeorn. The original PR added
has_traversal_component as a blanket reject on read_file_tool,
write_file_tool, patch_tool's explicit path, and search_tool — that
would break legitimate agent operation where '..' is normal. Also
dropped the over-eager skills_guard pattern additions
(pickle.loads/marshal.loads/ctypes.CDLL/importlib at high/critical
severity would false-positive on legit data-science and FFI skills).

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-25 01:55:59 -07:00
AdamPlatin123 00bd24e27c fix(security): expand memory content scanning patterns to parity with skills guard (#9151)
Expand _MEMORY_THREAT_PATTERNS from 13 to 24 regex patterns and align
_INVISIBLE_CHARS with skills_guard.py (10 → 17 characters).

Key changes:
- Add multi-word bypass prevention (?:\w+\s+)* to injection patterns
- Add missing injection patterns: role_pretend, leak_system_prompt,
  remove_filters, fake_update, translate_execute, html_comment_injection,
  hidden_div
- Add exfiltration patterns: send_to_url, context_exfil
- Add persistence patterns: agent_config_mod, hermes_config_mod
  (both require modification-verb prefix to avoid false positives on
  mere mentions of config filenames)
- Add hardcoded secret detection pattern
- Add role_hijack precision fix: require article after "now" to avoid
  blocking "you are now ready/connected/set up" etc.
- Expand invisible unicode set with directional isolates (U+2066-2069)
  and invisible math operators (U+2062-2064)

Test coverage expanded from ~8 to ~30 scan tests including dedicated
false-positive regression tests for all precision-sensitive patterns.

Known limitations (deferred to follow-up PRs):
- prompt_builder.py and cronjob_tools.py still use older pattern sets
- No semantic/LLM-based scanning (regex-only approach)
- No cross-entry or cross-store analysis
2026-05-25 01:51:53 -07:00
Edward-x 7ebebfbb8d Harden Skills Guard multi-word prompt patterns (#26852)
Co-authored-by: openhands <openhands@all-hands.dev>
2026-05-25 01:51:27 -07:00
JiahuiGu 0a2ee71ccc fix(skill): guard pickle.loads in darwinian-evolver show_snapshot with explicit flag (#29276)
show_snapshot.py unpickled a user-supplied path unconditionally. pickle.loads
is equivalent to arbitrary code execution, so a snapshot from an untrusted
source = RCE. Require an explicit --i-trust-this-file acknowledgement before
calling pickle.loads, and emit a stderr warning when proceeding.

Co-authored-by: Jiahui-Gu <jiahuigu@users.noreply.github.com>
2026-05-25 01:51:21 -07:00
Jorge Fuenmayor 93660643a6 fix: harden skill trust source matching (#31229)
Co-authored-by: gaia <gaia@gaia.local>
2026-05-25 01:51:15 -07:00
Kasun Athaudahetti 2d422720b5 fix(codex): size and propagate timeouts for Responses-API requests; lower stale defaults
Codex / Responses-API requests had three latent timeout bugs that combined
into the long silent hangs reported on #21444:

1. The non-stream stale-call detector estimated context tokens from
   ``api_kwargs["messages"]`` only. Codex / Responses-API payloads carry
   their conversational load in ``input`` (with ``instructions`` and
   ``tools``), so every Codex turn logged ``context=~0 tokens`` and the
   detector never applied its >50k / >100k tier bumps.

2. ``providers.<id>.request_timeout_seconds`` was silently dropped on the
   main Codex path. The chat_completions path and the auxiliary Codex
   adapter both forwarded it; the main path skipped it through three
   places (``build_api_kwargs``, ``ResponsesApiTransport.build_kwargs``,
   ``_preflight_codex_api_kwargs``).

3. The streaming stale detector had the same payload-shape bug for
   ``codex_responses`` requests, which route through the non-streaming
   detector (it's the path that emits the user-facing
   "No response from provider for 300s (non-streaming, ...)" warning that
   reporters keep pasting).

This commit:

- Adds ``estimate_request_context_tokens`` in ``chat_completion_helpers``,
  used by both the non-stream and stream detectors. Handles ``messages``
  (Chat Completions), ``input + instructions + tools`` (Responses API),
  bare lists, and an unknown-dict fallback.
- Forwards ``timeout`` through ``ResponsesApiTransport.build_kwargs``
  and ``_preflight_codex_api_kwargs`` (with guards against
  zero/negative/inf/bool values), and wires
  ``_resolved_api_call_timeout()`` into the Codex branch of
  ``build_api_kwargs``.
- Lowers the implicit non-stream stale defaults so fallback providers
  kick in faster when upstream stalls:
    * base   300s -> 90s
    * >50k   450s -> 150s
    * >100k  600s -> 240s
  These only apply when the user has *not* set
  ``providers.<id>.stale_timeout_seconds`` or
  ``HERMES_API_CALL_STALE_TIMEOUT``. Explicit config still wins.
- Adds regression tests for the estimator shapes, the new defaults, the
  context-tier scaling, transport timeout pass-through, and preflight
  timeout pass-through / rejection of invalid values.

Closes #21444
Supersedes #21652 #24126 #31855

Co-authored-by: Hoang V. Pham <26063003+hehehe0803@users.noreply.github.com>
2026-05-25 01:47:55 -07:00
Teknium 76135b329d docs(i18n): translate all docs into Simplified Chinese (zh-Hans) (#31942)
Translates the full English docs corpus (335 files) into Simplified
Chinese under website/i18n/zh-Hans/. Combined with PR #31895 (cross-
locale link fix), the 简体中文 locale toggle now serves a complete
Chinese site with working cross-page navigation.

Pipeline:
- Claude Sonnet 4.6 via OpenRouter, 8-way concurrent
- Preserves frontmatter keys, code blocks, MDX/JSX, link URLs, brand
  names, and technical jargon (prompt/token/hook/MCP/ACP/etc.)
- Translates only frontmatter title/description and prose
- Two largest files (configuration.md 93KB, research-paper-writing.md
  107KB) retried with 64K max_tokens after initial fence-drift
- 3 manual post-fixes for MDX edge cases the model didn't escape:
  &lt; in optional-skills-catalog table, double-quotes in an alt= tag,
  and a bare URL adjacent to a full-width period

Cost: ~$30 total (Sonnet 4.6 input $3/M + output $15/M).

Verified `npm run build` succeeds for both en and zh-Hans locales,
no double-prefixed /docs/zh-Hans/docs/ URLs in rendered output,
all in-page navigation resolves correctly.

Translations are machine-generated and may need human review on
specific pages — but they're an enormous improvement over the
previous state (3 zh-Hans pages out of 335).
2026-05-25 01:47:38 -07:00
Teknium ffe11c14ec test(cli): cover quiet-mode resume status lines routed to stderr
4 tests: session-not-found in quiet mode -> stderr; in full mode -> stdout
(unchanged); resumed banner in quiet mode -> stderr; has-no-messages in
quiet mode -> stderr.
2026-05-25 01:47:12 -07:00
Michel Belleau 25295e7ac9 fix(cli): redirect resume status lines to stderr in quiet mode (#11793)
When 'hermes chat --quiet --resume <id> -q "..."' is used, three status
messages were written to stdout via ChatConsole / _cprint:

  - '↻ Resumed session <id> (N user messages, M total messages)'
  - 'Session <id> found but has no messages. Starting fresh.'
  - 'Session not found: <id>' / usage hint

This polluted the machine-readable stdout that automation wrappers capture
with $(...), making it impossible to cleanly separate the agent's answer
from the resume banner.

Fix: detect quiet mode via tool_progress_mode == 'off' and route the three
resume status messages to stderr (as plain text, matching the existing
stderr convention for session_id). Interactive mode is unchanged — it
still uses the Rich-rendered path through ChatConsole.

Surgical reapply of PR #11868. Original branch was stale against current
main; reapplied onto current cli.py by hand with original authorship
preserved via --author.
2026-05-25 01:47:12 -07:00
Teknium 11c40d6a42 test+polish(compression): pin anti-thrash gate and gateway session_id persistence
Follow-up to @someaka's fix.

Polish:
- Drop the redundant `_preflight_tokens >= threshold_tokens` clause.
  `should_compress(tokens)` already short-circuits when tokens < threshold,
  so the explicit comparison was dead code on the True branch.

Tests:
- Preflight: pin that should_compress() is called (anti-thrash has a vote).
  Mocks should_compress to return False even with tokens past the raw
  threshold and asserts no compression runs — exact bug shape from #29335.
- Gateway: AST scan of gateway/run.py asserts every
  `session_entry.session_id = ...` assignment is followed by a
  `session_store._save()` call within the same block. Three sites mutate
  the session_id after compression; all three must persist or the next
  turn loads the pre-compression transcript and re-loops. Empirically
  verified the test catches the bug (drops the new _save() line → red).

AUTHOR_MAP:
- Map ed@bebop.crew -> someaka so the salvaged commit resolves to
  @someaka in release notes.
2026-05-25 01:44:46 -07:00
Radical Edward 3914089d52 fix(compression): 3-line fix for infinite compression loop (#29335)
Three compounding root causes:

A) run_conversation() result dict missing session_id — gateway's
   dead-code guard at gateway/run.py:8700 never triggers
B) preflight compression bypasses should_compress() anti-thrashing —
   re-triggers every turn when tool schemas dominate token budget
C) gateway updates session_entry.session_id in memory but doesn't
   persist via session_store._save()

Fixes: #29335
2026-05-25 01:44:46 -07:00
Teknium 222a3a9c19 test(cli): cover exit resume hint -p flag across profiles
5 tests: default/custom profiles emit no -p; named profile emits
-p <name> on both --resume and -c hints; lookup failure falls back
gracefully.
2026-05-25 01:41:54 -07:00
CK iRonin.IT 2a2cef4ac7 fix: include -p profile flag in exit resume hint
Session IDs are profile-constrained, so the resume hint needs to
include the active profile for multi-profile users. Without this,
copying the hint from a non-default profile fails to resume the
correct session.

Before:  hermes --resume 20260414_063228_c1240e
After:   hermes --resume 20260414_063228_c1240e -p dev

Also includes -p on the resume-by-title hint. Skipped for
'default' and 'custom' profiles (no -p needed).

Surgical reapply of PR #9652. Original branch was stale against
current main (~6 months); reapplied onto current cli.py by hand
with original authorship preserved.
2026-05-25 01:41:54 -07:00
teknium1 d3ffbc6409 feat(stt): add stt.providers.<name> command-provider registry
Mirror of the TTS command-provider registry (PR #17843) for STT. Lets any
shell-driven ASR engine — Doubao ASR, NVIDIA Parakeet, whisper.cpp builds,
SenseVoice, curl pipelines — become an STT backend with zero Python.
Complements the legacy HERMES_LOCAL_STT_COMMAND escape hatch (preserved
untouched via the built-in local_command path) and the
register_transcription_provider() Python plugin hook also shipped in this
PR.

Resolution order (mirrors TTS exactly):

  1. Built-in (local, local_command, groq, openai, mistral, xai)
     → native handler. Always wins.
  2. stt.providers.<name>: type: command  → command-provider runner.
  3. Plugin-registered TranscriptionProvider → plugin dispatch.
  4. No match → 'No STT provider available'.

Files
-----
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset retained;
  added _resolve_command_stt_provider_config, _transcribe_command_stt,
  and local helpers for template rendering, shell-quote context, and
  process-tree termination. Helpers are documented as mirrors of their
  tts_tool.py counterparts (kept local to avoid cross-tool private
  import). Wire-in is one insertion point in transcribe_audio() after
  the xai elif and before the plugin dispatcher. Plugin dispatcher
  additionally defensively short-circuits when a same-name command
  config exists (command-wins-over-plugin invariant).

- tests/tools/test_transcription_command_providers.py: 50 new tests
  covering resolution (builtin precedence, type/command gating,
  case-insensitive lookup, legacy stt.<name> back-compat), helpers
  (timeout fallback, format validation, iter, has-any), template
  rendering (shell-quote contexts, doubled-brace preservation),
  end-to-end via _transcribe_command_stt (output_path read, stdout
  fallback, timeout, nonzero exit envelope, model override,
  language precedence), and dispatcher integration via the real
  transcribe_audio() including command-wins-over-plugin and
  builtin-shadow-rejection.

- tests/plugins/transcription/check_parity_vs_main.py: extended from
  10 to 13 scenarios. New cases: command-provider-installed,
  command-vs-plugin-same-name (verifies command wins precedence),
  explicit-openai-with-command-shadow (verifies built-in wins).
  Adds command_provider dispatch_kind detection via transcript prefix
  (CMD: vs PLUGIN:) so command-provider scenarios can be distinguished
  from plugin scenarios even when sharing a provider name.

- website/docs/user-guide/features/tts.md: new 'STT custom command
  providers' section symmetric to the TTS section — example config,
  placeholder grammar table (input_path / output_path / output_dir /
  format / language / model), transcript-read-back semantics (file
  first, then stdout fallback), optional keys table, behavior notes,
  security note. Updated 'Python plugin providers (STT)' to include
  the new 'When to pick which (STT)' decision table and updated
  resolution-order section (now 4 layers instead of 3).

Verification
------------
189/189 STT targeted tests + 50/50 new command-provider tests pass.
Combined sweep: tests/tools/ 5576/5576, tests/agent/ + tests/hermes_cli/
8623/8623 — zero regressions across 14,199 tests.

Parity harness: 13 scenarios, 9 OK + 4 expected diffs
(no_provider_error → plugin, plugin_unavailable, command_provider × 2).

E2E live-verified in an isolated HERMES_HOME with a real .wav file:

  command:                    → dispatched to stt.providers.my-fake-cli
  plugin:                     → dispatched to registered TranscriptionProvider
  command-wins-over-plugin:   → command provider beats same-name plugin
  builtin-wins-over-command:  → built-in OpenAI handler fires;
                                stt.providers.openai: type: command
                                does NOT hijack it.
2026-05-25 01:41:19 -07:00
kshitijk4poor 2cd952e110 feat(stt): add register_transcription_provider() plugin hook
Add an opt-in Python plugin surface for speech-to-text backends,
mirroring the TTS hook pattern. New backends (OpenRouter, SenseAudio,
Gemini-STT, custom proprietary engines) can be implemented as plugins
without modifying tools/transcription_tools.py.

Built-ins always win
--------------------
The 6 built-in STT providers (local/faster-whisper, local_command,
groq, openai, mistral, xai) keep their native handlers. Plugins
attempting to register under a built-in name are rejected at
registration time with a warning and re-checked defensively at
dispatch.

Resolution order
----------------
1. stt.provider matches a built-in → built-in dispatch (unchanged)
2. stt.provider matches a registered plugin →
   a. if plugin.is_available() returns False → unavailability envelope
      identifying the plugin (not the generic "No STT provider"
      message — the user explicitly opted into this plugin)
   b. otherwise plugin.transcribe() with model + language forwarded
      from stt.<provider>.{model,language} config
3. No match → legacy "No STT provider available" error (unchanged)

Per-provider config namespace
-----------------------------
Plugins read their config from stt.<provider> in config.yaml, mirroring
how built-ins read stt.openai.model / stt.mistral.model. The dispatcher
forwards `model` and `language` from this section. Caller's explicit
`model=` argument overrides the config-set model.

Files
-----
- agent/transcription_provider.py: TranscriptionProvider ABC
- agent/transcription_registry.py: register/get/list providers,
  built-in shadow guard, _reset_for_tests
- hermes_cli/plugins.py: register_transcription_provider() on
  PluginContext
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset,
  _dispatch_to_plugin_provider() with availability gate, wire-in
  after xai branch and before "No STT provider" error
- tests/agent/test_transcription_registry.py: 27 tests
- tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests
- tests/tools/test_transcription_plugin_dispatch.py: 28 tests
  (covering built-in short-circuit, plugin dispatch, exception
  envelope, non-dict guard, availability gate, language forwarding)
- tests/plugins/transcription/check_parity_vs_main.py: 10-scenario
  subprocess-pinned parity harness vs origin/main
- website/docs/user-guide/features/{tts,plugins}.md: docs

Behavior parity
---------------
10 scenarios, 8 OK + 2 expected DIFFs:
  no_provider_error → plugin (plugin-installed scenario)
  no_provider_error → plugin_unavailable (plugin-installed-unavailable
  scenario; PR returns cleaner envelope)
Zero behavior change for users not opting into a plugin.

Issue follow-up to #30398.
2026-05-25 01:41:19 -07:00
Teknium 2e0ac31a72 chore(release): map claw@openclaw.ai to wanwan2qq (PR #10215) 2026-05-25 01:33:32 -07:00
Teknium 4fbdf0e893 test(cli,gateway): cover bracket-stripping and gateway session-ID lookup
- CLI: bracketed/quoted target resolves; mismatched single bracket passes through unchanged.
- Gateway: bracketed session ID resolves; bare untitled session ID resolves via get_session() fallback.
2026-05-25 01:33:32 -07:00
Claw Assistant 1c7a783c42 fix(cli,gateway): strip outer brackets/quotes from /resume args + accept session IDs in gateway
The /resume usage hint shows '<session_id_or_title>' which a few users have
typed verbatim, including the angle brackets. Strip outer <>, [], "", and ''
from the argument before lookup so '/resume <abc123>' works the same as
'/resume abc123'. Mirrors the new bracket-stripping in the CLI handler.

Also let the gateway resolve a bare session ID. Previously the gateway only
called resolve_session_by_title, so '/resume <session_id>' always returned
'Session not found' even for valid IDs. Try get_session() first, fall back
to title resolution second.

Surgical reapply of PR #10215 (branch was based on a many-months-old main
and reverted ~3100 unrelated files; original commit by claw@openclaw.ai
preserved via --author).
2026-05-25 01:33:32 -07:00
Teknium 920b350e57 test(auth): align copilot-remove test with borrowed-credential policy (#31416)
PR #31416 (avoid persisting borrowed credential secrets) added
sanitize_borrowed_credential_payload, which strips access_token from
any auth.json pool entry whose (provider, source) isn't in the
_PERSISTABLE_PROVIDER_SOURCES allowlist.

(copilot, gh_cli) is borrowed (not in the allowlist), so the test
fixture's pre-seeded access_token now gets stripped at load_pool()
time, leaving the pool empty. resolve_target('1') then fails with
'No credential #1. Provider: copilot.'

Fix: align the test with the new contract. At runtime, copilot tokens
are hydrated by resolve_copilot_token() — mock that path so the pool
gets an entry the test can remove. The behavior under test
(suppression of gh_cli + env variants on remove) is unchanged.

CI repro on origin/main HEAD; reproduced locally with stock checkout.
2026-05-25 01:23:31 -07:00
Teknium 9c77a0c3ce fix(plugins): widen masked secret prompt to plugin setup wizards
Extend PR #31716 to plugin setup paths that were also using bare
getpass.getpass(): hindsight (4 sites), honcho, simplex, line. Same
mechanical swap onto hermes_cli.secret_prompt.masked_secret_prompt.
2026-05-25 01:20:33 -07:00
helix4u ec4d6f1823 fix(cli): show masked feedback for secret prompts 2026-05-25 01:20:33 -07:00
Glen Workman d952b377aa fix: add cron API provenance logging (#24889)
Co-authored-by: sgtworkman <178342791+sgtworkman@users.noreply.github.com>
2026-05-25 01:15:56 -07:00
teknium1 92d91365e7 chore(release): map zapabob for PR #29826 salvage 2026-05-25 01:15:24 -07:00
zapabob 2c3ca475c0 fix(cron): reject id mutation + validate output paths under OUTPUT_DIR
Two defense-in-depth fixes on cron output path handling:

1. cron/jobs.py:update_job() rejects mutation of the immutable 'id' field
   (raises ValueError). Dashboard PUT /api/cron/jobs/{id} converts this to
   HTTP 400. Without this, an attacker who can reach the update endpoint
   could rename a job's id to '../escape' and move its output directory
   outside OUTPUT_DIR.

2. cron/jobs.py:_job_output_dir() validates job IDs before composing
   paths: rejects '.', '..', '/', '\\', absolute paths, and Windows drive
   prefixes. Used by save_job_output() and remove_job() so legacy unsafe
   IDs (from before this guard) fail closed rather than half-applying a
   shutil.rmtree or output write outside the sandbox.

Tests:
  - update_job rejects {'id': '../escape'} without renaming
  - remove_job(legacy '../escape' id) raises ValueError without deleting
    files outside OUTPUT_DIR or removing the job from the store
  - save_job_output rejects '..', './escape', 'nested/escape',
    absolute paths
  - dashboard PUT /api/cron/jobs/{id} with {'id': '../escape'} returns
    400, job list unchanged

Salvaged from PR #29826 by @zapabob. Simplified implementation:
- Dropped a 23-line _validate_job_output_id() helper using Path.parts
  semantics. The inline check (path separators + dot-components +
  is_absolute) is shorter and behaviorally identical.
- Dropped the secondary OUTPUT_DIR.resolve()/relative_to() check —
  redundant once we reject any path separator at the input boundary.
- Dropped the _docs/2026-05-21_cron-output-path-hardening_codex.md
  planning artifact (we don't check planning docs into the repo).

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-25 01:15:24 -07:00
teknium1 0c3e34e298 chore(release): map Schrotti77 for PR #25786 salvage 2026-05-25 01:09:54 -07:00
Schrotti77 9863a07af6 fix(cron): layer agent.disabled_toolsets onto cron baseline (#25752)
The bug: cron/scheduler.py:_resolve_cron_enabled_toolsets returns an
LLM-supplied per-job enabled_toolsets verbatim. The disabled_toolsets
passed to AIAgent was a hardcoded [cronjob, messaging, clarify] that
ignored agent.disabled_toolsets from config.yaml. An LLM could call
cronjob(action='add', enabled_toolsets=['terminal','file'],
prompt='...') and the cron-spawned agent would receive terminal+file
even when the operator had globally disabled them.

Fix: new _resolve_cron_disabled_toolsets() helper that ALWAYS layers
agent.disabled_toolsets on top of the cron baseline. AIAgent's
disabled_toolsets takes precedence over enabled_toolsets, so this
stops the bypass regardless of what the per-job override contains.

This is the disabled-side fix. Three concurrent PRs (#25842, #25815,
#25780) proposed intersection-side variants on _resolve_cron_enabled_toolsets;
this fix is more robust because it stops the leak at the precedence
boundary AIAgent itself enforces, not at a layer above.

Regression test reproduces the issue's PoC exactly:
config.yaml has agent.disabled_toolsets=[terminal,file]; cron job has
enabled_toolsets=[web,terminal,file]; assertion: AIAgent receives
disabled_toolsets containing terminal AND file.

Salvaged from PR #25786 by @Schrotti77. Simplified the implementation:
dropped a 23-line _normalize_toolset_list() helper (handled str/tuple/
set/garbage input shapes) in favor of the existing convention
(agent_cfg.get('disabled_toolsets') or []) used elsewhere in the
codebase. YAML always parses these as lists; the elaborate normalizer
was theatre for shapes we never produce.

Closes #25752

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-25 01:09:54 -07:00
teknium1 a6b0414ea0 feat(providers): extend openai-api with live /v1/models fetch + gpt-5.5-pro
Follow-up on top of @jacevys' PR #21437 cherry-pick:
- _provider_model_ids() now also matches normalized == 'openai-api' for
  the live /v1/models fetch path, so users see the full catalog instead
  of just the curated list.
- Add gpt-5.5-pro and gpt-5.3-codex to the curated list for parity with
  the existing 'openai' table (used as fallback when /v1/models fails).
- Add scripts/release.py AUTHOR_MAP entry for jacevys so CI doesn't
  block the salvage PR.
2026-05-25 00:59:53 -07:00
jacevys aeb87508c6 feat(providers): add OpenAI API provider option 2026-05-25 00:59:53 -07:00
Hasan Ali d7c5d5dee5 fix: avoid persisting borrowed credential secrets (#31416) 2026-05-25 00:32:08 -07:00
Teknium 2b768535c9 test(acp): drop flaky runtime_calls[-1] tail-position assertion
The legacy runtime_calls[-1] == "anthropic" check in
test_model_switch_uses_requested_provider failed in CI under
specific test-shard scheduling with 'custom' == 'anthropic',
across multiple unrelated PRs on 2026-05-25. The May 23 pin
(commit 3127a41cb) monkeypatched parse_model_input + detect_provider_for_model
to remove the dependency on live _KNOWN_PROVIDER_NAMES module state but the
flake reappeared anyway — root cause still not reproducible locally even
under stress runs.

The other three assertions ("Provider: anthropic" in result,
state.agent.provider == "anthropic", state.agent.base_url ==
"https://anthropic.example/v1") already prove
fake_resolve_runtime_provider was called with requested="anthropic"
for the model-switch step — the agent's provider and base_url
come directly from that fake's return value. The tail-position
check was redundant and the only assertion that flaked.

Replaces runtime_calls[-1] == "anthropic" with
"anthropic" in runtime_calls so the plumbing path is still
covered without depending on call ordering.
2026-05-24 23:23:12 -07:00
helix4u 3b839f4369 fix(context): align guidance with 64k minimum 2026-05-24 23:23:12 -07:00
Teknium 1d5deac346 fix(website): cross-locale doc links + drop empty ko locale (#31895)
The locale switcher appeared broken because hardcoded markdown links
(`](/docs/X)`) got double-prefixed by Docusaurus to `/docs/<locale>/docs/X`
(404) in non-English locales, and the MDX hero `<a href>` on the index
page escaped locale routing entirely.

Changes:
- Rewrite 922 `](/docs/X)` -> `](/X)` across 166 docs files (strip trailing
  .md too). Docusaurus prepends locale + baseUrl itself.
- docs/index.md -> index.mdx; hero "Get Started" anchor -> Docusaurus
  <Link> so it stays inside the active locale.
- Drop `ko` locale entirely from docusaurus.config.ts + delete i18n/ko/
  (4 stale auto-translated kanban pages, <2% coverage, misleading).

Verified `npm run build` succeeds for both en and zh-Hans; `build/zh-Hans/
index.html` has no /docs/zh-Hans/docs/... double-prefixed paths.

PR2 will translate the 335 English docs into i18n/zh-Hans/.
2026-05-24 23:16:20 -07:00
Teknium b0135c741d diag(xai-oauth): log loopback callback hits + wait-timeout outcome (#27385) (#31894)
#27385 reports that on macOS the browser sees the xAI 'authorization
received' success page but Hermes still raises xai_callback_timeout.
The loopback HTTP handler was silent — no log line on receipt, no log
line on wait timeout — so triaging the gap between 'browser saw
success' and 'CLI saw timeout' required either a code change or
guesswork.

Adds two INFO log lines:

- Per callback hit (handler): path, has_code, has_state, has_error,
  truncated User-Agent.  Booleans / fingerprints only — no actual
  code/state strings leak.
- On wait timeout: report whether result.code or result.error was
  populated at deadline.  Distinguishes three failure modes:
  1. No hit log + timeout log w/ has_code=False has_error=False
     → xAI's IDP never reached the loopback (firewall, port-binding,
     IPv6/IPv4 mismatch, browser blocked private-network access).
  2. Hit log w/ has_code=False has_error=False + timeout log
     → xAI hit the loopback without OAuth params (the bare-URL
     case the handler already 400s on).
  3. Hit log w/ has_code=True + timeout log w/ has_code=False
     → result_lock contention or race; would indicate a real bug.

133/133 in tests/hermes_cli/test_auth_xai_oauth_provider.py,
tests/hermes_cli/test_xai_oauth_pkce_token_exchange.py, and
tests/run_agent/test_codex_xai_oauth_recovery.py.
2026-05-24 23:05:25 -07:00
ethernet b288de8bf4 Merge pull request #31081 from NousResearch/bb/tui-skinny-status-rule
fix(tui): keep status rule one-line in skinny terminals
2026-05-25 01:24:29 -04:00
Ben Barclay 7e165e843d Merge pull request #31760 from NousResearch/hermes/hermes-bf5898da
feat(docker)!: s6-overlay container supervision (salvage of #30136)
2026-05-25 12:57:51 +10:00
Teknium 46f8948bad test+harden(cli): cover parent-chain walk in concurrent-instance detection
Follow-up to @Strontvod's fix.

Tests:
- Five new tests in test_update_concurrent_quarantine.py cover the parent-
  chain exclusion: the .exe launcher is excluded, an unrelated sibling
  hermes.exe is still reported, multi-level ancestry is fully excluded,
  PID cycles in the parent chain don't hang, and a partially-stubbed
  psutil (no Process attribute) degrades gracefully instead of crashing.
- New _fake_psutil_with_parent_chain helper builds a fuller stand-in
  (Process / NoSuchProcess / AccessDenied + process_iter) than the
  process_iter-only SimpleNamespace the older tests use.

Hardening:
- Broaden the except in the parent-walk to bare Exception. The original
  fix listed (NoSuchProcess, AccessDenied, ValueError), but those names
  are evaluated lazily during exception matching — if psutil is a partial
  stub without the attribute, the exception handler itself raises
  AttributeError that escapes. The function is documented as 'never raises'
  (the surrounding update flow depends on it), so the broader catch keeps
  the contract regardless of how the dependency is shaped.

AUTHOR_MAP:
- Map schepers.zander1@gmail.com -> Strontvod so the salvaged commit
  resolves to @Strontvod in the release notes.

All 18 detect_concurrent + quarantine tests pass.
2026-05-24 19:51:46 -07:00
Strontvod 323cce7e94 fix: exclude parent process chain from concurrent instance detection on Windows
On Windows, the setuptools-generated hermes.exe launcher is a separate native
process that spawns python.exe (the interpreter running the update code).
os.getpid() returns the Python PID, but the launcher (which holds the file
lock) is the parent. Without walking the parent chain, every 'hermes update'
reports its own launcher as a concurrent instance - a false positive.

This patch builds an exclusion set containing the Python process and its
entire ancestor chain, so the running invocation never reports itself.
2026-05-24 19:51:46 -07:00
Ben da8b2e95fd ci(docker): run tests/docker/ in build-amd64 against the freshly-built image
The new tests/docker/ suite (added by this PR) was being picked up by the
sharded pytest matrix in tests.yml, where its session-scoped `built_image`
fixture issued a 3-7min `docker build` under tests/docker/conftest.py's
180s pytest-timeout cap. Every test in the directory failed in fixture
setup across all 6 shards.

Fix the suite so it actually runs (not skips):

1. Wire the docker tests into docker-publish.yml's build-amd64 job, right
   after the existing smoke test. The image is already loaded into the
   local daemon as `nousresearch/hermes-agent:test`; set
   HERMES_TEST_IMAGE to that and the fixture's pre-built-image branch
   short-circuits the rebuild. 21 tests run in ~90s locally against a
   prebuilt image, no rebuild cost on top of the existing build step.

2. Exclude tests/docker/ from scripts/run_tests_parallel.py's default
   discovery so the sharded matrix in tests.yml stops trying to build
   the image. Explicit positional paths (`pytest tests/docker/` or
   `scripts/run_tests.sh tests/docker/`) still pick the suite up — the
   skip rule honors directory-level user intent, matching the existing
   per-file override pattern.

The dedicated docker-tests step runs on every PR that touches docker
code (the existing path filters on docker-publish.yml already cover
`tests/docker/**` via `**/*.py`), so the suite gates real changes.

(cherry picked from commit 4c481860ce)
2026-05-25 12:40:57 +10:00
Ben c524b8a4dc test(docker): fix svstat 'want up' assertion in profile-gateway lifecycle test
After the supervise-perms fix lands, the s6 lifecycle actually works
for the hermes user — hermes -p <profile> gateway start now genuinely
brings the supervised gateway up rather than silently no-op'ing on
EACCES. That exposes a latent bug in this test's assertion: it
expected 'want up' to appear literally in s6-svstat output, but
s6-svstat elides redundancies — when the slot is currently up AND
s6 wants it up, the output is just 'up (pid N pgid N) X seconds';
the explicit 'want up' token only appears when current ≠ wanted
(e.g. 'down (exitcode 1) … , want up' on a crash-loop).

Add a small helper _svstat_wants_up() that reads the want-state
correctly across both spellings:
  * 'up …'                       → wanted up (unless explicit 'want down')
  * 'down …, want up'            → wanted up explicitly
  * 'down …'                     → wanted down

Both stop and start assertions now use the helper. Also rewords
the module docstring to acknowledge that the supervised process
may succeed OR crash-loop depending on environment, but the want-
state contract holds either way.

(cherry picked from commit 02c933aedc)
2026-05-25 12:25:06 +10:00
Ben 7d54288d82 test(dockerfile): recognize s6-overlay/init as a valid PID-1; harden against historical-comment masquerade
PR #30136 CI: test_dockerfile_entrypoint_routes_through_the_init failed
because the test hardcoded known_inits = ('tini', 'dumb-init',
'catatonit'). The PR replaced tini with s6-overlay's /init (which execs
s6-svscan as PID 1) — same SIGCHLD-reaping contract, different name,
so the substring scan against ENTRYPOINT missed it.

Two-part fix:

1. Extend the accepted token list to include 's6-overlay', 's6-svscan',
   and '/init'. The contract these tests enforce is behavioural ('some
   PID-1 init reaps SIGCHLD'), so the names list is purely a recognition
   table and any reaper-capable family should qualify.

2. Harden test_dockerfile_installs_an_init_for_zombie_reaping (the
   sibling check) against comment-only matches. It was scanning the full
   Dockerfile text and only passed because the word 'tini' is still in
   a historical comment explaining why we used to use it. The next
   person to clean up that comment would have silently broken the test.
   New _instruction_text() helper joins only the parsed, non-comment
   Dockerfile instructions so stale comments can't satisfy the check.

(cherry picked from commit ffc1bb6393)
2026-05-25 12:24:58 +10:00
Ben 4f416fc40c fix(docker): make s6 lifecycle work for the unprivileged hermes user
Resolves the explicit "Known follow-up" left by commit 2f8ceeab9 and
the resulting CI failures in tests/docker/test_dashboard.py and
tests/docker/test_s6_profile_gateway_integration.py.

The product gap
---------------
Every hermes runtime operation inside the container runs as the
hermes user (UID 10000) via s6-setuidgid. But s6-supervise — spawned
by s6-svscan running as PID 1 — creates each service's supervise/
and top-level event/ directories with mode 0700 owned by its
effective UID (root). That left every s6-svc / s6-svstat / s6-svwait
call from hermes hitting EACCES on the supervise/control FIFO and
supervise/status — i.e. the entire S6ServiceManager lifecycle
(register, start, stop, unregister) was inert in production.

The 2f8ceeab9 commit message called this out and deferred the fix.
The audit changes that landed alongside it (defaulting docker_exec
to -u hermes) made the integration tests reproduce the bug
deterministically; the fix below resolves it.

The fix: pre-create the supervise/ skeleton hermes-owned
----------------------------------------------------------
Reading s6's source (src/supervision/s6-supervise.c::trymkdir +
control_init), the mkdir and mkfifo calls that build the supervise
tree are EEXIST-safe: if the directory or FIFO is already present,
s6-supervise reuses it and skips the chown/chmod fix-up that would
normally make event/ 03730 root:root. So if we lay the skeleton
down with hermes ownership before triggering s6-svscanctl -a,
s6-supervise inherits our layout and never touches it. The
death_tally / lock / status regular files written later by
s6-supervise (still as root) land mode 0644 — world-readable —
which is all s6-svstat needs.

New module-level helper _seed_supervise_skeleton(svc_dir) in
hermes_cli/service_manager.py lays down:
  svc_dir/event/                       hermes:hermes 03730
  svc_dir/supervise/                   hermes:hermes 0755
  svc_dir/supervise/event/             hermes:hermes 03730
  svc_dir/supervise/control            hermes:hermes 0660 (FIFO)
  svc_dir/log/event/                   hermes:hermes 03730  (if log/ present)
  svc_dir/log/supervise/               hermes:hermes 0755
  svc_dir/log/supervise/event/         hermes:hermes 03730
  svc_dir/log/supervise/control        hermes:hermes 0660 (FIFO)

The log/ branch matters because the logger is a second
s6-supervise instance — without it, unregister rmtree races on
the logger's root-owned supervise dir even after the parent
slot's supervise/ is hermes-owned. The helper is idempotent and
swallows PermissionError on chown so it works equally well when
called from root (cont-init.d) or hermes (runtime register).

Wiring
------
1. S6ServiceManager.register_profile_gateway calls
   _seed_supervise_skeleton(tmp_dir) just before publishing the
   slot via Path.replace. Runtime-registered profile gateways are
   set up by hermes.

2. container_boot._register_service does the same in the cont-init.d
   reconciliation path so boot-time-restored profile slots inherit
   the same layout.

3. New cont-init.d/015-supervise-perms script chowns the supervise/
   and event/ trees for STATIC s6-rc services (dashboard,
   main-hermes). These are spawned by s6-rc before cont-init.d
   gets to run, so the EEXIST-trick doesn't apply; we chown the
   already-existing tree instead. s6-supervise keeps using the
   same files; it never re-asserts ownership on a running service.
   The script skips s6-overlay internal services (s6rc-*,
   s6-linux-*) so the supervision tree itself stays root-only.
   015- slot is intentional: lex-sorts between 01-hermes-setup
   and 02-reconcile-profiles in the container's C-locale, so
   the chown finishes before the reconciler walks the scandir.

Unregister teardown reordering
------------------------------
S6ServiceManager.unregister_profile_gateway now fires
s6-svscanctl -an BEFORE rmtree (with a 200ms grace), so
s6-svscan reaps the supervise child and releases its file
handles on supervise/lock + supervise/status before we try to
remove the directory. Previously rmtree raced s6-supervise on a
set of files inside the supervise dir, and even with the parent
supervise/ now hermes-owned, the contained files (death_tally,
lock, status, written by root) could still be in use.

Dashboard down-state redesign
-----------------------------
The original PR #30136 review fix wrote a 'down' marker file
into /run/service/dashboard/ via cont-init.d/03-dashboard-toggle.
That approach was broken in two ways:

  (a) /run/service/dashboard is a symlink to a TRANSIENT
      /run/s6-rc:s6-rc-init:<tmpdir>/ directory while s6-rc is
      mid-transaction; the touch landed in a soon-to-be-discarded
      tmp.

  (b) Even when written to the final /run/s6-rc/servicedirs/
      location, the 'down' file is only consulted by s6-supervise
      at slot startup. s6-rc's user-bundle explicitly transitions
      'dashboard' to 'up' on every boot, overriding any down
      marker.

The right fix is the canonical s6 pattern: when HERMES_DASHBOARD
is unset, the dashboard run script exits 0 and a companion
finish script exits 125. Per s6-supervise(8), exit code 125 from
the finish script is the 'permanent failure, do not restart'
marker — equivalent to s6-svc -O. The slot reports as 'down' to
s6-svstat, matching the reality that no dashboard process is
running. When HERMES_DASHBOARD IS truthy, finish exits 0 and
restart-on-crash semantics apply.

03-dashboard-toggle is removed (its function is now subsumed by
the run/finish pair).

Tests
-----
Adds four unit tests for _seed_supervise_skeleton covering the
produced layout, the log/ subservice case, the skip-when-no-log
case, and idempotency. The live-container verification continues
to live in tests/docker/test_s6_profile_gateway_integration.py and
tests/docker/test_dashboard.py — both now pass against the
rebuilt image.

References
----------
* Skarnet skaware mailing list 2020-02-02 (Laurent Bercot
  + Guillermo Diaz Hartusch) on unprivileged s6 tool semantics:
  http://skarnet.org/lists/skaware/1424.html
* just-containers/s6-overlay#130 — same EEXIST-preseed pattern,
  community-validated 2016 onward
* https://skarnet.org/software/s6/servicedir.html — exit-code 125
  semantics in finish scripts

(cherry picked from commit c41f908ad4)
2026-05-25 12:23:23 +10:00
Ben Barclay a3abeb5954 Merge pull request #31775 from NousResearch/extending-docker-docs
docs(docker): add 'Installing more tools in the container' section
2026-05-25 11:41:59 +10:00
Ben 6840ca2d1e docs(docker): add 'Installing more tools in the container' section
Documents five approaches for adding tools beyond what the official
image ships with: npx/uvx for npm/Python tools, ad-hoc apt installs
that Hermes remembers, derived images for durability, sidecar
containers for multi-service stacks, and upstreaming via issue/PR
for broadly useful additions.
2026-05-25 11:40:58 +10:00
teknium1 7f6f00f6ec test(dockerfile): accept s6-overlay /init as a known PID-1 init
Follow-up to @benbarclay's #30136 salvage. The pre-existing PID-1
contract tests in tests/tools/test_dockerfile_pid1_reaping.py (added
with #15012) hardcoded tini/dumb-init/catatonit as the only accepted
inits, so they failed after #30136 replaced tini with s6-overlay's
/init.

s6-overlay's PID 1 is s6-svscan, which reaps zombies non-blockingly
on SIGCHLD — same contract the test exists to enforce. Two updates:

  * test_dockerfile_installs_an_init_for_zombie_reaping — accept
    's6-overlay' as a known-installed marker (matches the
    s6-overlay install layer in Ben's Dockerfile).
  * test_dockerfile_entrypoint_routes_through_the_init — accept
    '/init' as a known-routed marker (s6-overlay's PID-1 binary
    lives at /init by convention).

Both assertions still fire if a future Dockerfile rewrite drops
the init entirely. Local: 7/7 pass.
2026-05-24 18:32:14 -07:00
teknium1 5cbb132c1d fix(ci): exclude tests/docker/ from regular test shards; pin read_text encoding
Two CI follow-ups to @benbarclay's #30136 salvage:

1. scripts/run_tests_parallel.py — add 'docker' to _SKIP_PARTS so
   the new tests/docker/ harness doesn't run in the regular test (N)
   matrix. The harness builds the real Dockerfile in a session
   fixture, which can exceed pytest-timeout's 180s ceiling on
   ubuntu-latest where Docker IS available — it surfaced as 6
   identical setup-timeout failures across slices 1–6 on the first
   CI run.

   The docker harness has its own dedicated runner via
   .github/actions/hermes-smoke-test (added in #30136) plus the
   docker-lint workflow. Same treatment as tests/integration/ and
   tests/e2e/ — runs separately, not in the main shards.

2. hermes_cli/service_manager.py — pin encoding='utf-8' on the
   /proc/1/comm read_text call. Ruff PLW1514 enforcement rolled in
   between Ben's last push and the salvage; pure ruff-fix, no
   behavior change.
2026-05-24 18:23:13 -07:00
teknium1 af144cd60d fix(model): include Premium+ in xAI OAuth label
X Premium+ also grants Grok OAuth access — the 'SuperGrok Subscription'
wording suggested SuperGrok was the only entitlement path. Updated to
'SuperGrok / Premium+' across the picker label, setup wizard, auth flows,
and docs so Premium+ subscribers know the row applies to them too.
2026-05-24 18:12:16 -07:00
helix4u 4987fd2a59 fix(model): disambiguate xAI OAuth picker label 2026-05-24 18:12:16 -07:00
Teknium 031f9c9edc fix(image_gen): cache xAI ephemeral URL responses to disk (#26942) (#31759)
xAI's grok-imagine-image API returns ephemeral imgen.x.ai/xai-tmp-* URLs
that 404 within minutes — long before downstream consumers (Telegram
send_photo, browser preview, multi-tier delivery fallback) get a chance
to fetch them.  The xAI image_gen provider was passing those URLs
through unchanged on the elif url: branch; b64 responses were already
cached locally via save_b64_image.  Result: every image_generate call
on a Telegram-routed xai-oauth profile delivered no image, falling
through to text-only.

Adds agent.image_gen_provider.save_url_image() — a sibling helper to
save_b64_image that downloads URL bytes to $HERMES_HOME/cache/images/.
Content-type-aware extension inference with URL-suffix fallback;
oversize cap (25MB default) with partial-write cleanup; empty-body
refusal.  Mirrors the audio_cache pattern used by text_to_speech.

Wires save_url_image into both the xAI and OpenAI providers' URL
branches.  When the download fails (network blip, 404 in-flight) we
log a warning and fall back to the bare URL rather than turning the
tool call into a hard error — the gateway's existing URL-send fallback
then gets a chance to surface the original error legibly.

Test plan:
- tests/agent/test_save_url_image.py — 8 direct tests against a real
  in-process HTTP server: bytes round-trip, content-type → extension,
  URL-suffix fallback, default-to-png, 404 propagation, empty-body
  refusal, oversize cap + cleanup, filename uniqueness.
- tests/plugins/image_gen/test_xai_provider.py — flip
  test_successful_url_response (was asserting the bug), add
  test_url_response_falls_back_to_bare_url_when_download_fails.
- tests/plugins/image_gen/test_openai_provider.py — symmetric pair.

160/160 in the broader image_gen test surface.
2026-05-24 18:10:47 -07:00
teknium1 a4092ab217 fix(profiles): short-circuit s6 hooks on host before importing service_manager
Follow-up to @benbarclay's Docker s6 PR (#30136). The Phase 4 hooks
`_maybe_register_gateway_service` and `_maybe_unregister_gateway_service`
were already documented as "no-op on host", but they reached that no-op
by:

  1. importing `hermes_cli.service_manager`
  2. calling `get_service_manager()` (which calls `detect_service_manager()`)
  3. checking `mgr.supports_runtime_registration()` and returning False

If anything in step 1 or 2 raised an unexpected exception (e.g. a host
machine with a partial s6 install — `/proc/1/comm == s6-svscan` somehow,
but `/run/s6/basedir` absent, or vice versa), the `except Exception`
in the hook would print a confusing "⚠ Could not register s6 gateway
service: ..." warning on a non-container machine that has never touched
the container.

Reorder so `detect_service_manager() != "s6"` is checked FIRST, and
return silently for any detection failure. Host machines now:

  - never import the s6 backend
  - never call get_service_manager()
  - never print an s6-shaped warning under any failure mode

E2E confirmed on host Linux (systemd):
  `_maybe_register_gateway_service(...)` produces empty stdout,
  detect_service_manager() returns "systemd".

Existing tests updated to patch `detect_service_manager` for the s6
call-through cases (they previously relied on get_service_manager
being the only gate, which is no longer true). Added one new test —
`test_register_silent_when_detect_throws` — asserting that a broken
detector cannot leak a warning to host users.

cc @benbarclay — visible behavior change vs. your branch is one
fewer code path on host. Test changes are minimal (one helper +
`_patch_detect_s6` opt-in per s6 test). Happy to revert if you
prefer the original shape.
2026-05-24 18:07:47 -07:00
kshitijk4poor af973e4071 refactor(gateway): migrate Mattermost adapter to bundled plugin
Second migration of an existing built-in platform adapter after Discord
(PR #30591) — follows the same shape established by IRC / Teams / LINE /
Google Chat / SimpleX and the playbook in
`references/platform-plugin-migration.md`. Advances the umbrella refactor
in #3823.

Matches Discord's parity bar — adapter under `plugins/platforms/mattermost/`
with the standard `__init__.py` / `adapter.py` / `plugin.yaml` shell,
`register(ctx)` entry point, **no back-compat shim** at the old import
path, and full parity for all five hooks Discord uses plus the
`apply_yaml_config_fn` hook (mattermost is the second consumer of #25443
after Discord):

* `standalone_sender_fn` — out-of-process cron delivery via Mattermost
  REST API. Picks up the thread_id + media_files capabilities the
  legacy `_send_mattermost` lacked (parity with Discord's `_standalone_send`).
* `setup_fn` — interactive `hermes setup gateway` wizard.
* `apply_yaml_config_fn` — translates `config.yaml` `mattermost:` keys
  (`require_mention`, `free_response_channels`, `allowed_channels`) into
  `MATTERMOST_*` env vars (replaces the hardcoded block in
  `gateway/config.py`).
* `is_connected` — declares connection state from `MATTERMOST_TOKEN` +
  `MATTERMOST_URL`.
* `check_fn` — verifies aiohttp is installed and both required env vars
  are set.
* plus `allowed_users_env`, `allow_all_env`, `cron_deliver_env_var`,
  `max_message_length` (4000 — Mattermost practical limit), `emoji`,
  `required_env`, `install_hint`.

Files
-----
* `gateway/platforms/mattermost.py` (873 LOC) →
  `plugins/platforms/mattermost/adapter.py` (git rename, R071) +
  appended `register()` block, hook helpers, and `_standalone_send`
  with media upload + thread_id support.
* New `plugins/platforms/mattermost/{__init__.py, plugin.yaml}` with
  `requires_env` / `optional_env` declarations covering MATTERMOST_URL,
  MATTERMOST_TOKEN, MATTERMOST_ALLOWED_USERS, MATTERMOST_ALLOW_ALL_USERS,
  MATTERMOST_HOME_CHANNEL, MATTERMOST_REPLY_MODE,
  MATTERMOST_REQUIRE_MENTION, MATTERMOST_FREE_RESPONSE_CHANNELS,
  MATTERMOST_ALLOWED_CHANNELS.
* `gateway/config.py`: delete 17-LOC `mattermost_cfg` YAML→env bridge
  (moved into plugin's `_apply_yaml_config`).
* `gateway/run.py::_create_adapter`: delete `Platform.MATTERMOST elif` —
  replaced by the existing generic plugin-registry-first dispatch.
* `tools/send_message_tool.py`: delete `_send_mattermost` (22 LOC) +
  `Platform.MATTERMOST elif` in `_send_to_platform` — the `else` branch
  already routes plugin platforms through `_send_via_adapter`, which
  hits the registry's `standalone_sender_fn`.
* `hermes_cli/setup.py`: delete `_setup_mattermost` (44 LOC) — replaced
  by the plugin's `interactive_setup`.
* `hermes_cli/gateway.py`: delete `_PLATFORMS["mattermost"]` dict entry
  (3 LOC) — plugin's `setup_fn` is dispatched via the plugin path in
  `_configure_platform`.
* Consumer rewrite: 5 test files (test_mattermost.py,
  test_media_download_retry.py, test_send_multiple_images.py,
  test_stream_consumer.py, test_ws_auth_retry.py) get
  `gateway.platforms.mattermost` → `plugins.platforms.mattermost.adapter`
  with the bulk-rewrite recipe from the platform-plugin-migration playbook.
  Single `mock.patch` string in test_stream_consumer.py also repointed.
* `tests/tools/test_send_message_missing_platforms.py`: thin
  `(token, extra, chat_id, message)` compat shim around the plugin's
  `_standalone_send(pconfig, …)` so existing test bodies continue to
  work without rewriting every signature.

Validation
----------
* Plugin discovery: mattermost registers from `plugins/platforms/mattermost/`
  alongside discord / teams / irc / line / google_chat / simplex.
  All 9 hooks present (setup_fn, standalone_sender_fn,
  apply_yaml_config_fn, is_connected, check_fn, allowed_users_env,
  allow_all_env, cron_deliver_env_var, max_message_length=4000).
* Mattermost-touching tests: 62/62 pass
  (`test_mattermost.py` + `test_send_message_missing_platforms.py`).
* Targeted selectors (mattermost or platform_registry or stream_consumer
  or ws_auth_retry or media_download_retry or send_multiple_images or
  send_message_tool or platform_connected): 433/433 pass.
* Full sweep (`scripts/run_tests.sh tests/gateway/ tests/cron/
  tests/tools/test_send_message_tool.py tests/tools/test_send_message_missing_platforms.py
  tests/integration/`): **6220/6220 pass in 47.8s, 0 failures**.
* Lint: ruff clean on all touched files.
* Git identity verified: kshitijk4poor.
* Rename detection: R071 (similarity dropped from a hypothetical R09x
  by the ~320-line appended register block — ~36% growth over the
  873-LoC base, vs Discord's 5101 LoC base which kept R091).

Closes part of #3823.
2026-05-24 18:05:33 -07:00
Ben 6c49bdc4f4 docs(plans): trim s6-overlay plan to a post-implementation reference
PR #30136 review item O7: the plan doc was 3,191 lines — 5x the
size of any other plan in docs/plans/ and the largest reference
document in the repo. With the implementation shipped, most of
that content is either:

* The phase-by-phase TDD walkthrough (~2,800 lines): now canonical
  in the PR commit log (`git log a957ef083..a6f7171a5`).
* The v2/v3 re-validation preambles: artifacts of the planning
  process, no longer load-bearing.
* The full Open Questions deliberations with options A/B/C laid
  out: collapsed into the Decision Log.
* The Rollout Plan and Estimated Timeline: history.

Trim to ~430 lines covering what readers actually need going
forward: the goal, architecture, scope, key design decisions
(D1–D9), risk register (now including the three risks surfaced
in PR review — `_s6_running` detection, svscanctl FIFO perms,
supervise control FIFO perms), the decision log including the
post-merge additions, and the verification checklist (now all
boxes ticked).

Header now reads 'Status: shipped' and points at the PR. The git
history preserves the full v3 plan for anyone who needs it.
2026-05-24 18:05:33 -07:00
Ben cd5b2c4123 test(docker): poll for boot-log signal instead of fixed sleeps
PR #30136 review item O6: test_container_restart.py used fixed
`time.sleep(8)` calls after `docker restart` to wait for the
cont-init reconciler to finish. Fixed sleeps are slow when the
event happens fast and false-fail when the event happens slow.

Replace with two polling helpers:

* `_wait_for_path(container, path, kind='f' | 'd', deadline_s=...)`
  — generic `test -f/-d` poller. Returns True on success, False on
  timeout; callers assert with a clear message.
* `_wait_for_reconcile_log_mention(container, profile, ...)` — the
  reconciler's per-profile log line is the canonical signal that
  the cont-init reconcile has finished for that profile. Poll on
  it instead of a sleep that hopes 8 seconds is enough.

The fixture-level setup wait is similarly migrated: it now polls
for `profile=default` in the boot log (every container always
gets a default-slot entry per item I1) and raises a clear timeout
error from the fixture if the container never finishes cont-init —
much better diagnostics than a mid-test KeyError.

The remaining `time.sleep()` calls are all internal interval_s
between probe attempts; no fixed wait points left.
2026-05-24 18:05:33 -07:00
Ben 04bdbce906 docs(docker): deprecation warning in entrypoint.sh shim
PR #30136 review item O5: docker/entrypoint.sh is now a thin shim
that forwards to stage2-hook.sh — the real ENTRYPOINT is /init plus
main-wrapper.sh. External scripts that hard-coded entrypoint.sh as
the container's ENTRYPOINT will see the cont-init bootstrap happen
but the CMD will not be exec'd (because stage2-hook only handles
bootstrap; main-wrapper.sh handles the CMD passthrough).

Add a stderr warning explaining the new contract and pointing
callers at the migration path (drop the --entrypoint override).
The shim itself stays in place for one release cycle so the
deprecation isn't a hard break — anyone still invoking it sees
the warning in their logs and has time to migrate.
2026-05-24 18:05:33 -07:00
Ben d0b1ab48dc fix(container_boot): publish reconciled service dirs atomically
PR #30136 review noted the asymmetry: `register_profile_gateway`
used tmp_dir + rename to publish a new service slot atomically,
but the boot-time reconciler wrote files into the slot directly.
Same underlying concern (a concurrent s6-svscan rescan could
observe a half-populated directory), different code path.

Rewrite `container_boot._register_service` to mirror the manager:
build everything in `<scandir>/gateway-<profile>.tmp/`, then
`Path.replace` into place. If a previous interrupted run left a
`.tmp` sibling, it's cleaned up before the new build starts. If
the target already exists, it's removed before the rename so
`Path.replace` doesn't error on a non-empty target (Linux `rename`
overwrites empty targets only).

Three new tests: atomic publication leaves no .tmp leftovers,
overwriting an existing slot still leaves no .tmp leftovers, and
a stale .tmp from an interrupted run is cleaned up automatically.
2026-05-24 18:05:33 -07:00
Ben 4443fb481d fix(container_boot): rotate container-boot.log when it exceeds 256 KiB
PR #30136 review noted: container-boot.log was append-only with no
rotation. On a long-lived container with frequent restarts and
many profiles it would grow unboundedly (~80 B per profile per
reconcile pass).

Add a soft cap: when the file size hits 256 KiB (`_LOG_ROTATE_BYTES`,
≈3000 reconcile lines, ≈1 year of daily reboots × 5 profiles), the
current file is renamed to `container-boot.log.1` (replacing any
existing one) before new entries are appended. Worst case is two
files at ~512 KiB — well within visibility limits for grep/cat.

Rotation is intentionally simple (no logrotate or s6-log machinery
for one append-only file). Failures during rotation are logged via
the module logger and treated as non-fatal — we keep appending to
the existing file rather than dropping the reconcile entry. Three
new unit tests cover above-threshold rotation, below-threshold
non-rotation, and overwrite of an existing .1 file.
2026-05-24 18:05:33 -07:00
Ben 9914bfc594 docker: drop sh -c wrappers from stage2-hook.sh
PR #30136 review caught: three `s6-setuidgid hermes sh -c "..."`
invocations in stage2-hook.sh interpolated $HERMES_HOME into a
nested shell context. Practically low-risk (a malicious HERMES_HOME
already requires container-launch privileges) but the cleaner
pattern is to invoke commands directly so the shell isn't a second
interpreter.

* `mkdir -p` of the data subdirs now runs directly via s6-setuidgid,
  one path per arg.
* The .install_method stamp is written via `printf | tee` — also no
  shell wrapper.
* The skills_sync invocation uses the venv's python by absolute path
  instead of sourcing activate inside a shell. skills_sync.py doesn't
  need anything from activate beyond sys.path, which the bin-stub
  python already provides.

No behavior change. Just a smaller attack surface and a script
that's easier to read.
2026-05-24 18:05:33 -07:00
Ben d735b083e8 fix(service_manager): rip out dead port parameter
PR #30136 review caught: `_allocate_gateway_port()` in profiles.py
computed a SHA-256-derived port that was threaded through
`register_profile_gateway(profile, port=N)` →
`_render_run_script(profile, port, extra_env)` → and then **ignored**.
The rendered run script picked the bind port from the profile's
config.yaml (`[gateway] port = …`), never from the allocator. So
the entire allocator + parameter chain was dead code.

Remove:

* `hermes_cli.profiles._allocate_gateway_port` (deterministic
  SHA-256 → [9200, 9800) — never used).
* `port` kwarg from `ServiceManager.register_profile_gateway`
  (Protocol + Mixin + S6 implementation).
* `port` positional arg from `_render_run_script(profile, port,
  extra_env)` — now `_render_run_script(profile, extra_env)`.
* The pass-through call in `profiles._maybe_register_gateway_service`.

config.yaml is now the single source of truth for gateway port
selection — matches reality and reduces the API surface. Three
explanatory comments in service_manager.py / profiles.py document
the retirement so future readers don't reach for the allocator and
find a ghost.

Tests: drop the three `_allocate_gateway_port` tests; update
fakes' signatures throughout test_service_manager.py and
test_profiles_s6_hooks.py to match the new no-port API.
2026-05-24 18:05:33 -07:00
Ben 143a189def docs(compose): update entrypoint comment for s6-overlay
PR #30136 review caught: docker-compose.yml still said "If you
override entrypoint, keep /opt/hermes/docker/entrypoint.sh in the
command chain." That was true under tini; under s6-overlay the
entrypoint is /init plus main-wrapper.sh, and entrypoint.sh is now
only a backward-compat shim.

Replace with an accurate description: /init must remain first in the
chain because it's PID 1 and runs the cont-init.d scripts (chown,
profile reconcile, dashboard toggle) before any service starts.
2026-05-24 18:05:33 -07:00
Ben 1dfabe47b3 fix(docker): dashboard slot stays 'down' when HERMES_DASHBOARD unset
PR #30136 review caught a false positive: when HERMES_DASHBOARD was
unset, the dashboard run script did `exec sleep infinity`, so
`s6-svstat /run/service/dashboard` reported the slot as 'up'.
`hermes doctor` and any other s6-svstat-based health check saw the
dashboard as supervised-running even though no dashboard process
existed.

Add cont-init.d/03-dashboard-toggle: writes a `down` marker file
into `/run/service/dashboard/` when HERMES_DASHBOARD is falsy,
removes any leftover marker when it's truthy. s6-supervise honors
`down` by not starting the service, so s6-svstat reports 'down' —
matching reality.

The run script's HERMES_DASHBOARD case-statement stays in place as
a belt-and-suspenders guard, so the two layers can never disagree.

Two new integration tests lock the behavior: slot reports down
when unset; slot reports up when set to 1.
2026-05-24 18:05:33 -07:00
Ben b28b3f51d3 fix(service_manager): friendly errors for missing slots and s6-svc failures
PR #30136 review caught: `S6ServiceManager.start/stop/restart` called
`subprocess.run(check=True)` on `s6-svc`, so any failure surfaced as
a raw `CalledProcessError` traceback. The two cases operators
actually hit are:

  1. The service slot doesn't exist — most commonly because the user
     typed a profile name wrong (`hermes -p typo gateway start`).
  2. s6-svc itself fails — most commonly EACCES on the supervise
     control FIFO when running unprivileged.

Both deserve named errors with actionable messages, not stacktraces.

Changes:

* Add `S6Error` base + two concrete errors in `hermes_cli.service_manager`:
    - `GatewayNotRegisteredError(profile)` — carries the unprefixed
      profile name; message: `no such gateway 'typo': register it
      with `hermes profile create typo` first, or pass an existing
      profile name via `-p <name>``.
    - `S6CommandError(service, action, returncode, stderr)` — carries
      the s6-svc rc and stderr; message: `s6-svc start on
      'gateway-coder' failed (rc=111): <stderr>`.

* Factor lifecycle dispatch through `_run_svc(flag, label, name)`:
  pre-checks that the service directory exists (raises
  GatewayNotRegisteredError before invoking s6-svc), then runs
  s6-svc and translates any CalledProcessError into S6CommandError.

* `_dispatch_via_service_manager_if_s6` in `hermes_cli.gateway`
  catches both errors and prints `✗ <message>` + `sys.exit(1)`
  instead of letting the exception bubble. The dispatch path that
  used to dump a traceback at the user now gives an actionable
  one-liner.

Tests: 6 new tests for the error types and their CLI rendering;
existing lifecycle test pre-seeds the slot directory before calling
`mgr.start` etc.
2026-05-24 18:05:33 -07:00
Ben b044c1ac29 fix(container_boot): always register gateway-default slot
PR #30136 review caught: `hermes gateway start` (no `-p`) inside
the container resolves `_profile_suffix() == ""` → service name
`gateway-default`, but no such slot was ever registered. The Phase 4
profile-create hook only fired on `hermes profile create <name>`,
and the root profile (which lives at the top of $HERMES_HOME, not
under `profiles/`) was never one of those. So bare `hermes gateway
start` landed on `s6-svc -u /run/service/gateway-default` →
uncaught `CalledProcessError` → traceback to the user.

Changes:

1. `reconcile_profile_gateways` now always registers a
   `gateway-default` slot before iterating named profiles. Its
   prior state is read from `$HERMES_HOME/gateway_state.json`
   (sibling to the profile root, not under `profiles/`); stale
   runtime files there are swept the same way. Auto-up only if the
   prior state was `running` — same rule as named profiles.

2. `S6ServiceManager._render_run_script` special-cases
   `profile == "default"` to emit `hermes gateway run` with NO
   `-p` flag. Passing `-p default` would resolve to
   `$HERMES_HOME/profiles/default/` — a different profile that
   almost certainly doesn't exist. The empty profile-suffix
   convention is the dispatcher's contract and the run script has
   to match.

3. A user-created `profiles/default/` collides with the reserved
   root-profile slot; the reconciler now skips it with a warning
   rather than producing two registrations of the same service name.

Action-list ordering is stable: `default` first, then named
profiles in directory order. Boot-log readers can rely on this.

Tests: 8 new dedicated default-slot tests plus updates to every
existing test that asserted against the action list (via the new
`_named_actions` helper that drops the always-present default
entry).
2026-05-24 18:05:33 -07:00
Ben a1a53a5d6e docs(docker): dashboard IS supervised — update note that contradicted the PR
PR #30136 review caught that website/docs/user-guide/docker.md still
said "The dashboard side-process is **not supervised** — if it
crashes, it stays down until the container restarts." That was true
under tini but is the opposite of the s6 behavior this PR ships and
`test_dashboard_restarts_after_crash` proves.

Replace with a description of what users actually see now: automatic
restart by s6-overlay, new PID after a short backoff, logs via
`docker logs`. The standalone-container caveat carries forward
unchanged.
2026-05-24 18:05:33 -07:00
Ben 6dedaa4846 fix(gateway): route --all stop/restart through s6 under container
PR #30136 review caught that `hermes gateway stop --all` and
`... restart --all` were broken under s6. The Phase 4 dispatcher was
gated on `not stop_all` (and the symmetric restart_all), so `--all`
fell through to `kill_gateway_processes(all_profiles=True)`. pkill
SIGTERMed every gateway, s6-supervise observed the crashes, and
restarted every gateway ~1s later — net effect: `--all` *kicked*
gateways instead of *stopping* them.

Add `_dispatch_all_via_service_manager_if_s6(action)` that iterates
`mgr.list_profile_gateways()` and routes stop/restart through each
service slot. s6's `want up`/`want down` flips correctly, so a
stop persists. Partial failures are surfaced per-profile with a
running success count; the host pkill path is only reached when s6
isn't in play.

`start --all` isn't a CLI surface — the helper rejects it and
returns False (host code path can take over).
2026-05-24 18:05:33 -07:00
Ben fc26a5a1c8 fix(ci): drop --entrypoint override in hermes-smoke-test action
PR #30136 review caught a silent regression: the smoke-test action
overrode ENTRYPOINT to `/opt/hermes/docker/entrypoint.sh`, which the
s6-overlay migration reduced to a shim that just `exec`s the stage2
hook. stage2-hook ignores its CMD args, prints "Setup complete", and
exits 0 — so `hermes --help` and `hermes dashboard --help` never
ran. The #9153 regression guard was a green-always no-op.

Drop the override so the smoke test uses the image's real ENTRYPOINT
chain (`/init` + `main-wrapper.sh`), which is the actual production
startup path. `hermes --help` and `hermes dashboard --help` now run
through the full supervision tree and exercise the real argv routing.
2026-05-24 18:05:33 -07:00
Ben d4e452b67b fix(docker): SHA256-verify s6-overlay tarballs
PR #30136 review flagged the s6-overlay install as a supply-chain
regression vs the gosu source it replaced — `tianon/gosu` was
digest-pinned via `FROM ...@sha256:...`, but the three new
ADD/curl downloads had no integrity check at all.

Pin all three tarballs (noarch, symlinks-noarch, per-arch) to
upstream-published SHA256s via ARGs. Verification happens via
`sha256sum -c` against a single checksum file (avoids a piped-shell
hadolint DL4006 warning under dash). To bump S6_OVERLAY_VERSION,
fetch the four `.sha256` files from the new release and update
the ARGs — documented inline.

If upstream artifacts are tampered with mid-build, the build now
fails loudly at the verification step instead of silently
producing a tainted image.
2026-05-24 18:05:33 -07:00
Ben f7893df4d2 fix(docker): support multi-arch s6-overlay install (amd64 + arm64)
The Dockerfile only ADD'd `s6-overlay-x86_64.tar.xz`, so the
`build-arm64` job in docker-publish.yml — which runs on
`ubuntu-24.04-arm` and publishes by digest — produced an image whose
`/init` couldn't exec on actual arm64 hosts. Apple Silicon and ARM
server users were getting a broken container.

Map BuildKit's `TARGETARCH` (`amd64` / `arm64`) to s6's kernel-arch
naming (`x86_64` / `aarch64`) inside the RUN step and fetch the
correct tarball via `curl` (`ADD`'s URL is evaluated at parse time,
before TARGETARCH substitution, so dynamic arch selection requires
RUN). The noarch + symlinks tarballs are architecture-independent
and stay as ADDs.

The audit case is now explicit: unsupported architectures fail loudly
at build time rather than producing a silently-broken image.
2026-05-24 18:05:33 -07:00
Ben fc39296e1f fix(service_manager): s6 detection works for unprivileged hermes user
PR #30136 review surfaced two issues, both rooted in the same audit gap:
docker integration tests were running as root, not the unprivileged
`hermes` user (UID 10000) that the runtime actually uses via
`s6-setuidgid hermes`. Anything that probed PID-1 state or wrote to
the s6 control surface worked as root in the tests but was inert in
production.

Fixes:

1. `_s6_running()` previously called `Path("/proc/1/exe").resolve()`,
   which is root-only readable. For UID 10000 the symlink yields
   PermissionError, `resolve()` silently returns the unresolved path,
   and `exe.name == "exe"` — so detection always returned False, the
   service-manager runtime-registration path was inert, and every
   `hermes profile create` / `hermes -p X gateway start` silently
   skipped the s6 hook. Replace with `/proc/1/comm` (world-readable)
   + `/run/s6/basedir` (s6-overlay-specific) — both required, fail
   closed.

2. `02-reconcile-profiles` now also chowns `/run/service/.s6-svscan/`
   {control,lock} to hermes so `s6-svscanctl -a/-an` works without
   root. Previously the directory chown stopped at `/run/service`
   and the FIFO inside stayed root-owned, so `register_profile_gateway`
   from hermes failed at the rescan-trigger step with EACCES — the
   wrapper in profiles.py caught the exception and printed a swallowed
   warning, so profile creation appeared to succeed while the slot
   was rolled back.

Audit changes to flush this class of bug next time:

- Add `docker_exec` / `docker_exec_sh` helpers to `tests/docker/conftest.py`
  that default to `-u hermes`. The module docstring explains why and
  flags `user="root"` as opt-in only for tests that explicitly need
  root (none currently do).
- Refactor every `docker exec` call in tests/docker/ through the new
  helpers (test_dashboard.py, test_zombie_reaping.py, test_profile_gateway.py,
  test_container_restart.py, test_s6_profile_gateway_integration.py).
- Add 5 unit tests covering `_s6_running` under various probe states
  (both signals present; comm wrong; basedir missing; PermissionError
  on /proc/1/comm; missing /proc — non-Linux). The PermissionError
  test is the explicit regression guard for the original bug.

Known follow-up: the per-service `supervise/control` FIFO inside each
`/run/service/gateway-<profile>/supervise/` is created root-owned by
s6-supervise (which runs as root because s6-svscan is PID 1). `s6-svc
-u/-d/-t` from the hermes user will get EACCES on those. The audit
under `-u hermes` will reveal this in lifecycle tests — surfacing the
issue cleanly so it can be fixed in a focused follow-up (likely via a
small SUID helper or a polling chown loop in cont-init.d). The
detection + svscanctl fixes here are independent and complete on
their own.
2026-05-24 18:05:33 -07:00
Ben 4b4c36cb61 feat(docker): remove gosu from bundled image; s6-setuidgid handles privilege drop
The s6-overlay migration replaced every runtime use of gosu with
s6-setuidgid (in stage2-hook.sh, main-wrapper.sh, per-service run
scripts, and cont-init.d hooks), but the gosu binary itself was still
being copied into the image from tianon/gosu, and several comments
across the repo still pointed to it.

Image changes:
- Drop the FROM tianon/gosu:1.19-trixie AS gosu_source stage
- Drop the COPY --from=gosu_source /gosu /usr/local/bin/ layer
- Net: one fewer base-image pull, ~12-15 MB layer eliminated

Documentation/comment refresh (no behavior change):
- Dockerfile: update root-user rationale comment + cont-init.d comment
- docker/main-wrapper.sh: drop "pre-s6 contract (gosu drop)" reference
- docker-compose.yml: update UID/GID remap comment
- .hadolint.yaml: update DL3002 ignore rationale
- website/docs/user-guide/docker.md: privilege-drop helper is s6-setuidgid now
- hermes_cli/config.py: docker_run_as_host_user docstring

tools/environments/docker.py runs *arbitrary user images* via the
terminal backend, not the bundled Hermes image. It still needs SETUID/
SETGID caps so user images that use gosu/su/s6-setuidgid all work.
Renamed the cap-list constant _GOSU_CAP_ARGS → _PRIVDROP_CAP_ARGS and
updated comments to list s6-setuidgid alongside the others as examples.
The matching test (test_security_args_include_setuid_setgid_for_gosu_drop
→ test_security_args_include_setuid_setgid_for_privdrop) was renamed
and its docstring updated; behavior is unchanged.

Verification:
- hadolint clean against .hadolint.yaml
- shellcheck clean against all docker/ shell scripts
- Image rebuilt successfully (sha 1a090924ccea)
- Docker harness: 19 passed in 41.87s (every Phase 0 test + Phase 4
  per-profile-gateway lifecycle + container-restart reconciliation)
- tests/tools/test_docker_environment.py: 23 passed (rename did not
  break test discovery; pre-existing unrelated mock warning)

The plan document (docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md)
intentionally retains its historical references to gosu — it describes
the pre-s6 entrypoint as background for understanding the migration.
2026-05-24 18:05:33 -07:00
Ben a36221ed91 docs(s6): document container supervision; doctor + skill + user-guide updates
Phase 5 of the s6-overlay supervision plan. Documentation + small
diagnostic cleanups; no behavior changes.

website/docs/user-guide/docker.md:
  - Replace the old 'entrypoint script does the bootstrap' section
    with the s6-overlay boot flow (cont-init.d/01-hermes-setup,
    cont-init.d/02-reconcile-profiles, static main-hermes + dashboard
    services, ENTRYPOINT-as-main-program pattern).
  - Add a 'Per-profile gateway supervision' subsection covering the
    new lifecycle commands, restart semantics, log persistence, and
    'Manager: s6 (container supervisor)' status reporting.
  - Add 'Breaking change vs. pre-s6 images' callout naming the
    /init ENTRYPOINT and pointing affected wrappers at the pin
    workaround.

website/docs/user-guide/profiles.md:
  - Add a note under 'Persistent services' pointing container users
    at the docker.md section explaining s6 supervision inside the
    image. Host-side systemd/launchd documentation is unchanged.

skills/software-development/hermes-s6-container-supervision/SKILL.md:
  - New maintainer skill covering the supervision-tree map, file
    layout, the Architecture B rationale (cont-init.d args + halt
    exit-code propagation), quick recipes, and the 8 pitfalls we hit
    while implementing the plan (PATH-without-/command, root-owned
    profile dirs, SOUL.md as marker, the '143' anti-pattern, etc.).

hermes_cli/doctor.py:
  - _check_gateway_service_linger skips on s6 (the linger concept
    doesn't apply inside the container).
  - New _check_s6_supervision section reports main-hermes/dashboard
    state and per-profile-gateway count (registered vs supervised
    up), only inside the s6 container. Host doctor output unchanged.
  - External Tools / Docker check no longer emits a 'docker not
    found' warning inside the container; prints an explanatory
    info line instead. Still respects an explicit TERMINAL_ENV=docker
    (in case the user mounted /var/run/docker.sock).

hermes_cli/gateway.py:
  - Document _container_systemd_operational more precisely: it's
    NOT for our Hermes Docker image (s6-overlay handles that via
    detect_service_manager() == 's6'). It still covers
    systemd-nspawn / k8s-with-systemd-init cases, so leaving it in
    place is correct; the docstring just makes that explicit.

Test harness (verification, no test changes in this commit):
  19 passed, 0 xfailed. 66 service-manager / container-boot /
  profiles-s6-hooks / gateway-s6-dispatch unit tests still green.
  61 doctor tests still green. Hadolint + shellcheck clean.

Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
2026-05-24 18:05:33 -07:00
Ben 2afefc501c feat(docker): per-profile s6 supervision + container-restart reconciliation
Phase 4 of the s6-overlay supervision plan. Activates the Phase 3
S6ServiceManager by hooking it into the profile lifecycle and the
`hermes gateway start/stop/restart` dispatcher, and adds a cont-
init.d-time reconciliation pass that survives `docker restart`.

Task 4.0 — container-boot reconciliation:
  /run/service/ is tmpfs, so every `docker restart` wipes every
  per-profile gateway slot. /etc/cont-init.d/02-reconcile-profiles
  invokes hermes_cli.container_boot.reconcile_profile_gateways() on
  every boot, which walks $HERMES_HOME/profiles/<name>/, reads each
  gateway_state.json, recreates the s6 service slot, and auto-starts
  only those whose last state was 'running'. Other states
  (stopped, starting, startup_failed, missing) register the slot
  in the down state — avoiding crash-loops across restarts for a
  gateway that was broken last boot. Per-profile outcome is recorded
  to $HERMES_HOME/logs/container-boot.log.

  Implementation: hermes_cli/container_boot.py + 12 unit tests.
  Profile-marker is SOUL.md, not config.yaml, because `hermes profile
  create` only seeds SOUL.md by default (config.yaml comes from
  `hermes setup`).

Task 4.1 / 4.2 — profile create/delete hooks:
  hermes_cli/profiles.py::create_profile now calls
  _maybe_register_gateway_service(<canon>) at the end, which routes
  through ServiceManager.register_profile_gateway when running on s6
  and no-ops on host backends. delete_profile mirrors with
  _maybe_unregister_gateway_service. _allocate_gateway_port produces
  a deterministic SHA-256-derived port in [9200, 9800).

Task 4.3 — gateway dispatch + remove rejection arms:
  _dispatch_via_service_manager_if_s6(action) intercepts
  start/stop/restart at the top of each subcommand and routes them
  through S6ServiceManager.{start,stop,restart}. The pre-Phase-4
  `elif is_container():` rejection arms are kept as fallback for
  pre-s6 containers / unsupported runtimes, but only ever fire when
  detect_service_manager() != 's6'. install/uninstall under s6
  print informational guidance pointing users at profile create/delete.

  Removed the two xfail(strict=True) markers from
  tests/docker/test_profile_gateway.py — both tests now pass strictly.

Task 4.4 — status reporting:
  get_gateway_runtime_snapshot() reports
  Manager: 's6 (container supervisor)' inside an s6 container instead
  of 'docker (foreground)'.

Plan-vs-reality drift fixed in this commit:
  - Plan's S6ServiceManager._render_run_script used
    `gateway start --foreground --port {port}` — invented args; the
    real CLI is `gateway run`. Switched accordingly. port arg
    retained for API parity but now documented as 'currently ignored'.
  - Plan's reconciler keyed on config.yaml; switched to SOUL.md
    (config.yaml is created by hermes setup, not by hermes profile
    create, so the original gate caught nothing).
  - The plan's _dispatch helper used _profile_arg() which returns
    '--profile <name>' (i.e. with the flag prefix). Switched to
    _profile_suffix() which returns the bare name.
  - Architecture B's docker exec doesn't get /command on PATH or
    the venv on PATH; Dockerfile's runtime PATH now includes
    /opt/hermes/.venv/bin so 'docker exec <c> hermes ...' works
    without sourcing the venv.
  - stage2-hook now chowns $HERMES_HOME/profiles to hermes on every
    boot, not just on the UID-remap path. Without this, files created
    by docker-exec-as-root accumulate and the next reconciler run
    fails with PermissionError reading SOUL.md.

Test harness:
  19 passed, 0 xfailed (the two pre-Phase-4 xfail targets flip to
  passing). 78 unit tests across service_manager + container_boot +
  profiles_s6_hooks + gateway_s6_dispatch. Hadolint + shellcheck
  pass cleanly.

Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
2026-05-24 18:05:33 -07:00
Ben 0abf661f71 feat(service_manager): add S6ServiceManager for runtime gateway supervision
Phase 3 of the s6-overlay supervision plan. Implements the runtime-
registration surface from D4 — only the s6 backend supports
register_profile_gateway / unregister_profile_gateway /
list_profile_gateways; host backends continue to raise
NotImplementedError. No caller yet (Phase 4 wires in the profile
create/delete hooks).

Key implementation notes:

  - Service directory shape: /run/service/gateway-<profile>/{type,run,log/run}.
    Atomic register: write to gateway-<profile>.tmp, fsync via
    os.rename. Cleanup on rescan failure.

  - Run script uses #!/command/with-contenv sh so HERMES_HOME and any
    extra_env arrive at exec time. The hermes -p <profile> gateway
    start --foreground --port <port> command is wrapped in
    s6-setuidgid hermes for the per-service privilege drop (OQ2-A).

  - Log script (OQ8-C): persists via s6-log to
    ${HERMES_HOME}/logs/gateways/<profile>/. CRITICAL — HERMES_HOME is
    a runtime env-var expansion in the rendered script, NOT a Python
    f-string substitution. Negative-asserted in
    test_s6_register_creates_service_dir_and_triggers_scan so
    regressions are caught.

  - PATH gotcha: /command/ is only on PATH for processes spawned by
    the supervision tree (services, cont-init.d). `docker exec` and
    profile-create hooks don't get it. S6ServiceManager calls all
    s6-* binaries via absolute path through the new _S6_BIN_DIR
    constant so callers don't have to fix up env vars.

  - validate_profile_name rejects path-traversal, leading-dash (s6
    would parse as a flag), uppercase, whitespace, and names >251
    chars (s6-svscan default name_max).

Test coverage:
  - 13 new unit tests in tests/hermes_cli/test_service_manager.py
    (kind detection, run-script content, env quoting, register
    rollback on rescan failure, unregister idempotence, list filter,
    lifecycle dispatch, svstat parsing). Total: 36 passing.
  - 2 new in-container integration tests in
    tests/docker/test_s6_profile_gateway_integration.py validating
    end-to-end registration against a real s6 supervision tree.

Docker harness: 14 passed, 2 xfailed (Phase 4 target unchanged).

Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
2026-05-24 18:05:33 -07:00
Ben e0e9c895d3 feat(docker)!: replace tini with s6-overlay as PID 1
BREAKING CHANGE: the container ENTRYPOINT is now /init (s6-overlay)
instead of /usr/bin/tini. Main hermes runs as the container CMD with
TTY inherited (preserving --tui), dashboard runs as a supervised s6-rc
service (HERMES_DASHBOARD=1 starts it; crashes auto-restart), and the
ground is laid for per-profile gateway supervision (Phase 3+4).

All five pre-s6 docker run invocation patterns continue to work
identically — verified by the Phase 0 docker harness:

  docker run <image>                  → `hermes` with no args
  docker run <image> chat -q "..."    → `hermes chat -q ...` passthrough
  docker run <image> sleep infinity   → `sleep infinity` direct
  docker run <image> bash             → interactive bash
  docker run -it <image> --tui        → interactive Ink TUI

Phase 2 harness result: 12 passed, 2 xfailed (Phase 4 target). Hadolint
+ shellcheck pass cleanly.

Architecture pivot from plan v3 (documented in main-hermes/run header):
the plan called for main hermes to be an s6-supervised service, but
two real s6-overlay v3 mechanics blocked that — cont-init.d scripts
receive no arguments (CMD args are not visible to stage2-hook), and
`/run/s6/basedir/bin/halt` after writing the exit code did not
propagate the desired exit code (container exits 143). We use the
s6-overlay-native CMD pattern instead: main-wrapper.sh is the
container's main program (ENTRYPOINT prepends it so leading-dash
args like --version aren't intercepted by /init), exec's the final
program with stdin/stdout/stderr inherited, and the program's exit
code becomes the container exit code. main-hermes is now a no-op
`sleep infinity` slot kept for future supervised-gateway-container
modes. This trades "supervised restart of main hermes" for arg-
parity with the pre-s6 contract — main hermes was already unsupervised
under tini, so we lose nothing functional. Dashboard supervision is
the only new guarantee added by this phase.

Files added:
  docker/main-wrapper.sh           # arg routing + s6-setuidgid drop
  docker/stage2-hook.sh            # gosu-equivalent + chown + seed
  docker/s6-rc.d/main-hermes/{type,run,dependencies.d/base}
  docker/s6-rc.d/dashboard/{type,run,dependencies.d/base}
  docker/s6-rc.d/user/contents.d/{main-hermes,dashboard}

Files changed:
  Dockerfile: tini → s6-overlay install + ENTRYPOINT flip + service wiring
  docker/entrypoint.sh: thin shim to stage2-hook.sh for back-compat
  tests/docker/test_dashboard.py: add test_dashboard_restarts_after_crash

Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
2026-05-24 18:05:33 -07:00
Ben 51914b0514 feat(service_manager): add ServiceManager protocol + host wrappers
Phase 1 of the s6-overlay supervision plan. Pure-refactor addition:
introduces the abstract interface (with runtime_checkable Protocol),
detect_service_manager(), validate_profile_name(), and thin
SystemdServiceManager / LaunchdServiceManager / WindowsServiceManager
wrappers around the existing systemd_* / launchd_* / gateway_windows.*
module-level functions. No host call site was modified — host code
continues to use the existing functions directly; the protocol is for
new backend-agnostic code (Phase 4 profile create/delete hooks and the
Phase 4 s6 dispatch path in 'hermes gateway start/stop/restart').

WindowsServiceManager.install() forwards the v3 kwargs (start_now,
start_on_login, elevated_handoff) added in PRs #28169-adjacent so
non-Windows callers — there aren't any today — can opt in.

The s6 backend lands in Phase 3; until then get_service_manager()
raises a clear error if invoked on a host that detects as 's6'.
2026-05-24 18:05:14 -07:00
Ben b2168bf349 ci(docker): add hadolint + shellcheck for container build inputs
Phase 0.5 of the s6-overlay supervision plan. Catches Dockerfile and
shell-script regressions that the behavioral docker-publish smoke test
can't surface — unquoted variable expansions, silently-failing RUN
commands, missing apt-get clean, etc.

Both lint clean against the current (tini) Dockerfile + entrypoint.sh
at the configured thresholds (hadolint: warning, shellcheck: error).
Each ignore in .hadolint.yaml carries a one-line justification; the
shellcheck severity floor is documented in the workflow file.

Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
2026-05-24 18:05:14 -07:00
Ben 440147ebea test(docker): stabilize Phase 0 baseline harness
Two pre-existing baseline issues found while running the Phase 0 harness
against the tini image that need fixing before later phases can use the
harness as a behavior-parity oracle:

1. The autouse `_enforce_test_timeout` fixture in tests/conftest.py
   hard-coded a 30s SIGALRM, which preempted any `pytest.mark.timeout`
   marker (already honored by pytest-timeout). Honor the marker if
   present; fall back to 30s otherwise. Docker harness tests carry a
   180s marker applied at collection time in tests/docker/conftest.py.

2. test_dashboard_port_override polled via `ss -tlnp` / `netstat -tln`
   — neither is installed in the Hermes image, so the probe trivially
   failed even when the dashboard was bound. The dashboard also takes
   8-15s to bind on cold image; the 5s sleep was insufficient. Replace
   with a poll loop reading /proc/net/tcp directly (port 9120 = 0x23A0,
   state 0A = LISTEN). Bump probe deadline to 60s and switch
   test_dashboard_opt_in_starts to a similar poll for pgrep so we don't
   regress to the same race.

Result: 11 passed, 2 xfailed (Phase 4 target) on tini image. Harness
now ready to serve as Phase 2's behavior-parity oracle.
2026-05-24 18:05:14 -07:00
Ben a18f69eb55 test(docker): apply 180s timeout to docker harness tests
The agent-test suite default is 30s; docker test_no_args (the dashboard
spin-up, the container restart) routinely take 60-90s. Without this
they intermittently fail in CI with TimeoutError.
2026-05-24 18:05:14 -07:00
Ben 6e6acdea2a test(docker): lock baseline behavior for Phase 0 harness
Tasks 0.2-0.6 of the s6-overlay supervision plan. Locks the
user-visible behavior we must preserve through the Phase 2 init-
system swap:

- test_main_invocation.py (Task 0.2): docker run <image> with no
  args, chat subcommand passthrough, bare executable passthrough,
  bash pattern, exit-code propagation
- test_tui_passthrough.py (Task 0.3): TTY allocation via docker -t
  using the host's script(1) for a PTY
- test_dashboard.py (Task 0.4): HERMES_DASHBOARD=1 opt-in,
  HERMES_DASHBOARD_PORT override
- test_profile_gateway.py (Task 0.5): per-profile gateway
  start/stop and profile-delete-stops-gateway. Both marked
  xfail(strict=True) because the current tini image refuses
  gateway lifecycle commands inside the container; Phase 4
  Task 4.3 flips them to passing.
- test_zombie_reaping.py (Task 0.6): PID 1 reaps orphaned
  zombies. tini does this today; s6-overlay's /init must
  continue to.

Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
2026-05-24 18:05:14 -07:00
Ben 08302135b6 test(docker): add conftest fixtures for docker harness
Task 0.1 of the s6-overlay supervision plan. Establishes the test
infrastructure for tests/docker/: skip-on-missing-Docker collection
hook, session-scoped image-build fixture (overridable via the
HERMES_TEST_IMAGE env var for faster local iteration), and a
container_name fixture that ensures cleanup on test exit.

Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
2026-05-24 18:05:14 -07:00
Ben d36461d806 docs(plans): add s6-overlay supervision plan (v3)
Replace tini with s6-overlay as PID 1 in the Hermes Docker image so that
main hermes, the dashboard, and dynamically-created per-profile gateways
all run as supervised services. Includes container-boot reconciliation
(Task 4.0) so per-profile gateways survive docker restart.

Plan history:
- v1: 2026-05-07 — original design (subagent gateways scope)
- v2: 2026-05-18 — re-validated, scope narrowed to per-profile gateways,
  WindowsServiceManager added to protocol
- v3: 2026-05-21 — re-validated in docker_s6 worktree, install-method
  stamp preservation noted in Task 2.3, Task 4.0 added for container
  restart survival

12.5 engineering days estimated across 7 phases.
2026-05-24 18:05:14 -07:00
kshitijk4poor 00ec0b617c feat(tts): add register_tts_provider() plugin hook (closes #30398)
Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point
to the plugin context API, **alongside** the existing config-driven
`tts.providers.<name>: type: command` registry from PR #17843. This is
additive — the command-provider surface stays as the primary way to
add a TTS backend.

The hook covers cases the shell-template grammar can't reasonably
express:

- Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
- Streaming synthesis (chunked Opus → voice-bubble delivery)
- Voice metadata API for the `hermes tools` picker
- OAuth-refreshing auth flows

None of the 10 inline built-in providers (`edge`, `openai`,
`elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`,
`kittentts`, `neutts`) are migrated to plugins. They stay inline. The
hook is for *new* engines that aren't built-in.

## Resolution order

The dispatcher's resolution order is the load-bearing invariant:

1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.**
2. `tts.provider` matches `tts.providers.<name>` with `command:` set
   → command-provider dispatch (PR #17843).
3. `tts.provider` matches a plugin-registered `TTSProvider`
   → plugin dispatch (new).
4. No match → falls through to Edge TTS default (legacy behavior).

Built-ins-always-win is enforced at THREE layers:
- Registry: `register_provider()` rejects shadowing names with a warning.
- Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in
  names defensively before consulting the registry.
- Picker: `_plugin_tts_providers()` filters built-in shadows out of
  the `hermes tools` row list defensively.

Command-providers-win-over-plugins is enforced at TWO layers:
- The caller in `text_to_speech_tool` checks
  `_resolve_command_provider_config` first.
- `_dispatch_to_plugin_provider` re-checks for a same-name command
  config defensively so a refactor of the caller can't silently break
  the invariant.

## New files

- `agent/tts_provider.py` — `TTSProvider(ABC)` with `synthesize()` (required),
  `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`,
  `voice_compatible` (all optional with sane defaults). Mirrors
  `agent/image_gen_provider.py` shape.
- `agent/tts_registry.py` — `register_provider`/`get_provider`/`list_providers`
  with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors
  `agent/image_gen_registry.py` shape.
- `plugins/tts/...` directory ready for community plugins (none shipped).

## Modified files

- `hermes_cli/plugins.py` — `register_tts_provider()` method on
  `PluginContext`. Matches the gating shape of
  `register_image_gen_provider()` / `register_browser_provider()`.
- `tools/tts_tool.py` — `_dispatch_to_plugin_provider()` +
  `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into
  the main dispatcher. Built-in elif chain untouched.
- `hermes_cli/tools_config.py` — `_plugin_tts_providers()` injects
  plugin rows into the Text-to-Speech picker category alongside the
  10 hardcoded built-in rows.

## Tests

- `tests/agent/test_tts_registry.py` — 47 tests covering registration,
  lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression
  test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from
  `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to
  circular import constraints).
- `tests/tools/test_tts_plugin_dispatch.py` — 35 tests covering
  built-in-always-wins, command-wins-over-plugin, plugin dispatch,
  exception passthrough, voice_compatible helper.
- `tests/hermes_cli/test_tts_picker.py` — 10 tests covering the
  picker surface, builtin shadowing defense, integration with
  `_visible_providers`.
- `tests/hermes_cli/test_plugins_tts_registration.py` — 3 end-to-end
  tests via `PluginManager.discover_and_load()`.
- `tests/plugins/tts/check_parity_vs_main.py` — 9-scenario subprocess
  parity harness vs `origin/main`. The only intentional diff is
  `fallback_edge → plugin` for the `plugin-installed` scenario.

## Verification

- 95/95 new tests pass.
- 170/170 pre-existing TTS tests (test_tts_command_providers,
  test_tts_max_text_length, test_tts_speed, etc.) pass unchanged.
- Parity harness against `origin/main`: 8 OK + 1 expected DIFF.
- E2E smoke: a registered plugin's `synthesize()` is called via
  `text_to_speech_tool` with the standard JSON envelope returned.
- Ruff clean on all touched files.

## Docs

- `website/docs/user-guide/features/tts.md` — new "Python plugin
  providers" section with a decision table (command-provider vs
  plugin), minimal plugin example, and the optional-hook reference.
- `website/docs/user-guide/features/plugins.md` — TTS row updated to
  mention both surfaces (command-provider primary, plugin for
  SDK/streaming).

Closes #30398
2026-05-24 18:04:54 -07:00
Zyrix 782681f904 fix(google_chat): harden oauth credential persistence with atomic private writes (#24788) 2026-05-24 17:58:52 -07:00
teknium1 bf2f3b2469 chore(release): map vgocoder for PR #24758 salvage 2026-05-24 17:58:25 -07:00
vgocoder dcc163ee28 fix(security): redact credentials before persistence in session capture
Two-layer redaction at the persistence boundary so credentials never reach
state.db, session_*.json, or compression:

1. agent/chat_completion_helpers.py :: build_assistant_message
   - Redact assistant content before the message dict is constructed
     (catches PATs / API keys the model inlines into natural language)
   - Redact tool_call.function.arguments at the same site (catches secrets
     inlined into tool args, e.g. terminal command=curl -H 'Authorization: ...')
   Tool execution uses the raw API response object, not this dict, so
   redacting the persisted shape is safe.

2. run_agent.py :: _save_session_log
   - Add _redact_message_content() static helper that handles both string
     content and OpenAI/Anthropic multimodal list-of-parts (image parts
     pass through untouched, only text/content fields are redacted)
   - Apply to every message + the cached system prompt before writing
     session_*.json

Both layers respect HERMES_REDACT_SECRETS via redact_sensitive_text —
no-op when disabled.

Tests (TestSaveSessionLogRedactsSecrets, 4 cases):
  - api key in tool content
  - api key in user message
  - api key in system prompt
  - multimodal list-of-parts (image part preserved, text redacted)
Tests use an autouse fixture to force _REDACT_ENABLED=True because the
hermetic conftest defaults the env var to false.

Salvaged from PR #24758 by @vgocoder (build_assistant_message + session_log)
+ PR #19855 by @liuhao1024 (multimodal list helper, system_prompt redaction).
Kept only the redaction concern from #19855; its unrelated whatsapp npm
timeout + PATCH_SCHEMA changes are out of scope and dropped.

Refs #19798 (PAT leak via assistant inline mention), #19845 (session capture
credential leak).

Co-authored-by: liuhao1024 <liuhao03@bilibili.com>
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-24 17:58:25 -07:00
JunghwanNA 243ebc7a61 Protect dashboard OAuth credentials with the same file-safety guarantees as other auth paths
The web dashboard's Anthropic OAuth helper wrote the credential file
straight to its final destination and relied on the process umask for
permissions. That left the dashboard-specific path weaker than the
existing auth writers, which already use owner-only permissions and
safer write semantics.

This change keeps the scope narrow: make the dashboard helper write via
a temp file + replace, chmod the final file to owner-only, and add a
focused regression test for both permission handling and atomic-write
behavior.

Constraint: Must preserve the existing dashboard OAuth flow and credential-pool side effects
Rejected: Broader auth-storage refactor | unnecessary scope for a single verified inconsistency
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep dashboard credential writes aligned with existing auth storage semantics; do not reintroduce direct write_text() here without matching chmod/atomic behavior
Tested: pytest -o addopts='' tests/hermes_cli/test_web_server_oauth_write.py tests/hermes_cli/test_web_server.py -q (78 passed)
Not-tested: Cross-platform permission semantics on Windows-managed filesystems
2026-05-24 17:47:24 -07:00
teknium1 55987818b6 chore(release): map kronexoi for PR #30553 salvage 2026-05-24 17:47:24 -07:00
kronexoi 4694524dee fix(security): restrict write access to Anthropic OAuth credential store 2026-05-24 17:47:24 -07:00
Teknium be89c2e4fa ci(supply-chain): anchor install-hook regex at repo root (#31744)
The SETUP_HITS check matched any file ending in setup.py/setup.cfg/
sitecustomize.py/usercustomize.py at any path depth. This produced
false positives on every PR touching hermes_cli/setup.py (the CLI
setup wizard), which is unrelated to pip/site install hooks.

Only the top-level setup.py/setup.cfg execute during 'pip install',
and only top-level sitecustomize.py/usercustomize.py are auto-loaded
by site.py at interpreter startup. Anchor the regex with '^' so only
repo-root matches fire.

Symptom: PR #30916 (Mattermost plugin migration) flagged purely
because it deletes _setup_mattermost() from hermes_cli/setup.py.
Discord migration (#30591) hit the same false positive yesterday.
2026-05-24 17:46:08 -07:00
Guts 223a3971c0 fix(security): close TOCTOU window when saving Claude Code OAuth credentials (#21152)
_write_claude_code_credentials wrote ~/.claude/.credentials.json via
Path.write_text + replace + post-write chmod(0o600). Both the temp file
and the destination briefly inherited the process umask (commonly 0o644
= world-readable) between create/replace and chmod, exposing the OAuth
access/refresh tokens to other local users on multi-user hosts.

Use os.open with O_WRONLY|O_CREAT|O_EXCL and an explicit S_IRUSR|S_IWUSR
mode so the temp file is created atomically at 0o600. After os.replace,
the destination inherits the temp's mode, so the post-write chmod is no
longer needed. The temp name also gains a per-process random suffix to
avoid collisions between concurrent writers and stale leftovers from a
crashed prior write.

Parent dir (~/.claude/) is owned by Claude Code itself and shared with
its native auth, so we deliberately don't tighten its mode here (unlike
the mcp_oauth fix which owns its own subtree under HERMES_HOME).

Mirrors the fix shipped for agent/google_oauth.py in #19673 and the
parallel fix for tools/mcp_oauth.py in #21148.

Adds a regression test in TestWriteClaudeCodeCredentials asserting the
resulting file mode is 0o600 (skipped on Windows where POSIX mode bits
aren't enforced).
2026-05-24 17:45:12 -07:00
Hinotobi bba76f3dcd fix(file-safety): deny reads of Google OAuth tokens (#30972) 2026-05-24 17:45:03 -07:00
flamiinngo fa957c06cf fix(security): add missing credential paths to write denylist (#27217)
The write denylist already protects SSH keys, AWS, GPG, npm, PyPI,
Docker, Azure, and GitHub CLI credentials. Two common credential
stores were missing:

~/.git-credentials stores plaintext git tokens in the format
https://username:token@github.com when using git credential-store.
It is directly analogous to ~/.netrc which was already protected.

~/.config/gcloud/ contains Google Cloud OAuth tokens and service
account credentials. It is directly analogous to ~/.aws/ which
was already protected.

Under prompt injection, an agent could be instructed to overwrite
these files, destroying credentials or planting malicious ones.

Verified before and after with is_write_denied() on both paths.
2026-05-24 17:44:53 -07:00
Teknium 9c08070703 test(cli): update resume usage-hint assertion for numbered selection
PR #9020's salvage changed the /resume list footer from
'Use /resume <session id or title> to continue.' to
'Use /resume <number>, /resume <session id>, or /resume <session title> to continue.\n  Example: /resume 2'.

test_resume_without_target_lists_recent_sessions still pinned the old
string verbatim and failed in CI. Relax to substring assertions that
allow both the new numbered footer and any future tweaks while still
verifying the hint is shown.
2026-05-24 16:22:48 -07:00
Teknium c043c86bd7 i18n+tests: add list_item_numbered, list_footer_numbered, out_of_range for 15 locales
The numbered /resume feature added new i18n keys to en.yaml; the catalog parity
tests require every locale to carry matching keys and placeholders, so add
translations to all 15 supported locales.

Also unblock tests/cli/test_cli_resume_command.py:
- _make_cli stub now sets self.resume_display = 'minimal' since
  _handle_resume_command (post-#31695) calls _display_resumed_history.
- mock_db.resolve_resume_session_id returns the input id (no compression
  chain) so HERMES_SESSION_ID is set to a real string, not a MagicMock.
2026-05-24 16:22:48 -07:00
Teknium 87580076fd chore(release): map 490408354@qq.com to daizhonggeng (PR #9020) 2026-05-24 16:22:48 -07:00
daizhonggeng fef733d56b feat: support numbered resume selection in cli and gateway 2026-05-24 16:22:48 -07:00
AhmetArif0 4f4e337c47 fix(file-safety): write-deny pairing/ directory to prevent approved-list injection
The gateway pairing directory (~/.hermes/pairing/) stores per-platform
access-control files (telegram-approved.json, discord-approved.json, etc.).
A prompt-injected agent using write_file could add arbitrary user IDs to an
approved file, granting persistent gateway access without going through the
pairing code flow — the same threat class that motivated protecting
webhook_subscriptions.json (#14157).

The pairing directory was not included in the original control-plane protection
because it postdates PR #14157. PR #30383 introduced the hashed-pending schema
and made the approved files the sole source of truth for gateway access, raising
the security sensitivity of the directory.

Apply the same mcp-tokens pattern: block writes to pairing/ and any path within
it, under both the active hermes_home and the root path (for profile-mode parity
with the fix in #30382).

Regression tests verify denial for pairing/telegram-approved.json,
pairing/discord-pending.json, and the directory itself, in both normal and
profile-mode layouts.
2026-05-24 16:15:33 -07:00
LeonSGP43 6c44d537cc fix(cli): show full session titles in /resume list 2026-05-24 16:13:23 -07:00
Teknium 8e68426981 fix(cli): add inline --yes/now skip for destructive slash commands (#30768)
Issue #30768 reports that on native Windows PowerShell the destructive-slash
confirmation modal renders but never registers keypresses, leaving the user
unable to confirm or cancel /reset, /new, /clear, or /undo. The modal works
on macOS, Linux, and WSL; PR #23907 (merged May 11) replaced the
daemon-thread input() pattern with a prompt_toolkit-native keybinding modal
but the win32 input pipeline apparently doesn't dispatch keys to the
filter-conditioned handlers. The modal investigation is ongoing.

This change ships the immediate escape hatch: append `now`, `--yes`, or `-y`
to any destructive slash command to bypass the modal and run the action
immediately. Works on every platform without touching the broken Windows
code path.

  /reset now            -> reset, no modal
  /new --yes my-session -> new session titled "my-session", no modal
  /clear -y             -> clear, no modal
  /undo -y              -> undo, no modal

The default behavior (modal prompts when approvals.destructive_slash_confirm
is True) is unchanged for users who don't pass a skip token.

Implementation:

- New classmethod HermesCLI._split_destructive_skip(text) -> (remainder, skip)
  parses a destructive-slash command string, strips the leading "/cmd" word
  and any recognized skip tokens (case-insensitive exact match, not substring),
  and reports whether a skip was requested.
- HermesCLI._confirm_destructive_slash gains an optional cmd_original= arg.
  When the arg contains a skip token, it returns "once" immediately —
  before the gate check and before any modal rendering.
- The /clear, /new, /undo handlers in process_command pass cmd_original
  through. /new additionally uses _split_destructive_skip to strip skip
  tokens from the remaining text before deriving the session title, so
  "/new now My Session" yields title="My Session" (not "now My Session").

Tests:

- 7 new unit tests in tests/cli/test_destructive_slash_confirm.py covering
  the helper (recognized tokens, command-word stripping, case-insensitive
  exact match, None/empty input) and the modal bypass (now and --yes both
  skip; no-skip-token still consults the modal).
- 3 new integration tests in tests/cli/test_destructive_slash_inline_skip_e2e.py
  driving HermesCLI.process_command end-to-end and asserting (a) new_session
  is invoked, (b) the modal is never reached, (c) the skip token does not
  leak into the session title, and (d) the no-skip-token path still reaches
  the modal as a sanity check that we haven't accidentally short-circuited
  the normal flow.

All 31 tests across the destructive-slash test surface pass.

Docs:

- website/docs/reference/slash-commands.md documents the new flags both in
  the destructive-commands table and the dedicated approval section, with a
  link back to issue #30768 explaining why the escape hatch exists.
2026-05-24 16:13:03 -07:00
teknium1 99a7ecc335 chore(release): map leeseoki0 for PR #31315 salvage 2026-05-24 15:48:58 -07:00
leeseoki0 ce529d6072 fix(kanban): scratch tasks must not inherit board.default_workdir (#28818)
Board defaults represent persistent project checkouts. Scratch workspaces
are auto-deleted on completion and must stay under the per-board scratch
root that resolve_workspace() creates. Inheriting default_workdir for a
scratch task pointed the cleanup path at the user's source tree — the
data-loss vector documented in #28818.

The containment guard in _cleanup_workspace (just added) is the safety
rail. This commit prevents the bad state from being created in the first
place: only persistent kinds (dir/worktree) inherit board defaults.

Tests updated to cover the new semantics: scratch with default_workdir
set keeps workspace_path=None; dir/worktree still inherits the board
default.

Salvaged from PR #31315 by @leeseoki0 — prevention layer on top of the
#28819 containment fix by @briandevans.

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-24 15:48:58 -07:00
briandevans 23115b5c0f fix(kanban): restrict managed-scratch roots to workspaces/ dirs only
Copilot review on PR #28819 flagged that `_is_managed_scratch_path` accepted
the entire `<kanban_home>/kanban` subtree as managed scratch storage. With
that, a task whose `workspace_kind='scratch'` and `workspace_path` was
mis-set to `<kanban_home>/kanban`, `.../kanban/logs`, or a board's
metadata directory (e.g. `.../kanban/boards/<slug>` without the
`workspaces/` child) would pass the containment guard and let task
completion `shutil.rmtree` Hermes' own DB, metadata, and log subtrees.

Tighten the guard:

* Allowed roots are now exclusively `workspaces/` directories — the
  `HERMES_KANBAN_WORKSPACES_ROOT` override, `<kanban_home>/kanban/workspaces`,
  and each `<kanban_home>/kanban/boards/<slug>/workspaces` discovered on
  disk.
* Require strict descendancy: a path equal to a root itself is rejected
  too, because deleting a workspaces root would wipe every task's scratch
  dir at once.

Add a regression test covering the three Copilot-named attack paths
(kanban root, kanban/logs, board root without `workspaces/`) plus the
workspaces-root-itself case, and confirm the inner task-id dir still
matches.
2026-05-24 15:48:58 -07:00
briandevans 80ad1609c8 fix(kanban): refuse to rmtree workspace_path outside managed scratch root (#28818)
A board's ``default_workdir`` (e.g. ``hermes kanban boards
set-default-workdir my-board /path/to/real/source``) is copied into
``tasks.workspace_path`` for tasks created without an explicit
``workspace_kind``. Those tasks default to ``workspace_kind='scratch'``,
so completion calls ``_cleanup_workspace`` and unconditionally runs
``shutil.rmtree(wp, ignore_errors=True)`` — deleting the user's real
source tree as if it were disposable scratch storage.

Add ``_is_managed_scratch_path()`` and gate ``_cleanup_workspace`` on
it: only delete paths under ``HERMES_KANBAN_WORKSPACES_ROOT`` (the
worker-side override the dispatcher injects) or under the active kanban
home's ``kanban/`` subtree (covering both the legacy default-board root
and per-board ``kanban/boards/<slug>/workspaces`` roots). Anything else
gets a warning log and is left alone, so a misconfigured
``default_workdir`` can no longer destroy user data on task completion.
2026-05-24 15:48:58 -07:00
Teknium 396ee69032 fix(gateway): seed plugin extras before is_connected gate (#31703)
Follow-up to 54e61f933. The plugin enablement gate calls
``entry.is_connected(probe_cfg)`` BEFORE ``env_enablement_fn`` runs,
and the probe is built as ``existing_cfg or PlatformConfig()`` — empty
extras, ``enabled=False``.

For plugins whose ``is_connected`` reads ``config.extra`` instead
of env vars directly, that probe is a misrepresentation of what the
platform will look like after enablement. Google Chat's
``_is_connected`` short-circuits on ``config.enabled`` and inspects
``config.extra["project_id"]`` / ``config.extra["subscription_name"]``
— both False on the default probe even when the user has set
``GOOGLE_CHAT_PROJECT_ID`` and ``GOOGLE_CHAT_SUBSCRIPTION_NAME``. Result:
Google Chat silently fails the gate on every env-var-only setup.

Build a candidate probe that mirrors what the platform will look like
post-enablement:
- pre-call ``env_enablement_fn`` and layer its result into the probe's
  ``extra`` (without mutating any existing platform config)
- pass ``enabled=True`` on the probe — we're asking "would this BE
  configured if we let it in?" not "is it currently enabled?"
- reuse the same seeded extras when we commit the platform to
  ``config.platforms`` (avoids calling ``env_enablement_fn`` twice)

Discord/IRC/Teams/LINE/ntfy/Simplex ``_is_connected`` hooks read env
vars directly, so they are unaffected. This change only restores
Google Chat on env-var-only setups while keeping the original #31116
Discord-no-token block intact.

All 6 shipped ``env_enablement_fn`` implementations were audited and
are pure reads (no ``os.environ`` writes), so running them earlier in
the loop has no observable side effects.

Tests: 2 new in tests/gateway/test_platform_registry.py covering
extras-seeded-before-is_connected and don't-leak-extras-on-gate-fail.
693 tests across 11 adjacent suites pass (platform_registry, config,
google_chat, matrix, discord_connect, ntfy_plugin, simplex_plugin,
line_plugin, irc_adapter, teams, gateway_platform_gating).

Refs #31116.
2026-05-24 15:44:26 -07:00
helix4u 514f5020c7 fix(debug): redact BlueBubbles webhook secrets 2026-05-24 15:43:48 -07:00
Teknium 13b85bc646 feat(config): document resume-recap tuning keys in DEFAULT_CONFIG
The hardcoded constants in _display_resumed_history were exposed as
config in PR #4434; declare them in DEFAULT_CONFIG and the CLI fallback
dict so they show up in 'hermes config' diagnostics and the schema
validator.
2026-05-24 15:36:37 -07:00
Teknium 5dc10ec3ba test(cli): reconcile resume-recap tests with skip-tool-only default and compression-chain helper
- test_tool_calls_shown_as_summary: explicitly disable resume_skip_tool_only
  (#4434 made True the default; the legacy assertion relied on tool-only
  entries being rendered as a summary).
- test_tool_only_message_skipped_by_default: add coverage for the new
  default skip behavior.
- test_resume_command_*: mock_db.resolve_resume_session_id now returns the
  same id (no compression chain) so the post-#15000 redirect block doesn't
  shove a MagicMock into HERMES_SESSION_ID.
2026-05-24 15:36:37 -07:00
Teknium 27c4ba98c3 chore(release): map zhangsamuel12@gmail.com to SamuelZ12 (PR #7480) 2026-05-24 15:36:37 -07:00
ygd58 cdf4876bfe fix(cli): skip tool-call-only entries in resume recap, expose limits as config options 2026-05-24 15:36:37 -07:00
Samuel Zhang 961e34a1d3 fix: show recap after in-session resume 2026-05-24 15:36:37 -07:00
Teknium 16eed4f91b test(telegram): add brand-new-topic regression for #31086
The cherry-picked fix from #28605 inverts an existing test (an unknown
non-lobby thread_id no longer rewrites to the most-recent binding), but
that test only seeds two bindings and queries a third thread_id. Add a
second regression test that more closely mirrors the live failure mode:
seed exactly one prior binding, then query a brand-new thread_id and
assert recovery returns None — so the new topic is allowed to get its
own session row instead of being silently merged into the previous
topic's session.

Co-authored-by: Fábio Siqueira <fabioxxx@gmail.com>
Co-authored-by: dillweed <dillweed@users.noreply.github.com>
2026-05-24 15:28:40 -07:00
Maxim Esipov bdc9b0eff5 fix(telegram): preserve new DM topic lanes 2026-05-24 15:28:40 -07:00
Teknium eea9553a9c fix(anthropic): skip mcp_ prefix on outgoing tool schemas when already prefixed
Companion to the GH-25255 incoming-strip fix from @hayka-pacha. Without
this, build_anthropic_kwargs unconditionally added 'mcp_' to every tool
name in step 3, so a native MCP server tool registered as
'mcp_composio_X' was sent as 'mcp_mcp_composio_X' on the wire. The
incoming strip only removes ONE prefix, which still worked on first
call, but on subsequent calls the model pattern-matched the
single-prefixed form from message history and produced names that
stripped to 'composio_X' — registry miss, dispatch fail.

The history-rewrite block (#4) already has this guard. Apply the same
guard to the schema-rewrite block (#3) so round-trip is symmetric.

Added 4 outgoing-side tests. Existing 7 incoming-side tests still pass.

Author map: hayka-pacha added for PR #25270 salvage attribution.

Refs GH-25255.
2026-05-24 15:27:45 -07:00
HKPA 2f91a8406c fix(agent): only strip mcp_ prefix for OAuth-injected tools (GH-25255)
When strip_tool_prefix=True (Anthropic OAuth path), normalize_response
unconditionally stripped the mcp_ prefix from ALL tool names starting
with mcp_. This broke Hermes-native MCP server tools (registered under
their full mcp_<server>_<tool> name in the registry) because the stripped
name doesn't match any registry entry.

Fix: check the tool registry before stripping. Only strip when:
- The stripped name EXISTS in the registry (OAuth-injected tool)
- The full name does NOT exist in the registry

This preserves backward compatibility for OAuth-injected tools while
protecting native MCP server tools from incorrect prefix removal.

7 new tests covering: OAuth strip, native preserve, no-flag, non-mcp,
unknown tools, mixed responses, and dual-registration edge case.

Signed-off-by: HKPA <hayka-pacha@users.noreply.github.com>
2026-05-24 15:27:45 -07:00
Yuan Li 476c897439 fix(telegram): gate send() on send-path health after reconnect storms (#31165)
After sustained Bad Gateway / TimedOut reconnect cycles, the PTB httpx
client can enter a state where bot.send_message() returns a valid
Message (real message_id) but the message never reaches the recipient.
TelegramAdapter.send returns SendResult(success=True) and cron's
live-adapter branch marks the run delivered while the message is
silently dropped.

Add a _send_path_degraded flag. _handle_polling_network_error sets it
on reconnect storms; the existing _verify_polling_after_reconnect
heartbeat probe clears it once getMe() confirms the Bot client is
healthy. While the flag is set, send() short-circuits with
SendResult(success=False, retryable=True) so cron falls through to
the standalone delivery path (fresh HTTP session).

Closes #31165.

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-24 15:27:41 -07:00
Teknium 54e61f9331 fix(matrix,gateway): Matrix E2EE installs full dep set; plugins respect is_connected
Fixes #31116 — two distinct bugs in fresh-install Matrix gateway:

1. Matrix E2EE setup installed only mautrix[encryption], leaving asyncpg
   / aiosqlite / Markdown / aiohttp-socks uninstalled. The first encrypted
   connect failed with 'No module named asyncpg' deep inside
   MatrixAdapter.connect(). Root cause: the setup wizard hand-rolled a
   pip install of one package instead of using lazy_deps.ensure(
   'platform.matrix'), and check_matrix_requirements() short-circuited the
   runtime installer on 'import mautrix' alone — so the other 4 packages
   were never pulled in.

2. Discord auto-enabled itself on every gateway start, even when the user
   never selected Discord and had no DISCORD_BOT_TOKEN. Root cause:
   gateway/config.py plugin-enablement loop gated enablement on
   entry.check_fn() (just 'is the SDK importable?') and ignored
   entry.is_connected (the 'did the user configure credentials?' probe).
   Same bug class as commit 7849a3d73 fixed for _platform_status in the
   setup wizard; this is the runtime counterpart. Affects Discord, Teams,
   and Google Chat.

Changes:
- hermes_cli/setup.py::_setup_matrix — install via
  lazy_deps.ensure('platform.matrix') to pull the full feature group.
- gateway/platforms/matrix.py::_check_e2ee_deps — verify asyncpg +
  aiosqlite + PgCryptoStore in addition to OlmMachine, so E2EE failures
  surface at startup instead of at first encrypted-room connect.
- gateway/platforms/matrix.py::check_matrix_requirements — use
  feature_missing('platform.matrix') as the install gate instead of a
  single 'import mautrix' check, so partial installs trigger the lazy
  installer correctly.
- gateway/config.py plugin-enablement loop — consult entry.is_connected
  before flipping enabled=True. Explicit YAML enabled=true still wins.

Tests: 3 new in tests/gateway/test_matrix.py (asyncpg-required,
aiosqlite-required, partial-install lazy-runs), 5 new in
tests/gateway/test_platform_registry.py (is_connected=False blocks,
is_connected=True enables, is_connected=None falls back to check_fn,
raising probe doesn't enable, explicit YAML wins).

Validation: 310 tests across affected test modules pass.
2026-05-24 15:16:03 -07:00
teknium1 88834baf50 chore: map soju06@users.noreply.github.com for PR #26054 salvage 2026-05-24 15:15:37 -07:00
Soju 6212e9ade8 fix(error-classifier): treat 5xx request-validation errors as non-retryable
Standard OpenAI returns request-validation failures (unknown/
unsupported parameter, malformed request) as 4xx. Some
OpenAI-compatible gateways return them as 5xx instead — codex.nekos.me
returns 502 for an unknown parameter.

The generic '5xx -> retryable server_error' rule then misfires: the
error is deterministic (every retry gets the identical rejection), so
the retry loop burns all 3 attempts, the transport-recovery path
resets the counter and burns 3 more, and the result is a request
flood against a request that can never succeed.

Fix: when a 500/502 body carries an unambiguous request-validation
signal — 'unknown parameter' / 'unsupported parameter' /
'invalid_request_error' in the message text, or invalid_request_error
/ unknown_parameter / unsupported_parameter as the structured error
code — classify as a non-retryable format_error so the loop fails
fast and falls back. Genuine 502 Bad Gateway with no such signal
stays retryable as before.

Origin: local-author
Upstream-PR: none
Patch-State: local-only
2026-05-24 15:15:37 -07:00
Soju 775a17284f fix(transport): strip Hermes-internal scaffolding keys before chat.completions
The empty-response recovery path in run_agent.py appends synthetic
messages tagged with _empty_recovery_synthetic (and the agent loop uses
_thinking_prefill / _empty_terminal_sentinel similarly). These are
internal bookkeeping markers — they must never reach the wire.

chat_completions' convert_messages only stripped Codex Responses leak
fields (codex_reasoning_items, call_id, etc.), not these _-prefixed
markers. Permissive providers (real OpenAI, Anthropic) silently ignore
unknown message keys so the bug stayed hidden, but strict
OpenAI-compatible gateways reject them outright. Observed against
codex.nekos.me:

  502: [ObjectParam] [input[617]._empty_recovery_synthetic]
       [unknown_parameter] Unknown parameter:
       '_empty_recovery_synthetic'

Because the synthetic messages persist in the session, every
subsequent request in that session carries the poisoned key and
fails identically — a deterministic 502 the retry loop mistakes for
a transient server error.

Fix: convert_messages now drops any top-level message key starting
with '_'. OpenAI's message schema has no '_'-prefixed fields, so this
is safe and future-proofs against new internal markers.

Origin: local-author
Upstream-PR: none
Patch-State: local-only
2026-05-24 15:15:37 -07:00
Teknium 7ab1677362 feat(security): on-demand supply-chain audit via OSV.dev (#31460)
Adds 'hermes security audit' — a one-shot vulnerability scan against
OSV.dev covering three surfaces a Hermes user actually controls:

  1. The running Python's installed PyPI dists (importlib.metadata)
  2. Plugin requirements.txt / pyproject.toml pins under ~/.hermes/plugins/
  3. Pinned npx/uvx MCP servers in config.yaml

Zero new dependencies (stdlib urllib + importlib.metadata + tomllib +
concurrent.futures). No auth required for OSV's public batch API.

Flags: --json, --fail-on {low,moderate,high,critical} (default: critical),
       --skip-venv, --skip-plugins, --skip-mcp

Output groups findings by source, sorts by severity descending, surfaces
fixed-versions inline. Exit 1 when any finding meets the --fail-on tier.

Deliberately out of scope: globally-installed pip/npm, editor/browser
extensions, daily background scans, auto-blocking of installs. The audit
is on-demand by design — daily scans become noise the user trains
themselves to ignore.
2026-05-24 15:15:16 -07:00
Teknium 8065e70274 fix(agent): abort on HTTP 402 after pool rotation and fallback fail (#31443)
Closes #31273.

HTTP 402 (insufficient credits) was retried up to agent.api_max_retries
times (default 3), burning paid requests against an exhausted balance.
Real-world impact: ~$40 in 48h on a 24/7 Telegram+Discord gateway.

Root cause: FailoverReason.billing was in the is_client_error
exclusion set in agent/conversation_loop.py, which prevents the
non-retryable-abort branch from firing.

By the time control reaches that predicate:
  * credential-pool rotation has already run for billing and either
    continued the loop or returned False (pool exhausted/absent)
  * the eager-fallback branch has also fired on billing and either
    continued the loop or fell through (no fallback configured)

Falling through to the backoff retry from here has no recovery
mechanism left — it just burns more paid requests.  Removing billing
from the exclusion set makes 402 abort cleanly once pool+fallback
recovery has failed, mirroring how 401/403 (also should_fallback=True)
already behave.

Added tests/run_agent/test_31273_402_not_retried.py which mirrors the
is_client_error predicate shape from the source and asserts the
invariant (plus a source-inspection guard against accidental
re-introduction).
2026-05-24 15:14:13 -07:00
teknium1 5b52e26d18 fix(gateway): swallow transient Telegram TimedOut at loop level
Closes #31066. Closes #31110.

An unhandled `telegram.error.TimedOut` (or peer `NetworkError` /
`httpx` connection error) propagating to the asyncio event loop killed
the entire gateway process, taking down every profile attached to the
same runner. systemd restarted the service after ~5s but the active
conversation turn was lost.

Public adapter methods (`adapter.send`, `adapter.edit_message`,
`adapter.send_voice`, …) are individually try/except-wrapped on
current main, but at least one async path was reaching the loop with
TimedOut unhandled — the report's traceback ends at the deepest httpx
frame and doesn't pinpoint the caller.

Rather than audit 30+ call sites blind, install a loop-level safety net:
`_gateway_loop_exception_handler` is set as the loop's exception handler
in `start_gateway()` after `asyncio.get_running_loop()`. It classifies
the exception via `_is_transient_network_error()` (walks the
__cause__/__context__ chain, matches on class name so the test suite
doesn't need the real telegram/httpx packages installed). Transient
errors are logged at WARNING with full traceback so the originating
call site stays diagnosable; everything else forwards to
`loop.default_exception_handler` so real bugs still surface.

Tests cover the classifier (known transients accepted, real bugs
rejected, cause/context chain unwrap, cyclic-cause termination) and the
handler (swallow + log warning, forward unknowns, missing-exception
context). One end-to-end test schedules an orphan task raising TimedOut
and asserts `asyncio.run` returns cleanly.
2026-05-24 15:03:27 -07:00
Teknium 3d66787a04 fix(vision): route auxiliary.vision.provider=openai to api.openai.com, skip text-only main (#31452)
* fix(vision): route auxiliary.vision.provider=openai to api.openai.com, skip text-only main for vision

Fixes #31179. Three coupled fixes so a configured aux vision backend
actually serves vision tasks instead of silently routing images to the
user's main provider:

1. agent/auxiliary_client.py: `auxiliary.<task>.provider: openai` resolves
   to `custom` + `https://api.openai.com/v1`. "openai" was not in
   PROVIDER_REGISTRY (we have `openai-codex` for OAuth and `custom` for
   manual base_url), so the obvious config name silently failed to build a
   client. User-supplied base_url is still preserved; only the provider
   name normalises to `custom` so resolution doesn't hit the
   PROVIDER_REGISTRY-only path.

2. agent/auxiliary_client.py: the vision auto-detect chain now skips the
   user's main provider when models.dev reports `supports_vision=False`.
   Without this guard, a misconfigured aux provider would fall back to
   `auto`, which happily returned the main-provider client. The caller
   would then send image content to e.g. api.deepseek.com with model
   `gpt-4o-mini` and get a cryptic `unknown variant 'image_url',
   expected 'text'` from the provider's parser.

3. tools/vision_tools.py + tools/browser_tool.py: `check_vision_requirements`
   now mirrors the runtime fallback chain (explicit provider, then auto),
   so `vision_analyze` shows up whenever vision is actually serviceable.
   `browser_vision` gets a new `check_browser_vision_requirements` check_fn
   that AND-gates browser + vision availability, so it doesn't get
   advertised to the model when the call would fail at runtime.

Reproduction (config from the bug report):
  model.provider: deepseek
  model.default: deepseek-v4-pro
  auxiliary.vision.provider: openai
  auxiliary.vision.model: gpt-4o-mini

Before: resolve_vision_provider_client() returns None for the explicit
provider, fallback auto returns the deepseek client with model='gpt-4o-mini',
image hits api.deepseek.com → 'unknown variant image_url'. vision_analyze
hidden from tool list; browser_vision exposed but fails at call time.

After: resolves to custom + api.openai.com/v1 with model gpt-4o-mini.
vision_analyze and browser_vision both gate correctly on capability.

Tests: tests/agent/test_vision_routing_31179.py covers all three fixes
(12 cases including the user's exact scenario, base_url preservation,
text-only-main skip, capability-unknown permissive fallback, and tool
gating parity). Existing 382 tests across auxiliary/vision/image_routing
suites still pass.

* test(vision): use exact hostname check to silence CodeQL substring-sanitization alert

* fix(auxiliary): drop model name from vision-skip debug log to silence CodeQL

The new `logger.debug(...)` added in the previous commit interpolated
both `main_provider` and `vision_model` (a public model slug \u2014 not
sensitive). CodeQL's `py/clear-text-logging-sensitive-data` heuristic
re-flagged it twice because the rule mis-detects multi-value
interpolations near tainted-via-config provider strings.

Drop the model from the log args (provider alone is enough to diagnose
the skip; the same sibling branch a few lines up already logs provider
only). Behavior unchanged; CodeQL false positive cleared.
2026-05-24 15:01:28 -07:00
Hinotoi Agent d9ec90585c test(dashboard): send loopback headers for WebSocket sidecar test 2026-05-24 15:00:44 -07:00
hinotoi-agent 2e66eefbc3 fix(dashboard): validate WebSocket Host and Origin 2026-05-24 15:00:44 -07:00
Teknium 186bf25cb1 test(guardrail): assert halt message reaches stream_delta_callback
Regression guard for #30770 — verifies the guardrail-halt branch in
agent/conversation_loop.py pushes the synthesized halt message through
stream_delta_callback before breaking out of the loop.  Without the
emit, chat-completions SSE writers drain an empty queue and clients
(Open WebUI, etc.) see a finish chunk with zero content delta —
indistinguishable from a crash.

Verified: the test fails when the production fix is reverted.
2026-05-24 07:38:24 -07:00
annguyenNous 38b8d0da85 fix: emit guardrail halt message to client before closing stream
When the tool loop guardrail fires (max_tool_failures, etc.), the
turn exits with guardrail_halt but no final assistant message was
emitted to the client. The SSE stream closed silently —
indistinguishable from a crash.

The stream_delta_callback(None) before tool execution is a display
flush, not a hard close. After generating the halt response, emit
it through both _safe_print (CLI) and stream_delta_callback (SSE)
so clients see the explanation.

Fixes #30770
2026-05-24 07:38:24 -07:00
Teknium 889903f0fa fix(tests): align CI tests with recent security hardening (#31470)
Four recent security PRs landed on main with stale/missing test updates,
breaking 4 test shards on every subsequent PR's CI run:

- test_discord_bot_auth_bypass.py (PR #30742 c3caca658):
  DISCORD_ALLOWED_ROLES no longer bypasses _is_user_authorized.
  Inverted 3 tests to assert the new (correct) behavior: role config
  alone does NOT authorize at the gateway layer.

- test_msgraph_webhook.py (PR #30169 4ca77f105):
  adapter.is_connected is a @property, not a method. Test was calling
  it with () after the connect() change; TypeError: 'bool' is not
  callable. Removed the parens.

- test_feishu_approval_buttons.py (PR #30744 bdb97b857):
  Card-action callbacks now go through _allow_group_message
  authorization. 3 tests in TestCardActionCallbackResponse didn't
  populate adapter._allowed_group_users so the operator's open_id got
  rejected. Added the allowlist setup to each test, matching the
  existing pattern in test_returns_card_for_approve_action.

Also raise tolerance on test_wait_for_process_kills_subprocess_on_keyboardinterrupt:
the SIGTERM → 3s TimeoutStopSec → SIGKILL → reap chain can exceed 10s
under loaded xdist (40 workers). Bumped _wait_for_pgid_exit timeout
10→30s and worker join timeout 5→15s. Passes 100% in isolation
already; this just makes it tolerant of CI-host load.

Validation: 270/270 tests pass across the 5 affected files.
2026-05-24 06:54:16 -07:00
Hinotoi-agent 3bace071bf fix(state): restrict sensitive store file permissions
response_store.db (api server) holds conversation history including tool
payloads, prompts, and results. webhook_subscriptions.json holds per-route
HMAC secrets. Under a permissive umask (e.g. 0o022, default on most
distros) both files were created mode 0o644 — readable by other local
users on shared boxes.

- gateway/platforms/api_server.py: ResponseStore tightens itself + WAL/SHM
  sidecars to 0o600 after __init__, then trusts the inode. (Original
  contributor patch chmod'd after every _commit() — wasteful on a hot
  api_server path; chmod-on-create is sufficient since SQLite preserves
  mode bits across writes.)

- hermes_cli/webhook.py: _save_subscriptions writes via tempfile.mkstemp
  (which itself creates the file with 0o600), chmods the temp before the
  atomic rename, and re-asserts 0o600 on the destination so an existing
  permissive file from before this fix gets narrowed.

Tests cover (a) creation under permissive umask leaves 0o600 and (b) an
existing 0o644 webhook_subscriptions.json gets narrowed on next save.
Tests guarded with skipif os.name=='nt' since POSIX mode bits don't apply
on Windows.

Salvaged from PR #30917 by @Hinotoi-agent. Reworked the api_server.py
side from chmod-on-every-commit to chmod-on-create.

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-24 04:55:18 -07:00
m0n3r0 f378f00bfb fix(feishu): validate verification token before reflecting url_verification challenge
When FEISHU_VERIFICATION_TOKEN is configured, an unauthenticated remote
could previously prove endpoint control by sending a url_verification
payload with any attacker-controlled challenge string — the handler
reflected the challenge BEFORE running the token check.

Move the verification_token check ahead of the url_verification echo so
the challenge response is gated on a valid token. Add a regression test
covering the wrong-token case. Also fix the stale
test_connect_webhook_mode_starts_local_server fixture to set
FEISHU_VERIFICATION_TOKEN (post #30746 webhook mode requires a secret).

Salvaged from PR #29663 by @m0n3r0 — kept the url_verification reorder
and its regression test; dropped the host-conditional weakening of the
#30746 secret guard (we want webhook secrets required regardless of
bind host, not only on 0.0.0.0/::).

Docs updated to call out the gating.

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-24 04:51:19 -07:00
teknium1 5e6749fbf3 chore(release): map m0n3r0 for PR #29629 salvage 2026-05-24 04:47:45 -07:00
teknium1 15aa6884a2 fix(webhook): use 403 not 500 for missing-secret rejection
Operator misconfiguration is a client/setup error, not an internal server
exception. 403 "forbidden" more accurately reflects "this route refuses
to authenticate" than 500 "internal server error" — the latter triggers
incident alerting on operator monitoring and conflates real bugs with
config drift.

Follow-up tweak to PR #29629 by @m0n3r0.
2026-05-24 04:47:45 -07:00
m0n3r0 dbf73e90fa fix: fail closed for webhook routes without secrets
Reject unsigned webhook requests when a route has no effective HMAC secret, even if the request handler is reached without the normal connect-time validation. Add regression coverage for the direct-handler path.
2026-05-24 04:47:45 -07:00
BaxBit bbf02c3224 fix(gateway): validate Svix webhook signatures (#30200) 2026-05-24 04:45:13 -07:00
Jiaming Guo ee002e7fc5 fix(dashboard): require auth for plugin rescan (#27340) 2026-05-24 04:45:07 -07:00
Teknium 5acaeba2bb fix(mcp): raise ImportError instead of NameError when stdio SDK missing (#31450)
When the 'mcp' Python SDK isn't installed, _run_stdio leaked a bare
'NameError: name StdioServerParameters is not defined' because the
top-level 'from mcp import ...' fails inside try/except ImportError,
leaving the names unbound at module scope.

Mirror the _MCP_HTTP_AVAILABLE gate that _run_http already had: raise
a clear ImportError with install instructions instead.

Fixes #30904
2026-05-24 04:44:59 -07:00
xxxigm 6cafcf9c77 test(streaming): pin partial-stream-stub finish_reason + continuation contract
Three test classes lock in the #30963 fix:

1. TestPartialStreamStubFinishReason — drives _interruptible_streaming_api_call
   through the two recovery branches and asserts:
     - text-only partial → finish_reason="length" (the new behaviour),
     - mid-tool-call partial → finish_reason="stop" (unchanged on purpose).

2. TestLengthContinuationPromptBranching — pure-Python check on the branch
   that picks the continuation prompt by response.id. Locks the network
   error wording for partial-stream-stub vs. the output-length wording
   for everything else.

3. TestConversationLoopPartialStreamContinuation — feeds a stub +
   continuation pair into run_conversation, verifies the loop makes a
   second API call (instead of exiting with text_response(stop)),
   confirms the network-error continuation prompt actually reaches the
   model on call #2, and that final_response stitches both halves.

Refs: NousResearch/hermes-agent#30963
2026-05-24 04:35:15 -07:00
xxxigm 20b3703a42 fix(conversation-loop): tailor length-continuation prompt for partial stream
The length-continue path's user-facing vprint and continuation prompt
both told the model "your response was truncated by the output length
limit." That's a lie when the stub came from a partial-stream network
error (issue #30963) — and a lie the model can detect, leading to "I
wasn't truncated, I'm done" no-op responses that defeat the
continuation entirely.

Detect the partial-stream-stub via response.id and swap in:

- vprint:   "Stream interrupted by network error
             (finish_reason='length' on partial-stream-stub)"
- prompt:   "[System: The previous response was cut off by a network
             error mid-stream. Continue exactly where you left off.
             Do not restart or repeat prior text. Finish the answer
             directly.]"

Real length truncations still see the original "truncated by output
length limit" prompt — the model needs to know which class of failure
it's recovering from. Same length_continue_retries=3 budget,
truncated_response_parts merging, and final-response stitching
infrastructure on both branches.

Refs: NousResearch/hermes-agent#30963
2026-05-24 04:35:15 -07:00
xxxigm 9140be7c22 fix(streaming): emit finish_reason=length on text-only partial-stream stub
When the API connection drops mid-stream after text deltas have already
been delivered, chat_completion_helpers returned a stub response with
finish_reason=stop. The conversation loop then classified the stub as a
clean text completion (text_response(finish_reason=stop)) and exited
with iteration budget remaining — even when the goal-judge verdict
came back as "continue" milliseconds later (issue #30963).

Switch the text-only partial-stream stub to finish_reason=length. The
existing length-continuation path (length_continue_retries up to 3,
"continue exactly where you left off" prompt, partial parts merged
into final_response) then fires automatically: the partial assistant
content is persisted, the model is asked to continue from the cut
point, and the loop keeps making progress against the goal.

The mid-tool-call branch keeps finish_reason=stop on purpose — its
user-facing warning ("Ask me to retry if you want to continue") asks
the user to drive the retry rather than auto-replaying a tool call
with possible side effects.

#5544's "no duplicate message" contract is preserved verbatim: the
partial content is reused, never re-emitted as a fresh API call, so
the user never sees two copies of the same delta.

Refs: NousResearch/hermes-agent#30963
2026-05-24 04:35:15 -07:00
teknium1 60d20a37c9 fix(acp): only deliver final_response after streaming when transformed
PR #29119 dropped the 'not streamed_message' guard unconditionally so
that plugin-transformed responses (transform_llm_output hook) would
reach ACP clients. That regressed test_prompt_does_not_duplicate_streamed_final_message:
when no transform happened, the streamed text was re-sent as a duplicate
final delivery.

Tighten the condition to mirror the gateway side: deliver after streaming
only when response_transformed=True. Otherwise keep the old guard.

Adds test_prompt_delivers_transformed_response_after_streaming so the
transformed path stays covered.
2026-05-24 04:31:13 -07:00
teknium1 26088ca669 chore: map kenyon1977@gmail.com for PR #29119 salvage 2026-05-24 04:31:13 -07:00
teknium1 b9f533af0a test(gateway): regression for plugin-transformed response after streaming
Adds a test that fails without the gateway fix, exercising the
response_transformed=True branch in _finalize_response: a streamed
response whose final text was modified by a transform_llm_output
plugin hook must be edit_message'd in place (not duplicate-sent),
with already_sent=True so the normal final-send is skipped.

Also drops two minor leftovers from the salvaged PR #29119:

  * accumulated_text property on GatewayStreamConsumer (unused)
  * duplicate _response_transformed=False inside the hook try block
2026-05-24 04:31:13 -07:00
kenyonxu 5cb21e3fb5 fix(gateway): edit streamed message instead of sending duplicate when response_transformed
When a transform_llm_output hook appends content after streaming, the previous
fix skipped the final-send suppression which caused the full response to be
sent as a NEW message (duplicate). Instead, edit the existing streamed message
in-place to append the transformed content, then set already_sent=True.

Added stream_consumer.message_id and .accumulated_text public properties.
2026-05-24 04:31:13 -07:00
kenyonxu a4ceead796 fix(gateway): propagate response_transformed flag through run_sync return dict
run_sync() cherry-picks fields from the run_conversation result dict into
a new response dict for the gateway. response_transformed was missing from
the cherry-pick list, so the gateway always saw it as False and suppressed
the final send even though a transform_llm_output hook had modified the content.
2026-05-24 04:31:13 -07:00
kenyonxu 8edeebe6d7 fix: propagate response_transformed flag — plugin hook output survives streaming suppression
When a transform_llm_output hook modifies final_response after streaming,
the gateway was silently discarding the transformed content because
streamed=True / content_delivered=True triggered the final-send
suppression. Three changes:

1. conversation_loop: set `_response_transformed=True` when a
   transform_llm_output hook returns a non-empty string, and expose it
   as `response_transformed` in the result dict.

2. gateway/run: skip the final-send suppression when
   `response_transformed` is True — the transformed response must
   reach the client even if streaming already sent the original text.

3. acp_adapter/server: remove `not streamed_message` guard so
   final_response is always delivered (ACP path fixed separately).
2026-05-24 04:31:13 -07:00
kenyonxu 7eb6c7f489 fix(acp): deliver final_response after streaming — transform_llm_output hook now visible
When streaming is active, streamed_message=True skipped the final_response
update, causing plugin hooks like transform_llm_output to be silently
invisible. Remove the `not streamed_message` guard so the final response
(possibly transformed by plugins) is always delivered to the ACP client.
2026-05-24 04:31:13 -07:00
Teknium 197f63f454 fix(feishu): require webhook auth secret and honor config extras (#30746) 2026-05-24 04:27:28 -07:00
Teknium bdb97b8573 fix(feishu): enforce auth and chat binding for approval buttons (#30744) 2026-05-24 04:27:17 -07:00
Teknium 485292ac7d fix(feishu): authorize interactive exec approval callbacks (#30739) 2026-05-24 04:26:57 -07:00
Teknium be27bfed01 security: harden API server key placeholder handling (#30738) 2026-05-24 04:25:32 -07:00
Teknium 2df2f9190b fix(docker): keep dashboard side-process loopback by default (#30740) 2026-05-24 04:25:28 -07:00
Teknium 4ca77f1059 Harden msgraph webhook auth requirements (#30169) 2026-05-24 04:25:20 -07:00
Teknium 3e78e353d7 fix(qqbot): authorize approval button interactions by session owner (#30737) 2026-05-24 04:25:12 -07:00
Teknium e4a1220f83 security: restrict default webhook toolset capabilities (#30745) 2026-05-24 04:24:54 -07:00
Teknium c3caca6584 fix(gateway): remove discord role allowlist auth bypass (#30742) 2026-05-24 04:24:49 -07:00
Teknium 1f897b0dc9 fix(gateway): stop enabling dingtalk allow-all during setup (#30743) 2026-05-24 04:24:44 -07:00
Teknium 9732559864 fix(security): restrict dashboard websockets to loopback clients (#30741) 2026-05-24 04:24:40 -07:00
Teknium bc3f1f4f34 feat(secrets/bitwarden): EU Cloud + self-hosted server URL support (#31378)
Closes #31370.

bws defaults to the US identity endpoint, so EU Cloud and self-hosted
machine-account tokens fail with [400 Bad Request] {"error":"invalid_client"}
during 'hermes secrets bitwarden setup'. The token is valid — it's just
being checked against the wrong region.

Add a Bitwarden region step to the wizard between the access-token and
project-list steps:

  Step 1  Install bws
  Step 2  Provide access token
  Step 3  Pick region   <-- new (US / EU / self-hosted-custom-URL)
  Step 4  Pick project  (now talks to the right endpoint)
  Step 5  Test fetch

Region is stored in config.yaml as secrets.bitwarden.server_url and
plumbed into every bws subprocess as BWS_SERVER_URL (project list,
secret list, test fetch, and the env_loader startup pull).

Also:
- Non-interactive: 'hermes secrets bitwarden setup --server-url ...'
- Pre-existing BWS_SERVER_URL in the shell is detected and reused
- Cache key includes server_url so EU/US fetches don't collide
- 'hermes secrets bitwarden status' shows the configured region
- 'invalid_client' / '400 Bad Request' from bws now triggers a hint
  pointing at the region setting instead of looking like a bad token
2026-05-24 02:19:57 -07:00
Teknium c9b3eeabdc fix(cli): decouple tool_progress=verbose from global DEBUG logging (#31379)
PR #6a1aa420e coupled `display.tool_progress: verbose` (a per-tool display
toggle for full args / results / think blocks) to `self.verbose` — which
controls root-logger DEBUG level. Result: setting tool_progress: verbose
in config silently flipped every module in the process to DEBUG and
flooded the terminal with internal logging, far beyond just full tool
calls.

The two concepts are separate:
- `tool_progress_mode == 'verbose'` → display behavior (tool rendering)
- `self.verbose` → logging behavior (root logger → DEBUG, line 9795)

This change keeps PR #6a1aa420e's argparse.SUPPRESS / config-fallback
plumbing but severs the verbose-display → debug-logging link.

Changes:
- cli.py:2868 — `self.verbose` only follows explicit `verbose=` arg; no
  longer auto-True when tool_progress_mode == 'verbose'.
- cli.py:_toggle_verbose — slash-cycle through tool progress modes no
  longer flips `self.verbose` / `agent.verbose_logging` / `agent.quiet_mode`.
- cli.py:9355 — fix misleading label (drop 'and debug logs').
- tui_gateway/server.py:_make_agent — same decoupling on the TUI side
  (verbose_logging no longer derived from tool_progress_mode).
- tests/cli/test_tool_progress_scrollback.py — invert the test that
  asserted the broken coupling; add coverage for explicit `--verbose`
  still enabling DEBUG independent of tool_progress.

Live verified:
- tool_progress: verbose, no --verbose flag → 0 DEBUG/INFO log lines
- --verbose flag explicit → 32 DEBUG/INFO log lines (as expected)
2026-05-24 02:19:20 -07:00
AhmetArif0 5848174374 fix(wecom): guard flush task against cancel-delivery race to prevent message loss
When asyncio.sleep() fires just before Task.cancel() is called, CPython
sets _must_cancel=True but cannot cancel the already-completed sleep
future, so CancelledError is delivered at the next await (handle_message)
rather than at the sleep.  By that point the superseded task has already
popped the merged event from _pending_text_batches, so the superseding
task sees an empty batch and silently drops the message.

Fix: add a synchronous task-registry check between the sleep and the pop.
No await between the check and the pop means no other coroutine can
interleave, so the guard is race-free.
2026-05-24 01:33:40 -07:00
Teknium 1bed4e8eed fix(gateway): drop text snippet from debounce debug log (CodeQL)
CodeQL py/clear-text-logging-sensitive-data flagged the candidate-accept
debug log including event.text[:60]. Log text_len instead — sufficient for
debugging burst behavior without surfacing message contents.

Co-authored-by: Paulo Nascimento <pnascimento9596@gmail.com>
2026-05-24 01:31:45 -07:00
Teknium 51bb8c0a9e chore: map pnascimento9596@gmail.com for PR #31235 salvage 2026-05-24 01:31:45 -07:00
Paulo Nascimento 7abd62719b gateway: debounce queued text follow-ups 2026-05-24 01:31:45 -07:00
AhmetArif0 21db250034 fix(wecom-callback): retry send with fresh token on errcode 40001/42001
When WeCom returns errcode=40001 (invalid credential) or 42001 (token
expired), send() was returning a failure without evicting the bad token
from _access_tokens. All subsequent sends then kept using the same
invalid cached token until its TTL naturally expired (~7200s).

Fix: on the first token-rejection errcode, evict the cache entry and
retry once with a freshly fetched token. Non-token errcodes fail
immediately as before. If the refreshed token also fails, the error
is returned without looping further.

Adds four regression tests covering: successful retry on 40001,
successful retry on 42001, no retry on unrelated errcode, and clean
failure when the refresh does not help.
2026-05-24 01:30:47 -07:00
Teknium d3c167b644 fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint (#31290)
* fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint

Adds a soft guard so an agent running under one Hermes profile cannot
silently edit a different profile's skills/plugins/cron/memories.
Three layers:

A. agent/file_safety.classify_cross_profile_target
   Classifies a write target against the active HERMES_HOME. Returns
   a {active_profile, target_profile, area, target_path} dict when the
   path lands in another profile's scoped area. PROFILE_SCOPED_AREAS =
   (skills, plugins, cron, memories). get_cross_profile_warning()
   wraps it into a model-facing error string that names both profiles,
   names the area, and points at the cross_profile=True bypass.

   Defense-in-depth, NOT a security boundary — the terminal tool runs
   as the same OS user and can write any of these paths directly. The
   guard exists to prevent confused-agent corruption, not to stop a
   determined attacker. SECURITY.md §3.2 (terminal-bypass posture)
   still applies.

   Wired into tools/file_tools.write_file_tool and patch_tool with a
   cross_profile=False kwarg. WRITE_FILE_SCHEMA and PATCH_SCHEMA both
   advertise cross_profile so the model can pass it after explicit
   user direction. patch_tool extracts target paths from V4A patch
   bodies before checking (same shape as the existing sensitive-path
   check).

   skill_manage is already scoped to the active profile's SKILLS_DIR
   by construction, so no extra guard wiring is needed there. The
   D-side error message (below) still names other profiles when the
   skill exists elsewhere.

B. agent/system_prompt
   One deterministic line near the environment-hints block names the
   active profile and tells the model not to modify another profile's
   skills/plugins/cron/memories without explicit direction. Profile
   name is stable for the lifetime of the AIAgent, so the line is
   prompt-cache-safe.

D. tools/skill_manager_tool._skill_not_found_error
   Replaces the bare "Skill 'X' not found." with a message that:
     - names the active profile,
     - searches OTHER profiles' skills dirs for the same name,
     - names the profile(s) where the skill exists and the path,
     - suggests `hermes -p <name>` to switch profiles, or
       cross_profile=True for an explicit edit.

   All 5 "not found" sites in skill_manager_tool (edit, patch, delete,
   write_file, remove_file) now go through the helper.

Reference incident (May 2026): a hermes-security profile session
edited skills under both ~/.hermes/profiles/hermes-security/skills/
AND ~/.hermes/skills/ (the default profile's skills) without
realizing the second path belonged to a different profile. Three of
the four skill files needed manual restoration afterward.

What this PR does NOT do:

  * No hard block. The terminal tool can still touch any of these
    paths with no guard — same posture as the dangerous-command
    approval flow. SECURITY.md §3.2 applies.
  * No regex sweep on terminal commands for cross-profile paths.
    That direction is a Skills-Guard-style arms race (cd + relative
    paths, base64, etc.) and would false-positive on legitimate
    cross-profile reads. Filed as a follow-up.
  * No on-disk path migration. ~/.hermes/skills/ remains the
    default profile's skills dir; this PR is about telling the
    agent about that boundary, not changing the layout.

Tests:
  tests/agent/test_file_safety_cross_profile.py (16 tests)
    - _resolve_active_profile_name covers default/named/failure paths
    - classify_cross_profile_target covers all four scoped areas,
      both directions (default → named, named → default, named → named),
      non-Hermes paths, and root-level config files
    - get_cross_profile_warning covers in-profile no-op, cross-profile
      message shape, and the defense-in-depth self-documentation

  tests/tools/test_cross_profile_guard.py (12 tests)
    - write_file: in-profile allow, cross-profile block, cross_profile=True
      bypass, non-Hermes pass-through
    - patch: replace-mode block, cross_profile=True bypass, V4A patch
      path extraction
    - skill_manage: error names the other profile (single + multiple),
      missing-everywhere falls back to skills_list hint
    - system prompt: contract-level checks (both branches present,
      cross_profile=True mentioned, ~/.hermes/profiles/ referenced)

All 207 existing tests in file_safety/file_operations/skill_manager
still pass. 10 system-prompt tests still pass.

E2E verified: the exact incident scenario (security profile editing
default's hermes-agent-dev skill) is now blocked with the warning
message; cross_profile=True unblocks.

* fix(code_execution): add cross_profile to write_file/patch stubs

The cross_profile kwarg added to write_file_tool/patch_tool needs to
flow through the execute_code sandbox stubs in _TOOL_STUBS so the
test_stubs_cover_all_schema_params drift test passes. Without this,
scripts running inside execute_code couldn't pass cross_profile=True
through hermes_tools.write_file().

Caught by CI on PR #31290.
2026-05-24 00:38:17 -07:00
Teknium b207dc28b3 feat(kanban): --ids bulk promote + AUTHOR_MAP entry for #29464
Adds an --ids flag to 'hermes kanban promote' mirroring the existing
block/schedule convention, so the marquee use case from issue #28822
(promote all children of a closed organizational parent in one shot)
doesn't require a shell loop. Single-id JSON output stays a flat
object for back-compat; bulk emits a list. Dedupes positional + --ids
so the same id can't be promoted twice in one call. 5 new CLI-level
tests cover bulk happy path, partial-failure exit code, JSON shapes,
and dedup.

Also adds the thedavidmurray noreply-email -> github-login mapping in
scripts/release.py so the salvage cherry-pick passes the AUTHOR_MAP
contributor-credit check.
2026-05-23 23:10:36 -07:00
David Murray d46adad22f feat(cli): kanban promote verb for manual todo->ready recovery
Adds `hermes kanban promote <task_id>` for manual lifecycle recovery
when an auto-promote daemon misses the parent-done transition (issue
#28822). Refuses promotion unless every parent dep is done/archived
(override with --force). Emits a `promoted_manual` audit event distinct
from the automatic `promoted` kind, so audit consumers can filter
human-driven from system-driven promotions. Supports --dry-run and
--json for orchestration. Does not mutate assignee/claim state — the
dispatcher picks the card up via its normal ready polling path.

Closes #28822.
2026-05-23 23:10:36 -07:00
novax635 421ab81052 fix(cli): reuse canonical root model key normalization in load_cli_config 2026-05-23 23:08:05 -07:00
Teknium 2442a0c281 fix(background-review): allow pinned skills to be improved
The post-turn background reviewer prompt listed pinned skills under
'Protected skills (DO NOT edit these)' alongside bundled and
hub-installed skills, with the instruction to say 'Nothing to save.'
if only protected skills needed updating. This meant the reviewer
would refuse to patch a pinned skill even when the user explicitly
wanted that skill improved.

The underlying tool layer already gets this right: skill_manage's
_pinned_guard only fires on delete; patch/edit/write_file go through
on pinned skills. Curator archive/consolidation still skips pinned
at the data layer (agent/curator.py), which is the correct place for
that protection — pin's job is anti-deletion, not anti-improvement.

Both _SKILL_REVIEW_PROMPT and _COMBINED_REVIEW_PROMPT now explicitly
tell the reviewer that pinned skills can be patched, with rationale,
so it doesn't bail out of an improvement just because the target is
pinned.
2026-05-23 22:57:42 -07:00
brooklyn! a627981a65 fix(tui): stop slash dropdown from chopping last char of /goal (#31311)
Two independent bugs caused the slash-command autocomplete to render
`/goal` as `/goa` (and `/gquota` as `/gquot` for that matter) in the TUI:

1. `tui_gateway/server.py` was forwarding `c.display` from
   prompt_toolkit's `Completion` straight into the JSON-RPC payload.
   prompt_toolkit normalizes `display=` into `FormattedText` (a `list`
   subclass), so the wire format became `[["", "/goal"]]` instead of
   the `string` that `CompletionItem.display` in the TUI declares.
   `meta` already went through `to_plain_text` — `display` did not.

2. The dropdown row in `appOverlays.tsx` used `flexDirection="row"`
   with the display `<Text>` and the (very long) meta `<Text>` as
   siblings. When the meta overflows the row width, Ink/Yoga shrinks
   the *first* column by one cell, lopping the trailing character off
   the command name. `/goal` triggers it reliably because its meta
   string is the longest of any built-in command (description +
   embedded `[text | pause | resume | clear | status]` usage hint).
   Wrapping the display column in `<Box flexShrink={0}>` keeps it at
   its natural width and lets the meta wrap or truncate instead.
2026-05-23 22:12:55 -07:00
Teknium 2666009ccc docs: dedicated Nous Portal integration page and setup guide (#31296)
If Nous Portal is the recommended way to run Hermes Agent, it deserves
more than a sub-section buried under `## Inference Providers`. Add two
new pages and shrink the existing providers.md section to a stub that
points at them.

New pages:
- `website/docs/integrations/nous-portal.md` — landing page. What's in
  the subscription (300+ model catalog table, Tool Gateway breakdown,
  Nous Chat, cross-platform parity, no-dotfile-credentials). Hermes 4
  recommendation note. Setup paths (fresh install, existing install,
  headless / SSH, profiles). Day-to-day usage (portal status / portal
  tools / portal open, switching models, mixing gateway with own
  backends, subscription management). Configuration reference. Token
  handling. Troubleshooting. Cross-links. Sidebar-position 1 — first
  entry under Integrations.

- `website/docs/guides/run-hermes-with-nous-portal.md` — task script.
  Eight numbered steps: subscribe → setup --portal → verify with
  portal status → first chat → switch models → customize gateway
  routing → voice mode → cron/always-on. Per-step troubleshooting.
  'What this gets you in plain numbers' comparison table. Sidebar
  position 1 — first entry under Guides & Tutorials.

Existing providers.md:
- Replace the 80-line `### Nous Portal` deep-dive with a 13-line stub
  that summarizes the value prop, lists the three CLI commands, and
  links to the new pages. Saves ~6KB. Other provider sections and
  callouts (Codex Note, Two Commands, Tool Gateway tip) preserved.

Sidebar:
- `integrations/nous-portal` inserted right after `integrations/index`,
  before `integrations/providers`.
- `guides/run-hermes-with-nous-portal` inserted first in Guides &
  Tutorials.
2026-05-23 21:07:58 -07:00
Teknium 2b10024ee8 test(display): cover failure-suffix rendering + update scrollback test
The original PR #17194 description claimed test_display_tool_preview.py
but only ever shipped test_display_todo_progress.py. Add the missing
coverage for the failure-suffix path:

- _trim_error: whitespace strip, length cap, File-not-found path collapse
- _detect_tool_failure: terminal exit codes, memory full, structured
  {error}/{message} extraction, malformed JSON, None result
- get_cute_tool_message E2E: read_file failure, terminal exit-only,
  terminal stderr message, memory full, success path, no-result path

Also update test_tool_progress_scrollback.test_error_suffix_on_failed_tool
to reflect the new behavior: the generic '[error]' fallback in cli.py
has been removed; failure suffixes now come from the result-aware
_detect_tool_failure (e.g. '[exit 1]', '[File not found: x]').
2026-05-23 21:03:51 -07:00
Albert.Zhou ffde8b7b09 feat(cli): show todo progress as done/total fraction
Parse the todo_tool result summary to display completion progress in
CLI tool preview lines:

  Read:    ┊ 📋 plan      3/4 task(s)  0.5s
  Update:  ┊ 📋 plan      update 3/4 ✓  0.5s
  Create:  falls back to plain count when no completed tasks

Falls back gracefully to the existing 'N task(s)' format when the
result is missing, malformed, or has no completed items.

Originally proposed in PR #17194 by Albert.Zhou; salvaged onto current
main.

Co-authored-by: Albert.Zhou <albert748@gmail.com>
2026-05-23 21:03:51 -07:00
Albert.Zhou 094d732378 fix(cli): surface tool failures with specific error messages
Improves the failure suffix on tool completion lines. Instead of always
showing '[error]' for non-terminal failures, parse the tool's JSON result
and surface the actual message:

  Before:  ┊ 📖 read      foo.py  0.1s [error]
  After:   ┊ 📖 read      foo.py  0.1s [File not found: foo.py]

  Before:  ┊ 💻 $         ls bad  0.1s [exit 127]
  After:   ┊ 💻 $         ls bad  0.1s [ls: cannot access 'bad'...]

Adds a _trim_error helper that strips long absolute paths down to the
filename and caps the suffix at 48 chars so it stays readable on narrow
terminals.

Threads the tool result through the tool.completed progress callback so
agent/display.get_cute_tool_message can inspect it. The cli.py [error]
post-suffix is removed in favor of the richer suffix _detect_tool_failure
now produces directly.

Originally proposed in PR #17194 by Albert.Zhou; salvaged onto current
main with the dead-code preview-length bumps dropped (tool_preview_length
config already strictly caps previews, so the per-tool n= defaults are
unreachable).

Co-authored-by: Albert.Zhou <albert748@gmail.com>
2026-05-23 21:03:51 -07:00
honor2030 6a1aa420e7 Fix CLI verbose tool progress config fallback 2026-05-23 21:03:51 -07:00
Teknium d97c324473 fix(terminal): warn at call time when background=true runs silently (#31289)
`terminal(background=true)` without `notify_on_complete=true` or
`watch_patterns` runs the process SILENTLY — the agent has no way
to learn it finished short of calling `process(action='poll')`
explicitly. That's correct for genuine long-lived processes (servers,
watchers, daemons) but is a footgun for every bounded task (tests,
builds, deploys, CI pollers, batch jobs), which is the vast majority
of background uses.

Hit on May 23, 2026 (PR #31231 incident): agent launched a CI-watch
loop with `background=true` only. The poller ran fine, exited green
6 minutes later, agent never noticed. User had to surface 'we are
green CI, you can merge.' Memory and skill docs said *what* to do
(poll in background) but not *how* to receive the result. The
`notify_on_complete=true` flag exists and works, but is easy to
forget when bg seems sufficient on its own.

Two changes here, mutually reinforcing:

1. Runtime nudge: tool result for `background=true` w/o notify or
   watch_patterns now includes a `hint` field explaining the silent-
   process failure mode and pointing at the corrective flag. Agent
   sees it on the same turn and self-corrects without needing the
   user to surface anything. Cost for legitimate server cases is one
   ignored read (~50 tokens); cost for forgot-notify cases is
   prevented blindness (potentially many turns, or a user nudge).
   False positives << false negatives.

2. Schema/description rewrite: top-level TERMINAL_TOOL_DESCRIPTION
   and the `background` field description now lead with 'Almost
   always pair with notify_on_complete=true' instead of presenting
   it as one of two equally-likely patterns. The two legitimate
   non-notify shapes (long-lived servers; watch_patterns mid-process
   signals) are still documented, but as the minority case.

Tests cover all four shapes: bg-only emits hint, bg+notify doesn't,
bg+watch_patterns doesn't, foreground doesn't. 4 new tests; full
suite of background/process tests stays green (160/160 across the
relevant 6 test files).
2026-05-23 21:02:14 -07:00
AhmetArif0 39b8d1d313 fix(dingtalk): finalize open streaming cards before disconnect
AI Card "tool progress" cards created with finalize=False were left in
streaming state on DingTalk's UI after a gateway restart because
disconnect() called _streaming_cards.clear() without first closing
them via _close_streaming_siblings.

Move the finalization loop before self._http_client.aclose() so the
HTTP client is still available when the finalize requests are sent.
Adds a regression test that asserts the HTTP client is alive during
finalization.
2026-05-23 20:48:56 -07:00
Teknium a7b622effc docs(providers): move Nous Portal first, Google Gemini OAuth last (#31287)
Reorder the per-provider subsections under '## Inference Providers'
so Nous Portal — the recommended setup — leads the list, and Google
Gemini via OAuth (which carries a policy-risk warning) drops to last
position right before the '## Custom & Self-Hosted LLM Providers'
section. All other provider sections keep their relative order. Pure
section move; no content changes.
2026-05-23 20:46:17 -07:00
Fewmanism 83f6a83b24 fix(tui): handle images with codex app-server 2026-05-23 20:40:09 -07:00
teknium1 7ce6b504a2 fix(process_registry): use taskkill /T /F for tree-kill on Windows
The Windows branch of `_terminate_host_pid` early-returned after
`os.kill(pid, SIGTERM)` (which Python maps to `TerminateProcess` for
the target handle only), leaving descendant processes — e.g. Chromium
renderer/GPU/network helpers spawned by an `agent-browser` daemon —
running on Windows even after the preceding commit fixed POSIX.

The right Windows primitive is `taskkill /PID <pid> /T /F`:
`/T` walks the tree, `/F` force-terminates. Same approach
`gateway.status.terminate_pid(force=True)` already uses for the
gateway's own shutdown path; reuse the same shape here.

Why NOT extend the POSIX psutil tree-walk to Windows:

  1. Windows doesn't maintain a Unix-style process tree. `psutil.
     Process.children(recursive=True)` walks PPID links that go stale
     when intermediate processes exit, so enumeration is best-effort
     and silently misses orphaned descendants. The whole bug we're
     fixing is orphaned descendants.

  2. `psutil.Process.terminate()` on Windows is `TerminateProcess()`
     for one handle — same single-PID scope as the existing
     `os.kill`. The existing comment in `gateway/status.py::
     terminate_pid` warns this explicitly: 'os.kill SIGTERM is not
     equivalent to a tree-killing hard stop' on Windows.

  3. Headless Chromium has no GUI window, so the softer
     `taskkill /T` without `/F` (which sends WM_CLOSE) won't reach
     it either. `/F` is required.

POSIX path is unchanged. The taskkill subprocess uses the same
`creationflags=windows_hide_flags()` pattern other Windows shellouts
in this codebase use. `FileNotFoundError` / `TimeoutExpired` /
`OSError` fall back to bare `os.kill(SIGTERM)` as cheap insurance.

Tests cover the Windows branch via the codebase's standard
`monkeypatch _IS_WINDOWS` pattern (`references/windows-native-
support.md`), plus POSIX tree-walk order, NoSuchProcess swallow,
and the OSError fallback path. 7 new tests, all green on Linux CI.
2026-05-23 20:30:29 -07:00
Yuan Li 22f3f5a75a fix(browser): use process-tree termination for daemon cleanup
os.kill(pid, SIGTERM) only signals the parent, leaving Chromium child
    processes (renderer, GPU, etc.) orphaned.  Reuse the existing
    ProcessRegistry._terminate_host_pid() helper which walks the process
    tree leaf-up via psutil, terminating children before the parent.
2026-05-23 20:30:29 -07:00
Teknium 72ff3e909c docs(providers): rewrite Nous Portal section as primary recommended path (#31230)
The old section sold Nous Portal as access to Hermes-4 models, which is
backwards — Hermes 4 is a chat/reasoning family that's NOT recommended
for Hermes Agent (per portal.nousresearch.com/info itself). The actual
value prop is the 300+ frontier agentic models (Claude, GPT, Gemini,
DeepSeek, etc.) plus the Tool Gateway plus Nous Chat under one
subscription.

Rewrite to lead with that, position the portal as the recommended way
to run Hermes Agent, demote Hermes 4 to a 'note' explaining why it's
not the right pick for agent workloads, and link to the
manage-subscription page from setup.
2026-05-23 18:19:17 -07:00
Teknium e42fcc5625 fix(provider): make config.yaml model.provider the single source of truth (#31222)
Policy: if it ain't a secret it goes in config.yaml. HERMES_INFERENCE_PROVIDER
was leaking behavioral config into the .env surface, including from the gateway,
which bypassed config.yaml entirely.

Behavior:
- gateway/run.py: drop HERMES_INFERENCE_PROVIDER read in _resolve_runtime_agent_kwargs.
  Gateway now flows through resolve_runtime_provider() with no `requested` override,
  which reads model.provider from config.yaml first.

Docs/UX (strip env var from user-facing surface):
- --provider help text no longer mentions the env var
- cli-config.yaml.example same
- reference/environment-variables.md: remove HERMES_INFERENCE_PROVIDER row and
  the cross-reference from HERMES_INFERENCE_MODEL
- reference/cli-commands.md: blank the env-var column for --provider
- guides/xai-grok-oauth.md, guides/minimax-oauth.md: replace
  HERMES_INFERENCE_PROVIDER=x hermes invocations with config.yaml / --provider
- developer-guide/adding-providers.md, model-provider-plugin.md: reframe

Internal mechanism (kept as-is):
- hermes_cli/main.py writes HERMES_INFERENCE_PROVIDER into the TUI subprocess env
- tui_gateway/server.py reads it on TUI startup
- resolve_requested_provider() / oneshot.py / cli.py still fall through to the
  env var as a last-resort behind config.yaml, which is what makes the TUI
  parent->child handoff work
This stays. We just stop documenting it as a user knob.

Tests: tests/gateway/test_auth_fallback.py — simplify mock to fail on first
call, succeed on second; drop monkeypatch.setenv lines that no longer matter.

Supersedes #31064 (closed with credit to @novax635 who surfaced the underlying
issue but proposed aligning gateway *to* the env var rather than removing it).
2026-05-23 18:18:41 -07:00
Teknium 7a4dc8e8d6 chore: map edison@mcclean.codes for PR #29817 salvage 2026-05-23 17:49:47 -07:00
Edison e752c9454e feat(plugins): add register_auxiliary_task() to PluginContext API
Auxiliary LLM tasks (vision, compression, web_extract, etc.) currently
require modifications to core files for any plugin that needs its own
task slot — specifically the _AUX_TASKS list in hermes_cli/main.py and
the hardcoded env-var bridging dict in gateway/run.py. This violates
the 'plugins must not modify core files' rule and forces every memory
or context plugin that wants its own auxiliary task to either fork
core or open a coupled core+plugin PR.

This change adds a generic plugin surface for auxiliary task
registration:

    ctx.register_auxiliary_task(
        key='memory_retain_filter',
        display_name='Memory retain filter',
        description='hindsight pre-retain dedup/extract',
        defaults={'timeout': 30, 'extra_body': {'reasoning_effort': 'low'}},
    )

After registration, the task automatically:

  - Appears in 'hermes model → Configure auxiliary models' picker via
    a new _all_aux_tasks() merge of built-in + plugin tasks
  - Has its provider/model/base_url/api_key bridged from config.yaml
    to AUXILIARY_<KEY_UPPER>_* env vars at gateway startup
    (gateway/run.py now uses a dynamic bridged-keys set instead of
    a hardcoded per-task dict)
  - Gets plugin-declared defaults (timeout, extra_body, etc.) layered
    underneath user config so unconfigured plugin tasks still work
    (agent/auxiliary_client._get_auxiliary_task_config)
  - Resets to auto via 'Reset all to auto' alongside built-ins

Validation:

  - Rejects shadowing of built-in keys (vision, compression, etc.)
  - Rejects invalid key shapes (must match [A-Za-z0-9_]+)
  - Rejects cross-plugin collisions (clear error)
  - Allows same-plugin re-registration (idempotent updates)

Plugin discovery failures (rare) fall back gracefully — the aux
config UI still shows built-in tasks if get_plugin_auxiliary_tasks()
raises, and gateway env-var bridging keeps working for built-ins.

Built-in tasks remain hardcoded in _AUX_TASKS for stability — they're
the baseline UX, and DEFAULT_CONFIG already ships their defaults.
Plugin tasks layer on top.

Tests: 15 new tests in test_plugin_auxiliary_tasks.py covering API
validation, manager state lifecycle, helper sort order, _all_aux_tasks
merge semantics, _reset_aux_to_auto inclusion of plugin tasks, and
default-layering in auxiliary_client.

Updates the gateway-bridge code-parity test (test_auxiliary_config_bridge)
to assert the new dynamic shape rather than the hardcoded literal env
var names which no longer appear post-refactor.

Motivation: this unblocks PR #20262 (hindsight smart retain pipeline)
and similar plugins that need a dedicated aux task slot. The change
is non-breaking — built-in env vars (AUXILIARY_VISION_PROVIDER, etc.)
keep working since they're produced by the same f-string template
that built the hardcoded names.
2026-05-23 17:49:47 -07:00
soynchux e8fa415a9e fix(cli): validate runtime token refresh capability in Qwen auth status 2026-05-23 17:47:36 -07:00
teknium1 4254f7dd17 refactor(skills): slim AST diagnostic to single entry point
Trim ~600 LOC off the original contribution while keeping the same
operator-facing surface and detection coverage.

- Collapse three entry points (file / dir / bundle) into one
  ast_scan_path(path) that handles both files and directories.
- Drop AstFinding dataclass + severity field — replaced with plain
  (file, line, pattern_id, description) tuples. Severity ordering was
  display-only for a diagnostic that explicitly disclaims security
  verdicts, so the field added bookkeeping without earning its place.
- Replace Rich-markup formatter with plain text grouped by file.
- Drop the 'inspect --ast-deep' surface — same scanner, same output as
  'audit --deep', single CLI entry is enough. Operators audit after
  install; pre-install inspection signal isn't worth the second surface.
- Trim test file to the cases that earn their place: bypass payload,
  syntax error survival, RecursionError survival, false-positive guard
  (importer lookalike), literal-arg false-positive guard, non-.py
  ignored, directory recursion + cache-dir skipping, missing-path,
  getattr/__dict__ detection, formatter empty + populated.

Net: tools/skills_ast_audit.py 353 -> 133 LOC,
tests/tools/test_skills_ast_audit.py 299 -> 103 LOC, full diff
+704/-12 -> +264/-6. No change to tools/skills_guard.py — Skills Guard
verdicts remain untouched per SECURITY.md §2.4.
2026-05-23 17:47:26 -07:00
Tranquil-Flow 7255050c99 feat(skills): add opt-in AST deep diagnostics
Add opt-in AST diagnostics for skill review without making Skills Guard stricter by default.

- Add hermes skills inspect --ast-deep to scan fetched skill bundles before installation
- Add hermes skills audit --deep to scan already-installed hub skills
- Keep AST analysis in tools/skills_ast_audit.py, separate from tools/skills_guard.py
- Label output as diagnostic hints, not security verdicts
- Cover dynamic import/access patterns: importlib, __import__(computed), getattr(computed), and __dict__[computed]

This follows the maintainer guidance from closed PR #7436: useful AST-level analysis belongs in an opt-in diagnostic path, not in Skills Guard's default heuristic scan.
2026-05-23 17:47:26 -07:00
novax635 86871ee25a fix(cli): synchronize HERMES_SESSION_ID across environment and contextvar during session switches 2026-05-23 17:46:55 -07:00
brooklyn! f63ef74eaf fix(tui): refresh virtual transcript on viewport resize (#31077)
* fix(tui): refresh virtual transcript on viewport resize

Notify scroll subscribers when ScrollBox viewport bounds change and key virtual-history updates on viewport height so resize/keyboard changes remount the tail rows instead of leaving stale spacers visible.

* test(tui): isolate viewport-height remount regression

Keep the resize delta below the virtual history scroll quantum so the regression test specifically depends on viewport height entering the snapshot key.

* test(tui): clarify virtual history resize snapshot

Update the resize regression and comments so the test specifically guards viewport-height changes in the virtual-history snapshot key.

* docs(tui): clarify scrollbox subscription signals

Document that ScrollBox subscribers are notified for renderer-computed viewport and content bound changes, not only imperative scrolls.

* fix(tui): recompute virtual tail after width resize

Avoid preserving a frozen virtual transcript range when wrapped rows shrink enough that the old tail window no longer covers the viewport.

* fix(tui): preserve transcript tail across resizes

Wraps + heights are column-dependent, so a width change must remeasure
every row and the renderer must repaint the full viewport.

- Key virtualRows on cols so React remounts wrapped rows on resize.
- Snap back to bottom after sticky-mode resize once React rerenders.
- Reserve a scrollbar + gap column in transcriptBodyWidth (non-termux).
- Full repaint on any viewport height change (was: shrink-only).
- ScrollBox scrollHeight uses deepest child bottom so sticky-bottom
  math can reach the real final rendered row after reflow.
- DECSTBM fast-path now requires full container rect match.

* feat(tui): responsive banner tiers

Terminals can't scale glyphs, so the banner now picks a layout per
column width instead of always rendering the full 101-col logo:

- Wide (>= logo width): full ASCII logo + tagline.
- Mid (>= 58 cols): centered rule banner that expands with viewport.
- Narrow (>= 34 cols): brand line + tagline, both width-aware.
- < 34 cols: hidden.

SessionPanel surfaces model/cwd/sid inline when the hero column is
hidden, so narrow layouts don't lose that info. Logo width constants
derive from the art itself.

* fix(tui): re-check sticky inside resize debounce + document remount

Addresses Copilot review on PR #31077:

- onResize now re-checks isSticky() inside the 100ms timer so manual
  scrolls during the debounce window don't get snapped back to tail.
- Comment on the virtualRows cols-keying calls out the deliberate
  trade-off: per-row local state (e.g. systemOpen) resets on resize so
  yoga can remeasure off live geometry. The hook's scale-by-ratio path
  is too approximate for mixed markdown widths.
2026-05-23 19:39:53 -05:00
0z1-ghb dcbcdd6526 fix(compressor): propagate api_mode and fix root logger calls
- Add api_mode to 4 update_model() call sites:
  - conversation_loop.py: long_context failover and probe stepping
  - agent_runtime_helpers.py: rollback restore (also saves compressor_api_mode)
  - chat_completion_helpers.py: fallback activation
- Fix 31 root-logger calls across 5 files (logging.warning/error/info
  -> logger.warning/error/info) to respect module-level log filtering
2026-05-23 17:38:19 -07:00
0z1-ghb 8b2adead78 fix(compressor): ABC compliance — total_tokens, api_mode, logger consistency 2026-05-23 17:38:19 -07:00
Yuan Li 75643a6154 fix(env): strip null bytes from .env before python-dotenv loads
Null bytes in API key values (introduced by copy-paste) crash
    os.environ[k] = v with ValueError: embedded null byte, preventing
    hermes from starting at all.
2026-05-23 17:17:05 -07:00
Brian D. Evans 514a4eff36 docs(simplex): remove broken Docker install command (#26974) (#26975)
* docs(simplex): remove broken Docker install command (#26974)

The "Or Docker" snippet pointed at `simplexchat/simplex-chat`, which is
not a published Docker Hub image. Users following the docs hit:

  docker: Error response from daemon: pull access denied for
  simplexchat/simplex-chat, repository does not exist or may require
  'docker login'.

The SimpleX Chat project only publishes Docker images for its server
components (smp-server, xftp-server) — the chat CLI is distributed as a
binary release. Drop the broken `docker run` line and keep the verified
binary-download path, with a note pointing users to the upstream
Dockerfile if they want to build a container themselves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(simplex): drop misleading "Dockerfile" link text

Copilot review flagged that the link text claimed "Dockerfile in the
upstream repo" but the URL pointed at the repository root, not a
specific Dockerfile path. Reword to "build from source from the
simplex-chat repository" so the link text and target match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-23 16:32:20 -07:00
teknium1 87111c7bfe chore: map Glucksberg noreply email in AUTHOR_MAP 2026-05-23 16:26:28 -07:00
Glucksberg 9451087aab fix(telegram): preserve observed group slash commands 2026-05-23 16:26:28 -07:00
chyuwei 7de8cd4c5f fix(tui): clear TTS env var on voice off, and add TTS indicator to status bar
Bug 1: /voice off in TUI mode did not clear HERMES_VOICE_TTS,
leaving TTS stuck ON with no way to disable it (the voice.toggle
tts handler requires voice mode to be ON).

Bug 2: TUI status bar only showed 'voice on/off' without any
indication of whether TTS speech output is active, because the
frontend never tracked voiceTts state.

- tui_gateway/server.py: clear HERMES_VOICE_TTS when voice is turned off
- ui-tui/src/app/useMainApp.ts: add voiceTts state, thread setVoiceTts
  through voice contexts, display [tts] in status bar
- ui-tui/src/app/slash/commands/session.ts: sync tts from voice.toggle response
- ui-tui/src/app/interfaces.ts: add setVoiceTts to all voice context interfaces
2026-05-23 16:21:29 -07:00
0xchainer 2c34a7da87 fix(cli): prevent temp directory leak on ZIP update failure
Move shutil.rmtree into a finally block so the temp directory is always
cleaned up, even when an exception occurs during download, extraction,
or file copying.
2026-05-23 16:16:35 -07:00
Teknium 3b096d6f6d ntfy: tighten robustness, dedupe auth/truncation, add docs
Robustness:
- Surface 401/404 stream failures via _set_fatal_error() so the gateway's
  runtime status reflects 'fatal: ntfy_unauthorized' / 'ntfy_topic_not_found'
  instead of staying 'connected' when the reconnect loop halts. Matches
  the pattern in whatsapp / telegram / sms adapters.
- Strip whitespace from auth tokens so pasted tokens with trailing
  newlines don't produce malformed Authorization headers.

Simplicity:
- Extract _build_auth_header() and _truncate_body() to module-level
  helpers, used by both NtfyAdapter and _standalone_send. Removes the
  duplicated auth/truncation logic between the two paths.

Docs:
- website/docs/user-guide/messaging/ntfy.md — full setup guide,
  identity-model warning, self-hosting, cron usage, troubleshooting.
- website/docs/reference/environment-variables.md — all 9 NTFY_* vars.
- website/docs/user-guide/messaging/index.md — platform comparison row.
- website/sidebars.ts — sidebar entry between simplex and open-webui.

Tests: 78/78 (+ 10 new robustness tests covering token hygiene, fatal
error propagation for 401/404, and the _truncate_body helper).
2026-05-23 16:13:01 -07:00
Teknium 6a8e131a0a refactor(ntfy): convert built-in adapter to platform plugin
ntfy now ships as a self-contained plugin under plugins/platforms/ntfy/
instead of editing 8 core files (gateway/config.py Platform enum,
gateway/run.py factory + auth maps, cron/scheduler.py, toolsets.py,
hermes_cli/status.py, agent/prompt_builder.py, gateway/channel_directory.py,
tools/send_message_tool.py).

All routing goes through gateway/platform_registry via register_platform():
- adapter_factory, check_fn, validate_config, is_connected
- env_enablement_fn seeds PlatformConfig.extra from NTFY_* env vars so
  gateway status reflects env-only setups without instantiating httpx
- standalone_sender_fn handles deliver=ntfy cron jobs when cron runs
  out-of-process from the gateway
- allowed_users_env / allow_all_env hook into _is_user_authorized
- cron_deliver_env_var=NTFY_HOME_CHANNEL for cron home routing
- platform_hint surfaces in the system prompt
- pii_safe=True (topic names are the only identifier; no PII to redact)

Tests moved to tests/gateway/test_ntfy_plugin.py using _plugin_adapter_loader
so the module lives under plugin_adapter_ntfy in sys.modules and cannot
collide with sibling plugin-adapter tests on the same xdist worker. The
core-file grep tests (Platform.NTFY in source, hermes-ntfy in toolsets,
etc.) are replaced with plugin-shape tests covering register() metadata,
env_enablement_fn output, and standalone_sender_fn behavior.

68 tests pass under scripts/run_tests.sh.
2026-05-23 16:13:01 -07:00
sprmn24 b10f17bf1e feat(ntfy): add ntfy platform adapter with atomic reconnect, identity fix, and 81 tests 2026-05-23 16:13:01 -07:00
Brooklyn Nicholson 511b8e2325 fix(tui): re-check sticky inside resize debounce + document remount
Addresses Copilot review on PR #31077:

- onResize now re-checks isSticky() inside the 100ms timer so manual
  scrolls during the debounce window don't get snapped back to tail.
- Comment on the virtualRows cols-keying calls out the deliberate
  trade-off: per-row local state (e.g. systemOpen) resets on resize so
  yoga can remeasure off live geometry. The hook's scale-by-ratio path
  is too approximate for mixed markdown widths.
2026-05-23 17:48:28 -05:00
Brooklyn Nicholson 35fdf11145 feat(tui): responsive banner tiers
Terminals can't scale glyphs, so the banner now picks a layout per
column width instead of always rendering the full 101-col logo:

- Wide (>= logo width): full ASCII logo + tagline.
- Mid (>= 58 cols): centered rule banner that expands with viewport.
- Narrow (>= 34 cols): brand line + tagline, both width-aware.
- < 34 cols: hidden.

SessionPanel surfaces model/cwd/sid inline when the hero column is
hidden, so narrow layouts don't lose that info. Logo width constants
derive from the art itself.
2026-05-23 17:37:51 -05:00
Brooklyn Nicholson 0277194e3b fix(tui): preserve transcript tail across resizes
Wraps + heights are column-dependent, so a width change must remeasure
every row and the renderer must repaint the full viewport.

- Key virtualRows on cols so React remounts wrapped rows on resize.
- Snap back to bottom after sticky-mode resize once React rerenders.
- Reserve a scrollbar + gap column in transcriptBodyWidth (non-termux).
- Full repaint on any viewport height change (was: shrink-only).
- ScrollBox scrollHeight uses deepest child bottom so sticky-bottom
  math can reach the real final rendered row after reflow.
- DECSTBM fast-path now requires full container rect match.
2026-05-23 17:37:42 -05:00
Brooklyn Nicholson 2a75bec607 fix(tui): recompute virtual tail after width resize
Avoid preserving a frozen virtual transcript range when wrapped rows shrink enough that the old tail window no longer covers the viewport.
2026-05-23 14:49:26 -05:00
Brooklyn Nicholson 521c870a05 docs(tui): clarify scrollbox subscription signals
Document that ScrollBox subscribers are notified for renderer-computed viewport and content bound changes, not only imperative scrolls.
2026-05-23 14:18:56 -05:00
Brooklyn Nicholson a4c27af697 fix(tui): measure status cwd by display width
Budget the right-hand status label by terminal display width so wide Unicode paths cannot wrap skinny status bars.
2026-05-23 14:11:44 -05:00
Brooklyn Nicholson d1ad919a44 test(tui): clarify virtual history resize snapshot
Update the resize regression and comments so the test specifically guards viewport-height changes in the virtual-history snapshot key.
2026-05-23 14:11:44 -05:00
Brooklyn Nicholson 4d9791c551 fix(tui): reclaim status width when cwd is hidden
Make the cwd separator width conditional so the computed status layout matches the rendered row on ultra-narrow terminals.
2026-05-23 14:06:08 -05:00
Brooklyn Nicholson cc61e3be49 test(tui): isolate viewport-height remount regression
Keep the resize delta below the virtual history scroll quantum so the regression test specifically depends on viewport height entering the snapshot key.
2026-05-23 14:06:08 -05:00
Brooklyn Nicholson 11b0d9ed2f fix(tui): keep status rule one-line in skinny terminals
Clamp and truncate the cwd/branch segment so narrow status bars cannot wrap into the composer input row.
2026-05-23 13:47:35 -05:00
Brooklyn Nicholson 4fea02cc16 fix(tui): refresh virtual transcript on viewport resize
Notify scroll subscribers when ScrollBox viewport bounds change and key virtual-history updates on viewport height so resize/keyboard changes remount the tail rows instead of leaving stale spacers visible.
2026-05-23 13:41:46 -05:00
brooklyn! 874c2b1fe6 fix(tui): ignore late thinking deltas after completion (#31055)
* fix(tui): ignore late thinking deltas after completion

Prevent stale reasoning events from repainting the TUI status after a turn has already completed and the UI is idle.

* test(tui): restore timers after thinking delta assertion

Keep fake timer cleanup in a finally block so assertion failures cannot leak timer mode into later tests.
2026-05-23 13:31:06 -05:00
brooklyn! e6ca730a22 fix(tui): log parent gateway lifecycle exits (#31051)
* fix(tui): log parent gateway lifecycle exits

Add parent-side breadcrumbs for TUI gateway shutdown and transport exits so future backend EOF/SIGTERM reports identify the parent action that caused them.

* chore(tui): retrigger lifecycle logging checks

Retry transient GitHub checkout failures on the lifecycle logging PR.
2026-05-23 13:28:40 -05:00
brooklyn! 026f64f8e0 fix(tui): commit composer input bursts immediately (#31053)
* fix(tui): commit composer input bursts immediately

Salvage the WSL/terminal multi-character input burst fix with focused regression coverage so delayed pseudo-paste buffers cannot reorder later edits.

* fix(tui): keep newline input bursts on paste path

Preserve paste handling for multi-character chunks with newlines while keeping repeated printable key bursts on the immediate composer path.

* refactor(tui): share composer frame batch interval

Use one frame-sized batching constant for parent updates, local renders, and input burst flushes.
2026-05-23 13:27:16 -05:00
Teknium ad11327db0 feat(kanban): warn users that scratch workspaces are deleted on completion (#30949)
First scratch workspace creation on an install now emits a one-shot
warning log + a 'tip_scratch_workspace' event on the task. Sentinel
file at ~/.hermes/kanban/.scratch_tip_shown silences subsequent
creations across the whole install.

Behavior unchanged — scratch is still ephemeral by design. This just
makes the design visible to new users (reported in user community:
'progress files vanished, no warning anywhere').

Docs (en + ko) updated to spell out 'Deleted when the task completes'
on the scratch bullet and 'Preserved on completion' on worktree/dir.
2026-05-23 11:27:00 -07:00
Teknium cae7537359 infographic: kanban.db corruption defense (#30858 + #30862) (#30952) 2026-05-23 05:55:25 -07:00
teknium1 c4b8f5efee fix(kanban): harden corrupt-db backup against CodeQL path-injection findings
Path.resolve() before any I/O and confine backup writes to the resolved
parent directory. Adds explicit parent-equality assertions so static
analyzers see the containment guarantee, and walks WAL/SHM sidecars
through the same resolved-parent path so accidental .. segments are
collapsed before shutil.copy2.

Functionally equivalent to the original PR; preserves the corrupt bytes
to <db>.corrupt.<ts>.bak in the same directory, still raises
KanbanDbCorruptError from connect(). E2E with Stefan's exact hex header
+ malformed pages still passes. 163/163 kanban tests still pass.
2026-05-23 05:51:33 -07:00
teknium1 4f835f7e43 chore(release): map NickLarcombe author email for #30707 salvage 2026-05-23 05:51:33 -07:00
Nick 39fe4ecee3 fix(kanban): refuse corrupt db auto-init 2026-05-23 05:51:33 -07:00
Teknium e97a4c8f37 docs(readme): add Nous Portal section between Getting Started and CLI/Messaging reference (#30941)
A small, self-contained section under 'Skip the API-key collection —
Nous Portal' explaining what Portal gives you (300+ models + Tool
Gateway), the one-shot install command, and how to inspect routing.
No buzzwords, no comparison tables, no overselling.

Positioned right after 'Getting Started' so it lands where someone
scanning the README has just seen the install steps and is deciding
their next move. Skippable by anyone who already knows their provider.

The line 'You can still bring your own keys per-tool whenever you
want' is the deliberate honesty rail — Portal is an option, not a
funnel. Existing per-provider language elsewhere in the README is
unchanged.

Mirrored to README.zh-CN.md to keep the two READMEs in sync.
2026-05-23 05:25:46 -07:00
QuenVix 7245bc77eb fix(fallback): merge fallback_providers with legacy fallback_model configurations 2026-05-23 05:24:57 -07:00
Teknium 7f1b2b4569 fix(approval): pin 'silence is not consent' contract on timeout/deny (#24912) (#30879)
User incident (Slack, 2026-05-13): user walked away mid-conversation,
agent requested approval to run `rm -rf .git`, the prompt timed out
after the gateway_timeout (default 300s), and the agent removed the
.git folder on its own. Corroborated by an independent report from a
Telegram user.

The underlying code path was correct — `check_all_command_guards`
returns `approved=False` with a BLOCKED message on both timeout and
explicit deny, and `terminal_tool` surfaces that as `status=blocked`
to the agent. The bug is at the model-interface layer: the message
"BLOCKED: Command timed out. Do NOT retry this command." reads to
some models as "try a different command achieving the same outcome."

This commit changes only the model-facing message + the structured
return shape:

  - Timeout message now explicitly names the three evasion paths the
    agent must avoid: retry, rephrase, AND achieve the same outcome
    via a different command. Ends with "Silence is not consent."
  - Explicit deny gets the same shape minus the silence-is-not-consent
    line (it WAS an explicit deny, not silence).
  - New structured fields on the return dict: `outcome` ("timeout"
    or "denied") and `user_consent` (always False on this branch)
    so plugins, hooks, and audit pipelines don't have to string-parse
    the message to distinguish the two cases.

The mechanism that should already have prevented the original incident
— timeout treated as deny, BLOCKED result, post hook fires with
`choice="timeout"` — is unchanged. This commit hardens only the
agent's reading of the result.

Tests:
  - test_timeout_returns_approved_false_with_no_consent — pins the
    return shape on the Slack-shaped notify_cb-registered path
  - test_timeout_message_is_emphatic_against_retry_and_rephrase —
    pins the exact phrases the message must contain
  - test_explicit_deny_carries_same_no_consent_shape — same contract
    on explicit /deny
  - test_timeout_emits_post_hook_with_timeout_outcome — pins the
    post_approval_response hook payload so audit plugins can act

329 approval tests passing (4 new + 325 existing).

Fixes #24912
2026-05-23 02:59:13 -07:00
Teknium 6855d17753 fix(memory): guard against external drift in MEMORY.md/USER.md (#26045) (#30877)
Reproduction (production, 2026-05-14): two concurrent sessions on the
same agent. Session A patches MEMORY.md directly via the patch tool,
appending ~8KB of structured content (Vendor Master, Standing Orders,
Pin Board) — none of it through the memory tool, so no § delimiters.
Session B starts later with stale in-memory state (1 entry, ~331
chars). Session B calls memory(action=replace) on its one known
entry. The tool's _read_file parses A's content as a single 8KB
'entry' (no § splits), then replace truncates that entry to B's new
333-byte content. ~8KB of structured content silently destroyed.

The atomic-rename write path is fine in isolation. The bug is the
implicit contract: the tool assumes MEMORY.md is exclusively a
§-delimited list of small entries it wrote, but the v0.13 install
runbook itself uses 'cat >> MEMORY.md' for onboarding, the patch tool
edits the file directly, and operators do too.

Fix: a drift guard in MemoryStore._detect_external_drift that fires
on either signal:

  1. Re-parse + re-serialize doesn't produce identical bytes
     (catches oddly-encoded delimiters / partial writes).
  2. Any single parsed entry exceeds the store's whole-file char
     limit. The tool budgets the ENTIRE store against that limit
     (2200 chars for memory, 1375 for user), so no tool-written
     entry can legitimately be larger. An entry bigger than the
     store limit means an external writer dropped free-form content
     into what the tool will treat as one entry.

When drift fires, _reload_target writes a .bak.<ts> snapshot of the
on-disk file, then add/replace/remove refuse to flush. The original
file stays untouched. The error dict surfaces the .bak path AND a
remediation string ('integrate missing entries via memory(add=...)
one at a time, then rewrite the file clean') so the model can act on
it without escalating to the operator.

Tests:
  - test_replace_refuses_on_drift, test_add_refuses_on_drift,
    test_remove_refuses_on_drift — all three mutators refuse
  - test_clean_file_does_not_trigger_drift — false-positive check
  - test_error_message_points_at_remediation — error string shape
  - test_drift_guard_also_protects_user_target — USER.md too
  - test_drift_backup_filename_is_unique_per_invocation — bak.<ts>
    naming pin

144 memory tests passing (was 137; +7).

Fixes #26045
2026-05-23 02:51:29 -07:00
Teknium cc93053b42 fix(xai-oauth): apply WKE disambiguator to recovery-path catch-all (#29344)
_recover_with_credential_pool had a second classification site that blanket-
treated any 403 against xai-oauth as entitlement (defense-in-depth for
#26847).  That override defeated the new _is_entitlement_failure
disambiguator from the parent commit — bad-credentials 403s still
short-circuited the refresh path.

Apply the same WKE-unauthenticated / OAuth2-validation-phrase guard at
the override site so xAI's authoritative 'this is auth, not entitlement'
signal wins there too.  The #26847 catch-all still triggers for genuine
entitlement bodies that don't carry the disambiguator.

Closes the end-to-end gap exposed by
test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403.
2026-05-23 02:48:13 -07:00
xxxigm b5ea6a5c80 test(xai-oauth): regression coverage for the bad-credentials disambiguator (#29344)
Eleven new tests pinning the #29344 fix.  Layout mirrors the existing
"Fix D" entitlement section so the bad-credentials disambiguator
sits alongside the entitlement-block tests it complements.

Classifier-level coverage:

* ``test_is_entitlement_failure_false_for_bad_credentials_wke_suffix``
  — verbatim shape from the reporter's wire capture
  (``{code: 'caller does not have permission', error: 'OAuth2 access
  token could not be validated. [WKE=unauthenticated:bad-credentials]'}``)
  ↦ classifier must return False so the refresh path runs.
* ``test_is_entitlement_failure_false_for_wke_suffix_in_normalized_shape``
  — same body after ``_extract_api_error_context`` has rewritten it
  to ``{reason, message}``.  The disambiguator must fire in BOTH
  shapes; without this guard the production call site at
  ``_recover_with_credential_pool`` (which goes through the
  normalised extractor) would still misclassify.
* ``test_is_entitlement_failure_false_for_any_wke_unauthenticated_variant``
  — parametrised forward-compat: ``bad-credentials``,
  ``expired-token``, ``revoked``, ``some-future-reason``.  xAI
  documents the prefix as stable, the suffix after the colon as a
  reason code that can grow; every variant under
  ``unauthenticated:`` must route to refresh.
* ``test_is_entitlement_failure_false_via_oauth2_validation_phrase_alone``
  — belt-and-braces guard: if a future API revision drops the WKE
  suffix but keeps "OAuth2 access token could not be validated", we
  still classify correctly.
* ``test_is_entitlement_failure_wke_signal_overrides_entitlement_keywords``
  — defensive: if a body ever carries BOTH the WKE suffix and
  entitlement language, the WKE signal wins.  Auth is recoverable;
  entitlement isn't, and a refreshed token will resurface the
  entitlement message on the next request.
* ``test_is_entitlement_failure_case_insensitive_wke_match`` —
  pins that the classifier lowercases the haystack so a future xAI
  build that uppercases the prefix doesn't reintroduce the bug.

Recovery-path coverage (end-to-end through
``_recover_with_credential_pool``):

* ``test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403``
  — the headline test the reporter requested: a bad-credentials 403
  with the exact wire body must call ``try_refresh_current()``
  exactly once and ``_swap_credential`` once.  Pre-fix this returned
  ``(False, _)`` because the entitlement classifier over-matched and
  short-circuited the refresh path.
* ``test_recover_with_credential_pool_still_blocks_real_entitlement``
  — companion regression guard for #26847: a pure unsubscribed-
  account body (no WKE suffix, no OAuth2-validation phrase) must
  still surface as entitlement and skip refresh.  The new
  disambiguator must not weaken the original loop-protection it
  was added to preserve.

The scaffolding reuses ``_make_codex_agent``, ``_FakePool``, and the
existing ``MagicMock`` patterns from the surrounding tests so the
new section reads as a natural extension of "Fix D" rather than a
separate test file.
2026-05-23 02:48:13 -07:00
xxxigm 8b3cb930c9 fix(xai-oauth): honor [WKE=unauthenticated:...] disambiguator in entitlement classifier (#29344)
``_is_entitlement_failure`` over-matched on xAI 403s.  xAI returns the
same permission-denied ``code`` text for two distinct conditions:

  1. Unsubscribed account ("active Grok subscription. Manage at
     https://grok.com" in the ``error`` field).
  2. Stale OAuth access token ("OAuth2 access token could not be
     validated. [WKE=unauthenticated:bad-credentials]" in the ``error``
     field).

The classifier's "does not have permission + grok" substring heuristic
treated both identically, so the credential-pool refresh path was
short-circuited for case (2) — long-running TUI sessions stuck on a
stale OAuth token surfaced a non-retryable client error and the user
had to exit + reopen the TUI to recover (the startup-resolve path
bypasses the classifier entirely, which is why bridge adapters with
proactive refresh cadences didn't see this in practice).

This patch adopts the reporter's recommended fix (option 1, tightest):
honor xAI's explicit ``[WKE=unauthenticated:...]`` suffix and the
``OAuth2 access token could not be validated`` phrasing as
authoritative "this is auth, not entitlement" signals.  When either
appears anywhere in the body's text fields, the classifier returns
False eagerly — *before* the entitlement keyword checks run — so the
refresh-on-401 path takes over and the existing loop-protection still
guards against runaway refresh storms if the refresh itself fails.

Two small adjustments fall out of this:

* The haystack now also covers ``code`` and ``error`` keys directly,
  not just the ``message``/``reason`` shape ``_extract_api_error_context``
  produces.  Real runtime paths use the normalised shape, but the test
  suite and any future call sites that pass raw bodies get the same
  treatment.  Backwards compatible: missing keys default to empty
  strings, the haystack still skips when everything is blank.

* Both disambiguator checks fire BEFORE the entitlement keyword
  checks.  If a future xAI body somehow lands with both an entitlement
  message AND the WKE suffix, the WKE suffix wins (correct — auth is
  recoverable; entitlement is not, and a refreshed token will surface
  the entitlement message on the next request anyway).

Existing tests (``test_is_entitlement_failure_matches_real_xai_bodies``,
``test_is_entitlement_failure_false_for_unrelated_auth_errors``,
``test_recover_with_credential_pool_skips_refresh_on_entitlement_403``,
``test_recover_with_credential_pool_still_refreshes_genuine_auth_failure``)
continue to pass unchanged — the unsubscribed-account path, the
generic auth-error path, and the refresh-on-401 path are all left
intact.
2026-05-23 02:48:13 -07:00
Teknium 64b3eb0dd7 docs: surface Nous Portal on pages where it solves a real problem the page describes (#30874)
Follow-up to #30869. Adds Portal mentions on user-facing pages that
naturally call for an LLM + tool credentials but didn't previously
acknowledge Portal as a one-stop option.

- getting-started/installation.md: tip after the 'after install' block
  pointing at 'hermes setup --portal' for users who want everything wired
  at once instead of piecewise via 'hermes model' + 'hermes tools'.
- user-guide/configuring-models.md: small tip near the top — the page is
  literally about provider/model choice and previously had zero Portal
  mention.
- user-guide/features/voice-mode.md: Prerequisites need both an LLM and
  TTS — a Portal subscription is the single setup that covers both.
- user-guide/features/batch-processing.md: highlights Portal as a
  predictable-cost option for parallel agent runs that hit many APIs.
- user-guide/features/api-server.md: backend needs models + tools; one
  Portal sub gives a fully-equipped OpenAI-compatible endpoint.
- user-guide/windows-native.md: early-beta users on Windows benefit most
  from skipping per-tool Windows-key-juggling.
- integrations/providers.md: updates the existing Tool Gateway tip and
  the Nous Portal section to mention the new commands.
- user-guide/features/fallback-providers.md: Nous row in the provider
  table now lists 'hermes setup --portal' as the fresh-install path.

Tone discipline: one Portal mention per page, concrete CLI commands
(no marketing copy), always solving a problem the page itself sets up.
2026-05-23 02:47:53 -07:00
Teknium f3fb7899d0 docs: surface 'hermes setup --portal' and 'hermes portal' across user-facing pages (#30869)
PR #30860 added a one-shot Portal setup command and a small portal CLI
surface. Update the docs so the new commands are discoverable without
upgrading the tone of existing Portal mentions.

- getting-started/quickstart.md: small tip near Choose a Provider
  pointing at 'hermes setup --portal' as the easiest fresh-install path.
- user-guide/features/tool-gateway.md: lead the Get-Started section
  with 'hermes setup --portal' for fresh installs, keep 'hermes model'
  for already-configured users, and add 'hermes portal status / tools'
  to the activity-check commands.
- user-guide/features/{web-search,image-generation,tts,browser}.md: the
  existing 'Nous Subscribers' tip blocks now name the one-shot command
  for new installs, keeping the existing 'hermes tools' path for users
  who only want to swap a single backend.
- reference/cli-commands.md: register 'hermes portal' in the top-level
  command table, add a 'hermes portal' section with subcommands, and
  add '--portal' to the 'hermes setup' options table.

Tone: each page already had a Portal mention. This PR keeps the per-page
count to one and uses concrete CLI commands rather than promotional copy.
Tool Gateway page is the one exception (the whole doc is about Portal).
2026-05-23 02:42:31 -07:00
Teknium 9acf949e34 feat(telegram): edit status messages in place instead of appending (#30864)
Closes #30045. Based on @qike-ms's PR #30141.

Telegram status callbacks (lifecycle, compression, context-pressure)
used to append a fresh bubble on every emit. Now adapter tracks
{(chat_id, status_key) -> message_id}; first call sends, subsequent
calls edit. Failed edits drop the cache entry and fall through to a
fresh send.

- gateway/platforms/telegram.py: send_or_update_status() (+34 LOC)
- gateway/run.py: route _status_callback_sync through it when the
  adapter supports it; plain adapter.send() otherwise (+15 LOC)
- 5 tests covering first send / edit-in-place / edit-failure fallback
  / distinct key & chat isolation
2026-05-23 02:42:10 -07:00
Teknium 4b6d68bd64 test(fast-command): stub _load_gateway_runtime_config too
PR 2362cc468 ("fix(gateway): enforce env variable template expansion
on runtime config loaders") refactored `_load_service_tier` to read
config via the new `_load_gateway_runtime_config` wrapper instead of
opening `_hermes_home/config.yaml` directly. The
`test_run_agent_passes_priority_processing_to_gateway_agent` test still
only stubbed `_load_gateway_config` (the inner loader), so the runtime
wrapper saw an empty config and `_load_service_tier` returned None,
breaking the test:

  FAILED tests/gateway/test_fast_command.py::test_run_agent_passes_priority_processing_to_gateway_agent
   - AssertionError: assert None == 'priority'

Fix: also stub `_load_gateway_runtime_config` to return the expected
`agent.service_tier=fast` config, so the test once again drives the
priority routing path it was written to verify.

Confirmed reproducing on current main before the patch and passing
after.
2026-05-23 02:40:33 -07:00
Zyrixtrex 61ac118724 fix(webhook): enforce INSECURE_NO_AUTH safety rail on dynamic route reloads 2026-05-23 02:39:12 -07:00
Teknium b4cf5b65dd feat(portal): one-shot setup, status CLI, and Nous-included markers (#30860)
* feat(portal): one-shot setup, status CLI, and Nous-included markers

Four small Portal-aware surfaces that drive subscription value without
adding friction for non-Portal users.

  - hermes setup --portal: one-shot Nous OAuth + provider switch + Tool
    Gateway opt-in. Shareable as a single command from docs/social.
  - hermes portal {status,open,tools}: small surface over Portal auth +
    Tool Gateway routing. Defaults to 'status' when no subcommand.
  - Tool picker (hermes tools): when the user is logged into Nous, mark
    Nous-managed provider rows with a star and 'Included with your Nous
    subscription'. Suppressed when not authed — non-subscribers see the
    picker unchanged.
  - BYOK setup hint: a single dim line 'Available through Nous Portal
    subscription.' appears when the user is being prompted for a paid
    API key (Firecrawl, FAL, ElevenLabs, Browserbase, etc.) AND the
    category has a Nous-managed sibling AND the user is not already
    authed to Nous. Suppressed in all other cases.

Tested live end-to-end in an isolated HERMES_HOME with a simulated
authed and unauthed user. Targeted suite (tests/hermes_cli/
test_tools_config.py + test_setup.py) passes 97/97.

* fix: add portal to _BUILTIN_SUBCOMMANDS so plugin discovery fast-path skips it
2026-05-23 02:39:09 -07:00
Teknium 6942b1836e fix(skills_guard): explain why --force is rejected on dangerous verdicts
Follow-up to @sprmn24's verdict-logic fix. The previous block-message
ended in 'Use --force to override' regardless of verdict — but as of
the --force fix above, dangerous community/trusted skills can't be
overridden by --force at all. The misleading hint sends users in a
loop. Replace it with a specific message that tells them what the
documented behavior actually is.

Adds two regression tests covering the dangerous-verdict message
shape and one that pins the existing --force hint for non-dangerous
blocks.
2026-05-23 02:37:30 -07:00
sprmn24 789043b691 fix(security): update tests for verdict and --force changes 2026-05-23 02:37:30 -07:00
sprmn24 0f8215f633 fix(security): correct verdict logic and enforce --force limitation in skills_guard
- _determine_verdict() returned 'caution' for medium/low-only findings,
  causing community skills with harmless patterns (e.g. path traversal
  notation, unpinned pip install) to be incorrectly blocked. Now returns
  'safe' when only medium/low severity findings are present.

- should_allow_install() allowed --force to override 'dangerous' verdict,
  contradicting documented behavior that --force does NOT override dangerous
  scan results. Added explicit check to prevent force-installing skills
  with dangerous verdict.
2026-05-23 02:37:30 -07:00
Teknium db489a315f fix(tests): allowlist tmp_path for kanban_notify artifact delivery (#30852)
`_deliver_kanban_artifacts` routes candidates through
`BasePlatformAdapter.filter_local_delivery_paths` (added in 41d2c758c),
which rejects paths outside `MEDIA_DELIVERY_SAFE_ROOTS`. The two
artifact-delivery tests create fixtures under `tmp_path`, which lives
outside the cache roots — so under CI's hermetic HOME the filter
silently dropped both fake files and the assertions on
`images_uploaded` / `documents_uploaded` failed.

Fix: monkeypatch `HERMES_MEDIA_ALLOW_DIRS=str(tmp_path)` in both tests
so the safety filter accepts the fixtures. Production behaviour
unchanged; test-side fix only.

CI fail repro on origin/main: test (6) shard, both
test_notifier_uploads_artifacts_on_completion and
test_notifier_artifact_delivery_skips_missing_files.
2026-05-23 02:34:34 -07:00
xxxigm 5b6f0b695b test(tls-fd-recycle): pin shutdown-only + thread-aware close contract (#29507)
Ten regressions across both prongs of the #29507 fix, organised so each
test names exactly which way the bug could come back:

Prong 1 — ``force_close_tcp_sockets``:
* ``shutdown_only_no_close`` is the smoking-gun assertion. If a future
  refactor adds back ``sock.close()`` to this helper, the FD-recycling
  race that wrote TLS bytes on top of ``kanban.db`` is back, and this
  trips.
* ``uses_shut_rdwr`` pins that both halves are shut down (a half-close
  wouldn't unblock a worker stuck in ``recv``).
* ``swallows_oserror_on_shutdown`` covers the already-shutdown case.
* ``handles_multiple_pool_entries`` walks all pool connections.

Prong 2 — thread-aware ``_close_request_client_once``:
* ``stranger_thread_aborts_only_no_close`` simulates the asyncio_0 →
  Thread-1616 interrupt path: stranger drives abort, holder stays
  populated for the worker's eventual finally.
* ``owner_thread_pops_and_full_close`` is the worker-thread path: pops
  + full close.
* ``stranger_then_owner_close_sequence_runs_full_close_exactly_once``
  replays the reporter's exact timeline at object level: abort runs
  once, full close runs once, holder ends empty.

Agent surface:
* ``_abort_request_openai_client_does_not_call_client_close`` pins
  that the new entrypoint shuts sockets and emits the
  ``deferred_close=stranger_thread`` marker but never calls
  ``client.close()``.
* ``_abort_request_openai_client_null_client_is_noop`` defensive.

End-to-end:
* ``fd_recycle_window_closed_by_shutdown_only`` reproduces the race
  at object level — runs the abort path from a stranger thread and
  asserts that no ``close()`` ever fires, so the kernel can never
  recycle the FD under the owner's still-active reference.
2026-05-23 02:31:10 -07:00
xxxigm 30c22f1158 fix(api-call): defer client.close() to owning worker thread on interrupt (#29507)
Layer-2 defense for the FD-recycling race: even with
``force_close_tcp_sockets`` reduced to shutdown-only, the followup
``client.close()`` in ``_close_openai_client`` still walks the httpx
pool and closes sockets — and if called from a stranger thread (the
interrupt-check loop, the stale-call detector) it has the same
FD-recycling exposure that wrote a TLS record on top of ``kanban.db``.

Stamp the request_client_holder with the owning thread's ident at
``_set_request_client`` time. In ``_close_request_client_once``:

* Owning thread (the worker's ``finally``) → pop + ``client.close()``
  via ``_close_request_openai_client``, exactly as before.
* Stranger thread → ``_abort_request_openai_client`` (new): only
  ``shutdown(SHUT_RDWR)`` the pool sockets and log a deferred-close
  marker. The holder stays populated so the worker's eventual
  ``finally`` performs the real close from its own thread context,
  where the FD release races nothing.

Applied symmetrically to both the non-streaming
``interruptible_api_call`` and the streaming variant — both routinely
get hit by stranger-thread interrupts.

The log field ``tcp_force_closed=N`` keeps its existing shape; the new
abort path adds ``deferred_close=stranger_thread`` so production
triage can distinguish the two close kinds.
2026-05-23 02:31:10 -07:00
xxxigm e2a7d73a66 fix(force_close_tcp_sockets): shutdown only, do not release FD (#29507)
The helper used to call ``socket.shutdown(SHUT_RDWR)`` followed by
``socket.close()`` to drop CLOSE-WAIT entries immediately. On its own
``shutdown()`` is safe from any thread — it only sends FIN and breaks
pending ``recv``/``send`` — but ``close()`` releases the FD integer to
the kernel. When the helper runs on a stranger thread (the interrupt
loop, the stale-call detector) the FD release races the owning httpx
worker thread that still has the same integer cached inside the SSL
BIO. The kernel then recycles that integer to the next ``open()`` call
— in production, kanban dispatcher's ``kanban.db`` — and the worker's
delayed TLS flush writes a 24-byte TLS application-data record on top
of the SQLite header.

Restrict the helper to ``shutdown(SHUT_RDWR)`` only. The owning httpx
worker's own unwind will close the underlying socket via the same
Python ``socket.socket`` object, which atomically swaps ``_fd`` to -1
before issuing ``close(2)`` — no FD-aliasing window.

The log field ``tcp_force_closed=N`` is kept (now counts shutdowns) so
existing dashboards / log parsers keep working.
2026-05-23 02:31:10 -07:00
sprmn24 53cb6d32be fix(agent): use atomic_json_write for request debug dumps instead of bare write_text 2026-05-23 02:30:57 -07:00
sprmn24 b183be95a2 fix(gateway-windows): atomic write for .cmd and startup launcher scripts 2026-05-23 02:30:41 -07:00
walli 60b0a0e006 fix(qqbot): fix SILK magic byte detection slice length
_guess_ext_from_data: data[:5] == b"#!SILK" -> data[:6] (6-byte string)
_looks_like_silk: data[:4] == b"#!SILK" -> data[:6]

The previous slices were too short to ever match the 6-byte "#!SILK"
literal, relying entirely on the "#!SILK_V3" (9-byte) and 0x02! (2-byte)
fallback paths for SILK format detection.
2026-05-23 02:27:17 -07:00
walli 0e7448d63a fix(qqbot): use original attachment filename for cached files
Add original_name parameter to _download_and_cache, preferring the
attachment metadata filename over the CDN URL path basename. Previously
files were cached with meaningless QQ CDN hash names (e.g.
qqdownload_...oadftnv5), causing ugly filenames when sent back to users.

Aligns with qqbot-agent-sdk's AttachmentDownloader.download_document.
2026-05-23 02:27:17 -07:00
walli a54f5afc70 fix(qqbot): handle op 7/9 and expand fatal close code set
1. Handle op 7 (Server Reconnect): close WS to trigger reconnect loop
   while preserving session for Resume
2. Handle op 9 (Invalid Session): check d value to determine if session
   is resumable; clear session only when not resumable
3. Remove 4009 from session-clearing set (connection timeout is resumable)
4. Expand fatal close codes: 4001/4002/4010-4014 now stop reconnect
   immediately instead of retrying uselessly
5. Add unit tests
2026-05-23 02:27:17 -07:00
walli bbd77d165c fix(qqbot): add INTERACTION intent and expose video/file cached paths
1. Add INTERACTION intent bit (1<<26) to _send_identify, fixing approval
   button clicks not being received (INTERACTION_CREATE events were never
   dispatched by the gateway)
2. Include local cached path in video/file attachment descriptions so the
   LLM can reference files for re-sending to users
3. Add unit tests (TestIdentifyIntents, TestProcessAttachmentsPathExposure)
2026-05-23 02:27:17 -07:00
teknium1 66d81f9e14 fix(gateway): don't swallow expansion errors in runtime config helper
A bare except in _load_gateway_runtime_config would silently return the
unexpanded dict on any _expand_env_vars failure — masking the very bug
this helper exists to fix. Drop it; let the caller see real errors.
2026-05-23 02:27:08 -07:00
QuenVix 2362cc4688 fix(gateway): enforce env variable template expansion on runtime config loaders 2026-05-23 02:27:08 -07:00
QuenVix d21ac579e9 fix(gateway): honor key_env in auth-failure fallback resolution 2026-05-23 02:25:53 -07:00
Teknium 99671a8634 test(kanban): allow tmp_path artifacts past media-delivery validator
PR #41d2c758c ("Fix unsafe gateway media path delivery") tightened
`validate_media_delivery_path` so that artifacts emitted by the agent
must live inside `MEDIA_DELIVERY_SAFE_ROOTS` (Hermes-managed cache
dirs) or an operator-allowlisted root via `HERMES_MEDIA_ALLOW_DIRS`.

Two kanban-notifier tests put their PDFs and PNGs under pytest's
`tmp_path`, which is correctly rejected by the new validator. They
started failing on main as soon as that PR landed:

  FAILED tests/hermes_cli/test_kanban_notify.py::test_notifier_uploads_artifacts_on_completion
  FAILED tests/hermes_cli/test_kanban_notify.py::test_notifier_artifact_delivery_skips_missing_files

Symptom in logs: "Skipping unsafe local file path outside allowed
roots". The validator is doing exactly what it should — the tests were
relying on the looser pre-fix behaviour.

Fix: add `HERMES_MEDIA_ALLOW_DIRS=tmp_path` to the `kanban_home`
fixture so artifacts under `tmp_path` are recognised as safe. This is
the same allowlist mechanism the operator-facing env var documents.
2026-05-23 02:25:09 -07:00
Teknium 5772e638c9 chore: drop in-repo infographic/ directory; keep PR-body URLs only (#30854)
PR infographics belong in PR descriptions, not committed to the repo.
Removes the 13 archived directories under infographic/ and adds the path
to .gitignore so future generations don't accidentally land in-tree.

The fal.media URLs embedded in each PR's body remain the canonical
artifact — those PR descriptions are the storage.
2026-05-23 02:25:03 -07:00
sprmn24 b2e6fdd3bf fix(agent): log warning when fallback model normalization fails instead of silently swallowing 2026-05-23 02:23:24 -07:00
teknium1 70aaa774be fix(opencode-go): emit Kimi reasoning_effort, match KimiProfile shape
The Kimi K2 branch added in the prior commit only emitted extra_body.thinking
and dropped reasoning_effort entirely. KimiProfile (api.moonshot.ai/v1) sends
both fields, and OpenCode Go proxies to the same Moonshot backend. Mirror that
shape on the Go path so /reasoning effort actually reaches Kimi.

- low/medium/high pass through verbatim
- xhigh/max clamp to high (Moonshot's max supported value)
- minimal / unknown effort → omit reasoning_effort, keep thinking on
- disabled / no config → unchanged
- DeepSeek branch unchanged
2026-05-23 02:20:28 -07:00
Harish Kukreja 3589960e03 fix(provider): expose OpenCode Go reasoning controls 2026-05-23 02:20:28 -07:00
helix4u 71291d83cd test: keep tirith checks hermetic 2026-05-23 02:20:14 -07:00
QuenVix 52a368fa72 fix(gateway): preserve WhatsApp pairing approvals across JID/LID alias flips 2026-05-23 01:46:34 -07:00
Teknium 3127a41cb1 test(acp): pin parse_model_input in slash-command tests
The two ACP slash-command tests that exercise `provider:model` routing
(`test_set_session_model_accepts_provider_prefixed_choice` and
`test_model_switch_uses_requested_provider`) relied on the live
`hermes_cli.models._KNOWN_PROVIDER_NAMES` / `_PROVIDER_ALIASES` module
state to parse `anthropic:claude-sonnet-4-6` into
`("anthropic", "claude-sonnet-4-6")`. If any earlier test in the same
xdist worker registers a custom provider that shadows `anthropic` or
otherwise mutates those globals, the parser falls into the
`detect_provider_for_model` branch and resolves to `custom` instead.

Observed once in CI on run 26326728502 / job 77505732299 as
`AssertionError: assert 'custom' == 'anthropic'` — could not reproduce
locally under per-file isolation, so the failing in-file order was
specific to a particular xdist scheduling.

Monkeypatching `parse_model_input` + `detect_provider_for_model` for
both tests removes the global-catalog dependency, so the tests now only
exercise what they were written to verify (the `requested_provider ->
runtime -> AIAgent kwargs` plumbing).
2026-05-23 01:44:56 -07:00
xxxigm 6a2df9f451 docs(env): clarify HERMES_ENABLE_PROJECT_PLUGINS contract (#29156)
The reference entry now documents the truthy set
(``1`` / ``true`` / ``yes`` / ``on``) explicitly, matches the
falsy half (``0`` / ``false`` / ``no`` / ``off`` / empty string)
that the GHSA-5qr3-c538-wm9j fix re-aligned both the agent loader
and the dashboard web server around, and points readers at the
defence-in-depth rule that project plugins never have their
Python ``api`` file auto-imported by the dashboard regardless of
the env var.
2026-05-23 01:43:52 -07:00
xxxigm 8bf99227f0 fix(plugins): block plugin-api path traversal + project RCE (#29156)
GHSA-5qr3-c538-wm9j — half two of the bypass chain.

``_mount_plugin_api_routes`` imports each dashboard plugin's
manifest ``api`` field as a Python module via
``importlib.util.spec_from_file_location`` — arbitrary code
execution by design.  Two primitives in the surrounding code
turned that "by design" RCE into a usable attack:

1. Absolute paths in the manifest swallow the plugin directory.
   ``Path('safe/dashboard') / '/tmp/evil.py'`` resolves to
   ``/tmp/evil.py``, so a single manifest line
   ``{"api": "/tmp/payload.py"}`` was enough to redirect the
   importer at any Python file on disk.
2. ``..`` traversal in the manifest climbs out of the dashboard
   directory.  ``Path('plugins/safe/dashboard') /
   '../../../tmp/evil.py'`` lands in ``/tmp/evil.py`` after
   ``resolve()`` — the static-asset handler
   (``serve_plugin_asset``) already defends against this via
   ``is_relative_to``; the api-mount path didn't.

Fix at three layers so a regression in any one can't re-open the
advisory:

* New ``_safe_plugin_api_relpath`` validator runs at *discovery*
  time and stores only sanitised relative paths on the plugin
  entry's ``_api_file`` field.  Absolute paths, ``..`` traversal,
  empty / non-string values, and paths that ``resolve()`` outside
  the plugin's ``dashboard/`` directory are rejected with a
  warning naming the plugin.  ``has_api`` follows the sanitised
  value so the dashboard frontend doesn't render a fake "Backend
  API" badge for plugins whose api was scrubbed.
* ``_mount_plugin_api_routes`` re-validates the resolved path
  against the live filesystem just before the import — defence in
  depth in case ``_dir`` is tampered with post-cache or a future
  caller bypasses the discovery-time validator.
* Project plugins (``source == "project"``) are refused outright
  for backend import.  ``./.hermes/plugins/`` ships with the CWD,
  so any threat model that includes "user opens a malicious repo"
  treats it as attacker-controlled; project plugins can still
  extend the UI via static JS/CSS but their Python ``api`` is no
  longer auto-imported.  Combined with the truthy env-gate fix
  from the previous commit, the original advisory chain now
  fails at two distinct choke points.
2026-05-23 01:43:52 -07:00
xxxigm da636e982b test(plugins): regression coverage for project-plugin RCE chain (#29156)
35 new tests across 5 classes covering every layer of the
GHSA-5qr3-c538-wm9j defence.  Each class corresponds to one chokepoint
so a regression in any single layer is caught by the named class:

* ``TestProjectPluginsEnvGate`` (13 cases) — parametrised over both
  the documented truthy values (``1`` / ``true`` / ``yes`` / ``on``
  + uppercase variants) and the previously-bypassing falsy strings
  (``0`` / ``false`` / ``no`` / ``off`` / ``""`` / ``False``).  The
  falsy half is the direct env-bypass repro: pre-fix any non-empty
  string enabled the project source.
* ``TestApiPathSanitizer`` (16 cases) — unit-level coverage of the
  new ``_safe_plugin_api_relpath`` helper.  Absolute paths
  (``/etc/passwd``, ``/tmp/payload.py``, ``/usr/bin/python``),
  ``..``-traversal payloads (including nested ``subdir/../../..``),
  and non-string / empty / whitespace-only values must all return
  ``None``.  Safe relative paths (``api.py``, ``backend/routes.py``)
  round-trip unchanged so legitimate plugins keep working.
* ``TestDiscoveryScrubsApiField`` (3 cases) — end-to-end through
  ``_discover_dashboard_plugins`` with a real manifest on disk.
  Verifies that the cached plugin entry's ``_api_file`` is
  scrubbed *at discovery time* (``None`` + ``has_api: False``) so
  any downstream consumer can't be tricked into re-deriving the
  unsafe path from cache.
* ``TestMountApiRoutesRefusesUntrusted`` (3 cases) — pokes
  synthetic plugin entries with each refusal vector directly into
  the cache and patches ``importlib.util.spec_from_file_location``
  to assert it is *not* invoked for project-source / traversal
  payloads, and *is* invoked normally for bundled / user plugins.
* ``TestEndToEndPocBlocked`` (1 case) — reproduces the original
  advisory PoC: operator sets ``HERMES_ENABLE_PROJECT_PLUGINS=0``
  believing project plugins are off, attacker plants a manifest in
  CWD's ``.hermes/plugins/`` with ``api`` pointing at an absolute
  payload path.  Asserts that the importer is never called against
  the payload path *and* that ``hermes_dashboard_plugin_evil`` is
  not in ``sys.modules`` after the mount routine runs.

An autouse fixture busts ``_dashboard_plugins_cache`` before and
after each test so the production cache (populated by the
import-time ``_mount_plugin_api_routes()`` call) can't bleed in.
All 12 pre-existing dashboard-plugin tests in
``test_web_server.py`` still pass unchanged.
2026-05-23 01:43:52 -07:00
xxxigm 09f85f2cf7 fix(plugins): apply truthy env semantics to project-plugin gate (#29156)
GHSA-5qr3-c538-wm9j — half one of the bypass chain.

``_discover_dashboard_plugins`` opted into the untrusted ``./.hermes/
plugins/`` source via ``if os.environ.get("HERMES_ENABLE_PROJECT_
PLUGINS"):`` — which is True for any non-empty string.  ``=0``,
``=false``, ``=no``, ``=off`` all return non-empty strings and so
*enabled* the project source even though every operator (and the
agent loader, ``hermes_cli/plugins.py`` line 815) reads those values
as "disabled".  An attacker who can land a manifest under the CWD's
``.hermes/plugins/`` directory — a malicious cloned repo, a worktree
checked out from a forked PR, a CI runner workspace — was therefore
guaranteed to get their manifest discovered the moment the user ran
``hermes dashboard`` from that directory, regardless of whether the
user thought they had project plugins disabled.

Switch to the shared ``utils.env_var_enabled`` helper used by the
agent loader so the gate accepts the documented truthy set (``1`` /
``true`` / ``yes`` / ``on``, case-insensitive) and treats everything
else — including ``0`` / ``false`` / ``no`` — as off.

Half two (path-traversal + project-source ``api`` import) lands in
the next commit.  Together they break the RCE chain at two distinct
choke points so a future regression in either one alone can't
re-open the advisory.
2026-05-23 01:43:52 -07:00
Teknium 11e6dd3c60 chore(release): add AUTHOR_MAP entry for egilewski (PR #30432) (#30833) 2026-05-23 01:41:31 -07:00
Eugeniusz Gilewski 41d2c758c3 Fix unsafe gateway media path delivery 2026-05-23 01:40:35 -07:00
Markus 4a91e36495 fix(gateway): separate observed Telegram group context 2026-05-23 01:33:42 -07:00
Teknium 729a778af0 infographic: PR #17659 read-deny credentials salvage 2026-05-22 20:15:09 -07:00
Teknium 97e975edd2 fix(file-safety): widen read-deny to .env, mcp-tokens/, webhook secrets, root
Extends @briandevans's PR #17659 from {auth.json, auth.lock,
.anthropic_oauth.json} to also cover:

  - HERMES_HOME/.env                       (provider API keys)
  - HERMES_HOME/webhook_subscriptions.json (per-route HMAC secrets)
  - HERMES_HOME/mcp-tokens/                (OAuth token directory; dir
                                            + everything inside)

…AND iterates over both _hermes_home_path() AND _hermes_root_path()
so profile-mode runs (HERMES_HOME = <root>/profiles/<name>) also block
<root>/{auth.json, .env, mcp-tokens/, ...}. Same widening shape as the
write-deny side already does (#15981, #14157).

Explicitly NOT a security boundary. Per the personal-assistant trust
model, the terminal tool runs as the same OS user and can `cat
auth.json` directly. This read-deny exists as defense-in-depth:

  - Models that respect tool denials empirically tend to stop rather
    than reach for the shell.
  - The denial surfaces an audit trail when something tries to read
    credentials — easier to spot in logs than a generic `cat`.

Docstring + error message both flag this as defense-in-depth so future
contributors don't mistake it for a real security boundary and don't
re-decline reports that propose the same fix shape.

Absorbs the .env and mcp-tokens/ coverage from @tomqiaozc's parallel
PR #8055 (closed-as-duplicate, credited).

Co-authored-by: Tom Qiao <zqiao@microsoft.com>
2026-05-22 20:15:09 -07:00
briandevans 567ea61298 fix(file-safety): block auth.json read via TERMINAL_CWD relative path
read_file_tool resolves relative paths against TERMINAL_CWD (or the
task's live terminal cwd), but the prior call passed the original
unresolved string to get_read_block_error. That function's own
resolve() is anchored at the Python process cwd, so when a task's
TERMINAL_CWD pointed at HERMES_HOME and the agent issued read_file
on the relative path "auth.json", the credential-store denylist was
never reached and the file was read normally.

Pass the already-resolved absolute path string at the file_tools call
site, document the contract on get_read_block_error, and add a
read_file_tool-level regression test that pins the relative-path
case under TERMINAL_CWD == HERMES_HOME.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 20:15:09 -07:00
briandevans 056e00a77e fix(file-safety): block read_file on HERMES_HOME credential stores (#17656)
`get_read_block_error` previously only denied reads inside
`${HERMES_HOME}/skills/.hub`, which left `auth.json` (provider OAuth
state + plaintext API keys) and `.anthropic_oauth.json` (Anthropic PKCE
tokens) directly readable by the agent. A prompt-injection reaching
`read_file` could exfiltrate active provider credentials in plaintext.

Mode-0600 file permissions only protect against *other Unix users* —
the agent runs as the file's owner, so `read_file` is unaffected.

Extend the existing deny list with the three credential paths
identified in #17656 (`auth.json`, `auth.lock`, `.anthropic_oauth.json`).
The check uses the same `Path.resolve()` pattern as `skills/.hub`, so
symlink/path-traversal indirection is caught too. The agent doesn't
need to read these directly — `auxiliary_client` and `credential_pool`
consume them through process env / OAuth flows that bypass `read_file`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 20:15:09 -07:00
Teknium 7f7245bf62 infographic: PR #6656 skill hub safety audit salvage 2026-05-22 19:59:24 -07:00
Teknium 3f78d8073c fix(skills): make content_hash filename-sensitive too (symmetric with bundle_content_hash)
PR #6656 added rel_path + \x00 prefixing to ``bundle_content_hash`` so a
filename swap between two files in a bundle changes the digest. But it
only patched the in-memory side — ``content_hash`` in ``tools/skills_guard.py``
(the on-disk equivalent) still hashed file contents only.

These two functions need to stay symmetric: ``check_for_skill_updates``
compares the disk hash of an installed skill against the bundle hash
of the upstream copy. With the asymmetric fix, every clean install
showed as drifted because the digests no longer matched
(2 existing tests in ``test_skills_hub.py`` started failing as soon as
the contributor's change landed).

Apply the same ``rel_path + \x00 + content`` shape to the disk-side
function. Both functions now produce the same digest for the same skill
content laid out two ways. Documented the symmetry invariant in the
docstring so a future change to either function knows to touch both.

Also adds tests/tools/test_pr_6656_regressions.py with 10 regression
tests covering all three fixes salvaged in PR #6656:
  - uninstall_skill path traversal (4 cases: parent segments, absolute
    paths, symlink escape, legitimate skill)
  - bundle_content_hash filename swap detection (4 cases: in-memory
    swap, identity, disk-side swap, bundle↔disk symmetry)
  - list_pending lock contract (2 cases: source-grep contract, smoke)

Also fixes AUTHOR_MAP entry for @aaronlab — their commit email
(1115117931@qq.com) maps to "aaronagent" which isn't a real GitHub
login, so changelog @mentions would 404.
2026-05-22 19:59:24 -07:00
aaronagent b82608a6f5 fix(skills,pairing): path traversal guard in uninstall, lock list_pending, hash file paths
- skills_hub: validate that uninstall_skill's install_path resolves
  inside SKILLS_DIR before calling shutil.rmtree, preventing recursive
  deletion of arbitrary directories via poisoned lock.json entries
- skills_hub: include file paths (not just contents) in
  bundle_content_hash so swapping filenames between files changes the
  hash, strengthening update-detection integrity
- pairing: wrap list_pending() in self._lock so _cleanup_expired() file
  writes don't race with concurrent generate_code()/approve_code() calls

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-05-22 19:59:24 -07:00
teknium1 8cf977c8b1 fix(plugins): widen _sanitize_plugin_name for category-namespaced names
Follow-up to PR #28832 — the dashboard plugin routes now accept slashed
names like `observability/langfuse` and `image_gen/openai`, but
`_sanitize_plugin_name` still rejected forward slash and so dashboard
update + remove on those plugins fell through to '404 not found' even
though they exist on disk.

Adds an opt-in `allow_subdir=True` flag that:
- Permits internal forward slashes (category-namespaced plugin keys
  emitted by `_discover_all_plugins`).
- Strips leading and trailing slashes.
- Still rejects `..` and backslash, and still asserts the resolved
  target lives inside `plugins_dir`.

Opted in at the two read-paths that operate on installed plugins:
`_require_installed_plugin` (CLI update/remove) and
`_user_installed_plugin_dir` (dashboard update/remove). The install
path keeps the default (`allow_subdir=False`) because freshly-cloned
plugins always land top-level under `~/.hermes/plugins/<name>/`.

Adds 6 targeted unit tests covering the new flag's allow/reject matrix.
2026-05-22 19:50:32 -07:00
Austin Pickett 487c398dcf refactor(web): dashboard typography & contrast pass
Removes the global `uppercase` + `font-mondwest` from the App.tsx root
that forced every page to opt-out, replaces stacked-alpha text colors
with semantic tokens for WCAG-AA contrast across all 7 themes, and
applies the new `text-display` utility from @nous-research/ui@0.16.0
on intentional brand chrome (page titles, sidebar headings, segmented
filters) only. Bumps every sub-12px arbitrary text size to text-xs.

Also widens the dashboard plugin routes (/api/dashboard/agent-plugins/
{name:path}/...) so category-namespaced plugins like observability/
langfuse and image_gen/openai can be enable/disabled from the dashboard
— previously the FE encodeURIComponent-ed the slash and the backend
{name} route rejected it. _validate_plugin_name still blocks .. and
backslash, and strips leading/trailing slash.

Touches sessions/env/keys page chrome and adds two new i18n keys
(`overview`, `showMore`/`showLess`) across all 18 locales.

Squashes 19 commits from PR #28832.

Co-authored-by: Hermes <noreply@nousresearch.com>
2026-05-22 19:50:32 -07:00
ethernet dc4b0465b5 feat(ci): use 6-way slicing based on benchmark results
Benchmarked 4/5/6/7/8 slices with LPT duration-balanced distribution:
- 4 slices: 4.8m wall, 135s spread
- 5 slices: 3.4m wall, 46s spread
- 6 slices: 3.3m wall, 26s spread ← optimal
- 7 slices: 3.9m wall, 109s spread
- 8 slices: 3.7m wall, 96s spread

6 slices is the sweet spot: lowest wall time, tightest spread.
7+ gets slower due to per-slice startup overhead dominating.

Also removes benchmark branch markers from save-durations condition.
2026-05-22 19:46:18 -07:00
ethernet e7cb5d4b68 fix: clean push triggers 2026-05-22 19:46:18 -07:00
ethernet f89afdbd17 fix(test): deflake two intermittent CI failures
- test_browser_secret_exfil: mock _run_browser_command instead of
  launching real Chrome (secret check is pre-launch, browser is
  irrelevant to the assertion)
- test_web_server: add time.sleep(0.05) after pub.send_text() to
  yield the event loop before receive_text(). TestClient's sync mode
  can race the broadcast handler otherwise, hanging the test.
2026-05-22 19:46:18 -07:00
ethernet 510df6eaf4 test: 4-way slice benchmark (with cache save) 2026-05-22 19:46:18 -07:00
ethernet b689624aee feat(ci): 4-way matrix slicing with LPT duration-balanced distribution
run_tests_parallel.py:
  - --slice I/N flag (also HERMES_TEST_SLICE env var) runs only the
    I-th slice of N, distributing files across slices by cached
    duration using LPT (Longest Processing Time first) greedy
    algorithm so each slice gets roughly equal wall time
  - Duration cache (test_durations.json): maps relative file paths to
    last-observed subprocess wall time. _save_durations merges with
    existing cache so entries from other slices are preserved.
  - Per-file subprocess timing in progress output + end-of-run
    distribution summary (percentiles, top-10 slowest, <1s/<2s counts)
  - Unknown files default to 2.0s estimate (~P50), spread evenly by LPT

.github/workflows/tests.yml:
  - Matrix strategy: slice [1, 2, 3, 4] with fail-fast: false
  - Each slice restores duration cache from main (stable key, no SHA),
    runs its portion, uploads per-slice durations as artifacts
  - save-durations job (main only, if: always()) downloads all 4
    artifacts, merges into single cache entry for future PRs
  - Timeout reduced from 60min to 30min per slice (~1/4 the work)

Cache design:
  - Stable key (test-durations) not keyed by commit SHA — durations
    are about files, not commits, and SHA-keyed caches miss on every
    new commit and on PR merge commits
  - actions/cache scoping: main's cache is visible to all PRs targeting
    main; feature branches without a cache still work (default 2.0s)
  - No dotfile prefix (upload-artifact v7 skips hidden files)
2026-05-22 19:46:18 -07:00
Teknium a84cec61ca fix(minimax-oauth): refresh short-lived access tokens per request (#30619)
* fix(minimax-oauth): refresh short-lived access tokens per request

MiniMax OAuth issues ~15-minute access tokens. The Anthropic SDK caches
api_key as a static string at client construction, so a session that
resolves credentials once at startup keeps sending the same bearer until
MiniMax returns 401 mid-session.

Swap the static string for a callable token provider, reusing the existing
Entra-ID bearer-hook infrastructure in build_anthropic_client. The callable
re-reads auth.json on each invocation and calls _refresh_minimax_oauth_state,
which is a no-op when the token still has more than 60s of life left and
refreshes proactively otherwise. Refreshes persist to auth.json so other
processes (gateway, cron) see them immediately.

The wire-up lives at the agent-init / model-switch boundary rather than in
resolve_runtime_provider, so aux client paths that hand the api_key string
to OpenAI(api_key=...) are unaffected.

* docs: add infographic for minimax-oauth token refresh
2026-05-22 15:16:15 -07:00
ethernet 2f320cb35a fix(ci): supply-chain-audit uses two-dot diff, causing false positives on stale-branch PRs
The workflow diffs base.sha..head.sha (two-dot), which compares the
tip-of-main tree directly against the PR tip. When files land on main
after a PR branched off, they appear in the diff even though the PR
never touched them — triggering false-positive findings.

Example: PR #30609 was flagged for hermes_cli/setup.py, a file added
to main by an unrelated commit after the PR branched.

Switch to three-dot diff (base.sha...head.sha), which diffs from the
merge base to the PR tip — only changes introduced by this PR are
included. Applied to all four diff commands in both jobs (scan and
dep-bounds).
2026-05-22 15:15:53 -07:00
Teknium 2233b8b244 infographic: PR #30609 Termux cold-start salvage (#30618) 2026-05-22 14:32:41 -07:00
adybag14-cyber a3beee475b perf(termux): speed up bare cli prompt startup 2026-05-22 14:27:38 -07:00
adybag14-cyber 6c3fd9714f perf(termux): fast-path cli version startup 2026-05-22 14:27:38 -07:00
Teknium d11cbb1032 infographic: PR #30591 Discord adapter → bundled plugin salvage (#30614) 2026-05-22 14:24:03 -07:00
Teknium 7849a3d73f fix(gateway,discord-plugin): _platform_status must respect is_connected=False, not silently fall back to check_fn
Two bugs surfaced by PR #24356 migrating Discord into the registry:

1. plugins/platforms/discord/adapter.py::_is_connected — read DISCORD_BOT_TOKEN
   via hermes_cli.gateway.get_env_value (the abstraction tests patch) instead
   of os.getenv directly. The legacy non-registry path used get_env_value;
   bypassing it broke test_setup_openclaw_migration which patches
   gateway_mod.get_env_value to simulate a hermetic env.

2. hermes_cli/gateway.py::_platform_status — when entry.is_connected is
   defined and returns False, return 'not configured' immediately. Don't
   fall back to entry.check_fn(), which would let 'SDK is installed'
   override 'no token configured' and incorrectly report the platform as
   ready. The fallback to check_fn is the right behaviour only when
   is_connected is None (not registered).

Fixes 5 test failures observed on CI for PR #24356:
- tests/hermes_cli/test_setup.py::test_setup_gateway_skips_service_install_when_systemctl_missing
- tests/hermes_cli/test_setup.py::test_setup_gateway_in_container_shows_docker_guidance
- tests/hermes_cli/test_setup_irc.py::TestIRCGatewaySetupFreshInstall::test_setup_gateway_irc_counts_as_messaging_platform
- tests/hermes_cli/test_setup_openclaw_migration.py::TestGetSectionConfigSummary::test_gateway_returns_none_without_tokens
- tests/hermes_cli/test_setup_openclaw_migration.py::TestSetupWizardSkipsConfiguredSections::test_sections_skipped_when_migration_imported_settings

Same _platform_status bug exists for sibling plugin platforms (teams,
google_chat) whose check_fn returns true on SDK install alone; their
tests just never exercised the registry path before. The bug only became
test-visible when Discord migrated into the registry.

Validation: 11,167 tests across tests/gateway/ + tests/cron/ +
tests/tools/test_send_message_tool.py + tests/hermes_cli/ pass with zero
failures.
2026-05-22 14:21:41 -07:00
kshitijk4poor cc8e5ec2af refactor(gateway): migrate Discord adapter to bundled plugin (full Teams parity)
First migration of an existing built-in platform adapter to the plugin
system established by IRC / Teams / LINE / Google Chat. Closes #24325;
advances the umbrella refactor in #3823.

Matches Teams' shape exactly — adapter under ``plugins/platforms/discord/``
with the standard ``__init__.py`` / ``adapter.py`` / ``plugin.yaml``
shell, ``register(ctx)`` entry point, **no back-compat shim** at the old
import path, and full parity for the four hooks Teams uses plus the
``apply_yaml_config_fn`` hook that landed in #25443 (the Discord plugin
is the first consumer of that hook):

* ``standalone_sender_fn`` — out-of-process cron delivery via REST API
* ``setup_fn`` — interactive ``hermes setup gateway`` wizard
* ``apply_yaml_config_fn`` — translate ``config.yaml`` ``discord:`` keys
  into ``DISCORD_*`` env vars (replaces the hardcoded block in
  ``gateway/config.py``)
* ``is_connected`` — declares connection state from ``DISCORD_BOT_TOKEN``
* ``check_fn`` — lazy-installs ``discord.py`` on demand
* plus ``allowed_users_env``, ``allow_all_env``, ``cron_deliver_env_var``,
  ``max_message_length``, ``emoji``, ``required_env``, ``install_hint``

* ``gateway/platforms/discord.py`` (5,101 LOC) →
  ``plugins/platforms/discord/adapter.py`` (git rename, R090).
* New ``plugins/platforms/discord/{__init__.py, plugin.yaml}`` with
  ``requires_env`` / ``optional_env`` declarations.
* Append ``register(ctx)`` block + new hook implementations
  (``_standalone_send``, ``interactive_setup``, ``_apply_yaml_config``,
  ``_clean_discord_user_ids``, ``_is_connected``, ``_build_adapter``,
  plus helpers ``_DISCORD_CHANNEL_TYPE_PROBE_CACHE`` etc.) to the
  adapter.

* Replace the ``Platform.DISCORD elif`` branch in
  ``GatewayRunner._create_adapter()`` (−9 LOC) with a generic post-creation
  hook (+6 LOC) in the registry path: any plugin adapter that declares a
  ``gateway_runner`` attribute now gets it auto-injected. Webhook's
  built-in branch is unchanged (it doesn't go through the registry path).

* Move ``_send_discord`` (190 LOC) and helpers
  (``_DISCORD_CHANNEL_TYPE_PROBE_CACHE``, ``_remember_channel_is_forum``,
  ``_probe_is_forum_cached``, ``_derive_forum_thread_name``) from
  ``tools/send_message_tool.py`` into the plugin as ``_standalone_send``.
* Wire via ``standalone_sender_fn=_standalone_send`` (Teams pattern; same
  gap fixed in #21804 for other plugin platforms).
* Replace the Discord ``elif`` in ``tools/send_message_tool.py``
  ``_send_to_platform`` with a 10-line registry-hook dispatch.
* Drop the ``DiscordAdapter`` import and the
  ``Platform.DISCORD: DiscordAdapter.MAX_MESSAGE_LENGTH`` ``_MAX_LENGTHS``
  entry — the registry's ``max_message_length=2000`` covers it.

* Move ``_setup_discord`` and ``_clean_discord_user_ids`` (68 LOC) from
  ``hermes_cli/setup.py`` into the plugin as ``interactive_setup``.
* Wire via ``setup_fn=interactive_setup``.  CLI helpers (``prompt``,
  ``print_info``, etc.) are lazy-imported so the plugin's module-load
  surface stays minimal.
* Remove ``"discord": _s._setup_discord`` from
  ``hermes_cli/gateway.py::_builtin_setup_fn``.
* Remove the entire 32-line ``_PLATFORMS["discord"]`` static dict entry —
  Discord's setup metadata is now discovered dynamically via
  ``_all_platforms()`` from the registry entry.

* Move the 59-line ``discord_cfg`` YAML→env bridge from
  ``gateway/config.py::load_gateway_config()`` into the plugin as
  ``_apply_yaml_config``.  Covers ``require_mention``,
  ``thread_require_mention``, ``free_response_channels``, ``auto_thread``,
  ``reactions``, ``ignored_channels``, ``allowed_channels``,
  ``no_thread_channels``, ``allow_mentions.{everyone,roles,users,
  replied_user}``, and ``reply_to_mode`` (including the YAML 1.1
  ``off``-as-False coercion and the ``extra.reply_to_mode`` fallback).
* Wire via ``apply_yaml_config_fn=_apply_yaml_config``.
* The hook runs BEFORE ``_apply_env_overrides`` and after the generic
  shared-key loop, exactly as documented in
  ``website/docs/developer-guide/adding-platform-adapters.md``.
* Behavior is preserved exactly — every assignment still uses
  ``not os.getenv(...)`` guards so env vars take precedence over YAML.

All 78 references to the old import path are rewritten — no back-compat
shim:

* 51 ``from gateway.platforms.discord import X`` →
  ``from plugins.platforms.discord.adapter import X``
* 5 ``import gateway.platforms.discord as discord_platform`` →
  ``import plugins.platforms.discord.adapter as discord_platform``
* 1 ``from gateway.platforms import discord as discord_mod`` →
  ``from plugins.platforms.discord import adapter as discord_mod``
* 21 ``mock.patch("gateway.platforms.discord.X")`` strings →
  ``mock.patch("plugins.platforms.discord.adapter.X")``
* 1 docstring reference in ``hermes_cli/commands.py``
* 1 import in ``tools/send_message_tool.py`` (now removed entirely)

The import-safety test in ``tests/gateway/test_discord_imports.py`` is
updated to purge the new canonical module name from ``sys.modules``.

**38 files changed, +621 / −473** — net positive due to the YAML hook
implementation (89 new LOC in the plugin trading for 59 deleted in core),
but every line moved has a clear plugin home now.  The git rename is
detected at R090 because the adapter gained ~340 LOC of moved-in hook
implementations (``_standalone_send`` + ``interactive_setup`` +
``_apply_yaml_config`` + helpers).

* All 568 Discord-specific tests pass across 25 ``test_discord_*.py``
  files plus voice/send/text-batching/reload-skills/stream-consumer/
  integration tests.
* All 147 tests in the YAML-touching subset
  (``test_discord_reply_mode``, ``test_discord_free_response``,
  ``test_discord_allowed_channels``, ``test_discord_allowed_mentions``,
  ``test_discord_channel_controls``, ``test_discord_reactions``,
  ``test_discord_thread_persistence``, ``test_runtime_footer``) pass —
  this is the strongest signal that the YAML→env hook behaves
  identically to the legacy block.
* Broader gateway/cron/integration sweep (1297 tests) introduces zero
  new failures vs ``main``.  Pre-existing failures in
  ``tests/gateway/test_tts_media_routing.py`` and
  ``tests/e2e/test_platform_commands.py`` reproduce identically on the
  unchanged ``main`` revision.
* Plugin discovery sanity check confirms Discord registers alongside the
  other four platform plugins:

    Registered platforms: ['discord', 'google_chat', 'irc', 'line', 'teams']

These Discord-shaped tendrils in core were **deliberately not moved** —
they are generic platform-registry concerns affecting every platform,
not Discord-specific:

* ``gateway/config.py:1205`` ``DISCORD_BOT_TOKEN → config.token`` env
  enablement — same shape Telegram has.  The existing
  ``env_enablement_fn`` registry hook only seeds ``extra``, not
  ``.token``, so it can't replace this without an adapter refactor to
  read from ``extra["bot_token"]``.
* ``gateway/run.py`` voice-mode hooks
  (``self.adapters.get(Platform.DISCORD)`` for
  ``start_voice_mode``/``stop_voice_mode``), role-based auth,
  ``DISCORD_ALLOW_BOTS`` branch in ``_is_user_authorized``,
  ``_UPDATE_ALLOWED_PLATFORMS`` frozenset, and the per-platform
  allowlist maps — generic platform-registry concerns.
* ``Platform.DISCORD`` enum literal — stable identifier used as dict
  keys throughout the codebase; removing it is a separate refactor with
  no real benefit.
* ``tools/discord_tool.py`` and ``tools/environments/local.py`` —
  first-class agent tools and env-passthrough config, neither is the
  gateway adapter.

Each of these is worth its own scoping issue when the time comes.
2026-05-22 14:21:41 -07:00
Teknium 4f988634f8 infographic: PR #27612 Nous URL allowlist salvage 2026-05-22 14:17:40 -07:00
Teknium e32d2ffc1d fix(security): wire Nous URL allowlist into refresh / mint persistence sites
@memosr's PR #27612 put the inference_base_url allowlist check only at the
Nous proxy adapter forward boundary. The poisoned URL, however, lands in
``auth.json`` upstream of that — at five refresh / agent-key-mint payload
read sites inside ``resolve_nous_runtime_credentials`` and
``_extend_state_from_refresh``. Without gating those sites, a single MITM
on a refresh response persists the attacker's URL across restarts, even
if the proxy adapter's defense-in-depth check would later catch it on
the way out.

Replace ``_optional_base_url`` with ``_validate_nous_inference_url_from_network``
at all five Portal-network reads:

  - hermes_cli/auth.py L4840  (refresh-only access-token path)
  - hermes_cli/auth.py L4876  (mint payload path)
  - hermes_cli/auth.py L5154  (terminal-runtime access-token refresh)
  - hermes_cli/auth.py L5262  (cross-process serialized refresh)
  - hermes_cli/auth.py L5317  (terminal-runtime mint payload)

The state-read path at L5025 (``state.get("inference_base_url")``) is
deliberately NOT gated — pre-existing state in ``auth.json`` is either
already validated (it came from one of the five network sites above) or
set by a trusted local actor (manual edit, ``_setup_nous_auth`` test
fixture, ``hermes login nous`` against a staging endpoint via the
documented ``NOUS_INFERENCE_BASE_URL`` env override). Direct write_file /
patch tampering with auth.json is independently blocked by PR #14157.

Adds tests/hermes_cli/test_nous_inference_url_validation.py covering:
  - validator https + host + edge-case rules (12 cases)
  - all 5 network call sites grep contracts (no _optional_base_url
    regression possible without test failure)
  - proxy adapter defense-in-depth check still present
  - env override path NOT gated (documented dev/staging behaviour)

18 new tests, all 119 Nous-auth tests green.
2026-05-22 14:17:40 -07:00
memosr d33c99bbb1 fix(security): validate Nous Portal inference_base_url against host allowlist
The Nous Portal proxy adapter forwards minted ``agent_key`` bearer tokens
to whatever ``base_url`` ``resolve_nous_runtime_credentials()`` returns,
which is read directly from the refresh / agent-key-mint response and
persisted to ``~/.hermes/auth.json``. With no validation beyond a
trailing-slash strip, a poisoned URL (Portal-side MITM, or local write
to auth.json) gets forwarded the legitimate bearer on every subsequent
proxy request — exfiltrating the user's inference budget and opening a
response-injection channel back into the IDE / chat client.

Add ``_validate_nous_inference_url_from_network()`` in ``hermes_cli.auth``:
an https + host-allowlist check that returns None for anything outside
``inference-api.nousresearch.com``, so callers fall back to the
documented default rather than ship the bearer to an attacker.

This commit wires the validator into the proxy adapter at
``nous_portal.py``. A follow-up commit wires it into the four refresh /
mint sites in ``auth.py`` so the poisoned URL never lands in auth.json
in the first place.

The env-var override path (``NOUS_INFERENCE_BASE_URL``) bypasses
validation by design — that's the documented staging/dev escape hatch
and the env source is already trusted (the user set it themselves).

Co-authored-by: memosr <mehmet.sr35@gmail.com>
2026-05-22 14:17:40 -07:00
Julien Talbot 09afafb87e fix(xai): resolve Grok Build context for OAuth 2026-05-22 13:05:36 -07:00
Teknium 1e71b7180e infographic: PR #14157 control-plane write-deny salvage 2026-05-22 04:32:14 -07:00
Teknium 42104218e0 fix(file-safety): also write-deny <root>/control-files in profile mode
PR #14157 added control-plane write-deny against the ACTIVE HERMES_HOME,
which is fine in non-profile mode but leaves a gap once a profile is
active: HERMES_HOME points at <root>/profiles/<name>, so the global
<root>/auth.json + <root>/config.yaml + <root>/webhook_subscriptions.json
+ <root>/mcp-tokens/ remain writable. Same shape as the .env gap PR
#15981 closed via _hermes_root_path().

Apply the same widening pattern here. The control-file/mcp-tokens check
now iterates BOTH _hermes_home_path() and _hermes_root_path() (dedupes
when they coincide in non-profile mode). Also tightens the mcp-tokens
check from "startswith dir + os.sep" to "==dir OR startswith dir + os.sep"
so writing the directory entry itself is blocked, not just files inside.

Regression tests cover both protections in a real profile-mode layout
(<tmp>/hermes/profiles/coder as HERMES_HOME, <tmp>/hermes as root).
2026-05-22 04:32:14 -07:00
Pratik Rai 1f5219fda5 fix(security): protect Hermes control-plane files from prompt injection
Adds active-HERMES_HOME control-plane files to the write deny list:
auth.json, config.yaml, webhook_subscriptions.json, and any path
under mcp-tokens/. realpath() resolves before comparison so
directory-traversal and symlink targets are normalised, preventing
trivial deny-list bypass via ../ tricks.

Without this, a prompt-injected agent could rewrite Hermes' own
auth state or routing config via write_file / patch — without
triggering the terminal dangerous-command approval — and persist
attacker-controlled behaviour across sessions.

Fixes #14072
2026-05-22 04:32:14 -07:00
1040 changed files with 154426 additions and 5816 deletions
+5 -2
View File
@@ -29,9 +29,13 @@ runs:
- name: hermes --help
shell: bash
run: |
# Use the image's real ENTRYPOINT (/init + main-wrapper.sh) so
# this exercises the actual production startup path. PR #30136
# review caught that an --entrypoint override here had been
# silently neutered by the s6-overlay migration — stage2-hook
# ignores its CMD args, so the smoke test was a no-op.
docker run --rm \
-v /tmp/hermes-test:/opt/data \
--entrypoint /opt/hermes/docker/entrypoint.sh \
"${{ inputs.image }}" --help
- name: hermes dashboard --help
@@ -43,5 +47,4 @@ runs:
# installed package.
docker run --rm \
-v /tmp/hermes-test:/opt/data \
--entrypoint /opt/hermes/docker/entrypoint.sh \
"${{ inputs.image }}" dashboard --help
+68
View File
@@ -0,0 +1,68 @@
name: Docker / shell lint
# Lints the container build inputs: Dockerfile (via hadolint) and any shell
# scripts under docker/ (via shellcheck). These catch the class of regression
# the behavioral docker-publish smoke test can't — unquoted variable
# expansions, silently-failing RUN commands, etc.
#
# Rules and ignores are documented in .hadolint.yaml at the repo root.
# shellcheck severity is pinned to `error` so SC1091-style "can't follow
# sourced script" info-level warnings don't fail the job — the .venv
# activate script doesn't exist at lint time.
on:
push:
branches: [main]
paths:
- Dockerfile
- docker/**
- .hadolint.yaml
- .github/workflows/docker-lint.yml
pull_request:
branches: [main]
paths:
- Dockerfile
- docker/**
- .hadolint.yaml
- .github/workflows/docker-lint.yml
permissions:
contents: read
concurrency:
group: docker-lint-${{ github.ref }}
cancel-in-progress: true
jobs:
hadolint:
name: Lint Dockerfile (hadolint)
runs-on: ubuntu-latest
timeout-minutes: 5
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: hadolint
uses: hadolint/hadolint-action@54c9adbab1582c2ef04b2016b760714a4bfde3cf # v3.1.0
with:
dockerfile: Dockerfile
config: .hadolint.yaml
failure-threshold: warning
shellcheck:
name: Lint docker/ shell scripts (shellcheck)
runs-on: ubuntu-latest
timeout-minutes: 5
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: shellcheck
uses: ludeeus/action-shellcheck@00cae500b08a931fb5698e11e79bfbd38e612a38 # v2.0.0
env:
# Severity = error: SC1091 (can't follow sourced script) is info-
# level and would otherwise fail when the venv activate script
# doesn't exist at lint time.
SHELLCHECK_OPTS: --severity=error
with:
scandir: ./docker
+50
View File
@@ -80,6 +80,56 @@ jobs:
with:
image: ${{ env.IMAGE_NAME }}:test
# ---------------------------------------------------------------------
# Run the docker-integration test suite against the freshly-built
# image already loaded into the local daemon (`:test`). These tests
# are excluded from the sharded `tests.yml :: test` matrix on purpose
# (see `_SKIP_PARTS` in scripts/run_tests_parallel.py) because each
# shard would otherwise reach the session-scoped ``built_image``
# fixture in ``tests/docker/conftest.py`` and start a 3-7min
# ``docker build`` under a 180s pytest-timeout cap — guaranteed to
# die in fixture setup.
#
# Piggybacking here avoids a second image build: the smoke test
# already proved the image loads + runs, so the daemon has it under
# `${IMAGE_NAME}:test` and we just point ``HERMES_TEST_IMAGE`` at
# that. The fixture's ``HERMES_TEST_IMAGE`` branch (see
# tests/docker/conftest.py:62-63) short-circuits the rebuild.
#
# Why this job and not a standalone one: the image is 5GB+; passing
# it between jobs via ``docker save``/``upload-artifact`` is slower
# than the build itself. Reusing the existing daemon state is the
# cheapest path to coverage on every PR that touches docker code.
# ---------------------------------------------------------------------
- name: Install uv (for docker tests)
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
- name: Set up Python 3.11 (for docker tests)
run: uv python install 3.11
- name: Install Python dependencies (for docker tests)
run: |
uv venv .venv --python 3.11
source .venv/bin/activate
# ``dev`` extra pulls in pytest, pytest-asyncio, pytest-timeout —
# everything tests/docker/ needs. We deliberately avoid ``all``
# here because the docker tests only drive the container via
# subprocess and don't import hermes_agent's optional deps.
uv pip install -e ".[dev]"
- name: Run docker integration tests
env:
# Skip rebuild; use the image already loaded by the build step.
HERMES_TEST_IMAGE: ${{ env.IMAGE_NAME }}:test
# Match the policy in tests.yml :: test job — no accidental
# real-API calls from inside the harness.
OPENROUTER_API_KEY: ""
OPENAI_API_KEY: ""
NOUS_API_KEY: ""
run: |
source .venv/bin/activate
python -m pytest tests/docker/ -v --tb=short
- name: Log in to Docker Hub
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
+1 -1
View File
@@ -51,7 +51,7 @@ jobs:
steps:
- name: Generate GitHub App token
id: app-token
uses: actions/create-github-app-token@7bfa3a4717ef143a604ee0a99d859b8886a96d00 # v1.9.3
uses: actions/create-github-app-token@bcd2ba49218906704ab6c1aa796996da409d3eb1 # v3.2.0
with:
app-id: ${{ secrets.APP_ID }}
private-key: ${{ secrets.APP_PRIVATE_KEY }}
+12 -4
View File
@@ -47,14 +47,17 @@ jobs:
HEAD="${{ github.event.pull_request.head.sha }}"
# Added lines only, excluding lockfiles.
DIFF=$(git diff "$BASE".."$HEAD" -- . ':!uv.lock' ':!*.lock' ':!package-lock.json' ':!yarn.lock' || true)
# Three-dot diff (base...head) diffs from the merge base to HEAD,
# so only changes introduced by this PR are included — not changes
# that landed on main after the PR branched off.
DIFF=$(git diff "$BASE"..."$HEAD" -- . ':!uv.lock' ':!*.lock' ':!package-lock.json' ':!yarn.lock' || true)
FINDINGS=""
# --- .pth files (auto-execute on Python startup) ---
# The exact mechanism used in the litellm supply chain attack:
# https://github.com/BerriAI/litellm/issues/24512
PTH_FILES=$(git diff --name-only "$BASE".."$HEAD" | grep '\.pth$' || true)
PTH_FILES=$(git diff --name-only "$BASE"..."$HEAD" | grep '\.pth$' || true)
if [ -n "$PTH_FILES" ]; then
FINDINGS="${FINDINGS}
### 🚨 CRITICAL: .pth file added or modified
@@ -97,7 +100,12 @@ jobs:
# --- Install-hook files (setup.py/sitecustomize/usercustomize/__init__.pth) ---
# These execute during pip install or interpreter startup.
SETUP_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -E '(^|/)(setup\.py|setup\.cfg|sitecustomize\.py|usercustomize\.py|__init__\.pth)$' || true)
# Anchored at repo root: only the top-level setup.py/setup.cfg run during
# `pip install`, and only top-level sitecustomize.py/usercustomize.py are
# auto-loaded by the interpreter via site.py. Any nested file with the
# same name (e.g. hermes_cli/setup.py — the CLI setup wizard) is unrelated
# and produced false positives that trained reviewers to ignore the scanner.
SETUP_HITS=$(git diff --name-only "$BASE"..."$HEAD" | grep -E '^(setup\.py|setup\.cfg|sitecustomize\.py|usercustomize\.py|__init__\.pth)$' || true)
if [ -n "$SETUP_HITS" ]; then
FINDINGS="${FINDINGS}
### 🚨 CRITICAL: Install-hook file added or modified
@@ -158,7 +166,7 @@ jobs:
HEAD="${{ github.event.pull_request.head.sha }}"
# Only check added lines in pyproject.toml
ADDED=$(git diff "$BASE".."$HEAD" -- pyproject.toml | grep '^+' | grep -v '^+++' || true)
ADDED=$(git diff "$BASE"..."$HEAD" -- pyproject.toml | grep '^+' | grep -v '^+++' || true)
if [ -z "$ADDED" ]; then
echo "found=false" >> "$GITHUB_OUTPUT"
+61 -4
View File
@@ -23,11 +23,22 @@ concurrency:
jobs:
test:
runs-on: ubuntu-latest
timeout-minutes: 60
timeout-minutes: 30
strategy:
fail-fast: false
matrix:
slice: [1, 2, 3, 4, 5, 6]
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Restore duration cache
uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
with:
path: test_durations.json
# Single stable key. main always overwrites, PRs always find it.
key: test-durations
- name: Install ripgrep (prebuilt binary)
run: |
set -euo pipefail
@@ -54,7 +65,7 @@ jobs:
source .venv/bin/activate
uv pip install -e ".[all,dev]"
- name: Run tests
- name: Run tests (slice ${{ matrix.slice }}/6)
# Per-file isolation via scripts/run_tests_parallel.py: discovers
# every test_*.py file under tests/ (excluding integration/ + e2e/),
# then runs `python -m pytest <file>` in a freshly-spawned subprocess
@@ -72,15 +83,61 @@ jobs:
# state across files, which is exactly the leakage we wanted to
# fix. ThreadPoolExecutor + subprocess.run is ~60 lines and does
# the job with cleaner semantics.
#
# Matrix slicing (--slice I/N): files are distributed across 6
# jobs by cached duration (LPT algorithm) so each job gets
# roughly equal wall time. Without a cache, files default to 2s
# estimate and get split roughly evenly by count — still correct,
# just not perfectly balanced.
run: |
source .venv/bin/activate
python scripts/run_tests_parallel.py
python scripts/run_tests_parallel.py --slice ${{ matrix.slice }}/6
env:
# Ensure tests don't accidentally call real APIs
OPENROUTER_API_KEY: ""
OPENAI_API_KEY: ""
NOUS_API_KEY: ""
- name: Upload per-slice durations
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: test-durations-slice-${{ matrix.slice }}
path: test_durations.json
retention-days: 1
# Merge per-slice duration data into a single cache, so future runs
# (including PRs) get balanced slicing.
save-durations:
needs: test
if: always() && github.ref == 'refs/heads/main'
runs-on: ubuntu-latest
steps:
- name: Download all slice durations
uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
with:
pattern: test-durations-slice-*
path: durations
merge-multiple: true
- name: Merge into single durations file
run: |
python3 -c "
import json, glob, os
merged = {}
for f in glob.glob('durations/*test_durations.json'):
with open(f) as fh:
merged.update(json.load(fh))
with open('test_durations.json', 'w') as fh:
json.dump(merged, fh, indent=2, sort_keys=True)
print(f'Merged {len(merged)} file durations')
"
- name: Save merged duration cache
uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
with:
path: test_durations.json
key: test-durations
e2e:
runs-on: ubuntu-latest
timeout-minutes: 15
@@ -121,4 +178,4 @@ jobs:
env:
OPENROUTER_API_KEY: ""
OPENAI_API_KEY: ""
NOUS_API_KEY: ""
NOUS_API_KEY: ""
+1
View File
@@ -18,6 +18,7 @@ __pycache__/web_tools.cpython-310.pyc
logs/
data/
.pytest_cache/
test_durations.json
.pytest-cache/
tmp/
temp_vision_images/
+36
View File
@@ -0,0 +1,36 @@
# hadolint configuration for the Hermes Agent Dockerfile.
# See https://github.com/hadolint/hadolint#configure for rules.
#
# We want hadolint to surface NEW Dockerfile lint regressions, but we
# don't want to rewrite the existing image to silence rules that are
# either intentional or pragmatic tradeoffs for this project. Each
# ignore below has a one-line justification.
failure-threshold: warning
ignored:
# Pin versions in apt get install. We intentionally don't pin common
# tools (curl, git, openssh-client, etc.) — security updates flow in
# via the periodic base-image rebuild, and pinning would lock us to
# superseded patch releases. Same rationale as nearly every distro-
# base official image (python, node, debian).
- DL3008
# Use WORKDIR to switch to a directory. The image uses `(cd web && …)`
# / `(cd ../ui-tui && …)` inline subshells for one-off build steps
# because they don't affect later RUN commands; promoting them to
# full WORKDIR switches with restores would obscure intent.
- DL3003
# Multiple consecutive RUN instructions. The `touch README.md` + `uv
# sync` split is intentional — `touch` is cheap, `uv sync` is the
# expensive layer-cached step we want isolated, and merging them
# would invalidate the cache for trivial changes.
- DL3059
# Last USER should not be root. /init (s6-overlay) runs as root so the
# stage2 hook can usermod/groupmod and chown the data volume per
# HERMES_UID at runtime; each supervised service then drops to the
# hermes user via `s6-setuidgid`.
- DL3002
# Require explicit base-image pins (SHA256) — we already do this.
trustedRegistries:
- docker.io
- ghcr.io
+114 -10
View File
@@ -1,5 +1,4 @@
FROM ghcr.io/astral-sh/uv:0.11.6-python3.13-trixie@sha256:b3c543b6c4f23a5f2df22866bd7857e5d304b67a564f4feab6ac22044dde719b AS uv_source
FROM tianon/gosu:1.19-trixie@sha256:3b176695959c71e123eb390d427efc665eeb561b1540e82679c15e992006b8b9 AS gosu_source
FROM debian:13.4
# Disable Python stdout buffering to ensure logs are printed immediately
@@ -9,18 +8,68 @@ ENV PYTHONUNBUFFERED=1
# install survives the /opt/data volume overlay at runtime.
ENV PLAYWRIGHT_BROWSERS_PATH=/opt/hermes/.playwright
# Install system dependencies in one layer, clear APT cache
# tini reaps orphaned zombie processes (MCP stdio subprocesses, git, bun, etc.)
# that would otherwise accumulate when hermes runs as PID 1. See #15012.
# Install system dependencies in one layer, clear APT cache.
# tini was previously PID 1 to reap orphaned zombie processes (MCP stdio
# subprocesses, git, bun, etc.) that would otherwise accumulate when hermes
# ran as PID 1. See #15012. Phase 2 of the s6-overlay supervision plan
# replaces tini with s6-overlay's /init (PID 1 = s6-svscan), which reaps
# zombies non-blockingly on SIGCHLD and additionally supervises the main
# hermes process, the dashboard, and per-profile gateways.
RUN apt-get update && \
apt-get install -y --no-install-recommends \
build-essential curl nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli tini && \
build-essential curl nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli xz-utils && \
rm -rf /var/lib/apt/lists/*
# ---------- s6-overlay install ----------
# s6-overlay provides supervision for the main hermes process, the dashboard,
# and per-profile gateways. /init becomes PID 1 below — see ENTRYPOINT.
#
# Multi-arch: BuildKit auto-populates TARGETARCH (amd64 / arm64). s6-overlay
# uses tarball names keyed on the kernel arch string (x86_64 / aarch64), so
# we map between them inline. The noarch + symlinks tarballs are
# architecture-independent and reused as-is.
#
# We use `curl` instead of `ADD` for the per-arch tarball because `ADD`
# evaluates its URL at parse time, before any ARG / TARGETARCH substitution
# — splitting one URL per arch into two ADDs would download both on every
# build and leave dead bytes in the cache. A single curl + arch-keyed URL
# is simpler and cache-friendlier.
#
# Supply-chain integrity: every tarball is checksum-verified against the
# upstream-published SHA256. To bump S6_OVERLAY_VERSION, fetch the four
# `.sha256` files from the corresponding release and update the ARGs. The
# checksum lookup happens during build, so a compromised release artifact
# fails the build loudly instead of silently producing a tampered image.
ARG TARGETARCH
ARG S6_OVERLAY_VERSION=3.2.3.0
ARG S6_OVERLAY_NOARCH_SHA256=b720f9d9340efc8bb07528b9743813c836e4b02f8693d90241f047998b4c53cf
ARG S6_OVERLAY_X86_64_SHA256=a93f02882c6ed46b21e7adb5c0add86154f01236c93cd82c7d682722e8840563
ARG S6_OVERLAY_AARCH64_SHA256=0952056ff913482163cc30e35b2e944b507ba1025d78f5becbb89367bf344581
ARG S6_OVERLAY_SYMLINKS_SHA256=a60dc5235de3ecbcf874b9c1f18d73263ab99b289b9329aa950e8729c4789f0e
ADD https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-noarch.tar.xz /tmp/
ADD https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-symlinks-noarch.tar.xz /tmp/
RUN set -eu; \
case "${TARGETARCH:-amd64}" in \
amd64) s6_arch="x86_64"; s6_arch_sha="${S6_OVERLAY_X86_64_SHA256}" ;; \
arm64) s6_arch="aarch64"; s6_arch_sha="${S6_OVERLAY_AARCH64_SHA256}" ;; \
*) echo "Unsupported TARGETARCH=${TARGETARCH} for s6-overlay" >&2; exit 1 ;; \
esac; \
curl -fsSL --retry 3 -o /tmp/s6-overlay-arch.tar.xz \
"https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-${s6_arch}.tar.xz"; \
{ \
printf '%s %s\n' "${S6_OVERLAY_NOARCH_SHA256}" /tmp/s6-overlay-noarch.tar.xz; \
printf '%s %s\n' "${s6_arch_sha}" /tmp/s6-overlay-arch.tar.xz; \
printf '%s %s\n' "${S6_OVERLAY_SYMLINKS_SHA256}" /tmp/s6-overlay-symlinks-noarch.tar.xz; \
} > /tmp/s6-overlay.sha256; \
sha256sum -c /tmp/s6-overlay.sha256; \
tar -C / -Jxpf /tmp/s6-overlay-noarch.tar.xz; \
tar -C / -Jxpf /tmp/s6-overlay-arch.tar.xz; \
tar -C / -Jxpf /tmp/s6-overlay-symlinks-noarch.tar.xz; \
rm /tmp/s6-overlay-*.tar.xz /tmp/s6-overlay.sha256
# Non-root user for runtime; UID can be overridden via HERMES_UID at runtime
RUN useradd -u 10000 -m -d /opt/data hermes
COPY --chmod=0755 --from=gosu_source /gosu /usr/local/bin/
COPY --chmod=0755 --from=uv_source /usr/local/bin/uv /usr/local/bin/uvx /usr/local/bin/
WORKDIR /opt/hermes
@@ -103,18 +152,73 @@ RUN cd web && npm run build && \
USER root
RUN chmod -R a+rX /opt/hermes && \
chown -R hermes:hermes /opt/hermes/.venv /opt/hermes/ui-tui /opt/hermes/node_modules
# Start as root so the entrypoint can usermod/groupmod + gosu.
# If HERMES_UID is unset, the entrypoint drops to the default hermes user (10000).
# Start as root so the s6-overlay stage2 hook can usermod/groupmod and chown
# the data volume. Each supervised service then drops to the hermes user via
# `s6-setuidgid hermes` in its run script. If HERMES_UID is unset, services
# run as the default hermes user (UID 10000).
# ---------- Link hermes-agent itself (editable) ----------
# Deps are already installed in the cached layer above; `--no-deps` makes
# this a fast (~1s) egg-link creation with no resolution or downloads.
RUN uv pip install --no-cache-dir --no-deps -e "."
# ---------- s6-overlay service wiring ----------
# Static services declared at build time: main-hermes + dashboard.
# Per-profile gateway services are registered dynamically at runtime by
# the profile create/delete hooks (Phase 4); they live under
# /run/service/ (tmpfs) and are reconciled on container restart by
# /etc/cont-init.d/02-reconcile-profiles (Phase 4 Task 4.0).
COPY docker/s6-rc.d/ /etc/s6-overlay/s6-rc.d/
# stage2-hook handles UID/GID remap, volume chown, config seeding,
# skills sync — all the work the old entrypoint.sh did before
# `exec hermes`. Wired in as cont-init.d/01- so it
# runs before user services start.
#
# 02-reconcile-profiles re-creates per-profile gateway s6 service
# slots from $HERMES_HOME/profiles/<name>/ after a container restart
# (the /run/service/ scandir is tmpfs and wiped on restart). Phase 4.
RUN mkdir -p /etc/cont-init.d && \
printf '#!/bin/sh\nexec /opt/hermes/docker/stage2-hook.sh\n' \
> /etc/cont-init.d/01-hermes-setup && \
chmod +x /etc/cont-init.d/01-hermes-setup
COPY --chmod=0755 docker/cont-init.d/015-supervise-perms /etc/cont-init.d/015-supervise-perms
COPY --chmod=0755 docker/cont-init.d/02-reconcile-profiles /etc/cont-init.d/02-reconcile-profiles
# ---------- Runtime ----------
ENV HERMES_WEB_DIST=/opt/hermes/hermes_cli/web_dist
ENV HERMES_HOME=/opt/data
ENV PATH="/opt/data/.local/bin:${PATH}"
# Pre-s6 entrypoint.sh did `source .venv/bin/activate` which exported
# the venv bin onto PATH; Architecture B's main-wrapper.sh does the
# same for the container's main process, but `docker exec` and our
# cont-init.d scripts don't pass through the wrapper. Expose the venv
# bin globally so `docker exec <container> hermes ...` and any
# subprocess that doesn't activate the venv first still find hermes.
ENV PATH="/opt/hermes/.venv/bin:/opt/data/.local/bin:${PATH}"
RUN mkdir -p /opt/data
VOLUME [ "/opt/data" ]
ENTRYPOINT [ "/usr/bin/tini", "-g", "--", "/opt/hermes/docker/entrypoint.sh" ]
# s6-overlay's /init is PID 1. It sets up the supervision tree, runs
# /etc/cont-init.d/* (our stage2 hook), starts s6-rc services
# declared in /etc/s6-overlay/s6-rc.d/, then exec's its remaining
# argv as the container's "main program" with stdin/stdout/stderr
# inherited (this is what makes interactive --tui work). When the
# main program exits, /init begins stage 3 shutdown and the container
# exits with the program's exit code. Replaces tini — see Phase 2 of
# docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md.
#
# We use the ENTRYPOINT+CMD split rather than CMD alone so the
# wrapper is prepended to user-supplied args automatically:
#
# docker run <image> → /init main-wrapper.sh (CMD default)
# docker run <image> chat -q "hi" → /init main-wrapper.sh chat -q hi
# docker run <image> sleep infinity → /init main-wrapper.sh sleep infinity
# docker run <image> --tui → /init main-wrapper.sh --tui
#
# main-wrapper.sh handles arg routing (bare-exec vs. hermes
# subcommand vs. no-args), drops to the hermes user via s6-setuidgid,
# and exec's the final program so its exit code becomes the container
# exit code. Without the wrapper-as-ENTRYPOINT, leading-dash args
# like `--version` would be intercepted by /init's POSIX shell.
ENTRYPOINT [ "/init", "/opt/hermes/docker/main-wrapper.sh" ]
CMD [ ]
+21
View File
@@ -79,6 +79,27 @@ hermes doctor # Diagnose any issues
📖 **[Full documentation →](https://hermes-agent.nousresearch.com/docs/)**
---
## Skip the API-key collection — Nous Portal
Hermes works with whatever provider you want — that's not changing. But if you'd rather not collect five separate API keys for the model, web search, image generation, TTS, and a cloud browser, **[Nous Portal](https://portal.nousresearch.com)** covers all of them under one subscription:
- **300+ models** — pick any of them with `/model <name>`
- **Tool Gateway** — web search (Firecrawl), image generation (FAL), text-to-speech (OpenAI), cloud browser (Browser Use), all routed through your sub. No extra accounts.
One command from a fresh install:
```bash
hermes setup --portal
```
That logs you in via OAuth, sets Nous as your provider, and turns on the Tool Gateway. Check what's wired up any time with `hermes portal status`. Full details on the [Tool Gateway docs page](https://hermes-agent.nousresearch.com/docs/user-guide/features/tool-gateway).
You can still bring your own keys per-tool whenever you want — the gateway is per-backend, not all-or-nothing.
---
## CLI vs Messaging Quick Reference
Hermes has two entry points: start the terminal UI with `hermes`, or run the gateway and talk to it from Telegram, Discord, Slack, WhatsApp, Signal, or Email. Once you're in a conversation, many slash commands are shared across both interfaces.
+21
View File
@@ -65,6 +65,27 @@ hermes doctor # 诊断问题
📖 **[完整文档 →](https://hermes-agent.nousresearch.com/docs/)**
---
## 省去到处收集 API Key — Nous Portal
Hermes 始终允许你使用任意服务商,这点不会改变。但如果你不想为模型、网页搜索、图像生成、TTS、云浏览器分别去申请五个不同的 API Key,**[Nous Portal](https://portal.nousresearch.com)** 用一个订阅就能覆盖全部:
- **300+ 模型** — 用 `/model <name>` 随时切换
- **Tool Gateway** — 网页搜索(Firecrawl)、图像生成(FAL)、文本转语音(OpenAI)、云浏览器(Browser Use),全部通过订阅托管。无需额外注册任何账户。
全新安装时一条命令即可:
```bash
hermes setup --portal
```
它会通过 OAuth 登录、把 Nous 设为推理服务商,并启用 Tool Gateway。随时用 `hermes portal status` 查看路由状态。完整说明见 [Tool Gateway 文档](https://hermes-agent.nousresearch.com/docs/user-guide/features/tool-gateway)。
你随时可以按工具单独切回自己的 API Key — Gateway 是按工具粒度生效的,不是一刀切。
---
## CLI 与消息平台 快速对照
Hermes 有两种入口:用 `hermes` 启动终端 UI,或运行网关从 Telegram、Discord、Slack、WhatsApp、Signal 或 Email 与之对话。进入对话后,许多斜杠命令在两种界面中通用。
+5 -1
View File
@@ -1534,7 +1534,11 @@ class HermesACPAgent(acp.Agent):
)
except Exception:
logger.debug("Failed to auto-title ACP session %s", session_id, exc_info=True)
if final_response and conn and not streamed_message:
if final_response and conn and (not streamed_message or result.get("response_transformed")):
# Deliver the final response when streaming did not already send it,
# or when a plugin hook transformed the response after streaming
# finished (e.g. transform_llm_output) — otherwise the appended /
# rewritten text never reaches the client.
update = acp.update_agent_message_text(final_response)
await conn.session_update(session_id, update)
+33 -9
View File
@@ -607,6 +607,31 @@ def init_agent(
# Falling back would send Anthropic credentials to third-party endpoints (Fixes #1739, #minimax-401).
_is_native_anthropic = agent.provider == "anthropic"
effective_key = (api_key or resolve_anthropic_token() or "") if _is_native_anthropic else (api_key or "")
# MiniMax OAuth issues short-lived (~15-min) access tokens. The
# Anthropic SDK caches ``api_key`` as a static string at client
# construction time, so a session that resolves the bearer once
# at startup will keep sending the same token until MiniMax
# returns 401 mid-session. Swap the static string for a callable
# token provider — ``build_anthropic_client`` recognizes the
# callable and installs an httpx event hook that mints a fresh
# bearer per outbound request (re-reading auth.json so a refresh
# persisted by another process is visible immediately).
# The cached refresh path is a no-op when the token still has
# ``MINIMAX_OAUTH_REFRESH_SKEW_SECONDS`` of life left, so steady-
# state cost is one file read + one timestamp compare per request.
if agent.provider == "minimax-oauth" and isinstance(effective_key, str) and effective_key:
try:
from hermes_cli.auth import build_minimax_oauth_token_provider
effective_key = build_minimax_oauth_token_provider()
except Exception as _mm_exc: # noqa: BLE001 — never block startup on this
import logging as _logging
_logging.getLogger(__name__).warning(
"MiniMax OAuth: failed to install per-request token provider "
"(%s); falling back to static bearer that will expire ~15min in.",
_mm_exc,
)
agent.api_key = effective_key
agent._anthropic_api_key = effective_key
agent._anthropic_base_url = base_url
@@ -618,7 +643,7 @@ def init_agent(
# that cause 401/403 on their endpoints. Guards #1739 and
# the third-party identity-injection bug.
from agent.anthropic_adapter import _is_oauth_token as _is_oat
agent._is_anthropic_oauth = _is_oat(effective_key) if _is_native_anthropic else False
agent._is_anthropic_oauth = _is_oat(effective_key) if (_is_native_anthropic and isinstance(effective_key, str)) else False
agent._anthropic_client = build_anthropic_client(effective_key, base_url, timeout=_provider_timeout)
# No OpenAI client needed for Anthropic mode
agent.client = None
@@ -951,16 +976,14 @@ def init_agent(
# Expose session ID to tools (terminal, execute_code) so agents can
# reference their own session for --resume commands, cross-session
# coordination, and logging. Uses the ContextVar system from
# session_context.py for concurrency safety (gateway runs multiple
# sessions in one process). Also writes os.environ as fallback for
# CLI mode where ContextVars aren't used.
os.environ["HERMES_SESSION_ID"] = agent.session_id
# coordination, and logging. Keep the ContextVar and os.environ
# fallback synchronized because different tool paths still read both.
try:
from gateway.session_context import _SESSION_ID
_SESSION_ID.set(agent.session_id)
from gateway.session_context import set_current_session_id
set_current_session_id(agent.session_id)
except Exception:
pass # CLI/test mode — ContextVar not needed
os.environ["HERMES_SESSION_ID"] = agent.session_id
# Session logs go into ~/.hermes/sessions/ alongside gateway sessions
hermes_home = get_hermes_home()
@@ -1404,6 +1427,7 @@ def init_agent(
base_url=agent.base_url,
api_key=getattr(agent, "api_key", ""),
provider=agent.provider,
api_mode=agent.api_mode,
)
if not agent.quiet_mode:
_ra().logger.info("Using context engine: %s", _selected_engine.name)
+128 -31
View File
@@ -41,6 +41,7 @@ from agent.message_sanitization import (
)
from agent.tool_dispatch_helpers import _trajectory_normalize_msg, make_tool_result_message
from agent.trajectory import convert_scratchpad_to_think
from agent.credential_pool import STATUS_EXHAUSTED
from agent.error_classifier import classify_api_error, FailoverReason
from utils import base_url_host_matches, base_url_hostname, env_var_enabled, atomic_json_write
@@ -132,7 +133,7 @@ def convert_to_trajectory_format(agent, messages: List[Dict[str, Any]], user_que
except json.JSONDecodeError:
# This shouldn't happen since we validate and retry during conversation,
# but if it does, log warning and use empty dict
logging.warning(f"Unexpected invalid JSON in trajectory conversion: {tool_call['function']['arguments'][:100]}")
logger.warning(f"Unexpected invalid JSON in trajectory conversion: {tool_call['function']['arguments'][:100]}")
arguments = {}
tool_call_json = {
@@ -582,12 +583,37 @@ def recover_with_credential_pool(
return False, has_retried_429
if effective_reason == FailoverReason.rate_limit:
# If current credential is already marked exhausted, skip retry and
# rotate immediately. This prevents the "cancel-between-429s" trap
# where has_retried_429 (a local var) gets reset on each new prompt,
# causing the pool to retry the same exhausted credential forever.
current_entry = pool.current()
current_last_status = getattr(current_entry, "last_status", None) if current_entry else None
if current_last_status == STATUS_EXHAUSTED:
_ra().logger.info(
"Credential already exhausted (last_status=%s) — rotating immediately instead of retrying",
current_last_status,
)
rotate_status = status_code if status_code is not None else 429
next_entry = pool.mark_exhausted_and_rotate(status_code=rotate_status, error_context=error_context)
if next_entry is not None:
_ra().logger.info(
"Credential %s (rate limit, pre-exhausted) — rotated to pool entry %s",
rotate_status,
getattr(next_entry, "id", "?"),
)
agent._swap_credential(next_entry)
return True, False
return False, True
usage_limit_reached = False
if error_context:
context_reason = str(error_context.get("reason") or "").lower()
context_message = str(error_context.get("message") or "").lower()
usage_limit_reached = (
"usage_limit_reached" in context_reason
or "gousagelimit" in context_reason
or "usage limit reached" in context_message
or "usage limit has been reached" in context_message
)
if not has_retried_429 and not usage_limit_reached:
@@ -617,9 +643,28 @@ def recover_with_credential_pool(
# existing entitlement keyword set in ``_is_entitlement_failure``.
# Any 403 against ``xai-oauth`` is treated as entitlement here so
# the refresh loop can't spin in those cases either.
#
# Exception (#29344): xAI's ``[WKE=unauthenticated:...]`` suffix and
# the ``OAuth2 access token could not be validated`` phrasing are
# xAI's authoritative "this is a stale token, not entitlement"
# signal. When either fires we must NOT apply the catch-all
# override — refresh is the recoverable path for these bodies, and
# blanket-classifying them as entitlement was the bug that left
# long-running TUI sessions stuck on stale tokens until the user
# exited and reopened.
is_entitlement = agent._is_entitlement_failure(error_context, status_code)
if not is_entitlement and status_code == 403 and (agent.provider or "") == "xai-oauth":
is_entitlement = True
_disambiguator_haystack = " ".join(
str(error_context.get(k) or "").lower()
for k in ("message", "reason", "code", "error")
if isinstance(error_context, dict)
)
_is_xai_auth_failure = (
"[wke=unauthenticated:" in _disambiguator_haystack
or "oauth2 access token could not be validated" in _disambiguator_haystack
)
if not _is_xai_auth_failure:
is_entitlement = True
if is_entitlement:
_ra().logger.info(
"Credential %s — entitlement-shaped 403 from %s; "
@@ -728,7 +773,7 @@ def try_recover_primary_transport(
time.sleep(wait_time)
return True
except Exception as e:
logging.warning("Primary transport recovery failed: %s", e)
logger.warning("Primary transport recovery failed: %s", e)
return False
# ── End provider fallback ──────────────────────────────────────────────
@@ -891,19 +936,20 @@ def restore_primary_runtime(agent) -> bool:
base_url=rt["compressor_base_url"],
api_key=rt["compressor_api_key"],
provider=rt["compressor_provider"],
api_mode=rt.get("compressor_api_mode", ""),
)
# ── Reset fallback chain for the new turn ──
agent._fallback_activated = False
agent._fallback_index = 0
logging.info(
logger.info(
"Primary runtime restored for new turn: %s (%s)",
agent.model, agent.provider,
)
return True
except Exception as e:
logging.warning("Failed to restore primary runtime: %s", e)
logger.warning("Failed to restore primary runtime: %s", e)
return False
# Which error types indicate a transient transport failure worth
@@ -1064,10 +1110,7 @@ def dump_api_request_debug(
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
dump_file = agent.logs_dir / f"request_dump_{agent.session_id}_{timestamp}.json"
dump_file.write_text(
json.dumps(dump_payload, ensure_ascii=False, indent=2, default=str),
encoding="utf-8",
)
atomic_json_write(dump_file, dump_payload, default=str)
agent._vprint(f"{agent.log_prefix}🧾 Request debug dump written to: {dump_file}")
@@ -1077,7 +1120,7 @@ def dump_api_request_debug(
return dump_file
except Exception as dump_error:
if agent.verbose_logging:
logging.warning(f"Failed to dump API request debug payload: {dump_error}")
logger.warning(f"Failed to dump API request debug payload: {dump_error}")
return None
@@ -1352,6 +1395,22 @@ def switch_model(agent, new_model, new_provider, api_key='', base_url='', api_mo
# API key — falling back would send Anthropic credentials to third-party endpoints.
_is_native_anthropic = new_provider == "anthropic"
effective_key = (api_key or agent.api_key or resolve_anthropic_token() or "") if _is_native_anthropic else (api_key or agent.api_key or "")
# MiniMax OAuth: swap static string for a per-request callable token
# provider so the rebuilt client survives 15-min token expiry. See
# the matching block in agent_init.py for the full rationale.
if new_provider == "minimax-oauth" and isinstance(effective_key, str) and effective_key:
try:
from hermes_cli.auth import build_minimax_oauth_token_provider
effective_key = build_minimax_oauth_token_provider()
except Exception as _mm_exc: # noqa: BLE001
import logging as _logging
_logging.getLogger(__name__).warning(
"MiniMax OAuth: failed to install per-request token provider "
"on switch (%s); using static bearer.",
_mm_exc,
)
agent.api_key = effective_key
agent._anthropic_api_key = effective_key
agent._anthropic_base_url = base_url or getattr(agent, "_anthropic_base_url", None)
@@ -1359,7 +1418,7 @@ def switch_model(agent, new_model, new_provider, api_key='', base_url='', api_mo
effective_key, agent._anthropic_base_url,
timeout=get_provider_request_timeout(agent.provider, agent.model),
)
agent._is_anthropic_oauth = _is_oauth_token(effective_key) if _is_native_anthropic else False
agent._is_anthropic_oauth = _is_oauth_token(effective_key) if (_is_native_anthropic and isinstance(effective_key, str)) else False
agent.client = None
agent._client_kwargs = {}
else:
@@ -1446,6 +1505,7 @@ def switch_model(agent, new_model, new_provider, api_key='', base_url='', api_mo
"compressor_api_key": getattr(_cc, "api_key", "") if _cc else "",
"compressor_provider": getattr(_cc, "provider", agent.provider) if _cc else agent.provider,
"compressor_context_length": _cc.context_length if _cc else 0,
"compressor_api_mode": getattr(_cc, "api_mode", agent.api_mode) if _cc else agent.api_mode,
"compressor_threshold_tokens": _cc.threshold_tokens if _cc else 0,
}
if api_mode == "anthropic_messages":
@@ -1477,7 +1537,7 @@ def switch_model(agent, new_model, new_provider, api_key='', base_url='', api_mo
agent._fallback_chain = fallback_chain
agent._fallback_model = fallback_chain[0] if fallback_chain else None
logging.info(
logger.info(
"Model switched in-place: %s (%s) -> %s (%s)",
old_model, old_provider, new_model, new_provider,
)
@@ -2032,19 +2092,33 @@ def extract_api_error_context(error: Exception) -> Dict[str, Any]:
if "reset_at" not in context:
message = context.get("message") or ""
if isinstance(message, str):
delay_match = re.search(r"quotaResetDelay[:\s\"]+(\\d+(?:\\.\\d+)?)(ms|s)", message, re.IGNORECASE)
delay_match = re.search(r"quotaResetDelay[:\s\"]+(\d+(?:\.\d+)?)(ms|s)", message, re.IGNORECASE)
if delay_match:
value = float(delay_match.group(1))
seconds = value / 1000.0 if delay_match.group(2).lower() == "ms" else value
context["reset_at"] = time.time() + seconds
else:
sec_match = re.search(
r"retry\s+(?:after\s+)?(\d+(?:\.\d+)?)\s*(?:sec|secs|seconds|s\b)",
resets_in_match = re.search(
r"resets?\s+in\s+"
r"(?:(\d+(?:\.\d+)?)\s*(?:h|hr|hrs|hour|hours)\b\s*)?"
r"(?:(\d+(?:\.\d+)?)\s*(?:m|min|mins|minute|minutes)\b\s*)?"
r"(?:(\d+(?:\.\d+)?)\s*(?:s|sec|secs|second|seconds)\b)?",
message,
re.IGNORECASE,
)
if sec_match:
context["reset_at"] = time.time() + float(sec_match.group(1))
if resets_in_match and any(resets_in_match.groups()):
hours = float(resets_in_match.group(1) or 0)
minutes = float(resets_in_match.group(2) or 0)
seconds = float(resets_in_match.group(3) or 0)
context["reset_at"] = time.time() + (hours * 3600) + (minutes * 60) + seconds
else:
sec_match = re.search(
r"retry\s+(?:after\s+)?(\d+(?:\.\d+)?)\s*(?:sec|secs|seconds|s\b)",
message,
re.IGNORECASE,
)
if sec_match:
context["reset_at"] = time.time() + float(sec_match.group(1))
return context
@@ -2116,33 +2190,56 @@ def apply_pending_steer_to_tool_results(agent, messages: list, num_tool_msgs: in
def force_close_tcp_sockets(client: Any) -> int:
"""Force-close underlying TCP sockets to prevent CLOSE-WAIT accumulation.
"""Abort in-flight TCP I/O by shutting down sockets WITHOUT closing FDs.
When a provider drops a connection mid-stream, httpx's ``client.close()``
performs a graceful shutdown which leaves sockets in CLOSE-WAIT until the
OS times them out (often minutes). This method walks the httpx transport
pool and issues ``socket.shutdown(SHUT_RDWR)`` + ``socket.close()`` to
force an immediate TCP RST, freeing the file descriptors.
When a provider drops a connection mid-stream — or the user issues an
interrupt — we want to unblock httpx's reader/writer immediately rather
than waiting for the kernel's per-connection timeout. ``shutdown(SHUT_RDWR)``
achieves that: it sends FIN, breaks any pending ``recv``/``send`` with EOF
or ``EPIPE``, but does NOT release the file descriptor.
Returns the number of sockets force-closed.
Historically this helper also called ``socket.close()`` so the FD got
released immediately, but that's unsafe when (as is the case for both the
interrupt-abort path and stale-call kill path) the helper runs on a
different thread than the one driving the request:
* The Python ``socket.socket`` we close here is the SAME object held by
httpx's pool, so closing it via Python sets its ``_fd`` to -1 and
future operations on that Python object fail safely.
* BUT the SSL wrapper (``ssl.SSLSocket``'s underlying OpenSSL ``BIO``)
caches the raw integer FD. Once ``os.close(fd)`` runs, the kernel may
immediately recycle that integer to the next ``open()`` call — e.g.
the kanban dispatcher opening ``kanban.db``.
* The owning worker thread then unwinds httpx, the SSL layer flushes a
pending TLS record, and the encrypted bytes get written into the
wrong file (issue #29507: 24-byte TLS application-data record
clobbering SQLite header bytes 5..28).
The fix is to let the owning thread own the close. ``shutdown()`` from any
thread is FD-safe; ``close()`` is not. The httpx connection's own close
path — which runs from the worker thread when it unwinds — will release
the FD via the same ``socket.socket`` object, and because Python's socket
close atomically swaps ``_fd`` to -1 *before* issuing ``os.close``, there
is no FD-aliasing window when only one thread closes.
Returns the number of sockets shut down. (Field kept as
``tcp_force_closed=N`` in the log line for backwards-compatible parsing.)
"""
import socket as _socket
closed = 0
shutdown_count = 0
try:
for sock in _iter_pool_sockets(client):
try:
sock.shutdown(_socket.SHUT_RDWR)
except OSError:
# Already shut down / not connected / FD invalid — all benign.
pass
try:
sock.close()
except OSError:
pass
closed += 1
# IMPORTANT (#29507): do NOT call sock.close() here. See docstring.
shutdown_count += 1
except Exception as exc:
_ra().logger.debug("Force-close TCP sockets sweep error: %s", exc)
return closed
return shutdown_count
+35 -6
View File
@@ -15,6 +15,8 @@ import json
import logging
import os
import platform
import secrets
import stat
import subprocess
from pathlib import Path
from urllib.parse import urlparse
@@ -1040,11 +1042,34 @@ def _write_claude_code_credentials(
existing["claudeAiOauth"] = oauth_data
cred_path.parent.mkdir(parents=True, exist_ok=True)
_tmp_cred = cred_path.with_suffix(".tmp")
_tmp_cred.write_text(json.dumps(existing, indent=2), encoding="utf-8")
_tmp_cred.replace(cred_path)
# Restrict permissions (credentials file)
cred_path.chmod(0o600)
# Per-process random suffix avoids collisions between concurrent
# writers and stale leftovers from a prior crashed write.
_tmp_cred = cred_path.with_suffix(f".tmp.{os.getpid()}.{secrets.token_hex(4)}")
try:
# Create the temp file atomically at 0o600. The previous
# write_text + post-replace chmod opened a TOCTOU window where
# both the temp file and the destination briefly inherited the
# process umask (commonly 0o644 = world-readable), exposing
# Claude Code OAuth tokens to other local users between create
# and chmod. Mirrors agent/google_oauth.py (#19673) and
# tools/mcp_oauth.py (#21148). Parent dir (~/.claude/) is
# owned by Claude Code itself, so we leave its mode alone.
fd = os.open(
str(_tmp_cred),
os.O_WRONLY | os.O_CREAT | os.O_EXCL,
stat.S_IRUSR | stat.S_IWUSR,
)
with os.fdopen(fd, "w", encoding="utf-8") as fh:
json.dump(existing, fh, indent=2)
fh.flush()
os.fsync(fh.fileno())
os.replace(_tmp_cred, cred_path)
except OSError:
try:
_tmp_cred.unlink(missing_ok=True)
except OSError:
pass
raise
except (OSError, IOError) as e:
logger.debug("Failed to write refreshed credentials: %s", e)
@@ -2122,9 +2147,13 @@ def build_anthropic_kwargs(
block["text"] = text
# 3. Prefix tool names with mcp_ (Claude Code convention)
# Skip names that already begin with the marker — native MCP server
# tools (from mcp_servers: in config.yaml) are registered under their
# full mcp_<server>_<tool> name and would double-prefix otherwise,
# breaking round-trip registry lookup in normalize_response. GH-25255.
if anthropic_tools:
for tool in anthropic_tools:
if "name" in tool:
if "name" in tool and not tool["name"].startswith(_MCP_TOOL_PREFIX):
tool["name"] = _MCP_TOOL_PREFIX + tool["name"]
# 4. Prefix tool names in message history (tool_use and tool_result blocks)
+271 -47
View File
@@ -1406,6 +1406,9 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
for provider_id, pconfig in PROVIDER_REGISTRY.items():
if pconfig.auth_type != "api_key":
continue
if _is_provider_unhealthy(provider_id):
logger.debug("Auxiliary api-key chain: %s is unhealthy, skipping", provider_id)
continue
if provider_id == "anthropic":
# Only try anthropic when the user has explicitly configured it.
# Without this gate, Claude Code credentials get silently used
@@ -2260,11 +2263,12 @@ def _is_payment_error(exc: Exception) -> bool:
"credits", "insufficient funds",
"can only afford", "billing",
"payment required",
# Daily / monthly quota exhaustion keywords
# Daily / monthly / weekly quota exhaustion keywords
"quota exceeded", "quota_exceeded",
"too many tokens per day", "daily limit",
"tokens per day", "daily quota",
"resource exhausted", # Vertex AI / gRPC quota errors
"weekly usage limit", "weekly limit", # OpenCode Go weekly subscription cap
)):
return True
return False
@@ -2478,7 +2482,11 @@ def _pool_error_context(exc: Exception) -> Dict[str, Any]:
return payload
def _recoverable_pool_provider(resolved_provider: str, client: Any) -> Optional[str]:
def _recoverable_pool_provider(
resolved_provider: str,
client: Any,
main_runtime: Optional[Dict[str, Any]] = None,
) -> Optional[str]:
"""Infer which provider pool can recover the current auxiliary client."""
normalized = _normalize_aux_provider(resolved_provider)
if normalized not in {"", "auto", "custom"}:
@@ -2496,11 +2504,33 @@ def _recoverable_pool_provider(resolved_provider: str, client: Any) -> Optional[
return "copilot"
if base_url_host_matches(base, "api.kimi.com"):
return "kimi-coding"
# For api_key providers not in the hardcoded list (e.g. opencode-go), match
# the client base URL against all registered api_key providers so that
# credential-pool rotation works for any provider the user configured.
if main_runtime:
rt = _normalize_main_runtime(main_runtime)
rt_provider = rt.get("provider", "")
if rt_provider and rt_provider not in {"", "auto", "custom"}:
try:
from hermes_cli.auth import PROVIDER_REGISTRY
pconfig = PROVIDER_REGISTRY.get(rt_provider)
if pconfig and getattr(pconfig, "auth_type", None) == "api_key":
rt_base = str(getattr(pconfig, "inference_base_url", "") or "").rstrip("/")
if rt_base and base_url_host_matches(base, base_url_hostname(rt_base)):
return rt_provider
except Exception:
pass
return None
def _recover_provider_pool(provider: str, exc: Exception) -> bool:
"""Try same-provider credential-pool recovery for auxiliary calls."""
def _recover_provider_pool(provider: str, exc: Exception, *, failed_api_key: str = "") -> bool:
"""Try same-provider credential-pool recovery for auxiliary calls.
``failed_api_key`` is the API key that was actually used for the failing
request. Passing it lets mark_exhausted_and_rotate identify the correct
pool entry even when another process has already rotated the pool (which
would leave current() as None, causing the wrong entry to be marked).
"""
normalized = _normalize_aux_provider(provider)
try:
pool = load_pool(normalized)
@@ -2512,6 +2542,7 @@ def _recover_provider_pool(provider: str, exc: Exception) -> bool:
status_code = getattr(exc, "status_code", None)
error_context = _pool_error_context(exc)
hint = failed_api_key or None
if _is_auth_error(exc):
refreshed = pool.try_refresh_current()
@@ -2521,6 +2552,7 @@ def _recover_provider_pool(provider: str, exc: Exception) -> bool:
next_entry = pool.mark_exhausted_and_rotate(
status_code=status_code if status_code is not None else 401,
error_context=error_context,
api_key_hint=hint,
)
if next_entry is not None:
_evict_cached_clients(normalized)
@@ -2532,6 +2564,7 @@ def _recover_provider_pool(provider: str, exc: Exception) -> bool:
next_entry = pool.mark_exhausted_and_rotate(
status_code=status_code if status_code is not None else fallback_status,
error_context=error_context,
api_key_hint=hint,
)
if next_entry is not None:
_evict_cached_clients(normalized)
@@ -2936,6 +2969,11 @@ def _resolve_auto(main_runtime: Optional[Dict[str, Any]] = None) -> Tuple[Option
resolved_provider = "custom"
explicit_base_url = runtime_base_url
explicit_api_key = runtime_api_key or None
elif runtime_api_key:
# Pin auxiliary to the same api_key as the active main chat session
# so that a working key is reused instead of re-selecting from the pool
# (which might pick a different, potentially exhausted key).
explicit_api_key = runtime_api_key
# Skip Step-1 if the main provider was recently 402'd. The unhealthy
# cache TTL bounds how long we bypass it, so a topped-up account
# recovers automatically. If we tried Step-1 anyway, every aux call
@@ -3116,6 +3154,34 @@ def resolve_provider_client(
# Normalise aliases
provider = _normalize_aux_provider(provider)
# Universal model-resolution fallback chain. Callers (notably title
# generation, vision, session search, and other auxiliary tasks) can
# reach this function without an explicit model — the user picked their
# main provider, didn't bother configuring a per-task ``auxiliary.<task>.model``,
# and just expects "use my main model for side tasks too." Resolve in
# this order, stopping at the first non-empty answer:
#
# 1. ``model`` argument (caller knew what they wanted)
# 2. Provider's catalog default — cheap/fast model the provider
# registered via ``ProviderProfile.default_aux_model`` or the
# legacy ``_API_KEY_PROVIDER_AUX_MODELS_FALLBACK`` dict. Empty
# string for OAuth-gated providers (openai-codex, xai-oauth)
# whose accepted-model lists drift on the backend, so we don't
# pin a default that can silently rot.
# 3. User's main model from ``model.model`` in config.yaml. This is
# the load-bearing step for OAuth providers: an xai-oauth user
# with grok-4.3 configured gets grok-4.3 for title generation
# instead of silently dropping to whatever Step-2 fallback (#31845).
#
# Each provider branch below sees a non-empty ``model`` whenever the
# user has *anything* configured — no provider-specific empty-model
# guards needed. When the user has NOTHING configured (fresh install,
# main_model also empty), the branches still hit their own
# missing-credentials returns and ``_resolve_auto`` falls through to
# the Step-2 chain as before.
if not model:
model = _get_aux_model_for_provider(provider) or _read_main_model() or model
def _needs_codex_wrap(client_obj, base_url_str: str, model_str: str) -> bool:
"""Decide if a plain OpenAI client should be wrapped for Responses API.
@@ -3260,7 +3326,7 @@ def resolve_provider_client(
if client is None:
logger.warning(
"resolve_provider_client: xai-oauth requested but no xAI "
"OAuth token found (run: hermes model -> xAI Grok OAuth — SuperGrok Subscription)"
"OAuth token found (run: hermes model -> xAI Grok OAuth — SuperGrok / Premium+)"
)
return None, None
final_model = _normalize_resolved_model(model or default, provider)
@@ -3730,6 +3796,37 @@ _VISION_AUTO_PROVIDER_ORDER = (
)
def _main_model_supports_vision(provider: str, model: Optional[str]) -> bool:
"""Return True when ``provider``/``model`` is known to accept image input.
Used by the vision auto-detect chain to skip the user's main provider
when it's known to be text-only (e.g. DeepSeek, gpt-oss without vision).
Without this guard, ``resolve_vision_provider_client(provider="auto")``
would happily return the main-provider client and any subsequent image
payload would surface as a cryptic provider-side error
(``unknown variant `image_url`, expected `text```, #31179).
Returns True when capability lookup is unknown preserves the historical
behaviour of attempting the call, so providers we haven't catalogued yet
don't silently regress to text-only.
"""
try:
from agent.image_routing import _lookup_supports_vision
from hermes_cli.config import load_config
except ImportError:
return True
try:
supports = _lookup_supports_vision(provider, model, load_config())
except Exception: # pragma: no cover - defensive
return True
if supports is None:
# No capability data — keep current behaviour and let the call attempt
# happen rather than silently skipping. This avoids false-positive
# skips for new/custom providers.
return True
return bool(supports)
def _normalize_vision_provider(provider: Optional[str]) -> str:
return _normalize_aux_provider(provider)
@@ -3870,6 +3967,23 @@ def resolve_vision_provider_client(
"vision support) — falling through to aggregator chain",
main_provider,
)
elif not _main_model_supports_vision(main_provider, vision_model):
# The main model is known to be text-only (e.g. DeepSeek V4,
# gpt-oss-120b without vision). Building a client and sending
# an image would produce a cryptic provider-side error like
# ``unknown variant `image_url`, expected `text``` (#31179).
# Fall through to the aggregator chain instead.
#
# Only log the provider name (not the model) — mirrors the
# sibling _PROVIDERS_WITHOUT_VISION branch above, and avoids
# CodeQL py/clear-text-logging-sensitive-data heuristic false
# positives on multi-value interpolations.
logger.debug(
"Vision auto-detect: skipping main provider %s "
"(reports no vision capability) — falling through to "
"aggregator chain",
main_provider,
)
else:
rpc_client, rpc_model = resolve_provider_client(
main_provider, vision_model,
@@ -4252,13 +4366,25 @@ def _get_cached_client(
else:
effective = _compat_model(cached_client, model, cached_default)
return cached_client, effective
# Build outside the lock
# Build outside the lock.
# For pool-backed api_key providers, derive the active API key from the
# pool entry rather than from env vars. resolve_api_key_provider_credentials
# always prefers env vars (first-entry bias), which bypasses pool rotation:
# after key #1 is marked exhausted the retry would still get key #1 from
# the env var and fail again, causing the retry2_err handler to mark key #2.
effective_api_key = api_key
if not effective_api_key:
_pe = _peek_pool_entry(_normalize_aux_provider(provider))
if _pe is not None:
_pk = _pool_runtime_api_key(_pe)
if _pk:
effective_api_key = _pk
client, default_model = resolve_provider_client(
provider,
model,
async_mode,
explicit_base_url=base_url,
explicit_api_key=api_key,
explicit_api_key=effective_api_key,
api_mode=api_mode,
main_runtime=runtime,
is_vision=is_vision,
@@ -4281,6 +4407,23 @@ def _get_cached_client(
return client, model or default_model
# Aliases that target direct REST APIs not modeled as first-class providers
# in PROVIDER_REGISTRY. Used for ``auxiliary.<task>.provider`` so users can
# write the obvious name and have it resolve to a working ``custom`` endpoint
# without needing to know our internal provider IDs.
#
# Why these specifically: PROVIDER_REGISTRY has ``openai-codex`` (OAuth) and
# ``custom`` (manual base_url + OPENAI_API_KEY) but no plain ``openai`` for
# direct API-key access. Users predictably type ``provider: openai`` and
# expect it to use OPENAI_API_KEY against api.openai.com. Previously this
# silently fell back to the user's main provider, sending OpenAI model names
# to e.g. DeepSeek and producing cryptic ``unknown variant 'image_url'``
# errors (issue #31179).
_AUX_DIRECT_API_BASE_URLS: Dict[str, str] = {
"openai": "https://api.openai.com/v1",
}
def _resolve_task_provider_model(
task: str = None,
provider: str = None,
@@ -4317,6 +4460,25 @@ def _resolve_task_provider_model(
resolved_model = model or cfg_model
resolved_api_mode = cfg_api_mode
# Convenience aliases for direct API-key endpoints that aren't first-class
# providers (e.g. ``provider: openai`` → custom + api.openai.com/v1).
# Applied to both explicit args and config-derived values. When the user
# has already supplied a base_url we keep their endpoint but still rewrite
# the provider to ``custom`` so resolution doesn't hit the
# PROVIDER_REGISTRY-only path (which has no ``openai`` entry).
def _expand_direct_api_alias(prov: Optional[str], existing_base: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
if not prov:
return prov, existing_base
target_base = _AUX_DIRECT_API_BASE_URLS.get(prov.strip().lower())
if target_base is None:
return prov, existing_base
return "custom", existing_base or target_base
if provider:
provider, base_url = _expand_direct_api_alias(provider, base_url)
if cfg_provider:
cfg_provider, cfg_base_url = _expand_direct_api_alias(cfg_provider, cfg_base_url)
if base_url:
return "custom", resolved_model, base_url, api_key, resolved_api_mode
if provider:
@@ -4344,7 +4506,17 @@ _DEFAULT_AUX_TIMEOUT = 30.0
def _get_auxiliary_task_config(task: str) -> Dict[str, Any]:
"""Return the config dict for auxiliary.<task>, or {} when unavailable."""
"""Return the config dict for auxiliary.<task>, or {} when unavailable.
For plugin-registered auxiliary tasks (see
:meth:`hermes_cli.plugins.PluginContext.register_auxiliary_task`) the
plugin's declared *defaults* are layered underneath the user's config
so an unconfigured plugin task still works:
plugin defaults config.yaml auxiliary.<task> (user wins)
Built-in tasks ignore this path (their defaults live in DEFAULT_CONFIG).
"""
if not task:
return {}
try:
@@ -4354,7 +4526,27 @@ def _get_auxiliary_task_config(task: str) -> Dict[str, Any]:
return {}
aux = config.get("auxiliary", {}) if isinstance(config, dict) else {}
task_config = aux.get(task, {}) if isinstance(aux, dict) else {}
return task_config if isinstance(task_config, dict) else {}
if not isinstance(task_config, dict):
task_config = {}
# Layer plugin-declared defaults underneath user config so
# ctx.register_auxiliary_task(defaults={...}) takes effect without
# forcing the user to write config.yaml entries.
try:
from hermes_cli.plugins import get_plugin_auxiliary_tasks
for _entry in get_plugin_auxiliary_tasks():
if _entry.get("key") == task:
_defaults = _entry.get("defaults") or {}
if isinstance(_defaults, dict):
merged = dict(_defaults)
merged.update(task_config)
return merged
break
except Exception:
# Plugin discovery failure must not break aux task config reads.
pass
return task_config
def _get_task_timeout(task: str, default: float = _DEFAULT_AUX_TIMEOUT) -> float:
@@ -4806,10 +4998,17 @@ def call_llm(
)
# ── Same-provider credential-pool recovery ─────────────────────
pool_provider = _recoverable_pool_provider(resolved_provider, client)
pool_provider = _recoverable_pool_provider(resolved_provider, client, main_runtime=main_runtime)
# Capture the exact API key used so mark_exhausted_and_rotate can find
# the correct pool entry even when another process rotated the pool
# between this call and recovery (which leaves current()=None and makes
# _select_unlocked() return the NEXT key by mistake).
_client_api_key = str(getattr(client, "api_key", "") or "")
if pool_provider and (_is_auth_error(first_err) or _is_payment_error(first_err) or _is_rate_limit_error(first_err)):
recovery_err = first_err
if _is_rate_limit_error(first_err):
# Skip the extra retry for clear payment/quota errors — the endpoint
# won't accept another request with the same exhausted key.
if _is_rate_limit_error(first_err) and not _is_payment_error(first_err):
try:
return _validate_llm_response(
client.chat.completions.create(**kwargs), task)
@@ -4817,27 +5016,40 @@ def call_llm(
if not (_is_auth_error(retry_err) or _is_payment_error(retry_err) or _is_rate_limit_error(retry_err)):
raise
recovery_err = retry_err
if _recover_provider_pool(pool_provider, recovery_err):
if _recover_provider_pool(pool_provider, recovery_err, failed_api_key=_client_api_key):
logger.info(
"Auxiliary %s: recovered %s via credential-pool rotation after %s",
task or "call", pool_provider, type(recovery_err).__name__,
)
return _retry_same_provider_sync(
task=task,
resolved_provider=resolved_provider,
resolved_model=resolved_model,
resolved_base_url=resolved_base_url,
resolved_api_key=resolved_api_key,
resolved_api_mode=resolved_api_mode,
main_runtime=main_runtime,
final_model=final_model,
messages=messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
effective_timeout=effective_timeout,
effective_extra_body=effective_extra_body,
)
try:
return _retry_same_provider_sync(
task=task,
resolved_provider=resolved_provider,
resolved_model=resolved_model,
resolved_base_url=resolved_base_url,
resolved_api_key=resolved_api_key,
resolved_api_mode=resolved_api_mode,
main_runtime=main_runtime,
final_model=final_model,
messages=messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
effective_timeout=effective_timeout,
effective_extra_body=effective_extra_body,
)
except Exception as retry2_err:
# The rotated key also hit a quota/auth wall. Mark it
# immediately so concurrent processes don't make a
# redundant API call to discover it's exhausted too.
# Then fall through to the payment fallback below so
# alternative providers can still serve the request.
if (_is_payment_error(retry2_err) or _is_auth_error(retry2_err)
or _is_rate_limit_error(retry2_err)):
_recover_provider_pool(pool_provider, retry2_err)
first_err = retry2_err
else:
raise
# ── Payment / credit exhaustion fallback ──────────────────────
# When the resolved provider returns 402 or a credit-related error,
@@ -4879,7 +5091,7 @@ def call_llm(
# 402). Mark THAT label unhealthy so subsequent aux calls
# skip it instead of paying another doomed RTT.
_mark_provider_unhealthy(
_recoverable_pool_provider(resolved_provider, client) or resolved_provider
_recoverable_pool_provider(resolved_provider, client, main_runtime=main_runtime) or resolved_provider
)
elif _is_rate_limit_error(first_err):
reason = "rate limit"
@@ -4999,6 +5211,7 @@ async def async_call_llm(
model: str = None,
base_url: str = None,
api_key: str = None,
main_runtime: Optional[Dict[str, Any]] = None,
messages: list,
temperature: float = None,
max_tokens: int = None,
@@ -5185,10 +5398,13 @@ async def async_call_llm(
)
# ── Same-provider credential-pool recovery (mirrors sync) ─────
pool_provider = _recoverable_pool_provider(resolved_provider, client)
pool_provider = _recoverable_pool_provider(resolved_provider, client, main_runtime=main_runtime)
_client_api_key = str(getattr(client, "api_key", "") or "")
if pool_provider and (_is_auth_error(first_err) or _is_payment_error(first_err) or _is_rate_limit_error(first_err)):
recovery_err = first_err
if _is_rate_limit_error(first_err):
# Skip the extra retry for clear payment/quota errors — the endpoint
# won't accept another request with the same exhausted key.
if _is_rate_limit_error(first_err) and not _is_payment_error(first_err):
try:
return _validate_llm_response(
await client.chat.completions.create(**kwargs), task)
@@ -5196,26 +5412,34 @@ async def async_call_llm(
if not (_is_auth_error(retry_err) or _is_payment_error(retry_err) or _is_rate_limit_error(retry_err)):
raise
recovery_err = retry_err
if _recover_provider_pool(pool_provider, recovery_err):
if _recover_provider_pool(pool_provider, recovery_err, failed_api_key=_client_api_key):
logger.info(
"Auxiliary %s (async): recovered %s via credential-pool rotation after %s",
task or "call", pool_provider, type(recovery_err).__name__,
)
return await _retry_same_provider_async(
task=task,
resolved_provider=resolved_provider,
resolved_model=resolved_model,
resolved_base_url=resolved_base_url,
resolved_api_key=resolved_api_key,
resolved_api_mode=resolved_api_mode,
final_model=final_model,
messages=messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
effective_timeout=effective_timeout,
effective_extra_body=effective_extra_body,
)
try:
return await _retry_same_provider_async(
task=task,
resolved_provider=resolved_provider,
resolved_model=resolved_model,
resolved_base_url=resolved_base_url,
resolved_api_key=resolved_api_key,
resolved_api_mode=resolved_api_mode,
final_model=final_model,
messages=messages,
temperature=temperature,
max_tokens=max_tokens,
tools=tools,
effective_timeout=effective_timeout,
effective_extra_body=effective_extra_body,
)
except Exception as retry2_err:
if (_is_payment_error(retry2_err) or _is_auth_error(retry2_err)
or _is_rate_limit_error(retry2_err)):
_recover_provider_pool(pool_provider, retry2_err)
first_err = retry2_err
else:
raise
# ── Payment / connection / rate-limit fallback (mirrors sync call_llm) ──
should_fallback = (
+8 -2
View File
@@ -115,7 +115,10 @@ _SKILL_REVIEW_PROMPT = (
"Protected skills (DO NOT edit these):\n"
" • Bundled skills (shipped with Hermes, e.g. 'hermes-agent').\n"
" • Hub-installed skills (installed via 'hermes skills install').\n"
"Pinned skills (marked via 'hermes curator pin').\n"
"Pinned skills (marked via 'hermes curator pin') CAN be improved — "
"pin only blocks deletion/archive/consolidation by the curator, not "
"content updates. Patch them when a pitfall or missing step turns up, "
"same as any other agent-created skill.\n"
"If the only skills that need updating are protected, say\n"
"'Nothing to save.' and stop.\n\n"
"Do NOT capture (these become persistent self-imposed constraints "
@@ -198,7 +201,10 @@ _COMBINED_REVIEW_PROMPT = (
"Protected skills (DO NOT edit these):\n"
" • Bundled skills (shipped with Hermes, e.g. 'hermes-agent').\n"
" • Hub-installed skills (installed via 'hermes skills install').\n"
"Pinned skills (marked via 'hermes curator pin').\n"
"Pinned skills (marked via 'hermes curator pin') CAN be improved — "
"pin only blocks deletion/archive/consolidation by the curator, not "
"content updates. Patch them when a pitfall or missing step turns up, "
"same as any other agent-created skill.\n"
"If the only skills that need updating are protected, say\n"
"'Nothing to save.' and stop.\n\n"
"Do NOT capture as skills (these become persistent self-imposed "
+267 -53
View File
@@ -34,6 +34,7 @@ from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import urlparse, parse_qs, urlunparse
from hermes_cli.timeouts import get_provider_request_timeout, get_provider_stale_timeout
from hermes_constants import PARTIAL_STREAM_STUB_ID, FINISH_REASON_LENGTH
from agent.error_classifier import classify_api_error, FailoverReason
from agent.model_metadata import is_local_endpoint
from agent.message_sanitization import (
@@ -75,6 +76,59 @@ def _ra():
return run_agent
def estimate_request_context_tokens(api_payload: Any) -> int:
"""Estimate context/load tokens from an API payload, dict or messages list.
The stale-call detectors historically assumed a Chat Completions request:
they pulled ``api_kwargs["messages"]`` and ran a cheap char/4 estimate.
Codex / Responses API requests carry the conversational payload in
``input`` (with additional load in ``instructions`` and ``tools``), so the
legacy estimator reported ~0 tokens for every Codex turn and the
context-tier scaling never fired.
This helper handles both shapes:
- bare list -> treat as Chat Completions ``messages``
- dict with ``messages`` -> Chat Completions (+ ``tools`` if present)
- dict with ``input`` -> Responses API (+ ``instructions``/``tools``)
- any other dict -> fall back to summing string values
"""
def _chars(value: Any) -> int:
if value is None:
return 0
if isinstance(value, str):
return len(value)
return len(str(value))
def _message_chars(messages: Any) -> int:
if not isinstance(messages, list):
return _chars(messages)
return sum(_chars(item) for item in messages)
if isinstance(api_payload, list):
return _message_chars(api_payload) // 4
if isinstance(api_payload, dict):
messages = api_payload.get("messages")
if isinstance(messages, list):
total_chars = _message_chars(messages)
if "tools" in api_payload:
total_chars += _chars(api_payload.get("tools"))
return total_chars // 4
if "input" in api_payload:
total_chars = (
_chars(api_payload.get("input"))
+ _chars(api_payload.get("instructions"))
+ _chars(api_payload.get("tools"))
)
return total_chars // 4
return sum(_chars(value) for value in api_payload.values()) // 4
return _chars(api_payload) // 4
def interruptible_api_call(agent, api_kwargs: dict):
"""
@@ -91,23 +145,55 @@ def interruptible_api_call(agent, api_kwargs: dict):
provider fallback.
"""
result = {"response": None, "error": None}
request_client_holder = {"client": None}
request_client_holder = {"client": None, "owner_tid": None}
request_client_lock = threading.Lock()
def _set_request_client(client):
with request_client_lock:
request_client_holder["client"] = client
# #29507: stamp the owning thread so a stranger-thread interrupt
# only shuts the connection down rather than racing the worker
# for FD ownership during ``client.close()``.
request_client_holder["owner_tid"] = threading.get_ident()
return client
def _take_request_client():
with request_client_lock:
client = request_client_holder.get("client")
request_client_holder["client"] = None
request_client_holder["owner_tid"] = None
return client
def _close_request_client_once(reason: str) -> None:
request_client = _take_request_client()
if request_client is not None:
# #29507: dispatch on the calling thread.
#
# When ``_call`` (the worker) reaches its ``finally`` it owns the
# close and we pop + fully close as before. When a *stranger* thread
# (the interrupt-check loop, the stale-call detector) drives the
# close, only shut the sockets down so the worker's blocked
# ``recv``/``send`` unwinds with an ``EPIPE`` / EOF — and let the
# worker close ``client`` from its own thread on its way out. That
# avoids the FD-recycling race where the kernel reassigned a
# just-closed TLS socket FD to ``kanban.db``, and the still-live SSL
# BIO on the worker thread then wrote a 24-byte TLS application-data
# record into the SQLite header (#29507).
with request_client_lock:
request_client = request_client_holder.get("client")
owner_tid = request_client_holder.get("owner_tid")
stranger_thread = (
request_client is not None
and owner_tid is not None
and owner_tid != threading.get_ident()
)
if not stranger_thread:
# Owning thread (or no recorded owner) → pop and fully close.
request_client_holder["client"] = None
request_client_holder["owner_tid"] = None
if request_client is None:
return
if stranger_thread:
agent._abort_request_openai_client(request_client, reason=reason)
else:
agent._close_request_openai_client(request_client, reason=reason)
def _call():
@@ -168,9 +254,34 @@ def interruptible_api_call(agent, api_kwargs: dict):
# httpx timeout (default 1800s) with zero feedback. The stale
# detector kills the connection early so the main retry loop can
# apply richer recovery (credential rotation, provider fallback).
_stale_timeout = agent._compute_non_stream_stale_timeout(
api_kwargs.get("messages", [])
)
_stale_timeout = agent._compute_non_stream_stale_timeout(api_kwargs)
# ── Time-to-first-byte (TTFB) watchdog for the Codex Responses stream ──
# The chatgpt.com/backend-api/codex endpoint has an intermittent failure
# mode where it accepts the connection but never emits a single stream
# event (observed directly: 0 events, no HTTP status, the socket just
# hangs). A fresh reconnect succeeds in ~2s, but the wall-clock stale
# timeout (often 180900s) makes us wait minutes before retrying. While no
# stream event has arrived yet we apply a much shorter TTFB cutoff so the
# main retry loop can reconnect promptly. Once the first event arrives the
# stream is healthy, so we fall back to the wall-clock stale timeout and
# never interrupt a legitimate long generation. Gated to codex_responses:
# only that path streams events incrementally (the chat_completions
# non-stream, anthropic and bedrock branches here have no first-event
# signal). The marker advances on *any* event (see codex_runtime), so
# reasoning-only / tool-call-only turns are not mistaken for a stall.
# Operators can tune via HERMES_CODEX_TTFB_TIMEOUT_SECONDS (0 disables).
_ttfb_enabled = agent.api_mode == "codex_responses"
try:
_ttfb_timeout = float(os.getenv("HERMES_CODEX_TTFB_TIMEOUT_SECONDS", "45"))
except (TypeError, ValueError):
_ttfb_timeout = 45.0
if _ttfb_timeout <= 0:
_ttfb_enabled = False
if _ttfb_enabled:
# Reset before the worker starts so a marker left over from a previous
# call on this agent can't be misread as first-byte for this one.
agent._codex_stream_last_event_ts = None
_call_start = time.time()
agent._touch_activity("waiting for non-streaming API response")
@@ -190,22 +301,75 @@ def interruptible_api_call(agent, api_kwargs: dict):
f"waiting for non-streaming response ({int(_elapsed)}s elapsed)"
)
_elapsed = time.time() - _call_start
# TTFB detector: the Codex stream has produced no event at all and
# we're past the first-byte cutoff → the backend opened the
# connection but isn't responding. Kill it so the retry loop can
# reconnect (a fresh connection typically succeeds in seconds),
# instead of waiting out the much longer wall-clock stale timeout.
if (
_ttfb_enabled
and _elapsed > _ttfb_timeout
and getattr(agent, "_codex_stream_last_event_ts", None) is None
):
logger.warning(
"Codex stream produced no bytes within TTFB cutoff "
"(%.0fs > %.0fs, model=%s). Backend accepted the connection "
"but sent no stream events. Killing connection so the retry "
"loop can reconnect.",
_elapsed, _ttfb_timeout, api_kwargs.get("model", "unknown"),
)
agent._emit_status(
f"⚠️ No first byte from provider in {int(_elapsed)}s "
f"(codex stream, model: {api_kwargs.get('model', 'unknown')}). "
f"Reconnecting."
)
try:
_close_request_client_once("codex_ttfb_kill")
except Exception:
pass
agent._touch_activity(
f"codex stream killed after {int(_elapsed)}s with no first byte"
)
# Wait briefly for the worker to notice the closed connection.
t.join(timeout=2.0)
if result["error"] is None and result["response"] is None:
result["error"] = TimeoutError(
f"Codex stream produced no bytes within {int(_elapsed)}s "
f"(TTFB threshold: {int(_ttfb_timeout)}s)"
)
break
# Stale-call detector: kill the connection if no response
# arrives within the configured timeout.
_elapsed = time.time() - _call_start
if _elapsed > _stale_timeout:
_est_ctx = sum(len(str(v)) for v in api_kwargs.get("messages", [])) // 4
_est_ctx = estimate_request_context_tokens(api_kwargs)
_silent_hint: Optional[str] = None
_hint_fn = getattr(agent, "_codex_silent_hang_hint", None)
if callable(_hint_fn):
try:
_silent_hint = _hint_fn(model=api_kwargs.get("model"))
except Exception:
_silent_hint = None
logger.warning(
"Non-streaming API call stale for %.0fs (threshold %.0fs). "
"model=%s context=~%s tokens. Killing connection.",
_elapsed, _stale_timeout,
api_kwargs.get("model", "unknown"), f"{_est_ctx:,}",
)
agent._emit_status(
f"⚠️ No response from provider for {int(_elapsed)}s "
f"(non-streaming, model: {api_kwargs.get('model', 'unknown')}). "
f"Aborting call."
)
if _silent_hint:
agent._emit_status(
f"⚠️ No response from provider for {int(_elapsed)}s "
f"(non-streaming, model: {api_kwargs.get('model', 'unknown')}). "
f"{_silent_hint}"
)
else:
agent._emit_status(
f"⚠️ No response from provider for {int(_elapsed)}s "
f"(non-streaming, model: {api_kwargs.get('model', 'unknown')}). "
f"Aborting call."
)
try:
if agent.api_mode == "anthropic_messages":
agent._anthropic_client.close()
@@ -220,10 +384,17 @@ def interruptible_api_call(agent, api_kwargs: dict):
# Wait briefly for the thread to notice the closed connection.
t.join(timeout=2.0)
if result["error"] is None and result["response"] is None:
result["error"] = TimeoutError(
f"Non-streaming API call timed out after {int(_elapsed)}s "
f"with no response (threshold: {int(_stale_timeout)}s)"
)
if _silent_hint:
result["error"] = TimeoutError(
f"Non-streaming API call timed out after {int(_elapsed)}s "
f"with no response (threshold: {int(_stale_timeout)}s). "
f"{_silent_hint}"
)
else:
result["error"] = TimeoutError(
f"Non-streaming API call timed out after {int(_elapsed)}s "
f"with no response (threshold: {int(_stale_timeout)}s)"
)
break
if agent._interrupt_requested:
@@ -330,6 +501,7 @@ def build_api_kwargs(agent, api_messages: list) -> dict:
reasoning_config=agent.reasoning_config,
session_id=getattr(agent, "session_id", None),
max_tokens=agent.max_tokens,
timeout=agent._resolved_api_call_timeout(),
request_overrides=agent.request_overrides,
is_github_responses=is_github_responses,
is_codex_backend=is_codex_backend,
@@ -549,6 +721,17 @@ def build_assistant_message(agent, assistant_message, finish_reason: str) -> dic
if isinstance(_san_content, str) and _san_content:
_san_content = agent._strip_think_blocks(_san_content).strip()
# Defence-in-depth: redact credentials (PATs, API keys, Bearer tokens)
# from assistant content BEFORE the message enters conversation history.
# If the model accidentally inlines a secret in its natural-language
# response, catch it here at the persistence boundary so it never
# reaches state.db, session_*.json, gateway delivery, or compression.
# Respects HERMES_REDACT_SECRETS via redact_sensitive_text — no-op
# when disabled. (#19798)
if isinstance(_san_content, str) and _san_content:
from agent.redact import redact_sensitive_text
_san_content = redact_sensitive_text(_san_content)
msg = {
"role": "assistant",
"content": _san_content,
@@ -670,6 +853,18 @@ def build_assistant_message(agent, assistant_message, finish_reason: str) -> dic
"arguments": tool_call.function.arguments
},
}
# Defence-in-depth: redact credentials from tool call arguments
# before they enter conversation history. Tool execution uses the
# raw API response object, not this dict, so redacting the
# persisted shape is safe and only affects storage. Catches the
# case where a model accidentally inlines a secret into a tool
# call (e.g. `terminal(command="curl -H 'Authorization: Bearer
# sk-...'")`). (#19798)
if isinstance(tc_dict["function"]["arguments"], str):
from agent.redact import redact_sensitive_text
tc_dict["function"]["arguments"] = redact_sensitive_text(
tc_dict["function"]["arguments"]
)
# Preserve extra_content (e.g. Gemini thought_signature) so it
# is sent back on subsequent API calls. Without this, Gemini 3
# thinking models reject the request with a 400 error.
@@ -725,7 +920,7 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
current_base_url = str(getattr(agent, "base_url", "") or "").rstrip("/").lower()
fb_base_url_for_dedup = (fb.get("base_url") or "").strip().rstrip("/").lower()
if fb_provider == current_provider and fb_model == current_model:
logging.warning(
logger.warning(
"Fallback skip: chain entry %s/%s matches current provider/model",
fb_provider, fb_model,
)
@@ -736,7 +931,7 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
and fb_base_url_for_dedup == current_base_url
and fb_model == current_model
):
logging.warning(
logger.warning(
"Fallback skip: chain entry base_url %s matches current backend",
fb_base_url_for_dedup,
)
@@ -768,7 +963,7 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
explicit_base_url=fb_base_url_hint,
explicit_api_key=fb_api_key_hint)
if fb_client is None:
logging.warning(
logger.warning(
"Fallback to %s failed: provider not configured",
fb_provider)
return agent._try_activate_fallback() # try next in chain
@@ -776,8 +971,11 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
from hermes_cli.model_normalize import normalize_model_for_provider
fb_model = normalize_model_for_provider(fb_model, fb_provider)
except Exception:
pass
except Exception as _norm_err:
logger.warning(
"Could not normalize fallback model %r for provider %r: %s",
fb_model, fb_provider, _norm_err,
)
# Determine api_mode from provider / base URL / model
fb_api_mode = "chat_completions"
@@ -905,19 +1103,20 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
base_url=agent.base_url,
api_key=getattr(agent, "api_key", ""), # callable preserved → call_llm
provider=agent.provider,
api_mode=agent.api_mode,
)
agent._emit_status(
f"🔄 Primary model failed — switching to fallback: "
f"{fb_model} via {fb_provider}"
)
logging.info(
logger.info(
"Fallback activated: %s%s (%s)",
old_model, fb_model, fb_provider,
)
return True
except Exception as e:
logging.error("Failed to activate fallback %s: %s", fb_model, e)
logger.error("Failed to activate fallback %s: %s", fb_model, e)
return agent._try_activate_fallback() # try next in chain
@@ -1133,7 +1332,7 @@ def handle_max_iterations(agent, messages: list, api_call_count: int) -> str:
final_response = "I reached the iteration limit and couldn't generate a summary."
except Exception as e:
logging.warning(f"Failed to get summary response: {e}")
logger.warning(f"Failed to get summary response: {e}")
final_response = f"I reached the maximum iterations ({agent.max_iterations}) but couldn't summarize. Error: {str(e)}"
return final_response
@@ -1162,12 +1361,12 @@ def cleanup_task_resources(agent, task_id: str) -> None:
_ra().cleanup_vm(task_id)
except Exception as e:
if agent.verbose_logging:
logging.warning(f"Failed to cleanup VM for task {task_id}: {e}")
logger.warning(f"Failed to cleanup VM for task {task_id}: {e}")
try:
_ra().cleanup_browser(task_id)
except Exception as e:
if agent.verbose_logging:
logging.warning(f"Failed to cleanup browser for task {task_id}: {e}")
logger.warning(f"Failed to cleanup browser for task {task_id}: {e}")
@@ -1271,23 +1470,44 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
return result["response"]
result = {"response": None, "error": None, "partial_tool_names": []}
request_client_holder = {"client": None, "diag": None}
request_client_holder = {"client": None, "diag": None, "owner_tid": None}
request_client_lock = threading.Lock()
def _set_request_client(client):
with request_client_lock:
request_client_holder["client"] = client
# See #29507 explanation in the non-streaming variant above.
request_client_holder["owner_tid"] = threading.get_ident()
return client
def _take_request_client():
with request_client_lock:
client = request_client_holder.get("client")
request_client_holder["client"] = None
request_client_holder["owner_tid"] = None
return client
def _close_request_client_once(reason: str) -> None:
request_client = _take_request_client()
if request_client is not None:
# See #29507 explanation in the non-streaming variant above. A
# stranger thread (the interrupt-check / stale-stream detector loop)
# only aborts sockets — never pops, never calls ``client.close()`` —
# so the worker thread retains ownership of the FD release.
with request_client_lock:
request_client = request_client_holder.get("client")
owner_tid = request_client_holder.get("owner_tid")
stranger_thread = (
request_client is not None
and owner_tid is not None
and owner_tid != threading.get_ident()
)
if not stranger_thread:
request_client_holder["client"] = None
request_client_holder["owner_tid"] = None
if request_client is None:
return
if stranger_thread:
agent._abort_request_openai_client(request_client, reason=reason)
else:
agent._close_request_openai_client(request_client, reason=reason)
first_delta_fired = {"done": False}
@@ -1939,7 +2159,7 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
# when the context is large. Without this, the stale detector kills
# healthy connections during the model's thinking phase, producing
# spurious RemoteProtocolError ("peer closed connection").
_est_tokens = sum(len(str(v)) for v in api_kwargs.get("messages", [])) // 4
_est_tokens = estimate_request_context_tokens(api_kwargs)
if _est_tokens > 100_000:
_stream_stale_timeout = max(_stream_stale_timeout_base, 300.0)
elif _est_tokens > 50_000:
@@ -1975,7 +2195,7 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
# inner retry loop can start a fresh connection.
_stale_elapsed = time.time() - last_chunk_time["t"]
if _stale_elapsed > _stream_stale_timeout:
_est_ctx = sum(len(str(v)) for v in api_kwargs.get("messages", [])) // 4
_est_ctx = estimate_request_context_tokens(api_kwargs)
logger.warning(
"Stream stale for %.0fs (threshold %.0fs) — no chunks received. "
"model=%s context=~%s tokens. Killing connection.",
@@ -2019,24 +2239,15 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
if deltas_were_sent["yes"]:
# Streaming failed AFTER some tokens were already delivered to
# the platform. Re-raising would let the outer retry loop make
# a new API call, creating a duplicate message. Return a
# partial "stop" response instead so the outer loop treats this
# turn as complete (no retry, no fallback).
# Recover whatever content was already streamed to the user.
# _current_streamed_assistant_text accumulates text fired
# through _fire_stream_delta, so it has exactly what the
# user saw before the connection died.
# Return a partial response stub with finish_reason="length"
# so the conversation loop's continuation machinery fires.
# tool_calls=None prevents auto-execution of incomplete calls.
_partial_text = (
getattr(agent, "_current_streamed_assistant_text", "") or ""
).strip() or None
# If the stream died while the model was emitting a tool call,
# the stub below will silently set `tool_calls=None` and the
# agent loop will treat the turn as complete — the attempted
# action is lost with no user-facing signal. Append a
# human-visible warning to the stub content so (a) the user
# knows something failed, and (b) the next turn's model sees
# in conversation history what was attempted and can retry.
# Append a user-visible warning if tool calls were dropped so
# the user and model both know what was attempted.
_partial_names = list(result.get("partial_tool_names") or [])
if _partial_names:
_name_str = ", ".join(_partial_names[:3])
@@ -2048,8 +2259,7 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
f"Ask me to retry if you want to continue."
)
_partial_text = (_partial_text or "") + _warn
# Also fire as a streaming delta so the user sees it now
# instead of only in the persisted transcript.
# Fire as streaming delta so the user sees it immediately.
try:
agent._fire_stream_delta(_warn)
except Exception:
@@ -2059,25 +2269,29 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
"of text; surfaced warning to user: %s",
_partial_names, len(_partial_text or ""), result["error"],
)
_stub_finish_reason = FINISH_REASON_LENGTH
else:
logger.warning(
"Partial stream delivered before error; returning stub "
"response with %s chars of recovered content to prevent "
"duplicate messages: %s",
"Partial stream delivered before error; returning "
"length-truncated stub with %s chars of recovered "
"content so the loop can continue from where the "
"stream died: %s",
len(_partial_text or ""),
result["error"],
)
_stub_finish_reason = FINISH_REASON_LENGTH
_stub_msg = SimpleNamespace(
role="assistant", content=_partial_text, tool_calls=None,
reasoning_content=None,
)
return SimpleNamespace(
id="partial-stream-stub",
id=PARTIAL_STREAM_STUB_ID,
model=getattr(agent, "model", "unknown"),
choices=[SimpleNamespace(
index=0, message=_stub_msg, finish_reason="stop",
index=0, message=_stub_msg, finish_reason=_stub_finish_reason,
)],
usage=None,
_dropped_tool_names=_partial_names or None,
)
raise result["error"]
return result["response"]
+8 -1
View File
@@ -745,7 +745,7 @@ def _preflight_codex_api_kwargs(
"model", "instructions", "input", "tools", "store",
"reasoning", "include", "max_output_tokens", "temperature",
"tool_choice", "parallel_tool_calls", "prompt_cache_key", "service_tier",
"extra_headers", "extra_body",
"extra_headers", "extra_body", "timeout",
}
normalized: Dict[str, Any] = {
"model": model,
@@ -771,6 +771,13 @@ def _preflight_codex_api_kwargs(
max_output_tokens = api_kwargs.get("max_output_tokens")
if isinstance(max_output_tokens, (int, float)) and max_output_tokens > 0:
normalized["max_output_tokens"] = int(max_output_tokens)
timeout = api_kwargs.get("timeout")
if (
isinstance(timeout, (int, float))
and not isinstance(timeout, bool)
and 0 < float(timeout) < float("inf")
):
normalized["timeout"] = float(timeout)
temperature = api_kwargs.get("temperature")
if isinstance(temperature, (int, float)):
normalized["temperature"] = float(temperature)
+6
View File
@@ -19,6 +19,7 @@ from __future__ import annotations
import json
import logging
import os
import time
from types import SimpleNamespace
from typing import Any, Dict, List
@@ -194,6 +195,11 @@ def run_codex_stream(agent, api_kwargs: dict, client: Any = None, on_first_delta
try:
with active_client.responses.stream(**api_kwargs) as stream:
for event in stream:
# Mark stream activity for the TTFB watchdog in
# interruptible_api_call. The Codex backend can accept the
# connection but never emit a single event; this timestamp
# staying None tells the watchdog no bytes are flowing.
agent._codex_stream_last_event_ts = time.time()
agent._touch_activity("receiving stream response")
if agent._interrupt_requested:
break
+4 -3
View File
@@ -609,6 +609,7 @@ class ContextCompressor(ContextEngine):
"""Update tracked token usage from API response."""
self.last_prompt_tokens = usage.get("prompt_tokens", 0)
self.last_completion_tokens = usage.get("completion_tokens", 0)
self.last_total_tokens = usage.get("total_tokens", self.last_prompt_tokens + self.last_completion_tokens)
def should_compress(self, prompt_tokens: int = None) -> bool:
"""Check if context exceeds the compression threshold.
@@ -897,7 +898,7 @@ class ContextCompressor(ContextEngine):
into the warning log.
"""
self._summary_model_fallen_back = True
logging.warning(
logger.warning(
"Summary model '%s' %s (%s). "
"Falling back to main model '%s' for compression.",
self.summary_model, reason, e, self.model,
@@ -1086,7 +1087,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
# No provider configured — long cooldown, unlikely to self-resolve
self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
self._last_summary_error = "no auxiliary LLM provider configured"
logging.warning("Context compression: no provider available for "
logger.warning("Context compression: no provider available for "
"summary. Middle turns will be dropped without summary "
"for %d seconds.",
_SUMMARY_FAILURE_COOLDOWN_SECONDS)
@@ -1182,7 +1183,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
if len(err_text) > 220:
err_text = err_text[:217].rstrip() + "..."
self._last_summary_error = err_text
logging.warning(
logger.warning(
"Failed to generate context summary: %s. "
"Further summary attempts paused for %d seconds.",
e,
+1
View File
@@ -200,6 +200,7 @@ class ContextEngine(ABC):
base_url: str = "",
api_key: str = "",
provider: str = "",
api_mode: str = "",
) -> None:
"""Called when the user switches models or on fallback activation.
+4 -4
View File
@@ -381,12 +381,12 @@ def compress_context(
agent._session_db.end_session(agent.session_id, "compression")
old_session_id = agent.session_id
agent.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
os.environ["HERMES_SESSION_ID"] = agent.session_id
try:
from gateway.session_context import _SESSION_ID
_SESSION_ID.set(agent.session_id)
from gateway.session_context import set_current_session_id
set_current_session_id(agent.session_id)
except Exception:
pass
os.environ["HERMES_SESSION_ID"] = agent.session_id
agent._session_db_created = False
agent._session_db.create_session(
session_id=agent.session_id,
+139 -30
View File
@@ -65,7 +65,7 @@ from agent.prompt_caching import apply_anthropic_cache_control
from agent.retry_utils import jittered_backoff
from agent.trajectory import has_incomplete_scratchpad
from agent.usage_pricing import estimate_usage_cost, normalize_usage
from hermes_constants import display_hermes_home as _dhh_fn
from hermes_constants import display_hermes_home as _dhh_fn, PARTIAL_STREAM_STUB_ID
from hermes_logging import set_session_context
from tools.schema_sanitizer import strip_pattern_and_format
from tools.skill_provenance import set_current_write_origin
@@ -229,6 +229,37 @@ def _restore_or_build_system_prompt(agent, system_message, conversation_history)
)
def _get_continuation_prompt(is_partial_stub: bool, dropped_tools: Optional[List[str]] = None) -> str:
if is_partial_stub and dropped_tools:
tool_list = ", ".join(dropped_tools[:3])
return (
"[System: Your previous tool call "
f"({tool_list}) was too large and "
"the stream timed out before it "
"could be delivered. Do NOT retry "
"the same tool call with the same "
"large content. Instead, break the "
"content into multiple smaller tool "
"calls (e.g. use multiple patch calls "
"or write smaller files). Each tool "
"call's arguments must be under ~8K "
"tokens to avoid stream timeouts.]"
)
elif is_partial_stub:
return (
"[System: The previous response was cut off by a "
"network error mid-stream. Continue exactly where "
"you left off. Do not restart or repeat prior text. "
"Finish the answer directly.]"
)
else:
return (
"[System: Your previous response was truncated by the output "
"length limit. Continue exactly where you left off. Do not "
"restart or repeat prior text. Finish the answer directly.]"
)
def run_conversation(
agent,
user_message: str,
@@ -484,7 +515,7 @@ def run_conversation(
tools=agent.tools or None,
)
if _preflight_tokens >= agent.context_compressor.threshold_tokens:
if agent.context_compressor.should_compress(_preflight_tokens):
logger.info(
"Preflight compression: ~%s tokens >= %s threshold (model %s, ctx %s)",
f"{_preflight_tokens:,}",
@@ -1183,7 +1214,7 @@ def run_conversation(
else str(_codex_error_obj) if _codex_error_obj
else f"Responses API returned status '{_codex_resp_status}'"
)
logging.warning(
logger.warning(
"Codex response status='%s' (error=%s). Routing to fallback. %s",
_codex_resp_status, _codex_error_msg,
agent._client_log_context(),
@@ -1335,7 +1366,7 @@ def run_conversation(
primary_recovery_attempted = False
continue
agent._emit_status(f"❌ Max retries ({max_retries}) exceeded for invalid responses. Giving up.")
logging.error(f"{agent.log_prefix}Invalid API response after {max_retries} retries.")
logger.error(f"{agent.log_prefix}Invalid API response after {max_retries} retries.")
agent._persist_session(messages, conversation_history)
return {
"messages": messages,
@@ -1348,7 +1379,7 @@ def run_conversation(
# Backoff before retry — jittered exponential: 5s base, 120s cap
wait_time = jittered_backoff(retry_count, base_delay=5.0, max_delay=120.0)
agent._vprint(f"{agent.log_prefix}⏳ Retrying in {wait_time:.1f}s ({_failure_hint})...", force=True)
logging.warning(f"Invalid API response (retry {retry_count}/{max_retries}): {', '.join(error_details)} | Provider: {provider_name}")
logger.warning(f"Invalid API response (retry {retry_count}/{max_retries}): {', '.join(error_details)} | Provider: {provider_name}")
# Sleep in small increments to stay responsive to interrupts
sleep_end = time.time() + wait_time
@@ -1414,7 +1445,18 @@ def run_conversation(
finish_reason = "length"
if finish_reason == "length":
agent._vprint(f"{agent.log_prefix}⚠️ Response truncated (finish_reason='length') - model hit max output tokens", force=True)
if getattr(response, "id", "") == PARTIAL_STREAM_STUB_ID:
agent._vprint(
f"{agent.log_prefix}⚠️ Stream interrupted by network error "
f"(finish_reason='length' on partial-stream-stub)",
force=True,
)
else:
agent._vprint(
f"{agent.log_prefix}⚠️ Response truncated "
f"(finish_reason='length') - model hit max output tokens",
force=True,
)
# Normalize the truncated response to a single OpenAI-style
# message shape so text-continuation and tool-call retry
@@ -1507,17 +1549,39 @@ def run_conversation(
truncated_response_parts.append(assistant_message.content)
if length_continue_retries < 3:
agent._vprint(
f"{agent.log_prefix}↻ Requesting continuation "
f"({length_continue_retries}/3)..."
_is_partial_stream_stub = (
getattr(response, "id", "") == PARTIAL_STREAM_STUB_ID
)
_dropped_tools = getattr(
response, "_dropped_tool_names", None
)
if _is_partial_stream_stub and _dropped_tools:
_tool_list = ", ".join(_dropped_tools[:3])
agent._vprint(
f"{agent.log_prefix}↻ Stream interrupted mid "
f"tool-call ({_tool_list}) — requesting "
f"chunked retry "
f"({length_continue_retries}/3)..."
)
elif _is_partial_stream_stub:
agent._vprint(
f"{agent.log_prefix}↻ Stream interrupted — "
f"requesting continuation "
f"({length_continue_retries}/3)..."
)
else:
agent._vprint(
f"{agent.log_prefix}↻ Requesting continuation "
f"({length_continue_retries}/3)..."
)
_continue_content = _get_continuation_prompt(
_is_partial_stream_stub, _dropped_tools
)
continue_msg = {
"role": "user",
"content": (
"[System: Your previous response was truncated by the output "
"length limit. Continue exactly where you left off. Do not "
"restart or repeat prior text. Finish the answer directly.]"
),
"content": _continue_content,
}
messages.append(continue_msg)
agent._session_messages = messages
@@ -2225,7 +2289,7 @@ def run_conversation(
f"stripped all thinking blocks, retrying...",
force=True,
)
logging.warning(
logger.warning(
"%sThinking block signature recovery: stripped "
"reasoning_details from %d messages",
agent.log_prefix, len(messages),
@@ -2250,7 +2314,7 @@ def run_conversation(
from tools.schema_sanitizer import strip_pattern_and_format
_, _stripped = strip_pattern_and_format(agent.tools)
except Exception as _strip_exc: # pragma: no cover — defensive
logging.warning(
logger.warning(
"%sllama.cpp grammar recovery: strip helper failed: %s",
agent.log_prefix, _strip_exc,
)
@@ -2261,7 +2325,7 @@ def run_conversation(
f"stripped {_stripped} pattern/format keyword(s), retrying...",
force=True,
)
logging.warning(
logger.warning(
"%sllama.cpp grammar recovery: stripped %d "
"pattern/format keyword(s) from tool schemas",
agent.log_prefix, _stripped,
@@ -2269,7 +2333,7 @@ def run_conversation(
continue
# No keywords found to strip — fall through to normal
# retry path rather than loop forever on the same error.
logging.warning(
logger.warning(
"%sllama.cpp grammar error but no pattern/format "
"keywords to strip — falling through to normal retry",
agent.log_prefix,
@@ -2370,6 +2434,7 @@ def run_conversation(
base_url=agent.base_url,
api_key=getattr(agent, "api_key", ""),
provider=agent.provider,
api_mode=agent.api_mode,
)
# Context probing flags — only set on built-in
# compressor (plugin engines manage their own).
@@ -2483,7 +2548,7 @@ def run_conversation(
error_context=error_context,
)
else:
logging.info(
logger.info(
"Nous 429 looks like upstream capacity "
"(no exhausted bucket in headers or "
"last-known state) -- not tripping "
@@ -2543,7 +2608,7 @@ def run_conversation(
if compression_attempts > max_compression_attempts:
agent._vprint(f"{agent.log_prefix}❌ Max compression attempts ({max_compression_attempts}) reached for payload-too-large error.", force=True)
agent._vprint(f"{agent.log_prefix} 💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
logging.error(f"{agent.log_prefix}413 compression failed after {max_compression_attempts} attempts.")
logger.error(f"{agent.log_prefix}413 compression failed after {max_compression_attempts} attempts.")
agent._persist_session(messages, conversation_history)
return {
"messages": messages,
@@ -2574,7 +2639,7 @@ def run_conversation(
else:
agent._vprint(f"{agent.log_prefix}❌ Payload too large and cannot compress further.", force=True)
agent._vprint(f"{agent.log_prefix} 💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
logging.error(f"{agent.log_prefix}413 payload too large. Cannot compress further.")
logger.error(f"{agent.log_prefix}413 payload too large. Cannot compress further.")
agent._persist_session(messages, conversation_history)
return {
"messages": messages,
@@ -2627,7 +2692,7 @@ def run_conversation(
if compression_attempts > max_compression_attempts:
agent._vprint(f"{agent.log_prefix}❌ Max compression attempts ({max_compression_attempts}) reached.", force=True)
agent._vprint(f"{agent.log_prefix} 💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
logging.error(f"{agent.log_prefix}Context compression failed after {max_compression_attempts} attempts.")
logger.error(f"{agent.log_prefix}Context compression failed after {max_compression_attempts} attempts.")
agent._persist_session(messages, conversation_history)
return {
"messages": messages,
@@ -2679,6 +2744,7 @@ def run_conversation(
base_url=agent.base_url,
api_key=getattr(agent, "api_key", ""),
provider=agent.provider,
api_mode=agent.api_mode,
)
# Context probing flags — only set on built-in
# compressor (plugin engines manage their own).
@@ -2700,7 +2766,7 @@ def run_conversation(
if compression_attempts > max_compression_attempts:
agent._vprint(f"{agent.log_prefix}❌ Max compression attempts ({max_compression_attempts}) reached.", force=True)
agent._vprint(f"{agent.log_prefix} 💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
logging.error(f"{agent.log_prefix}Context compression failed after {max_compression_attempts} attempts.")
logger.error(f"{agent.log_prefix}Context compression failed after {max_compression_attempts} attempts.")
agent._persist_session(messages, conversation_history)
return {
"messages": messages,
@@ -2733,7 +2799,7 @@ def run_conversation(
# Can't compress further and already at minimum tier
agent._vprint(f"{agent.log_prefix}❌ Context length exceeded and cannot compress further.", force=True)
agent._vprint(f"{agent.log_prefix} 💡 The conversation has accumulated too much content. Try /new to start fresh, or /compress to manually trigger compression.", force=True)
logging.error(f"{agent.log_prefix}Context length exceeded: {approx_tokens:,} tokens. Cannot compress further.")
logger.error(f"{agent.log_prefix}Context length exceeded: {approx_tokens:,} tokens. Cannot compress further.")
agent._persist_session(messages, conversation_history)
return {
"messages": messages,
@@ -2770,6 +2836,21 @@ def run_conversation(
# retryable=True mapping takes effect instead.
and not isinstance(api_error, ssl.SSLError)
)
# ``FailoverReason.billing`` (HTTP 402) is NOT in this
# exclusion set. By the time we reach this block:
# • credential-pool rotation (line ~2031) has already
# fired for billing and either ``continue``d or
# returned (False, ...) — pool is exhausted or absent.
# • the eager-fallback branch above (line ~2422) also
# fires on billing and ``continue``s if a fallback
# provider is configured.
# Falling through to here means BOTH recovery paths
# gave up. Treating 402 as retryable from this point
# just burns more paid requests against a depleted
# balance with no recovery mechanism left — see #31273
# (real-world: ~$40 in 48h on a 24/7 gateway). Aborting
# mirrors how 401/403 (also ``should_fallback=True``)
# already behave once their recovery paths have failed.
is_client_error = (
is_local_validation_error
or (
@@ -2777,7 +2858,6 @@ def run_conversation(
and not classified.should_compress
and classified.reason not in {
FailoverReason.rate_limit,
FailoverReason.billing,
FailoverReason.overloaded,
FailoverReason.context_overflow,
FailoverReason.payload_too_large,
@@ -2809,15 +2889,26 @@ def run_conversation(
agent._vprint(f"{agent.log_prefix} 🌐 Endpoint: {_base}", force=True)
# Actionable guidance for common auth errors
if classified.is_auth or classified.reason == FailoverReason.billing:
if _provider in {"openai-codex", "xai-oauth"} and status_code == 401:
if _provider in {"openai-codex", "xai-oauth", "nous"} and status_code == 401:
if _provider == "openai-codex":
agent._vprint(f"{agent.log_prefix} 💡 Codex OAuth token was rejected (HTTP 401). Your token may have been", force=True)
agent._vprint(f"{agent.log_prefix} refreshed by another client (Codex CLI, VS Code). To fix:", force=True)
agent._vprint(f"{agent.log_prefix} 1. Run `codex` in your terminal to generate fresh tokens.", force=True)
agent._vprint(f"{agent.log_prefix} 2. Then run `hermes auth` to re-authenticate.", force=True)
else:
elif _provider == "xai-oauth":
agent._vprint(f"{agent.log_prefix} 💡 xAI OAuth token was rejected (HTTP 401). To fix:", force=True)
agent._vprint(f"{agent.log_prefix} re-authenticate with xAI Grok OAuth (SuperGrok Subscription) from `hermes model`.", force=True)
agent._vprint(f"{agent.log_prefix} re-authenticate with xAI Grok OAuth (SuperGrok / Premium+) from `hermes model`.", force=True)
else: # nous
agent._vprint(f"{agent.log_prefix} 💡 Nous Portal OAuth token was rejected (HTTP 401). Your token may be", force=True)
agent._vprint(f"{agent.log_prefix} expired, revoked, or your account may be out of credits. To fix:", force=True)
agent._vprint(f"{agent.log_prefix} 1. Re-authenticate: hermes auth add nous --type oauth", force=True)
agent._vprint(f"{agent.log_prefix} 2. Check your portal account: https://portal.nousresearch.com", force=True)
# ``:free`` is OpenRouter slug syntax; Nous Portal will reject
# the model name even after a successful re-auth.
if isinstance(_model, str) and _model.endswith(":free"):
agent._vprint(f"{agent.log_prefix} ⚠️ Note: `{_model}` looks like an OpenRouter slug (`:free` suffix).", force=True)
agent._vprint(f"{agent.log_prefix} Nous Portal won't recognize that model name. Either switch to a", force=True)
agent._vprint(f"{agent.log_prefix} Nous catalog model, or run `/model openrouter:{_model}` to use OpenRouter.", force=True)
else:
agent._vprint(f"{agent.log_prefix} 💡 Your API key was rejected by the provider. Check:", force=True)
agent._vprint(f"{agent.log_prefix} • Is the key valid? Run: hermes setup", force=True)
@@ -2826,7 +2917,7 @@ def run_conversation(
agent._vprint(f"{agent.log_prefix} • Check credits: https://openrouter.ai/settings/credits", force=True)
else:
agent._vprint(f"{agent.log_prefix} 💡 This type of error won't be fixed by retrying.", force=True)
logging.error(f"{agent.log_prefix}Non-retryable client error: {api_error}")
logger.error(f"{agent.log_prefix}Non-retryable client error: {api_error}")
# Skip session persistence when the error is likely
# context-overflow related (status 400 + large session).
# Persisting the failed user message would make the
@@ -2903,7 +2994,7 @@ def run_conversation(
force=True,
)
logging.error(
logger.error(
"%sAPI call failed after %s retries. %s | provider=%s model=%s msgs=%s tokens=~%s",
agent.log_prefix, max_retries, _final_summary,
_provider, _model, len(api_messages), f"{approx_tokens:,}",
@@ -3434,6 +3525,19 @@ def run_conversation(
f"⚠️ Tool guardrail halted {decision.tool_name}: {decision.code}"
)
messages.append({"role": "assistant", "content": final_response})
# Emit the halt message to the client so it's not
# indistinguishable from a crash. The stream display
# was flushed (callback(None)) before tool execution,
# but the callback is still alive — fire the text
# through it so SSE/TUI clients see the explanation.
if final_response:
agent._safe_print(f"\n{final_response}\n")
if agent.stream_delta_callback:
try:
agent.stream_delta_callback(final_response)
agent.stream_delta_callback(None)
except Exception:
pass
break
# Reset per-turn retry counters after successful tool
@@ -4029,6 +4133,8 @@ def run_conversation(
except Exception as _ver_err:
logger.debug("file-mutation verifier footer failed: %s", _ver_err)
_response_transformed = False
# Plugin hook: transform_llm_output
# Fired once per turn after the tool-calling loop completes.
# Plugins can transform the LLM's output text before it's returned.
@@ -4046,6 +4152,7 @@ def run_conversation(
for _hook_result in _transform_results:
if isinstance(_hook_result, str) and _hook_result:
final_response = _hook_result
_response_transformed = True
break # First non-empty string wins
except Exception as exc:
logger.warning("transform_llm_output hook failed: %s", exc)
@@ -4097,6 +4204,7 @@ def run_conversation(
"failed": failed,
"partial": False, # True only when stopped due to invalid tool calls
"interrupted": interrupted,
"response_transformed": _response_transformed,
"response_previewed": getattr(agent, "_response_was_previewed", False),
"model": agent.model,
"provider": agent.provider,
@@ -4113,6 +4221,7 @@ def run_conversation(
"estimated_cost_usd": agent.session_estimated_cost_usd,
"cost_status": agent.session_cost_status,
"cost_source": agent.session_cost_source,
"session_id": agent.session_id,
}
if agent._tool_guardrail_halt_decision is not None:
result["guardrail"] = agent._tool_guardrail_halt_decision.to_metadata()
+174
View File
@@ -0,0 +1,174 @@
"""Credential-pool disk-boundary sanitization helpers.
These helpers define which credential-pool entries are references to borrowed
runtime secrets and strip raw values before those entries are written to
``auth.json``. They intentionally have no dependency on ``hermes_cli.auth`` so
both the pool model and the final auth-store write boundary can share the same
policy without import cycles.
"""
from __future__ import annotations
import hashlib
import re
from typing import Any, Dict, Mapping
# Sources Hermes owns and can intentionally persist in auth.json. Everything
# else with a non-empty source is treated as borrowed/reference-only by default
# so future external secret providers fail closed at the disk boundary.
_PERSISTABLE_PROVIDER_SOURCES = frozenset({
("anthropic", "hermes_pkce"),
("minimax-oauth", "oauth"),
("nous", "device_code"),
("openai-codex", "device_code"),
("xai-oauth", "loopback_pkce"),
})
_SAFE_SECRETISH_METADATA_KEYS = frozenset({
"secret_fingerprint",
"secret_source",
"token_type",
"scope",
"client_id",
"agent_key_id",
"agent_key_expires_at",
"agent_key_expires_in",
"agent_key_reused",
"agent_key_obtained_at",
"expires_at",
"expires_at_ms",
"expires_in",
"last_refresh",
"last_status",
"last_status_at",
"last_error_code",
"last_error_reason",
"last_error_message",
"last_error_reset_at",
})
_SECRET_VALUE_KEYS = frozenset({
"access_token",
"refresh_token",
"agent_key",
"api_key",
"apikey",
"api_token",
"auth_token",
"authorization",
"bearer_token",
"client_secret",
"credential",
"credentials",
"id_token",
"oauth_token",
"private_key",
"secret_key",
"session_token",
"password",
"secret",
"token",
"tokens",
})
_SECRET_VALUE_SUFFIXES = (
"_api_key",
"_api_token",
"_access_token",
"_auth_token",
"_refresh_token",
"_bearer_token",
"_client_secret",
"_id_token",
"_oauth_token",
"_private_key",
"_session_token",
"_secret_key",
"_password",
"_secret",
"_token",
"_key",
)
_CAMEL_CASE_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")
def _normalize_key(key: Any) -> str:
raw = str(key or "").strip()
raw = _CAMEL_CASE_BOUNDARY.sub("_", raw)
return raw.lower().replace("-", "_").replace(".", "_")
def is_borrowed_credential_source(source: Any, provider_id: Any = None) -> bool:
"""Return True when ``source`` points at a borrowed/reference-only secret."""
normalized_source = str(source or "").strip().lower()
if not normalized_source:
return False
if normalized_source == "manual" or normalized_source.startswith("manual:"):
return False
normalized_provider = str(provider_id or "").strip().lower()
return (normalized_provider, normalized_source) not in _PERSISTABLE_PROVIDER_SOURCES
def _is_secret_payload_key(key: Any) -> bool:
normalized = _normalize_key(key)
if not normalized or normalized in _SAFE_SECRETISH_METADATA_KEYS:
return False
if normalized in _SECRET_VALUE_KEYS:
return True
return normalized.endswith(_SECRET_VALUE_SUFFIXES)
def _fingerprint_value(value: Any) -> str | None:
if value is None:
return None
text = str(value)
if not text:
return None
digest = hashlib.sha256(text.encode("utf-8", errors="surrogatepass")).hexdigest()
return f"sha256:{digest[:16]}"
def _credential_secret_fingerprint(payload: Mapping[str, Any]) -> str | None:
for key in ("agent_key", "access_token", "refresh_token", "api_key", "token", "secret"):
fingerprint = _fingerprint_value(payload.get(key))
if fingerprint:
return fingerprint
for key, value in payload.items():
if _is_secret_payload_key(key):
fingerprint = _fingerprint_value(value)
if fingerprint:
return fingerprint
existing = payload.get("secret_fingerprint")
if isinstance(existing, str) and existing.startswith("sha256:"):
return existing
return None
def sanitize_borrowed_credential_payload(
payload: Mapping[str, Any],
provider_id: Any = None,
) -> Dict[str, Any]:
"""Return a disk-safe credential-pool payload.
Owned sources (manual entries and Hermes-owned OAuth/device-code state)
pass through unchanged. Borrowed/reference-only sources keep labels,
source refs, status/cooldown metadata, counters, and a non-reversible
fingerprint, but raw secret value fields are removed.
"""
result = dict(payload)
if not is_borrowed_credential_source(result.get("source"), provider_id):
return result
fingerprint = _credential_secret_fingerprint(result)
sanitized = {
key: value
for key, value in result.items()
if not _is_secret_payload_key(key)
}
if fingerprint:
sanitized["secret_fingerprint"] = fingerprint
return sanitized
+89 -23
View File
@@ -15,6 +15,10 @@ from typing import Any, Dict, List, Optional, Set, Tuple
from hermes_constants import OPENROUTER_BASE_URL
from hermes_cli.config import get_env_value, load_env
from agent.credential_persistence import (
is_borrowed_credential_source,
sanitize_borrowed_credential_payload,
)
import hermes_cli.auth as auth_mod
from hermes_cli.auth import (
CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
@@ -86,7 +90,7 @@ CUSTOM_POOL_PREFIX = "custom:"
_EXTRA_KEYS = frozenset({
"token_type", "scope", "client_id", "portal_base_url", "obtained_at",
"expires_in", "agent_key_id", "agent_key_expires_in", "agent_key_reused",
"agent_key_obtained_at", "tls",
"agent_key_obtained_at", "tls", "secret_source", "secret_fingerprint",
})
@@ -161,7 +165,7 @@ class PooledCredential:
for k, v in self.extra.items():
if v is not None:
result[k] = v
return result
return sanitize_borrowed_credential_payload(result, self.provider)
@property
def runtime_api_key(self) -> str:
@@ -245,6 +249,16 @@ def _extract_retry_delay_seconds(message: str) -> Optional[float]:
sec_match = re.search(r"retry\s+(?:after\s+)?(\d+(?:\.\d+)?)\s*(?:sec|secs|seconds|s\b)", message, re.IGNORECASE)
if sec_match:
return float(sec_match.group(1))
# "Resets in 4hr 5min" format used by OpenCode Go weekly usage limits
hr_min_match = re.search(r"resets?\s+in\s+(\d+)\s*hr\s+(\d+)\s*min", message, re.IGNORECASE)
if hr_min_match:
return int(hr_min_match.group(1)) * 3600 + int(hr_min_match.group(2)) * 60
hr_only_match = re.search(r"resets?\s+in\s+(\d+)\s*hr\b", message, re.IGNORECASE)
if hr_only_match:
return int(hr_only_match.group(1)) * 3600
min_only_match = re.search(r"resets?\s+in\s+(\d+)\s*min\b", message, re.IGNORECASE)
if min_only_match:
return int(min_only_match.group(1)) * 60
return None
@@ -1261,9 +1275,21 @@ class CredentialPool:
*,
status_code: Optional[int],
error_context: Optional[Dict[str, Any]] = None,
api_key_hint: Optional[str] = None,
) -> Optional[PooledCredential]:
with self._lock:
entry = self.current() or self._select_unlocked()
entry = None
if api_key_hint:
# Prefer the specific entry whose API key matches the one that
# actually failed. When this pool was freshly loaded from disk
# (another process already rotated), current() is None and
# _select_unlocked() would return the NEXT key — the wrong one.
entry = next(
(e for e in self._entries if e.runtime_api_key == api_key_hint),
None,
)
if entry is None:
entry = self.current() or self._select_unlocked()
if entry is None:
return None
_label = entry.label or entry.id[:8]
@@ -1433,8 +1459,12 @@ def _upsert_entry(entries: List[PooledCredential], provider: str, source: str, p
if field_updates or extra_updates:
if extra_updates:
field_updates["extra"] = {**existing.extra, **extra_updates}
entries[existing_idx] = replace(existing, **field_updates)
return True
updated = replace(existing, **field_updates)
entries[existing_idx] = updated
# Runtime-only borrowed secret updates should refresh the in-memory
# entry without forcing auth.json churn when the disk-safe payload is
# unchanged (for example env keys with the same fingerprint).
return existing.to_dict() != updated.to_dict()
return False
@@ -1772,6 +1802,35 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
except ImportError:
def _is_source_suppressed(_p, _s): # type: ignore[misc]
return False
def _secret_source_for_env(env_var: str) -> Optional[str]:
try:
from hermes_cli.env_loader import get_secret_source
source_label = get_secret_source(env_var)
except Exception:
source_label = None
return str(source_label).strip() if source_label else None
def _env_payload(
*,
source: str,
env_var: str,
token: str,
base_url: str,
auth_type: str = AUTH_TYPE_API_KEY,
) -> Dict[str, Any]:
payload: Dict[str, Any] = {
"source": source,
"auth_type": auth_type,
"access_token": token,
"base_url": base_url,
"label": env_var,
}
secret_source = _secret_source_for_env(env_var)
if secret_source:
payload["secret_source"] = secret_source
return payload
if provider == "openrouter":
# Prefer ~/.hermes/.env over os.environ
token = _get_env_prefer_dotenv("OPENROUTER_API_KEY")
@@ -1784,13 +1843,12 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
entries,
provider,
source,
{
"source": source,
"auth_type": AUTH_TYPE_API_KEY,
"access_token": token,
"base_url": OPENROUTER_BASE_URL,
"label": "OPENROUTER_API_KEY",
},
_env_payload(
source=source,
env_var="OPENROUTER_API_KEY",
token=token,
base_url=OPENROUTER_BASE_URL,
),
)
return changed, active_sources
@@ -1829,13 +1887,13 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
entries,
provider,
source,
{
"source": source,
"auth_type": auth_type,
"access_token": token,
"base_url": base_url,
"label": env_var,
},
_env_payload(
source=source,
env_var=env_var,
token=token,
base_url=base_url,
auth_type=auth_type,
),
)
return changed, active_sources
@@ -1847,8 +1905,11 @@ def _prune_stale_seeded_entries(entries: List[PooledCredential], active_sources:
if _is_manual_source(entry.source)
or entry.source in active_sources
or not (
entry.source.startswith("env:")
or entry.source in {"claude_code", "hermes_pkce"}
is_borrowed_credential_source(entry.source, entry.provider)
# Hermes PKCE is Hermes-owned/persistable while present, but it is
# still a file-backed singleton and should disappear from the pool
# when the backing OAuth file is gone.
or entry.source == "hermes_pkce"
)
]
if len(retained) == len(entries):
@@ -1933,17 +1994,22 @@ def _seed_custom_pool(pool_key: str, entries: List[PooledCredential]) -> Tuple[b
def load_pool(provider: str) -> CredentialPool:
provider = (provider or "").strip().lower()
raw_entries = read_credential_pool(provider)
raw_needs_sanitization = any(
isinstance(payload, dict)
and sanitize_borrowed_credential_payload(payload, provider) != payload
for payload in raw_entries
)
entries = [PooledCredential.from_dict(provider, payload) for payload in raw_entries]
if provider.startswith(CUSTOM_POOL_PREFIX):
# Custom endpoint pool — seed from custom_providers config and model config
custom_changed, custom_sources = _seed_custom_pool(provider, entries)
changed = custom_changed
changed = raw_needs_sanitization or custom_changed
changed |= _prune_stale_seeded_entries(entries, custom_sources)
else:
singleton_changed, singleton_sources = _seed_from_singletons(provider, entries)
env_changed, env_sources = _seed_from_env(provider, entries)
changed = singleton_changed or env_changed
changed = raw_needs_sanitization or singleton_changed or env_changed
changed |= _prune_stale_seeded_entries(entries, singleton_sources | env_sources)
changed |= _normalize_pool_priorities(provider, entries)
+1 -1
View File
@@ -285,7 +285,7 @@ def _remove_xai_oauth_loopback_pkce(provider: str, removed) -> RemovalResult:
if _clear_auth_store_provider(provider):
result.cleaned.append(f"Cleared {provider} OAuth tokens from auth store")
result.hints.append(
"Run `hermes model` → xAI Grok OAuth (SuperGrok Subscription) to re-authenticate if needed."
"Run `hermes model` → xAI Grok OAuth (SuperGrok / Premium+) to re-authenticate if needed."
)
return result
+56 -6
View File
@@ -787,33 +787,65 @@ class KawaiiSpinner:
# Cute tool message (completion line that replaces the spinner)
# =========================================================================
_ERROR_SUFFIX_MAX_LEN = 48
def _trim_error(msg: str) -> str:
"""Shrink an error message for inline display in a tool status line.
Strips overly long absolute paths down to just the filename so the
suffix stays readable on narrow terminals.
"""
msg = msg.strip()
# Common case: "File not found: /very/long/absolute/path/foo.py"
if "File not found:" in msg:
_, _, tail = msg.partition("File not found:")
tail = tail.strip()
if "/" in tail:
msg = f"File not found: {tail.rsplit('/', 1)[-1]}"
if len(msg) > _ERROR_SUFFIX_MAX_LEN:
msg = msg[: _ERROR_SUFFIX_MAX_LEN - 3] + "..."
return msg
def _detect_tool_failure(tool_name: str, result: str | None) -> tuple[bool, str]:
"""Inspect a tool result string for signs of failure.
Returns ``(is_failure, suffix)`` where *suffix* is an informational tag
like ``" [exit 1]"`` for terminal failures, or ``" [error]"`` for generic
failures. On success, returns ``(False, "")``.
Returns ``(is_failure, suffix)`` where *suffix* is a short informational
tag like ``" [exit 1]"`` for terminal failures, ``" [full]"`` for memory
overflow, or a trimmed error message (``" [File not found: foo.py]"``).
On success returns ``(False, "")``.
"""
if result is None:
return False, ""
if file_mutation_result_landed(tool_name, result):
return False, ""
data = safe_json_loads(result)
# Terminal: non-zero exit code is the canonical failure signal.
if tool_name == "terminal":
data = safe_json_loads(result)
if isinstance(data, dict):
exit_code = data.get("exit_code")
if exit_code is not None and exit_code != 0:
err_msg = data.get("error")
if err_msg:
return True, f" [{_trim_error(str(err_msg))}]"
return True, f" [exit {exit_code}]"
return False, ""
# Memory-specific: distinguish "full" from real errors
# Memory: distinguish "store full" from real errors.
if tool_name == "memory":
data = safe_json_loads(result)
if isinstance(data, dict):
if data.get("success") is False and "exceed the limit" in data.get("error", ""):
return True, " [full]"
# Structured error in JSON result (any tool that surfaces {"error": ...}).
if isinstance(data, dict):
err = data.get("error") or data.get("message")
if err and (data.get("success") is False or "error" in data):
return True, f" [{_trim_error(str(err))}]"
# Generic heuristic for non-terminal tools
# Multimodal tool results (dicts with _multimodal=True) are not strings —
# treat them as successes since failures would be JSON-encoded strings.
@@ -921,11 +953,29 @@ def get_cute_tool_message(
if tool_name == "todo":
todos_arg = args.get("todos")
merge = args.get("merge", False)
# Parse result for completion progress
total = 0
done = 0
if result:
try:
data = safe_json_loads(result)
if data:
s = data.get("summary", {})
total = s.get("total", 0)
done = s.get("completed", 0)
except Exception:
pass
if todos_arg is None:
if total > 0:
return _wrap(f"┊ 📋 plan {done}/{total} task(s) {dur}")
return _wrap(f"┊ 📋 plan reading tasks {dur}")
elif merge:
if total > 0 and done > 0:
return _wrap(f"┊ 📋 plan update {done}/{total}{dur}")
return _wrap(f"┊ 📋 plan update {len(todos_arg)} task(s) {dur}")
else:
if total > 0 and done > 0:
return _wrap(f"┊ 📋 plan {done}/{total} task(s) {dur}")
return _wrap(f"┊ 📋 plan {len(todos_arg)} task(s) {dur}")
if tool_name == "session_search":
return _wrap(f"┊ 🔍 recall \"{_trunc(args.get('query', ''), 35)}\" {dur}")
+35
View File
@@ -240,6 +240,24 @@ _MODEL_NOT_FOUND_PATTERNS = [
"unsupported model",
]
# Request-validation patterns — the request is malformed and will fail
# identically on every retry. Some OpenAI-compatible gateways (notably
# codex.nekos.me) return these as 5xx instead of the standard 4xx, which
# makes the generic "5xx → retryable server_error" rule misfire: the retry
# loop hammers the same deterministic rejection 3+ times, then the
# transport-recovery path resets the counter and does it again, producing
# a request flood. When a 5xx body carries one of these unambiguous
# request-validation signals, classify as a non-retryable format_error so
# the loop fails fast and falls back instead of looping.
_REQUEST_VALIDATION_PATTERNS = [
"unknown parameter",
"unsupported parameter",
"unrecognized request argument",
"invalid_request_error",
"unknown_parameter",
"unsupported_parameter",
]
# OpenRouter aggregator policy-block patterns.
#
# When a user's OpenRouter account privacy setting (or a per-request
@@ -745,6 +763,23 @@ def _classify_by_status(
)
if status_code in {500, 502}:
# Some OpenAI-compatible gateways return request-validation errors
# with a 5xx status (codex.nekos.me returns 502 for unknown/
# unsupported parameters). These are deterministic — every retry
# gets the identical rejection — so the generic "5xx → retryable
# server_error" rule turns one bad request into a retry flood.
# Detect the unambiguous request-validation signals (in either the
# message text or the structured error code) and fail fast.
if (
any(p in error_msg for p in _REQUEST_VALIDATION_PATTERNS)
or error_code.lower() in {"invalid_request_error", "unknown_parameter",
"unsupported_parameter"}
):
return result_fn(
FailoverReason.format_error,
retryable=False,
should_fallback=True,
)
return result_fn(FailoverReason.server_error, retryable=True)
if status_code in {503, 529}:
+335 -11
View File
@@ -41,6 +41,11 @@ def build_write_denied_paths(home: str) -> set[str]:
# Top-level .env, even when running under a profile — overwriting it
# leaks credentials across every profile that inherits from root (#15981).
str(hermes_root / ".env"),
# Active profile Anthropic PKCE credential store.
str(hermes_home / ".anthropic_oauth.json"),
# Top-level Anthropic PKCE credential store remains sensitive even
# when a profile is active; default/non-profile sessions still read it.
str(hermes_root / ".anthropic_oauth.json"),
os.path.join(home, ".bashrc"),
os.path.join(home, ".zshrc"),
os.path.join(home, ".profile"),
@@ -50,6 +55,7 @@ def build_write_denied_paths(home: str) -> set[str]:
os.path.join(home, ".pgpass"),
os.path.join(home, ".npmrc"),
os.path.join(home, ".pypirc"),
os.path.join(home, ".git-credentials"),
"/etc/sudoers",
"/etc/passwd",
"/etc/shadow",
@@ -71,6 +77,7 @@ def build_write_denied_prefixes(home: str) -> list[str]:
os.path.join(home, ".docker"),
os.path.join(home, ".azure"),
os.path.join(home, ".config", "gh"),
os.path.join(home, ".config", "gcloud"),
]
]
@@ -97,6 +104,43 @@ def is_write_denied(path: str) -> bool:
if resolved.startswith(prefix):
return True
# Hermes control-plane files: block both the ACTIVE profile's view
# (hermes_home) AND the global root view. Without the root pass, a
# profile-mode session leaves <root>/auth.json + <root>/config.yaml
# writable — letting a prompt-injected write_file overwrite the global
# files that every profile inherits from (same shape as #15981).
control_file_names = ("auth.json", "config.yaml", "webhook_subscriptions.json")
mcp_tokens_dir_name = "mcp-tokens"
hermes_dirs = []
for base in (_hermes_home_path(), _hermes_root_path()):
try:
real = os.path.realpath(base)
if real not in hermes_dirs:
hermes_dirs.append(real)
except Exception:
continue
for base_real in hermes_dirs:
for name in control_file_names:
try:
if resolved == os.path.realpath(os.path.join(base_real, name)):
return True
except Exception:
continue
try:
mcp_real = os.path.realpath(os.path.join(base_real, mcp_tokens_dir_name))
if resolved == mcp_real or resolved.startswith(mcp_real + os.sep):
return True
except Exception:
pass
try:
pairing_real = os.path.realpath(os.path.join(base_real, "pairing"))
if resolved == pairing_real or resolved.startswith(pairing_real + os.sep):
return True
except Exception:
pass
safe_root = get_safe_write_root()
if safe_root and not (resolved == safe_root or resolved.startswith(safe_root + os.sep)):
return True
@@ -104,22 +148,302 @@ def is_write_denied(path: str) -> bool:
return False
# Common secret-bearing project-local environment file basenames.
# These are blocked because .env files routinely contain API keys,
# database passwords, and other credentials.
_BLOCKED_PROJECT_ENV_BASENAMES: set[str] = {
".env",
".env.local",
".env.development",
".env.production",
".env.test",
".env.staging",
".envrc",
}
def get_read_block_error(path: str) -> Optional[str]:
"""Return an error message when a read targets internal Hermes cache files."""
"""Return an error message when a read targets a denied Hermes path.
Three categories are blocked:
* Internal Hermes cache files under ``HERMES_HOME/skills/.hub``
readable metadata that an attacker could use as a prompt-injection
carrier.
* Credential / secret stores under HERMES_HOME and the global Hermes
root: ``auth.json``, ``auth.lock``, ``.anthropic_oauth.json``,
``.env``, ``webhook_subscriptions.json``, ``auth/google_oauth.json``,
and anything under ``mcp-tokens/``. These hold plaintext provider keys,
OAuth tokens, and HMAC secrets that the agent never needs to read
directly provider tools / gateway adapters consume them through
internal channels.
* Project-local environment files anywhere on disk: ``.env``,
``.env.local``, ``.env.development``, ``.env.production``,
``.env.test``, ``.env.staging``, ``.envrc``. These routinely hold
API keys, database passwords, and other credentials for the user's
own projects. The agent helping debug a project shouldn't normally
need to read these ``.env.example`` is the documented-shape
substitute.
**This is NOT a security boundary.** The terminal tool runs as the
same OS user with shell access; the agent can still ``cat auth.json``
or ``cat ~/.hermes/.env`` and exfiltrate the file. The read-deny exists
as defense-in-depth that:
* Returns a clear error to models that respect tool denials, which
empirically prompts most modern models to stop rather than reach
for the shell.
* Surfaces a visible audit trail when something tries to read
credentials easier to spot in logs than a generic ``cat``.
Treat any user-visible framing around this as "may help" rather than
"stops attackers." A determined model or malicious instruction can
always shell out.
Callers that resolve relative paths against a non-process cwd
(e.g. ``TERMINAL_CWD`` in ``tools/file_tools.py``) MUST pre-resolve
and pass the absolute path string. This function's own ``resolve()``
is anchored at the Python process cwd, so a relative input like
``"auth.json"`` would otherwise miss the denylist when the task's
terminal cwd differs from the process cwd.
"""
resolved = Path(path).expanduser().resolve()
hermes_home = _hermes_home_path().resolve()
blocked_dirs = [
hermes_home / "skills" / ".hub" / "index-cache",
hermes_home / "skills" / ".hub",
]
for blocked in blocked_dirs:
# Resolve BOTH the active HERMES_HOME (profile-aware) AND the global
# Hermes root so credential stores at <root>/auth.json etc. are also
# blocked when running under a profile (HERMES_HOME points at
# <root>/profiles/<name> in profile mode). Same shape as the write
# deny widening (#15981, #14157).
hermes_dirs: list[Path] = []
for base in (_hermes_home_path(), _hermes_root_path()):
try:
resolved.relative_to(blocked)
real = base.resolve()
if real not in hermes_dirs:
hermes_dirs.append(real)
except Exception:
continue
# Skills .hub: prompt-injection carriers.
for hd in hermes_dirs:
blocked_dirs = [
hd / "skills" / ".hub" / "index-cache",
hd / "skills" / ".hub",
]
for blocked in blocked_dirs:
try:
resolved.relative_to(blocked)
except ValueError:
continue
return (
f"Access denied: {path} is an internal Hermes cache file "
"and cannot be read directly to prevent prompt injection. "
"Use the skills_list or skill_view tools instead."
)
# Credential / secret stores. Exact-file matches under either
# HERMES_HOME or <root>.
credential_file_names = (
"auth.json",
"auth.lock",
".anthropic_oauth.json",
".env",
"webhook_subscriptions.json",
os.path.join("auth", "google_oauth.json"),
)
for hd in hermes_dirs:
for name in credential_file_names:
try:
blocked = (hd / name).resolve()
except Exception:
continue
if resolved == blocked:
return (
f"Access denied: {path} is a Hermes credential store "
"and cannot be read directly. Provider tools consume "
"these credentials through internal channels. "
"(Defense-in-depth — not a security boundary; the "
"terminal tool can still bypass.)"
)
# mcp-tokens/: directory prefix match — anything inside is OAuth
# token material.
for hd in hermes_dirs:
try:
mcp_tokens = (hd / "mcp-tokens").resolve()
except Exception:
continue
if resolved == mcp_tokens:
return (
f"Access denied: {path} is the Hermes MCP token directory "
"and cannot be read directly. (Defense-in-depth — not a "
"security boundary; the terminal tool can still bypass.)"
)
try:
resolved.relative_to(mcp_tokens)
except ValueError:
continue
return (
f"Access denied: {path} is an internal Hermes cache file "
"and cannot be read directly to prevent prompt injection. "
"Use the skills_list or skill_view tools instead."
f"Access denied: {path} is a Hermes MCP token file "
"and cannot be read directly. (Defense-in-depth — not a "
"security boundary; the terminal tool can still bypass.)"
)
# Block common secret-bearing project-local .env files anywhere on disk.
# The agent helping a user with their project rarely needs to read raw
# .env contents — .env.example is the documented-shape substitute. The
# terminal tool can still ``cat .env``; this is defense-in-depth, not a
# boundary (see module docstring).
if resolved.name in _BLOCKED_PROJECT_ENV_BASENAMES:
return (
f"Access denied: {path} is a secret-bearing environment file "
"and cannot be read to prevent credential leakage. "
"If you need to check the file structure, read .env.example instead. "
"(Defense-in-depth — not a security boundary; the terminal tool can still bypass.)"
)
return None
# ---------------------------------------------------------------------------
# Cross-profile write guard (#TBD)
#
# Hermes profiles are separate HERMES_HOME dirs under
# ``<root>/profiles/<name>/``. Each profile has its own skills/, plugins/,
# cron/, memories/. When an agent runs under one profile, writing into
# ANOTHER profile's directories is almost always wrong — those skills /
# plugins / cron jobs / memories affect a different session the user runs
# from a different shell.
#
# Soft guard, NOT a security boundary: the agent runs as the same OS user
# and has unrestricted terminal access, so this returns a warning the model
# can choose to honor or override with ``cross_profile=True``. Same shape
# as the dangerous-command approval flow — the agent is told the boundary
# exists, and explicit user direction is required to cross it.
#
# Reference: May 2026 incident where a hermes-security profile session
# edited skills under both ``~/.hermes/profiles/hermes-security/skills/``
# AND ``~/.hermes/skills/`` (the default profile's skills) without realizing
# the second path belonged to a different profile.
# ---------------------------------------------------------------------------
# Profile-scoped directories under HERMES_HOME / <root> / <root>/profiles/<X>/
# that should be guarded. Adding a new area here extends the guard with no
# other code change.
PROFILE_SCOPED_AREAS = ("skills", "plugins", "cron", "memories")
def _resolve_active_profile_name() -> str:
"""Return the active profile name derived from HERMES_HOME.
``~/.hermes`` -> ``"default"``
``~/.hermes/profiles/X`` -> ``"X"``
Falls back to ``"default"`` on any resolution failure so the guard
never raises into the tool path.
"""
try:
home_real = _hermes_home_path().resolve()
root_real = _hermes_root_path().resolve()
except (OSError, RuntimeError):
return "default"
profiles_dir = root_real / "profiles"
try:
rel = home_real.relative_to(profiles_dir)
parts = rel.parts
if len(parts) >= 1:
return parts[0]
except ValueError:
pass
return "default"
def classify_cross_profile_target(path: str) -> Optional[dict]:
"""Classify a write target as cross-profile if it lands in another
profile's scoped area (skills/plugins/cron/memories).
Returns ``None`` when the target is outside Hermes scope, or is inside
the ACTIVE profile, or doesn't hit a profile-scoped area. Otherwise
returns a dict with:
* ``active_profile``: name of the profile the agent is running as
* ``target_profile``: name of the profile the path belongs to
* ``area``: which scoped area (``"skills"``, ``"plugins"``, etc.)
* ``target_path``: the resolved path string
The caller decides what to do with the result surface a warning to
the model, prompt the user, or (with explicit consent /
``cross_profile=True``) proceed anyway.
"""
try:
target = Path(os.path.expanduser(str(path))).resolve()
root_real = _hermes_root_path().resolve()
except (OSError, RuntimeError):
return None
target_profile: Optional[str] = None
area: Optional[str] = None
try:
rel = target.relative_to(root_real)
except ValueError:
return None
parts = rel.parts
if not parts:
return None
if parts[0] in PROFILE_SCOPED_AREAS:
# ``<root>/<area>/...`` → default profile.
target_profile = "default"
area = parts[0]
elif (
parts[0] == "profiles"
and len(parts) >= 3
and parts[2] in PROFILE_SCOPED_AREAS
):
# ``<root>/profiles/<name>/<area>/...`` → named profile.
target_profile = parts[1]
area = parts[2]
else:
return None
active_profile = _resolve_active_profile_name()
if target_profile == active_profile:
# In-profile write — not a cross-profile event.
return None
return {
"active_profile": active_profile,
"target_profile": target_profile,
"area": area,
"target_path": str(target),
}
def get_cross_profile_warning(path: str) -> Optional[str]:
"""Return a model-facing warning string when ``path`` is cross-profile.
Returns ``None`` when the write is in-scope (same profile) or outside
Hermes entirely. Caller is expected to surface the warning to the
agent as a tool-result error, NOT to silently allow the write the
agent must either get explicit user direction to proceed, or pass
``cross_profile=True`` to its write tool.
This is defense-in-depth: the terminal tool runs as the same OS user
and can write any of these paths without going through this guard.
Treat the guard as a confusion-reducer, not a security boundary.
"""
info = classify_cross_profile_target(path)
if info is None:
return None
return (
f"Cross-profile write blocked by soft guard: {info['target_path']} "
f"belongs to Hermes profile {info['target_profile']!r}, but the "
f"agent is running under profile {info['active_profile']!r}. "
f"Editing another profile's {info['area']}/ will affect that "
f"profile's future sessions, not the one you are currently in. "
f"Confirm with the user before proceeding. To bypass this guard "
f"after explicit user direction, retry the call with "
f"``cross_profile=True``. (Defense-in-depth — not a security "
f"boundary; the terminal tool can still bypass.)"
)
+82
View File
@@ -191,6 +191,88 @@ def save_b64_image(
return path
# Extension inference for save_url_image — keep small and explicit. We don't
# want to import mimetypes for a handful of formats every image_gen provider
# actually returns, and we never want to inherit a content-type that points
# at HTML or JSON when the API gives us a degenerate response.
_URL_IMAGE_CONTENT_TYPES = {
"image/png": "png",
"image/jpeg": "jpg",
"image/jpg": "jpg",
"image/webp": "webp",
"image/gif": "gif",
}
def save_url_image(
url: str,
*,
prefix: str = "image",
timeout: float = 60.0,
max_bytes: int = 25 * 1024 * 1024,
) -> Path:
"""Download an image URL and write it under ``$HERMES_HOME/cache/images/``.
Used by providers (xAI, fallback OpenAI) whose API returns an *ephemeral*
URL instead of inline base64 those URLs frequently expire before a
downstream consumer (Telegram ``send_photo``, browser fetch) can resolve
them, so we materialise the bytes locally at tool-completion time.
Mirrors :func:`save_b64_image`'s shape so providers can swap in one line.
Returns the absolute :class:`Path` to the saved file. Raises on any
network / HTTP / oversize / non-image-content-type error so callers can
fall back to returning the bare URL with a clear error message.
"""
import requests
response = requests.get(url, timeout=timeout, stream=True)
response.raise_for_status()
# Infer extension from the response content-type, falling back to the
# URL suffix when xAI / OpenAI omit a precise type (some CDNs return
# ``application/octet-stream``). Defaults to ``png``.
content_type = (response.headers.get("Content-Type") or "").split(";", 1)[0].strip().lower()
extension = _URL_IMAGE_CONTENT_TYPES.get(content_type)
if extension is None:
url_path = url.split("?", 1)[0].lower()
for ext in ("png", "jpg", "jpeg", "webp", "gif"):
if url_path.endswith(f".{ext}"):
extension = "jpg" if ext == "jpeg" else ext
break
if extension is None:
extension = "png"
ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
short = uuid.uuid4().hex[:8]
path = _images_cache_dir() / f"{prefix}_{ts}_{short}.{extension}"
bytes_written = 0
with path.open("wb") as fh:
for chunk in response.iter_content(chunk_size=64 * 1024):
if not chunk:
continue
bytes_written += len(chunk)
if bytes_written > max_bytes:
fh.close()
try:
path.unlink()
except OSError:
pass
raise ValueError(
f"Image at {url} exceeds {max_bytes // (1024 * 1024)}MB cap; refusing to cache."
)
fh.write(chunk)
if bytes_written == 0:
try:
path.unlink()
except OSError:
pass
raise ValueError(f"Image at {url} returned 0 bytes; refusing to cache.")
return path
def success_response(
*,
image: str,
+2 -1
View File
@@ -209,6 +209,7 @@ DEFAULT_CONTEXT_LENGTHS = {
# via a custom provider. Values sourced from models.dev (2026-04).
# Keys use substring matching (longest-first), so e.g. "grok-4.20"
# matches "grok-4.20-0309-reasoning" / "-non-reasoning" / "-multi-agent-0309".
"grok-build": 256000, # grok-build-0.1
"grok-code-fast": 256000, # grok-code-fast-1
"grok-4-1-fast": 2000000, # grok-4-1-fast-(non-)reasoning
"grok-2-vision": 8192, # grok-2-vision, -1212, -latest
@@ -640,7 +641,7 @@ def fetch_model_metadata(force_refresh: bool = False) -> Dict[str, Dict[str, Any
return cache
except Exception as e:
logging.warning(f"Failed to fetch model metadata from OpenRouter: {e}")
logger.warning(f"Failed to fetch model metadata from OpenRouter: {e}")
return _model_metadata_cache or {}
+3
View File
@@ -167,6 +167,9 @@ PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
"gemini": "google",
"google": "google",
"xai": "xai",
# xAI OAuth is an authentication/transport path for the same xAI model
# catalog, so model metadata should resolve through the xAI provider.
"xai-oauth": "xai",
"xiaomi": "xiaomi",
"nvidia": "nvidia",
"groq": "groq",
+42
View File
@@ -176,6 +176,15 @@ _URL_USERINFO_RE = re.compile(
r"(https?|wss?|ftp)://([^/\s:@]+):([^/\s@]+)@",
)
# HTTP access logs often use a relative request target rather than a full URL:
# `"POST /webhook?password=... HTTP/1.1"`. The full-URL redactor above only
# sees strings containing `://`, so handle request-target query strings too.
_HTTP_REQUEST_TARGET_QUERY_RE = re.compile(
r"\b((?:GET|POST|PUT|PATCH|DELETE|HEAD|OPTIONS|TRACE|CONNECT)\s+[^ \t\r\n\"']*?)"
r"\?([^ \t\r\n\"']+)",
re.IGNORECASE,
)
# Form-urlencoded body detection: conservative — only applies when the entire
# text looks like a query string (k=v&k=v pattern with no newlines).
_FORM_BODY_RE = re.compile(
@@ -293,6 +302,15 @@ def _redact_url_userinfo(text: str) -> str:
)
def _redact_http_request_target_query_params(text: str) -> str:
"""Redact sensitive query params in HTTP access-log request targets."""
def _sub(m: re.Match) -> str:
prefix = m.group(1)
query = _redact_query_string(m.group(2))
return f"{prefix}?{query}"
return _HTTP_REQUEST_TARGET_QUERY_RE.sub(_sub, text)
def _redact_form_body(text: str) -> str:
"""Redact sensitive values in a form-urlencoded body.
@@ -397,6 +415,11 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
if "?" in text:
text = _redact_url_query_params(text)
# HTTP access logs can contain relative request targets with query params
# and no URL scheme, e.g. `"POST /hook?password=... HTTP/1.1"`.
if "?" in text and "=" in text and _has_http_method_substring(text):
text = _redact_http_request_target_query_params(text)
# Form-urlencoded bodies (only triggers on clean k=v&k=v inputs).
if "&" in text and "=" in text:
text = _redact_form_body(text)
@@ -456,6 +479,25 @@ def _has_known_prefix_substring(text: str) -> bool:
return any(p in text for p in _PREFIX_SUBSTRINGS)
_HTTP_METHOD_SUBSTRINGS = (
"GET ",
"POST ",
"PUT ",
"PATCH ",
"DELETE ",
"HEAD ",
"OPTIONS ",
"TRACE ",
"CONNECT ",
)
def _has_http_method_substring(text: str) -> bool:
"""Cheap pre-check before scanning for access-log request targets."""
upper = text.upper()
return any(method in upper for method in _HTTP_METHOD_SUBSTRINGS)
class RedactingFormatter(logging.Formatter):
"""Log formatter that redacts secrets from all log messages."""
+152 -6
View File
@@ -70,9 +70,105 @@ _BWS_RUN_TIMEOUT = 30
# In-process cache so repeated load_hermes_dotenv() calls (CLI startup,
# gateway hot-reload, test suites) don't re-fetch from BSM.
_CacheKey = Tuple[str, str] # (access_token_fingerprint, project_id)
_CacheKey = Tuple[str, str, str] # (access_token_fingerprint, project_id, server_url)
_CACHE: Dict[_CacheKey, "_CachedFetch"] = {}
# Disk-persisted cache so back-to-back CLI invocations (e.g. `hermes chat -q ...`
# called from scripts, cron, the gateway forking new agents) don't each pay the
# ~380ms `bws secret list` tax. The in-process _CACHE above only saves repeated
# fetches WITHIN one process; this saves repeated fetches ACROSS processes.
#
# Layout: one JSON object per cache key, written atomically with mode 0600 in
# <hermes_home>/cache/bws_cache.json. The file holds only the secret VALUES,
# never the access token. It's plaintext-equivalent to ~/.hermes/.env (which
# we already accept) but kept out of the .env file so users editing it won't
# accidentally commit BSM-sourced secrets.
_DISK_CACHE_BASENAME = "bws_cache.json"
def _disk_cache_path(home_path: Optional[Path] = None) -> Path:
"""Return the disk cache path under hermes_home/cache/.
`home_path` is what `load_hermes_dotenv()` already resolved; falling back
to `$HERMES_HOME` / `~/.hermes` keeps direct callers working too.
"""
if home_path is None:
home_path = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
return home_path / "cache" / _DISK_CACHE_BASENAME
def _cache_key_str(cache_key: _CacheKey) -> str:
"""Serialize a cache key to a stable string for JSON storage."""
token_fp, project_id, server_url = cache_key
return f"{token_fp}|{project_id}|{server_url}"
def _read_disk_cache(cache_key: _CacheKey, ttl_seconds: float,
home_path: Optional[Path] = None) -> Optional["_CachedFetch"]:
"""Return a cached entry from disk if fresh, else None.
Best-effort: any I/O or parse error returns None and we re-fetch.
"""
if ttl_seconds <= 0:
return None
path = _disk_cache_path(home_path)
try:
with open(path, "r", encoding="utf-8") as f:
payload = json.load(f)
except (OSError, json.JSONDecodeError):
return None
if not isinstance(payload, dict):
return None
if payload.get("key") != _cache_key_str(cache_key):
return None
secrets = payload.get("secrets")
fetched_at = payload.get("fetched_at")
if not isinstance(secrets, dict) or not isinstance(fetched_at, (int, float)):
return None
# Coerce all values to strings — JSON allows numbers but env vars need strings
typed_secrets: Dict[str, str] = {
k: v for k, v in secrets.items() if isinstance(k, str) and isinstance(v, str)
}
entry = _CachedFetch(secrets=typed_secrets, fetched_at=float(fetched_at))
if not entry.is_fresh(ttl_seconds):
return None
return entry
def _write_disk_cache(cache_key: _CacheKey, entry: "_CachedFetch",
home_path: Optional[Path] = None) -> None:
"""Persist a cache entry to disk atomically with mode 0600.
Best-effort: any I/O error is swallowed (the next invocation will just
re-fetch). We never want disk cache failures to break startup.
"""
path = _disk_cache_path(home_path)
try:
path.parent.mkdir(parents=True, exist_ok=True)
payload = {
"key": _cache_key_str(cache_key),
"secrets": entry.secrets,
"fetched_at": entry.fetched_at,
}
# Write to a temp file in the same directory and atomic-rename.
# tempfile honors os.umask, so we explicitly chmod 0600 before rename.
fd, tmp = tempfile.mkstemp(
prefix=".bws_cache_", suffix=".tmp", dir=str(path.parent)
)
try:
with os.fdopen(fd, "w", encoding="utf-8") as f:
json.dump(payload, f)
os.chmod(tmp, 0o600)
os.replace(tmp, path)
except BaseException:
try:
os.unlink(tmp)
except OSError:
pass
raise
except OSError:
pass # best-effort — disk cache miss on next invocation is fine
@dataclass
class _CachedFetch:
@@ -317,11 +413,26 @@ def fetch_bitwarden_secrets(
binary: Optional[Path] = None,
cache_ttl_seconds: float = 300,
use_cache: bool = True,
server_url: str = "",
home_path: Optional[Path] = None,
) -> Tuple[Dict[str, str], List[str]]:
"""Pull the secrets for ``project_id`` from Bitwarden Secrets Manager.
Returns ``(secrets_dict, warnings_list)``.
Set ``server_url`` to point at a non-default Bitwarden region or a
self-hosted instance e.g. ``https://vault.bitwarden.eu`` for EU
Cloud accounts. When empty, ``bws`` uses its built-in default
(``https://vault.bitwarden.com``, US Cloud). This is plumbed into
the subprocess as ``BWS_SERVER_URL``.
Caching is a two-layer LRU: an in-process dict (for hot-reload paths
inside one process) and a disk-persisted JSON file under
``<hermes_home>/cache/bws_cache.json`` (for back-to-back CLI invocations).
Both share the same TTL. Pass ``home_path`` so disk cache lookups find
the right directory in tests / non-standard installs; otherwise we fall
back to ``$HERMES_HOME`` / ``~/.hermes``.
Raises :class:`RuntimeError` for fatal conditions (missing binary,
auth failure, unparseable output). Callers in the env_loader path
catch this and emit a single warning; callers in the user-facing
@@ -332,11 +443,18 @@ def fetch_bitwarden_secrets(
if not project_id:
raise RuntimeError("Bitwarden project_id is empty")
cache_key = (_token_fingerprint(access_token), project_id)
cache_key = (_token_fingerprint(access_token), project_id, server_url or "")
if use_cache:
cached = _CACHE.get(cache_key)
if cached and cached.is_fresh(cache_ttl_seconds):
return cached.secrets, []
# L2: disk cache. ~5ms on cache hit vs ~380ms for `bws secret list`.
disk_cached = _read_disk_cache(cache_key, cache_ttl_seconds, home_path)
if disk_cached is not None:
# Promote into in-process cache so subsequent fetches in the
# same process skip the disk read too.
_CACHE[cache_key] = disk_cached
return disk_cached.secrets, []
bws = binary or find_bws(install_if_missing=True)
if bws is None:
@@ -347,19 +465,29 @@ def fetch_bitwarden_secrets(
"`hermes secrets bitwarden setup`."
)
secrets, warnings = _run_bws_list(bws, access_token, project_id)
_CACHE[cache_key] = _CachedFetch(secrets=secrets, fetched_at=time.time())
secrets, warnings = _run_bws_list(bws, access_token, project_id, server_url)
entry = _CachedFetch(secrets=secrets, fetched_at=time.time())
_CACHE[cache_key] = entry
if use_cache:
_write_disk_cache(cache_key, entry, home_path)
return secrets, warnings
def _run_bws_list(
bws: Path, access_token: str, project_id: str
bws: Path, access_token: str, project_id: str, server_url: str = ""
) -> Tuple[Dict[str, str], List[str]]:
cmd = [str(bws), "secret", "list", project_id, "--output", "json"]
env = os.environ.copy()
env["BWS_ACCESS_TOKEN"] = access_token
# Make sure we're not echoing telemetry / colour codes into json.
env.setdefault("NO_COLOR", "1")
# Region / self-hosted support. bws defaults to https://vault.bitwarden.com
# (US Cloud); EU Cloud users need https://vault.bitwarden.eu, and
# self-hosted users need their own URL. When unset, fall back to whatever
# BWS_SERVER_URL the caller already had in their shell env (preserved by
# the copy above) so manual overrides keep working too.
if server_url:
env["BWS_SERVER_URL"] = server_url
try:
proc = subprocess.run( # noqa: S603 — bws path is trusted
@@ -437,6 +565,8 @@ def apply_bitwarden_secrets(
override_existing: bool = False,
cache_ttl_seconds: float = 300,
auto_install: bool = True,
server_url: str = "",
home_path: Optional[Path] = None,
) -> FetchResult:
"""Pull secrets from BSM and set them on ``os.environ``.
@@ -444,6 +574,10 @@ def apply_bitwarden_secrets(
files have loaded. It is intentionally defensive any failure
returns a :class:`FetchResult` with ``error`` set; it never raises.
``server_url`` selects the Bitwarden region or self-hosted endpoint
(e.g. ``https://vault.bitwarden.eu`` for EU Cloud). Empty string
means use ``bws``'s default (US Cloud).
Parameters mirror the ``secrets.bitwarden.*`` config keys so the
caller can just splat the dict in.
"""
@@ -482,6 +616,8 @@ def apply_bitwarden_secrets(
project_id=project_id,
binary=binary,
cache_ttl_seconds=cache_ttl_seconds,
server_url=server_url,
home_path=home_path,
)
except RuntimeError as exc:
result.error = str(exc)
@@ -511,5 +647,15 @@ def apply_bitwarden_secrets(
# ---------------------------------------------------------------------------
def _reset_cache_for_tests() -> None:
def _reset_cache_for_tests(home_path: Optional[Path] = None) -> None:
"""Clear in-process AND disk caches.
Tests can pass ``home_path`` to scope the disk cleanup to a tmpdir.
Without it we fall back to the same default resolution as the cache
writer itself.
"""
_CACHE.clear()
try:
_disk_cache_path(home_path).unlink()
except (FileNotFoundError, OSError):
pass
+34
View File
@@ -205,6 +205,40 @@ def build_system_prompt_parts(agent: Any, system_message: Optional[str] = None)
if _env_hints:
stable_parts.append(_env_hints)
# Active-profile hint — names the Hermes profile the agent is running
# under so it doesn't conflate ~/.hermes/skills/ (default profile) with
# ~/.hermes/profiles/<active>/skills/ (this profile's). Deterministic
# for the lifetime of the agent — profile name doesn't change
# mid-session, so this doesn't break the prompt cache.
# See file_safety._resolve_active_profile_name + classify_cross_profile_target
# for the matching tool-side guard.
try:
from agent.file_safety import _resolve_active_profile_name
active_profile = _resolve_active_profile_name()
except Exception:
active_profile = "default"
if active_profile == "default":
stable_parts.append(
"Active Hermes profile: default. Other profiles (if any) live "
"under ~/.hermes/profiles/<name>/. Each profile has its own "
"skills/, plugins/, cron/, and memories/ that affect a different "
"session than this one. Do not modify another profile's "
"skills/plugins/cron/memories unless the user explicitly directs "
"you to."
)
else:
stable_parts.append(
f"Active Hermes profile: {active_profile}. This session reads "
f"and writes ~/.hermes/profiles/{active_profile}/. The default "
f"profile's data lives at ~/.hermes/skills/, ~/.hermes/plugins/, "
f"~/.hermes/cron/, ~/.hermes/memories/ — those belong to a "
f"different session run from a different shell. Do NOT modify "
f"another profile's skills/plugins/cron/memories unless the user "
f"explicitly directs you to. The cross-profile write guard will "
f"refuse such writes by default; pass cross_profile=True only "
f"after explicit direction."
)
platform_key = (agent.platform or "").lower().strip()
if platform_key in PLATFORM_HINTS:
stable_parts.append(PLATFORM_HINTS[platform_key])
+3 -1
View File
@@ -388,6 +388,7 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
agent.tool_progress_callback(
"tool.completed", function_name, None, None,
duration=tool_duration, is_error=is_error,
result=function_result,
)
except Exception as cb_err:
logging.debug(f"Tool progress callback error: {cb_err}")
@@ -491,7 +492,7 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
try:
function_args = json.loads(tool_call.function.arguments)
except json.JSONDecodeError as e:
logging.warning(f"Unexpected JSON error after validation: {e}")
logger.warning(f"Unexpected JSON error after validation: {e}")
function_args = {}
if not isinstance(function_args, dict):
function_args = {}
@@ -822,6 +823,7 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
agent.tool_progress_callback(
"tool.completed", function_name, None, None,
duration=tool_duration, is_error=_is_error_result,
result=function_result,
)
except Exception as cb_err:
logging.debug(f"Tool progress callback error: {cb_err}")
+193
View File
@@ -0,0 +1,193 @@
"""
Transcription Provider ABC
==========================
Defines the pluggable-backend interface for speech-to-text. Providers
register instances via
:meth:`PluginContext.register_transcription_provider`; the active one
(selected via ``stt.provider`` in ``config.yaml``) services every
:func:`tools.transcription_tools.transcribe_audio` call **when the
configured name is neither a built-in (``local``, ``local_command``,
``groq``, ``openai``, ``mistral``, ``xai``) nor disabled**.
Two coexisting STT extension surfaces in resolution order:
1. **Built-in providers** (``BUILTIN_STT_PROVIDERS`` in
:mod:`tools.transcription_tools`) native Python implementations
for the 6 backends shipped today (faster-whisper, local_command,
Groq, OpenAI, Mistral, xAI). **Always win** plugins cannot
shadow them. The single-env-var shell escape hatch
``HERMES_LOCAL_STT_COMMAND`` is preserved via the built-in
``local_command`` path.
2. **Plugin-registered providers** (this ABC). For new STT backends
OpenRouter, SenseAudio, Gemini-STT, custom proprietary engines
that need a Python implementation without modifying
``tools/transcription_tools.py``.
Built-ins-always-win is enforced at registration time
(:func:`agent.transcription_registry.register_provider` rejects names
in ``BUILTIN_STT_PROVIDERS`` with a warning) AND at dispatch time
(:func:`tools.transcription_tools._dispatch_to_plugin_provider`
re-checks defensively).
Providers live in ``<repo>/plugins/transcription/<name>/`` (built-in
plugins, none shipped today) or
``~/.hermes/plugins/transcription/<name>/`` (user-installed).
Response contract
-----------------
:meth:`TranscriptionProvider.transcribe` returns a dict with keys::
success bool
transcript str transcribed text (empty when success=False)
provider str provider name (for diagnostics)
error str only when success=False
"""
from __future__ import annotations
import abc
import logging
from typing import Any, Dict, List, Optional
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# ABC
# ---------------------------------------------------------------------------
class TranscriptionProvider(abc.ABC):
"""Abstract base class for a speech-to-text backend.
Subclasses must implement :attr:`name` and :meth:`transcribe`.
Everything else has sane defaults override only what your provider
needs.
"""
@property
@abc.abstractmethod
def name(self) -> str:
"""Stable short identifier used in ``stt.provider`` config.
Lowercase, no spaces. Examples: ``openrouter``, ``sensaudio``,
``gemini``, ``deepgram``. Names that collide with a built-in STT
provider (``local``, ``local_command``, ``groq``, ``openai``,
``mistral``, ``xai``) are rejected at registration time.
"""
@property
def display_name(self) -> str:
"""Human-readable label shown in ``hermes tools``.
Defaults to ``name.title()``.
"""
return self.name.title()
def is_available(self) -> bool:
"""Return True when this provider can service calls.
Typically checks for a required API key + that the SDK is
importable. Default: True (providers with no external
dependencies are always available).
Must NOT raise used by the picker and ``hermes setup`` for
availability displays and should fail gracefully.
"""
return True
def list_models(self) -> List[Dict[str, Any]]:
"""Return model catalog entries.
Each entry::
{
"id": "whisper-large-v3-turbo", # required
"display": "Whisper Large v3 Turbo", # optional
"languages": ["en", "es", "fr"], # optional
"max_audio_seconds": 1500, # optional
}
Default: empty list (provider has a single fixed model or
doesn't expose model selection).
"""
return []
def default_model(self) -> Optional[str]:
"""Return the default model id, or None if not applicable."""
models = self.list_models()
if models:
return models[0].get("id")
return None
def get_setup_schema(self) -> Dict[str, Any]:
"""Return provider metadata for the ``hermes tools`` picker.
Used by ``tools_config.py`` to inject this provider as a row in
the Speech-to-Text provider list. Shape::
{
"name": "OpenRouter STT", # picker label
"badge": "paid", # optional short tag
"tag": "Whisper via OpenRouter API", # optional subtitle
"env_vars": [ # keys to prompt for
{"key": "OPENROUTER_API_KEY",
"prompt": "OpenRouter API key",
"url": "https://openrouter.ai/keys"},
],
}
Default: minimal entry derived from ``display_name`` with no
env vars. Override to expose API key prompts and custom badges.
"""
return {
"name": self.display_name,
"badge": "",
"tag": "",
"env_vars": [],
}
@abc.abstractmethod
def transcribe(
self,
file_path: str,
*,
model: Optional[str] = None,
language: Optional[str] = None,
**extra: Any,
) -> Dict[str, Any]:
"""Transcribe the audio file at ``file_path``.
Returns a dict with the standard envelope::
{
"success": True,
"transcript": "the transcribed text",
"provider": "<this provider's name>",
}
or on failure::
{
"success": False,
"transcript": "",
"error": "human-readable error message",
"provider": "<this provider's name>",
}
Implementations should NOT raise convert exceptions to the
error envelope so the dispatcher can deliver a consistent shape
to the gateway/CLI caller.
Args:
file_path: Absolute path to the audio file. The dispatcher
has already validated existence + size before calling.
model: Model identifier from :meth:`list_models`, or None
to use :meth:`default_model`.
language: Optional BCP-47 language hint (e.g. ``"en"``,
``"ja"``) providers without language hints should
ignore this argument.
**extra: Forward-compat parameters future schema versions
may expose. Implementations should ignore unknown keys.
"""
+122
View File
@@ -0,0 +1,122 @@
"""
Transcription Provider Registry
================================
Central map of registered STT providers. Populated by plugins at
import-time via :meth:`PluginContext.register_transcription_provider`;
consumed by :mod:`tools.transcription_tools` to dispatch
:func:`transcribe_audio` calls to the active plugin backend **when**
the configured ``stt.provider`` name is not a built-in.
Built-ins-always-win
--------------------
Plugin names that collide with a built-in STT provider (``local``,
``local_command``, ``groq``, ``openai``, ``mistral``, ``xai``) are
rejected at registration with a warning. This invariant is also
re-checked at dispatch time in
:func:`tools.transcription_tools._dispatch_to_plugin_provider`.
"""
from __future__ import annotations
import logging
import threading
from typing import Dict, List, Optional
from agent.transcription_provider import TranscriptionProvider
logger = logging.getLogger(__name__)
# Names reserved for native built-in STT handlers. Plugins cannot
# register a name in this set — the registration call is rejected with
# a warning. **Kept in sync with ``BUILTIN_STT_PROVIDERS`` in
# :mod:`tools.transcription_tools`** — a regression test in
# ``tests/agent/test_transcription_registry.py::TestBuiltinSync``
# fails if the two lists drift. Importing from
# ``tools.transcription_tools`` directly would create a circular
# dependency (``tools.transcription_tools`` imports
# ``agent.transcription_registry`` for dispatch).
_BUILTIN_NAMES = frozenset({
"local",
"local_command",
"groq",
"openai",
"mistral",
"xai",
})
_providers: Dict[str, TranscriptionProvider] = {}
_lock = threading.Lock()
def register_provider(provider: TranscriptionProvider) -> None:
"""Register a transcription provider.
Rejects:
- Non-:class:`TranscriptionProvider` instances (raises :class:`TypeError`).
- Empty/whitespace ``.name`` (raises :class:`ValueError`).
- Names colliding with a built-in (logs a warning, silently
ignores built-ins-always-win invariant).
Re-registration (same ``name``) overwrites the previous entry and
logs a debug message makes hot-reload scenarios (tests, dev
loops) behave predictably.
"""
if not isinstance(provider, TranscriptionProvider):
raise TypeError(
f"register_provider() expects a TranscriptionProvider instance, "
f"got {type(provider).__name__}"
)
name = provider.name
if not isinstance(name, str) or not name.strip():
raise ValueError("Transcription provider .name must be a non-empty string")
key = name.strip().lower()
if key in _BUILTIN_NAMES:
logger.warning(
"Transcription provider '%s' shadows a built-in name; registration "
"ignored. Built-in STT providers (%s) always win — pick a different "
"name.",
key, ", ".join(sorted(_BUILTIN_NAMES)),
)
return
with _lock:
existing = _providers.get(key)
_providers[key] = provider
if existing is not None:
logger.debug(
"Transcription provider '%s' re-registered (was %r)",
key, type(existing).__name__,
)
else:
logger.debug(
"Registered transcription provider '%s' (%s)",
key, type(provider).__name__,
)
def list_providers() -> List[TranscriptionProvider]:
"""Return all registered providers, sorted by name."""
with _lock:
items = list(_providers.values())
return sorted(items, key=lambda p: p.name)
def get_provider(name: str) -> Optional[TranscriptionProvider]:
"""Return the provider registered under *name*, or None.
Name matching is case-insensitive and whitespace-tolerant mirrors
how ``tools.transcription_tools._get_provider`` normalizes the
configured ``stt.provider`` value.
"""
if not isinstance(name, str):
return None
return _providers.get(name.strip().lower())
def _reset_for_tests() -> None:
"""Clear the registry. **Test-only.**"""
with _lock:
_providers.clear()
+11 -1
View File
@@ -106,7 +106,17 @@ class AnthropicTransport(ProviderTransport):
elif block.type == "tool_use":
name = block.name
if strip_tool_prefix and name.startswith(_MCP_PREFIX):
name = name[len(_MCP_PREFIX):]
stripped = name[len(_MCP_PREFIX):]
# Only strip the mcp_ prefix for OAuth-injected tools
# (where Hermes adds the prefix when sending to Anthropic
# and must remove it on the way back). Native MCP server
# tools (from mcp_servers: in config.yaml) are registered
# in the tool registry under their FULL mcp_<server>_<tool>
# name and must NOT be stripped. GH-25255.
from tools.registry import registry as _tool_registry
if (_tool_registry.get_entry(stripped)
and not _tool_registry.get_entry(name)):
name = stripped
tool_calls.append(
ToolCall(
id=block.id,
+20 -3
View File
@@ -113,9 +113,8 @@ class ChatCompletionsTransport(ProviderTransport):
self, messages: list[dict[str, Any]], **kwargs
) -> list[dict[str, Any]]:
"""Messages are already in OpenAI format — strip internal fields
that strict chat-completions providers reject with HTTP 400/422.
Strips:
that strict chat-completions providers reject with HTTP 400/422
(or, in the case of some OpenAI-compatible gateways, 5xx):
- Codex Responses API fields: ``codex_reasoning_items`` /
``codex_message_items`` on the message, ``call_id`` /
@@ -127,6 +126,16 @@ class ChatCompletionsTransport(ProviderTransport):
``Extra inputs are not permitted, field: 'messages[N].tool_name'``.
Permissive providers (OpenRouter, MiniMax) silently ignore the
field, which masked the bug for months.
- Hermes-internal scaffolding markers any top-level message key
starting with ``_`` (e.g. ``_empty_recovery_synthetic``,
``_empty_terminal_sentinel``, ``_thinking_prefill``). These are
bookkeeping flags the agent loop attaches to messages so the
persistence layer can later strip its own scaffolding; they must
never reach the wire. Permissive providers (real OpenAI,
Anthropic) silently drop unknown message keys, but strict
gateways (e.g. opencode-go, codex.nekos.me) reject with
``Extra inputs are not permitted, field: 'messages[N]._empty_recovery_synthetic'``,
which then poisons every subsequent request in the session.
"""
needs_sanitize = False
for msg in messages:
@@ -139,6 +148,9 @@ class ChatCompletionsTransport(ProviderTransport):
):
needs_sanitize = True
break
if any(isinstance(k, str) and k.startswith("_") for k in msg):
needs_sanitize = True
break
tool_calls = msg.get("tool_calls")
if isinstance(tool_calls, list):
for tc in tool_calls:
@@ -160,6 +172,11 @@ class ChatCompletionsTransport(ProviderTransport):
msg.pop("codex_reasoning_items", None)
msg.pop("codex_message_items", None)
msg.pop("tool_name", None)
# Drop all Hermes-internal scaffolding markers (``_``-prefixed).
# OpenAI's message schema has no ``_``-prefixed fields, so this
# is safe and future-proofs against new markers being added.
for key in [k for k in msg if isinstance(k, str) and k.startswith("_")]:
msg.pop(key, None)
tool_calls = msg.get("tool_calls")
if isinstance(tool_calls, list):
for tc in tool_calls:
+15
View File
@@ -50,6 +50,7 @@ class ResponsesApiTransport(ProviderTransport):
reasoning_config: dict | None {effort, enabled}
session_id: str | None used for prompt_cache_key + xAI conv header
max_tokens: int | None max_output_tokens
timeout: float | None per-request timeout forwarded to the SDK
request_overrides: dict | None extra kwargs merged in
provider: str | None provider name for backend-specific logic
base_url: str | None endpoint URL
@@ -143,6 +144,20 @@ class ResponsesApiTransport(ProviderTransport):
if request_overrides:
kwargs.update(request_overrides)
# Forward per-request timeout to the SDK so OpenAI/Anthropic clients
# honor it. Without this, ``providers.<id>.request_timeout_seconds``
# is silently dropped on the main agent Codex path while the
# chat_completions path and auxiliary Codex adapter both forward it.
timeout = kwargs.get("timeout", params.get("timeout"))
if (
isinstance(timeout, (int, float))
and not isinstance(timeout, bool)
and 0 < float(timeout) < float("inf")
):
kwargs["timeout"] = float(timeout)
else:
kwargs.pop("timeout", None)
if is_codex_backend:
prompt_cache_key = kwargs.get("prompt_cache_key")
cache_scope_id = str(prompt_cache_key or session_id or "").strip()
+37 -2
View File
@@ -87,6 +87,39 @@ class TurnResult:
_TURN_ABORTED_MARKERS = ("<turn_aborted>", "<turn_aborted/>")
def _coerce_turn_input_text(user_input: Any) -> str:
"""Collapse Hermes/OpenAI rich content into app-server text input.
The current `turn/start` path sends text items only. TUI image attachment
can hand us OpenAI-style content parts, so keep the text/path hints and
replace opaque image payloads with a small marker instead of putting a
Python list into the `text` field.
"""
if isinstance(user_input, str):
return user_input
if isinstance(user_input, list):
parts: list[str] = []
for item in user_input:
if isinstance(item, str):
if item.strip():
parts.append(item)
continue
if not isinstance(item, dict):
if item is not None:
parts.append(str(item))
continue
item_type = item.get("type")
if item_type in {"text", "input_text"}:
text = item.get("text") or item.get("content") or ""
if text:
parts.append(str(text))
elif item_type in {"image", "image_url", "input_image"}:
parts.append("[image attached]")
text = "\n\n".join(p for p in parts if p).strip()
return text or "What do you see in this image?"
return "" if user_input is None else str(user_input)
# Substrings in codex stderr / JSON-RPC error messages that signal the
# subprocess died because its OAuth credentials are no longer valid.
# Kept conservative: we only redirect users to `codex login` when we're
@@ -327,7 +360,7 @@ class CodexAppServerSession:
def run_turn(
self,
user_input: str,
user_input: Any,
*,
turn_timeout: float = 600.0,
notification_poll_timeout: float = 0.25,
@@ -365,6 +398,8 @@ class CodexAppServerSession:
self._interrupt_event.clear()
projector = CodexEventProjector()
user_input_text = _coerce_turn_input_text(user_input)
# Send turn/start with the user input. Text-only for now (codex
# supports rich content but Hermes' text path is the common case).
try:
@@ -372,7 +407,7 @@ class CodexAppServerSession:
"turn/start",
{
"threadId": self._thread_id,
"input": [{"type": "text", "text": user_input}],
"input": [{"type": "text", "text": user_input_text}],
},
timeout=10,
)
+274
View File
@@ -0,0 +1,274 @@
"""
Text-to-Speech Provider ABC
============================
Defines the pluggable-backend interface for text-to-speech synthesis.
Providers register instances via
``PluginContext.register_tts_provider()``; the active one (selected via
``tts.provider`` in ``config.yaml``) services every ``text_to_speech``
tool call **only when the configured name is neither a built-in nor a
command-type provider declared under ``tts.providers.<name>``**.
Three coexisting TTS extension surfaces in resolution order:
1. **Built-in providers** (``BUILTIN_TTS_PROVIDERS`` in
:mod:`tools.tts_tool`) native Python implementations (edge, openai,
elevenlabs, ). **Always win** plugins cannot shadow them.
2. **Command-type providers** declared under ``tts.providers.<name>:
type: command`` (PR #17843, commit ``2facea7f7``). Wire any local
CLI into Hermes with shell-template placeholders. **Wins over a
same-name plugin** config is more local than plugin install.
3. **Plugin-registered providers** (this ABC). For backends that need a
Python SDK, streaming bytes, OAuth refresh, or voice-listing APIs
the shell-template grammar can't reasonably express.
Built-ins-always-win is enforced at registration time
(:func:`agent.tts_registry.register_provider` rejects names in
``BUILTIN_TTS_PROVIDERS`` with a warning) AND at dispatch time
(:func:`tools.tts_tool._dispatch_to_plugin_provider` re-checks
defensively). The dispatcher also rejects plugin dispatch when a same-
name command provider is configured.
Providers live in ``<repo>/plugins/tts/<name>/`` (built-in plugins, no
shipped today) or ``~/.hermes/plugins/tts/<name>/`` (user-installed).
None ship in-tree as of issue #30398 — the hook is additive
infrastructure waiting for a real consumer (Cartesia, Fish Audio, ).
Response contract
-----------------
:meth:`TTSProvider.synthesize` writes the audio bytes to ``output_path``
and returns the path as a string. Implementations should raise on
failure the dispatcher converts exceptions into the standard
``{success: False, error: }`` JSON envelope the rest of Hermes
expects.
"""
from __future__ import annotations
import abc
import logging
from typing import Any, Dict, Iterator, List, Optional
logger = logging.getLogger(__name__)
DEFAULT_OUTPUT_FORMAT = "mp3"
VALID_OUTPUT_FORMATS = frozenset({"mp3", "wav", "ogg", "opus", "flac"})
# ---------------------------------------------------------------------------
# ABC
# ---------------------------------------------------------------------------
class TTSProvider(abc.ABC):
"""Abstract base class for a text-to-speech backend.
Subclasses must implement :attr:`name` and :meth:`synthesize`.
Everything else has sane defaults override only what your provider
needs.
"""
@property
@abc.abstractmethod
def name(self) -> str:
"""Stable short identifier used in ``tts.provider`` config.
Lowercase, no spaces. Examples: ``cartesia``, ``fishaudio``,
``deepgram``. Names that collide with a built-in TTS provider
(``edge``, ``openai``, ``elevenlabs``, ``minimax``, ``gemini``,
``mistral``, ``xai``, ``piper``, ``kittentts``, ``neutts``) are
rejected at registration time.
"""
@property
def display_name(self) -> str:
"""Human-readable label shown in ``hermes tools``.
Defaults to ``name.title()`` (e.g. ``Cartesia`` for ``cartesia``).
"""
return self.name.title()
def is_available(self) -> bool:
"""Return True when this provider can service calls.
Typically checks for a required API key + that the SDK is
importable. Default: True (providers with no external
dependencies are always available).
Must NOT raise used by the picker and ``hermes setup`` for
availability displays and should fail gracefully.
"""
return True
def list_voices(self) -> List[Dict[str, Any]]:
"""Return voice catalog entries.
Each entry::
{
"id": "voice-abc-123", # required
"display": "Aria — neutral female", # optional; defaults to id
"language": "en-US", # optional
"gender": "female", # optional
"preview_url": "https://...mp3", # optional
}
Default: empty list (provider has no enumerable voices or
doesn't surface them via API).
"""
return []
def list_models(self) -> List[Dict[str, Any]]:
"""Return model catalog entries.
Each entry::
{
"id": "sonic-2", # required
"display": "Sonic 2", # optional
"languages": ["en", "es", "fr"], # optional
"max_text_length": 5000, # optional
}
Default: empty list (provider has a single fixed model or
doesn't expose model selection).
"""
return []
def get_setup_schema(self) -> Dict[str, Any]:
"""Return provider metadata for the ``hermes tools`` picker.
Used by ``tools_config.py`` to inject this provider as a row in
the Text-to-Speech provider list. Shape::
{
"name": "Cartesia", # picker label
"badge": "paid", # optional short tag
"tag": "Ultra-low-latency streaming", # optional subtitle
"env_vars": [ # keys to prompt for
{"key": "CARTESIA_API_KEY",
"prompt": "Cartesia API key",
"url": "https://play.cartesia.ai/console"},
],
}
Default: minimal entry derived from ``display_name`` with no
env vars. Override to expose API key prompts and custom badges.
"""
return {
"name": self.display_name,
"badge": "",
"tag": "",
"env_vars": [],
}
def default_model(self) -> Optional[str]:
"""Return the default model id, or None if not applicable."""
models = self.list_models()
if models:
return models[0].get("id")
return None
def default_voice(self) -> Optional[str]:
"""Return the default voice id, or None if not applicable."""
voices = self.list_voices()
if voices:
return voices[0].get("id")
return None
@abc.abstractmethod
def synthesize(
self,
text: str,
output_path: str,
*,
voice: Optional[str] = None,
model: Optional[str] = None,
speed: Optional[float] = None,
format: str = DEFAULT_OUTPUT_FORMAT,
**extra: Any,
) -> str:
"""Synthesize ``text`` and write audio bytes to ``output_path``.
Returns the absolute path to the written file as a string
(typically just echoes ``output_path``). Raises on failure
the dispatcher converts exceptions to the standard
``{success: False, error: ...}`` JSON envelope.
Args:
text: The text to synthesize. Already truncated to the
provider's max length by the dispatcher.
output_path: Absolute path where the audio file should be
written. Parent directory is guaranteed to exist.
voice: Voice identifier from :meth:`list_voices`, or None
to use :meth:`default_voice`.
model: Model identifier from :meth:`list_models`, or None
to use :meth:`default_model`.
speed: Optional speech-rate multiplier (1.0 = normal).
Providers that don't support speed control should
ignore this argument.
format: Output audio format. Implementations should match
the requested format when possible; if unsupported,
pick the closest equivalent and ensure ``output_path``
ends with the correct extension.
**extra: Forward-compat parameters future schema versions
may expose. Implementations should ignore unknown keys.
"""
def stream(
self,
text: str,
*,
voice: Optional[str] = None,
model: Optional[str] = None,
format: str = "opus",
**extra: Any,
) -> Iterator[bytes]:
"""Stream synthesized audio bytes.
Optional. Providers that don't support streaming raise
:class:`NotImplementedError` (the default) and the dispatcher
falls back to :meth:`synthesize` + read-whole-file.
Args mirror :meth:`synthesize`. Default ``format`` is ``opus``
because the primary streaming use case is voice-bubble
delivery (Telegram et al.) which requires Opus.
"""
raise NotImplementedError(
f"TTS provider {self.name!r} does not implement streaming "
"synthesis. Use synthesize() instead, or implement stream() "
"if your backend supports it."
)
@property
def voice_compatible(self) -> bool:
"""Whether output is suitable for voice-bubble delivery.
Mirrors the ``tts.providers.<name>.voice_compatible`` field
from PR #17843. When True, the gateway's voice-message
delivery pipeline runs ffmpeg conversion to Opus if needed.
When False, output is delivered as a regular audio attachment.
Default: False (safe providers opt in explicitly).
"""
return False
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def resolve_output_format(value: Optional[str]) -> str:
"""Clamp an output_format value to the valid set.
Invalid values are coerced to :data:`DEFAULT_OUTPUT_FORMAT` rather
than rejected so the tool surface is forgiving of agent mistakes.
"""
if not isinstance(value, str):
return DEFAULT_OUTPUT_FORMAT
v = value.strip().lower()
if v in VALID_OUTPUT_FORMATS:
return v
return DEFAULT_OUTPUT_FORMAT
+133
View File
@@ -0,0 +1,133 @@
"""
TTS Provider Registry
=====================
Central map of registered TTS providers. Populated by plugins at
import-time via :meth:`PluginContext.register_tts_provider`; consumed
by :mod:`tools.tts_tool` to dispatch ``text_to_speech`` tool calls to
the active plugin backend **when** the configured ``tts.provider``
name is neither a built-in nor a command-type provider.
Built-ins-always-win
--------------------
Plugin names that collide with a built-in TTS provider (``edge``,
``openai``, ``elevenlabs``, ``minimax``, ``gemini``, ``mistral``,
``xai``, ``piper``, ``kittentts``, ``neutts``) are rejected at
registration with a warning. This invariant is also re-checked at
dispatch time in :func:`tools.tts_tool._dispatch_to_plugin_provider`.
Command-providers-win-over-plugins
----------------------------------
This registry doesn't enforce the command-vs-plugin precedence — that
lives in the dispatcher, which checks for a same-name
``tts.providers.<name>: type: command`` entry before consulting the
registry. The rationale is locality: a name declared in the user's
``config.yaml`` is more specific to their setup than a plugin that
happens to be installed.
"""
from __future__ import annotations
import logging
import threading
from typing import Dict, List, Optional
from agent.tts_provider import TTSProvider
logger = logging.getLogger(__name__)
# Names reserved for native built-in TTS handlers. Plugins cannot
# register a name in this set — the registration call is rejected with
# a warning. **Kept in sync with ``BUILTIN_TTS_PROVIDERS`` in
# :mod:`tools.tts_tool`** — a regression test in
# ``tests/agent/test_tts_registry.py::TestBuiltinSync`` fails if the
# two lists drift. Importing from ``tools.tts_tool`` directly would
# create a circular dependency (``tools.tts_tool`` imports
# ``agent.tts_registry`` for dispatch).
_BUILTIN_NAMES = frozenset({
"edge",
"elevenlabs",
"openai",
"minimax",
"xai",
"mistral",
"gemini",
"neutts",
"kittentts",
"piper",
})
_providers: Dict[str, TTSProvider] = {}
_lock = threading.Lock()
def register_provider(provider: TTSProvider) -> None:
"""Register a TTS provider.
Rejects:
- Non-:class:`TTSProvider` instances (raises :class:`TypeError`).
- Empty/whitespace ``.name`` (raises :class:`ValueError`).
- Names colliding with a built-in (logs a warning, silently
ignores built-ins-always-win invariant).
Re-registration (same ``name``) overwrites the previous entry and
logs a debug message makes hot-reload scenarios (tests, dev
loops) behave predictably.
"""
if not isinstance(provider, TTSProvider):
raise TypeError(
f"register_provider() expects a TTSProvider instance, "
f"got {type(provider).__name__}"
)
name = provider.name
if not isinstance(name, str) or not name.strip():
raise ValueError("TTS provider .name must be a non-empty string")
key = name.strip().lower()
if key in _BUILTIN_NAMES:
logger.warning(
"TTS provider '%s' shadows a built-in name; registration ignored. "
"Built-in TTS providers (%s) always win — pick a different name.",
key, ", ".join(sorted(_BUILTIN_NAMES)),
)
return
with _lock:
existing = _providers.get(key)
_providers[key] = provider
if existing is not None:
logger.debug(
"TTS provider '%s' re-registered (was %r)",
key, type(existing).__name__,
)
else:
logger.debug(
"Registered TTS provider '%s' (%s)",
key, type(provider).__name__,
)
def list_providers() -> List[TTSProvider]:
"""Return all registered providers, sorted by name."""
with _lock:
items = list(_providers.values())
return sorted(items, key=lambda p: p.name)
def get_provider(name: str) -> Optional[TTSProvider]:
"""Return the provider registered under *name*, or None.
Name matching is case-insensitive and whitespace-tolerant mirrors
how ``tools.tts_tool._get_provider`` normalizes the configured
``tts.provider`` value.
"""
if not isinstance(name, str):
return None
return _providers.get(name.strip().lower())
def _reset_for_tests() -> None:
"""Clear the registry. **Test-only.**"""
with _lock:
_providers.clear()
+1 -1
View File
@@ -39,7 +39,7 @@ model:
# LM Studio is first-class and uses provider: "lmstudio".
# It works with both no-auth and auth-enabled server modes.
#
# Can also be overridden with --provider flag or HERMES_INFERENCE_PROVIDER env var.
# Can also be overridden for a single invocation with the --provider flag.
provider: "auto"
# API configuration (falls back to OPENROUTER_API_KEY env var)
+677 -154
View File
File diff suppressed because it is too large Load Diff
+36 -2
View File
@@ -45,6 +45,28 @@ _jobs_file_lock = threading.Lock()
OUTPUT_DIR = CRON_DIR / "output"
ONESHOT_GRACE_SECONDS = 120
# Fields on a cron job that must never change after creation. ``id`` is used
# as a filesystem path component under ``OUTPUT_DIR``; allowing it to be
# updated lets an unsafe value (``../escape``, absolute path, nested) leak
# into output writes/deletes.
_IMMUTABLE_JOB_FIELDS = frozenset({"id"})
def _job_output_dir(job_id: str) -> Path:
"""Resolve a job's output directory, rejecting any path-escape attempt.
Job IDs are filesystem path components under ``OUTPUT_DIR``. A legacy or
crafted ID containing ``..``, absolute paths, or nested separators would
allow output writes/deletes to escape the cron output sandbox. Reject
anything that isn't a single safe path component.
"""
text = str(job_id or "").strip()
if not text or text in {".", ".."} or "/" in text or "\\" in text:
raise ValueError(f"Invalid cron job id for output path: {job_id!r}")
if Path(text).is_absolute() or Path(text).drive:
raise ValueError(f"Invalid cron job id for output path: {job_id!r}")
return OUTPUT_DIR / text
def _normalize_skill_list(skill: Optional[str] = None, skills: Optional[Any] = None) -> List[str]:
"""Normalize legacy/single-skill and multi-skill inputs into a unique ordered list."""
@@ -728,6 +750,15 @@ def list_jobs(include_disabled: bool = False) -> List[Dict[str, Any]]:
def update_job(job_id: str, updates: Dict[str, Any]) -> Optional[Dict[str, Any]]:
"""Update a job by ID, refreshing derived schedule fields when needed."""
# Block mutation of immutable fields. ``id`` in particular is a filesystem
# path component under OUTPUT_DIR — letting an update change it leaks
# path-escape values into output writes/deletes.
bad_fields = _IMMUTABLE_JOB_FIELDS.intersection(updates or {})
if bad_fields:
raise ValueError(
f"Cron job field(s) cannot be updated: {', '.join(sorted(bad_fields))}"
)
jobs = load_jobs()
for i, job in enumerate(jobs):
if job["id"] != job_id:
@@ -845,9 +876,12 @@ def remove_job(job_id: str) -> bool:
original_len = len(jobs)
jobs = [j for j in jobs if j["id"] != canonical_id]
if len(jobs) < original_len:
# Resolve the output dir BEFORE saving so a legacy unsafe ID (e.g.
# left over from before the create-time guard) fails closed without
# half-applying the removal.
job_output_dir = _job_output_dir(canonical_id)
save_jobs(jobs)
# Clean up output directory to prevent orphaned dirs accumulating
job_output_dir = OUTPUT_DIR / canonical_id
if job_output_dir.exists():
shutil.rmtree(job_output_dir)
return True
@@ -1061,7 +1095,7 @@ def _get_due_jobs_locked() -> List[Dict[str, Any]]:
def save_job_output(job_id: str, output: str):
"""Save job output to file."""
ensure_dirs()
job_output_dir = OUTPUT_DIR / job_id
job_output_dir = _job_output_dir(job_id)
job_output_dir.mkdir(parents=True, exist_ok=True)
_secure_dir(job_output_dir)
+59 -3
View File
@@ -57,6 +57,29 @@ class CronPromptInjectionBlocked(Exception):
"""
def _resolve_cron_disabled_toolsets(cfg: dict) -> list[str]:
"""Toolsets a cron-spawned agent must never receive.
Three protected toolsets are always disabled in cron context:
- ``cronjob`` would let a cron-spawned agent schedule more cron jobs
- ``messaging`` interactive, needs a live gateway session
- ``clarify`` interactive, blocks waiting for user input
User-level ``agent.disabled_toolsets`` from config.yaml is layered on top
so per-job ``enabled_toolsets`` cannot bypass policy that applies to
ordinary agent runs (#25752 — LLM-supplied enabled_toolsets was widening
past config.yaml's denylist).
"""
disabled = ["cronjob", "messaging", "clarify"]
agent_cfg = (cfg or {}).get("agent") or {}
user_disabled = agent_cfg.get("disabled_toolsets") or []
for name in user_disabled:
name = str(name).strip()
if name and name not in disabled:
disabled.append(name)
return disabled
def _resolve_cron_enabled_toolsets(job: dict, cfg: dict) -> list[str] | None:
"""Resolve the toolset list for a cron job.
@@ -234,6 +257,30 @@ def _resolve_origin(job: dict) -> Optional[dict]:
return None
def _cron_job_origin_log_suffix(job: dict) -> str:
"""Return safe provenance details for security warnings about a cron job.
The scheduler normally has no live HTTP request object when it detects a
bad stored ``context_from`` reference. Including the job's saved origin
makes future probe logs actionable without exposing secrets: platform/chat
metadata for gateway-created jobs, and optional source-IP fields for API
surfaces that persist them in origin metadata.
"""
origin = job.get("origin")
if not isinstance(origin, dict):
return ""
fields = []
for key in ("platform", "chat_id", "thread_id", "source_ip", "remote", "forwarded_for"):
value = origin.get(key)
if value is None:
continue
text = str(value).replace("\r", " ").replace("\n", " ").strip()
if text:
fields.append(f"origin_{key}={text[:200]!r}")
return " " + " ".join(fields) if fields else ""
def _plugin_cron_env_var(platform_name: str) -> str:
"""Return the cron home-channel env var registered by a plugin platform.
@@ -529,7 +576,9 @@ def _send_media_via_adapter(
"""
from pathlib import Path
from gateway.platforms.base import should_send_media_as_audio
from gateway.platforms.base import BasePlatformAdapter, should_send_media_as_audio
media_files = BasePlatformAdapter.filter_media_delivery_paths(media_files)
for media_path, _is_voice in media_files:
try:
@@ -614,6 +663,7 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Option
# Extract MEDIA: tags so attachments are forwarded as files, not raw text
from gateway.platforms.base import BasePlatformAdapter
media_files, cleaned_delivery_content = BasePlatformAdapter.extract_media(delivery_content)
media_files = BasePlatformAdapter.filter_media_delivery_paths(media_files)
try:
config = load_gateway_config()
@@ -1001,7 +1051,13 @@ def _build_job_prompt(job: dict, prerun_script: Optional[tuple] = None) -> str:
for source_job_id in context_from:
# Guard against path traversal — valid job IDs are 12-char hex strings
if not source_job_id or not all(c in "0123456789abcdef" for c in source_job_id):
logger.warning("context_from: skipping invalid job_id %r", source_job_id)
logger.warning(
"context_from: skipping invalid job_id %r for job_id=%r name=%r%s",
source_job_id,
job.get("id"),
job.get("name"),
_cron_job_origin_log_suffix(job),
)
continue
try:
job_output_dir = OUTPUT_DIR / source_job_id
@@ -1571,7 +1627,7 @@ def _run_job_impl(job: dict) -> tuple[bool, str, str, Optional[str]]:
provider_sort=pr.get("sort"),
openrouter_min_coding_score=(_cfg.get("openrouter") or {}).get("min_coding_score"),
enabled_toolsets=_resolve_cron_enabled_toolsets(job, _cfg),
disabled_toolsets=["cronjob", "messaging", "clarify"],
disabled_toolsets=_resolve_cron_disabled_toolsets(_cfg),
quiet_mode=True,
# Cron jobs should always inherit the user's SOUL.md identity from
# HERMES_HOME. When a workdir is configured, also inject project
+10 -5
View File
@@ -6,17 +6,22 @@
#
# Set HERMES_UID / HERMES_GID to the host user that owns ~/.hermes so
# files created inside the container stay readable/writable on the host.
# The entrypoint remaps the internal `hermes` user to these values via
# usermod/groupmod + gosu.
# The s6-overlay stage2 hook remaps the internal `hermes` user to these
# values via usermod/groupmod; each supervised service then drops to that
# user via `s6-setuidgid`.
#
# Security notes:
# - The dashboard service binds to 127.0.0.1 by default. It stores API
# keys; exposing it on LAN without auth is unsafe. If you want remote
# access, use an SSH tunnel or put it behind a reverse proxy that
# adds authentication — do NOT pass --insecure --host 0.0.0.0.
# - If you override entrypoint, keep /opt/hermes/docker/entrypoint.sh in
# the command chain. It drops root to the hermes user before gateway
# files such as gateway.lock are created.
# - If you override entrypoint, keep `/init` as the first command in
# the chain (or let docker use the image's default ENTRYPOINT,
# which is `["/init", "/opt/hermes/docker/main-wrapper.sh"]`).
# `/init` is s6-overlay's PID 1 — it runs the cont-init.d scripts
# (chown, profile reconcile, dashboard toggle) and sets up the
# supervision tree before any service starts. Bypassing it skips
# all of that setup and the gateway will not work correctly.
# - The gateway's API server is off unless you uncomment API_SERVER_KEY
# and API_SERVER_HOST. See docs/user-guide/api-server.md before doing
# this on an internet-facing host.
+90
View File
@@ -0,0 +1,90 @@
#!/command/with-contenv sh
# shellcheck shell=sh
# Make supervise/ trees for ALL declared s6 services queryable and
# controllable by the unprivileged hermes user (UID 10000).
#
# Background (PR #30136 review item I4): the entire s6 lifecycle
# (s6-svc, s6-svstat, s6-svwait) is dispatched as the hermes user
# inside the container (every Hermes runtime path runs under
# ``s6-setuidgid hermes``). But s6-supervise creates each service's
# ``supervise/`` and top-level ``event/`` directory with mode 0700
# owned by its effective UID — which is root, because s6-supervise
# is spawned by s6-svscan running as PID 1. So unprivileged clients
# get EACCES on every probe / control call against the slot.
#
# Two fixes, one in each registration path:
#
# 1. For RUNTIME-registered profile gateways (created via the s6
# runtime register hooks in profiles.py): the Python helper
# ``_seed_supervise_skeleton`` pre-creates supervise/ + event/ +
# supervise/control owned by hermes BEFORE s6-svscanctl -a fires.
# s6-supervise's mkdir/mkfifo are EEXIST-safe, so it inherits our
# ownership and never tries to chown back to root.
#
# 2. For STATIC s6-rc services (dashboard, main-hermes) declared at
# image-build time under /etc/s6-overlay/s6-rc.d/*: these are
# compiled by s6-rc at boot, and s6-supervise spawns BEFORE
# cont-init.d gets to run — so by the time we're here, the
# supervise/ tree is already there as root:root 0700. We chown
# it here. s6-supervise will keep using the same files; it never
# re-asserts ownership on a running service.
#
# This script runs as root after 01-hermes-setup but before
# 02-reconcile-profiles, so the chowns are settled before the
# Python reconciler walks the scandir. Lexicographic ordering
# guarantees this — the suffix is unusual because we want to slot
# in between 01 and the existing 02-reconcile-profiles without
# renumbering both (which would be a churn-noise patch on its own).
set -eu
# /run/s6-rc/servicedirs holds the live, compiled service directories
# for every static (s6-rc) service. Symlinks under /run/service/*
# point here. Per-service supervise/ + event/ both need hermes
# ownership for s6-svstat etc. to work as hermes.
SVC_ROOT=/run/s6-rc/servicedirs
if [ ! -d "$SVC_ROOT" ]; then
echo "[supervise-perms] $SVC_ROOT not present; skipping"
exit 0
fi
for svc in "$SVC_ROOT"/*; do
[ -d "$svc" ] || continue
name=$(basename "$svc")
# Skip s6-overlay-internal services (they need to stay root-only;
# the s6rc-* helpers manage the supervision tree itself).
case "$name" in
s6rc-*|s6-linux-*)
continue
;;
esac
# supervise/ tree — needed by s6-svstat / s6-svc.
if [ -d "$svc/supervise" ]; then
chown -R hermes:hermes "$svc/supervise" 2>/dev/null || \
echo "[supervise-perms] could not chown $svc/supervise"
# 0710 = group searchable. ``s6-svstat`` only needs to openat
# status, not list the dir, but giving the hermes group +x is
# the minimum that lets group members access the contents.
chmod 0710 "$svc/supervise" 2>/dev/null || true
# supervise/control is a FIFO that s6-svc writes commands
# into; the hermes user needs +w. Owner is already hermes
# after the recursive chown above; widen perms to 0660 so
# ``s6-svc`` works for any member of the hermes group too.
if [ -p "$svc/supervise/control" ]; then
chmod 0660 "$svc/supervise/control" 2>/dev/null || true
fi
fi
# Top-level event/ dir — s6-svlisten1 / s6-svwait subscribe here.
if [ -d "$svc/event" ]; then
chown hermes:hermes "$svc/event" 2>/dev/null || \
echo "[supervise-perms] could not chown $svc/event"
# Preserve s6's 03730 mode (setgid + g+rwx + sticky).
chmod 03730 "$svc/event" 2>/dev/null || true
fi
done
echo "[supervise-perms] chowned supervise/ trees for static s6-rc services"
+46
View File
@@ -0,0 +1,46 @@
#!/command/with-contenv sh
# shellcheck shell=sh
# Container-boot reconciliation of per-profile gateway s6 services.
#
# Runs as root after 01-hermes-setup (the stage2 hook) has chowned
# the volume and seeded $HERMES_HOME, but before s6-rc starts user
# services. /etc/cont-init.d/* scripts run in lexicographic order,
# so the `02-` prefix guarantees ordering.
#
# Service directories under /run/service/ live on tmpfs and are
# wiped on every container restart. Profile directories under
# $HERMES_HOME/profiles/ live on the persistent VOLUME. This script
# walks the persistent profiles, recreates the s6 service slots,
# and auto-starts only those whose last recorded state was
# `running` — see hermes_cli/container_boot.py.
#
# Phase 4 also needs hermes-user writes to /run/service/ (so the
# profile create/delete hooks can register/unregister at runtime),
# so we chown the scandir before invoking the reconciler. We
# additionally chown the s6-svscan control FIFO so the hermes user
# can send rescan signals via ``s6-svscanctl -a``; without this the
# entire runtime-registration path is inert under UID 10000 (the
# Python wrapper catches the resulting EACCES, prints a warning,
# and swallows the failure).
set -e
# Make the dynamic scandir hermes-writable. The directory itself
# starts root-owned by s6-overlay.
chown hermes:hermes /run/service 2>/dev/null || true
# Make the svscan control FIFO hermes-writable so s6-svscanctl -a
# / -an work for the hermes user. The FIFO is created by s6-svscan
# at PID-1 startup, so by the time this cont-init.d script runs it
# already exists. Both ``control`` and ``lock`` need to be writable
# for the various svscanctl operations; the directory itself stays
# root-owned (we only need to touch the two FIFOs/locks inside).
if [ -d /run/service/.s6-svscan ]; then
for entry in control lock; do
if [ -e "/run/service/.s6-svscan/$entry" ]; then
chown hermes:hermes "/run/service/.s6-svscan/$entry" 2>/dev/null || true
fi
done
fi
exec s6-setuidgid hermes /opt/hermes/.venv/bin/python -m hermes_cli.container_boot
+25 -158
View File
@@ -1,160 +1,27 @@
#!/bin/bash
# Docker/Podman entrypoint: bootstrap config files into the mounted volume, then run hermes.
set -e
HERMES_HOME="${HERMES_HOME:-/opt/data}"
INSTALL_DIR="/opt/hermes"
# --- Privilege dropping via gosu ---
# When started as root (the default for Docker, or fakeroot in rootless Podman),
# optionally remap the hermes user/group to match host-side ownership, fix volume
# permissions, then re-exec as hermes.
if [ "$(id -u)" = "0" ]; then
if [ -n "$HERMES_UID" ] && [ "$HERMES_UID" != "$(id -u hermes)" ]; then
echo "Changing hermes UID to $HERMES_UID"
usermod -u "$HERMES_UID" hermes
fi
if [ -n "$HERMES_GID" ] && [ "$HERMES_GID" != "$(id -g hermes)" ]; then
echo "Changing hermes GID to $HERMES_GID"
# -o allows non-unique GID (e.g. macOS GID 20 "staff" may already exist
# as "dialout" in the Debian-based container image)
groupmod -o -g "$HERMES_GID" hermes 2>/dev/null || true
fi
# Fix ownership of the data volume. When HERMES_UID remaps the hermes user,
# files created by previous runs (under the old UID) become inaccessible.
# Always chown -R when UID was remapped; otherwise only if top-level is wrong.
actual_hermes_uid=$(id -u hermes)
needs_chown=false
if [ -n "$HERMES_UID" ] && [ "$HERMES_UID" != "10000" ]; then
needs_chown=true
elif [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; then
needs_chown=true
fi
if [ "$needs_chown" = true ]; then
echo "Fixing ownership of $HERMES_HOME to hermes ($actual_hermes_uid)"
# In rootless Podman the container's "root" is mapped to an unprivileged
# host UID — chown will fail. That's fine: the volume is already owned
# by the mapped user on the host side.
chown -R hermes:hermes "$HERMES_HOME" 2>/dev/null || \
echo "Warning: chown failed (rootless container?) — continuing anyway"
# The .venv must also be re-chowned when UID is remapped, otherwise
# lazy_deps.py cannot install platform packages (discord.py, etc.).
chown -R hermes:hermes "$INSTALL_DIR/.venv" 2>/dev/null || \
echo "Warning: chown .venv failed (rootless container?) — continuing anyway"
fi
# Ensure config.yaml is readable by the hermes runtime user even if it was
# edited on the host after initial ownership setup. Must run here (as root)
# rather than after the gosu drop, otherwise a non-root caller like
# `docker run -u $(id -u):$(id -g)` hits "Operation not permitted" (#15865).
if [ -f "$HERMES_HOME/config.yaml" ]; then
chown hermes:hermes "$HERMES_HOME/config.yaml" 2>/dev/null || true
chmod 640 "$HERMES_HOME/config.yaml" 2>/dev/null || true
fi
echo "Dropping root privileges"
exec gosu hermes "$0" "$@"
fi
# --- Running as hermes from here ---
source "${INSTALL_DIR}/.venv/bin/activate"
# Stamp install method for detect_install_method()
echo "docker" > "${HERMES_HOME:=/opt/data}/.install_method" 2>/dev/null || true
# Create essential directory structure. Cache and platform directories
# (cache/images, cache/audio, platforms/whatsapp, etc.) are created on
# demand by the application — don't pre-create them here so new installs
# get the consolidated layout from get_hermes_dir().
# The "home/" subdirectory is a per-profile HOME for subprocesses (git,
# ssh, gh, npm …). Without it those tools write to /root which is
# ephemeral and shared across profiles. See issue #4426.
mkdir -p "$HERMES_HOME"/{cron,sessions,logs,hooks,memories,skills,skins,plans,workspace,home}
# .env
if [ ! -f "$HERMES_HOME/.env" ]; then
cp "$INSTALL_DIR/.env.example" "$HERMES_HOME/.env"
fi
# config.yaml
if [ ! -f "$HERMES_HOME/config.yaml" ]; then
cp "$INSTALL_DIR/cli-config.yaml.example" "$HERMES_HOME/config.yaml"
fi
# SOUL.md
if [ ! -f "$HERMES_HOME/SOUL.md" ]; then
cp "$INSTALL_DIR/docker/SOUL.md" "$HERMES_HOME/SOUL.md"
fi
# auth.json: bootstrap from env on first boot only. Used by orchestrators
# (e.g. provisioning a Hermes VPS from an account-management service) that
# need to seed the OAuth refresh credential non-interactively, instead of
# walking the user through `hermes setup` + the device-flow login dance.
# Subsequent token rotations write back to the same file, which lives on a
# persistent volume — so this env var is consumed exactly once at first
# boot. The `[ ! -f ... ]` guard is critical: without it, a container
# restart would clobber a rotated refresh token with the now-stale value
# the orchestrator originally seeded.
if [ ! -f "$HERMES_HOME/auth.json" ] && [ -n "$HERMES_AUTH_JSON_BOOTSTRAP" ]; then
printf '%s' "$HERMES_AUTH_JSON_BOOTSTRAP" > "$HERMES_HOME/auth.json"
chmod 600 "$HERMES_HOME/auth.json"
fi
# Sync bundled skills (manifest-based so user edits are preserved)
if [ -d "$INSTALL_DIR/skills" ]; then
python3 "$INSTALL_DIR/tools/skills_sync.py"
fi
# Optionally start `hermes dashboard` as a side-process.
#!/bin/sh
# s6-overlay shim. The real logic lives in docker/stage2-hook.sh, invoked
# by /etc/cont-init.d/01-hermes-setup (installed by the Dockerfile). This
# file exists so external references to docker/entrypoint.sh still work,
# but it's no longer the ENTRYPOINT — /init is.
#
# Toggled by HERMES_DASHBOARD=1 (also accepts "true"/"yes", case-insensitive).
# Host/port/TUI can be overridden via:
# HERMES_DASHBOARD_HOST (default 0.0.0.0 — exposed outside the container)
# HERMES_DASHBOARD_PORT (default 9119, matches `hermes dashboard` default)
# HERMES_DASHBOARD_TUI (already honored by `hermes dashboard` itself)
# When called directly (e.g. by an old wrapper script that hard-coded
# docker/entrypoint.sh as the container ENTRYPOINT, or by an external
# orchestration script that invokes it inside the container), forward to
# the stage2 hook for parity with the pre-s6 entrypoint behavior. The
# stage2 hook only handles cont-init bootstrap (UID remap, chown, config
# seed, skills sync); it does NOT exec the CMD. Callers that depended
# on the pre-s6 contract "entrypoint.sh sets up state then execs hermes"
# will see the bootstrap happen but the CMD will not run from this shim.
#
# The dashboard is a long-lived server. We background it *before* the final
# `exec hermes "$@"` so the user's chosen foreground command (chat, gateway,
# sleep infinity, …) remains PID-of-interest for the container runtime. When
# the container stops the whole process tree is torn down, so no explicit
# cleanup is needed.
case "${HERMES_DASHBOARD:-}" in
1|true|TRUE|True|yes|YES|Yes)
dash_host="${HERMES_DASHBOARD_HOST:-0.0.0.0}"
dash_port="${HERMES_DASHBOARD_PORT:-9119}"
dash_args=(--host "$dash_host" --port "$dash_port" --no-open)
# Binding to anything other than localhost requires --insecure — the
# dashboard refuses otherwise because it exposes API keys. Inside a
# container this is the expected deployment (host reaches it via
# published port), so opt in automatically.
if [ "$dash_host" != "127.0.0.1" ] && [ "$dash_host" != "localhost" ]; then
dash_args+=(--insecure)
fi
echo "Starting hermes dashboard on ${dash_host}:${dash_port} (background)"
# Prefix dashboard output so it's distinguishable from the main
# process in `docker logs`. stdbuf keeps the pipe line-buffered.
(
stdbuf -oL -eL hermes dashboard "${dash_args[@]}" 2>&1 \
| sed -u 's/^/[dashboard] /'
) &
;;
esac
# Final exec: two supported invocation patterns.
#
# docker run <image> -> exec `hermes` with no args (legacy default)
# docker run <image> chat -q "..." -> exec `hermes chat -q "..."` (legacy wrap)
# docker run <image> sleep infinity -> exec `sleep infinity` directly
# docker run <image> bash -> exec `bash` directly
#
# If the first positional arg resolves to an executable on PATH, we assume the
# caller wants to run it directly (needed by the launcher which runs long-lived
# `sleep infinity` sandbox containers — see tools/environments/docker.py).
# Otherwise we treat the args as a hermes subcommand and wrap with `hermes`,
# preserving the documented `docker run <image> <subcommand>` behavior.
if [ $# -gt 0 ] && command -v "$1" >/dev/null 2>&1; then
exec "$@"
fi
exec hermes "$@"
# Deprecation: this shim is preserved for one release cycle to give
# downstream users time to migrate their wrappers to the image's real
# ENTRYPOINT (`/init`). It will be removed in a future major release.
# Surface a warning to stderr so anyone still invoking this path
# sees the migration notice in their logs.
echo "[hermes] WARNING: docker/entrypoint.sh is a deprecated shim under " \
"s6-overlay. The container's real ENTRYPOINT is /init + " \
"main-wrapper.sh; this script only runs the stage2 cont-init hook " \
"and does NOT exec the CMD. If you hard-coded docker/entrypoint.sh " \
"as your ENTRYPOINT, drop the override — docker will use the image's " \
"default ENTRYPOINT (/init), which handles bootstrap AND CMD." >&2
exec /opt/hermes/docker/stage2-hook.sh "$@"
+30
View File
@@ -0,0 +1,30 @@
#!/bin/sh
# /opt/hermes/docker/main-wrapper.sh — wraps the container's CMD with
# the same argument-routing logic the pre-s6 entrypoint.sh used. Runs
# as /init's "main program" (Docker CMD) so it inherits stdin/stdout/
# stderr from the container.
#
# Routing:
# no args → exec `hermes` (the default)
# first arg is an executable → exec it directly (sleep, bash, sh, …)
# first arg is anything else → exec `hermes <args>` (subcommand passthrough)
#
# We drop to the hermes user via `s6-setuidgid` so the supervised
# workload runs unprivileged (UID 10000 by default).
set -e
cd /opt/data
# shellcheck disable=SC1091
. /opt/hermes/.venv/bin/activate
if [ $# -eq 0 ]; then
exec s6-setuidgid hermes hermes
fi
if command -v "$1" >/dev/null 2>&1; then
# Bare executable — pass through directly.
exec s6-setuidgid hermes "$@"
fi
# Hermes subcommand pass-through.
exec s6-setuidgid hermes hermes "$@"
+30
View File
@@ -0,0 +1,30 @@
#!/command/with-contenv sh
# shellcheck shell=sh
# Dashboard finish script. Companion to ./run.
#
# When HERMES_DASHBOARD is unset (or falsy), ./run exits 0 immediately.
# Without this finish script, s6-supervise would just restart the run
# script in a tight loop. By exiting 125 here, we tell s6-supervise
# "this service has permanently failed; do not restart" — equivalent
# to `s6-svc -O`. The supervise slot reports as down, matching reality
# (no dashboard process is running).
#
# When HERMES_DASHBOARD IS enabled and the run script later exits or
# is killed, we want s6-supervise to restart it (the whole point of
# supervised lifecycle). So we exit non-125 in that case.
# Arguments passed to a finish script: $1=run-exit-code, $2=signal-num,
# $3=service-dir-name, $4=run-pgid. See servicedir(7).
case "${HERMES_DASHBOARD:-}" in
1|true|TRUE|True|yes|YES|Yes)
# Dashboard was enabled — let s6-supervise restart on crash by
# exiting non-125. (Pass-through any sensible default.)
exit 0
;;
*)
# Dashboard disabled — permanent-failure marker so s6-supervise
# leaves the slot in 'down' state and s6-svstat reflects that.
exit 125
;;
esac
+40
View File
@@ -0,0 +1,40 @@
#!/command/with-contenv sh
# shellcheck shell=sh
# Dashboard service. Always declared so s6 has a supervised slot; if
# HERMES_DASHBOARD isn't truthy the run script exits cleanly and the
# companion finish script returns 125 (s6's "permanent failure, do
# not restart" marker), so s6-svstat reports the slot as down. See
# also docker/s6-rc.d/dashboard/finish.
case "${HERMES_DASHBOARD:-}" in
1|true|TRUE|True|yes|YES|Yes) ;;
*)
# Exit 0; the finish script will exit 125 → s6-supervise won't
# restart us and the slot reports down. Using a clean exit
# (rather than `exec sleep infinity`) means s6-svstat reflects
# reality: when HERMES_DASHBOARD is unset, the service is NOT
# running, just supervised-with-permanent-failure. See PR
# #30136 review item I3.
exit 0
;;
esac
cd /opt/data
# shellcheck disable=SC1091
. /opt/hermes/.venv/bin/activate
dash_host="${HERMES_DASHBOARD_HOST:-0.0.0.0}"
dash_port="${HERMES_DASHBOARD_PORT:-9119}"
# Binding to anything other than localhost requires --insecure — the
# dashboard refuses otherwise because it exposes API keys. Inside a
# container this is the expected deployment.
insecure=""
case "$dash_host" in
127.0.0.1|localhost) ;;
*) insecure="--insecure" ;;
esac
# shellcheck disable=SC2086 # word-splitting of $insecure is intentional
exec s6-setuidgid hermes hermes dashboard \
--host "$dash_host" --port "$dash_port" --no-open $insecure
+1
View File
@@ -0,0 +1 @@
longrun
+27
View File
@@ -0,0 +1,27 @@
#!/command/with-contenv sh
# shellcheck shell=sh
# Main hermes service.
#
# IMPORTANT — this is NOT how the user's CMD runs.
#
# We chose Architecture B from the plan: the container's CMD (the bare
# command the user passes to `docker run <image> …`) runs as /init's
# "main program" via Docker's CMD mechanism, NOT as an s6-supervised
# service. This is the canonical s6-overlay pattern for "container
# exits when the program exits" semantics, and it lets us preserve
# every pre-s6 invocation contract (chat passthrough, sleep infinity,
# bash, --tui) without re-implementing argument routing through
# /run/s6/container_environment.
#
# So why does this service exist at all? Two reasons:
# 1. s6-rc requires at least one user service for the "user" bundle
# to be valid. We can't ship an empty bundle.
# 2. Future work may want to supervise a long-lived hermes process
# (e.g. for gateway-server containers); having the slot already
# wired in keeps that change small.
#
# For now this service is a no-op: it sleeps forever, doing nothing.
# The dashboard runs as a real s6 service alongside it (see
# ../dashboard/run) and per-profile gateways register dynamically via
# /run/service/ at runtime (Phase 4).
exec sleep infinity
+1
View File
@@ -0,0 +1 @@
longrun
+142
View File
@@ -0,0 +1,142 @@
#!/bin/sh
# s6-overlay stage2 hook — runs as root after the supervision tree is
# up but before user services start. Handles UID/GID remap, volume
# chown, config seeding, and skills sync.
#
# Per-service privilege drop happens inside each service's `run` script
# (and in main-wrapper.sh) via s6-setuidgid, not here.
#
# Wired into the image as /etc/cont-init.d/01-hermes-setup by the
# Dockerfile. The shim at docker/entrypoint.sh forwards to this script
# so external references to docker/entrypoint.sh still work.
#
# NB: cont-init.d scripts run with no arguments — the user's CMD args
# are NOT visible here. That's fine: we use Architecture B (s6-overlay
# main-program model), so main-wrapper.sh runs the CMD with full
# stdin/stdout/stderr access and handles arg parsing there.
set -eu
HERMES_HOME="${HERMES_HOME:-/opt/data}"
INSTALL_DIR="/opt/hermes"
# --- UID/GID remap ---
if [ -n "${HERMES_UID:-}" ] && [ "$HERMES_UID" != "$(id -u hermes)" ]; then
echo "[stage2] Changing hermes UID to $HERMES_UID"
usermod -u "$HERMES_UID" hermes
fi
if [ -n "${HERMES_GID:-}" ] && [ "$HERMES_GID" != "$(id -g hermes)" ]; then
echo "[stage2] Changing hermes GID to $HERMES_GID"
# -o allows non-unique GID (e.g. macOS GID 20 "staff" may already
# exist as "dialout" in the Debian-based container image).
groupmod -o -g "$HERMES_GID" hermes 2>/dev/null || true
fi
# --- Fix ownership of data volume ---
actual_hermes_uid=$(id -u hermes)
needs_chown=false
if [ -n "${HERMES_UID:-}" ] && [ "$HERMES_UID" != "10000" ]; then
needs_chown=true
elif [ "$(stat -c %u "$HERMES_HOME" 2>/dev/null)" != "$actual_hermes_uid" ]; then
needs_chown=true
fi
if [ "$needs_chown" = true ]; then
echo "[stage2] Fixing ownership of $HERMES_HOME to hermes ($actual_hermes_uid)"
# In rootless Podman the container's "root" is mapped to an
# unprivileged host UID — chown will fail. That's fine: the volume
# is already owned by the mapped user on the host side.
chown -R hermes:hermes "$HERMES_HOME" 2>/dev/null || \
echo "[stage2] Warning: chown failed (rootless container?) — continuing"
# The .venv must also be re-chowned when UID is remapped, otherwise
# lazy_deps.py cannot install platform packages (discord.py, etc.).
chown -R hermes:hermes "$INSTALL_DIR/.venv" 2>/dev/null || \
echo "[stage2] Warning: chown .venv failed (rootless container?) — continuing"
fi
# Always reset ownership of $HERMES_HOME/profiles to hermes on every
# boot. Profile dirs and files can land owned by root when commands
# are invoked via `docker exec <container> hermes …` (which defaults
# to root unless `-u` is passed), and that breaks the cont-init
# reconciler (02-reconcile-profiles) which runs as hermes and walks
# the profiles dir. Idempotent; skipped on rootless containers where
# chown would fail.
if [ -d "$HERMES_HOME/profiles" ]; then
chown -R hermes:hermes "$HERMES_HOME/profiles" 2>/dev/null || true
fi
# --- config.yaml permissions ---
# Ensure config.yaml is readable by the hermes runtime user even if it
# was edited on the host after initial ownership setup.
if [ -f "$HERMES_HOME/config.yaml" ]; then
chown hermes:hermes "$HERMES_HOME/config.yaml" 2>/dev/null || true
chmod 640 "$HERMES_HOME/config.yaml" 2>/dev/null || true
fi
# --- Seed directory structure as hermes user ---
# Run as hermes via s6-setuidgid so dirs end up owned correctly (matters
# under rootless Podman where chown back to root would fail).
#
# Use direct `mkdir -p` invocation (no `sh -c "..."` wrapper) so the
# shell isn't a second interpreter — defends against $HERMES_HOME values
# containing shell metacharacters. PR #30136 review item O2.
s6-setuidgid hermes mkdir -p \
"$HERMES_HOME/cron" \
"$HERMES_HOME/sessions" \
"$HERMES_HOME/logs" \
"$HERMES_HOME/hooks" \
"$HERMES_HOME/memories" \
"$HERMES_HOME/skills" \
"$HERMES_HOME/skins" \
"$HERMES_HOME/plans" \
"$HERMES_HOME/workspace" \
"$HERMES_HOME/home"
# --- Install-method stamp (read by detect_install_method() in hermes status) ---
# Preserved from the tini-era entrypoint (PR #27843). Must be written as
# the hermes user so ownership matches the file's documented owner.
# tee is invoked directly via s6-setuidgid (no `sh -c` wrapper) for the
# same shell-metacharacter safety described above.
printf 'docker\n' | s6-setuidgid hermes tee "$HERMES_HOME/.install_method" >/dev/null \
|| true
# --- Seed config files (only on first boot) ---
seed_one() {
dest=$1
src=$2
if [ ! -f "$HERMES_HOME/$dest" ] && [ -f "$INSTALL_DIR/$src" ]; then
s6-setuidgid hermes cp "$INSTALL_DIR/$src" "$HERMES_HOME/$dest"
fi
}
seed_one ".env" ".env.example"
seed_one "config.yaml" "cli-config.yaml.example"
seed_one "SOUL.md" "docker/SOUL.md"
# .env holds API keys and secrets — restrict to owner-only access. Applied
# unconditionally (not only on first-seed) so a host-mounted .env that was
# created with a permissive umask gets tightened on every container start.
if [ -f "$HERMES_HOME/.env" ]; then
chown hermes:hermes "$HERMES_HOME/.env" 2>/dev/null || true
chmod 600 "$HERMES_HOME/.env" 2>/dev/null || true
fi
# auth.json: bootstrap from env on first boot only. Same semantics as the
# pre-s6 entrypoint — the [ ! -f ] guard is critical to avoid clobbering
# rotated refresh tokens on container restart.
if [ ! -f "$HERMES_HOME/auth.json" ] && [ -n "${HERMES_AUTH_JSON_BOOTSTRAP:-}" ]; then
printf '%s' "$HERMES_AUTH_JSON_BOOTSTRAP" > "$HERMES_HOME/auth.json"
chown hermes:hermes "$HERMES_HOME/auth.json" 2>/dev/null || true
chmod 600 "$HERMES_HOME/auth.json"
fi
# --- Sync bundled skills ---
# Invoke the venv's python by absolute path so we don't need a `sh -c`
# wrapper to source the activate script. This is safe because
# skills_sync.py doesn't depend on any environment exports beyond what
# the python binary's own bin-stub already sets up (sys.path is rooted
# at the venv's site-packages by virtue of running .venv/bin/python).
if [ -d "$INSTALL_DIR/skills" ]; then
s6-setuidgid hermes "$INSTALL_DIR/.venv/bin/python" "$INSTALL_DIR/tools/skills_sync.py" \
|| echo "[stage2] Warning: skills_sync.py failed; continuing"
fi
echo "[stage2] Setup complete; starting user services"
@@ -0,0 +1,434 @@
# s6-overlay Supervision for Per-Profile Gateways in Docker — Implementation Plan
> **Status: shipped.** Phases 05 landed via PR
> [NousResearch/hermes-agent#30136](https://github.com/NousResearch/hermes-agent/pull/30136)
> in May 2026. This document is preserved as a post-implementation reference
> for the architecture and the resolved design questions. The phase-by-phase
> TDD walkthrough (≈2,800 lines) and the v2/v3 re-validation preambles have
> been removed — the canonical implementation history is the PR commit log
> (`git log --oneline a957ef083..a6f7171a5 -- 'docker/*' 'hermes_cli/service_manager.py' …`).
> Open Questions are collapsed into a single Decision Log table; full
> deliberations live in PR review comments.
**Goal:** Replace `tini` with s6-overlay as PID 1 in the Hermes Docker image so
that the main hermes process, the dashboard, and dynamically-created
per-profile gateways all run as supervised services (auto-restart on crash,
clean shutdown, signal forwarding, zombie reaping). Preserve every existing
`docker run …` invocation pattern — including interactive TUI.
**Architecture:** s6-overlay's `/init` is the container ENTRYPOINT, running
s6-svscan as PID 1. Main hermes and the dashboard are declared as static
s6-rc services at image build time. Per-profile gateways — which users create
*after* the image is built (`hermes profile create coder`
`coder gateway start`) — are registered dynamically by writing service
directories under a scandir watched by s6-svscan. A `ServiceManager` protocol
abstracts the install/start/stop/restart surface across the init systems we
care about (systemd on Linux host, launchd on macOS host, Scheduled Tasks on
native Windows host, s6 inside container) and adds a second tier for runtime
service registration that only s6 implements.
**Tech Stack:**
- [s6-overlay](https://github.com/just-containers/s6-overlay) v3.2.3.0
(noarch + per-arch tarballs ~15 MB). SHA256-pinned via build ARGs;
multi-arch via `TARGETARCH` (amd64 → `x86_64`, arm64 → `aarch64`).
- Debian 13.4 base image (unchanged).
- [hadolint](https://github.com/hadolint/hadolint) for the Dockerfile +
[shellcheck](https://github.com/koalaman/shellcheck) for entrypoint scripts.
- Python subprocess wrappers for `s6-svc`, `s6-svstat`, `s6-svscanctl`.
- Existing systemd/launchd/windows surface in `hermes_cli/gateway.py` and
`hermes_cli/gateway_windows.py`.
**Scope:**
- Container-only (host-side systemd/launchd/windows behavior is preserved,
not modified).
- s6-overlay only (no pure-Python fallback).
- Architecture A (s6 owns PID 1; tini is removed).
- Interactive TUI must keep working:
`docker run -it --rm nousresearch/hermes-agent:latest --tui`.
- Dynamic registration is limited to per-profile gateways — one service per
profile, created when a profile is created, torn down when deleted. A
`gateway-default` slot is always registered for the root HERMES_HOME
profile so `hermes gateway start` (no `-p`) has somewhere to land.
**Out of scope:**
- Host-side dynamic supervision (systemd-run / launchd transient plists) —
not needed.
- Pure-Python supervisor fallback — not needed.
- Arbitrary user-defined supervised processes inside the container — only
profile gateways.
- Migration of existing per-profile systemd unit generation to s6 on the
host side.
- Non-Docker container runtimes (Podman rootless validated reactively).
- UX polish around in-container profile lifecycle (e.g. a nice status view
of all supervised profile gateways) — deferred to follow-up.
---
## Background From The Codebase
> **Note on line numbers:** This section refers to functions and structures
> by name only. Use `grep -n 'def <name>' <file>` to locate anything below
> if you need the current line.
### Pre-s6 container init (what we replaced)
The original `Dockerfile` declared
`ENTRYPOINT [ "/usr/bin/tini", "-g", "--", "/opt/hermes/docker/entrypoint.sh" ]`.
tini was PID 1, reaped zombies, forwarded SIGTERM to the process group. The
old `docker/entrypoint.sh`:
1. `gosu` privilege drop from root → `hermes` UID.
2. Copied `.env.example`, `cli-config.yaml.example`, `SOUL.md` into
`$HERMES_HOME` if missing.
3. Synced bundled skills via `tools/skills_sync.py`.
4. Optionally backgrounded `hermes dashboard` in a subshell when
`HERMES_DASHBOARD=1`**not supervised**, no restart.
5. `exec hermes "$@"` — tini's sole direct child.
Known limitations: dashboard crash → stays dead; dashboard fails at startup →
silent; gateway crash → dashboard dies too. The May 4, 2026 decision was
"leave as is" because nothing in the container needed supervision then.
Adding per-profile gateway supervision changed that.
### ServiceManager surface (what we wrapped, not refactored)
All init-system logic lives in **`hermes_cli/gateway.py`** (~5,400 LOC at
re-validation). The systemd/launchd code is ~1,500 lines of that, plus a
separate **`hermes_cli/gateway_windows.py`** (~690 LOC) for Windows
Scheduled Tasks.
| Layer | Systemd functions | Launchd functions | Windows functions |
|---|---|---|---|
| **Detection** | `supports_systemd_services()`, `_systemd_operational()`, `_wsl_systemd_operational()`, `_container_systemd_operational()` | `is_macos()` | `is_windows()`, `gateway_windows.is_installed()` |
| **Paths** | `get_systemd_unit_path(system)`, `get_service_name()` | `get_launchd_plist_path()`, `get_launchd_label()` | `gateway_windows.get_task_name()`, `get_task_script_path()`, `get_startup_entry_path()` |
| **Install/lifecycle** | `systemd_install(force, system, run_as_user)`, `systemd_uninstall(system)`, `systemd_start/stop/restart(system)` | `launchd_install(force)`, `launchd_uninstall/start/stop/restart` | `gateway_windows.install/uninstall/start/stop/restart` |
| **Probes** | `_probe_systemd_service_running(system)`, `_read_systemd_unit_properties(system)`, `_wait_for_systemd_service_restart`, `_recover_pending_systemd_restart` | `_probe_launchd_service_running()` | `gateway_windows.is_task_registered()`, `_pid_exists` helper |
| **D-Bus plumbing** | `_ensure_user_systemd_env`, `_user_systemd_socket_ready`, `_user_systemd_private_socket_path`, `get_systemd_linger_status` | — | — |
| **Unit/plist generation** | `generate_systemd_unit(system, run_as_user)`, `systemd_unit_is_current`, `refresh_systemd_unit_if_needed` | plist templating in `launchd_install` | `_build_gateway_cmd_script`, `_build_startup_launcher`, `_write_task_script` |
Container-relevant callers outside `gateway.py`:
- `hermes_cli/status.py` — gained an `s6` branch for in-container runs.
- `hermes_cli/profiles.py``create_profile` / `delete_profile` register and
unregister with s6 inside the container (no-op on host).
- `hermes_cli/doctor.py``_check_gateway_service_linger` skips on s6, and a
new "Service Supervisor" section reports main-hermes / dashboard /
profile-gateway counts via the ServiceManager.
- `hermes_cli/gateway.py::gateway_command` — the
`elif is_container():` rejection arms that refused gateway lifecycle
operations were removed; the `_dispatch_via_service_manager_if_s6` helper
intercepts start/stop/restart and routes them through s6.
### Per-profile gateway spawning
`hermes gateway start`, `coder gateway start` (profile alias), and
`hermes -p <profile> gateway start` all spawn a gateway process scoped to a
given profile. See
[Profiles: Running Gateways](https://hermes-agent.nousresearch.com/docs/user-guide/profiles#running-gateways).
On host, lifecycle is managed via per-profile systemd units
(`hermes-gateway-<profile>.service`); inside the container, an s6 service at
`/run/service/gateway-<name>/` is registered when the profile is created and
torn down when it's deleted.
**Persistence across container restart:** `/run/service/` is tmpfs —
service registrations are wiped when the container restarts. Profile
directories at `/opt/data/profiles/<name>/` live on the persistent VOLUME,
and each one records its gateway's last state in `gateway_state.json`.
`/etc/cont-init.d/02-reconcile-profiles` walks the persistent profiles on
every container boot, recreates the s6 service slots via
`hermes_cli/container_boot.py`, and auto-starts those whose last recorded
state was `running`. Profiles whose last state was `stopped`,
`startup_failed`, `starting`, or absent get their slot recreated in the
`down` state and wait for explicit user action. `docker restart` is therefore
invisible to a user with running profile gateways: they come back up;
stopped ones stay stopped.
### s6-overlay constraints
- **Root/non-root model:** `/init` runs as root to set up the supervision
tree, install signal handlers, and run the stage2 hook that does
`usermod`/`chown`. Each supervised service drops to UID 10000 via
`s6-setuidgid hermes` in its `run` script. The per-service `s6-supervise`
monitor stays root so it can signal its child regardless of UID. Net
effect: hermes and all its subprocesses run as UID 10000 exactly as
before; only the supervision tree itself runs as root.
- v3.2.3.0 has limited non-root support for running `/init` itself as
non-root — some tools (`fix-attrs`, `logutil-service`) assume root. We
don't hit this because `/init` runs as root.
- Scandir hard cap: `services_max` default 1000, configurable to 160,000.
- `/command/with-contenv` sources `/run/s6/container_environment/*` into
service env — convenient for passing `HERMES_HOME` etc.
- s6 signal semantics: service crash triggers `s6-supervise` restart after
1s; override with a `finish` script.
- Zombie reaping: PID 1 (s6-svscan) reaps all zombies non-blockingly on
SIGCHLD. Any subagent subprocess spawned by the main hermes process is
reaped automatically.
---
## Key Design Decisions
### D1. s6-overlay replaces tini entirely
Container ENTRYPOINT is `/init`, PID 1 is s6-svscan. The main hermes
process, the dashboard, and every per-profile gateway run as supervised
services. This is a single breaking change to the container contract.
### D2. Main hermes is an s6 service with container-exit semantics
The contract "container exits when `hermes` exits" is preserved via a
service `finish` script that writes to
`/run/s6-linux-init-container-results/exitcode` and calls
`/run/s6/basedir/bin/halt`. All five supported invocations work:
| `docker run <image> …` | Behavior |
|---|---|
| (no args) | `hermes` with no args, container exits when hermes exits |
| `chat -q "..."` | `hermes chat -q "..."`, container exits with hermes exit code |
| `sleep infinity` | `sleep infinity` directly (long-lived sandbox mode) |
| `bash` | interactive `bash` directly |
| `docker run -it … --tui` | interactive Ink TUI with real TTY — see D9 |
`docker/main-wrapper.sh` detects whether `$1` is an executable on PATH and
routes either to "run this as a one-shot main service" or "wrap with
hermes".
### D3. Static services at build time; dynamic (per-profile) services at runtime
s6 offers two mechanisms:
- **s6-rc** (declarative, compile-then-swap): used for main hermes and the
dashboard — they're known at image build time.
- **scandir** (drop a directory + `s6-svscanctl -a`): used for per-profile
gateways — profiles are user-created after the image is built.
Per-profile gateway service dirs live at `/run/service/gateway-<profile>/`
(tmpfs, hermes-writable). s6-svscan picks them up on rescan.
### D4. ServiceManager protocol with two methods for runtime registration
Host paths (systemd, launchd, Windows Scheduled Tasks) need only
install/start/stop/restart of pre-declared services. Inside the container,
we additionally need to register services at runtime when a profile is
created. The protocol exposes this directly:
```python
class ServiceManager(Protocol):
kind: ServiceManagerKind # "systemd" | "launchd" | "windows" | "s6" | "none"
# Lifecycle of an already-declared service
def start(self, name: str) -> None: ...
def stop(self, name: str) -> None: ...
def restart(self, name: str) -> None: ...
def is_running(self, name: str) -> bool: ...
# Runtime registration (container-only; hosts raise NotImplementedError)
def supports_runtime_registration(self) -> bool: ...
def register_profile_gateway(
self, profile: str, *,
extra_env: dict[str, str] | None = None,
) -> None: ...
def unregister_profile_gateway(self, profile: str) -> None: ...
def list_profile_gateways(self) -> list[str]: ...
```
Systemd, launchd, and Windows backends raise `NotImplementedError` on the
registration methods. Only the s6 backend implements them. Callers check
`supports_runtime_registration()` before calling.
The scope is intentionally narrow: it's specifically "register/unregister a
profile gateway," not a general-purpose process-management API.
### D5. Per-profile gateway service spec is fixed, not user-provided
Every profile gateway has the same command shape
(`hermes -p <profile> gateway run`, or `hermes gateway run` for the default
profile). The s6 backend generates the `run` script from a fixed template
given the profile name — no arbitrary command list. This keeps the API
surface tight and prevents callers from accidentally registering
non-gateway services.
Port selection is governed by the profile's `config.yaml`
(`[gateway] port = …`) — the single source of truth. (The original plan
proposed a Python-side SHA-256 port allocator with a 600-port range; it was
retired during PR review because it was dead code through the entire stack.)
### D6. Add detect_service_manager() alongside supports_systemd_services()
`supports_systemd_services()` stays as-is (host code paths unchanged). A new
`detect_service_manager() -> Literal["systemd", "launchd", "windows", "s6", "none"]`
composes existing detection functions (`is_macos()`, `is_windows()`,
`supports_systemd_services()`, `is_container()` + `_s6_running()`) and adds
an s6 branch for container detection. Host call sites continue to use the
existing functions; container-only code (the profile hooks) uses the new one.
`_s6_running()` probes `/proc/1/comm` (world-readable) and
`/run/s6/basedir`. The earlier `/proc/1/exe` probe was root-only readable
and silently failed for the unprivileged hermes user (UID 10000), making
the entire runtime-registration path inert in production — caught in PR
review.
### D7. Wrap existing systemd/launchd/windows functions, don't rewrite them
`SystemdServiceManager` / `LaunchdServiceManager` / `WindowsServiceManager`
are thin adapters over the existing `systemd_*` / `launchd_*` module-level
functions in `hermes_cli/gateway.py` and the
`gateway_windows.install/uninstall/start/stop/restart/is_installed`
functions in `hermes_cli/gateway_windows.py`. We get the abstraction
without rewriting ~2,200 LOC of working code.
### D8. Profile create/delete hooks register/unregister the s6 service
When `hermes profile create <name>` runs inside the container, the
profile-creation code path calls
`ServiceManager.register_profile_gateway(<name>)` if
`supports_runtime_registration()` is True. When `hermes profile delete
<name>` runs, it calls `unregister_profile_gateway(<name>)`. On host, both
calls are no-ops (registration not supported; existing systemd unit
generation continues to handle install/uninstall).
Existing per-profile `hermes -p <profile> gateway start/stop/restart` CLI
commands continue to work — in the container they dispatch to
`ServiceManager.start/stop/restart("gateway-<profile>")`, which translates
to `s6-svc -u`/`-d`/`-t` on the service dir.
`hermes gateway start` (no `-p`) targets a special `gateway-default` slot
that's always registered by the cont-init reconciler. Its run script omits
the `-p` flag and runs against the root `$HERMES_HOME` profile.
`--all` lifecycle (`hermes gateway stop --all`, `... restart --all`)
iterates `mgr.list_profile_gateways()` through s6 so s6's `want up`/`want
down` flips correctly. Without this, `--all` fell through to `pkill`
followed by s6-supervise auto-restart — net effect: kick instead of stop.
### D9. Interactive TUI bypasses s6 service-mode and runs as CMD for TTY passthrough
`docker run -it --rm <image> --tui` needs a real TTY connected to container
stdin/stdout for Ink raw-mode keyboard input, cursor control, and SIGWINCH.
Running the TUI as a normal s6 service fails because s6-supervise
disconnects service stdio from the container TTY (documented:
[s6-overlay#230](https://github.com/just-containers/s6-overlay/issues/230)).
**The pattern:** s6-overlay's `/init` execs a CMD as the container's "main
program" after the supervision tree is up. The CMD inherits
stdin/stdout/stderr from `/init` — which in `-it` mode is the container
TTY. The stage2 hook detects the TUI case and short-circuits the
main-hermes service so the hermes CMD becomes that main program.
```sh
# In docker/stage2-hook.sh
_is_tui_invocation() {
for arg in "$@"; do
case "$arg" in --tui|-T) return 0 ;; esac
done
case "${HERMES_TUI:-}" in 1|true|TRUE|yes) return 0 ;; esac
if [ -t 0 ] && [ $# -eq 0 ]; then return 0; fi
return 1
}
```
And in `docker/s6-rc.d/main-hermes/run`:
```sh
if [ -f /var/run/s6/container_environment/HERMES_TUI_MODE ]; then
exec sleep infinity # s6-overlay will exec CMD as the TTY-connected main
fi
exec s6-setuidgid hermes hermes ${HERMES_ARGS:-}
```
In TUI mode main hermes is effectively unsupervised (same as the pre-s6
behavior with tini — acceptable because the user is interactively
present). Dashboard and profile gateways still get full s6 supervision via
their separate services.
The integration test `test_tty_passthrough_to_container` uses `tput cols`
and `COLUMNS=123` as the probe.
---
## Risk Register
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Phase 2 breaks a downstream user's Dockerfile that `FROM`s ours | Medium | Medium | Release notes call out ENTRYPOINT change; the test harness (`tests/docker/`) gives high confidence in behavior parity |
| TUI TTY passthrough fails on some Docker versions | Low | High | Harness includes `test_tty_passthrough_to_container` as a hard gate; fallback plan = s6-fdholder ([s6-overlay#230](https://github.com/just-containers/s6-overlay/issues/230) Solution 2) |
| s6-overlay non-root quirks (logutil-service, fix-attrs) bite us | Low | Low | Supervisor runs as root, services drop — sidesteps these issues |
| Podman rootless UID mapping confuses s6 | Medium | Low | Documented as supported, fix reactively; a Podman + Docker environment is stood up for validation |
| Test harness is flaky (docker daemon issues, timing) | Medium | Low | Generous timeouts; skip when docker unavailable; polling helpers replace fixed sleeps in `test_container_restart.py` |
| Profile gateway crash loop masks a real config error | Low | Medium | s6 `finish` script `max_restarts` cap (planned follow-up); operators see crash-looping logs in `$HERMES_HOME/logs/gateways/<profile>/` |
| Dockerfile+entrypoint drift from linter (hadolint/shellcheck) reveals latent bugs | Low | Low | CI lint jobs catch them; fix or document ignore with rationale |
| Stale `gateway.pid` from a dead container collides with an unrelated live PID in the restarted container | Low | Medium | Cont-init reconciliation removes `gateway.pid` and `processes.json` from every profile dir on boot, before any new gateway starts |
| `docker restart` silently loses per-profile gateway registrations (tmpfs scandir wiped) | High (without mitigation) | High | Cont-init reconciliation re-registers from persistent `$HERMES_HOME/profiles/` and auto-starts those last seen `running`; outcome recorded to `$HERMES_HOME/logs/container-boot.log` (size-bounded, rotates to `.1` at 256 KiB) |
| A `running` gateway that's actually broken auto-restarts into a crash loop after every container restart | Low | Medium | s6 `finish` script `max_restarts` cap (planned); follow-up: `hermes doctor` alerts when N consecutive container restarts ended in `startup_failed` |
| `_s6_running()` detection works as root but silently fails for unprivileged hermes user, making runtime-registration path inert | High (without mitigation) | High | **Caught in PR review.** Detection now probes `/proc/1/comm` (world-readable) + `/run/s6/basedir`. Docker integration tests refactored to `docker exec -u hermes` so the realistic runtime user is exercised |
| `s6-svscanctl` from hermes hits EACCES on the root-owned control FIFO | Medium | Medium | `02-reconcile-profiles` chowns `/run/service/.s6-svscan/{control,lock}` to hermes after stage1 creates them |
| Per-service `supervise/control` FIFO is root-owned by s6-supervise, blocking `s6-svc` from hermes | Known | Medium | Surfaced cleanly as `S6CommandError` (with rc + stderr) instead of raw `CalledProcessError`. Permission fix tracked as a follow-up (small SUID helper, polling chown loop in cont-init.d, or replace `s6-svc` with `down`-marker manipulation) |
---
## Decision Log
| # | Question | Decision |
|---|---|---|
| OQ1 | Gate Phase 2 behind env var? | Ship directly (Hermes is pre-1.0; users can pin the previous image) |
| OQ2 | s6 root model | Root `/init`, drop per-service via `s6-setuidgid hermes` |
| OQ3 | Dashboard opt-in mechanism | Always declared as an s6 service; `03-dashboard-toggle` cont-init script writes a `down` marker when `HERMES_DASHBOARD` is unset so `s6-svstat` reports the slot's real state |
| OQ4 | Podman rootless | Supported, fix reactively |
| OQ5 | Service naming | `gateway-<profile>` (matches pre-existing `hermes-gateway-<profile>.service` systemd convention) |
| OQ6 | — (retired; no subagent gateways in scope) | — |
| OQ7 | Resource limits per profile gateway | Defer (no per-cgroup limits; rely on the container's overall limit) |
| OQ8 | Log persistence | `$HERMES_HOME/logs/gateways/<profile>/`. The log path is sourced from runtime `$HERMES_HOME` via `with-contenv`, NOT Python-substituted at registration time |
| OQ9 | TUI passthrough | Trust the documented [s6-overlay#230](https://github.com/just-containers/s6-overlay/issues/230) Solution 1; harness includes a TTY passthrough hard-gate test |
**Post-merge additions from PR #30136 review:**
- **Multi-arch tarballs:** `TARGETARCH` mapped to `x86_64` / `aarch64`;
per-arch tarball fetched via `curl` because `ADD` doesn't honor BuildKit
args.
- **SHA256 verification:** all three tarballs (noarch, symlinks, per-arch)
pinned via build ARGs and verified with `sha256sum -c` against a single
checksum file (avoids hadolint DL4006 piped-shell warning).
- **`gateway-default` slot:** always registered by the reconciler so
`hermes gateway start` (no `-p`) has somewhere to land.
- **Friendly lifecycle errors:** `GatewayNotRegisteredError` and
`S6CommandError` translate `CalledProcessError` into actionable CLI
messages.
- **Atomic publication in the reconciler:** mirrors
`register_profile_gateway`'s tmp+rename pattern.
- **`container-boot.log` rotation:** 256 KiB soft cap, rotated to `.1`.
- **`port` parameter retired:** allocator + kwarg were dead code through
the entire stack; `config.yaml` is the single source of truth.
---
## Verification Checklist
- [x] Test harness (`tests/docker/`) passes against the s6 image
- [x] hadolint + shellcheck run green in CI
- [x] `docker run -it --rm hermes-agent --tui` starts the Ink TUI with
working keyboard input, cursor control, and resize (SIGWINCH)
- [x] Dashboard crashes are recovered by s6 within ~2s
- [x] `hermes profile create test` inside a container creates
`/run/service/gateway-test/`
- [x] `hermes -p test gateway start` inside a container dispatches through s6
- [x] `hermes -p test gateway stop` inside a container cleanly stops via s6
- [x] `hermes profile delete test` inside a container removes
`/run/service/gateway-test/`
- [x] Profile gateway logs persist at
`$HERMES_HOME/logs/gateways/test/current`
- [x] `hermes status` inside the container shows `Manager: s6`
- [x] `hermes gateway start` (no `-p`) inside a container targets
`gateway-default` and runs against the root profile
- [x] `hermes gateway stop --all` / `... restart --all` iterate every
profile gateway under s6 instead of pkill-then-supervise-restart
- [x] `docker restart` survives per-profile gateway registrations via the
cont-init reconciler; running gateways come back up, stopped ones
stay down
- [x] Multi-arch image builds for both `linux/amd64` and `linux/arm64`
- [x] s6-overlay tarballs are SHA256-verified at build time
- [x] No systemd/launchd host-side functions were modified (only wrapped)
- [x] `hermes gateway install/start/stop` on Linux host and macOS host
behave identically to pre-change
+103 -106
View File
@@ -424,7 +424,9 @@ _PLATFORM_CONNECTED_CHECKERS: dict[Platform, Callable[[PlatformConfig], bool]] =
Platform.SMS: lambda cfg: bool(os.getenv("TWILIO_ACCOUNT_SID")),
Platform.API_SERVER: lambda cfg: True,
Platform.WEBHOOK: lambda cfg: True,
Platform.MSGRAPH_WEBHOOK: lambda cfg: True,
Platform.MSGRAPH_WEBHOOK: lambda cfg: bool(
str(cfg.extra.get("client_state") or "").strip()
),
Platform.FEISHU: lambda cfg: bool(cfg.extra.get("app_id")),
Platform.WECOM: lambda cfg: bool(cfg.extra.get("bot_id")),
Platform.WECOM_CALLBACK: lambda cfg: bool(
@@ -926,73 +928,6 @@ def load_gateway_config() -> GatewayConfig:
ac = ",".join(str(v) for v in ac)
os.environ["SLACK_ALLOWED_CHANNELS"] = str(ac)
# Discord settings → env vars (env vars take precedence)
discord_cfg = yaml_cfg.get("discord", {})
if isinstance(discord_cfg, dict):
if "require_mention" in discord_cfg and not os.getenv("DISCORD_REQUIRE_MENTION"):
os.environ["DISCORD_REQUIRE_MENTION"] = str(discord_cfg["require_mention"]).lower()
if "thread_require_mention" in discord_cfg and not os.getenv("DISCORD_THREAD_REQUIRE_MENTION"):
os.environ["DISCORD_THREAD_REQUIRE_MENTION"] = str(discord_cfg["thread_require_mention"]).lower()
frc = discord_cfg.get("free_response_channels")
if frc is not None and not os.getenv("DISCORD_FREE_RESPONSE_CHANNELS"):
if isinstance(frc, list):
frc = ",".join(str(v) for v in frc)
os.environ["DISCORD_FREE_RESPONSE_CHANNELS"] = str(frc)
if "auto_thread" in discord_cfg and not os.getenv("DISCORD_AUTO_THREAD"):
os.environ["DISCORD_AUTO_THREAD"] = str(discord_cfg["auto_thread"]).lower()
if "reactions" in discord_cfg and not os.getenv("DISCORD_REACTIONS"):
os.environ["DISCORD_REACTIONS"] = str(discord_cfg["reactions"]).lower()
# ignored_channels: channels where bot never responds (even when mentioned)
ic = discord_cfg.get("ignored_channels")
if ic is not None and not os.getenv("DISCORD_IGNORED_CHANNELS"):
if isinstance(ic, list):
ic = ",".join(str(v) for v in ic)
os.environ["DISCORD_IGNORED_CHANNELS"] = str(ic)
# allowed_channels: if set, bot ONLY responds in these channels (whitelist)
ac = discord_cfg.get("allowed_channels")
if ac is not None and not os.getenv("DISCORD_ALLOWED_CHANNELS"):
if isinstance(ac, list):
ac = ",".join(str(v) for v in ac)
os.environ["DISCORD_ALLOWED_CHANNELS"] = str(ac)
# no_thread_channels: channels where bot responds directly without creating thread
ntc = discord_cfg.get("no_thread_channels")
if ntc is not None and not os.getenv("DISCORD_NO_THREAD_CHANNELS"):
if isinstance(ntc, list):
ntc = ",".join(str(v) for v in ntc)
os.environ["DISCORD_NO_THREAD_CHANNELS"] = str(ntc)
# history_backfill: recover missed channel messages for shared sessions
# when require_mention is active. Fetches messages between bot turns
# and prepends them to the user message for context.
if "history_backfill" in discord_cfg and not os.getenv("DISCORD_HISTORY_BACKFILL"):
os.environ["DISCORD_HISTORY_BACKFILL"] = str(discord_cfg["history_backfill"]).lower()
hbl = discord_cfg.get("history_backfill_limit")
if hbl is not None and not os.getenv("DISCORD_HISTORY_BACKFILL_LIMIT"):
os.environ["DISCORD_HISTORY_BACKFILL_LIMIT"] = str(hbl)
# allow_mentions: granular control over what the bot can ping.
# Safe defaults (no @everyone/roles) are applied in the adapter;
# these YAML keys only override when set and let users opt back
# into unsafe modes (e.g. roles=true) if they actually want it.
allow_mentions_cfg = discord_cfg.get("allow_mentions")
if isinstance(allow_mentions_cfg, dict):
for yaml_key, env_key in (
("everyone", "DISCORD_ALLOW_MENTION_EVERYONE"),
("roles", "DISCORD_ALLOW_MENTION_ROLES"),
("users", "DISCORD_ALLOW_MENTION_USERS"),
("replied_user", "DISCORD_ALLOW_MENTION_REPLIED_USER"),
):
if yaml_key in allow_mentions_cfg and not os.getenv(env_key):
os.environ[env_key] = str(allow_mentions_cfg[yaml_key]).lower()
# reply_to_mode: top-level preferred, falls back to extra.reply_to_mode
# YAML 1.1 parses bare 'off' as boolean False — coerce to string "off".
_discord_extra = discord_cfg.get("extra") if isinstance(discord_cfg.get("extra"), dict) else {}
_discord_rtm = (
discord_cfg["reply_to_mode"] if "reply_to_mode" in discord_cfg
else _discord_extra.get("reply_to_mode")
)
if _discord_rtm is not None and not os.getenv("DISCORD_REPLY_TO_MODE"):
_rtm_str = "off" if _discord_rtm is False else str(_discord_rtm).lower()
os.environ["DISCORD_REPLY_TO_MODE"] = _rtm_str
# Bridge top-level require_mention to Telegram when the telegram: section
# does not already provide one. Users often write "require_mention: true"
# at the top level alongside group_sessions_per_user, expecting it to work
@@ -1154,22 +1089,8 @@ def load_gateway_config() -> GatewayConfig:
allowed = ",".join(str(v) for v in allowed)
os.environ["DINGTALK_ALLOWED_USERS"] = str(allowed)
# Mattermost settings → env vars (env vars take precedence)
mattermost_cfg = yaml_cfg.get("mattermost", {})
if isinstance(mattermost_cfg, dict):
if "require_mention" in mattermost_cfg and not os.getenv("MATTERMOST_REQUIRE_MENTION"):
os.environ["MATTERMOST_REQUIRE_MENTION"] = str(mattermost_cfg["require_mention"]).lower()
frc = mattermost_cfg.get("free_response_channels")
if frc is not None and not os.getenv("MATTERMOST_FREE_RESPONSE_CHANNELS"):
if isinstance(frc, list):
frc = ",".join(str(v) for v in frc)
os.environ["MATTERMOST_FREE_RESPONSE_CHANNELS"] = str(frc)
# allowed_channels: if set, bot ONLY responds in these channels (whitelist)
ac = mattermost_cfg.get("allowed_channels")
if ac is not None and not os.getenv("MATTERMOST_ALLOWED_CHANNELS"):
if isinstance(ac, list):
ac = ",".join(str(v) for v in ac)
os.environ["MATTERMOST_ALLOWED_CHANNELS"] = str(ac)
# Mattermost config bridge moved into plugins/platforms/mattermost/
# adapter.py::_apply_yaml_config — see #25443 (apply_yaml_config_fn).
# Matrix settings → env vars (env vars take precedence)
matrix_cfg = yaml_cfg.get("matrix", {})
@@ -1878,6 +1799,17 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
# need to seed ``PlatformConfig.extra`` from env vars (e.g. Google Chat's
# project_id / subscription_name) can supply ``env_enablement_fn`` on
# their PlatformEntry — called here BEFORE adapter construction.
#
# Enablement gate (#31116): when a plugin registers ``is_connected``
# (the "has the user actually configured credentials for this?" check),
# we MUST consult it before flipping ``enabled = True``. Otherwise
# ``check_fn`` alone — which for adapter plugins typically just
# verifies the SDK is importable / lazy-installs it — silently enables
# platforms the user never opted into, and the gateway then tries to
# connect to Discord / Teams / Google Chat with no token and emits
# noisy retry-forever errors. ``_platform_status`` was already fixed
# for the same bug class in commit 7849a3d73; this is the runtime
# counterpart.
try:
from hermes_cli.plugins import discover_plugins
discover_plugins() # idempotent
@@ -1890,34 +1822,99 @@ def _apply_env_overrides(config: GatewayConfig) -> None:
logger.debug("check_fn for %s raised: %s", entry.name, e)
continue
platform = Platform(entry.name)
if platform not in config.platforms:
config.platforms[platform] = PlatformConfig()
config.platforms[platform].enabled = True
# Seed extras from env if the plugin opted in.
existing_cfg = config.platforms.get(platform)
# Seed candidate extras from ``env_enablement_fn`` so plugins
# whose ``is_connected`` reads ``config.extra`` (e.g. Google
# Chat's ``_is_connected`` checks ``config.extra["project_id"]``)
# see the same state they will after enablement. Without this,
# Google-Chat-on-env-vars-only setups silently fail the gate
# below even though the user is configured. Plugins whose
# ``is_connected`` reads env vars directly (Discord, IRC,
# Teams, LINE, ntfy, Simplex) are unaffected; this only
# restores Google Chat.
seed_for_probe = None
if entry.env_enablement_fn is not None:
try:
seed = entry.env_enablement_fn()
seed_for_probe = entry.env_enablement_fn()
except Exception as e:
logger.debug(
"env_enablement_fn for %s raised: %s", entry.name, e
)
seed = None
if isinstance(seed, dict) and seed:
# Extract the home_channel dict (if provided) so we wire it
# up as a proper HomeChannel dataclass. Everything else is
# merged into ``extra``.
home = seed.pop("home_channel", None)
config.platforms[platform].extra.update(seed)
if isinstance(home, dict) and home.get("chat_id"):
config.platforms[platform].home_channel = HomeChannel(
platform=platform,
chat_id=str(home["chat_id"]),
name=str(home.get("name") or "Home"),
thread_id=(
str(home["thread_id"])
if home.get("thread_id")
else None
),
seed_for_probe = None
# Only consult is_connected for platforms that are NOT already
# explicitly configured in YAML / env (existing_cfg with
# enabled=True means the user wrote it themselves or another
# env-var bridge enabled it — keep that decision).
if existing_cfg is None or not existing_cfg.enabled:
if entry.is_connected is not None:
try:
# Probe with ``enabled=True`` since we're asking
# "would this plugin BE configured if we enabled
# it?" not "is it currently enabled?". Google
# Chat's ``_is_connected`` short-circuits on
# ``config.enabled`` being False, which on the
# default ``PlatformConfig()`` would fail the
# gate even with proper env vars set.
if existing_cfg is not None:
probe_cfg = existing_cfg
if not probe_cfg.enabled:
probe_cfg = PlatformConfig(
enabled=True,
extra=dict(probe_cfg.extra or {}),
)
else:
probe_cfg = PlatformConfig(enabled=True)
if isinstance(seed_for_probe, dict) and seed_for_probe:
# Don't mutate ``existing_cfg``; the probe gets
# a transient view with env-seeded extras layered
# on top of whatever's already there.
probe_extra = dict(getattr(probe_cfg, "extra", {}) or {})
for k, v in seed_for_probe.items():
if k == "home_channel":
continue
probe_extra.setdefault(k, v)
probe_cfg = PlatformConfig(
enabled=True,
extra=probe_extra,
)
configured = bool(entry.is_connected(probe_cfg))
except Exception as exc:
logger.debug(
"is_connected for %s raised: %s — skipping enablement",
entry.name, exc,
)
configured = False
if not configured:
logger.debug(
"Plugin platform '%s' available but not configured "
"(is_connected returned False) — skipping enable",
entry.name,
)
continue
if platform not in config.platforms:
config.platforms[platform] = PlatformConfig()
config.platforms[platform].enabled = True
# Commit env-seeded extras onto the now-enabled platform.
# We've already called ``env_enablement_fn`` above (for the
# probe); reuse that result instead of calling it twice.
if isinstance(seed_for_probe, dict) and seed_for_probe:
seed = dict(seed_for_probe)
# Extract the home_channel dict (if provided) so we wire it
# up as a proper HomeChannel dataclass. Everything else is
# merged into ``extra``.
home = seed.pop("home_channel", None)
config.platforms[platform].extra.update(seed)
if isinstance(home, dict) and home.get("chat_id"):
config.platforms[platform].home_channel = HomeChannel(
platform=platform,
chat_id=str(home["chat_id"]),
name=str(home.get("name") or "Home"),
thread_id=(
str(home["thread_id"])
if home.get("thread_id")
else None
),
)
except Exception as e:
logger.debug("Plugin platform enable pass failed: %s", e)
+84 -30
View File
@@ -28,6 +28,10 @@ import time
from pathlib import Path
from typing import Optional
from gateway.whatsapp_identity import (
expand_whatsapp_aliases,
normalize_whatsapp_identifier,
)
from hermes_constants import get_hermes_dir
from utils import atomic_replace
@@ -110,12 +114,40 @@ class PairingStore:
def _save_json(self, path: Path, data: dict) -> None:
_secure_write(path, json.dumps(data, indent=2, ensure_ascii=False))
def _normalize_user_id(self, platform: str, user_id: str) -> str:
"""Normalize platform-specific user IDs before persisting them."""
raw_user_id = str(user_id or "").strip()
if platform == "whatsapp":
return normalize_whatsapp_identifier(raw_user_id) or raw_user_id
return raw_user_id
def _user_id_aliases(self, platform: str, user_id: str) -> set[str]:
"""Return all known equivalent user IDs for auth/rate-limit checks."""
raw_user_id = str(user_id or "").strip()
if not raw_user_id:
return set()
aliases = {raw_user_id, self._normalize_user_id(platform, raw_user_id)}
if platform == "whatsapp":
aliases.update(expand_whatsapp_aliases(raw_user_id))
aliases.discard("")
return aliases
def _user_ids_match(self, platform: str, left: str, right: str) -> bool:
"""Return True when two user IDs represent the same principal."""
left_aliases = self._user_id_aliases(platform, left)
right_aliases = self._user_id_aliases(platform, right)
return bool(left_aliases and right_aliases and (left_aliases & right_aliases))
# ----- Approved users -----
def is_approved(self, platform: str, user_id: str) -> bool:
"""Check if a user is approved (paired) on a platform."""
approved = self._load_json(self._approved_path(platform))
return user_id in approved
for approved_user_id in approved:
if self._user_ids_match(platform, approved_user_id, user_id):
return True
return False
def list_approved(self, platform: str = None) -> list:
"""List approved users, optionally filtered by platform."""
@@ -130,7 +162,16 @@ class PairingStore:
def _approve_user(self, platform: str, user_id: str, user_name: str = "") -> None:
"""Add a user to the approved list. Must be called under self._lock."""
approved = self._load_json(self._approved_path(platform))
approved[user_id] = {
normalized_user_id = self._normalize_user_id(platform, user_id)
duplicate_ids = [
approved_user_id
for approved_user_id in approved
if self._user_ids_match(platform, approved_user_id, normalized_user_id)
]
for approved_user_id in duplicate_ids:
del approved[approved_user_id]
approved[normalized_user_id] = {
"user_name": user_name,
"approved_at": time.time(),
}
@@ -141,8 +182,14 @@ class PairingStore:
path = self._approved_path(platform)
with self._lock:
approved = self._load_json(path)
if user_id in approved:
del approved[user_id]
matching_ids = [
approved_user_id
for approved_user_id in approved
if self._user_ids_match(platform, approved_user_id, user_id)
]
if matching_ids:
for approved_user_id in matching_ids:
del approved[approved_user_id]
self._save_json(path, approved)
return True
return False
@@ -170,6 +217,7 @@ class PairingStore:
"""
with self._lock:
self._cleanup_expired(platform)
normalized_user_id = self._normalize_user_id(platform, user_id)
# Check lockout
if self._is_locked_out(platform):
@@ -198,7 +246,7 @@ class PairingStore:
pending[entry_id] = {
"hash": code_hash,
"salt": salt.hex(),
"user_id": user_id,
"user_id": normalized_user_id,
"user_name": user_name,
"created_at": time.time(),
}
@@ -287,26 +335,27 @@ class PairingStore:
can see them age out without crashing on a missing ``hash`` field.
"""
results = []
platforms = [platform] if platform else self._all_platforms("pending")
for p in platforms:
self._cleanup_expired(p)
pending = self._load_json(self._pending_path(p))
for entry_id, info in pending.items():
if not isinstance(info, dict):
continue
created_at = info.get("created_at")
if not isinstance(created_at, (int, float)):
continue
age_min = int((time.time() - created_at) / 60)
hash_val = info.get("hash")
code_display = hash_val[:8] if isinstance(hash_val, str) else "legacy"
results.append({
"platform": p,
"code": code_display,
"user_id": info.get("user_id", ""),
"user_name": info.get("user_name", ""),
"age_minutes": age_min,
})
with self._lock:
platforms = [platform] if platform else self._all_platforms("pending")
for p in platforms:
self._cleanup_expired(p)
pending = self._load_json(self._pending_path(p))
for entry_id, info in pending.items():
if not isinstance(info, dict):
continue
created_at = info.get("created_at")
if not isinstance(created_at, (int, float)):
continue
age_min = int((time.time() - created_at) / 60)
hash_val = info.get("hash")
code_display = hash_val[:8] if isinstance(hash_val, str) else "legacy"
results.append({
"platform": p,
"code": code_display,
"user_id": info.get("user_id", ""),
"user_name": info.get("user_name", ""),
"age_minutes": age_min,
})
return results
def clear_pending(self, platform: str = None) -> int:
@@ -325,15 +374,20 @@ class PairingStore:
def _is_rate_limited(self, platform: str, user_id: str) -> bool:
"""Check if a user has requested a code too recently."""
limits = self._load_json(self._rate_limit_path())
key = f"{platform}:{user_id}"
last_request = limits.get(key, 0)
return (time.time() - last_request) < RATE_LIMIT_SECONDS
for alias in self._user_id_aliases(platform, user_id):
key = f"{platform}:{alias}"
last_request = limits.get(key, 0)
if (time.time() - last_request) < RATE_LIMIT_SECONDS:
return True
return False
def _record_rate_limit(self, platform: str, user_id: str) -> None:
"""Record the time of a pairing request for rate limiting."""
limits = self._load_json(self._rate_limit_path())
key = f"{platform}:{user_id}"
limits[key] = time.time()
now = time.time()
for alias in self._user_id_aliases(platform, user_id):
key = f"{platform}:{alias}"
limits[key] = now
self._save_json(self._rate_limit_path(), limits)
def _is_locked_out(self, platform: str) -> bool:
+90
View File
@@ -35,6 +35,7 @@ import re
import sqlite3
import time
import uuid
from pathlib import Path
from typing import Any, Dict, List, Optional
try:
@@ -337,10 +338,12 @@ class ResponseStore:
db_path = str(get_hermes_home() / "response_store.db")
except Exception:
db_path = ":memory:"
self._db_path: Optional[str] = db_path if db_path != ":memory:" else None
try:
self._conn = sqlite3.connect(db_path, check_same_thread=False)
except Exception:
self._conn = sqlite3.connect(":memory:", check_same_thread=False)
self._db_path = None
# Use shared WAL-fallback helper so response_store.db degrades
# gracefully on NFS/SMB/FUSE-mounted HERMES_HOME (same filesystem
# issue addressed for state.db/kanban.db — see
@@ -361,6 +364,31 @@ class ResponseStore:
)"""
)
self._conn.commit()
# response_store.db contains conversation history (tool payloads,
# prompts, results). Tighten to owner-only after creation so other
# local users on a shared box can't read it. Run once at __init__
# rather than after every commit — chmod-on-every-write is wasted
# syscalls on a hot path.
self._tighten_file_permissions()
def _tighten_file_permissions(self) -> None:
"""Force owner-only permissions on the DB and SQLite sidecars."""
if not self._db_path:
return
for candidate in (
Path(self._db_path),
Path(f"{self._db_path}-wal"),
Path(f"{self._db_path}-shm"),
):
try:
if candidate.exists():
candidate.chmod(0o600)
except OSError:
logger.debug(
"Failed to restrict response store permissions for %s",
candidate,
exc_info=True,
)
def get(self, response_id: str) -> Optional[Dict[str, Any]]:
"""Retrieve a stored response by ID (updates access time for LRU)."""
@@ -735,6 +763,58 @@ class APIServerAdapter(BasePlatformAdapter):
return "*" in self._cors_origins or origin in self._cors_origins
@staticmethod
def _clean_log_value(value: Any, *, max_len: int = 200) -> str:
"""Sanitize request metadata before it reaches security logs."""
if value is None:
return ""
text = str(value).replace("\r", " ").replace("\n", " ").strip()
return text[:max_len]
def _request_audit_context(self, request: "web.Request") -> Dict[str, str]:
"""Return non-secret source metadata for security/audit warnings."""
peer_ip = ""
try:
peer = request.transport.get_extra_info("peername") if request.transport else None
if isinstance(peer, (tuple, list)) and peer:
peer_ip = str(peer[0])
except Exception:
peer_ip = ""
return {
"remote": self._clean_log_value(getattr(request, "remote", "") or peer_ip),
"peer_ip": self._clean_log_value(peer_ip),
"forwarded_for": self._clean_log_value(request.headers.get("X-Forwarded-For", "")),
"real_ip": self._clean_log_value(request.headers.get("X-Real-IP", "")),
"method": self._clean_log_value(request.method, max_len=16),
"path": self._clean_log_value(request.path_qs, max_len=500),
"user_agent": self._clean_log_value(request.headers.get("User-Agent", ""), max_len=300),
}
def _request_audit_log_suffix(self, request: "web.Request") -> str:
ctx = self._request_audit_context(request)
fields = [f"{key}={value!r}" for key, value in ctx.items() if value]
return " ".join(fields) if fields else "source='unknown'"
def _cron_origin_from_request(self, request: "web.Request") -> Dict[str, str]:
"""Persist safe API source metadata on cron jobs created over HTTP."""
ctx = self._request_audit_context(request)
origin = {
"platform": "api_server",
"chat_id": "api",
}
if ctx.get("remote"):
origin["source_ip"] = ctx["remote"]
if ctx.get("peer_ip"):
origin["peer_ip"] = ctx["peer_ip"]
if ctx.get("forwarded_for"):
origin["forwarded_for"] = ctx["forwarded_for"]
if ctx.get("real_ip"):
origin["real_ip"] = ctx["real_ip"]
if ctx.get("user_agent"):
origin["user_agent"] = ctx["user_agent"]
return origin
# ------------------------------------------------------------------
# Auth helper
# ------------------------------------------------------------------
@@ -756,6 +836,10 @@ class APIServerAdapter(BasePlatformAdapter):
if hmac.compare_digest(token, self._api_key):
return None # Auth OK
logger.warning(
"API server rejected invalid API key: %s",
self._request_audit_log_suffix(request),
)
return web.json_response(
{"error": {"message": "Invalid API key", "type": "invalid_request_error", "code": "invalid_api_key"}},
status=401,
@@ -2426,6 +2510,11 @@ class APIServerAdapter(BasePlatformAdapter):
"""Validate and extract job_id. Returns (job_id, error_response)."""
job_id = request.match_info["job_id"]
if not self._JOB_ID_RE.fullmatch(job_id):
logger.warning(
"Cron jobs API rejected invalid job_id %r: %s",
job_id,
self._request_audit_log_suffix(request),
)
return job_id, web.json_response(
{"error": "Invalid job ID format"}, status=400,
)
@@ -2483,6 +2572,7 @@ class APIServerAdapter(BasePlatformAdapter):
"schedule": schedule,
"name": name,
"deliver": deliver,
"origin": self._cron_origin_from_request(request),
}
if skills:
kwargs["skills"] = skills
+455 -26
View File
@@ -15,6 +15,7 @@ import re
import socket as _socket
import subprocess
import sys
import time
import uuid
from abc import ABC, abstractmethod
from urllib.parse import urlsplit
@@ -40,6 +41,16 @@ def _platform_name(platform) -> str:
return str(value or "").lower()
def _float_env(name: str, default: float) -> float:
raw = os.environ.get(name, "").strip()
if not raw:
return default
try:
return float(raw)
except (TypeError, ValueError):
return default
def _thread_metadata_for_source(source, reply_to_message_id: str | None = None) -> dict | None:
"""Build platform-aware thread metadata for adapter sends.
@@ -472,7 +483,7 @@ sys.path.insert(0, str(_Path(__file__).resolve().parents[2]))
from gateway.config import Platform, PlatformConfig
from gateway.session import SessionSource, build_session_key
from hermes_constants import get_hermes_dir
from hermes_constants import get_hermes_dir, get_hermes_home
GATEWAY_SECRET_CAPTURE_UNSUPPORTED_MESSAGE = (
@@ -813,6 +824,201 @@ def cache_video_from_bytes(data: bytes, ext: str = ".mp4") -> str:
# ---------------------------------------------------------------------------
DOCUMENT_CACHE_DIR = get_hermes_dir("cache/documents", "document_cache")
SCREENSHOT_CACHE_DIR = get_hermes_dir("cache/screenshots", "browser_screenshots")
_HERMES_HOME = get_hermes_home()
MEDIA_DELIVERY_ALLOW_DIRS_ENV = "HERMES_MEDIA_ALLOW_DIRS"
MEDIA_DELIVERY_TRUST_RECENT_ENV = "HERMES_MEDIA_TRUST_RECENT_FILES"
MEDIA_DELIVERY_TRUST_RECENT_SECONDS_ENV = "HERMES_MEDIA_TRUST_RECENT_SECONDS"
MEDIA_DELIVERY_SAFE_ROOTS = (
IMAGE_CACHE_DIR,
AUDIO_CACHE_DIR,
VIDEO_CACHE_DIR,
DOCUMENT_CACHE_DIR,
SCREENSHOT_CACHE_DIR,
_HERMES_HOME / "image_cache",
_HERMES_HOME / "audio_cache",
_HERMES_HOME / "video_cache",
_HERMES_HOME / "document_cache",
_HERMES_HOME / "browser_screenshots",
)
# Default recency window for trusting freshly-produced files (seconds).
# The agent's actual work generally completes well inside 10 minutes; legitimate
# build artifacts (PDFs from pandoc, plots from matplotlib, etc.) almost always
# land seconds before delivery. Old system files (/etc/passwd, ~/.ssh/id_rsa,
# stray credentials) have mtimes measured in days or months — well outside this
# window — so prompt-injection paths pointing at pre-existing host files are
# still rejected.
_MEDIA_DELIVERY_TRUST_RECENT_DEFAULT_SECONDS = 600
# Hard denylist applied even when a path would otherwise pass recency trust.
# These prefixes hold credentials, system state, or process introspection that
# should never be uploaded as a gateway attachment, regardless of how new the
# file looks. The cache-dir allowlist still beats this — an operator-configured
# allowed root can intentionally live under one of these prefixes (rare, but
# their choice).
_MEDIA_DELIVERY_DENIED_PREFIXES = (
"/etc",
"/proc",
"/sys",
"/dev",
"/root",
"/boot",
"/var/log",
"/var/lib",
"/var/run",
)
# Within $HOME we additionally deny common credential / config directories.
# Resolved at check time against the live $HOME so containers and alt-home
# setups work correctly.
_MEDIA_DELIVERY_DENIED_HOME_SUBPATHS = (
".ssh",
".aws",
".gnupg",
".kube",
".docker",
".config",
".azure",
".gcloud",
"Library/Keychains", # macOS
)
def _media_delivery_allowed_roots() -> List[Path]:
"""Return roots from which model-emitted local media may be delivered."""
roots = [Path(root) for root in MEDIA_DELIVERY_SAFE_ROOTS]
extra_roots = os.environ.get(MEDIA_DELIVERY_ALLOW_DIRS_ENV, "")
for chunk in extra_roots.split(os.pathsep):
for raw_root in chunk.split(","):
raw_root = raw_root.strip()
if not raw_root:
continue
root = Path(os.path.expanduser(raw_root))
if root.is_absolute():
roots.append(root)
return roots
def _media_delivery_recency_seconds() -> float:
"""Return the recency window for trusting freshly-produced files.
0 disables recency-based trust entirely (pure-allowlist mode).
"""
raw = os.environ.get(MEDIA_DELIVERY_TRUST_RECENT_ENV, "1").strip().lower()
if raw in ("0", "false", "no", "off", ""):
return 0.0
try:
custom = os.environ.get(MEDIA_DELIVERY_TRUST_RECENT_SECONDS_ENV, "").strip()
if custom:
seconds = float(custom)
return max(0.0, seconds)
except (TypeError, ValueError):
pass
return float(_MEDIA_DELIVERY_TRUST_RECENT_DEFAULT_SECONDS)
def _media_delivery_denied_paths() -> List[Path]:
"""Return absolute denylist paths under which delivery is never allowed."""
denied = [Path(p) for p in _MEDIA_DELIVERY_DENIED_PREFIXES]
home = Path(os.path.expanduser("~"))
for sub in _MEDIA_DELIVERY_DENIED_HOME_SUBPATHS:
denied.append(home / sub)
# The Hermes home itself contains credentials (auth.json, .env) — only the
# cache subdirectories under it are explicitly allowlisted above.
denied.append(_HERMES_HOME / ".env")
denied.append(_HERMES_HOME / "auth.json")
denied.append(_HERMES_HOME / "credentials")
return denied
def _path_under_denied_prefix(resolved: Path) -> bool:
"""Return True if ``resolved`` lives under a deny-listed system path."""
for denied in _media_delivery_denied_paths():
try:
resolved_denied = denied.expanduser().resolve(strict=False)
except (OSError, RuntimeError, ValueError):
continue
if _path_is_within(resolved, resolved_denied) or resolved == resolved_denied:
return True
return False
def _file_is_recently_produced(resolved: Path, window_seconds: float) -> bool:
"""Return True if the file's mtime is within ``window_seconds`` of now.
Used as a session-scoped trust signal: agents almost always produce
delivery artifacts within seconds of asking to send them, while
prompt-injection paths pointing at pre-existing host files (/etc/passwd,
~/.ssh/id_rsa) have mtimes measured in days or months.
"""
if window_seconds <= 0:
return False
try:
mtime = resolved.stat().st_mtime
except OSError:
return False
return (time.time() - mtime) <= window_seconds
def _path_is_within(path: Path, root: Path) -> bool:
try:
path.relative_to(root)
return True
except ValueError:
return False
def validate_media_delivery_path(path: str) -> Optional[str]:
"""Return a safe absolute file path for native media delivery, else None.
MEDIA tags and bare local paths in model output are untrusted text. Only
existing regular files under Hermes-managed media caches, or roots the
operator explicitly allowlists, may be uploaded as native attachments.
Symlinks are resolved before the containment check.
"""
if not path:
return None
candidate = str(path).strip()
if len(candidate) >= 2 and candidate[0] == candidate[-1] and candidate[0] in "`\"'":
candidate = candidate[1:-1].strip()
candidate = candidate.lstrip("`\"'").rstrip("`\"',.;:)}]")
if not candidate:
return None
expanded = Path(os.path.expanduser(candidate))
if not expanded.is_absolute():
return None
try:
resolved = expanded.resolve(strict=True)
except (OSError, RuntimeError, ValueError):
return None
if not resolved.is_file():
return None
for root in _media_delivery_allowed_roots():
try:
resolved_root = root.expanduser().resolve(strict=False)
except (OSError, RuntimeError, ValueError):
continue
if _path_is_within(resolved, resolved_root):
return str(resolved)
# Outside the cache/operator allowlist: fall back to recency-based trust
# for files the agent has just produced (e.g. ``pandoc -o /tmp/report.pdf``
# or ``write_file("/home/user/report.pdf", ...)``). System paths and
# credential locations remain blocked even when "recent" — see
# ``_MEDIA_DELIVERY_DENIED_PREFIXES`` for the denylist.
window = _media_delivery_recency_seconds()
if window > 0 and not _path_under_denied_prefix(resolved):
if _file_is_recently_produced(resolved, window):
return str(resolved)
return None
SUPPORTED_DOCUMENT_TYPES = {
".pdf": "application/pdf",
@@ -1023,6 +1229,14 @@ class MessageEvent:
return args
@dataclass
class TextDebounceState:
event: MessageEvent
task: asyncio.Task | None
first_ts: float
last_ts: float
_PLAINTEXT_GATEWAY_RESTART_PATTERNS: tuple[re.Pattern[str], ...] = (
re.compile(r"^(?:please\s+)?restart\s+(?:the\s+)?gateway[.!?\s]*$", re.IGNORECASE),
re.compile(r"^(?:please\s+)?restart\s+(?:the\s+)?hermes\s+gateway[.!?\s]*$", re.IGNORECASE),
@@ -1318,6 +1532,17 @@ class BasePlatformAdapter(ABC):
self._active_sessions: Dict[str, asyncio.Event] = {}
self._pending_messages: Dict[str, MessageEvent] = {}
self._session_tasks: Dict[str, asyncio.Task] = {}
self._busy_text_mode: str = (
os.environ.get("HERMES_GATEWAY_BUSY_TEXT_MODE", "queue").strip().lower()
or "queue"
)
self._busy_text_debounce_seconds: float = _float_env(
"HERMES_GATEWAY_BUSY_TEXT_DEBOUNCE_SECONDS", 0.35
)
self._busy_text_hard_cap_seconds: float = _float_env(
"HERMES_GATEWAY_BUSY_TEXT_HARD_CAP_SECONDS", 1.0
)
self._text_debounce: dict[str, TextDebounceState] = {}
# Background message-processing tasks spawned by handle_message().
# Gateway shutdown cancels these so an old gateway instance doesn't keep
# working on a task after --replace or manual restarts.
@@ -2119,6 +2344,35 @@ class BasePlatformAdapter(ABC):
text = f"{caption}\n{text}"
return await self.send(chat_id=chat_id, content=text, reply_to=reply_to, metadata=metadata)
@staticmethod
def validate_media_delivery_path(path: str) -> Optional[str]:
"""Return a resolved path if it is safe for native attachment upload."""
return validate_media_delivery_path(path)
@staticmethod
def filter_media_delivery_paths(media_files) -> List[Tuple[str, bool]]:
"""Drop unsafe MEDIA paths and normalize accepted paths."""
safe_media: List[Tuple[str, bool]] = []
for media_path, is_voice in media_files or []:
safe_path = validate_media_delivery_path(str(media_path))
if safe_path:
safe_media.append((safe_path, bool(is_voice)))
else:
logger.warning("Skipping unsafe MEDIA directive path outside allowed roots")
return safe_media
@staticmethod
def filter_local_delivery_paths(file_paths) -> List[str]:
"""Drop unsafe bare local file paths and normalize accepted paths."""
safe_paths: List[str] = []
for file_path in file_paths or []:
safe_path = validate_media_delivery_path(str(file_path))
if safe_path:
safe_paths.append(safe_path)
else:
logger.warning("Skipping unsafe local file path outside allowed roots")
return safe_paths
@staticmethod
def extract_media(content: str) -> Tuple[List[Tuple[str, bool]], str]:
"""
@@ -2616,6 +2870,161 @@ class BasePlatformAdapter(ABC):
return f"{existing_text}\n\n{new_text}".strip()
return existing_text
def _text_debounce_store(self) -> dict[str, TextDebounceState]:
store = getattr(self, "_text_debounce", None)
if store is None:
store = {}
self._text_debounce = store
return store
def _is_queue_text_debounce_candidate(self, event: MessageEvent) -> bool:
"""Return True for normal text eligible for queue-mode debounce."""
result = (
getattr(self, "_busy_text_mode", "queue") == "queue"
and event.message_type == MessageType.TEXT
and not getattr(event, "internal", False)
and not event.is_command()
and bool((event.text or "").strip())
)
if result:
logger.debug(
"[%s] Queue-text debounce candidate accepted: session=%s text_len=%d",
self.name,
getattr(event, "session_key", "?"),
len(event.text or ""),
)
return result
def _can_merge_text_debounce_events(self, existing: MessageEvent, event: MessageEvent) -> bool:
"""Return True when two text debounce events came from the same sender."""
def _identity(candidate: MessageEvent) -> tuple[str, ...] | None:
source = getattr(candidate, "source", None)
if source is None:
return None
platform = _platform_name(getattr(source, "platform", None))
sender = getattr(source, "user_id_alt", None) or getattr(source, "user_id", None)
if sender:
return (platform, str(sender))
if getattr(source, "chat_type", None) in {"dm", "private"} and getattr(source, "chat_id", None):
return (platform, "dm", str(source.chat_id))
return None
existing_sender = _identity(existing)
incoming_sender = _identity(event)
return existing_sender is not None and existing_sender == incoming_sender
def _text_debounce_delay(self, session_key: str) -> float:
"""Return bounded busy-text debounce delay for ``session_key``."""
state = self._text_debounce_store().get(session_key)
if state is None:
return 0.0
now = time.monotonic()
window_deadline = state.last_ts + self._busy_text_debounce_seconds
hard_cap_deadline = state.first_ts + self._busy_text_hard_cap_seconds
return max(0.0, min(window_deadline, hard_cap_deadline) - now)
async def _queue_text_debounce(self, session_key: str, event: MessageEvent) -> None:
"""Buffer normal queue-mode busy text and schedule a bounded flush."""
store = self._text_debounce_store()
state = store.get(session_key)
if state is not None and not self._can_merge_text_debounce_events(state.event, event):
# Preserve sender attribution in shared sessions. The current
# buffer becomes the next pending turn; the new sender starts a
# fresh debounce burst when the pending slot allows it.
await self._flush_text_debounce_now(session_key)
state = store.get(session_key)
if state is not None and not self._can_merge_text_debounce_events(state.event, event):
existing_pending = self._pending_messages.get(session_key)
if existing_pending is not None and self._can_merge_text_debounce_events(existing_pending, event):
merge_pending_message_event(
self._pending_messages,
session_key,
event,
merge_text=True,
)
return
now = time.monotonic()
if state is None:
state = TextDebounceState(
event=event,
task=None,
first_ts=now,
last_ts=now,
)
store[session_key] = state
else:
if event.text:
state.event.text = (
f"{state.event.text}\n{event.text}"
if state.event.text
else event.text
)
latest_message_id = getattr(event, "message_id", None)
latest_anchor = latest_message_id or getattr(event, "reply_to_message_id", None)
if latest_message_id is not None:
state.event.message_id = str(latest_message_id)
if latest_anchor is not None and hasattr(state.event, "reply_to_message_id"):
state.event.reply_to_message_id = str(latest_anchor)
state.last_ts = now
if state.task is not None and not state.task.done():
state.task.cancel()
delay = self._text_debounce_delay(session_key)
state.task = asyncio.create_task(self._flush_text_debounce(session_key, delay))
async def _flush_text_debounce(self, session_key: str, delay: float) -> None:
"""Timer task that flushes the debounced text buffer."""
try:
await asyncio.sleep(delay)
await self._flush_text_debounce_now(session_key)
except asyncio.CancelledError:
return
finally:
current = asyncio.current_task()
state = self._text_debounce_store().get(session_key)
if state is not None and state.task is current:
state.task = None
async def _flush_text_debounce_now(self, session_key: str) -> bool:
"""Force-flush one debounced busy-text burst into the pending slot."""
store = self._text_debounce_store()
state = store.get(session_key)
if state is None:
return False
current = asyncio.current_task()
if state.task is not None and state.task is not current and not state.task.done():
state.task.cancel()
state.task = None
existing_pending = self._pending_messages.get(session_key)
if (
existing_pending is not None
and not self._can_merge_text_debounce_events(existing_pending, state.event)
):
return False
state = store.pop(session_key, None)
if state is None:
return False
merge_pending_message_event(
self._pending_messages,
session_key,
state.event,
merge_text=True,
)
return True
def _discard_text_debounce(self, session_key: str) -> None:
"""Cancel and drop pending text debounce state for control commands."""
state = self._text_debounce_store().pop(session_key, None)
if state is not None and state.task is not None and not state.task.done():
state.task.cancel()
# ------------------------------------------------------------------
# Session task + guard ownership helpers
# ------------------------------------------------------------------
@@ -2685,6 +3094,7 @@ class BasePlatformAdapter(ABC):
self._active_sessions.pop(session_key, None)
self._pending_messages.pop(session_key, None)
self._session_tasks.pop(session_key, None)
self._discard_text_debounce(session_key)
return True
def _start_session_processing(
@@ -2766,6 +3176,7 @@ class BasePlatformAdapter(ABC):
)
if discard_pending:
self._pending_messages.pop(session_key, None)
self._discard_text_debounce(session_key)
if release_guard:
self._release_session_guard(session_key)
@@ -2780,6 +3191,7 @@ class BasePlatformAdapter(ABC):
command-scoped guard, then if a follow-up message landed while the
command was running spawns a fresh processing task for it.
"""
await self._flush_text_debounce_now(session_key)
pending_event = self._pending_messages.pop(session_key, None)
self._release_session_guard(session_key, guard=command_guard)
if pending_event is None:
@@ -2911,6 +3323,7 @@ class BasePlatformAdapter(ABC):
# through the dedicated handoff path that serializes
# cancellation + runner response + pending drain.
if cmd in {"stop", "new", "reset"}:
self._discard_text_debounce(session_key)
try:
await self._dispatch_active_session_command(event, session_key, cmd)
except Exception as e:
@@ -2955,8 +3368,9 @@ class BasePlatformAdapter(ABC):
# clarify-intercept can resolve it and unblock the agent.
#
# Without this bypass: the message gets queued in
# _pending_messages AND triggers an interrupt, killing the
# agent run mid-clarify and discarding the user's answer.
# _pending_messages as a follow-up turn instead of reaching the
# clarify resolver, leaving the agent blocked and discarding the
# user's answer.
# Same shape as the /approve deadlock fix (PR #4926) — both
# cases are "agent thread blocked on Event.wait, message must
# reach the resolver before being treated as a new turn."
@@ -3015,27 +3429,28 @@ class BasePlatformAdapter(ABC):
merge_pending_message_event(self._pending_messages, session_key, event)
return # Don't interrupt now - will run after current task completes
# Default behavior for non-photo follow-ups: interrupt the running agent.
#
# Use merge_text=True so rapid TEXT follow-ups (#4469) accumulate
# into the single pending slot instead of clobbering each other.
# Without merging, three rapid messages "A", "B", "C" land like:
# _pending_messages[k] = A (interrupts)
# _pending_messages[k] = B (replaces A before consumer reads)
# _pending_messages[k] = C (replaces B)
# ...and only "C" reaches the next turn. merge_pending_message_event
# already does the right thing for photo/media bursts; the
# ``merge_text=True`` flag extends that to plain TEXT events.
# Same shape as the Telegram bursty-grace path in gateway/run.py.
logger.debug("[%s] New message while session %s is active — triggering interrupt", self.name, session_key)
merge_pending_message_event(
self._pending_messages,
session_key,
event,
merge_text=True,
)
# Signal the interrupt (the processing task checks this)
self._active_sessions[session_key].set()
if self._is_queue_text_debounce_candidate(event):
logger.debug(
"[%s] New text message while session %s is active — "
"debouncing follow-up (busy_text_mode=queue, window=%.2fs)",
self.name,
session_key,
self._busy_text_debounce_seconds,
)
await self._queue_text_debounce(session_key, event)
else:
logger.debug(
"[%s] New message while session %s is active — queuing follow-up "
"(no interrupt, will cascade after current turn)",
self.name,
session_key,
)
merge_pending_message_event(
self._pending_messages,
session_key,
event,
merge_text=event.message_type == MessageType.TEXT,
)
return # Don't process now - will be handled after current task finishes
# Mark session as active BEFORE spawning background task to close
@@ -3166,6 +3581,7 @@ class BasePlatformAdapter(ABC):
# Extract MEDIA:<path> tags (from TTS tool) before other processing
media_files, response = self.extract_media(response)
media_files = self.filter_media_delivery_paths(media_files)
# Extract image URLs and send them as native platform attachments
images, text_content = self.extract_images(response)
@@ -3179,6 +3595,7 @@ class BasePlatformAdapter(ABC):
# Auto-detect bare local file paths for native media delivery
# (helps small models that don't use MEDIA: syntax)
local_files, text_content = self.extract_local_files(text_content)
local_files = self.filter_local_delivery_paths(local_files)
if local_files:
logger.info("[%s] extract_local_files found %d file(s) in response", self.name, len(local_files))
@@ -3387,10 +3804,15 @@ class BasePlatformAdapter(ABC):
ProcessingOutcome.SUCCESS if processing_ok else ProcessingOutcome.FAILURE,
)
# The active drain owns debounce state. If a queue-mode timer has
# not fired yet, force-flush into _pending_messages here and let
# this task hand off the follow-up.
await self._flush_text_debounce_now(session_key)
# Check if there's a pending message that was queued during our processing
if session_key in self._pending_messages:
pending_event = self._pending_messages.pop(session_key)
logger.debug("[%s] Processing queued message from interrupt", self.name)
logger.debug("[%s] Processing queued follow-up message", self.name)
# Keep the _active_sessions entry live across the turn chain
# and only CLEAR the interrupt Event — do NOT delete the entry.
# If we deleted here, a concurrent inbound message arriving
@@ -3399,7 +3821,7 @@ class BasePlatformAdapter(ABC):
# with the recursive drain below. Two agents on one
# session_key = duplicate responses, duplicate tool calls.
# Clearing the Event keeps the guard live so follow-ups take
# the busy-handler path (queue + interrupt) as intended.
# the busy-handler path as intended.
_active = self._active_sessions.get(session_key)
if _active is not None:
_active.clear()
@@ -3492,6 +3914,9 @@ class BasePlatformAdapter(ABC):
await self.stop_typing(event.source.chat_id)
except Exception:
pass
# Final drain/release boundary: force-flush any timer that missed
# the in-band drain before deciding whether the guard can clear.
await self._flush_text_debounce_now(session_key)
# Late-arrival drain: a message may have arrived during the
# cleanup awaits above (typing_task cancel, stop_typing). Such
# messages passed the Level-1 guard (entry still live, Event
@@ -3611,6 +4036,10 @@ class BasePlatformAdapter(ABC):
self._session_tasks.clear()
self._pending_messages.clear()
self._active_sessions.clear()
for state in list(self._text_debounce_store().values()):
if state.task is not None and not state.task.done():
state.task.cancel()
self._text_debounce_store().clear()
def has_pending_interrupt(self, session_key: str) -> bool:
"""Check if there's a pending interrupt for a session."""
+17 -5
View File
@@ -189,7 +189,10 @@ class BlueBubblesAdapter(BasePlatformAdapter):
app = web.Application()
app.router.add_get("/health", lambda _: web.Response(text="ok"))
app.router.add_post(self.webhook_path, self._handle_webhook)
self._runner = web.AppRunner(app)
# The webhook auth value is carried in the query string because the
# BlueBubbles webhook API cannot send custom headers. Do not let
# aiohttp access logs write that request target to agent.log.
self._runner = web.AppRunner(app, access_log=None)
await self._runner.setup()
site = web.TCPSite(self._runner, self.webhook_host, self.webhook_port)
await site.start()
@@ -242,6 +245,14 @@ class BlueBubblesAdapter(BasePlatformAdapter):
return f"{base}?password={quote(self.password, safe='')}"
return base
@property
def _webhook_register_url_for_log(self) -> str:
"""Webhook registration URL safe for logs."""
base = self._webhook_url
if self.password:
return f"{base}?password=***"
return base
async def _find_registered_webhooks(self, url: str) -> list:
"""Return list of BB webhook entries matching *url*."""
try:
@@ -269,7 +280,8 @@ class BlueBubblesAdapter(BasePlatformAdapter):
existing = await self._find_registered_webhooks(webhook_url)
if existing:
logger.info(
"[bluebubbles] webhook already registered: %s", webhook_url
"[bluebubbles] webhook already registered: %s",
self._webhook_register_url_for_log,
)
return True
@@ -284,7 +296,7 @@ class BlueBubblesAdapter(BasePlatformAdapter):
if 200 <= status < 300:
logger.info(
"[bluebubbles] webhook registered with server: %s",
webhook_url,
self._webhook_register_url_for_log,
)
return True
else:
@@ -324,7 +336,8 @@ class BlueBubblesAdapter(BasePlatformAdapter):
removed = True
if removed:
logger.info(
"[bluebubbles] webhook unregistered: %s", webhook_url
"[bluebubbles] webhook unregistered: %s",
self._webhook_register_url_for_log,
)
except Exception as exc:
logger.debug(
@@ -934,4 +947,3 @@ class BlueBubblesAdapter(BasePlatformAdapter):
asyncio.create_task(self.mark_read(session_chat_id))
return web.Response(text="ok")
+13
View File
@@ -358,6 +358,19 @@ class DingTalkAdapter(BasePlatformAdapter):
await asyncio.gather(*self._bg_tasks, return_exceptions=True)
self._bg_tasks.clear()
# Finalize any open streaming cards before the HTTP client closes so
# they don't stay stuck in streaming state on DingTalk's UI after
# a gateway restart. _close_streaming_siblings handles its own
# per-card exceptions; the outer try is a safety net for token fetch.
for _chat_id in list(self._streaming_cards):
try:
await self._close_streaming_siblings(_chat_id)
except Exception as _exc:
logger.debug(
"[%s] Failed to finalize streaming card on disconnect for %s: %s",
self.name, _chat_id, _exc,
)
if self._http_client:
await self._http_client.aclose()
self._http_client = None
+72 -10
View File
@@ -1514,8 +1514,10 @@ class FeishuAdapter(BasePlatformAdapter):
connection_mode=str(
extra.get("connection_mode") or os.getenv("FEISHU_CONNECTION_MODE", "websocket")
).strip().lower(),
encrypt_key=os.getenv("FEISHU_ENCRYPT_KEY", "").strip(),
verification_token=os.getenv("FEISHU_VERIFICATION_TOKEN", "").strip(),
encrypt_key=str(extra.get("encrypt_key") or os.getenv("FEISHU_ENCRYPT_KEY", "")).strip(),
verification_token=str(
extra.get("verification_token") or os.getenv("FEISHU_VERIFICATION_TOKEN", "")
).strip(),
group_policy=os.getenv("FEISHU_GROUP_POLICY", "allowlist").strip().lower(),
allowed_group_users=frozenset(
item.strip()
@@ -1642,6 +1644,11 @@ class FeishuAdapter(BasePlatformAdapter):
self._connection_mode,
)
return False
if self._connection_mode == "webhook" and not (self._verification_token or self._encrypt_key):
logger.error(
"[Feishu] Webhook mode requires FEISHU_VERIFICATION_TOKEN or FEISHU_ENCRYPT_KEY."
)
return False
try:
self._app_lock_identity = self._app_id
@@ -2563,13 +2570,44 @@ class FeishuAdapter(BasePlatformAdapter):
if approval_id is None:
logger.debug("[Feishu] Card action missing approval_id, ignoring")
return P2CardActionTriggerResponse() if P2CardActionTriggerResponse else None
state = self._approval_state.get(approval_id)
if not state:
logger.debug("[Feishu] Approval %s already resolved or unknown", approval_id)
return P2CardActionTriggerResponse() if P2CardActionTriggerResponse else None
choice = _APPROVAL_CHOICE_MAP.get(action_value.get("hermes_action"), "deny")
operator = getattr(event, "operator", None)
open_id = str(getattr(operator, "open_id", "") or "")
sender_id = SimpleNamespace(open_id=open_id, user_id=str(getattr(operator, "user_id", "") or ""))
if not self._allow_group_message(sender_id, state.get("chat_id", ""), is_bot=False):
logger.warning("[Feishu] Unauthorized approval click by %s", open_id or "<unknown>")
return P2CardActionTriggerResponse() if P2CardActionTriggerResponse else None
callback_chat_id = str(getattr(getattr(event, "context", None), "open_chat_id", "") or "")
expected_chat_id = str(state.get("chat_id", "") or "")
if callback_chat_id and expected_chat_id and callback_chat_id != expected_chat_id:
logger.warning(
"[Feishu] Approval callback chat mismatch for %s (expected=%s, got=%s)",
approval_id,
expected_chat_id,
callback_chat_id,
)
return P2CardActionTriggerResponse() if P2CardActionTriggerResponse else None
user_name = self._get_cached_sender_name(open_id) or open_id
if not self._submit_on_loop(loop, self._resolve_approval(approval_id, choice, user_name)):
chat_context = getattr(event, "context", None)
chat_id = str(getattr(chat_context, "open_chat_id", "") or "")
if not self._submit_on_loop(
loop,
self._resolve_approval(
approval_id=approval_id,
choice=choice,
user_name=user_name,
open_id=open_id,
chat_id=chat_id,
),
):
return P2CardActionTriggerResponse() if P2CardActionTriggerResponse else None
if P2CardActionTriggerResponse is None:
@@ -2617,12 +2655,34 @@ class FeishuAdapter(BasePlatformAdapter):
response.card = card
return response
async def _resolve_approval(self, approval_id: Any, choice: str, user_name: str) -> None:
async def _resolve_approval(
self,
approval_id: Any,
choice: str,
user_name: str,
*,
open_id: str = "",
chat_id: str = "",
) -> None:
"""Pop approval state and unblock the waiting agent thread."""
state = self._approval_state.pop(approval_id, None)
state = self._approval_state.get(approval_id)
if not state:
logger.debug("[Feishu] Approval %s already resolved or unknown", approval_id)
return
if not self._is_interactive_operator_authorized(open_id):
logger.warning("[Feishu] Unauthorized approval click by %s for approval %s", open_id or "<unknown>", approval_id)
return
expected_chat_id = str(state.get("chat_id", "") or "")
if expected_chat_id and chat_id and expected_chat_id != chat_id:
logger.warning(
"[Feishu] Approval %s chat mismatch (expected=%s, got=%s)",
approval_id, expected_chat_id, chat_id,
)
return
state = self._approval_state.pop(approval_id, None)
if not state:
logger.debug("[Feishu] Approval %s already resolved while validating callback", approval_id)
return
try:
from tools.approval import resolve_gateway_approval
count = resolve_gateway_approval(state["session_key"], choice)
@@ -3229,11 +3289,6 @@ class FeishuAdapter(BasePlatformAdapter):
self._record_webhook_anomaly(remote_ip, "400")
return web.json_response({"code": 400, "msg": "invalid json"}, status=400)
# URL verification challenge — respond before other checks so that Feishu's
# subscription setup works even before encrypt_key is wired.
if payload.get("type") == "url_verification":
return web.json_response({"challenge": payload.get("challenge", "")})
# Verification token check — second layer of defence beyond signature (matches openclaw).
if self._verification_token:
header = payload.get("header") or {}
@@ -3243,6 +3298,13 @@ class FeishuAdapter(BasePlatformAdapter):
self._record_webhook_anomaly(remote_ip, "401-token")
return web.Response(status=401, text="Invalid verification token")
# URL verification challenge — Feishu includes the verification token in
# challenge requests. Validate the token (above) before reflecting the
# challenge so an unauthenticated remote request cannot prove endpoint
# control by getting attacker-supplied challenge data echoed back.
if payload.get("type") == "url_verification":
return web.json_response({"challenge": payload.get("challenge", "")})
# Timing-safe signature verification (only enforced when encrypt_key is set).
if self._encrypt_key and not self._is_webhook_signature_valid(request.headers, body_bytes):
logger.warning("[Feishu] Webhook rejected: invalid signature from %s", remote_ip)
+42 -8
View File
@@ -138,7 +138,8 @@ _OUTBOUND_MENTION_RE = re.compile(
)
_E2EE_INSTALL_HINT = (
"Install with: pip install 'mautrix[encryption]' (requires libolm C library)"
"Install with: pip install 'mautrix[encryption]' asyncpg aiosqlite "
"(requires libolm C library)"
)
_MATRIX_IMAGE_FILENAME_EXTS = frozenset({
@@ -214,9 +215,22 @@ def _create_matrix_session(proxy_url: str | None):
def _check_e2ee_deps() -> bool:
"""Return True if mautrix E2EE dependencies (python-olm) are available."""
"""Return True if mautrix E2EE dependencies are available.
Verifies python-olm (via mautrix.crypto.OlmMachine), the SQLite crypto
store backend (mautrix.crypto.store.asyncpg.PgCryptoStore yes, the
PgCryptoStore class also drives the sqlite backend in mautrix 0.21),
and the database drivers actually used at connect time (``asyncpg`` for
the underlying upgrade_table machinery, ``aiosqlite`` for the
``sqlite:///`` URL we pass to ``Database.create``). Without all four,
encrypted rooms fail at connect time with a confusing
``No module named 'asyncpg'`` (#31116).
"""
try:
from mautrix.crypto import OlmMachine # noqa: F401
from mautrix.crypto.store.asyncpg import PgCryptoStore # noqa: F401
import asyncpg # noqa: F401
import aiosqlite # noqa: F401
return True
except (ImportError, AttributeError):
@@ -226,8 +240,13 @@ def _check_e2ee_deps() -> bool:
def check_matrix_requirements() -> bool:
"""Return True if the Matrix adapter can be used.
Lazy-installs mautrix via ``tools.lazy_deps.ensure("platform.matrix")``
on first call if not present. Rebinds all module-level type globals on success.
Lazy-installs the full ``platform.matrix`` feature group via
``tools.lazy_deps.ensure_and_bind`` whenever any of the declared
packages (mautrix, Markdown, aiosqlite, asyncpg, aiohttp-socks) is
missing not just mautrix itself. Previously this short-circuited on
``import mautrix``, which left the other four packages uninstalled
forever and broke E2EE connect with ``No module named 'asyncpg'``
(#31116). Rebinds module-level type globals on success.
"""
token = os.getenv("MATRIX_ACCESS_TOKEN", "")
password = os.getenv("MATRIX_PASSWORD", "")
@@ -239,9 +258,20 @@ def check_matrix_requirements() -> bool:
if not homeserver:
logger.warning("Matrix: MATRIX_HOMESERVER not set")
return False
# Check whether any package in the platform.matrix feature group is
# missing. ``feature_missing`` is cheap (per-spec importlib.metadata
# lookups) and correctly handles ``mautrix[encryption]`` by stripping
# the extras marker before checking the bare package.
try:
import mautrix # noqa: F401
except ImportError:
from tools.lazy_deps import feature_missing, ensure_and_bind
missing = feature_missing("platform.matrix")
except Exception as exc: # pragma: no cover — defensive
logger.debug("Matrix: lazy_deps lookup failed: %s", exc)
missing = ()
ensure_and_bind = None # type: ignore[assignment]
if missing or ensure_and_bind is None:
def _import():
from mautrix.types import (
ContentURI, EventID, EventType, PaginationDirection,
@@ -261,10 +291,14 @@ def check_matrix_requirements() -> bool:
"UserID": UserID,
}
from tools.lazy_deps import ensure_and_bind
if ensure_and_bind is None:
return False
if not ensure_and_bind("platform.matrix", _import, globals(), prompt=False):
logger.warning(
"Matrix: mautrix not installed. Run: pip install 'mautrix[encryption]'"
"Matrix: required packages not installed (%s). "
"Run: pip install 'mautrix[encryption]' asyncpg aiosqlite "
"Markdown aiohttp-socks",
", ".join(missing) if missing else "platform.matrix",
)
return False
+7 -1
View File
@@ -133,6 +133,12 @@ class MSGraphWebhookAdapter(BasePlatformAdapter):
self._notification_scheduler = scheduler
async def connect(self) -> bool:
if self._client_state is None:
logger.error(
"[msgraph_webhook] Refusing to start without extra.client_state configured"
)
return False
app = web.Application()
app.router.add_get(self._health_path, self._handle_health)
app.router.add_get(self._webhook_path, self._handle_validation)
@@ -310,7 +316,7 @@ class MSGraphWebhookAdapter(BasePlatformAdapter):
"""
expected = self._client_state
if expected is None:
return True
return False
provided = self._string_or_none(notification.get("clientState"))
if provided is None:
return False
+131 -16
View File
@@ -534,9 +534,30 @@ class QQAdapter(BasePlatformAdapter):
self._mark_transport_disconnected()
self._fail_pending("Connection closed")
# Stop reconnecting for fatal codes
if code in {4914, 4915}:
desc = "offline/sandbox-only" if code == 4914 else "banned"
# Stop reconnecting for fatal codes (unrecoverable errors)
if code in {
4001, # Invalid opcode
4002, # Invalid payload
4010, # Invalid shard
4011, # Sharding required
4012, # Invalid API version
4013, # Invalid intent
4014, # Intent not authorized
4914, # Offline/sandbox-only
4915, # Banned
}:
fatal_descriptions = {
4001: "invalid opcode",
4002: "invalid payload",
4010: "invalid shard",
4011: "sharding required",
4012: "invalid API version",
4013: "invalid intent",
4014: "intent not authorized",
4914: "offline/sandbox-only",
4915: "banned",
}
desc = fatal_descriptions.get(code, f"fatal error (code={code})")
logger.error(
"[%s] Bot is %s. Check QQ Open Platform.", self._log_tag, desc
)
@@ -573,10 +594,11 @@ class QQAdapter(BasePlatformAdapter):
self._token_expires_at = 0.0
# Session invalid → clear session, will re-identify on next Hello
# Note: 4009 (connection timeout) is NOT included here — it is
# resumable per the QQ protocol and should preserve session state.
if code in {
4006,
4007,
4009,
4900,
4901,
4902,
@@ -705,9 +727,8 @@ class QQAdapter(BasePlatformAdapter):
"token": f"QQBot {token}",
"intents": (1 << 25)
| (1 << 30)
| (
1 << 12
), # C2C_GROUP_AT_MESSAGES + PUBLIC_GUILD_MESSAGES + DIRECT_MESSAGE
| (1 << 12)
| (1 << 26), # C2C_GROUP_AT_MESSAGES + PUBLIC_GUILD_MESSAGES + DIRECT_MESSAGE + INTERACTION
"shard": [0, 1],
"properties": {
"$os": "macOS",
@@ -826,6 +847,32 @@ class QQAdapter(BasePlatformAdapter):
if op == 11:
return
# op 7 = Server Reconnect — server asks client to reconnect (e.g.
# load-balancing, maintenance). Close the WS so _read_events raises
# and the outer loop triggers a reconnect with Resume.
if op == 7:
logger.info("[%s] Server requested reconnect (op 7)", self._log_tag)
if self._ws and not self._ws.closed:
self._create_task(self._ws.close())
return
# op 9 = Invalid Session — d=True means session is resumable,
# d=False means we must re-identify from scratch.
if op == 9:
resumable = bool(d) if d is not None else False
if not resumable:
logger.info(
"[%s] Invalid session (op 9, not resumable), clearing session",
self._log_tag,
)
self._session_id = None
self._last_seq = None
else:
logger.info("[%s] Invalid session (op 9, resumable)", self._log_tag)
if self._ws and not self._ws.closed:
self._create_task(self._ws.close())
return
logger.debug("[%s] Unknown op: %s", self._log_tag, op)
def _handle_ready(self, d: Any) -> None:
@@ -1007,6 +1054,46 @@ class QQAdapter(BasePlatformAdapter):
"deny": "deny",
}
@staticmethod
def _parse_gateway_session_key(session_key: str) -> Optional[Dict[str, str]]:
"""Parse ``agent:main:<platform>:<chat_type>:<chat_id>[:<user_id>]``."""
parts = str(session_key or "").split(":")
if len(parts) < 5 or parts[0] != "agent" or parts[1] != "main":
return None
parsed = {
"platform": parts[2],
"chat_type": parts[3],
"chat_id": parts[4],
}
if len(parts) > 5:
parsed["user_id"] = parts[5]
return parsed
def _is_authorized_interaction_for_session(
self,
event: InteractionEvent,
session_key: str,
) -> bool:
"""Authorize approval/update interactions against session + operator."""
parsed = self._parse_gateway_session_key(session_key)
operator = str(event.operator_openid or "").strip()
if not parsed or parsed.get("platform") != "qqbot" or not operator:
return False
chat_type = parsed.get("chat_type", "")
chat_id = parsed.get("chat_id", "")
if chat_type == "c2c":
return bool(chat_id) and operator == chat_id
if chat_type in {"group", "guild"}:
event_chat = str(event.group_openid or event.guild_id or "").strip()
if not event_chat or event_chat != chat_id:
return False
session_user = str(parsed.get("user_id", "")).strip()
return bool(session_user) and operator == session_user
return False
async def _default_interaction_dispatch(
self,
event: InteractionEvent,
@@ -1040,6 +1127,13 @@ class QQAdapter(BasePlatformAdapter):
self._log_tag, decision, session_key,
)
return
if not self._is_authorized_interaction_for_session(event, session_key):
logger.warning(
"[%s] Rejected unauthorized approval click for session %s "
"(operator=%s)",
self._log_tag, session_key, event.operator_openid,
)
return
try:
# Import lazily to keep the adapter importable in tests that
# don't exercise the approval subsystem.
@@ -1060,6 +1154,13 @@ class QQAdapter(BasePlatformAdapter):
update_answer = parse_update_prompt_button_data(button_data)
if update_answer is not None:
update_session_key = f"agent:main:qqbot:{event.scene}:{event.group_openid or event.guild_id or event.user_openid}"
if not self._is_authorized_interaction_for_session(event, update_session_key):
logger.warning(
"[%s] Rejected unauthorized update prompt click (operator=%s)",
self._log_tag, event.operator_openid,
)
return
self._write_update_response(update_answer, event.operator_openid)
return
@@ -1607,7 +1708,7 @@ class QQAdapter(BasePlatformAdapter):
elif ct.startswith("image/"):
# Image: download and cache locally.
try:
cached_path = await self._download_and_cache(url, ct)
cached_path = await self._download_and_cache(url, ct, filename)
if cached_path and os.path.isfile(cached_path):
image_urls.append(cached_path)
image_media_types.append(ct or "image/jpeg")
@@ -1620,11 +1721,15 @@ class QQAdapter(BasePlatformAdapter):
except Exception as exc:
logger.debug("[%s] Failed to cache image: %s", self._log_tag, exc)
else:
# Other attachments (video, file, etc.): record as text.
# Other attachments (video, file, etc.): download and record with path.
try:
cached_path = await self._download_and_cache(url, ct)
cached_path = await self._download_and_cache(url, ct, filename)
if cached_path:
other_attachments.append(f"[Attachment: {filename or ct}]")
name = filename or ct
if ct.startswith("video/"):
other_attachments.append(f"[video: {name} ({cached_path})]")
else:
other_attachments.append(f"[file: {name} ({cached_path})]")
except Exception as exc:
logger.debug("[%s] Failed to cache attachment: %s", self._log_tag, exc)
@@ -1636,8 +1741,14 @@ class QQAdapter(BasePlatformAdapter):
"attachment_info": attachment_info,
}
async def _download_and_cache(self, url: str, content_type: str) -> Optional[str]:
"""Download a URL and cache it locally."""
async def _download_and_cache(
self, url: str, content_type: str, original_name: str = "",
) -> Optional[str]:
"""Download a URL and cache it locally.
:param original_name: Preferred filename from attachment metadata.
Falls back to the URL path basename if empty.
"""
from tools.url_safety import is_safe_url
if not is_safe_url(url):
@@ -1668,7 +1779,11 @@ class QQAdapter(BasePlatformAdapter):
# Convert to .wav using ffmpeg so STT engines can process it.
return await self._convert_audio_to_wav(data, url)
else:
filename = Path(urlparse(url).path).name or "qq_attachment"
filename = (
original_name
or Path(urlparse(url).path).name
or "qq_attachment"
)
return cache_document_from_bytes(data, filename)
@staticmethod
@@ -1881,7 +1996,7 @@ class QQAdapter(BasePlatformAdapter):
@staticmethod
def _guess_ext_from_data(data: bytes) -> str:
"""Guess file extension from magic bytes."""
if data[:9] == b"#!SILK_V3" or data[:5] == b"#!SILK":
if data[:9] == b"#!SILK_V3" or data[:6] == b"#!SILK":
return ".silk"
if data[:2] == b"\x02!":
return ".silk"
@@ -1901,7 +2016,7 @@ class QQAdapter(BasePlatformAdapter):
@staticmethod
def _looks_like_silk(data: bytes) -> bool:
"""Check if bytes look like a SILK audio file."""
return data[:4] == b"#!SILK" or data[:2] == b"\x02!" or data[:9] == b"#!SILK_V3"
return data[:6] == b"#!SILK" or data[:2] == b"\x02!" or data[:9] == b"#!SILK_V3"
async def _convert_silk_to_wav(self, src_path: str, wav_path: str) -> Optional[str]:
"""Convert audio file to WAV using the pilk library.
+61 -4
View File
@@ -429,6 +429,13 @@ class TelegramAdapter(BasePlatformAdapter):
self._polling_conflict_count: int = 0
self._polling_network_error_count: int = 0
self._polling_error_callback_ref = None
# After sustained reconnect storms the PTB httpx pool can return
# SendResult(success=True) for sends that never actually transmit.
# _handle_polling_network_error sets this; _verify_polling_after_reconnect
# clears it once getMe() confirms the Bot client is healthy.
# While True, send() short-circuits to a failure so callers
# (cron live-adapter branch) fall through to standalone delivery.
self._send_path_degraded: bool = False
# DM Topics: map of topic_name -> message_thread_id (populated at startup)
self._dm_topics: Dict[str, int] = {}
# Track forum chats where we've already registered bot commands
@@ -468,6 +475,10 @@ class TelegramAdapter(BasePlatformAdapter):
# "all" — every message triggers a push notification (legacy
# behavior; opt-in via display.platforms.telegram.notifications).
self._notifications_mode: str = "important"
# send_or_update_status() bookkeeping: {(chat_id, status_key) -> bot message_id}
# Tracks status bubbles owned by this adapter so subsequent calls with the
# same key edit the same message instead of appending new ones (#30045).
self._status_message_ids: Dict[tuple, str] = {}
def _notification_kwargs(
self, metadata: Optional[Dict[str, Any]]
@@ -870,6 +881,7 @@ class TelegramAdapter(BasePlatformAdapter):
MAX_DELAY = 60
self._polling_network_error_count += 1
self._send_path_degraded = True
attempt = self._polling_network_error_count
if attempt > MAX_NETWORK_RETRIES:
@@ -967,6 +979,7 @@ class TelegramAdapter(BasePlatformAdapter):
try:
await asyncio.wait_for(self._app.bot.get_me(), PROBE_TIMEOUT)
self._send_path_degraded = False
except Exception as probe_err:
logger.warning(
"[%s] Polling heartbeat probe failed %ds after reconnect: %s",
@@ -1679,7 +1692,11 @@ class TelegramAdapter(BasePlatformAdapter):
"""Send a message to a Telegram chat."""
if not self._bot:
return SendResult(success=False, error="Not connected")
# getattr() — tests build adapters via object.__new__() (no __init__).
if getattr(self, "_send_path_degraded", False):
return SendResult(success=False, error="send_path_degraded", retryable=True)
# Skip whitespace-only text to prevent Telegram 400 empty-text errors.
if not content or not content.strip():
return SendResult(success=True, message_id=None)
@@ -1908,6 +1925,40 @@ class TelegramAdapter(BasePlatformAdapter):
is_connect_timeout = self._looks_like_connect_timeout(e)
return SendResult(success=False, error=str(e), retryable=(is_connect_timeout or not is_timeout))
async def send_or_update_status(
self,
chat_id: str,
status_key: str,
content: str,
*,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send a status message, or edit the previous one with the same key.
Issue #30045: progress/status callbacks (context-pressure, lifecycle,
compression, etc.) used to append a fresh bubble on every call. With
this method, the first call sends and the message id is remembered;
subsequent calls with the same (chat_id, status_key) edit that same
message in place. If the edit fails (message deleted, too old, etc.)
we drop the cached id and send fresh.
"""
key = (str(chat_id), str(status_key))
cached_id = self._status_message_ids.get(key)
if cached_id is not None:
result = await self.edit_message(
chat_id, cached_id, content, finalize=True, metadata=metadata,
)
if result.success:
if result.message_id:
self._status_message_ids[key] = str(result.message_id)
return result
# Edit failed — clear the cached id and fall through to a fresh send.
self._status_message_ids.pop(key, None)
result = await self.send(chat_id, content, metadata=metadata)
if result.success and result.message_id:
self._status_message_ids[key] = str(result.message_id)
return result
async def edit_message(
self,
chat_id: str,
@@ -4573,10 +4624,10 @@ class TelegramAdapter(BasePlatformAdapter):
return (
"You are handling a Telegram group chat message.\n"
f"- Your identity: user_id={bot_id}, @-mention name in this group=@{username}\n"
"- Lines in history prefixed with `[nickname|user_id]` are observed Telegram group context "
"and are not necessarily addressed to you.\n"
"- observed Telegram group context may be provided in a separate context-only block "
"before the current message; it is not necessarily addressed to you.\n"
"- Treat only the current new message as a request explicitly directed at you, "
"and answer it directly."
"and use observed context only when the current message asks for it."
)
def _apply_telegram_group_observe_attribution(self, event: MessageEvent) -> MessageEvent:
@@ -4593,6 +4644,12 @@ class TelegramAdapter(BasePlatformAdapter):
shared_source = self._telegram_group_observe_shared_source(event.source)
observe_prompt = self._telegram_group_observe_channel_prompt()
channel_prompt = f"{event.channel_prompt}\n\n{observe_prompt}" if event.channel_prompt else observe_prompt
if event.message_type == MessageType.COMMAND:
return dataclasses.replace(
event,
source=shared_source,
channel_prompt=channel_prompt,
)
return dataclasses.replace(
event,
text=self._telegram_group_observe_attributed_text(event),
+108 -4
View File
@@ -27,6 +27,8 @@ Security:
"""
import asyncio
import base64
import binascii
import hashlib
import hmac
import json
@@ -326,6 +328,17 @@ class WebhookAdapter(BasePlatformAdapter):
_INSECURE_NO_AUTH,
)
continue
if (
effective_secret == _INSECURE_NO_AUTH
and not _is_loopback_host(self._host)
):
logger.warning(
"[webhook] Dynamic route '%s' skipped: INSECURE_NO_AUTH "
"is only allowed on loopback hosts. Current host: '%s'.",
k,
self._host,
)
continue
new_dynamic[k] = v
self._dynamic_routes = new_dynamic
self._routes = {**self._dynamic_routes, **self._static_routes}
@@ -366,9 +379,21 @@ class WebhookAdapter(BasePlatformAdapter):
logger.error("[webhook] Failed to read body: %s", e)
return web.json_response({"error": "Bad request"}, status=400)
# Validate HMAC signature FIRST (skip for INSECURE_NO_AUTH testing mode)
# Validate HMAC signature FIRST (skip only for the explicit local-test
# INSECURE_NO_AUTH mode). Missing/empty secrets must fail closed here,
# not only during connect(), so direct handler reuse cannot turn a
# network webhook route into an unauthenticated agent-dispatch surface.
secret = route_config.get("secret", self._global_secret)
if secret and secret != _INSECURE_NO_AUTH:
if not secret:
logger.error(
"[webhook] Route %s has no HMAC secret; refusing request",
route_name,
)
return web.json_response(
{"error": "Webhook route is missing an HMAC secret"},
status=403,
)
if secret != _INSECURE_NO_AUTH:
if not self._validate_signature(request, raw_body, secret):
logger.warning(
"[webhook] Invalid signature for route %s", route_name
@@ -408,6 +433,7 @@ class WebhookAdapter(BasePlatformAdapter):
request.headers.get("X-GitHub-Event", "")
or request.headers.get("X-GitLab-Event", "")
or payload.get("event_type", "")
or payload.get("type", "")
or "unknown"
)
allowed_events = route_config.get("events", [])
@@ -460,7 +486,10 @@ class WebhookAdapter(BasePlatformAdapter):
# Build a unique delivery ID
delivery_id = request.headers.get(
"X-GitHub-Delivery",
request.headers.get("X-Request-ID", str(int(time.time() * 1000))),
request.headers.get(
"svix-id",
request.headers.get("X-Request-ID", str(int(time.time() * 1000))),
),
)
# ── Idempotency ─────────────────────────────────────────
@@ -605,7 +634,32 @@ class WebhookAdapter(BasePlatformAdapter):
def _validate_signature(
self, request: "web.Request", body: bytes, secret: str
) -> bool:
"""Validate webhook signature (GitHub, GitLab, generic HMAC-SHA256)."""
"""Validate webhook signature (GitHub, GitLab, Svix, generic HMAC-SHA256)."""
def _header(name: str) -> str:
return (
request.headers.get(name, "")
or request.headers.get(name.lower(), "")
or request.headers.get(name.upper(), "")
)
# Svix / AgentMail:
# svix-id: msg_...
# svix-timestamp: unix seconds
# svix-signature: v1,<base64-hmac> [v1,<base64-hmac> ...]
# Signed content is: "{id}.{timestamp}.{raw_body}". Svix secrets
# usually start with "whsec_" and the remainder is base64-encoded.
svix_id = _header("svix-id")
svix_timestamp = _header("svix-timestamp")
svix_signature = _header("svix-signature")
if svix_id or svix_timestamp or svix_signature:
return self._validate_svix_signature(
body=body,
secret=secret,
msg_id=svix_id,
timestamp=svix_timestamp,
signature_header=svix_signature,
)
# GitHub: X-Hub-Signature-256 = sha256=<hex>
gh_sig = request.headers.get("X-Hub-Signature-256", "")
if gh_sig:
@@ -633,6 +687,56 @@ class WebhookAdapter(BasePlatformAdapter):
)
return False
def _validate_svix_signature(
self,
body: bytes,
secret: str,
msg_id: str,
timestamp: str,
signature_header: str,
tolerance_seconds: int = 300,
) -> bool:
"""Validate Svix-compatible signatures used by AgentMail webhooks."""
if not (msg_id and timestamp and signature_header and secret):
return False
try:
ts = int(timestamp)
except (TypeError, ValueError):
return False
if abs(int(time.time()) - ts) > tolerance_seconds:
logger.warning("[webhook] Svix signature timestamp outside replay window")
return False
if secret.startswith("whsec_"):
encoded_secret = secret.removeprefix("whsec_")
try:
key = base64.b64decode(encoded_secret, validate=True)
except (binascii.Error, ValueError):
logger.debug("[webhook] Invalid whsec_ Svix signing secret")
return False
else:
# Be permissive for providers that document Svix-style headers but
# hand out raw shared secrets rather than whsec_ base64 secrets.
logger.debug("[webhook] Validating Svix-style signature with raw secret")
key = secret.encode()
signed_content = msg_id.encode() + b"." + timestamp.encode() + b"." + body
expected = base64.b64encode(
hmac.new(key, signed_content, hashlib.sha256).digest()
).decode()
# Svix can send multiple signatures separated by spaces during secret
# rotation. Each entry is formatted as "vN,<base64>".
for part in signature_header.split():
try:
version, signature = part.split(",", 1)
except ValueError:
continue
if version == "v1" and hmac.compare_digest(signature, expected):
return True
return False
# ------------------------------------------------------------------
# Prompt rendering
# ------------------------------------------------------------------
+12
View File
@@ -616,6 +616,18 @@ class WeComAdapter(BasePlatformAdapter):
else:
delay = self._text_batch_delay_seconds
await asyncio.sleep(delay)
# Guard against the cancel-delivery race: when the sleep timer
# fires just before cancel() is called, CPython sets
# Task._must_cancel but cannot cancel the already-done sleep
# future, so CancelledError is delivered at the *next* await
# (handle_message) rather than here. By that point this task
# has already popped the merged event, so the superseding task
# sees an empty batch and silently drops the message.
# This check is synchronous — no await between the sleep and
# the pop — so no other coroutine can modify the task registry
# in between.
if self._pending_text_batch_tasks.get(key) is not current_task:
return
event = self._pending_text_batches.pop(key, None)
if not event:
return
+25 -13
View File
@@ -187,7 +187,6 @@ class WecomCallbackAdapter(BasePlatformAdapter):
app = self._resolve_app_for_chat(chat_id)
touser = chat_id.split(":", 1)[1] if ":" in chat_id else chat_id
try:
token = await self._get_access_token(app)
payload = {
"touser": touser,
"msgtype": "text",
@@ -195,18 +194,31 @@ class WecomCallbackAdapter(BasePlatformAdapter):
"text": {"content": content[:2048]},
"safe": 0,
}
resp = await self._http_client.post(
f"https://qyapi.weixin.qq.com/cgi-bin/message/send?access_token={token}",
json=payload,
)
data = resp.json()
if data.get("errcode") != 0:
return SendResult(success=False, error=str(data))
return SendResult(
success=True,
message_id=str(data.get("msgid", "")),
raw_response=data,
)
for _attempt in range(2):
token = await self._get_access_token(app)
resp = await self._http_client.post(
f"https://qyapi.weixin.qq.com/cgi-bin/message/send?access_token={token}",
json=payload,
)
data = resp.json()
errcode = data.get("errcode")
if errcode in {40001, 42001} and _attempt == 0:
# WeCom rejected the token — evict the cached entry so
# the next _get_access_token call forces a fresh fetch.
logger.warning(
"[WecomCallback] Token rejected for app '%s' (errcode=%s), refreshing",
app.get("name", "default"), errcode,
)
self._access_tokens.pop(app["name"], None)
continue
if errcode != 0:
return SendResult(success=False, error=str(data))
return SendResult(
success=True,
message_id=str(data.get("msgid", "")),
raw_response=data,
)
return SendResult(success=False, error="send failed after token refresh")
except Exception as exc:
return SendResult(success=False, error=str(exc))
+2
View File
@@ -1679,8 +1679,10 @@ class WeixinAdapter(BasePlatformAdapter):
# Extract MEDIA: tags and bare local file paths before text delivery.
media_files, cleaned_content = self.extract_media(content)
media_files = self.filter_media_delivery_paths(media_files)
_, image_cleaned = self.extract_images(cleaned_content)
local_files, final_content = self.extract_local_files(image_cleaned)
local_files = self.filter_local_delivery_paths(local_files)
_AUDIO_EXTS = {".ogg", ".opus", ".mp3", ".wav", ".m4a", ".flac"}
_VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".webm", ".3gp"}
+561 -226
View File
File diff suppressed because it is too large Load Diff
+1
View File
@@ -1277,6 +1277,7 @@ class SessionStore:
platform_message_id=(
message.get("platform_message_id") or message.get("message_id")
),
observed=bool(message.get("observed")),
)
except Exception as e:
logger.debug("Session DB operation failed: %s", e)
+15
View File
@@ -83,6 +83,21 @@ _VAR_MAP = {
}
def set_current_session_id(session_id: str) -> None:
"""Synchronize ``HERMES_SESSION_ID`` across ContextVar and ``os.environ``.
Long-lived single-process entrypoints like the CLI can rotate sessions via
``/new``, ``/resume``, ``/branch``, or compression splits without
reconstructing the entire agent. Tools still consult
``get_session_env("HERMES_SESSION_ID")`` with an ``os.environ`` fallback,
so both storage paths must move together when the active session changes.
"""
import os
os.environ["HERMES_SESSION_ID"] = session_id
_SESSION_ID.set(session_id)
def set_session_vars(
platform: str = "",
chat_id: str = "",
+5
View File
@@ -192,6 +192,11 @@ class GatewayStreamConsumer:
"""True when the stream consumer delivered the final assistant reply."""
return self._final_response_sent
@property
def message_id(self) -> str | None:
"""The Discord/chat message ID of the last-sent or edited message."""
return self._message_id
@property
def final_content_delivered(self) -> bool:
"""True when the final response content reached the user, even if
+7 -2
View File
@@ -129,7 +129,8 @@ def build_top_level_parser():
default=None,
help=(
"Provider override for this invocation (e.g. openrouter, anthropic). "
"Applies to -z/--oneshot and --tui. Also settable via HERMES_INFERENCE_PROVIDER env var."
"Applies to -z/--oneshot and --tui. The persistent provider lives in config.yaml "
"under model.provider — use `hermes setup` or edit the file to change it."
),
)
parser.add_argument(
@@ -268,7 +269,11 @@ def build_top_level_parser():
help="Inference provider (default: auto). Built-in or a user-defined name from `providers:` in config.yaml.",
)
chat_parser.add_argument(
"-v", "--verbose", action="store_true", help="Verbose output"
"-v",
"--verbose",
action="store_true",
default=argparse.SUPPRESS,
help="Verbose output",
)
chat_parser.add_argument(
"-Q",
+224 -32
View File
@@ -41,7 +41,7 @@ from dataclasses import dataclass, field
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer, ThreadingHTTPServer
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional, Tuple
from typing import Any, Callable, Dict, FrozenSet, List, Optional, Tuple
from urllib.parse import parse_qs, urlencode, urlparse
import httpx
@@ -49,6 +49,7 @@ import yaml
from hermes_cli.config import get_hermes_home, get_config_path, read_raw_config
from hermes_constants import OPENROUTER_BASE_URL, secure_parent_dir
from agent.credential_persistence import sanitize_borrowed_credential_payload
from utils import atomic_replace, atomic_yaml_write, is_truthy_value
logger = logging.getLogger(__name__)
@@ -196,9 +197,17 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
auth_type="oauth_external",
inference_base_url=DEFAULT_CODEX_BASE_URL,
),
"openai-api": ProviderConfig(
id="openai-api",
name="OpenAI API",
auth_type="api_key",
inference_base_url="https://api.openai.com/v1",
api_key_env_vars=("OPENAI_API_KEY",),
base_url_env_var="OPENAI_BASE_URL",
),
"xai-oauth": ProviderConfig(
id="xai-oauth",
name="xAI Grok OAuth (SuperGrok Subscription)",
name="xAI Grok OAuth (SuperGrok / Premium+)",
auth_type="oauth_external",
inference_base_url=DEFAULT_XAI_OAUTH_BASE_URL,
),
@@ -553,6 +562,7 @@ _PLACEHOLDER_SECRET_VALUES = {
"***",
"changeme",
"your_api_key",
"your_api_key_here",
"your-api-key",
"placeholder",
"example",
@@ -1167,14 +1177,23 @@ def read_credential_pool(provider_id: Optional[str] = None) -> Dict[str, Any]:
def write_credential_pool(provider_id: str, entries: List[Dict[str, Any]]) -> Path:
"""Persist one provider's credential pool under auth.json."""
"""Persist one provider's credential pool under auth.json.
This is the final disk-boundary guard for borrowed/reference-only
credentials. Callers may pass raw dictionaries, so sanitize here even when
``PooledCredential.to_dict()`` already did the same work upstream.
"""
with _auth_store_lock():
auth_store = _load_auth_store()
pool = auth_store.get("credential_pool")
if not isinstance(pool, dict):
pool = {}
auth_store["credential_pool"] = pool
pool[provider_id] = list(entries)
pool[provider_id] = [
sanitize_borrowed_credential_payload(entry, provider_id)
if isinstance(entry, dict) else entry
for entry in entries
]
return _save_auth_store(auth_store)
@@ -1559,6 +1578,67 @@ def _optional_base_url(value: Any) -> Optional[str]:
return cleaned if cleaned else None
# Allowlist of hosts the Nous Portal proxy is willing to forward minted
# bearer tokens to. The bearer is a long-lived agent_key minted by
# portal.nousresearch.com — sending it anywhere else would leak it.
#
# This is consulted only for URLs coming from the NETWORK side (Portal
# refresh / agent-key-mint responses). User-controlled env-var overrides
# (NOUS_INFERENCE_BASE_URL) bypass validation — that's the documented
# dev/staging escape hatch and the env source is already trusted (the
# user set it themselves).
_ALLOWED_NOUS_INFERENCE_HOSTS: FrozenSet[str] = frozenset({
"inference-api.nousresearch.com",
})
def _validate_nous_inference_url_from_network(url: Optional[str]) -> Optional[str]:
"""Validate a Portal-returned inference URL against the host allowlist.
Returns ``url`` (normalised by stripping trailing slashes) if it's a
well-formed ``https://<allowlisted-host>/...`` URL. Returns ``None``
if the URL is missing, malformed, non-https, or points at an
unexpected host letting the caller fall back to the configured
default rather than persist or forward a poisoned value.
Defense-in-depth: a compromised refresh / mint response from the
Portal API (MITM, malicious response injection) could otherwise
redirect every subsequent proxy request bearing the user's
legitimately-minted agent_key to an attacker-controlled endpoint.
Validating scheme + host at the source closes that loop before the
poisoned URL ever lands in ``auth.json``.
The env-var override path (``NOUS_INFERENCE_BASE_URL``) bypasses
this env values come from the trusted OS user, not from the
network, and the override is documented for staging/dev use.
Co-authored-by: memosr <mehmet.sr35@gmail.com>
"""
if not isinstance(url, str):
return None
cleaned = url.strip()
if not cleaned:
return None
try:
parsed = urlparse(cleaned)
except Exception:
return None
if parsed.scheme != "https":
logger.warning(
"nous: refusing non-https inference URL scheme %r from Portal response",
parsed.scheme,
)
return None
if parsed.hostname not in _ALLOWED_NOUS_INFERENCE_HOSTS:
logger.warning(
"nous: refusing inference URL host %r from Portal response "
"(not in allowlist); falling back to default",
parsed.hostname,
)
return None
return cleaned.rstrip("/")
def _decode_jwt_claims(token: Any) -> Dict[str, Any]:
if not isinstance(token, str) or token.count(".") != 2:
return {}
@@ -2004,7 +2084,10 @@ def resolve_qwen_runtime_credentials(
def get_qwen_auth_status() -> Dict[str, Any]:
auth_path = _qwen_cli_auth_path()
try:
creds = resolve_qwen_runtime_credentials(refresh_if_expiring=False)
# Validate the runtime credentials, including refresh when the cached
# CLI token is expired. Otherwise stale tokens show up as "logged in"
# and `hermes model` walks users into a broken Qwen setup flow.
creds = resolve_qwen_runtime_credentials(refresh_if_expiring=True)
return {
"logged_in": True,
"auth_file": str(auth_path),
@@ -2405,6 +2488,32 @@ def _make_xai_callback_handler(expected_path: str) -> tuple[type[BaseHTTPRequest
"error_description": params.get("error_description", [None])[0],
}
# Diagnostic logging — emits at INFO so reporters of loopback bugs
# (#27385 — "callback received but Hermes times out") can produce
# actionable evidence without a code change. Logged values are
# fingerprints / booleans only; no actual code/state strings leak
# into the log file. Run with ``HERMES_LOG_LEVEL=INFO`` (or check
# ``~/.hermes/logs/agent.log`` which captures INFO+ unconditionally).
try:
logger.info(
"xAI loopback callback received: path=%s has_code=%s has_state=%s has_error=%s "
"ua=%s",
parsed.path,
incoming["code"] is not None,
incoming["state"] is not None,
incoming["error"] is not None,
(self.headers.get("User-Agent") or "")[:80],
)
if incoming["error"]:
logger.info(
"xAI loopback callback carries error=%s error_description=%s",
incoming["error"],
(incoming["error_description"] or "")[:200],
)
except Exception:
# Logging must never break the OAuth flow.
pass
# Treat a hit on the callback path with neither `code` nor `error`
# as a missing OAuth callback (e.g. xAI's auth backend failed to
# redirect and the user navigated to the bare loopback URL by hand).
@@ -2509,6 +2618,17 @@ def _xai_wait_for_callback(
server.shutdown()
server.server_close()
thread.join(timeout=1.0)
# Diagnostic: distinguish "no callback ever arrived" from "callback
# arrived but result wasn't populated" (#27385). The per-hit handler
# also logs at INFO; if neither line appears, xAI's IDP never reached
# the loopback at all (firewall, port-binding, IPv6/IPv4 mismatch).
logger.info(
"xAI loopback wait timed out after %.0fs with no usable callback "
"(result.code=%s result.error=%s)",
max(5.0, timeout_seconds),
result["code"] is not None,
result["error"] is not None,
)
raise AuthError(
"xAI authorization timed out waiting for the local callback.",
provider="xai-oauth",
@@ -3342,7 +3462,7 @@ def _read_xai_oauth_tokens(*, _lock: bool = True) -> Dict[str, Any]:
state = _load_provider_state(auth_store, "xai-oauth")
if not state:
raise AuthError(
"No xAI OAuth credentials stored. Select xAI Grok OAuth (SuperGrok Subscription) in `hermes model`.",
"No xAI OAuth credentials stored. Select xAI Grok OAuth (SuperGrok / Premium+) in `hermes model`.",
provider="xai-oauth",
code="xai_auth_missing",
relogin_required=True,
@@ -4776,7 +4896,7 @@ def refresh_nous_oauth_pure(
state["refresh_token"] = refreshed.get("refresh_token") or state["refresh_token"]
state["token_type"] = refreshed.get("token_type") or state.get("token_type") or "Bearer"
state["scope"] = refreshed.get("scope") or state.get("scope")
refreshed_url = _optional_base_url(refreshed.get("inference_base_url"))
refreshed_url = _validate_nous_inference_url_from_network(refreshed.get("inference_base_url"))
if refreshed_url:
state["inference_base_url"] = refreshed_url
state["obtained_at"] = now.isoformat()
@@ -4812,7 +4932,7 @@ def refresh_nous_oauth_pure(
state["agent_key_expires_in"] = mint_payload.get("expires_in")
state["agent_key_reused"] = bool(mint_payload.get("reused", False))
state["agent_key_obtained_at"] = now.isoformat()
minted_url = _optional_base_url(mint_payload.get("inference_base_url"))
minted_url = _validate_nous_inference_url_from_network(mint_payload.get("inference_base_url"))
if minted_url:
state["inference_base_url"] = minted_url
@@ -5090,7 +5210,7 @@ def resolve_nous_runtime_credentials(
state["refresh_token"] = refreshed.get("refresh_token") or refresh_token
state["token_type"] = refreshed.get("token_type") or state.get("token_type") or "Bearer"
state["scope"] = refreshed.get("scope") or state.get("scope")
refreshed_url = _optional_base_url(refreshed.get("inference_base_url"))
refreshed_url = _validate_nous_inference_url_from_network(refreshed.get("inference_base_url"))
if refreshed_url:
inference_base_url = refreshed_url
state["obtained_at"] = now.isoformat()
@@ -5198,7 +5318,7 @@ def resolve_nous_runtime_credentials(
state["refresh_token"] = refreshed.get("refresh_token") or latest_refresh_token
state["token_type"] = refreshed.get("token_type") or state.get("token_type") or "Bearer"
state["scope"] = refreshed.get("scope") or state.get("scope")
refreshed_url = _optional_base_url(refreshed.get("inference_base_url"))
refreshed_url = _validate_nous_inference_url_from_network(refreshed.get("inference_base_url"))
if refreshed_url:
inference_base_url = refreshed_url
state["obtained_at"] = now.isoformat()
@@ -5253,7 +5373,7 @@ def resolve_nous_runtime_credentials(
state["agent_key_expires_in"] = mint_payload.get("expires_in")
state["agent_key_reused"] = bool(mint_payload.get("reused", False))
state["agent_key_obtained_at"] = now.isoformat()
minted_url = _optional_base_url(mint_payload.get("inference_base_url"))
minted_url = _validate_nous_inference_url_from_network(mint_payload.get("inference_base_url"))
if minted_url:
inference_base_url = minted_url
_oauth_trace(
@@ -6273,7 +6393,7 @@ def _login_xai_oauth(
pass
print()
print("Signing in to xAI Grok OAuth (SuperGrok Subscription)...")
print("Signing in to xAI Grok OAuth (SuperGrok / Premium+)...")
print("(Hermes creates its own local OAuth session)")
print()
@@ -7045,10 +7165,95 @@ def _refresh_minimax_oauth_state(
return new_state
def _minimax_oauth_quarantine_on_terminal_refresh(state: Dict[str, Any], exc: AuthError) -> None:
"""Wipe dead tokens from auth.json after a terminal refresh failure.
Shared by both the eager-resolve path and the lazy per-request token
provider. Mirrors the Nous / xAI-OAuth / Codex-OAuth quarantine pattern
so subsequent calls fail fast without a network retry.
"""
if not (exc.relogin_required and state.get("refresh_token")):
return
for _k in ("access_token", "refresh_token", "expires_at", "expires_in", "obtained_at"):
state.pop(_k, None)
state["last_auth_error"] = {
"provider": "minimax-oauth",
"code": exc.code or "refresh_failed",
"message": str(exc),
"reason": "runtime_refresh_failure",
"relogin_required": True,
"at": datetime.now(timezone.utc).isoformat(),
}
try:
_minimax_save_auth_state(state)
except Exception as _save_exc:
logger.debug("MiniMax OAuth: failed to persist quarantined state: %s", _save_exc)
def build_minimax_oauth_token_provider() -> Callable[[], str]:
"""Return a zero-arg callable that yields a fresh MiniMax access token.
The Anthropic SDK caches ``api_key`` as a static string at construction
time, so a session that resolves credentials once at startup will keep
sending the same bearer until MiniMax's server returns 401 — typically
~15 minutes in, because MiniMax issues short-lived access tokens.
Returning a *callable* instead of a string lets us hook into the
existing Entra-ID bearer infrastructure in
:mod:`agent.anthropic_adapter`: ``build_anthropic_client`` detects a
callable and routes through ``_build_anthropic_client_with_bearer_hook``,
which mints a fresh ``Authorization`` header on every outbound request.
Each invocation re-reads the persisted state from ``auth.json`` and
calls :func:`_refresh_minimax_oauth_state` that helper is a no-op
when the token still has more than ``MINIMAX_OAUTH_REFRESH_SKEW_SECONDS``
of life left, so the steady-state cost is one file read + one
timestamp compare per request.
Reading state fresh each time also means a refresh persisted by one
process (CLI, gateway, cron) is immediately visible to every other
process sharing the same ``auth.json``.
"""
def _provide() -> str:
state = get_provider_auth_state("minimax-oauth")
if not state or not state.get("access_token"):
raise AuthError(
"Not logged into MiniMax OAuth. Run `hermes model` and select "
"MiniMax (OAuth).",
provider="minimax-oauth", code="not_logged_in", relogin_required=True,
)
try:
state = _refresh_minimax_oauth_state(state)
except AuthError as exc:
_minimax_oauth_quarantine_on_terminal_refresh(state, exc)
raise
token = state.get("access_token")
if not token:
raise AuthError(
"MiniMax OAuth state has no access_token after refresh.",
provider="minimax-oauth", code="no_access_token", relogin_required=True,
)
return token
return _provide
def resolve_minimax_oauth_runtime_credentials(
*, min_token_ttl_seconds: int = MINIMAX_OAUTH_REFRESH_SKEW_SECONDS,
as_token_provider: bool = False,
) -> Dict[str, Any]:
"""Return {provider, api_key, base_url, source} for minimax-oauth."""
"""Return {provider, api_key, base_url, source} for minimax-oauth.
When ``as_token_provider`` is True, ``api_key`` is a zero-arg callable
that mints a fresh access token per call (proactively refreshing if
the cached token is within ``MINIMAX_OAUTH_REFRESH_SKEW_SECONDS`` of
expiry). This is what the runtime provider path uses so that long
sessions survive MiniMax's short access-token lifetime — see
:func:`build_minimax_oauth_token_provider` for the rationale.
The default (string ``api_key``) preserves the historical contract for
diagnostic call sites like ``hermes status`` that just want to know
whether a valid token exists right now.
"""
state = get_provider_auth_state("minimax-oauth")
if not state or not state.get("access_token"):
raise AuthError(
@@ -7059,28 +7264,15 @@ def resolve_minimax_oauth_runtime_credentials(
try:
state = _refresh_minimax_oauth_state(state)
except AuthError as exc:
if exc.relogin_required and state.get("refresh_token"):
# Terminal refresh failure — clear dead tokens from auth.json so
# subsequent calls fail fast without a network retry, mirroring
# the Nous / xAI-OAuth / Codex-OAuth quarantine pattern.
for _k in ("access_token", "refresh_token", "expires_at", "expires_in", "obtained_at"):
state.pop(_k, None)
state["last_auth_error"] = {
"provider": "minimax-oauth",
"code": exc.code or "refresh_failed",
"message": str(exc),
"reason": "runtime_refresh_failure",
"relogin_required": True,
"at": datetime.now(timezone.utc).isoformat(),
}
try:
_minimax_save_auth_state(state)
except Exception as _save_exc:
logger.debug("MiniMax OAuth: failed to persist quarantined state: %s", _save_exc)
_minimax_oauth_quarantine_on_terminal_refresh(state, exc)
raise
if as_token_provider:
api_key: Any = build_minimax_oauth_token_provider()
else:
api_key = state["access_token"]
return {
"provider": "minimax-oauth",
"api_key": state["access_token"],
"api_key": api_key,
"base_url": state["inference_base_url"].rstrip("/"),
"source": "oauth",
}
+2 -2
View File
@@ -2,7 +2,6 @@
from __future__ import annotations
from getpass import getpass
import math
import sys
import time
@@ -30,6 +29,7 @@ from agent.credential_pool import (
import hermes_cli.auth as auth_mod
from hermes_cli.auth import PROVIDER_REGISTRY
from hermes_constants import OPENROUTER_BASE_URL
from hermes_cli.secret_prompt import masked_secret_prompt
# Providers that support OAuth login in addition to API keys.
@@ -196,7 +196,7 @@ def auth_add_command(args) -> None:
if requested_type == AUTH_TYPE_API_KEY:
token = (getattr(args, "api_key", None) or "").strip()
if not token:
token = getpass("Paste your API key: ").strip()
token = masked_secret_prompt("Paste your API key: ").strip()
if not token:
raise SystemExit("No API key provided.")
default_label = _api_key_default_label(len(pool.entries()) + 1)
+18 -16
View File
@@ -85,6 +85,22 @@ def _should_exclude(rel_path: Path) -> bool:
return False
def _should_skip_backup_file(abs_path: Path, rel_path: Path, out_path: Path) -> bool:
"""Return True when a candidate file should not be written to a backup zip."""
if _should_exclude(rel_path):
return True
# zipfile.write() follows file symlinks, so skip links before any archive
# write can copy data from outside HERMES_HOME.
if abs_path.is_symlink():
return True
try:
return abs_path.resolve() == out_path.resolve()
except (OSError, ValueError):
return False
# ---------------------------------------------------------------------------
# SQLite safe copy
# ---------------------------------------------------------------------------
@@ -173,16 +189,9 @@ def run_backup(args) -> None:
fpath = dp / fname
rel = fpath.relative_to(hermes_root)
if _should_exclude(rel):
if _should_skip_backup_file(fpath, rel, out_path):
continue
# Skip the output zip itself if it happens to be inside hermes root
try:
if fpath.resolve() == out_path.resolve():
continue
except (OSError, ValueError):
pass
files_to_add.append((fpath, rel))
if not files_to_add:
@@ -726,16 +735,9 @@ def _write_full_zip_backup(out_path: Path, hermes_root: Path) -> Optional[Path]:
except ValueError:
continue
if _should_exclude(rel):
if _should_skip_backup_file(fpath, rel, out_path):
continue
# Skip the output zip itself if it already exists inside root.
try:
if fpath.resolve() == out_path.resolve():
continue
except (OSError, ValueError):
pass
files_to_add.append((fpath, rel))
except OSError as exc:
logger.warning("Full-zip backup: walk failed: %s", exc)
+2 -2
View File
@@ -8,10 +8,10 @@ with the TUI.
import queue
import time as _time
import getpass
from hermes_cli.banner import cprint, _DIM, _RST
from hermes_cli.config import save_env_value_secure
from hermes_cli.secret_prompt import masked_secret_prompt
from hermes_constants import display_hermes_home
@@ -75,7 +75,7 @@ def prompt_for_secret(cli, var_name: str, prompt: str, metadata=None) -> dict:
if not hasattr(cli, "_secret_deadline"):
cli._secret_deadline = 0
try:
value = getpass.getpass(f"{prompt} (hidden, ESC or empty Enter to skip): ")
value = masked_secret_prompt(f"{prompt} (hidden, ESC or empty Enter to skip): ")
except (EOFError, KeyboardInterrupt):
value = ""
+2 -3
View File
@@ -5,9 +5,8 @@ functions previously duplicated across setup.py, tools_config.py,
mcp_config.py, and memory_setup.py.
"""
import getpass
from hermes_cli.colors import Colors, color
from hermes_cli.secret_prompt import masked_secret_prompt
# ─── Print Helpers ────────────────────────────────────────────────────────────
@@ -59,7 +58,7 @@ def prompt(
try:
if password:
value = getpass.getpass(display)
value = masked_secret_prompt(display)
else:
value = input(display)
value = value.strip()
+2 -2
View File
@@ -164,7 +164,7 @@ COMMAND_REGISTRY: list[CommandDef] = [
cli_only=True),
CommandDef("skills", "Search, install, inspect, or manage skills",
"Tools & Skills", cli_only=True,
subcommands=("search", "browse", "inspect", "install")),
subcommands=("search", "browse", "inspect", "install", "audit")),
CommandDef("bundles", "List skill bundles (aliases /<name> for multiple skills)",
"Tools & Skills"),
CommandDef("cron", "Manage scheduled tasks", "Tools & Skills",
@@ -449,7 +449,7 @@ def _iter_plugin_command_entries() -> list[tuple[str, str, str]]:
:func:`hermes_cli.plugins.PluginContext.register_command`. They behave
like ``CommandDef`` entries for gateway surfacing: they appear in the
Telegram command menu, in Slack's ``/hermes`` subcommand mapping, and
(via :func:`gateway.platforms.discord._register_slash_commands`) in
(via :func:`plugins.platforms.discord.adapter._register_slash_commands`) in
Discord's native slash command picker.
Lookup is lazy so importing this module never forces plugin discovery
+66 -6
View File
@@ -26,6 +26,8 @@ from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Any, Optional, List, Tuple
from hermes_cli.secret_prompt import masked_secret_prompt
logger = logging.getLogger(__name__)
# Track which (config_path, mtime_ns, size) tuples we've already warned about
@@ -658,7 +660,8 @@ DEFAULT_CONFIG = {
# are owned by your host user instead of root, which avoids needing
# `sudo chown` after container runs. Default off to preserve behavior
# for images whose entrypoints expect to start as root (e.g. the
# bundled Hermes image, which drops to the `hermes` user via gosu).
# bundled Hermes image, which drops to the `hermes` user via
# s6-setuidgid inside each supervised service).
# When on, SETUID/SETGID caps are omitted from the container since
# no privilege drop is needed.
"docker_run_as_host_user": False,
@@ -1008,6 +1011,19 @@ DEFAULT_CONFIG = {
"compact": False,
"personality": "kawaii",
"resume_display": "full",
# Recap tuning for /resume and startup resume. The defaults match the
# historical hardcoded values; expose them as config so power users can
# widen or tighten the snapshot to taste.
"resume_exchanges": 10, # max user+assistant pairs to show
"resume_max_user_chars": 300, # truncate user message text
"resume_max_assistant_chars": 200, # truncate non-last assistant text
"resume_max_assistant_lines": 3, # truncate non-last assistant lines
# When True (default), assistant entries that are *only* tool calls
# (no visible text) are skipped in the recap. This prevents the recap
# from being dominated by `[2 tool calls: terminal, read_file]` lines
# when an exchange was tool-heavy. Set False to restore the legacy
# behavior of showing tool-call summaries inline.
"resume_skip_tool_only": True,
"busy_input_mode": "interrupt", # interrupt | queue | steer
# When true, `hermes --tui` auto-resumes the most recent human-
# facing session on launch instead of forging a fresh one.
@@ -1622,6 +1638,31 @@ DEFAULT_CONFIG = {
"force_ipv4": False,
},
# Gateway settings — control how messaging platforms (Telegram, Discord,
# Slack, etc.) deliver agent-produced files as native attachments.
"gateway": {
# Extra directories from which model-emitted bare file paths may be
# uploaded as native gateway attachments. Files inside the Hermes
# cache (~/.hermes/cache/{documents,images,audio,video,screenshots})
# are always trusted; this list adds operator-controlled roots
# (project dirs, scratch dirs, mounted shares). Accepts a list of
# absolute paths or a single os.pathsep-separated string. Bridged
# to HERMES_MEDIA_ALLOW_DIRS at gateway startup. Tilde paths are
# expanded.
"media_delivery_allow_dirs": [],
# When true, files whose mtime is within ``trust_recent_files_seconds``
# of "now" are trusted for native delivery even outside the cache /
# operator allowlist — useful for ``pandoc -o /tmp/report.pdf`` or
# PDFs the agent writes into a working directory. System paths
# (/etc, /proc, ~/.ssh, ~/.aws, etc.) remain blocked regardless.
# Disable to fall back to pure-allowlist mode. Bridged to
# HERMES_MEDIA_TRUST_RECENT_FILES.
"trust_recent_files": True,
# Recency window in seconds. 600 (10 min) comfortably covers a
# multi-tool agent turn. Bridged to HERMES_MEDIA_TRUST_RECENT_SECONDS.
"trust_recent_files_seconds": 600,
},
# Session storage — controls automatic cleanup of ~/.hermes/state.db.
# state.db accumulates every session, message, tool call, and FTS5 index
# entry forever. Without auto-pruning, a heavy user (gateway + cron)
@@ -1730,6 +1771,7 @@ DEFAULT_CONFIG = {
"servers": {},
},
# X (Twitter) Search via xAI's built-in x_search Responses tool.
# The tool registers when xAI credentials are available (SuperGrok
# OAuth or XAI_API_KEY) AND the x_search toolset is enabled in
@@ -1775,11 +1817,29 @@ DEFAULT_CONFIG = {
# ~/.hermes/bin/ on first use. When False you must install
# bws yourself and have it on PATH.
"auto_install": True,
# Bitwarden region / self-hosted endpoint. Empty string
# means use the bws CLI default (US Cloud,
# https://vault.bitwarden.com). Set to
# https://vault.bitwarden.eu for EU Cloud, or your own URL
# for self-hosted Bitwarden. Plumbed into the bws subprocess
# as BWS_SERVER_URL. Prompted for during
# `hermes secrets bitwarden setup`.
"server_url": "",
},
},
# Paste collapse thresholds (TUI + CLI).
# collapse_threshold: paste collapses to a file reference when line count
# exceeds this value (bracketed paste, safe: appends to existing text).
# collapse_threshold_fallback: same but for the fallback heuristic used
# by terminals without bracketed paste support (destructive: replaces
# entire buffer). 0 = disabled.
"paste_collapse_threshold": 5,
"paste_collapse_threshold_fallback": 0,
# Config schema version - bump this when adding new required fields
"_config_version": 23,
"_config_version": 24,
}
# =============================================================================
@@ -3982,8 +4042,7 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
print(f" Get your key at: {var['url']}")
if var.get("password"):
import getpass
value = getpass.getpass(f" {var['prompt']}: ")
value = masked_secret_prompt(f" {var['prompt']}: ")
else:
value = input(f" {var['prompt']}: ").strip()
@@ -4034,8 +4093,9 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
else:
print(f" {info.get('description', name)}")
if info.get("password"):
import getpass
value = getpass.getpass(f" {info.get('prompt', name)} (Enter to skip): ")
value = masked_secret_prompt(
f" {info.get('prompt', name)} (Enter to skip): "
)
else:
value = input(f" {info.get('prompt', name)} (Enter to skip): ").strip()
if value:
+325
View File
@@ -0,0 +1,325 @@
"""Container-boot reconciliation of per-profile gateway s6 services.
Service directories under /run/service/ live on **tmpfs** and are wiped
on every container restart. Profile directories under
``$HERMES_HOME/profiles/<name>/`` live on the persistent VOLUME, and
each one records its gateway's last state in ``gateway_state.json``.
This module bridges the two: on every container boot, walk the
persistent profiles, recreate the s6 service slots, and auto-start
only those whose last recorded state was ``running``.
Wired into the image as /etc/cont-init.d/02-reconcile-profiles by the
Dockerfile (Phase 4 Task 4.0). Runs as root after 01-hermes-setup
(the stage2 hook) has chowned the volume and seeded $HERMES_HOME, but
before s6-rc starts user services.
Without this module, every ``docker restart`` would silently wipe
every per-profile gateway, even though the user's profiles still
exist on disk.
"""
from __future__ import annotations
import json
import logging
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Literal
log = logging.getLogger(__name__)
# Only this prior state triggers automatic restart. Everything else
# (startup_failed, starting, stopped, missing) registers the slot in
# the down state and waits for explicit user action — this avoids the
# crash-loop where a broken gateway keeps being restarted across
# `docker restart` cycles.
_AUTOSTART_STATES = frozenset({"running"})
# Stale runtime files we sweep before recreating service slots. These
# all hold container-namespaced state (PIDs, process tables) that's
# garbage post-restart — a numerically-equal PID in the new container
# is a different process. See the Risk Register in the plan.
_STALE_RUNTIME_FILES = ("gateway.pid", "processes.json")
ReconcileActionLabel = Literal["started", "registered", "skipped"]
@dataclass(frozen=True)
class ReconcileAction:
"""One profile's outcome from a single reconciliation pass."""
profile: str
prior_state: str | None
action: ReconcileActionLabel
def reconcile_profile_gateways(
*,
hermes_home: Path,
scandir: Path,
dry_run: bool = False,
) -> list[ReconcileAction]:
"""Recreate s6 service registrations for every persistent profile.
Always registers a ``gateway-default`` slot for the root profile
(the implicit profile that lives at the top of ``$HERMES_HOME``,
not under ``profiles/``). The dispatcher in ``hermes_cli.gateway``
maps an empty profile suffix to ``gateway-default``, so this slot
is what ``hermes gateway start`` (no ``-p``) targets. Without it,
bare ``hermes gateway start`` inside the container would land on
``s6-svc -u /run/service/gateway-default`` uncaught
``CalledProcessError`` traceback to the user (PR #30136 review).
The default slot's prior state is read from
``$HERMES_HOME/gateway_state.json`` (sibling to the profile root,
not under ``profiles/``); stale runtime files there are swept the
same way as for named profiles.
Args:
hermes_home: The container's HERMES_HOME (typically /opt/data).
Profiles live under ``<hermes_home>/profiles/<name>/``;
the default profile lives at ``<hermes_home>`` itself.
scandir: The s6 dynamic scandir (typically /run/service). Service
directories are created at ``<scandir>/gateway-<profile>/``.
dry_run: When True, walk and return the action list without
touching the filesystem. For tests and `--dry-run` debug.
Returns:
One :class:`ReconcileAction` per profile, in this order:
``default`` first, then named profiles in directory order.
"""
actions: list[ReconcileAction] = []
# Default profile — always register, even if nothing has ever
# populated the root profile dir. The slot exists so
# ``hermes gateway start`` (no ``-p``) has somewhere to land;
# auto-up only when the prior state was "running" (same rule as
# named profiles).
default_prior_state = _read_prior_state(hermes_home)
default_should_start = default_prior_state in _AUTOSTART_STATES
if not dry_run:
_cleanup_stale_runtime_files(hermes_home)
_register_service(scandir, "default", start=default_should_start)
actions.append(ReconcileAction(
profile="default",
prior_state=default_prior_state,
action="started" if default_should_start else "registered",
))
profiles_root = hermes_home / "profiles"
if profiles_root.is_dir():
for entry in sorted(profiles_root.iterdir()):
if not entry.is_dir():
continue
# SOUL.md is always seeded by `hermes profile create` (config.yaml
# is not — that comes later via `hermes setup`). Use it as the
# "real profile" marker so stray dirs (backups, manual mkdir)
# aren't picked up.
if not (entry / "SOUL.md").exists():
continue
# The "default" service name is reserved for the root
# profile (above) — if a user has somehow created a
# ``profiles/default/`` directory, skip it to avoid the
# slot collision. Their gateway would still be reachable
# via ``hermes -p default-named gateway start`` if they
# rename the directory; we don't try to disambiguate here.
if entry.name == "default":
log.warning(
"profiles/default/ exists — skipping to avoid colliding "
"with the reserved root-profile s6 slot",
)
continue
prior_state = _read_prior_state(entry)
should_start = prior_state in _AUTOSTART_STATES
if not dry_run:
_cleanup_stale_runtime_files(entry)
_register_service(scandir, entry.name, start=should_start)
actions.append(ReconcileAction(
profile=entry.name,
prior_state=prior_state,
action="started" if should_start else "registered",
))
if not dry_run:
_write_reconcile_log(hermes_home, actions)
return actions
def _read_prior_state(profile_dir: Path) -> str | None:
"""Read gateway_state.json's ``gateway_state`` field, or None if
missing or unparseable. Unparseable counts as "no prior state" so
we don't bork the whole reconciliation on a corrupt file."""
state_file = profile_dir / "gateway_state.json"
if not state_file.exists():
return None
try:
return json.loads(state_file.read_text()).get("gateway_state")
except (OSError, json.JSONDecodeError):
log.warning(
"could not read %s; treating as no prior state", state_file,
)
return None
def _cleanup_stale_runtime_files(profile_dir: Path) -> None:
"""Remove gateway.pid and processes.json — they reference PIDs in
the dead container's process namespace and would otherwise confuse
the newly-started gateway's process-mismatch checks."""
for name in _STALE_RUNTIME_FILES:
(profile_dir / name).unlink(missing_ok=True)
def _register_service(scandir: Path, profile: str, *, start: bool) -> None:
"""Recreate the s6 service slot for one profile.
Mirrors the rendering in :func:`S6ServiceManager.register_profile_gateway`,
but here we control the start state directly via the ``down`` marker
file (s6-svscan honors it on rescan). Cannot use the manager
directly because the cont-init.d phase runs as root before
s6-svscan starts scanning the dynamic scandir the manager's
``s6-svscanctl -a`` call would fail with no control socket.
Atomicity: build the new layout in a sibling temp directory and
rename it into place via :meth:`Path.replace`. This matches
:meth:`S6ServiceManager.register_profile_gateway` (PR #30136
review item O4) even though cont-init.d runs before s6-svscan
starts scanning, an atomic publication keeps the contract uniform
between the two registration paths and protects against a
half-populated dir if the script is interrupted mid-write.
"""
import shutil
from hermes_cli.service_manager import (
S6ServiceManager,
_seed_supervise_skeleton,
validate_profile_name,
)
validate_profile_name(profile)
service_dir = scandir / f"gateway-{profile}"
tmp_dir = service_dir.with_name(service_dir.name + ".tmp")
# Wipe any leftover tmp from a previous interrupted run.
if tmp_dir.exists():
shutil.rmtree(tmp_dir, ignore_errors=True)
tmp_dir.mkdir(parents=True)
try:
(tmp_dir / "type").write_text("longrun\n")
# Reuse the manager's run-script rendering — single source of
# truth so register_profile_gateway and reconcile_profile_gateways
# stay consistent. extra_env is empty here; users who need
# per-profile env can set it via the profile's config.yaml
# (which the gateway itself loads).
run = tmp_dir / "run"
run.write_text(S6ServiceManager._render_run_script(profile, extra_env={}))
run.chmod(0o755)
# Persistent log rotation (OQ8-C).
log_subdir = tmp_dir / "log"
log_subdir.mkdir()
log_run = log_subdir / "run"
log_run.write_text(S6ServiceManager._render_log_run(profile))
log_run.chmod(0o755)
# The presence of a `down` file tells s6-supervise to NOT
# start the service when s6-svscan picks it up. User brings
# it up explicitly with `hermes -p <profile> gateway start`
# (which routes through the Phase 4
# _dispatch_via_service_manager_if_s6 helper to `s6-svc -u`).
if not start:
(tmp_dir / "down").touch()
# Pre-create the supervise/ skeleton with hermes ownership
# BEFORE we publish the slot. Mirrors the same pre-creation
# step in S6ServiceManager.register_profile_gateway — when
# s6-svscan picks the published slot up, the s6-supervise it
# spawns will EEXIST our dirs/FIFOs and inherit hermes
# ownership, so runtime s6-svc / s6-svstat / s6-svwait calls
# (all dispatched as the hermes user) won't hit EACCES. See
# ``_seed_supervise_skeleton`` in service_manager.py for the
# full rationale.
_seed_supervise_skeleton(tmp_dir)
# Publish atomically. Path.replace handles the existing-target
# case the same way os.rename does on POSIX: the target is
# silently replaced, so a previous reconcile pass's slot is
# cleanly overwritten in one operation.
if service_dir.exists():
shutil.rmtree(service_dir)
tmp_dir.replace(service_dir)
except Exception:
shutil.rmtree(tmp_dir, ignore_errors=True)
raise
def _write_reconcile_log(
hermes_home: Path, actions: list[ReconcileAction],
) -> None:
"""Append one line per profile to $HERMES_HOME/logs/container-boot.log.
Operators inspect this to debug "why didn't my profile come back
up". Keeping a separate log file (vs. mixing into agent.log) lets
troubleshooters grep for "profile=foo" without wading through
unrelated activity.
Size-bounded: when the file exceeds ``_LOG_ROTATE_BYTES``
(defaults to 256 KiB 3000 reconcile lines), the current file
is renamed to ``container-boot.log.1`` (replacing any previous
rotation) before the new entries are appended. This gives long-
lived containers a soft cap of ~512 KiB across the two files
without pulling in logrotate or s6-log machinery just for this
one append-only file (PR #30136 review item O3).
"""
import time
log_dir = hermes_home / "logs"
log_dir.mkdir(parents=True, exist_ok=True)
log_path = log_dir / "container-boot.log"
# Rotate before opening to append, so the new entries always land
# in a fresh file when we crossed the threshold last time.
try:
if log_path.exists() and log_path.stat().st_size >= _LOG_ROTATE_BYTES:
log_path.replace(log_dir / "container-boot.log.1")
except OSError as exc:
# Rotation failure is non-fatal — keep appending to the
# existing file rather than losing the entry entirely.
log.warning("could not rotate %s: %s", log_path, exc)
ts = time.strftime("%Y-%m-%dT%H:%M:%S%z")
with log_path.open("a", encoding="utf-8") as f:
for a in actions:
f.write(
f"{ts} profile={a.profile} prior_state={a.prior_state} "
f"action={a.action}\n"
)
# 256 KiB soft cap on container-boot.log; rotated to .1 when crossed.
# At ~80 B per reconcile-action line this is ~3000 lines, or about a
# year of daily reboots on a 5-profile container. Two files = ~512 KiB
# worst case. Tuned for visibility (small enough to grep / cat without
# scrolling forever) more than space (the persistent volume has GB).
_LOG_ROTATE_BYTES = 256 * 1024
def main() -> int:
"""Entry point invoked from /etc/cont-init.d/02-reconcile-profiles."""
hermes_home = Path(os.environ.get("HERMES_HOME", "/opt/data"))
scandir = Path(os.environ.get("S6_PROFILE_GATEWAY_SCANDIR", "/run/service"))
actions = reconcile_profile_gateways(
hermes_home=hermes_home, scandir=scandir,
)
for a in actions:
print(
f"reconcile: profile={a.profile} "
f"prior_state={a.prior_state} action={a.action}"
)
return 0
if __name__ == "__main__":
raise SystemExit(main())
+9 -1
View File
@@ -14,6 +14,7 @@ Currently supports:
import io
import json
import logging
import re
import sys
import time
import urllib.error
@@ -36,6 +37,12 @@ _REDACTION_BANNER = (
"run with --no-redact to disable]\n"
)
_EMAIL_ADDRESS_RE = re.compile(
r"(?<![A-Za-z0-9._%+-])"
r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
r"(?![A-Za-z0-9._%+-])"
)
# ---------------------------------------------------------------------------
# Paste services — try paste.rs first, dpaste.com as fallback.
@@ -398,7 +405,8 @@ def _redact_log_text(text: str) -> str:
return text
from agent.redact import redact_sensitive_text
return redact_sensitive_text(text, force=True)
text = redact_sensitive_text(text, force=True)
return _EMAIL_ADDRESS_RE.sub("[REDACTED_EMAIL]", text)
def _capture_log_snapshot(
+92 -1
View File
@@ -207,14 +207,69 @@ def _fail_and_issue(text: str, detail: str, fix: str, issues: list[str]) -> None
issues.append(fix)
def _check_s6_supervision(issues: list[str]) -> None:
"""Inside a container under our s6 /init, surface what s6 sees.
Runs as a counterpart to :func:`_check_gateway_service_linger` for
the systemd-on-host case. No-op everywhere except in the s6
container so host runs aren't cluttered with irrelevant output.
Reports:
- Whether the main-hermes and dashboard static services are up
- How many per-profile gateway slots are registered (via
``S6ServiceManager.list_profile_gateways()``) and how many are
currently supervised as ``up``
"""
try:
from hermes_cli.service_manager import (
S6ServiceManager,
detect_service_manager,
)
except Exception:
return
if detect_service_manager() != "s6":
return
_section("s6 Supervision")
mgr = S6ServiceManager()
# Static services. They live under /run/service/ via s6-rc symlinks,
# so the same s6-svstat probe works.
for static in ("main-hermes", "dashboard"):
if mgr.is_running(static):
check_ok(f"{static}: up")
else:
check_info(f"{static}: down (expected if not enabled via env)")
profiles = mgr.list_profile_gateways()
if not profiles:
check_info("No per-profile gateways registered yet — create one with `hermes profile create <name>`")
return
up_count = sum(1 for p in profiles if mgr.is_running(f"gateway-{p}"))
check_ok(
f"Per-profile gateways: {up_count}/{len(profiles)} supervised up"
+ (f" ({', '.join(sorted(profiles))})" if len(profiles) <= 8 else "")
)
def _check_gateway_service_linger(issues: list[str]) -> None:
"""Warn when a systemd user gateway service will stop after logout."""
"""Warn when a systemd user gateway service will stop after logout.
Skipped inside a container running under s6 the linger concept
(user-systemd surviving SSH logout) doesn't apply there, and the
s6 supervision state is surfaced separately by
``_check_s6_supervision``.
"""
try:
from hermes_cli.gateway import (
get_systemd_linger_status,
get_systemd_unit_path,
is_linux,
)
from hermes_cli.service_manager import detect_service_manager
except Exception as e:
check_warn("Gateway service linger", f"(could not import gateway helpers: {e})")
return
@@ -222,6 +277,12 @@ def _check_gateway_service_linger(issues: list[str]) -> None:
if not is_linux():
return
# Inside a container under our s6 /init, _check_s6_supervision
# reports the live supervision state; the linger warning would be
# confusing here (no systemd, no logout, no "lingering" concept).
if detect_service_manager() == "s6":
return
unit_path = get_systemd_unit_path()
if not unit_path.exists():
return
@@ -508,6 +569,13 @@ def run_doctor(args):
if should_fix:
env_path.parent.mkdir(parents=True, exist_ok=True)
env_path.touch()
# .env holds API keys — restrict to owner-only access from
# creation. touch() obeys umask which is commonly 0o022,
# leaving the file world-readable; tighten explicitly.
try:
os.chmod(str(env_path), 0o600)
except OSError:
pass
check_ok(f"Created empty {_DHH}/.env")
check_info("Run 'hermes setup' to configure API keys")
fixed_count += 1
@@ -984,6 +1052,7 @@ def run_doctor(args):
pass
_check_gateway_service_linger(issues)
_check_s6_supervision(issues)
if sys.platform != "win32":
_section("Command Installation")
@@ -1076,6 +1145,26 @@ def run_doctor(args):
# Docker (optional)
terminal_env = os.getenv("TERMINAL_ENV", "local")
try:
from hermes_constants import is_container as _is_container
running_in_container = _is_container()
except Exception:
running_in_container = False
if running_in_container:
# Inside our container the Docker terminal backend is not
# configured by default (Docker-in-Docker isn't set up); the
# local backend is the intended one. Skip the noisy "docker
# not found" warning. If the user has explicitly chosen
# TERMINAL_ENV=docker inside the container they likely mounted
# /var/run/docker.sock, so fall through to the normal check.
if terminal_env != "docker":
check_info(
"Running inside a container — using local terminal backend "
"(docker-in-docker is not configured by default)"
)
# Skip to next section; Docker isn't relevant here.
terminal_env = "local"
if terminal_env == "docker":
if _safe_which("docker"):
# Check if docker daemon is running
@@ -1098,6 +1187,8 @@ def run_doctor(args):
check_ok("docker", "(optional)")
elif _is_termux():
check_info("Docker backend is not available inside Termux (expected on Android)")
elif running_in_container:
pass # already explained above
else:
check_warn("docker not found", "(optional)")
+14 -2
View File
@@ -36,7 +36,9 @@ def get_secret_source(env_var: str) -> str | None:
Returns ``"bitwarden"`` for keys pulled from Bitwarden Secrets Manager
during the current process's ``load_hermes_dotenv()`` call. Returns
``None`` for keys that came from ``.env``, the shell environment, or
aren't tracked.
aren't tracked. The returned label is metadata only: credential-pool
persistence may store it to explain the origin of a borrowed secret, but
must never treat it as authorization to persist the raw value.
"""
return _SECRET_SOURCES.get(env_var)
@@ -140,6 +142,10 @@ def _sanitize_env_file_if_needed(path: Path) -> None:
This produces mangled values e.g. a bot token duplicated 8×
(see #8908).
Also strips embedded null bytes which crash ``os.environ[k] = v``
with ``ValueError: embedded null byte`` typically introduced by
copy-pasting API keys from terminals or rich-text editors.
We delegate to ``hermes_cli.config._sanitize_env_lines`` which
already knows all valid Hermes env-var names and can split
concatenated lines correctly.
@@ -155,7 +161,11 @@ def _sanitize_env_file_if_needed(path: Path) -> None:
try:
with open(path, **read_kw) as f:
original = f.readlines()
sanitized = _sanitize_env_lines(original)
# Strip null bytes before _sanitize_env_lines so they never
# reach python-dotenv (which passes them to os.environ and
# crashes with ValueError).
stripped = [line.replace("\x00", "") for line in original]
sanitized = _sanitize_env_lines(stripped)
if sanitized != original:
import tempfile
fd, tmp = tempfile.mkstemp(
@@ -244,6 +254,8 @@ def _apply_external_secret_sources(home_path: Path) -> None:
override_existing=bool(bw_cfg.get("override_existing", False)),
cache_ttl_seconds=float(bw_cfg.get("cache_ttl_seconds", 300)),
auto_install=bool(bw_cfg.get("auto_install", True)),
server_url=str(bw_cfg.get("server_url", "") or "").strip(),
home_path=home_path,
)
if result.applied:
+6 -13
View File
@@ -21,6 +21,8 @@ from __future__ import annotations
import copy
from typing import Any, Dict, List, Optional
from hermes_cli.fallback_config import get_fallback_chain
# ---------------------------------------------------------------------------
# Helpers
@@ -30,20 +32,11 @@ def _read_chain(config: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Return the normalized fallback chain as a list of dicts.
Accepts both the new list format (``fallback_providers``) and the legacy
single-dict format (``fallback_model``). The returned list is always a
fresh copy callers can mutate without touching the config dict.
``fallback_model`` format. When both are present, the effective chain is
merged with ``fallback_providers`` entries kept first. The returned list is
always a fresh copy callers can mutate without touching the config dict.
"""
chain = config.get("fallback_providers") or []
if isinstance(chain, list):
result = [dict(e) for e in chain if isinstance(e, dict) and e.get("provider") and e.get("model")]
if result:
return result
legacy = config.get("fallback_model")
if isinstance(legacy, dict) and legacy.get("provider") and legacy.get("model"):
return [dict(legacy)]
if isinstance(legacy, list):
return [dict(e) for e in legacy if isinstance(e, dict) and e.get("provider") and e.get("model")]
return []
return get_fallback_chain(config)
def _write_chain(config: Dict[str, Any], chain: List[Dict[str, Any]]) -> None:
+72
View File
@@ -0,0 +1,72 @@
"""Helpers for reading the effective fallback provider chain from config."""
from __future__ import annotations
from typing import Any
def _normalized_base_url(value: Any) -> str:
if not isinstance(value, str):
return ""
return value.strip().rstrip("/")
def _iter_fallback_entries(raw: Any) -> list[dict[str, Any]]:
if isinstance(raw, dict):
candidates = [raw]
elif isinstance(raw, list):
candidates = raw
else:
return []
entries: list[dict[str, Any]] = []
for entry in candidates:
if not isinstance(entry, dict):
continue
provider = str(entry.get("provider") or "").strip()
model = str(entry.get("model") or "").strip()
if not provider or not model:
continue
normalized = dict(entry)
normalized["provider"] = provider
normalized["model"] = model
base_url = _normalized_base_url(entry.get("base_url"))
if base_url:
normalized["base_url"] = base_url
entries.append(normalized)
return entries
def _entry_identity(entry: dict[str, Any]) -> tuple[str, str, str]:
return (
str(entry.get("provider") or "").strip().lower(),
str(entry.get("model") or "").strip().lower(),
_normalized_base_url(entry.get("base_url")).lower(),
)
def get_fallback_chain(config: dict[str, Any] | None) -> list[dict[str, Any]]:
"""Return the effective fallback chain merged across old and new config keys.
``fallback_providers`` remains the primary source of truth and keeps its
order. Legacy ``fallback_model`` entries are appended afterwards unless
they target the same provider/model/base_url route as an earlier entry.
The returned list always contains fresh dict copies.
"""
config = config or {}
chain: list[dict[str, Any]] = []
seen: set[tuple[str, str, str]] = set()
for key in ("fallback_providers", "fallback_model"):
for entry in _iter_fallback_entries(config.get(key)):
identity = _entry_identity(entry)
if identity in seen:
continue
seen.add(identity)
chain.append(entry)
return chain
+194 -36
View File
@@ -981,6 +981,18 @@ def get_gateway_runtime_snapshot(system: bool = False) -> GatewayRuntimeSnapshot
from hermes_constants import is_container
if is_linux() and is_container():
# Phase 4: report s6 supervision when running under our /init.
# Other container runtimes (or containers built before Phase 2)
# still get the original "docker (foreground)" label.
try:
from hermes_cli.service_manager import detect_service_manager
if detect_service_manager() == "s6":
return GatewayRuntimeSnapshot(
manager="s6 (container supervisor)",
gateway_pids=gateway_pids,
)
except Exception:
pass # Fall through to the legacy label on any detection error.
return GatewayRuntimeSnapshot(
manager="docker (foreground)",
gateway_pids=gateway_pids,
@@ -1202,7 +1214,17 @@ def _systemd_operational(system: bool = False) -> bool:
def _container_systemd_operational() -> bool:
"""Return True when a container exposes working user or system systemd."""
"""Return True when a container exposes working user or system systemd.
This is NOT our Hermes Docker image that one runs s6-overlay as
PID 1 (since Phase 2 of the s6-overlay supervision plan) and is
detected via ``service_manager.detect_service_manager() == "s6"``.
This function handles the "container managed by something else"
case: systemd-nspawn, certain k8s pods, containers built FROM
systemd-bearing distros where the user has wired systemd as their
init. In those environments systemctl behaves identically to the
host case, so we fall through to the normal systemd code paths.
"""
if _systemd_operational(system=False):
return True
if _systemd_operational(system=True):
@@ -3327,34 +3349,9 @@ _PLATFORMS = [
"help": "For DMs, this is your user ID. You can set it later by typing /set-home in chat."},
],
},
{
"key": "discord",
"label": "Discord",
"emoji": "💬",
"token_var": "DISCORD_BOT_TOKEN",
"setup_instructions": [
"1. Go to https://discord.com/developers/applications → New Application",
"2. Go to Bot → Reset Token → copy the bot token",
"3. Enable: Bot → Privileged Gateway Intents → Message Content Intent",
"4. Invite the bot to your server:",
" OAuth2 → URL Generator → check BOTH scopes:",
" - bot",
" - applications.commands (required for slash commands!)",
" Bot Permissions: Send Messages, Read Message History, Attach Files",
" Copy the URL and open it in your browser to invite.",
"5. Get your user ID: enable Developer Mode in Discord settings,",
" then right-click your name → Copy ID",
],
"vars": [
{"name": "DISCORD_BOT_TOKEN", "prompt": "Bot token", "password": True,
"help": "Paste the token from step 2 above."},
{"name": "DISCORD_ALLOWED_USERS", "prompt": "Allowed user IDs or usernames (comma-separated)", "password": False,
"is_allowlist": True,
"help": "Paste your user ID from step 5 above."},
{"name": "DISCORD_HOME_CHANNEL", "prompt": "Home channel ID (for cron/notification delivery, or empty to set later with /set-home)", "password": False,
"help": "Right-click a channel → Copy Channel ID (requires Developer Mode)."},
],
},
# Discord moved to plugins/platforms/discord/ — its setup metadata is
# discovered dynamically via _all_platforms() from the platform registry
# entry registered by plugins/platforms/discord/adapter.py::register().
{
"key": "slack",
"label": "Slack",
@@ -3762,7 +3759,12 @@ def _platform_status(platform: dict) -> str:
configured = bool(entry.is_connected(synthetic))
except Exception:
configured = False
if not configured:
else:
# No is_connected hook — fall back to check_fn as a coarse
# "are deps present" gate. Don't fall back when is_connected
# is defined and returned False; that would let "SDK is
# installed" override "no token configured" and incorrectly
# report the platform as ready.
try:
configured = bool(entry.check_fn())
except Exception:
@@ -4018,15 +4020,11 @@ def _setup_dingtalk():
client_id, client_secret = result
save_env_value("DINGTALK_CLIENT_ID", client_id)
save_env_value("DINGTALK_CLIENT_SECRET", client_secret)
save_env_value("DINGTALK_ALLOW_ALL_USERS", "true")
print()
print_success(f"{emoji} {label} configured via QR scan!")
else:
# ── Manual entry ──
_setup_standard_platform(dingtalk_platform)
# Also enable allow-all by default for convenience
if get_env_value("DINGTALK_CLIENT_ID"):
save_env_value("DINGTALK_ALLOW_ALL_USERS", "true")
def _setup_wecom():
@@ -4747,10 +4745,14 @@ def _builtin_setup_fn(key: str):
from hermes_cli import setup as _s
return {
"telegram": _s._setup_telegram,
"discord": _s._setup_discord,
# discord moved into the plugin: setup_fn is registered by
# plugins/platforms/discord/adapter.py::register() and dispatched
# via the plugin path in _configure_platform().
"slack": _s._setup_slack,
"matrix": _s._setup_matrix,
"mattermost": _s._setup_mattermost,
# mattermost moved into the plugin: setup_fn is registered by
# plugins/platforms/mattermost/adapter.py::register() and dispatched
# via the plugin path in _configure_platform().
"bluebubbles": _s._setup_bluebubbles,
"webhooks": _s._setup_webhooks,
"signal": _setup_signal,
@@ -5025,6 +5027,108 @@ def gateway_setup():
# Main Command Handler
# =============================================================================
def _dispatch_via_service_manager_if_s6(
action: str, profile: str | None = None,
) -> bool:
"""If we're in a container with s6, dispatch gateway lifecycle via s6.
Returns True iff dispatched (caller should ``return``); False
otherwise caller continues with the host-side code path.
``action`` is one of ``start`` / ``stop`` / ``restart``. The
profile defaults to the current one (resolved via ``_profile_arg``).
The s6 service slot was created either by the Phase 4 profile-create
hook or by the container-boot reconciler (cont-init.d/02-). If it
doesn't exist or s6 returns an error, the named errors from
:mod:`hermes_cli.service_manager` are caught and surfaced as
actionable CLI messages (no raw ``CalledProcessError`` traceback).
"""
from hermes_cli.service_manager import (
GatewayNotRegisteredError,
S6CommandError,
detect_service_manager,
get_service_manager,
)
if detect_service_manager() != "s6":
return False
if profile is None:
# _profile_suffix() returns the bare profile name for
# HERMES_HOME=<root>/profiles/<name>, "" for the default root,
# or a hash for unrelated paths. Map "" → "default" so the
# default-profile gateway is reachable as gateway-default.
profile = _profile_suffix() or "default"
mgr = get_service_manager()
service_name = f"gateway-{profile}"
try:
if action == "start":
mgr.start(service_name)
elif action == "stop":
mgr.stop(service_name)
elif action == "restart":
mgr.restart(service_name)
else:
return False
except GatewayNotRegisteredError as exc:
print(f"{exc}")
sys.exit(1)
except S6CommandError as exc:
print(f"{exc}")
sys.exit(1)
return True
def _dispatch_all_via_service_manager_if_s6(action: str) -> bool:
"""Inside a container with s6, dispatch ``--all`` lifecycle to every
registered profile gateway.
Returns True iff dispatched (caller should ``return``); False
otherwise caller continues with the host-side code path.
Without this, ``hermes gateway stop --all`` and ``... restart --all``
fall through to ``kill_gateway_processes(all_profiles=True)``, which
just ``pkill``s every gateway process. s6-supervise observes the
crash and restarts each one ~1s later so ``--all`` ends up
*kicking* every gateway instead of *stopping* it. By iterating
``list_profile_gateways()`` and sending the lifecycle command
through the service manager we get the intended semantics (s6's
``want up``/``want down`` flips correctly so supervise stays down
after a stop).
``action`` is one of ``stop`` / ``restart`` (``start --all`` isn't
a supported CLI surface).
"""
from hermes_cli.service_manager import (
detect_service_manager,
get_service_manager,
)
if detect_service_manager() != "s6":
return False
if action not in ("stop", "restart"):
return False
mgr = get_service_manager()
profiles = mgr.list_profile_gateways()
if not profiles:
print("✗ No profile gateways registered under s6")
return True
fn = mgr.stop if action == "stop" else mgr.restart
errors: list[tuple[str, Exception]] = []
for profile in profiles:
service_name = f"gateway-{profile}"
try:
fn(service_name)
except Exception as exc: # noqa: BLE001 — report and continue
errors.append((profile, exc))
succeeded = len(profiles) - len(errors)
verb = "stopped" if action == "stop" else "restarted"
if succeeded:
print(f"{verb.capitalize()} {succeeded} profile gateway(s) under s6")
for profile, exc in errors:
print(f"✗ Could not {action} gateway-{profile}: {exc}")
return True
def gateway_command(args):
"""Handle gateway subcommands."""
try:
@@ -5109,6 +5213,21 @@ def _gateway_command_inner(args):
print(" nohup hermes gateway run > ~/.hermes/logs/gateway.log 2>&1 & # background")
sys.exit(1)
elif is_container():
# Phase 4: inside a container with s6 the gateway service is
# auto-registered when the profile is created (and reconciled
# at every container boot). `install` is therefore informational.
from hermes_cli.service_manager import detect_service_manager
if detect_service_manager() == "s6":
print("Per-profile gateways are auto-registered when you create a profile.")
print()
print(" hermes profile create <name> # creates the s6 service slot")
print(" hermes -p <name> gateway start # bring it up via s6")
print(" hermes status # see currently-supervised gateways")
return
# Fallback for pre-s6 containers or other container runtimes
# we haven't taught about supervision (Podman without our
# /init, k8s plain runs, etc.) — the historical guidance still
# applies.
print("Service installation is not needed inside a Docker container.")
print("The container runtime is your service manager — use Docker restart policies instead:")
print()
@@ -5139,6 +5258,13 @@ def _gateway_command_inner(args):
from hermes_cli import gateway_windows
gateway_windows.uninstall()
elif is_container():
from hermes_cli.service_manager import detect_service_manager
if detect_service_manager() == "s6":
print("Per-profile gateways are auto-unregistered when you delete the profile.")
print()
print(" hermes profile delete <name> # tears down the s6 service slot")
print(" hermes -p <name> gateway stop # stop without deleting the profile")
return
print("Service uninstall is not applicable inside a Docker container.")
print("To stop the gateway, stop or remove the container:")
print()
@@ -5153,6 +5279,14 @@ def _gateway_command_inner(args):
system = getattr(args, 'system', False)
start_all = getattr(args, 'all', False)
# Phase 4: inside a container with s6, dispatch via the service
# manager instead of falling through to systemd/launchd/windows.
# `--all` isn't meaningful here (each profile has its own service
# slot — start them individually via `hermes -p <name> gateway
# start`), so just bring up the current profile's slot.
if not start_all and _dispatch_via_service_manager_if_s6("start"):
return
if start_all:
# Kill all stale gateway processes across all profiles before starting
killed = kill_gateway_processes(all_profiles=True)
@@ -5182,6 +5316,11 @@ def _gateway_command_inner(args):
print("To enable systemd: add systemd=true to /etc/wsl.conf and run 'wsl --shutdown' from PowerShell.")
sys.exit(1)
elif is_container():
# Reached only when s6 ISN'T running (the early dispatch
# above handles the s6 case). Pre-s6 containers or other
# container runtimes that don't ship our /init get the
# historical guidance: the gateway is the container's main
# process, so use docker lifecycle commands.
print("Service start is not applicable inside a Docker container.")
print("The gateway runs as the container's main process.")
print()
@@ -5198,6 +5337,15 @@ def _gateway_command_inner(args):
stop_all = getattr(args, 'all', False)
system = getattr(args, 'system', False)
# Phase 4: inside a container with s6, dispatch via the service
# manager. ``--all`` iterates every registered profile gateway
# through s6 (otherwise it would fall through to ``pkill``,
# which s6-supervise observes as a crash and immediately restarts).
if stop_all and _dispatch_all_via_service_manager_if_s6("stop"):
return
if not stop_all and _dispatch_via_service_manager_if_s6("stop"):
return
if stop_all:
# --all: kill every gateway process on the machine
service_available = False
@@ -5267,6 +5415,16 @@ def _gateway_command_inner(args):
restart_all = getattr(args, 'all', False)
service_configured = False
# Phase 4: inside a container with s6, dispatch via the service
# manager (s6-svc -t restarts the supervised process). ``--all``
# iterates every registered profile gateway through s6; without
# this it would fall through to ``pkill``, which s6-supervise
# would observe as a crash and immediately restart anyway.
if restart_all and _dispatch_all_via_service_manager_if_s6("restart"):
return
if not restart_all and _dispatch_via_service_manager_if_s6("restart"):
return
if restart_all:
# --all: stop every gateway process across all profiles, then start fresh
service_stopped = False
+6 -2
View File
@@ -365,7 +365,9 @@ def _write_task_script() -> Path:
content = _build_gateway_cmd_script(python_path, working_dir, hermes_home, profile_arg)
script_path = get_task_script_path()
script_path.write_text(content, encoding="utf-8", newline="")
tmp = script_path.with_suffix(".tmp")
tmp.write_text(content, encoding="utf-8", newline="")
tmp.replace(script_path)
return script_path
@@ -436,7 +438,9 @@ def _install_startup_entry(script_path: Path) -> Path:
"""Write the Startup-folder fallback launcher. Returns its path."""
entry = get_startup_entry_path()
entry.parent.mkdir(parents=True, exist_ok=True)
entry.write_text(_build_startup_launcher(script_path), encoding="utf-8", newline="")
tmp = entry.with_suffix(".tmp")
tmp.write_text(_build_startup_launcher(script_path), encoding="utf-8", newline="")
tmp.replace(entry)
return entry

Some files were not shown because too many files have changed in this diff Show More