The cherry-pick comment referenced 'line ~6771' for the /stop handler,
but on current main the handler is at a different offset. Remove the
hard-coded line number — the 'above' reference is sufficient.
17 new tests in tests/gateway/test_subagent_protection_30170.py pin
down both the detection helper and the demotion behaviour:
* TestAgentHasActiveSubagents — 11 cases covering the precision and
defensiveness of _agent_has_active_subagents:
- returns False for None, _AGENT_PENDING_SENTINEL, and stub
agents that lack the _active_children attribute;
- returns False for an empty list (the steady state of an idle
AIAgent);
- returns True for one or many children;
- works when _active_children_lock is None (test stubs);
- rejects truthy MagicMock auto-attributes — this is the
regression-guard for "every MagicMock-based gateway test
suddenly demotes to queue mode" (which is how this was
originally found);
- accepts list/tuple/set as the children container.
* TestBusyHandlerDemotesInterruptForSubagents — 6 cases driving
_handle_active_session_busy_message directly:
- parent.interrupt is NOT called when subagents are active,
message is still merged into the pending queue;
- ack copy mentions "Subagent working", "queued", and the
/stop escape hatch — and does NOT mention "Interrupting";
- with no subagents, behaviour is byte-identical to the
pre-#30170 interrupt path (parent.interrupt called with the
user text, ack says "Interrupting");
- configured queue mode keeps its vanilla "Queued for the next
turn" ack (the #30170 demotion-specific copy must NOT fire);
- configured steer mode still routes to running_agent.steer()
even when subagents are active (the guard is interrupt-only);
- _AGENT_PENDING_SENTINEL does not trigger demotion.
Refs #30170.
When a user sends a conversational follow-up while delegate_task is
running, gateway/run.py calls running_agent.interrupt(event.text) on
the PARENT agent. AIAgent.interrupt() then cascades synchronously
through self._active_children and calls interrupt() on every child
subagent, aborting in-flight delegate_task work. The user sees the
fallback cascade with no root-cause in the gateway log, and minutes of
subagent progress are destroyed — the exact failure mode reported in
Add GatewayRunner._agent_has_active_subagents(running_agent) — a
static helper that returns True iff the parent is currently driving
subagents via delegate_task. The helper is type-defensive: it ignores
truthy MagicMock auto-attributes (so this doesn't accidentally fire
in every test mock that hits the busy path), the _AGENT_PENDING_SENTINEL
placeholder, and missing locks.
Wire the helper into both interrupt branches:
1. _handle_active_session_busy_message — the adapter-level busy
handler. When busy_input_mode == 'interrupt' AND the parent has
active subagents, demote to 'queue' semantics: skip the
parent.interrupt() call, merge the message into the pending
queue, and surface a dedicated ack ("⏳ Subagent working — your
message is queued for when it finishes (use /stop to cancel
everything).") so the operator knows the message wasn't lost and
discovers the explicit escape hatch.
2. The PRIORITY interrupt branch inside _handle_message — the
non-command fast path. Same rationale, same demotion. Routes
through _queue_or_replace_pending_event so the next-turn pickup
stays unchanged.
Explicit /stop and /new commands take a completely different path
(_interrupt_and_clear_session in the slash-command dispatch at line
~6771) and are NOT affected by this guard — the operator still has a
way to force-cancel everything when they actually mean it. Configured
'queue' and 'steer' modes are also untouched: 'queue' already does the
right thing, and 'steer' goes through running_agent.steer() which does
NOT cascade to children (so subagents survive a steer too).
This is Phase 1 of the fix outlined in #30170 — the minimum viable
change that stops subagent loss. Phase 2 (delegation-aware steer
forwarding to active children) and Phase 3 (async delegation, #11508)
are intentionally out of scope.
Refs #30170.
* fix(tui): delineate assistant responses from details
Add a muted Response marker before assistant text when thinking/tool details are visible so reasoning and final output do not visually run together.
* fix(tui): account for response separator height
Keep virtual transcript estimates aligned with the new response separator and avoid allocating trimmed copies of long assistant text.
* fix(tui): gate response separator estimate on details
Only add response-separator height when assistant details actually render, and use a non-allocating body-text check.
* fix(tui): skip empty detail height estimates
Do not add virtual transcript height for assistant details when no thinking or tool detail UI will render.
* fix(tui): estimate details by section visibility
Pass resolved thinking/tool visibility into virtual height estimates so hidden detail sections do not reserve response-separator rows.
After key #1 is marked exhausted the retry still called the API with key #1
due to env-var bias in _get_cached_client / resolve_api_key_provider_credentials.
Fix: peek the pool and pass the active entry's key as explicit_api_key.
Secondary: api_key_hint in mark_exhausted_and_rotate pins the correct entry
under concurrent CLI+gateway calls; _is_payment_error matches GoUsageLimitError;
extract_api_error_context parses "Resets in Xhr Ymin".
Adds two new config keys:
- paste_collapse_threshold (default: 5) — line count threshold for
bracketed paste collapse in both TUI and CLI
- paste_collapse_threshold_fallback (default: 0, disabled) — same for
the fallback heuristic in terminals without bracketed paste support
TUI frontend reads these from config.get full via applyDisplay/patchUiState.
CLI reads from self.config at paste-handling time.
Closes#5626
Related: #5623
Closes#26145.
When the user interrupts the retry loop between two 429s (Ctrl-C in
interactive mode, /new, gateway disconnect), the local has_retried_429
flag dies with the recovery function. On the next user prompt the agent
restarts with has_retried_429=False, hits 429 on the exhausted credential,
sets the flag, returns 'retry once'. Repeat forever — the second 429 that
would trigger rotation is never reached, and healthy entries (priority>0
free/paid accounts) are never tried.
Fix: in recover_with_credential_pool's rate_limit branch, pre-check
pool.current().last_status before running the retry-once dance. If the
current entry is already STATUS_EXHAUSTED, rotate immediately. Uses
getattr() for the attribute read so existing tests with SimpleNamespace
mocks (which only set 'label') keep working.
Co-authored-by: zccyman <16263913+zccyman@users.noreply.github.com>
The new install-path validator from this PR raises 'Unsafe install path:
...' earlier in the pipeline than the previous resolve-then-check path.
Behavior is identical (ok=False, victim untouched, refused before
rmtree) — only the error string changed.
Validate Skills Hub lock-file install paths at both ends of the
lifecycle so a poisoned or malformed lock.json entry cannot drive
shutil.rmtree to a location outside SKILLS_DIR:
- HubLockFile.record_install rejects empty/'.'/absolute/traversal/
Windows-drive paths at write time, and requires the final path
component to match the skill name (shape: '<skill>' or
'<category>/<skill>').
- install_from_quarantine resolves its destination through the same
validator, catching symlink/junction redirects inside skills/.
- uninstall_skill resolves the lock entry through the new validator
before rmtree. Refuses anything that resolves to SKILLS_DIR itself
(empty/dot paths) or to a target outside SKILLS_DIR (absolute paths,
traversal, symlinked dirs in skills/ pointing outward).
- 14 focused regression tests covering each rejection class plus a
symlink-redirect case.
E2E verified: hand-crafted poisoned lock.json entries (absolute path,
empty install_path, traversal) all refuse and leave the targeted
victim untouched; legitimate uninstall still succeeds.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Nous Portal is OAuth-only (auth_type=oauth_device_code, no API key path),
but the non-retryable-401 guidance branch only covered openai-codex and
xai-oauth. A Nous 401 fell through to the generic 'Your API key was
rejected... run hermes setup' message, which is wrong advice — the user
needs hermes auth add nous --type oauth, not an API key.
Also flag the case where the failing model slug ends in :free (OpenRouter
syntax) while provider is nous. Without that hint, users re-OAuth
successfully and then hit the same 401 on the next message because Nous
Portal doesn't carry the OpenRouter free-tier slug.
Reported by ashh — debug dump showed Nous device_code exhausted +
deepseek/deepseek-v4-flash:free as the model.
Aux callers (title generation, vision, session search, etc.) can reach
resolve_provider_client() without an explicit model when the user
picked their main provider via 'hermes model' and didn't bother
configuring a per-task auxiliary.<task>.model override. The
expectation in that case is universal: 'use my main model for side
tasks too.'
Before, the OAuth providers (xai-oauth, openai-codex) silently
returned (None, None) on an empty model — both lack a catalog default
because their accepted-model lists drift on the backend. That caused
_resolve_auto to drop to its Step-2 fallback chain (OpenRouter /
Nous / etc.), so aux tasks billed against the wrong subscription
without warning.
The fix is at the top of resolve_provider_client() — a single
3-step universal fallback that runs before any provider branch, so
no provider-specific empty-model guards are needed (now or for any
future provider we add):
1. caller-passed model (caller knew what they wanted)
2. provider's catalog default (cheap aux model, if registered)
3. user's main model from config.yaml
Behaviour by provider class:
- OAuth providers (xai-oauth, openai-codex) — no catalog default, so
step 3 applies. Title gen runs on grok-4.3 / gpt-5.4 against the
user's actual subscription instead of leaking to OpenRouter.
- API-key providers (anthropic, gemini, kimi-coding, etc.) — catalog
default wins at step 2, preserving the original 'cheap aux model'
behaviour. Anthropic users still get claude-haiku-4-5 for titles,
not opus.
- Explicit-model callers (auxiliary.<task>.model config, programmatic
callers) — caller wins at step 1, no surprise switching.
Salvaged from @wysie's PR #31845 which fixed the xai-oauth branch
specifically. The universal shape supersedes the per-branch fix
and covers openai-codex (same bug class) plus any future OAuth
providers.
4 new tests in TestResolveProviderClientUniversalModelFallback:
- empty_model_for_oauth_provider_falls_back_to_main_model
- empty_model_for_codex_also_uses_main_model
- empty_model_for_catalog_provider_uses_catalog_default
- explicit_model_takes_precedence_over_fallbacks
365/365 across tests/agent/test_auxiliary_*, tests/run_agent/test_codex_xai_oauth_recovery.py, tests/hermes_cli/test_auth_xai_oauth_provider.py, and tests/hermes_cli/test_plugin_auxiliary_tasks.py.
Co-authored-by: wysie <wysie@users.noreply.github.com>
All four failures were broken by the security cluster (#10082 / #10133 /
#4609 / symlink-reject batch) merging on May 25. They were red on
origin/main HEAD when #32042 and #32061 ran, gating PRs that touched
unrelated code.
1) tests/hermes_cli/test_update_zip_symlink_reject.py
test_update_via_zip_accepts_normal_member called the real
_update_via_zip without sandboxing PROJECT_ROOT — so the function's
shutil.copytree() actually copied the fake README from the test ZIP
over the real repo's README.md, which then made
test_readme_mentions_powershell_installer fail in any test run that
happened to pick this test up earlier. Mock PROJECT_ROOT to an
isolated tmp_path / install_dir, stub subprocess so pip/uv reinstall
doesn't actually run, and assert the fake README lands in the
sandbox (not the real tree).
2) tests/tools/test_windows_native_support.py
test_readme_mentions_powershell_installer was the victim of (1) —
nothing wrong with the test itself, the fix in (1) clears it.
3) tests/tools/test_file_read_guards.py
test_proc_fd_other_not_blocked called _is_blocked_device('/proc/self/fd/3')
expecting False. But _is_blocked_device runs realpath() and on
pytest xdist workers fd 3 happens to be dup'd to /dev/urandom
(because the worker subprocess inherits open fds from pytest's
collection pipe machinery). Switch to the lower-level
_is_blocked_device_path which is the path-pattern check the test
actually means to exercise; realpath-resolution coverage already
lives in test_symlink_to_blocked_device_is_blocked.
4) tests/tools/test_transcription_tools.py
Module installed a faster_whisper stub via sys.modules without
setting __spec__, then later @pytest.mark.skipif called
importlib.util.find_spec('faster_whisper') which raises
'ValueError: __spec__ is None' for modules with a None spec attr.
Set __spec__ on the stub to a real ModuleSpec.
Validation: 195/195 green across the 4 affected files.
The TUI frontend's slash command registry shadowed /queue's 'q' alias
with /quit's 'q' alias. Since /quit appeared later in the registry,
the flat lookup kept the later entry, making /q always quit instead
of queueing a prompt.
This mirrors the backend fix in PR #10538 (hermes_cli/commands.py)
but applies the same correction to the TUI TypeScript registry.
Fixes#10467
When an MCP server triggers OAuth at startup, the user can now type 'skip'
(or 'cancel', 's', 'n', 'no', 'q', 'quit') at the paste prompt + Enter to
exit the flow cleanly and continue agent startup without that server.
Previously the only ways to bypass an unwanted OAuth prompt were:
- Wait the full 5-minute paste timeout
- Ctrl+C (also kills the whole reload, may leave half-state)
- Edit config.yaml to set 'enabled: false' on the server
Skip writes a sentinel to result['error'] which _wait_for_callback maps to
OAuthNonInteractiveError('user_skipped'). mcp_tool already classifies that
as an auth error in _is_auth_error() and the reconnect loop logs it as
'not retrying automatically' — server stays disconnected for the session,
other MCP servers continue normally, no infinite retry burn.
The skip message tells users how to re-auth later ('hermes mcp login') or
disable persistently ('enabled: false'), so they don't have to remember.
14 new tests covering: case-insensitive skip parsing, all 7 skip tokens,
skip not stomping an HTTP-listener win, skip routed to skip path rather
than URL-parse path, sentinel mapped to OAuthNonInteractiveError, prompt
mentions the skip option.
Follow-up to #32053. The OAuth-over-SSH guide and the MCP feature page
previously only covered xAI and Spotify. Now that MCP servers can complete
OAuth via stdin paste-back on remote/headless hosts, document it.
oauth-over-ssh.md:
- Add MCP servers to the 'Which Providers Need This' table.
- New 'MCP Servers' section covering: paste-back (no setup, works
anywhere), SSH port forward (same pattern as xAI/Spotify), and the 30s
config-auto-reload race pitfall (use 'hermes mcp login <server>' from a
fresh terminal instead of editing config from inside a running session).
mcp.md:
- New 'OAuth-authenticated HTTP servers' section under HTTP servers,
covering auth: oauth config, token cache path, paste-back vs SSH
tunnel for headless hosts, and the same reload-race pitfall.
- Cross-links to the OAuth-over-SSH guide anchor.
The CLI status bar tracked /background agent tasks (▶ N) but not shell
processes spawned via terminal(background=true). Both kinds of work can
run concurrently and a user has no in-bar signal for shell processes.
Add an independent indicator (⚙ N) sourced from
tools.process_registry.process_registry._running. The two indicators
render side-by-side when both are active (▶ 1 │ ⚙ 2), hidden when their
count is zero. Renders at all four status-bar tiers (text fallback +
prompt_toolkit fragments, narrow + wide widths). The narrow <52 tier
still drops both for space — unchanged.
New ProcessRegistry.count_running() returns len(_running) without
acquiring _lock; CPython dict len is atomic and we're polling on every
status-bar tick, so lock-free is the right tradeoff.
The chatgpt.com/backend-api/codex endpoint has an intermittent failure mode
where it accepts the connection but never emits a single stream event — the
socket just hangs. Direct sequential probing reproduces it (0 events, no HTTP
status), and a fresh reconnect then succeeds in ~2s. Today the only guard is
the wall-clock stale timeout in interruptible_api_call, so a dead-on-arrival
connection is held for the full stale window (90-900s depending on context /
config) before the retry loop can reconnect — minutes of wasted wall time per
stall, at a rate of ~20% of calls during affected windows.
Add a TTFB watchdog scoped to the codex_responses path:
- codex_runtime.run_codex_stream stamps agent._codex_stream_last_event_ts on
*every* stream event (not just output-text deltas), so reasoning-only and
tool-call-only turns are not mistaken for a stall.
- interruptible_api_call resets that marker before the worker starts and, while
it is still None, kills the connection once elapsed exceeds the TTFB cutoff
(default 45s, tunable via HERMES_CODEX_TTFB_TIMEOUT_SECONDS, 0 disables). The
raised TimeoutError flows through the existing retry path unchanged.
Once any event has arrived the stream is healthy and only the existing
wall-clock stale timeout applies, so legitimate long generations are never
interrupted. Gated to codex_responses; the chat_completions non-stream,
anthropic and bedrock branches have no first-event signal and are untouched.
Adds tests/agent/test_codex_ttfb_watchdog.py covering the stall kill, the
events-flowing pass-through, and the env-disable path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The gateway's media delivery allowlist required files live inside
`~/.hermes/cache/{documents,images,...}`, which is the wrong shape for
real agent usage. Agents naturally produce artifacts via terminal tools
(`pandoc -o /tmp/report.pdf`, `matplotlib savefig`, etc.) or
write_file into project directories — these never land under the cache.
Result: users got a raw file path in chat instead of an attachment.
This is doubly bad in deployment shapes where the cache directories
aren't writable by the agent at all: Hermes running in Docker with a
read-only mount, or with a Docker/Modal/SSH terminal backend whose
filesystem isn't the gateway host's filesystem.
Layered trust model:
1. Cache-dir allowlist (unchanged) — Hermes-managed roots always trusted.
2. Operator allowlist — `HERMES_MEDIA_ALLOW_DIRS` env var, now also
surfaced as `gateway.media_delivery_allow_dirs` in config.yaml.
3. Recency-based trust (new, default on) — files whose mtime is within
`gateway.trust_recent_files_seconds` (default 600s) of "now" are
trusted even outside the cache/operator allowlist. Old host files
(`/etc/passwd`, `~/.bashrc`, `~/.ssh/id_rsa`) have mtimes measured
in days/months, well outside the window — prompt-injection paths
pointing at pre-existing files are still rejected.
4. Hard denylist — `/etc`, `/proc`, `/sys`, `/dev`, `/root`, `/boot`,
`/var/{log,lib,run}`, plus `$HOME/.{ssh,aws,gnupg,kube,docker,config,
azure,gcloud}` and `Library/Keychains`. Denylist blocks delivery
even when recency would trust the file, in case an attacker
somehow refreshes a sensitive file's mtime.
Operators who want strict-allowlist behavior set
`gateway.trust_recent_files: false` and the system reverts to
pre-existing behavior.
Tests: 6 new cases in test_platform_base.py cover the recency window,
disabled mode, system-path denylist, and the motivating PDF-in-project
scenario. 3 existing tests (test_platform_base, test_tts_media_routing,
test_send_message_tool) that exercised the strict-allowlist path are
updated to disable recency trust explicitly.
E2E validation: real `validate_media_delivery_path()` accepts fresh
PDFs in /tmp and project dirs, rejects /etc/passwd, ~/.ssh/id_rsa, and
files older than the window; config.yaml `gateway.*` keys bridge
correctly to the env vars the validator reads.
When the user runs OAuth on a remote/SSH machine without a port forward,
the OAuth provider redirects to http://127.0.0.1:<port>/callback which
only the listener on the remote machine can receive — the user's browser
on another box just shows a connection error.
_wait_for_callback() now races the HTTP listener against a stdin reader
on interactive TTYs. The user can copy the URL from the browser's address
bar after authorization (which contains code=...&state=...) and paste it
back at the prompt. Whichever fills the result dict first wins; the HTTP
listener remains the primary path for local sessions and SSH tunnels.
Accepts any of:
- Full local redirect URL: http://127.0.0.1:N/callback?code=...&state=...
- Provider URL after redirect: https://mcp.linear.app/callback?code=...&state=...
- Just the query string: ?code=...&state=... or code=...&state=...
The paste thread only spawns when _is_interactive() is true, preserving
the existing 'no input() in headless runs' invariant — verified by
TestWaitForCallbackPasteIntegration.test_paste_prompt_NOT_shown_when_noninteractive.
The SSH-session hint in _redirect_handler is updated to surface the paste
option as the primary remedy, with ssh -L tunneling as the alternative.
_update_via_zip downloads a source ZIP from GitHub and calls
zipfile.ZipFile.extractall. The existing zip-slip path guard validates
each member's path stays under tmp_dir, but does not check member type
— so a ZIP containing a symlink member would still be materialized by
extractall, and a symlink target could point outside the extracted
tree (or to a sensitive system path).
This isn't a high-likelihood threat for hermes-agent's actual GitHub
source ZIPs (we don't ship symlinks), but the extractall path runs as
the user's account and a compromised mirror could plant arbitrary files
via the symlink → target → write chain.
Reject any member whose Unix mode bits (upper 16 bits of external_attr)
are S_IFLNK before extractall. Hermes source ZIPs contain only regular
files and directories; a symlink member is unambiguously suspicious.
Regression tests cover: symlink member rejection (raises ValueError,
caught by the outer try/except as a clean SystemExit, no extraction),
and the happy-path verification that a normal ZIP doesn't trigger the
symlink reject message.
Salvaged from PR #15881 by @codeblackhole1024. The remaining pieces of
that PR were already on main or contradicted explicit design decisions:
- config.yaml write-deny: already in agent/file_safety.py's
control_file_names denylist (the modern guard); the proposed addition
to build_write_denied_paths was the legacy path.
- Quick commands danger detection: contradicts the explicit
cli.py:8491-8492 comment 'shell=True is intentional: quick_commands
are user-defined shell snippets from config.yaml — not agent/LLM
controlled.'
- Memory plugin shlex.split for dep checks: already on main
(hermes_cli/memory_setup.py:133).
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
text_to_speech_tool accepts an explicit output_path. Without a traversal
guard, a path containing '..' components (whether prompt-injection-
controlled, from a confused skill, or just a buggy caller) could escape
its declared base and write the audio to a system location — e.g.
`output_path='audio/../../etc/cron.d/x'` lands the file outside the
intended audio cache.
Reject '..' components in the user-supplied path. Explicit absolute
paths are unchanged (the agent legitimately writes audio wherever the
user/caller asks); only traversal-style escapes are blocked. The
terminal tool can still write anywhere with approval — this just keeps
the unattended TTS surface from materializing files via traversal.
Regression tests cover: '..' in the middle (audio/../../etc/...),
bare '..' prefix, and the negative cases (absolute paths + relative
paths without '..' both pass through unchanged).
Salvaged from PR #6693 by @aaronlab. The original PR confined output to
DEFAULT_OUTPUT_DIR-or-cwd, which broke 9 existing tests that legitimately
write to tmp_path locations. The traversal-only check covers the actual
threat (path-escape via '..' from prompt injection) without restricting
where users can choose to write their audio.
The remaining pieces of #6693 (skill_commands rglob symlink rejection,
delegate_tool batch prefix display) are dropped:
- skill_commands rglob: breaks the documented design supporting
~/.hermes/skills/<name> as a symlink to a checked-out skill elsewhere
(see comment at agent/skill_commands.py:73-75)
- delegate_tool batch prefix: pure UX, doesn't belong in a security PR
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
* fix(streaming): route mid-tool-call partial-stream-stub through length continuation (#31998)
When a stream stalls mid-tool-call (e.g. a large write_file), the
partial-stream-stub recovery used finish_reason='stop' which caused the
conversation loop to treat the turn as complete, returning only the
warning text. When users said 'continue', the model retried the same
large tool call, hit the same stale timeout, and looped indefinitely.
Changes:
- chat_completion_helpers.py: change _stub_finish_reason from 'stop' to
'length' for mid-tool-call partials. The stub still has tool_calls=None
so no tool auto-executes — the model gets a fresh API call through the
existing length-continuation machinery (bounded to 3 retries).
Also attach _dropped_tool_names to the stub for downstream use.
- conversation_loop.py: add a third continuation prompt branch for
partial-stream-stubs with dropped tool calls. Instead of the generic
'continue where you left off' (which would retry the same large call),
tell the model to break the output into smaller tool calls (~8K
tokens each) to avoid stream timeouts.
- test_partial_stream_finish_reason.py: update existing test from
finish_reason='stop' to 'length', add _dropped_tool_names assertion,
add new test_dropped_tool_call_uses_chunking_prompt for the 3-way
prompt branching.
Safety: tool_calls=None is preserved on the stub, so the conversation
loop enters the text-continuation branch (line 1513), NOT the tool-call
execution branch (line 3246). No tool auto-executes. The model simply
gets another API call with targeted guidance.
* refactor: extract constants and continuation prompt helper
- Move magic strings to hermes_constants.py (PARTIAL_STREAM_STUB_ID,
FINISH_REASON_LENGTH)
- Extract _get_continuation_prompt() in conversation_loop.py — DRYs the
3-way prompt branching and lets tests import the real function
- Trim verbose inline comments in chat_completion_helpers.py
- Tests import constants + helper instead of duplicating logic
---------
Co-authored-by: alt-glitch <balyan.sid@gmail.com>
The test set HERMES_YOLO_MODE=1 via monkeypatch.setenv, expecting
check_dangerous_command() to honor yolo and bypass cron_mode=deny. But
tools.approval._YOLO_MODE_FROZEN is intentionally frozen at module
import time (security: prevents prompt-injection runtime escalation).
When CI imports the module BEFORE the test sets the env, the frozen
value stays False and the yolo bypass never activates.
Local runs missed this because the conftest leaked a non-empty
HERMES_YOLO_MODE into the import-time env. CI's clean-env path exposed
the bug deterministically on test (3) / test (4) shards.
Fix: patch the module attribute directly via mock.patch.object so the
test simulates process-startup-with-yolo regardless of import order.
The behavior under test (yolo bypasses cron_mode=deny for non-hardline
commands) is unchanged; the security invariant (_YOLO_MODE_FROZEN can't
be set at runtime by skills) is preserved.
Reproduced locally with: env -i HOME=$HOME PATH=$PATH python3 -m pytest
tests/tools/test_cron_approval_mode.py -o 'addopts=' -v
Without the fix: 1 failed, 23 passed. With the fix: 24 passed.
* fix(transcription): reject symlinked audio inputs
Validation runs before provider selection, so rejecting symbolic-link paths there prevents supported-extension links from being treated as normal audio files. Use os.path.islink to avoid perturbing the existing Path.stat error path and to reject links before resolving targets.
Constraint: Keep validation platform-safe and avoid requiring symlink support where unavailable.
Rejected: Use Path.is_symlink | it consumes pathlib stat calls and broke the existing stat error regression.
Confidence: high
Scope-risk: narrow
Directive: Keep path hardening in _validate_audio_file before provider dispatch.
Tested: source venv/bin/activate && python -m pytest tests/tools/test_transcription_tools.py::TestValidateAudioFileEdgeCases -q (5 passed)
Tested: source venv/bin/activate && python -m pytest tests/tools/test_transcription_tools.py::TestValidateAudioFileEdgeCases tests/tools/test_transcription_tools.py::TestTranscribeAudioDispatch::test_invalid_file_short_circuits -q (6 passed)
Tested: source venv/bin/activate && python -m compileall tools/transcription_tools.py tests/tools/test_transcription_tools.py
Tested: git diff --check
Not-tested: Full tests/tools/test_transcription_tools.py under .[dev] only; existing faster_whisper optional dependency tests fail with ModuleNotFoundError.
* Keep transcription tests independent of optional whisper install
The transcription suite mocks faster-whisper directly, so a minimal test stub keeps the branch verifiable in environments where the optional package is not installed. This preserves the existing mock-based coverage without adding a dependency.
Constraint: faster-whisper is an optional local STT dependency and is absent from the current validation environment
Rejected: Install faster-whisper just for branch validation | would add heavyweight environment coupling outside the patch scope
Confidence: high
Scope-risk: narrow
Directive: Keep this as a test-only stub unless production import semantics change
Tested: pytest tests/tools/test_transcription_tools.py -q
---------
Co-authored-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com>
* fix: reject read_file symlinks to blocking devices
The read_file guard already refused direct device paths such as /dev/zero, but a workspace symlink resolving to one of those devices could still reach the shell-backed read path and hang on wc/head/sed. Keep the literal alias check and add a resolved-path pass so local symlinks to blocked device/fd endpoints are rejected before I/O.
Constraint: Preserve literal /dev/stdin handling before terminal-specific realpath resolution
Confidence: high
Scope-risk: narrow
Tested: pytest tests/tools/test_file_read_guards.py tests/tools/test_file_tools.py -q; python -m compileall tools/file_tools.py tests/tools/test_file_read_guards.py; git diff --check
Signed-off-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com>
* Keep file guard tests off sensitive macOS temp paths
The branch now inherits a sensitive-path write guard from upstream main. On macOS, tempfile.mkdtemp() resolves under /private/var/folders, so the new write-path guard fired before the file read dedup assertions could exercise their intended behavior. The tests now create their scratch files inside the worktree temp checkout, outside those system-sensitive prefixes, without changing production behavior.
Constraint: Rebased branch must pass the expanded file read guard suite on macOS.
Rejected: Loosen the production sensitive-path prefix list | broader behavior change unrelated to this PR.
Confidence: high
Scope-risk: narrow
Tested: pytest tests/tools/test_file_read_guards.py -q
---------
Signed-off-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com>
Co-authored-by: WuKongAI-CMU <210765158+WuKongAI-CMU@users.noreply.github.com>
The read_file tool and terminal cat can access /proc/self/environ to
recover all process env vars including secrets stripped by the subprocess
blocklist. Output redaction partially mitigates (catches known-format
tokens) but misses custom/proprietary key formats, especially when
values are printed without their key names.
Add /proc/*/environ, /proc/*/cmdline, and /proc/*/maps to the blocked
device paths in _is_blocked_device():
- /proc/*/environ: leaks full process env (API keys, tokens)
- /proc/*/cmdline: leaks command-line args (may contain passwords)
- /proc/*/maps: leaks memory layout (ASLR bypass for exploitation)
Legitimate /proc reads (cpuinfo, meminfo, uptime, version) remain
accessible — the check only blocks per-pid pseudo-files with known
sensitive suffixes.
Complements PR #4432 (PID namespace isolation for child processes)
which prevents children from reading the parent's /proc, but does not
prevent the parent process itself from being read via file tools.
Partially addresses #4427
Changes:
tools/file_tools.py | +6
tests/tools/test_file_read_guards.py | +18 -1
Co-authored-by: dsr-restyn <dsr-restyn@users.noreply.github.com>
When the terminal drops the ESC[201~ end mark during a bracketed paste
(terminal race, torn write, SSH glitch, macOS sleep/wake), prompt_toolkit's
Vt100Parser keeps buffering all later input in _paste_buffer forever. From
the user's perspective, the CLI appears frozen — the only recovery was
closing the tab/session.
This patch monkey-patches Vt100Parser.feed() so that bracketed-paste mode
flushes buffered content as a normal BracketedPaste event after 2 seconds
without an end marker, then restores normal parsing.
Includes 8 regression tests covering normal paste, timeout recovery,
torn end marks, and edge cases.
Surgical reapply of PR #27518. Original branch was many months stale
(1193 files / 172k LOC of unrelated reverts); the substantive ~77 LOC
patch in cli.py plus the new 157-line test file were reapplied onto
current main with the contributor's authorship preserved via --author.
A Ctrl+C during a slow slash command (e.g. /skills browse on a large
skill tree, /sessions list against a multi-GB SQLite DB) used to unwind
past self.process_command() to the outer prompt_toolkit event loop,
which killed the entire session — losing all conversation state.
Fix: wrap the slash-command dispatch in try/except KeyboardInterrupt
so Ctrl+C aborts the command but the prompt loop continues. Other
exceptions still propagate so real bugs aren't silently swallowed.
Surgical reapply of PR #5189. Original branch was many months stale
(3764 files / 1M+ LOC of unrelated reverts); the substantive ~6 LOC
change in cli.py was reapplied by hand onto current main with the
contributor's authorship preserved via --author.
On Windows (PowerShell/Windows Terminal), the queue-based modal used for
destructive slash command confirmations deadlocks because prompt_toolkit's
input channel becomes unresponsive when entered from the process_loop daemon
thread. Keystrokes never reach the key bindings, so response_queue.get()
blocks until the 120-second timeout expires.
Fix: fall back to _prompt_text_input (stdin-based) when:
1. sys.platform == 'win32' — Windows console doesn't support the modal reliably
2. Called from non-main thread — key bindings can't fire from daemon threads
3. self._app is not set — existing behavior for tests/non-interactive
This mirrors the thread-aware guard from _prompt_text_input (PR #23454).
9 new regression tests covering Windows detection, non-main thread fallback,
macOS/Linux modal preservation, and integration with _confirm_destructive_slash.
Fixes#30768
Surgical reapply of PR #30773. Original branch was many months stale (911
files / 146k LOC of unrelated reverts); the substantive ~30 LOC change in
cli.py plus the new test file were reapplied onto current main with the
contributor's authorship preserved via --author.
The ChatGPT Codex backend (chatgpt.com/backend-api/codex) has historically
silently dropped certain model requests: the connection is accepted but no
stream events are emitted and no error is raised. PR #31967 lowered the
implicit stale-call default from 300s to 90s so fallbacks kick in faster,
but users still see an opaque "No response from provider for 90s
(non-streaming, ...)" message that gives no path forward.
This patch adds a narrow heuristic — gpt-5.5 family on the Codex backend
via codex_responses api_mode — that substitutes the generic timeout
message with actionable text naming the gpt-5.4-codex workaround and
pointing at #21444 for symptom history.
Changes:
- run_agent.py — new ``AIAgent._codex_silent_hang_hint(model=...)`` method.
Returns ``None`` for any request that does not match all three guards
(codex_responses api_mode, openai-codex provider or chatgpt.com Codex
base URL, gpt-5.5-family model name with word-boundary regex anchoring
to avoid false-positives on e.g. ``gpt-5.50``).
- agent/chat_completion_helpers.py — the non-stream stale-call site
consults the hint via ``getattr(...)`` so the call site stays robust
if the helper is ever removed or stubbed in tests. Hint is appended to
both the ``_emit_status`` warning and the ``TimeoutError`` message so
the user sees it in their terminal AND it lands in any retry-loop
diagnostics.
- tests/run_agent/test_codex_silent_hang_hint.py — 10 regression tests
covering positive cases (bare gpt-5.5, vendor-prefixed openai/gpt-5.5,
gpt-5.5-codex SKU, model=None fallback to self.model) and negative
cases (gpt-5.4-codex workaround, gpt-5.50 false-positive guard,
non-codex api_mode, non-codex provider, empty/None model, unrelated
models on Codex).
Does NOT fix the backend-side issue (that's an upstream OpenAI/ChatGPT
problem we cannot patch from here). Only converts an opaque timeout into
text that names the workaround so users do not have to dig through logs
or wait for a forum post to learn what to do.
Closes#22046
get_read_block_error() only blocked internal Hermes cache files but
allowed reading project-local secret-bearing environment files (.env,
.env.production, .env.local, etc.) through both read_file and ACP
fs/read_text_file paths.
Add a basename deny set for common secret-bearing .env variants.
.env.example remains readable as documentation.
Fixes#20734
.env holds API keys and secrets. Multiple creation sites used `cp` /
`touch` / `shutil.copy2` which obey the process umask — commonly
0o022, leaving the file at 0o644 (world-readable). Apply chmod 0o600
explicitly at every site that creates or copies .env.
Sites covered:
- docker/stage2-hook.sh: after the seed_one '.env' call, applied
unconditionally (not just on first-seed) so a host-mounted .env with
loose perms gets tightened on every container restart
- hermes_cli/doctor.py: 'hermes doctor --fix' touches an empty .env
when missing
- hermes_cli/profiles.py: 'hermes profile create --clone' copies .env
from the source profile; shutil.copy2 preserves source mode, so a
source .env at 0o644 was being cloned into 0o644
- setup-hermes.sh: in-tree setup script's cp .env.example .env path,
plus the already-exists branch (mirror of install.sh which already
chmods 600 unconditionally on line 1442)
scripts/install.sh was NOT changed — it already chmod 600's the .env
unconditionally after the create/already-exists branches (line 1442).
Salvaged from PR #25726 by @dusterbloom. The docker/entrypoint.sh
portion of the original PR was dropped because main switched to an
s6-overlay shim — the .env creation logic moved to stage2-hook.sh,
which is where the chmod now lives.
Closes#25497 (subset — install.sh + setup-hermes.sh) and #8448
(subset — install.sh only) as superseded.
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
* fix(approval): harden YOLO bypass, LLM parsing, auto-approve audit, pipe pattern
- BUG-009 (CRITICAL): freeze HERMES_YOLO_MODE at module import via
_YOLO_MODE_FROZEN; prevents skills/prompt-injection from calling
os.environ["HERMES_YOLO_MODE"]="true" at runtime to bypass all checks
- BUG-002 (HIGH): replace substring "APPROVE" in answer with exact
answer == "APPROVE" in _smart_approve; prompt already requests exactly
one word, substring match was exploitable via verbose LLM responses
- BUG-001 (MEDIUM): add logger.warning for every dangerous command that
auto-approves in non-interactive non-gateway context; makes silent
approvals visible in audit logs without breaking script behavior
- BUG-008 (LOW): expand curl/wget pipe pattern to cover | /bin/bash and
| bash -c variants, not just | sh / | bash
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(approval): add missing is_truthy_value import + fix yolo test patches
_YOLO_MODE_FROZEN uses is_truthy_value() from utils — import was missing.
Tests that set HERMES_YOLO_MODE via monkeypatch.setenv() no longer work
because the value is frozen at import time; update them to patch the
module-level flag directly via monkeypatch.setattr().
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
register_env_passthrough() (the skill-declared path) filters out names in
_HERMES_PROVIDER_ENV_BLOCKLIST and logs a warning citing GHSA-rhgp-j443-p4rf.
_load_config_passthrough() (the config.yaml path) did not. Both feed the
same is_env_passthrough() allowlist that local.py and code_execution_tool.py
consult before stripping a variable from the child env.
A skill that wanted to leak ANTHROPIC_API_KEY or OPENAI_API_KEY into
execute_code could no longer self-register the name (the GHSA fix
blocks it), but the same outcome was still reachable by asking the
operator to add the name to terminal.env_passthrough in config.yaml,
or by any in-process actor with write access to ~/.hermes/config.yaml.
Apply the same _is_hermes_provider_credential filter inside
_load_config_passthrough, mirroring the skill-path warning so operators
see the same explanation. Non-Hermes API keys (TENOR_API_KEY,
NOTION_TOKEN, etc.) are unaffected since they are not in the blocklist.
* perf(bitwarden): persist secret-fetch cache across CLI invocations
Every `hermes` invocation paid a ~380ms tax for `bws secret list` to
Bitwarden Secrets Manager because the existing cache was in-process only.
Back-to-back `hermes chat -q`, gateway-spawned agents, and cron-launched
runs all re-fetched.
Adds a disk-persisted L2 cache at `<hermes_home>/cache/bws_cache.json`
(mode 0600, never contains the access token — only the SHA-256
fingerprint prefix). Same TTL as the in-process cache. Read on miss,
write on bws success, ignored on key mismatch / corruption / expiry.
Measured on a startup profile:
load_hermes_dotenv() cold: 372ms → warm (disk cache hit): 20ms
End-to-end `hermes --version` cold→warm: 666ms → ~295ms.
In a hermes-vs-codex benchmark across 11 single- and multi-turn tasks
(framework overhead = wall − llm − tool_exec, median over 3 trials):
cohort before after saved
single-turn (median) 2.96s 2.31s -0.65s
multi-turn (5-turn) 9.40s 8.95s -0.45s (≈0.3s/turn)
Hermes now wins head-to-head on 6/11 tasks vs codex (was 4/11 before).
The remaining ~0.6s single-turn delta is mostly Python's own import
cost in hermes_cli.main, which is a separate optimization.
* perf(cli): lazy-load model catalog + dedupe config.yaml reads at startup
Two import-time wins on top of the bws disk-cache fix:
1. Lazy-load `hermes_cli.models._PROVIDER_MODELS` via PEP 562
module-level `__getattr__`. The catalog is ~55ms of work that was
eagerly imported on every CLI invocation (line 4557 `if not
_is_termux_startup_environment(): from hermes_cli.models import
_PROVIDER_MODELS`). Audit showed every internal call site already
does its own function-local import; only test code reads
`hermes_cli.main._PROVIDER_MODELS` as a module attribute, and
__getattr__ keeps that working transparently. First access triggers
the import once and caches the result on the module via
`globals()[name] = ...`, so subsequent reads are dict lookups.
2. Dedupe the double config.yaml read in the top-of-module bootstrap.
Previously: one raw yaml.safe_load for the `security.redact_secrets`
bridge, then a separate full `load_config()` (with deep-merge) for
`network.force_ipv4`. Both keys come from the same file. Merged
into one raw yaml load.
Combined with the bws cache fix in the previous commit:
hermes --version wall time:
original (cold): 666 ms
after bws fix (warm): 295 ms
after lazy-load + dedupe: 228 ms (-67 ms additional, -66% from original)
Tests:
- tests/hermes_cli/test_api_key_providers.py: 173/173 pass
(lazy __getattr__ correctly handles
`from hermes_cli.main import _PROVIDER_MODELS`)
- tests/test_ipv4_preference.py + tests/hermes_cli/test_redact_config_bridge.py +
tests/agent/test_redact.py: 93/93 pass (dedupe preserves both bridges)
- tests/test_bitwarden_secrets.py + env_loader tests: 49/49 pass
V4A patch '*** Update File:', '*** Add File:', '*** Delete File:' headers
come from patch CONTENT, not the explicit `path=` argument. That makes
them attacker-influenceable through skill content, web extract output,
prompt injection, and other surfaces the agent processes. Headers like
'*** Update File: ../../../etc/shadow' would resolve relative to the
agent's cwd; in deployment configurations where that cwd is deep enough
to land outside Hermes' protected paths, the write could land somewhere
the agent operator did not intend.
Reject any V4A header containing a '..' path component before applying
the patch. The explicit `path=` argument on patch_tool is UNCHANGED —
the agent legitimately uses '..' there (e.g. `patch path='../other_module/x.py'`
from a worktree dir is normal cross-module editing).
Regression tests: V4A Update header with traversal rejected, V4A Add
header with traversal rejected, patch_v4a never invoked when rejection
fires.
Salvaged from PR #29395 by @waefrebeorn. The original PR added
has_traversal_component as a blanket reject on read_file_tool,
write_file_tool, patch_tool's explicit path, and search_tool — that
would break legitimate agent operation where '..' is normal. Also
dropped the over-eager skills_guard pattern additions
(pickle.loads/marshal.loads/ctypes.CDLL/importlib at high/critical
severity would false-positive on legit data-science and FFI skills).
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Expand _MEMORY_THREAT_PATTERNS from 13 to 24 regex patterns and align
_INVISIBLE_CHARS with skills_guard.py (10 → 17 characters).
Key changes:
- Add multi-word bypass prevention (?:\w+\s+)* to injection patterns
- Add missing injection patterns: role_pretend, leak_system_prompt,
remove_filters, fake_update, translate_execute, html_comment_injection,
hidden_div
- Add exfiltration patterns: send_to_url, context_exfil
- Add persistence patterns: agent_config_mod, hermes_config_mod
(both require modification-verb prefix to avoid false positives on
mere mentions of config filenames)
- Add hardcoded secret detection pattern
- Add role_hijack precision fix: require article after "now" to avoid
blocking "you are now ready/connected/set up" etc.
- Expand invisible unicode set with directional isolates (U+2066-2069)
and invisible math operators (U+2062-2064)
Test coverage expanded from ~8 to ~30 scan tests including dedicated
false-positive regression tests for all precision-sensitive patterns.
Known limitations (deferred to follow-up PRs):
- prompt_builder.py and cronjob_tools.py still use older pattern sets
- No semantic/LLM-based scanning (regex-only approach)
- No cross-entry or cross-store analysis
show_snapshot.py unpickled a user-supplied path unconditionally. pickle.loads
is equivalent to arbitrary code execution, so a snapshot from an untrusted
source = RCE. Require an explicit --i-trust-this-file acknowledgement before
calling pickle.loads, and emit a stderr warning when proceeding.
Co-authored-by: Jiahui-Gu <jiahuigu@users.noreply.github.com>
Codex / Responses-API requests had three latent timeout bugs that combined
into the long silent hangs reported on #21444:
1. The non-stream stale-call detector estimated context tokens from
``api_kwargs["messages"]`` only. Codex / Responses-API payloads carry
their conversational load in ``input`` (with ``instructions`` and
``tools``), so every Codex turn logged ``context=~0 tokens`` and the
detector never applied its >50k / >100k tier bumps.
2. ``providers.<id>.request_timeout_seconds`` was silently dropped on the
main Codex path. The chat_completions path and the auxiliary Codex
adapter both forwarded it; the main path skipped it through three
places (``build_api_kwargs``, ``ResponsesApiTransport.build_kwargs``,
``_preflight_codex_api_kwargs``).
3. The streaming stale detector had the same payload-shape bug for
``codex_responses`` requests, which route through the non-streaming
detector (it's the path that emits the user-facing
"No response from provider for 300s (non-streaming, ...)" warning that
reporters keep pasting).
This commit:
- Adds ``estimate_request_context_tokens`` in ``chat_completion_helpers``,
used by both the non-stream and stream detectors. Handles ``messages``
(Chat Completions), ``input + instructions + tools`` (Responses API),
bare lists, and an unknown-dict fallback.
- Forwards ``timeout`` through ``ResponsesApiTransport.build_kwargs``
and ``_preflight_codex_api_kwargs`` (with guards against
zero/negative/inf/bool values), and wires
``_resolved_api_call_timeout()`` into the Codex branch of
``build_api_kwargs``.
- Lowers the implicit non-stream stale defaults so fallback providers
kick in faster when upstream stalls:
* base 300s -> 90s
* >50k 450s -> 150s
* >100k 600s -> 240s
These only apply when the user has *not* set
``providers.<id>.stale_timeout_seconds`` or
``HERMES_API_CALL_STALE_TIMEOUT``. Explicit config still wins.
- Adds regression tests for the estimator shapes, the new defaults, the
context-tier scaling, transport timeout pass-through, and preflight
timeout pass-through / rejection of invalid values.
Closes#21444
Supersedes #21652#24126#31855
Co-authored-by: Hoang V. Pham <26063003+hehehe0803@users.noreply.github.com>
Translates the full English docs corpus (335 files) into Simplified
Chinese under website/i18n/zh-Hans/. Combined with PR #31895 (cross-
locale link fix), the 简体中文 locale toggle now serves a complete
Chinese site with working cross-page navigation.
Pipeline:
- Claude Sonnet 4.6 via OpenRouter, 8-way concurrent
- Preserves frontmatter keys, code blocks, MDX/JSX, link URLs, brand
names, and technical jargon (prompt/token/hook/MCP/ACP/etc.)
- Translates only frontmatter title/description and prose
- Two largest files (configuration.md 93KB, research-paper-writing.md
107KB) retried with 64K max_tokens after initial fence-drift
- 3 manual post-fixes for MDX edge cases the model didn't escape:
< in optional-skills-catalog table, double-quotes in an alt= tag,
and a bare URL adjacent to a full-width period
Cost: ~$30 total (Sonnet 4.6 input $3/M + output $15/M).
Verified `npm run build` succeeds for both en and zh-Hans locales,
no double-prefixed /docs/zh-Hans/docs/ URLs in rendered output,
all in-page navigation resolves correctly.
Translations are machine-generated and may need human review on
specific pages — but they're an enormous improvement over the
previous state (3 zh-Hans pages out of 335).
When 'hermes chat --quiet --resume <id> -q "..."' is used, three status
messages were written to stdout via ChatConsole / _cprint:
- '↻ Resumed session <id> (N user messages, M total messages)'
- 'Session <id> found but has no messages. Starting fresh.'
- 'Session not found: <id>' / usage hint
This polluted the machine-readable stdout that automation wrappers capture
with $(...), making it impossible to cleanly separate the agent's answer
from the resume banner.
Fix: detect quiet mode via tool_progress_mode == 'off' and route the three
resume status messages to stderr (as plain text, matching the existing
stderr convention for session_id). Interactive mode is unchanged — it
still uses the Rich-rendered path through ChatConsole.
Surgical reapply of PR #11868. Original branch was stale against current
main; reapplied onto current cli.py by hand with original authorship
preserved via --author.
Follow-up to @someaka's fix.
Polish:
- Drop the redundant `_preflight_tokens >= threshold_tokens` clause.
`should_compress(tokens)` already short-circuits when tokens < threshold,
so the explicit comparison was dead code on the True branch.
Tests:
- Preflight: pin that should_compress() is called (anti-thrash has a vote).
Mocks should_compress to return False even with tokens past the raw
threshold and asserts no compression runs — exact bug shape from #29335.
- Gateway: AST scan of gateway/run.py asserts every
`session_entry.session_id = ...` assignment is followed by a
`session_store._save()` call within the same block. Three sites mutate
the session_id after compression; all three must persist or the next
turn loads the pre-compression transcript and re-loops. Empirically
verified the test catches the bug (drops the new _save() line → red).
AUTHOR_MAP:
- Map ed@bebop.crew -> someaka so the salvaged commit resolves to
@someaka in release notes.
Three compounding root causes:
A) run_conversation() result dict missing session_id — gateway's
dead-code guard at gateway/run.py:8700 never triggers
B) preflight compression bypasses should_compress() anti-thrashing —
re-triggers every turn when tool schemas dominate token budget
C) gateway updates session_entry.session_id in memory but doesn't
persist via session_store._save()
Fixes: #29335
Session IDs are profile-constrained, so the resume hint needs to
include the active profile for multi-profile users. Without this,
copying the hint from a non-default profile fails to resume the
correct session.
Before: hermes --resume 20260414_063228_c1240e
After: hermes --resume 20260414_063228_c1240e -p dev
Also includes -p on the resume-by-title hint. Skipped for
'default' and 'custom' profiles (no -p needed).
Surgical reapply of PR #9652. Original branch was stale against
current main (~6 months); reapplied onto current cli.py by hand
with original authorship preserved.
Mirror of the TTS command-provider registry (PR #17843) for STT. Lets any
shell-driven ASR engine — Doubao ASR, NVIDIA Parakeet, whisper.cpp builds,
SenseVoice, curl pipelines — become an STT backend with zero Python.
Complements the legacy HERMES_LOCAL_STT_COMMAND escape hatch (preserved
untouched via the built-in local_command path) and the
register_transcription_provider() Python plugin hook also shipped in this
PR.
Resolution order (mirrors TTS exactly):
1. Built-in (local, local_command, groq, openai, mistral, xai)
→ native handler. Always wins.
2. stt.providers.<name>: type: command → command-provider runner.
3. Plugin-registered TranscriptionProvider → plugin dispatch.
4. No match → 'No STT provider available'.
Files
-----
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset retained;
added _resolve_command_stt_provider_config, _transcribe_command_stt,
and local helpers for template rendering, shell-quote context, and
process-tree termination. Helpers are documented as mirrors of their
tts_tool.py counterparts (kept local to avoid cross-tool private
import). Wire-in is one insertion point in transcribe_audio() after
the xai elif and before the plugin dispatcher. Plugin dispatcher
additionally defensively short-circuits when a same-name command
config exists (command-wins-over-plugin invariant).
- tests/tools/test_transcription_command_providers.py: 50 new tests
covering resolution (builtin precedence, type/command gating,
case-insensitive lookup, legacy stt.<name> back-compat), helpers
(timeout fallback, format validation, iter, has-any), template
rendering (shell-quote contexts, doubled-brace preservation),
end-to-end via _transcribe_command_stt (output_path read, stdout
fallback, timeout, nonzero exit envelope, model override,
language precedence), and dispatcher integration via the real
transcribe_audio() including command-wins-over-plugin and
builtin-shadow-rejection.
- tests/plugins/transcription/check_parity_vs_main.py: extended from
10 to 13 scenarios. New cases: command-provider-installed,
command-vs-plugin-same-name (verifies command wins precedence),
explicit-openai-with-command-shadow (verifies built-in wins).
Adds command_provider dispatch_kind detection via transcript prefix
(CMD: vs PLUGIN:) so command-provider scenarios can be distinguished
from plugin scenarios even when sharing a provider name.
- website/docs/user-guide/features/tts.md: new 'STT custom command
providers' section symmetric to the TTS section — example config,
placeholder grammar table (input_path / output_path / output_dir /
format / language / model), transcript-read-back semantics (file
first, then stdout fallback), optional keys table, behavior notes,
security note. Updated 'Python plugin providers (STT)' to include
the new 'When to pick which (STT)' decision table and updated
resolution-order section (now 4 layers instead of 3).
Verification
------------
189/189 STT targeted tests + 50/50 new command-provider tests pass.
Combined sweep: tests/tools/ 5576/5576, tests/agent/ + tests/hermes_cli/
8623/8623 — zero regressions across 14,199 tests.
Parity harness: 13 scenarios, 9 OK + 4 expected diffs
(no_provider_error → plugin, plugin_unavailable, command_provider × 2).
E2E live-verified in an isolated HERMES_HOME with a real .wav file:
command: → dispatched to stt.providers.my-fake-cli
plugin: → dispatched to registered TranscriptionProvider
command-wins-over-plugin: → command provider beats same-name plugin
builtin-wins-over-command: → built-in OpenAI handler fires;
stt.providers.openai: type: command
does NOT hijack it.
Add an opt-in Python plugin surface for speech-to-text backends,
mirroring the TTS hook pattern. New backends (OpenRouter, SenseAudio,
Gemini-STT, custom proprietary engines) can be implemented as plugins
without modifying tools/transcription_tools.py.
Built-ins always win
--------------------
The 6 built-in STT providers (local/faster-whisper, local_command,
groq, openai, mistral, xai) keep their native handlers. Plugins
attempting to register under a built-in name are rejected at
registration time with a warning and re-checked defensively at
dispatch.
Resolution order
----------------
1. stt.provider matches a built-in → built-in dispatch (unchanged)
2. stt.provider matches a registered plugin →
a. if plugin.is_available() returns False → unavailability envelope
identifying the plugin (not the generic "No STT provider"
message — the user explicitly opted into this plugin)
b. otherwise plugin.transcribe() with model + language forwarded
from stt.<provider>.{model,language} config
3. No match → legacy "No STT provider available" error (unchanged)
Per-provider config namespace
-----------------------------
Plugins read their config from stt.<provider> in config.yaml, mirroring
how built-ins read stt.openai.model / stt.mistral.model. The dispatcher
forwards `model` and `language` from this section. Caller's explicit
`model=` argument overrides the config-set model.
Files
-----
- agent/transcription_provider.py: TranscriptionProvider ABC
- agent/transcription_registry.py: register/get/list providers,
built-in shadow guard, _reset_for_tests
- hermes_cli/plugins.py: register_transcription_provider() on
PluginContext
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset,
_dispatch_to_plugin_provider() with availability gate, wire-in
after xai branch and before "No STT provider" error
- tests/agent/test_transcription_registry.py: 27 tests
- tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests
- tests/tools/test_transcription_plugin_dispatch.py: 28 tests
(covering built-in short-circuit, plugin dispatch, exception
envelope, non-dict guard, availability gate, language forwarding)
- tests/plugins/transcription/check_parity_vs_main.py: 10-scenario
subprocess-pinned parity harness vs origin/main
- website/docs/user-guide/features/{tts,plugins}.md: docs
Behavior parity
---------------
10 scenarios, 8 OK + 2 expected DIFFs:
no_provider_error → plugin (plugin-installed scenario)
no_provider_error → plugin_unavailable (plugin-installed-unavailable
scenario; PR returns cleaner envelope)
Zero behavior change for users not opting into a plugin.
Issue follow-up to #30398.
- CLI: bracketed/quoted target resolves; mismatched single bracket passes through unchanged.
- Gateway: bracketed session ID resolves; bare untitled session ID resolves via get_session() fallback.
The /resume usage hint shows '<session_id_or_title>' which a few users have
typed verbatim, including the angle brackets. Strip outer <>, [], "", and ''
from the argument before lookup so '/resume <abc123>' works the same as
'/resume abc123'. Mirrors the new bracket-stripping in the CLI handler.
Also let the gateway resolve a bare session ID. Previously the gateway only
called resolve_session_by_title, so '/resume <session_id>' always returned
'Session not found' even for valid IDs. Try get_session() first, fall back
to title resolution second.
Surgical reapply of PR #10215 (branch was based on a many-months-old main
and reverted ~3100 unrelated files; original commit by claw@openclaw.ai
preserved via --author).
PR #31416 (avoid persisting borrowed credential secrets) added
sanitize_borrowed_credential_payload, which strips access_token from
any auth.json pool entry whose (provider, source) isn't in the
_PERSISTABLE_PROVIDER_SOURCES allowlist.
(copilot, gh_cli) is borrowed (not in the allowlist), so the test
fixture's pre-seeded access_token now gets stripped at load_pool()
time, leaving the pool empty. resolve_target('1') then fails with
'No credential #1. Provider: copilot.'
Fix: align the test with the new contract. At runtime, copilot tokens
are hydrated by resolve_copilot_token() — mock that path so the pool
gets an entry the test can remove. The behavior under test
(suppression of gh_cli + env variants on remove) is unchanged.
CI repro on origin/main HEAD; reproduced locally with stock checkout.
Extend PR #31716 to plugin setup paths that were also using bare
getpass.getpass(): hindsight (4 sites), honcho, simplex, line. Same
mechanical swap onto hermes_cli.secret_prompt.masked_secret_prompt.
Two defense-in-depth fixes on cron output path handling:
1. cron/jobs.py:update_job() rejects mutation of the immutable 'id' field
(raises ValueError). Dashboard PUT /api/cron/jobs/{id} converts this to
HTTP 400. Without this, an attacker who can reach the update endpoint
could rename a job's id to '../escape' and move its output directory
outside OUTPUT_DIR.
2. cron/jobs.py:_job_output_dir() validates job IDs before composing
paths: rejects '.', '..', '/', '\\', absolute paths, and Windows drive
prefixes. Used by save_job_output() and remove_job() so legacy unsafe
IDs (from before this guard) fail closed rather than half-applying a
shutil.rmtree or output write outside the sandbox.
Tests:
- update_job rejects {'id': '../escape'} without renaming
- remove_job(legacy '../escape' id) raises ValueError without deleting
files outside OUTPUT_DIR or removing the job from the store
- save_job_output rejects '..', './escape', 'nested/escape',
absolute paths
- dashboard PUT /api/cron/jobs/{id} with {'id': '../escape'} returns
400, job list unchanged
Salvaged from PR #29826 by @zapabob. Simplified implementation:
- Dropped a 23-line _validate_job_output_id() helper using Path.parts
semantics. The inline check (path separators + dot-components +
is_absolute) is shorter and behaviorally identical.
- Dropped the secondary OUTPUT_DIR.resolve()/relative_to() check —
redundant once we reject any path separator at the input boundary.
- Dropped the _docs/2026-05-21_cron-output-path-hardening_codex.md
planning artifact (we don't check planning docs into the repo).
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
The bug: cron/scheduler.py:_resolve_cron_enabled_toolsets returns an
LLM-supplied per-job enabled_toolsets verbatim. The disabled_toolsets
passed to AIAgent was a hardcoded [cronjob, messaging, clarify] that
ignored agent.disabled_toolsets from config.yaml. An LLM could call
cronjob(action='add', enabled_toolsets=['terminal','file'],
prompt='...') and the cron-spawned agent would receive terminal+file
even when the operator had globally disabled them.
Fix: new _resolve_cron_disabled_toolsets() helper that ALWAYS layers
agent.disabled_toolsets on top of the cron baseline. AIAgent's
disabled_toolsets takes precedence over enabled_toolsets, so this
stops the bypass regardless of what the per-job override contains.
This is the disabled-side fix. Three concurrent PRs (#25842, #25815,
#25780) proposed intersection-side variants on _resolve_cron_enabled_toolsets;
this fix is more robust because it stops the leak at the precedence
boundary AIAgent itself enforces, not at a layer above.
Regression test reproduces the issue's PoC exactly:
config.yaml has agent.disabled_toolsets=[terminal,file]; cron job has
enabled_toolsets=[web,terminal,file]; assertion: AIAgent receives
disabled_toolsets containing terminal AND file.
Salvaged from PR #25786 by @Schrotti77. Simplified the implementation:
dropped a 23-line _normalize_toolset_list() helper (handled str/tuple/
set/garbage input shapes) in favor of the existing convention
(agent_cfg.get('disabled_toolsets') or []) used elsewhere in the
codebase. YAML always parses these as lists; the elaborate normalizer
was theatre for shapes we never produce.
Closes#25752
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Follow-up on top of @jacevys' PR #21437 cherry-pick:
- _provider_model_ids() now also matches normalized == 'openai-api' for
the live /v1/models fetch path, so users see the full catalog instead
of just the curated list.
- Add gpt-5.5-pro and gpt-5.3-codex to the curated list for parity with
the existing 'openai' table (used as fallback when /v1/models fails).
- Add scripts/release.py AUTHOR_MAP entry for jacevys so CI doesn't
block the salvage PR.
The legacy runtime_calls[-1] == "anthropic" check in
test_model_switch_uses_requested_provider failed in CI under
specific test-shard scheduling with 'custom' == 'anthropic',
across multiple unrelated PRs on 2026-05-25. The May 23 pin
(commit 3127a41cb) monkeypatched parse_model_input + detect_provider_for_model
to remove the dependency on live _KNOWN_PROVIDER_NAMES module state but the
flake reappeared anyway — root cause still not reproducible locally even
under stress runs.
The other three assertions ("Provider: anthropic" in result,
state.agent.provider == "anthropic", state.agent.base_url ==
"https://anthropic.example/v1") already prove
fake_resolve_runtime_provider was called with requested="anthropic"
for the model-switch step — the agent's provider and base_url
come directly from that fake's return value. The tail-position
check was redundant and the only assertion that flaked.
Replaces runtime_calls[-1] == "anthropic" with
"anthropic" in runtime_calls so the plumbing path is still
covered without depending on call ordering.
The locale switcher appeared broken because hardcoded markdown links
(`](/docs/X)`) got double-prefixed by Docusaurus to `/docs/<locale>/docs/X`
(404) in non-English locales, and the MDX hero `<a href>` on the index
page escaped locale routing entirely.
Changes:
- Rewrite 922 `](/docs/X)` -> `](/X)` across 166 docs files (strip trailing
.md too). Docusaurus prepends locale + baseUrl itself.
- docs/index.md -> index.mdx; hero "Get Started" anchor -> Docusaurus
<Link> so it stays inside the active locale.
- Drop `ko` locale entirely from docusaurus.config.ts + delete i18n/ko/
(4 stale auto-translated kanban pages, <2% coverage, misleading).
Verified `npm run build` succeeds for both en and zh-Hans; `build/zh-Hans/
index.html` has no /docs/zh-Hans/docs/... double-prefixed paths.
PR2 will translate the 335 English docs into i18n/zh-Hans/.
#27385 reports that on macOS the browser sees the xAI 'authorization
received' success page but Hermes still raises xai_callback_timeout.
The loopback HTTP handler was silent — no log line on receipt, no log
line on wait timeout — so triaging the gap between 'browser saw
success' and 'CLI saw timeout' required either a code change or
guesswork.
Adds two INFO log lines:
- Per callback hit (handler): path, has_code, has_state, has_error,
truncated User-Agent. Booleans / fingerprints only — no actual
code/state strings leak.
- On wait timeout: report whether result.code or result.error was
populated at deadline. Distinguishes three failure modes:
1. No hit log + timeout log w/ has_code=False has_error=False
→ xAI's IDP never reached the loopback (firewall, port-binding,
IPv6/IPv4 mismatch, browser blocked private-network access).
2. Hit log w/ has_code=False has_error=False + timeout log
→ xAI hit the loopback without OAuth params (the bare-URL
case the handler already 400s on).
3. Hit log w/ has_code=True + timeout log w/ has_code=False
→ result_lock contention or race; would indicate a real bug.
133/133 in tests/hermes_cli/test_auth_xai_oauth_provider.py,
tests/hermes_cli/test_xai_oauth_pkce_token_exchange.py, and
tests/run_agent/test_codex_xai_oauth_recovery.py.
Follow-up to @Strontvod's fix.
Tests:
- Five new tests in test_update_concurrent_quarantine.py cover the parent-
chain exclusion: the .exe launcher is excluded, an unrelated sibling
hermes.exe is still reported, multi-level ancestry is fully excluded,
PID cycles in the parent chain don't hang, and a partially-stubbed
psutil (no Process attribute) degrades gracefully instead of crashing.
- New _fake_psutil_with_parent_chain helper builds a fuller stand-in
(Process / NoSuchProcess / AccessDenied + process_iter) than the
process_iter-only SimpleNamespace the older tests use.
Hardening:
- Broaden the except in the parent-walk to bare Exception. The original
fix listed (NoSuchProcess, AccessDenied, ValueError), but those names
are evaluated lazily during exception matching — if psutil is a partial
stub without the attribute, the exception handler itself raises
AttributeError that escapes. The function is documented as 'never raises'
(the surrounding update flow depends on it), so the broader catch keeps
the contract regardless of how the dependency is shaped.
AUTHOR_MAP:
- Map schepers.zander1@gmail.com -> Strontvod so the salvaged commit
resolves to @Strontvod in the release notes.
All 18 detect_concurrent + quarantine tests pass.
On Windows, the setuptools-generated hermes.exe launcher is a separate native
process that spawns python.exe (the interpreter running the update code).
os.getpid() returns the Python PID, but the launcher (which holds the file
lock) is the parent. Without walking the parent chain, every 'hermes update'
reports its own launcher as a concurrent instance - a false positive.
This patch builds an exclusion set containing the Python process and its
entire ancestor chain, so the running invocation never reports itself.
The new tests/docker/ suite (added by this PR) was being picked up by the
sharded pytest matrix in tests.yml, where its session-scoped `built_image`
fixture issued a 3-7min `docker build` under tests/docker/conftest.py's
180s pytest-timeout cap. Every test in the directory failed in fixture
setup across all 6 shards.
Fix the suite so it actually runs (not skips):
1. Wire the docker tests into docker-publish.yml's build-amd64 job, right
after the existing smoke test. The image is already loaded into the
local daemon as `nousresearch/hermes-agent:test`; set
HERMES_TEST_IMAGE to that and the fixture's pre-built-image branch
short-circuits the rebuild. 21 tests run in ~90s locally against a
prebuilt image, no rebuild cost on top of the existing build step.
2. Exclude tests/docker/ from scripts/run_tests_parallel.py's default
discovery so the sharded matrix in tests.yml stops trying to build
the image. Explicit positional paths (`pytest tests/docker/` or
`scripts/run_tests.sh tests/docker/`) still pick the suite up — the
skip rule honors directory-level user intent, matching the existing
per-file override pattern.
The dedicated docker-tests step runs on every PR that touches docker
code (the existing path filters on docker-publish.yml already cover
`tests/docker/**` via `**/*.py`), so the suite gates real changes.
(cherry picked from commit 4c481860ce)
After the supervise-perms fix lands, the s6 lifecycle actually works
for the hermes user — hermes -p <profile> gateway start now genuinely
brings the supervised gateway up rather than silently no-op'ing on
EACCES. That exposes a latent bug in this test's assertion: it
expected 'want up' to appear literally in s6-svstat output, but
s6-svstat elides redundancies — when the slot is currently up AND
s6 wants it up, the output is just 'up (pid N pgid N) X seconds';
the explicit 'want up' token only appears when current ≠ wanted
(e.g. 'down (exitcode 1) … , want up' on a crash-loop).
Add a small helper _svstat_wants_up() that reads the want-state
correctly across both spellings:
* 'up …' → wanted up (unless explicit 'want down')
* 'down …, want up' → wanted up explicitly
* 'down …' → wanted down
Both stop and start assertions now use the helper. Also rewords
the module docstring to acknowledge that the supervised process
may succeed OR crash-loop depending on environment, but the want-
state contract holds either way.
(cherry picked from commit 02c933aedc)
PR #30136 CI: test_dockerfile_entrypoint_routes_through_the_init failed
because the test hardcoded known_inits = ('tini', 'dumb-init',
'catatonit'). The PR replaced tini with s6-overlay's /init (which execs
s6-svscan as PID 1) — same SIGCHLD-reaping contract, different name,
so the substring scan against ENTRYPOINT missed it.
Two-part fix:
1. Extend the accepted token list to include 's6-overlay', 's6-svscan',
and '/init'. The contract these tests enforce is behavioural ('some
PID-1 init reaps SIGCHLD'), so the names list is purely a recognition
table and any reaper-capable family should qualify.
2. Harden test_dockerfile_installs_an_init_for_zombie_reaping (the
sibling check) against comment-only matches. It was scanning the full
Dockerfile text and only passed because the word 'tini' is still in
a historical comment explaining why we used to use it. The next
person to clean up that comment would have silently broken the test.
New _instruction_text() helper joins only the parsed, non-comment
Dockerfile instructions so stale comments can't satisfy the check.
(cherry picked from commit ffc1bb6393)
Resolves the explicit "Known follow-up" left by commit 2f8ceeab9 and
the resulting CI failures in tests/docker/test_dashboard.py and
tests/docker/test_s6_profile_gateway_integration.py.
The product gap
---------------
Every hermes runtime operation inside the container runs as the
hermes user (UID 10000) via s6-setuidgid. But s6-supervise — spawned
by s6-svscan running as PID 1 — creates each service's supervise/
and top-level event/ directories with mode 0700 owned by its
effective UID (root). That left every s6-svc / s6-svstat / s6-svwait
call from hermes hitting EACCES on the supervise/control FIFO and
supervise/status — i.e. the entire S6ServiceManager lifecycle
(register, start, stop, unregister) was inert in production.
The 2f8ceeab9 commit message called this out and deferred the fix.
The audit changes that landed alongside it (defaulting docker_exec
to -u hermes) made the integration tests reproduce the bug
deterministically; the fix below resolves it.
The fix: pre-create the supervise/ skeleton hermes-owned
----------------------------------------------------------
Reading s6's source (src/supervision/s6-supervise.c::trymkdir +
control_init), the mkdir and mkfifo calls that build the supervise
tree are EEXIST-safe: if the directory or FIFO is already present,
s6-supervise reuses it and skips the chown/chmod fix-up that would
normally make event/ 03730 root:root. So if we lay the skeleton
down with hermes ownership before triggering s6-svscanctl -a,
s6-supervise inherits our layout and never touches it. The
death_tally / lock / status regular files written later by
s6-supervise (still as root) land mode 0644 — world-readable —
which is all s6-svstat needs.
New module-level helper _seed_supervise_skeleton(svc_dir) in
hermes_cli/service_manager.py lays down:
svc_dir/event/ hermes:hermes 03730
svc_dir/supervise/ hermes:hermes 0755
svc_dir/supervise/event/ hermes:hermes 03730
svc_dir/supervise/control hermes:hermes 0660 (FIFO)
svc_dir/log/event/ hermes:hermes 03730 (if log/ present)
svc_dir/log/supervise/ hermes:hermes 0755
svc_dir/log/supervise/event/ hermes:hermes 03730
svc_dir/log/supervise/control hermes:hermes 0660 (FIFO)
The log/ branch matters because the logger is a second
s6-supervise instance — without it, unregister rmtree races on
the logger's root-owned supervise dir even after the parent
slot's supervise/ is hermes-owned. The helper is idempotent and
swallows PermissionError on chown so it works equally well when
called from root (cont-init.d) or hermes (runtime register).
Wiring
------
1. S6ServiceManager.register_profile_gateway calls
_seed_supervise_skeleton(tmp_dir) just before publishing the
slot via Path.replace. Runtime-registered profile gateways are
set up by hermes.
2. container_boot._register_service does the same in the cont-init.d
reconciliation path so boot-time-restored profile slots inherit
the same layout.
3. New cont-init.d/015-supervise-perms script chowns the supervise/
and event/ trees for STATIC s6-rc services (dashboard,
main-hermes). These are spawned by s6-rc before cont-init.d
gets to run, so the EEXIST-trick doesn't apply; we chown the
already-existing tree instead. s6-supervise keeps using the
same files; it never re-asserts ownership on a running service.
The script skips s6-overlay internal services (s6rc-*,
s6-linux-*) so the supervision tree itself stays root-only.
015- slot is intentional: lex-sorts between 01-hermes-setup
and 02-reconcile-profiles in the container's C-locale, so
the chown finishes before the reconciler walks the scandir.
Unregister teardown reordering
------------------------------
S6ServiceManager.unregister_profile_gateway now fires
s6-svscanctl -an BEFORE rmtree (with a 200ms grace), so
s6-svscan reaps the supervise child and releases its file
handles on supervise/lock + supervise/status before we try to
remove the directory. Previously rmtree raced s6-supervise on a
set of files inside the supervise dir, and even with the parent
supervise/ now hermes-owned, the contained files (death_tally,
lock, status, written by root) could still be in use.
Dashboard down-state redesign
-----------------------------
The original PR #30136 review fix wrote a 'down' marker file
into /run/service/dashboard/ via cont-init.d/03-dashboard-toggle.
That approach was broken in two ways:
(a) /run/service/dashboard is a symlink to a TRANSIENT
/run/s6-rc:s6-rc-init:<tmpdir>/ directory while s6-rc is
mid-transaction; the touch landed in a soon-to-be-discarded
tmp.
(b) Even when written to the final /run/s6-rc/servicedirs/
location, the 'down' file is only consulted by s6-supervise
at slot startup. s6-rc's user-bundle explicitly transitions
'dashboard' to 'up' on every boot, overriding any down
marker.
The right fix is the canonical s6 pattern: when HERMES_DASHBOARD
is unset, the dashboard run script exits 0 and a companion
finish script exits 125. Per s6-supervise(8), exit code 125 from
the finish script is the 'permanent failure, do not restart'
marker — equivalent to s6-svc -O. The slot reports as 'down' to
s6-svstat, matching the reality that no dashboard process is
running. When HERMES_DASHBOARD IS truthy, finish exits 0 and
restart-on-crash semantics apply.
03-dashboard-toggle is removed (its function is now subsumed by
the run/finish pair).
Tests
-----
Adds four unit tests for _seed_supervise_skeleton covering the
produced layout, the log/ subservice case, the skip-when-no-log
case, and idempotency. The live-container verification continues
to live in tests/docker/test_s6_profile_gateway_integration.py and
tests/docker/test_dashboard.py — both now pass against the
rebuilt image.
References
----------
* Skarnet skaware mailing list 2020-02-02 (Laurent Bercot
+ Guillermo Diaz Hartusch) on unprivileged s6 tool semantics:
http://skarnet.org/lists/skaware/1424.html
* just-containers/s6-overlay#130 — same EEXIST-preseed pattern,
community-validated 2016 onward
* https://skarnet.org/software/s6/servicedir.html — exit-code 125
semantics in finish scripts
(cherry picked from commit c41f908ad4)
Documents five approaches for adding tools beyond what the official
image ships with: npx/uvx for npm/Python tools, ad-hoc apt installs
that Hermes remembers, derived images for durability, sidecar
containers for multi-service stacks, and upstreaming via issue/PR
for broadly useful additions.
Follow-up to @benbarclay's #30136 salvage. The pre-existing PID-1
contract tests in tests/tools/test_dockerfile_pid1_reaping.py (added
with #15012) hardcoded tini/dumb-init/catatonit as the only accepted
inits, so they failed after #30136 replaced tini with s6-overlay's
/init.
s6-overlay's PID 1 is s6-svscan, which reaps zombies non-blockingly
on SIGCHLD — same contract the test exists to enforce. Two updates:
* test_dockerfile_installs_an_init_for_zombie_reaping — accept
's6-overlay' as a known-installed marker (matches the
s6-overlay install layer in Ben's Dockerfile).
* test_dockerfile_entrypoint_routes_through_the_init — accept
'/init' as a known-routed marker (s6-overlay's PID-1 binary
lives at /init by convention).
Both assertions still fire if a future Dockerfile rewrite drops
the init entirely. Local: 7/7 pass.
Two CI follow-ups to @benbarclay's #30136 salvage:
1. scripts/run_tests_parallel.py — add 'docker' to _SKIP_PARTS so
the new tests/docker/ harness doesn't run in the regular test (N)
matrix. The harness builds the real Dockerfile in a session
fixture, which can exceed pytest-timeout's 180s ceiling on
ubuntu-latest where Docker IS available — it surfaced as 6
identical setup-timeout failures across slices 1–6 on the first
CI run.
The docker harness has its own dedicated runner via
.github/actions/hermes-smoke-test (added in #30136) plus the
docker-lint workflow. Same treatment as tests/integration/ and
tests/e2e/ — runs separately, not in the main shards.
2. hermes_cli/service_manager.py — pin encoding='utf-8' on the
/proc/1/comm read_text call. Ruff PLW1514 enforcement rolled in
between Ben's last push and the salvage; pure ruff-fix, no
behavior change.
X Premium+ also grants Grok OAuth access — the 'SuperGrok Subscription'
wording suggested SuperGrok was the only entitlement path. Updated to
'SuperGrok / Premium+' across the picker label, setup wizard, auth flows,
and docs so Premium+ subscribers know the row applies to them too.
xAI's grok-imagine-image API returns ephemeral imgen.x.ai/xai-tmp-* URLs
that 404 within minutes — long before downstream consumers (Telegram
send_photo, browser preview, multi-tier delivery fallback) get a chance
to fetch them. The xAI image_gen provider was passing those URLs
through unchanged on the elif url: branch; b64 responses were already
cached locally via save_b64_image. Result: every image_generate call
on a Telegram-routed xai-oauth profile delivered no image, falling
through to text-only.
Adds agent.image_gen_provider.save_url_image() — a sibling helper to
save_b64_image that downloads URL bytes to $HERMES_HOME/cache/images/.
Content-type-aware extension inference with URL-suffix fallback;
oversize cap (25MB default) with partial-write cleanup; empty-body
refusal. Mirrors the audio_cache pattern used by text_to_speech.
Wires save_url_image into both the xAI and OpenAI providers' URL
branches. When the download fails (network blip, 404 in-flight) we
log a warning and fall back to the bare URL rather than turning the
tool call into a hard error — the gateway's existing URL-send fallback
then gets a chance to surface the original error legibly.
Test plan:
- tests/agent/test_save_url_image.py — 8 direct tests against a real
in-process HTTP server: bytes round-trip, content-type → extension,
URL-suffix fallback, default-to-png, 404 propagation, empty-body
refusal, oversize cap + cleanup, filename uniqueness.
- tests/plugins/image_gen/test_xai_provider.py — flip
test_successful_url_response (was asserting the bug), add
test_url_response_falls_back_to_bare_url_when_download_fails.
- tests/plugins/image_gen/test_openai_provider.py — symmetric pair.
160/160 in the broader image_gen test surface.
Follow-up to @benbarclay's Docker s6 PR (#30136). The Phase 4 hooks
`_maybe_register_gateway_service` and `_maybe_unregister_gateway_service`
were already documented as "no-op on host", but they reached that no-op
by:
1. importing `hermes_cli.service_manager`
2. calling `get_service_manager()` (which calls `detect_service_manager()`)
3. checking `mgr.supports_runtime_registration()` and returning False
If anything in step 1 or 2 raised an unexpected exception (e.g. a host
machine with a partial s6 install — `/proc/1/comm == s6-svscan` somehow,
but `/run/s6/basedir` absent, or vice versa), the `except Exception`
in the hook would print a confusing "⚠ Could not register s6 gateway
service: ..." warning on a non-container machine that has never touched
the container.
Reorder so `detect_service_manager() != "s6"` is checked FIRST, and
return silently for any detection failure. Host machines now:
- never import the s6 backend
- never call get_service_manager()
- never print an s6-shaped warning under any failure mode
E2E confirmed on host Linux (systemd):
`_maybe_register_gateway_service(...)` produces empty stdout,
detect_service_manager() returns "systemd".
Existing tests updated to patch `detect_service_manager` for the s6
call-through cases (they previously relied on get_service_manager
being the only gate, which is no longer true). Added one new test —
`test_register_silent_when_detect_throws` — asserting that a broken
detector cannot leak a warning to host users.
cc @benbarclay — visible behavior change vs. your branch is one
fewer code path on host. Test changes are minimal (one helper +
`_patch_detect_s6` opt-in per s6 test). Happy to revert if you
prefer the original shape.
Second migration of an existing built-in platform adapter after Discord
(PR #30591) — follows the same shape established by IRC / Teams / LINE /
Google Chat / SimpleX and the playbook in
`references/platform-plugin-migration.md`. Advances the umbrella refactor
in #3823.
Matches Discord's parity bar — adapter under `plugins/platforms/mattermost/`
with the standard `__init__.py` / `adapter.py` / `plugin.yaml` shell,
`register(ctx)` entry point, **no back-compat shim** at the old import
path, and full parity for all five hooks Discord uses plus the
`apply_yaml_config_fn` hook (mattermost is the second consumer of #25443
after Discord):
* `standalone_sender_fn` — out-of-process cron delivery via Mattermost
REST API. Picks up the thread_id + media_files capabilities the
legacy `_send_mattermost` lacked (parity with Discord's `_standalone_send`).
* `setup_fn` — interactive `hermes setup gateway` wizard.
* `apply_yaml_config_fn` — translates `config.yaml` `mattermost:` keys
(`require_mention`, `free_response_channels`, `allowed_channels`) into
`MATTERMOST_*` env vars (replaces the hardcoded block in
`gateway/config.py`).
* `is_connected` — declares connection state from `MATTERMOST_TOKEN` +
`MATTERMOST_URL`.
* `check_fn` — verifies aiohttp is installed and both required env vars
are set.
* plus `allowed_users_env`, `allow_all_env`, `cron_deliver_env_var`,
`max_message_length` (4000 — Mattermost practical limit), `emoji`,
`required_env`, `install_hint`.
Files
-----
* `gateway/platforms/mattermost.py` (873 LOC) →
`plugins/platforms/mattermost/adapter.py` (git rename, R071) +
appended `register()` block, hook helpers, and `_standalone_send`
with media upload + thread_id support.
* New `plugins/platforms/mattermost/{__init__.py, plugin.yaml}` with
`requires_env` / `optional_env` declarations covering MATTERMOST_URL,
MATTERMOST_TOKEN, MATTERMOST_ALLOWED_USERS, MATTERMOST_ALLOW_ALL_USERS,
MATTERMOST_HOME_CHANNEL, MATTERMOST_REPLY_MODE,
MATTERMOST_REQUIRE_MENTION, MATTERMOST_FREE_RESPONSE_CHANNELS,
MATTERMOST_ALLOWED_CHANNELS.
* `gateway/config.py`: delete 17-LOC `mattermost_cfg` YAML→env bridge
(moved into plugin's `_apply_yaml_config`).
* `gateway/run.py::_create_adapter`: delete `Platform.MATTERMOST elif` —
replaced by the existing generic plugin-registry-first dispatch.
* `tools/send_message_tool.py`: delete `_send_mattermost` (22 LOC) +
`Platform.MATTERMOST elif` in `_send_to_platform` — the `else` branch
already routes plugin platforms through `_send_via_adapter`, which
hits the registry's `standalone_sender_fn`.
* `hermes_cli/setup.py`: delete `_setup_mattermost` (44 LOC) — replaced
by the plugin's `interactive_setup`.
* `hermes_cli/gateway.py`: delete `_PLATFORMS["mattermost"]` dict entry
(3 LOC) — plugin's `setup_fn` is dispatched via the plugin path in
`_configure_platform`.
* Consumer rewrite: 5 test files (test_mattermost.py,
test_media_download_retry.py, test_send_multiple_images.py,
test_stream_consumer.py, test_ws_auth_retry.py) get
`gateway.platforms.mattermost` → `plugins.platforms.mattermost.adapter`
with the bulk-rewrite recipe from the platform-plugin-migration playbook.
Single `mock.patch` string in test_stream_consumer.py also repointed.
* `tests/tools/test_send_message_missing_platforms.py`: thin
`(token, extra, chat_id, message)` compat shim around the plugin's
`_standalone_send(pconfig, …)` so existing test bodies continue to
work without rewriting every signature.
Validation
----------
* Plugin discovery: mattermost registers from `plugins/platforms/mattermost/`
alongside discord / teams / irc / line / google_chat / simplex.
All 9 hooks present (setup_fn, standalone_sender_fn,
apply_yaml_config_fn, is_connected, check_fn, allowed_users_env,
allow_all_env, cron_deliver_env_var, max_message_length=4000).
* Mattermost-touching tests: 62/62 pass
(`test_mattermost.py` + `test_send_message_missing_platforms.py`).
* Targeted selectors (mattermost or platform_registry or stream_consumer
or ws_auth_retry or media_download_retry or send_multiple_images or
send_message_tool or platform_connected): 433/433 pass.
* Full sweep (`scripts/run_tests.sh tests/gateway/ tests/cron/
tests/tools/test_send_message_tool.py tests/tools/test_send_message_missing_platforms.py
tests/integration/`): **6220/6220 pass in 47.8s, 0 failures**.
* Lint: ruff clean on all touched files.
* Git identity verified: kshitijk4poor.
* Rename detection: R071 (similarity dropped from a hypothetical R09x
by the ~320-line appended register block — ~36% growth over the
873-LoC base, vs Discord's 5101 LoC base which kept R091).
Closes part of #3823.
PR #30136 review item O7: the plan doc was 3,191 lines — 5x the
size of any other plan in docs/plans/ and the largest reference
document in the repo. With the implementation shipped, most of
that content is either:
* The phase-by-phase TDD walkthrough (~2,800 lines): now canonical
in the PR commit log (`git log a957ef083..a6f7171a5`).
* The v2/v3 re-validation preambles: artifacts of the planning
process, no longer load-bearing.
* The full Open Questions deliberations with options A/B/C laid
out: collapsed into the Decision Log.
* The Rollout Plan and Estimated Timeline: history.
Trim to ~430 lines covering what readers actually need going
forward: the goal, architecture, scope, key design decisions
(D1–D9), risk register (now including the three risks surfaced
in PR review — `_s6_running` detection, svscanctl FIFO perms,
supervise control FIFO perms), the decision log including the
post-merge additions, and the verification checklist (now all
boxes ticked).
Header now reads 'Status: shipped' and points at the PR. The git
history preserves the full v3 plan for anyone who needs it.
PR #30136 review item O6: test_container_restart.py used fixed
`time.sleep(8)` calls after `docker restart` to wait for the
cont-init reconciler to finish. Fixed sleeps are slow when the
event happens fast and false-fail when the event happens slow.
Replace with two polling helpers:
* `_wait_for_path(container, path, kind='f' | 'd', deadline_s=...)`
— generic `test -f/-d` poller. Returns True on success, False on
timeout; callers assert with a clear message.
* `_wait_for_reconcile_log_mention(container, profile, ...)` — the
reconciler's per-profile log line is the canonical signal that
the cont-init reconcile has finished for that profile. Poll on
it instead of a sleep that hopes 8 seconds is enough.
The fixture-level setup wait is similarly migrated: it now polls
for `profile=default` in the boot log (every container always
gets a default-slot entry per item I1) and raises a clear timeout
error from the fixture if the container never finishes cont-init —
much better diagnostics than a mid-test KeyError.
The remaining `time.sleep()` calls are all internal interval_s
between probe attempts; no fixed wait points left.
PR #30136 review item O5: docker/entrypoint.sh is now a thin shim
that forwards to stage2-hook.sh — the real ENTRYPOINT is /init plus
main-wrapper.sh. External scripts that hard-coded entrypoint.sh as
the container's ENTRYPOINT will see the cont-init bootstrap happen
but the CMD will not be exec'd (because stage2-hook only handles
bootstrap; main-wrapper.sh handles the CMD passthrough).
Add a stderr warning explaining the new contract and pointing
callers at the migration path (drop the --entrypoint override).
The shim itself stays in place for one release cycle so the
deprecation isn't a hard break — anyone still invoking it sees
the warning in their logs and has time to migrate.
PR #30136 review noted the asymmetry: `register_profile_gateway`
used tmp_dir + rename to publish a new service slot atomically,
but the boot-time reconciler wrote files into the slot directly.
Same underlying concern (a concurrent s6-svscan rescan could
observe a half-populated directory), different code path.
Rewrite `container_boot._register_service` to mirror the manager:
build everything in `<scandir>/gateway-<profile>.tmp/`, then
`Path.replace` into place. If a previous interrupted run left a
`.tmp` sibling, it's cleaned up before the new build starts. If
the target already exists, it's removed before the rename so
`Path.replace` doesn't error on a non-empty target (Linux `rename`
overwrites empty targets only).
Three new tests: atomic publication leaves no .tmp leftovers,
overwriting an existing slot still leaves no .tmp leftovers, and
a stale .tmp from an interrupted run is cleaned up automatically.
PR #30136 review noted: container-boot.log was append-only with no
rotation. On a long-lived container with frequent restarts and
many profiles it would grow unboundedly (~80 B per profile per
reconcile pass).
Add a soft cap: when the file size hits 256 KiB (`_LOG_ROTATE_BYTES`,
≈3000 reconcile lines, ≈1 year of daily reboots × 5 profiles), the
current file is renamed to `container-boot.log.1` (replacing any
existing one) before new entries are appended. Worst case is two
files at ~512 KiB — well within visibility limits for grep/cat.
Rotation is intentionally simple (no logrotate or s6-log machinery
for one append-only file). Failures during rotation are logged via
the module logger and treated as non-fatal — we keep appending to
the existing file rather than dropping the reconcile entry. Three
new unit tests cover above-threshold rotation, below-threshold
non-rotation, and overwrite of an existing .1 file.
PR #30136 review caught: three `s6-setuidgid hermes sh -c "..."`
invocations in stage2-hook.sh interpolated $HERMES_HOME into a
nested shell context. Practically low-risk (a malicious HERMES_HOME
already requires container-launch privileges) but the cleaner
pattern is to invoke commands directly so the shell isn't a second
interpreter.
* `mkdir -p` of the data subdirs now runs directly via s6-setuidgid,
one path per arg.
* The .install_method stamp is written via `printf | tee` — also no
shell wrapper.
* The skills_sync invocation uses the venv's python by absolute path
instead of sourcing activate inside a shell. skills_sync.py doesn't
need anything from activate beyond sys.path, which the bin-stub
python already provides.
No behavior change. Just a smaller attack surface and a script
that's easier to read.
PR #30136 review caught: `_allocate_gateway_port()` in profiles.py
computed a SHA-256-derived port that was threaded through
`register_profile_gateway(profile, port=N)` →
`_render_run_script(profile, port, extra_env)` → and then **ignored**.
The rendered run script picked the bind port from the profile's
config.yaml (`[gateway] port = …`), never from the allocator. So
the entire allocator + parameter chain was dead code.
Remove:
* `hermes_cli.profiles._allocate_gateway_port` (deterministic
SHA-256 → [9200, 9800) — never used).
* `port` kwarg from `ServiceManager.register_profile_gateway`
(Protocol + Mixin + S6 implementation).
* `port` positional arg from `_render_run_script(profile, port,
extra_env)` — now `_render_run_script(profile, extra_env)`.
* The pass-through call in `profiles._maybe_register_gateway_service`.
config.yaml is now the single source of truth for gateway port
selection — matches reality and reduces the API surface. Three
explanatory comments in service_manager.py / profiles.py document
the retirement so future readers don't reach for the allocator and
find a ghost.
Tests: drop the three `_allocate_gateway_port` tests; update
fakes' signatures throughout test_service_manager.py and
test_profiles_s6_hooks.py to match the new no-port API.
PR #30136 review caught: docker-compose.yml still said "If you
override entrypoint, keep /opt/hermes/docker/entrypoint.sh in the
command chain." That was true under tini; under s6-overlay the
entrypoint is /init plus main-wrapper.sh, and entrypoint.sh is now
only a backward-compat shim.
Replace with an accurate description: /init must remain first in the
chain because it's PID 1 and runs the cont-init.d scripts (chown,
profile reconcile, dashboard toggle) before any service starts.
PR #30136 review caught a false positive: when HERMES_DASHBOARD was
unset, the dashboard run script did `exec sleep infinity`, so
`s6-svstat /run/service/dashboard` reported the slot as 'up'.
`hermes doctor` and any other s6-svstat-based health check saw the
dashboard as supervised-running even though no dashboard process
existed.
Add cont-init.d/03-dashboard-toggle: writes a `down` marker file
into `/run/service/dashboard/` when HERMES_DASHBOARD is falsy,
removes any leftover marker when it's truthy. s6-supervise honors
`down` by not starting the service, so s6-svstat reports 'down' —
matching reality.
The run script's HERMES_DASHBOARD case-statement stays in place as
a belt-and-suspenders guard, so the two layers can never disagree.
Two new integration tests lock the behavior: slot reports down
when unset; slot reports up when set to 1.
PR #30136 review caught: `S6ServiceManager.start/stop/restart` called
`subprocess.run(check=True)` on `s6-svc`, so any failure surfaced as
a raw `CalledProcessError` traceback. The two cases operators
actually hit are:
1. The service slot doesn't exist — most commonly because the user
typed a profile name wrong (`hermes -p typo gateway start`).
2. s6-svc itself fails — most commonly EACCES on the supervise
control FIFO when running unprivileged.
Both deserve named errors with actionable messages, not stacktraces.
Changes:
* Add `S6Error` base + two concrete errors in `hermes_cli.service_manager`:
- `GatewayNotRegisteredError(profile)` — carries the unprefixed
profile name; message: `no such gateway 'typo': register it
with `hermes profile create typo` first, or pass an existing
profile name via `-p <name>``.
- `S6CommandError(service, action, returncode, stderr)` — carries
the s6-svc rc and stderr; message: `s6-svc start on
'gateway-coder' failed (rc=111): <stderr>`.
* Factor lifecycle dispatch through `_run_svc(flag, label, name)`:
pre-checks that the service directory exists (raises
GatewayNotRegisteredError before invoking s6-svc), then runs
s6-svc and translates any CalledProcessError into S6CommandError.
* `_dispatch_via_service_manager_if_s6` in `hermes_cli.gateway`
catches both errors and prints `✗ <message>` + `sys.exit(1)`
instead of letting the exception bubble. The dispatch path that
used to dump a traceback at the user now gives an actionable
one-liner.
Tests: 6 new tests for the error types and their CLI rendering;
existing lifecycle test pre-seeds the slot directory before calling
`mgr.start` etc.
PR #30136 review caught: `hermes gateway start` (no `-p`) inside
the container resolves `_profile_suffix() == ""` → service name
`gateway-default`, but no such slot was ever registered. The Phase 4
profile-create hook only fired on `hermes profile create <name>`,
and the root profile (which lives at the top of $HERMES_HOME, not
under `profiles/`) was never one of those. So bare `hermes gateway
start` landed on `s6-svc -u /run/service/gateway-default` →
uncaught `CalledProcessError` → traceback to the user.
Changes:
1. `reconcile_profile_gateways` now always registers a
`gateway-default` slot before iterating named profiles. Its
prior state is read from `$HERMES_HOME/gateway_state.json`
(sibling to the profile root, not under `profiles/`); stale
runtime files there are swept the same way. Auto-up only if the
prior state was `running` — same rule as named profiles.
2. `S6ServiceManager._render_run_script` special-cases
`profile == "default"` to emit `hermes gateway run` with NO
`-p` flag. Passing `-p default` would resolve to
`$HERMES_HOME/profiles/default/` — a different profile that
almost certainly doesn't exist. The empty profile-suffix
convention is the dispatcher's contract and the run script has
to match.
3. A user-created `profiles/default/` collides with the reserved
root-profile slot; the reconciler now skips it with a warning
rather than producing two registrations of the same service name.
Action-list ordering is stable: `default` first, then named
profiles in directory order. Boot-log readers can rely on this.
Tests: 8 new dedicated default-slot tests plus updates to every
existing test that asserted against the action list (via the new
`_named_actions` helper that drops the always-present default
entry).
PR #30136 review caught that website/docs/user-guide/docker.md still
said "The dashboard side-process is **not supervised** — if it
crashes, it stays down until the container restarts." That was true
under tini but is the opposite of the s6 behavior this PR ships and
`test_dashboard_restarts_after_crash` proves.
Replace with a description of what users actually see now: automatic
restart by s6-overlay, new PID after a short backoff, logs via
`docker logs`. The standalone-container caveat carries forward
unchanged.
PR #30136 review caught that `hermes gateway stop --all` and
`... restart --all` were broken under s6. The Phase 4 dispatcher was
gated on `not stop_all` (and the symmetric restart_all), so `--all`
fell through to `kill_gateway_processes(all_profiles=True)`. pkill
SIGTERMed every gateway, s6-supervise observed the crashes, and
restarted every gateway ~1s later — net effect: `--all` *kicked*
gateways instead of *stopping* them.
Add `_dispatch_all_via_service_manager_if_s6(action)` that iterates
`mgr.list_profile_gateways()` and routes stop/restart through each
service slot. s6's `want up`/`want down` flips correctly, so a
stop persists. Partial failures are surfaced per-profile with a
running success count; the host pkill path is only reached when s6
isn't in play.
`start --all` isn't a CLI surface — the helper rejects it and
returns False (host code path can take over).
PR #30136 review caught a silent regression: the smoke-test action
overrode ENTRYPOINT to `/opt/hermes/docker/entrypoint.sh`, which the
s6-overlay migration reduced to a shim that just `exec`s the stage2
hook. stage2-hook ignores its CMD args, prints "Setup complete", and
exits 0 — so `hermes --help` and `hermes dashboard --help` never
ran. The #9153 regression guard was a green-always no-op.
Drop the override so the smoke test uses the image's real ENTRYPOINT
chain (`/init` + `main-wrapper.sh`), which is the actual production
startup path. `hermes --help` and `hermes dashboard --help` now run
through the full supervision tree and exercise the real argv routing.
PR #30136 review flagged the s6-overlay install as a supply-chain
regression vs the gosu source it replaced — `tianon/gosu` was
digest-pinned via `FROM ...@sha256:...`, but the three new
ADD/curl downloads had no integrity check at all.
Pin all three tarballs (noarch, symlinks-noarch, per-arch) to
upstream-published SHA256s via ARGs. Verification happens via
`sha256sum -c` against a single checksum file (avoids a piped-shell
hadolint DL4006 warning under dash). To bump S6_OVERLAY_VERSION,
fetch the four `.sha256` files from the new release and update
the ARGs — documented inline.
If upstream artifacts are tampered with mid-build, the build now
fails loudly at the verification step instead of silently
producing a tainted image.
The Dockerfile only ADD'd `s6-overlay-x86_64.tar.xz`, so the
`build-arm64` job in docker-publish.yml — which runs on
`ubuntu-24.04-arm` and publishes by digest — produced an image whose
`/init` couldn't exec on actual arm64 hosts. Apple Silicon and ARM
server users were getting a broken container.
Map BuildKit's `TARGETARCH` (`amd64` / `arm64`) to s6's kernel-arch
naming (`x86_64` / `aarch64`) inside the RUN step and fetch the
correct tarball via `curl` (`ADD`'s URL is evaluated at parse time,
before TARGETARCH substitution, so dynamic arch selection requires
RUN). The noarch + symlinks tarballs are architecture-independent
and stay as ADDs.
The audit case is now explicit: unsupported architectures fail loudly
at build time rather than producing a silently-broken image.
PR #30136 review surfaced two issues, both rooted in the same audit gap:
docker integration tests were running as root, not the unprivileged
`hermes` user (UID 10000) that the runtime actually uses via
`s6-setuidgid hermes`. Anything that probed PID-1 state or wrote to
the s6 control surface worked as root in the tests but was inert in
production.
Fixes:
1. `_s6_running()` previously called `Path("/proc/1/exe").resolve()`,
which is root-only readable. For UID 10000 the symlink yields
PermissionError, `resolve()` silently returns the unresolved path,
and `exe.name == "exe"` — so detection always returned False, the
service-manager runtime-registration path was inert, and every
`hermes profile create` / `hermes -p X gateway start` silently
skipped the s6 hook. Replace with `/proc/1/comm` (world-readable)
+ `/run/s6/basedir` (s6-overlay-specific) — both required, fail
closed.
2. `02-reconcile-profiles` now also chowns `/run/service/.s6-svscan/`
{control,lock} to hermes so `s6-svscanctl -a/-an` works without
root. Previously the directory chown stopped at `/run/service`
and the FIFO inside stayed root-owned, so `register_profile_gateway`
from hermes failed at the rescan-trigger step with EACCES — the
wrapper in profiles.py caught the exception and printed a swallowed
warning, so profile creation appeared to succeed while the slot
was rolled back.
Audit changes to flush this class of bug next time:
- Add `docker_exec` / `docker_exec_sh` helpers to `tests/docker/conftest.py`
that default to `-u hermes`. The module docstring explains why and
flags `user="root"` as opt-in only for tests that explicitly need
root (none currently do).
- Refactor every `docker exec` call in tests/docker/ through the new
helpers (test_dashboard.py, test_zombie_reaping.py, test_profile_gateway.py,
test_container_restart.py, test_s6_profile_gateway_integration.py).
- Add 5 unit tests covering `_s6_running` under various probe states
(both signals present; comm wrong; basedir missing; PermissionError
on /proc/1/comm; missing /proc — non-Linux). The PermissionError
test is the explicit regression guard for the original bug.
Known follow-up: the per-service `supervise/control` FIFO inside each
`/run/service/gateway-<profile>/supervise/` is created root-owned by
s6-supervise (which runs as root because s6-svscan is PID 1). `s6-svc
-u/-d/-t` from the hermes user will get EACCES on those. The audit
under `-u hermes` will reveal this in lifecycle tests — surfacing the
issue cleanly so it can be fixed in a focused follow-up (likely via a
small SUID helper or a polling chown loop in cont-init.d). The
detection + svscanctl fixes here are independent and complete on
their own.
The s6-overlay migration replaced every runtime use of gosu with
s6-setuidgid (in stage2-hook.sh, main-wrapper.sh, per-service run
scripts, and cont-init.d hooks), but the gosu binary itself was still
being copied into the image from tianon/gosu, and several comments
across the repo still pointed to it.
Image changes:
- Drop the FROM tianon/gosu:1.19-trixie AS gosu_source stage
- Drop the COPY --from=gosu_source /gosu /usr/local/bin/ layer
- Net: one fewer base-image pull, ~12-15 MB layer eliminated
Documentation/comment refresh (no behavior change):
- Dockerfile: update root-user rationale comment + cont-init.d comment
- docker/main-wrapper.sh: drop "pre-s6 contract (gosu drop)" reference
- docker-compose.yml: update UID/GID remap comment
- .hadolint.yaml: update DL3002 ignore rationale
- website/docs/user-guide/docker.md: privilege-drop helper is s6-setuidgid now
- hermes_cli/config.py: docker_run_as_host_user docstring
tools/environments/docker.py runs *arbitrary user images* via the
terminal backend, not the bundled Hermes image. It still needs SETUID/
SETGID caps so user images that use gosu/su/s6-setuidgid all work.
Renamed the cap-list constant _GOSU_CAP_ARGS → _PRIVDROP_CAP_ARGS and
updated comments to list s6-setuidgid alongside the others as examples.
The matching test (test_security_args_include_setuid_setgid_for_gosu_drop
→ test_security_args_include_setuid_setgid_for_privdrop) was renamed
and its docstring updated; behavior is unchanged.
Verification:
- hadolint clean against .hadolint.yaml
- shellcheck clean against all docker/ shell scripts
- Image rebuilt successfully (sha 1a090924ccea)
- Docker harness: 19 passed in 41.87s (every Phase 0 test + Phase 4
per-profile-gateway lifecycle + container-restart reconciliation)
- tests/tools/test_docker_environment.py: 23 passed (rename did not
break test discovery; pre-existing unrelated mock warning)
The plan document (docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md)
intentionally retains its historical references to gosu — it describes
the pre-s6 entrypoint as background for understanding the migration.
Phase 5 of the s6-overlay supervision plan. Documentation + small
diagnostic cleanups; no behavior changes.
website/docs/user-guide/docker.md:
- Replace the old 'entrypoint script does the bootstrap' section
with the s6-overlay boot flow (cont-init.d/01-hermes-setup,
cont-init.d/02-reconcile-profiles, static main-hermes + dashboard
services, ENTRYPOINT-as-main-program pattern).
- Add a 'Per-profile gateway supervision' subsection covering the
new lifecycle commands, restart semantics, log persistence, and
'Manager: s6 (container supervisor)' status reporting.
- Add 'Breaking change vs. pre-s6 images' callout naming the
/init ENTRYPOINT and pointing affected wrappers at the pin
workaround.
website/docs/user-guide/profiles.md:
- Add a note under 'Persistent services' pointing container users
at the docker.md section explaining s6 supervision inside the
image. Host-side systemd/launchd documentation is unchanged.
skills/software-development/hermes-s6-container-supervision/SKILL.md:
- New maintainer skill covering the supervision-tree map, file
layout, the Architecture B rationale (cont-init.d args + halt
exit-code propagation), quick recipes, and the 8 pitfalls we hit
while implementing the plan (PATH-without-/command, root-owned
profile dirs, SOUL.md as marker, the '143' anti-pattern, etc.).
hermes_cli/doctor.py:
- _check_gateway_service_linger skips on s6 (the linger concept
doesn't apply inside the container).
- New _check_s6_supervision section reports main-hermes/dashboard
state and per-profile-gateway count (registered vs supervised
up), only inside the s6 container. Host doctor output unchanged.
- External Tools / Docker check no longer emits a 'docker not
found' warning inside the container; prints an explanatory
info line instead. Still respects an explicit TERMINAL_ENV=docker
(in case the user mounted /var/run/docker.sock).
hermes_cli/gateway.py:
- Document _container_systemd_operational more precisely: it's
NOT for our Hermes Docker image (s6-overlay handles that via
detect_service_manager() == 's6'). It still covers
systemd-nspawn / k8s-with-systemd-init cases, so leaving it in
place is correct; the docstring just makes that explicit.
Test harness (verification, no test changes in this commit):
19 passed, 0 xfailed. 66 service-manager / container-boot /
profiles-s6-hooks / gateway-s6-dispatch unit tests still green.
61 doctor tests still green. Hadolint + shellcheck clean.
Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
Phase 4 of the s6-overlay supervision plan. Activates the Phase 3
S6ServiceManager by hooking it into the profile lifecycle and the
`hermes gateway start/stop/restart` dispatcher, and adds a cont-
init.d-time reconciliation pass that survives `docker restart`.
Task 4.0 — container-boot reconciliation:
/run/service/ is tmpfs, so every `docker restart` wipes every
per-profile gateway slot. /etc/cont-init.d/02-reconcile-profiles
invokes hermes_cli.container_boot.reconcile_profile_gateways() on
every boot, which walks $HERMES_HOME/profiles/<name>/, reads each
gateway_state.json, recreates the s6 service slot, and auto-starts
only those whose last state was 'running'. Other states
(stopped, starting, startup_failed, missing) register the slot
in the down state — avoiding crash-loops across restarts for a
gateway that was broken last boot. Per-profile outcome is recorded
to $HERMES_HOME/logs/container-boot.log.
Implementation: hermes_cli/container_boot.py + 12 unit tests.
Profile-marker is SOUL.md, not config.yaml, because `hermes profile
create` only seeds SOUL.md by default (config.yaml comes from
`hermes setup`).
Task 4.1 / 4.2 — profile create/delete hooks:
hermes_cli/profiles.py::create_profile now calls
_maybe_register_gateway_service(<canon>) at the end, which routes
through ServiceManager.register_profile_gateway when running on s6
and no-ops on host backends. delete_profile mirrors with
_maybe_unregister_gateway_service. _allocate_gateway_port produces
a deterministic SHA-256-derived port in [9200, 9800).
Task 4.3 — gateway dispatch + remove rejection arms:
_dispatch_via_service_manager_if_s6(action) intercepts
start/stop/restart at the top of each subcommand and routes them
through S6ServiceManager.{start,stop,restart}. The pre-Phase-4
`elif is_container():` rejection arms are kept as fallback for
pre-s6 containers / unsupported runtimes, but only ever fire when
detect_service_manager() != 's6'. install/uninstall under s6
print informational guidance pointing users at profile create/delete.
Removed the two xfail(strict=True) markers from
tests/docker/test_profile_gateway.py — both tests now pass strictly.
Task 4.4 — status reporting:
get_gateway_runtime_snapshot() reports
Manager: 's6 (container supervisor)' inside an s6 container instead
of 'docker (foreground)'.
Plan-vs-reality drift fixed in this commit:
- Plan's S6ServiceManager._render_run_script used
`gateway start --foreground --port {port}` — invented args; the
real CLI is `gateway run`. Switched accordingly. port arg
retained for API parity but now documented as 'currently ignored'.
- Plan's reconciler keyed on config.yaml; switched to SOUL.md
(config.yaml is created by hermes setup, not by hermes profile
create, so the original gate caught nothing).
- The plan's _dispatch helper used _profile_arg() which returns
'--profile <name>' (i.e. with the flag prefix). Switched to
_profile_suffix() which returns the bare name.
- Architecture B's docker exec doesn't get /command on PATH or
the venv on PATH; Dockerfile's runtime PATH now includes
/opt/hermes/.venv/bin so 'docker exec <c> hermes ...' works
without sourcing the venv.
- stage2-hook now chowns $HERMES_HOME/profiles to hermes on every
boot, not just on the UID-remap path. Without this, files created
by docker-exec-as-root accumulate and the next reconciler run
fails with PermissionError reading SOUL.md.
Test harness:
19 passed, 0 xfailed (the two pre-Phase-4 xfail targets flip to
passing). 78 unit tests across service_manager + container_boot +
profiles_s6_hooks + gateway_s6_dispatch. Hadolint + shellcheck
pass cleanly.
Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
Phase 3 of the s6-overlay supervision plan. Implements the runtime-
registration surface from D4 — only the s6 backend supports
register_profile_gateway / unregister_profile_gateway /
list_profile_gateways; host backends continue to raise
NotImplementedError. No caller yet (Phase 4 wires in the profile
create/delete hooks).
Key implementation notes:
- Service directory shape: /run/service/gateway-<profile>/{type,run,log/run}.
Atomic register: write to gateway-<profile>.tmp, fsync via
os.rename. Cleanup on rescan failure.
- Run script uses #!/command/with-contenv sh so HERMES_HOME and any
extra_env arrive at exec time. The hermes -p <profile> gateway
start --foreground --port <port> command is wrapped in
s6-setuidgid hermes for the per-service privilege drop (OQ2-A).
- Log script (OQ8-C): persists via s6-log to
${HERMES_HOME}/logs/gateways/<profile>/. CRITICAL — HERMES_HOME is
a runtime env-var expansion in the rendered script, NOT a Python
f-string substitution. Negative-asserted in
test_s6_register_creates_service_dir_and_triggers_scan so
regressions are caught.
- PATH gotcha: /command/ is only on PATH for processes spawned by
the supervision tree (services, cont-init.d). `docker exec` and
profile-create hooks don't get it. S6ServiceManager calls all
s6-* binaries via absolute path through the new _S6_BIN_DIR
constant so callers don't have to fix up env vars.
- validate_profile_name rejects path-traversal, leading-dash (s6
would parse as a flag), uppercase, whitespace, and names >251
chars (s6-svscan default name_max).
Test coverage:
- 13 new unit tests in tests/hermes_cli/test_service_manager.py
(kind detection, run-script content, env quoting, register
rollback on rescan failure, unregister idempotence, list filter,
lifecycle dispatch, svstat parsing). Total: 36 passing.
- 2 new in-container integration tests in
tests/docker/test_s6_profile_gateway_integration.py validating
end-to-end registration against a real s6 supervision tree.
Docker harness: 14 passed, 2 xfailed (Phase 4 target unchanged).
Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
BREAKING CHANGE: the container ENTRYPOINT is now /init (s6-overlay)
instead of /usr/bin/tini. Main hermes runs as the container CMD with
TTY inherited (preserving --tui), dashboard runs as a supervised s6-rc
service (HERMES_DASHBOARD=1 starts it; crashes auto-restart), and the
ground is laid for per-profile gateway supervision (Phase 3+4).
All five pre-s6 docker run invocation patterns continue to work
identically — verified by the Phase 0 docker harness:
docker run <image> → `hermes` with no args
docker run <image> chat -q "..." → `hermes chat -q ...` passthrough
docker run <image> sleep infinity → `sleep infinity` direct
docker run <image> bash → interactive bash
docker run -it <image> --tui → interactive Ink TUI
Phase 2 harness result: 12 passed, 2 xfailed (Phase 4 target). Hadolint
+ shellcheck pass cleanly.
Architecture pivot from plan v3 (documented in main-hermes/run header):
the plan called for main hermes to be an s6-supervised service, but
two real s6-overlay v3 mechanics blocked that — cont-init.d scripts
receive no arguments (CMD args are not visible to stage2-hook), and
`/run/s6/basedir/bin/halt` after writing the exit code did not
propagate the desired exit code (container exits 143). We use the
s6-overlay-native CMD pattern instead: main-wrapper.sh is the
container's main program (ENTRYPOINT prepends it so leading-dash
args like --version aren't intercepted by /init), exec's the final
program with stdin/stdout/stderr inherited, and the program's exit
code becomes the container exit code. main-hermes is now a no-op
`sleep infinity` slot kept for future supervised-gateway-container
modes. This trades "supervised restart of main hermes" for arg-
parity with the pre-s6 contract — main hermes was already unsupervised
under tini, so we lose nothing functional. Dashboard supervision is
the only new guarantee added by this phase.
Files added:
docker/main-wrapper.sh # arg routing + s6-setuidgid drop
docker/stage2-hook.sh # gosu-equivalent + chown + seed
docker/s6-rc.d/main-hermes/{type,run,dependencies.d/base}
docker/s6-rc.d/dashboard/{type,run,dependencies.d/base}
docker/s6-rc.d/user/contents.d/{main-hermes,dashboard}
Files changed:
Dockerfile: tini → s6-overlay install + ENTRYPOINT flip + service wiring
docker/entrypoint.sh: thin shim to stage2-hook.sh for back-compat
tests/docker/test_dashboard.py: add test_dashboard_restarts_after_crash
Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
Phase 1 of the s6-overlay supervision plan. Pure-refactor addition:
introduces the abstract interface (with runtime_checkable Protocol),
detect_service_manager(), validate_profile_name(), and thin
SystemdServiceManager / LaunchdServiceManager / WindowsServiceManager
wrappers around the existing systemd_* / launchd_* / gateway_windows.*
module-level functions. No host call site was modified — host code
continues to use the existing functions directly; the protocol is for
new backend-agnostic code (Phase 4 profile create/delete hooks and the
Phase 4 s6 dispatch path in 'hermes gateway start/stop/restart').
WindowsServiceManager.install() forwards the v3 kwargs (start_now,
start_on_login, elevated_handoff) added in PRs #28169-adjacent so
non-Windows callers — there aren't any today — can opt in.
The s6 backend lands in Phase 3; until then get_service_manager()
raises a clear error if invoked on a host that detects as 's6'.
Phase 0.5 of the s6-overlay supervision plan. Catches Dockerfile and
shell-script regressions that the behavioral docker-publish smoke test
can't surface — unquoted variable expansions, silently-failing RUN
commands, missing apt-get clean, etc.
Both lint clean against the current (tini) Dockerfile + entrypoint.sh
at the configured thresholds (hadolint: warning, shellcheck: error).
Each ignore in .hadolint.yaml carries a one-line justification; the
shellcheck severity floor is documented in the workflow file.
Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
Two pre-existing baseline issues found while running the Phase 0 harness
against the tini image that need fixing before later phases can use the
harness as a behavior-parity oracle:
1. The autouse `_enforce_test_timeout` fixture in tests/conftest.py
hard-coded a 30s SIGALRM, which preempted any `pytest.mark.timeout`
marker (already honored by pytest-timeout). Honor the marker if
present; fall back to 30s otherwise. Docker harness tests carry a
180s marker applied at collection time in tests/docker/conftest.py.
2. test_dashboard_port_override polled via `ss -tlnp` / `netstat -tln`
— neither is installed in the Hermes image, so the probe trivially
failed even when the dashboard was bound. The dashboard also takes
8-15s to bind on cold image; the 5s sleep was insufficient. Replace
with a poll loop reading /proc/net/tcp directly (port 9120 = 0x23A0,
state 0A = LISTEN). Bump probe deadline to 60s and switch
test_dashboard_opt_in_starts to a similar poll for pgrep so we don't
regress to the same race.
Result: 11 passed, 2 xfailed (Phase 4 target) on tini image. Harness
now ready to serve as Phase 2's behavior-parity oracle.
The agent-test suite default is 30s; docker test_no_args (the dashboard
spin-up, the container restart) routinely take 60-90s. Without this
they intermittently fail in CI with TimeoutError.
Tasks 0.2-0.6 of the s6-overlay supervision plan. Locks the
user-visible behavior we must preserve through the Phase 2 init-
system swap:
- test_main_invocation.py (Task 0.2): docker run <image> with no
args, chat subcommand passthrough, bare executable passthrough,
bash pattern, exit-code propagation
- test_tui_passthrough.py (Task 0.3): TTY allocation via docker -t
using the host's script(1) for a PTY
- test_dashboard.py (Task 0.4): HERMES_DASHBOARD=1 opt-in,
HERMES_DASHBOARD_PORT override
- test_profile_gateway.py (Task 0.5): per-profile gateway
start/stop and profile-delete-stops-gateway. Both marked
xfail(strict=True) because the current tini image refuses
gateway lifecycle commands inside the container; Phase 4
Task 4.3 flips them to passing.
- test_zombie_reaping.py (Task 0.6): PID 1 reaps orphaned
zombies. tini does this today; s6-overlay's /init must
continue to.
Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
Task 0.1 of the s6-overlay supervision plan. Establishes the test
infrastructure for tests/docker/: skip-on-missing-Docker collection
hook, session-scoped image-build fixture (overridable via the
HERMES_TEST_IMAGE env var for faster local iteration), and a
container_name fixture that ensures cleanup on test exit.
Refs: docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md
Replace tini with s6-overlay as PID 1 in the Hermes Docker image so that
main hermes, the dashboard, and dynamically-created per-profile gateways
all run as supervised services. Includes container-boot reconciliation
(Task 4.0) so per-profile gateways survive docker restart.
Plan history:
- v1: 2026-05-07 — original design (subagent gateways scope)
- v2: 2026-05-18 — re-validated, scope narrowed to per-profile gateways,
WindowsServiceManager added to protocol
- v3: 2026-05-21 — re-validated in docker_s6 worktree, install-method
stamp preservation noted in Task 2.3, Task 4.0 added for container
restart survival
12.5 engineering days estimated across 7 phases.
Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point
to the plugin context API, **alongside** the existing config-driven
`tts.providers.<name>: type: command` registry from PR #17843. This is
additive — the command-provider surface stays as the primary way to
add a TTS backend.
The hook covers cases the shell-template grammar can't reasonably
express:
- Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
- Streaming synthesis (chunked Opus → voice-bubble delivery)
- Voice metadata API for the `hermes tools` picker
- OAuth-refreshing auth flows
None of the 10 inline built-in providers (`edge`, `openai`,
`elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`,
`kittentts`, `neutts`) are migrated to plugins. They stay inline. The
hook is for *new* engines that aren't built-in.
## Resolution order
The dispatcher's resolution order is the load-bearing invariant:
1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.**
2. `tts.provider` matches `tts.providers.<name>` with `command:` set
→ command-provider dispatch (PR #17843).
3. `tts.provider` matches a plugin-registered `TTSProvider`
→ plugin dispatch (new).
4. No match → falls through to Edge TTS default (legacy behavior).
Built-ins-always-win is enforced at THREE layers:
- Registry: `register_provider()` rejects shadowing names with a warning.
- Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in
names defensively before consulting the registry.
- Picker: `_plugin_tts_providers()` filters built-in shadows out of
the `hermes tools` row list defensively.
Command-providers-win-over-plugins is enforced at TWO layers:
- The caller in `text_to_speech_tool` checks
`_resolve_command_provider_config` first.
- `_dispatch_to_plugin_provider` re-checks for a same-name command
config defensively so a refactor of the caller can't silently break
the invariant.
## New files
- `agent/tts_provider.py` — `TTSProvider(ABC)` with `synthesize()` (required),
`list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`,
`voice_compatible` (all optional with sane defaults). Mirrors
`agent/image_gen_provider.py` shape.
- `agent/tts_registry.py` — `register_provider`/`get_provider`/`list_providers`
with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors
`agent/image_gen_registry.py` shape.
- `plugins/tts/...` directory ready for community plugins (none shipped).
## Modified files
- `hermes_cli/plugins.py` — `register_tts_provider()` method on
`PluginContext`. Matches the gating shape of
`register_image_gen_provider()` / `register_browser_provider()`.
- `tools/tts_tool.py` — `_dispatch_to_plugin_provider()` +
`_plugin_provider_is_voice_compatible()` + walrus-elif wiring into
the main dispatcher. Built-in elif chain untouched.
- `hermes_cli/tools_config.py` — `_plugin_tts_providers()` injects
plugin rows into the Text-to-Speech picker category alongside the
10 hardcoded built-in rows.
## Tests
- `tests/agent/test_tts_registry.py` — 47 tests covering registration,
lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression
test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from
`tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to
circular import constraints).
- `tests/tools/test_tts_plugin_dispatch.py` — 35 tests covering
built-in-always-wins, command-wins-over-plugin, plugin dispatch,
exception passthrough, voice_compatible helper.
- `tests/hermes_cli/test_tts_picker.py` — 10 tests covering the
picker surface, builtin shadowing defense, integration with
`_visible_providers`.
- `tests/hermes_cli/test_plugins_tts_registration.py` — 3 end-to-end
tests via `PluginManager.discover_and_load()`.
- `tests/plugins/tts/check_parity_vs_main.py` — 9-scenario subprocess
parity harness vs `origin/main`. The only intentional diff is
`fallback_edge → plugin` for the `plugin-installed` scenario.
## Verification
- 95/95 new tests pass.
- 170/170 pre-existing TTS tests (test_tts_command_providers,
test_tts_max_text_length, test_tts_speed, etc.) pass unchanged.
- Parity harness against `origin/main`: 8 OK + 1 expected DIFF.
- E2E smoke: a registered plugin's `synthesize()` is called via
`text_to_speech_tool` with the standard JSON envelope returned.
- Ruff clean on all touched files.
## Docs
- `website/docs/user-guide/features/tts.md` — new "Python plugin
providers" section with a decision table (command-provider vs
plugin), minimal plugin example, and the optional-hook reference.
- `website/docs/user-guide/features/plugins.md` — TTS row updated to
mention both surfaces (command-provider primary, plugin for
SDK/streaming).
Closes#30398
Two-layer redaction at the persistence boundary so credentials never reach
state.db, session_*.json, or compression:
1. agent/chat_completion_helpers.py :: build_assistant_message
- Redact assistant content before the message dict is constructed
(catches PATs / API keys the model inlines into natural language)
- Redact tool_call.function.arguments at the same site (catches secrets
inlined into tool args, e.g. terminal command=curl -H 'Authorization: ...')
Tool execution uses the raw API response object, not this dict, so
redacting the persisted shape is safe.
2. run_agent.py :: _save_session_log
- Add _redact_message_content() static helper that handles both string
content and OpenAI/Anthropic multimodal list-of-parts (image parts
pass through untouched, only text/content fields are redacted)
- Apply to every message + the cached system prompt before writing
session_*.json
Both layers respect HERMES_REDACT_SECRETS via redact_sensitive_text —
no-op when disabled.
Tests (TestSaveSessionLogRedactsSecrets, 4 cases):
- api key in tool content
- api key in user message
- api key in system prompt
- multimodal list-of-parts (image part preserved, text redacted)
Tests use an autouse fixture to force _REDACT_ENABLED=True because the
hermetic conftest defaults the env var to false.
Salvaged from PR #24758 by @vgocoder (build_assistant_message + session_log)
+ PR #19855 by @liuhao1024 (multimodal list helper, system_prompt redaction).
Kept only the redaction concern from #19855; its unrelated whatsapp npm
timeout + PATCH_SCHEMA changes are out of scope and dropped.
Refs #19798 (PAT leak via assistant inline mention), #19845 (session capture
credential leak).
Co-authored-by: liuhao1024 <liuhao03@bilibili.com>
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
The web dashboard's Anthropic OAuth helper wrote the credential file
straight to its final destination and relied on the process umask for
permissions. That left the dashboard-specific path weaker than the
existing auth writers, which already use owner-only permissions and
safer write semantics.
This change keeps the scope narrow: make the dashboard helper write via
a temp file + replace, chmod the final file to owner-only, and add a
focused regression test for both permission handling and atomic-write
behavior.
Constraint: Must preserve the existing dashboard OAuth flow and credential-pool side effects
Rejected: Broader auth-storage refactor | unnecessary scope for a single verified inconsistency
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep dashboard credential writes aligned with existing auth storage semantics; do not reintroduce direct write_text() here without matching chmod/atomic behavior
Tested: pytest -o addopts='' tests/hermes_cli/test_web_server_oauth_write.py tests/hermes_cli/test_web_server.py -q (78 passed)
Not-tested: Cross-platform permission semantics on Windows-managed filesystems
The SETUP_HITS check matched any file ending in setup.py/setup.cfg/
sitecustomize.py/usercustomize.py at any path depth. This produced
false positives on every PR touching hermes_cli/setup.py (the CLI
setup wizard), which is unrelated to pip/site install hooks.
Only the top-level setup.py/setup.cfg execute during 'pip install',
and only top-level sitecustomize.py/usercustomize.py are auto-loaded
by site.py at interpreter startup. Anchor the regex with '^' so only
repo-root matches fire.
Symptom: PR #30916 (Mattermost plugin migration) flagged purely
because it deletes _setup_mattermost() from hermes_cli/setup.py.
Discord migration (#30591) hit the same false positive yesterday.
_write_claude_code_credentials wrote ~/.claude/.credentials.json via
Path.write_text + replace + post-write chmod(0o600). Both the temp file
and the destination briefly inherited the process umask (commonly 0o644
= world-readable) between create/replace and chmod, exposing the OAuth
access/refresh tokens to other local users on multi-user hosts.
Use os.open with O_WRONLY|O_CREAT|O_EXCL and an explicit S_IRUSR|S_IWUSR
mode so the temp file is created atomically at 0o600. After os.replace,
the destination inherits the temp's mode, so the post-write chmod is no
longer needed. The temp name also gains a per-process random suffix to
avoid collisions between concurrent writers and stale leftovers from a
crashed prior write.
Parent dir (~/.claude/) is owned by Claude Code itself and shared with
its native auth, so we deliberately don't tighten its mode here (unlike
the mcp_oauth fix which owns its own subtree under HERMES_HOME).
Mirrors the fix shipped for agent/google_oauth.py in #19673 and the
parallel fix for tools/mcp_oauth.py in #21148.
Adds a regression test in TestWriteClaudeCodeCredentials asserting the
resulting file mode is 0o600 (skipped on Windows where POSIX mode bits
aren't enforced).
The write denylist already protects SSH keys, AWS, GPG, npm, PyPI,
Docker, Azure, and GitHub CLI credentials. Two common credential
stores were missing:
~/.git-credentials stores plaintext git tokens in the format
https://username:token@github.com when using git credential-store.
It is directly analogous to ~/.netrc which was already protected.
~/.config/gcloud/ contains Google Cloud OAuth tokens and service
account credentials. It is directly analogous to ~/.aws/ which
was already protected.
Under prompt injection, an agent could be instructed to overwrite
these files, destroying credentials or planting malicious ones.
Verified before and after with is_write_denied() on both paths.
PR #9020's salvage changed the /resume list footer from
'Use /resume <session id or title> to continue.' to
'Use /resume <number>, /resume <session id>, or /resume <session title> to continue.\n Example: /resume 2'.
test_resume_without_target_lists_recent_sessions still pinned the old
string verbatim and failed in CI. Relax to substring assertions that
allow both the new numbered footer and any future tweaks while still
verifying the hint is shown.
The numbered /resume feature added new i18n keys to en.yaml; the catalog parity
tests require every locale to carry matching keys and placeholders, so add
translations to all 15 supported locales.
Also unblock tests/cli/test_cli_resume_command.py:
- _make_cli stub now sets self.resume_display = 'minimal' since
_handle_resume_command (post-#31695) calls _display_resumed_history.
- mock_db.resolve_resume_session_id returns the input id (no compression
chain) so HERMES_SESSION_ID is set to a real string, not a MagicMock.
The gateway pairing directory (~/.hermes/pairing/) stores per-platform
access-control files (telegram-approved.json, discord-approved.json, etc.).
A prompt-injected agent using write_file could add arbitrary user IDs to an
approved file, granting persistent gateway access without going through the
pairing code flow — the same threat class that motivated protecting
webhook_subscriptions.json (#14157).
The pairing directory was not included in the original control-plane protection
because it postdates PR #14157. PR #30383 introduced the hashed-pending schema
and made the approved files the sole source of truth for gateway access, raising
the security sensitivity of the directory.
Apply the same mcp-tokens pattern: block writes to pairing/ and any path within
it, under both the active hermes_home and the root path (for profile-mode parity
with the fix in #30382).
Regression tests verify denial for pairing/telegram-approved.json,
pairing/discord-pending.json, and the directory itself, in both normal and
profile-mode layouts.
Issue #30768 reports that on native Windows PowerShell the destructive-slash
confirmation modal renders but never registers keypresses, leaving the user
unable to confirm or cancel /reset, /new, /clear, or /undo. The modal works
on macOS, Linux, and WSL; PR #23907 (merged May 11) replaced the
daemon-thread input() pattern with a prompt_toolkit-native keybinding modal
but the win32 input pipeline apparently doesn't dispatch keys to the
filter-conditioned handlers. The modal investigation is ongoing.
This change ships the immediate escape hatch: append `now`, `--yes`, or `-y`
to any destructive slash command to bypass the modal and run the action
immediately. Works on every platform without touching the broken Windows
code path.
/reset now -> reset, no modal
/new --yes my-session -> new session titled "my-session", no modal
/clear -y -> clear, no modal
/undo -y -> undo, no modal
The default behavior (modal prompts when approvals.destructive_slash_confirm
is True) is unchanged for users who don't pass a skip token.
Implementation:
- New classmethod HermesCLI._split_destructive_skip(text) -> (remainder, skip)
parses a destructive-slash command string, strips the leading "/cmd" word
and any recognized skip tokens (case-insensitive exact match, not substring),
and reports whether a skip was requested.
- HermesCLI._confirm_destructive_slash gains an optional cmd_original= arg.
When the arg contains a skip token, it returns "once" immediately —
before the gate check and before any modal rendering.
- The /clear, /new, /undo handlers in process_command pass cmd_original
through. /new additionally uses _split_destructive_skip to strip skip
tokens from the remaining text before deriving the session title, so
"/new now My Session" yields title="My Session" (not "now My Session").
Tests:
- 7 new unit tests in tests/cli/test_destructive_slash_confirm.py covering
the helper (recognized tokens, command-word stripping, case-insensitive
exact match, None/empty input) and the modal bypass (now and --yes both
skip; no-skip-token still consults the modal).
- 3 new integration tests in tests/cli/test_destructive_slash_inline_skip_e2e.py
driving HermesCLI.process_command end-to-end and asserting (a) new_session
is invoked, (b) the modal is never reached, (c) the skip token does not
leak into the session title, and (d) the no-skip-token path still reaches
the modal as a sanity check that we haven't accidentally short-circuited
the normal flow.
All 31 tests across the destructive-slash test surface pass.
Docs:
- website/docs/reference/slash-commands.md documents the new flags both in
the destructive-commands table and the dedicated approval section, with a
link back to issue #30768 explaining why the escape hatch exists.
Board defaults represent persistent project checkouts. Scratch workspaces
are auto-deleted on completion and must stay under the per-board scratch
root that resolve_workspace() creates. Inheriting default_workdir for a
scratch task pointed the cleanup path at the user's source tree — the
data-loss vector documented in #28818.
The containment guard in _cleanup_workspace (just added) is the safety
rail. This commit prevents the bad state from being created in the first
place: only persistent kinds (dir/worktree) inherit board defaults.
Tests updated to cover the new semantics: scratch with default_workdir
set keeps workspace_path=None; dir/worktree still inherits the board
default.
Salvaged from PR #31315 by @leeseoki0 — prevention layer on top of the
#28819 containment fix by @briandevans.
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Copilot review on PR #28819 flagged that `_is_managed_scratch_path` accepted
the entire `<kanban_home>/kanban` subtree as managed scratch storage. With
that, a task whose `workspace_kind='scratch'` and `workspace_path` was
mis-set to `<kanban_home>/kanban`, `.../kanban/logs`, or a board's
metadata directory (e.g. `.../kanban/boards/<slug>` without the
`workspaces/` child) would pass the containment guard and let task
completion `shutil.rmtree` Hermes' own DB, metadata, and log subtrees.
Tighten the guard:
* Allowed roots are now exclusively `workspaces/` directories — the
`HERMES_KANBAN_WORKSPACES_ROOT` override, `<kanban_home>/kanban/workspaces`,
and each `<kanban_home>/kanban/boards/<slug>/workspaces` discovered on
disk.
* Require strict descendancy: a path equal to a root itself is rejected
too, because deleting a workspaces root would wipe every task's scratch
dir at once.
Add a regression test covering the three Copilot-named attack paths
(kanban root, kanban/logs, board root without `workspaces/`) plus the
workspaces-root-itself case, and confirm the inner task-id dir still
matches.
A board's ``default_workdir`` (e.g. ``hermes kanban boards
set-default-workdir my-board /path/to/real/source``) is copied into
``tasks.workspace_path`` for tasks created without an explicit
``workspace_kind``. Those tasks default to ``workspace_kind='scratch'``,
so completion calls ``_cleanup_workspace`` and unconditionally runs
``shutil.rmtree(wp, ignore_errors=True)`` — deleting the user's real
source tree as if it were disposable scratch storage.
Add ``_is_managed_scratch_path()`` and gate ``_cleanup_workspace`` on
it: only delete paths under ``HERMES_KANBAN_WORKSPACES_ROOT`` (the
worker-side override the dispatcher injects) or under the active kanban
home's ``kanban/`` subtree (covering both the legacy default-board root
and per-board ``kanban/boards/<slug>/workspaces`` roots). Anything else
gets a warning log and is left alone, so a misconfigured
``default_workdir`` can no longer destroy user data on task completion.
Follow-up to 54e61f933. The plugin enablement gate calls
``entry.is_connected(probe_cfg)`` BEFORE ``env_enablement_fn`` runs,
and the probe is built as ``existing_cfg or PlatformConfig()`` — empty
extras, ``enabled=False``.
For plugins whose ``is_connected`` reads ``config.extra`` instead
of env vars directly, that probe is a misrepresentation of what the
platform will look like after enablement. Google Chat's
``_is_connected`` short-circuits on ``config.enabled`` and inspects
``config.extra["project_id"]`` / ``config.extra["subscription_name"]``
— both False on the default probe even when the user has set
``GOOGLE_CHAT_PROJECT_ID`` and ``GOOGLE_CHAT_SUBSCRIPTION_NAME``. Result:
Google Chat silently fails the gate on every env-var-only setup.
Build a candidate probe that mirrors what the platform will look like
post-enablement:
- pre-call ``env_enablement_fn`` and layer its result into the probe's
``extra`` (without mutating any existing platform config)
- pass ``enabled=True`` on the probe — we're asking "would this BE
configured if we let it in?" not "is it currently enabled?"
- reuse the same seeded extras when we commit the platform to
``config.platforms`` (avoids calling ``env_enablement_fn`` twice)
Discord/IRC/Teams/LINE/ntfy/Simplex ``_is_connected`` hooks read env
vars directly, so they are unaffected. This change only restores
Google Chat on env-var-only setups while keeping the original #31116
Discord-no-token block intact.
All 6 shipped ``env_enablement_fn`` implementations were audited and
are pure reads (no ``os.environ`` writes), so running them earlier in
the loop has no observable side effects.
Tests: 2 new in tests/gateway/test_platform_registry.py covering
extras-seeded-before-is_connected and don't-leak-extras-on-gate-fail.
693 tests across 11 adjacent suites pass (platform_registry, config,
google_chat, matrix, discord_connect, ntfy_plugin, simplex_plugin,
line_plugin, irc_adapter, teams, gateway_platform_gating).
Refs #31116.
The hardcoded constants in _display_resumed_history were exposed as
config in PR #4434; declare them in DEFAULT_CONFIG and the CLI fallback
dict so they show up in 'hermes config' diagnostics and the schema
validator.
- test_tool_calls_shown_as_summary: explicitly disable resume_skip_tool_only
(#4434 made True the default; the legacy assertion relied on tool-only
entries being rendered as a summary).
- test_tool_only_message_skipped_by_default: add coverage for the new
default skip behavior.
- test_resume_command_*: mock_db.resolve_resume_session_id now returns the
same id (no compression chain) so the post-#15000 redirect block doesn't
shove a MagicMock into HERMES_SESSION_ID.
The cherry-picked fix from #28605 inverts an existing test (an unknown
non-lobby thread_id no longer rewrites to the most-recent binding), but
that test only seeds two bindings and queries a third thread_id. Add a
second regression test that more closely mirrors the live failure mode:
seed exactly one prior binding, then query a brand-new thread_id and
assert recovery returns None — so the new topic is allowed to get its
own session row instead of being silently merged into the previous
topic's session.
Co-authored-by: Fábio Siqueira <fabioxxx@gmail.com>
Co-authored-by: dillweed <dillweed@users.noreply.github.com>
Companion to the GH-25255 incoming-strip fix from @hayka-pacha. Without
this, build_anthropic_kwargs unconditionally added 'mcp_' to every tool
name in step 3, so a native MCP server tool registered as
'mcp_composio_X' was sent as 'mcp_mcp_composio_X' on the wire. The
incoming strip only removes ONE prefix, which still worked on first
call, but on subsequent calls the model pattern-matched the
single-prefixed form from message history and produced names that
stripped to 'composio_X' — registry miss, dispatch fail.
The history-rewrite block (#4) already has this guard. Apply the same
guard to the schema-rewrite block (#3) so round-trip is symmetric.
Added 4 outgoing-side tests. Existing 7 incoming-side tests still pass.
Author map: hayka-pacha added for PR #25270 salvage attribution.
Refs GH-25255.
When strip_tool_prefix=True (Anthropic OAuth path), normalize_response
unconditionally stripped the mcp_ prefix from ALL tool names starting
with mcp_. This broke Hermes-native MCP server tools (registered under
their full mcp_<server>_<tool> name in the registry) because the stripped
name doesn't match any registry entry.
Fix: check the tool registry before stripping. Only strip when:
- The stripped name EXISTS in the registry (OAuth-injected tool)
- The full name does NOT exist in the registry
This preserves backward compatibility for OAuth-injected tools while
protecting native MCP server tools from incorrect prefix removal.
7 new tests covering: OAuth strip, native preserve, no-flag, non-mcp,
unknown tools, mixed responses, and dual-registration edge case.
Signed-off-by: HKPA <hayka-pacha@users.noreply.github.com>
After sustained Bad Gateway / TimedOut reconnect cycles, the PTB httpx
client can enter a state where bot.send_message() returns a valid
Message (real message_id) but the message never reaches the recipient.
TelegramAdapter.send returns SendResult(success=True) and cron's
live-adapter branch marks the run delivered while the message is
silently dropped.
Add a _send_path_degraded flag. _handle_polling_network_error sets it
on reconnect storms; the existing _verify_polling_after_reconnect
heartbeat probe clears it once getMe() confirms the Bot client is
healthy. While the flag is set, send() short-circuits with
SendResult(success=False, retryable=True) so cron falls through to
the standalone delivery path (fresh HTTP session).
Closes#31165.
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Fixes#31116 — two distinct bugs in fresh-install Matrix gateway:
1. Matrix E2EE setup installed only mautrix[encryption], leaving asyncpg
/ aiosqlite / Markdown / aiohttp-socks uninstalled. The first encrypted
connect failed with 'No module named asyncpg' deep inside
MatrixAdapter.connect(). Root cause: the setup wizard hand-rolled a
pip install of one package instead of using lazy_deps.ensure(
'platform.matrix'), and check_matrix_requirements() short-circuited the
runtime installer on 'import mautrix' alone — so the other 4 packages
were never pulled in.
2. Discord auto-enabled itself on every gateway start, even when the user
never selected Discord and had no DISCORD_BOT_TOKEN. Root cause:
gateway/config.py plugin-enablement loop gated enablement on
entry.check_fn() (just 'is the SDK importable?') and ignored
entry.is_connected (the 'did the user configure credentials?' probe).
Same bug class as commit 7849a3d73 fixed for _platform_status in the
setup wizard; this is the runtime counterpart. Affects Discord, Teams,
and Google Chat.
Changes:
- hermes_cli/setup.py::_setup_matrix — install via
lazy_deps.ensure('platform.matrix') to pull the full feature group.
- gateway/platforms/matrix.py::_check_e2ee_deps — verify asyncpg +
aiosqlite + PgCryptoStore in addition to OlmMachine, so E2EE failures
surface at startup instead of at first encrypted-room connect.
- gateway/platforms/matrix.py::check_matrix_requirements — use
feature_missing('platform.matrix') as the install gate instead of a
single 'import mautrix' check, so partial installs trigger the lazy
installer correctly.
- gateway/config.py plugin-enablement loop — consult entry.is_connected
before flipping enabled=True. Explicit YAML enabled=true still wins.
Tests: 3 new in tests/gateway/test_matrix.py (asyncpg-required,
aiosqlite-required, partial-install lazy-runs), 5 new in
tests/gateway/test_platform_registry.py (is_connected=False blocks,
is_connected=True enables, is_connected=None falls back to check_fn,
raising probe doesn't enable, explicit YAML wins).
Validation: 310 tests across affected test modules pass.
Standard OpenAI returns request-validation failures (unknown/
unsupported parameter, malformed request) as 4xx. Some
OpenAI-compatible gateways return them as 5xx instead — codex.nekos.me
returns 502 for an unknown parameter.
The generic '5xx -> retryable server_error' rule then misfires: the
error is deterministic (every retry gets the identical rejection), so
the retry loop burns all 3 attempts, the transport-recovery path
resets the counter and burns 3 more, and the result is a request
flood against a request that can never succeed.
Fix: when a 500/502 body carries an unambiguous request-validation
signal — 'unknown parameter' / 'unsupported parameter' /
'invalid_request_error' in the message text, or invalid_request_error
/ unknown_parameter / unsupported_parameter as the structured error
code — classify as a non-retryable format_error so the loop fails
fast and falls back. Genuine 502 Bad Gateway with no such signal
stays retryable as before.
Origin: local-author
Upstream-PR: none
Patch-State: local-only
The empty-response recovery path in run_agent.py appends synthetic
messages tagged with _empty_recovery_synthetic (and the agent loop uses
_thinking_prefill / _empty_terminal_sentinel similarly). These are
internal bookkeeping markers — they must never reach the wire.
chat_completions' convert_messages only stripped Codex Responses leak
fields (codex_reasoning_items, call_id, etc.), not these _-prefixed
markers. Permissive providers (real OpenAI, Anthropic) silently ignore
unknown message keys so the bug stayed hidden, but strict
OpenAI-compatible gateways reject them outright. Observed against
codex.nekos.me:
502: [ObjectParam] [input[617]._empty_recovery_synthetic]
[unknown_parameter] Unknown parameter:
'_empty_recovery_synthetic'
Because the synthetic messages persist in the session, every
subsequent request in that session carries the poisoned key and
fails identically — a deterministic 502 the retry loop mistakes for
a transient server error.
Fix: convert_messages now drops any top-level message key starting
with '_'. OpenAI's message schema has no '_'-prefixed fields, so this
is safe and future-proofs against new internal markers.
Origin: local-author
Upstream-PR: none
Patch-State: local-only
Adds 'hermes security audit' — a one-shot vulnerability scan against
OSV.dev covering three surfaces a Hermes user actually controls:
1. The running Python's installed PyPI dists (importlib.metadata)
2. Plugin requirements.txt / pyproject.toml pins under ~/.hermes/plugins/
3. Pinned npx/uvx MCP servers in config.yaml
Zero new dependencies (stdlib urllib + importlib.metadata + tomllib +
concurrent.futures). No auth required for OSV's public batch API.
Flags: --json, --fail-on {low,moderate,high,critical} (default: critical),
--skip-venv, --skip-plugins, --skip-mcp
Output groups findings by source, sorts by severity descending, surfaces
fixed-versions inline. Exit 1 when any finding meets the --fail-on tier.
Deliberately out of scope: globally-installed pip/npm, editor/browser
extensions, daily background scans, auto-blocking of installs. The audit
is on-demand by design — daily scans become noise the user trains
themselves to ignore.
Closes#31273.
HTTP 402 (insufficient credits) was retried up to agent.api_max_retries
times (default 3), burning paid requests against an exhausted balance.
Real-world impact: ~$40 in 48h on a 24/7 Telegram+Discord gateway.
Root cause: FailoverReason.billing was in the is_client_error
exclusion set in agent/conversation_loop.py, which prevents the
non-retryable-abort branch from firing.
By the time control reaches that predicate:
* credential-pool rotation has already run for billing and either
continued the loop or returned False (pool exhausted/absent)
* the eager-fallback branch has also fired on billing and either
continued the loop or fell through (no fallback configured)
Falling through to the backoff retry from here has no recovery
mechanism left — it just burns more paid requests. Removing billing
from the exclusion set makes 402 abort cleanly once pool+fallback
recovery has failed, mirroring how 401/403 (also should_fallback=True)
already behave.
Added tests/run_agent/test_31273_402_not_retried.py which mirrors the
is_client_error predicate shape from the source and asserts the
invariant (plus a source-inspection guard against accidental
re-introduction).
Closes#31066. Closes#31110.
An unhandled `telegram.error.TimedOut` (or peer `NetworkError` /
`httpx` connection error) propagating to the asyncio event loop killed
the entire gateway process, taking down every profile attached to the
same runner. systemd restarted the service after ~5s but the active
conversation turn was lost.
Public adapter methods (`adapter.send`, `adapter.edit_message`,
`adapter.send_voice`, …) are individually try/except-wrapped on
current main, but at least one async path was reaching the loop with
TimedOut unhandled — the report's traceback ends at the deepest httpx
frame and doesn't pinpoint the caller.
Rather than audit 30+ call sites blind, install a loop-level safety net:
`_gateway_loop_exception_handler` is set as the loop's exception handler
in `start_gateway()` after `asyncio.get_running_loop()`. It classifies
the exception via `_is_transient_network_error()` (walks the
__cause__/__context__ chain, matches on class name so the test suite
doesn't need the real telegram/httpx packages installed). Transient
errors are logged at WARNING with full traceback so the originating
call site stays diagnosable; everything else forwards to
`loop.default_exception_handler` so real bugs still surface.
Tests cover the classifier (known transients accepted, real bugs
rejected, cause/context chain unwrap, cyclic-cause termination) and the
handler (swallow + log warning, forward unknowns, missing-exception
context). One end-to-end test schedules an orphan task raising TimedOut
and asserts `asyncio.run` returns cleanly.
* fix(vision): route auxiliary.vision.provider=openai to api.openai.com, skip text-only main for vision
Fixes#31179. Three coupled fixes so a configured aux vision backend
actually serves vision tasks instead of silently routing images to the
user's main provider:
1. agent/auxiliary_client.py: `auxiliary.<task>.provider: openai` resolves
to `custom` + `https://api.openai.com/v1`. "openai" was not in
PROVIDER_REGISTRY (we have `openai-codex` for OAuth and `custom` for
manual base_url), so the obvious config name silently failed to build a
client. User-supplied base_url is still preserved; only the provider
name normalises to `custom` so resolution doesn't hit the
PROVIDER_REGISTRY-only path.
2. agent/auxiliary_client.py: the vision auto-detect chain now skips the
user's main provider when models.dev reports `supports_vision=False`.
Without this guard, a misconfigured aux provider would fall back to
`auto`, which happily returned the main-provider client. The caller
would then send image content to e.g. api.deepseek.com with model
`gpt-4o-mini` and get a cryptic `unknown variant 'image_url',
expected 'text'` from the provider's parser.
3. tools/vision_tools.py + tools/browser_tool.py: `check_vision_requirements`
now mirrors the runtime fallback chain (explicit provider, then auto),
so `vision_analyze` shows up whenever vision is actually serviceable.
`browser_vision` gets a new `check_browser_vision_requirements` check_fn
that AND-gates browser + vision availability, so it doesn't get
advertised to the model when the call would fail at runtime.
Reproduction (config from the bug report):
model.provider: deepseek
model.default: deepseek-v4-pro
auxiliary.vision.provider: openai
auxiliary.vision.model: gpt-4o-mini
Before: resolve_vision_provider_client() returns None for the explicit
provider, fallback auto returns the deepseek client with model='gpt-4o-mini',
image hits api.deepseek.com → 'unknown variant image_url'. vision_analyze
hidden from tool list; browser_vision exposed but fails at call time.
After: resolves to custom + api.openai.com/v1 with model gpt-4o-mini.
vision_analyze and browser_vision both gate correctly on capability.
Tests: tests/agent/test_vision_routing_31179.py covers all three fixes
(12 cases including the user's exact scenario, base_url preservation,
text-only-main skip, capability-unknown permissive fallback, and tool
gating parity). Existing 382 tests across auxiliary/vision/image_routing
suites still pass.
* test(vision): use exact hostname check to silence CodeQL substring-sanitization alert
* fix(auxiliary): drop model name from vision-skip debug log to silence CodeQL
The new `logger.debug(...)` added in the previous commit interpolated
both `main_provider` and `vision_model` (a public model slug \u2014 not
sensitive). CodeQL's `py/clear-text-logging-sensitive-data` heuristic
re-flagged it twice because the rule mis-detects multi-value
interpolations near tainted-via-config provider strings.
Drop the model from the log args (provider alone is enough to diagnose
the skip; the same sibling branch a few lines up already logs provider
only). Behavior unchanged; CodeQL false positive cleared.
Regression guard for #30770 — verifies the guardrail-halt branch in
agent/conversation_loop.py pushes the synthesized halt message through
stream_delta_callback before breaking out of the loop. Without the
emit, chat-completions SSE writers drain an empty queue and clients
(Open WebUI, etc.) see a finish chunk with zero content delta —
indistinguishable from a crash.
Verified: the test fails when the production fix is reverted.
When the tool loop guardrail fires (max_tool_failures, etc.), the
turn exits with guardrail_halt but no final assistant message was
emitted to the client. The SSE stream closed silently —
indistinguishable from a crash.
The stream_delta_callback(None) before tool execution is a display
flush, not a hard close. After generating the halt response, emit
it through both _safe_print (CLI) and stream_delta_callback (SSE)
so clients see the explanation.
Fixes#30770
Four recent security PRs landed on main with stale/missing test updates,
breaking 4 test shards on every subsequent PR's CI run:
- test_discord_bot_auth_bypass.py (PR #30742c3caca658):
DISCORD_ALLOWED_ROLES no longer bypasses _is_user_authorized.
Inverted 3 tests to assert the new (correct) behavior: role config
alone does NOT authorize at the gateway layer.
- test_msgraph_webhook.py (PR #301694ca77f105):
adapter.is_connected is a @property, not a method. Test was calling
it with () after the connect() change; TypeError: 'bool' is not
callable. Removed the parens.
- test_feishu_approval_buttons.py (PR #30744bdb97b857):
Card-action callbacks now go through _allow_group_message
authorization. 3 tests in TestCardActionCallbackResponse didn't
populate adapter._allowed_group_users so the operator's open_id got
rejected. Added the allowlist setup to each test, matching the
existing pattern in test_returns_card_for_approve_action.
Also raise tolerance on test_wait_for_process_kills_subprocess_on_keyboardinterrupt:
the SIGTERM → 3s TimeoutStopSec → SIGKILL → reap chain can exceed 10s
under loaded xdist (40 workers). Bumped _wait_for_pgid_exit timeout
10→30s and worker join timeout 5→15s. Passes 100% in isolation
already; this just makes it tolerant of CI-host load.
Validation: 270/270 tests pass across the 5 affected files.
response_store.db (api server) holds conversation history including tool
payloads, prompts, and results. webhook_subscriptions.json holds per-route
HMAC secrets. Under a permissive umask (e.g. 0o022, default on most
distros) both files were created mode 0o644 — readable by other local
users on shared boxes.
- gateway/platforms/api_server.py: ResponseStore tightens itself + WAL/SHM
sidecars to 0o600 after __init__, then trusts the inode. (Original
contributor patch chmod'd after every _commit() — wasteful on a hot
api_server path; chmod-on-create is sufficient since SQLite preserves
mode bits across writes.)
- hermes_cli/webhook.py: _save_subscriptions writes via tempfile.mkstemp
(which itself creates the file with 0o600), chmods the temp before the
atomic rename, and re-asserts 0o600 on the destination so an existing
permissive file from before this fix gets narrowed.
Tests cover (a) creation under permissive umask leaves 0o600 and (b) an
existing 0o644 webhook_subscriptions.json gets narrowed on next save.
Tests guarded with skipif os.name=='nt' since POSIX mode bits don't apply
on Windows.
Salvaged from PR #30917 by @Hinotoi-agent. Reworked the api_server.py
side from chmod-on-every-commit to chmod-on-create.
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
When FEISHU_VERIFICATION_TOKEN is configured, an unauthenticated remote
could previously prove endpoint control by sending a url_verification
payload with any attacker-controlled challenge string — the handler
reflected the challenge BEFORE running the token check.
Move the verification_token check ahead of the url_verification echo so
the challenge response is gated on a valid token. Add a regression test
covering the wrong-token case. Also fix the stale
test_connect_webhook_mode_starts_local_server fixture to set
FEISHU_VERIFICATION_TOKEN (post #30746 webhook mode requires a secret).
Salvaged from PR #29663 by @m0n3r0 — kept the url_verification reorder
and its regression test; dropped the host-conditional weakening of the
#30746 secret guard (we want webhook secrets required regardless of
bind host, not only on 0.0.0.0/::).
Docs updated to call out the gating.
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Operator misconfiguration is a client/setup error, not an internal server
exception. 403 "forbidden" more accurately reflects "this route refuses
to authenticate" than 500 "internal server error" — the latter triggers
incident alerting on operator monitoring and conflates real bugs with
config drift.
Follow-up tweak to PR #29629 by @m0n3r0.
Reject unsigned webhook requests when a route has no effective HMAC secret, even if the request handler is reached without the normal connect-time validation. Add regression coverage for the direct-handler path.
When the 'mcp' Python SDK isn't installed, _run_stdio leaked a bare
'NameError: name StdioServerParameters is not defined' because the
top-level 'from mcp import ...' fails inside try/except ImportError,
leaving the names unbound at module scope.
Mirror the _MCP_HTTP_AVAILABLE gate that _run_http already had: raise
a clear ImportError with install instructions instead.
Fixes#30904
Three test classes lock in the #30963 fix:
1. TestPartialStreamStubFinishReason — drives _interruptible_streaming_api_call
through the two recovery branches and asserts:
- text-only partial → finish_reason="length" (the new behaviour),
- mid-tool-call partial → finish_reason="stop" (unchanged on purpose).
2. TestLengthContinuationPromptBranching — pure-Python check on the branch
that picks the continuation prompt by response.id. Locks the network
error wording for partial-stream-stub vs. the output-length wording
for everything else.
3. TestConversationLoopPartialStreamContinuation — feeds a stub +
continuation pair into run_conversation, verifies the loop makes a
second API call (instead of exiting with text_response(stop)),
confirms the network-error continuation prompt actually reaches the
model on call #2, and that final_response stitches both halves.
Refs: NousResearch/hermes-agent#30963
The length-continue path's user-facing vprint and continuation prompt
both told the model "your response was truncated by the output length
limit." That's a lie when the stub came from a partial-stream network
error (issue #30963) — and a lie the model can detect, leading to "I
wasn't truncated, I'm done" no-op responses that defeat the
continuation entirely.
Detect the partial-stream-stub via response.id and swap in:
- vprint: "Stream interrupted by network error
(finish_reason='length' on partial-stream-stub)"
- prompt: "[System: The previous response was cut off by a network
error mid-stream. Continue exactly where you left off.
Do not restart or repeat prior text. Finish the answer
directly.]"
Real length truncations still see the original "truncated by output
length limit" prompt — the model needs to know which class of failure
it's recovering from. Same length_continue_retries=3 budget,
truncated_response_parts merging, and final-response stitching
infrastructure on both branches.
Refs: NousResearch/hermes-agent#30963
When the API connection drops mid-stream after text deltas have already
been delivered, chat_completion_helpers returned a stub response with
finish_reason=stop. The conversation loop then classified the stub as a
clean text completion (text_response(finish_reason=stop)) and exited
with iteration budget remaining — even when the goal-judge verdict
came back as "continue" milliseconds later (issue #30963).
Switch the text-only partial-stream stub to finish_reason=length. The
existing length-continuation path (length_continue_retries up to 3,
"continue exactly where you left off" prompt, partial parts merged
into final_response) then fires automatically: the partial assistant
content is persisted, the model is asked to continue from the cut
point, and the loop keeps making progress against the goal.
The mid-tool-call branch keeps finish_reason=stop on purpose — its
user-facing warning ("Ask me to retry if you want to continue") asks
the user to drive the retry rather than auto-replaying a tool call
with possible side effects.
#5544's "no duplicate message" contract is preserved verbatim: the
partial content is reused, never re-emitted as a fresh API call, so
the user never sees two copies of the same delta.
Refs: NousResearch/hermes-agent#30963
PR #29119 dropped the 'not streamed_message' guard unconditionally so
that plugin-transformed responses (transform_llm_output hook) would
reach ACP clients. That regressed test_prompt_does_not_duplicate_streamed_final_message:
when no transform happened, the streamed text was re-sent as a duplicate
final delivery.
Tighten the condition to mirror the gateway side: deliver after streaming
only when response_transformed=True. Otherwise keep the old guard.
Adds test_prompt_delivers_transformed_response_after_streaming so the
transformed path stays covered.
Adds a test that fails without the gateway fix, exercising the
response_transformed=True branch in _finalize_response: a streamed
response whose final text was modified by a transform_llm_output
plugin hook must be edit_message'd in place (not duplicate-sent),
with already_sent=True so the normal final-send is skipped.
Also drops two minor leftovers from the salvaged PR #29119:
* accumulated_text property on GatewayStreamConsumer (unused)
* duplicate _response_transformed=False inside the hook try block
When a transform_llm_output hook appends content after streaming, the previous
fix skipped the final-send suppression which caused the full response to be
sent as a NEW message (duplicate). Instead, edit the existing streamed message
in-place to append the transformed content, then set already_sent=True.
Added stream_consumer.message_id and .accumulated_text public properties.
run_sync() cherry-picks fields from the run_conversation result dict into
a new response dict for the gateway. response_transformed was missing from
the cherry-pick list, so the gateway always saw it as False and suppressed
the final send even though a transform_llm_output hook had modified the content.
When a transform_llm_output hook modifies final_response after streaming,
the gateway was silently discarding the transformed content because
streamed=True / content_delivered=True triggered the final-send
suppression. Three changes:
1. conversation_loop: set `_response_transformed=True` when a
transform_llm_output hook returns a non-empty string, and expose it
as `response_transformed` in the result dict.
2. gateway/run: skip the final-send suppression when
`response_transformed` is True — the transformed response must
reach the client even if streaming already sent the original text.
3. acp_adapter/server: remove `not streamed_message` guard so
final_response is always delivered (ACP path fixed separately).
When streaming is active, streamed_message=True skipped the final_response
update, causing plugin hooks like transform_llm_output to be silently
invisible. Remove the `not streamed_message` guard so the final response
(possibly transformed by plugins) is always delivered to the ACP client.
Closes#31370.
bws defaults to the US identity endpoint, so EU Cloud and self-hosted
machine-account tokens fail with [400 Bad Request] {"error":"invalid_client"}
during 'hermes secrets bitwarden setup'. The token is valid — it's just
being checked against the wrong region.
Add a Bitwarden region step to the wizard between the access-token and
project-list steps:
Step 1 Install bws
Step 2 Provide access token
Step 3 Pick region <-- new (US / EU / self-hosted-custom-URL)
Step 4 Pick project (now talks to the right endpoint)
Step 5 Test fetch
Region is stored in config.yaml as secrets.bitwarden.server_url and
plumbed into every bws subprocess as BWS_SERVER_URL (project list,
secret list, test fetch, and the env_loader startup pull).
Also:
- Non-interactive: 'hermes secrets bitwarden setup --server-url ...'
- Pre-existing BWS_SERVER_URL in the shell is detected and reused
- Cache key includes server_url so EU/US fetches don't collide
- 'hermes secrets bitwarden status' shows the configured region
- 'invalid_client' / '400 Bad Request' from bws now triggers a hint
pointing at the region setting instead of looking like a bad token
PR #6a1aa420e coupled `display.tool_progress: verbose` (a per-tool display
toggle for full args / results / think blocks) to `self.verbose` — which
controls root-logger DEBUG level. Result: setting tool_progress: verbose
in config silently flipped every module in the process to DEBUG and
flooded the terminal with internal logging, far beyond just full tool
calls.
The two concepts are separate:
- `tool_progress_mode == 'verbose'` → display behavior (tool rendering)
- `self.verbose` → logging behavior (root logger → DEBUG, line 9795)
This change keeps PR #6a1aa420e's argparse.SUPPRESS / config-fallback
plumbing but severs the verbose-display → debug-logging link.
Changes:
- cli.py:2868 — `self.verbose` only follows explicit `verbose=` arg; no
longer auto-True when tool_progress_mode == 'verbose'.
- cli.py:_toggle_verbose — slash-cycle through tool progress modes no
longer flips `self.verbose` / `agent.verbose_logging` / `agent.quiet_mode`.
- cli.py:9355 — fix misleading label (drop 'and debug logs').
- tui_gateway/server.py:_make_agent — same decoupling on the TUI side
(verbose_logging no longer derived from tool_progress_mode).
- tests/cli/test_tool_progress_scrollback.py — invert the test that
asserted the broken coupling; add coverage for explicit `--verbose`
still enabling DEBUG independent of tool_progress.
Live verified:
- tool_progress: verbose, no --verbose flag → 0 DEBUG/INFO log lines
- --verbose flag explicit → 32 DEBUG/INFO log lines (as expected)
When asyncio.sleep() fires just before Task.cancel() is called, CPython
sets _must_cancel=True but cannot cancel the already-completed sleep
future, so CancelledError is delivered at the next await (handle_message)
rather than at the sleep. By that point the superseded task has already
popped the merged event from _pending_text_batches, so the superseding
task sees an empty batch and silently drops the message.
Fix: add a synchronous task-registry check between the sleep and the pop.
No await between the check and the pop means no other coroutine can
interleave, so the guard is race-free.
When WeCom returns errcode=40001 (invalid credential) or 42001 (token
expired), send() was returning a failure without evicting the bad token
from _access_tokens. All subsequent sends then kept using the same
invalid cached token until its TTL naturally expired (~7200s).
Fix: on the first token-rejection errcode, evict the cache entry and
retry once with a freshly fetched token. Non-token errcodes fail
immediately as before. If the refreshed token also fails, the error
is returned without looping further.
Adds four regression tests covering: successful retry on 40001,
successful retry on 42001, no retry on unrelated errcode, and clean
failure when the refresh does not help.
* fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint
Adds a soft guard so an agent running under one Hermes profile cannot
silently edit a different profile's skills/plugins/cron/memories.
Three layers:
A. agent/file_safety.classify_cross_profile_target
Classifies a write target against the active HERMES_HOME. Returns
a {active_profile, target_profile, area, target_path} dict when the
path lands in another profile's scoped area. PROFILE_SCOPED_AREAS =
(skills, plugins, cron, memories). get_cross_profile_warning()
wraps it into a model-facing error string that names both profiles,
names the area, and points at the cross_profile=True bypass.
Defense-in-depth, NOT a security boundary — the terminal tool runs
as the same OS user and can write any of these paths directly. The
guard exists to prevent confused-agent corruption, not to stop a
determined attacker. SECURITY.md §3.2 (terminal-bypass posture)
still applies.
Wired into tools/file_tools.write_file_tool and patch_tool with a
cross_profile=False kwarg. WRITE_FILE_SCHEMA and PATCH_SCHEMA both
advertise cross_profile so the model can pass it after explicit
user direction. patch_tool extracts target paths from V4A patch
bodies before checking (same shape as the existing sensitive-path
check).
skill_manage is already scoped to the active profile's SKILLS_DIR
by construction, so no extra guard wiring is needed there. The
D-side error message (below) still names other profiles when the
skill exists elsewhere.
B. agent/system_prompt
One deterministic line near the environment-hints block names the
active profile and tells the model not to modify another profile's
skills/plugins/cron/memories without explicit direction. Profile
name is stable for the lifetime of the AIAgent, so the line is
prompt-cache-safe.
D. tools/skill_manager_tool._skill_not_found_error
Replaces the bare "Skill 'X' not found." with a message that:
- names the active profile,
- searches OTHER profiles' skills dirs for the same name,
- names the profile(s) where the skill exists and the path,
- suggests `hermes -p <name>` to switch profiles, or
cross_profile=True for an explicit edit.
All 5 "not found" sites in skill_manager_tool (edit, patch, delete,
write_file, remove_file) now go through the helper.
Reference incident (May 2026): a hermes-security profile session
edited skills under both ~/.hermes/profiles/hermes-security/skills/
AND ~/.hermes/skills/ (the default profile's skills) without
realizing the second path belonged to a different profile. Three of
the four skill files needed manual restoration afterward.
What this PR does NOT do:
* No hard block. The terminal tool can still touch any of these
paths with no guard — same posture as the dangerous-command
approval flow. SECURITY.md §3.2 applies.
* No regex sweep on terminal commands for cross-profile paths.
That direction is a Skills-Guard-style arms race (cd + relative
paths, base64, etc.) and would false-positive on legitimate
cross-profile reads. Filed as a follow-up.
* No on-disk path migration. ~/.hermes/skills/ remains the
default profile's skills dir; this PR is about telling the
agent about that boundary, not changing the layout.
Tests:
tests/agent/test_file_safety_cross_profile.py (16 tests)
- _resolve_active_profile_name covers default/named/failure paths
- classify_cross_profile_target covers all four scoped areas,
both directions (default → named, named → default, named → named),
non-Hermes paths, and root-level config files
- get_cross_profile_warning covers in-profile no-op, cross-profile
message shape, and the defense-in-depth self-documentation
tests/tools/test_cross_profile_guard.py (12 tests)
- write_file: in-profile allow, cross-profile block, cross_profile=True
bypass, non-Hermes pass-through
- patch: replace-mode block, cross_profile=True bypass, V4A patch
path extraction
- skill_manage: error names the other profile (single + multiple),
missing-everywhere falls back to skills_list hint
- system prompt: contract-level checks (both branches present,
cross_profile=True mentioned, ~/.hermes/profiles/ referenced)
All 207 existing tests in file_safety/file_operations/skill_manager
still pass. 10 system-prompt tests still pass.
E2E verified: the exact incident scenario (security profile editing
default's hermes-agent-dev skill) is now blocked with the warning
message; cross_profile=True unblocks.
* fix(code_execution): add cross_profile to write_file/patch stubs
The cross_profile kwarg added to write_file_tool/patch_tool needs to
flow through the execute_code sandbox stubs in _TOOL_STUBS so the
test_stubs_cover_all_schema_params drift test passes. Without this,
scripts running inside execute_code couldn't pass cross_profile=True
through hermes_tools.write_file().
Caught by CI on PR #31290.
Adds an --ids flag to 'hermes kanban promote' mirroring the existing
block/schedule convention, so the marquee use case from issue #28822
(promote all children of a closed organizational parent in one shot)
doesn't require a shell loop. Single-id JSON output stays a flat
object for back-compat; bulk emits a list. Dedupes positional + --ids
so the same id can't be promoted twice in one call. 5 new CLI-level
tests cover bulk happy path, partial-failure exit code, JSON shapes,
and dedup.
Also adds the thedavidmurray noreply-email -> github-login mapping in
scripts/release.py so the salvage cherry-pick passes the AUTHOR_MAP
contributor-credit check.
Adds `hermes kanban promote <task_id>` for manual lifecycle recovery
when an auto-promote daemon misses the parent-done transition (issue
#28822). Refuses promotion unless every parent dep is done/archived
(override with --force). Emits a `promoted_manual` audit event distinct
from the automatic `promoted` kind, so audit consumers can filter
human-driven from system-driven promotions. Supports --dry-run and
--json for orchestration. Does not mutate assignee/claim state — the
dispatcher picks the card up via its normal ready polling path.
Closes#28822.
The post-turn background reviewer prompt listed pinned skills under
'Protected skills (DO NOT edit these)' alongside bundled and
hub-installed skills, with the instruction to say 'Nothing to save.'
if only protected skills needed updating. This meant the reviewer
would refuse to patch a pinned skill even when the user explicitly
wanted that skill improved.
The underlying tool layer already gets this right: skill_manage's
_pinned_guard only fires on delete; patch/edit/write_file go through
on pinned skills. Curator archive/consolidation still skips pinned
at the data layer (agent/curator.py), which is the correct place for
that protection — pin's job is anti-deletion, not anti-improvement.
Both _SKILL_REVIEW_PROMPT and _COMBINED_REVIEW_PROMPT now explicitly
tell the reviewer that pinned skills can be patched, with rationale,
so it doesn't bail out of an improvement just because the target is
pinned.
Two independent bugs caused the slash-command autocomplete to render
`/goal` as `/goa` (and `/gquota` as `/gquot` for that matter) in the TUI:
1. `tui_gateway/server.py` was forwarding `c.display` from
prompt_toolkit's `Completion` straight into the JSON-RPC payload.
prompt_toolkit normalizes `display=` into `FormattedText` (a `list`
subclass), so the wire format became `[["", "/goal"]]` instead of
the `string` that `CompletionItem.display` in the TUI declares.
`meta` already went through `to_plain_text` — `display` did not.
2. The dropdown row in `appOverlays.tsx` used `flexDirection="row"`
with the display `<Text>` and the (very long) meta `<Text>` as
siblings. When the meta overflows the row width, Ink/Yoga shrinks
the *first* column by one cell, lopping the trailing character off
the command name. `/goal` triggers it reliably because its meta
string is the longest of any built-in command (description +
embedded `[text | pause | resume | clear | status]` usage hint).
Wrapping the display column in `<Box flexShrink={0}>` keeps it at
its natural width and lets the meta wrap or truncate instead.
If Nous Portal is the recommended way to run Hermes Agent, it deserves
more than a sub-section buried under `## Inference Providers`. Add two
new pages and shrink the existing providers.md section to a stub that
points at them.
New pages:
- `website/docs/integrations/nous-portal.md` — landing page. What's in
the subscription (300+ model catalog table, Tool Gateway breakdown,
Nous Chat, cross-platform parity, no-dotfile-credentials). Hermes 4
recommendation note. Setup paths (fresh install, existing install,
headless / SSH, profiles). Day-to-day usage (portal status / portal
tools / portal open, switching models, mixing gateway with own
backends, subscription management). Configuration reference. Token
handling. Troubleshooting. Cross-links. Sidebar-position 1 — first
entry under Integrations.
- `website/docs/guides/run-hermes-with-nous-portal.md` — task script.
Eight numbered steps: subscribe → setup --portal → verify with
portal status → first chat → switch models → customize gateway
routing → voice mode → cron/always-on. Per-step troubleshooting.
'What this gets you in plain numbers' comparison table. Sidebar
position 1 — first entry under Guides & Tutorials.
Existing providers.md:
- Replace the 80-line `### Nous Portal` deep-dive with a 13-line stub
that summarizes the value prop, lists the three CLI commands, and
links to the new pages. Saves ~6KB. Other provider sections and
callouts (Codex Note, Two Commands, Tool Gateway tip) preserved.
Sidebar:
- `integrations/nous-portal` inserted right after `integrations/index`,
before `integrations/providers`.
- `guides/run-hermes-with-nous-portal` inserted first in Guides &
Tutorials.
The original PR #17194 description claimed test_display_tool_preview.py
but only ever shipped test_display_todo_progress.py. Add the missing
coverage for the failure-suffix path:
- _trim_error: whitespace strip, length cap, File-not-found path collapse
- _detect_tool_failure: terminal exit codes, memory full, structured
{error}/{message} extraction, malformed JSON, None result
- get_cute_tool_message E2E: read_file failure, terminal exit-only,
terminal stderr message, memory full, success path, no-result path
Also update test_tool_progress_scrollback.test_error_suffix_on_failed_tool
to reflect the new behavior: the generic '[error]' fallback in cli.py
has been removed; failure suffixes now come from the result-aware
_detect_tool_failure (e.g. '[exit 1]', '[File not found: x]').
Parse the todo_tool result summary to display completion progress in
CLI tool preview lines:
Read: ┊ 📋 plan 3/4 task(s) 0.5s
Update: ┊ 📋 plan update 3/4 ✓ 0.5s
Create: falls back to plain count when no completed tasks
Falls back gracefully to the existing 'N task(s)' format when the
result is missing, malformed, or has no completed items.
Originally proposed in PR #17194 by Albert.Zhou; salvaged onto current
main.
Co-authored-by: Albert.Zhou <albert748@gmail.com>
Improves the failure suffix on tool completion lines. Instead of always
showing '[error]' for non-terminal failures, parse the tool's JSON result
and surface the actual message:
Before: ┊ 📖 read foo.py 0.1s [error]
After: ┊ 📖 read foo.py 0.1s [File not found: foo.py]
Before: ┊ 💻 $ ls bad 0.1s [exit 127]
After: ┊ 💻 $ ls bad 0.1s [ls: cannot access 'bad'...]
Adds a _trim_error helper that strips long absolute paths down to the
filename and caps the suffix at 48 chars so it stays readable on narrow
terminals.
Threads the tool result through the tool.completed progress callback so
agent/display.get_cute_tool_message can inspect it. The cli.py [error]
post-suffix is removed in favor of the richer suffix _detect_tool_failure
now produces directly.
Originally proposed in PR #17194 by Albert.Zhou; salvaged onto current
main with the dead-code preview-length bumps dropped (tool_preview_length
config already strictly caps previews, so the per-tool n= defaults are
unreachable).
Co-authored-by: Albert.Zhou <albert748@gmail.com>
`terminal(background=true)` without `notify_on_complete=true` or
`watch_patterns` runs the process SILENTLY — the agent has no way
to learn it finished short of calling `process(action='poll')`
explicitly. That's correct for genuine long-lived processes (servers,
watchers, daemons) but is a footgun for every bounded task (tests,
builds, deploys, CI pollers, batch jobs), which is the vast majority
of background uses.
Hit on May 23, 2026 (PR #31231 incident): agent launched a CI-watch
loop with `background=true` only. The poller ran fine, exited green
6 minutes later, agent never noticed. User had to surface 'we are
green CI, you can merge.' Memory and skill docs said *what* to do
(poll in background) but not *how* to receive the result. The
`notify_on_complete=true` flag exists and works, but is easy to
forget when bg seems sufficient on its own.
Two changes here, mutually reinforcing:
1. Runtime nudge: tool result for `background=true` w/o notify or
watch_patterns now includes a `hint` field explaining the silent-
process failure mode and pointing at the corrective flag. Agent
sees it on the same turn and self-corrects without needing the
user to surface anything. Cost for legitimate server cases is one
ignored read (~50 tokens); cost for forgot-notify cases is
prevented blindness (potentially many turns, or a user nudge).
False positives << false negatives.
2. Schema/description rewrite: top-level TERMINAL_TOOL_DESCRIPTION
and the `background` field description now lead with 'Almost
always pair with notify_on_complete=true' instead of presenting
it as one of two equally-likely patterns. The two legitimate
non-notify shapes (long-lived servers; watch_patterns mid-process
signals) are still documented, but as the minority case.
Tests cover all four shapes: bg-only emits hint, bg+notify doesn't,
bg+watch_patterns doesn't, foreground doesn't. 4 new tests; full
suite of background/process tests stays green (160/160 across the
relevant 6 test files).
AI Card "tool progress" cards created with finalize=False were left in
streaming state on DingTalk's UI after a gateway restart because
disconnect() called _streaming_cards.clear() without first closing
them via _close_streaming_siblings.
Move the finalization loop before self._http_client.aclose() so the
HTTP client is still available when the finalize requests are sent.
Adds a regression test that asserts the HTTP client is alive during
finalization.
Reorder the per-provider subsections under '## Inference Providers'
so Nous Portal — the recommended setup — leads the list, and Google
Gemini via OAuth (which carries a policy-risk warning) drops to last
position right before the '## Custom & Self-Hosted LLM Providers'
section. All other provider sections keep their relative order. Pure
section move; no content changes.
The Windows branch of `_terminate_host_pid` early-returned after
`os.kill(pid, SIGTERM)` (which Python maps to `TerminateProcess` for
the target handle only), leaving descendant processes — e.g. Chromium
renderer/GPU/network helpers spawned by an `agent-browser` daemon —
running on Windows even after the preceding commit fixed POSIX.
The right Windows primitive is `taskkill /PID <pid> /T /F`:
`/T` walks the tree, `/F` force-terminates. Same approach
`gateway.status.terminate_pid(force=True)` already uses for the
gateway's own shutdown path; reuse the same shape here.
Why NOT extend the POSIX psutil tree-walk to Windows:
1. Windows doesn't maintain a Unix-style process tree. `psutil.
Process.children(recursive=True)` walks PPID links that go stale
when intermediate processes exit, so enumeration is best-effort
and silently misses orphaned descendants. The whole bug we're
fixing is orphaned descendants.
2. `psutil.Process.terminate()` on Windows is `TerminateProcess()`
for one handle — same single-PID scope as the existing
`os.kill`. The existing comment in `gateway/status.py::
terminate_pid` warns this explicitly: 'os.kill SIGTERM is not
equivalent to a tree-killing hard stop' on Windows.
3. Headless Chromium has no GUI window, so the softer
`taskkill /T` without `/F` (which sends WM_CLOSE) won't reach
it either. `/F` is required.
POSIX path is unchanged. The taskkill subprocess uses the same
`creationflags=windows_hide_flags()` pattern other Windows shellouts
in this codebase use. `FileNotFoundError` / `TimeoutExpired` /
`OSError` fall back to bare `os.kill(SIGTERM)` as cheap insurance.
Tests cover the Windows branch via the codebase's standard
`monkeypatch _IS_WINDOWS` pattern (`references/windows-native-
support.md`), plus POSIX tree-walk order, NoSuchProcess swallow,
and the OSError fallback path. 7 new tests, all green on Linux CI.
os.kill(pid, SIGTERM) only signals the parent, leaving Chromium child
processes (renderer, GPU, etc.) orphaned. Reuse the existing
ProcessRegistry._terminate_host_pid() helper which walks the process
tree leaf-up via psutil, terminating children before the parent.
The old section sold Nous Portal as access to Hermes-4 models, which is
backwards — Hermes 4 is a chat/reasoning family that's NOT recommended
for Hermes Agent (per portal.nousresearch.com/info itself). The actual
value prop is the 300+ frontier agentic models (Claude, GPT, Gemini,
DeepSeek, etc.) plus the Tool Gateway plus Nous Chat under one
subscription.
Rewrite to lead with that, position the portal as the recommended way
to run Hermes Agent, demote Hermes 4 to a 'note' explaining why it's
not the right pick for agent workloads, and link to the
manage-subscription page from setup.
Policy: if it ain't a secret it goes in config.yaml. HERMES_INFERENCE_PROVIDER
was leaking behavioral config into the .env surface, including from the gateway,
which bypassed config.yaml entirely.
Behavior:
- gateway/run.py: drop HERMES_INFERENCE_PROVIDER read in _resolve_runtime_agent_kwargs.
Gateway now flows through resolve_runtime_provider() with no `requested` override,
which reads model.provider from config.yaml first.
Docs/UX (strip env var from user-facing surface):
- --provider help text no longer mentions the env var
- cli-config.yaml.example same
- reference/environment-variables.md: remove HERMES_INFERENCE_PROVIDER row and
the cross-reference from HERMES_INFERENCE_MODEL
- reference/cli-commands.md: blank the env-var column for --provider
- guides/xai-grok-oauth.md, guides/minimax-oauth.md: replace
HERMES_INFERENCE_PROVIDER=x hermes invocations with config.yaml / --provider
- developer-guide/adding-providers.md, model-provider-plugin.md: reframe
Internal mechanism (kept as-is):
- hermes_cli/main.py writes HERMES_INFERENCE_PROVIDER into the TUI subprocess env
- tui_gateway/server.py reads it on TUI startup
- resolve_requested_provider() / oneshot.py / cli.py still fall through to the
env var as a last-resort behind config.yaml, which is what makes the TUI
parent->child handoff work
This stays. We just stop documenting it as a user knob.
Tests: tests/gateway/test_auth_fallback.py — simplify mock to fail on first
call, succeed on second; drop monkeypatch.setenv lines that no longer matter.
Supersedes #31064 (closed with credit to @novax635 who surfaced the underlying
issue but proposed aligning gateway *to* the env var rather than removing it).
Auxiliary LLM tasks (vision, compression, web_extract, etc.) currently
require modifications to core files for any plugin that needs its own
task slot — specifically the _AUX_TASKS list in hermes_cli/main.py and
the hardcoded env-var bridging dict in gateway/run.py. This violates
the 'plugins must not modify core files' rule and forces every memory
or context plugin that wants its own auxiliary task to either fork
core or open a coupled core+plugin PR.
This change adds a generic plugin surface for auxiliary task
registration:
ctx.register_auxiliary_task(
key='memory_retain_filter',
display_name='Memory retain filter',
description='hindsight pre-retain dedup/extract',
defaults={'timeout': 30, 'extra_body': {'reasoning_effort': 'low'}},
)
After registration, the task automatically:
- Appears in 'hermes model → Configure auxiliary models' picker via
a new _all_aux_tasks() merge of built-in + plugin tasks
- Has its provider/model/base_url/api_key bridged from config.yaml
to AUXILIARY_<KEY_UPPER>_* env vars at gateway startup
(gateway/run.py now uses a dynamic bridged-keys set instead of
a hardcoded per-task dict)
- Gets plugin-declared defaults (timeout, extra_body, etc.) layered
underneath user config so unconfigured plugin tasks still work
(agent/auxiliary_client._get_auxiliary_task_config)
- Resets to auto via 'Reset all to auto' alongside built-ins
Validation:
- Rejects shadowing of built-in keys (vision, compression, etc.)
- Rejects invalid key shapes (must match [A-Za-z0-9_]+)
- Rejects cross-plugin collisions (clear error)
- Allows same-plugin re-registration (idempotent updates)
Plugin discovery failures (rare) fall back gracefully — the aux
config UI still shows built-in tasks if get_plugin_auxiliary_tasks()
raises, and gateway env-var bridging keeps working for built-ins.
Built-in tasks remain hardcoded in _AUX_TASKS for stability — they're
the baseline UX, and DEFAULT_CONFIG already ships their defaults.
Plugin tasks layer on top.
Tests: 15 new tests in test_plugin_auxiliary_tasks.py covering API
validation, manager state lifecycle, helper sort order, _all_aux_tasks
merge semantics, _reset_aux_to_auto inclusion of plugin tasks, and
default-layering in auxiliary_client.
Updates the gateway-bridge code-parity test (test_auxiliary_config_bridge)
to assert the new dynamic shape rather than the hardcoded literal env
var names which no longer appear post-refactor.
Motivation: this unblocks PR #20262 (hindsight smart retain pipeline)
and similar plugins that need a dedicated aux task slot. The change
is non-breaking — built-in env vars (AUXILIARY_VISION_PROVIDER, etc.)
keep working since they're produced by the same f-string template
that built the hardcoded names.
Trim ~600 LOC off the original contribution while keeping the same
operator-facing surface and detection coverage.
- Collapse three entry points (file / dir / bundle) into one
ast_scan_path(path) that handles both files and directories.
- Drop AstFinding dataclass + severity field — replaced with plain
(file, line, pattern_id, description) tuples. Severity ordering was
display-only for a diagnostic that explicitly disclaims security
verdicts, so the field added bookkeeping without earning its place.
- Replace Rich-markup formatter with plain text grouped by file.
- Drop the 'inspect --ast-deep' surface — same scanner, same output as
'audit --deep', single CLI entry is enough. Operators audit after
install; pre-install inspection signal isn't worth the second surface.
- Trim test file to the cases that earn their place: bypass payload,
syntax error survival, RecursionError survival, false-positive guard
(importer lookalike), literal-arg false-positive guard, non-.py
ignored, directory recursion + cache-dir skipping, missing-path,
getattr/__dict__ detection, formatter empty + populated.
Net: tools/skills_ast_audit.py 353 -> 133 LOC,
tests/tools/test_skills_ast_audit.py 299 -> 103 LOC, full diff
+704/-12 -> +264/-6. No change to tools/skills_guard.py — Skills Guard
verdicts remain untouched per SECURITY.md §2.4.
Add opt-in AST diagnostics for skill review without making Skills Guard stricter by default.
- Add hermes skills inspect --ast-deep to scan fetched skill bundles before installation
- Add hermes skills audit --deep to scan already-installed hub skills
- Keep AST analysis in tools/skills_ast_audit.py, separate from tools/skills_guard.py
- Label output as diagnostic hints, not security verdicts
- Cover dynamic import/access patterns: importlib, __import__(computed), getattr(computed), and __dict__[computed]
This follows the maintainer guidance from closed PR #7436: useful AST-level analysis belongs in an opt-in diagnostic path, not in Skills Guard's default heuristic scan.
* fix(tui): refresh virtual transcript on viewport resize
Notify scroll subscribers when ScrollBox viewport bounds change and key virtual-history updates on viewport height so resize/keyboard changes remount the tail rows instead of leaving stale spacers visible.
* test(tui): isolate viewport-height remount regression
Keep the resize delta below the virtual history scroll quantum so the regression test specifically depends on viewport height entering the snapshot key.
* test(tui): clarify virtual history resize snapshot
Update the resize regression and comments so the test specifically guards viewport-height changes in the virtual-history snapshot key.
* docs(tui): clarify scrollbox subscription signals
Document that ScrollBox subscribers are notified for renderer-computed viewport and content bound changes, not only imperative scrolls.
* fix(tui): recompute virtual tail after width resize
Avoid preserving a frozen virtual transcript range when wrapped rows shrink enough that the old tail window no longer covers the viewport.
* fix(tui): preserve transcript tail across resizes
Wraps + heights are column-dependent, so a width change must remeasure
every row and the renderer must repaint the full viewport.
- Key virtualRows on cols so React remounts wrapped rows on resize.
- Snap back to bottom after sticky-mode resize once React rerenders.
- Reserve a scrollbar + gap column in transcriptBodyWidth (non-termux).
- Full repaint on any viewport height change (was: shrink-only).
- ScrollBox scrollHeight uses deepest child bottom so sticky-bottom
math can reach the real final rendered row after reflow.
- DECSTBM fast-path now requires full container rect match.
* feat(tui): responsive banner tiers
Terminals can't scale glyphs, so the banner now picks a layout per
column width instead of always rendering the full 101-col logo:
- Wide (>= logo width): full ASCII logo + tagline.
- Mid (>= 58 cols): centered rule banner that expands with viewport.
- Narrow (>= 34 cols): brand line + tagline, both width-aware.
- < 34 cols: hidden.
SessionPanel surfaces model/cwd/sid inline when the hero column is
hidden, so narrow layouts don't lose that info. Logo width constants
derive from the art itself.
* fix(tui): re-check sticky inside resize debounce + document remount
Addresses Copilot review on PR #31077:
- onResize now re-checks isSticky() inside the 100ms timer so manual
scrolls during the debounce window don't get snapped back to tail.
- Comment on the virtualRows cols-keying calls out the deliberate
trade-off: per-row local state (e.g. systemOpen) resets on resize so
yoga can remeasure off live geometry. The hook's scale-by-ratio path
is too approximate for mixed markdown widths.
Null bytes in API key values (introduced by copy-paste) crash
os.environ[k] = v with ValueError: embedded null byte, preventing
hermes from starting at all.
* docs(simplex): remove broken Docker install command (#26974)
The "Or Docker" snippet pointed at `simplexchat/simplex-chat`, which is
not a published Docker Hub image. Users following the docs hit:
docker: Error response from daemon: pull access denied for
simplexchat/simplex-chat, repository does not exist or may require
'docker login'.
The SimpleX Chat project only publishes Docker images for its server
components (smp-server, xftp-server) — the chat CLI is distributed as a
binary release. Drop the broken `docker run` line and keep the verified
binary-download path, with a note pointing users to the upstream
Dockerfile if they want to build a container themselves.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(simplex): drop misleading "Dockerfile" link text
Copilot review flagged that the link text claimed "Dockerfile in the
upstream repo" but the URL pointed at the repository root, not a
specific Dockerfile path. Reword to "build from source from the
simplex-chat repository" so the link text and target match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug 1: /voice off in TUI mode did not clear HERMES_VOICE_TTS,
leaving TTS stuck ON with no way to disable it (the voice.toggle
tts handler requires voice mode to be ON).
Bug 2: TUI status bar only showed 'voice on/off' without any
indication of whether TTS speech output is active, because the
frontend never tracked voiceTts state.
- tui_gateway/server.py: clear HERMES_VOICE_TTS when voice is turned off
- ui-tui/src/app/useMainApp.ts: add voiceTts state, thread setVoiceTts
through voice contexts, display [tts] in status bar
- ui-tui/src/app/slash/commands/session.ts: sync tts from voice.toggle response
- ui-tui/src/app/interfaces.ts: add setVoiceTts to all voice context interfaces
Move shutil.rmtree into a finally block so the temp directory is always
cleaned up, even when an exception occurs during download, extraction,
or file copying.
Robustness:
- Surface 401/404 stream failures via _set_fatal_error() so the gateway's
runtime status reflects 'fatal: ntfy_unauthorized' / 'ntfy_topic_not_found'
instead of staying 'connected' when the reconnect loop halts. Matches
the pattern in whatsapp / telegram / sms adapters.
- Strip whitespace from auth tokens so pasted tokens with trailing
newlines don't produce malformed Authorization headers.
Simplicity:
- Extract _build_auth_header() and _truncate_body() to module-level
helpers, used by both NtfyAdapter and _standalone_send. Removes the
duplicated auth/truncation logic between the two paths.
Docs:
- website/docs/user-guide/messaging/ntfy.md — full setup guide,
identity-model warning, self-hosting, cron usage, troubleshooting.
- website/docs/reference/environment-variables.md — all 9 NTFY_* vars.
- website/docs/user-guide/messaging/index.md — platform comparison row.
- website/sidebars.ts — sidebar entry between simplex and open-webui.
Tests: 78/78 (+ 10 new robustness tests covering token hygiene, fatal
error propagation for 401/404, and the _truncate_body helper).
ntfy now ships as a self-contained plugin under plugins/platforms/ntfy/
instead of editing 8 core files (gateway/config.py Platform enum,
gateway/run.py factory + auth maps, cron/scheduler.py, toolsets.py,
hermes_cli/status.py, agent/prompt_builder.py, gateway/channel_directory.py,
tools/send_message_tool.py).
All routing goes through gateway/platform_registry via register_platform():
- adapter_factory, check_fn, validate_config, is_connected
- env_enablement_fn seeds PlatformConfig.extra from NTFY_* env vars so
gateway status reflects env-only setups without instantiating httpx
- standalone_sender_fn handles deliver=ntfy cron jobs when cron runs
out-of-process from the gateway
- allowed_users_env / allow_all_env hook into _is_user_authorized
- cron_deliver_env_var=NTFY_HOME_CHANNEL for cron home routing
- platform_hint surfaces in the system prompt
- pii_safe=True (topic names are the only identifier; no PII to redact)
Tests moved to tests/gateway/test_ntfy_plugin.py using _plugin_adapter_loader
so the module lives under plugin_adapter_ntfy in sys.modules and cannot
collide with sibling plugin-adapter tests on the same xdist worker. The
core-file grep tests (Platform.NTFY in source, hermes-ntfy in toolsets,
etc.) are replaced with plugin-shape tests covering register() metadata,
env_enablement_fn output, and standalone_sender_fn behavior.
68 tests pass under scripts/run_tests.sh.
Addresses Copilot review on PR #31077:
- onResize now re-checks isSticky() inside the 100ms timer so manual
scrolls during the debounce window don't get snapped back to tail.
- Comment on the virtualRows cols-keying calls out the deliberate
trade-off: per-row local state (e.g. systemOpen) resets on resize so
yoga can remeasure off live geometry. The hook's scale-by-ratio path
is too approximate for mixed markdown widths.
Terminals can't scale glyphs, so the banner now picks a layout per
column width instead of always rendering the full 101-col logo:
- Wide (>= logo width): full ASCII logo + tagline.
- Mid (>= 58 cols): centered rule banner that expands with viewport.
- Narrow (>= 34 cols): brand line + tagline, both width-aware.
- < 34 cols: hidden.
SessionPanel surfaces model/cwd/sid inline when the hero column is
hidden, so narrow layouts don't lose that info. Logo width constants
derive from the art itself.
Wraps + heights are column-dependent, so a width change must remeasure
every row and the renderer must repaint the full viewport.
- Key virtualRows on cols so React remounts wrapped rows on resize.
- Snap back to bottom after sticky-mode resize once React rerenders.
- Reserve a scrollbar + gap column in transcriptBodyWidth (non-termux).
- Full repaint on any viewport height change (was: shrink-only).
- ScrollBox scrollHeight uses deepest child bottom so sticky-bottom
math can reach the real final rendered row after reflow.
- DECSTBM fast-path now requires full container rect match.
Keep the resize delta below the virtual history scroll quantum so the regression test specifically depends on viewport height entering the snapshot key.
Notify scroll subscribers when ScrollBox viewport bounds change and key virtual-history updates on viewport height so resize/keyboard changes remount the tail rows instead of leaving stale spacers visible.
* fix(tui): ignore late thinking deltas after completion
Prevent stale reasoning events from repainting the TUI status after a turn has already completed and the UI is idle.
* test(tui): restore timers after thinking delta assertion
Keep fake timer cleanup in a finally block so assertion failures cannot leak timer mode into later tests.
* fix(tui): log parent gateway lifecycle exits
Add parent-side breadcrumbs for TUI gateway shutdown and transport exits so future backend EOF/SIGTERM reports identify the parent action that caused them.
* chore(tui): retrigger lifecycle logging checks
Retry transient GitHub checkout failures on the lifecycle logging PR.
* fix(tui): commit composer input bursts immediately
Salvage the WSL/terminal multi-character input burst fix with focused regression coverage so delayed pseudo-paste buffers cannot reorder later edits.
* fix(tui): keep newline input bursts on paste path
Preserve paste handling for multi-character chunks with newlines while keeping repeated printable key bursts on the immediate composer path.
* refactor(tui): share composer frame batch interval
Use one frame-sized batching constant for parent updates, local renders, and input burst flushes.
First scratch workspace creation on an install now emits a one-shot
warning log + a 'tip_scratch_workspace' event on the task. Sentinel
file at ~/.hermes/kanban/.scratch_tip_shown silences subsequent
creations across the whole install.
Behavior unchanged — scratch is still ephemeral by design. This just
makes the design visible to new users (reported in user community:
'progress files vanished, no warning anywhere').
Docs (en + ko) updated to spell out 'Deleted when the task completes'
on the scratch bullet and 'Preserved on completion' on worktree/dir.
Path.resolve() before any I/O and confine backup writes to the resolved
parent directory. Adds explicit parent-equality assertions so static
analyzers see the containment guarantee, and walks WAL/SHM sidecars
through the same resolved-parent path so accidental .. segments are
collapsed before shutil.copy2.
Functionally equivalent to the original PR; preserves the corrupt bytes
to <db>.corrupt.<ts>.bak in the same directory, still raises
KanbanDbCorruptError from connect(). E2E with Stefan's exact hex header
+ malformed pages still passes. 163/163 kanban tests still pass.
A small, self-contained section under 'Skip the API-key collection —
Nous Portal' explaining what Portal gives you (300+ models + Tool
Gateway), the one-shot install command, and how to inspect routing.
No buzzwords, no comparison tables, no overselling.
Positioned right after 'Getting Started' so it lands where someone
scanning the README has just seen the install steps and is deciding
their next move. Skippable by anyone who already knows their provider.
The line 'You can still bring your own keys per-tool whenever you
want' is the deliberate honesty rail — Portal is an option, not a
funnel. Existing per-provider language elsewhere in the README is
unchanged.
Mirrored to README.zh-CN.md to keep the two READMEs in sync.
User incident (Slack, 2026-05-13): user walked away mid-conversation,
agent requested approval to run `rm -rf .git`, the prompt timed out
after the gateway_timeout (default 300s), and the agent removed the
.git folder on its own. Corroborated by an independent report from a
Telegram user.
The underlying code path was correct — `check_all_command_guards`
returns `approved=False` with a BLOCKED message on both timeout and
explicit deny, and `terminal_tool` surfaces that as `status=blocked`
to the agent. The bug is at the model-interface layer: the message
"BLOCKED: Command timed out. Do NOT retry this command." reads to
some models as "try a different command achieving the same outcome."
This commit changes only the model-facing message + the structured
return shape:
- Timeout message now explicitly names the three evasion paths the
agent must avoid: retry, rephrase, AND achieve the same outcome
via a different command. Ends with "Silence is not consent."
- Explicit deny gets the same shape minus the silence-is-not-consent
line (it WAS an explicit deny, not silence).
- New structured fields on the return dict: `outcome` ("timeout"
or "denied") and `user_consent` (always False on this branch)
so plugins, hooks, and audit pipelines don't have to string-parse
the message to distinguish the two cases.
The mechanism that should already have prevented the original incident
— timeout treated as deny, BLOCKED result, post hook fires with
`choice="timeout"` — is unchanged. This commit hardens only the
agent's reading of the result.
Tests:
- test_timeout_returns_approved_false_with_no_consent — pins the
return shape on the Slack-shaped notify_cb-registered path
- test_timeout_message_is_emphatic_against_retry_and_rephrase —
pins the exact phrases the message must contain
- test_explicit_deny_carries_same_no_consent_shape — same contract
on explicit /deny
- test_timeout_emits_post_hook_with_timeout_outcome — pins the
post_approval_response hook payload so audit plugins can act
329 approval tests passing (4 new + 325 existing).
Fixes#24912
Reproduction (production, 2026-05-14): two concurrent sessions on the
same agent. Session A patches MEMORY.md directly via the patch tool,
appending ~8KB of structured content (Vendor Master, Standing Orders,
Pin Board) — none of it through the memory tool, so no § delimiters.
Session B starts later with stale in-memory state (1 entry, ~331
chars). Session B calls memory(action=replace) on its one known
entry. The tool's _read_file parses A's content as a single 8KB
'entry' (no § splits), then replace truncates that entry to B's new
333-byte content. ~8KB of structured content silently destroyed.
The atomic-rename write path is fine in isolation. The bug is the
implicit contract: the tool assumes MEMORY.md is exclusively a
§-delimited list of small entries it wrote, but the v0.13 install
runbook itself uses 'cat >> MEMORY.md' for onboarding, the patch tool
edits the file directly, and operators do too.
Fix: a drift guard in MemoryStore._detect_external_drift that fires
on either signal:
1. Re-parse + re-serialize doesn't produce identical bytes
(catches oddly-encoded delimiters / partial writes).
2. Any single parsed entry exceeds the store's whole-file char
limit. The tool budgets the ENTIRE store against that limit
(2200 chars for memory, 1375 for user), so no tool-written
entry can legitimately be larger. An entry bigger than the
store limit means an external writer dropped free-form content
into what the tool will treat as one entry.
When drift fires, _reload_target writes a .bak.<ts> snapshot of the
on-disk file, then add/replace/remove refuse to flush. The original
file stays untouched. The error dict surfaces the .bak path AND a
remediation string ('integrate missing entries via memory(add=...)
one at a time, then rewrite the file clean') so the model can act on
it without escalating to the operator.
Tests:
- test_replace_refuses_on_drift, test_add_refuses_on_drift,
test_remove_refuses_on_drift — all three mutators refuse
- test_clean_file_does_not_trigger_drift — false-positive check
- test_error_message_points_at_remediation — error string shape
- test_drift_guard_also_protects_user_target — USER.md too
- test_drift_backup_filename_is_unique_per_invocation — bak.<ts>
naming pin
144 memory tests passing (was 137; +7).
Fixes#26045
_recover_with_credential_pool had a second classification site that blanket-
treated any 403 against xai-oauth as entitlement (defense-in-depth for
#26847). That override defeated the new _is_entitlement_failure
disambiguator from the parent commit — bad-credentials 403s still
short-circuited the refresh path.
Apply the same WKE-unauthenticated / OAuth2-validation-phrase guard at
the override site so xAI's authoritative 'this is auth, not entitlement'
signal wins there too. The #26847 catch-all still triggers for genuine
entitlement bodies that don't carry the disambiguator.
Closes the end-to-end gap exposed by
test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403.
Eleven new tests pinning the #29344 fix. Layout mirrors the existing
"Fix D" entitlement section so the bad-credentials disambiguator
sits alongside the entitlement-block tests it complements.
Classifier-level coverage:
* ``test_is_entitlement_failure_false_for_bad_credentials_wke_suffix``
— verbatim shape from the reporter's wire capture
(``{code: 'caller does not have permission', error: 'OAuth2 access
token could not be validated. [WKE=unauthenticated:bad-credentials]'}``)
↦ classifier must return False so the refresh path runs.
* ``test_is_entitlement_failure_false_for_wke_suffix_in_normalized_shape``
— same body after ``_extract_api_error_context`` has rewritten it
to ``{reason, message}``. The disambiguator must fire in BOTH
shapes; without this guard the production call site at
``_recover_with_credential_pool`` (which goes through the
normalised extractor) would still misclassify.
* ``test_is_entitlement_failure_false_for_any_wke_unauthenticated_variant``
— parametrised forward-compat: ``bad-credentials``,
``expired-token``, ``revoked``, ``some-future-reason``. xAI
documents the prefix as stable, the suffix after the colon as a
reason code that can grow; every variant under
``unauthenticated:`` must route to refresh.
* ``test_is_entitlement_failure_false_via_oauth2_validation_phrase_alone``
— belt-and-braces guard: if a future API revision drops the WKE
suffix but keeps "OAuth2 access token could not be validated", we
still classify correctly.
* ``test_is_entitlement_failure_wke_signal_overrides_entitlement_keywords``
— defensive: if a body ever carries BOTH the WKE suffix and
entitlement language, the WKE signal wins. Auth is recoverable;
entitlement isn't, and a refreshed token will resurface the
entitlement message on the next request.
* ``test_is_entitlement_failure_case_insensitive_wke_match`` —
pins that the classifier lowercases the haystack so a future xAI
build that uppercases the prefix doesn't reintroduce the bug.
Recovery-path coverage (end-to-end through
``_recover_with_credential_pool``):
* ``test_recover_with_credential_pool_refreshes_on_xai_bad_credentials_403``
— the headline test the reporter requested: a bad-credentials 403
with the exact wire body must call ``try_refresh_current()``
exactly once and ``_swap_credential`` once. Pre-fix this returned
``(False, _)`` because the entitlement classifier over-matched and
short-circuited the refresh path.
* ``test_recover_with_credential_pool_still_blocks_real_entitlement``
— companion regression guard for #26847: a pure unsubscribed-
account body (no WKE suffix, no OAuth2-validation phrase) must
still surface as entitlement and skip refresh. The new
disambiguator must not weaken the original loop-protection it
was added to preserve.
The scaffolding reuses ``_make_codex_agent``, ``_FakePool``, and the
existing ``MagicMock`` patterns from the surrounding tests so the
new section reads as a natural extension of "Fix D" rather than a
separate test file.
``_is_entitlement_failure`` over-matched on xAI 403s. xAI returns the
same permission-denied ``code`` text for two distinct conditions:
1. Unsubscribed account ("active Grok subscription. Manage at
https://grok.com" in the ``error`` field).
2. Stale OAuth access token ("OAuth2 access token could not be
validated. [WKE=unauthenticated:bad-credentials]" in the ``error``
field).
The classifier's "does not have permission + grok" substring heuristic
treated both identically, so the credential-pool refresh path was
short-circuited for case (2) — long-running TUI sessions stuck on a
stale OAuth token surfaced a non-retryable client error and the user
had to exit + reopen the TUI to recover (the startup-resolve path
bypasses the classifier entirely, which is why bridge adapters with
proactive refresh cadences didn't see this in practice).
This patch adopts the reporter's recommended fix (option 1, tightest):
honor xAI's explicit ``[WKE=unauthenticated:...]`` suffix and the
``OAuth2 access token could not be validated`` phrasing as
authoritative "this is auth, not entitlement" signals. When either
appears anywhere in the body's text fields, the classifier returns
False eagerly — *before* the entitlement keyword checks run — so the
refresh-on-401 path takes over and the existing loop-protection still
guards against runaway refresh storms if the refresh itself fails.
Two small adjustments fall out of this:
* The haystack now also covers ``code`` and ``error`` keys directly,
not just the ``message``/``reason`` shape ``_extract_api_error_context``
produces. Real runtime paths use the normalised shape, but the test
suite and any future call sites that pass raw bodies get the same
treatment. Backwards compatible: missing keys default to empty
strings, the haystack still skips when everything is blank.
* Both disambiguator checks fire BEFORE the entitlement keyword
checks. If a future xAI body somehow lands with both an entitlement
message AND the WKE suffix, the WKE suffix wins (correct — auth is
recoverable; entitlement is not, and a refreshed token will surface
the entitlement message on the next request anyway).
Existing tests (``test_is_entitlement_failure_matches_real_xai_bodies``,
``test_is_entitlement_failure_false_for_unrelated_auth_errors``,
``test_recover_with_credential_pool_skips_refresh_on_entitlement_403``,
``test_recover_with_credential_pool_still_refreshes_genuine_auth_failure``)
continue to pass unchanged — the unsubscribed-account path, the
generic auth-error path, and the refresh-on-401 path are all left
intact.
Follow-up to #30869. Adds Portal mentions on user-facing pages that
naturally call for an LLM + tool credentials but didn't previously
acknowledge Portal as a one-stop option.
- getting-started/installation.md: tip after the 'after install' block
pointing at 'hermes setup --portal' for users who want everything wired
at once instead of piecewise via 'hermes model' + 'hermes tools'.
- user-guide/configuring-models.md: small tip near the top — the page is
literally about provider/model choice and previously had zero Portal
mention.
- user-guide/features/voice-mode.md: Prerequisites need both an LLM and
TTS — a Portal subscription is the single setup that covers both.
- user-guide/features/batch-processing.md: highlights Portal as a
predictable-cost option for parallel agent runs that hit many APIs.
- user-guide/features/api-server.md: backend needs models + tools; one
Portal sub gives a fully-equipped OpenAI-compatible endpoint.
- user-guide/windows-native.md: early-beta users on Windows benefit most
from skipping per-tool Windows-key-juggling.
- integrations/providers.md: updates the existing Tool Gateway tip and
the Nous Portal section to mention the new commands.
- user-guide/features/fallback-providers.md: Nous row in the provider
table now lists 'hermes setup --portal' as the fresh-install path.
Tone discipline: one Portal mention per page, concrete CLI commands
(no marketing copy), always solving a problem the page itself sets up.
PR #30860 added a one-shot Portal setup command and a small portal CLI
surface. Update the docs so the new commands are discoverable without
upgrading the tone of existing Portal mentions.
- getting-started/quickstart.md: small tip near Choose a Provider
pointing at 'hermes setup --portal' as the easiest fresh-install path.
- user-guide/features/tool-gateway.md: lead the Get-Started section
with 'hermes setup --portal' for fresh installs, keep 'hermes model'
for already-configured users, and add 'hermes portal status / tools'
to the activity-check commands.
- user-guide/features/{web-search,image-generation,tts,browser}.md: the
existing 'Nous Subscribers' tip blocks now name the one-shot command
for new installs, keeping the existing 'hermes tools' path for users
who only want to swap a single backend.
- reference/cli-commands.md: register 'hermes portal' in the top-level
command table, add a 'hermes portal' section with subcommands, and
add '--portal' to the 'hermes setup' options table.
Tone: each page already had a Portal mention. This PR keeps the per-page
count to one and uses concrete CLI commands rather than promotional copy.
Tool Gateway page is the one exception (the whole doc is about Portal).
Closes#30045. Based on @qike-ms's PR #30141.
Telegram status callbacks (lifecycle, compression, context-pressure)
used to append a fresh bubble on every emit. Now adapter tracks
{(chat_id, status_key) -> message_id}; first call sends, subsequent
calls edit. Failed edits drop the cache entry and fall through to a
fresh send.
- gateway/platforms/telegram.py: send_or_update_status() (+34 LOC)
- gateway/run.py: route _status_callback_sync through it when the
adapter supports it; plain adapter.send() otherwise (+15 LOC)
- 5 tests covering first send / edit-in-place / edit-failure fallback
/ distinct key & chat isolation
PR 2362cc468 ("fix(gateway): enforce env variable template expansion
on runtime config loaders") refactored `_load_service_tier` to read
config via the new `_load_gateway_runtime_config` wrapper instead of
opening `_hermes_home/config.yaml` directly. The
`test_run_agent_passes_priority_processing_to_gateway_agent` test still
only stubbed `_load_gateway_config` (the inner loader), so the runtime
wrapper saw an empty config and `_load_service_tier` returned None,
breaking the test:
FAILED tests/gateway/test_fast_command.py::test_run_agent_passes_priority_processing_to_gateway_agent
- AssertionError: assert None == 'priority'
Fix: also stub `_load_gateway_runtime_config` to return the expected
`agent.service_tier=fast` config, so the test once again drives the
priority routing path it was written to verify.
Confirmed reproducing on current main before the patch and passing
after.
* feat(portal): one-shot setup, status CLI, and Nous-included markers
Four small Portal-aware surfaces that drive subscription value without
adding friction for non-Portal users.
- hermes setup --portal: one-shot Nous OAuth + provider switch + Tool
Gateway opt-in. Shareable as a single command from docs/social.
- hermes portal {status,open,tools}: small surface over Portal auth +
Tool Gateway routing. Defaults to 'status' when no subcommand.
- Tool picker (hermes tools): when the user is logged into Nous, mark
Nous-managed provider rows with a star and 'Included with your Nous
subscription'. Suppressed when not authed — non-subscribers see the
picker unchanged.
- BYOK setup hint: a single dim line 'Available through Nous Portal
subscription.' appears when the user is being prompted for a paid
API key (Firecrawl, FAL, ElevenLabs, Browserbase, etc.) AND the
category has a Nous-managed sibling AND the user is not already
authed to Nous. Suppressed in all other cases.
Tested live end-to-end in an isolated HERMES_HOME with a simulated
authed and unauthed user. Targeted suite (tests/hermes_cli/
test_tools_config.py + test_setup.py) passes 97/97.
* fix: add portal to _BUILTIN_SUBCOMMANDS so plugin discovery fast-path skips it
Follow-up to @sprmn24's verdict-logic fix. The previous block-message
ended in 'Use --force to override' regardless of verdict — but as of
the --force fix above, dangerous community/trusted skills can't be
overridden by --force at all. The misleading hint sends users in a
loop. Replace it with a specific message that tells them what the
documented behavior actually is.
Adds two regression tests covering the dangerous-verdict message
shape and one that pins the existing --force hint for non-dangerous
blocks.
- _determine_verdict() returned 'caution' for medium/low-only findings,
causing community skills with harmless patterns (e.g. path traversal
notation, unpinned pip install) to be incorrectly blocked. Now returns
'safe' when only medium/low severity findings are present.
- should_allow_install() allowed --force to override 'dangerous' verdict,
contradicting documented behavior that --force does NOT override dangerous
scan results. Added explicit check to prevent force-installing skills
with dangerous verdict.
`_deliver_kanban_artifacts` routes candidates through
`BasePlatformAdapter.filter_local_delivery_paths` (added in 41d2c758c),
which rejects paths outside `MEDIA_DELIVERY_SAFE_ROOTS`. The two
artifact-delivery tests create fixtures under `tmp_path`, which lives
outside the cache roots — so under CI's hermetic HOME the filter
silently dropped both fake files and the assertions on
`images_uploaded` / `documents_uploaded` failed.
Fix: monkeypatch `HERMES_MEDIA_ALLOW_DIRS=str(tmp_path)` in both tests
so the safety filter accepts the fixtures. Production behaviour
unchanged; test-side fix only.
CI fail repro on origin/main: test (6) shard, both
test_notifier_uploads_artifacts_on_completion and
test_notifier_artifact_delivery_skips_missing_files.
Ten regressions across both prongs of the #29507 fix, organised so each
test names exactly which way the bug could come back:
Prong 1 — ``force_close_tcp_sockets``:
* ``shutdown_only_no_close`` is the smoking-gun assertion. If a future
refactor adds back ``sock.close()`` to this helper, the FD-recycling
race that wrote TLS bytes on top of ``kanban.db`` is back, and this
trips.
* ``uses_shut_rdwr`` pins that both halves are shut down (a half-close
wouldn't unblock a worker stuck in ``recv``).
* ``swallows_oserror_on_shutdown`` covers the already-shutdown case.
* ``handles_multiple_pool_entries`` walks all pool connections.
Prong 2 — thread-aware ``_close_request_client_once``:
* ``stranger_thread_aborts_only_no_close`` simulates the asyncio_0 →
Thread-1616 interrupt path: stranger drives abort, holder stays
populated for the worker's eventual finally.
* ``owner_thread_pops_and_full_close`` is the worker-thread path: pops
+ full close.
* ``stranger_then_owner_close_sequence_runs_full_close_exactly_once``
replays the reporter's exact timeline at object level: abort runs
once, full close runs once, holder ends empty.
Agent surface:
* ``_abort_request_openai_client_does_not_call_client_close`` pins
that the new entrypoint shuts sockets and emits the
``deferred_close=stranger_thread`` marker but never calls
``client.close()``.
* ``_abort_request_openai_client_null_client_is_noop`` defensive.
End-to-end:
* ``fd_recycle_window_closed_by_shutdown_only`` reproduces the race
at object level — runs the abort path from a stranger thread and
asserts that no ``close()`` ever fires, so the kernel can never
recycle the FD under the owner's still-active reference.
Layer-2 defense for the FD-recycling race: even with
``force_close_tcp_sockets`` reduced to shutdown-only, the followup
``client.close()`` in ``_close_openai_client`` still walks the httpx
pool and closes sockets — and if called from a stranger thread (the
interrupt-check loop, the stale-call detector) it has the same
FD-recycling exposure that wrote a TLS record on top of ``kanban.db``.
Stamp the request_client_holder with the owning thread's ident at
``_set_request_client`` time. In ``_close_request_client_once``:
* Owning thread (the worker's ``finally``) → pop + ``client.close()``
via ``_close_request_openai_client``, exactly as before.
* Stranger thread → ``_abort_request_openai_client`` (new): only
``shutdown(SHUT_RDWR)`` the pool sockets and log a deferred-close
marker. The holder stays populated so the worker's eventual
``finally`` performs the real close from its own thread context,
where the FD release races nothing.
Applied symmetrically to both the non-streaming
``interruptible_api_call`` and the streaming variant — both routinely
get hit by stranger-thread interrupts.
The log field ``tcp_force_closed=N`` keeps its existing shape; the new
abort path adds ``deferred_close=stranger_thread`` so production
triage can distinguish the two close kinds.
The helper used to call ``socket.shutdown(SHUT_RDWR)`` followed by
``socket.close()`` to drop CLOSE-WAIT entries immediately. On its own
``shutdown()`` is safe from any thread — it only sends FIN and breaks
pending ``recv``/``send`` — but ``close()`` releases the FD integer to
the kernel. When the helper runs on a stranger thread (the interrupt
loop, the stale-call detector) the FD release races the owning httpx
worker thread that still has the same integer cached inside the SSL
BIO. The kernel then recycles that integer to the next ``open()`` call
— in production, kanban dispatcher's ``kanban.db`` — and the worker's
delayed TLS flush writes a 24-byte TLS application-data record on top
of the SQLite header.
Restrict the helper to ``shutdown(SHUT_RDWR)`` only. The owning httpx
worker's own unwind will close the underlying socket via the same
Python ``socket.socket`` object, which atomically swaps ``_fd`` to -1
before issuing ``close(2)`` — no FD-aliasing window.
The log field ``tcp_force_closed=N`` is kept (now counts shutdowns) so
existing dashboards / log parsers keep working.
_guess_ext_from_data: data[:5] == b"#!SILK" -> data[:6] (6-byte string)
_looks_like_silk: data[:4] == b"#!SILK" -> data[:6]
The previous slices were too short to ever match the 6-byte "#!SILK"
literal, relying entirely on the "#!SILK_V3" (9-byte) and 0x02! (2-byte)
fallback paths for SILK format detection.
Add original_name parameter to _download_and_cache, preferring the
attachment metadata filename over the CDN URL path basename. Previously
files were cached with meaningless QQ CDN hash names (e.g.
qqdownload_...oadftnv5), causing ugly filenames when sent back to users.
Aligns with qqbot-agent-sdk's AttachmentDownloader.download_document.
1. Handle op 7 (Server Reconnect): close WS to trigger reconnect loop
while preserving session for Resume
2. Handle op 9 (Invalid Session): check d value to determine if session
is resumable; clear session only when not resumable
3. Remove 4009 from session-clearing set (connection timeout is resumable)
4. Expand fatal close codes: 4001/4002/4010-4014 now stop reconnect
immediately instead of retrying uselessly
5. Add unit tests
1. Add INTERACTION intent bit (1<<26) to _send_identify, fixing approval
button clicks not being received (INTERACTION_CREATE events were never
dispatched by the gateway)
2. Include local cached path in video/file attachment descriptions so the
LLM can reference files for re-sending to users
3. Add unit tests (TestIdentifyIntents, TestProcessAttachmentsPathExposure)
A bare except in _load_gateway_runtime_config would silently return the
unexpanded dict on any _expand_env_vars failure — masking the very bug
this helper exists to fix. Drop it; let the caller see real errors.
PR #41d2c758c ("Fix unsafe gateway media path delivery") tightened
`validate_media_delivery_path` so that artifacts emitted by the agent
must live inside `MEDIA_DELIVERY_SAFE_ROOTS` (Hermes-managed cache
dirs) or an operator-allowlisted root via `HERMES_MEDIA_ALLOW_DIRS`.
Two kanban-notifier tests put their PDFs and PNGs under pytest's
`tmp_path`, which is correctly rejected by the new validator. They
started failing on main as soon as that PR landed:
FAILED tests/hermes_cli/test_kanban_notify.py::test_notifier_uploads_artifacts_on_completion
FAILED tests/hermes_cli/test_kanban_notify.py::test_notifier_artifact_delivery_skips_missing_files
Symptom in logs: "Skipping unsafe local file path outside allowed
roots". The validator is doing exactly what it should — the tests were
relying on the looser pre-fix behaviour.
Fix: add `HERMES_MEDIA_ALLOW_DIRS=tmp_path` to the `kanban_home`
fixture so artifacts under `tmp_path` are recognised as safe. This is
the same allowlist mechanism the operator-facing env var documents.
PR infographics belong in PR descriptions, not committed to the repo.
Removes the 13 archived directories under infographic/ and adds the path
to .gitignore so future generations don't accidentally land in-tree.
The fal.media URLs embedded in each PR's body remain the canonical
artifact — those PR descriptions are the storage.
The Kimi K2 branch added in the prior commit only emitted extra_body.thinking
and dropped reasoning_effort entirely. KimiProfile (api.moonshot.ai/v1) sends
both fields, and OpenCode Go proxies to the same Moonshot backend. Mirror that
shape on the Go path so /reasoning effort actually reaches Kimi.
- low/medium/high pass through verbatim
- xhigh/max clamp to high (Moonshot's max supported value)
- minimal / unknown effort → omit reasoning_effort, keep thinking on
- disabled / no config → unchanged
- DeepSeek branch unchanged
The two ACP slash-command tests that exercise `provider:model` routing
(`test_set_session_model_accepts_provider_prefixed_choice` and
`test_model_switch_uses_requested_provider`) relied on the live
`hermes_cli.models._KNOWN_PROVIDER_NAMES` / `_PROVIDER_ALIASES` module
state to parse `anthropic:claude-sonnet-4-6` into
`("anthropic", "claude-sonnet-4-6")`. If any earlier test in the same
xdist worker registers a custom provider that shadows `anthropic` or
otherwise mutates those globals, the parser falls into the
`detect_provider_for_model` branch and resolves to `custom` instead.
Observed once in CI on run 26326728502 / job 77505732299 as
`AssertionError: assert 'custom' == 'anthropic'` — could not reproduce
locally under per-file isolation, so the failing in-file order was
specific to a particular xdist scheduling.
Monkeypatching `parse_model_input` + `detect_provider_for_model` for
both tests removes the global-catalog dependency, so the tests now only
exercise what they were written to verify (the `requested_provider ->
runtime -> AIAgent kwargs` plumbing).
The reference entry now documents the truthy set
(``1`` / ``true`` / ``yes`` / ``on``) explicitly, matches the
falsy half (``0`` / ``false`` / ``no`` / ``off`` / empty string)
that the GHSA-5qr3-c538-wm9j fix re-aligned both the agent loader
and the dashboard web server around, and points readers at the
defence-in-depth rule that project plugins never have their
Python ``api`` file auto-imported by the dashboard regardless of
the env var.
GHSA-5qr3-c538-wm9j — half two of the bypass chain.
``_mount_plugin_api_routes`` imports each dashboard plugin's
manifest ``api`` field as a Python module via
``importlib.util.spec_from_file_location`` — arbitrary code
execution by design. Two primitives in the surrounding code
turned that "by design" RCE into a usable attack:
1. Absolute paths in the manifest swallow the plugin directory.
``Path('safe/dashboard') / '/tmp/evil.py'`` resolves to
``/tmp/evil.py``, so a single manifest line
``{"api": "/tmp/payload.py"}`` was enough to redirect the
importer at any Python file on disk.
2. ``..`` traversal in the manifest climbs out of the dashboard
directory. ``Path('plugins/safe/dashboard') /
'../../../tmp/evil.py'`` lands in ``/tmp/evil.py`` after
``resolve()`` — the static-asset handler
(``serve_plugin_asset``) already defends against this via
``is_relative_to``; the api-mount path didn't.
Fix at three layers so a regression in any one can't re-open the
advisory:
* New ``_safe_plugin_api_relpath`` validator runs at *discovery*
time and stores only sanitised relative paths on the plugin
entry's ``_api_file`` field. Absolute paths, ``..`` traversal,
empty / non-string values, and paths that ``resolve()`` outside
the plugin's ``dashboard/`` directory are rejected with a
warning naming the plugin. ``has_api`` follows the sanitised
value so the dashboard frontend doesn't render a fake "Backend
API" badge for plugins whose api was scrubbed.
* ``_mount_plugin_api_routes`` re-validates the resolved path
against the live filesystem just before the import — defence in
depth in case ``_dir`` is tampered with post-cache or a future
caller bypasses the discovery-time validator.
* Project plugins (``source == "project"``) are refused outright
for backend import. ``./.hermes/plugins/`` ships with the CWD,
so any threat model that includes "user opens a malicious repo"
treats it as attacker-controlled; project plugins can still
extend the UI via static JS/CSS but their Python ``api`` is no
longer auto-imported. Combined with the truthy env-gate fix
from the previous commit, the original advisory chain now
fails at two distinct choke points.
35 new tests across 5 classes covering every layer of the
GHSA-5qr3-c538-wm9j defence. Each class corresponds to one chokepoint
so a regression in any single layer is caught by the named class:
* ``TestProjectPluginsEnvGate`` (13 cases) — parametrised over both
the documented truthy values (``1`` / ``true`` / ``yes`` / ``on``
+ uppercase variants) and the previously-bypassing falsy strings
(``0`` / ``false`` / ``no`` / ``off`` / ``""`` / ``False``). The
falsy half is the direct env-bypass repro: pre-fix any non-empty
string enabled the project source.
* ``TestApiPathSanitizer`` (16 cases) — unit-level coverage of the
new ``_safe_plugin_api_relpath`` helper. Absolute paths
(``/etc/passwd``, ``/tmp/payload.py``, ``/usr/bin/python``),
``..``-traversal payloads (including nested ``subdir/../../..``),
and non-string / empty / whitespace-only values must all return
``None``. Safe relative paths (``api.py``, ``backend/routes.py``)
round-trip unchanged so legitimate plugins keep working.
* ``TestDiscoveryScrubsApiField`` (3 cases) — end-to-end through
``_discover_dashboard_plugins`` with a real manifest on disk.
Verifies that the cached plugin entry's ``_api_file`` is
scrubbed *at discovery time* (``None`` + ``has_api: False``) so
any downstream consumer can't be tricked into re-deriving the
unsafe path from cache.
* ``TestMountApiRoutesRefusesUntrusted`` (3 cases) — pokes
synthetic plugin entries with each refusal vector directly into
the cache and patches ``importlib.util.spec_from_file_location``
to assert it is *not* invoked for project-source / traversal
payloads, and *is* invoked normally for bundled / user plugins.
* ``TestEndToEndPocBlocked`` (1 case) — reproduces the original
advisory PoC: operator sets ``HERMES_ENABLE_PROJECT_PLUGINS=0``
believing project plugins are off, attacker plants a manifest in
CWD's ``.hermes/plugins/`` with ``api`` pointing at an absolute
payload path. Asserts that the importer is never called against
the payload path *and* that ``hermes_dashboard_plugin_evil`` is
not in ``sys.modules`` after the mount routine runs.
An autouse fixture busts ``_dashboard_plugins_cache`` before and
after each test so the production cache (populated by the
import-time ``_mount_plugin_api_routes()`` call) can't bleed in.
All 12 pre-existing dashboard-plugin tests in
``test_web_server.py`` still pass unchanged.
GHSA-5qr3-c538-wm9j — half one of the bypass chain.
``_discover_dashboard_plugins`` opted into the untrusted ``./.hermes/
plugins/`` source via ``if os.environ.get("HERMES_ENABLE_PROJECT_
PLUGINS"):`` — which is True for any non-empty string. ``=0``,
``=false``, ``=no``, ``=off`` all return non-empty strings and so
*enabled* the project source even though every operator (and the
agent loader, ``hermes_cli/plugins.py`` line 815) reads those values
as "disabled". An attacker who can land a manifest under the CWD's
``.hermes/plugins/`` directory — a malicious cloned repo, a worktree
checked out from a forked PR, a CI runner workspace — was therefore
guaranteed to get their manifest discovered the moment the user ran
``hermes dashboard`` from that directory, regardless of whether the
user thought they had project plugins disabled.
Switch to the shared ``utils.env_var_enabled`` helper used by the
agent loader so the gate accepts the documented truthy set (``1`` /
``true`` / ``yes`` / ``on``, case-insensitive) and treats everything
else — including ``0`` / ``false`` / ``no`` — as off.
Half two (path-traversal + project-source ``api`` import) lands in
the next commit. Together they break the RCE chain at two distinct
choke points so a future regression in either one alone can't
re-open the advisory.
Extends @briandevans's PR #17659 from {auth.json, auth.lock,
.anthropic_oauth.json} to also cover:
- HERMES_HOME/.env (provider API keys)
- HERMES_HOME/webhook_subscriptions.json (per-route HMAC secrets)
- HERMES_HOME/mcp-tokens/ (OAuth token directory; dir
+ everything inside)
…AND iterates over both _hermes_home_path() AND _hermes_root_path()
so profile-mode runs (HERMES_HOME = <root>/profiles/<name>) also block
<root>/{auth.json, .env, mcp-tokens/, ...}. Same widening shape as the
write-deny side already does (#15981, #14157).
Explicitly NOT a security boundary. Per the personal-assistant trust
model, the terminal tool runs as the same OS user and can `cat
auth.json` directly. This read-deny exists as defense-in-depth:
- Models that respect tool denials empirically tend to stop rather
than reach for the shell.
- The denial surfaces an audit trail when something tries to read
credentials — easier to spot in logs than a generic `cat`.
Docstring + error message both flag this as defense-in-depth so future
contributors don't mistake it for a real security boundary and don't
re-decline reports that propose the same fix shape.
Absorbs the .env and mcp-tokens/ coverage from @tomqiaozc's parallel
PR #8055 (closed-as-duplicate, credited).
Co-authored-by: Tom Qiao <zqiao@microsoft.com>
read_file_tool resolves relative paths against TERMINAL_CWD (or the
task's live terminal cwd), but the prior call passed the original
unresolved string to get_read_block_error. That function's own
resolve() is anchored at the Python process cwd, so when a task's
TERMINAL_CWD pointed at HERMES_HOME and the agent issued read_file
on the relative path "auth.json", the credential-store denylist was
never reached and the file was read normally.
Pass the already-resolved absolute path string at the file_tools call
site, document the contract on get_read_block_error, and add a
read_file_tool-level regression test that pins the relative-path
case under TERMINAL_CWD == HERMES_HOME.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`get_read_block_error` previously only denied reads inside
`${HERMES_HOME}/skills/.hub`, which left `auth.json` (provider OAuth
state + plaintext API keys) and `.anthropic_oauth.json` (Anthropic PKCE
tokens) directly readable by the agent. A prompt-injection reaching
`read_file` could exfiltrate active provider credentials in plaintext.
Mode-0600 file permissions only protect against *other Unix users* —
the agent runs as the file's owner, so `read_file` is unaffected.
Extend the existing deny list with the three credential paths
identified in #17656 (`auth.json`, `auth.lock`, `.anthropic_oauth.json`).
The check uses the same `Path.resolve()` pattern as `skills/.hub`, so
symlink/path-traversal indirection is caught too. The agent doesn't
need to read these directly — `auxiliary_client` and `credential_pool`
consume them through process env / OAuth flows that bypass `read_file`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #6656 added rel_path + \x00 prefixing to ``bundle_content_hash`` so a
filename swap between two files in a bundle changes the digest. But it
only patched the in-memory side — ``content_hash`` in ``tools/skills_guard.py``
(the on-disk equivalent) still hashed file contents only.
These two functions need to stay symmetric: ``check_for_skill_updates``
compares the disk hash of an installed skill against the bundle hash
of the upstream copy. With the asymmetric fix, every clean install
showed as drifted because the digests no longer matched
(2 existing tests in ``test_skills_hub.py`` started failing as soon as
the contributor's change landed).
Apply the same ``rel_path + \x00 + content`` shape to the disk-side
function. Both functions now produce the same digest for the same skill
content laid out two ways. Documented the symmetry invariant in the
docstring so a future change to either function knows to touch both.
Also adds tests/tools/test_pr_6656_regressions.py with 10 regression
tests covering all three fixes salvaged in PR #6656:
- uninstall_skill path traversal (4 cases: parent segments, absolute
paths, symlink escape, legitimate skill)
- bundle_content_hash filename swap detection (4 cases: in-memory
swap, identity, disk-side swap, bundle↔disk symmetry)
- list_pending lock contract (2 cases: source-grep contract, smoke)
Also fixes AUTHOR_MAP entry for @aaronlab — their commit email
(1115117931@qq.com) maps to "aaronagent" which isn't a real GitHub
login, so changelog @mentions would 404.
- skills_hub: validate that uninstall_skill's install_path resolves
inside SKILLS_DIR before calling shutil.rmtree, preventing recursive
deletion of arbitrary directories via poisoned lock.json entries
- skills_hub: include file paths (not just contents) in
bundle_content_hash so swapping filenames between files changes the
hash, strengthening update-detection integrity
- pairing: wrap list_pending() in self._lock so _cleanup_expired() file
writes don't race with concurrent generate_code()/approve_code() calls
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Follow-up to PR #28832 — the dashboard plugin routes now accept slashed
names like `observability/langfuse` and `image_gen/openai`, but
`_sanitize_plugin_name` still rejected forward slash and so dashboard
update + remove on those plugins fell through to '404 not found' even
though they exist on disk.
Adds an opt-in `allow_subdir=True` flag that:
- Permits internal forward slashes (category-namespaced plugin keys
emitted by `_discover_all_plugins`).
- Strips leading and trailing slashes.
- Still rejects `..` and backslash, and still asserts the resolved
target lives inside `plugins_dir`.
Opted in at the two read-paths that operate on installed plugins:
`_require_installed_plugin` (CLI update/remove) and
`_user_installed_plugin_dir` (dashboard update/remove). The install
path keeps the default (`allow_subdir=False`) because freshly-cloned
plugins always land top-level under `~/.hermes/plugins/<name>/`.
Adds 6 targeted unit tests covering the new flag's allow/reject matrix.
Removes the global `uppercase` + `font-mondwest` from the App.tsx root
that forced every page to opt-out, replaces stacked-alpha text colors
with semantic tokens for WCAG-AA contrast across all 7 themes, and
applies the new `text-display` utility from @nous-research/ui@0.16.0
on intentional brand chrome (page titles, sidebar headings, segmented
filters) only. Bumps every sub-12px arbitrary text size to text-xs.
Also widens the dashboard plugin routes (/api/dashboard/agent-plugins/
{name:path}/...) so category-namespaced plugins like observability/
langfuse and image_gen/openai can be enable/disabled from the dashboard
— previously the FE encodeURIComponent-ed the slash and the backend
{name} route rejected it. _validate_plugin_name still blocks .. and
backslash, and strips leading/trailing slash.
Touches sessions/env/keys page chrome and adds two new i18n keys
(`overview`, `showMore`/`showLess`) across all 18 locales.
Squashes 19 commits from PR #28832.
Co-authored-by: Hermes <noreply@nousresearch.com>
- test_browser_secret_exfil: mock _run_browser_command instead of
launching real Chrome (secret check is pre-launch, browser is
irrelevant to the assertion)
- test_web_server: add time.sleep(0.05) after pub.send_text() to
yield the event loop before receive_text(). TestClient's sync mode
can race the broadcast handler otherwise, hanging the test.
run_tests_parallel.py:
- --slice I/N flag (also HERMES_TEST_SLICE env var) runs only the
I-th slice of N, distributing files across slices by cached
duration using LPT (Longest Processing Time first) greedy
algorithm so each slice gets roughly equal wall time
- Duration cache (test_durations.json): maps relative file paths to
last-observed subprocess wall time. _save_durations merges with
existing cache so entries from other slices are preserved.
- Per-file subprocess timing in progress output + end-of-run
distribution summary (percentiles, top-10 slowest, <1s/<2s counts)
- Unknown files default to 2.0s estimate (~P50), spread evenly by LPT
.github/workflows/tests.yml:
- Matrix strategy: slice [1, 2, 3, 4] with fail-fast: false
- Each slice restores duration cache from main (stable key, no SHA),
runs its portion, uploads per-slice durations as artifacts
- save-durations job (main only, if: always()) downloads all 4
artifacts, merges into single cache entry for future PRs
- Timeout reduced from 60min to 30min per slice (~1/4 the work)
Cache design:
- Stable key (test-durations) not keyed by commit SHA — durations
are about files, not commits, and SHA-keyed caches miss on every
new commit and on PR merge commits
- actions/cache scoping: main's cache is visible to all PRs targeting
main; feature branches without a cache still work (default 2.0s)
- No dotfile prefix (upload-artifact v7 skips hidden files)
* fix(minimax-oauth): refresh short-lived access tokens per request
MiniMax OAuth issues ~15-minute access tokens. The Anthropic SDK caches
api_key as a static string at client construction, so a session that
resolves credentials once at startup keeps sending the same bearer until
MiniMax returns 401 mid-session.
Swap the static string for a callable token provider, reusing the existing
Entra-ID bearer-hook infrastructure in build_anthropic_client. The callable
re-reads auth.json on each invocation and calls _refresh_minimax_oauth_state,
which is a no-op when the token still has more than 60s of life left and
refreshes proactively otherwise. Refreshes persist to auth.json so other
processes (gateway, cron) see them immediately.
The wire-up lives at the agent-init / model-switch boundary rather than in
resolve_runtime_provider, so aux client paths that hand the api_key string
to OpenAI(api_key=...) are unaffected.
* docs: add infographic for minimax-oauth token refresh
The workflow diffs base.sha..head.sha (two-dot), which compares the
tip-of-main tree directly against the PR tip. When files land on main
after a PR branched off, they appear in the diff even though the PR
never touched them — triggering false-positive findings.
Example: PR #30609 was flagged for hermes_cli/setup.py, a file added
to main by an unrelated commit after the PR branched.
Switch to three-dot diff (base.sha...head.sha), which diffs from the
merge base to the PR tip — only changes introduced by this PR are
included. Applied to all four diff commands in both jobs (scan and
dep-bounds).
Two bugs surfaced by PR #24356 migrating Discord into the registry:
1. plugins/platforms/discord/adapter.py::_is_connected — read DISCORD_BOT_TOKEN
via hermes_cli.gateway.get_env_value (the abstraction tests patch) instead
of os.getenv directly. The legacy non-registry path used get_env_value;
bypassing it broke test_setup_openclaw_migration which patches
gateway_mod.get_env_value to simulate a hermetic env.
2. hermes_cli/gateway.py::_platform_status — when entry.is_connected is
defined and returns False, return 'not configured' immediately. Don't
fall back to entry.check_fn(), which would let 'SDK is installed'
override 'no token configured' and incorrectly report the platform as
ready. The fallback to check_fn is the right behaviour only when
is_connected is None (not registered).
Fixes 5 test failures observed on CI for PR #24356:
- tests/hermes_cli/test_setup.py::test_setup_gateway_skips_service_install_when_systemctl_missing
- tests/hermes_cli/test_setup.py::test_setup_gateway_in_container_shows_docker_guidance
- tests/hermes_cli/test_setup_irc.py::TestIRCGatewaySetupFreshInstall::test_setup_gateway_irc_counts_as_messaging_platform
- tests/hermes_cli/test_setup_openclaw_migration.py::TestGetSectionConfigSummary::test_gateway_returns_none_without_tokens
- tests/hermes_cli/test_setup_openclaw_migration.py::TestSetupWizardSkipsConfiguredSections::test_sections_skipped_when_migration_imported_settings
Same _platform_status bug exists for sibling plugin platforms (teams,
google_chat) whose check_fn returns true on SDK install alone; their
tests just never exercised the registry path before. The bug only became
test-visible when Discord migrated into the registry.
Validation: 11,167 tests across tests/gateway/ + tests/cron/ +
tests/tools/test_send_message_tool.py + tests/hermes_cli/ pass with zero
failures.
First migration of an existing built-in platform adapter to the plugin
system established by IRC / Teams / LINE / Google Chat. Closes#24325;
advances the umbrella refactor in #3823.
Matches Teams' shape exactly — adapter under ``plugins/platforms/discord/``
with the standard ``__init__.py`` / ``adapter.py`` / ``plugin.yaml``
shell, ``register(ctx)`` entry point, **no back-compat shim** at the old
import path, and full parity for the four hooks Teams uses plus the
``apply_yaml_config_fn`` hook that landed in #25443 (the Discord plugin
is the first consumer of that hook):
* ``standalone_sender_fn`` — out-of-process cron delivery via REST API
* ``setup_fn`` — interactive ``hermes setup gateway`` wizard
* ``apply_yaml_config_fn`` — translate ``config.yaml`` ``discord:`` keys
into ``DISCORD_*`` env vars (replaces the hardcoded block in
``gateway/config.py``)
* ``is_connected`` — declares connection state from ``DISCORD_BOT_TOKEN``
* ``check_fn`` — lazy-installs ``discord.py`` on demand
* plus ``allowed_users_env``, ``allow_all_env``, ``cron_deliver_env_var``,
``max_message_length``, ``emoji``, ``required_env``, ``install_hint``
* ``gateway/platforms/discord.py`` (5,101 LOC) →
``plugins/platforms/discord/adapter.py`` (git rename, R090).
* New ``plugins/platforms/discord/{__init__.py, plugin.yaml}`` with
``requires_env`` / ``optional_env`` declarations.
* Append ``register(ctx)`` block + new hook implementations
(``_standalone_send``, ``interactive_setup``, ``_apply_yaml_config``,
``_clean_discord_user_ids``, ``_is_connected``, ``_build_adapter``,
plus helpers ``_DISCORD_CHANNEL_TYPE_PROBE_CACHE`` etc.) to the
adapter.
* Replace the ``Platform.DISCORD elif`` branch in
``GatewayRunner._create_adapter()`` (−9 LOC) with a generic post-creation
hook (+6 LOC) in the registry path: any plugin adapter that declares a
``gateway_runner`` attribute now gets it auto-injected. Webhook's
built-in branch is unchanged (it doesn't go through the registry path).
* Move ``_send_discord`` (190 LOC) and helpers
(``_DISCORD_CHANNEL_TYPE_PROBE_CACHE``, ``_remember_channel_is_forum``,
``_probe_is_forum_cached``, ``_derive_forum_thread_name``) from
``tools/send_message_tool.py`` into the plugin as ``_standalone_send``.
* Wire via ``standalone_sender_fn=_standalone_send`` (Teams pattern; same
gap fixed in #21804 for other plugin platforms).
* Replace the Discord ``elif`` in ``tools/send_message_tool.py``
``_send_to_platform`` with a 10-line registry-hook dispatch.
* Drop the ``DiscordAdapter`` import and the
``Platform.DISCORD: DiscordAdapter.MAX_MESSAGE_LENGTH`` ``_MAX_LENGTHS``
entry — the registry's ``max_message_length=2000`` covers it.
* Move ``_setup_discord`` and ``_clean_discord_user_ids`` (68 LOC) from
``hermes_cli/setup.py`` into the plugin as ``interactive_setup``.
* Wire via ``setup_fn=interactive_setup``. CLI helpers (``prompt``,
``print_info``, etc.) are lazy-imported so the plugin's module-load
surface stays minimal.
* Remove ``"discord": _s._setup_discord`` from
``hermes_cli/gateway.py::_builtin_setup_fn``.
* Remove the entire 32-line ``_PLATFORMS["discord"]`` static dict entry —
Discord's setup metadata is now discovered dynamically via
``_all_platforms()`` from the registry entry.
* Move the 59-line ``discord_cfg`` YAML→env bridge from
``gateway/config.py::load_gateway_config()`` into the plugin as
``_apply_yaml_config``. Covers ``require_mention``,
``thread_require_mention``, ``free_response_channels``, ``auto_thread``,
``reactions``, ``ignored_channels``, ``allowed_channels``,
``no_thread_channels``, ``allow_mentions.{everyone,roles,users,
replied_user}``, and ``reply_to_mode`` (including the YAML 1.1
``off``-as-False coercion and the ``extra.reply_to_mode`` fallback).
* Wire via ``apply_yaml_config_fn=_apply_yaml_config``.
* The hook runs BEFORE ``_apply_env_overrides`` and after the generic
shared-key loop, exactly as documented in
``website/docs/developer-guide/adding-platform-adapters.md``.
* Behavior is preserved exactly — every assignment still uses
``not os.getenv(...)`` guards so env vars take precedence over YAML.
All 78 references to the old import path are rewritten — no back-compat
shim:
* 51 ``from gateway.platforms.discord import X`` →
``from plugins.platforms.discord.adapter import X``
* 5 ``import gateway.platforms.discord as discord_platform`` →
``import plugins.platforms.discord.adapter as discord_platform``
* 1 ``from gateway.platforms import discord as discord_mod`` →
``from plugins.platforms.discord import adapter as discord_mod``
* 21 ``mock.patch("gateway.platforms.discord.X")`` strings →
``mock.patch("plugins.platforms.discord.adapter.X")``
* 1 docstring reference in ``hermes_cli/commands.py``
* 1 import in ``tools/send_message_tool.py`` (now removed entirely)
The import-safety test in ``tests/gateway/test_discord_imports.py`` is
updated to purge the new canonical module name from ``sys.modules``.
**38 files changed, +621 / −473** — net positive due to the YAML hook
implementation (89 new LOC in the plugin trading for 59 deleted in core),
but every line moved has a clear plugin home now. The git rename is
detected at R090 because the adapter gained ~340 LOC of moved-in hook
implementations (``_standalone_send`` + ``interactive_setup`` +
``_apply_yaml_config`` + helpers).
* All 568 Discord-specific tests pass across 25 ``test_discord_*.py``
files plus voice/send/text-batching/reload-skills/stream-consumer/
integration tests.
* All 147 tests in the YAML-touching subset
(``test_discord_reply_mode``, ``test_discord_free_response``,
``test_discord_allowed_channels``, ``test_discord_allowed_mentions``,
``test_discord_channel_controls``, ``test_discord_reactions``,
``test_discord_thread_persistence``, ``test_runtime_footer``) pass —
this is the strongest signal that the YAML→env hook behaves
identically to the legacy block.
* Broader gateway/cron/integration sweep (1297 tests) introduces zero
new failures vs ``main``. Pre-existing failures in
``tests/gateway/test_tts_media_routing.py`` and
``tests/e2e/test_platform_commands.py`` reproduce identically on the
unchanged ``main`` revision.
* Plugin discovery sanity check confirms Discord registers alongside the
other four platform plugins:
Registered platforms: ['discord', 'google_chat', 'irc', 'line', 'teams']
These Discord-shaped tendrils in core were **deliberately not moved** —
they are generic platform-registry concerns affecting every platform,
not Discord-specific:
* ``gateway/config.py:1205`` ``DISCORD_BOT_TOKEN → config.token`` env
enablement — same shape Telegram has. The existing
``env_enablement_fn`` registry hook only seeds ``extra``, not
``.token``, so it can't replace this without an adapter refactor to
read from ``extra["bot_token"]``.
* ``gateway/run.py`` voice-mode hooks
(``self.adapters.get(Platform.DISCORD)`` for
``start_voice_mode``/``stop_voice_mode``), role-based auth,
``DISCORD_ALLOW_BOTS`` branch in ``_is_user_authorized``,
``_UPDATE_ALLOWED_PLATFORMS`` frozenset, and the per-platform
allowlist maps — generic platform-registry concerns.
* ``Platform.DISCORD`` enum literal — stable identifier used as dict
keys throughout the codebase; removing it is a separate refactor with
no real benefit.
* ``tools/discord_tool.py`` and ``tools/environments/local.py`` —
first-class agent tools and env-passthrough config, neither is the
gateway adapter.
Each of these is worth its own scoping issue when the time comes.
@memosr's PR #27612 put the inference_base_url allowlist check only at the
Nous proxy adapter forward boundary. The poisoned URL, however, lands in
``auth.json`` upstream of that — at five refresh / agent-key-mint payload
read sites inside ``resolve_nous_runtime_credentials`` and
``_extend_state_from_refresh``. Without gating those sites, a single MITM
on a refresh response persists the attacker's URL across restarts, even
if the proxy adapter's defense-in-depth check would later catch it on
the way out.
Replace ``_optional_base_url`` with ``_validate_nous_inference_url_from_network``
at all five Portal-network reads:
- hermes_cli/auth.py L4840 (refresh-only access-token path)
- hermes_cli/auth.py L4876 (mint payload path)
- hermes_cli/auth.py L5154 (terminal-runtime access-token refresh)
- hermes_cli/auth.py L5262 (cross-process serialized refresh)
- hermes_cli/auth.py L5317 (terminal-runtime mint payload)
The state-read path at L5025 (``state.get("inference_base_url")``) is
deliberately NOT gated — pre-existing state in ``auth.json`` is either
already validated (it came from one of the five network sites above) or
set by a trusted local actor (manual edit, ``_setup_nous_auth`` test
fixture, ``hermes login nous`` against a staging endpoint via the
documented ``NOUS_INFERENCE_BASE_URL`` env override). Direct write_file /
patch tampering with auth.json is independently blocked by PR #14157.
Adds tests/hermes_cli/test_nous_inference_url_validation.py covering:
- validator https + host + edge-case rules (12 cases)
- all 5 network call sites grep contracts (no _optional_base_url
regression possible without test failure)
- proxy adapter defense-in-depth check still present
- env override path NOT gated (documented dev/staging behaviour)
18 new tests, all 119 Nous-auth tests green.
The Nous Portal proxy adapter forwards minted ``agent_key`` bearer tokens
to whatever ``base_url`` ``resolve_nous_runtime_credentials()`` returns,
which is read directly from the refresh / agent-key-mint response and
persisted to ``~/.hermes/auth.json``. With no validation beyond a
trailing-slash strip, a poisoned URL (Portal-side MITM, or local write
to auth.json) gets forwarded the legitimate bearer on every subsequent
proxy request — exfiltrating the user's inference budget and opening a
response-injection channel back into the IDE / chat client.
Add ``_validate_nous_inference_url_from_network()`` in ``hermes_cli.auth``:
an https + host-allowlist check that returns None for anything outside
``inference-api.nousresearch.com``, so callers fall back to the
documented default rather than ship the bearer to an attacker.
This commit wires the validator into the proxy adapter at
``nous_portal.py``. A follow-up commit wires it into the four refresh /
mint sites in ``auth.py`` so the poisoned URL never lands in auth.json
in the first place.
The env-var override path (``NOUS_INFERENCE_BASE_URL``) bypasses
validation by design — that's the documented staging/dev escape hatch
and the env source is already trusted (the user set it themselves).
Co-authored-by: memosr <mehmet.sr35@gmail.com>
PR #14157 added control-plane write-deny against the ACTIVE HERMES_HOME,
which is fine in non-profile mode but leaves a gap once a profile is
active: HERMES_HOME points at <root>/profiles/<name>, so the global
<root>/auth.json + <root>/config.yaml + <root>/webhook_subscriptions.json
+ <root>/mcp-tokens/ remain writable. Same shape as the .env gap PR
#15981 closed via _hermes_root_path().
Apply the same widening pattern here. The control-file/mcp-tokens check
now iterates BOTH _hermes_home_path() and _hermes_root_path() (dedupes
when they coincide in non-profile mode). Also tightens the mcp-tokens
check from "startswith dir + os.sep" to "==dir OR startswith dir + os.sep"
so writing the directory entry itself is blocked, not just files inside.
Regression tests cover both protections in a real profile-mode layout
(<tmp>/hermes/profiles/coder as HERMES_HOME, <tmp>/hermes as root).
Adds active-HERMES_HOME control-plane files to the write deny list:
auth.json, config.yaml, webhook_subscriptions.json, and any path
under mcp-tokens/. realpath() resolves before comparison so
directory-traversal and symlink targets are normalised, preventing
trivial deny-list bypass via ../ tricks.
Without this, a prompt-injected agent could rewrite Hermes' own
auth state or routing config via write_file / patch — without
triggering the terminal dangerous-command approval — and persist
attacker-controlled behaviour across sessions.
Fixes#14072
2026-05-22 04:32:14 -07:00
1040 changed files with 154426 additions and 5816 deletions
Hermes works with whatever provider you want — that's not changing. But if you'd rather not collect five separate API keys for the model, web search, image generation, TTS, and a cloud browser, **[Nous Portal](https://portal.nousresearch.com)** covers all of them under one subscription:
- **300+ models** — pick any of them with `/model <name>`
- **Tool Gateway** — web search (Firecrawl), image generation (FAL), text-to-speech (OpenAI), cloud browser (Browser Use), all routed through your sub. No extra accounts.
One command from a fresh install:
```bash
hermes setup --portal
```
That logs you in via OAuth, sets Nous as your provider, and turns on the Tool Gateway. Check what's wired up any time with `hermes portal status`. Full details on the [Tool Gateway docs page](https://hermes-agent.nousresearch.com/docs/user-guide/features/tool-gateway).
You can still bring your own keys per-tool whenever you want — the gateway is per-backend, not all-or-nothing.
---
## CLI vs Messaging Quick Reference
Hermes has two entry points: start the terminal UI with `hermes`, or run the gateway and talk to it from Telegram, Discord, Slack, WhatsApp, Signal, or Email. Once you're in a conversation, many slash commands are shared across both interfaces.
# Can't compress further and already at minimum tier
agent._vprint(f"{agent.log_prefix}❌ Context length exceeded and cannot compress further.",force=True)
agent._vprint(f"{agent.log_prefix} 💡 The conversation has accumulated too much content. Try /new to start fresh, or /compress to manually trigger compression.",force=True)
**The pattern:** s6-overlay's `/init` execs a CMD as the container's "main
program" after the supervision tree is up. The CMD inherits
stdin/stdout/stderr from `/init` — which in `-it` mode is the container
TTY. The stage2 hook detects the TUI case and short-circuits the
main-hermes service so the hermes CMD becomes that main program.
```sh
# In docker/stage2-hook.sh
_is_tui_invocation() {
for arg in "$@"; do
case "$arg" in --tui|-T) return 0 ;; esac
done
case "${HERMES_TUI:-}" in 1|true|TRUE|yes) return 0 ;; esac
if [ -t 0 ] && [ $# -eq 0 ]; then return 0; fi
return 1
}
```
And in `docker/s6-rc.d/main-hermes/run`:
```sh
if [ -f /var/run/s6/container_environment/HERMES_TUI_MODE ]; then
exec sleep infinity # s6-overlay will exec CMD as the TTY-connected main
fi
exec s6-setuidgid hermes hermes ${HERMES_ARGS:-}
```
In TUI mode main hermes is effectively unsupervised (same as the pre-s6
behavior with tini — acceptable because the user is interactively
present). Dashboard and profile gateways still get full s6 supervision via
their separate services.
The integration test `test_tty_passthrough_to_container` uses `tput cols`
and `COLUMNS=123` as the probe.
---
## Risk Register
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Phase 2 breaks a downstream user's Dockerfile that `FROM`s ours | Medium | Medium | Release notes call out ENTRYPOINT change; the test harness (`tests/docker/`) gives high confidence in behavior parity |
| TUI TTY passthrough fails on some Docker versions | Low | High | Harness includes `test_tty_passthrough_to_container` as a hard gate; fallback plan = s6-fdholder ([s6-overlay#230](https://github.com/just-containers/s6-overlay/issues/230) Solution 2) |
| s6-overlay non-root quirks (logutil-service, fix-attrs) bite us | Low | Low | Supervisor runs as root, services drop — sidesteps these issues |
| Podman rootless UID mapping confuses s6 | Medium | Low | Documented as supported, fix reactively; a Podman + Docker environment is stood up for validation |
| Test harness is flaky (docker daemon issues, timing) | Medium | Low | Generous timeouts; skip when docker unavailable; polling helpers replace fixed sleeps in `test_container_restart.py` |
| Profile gateway crash loop masks a real config error | Low | Medium | s6 `finish` script `max_restarts` cap (planned follow-up); operators see crash-looping logs in `$HERMES_HOME/logs/gateways/<profile>/` |
| Dockerfile+entrypoint drift from linter (hadolint/shellcheck) reveals latent bugs | Low | Low | CI lint jobs catch them; fix or document ignore with rationale |
| Stale `gateway.pid` from a dead container collides with an unrelated live PID in the restarted container | Low | Medium | Cont-init reconciliation removes `gateway.pid` and `processes.json` from every profile dir on boot, before any new gateway starts |
| `docker restart` silently loses per-profile gateway registrations (tmpfs scandir wiped) | High (without mitigation) | High | Cont-init reconciliation re-registers from persistent `$HERMES_HOME/profiles/` and auto-starts those last seen `running`; outcome recorded to `$HERMES_HOME/logs/container-boot.log` (size-bounded, rotates to `.1` at 256 KiB) |
| A `running` gateway that's actually broken auto-restarts into a crash loop after every container restart | Low | Medium | s6 `finish` script `max_restarts` cap (planned); follow-up: `hermes doctor` alerts when N consecutive container restarts ended in `startup_failed` |
| `_s6_running()` detection works as root but silently fails for unprivileged hermes user, making runtime-registration path inert | High (without mitigation) | High | **Caught in PR review.** Detection now probes `/proc/1/comm` (world-readable) + `/run/s6/basedir`. Docker integration tests refactored to `docker exec -u hermes` so the realistic runtime user is exercised |
| `s6-svscanctl` from hermes hits EACCES on the root-owned control FIFO | Medium | Medium | `02-reconcile-profiles` chowns `/run/service/.s6-svscan/{control,lock}` to hermes after stage1 creates them |
| Per-service `supervise/control` FIFO is root-owned by s6-supervise, blocking `s6-svc` from hermes | Known | Medium | Surfaced cleanly as `S6CommandError` (with rc + stderr) instead of raw `CalledProcessError`. Permission fix tracked as a follow-up (small SUID helper, polling chown loop in cont-init.d, or replace `s6-svc` with `down`-marker manipulation) |
---
## Decision Log
| # | Question | Decision |
|---|---|---|
| OQ1 | Gate Phase 2 behind env var? | Ship directly (Hermes is pre-1.0; users can pin the previous image) |
| OQ2 | s6 root model | Root `/init`, drop per-service via `s6-setuidgid hermes` |
| OQ3 | Dashboard opt-in mechanism | Always declared as an s6 service; `03-dashboard-toggle` cont-init script writes a `down` marker when `HERMES_DASHBOARD` is unset so `s6-svstat` reports the slot's real state |
| OQ6 | — (retired; no subagent gateways in scope) | — |
| OQ7 | Resource limits per profile gateway | Defer (no per-cgroup limits; rely on the container's overall limit) |
| OQ8 | Log persistence | `$HERMES_HOME/logs/gateways/<profile>/`. The log path is sourced from runtime `$HERMES_HOME` via `with-contenv`, NOT Python-substituted at registration time |
| OQ9 | TUI passthrough | Trust the documented [s6-overlay#230](https://github.com/just-containers/s6-overlay/issues/230) Solution 1; harness includes a TTY passthrough hard-gate test |
**Post-merge additions from PR #30136 review:**
- **Multi-arch tarballs:**`TARGETARCH` mapped to `x86_64` / `aarch64`;
per-arch tarball fetched via `curl` because `ADD` doesn't honor BuildKit
args.
- **SHA256 verification:** all three tarballs (noarch, symlinks, per-arch)
pinned via build ARGs and verified with `sha256sum -c` against a single
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.