Compare commits

62 Commits

Author SHA1 Message Date
Teknium 0c859a1c04 chore: release v0.15.0 (2026.5.28) (#34008)
* chore: release v0.15.0 (2026.5.28)

The Velocity Release. run_agent.py refactor (16k→3.8k LOC, -76%),
kanban grows into a multi-agent platform (104 PRs), cold-start perf wave
continues (-240ms / -47% per-turn function calls / -195ms per tool call),
session_search rebuilt (4500x faster, no LLM), promptware defense lands,
Bitwarden Secrets Manager integration, two new image_gen providers
(Krea 2, FAL plugin port), Nous-approved MCP catalog, OpenHands skill,
ntfy as 23rd messaging platform, deep xAI integration round.
15 P0 + 65 P1 closures. 747 PRs, 1,302 commits, 321 contributors.

* chore(release): bump acp_registry/agent.json to 0.15.0 (sync with pyproject)
2026-05-28 10:45:33 -07:00
kshitij 1a74795735 feat: add claude-opus-4.8 and claude-opus-4.8-fast (#34003)
Anthropic released Claude Opus 4.8 on 2026-05-27, available on
OpenRouter, Anthropic, Amazon Bedrock, and Claude Platform on AWS:
  - https://openrouter.ai/anthropic/claude-opus-4.8
  - https://openrouter.ai/anthropic/claude-opus-4.8-fast

The fast-mode variant is a separate model ID (anthropic/claude-opus-4.8-fast)
priced at 2x the base model, a notable improvement over the 6x premium
on older Opus generations (4.6/4.7). Unlike on Opus 4.6, fast mode here is
NOT a `speed: "fast"` request parameter; Anthropic's native fast-mode beta
still only covers Opus 4.6.

Changes:

  hermes_cli/models.py
    - Add anthropic/claude-opus-4.8 + anthropic/claude-opus-4.8-fast to
      the OpenRouter fallback snapshot and the Nous Portal curated list
      (live catalogs surface them automatically when reachable; the
      fallback list matters when the manifest fetch fails).
    - Add claude-opus-4-8 to the Anthropic-native picker list.

  agent/model_metadata.py
    - Register claude-opus-4-8 / claude-opus-4.8 in DEFAULT_CONTEXT_LENGTHS
      with 1M tokens (matches 4.6/4.7).

  agent/anthropic_adapter.py
    - Extend _XHIGH_EFFORT_SUBSTRINGS, _ADAPTIVE_THINKING_SUBSTRINGS, and
      _NO_SAMPLING_PARAMS_SUBSTRINGS with "4-8"/"4.8". 4.8 inherits the
      Opus 4.7 API contract: adaptive thinking only, xhigh effort level
      supported, sampling parameters (temperature/top_p/top_k) return 400.
    - Add claude-opus-4-8 to _ANTHROPIC_OUTPUT_LIMITS (128k max output,
      same as 4.7). Matches by substring so claude-opus-4-8-fast and
      date-stamped variants resolve correctly.

  agent/usage_pricing.py
    - Add anthropic/claude-opus-4-8: $5/$25 per MTok input/output, $0.50
      cache read, $6.25 cache write (same as 4.6/4.7).
    - Add anthropic/claude-opus-4-8-fast: $10/$50 per MTok (2x), $1.00
      cache read, $12.50 cache write. Per OpenRouter, the 2x premium is
      the only differentiator from regular Opus 4.8.
    - OpenRouter routes still pull pricing from the live /models API, so
      no static OpenRouter entry is needed.

  tests/agent/test_model_metadata.py
    - Extend the Claude 4.6+ context-length tag list with 4.8/4-8.

  website/static/api/model-catalog.json
    - Regenerated via `python scripts/build_model_catalog.py` to pick up
      the new entries in the OpenRouter and Nous Portal fallback lists.

E2E verification (isolated sys.path import against the worktree):
  - _supports_adaptive_thinking, _supports_xhigh_effort, _forbids_sampling_params
    all return True for claude-opus-4.8 and claude-opus-4.8-fast.
  - _supports_fast_mode (the `speed: "fast"` request-parameter gate) stays
    False for 4.8 — fast mode is a separate model ID on OpenRouter, not a
    parameter Anthropic accepts on the base model.
  - DEFAULT_CONTEXT_LENGTHS resolves 1M for both notations.
  - resolve_billing_route + _lookup_official_docs_pricing resolve the
    correct $5/$25 (regular) and $10/$50 (fast) pricing for both
    dot-notation and dash-notation inputs.
  - 4.7 and 4.6 regression: behavior unchanged.

Unit tests: 305 passed across tests/agent/test_usage_pricing.py,
test_model_metadata.py, tests/hermes_cli/test_model_catalog.py,
test_models.py, test_model_validation.py, test_models_dev_preferred_merge.py.
2026-05-28 10:31:59 -07:00
Ben Heidorn e8b9369a9d feat(openrouter): pass session_id in extra_body for sticky routing
OpenRouter supports a session_id field in extra_body that pins
multi-turn conversations to the same provider endpoint, enabling
prompt cache reuse across turns. The session_id was already threaded
through to build_extra_body() but never included in the returned dict.
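
A minimal sketch of the fix described above. The real signature of
build_extra_body() is not shown in this commit, so the parameters here
are assumptions for illustration only:

```python
from typing import Any, Dict, Optional


def build_extra_body(
    session_id: Optional[str] = None,
    base: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the extra_body dict for an OpenRouter request.

    Hypothetical signature; only the session_id handling mirrors the commit.
    """
    extra_body: Dict[str, Any] = dict(base or {})
    # The bug: session_id reached this function but was never copied into
    # the returned dict, so sticky routing was never requested.
    if session_id:
        extra_body["session_id"] = session_id
    return extra_body
```

Omitting the key when no session_id is available keeps single-shot
requests unchanged.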

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
2026-05-28 08:52:19 -07:00
kshitij 0554ef1aa3 fix(agent): fallback immediately on provider content-policy blocks (#33883)
* fix(agent): fallback immediately on provider content-policy blocks

Provider safety-filter refusals (e.g. OpenAI Codex 'flagged for possible
cybersecurity risk', OpenAI moderation 'violates our usage policies',
Anthropic safety-system rejections, Azure content_filter) are
deterministic decisions about a specific prompt. Retrying the same
prompt up to api_max_retries times just reproduces the same refusal and
burns paid attempts before surfacing the generic 'API failed after 3
retries — <provider message>' to Telegram / cron with no indication that
the failure came from the model provider rather than Hermes itself.

Classify these as a new FailoverReason.content_policy_blocked
(non-retryable, should_fallback=True) and route them through the
existing is_client_error path so the loop:
  - skips the 3x retry backoff
  - activates a configured fallback model immediately
  - emits a clear provider-safety message to the user (not the generic
    'Non-retryable error (HTTP None)') and surfaces actionable guidance
    when no fallback is configured (rephrase, narrow context, or set
    fallback_model in hermes config)
  - returns a final_response that explicitly tells the user this came
    from the model provider, so gateway delivery is unambiguous and
    cron last_status reflects the safety block rather than a vague
    'agent reported failure'

Patterns are intentionally narrow — verbatim refusal phrasings keyed to
specific provider safety pipelines, not generic words like 'policy' or
'violation' that would collide with billing / format / auth errors.
Regression guards in test_18028_content_policy_blocked.py verify
billing 402s, generic 400s, and OpenRouter account-level
provider_policy_blocked remain distinct classifications.
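
The narrow-pattern idea can be sketched as below. This is an
illustration, not the classifier in the repo: the enum members other than
content_policy_blocked, the classify() helper, and its arguments are
assumptions; the two phrases are the ones quoted earlier in this message.

```python
import enum
from typing import Optional


class FailoverReason(enum.Enum):
    # Stand-in enum; only content_policy_blocked is named in the commit.
    content_policy_blocked = "content_policy_blocked"
    billing = "billing"
    unknown = "unknown"


# Verbatim refusal phrasings, deliberately not generic words like "policy".
_CONTENT_POLICY_PHRASES = (
    "flagged for possible cybersecurity risk",
    "violates our usage policies",
)


def classify(status: Optional[int], message: str, code: str = "") -> FailoverReason:
    """Classify a provider error (hypothetical helper)."""
    if status == 402:
        # Billing errors must stay distinct even if they mention "policy".
        return FailoverReason.billing
    text = message.lower()
    if code == "content_filter" or any(p in text for p in _CONTENT_POLICY_PHRASES):
        return FailoverReason.content_policy_blocked
    return FailoverReason.unknown


def should_retry(reason: FailoverReason) -> bool:
    """A safety refusal is deterministic, so retrying the same prompt is wasted."""
    return reason is not FailoverReason.content_policy_blocked
```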

Salvaged from #18164 onto current main, adapting to the file restructure
(loop logic moved from run_agent.py to agent/conversation_loop.py;
_emit_status → _buffer_status). Relative to the original PR, this:
  - broadens the patterns beyond the OpenAI Codex cybersecurity case to
    cover OpenAI moderation, the Anthropic safety system, and Azure
    content_filter
  - adds user-actionable guidance and a clear final_response so
    cron/gateway surfaces the policy block instead of a generic
    non-retryable error
  - adds a regression-guard test module mirroring the is_client_error
    predicate

Addresses #18028.

Co-authored-by: Kuan-Chieh Huang <kchuang1015@users.noreply.github.com>

* chore: add kchuang1015 to AUTHOR_MAP

2026-05-28 07:28:24 -07:00
kshitij a82c88bac0 fix(xai-oauth): accept bare-code manual paste (state=None) (#26923) (#33880)
xAI's consent page renders the authorization code in-page rather than
redirecting through the 127.0.0.1 callback, so on remote/headless setups
(GCP Cloud Shell, Codespaces, container consoles, headless VPS) the only
value the user can paste is the opaque code with no `code=`/`state=`
query parameters. `_parse_pasted_callback` correctly returns
`state=None` for that input, but `_xai_oauth_loopback_login` then
validated state unconditionally and raised `xai_state_mismatch`,
making the documented bare-code paste path unreachable.

PKCE (code_verifier) still binds the token exchange to this client,
so the local state-equality check is redundant when there is no state
to compare. On the manual-paste path only, substitute the locally
generated state when the callback returned none — the rest of the
validation chain (code presence, error field, token exchange) is
unchanged. The loopback HTTP-server path still requires a matching
state (a real browser redirect always carries one).
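
A sketch of the state rule described above, assuming a hypothetical
resolve_state() helper and exception type; the real check lives inside
`_xai_oauth_loopback_login` and is not reproduced here:

```python
from typing import Optional


class OAuthStateError(Exception):
    """Stand-in for the error raised as xai_state_mismatch."""


def resolve_state(returned_state: Optional[str], expected_state: str,
                  manual_paste: bool) -> str:
    """Return the state to validate against, or raise on mismatch."""
    if returned_state is None:
        if manual_paste:
            # Bare-code paste: no state to compare, PKCE still binds the
            # token exchange, so fall back to the locally generated state.
            return expected_state
        # Loopback HTTP path: a real redirect always carries a state.
        raise OAuthStateError("xai_state_mismatch")
    if returned_state != expected_state:
        # A wrong (non-None) state is rejected on both paths.
        raise OAuthStateError("xai_state_mismatch")
    return returned_state
```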

Also: clarify the manual-paste prompt to mention xAI's in-page code
rendering so users know pasting the bare code on its own is expected.

Root-cause analysis from #26923 comment by @AccursedGalaxy (2026-05-20).

Tests
-----
* test_xai_loopback_login_manual_paste_bare_code_succeeds — positive
  end-to-end through the token exchange with state=None.
* test_xai_loopback_login_loopback_path_rejects_missing_state — the
  HTTP-server path still rejects state=None as a regression guard
  (the bare-code relaxation must NOT widen the loopback path).
* Existing test_xai_loopback_login_manual_paste_state_mismatch_raises
  continues to verify wrong (non-None) state is rejected on manual-paste.

Closes #26923.
2026-05-28 05:47:30 -07:00
helix4u c0d04694ea docs(email): clarify gateway vs Himalaya setup 2026-05-28 05:42:09 -07:00
Teknium 67011cc0d7 feat(agent): buffer retry/fallback status, surface only on terminal failure (#33816)
Users report that the CLI/gateway floods them with confusing retry chatter
during transient failures: a single 429 can produce 10+ "Provider/Endpoint/
Retrying in 5s..." lines before the request eventually succeeds. The same
firehose hits Telegram, Discord, Slack, etc. via _emit_status.

This patch defers all retry/fallback/compression status messages until we
know the outcome:
  - if the turn ultimately succeeds (any path: primary recovers, fallback
    activates, compression unsticks the request), the buffer is silently
    dropped — the user sees nothing.
  - if every retry and fallback exhausts and the turn fails, the buffer
    is flushed at the terminal-failure return so the user sees the full
    retry trace alongside the final error.

Backend logging (agent.log) is unchanged — every emission site still
writes to logger.warning/info, so post-mortem diagnosis is intact.

## What changed

run_agent.py: four new methods on AIAgent:
  _buffer_status(msg)   — defer an _emit_status call
  _buffer_vprint(msg)   — defer a _vprint(force=True) line
  _clear_status_buffer() — drop pending messages on success
  _flush_status_buffer() — replay pending messages on terminal failure
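
Roughly, the four helpers behave like the standalone class below. This is
a sketch under assumptions: the real methods live on AIAgent, and the
StatusBuffer name and constructor arguments are hypothetical.

```python
from typing import Callable, List, Tuple


class StatusBuffer:
    """Defer status lines until the outcome of the turn is known."""

    def __init__(self, emit_status: Callable[[str], None],
                 vprint: Callable[[str], None]) -> None:
        self._emit_status = emit_status
        self._vprint = vprint
        self._pending: List[Tuple[str, str]] = []

    def buffer_status(self, msg: str) -> None:
        self._pending.append(("status", msg))

    def buffer_vprint(self, msg: str) -> None:
        self._pending.append(("vprint", msg))

    def clear(self) -> None:
        # Success path: the user never sees the retry chatter.
        self._pending.clear()

    def flush(self) -> None:
        # Terminal failure: replay everything in the original order.
        pending, self._pending = self._pending, []
        for kind, msg in pending:
            sink = self._emit_status if kind == "status" else self._vprint
            try:
                sink(msg)
            except Exception:
                # A broken sink must not mask the real error being reported.
                pass
```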

agent/conversation_loop.py:
  - converted ~30 mid-process emit/vprint sites in the retry, fallback,
    compression, empty-response, and stream-watchdog paths to the buffered
    helpers
  - added _flush_status_buffer() at every terminal-failure return so users
    still see the trace when it actually matters
  - added _clear_status_buffer() at the "non-empty assistant content"
    point (NOT at "API call returned bytes" — empty responses still loop
    through the empty-retry path and would otherwise lose their trace
    between iterations)
  - silenced the two "(´;ω;`) oops, retrying..." / "(╥_╥) error,
    retrying..." spinner final-frame messages — the spinner now stops
    cleanly so retries leave no visible residue

agent/chat_completion_helpers.py: same conversion for codex TTFB / stale-
stream / fallback-activation status messages.

agent/stream_diag.py: _emit_stream_drop now buffers instead of emitting
directly.

## Tests

tests/run_agent/test_retry_status_buffer.py: 7 unit tests covering
accumulate→flush, clear-on-success, mixed kinds, empty-buffer no-op,
re-buffer after flush, exception swallowing.

Updated 3 existing tests that mocked _emit_status to also mock (or use)
_buffer_status:
  - tests/run_agent/test_run_agent.py::test_empty_response_emits_status_for_gateway
  - tests/run_agent/test_stream_drop_logging.py (2 tests)
  - tests/agent/test_codex_ttfb_watchdog.py (TTFB hint test)

## Validation

Live test: hermes chat -q against an unreachable endpoint with no fallback
exhausts retries and prints the full trace at the end. Same flow against
a working endpoint prints zero retry chatter.
2026-05-28 04:53:27 -07:00
Teknium e0572a6def fix(skills-hub): stop ellipsis-truncating the Identifier column (#33810)
`hermes skills search` rendered the Identifier column with the default
overflow behaviour, so long slugs (notably browse-sh — every browse-sh
skill ends in a `-XXXXXX` hash that's part of the identifier) were cut
to `browse-sh/weathe…`. Users copied the visible string into
`hermes skills install` and got a not-found error because the hash was
gone.

Set overflow="fold" on the Identifier column in both search tables
(`do_search` and the `_resolve_short_name` multi-match table) so long
slugs wrap onto a second line instead of getting eaten. Also add a
`--json` flag to `hermes skills search` (and the `/skills search`
slash variant) for scripting — emits a list of {name, identifier,
source, trust_level, description} objects with the full identifier,
which is the right shape for copy-paste pipelines too.

Closes #33674.
2026-05-28 04:53:13 -07:00
Teknium 5e1f793430 chore(web): remove web_crawl tool + provider crawl plumbing (#33824)
The web_crawl_tool() function was an orphan — no model schema registered
it, no skill or CLI command called it, and the agent had no way to invoke
it. PR #32608 proposed wiring it up as a model-callable tool; we've
decided not to expose crawl as a separate capability since web_search +
web_extract cover the use cases we want models to have.

Removed:
- tools/web_tools.py: web_crawl_tool() (~230 LOC)
- plugins/web/firecrawl/provider.py: supports_crawl() + crawl()
- plugins/web/tavily/provider.py: supports_crawl() + crawl()
- plugins/web/xai/provider.py: supports_crawl() override
- agent/web_search_provider.py: supports_crawl() + crawl() ABC methods
- agent/web_search_registry.py: get_active_crawl_provider() +
  the 'crawl' branch in _resolve()
- agent/display.py: web_crawl tool-progress rendering
- hermes_cli/config.py: 'web_crawl' from TAVILY_API_KEY.tools
- tools/website_policy.py: stale comment reference
- Tests: removed TestWebCrawlTavily class, the two website-policy
  web_crawl tests, the searxng/ddgs/brave-free crawl-error tests,
  the integration test_web_crawl method, and the
  test_unconfigured_crawl_emits_top_level_error test. Trimmed the
  capability-flag parametrize list and the WebSearchProvider ABC
  conformance tests.
- Docs: trimmed the Crawl column from capability tables in both EN
  and zh-Hans, updated the developer-guide ABC table.

Net: 25 files, +115/-1067.

Closes #33762 (the schema-text bug only existed if #32608 landed).
Supersedes #32608.
2026-05-28 04:52:42 -07:00
teknium1 b243afb68b fix(discord): skip backfill for auto-created threads and update test fakes
When auto-threading kicked in, the broadened backfill gate ran on the
freshly-created thread — but the thread has no prior context to fetch,
and the parent-channel reference passed to _fetch_channel_context would
have leaked unrelated context (see #31467).

Skip backfill when auto_threaded_channel is set.  Also teach the
_FakeTextChannel / _FakeThreadChannel test doubles to expose a no-op
history() async generator so the broadened gate doesn't trip
AttributeError → discord.Forbidden (MagicMock) → TypeError in the
existing auto-thread tests.  Add a regression test that asserts
auto-threaded messages do not trigger backfill.
2026-05-28 04:52:02 -07:00
teknium1 68ddd6b338 refactor(discord): inline backfill gate and document intent
Drop the _needed_mention local variable now that it has only one use,
inline its expression as _has_mention_gap, and add a comment explaining
the three backfill cases (mention-gated channel, thread, DM skip).

Behaviorally identical to the prior commit; cleanup only.

Co-authored-by: liuhao1024 <liuhao1024@users.noreply.github.com>
2026-05-28 04:52:02 -07:00
Pluviobyte eafe11d456 fix(gateway): backfill Discord thread context
Discord threads where the bot has already participated bypass mention gating by default, but the backfill check was still tied to the mention-needed condition. That meant follow-up thread messages could trigger a response without providing recent thread history to the session.

Run history backfill for thread messages whenever backfill is enabled, while keeping DMs skipped and channel mention backfill behavior unchanged. Add a regression test for a known thread follow-up without an explicit mention.

Fixes #33666

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 04:52:02 -07:00
Teknium a1eaad2fc0 perf(skills-page): lazy-fetch the catalog instead of bundling 34MB into JS (#33809)
PR #33748 grew the live skills index from ~2k skills to ~69k, which made
the previous build-time bundling strategy untenable: the skills page's
JS chunk was about to balloon from ~1MB to ~35MB.  Initial page load
on mobile became unusable, search lagged on every keystroke against the
68k-item array, and JSON.parse blocked the main thread at startup.

Three changes:

1. extract-skills.py writes skills.json + skills-meta.json into
   website/static/api/ instead of website/src/data/.  Static-served by
   Vercel as /docs/api/skills.json (gzipped on the wire), same CDN that
   already serves skills-index.json.

2. skills/index.tsx drops the static import and fetches both files in
   parallel on mount.  Loading state shows '…' for the count; failures
   surface a small error pill instead of blanking the page.

3. Search is debounced 150ms and runs against a precomputed lowercase
   haystack stamped onto each row at load time.  Before: array-join +
   toLowerCase per row per keystroke on a 68k array.  After: single
   .includes() per row, deferred until typing settles.

Validation:

| | before | after |
|---|---|---|
| skills.json location | src/data/ (bundled) | static/api/ (CDN) |
| Largest JS chunk | would be ~35MB at 68k skills | 659 KB |
| Initial page render | wait for full parse | immediate, fetch async |
| Per-keystroke filter | join+lowercase x 68k rows | single includes x 68k rows |
| Debounce | none | 150ms |

Built locally for both en and zh-Hans locales; the 34MB skills.json now
lives in build/api/ and is served separately rather than inlined into
the page's bundle.

skills.json and skills-meta.json added to .gitignore — they were already
build artifacts, but the gitignore only listed skills-index.json before.
2026-05-28 03:41:43 -07:00
teknium1 6f9182cb34 fix(kanban): content-addressed corrupt-DB backup filename
Repeated quarantines of an unchanged corrupt kanban.db used to amplify
disk usage by N: the gateway dispatcher's 5-minute retry loop, multi-
profile fleets sharing one DB, and manual reopen attempts each produced
a fresh '.corrupt.<timestamp>.bak' copy of the same bytes. After 10
retries on a 100KB DB you had 11x the disk footprint of duplicate
corrupt data.

Derive the backup filename from a sha256 of the main DB instead of a
timestamp + collision counter. Same bytes → same filename → skip the
copy on retries. Different bytes (partial repair, further damage) →
different filename → preserve separately. Sidecar (-wal/-shm) backups
inherit the same content-addressed name.
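
A minimal sketch of the content-addressed naming, assuming a
hypothetical quarantine_corrupt_db() helper and a '.corrupt.<digest>.bak'
layout with a truncated digest (the exact filename format and sidecar
handling are not shown here):

```python
import hashlib
import shutil
from pathlib import Path


def quarantine_corrupt_db(db_path: Path) -> Path:
    """Copy a corrupt DB to a backup named after its content hash."""
    # Digest length is an assumption; any stable prefix works for the idea.
    digest = hashlib.sha256(db_path.read_bytes()).hexdigest()[:16]
    backup = db_path.with_name(f"{db_path.name}.corrupt.{digest}.bak")
    if not backup.exists():
        # Same bytes give the same name, so retries skip the copy.
        shutil.copy2(db_path, backup)
    return backup
```

The existence check is the whole deduplication mechanism: the backup
file itself acts as the marker.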

Inspired by @hanzckernel's PR #33529, simplified down to ~30 LOC: drop
the persistent JSON marker file, drop the atomic temp+fsync+rename
helper (shutil.copy2 is fine for a quarantine-only path), drop the
gateway-side WAL/SHM fingerprint extension (the existing
(path, mtime, size) tuple still gives the 5-minute retry semantics it
needs), and drop the gateway-side helper extraction. The backup file
existing IS the marker; no separate state needed.

Test: tests/hermes_cli/test_kanban_db.py::test_repeated_corrupt_open_reuses_single_backup
proves 10 retries on the same corrupt bytes produce 1 backup (was 11),
and mutating the corrupt bytes produces a second backup with a
different fingerprint.

Refs #33529
Co-authored-by: hanzckernel <zhicheng.han@mathematik.uni-goettingen.de>
2026-05-28 03:38:09 -07:00
Teknium 432a691758 fix(update): stream + idle-kill npm run build so a stalled webui-build can't soft-brick the install (#33803)
`hermes update` ran the webui build with `capture_output=True` and no timeout. On low-memory hosts (WSL2's 4 GB default, small VPSes, antivirus stalls) Vite goes silent for minutes; users see a frozen terminal, decide the update is hung, and reboot. The reboot lands *after* `pip install -e .` has already touched the install but *before* the build completes, leaving the `hermes` launcher in place while `hermes_cli` is no longer importable — i.e. `ModuleNotFoundError: No module named 'hermes_cli'` (#33788, same class as #32384).

Changes:

- New `_run_with_idle_timeout()` helper: streams subprocess output line-by-line (so the user sees Vite progress in real time) and kills the process if no bytes appear on stdout/stderr for 180s. The existing stale-dist fallback (#23817) then serves the previous build instead of failing the update.
- `_build_web_ui()` uses the helper for `npm run build` (the actual stall site). `npm install` keeps `subprocess.run` + capture_output to preserve the existing EPERM-retry-on-Windows contract.
- Both `cmd_update` call sites print `→ Core update complete. Building dashboard (optional)...` before the webui build. The CLI is fully functional at this point; a webui-build failure only affects `hermes dashboard`. Telegraphing the boundary explicitly stops users from rebooting through the build step.
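
The streaming + idle-kill idea can be sketched as below. This is not the code of `_run_with_idle_timeout()`: the function name, signature, and return convention are assumptions, and the sketch measures idleness per complete line rather than per byte.

```python
import queue
import subprocess
import sys
import threading
from typing import List, Optional


def run_with_idle_timeout(cmd: List[str], idle_timeout: float) -> Optional[int]:
    """Stream a command's output; return its exit code, or None if it was
    killed after producing no output for idle_timeout seconds."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines: "queue.Queue[Optional[str]]" = queue.Queue()

    def pump() -> None:
        assert proc.stdout is not None
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)  # EOF sentinel

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            line = lines.get(timeout=idle_timeout)
        except queue.Empty:
            # Nothing arrived within the idle window: treat as stalled.
            proc.kill()
            proc.wait()
            return None
        if line is None:
            return proc.wait()
        # Echo progress immediately so the terminal never looks frozen.
        sys.stdout.write(line)
        sys.stdout.flush()
```

A missing binary surfaces as FileNotFoundError from Popen in this sketch; the caller would decide whether to fall back to a stale build.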

Tests:

- `tests/hermes_cli/test_run_with_idle_timeout.py` — 4 tests covering streaming success, nonzero exit, idle-kill, and missing-binary cases. Uses real `subprocess.Popen` on tiny Python scripts; isolated in its own file so per-file canonical-runner parallelism doesn't pair it with the mock-heavy tests.
- `tests/hermes_cli/test_web_ui_build.py` — updated existing tests to patch `_run_with_idle_timeout` for the build step in addition to `subprocess.run` for the install step.
- `tests/hermes_cli/test_cmd_update.py::test_update_refreshes_repo_and_tui_node_dependencies` — same update.

Full suite: `scripts/run_tests.sh tests/hermes_cli/` → 5646 passed, 0 failed.

Fixes #33788.
2026-05-28 03:34:47 -07:00
teknium1 78be458608 fix(patch): widen new_string \t/\r unescape to all match strategies (#33733)
Extends @liuhao1024's escape-normalized fix so the patch tool also
recovers when old_string carries a real tab byte and matches via the
`exact` strategy — which is the headline reproduction in the issue and
the most common case in practice (LLMs frequently get old_string right
because they re-read the file, but still serialize new_string's tabs as
two-character `\t`).

Instead of gating on the match strategy, decide per-sequence by looking
at the *matched region of the file*: only convert `\t` -> tab and
`\r` -> CR when the file region we're replacing actually contains the
corresponding control byte. That mirrors the region-based heuristic in
`_detect_escape_drift` and keeps legitimate writes of the literal
two-character string `"\t"` (e.g. patching `sep = "\t"` in Python
source) untouched — those files have a backslash+t in the matched
region, not a real tab, so new_string passes through verbatim. `\n` is
still excluded because newlines serialize correctly through JSON and
unescaping would corrupt source escape sequences far more often than
help.
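
The per-sequence, region-based decision can be sketched as follows. The
helper name and signature are hypothetical; only the rule itself (convert
`\t` / `\r` when the matched file region holds the real control byte,
never touch `\n`) comes from the description above.

```python
def unescape_for_region(new_string: str, matched_region: str) -> str:
    """Unescape literal \\t and \\r in new_string only when the file region
    being replaced actually contains the corresponding control byte."""
    out = new_string
    if "\t" in matched_region:
        out = out.replace("\\t", "\t")
    if "\r" in matched_region:
        out = out.replace("\\r", "\r")
    # "\\n" is intentionally left alone.
    return out
```

For a tab-indented region, `unescape_for_region("\\tx = 1", "\tx = 0")`
yields a real tab; for a region containing the literal `sep = "\t"`
source text (backslash + t, no tab byte), new_string passes through
verbatim.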

E2E verified against the live `patch` tool: tab-indented file + literal
`\t` in new_string under both `exact` (Variant 1) and `escape_normalized`
(Variant 2) strategies now produces real tab bytes; a Python source line
containing `sep = "\t"` (legitimate literal backslash-t) survives a
patch unchanged.

Tests updated to cover both strategies and the legitimate-literal case,
and to assert that `\n` is intentionally preserved.

Refs #33733
2026-05-28 03:27:20 -07:00
liuhao1024 e9f3f2b34a fix(tools): unescape common sequences in new_string when escape_normalized matches
When the patch tool matches via the escape_normalized strategy, old_string
contains literal \t, \n, \r sequences that get unescaped to match real
control characters in the file. However, new_string was written as-is,
leaving literal backslash sequences in the output.

Add _unescape_common_sequences() helper and apply it to new_string when
the matching strategy is escape_normalized. This ensures LLM-generated
tab/newline sequences become real bytes in the patched file.

Fixes #33733
2026-05-28 03:27:20 -07:00
Teknium 10ee4a729b fix(gateway): drain on Windows hermes gateway stop so sessions survive restart (#33798)
Sessions now survive `hermes gateway stop` / `restart` on native Windows.
Previously the gateway died on schtasks `/End` + os.kill SIGTERM without
ever running the drain loop, so the v0.13.0 session-resume feature (#21192)
silently broke on Windows: `resume_pending=True` was never written, and
the next boot started with a blank conversation history (issue #33778).

Root cause is twofold and the reporter only identified half of it:

1. `hermes_cli/gateway_windows.py::stop()` did not write the
   `planned_stop_marker` before signalling. The reporter caught this.

2. The bigger reason: `asyncio.add_signal_handler` raises
   NotImplementedError for SIGTERM/SIGINT on Windows, so even if the
   marker had been written, the gateway's existing SIGTERM handler
   (which is what calls `runner.stop()` and the `mark_resume_pending`
   loop) was never invoked. Writing the marker would have been
   necessary-but-insufficient.

The fix has two parts:

* gateway/run.py: new `_run_planned_stop_watcher` daemon thread polls
  for the planned-stop marker file every 0.5s. When the marker appears,
  it calls `loop.call_soon_threadsafe(shutdown_signal_handler, None)`,
  driving the same shutdown path a real SIGTERM would have, including
  the pre-drain `mark_resume_pending` writes (run.py:5977) and the
  graceful drain wait. The existing signal handler already accepts
  `received_signal=None` and falls through to
  `consume_planned_stop_marker_for_self()`, so no handler changes are
  needed. Runs on every platform as cheap belt-and-suspenders.

* hermes_cli/gateway_windows.py: `stop()` now writes the marker for
  the running gateway PID and waits up to `agent.restart_drain_timeout`
  (default 30s) for the PID to exit cleanly. On clean drain, the kill
  sweep is non-forceful; on timeout, escalates to
  `kill_gateway_processes(force=True)` which routes to taskkill /T /F
  per `references/windows-native-support.md`.
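
The watcher half of the fix can be sketched like this. It is a
simplified illustration, not the code in gateway/run.py: the function
name, parameters, and marker handling are assumptions, and the
already-draining / not-yet-running checks are omitted.

```python
import asyncio
import threading
from pathlib import Path
from typing import Callable, Optional


def start_planned_stop_watcher(
    marker: Path,
    loop: asyncio.AbstractEventLoop,
    on_stop: Callable[[Optional[int]], None],
    stop_event: threading.Event,
    interval: float = 0.5,
) -> threading.Thread:
    """Poll for a marker file and dispatch the shutdown handler once."""

    def watch() -> None:
        # Event.wait() doubles as the sleep and as a prompt exit signal.
        while not stop_event.wait(interval):
            if marker.exists():
                # Hand off to the event loop thread; None mimics
                # "no signal received", as described above.
                loop.call_soon_threadsafe(on_stop, None)
                return  # fire once

    thread = threading.Thread(target=watch, daemon=True)
    thread.start()
    return thread
```

Polling a file sidesteps the platform gap entirely: it needs no signal
support from the event loop, which is why it also works where
`add_signal_handler` is unavailable.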

Validation:

* 7 new tests in tests/gateway/test_planned_stop_watcher.py covering:
  marker→handler dispatch, no-marker idle, already-draining skip,
  not-yet-running skip, stop_event responsiveness, fire-once
  semantics, error tolerance.
* 8 new tests in tests/hermes_cli/test_gateway_windows.py covering:
  marker-before-kill ordering, clean-drain skips force-kill,
  drain-timeout escalates to force=True, no-pid-skips-drain,
  invalid-pid handling, fast-exit success, timeout failure,
  marker-write-failure tolerance.
* E2E (Linux, detached orphan): write_planned_stop_marker(pid) +
  `_drain_gateway_pid(pid, 5.0)` returns True in 0.5s after the
  victim sees the marker and exits. Tested with a double-forked
  subprocess so the test parent isn't holding it as a zombie.
* Targeted: tests/gateway/{restart_drain,restart_resume_pending,
  signal,signal_format,status,shutdown_forensics,approve_deny_commands,
  planned_stop_watcher} + tests/hermes_cli/{gateway_windows,
  gateway_service} → 519/519.

What was wrong with the reporter's claim (for future archaeology): they
described the symptom as "no `resume_pending=True` written to
`sessions.json`" — but Hermes uses `state.db` (SQLite), not
`sessions.json`, and `mark_resume_pending` is called regardless of
the marker (the marker only affects exit code 0 vs 1 for systemd
revival semantics). The real session-loss path is the missing drain
on Windows, not a missing marker. Both halves are fixed here.

Closes #33778.
2026-05-28 03:25:32 -07:00
teknium1 f8896dedc8 chore(release): map biser@bisko.be -> bisko in AUTHOR_MAP 2026-05-28 03:21:00 -07:00
Biser Perchinkov b5495db701 fix(agent): re-pad reasoning_content on cross-provider fallback to require-side providers
api_messages is built once before the retry loop while the primary provider
is active. When a mid-conversation fallback switches to a require-side thinking
provider (DeepSeek/Kimi/MiMo), assistant turns built under a non-require primary
(e.g. Codex) go out without reasoning_content and the new provider rejects the
request with HTTP 400 ("reasoning_content must be passed back").

Re-apply the echo-back pad against the current provider immediately before
building the request kwargs. Idempotent and a no-op unless the active provider
enforces echo-back, so it covers all fallback paths without affecting normal or
reject-side operation.
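
A rough sketch of the idempotent re-pad, under stated assumptions: the
helper name, the boolean flag, and the empty-string pad value are all
hypothetical, and the real code may pad with different content.

```python
from typing import Any, Dict, List


def pad_reasoning_content(
    messages: List[Dict[str, Any]], provider_requires_echo: bool
) -> List[Dict[str, Any]]:
    """Ensure assistant turns carry reasoning_content for require-side
    providers; a no-op for every other provider."""
    if not provider_requires_echo:
        return messages
    for msg in messages:
        if msg.get("role") == "assistant" and "reasoning_content" not in msg:
            # Placeholder pad value (assumption for illustration).
            msg["reasoning_content"] = ""
    return messages
```

Because existing reasoning_content is never overwritten, running this
right before every request build is safe on all fallback paths.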

Drafted by Claude (Opus 4.7) under human review while fixing a personal deployment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 03:21:00 -07:00
Indigo Karasu 9179396cb7 fix(stream-consumer): only set _final_content_delivered when final response confirmed delivered
In GatewayStreamConsumer._run(), _final_content_delivered was set to True
based on the success of a mid-stream finalize edit, before the final
finalize edit was attempted. When the final edit later failed (Telegram
flood control, retry-after), _final_response_sent stayed False but
_final_content_delivered was already True, so gateway/run.py suppressed
its normal final send and the user saw a partial / fallback message
instead of the real answer.

Changes in gateway/stream_consumer.py:
- Remove the premature _final_content_delivered = True at the top of
  the got_done block.
- Set _final_content_delivered = True only when the actual final send /
  edit succeeds, in each finalize branch (no-finalize adapter,
  _message_id finalize, no-_already_sent send).
- _send_fallback_final: don't set _final_response_sent = True when only
  some chunks were delivered; the gateway should still attempt a
  complete final send. Set _final_content_delivered = True alongside
  _final_response_sent on the success path and short-text path.
- Cancellation handler: set _final_content_delivered = True alongside
  _final_response_sent when the best-effort final edit succeeds.

Adds TestFinalContentDeliveredGuard with 3 regression tests covering
the core bug scenario, the happy path, and partial fallback.

Closes #33708
Closes #25010
Refs #29200

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-05-28 03:15:19 -07:00
Dusk1e a91b1c8b31 fix(tirith): reject non-regular tar members during auto-install process 2026-05-28 02:49:26 -07:00
teknium1 247b24b49f chore(release): add AUTHOR_MAP entry for AdityaRajeshGadgil 2026-05-28 02:45:25 -07:00
Aditya Rajesh Gadgil 031983bbf8 fix: limit pre-update state snapshots 2026-05-28 02:45:25 -07:00
Teknium 8b6beaab5f docs: 30-day overhaul — correctness audit, PR coverage, Nous Portal weave, sidebar reorg (#33782)
* docs(audit): correctness pass across getting-started, reference, features, messaging, developer-guide, guides, integrations, user-guide

* docs: add PR coverage for last 30d + Nous Portal weave + nav reorg + build fixes

- Add docs for top user-visible PRs that shipped without docs (api-server
  session control, kanban features, telegram pin/edit, provider client tag,
  xAI retired-model migration, cron name lookup, --branch update flag, etc.)
- Apply Nous Portal weave across 23 pages (tasteful one-liners on
  getting-started/learning-path, configuration, overview, vision, x-search,
  credential-pools, provider-routing, cron, codex-runtime, profiles, docker,
  messaging/index, multiple guides, plus FAQ + index promotion)
- Reorganize sidebar: split Messaging into Popular/M365/Chinese/Other,
  Reference into Command/Configuration/Tools-Skills sub-categories, add
  orphan developer-guide pages (web-search-provider-plugin,
  browser-supervisor), move features from Integrations back to Features,
  fold lone spotify into Media & Web.
- Regenerate skill stubs + catalogs (kanban-codex-lane, hermes-s6-container-
  supervision, web-pentest)
- Fix broken anchor links (security/cron, configuration/fallback, telegram
  large-files, adding-platform-adapters step-by-step)
2026-05-28 02:41:36 -07:00
teknium1 c7f7783e5c test(xai-proxy): regression coverage for #28932 429 handling
Three new tests in tests/hermes_cli/test_proxy.py:

- xai_adapter_retry_rotates_pool_entry_on_429 — headline #28932 case.
  Two-entry pool, 429 on first entry, must rotate to second entry
  AND must NOT call refresh_xai_oauth_pure (refresh is irrelevant
  for rate limits).
- xai_adapter_retry_returns_none_on_429_when_pool_exhausted —
  single-entry pool: 429 returns None so the rate-limit response
  flows back to the client unchanged (existing behavior preserved).
- xai_adapter_retry_returns_none_for_unrelated_status — non-{401,
  429} statuses must not trigger any retry path at all; guards
  against the gate becoming too broad in future changes.

Each test asserts that refresh_xai_oauth_pure is never called on the
429 path — refresh is a 401-specific concern.

39/39 in tests/hermes_cli/test_proxy.py.
2026-05-28 02:36:37 -07:00
sprmn24 4ed482549f fix(xai-proxy): handle 429 rate-limit responses in proxy retry path
get_retry_credential only triggered on 401; a 429 Too Many Requests from
xAI was silently streamed back with no key rotation or back-off signal.

- server.py: widen retry gate from == 401 to in {401, 429}
- xai.py: on 429, skip token refresh and call mark_exhausted_and_rotate
  to stamp the 1-hour cooldown on the rate-limited key and return the
  next available credential. Returns None if pool is exhausted.
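
Illustrative sketch of the retry decision described above (not the code in server.py or xai.py). `CredentialPool`, its fields, and the `refresh` callable are hypothetical stand-ins; only the {401, 429} gate, the skip-refresh-on-429 rule, and the 1-hour cooldown come from this message.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

RETRYABLE_STATUSES = {401, 429}  # gate widened from `== 401`
COOLDOWN_SECONDS = 3600          # 1-hour cooldown named in the commit message


@dataclass
class CredentialPool:
    """Hypothetical pool: an ordered list of keys plus cooldown stamps."""
    keys: list[str]
    exhausted_until: dict[str, float] = field(default_factory=dict)

    def mark_exhausted_and_rotate(self, key: str, now: float) -> Optional[str]:
        # Stamp the cooldown on the rate-limited key, then return the first
        # key whose cooldown (if any) has already expired.
        self.exhausted_until[key] = now + COOLDOWN_SECONDS
        for candidate in self.keys:
            if self.exhausted_until.get(candidate, 0.0) <= now:
                return candidate
        return None  # pool exhausted: the 429 flows back to the client


def get_retry_credential(
    status: int,
    current_key: str,
    pool: CredentialPool,
    refresh: Callable[[str], Optional[str]],
    now: Optional[float] = None,
) -> Optional[str]:
    now = time.time() if now is None else now
    if status not in RETRYABLE_STATUSES:
        return None  # unrelated statuses never enter a retry path
    if status == 429:
        # Rate limit: a token refresh would not help, so rotate instead.
        return pool.mark_exhausted_and_rotate(current_key, now)
    return refresh(current_key)  # 401: refresh is the relevant remedy
```

Passing `now` explicitly keeps the cooldown logic testable without patching the clock.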
2026-05-28 02:36:37 -07:00
Dusk1e aa3466063b fix(android): reject unsafe tar members in psutil compatibility installer 2026-05-28 02:36:09 -07:00
Teknium bb0ac5ced2 chore(release): AUTHOR_MAP entry for vynxevainglory-ai
PR #29233 salvage.
2026-05-28 02:33:51 -07:00
Teknium 70abae8e3b fix(kanban): show horizontal scrollbar instead of wrapping columns
Salvage follow-up on top of @vynxevainglory-ai's PR #29233. Keep the
column-body flex:1 + min-height:0 fix (tall columns scroll internally
now), but drop the flex-wrap: wrap part — instead just stop hiding
the existing horizontal scrollbar.

PR #523254b34 (sadiksaifi, May 18) deliberately moved the kanban board
from a wrapping grid to a single-row pinned-width flex so the board
stays as one stable horizontal row. The mistake in that PR was the
scrollbar-width: none + ::-webkit-scrollbar { display: none } pair,
which hid the affordance so columns past the viewport became visually
inaccessible. Fixing that hidden-scrollbar bug while keeping the
single-row design honors both contributors' intent.
2026-05-28 02:33:51 -07:00
Vynxe Vainglory 538f0fa339 fix(kanban): wrap columns into rows and fix vertical overflow
Two CSS issues in the kanban dashboard:

1. Columns overflow horizontally with no way to reach them — the
   original scrollbar-width: none hid the scrollbar entirely, and
   even with a scrollbar, a wrapping layout is better UX for a board
   with 8+ columns. Changed to flex-wrap: wrap and removed the
   overflow-x: auto + hidden scrollbar rules. Columns now flow into
   multiple rows (~3 per row on a typical viewport) instead of
   running off-screen.

2. .hermes-kanban-column-body lacked flex: 1 and min-height: 0,
   so the flex child's implicit min-height: auto prevented it from
   shrinking below its content size. Columns with many cards pushed
   past the parent max-height instead of scrolling internally.

Verified: 9 columns wrap into 3 rows, all visible without
horizontal scroll. Done column (53 tasks) scrolls vertically
within its column bounds.
2026-05-28 02:33:51 -07:00
teknium1 9b5dae17a5 feat(context-engine): host contract for external context engines
Condenses the substance of PRs #16453, #17453, #16451, #17600, and #13373
into a minimal generic host contract that external context engine plugins
(e.g. hermes-lcm) need to integrate cleanly. Drops scaffolding that
duplicated existing infrastructure or had marginal value.

Five concrete changes:

1. `_transition_context_engine_session()` on AIAgent — generic lifecycle
   helper that fires on_session_end → on_session_reset → on_session_start
   → optional carry_over_new_session_context. Engines implement only the
   hooks they need; missing hooks are skipped. Built-in compressor keeps
   its existing reset-only behavior because callers default to no
   metadata. `reset_session_state()` now optionally accepts
   previous_messages / old_session_id / carry_over_context and delegates
   to the transition helper when provided. (#16453)

2. `conversation_id` passed to `on_session_start()` — both the
   agent-init call site and the compression-boundary call site now
   forward `self._gateway_session_key` so plugin engines have a stable
   conversation identity that survives session_id rotation (compression
   splits, /new, resume). The key already existed on AIAgent; it just
   wasn't reaching engines. (#16453)

3. Canonical cache buckets forwarded to engines — the usage dict passed
   to `update_from_response()` now includes input_tokens, output_tokens,
   cache_read_tokens, cache_write_tokens, and reasoning_tokens on top of
   the legacy prompt/completion/total keys. Engines can make decisions on
   cache-hit ratios and reasoning costs instead of only aggregates. ABC
   docstring updated. (#17453)

4. Plugin-registered context engines visible in the picker —
   `_discover_context_engines()` in plugins_cmd.py now also includes
   engines registered via `ctx.register_context_engine()` from plugin
   manifests, deduplicating by name so repo-shipped descriptions win on
   collision. (#16451)

5. `_EngineCollector.register_command()` — context engines using the
   standard `register(ctx)` pattern can now expose slash commands (e.g.
   `/lcm`). Routes to the global plugin command registry with the same
   conflict-rejection policy regular plugins use (no shadowing built-ins,
   no clobbering other plugins). Previously these calls hit a no-op and
   the slash commands silently never appeared. (#17600)
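
Rough sketch of the "missing hooks are skipped" lifecycle from item 1. The hook names and their order come from this message; the function name, the keyword arguments passed to each hook, and the return value are assumptions made for illustration and do not reflect the real `_transition_context_engine_session()` signature.

```python
from typing import Any, Optional

# Hook order as described in item 1 above.
LIFECYCLE_HOOKS = ("on_session_end", "on_session_reset", "on_session_start")


def transition_session(
    engine: Any,
    *,
    old_session_id: str,
    new_session_id: str,
    conversation_id: Optional[str] = None,
    carry_over: bool = False,
) -> list[str]:
    """Fire lifecycle hooks in order; return the names of hooks that ran."""
    # Hypothetical per-hook arguments; the real signatures are not shown here.
    kwargs_by_hook = {
        "on_session_end": {"session_id": old_session_id},
        "on_session_reset": {},
        "on_session_start": {
            "session_id": new_session_id,
            "conversation_id": conversation_id,
        },
    }
    fired = []
    for name in LIFECYCLE_HOOKS:
        hook = getattr(engine, name, None)
        if callable(hook):  # engines implement only the hooks they need
            hook(**kwargs_by_hook[name])
            fired.append(name)
    if carry_over:
        hook = getattr(engine, "carry_over_new_session_context", None)
        if callable(hook):
            hook(old_session_id=old_session_id, new_session_id=new_session_id)
            fired.append("carry_over_new_session_context")
    return fired
```

An engine that defines only `on_session_reset` sees exactly one call, which mirrors the reset-only behavior the built-in compressor keeps.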

Dropped from the original 5 PRs:

- Compression boundary signal (`boundary_reason="compression"`) from
  #16453 — already on main at `agent/conversation_compression.py:412-424`,
  landed via the bg-review extraction.

- `discover_plugins()` before fallback in run_agent.py from #16451 —
  redundant: `get_plugin_context_engine()` already routes through
  `_ensure_plugins_discovered()` which is idempotent.

- Runtime identity diagnostics method + helpers from #13373 (+251 LOC) —
  operators can already read engine state via `engine.get_status()`;
  the diagnostics view added marginal value relative to its surface area.

- The 553-LOC slash-command machinery from #17600 — replaced with a
  20-LOC `register_command` method on the collector that reuses the
  existing plugin command registry instead of building a parallel one.

Net: ~215 LOC of host-contract changes + 282 LOC of focused tests, vs
~1,176 LOC across the original 5 PRs.

Co-authored-by: Tosko4 <1294707+Tosko4@users.noreply.github.com>

Closes #16453.
Closes #17453.
Closes #16451.
Closes #17600.
Closes #13373.
Related: stephenschoettler/hermes-lcm#68.
2026-05-28 01:45:30 -07:00
Teknium fb9f3a4ef9 fix(skills): pull full ClawHub catalog into the skills index (200 → 20k+) (#33748)
* fix(skills): pull full ClawHub catalog into the skills index

The website was showing 200 ClawHub skills out of 20k+ because
`ClawHubSource.search("")` for empty queries went straight to a single
unpaginated request. ClawHub's API caps any single page at 200 items and
returns a `nextCursor`; we grabbed page 1 and stopped, so the cached
index served from hermes-agent.nousresearch.com had a silent 99%
truncation.

End users never hit clawhub.ai directly (the index is rebuilt twice
daily by .github/workflows/skills-index.yml and served as a static JSON
on the docs site), so the cap-and-cache architecture is correct — it
just wasn't being filled.

Changes:
- `ClawHubSource.search(query="")` now routes through the existing
  `_load_catalog_index()` paginating walker instead of the unpaginated
  listing fallback (non-empty queries still hit the fast catalog search).
- `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live
  catalog is ~20k as of May 2026, with headroom for growth).
- `build_skills_index.py`: per-source crawl limits split out — ClawHub
  and LobeHub get 100k, others keep their effective caps.
- `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination
  regression hard-fails the CI build instead of silently shipping a
  degenerate index.

Test plan:
- New unit test `test_search_empty_query_paginates_full_catalog`
  exercises the cursor-following path with three mocked pages (450
  total items) and asserts all pages are walked.
- Existing 9 ClawHub tests + 127 broader skills_hub tests all pass.
- E2E against live ClawHub API: walker reached 9700+ skills across 49
  pages before this commit landed, paginating well past the previous
  50-page cap.

* fix(skills): raise ClawHub ceilings — live catalog is 50k, not 20k

E2E walk against live ClawHub API hit my initial 250-page cap at 49,698
skills with cursor=yes still pending. The catalog is roughly 2.5x larger
than the docstring estimate.

- max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None
  well before this in practice)
- SOURCE_LIMITS['clawhub'] 100k → 200k
- EXPECTED_FLOORS['clawhub'] 5000 → 20000
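
Minimal sketch of a cursor-following walk with a page ceiling, as described above. `walk_catalog` and `fetch_page` are hypothetical names; the real `_load_catalog_index()` and the ClawHub response shape may differ. Only the `nextCursor` idea, the 200-item page cap, and the `max_pages` ceiling are taken from this message.

```python
from typing import Callable, Optional

# A page is (items, next_cursor); next_cursor is None on the last page.
Page = tuple[list[dict], Optional[str]]


def walk_catalog(
    fetch_page: Callable[[Optional[str]], Page],
    max_pages: int = 750,
) -> list[dict]:
    """Follow cursors until the API stops returning one or the cap is hit."""
    items: list[dict] = []
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page_items, cursor = fetch_page(cursor)
        items.extend(page_items)
        if cursor is None:  # normal termination: catalog fully walked
            break
    return items
```

With 200 items per page, `max_pages=750` bounds the walk at 150k items; a walk that stops at the cap with a cursor still pending is the truncation case the raised `EXPECTED_FLOORS` value is meant to catch in CI.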
2026-05-28 01:42:19 -07:00
Teknium 09a5cd8084 fix(auth): sync manual:device_code Codex pool entries on re-auth (#33744)
#33164 made _save_codex_tokens sync the singleton-seeded `device_code`
pool entry on Codex OAuth re-auth. That fixed the #33000 path but missed
`manual:device_code` entries created by `hermes auth add openai-codex`
(the recommended workaround for users who hit #33000 before #33164
landed).

Every subsequent re-auth would refresh the device_code entry but leave
the manual:device_code entry holding the consumed refresh token plus
stale last_error_* markers — immediately recreating the 401
token_invalidated symptom on the next request, exactly as reported in
#33538.

Extend the refreshable source set to include `manual:device_code`.
Completing the device-code OAuth flow proves the user owns the ChatGPT
account, so it is safe to refresh every device-code-backed entry. Keep
`manual:api_key` and other non-device-code manual sources untouched —
those represent independent credentials.

Closes #33538.
2026-05-28 01:33:10 -07:00
Dusk1e 43abc51f66 fix(security): require source CIDR allowlisting for public msgraph webhook binds 2026-05-28 01:26:18 -07:00
Teknium 986abb3cf7 docs: drop stale Kimi/DeepSeek vision example (#33736)
Kimi K2.6 is natively multimodal — flagged by Shengyuan from the Kimi
growth team. Replace the named-vendor example with a model-agnostic
phrasing so the row doesn't go stale as more vendors ship vision.
2026-05-28 01:23:38 -07:00
Teknium 87e5b2fae0 feat(mcp): support TLS client certificates (mTLS) for HTTP and SSE servers (#33721)
Adds first-class `client_cert` / `client_key` config keys so MCP servers
behind mTLS work without an external TLS-terminating proxy. Resolves
inbound community question (Jeremy W.).

Schema (per `mcp_servers.<name>`, HTTP/SSE only):

- `client_cert: "/path/to/combined.pem"` — single PEM with cert + key
- `client_cert: "/path/to/cert"` + `client_key: "/path/to/key"` — separate
- `client_cert: [cert, key]` or `[cert, key, password]` — list form,
  with optional passphrase for encrypted keys

Paths support `~` expansion. Missing files raise a server-scoped
`FileNotFoundError` at connect time rather than failing later with an
opaque TLS handshake error.

Wiring:

- New SDK HTTP path (mcp >= 1.24): `cert=` on the user-owned
  `httpx.AsyncClient` alongside the existing `verify=` handling.
- SSE path: routed through an `httpx_client_factory` that wraps the
  SDK's defaults (follow_redirects=True) and layers `verify` + `cert`
  on top. The factory is only injected when needed, so the SDK's
  built-in `create_mcp_http_client` keeps being used in the default
  case.
- Deprecated mcp<1.24 path left untouched — that SDK's
  `streamablehttp_client` signature doesn't expose `cert`, and adding
  it would be dead code.

Also documents the previously-undocumented `ssl_verify` key (bool or
CA bundle path) in the MCP config reference.

Tests:

- `tests/tools/test_mcp_client_cert.py` (new, 19 tests):
  - `_resolve_client_cert` helper: all three input forms, `~` expansion,
    missing-file and validation errors.
  - HTTP transport: `cert=` forwarded into `httpx.AsyncClient` for
    string and tuple forms; absent when unset; missing-file error
    propagates.
  - SSE transport: factory only injected when cert or non-default
    verify is set; factory applies cert, custom CA bundle, and
    preserves `follow_redirects=True` + forwarded headers/auth.
- Existing tests: 200/200 in `test_mcp_tool.py` + `test_mcp_sse_transport.py`
  still pass.
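
Sketch of how the three `client_cert` input forms could be normalized. This is an illustration, not the actual `_resolve_client_cert` helper: the function name, argument order, and error wording are assumptions. The return shape (a single path, or a 2- or 3-tuple) is chosen to match what the schema above describes.

```python
import os
from typing import Optional, Sequence, Union

CertValue = Union[str, tuple]


def resolve_client_cert(
    server_name: str,
    client_cert: Union[str, Sequence[str], None],
    client_key: Optional[str] = None,
) -> Optional[CertValue]:
    """Normalize the config value to a path or a (cert, key[, password]) tuple."""
    if client_cert is None:
        return None  # mTLS not configured for this server
    if isinstance(client_cert, str):
        # Combined PEM, or separate cert + key via the two config keys.
        parts = [client_cert] if client_key is None else [client_cert, client_key]
    else:
        parts = list(client_cert)
        if client_key is not None or len(parts) not in (2, 3):
            raise ValueError(
                f"mcp_servers.{server_name}: client_cert list must be "
                "[cert, key] or [cert, key, password] without client_key"
            )
    # Only the first two entries are file paths; a third entry is a passphrase.
    paths = [os.path.expanduser(p) for p in parts[:2]]
    for path in paths:
        if not os.path.isfile(path):
            # Fail at connect time with the server name, not in the handshake.
            raise FileNotFoundError(
                f"mcp_servers.{server_name}: client certificate file not found: {path}"
            )
    resolved = paths + parts[2:]
    return resolved[0] if len(resolved) == 1 else tuple(resolved)
```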
2026-05-28 00:55:55 -07:00
Stephen Schoettler 8595281f3c fix: expose context engine tools with saved toolsets 2026-05-28 00:28:42 -07:00
Dusk1e 1a9ef83147 fix(security): require API_SERVER_KEY before dispatching API server work 2026-05-28 00:25:08 -07:00
LeonSGP43 442a9203c0 Fix xAI OAuth timeout manual fallback 2026-05-28 00:24:17 -07:00
helix4u 459d7694d3 fix(agent): preload jiter native parser 2026-05-28 00:20:11 -07:00
Robin Fernandes dc52b82d53 test(auth): update entitlement CI expectations 2026-05-28 00:19:31 -07:00
Robin Fernandes 1cf5e639b3 fix(auth): refresh Nous entitlement in tool menus 2026-05-28 00:19:31 -07:00
Robin Fernandes 406901b27d feat(auth): normalise how we check whether a user has free/paid access to Nous Portal, so we can expose behaviour and error messages accordingly 2026-05-28 00:19:31 -07:00
stephenschoettler 0bf9b867cf fix(website): pin serialize-javascript and uuid via npm overrides
Resolves the two Dependabot alerts currently open against the website
lockfile:

- serialize-javascript: pin to ^7.0.5 (was 6.0.2 — high-severity RCE
  via RegExp.flags + Date.prototype.to*, plus medium-severity DoS)
- uuid: pin to ^14.0.0 (was 8.3.2 — medium buffer bounds check miss
  in v3/v5/v6 when buf is provided)

Lockfile regenerated against current main (not the stale lockfile
from the original PR — several Dependabot bumps for mermaid,
webpack-dev-server, @babel/plugin-transform-modules-systemjs,
fast-uri, lodash-es+langium, lodash, follow-redirects, and dompurify
have landed since #30036 was opened, so the website portion was
re-applied surgically on top of those).

Salvaged the website half of PR #30036. The TUI test half landed
on main separately, so this PR is web-only.
2026-05-28 00:07:54 -07:00
kshitijk4poor 7b778db472 chore(release): map MoonRay305 contributor email for #32759 salvage
Adds `squiddy@2rook.ai → MoonRay305` to AUTHOR_MAP so contributor_audit.py
passes for the salvaged commits in #33482-followup PR.
2026-05-27 23:28:51 -07:00
Squiddy 3ba8962738 fix(kanban): add Windows init lock guard 2026-05-27 23:28:51 -07:00
Squiddy 90b6b3d18f fix(kanban): harden sqlite connection concurrency 2026-05-27 23:28:51 -07:00
Brian D. Evans 3ad46933d3 docs(voice): use uv pip install faster-whisper in STT install hints (#29800)
* docs(voice): use `uv pip install faster-whisper` in STT install hints

Three runtime messages told users to `pip install faster-whisper`
(reported in #29782 for the gateway STT failure message under
Telegram-in-Docker, where the user hit `bash: pip: command not
found`). The Hermes Docker image is built on `ghcr.io/astral-sh/uv`
with a uv-managed venv that doesn't ship `pip` on PATH; users on
modern `uv tool install` / `uv venv` installs see the same problem.

The canonical install command in this repo is `uv pip install`
(see `tools/lazy_deps.py:509` `feature_install_command()`), which
works in Docker (uv image), in `uv tool install` venvs, and in
pip-based venvs that already have uv on PATH.

Changed three locations to match:

- `gateway/run.py` — Telegram/Discord/Slack/WhatsApp/etc. voice
  reply when no STT provider is configured. Suggests
  `uv pip install faster-whisper` and notes that
  `pip install faster-whisper` also works if `pip` is on PATH.
- `tools/voice_mode.py` — `/voice` status line for missing STT.
- `cli.py` — Voice-mode startup error, "Option 1".

No behavior change beyond the user-facing text. No production
code path was touched.

* docs(voice): add pip fallback to cli + voice_mode STT hints

Copilot flagged that cli.py and tools/voice_mode.py recommend
`uv pip install faster-whisper` without a fallback for environments
where uv isn't on PATH. The gateway/run.py message already lists
`pip install faster-whisper` as an alternative; this commit aligns
the two remaining call sites to match.

Addresses inline Copilot review on #29800.

---------

Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>
2026-05-28 16:23:14 +10:00
Teknium 4e702fe2d9 test(ci): harden two flaky tests against CI noise (#33675)
Two unrelated transient failures on PR #33661's initial CI run, both
pre-existing on main and recovered on rerun. Hardening:

1. tests/cron/test_scheduler.py::TestRunJobConfigLogging — added mocks for
   resolve_runtime_provider() and discover_mcp_tools(). The yaml-warning
   tests intend to exercise only the warning-log path, but
   _run_job_impl continues into provider resolution and MCP discovery
   after the warning. Both can spawn subprocesses / hit the network and
   pushed the test over its 30s budget under GHA load.

2. tests/tools/test_browser_supervisor.py — wrapped Chrome teardown
   against the stdlib subprocess._wait() race (bpo-38630). When SIGCHLD
   arrives during proc.wait(), _try_wait(WNOHANG) can return a foreign
   pid and the 'assert pid == self.pid or pid == 0' fires. Fixture now
   catches AssertionError/TimeoutExpired, force-kills, and always reaps
   so no zombie escapes. Same hardening applied to the early-skip branch.
2026-05-27 23:15:41 -07:00
Ben 875d930ac7 test(docker-update): stub subprocess.run in git-install regression guard
The regression-guard test
`test_cmd_update_on_git_install_does_not_print_docker_message` mocked
`is_managed` and `detect_install_method` but not `subprocess.run`, so
once `cmd_update(check=True)` decided this was a git install it shelled
out to a real `git fetch upstream` / `git fetch origin`. On CI runners
the worktree has no `upstream` remote configured and the fetch hung
past the 30s pytest-timeout — test (4) slice failed in #33659 CI.

Fix: stub `subprocess.run` with a successful CompletedProcess-shaped
object whose stdout is `"0\n"`, so:
  - no real git command is ever invoked
  - the rev-list parsing later in the flow (`int(stdout.strip())`)
    succeeds rather than `ValueError`-ing through the test's
    SystemExit catch
  - the flow proceeds far enough to confirm the docker banner is
    absent (the actual assertion)

Also broaden the except clause to `(SystemExit, Exception)`: the only
assertion in this test is the negative-banner check on captured stdout;
any further failure in the rest of the update flow is irrelevant to
that contract.
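
Sketch of the stubbing technique described above, using `unittest.mock`. `commits_behind` is a hypothetical stand-in for the part of the update flow that fetches and then parses a rev-list count; the git arguments are illustrative, not the ones `cmd_update` uses.

```python
import subprocess
from unittest import mock


def commits_behind() -> int:
    """Hypothetical stand-in: fetch, then parse a commit count from stdout."""
    subprocess.run(["git", "fetch", "origin"], capture_output=True, text=True)
    out = subprocess.run(
        ["git", "rev-list", "--count", "HEAD..origin/main"],
        capture_output=True,
        text=True,
    )
    return int(out.stdout.strip())  # raises ValueError on non-numeric stdout


def check_with_stub() -> int:
    # CompletedProcess-shaped result whose stdout parses cleanly as an int.
    fake = subprocess.CompletedProcess(args=[], returncode=0, stdout="0\n", stderr="")
    with mock.patch("subprocess.run", return_value=fake) as run:
        behind = commits_behind()
        assert run.call_count == 2  # both calls hit the stub, not real git
    return behind
```

Returning `"0\n"` rather than an empty string matters: an empty stdout would make the `int(...)` parse fail before the flow reaches the code under test.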

Verified locally: all 7 tests in
`tests/hermes_cli/test_cmd_update_docker.py` pass in 0.39s (previously
the regression-guard test alone consumed 30s+ and got SIGTERM'd).
2026-05-28 15:50:25 +10:00
Ben b924b22a9d fix(docker): hermes update prints docker pull guidance instead of bogus git error
Inside the published Docker image, `hermes update` was hitting the
".git missing → reinstall via curl" fallback:

    ✗ Not a git repository. Please reinstall:
      curl -fsSL https://raw.githubusercontent.com/.../install.sh | bash

That message is wrong on two counts:
  1. It tells the user to run the host-side installer, which would
     install a *new* Hermes on the host — not update the running
     container.
  2. It doesn't mention `docker pull` at all, leaving Docker users
     to figure out the right action from scratch.

`hermes update --check` was worse: it bailed with "Not a git
repository — cannot check for updates." and nothing else.

Fix: detect the Docker install method (already stamped by
`docker/stage2-hook.sh` and surfaced by `detect_install_method()`)
in both update entry points and print a long-form message that
covers:

  - The right command: `docker pull nousresearch/hermes-agent:latest`
  - Restart guidance (`docker compose up -d --force-recreate` /
    re-run `docker run`)
  - How to verify the new version after restart
  - Tag-pinning caveat (`:latest` doesn't move a pinned tag)
  - Config persistence across upgrades (state under `HERMES_HOME` /
    `/opt/data` is bind-mounted and survives)
  - Fork escape hatch (build your own image with the repo's Dockerfile)

The exit code is 1, matching the `managed_error` semantics for "tried to
update but can't update this way".

Plumbing:
  - hermes_cli/config.py: new `format_docker_update_message()` helper
    sits next to the existing `_NIX_UPDATE_MSG` /
    `format_managed_message()` family so the wording lives in one
    place and both call sites (apply path + check path) consume it.
  - hermes_cli/main.py:
      * `cmd_update()`: bail right after the `is_managed()` gate, before
        any of the apply-path branches.
      * `_cmd_update_check()`: bail at the top of the function, before
        the existing `method == "pip"` branch.
    Neither path touches subprocess.run / git when method == "docker".

Coverage:
  - 7 new tests in `tests/hermes_cli/test_cmd_update_docker.py`:
      * `hermes update` in Docker → message + exit 1, no git calls
      * `hermes update --check` (via cmd_update) → same
      * `--yes` / `--force` don't bypass (intentional)
      * `_cmd_update_check` called directly → bails too
      * git/pip installs still take their normal paths (regression guards)
      * `format_docker_update_message` content-lock test pinning the
        five user-actionable bits the message must contain
  - Existing test_cmd_update.py (21 tests) + test_managed_installs.py
    (5 tests) still pass — no regression on the source-install path.
  - Verified end-to-end in a real container: `docker run ... update`
    and `docker run ... update --check` both render the message and
    exit 1.
2026-05-28 15:50:25 +10:00
stephenschoettler 4a6f1863ac test: cover ci-unblocker production regressions
Snapshot review_agent._session_messages before teardown so close() can
clean per-session state without dropping the user-visible
self-improvement summary. Adds two regressions:

- bg-review summarizer receives captured review-agent tool messages
  after review_agent.close() runs
- context-compressor protected-head handoff rehydration populates
  _previous_summary and keeps the old handoff out of newly summarized
  turns

Salvaged from PR #26039 onto current main after agent/background_review.py
extraction. Original commit 63eaf6055; bg-review test updated to patch
the module-level summarize_background_review_actions in
agent.background_review instead of the now-forwarder
AIAgent._summarize_background_review_actions.
2026-05-27 22:14:53 -07:00
Ben 66489f38c7 fix(docker): bake build-time git SHA into the image
`hermes dump` and the startup banner both call `git rev-parse HEAD` to
report the running commit, but `.dockerignore` line 2 excludes `.git` —
so inside the published image `hermes dump` shows
`version: ... [(unknown)]` and the banner drops its `· upstream <sha>`
suffix entirely.  That makes support triage from container bug reports
impossible: we can't tell which commit the user is actually running.

Fix: thread the build-time SHA through as a Docker build-arg, write it
to `/opt/hermes/.hermes_build_sha` in the image, and have a new
`hermes_cli/build_info.get_build_sha()` read it as a fallback after the
existing live-git lookup fails.  Output format is unchanged in both
callsites — same 8-char short SHA whether resolved live or baked.

Wiring:
  - Dockerfile: `ARG HERMES_GIT_SHA=` + write-file step after the source
    copy.  Empty/missing arg → no file written → callers fall through to
    live git (so local `docker build` without --build-arg is unchanged).
  - docker-publish.yml: passes `HERMES_GIT_SHA=${{ github.sha }}` on all
    four build-push-action steps (amd64/arm64, smoke-test + final push).
  - dump.py:_get_git_commit() / banner.py:get_git_banner_state(): try
    live git first, fall back to baked SHA, then to legacy `(unknown)`
    / None.  Banner returns `upstream == local, ahead=0` because a built
    image is by definition pinned to one commit.

Coverage:
  - Unit tests cover build_info (file present/absent/empty/error,
    truncation, whitespace), dump (live-git wins, both fallbacks,
    identical output-format regression guard), and banner (no-repo +
    baked, no-repo + no-sha, shallow-clone fallback).
  - tests/docker/test_dump_build_sha.py is an integration regression
    guard that runs against the real image, reads
    `/opt/hermes/.hermes_build_sha`, and asserts `hermes dump` surfaces
    its content (or stays at `(unknown)` if no file).
  - Verified end-to-end: `docker build --build-arg HERMES_GIT_SHA=abc...`
    → `docker run ... dump` reports `[abc12345]`; without the build-arg
    it reports `[(unknown)]` as before.
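
Sketch of the fallback order described above: live git first, then the baked file, then `(unknown)`. The file path and the 8-character length come from this message; the function bodies, parameter names, and the `resolve_commit` helper are assumptions, not the code in `hermes_cli/build_info.py` or `dump.py`.

```python
import subprocess
from pathlib import Path
from typing import Optional, Union

BUILD_SHA_FILE = Path("/opt/hermes/.hermes_build_sha")  # path from the commit message


def get_build_sha(path: Union[str, Path] = BUILD_SHA_FILE, length: int = 8) -> Optional[str]:
    """Return the baked SHA truncated to `length`, or None if unavailable."""
    try:
        sha = Path(path).read_text(encoding="utf-8").strip()
    except OSError:
        return None  # file absent or unreadable: no baked SHA
    return sha[:length] or None  # empty file counts as absent


def resolve_commit(repo_dir: str = ".", sha_file: Union[str, Path] = BUILD_SHA_FILE) -> str:
    """Live git lookup first, baked SHA second, legacy placeholder last."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "--short=8", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
        live = out.stdout.strip()
        if live:
            return live
    except (OSError, subprocess.CalledProcessError):
        pass  # no git binary, no repo, or bad cwd: fall through
    return get_build_sha(sha_file) or "(unknown)"
```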
2026-05-28 15:14:05 +10:00
teknium1 ebe04c66cd fix(kanban): close kanban.db FD after every connect() in long-lived processes
`sqlite3.Connection.__exit__` commits or rolls back the transaction but does
NOT close the underlying FD. `with kb.connect() as conn:` in long-lived processes
(gateway `run_slash`, dashboard `decompose_task_endpoint`) therefore
leaks one FD to `kanban.db` per call. After enough operations the
gateway dies with `[Errno 24] Too many open files` (~4 days uptime
in the production report — #33159).

Fix: add a `connect_closing()` context manager in `hermes_cli/kanban_db`
that wraps `connect()` with a real `try/finally: conn.close()`. Switch
the 42 leak-prone call sites in `hermes_cli/kanban.py` (35),
`hermes_cli/kanban_decompose.py` (4), and `hermes_cli/kanban_specify.py`
(3) over to it.

`kanban.py` matters because `run_slash` (called from the gateway for
every `/kanban` slash command) parses argparse and dispatches to those
`_cmd_*` functions in-process — each one was leaking one FD per
invocation.

Call sites inside `tests/` are untouched: they run in short-lived processes
where OS cleanup masks the leak. Regression tests added in
`test_kanban_db.py` cover both happy-path and exception-path closure,
plus an explicit assertion that bare `with kb.connect()` still does
NOT close (documenting the upstream sqlite3 behaviour we're working
around).
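
Minimal sketch of the pattern, assuming a plain `sqlite3.connect()` in place of the project's own `connect()`; the real `connect_closing()` in `hermes_cli/kanban_db` may take different arguments or do more.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def connect_closing(path: str = ":memory:") -> Iterator[sqlite3.Connection]:
    """Yield a connection and always close it, even if the body raises."""
    conn = sqlite3.connect(path)
    try:
        yield conn
    finally:
        conn.close()  # releases the file descriptor


def is_closed(conn: sqlite3.Connection) -> bool:
    """Probe helper: a closed connection raises ProgrammingError on use."""
    try:
        conn.execute("SELECT 1")
    except sqlite3.ProgrammingError:
        return True
    return False
```

Note that this sketch does not commit on exit; callers that relied on the commit performed by `with conn:` would still need to commit explicitly or nest the two context managers.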

Closes #33159.
2026-05-27 22:07:49 -07:00
Teknium 6d947e4d78 feat(image_gen/fal): add Krea 2 Medium + Large to FAL catalog (#33506)
fal announced Krea 2 day-0 as an official API partner on 2026-05-27.
Add both variants to the FAL_MODELS catalog so they appear in the
'hermes tools' model picker alongside flux-2, gpt-image, nano-banana,
etc. Users who already bill through FAL or Nous Portal subscription
can now use Krea without registering directly with Krea.

Model IDs (as listed in fal's launch announcement):
  fal-ai/krea/v2/medium/text-to-image  — $0.030 / image
  fal-ai/krea/v2/large/text-to-image   — $0.060 / image

Both share the same parameter schema:
  - aspect_ratio (1:1, 4:3, 3:2, 16:9, 2.35:1, 4:5, 2:3, 9:16)
    mapped from our 3 abstract ratios via size_style='aspect_ratio'
  - creativity (raw|low|medium|high; default medium)
  - seed (reproducibility)
  - image_style_references (up to 10 per Krea's API spec)

No num_inference_steps / guidance_scale / num_images — Krea 2 does
not expose those, and the supports-set filter strips them defensively
if the agent ever passes them.

This is the FAL-routed variant. The separate native-Krea-API plugin
shipped in PR #33236 (plugins/image_gen/krea/) remains available for
users who want to bill directly through Krea's API with their own
key. Both routes converge on the same underlying model.

Nous Portal managed-FAL gateway: this commit makes the model IDs
known to the catalog and the picker. The Portal team will need to
allowlist these two endpoint slugs on the fal-queue origin server-side
for them to flow through the managed billing path.
2026-05-27 21:42:52 -07:00
Wesley Simplicio 10f13c3881 fix(web): allow mobile dashboard scrolling (#28051) (#28577)
* fix(web): allow mobile dashboard scrolling

* fix(web): combine mobile root scroll rules

---------

Co-authored-by: Wesley Simplicio <wesley.simplicio.ext@siemens-energy.com>
2026-05-28 00:02:50 -04:00
Austin Pickett c9410b3462 feat(web): add collapsible sidebar for the dashboard (#33421)
* feat(web): add collapsible sidebar for the dashboard

The desktop sidebar can now be collapsed to an icon-only rail via a
toggle button in the sidebar header.  State is persisted in
localStorage so it survives page reloads.

When collapsed (lg+ only):
- Sidebar shrinks from w-64 to w-14 with a smooth width transition
- Nav items show only their icon with a native title tooltip
- Brand text, plugin headings, system actions, theme/language
  switchers, auth widget, and footer are hidden
- Mobile drawer behavior is unchanged (always full-width)

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): align sidebar tooltips to sidebar edge consistently

Tooltip left position now uses the sidebar's right edge instead of the
anchor element's right edge, so narrow anchors (theme/language switchers)
align with full-width anchors (nav links, system actions).

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(web): add tooltip animations, restore theme label, rename Sessions tab

- Sidebar tooltips now animate in with a subtle 120ms ease-out slide;
  subsequent tooltips within the same hover sequence appear instantly
  (no delay/animation) following Emil Kowalski's tooltip pattern
- Restore theme name label when sidebar is expanded
- Rename Sessions segment tab to "History" across all 16 locales

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(web): smooth sidebar collapse animation

- Remove icon centering on collapse; icons stay left-aligned at px-5
  so they don't jump during the width transition
- Text labels fade out with opacity transition instead of instant
  display:none, clipped naturally by overflow-hidden
- Slow collapse duration from 450ms to 600ms for a more relaxed feel
- Gateway dot always rendered with opacity toggle so it doesn't
  slide in from the right on collapse
- Pin gateway dot at fixed left offset (pl-[1.625rem]) to align
  with nav icons
- Align header toggle button with justify-center when collapsed
- Bottom switchers use items-start when collapsed to prevent reflow

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-27 23:58:41 -04:00
Dusk c341a2d107 fix(docker): align HOME for dashboard and s6 gateway services (#33481) 2026-05-28 13:42:27 +10:00
teknium1 71b4a6b18e fix(docker): install python-is-python3 so bare python resolves in containers
Debian 13 ships only `python3` — there's no `/usr/bin/python` symlink. When
the agent emits bash commands using bare `python` (which models do frequently
from their training prior), every such call fails with:

    /usr/bin/bash: python: command not found
    Tool terminal returned error … exit_code 127

The agent then retries with different approaches, sessions take longer, and
agent.log fills with WARNING noise.

`python-is-python3` is the standard Debian package that drops a
`/usr/bin/python → python3` symlink. ~30 KB, zero behavior change for
anything calling `python3` directly; transparent fix for everything else.

Fixes #33178.
2026-05-28 13:37:17 +10:00
Ben Barclay aeb992d343 fix(docker): drop docker exec to hermes uid before invoking the CLI
When operators ran `docker exec <c> hermes login` (or anything else
that wrote under $HERMES_HOME) they defaulted to root, leaving
/opt/data/auth.json root:root mode 0600. The supervised gateway
(UID 10000) then couldn't read its own credentials and returned
"Provider authentication failed: Hermes is not logged into Nous
Portal" on every Telegram/Discord/etc. message — even though
`docker exec <c> hermes chat -q ping` (also root) succeeded because
root could read its own root-owned file. _load_auth_store swallowed
PermissionError as a parse failure and copied the file aside as
auth.json.corrupt, making the diagnostic more misleading.

Fix: install a privilege-drop shim at /opt/hermes/bin/hermes,
prepended ahead of the venv on PATH. When invoked as root the shim
exec's the real venv binary via `s6-setuidgid hermes` — so any file
the docker-exec session writes is uid-aligned with the supervised
processes. Non-root callers (the supervised processes themselves,
`docker exec --user hermes`, kanban subagents, anything inside the
container that's not coming through docker-exec) hit a single exec
to the absolute venv path with no privilege change.

Recursion is impossible: the shim exec's the venv binary by
absolute path (/opt/hermes/.venv/bin/hermes), so the second hop
cannot re-enter the shim regardless of PATH state. No sentinel env
var needed (unlike #33583's gateway-run redirect which DOES need
HERMES_S6_SUPERVISED_CHILD because there's no absolute-path
equivalent for the s6 dispatch).

Opt-out: `docker exec -e HERMES_DOCKER_EXEC_AS_ROOT=1 …` for
diagnostic sessions where the operator deliberately wants root.
Strict truthiness (1/true/yes case-insensitive); typos like `=0`
do not silently opt out, mirroring HERMES_GATEWAY_NO_SUPERVISE in
#33583.
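
The strict-truthiness rule above can be sketched in a few lines. The real shim is a shell script whose source is not shown here, so this Python version is only an illustration; the function name `is_opt_out` is hypothetical.

```python
# Illustrative only: the actual check lives in the shell shim, which is not shown here.
TRUTHY = {"1", "true", "yes"}


def is_opt_out(value):
    """Return True only for an explicit 1/true/yes, compared case-insensitively."""
    if value is None:
        # Variable not set at all: no opt-out.
        return False
    return value.lower() in TRUTHY
```

With this shape, `HERMES_DOCKER_EXEC_AS_ROOT=0`, an empty value, or a near-miss such as `on` all leave the privilege drop in place.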

If `s6-setuidgid` is missing (someone stripped s6-overlay in a
downstream fork), the shim exits 126 with a remediation message
pointing at `--user hermes` and the opt-out — never silently runs
as root.
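
Taken together, the shim's branches described in this commit message amount to a small decision table. The sketch below models it in Python purely for illustration; the function `plan_exec` and its parameters are hypothetical, and the order in which the real shell script checks the opt-out and the missing `s6-setuidgid` case is an assumption.

```python
VENV_HERMES = "/opt/hermes/.venv/bin/hermes"  # absolute path, so the second hop cannot re-enter the shim


def plan_exec(uid, env, have_setuidgid, args):
    """Return (argv, exit_code): the command to exec, or None plus an exit code.

    Hypothetical model of the shim's branches, not the shim itself.
    """
    opt_out = env.get("HERMES_DOCKER_EXEC_AS_ROOT", "").lower() in {"1", "true", "yes"}
    if uid != 0 or opt_out:
        # Non-root callers and deliberate root sessions: one exec, no privilege change.
        return [VENV_HERMES, *args], None
    if not have_setuidgid:
        # Refuse rather than silently run as root.
        return None, 126
    # Root caller: drop to the hermes account before running the real binary.
    return ["s6-setuidgid", "hermes", VENV_HERMES, *args], None
```

For example, `plan_exec(0, {}, True, ["login"])` yields the `s6-setuidgid hermes` form, while a caller with uid 10000 gets the venv binary directly.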

Test plan:
- tests/docker/test_docker_exec_privilege_drop.py — 11 tests
  - shim drops root to hermes uid (file ownership check)
  - shim short-circuits for non-root docker exec
  - HERMES_DOCKER_EXEC_AS_ROOT=1 keeps root
  - strict-truthiness parametrization (5 falsy values reject)
  - main CMD path unaffected (recursion guard)
  - E2E: every file written by docker-exec is readable by uid 10000
- Full tests/docker/ harness: 32/32 pass against fresh image build
- shellcheck --severity=error: clean
- hadolint: clean
- Manual: reproduced the original symptom (root-owned auth.json)
  by bypassing the shim; confirmed default docker-exec produces
  hermes-owned files; confirmed opt-out env keeps root semantics.

Known follow-up: this prevents NEW instances of the bug. Volumes
that already have root:root /opt/data/auth.json from a pre-shim
image need a one-time `chown hermes:hermes` before rebooting onto
the new image. A stage2-hook chown sweep can self-heal that, but
is deferred per scope decision.
2026-05-28 13:30:36 +10:00
Ben Barclay b345323195 fix(docker): tee supervised gateway stdout to docker logs
Follow-up to #33583 (the gateway-run-supervised redirect).

Before this fix, the supervised gateway's stdout (most visibly the
"Hermes Gateway Starting…" rich-console banner) was swallowed by
`s6-log` into the rotated file at
`${HERMES_HOME}/logs/gateways/<profile>/current` and never reached
`docker logs`. Operational signal lived in two places:

  * **docker logs** — saw stderr (Python `logging` defaults to
    stderr), so warnings/errors were visible.
  * **the rotated file** — saw stdout (rich banners, `print()`
    output, third-party libs that wrote to fd 1).

This was surprising for users coming from the pre-s6 image, where
`docker run … gateway run` produced a single unified stream in
`docker logs`. They'd see partial output, conclude something was
broken, and dig around for the missing pieces.

Fix: add the `1` s6-log action directive before the file destination
so each line is forwarded to s6-log's stdout — which propagates up
the s6-supervise pipeline to /init's stdout = container stdout =
`docker logs`. The file destination is preserved as a second
destination, so the rotated log (with ISO 8601 timestamps) still
exists for `hermes logs` and for survival across container restarts.

Trade-off considered: timestamps. Putting `T` between `1` and the
file destination (not before `1`) means:

  * docker logs sees raw lines — Python's logging formatter has its
    own timestamps, and `docker logs --timestamps` adds another
    layer when desired. No double-stamping in the common reading
    path.
  * The persisted file gets s6-log's ISO 8601 timestamp so even
    output that lacked a Python-logger timestamp (rich banners,
    third-party raw prints) is correlatable in `current`.
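
The directive ordering described above can be sketched as a tiny renderer. This is an assumption-laden illustration: the helper name `s6_log_args` is hypothetical, and the real run-script may carry additional s6-log directives (rotation limits, for instance) that are not shown in this commit message.

```python
def s6_log_args(log_dir):
    """Build an s6-log argv that forwards raw lines to stdout, then timestamps the file copy.

    Hypothetical helper; only the relative order of "1", "T", and the
    directory destination is taken from the commit message.
    """
    return [
        "s6-log",
        "1",      # action: forward each line to s6-log's stdout (reaches docker logs)
        "T",      # control: prepend an ISO 8601 timestamp for the next action
        log_dir,  # action: rotated log directory (must be an absolute or ./ path)
    ]
```

Because `T` comes after `1`, only the directory destination receives the s6-log timestamp.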

Verification:

  * New unit-test assertion in `test_service_manager.py` locks the
    `s6-log 1` directive into the rendered run-script. Mutation-
    tested by reverting to the pre-fix script (no `1`); the assert
    catches it cleanly.
  * New docker-harness test `test_supervised_gateway_stdout_reaches_docker_logs`
    builds the image, runs `docker run … gateway run`, and asserts
    the unique `⚕` banner glyph reaches `docker logs`. Also verifies
    the rotated file still contains the banner (no regression on
    the existing file destination). Mutation-tested end-to-end: built
    a deliberately-broken image without the `1` directive and the
    test failed exactly as designed, citing the banner present in
    `current` but absent from `docker logs`.
  * `website/docs/user-guide/docker.md` gains a new `:::note Where
    gateway logs go` admonition documenting both destinations and
    the audit-log file at `${HERMES_HOME}/logs/container-boot.log`.

Existing functionality preserved: every other docker-harness test
still passes against the new image. Unit-test sweep across
`tests/hermes_cli/` (5561 tests) is green.
2026-05-28 13:18:41 +10:00
332 changed files with 12774 additions and 2521 deletions
+8
@@ -71,6 +71,8 @@ jobs:
load: true
platforms: linux/amd64
tags: ${{ env.IMAGE_NAME }}:test
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
cache-from: type=gha,scope=docker-amd64
cache-to: type=gha,mode=max,scope=docker-amd64
@@ -149,6 +151,8 @@ jobs:
platforms: linux/amd64
labels: |
org.opencontainers.image.revision=${{ github.sha }}
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
outputs: type=image,name=${{ env.IMAGE_NAME }},push-by-digest=true,name-canonical=true,push=true
cache-from: type=gha,scope=docker-amd64
cache-to: type=gha,mode=max,scope=docker-amd64
@@ -203,6 +207,8 @@ jobs:
load: true
platforms: linux/arm64
tags: ${{ env.IMAGE_NAME }}:test
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
cache-from: type=gha,scope=docker-arm64
cache-to: type=gha,mode=max,scope=docker-arm64
@@ -228,6 +234,8 @@ jobs:
platforms: linux/arm64
labels: |
org.opencontainers.image.revision=${{ github.sha }}
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
outputs: type=image,name=${{ env.IMAGE_NAME }},push-by-digest=true,name-canonical=true,push=true
cache-from: type=gha,scope=docker-arm64
cache-to: type=gha,mode=max,scope=docker-arm64
+6
@@ -78,6 +78,12 @@ mini-swe-agent/
.nix-stamps/
result
website/static/api/skills-index.json
# skills.json + skills-meta.json are build artifacts emitted by
# website/scripts/extract-skills.py during prebuild — keep them out of
# git for the same reason as skills-index.json (large, generated, change
# every build).
website/static/api/skills.json
website/static/api/skills-meta.json
models-dev-upstream/
hermes_cli/tui_dist/*
hermes_cli/scripts/
+44 -2
@@ -25,7 +25,7 @@ ENV PLAYWRIGHT_BROWSERS_PATH=/opt/hermes/.playwright
# hermes process, the dashboard, and per-profile gateways.
RUN apt-get update && \
apt-get install -y --no-install-recommends \
ca-certificates curl python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli xz-utils && \
ca-certificates curl python3 python-is-python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli xz-utils && \
rm -rf /var/lib/apt/lists/*
# ---------- s6-overlay install ----------
@@ -187,6 +187,29 @@ RUN chmod -R a+rX /opt/hermes && \
# this a fast (~1s) egg-link creation with no resolution or downloads.
RUN uv pip install --no-cache-dir --no-deps -e "."
# ---------- Bake build-time git revision ----------
# .dockerignore excludes .git, so `git rev-parse HEAD` from inside the
# container always returns nothing — meaning `hermes dump` reports
# "(unknown)" and the startup banner drops its `· upstream <sha>` suffix.
# That makes support triage from container bug reports impossible:
# we can't tell which commit the user is actually running.
#
# Fix: write the commit SHA passed via the HERMES_GIT_SHA build-arg to
# /opt/hermes/.hermes_build_sha at build time, and have
# hermes_cli/build_info.py read it at runtime. Both `hermes dump` and
# banner.get_git_banner_state() try the baked SHA first, then fall back
# to live `git rev-parse` for source installs (unchanged behaviour).
#
# The arg is optional — local `docker build` without --build-arg simply
# omits the file, and the runtime falls back to live-git lookup. CI
# (.github/workflows/docker-publish.yml) passes ${{ github.sha }} so
# every published image has it.
ARG HERMES_GIT_SHA=
RUN if [ -n "${HERMES_GIT_SHA}" ]; then \
printf '%s\n' "${HERMES_GIT_SHA}" > /opt/hermes/.hermes_build_sha && \
chown hermes:hermes /opt/hermes/.hermes_build_sha; \
fi
# ---------- s6-overlay service wiring ----------
# Static services declared at build time: main-hermes + dashboard.
# Per-profile gateway services are registered dynamically at runtime by
@@ -213,13 +236,32 @@ COPY --chmod=0755 docker/cont-init.d/02-reconcile-profiles /etc/cont-init.d/02-r
# ---------- Runtime ----------
ENV HERMES_WEB_DIST=/opt/hermes/hermes_cli/web_dist
ENV HERMES_HOME=/opt/data
# `docker exec` privilege-drop shim. When operators run
# `docker exec <c> hermes ...` they default to root, and any file the
# command writes under $HERMES_HOME (auth.json, .env, config.yaml) ends
# up root-owned and unreadable to the supervised gateway (UID 10000).
# The shim lives at /opt/hermes/bin/hermes, sits earliest on PATH, and
# transparently re-exec's the real venv binary via `s6-setuidgid hermes`
# when invoked as root. Non-root callers (supervised processes,
# `--user hermes`, etc.) hit the short-circuit path with no overhead.
# Recursion is impossible because the shim exec's the venv binary by
# absolute path (/opt/hermes/.venv/bin/hermes). See the shim source for
# the opt-out env var (HERMES_DOCKER_EXEC_AS_ROOT=1).
COPY --chmod=0755 docker/hermes-exec-shim.sh /opt/hermes/bin/hermes
# Pre-s6 entrypoint.sh did `source .venv/bin/activate` which exported
# the venv bin onto PATH; Architecture B's main-wrapper.sh does the
# same for the container's main process, but `docker exec` and our
# cont-init.d scripts don't pass through the wrapper. Expose the venv
# bin globally so `docker exec <container> hermes ...` and any
# subprocess that doesn't activate the venv first still find hermes.
ENV PATH="/opt/hermes/.venv/bin:/opt/data/.local/bin:${PATH}"
#
# /opt/hermes/bin is prepended ahead of the venv so the privilege-drop
# shim wins PATH resolution. The shim's last act is to exec the venv
# binary by absolute path, so this PATH ordering is transparent to
# every other consumer.
ENV PATH="/opt/hermes/bin:/opt/hermes/.venv/bin:/opt/data/.local/bin:${PATH}"
RUN mkdir -p /opt/data
VOLUME [ "/opt/data" ]
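
The "Bake build-time git revision" comments in the Dockerfile diff above describe a two-step lookup: baked SHA first, live `git rev-parse` second. A minimal sketch of that lookup follows. The contents of `hermes_cli/build_info.py` are not shown in this diff, so the function name `resolve_git_sha`, its parameters, and the timeout are assumptions.

```python
import subprocess
from pathlib import Path

BUILD_SHA_FILE = Path("/opt/hermes/.hermes_build_sha")


def resolve_git_sha(sha_file=BUILD_SHA_FILE, repo_dir="."):
    """Return the baked build SHA if present, else try live git, else None.

    Hypothetical sketch; not the project's actual build_info module.
    """
    try:
        baked = sha_file.read_text(encoding="utf-8").strip()
    except OSError:
        # File absent (local build without --build-arg) or unreadable.
        baked = ""
    if baked:
        return baked
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
            timeout=5,
        )
    except (OSError, subprocess.SubprocessError):
        # git missing, not a repository, or the call timed out.
        return None
    return result.stdout.strip() or None
```

Returning `None` rather than raising leaves it to the caller to print a placeholder such as "(unknown)".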
+655
@@ -0,0 +1,655 @@
# Hermes Agent v0.15.0 (v2026.5.28)
**Release Date:** May 28, 2026
**Since v0.14.0:** 1,302 commits · 747 merged PRs · 1,746 files changed · 282,712 insertions · 36,699 deletions · 560+ issues closed (15 P0, 65 P1, 19 security-tagged) · 321 community contributors (including co-authors)
> **The Velocity Release.** Hermes gets dramatically faster — to start, to run, to ship work, and to grow. The 16,083-line `run_agent.py` collapses to 3,821 (-76%) across 14 cohesive `agent/*` modules. Kanban grew into a real multi-agent platform across 104 PRs — orchestrator auto-decomposition, swarm topology, scheduled tasks, worktree-per-task, per-task model overrides. The cold-start perf wave keeps going: another second shaved off launch, 47% fewer per-conversation function calls, `hermes --version` flipping the head-to-head benchmark against Codex CLI. `session_search` is 4,500× faster and free now. Promptware defense lands against Brainworm-class attacks. Bitwarden Secrets Manager replaces N per-provider API keys with one bootstrap token. Skill bundles let one slash command load a whole workflow. The Ink TUI gets a multi-session orchestrator. Two new image_gen providers (Krea 2 Medium + Large, FAL ported to plugin), the Nous-approved MCP catalog with an interactive picker, an OpenHands orchestration skill, ntfy as the 23rd messaging platform, and a deep xAI integration round (Web Search plugin, xai-oauth `hermes proxy` upstream, retired-May-15 model detection + `hermes migrate xai`, natural TTS speech-tag pauses, base_url leak guard, OpenAI-style execution guidance for Grok). 15 P0 + 65 P1 closures alongside.
---
## ✨ Highlights
- **The Big Refactor — `run_agent.py` is no longer 16,000 lines** — The file at the heart of Hermes — the agent conversation loop — has been reduced from 16,083 lines to 3,821 (-76%), with the extracted code redistributed across 14 cohesive modules under `agent/`. Behavior is unchanged: every extraction keeps a thin forwarder on `AIAgent`, every test patch path still works, every external caller is compatible. The reason you care: future Hermes development moves faster, plugin authors can finally grep the codebase, and the file that took 90 seconds to load in your editor opens in a blink. ([#27248](https://github.com/NousResearch/hermes-agent/pull/27248))
- **Kanban grew into a real multi-agent platform — 104 PRs end to end** — Triage auto-decomposes one task into a tree of sub-tasks. `hermes kanban swarm` creates a full Swarm v1 graph in one command — root, parallel workers, gated verifier, gated synthesizer, shared blackboard. Tasks support per-task model overrides (cheap models for boilerplate, expensive ones for hard sub-tasks), board-level default workdirs, per-task worktree paths and branches, scheduled start times, configurable claim TTL, retry fingerprinting, stale-task detection, respawn guards, and a drag-to-delete trash zone. Workers report through `/workers/active`, `/runs/{id}`, and `/inspect` endpoints. ([#27572](https://github.com/NousResearch/hermes-agent/pull/27572), [#28443](https://github.com/NousResearch/hermes-agent/pull/28443), [#28364](https://github.com/NousResearch/hermes-agent/pull/28364), [#28394](https://github.com/NousResearch/hermes-agent/pull/28394), [#28462](https://github.com/NousResearch/hermes-agent/pull/28462), [#28384](https://github.com/NousResearch/hermes-agent/pull/28384), [#28467](https://github.com/NousResearch/hermes-agent/pull/28467), [#28455](https://github.com/NousResearch/hermes-agent/pull/28455), [#28452](https://github.com/NousResearch/hermes-agent/pull/28452), [#28432](https://github.com/NousResearch/hermes-agent/pull/28432), [#28468](https://github.com/NousResearch/hermes-agent/pull/28468), [#28420](https://github.com/NousResearch/hermes-agent/pull/28420))
- **Cold-start perf wave keeps going — another second saved, 47% fewer per-turn function calls** — Three new optimization rounds: defer `openai._base_client` import (-240ms / -17MB on every CLI invocation), hot-path optimizations cut 47% of per-conversation function calls (399k → 213k for 31-turn chat), defer compression-feasibility check (-170 to -290ms on every agent construction), adaptive subprocess polling (-195ms per tool call, 1+ second per turn). Termux cold start drops from 2.9s to 0.8s. `hermes --version` cold drops 63% (701ms → 258ms), flipping the head-to-head benchmark against Codex CLI from 5/11 wins to 6/11. ([#28864](https://github.com/NousResearch/hermes-agent/pull/28864), [#28866](https://github.com/NousResearch/hermes-agent/pull/28866), [#28957](https://github.com/NousResearch/hermes-agent/pull/28957), [#29006](https://github.com/NousResearch/hermes-agent/pull/29006), [#29419](https://github.com/NousResearch/hermes-agent/pull/29419), [#30121](https://github.com/NousResearch/hermes-agent/pull/30121), [#30609](https://github.com/NousResearch/hermes-agent/pull/30609), [#31968](https://github.com/NousResearch/hermes-agent/pull/31968))
- **`session_search` rebuilt — no LLM, no cost, 4,500× faster** — The old `session_search` was an aux-LLM-powered tool that cost ~$0.30/call and took ~30 seconds to summarize three sessions, sometimes confabulating when the right session wasn't even in the FTS5 hit list. The new shape is one tool with three modes (discovery, scroll, browse) inferred from which args are set — no `mode` parameter, no aux-LLM, no config knob, no companion skill. Discovery is ~20ms instead of ~90s; scroll is ~1ms. Searching your past sessions for context is now free and instant. ([#27590](https://github.com/NousResearch/hermes-agent/pull/27590))
- **Promptware defense — Brainworm-class attacks blocked at three chokepoints** — Inspired by recent Brainworm / Promptware Kill Chain research (Origin HQ, arxiv 2601.09625), Hermes now defends the context window against prompt-injection attacks that try to hijack the agent via tool output, recalled memory, or stored skills. Single source of truth (`tools/threat_patterns.py`) with ~15 new Brainworm/C2 patterns; recalled memory is scanned at load time; tool results get delimiter markers so a malicious file or remote service can't impersonate Hermes' own system content. Paired with a new `security-guidance` plugin that pattern-matches dangerous code writes. ([#32269](https://github.com/NousResearch/hermes-agent/pull/32269), [#33131](https://github.com/NousResearch/hermes-agent/pull/33131), [#9151](https://github.com/NousResearch/hermes-agent/pull/9151))
- **Bitwarden Secrets Manager — one bootstrap token replaces every per-provider API key** — Stop keeping plaintext API keys in `~/.hermes/.env`. Install Bitwarden Secrets Manager (`bws` auto-installs lazily on first use), point Hermes at it with one bootstrap token (`BWS_ACCESS_TOKEN`), and every credential you need comes from Bitwarden at startup. Rotate a key in the Bitwarden web app and the rotation actually takes effect — Bitwarden defaults to source-of-truth so its values overwrite matching env vars on startup. Flip `secrets.bitwarden.override_existing: false` to invert. EU Cloud and self-hosted Bitwarden server URLs supported. Detected credentials are now labeled with their source so you can see at a glance which keys came from Bitwarden vs. the local env. ([#30035](https://github.com/NousResearch/hermes-agent/pull/30035), [#31378](https://github.com/NousResearch/hermes-agent/pull/31378), [#30364](https://github.com/NousResearch/hermes-agent/pull/30364))
- **ntfy as the 23rd messaging platform — push notifications without an account** — ntfy is the self-hostable push-notification service with no signup, no API key, just a topic URL. Hermes now adapts to it as a platform plugin (zero edits to core), so your agent can send you push notifications from any cron job, kanban task completion, or chat `send_message` — to your phone, your watch, your desktop, your homelab. (salvages [#30625](https://github.com/NousResearch/hermes-agent/pull/30625) → originally [#4043](https://github.com/NousResearch/hermes-agent/pull/4043)) ([#30867](https://github.com/NousResearch/hermes-agent/pull/30867))
- **Skill bundles — `/<name>` loads multiple skills at once** — A skill bundle is a named group of skills that loads them all together with one slash command. Set up your "writing day" bundle (humanizer + ideation + obsidian + youtube-content) and `/writing-day` activates all four for the session. Skills Hub now has health checks, a freshness badge, and a watchdog cron. Three new optional skills land: `code-wiki` (Karpathy's LLM-Wiki, persistent indexed dev wiki), `openhands` (delegate to OpenHands for parallel coding agents), and `web-pentest` (OWASP-style web pentest recipes). ([#28373](https://github.com/NousResearch/hermes-agent/pull/28373), [#32345](https://github.com/NousResearch/hermes-agent/pull/32345), [#32240](https://github.com/NousResearch/hermes-agent/pull/32240), [#32261](https://github.com/NousResearch/hermes-agent/pull/32261), [#32265](https://github.com/NousResearch/hermes-agent/pull/32265))
- **TUI session orchestrator — multiple live sessions in one TUI window** — The Ink TUI gained an active-session switcher overlay. List, switch between, refresh, and close multiple live process-local sessions without leaving the TUI; dispatch a new session with a session-scoped model picker. Plus a wave of TUI polish — mouse-tracking DEC mode presets, scrollback preservation across branches and termux, slash-dropdown fixes, x.com link rendering, and CJK / IME input rendering improvements. (salvages [#27642](https://github.com/NousResearch/hermes-agent/pull/27642)) ([#32980](https://github.com/NousResearch/hermes-agent/pull/32980), [#30084](https://github.com/NousResearch/hermes-agent/pull/30084))
- **Two new image_gen providers — Krea 2 Medium + Large, FAL ported to plugin** — Krea joins the image_gen lineup as a built-in plugin: `Krea 2 Medium` ($0.03) and `Krea 2 Large` ($0.06), auto-discovered, selectable via `hermes tools` → Image Generation → Krea. Available through both the native Krea plugin and the FAL.ai catalog. The FAL.ai backend got pulled out of the monolithic image-generation tool into `plugins/image_gen/fal/`, completing the four-way architectural parity already established by web, browser, and video_gen — new image providers are now one file, not a fork. ([#33236](https://github.com/NousResearch/hermes-agent/pull/33236), [#30380](https://github.com/NousResearch/hermes-agent/pull/30380), [#33506](https://github.com/NousResearch/hermes-agent/pull/33506))
- **Nous-approved MCP catalog with interactive picker** — A curated catalog of Nous-vetted MCP servers, mirroring the optional-skills shape. Run `hermes mcp` and you get an interactive picker; install with one keystroke, credentials prompted at install time and written to `~/.hermes/.env`. Ships with the n8n manifest first. Closes the discovery gap that left users hunting GitHub for trusted MCP servers. ([#30870](https://github.com/NousResearch/hermes-agent/pull/30870))
- **OpenHands orchestration skill** — A new optional skill under `optional-skills/autonomous-ai-agents/openhands/` lets the agent delegate coding tasks to the OpenHands CLI alongside `claude-code`, `codex`, and `opencode`. OpenHands is the model-agnostic member of that family — any LiteLLM-supported provider works (OpenAI, Anthropic, OpenRouter, your own), so you can route a sub-task to the cheapest model that can finish it. Drop-in worker for kanban swarms and `/delegate` flows. (closes [#477](https://github.com/NousResearch/hermes-agent/issues/477)) ([#32261](https://github.com/NousResearch/hermes-agent/pull/32261))
- **Deep xAI integration round — Web Search plugin, OAuth proxy upstream, May 15 retirement detection, natural TTS, security hardening** — Six interlocking xAI improvements:
- **xAI Web Search** lands as a `plugins/web/xai/` provider, slots alongside Brave / Tavily / Exa / SearXNG / DDGS / Firecrawl — reuses your existing Grok OAuth or `XAI_API_KEY` credentials, no new env vars. ([#29042](https://github.com/NousResearch/hermes-agent/pull/29042))
- **`hermes proxy` gains an xAI upstream** — your local OpenAI-compatible endpoint can now be backed by SuperGrok OAuth, no PKCE-refresh code to write in your client. ([#28356](https://github.com/NousResearch/hermes-agent/pull/28356))
- **May 15 model retirement detection** — `grok-4`, `grok-4-fast{,-reasoning,-non-reasoning}`, `grok-3`, `grok-code-fast-1`, `grok-imagine-image-pro`, and other retired models are detected in doctor and at chat startup, with `hermes migrate xai` for one-shot config migration to the supported model. No more silent 404s after the retirement date. ([#29277](https://github.com/NousResearch/hermes-agent/pull/29277))
- **Opt-in `auto_speech_tags`** for xAI TTS — inserts light `[pause]` tags between paragraphs and sentences for more natural-sounding voice replies. Default OFF. ([#29376](https://github.com/NousResearch/hermes-agent/pull/29376))
- **`xai-oauth` `base_url` pinned to `x.ai` origin** — closes a silent credential-leak vector where `XAI_BASE_URL` could repoint OAuth-authenticated inference to an attacker-controlled host. ([#28952](https://github.com/NousResearch/hermes-agent/pull/28952))
- **OpenAI-style execution guidance applied to Grok models** — Grok and xai-oauth now get the same family-specific execution discipline block GPT/Codex have, so the model stops claiming completion without tool calls and stops suggesting workarounds instead of using existing tools. ([#27797](https://github.com/NousResearch/hermes-agent/pull/27797))
- Plus `x_search` degraded-results surfacing, tier-gated 403 with API-key fallback, PKCE `code_challenge` round-trip fix, dead-token quarantine on terminal refresh failure, per-request MiniMax-style short-token refresh, and `WKE=unauthenticated` honored at both classifier sites. ([#29484](https://github.com/NousResearch/hermes-agent/pull/29484), [#28351](https://github.com/NousResearch/hermes-agent/pull/28351), [#27560](https://github.com/NousResearch/hermes-agent/pull/27560), [#28116](https://github.com/NousResearch/hermes-agent/pull/28116), [#30619](https://github.com/NousResearch/hermes-agent/pull/30619), [#30872](https://github.com/NousResearch/hermes-agent/pull/30872))
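
The `xai-oauth` `base_url` pin in the list above is, in essence, an origin check before OAuth credentials are sent anywhere. A minimal sketch of such a guard follows; the function name `is_xai_origin`, the HTTPS requirement, and the acceptance of `x.ai` subdomains are assumptions for illustration, not the project's actual rule.

```python
from urllib.parse import urlsplit


def is_xai_origin(base_url):
    """Return True if base_url is HTTPS and its host is x.ai or a subdomain of it.

    Hypothetical guard; the real check in Hermes may differ.
    """
    parts = urlsplit(base_url)
    host = (parts.hostname or "").lower()
    if parts.scheme != "https":
        return False
    # Match on the full host label boundary so "x.ai.evil.example" and "notx.ai" fail.
    return host == "x.ai" or host.endswith(".x.ai")
```

A caller would reject or ignore an overriding `XAI_BASE_URL` whenever this returns `False` for an OAuth-authenticated session.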
---
## 🏗️ Core Agent & Architecture
### The Big Refactor — `run_agent.py` 16k → 3.8k
- `run_agent.py` from 16,083 → 3,821 lines (-76%), extracted into 14 cohesive `agent/*` modules. `run_conversation` alone was 3,877 lines before the refactor. Every extraction keeps a thin forwarder on `AIAgent`, every test-patch path is preserved, every external caller stays compatible. ([#27248](https://github.com/NousResearch/hermes-agent/pull/27248))
### Agent loop & conversation
- Auxiliary task layered fallback (primary → chain → main agent → graceful fail) on capacity errors (402/429/connection). (salvages [#26811](https://github.com/NousResearch/hermes-agent/pull/26811) + [#26998](https://github.com/NousResearch/hermes-agent/pull/26998)) ([#27625](https://github.com/NousResearch/hermes-agent/pull/27625))
- Buffer retry/fallback status; surface only on terminal failure (no more noisy "retrying..." spam in mid-run output). ([#33816](https://github.com/NousResearch/hermes-agent/pull/33816))
- Host contract for external context engines — condenses 5 prior PRs into one extension surface. ([#33750](https://github.com/NousResearch/hermes-agent/pull/33750))
- Fall back immediately on provider content-policy blocks. ([#33883](https://github.com/NousResearch/hermes-agent/pull/33883))
- Re-pad `reasoning_content` on cross-provider fallback to require-side providers. (salvage [#33784](https://github.com/NousResearch/hermes-agent/pull/33784)) ([#33795](https://github.com/NousResearch/hermes-agent/pull/33795))
- Per-turn tool-outcome verifier — patch tool gets indent preservation, CRLF preservation, per-file failure escalation. ([#32273](https://github.com/NousResearch/hermes-agent/pull/32273))
- Single-knob native vision for custom-provider models. ([#29679](https://github.com/NousResearch/hermes-agent/pull/29679))
- Background review fork isolated from external memory plugins. ([#27190](https://github.com/NousResearch/hermes-agent/pull/27190))
- Background review inherits parent toolset config for `tools[]` cache parity. ([#29704](https://github.com/NousResearch/hermes-agent/pull/29704))
- Recover from providers returning list-type tool content. ([#30259](https://github.com/NousResearch/hermes-agent/pull/30259))
- Treat partial-stream stub responses as length truncation rather than clean stop. ([#30998](https://github.com/NousResearch/hermes-agent/pull/30998))
- OpenAI execution guidance applied to xAI Grok / xai-oauth. ([#27797](https://github.com/NousResearch/hermes-agent/pull/27797))
- ContextVars propagate to concurrent tool worker threads.
- Preload `jiter` native parser. ([#33692](https://github.com/NousResearch/hermes-agent/pull/33692))
- Expose context engine tools with saved toolsets. (salvage of [#31194](https://github.com/NousResearch/hermes-agent/pull/31194)) ([#33719](https://github.com/NousResearch/hermes-agent/pull/33719))
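
The auxiliary-task layered fallback listed above (primary → chain → main agent → graceful fail) can be sketched as a loop over candidate providers. Everything named here is hypothetical: `CapacityError` stands in for the 402/429/connection failures, and providers are modelled as plain callables rather than Hermes' real client objects.

```python
class CapacityError(Exception):
    """Hypothetical stand-in for a 402, 429, or connection failure."""


def run_with_fallback(task, primary, chain, main_agent):
    """Try primary, then each chain entry, then the main agent; return None if all fail."""
    for provider in [primary, *chain, main_agent]:
        try:
            return provider(task)
        except CapacityError:
            # Capacity problem: move on to the next layer.
            continue
    # Graceful fail: every layer was exhausted.
    return None
```

Only capacity errors trigger the next layer in this sketch; any other exception propagates to the caller unchanged.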
### Sessions & memory
- `session_search` rebuilt — single-shape (discovery + scroll + browse), no aux-LLM, ~20ms vs. ~90s. ([#27590](https://github.com/NousResearch/hermes-agent/pull/27590))
- Opt-in JSON snapshot writer for sessions. (salvages [#29182](https://github.com/NousResearch/hermes-agent/pull/29182)) ([#29278](https://github.com/NousResearch/hermes-agent/pull/29278))
- Persist `platform_message_id` for recall across gateway restarts. ([#29449](https://github.com/NousResearch/hermes-agent/pull/29449))
- Inline memory-context mentions stay visible in conversation. ([#28132](https://github.com/NousResearch/hermes-agent/pull/28132))
- Recalled memory labeled informational, not authoritative. ([#28583](https://github.com/NousResearch/hermes-agent/pull/28583))
- Memory + context-engine tool injection gated on `enabled_toolsets`. ([#30177](https://github.com/NousResearch/hermes-agent/pull/30177))
- Guard against external drift in `MEMORY.md` / `USER.md`. ([#30877](https://github.com/NousResearch/hermes-agent/pull/30877))
- Honcho runtime peer mapping — correctness follow-ups + setup wizard + docs. ([#30077](https://github.com/NousResearch/hermes-agent/pull/30077))
- Periodic memory logging for leak detection. (salvage of [#17667](https://github.com/NousResearch/hermes-agent/pull/17667)) ([#27102](https://github.com/NousResearch/hermes-agent/pull/27102))
### Codex / Responses-API maturation
- TTFB watchdog for stalled Codex Responses streams. ([#32042](https://github.com/NousResearch/hermes-agent/pull/32042))
- Actionable hint when stale-call detector fires on known silent-reject pattern. ([#32016](https://github.com/NousResearch/hermes-agent/pull/32016), [#33133](https://github.com/NousResearch/hermes-agent/pull/33133))
- Drop SDK `responses.stream()` helper; consume events directly. ([#33042](https://github.com/NousResearch/hermes-agent/pull/33042))
- Gracefully recover from `invalid_encrypted_content`. (salvage of [#10144](https://github.com/NousResearch/hermes-agent/pull/10144)) ([#33035](https://github.com/NousResearch/hermes-agent/pull/33035))
- Recover Codex Responses streams with null output. ([#32963](https://github.com/NousResearch/hermes-agent/pull/32963), [#33390](https://github.com/NousResearch/hermes-agent/pull/33390))
- Drop foreign-issuer reasoning and transient `rs_tmp` reasoning replay state. ([#33156](https://github.com/NousResearch/hermes-agent/pull/33156), [#33146](https://github.com/NousResearch/hermes-agent/pull/33146))
- Codex 429 quota classified as rate-limit, not missing credentials. ([#33168](https://github.com/NousResearch/hermes-agent/pull/33168))
- Codex chat path falls back to credential_pool when singleton is empty. ([#33189](https://github.com/NousResearch/hermes-agent/pull/33189))
- Codex re-auth syncs credential_pool. ([#33164](https://github.com/NousResearch/hermes-agent/pull/33164))
- Omit `tools` key when no tools registered. ([#33409](https://github.com/NousResearch/hermes-agent/pull/33409))
- Parse Codex image-generation SSE directly. ([#32933](https://github.com/NousResearch/hermes-agent/pull/32933))
---
## 🎛️ Kanban — Multi-Agent Maturation Wave
### Orchestration & dispatch
- Orchestrator-driven auto-decomposition on triage. ([#27572](https://github.com/NousResearch/hermes-agent/pull/27572))
- Kanban swarm topology helper — `hermes kanban swarm` creates a Swarm v1 graph (root + parallel workers + gated verifier + gated synthesizer + shared blackboard). (salvages [#26791](https://github.com/NousResearch/hermes-agent/pull/26791) by @Niraven) ([#28443](https://github.com/NousResearch/hermes-agent/pull/28443))
- Dispatcher wires review agents from the review column. ([#28449](https://github.com/NousResearch/hermes-agent/pull/28449))
- Stale-detection for running tasks in dispatcher. ([#28452](https://github.com/NousResearch/hermes-agent/pull/28452))
- Respawn guard blocks repeat worker storms. ([#28455](https://github.com/NousResearch/hermes-agent/pull/28455))
- Respawn guard defers `blocker_auth` instead of auto-blocking. ([#28683](https://github.com/NousResearch/hermes-agent/pull/28683))
- Cross-profile cron jobs surface in dashboard. ([#28457](https://github.com/NousResearch/hermes-agent/pull/28457))
- Worker visibility endpoints: `/workers/active`, `/runs/{id}`, `/inspect`. (salvages [#23761](https://github.com/NousResearch/hermes-agent/pull/23761) by @Interstellar-code) ([#28432](https://github.com/NousResearch/hermes-agent/pull/28432))
### Task configuration & scheduling
- Per-task model override. ([#28364](https://github.com/NousResearch/hermes-agent/pull/28364))
- Board-level default workdir. ([#28394](https://github.com/NousResearch/hermes-agent/pull/28394))
- Configurable worktree paths and branches. ([#28462](https://github.com/NousResearch/hermes-agent/pull/28462))
- Scheduled task start times. ([#28384](https://github.com/NousResearch/hermes-agent/pull/28384))
- Scheduled status for delayed follow-ups. ([#28467](https://github.com/NousResearch/hermes-agent/pull/28467))
- Trimmed task comments. ([#28399](https://github.com/NousResearch/hermes-agent/pull/28399))
- Initial-status for human-ops cards. ([#28414](https://github.com/NousResearch/hermes-agent/pull/28414))
- `max_in_progress` config to cap concurrent running tasks. ([#28420](https://github.com/NousResearch/hermes-agent/pull/28420))
- Filter tasks by workflow fields. ([#28454](https://github.com/NousResearch/hermes-agent/pull/28454))
- `--sort` for `hermes kanban list`. ([#28427](https://github.com/NousResearch/hermes-agent/pull/28427))
- Optional `board` parameter on all MCP tools. ([#28444](https://github.com/NousResearch/hermes-agent/pull/28444))
- Stamp originating ACP session_id on tasks. ([#28447](https://github.com/NousResearch/hermes-agent/pull/28447))
- `auto_promote_children` config toggle. ([#28344](https://github.com/NousResearch/hermes-agent/pull/28344))
- `archive --rm` to hard-delete archived tasks. ([#28355](https://github.com/NousResearch/hermes-agent/pull/28355))
- Promote dependents when parent is archived. ([#28372](https://github.com/NousResearch/hermes-agent/pull/28372))
- Promote blocked tasks when parent dependencies complete. ([#28377](https://github.com/NousResearch/hermes-agent/pull/28377))
- Demote ready children when parent is reopened. ([#28382](https://github.com/NousResearch/hermes-agent/pull/28382))
- `promote` verb for manual `todo→ready` recovery + bulk `--ids`. (salvage [#29464](https://github.com/NousResearch/hermes-agent/pull/29464)) ([#31334](https://github.com/NousResearch/hermes-agent/pull/31334))
### Dashboard
- Drag-to-delete trash zone + bulk delete. ([#28468](https://github.com/NousResearch/hermes-agent/pull/28468))
- Surface per-task `model_override` in show + tool output. ([#28442](https://github.com/NousResearch/hermes-agent/pull/28442))
- Cross-profile notification delivery via `kanban.notification_sources`. ([#28395](https://github.com/NousResearch/hermes-agent/pull/28395))
- Scratch-workspace deletion warning for users. ([#30949](https://github.com/NousResearch/hermes-agent/pull/30949))
- Mobile dashboard UX polish. ([#28127](https://github.com/NousResearch/hermes-agent/pull/28127))
### Reliability
- Worker log retention configurable. ([#27867](https://github.com/NousResearch/hermes-agent/pull/27867))
- Configurable claim TTL. ([#28392](https://github.com/NousResearch/hermes-agent/pull/28392))
- Fingerprint crash errors to prevent fleet-wide retry exhaustion. ([#28380](https://github.com/NousResearch/hermes-agent/pull/28380))
- Reset failure counters on `unblock_task`. ([#28379](https://github.com/NousResearch/hermes-agent/pull/28379))
- Detect cycles in `decompose_triage_task` sibling-link pre-validation. ([#28088](https://github.com/NousResearch/hermes-agent/pull/28088))
- Surface unusable triage auxiliary model (auto-decompose aware). ([#27871](https://github.com/NousResearch/hermes-agent/pull/27871))
- Align failure diagnostics with retry limit. ([#27868](https://github.com/NousResearch/hermes-agent/pull/27868))
- Align worker terminal timeout with task runtime. ([#27864](https://github.com/NousResearch/hermes-agent/pull/27864))
- Auto-install bundled skills (kanban-worker) on init. ([#28368](https://github.com/NousResearch/hermes-agent/pull/28368))
- Make legacy task migration idempotent. ([#28397](https://github.com/NousResearch/hermes-agent/pull/28397))
- Serialize DB initialization. ([#28383](https://github.com/NousResearch/hermes-agent/pull/28383))
- Persist worker session metadata on completion. ([#28387](https://github.com/NousResearch/hermes-agent/pull/28387))
- Pass `accept-hooks` to worker chat subprocess. ([#28393](https://github.com/NousResearch/hermes-agent/pull/28393))
- Preserve worker tools with restricted toolsets. ([#28396](https://github.com/NousResearch/hermes-agent/pull/28396))
- Avoid unsafe Windows worker Hermes shim resolution. ([#28398](https://github.com/NousResearch/hermes-agent/pull/28398))
- Sync slash subcommands with live parser. ([#28376](https://github.com/NousResearch/hermes-agent/pull/28376))
- Show scheduled kanban tasks in dashboard. ([#28400](https://github.com/NousResearch/hermes-agent/pull/28400))
- Assign single-task kanban decompositions. ([#28401](https://github.com/NousResearch/hermes-agent/pull/28401))
- Configurable `max_tokens` for kanban specify. ([#28374](https://github.com/NousResearch/hermes-agent/pull/28374))
- Per-job profile support for cron. ([#28124](https://github.com/NousResearch/hermes-agent/pull/28124))
- Codex app-server: include every Kanban-pinned path in `writable_roots`. ([#28435](https://github.com/NousResearch/hermes-agent/pull/28435))
- Cache kanban worker guidance at session init for prompt-cache reuse. ([#28425](https://github.com/NousResearch/hermes-agent/pull/28425))
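
The cycle check mentioned for `decompose_triage_task` sibling links can be illustrated with a depth-first search over the proposed links. This is a minimal sketch, not the project's implementation: the function name `find_cycle` and the `(task, depends_on)` pair format are assumptions made for the example.

```python
from collections import defaultdict


def find_cycle(links):
    """Return one cycle as a list of task ids, or None if the links are acyclic.

    `links` is an iterable of (task, depends_on) pairs (hypothetical format).
    """
    graph = defaultdict(list)
    for task, dep in links:
        graph[task].append(dep)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = defaultdict(int)
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph.get(node, ()):
            if color[nxt] == GREY:
                # nxt is already on the current path, so the path loops back.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

Running a check like this before any link is written lets a decomposition be rejected as a whole, instead of leaving tasks that block each other forever.
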
---
## ⚡ Performance
- `openai._base_client` import deferred — 240ms / 17MB off every CLI cold start. ([#28864](https://github.com/NousResearch/hermes-agent/pull/28864))
- Agent-loop hot-path optimizations — 47% fewer per-conversation function calls (399k → 213k for 31-turn chat). ([#28866](https://github.com/NousResearch/hermes-agent/pull/28866))
- Compression-feasibility check deferred — 170-290ms off every agent construction. ([#28957](https://github.com/NousResearch/hermes-agent/pull/28957))
- Adaptive subprocess poll — ~195ms off every tool call, 1+ second per turn. ([#29006](https://github.com/NousResearch/hermes-agent/pull/29006))
- Termux TUI cold start speedup. ([#29419](https://github.com/NousResearch/hermes-agent/pull/29419))
- Termux non-TUI cold start speedup. (salvage [#29438](https://github.com/NousResearch/hermes-agent/pull/29438)) ([#30121](https://github.com/NousResearch/hermes-agent/pull/30121))
- Termux fast-path version + deferred bare-prompt agent startup. ([#30609](https://github.com/NousResearch/hermes-agent/pull/30609))
- Cut `hermes --version` wall time by 63%, flipping the head-to-head comparison against Codex CLI. ([#31968](https://github.com/NousResearch/hermes-agent/pull/31968))
- Date-only timestamp + loud gateway-DB roundtrip logging — improves prompt-cache hit rate. ([#27675](https://github.com/NousResearch/hermes-agent/pull/27675))
- Cache kanban worker guidance at session init for prompt-cache reuse. ([#28425](https://github.com/NousResearch/hermes-agent/pull/28425))
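
To show the general idea behind an adaptive subprocess poll, here is a hedged sketch: poll at a short interval first, so fast commands return almost immediately, then back off so long-running commands do not burn CPU. The function name, parameters, and defaults are all assumptions for illustration and are not taken from the Hermes code.

```python
import subprocess
import sys
import time


def wait_adaptive(proc, timeout=30.0, first_interval=0.005,
                  max_interval=0.2, factor=2.0):
    """Wait for `proc` to exit, polling quickly at first and then backing off.

    Returns the exit code, or raises TimeoutError once the deadline passes.
    All parameter names and defaults are illustrative.
    """
    deadline = time.monotonic() + timeout
    interval = first_interval
    while True:
        code = proc.poll()  # non-blocking; None while the child is running
        if code is not None:
            return code
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("process did not exit within %.1fs" % timeout)
        time.sleep(min(interval, remaining))
        # Grow the sleep geometrically, capped at max_interval.
        interval = min(interval * factor, max_interval)
```

A caller could start a child with `subprocess.Popen([sys.executable, "-c", "pass"])` and pass it to `wait_adaptive`. With a fixed poll interval, every short command pays up to one full interval of latency. The growing interval avoids that cost for quick commands.
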
---
## 🔧 Tool System
### Tool surface
- `patch`: indent preservation, CRLF preservation, per-file failure escalation. ([#32273](https://github.com/NousResearch/hermes-agent/pull/32273))
- `terminal`: warn at call time when `background=true` runs silently. ([#31289](https://github.com/NousResearch/hermes-agent/pull/31289))
- `terminal`: nudge homebrewed CI pollers at the tool surface. ([#33142](https://github.com/NousResearch/hermes-agent/pull/33142))
- `x_search`: surface degraded results + validate dates. ([#29484](https://github.com/NousResearch/hermes-agent/pull/29484))
- `x_search`: auto-enable toolset when xAI credentials are configured. ([#27376](https://github.com/NousResearch/hermes-agent/pull/27376))
- `computer_use`: route SOM/vision captures via auxiliary.vision. ([#30126](https://github.com/NousResearch/hermes-agent/pull/30126))
- `transcription`: reject symlinked audio inputs. ([#10082](https://github.com/NousResearch/hermes-agent/pull/10082))
- TTS: prevent double `[pause]` in xAI auto speech tags. ([#32237](https://github.com/NousResearch/hermes-agent/pull/32237))
- TTS: preserve native audio outside Telegram voice delivery. ([#28512](https://github.com/NousResearch/hermes-agent/pull/28512))
- TTS: opt-in xAI `auto_speech_tags` speech-tag pauses for natural voice replies. ([#29376](https://github.com/NousResearch/hermes-agent/pull/29376))
- Voice: chunk oversized CLI recordings. ([#30044](https://github.com/NousResearch/hermes-agent/pull/30044))
- Voice: honor `PULSE_SERVER` / `PIPEWIRE_REMOTE` inside Docker. ([#22534](https://github.com/NousResearch/hermes-agent/pull/22534))
### Browser
- All cloud browser providers (Browserbase, Anchor, Camofox, Hyperbrowser, etc.) migrated to image_gen-style plugins. (salvages [#25580](https://github.com/NousResearch/hermes-agent/pull/25580)) ([#27403](https://github.com/NousResearch/hermes-agent/pull/27403))
- Auto-launch Chromium-family browser for CDP. ([#29106](https://github.com/NousResearch/hermes-agent/pull/29106))
- Docker: discover agent-browser Chromium binary at boot. ([#33184](https://github.com/NousResearch/hermes-agent/pull/33184))
### Image generation
- **Krea** provider plugin (Krea 2 Medium + Large). ([#33236](https://github.com/NousResearch/hermes-agent/pull/33236))
- FAL backend ported to `plugins/image_gen/fal`. (salvage [#27966](https://github.com/NousResearch/hermes-agent/pull/27966)) ([#30380](https://github.com/NousResearch/hermes-agent/pull/30380))
- Cache xAI ephemeral URL responses to disk. ([#31759](https://github.com/NousResearch/hermes-agent/pull/31759))
### Web search
- **xAI Web Search** as a provider plugin. ([#29042](https://github.com/NousResearch/hermes-agent/pull/29042))
### MCP
- **Nous-approved MCP catalog** with interactive picker. ([#30870](https://github.com/NousResearch/hermes-agent/pull/30870))
- **TLS client certificate (mTLS) support** for HTTP and SSE MCP servers. ([#33721](https://github.com/NousResearch/hermes-agent/pull/33721))
- Stdin paste-back fallback for headless OAuth flow. ([#32053](https://github.com/NousResearch/hermes-agent/pull/32053))
- `skip` at paste prompt bypasses auth without disabling server. ([#32069](https://github.com/NousResearch/hermes-agent/pull/32069))
- Registry-aware `mcp_` prefix on both ends of round-trip. ([#31700](https://github.com/NousResearch/hermes-agent/pull/31700))
---
## 🧩 Skills Ecosystem
### Skills system
- **Skill bundles** — `/<name>` loads multiple skills. ([#28373](https://github.com/NousResearch/hermes-agent/pull/28373))
- Skills Hub: health checks, freshness badge, and a watchdog cron. ([#32345](https://github.com/NousResearch/hermes-agent/pull/32345))
- Opt-in AST deep diagnostics on skill writes. (salvage of [#30918](https://github.com/NousResearch/hermes-agent/pull/30918)) ([#31198](https://github.com/NousResearch/hermes-agent/pull/31198))
- Bundled/pinned skill protection in background-review prompts. ([#28338](https://github.com/NousResearch/hermes-agent/pull/28338))
- Show user-modified skill names in bundled skill sync summary. ([#28671](https://github.com/NousResearch/hermes-agent/pull/28671))
- Load symlinked skill slash commands. ([#27759](https://github.com/NousResearch/hermes-agent/pull/27759))
- Deduplicate Skills Hub search results by identifier, not name. ([#29490](https://github.com/NousResearch/hermes-agent/pull/29490))
### New skills
- `openhands` — delegate-to-OpenHands orchestration skill (closes [#477](https://github.com/NousResearch/hermes-agent/issues/477)) ([#32261](https://github.com/NousResearch/hermes-agent/pull/32261))
- `code-wiki` — persistent indexed dev wiki (closes [#486](https://github.com/NousResearch/hermes-agent/issues/486)) ([#32240](https://github.com/NousResearch/hermes-agent/pull/32240))
- `web-pentest` — OWASP recipes (closes [#400](https://github.com/NousResearch/hermes-agent/issues/400)) ([#32265](https://github.com/NousResearch/hermes-agent/pull/32265))
- `baoyu-article-illustrator` ([#28287](https://github.com/NousResearch/hermes-agent/pull/28287))
---
## ☁️ Providers
### xAI deep integration
- **xAI Web Search** as a `plugins/web/xai/` provider plugin. ([#29042](https://github.com/NousResearch/hermes-agent/pull/29042))
- **`hermes proxy` xAI upstream** — OpenAI-compatible local proxy backed by xai-oauth. ([#28356](https://github.com/NousResearch/hermes-agent/pull/28356))
- **May 15 model retirement detection + `hermes migrate xai`** for grok-4 / grok-3 / grok-code-fast-1 / grok-imagine-image-pro. ([#29277](https://github.com/NousResearch/hermes-agent/pull/29277))
- **Opt-in `auto_speech_tags`** for natural xAI TTS voice replies. ([#29376](https://github.com/NousResearch/hermes-agent/pull/29376))
- **xai-oauth base_url pinned to x.ai origin** — closes silent credential-leak vector. ([#28952](https://github.com/NousResearch/hermes-agent/pull/28952))
- **OpenAI-style execution guidance** applied to Grok / xai-oauth models. ([#27797](https://github.com/NousResearch/hermes-agent/pull/27797))
- xAI: detect retired May 15 models in doctor/chat startup. ([#29277](https://github.com/NousResearch/hermes-agent/pull/29277))
- xAI: resolve Grok Build context for OAuth. ([#30579](https://github.com/NousResearch/hermes-agent/pull/30579))
- xAI OAuth: tier-gated 403 with API-key fallback. ([#28351](https://github.com/NousResearch/hermes-agent/pull/28351))
- xAI OAuth: PKCE `code_challenge` echo. ([#27560](https://github.com/NousResearch/hermes-agent/pull/27560))
- xAI OAuth: quarantine dead tokens on terminal refresh failure. ([#28116](https://github.com/NousResearch/hermes-agent/pull/28116))
- xAI OAuth: honor `WKE=unauthenticated` disambiguator at both classifier sites. ([#30872](https://github.com/NousResearch/hermes-agent/pull/30872))
- xAI OAuth: accept bare-code manual paste (state=None). (closes [#26923](https://github.com/NousResearch/hermes-agent/issues/26923)) ([#33880](https://github.com/NousResearch/hermes-agent/pull/33880))
- xAI OAuth: fall back to manual paste on loopback timeout. ([#33231](https://github.com/NousResearch/hermes-agent/pull/33231))
- xAI proxy: handle 429 rate-limit responses in proxy retry path. ([#33743](https://github.com/NousResearch/hermes-agent/pull/33743))
### Other providers
- **OpenAI API as a first-class provider** (distinct from Codex runtime). ([#31898](https://github.com/NousResearch/hermes-agent/pull/31898))
- **Microsoft Entra ID** auth for Azure Foundry (with 1M Anthropic-Messages beta preserved on Bearer). (salvages [#27509](https://github.com/NousResearch/hermes-agent/pull/27509), [#27022](https://github.com/NousResearch/hermes-agent/pull/27022)) ([#28101](https://github.com/NousResearch/hermes-agent/pull/28101), [#28084](https://github.com/NousResearch/hermes-agent/pull/28084))
- **OpenRouter** sticky routing — `session_id` passed via `extra_body` so a long-running session keeps landing on the same upstream provider. (@Cybourgeoisie) ([#33939](https://github.com/NousResearch/hermes-agent/pull/33939))
- Nous: JWT token for inference; stop replaying invalid Nous refresh tokens. (@rewbs) ([#27663](https://github.com/NousResearch/hermes-agent/pull/27663))
- Nous Portal: one-shot setup, status CLI, and Nous-included markers. ([#30860](https://github.com/NousResearch/hermes-agent/pull/30860))
- Anthropic adapter: extract 7 helpers from `convert_messages_to_anthropic`. (salvage [#27784](https://github.com/NousResearch/hermes-agent/pull/27784)) ([#30386](https://github.com/NousResearch/hermes-agent/pull/30386))
- Catalog: add `qwen3.7-max` to Alibaba + Alibaba-Coding-Plan model lists. ([#33129](https://github.com/NousResearch/hermes-agent/pull/33129))
- opencode-go: route `qwen3.7-max` via `anthropic_messages`. (@beardthelion) ([#32780](https://github.com/NousResearch/hermes-agent/pull/32780))
- opencode-go: expose Kimi K2 + DeepSeek reasoning controls. ([#30845](https://github.com/NousResearch/hermes-agent/pull/30845))
- Remove Vercel AI Gateway and Vercel Sandbox.
- MiniMax OAuth: refresh short-lived access tokens per request. ([#30619](https://github.com/NousResearch/hermes-agent/pull/30619))
- Codex OAuth: quarantine terminal refresh errors. ([#28118](https://github.com/NousResearch/hermes-agent/pull/28118))
- Codex: drop dead model slugs that HTTP 400 on ChatGPT Pro. ([#33424](https://github.com/NousResearch/hermes-agent/pull/33424))
- Codex: sync `manual:device_code` pool entries on re-auth. ([#33744](https://github.com/NousResearch/hermes-agent/pull/33744))
- MiniMax OAuth: quarantine terminal refresh errors. ([#28119](https://github.com/NousResearch/hermes-agent/pull/28119))
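
For the OpenRouter sticky-routing entry above, the following sketch shows how a `session_id` could be attached through `extra_body` when building request arguments for an OpenAI-compatible client. The helper name `chat_kwargs` is hypothetical. The only details taken from the release notes are the `session_id` field and the use of `extra_body`.

```python
from typing import Any, Dict, List, Optional


def chat_kwargs(model: str,
                messages: List[Dict[str, Any]],
                session_id: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for a chat-completions call.

    When a session id is known, it travels in `extra_body` so that it is
    merged into the JSON request body. Without one, `extra_body` is omitted.
    """
    kwargs: Dict[str, Any] = {"model": model, "messages": messages}
    if session_id:
        kwargs["extra_body"] = {"session_id": session_id}
    return kwargs
```

With the OpenAI Python SDK, the result could be passed as `client.chat.completions.create(**chat_kwargs(...))`, since that method accepts an `extra_body` argument.
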
---
## 🔑 Secrets
- **Bitwarden Secrets Manager** integration with lazy `bws` install. ([#30035](https://github.com/NousResearch/hermes-agent/pull/30035))
- Bitwarden: EU Cloud + self-hosted server URL support. ([#31378](https://github.com/NousResearch/hermes-agent/pull/31378))
- Label detected credentials with their source (Bitwarden). ([#30364](https://github.com/NousResearch/hermes-agent/pull/30364))
---
## 📱 Messaging Platforms (Gateway)
### Gateway core
- **Deliverable mode** — agents ship artifacts as native uploads from any platform (Slack/Discord/Telegram/Teams/Email). ([#27813](https://github.com/NousResearch/hermes-agent/pull/27813))
- `hermes send` — pipe any script's output to any messaging platform. (salvage of [#19631](https://github.com/NousResearch/hermes-agent/pull/19631)) ([#27188](https://github.com/NousResearch/hermes-agent/pull/27188))
- Debounce queued text follow-ups during active sessions. (salvage of [#31235](https://github.com/NousResearch/hermes-agent/pull/31235)) ([#31341](https://github.com/NousResearch/hermes-agent/pull/31341))
- Plugin-transformed final_response delivered through streaming gate. ([#31433](https://github.com/NousResearch/hermes-agent/pull/31433))
- Refresh cached agent tools on `/reload-mcp`. ([#32815](https://github.com/NousResearch/hermes-agent/pull/32815))
- Harden kanban + provider cleanup races on long-running workloads. ([#29479](https://github.com/NousResearch/hermes-agent/pull/29479))
### New / reorganized adapters
- **ntfy** — 23rd platform, push notifications, plugin shape, zero core edits. (salvages [#30625](https://github.com/NousResearch/hermes-agent/pull/30625) → [#4043](https://github.com/NousResearch/hermes-agent/pull/4043)) ([#30867](https://github.com/NousResearch/hermes-agent/pull/30867))
- **Discord** adapter migrated to bundled plugin. (salvage of [#24356](https://github.com/NousResearch/hermes-agent/pull/24356)) ([#30591](https://github.com/NousResearch/hermes-agent/pull/30591))
- **Mattermost** adapter migrated to bundled plugin. (salvage of [#30916](https://github.com/NousResearch/hermes-agent/pull/30916)) ([#31748](https://github.com/NousResearch/hermes-agent/pull/31748))
### Telegram
- Edit status messages in place instead of appending. (based on [#30141](https://github.com/NousResearch/hermes-agent/pull/30141) by @qike-ms) ([#30864](https://github.com/NousResearch/hermes-agent/pull/30864))
- Skip-STT audio path + 2GB cap via local Bot API server. ([#28541](https://github.com/NousResearch/hermes-agent/pull/28541))
- Route image documents (.png/.jpg/.webp/.gif) through vision pipeline. ([#28519](https://github.com/NousResearch/hermes-agent/pull/28519))
- Route audio file attachments away from STT pipeline. ([#28478](https://github.com/NousResearch/hermes-agent/pull/28478))
- `disable_topic_auto_rename` gateway flag. ([#28523](https://github.com/NousResearch/hermes-agent/pull/28523))
- `ignore_root_dm` config to drop messages without `thread_id`. ([#28536](https://github.com/NousResearch/hermes-agent/pull/28536))
- Chat-scoped auth without sender `user_id`. ([#28525](https://github.com/NousResearch/hermes-agent/pull/28525))
- Fail-closed auth fallback when `TELEGRAM_ALLOWED_USERS` is empty. ([#28494](https://github.com/NousResearch/hermes-agent/pull/28494))
- Roll over tool progress bubbles + scope `audio_file_paths`. ([#28482](https://github.com/NousResearch/hermes-agent/pull/28482))
- Avoid duplicate text after auto-TTS voice replies. ([#28509](https://github.com/NousResearch/hermes-agent/pull/28509))
- Mark final voice reply notify-worthy so Telegram delivers it audibly. ([#28504](https://github.com/NousResearch/hermes-agent/pull/28504))
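
The fail-closed behavior for an empty `TELEGRAM_ALLOWED_USERS` can be sketched as below. This is an illustration under assumptions: the comma-separated format and both function names are hypothetical, and the real gateway may parse the variable differently.

```python
from typing import Optional, Set


def parse_allowed_users(raw: Optional[str]) -> Set[str]:
    """Split a comma-separated allowlist, ignoring blanks (assumed format)."""
    return {part.strip() for part in (raw or "").split(",") if part.strip()}


def is_authorized(user_id: object, raw_allowlist: Optional[str]) -> bool:
    """Return True only for ids explicitly present in the allowlist.

    An empty or unset allowlist denies everyone (fail closed), so it is
    never treated as "allow all".
    """
    allowed = parse_allowed_users(raw_allowlist)
    if not allowed:
        return False
    return str(user_id) in allowed
```

The deny path is the default. A deployment that forgets to set the variable ends up locked down, not open to any sender.
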
### Discord
- Recover Windows voice opus decoding. ([#33182](https://github.com/NousResearch/hermes-agent/pull/33182))
- `allow_any_attachment` config to accept arbitrary file types. ([#27245](https://github.com/NousResearch/hermes-agent/pull/27245))
- Transcribe native voice notes. ([#28993](https://github.com/NousResearch/hermes-agent/pull/28993))
- Define UI view classes after lazy install. ([#28817](https://github.com/NousResearch/hermes-agent/pull/28817))
### Signal / Matrix / Feishu / Slack / WeCom
- Signal: `require_mention` filter for group chats. ([#28574](https://github.com/NousResearch/hermes-agent/pull/28574))
- Matrix: warn on clock-skew silent message drops. ([#27330](https://github.com/NousResearch/hermes-agent/pull/27330))
- Matrix E2EE installs full dep set; plugins respect `is_connected`. ([#31688](https://github.com/NousResearch/hermes-agent/pull/31688))
- Feishu: require webhook auth secret + honor config extras. ([#30746](https://github.com/NousResearch/hermes-agent/pull/30746))
- Feishu: enforce auth and chat binding for approval buttons. ([#30744](https://github.com/NousResearch/hermes-agent/pull/30744))
- Slack: socket recovery + Windows restart dedupe. ([#28873](https://github.com/NousResearch/hermes-agent/pull/28873))
- WeCom: safe-parse untrusted XML. ([#32442](https://github.com/NousResearch/hermes-agent/pull/32442))
### DingTalk / Webhooks / Microsoft Graph
- DingTalk: transcribe native voice notes. ([#28993](https://github.com/NousResearch/hermes-agent/pull/28993))
- Webhook: enforce `INSECURE_NO_AUTH` safety rail on dynamic route reloads. ([#30863](https://github.com/NousResearch/hermes-agent/pull/30863))
- Webhook: restrict default toolset capabilities. ([#30745](https://github.com/NousResearch/hermes-agent/pull/30745))
- Microsoft Graph: harden webhook auth requirements. ([#30169](https://github.com/NousResearch/hermes-agent/pull/30169))
---
## 🖥️ CLI & TUI
### CLI
- `/update` slash command in CLI and TUI. ([#23854](https://github.com/NousResearch/hermes-agent/pull/23854))
- Update auto-rollback when post-pull syntax check fails. ([#28669](https://github.com/NousResearch/hermes-agent/pull/28669))
- `--branch` flag for `hermes update`. (@jquesnelle) ([#29591](https://github.com/NousResearch/hermes-agent/pull/29591))
- `/exit --delete` flag to remove session on quit. (salvage of [#17665](https://github.com/NousResearch/hermes-agent/pull/17665)) ([#27101](https://github.com/NousResearch/hermes-agent/pull/27101))
- `▶ N` indicator in status bar for running `/background` tasks. ([#27175](https://github.com/NousResearch/hermes-agent/pull/27175))
- Live background terminal-process count in status bar. ([#32061](https://github.com/NousResearch/hermes-agent/pull/32061))
- Append session recap to `/status` output. (salvage of [#18587](https://github.com/NousResearch/hermes-agent/pull/18587)) ([#27176](https://github.com/NousResearch/hermes-agent/pull/27176))
- Configurable paste-collapse thresholds (TUI + CLI). (salvage [#29723](https://github.com/NousResearch/hermes-agent/pull/29723)) ([#32087](https://github.com/NousResearch/hermes-agent/pull/32087))
- `/resume` accepts position numbers. ([#31709](https://github.com/NousResearch/hermes-agent/pull/31709))
- Bring tool-call display back — verbose mode, specific failure reasons, todo progress. ([#31293](https://github.com/NousResearch/hermes-agent/pull/31293))
- Validate runtime token refresh in Qwen auth status. ([#31196](https://github.com/NousResearch/hermes-agent/pull/31196))
### TUI
- **TUI session orchestrator** — multiple live sessions in one TUI window. (salvages [#27642](https://github.com/NousResearch/hermes-agent/pull/27642)) ([#32980](https://github.com/NousResearch/hermes-agent/pull/32980))
- `mouse_tracking` DEC mode presets. (salvage of [#26681](https://github.com/NousResearch/hermes-agent/pull/26681) by @OutThisLife) ([#30084](https://github.com/NousResearch/hermes-agent/pull/30084))
- Termux scrollback preservation + touch-friendly defaults. ([#28910](https://github.com/NousResearch/hermes-agent/pull/28910))
- Full assistant text in scrollback (no history truncation). ([#28829](https://github.com/NousResearch/hermes-agent/pull/28829))
- Preserve scrollback when branching sessions. ([#30162](https://github.com/NousResearch/hermes-agent/pull/30162))
- Preserve Python dunder identifiers in markdown. ([#28582](https://github.com/NousResearch/hermes-agent/pull/28582))
- Active profile shown in TUI prompt. ([#28581](https://github.com/NousResearch/hermes-agent/pull/28581))
- Improve Charizard completion menu contrast. ([#28346](https://github.com/NousResearch/hermes-agent/pull/28346))
- Stop slash dropdown chopping last char of `/goal`. ([#31311](https://github.com/NousResearch/hermes-agent/pull/31311))
- Clipboard copy on Linux/Wayland. ([#29342](https://github.com/NousResearch/hermes-agent/pull/29342))
- Anchor `splitReasoning` unclosed-tag regex; stop eating last paragraph. ([#29426](https://github.com/NousResearch/hermes-agent/pull/29426))
- Surface verbose tool details. ([#30225](https://github.com/NousResearch/hermes-agent/pull/30225))
- Load Linux skills on Termux + salvage @adybag14-cyber's Termux gates. ([#30166](https://github.com/NousResearch/hermes-agent/pull/30166))
- Handle images with codex app-server. ([#31220](https://github.com/NousResearch/hermes-agent/pull/31220))
- Refresh virtual transcript on viewport resize. ([#31077](https://github.com/NousResearch/hermes-agent/pull/31077))
- Ignore late thinking deltas after completion. ([#31055](https://github.com/NousResearch/hermes-agent/pull/31055))
- Commit composer input bursts immediately. ([#31053](https://github.com/NousResearch/hermes-agent/pull/31053))
- Log parent gateway lifecycle exits. ([#31051](https://github.com/NousResearch/hermes-agent/pull/31051))
- Clear TTS env var on voice off + TTS indicator in status bar. ([#30987](https://github.com/NousResearch/hermes-agent/pull/30987))
- Pass `--expose-gc` as a Node argv flag instead of via `NODE_OPTIONS`. ([#29998](https://github.com/NousResearch/hermes-agent/pull/29998))
- Align composer cursorLayout with wrap-ansi to kill multiline cursor drift. ([#27489](https://github.com/NousResearch/hermes-agent/pull/27489))
- Harden Terminal.app rendering and color paths. ([#27251](https://github.com/NousResearch/hermes-agent/pull/27251))
- Keep `/goal` verdict out of compact status row. ([#27971](https://github.com/NousResearch/hermes-agent/pull/27971))
- Clamp curses color 8 for 8-color terminals (Docker). ([#30260](https://github.com/NousResearch/hermes-agent/pull/30260))
---
## 🔒 Security & Reliability
### Promptware & memory hardening
- **Promptware defense** — shared threat patterns + memory load-time scan + tool-result delimiters. ([#32269](https://github.com/NousResearch/hermes-agent/pull/32269))
- Expand memory content scanning patterns to parity with skills guard. ([#9151](https://github.com/NousResearch/hermes-agent/pull/9151))
- Harden Skills Guard multi-word prompt patterns. (@YLChen-007) ([#26852](https://github.com/NousResearch/hermes-agent/pull/26852))
- Split cron scanner so skill prose stops false-positiving exfil patterns. ([#32339](https://github.com/NousResearch/hermes-agent/pull/32339))
### File safety
- Protect Hermes control-plane files from prompt injection (`auth.json`, `config.yaml`, `webhook_subscriptions.json`, `mcp-tokens/`). (salvages @PratikRai0101's [#14157](https://github.com/NousResearch/hermes-agent/pull/14157)) ([#30397](https://github.com/NousResearch/hermes-agent/pull/30397))
- Write-deny `<root>/.env` when running under a profile. ([#29687](https://github.com/NousResearch/hermes-agent/pull/29687))
- Defense-in-depth read-deny on credential stores. (salvages [#17659](https://github.com/NousResearch/hermes-agent/pull/17659) + [#8055](https://github.com/NousResearch/hermes-agent/pull/8055)) ([#30721](https://github.com/NousResearch/hermes-agent/pull/30721))
- Reject TTS `output_path` traversal and symlinks in update ZIPs. (salvage [#6693](https://github.com/NousResearch/hermes-agent/pull/6693) + [#15881](https://github.com/NousResearch/hermes-agent/pull/15881)) ([#32056](https://github.com/NousResearch/hermes-agent/pull/32056))
- Reject symlinked audio inputs. ([#10082](https://github.com/NousResearch/hermes-agent/pull/10082))
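
Two of the checks in this list, path-traversal rejection and symlink rejection, follow a common pattern that can be sketched with the standard library. The helper names below are hypothetical and the code is an illustration of the technique, not the checks Hermes actually ships.

```python
import os
from pathlib import Path


def is_inside(base_dir: str, requested: str) -> bool:
    """True if `requested`, resolved against `base_dir`, stays under it.

    Resolving both paths collapses `..` segments (and any symlinks that
    exist) before comparing, so `../../etc/passwd` is rejected. An absolute
    `requested` replaces the base entirely and is rejected too, unless it
    already points inside the base.
    """
    base = Path(base_dir).resolve()
    target = (base / requested).resolve()
    return target == base or base in target.parents


def is_plain_file(path: str) -> bool:
    """True only for an existing regular file that is not a symlink."""
    return os.path.isfile(path) and not os.path.islink(path)
```

The link itself is checked with `os.path.islink`. `os.path.isfile` follows links, so on its own it would accept a symlink that points at a sensitive file.
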
### Credential safety
- Avoid persisting borrowed credential secrets — runtime env-sourced keys no longer leak into `auth.json`. ([#31416](https://github.com/NousResearch/hermes-agent/pull/31416))
- Validate Nous Portal `inference_base_url` against host allowlist. (salvages [#27612](https://github.com/NousResearch/hermes-agent/pull/27612)) ([#30611](https://github.com/NousResearch/hermes-agent/pull/30611))
- Harden API server key placeholder handling. ([#30738](https://github.com/NousResearch/hermes-agent/pull/30738))
- Harden Google Chat OAuth credential persistence. (@Zyrixtrex) ([#24788](https://github.com/NousResearch/hermes-agent/pull/24788))
- xAI OAuth: pin inference `base_url` to x.ai origin. ([#28952](https://github.com/NousResearch/hermes-agent/pull/28952))
- Quarantine dead OAuth tokens on terminal refresh failure (xAI, Codex, MiniMax). ([#28116](https://github.com/NousResearch/hermes-agent/pull/28116), [#28118](https://github.com/NousResearch/hermes-agent/pull/28118), [#28119](https://github.com/NousResearch/hermes-agent/pull/28119))
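
Validating a configurable `base_url` against a host allowlist, as described for the Nous Portal and xAI OAuth entries, might look roughly like the sketch below. The function name and the example hosts in the usage note are assumptions, and the actual allowlists are not reproduced here.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_allowed_base_url(url: str, allowed_hosts: Iterable[str]) -> bool:
    """Accept only https URLs whose hostname exactly matches the allowlist.

    Comparing the parsed hostname defeats prefix and userinfo tricks such as
    `https://api.example.com.evil.test/` or
    `https://api.example.com@evil.test/`, which a substring check would pass.
    """
    try:
        parts = urlsplit(url)
    except ValueError:  # for example, a malformed IPv6 literal
        return False
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    return host in {h.lower() for h in allowed_hosts}
```

For example, `is_allowed_base_url("https://api.example.com/v1", ["api.example.com"])` is accepted, while the same host over plain `http` is not.
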
### Supply-chain
- **On-demand supply-chain audit via OSV.dev** — `hermes audit`. ([#31460](https://github.com/NousResearch/hermes-agent/pull/31460))
- `hermes update` syntax-validates critical files post-pull, auto-rollback on failure. ([#28669](https://github.com/NousResearch/hermes-agent/pull/28669))
- Quarantine the live `hermes.exe` when a concurrent Windows instance is running. ([#26677](https://github.com/NousResearch/hermes-agent/pull/26677))
### Other hardening
- Restrict default webhook toolset capabilities. ([#30745](https://github.com/NousResearch/hermes-agent/pull/30745))
- Harden Microsoft Graph webhook auth requirements. ([#30169](https://github.com/NousResearch/hermes-agent/pull/30169))
- Require source CIDR allowlisting for public msgraph webhook binds. ([#33722](https://github.com/NousResearch/hermes-agent/pull/33722))
- Require `API_SERVER_KEY` before dispatching API server work. ([#33232](https://github.com/NousResearch/hermes-agent/pull/33232))
- env_passthrough: apply GHSA-rhgp-j443-p4rf filter to config.yaml path. (@roadhero) ([#27794](https://github.com/NousResearch/hermes-agent/pull/27794))
- Dashboard + WeCom: restrict markdown link schemes; safe-parse untrusted XML. ([#32442](https://github.com/NousResearch/hermes-agent/pull/32442))
- Salvage project-plugin RCE bypass fix from PR [#29311](https://github.com/NousResearch/hermes-agent/pull/29311) (GHSA-5qr3-c538-wm9j). ([#30837](https://github.com/NousResearch/hermes-agent/pull/30837))
- Cross-profile soft guard on file-write tools + system-prompt hint. ([#31290](https://github.com/NousResearch/hermes-agent/pull/31290))
- Reject unsafe tar members in Android psutil compatibility installer. ([#33742](https://github.com/NousResearch/hermes-agent/pull/33742))
- Reject non-regular tar members during tirith auto-install. ([#33786](https://github.com/NousResearch/hermes-agent/pull/33786))
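
Both tar fixes above come down to validating archive members before extraction. A minimal sketch of that idea, assuming Python's standard `tarfile` module. The helper name `safe_members` and the policy it enforces (regular files and directories only, no path leaving the destination) are illustrative assumptions, not the code from the PRs:

```python
import os
import tarfile


def safe_members(archive: tarfile.TarFile, dest: str):
    """Yield members that are plain files or directories located under dest.

    Hypothetical helper: raises ValueError for links, devices, FIFOs, and
    any member whose resolved path would land outside dest.
    """
    dest_root = os.path.realpath(dest)
    for member in archive.getmembers():
        # Reject symlinks, hard links, device nodes, and FIFOs outright.
        if not (member.isfile() or member.isdir()):
            raise ValueError(f"non-regular tar member rejected: {member.name}")
        # Resolve the final path and make sure it stays inside dest.
        target = os.path.realpath(os.path.join(dest_root, member.name))
        if os.path.commonpath([dest_root, target]) != dest_root:
            raise ValueError(f"tar member escapes destination: {member.name}")
        yield member
```

A caller would pass the generator to extraction, for example `archive.extractall(dest, members=safe_members(archive, dest))`. On Python 3.12 and later, `extractall(..., filter="data")` applies similar checks in the standard library.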
---
## 🪟 Native Windows (Beta Continued)
- Thin desktop installer + first-launch `install.ps1` bootstrap. ([#27822](https://github.com/NousResearch/hermes-agent/pull/27822))
- Complete Windows bootstrap — `dep_ensure` + `install.ps1` + detection. (@alt-glitch) ([#27845](https://github.com/NousResearch/hermes-agent/pull/27845))
- `install.ps1`: strip BOM, `-Commit`/`-Tag` pin params, harden git ops. (@jquesnelle) ([#28169](https://github.com/NousResearch/hermes-agent/pull/28169))
- Consolidate ACP browser bootstrap into `install.{sh,ps1}`. (@alt-glitch) ([#27851](https://github.com/NousResearch/hermes-agent/pull/27851))
- `hermes update` quarantines live `hermes.exe`. ([#26677](https://github.com/NousResearch/hermes-agent/pull/26677))
- Discord voice opus decoding on Windows. ([#33182](https://github.com/NousResearch/hermes-agent/pull/33182))
- Windows Docker Desktop compatible compose file. (@Sunil123135) ([#31031](https://github.com/NousResearch/hermes-agent/pull/31031))
---
## 🖼️ Hermes Desktop GUI
- `hermes gui` launcher — install + build + launch packaged Electron app. (@OutThisLife) ([#30165](https://github.com/NousResearch/hermes-agent/pull/30165))
- Desktop UI lift. ([#27227](https://github.com/NousResearch/hermes-agent/pull/27227))
- `nix` package `.#desktop`. (@ethernet8023) ([#28964](https://github.com/NousResearch/hermes-agent/pull/28964))
- Hardened Slack socket recovery + Windows desktop restart dedupe. ([#28873](https://github.com/NousResearch/hermes-agent/pull/28873))
- Web dashboard: migrate checkboxes to `@nous-research/ui` + design-system polish. (@austinpickett) ([#28814](https://github.com/NousResearch/hermes-agent/pull/28814))
- Web dashboard: collapsible sidebar. (@austinpickett) ([#33421](https://github.com/NousResearch/hermes-agent/pull/33421))
- Dashboard typography & contrast pass. (salvage of [#28832](https://github.com/NousResearch/hermes-agent/pull/28832)) ([#30714](https://github.com/NousResearch/hermes-agent/pull/30714))
- Skills page: lazy-fetch catalog instead of bundling 34MB into JS. ([#33809](https://github.com/NousResearch/hermes-agent/pull/33809))
---
## 🐳 Docker
- **s6-overlay container supervision** — abstract `ServiceManager` protocol (systemd/launchd/Windows/s6 backends), per-profile gateway supervision in-container, container-restart reconciliation, hadolint/shellcheck CI. (salvage of [#30136](https://github.com/NousResearch/hermes-agent/pull/30136), @benbarclay) ([#31760](https://github.com/NousResearch/hermes-agent/pull/31760))
- Auto-redirect `gateway run` to supervised mode inside the s6 image. (@benbarclay) ([#33583](https://github.com/NousResearch/hermes-agent/pull/33583))
- Tee supervised gateway stdout to docker logs. (@benbarclay) ([#33621](https://github.com/NousResearch/hermes-agent/pull/33621))
- Drop `docker exec` to hermes uid before invoking the CLI. (@benbarclay) ([#33628](https://github.com/NousResearch/hermes-agent/pull/33628))
- Align HOME for dashboard and s6 gateway services. (@Dusk1e) ([#33481](https://github.com/NousResearch/hermes-agent/pull/33481))
- Bake build-time git SHA into image so `hermes dump` reports it. (@benbarclay) ([#33655](https://github.com/NousResearch/hermes-agent/pull/33655))
- `hermes update` prints `docker pull` guidance instead of bogus git error. (@benbarclay) ([#33659](https://github.com/NousResearch/hermes-agent/pull/33659))
- Upgrade Node to 22 LTS via multi-stage from `node:22-bookworm-slim`. (@benbarclay) ([#33060](https://github.com/NousResearch/hermes-agent/pull/33060))
- Drop `build-essential` from apt install. (@benbarclay) ([#33028](https://github.com/NousResearch/hermes-agent/pull/33028))
- Propagate env through s6 to cont-init and main CMD. ([#32412](https://github.com/NousResearch/hermes-agent/pull/32412))
- Targeted chown to preserve host file ownership in `HERMES_HOME`. ([#33033](https://github.com/NousResearch/hermes-agent/pull/33033))
- `mkdir HERMES_HOME` as root in stage2 before chown / privilege drop. ([#33078](https://github.com/NousResearch/hermes-agent/pull/33078))
- chown `ui-tui` and `node_modules` on UID remap so TUI esbuild works. ([#33045](https://github.com/NousResearch/hermes-agent/pull/33045))
- Include `anthropic`, `bedrock`, `azure-identity` extras in image. ([#30504](https://github.com/NousResearch/hermes-agent/pull/30504))
- Stop pushing per-commit SHA tags to Docker Hub. ([#29387](https://github.com/NousResearch/hermes-agent/pull/29387))
- Simplify Docker tagging — push both `:main` and `:latest` on main push. ([#33225](https://github.com/NousResearch/hermes-agent/pull/33225))
- Test slicing across GH actions jobs. (@ethernet8023) ([#30575](https://github.com/NousResearch/hermes-agent/pull/30575))
- Discover agent-browser Chromium binary at boot. ([#33184](https://github.com/NousResearch/hermes-agent/pull/33184))
---
## 🌐 API Server
- **Session control API** — `/api/sessions/*` (list/create/read/patch/delete/fork) + SSE-streaming chat. (salvages [#29302](https://github.com/NousResearch/hermes-agent/pull/29302) by @Codename-11 + multimodal followup by @Schwartz10) ([#33134](https://github.com/NousResearch/hermes-agent/pull/33134))
- `GET /v1/skills` and `/v1/toolsets`. ([#33016](https://github.com/NousResearch/hermes-agent/pull/33016))
- Coerce stringified booleans in stream/store/approval payloads. (salvage [#26639](https://github.com/NousResearch/hermes-agent/pull/26639)) ([#27293](https://github.com/NousResearch/hermes-agent/pull/27293))
- Honor `key_env` in auth-failure fallback resolution. ([#30840](https://github.com/NousResearch/hermes-agent/pull/30840))
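
The boolean coercion item matters because `bool("false")` is `True` in Python, so a client that sends the string `"false"` would otherwise switch a flag on. A small sketch of the idea, assuming a hypothetical helper `coerce_bool` and an illustrative set of accepted spellings (the server's actual rules are not shown in this changelog):

```python
_TRUE_STRINGS = {"true", "1", "yes", "on"}
_FALSE_STRINGS = {"false", "0", "no", "off"}


def coerce_bool(value, default=False):
    """Map real booleans and common string spellings to a bool.

    Hypothetical helper: anything unrecognised falls back to `default`.
    """
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in _TRUE_STRINGS:
            return True
        if text in _FALSE_STRINGS:
            return False
    return default
```

For example, `coerce_bool(payload.get("stream"))` treats `"false"`, `False`, and a missing key the same way.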
---
## 🎟️ ACP (VS Code / Zed / JetBrains)
- Session edit auto-approval modes. (salvage of [#27034](https://github.com/NousResearch/hermes-agent/pull/27034)) ([#27862](https://github.com/NousResearch/hermes-agent/pull/27862))
- Enrich Zed permission cards — command in title + `reject_always`. ([#28148](https://github.com/NousResearch/hermes-agent/pull/28148))
- Replay session history before responding to `session/load`. ([#26957](https://github.com/NousResearch/hermes-agent/pull/26957), [#26943](https://github.com/NousResearch/hermes-agent/pull/26943))
- Plugin-transformed final_response delivered through streaming gate. ([#31433](https://github.com/NousResearch/hermes-agent/pull/31433))
---
## 🔌 Plugin Surface
- `register_tts_provider()` plugin hook. (salvage of [#30420](https://github.com/NousResearch/hermes-agent/pull/30420)) ([#31745](https://github.com/NousResearch/hermes-agent/pull/31745))
- `register_transcription_provider()` hook + `stt.providers` command-provider registry. (salvage of [#30493](https://github.com/NousResearch/hermes-agent/pull/30493)) ([#31907](https://github.com/NousResearch/hermes-agent/pull/31907))
- `register_auxiliary_task()` in PluginContext API. (salvage [#29817](https://github.com/NousResearch/hermes-agent/pull/29817)) ([#31177](https://github.com/NousResearch/hermes-agent/pull/31177))
- Bundled `security-guidance` plugin. ([#33131](https://github.com/NousResearch/hermes-agent/pull/33131))
- Discord and Mattermost migrated to bundled plugins. ([#30591](https://github.com/NousResearch/hermes-agent/pull/30591), [#31748](https://github.com/NousResearch/hermes-agent/pull/31748))
- ntfy as platform plugin. ([#30867](https://github.com/NousResearch/hermes-agent/pull/30867))
- Surface category-namespaced plugins in `hermes plugins list`. ([#27187](https://github.com/NousResearch/hermes-agent/pull/27187))
- Plugin discovery failures raised to WARNING level. ([#28318](https://github.com/NousResearch/hermes-agent/pull/28318))
- `hermes_plugins` included in gateway.log component filter. ([#28313](https://github.com/NousResearch/hermes-agent/pull/28313))
- Seed plugin extras before `is_connected` gate. ([#31703](https://github.com/NousResearch/hermes-agent/pull/31703))
- Dashboard: allowlist plugin assets + denylist subprocess-influencing env vars. ([#32277](https://github.com/NousResearch/hermes-agent/pull/32277))
---
## 📦 Distribution & Install
- Install-method stamping + Docker detection. (@alt-glitch) ([#27843](https://github.com/NousResearch/hermes-agent/pull/27843))
- Nix `.#messaging` and `.#full` package variants. (@alt-glitch) ([#33108](https://github.com/NousResearch/hermes-agent/pull/33108))
- Pre-load messaging gateway deps via `--extra messaging`. (salvage [#26394](https://github.com/NousResearch/hermes-agent/pull/26394)) ([#27558](https://github.com/NousResearch/hermes-agent/pull/27558))
- Avoid piping installer directly into `iex` (Windows). ([#28347](https://github.com/NousResearch/hermes-agent/pull/28347))
- Ship bundled skills in wheel. ([#28421](https://github.com/NousResearch/hermes-agent/pull/28421))
- Ship dashboard plugin assets in wheel. ([#28406](https://github.com/NousResearch/hermes-agent/pull/28406))
- Make Camofox lazy-installed instead of eager. ([#27055](https://github.com/NousResearch/hermes-agent/pull/27055))
- Wire STT lazy-install into transcription_tools.py. ([#30256](https://github.com/NousResearch/hermes-agent/pull/30256))
---
## 🐛 Notable Bug Fixes (highlights only)
- Match bare custom provider by active base URL in `hermes model`. ([#28908](https://github.com/NousResearch/hermes-agent/pull/28908))
- Route `auxiliary.vision.provider=openai` to api.openai.com, skip text-only main. ([#31452](https://github.com/NousResearch/hermes-agent/pull/31452))
- Lint: skip per-file shell linter when LSP will handle the file. ([#29054](https://github.com/NousResearch/hermes-agent/pull/29054))
- Treat empty credential pool entries as unauthenticated in `/model` picker. ([#28312](https://github.com/NousResearch/hermes-agent/pull/28312))
- Reverted within window: Firecrawl integration tag, send_message @username auto-mentions, Telegram quick-command-only menus, Telegram pin-on-turn.
---
## 🧪 Testing
- Disarm lazy-install probe so `_HAS_FASTER_WHISPER` patches work. ([#30334](https://github.com/NousResearch/hermes-agent/pull/30334))
- Cover default board dashboard pin. ([#28361](https://github.com/NousResearch/hermes-agent/pull/28361))
- Cover `_task_dict` `task_age` fallback. ([#28365](https://github.com/NousResearch/hermes-agent/pull/28365))
- Allowlist `tmp_path` for `kanban_notify` artifact delivery tests. ([#30851](https://github.com/NousResearch/hermes-agent/pull/30851), [#30852](https://github.com/NousResearch/hermes-agent/pull/30852))
- Cover null output stream terminal events in Codex. ([#33137](https://github.com/NousResearch/hermes-agent/pull/33137))
---
## 📚 Documentation
- **30-day docs overhaul** — full correctness audit, every PR in the window covered, Nous Portal weave, sidebar reorg. ([#33782](https://github.com/NousResearch/hermes-agent/pull/33782))
- Dedicated Nous Portal integration page and setup guide. ([#31296](https://github.com/NousResearch/hermes-agent/pull/31296))
- Providers: move Nous Portal first, Google Gemini OAuth last. ([#31287](https://github.com/NousResearch/hermes-agent/pull/31287))
- `session_search` rewrite for single-shape tool. ([#27840](https://github.com/NousResearch/hermes-agent/pull/27840))
- Kanban: document failure_limit, max_retries, inline create shortcuts, goals & kanban settings. ([#28357](https://github.com/NousResearch/hermes-agent/pull/28357), [#28358](https://github.com/NousResearch/hermes-agent/pull/28358), [#28359](https://github.com/NousResearch/hermes-agent/pull/28359), [#28360](https://github.com/NousResearch/hermes-agent/pull/28360), [#28362](https://github.com/NousResearch/hermes-agent/pull/28362))
- Kanban Codex lane skill. ([#28430](https://github.com/NousResearch/hermes-agent/pull/28430))
- xAI OAuth: note X Premium+ also unlocks Grok OAuth. ([#29055](https://github.com/NousResearch/hermes-agent/pull/29055))
- Docs site: Docker audio bridge notes, "Installing more tools in the container", xurl auth HOME in Docker.
- Email: clarify gateway vs Himalaya setup. (@helix4u) ([#33634](https://github.com/NousResearch/hermes-agent/pull/33634))
- Auth docs: replace stale `hermes login` references with `hermes auth add`. ([#32859](https://github.com/NousResearch/hermes-agent/pull/32859))
---
## 👥 Contributors
### Core
- @teknium1 (lead)
### Notable salvages & cherry-picks
- **@benbarclay** — s6-overlay container supervision (29 commits salvaged), Node 22 LTS upgrade, build-essential cleanup, `gateway run` auto-redirect in s6, tee supervised stdout to docker logs, `hermes update` Docker guidance, build-time SHA stamping
- **@OutThisLife** — `hermes gui` desktop launcher, `mouse_tracking` DEC mode presets
- **@jquesnelle** — Windows installer hardening, `--branch` flag for `hermes update`, install.ps1 BOM strip / commit-pin
- **@alt-glitch** — Windows `dep_ensure` bootstrap, Nix package variants (`.#messaging`, `.#full`), install-method stamping, ACP browser bootstrap consolidation
- **@austinpickett** — `/update` slash command, dashboard checkboxes → `@nous-research/ui`, mobile dashboard polish, collapsible sidebar
- **@ethernet8023** — Nix `.#desktop` packaging, CI test slicing across GH Actions jobs, TUI clipboard copy fix
- **@kshitijk4poor** — doctor section banner + fail-and-issue helpers extraction, post-tag salvage cluster (curator-fallout, kanban SQLite hardening, install world-readable uv dirs, xAI bare-code paste)
- **@rewbs** — Nous JWT inference switch + refresh-token replay fix
- **@Codename-11** + **@Schwartz10** — session control API (REST + SSE + multimodal followup)
- **@Niraven** — kanban swarm topology helper
- **@Interstellar-code** — kanban worker visibility endpoints
- **@adybag14-cyber** — termux cold-start optimizations (multiple PRs)
- **@qike-ms** — Telegram in-place status edits design
- **@sprmn24** — ntfy adapter
- **@Jaaneek** — xAI Web Search provider plugin
- **@yannsunn** — xAI upstream adapter for `hermes proxy`
- **@Cybourgeoisie** — OpenRouter sticky routing via session_id
- **@memosr** — Nous Portal base_url allowlist validation
- **@Sunil123135** — Windows Docker Desktop compose file
- **@Dusk1e** — Docker HOME alignment for dashboard + s6 gateway services
- **@beardthelion** — opencode-go anthropic_messages routing
- **@YLChen-007** — Skills Guard multi-word prompt patterns
- **@roadhero** — env_passthrough GHSA-rhgp-j443-p4rf filter
- **@Zyrixtrex** — Google Chat OAuth credential persistence hardening
- **@briandevans**, **@tomqiaozc** — defense-in-depth read-deny on credential stores
- **@PratikRai0101** — control-plane file write protection
- **@helix4u**, **@Bartok9**, **@zccyman** — auxiliary fallback ladder components
- **@ms-alan**, **@ticketclosed-wontfix**, **@donovan-yohan** — TUI session orchestrator + follow-ups
- **@daimon-nous[bot]** — cron per-job profile support
- **@bisko** — re-pad `reasoning_content` on cross-provider fallback
### All Contributors
@02356abc, @0xchainer, @0xDevNinja, @0xjackyang, @0xsir0000, @0z1-ghb, @8bit64k, @aaronlab, @AceWattGit,
@ACR27, @adam91holt, @AdamPlatin123, @Ade5954, @AdityaRajeshGadgil, @adybag14-cyber, @AhmetArif0, @ai-hana-ai,
@alaamohanad169-ship-it, @alber70g, @albert748, @alt-glitch, @aqilaziz, @argabor, @asdlem, @austinpickett,
@avifenesh, @awizemann, @B0Tch1, @Bartok9, @BaxBit, @Beandon13, @beardthelion, @benbarclay, @bensargotest-sys,
@binhnt92, @bird, @bisko, @BlackishGreen33, @booker1207, @bradhallett, @briandevans, @Brixyy, @brndnsvr,
@BROCCOLO1D, @btorresgil, @burjorjee, @carltonawong, @Carry00, @chaconne67, @chdlc, @chromalinx, @ChyuWei,
@CipherFrame, @cmullins70, @CNSeniorious000, @codeblackhole1024, @Codename-11, @colin-chang, @counterposition,
@cresslank, @CryptoByz, @cyb0rgk1tty, @Cybourgeoisie, @daizhonggeng, @darvsum, @davidcampbelldc, @deas,
@dgians, @dillweed, @DoGMaTiiC, @donovan-yohan, @draplater, @Drexuxux, @dskwe, @dsr-restyn, @Dusk1e,
@dusterbloom, @duyua9, @egilewski, @el-analista, @eliteworkstation94-ai, @eloklam, @EloquentBrush0x, @emonty,
@emozilla, @erhnysr, @erikengervall, @Erosika, @ether-btc, @ethernet8023, @EvilHumphrey, @fabiosiqueira,
@falasi, @falconexe, @fardoche6, @felix-windsor, @Fewmanism, @ffr31mr, @flamiinngo, @flanny7, @flooryyyy,
@fonhal, @francip, @fujinice, @gianfrancopiana, @glennc, @Glucksberg, @godlin-gh, @Grogger, @guillaumemeyer,
@Gutslabs, @H-Ali13381, @hanzckernel, @haran2001, @hawknewton, @hayka-pacha, @hehehe0803, @helix4u, @HenkDz,
@Hermes, @hermesagent26, @Hinotoi-agent, @hongchen1993, @honor2030, @houenyang-momo, @ht1072, @hueilau,
@iamfoz, @ilonagaja509-glitch, @InB4DevOps, @indigokarasu, @Interstellar-code, @iqdoctor, @iRonin, @Jaaneek,
@JabberELF, @jacevys, @jackey8616, @jackjin1997, @jdelmerico, @jfuenmayor, @Jiahui-Gu, @JimLiu, @joe102084,
@JohnC1009, @jonpol01, @Jpalmer95, @Julientalbot, @justemu, @justincc, @jvinals, @karthikeyann, @kasunvinod,
@kchuang1015, @kenyonxu, @khungate, @kiranvk-2011, @kjames2001, @konsisumer, @kpadilha, @kriscolab,
@krislidimo, @kronexoi, @kshitijk4poor, @kunci115, @Kylejeong2, @kylekahraman, @LaPhilosophie, @leeseoki0,
@lemassykoi, @Lempkey, @LeonJS, @LeonSGP43, @lidge-jun, @LifeJiggy, @liuhao1024, @LizerAIDev, @loicnico96,
@loongfay, @m0n3r0, @malaiwah, @matthewlai, @mavrickdeveloper, @maxmilian, @McClean-Edison, @memosr,
@Mind-Dragon, @momowind, @MoonJuhan, @MoonRay305, @moortekweb-art, @MorAlekss, @ms-alan, @Nami4D,
@nehaaprasaad, @nekwo, @nftpoetrist, @NickLarcombe, @nidhi-singh02, @Niraven, @nnnet, @noctilust, @novax635,
@nthrow, @nv-kasikritc, @nycomar, @OCWC22, @oemtalks, @OmX, @ooovenenoso, @orcool, @oseftg, @outsourc-e,
@OutThisLife, @Paperclip, @PaTTeeL, @pepelax, @phoenixshen, @Pluviobyte, @pnascimento9596, @pochi-gio, @pr7426,
@PratikRai0101, @Prithvi1994, @psionic73, @ptichalouf, @Que0x, @QuenVix, @quocanh261997, @qWaitCrypto, @Qwinty,
@r266-tech, @rak135, @rdasilva1016-ui, @rewbs, @roadhero, @rodrigoeqnit, @RonHillDev, @roycepersonalassistant,
@rudi193-cmd, @RyanRana, @sadiksaifi, @samahn0601, @samggggflynn, @SamuelZ12, @sanghyuk-seo-nexcube,
@Saurav0989, @savanne-kham, @Schrotti77, @Schwartz10, @SerenityTn, @sgtworkman, @sharziki, @shaun0927,
@shellybotmoyer, @shunsuke-hikiyama, @SimbaKingjoe, @SimoKiihamaki, @sir-ad, @Slimydog21, @slowtokki0409,
@Soju06, @someaka, @soynchux, @sprmn24, @Stark-X, @steezkelly, @stepanov1975, @stephenschoettler,
@stevehq26-bot, @steveonjava, @Strontvod, @subtract0, @Sunil123135, @superearn-fisher, @Sylw3ster, @tchanee,
@that-ambuj, @thedavidmurray, @TheOnlyMika, @therahul-yo, @thewillhuang, @ticketclosed-wontfix, @Timur00Kh,
@tomqiaozc, @Tosko4, @Tranquil-Flow, @tw2818, @uzunkuyruk, @vaddisrinivas, @vanthinh6886, @vgocoder,
@victorGPT, @vynxevainglory-ai, @waefrebeorn, @walli, @wangpuv, @wanwan2qq, @wesleysimplicio, @worlldz,
@wpengpeng168, @WuKongAI-CMU, @wuli666, @Wysie, @wysie, @xxxigm, @yannsunn, @YanzhongSu, @YarrowQiao, @ygd58,
@YLChen-007, @yoniebans, @yu-xin-c, @YuanHanzhong, @zapabob, @zccyman, @ziliangpeng, @zwolniony, @Zyrixtrex
---
**Full Changelog**: [v2026.5.16...v2026.5.28](https://github.com/NousResearch/hermes-agent/compare/v2026.5.16...v2026.5.28)
+2 -2
@@ -1,7 +1,7 @@
{
"id": "hermes-agent",
"name": "Hermes Agent",
- "version": "0.14.0",
+ "version": "0.15.0",
"description": "Self-improving open-source AI agent by Nous Research with ACP editor integration, persistent memory, skills, and rich tool support.",
"repository": "https://github.com/NousResearch/hermes-agent",
"website": "https://hermes-agent.nousresearch.com/docs/user-guide/features/acp",
@@ -9,7 +9,7 @@
"license": "MIT",
"distribution": {
"uvx": {
- "package": "hermes-agent[acp]==0.14.0",
+ "package": "hermes-agent[acp]==0.15.0",
"args": ["hermes-acp"]
}
}
+2
@@ -4,3 +4,5 @@ These modules contain pure utility functions and self-contained classes
that were previously embedded in the 3,600-line run_agent.py. Extracting
them makes run_agent.py focused on the AIAgent orchestrator class.
"""
from . import jiter_preload as _jiter_preload # noqa: F401
+1
@@ -1522,6 +1522,7 @@ def init_agent(
platform=agent.platform or "cli",
model=agent.model,
context_length=getattr(agent.context_compressor, "context_length", 0),
conversation_id=getattr(agent, "_gateway_session_key", None),
)
except Exception as _ce_err:
_ra().logger.debug("Context engine on_session_start: %s", _ce_err)
+30
@@ -1994,6 +1994,36 @@ def copy_reasoning_content_for_api(agent, source_msg: dict, api_msg: dict) -> No
api_msg.pop("reasoning_content", None)
def reapply_reasoning_echo_for_provider(agent, api_messages: list) -> int:
    """Re-pad assistant turns with reasoning_content for the active provider.

    ``api_messages`` is built once, before the retry loop, while the *primary*
    provider is active. If a mid-conversation fallback then switches to a
    require-side provider (DeepSeek / Kimi / MiMo thinking mode), assistant
    turns that were built when the prior provider did NOT need the echo-back go
    out without ``reasoning_content`` and the new provider rejects them with
    HTTP 400 ("The reasoning_content in the thinking mode must be passed back").

    Calling this immediately before building the request kwargs re-applies the
    pad against the *current* provider. It is idempotent and a no-op unless
    ``_needs_thinking_reasoning_pad()`` is True for the active provider, so it
    is safe to call every iteration and covers every fallback path.

    Returns the number of assistant turns that gained reasoning_content.
    """
    if not agent._needs_thinking_reasoning_pad():
        return 0
    padded = 0
    for api_msg in api_messages:
        if api_msg.get("role") != "assistant":
            continue
        if api_msg.get("reasoning_content"):
            continue
        copy_reasoning_content_for_api(agent, api_msg, api_msg)
        if api_msg.get("reasoning_content"):
            padded += 1
    return padded
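
The re-pad loop can be tried in isolation. The sketch below is a simplified stand-in, not the project code: the `needs_pad` argument replaces `agent._needs_thinking_reasoning_pad()`, and the placeholder value (a single space) is an assumption, because the body of `copy_reasoning_content_for_api` is not part of this hunk.

```python
def repad_reasoning(api_messages, needs_pad, pad=" "):
    """Give assistant turns a placeholder reasoning_content if they lack one.

    Hypothetical stand-in for the helper in the diff. Returns the number of
    messages that were changed.
    """
    if not needs_pad:
        return 0
    padded = 0
    for msg in api_messages:
        # Only assistant turns without reasoning_content need the pad.
        if msg.get("role") != "assistant" or msg.get("reasoning_content"):
            continue
        msg["reasoning_content"] = pad
        padded += 1
    return padded
```

Because the placeholder is truthy, a second call over the same list changes nothing, which is the idempotence the docstring relies on when the function runs on every retry iteration.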
def _iter_pool_sockets(client: Any):
"""Yield raw sockets reachable from an OpenAI/httpx client pool.
+5 -3
@@ -77,16 +77,16 @@ ADAPTIVE_EFFORT_MAP = {
# xhigh as a distinct level between high and max; older adaptive-thinking
# models (4.6) reject it with a 400. Keep this substring list in sync with
# the Anthropic migration guide as new model families ship.
- _XHIGH_EFFORT_SUBSTRINGS = ("4-7", "4.7")
+ _XHIGH_EFFORT_SUBSTRINGS = ("4-7", "4.7", "4-8", "4.8")
# Models where extended thinking is deprecated/removed (4.6+ behavior: adaptive
# is the only supported mode; 4.7 additionally forbids manual thinking entirely
# and drops temperature/top_p/top_k).
- _ADAPTIVE_THINKING_SUBSTRINGS = ("4-6", "4.6", "4-7", "4.7")
+ _ADAPTIVE_THINKING_SUBSTRINGS = ("4-6", "4.6", "4-7", "4.7", "4-8", "4.8")
# Models where temperature/top_p/top_k return 400 if set to non-default values.
# This is the Opus 4.7 contract; future 4.x+ models are expected to follow it.
- _NO_SAMPLING_PARAMS_SUBSTRINGS = ("4-7", "4.7")
+ _NO_SAMPLING_PARAMS_SUBSTRINGS = ("4-7", "4.7", "4-8", "4.8")
_FAST_MODE_SUPPORTED_SUBSTRINGS = ("opus-4-6", "opus-4.6")
# ── Max output token limits per Anthropic model ───────────────────────
@@ -94,6 +94,8 @@ _FAST_MODE_SUPPORTED_SUBSTRINGS = ("opus-4-6", "opus-4.6")
# max_tokens as a mandatory field. Previously we hardcoded 16384, which
# starves thinking-enabled models (thinking tokens count toward the limit).
_ANTHROPIC_OUTPUT_LIMITS = {
# Claude 4.8
"claude-opus-4-8": 128_000,
# Claude 4.7
"claude-opus-4-7": 128_000,
# Claude 4.6
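
The substring tuples in this file list both the dashed (`4-8`) and dotted (`4.8`) spellings because model IDs arrive in both forms, such as `claude-opus-4-8` and `anthropic/claude-opus-4.8-fast`. How the adapter consumes the tuples is not shown in the hunk; the sketch below assumes a plain substring test and uses a hypothetical function name.

```python
XHIGH_EFFORT_SUBSTRINGS = ("4-7", "4.7", "4-8", "4.8")


def supports_xhigh_effort(model: str) -> bool:
    """Return True when any known marker appears in the model ID.

    Hypothetical helper: assumes capability gating is a substring check.
    """
    lowered = model.lower()
    return any(marker in lowered for marker in XHIGH_EFFORT_SUBSTRINGS)
```

Under that assumption, supporting a new model family only requires extending the tuple, which matches the shape of this change.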
+90 -1
@@ -2244,11 +2244,15 @@ def _is_payment_error(exc: Exception) -> bool:
# but sometimes wrap them in 429 or other codes.
# Daily quota exhaustion from Bedrock, Vertex AI, and similar providers
# uses different language but is semantically identical to credit exhaustion.
- if status in {402, 429, None}:
+ if status in {402, 404, 429, None}:
if any(kw in err_lower for kw in (
"credits", "insufficient funds",
"can only afford", "billing",
"payment required",
"out of funds", "run out of funds",
"balance_depleted", "no usable credits",
"model_not_supported_on_free_tier",
"not available on the free tier",
# Daily / monthly / weekly quota exhaustion keywords
"quota exceeded", "quota_exceeded",
"too many tokens per day", "daily limit",
@@ -2260,6 +2264,18 @@ def _is_payment_error(exc: Exception) -> bool:
return False
def _nous_portal_account_has_fresh_paid_access() -> bool:
    """Return True only when the fresh Nous account API says paid access is allowed."""
    try:
        from hermes_cli.nous_account import get_nous_portal_account_info

        account_info = get_nous_portal_account_info(force_fresh=True)
        return account_info.paid_service_access is True
    except Exception as exc:
        logger.debug("Auxiliary Nous paid-entitlement refresh check failed: %s", exc)
        return False
def _is_rate_limit_error(exc: Exception) -> bool:
"""Detect rate-limit errors that warrant provider fallback.
@@ -2288,6 +2304,10 @@ def _is_rate_limit_error(exc: Exception) -> bool:
if not any(kw in err_lower for kw in (
"credits", "insufficient funds", "billing",
"payment required", "can only afford",
"out of funds", "run out of funds",
"balance_depleted", "no usable credits",
"model_not_supported_on_free_tier",
"not available on the free tier",
)):
return True
return False
@@ -4937,6 +4957,41 @@ def call_llm(
resolved_provider == "nous"
or base_url_host_matches(_base_info, "inference-api.nousresearch.com")
)
if (
_is_payment_error(first_err)
and client_is_nous
and _nous_portal_account_has_fresh_paid_access()
):
refreshed_client, refreshed_model = _refresh_nous_auxiliary_client(
cache_provider=resolved_provider or "nous",
model=final_model,
async_mode=False,
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
main_runtime=main_runtime,
is_vision=(task == "vision"),
)
if refreshed_client is not None:
logger.info(
"Auxiliary %s: refreshed Nous runtime credentials after paid account check, retrying",
task or "call",
)
if refreshed_model and refreshed_model != kwargs.get("model"):
kwargs["model"] = refreshed_model
try:
return _validate_llm_response(
refreshed_client.chat.completions.create(**kwargs), task)
except Exception as retry_err:
if not (
_is_auth_error(retry_err)
or _is_payment_error(retry_err)
or _is_connection_error(retry_err)
or _is_rate_limit_error(retry_err)
):
raise
first_err = retry_err
if _is_auth_error(first_err) and client_is_nous:
refreshed_client, refreshed_model = _refresh_nous_auxiliary_client(
cache_provider=resolved_provider or "nous",
@@ -5339,6 +5394,40 @@ async def async_call_llm(
resolved_provider == "nous"
or base_url_host_matches(_client_base, "inference-api.nousresearch.com")
)
if (
_is_payment_error(first_err)
and client_is_nous
and _nous_portal_account_has_fresh_paid_access()
):
refreshed_client, refreshed_model = _refresh_nous_auxiliary_client(
cache_provider=resolved_provider or "nous",
model=final_model,
async_mode=True,
base_url=resolved_base_url,
api_key=resolved_api_key,
api_mode=resolved_api_mode,
is_vision=(task == "vision"),
)
if refreshed_client is not None:
logger.info(
"Auxiliary %s (async): refreshed Nous runtime credentials after paid account check, retrying",
task or "call",
)
if refreshed_model and refreshed_model != kwargs.get("model"):
kwargs["model"] = refreshed_model
try:
return _validate_llm_response(
await refreshed_client.chat.completions.create(**kwargs), task)
except Exception as retry_err:
if not (
_is_auth_error(retry_err)
or _is_payment_error(retry_err)
or _is_connection_error(retry_err)
or _is_rate_limit_error(retry_err)
):
raise
first_err = retry_err
if _is_auth_error(first_err) and client_is_nous:
refreshed_client, refreshed_model = _refresh_nous_auxiliary_client(
cache_provider=resolved_provider or "nous",
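
The visible part of `_is_payment_error` in this file is a status-code gate followed by a keyword scan over the lowercased error text. A reduced sketch of that shape, assuming a hypothetical `(status, message)` signature instead of the exception-based helper, and using only a subset of the keywords from the hunk. Any earlier branches of the real function are not modelled here.

```python
PAYMENT_KEYWORDS = (
    "credits", "insufficient funds", "billing", "payment required",
    "out of funds", "balance_depleted", "quota exceeded",
)


def looks_like_payment_error(status, message):
    """Return True for billing or quota exhaustion signalled in the text.

    Hypothetical helper: status None stands for errors with no HTTP status.
    """
    if status not in {402, 404, 429, None}:
        return False
    lowered = message.lower()
    return any(keyword in lowered for keyword in PAYMENT_KEYWORDS)
```

Adding 404 to the gate lets a "model not found" style response still count as a payment error when its body carries one of the billing keywords, while a bare 404 without them does not.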
+5 -1
@@ -483,6 +483,11 @@ def _run_review_in_thread(
finally:
clear_thread_tool_whitelist()
+ # Snapshot review actions before teardown. close() is allowed to
+ # clean per-session state, but the user-visible self-improvement
+ # summary still needs the completed review agent's tool results.
+ review_messages = list(getattr(review_agent, "_session_messages", []))
# Tear down memory providers while stdout is still
# redirected so background thread teardown (Honcho flush,
# Hindsight sync, etc.) stays silent. The finally block
@@ -495,7 +500,6 @@ def _run_review_in_thread(
review_agent.close()
except Exception:
pass
- review_messages = list(getattr(review_agent, "_session_messages", []))
review_agent = None
# Scan the review agent's messages for successful tool actions
+8 -8
@@ -403,13 +403,13 @@ def interruptible_api_call(agent, api_kwargs: dict):
_elapsed, _ttfb_timeout, api_kwargs.get("model", "unknown"),
)
if _silent_hint:
- agent._emit_status(
+ agent._buffer_status(
f"⚠️ No first byte from provider in {int(_elapsed)}s "
f"(codex stream, model: {api_kwargs.get('model', 'unknown')}). "
f"Reconnecting. {_silent_hint}"
)
else:
- agent._emit_status(
+ agent._buffer_status(
f"⚠️ No first byte from provider in {int(_elapsed)}s "
f"(codex stream, model: {api_kwargs.get('model', 'unknown')}). "
f"Reconnecting."
@@ -455,7 +455,7 @@ def interruptible_api_call(agent, api_kwargs: dict):
api_kwargs.get("model", "unknown"),
f"{_est_tokens_for_codex_watchdog:,}",
)
- agent._emit_status(
+ agent._buffer_status(
f"⚠️ Codex stream sent no events for {int(_event_stale_elapsed)}s "
f"after first byte (model: {api_kwargs.get('model', 'unknown')}). "
f"Reconnecting."
@@ -493,13 +493,13 @@ def interruptible_api_call(agent, api_kwargs: dict):
api_kwargs.get("model", "unknown"), f"{_est_ctx:,}",
)
if _silent_hint:
- agent._emit_status(
+ agent._buffer_status(
f"⚠️ No response from provider for {int(_elapsed)}s "
f"(non-streaming, model: {api_kwargs.get('model', 'unknown')}). "
f"{_silent_hint}"
)
else:
- agent._emit_status(
+ agent._buffer_status(
f"⚠️ No response from provider for {int(_elapsed)}s "
f"(non-streaming, model: {api_kwargs.get('model', 'unknown')}). "
f"Aborting call."
@@ -1262,7 +1262,7 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
api_mode=agent.api_mode,
)
- agent._emit_status(
+ agent._buffer_status(
f"🔄 Primary model failed — switching to fallback: "
f"{fb_model} via {fb_provider}"
)
@@ -2251,7 +2251,7 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
mid_tool_call=False,
diag=request_client_holder.get("diag"),
)
- agent._emit_status(
+ agent._buffer_status(
"❌ Provider returned malformed streaming data after "
f"{_max_stream_retries + 1} attempts. "
"The provider may be experiencing issues — "
@@ -2358,7 +2358,7 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
_stale_elapsed, _stream_stale_timeout,
api_kwargs.get("model", "unknown"), f"{_est_ctx:,}",
)
- agent._emit_status(
+ agent._buffer_status(
f"⚠️ No response from provider for {int(_stale_elapsed)}s "
f"(model: {api_kwargs.get('model', 'unknown')}, "
f"context: ~{_est_ctx:,} tokens). "
+6 -1
@@ -71,7 +71,12 @@ class ContextEngine(ABC):
def update_from_response(self, usage: Dict[str, Any]) -> None:
"""Update tracked token usage from an API response.
- Called after every LLM call with the usage dict from the response.
+ Called after every LLM call with a normalized usage dict. The legacy
+ keys ``prompt_tokens``, ``completion_tokens``, and ``total_tokens``
+ are always present. Newer hosts also include canonical buckets:
+ ``input_tokens``, ``output_tokens``, ``cache_read_tokens``,
+ ``cache_write_tokens``, and ``reasoning_tokens``. Engines should
+ treat those fields as optional for compatibility with older hosts.
"""
@abstractmethod
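
An engine that wants the newer buckets while staying compatible with older hosts can read them with defaults. A sketch under stated assumptions: `summarize_usage` is a hypothetical helper, and falling back from `input_tokens` to `prompt_tokens` is one reasonable policy, not necessarily what any bundled engine does.

```python
def summarize_usage(usage):
    """Read a normalized usage dict, tolerating hosts without the new keys.

    Hypothetical helper: the three legacy keys are required, the canonical
    buckets are optional.
    """
    prompt = usage["prompt_tokens"]
    completion = usage["completion_tokens"]
    return {
        "input": usage.get("input_tokens", prompt),
        "output": usage.get("output_tokens", completion),
        "cache_read": usage.get("cache_read_tokens", 0),
        "cache_write": usage.get("cache_write_tokens", 0),
        "reasoning": usage.get("reasoning_tokens", 0),
        "total": usage["total_tokens"],
    }
```

Indexing the legacy keys directly and using `.get()` for the rest mirrors the contract in the docstring: the former are always present, the latter may be missing.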
+1
@@ -421,6 +421,7 @@ def compress_context(
agent.session_id or "",
boundary_reason="compression",
old_session_id=_old_sid,
conversation_id=getattr(agent, "_gateway_session_key", None),
)
except Exception as _ce_err:
logger.debug("context engine on_session_start (compression): %s", _ce_err)
+338 -92
@@ -127,6 +127,106 @@ def _ra():
return run_agent
def _nous_entitlement_message(capability: str) -> str:
    try:
        from hermes_cli.nous_account import (
            format_nous_portal_entitlement_message,
            get_nous_portal_account_info,
        )

        account_info = get_nous_portal_account_info(force_fresh=True)
        message = format_nous_portal_entitlement_message(
            account_info,
            capability=capability,
        )
        return message or ""
    except Exception:
        return ""


def _print_nous_entitlement_guidance(agent, capability: str) -> bool:
    message = _nous_entitlement_message(capability)
    if not message:
        return False
    for line in message.splitlines():
        agent._vprint(f"{agent.log_prefix} 💡 {line}", force=True)
    return True


def _is_nous_inference_route(provider: str, base_url: str) -> bool:
    provider = (provider or "").strip().lower()
    if provider == "nous":
        return True
    base = str(base_url or "")
    return (
        base_url_host_matches(base, "inference-api.nousresearch.com")
        or base_url_host_matches(base, "inference.nousresearch.com")
    )


def _billing_or_entitlement_message(
    *,
    capability: str,
    provider: str,
    base_url: str,
    model: str,
) -> str:
    if _is_nous_inference_route(provider, base_url):
        return _nous_entitlement_message(capability)

    provider_label = (provider or "").strip() or "the selected provider"
    model_label = (model or "").strip() or "the selected model"
    lines = [
        (
            f"{provider_label} reported that billing, credits, or account "
            f"entitlement is exhausted for {model_label}."
        ),
        "Add credits or update billing with that provider, then retry.",
    ]
    if base_url_host_matches(str(base_url or ""), "openrouter.ai"):
        lines.append("OpenRouter credits: https://openrouter.ai/settings/credits")
    lines.append("You can switch providers temporarily with /model <model> --provider <provider>.")
    return "\n".join(lines)


def _print_billing_or_entitlement_guidance(
    agent,
    *,
    capability: str,
    provider: str,
    base_url: str,
    model: str,
) -> bool:
    message = _billing_or_entitlement_message(
        capability=capability,
        provider=provider,
        base_url=base_url,
        model=model,
    )
    if not message:
        return False
    for line in message.splitlines():
        agent._vprint(f"{agent.log_prefix} 💡 {line}", force=True)
    return True


def _try_refresh_nous_paid_entitlement_credentials(agent) -> bool:
"""Refresh Nous runtime credentials after a fresh paid-entitlement check."""
try:
from hermes_cli.auth import NOUS_INFERENCE_AUTH_MODE_LEGACY
from hermes_cli.nous_account import get_nous_portal_account_info
account_info = get_nous_portal_account_info(force_fresh=True)
if account_info.paid_service_access is not True:
return False
return agent._try_refresh_nous_client_credentials(
force=False,
inference_auth_mode=NOUS_INFERENCE_AUTH_MODE_LEGACY,
)
except Exception:
return False
def _restore_or_build_system_prompt(agent, system_message, conversation_history):
"""Restore the cached system prompt from the session DB or build it fresh.
@@ -1017,6 +1117,7 @@ def run_conversation(
codex_auth_retry_attempted=False
anthropic_auth_retry_attempted=False
nous_auth_retry_attempted=False
nous_paid_entitlement_refresh_attempted=False
copilot_auth_retry_attempted=False
thinking_sig_retry_attempted = False
invalid_encrypted_content_retry_attempted = False
@@ -1050,17 +1151,18 @@ def run_conversation(
f"Nous Portal rate limit active — "
f"resets in {_fmt_nous_remaining(_nous_remaining)}."
)
agent._vprint(
f"{agent.log_prefix}{_nous_msg} Trying fallback...",
force=True,
agent._buffer_vprint(
f"{_nous_msg} Trying fallback..."
)
agent._emit_status(f"{_nous_msg}")
agent._buffer_status(f"{_nous_msg}")
if agent._try_activate_fallback():
retry_count = 0
compression_attempts = 0
primary_recovery_attempted = False
continue
# No fallback available — return with clear message
# No fallback available — surface buffered context
# so user sees the rate-limit message that led here.
agent._flush_status_buffer()
agent._persist_session(messages, conversation_history)
return {
"final_response": (
@@ -1082,6 +1184,14 @@ def run_conversation(
try:
agent._reset_stream_delivery_tracking()
# api_messages is built once, before this retry loop, while the
# primary provider is active. A mid-conversation fallback can
# switch to a require-side provider (DeepSeek / Kimi / MiMo) that
# rejects assistant turns lacking reasoning_content. Re-apply the
# echo-back pad for the *current* provider here (idempotent no-op
# unless the active provider needs it) so the fallback request
# isn't sent with stale, primary-shaped reasoning fields.
agent._reapply_reasoning_echo_for_provider(api_messages)
api_kwargs = agent._build_api_kwargs(api_messages)
if agent._force_ascii_payload:
_sanitize_structure_non_ascii(api_kwargs)
@@ -1275,9 +1385,10 @@ def run_conversation(
error_details.append("response.choices is empty")
if response_invalid:
# Stop spinner before printing error messages
# Stop spinner silently — retry status is now buffered
# and only surfaced if every retry+fallback exhausts.
if thinking_spinner:
thinking_spinner.stop("(´;ω;`) oops, retrying...")
thinking_spinner.stop("")
thinking_spinner = None
if agent.thinking_callback:
agent.thinking_callback("")
@@ -1290,7 +1401,7 @@ def run_conversation(
# rate-limit symptom. Switch to fallback immediately
# rather than retrying with extended backoff.
if agent._fallback_index < len(agent._fallback_chain):
agent._emit_status("⚠️ Empty/malformed response — switching to fallback...")
agent._buffer_status("⚠️ Empty/malformed response — switching to fallback...")
if agent._try_activate_fallback():
retry_count = 0
compression_attempts = 0
@@ -1352,20 +1463,22 @@ def run_conversation(
else:
_failure_hint = f"response time {api_duration:.1f}s"
agent._vprint(f"{agent.log_prefix}⚠️ Invalid API response (attempt {retry_count}/{max_retries}): {', '.join(error_details)}", force=True)
agent._vprint(f"{agent.log_prefix} 🏢 Provider: {provider_name}", force=True)
agent._buffer_vprint(f"⚠️ Invalid API response (attempt {retry_count}/{max_retries}): {', '.join(error_details)}")
agent._buffer_vprint(f" 🏢 Provider: {provider_name}")
cleaned_provider_error = agent._clean_error_message(error_msg)
agent._vprint(f"{agent.log_prefix} 📝 Provider message: {cleaned_provider_error}", force=True)
agent._vprint(f"{agent.log_prefix} ⏱️ {_failure_hint}", force=True)
agent._buffer_vprint(f" 📝 Provider message: {cleaned_provider_error}")
agent._buffer_vprint(f" ⏱️ {_failure_hint}")
if retry_count >= max_retries:
# Try fallback before giving up
agent._emit_status(f"⚠️ Max retries ({max_retries}) for invalid responses — trying fallback...")
agent._buffer_status(f"⚠️ Max retries ({max_retries}) for invalid responses — trying fallback...")
if agent._try_activate_fallback():
retry_count = 0
compression_attempts = 0
primary_recovery_attempted = False
continue
# Terminal — flush buffered retry trace so user sees what happened.
agent._flush_status_buffer()
agent._emit_status(f"❌ Max retries ({max_retries}) exceeded for invalid responses. Giving up.")
logger.error(f"{agent.log_prefix}Invalid API response after {max_retries} retries.")
agent._persist_session(messages, conversation_history)
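The hunk above replaces immediate `_vprint` / `_emit_status` calls with buffered ones and flushes only on a terminal failure. The helpers themselves are not shown in this diff, so the following is only a sketch of the pattern under that assumption; `StatusBuffer` and its method names are hypothetical stand-ins for `_buffer_status`, `_flush_status_buffer`, and whatever clears the buffer on success.

```python
from typing import Callable, List


class StatusBuffer:
    """Hypothetical stand-in for the agent's buffered status helpers."""

    def __init__(self, emit: Callable[[str], None]) -> None:
        self._emit = emit  # callback that actually shows a line to the user
        self._pending: List[str] = []

    def buffer_status(self, message: str) -> None:
        # Hold the message instead of showing it immediately.
        self._pending.append(message)

    def flush(self) -> None:
        # Terminal failure: replay the trace in the order it was recorded.
        for message in self._pending:
            self._emit(message)
        self._pending.clear()

    def clear(self) -> None:
        # Successful recovery: drop the trace without showing it.
        self._pending.clear()


shown: List[str] = []
buf = StatusBuffer(shown.append)

# First turn: two retries, then the request succeeds, so nothing is shown.
buf.buffer_status("retry 1")
buf.buffer_status("retry 2")
buf.clear()

# Second turn: retries and fallback are exhausted, so the trace is surfaced.
buf.buffer_status("switching to fallback")
buf.flush()
```

The design choice this illustrates is that users who recover automatically never see the retry noise, while users who hit a terminal error still get the full trace.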
@@ -1379,7 +1492,7 @@ def run_conversation(
# Backoff before retry — jittered exponential: 5s base, 120s cap
wait_time = jittered_backoff(retry_count, base_delay=5.0, max_delay=120.0)
agent._vprint(f"{agent.log_prefix}⏳ Retrying in {wait_time:.1f}s ({_failure_hint})...", force=True)
agent._buffer_vprint(f"⏳ Retrying in {wait_time:.1f}s ({_failure_hint})...")
logger.warning(f"Invalid API response (retry {retry_count}/{max_retries}): {', '.join(error_details)} | Provider: {provider_name}")
# Sleep in small increments to stay responsive to interrupts
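The comment above describes the retry delay as a jittered exponential with a 5s base and a 120s cap. The body of `jittered_backoff` is not part of this diff, so the sketch below is one common way to get that behavior (exponential growth, capped, then randomized within the upper half of the capped delay). `jittered_backoff_sketch` is a hypothetical name, and the jitter strategy is an assumption.

```python
import random


def jittered_backoff_sketch(
    attempt: int,
    base_delay: float = 5.0,
    max_delay: float = 120.0,
) -> float:
    """Return a randomized delay in seconds for a 1-based retry attempt."""
    exponent = max(0, attempt - 1)
    # Double the delay per attempt, but never exceed the cap.
    capped = min(base_delay * (2 ** exponent), max_delay)
    # Pick uniformly from the upper half so concurrent clients spread out
    # without ever waiting longer than the cap.
    return random.uniform(capped / 2.0, capped)


delays = [jittered_backoff_sketch(attempt) for attempt in range(1, 11)]
```

Because the jitter is applied after capping, no delay in `delays` exceeds `max_delay`, and the first attempt always waits between half the base delay and the base delay.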
@@ -1606,14 +1719,14 @@ def run_conversation(
if assistant_message is not None and _trunc_has_tool_calls:
if truncated_tool_call_retries < 1:
truncated_tool_call_retries += 1
agent._vprint(
f"{agent.log_prefix}⚠️ Truncated tool call detected — retrying API call...",
force=True,
agent._buffer_vprint(
f"⚠️ Truncated tool call detected — retrying API call..."
)
# Don't append the broken response to messages;
# just re-run the same API call from the current
# message state, giving the model another chance.
continue
agent._flush_status_buffer()
agent._vprint(
f"{agent.log_prefix}⚠️ Truncated tool call response detected again — refusing to execute incomplete tool arguments.",
force=True,
@@ -1647,6 +1760,7 @@ def run_conversation(
}
else:
# First message was truncated - mark as failed
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ First response truncated - cannot recover", force=True)
agent._persist_session(messages, conversation_history)
return {
@@ -1668,10 +1782,19 @@ def run_conversation(
prompt_tokens = canonical_usage.prompt_tokens
completion_tokens = canonical_usage.output_tokens
total_tokens = canonical_usage.total_tokens
# Forward canonical token + cache buckets so context engines
# can make decisions on cache hit ratios / reasoning costs,
# not just legacy aggregate tokens. Legacy keys stay for
# back-compat with engines that only read prompt/completion/total.
usage_dict = {
"prompt_tokens": prompt_tokens,
"completion_tokens": completion_tokens,
"total_tokens": total_tokens,
"input_tokens": canonical_usage.input_tokens,
"output_tokens": canonical_usage.output_tokens,
"cache_read_tokens": canonical_usage.cache_read_tokens,
"cache_write_tokens": canonical_usage.cache_write_tokens,
"reasoning_tokens": canonical_usage.reasoning_tokens,
}
agent.context_compressor.update_from_response(usage_dict)
@@ -1789,6 +1912,11 @@ def run_conversation(
)
has_retried_429 = False # Reset on success
# Note: don't clear the retry buffer here — an "API call
# success" only means we got bytes back, not that we got
# usable content. Empty responses still loop through the
# empty-retry path below; the buffer is cleared when
# genuinely successful content is detected later (~L4127).
# Clear Nous rate limit state on successful request —
# proves the limit has reset and other sessions can
# resume hitting Nous.
@@ -1815,9 +1943,10 @@ def run_conversation(
break
except Exception as api_error:
# Stop spinner before printing error messages
# Stop spinner silently — retry status is buffered and
# only flushed when every retry+fallback is exhausted.
if thinking_spinner:
thinking_spinner.stop("(╥_╥) error, retrying...")
thinking_spinner.stop("")
thinking_spinner = None
if agent.thinking_callback:
agent.thinking_callback("")
@@ -1872,14 +2001,12 @@ def run_conversation(
if _surrogates_found or _is_surrogate_error:
agent._unicode_sanitization_passes += 1
if _surrogates_found:
agent._vprint(
f"{agent.log_prefix}⚠️ Stripped invalid surrogate characters from messages. Retrying...",
force=True,
agent._buffer_vprint(
f"⚠️ Stripped invalid surrogate characters from messages. Retrying..."
)
else:
agent._vprint(
f"{agent.log_prefix}⚠️ Surrogate encoding error — retrying after full-payload sanitization...",
force=True,
agent._buffer_vprint(
f"⚠️ Surrogate encoding error — retrying after full-payload sanitization..."
)
continue
if _is_ascii_codec:
@@ -2093,6 +2220,23 @@ def run_conversation(
classified.should_rotate_credential, classified.should_fallback,
)
if (
classified.reason == FailoverReason.billing
and _is_nous_inference_route(
getattr(agent, "provider", "") or "",
getattr(agent, "base_url", "") or "",
)
and not nous_paid_entitlement_refresh_attempted
):
nous_paid_entitlement_refresh_attempted = True
if _try_refresh_nous_paid_entitlement_credentials(agent):
agent._vprint(
f"{agent.log_prefix}🔐 Nous paid access verified — "
"refreshed runtime credentials and retrying request...",
force=True,
)
continue
recovered_with_pool, has_retried_429 = agent._recover_with_credential_pool(
status_code=status_code,
has_retried_429=has_retried_429,
@@ -2190,7 +2334,7 @@ def run_conversation(
codex_auth_retry_attempted = True
if agent._try_refresh_codex_client_credentials(force=True):
_label = "xAI OAuth" if agent.provider == "xai-oauth" else "Codex"
agent._vprint(f"{agent.log_prefix}🔐 {_label} auth refreshed after 401. Retrying request...")
agent._buffer_vprint(f"🔐 {_label} auth refreshed after 401. Retrying request...")
continue
if (
agent.api_mode == "chat_completions"
@@ -2217,7 +2361,8 @@ def run_conversation(
print(f"{agent.log_prefix}🔐 Nous 401 — Portal authentication failed.")
if _body_text:
print(f"{agent.log_prefix} Response: {_body_text}")
print(f"{agent.log_prefix} Most likely: Portal OAuth expired, account out of credits, or agent key revoked.")
if not _print_nous_entitlement_guidance(agent, "Nous model access"):
print(f"{agent.log_prefix} Most likely: Portal OAuth expired, account out of credits, or agent key revoked.")
print(f"{agent.log_prefix} Troubleshooting:")
print(f"{agent.log_prefix} • Re-authenticate: hermes auth add nous")
print(f"{agent.log_prefix} • Check credits / billing: https://portal.nousresearch.com")
@@ -2230,7 +2375,7 @@ def run_conversation(
):
copilot_auth_retry_attempted = True
if agent._try_refresh_copilot_client_credentials():
agent._vprint(f"{agent.log_prefix}🔐 Copilot credentials refreshed after 401. Retrying request...")
agent._buffer_vprint(f"🔐 Copilot credentials refreshed after 401. Retrying request...")
continue
if (
agent.api_mode == "anthropic_messages"
@@ -2405,41 +2550,37 @@ def run_conversation(
_base = getattr(agent, "base_url", "unknown")
_model = getattr(agent, "model", "unknown")
_status_code_str = f" [HTTP {status_code}]" if status_code else ""
agent._vprint(f"{agent.log_prefix}⚠️ API call failed (attempt {retry_count}/{max_retries}): {error_type}{_status_code_str}", force=True)
agent._vprint(f"{agent.log_prefix} 🔌 Provider: {_provider} Model: {_model}", force=True)
agent._vprint(f"{agent.log_prefix} 🌐 Endpoint: {_base}", force=True)
agent._vprint(f"{agent.log_prefix} 📝 Error: {_error_summary}", force=True)
agent._buffer_vprint(f"⚠️ API call failed (attempt {retry_count}/{max_retries}): {error_type}{_status_code_str}")
agent._buffer_vprint(f" 🔌 Provider: {_provider} Model: {_model}")
agent._buffer_vprint(f" 🌐 Endpoint: {_base}")
agent._buffer_vprint(f" 📝 Error: {_error_summary}")
if status_code and status_code < 500:
_err_body = getattr(api_error, "body", None)
_err_body_str = str(_err_body)[:300] if _err_body else None
if _err_body_str:
agent._vprint(f"{agent.log_prefix} 📋 Details: {_err_body_str}", force=True)
agent._vprint(f"{agent.log_prefix} ⏱️ Elapsed: {elapsed_time:.2f}s Context: {len(api_messages)} msgs, ~{approx_tokens:,} tokens")
agent._buffer_vprint(f" 📋 Details: {_err_body_str}")
agent._buffer_vprint(f" ⏱️ Elapsed: {elapsed_time:.2f}s Context: {len(api_messages)} msgs, ~{approx_tokens:,} tokens")
# Actionable hint for OpenRouter "no tool endpoints" error.
# This fires regardless of whether fallback succeeds — the
# user needs to know WHY their model failed so they can fix
# their provider routing, not just silently fall back.
# Buffered like the rest of the retry trace — surfaced only
# if every retry+fallback exhausts. Avoids spamming users
# who recover automatically via fallback.
if (
agent._is_openrouter_url()
and "support tool use" in error_msg
):
agent._vprint(
f"{agent.log_prefix} 💡 No OpenRouter providers for {_model} support tool calling with your current settings.",
force=True,
agent._buffer_vprint(
f" 💡 No OpenRouter providers for {_model} support tool calling with your current settings."
)
if agent.providers_allowed:
agent._vprint(
f"{agent.log_prefix} Your provider_routing.only restriction is filtering out tool-capable providers.",
force=True,
agent._buffer_vprint(
f" Your provider_routing.only restriction is filtering out tool-capable providers."
)
agent._vprint(
f"{agent.log_prefix} Try removing the restriction or adding providers that support tools for this model.",
force=True,
agent._buffer_vprint(
f" Try removing the restriction or adding providers that support tools for this model."
)
agent._vprint(
f"{agent.log_prefix} Check which providers support tools: https://openrouter.ai/models/{_model}",
force=True,
agent._buffer_vprint(
f" Check which providers support tools: https://openrouter.ai/models/{_model}"
)
# Check for interrupt before deciding to retry
@@ -2489,11 +2630,10 @@ def run_conversation(
# user later enables extra usage the 1M limit
# should come back automatically.
compressor._context_probe_persistable = False
agent._vprint(
f"{agent.log_prefix}⚠️ Anthropic long-context tier "
agent._buffer_vprint(
f"⚠️ Anthropic long-context tier "
f"requires extra usage — reducing context: "
f"{old_ctx:,} → {_reduced_ctx:,} tokens",
force=True,
f"{old_ctx:,} → {_reduced_ctx:,} tokens"
)
compression_attempts += 1
@@ -2509,7 +2649,7 @@ def run_conversation(
# messages to the new session, not skipping them.
conversation_history = None
if len(messages) < original_len or old_ctx > _reduced_ctx:
agent._emit_status(
agent._buffer_status(
f"🗜️ Context reduced to {_reduced_ctx:,} tokens "
f"(was {old_ctx:,}), retrying..."
)
@@ -2538,7 +2678,12 @@ def run_conversation(
base_url=getattr(agent, "base_url", None),
)
if not pool_may_recover:
agent._emit_status("⚠️ Rate limited — switching to fallback provider...")
if classified.reason == FailoverReason.billing:
agent._buffer_status(
"⚠️ Billing or credits exhausted — switching to fallback provider..."
)
else:
agent._buffer_status("⚠️ Rate limited — switching to fallback provider...")
if agent._try_activate_fallback(reason=classified.reason):
retry_count = 0
compression_attempts = 0
@@ -2650,6 +2795,8 @@ def run_conversation(
if is_payload_too_large:
compression_attempts += 1
if compression_attempts > max_compression_attempts:
# Terminal — surface the buffered retry trace.
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ Max compression attempts ({max_compression_attempts}) reached for payload-too-large error.", force=True)
agent._vprint(f"{agent.log_prefix} 💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
logger.error(f"{agent.log_prefix}413 compression failed after {max_compression_attempts} attempts.")
@@ -2663,7 +2810,7 @@ def run_conversation(
"failed": True,
"compression_exhausted": True,
}
agent._emit_status(f"⚠️ Request payload too large (413) — compression attempt {compression_attempts}/{max_compression_attempts}...")
agent._buffer_status(f"⚠️ Request payload too large (413) — compression attempt {compression_attempts}/{max_compression_attempts}...")
original_len = len(messages)
messages, active_system_prompt = agent._compress_context(
@@ -2676,11 +2823,14 @@ def run_conversation(
conversation_history = None
if len(messages) < original_len:
agent._emit_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
agent._buffer_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
time.sleep(2) # Brief pause between compression retries
restart_with_compressed_messages = True
break
else:
# Terminal — surface buffered context so the user
# sees what compression attempts were made.
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ Payload too large and cannot compress further.", force=True)
agent._vprint(f"{agent.log_prefix} 💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
logger.error(f"{agent.log_prefix}413 payload too large. Cannot compress further.")
@@ -2724,16 +2874,16 @@ def run_conversation(
# touching context_length or triggering compression.
safe_out = max(1, available_out - 64) # small safety margin
agent._ephemeral_max_output_tokens = safe_out
agent._vprint(
f"{agent.log_prefix}⚠️ Output cap too large for current prompt — "
agent._buffer_vprint(
f"⚠️ Output cap too large for current prompt — "
f"retrying with max_tokens={safe_out:,} "
f"(available_tokens={available_out:,}; context_length unchanged at {old_ctx:,})",
force=True,
f"(available_tokens={available_out:,}; context_length unchanged at {old_ctx:,})"
)
# Still count against compression_attempts so we don't
# loop forever if the error keeps recurring.
compression_attempts += 1
if compression_attempts > max_compression_attempts:
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ Max compression attempts ({max_compression_attempts}) reached.", force=True)
agent._vprint(f"{agent.log_prefix} 💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
logger.error(f"{agent.log_prefix}Context compression failed after {max_compression_attempts} attempts.")
@@ -2769,13 +2919,12 @@ def run_conversation(
)
if parsed_limit and parsed_limit < old_ctx:
new_ctx = parsed_limit
agent._vprint(f"{agent.log_prefix}Context limit detected from API: {new_ctx:,} tokens (was {old_ctx:,})", force=True)
agent._buffer_vprint(f"Context limit detected from API: {new_ctx:,} tokens (was {old_ctx:,})")
elif minimax_delta_only_overflow:
new_ctx = old_ctx
agent._vprint(
f"{agent.log_prefix}Provider reported overflow amount only; "
f"keeping context_length at {old_ctx:,} tokens and compressing.",
force=True,
agent._buffer_vprint(
f"Provider reported overflow amount only; "
f"keeping context_length at {old_ctx:,} tokens and compressing."
)
else:
# Step down to the next probe tier
@@ -2802,12 +2951,13 @@ def run_conversation(
compressor._context_probe_persistable = bool(
parsed_limit and parsed_limit == new_ctx
)
agent._vprint(f"{agent.log_prefix}⚠️ Context length exceeded — stepping down: {old_ctx:,} → {new_ctx:,} tokens", force=True)
agent._buffer_vprint(f"⚠️ Context length exceeded — stepping down: {old_ctx:,} → {new_ctx:,} tokens")
else:
agent._vprint(f"{agent.log_prefix}⚠️ Context length exceeded at minimum tier — attempting compression...", force=True)
agent._buffer_vprint(f"⚠️ Context length exceeded at minimum tier — attempting compression...")
compression_attempts += 1
if compression_attempts > max_compression_attempts:
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ Max compression attempts ({max_compression_attempts}) reached.", force=True)
agent._vprint(f"{agent.log_prefix} 💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
logger.error(f"{agent.log_prefix}Context compression failed after {max_compression_attempts} attempts.")
@@ -2821,7 +2971,7 @@ def run_conversation(
"failed": True,
"compression_exhausted": True,
}
agent._emit_status(f"🗜️ Context too large (~{approx_tokens:,} tokens) — compressing ({compression_attempts}/{max_compression_attempts})...")
agent._buffer_status(f"🗜️ Context too large (~{approx_tokens:,} tokens) — compressing ({compression_attempts}/{max_compression_attempts})...")
original_len = len(messages)
messages, active_system_prompt = agent._compress_context(
@@ -2835,12 +2985,13 @@ def run_conversation(
if len(messages) < original_len or new_ctx and new_ctx < old_ctx:
if len(messages) < original_len:
agent._emit_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
agent._buffer_status(f"🗜️ Compressed {original_len} → {len(messages)} messages, retrying...")
time.sleep(2) # Brief pause between compression retries
restart_with_compressed_messages = True
break
else:
# Can't compress further and already at minimum tier
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ Context length exceeded and cannot compress further.", force=True)
agent._vprint(f"{agent.log_prefix} 💡 The conversation has accumulated too much content. Try /new to start fresh, or /compress to manually trigger compression.", force=True)
logger.error(f"{agent.log_prefix}Context length exceeded: {approx_tokens:,} tokens. Cannot compress further.")
@@ -2929,7 +3080,10 @@ def run_conversation(
if is_client_error:
# Try fallback before aborting — a different provider
# may not have the same issue (rate limit, auth, etc.)
agent._emit_status(f"⚠️ Non-retryable error (HTTP {status_code}) — trying fallback...")
if classified.reason == FailoverReason.content_policy_blocked:
agent._buffer_status("⚠️ Provider safety filter blocked this request — trying fallback...")
else:
agent._buffer_status(f"⚠️ Non-retryable error (HTTP {status_code}) — trying fallback...")
if agent._try_activate_fallback():
retry_count = 0
compression_attempts = 0
@@ -2939,16 +3093,38 @@ def run_conversation(
agent._dump_api_request_debug(
api_kwargs, reason="non_retryable_client_error", error=api_error,
)
agent._emit_status(
f"❌ Non-retryable error (HTTP {status_code}): "
f"{agent._summarize_api_error(api_error)}"
)
# Terminal — flush buffered context so the user sees
# what was tried before the abort.
agent._flush_status_buffer()
if classified.reason == FailoverReason.content_policy_blocked:
agent._emit_status(
f"❌ Provider safety filter blocked this request: "
f"{agent._summarize_api_error(api_error)}"
)
else:
agent._emit_status(
f"❌ Non-retryable error (HTTP {status_code}): "
f"{agent._summarize_api_error(api_error)}"
)
agent._vprint(f"{agent.log_prefix}❌ Non-retryable client error (HTTP {status_code}). Aborting.", force=True)
agent._vprint(f"{agent.log_prefix} 🔌 Provider: {_provider} Model: {_model}", force=True)
agent._vprint(f"{agent.log_prefix} 🌐 Endpoint: {_base}", force=True)
# Actionable guidance for common auth errors
if classified.is_auth or classified.reason == FailoverReason.billing:
if _provider in {"openai-codex", "xai-oauth", "nous"} and status_code == 401:
if classified.reason == FailoverReason.billing and _print_billing_or_entitlement_guidance(
agent,
capability="model access",
provider=_provider,
base_url=str(_base),
model=_model,
):
pass
elif _provider == "nous" and _print_nous_entitlement_guidance(
agent,
"Nous model access",
):
pass
elif _provider in {"openai-codex", "xai-oauth", "nous"} and status_code == 401:
if _provider == "openai-codex":
agent._vprint(f"{agent.log_prefix} 💡 Codex OAuth token was rejected (HTTP 401). Your token may have been", force=True)
agent._vprint(f"{agent.log_prefix} refreshed by another client (Codex CLI, VS Code). To fix:", force=True)
@@ -2976,6 +3152,28 @@ def run_conversation(
agent._vprint(f"{agent.log_prefix} • Check credits: https://openrouter.ai/settings/credits", force=True)
else:
agent._vprint(f"{agent.log_prefix} 💡 This type of error won't be fixed by retrying.", force=True)
# Content-policy blocks deserve their own actionable
# guidance — neither "fix your API key" nor "retry won't
# help" tells the user what to actually do. The provider
# has refused this specific prompt, so the recovery is
# either a rephrase or routing to a different model.
if classified.reason == FailoverReason.content_policy_blocked:
agent._vprint(
f"{agent.log_prefix} 💡 The provider's safety filter rejected this specific prompt.",
force=True,
)
agent._vprint(
f"{agent.log_prefix} • Try rephrasing the request, narrowing the context, or splitting into smaller steps.",
force=True,
)
agent._vprint(
f"{agent.log_prefix} • Configure a fallback provider so future blocks route automatically:",
force=True,
)
agent._vprint(
f"{agent.log_prefix} hermes fallback add (interactive picker — same as `hermes model`)",
force=True,
)
logger.error(f"{agent.log_prefix}Non-retryable client error: {api_error}")
# Skip session persistence when the error is likely
# context-overflow related (status 400 + large session).
@@ -2990,6 +3188,23 @@ def run_conversation(
)
else:
agent._persist_session(messages, conversation_history)
if classified.reason == FailoverReason.content_policy_blocked:
_summary = agent._summarize_api_error(api_error)
_policy_response = (
f"⚠️ The model provider's safety filter blocked this request "
f"(not a Hermes/gateway failure).\n\n"
f"Provider message: {_summary}\n\n"
f"Try rephrasing the request, narrowing the context, or "
f"adding a fallback provider with `hermes fallback add`."
)
return {
"final_response": _policy_response,
"messages": messages,
"api_calls": api_call_count,
"completed": False,
"failed": True,
"error": f"content_policy_blocked: {_summary}",
}
return {
"final_response": None,
"messages": messages,
@@ -3011,14 +3226,32 @@ def run_conversation(
retry_count = 0
continue
# Try fallback before giving up entirely
agent._emit_status(f"⚠️ Max retries ({max_retries}) exhausted — trying fallback...")
agent._buffer_status(f"⚠️ Max retries ({max_retries}) exhausted — trying fallback...")
if agent._try_activate_fallback():
retry_count = 0
compression_attempts = 0
primary_recovery_attempted = False
continue
# Terminal — flush buffered retry/fallback trace.
agent._flush_status_buffer()
_final_summary = agent._summarize_api_error(api_error)
if is_rate_limited:
_billing_guidance = ""
if classified.reason == FailoverReason.billing:
agent._emit_status(f"❌ Billing or credits exhausted — {_final_summary}")
_billing_guidance = _billing_or_entitlement_message(
capability="model access",
provider=_provider,
base_url=str(_base),
model=_model,
)
_print_billing_or_entitlement_guidance(
agent,
capability="model access",
provider=_provider,
base_url=str(_base),
model=_model,
)
elif is_rate_limited:
agent._emit_status(f"❌ Rate limited after {max_retries} retries — {_final_summary}")
else:
agent._emit_status(f"❌ API failed after {max_retries} retries — {_final_summary}")
@@ -3063,7 +3296,12 @@ def run_conversation(
api_kwargs, reason="max_retries_exhausted", error=api_error,
)
agent._persist_session(messages, conversation_history)
_final_response = f"API call failed after {max_retries} retries: {_final_summary}"
if classified.reason == FailoverReason.billing:
_final_response = f"Billing or credits exhausted: {_final_summary}"
if _billing_guidance:
_final_response += f"\n\n{_billing_guidance}"
else:
_final_response = f"API call failed after {max_retries} retries: {_final_summary}"
if _is_stream_drop:
_final_response += (
"\n\nThe provider's stream connection keeps "
@@ -3095,9 +3333,9 @@ def run_conversation(
pass
wait_time = _retry_after if _retry_after else jittered_backoff(retry_count, base_delay=2.0, max_delay=60.0)
if is_rate_limited:
agent._emit_status(f"⏱️ Rate limited. Waiting {wait_time:.1f}s (attempt {retry_count + 1}/{max_retries})...")
agent._buffer_status(f"⏱️ Rate limited. Waiting {wait_time:.1f}s (attempt {retry_count + 1}/{max_retries})...")
else:
agent._emit_status(f"⏳ Retrying in {wait_time:.1f}s (attempt {retry_count}/{max_retries})...")
agent._buffer_status(f"⏳ Retrying in {wait_time:.1f}s (attempt {retry_count}/{max_retries})...")
logger.warning(
"Retrying API call in %ss (attempt %s/%s) %s error=%s",
wait_time,
@@ -3256,14 +3494,15 @@ def run_conversation(
if has_incomplete_scratchpad(assistant_message.content or ""):
agent._incomplete_scratchpad_retries += 1
agent._vprint(f"{agent.log_prefix}⚠️ Incomplete <REASONING_SCRATCHPAD> detected (opened but never closed)")
agent._buffer_vprint(f"⚠️ Incomplete <REASONING_SCRATCHPAD> detected (opened but never closed)")
if agent._incomplete_scratchpad_retries <= 2:
agent._vprint(f"{agent.log_prefix}🔄 Retrying API call ({agent._incomplete_scratchpad_retries}/2)...")
agent._buffer_vprint(f"🔄 Retrying API call ({agent._incomplete_scratchpad_retries}/2)...")
# Don't add the broken message, just retry
continue
else:
# Max retries - discard this turn and save as partial
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ Max retries (2) for incomplete scratchpad. Saving as partial.", force=True)
agent._incomplete_scratchpad_retries = 0
@@ -3371,9 +3610,10 @@ def run_conversation(
available = ", ".join(sorted(agent.valid_tool_names))
invalid_name = invalid_tool_calls[0]
invalid_preview = invalid_name[:80] + "..." if len(invalid_name) > 80 else invalid_name
agent._vprint(f"{agent.log_prefix}⚠️ Unknown tool '{invalid_preview}' — sending error to model for agent-correction ({agent._invalid_tool_retries}/3)")
agent._buffer_vprint(f"⚠️ Unknown tool '{invalid_preview}' — sending error to model for agent-correction ({agent._invalid_tool_retries}/3)")
if agent._invalid_tool_retries >= 3:
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ Max retries (3) for invalid tool calls exceeded. Stopping as partial.", force=True)
agent._invalid_tool_retries = 0
agent._persist_session(messages, conversation_history)
@@ -3457,16 +3697,16 @@ def run_conversation(
agent._invalid_json_retries += 1
tool_name, error_msg = invalid_json_args[0]
agent._vprint(f"{agent.log_prefix}⚠️ Invalid JSON in tool call arguments for '{tool_name}': {error_msg}")
agent._buffer_vprint(f"⚠️ Invalid JSON in tool call arguments for '{tool_name}': {error_msg}")
if agent._invalid_json_retries < 3:
agent._vprint(f"{agent.log_prefix}🔄 Retrying API call ({agent._invalid_json_retries}/3)...")
agent._buffer_vprint(f"🔄 Retrying API call ({agent._invalid_json_retries}/3)...")
# Don't add anything to messages, just retry the API call
continue
else:
# Instead of returning partial, inject tool error results so the model can recover.
# Using tool results (not user messages) preserves role alternation.
agent._vprint(f"{agent.log_prefix}⚠️ Injecting recovery tool results for invalid JSON...")
agent._buffer_vprint(f"⚠️ Injecting recovery tool results for invalid JSON...")
agent._invalid_json_retries = 0 # Reset for next attempt
# Append the assistant message with its (broken) tool_calls
@@ -3774,7 +4014,7 @@ def run_conversation(
"Empty response after tool calls — nudging model "
"to continue processing"
)
agent._emit_status(
agent._buffer_status(
"⚠️ Model returned empty after tool calls — "
"nudging to continue"
)
@@ -3820,7 +4060,7 @@ def run_conversation(
"prefilling to continue (%d/2)",
agent._thinking_prefill_retries,
)
agent._emit_status(
agent._buffer_status(
f"↻ Thinking-only response — prefilling to continue "
f"({agent._thinking_prefill_retries}/2)"
)
@@ -3855,7 +4095,7 @@ def run_conversation(
"retry %d/3 (model=%s)",
agent._empty_content_retries, agent.model,
)
agent._emit_status(
agent._buffer_status(
f"⚠️ Empty response from model — retrying "
f"({agent._empty_content_retries}/3)"
)
@@ -3874,13 +4114,13 @@ def run_conversation(
agent._empty_content_retries, agent.model,
agent.provider,
)
agent._emit_status(
agent._buffer_status(
"⚠️ Model returning empty responses — "
"switching to fallback provider..."
)
if agent._try_activate_fallback():
agent._empty_content_retries = 0
agent._emit_status(
agent._buffer_status(
f"↻ Switched to fallback: {agent.model} "
f"({agent.provider})"
)
@@ -3894,6 +4134,9 @@ def run_conversation(
# Exhausted retries and fallback chain (or no
# fallback configured). Fall through to the
# "(empty)" terminal.
# Surface the buffered retry/fallback trace so the
# user can see what was attempted before "(empty)".
agent._flush_status_buffer()
_turn_exit_reason = "empty_response_exhausted"
reasoning_text = agent._extract_reasoning(assistant_message)
agent._drop_trailing_empty_response_scaffolding(messages)
@@ -3938,6 +4181,9 @@ def run_conversation(
# Reset retry counter/signature on successful content
agent._empty_content_retries = 0
agent._thinking_prefill_retries = 0
# Successful content reached — drop any buffered retry
# status from earlier failed attempts in this turn.
agent._clear_status_buffer()
if (
agent.api_mode == "codex_responses"
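The hunks above replace immediate status output with buffered output: retry notices are buffered, flushed once retries are exhausted, and dropped when real content arrives. A minimal sketch of that pattern follows. The `StatusBuffer` class and its method names are hypothetical stand-ins, not the agent's actual `_buffer_status` / `_flush_status_buffer` / `_clear_status_buffer` implementation.

```python
from typing import Callable, List


class StatusBuffer:
    """Hypothetical stand-in for the agent's buffered status helpers."""

    def __init__(self, emit: Callable[[str], None]) -> None:
        self._emit = emit              # where flushed messages end up
        self._pending: List[str] = []  # messages held back for now

    def buffer(self, message: str) -> None:
        # Hold the message instead of showing it right away.
        self._pending.append(message)

    def flush(self) -> None:
        # Terminal failure: show the whole retry trace, in order.
        for message in self._pending:
            self._emit(message)
        self._pending.clear()

    def clear(self) -> None:
        # Success: the earlier retry noise is no longer interesting.
        self._pending.clear()


shown: List[str] = []
buf = StatusBuffer(shown.append)

# Turn 1: one retry, then success. Nothing is shown.
buf.buffer("retry 1/3")
buf.clear()

# Turn 2: retries exhausted. The trace is surfaced.
buf.buffer("retry 1/3")
buf.buffer("retry 2/3")
buf.flush()
```

With this shape, a turn that recovers on its own stays quiet, and only a turn that ends in failure surfaces its retry history.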
-4
View File
@@ -904,10 +904,6 @@ def get_cute_tool_message(
extra = f" +{len(urls)-1}" if len(urls) > 1 else ""
return _wrap(f"┊ 📄 fetch {_trunc(domain, 35)}{extra} {dur}")
return _wrap(f"┊ 📄 fetch pages {dur}")
if tool_name == "web_crawl":
url = args.get("url", "")
domain = url.replace("https://", "").replace("http://", "").split("/")[0]
return _wrap(f"┊ 🕸️ crawl {_trunc(domain, 35)} {dur}")
if tool_name == "terminal":
return _wrap(f"┊ 💻 $ {_trunc(args.get('command', ''), 42)} {dur}")
if tool_name == "process":
+89 -4
View File
@@ -44,9 +44,10 @@ class FailoverReason(enum.Enum):
payload_too_large = "payload_too_large" # 413 — compress payload
image_too_large = "image_too_large" # Native image part exceeds provider's per-image limit — shrink and retry
# Model
# Model / provider policy
model_not_found = "model_not_found" # 404 or invalid model — fallback to different model
provider_policy_blocked = "provider_policy_blocked" # Aggregator (e.g. OpenRouter) blocked the only endpoint due to account data/privacy policy
content_policy_blocked = "content_policy_blocked" # Provider safety filter rejected this prompt — deterministic per-request, don't retry unchanged
# Request format
format_error = "format_error" # 400 bad request — abort or strip + retry
@@ -97,13 +98,20 @@ _BILLING_PATTERNS = [
"insufficient_quota",
"insufficient balance",
"credit balance",
"credits exhausted",
"credits have been exhausted",
"no usable credits",
"top up your credits",
"payment required",
"billing hard limit",
"exceeded your current quota",
"account is deactivated",
"plan does not include",
"out of funds",
"run out of funds",
"balance_depleted",
"model_not_supported_on_free_tier",
"not available on the free tier",
]
# Patterns that indicate rate limiting (transient, will resolve)
@@ -282,6 +290,45 @@ _PROVIDER_POLICY_BLOCKED_PATTERNS = [
"no endpoints found matching your data policy",
]
# Provider content-policy / safety-filter blocks. Distinct from
# ``provider_policy_blocked`` above (which is an OpenRouter *account*-level
# data/privacy guardrail) — these are *per-prompt* safety decisions made by
# the upstream model provider. They are deterministic for the unchanged
# request, so retrying the same prompt three times just reproduces the same
# block and burns paid attempts on a refusal. The recovery is to switch to a
# configured fallback model/provider immediately, or surface the block to
# the user with actionable guidance if no fallback exists.
#
# Patterns are intentionally narrow — each phrase is a verbatim string from
# a specific provider's safety pipeline, not a generic word like "policy" or
# "violation" that could collide with billing/auth/format errors:
# • OpenAI Codex cybersecurity refusal (gpt-5.5, the case from #18028)
# • OpenAI moderation refusal ("violates our usage policies", with
# "usage policies" disambiguating from billing's "exceeded ... policy")
# • Anthropic safety refusal ("prompt was flagged by ... safety system")
# • OpenAI Responses content filter
_CONTENT_POLICY_BLOCKED_PATTERNS = [
# OpenAI Codex (#18028) — message may arrive without an HTTP status
"flagged for possible cybersecurity risk",
"trusted access for cyber",
# OpenAI moderation — chat completions / responses
"violates our usage policies",
"violates openai's usage policies",
"your request was flagged by",
# Anthropic safety system
"prompt was flagged by our safety",
"responses cannot be generated due to safety",
# Generic content-filter wording seen on Azure / OpenAI Responses.
# ``content_filter`` (underscore) is the OpenAI-standard error/finish
# token surfaced verbatim by their SDKs when a request is blocked.
# ``responsibleaipolicyviolation`` is Azure OpenAI's error code.
# Deliberately NOT matching the space variant ("content filter") — it
# appears in benign config descriptions and tooltip text that providers
# echo back; the underscore form is provider-specific enough.
"content_filter",
"responsibleaipolicyviolation",
]
# Auth patterns (non-status-code signals)
_AUTH_PATTERNS = [
"invalid api key",
@@ -485,6 +532,20 @@ def classify_api_error(
# ── 1. Provider-specific patterns (highest priority) ────────────
# Provider content-policy / safety-filter block. The provider has made a
# deterministic refusal decision about THIS prompt — retrying unchanged
# just reproduces the same refusal and burns paid attempts. Must run
# before status-based classification so a 400 safety block isn't
# downgraded to a generic ``format_error`` and a status-less block
# (OpenAI Codex SDK can raise without one) isn't left in the retryable
# ``unknown`` bucket. See issue #18028.
if any(p in error_msg for p in _CONTENT_POLICY_BLOCKED_PATTERNS):
return _result(
FailoverReason.content_policy_blocked,
retryable=False,
should_fallback=True,
)
# Anthropic thinking block signature invalid (400).
# Don't gate on provider — OpenRouter proxies Anthropic errors, so the
# provider may be "openrouter" even though the error is Anthropic-specific.
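The ordering argument in the hunk above (content-policy patterns must be checked before status-based classification) can be sketched as follows. This is a simplified illustration under stated assumptions: `Classification` and `classify` are hypothetical names, only three of the source patterns are included, the message is lowercased inside the function, and the flags chosen for the `format_error` branch are assumptions rather than the project's actual values.

```python
from dataclasses import dataclass
from typing import Optional

# Subset of the verbatim phrases listed in _CONTENT_POLICY_BLOCKED_PATTERNS.
CONTENT_POLICY_PATTERNS = (
    "flagged for possible cybersecurity risk",
    "violates our usage policies",
    "content_filter",
)


@dataclass(frozen=True)
class Classification:
    """Hypothetical result type for this sketch."""
    reason: str
    retryable: bool
    should_fallback: bool


def classify(error_msg: str, status_code: Optional[int] = None) -> Classification:
    msg = error_msg.lower()
    # 1. Pattern check first: a safety block is deterministic for the same
    #    prompt, so it must not be retried unchanged.
    if any(p in msg for p in CONTENT_POLICY_PATTERNS):
        return Classification("content_policy_blocked", False, True)
    # 2. Only then fall back to the HTTP status (flags here are assumed).
    if status_code == 400:
        return Classification("format_error", False, False)
    # 3. No status and no known pattern: the retryable "unknown" bucket.
    return Classification("unknown", True, False)


blocked = classify("Your request violates our usage policies.", 400)
plain_400 = classify("Malformed request body", 400)
statusless = classify("This prompt was flagged for possible cybersecurity risk")
```

If step 2 ran before step 1, `blocked` would come back as `format_error`, and `statusless` would land in the retryable bucket, which are the two failure modes the source comment describes.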
@@ -690,8 +751,13 @@ def _classify_by_status(
)
if status_code == 403:
# OpenRouter 403 "key limit exceeded" is actually billing
if "key limit exceeded" in error_msg or "spending limit" in error_msg:
# OpenRouter 403 "key limit exceeded" is actually billing. Other
# providers also use 403 for account-plan or credit exhaustion.
if (
"key limit exceeded" in error_msg
or "spending limit" in error_msg
or any(p in error_msg for p in _BILLING_PATTERNS)
):
return result_fn(
FailoverReason.billing,
retryable=False,
@@ -708,6 +774,17 @@ def _classify_by_status(
return _classify_402(error_msg, result_fn)
if status_code == 404:
# Nous API currently surfaces HA/NAS credit depletion as a paid model
# becoming unavailable on the Free Tier, returned as 404 rather than
# 402. Treat that as entitlement/billing exhaustion, not a missing
# model, so the retry loop can show credit/top-up guidance.
if any(p in error_msg for p in _BILLING_PATTERNS):
return result_fn(
FailoverReason.billing,
retryable=False,
should_rotate_credential=True,
should_fallback=True,
)
# OpenRouter policy-block 404 — distinct from "model not found".
# The model exists; the user's account privacy setting excludes the
# only endpoint serving it. Falling back to another provider won't
@@ -973,7 +1050,15 @@ def _classify_by_error_code(
should_rotate_credential=True,
)
if code_lower in {"insufficient_quota", "billing_not_active", "payment_required"}:
if code_lower in {
"insufficient_quota",
"billing_not_active",
"payment_required",
"insufficient_credits",
"no_usable_credits",
"balance_depleted",
"model_not_supported_on_free_tier",
}:
return result_fn(
FailoverReason.billing,
retryable=False,
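The 403 and 404 hunks above reuse the billing phrase list so that credit exhaustion reported under an unexpected status still classifies as billing. A reduced sketch of that idea, assuming a hypothetical `classify_status` helper, a three-phrase subset of `_BILLING_PATTERNS`, and lowercasing done inside the function; the OpenRouter policy-block 404 branch from the source is deliberately omitted here.

```python
from typing import Optional

# Subset of the phrases added to _BILLING_PATTERNS in this diff.
BILLING_PHRASES = (
    "credits exhausted",
    "out of funds",
    "not available on the free tier",
)


def classify_status(status_code: int, error_msg: str) -> Optional[str]:
    """Return a reason label, or None when this sketch has no opinion."""
    msg = error_msg.lower()
    # Billing wording wins over the status code's usual meaning.
    if status_code in (403, 404) and any(p in msg for p in BILLING_PHRASES):
        return "billing"
    if status_code == 404:
        return "model_not_found"
    return None


free_tier_404 = classify_status(404, "Model is not available on the Free Tier")
missing_404 = classify_status(404, "No such model")
```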
+39
View File
@@ -0,0 +1,39 @@
"""Best-effort early import for the OpenAI SDK's native streaming parser.
The OpenAI SDK imports ``jiter`` while constructing streaming chat-completion
responses. On some Windows installs the native extension can be imported
directly from the Hermes venv, but the first import fails when it happens later
inside the threaded streaming request path. Loading it once during agent
package import avoids that import-order failure while preserving the normal
SDK error path for genuinely missing or broken installs.
"""
from __future__ import annotations
import importlib
_JITER_PRELOADED = False
_JITER_PRELOAD_ERROR: Exception | None = None
def preload_jiter_native_extension() -> bool:
"""Import jiter's native extension early if it is available."""
global _JITER_PRELOADED, _JITER_PRELOAD_ERROR
if _JITER_PRELOADED:
return True
try:
importlib.import_module("jiter.jiter")
from jiter import from_json as _from_json # noqa: F401
except Exception as exc:
_JITER_PRELOAD_ERROR = exc
return False
_JITER_PRELOADED = True
_JITER_PRELOAD_ERROR = None
return True
preload_jiter_native_extension()
+2
View File
@@ -141,6 +141,8 @@ DEFAULT_CONTEXT_LENGTHS = {
# fuzzy-match collisions (e.g. "anthropic/claude-sonnet-4" is a
# substring of "anthropic/claude-sonnet-4.6").
# OpenRouter-prefixed models resolve via OpenRouter live API or models.dev.
"claude-opus-4-8": 1000000,
"claude-opus-4.8": 1000000,
"claude-opus-4-7": 1000000,
"claude-opus-4.7": 1000000,
"claude-opus-4-6": 1000000,
+1 -1
View File
@@ -258,7 +258,7 @@ def emit_stream_drop(
except Exception:
pass
try:
agent._emit_status(
agent._buffer_status(
f"⚠️ {provider} stream {kind} ({type(error).__name__}){_suffix} "
f"— reconnecting, retry {attempt}/{max_attempts}"
)
+28
View File
@@ -83,6 +83,34 @@ _UTC_NOW = lambda: datetime.now(timezone.utc)
# Official docs snapshot entries. Models whose published pricing and cache
# semantics are stable enough to encode exactly.
_OFFICIAL_DOCS_PRICING: Dict[tuple[str, str], PricingEntry] = {
# ── Anthropic Claude 4.8 ─────────────────────────────────────────────
# Same $5/$25 base pricing as 4.6/4.7. Fast-mode variant is a separate
# model ID with 2x premium (vs the 6x premium on older Opus generations).
# Source: https://openrouter.ai/anthropic/claude-opus-4.8
(
"anthropic",
"claude-opus-4-8",
): PricingEntry(
input_cost_per_million=Decimal("5.00"),
output_cost_per_million=Decimal("25.00"),
cache_read_cost_per_million=Decimal("0.50"),
cache_write_cost_per_million=Decimal("6.25"),
source="official_docs_snapshot",
source_url="https://platform.claude.com/docs/en/about-claude/pricing",
pricing_version="anthropic-pricing-2026-05",
),
(
"anthropic",
"claude-opus-4-8-fast",
): PricingEntry(
input_cost_per_million=Decimal("10.00"),
output_cost_per_million=Decimal("50.00"),
cache_read_cost_per_million=Decimal("1.00"),
cache_write_cost_per_million=Decimal("12.50"),
source="official_docs_snapshot",
source_url="https://openrouter.ai/anthropic/claude-opus-4.8-fast",
pricing_version="anthropic-pricing-2026-05",
),
# ── Anthropic Claude 4.7 ─────────────────────────────────────────────
# Opus 4.5/4.6/4.7 share $5/$25 pricing (new tokenizer, up to 35% more
# tokens for the same text).
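As a worked example of the per-million rates in the two new entries, the sketch below prices the same request on the base and fast variants. The `estimate_cost` helper and the token counts are illustrative assumptions; cache read/write costs are left out for brevity.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_per_million: Decimal,
    output_per_million: Decimal,
) -> Decimal:
    """Cost in USD for one request, ignoring cache reads and writes."""
    total = (
        Decimal(input_tokens) * input_per_million
        + Decimal(output_tokens) * output_per_million
    )
    return total / MILLION


# Hypothetical request: 200k input tokens, 10k output tokens.
base = estimate_cost(200_000, 10_000, Decimal("5.00"), Decimal("25.00"))
fast = estimate_cost(200_000, 10_000, Decimal("10.00"), Decimal("50.00"))

print(base)  # 1.25
print(fast)  # 2.5
```

The fast variant comes out at exactly twice the base cost, matching the 2x premium stated in the comment.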
+6 -42
View File
@@ -61,14 +61,14 @@ from typing import Any, Dict, List
class WebSearchProvider(abc.ABC):
"""Abstract base class for a web search/extract/crawl backend.
"""Abstract base class for a web search/extract backend.
Subclasses must implement :meth:`is_available` and at least one of
:meth:`search` / :meth:`extract` / :meth:`crawl`. The
:meth:`supports_search` / :meth:`supports_extract` / :meth:`supports_crawl`
capability flags let the registry route each tool call to the right
provider, and let multi-capability providers (Firecrawl, Tavily, Exa,
...) advertise multiple capabilities from a single class.
:meth:`search` / :meth:`extract`. The :meth:`supports_search` /
:meth:`supports_extract` capability flags let the registry route each
tool call to the right provider, and let multi-capability providers
(Firecrawl, Tavily, Exa, ...) advertise multiple capabilities from a
single class.
"""
@property
@@ -113,22 +113,6 @@ class WebSearchProvider(abc.ABC):
"""
return False
def supports_crawl(self) -> bool:
"""Return True if this provider implements :meth:`crawl`.
Crawl differs from extract in that the agent provides a *seed URL*
and the provider walks linked pages on its own, which is useful for
documentation sites where the agent doesn't know all relevant
URLs upfront. Tavily is the only built-in backend that natively
crawls today; Firecrawl provides a similar capability that we
don't currently surface as a tool.
Providers that don't crawl should leave this as False; the
dispatcher in :func:`tools.web_tools.web_crawl_tool` will fall
back to its auxiliary-model summarization path.
"""
return False
def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
"""Execute a web search.
@@ -173,26 +157,6 @@ class WebSearchProvider(abc.ABC):
f"{self.name} does not support extract (override supports_extract)"
)
def crawl(self, url: str, **kwargs: Any) -> Any:
"""Crawl a seed URL and return results.
Override when :meth:`supports_crawl` returns True. The default
raises NotImplementedError; callers should gate on
:meth:`supports_crawl` before calling.
Return shape: ``{"results": [{"url": str, "title": str,
"content": str, ...}, ...]}``, matching what
:func:`tools.web_tools.web_crawl_tool` post-processing expects.
Implementations MAY be ``async def``.
``kwargs`` may carry forward-compat fields (e.g. ``max_depth``,
``include_domains``); implementations should ignore unknown keys.
"""
raise NotImplementedError(
f"{self.name} does not support crawl (override supports_crawl)"
)
def get_setup_schema(self) -> Dict[str, Any]:
"""Return provider metadata for the ``hermes tools`` picker.
+6 -23
View File
@@ -11,7 +11,7 @@ Active selection
----------------
The active provider is chosen by configuration with this precedence:
1. ``web.search_backend`` / ``web.extract_backend`` / ``web.crawl_backend``
1. ``web.search_backend`` / ``web.extract_backend``
(per-capability override).
2. ``web.backend`` (shared fallback).
3. If exactly one capability-eligible provider is registered AND available,
@@ -24,10 +24,10 @@ The active provider is chosen by configuration with this precedence:
5. Otherwise ``None``: the tool surfaces a helpful error pointing at
``hermes tools``.
The capability filter (``supports_search`` / ``supports_extract`` /
``supports_crawl``) is applied at every step so a search-only provider
(``brave-free``) configured as ``web.extract_backend`` correctly falls
through to an extract-capable backend.
The capability filter (``supports_search`` / ``supports_extract``) is
applied at every step so a search-only provider (``brave-free``)
configured as ``web.extract_backend`` correctly falls through to an
extract-capable backend.
"""
from __future__ import annotations
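The first two precedence steps and the capability filter described in the module docstring above can be sketched like this. `Provider`, `resolve`, and the in-memory `registry` / `config` dictionaries are hypothetical stand-ins for the real registry and config reader, and the later fallback steps (single eligible provider, legacy preference order) are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Provider:
    """Hypothetical provider record with capability flags."""
    name: str
    search: bool = False
    extract: bool = False

    def supports(self, capability: str) -> bool:
        return {"search": self.search, "extract": self.extract}.get(capability, False)


def resolve(
    registry: Dict[str, Provider],
    config: Dict[str, str],
    capability: str,
) -> Optional[Provider]:
    # Step 1: per-capability override; step 2: shared fallback.
    for key in (f"{capability}_backend", "backend"):
        name = config.get(key)
        provider = registry.get(name) if name else None
        # The capability filter applies at every step, so a provider that
        # cannot do the job is skipped rather than returned.
        if provider is not None and provider.supports(capability):
            return provider
    return None  # later fallback steps are out of scope for this sketch


registry = {
    "brave-free": Provider("brave-free", search=True),
    "tavily": Provider("tavily", search=True, extract=True),
}
config = {"extract_backend": "brave-free", "backend": "tavily"}

chosen = resolve(registry, config, "extract")
```

Here `brave-free` is configured for extract but only supports search, so resolution falls through to the shared `backend` entry, which is the case the docstring calls out.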
@@ -131,7 +131,7 @@ _LEGACY_PREFERENCE = (
def _resolve(configured: Optional[str], *, capability: str) -> Optional[WebSearchProvider]:
"""Resolve the active provider for a capability ("search" | "extract" | "crawl").
"""Resolve the active provider for a capability ("search" | "extract").
Resolution rules (in order):
@@ -168,8 +168,6 @@ def _resolve(configured: Optional[str], *, capability: str) -> Optional[WebSearc
return bool(p.supports_search())
if capability == "extract":
return bool(p.supports_extract())
if capability == "crawl":
return bool(p.supports_crawl())
return False
def _is_available_safe(p: WebSearchProvider) -> bool:
@@ -241,21 +239,6 @@ def get_active_extract_provider() -> Optional[WebSearchProvider]:
return _resolve(explicit, capability="extract")
def get_active_crawl_provider() -> Optional[WebSearchProvider]:
"""Resolve the currently-active web crawl provider.
Reads ``web.crawl_backend`` (preferred) or ``web.backend`` (shared
fallback) from config.yaml; falls back per the module docstring.
Crawl is a niche capability: among built-in providers, only Tavily and
Firecrawl implement it. Callers should expect ``None`` and fall back to
a different strategy (e.g. summarize-via-LLM) when neither is
configured.
"""
explicit = _read_config_key("web", "crawl_backend") or _read_config_key("web", "backend")
return _resolve(explicit, capability="crawl")
def _reset_for_tests() -> None:
"""Clear the registry. **Test-only.**"""
with _lock:
+2 -1
View File
@@ -10667,7 +10667,8 @@ class HermesCLI:
if not reqs.get("stt_available", reqs.get("stt_key_set")):
raise RuntimeError(
"Voice mode requires an STT provider for transcription.\n"
"Option 1: pip install faster-whisper (free, local)\n"
"Option 1: uv pip install faster-whisper "
"(free, local; `pip install faster-whisper` also works if pip is on PATH)\n"
"Option 2: Set GROQ_API_KEY (free tier)\n"
"Option 3: Set VOICE_TOOLS_OPENAI_KEY (paid)"
)
+87
View File
@@ -0,0 +1,87 @@
#!/bin/sh
# shellcheck shell=sh
# /opt/hermes/bin/hermes — `docker exec` privilege-drop shim.
#
# Background
# ----------
# The s6 image runs the supervised gateway/main process as the unprivileged
# `hermes` user (UID 10000). When an operator runs `docker exec <c> hermes ...`
# the default UID is root (0), and any file the command writes under
# $HERMES_HOME — auth.json, .env, config.yaml — ends up root-owned and
# unreadable to the supervised gateway. The most common manifestation: the
# user runs `docker exec <c> hermes login`, this writes
# /opt/data/auth.json as root:root mode 0600, and from then on the gateway
# returns "Provider authentication failed: Hermes is not logged into Nous
# Portal" on every incoming message — even though `docker exec <c> hermes
# chat -q ping` (also running as root) succeeds because root happens to be
# able to read its own root-owned file. See systematic-debugging skill
# notes attached to this fix.
#
# Fix
# ---
# This shim sits at /opt/hermes/bin/hermes and is placed earliest on PATH.
# When invoked as root, it drops to the hermes user (via s6-setuidgid)
# before exec'ing the real venv binary, so anything that writes under
# $HERMES_HOME is uid-aligned with the supervised processes. When invoked
# as any non-root UID — including the supervised processes themselves,
# `docker exec --user hermes`, kanban subagents, etc. — it short-circuits
# straight to the venv binary with no privilege change. Net: one extra
# fork on the docker-exec-as-root path, zero behavioral change on every
# other path.
#
# Recursion safety: the shim exec's the venv binary by *absolute path*
# (/opt/hermes/.venv/bin/hermes), so the second hop cannot re-enter this
# shim regardless of PATH state. No sentinel env var needed.
#
# Opt-out: set HERMES_DOCKER_EXEC_AS_ROOT=1 (also true/yes, in lower, UPPER, or Title case)
# to keep running as root. Reserved for diagnostic sessions where the
# operator deliberately wants root semantics — e.g. inspecting root-only
# state via the hermes CLI. Default is to drop.
set -e
REAL=/opt/hermes/.venv/bin/hermes
# Defensive: if the venv binary is missing (corrupted image, partial
# install), fail loudly rather than silently masking it.
if [ ! -x "$REAL" ]; then
echo "hermes-shim: $REAL not found or not executable" >&2
exit 127
fi
# Already non-root? Just exec the real binary. This is the hot path for
# supervised processes (uid 10000) and for `docker exec --user hermes`.
if [ "$(id -u)" != "0" ]; then
exec "$REAL" "$@"
fi
# Root, with opt-out set? Honor it.
case "${HERMES_DOCKER_EXEC_AS_ROOT:-}" in
1|true|TRUE|True|yes|YES|Yes)
exec "$REAL" "$@"
;;
esac
# Root, no opt-out. Drop to the hermes user.
#
# s6-setuidgid lives under /command/ which is NOT on `docker exec`'s PATH
# (s6-overlay only puts /command/ on PATH for supervision-tree children).
# Reference it by absolute path so the drop is robust against PATH
# manipulation.
S6_SUID=/command/s6-setuidgid
if [ ! -x "$S6_SUID" ]; then
# Non-s6 image (someone stripped s6-overlay, or a hand-built variant).
# Fail loud rather than silently re-execing as root and leaking the
# bug this shim exists to prevent.
echo "hermes-shim: $S6_SUID not found; refusing to silently run as root." >&2
echo "hermes-shim: re-run with --user hermes or set HERMES_DOCKER_EXEC_AS_ROOT=1." >&2
exit 126
fi
# Reset HOME to the hermes user's home before dropping privileges. Without
# this, $HOME stays /root and any library that resolves paths off $HOME
# (XDG caches, lockfiles, .config writes) will try to write to /root and
# fail with EACCES. Mirrors main-wrapper.sh.
export HOME=/opt/data
exec "$S6_SUID" hermes "$REAL" "$@"
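The shim's branch order can be summarized as a small decision function. The sketch below is a Python model written for illustration only, not part of the image: `shim_action` and its return labels are hypothetical, and the filesystem checks are passed in as booleans so the logic can be exercised without a container.

```python
# Spellings accepted by the shim's `case` statement. Other mixed-case
# variants (for example "tRuE") do not match and fall through to the drop.
OPT_OUT_VALUES = {"1", "true", "TRUE", "True", "yes", "YES", "Yes"}


def shim_action(uid: int, opt_out: str, real_present: bool, setuidgid_present: bool) -> str:
    """Model of the shim's decisions, in the same order as the script."""
    if not real_present:
        return "fail-127"            # venv binary missing or not executable
    if uid != 0:
        return "exec-real"           # already unprivileged: hot path
    if opt_out in OPT_OUT_VALUES:
        return "exec-real-as-root"   # operator asked to stay root
    if not setuidgid_present:
        return "fail-126"            # refuse to silently run as root
    return "drop-to-hermes"          # s6-setuidgid hermes <real binary>


supervised = shim_action(10000, "", True, True)
docker_exec_root = shim_action(0, "", True, True)
diagnostic = shim_action(0, "yes", True, True)
```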
+4
View File
@@ -19,6 +19,10 @@ case "${HERMES_DASHBOARD:-}" in
;;
esac
# with-contenv repopulates HOME from /init as /root. Reset it before
# dropping privileges so HOME-anchored state lands under /opt/data.
export HOME=/opt/data
cd /opt/data
# shellcheck disable=SC1091
. /opt/hermes/.venv/bin/activate
+11 -16
View File
@@ -24,7 +24,8 @@ Exposes an HTTP server with endpoints:
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect to hermes-agent
through this adapter by pointing at http://localhost:8642/v1.
through this adapter by pointing at http://localhost:8642/v1 and
authenticating with API_SERVER_KEY.
Requires:
- aiohttp (already available in the gateway)
@@ -844,11 +845,11 @@ class APIServerAdapter(BasePlatformAdapter):
Validate Bearer token from Authorization header.
Returns None if auth is OK, or a 401 web.Response on failure.
If no API key is configured, all requests are allowed (only when API
server is local).
connect() refuses to start the API server without API_SERVER_KEY, so
the no-key branch only exists for tests or unsupported manual wiring.
"""
if not self._api_key:
return None # No key configured — allow all (local-only use)
return None
auth_header = request.headers.get("Authorization", "")
if auth_header.startswith("Bearer "):
@@ -4099,11 +4100,13 @@ class APIServerAdapter(BasePlatformAdapter):
if hasattr(sweep_task, "add_done_callback"):
sweep_task.add_done_callback(self._background_tasks.discard)
# Refuse to start network-accessible without authentication
if is_network_accessible(self._host) and not self._api_key:
# Refuse to start without authentication. The API server can
# dispatch terminal-capable agent work, so every deployment needs
# an explicit API_SERVER_KEY regardless of bind address.
if not self._api_key:
logger.error(
"[%s] Refusing to start: binding to %s requires API_SERVER_KEY. "
"Set API_SERVER_KEY or use the default 127.0.0.1.",
"[%s] Refusing to start: API_SERVER_KEY is required for the API server, "
"including loopback-only binds on %s.",
self.name, self._host,
)
return False
@@ -4141,14 +4144,6 @@ class APIServerAdapter(BasePlatformAdapter):
await self._site.start()
self._mark_connected()
if not self._api_key:
logger.warning(
"[%s] ⚠️ No API key configured (API_SERVER_KEY / platforms.api_server.key). "
"All requests will be accepted without authentication. "
"Set an API key for production deployments to prevent "
"unauthorized access to sessions, responses, and cron jobs.",
self.name,
)
logger.info(
"[%s] API server listening on http://%s:%d (model: %s)",
self.name, self._host, self._port, self._model_name,
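The two auth changes above (refuse to start without a key, then validate a Bearer token per request) can be sketched without aiohttp. `may_start` and `check_bearer` are hypothetical helpers, and the constant-time comparison via `hmac.compare_digest` is an assumption of this sketch; the diff does not show how the adapter compares tokens.

```python
import hmac
from typing import Optional


def may_start(api_key: Optional[str]) -> bool:
    # Mirrors the new connect() rule: no key, no server, on any bind address.
    return bool(api_key)


def check_bearer(auth_header: str, api_key: Optional[str]) -> bool:
    if not api_key:
        # No-key branch: reachable only in tests or manual wiring, since
        # startup is refused without a key.
        return True
    prefix = "Bearer "
    if not auth_header.startswith(prefix):
        return False
    token = auth_header[len(prefix):].strip()
    # Constant-time comparison (assumed, see the note above).
    return hmac.compare_digest(token.encode(), api_key.encode())


ok = check_bearer("Bearer s3cret", "s3cret")
wrong = check_bearer("Bearer nope", "s3cret")
```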
+20 -2
View File
@@ -25,6 +25,7 @@ from gateway.platforms.base import (
MessageEvent,
MessageType,
SendResult,
is_network_accessible,
)
logger = logging.getLogger(__name__)
@@ -132,12 +133,24 @@ class MSGraphWebhookAdapter(BasePlatformAdapter):
def set_notification_scheduler(self, scheduler: Optional[NotificationScheduler]) -> None:
self._notification_scheduler = scheduler
def _source_allowlist_required_but_missing(self) -> bool:
return is_network_accessible(self._host) and not self._allowed_source_networks
async def connect(self) -> bool:
if self._client_state is None:
logger.error(
"[msgraph_webhook] Refusing to start without extra.client_state configured"
)
return False
if self._source_allowlist_required_but_missing():
logger.error(
"[msgraph_webhook] Refusing to start: binding to %s requires "
"extra.allowed_source_cidrs. Configure the Microsoft Graph "
"source CIDRs or bind to loopback (127.0.0.1/::1) behind a "
"tunnel or reverse proxy.",
self._host,
)
return False
app = web.Application()
app.router.add_get(self._health_path, self._handle_health)
@@ -177,6 +190,8 @@ class MSGraphWebhookAdapter(BasePlatformAdapter):
return {"name": chat_id, "type": "webhook"}
async def _handle_health(self, request: "web.Request") -> "web.Response":
if not self._source_ip_allowed(request):
return web.Response(status=403)
return web.json_response(
{
"status": "ok",
@@ -271,9 +286,12 @@ class MSGraphWebhookAdapter(BasePlatformAdapter):
def _source_ip_allowed(self, request: "web.Request") -> bool:
"""Return True if the request's source IP is in the configured allowlist.
When ``allowed_source_cidrs`` is empty (the default), everything is
allowed, which preserves behavior for dev tunnels / localhost setups.
Loopback-only binds may omit ``allowed_source_cidrs`` for local reverse
proxies and dev tunnels. Network-accessible binds fail closed until an
explicit CIDR allowlist is configured.
"""
if self._source_allowlist_required_but_missing():
return False
if not self._allowed_source_networks:
return True
peer = request.remote or ""
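The fail-closed rule above (loopback binds may omit the allowlist, network-accessible binds may not) can be sketched with the standard `ipaddress` module. `is_loopback_bind` is a hypothetical stand-in for the inverse of the project's `is_network_accessible` helper, whose exact semantics are not shown in this diff, and the CIDR values are made-up examples.

```python
import ipaddress
from typing import Sequence


def is_loopback_bind(host: str) -> bool:
    """Hypothetical check: True for 127.0.0.1, ::1, or "localhost"."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return host == "localhost"


def source_allowed(peer: str, bind_host: str, allowed_cidrs: Sequence[str]) -> bool:
    networks = [ipaddress.ip_network(cidr) for cidr in allowed_cidrs]
    if not networks:
        # Empty allowlist: open on loopback, closed on anything reachable.
        return is_loopback_bind(bind_host)
    try:
        addr = ipaddress.ip_address(peer)
    except ValueError:
        return False  # unparseable peer address is rejected
    # Membership is False when address and network versions differ.
    return any(addr in net for net in networks)


open_bind_no_list = source_allowed("10.0.0.5", "0.0.0.0", [])
loopback_no_list = source_allowed("127.0.0.1", "127.0.0.1", [])
listed_peer = source_allowed("52.1.2.3", "0.0.0.0", ["52.0.0.0/8"])
```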
+94 -2
View File
@@ -8017,7 +8017,8 @@ class GatewayRunner:
"🎤 I received your voice message but can't transcribe it — "
"no speech-to-text provider is configured.\n\n"
"To enable voice: install faster-whisper "
"(`pip install faster-whisper` in the Hermes venv) "
"(`uv pip install faster-whisper` in the Hermes venv; "
"`pip install faster-whisper` also works if pip is on PATH) "
"and set `stt.enabled: true` in config.yaml, "
"then /restart the gateway."
)
@@ -18192,6 +18193,72 @@ class GatewayRunner:
return response
def _run_planned_stop_watcher(
stop_event: threading.Event,
runner,
loop: asyncio.AbstractEventLoop,
shutdown_handler,
*,
poll_interval: float = 0.5,
) -> None:
"""Poll for the planned-stop marker and trigger graceful shutdown.
On Windows, ``asyncio.add_signal_handler`` raises NotImplementedError
for SIGTERM/SIGINT, so the standard signal-driven shutdown path
never runs when ``hermes gateway stop`` signals the gateway. The
consequence is that the drain loop is skipped: in-flight agent
sessions are killed mid-turn and ``resume_pending`` is never set,
so the next gateway boot has no idea those sessions need to be
auto-resumed (issue #33778, v0.13.0 session-resume feature broken
on native Windows).
This watcher runs on every platform (cheap, defensive) and bridges
the gap on Windows by translating a filesystem marker into the
same shutdown-handler invocation a real SIGTERM would have produced
on POSIX. The CLI's ``hermes_cli.gateway_windows.stop()`` writes
the marker via ``write_planned_stop_marker(pid)`` and then waits
for the gateway PID to exit; this watcher is what makes that
exit happen cleanly.
On POSIX this is a no-op safety net: the signal handler always
races us to consuming the marker file because it fires synchronously
from the kernel's signal delivery.
Args:
stop_event: set by start_gateway() during normal shutdown
to tell the watcher to exit.
runner: the GatewayRunner instance; we check ``_running`` and
``_draining`` to avoid triggering shutdown if the gateway
is already in one of those states.
loop: the asyncio event loop the shutdown handler must run on.
shutdown_handler: same callable that's wired to SIGTERM —
tolerates a ``None`` signal argument (planned stop case)
and consumes the marker via
``consume_planned_stop_marker_for_self()``.
poll_interval: seconds between marker checks. 0.5s gives a
responsive shutdown without burning CPU.
"""
from gateway.status import _get_planned_stop_marker_path
marker_path = _get_planned_stop_marker_path()
while not stop_event.is_set():
try:
if (
marker_path.exists()
and not getattr(runner, "_draining", False)
and getattr(runner, "_running", False)
):
# Drive the same path as a real signal handler.
# Pass signal=None — the handler tolerates that and consumes
# the marker via consume_planned_stop_marker_for_self,
# which also validates target_pid + start_time match us.
loop.call_soon_threadsafe(shutdown_handler, None)
# Done — the handler will set _draining; we exit on next tick.
break
except Exception as _e:
logger.debug("Planned-stop watcher tick error: %s", _e)
stop_event.wait(poll_interval)
def _start_cron_ticker(stop_event: threading.Event, adapters=None, loop=None, interval: int = 60):
"""
Background thread that ticks the cron scheduler at a regular interval.
@@ -18596,7 +18663,28 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
pass
else:
logger.info("Skipping signal handlers (not running in main thread).")
# Windows fallback: asyncio.add_signal_handler raises NotImplementedError
# on Windows, so `hermes gateway stop`'s SIGTERM (which Python maps to
# TerminateProcess on Windows) never invokes shutdown_signal_handler.
# That means the drain loop never runs, mark_resume_pending never fires,
# and sessions are silently lost across restarts (issue #33778).
#
# The fix is a marker-polling thread: `hermes gateway stop` writes the
# planned-stop marker BEFORE killing, and this thread notices it and
# drives the same shutdown path the signal handler would have. Runs
# on every platform (cheap, defensive) so non-signal-bearing
# environments (Windows native, sandboxed CI runners that mask
# SIGTERM) still get a clean drain.
_planned_stop_watcher_stop = threading.Event()
_planned_stop_watcher_thread = threading.Thread(
target=_run_planned_stop_watcher,
args=(_planned_stop_watcher_stop, runner, loop, shutdown_signal_handler),
daemon=True,
name="planned-stop-watcher",
)
_planned_stop_watcher_thread.start()
# Claim the PID file BEFORE bringing up any platform adapters.
# This closes the --replace race window: two concurrent `gateway run
# --replace` invocations both pass the termination-wait above, but
@@ -18674,6 +18762,10 @@ async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool =
cron_stop.set()
cron_thread.join(timeout=5)
# Stop the planned-stop watcher (daemon=True so this is belt-and-suspenders).
_planned_stop_watcher_stop.set()
_planned_stop_watcher_thread.join(timeout=2)
# Close MCP server connections
try:
from tools.mcp_tool import shutdown_mcp_servers
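The marker-polling idea behind the planned-stop watcher in this file can be shown in isolation. This is a stripped-down sketch under stated assumptions: `watch_marker` is a hypothetical function, the marker lives in a temporary directory, and the callback is invoked directly, whereas the gateway schedules its handler on the event loop with `loop.call_soon_threadsafe`.

```python
import tempfile
import threading
from pathlib import Path
from typing import Callable


def watch_marker(
    stop_event: threading.Event,
    marker: Path,
    on_stop: Callable[[], None],
    poll_interval: float = 0.05,
) -> None:
    """Poll for a marker file; fire the callback once, then exit."""
    while not stop_event.is_set():
        if marker.exists():
            on_stop()
            break
        # wait() doubles as the sleep and returns early when the event is set.
        stop_event.wait(poll_interval)


fired = threading.Event()
stop = threading.Event()

with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "planned-stop"
    thread = threading.Thread(
        target=watch_marker, args=(stop, marker, fired.set), daemon=True
    )
    thread.start()
    marker.write_text("stop")          # what the CLI's stop command would do
    triggered = fired.wait(timeout=2)  # watcher notices the marker
    stop.set()                         # normal shutdown path: tell it to exit
    thread.join(timeout=2)
```

Using `stop_event.wait(poll_interval)` instead of `time.sleep` lets shutdown interrupt the poll immediately, which is why the join timeout can stay short.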
+14 -9
View File
@@ -552,11 +552,6 @@ class GatewayStreamConsumer:
self._last_edit_time = time.monotonic()
if got_done:
# Record that the final content reached the user even
# if the cosmetic final edit below fails.
if current_update_visible and self._accumulated:
self._final_content_delivered = True
# Final edit without cursor. If progressive editing failed
# mid-stream, send a single continuation/fallback message
# here instead of letting the base gateway path send the
@@ -573,6 +568,7 @@ class GatewayStreamConsumer:
# final edit — but only for adapters that don't
# need an explicit finalize signal.
self._final_response_sent = True
self._final_content_delivered = True
elif self._message_id:
# Either the mid-stream edit didn't run (no
# visible update this tick) OR the adapter needs
@@ -580,8 +576,12 @@ class GatewayStreamConsumer:
self._final_response_sent = await self._send_or_edit(
self._accumulated, finalize=True,
)
if self._final_response_sent:
self._final_content_delivered = True
elif not self._already_sent:
self._final_response_sent = await self._send_or_edit(self._accumulated)
if self._final_response_sent:
self._final_content_delivered = True
return
if commentary_text is not None:
@@ -641,6 +641,7 @@ class GatewayStreamConsumer:
# "Let me search…") had been delivered, not the real answer.
if _best_effort_ok and not self._final_response_sent:
self._final_response_sent = True
self._final_content_delivered = True
except Exception as e:
logger.error("Stream consumer error: %s", e)
@@ -778,6 +779,7 @@ class GatewayStreamConsumer:
pass
self._already_sent = True
self._final_response_sent = True
self._final_content_delivered = True
return
raw_limit = getattr(self.adapter, "MAX_MESSAGE_LENGTH", 4096)
@@ -814,11 +816,13 @@ class GatewayStreamConsumer:
if not result or not result.success:
if sent_any_chunk:
# Some continuation text already reached the user. Suppress
# the base gateway final-send path so we don't resend the
# full response and create another duplicate.
# Some continuation text already reached the user, but not
# the full response. Do NOT set _final_response_sent — the
# base gateway final-send path should still deliver the
# complete response so the user gets the full answer.
# Suppress only _already_sent to avoid a duplicate send
# of the same partial content.
self._already_sent = True
self._final_response_sent = True
self._message_id = last_message_id
self._last_sent_text = last_successful_chunk
self._fallback_prefix = ""
@@ -856,6 +860,7 @@ class GatewayStreamConsumer:
self._message_id = last_message_id
self._already_sent = True
self._final_response_sent = True
self._final_content_delivered = True
self._last_sent_text = chunks[-1]
self._fallback_prefix = ""
+2 -2
@@ -14,8 +14,8 @@ Provides subcommands for:
import os
import sys
__version__ = "0.14.0"
__release_date__ = "2026.5.16"
__version__ = "0.15.0"
__release_date__ = "2026.5.28"
def _ensure_utf8():
+178 -40
@@ -802,16 +802,18 @@ def format_auth_error(error: Exception) -> str:
return f"{error} Run `hermes model` to re-authenticate."
if error.code == "subscription_required":
return (
"No active paid subscription found on Nous Portal. "
"Please purchase/activate a subscription, then retry."
)
if error.provider == "nous":
return _format_nous_entitlement_auth_error(error)
return "No active paid subscription found. Please purchase/activate a subscription, then retry."
if error.code == "insufficient_credits":
return (
"Subscription credits are exhausted. "
"Top up/renew credits in Nous Portal, then retry."
)
if error.provider == "nous":
return _format_nous_entitlement_auth_error(error)
return "Subscription credits are exhausted. Top up/renew credits, then retry."
if error.code in {"subscription_expired", "no_usable_credits", "account_missing"}:
if error.provider == "nous":
return _format_nous_entitlement_auth_error(error)
if error.code == "temporarily_unavailable":
return f"{error} Please retry in a few seconds."
@@ -819,6 +821,25 @@ def format_auth_error(error: Exception) -> str:
return str(error)
def _format_nous_entitlement_auth_error(error: AuthError) -> str:
try:
from hermes_cli.nous_account import (
format_nous_portal_entitlement_message,
get_nous_portal_account_info,
)
account_info = get_nous_portal_account_info(force_fresh=True)
message = format_nous_portal_entitlement_message(
account_info,
capability="Nous model access",
)
if message:
return message
except Exception:
pass
return f"{error} Check credits or billing in Nous Portal, then retry."
def _token_fingerprint(token: Any) -> Optional[str]:
"""Return a short hash fingerprint for telemetry without leaking token bytes."""
if not isinstance(token, str):
@@ -3160,6 +3181,9 @@ def _prompt_manual_callback_paste(redirect_uri: str) -> dict:
print("not on your laptop) — that is expected. Copy the FULL URL")
print("from your browser's address bar of that failed page and paste")
print("it below. A bare '?code=...&state=...' fragment also works.")
print("If the consent page shows the authorization code in-page")
print("(xAI's current behavior) rather than redirecting, paste the")
print("bare code value on its own.")
print("───────────────────────────────────────────────────────────────")
try:
raw = input("Callback URL: ")
@@ -3291,16 +3315,38 @@ def _sync_codex_pool_entries(
tokens: Dict[str, str],
last_refresh: Optional[str],
) -> None:
"""Mirror a fresh Codex re-auth into the credential_pool singleton entries.
"""Mirror a fresh Codex re-auth into the credential_pool OAuth entries.
The runtime selects credentials from ``credential_pool.openai-codex``, not
from ``providers.openai-codex.tokens``. A re-auth invalidates the prior
OAuth pair server-side, but the pool's ``device_code`` entry keeps holding
the now-consumed refresh token plus any stale error markers so the next
request spends a dead token and gets a 401 ``token_invalidated``. Update
the singleton-seeded entries in lockstep with the provider tokens and clear
the error state so the fresh credentials take effect immediately. Manual
(``manual:*``) entries are independent credentials and are left untouched.
OAuth pair server-side, but pool entries keep holding the now-consumed
refresh token plus any stale error markers so the next request spends a
dead token and gets a 401 ``token_invalidated``.
What gets refreshed:
* ``device_code``: the singleton-seeded entry written by the device-code
OAuth flow when the user logged in via ``hermes setup`` / the model
picker. Always synced with the fresh tokens.
* ``manual:device_code``: entries created by ``hermes auth add openai-codex``
that use the same device-code OAuth mechanism. An interactive re-auth
proves the user owns the ChatGPT account, so it is safe (and expected)
to refresh these entries too. Without this, a user who once ran the
``hermes auth add`` workaround for #33000 would silently leave that
manual entry stale on every subsequent re-auth, recreating the issue
reported in #33538.
What does NOT get refreshed:
* ``manual:api_key`` and any other non-device-code manual sources: those
are independent credentials (an explicit API key, a different ChatGPT
account, etc.) and must not be overwritten by a single re-auth.
Error markers (``last_status``, ``last_error_*``) are also cleared on
every device-code-backed entry (even those whose tokens we did not
rewrite) so that an interactive re-auth gives every relevant pool entry
a fresh selection chance instead of leaving them marked unhealthy from a
pre-re-auth 401.
"""
access_token = tokens.get("access_token")
if not access_token:
@@ -3312,8 +3358,15 @@ def _sync_codex_pool_entries(
entries = pool.get("openai-codex")
if not isinstance(entries, list):
return
# Sources whose tokens should be rewritten by a fresh Codex device-code
# OAuth re-auth. ``manual:api_key`` and unknown sources are intentionally
# excluded — they represent independent credentials.
REFRESHABLE_SOURCES = {"device_code", "manual:device_code"}
for entry in entries:
if not isinstance(entry, dict) or entry.get("source") != "device_code":
if not isinstance(entry, dict):
continue
source = entry.get("source")
if source not in REFRESHABLE_SOURCES:
continue
entry["access_token"] = access_token
if refresh_token:
@@ -5627,6 +5680,8 @@ def _empty_nous_auth_status() -> Dict[str, Any]:
"access_expires_at": None,
"agent_key_expires_at": None,
"has_refresh_token": False,
"inference_credential_present": False,
"credential_source": None,
}
@@ -5655,24 +5710,36 @@ def _snapshot_nous_pool_status() -> Dict[str, Any]:
return (agent_exp, access_exp, -priority)
entry = max(entries, key=_entry_sort_key)
access_token = (
getattr(entry, "access_token", None)
or getattr(entry, "runtime_api_key", "")
)
if not access_token:
runtime_key = getattr(entry, "runtime_api_key", None) or getattr(entry, "access_token", "")
if not runtime_key:
return _empty_nous_auth_status()
access_token = getattr(entry, "access_token", None)
auth_type = str(getattr(entry, "auth_type", "") or "").strip().lower()
refresh_token = getattr(entry, "refresh_token", None)
is_portal_oauth = bool(access_token) and (
auth_type.startswith("oauth") or bool(refresh_token)
)
label = getattr(entry, "label", "unknown")
portal_status_url = None
if is_portal_oauth:
portal_status_url = (
getattr(entry, "portal_base_url", None)
or DEFAULT_NOUS_PORTAL_URL
)
return {
"logged_in": True,
"portal_base_url": getattr(entry, "portal_base_url", None)
or getattr(entry, "base_url", None),
"logged_in": is_portal_oauth,
"portal_base_url": portal_status_url,
"inference_base_url": getattr(entry, "inference_base_url", None)
or getattr(entry, "runtime_base_url", None)
or getattr(entry, "base_url", None),
"access_token": access_token,
"access_token": access_token if is_portal_oauth else None,
"access_expires_at": getattr(entry, "expires_at", None),
"agent_key_expires_at": getattr(entry, "agent_key_expires_at", None),
"has_refresh_token": bool(getattr(entry, "refresh_token", None)),
"source": f"pool:{getattr(entry, 'label', 'unknown')}",
"has_refresh_token": bool(refresh_token),
"inference_credential_present": True,
"credential_source": f"pool:{label}",
"source": f"pool:{label}",
}
except Exception:
return _empty_nous_auth_status()
@@ -5755,6 +5822,10 @@ def _compute_nous_auth_status() -> Dict[str, Any]:
"agent_key_expires_at": state.get("agent_key_expires_at"),
"has_refresh_token": bool(state.get("refresh_token")),
"access_token": state.get("access_token"),
"inference_credential_present": bool(
state.get("access_token") or state.get("agent_key")
),
"credential_source": "auth_store",
"source": "auth_store",
}
try:
@@ -5772,6 +5843,8 @@ def _compute_nous_auth_status() -> Dict[str, Any]:
or refreshed_state.get("agent_key_expires_at")
or base_status.get("agent_key_expires_at"),
"has_refresh_token": bool(refreshed_state.get("refresh_token")),
"inference_credential_present": True,
"credential_source": "auth_store",
"source": f"runtime:{creds.get('source', 'portal')}",
"key_id": creds.get("key_id"),
}
@@ -6283,6 +6356,7 @@ def _prompt_model_selection(
pricing: Optional[Dict[str, Dict[str, str]]] = None,
unavailable_models: Optional[List[str]] = None,
portal_url: str = "",
unavailable_message: str = "",
) -> Optional[str]:
"""Interactive model selection. Puts current_model first with a marker. Returns chosen model ID or None.
@@ -6374,18 +6448,22 @@ def _prompt_model_selection(
choices.append(" Enter custom model name")
choices.append(" Skip (keep current)")
_upgrade_url = (portal_url or DEFAULT_NOUS_PORTAL_URL).rstrip("/")
unavailable_footer = unavailable_message.strip()
if not unavailable_footer and _unavailable:
unavailable_footer = f"Upgrade at {_upgrade_url} for paid models"
# Print the unavailable block BEFORE the menu via regular print().
# simple_term_menu pads title lines to terminal width (causes wrapping),
# so we keep the title minimal and use stdout for the static block.
# clear_screen=False means our printed output stays visible above.
_upgrade_url = (portal_url or DEFAULT_NOUS_PORTAL_URL).rstrip("/")
if _unavailable:
print(menu_title)
print()
for mid in _unavailable:
print(f"{_DIM} {_label(mid)}{_RESET}")
print()
print(f"{_DIM} ── Upgrade at {_upgrade_url} for paid models ──{_RESET}")
print(f"{_DIM} ── {unavailable_footer} ──{_RESET}")
print()
effective_title = "Available free models:"
else:
@@ -6427,8 +6505,11 @@ def _prompt_model_selection(
if _unavailable:
_upgrade_url = (portal_url or DEFAULT_NOUS_PORTAL_URL).rstrip("/")
unavailable_footer = unavailable_message.strip() or (
f"Unavailable models (requires paid tier — upgrade at {_upgrade_url})"
)
print()
print(f" {_DIM}── Unavailable models (requires paid tier — upgrade at {_upgrade_url}) ──{_RESET}")
print(f" {_DIM}── {unavailable_footer} ──{_RESET}")
for mid in _unavailable:
print(f" {'':>{num_width}} {_DIM}{_label(mid)}{_RESET}")
print()
@@ -6777,6 +6858,12 @@ def _xai_oauth_loopback_login(
remote VM). The same PKCE verifier, ``state``, and ``nonce`` are
used for both paths so the upstream-side OAuth flow is identical.
"""
def _stdin_supports_manual_paste() -> bool:
try:
return bool(getattr(sys.stdin, "isatty", lambda: False)())
except Exception:
return False
discovery = _xai_oauth_discovery(timeout_seconds)
authorization_endpoint = discovery["authorization_endpoint"]
token_endpoint = discovery["token_endpoint"]
@@ -6840,12 +6927,28 @@ def _xai_oauth_loopback_login(
else:
print("Could not open the browser automatically; use the URL above.")
callback = _xai_wait_for_callback(
server,
thread,
callback_result,
timeout_seconds=max(30.0, timeout_seconds * 9),
)
try:
callback = _xai_wait_for_callback(
server,
thread,
callback_result,
timeout_seconds=max(30.0, timeout_seconds * 9),
)
except AuthError as exc:
if (
getattr(exc, "code", "") != "xai_callback_timeout"
or not _stdin_supports_manual_paste()
):
raise
print()
print("xAI loopback callback timed out.")
print("If your browser reached a failed 127.0.0.1 callback page,")
print("paste that FULL callback URL below to continue this login.")
print("You can also re-run with `--manual-paste` to skip the")
print("loopback listener from the start.")
callback = _prompt_manual_callback_paste(redirect_uri)
if callback.get("code") is None and callback.get("error") is None:
raise exc
except Exception:
try:
server.shutdown()
@@ -6865,7 +6968,21 @@ def _xai_oauth_loopback_login(
provider="xai-oauth",
code="xai_authorization_failed",
)
if callback.get("state") != state:
callback_state = callback.get("state")
# Manual-paste bare-code path: when a user pastes only the opaque
# authorization code (no ``code=``/``state=`` query parameters),
# ``_parse_pasted_callback`` returns ``state=None``. xAI's consent
# page renders the code in-page rather than redirecting through the
# 127.0.0.1 callback, so on many remote setups (Cloud Shell, headless
# VPS, container consoles) the bare code is the only thing the user
# can obtain. PKCE (code_verifier) still binds the exchange to this
# client, so the local state-equality check is redundant on the
# bare-code path — we substitute the locally generated state to keep
# the rest of the validation chain (and the token exchange) unchanged.
# See #26923 (AccursedGalaxy comment, 2026-05-20).
if callback_state is None and manual_paste:
callback_state = state
if callback_state != state:
raise AuthError(
"xAI authorization failed: state mismatch.",
provider="xai-oauth",
@@ -7626,8 +7743,9 @@ def _nous_device_code_login(
portal_url = auth_state.get(
"portal_base_url", DEFAULT_NOUS_PORTAL_URL
).rstrip("/")
message = format_auth_error(exc)
print()
print("Your Nous Portal account does not have an active subscription.")
print(message)
print(f" Subscribe here: {portal_url}/billing")
print()
print("After subscribing, run `hermes model` again to finish setup.")
@@ -7737,11 +7855,30 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
print()
unavailable_models: list = []
unavailable_message = ""
if model_ids:
pricing = get_pricing_for_provider("nous")
free_tier = check_nous_free_tier()
# Force fresh account data for model selection so recent credit
# purchases are reflected immediately.
free_tier = check_nous_free_tier(force_fresh=True)
_portal_for_recs = auth_state.get("portal_base_url", "")
if free_tier:
try:
from hermes_cli.nous_account import (
format_nous_portal_entitlement_message,
get_nous_portal_account_info,
)
_account_info = get_nous_portal_account_info(force_fresh=True)
unavailable_message = (
format_nous_portal_entitlement_message(
_account_info,
capability="paid Nous models",
)
or ""
)
except Exception:
unavailable_message = ""
# The Portal's freeRecommendedModels endpoint is the
# source of truth for what's free *right now*. Augment
# the curated list with anything new the Portal flags
@@ -7768,11 +7905,12 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
model_ids, pricing=pricing,
unavailable_models=unavailable_models,
portal_url=_portal,
unavailable_message=unavailable_message,
)
elif unavailable_models:
_url = (_portal or DEFAULT_NOUS_PORTAL_URL).rstrip("/")
print("No free models currently available.")
print(f"Upgrade at {_url} to access paid models.")
print(unavailable_message or f"Upgrade at {_url} to access paid models.")
else:
print("No curated models available for Nous Portal.")
except Exception as exc:
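Reviewer sketch: the source-filtering rule added to `_sync_codex_pool_entries` earlier in this file can be exercised on plain dicts. This is a simplified illustration under stated assumptions; `sync_entries` is a hypothetical name, and the way error markers are cleared here (popping `last_status`) is an assumption, since the real clearing code falls outside the hunks shown.

```python
REFRESHABLE_SOURCES = {"device_code", "manual:device_code"}


def sync_entries(entries, access_token, refresh_token=None):
    """Rewrite tokens on device-code-backed entries only.

    Returns the number of entries that were updated.
    """
    updated = 0
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        if entry.get("source") not in REFRESHABLE_SOURCES:
            # manual:api_key and unknown sources are independent credentials.
            continue
        entry["access_token"] = access_token
        if refresh_token:
            entry["refresh_token"] = refresh_token
        # Assumed shape of the error-marker reset, for illustration only.
        entry.pop("last_status", None)
        updated += 1
    return updated


pool = [
    {"source": "device_code", "access_token": "old", "last_status": "401"},
    {"source": "manual:device_code", "access_token": "old"},
    {"source": "manual:api_key", "access_token": "sk-independent"},
    "not-a-dict",
]
count = sync_entries(pool, "new-access", "new-refresh")
```

After the call, the two device-code entries carry the new token pair while the `manual:api_key` entry is untouched.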
+5 -2
@@ -512,6 +512,7 @@ def _quick_snapshot_root(hermes_home: Optional[Path] = None) -> Path:
def create_quick_snapshot(
label: Optional[str] = None,
hermes_home: Optional[Path] = None,
keep: Optional[int] = None,
) -> Optional[str]:
"""Create a quick state snapshot of critical files.
@@ -585,8 +586,10 @@ def create_quick_snapshot(
with open(snap_dir / "manifest.json", "w", encoding="utf-8") as f:
json.dump(meta, f, indent=2)
# Auto-prune
_prune_quick_snapshots(root, keep=_QUICK_DEFAULT_KEEP)
# Auto-prune. Defaults preserve historical manual /snapshot behavior; callers
# with known high-churn safety snapshots (for example pre-update) can pass a
# smaller keep value so large state.db copies do not accumulate indefinitely.
_prune_quick_snapshots(root, keep=_QUICK_DEFAULT_KEEP if keep is None else keep)
logger.info("State snapshot created: %s (%d files)", snap_id, len(manifest))
return snap_id
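Reviewer sketch: the new `keep` parameter resolves with an `is None` check, so an explicit value (including `0`) is passed through rather than replaced by the default. The sketch below shows that resolution together with one plausible pruning strategy. `_prune_quick_snapshots` itself is not shown in this diff, so `prune_snapshots`, the name-based ordering, and the `DEFAULT_KEEP` value are all assumptions.

```python
import shutil
import tempfile
from pathlib import Path

DEFAULT_KEEP = 20  # hypothetical value; the real default is not shown here


def resolve_keep(keep=None, default=DEFAULT_KEEP):
    """Mirror the `default if keep is None else keep` expression from the diff."""
    return default if keep is None else keep


def prune_snapshots(root, keep):
    """Delete all but the newest `keep` snapshot dirs (ordered by name).

    Assumed strategy for illustration; returns the names that were removed.
    """
    snaps = sorted((p for p in root.iterdir() if p.is_dir()), key=lambda p: p.name)
    # snaps[:-0] would be empty, so keep == 0 needs its own branch.
    excess = snaps[:-keep] if keep > 0 else snaps
    for path in excess:
        shutil.rmtree(path)
    return [path.name for path in excess]


def demo():
    """Create five snapshot dirs, keep two, and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for i in range(1, 6):
            (root / f"snap-{i}").mkdir()
        removed = prune_snapshots(root, resolve_keep(2))
        remaining = sorted(p.name for p in root.iterdir())
    return removed, remaining
```

A caller with high-churn snapshots would pass a small `keep`, as the comment in the diff suggests for the pre-update case.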
+29 -1
View File
@@ -300,14 +300,42 @@ def _git_short_hash(repo_dir: Path, rev: str) -> Optional[str]:
def get_git_banner_state(repo_dir: Optional[Path] = None) -> Optional[dict]:
"""Return upstream/local git hashes for the startup banner."""
"""Return upstream/local git hashes for the startup banner.
For source installs and dev images this runs ``git rev-parse`` against
the active checkout. When no checkout is available (the canonical case
is the published Docker image, which excludes ``.git`` from the build
context), we fall back to the baked-in build SHA (see
``hermes_cli/build_info.py``) and return it as a frozen
``upstream == local`` state with ``ahead=0``. A built image is by
definition pinned to one commit, so "ahead" is always zero and the
banner correctly shows ``· upstream <sha>`` with no carried-commits
annotation.
"""
repo_dir = repo_dir or _resolve_repo_dir()
if repo_dir is None:
# No git checkout — try the baked build SHA (Docker image path).
try:
from hermes_cli.build_info import get_build_sha
baked = get_build_sha(short=8)
if baked:
return {"upstream": baked, "local": baked, "ahead": 0}
except Exception:
pass
return None
upstream = _git_short_hash(repo_dir, "origin/main")
local = _git_short_hash(repo_dir, "HEAD")
if not upstream or not local:
# Live-git lookup failed (e.g. shallow clone without origin/main).
# Fall back to the baked build SHA if available.
try:
from hermes_cli.build_info import get_build_sha
baked = get_build_sha(short=8)
if baked:
return {"upstream": baked, "local": baked, "ahead": 0}
except Exception:
pass
return None
ahead = 0
+51
@@ -0,0 +1,51 @@
"""
Baked-in build metadata for Hermes Agent.
Source installs report their git revision live via ``git rev-parse`` (see
``hermes_cli/dump.py`` and ``hermes_cli/banner.py``). That doesn't work inside
the published Docker image because ``.dockerignore`` excludes ``.git``, so
those callsites fall back to ``"(unknown)"`` / drop the banner suffix entirely.
To make ``hermes dump`` and the startup banner identify the exact commit the
image was built from, the Docker build writes the build-time ``$HERMES_GIT_SHA``
arg into ``<project_root>/.hermes_build_sha``. This module is the single
read-side helper consumed by both callsites, keeping the lookup in one place
so the file path and missing-file behaviour stay consistent.
Behaviour:
- Returns ``None`` when the file is absent. Source installs and dev images
built without the ``HERMES_GIT_SHA`` build-arg fall through to live-git
resolution in the caller, so non-Docker installs are unaffected.
- Returns ``None`` on any IO / decoding error. The build-sha is a nice-to-have
for support triage; nothing in the CLI is allowed to crash because of it.
- Truncates to ``short`` characters (default 8) to match the format used by
``git rev-parse --short=8`` throughout the codebase.
"""
from __future__ import annotations
from pathlib import Path
from typing import Optional
# Path is resolved relative to this module so it works regardless of cwd —
# matches the pattern used by ``banner._resolve_repo_dir``.
_BUILD_SHA_FILE = Path(__file__).parent.parent / ".hermes_build_sha"
def get_build_sha(short: int = 8) -> Optional[str]:
"""Return the baked-in build SHA, truncated to ``short`` chars, or None.
Reads ``<project_root>/.hermes_build_sha`` if present. The file is
written by the Dockerfile's ``HERMES_GIT_SHA`` build-arg and contains
the full 40-character commit hash on a single line.
"""
try:
if not _BUILD_SHA_FILE.is_file():
return None
sha = _BUILD_SHA_FILE.read_text(encoding="utf-8").strip()
except Exception:
return None
if not sha:
return None
return sha[:short] if short and short > 0 else sha
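Reviewer sketch: the three documented behaviours of `get_build_sha` (missing file, truncation, full-length return) can be checked against a temporary file. The variant below takes the path as a parameter so it can run outside the repo; `read_build_sha` and the sample hash are hypothetical, and the body otherwise follows the function in this file.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_build_sha(sha_file: Path, short: int = 8) -> Optional[str]:
    """Path-parameterised variant of get_build_sha, for illustration."""
    try:
        if not sha_file.is_file():
            return None
        sha = sha_file.read_text(encoding="utf-8").strip()
    except Exception:
        return None
    if not sha:
        return None
    return sha[:short] if short and short > 0 else sha


def demo():
    """Return (missing-file result, short SHA, full SHA) for a sample file."""
    with tempfile.TemporaryDirectory() as tmp:
        sha_file = Path(tmp) / ".hermes_build_sha"
        missing = read_build_sha(sha_file)
        # Placeholder 40-character hash, not a real commit.
        sha_file.write_text(
            "0123456789abcdef0123456789abcdef01234567\n", encoding="utf-8"
        )
        return missing, read_build_sha(sha_file), read_build_sha(sha_file, short=0)
```

Passing `short=0` returns the full hash, because the truncation only applies when `short` is positive.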
+57 -5
@@ -345,6 +345,58 @@ def recommended_update_command() -> str:
return recommended_update_command_for_method(method)
# Long-form text for ``hermes update`` / ``--check`` when running inside the
# Docker image. Surfaced by ``cmd_update`` and ``_cmd_update_check`` in
# hermes_cli/main.py; lives here so the wording stays consistent and we
# don't grow two slightly-different copies.
#
# Why this matters:
# - The published image excludes ``.git`` (see .dockerignore), so the
# git-based update path can never succeed inside the container.
# - The pre-existing fallback message ("✗ Not a git repository. Please
# reinstall: curl ... install.sh") is actively misleading inside Docker
# — that script installs a *new* host-side Hermes, it doesn't update
# the running container.
# - The right action is ``docker pull`` + restart the container; this
# helper spells that out, with notes on tag pinning and config
# persistence so users don't get blindsided.
_DOCKER_UPDATE_MESSAGE = """\
``hermes update`` doesn't apply inside the Docker container.
Hermes Agent runs as a published image (nousresearch/hermes-agent), not a
git checkout; the container has no working tree to pull into. Update by
pulling a fresh image and restarting your container instead:
docker pull nousresearch/hermes-agent:latest
# then restart whatever started the container, e.g.:
docker compose up -d --force-recreate hermes-agent
# or, for ad-hoc runs, exit the current container and `docker run` again
Verify the new version after restart:
docker run --rm nousresearch/hermes-agent:latest --version
Notes:
If you pinned a specific tag (e.g. ``:v0.14.0``) the ``:latest`` tag
won't move your container — pull the newer tag you actually want, or
switch to ``:latest`` / ``:main`` for rolling updates. See available
tags at https://hub.docker.com/r/nousresearch/hermes-agent/tags
Your config and session history live under ``$HERMES_HOME`` (``/opt/data``
in the container, typically bind-mounted from the host) and persist
across image upgrades; re-pulling doesn't lose any state.
Running a fork? Build your own image with this repo's ``Dockerfile``
and replace the ``docker pull`` step with your build/push pipeline."""
def format_docker_update_message() -> str:
"""Return the user-facing message for ``hermes update`` inside Docker.
Centralised so ``cmd_update`` (the apply path) and ``_cmd_update_check``
(the dry-run path) share the same wording. See ``_DOCKER_UPDATE_MESSAGE``
above for the full rationale.
"""
return _DOCKER_UPDATE_MESSAGE
def format_managed_message(action: str = "modify this Hermes installation") -> str:
"""Build a user-facing error for managed installs."""
managed_system = get_managed_system() or "a package manager"
@@ -2453,10 +2505,10 @@ OPTIONAL_ENV_VARS = {
"advanced": True,
},
"TAVILY_API_KEY": {
"description": "Tavily API key for AI-native web search, extract, and crawl",
"description": "Tavily API key for AI-native web search and extract",
"prompt": "Tavily API key",
"url": "https://app.tavily.com/home",
"tools": ["web_search", "web_extract", "web_crawl"],
"tools": ["web_search", "web_extract"],
"password": True,
"category": "tool",
},
@@ -2940,8 +2992,8 @@ OPTIONAL_ENV_VARS = {
"advanced": True,
},
"API_SERVER_KEY": {
"description": "Bearer token for API server authentication. Required for non-loopback binding; server refuses to start without it. On loopback (127.0.0.1), all requests are allowed if empty.",
"prompt": "API server auth key (required for network access)",
"description": "Bearer token for API server authentication. Required whenever the API server is enabled; server refuses to start without it.",
"prompt": "API server auth key",
"url": None,
"password": True,
"category": "messaging",
@@ -2956,7 +3008,7 @@ OPTIONAL_ENV_VARS = {
"advanced": True,
},
"API_SERVER_HOST": {
"description": "Host/bind address for the API server (default: 127.0.0.1). Use 0.0.0.0 for network access — server refuses to start without API_SERVER_KEY.",
"description": "Host/bind address for the API server (default: 127.0.0.1). API_SERVER_KEY is still required even on loopback binds.",
"prompt": "API server host",
"url": None,
"password": False,
+24 -2
@@ -20,7 +20,15 @@ from agent.skill_utils import is_excluded_skill_path
def _get_git_commit(project_root: Path) -> str:
"""Return short git commit hash, or '(unknown)'."""
"""Return short git commit hash, or '(unknown)'.
Source installs and dev images resolve this live via ``git rev-parse``.
The published Docker image excludes ``.git`` from the build context, so
that lookup always fails; we fall back to the baked-in build SHA written
to ``<project_root>/.hermes_build_sha`` by the Dockerfile's
``HERMES_GIT_SHA`` build-arg (see ``hermes_cli/build_info.py``).
The output format is identical regardless of source.
"""
try:
result = subprocess.run(
["git", "rev-parse", "--short=8", "HEAD"],
@@ -28,9 +36,23 @@ def _get_git_commit(project_root: Path) -> str:
cwd=str(project_root),
)
if result.returncode == 0:
return result.stdout.strip()
value = result.stdout.strip()
if value:
return value
except Exception:
pass
# Fall back to the build-time baked SHA (populated in published Docker
# images, absent otherwise). Defers the import so the dump module
# stays cheap on non-dump code paths.
try:
from hermes_cli.build_info import get_build_sha
baked = get_build_sha(short=8)
if baked:
return baked
except Exception:
pass
return "(unknown)"
+72 -7
@@ -1014,12 +1014,70 @@ def start() -> None:
_report_gateway_start(f"direct spawn (PID {pid})")
def stop() -> None:
"""Stop the gateway. Tries /End on the scheduled task, then kills any stragglers."""
_assert_windows()
from hermes_cli.gateway import kill_gateway_processes
def _drain_gateway_pid(pid: int, drain_timeout: float) -> bool:
"""Write the planned-stop marker and wait for the gateway PID to exit.
stopped_any = False
Windows cannot deliver POSIX signals to a Python asyncio loop
(``loop.add_signal_handler`` raises NotImplementedError), so writing
the marker is the ONLY way to ask a running gateway to drain
in-flight agents and persist ``resume_pending`` before exit. The
gateway's planned-stop watcher thread (gateway/run.py) polls for
the marker and drives the same shutdown path the SIGTERM handler
would have on POSIX.
Returns True if the PID exited within the timeout, False if it
didn't (caller should escalate to schtasks /End + taskkill).
"""
if pid <= 0:
return False
try:
from gateway.status import write_planned_stop_marker, _pid_exists
except ImportError:
return False
try:
write_planned_stop_marker(pid)
except Exception:
# Best-effort: if the marker can't be written, we have no choice
# but to fall through to a hard kill. Caller decides escalation.
pass
deadline = time.monotonic() + max(drain_timeout, 1.0)
while time.monotonic() < deadline:
if not _pid_exists(pid):
return True
time.sleep(0.5)
return False
def stop() -> None:
"""Stop the gateway.
Writes the planned-stop marker first so the gateway can drain
in-flight agents and persist ``resume_pending`` before exit (the
gateway's marker-watcher thread picks this up — Windows asyncio
can't deliver SIGTERM to the loop, so the marker is our only IPC).
Then escalates: ``schtasks /End`` (kills the scheduled-task tree)
+ ``kill_gateway_processes(force=True)`` for any strays.
"""
_assert_windows()
from hermes_cli.gateway import kill_gateway_processes, _get_restart_drain_timeout
from gateway.status import get_running_pid
# Phase 1: ask the running gateway (if any) to drain itself by writing
# the planned-stop marker, then wait briefly for it to exit cleanly.
# On clean exit, sessions land with resume_pending=True and the next
# boot will auto-resume them.
pid = get_running_pid()
drained = False
if pid is not None:
try:
drain_timeout = float(_get_restart_drain_timeout() or 30.0)
except Exception:
drain_timeout = 30.0
drained = _drain_gateway_pid(pid, drain_timeout)
stopped_any = drained
if is_task_registered():
code, _out, err = _exec_schtasks(["/End", "/TN", get_task_name()])
# schtasks returns nonzero when the task isn't currently running — don't treat that as an error.
@@ -1028,12 +1086,19 @@ def stop() -> None:
elif "not running" not in (err or "").lower():
print(f"⚠ schtasks /End returned code {code}: {err.strip()}")
killed = kill_gateway_processes(all_profiles=False)
# Phase 3: hard-kill any strays. When drain succeeded this is a no-op;
# when drain timed out this is the escalation that ensures the PID
# actually exits. Use force=True on Windows so taskkill /T /F walks
# the descendant tree (browser helpers, etc.).
killed = kill_gateway_processes(all_profiles=False, force=not drained)
if killed:
stopped_any = True
print(f"✓ Killed {killed} gateway process(es)")
if stopped_any:
print("✓ Gateway stopped")
if drained:
print("✓ Gateway stopped (drained cleanly)")
else:
print("✓ Gateway stopped")
else:
print("✗ No gateway was running")
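Reviewer sketch: the drain-then-escalate flow in `stop()` reduces to a deadline-bounded poll followed by a fallback. The version below injects the liveness check and the hard-kill action so it runs without a real gateway. `wait_for_exit` and `stop_with_escalation` are hypothetical names, and unlike the real `stop()` this sketch skips the `schtasks /End` step and only calls the kill action when the drain times out.

```python
import time


def wait_for_exit(is_alive, timeout, poll_interval=0.01):
    """Poll is_alive() until it returns False or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_alive():
            return True
        time.sleep(poll_interval)
    return False


def stop_with_escalation(is_alive, hard_kill, timeout):
    """Give the process a chance to drain, then escalate if it is still up."""
    drained = wait_for_exit(is_alive, timeout)
    if not drained:
        hard_kill()
    return "drained" if drained else "killed"


# A process that exits on its own after a couple of polls.
polls = {"n": 0}


def exits_soon():
    polls["n"] += 1
    return polls["n"] < 3


clean = stop_with_escalation(exits_soon, lambda: None, timeout=1.0)

# A process that never exits, so the hard kill has to run.
kills = []
forced = stop_with_escalation(lambda: True, lambda: kills.append("kill"), timeout=0.05)
```

Using `time.monotonic()` for the deadline, as the diff does, keeps the wait immune to wall-clock adjustments.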
+35 -35
@@ -1021,7 +1021,7 @@ def _board_task_counts(slug: str) -> dict[str, int]:
path = kb.kanban_db_path(board=slug)
if not path.exists():
return {}
with kb.connect(board=slug) as conn:
with kb.connect_closing(board=slug) as conn:
rows = conn.execute(
"SELECT status, COUNT(*) AS n FROM tasks GROUP BY status"
).fetchall()
@@ -1264,7 +1264,7 @@ def _cmd_init(args: argparse.Namespace) -> int:
def _cmd_heartbeat(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
ok = kb.heartbeat_worker(
conn,
args.task_id,
@@ -1279,7 +1279,7 @@ def _cmd_heartbeat(args: argparse.Namespace) -> int:
def _cmd_assignees(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
data = kb.known_assignees(conn)
if getattr(args, "json", False):
print(json.dumps(data, indent=2, ensure_ascii=False))
@@ -1320,7 +1320,7 @@ def _cmd_create(args: argparse.Namespace) -> int:
file=sys.stderr,
)
return 2
with kb.connect() as conn:
with kb.connect_closing() as conn:
task_id = kb.create_task(
conn,
title=args.title,
@@ -1369,7 +1369,7 @@ def _cmd_swarm(args: argparse.Namespace) -> int:
if not workers:
print("kanban swarm: at least one --worker is required", file=sys.stderr)
return 2
with kb.connect() as conn:
with kb.connect_closing() as conn:
created = ks.create_swarm(
conn,
goal=args.goal,
@@ -1395,7 +1395,7 @@ def _cmd_list(args: argparse.Namespace) -> int:
assignee = args.assignee
if args.mine and not assignee:
assignee = _profile_author()
with kb.connect() as conn:
with kb.connect_closing() as conn:
# Cheap "mini-dispatch": recompute ready so list output reflects
# dependencies that may have cleared since the last dispatcher tick.
kb.recompute_ready(conn)
@@ -1444,7 +1444,7 @@ def _cmd_show(args: argparse.Namespace) -> int:
file=sys.stderr,
)
return 2
with kb.connect() as conn:
with kb.connect_closing() as conn:
task = kb.get_task(conn, args.task_id)
if not task:
print(f"no such task: {args.task_id}", file=sys.stderr)
@@ -1610,7 +1610,7 @@ def _cmd_show(args: argparse.Namespace) -> int:
def _cmd_assign(args: argparse.Namespace) -> int:
profile = None if args.profile.lower() in {"none", "-", "null"} else args.profile
with kb.connect() as conn:
with kb.connect_closing() as conn:
ok = kb.assign_task(conn, args.task_id, profile)
if not ok:
print(f"no such task: {args.task_id}", file=sys.stderr)
@@ -1620,7 +1620,7 @@ def _cmd_assign(args: argparse.Namespace) -> int:
def _cmd_reclaim(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
ok = kb.reclaim_task(
conn, args.task_id,
reason=getattr(args, "reason", None),
@@ -1637,7 +1637,7 @@ def _cmd_reclaim(args: argparse.Namespace) -> int:
def _cmd_reassign(args: argparse.Namespace) -> int:
profile = None if args.profile.lower() in {"none", "-", "null"} else args.profile
with kb.connect() as conn:
with kb.connect_closing() as conn:
ok = kb.reassign_task(
conn, args.task_id, profile,
reclaim_first=bool(getattr(args, "reclaim", False)),
@@ -1667,7 +1667,7 @@ def _cmd_diagnostics(args: argparse.Namespace) -> int:
diag_config = kd.config_from_runtime_config(load_config())
with kb.connect() as conn:
with kb.connect_closing() as conn:
# Either one-task mode or fleet mode.
if getattr(args, "task", None):
task = kb.get_task(conn, args.task)
@@ -1790,14 +1790,14 @@ def _cmd_diagnostics(args: argparse.Namespace) -> int:
def _cmd_link(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
kb.link_tasks(conn, args.parent_id, args.child_id)
print(f"Linked {args.parent_id} -> {args.child_id}")
return 0
def _cmd_unlink(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
ok = kb.unlink_tasks(conn, args.parent_id, args.child_id)
if not ok:
print(f"No such link: {args.parent_id} -> {args.child_id}", file=sys.stderr)
@@ -1807,7 +1807,7 @@ def _cmd_unlink(args: argparse.Namespace) -> int:
def _cmd_claim(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
task = kb.claim_task(conn, args.task_id, ttl_seconds=args.ttl)
if task is None:
# Report why
@@ -1838,7 +1838,7 @@ def _cmd_comment(args: argparse.Namespace) -> int:
suffix = f"\n\n[trimmed to {args.max_len} chars by --max-len]"
body = body[: max(0, args.max_len - len(suffix))].rstrip() + suffix
author = args.author or _profile_author()
with kb.connect() as conn:
with kb.connect_closing() as conn:
kb.add_comment(conn, args.task_id, author, body)
print(f"Comment added to {args.task_id}")
return 0
@@ -1885,7 +1885,7 @@ def _cmd_complete(args: argparse.Namespace) -> int:
print(f"kanban: --metadata: {exc}", file=sys.stderr)
return 2
failed: list[str] = []
with kb.connect() as conn:
with kb.connect_closing() as conn:
for tid in ids:
if not kb.complete_task(
conn, tid,
@@ -1912,7 +1912,7 @@ def _cmd_edit(args: argparse.Namespace) -> int:
except (ValueError, json.JSONDecodeError) as exc:
print(f"kanban: --metadata: {exc}", file=sys.stderr)
return 2
with kb.connect() as conn:
with kb.connect_closing() as conn:
if not kb.edit_completed_task_result(
conn,
args.task_id,
@@ -1934,7 +1934,7 @@ def _cmd_block(args: argparse.Namespace) -> int:
author = _profile_author()
ids = [args.task_id] + list(getattr(args, "ids", None) or [])
failed: list[str] = []
with kb.connect() as conn:
with kb.connect_closing() as conn:
for tid in ids:
if reason:
kb.add_comment(conn, tid, author, f"BLOCKED: {reason}")
@@ -1956,7 +1956,7 @@ def _cmd_schedule(args: argparse.Namespace) -> int:
author = _profile_author()
ids = [args.task_id] + list(getattr(args, "ids", None) or [])
failed: list[str] = []
with kb.connect() as conn:
with kb.connect_closing() as conn:
for tid in ids:
if reason:
kb.add_comment(conn, tid, author, f"SCHEDULED: {reason}")
@@ -1979,7 +1979,7 @@ def _cmd_unblock(args: argparse.Namespace) -> int:
print("at least one task_id is required", file=sys.stderr)
return 1
failed: list[str] = []
with kb.connect() as conn:
with kb.connect_closing() as conn:
for tid in ids:
if not kb.unblock_task(conn, tid):
failed.append(tid)
@@ -2003,7 +2003,7 @@ def _cmd_promote(args: argparse.Namespace) -> int:
seen.add(tid)
results: list[dict[str, object]] = []
with kb.connect() as conn:
with kb.connect_closing() as conn:
for tid in ids:
ok, err = kb.promote_task(
conn,
@@ -2050,7 +2050,7 @@ def _cmd_archive(args: argparse.Namespace) -> int:
print("at least one task_id is required", file=sys.stderr)
return 1
failed: list[str] = []
with kb.connect() as conn:
with kb.connect_closing() as conn:
if purge_ids:
for tid in purge_ids:
if not kb.delete_archived_task(conn, tid):
@@ -2073,7 +2073,7 @@ def _cmd_tail(args: argparse.Namespace) -> int:
print(f"Tailing events for {args.task_id}. Ctrl-C to stop.")
try:
while True:
with kb.connect() as conn:
with kb.connect_closing() as conn:
events = kb.list_events(conn, args.task_id)
for e in events:
if e.id > last_id:
@@ -2087,7 +2087,7 @@ def _cmd_tail(args: argparse.Namespace) -> int:
def _cmd_dispatch(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
res = kb.dispatch_once(
conn,
dry_run=args.dry_run,
@@ -2257,7 +2257,7 @@ def _cmd_daemon(args: argparse.Namespace) -> int:
from the dispatcher's perspective, not stuck.
"""
try:
with kb.connect() as conn:
with kb.connect_closing() as conn:
return kb.has_spawnable_ready(conn)
except Exception:
return False
@@ -2288,7 +2288,7 @@ def _cmd_watch(args: argparse.Namespace) -> int:
cursor = 0
print("Watching kanban events. Ctrl-C to stop.", flush=True)
# Seed cursor at the latest id so we don't replay history.
with kb.connect() as conn:
with kb.connect_closing() as conn:
row = conn.execute(
"SELECT COALESCE(MAX(id), 0) AS m FROM task_events"
).fetchone()
@@ -2296,7 +2296,7 @@ def _cmd_watch(args: argparse.Namespace) -> int:
try:
while True:
with kb.connect() as conn:
with kb.connect_closing() as conn:
rows = conn.execute(
"SELECT e.id, e.task_id, e.kind, e.payload, e.created_at, "
" t.assignee, t.tenant "
@@ -2329,7 +2329,7 @@ def _cmd_watch(args: argparse.Namespace) -> int:
def _cmd_stats(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
stats = kb.board_stats(conn)
if getattr(args, "json", False):
print(json.dumps(stats, indent=2, ensure_ascii=False))
@@ -2349,7 +2349,7 @@ def _cmd_stats(args: argparse.Namespace) -> int:
def _cmd_notify_subscribe(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
if kb.get_task(conn, args.task_id) is None:
print(f"no such task: {args.task_id}", file=sys.stderr)
return 1
@@ -2366,7 +2366,7 @@ def _cmd_notify_subscribe(args: argparse.Namespace) -> int:
def _cmd_notify_list(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
subs = kb.list_notify_subs(conn, args.task_id)
if getattr(args, "json", False):
print(json.dumps(subs, indent=2, ensure_ascii=False))
@@ -2383,7 +2383,7 @@ def _cmd_notify_list(args: argparse.Namespace) -> int:
def _cmd_notify_unsubscribe(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
ok = kb.remove_notify_sub(
conn, task_id=args.task_id,
platform=args.platform, chat_id=args.chat_id,
@@ -2417,7 +2417,7 @@ def _cmd_runs(args: argparse.Namespace) -> int:
file=sys.stderr,
)
return 2
with kb.connect() as conn:
with kb.connect_closing() as conn:
runs = kb.list_runs(conn, args.task_id, **rsk)
if getattr(args, "json", False):
print(json.dumps([
@@ -2456,7 +2456,7 @@ def _cmd_runs(args: argparse.Namespace) -> int:
def _cmd_context(args: argparse.Namespace) -> int:
with kb.connect() as conn:
with kb.connect_closing() as conn:
text = kb.build_worker_context(conn, args.task_id)
print(text)
return 0
@@ -2622,7 +2622,7 @@ def _cmd_gc(args: argparse.Namespace) -> int:
import shutil
scratch_root = kb.workspaces_root()
removed_ws = 0
with kb.connect() as conn:
with kb.connect_closing() as conn:
rows = conn.execute(
"SELECT id, workspace_kind, workspace_path FROM tasks WHERE status = 'archived'"
).fetchall()
@@ -2645,7 +2645,7 @@ def _cmd_gc(args: argparse.Namespace) -> int:
event_days = getattr(args, "event_retention_days", 30)
log_days = getattr(args, "log_retention_days", 30)
with kb.connect() as conn:
with kb.connect_closing() as conn:
removed_events = kb.gc_events(
conn, older_than_seconds=event_days * 24 * 3600,
)
+207 -82
@@ -71,6 +71,7 @@ new locking.
from __future__ import annotations
import contextlib
import hashlib
import json
import os
import re
@@ -982,6 +983,89 @@ CREATE INDEX IF NOT EXISTS idx_notify_task ON kanban_notify_subs(task_
_INITIALIZED_PATHS: set[str] = set()
_INIT_LOCK = threading.RLock()
_SQLITE_HEADER = b"SQLite format 3\x00"
DEFAULT_BUSY_TIMEOUT_MS = 120_000
def _resolve_busy_timeout_ms() -> int:
    """Return the SQLite busy timeout for Kanban connections.

    Kanban is the shared cross-profile dispatch bus, so worker stampedes are
    expected. A long busy timeout lets SQLite serialize writers via WAL rather
    than surfacing transient ``database is locked`` failures during bursts.
    """
    raw = os.environ.get("HERMES_KANBAN_BUSY_TIMEOUT_MS", "").strip()
    if raw:
        try:
            parsed = int(raw)
        except ValueError:
            parsed = 0
        if parsed > 0:
            return parsed
    return DEFAULT_BUSY_TIMEOUT_MS
def _sqlite_connect(path: Path) -> sqlite3.Connection:
    """Open a Kanban SQLite connection with consistent lock waiting."""
    busy_timeout_ms = _resolve_busy_timeout_ms()
    conn = sqlite3.connect(
        str(path),
        isolation_level=None,
        timeout=busy_timeout_ms / 1000.0,
    )
    # ``sqlite3.connect(timeout=...)`` normally maps to busy_timeout, but set
    # the PRAGMA explicitly so it is observable and survives future wrapper
    # changes. Parameter binding is not supported for PRAGMA assignments.
    conn.execute(f"PRAGMA busy_timeout={busy_timeout_ms}")
    return conn
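The busy-timeout handling above can be tried in isolation. The following is a minimal sketch, assuming a throwaway in-memory database; `resolve_timeout_ms`, `open_with_timeout`, and the `DEMO_BUSY_TIMEOUT_MS` variable are hypothetical names chosen for illustration and are not part of the Hermes code base.

```python
import os
import sqlite3

DEFAULT_MS = 120_000  # same default value as in the diff above


def resolve_timeout_ms(env_var: str = "DEMO_BUSY_TIMEOUT_MS") -> int:
    """Hypothetical helper: a positive integer from the environment, else the default."""
    raw = os.environ.get(env_var, "").strip()
    try:
        parsed = int(raw) if raw else 0
    except ValueError:
        parsed = 0
    return parsed if parsed > 0 else DEFAULT_MS


def open_with_timeout(path: str, timeout_ms: int) -> sqlite3.Connection:
    """Open a connection and set the busy timeout both ways, as the diff does."""
    conn = sqlite3.connect(path, isolation_level=None, timeout=timeout_ms / 1000.0)
    # PRAGMA assignments do not accept bound parameters, so the value is
    # interpolated; int() guarantees it is a number and not arbitrary text.
    conn.execute(f"PRAGMA busy_timeout={int(timeout_ms)}")
    return conn


conn = open_with_timeout(":memory:", resolve_timeout_ms())
# Reading the PRAGMA back makes the effective value observable.
effective_ms = conn.execute("PRAGMA busy_timeout").fetchone()[0]
conn.close()
```

Reading the PRAGMA back is what makes the setting checkable in a test or a diagnostic command; the `timeout=` argument alone is not as easy to inspect.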
@contextlib.contextmanager
def _cross_process_init_lock(path: Path):
    """Serialize first-connect WAL/schema/integrity setup across processes.

    ``_INIT_LOCK`` only protects threads inside one Python process. During a
    dispatcher burst, many worker processes can all hit a fresh/legacy board at
    once and each process has an empty ``_INITIALIZED_PATHS`` cache. This file
    lock keeps header validation, integrity probing, WAL activation, and
    additive migrations single-file/single-writer across the whole host while
    leaving normal post-init DB usage concurrent under SQLite WAL.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    lock_path = path.with_name(path.name + ".init.lock")
    handle = lock_path.open("a+b")
    try:
        if _IS_WINDOWS:
            import msvcrt

            # Lock a single byte in the sidecar file. ``msvcrt.locking`` starts
            # at the current file position, so seek explicitly before both
            # lock and unlock. The file is opened in append/read binary mode so
            # it always exists but the byte-range lock is the synchronization
            # primitive; no payload needs to be written.
            handle.seek(0)
            locking = getattr(msvcrt, "locking")
            lock_mode = getattr(msvcrt, "LK_LOCK")
            locking(handle.fileno(), lock_mode, 1)
        else:
            import fcntl

            fcntl.flock(handle.fileno(), fcntl.LOCK_EX)
        yield
    finally:
        try:
            if _IS_WINDOWS:
                import msvcrt

                handle.seek(0)
                locking = getattr(msvcrt, "locking")
                unlock_mode = getattr(msvcrt, "LK_UNLCK")
                locking(handle.fileno(), unlock_mode, 1)
            else:
                import fcntl

                fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
        finally:
            handle.close()
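The POSIX branch of the sidecar lock can be illustrated on its own. This is a sketch only, assuming a local filesystem where `flock` is honoured; `init_lock` and `is_locked` are hypothetical names, and the Windows `msvcrt.locking` branch from the diff is deliberately left out.

```python
import contextlib
import fcntl  # POSIX only; the diff uses msvcrt.locking on Windows
import tempfile
from pathlib import Path


@contextlib.contextmanager
def init_lock(db_path: Path):
    """Hold an exclusive advisory lock on a sidecar file next to db_path."""
    lock_path = db_path.with_name(db_path.name + ".init.lock")
    with lock_path.open("a+b") as handle:
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(handle.fileno(), fcntl.LOCK_UN)


def is_locked(db_path: Path) -> bool:
    """Probe the lock without blocking, through a second open file description."""
    lock_path = db_path.with_name(db_path.name + ".init.lock")
    with lock_path.open("a+b") as probe:
        try:
            fcntl.flock(probe.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        fcntl.flock(probe.fileno(), fcntl.LOCK_UN)
        return False


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "demo.db"
    with init_lock(db):
        held_inside = is_locked(db)  # probed while the lock is held
    held_after = is_locked(db)       # probed after release
```

Locking a sidecar file rather than the database file itself keeps the advisory lock from interfering with SQLite's own locking on `kanban.db`.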
def _looks_like_tls_record_at(data: bytes, offset: int) -> bool:
@@ -1055,14 +1139,21 @@ class KanbanDbCorruptError(RuntimeError):
def _backup_corrupt_db(path: Path) -> Optional[Path]:
"""Copy a corrupt DB (and its WAL/SHM sidecars) to a timestamped backup.
"""Copy a corrupt DB (and its WAL/SHM sidecars) to a content-addressed backup.
The backup filename is deterministic in the main DB's sha256, so repeated
quarantines of the same corrupt bytes (gateway restarts, dispatcher retries,
multi-profile fleets all hitting the same shared DB) reuse one backup
instead of amplifying disk usage by N. If the corrupt bytes actually
change between attempts (e.g. a partial repair or further damage), the
fingerprint changes and a separate backup is preserved.
Returns the backup path of the main DB file, or ``None`` if the copy
itself failed (the caller still raises loudly in that case).
Writes are confined to the original DB's parent directory. The
backup basename is derived purely from ``path.name``, never from
caller-supplied directory segments; no traversal is possible.
Writes are confined to the original DB's parent directory. The backup
basename is derived purely from ``path.name`` and a content hash, never
from caller-supplied directory segments; no traversal is possible.
"""
# Resolve once and pin the parent so subsequent path operations cannot
# escape it. ``Path.resolve()`` collapses any ``..`` segments and
@@ -1070,32 +1161,31 @@ def _backup_corrupt_db(path: Path) -> Optional[Path]:
resolved = path.resolve()
parent = resolved.parent
base_name = resolved.name # basename only
stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
candidate = parent / f"{base_name}.corrupt.{stamp}.bak"
# Defensive: candidate must still be inside parent after construction.
# f-string interpolation of ``base_name`` cannot escape ``parent``
# because ``base_name`` is itself a resolved basename, but assert it
# anyway so static analyzers can see the containment guarantee.
if candidate.parent != parent:
return None
counter = 0
while candidate.exists():
counter += 1
candidate = parent / f"{base_name}.corrupt.{stamp}.{counter}.bak"
if candidate.parent != parent:
return None
digest = hashlib.sha256()
try:
shutil.copy2(resolved, candidate)
with resolved.open("rb") as handle:
for chunk in iter(lambda: handle.read(1024 * 1024), b""):
digest.update(chunk)
except OSError:
return None
token = digest.hexdigest()[:16]
candidate = parent / f"{base_name}.corrupt.{token}.bak"
# Defensive: candidate must still be inside parent after construction.
if candidate.parent != parent:
return None
if not candidate.exists():
try:
shutil.copy2(resolved, candidate)
except OSError:
return None
for suffix in ("-wal", "-shm"):
sidecar = parent / (base_name + suffix)
if sidecar.parent != parent or not sidecar.exists():
continue
sidecar_backup = parent / (candidate.name + suffix)
if sidecar_backup.parent != parent or sidecar_backup.exists():
continue
try:
sidecar_backup = parent / (candidate.name + suffix)
if sidecar_backup.parent != parent:
continue
shutil.copy2(sidecar, sidecar_backup)
except OSError:
pass
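The content-addressed naming in this hunk can be sketched independently of the Kanban code. The example below is an illustration under assumptions: `backup_by_content` is a hypothetical helper, it ignores the `-wal`/`-shm` sidecars, and it skips the parent-directory containment checks the real function performs.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def backup_by_content(path: Path) -> Path:
    """Copy path to a sibling whose name embeds a prefix of its SHA-256."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Hash in 1 MiB chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    target = path.with_name(f"{path.name}.corrupt.{digest.hexdigest()[:16]}.bak")
    if not target.exists():  # identical bytes reuse the existing backup
        shutil.copy2(path, target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "kanban.db"
    db.write_bytes(b"not a real database")
    first = backup_by_content(db)
    second = backup_by_content(db)   # same bytes, same backup file
    db.write_bytes(b"different damage")
    third = backup_by_content(db)    # new bytes, new backup file
    backups = sorted(p.name for p in Path(tmp).glob("*.bak"))
```

Compared with the timestamp-plus-counter scheme being removed, a name derived from the hash means repeated quarantines of unchanged bytes cost one file rather than one file per attempt.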
@@ -1142,7 +1232,7 @@ def _guard_existing_db_is_healthy(path: Path) -> None:
return
reason: Optional[str] = None
try:
probe = sqlite3.connect(str(resolved), timeout=5, isolation_level=None)
probe = _sqlite_connect(resolved)
try:
row = probe.execute("PRAGMA integrity_check").fetchone()
finally:
@@ -1188,54 +1278,90 @@ def connect(
else:
path = kanban_db_path(board=board)
path.parent.mkdir(parents=True, exist_ok=True)
# Cheap byte-level check first — catches the #29507 TLS-overwrite shape
# and other invalid-header cases without opening a sqlite connection.
_validate_sqlite_header(path)
# Full integrity probe — catches corruption past the header (malformed
# pages, broken internal metadata). Cached per-path after first success
# via _INITIALIZED_PATHS so it only runs once per process per path.
_guard_existing_db_is_healthy(path)
resolved = str(path.resolve())
conn = sqlite3.connect(str(path), isolation_level=None, timeout=30)
try:
conn.row_factory = sqlite3.Row
with _INIT_LOCK:
# WAL activation can take an exclusive lock while SQLite creates the
# sidecar files for a fresh database. Keep it in the same process-local
# critical section as schema initialization so concurrent gateway
# startup threads do not race before _INITIALIZED_PATHS is populated.
# WAL doesn't work on network filesystems (NFS/SMB/FUSE). Shared helper
# falls back to DELETE with one WARNING so kanban stays usable there.
# See hermes_state._WAL_INCOMPAT_MARKERS for detection logic.
from hermes_state import apply_wal_with_fallback
apply_wal_with_fallback(conn, db_label=f"kanban.db ({path.name})")
# FULL (was NORMAL): fsync before each checkpoint to narrow the
# crash window that can leave a b-tree page header torn.
conn.execute("PRAGMA synchronous=FULL")
conn.execute("PRAGMA wal_autocheckpoint=100")
conn.execute("PRAGMA foreign_keys=ON")
# Zero freed pages so a later torn write cannot expose stale
# cell content; persisted in the DB header for new DBs.
conn.execute("PRAGMA secure_delete=ON")
# Surface corrupt cells as read errors instead of silent
# wrong-data returns.
conn.execute("PRAGMA cell_size_check=ON")
needs_init = resolved not in _INITIALIZED_PATHS
if needs_init:
# Idempotent: runs CREATE TABLE IF NOT EXISTS + the additive
# migrations. Cached so subsequent connect() calls in the same
# process are cheap. The lock prevents same-process dispatcher
# threads from racing through the additive ALTER TABLE pass with
# stale PRAGMA snapshots during gateway startup.
conn.executescript(SCHEMA_SQL)
_migrate_add_optional_columns(conn)
_INITIALIZED_PATHS.add(resolved)
except Exception:
conn.close()
raise
with _cross_process_init_lock(path):
# Cheap byte-level check first — catches the #29507 TLS-overwrite shape
# and other invalid-header cases without opening a sqlite connection.
_validate_sqlite_header(path)
# Full integrity probe — catches corruption past the header (malformed
# pages, broken internal metadata). Cached per-path after first success
# via _INITIALIZED_PATHS so it only runs once per process per path.
_guard_existing_db_is_healthy(path)
resolved = str(path.resolve())
conn = _sqlite_connect(path)
try:
conn.row_factory = sqlite3.Row
with _INIT_LOCK:
# WAL activation can take an exclusive lock while SQLite creates the
# sidecar files for a fresh database. Keep it in the same process-local
# critical section as schema initialization so concurrent gateway
# startup threads do not race before _INITIALIZED_PATHS is populated.
# WAL doesn't work on network filesystems (NFS/SMB/FUSE). Shared helper
# falls back to DELETE with one WARNING so kanban stays usable there.
# See hermes_state._WAL_INCOMPAT_MARKERS for detection logic.
from hermes_state import apply_wal_with_fallback
apply_wal_with_fallback(conn, db_label=f"kanban.db ({path.name})")
# FULL (was NORMAL): fsync before each checkpoint to narrow the
# crash window that can leave a b-tree page header torn.
conn.execute("PRAGMA synchronous=FULL")
conn.execute("PRAGMA wal_autocheckpoint=100")
conn.execute("PRAGMA foreign_keys=ON")
# Zero freed pages so a later torn write cannot expose stale
# cell content; persisted in the DB header for new DBs.
conn.execute("PRAGMA secure_delete=ON")
# Surface corrupt cells as read errors instead of silent
# wrong-data returns.
conn.execute("PRAGMA cell_size_check=ON")
needs_init = resolved not in _INITIALIZED_PATHS
if needs_init:
# Idempotent: runs CREATE TABLE IF NOT EXISTS + the additive
# migrations. Cached so subsequent connect() calls in the same
# process are cheap. The lock prevents same-process dispatcher
# threads from racing through the additive ALTER TABLE pass with
# stale PRAGMA snapshots during gateway startup.
conn.executescript(SCHEMA_SQL)
_migrate_add_optional_columns(conn)
_INITIALIZED_PATHS.add(resolved)
except Exception:
conn.close()
raise
return conn
@contextlib.contextmanager
def connect_closing(
    db_path: Optional[Path] = None,
    *,
    board: Optional[str] = None,
):
    """Open a kanban DB connection and guarantee it is closed on exit.

    Use this instead of ``with kb.connect() as conn:``. sqlite3's
    built-in connection context manager only commits or rolls back the
    transaction; it does NOT close the file descriptor. In long-lived
    processes (gateway, dashboard) that route every kanban operation
    through ``connect()`` (e.g. ``run_slash`` dispatching ``/kanban ``
    commands, ``decompose_task_endpoint`` calling
    ``kanban_decompose.decompose_task``), the unclosed connections
    accumulate as open FDs to ``kanban.db`` and ``kanban.db-wal``. After
    enough operations the process hits the kernel FD limit and dies
    with ``[Errno 24] Too many open files``.

    See #33159 for the production incident.

    The ``connect()`` function itself remains unchanged so callers that
    intentionally manage the connection lifetime (tests, long-lived
    callers) continue to work.
    """
    conn = connect(db_path=db_path, board=board)
    try:
        yield conn
    finally:
        try:
            conn.close()
        except Exception:
            pass
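The difference the docstring describes is standard `sqlite3` behaviour and can be reproduced with the standard library alone. A minimal sketch, using in-memory databases and `contextlib.closing` as a stand-in for `connect_closing` (the variable names are illustrative):

```python
import contextlib
import sqlite3

conn = sqlite3.connect(":memory:")
with conn:  # commits on success, rolls back on error; does not close
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
# The connection is still usable after the with-block.
still_open = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
conn.close()

# contextlib.closing calls close() on exit, which is the guarantee
# connect_closing() adds on top of connect().
with contextlib.closing(sqlite3.connect(":memory:")) as conn2:
    conn2.execute("SELECT 1")

try:
    conn2.execute("SELECT 1")
    closed_after_block = False
except sqlite3.ProgrammingError:
    closed_after_block = True
```

In a short-lived CLI process the leaked handle is reclaimed at exit, which is presumably why the problem only surfaced in the long-lived gateway and dashboard processes.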
def init_db(
db_path: Optional[Path] = None,
*,
@@ -4264,21 +4390,20 @@ def reap_worker_zombies() -> "list[int]":
Returns the list of reaped PIDs. Safe to call when there are no
children (returns []). No-op on Windows.
"""
if os.name == "nt":
return []
reaped: "list[int]" = []
try:
while True:
try:
pid, status = os.waitpid(-1, os.WNOHANG)
except ChildProcessError:
break
if pid == 0:
break
_record_worker_exit(pid, status)
reaped.append(pid)
except Exception:
pass
if os.name != "nt":
try:
while True:
try:
pid, status = os.waitpid(-1, os.WNOHANG)
except ChildProcessError:
break
if pid == 0:
break
_record_worker_exit(pid, status)
reaped.append(pid)
except Exception:
pass
return reaped
+4 -4
@@ -281,7 +281,7 @@ def decompose_task(
configured, API error, malformed response, decomposer returned
fanout=true with empty task list); those surface via ``ok=False``.
"""
with kb.connect() as conn:
with kb.connect_closing() as conn:
task = kb.get_task(conn, task_id)
if task is None:
return DecomposeOutcome(task_id, False, "unknown task id")
@@ -370,7 +370,7 @@ def decompose_task(
return DecomposeOutcome(
task_id, False, "decomposer returned fanout=false with no title/body",
)
with kb.connect() as conn:
with kb.connect_closing() as conn:
ok = kb.specify_triage_task(
conn,
task_id,
@@ -439,7 +439,7 @@ def decompose_task(
})
try:
with kb.connect() as conn:
with kb.connect_closing() as conn:
child_ids = kb.decompose_triage_task(
conn,
task_id,
@@ -467,7 +467,7 @@ def decompose_task(
def list_triage_ids(*, tenant: Optional[str] = None) -> list[str]:
"""Return task ids currently in the triage column."""
with kb.connect() as conn:
with kb.connect_closing() as conn:
rows = kb.list_tasks(
conn,
status="triage",
+3 -3
@@ -150,7 +150,7 @@ def specify_task(
error, malformed response); those surface via ``ok=False`` so the
``--all`` sweep can continue past individual failures.
"""
with kb.connect() as conn:
with kb.connect_closing() as conn:
task = kb.get_task(conn, task_id)
if task is None:
return SpecifyOutcome(task_id, False, "unknown task id")
@@ -239,7 +239,7 @@ def specify_task(
task_id, False, "LLM response missing title and body"
)
with kb.connect() as conn:
with kb.connect_closing() as conn:
ok = kb.specify_triage_task(
conn,
task_id,
@@ -261,7 +261,7 @@ def list_triage_ids(*, tenant: Optional[str] = None) -> list[str]:
``tenant`` narrows the sweep; ``None`` returns every triage task.
"""
with kb.connect() as conn:
with kb.connect_closing() as conn:
tasks = kb.list_tasks(
conn,
status="triage",
+190 -47
@@ -2997,6 +2997,7 @@ def _model_flow_nous(config, current_model="", args=None):
"""Nous Portal provider: ensure logged in, then pick model."""
from hermes_cli.auth import (
get_provider_auth_state,
NOUS_INFERENCE_AUTH_MODE_LEGACY,
_prompt_model_selection,
_save_model_choice,
_update_config_for_provider,
@@ -3092,8 +3093,21 @@ def _model_flow_nous(config, current_model="", args=None):
# Fetch live pricing (non-blocking — returns empty dict on failure)
pricing = get_pricing_for_provider("nous")
# Check if user is on free tier
free_tier = check_nous_free_tier()
# Force fresh account data for model selection so recent credit purchases
# are reflected immediately.
free_tier = check_nous_free_tier(force_fresh=True)
if not free_tier:
try:
refreshed_creds = resolve_nous_runtime_credentials(
min_key_ttl_seconds=5 * 60,
inference_auth_mode=NOUS_INFERENCE_AUTH_MODE_LEGACY,
)
if refreshed_creds:
creds = refreshed_creds
except Exception:
# Runtime inference has its own paid-entitlement recovery path; do
# not block model selection if this opportunistic remint fails.
pass
# Resolve portal URL early — needed both for upgrade links and for the
# freeRecommendedModels endpoint below.
@@ -3115,7 +3129,24 @@ def _model_flow_nous(config, current_model="", args=None):
# newly-launched paid models surface in the picker too — independent
# of CLI release cadence.
unavailable_models: list[str] = []
unavailable_message = ""
if free_tier:
try:
from hermes_cli.nous_account import (
format_nous_portal_entitlement_message,
get_nous_portal_account_info,
)
_account_info = get_nous_portal_account_info(force_fresh=True)
unavailable_message = (
format_nous_portal_entitlement_message(
_account_info,
capability="paid Nous models",
)
or ""
)
except Exception:
unavailable_message = ""
model_ids, pricing = union_with_portal_free_recommendations(
model_ids, pricing, _nous_portal_url,
)
@@ -3137,7 +3168,7 @@ def _model_flow_nous(config, current_model="", args=None):
from hermes_cli.auth import DEFAULT_NOUS_PORTAL_URL
_url = (_nous_portal_url or DEFAULT_NOUS_PORTAL_URL).rstrip("/")
print(f"Upgrade at {_url} to access paid models.")
print(unavailable_message or f"Upgrade at {_url} to access paid models.")
return
print(
@@ -3150,6 +3181,7 @@ def _model_flow_nous(config, current_model="", args=None):
pricing=pricing,
unavailable_models=unavailable_models,
portal_url=_nous_portal_url,
unavailable_message=unavailable_message,
)
if selected:
_save_model_choice(selected)
@@ -6477,6 +6509,104 @@ def _web_ui_build_needed(web_dir: Path) -> bool:
return False
def _run_with_idle_timeout(
cmd: list[str],
cwd: Path,
*,
idle_timeout_seconds: int = 180,
indent: str = " ",
) -> subprocess.CompletedProcess:
"""Run a subprocess that streams output, with an idle-output timeout.
Issue #33788: ``npm run build`` (Vite) was invoked with
``capture_output=True`` and no timeout. On low-memory hosts (notably
WSL2 with the default 4 GB cap) the build can stall or sit silent for
minutes; users see a frozen terminal, assume the update is hung, and
reboot, leaving the editable install in a half-state with the
``hermes`` launcher present but ``hermes_cli`` not importable.
This helper fixes both halves: stdout is streamed (so the user sees
progress), and if no bytes have appeared on stdout/stderr for
``idle_timeout_seconds``, the process is terminated and the call
returns with a non-zero ``returncode``. The caller's existing
stale-dist fallback (#23817) takes over from there.
Returns a ``CompletedProcess`` with merged stdout (text), empty
stderr, and an integer returncode. Never raises on idle timeout;
failure is propagated via the returncode.
"""
merged_chunks: list[str] = []
last_output_ts = _time.monotonic()
lock = threading.Lock()
try:
proc = subprocess.Popen(
cmd,
cwd=cwd,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
text=True,
encoding="utf-8",
errors="replace",
bufsize=1,
)
except OSError as exc:
# E.g. npm not on PATH between the which() check and now.
return subprocess.CompletedProcess(cmd, 127, stdout="", stderr=str(exc))
def _reader() -> None:
nonlocal last_output_ts
assert proc.stdout is not None
for line in proc.stdout:
try:
print(f"{indent}{line.rstrip()}", flush=True)
except UnicodeEncodeError:
# Windows cp1252 fallback — same pattern as _say().
enc = getattr(sys.stdout, "encoding", None) or "ascii"
safe = line.rstrip().encode(enc, errors="replace").decode(enc, errors="replace")
print(f"{indent}{safe}", flush=True)
with lock:
merged_chunks.append(line)
last_output_ts = _time.monotonic()
reader_thread = threading.Thread(target=_reader, daemon=True)
reader_thread.start()
idle_killed = False
while True:
try:
rc = proc.wait(timeout=5)
break
except subprocess.TimeoutExpired:
with lock:
idle = _time.monotonic() - last_output_ts
if idle > idle_timeout_seconds:
idle_killed = True
proc.terminate()
try:
rc = proc.wait(timeout=3)
except subprocess.TimeoutExpired:
proc.kill()
rc = proc.wait()
break
# Drain reader so we don't leak the stdout file descriptor.
reader_thread.join(timeout=2)
combined = "".join(merged_chunks)
if idle_killed:
msg = (
f"\n ⚠ Build produced no output for {idle_timeout_seconds}s — terminated.\n"
" Common causes: out-of-memory on a low-RAM host (WSL/container),\n"
" a stuck Node process, or an antivirus scan stalling I/O.\n"
)
combined += msg
# Force a non-zero rc even if terminate() raced with a clean exit.
if rc == 0:
rc = 124 # GNU `timeout` convention
return subprocess.CompletedProcess(cmd, rc, stdout=combined, stderr="")
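The idle-timeout pattern can be exercised without npm. The sketch below is a simplified stand-in, not the helper from the diff: `run_with_idle_timeout` here is a hypothetical function that returns a `(returncode, output)` tuple, does not echo output to the terminal, and kills the child immediately instead of trying `terminate()` first.

```python
import subprocess
import sys
import threading
import time


def run_with_idle_timeout(cmd: list[str], idle_seconds: float, poll: float = 0.2):
    """Return (returncode, output); kill the child after idle_seconds of silence."""
    chunks: list[str] = []
    last_output = time.monotonic()
    lock = threading.Lock()
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True, bufsize=1
    )

    def reader() -> None:
        nonlocal last_output
        assert proc.stdout is not None
        for line in proc.stdout:
            with lock:
                chunks.append(line)
                last_output = time.monotonic()  # any output resets the idle clock

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    idle_killed = False
    while True:
        try:
            rc = proc.wait(timeout=poll)
            break
        except subprocess.TimeoutExpired:
            with lock:
                idle = time.monotonic() - last_output
            if idle > idle_seconds:
                idle_killed = True
                proc.kill()
                rc = proc.wait()
                break
    thread.join(timeout=2)
    if idle_killed and rc == 0:
        rc = 124  # same convention as the diff: never report success after a kill
    return rc, "".join(chunks)


quick_rc, quick_out = run_with_idle_timeout(
    [sys.executable, "-c", "print('built')"], idle_seconds=30
)
stalled_rc, _ = run_with_idle_timeout(
    [sys.executable, "-c", "import time; time.sleep(60)"], idle_seconds=1
)
```

The timeout is measured from the last line of output rather than from process start, so a slow build that keeps printing progress is not killed; only a silent one is.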
def _run_npm_install_deterministic(
npm: str,
cwd: Path,
@@ -6582,31 +6712,26 @@ def _build_web_ui(web_dir: Path, *, fatal: bool = False) -> bool:
if fatal:
_say(" Run manually: cd web && npm install && npm run build")
return False
# First attempt
r2 = subprocess.run(
[npm, "run", "build"],
cwd=web_dir,
capture_output=True,
text=True,
encoding="utf-8",
errors="replace",
)
# First attempt — stream output via idle-timeout helper (issue #33788).
# capture_output=True on a long Vite build looks identical to a hang;
# users react by rebooting, which leaves the editable install in a
# half-state. Streaming + idle-kill makes failures observable AND
# recoverable (the stale-dist fallback below handles the kill path).
r2 = _run_with_idle_timeout([npm, "run", "build"], cwd=web_dir)
if r2.returncode != 0:
# Retry once after a short delay — covers boot-time races on Windows
# (antivirus scanning Node.js binaries, npm cache not ready, transient
# I/O when launched via Scheduled Task at logon). See issue #23817.
_time.sleep(3)
r2 = subprocess.run(
[npm, "run", "build"],
cwd=web_dir,
capture_output=True,
text=True,
encoding="utf-8",
errors="replace",
)
r2 = _run_with_idle_timeout([npm, "run", "build"], cwd=web_dir)
if r2.returncode != 0:
stderr_preview = (r2.stderr or "").strip()
# _run_with_idle_timeout merges stderr into stdout; older callers
# using subprocess.run kept them split. Pull from whichever has
# content so the error surfaces regardless of which path produced
# the CompletedProcess.
build_output = (r2.stderr or "") + (r2.stdout or "")
stderr_preview = build_output.strip()
stderr_tail = "\n ".join(stderr_preview.splitlines()[-10:]) if stderr_preview else ""
dist_dir = web_dir.parent / "hermes_cli" / "web_dist"
dist_index = dist_dir / "index.html"
@@ -7097,6 +7222,11 @@ def _update_via_zip(args):
_install_python_dependencies_with_optional_fallback(pip_cmd)
_update_node_dependencies()
# Core (Python deps + git pull / ZIP extract) is now complete; the CLI
# is functional from this point onward. The web UI build below is
# optional — a failure here only affects ``hermes dashboard``. Make
# that visible so users don't panic and reboot mid-build (#33788).
print("→ Core update complete. Building dashboard (optional)...")
_build_web_ui(PROJECT_ROOT / "web")
# Sync skills
@@ -8125,37 +8255,18 @@ def _install_psutil_android_compat(
nothing is persisted in the repository.
Stopgap: remove this once https://github.com/giampaolo/psutil/pull/2762
merges and ships in a release. ``scripts/install_psutil_android.py``
contains the same logic for ``scripts/install.sh`` (fresh installs).
Both copies should be removed together.
merges and ships in a release. The standalone installer script uses the
same shared helper and should be removed together.
"""
import tarfile
import tempfile
import urllib.request
psutil_url = (
"https://files.pythonhosted.org/packages/aa/c6/"
"d1ddf4abb55e93cebc4f2ed8b5d6dbad109ecb8d63748dd2b20ab5e57ebe/"
"psutil-7.2.2.tar.gz"
)
from hermes_cli.psutil_android import PSUTIL_URL, prepare_patched_psutil_sdist
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
archive = tmp_path / "psutil.tar.gz"
urllib.request.urlretrieve(psutil_url, archive)
with tarfile.open(archive) as tar:
tar.extractall(tmp_path)
src_root = next(
p for p in tmp_path.iterdir() if p.is_dir() and p.name.startswith("psutil-")
)
common_py = src_root / "psutil" / "_common.py"
content = common_py.read_text(encoding="utf-8")
marker = 'LINUX = sys.platform.startswith("linux")'
replacement = 'LINUX = sys.platform.startswith(("linux", "android"))'
if marker not in content:
raise RuntimeError("psutil Android compatibility patch marker not found")
common_py.write_text(content.replace(marker, replacement), encoding="utf-8")
urllib.request.urlretrieve(PSUTIL_URL, archive)
src_root = prepare_patched_psutil_sdist(archive, tmp_path)
_run_install_with_heartbeat(
install_cmd_prefix + ["install", "--no-build-isolation", str(src_root)],
@@ -8416,6 +8527,14 @@ def _cmd_update_check(branch: str = "main", *, branch_explicit: bool = False):
"""
from hermes_cli.config import detect_install_method
method = detect_install_method(PROJECT_ROOT)
if method == "docker":
# Docker can't ``git fetch`` from within the container. Surface the
# same long-form ``docker pull`` guidance ``hermes update`` (apply
# path) uses — telling the user to "reinstall via curl" or that
# ".git is missing" would point them at the wrong remediation.
from hermes_cli.config import format_docker_update_message
print(format_docker_update_message())
sys.exit(1)
if method == "pip":
from hermes_cli.config import recommended_update_command
from hermes_cli.banner import check_via_pypi
@@ -8716,12 +8835,27 @@ def cmd_update(args):
runs the update, then restores stdio on the way out (even on
``sys.exit`` or unhandled exceptions).
"""
from hermes_cli.config import is_managed, managed_error
from hermes_cli.config import (
detect_install_method,
format_docker_update_message,
is_managed,
managed_error,
)
if is_managed():
managed_error("update Hermes Agent")
return
# Docker users can't ``git pull`` — the image excludes ``.git`` from
# the build context. Bail with a friendly explanation pointing at
# ``docker pull`` BEFORE any of the apply-path / check-path branches
# below get a chance to error out with misleading "Not a git
# repository" text. See format_docker_update_message() for the full
# rationale and tag-pinning / config-persistence notes.
if detect_install_method(PROJECT_ROOT) == "docker":
print(format_docker_update_message())
sys.exit(1)
if getattr(args, "check", False):
# --check honors --branch so the "any new commits?" answer matches
# what a subsequent `hermes update --branch=<x>` would actually pull.
@@ -8997,7 +9131,7 @@ def _cmd_update_impl(args, gateway_mode: bool):
try:
from hermes_cli.backup import create_quick_snapshot
snap_id = create_quick_snapshot(label="pre-update")
snap_id = create_quick_snapshot(label="pre-update", keep=1)
if snap_id:
print(f" ✓ Pre-update snapshot: {snap_id}")
except Exception as exc:
@@ -9167,6 +9301,10 @@ def _cmd_update_impl(args, gateway_mode: bool):
_refresh_active_lazy_features()
_update_node_dependencies()
# See note above (ZIP path): core is now complete, web UI build is
# optional from a CLI perspective. Telegraphing this avoids the
# "stuck at webui-build → reboot → broken install" trap (#33788).
print("→ Core update complete. Building dashboard (optional)...")
_build_web_ui(PROJECT_ROOT / "web")
print()
@@ -12518,6 +12656,11 @@ Examples:
],
)
skills_search.add_argument("--limit", type=int, default=10, help="Max results")
skills_search.add_argument(
"--json",
action="store_true",
help="Output JSON instead of a table (full identifiers, scripting-friendly)",
)
skills_install = skills_subparsers.add_parser("install", help="Install a skill")
skills_install.add_argument(
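The new `--json` flag on `skills search` is a plain `store_true` switch. The handler that consumes it is not part of this hunk, so the following is only a minimal sketch of how such a flag might select the output format. The demo parser, the `render` helper, and the sample result are all hypothetical.

```python
import argparse
import json

# Hypothetical demo parser; the real one lives under the `skills` subparsers.
parser = argparse.ArgumentParser(prog="skills-search-demo")
parser.add_argument("query")
parser.add_argument("--limit", type=int, default=10, help="Max results")
parser.add_argument("--json", action="store_true", help="Output JSON instead of a table")


def render(results, as_json: bool) -> str:
    """Return JSON for scripts, or a simple aligned table for humans."""
    if as_json:
        return json.dumps(results, indent=2)
    return "\n".join(f"{r['id']:<24} {r['name']}" for r in results)


args = parser.parse_args(["pdf", "--json"])
sample = [{"id": "example/pdf-tools", "name": "PDF tools"}]  # made-up result
print(render(sample, args.json))
```

JSON output keeps identifiers untruncated, which is what the flag's help text says it is for.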
+23 -21
@@ -32,6 +32,8 @@ COPILOT_REASONING_EFFORTS_O_SERIES = ["low", "medium", "high"]
# Fallback OpenRouter snapshot used when the live catalog is unavailable.
# (model_id, display description shown in menus)
OPENROUTER_MODELS: list[tuple[str, str]] = [
("anthropic/claude-opus-4.8", ""),
("anthropic/claude-opus-4.8-fast", "2x price, higher output speed"),
("anthropic/claude-opus-4.7", ""),
("anthropic/claude-opus-4.6", ""),
("anthropic/claude-sonnet-4.6", ""),
@@ -139,6 +141,7 @@ def _xai_curated_models() -> list[str]:
_PROVIDER_MODELS: dict[str, list[str]] = {
"nous": [
"anthropic/claude-opus-4.8",
"anthropic/claude-opus-4.7",
"anthropic/claude-opus-4.6",
"anthropic/claude-sonnet-4.6",
@@ -290,6 +293,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"MiniMax-M2",
],
"anthropic": [
"claude-opus-4-8",
"claude-opus-4-7",
"claude-opus-4-6",
"claude-sonnet-4-6",
@@ -518,9 +522,19 @@ def fetch_nous_account_tier(access_token: str, portal_base_url: str = "") -> dic
def is_nous_free_tier(account_info: dict[str, Any]) -> bool:
"""Return True if the account info indicates a free (unpaid) tier.
Checks ``subscription.monthly_charge == 0``. Returns False when
the field is missing or unparseable (assumes paid; don't block users).
Prefer the Portal's explicit ``paid_service_access.allowed`` entitlement
decision. Legacy payloads fall back to ``subscription.monthly_charge == 0``.
Returns False when both signals are missing or unparseable.
"""
paid_access = account_info.get("paid_service_access")
if isinstance(paid_access, dict):
allowed = paid_access.get("allowed")
if isinstance(allowed, bool):
return not allowed
paid = paid_access.get("paid_access")
if isinstance(paid, bool):
return not paid
sub = account_info.get("subscription")
if not isinstance(sub, dict):
return False
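The precedence described in the new docstring can be restated as a standalone function. This is a sketch under assumptions: `free_tier_from_payload` is a hypothetical name, and the legacy `monthly_charge` fallback is reconstructed from the docstring because the rest of the function falls outside this hunk.

```python
from typing import Any


def free_tier_from_payload(account_info: dict[str, Any]) -> bool:
    """Hypothetical restatement of the entitlement precedence."""
    paid_access = account_info.get("paid_service_access")
    if isinstance(paid_access, dict):
        # An explicit entitlement decision wins over any price heuristic.
        for key in ("allowed", "paid_access"):
            value = paid_access.get(key)
            if isinstance(value, bool):
                return not value
    # Legacy fallback (assumed from the docstring, not shown in the hunk).
    sub = account_info.get("subscription")
    if not isinstance(sub, dict):
        return False
    charge = sub.get("monthly_charge")
    if isinstance(charge, bool) or not isinstance(charge, (int, float)):
        return False
    return charge == 0
```

With this ordering, a payload that carries `allowed: True` is treated as paid even if its `monthly_charge` is zero, which matches the case of purchased credits without a subscription.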
@@ -699,40 +713,28 @@ _FREE_TIER_CACHE_TTL: int = 180 # seconds (3 minutes)
_free_tier_cache: tuple[bool, float] | None = None # (result, timestamp)
def check_nous_free_tier() -> bool:
def check_nous_free_tier(*, force_fresh: bool = False) -> bool:
"""Check if the current Nous Portal user is on a free (unpaid) tier.
Results are cached for ``_FREE_TIER_CACHE_TTL`` seconds to avoid
hitting the Portal API on every call. The cache is short-lived so
that an account upgrade is reflected within a few minutes.
Returns False (assume paid) on any error; never blocks paying users.
Returns True only when entitlement is known to be free. Unknown/error
states return False so this compatibility wrapper does not block users.
"""
global _free_tier_cache
now = time.monotonic()
if _free_tier_cache is not None:
if not force_fresh and _free_tier_cache is not None:
cached_result, cached_at = _free_tier_cache
if now - cached_at < _FREE_TIER_CACHE_TTL:
return cached_result
try:
from hermes_cli.auth import get_provider_auth_state, resolve_nous_runtime_credentials
from hermes_cli.nous_account import get_nous_portal_account_info
# Ensure we have a fresh token (triggers refresh if needed)
resolve_nous_runtime_credentials(min_key_ttl_seconds=60)
state = get_provider_auth_state("nous")
if not state:
_free_tier_cache = (False, now)
return False
access_token = state.get("access_token", "")
portal_url = state.get("portal_base_url", "")
if not access_token:
_free_tier_cache = (False, now)
return False
account_info = fetch_nous_account_tier(access_token, portal_url)
result = is_nous_free_tier(account_info)
account_info = get_nous_portal_account_info(force_fresh=force_fresh)
result = account_info.is_free_tier
_free_tier_cache = (result, now)
return result
except Exception:
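The caching behaviour of `check_nous_free_tier` (a short TTL plus a `force_fresh` bypass) follows a common pattern. Below is a minimal sketch of that pattern in isolation. `cached_check` and the `fetch` callable are hypothetical names, and the error handling of the real function is left out.

```python
import time
from typing import Callable, Optional

_TTL_SECONDS = 180  # mirrors _FREE_TIER_CACHE_TTL in the diff
_cache: Optional[tuple[bool, float]] = None  # (result, monotonic timestamp)


def cached_check(fetch: Callable[[], bool], *, force_fresh: bool = False) -> bool:
    """Return the cached result unless it expired or a refresh is forced."""
    global _cache
    now = time.monotonic()
    if not force_fresh and _cache is not None:
        result, cached_at = _cache
        if now - cached_at < _TTL_SECONDS:
            return result
    # Cache miss, expiry, or forced refresh: fetch and store a new entry.
    result = fetch()
    _cache = (result, now)
    return result
```

A forced refresh still writes its result back to the cache, so later non-forced calls benefit from it.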
+678
@@ -0,0 +1,678 @@
"""Normalized Nous Portal account entitlement helpers."""
from __future__ import annotations
import hashlib
import json
import time
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Literal, Optional
NousAccountInfoSource = Literal["jwt", "account_api", "inference_key", "none", "error"]
_ACCOUNT_INFO_CACHE_TTL = 60
_account_info_cache: tuple[str, float, "NousPortalAccountInfo"] | None = None
@dataclass(frozen=True)
class NousPortalSubscriptionInfo:
plan: Optional[str] = None
tier: Optional[int] = None
monthly_charge: Optional[float] = None
current_period_end: Optional[str] = None
credits_remaining: Optional[float] = None
rollover_credits: Optional[float] = None
@dataclass(frozen=True)
class NousPaidServiceAccessInfo:
allowed: Optional[bool] = None
paid_access: Optional[bool] = None
reason: Optional[str] = None
organisation_id: Optional[str] = None
effective_at_ms: Optional[int] = None
has_active_subscription: Optional[bool] = None
active_subscription_is_paid: Optional[bool] = None
subscription_tier: Optional[int] = None
subscription_monthly_charge: Optional[float] = None
subscription_credits_remaining: Optional[float] = None
purchased_credits_remaining: Optional[float] = None
total_usable_credits: Optional[float] = None
@dataclass(frozen=True)
class NousPortalAccountInfo:
logged_in: bool
source: NousAccountInfoSource
fresh: bool
user_id: Optional[str] = None
org_id: Optional[str] = None
client_id: Optional[str] = None
product_id: Optional[str] = None
nous_client: Optional[str] = None
portal_base_url: Optional[str] = None
inference_base_url: Optional[str] = None
inference_credential_present: bool = False
credential_source: Optional[str] = None
expires_at: Optional[datetime] = None
email: Optional[str] = None
privy_did: Optional[str] = None
subscription: Optional[NousPortalSubscriptionInfo] = None
paid_service_access: Optional[bool] = None
paid_service_access_info: Optional[NousPaidServiceAccessInfo] = None
raw_claims: Optional[dict[str, Any]] = None
raw_account: Optional[dict[str, Any]] = None
error: Optional[str] = None
@property
def is_paid(self) -> bool:
return self.paid_service_access is True
@property
def is_free_tier(self) -> bool:
return self.paid_service_access is False
@property
def tool_gateway_entitled(self) -> bool:
return self.paid_service_access is True
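The three properties above treat `paid_service_access` as a tri-state value: `True`, `False`, or `None` for "unknown". A trimmed illustration follows. The `Entitlement` class is a hypothetical stand-in that keeps only the one field and two of the properties.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entitlement:  # hypothetical, trimmed stand-in for NousPortalAccountInfo
    paid_service_access: Optional[bool] = None

    @property
    def is_paid(self) -> bool:
        return self.paid_service_access is True

    @property
    def is_free_tier(self) -> bool:
        return self.paid_service_access is False


unknown = Entitlement()
print(unknown.is_paid, unknown.is_free_tier)  # False False: unknown is neither
```

Because both checks use identity comparison, an unknown entitlement is never reported as free tier, and it is never reported as paid either.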
def nous_portal_billing_url(account_info: Optional[NousPortalAccountInfo] = None) -> str:
"""Return the billing URL for a normalized Nous account snapshot."""
try:
from hermes_cli.auth import DEFAULT_NOUS_PORTAL_URL
except Exception:
DEFAULT_NOUS_PORTAL_URL = "https://portal.nousresearch.com"
base = None
if account_info is not None:
base = account_info.portal_base_url
if not isinstance(base, str) or not base.strip():
base = DEFAULT_NOUS_PORTAL_URL
return f"{base.rstrip('/')}/billing"
def format_nous_portal_entitlement_message(
account_info: Optional[NousPortalAccountInfo],
*,
capability: str = "this feature",
include_refresh_hint: bool = True,
) -> Optional[str]:
"""Return user-facing guidance for a missing Nous paid entitlement.
``None`` means the account is known to have paid service access. The
message intentionally works from normalized entitlement fields rather than
subscription price alone: purchased credits without a subscription still
count as paid access, while a paid subscription with exhausted usable
credits does not.
"""
billing_url = nous_portal_billing_url(account_info)
if account_info is not None and account_info.paid_service_access is True:
return None
if account_info is None:
return (
f"Hermes could not verify your Nous Portal entitlement, so {capability} "
f"is unavailable. Run `hermes model` to refresh your login, or check "
f"billing at {billing_url}."
)
if not account_info.logged_in:
if account_info.inference_credential_present:
return (
f"Nous inference credentials are configured, but Hermes cannot verify "
f"your Nous Portal paid access for {capability}. Log in with "
f"`hermes model` to enable Portal-managed features. Billing and "
f"credits are managed at {billing_url}."
)
return (
f"Log in to Nous Portal to use {capability}: run `hermes model`. "
f"Billing and credits are managed at {billing_url}."
)
if account_info.paid_service_access is None:
detail = (
f"Hermes could not verify your Nous Portal paid access, so {capability} "
f"is unavailable."
)
if account_info.error:
detail += f" Account lookup failed: {account_info.error}."
if include_refresh_hint:
detail += " Run `hermes model` to refresh your session."
detail += f" Check billing at {billing_url}."
return detail
access = account_info.paid_service_access_info
reason = access.reason if access else None
if reason == "account_missing":
return (
f"Hermes could not find a Nous Portal account or organisation for this "
f"login, so {capability} is unavailable. Run `hermes model` to "
f"authenticate again; if the problem persists, contact Nous support."
)
if reason == "no_usable_credits" or account_info.paid_service_access is False:
message = _no_paid_access_message(account_info, capability, billing_url)
if include_refresh_hint and not account_info.fresh:
message += " If you recently bought credits, run `hermes model` to refresh Hermes."
return message
return (
f"Your Nous Portal account does not currently have paid service access, "
f"so {capability} is unavailable. Add credits or update billing at {billing_url}."
)
def _no_paid_access_message(
account_info: NousPortalAccountInfo,
capability: str,
billing_url: str,
) -> str:
access = account_info.paid_service_access_info
has_active_subscription = access.has_active_subscription if access else None
active_subscription_is_paid = access.active_subscription_is_paid if access else None
total_usable = access.total_usable_credits if access else None
subscription_credits = access.subscription_credits_remaining if access else None
purchased_credits = access.purchased_credits_remaining if access else None
if has_active_subscription and active_subscription_is_paid:
credit_detail = _credit_detail(total_usable, subscription_credits, purchased_credits)
return (
f"Your Nous Portal credits are exhausted{credit_detail}, so {capability} "
f"is unavailable. Top up or renew credits at {billing_url}."
)
if has_active_subscription and active_subscription_is_paid is False:
return (
f"Your current Nous Portal plan does not include paid service access, "
f"so {capability} is unavailable. Upgrade or add credits at {billing_url}."
)
if has_active_subscription is False:
credit_detail = _credit_detail(total_usable, subscription_credits, purchased_credits)
return (
f"Your Nous Portal account has no active subscription or usable credits"
f"{credit_detail}, so {capability} is unavailable. Subscribe or add credits "
f"at {billing_url}."
)
credit_detail = _credit_detail(total_usable, subscription_credits, purchased_credits)
return (
f"Your Nous Portal account has no usable paid credits{credit_detail}, so "
f"{capability} is unavailable. Add credits or update billing at {billing_url}."
)
def _credit_detail(
total_usable: Optional[float],
subscription_credits: Optional[float],
purchased_credits: Optional[float],
) -> str:
parts: list[str] = []
if total_usable is not None:
parts.append(f"usable ${total_usable:.2f}")
if subscription_credits is not None:
parts.append(f"subscription ${subscription_credits:.2f}")
if purchased_credits is not None:
parts.append(f"purchased ${purchased_credits:.2f}")
if not parts:
return ""
return f" ({', '.join(parts)})"
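As a quick illustration of what `_credit_detail` contributes to the messages above, the function can be exercised on its own. The body below is copied from the diff under the hypothetical name `credit_detail`, and the input values are made up.

```python
from typing import Optional


def credit_detail(
    total_usable: Optional[float],
    subscription_credits: Optional[float],
    purchased_credits: Optional[float],
) -> str:
    """Build a parenthesised credit summary, skipping unknown values."""
    parts: list[str] = []
    if total_usable is not None:
        parts.append(f"usable ${total_usable:.2f}")
    if subscription_credits is not None:
        parts.append(f"subscription ${subscription_credits:.2f}")
    if purchased_credits is not None:
        parts.append(f"purchased ${purchased_credits:.2f}")
    if not parts:
        return ""
    return f" ({', '.join(parts)})"


print(repr(credit_detail(0.0, 0.0, None)))  # ' (usable $0.00, subscription $0.00)'
print(repr(credit_detail(None, None, None)))  # ''
```

The leading space is part of the return value, so callers can append it directly after a phrase without handling the empty case.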
def reset_nous_portal_account_info_cache() -> None:
"""Clear the short-lived account-info cache used by tests."""
global _account_info_cache
_account_info_cache = None
def get_nous_portal_account_info(
*,
force_fresh: bool = False,
min_jwt_ttl_seconds: int = 60,
) -> NousPortalAccountInfo:
"""Return normalized Nous Portal account entitlement information.
By default, a valid unexpired OAuth access JWT is used as a low-latency
local account snapshot. ``force_fresh=True`` always calls
``/api/oauth/account`` and bypasses the short-lived cache. JWT claims are
decoded locally for UX gating only; server APIs remain authoritative.
"""
try:
from hermes_cli.auth import get_provider_auth_state
state = get_provider_auth_state("nous") or {}
except Exception as exc:
return _error_info(error=exc, logged_in=False)
access_token = state.get("access_token")
portal_base_url = _portal_base_url(state)
if not isinstance(access_token, str) or not access_token.strip():
pool_oauth_info = _info_from_oauth_pool(
force_fresh=force_fresh,
min_jwt_ttl_seconds=min_jwt_ttl_seconds,
portal_base_url=portal_base_url,
)
if pool_oauth_info is not None:
return pool_oauth_info
pool_info = _info_from_inference_key_pool(portal_base_url)
if pool_info is not None:
return pool_info
return NousPortalAccountInfo(
logged_in=False,
source="none",
fresh=False,
portal_base_url=portal_base_url,
)
if not force_fresh:
jwt_info = _info_from_valid_jwt(
access_token,
state=state,
portal_base_url=portal_base_url,
min_jwt_ttl_seconds=min_jwt_ttl_seconds,
)
if jwt_info is not None:
return jwt_info
return _fresh_account_info(
state=state,
force_fresh=force_fresh,
portal_base_url=portal_base_url,
)
def _fresh_account_info(
*,
state: dict[str, Any],
force_fresh: bool,
portal_base_url: Optional[str],
) -> NousPortalAccountInfo:
global _account_info_cache
try:
from hermes_cli.auth import get_provider_auth_state, resolve_nous_access_token
access_token = resolve_nous_access_token()
refreshed_state = get_provider_auth_state("nous") or state
portal_base_url = _portal_base_url(refreshed_state) or portal_base_url
cache_key = _cache_key(access_token, portal_base_url)
if not force_fresh and _account_info_cache is not None:
cached_key, cached_at, cached_info = _account_info_cache
if cached_key == cache_key and (time.monotonic() - cached_at) < _ACCOUNT_INFO_CACHE_TTL:
return cached_info
payload = _fetch_nous_account_info(access_token, portal_base_url)
if not payload:
return _error_info(
error="empty_account_response",
logged_in=True,
portal_base_url=portal_base_url,
)
if isinstance(payload.get("error"), str):
return _error_info(
error=payload.get("error") or "account_response_error",
logged_in=True,
portal_base_url=portal_base_url,
raw_account=payload,
)
info = _info_from_account_payload(
payload,
state=refreshed_state,
portal_base_url=portal_base_url,
)
_account_info_cache = (cache_key, time.monotonic(), info)
return info
except Exception as exc:
return _error_info(
error=exc,
logged_in=bool(state.get("access_token")),
portal_base_url=portal_base_url,
)
def _info_from_inference_key_pool(
portal_base_url: Optional[str],
) -> Optional[NousPortalAccountInfo]:
"""Return an explicit unknown-entitlement snapshot for opaque Nous keys."""
try:
entry = _select_nous_pool_entry()
if entry is None:
return None
runtime_key = getattr(entry, "runtime_api_key", None) or getattr(entry, "access_token", "")
if not isinstance(runtime_key, str) or not runtime_key.strip():
return None
return NousPortalAccountInfo(
logged_in=False,
source="inference_key",
fresh=False,
portal_base_url=(
getattr(entry, "portal_base_url", None)
or portal_base_url
),
inference_base_url=(
getattr(entry, "inference_base_url", None)
or getattr(entry, "runtime_base_url", None)
or getattr(entry, "base_url", None)
),
inference_credential_present=True,
credential_source=f"pool:{getattr(entry, 'label', 'unknown')}",
error="portal_oauth_missing",
)
except Exception:
return None
def _info_from_oauth_pool(
*,
force_fresh: bool,
min_jwt_ttl_seconds: int,
portal_base_url: Optional[str],
) -> Optional[NousPortalAccountInfo]:
try:
entry = _select_nous_pool_entry()
except Exception:
return None
if entry is None or not _pool_entry_is_portal_oauth(entry):
return None
access_token = getattr(entry, "access_token", None)
if not isinstance(access_token, str) or not access_token.strip():
return None
entry_portal_url = (
getattr(entry, "portal_base_url", None)
or portal_base_url
)
state = {
"access_token": access_token,
"client_id": getattr(entry, "client_id", None),
"inference_base_url": (
getattr(entry, "inference_base_url", None)
or getattr(entry, "runtime_base_url", None)
or getattr(entry, "base_url", None)
),
"agent_key": getattr(entry, "agent_key", None),
"credential_source": f"pool:{getattr(entry, 'label', 'unknown')}",
}
if not force_fresh:
jwt_info = _info_from_valid_jwt(
access_token,
state=state,
portal_base_url=entry_portal_url,
min_jwt_ttl_seconds=min_jwt_ttl_seconds,
)
if jwt_info is not None:
return jwt_info
try:
payload = _fetch_nous_account_info(access_token, entry_portal_url)
except Exception as exc:
return _error_info(
error=exc,
logged_in=True,
portal_base_url=entry_portal_url,
)
if not payload:
return _error_info(
error="empty_account_response",
logged_in=True,
portal_base_url=entry_portal_url,
)
if isinstance(payload.get("error"), str):
return _error_info(
error=payload.get("error") or "account_response_error",
logged_in=True,
portal_base_url=entry_portal_url,
raw_account=payload,
)
return _info_from_account_payload(
payload,
state=state,
portal_base_url=entry_portal_url,
)
def _select_nous_pool_entry() -> Optional[Any]:
from agent.credential_pool import load_pool
pool = load_pool("nous")
if not pool or not pool.has_credentials():
return None
entries = list(pool.entries())
if not entries:
return None
def _entry_sort_key(entry: Any) -> tuple[float, float, int]:
agent_exp = _parse_iso_timestamp(getattr(entry, "agent_key_expires_at", None)) or 0.0
access_exp = _parse_iso_timestamp(getattr(entry, "expires_at", None)) or 0.0
priority = int(getattr(entry, "priority", 0) or 0)
return (agent_exp, access_exp, -priority)
return max(entries, key=_entry_sort_key)
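The sort key in `_select_nous_pool_entry` picks the entry with the latest agent-key expiry, then the latest access-token expiry, and finally the lowest priority number. A small sketch of that ordering follows. It uses plain float timestamps instead of ISO strings, and the entry objects and attribute names `agent_exp` and `access_exp` are hypothetical simplifications.

```python
from types import SimpleNamespace


def pick_entry(entries):
    """Pick the entry with the latest expiries; lower priority number breaks ties."""
    def sort_key(entry):
        agent_exp = getattr(entry, "agent_exp", None) or 0.0
        access_exp = getattr(entry, "access_exp", None) or 0.0
        priority = int(getattr(entry, "priority", 0) or 0)
        # Negating priority makes max() prefer the smaller priority number.
        return (agent_exp, access_exp, -priority)
    return max(entries, key=sort_key)


entries = [
    SimpleNamespace(label="a", agent_exp=100.0, access_exp=50.0, priority=1),
    SimpleNamespace(label="b", agent_exp=200.0, access_exp=10.0, priority=5),
    SimpleNamespace(label="c", agent_exp=200.0, access_exp=10.0, priority=2),
]
print(pick_entry(entries).label)  # c
```

Entries `b` and `c` tie on both expiries, so the negated priority decides in favour of `c`.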
def _pool_entry_is_portal_oauth(entry: Any) -> bool:
access_token = getattr(entry, "access_token", None)
if not isinstance(access_token, str) or not access_token.strip():
return False
auth_type = str(getattr(entry, "auth_type", "") or "").strip().lower()
refresh_token = getattr(entry, "refresh_token", None)
return auth_type.startswith("oauth") or bool(refresh_token)
def _fetch_nous_account_info(
access_token: str,
portal_base_url: Optional[str] = None,
) -> dict[str, Any]:
base = (portal_base_url or "https://portal.nousresearch.com").rstrip("/")
url = f"{base}/api/oauth/account"
headers = {
"Authorization": f"Bearer {access_token}",
"Accept": "application/json",
}
req = urllib.request.Request(url, headers=headers)
with urllib.request.urlopen(req, timeout=8) as resp:
payload = json.loads(resp.read().decode())
return payload if isinstance(payload, dict) else {}
def _info_from_valid_jwt(
token: str,
*,
state: dict[str, Any],
portal_base_url: Optional[str],
min_jwt_ttl_seconds: int,
) -> Optional[NousPortalAccountInfo]:
try:
from hermes_cli.auth import _decode_jwt_claims
except Exception:
return None
claims = _decode_jwt_claims(token)
if not claims:
return None
exp = _coerce_float(claims.get("exp"))
if exp is None or exp <= time.time() + max(0, int(min_jwt_ttl_seconds)):
return None
paid_access = _coerce_bool(claims.get("paid_access"))
subscription_tier = _coerce_int(claims.get("subscription_tier"))
access_info = NousPaidServiceAccessInfo(
allowed=paid_access,
paid_access=paid_access,
organisation_id=_coerce_str(claims.get("org_id")),
subscription_tier=subscription_tier,
)
return NousPortalAccountInfo(
logged_in=True,
source="jwt",
fresh=False,
user_id=_coerce_str(claims.get("sub")),
org_id=_coerce_str(claims.get("org_id")),
client_id=_coerce_str(claims.get("client_id") or state.get("client_id")),
product_id=_coerce_str(claims.get("product_id")),
nous_client=_coerce_str(claims.get("nous_client")),
portal_base_url=portal_base_url,
inference_base_url=_coerce_str(state.get("inference_base_url")),
inference_credential_present=True,
credential_source=_coerce_str(state.get("credential_source")) or "auth_store",
expires_at=datetime.fromtimestamp(exp, tz=timezone.utc),
paid_service_access=paid_access,
paid_service_access_info=access_info,
raw_claims=dict(claims),
)
def _info_from_account_payload(
payload: dict[str, Any],
*,
state: dict[str, Any],
portal_base_url: Optional[str],
) -> NousPortalAccountInfo:
user = payload.get("user") if isinstance(payload.get("user"), dict) else {}
organisation = (
payload.get("organisation")
if isinstance(payload.get("organisation"), dict)
else {}
)
subscription = _subscription_from_payload(payload.get("subscription"))
access = _paid_service_access_from_payload(payload.get("paid_service_access"))
paid_access = access.allowed if access else None
if paid_access is None and access is not None:
paid_access = access.paid_access
return NousPortalAccountInfo(
logged_in=True,
source="account_api",
fresh=True,
org_id=_coerce_str(organisation.get("id")) or (access.organisation_id if access else None),
client_id=_coerce_str(state.get("client_id")),
portal_base_url=portal_base_url,
inference_base_url=_coerce_str(state.get("inference_base_url")),
inference_credential_present=bool(state.get("access_token") or state.get("agent_key")),
credential_source=_coerce_str(state.get("credential_source")) or "auth_store",
email=_coerce_str(user.get("email")),
privy_did=_coerce_str(user.get("privy_did")),
subscription=subscription,
paid_service_access=paid_access,
paid_service_access_info=access,
raw_account=dict(payload),
)
def _subscription_from_payload(value: Any) -> Optional[NousPortalSubscriptionInfo]:
if not isinstance(value, dict):
return None
return NousPortalSubscriptionInfo(
plan=_coerce_str(value.get("plan")),
tier=_coerce_int(value.get("tier")),
monthly_charge=_coerce_float(value.get("monthly_charge")),
current_period_end=_coerce_str(value.get("current_period_end")),
credits_remaining=_coerce_float(value.get("credits_remaining")),
rollover_credits=_coerce_float(value.get("rollover_credits")),
)
def _paid_service_access_from_payload(value: Any) -> Optional[NousPaidServiceAccessInfo]:
if not isinstance(value, dict):
return None
allowed = _coerce_bool(value.get("allowed"))
paid_access = _coerce_bool(value.get("paid_access"))
return NousPaidServiceAccessInfo(
allowed=allowed,
paid_access=paid_access,
reason=_coerce_str(value.get("reason")),
organisation_id=_coerce_str(value.get("organisation_id")),
effective_at_ms=_coerce_int(value.get("effective_at_ms")),
has_active_subscription=_coerce_bool(value.get("has_active_subscription")),
active_subscription_is_paid=_coerce_bool(value.get("active_subscription_is_paid")),
subscription_tier=_coerce_int(value.get("subscription_tier")),
subscription_monthly_charge=_coerce_float(value.get("subscription_monthly_charge")),
subscription_credits_remaining=_coerce_float(value.get("subscription_credits_remaining")),
purchased_credits_remaining=_coerce_float(value.get("purchased_credits_remaining")),
total_usable_credits=_coerce_float(value.get("total_usable_credits")),
)
def _error_info(
*,
error: object,
logged_in: bool,
portal_base_url: Optional[str] = None,
raw_account: Optional[dict[str, Any]] = None,
) -> NousPortalAccountInfo:
return NousPortalAccountInfo(
logged_in=logged_in,
source="error",
fresh=False,
portal_base_url=portal_base_url,
raw_account=raw_account,
error=str(error),
)
def _portal_base_url(state: dict[str, Any]) -> Optional[str]:
value = state.get("portal_base_url")
if not isinstance(value, str) or not value.strip():
return None
return value.strip().rstrip("/")
def _cache_key(access_token: str, portal_base_url: Optional[str]) -> str:
digest = hashlib.sha256(access_token.encode("utf-8")).hexdigest()
return f"{portal_base_url or ''}:{digest}"
def _parse_iso_timestamp(value: Any) -> Optional[float]:
if not isinstance(value, str) or not value:
return None
text = value.strip()
if text.endswith("Z"):
text = text[:-1] + "+00:00"
try:
return datetime.fromisoformat(text).timestamp()
except Exception:
return None
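The `Z` rewrite in `_parse_iso_timestamp` matters because `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onward. The sketch below shows the same normalisation in isolation. `parse_utc` is a hypothetical name and the timestamp is an arbitrary example.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_utc(text: str) -> Optional[float]:
    """Parse an ISO 8601 string to epoch seconds, accepting a trailing 'Z'."""
    text = text.strip()
    if text.endswith("Z"):
        # Older interpreters reject 'Z', so spell out the UTC offset.
        text = text[:-1] + "+00:00"
    try:
        return datetime.fromisoformat(text).timestamp()
    except ValueError:
        return None


expected = datetime(2026, 5, 28, 17, 45, 33, tzinfo=timezone.utc).timestamp()
assert parse_utc("2026-05-28T17:45:33Z") == expected
assert parse_utc("not a timestamp") is None
```

A string without any offset is parsed as a naive datetime, and `timestamp()` then interprets it in the local timezone.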
def _coerce_str(value: Any) -> Optional[str]:
if isinstance(value, str) and value:
return value
return None
def _coerce_bool(value: Any) -> Optional[bool]:
return value if isinstance(value, bool) else None
def _coerce_int(value: Any) -> Optional[int]:
if isinstance(value, bool):
return None
try:
if value is None:
return None
return int(value)
except (TypeError, ValueError):
return None
def _coerce_float(value: Any) -> Optional[float]:
if isinstance(value, bool):
return None
try:
if value is None:
return None
return float(value)
except (TypeError, ValueError):
return None
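The explicit `bool` guard in `_coerce_int` and `_coerce_float` is needed because `bool` is a subclass of `int` in Python, so `int(True)` would otherwise quietly become `1`. A short demonstration, using the hypothetical name `coerce_int` for a condensed copy of the helper:

```python
from typing import Any, Optional


def coerce_int(value: Any) -> Optional[int]:
    if isinstance(value, bool):  # bool is a subclass of int; reject it explicitly
        return None
    try:
        return None if value is None else int(value)
    except (TypeError, ValueError):
        return None


print(int(True), coerce_int(True), coerce_int("3"), coerce_int("x"))  # 1 None 3 None
```

Without the guard, a payload that sent `true` where a tier number was expected would be read as tier 1.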
+40 -11
@@ -6,8 +6,8 @@ from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterable, Optional, Set
from hermes_cli.auth import get_nous_auth_status
from hermes_cli.config import get_env_value, load_config
from hermes_cli.nous_account import NousPortalAccountInfo, get_nous_portal_account_info
from tools.managed_tool_gateway import is_managed_tool_gateway_ready
from utils import is_truthy_value
from tools.tool_backend_helpers import (
@@ -53,6 +53,7 @@ class NousSubscriptionFeatures:
nous_auth_present: bool
provider_is_nous: bool
features: Dict[str, NousFeatureState]
account_info: Optional[NousPortalAccountInfo] = None
@property
def web(self) -> NousFeatureState:
@@ -227,6 +228,8 @@ def _resolve_browser_feature_state(
def get_nous_subscription_features(
config: Optional[Dict[str, object]] = None,
*,
force_fresh: bool = False,
) -> NousSubscriptionFeatures:
if config is None:
config = load_config() or {}
@@ -235,12 +238,19 @@ def get_nous_subscription_features(
provider_is_nous = str(model_cfg.get("provider") or "").strip().lower() == "nous"
try:
nous_status = get_nous_auth_status()
if force_fresh:
account_info = get_nous_portal_account_info(force_fresh=True)
else:
account_info = get_nous_portal_account_info()
except Exception:
nous_status = {}
account_info = None
managed_tools_flag = managed_nous_tools_enabled()
nous_auth_present = bool(nous_status.get("logged_in"))
managed_tools_flag = bool(
account_info
and account_info.logged_in
and account_info.paid_service_access is True
)
nous_auth_present = bool(account_info and account_info.logged_in)
subscribed = provider_is_nous or nous_auth_present
web_tool_enabled = _toolset_enabled(config, "web")
@@ -317,6 +327,7 @@ def get_nous_subscription_features(
modal_mode,
has_direct=direct_modal,
managed_ready=managed_modal_available,
managed_enabled=managed_tools_flag,
)
web_managed = web_backend == "firecrawl" and managed_web_available and not direct_firecrawl
@@ -483,6 +494,7 @@ def get_nous_subscription_features(
nous_auth_present=nous_auth_present,
provider_is_nous=provider_is_nous,
features=features,
account_info=account_info,
)
@@ -493,11 +505,15 @@ def apply_nous_managed_defaults(
config: Dict[str, object],
*,
enabled_toolsets: Optional[Iterable[str]] = None,
force_fresh: bool = False,
) -> set[str]:
if not managed_nous_tools_enabled():
features = get_nous_subscription_features(config, force_fresh=force_fresh)
if not (
features.account_info
and features.account_info.logged_in
and features.account_info.paid_service_access is True
):
return set()
features = get_nous_subscription_features(config)
if not features.provider_is_nous:
return set()
@@ -594,6 +610,8 @@ _ALL_GATEWAY_KEYS = ("web", "image_gen", "tts", "browser")
def get_gateway_eligible_tools(
config: Optional[Dict[str, object]] = None,
*,
force_fresh: bool = False,
) -> tuple[list[str], list[str], list[str]]:
"""Return (unconfigured, has_direct, already_managed) tool key lists.
@@ -604,7 +622,11 @@ def get_gateway_eligible_tools(
All lists are empty when the user is not a paid Nous subscriber or
is not using Nous as their provider.
"""
if not managed_nous_tools_enabled():
if force_fresh:
managed_enabled = managed_nous_tools_enabled(force_fresh=True)
else:
managed_enabled = managed_nous_tools_enabled()
if not managed_enabled:
return [], [], []
if config is None:
@@ -695,7 +717,11 @@ def apply_gateway_defaults(
return changed
def prompt_enable_tool_gateway(config: Dict[str, object]) -> set[str]:
def prompt_enable_tool_gateway(
config: Dict[str, object],
*,
force_fresh: bool = True,
) -> set[str]:
"""If eligible tools exist, prompt the user to enable the Tool Gateway.
Uses prompt_choice() with a description parameter so the curses TUI
@@ -704,7 +730,10 @@ def prompt_enable_tool_gateway(config: Dict[str, object]) -> set[str]:
Returns the set of tools that were enabled, or empty set if the user
declined or no tools were eligible.
"""
unconfigured, has_direct, already_managed = get_gateway_eligible_tools(config)
unconfigured, has_direct, already_managed = get_gateway_eligible_tools(
config,
force_fresh=force_fresh,
)
if not unconfigured and not has_direct:
return set()
+26 -3
@@ -864,12 +864,35 @@ def _discover_memory_providers() -> list[tuple[str, str]]:
def _discover_context_engines() -> list[tuple[str, str]]:
"""Return [(name, description), ...] for available context engines."""
"""Return [(name, description), ...] for available context engines.
Includes repo-shipped engines from ``plugins/context_engine/`` AND
plugin-registered engines (third-party engines installed as Hermes
plugins via ``ctx.register_context_engine``). Repo-shipped descriptions
win when a plugin-registered engine collides on name.
"""
engines: list[tuple[str, str]] = []
seen: set[str] = set()
try:
from plugins.context_engine import discover_context_engines
return [(name, desc) for name, desc, _avail in discover_context_engines()]
for name, desc, _avail in discover_context_engines():
if name not in seen:
engines.append((name, desc))
seen.add(name)
except Exception:
return []
pass
try:
from hermes_cli.plugins import discover_plugins, get_plugin_context_engine
discover_plugins()
plugin_engine = get_plugin_context_engine()
if plugin_engine and getattr(plugin_engine, "name", None) and plugin_engine.name not in seen:
engines.append((plugin_engine.name, "installed plugin"))
except Exception:
pass
return engines
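The merge order in `_discover_context_engines` can be sketched in isolation. This is a minimal illustration, not the project's code: `merge_engines`, `repo_shipped`, `broken`, and `plugin_registered` are hypothetical names, and the sources are assumed to yield plain `(name, description)` pairs. Each source is tried in turn, a failing source is skipped, and the first source to claim a name keeps it.

```python
from typing import Callable, Iterable

Engine = tuple[str, str]


def merge_engines(
    sources: Iterable[Callable[[], Iterable[Engine]]],
) -> list[Engine]:
    """Merge (name, description) pairs; the first source to claim a name wins."""
    engines: list[Engine] = []
    seen: set[str] = set()
    for source in sources:
        try:
            for name, desc in source():
                if name not in seen:
                    engines.append((name, desc))
                    seen.add(name)
        except Exception:
            # A failing source is skipped; entries collected so far are kept.
            continue
    return engines


def repo_shipped() -> list[Engine]:
    # Hypothetical repo-shipped engines.
    return [("compressor", "built-in"), ("example_engine", "repo-shipped")]


def broken() -> list[Engine]:
    # Simulates a discovery step whose import fails.
    raise ImportError("plugin loader unavailable")


def plugin_registered() -> list[Engine]:
    # Hypothetical plugin engines; "example_engine" collides on name.
    return [("example_engine", "installed plugin"), ("custom", "installed plugin")]


merged = merge_engines([repo_shipped, broken, plugin_registered])
```

Listing the repo-shipped source first is what makes its description win on a name collision.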
def _get_current_memory_provider() -> str:
+13 -4
@@ -79,7 +79,7 @@ class XAIGrokAdapter(UpstreamAdapter):
failed_credential: UpstreamCredential,
status_code: int,
) -> Optional[UpstreamCredential]:
if status_code != 401:
if status_code not in {401, 429}:
return None
with self._lock:
@@ -87,16 +87,25 @@ class XAIGrokAdapter(UpstreamAdapter):
if pool is None:
return None
refreshed = pool.try_refresh_current()
if refreshed is None:
if status_code == 429:
# Mark the rate-limited key with its 1-hour cooldown and rotate
# to the next available credential. Returns None when the pool
# has no other key to offer — the 429 will flow back to the client.
refreshed = pool.mark_exhausted_and_rotate(status_code=status_code)
else:
refreshed = pool.try_refresh_current()
if refreshed is None:
refreshed = pool.mark_exhausted_and_rotate(status_code=status_code)
if refreshed is None:
return None
retry_cred = self._credential_from_entry(refreshed)
if retry_cred.bearer == failed_credential.bearer:
return None
logger.info("proxy: xAI upstream rejected bearer; retrying with refreshed pool credential")
logger.info(
"proxy: xAI upstream returned %s; retrying with rotated pool credential",
status_code,
)
return retry_cred
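The 401-versus-429 branching above can be exercised with a toy pool. This is a sketch under stated assumptions: `FakePool` and `pick_retry_key` are hypothetical stand-ins, and only the method names `try_refresh_current` and `mark_exhausted_and_rotate` come from the diff. The real `CredentialPool` presumably also tracks cooldowns, which this sketch does not model.

```python
from typing import Optional


class FakePool:
    """Hypothetical stand-in for the credential pool; not the real CredentialPool."""

    def __init__(self, keys: list[str], refreshable: bool = False) -> None:
        self.keys = list(keys)
        self.index = 0
        self.refreshable = refreshable

    def try_refresh_current(self) -> Optional[str]:
        # Pretend a refresh yields a new bearer for the same entry.
        return (self.keys[self.index] + "-refreshed") if self.refreshable else None

    def mark_exhausted_and_rotate(self, status_code: int) -> Optional[str]:
        # Rotate to the next key, or report that nothing is left.
        if self.index + 1 >= len(self.keys):
            return None
        self.index += 1
        return self.keys[self.index]


def pick_retry_key(pool: FakePool, failed_key: str, status_code: int) -> Optional[str]:
    """Return a different key to retry with, or None to pass the error through."""
    if status_code not in {401, 429}:
        return None
    if status_code == 429:
        # Rate limit: refreshing the same key would not help, so rotate.
        candidate = pool.mark_exhausted_and_rotate(status_code=status_code)
    else:
        # Auth failure: try a refresh first, then fall back to rotation.
        candidate = pool.try_refresh_current()
        if candidate is None:
            candidate = pool.mark_exhausted_and_rotate(status_code=status_code)
    if candidate is None or candidate == failed_key:
        return None
    return candidate
```

With a single-key pool, a 429 yields `None`, so the rate-limit response flows back to the client unchanged.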
def _load_pool(self) -> Optional[CredentialPool]:
+1 -1
@@ -206,7 +206,7 @@ def create_app(adapter: UpstreamAdapter) -> "web.Application":
return session_or_response
session = session_or_response
if upstream_resp.status == 401:
if upstream_resp.status in {401, 429}:
try:
retry_cred = adapter.get_retry_credential(
failed_credential=cred,
+108
@@ -0,0 +1,108 @@
"""Helpers for the temporary psutil-on-Android compatibility installer."""
from __future__ import annotations
import shutil
import tarfile
from pathlib import Path, PurePosixPath
# Pin a version we know patches cleanly. Update when a newer psutil
# changes the marker line shape and we need to follow upstream.
PSUTIL_URL = (
"https://files.pythonhosted.org/packages/aa/c6/"
"d1ddf4abb55e93cebc4f2ed8b5d6dbad109ecb8d63748dd2b20ab5e57ebe/"
"psutil-7.2.2.tar.gz"
)
MARKER = 'LINUX = sys.platform.startswith("linux")'
REPLACEMENT = 'LINUX = sys.platform.startswith(("linux", "android"))'
class PsutilAndroidInstallError(RuntimeError):
"""Raised when the pinned psutil sdist is missing or unsafe."""
def _normalize_member_parts(member_name: str) -> tuple[str, ...]:
path = PurePosixPath(member_name)
parts = tuple(part for part in path.parts if part not in ("", "."))
if path.is_absolute() or ".." in parts or not parts:
raise PsutilAndroidInstallError(
f"Unsafe archive member path: {member_name!r}"
)
return parts
def _safe_extract_tar_gz(archive: Path, destination: Path) -> None:
"""Extract a tar.gz without allowing traversal or link members."""
with tarfile.open(archive, "r:gz") as tf:
for member in tf.getmembers():
parts = _normalize_member_parts(member.name)
target = destination.joinpath(*parts)
if member.isdir():
target.mkdir(parents=True, exist_ok=True)
continue
if not member.isfile():
raise PsutilAndroidInstallError(
f"Unsupported archive member type: {member.name}"
)
target.parent.mkdir(parents=True, exist_ok=True)
extracted = tf.extractfile(member)
if extracted is None:
raise PsutilAndroidInstallError(
f"Cannot read archive member: {member.name}"
)
with extracted, open(target, "wb") as dst:
shutil.copyfileobj(extracted, dst)
try:
target.chmod(member.mode & 0o777)
except OSError:
pass
def prepare_patched_psutil_sdist(archive: Path, destination: Path) -> Path:
"""Safely extract the pinned psutil sdist and patch it for Android."""
_safe_extract_tar_gz(archive, destination)
src_roots = sorted(
(
path for path in destination.iterdir()
if path.is_dir() and path.name.startswith("psutil-")
),
key=lambda path: path.name,
)
if not src_roots:
raise PsutilAndroidInstallError(
"psutil sdist did not contain a psutil-* directory"
)
src_root = src_roots[0]
common_py = src_root / "psutil" / "_common.py"
if not common_py.is_file():
raise PsutilAndroidInstallError(
f"psutil sdist did not contain {common_py.relative_to(src_root)!s}"
)
try:
content = common_py.read_text(encoding="utf-8")
except OSError as exc:
raise PsutilAndroidInstallError(
f"Failed to read {common_py.relative_to(src_root)!s}"
) from exc
if MARKER not in content:
raise PsutilAndroidInstallError(
"psutil Android compatibility patch marker not found"
)
try:
common_py.write_text(
content.replace(MARKER, REPLACEMENT),
encoding="utf-8",
)
except OSError as exc:
raise PsutilAndroidInstallError(
f"Failed to write {common_py.relative_to(src_root)!s}"
) from exc
return src_root
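Two details of the helper above are easy to check in isolation: which archive member names the path check rejects, and why the patched line matches on Android. The sketch below reuses the same `PurePosixPath` logic under a hypothetical name, `is_safe_member`, and assumes `sys.platform` reports `"android"` on the target interpreter.

```python
from pathlib import PurePosixPath

MARKER = 'LINUX = sys.platform.startswith("linux")'
REPLACEMENT = 'LINUX = sys.platform.startswith(("linux", "android"))'


def is_safe_member(member_name: str) -> bool:
    """True when the name is relative, non-empty, and has no '..' component."""
    path = PurePosixPath(member_name)
    parts = tuple(part for part in path.parts if part not in ("", "."))
    return not (path.is_absolute() or ".." in parts or not parts)


def patch_source(content: str) -> str:
    """Apply the marker replacement, refusing to proceed if the marker is absent."""
    if MARKER not in content:
        raise ValueError("patch marker not found")
    return content.replace(MARKER, REPLACEMENT)


# str.startswith accepts a tuple of prefixes, which is what the patch relies on.
android_matches = "android".startswith(("linux", "android"))
patched = patch_source("import sys\n" + MARKER + "\n")
```

Checking for the marker before replacing matters because `str.replace` silently returns the input unchanged when nothing matches, which would otherwise produce an unpatched build.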
+39 -3
@@ -566,8 +566,11 @@ class S6ServiceManager:
1. Sources HERMES_HOME (and any extra env) via with-contenv
so e.g. ``-e HERMES_HOME=/data/hermes`` is honored at run
time, not Python-substituted at registration time (OQ8-C).
2. Activates the bundled venv.
3. Drops to the hermes user and exec's
2. Resets ``HOME`` to ``/opt/data`` before the privilege drop
so with-contenv's root HOME does not leak into the
unprivileged gateway process.
3. Activates the bundled venv.
4. Drops to the hermes user and exec's
``hermes -p <profile> gateway run`` (or just ``hermes
gateway run`` for the default profile; see below).
@@ -597,6 +600,7 @@ class S6ServiceManager:
"#!/command/with-contenv sh",
"# shellcheck shell=sh",
"set -e",
"export HOME=/opt/data",
"cd /opt/data",
". /opt/hermes/.venv/bin/activate",
]
@@ -628,6 +632,38 @@ class S6ServiceManager:
so a container started with ``-e HERMES_HOME=/data/hermes``
gets its logs under /data/hermes/logs/..., not the build-time
default.
Output routing: the script is two action directives, applied
per line, in order:
1. ``1`` (forward to stdout) propagates the line up the
s6-supervise pipeline to /init's stdout, which is the
container's stdout, which is ``docker logs``. Without
this, supervised stdout would be terminated inside
s6-log and never reach the container's log stream;
users would have to ``docker exec`` and ``tail`` the
file just to see startup banners. (Python's ``logging``
module defaults to stderr, which s6-supervise leaves
unfiltered so warnings/errors already reach docker
logs. This change is specifically about the rich-console
banner output and other plain stdout writes.)
2. ``T <log_dir>``: also write a timestamped copy to the
rotated log directory (``current`` + archived ``@*.s``
files). This is what ``hermes logs`` reads and what
persists across container restarts via the volume mount.
``T`` is non-sticky: it only prefixes lines for the next
action directive. We deliberately put ``T`` between ``1``
and the log dir (not before ``1``) so:
* ``docker logs`` shows raw lines: Python's logging
formatter has its own timestamps, and ``docker logs
--timestamps`` adds a third layer when desired. No
double-stamping in the most common reading path.
* The persisted file gets s6-log's own ISO 8601 timestamp
so even output that lacked a Python-logger timestamp
(rich banners, third-party libs' raw prints) is
correlatable in ``current``.
"""
import shlex
prof = shlex.quote(profile)
@@ -638,7 +674,7 @@ class S6ServiceManager:
f'log_dir="$HERMES_HOME/logs/gateways/{prof}"\n'
f'mkdir -p "$log_dir"\n'
f'chown -R hermes:hermes "$log_dir" 2>/dev/null || true\n'
f'exec s6-setuidgid hermes s6-log n10 s1000000 T "$log_dir"\n'
f'exec s6-setuidgid hermes s6-log 1 n10 s1000000 T "$log_dir"\n'
)
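The directive ordering described in the docstring can be made explicit by assembling the `s6-log` command from a list. This is only a sketch: `s6_log_command` is a hypothetical helper, and the comments on `n` and `s` reflect my reading of the s6-log documentation (archive count and maximum size of `current`), not anything stated in this diff.

```python
def s6_log_command(
    log_dir_expr: str = '"$log_dir"',
    keep: int = 10,
    max_bytes: int = 1_000_000,
) -> str:
    """Build the exec line; directive order is significant to s6-log."""
    directives = [
        "1",              # action: forward the raw line to stdout
        f"n{keep}",       # keep at most `keep` archived files (assumed meaning)
        f"s{max_bytes}",  # rotate `current` near this size in bytes (assumed meaning)
        "T",              # ISO 8601 prefix, applies to the next action only
        log_dir_expr,     # action: write to the rotated log directory
    ]
    return "exec s6-setuidgid hermes s6-log " + " ".join(directives)


command = s6_log_command()
```

Because `1` precedes `T`, the stdout copy stays unstamped while the copy written to the log directory carries the timestamp.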
# -- lifecycle ---------------------------------------------------------
+46 -9
@@ -58,7 +58,9 @@ def _resolve_short_name(name: str, sources, console: Console) -> str:
table = Table()
table.add_column("Source", style="dim")
table.add_column("Trust", style="dim")
table.add_column("Identifier", style="bold cyan")
# overflow="fold" keeps the full slug visible (wraps instead of ellipsis-truncating)
# so users can copy it for `hermes skills install`.
table.add_column("Identifier", style="bold cyan", overflow="fold", no_wrap=False)
for r in exact:
trust_style = {"builtin": "bright_cyan", "trusted": "green", "community": "yellow"}.get(r.trust_level, "dim")
trust_label = "official" if r.source == "official" else r.trust_level
@@ -244,15 +246,39 @@ def _prompt_for_category(c: Console, existing: List[str]) -> str:
def do_search(query: str, source: str = "all", limit: int = 10,
console: Optional[Console] = None) -> None:
"""Search registries and display results as a Rich table."""
console: Optional[Console] = None, as_json: bool = False) -> None:
"""Search registries and display results as a Rich table.
When ``as_json=True`` writes a JSON array of result records to stdout
(one object per skill: ``name``, ``identifier``, ``source``,
``trust_level``, ``description``) and skips the table render. This is
the scripting / copy-paste handle: the full identifier is always
intact, even for browse-sh slugs that the table would otherwise wrap.
"""
from tools.skills_hub import GitHubAuth, create_source_router, unified_search
c = console or _console
c.print(f"\n[bold]Searching for:[/] {query}")
auth = GitHubAuth()
sources = create_source_router(auth)
if as_json:
# Avoid Rich status spinner contaminating stdout — JSON consumers
# expect a clean parseable stream.
results = unified_search(query, sources, source_filter=source, limit=limit)
payload = [
{
"name": r.name,
"identifier": r.identifier,
"source": r.source,
"trust_level": r.trust_level,
"description": r.description,
}
for r in results
]
print(json.dumps(payload, indent=2))
return
c.print(f"\n[bold]Searching for:[/] {query}")
with c.status("[bold]Searching registries..."):
results = unified_search(query, sources, source_filter=source, limit=limit)
@@ -265,7 +291,11 @@ def do_search(query: str, source: str = "all", limit: int = 10,
table.add_column("Description", max_width=60)
table.add_column("Source", style="dim")
table.add_column("Trust", style="dim")
table.add_column("Identifier", style="dim")
# overflow="fold" keeps the full slug visible (wraps instead of
# ellipsis-truncating). Browse.sh slugs end in a `-XXXXXX` hash that
# is part of the actual identifier — truncating it makes copy-paste
# into `hermes skills install` fail.
table.add_column("Identifier", style="dim", overflow="fold", no_wrap=False)
for r in results:
trust_style = {"builtin": "bright_cyan", "trusted": "green", "community": "yellow"}.get(r.trust_level, "dim")
@@ -280,7 +310,8 @@ def do_search(query: str, source: str = "all", limit: int = 10,
c.print(table)
c.print("[dim]Use: hermes skills inspect <identifier> to preview, "
"hermes skills install <identifier> to install[/]\n")
"hermes skills install <identifier> to install "
"(--json for scripting)[/]\n")
def do_browse(page: int = 1, page_size: int = 20, source: str = "all",
@@ -1390,7 +1421,8 @@ def skills_command(args) -> None:
if action == "browse":
do_browse(page=args.page, page_size=args.size, source=args.source)
elif action == "search":
do_search(args.query, source=args.source, limit=args.limit)
do_search(args.query, source=args.source, limit=args.limit,
as_json=getattr(args, "json", False))
elif action == "install":
do_install(args.identifier, category=args.category, force=args.force,
skip_confirm=getattr(args, "yes", False),
@@ -1511,10 +1543,11 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:
elif action == "search":
if not args:
c.print("[bold red]Usage:[/] /skills search <query> [--source skills-sh|well-known|github|official] [--limit N]\n")
c.print("[bold red]Usage:[/] /skills search <query> [--source skills-sh|well-known|github|official] [--limit N] [--json]\n")
return
source = "all"
limit = 10
as_json = False
query_parts = []
i = 0
while i < len(args):
@@ -1527,10 +1560,14 @@ def handle_skills_slash(cmd: str, console: Optional[Console] = None) -> None:
except ValueError:
pass
i += 2
elif args[i] == "--json":
as_json = True
i += 1
else:
query_parts.append(args[i])
i += 1
do_search(" ".join(query_parts), source=source, limit=limit, console=c)
do_search(" ".join(query_parts), source=source, limit=limit,
console=c, as_json=as_json)
elif action == "install":
if not args:
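The `--json` handling for `/skills search` can be sketched end to end. Assumptions are labeled: `parse_search_args`, `results_to_json`, and the `Result` dataclass are hypothetical, and the `--source` / `--limit` branches are reconstructed from the usage string because the diff elides them. Only the five record fields and the `--json` branch are taken from the change itself.

```python
import json
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical stand-in for a search result record."""
    name: str
    identifier: str
    source: str
    trust_level: str
    description: str


def parse_search_args(args: list[str]) -> tuple[str, str, int, bool]:
    """Return (query, source, limit, as_json) from slash-command tokens."""
    source, limit, as_json = "all", 10, False
    query_parts: list[str] = []
    i = 0
    while i < len(args):
        if args[i] == "--source" and i + 1 < len(args):
            source = args[i + 1]
            i += 2
        elif args[i] == "--limit" and i + 1 < len(args):
            try:
                limit = int(args[i + 1])
            except ValueError:
                pass  # keep the default on a non-numeric limit
            i += 2
        elif args[i] == "--json":
            as_json = True
            i += 1
        else:
            query_parts.append(args[i])
            i += 1
    return " ".join(query_parts), source, limit, as_json


def results_to_json(results: list[Result]) -> str:
    """Serialize results with the full identifier intact for scripting."""
    payload = [
        {
            "name": r.name,
            "identifier": r.identifier,
            "source": r.source,
            "trust_level": r.trust_level,
            "description": r.description,
        }
        for r in results
    ]
    return json.dumps(payload, indent=2)
```

A consumer can then read the identifier with `json.loads(...)[0]["identifier"]` instead of copying it out of a wrapped table cell.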
+49 -14
@@ -16,6 +16,10 @@ from hermes_cli.auth import AuthError, resolve_provider
from hermes_cli.colors import Colors, color
from hermes_cli.config import get_env_path, get_env_value, get_hermes_home, load_config
from hermes_cli.models import provider_label
from hermes_cli.nous_account import (
format_nous_portal_entitlement_message,
get_nous_portal_account_info,
)
from hermes_cli.nous_subscription import get_nous_subscription_features
from hermes_cli.runtime_provider import resolve_requested_provider
from hermes_constants import OPENROUTER_MODELS_URL
@@ -193,26 +197,57 @@ def show_status(args):
qwen_status = {}
minimax_status = {}
nous_logged_in = bool(nous_status.get("logged_in"))
nous_account_info = None
if (
nous_status.get("logged_in")
or nous_status.get("access_token")
or nous_status.get("portal_base_url")
or nous_status.get("inference_credential_present")
or nous_status.get("error_code")
):
try:
nous_account_info = get_nous_portal_account_info()
except Exception:
nous_account_info = None
nous_logged_in = bool(
nous_status.get("logged_in")
or (nous_account_info and nous_account_info.logged_in)
)
nous_inference_present = bool(
nous_status.get("inference_credential_present")
or (nous_account_info and nous_account_info.inference_credential_present)
)
nous_error = nous_status.get("error")
nous_label = "logged in" if nous_logged_in else "not logged in (run: hermes auth add nous --type oauth)"
if nous_logged_in:
nous_label = "logged in"
elif nous_inference_present:
nous_label = "not logged in (Nous inference key configured)"
else:
nous_label = "not logged in (run: hermes auth add nous --type oauth)"
print(
f" {'Nous Portal':<12} {check_mark(nous_logged_in)} "
f"{nous_label}"
)
portal_url = nous_status.get("portal_base_url") or "(unknown)"
inference_url = (
nous_status.get("inference_base_url")
or (nous_account_info.inference_base_url if nous_account_info else None)
)
access_exp = _format_iso_timestamp(nous_status.get("access_expires_at"))
key_exp = _format_iso_timestamp(nous_status.get("agent_key_expires_at"))
refresh_label = "yes" if nous_status.get("has_refresh_token") else "no"
if nous_logged_in or portal_url != "(unknown)" or nous_error:
print(f" Portal URL: {portal_url}")
if nous_inference_present and inference_url:
print(f" Inference: {inference_url}")
if nous_logged_in or nous_status.get("access_expires_at"):
print(f" Access exp: {access_exp}")
if nous_logged_in or nous_status.get("agent_key_expires_at"):
if nous_logged_in or nous_inference_present or nous_status.get("agent_key_expires_at"):
print(f" Key exp: {key_exp}")
if nous_logged_in or nous_status.get("has_refresh_token"):
print(f" Refresh: {refresh_label}")
if nous_error and not nous_logged_in:
if nous_error:
print(f" Error: {nous_error}")
codex_logged_in = bool(codex_status.get("logged_in"))
@@ -303,18 +338,18 @@ def show_status(args):
else:
state = "not configured"
print(f" {feature.label:<15} {check_mark(feature.available or feature.active or feature.managed_by_nous)} {state}")
elif nous_logged_in:
# Logged into Nous but on the free tier — show upgrade nudge
elif nous_logged_in or nous_inference_present:
# Nous OAuth without entitlement, or an opaque inference key without
# Portal account information, cannot enable the Tool Gateway.
print()
print(color("◆ Nous Tool Gateway", Colors.CYAN, Colors.BOLD))
print(" Your free-tier Nous account does not include Tool Gateway access.")
print(" Upgrade your subscription to unlock managed web, image, TTS, and browser tools.")
try:
portal_url = nous_status.get("portal_base_url", "").rstrip("/")
if portal_url:
print(f" Upgrade: {portal_url}")
except Exception:
pass
message = format_nous_portal_entitlement_message(
nous_account_info,
capability="managed web, image, TTS, browser, and Modal tools",
)
if message:
for line in message.splitlines():
print(f" {line}")
# =========================================================================
# API-Key Providers
+252 -53
@@ -28,7 +28,8 @@ from hermes_cli.nous_subscription import (
apply_nous_managed_defaults,
get_nous_subscription_features,
)
from tools.tool_backend_helpers import fal_key_is_configured, managed_nous_tools_enabled
from hermes_cli.nous_account import format_nous_portal_entitlement_message
from tools.tool_backend_helpers import fal_key_is_configured
from utils import base_url_hostname, is_truthy_value
logger = logging.getLogger(__name__)
@@ -67,6 +68,7 @@ CONFIGURABLE_TOOLSETS = [
("skills", "📚 Skills", "list, view, manage"),
("todo", "📋 Task Planning", "todo"),
("memory", "💾 Memory", "persistent memory across sessions"),
("context_engine", "🧩 Context Engine", "runtime tools from the active context engine"),
("session_search", "🔎 Session Search", "search past conversations"),
("clarify", "❓ Clarifying Questions", "clarify"),
("delegation", "👥 Task Delegation", "delegate_task"),
@@ -1294,6 +1296,24 @@ def _get_platform_tools(
enabled_toolsets.add(pts)
# else: known but not in config = user disabled it
# Context-engine tools are runtime-provided by the active engine, so they
# are not part of any static platform composite. When a non-default engine
# is selected, keep its recovery/status tools available even after a user
# saves an explicit platform toolset list. Preserve the explicit empty-list
# contract: selecting no configurable tools means no context-engine tools
# either unless the user adds ``context_engine`` manually later.
context_cfg = config.get("context") or {}
if not isinstance(context_cfg, dict):
context_cfg = {}
context_engine_name = str(context_cfg.get("engine") or "compressor").strip().lower()
explicit_empty_selection = (
platform in platform_toolsets
and isinstance(platform_toolsets.get(platform), list)
and not toolset_names
)
if context_engine_name and context_engine_name != "compressor" and not explicit_empty_selection:
enabled_toolsets.add("context_engine")
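The rule above reduces to a small predicate. The sketch below is illustrative only: `wants_context_engine_tools` is a hypothetical name, and `platform_toolsets` and `toolset_names` are passed in explicitly because the diff does not show where the surrounding function obtains them.

```python
def wants_context_engine_tools(
    config: dict,
    platform: str,
    platform_toolsets: dict,
    toolset_names: list,
) -> bool:
    """True when a non-default engine is active and the user did not save an empty list."""
    context_cfg = config.get("context") or {}
    if not isinstance(context_cfg, dict):
        context_cfg = {}  # tolerate a malformed 'context' value
    engine = str(context_cfg.get("engine") or "compressor").strip().lower()
    explicit_empty_selection = (
        platform in platform_toolsets
        and isinstance(platform_toolsets.get(platform), list)
        and not toolset_names
    )
    return bool(engine) and engine != "compressor" and not explicit_empty_selection
```

The explicit-empty check is what separates "the user never configured this platform" from "the user deliberately selected nothing".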
# Preserve any explicit non-configurable toolset entries (for example,
# custom toolsets or MCP server names saved in platform_toolsets).
explicit_passthrough = {
@@ -1399,7 +1419,12 @@ def _save_platform_tools(config: dict, platform: str, enabled_toolset_keys: Set[
save_config(config)
def _toolset_has_keys(ts_key: str, config: dict = None) -> bool:
def _toolset_has_keys(
ts_key: str,
config: dict = None,
*,
force_fresh: bool = False,
) -> bool:
"""Check if a toolset's required API keys are configured."""
if config is None:
config = load_config()
@@ -1414,7 +1439,7 @@ def _toolset_has_keys(ts_key: str, config: dict = None) -> bool:
return False
if ts_key in {"web", "image_gen", "tts", "browser"}:
features = get_nous_subscription_features(config)
features = get_nous_subscription_features(config, force_fresh=force_fresh)
feature = features.features.get(ts_key)
if feature and (feature.available or feature.managed_by_nous):
return True
@@ -1422,7 +1447,7 @@ def _toolset_has_keys(ts_key: str, config: dict = None) -> bool:
# Check TOOL_CATEGORIES first (provider-aware)
cat = TOOL_CATEGORIES.get(ts_key)
if cat:
for provider in _visible_providers(cat, config):
for provider in _visible_providers(cat, config, force_fresh=force_fresh):
env_vars = provider.get("env_vars", [])
if not env_vars:
return True # No-key provider (e.g. Local Browser, Edge TTS)
@@ -1493,7 +1518,13 @@ def _estimate_tool_tokens() -> Dict[str, int]:
return _tool_token_cache
def _prompt_toolset_checklist(platform_label: str, enabled: Set[str], platform: str = "cli") -> Set[str]:
def _prompt_toolset_checklist(
platform_label: str,
enabled: Set[str],
platform: str = "cli",
*,
force_fresh: bool = True,
) -> Set[str]:
"""Multi-select checklist of toolsets. Returns set of selected toolset keys."""
from hermes_cli.curses_ui import curses_checklist
from toolsets import resolve_toolset
@@ -1511,7 +1542,10 @@ def _prompt_toolset_checklist(platform_label: str, enabled: Set[str], platform:
labels = []
for ts_key, ts_label, ts_desc in effective:
suffix = ""
if not _toolset_has_keys(ts_key) and (TOOL_CATEGORIES.get(ts_key) or TOOLSET_ENV_REQUIREMENTS.get(ts_key)):
if (
not _toolset_has_keys(ts_key, force_fresh=force_fresh)
and (TOOL_CATEGORIES.get(ts_key) or TOOLSET_ENV_REQUIREMENTS.get(ts_key))
):
suffix = " [no API key]"
labels.append(f"{ts_label} ({ts_desc}){suffix}")
@@ -1547,7 +1581,12 @@ def _prompt_toolset_checklist(platform_label: str, enabled: Set[str], platform:
# ─── Provider-Aware Configuration ────────────────────────────────────────────
def _configure_toolset(ts_key: str, config: dict):
def _configure_toolset(
ts_key: str,
config: dict,
*,
force_fresh: bool = True,
):
"""Configure a toolset - provider selection + API keys.
Uses TOOL_CATEGORIES for provider-aware config, falls back to simple
@@ -1556,7 +1595,7 @@ def _configure_toolset(ts_key: str, config: dict):
cat = TOOL_CATEGORIES.get(ts_key)
if cat:
_configure_tool_category(ts_key, cat, config)
_configure_tool_category(ts_key, cat, config, force_fresh=force_fresh)
else:
# Simple fallback for vision, moa, etc.
_configure_simple_requirements(ts_key)
@@ -1809,12 +1848,22 @@ def _plugin_tts_providers() -> list[dict]:
return rows
def _visible_providers(cat: dict, config: dict) -> list[dict]:
def _visible_providers(
cat: dict,
config: dict,
*,
force_fresh: bool = False,
) -> list[dict]:
"""Return provider entries visible for the current auth/config state."""
features = get_nous_subscription_features(config)
features = get_nous_subscription_features(config, force_fresh=force_fresh)
managed_available = bool(
features.account_info
and features.account_info.logged_in
and features.account_info.paid_service_access is True
)
visible = []
for provider in cat.get("providers", []):
if provider.get("managed_nous_feature") and not managed_nous_tools_enabled():
if provider.get("managed_nous_feature") and not managed_available:
continue
if provider.get("requires_nous_auth") and not features.nous_auth_present:
continue
@@ -1855,6 +1904,31 @@ def _visible_providers(cat: dict, config: dict) -> list[dict]:
return visible
def _hidden_nous_gateway_message(
cat: dict,
config: dict,
capability: str,
*,
force_fresh: bool = False,
) -> str:
"""Return a reason when a category's Nous provider is hidden."""
features = get_nous_subscription_features(config, force_fresh=force_fresh)
managed_available = bool(
features.account_info
and features.account_info.logged_in
and features.account_info.paid_service_access is True
)
if managed_available:
return ""
if not any(p.get("managed_nous_feature") for p in cat.get("providers", [])):
return ""
message = format_nous_portal_entitlement_message(
features.account_info,
capability=capability,
)
return message or ""
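The `managed_available` expression treats `paid_service_access` as tri-state, which is why it compares with `is True` instead of relying on truthiness. A minimal sketch, assuming a hypothetical `AccountInfo` dataclass with just the two fields the diff reads (the real account-info type is not shown here):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AccountInfo:
    """Hypothetical stand-in; the real account-info type is not shown in the diff."""
    logged_in: bool
    paid_service_access: Optional[bool]  # None means the entitlement is unknown


def managed_available(info: Optional[AccountInfo]) -> bool:
    """Managed tools are offered only on a confirmed paid entitlement."""
    return bool(info and info.logged_in and info.paid_service_access is True)
```

An unknown entitlement (`None`) therefore hides the managed provider in the same way as an explicit `False`.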
_POST_SETUP_INSTALLED: dict = {
# post_setup_key -> predicate(): True when the install side-effect
# is already satisfied. Used by `_toolset_needs_configuration_prompt`
@@ -1886,17 +1960,22 @@ def _post_setup_already_installed(post_setup_key: str) -> bool:
return True
def _toolset_needs_configuration_prompt(ts_key: str, config: dict) -> bool:
def _toolset_needs_configuration_prompt(
ts_key: str,
config: dict,
*,
force_fresh: bool = False,
) -> bool:
"""Return True when enabling this toolset should open provider setup."""
cat = TOOL_CATEGORIES.get(ts_key)
if not cat:
return not _toolset_has_keys(ts_key, config)
return not _toolset_has_keys(ts_key, config, force_fresh=force_fresh)
# If any visible provider has a registered post_setup install-state
# check that hasn't been satisfied (e.g. cua-driver binary not on
# PATH yet), force the configuration flow so `_configure_provider`
# invokes `_run_post_setup` and the install actually runs.
for provider in _visible_providers(cat, config):
for provider in _visible_providers(cat, config, force_fresh=force_fresh):
post_setup = provider.get("post_setup")
if post_setup and not _post_setup_already_installed(post_setup):
return True
@@ -1947,14 +2026,26 @@ def _toolset_needs_configuration_prompt(ts_key: str, config: dict) -> bool:
pass
return True
return not _toolset_has_keys(ts_key, config)
return not _toolset_has_keys(ts_key, config, force_fresh=force_fresh)
def _configure_tool_category(ts_key: str, cat: dict, config: dict):
def _configure_tool_category(
ts_key: str,
cat: dict,
config: dict,
*,
force_fresh: bool = True,
):
"""Configure a tool category with provider selection."""
icon = cat.get("icon", "")
name = cat["name"]
providers = _visible_providers(cat, config)
providers = _visible_providers(cat, config, force_fresh=force_fresh)
hidden_nous_message = _hidden_nous_gateway_message(
cat,
config,
f"the Nous Subscription provider for {name}",
force_fresh=force_fresh,
)
# Check Python version requirement
if cat.get("requires_python"):
@@ -1975,7 +2066,10 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
# For single-provider tools, show a note if available
if cat.get("setup_note"):
_print_info(f" {cat['setup_note']}")
_configure_provider(provider, config)
if hidden_nous_message:
for line in hidden_nous_message.splitlines():
_print_warning(f" {line}")
_configure_provider(provider, config, force_fresh=force_fresh)
else:
# Multiple providers - let user choose
print()
@@ -1984,6 +2078,9 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
print(color(f" --- {icon} {name} - {title} ---", Colors.CYAN))
if cat.get("setup_note"):
_print_info(f" {cat['setup_note']}")
if hidden_nous_message:
for line in hidden_nous_message.splitlines():
_print_warning(f" {line}")
print()
# Plain text labels only (no ANSI codes in menu items)
@@ -1992,7 +2089,10 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
# obvious which options cost extra vs. cost nothing on top of Nous.
try:
_nous_logged_in = bool(
get_nous_subscription_features(config).nous_auth_present
get_nous_subscription_features(
config,
force_fresh=force_fresh,
).nous_auth_present
)
except Exception:
_nous_logged_in = False
@@ -2004,7 +2104,7 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
configured = ""
env_vars = p.get("env_vars", [])
if not env_vars or all(get_env_value(v["key"]) for v in env_vars):
if _is_provider_active(p, config):
if _is_provider_active(p, config, force_fresh=force_fresh):
configured = " [active]"
elif not env_vars:
configured = ""
@@ -2024,7 +2124,11 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
provider_choices.append("Skip — keep defaults / configure later")
# Detect current provider as default
default_idx = _detect_active_provider_index(providers, config)
default_idx = _detect_active_provider_index(
providers,
config,
force_fresh=force_fresh,
)
provider_idx = _prompt_choice(f" {title}:", provider_choices, default_idx)
@@ -2033,10 +2137,15 @@ def _configure_tool_category(ts_key: str, cat: dict, config: dict):
_print_info(f" Skipped {name}")
return
_configure_provider(providers[provider_idx], config)
_configure_provider(providers[provider_idx], config, force_fresh=force_fresh)
def _is_provider_active(provider: dict, config: dict) -> bool:
def _is_provider_active(
provider: dict,
config: dict,
*,
force_fresh: bool = False,
) -> bool:
"""Check if a provider entry matches the currently active config."""
plugin_name = provider.get("image_gen_plugin_name")
if plugin_name:
@@ -2050,7 +2159,7 @@ def _is_provider_active(provider: dict, config: dict) -> bool:
managed_feature = provider.get("managed_nous_feature")
if managed_feature:
features = get_nous_subscription_features(config)
features = get_nous_subscription_features(config, force_fresh=force_fresh)
feature = features.features.get(managed_feature)
if feature is None:
return False
@@ -2097,10 +2206,15 @@ def _is_provider_active(provider: dict, config: dict) -> bool:
return False
def _detect_active_provider_index(providers: list, config: dict) -> int:
def _detect_active_provider_index(
providers: list,
config: dict,
*,
force_fresh: bool = False,
) -> int:
"""Return the index of the currently active provider, or 0."""
for i, p in enumerate(providers):
if _is_provider_active(p, config):
if _is_provider_active(p, config, force_fresh=force_fresh):
return i
# Fallback: env vars present → likely configured
env_vars = p.get("env_vars", [])
@@ -2403,15 +2517,29 @@ def _select_plugin_video_gen_provider(plugin_name: str, config: dict) -> None:
_configure_videogen_model_for_plugin(plugin_name, config)
def _configure_provider(provider: dict, config: dict):
def _configure_provider(
provider: dict,
config: dict,
*,
force_fresh: bool = True,
):
"""Configure a single provider - prompt for API keys and set config."""
env_vars = provider.get("env_vars", [])
managed_feature = provider.get("managed_nous_feature")
if provider.get("requires_nous_auth"):
features = get_nous_subscription_features(config)
if not features.nous_auth_present:
_print_warning(" Nous Subscription is only available after logging into Nous Portal.")
features = get_nous_subscription_features(config, force_fresh=force_fresh)
entitled = bool(
features.account_info and features.account_info.paid_service_access is True
)
if not features.nous_auth_present or not entitled:
message = format_nous_portal_entitlement_message(
features.account_info,
capability=f"{provider.get('name', 'Nous Subscription')}",
)
_print_warning(
f" {message or 'Nous Subscription is only available after logging into Nous Portal.'}"
)
return
# Set TTS provider in config if applicable
@@ -2501,7 +2629,10 @@ def _configure_provider(provider: dict, config: dict):
_has_managed_sibling = True
break
if _has_managed_sibling:
_features = get_nous_subscription_features(config)
_features = get_nous_subscription_features(
config,
force_fresh=force_fresh,
)
_show_portal_hint = not _features.nous_auth_present
except Exception:
_show_portal_hint = False
@@ -2619,7 +2750,11 @@ def _configure_simple_requirements(ts_key: str):
_print_warning(" Skipped")
def _reconfigure_tool(config: dict):
def _reconfigure_tool(
config: dict,
*,
force_fresh: bool = True,
):
"""Let user reconfigure an existing tool's provider or API key."""
# Build list of configurable tools that are currently set up
configurable = []
@@ -2627,7 +2762,10 @@ def _reconfigure_tool(config: dict):
cat = TOOL_CATEGORIES.get(ts_key)
reqs = TOOLSET_ENV_REQUIREMENTS.get(ts_key)
if cat or reqs:
if _toolset_has_keys(ts_key, config) or _toolset_enabled_for_reconfigure(ts_key, config):
if (
_toolset_has_keys(ts_key, config, force_fresh=force_fresh)
or _toolset_enabled_for_reconfigure(ts_key, config)
):
configurable.append((ts_key, ts_label))
if not configurable:
@@ -2646,7 +2784,12 @@ def _reconfigure_tool(config: dict):
cat = TOOL_CATEGORIES.get(ts_key)
if cat:
_configure_tool_category_for_reconfig(ts_key, cat, config)
_configure_tool_category_for_reconfig(
ts_key,
cat,
config,
force_fresh=force_fresh,
)
else:
_reconfigure_simple_requirements(ts_key)
@@ -2675,20 +2818,38 @@ def _toolset_enabled_for_reconfigure(ts_key: str, config: dict) -> bool:
return False
def _configure_tool_category_for_reconfig(ts_key: str, cat: dict, config: dict):
def _configure_tool_category_for_reconfig(
ts_key: str,
cat: dict,
config: dict,
*,
force_fresh: bool = True,
):
"""Reconfigure a tool category - provider selection + API key update."""
icon = cat.get("icon", "")
name = cat["name"]
providers = _visible_providers(cat, config)
providers = _visible_providers(cat, config, force_fresh=force_fresh)
hidden_nous_message = _hidden_nous_gateway_message(
cat,
config,
f"the Nous Subscription provider for {name}",
force_fresh=force_fresh,
)
if len(providers) == 1:
provider = providers[0]
print()
print(color(f" --- {icon} {name} ({provider['name']}) ---", Colors.CYAN))
_reconfigure_provider(provider, config)
if hidden_nous_message:
for line in hidden_nous_message.splitlines():
_print_warning(f" {line}")
_reconfigure_provider(provider, config, force_fresh=force_fresh)
else:
print()
print(color(f" --- {icon} {name} - Choose a provider ---", Colors.CYAN))
if hidden_nous_message:
for line in hidden_nous_message.splitlines():
_print_warning(f" {line}")
print()
provider_choices = []
@@ -2698,7 +2859,7 @@ def _configure_tool_category_for_reconfig(ts_key: str, cat: dict, config: dict):
configured = ""
env_vars = p.get("env_vars", [])
if not env_vars or all(get_env_value(v["key"]) for v in env_vars):
if _is_provider_active(p, config):
if _is_provider_active(p, config, force_fresh=force_fresh):
configured = " [active]"
elif not env_vars:
configured = ""
@@ -2706,21 +2867,43 @@ def _configure_tool_category_for_reconfig(ts_key: str, cat: dict, config: dict):
configured = " [configured]"
provider_choices.append(f"{p['name']}{badge}{tag}{configured}")
default_idx = _detect_active_provider_index(providers, config)
default_idx = _detect_active_provider_index(
providers,
config,
force_fresh=force_fresh,
)
provider_idx = _prompt_choice(" Select provider:", provider_choices, default_idx)
_reconfigure_provider(providers[provider_idx], config)
_reconfigure_provider(
providers[provider_idx],
config,
force_fresh=force_fresh,
)
def _reconfigure_provider(provider: dict, config: dict):
def _reconfigure_provider(
provider: dict,
config: dict,
*,
force_fresh: bool = True,
):
"""Reconfigure a provider - update API keys."""
env_vars = provider.get("env_vars", [])
managed_feature = provider.get("managed_nous_feature")
if provider.get("requires_nous_auth"):
features = get_nous_subscription_features(config)
if not features.nous_auth_present:
_print_warning(" Nous Subscription is only available after logging into Nous Portal.")
features = get_nous_subscription_features(config, force_fresh=force_fresh)
entitled = bool(
features.account_info and features.account_info.paid_service_access is True
)
if not features.nous_auth_present or not entitled:
message = format_nous_portal_entitlement_message(
features.account_info,
capability=f"{provider.get('name', 'Nous Subscription')}",
)
_print_warning(
f" {message or 'Nous Subscription is only available after logging into Nous Portal.'}"
)
return
if provider.get("tts_provider"):
@@ -2921,11 +3104,11 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
auto_configured = apply_nous_managed_defaults(
config,
enabled_toolsets=new_enabled,
force_fresh=True,
)
if managed_nous_tools_enabled():
for ts_key in sorted(auto_configured):
label = next((l for k, l, _ in CONFIGURABLE_TOOLSETS if k == ts_key), ts_key)
print(color(f"{label}: using your Nous subscription defaults", Colors.GREEN))
for ts_key in sorted(auto_configured):
label = next((l for k, l, _ in CONFIGURABLE_TOOLSETS if k == ts_key), ts_key)
print(color(f"{label}: using your Nous subscription defaults", Colors.GREEN))
# Walk through ALL selected tools that have provider options or
# need API keys. This ensures browser (Local vs Browserbase),
@@ -2993,7 +3176,7 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
# "Reconfigure" selected
if idx == _reconfig_idx:
_reconfigure_tool(config)
_reconfigure_tool(config, force_fresh=True)
print()
continue
@@ -3009,7 +3192,11 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
all_current = set()
for pk in platform_keys:
all_current |= _get_platform_tools(config, pk, include_default_mcp_servers=False)
new_enabled = _prompt_toolset_checklist("All platforms", all_current)
new_enabled = _prompt_toolset_checklist(
"All platforms",
all_current,
force_fresh=True,
)
if new_enabled != all_current:
for pk in platform_keys:
prev = _get_platform_tools(config, pk, include_default_mcp_servers=False)
@@ -3027,7 +3214,11 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
# Configure API keys for newly enabled tools
for ts_key in sorted(added):
if (TOOL_CATEGORIES.get(ts_key) or TOOLSET_ENV_REQUIREMENTS.get(ts_key)):
if _toolset_needs_configuration_prompt(ts_key, config):
if _toolset_needs_configuration_prompt(
ts_key,
config,
force_fresh=True,
):
_configure_toolset(ts_key, config)
_save_platform_tools(config, pk, new_enabled)
save_config(config)
@@ -3049,7 +3240,11 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
current_enabled = _get_platform_tools(config, pkey, include_default_mcp_servers=False)
# Show checklist
new_enabled = _prompt_toolset_checklist(pinfo["label"], current_enabled)
new_enabled = _prompt_toolset_checklist(
pinfo["label"],
current_enabled,
force_fresh=True,
)
if new_enabled != current_enabled:
added = new_enabled - current_enabled
@@ -3067,7 +3262,11 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
# Configure newly enabled toolsets that need API keys
for ts_key in sorted(added):
if (TOOL_CATEGORIES.get(ts_key) or TOOLSET_ENV_REQUIREMENTS.get(ts_key)):
if _toolset_needs_configuration_prompt(ts_key, config):
if _toolset_needs_configuration_prompt(
ts_key,
config,
force_fresh=True,
):
_configure_toolset(ts_key, config)
_save_platform_tools(config, pkey, new_enabled)
+69 -3
@@ -174,7 +174,7 @@ def _load_engine_from_dir(engine_dir: Path) -> Optional["ContextEngine"]:
# Try register(ctx) pattern first (how plugins are written)
if hasattr(mod, "register"):
collector = _EngineCollector()
collector = _EngineCollector(engine_name=name)
try:
mod.register(collector)
if collector.engine:
@@ -197,14 +197,80 @@ def _load_engine_from_dir(engine_dir: Path) -> Optional["ContextEngine"]:
class _EngineCollector:
"""Fake plugin context that captures register_context_engine calls."""
"""Fake plugin context that captures register_context_engine calls.
def __init__(self):
Plugin context engines using the standard ``register(ctx)`` pattern may
also call ``ctx.register_command(...)`` to expose slash commands (e.g.
``/lcm``). Forward those to the global plugin command registry so they
behave identically to commands registered by normal plugins.
"""
def __init__(self, engine_name: str = ""):
self.engine = None
self._engine_name = engine_name or "context_engine"
self._registered_commands: list[str] = []
def register_context_engine(self, engine):
self.engine = engine
def register_command(
self,
name: str,
handler,
description: str = "",
args_hint: str = "",
) -> None:
"""Forward to the global plugin command registry."""
clean = (name or "").lower().strip().lstrip("/").replace(" ", "-")
if not clean:
logger.warning(
"Context engine '%s' tried to register a command with an empty name.",
self._engine_name,
)
return
# Reject conflicts with built-in commands.
try:
from hermes_cli.commands import resolve_command
if resolve_command(clean) is not None:
logger.warning(
"Context engine '%s' tried to register command '/%s' which conflicts "
"with a built-in command. Skipping.",
self._engine_name, clean,
)
return
except Exception:
pass
try:
from hermes_cli.plugins import get_plugin_manager
manager = get_plugin_manager()
if clean in manager._plugin_commands:
# Don't clobber a regular plugin's command — same conflict
# policy the plugin system uses for plugin-vs-plugin collisions.
logger.warning(
"Context engine '%s' tried to register command '/%s' which "
"is already registered by a plugin. Skipping.",
self._engine_name, clean,
)
return
manager._plugin_commands[clean] = {
"handler": handler,
"description": description or "Context engine command",
"plugin": f"context-engine:{self._engine_name}",
"args_hint": (args_hint or "").strip(),
}
self._registered_commands.append(clean)
logger.debug(
"Context engine '%s' registered command: /%s",
self._engine_name, clean,
)
except Exception as exc:
logger.debug(
"Context engine '%s' could not register /%s: %s",
self._engine_name, clean, exc,
)
# No-op for other registration methods
def register_tool(self, *args, **kwargs):
pass
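Review note: the name normalization and the "first registration wins" conflict policy in `register_command` can be sketched in isolation. This is a minimal sketch, assuming a plain dict in place of the real plugin manager; `MiniCollector` and `registry` are hypothetical names, and the built-in check via `resolve_command` is left out.

```python
from typing import Callable, Dict, List


def normalize_command_name(name: str) -> str:
    """Same chain as in the diff: lowercase, trim, drop leading '/', spaces to '-'."""
    return (name or "").lower().strip().lstrip("/").replace(" ", "-")


class MiniCollector:
    """Hypothetical stand-in for _EngineCollector, for illustration only."""

    def __init__(self, registry: Dict[str, dict], engine_name: str = "") -> None:
        self.registry = registry  # stands in for manager._plugin_commands
        self.engine_name = engine_name or "context_engine"
        self.registered: List[str] = []

    def register_command(self, name: str, handler: Callable, description: str = "") -> bool:
        clean = normalize_command_name(name)
        # Empty names and names already taken are skipped, never overwritten.
        if not clean or clean in self.registry:
            return False
        self.registry[clean] = {
            "handler": handler,
            "description": description or "Context engine command",
            "plugin": f"context-engine:{self.engine_name}",
        }
        self.registered.append(clean)
        return True


registry: Dict[str, dict] = {"help": {"plugin": "some-plugin"}}
collector = MiniCollector(registry, engine_name="lcm")
collector.register_command(" /LCM Status ", lambda args: "ok")
collector.register_command("/help", lambda args: "shadowed")  # skipped, name is taken
```

After this runs, `registry` contains `lcm-status` tagged `context-engine:lcm`, and the existing `help` entry is untouched.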
+2 -4
@@ -67,10 +67,6 @@
gap: 0.75rem;
align-items: start;
overflow-x: auto;
scrollbar-width: none;
}
.hermes-kanban-columns::-webkit-scrollbar {
display: none;
}
.hermes-kanban-column {
@@ -143,6 +139,8 @@
gap: 0.45rem;
overflow-y: auto;
padding-right: 0.1rem;
flex: 1;
min-height: 0;
}
.hermes-kanban-empty {
@@ -43,6 +43,8 @@ class OpenRouterProfile(ProviderProfile):
self, *, session_id: str | None = None, **context: Any
) -> dict[str, Any]:
body: dict[str, Any] = {}
if session_id:
body["session_id"] = session_id
prefs = context.get("provider_preferences")
if prefs:
body["provider"] = prefs
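Review note: a minimal sketch of the resulting request-body logic, assuming the method simply returns `body` after these checks. The function name `build_extra_body` is hypothetical; the hunk does not show the real method name.

```python
from typing import Any, Dict, Optional


def build_extra_body(*, session_id: Optional[str] = None, **context: Any) -> Dict[str, Any]:
    """Hypothetical free-function version of the hunk above."""
    body: Dict[str, Any] = {}
    # New in this change: forward the session id only when one is set.
    if session_id:
        body["session_id"] = session_id
    prefs = context.get("provider_preferences")
    if prefs:
        body["provider"] = prefs
    return body  # assumption: the real method returns the dict as-is


with_session = build_extra_body(session_id="abc123", provider_preferences={"sort": "price"})
without_session = build_extra_body()
```

Both keys are optional, so a call with no session and no preferences yields an empty body.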
+13 -8
@@ -4811,14 +4811,19 @@ class DiscordAdapter(BasePlatformAdapter):
# to keep the partition rule clean.
_channel_context = None
_is_dm = isinstance(message.channel, discord.DMChannel)
if not _is_dm:
_needed_mention = (
require_mention
and not is_free_channel
and not in_bot_thread
)
_backfill_enabled = self._discord_history_backfill()
if _needed_mention and _backfill_enabled:
if not _is_dm and self._discord_history_backfill():
# Run backfill when there's a real gap to fill:
# - mention-gated channels with no free-response override
# (messages between bot turns aren't in the transcript)
# - any thread (in_bot_thread bypasses the mention check, but
# processing-window gaps and post-restart context still need
# recovery)
# DMs skip entirely because every DM message triggers the bot,
# so the session transcript already has everything.
# Auto-threaded messages also skip — we just created the thread,
# there's nothing prior to backfill.
_has_mention_gap = require_mention and not is_free_channel and not in_bot_thread
if (_has_mention_gap or is_thread) and auto_threaded_channel is None:
_backfill_text = await self._fetch_channel_context(
message.channel, before=message,
)
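Review note: the new backfill condition is easier to check as a pure predicate. This is a sketch under the assumption that the flags carry the meanings the comments describe; `should_backfill` and the boolean `auto_threaded` (standing in for `auto_threaded_channel is not None`) are hypothetical names.

```python
def should_backfill(
    *,
    is_dm: bool,
    backfill_enabled: bool,
    require_mention: bool,
    is_free_channel: bool,
    in_bot_thread: bool,
    is_thread: bool,
    auto_threaded: bool,
) -> bool:
    """Mirror of the new condition: DMs and disabled backfill always skip."""
    if is_dm or not backfill_enabled:
        return False
    has_mention_gap = require_mention and not is_free_channel and not in_bot_thread
    # A freshly auto-created thread has no prior history to recover.
    return (has_mention_gap or is_thread) and not auto_threaded


# A bot thread no longer skips backfill just because the mention check is bypassed.
bot_thread = should_backfill(
    is_dm=False, backfill_enabled=True, require_mention=True,
    is_free_channel=False, in_bot_thread=True, is_thread=True, auto_threaded=False,
)
```

Under the old condition the `bot_thread` case evaluated to false, because `in_bot_thread` cleared the mention requirement and nothing else triggered backfill.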
+6 -185
@@ -196,9 +196,13 @@ def _raise_web_backend_configuration_error() -> None:
)
if _wt.managed_nous_tools_enabled():
message += (
" With your Nous subscription you can also use the Tool Gateway "
" With your Nous subscription you can also use the Tool Gateway. "
"run `hermes tools` and select Nous Subscription as the web provider."
)
else:
message += " " + _wt.nous_tool_gateway_unavailable_message(
"managed Firecrawl web tools",
)
raise ValueError(message)
@@ -381,9 +385,6 @@ class FirecrawlWebSearchProvider(WebSearchProvider):
def supports_extract(self) -> bool:
return True
def supports_crawl(self) -> bool:
return True
def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
"""Execute a Firecrawl search.
@@ -575,192 +576,12 @@ class FirecrawlWebSearchProvider(WebSearchProvider):
return results
async def crawl(self, url: str, **kwargs: Any) -> Dict[str, Any]:
"""Crawl a seed URL via Firecrawl's ``/crawl`` endpoint.
Sync SDK call wrapped in ``asyncio.to_thread`` because the dispatcher
in :func:`tools.web_tools.web_crawl_tool` is async and runs LLM
post-processing on the response. The dispatcher gates the seed URL
against SSRF + website-access policy before calling us; this method
re-checks every crawled page's URL against the policy after the
crawl returns to catch redirected pages that map to a blocked host.
Accepted kwargs (others ignored for forward compat):
- ``instructions``: str, logged then dropped. Firecrawl's /crawl
endpoint does NOT accept natural-language instructions (that's
an /extract feature), so we record the value for debugging and
proceed without it. Tavily's crawl IS instruction-aware; this
divergence is documented in both plugins' docstrings.
- ``limit``: int, max pages to crawl (default 20).
- ``depth``: str, accepted for API parity with Tavily; ignored
by Firecrawl's crawl endpoint.
Returns ``{"results": [...]}`` matching the shape that
:func:`tools.web_tools.web_crawl_tool`'s shared LLM-summarization
path expects. Per-page failures (policy block on redirected URL,
bad response shape) are included as items with an ``error`` field
rather than raising.
"""
try:
from tools.interrupt import is_interrupted
if is_interrupted():
return {"results": [{"url": url, "title": "", "content": "", "error": "Interrupted"}]}
instructions = kwargs.get("instructions")
limit = kwargs.get("limit", 20)
# Firecrawl's /crawl endpoint does not accept natural-language
# instructions (that's an /extract feature). Log + drop.
if instructions:
logger.info(
"Firecrawl crawl: 'instructions' parameter ignored "
"(not supported by Firecrawl /crawl)"
)
logger.info("Firecrawl crawl: %s (limit=%d)", url, limit)
crawl_params = {
"limit": limit,
"scrape_options": {"formats": ["markdown"]},
}
# The SDK call is sync; run in a thread so we don't block the
# gateway event loop on a multi-page crawl.
crawl_result = await asyncio.to_thread(
_get_firecrawl_client().crawl,
url=url,
**crawl_params,
)
# CrawlJob normalization across SDK + direct + gateway shapes.
data_list: List[Any] = []
if hasattr(crawl_result, "data"):
data_list = crawl_result.data if crawl_result.data else []
logger.info(
"Firecrawl crawl status: %s, %d pages",
getattr(crawl_result, "status", "unknown"),
len(data_list),
)
elif isinstance(crawl_result, dict) and "data" in crawl_result:
data_list = crawl_result.get("data", []) or []
else:
logger.warning(
"Firecrawl crawl: unexpected result type %r",
type(crawl_result).__name__,
)
pages: List[Dict[str, Any]] = []
for item in data_list:
# Pydantic model | typed object | dict — handle all shapes.
content_markdown = None
content_html = None
metadata: Any = {}
if hasattr(item, "model_dump"):
item_dict = item.model_dump()
content_markdown = item_dict.get("markdown")
content_html = item_dict.get("html")
metadata = item_dict.get("metadata", {})
elif hasattr(item, "__dict__"):
content_markdown = getattr(item, "markdown", None)
content_html = getattr(item, "html", None)
metadata_obj = getattr(item, "metadata", {})
if hasattr(metadata_obj, "model_dump"):
metadata = metadata_obj.model_dump()
elif hasattr(metadata_obj, "__dict__"):
metadata = metadata_obj.__dict__
elif isinstance(metadata_obj, dict):
metadata = metadata_obj
else:
metadata = {}
elif isinstance(item, dict):
content_markdown = item.get("markdown")
content_html = item.get("html")
metadata = item.get("metadata", {})
# Ensure metadata is a plain dict.
if not isinstance(metadata, dict):
if hasattr(metadata, "model_dump"):
metadata = metadata.model_dump()
elif hasattr(metadata, "__dict__"):
metadata = metadata.__dict__
else:
metadata = {}
page_url = metadata.get(
"sourceURL", metadata.get("url", "Unknown URL")
)
title = metadata.get("title", "")
# Per-page policy re-check (catches blocked redirects).
page_blocked = check_website_access(page_url)
if page_blocked:
logger.info(
"Blocked crawled page %s by rule %s",
page_blocked["host"],
page_blocked["rule"],
)
pages.append(
{
"url": page_url,
"title": title,
"content": "",
"raw_content": "",
"error": page_blocked["message"],
"blocked_by_policy": {
"host": page_blocked["host"],
"rule": page_blocked["rule"],
"source": page_blocked["source"],
},
}
)
continue
content = content_markdown or content_html or ""
pages.append(
{
"url": page_url,
"title": title,
"content": content,
"raw_content": content,
"metadata": metadata,
}
)
return {"results": pages}
except ValueError as exc:
return {"results": [{"url": url, "title": "", "content": "", "error": str(exc)}]}
except ImportError as exc:
return {
"results": [
{
"url": url,
"title": "",
"content": "",
"error": f"Firecrawl SDK not installed: {exc}",
}
]
}
except Exception as exc: # noqa: BLE001
logger.warning("Firecrawl crawl error: %s", exc)
return {
"results": [
{
"url": url,
"title": "",
"content": "",
"error": f"Firecrawl crawl failed: {exc}",
}
]
}
def get_setup_schema(self) -> Dict[str, Any]:
return {
"name": "Firecrawl",
"badge": "paid · optional gateway",
"tag": (
"Full search + extract + crawl; supports direct API and "
"Full search + extract; supports direct API and "
"Nous tool-gateway routing."
),
"env_vars": [
+1 -6
@@ -1,9 +1,4 @@
"""Tavily web search + extract + crawl plugin — bundled, auto-loaded.
First plugin in this codebase to advertise ``supports_crawl=True``. The
crawl method maps to Tavily's ``/crawl`` endpoint, which accepts a seed
URL plus optional instructions and extract depth.
"""
"""Tavily web search + extract plugin — bundled, auto-loaded."""
from __future__ import annotations
+8 -73
@@ -1,33 +1,24 @@
"""Tavily web search + content extraction + crawl — plugin form.
"""Tavily web search + content extraction — plugin form.
Subclasses :class:`agent.web_search_provider.WebSearchProvider`. Three
Subclasses :class:`agent.web_search_provider.WebSearchProvider`. Two
capabilities advertised:
- ``supports_search()`` -> True (Tavily ``/search``)
- ``supports_extract()`` -> True (Tavily ``/extract``)
- ``supports_crawl()`` -> True (Tavily ``/crawl``), a sync HTTP crawl;
Firecrawl also advertises ``supports_crawl=True`` (async)
All three are sync: the underlying call is ``httpx.post(...)``. The
dispatcher in :func:`tools.web_tools.web_crawl_tool` (which is itself
async) will run sync providers in a thread when appropriate.
Both are sync: the underlying call is ``httpx.post(...)``.
Config keys this provider responds to::
web:
search_backend: "tavily" # explicit per-capability
extract_backend: "tavily" # explicit per-capability
crawl_backend: "tavily" # explicit per-capability
backend: "tavily" # shared fallback for all three
backend: "tavily" # shared fallback for both
Env vars::
TAVILY_API_KEY=... # https://app.tavily.com/home (required)
TAVILY_BASE_URL=... # optional override of https://api.tavily.com
Auth note: Tavily uses ``api_key`` in the JSON body for /search and
/extract, but **also requires** ``Authorization: Bearer <key>`` for /crawl
(body-only auth returns 401 on /crawl). The plugin handles both.
"""
from __future__ import annotations
@@ -63,11 +54,7 @@ def _tavily_request(endpoint: str, payload: Dict[str, Any]) -> Dict[str, Any]:
url = f"{base_url}/{endpoint.lstrip('/')}"
logger.info("Tavily %s request to %s", endpoint, url)
# Tavily /crawl requires Bearer header auth in addition to body auth;
# /search and /extract are body-only.
headers = {"Authorization": f"Bearer {api_key}"} if endpoint.strip("/") == "crawl" else {}
response = httpx.post(url, json=payload, headers=headers, timeout=60)
response = httpx.post(url, json=payload, timeout=60)
response.raise_for_status()
return response.json()
@@ -90,7 +77,7 @@ def _normalize_tavily_search_results(response: Dict[str, Any]) -> Dict[str, Any]
def _normalize_tavily_documents(
response: Dict[str, Any], fallback_url: str = ""
) -> List[Dict[str, Any]]:
"""Map Tavily ``/extract`` or ``/crawl`` response to standard documents.
"""Map Tavily ``/extract`` response to standard documents.
Documents follow the legacy LLM post-processing shape::
@@ -139,7 +126,7 @@ def _normalize_tavily_documents(
class TavilyWebSearchProvider(WebSearchProvider):
"""Tavily search + extract + crawl provider."""
"""Tavily search + extract provider."""
@property
def name(self) -> str:
@@ -159,9 +146,6 @@ class TavilyWebSearchProvider(WebSearchProvider):
def supports_extract(self) -> bool:
return True
def supports_crawl(self) -> bool:
return True
def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
"""Execute a Tavily search."""
try:
@@ -221,60 +205,11 @@ class TavilyWebSearchProvider(WebSearchProvider):
for u in urls
]
def crawl(self, url: str, **kwargs: Any) -> Dict[str, Any]:
"""Crawl a seed URL via Tavily's ``/crawl`` endpoint.
Accepted kwargs (others ignored for forward compat):
- ``instructions``: str, natural-language guidance for the crawl
- ``depth``: str, ``"basic"`` (default) or ``"advanced"``
- ``limit``: int, max pages to crawl (default 20)
Returns ``{"results": [...]}`` shaped to match what
:func:`tools.web_tools.web_crawl_tool` post-processes.
"""
try:
from tools.interrupt import is_interrupted
if is_interrupted():
return {"results": [{"url": url, "title": "", "content": "", "error": "Interrupted"}]}
instructions = kwargs.get("instructions")
depth = kwargs.get("depth", "basic")
limit = kwargs.get("limit", 20)
logger.info("Tavily crawl: %s (depth=%s, limit=%d)", url, depth, limit)
payload: Dict[str, Any] = {
"url": url,
"limit": limit,
"extract_depth": depth,
}
if instructions:
payload["instructions"] = instructions
raw = _tavily_request("crawl", payload)
return {
"results": _normalize_tavily_documents(raw, fallback_url=url)
}
except ValueError as exc:
return {"results": [{"url": url, "title": "", "content": "", "error": str(exc)}]}
except Exception as exc: # noqa: BLE001
logger.warning("Tavily crawl error: %s", exc)
return {
"results": [
{
"url": url,
"title": "",
"content": "",
"error": f"Tavily crawl failed: {exc}",
}
]
}
def get_setup_schema(self) -> Dict[str, Any]:
return {
"name": "Tavily",
"badge": "paid",
"tag": "Search + extract + crawl in one provider.",
"tag": "Search + extract in one provider.",
"env_vars": [
{
"key": "TAVILY_API_KEY",
-3
@@ -143,9 +143,6 @@ class XAIWebSearchProvider(WebSearchProvider):
def supports_extract(self) -> bool:
return False
def supports_crawl(self) -> bool:
return False
# -- Search -----------------------------------------------------------
def search(self, query: str, limit: int = 5) -> Dict[str, Any]:
+1 -1
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "hermes-agent"
version = "0.14.0"
version = "0.15.0"
description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
readme = "README.md"
requires-python = ">=3.11"
+183 -13
@@ -527,7 +527,81 @@ class AIAgent:
"Session DB creation failed (will retry next turn): %s", e
)
def reset_session_state(self):
def _transition_context_engine_session(
self,
*,
old_session_id: Optional[str] = None,
new_session_id: Optional[str] = None,
previous_messages: Optional[list] = None,
carry_over_context: bool = False,
reset_engine: bool = True,
**extra_context,
) -> None:
"""Notify the active context engine about a host session transition.
Generic host-side lifecycle helper. The built-in compressor keeps its
existing reset behavior; plugin engines that implement richer hooks
(``on_session_end``, ``on_session_reset``, ``on_session_start``,
``carry_over_new_session_context``) can flush old-session state,
reset runtime counters, bind to the new session, and optionally
carry retained context forward.
"""
engine = getattr(self, "context_compressor", None)
if not engine:
return
if old_session_id and previous_messages is not None and hasattr(engine, "on_session_end"):
try:
engine.on_session_end(old_session_id, previous_messages)
except Exception as exc:
logger.debug("context engine on_session_end during transition: %s", exc)
if reset_engine and hasattr(engine, "on_session_reset"):
try:
engine.on_session_reset()
except Exception as exc:
logger.debug("context engine on_session_reset during transition: %s", exc)
should_start = bool(
old_session_id
or previous_messages is not None
or carry_over_context
or extra_context
)
target_session_id = new_session_id or getattr(self, "session_id", "") or ""
if should_start and target_session_id and hasattr(engine, "on_session_start"):
start_context = {
"old_session_id": old_session_id,
"carry_over_context": carry_over_context,
"platform": getattr(self, "platform", None) or os.environ.get("HERMES_SESSION_SOURCE", "cli"),
"model": getattr(self, "model", ""),
"context_length": getattr(engine, "context_length", None),
"conversation_id": getattr(self, "_gateway_session_key", None),
}
start_context.update(extra_context)
start_context = {k: v for k, v in start_context.items() if v not in (None, "")}
try:
engine.on_session_start(target_session_id, **start_context)
except Exception as exc:
logger.debug("context engine on_session_start during transition: %s", exc)
if (
carry_over_context
and old_session_id
and target_session_id
and hasattr(engine, "carry_over_new_session_context")
):
try:
engine.carry_over_new_session_context(old_session_id, target_session_id)
except Exception as exc:
logger.debug("context engine carry_over_new_session_context during transition: %s", exc)
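Review note: the hook ordering in `_transition_context_engine_session` can be exercised with a small recording engine. This is a simplified sketch, not the real helper: `RecordingEngine` and `transition` are hypothetical, error handling is dropped, and the start context is reduced to `old_session_id`.

```python
from typing import Any, List, Optional


class RecordingEngine:
    """Hypothetical plugin engine that records which hooks fire, in order."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def on_session_end(self, session_id: str, messages: list) -> None:
        self.calls.append(f"end:{session_id}:{len(messages)}")

    def on_session_reset(self) -> None:
        self.calls.append("reset")

    def on_session_start(self, session_id: str, **context: Any) -> None:
        self.calls.append(f"start:{session_id}")

    def carry_over_new_session_context(self, old_id: str, new_id: str) -> None:
        self.calls.append(f"carry:{old_id}->{new_id}")


def transition(
    engine: Any,
    *,
    old_session_id: Optional[str] = None,
    new_session_id: Optional[str] = None,
    previous_messages: Optional[list] = None,
    carry_over_context: bool = False,
) -> None:
    """Simplified mirror of the ordering in the diff: end, reset, start, carry-over."""
    if old_session_id and previous_messages is not None and hasattr(engine, "on_session_end"):
        engine.on_session_end(old_session_id, previous_messages)
    if hasattr(engine, "on_session_reset"):
        engine.on_session_reset()
    should_start = bool(old_session_id or previous_messages is not None or carry_over_context)
    if should_start and new_session_id and hasattr(engine, "on_session_start"):
        engine.on_session_start(new_session_id, old_session_id=old_session_id)
    if (carry_over_context and old_session_id and new_session_id
            and hasattr(engine, "carry_over_new_session_context")):
        engine.carry_over_new_session_context(old_session_id, new_session_id)


full = RecordingEngine()
transition(full, old_session_id="a", new_session_id="b",
           previous_messages=[{"role": "user"}, {"role": "assistant"}],
           carry_over_context=True)

bare = RecordingEngine()
transition(bare, new_session_id="b")  # default callers: reset only
```

The `bare` case matches the docstring's promise that callers passing nothing keep the reset-only behavior, since `should_start` stays false.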
def reset_session_state(
self,
previous_messages: Optional[list] = None,
old_session_id: Optional[str] = None,
carry_over_context: bool = False,
):
"""Reset all session-scoped token counters to 0 for a fresh session.
This method encapsulates the reset logic for all session-level metrics
@@ -541,9 +615,12 @@ class AIAgent:
The method safely handles optional attributes (e.g., context compressor)
using ``hasattr`` checks.
This keeps the counter reset logic DRY and maintainable in one place
rather than scattering it across multiple methods.
When ``previous_messages`` / ``old_session_id`` / ``carry_over_context``
are provided, the active context engine is notified through the
full transition lifecycle (``_transition_context_engine_session``)
instead of a bare reset. Default callers pass nothing and keep the
existing reset-only behavior.
"""
# Token usage counters
self.session_total_tokens = 0
@@ -562,9 +639,14 @@ class AIAgent:
# Turn counter (added after reset_session_state was first written — #2635)
self._user_turn_count = 0
# Context engine reset (works for both built-in compressor and plugins)
if hasattr(self, "context_compressor") and self.context_compressor:
self.context_compressor.on_session_reset()
# Context engine reset/transition (works for built-in compressor and plugins)
self._transition_context_engine_session(
old_session_id=old_session_id,
new_session_id=getattr(self, "session_id", None),
previous_messages=previous_messages,
carry_over_context=carry_over_context,
reset_engine=True,
)
def _ensure_lmstudio_runtime_loaded(self, config_context_length: Optional[int] = None) -> None:
"""
@@ -719,6 +801,83 @@ class AIAgent:
except Exception:
logger.debug("status_callback error in _emit_warning", exc_info=True)
# ── Buffered retry/fallback status ────────────────────────────────────
# Retry and fallback chains were flooding the CLI/gateway with status
# noise that users found confusing: a single transient 429 could produce
# 10+ "Provider/Endpoint/Retrying in 5s..." lines before the request
# eventually succeeded. The buffered helpers below capture these
# status messages instead of emitting them immediately. They are
# flushed (shown to the user) ONLY when every retry and fallback has
# been exhausted; on success they are silently dropped. Backend logs
# (agent.log) are unaffected — every individual emission site still
# writes to ``logger.warning`` / ``logger.info`` for diagnosis.
def _buffer_status(self, message: str) -> None:
"""Buffer a retry/fallback status message.
Stored as a (kind, text) tuple where ``kind`` is one of:
- ``"status"`` -> replays via ``_emit_status``
- ``"vprint"`` -> replays via ``_vprint(force=True)``
- ``"warn"`` -> replays via ``_emit_warning``
Used to defer noisy retry chatter until we know whether the
turn ultimately recovered or failed.
"""
try:
buf = getattr(self, "_retry_status_buffer", None)
if buf is None:
buf = []
self._retry_status_buffer = buf
buf.append(("status", message))
except Exception:
# Never break the retry loop on a buffer hiccup.
pass
def _buffer_vprint(self, message: str) -> None:
"""Buffer a vprint(force=True) retry/fallback line."""
try:
buf = getattr(self, "_retry_status_buffer", None)
if buf is None:
buf = []
self._retry_status_buffer = buf
buf.append(("vprint", message))
except Exception:
pass
def _clear_status_buffer(self) -> None:
"""Drop buffered retry messages — call on successful recovery."""
try:
buf = getattr(self, "_retry_status_buffer", None)
if buf:
buf.clear()
except Exception:
pass
def _flush_status_buffer(self) -> None:
"""Emit buffered retry messages — call on terminal failure.
Surfaces the full retry/fallback trace so the user can see what
was tried before the turn gave up.
"""
try:
buf = getattr(self, "_retry_status_buffer", None)
if not buf:
return
# Drain first so a callback exception doesn't double-emit.
messages = list(buf)
buf.clear()
for kind, msg in messages:
try:
if kind == "status":
self._emit_status(msg)
elif kind == "warn":
self._emit_warning(msg)
else:
self._vprint(f"{self.log_prefix}{msg}", force=True)
except Exception:
pass
except Exception:
pass
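Review note: the buffer-then-decide pattern described in the comment block can be sketched outside `AIAgent`. This is a minimal sketch, assuming a single emit callback instead of the three replay paths; `StatusBuffer` and `call_with_retries` are hypothetical names, not part of the change.

```python
from typing import Callable, List, Tuple


class StatusBuffer:
    """Hypothetical stand-alone version of the buffered status helpers."""

    def __init__(self, emit: Callable[[str], None]) -> None:
        self._emit = emit
        self._buf: List[Tuple[str, str]] = []

    def buffer(self, message: str, kind: str = "status") -> None:
        self._buf.append((kind, message))

    def clear(self) -> None:
        """Drop everything: the turn recovered, so the noise is not shown."""
        self._buf.clear()

    def flush(self) -> None:
        """Show everything: the turn failed, so the trace is useful."""
        messages = list(self._buf)
        self._buf.clear()  # drain first so a failing emit cannot double-emit
        for kind, msg in messages:
            self._emit(f"[{kind}] {msg}")


def call_with_retries(op: Callable[[], str], buf: StatusBuffer, attempts: int = 3) -> str:
    for attempt in range(1, attempts + 1):
        try:
            result = op()
        except RuntimeError as exc:
            buf.buffer(f"attempt {attempt} failed: {exc}")
            continue
        buf.clear()
        return result
    buf.flush()
    raise RuntimeError("all attempts failed")


shown: List[str] = []
state = {"n": 0}


def flaky() -> str:
    # Fails twice with a transient error, then succeeds.
    state["n"] += 1
    if state["n"] < 3:
        raise RuntimeError("429")
    return "ok"


result = call_with_retries(flaky, StatusBuffer(shown.append))
```

Here `shown` stays empty because the third attempt succeeds; only a fully exhausted retry chain would surface the buffered lines.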
def _disable_codex_reasoning_replay(
self,
messages: Optional[List[Dict[str, Any]]] = None,
@@ -2847,7 +3006,12 @@ class AIAgent:
return True
def _try_refresh_nous_client_credentials(self, *, force: bool = True) -> bool:
def _try_refresh_nous_client_credentials(
self,
*,
force: bool = True,
inference_auth_mode: str | None = None,
) -> bool:
if self.api_mode != "chat_completions" or self.provider != "nous":
return False
@@ -2858,14 +3022,15 @@ class AIAgent:
resolve_nous_runtime_credentials,
)
selected_auth_mode = inference_auth_mode or (
NOUS_INFERENCE_AUTH_MODE_LEGACY
if force
else NOUS_INFERENCE_AUTH_MODE_AUTO
)
creds = resolve_nous_runtime_credentials(
min_key_ttl_seconds=max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800"))),
timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
inference_auth_mode=(
NOUS_INFERENCE_AUTH_MODE_LEGACY
if force
else NOUS_INFERENCE_AUTH_MODE_AUTO
),
inference_auth_mode=selected_auth_mode,
)
except Exception as exc:
logger.debug("Nous credential refresh failed: %s", exc)
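Review note: the precedence introduced by `selected_auth_mode` is that an explicit `inference_auth_mode` wins and `force` only picks the fallback. A minimal sketch follows; the constant values are placeholders, since the diff does not show the real ones, and `select_auth_mode` is a hypothetical name.

```python
from typing import Optional

# Placeholder values: the real constants are imported in the diff, values not shown.
NOUS_INFERENCE_AUTH_MODE_LEGACY = "legacy"
NOUS_INFERENCE_AUTH_MODE_AUTO = "auto"


def select_auth_mode(*, force: bool = True, inference_auth_mode: Optional[str] = None) -> str:
    """Explicit mode first; otherwise legacy when forcing, auto when not."""
    return inference_auth_mode or (
        NOUS_INFERENCE_AUTH_MODE_LEGACY if force else NOUS_INFERENCE_AUTH_MODE_AUTO
    )


default_mode = select_auth_mode()
explicit_mode = select_auth_mode(force=True, inference_auth_mode="auto")
```

Existing callers that pass only `force` see the same mode as before the change.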
@@ -3988,6 +4153,11 @@ class AIAgent:
from agent.agent_runtime_helpers import copy_reasoning_content_for_api
return copy_reasoning_content_for_api(self, source_msg, api_msg)
def _reapply_reasoning_echo_for_provider(self, api_messages: list) -> int:
"""Forwarder — see ``agent.agent_runtime_helpers.reapply_reasoning_echo_for_provider``."""
from agent.agent_runtime_helpers import reapply_reasoning_echo_for_provider
return reapply_reasoning_echo_for_provider(self, api_messages)
@staticmethod
def _sanitize_tool_calls_for_strict_api(api_msg: dict) -> dict:
"""Strip Codex Responses API fields from tool_calls for strict providers.
+24 -3
@@ -269,11 +269,28 @@ def main():
# Crawl skills.sh
all_skills.extend(crawl_skills_sh(skills_sh_source))
# Crawl other sources in parallel
# Crawl other sources in parallel.
# Per-source soft caps — sources stop returning when they run out, so these
# are ceilings, not targets. ClawHub's catalog is far larger than the old
# 500-item cap; raising its ceiling to 200k (well above current catalog size)
# lets the full catalog land in the index instead of being truncated at an
# arbitrary build-time limit.
SOURCE_LIMITS = {
# ClawHub had 49,698+ skills as of May 2026; 200k leaves headroom.
"clawhub": 200_000,
"lobehub": 100_000,
"browse-sh": 5_000,
"claude-marketplace": 5_000,
"github": 5_000,
"well-known": 5_000,
"official": 5_000,
}
DEFAULT_SOURCE_LIMIT = 500
with ThreadPoolExecutor(max_workers=4) as pool:
futures = {}
for name, source in sources.items():
futures[pool.submit(crawl_source, source, name, 500)] = name
limit = SOURCE_LIMITS.get(name, DEFAULT_SOURCE_LIMIT)
futures[pool.submit(crawl_source, source, name, limit)] = name
for future in as_completed(futures):
try:
all_skills.extend(future.result())
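Review note: the per-source cap lookup with a default can be tried with a stand-in crawler. This is a sketch only; `fake_crawl_source` and `crawl_all` are hypothetical, and the catalog sizes are made-up inputs rather than real counts.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List

# Subset of the caps from the diff.
SOURCE_LIMITS = {"clawhub": 200_000, "lobehub": 100_000, "github": 5_000}
DEFAULT_SOURCE_LIMIT = 500


def fake_crawl_source(name: str, available: int, limit: int) -> List[str]:
    """Hypothetical stand-in for crawl_source: stops at the cap or when items run out."""
    return [f"{name}-{i}" for i in range(min(available, limit))]


def crawl_all(catalog_sizes: Dict[str, int]) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {}
        for name, available in catalog_sizes.items():
            # Sources without an entry fall back to the old 500-item cap.
            limit = SOURCE_LIMITS.get(name, DEFAULT_SOURCE_LIMIT)
            futures[pool.submit(fake_crawl_source, name, available, limit)] = name
        for future in as_completed(futures):
            counts[futures[future]] = len(future.result())
    return counts


counts = crawl_all({"github": 7_000, "unlisted": 900, "lobehub": 50})
```

The caps act as ceilings: `lobehub` returns all 50 of its items, while `github` and the unlisted source are truncated at their limits.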
@@ -330,7 +347,11 @@ def main():
EXPECTED_FLOORS = {
"skills.sh": 100,
"lobehub": 100,
"clawhub": 50,
# ClawHub had 49,698+ skills as of May 2026 — anything under 20k means
# pagination broke or the API surface changed. Fail loudly rather
# than ship a degenerate index (we shipped 200/50000 silently for
# weeks because the floor was 50).
"clawhub": 20000,
"official": 50,
"github": 30, # collapsed across all GitHub taps
"browse-sh": 50,
+3
@@ -42,6 +42,7 @@ IGNORED_PATTERNS = [
re.compile(r"^Copilot$", re.IGNORECASE),
re.compile(r"^Cursor(\s+Agent)?$", re.IGNORECASE),
re.compile(r"^GitHub\s*Actions?$", re.IGNORECASE),
re.compile(r"^github-actions(\[bot\])?$", re.IGNORECASE),
re.compile(r"^dependabot", re.IGNORECASE),
re.compile(r"^renovate", re.IGNORECASE),
re.compile(r"^Hermes\s+(Agent|Audit)$", re.IGNORECASE),
@@ -51,10 +52,12 @@ IGNORED_PATTERNS = [
IGNORED_EMAILS = {
"noreply@anthropic.com",
"noreply@github.com",
"noreply@nousresearch.com",
"cursoragent@cursor.com",
"hermes@nousresearch.com",
"hermes-audit@example.com",
"hermes@habibilabs.dev",
"omx@oh-my-codex.dev",
}
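How name patterns and an email set like these might combine to filter bot authors can be sketched as below. This is an assumption about usage, not the script's actual code: `is_ignored_author` is a hypothetical helper, and only a few entries from the lists above are reproduced.

```python
import re

# Subset of the entries shown in the diff, for illustration only.
IGNORED_PATTERNS = [
    re.compile(r"^GitHub\s*Actions?$", re.IGNORECASE),
    re.compile(r"^github-actions(\[bot\])?$", re.IGNORECASE),
    re.compile(r"^dependabot", re.IGNORECASE),
]
IGNORED_EMAILS = {"noreply@github.com", "omx@oh-my-codex.dev"}


def is_ignored_author(name: str, email: str) -> bool:
    """Hypothetical helper: True when the author looks like a bot or tool."""
    if email.strip().lower() in IGNORED_EMAILS:
        return True
    return any(p.search(name.strip()) for p in IGNORED_PATTERNS)
```

Checking the email first keeps the common case cheap, since set membership avoids running every regex.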
+13 -28
@@ -27,21 +27,22 @@ import argparse
import shutil
import subprocess
import sys
import tarfile
import tempfile
import urllib.request
from pathlib import Path
# Pin a version we know patches cleanly. Update when a newer psutil
# changes the marker line shape and we need to follow upstream.
PSUTIL_URL = (
"https://files.pythonhosted.org/packages/aa/c6/"
"d1ddf4abb55e93cebc4f2ed8b5d6dbad109ecb8d63748dd2b20ab5e57ebe/"
"psutil-7.2.2.tar.gz"
# Keep sibling imports working when invoked as
# ``python scripts/install_psutil_android.py`` from the repo checkout.
REPO_ROOT = Path(__file__).resolve().parents[1]
if str(REPO_ROOT) not in sys.path:
sys.path.insert(0, str(REPO_ROOT))
from hermes_cli.psutil_android import (
PSUTIL_URL,
PsutilAndroidInstallError,
prepare_patched_psutil_sdist,
)
MARKER = 'LINUX = sys.platform.startswith("linux")'
REPLACEMENT = 'LINUX = sys.platform.startswith(("linux", "android"))'
def _resolve_install_cmd(pip_arg: str | None, prefer_uv: bool) -> list[str]:
@@ -82,26 +83,10 @@ def main() -> int:
tmp_path = Path(tmp)
archive = tmp_path / "psutil.tar.gz"
urllib.request.urlretrieve(PSUTIL_URL, archive)
with tarfile.open(archive) as tar:
tar.extractall(tmp_path)
try:
src_root = next(
p for p in tmp_path.iterdir()
if p.is_dir() and p.name.startswith("psutil-")
)
except StopIteration:
sys.exit("psutil sdist did not contain a psutil-* directory")
common_py = src_root / "psutil" / "_common.py"
content = common_py.read_text(encoding="utf-8")
if MARKER not in content:
sys.exit(
"psutil Android compatibility patch marker not found — "
"upstream may have changed the LINUX detection line. "
"Update MARKER/REPLACEMENT in this script."
)
common_py.write_text(content.replace(MARKER, REPLACEMENT), encoding="utf-8")
src_root = prepare_patched_psutil_sdist(archive, tmp_path)
except PsutilAndroidInstallError as exc:
sys.exit(str(exc))
cmd = install_cmd_prefix + ["install", "--no-build-isolation", str(src_root)]
print(f" $ {' '.join(cmd)}")
+24 -1
@@ -46,12 +46,14 @@ ACP_REGISTRY_MANIFEST = REPO_ROOT / "acp_registry" / "agent.json"
# Auto-extracted from noreply emails + manual overrides
AUTHOR_MAP = {
"9592417+adam91holt@users.noreply.github.com": "adam91holt",
"kchuang1015@users.noreply.github.com": "kchuang1015",
"45688690+fujinice@users.noreply.github.com": "fujinice",
"276689385+carltonawong@users.noreply.github.com": "carltonawong",
"195255660+EvilHumphrey@users.noreply.github.com": "EvilHumphrey",
"270604154+superearn-fisher@users.noreply.github.com": "superearn-fisher",
"3540493+kpadilha@users.noreply.github.com": "kpadilha",
"40378218+chaconne67@users.noreply.github.com": "chaconne67",
"Pluviobyte@users.noreply.github.com": "Pluviobyte",
"sanghyuk_seo@nexcubecorp.com": "sanghyuk-seo-nexcube",
"subrtt@gmail.com": "Brixyy",
"wangpuv@hotmail.com": "wangpuv",
@@ -73,6 +75,7 @@ AUTHOR_MAP = {
"anadi.jaggia@gmail.com": "Jaggia",
"steve@steveonjava.com": "steveonjava",
"steveonjava@gmail.com": "steveonjava",
"squiddy@2rook.ai": "MoonRay305",
"32201324+simpolism@users.noreply.github.com": "simpolism",
"simpolism@gmail.com": "simpolism",
"jake@nousresearch.com": "simpolism",
@@ -87,6 +90,7 @@ AUTHOR_MAP = {
"omar@techdeveloper.site": "nycomar",
"qiyin.zuo@pcitc.com": "qiyin-code",
"mr.aashiz@gmail.com": "aashizpoudel",
"adityargadgil@gmail.com": "AdityaRajeshGadgil",
"70629228+shaun0927@users.noreply.github.com": "shaun0927",
"soju06@users.noreply.github.com": "Soju06",
"34199905+Soju06@users.noreply.github.com": "Soju06",
@@ -568,7 +572,7 @@ AUTHOR_MAP = {
"ruzzgarcn@gmail.com": "Ruzzgar",
"yukipukikedy@gmail.com": "Yukipukii1",
"alireza78.crypto@gmail.com": "alireza78a",
"brooklyn.bb.nicholson@gmail.com": "brooklynnicholson",
"brooklyn.bb.nicholson@gmail.com": "OutThisLife",
"withapurpose37@gmail.com": "StefanIsMe",
"4317663+helix4u@users.noreply.github.com": "helix4u",
"ifkellx@users.noreply.github.com": "Ifkellx",
@@ -1287,6 +1291,8 @@ AUTHOR_MAP = {
"rudi193@gmail.com": "rudi193-cmd",
"86684667+sadiksaifi@users.noreply.github.com": "sadiksaifi", # PR #27982 salvage (kanban horiz scroll)
"mail@sadiksaifi.dev": "sadiksaifi",
"231588442+vynxevainglory-ai@users.noreply.github.com": "vynxevainglory-ai", # PR #29233 salvage (kanban scrollbar + body overflow)
"vynxevainglory@gmail.com": "vynxevainglory-ai",
# batch salvage (May 2026 LHF run, group 8)
"266824395+AceWattGit@users.noreply.github.com": "AceWattGit", # PR #28159 salvage (_pool_may_recover NameError)
"57024493+YuanHanzhong@users.noreply.github.com": "YuanHanzhong", # PR #28032 salvage (x.com status link-like)
@@ -1342,6 +1348,23 @@ AUTHOR_MAP = {
"timothy.b.dixon@gmail.com": "Codename-11", # PR #29302 (API server session controls — sessions/chat/fork/stream)
"jpschwartz2@uwalumni.com": "Schwartz10", # PR #29302 sub-PR (multimodal media in session chat API)
"JohnC1009@users.noreply.github.com": "JohnC1009", # PR #32020 salvage (auth: global auth.json fallback in _load_provider_state)
"biser@bisko.be": "bisko", # PR #33784 salvage (re-pad reasoning_content on cross-provider fallback to require-side providers)
# v0.15.0 additions
"glen@workmanfirearms.com": "sgtworkman",
"jorge.fuenmayort@gmail.com": "jfuenmayor",
"mordred@inaugust.com": "emonty",
"rodrigoeq@hotmail.com": "rodrigoeqnit",
"soliva.johnpaul@icloud.com": "jonpol01",
"2182712990@qq.com": "yu-xin-c", # PR #32122 (Docker audio bridge notes)
"baxter@bitreserve.ai": "BaxBit", # PR #30200 (Svix webhook signature validation)
"chris.eth@qq.com": "duyua9", # PR #10949 (render object config values structurally)
"ethie@nous": "ethernet8023", # PR #29342 (TUI clipboard copy on linux/wayland)
"jiahuigu@sjtu.edu.cn": "Jiahui-Gu", # PR #29276 (guard pickle.loads in darwinian-evolver)
"justinccdev@gmail.com": "justincc", # PR #28914 (set tool_name on tool-result messages)
"kdkcfp@gmail.com": "slowtokki0409", # PR #29025 (ignore local Hermes runtime files)
"peter.yuqin@gmail.com": "WuKongAI-CMU", # PR #10082 (reject symlinked audio inputs)
"sunil.nitie@gmail.com": "Sunil123135", # PR #31031 (Windows Docker Desktop compose)
"weichangyuwcy@gmail.com": "ChyuWei", # PR #30987 (TUI TTS env var on voice off)
}
+5
View File
@@ -17,6 +17,11 @@ prerequisites:
Himalaya is a CLI email client that lets you manage emails from the terminal using IMAP, SMTP, Notmuch, or Sendmail backends.
This skill is separate from the Hermes Email gateway adapter. The gateway
adapter lets people email the agent and uses Hermes' built-in IMAP/SMTP
adapter; this skill lets the agent operate a mailbox from terminal tools and
requires the external `himalaya` CLI.
## References
- `references/configuration.md` (config file setup + IMAP/SMTP authentication)
+12 -1
@@ -1188,16 +1188,27 @@ class TestBuildAnthropicKwargs:
# params through its signature, we exercise the strip behavior by
# calling the internal predicate directly.
from agent.anthropic_adapter import _forbids_sampling_params
assert _forbids_sampling_params("claude-opus-4-8") is True
assert _forbids_sampling_params("claude-opus-4-8-fast") is True
assert _forbids_sampling_params("claude-opus-4-7") is True
assert _forbids_sampling_params("claude-opus-4-6") is False
assert _forbids_sampling_params("claude-sonnet-4-5") is False
def test_supports_fast_mode_predicate(self):
"""Fast mode is Opus 4.6 only — Opus 4.7 and others must be excluded."""
"""Fast mode is Opus 4.6 only — Opus 4.7 and others must be excluded.
For Opus 4.8 the fast variant is a separate model ID
(anthropic/claude-opus-4.8-fast) routed through the normal model
field, NOT via the ``speed: "fast"`` request parameter. So
``_supports_fast_mode`` (which gates the parameter) must stay
False for both opus-4-8 and opus-4-8-fast.
"""
from agent.anthropic_adapter import _supports_fast_mode
assert _supports_fast_mode("claude-opus-4-6") is True
assert _supports_fast_mode("anthropic/claude-opus-4-6") is True
assert _supports_fast_mode("claude-opus-4-7") is False
assert _supports_fast_mode("claude-opus-4-8") is False
assert _supports_fast_mode("claude-opus-4-8-fast") is False
assert _supports_fast_mode("claude-sonnet-4-6") is False
assert _supports_fast_mode("claude-haiku-4-5") is False
assert _supports_fast_mode("") is False
+96
@@ -992,6 +992,47 @@ class TestAuxiliaryPoolAwareness:
assert stale_client.chat.completions.create.call_count == 1
assert fresh_client.chat.completions.create.call_count == 1
def test_call_llm_refreshes_nous_after_free_tier_block_when_account_paid(self):
from hermes_cli.nous_account import NousPortalAccountInfo
class _Payment404(Exception):
status_code = 404
stale_client = MagicMock()
stale_client.base_url = "https://inference-api.nousresearch.com/v1"
stale_client.chat.completions.create.side_effect = _Payment404(
"model_not_supported_on_free_tier: model is not available on the free tier"
)
fresh_client = MagicMock()
fresh_client.base_url = "https://inference-api.nousresearch.com/v1"
fresh_client.chat.completions.create.return_value = {"ok": True}
with (
patch("agent.auxiliary_client._resolve_task_provider_model", return_value=("nous", "nous-model", None, None, None)),
patch("agent.auxiliary_client._get_cached_client", return_value=(stale_client, "nous-model")),
patch("agent.auxiliary_client.OpenAI", return_value=fresh_client),
patch("agent.auxiliary_client._validate_llm_response", side_effect=lambda resp, _task: resp),
patch("agent.auxiliary_client._resolve_nous_runtime_api", return_value=("fresh-agent-key", "https://inference-api.nousresearch.com/v1")),
patch(
"hermes_cli.nous_account.get_nous_portal_account_info",
return_value=NousPortalAccountInfo(
logged_in=True,
source="account_api",
fresh=True,
paid_service_access=True,
),
),
):
result = call_llm(
task="compression",
messages=[{"role": "user", "content": "hi"}],
)
assert result == {"ok": True}
assert stale_client.chat.completions.create.call_count == 1
assert fresh_client.chat.completions.create.call_count == 1
@pytest.mark.asyncio
async def test_async_call_llm_retries_nous_after_401(self):
class _Auth401(Exception):
@@ -1021,6 +1062,48 @@ class TestAuxiliaryPoolAwareness:
assert stale_client.chat.completions.create.await_count == 1
assert fresh_async_client.chat.completions.create.await_count == 1
@pytest.mark.asyncio
async def test_async_call_llm_refreshes_nous_after_free_tier_block_when_account_paid(self):
from hermes_cli.nous_account import NousPortalAccountInfo
class _Payment404(Exception):
status_code = 404
stale_client = MagicMock()
stale_client.base_url = "https://inference-api.nousresearch.com/v1"
stale_client.chat.completions.create = AsyncMock(side_effect=_Payment404(
"model_not_supported_on_free_tier: model is not available on the free tier"
))
fresh_async_client = MagicMock()
fresh_async_client.base_url = "https://inference-api.nousresearch.com/v1"
fresh_async_client.chat.completions.create = AsyncMock(return_value={"ok": True})
with (
patch("agent.auxiliary_client._resolve_task_provider_model", return_value=("nous", "nous-model", None, None, None)),
patch("agent.auxiliary_client._get_cached_client", return_value=(stale_client, "nous-model")),
patch("agent.auxiliary_client._to_async_client", return_value=(fresh_async_client, "nous-model")),
patch("agent.auxiliary_client._validate_llm_response", side_effect=lambda resp, _task: resp),
patch("agent.auxiliary_client._resolve_nous_runtime_api", return_value=("fresh-agent-key", "https://inference-api.nousresearch.com/v1")),
patch(
"hermes_cli.nous_account.get_nous_portal_account_info",
return_value=NousPortalAccountInfo(
logged_in=True,
source="account_api",
fresh=True,
paid_service_access=True,
),
),
):
result = await async_call_llm(
task="session_search",
messages=[{"role": "user", "content": "hi"}],
)
assert result == {"ok": True}
assert stale_client.chat.completions.create.await_count == 1
assert fresh_async_client.chat.completions.create.await_count == 1
def test_cached_gmi_client_keeps_explicit_slash_model_override(self):
import agent.auxiliary_client as aux
@@ -1076,6 +1159,19 @@ class TestIsPaymentError:
exc.status_code = 429
assert _is_payment_error(exc) is True
def test_404_free_tier_model_block_is_payment(self):
exc = Exception(
"Model 'gpt-5' is not available on the Free Tier. "
"Upgrade at https://portal.nousresearch.com or pick a free model."
)
exc.status_code = 404
assert _is_payment_error(exc) is True
def test_404_generic_not_found_is_not_payment(self):
exc = Exception("Not Found")
exc.status_code = 404
assert _is_payment_error(exc) is False
def test_429_without_credits_message_is_not_payment(self):
"""Normal rate limits should NOT be treated as payment errors."""
exc = Exception("Rate limit exceeded, try again in 2 seconds")
+1
@@ -114,6 +114,7 @@ def test_ttfb_includes_silent_hang_hint_for_gpt_5_5(tmp_path, monkeypatch):
statuses: list[str] = []
dummy_client = SimpleNamespace()
monkeypatch.setattr(agent, "_create_request_openai_client", lambda **k: dummy_client)
monkeypatch.setattr(agent, "_buffer_status", lambda msg: statuses.append(msg))
monkeypatch.setattr(agent, "_emit_status", lambda msg: statuses.append(msg))
monkeypatch.setattr(
agent, "_abort_request_openai_client",
@@ -67,3 +67,21 @@ def test_resume_rehydrates_previous_summary_from_handoff_message():
assert "TURNS TO SUMMARIZE:" not in prompt
assert prompt.count(old_summary) == 1
assert f"[USER]: {SUMMARY_PREFIX}" not in prompt
def test_handoff_in_protected_head_populates_previous_summary_before_update():
"""A resumed protected-head handoff should restore iterative-summary state."""
compressor = _compressor()
old_summary = "PROTECTED-HEAD-SUMMARY durable facts from before restart"
seen_turns = []
def fake_generate_summary(turns_to_summarize, focus_topic=None):
seen_turns.extend(turns_to_summarize)
return "new summary from resumed turns"
with patch.object(compressor, "_generate_summary", side_effect=fake_generate_summary):
compressor.compress(_messages_with_handoff(old_summary))
assert compressor._previous_summary == old_summary
assert seen_turns
assert all(old_summary not in str(msg.get("content", "")) for msg in seen_turns)
@@ -0,0 +1,290 @@
"""Regressions for the context-engine host contract.
These tests pin the five generic host-side guarantees that external context
engine plugins (e.g. hermes-lcm) rely on:
1. ``_transition_context_engine_session`` drives the full lifecycle
(on_session_end → on_session_reset → on_session_start → optional
carry_over_new_session_context) and ``reset_session_state`` delegates
to it when callers pass session metadata.
2. ``on_session_start`` receives ``conversation_id`` derived from
``_gateway_session_key`` at agent init time.
3. ``conversation_loop`` forwards canonical cache buckets
(``cache_read_tokens``, ``cache_write_tokens``, ``input_tokens``,
``output_tokens``, ``reasoning_tokens``) to the engine's
``update_from_response``, on top of the legacy aggregate keys.
4. ``_discover_context_engines`` includes plugin-registered engines (not
just repo-shipped engines under ``plugins/context_engine/``).
5. The repo-shipped ``_EngineCollector`` honors ``ctx.register_command``
from a plugin engine's ``register(ctx)`` entry point and routes it
to the global plugin command registry.
"""
from __future__ import annotations
from unittest.mock import MagicMock
import pytest
from run_agent import AIAgent
def _bare_agent() -> AIAgent:
agent = object.__new__(AIAgent)
agent.session_id = "test-session"
agent.model = "fake-model"
agent.platform = "telegram"
agent._gateway_session_key = "agent:main:telegram:dm:42"
return agent
def test_transition_runs_full_lifecycle_in_order():
"""End → reset → start → carry_over, in that order, when all inputs apply."""
events: list[str] = []
engine = MagicMock()
engine.context_length = 200_000
engine.on_session_end.side_effect = lambda *a, **kw: events.append("on_session_end")
engine.on_session_reset.side_effect = lambda *a, **kw: events.append("on_session_reset")
engine.on_session_start.side_effect = lambda *a, **kw: events.append("on_session_start")
engine.carry_over_new_session_context.side_effect = lambda *a, **kw: events.append("carry_over")
agent = _bare_agent()
agent.context_compressor = engine
agent._transition_context_engine_session(
old_session_id="old-sid",
new_session_id="new-sid",
previous_messages=[{"role": "user", "content": "hi"}],
carry_over_context=True,
)
assert events == [
"on_session_end",
"on_session_reset",
"on_session_start",
"carry_over",
]
def test_transition_passes_conversation_id_from_gateway_session_key():
"""on_session_start receives ``conversation_id`` from ``_gateway_session_key``."""
engine = MagicMock()
engine.context_length = 200_000
captured: dict = {}
engine.on_session_start.side_effect = lambda sid, **kw: captured.update(kw)
agent = _bare_agent()
agent.context_compressor = engine
agent._transition_context_engine_session(
old_session_id="old-sid",
new_session_id="new-sid",
previous_messages=[{"role": "user", "content": "hi"}],
)
assert captured.get("conversation_id") == "agent:main:telegram:dm:42"
assert captured.get("old_session_id") == "old-sid"
assert captured.get("platform") == "telegram"
def test_transition_skips_optional_hooks_when_engine_lacks_them():
"""Engines that don't implement on_session_end/carry_over still work."""
class MinimalEngine:
def __init__(self):
self.context_length = 100_000
self.reset_called = False
self.start_called_with = None
def on_session_reset(self):
self.reset_called = True
def on_session_start(self, sid, **kw):
self.start_called_with = (sid, kw)
engine = MinimalEngine()
agent = _bare_agent()
agent.context_compressor = engine
# Should not raise even though on_session_end / carry_over are missing.
agent._transition_context_engine_session(
old_session_id="old",
new_session_id="new",
previous_messages=[{"role": "user", "content": "hi"}],
carry_over_context=True,
)
assert engine.reset_called is True
assert engine.start_called_with is not None
new_sid, kw = engine.start_called_with
assert new_sid == "new"
assert kw.get("old_session_id") == "old"
def test_reset_session_state_delegates_to_transition_when_args_provided():
"""``reset_session_state(previous_messages=..., old_session_id=...)`` fires full lifecycle."""
engine = MagicMock()
engine.context_length = 100_000
agent = _bare_agent()
agent.context_compressor = engine
agent.reset_session_state(
previous_messages=[{"role": "user", "content": "hi"}],
old_session_id="old-sid",
)
assert engine.on_session_end.called
assert engine.on_session_reset.called
assert engine.on_session_start.called
# No carry_over_context, so carry_over hook NOT called.
assert not engine.carry_over_new_session_context.called
def test_reset_session_state_default_call_only_resets():
"""Bare ``reset_session_state()`` still only resets the engine (no end/start)."""
engine = MagicMock()
engine.context_length = 100_000
agent = _bare_agent()
agent.context_compressor = engine
agent.reset_session_state()
assert engine.on_session_reset.called
assert not engine.on_session_end.called
assert not engine.on_session_start.called
def test_update_from_response_forwards_canonical_cache_buckets():
"""conversation_loop passes cache_read/write/reasoning tokens to engine."""
# Test the contract directly: a usage_dict built from CanonicalUsage must
# contain the canonical buckets in addition to the legacy keys. We don't
# spin up the full conversation loop; we just verify the dict shape.
from agent.usage_pricing import CanonicalUsage
canonical = CanonicalUsage(
input_tokens=1000,
output_tokens=500,
cache_read_tokens=800,
cache_write_tokens=200,
reasoning_tokens=50,
)
usage_dict = {
"prompt_tokens": canonical.prompt_tokens,
"completion_tokens": canonical.output_tokens,
"total_tokens": canonical.total_tokens,
"input_tokens": canonical.input_tokens,
"output_tokens": canonical.output_tokens,
"cache_read_tokens": canonical.cache_read_tokens,
"cache_write_tokens": canonical.cache_write_tokens,
"reasoning_tokens": canonical.reasoning_tokens,
}
# Legacy keys present
assert usage_dict["prompt_tokens"] == canonical.prompt_tokens
assert usage_dict["completion_tokens"] == 500
assert usage_dict["total_tokens"] == canonical.total_tokens
# Canonical cache + reasoning buckets present
assert usage_dict["cache_read_tokens"] == 800
assert usage_dict["cache_write_tokens"] == 200
assert usage_dict["reasoning_tokens"] == 50
assert usage_dict["input_tokens"] == 1000
assert usage_dict["output_tokens"] == 500
def test_discover_context_engines_includes_plugin_registered_engines(monkeypatch):
"""Plugin-registered context engines appear in the ``hermes plugins`` picker."""
from hermes_cli import plugins_cmd
fake_repo = lambda: [("compressor", "built-in", True)]
class FakePluginEngine:
name = "lcm"
monkeypatch.setattr(
"plugins.context_engine.discover_context_engines",
fake_repo,
)
monkeypatch.setattr(
"hermes_cli.plugins.discover_plugins",
lambda *_a, **_kw: None,
)
monkeypatch.setattr(
"hermes_cli.plugins.get_plugin_context_engine",
lambda: FakePluginEngine(),
)
engines = plugins_cmd._discover_context_engines()
names = [n for n, _desc in engines]
assert "compressor" in names
assert "lcm" in names
def test_discover_context_engines_dedupes_by_name(monkeypatch):
"""Repo-shipped engine wins when name collides with a plugin-registered one."""
from hermes_cli import plugins_cmd
class FakePluginEngine:
name = "compressor" # same name as repo-shipped
monkeypatch.setattr(
"plugins.context_engine.discover_context_engines",
lambda: [("compressor", "built-in compressor", True)],
)
monkeypatch.setattr(
"hermes_cli.plugins.discover_plugins",
lambda *_a, **_kw: None,
)
monkeypatch.setattr(
"hermes_cli.plugins.get_plugin_context_engine",
lambda: FakePluginEngine(),
)
engines = plugins_cmd._discover_context_engines()
# Only one entry — the repo-shipped one. Description is preserved.
assert engines == [("compressor", "built-in compressor")]
def test_engine_collector_forwards_register_command_to_plugin_manager():
"""A plugin context engine can register a slash command via ``ctx.register_command``."""
from plugins.context_engine import _EngineCollector
from hermes_cli.plugins import get_plugin_manager
handler = lambda raw_args: f"echo: {raw_args}"
collector = _EngineCollector(engine_name="my-lcm")
collector.register_command(
"my-lcm-test-cmd",
handler,
description="test command from a context engine",
args_hint="<msg>",
)
manager = get_plugin_manager()
try:
assert "my-lcm-test-cmd" in manager._plugin_commands
entry = manager._plugin_commands["my-lcm-test-cmd"]
assert entry["handler"] is handler
assert entry["args_hint"] == "<msg>"
assert entry["plugin"] == "context-engine:my-lcm"
finally:
# Clean up so we don't leak the registration across tests.
manager._plugin_commands.pop("my-lcm-test-cmd", None)
def test_engine_collector_rejects_builtin_command_conflicts():
"""Context engine cannot shadow built-in slash commands like /help."""
from plugins.context_engine import _EngineCollector
from hermes_cli.plugins import get_plugin_manager
collector = _EngineCollector(engine_name="my-lcm")
collector.register_command("help", lambda *_: "shadow")
manager = get_plugin_manager()
# Must NOT have overwritten / registered against built-in /help.
assert "help" not in manager._plugin_commands or \
manager._plugin_commands["help"].get("plugin") != "context-engine:my-lcm"
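The lifecycle ordering these tests pin down (end, reset, start, then optional carry-over, with missing optional hooks skipped) could look roughly like the sketch below. It is not the `AIAgent` implementation: `transition_session` is a hypothetical free function, and the hook argument lists are assumptions chosen for illustration.

```python
from typing import Any


def transition_session(
    engine: Any,
    old_session_id: str,
    new_session_id: str,
    previous_messages: list,
    carry_over_context: bool = False,
) -> list[str]:
    """Run the hooks in order and return the names of the hooks that ran."""
    ran: list[str] = []
    end_hook = getattr(engine, "on_session_end", None)
    if callable(end_hook):  # optional hook: skip when the engine lacks it
        end_hook(old_session_id, previous_messages)
        ran.append("on_session_end")
    engine.on_session_reset()
    ran.append("on_session_reset")
    engine.on_session_start(new_session_id, old_session_id=old_session_id)
    ran.append("on_session_start")
    carry_hook = getattr(engine, "carry_over_new_session_context", None)
    if carry_over_context and callable(carry_hook):
        carry_hook(old_session_id, new_session_id)
        ran.append("carry_over_new_session_context")
    return ran


class MinimalEngine:
    """Engine with only the two required hooks."""

    def on_session_reset(self) -> None:
        pass

    def on_session_start(self, sid: str, **kw: Any) -> None:
        pass


class FullEngine(MinimalEngine):
    """Engine that also implements both optional hooks."""

    def on_session_end(self, sid: str, messages: list) -> None:
        pass

    def carry_over_new_session_context(self, old_sid: str, new_sid: str) -> None:
        pass
```

Probing with `getattr` keeps minimal engines working without forcing them to subclass a base class that stubs every hook.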
+130
@@ -59,6 +59,7 @@ class TestFailoverReason:
"invalid_encrypted_content",
"multimodal_tool_content_unsupported",
"provider_policy_blocked",
"content_policy_blocked",
"thinking_signature", "long_context_tier",
"oauth_long_context_beta_forbidden",
"llama_cpp_grammar_pattern",
@@ -254,12 +255,51 @@ class TestClassifyApiError:
assert result.reason == FailoverReason.billing
assert result.retryable is False
def test_402_out_of_funds_billing(self):
e = MockAPIError(
"Payment Required",
status_code=402,
body={
"status": 402,
"message": (
"Your API key has run out of funds. Please go visit the "
"portal to sort that out: https://portal.nousresearch.com"
),
},
)
result = classify_api_error(e)
assert result.reason == FailoverReason.billing
assert result.retryable is False
def test_402_transient_usage_limit(self):
e = MockAPIError("usage limit exceeded, try again later", status_code=402)
result = classify_api_error(e)
assert result.reason == FailoverReason.rate_limit
assert result.retryable is True
def test_403_plan_entitlement_billing(self):
e = MockAPIError("This plan does not include the requested model", status_code=403)
result = classify_api_error(e)
assert result.reason == FailoverReason.billing
assert result.retryable is False
def test_404_free_tier_model_block_is_billing(self):
e = MockAPIError(
"Not Found",
status_code=404,
body={
"status": 404,
"message": (
"Model 'gpt-5' is not available on the Free Tier. "
"Upgrade at https://portal.nousresearch.com or pick a free model."
),
},
)
result = classify_api_error(e, provider="nous", model="gpt-5")
assert result.reason == FailoverReason.billing
assert result.retryable is False
assert result.should_fallback is True
# ── Rate limit ──
def test_429_rate_limit(self):
@@ -427,6 +467,78 @@ class TestClassifyApiError:
result = classify_api_error(e)
assert result.reason == FailoverReason.provider_policy_blocked
# ── Provider content-policy block (per-prompt safety filter) ──
#
# Distinct from ``provider_policy_blocked`` above — these are upstream
# model-provider safety refusals for THIS prompt, not OpenRouter
# account-level data policy. Recovery is fallback model, not config fix.
# See issue #18028 — OpenAI Codex was burning 3 retries on identical
# refusals before users saw "API failed after 3 retries" on Telegram.
def test_message_only_cyber_content_policy_blocked(self):
# OpenAI Codex returns this without an HTTP status. Retrying the
# same prompt three times only repeats the same policy decision, so
# the classifier must jump straight to fallback / abort instead of
# leaving it in the retryable ``unknown`` bucket.
e = Exception(
"This content was flagged for possible cybersecurity risk. If this "
"seems wrong, try rephrasing your request. To get authorized for "
"security work, join the Trusted Access for Cyber program."
)
result = classify_api_error(e, provider="openai-codex", model="gpt-5.5")
assert result.reason == FailoverReason.content_policy_blocked
assert result.retryable is False
assert result.should_fallback is True
assert result.should_compress is False
def test_400_cyber_content_policy_blocked(self):
# When the SDK does attach a status (e.g. 400), the safety pattern
# must still beat the format_error fallthrough.
e = MockAPIError(
"This content was flagged for possible cybersecurity risk",
status_code=400,
)
result = classify_api_error(e, provider="openai-codex", model="gpt-5.5")
assert result.reason == FailoverReason.content_policy_blocked
assert result.retryable is False
assert result.should_fallback is True
def test_openai_usage_policy_violation_content_policy_blocked(self):
# OpenAI moderation refusal wording from chat completions / responses.
e = MockAPIError(
"Your request was flagged by the moderation system as potentially "
"violating OpenAI's usage policies.",
status_code=400,
)
result = classify_api_error(e, provider="openai", model="gpt-4o")
assert result.reason == FailoverReason.content_policy_blocked
assert result.retryable is False
assert result.should_fallback is True
def test_anthropic_safety_system_content_policy_blocked(self):
# Anthropic safety refusal — distinct phrasing from OpenAI.
e = Exception(
"Your prompt was flagged by our safety system. Please rephrase "
"and try again."
)
result = classify_api_error(e, provider="anthropic", model="claude-3-5-sonnet")
assert result.reason == FailoverReason.content_policy_blocked
assert result.retryable is False
assert result.should_fallback is True
def test_azure_content_filter_content_policy_blocked(self):
# Azure OpenAI returns ``content_filter`` finish reason / error code
# and ``ResponsibleAIPolicyViolation`` in error bodies — both narrow
# tokens, not the generic English phrase.
e = MockAPIError(
"The response was filtered: ResponsibleAIPolicyViolation "
"(finish_reason=content_filter).",
status_code=400,
)
result = classify_api_error(e, provider="azure", model="gpt-4o")
assert result.reason == FailoverReason.content_policy_blocked
assert result.retryable is False
def test_404_model_not_found_still_works(self):
# Regression guard: the new policy-block check must not swallow
# genuine model_not_found 404s.
@@ -753,6 +865,19 @@ class TestClassifyApiError:
result = classify_api_error(e)
assert result.reason == FailoverReason.context_overflow
def test_error_code_model_not_supported_on_free_tier_is_billing(self):
e = MockAPIError(
"Model unavailable",
body={
"error": {
"code": "model_not_supported_on_free_tier",
"message": "Model 'gpt-5' is not available on the Free Tier.",
}
},
)
result = classify_api_error(e, provider="nous", model="gpt-5")
assert result.reason == FailoverReason.billing
# ── Message-only patterns (no status code) ──
def test_message_billing_pattern(self):
@@ -760,6 +885,11 @@ class TestClassifyApiError:
result = classify_api_error(e)
assert result.reason == FailoverReason.billing
def test_message_free_tier_model_block_is_billing(self):
e = Exception("Model 'gpt-5' is not available on the Free Tier.")
result = classify_api_error(e, provider="nous", model="gpt-5")
assert result.reason == FailoverReason.billing
def test_message_rate_limit_pattern(self):
e = Exception("rate limit reached for this model")
result = classify_api_error(e)
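The content-policy tests above expect a refusal to skip retries and go straight to fallback. A message-matching classifier with that behavior might be sketched as follows. This is not `classify_api_error`: `classify_message`, `Classification`, and the marker list are hypothetical, with the marker strings taken from the error messages used in the tests, and the `unknown` defaults are an assumption.

```python
from dataclasses import dataclass

# Hypothetical marker list, derived from the messages in the tests above.
_CONTENT_POLICY_MARKERS = (
    "flagged for possible cybersecurity risk",
    "flagged by the moderation system",
    "flagged by our safety system",
    "responsibleaipolicyviolation",
    "content_filter",
)


@dataclass(frozen=True)
class Classification:
    reason: str
    retryable: bool
    should_fallback: bool


def classify_message(message: str) -> Classification:
    """Map an error message to a coarse failover decision."""
    lowered = message.lower()
    if any(marker in lowered for marker in _CONTENT_POLICY_MARKERS):
        # Retrying the same prompt repeats the same refusal, so do not retry.
        return Classification("content_policy_blocked", False, True)
    # Assumed default: unrecognized errors stay in a retryable bucket.
    return Classification("unknown", True, False)
```

Matching on narrow phrases rather than a generic word like "policy" reduces the chance of swallowing unrelated errors such as a genuine model-not-found 404.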
+25
@@ -0,0 +1,25 @@
from __future__ import annotations
import importlib
import sys
from agent import jiter_preload
def test_preload_jiter_native_extension_loads_sdk_parser_dependency():
assert jiter_preload.preload_jiter_native_extension() is True
assert "jiter.jiter" in sys.modules
def test_preload_jiter_native_extension_is_best_effort(monkeypatch):
monkeypatch.setattr(jiter_preload, "_JITER_PRELOADED", False)
def _raise_missing(name: str):
assert name == "jiter.jiter"
raise ModuleNotFoundError(name)
monkeypatch.setattr(importlib, "import_module", _raise_missing)
assert jiter_preload.preload_jiter_native_extension() is False
assert jiter_preload._JITER_PRELOADED is False
assert isinstance(jiter_preload._JITER_PRELOAD_ERROR, ModuleNotFoundError)
+2 -2
@@ -131,10 +131,10 @@ class TestDefaultContextLengths:
for key, value in DEFAULT_CONTEXT_LENGTHS.items():
if "claude" not in key:
continue
# Claude 4.6+ models (4.6 and 4.7) have 1M context at standard
# Claude 4.6+ models (4.6, 4.7, 4.8) have 1M context at standard
# API pricing (no long-context premium). Older Claude 4.x and
# 3.x models cap at 200k.
if any(tag in key for tag in ("4.6", "4-6", "4.7", "4-7")):
if any(tag in key for tag in ("4.6", "4-6", "4.7", "4-7", "4.8", "4-8")):
assert value == 1000000, f"{key} should be 1000000"
else:
assert value == 200000, f"{key} should be 200000"
+8 -2
@@ -271,7 +271,10 @@ def test_codex_provider_replaces_incompatible_default_model(monkeypatch):
def test_model_flow_nous_prints_subscription_guidance_without_mutating_explicit_tts(monkeypatch, capsys):
monkeypatch.setattr("hermes_cli.nous_subscription.managed_nous_tools_enabled", lambda: True)
monkeypatch.setattr(
"hermes_cli.nous_subscription.managed_nous_tools_enabled",
lambda *args, **kwargs: True,
)
config = {
"model": {"provider": "nous", "default": "claude-opus-4-6"},
"tts": {"provider": "elevenlabs"},
@@ -306,7 +309,10 @@ def test_model_flow_nous_prints_subscription_guidance_without_mutating_explicit_
def test_model_flow_nous_offers_tool_gateway_prompt_when_unconfigured(monkeypatch, capsys):
monkeypatch.setattr("hermes_cli.nous_subscription.managed_nous_tools_enabled", lambda: True)
monkeypatch.setattr(
"hermes_cli.nous_subscription.managed_nous_tools_enabled",
lambda *args, **kwargs: True,
)
config = {
"model": {"provider": "nous", "default": "claude-opus-4-6"},
"tts": {"provider": "edge"},
+15
@@ -1450,9 +1450,19 @@ class TestRunJobConfigLogging:
"prompt": "hello",
}
# Mock heavy post-yaml work so the test only exercises the warning
# path. Without these mocks, _run_job_impl continues into provider
# resolution and MCP discovery, both of which can spawn subprocesses
# / hit the network and have caused this test to time out on CI
# (>30s wall clock) under load. See PR #33661 follow-up.
with patch("cron.scheduler._hermes_home", tmp_path), \
patch("cron.scheduler._resolve_origin", return_value=None), \
patch("dotenv.load_dotenv"), \
patch("hermes_cli.runtime_provider.resolve_runtime_provider",
return_value={"provider": "openrouter", "api_key": "x",
"base_url": "https://example.invalid",
"api_mode": "chat_completions"}), \
patch("tools.mcp_tool.discover_mcp_tools", return_value=[]), \
patch("run_agent.AIAgent") as mock_agent_cls:
mock_agent = MagicMock()
mock_agent.run_conversation.return_value = {"final_response": "ok"}
@@ -1482,6 +1492,11 @@ class TestRunJobConfigLogging:
with patch("cron.scheduler._hermes_home", tmp_path), \
patch("cron.scheduler._resolve_origin", return_value=None), \
patch("dotenv.load_dotenv"), \
patch("hermes_cli.runtime_provider.resolve_runtime_provider",
return_value={"provider": "openrouter", "api_key": "x",
"base_url": "https://example.invalid",
"api_mode": "chat_completions"}), \
patch("tools.mcp_tool.discover_mcp_tools", return_value=[]), \
patch("run_agent.AIAgent") as mock_agent_cls:
mock_agent = MagicMock()
mock_agent.run_conversation.return_value = {"final_response": "ok"}
@@ -0,0 +1,290 @@
"""Regression tests for the docker-exec privilege-drop shim.
The shim (docker/hermes-exec-shim.sh, installed at /opt/hermes/bin/hermes)
exists to prevent the auth.json ownership-mismatch bug where
`docker exec <c> hermes login` would write /opt/data/auth.json as
root:root mode 0600, leaving the supervised gateway (UID 10000) unable
to read its own credentials and returning "Provider authentication
failed: Hermes is not logged into Nous Portal" on every message.
These tests verify:
1. ``docker exec <c> hermes ...`` (defaulting to root) gets dropped to the
hermes user before the real binary runs.
2. ``docker exec --user hermes <c> hermes ...`` (already non-root)
short-circuits and doesn't try to drop again.
3. Files written under $HERMES_HOME from a ``docker exec`` session land
as hermes:hermes, the actual user-visible invariant.
4. The HERMES_DOCKER_EXEC_AS_ROOT opt-out lets diagnostic sessions keep
running as root deliberately.
5. The main CMD path (``docker run <image> ...``) is unaffected by the
PATH-shim ordering: no recursion, no behavior change.
"""
from __future__ import annotations
import subprocess
import time
from collections.abc import Iterator
import pytest
# How long to give a `docker run -d` container before declaring it not ready.
_RUN_READY_TIMEOUT_S = 20
def _wait_for_init(container: str) -> None:
"""Block until /init is up enough that `docker exec` is responsive."""
deadline = time.time() + _RUN_READY_TIMEOUT_S
while time.time() < deadline:
r = subprocess.run(
["docker", "exec", container, "true"],
capture_output=True, timeout=5,
)
if r.returncode == 0:
return
time.sleep(0.2)
pytest.fail(f"container {container} not responsive to docker exec within {_RUN_READY_TIMEOUT_S}s")
@pytest.fixture
def sleep_container(built_image: str, container_name: str) -> Iterator[str]:
"""Long-lived container running `sleep infinity` so we can docker exec into it."""
subprocess.run(
["docker", "rm", "-f", container_name],
capture_output=True, check=False,
)
r = subprocess.run(
["docker", "run", "-d", "--name", container_name, built_image,
"sleep", "infinity"],
capture_output=True, text=True, timeout=30,
)
assert r.returncode == 0, f"docker run failed: {r.stderr}"
try:
_wait_for_init(container_name)
yield container_name
finally:
subprocess.run(
["docker", "rm", "-f", container_name],
capture_output=True, check=False,
)
def test_shim_drops_root_to_hermes_uid(sleep_container: str) -> None:
"""docker exec defaults to root; the shim should drop to uid 10000.
There is no pure-hermes "print my uid" command, and running the venv's
python directly (`python -c 'print(os.getuid())'`) would bypass the shim.
The venv's `hermes` is a console_scripts entry (under the hood, a tiny
Python wrapper), so we can't easily inject a uid probe into it without
forking subcommands. Simplest approach: have `hermes` do anything that
writes to disk, then check the file's owner.
Use `hermes config set`, which writes config.yaml under HERMES_HOME.
The resulting file ownership tells us what UID the shim ended up at.
"""
# Wipe any prior state.
subprocess.run(
["docker", "exec", "--user", "root", sleep_container,
"rm", "-f", "/opt/data/config.yaml"],
capture_output=True, check=False,
)
# Default docker exec (root) — should be dropped by the shim.
r = subprocess.run(
["docker", "exec", sleep_container,
"hermes", "config", "set", "_test.shim_marker", "1"],
capture_output=True, text=True, timeout=30,
)
assert r.returncode == 0, f"config set failed: stdout={r.stdout!r} stderr={r.stderr!r}"
# The written file must be owned by hermes, not root.
r = subprocess.run(
["docker", "exec", sleep_container,
"stat", "-c", "%U:%G", "/opt/data/config.yaml"],
capture_output=True, text=True, timeout=10,
)
assert r.returncode == 0, f"stat failed: {r.stderr}"
assert r.stdout.strip() == "hermes:hermes", (
f"config.yaml owned by {r.stdout.strip()!r}, expected hermes:hermes. "
"The shim did not drop privileges before invoking hermes."
)
def test_shim_short_circuits_for_non_root_exec(sleep_container: str) -> None:
"""docker exec --user hermes already runs as 10000; shim should be a no-op.
Verified indirectly: the command must still succeed end-to-end. If the
shim incorrectly tried to drop privileges a second time (e.g. by
invoking s6-setuidgid which requires root), it would fail with
EPERM. A clean success proves the short-circuit fired.
"""
subprocess.run(
["docker", "exec", "--user", "root", sleep_container,
"rm", "-f", "/opt/data/config.yaml"],
capture_output=True, check=False,
)
r = subprocess.run(
["docker", "exec", "--user", "hermes", sleep_container,
"hermes", "config", "set", "_test.shim_short_circuit", "1"],
capture_output=True, text=True, timeout=30,
)
assert r.returncode == 0, (
f"docker exec --user hermes failed: {r.stderr!r} stdout={r.stdout!r}. "
"If the shim mis-handled the non-root path, this would fail with EPERM."
)
# File still ends up hermes:hermes — orthogonally confirms uid.
r = subprocess.run(
["docker", "exec", sleep_container,
"stat", "-c", "%U:%G", "/opt/data/config.yaml"],
capture_output=True, text=True, timeout=10,
)
assert r.stdout.strip() == "hermes:hermes"
def test_shim_opt_out_keeps_root(sleep_container: str) -> None:
"""HERMES_DOCKER_EXEC_AS_ROOT=1 should suppress the privilege drop.
Reserved for diagnostic sessions where the operator deliberately
wants root semantics. Verified by writing a file and checking its
owner.
"""
subprocess.run(
["docker", "exec", "--user", "root", sleep_container,
"rm", "-f", "/opt/data/config.yaml"],
capture_output=True, check=False,
)
r = subprocess.run(
["docker", "exec",
"-e", "HERMES_DOCKER_EXEC_AS_ROOT=1",
sleep_container,
"hermes", "config", "set", "_test.opt_out", "1"],
capture_output=True, text=True, timeout=30,
)
assert r.returncode == 0, f"opt-out invocation failed: {r.stderr}"
r = subprocess.run(
["docker", "exec", sleep_container,
"stat", "-c", "%U:%G", "/opt/data/config.yaml"],
capture_output=True, text=True, timeout=10,
)
assert r.stdout.strip() == "root:root", (
f"With HERMES_DOCKER_EXEC_AS_ROOT=1, expected root:root, "
f"got {r.stdout.strip()!r}"
)
@pytest.mark.parametrize("falsy_value", ["0", "false", "no", "", "garbage", "2"])
def test_shim_opt_out_strict_truthiness(
sleep_container: str, falsy_value: str,
) -> None:
"""Anything other than 1/true/yes (case-insensitive) does NOT opt out.
Strict truthiness so a typo (``HERMES_DOCKER_EXEC_AS_ROOT=0``) doesn't
silently keep the user as root. Mirrors the policy used by
``HERMES_GATEWAY_NO_SUPERVISE`` in #33583.
"""
subprocess.run(
["docker", "exec", "--user", "root", sleep_container,
"rm", "-f", "/opt/data/config.yaml"],
capture_output=True, check=False,
)
r = subprocess.run(
["docker", "exec",
"-e", f"HERMES_DOCKER_EXEC_AS_ROOT={falsy_value}",
sleep_container,
"hermes", "config", "set", "_test.falsy", "1"],
capture_output=True, text=True, timeout=30,
)
assert r.returncode == 0, f"falsy value {falsy_value!r} caused failure: {r.stderr}"
r = subprocess.run(
["docker", "exec", sleep_container,
"stat", "-c", "%U:%G", "/opt/data/config.yaml"],
capture_output=True, text=True, timeout=10,
)
assert r.stdout.strip() == "hermes:hermes", (
f"falsy opt-out value {falsy_value!r} unexpectedly suppressed the drop; "
f"file owner is {r.stdout.strip()!r}, expected hermes:hermes"
)
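The strict-truthiness policy exercised above can be sketched in Python. The shim itself is a shell script, so this is an illustration of the policy rather than its implementation; `opts_out_of_privilege_drop` is a hypothetical name, and whether the real shim trims whitespace is not stated in the source, so this sketch does not.

```python
from typing import Optional


def opts_out_of_privilege_drop(raw: Optional[str]) -> bool:
    """Return True only for 1/true/yes (case-insensitive).

    Hypothetical helper mirroring the policy described in the test
    docstring; unset, empty, or any other value keeps the privilege drop.
    """
    if raw is None:
        return False
    return raw.lower() in {"1", "true", "yes"}


if __name__ == "__main__":
    for value in ["1", "TRUE", "yes", "0", "false", "no", "", "garbage", "2"]:
        print(repr(value), opts_out_of_privilege_drop(value))
```

An allowlist of accepted values, rather than "anything non-empty", is what keeps a typo such as `HERMES_DOCKER_EXEC_AS_ROOT=0` from silently leaving the session as root.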
def test_main_cmd_path_unaffected(built_image: str) -> None:
"""The CMD path (docker run <image> <args>) must still work.
The shim sits at /opt/hermes/bin, earliest on PATH; main-wrapper.sh
invokes `s6-setuidgid hermes hermes <args>` which resolves `hermes`
through PATH. With the shim in the way, this could regress if the
shim recurses or interferes with TTY/exit-code propagation.
`chat --help` is cheap and exercises the full subcommand
passthrough path. Duplicating test_main_invocation's pre-existing
test is intentional: that one would have passed pre-shim too; this
one specifically guards against shim regressions in the
CMD-as-main-program codepath.
"""
r = subprocess.run(
["docker", "run", "--rm", built_image, "chat", "--help"],
capture_output=True, text=True, timeout=60,
)
assert r.returncode == 0, f"CMD path broken by shim: stderr={r.stderr!r}"
assert "Traceback" not in r.stderr
def test_e2e_login_then_supervised_gateway_can_read_auth(
sleep_container: str,
) -> None:
"""End-to-end regression for the original bug.
Pre-shim: ``docker exec <c> hermes login`` (root) wrote
/opt/data/auth.json as root:root 0600. The supervised gateway (UID
10000) couldn't read it, _load_auth_store swallowed PermissionError
as a parse failure, and resolve_nous_runtime_credentials raised
"Hermes is not logged into Nous Portal" on every message.
We can't do a real OAuth login in a unit test, but we can stand in
for it by writing the same file shape via `hermes config set`-style
writes; what matters is the *file ownership invariant* downstream
of `_save_auth_store`. If the shim works, every file the
`docker exec` path produces is hermes-readable.
Specifically: pretend the operator ran `hermes login` (writes
auth.json) and verify (a) the file exists and (b) it's readable by
the hermes UID. We use `hermes auth list` since that touches the
auth store on the read side and would fail with the same
'not logged in' shape if the file was unreadable to uid 10000.
"""
# Have the shim-protected `docker exec` write the auth store.
# `hermes auth list` is read-only but still exercises _load_auth_store
# under the shim's UID. We invoke `hermes config set` first to
# provoke a write into HERMES_HOME so we have something concrete to
# owner-check.
r = subprocess.run(
["docker", "exec", sleep_container,
"hermes", "config", "set", "_test.e2e_marker", "1"],
capture_output=True, text=True, timeout=30,
)
assert r.returncode == 0, f"config set failed: {r.stderr}"
# The supervised UID (10000) must be able to read everything under
# HERMES_HOME that docker exec just wrote.
r = subprocess.run(
["docker", "exec", "--user", "hermes", sleep_container,
"find", "/opt/data", "-maxdepth", "2", "-type", "f",
"!", "-readable", "-print"],
capture_output=True, text=True, timeout=15,
)
assert r.returncode == 0, f"find failed: {r.stderr}"
unreadable = [ln for ln in r.stdout.splitlines() if ln.strip()]
assert not unreadable, (
"Files written by `docker exec` are unreadable to the hermes user "
f"(supervised gateway UID): {unreadable}. The shim failed to drop "
"privileges before the write."
)
+104
@@ -0,0 +1,104 @@
"""Regression test: ``hermes dump`` reports a real git SHA inside the container.
Background: ``.dockerignore`` excludes ``.git``, so ``git rev-parse HEAD``
fails inside the published image and ``hermes dump`` used to report
``version: ... [(unknown)]``. The Dockerfile now writes the build-time
``$HERMES_GIT_SHA`` build-arg to ``/opt/hermes/.hermes_build_sha`` and
``hermes_cli/build_info.py`` reads it as a fallback.
CI (``.github/workflows/docker-publish.yml``) always sets the build-arg
to ``${{ github.sha }}``. Local ``docker build`` (the ``built_image``
fixture in ``tests/docker/conftest.py``) does NOT, so locally the file
is absent and ``hermes dump`` correctly falls back to ``(unknown)``.
This test handles both cases:
* If ``/opt/hermes/.hermes_build_sha`` exists in the image, assert that
``hermes dump`` surfaces its content as the version SHA (not
``(unknown)``).
* If the file is absent, assert the legacy behaviour (``(unknown)``)
still holds, as a defensive guard against the helper accidentally
reporting bogus data from somewhere else.
"""
from __future__ import annotations
import re
import subprocess
_VERSION_LINE = re.compile(r"^version:\s+(?P<rest>.+)$", re.MULTILINE)
_SHA_BRACKET = re.compile(r"\[(?P<sha>[^\]]+)\]\s*$")
def _run_dump(image: str) -> str:
"""Return the stdout of ``docker run <image> dump``.
Relies on Docker's anonymous VOLUME for ``/opt/data`` (declared by the
Dockerfile) so the container's hermes user (UID 10000) can bootstrap
its config. Anonymous volumes are auto-cleaned by ``--rm``, so unlike
a host bind-mount we don't have to chown anything to UID 10000 (which
would break cleanup on non-root hosts).
"""
r = subprocess.run(
["docker", "run", "--rm", image, "dump"],
capture_output=True, text=True, timeout=120,
)
assert r.returncode == 0, (
f"hermes dump exited {r.returncode}: "
f"stderr={r.stderr[-1000:]!r}\nstdout={r.stdout[-1000:]!r}"
)
return r.stdout
def _read_baked_sha_from_image(image: str) -> str | None:
"""Return the ``/opt/hermes/.hermes_build_sha`` content, or None if absent."""
r = subprocess.run(
[
"docker", "run", "--rm", "--entrypoint", "cat", image,
"/opt/hermes/.hermes_build_sha",
],
capture_output=True, text=True, timeout=30,
)
if r.returncode != 0:
return None
return r.stdout.strip() or None
def test_dump_reports_baked_sha_when_present(built_image: str) -> None:
"""When the image was built with ``HERMES_GIT_SHA``, dump must surface it.
Together with the smoke-test action (which exercises ``--help``), this
closes the regression loop for the missing-sha bug: any future change
that breaks the baked-file -> dump pipeline will fail CI here.
"""
baked = _read_baked_sha_from_image(built_image)
stdout = _run_dump(built_image)
match = _VERSION_LINE.search(stdout)
assert match, f"no `version:` line in dump output:\n{stdout[:2000]}"
sha_match = _SHA_BRACKET.search(match.group("rest"))
assert sha_match, (
f"`version:` line missing [<sha>] bracket: {match.group('rest')!r}"
)
reported = sha_match.group("sha")
if baked is None:
# Local-build path: no build-arg was passed. Verify the legacy
# fallback ``(unknown)`` is intact — guards against the helper
# ever inventing a SHA from thin air.
assert reported == "(unknown)", (
f"expected '(unknown)' when no SHA baked, got {reported!r}"
)
return
# CI path: build-arg was set, baked file exists. ``hermes dump``
# truncates to 8 chars via ``git rev-parse --short=8`` semantics.
assert reported != "(unknown)", (
"baked SHA file present in image but dump still reported "
f"'(unknown)' — the build-info fallback is broken. "
f"Baked file content: {baked!r}"
)
assert reported == baked[:8], (
f"dump reported {reported!r} but baked file contained {baked!r} "
f"(expected first 8 chars: {baked[:8]!r})"
)
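The two-step parse used by this test can be tried in isolation. The sketch below reuses the two regular expressions from the file; `extract_reported_sha` is a hypothetical helper, and the sample dump text is invented for illustration (only the ``version: ... [<sha>]`` shape is taken from the module docstring).

```python
import re
from typing import Optional

# Same patterns as the test module.
_VERSION_LINE = re.compile(r"^version:\s+(?P<rest>.+)$", re.MULTILINE)
_SHA_BRACKET = re.compile(r"\[(?P<sha>[^\]]+)\]\s*$")


def extract_reported_sha(dump_stdout: str) -> Optional[str]:
    """Return the bracketed SHA from the `version:` line, or None if absent."""
    match = _VERSION_LINE.search(dump_stdout)
    if match is None:
        return None
    sha_match = _SHA_BRACKET.search(match.group("rest"))
    return sha_match.group("sha") if sha_match else None


if __name__ == "__main__":
    # Invented sample output; the real `hermes dump` has more fields.
    baked = "version: 0.15.0 [0c859a1c]\nother: value\n"
    local = "version: 0.15.0 [(unknown)]\n"
    print(extract_reported_sha(baked))
    print(extract_reported_sha(local))
```

Anchoring the bracket pattern at the end of the line means a bracketed token earlier in the version string would not be mistaken for the SHA.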
@@ -327,3 +327,69 @@ def test_dashboard_supervised_when_env_set(
assert _svstat_wants_up(container_name, "dashboard"), (
f"dashboard slot not up: {_svstat(container_name, 'dashboard')!r}"
)
def test_supervised_gateway_stdout_reaches_docker_logs(
built_image: str, container_name: str,
) -> None:
"""The supervised gateway's stdout — including the rich-console
startup banner must reach ``docker logs``, not just the rotated
log file under ``${HERMES_HOME}/logs/gateways/<profile>/current``.
Without the ``1`` action directive in ``_render_log_run``, s6-log
swallows the gateway's stdout into the file and ``docker logs``
only sees stderr (Python ``logging`` defaults to stderr). That's
a poor user experience: the iconic "Hermes Gateway Starting…"
banner with the ⚕ symbol is the most visible "yes, your gateway
started" signal, and forcing users to ``docker exec`` + ``tail``
the log file just to see it is friction users don't expect.
With the ``1`` directive, s6-log forwards every line to its own
stdout (which propagates up through the s6-supervise pipeline to
/init's stdout = container stdout = ``docker logs``) AND also
writes a timestamped copy to the rotated file. Best of both.
We assert by looking for the literal banner glyph (``⚕``), a
distinctive character that won't appear in stderr-routed
Python-logging output, so its presence in ``docker logs`` proves
the stdout-tee is working.
"""
subprocess.run(
["docker", "run", "-d", "--name", container_name, built_image,
"gateway", "run"],
check=True, capture_output=True, timeout=30,
)
# Banner is printed during gateway startup — give it time to
# initialize past the imports + config-load phase.
time.sleep(8)
logs = subprocess.run(
["docker", "logs", container_name],
capture_output=True, text=True, timeout=10,
)
combined = logs.stdout + logs.stderr
# The banner ⚕ symbol is the load-bearing assertion — it's unique
# to gateway startup stdout output and won't appear in stderr
# (Python logging) or s6 boot messages.
assert "⚕" in combined or "Hermes Gateway Starting" in combined, (
"Supervised gateway's stdout banner did not reach docker logs. "
"This means the `1` action directive in _render_log_run isn't "
"forwarding stdout to /init. "
f"docker logs (last 2000 chars):\n{combined[-2000:]}\n"
f"file contents:\n{_sh(container_name, 'cat /opt/data/logs/gateways/default/current').stdout}"
)
# Cross-check: the same banner must also be in the rotated log
# file (we kept the file destination, just added stdout). The
# file version has s6-log's ISO 8601 timestamp prefix; the
# docker logs version is raw.
file_contents = _sh(
container_name, "cat /opt/data/logs/gateways/default/current",
).stdout
assert "⚕" in file_contents or "Hermes Gateway Starting" in file_contents, (
"Banner also missing from rotated log file — the file "
"destination may have been dropped by the new s6-log script. "
f"File contents:\n{file_contents}"
)
+6 -5
@@ -1,7 +1,7 @@
"""Tests for the API server bind-address startup guard.
Validates that is_network_accessible() correctly classifies addresses and
that connect() refuses to start on non-loopback without API_SERVER_KEY.
that connect() refuses to start without API_SERVER_KEY.
"""
import socket
@@ -111,13 +111,14 @@ class TestConnectBindGuard:
result = await adapter.connect()
assert result is False
def test_allows_loopback_without_key(self):
"""Loopback with no key should pass the guard."""
@pytest.mark.asyncio
async def test_refuses_loopback_without_key(self):
"""Loopback binds are still an auth boundary and require API_SERVER_KEY."""
adapter = APIServerAdapter(PlatformConfig(enabled=True, extra={"host": "127.0.0.1"}))
assert adapter._api_key == ""
# The guard condition: is_network_accessible(host) AND NOT api_key
# For loopback, is_network_accessible is False so the guard does not block.
assert is_network_accessible(adapter._host) is False
result = await adapter.connect()
assert result is False
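A rough sketch of the two checks this test touches, under stated assumptions: `is_network_accessible_sketch` and `guard_allows_start` are hypothetical stand-ins written with the standard `ipaddress` module, not the project's actual `is_network_accessible()` or `connect()` logic. The only behavior taken from the source is that loopback classifies as not network-accessible and that, after this change, a missing API_SERVER_KEY blocks startup regardless of host.

```python
import ipaddress


def is_network_accessible_sketch(host: str) -> bool:
    """Classify a bind address: False for loopback, True otherwise.

    Assumption: non-literal hostnames other than "localhost" are treated
    as reachable. The real helper may resolve or classify them differently.
    """
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return host.lower() != "localhost"
    return not addr.is_loopback


def guard_allows_start(host: str, api_key: str) -> bool:
    """Post-change guard: a key is required no matter where we bind."""
    # `host` is kept in the signature only to show it no longer matters.
    return bool(api_key)


if __name__ == "__main__":
    print(is_network_accessible_sketch("127.0.0.1"))
    print(is_network_accessible_sketch("0.0.0.0"))
    print(guard_allows_start("127.0.0.1", ""))
```

Under the previous guard the loopback case would have started without a key; the renamed test pins the stricter behavior.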
@pytest.mark.asyncio
async def test_allows_wildcard_with_key(self):
@@ -851,6 +851,27 @@ async def test_discord_per_user_channel_backfills_too(adapter, monkeypatch):
assert event.channel_context == "[Recent channel messages]\n[Alice] context"
@pytest.mark.asyncio
async def test_discord_participated_thread_backfills_without_mention(adapter, monkeypatch):
"""Known threads still need recent thread context when mention gating is bypassed."""
monkeypatch.setenv("DISCORD_REQUIRE_MENTION", "true")
monkeypatch.delenv("DISCORD_FREE_RESPONSE_CHANNELS", raising=False)
monkeypatch.delenv("DISCORD_THREAD_REQUIRE_MENTION", raising=False)
adapter.config.extra["history_backfill"] = True
adapter._fetch_channel_context = AsyncMock(return_value="[Recent channel messages]\n[Alice] thread context")
thread = FakeThread(channel_id=456, name="follow-up")
adapter._threads.mark("456")
message = make_message(channel=thread, content="follow-up without mention")
await adapter._handle_message(message)
adapter._fetch_channel_context.assert_awaited_once()
event = adapter.handle_message.await_args.args[0]
assert event.text == "follow-up without mention"
assert event.channel_context == "[Recent channel messages]\n[Alice] thread context"
@pytest.mark.asyncio
async def test_discord_dm_does_not_backfill(adapter, monkeypatch):
"""DMs skip backfill — every DM triggers the bot, so there's no mention gap."""
@@ -884,3 +905,25 @@ async def test_discord_dm_does_not_backfill(adapter, monkeypatch):
assert event.channel_context is None
@pytest.mark.asyncio
async def test_discord_auto_thread_skips_backfill(adapter, monkeypatch):
"""Auto-created threads skip backfill — the thread is brand new with no prior context."""
monkeypatch.setenv("DISCORD_REQUIRE_MENTION", "true")
monkeypatch.setenv("DISCORD_AUTO_THREAD", "true")
monkeypatch.delenv("DISCORD_NO_THREAD_CHANNELS", raising=False)
monkeypatch.delenv("DISCORD_FREE_RESPONSE_CHANNELS", raising=False)
adapter.config.extra["history_backfill"] = True
fake_thread = FakeThread(channel_id=777, name="auto-thread")
adapter._auto_create_thread = AsyncMock(return_value=fake_thread)
adapter._fetch_channel_context = AsyncMock(return_value="[Recent channel messages]\n[Alice] noise")
bot_user = adapter._client.user
parent = FakeTextChannel(channel_id=200, name="general")
message = make_message(channel=parent, content="hello", mentions=[bot_user])
await adapter._handle_message(message)
adapter._auto_create_thread.assert_awaited_once()
adapter._fetch_channel_context.assert_not_awaited()
@@ -624,6 +624,13 @@ class _FakeTextChannel:
self.guild = SimpleNamespace(name=guild_name, id=1)
self.topic = None
def history(self, *args, **kwargs):
async def _empty():
return
yield # pragma: no cover — make this an async generator
return _empty()
class _FakeThreadChannel(_discord_mod.Thread):
"""isinstance(ch, discord.Thread) → True."""
@@ -636,6 +643,13 @@ class _FakeThreadChannel(_discord_mod.Thread):
self.topic = None
self.parent = SimpleNamespace(id=parent_id, name="general", guild=SimpleNamespace(name=guild_name, id=1))
def history(self, *args, **kwargs):
async def _empty():
return
yield # pragma: no cover — make this an async generator
return _empty()
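The `return` followed by an unreachable `yield` is a compact way to get an empty async generator, which lets `async for` over `history()` run zero iterations. A self-contained illustration follows; `FakeChannel` and `collect` are hypothetical names for this sketch.

```python
import asyncio


class FakeChannel:
    def history(self, *args, **kwargs):
        async def _empty():
            return
            yield  # unreachable, but its presence makes _empty an async generator

        return _empty()


async def collect(channel) -> list:
    # Iterating an empty async generator simply yields no items.
    return [message async for message in channel.history(limit=10)]


if __name__ == "__main__":
    print(asyncio.run(collect(FakeChannel())))  # prints []
```

Without the `yield`, `_empty()` would return a coroutine, and `async for` over it would raise a `TypeError`.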
def _fake_message(channel, *, content="Hello", author_id=42, display_name="Jezza"):
return SimpleNamespace(
+50 -3
@@ -11,6 +11,7 @@ from gateway.platforms.msgraph_webhook import AIOHTTP_AVAILABLE, MSGraphWebhookA
def _make_adapter(**extra_overrides) -> MSGraphWebhookAdapter:
extra = {
"host": "127.0.0.1",
"client_state": "expected-client-state",
"accepted_resources": ["communications/onlineMeetings"],
}
@@ -80,6 +81,27 @@ class TestMSGraphValidationHandshake:
# is_connected is a @property on the base adapter, not a method.
assert adapter.is_connected is False
@pytest.mark.anyio
async def test_connect_requires_source_allowlist_on_public_bind(self):
if not AIOHTTP_AVAILABLE:
pytest.skip("aiohttp not installed")
adapter = _make_adapter(host="0.0.0.0", port=0, allowed_source_cidrs=[])
connected = await adapter.connect()
assert connected is False
assert adapter.is_connected is False
@pytest.mark.anyio
async def test_connect_allows_loopback_without_source_allowlist(self):
if not AIOHTTP_AVAILABLE:
pytest.skip("aiohttp not installed")
adapter = _make_adapter(host="127.0.0.1", port=0, allowed_source_cidrs=[])
try:
connected = await adapter.connect()
assert connected is True
assert adapter.is_connected is True
finally:
await adapter.disconnect()
@pytest.mark.anyio
async def test_validation_token_echo_on_get(self):
adapter = _make_adapter()
@@ -381,9 +403,9 @@ class TestMSGraphNotifications:
class TestMSGraphSourceIPAllowlist:
@pytest.mark.anyio
async def test_disabled_by_default_allows_all(self):
"""Empty allowlist preserves pre-existing behavior (dev tunnels, localhost)."""
adapter = _make_adapter() # no allowed_source_cidrs set
async def test_public_bind_without_allowlist_fails_closed(self):
"""Public binds must not accept requests until a source allowlist is configured."""
adapter = _make_adapter(host="0.0.0.0", allowed_source_cidrs=[])
payload = {
"value": [
{
@@ -396,6 +418,24 @@ class TestMSGraphSourceIPAllowlist:
resp = await adapter._handle_notification(
_FakeRequest(json_payload=payload, remote="203.0.113.99")
)
assert resp.status == 403
@pytest.mark.anyio
async def test_loopback_bind_without_allowlist_still_accepts_local_requests(self):
"""Loopback-only listeners may rely on local proxying/tunnels instead of CIDRs."""
adapter = _make_adapter(host="127.0.0.1", allowed_source_cidrs=[])
payload = {
"value": [
{
"id": "notif-ip-local",
"resource": "communications/onlineMeetings/m",
"clientState": "expected-client-state",
}
]
}
resp = await adapter._handle_notification(
_FakeRequest(json_payload=payload, remote="127.0.0.1")
)
assert resp.status == 202
@pytest.mark.anyio
@@ -441,6 +481,13 @@ class TestMSGraphSourceIPAllowlist:
)
assert resp.status == 403
@pytest.mark.anyio
async def test_health_endpoint_also_respects_allowlist(self):
"""The readiness endpoint should not leak counters to arbitrary sources."""
adapter = _make_adapter(allowed_source_cidrs=["10.0.0.0/8"])
resp = await adapter._handle_health(_FakeRequest(remote="203.0.113.99"))
assert resp.status == 403
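The allowlist behavior these tests pin down can be approximated with the standard `ipaddress` module. This is a sketch under assumptions: `parse_cidrs` and `source_allowed` are hypothetical helpers, the bind host is assumed to be an IP literal, and whether the real adapter restricts non-local sources on a loopback bind with an empty allowlist is not shown in the source.

```python
import ipaddress


def parse_cidrs(entries):
    """Parse CIDR strings, skipping malformed ones instead of raising."""
    networks = []
    for entry in entries:
        try:
            networks.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            # The tests say the adapter logs a warning here; the sketch just skips.
            continue
    return networks


def source_allowed(remote: str, networks, bind_host: str) -> bool:
    """Decide whether a request from `remote` should be served."""
    if not networks:
        # Empty allowlist: fail closed on public binds, accept on loopback binds.
        return ipaddress.ip_address(bind_host).is_loopback
    addr = ipaddress.ip_address(remote)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    nets = parse_cidrs(["10.0.0.0/8", "not-a-cidr"])
    print(source_allowed("203.0.113.99", [], "0.0.0.0"))
    print(source_allowed("127.0.0.1", [], "127.0.0.1"))
    print(source_allowed("203.0.113.99", nets, "127.0.0.1"))
```

Applying the same check to the health endpoint, as the last test above does, keeps readiness counters from leaking to sources outside the allowlist.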
@pytest.mark.anyio
async def test_invalid_cidr_entries_are_ignored_at_init(self):
"""Malformed CIDR strings should log a warning and be ignored, not crash."""
+267
@@ -0,0 +1,267 @@
"""Tests for the planned-stop marker watcher thread (gateway/run.py).
The watcher is the Windows-fallback path for the v0.13.0 session-resume
feature: on Windows, ``asyncio.add_signal_handler`` raises
NotImplementedError, so the SIGTERM signal handler never runs and the
shutdown drain (which writes ``resume_pending=True``) is skipped. The
watcher closes this gap by polling for the planned-stop marker file
and translating its existence into the same shutdown-handler call a
real SIGTERM would have produced.
See issue #33778 for the original Windows session-loss bug report.
"""
import asyncio
import threading
import time
from types import SimpleNamespace
from unittest.mock import MagicMock
import pytest
from gateway.run import _run_planned_stop_watcher
class _FakeRunner:
"""Stand-in for GatewayRunner — only exposes the two flags the watcher reads."""
def __init__(self, *, running: bool = True, draining: bool = False):
self._running = running
self._draining = draining
def _make_loop_capturing_calls():
"""Build a fake asyncio loop whose call_soon_threadsafe records its args."""
loop = MagicMock(spec=asyncio.AbstractEventLoop)
loop._captured = []
def fake_call_soon_threadsafe(fn, *args):
loop._captured.append((fn, args))
loop.call_soon_threadsafe = fake_call_soon_threadsafe
return loop
def test_watcher_fires_shutdown_when_marker_appears(tmp_path, monkeypatch):
"""When the marker file exists, the watcher must call the shutdown handler."""
marker = tmp_path / ".gateway-planned-stop.json"
# Patch the marker-path resolver so the watcher polls our temp location.
from gateway import status as status_mod
monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)
runner = _FakeRunner(running=True, draining=False)
loop = _make_loop_capturing_calls()
shutdown_handler = MagicMock(name="shutdown_signal_handler")
stop_event = threading.Event()
# Drop the marker before the thread starts.
marker.write_text('{"target_pid": 1234}', encoding="utf-8")
watcher = threading.Thread(
target=_run_planned_stop_watcher,
args=(stop_event, runner, loop, shutdown_handler),
kwargs={"poll_interval": 0.05},
daemon=True,
)
watcher.start()
watcher.join(timeout=2.0)
assert not watcher.is_alive(), "Watcher should exit after firing"
assert len(loop._captured) == 1, (
f"Expected exactly one shutdown invocation, got {loop._captured}"
)
fn, args = loop._captured[0]
assert fn is shutdown_handler
# The handler must be called with signal=None (planned stop sentinel).
assert args == (None,)
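For orientation, here is one way a watcher with the behavior these tests describe could be written. It is a sketch, not the code in gateway/run.py: `watch_for_planned_stop` is a hypothetical function that takes the marker path as an explicit argument (the real watcher resolves it through `_get_planned_stop_marker_path`), and `RecordingLoop` is a stand-in for the asyncio loop.

```python
import tempfile
import threading
from pathlib import Path
from types import SimpleNamespace


def watch_for_planned_stop(stop_event, runner, loop, shutdown_handler,
                           marker_path, poll_interval=0.5):
    """Hypothetical stand-in for the planned-stop watcher (not the real code)."""
    # Event.wait returns True once stop_event is set, which ends the loop.
    while not stop_event.wait(poll_interval):
        # Only a running gateway that is not already draining should react.
        if not runner._running or runner._draining:
            continue
        try:
            marker_present = marker_path.exists()
        except OSError:
            continue  # transient filesystem error: retry on the next tick
        if marker_present:
            # Hand off to the event-loop thread; None means "no real signal".
            loop.call_soon_threadsafe(shutdown_handler, None)
            return  # fire once, then exit


class RecordingLoop:
    """Records call_soon_threadsafe invocations instead of scheduling them."""

    def __init__(self):
        self.calls = []

    def call_soon_threadsafe(self, fn, *args):
        self.calls.append((fn, args))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "planned-stop.json"
        marker.write_text("{}", encoding="utf-8")
        loop = RecordingLoop()
        runner = SimpleNamespace(_running=True, _draining=False)
        handler = lambda signum: None
        watch_for_planned_stop(threading.Event(), runner, loop, handler,
                               marker, poll_interval=0.01)
        return loop.calls, handler


if __name__ == "__main__":
    calls, _ = demo()
    print(len(calls))
```

Using `stop_event.wait(poll_interval)` as the sleep means a stop request interrupts the wait immediately, which is the property the promptness test further down checks.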
def test_watcher_does_not_fire_when_marker_absent(tmp_path, monkeypatch):
"""No marker = no shutdown call. Watcher just spins until stop_event."""
marker = tmp_path / ".gateway-planned-stop.json"
# Deliberately do NOT create the marker.
from gateway import status as status_mod
monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)
runner = _FakeRunner(running=True, draining=False)
loop = _make_loop_capturing_calls()
shutdown_handler = MagicMock()
stop_event = threading.Event()
watcher = threading.Thread(
target=_run_planned_stop_watcher,
args=(stop_event, runner, loop, shutdown_handler),
kwargs={"poll_interval": 0.05},
daemon=True,
)
watcher.start()
time.sleep(0.3) # let it poll a few times
stop_event.set()
watcher.join(timeout=2.0)
assert not watcher.is_alive()
assert loop._captured == [], (
f"No marker present, but watcher fired shutdown: {loop._captured}"
)
shutdown_handler.assert_not_called()
def test_watcher_skips_when_runner_already_draining(tmp_path, monkeypatch):
"""If shutdown is already in progress, don't re-fire the handler.
This prevents a race where the SIGTERM handler is mid-drain and the
watcher would double-tap the shutdown path. We check ``_draining``
so the watcher backs off once any shutdown is in flight.
"""
marker = tmp_path / ".gateway-planned-stop.json"
marker.write_text('{"target_pid": 1234}', encoding="utf-8")
from gateway import status as status_mod
monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)
# Already draining — watcher should be a no-op.
runner = _FakeRunner(running=False, draining=True)
loop = _make_loop_capturing_calls()
shutdown_handler = MagicMock()
stop_event = threading.Event()
watcher = threading.Thread(
target=_run_planned_stop_watcher,
args=(stop_event, runner, loop, shutdown_handler),
kwargs={"poll_interval": 0.05},
daemon=True,
)
watcher.start()
time.sleep(0.2)
stop_event.set()
watcher.join(timeout=2.0)
assert loop._captured == [], "Watcher fired while runner was already draining"
def test_watcher_skips_when_runner_not_started(tmp_path, monkeypatch):
"""If the runner hasn't started, the marker is for a previous instance —
we shouldn't shut down a not-yet-running gateway.
"""
marker = tmp_path / ".gateway-planned-stop.json"
marker.write_text('{"target_pid": 9999}', encoding="utf-8")
from gateway import status as status_mod
monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)
runner = _FakeRunner(running=False, draining=False)
loop = _make_loop_capturing_calls()
shutdown_handler = MagicMock()
stop_event = threading.Event()
watcher = threading.Thread(
target=_run_planned_stop_watcher,
args=(stop_event, runner, loop, shutdown_handler),
kwargs={"poll_interval": 0.05},
daemon=True,
)
watcher.start()
time.sleep(0.2)
stop_event.set()
watcher.join(timeout=2.0)
assert loop._captured == [], "Watcher fired before runner was running"
def test_watcher_responds_to_stop_event_promptly(tmp_path, monkeypatch):
"""Setting stop_event must exit the watcher within ~poll_interval seconds."""
marker = tmp_path / ".gateway-planned-stop.json"
from gateway import status as status_mod
monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)
runner = _FakeRunner(running=True, draining=False)
loop = _make_loop_capturing_calls()
stop_event = threading.Event()
watcher = threading.Thread(
target=_run_planned_stop_watcher,
args=(stop_event, runner, loop, MagicMock()),
kwargs={"poll_interval": 0.1},
daemon=True,
)
watcher.start()
time.sleep(0.05)
started_stop = time.monotonic()
stop_event.set()
watcher.join(timeout=2.0)
elapsed = time.monotonic() - started_stop
assert not watcher.is_alive()
assert elapsed < 0.5, f"Watcher took {elapsed:.2f}s to honour stop_event"
def test_watcher_fires_only_once_when_marker_persists(tmp_path, monkeypatch):
"""Marker file existing for multiple polls must NOT spam the handler.
The watcher fires once and exits its loop (the shutdown handler is
responsible for consuming the marker on its own thread). If we
re-fired on every tick, the handler would be invoked dozens of
times before the gateway actually shuts down.
"""
marker = tmp_path / ".gateway-planned-stop.json"
marker.write_text('{"target_pid": 1234}', encoding="utf-8")
from gateway import status as status_mod
monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)
runner = _FakeRunner(running=True, draining=False)
loop = _make_loop_capturing_calls()
stop_event = threading.Event()
watcher = threading.Thread(
target=_run_planned_stop_watcher,
args=(stop_event, runner, loop, MagicMock()),
kwargs={"poll_interval": 0.05},
daemon=True,
)
watcher.start()
# Let the watcher tick several times — but it should exit after the first fire.
watcher.join(timeout=1.0)
assert not watcher.is_alive()
assert len(loop._captured) == 1, (
f"Watcher fired {len(loop._captured)} times; should fire once "
f"and exit (events={loop._captured})"
)
def test_watcher_tolerates_marker_path_resolution_errors(tmp_path, monkeypatch, caplog):
"""If _get_planned_stop_marker_path() raises, the watcher logs and continues."""
from gateway import status as status_mod
call_count = [0]
def explode():
call_count[0] += 1
# First call (the one outside the loop, at thread start) is fine —
# but subsequent .exists() calls on a corrupt Path could explode.
if call_count[0] == 1:
return tmp_path / "nonexistent"
raise OSError("filesystem failed")
monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", explode)
runner = _FakeRunner(running=True, draining=False)
loop = _make_loop_capturing_calls()
stop_event = threading.Event()
watcher = threading.Thread(
target=_run_planned_stop_watcher,
args=(stop_event, runner, loop, MagicMock()),
kwargs={"poll_interval": 0.05},
daemon=True,
)
watcher.start()
time.sleep(0.2)
stop_event.set()
watcher.join(timeout=2.0)
assert not watcher.is_alive(), "Watcher should still honour stop_event after errors"
# No shutdown fired because the marker never reported existence.
assert loop._captured == []
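The watcher contract pinned by the tests above (poll for a marker, skip while the runner is not ready, fire once, survive path errors, exit promptly on `stop_event`) can be sketched as follows. This is a minimal illustration under stated assumptions, not the real `_run_planned_stop_watcher`: `watch_for_marker` and its callback parameters are hypothetical names.

```python
import threading
from pathlib import Path
from typing import Callable


def watch_for_marker(
    stop_event: threading.Event,
    is_ready: Callable[[], bool],
    get_marker_path: Callable[[], Path],
    on_marker: Callable[[], None],
    poll_interval: float = 0.05,
) -> None:
    """Hypothetical watcher loop: call on_marker at most once, then return."""
    # Event.wait() doubles as the sleep, so set() wakes the loop immediately.
    while not stop_event.wait(poll_interval):
        try:
            marker_present = get_marker_path().exists()
        except OSError:
            # Path resolution or stat failed; retry on the next tick.
            continue
        if marker_present and is_ready():
            on_marker()
            return  # one shot: never re-fire while the marker persists
```

Using `Event.wait(poll_interval)` instead of `time.sleep` is what keeps the stop latency bounded by roughly one poll interval.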
@@ -939,6 +939,133 @@ class TestFinalResponseDeliveryGuard:
assert consumer._final_response_sent is True
class TestFinalContentDeliveredGuard:
"""Regression coverage for #25010 — _final_content_delivered must only be
set when the final response is actually confirmed delivered to the user,
not when a mid-stream edit happened to show partial content. Prematurely
setting this flag causes the gateway to suppress the normal final send,
leaving the user with an incomplete partial message."""
@pytest.mark.asyncio
async def test_mid_stream_edit_success_does_not_mark_content_delivered(self):
"""When the mid-stream edit with finalize=True succeeds but the
subsequent finalize edit fails, _final_content_delivered must stay
False so the gateway does not suppress its fallback send (#25010).
Simulates TelegramAdapter which sets REQUIRES_EDIT_FINALIZE=True,
requiring a second finalize edit even when content is unchanged."""
adapter = MagicMock()
adapter.REQUIRES_EDIT_FINALIZE = True # Telegram adapter behavior
# First send (initial streaming message) succeeds
# Mid-stream finalize edit succeeds
# Final finalize edit FAILS (e.g. flood control on Telegram)
adapter.edit_message = AsyncMock(side_effect=[
SimpleNamespace(success=True), # mid-stream edit
SimpleNamespace(success=True), # finalize edit on line 548
SimpleNamespace(success=False), # final finalize on line 580 (FAILS)
])
adapter.send = AsyncMock(
return_value=SimpleNamespace(success=True, message_id="msg_1"),
)
adapter.MAX_MESSAGE_LENGTH = 4096
config = StreamConsumerConfig(edit_interval=0.01, buffer_threshold=5)
consumer = GatewayStreamConsumer(adapter, "chat_123", config)
# Simulate streaming: send initial text, then more text, then done
consumer.on_delta("Part one of the response...\n")
task = asyncio.create_task(consumer.run())
await asyncio.sleep(0.05)
consumer.on_delta("Part two, the complete final answer.\n")
await asyncio.sleep(0.05)
consumer.finish()
await task
# The key assertion: _final_content_delivered must NOT be True,
# because the final edit failed and the complete response was never
# confirmed delivered.
assert consumer._final_content_delivered is False, (
"_final_content_delivered was prematurely set to True — gateway "
"will wrongly suppress its fallback send, leaving the user with "
"an incomplete partial message (#25010)"
)
# The gateway must still be allowed to send the complete response
assert consumer._final_response_sent is False, (
"_final_response_sent must also be False when the final edit failed"
)
@pytest.mark.asyncio
async def test_final_edit_success_does_mark_content_delivered(self):
"""When the final finalize edit succeeds, _final_content_delivered
must be True; the normal happy path should still work."""
adapter = MagicMock()
adapter.edit_message = AsyncMock(return_value=SimpleNamespace(success=True))
adapter.send = AsyncMock(
return_value=SimpleNamespace(success=True, message_id="msg_1"),
)
adapter.MAX_MESSAGE_LENGTH = 4096
config = StreamConsumerConfig(edit_interval=0.01, buffer_threshold=5)
consumer = GatewayStreamConsumer(adapter, "chat_123", config)
consumer.on_delta("The complete response.\n")
task = asyncio.create_task(consumer.run())
await asyncio.sleep(0.05)
consumer.finish()
await task
assert consumer._final_content_delivered is True, (
"_final_content_delivered must be True when the final edit succeeds"
)
assert consumer._final_response_sent is True
@pytest.mark.asyncio
async def test_fallback_partial_send_does_not_mark_final_sent(self):
"""When fallback final send delivers only some chunks before failing,
_final_response_sent must stay False so the gateway can still attempt
a complete final send (#25010)."""
call_count = 0
async def fake_send(*, chat_id, content, **kwargs):
nonlocal call_count
call_count += 1
if call_count <= 2:
return SimpleNamespace(success=True, message_id="msg_1")
# Third chunk (fallback continuation) FAILS
return SimpleNamespace(success=False, error="flood_control:13.0")
adapter = MagicMock()
adapter.send = AsyncMock(side_effect=fake_send)
adapter.edit_message = AsyncMock(
return_value=SimpleNamespace(success=False, error="flood_control:13.0"),
)
adapter.MAX_MESSAGE_LENGTH = 4096
config = StreamConsumerConfig(edit_interval=0.01, buffer_threshold=5)
consumer = GatewayStreamConsumer(adapter, "chat_123", config)
# Trigger enough delta to enter fallback mode
consumer.on_delta("Initial streaming text...\n")
task = asyncio.create_task(consumer.run())
await asyncio.sleep(0.05)
# Send a very long text that will trigger overflow/fallback
long_text = ("x" * 3000 + "\n") + ("y" * 3000 + "\n") + "Final answer.\n"
consumer.on_delta(long_text)
await asyncio.sleep(0.1)
consumer.finish()
await task
assert consumer._final_response_sent is False, (
"Partial fallback send must not set _final_response_sent — gateway "
"must still be able to deliver the complete response (#25010)"
)
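The rule these three tests guard (#25010) is that delivery flags flip only on a confirmed final edit, never on a mid-stream one. A minimal sketch of that rule, assuming a simplified flag holder; `DeliveryFlags` and `record_edit` are hypothetical and do not mirror `GatewayStreamConsumer` internals:

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class DeliveryFlags:
    """Hypothetical stand-in for the consumer's two delivery flags."""

    final_content_delivered: bool = False
    final_response_sent: bool = False


def record_edit(flags: DeliveryFlags, result: object, *, is_final: bool) -> None:
    """Mark delivery only for a final edit whose result reports success."""
    if is_final and getattr(result, "success", False):
        flags.final_content_delivered = True
        flags.final_response_sent = True


flags = DeliveryFlags()
record_edit(flags, SimpleNamespace(success=True), is_final=False)  # mid-stream edit
record_edit(flags, SimpleNamespace(success=False), is_final=True)  # final edit fails
```

After both calls the flags are still `False`, so a caller would remain free to attempt its fallback send.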
class TestEditOverflowSplitAndDeliver:
"""When edit_message split-and-delivers an oversized payload across the
original message + N continuations (Telegram >4096 UTF-16), the consumer
@@ -303,6 +303,88 @@ def test_save_codex_tokens_syncs_credential_pool(tmp_path, monkeypatch):
assert auth["providers"]["openai-codex"]["tokens"]["access_token"] == "new-at"
def test_save_codex_tokens_syncs_manual_device_code_entries(tmp_path, monkeypatch):
"""Re-auth must also refresh ``manual:device_code`` pool entries.
Regression for #33538: a user who hit #33000 before the #33164 fix landed
would have run ``hermes auth add openai-codex`` as a workaround, leaving
a pool entry with ``source="manual:device_code"``. On every subsequent
re-auth via setup/model picker, the singleton-seeded ``device_code`` entry
got refreshed but the ``manual:device_code`` entry stayed stale, recreating
the same 401 token_invalidated symptom that #33164 was supposed to fix.
An interactive Codex device-code re-auth proves the user owns the ChatGPT
account, so it is safe to refresh every device-code-backed entry in the
pool, but NOT independent ``manual:api_key`` entries (separate accounts /
explicit API keys).
"""
hermes_home = tmp_path / "hermes"
hermes_home.mkdir(parents=True, exist_ok=True)
(hermes_home / "auth.json").write_text(json.dumps({
"version": 1,
"providers": {
"openai-codex": {
"tokens": {"access_token": "old-at", "refresh_token": "old-rt"},
"last_refresh": "2026-01-01T00:00:00Z",
"auth_mode": "chatgpt",
},
},
"credential_pool": {
"openai-codex": [
{
"id": "seeded",
"source": "device_code",
"auth_type": "oauth",
"access_token": "old-at",
"refresh_token": "old-rt",
},
{
"id": "auth-add",
"source": "manual:device_code",
"auth_type": "oauth",
"access_token": "stale-manual-at",
"refresh_token": "stale-manual-rt",
"last_status": "exhausted",
"last_error_code": 401,
"last_error_reason": "token_invalidated",
},
{
"id": "api-key",
"source": "manual:api_key",
"auth_type": "api_key",
"access_token": "user-api-key",
},
],
},
}))
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
_save_codex_tokens({"access_token": "fresh-at", "refresh_token": "fresh-rt"},
last_refresh="2026-05-28T00:00:00Z")
auth = json.loads((hermes_home / "auth.json").read_text())
pool = auth["credential_pool"]["openai-codex"]
# Singleton-seeded device_code entry: tokens refreshed.
seeded = next(e for e in pool if e["source"] == "device_code")
assert seeded["access_token"] == "fresh-at"
assert seeded["refresh_token"] == "fresh-rt"
# manual:device_code entry: ALSO refreshed (the new behavior).
manual_dc = next(e for e in pool if e["source"] == "manual:device_code")
assert manual_dc["access_token"] == "fresh-at"
assert manual_dc["refresh_token"] == "fresh-rt"
assert manual_dc["last_refresh"] == "2026-05-28T00:00:00Z"
assert manual_dc["last_status"] is None
assert manual_dc["last_error_code"] is None
assert manual_dc["last_error_reason"] is None
# manual:api_key entry: untouched — independent credential.
api_key = next(e for e in pool if e["source"] == "manual:api_key")
assert api_key["access_token"] == "user-api-key"
assert "refresh_token" not in api_key or api_key.get("refresh_token") is None
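The selection rule the test above describes (refresh every device-code-backed pool entry, leave API-key entries alone) can be sketched as below. This is an illustration only: `refresh_device_code_entries` and `DEVICE_CODE_SOURCES` are hypothetical names, and the real `_save_codex_tokens` may be structured differently.

```python
from typing import Any, Dict, List

# Assumption: these two sources are the ones backed by the device-code login.
DEVICE_CODE_SOURCES = {"device_code", "manual:device_code"}


def refresh_device_code_entries(
    pool: List[Dict[str, Any]],
    tokens: Dict[str, str],
    last_refresh: str,
) -> List[Dict[str, Any]]:
    """Update device-code entries in place; skip every other source."""
    for entry in pool:
        if entry.get("source") not in DEVICE_CODE_SOURCES:
            continue  # e.g. manual:api_key is an independent credential
        entry["access_token"] = tokens["access_token"]
        entry["refresh_token"] = tokens["refresh_token"]
        entry["last_refresh"] = last_refresh
        # Clear stale error markers so the entry is eligible again.
        for key in ("last_status", "last_error_code", "last_error_reason"):
            entry[key] = None
    return pool
```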
def test_import_codex_cli_tokens(tmp_path, monkeypatch):
codex_home = tmp_path / "codex-cli"
codex_home.mkdir(parents=True, exist_ok=True)
@@ -330,6 +330,107 @@ def test_xai_loopback_login_manual_paste_state_mismatch_raises(monkeypatch):
assert exc.value.code == "xai_state_mismatch"
def test_xai_loopback_login_manual_paste_bare_code_succeeds(monkeypatch):
"""Bare-code paste (state=None) must complete login under manual_paste.
xAI's consent page renders the authorization code in-page rather than
redirecting through 127.0.0.1, so on remote/headless setups the only
value the user can obtain is the opaque code with no ``state=``
parameter. ``_parse_pasted_callback`` correctly returns
``state=None`` for that input. The login flow must accept this case
(PKCE still protects the exchange); historically it raised
``xai_state_mismatch``. Regression for the bare-code branch of #26923.
"""
monkeypatch.setattr(
auth_mod, "_xai_oauth_discovery",
lambda *_a, **_k: {
"authorization_endpoint": "https://auth.x.ai/oauth2/authorize",
"token_endpoint": "https://auth.x.ai/oauth2/token",
},
)
monkeypatch.setattr(
auth_mod, "_prompt_manual_callback_paste",
lambda _ru: {
"code": "bare-opaque-code",
"state": None,
"error": None,
"error_description": None,
},
)
def _fake_token_post(*_a, **_k):
return _StubTokenResponse(
{
"access_token": "at",
"refresh_token": "rt",
"id_token": "",
"expires_in": 3600,
"token_type": "Bearer",
}
)
monkeypatch.setattr(auth_mod.httpx, "post", _fake_token_post)
with contextlib.redirect_stdout(io.StringIO()):
creds = auth_mod._xai_oauth_loopback_login(manual_paste=True)
assert creds["tokens"]["access_token"] == "at"
assert creds["tokens"]["refresh_token"] == "rt"
def test_xai_loopback_login_loopback_path_rejects_missing_state(monkeypatch):
"""Loopback (manual_paste=False) must NOT accept ``state=None``.
The bare-code relaxation only applies to the manual-paste path,
where the user demonstrably has no way to supply ``state``. The
HTTP-server path always sees ``state`` populated from the real
callback query string, so missing state there means something is
wrong (a malformed callback, an attacker-supplied request) and
must still raise ``xai_state_mismatch``.
"""
monkeypatch.setattr(
auth_mod, "_xai_oauth_discovery",
lambda *_a, **_k: {
"authorization_endpoint": "https://auth.x.ai/oauth2/authorize",
"token_endpoint": "https://auth.x.ai/oauth2/token",
},
)
class _StubServer:
def shutdown(self):
return None
def server_close(self):
return None
monkeypatch.setattr(
auth_mod, "_xai_start_callback_server",
lambda *_a, **_k: (
_StubServer(),
None,
{"code": "fake", "state": None, "error": None,
"error_description": None},
"http://127.0.0.1:56121/callback",
),
)
monkeypatch.setattr(
auth_mod, "_xai_wait_for_callback",
lambda *_a, **_k: {
"code": "fake",
"state": None,
"error": None,
"error_description": None,
},
)
monkeypatch.setattr(auth_mod, "_xai_validate_loopback_redirect_uri", lambda _u: None)
monkeypatch.setattr(auth_mod, "_print_loopback_ssh_hint", lambda *_a, **_k: None)
with contextlib.redirect_stdout(io.StringIO()):
with pytest.raises(auth_mod.AuthError) as exc:
auth_mod._xai_oauth_loopback_login(manual_paste=False, open_browser=False)
assert exc.value.code == "xai_state_mismatch"
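The two tests above describe an asymmetric `state` check: a missing `state` is tolerated only on the manual-paste path, while a present `state` must always match. A minimal sketch of that decision, assuming a hypothetical helper `state_is_acceptable` (the real check lives inside `_xai_oauth_loopback_login` and may differ):

```python
import hmac
from typing import Optional


def state_is_acceptable(
    expected: str,
    received: Optional[str],
    *,
    manual_paste: bool,
) -> bool:
    """Return True when the callback's state may proceed to token exchange."""
    if received is None:
        # A bare-code paste carries no state; only the manual path allows that.
        return manual_paste
    # Constant-time comparison of the state values.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

Under this sketch a mismatched `state` is rejected on both paths, which matches the existing `manual_paste_state_mismatch_raises` test name.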
def test_xai_loopback_login_manual_paste_missing_code_raises(monkeypatch):
"""Empty paste must surface as ``xai_code_missing``, not crash."""
monkeypatch.setattr(
@@ -363,6 +464,163 @@ def test_xai_loopback_login_manual_paste_missing_code_raises(monkeypatch):
assert exc.value.code == "xai_code_missing"
def test_xai_loopback_login_timeout_falls_back_to_manual_paste(monkeypatch):
"""Loopback timeout should offer the existing manual-paste path."""
monkeypatch.setattr(
auth_mod, "_xai_oauth_discovery",
lambda *_a, **_k: {
"authorization_endpoint": "https://auth.x.ai/oauth2/authorize",
"token_endpoint": "https://auth.x.ai/oauth2/token",
},
)
class _StubServer:
def shutdown(self):
return None
def server_close(self):
return None
class _StubThread:
def join(self, timeout=None):
return None
monkeypatch.setattr(
auth_mod,
"_xai_start_callback_server",
lambda: (
_StubServer(),
_StubThread(),
{
"code": None,
"state": None,
"error": None,
"error_description": None,
},
"http://127.0.0.1:56121/callback",
),
)
captured: dict = {"state": None, "prompt_calls": 0}
original_build = auth_mod._xai_oauth_build_authorize_url
def _capture(**kwargs):
captured["state"] = kwargs["state"]
return original_build(**kwargs)
monkeypatch.setattr(auth_mod, "_xai_oauth_build_authorize_url", _capture)
def _raise_timeout(*_a, **_k):
raise auth_mod.AuthError(
"xAI authorization timed out waiting for the local callback.",
provider="xai-oauth",
code="xai_callback_timeout",
)
monkeypatch.setattr(auth_mod, "_xai_wait_for_callback", _raise_timeout)
def _fake_prompt(_redirect_uri):
captured["prompt_calls"] += 1
return {
"code": "manual-auth-code",
"state": captured["state"],
"error": None,
"error_description": None,
}
monkeypatch.setattr(auth_mod, "_prompt_manual_callback_paste", _fake_prompt)
monkeypatch.setattr(
auth_mod.sys, "stdin", type("StubStdin", (), {"isatty": lambda self: True})()
)
monkeypatch.setattr(
auth_mod.httpx,
"post",
lambda *_a, **_k: _StubTokenResponse(
{
"access_token": "at-timeout",
"refresh_token": "rt-timeout",
"id_token": "",
"expires_in": 3600,
"token_type": "Bearer",
}
),
)
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
creds = auth_mod._xai_oauth_loopback_login(manual_paste=False)
rendered = buf.getvalue()
assert "xAI loopback callback timed out." in rendered
assert "--manual-paste" in rendered
assert captured["prompt_calls"] == 1
assert creds["tokens"]["access_token"] == "at-timeout"
assert creds["tokens"]["refresh_token"] == "rt-timeout"
def test_xai_loopback_login_timeout_noninteractive_reraises(monkeypatch):
"""Non-interactive stdin must keep the original timeout error."""
monkeypatch.setattr(
auth_mod, "_xai_oauth_discovery",
lambda *_a, **_k: {
"authorization_endpoint": "https://auth.x.ai/oauth2/authorize",
"token_endpoint": "https://auth.x.ai/oauth2/token",
},
)
class _StubServer:
def shutdown(self):
return None
def server_close(self):
return None
class _StubThread:
def join(self, timeout=None):
return None
monkeypatch.setattr(
auth_mod,
"_xai_start_callback_server",
lambda: (
_StubServer(),
_StubThread(),
{
"code": None,
"state": None,
"error": None,
"error_description": None,
},
"http://127.0.0.1:56121/callback",
),
)
monkeypatch.setattr(
auth_mod,
"_xai_wait_for_callback",
lambda *_a, **_k: (_ for _ in ()).throw(
auth_mod.AuthError(
"xAI authorization timed out waiting for the local callback.",
provider="xai-oauth",
code="xai_callback_timeout",
)
),
)
monkeypatch.setattr(
auth_mod.sys, "stdin", type("StubStdin", (), {"isatty": lambda self: False})()
)
monkeypatch.setattr(
auth_mod,
"_prompt_manual_callback_paste",
lambda *_a, **_k: pytest.fail("manual-paste fallback should not run"),
)
with contextlib.redirect_stdout(io.StringIO()):
with pytest.raises(auth_mod.AuthError) as exc:
auth_mod._xai_oauth_loopback_login(manual_paste=False)
assert exc.value.code == "xai_callback_timeout"
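The timeout behaviour covered by the two tests above (fall back to manual paste only when stdin is interactive, otherwise re-raise) can be sketched as follows. The helper name and its callable parameters are hypothetical, and the built-in `TimeoutError` stands in for the `AuthError` with code `xai_callback_timeout` used in the real flow.

```python
from typing import Any, Callable, Dict


def wait_or_fall_back(
    wait_for_callback: Callable[[], Dict[str, Any]],
    prompt_for_paste: Callable[[], Dict[str, Any]],
    is_interactive: Callable[[], bool],
) -> Dict[str, Any]:
    """Try the loopback callback first; offer manual paste only on a TTY."""
    try:
        return wait_for_callback()
    except TimeoutError:
        if not is_interactive():
            raise  # keep the original timeout error for scripted callers
        return prompt_for_paste()
```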
# ---------------------------------------------------------------------------
# _print_loopback_ssh_hint — now also mentions --manual-paste
# ---------------------------------------------------------------------------
@@ -667,6 +667,42 @@ def test_get_nous_auth_status_checks_credential_pool(tmp_path, monkeypatch):
assert "example.com" in str(status.get("portal_base_url", ""))
def test_get_nous_auth_status_pool_opaque_key_is_not_portal_login(tmp_path, monkeypatch):
from hermes_cli.auth import get_nous_auth_status, invalidate_nous_auth_status_cache
hermes_home = tmp_path / "hermes"
hermes_home.mkdir(parents=True, exist_ok=True)
(hermes_home / "auth.json").write_text(json.dumps({
"version": 1, "providers": {},
}))
monkeypatch.setenv("HERMES_HOME", str(hermes_home))
invalidate_nous_auth_status_cache()
from agent.credential_pool import PooledCredential, load_pool
pool = load_pool("nous")
entry = PooledCredential.from_dict("nous", {
"access_token": "",
"agent_key": "opaque-agent-key",
"agent_key_expires_at": "2099-01-01T00:00:00+00:00",
"label": "manual opaque key",
"auth_type": "api_key",
"source": "manual",
"base_url": "https://inference.example.com/v1",
"inference_base_url": "https://inference.example.com/v1",
})
pool.add_entry(entry)
status = get_nous_auth_status()
assert status["logged_in"] is False
assert status["inference_credential_present"] is True
assert status["credential_source"] == "pool:manual opaque key"
assert status.get("access_token") is None
assert status.get("portal_base_url") is None
assert status.get("inference_base_url") == "https://inference.example.com/v1"
invalidate_nous_auth_status_cache()
def test_get_nous_auth_status_auth_store_fallback(tmp_path, monkeypatch):
"""get_nous_auth_status() falls back to auth store when credential
pool is empty.
@@ -1023,12 +1059,19 @@ class TestLoginNousSkipKeepsCurrent:
lambda *a, **kw: prompt_returns,
)
monkeypatch.setattr(models_mod, "get_pricing_for_provider", lambda p: {})
-monkeypatch.setattr(models_mod, "check_nous_free_tier", lambda: None)
+free_tier_calls = []
+def _check_nous_free_tier(**kwargs):
+free_tier_calls.append(kwargs)
+return None
+monkeypatch.setattr(models_mod, "check_nous_free_tier", _check_nous_free_tier)
monkeypatch.setattr(
models_mod, "partition_nous_models_by_tier",
lambda ids, p, free_tier=False: (ids, []),
)
monkeypatch.setattr(ns, "prompt_enable_tool_gateway", lambda cfg: None)
+return free_tier_calls
def test_skip_keep_current_preserves_provider_and_model(self, tmp_path, monkeypatch):
"""User picks Skip → config.yaml untouched, Nous creds still saved."""
@@ -1070,7 +1113,7 @@ class TestLoginNousSkipKeepsCurrent:
hermes_home, config_path, auth_path = self._setup_home_with_openrouter(
tmp_path, monkeypatch,
)
-self._patch_login_internals(
+free_tier_calls = self._patch_login_internals(
monkeypatch, prompt_returns="xiaomi/mimo-v2-pro",
)
@@ -1083,6 +1126,7 @@ class TestLoginNousSkipKeepsCurrent:
cfg_after = yaml.safe_load(config_path.read_text())
assert cfg_after["model"]["provider"] == "nous"
assert cfg_after["model"]["default"] == "xiaomi/mimo-v2-pro"
assert free_tier_calls == [{"force_fresh": True}]
auth_after = json.loads(auth_path.read_text())
assert auth_after["active_provider"] == "nous"
@@ -61,3 +61,56 @@ def test_get_git_banner_state_reads_origin_and_head(tmp_path):
state = banner.get_git_banner_state(repo_dir)
assert state == {"upstream": "b2f477a3", "local": "af8aad31", "ahead": 3}
def test_get_git_banner_state_falls_back_to_build_sha_when_no_repo():
"""Docker image case: no .git checkout — baked build SHA fills the gap.
``_resolve_repo_dir`` returns None when neither the running code's
parent nor ``$HERMES_HOME/hermes-agent/`` is a git repo (the canonical
case inside the published container, where .git is dockerignored).
The banner should still report the build SHA so support bug reports
can identify the running commit.
"""
from hermes_cli import banner
with patch.object(banner, "_resolve_repo_dir", return_value=None), \
patch("hermes_cli.build_info.get_build_sha", return_value="abcdef12"):
state = banner.get_git_banner_state()
assert state == {"upstream": "abcdef12", "local": "abcdef12", "ahead": 0}
def test_get_git_banner_state_returns_none_when_no_repo_and_no_build_sha():
"""Pip-installed wheel with neither git checkout nor baked SHA → None.
Banner correctly omits the upstream/local suffix in this case.
"""
from hermes_cli import banner
with patch.object(banner, "_resolve_repo_dir", return_value=None), \
patch("hermes_cli.build_info.get_build_sha", return_value=None):
state = banner.get_git_banner_state()
assert state is None
def test_get_git_banner_state_falls_back_when_live_git_returns_nothing(tmp_path):
"""Shallow clone without origin/main → still surface build SHA if baked.
Some install paths (e.g. ``git clone --depth 1`` without a remote) have
a ``.git`` directory but ``git rev-parse origin/main`` fails. When that
happens AND a baked SHA exists, return the baked one instead of None.
"""
from hermes_cli import banner
repo_dir = tmp_path / "repo"
(repo_dir / ".git").mkdir(parents=True)
# All git invocations fail (returncode=1, empty stdout).
failed = MagicMock(returncode=1, stdout="")
with patch("hermes_cli.banner.subprocess.run", return_value=failed), \
patch("hermes_cli.build_info.get_build_sha", return_value="cafef00d"):
state = banner.get_git_banner_state(repo_dir)
assert state == {"upstream": "cafef00d", "local": "cafef00d", "ahead": 0}
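The fallback order these banner tests exercise (live git state first, then the baked build SHA, else nothing) can be summarised in a small sketch. `banner_state_with_fallback` is a hypothetical helper, not the body of `get_git_banner_state`; it only mirrors the dictionary shape asserted above.

```python
from typing import Dict, Optional, Union

BannerState = Dict[str, Union[str, int]]


def banner_state_with_fallback(
    live_state: Optional[BannerState],
    build_sha: Optional[str],
) -> Optional[BannerState]:
    """Prefer live git data; otherwise report the baked SHA with ahead=0."""
    if live_state is not None:
        return live_state
    if build_sha:
        # No checkout to compare against, so upstream and local coincide.
        return {"upstream": build_sha, "local": build_sha, "ahead": 0}
    return None
```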
@@ -0,0 +1,78 @@
"""Tests for hermes_cli.build_info — baked-in build SHA resolution.
The build SHA is written by the Dockerfile's ``HERMES_GIT_SHA`` build-arg
into ``<project_root>/.hermes_build_sha``. These tests cover the read-side
helper: missing file, malformed file, truncation, and error tolerance.
"""
from pathlib import Path
from unittest.mock import patch
def test_get_build_sha_returns_none_when_file_absent(tmp_path):
"""Source installs: no file present → None, callers fall back to git."""
from hermes_cli import build_info
missing = tmp_path / ".hermes_build_sha" # never created
with patch.object(build_info, "_BUILD_SHA_FILE", missing):
assert build_info.get_build_sha() is None
def test_get_build_sha_reads_baked_file(tmp_path):
"""Docker image case: file exists with full 40-char SHA → truncated to 8."""
from hermes_cli import build_info
sha_file = tmp_path / ".hermes_build_sha"
sha_file.write_text("abcdef1234567890abcdef1234567890abcdef12\n")
with patch.object(build_info, "_BUILD_SHA_FILE", sha_file):
assert build_info.get_build_sha() == "abcdef12"
def test_get_build_sha_respects_short_argument(tmp_path):
"""``short=N`` truncates to N chars; ``short<=0`` returns full SHA."""
from hermes_cli import build_info
sha_file = tmp_path / ".hermes_build_sha"
full_sha = "abcdef1234567890abcdef1234567890abcdef12"
sha_file.write_text(full_sha + "\n")
with patch.object(build_info, "_BUILD_SHA_FILE", sha_file):
assert build_info.get_build_sha(short=12) == "abcdef123456"
assert build_info.get_build_sha(short=0) == full_sha
assert build_info.get_build_sha(short=-1) == full_sha
def test_get_build_sha_strips_whitespace(tmp_path):
"""The Dockerfile uses ``printf '%s\\n'`` — strip the trailing newline."""
from hermes_cli import build_info
sha_file = tmp_path / ".hermes_build_sha"
sha_file.write_text(" abcdef1234567890\n\n")
with patch.object(build_info, "_BUILD_SHA_FILE", sha_file):
assert build_info.get_build_sha() == "abcdef12"
def test_get_build_sha_returns_none_for_empty_file(tmp_path):
"""A whitespace-only file is treated as absent."""
from hermes_cli import build_info
sha_file = tmp_path / ".hermes_build_sha"
sha_file.write_text(" \n\n")
with patch.object(build_info, "_BUILD_SHA_FILE", sha_file):
assert build_info.get_build_sha() is None
def test_get_build_sha_swallows_read_errors(tmp_path):
"""Any IO exception from the read returns None — never raises."""
from hermes_cli import build_info
sha_file = tmp_path / ".hermes_build_sha"
sha_file.write_text("abcdef1234567890\n")
with patch.object(build_info, "_BUILD_SHA_FILE", sha_file), \
patch.object(Path, "read_text", side_effect=OSError("boom")):
assert build_info.get_build_sha() is None
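Taken together, the tests above specify a read helper that strips whitespace, truncates to `short` characters, and returns `None` for a missing, empty, or unreadable file. A minimal sketch consistent with those tests follows; `read_build_sha` is a hypothetical name that takes the path as an argument, whereas the real `get_build_sha` reads the module-level `_BUILD_SHA_FILE`.

```python
from pathlib import Path
from typing import Optional


def read_build_sha(path: Path, short: int = 8) -> Optional[str]:
    """Return the baked SHA from path, or None if it cannot be determined."""
    try:
        sha = path.read_text(encoding="utf-8").strip()
    except OSError:
        # Covers a missing file (FileNotFoundError) and any other IO failure.
        return None
    if not sha:
        return None  # whitespace-only file counts as absent
    return sha[:short] if short > 0 else sha
```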
@@ -144,7 +144,13 @@ class TestCmdUpdateBranchFallback:
mock_run.side_effect = _make_run_side_effect(
branch="main", verify_ok=True, commit_count="1"
)
-with patch.object(hm, "_is_termux_env", return_value=False):
+# The web UI build runs through _run_with_idle_timeout now (issue
+# #33788) so it no longer appears in subprocess.run's call list.
+# Mock it so the test doesn't actually shell out to ``tsc``.
+import subprocess as _subprocess
+build_ok = _subprocess.CompletedProcess([], 0, stdout="", stderr="")
+with patch.object(hm, "_is_termux_env", return_value=False), \
+patch.object(hm, "_run_with_idle_timeout", return_value=build_ok) as mock_idle:
cmd_update(mock_args)
npm_calls = [
@@ -153,10 +159,11 @@ class TestCmdUpdateBranchFallback:
if call.args and call.args[0][0] == "/usr/bin/npm"
]
-# cmd_update runs npm commands in three locations:
-# 1. repo root — slash-command / TUI bridge deps
-# 2. ui-tui/ — Ink TUI deps
-# 3. web/ — install + "npm run build" for the web frontend
+# cmd_update runs npm commands in four locations:
+# 1. repo root — slash-command / TUI bridge deps (subprocess.run)
+# 2. ui-tui/ — Ink TUI deps (subprocess.run)
+# 3. web/ — npm install (subprocess.run)
+# 4. web/ — npm run build (_run_with_idle_timeout)
#
# Repo-root and ui-tui installs intentionally omit `--silent` and run
# without `capture_output` so optional postinstall scripts (e.g.
@@ -175,11 +182,18 @@ class TestCmdUpdateBranchFallback:
(update_flags, PROJECT_ROOT / "ui-tui"),
]
if len(npm_calls) > 2:
+# Only the web/ install is left in subprocess.run; the build moved
+# to _run_with_idle_timeout to make Vite progress visible (#33788).
assert npm_calls[2:] == [
(["/usr/bin/npm", "ci", "--silent"], PROJECT_ROOT / "web"),
-(["/usr/bin/npm", "run", "build"], PROJECT_ROOT / "web"),
]
+# The web UI build itself went through the streaming helper.
+mock_idle.assert_called_once()
+idle_args, idle_kwargs = mock_idle.call_args
+assert idle_args[0] == ["/usr/bin/npm", "run", "build"]
+assert idle_kwargs["cwd"] == PROJECT_ROOT / "web"
# Regression for #18840: repo root + ui-tui installs must stream
# output (capture_output=False) so postinstall progress is visible
# to the user.
@@ -0,0 +1,185 @@
"""Tests for ``hermes update`` / ``--check`` inside the Docker container.
Background: ``.dockerignore`` excludes ``.git``, so the existing git-pull
update path can never succeed inside the published image. Before this
fix, ``hermes update`` would fall through to ``"✗ Not a git repository.
Please reinstall: curl ... install.sh"`` — that script installs a *new*
host-side Hermes, not an update to the running container, so the message
was actively misleading.
These tests pin the new behaviour: when ``detect_install_method`` reports
``"docker"`` (stamped by ``docker/stage2-hook.sh``), both the apply path
(``cmd_update``) and the check path (``_cmd_update_check``) print the
``docker pull`` guidance from ``format_docker_update_message`` and exit
with status 1, without running ``git fetch`` / ``subprocess.run``.
"""
from __future__ import annotations
from types import SimpleNamespace
from unittest.mock import patch
import pytest
from hermes_cli.main import _cmd_update_check, cmd_update
# ---------- cmd_update (apply path) ----------
@patch("hermes_cli.config.is_managed", return_value=False)
@patch("hermes_cli.config.detect_install_method", return_value="docker")
@patch("subprocess.run")
def test_cmd_update_in_docker_prints_guidance_and_exits(
mock_run, _mock_method, _mock_managed, capsys
):
"""``hermes update`` inside Docker → friendly message + exit 1, no git calls."""
with pytest.raises(SystemExit) as excinfo:
cmd_update(SimpleNamespace(check=False))
assert excinfo.value.code == 1
out = capsys.readouterr().out
# Spot-check the key guidance — exhaustive wording is locked in by the
# config-module test below to keep these CLI tests resilient to copy edits.
assert "doesn't apply inside the Docker container" in out
assert "docker pull nousresearch/hermes-agent:latest" in out
# No git invocations — the early-return must beat every git command.
git_calls = [c for c in mock_run.call_args_list if c.args and c.args[0] and "git" in str(c.args[0][0])]
assert git_calls == [], f"expected no git calls, got: {git_calls}"
@patch("hermes_cli.config.is_managed", return_value=False)
@patch("hermes_cli.config.detect_install_method", return_value="docker")
@patch("subprocess.run")
def test_cmd_update_check_in_docker_prints_guidance_and_exits(
mock_run, _mock_method, _mock_managed, capsys
):
"""``hermes update --check`` inside Docker → same message + exit 1, no fetch."""
with pytest.raises(SystemExit) as excinfo:
cmd_update(SimpleNamespace(check=True, branch=None))
assert excinfo.value.code == 1
out = capsys.readouterr().out
assert "doesn't apply inside the Docker container" in out
assert "docker pull nousresearch/hermes-agent:latest" in out
git_calls = [c for c in mock_run.call_args_list if c.args and c.args[0] and "git" in str(c.args[0][0])]
assert git_calls == [], f"expected no git calls, got: {git_calls}"
@patch("hermes_cli.config.is_managed", return_value=False)
@patch("hermes_cli.config.detect_install_method", return_value="docker")
@patch("subprocess.run")
def test_cmd_update_in_docker_ignores_yes_and_force(
mock_run, _mock_method, _mock_managed, capsys
):
"""``--yes`` / ``--force`` don't bypass the Docker bail-out.
The point of the bail-out is "git pull will never work here", so even
a user trying to barge through with ``--yes --force`` should see the
docker-pull guidance.
"""
with pytest.raises(SystemExit):
cmd_update(SimpleNamespace(check=False, yes=True, force=True))
assert "docker pull" in capsys.readouterr().out
git_calls = [c for c in mock_run.call_args_list if c.args and c.args[0] and "git" in str(c.args[0][0])]
assert git_calls == []
# ---------- _cmd_update_check (check path, direct entry) ----------
@patch("hermes_cli.config.detect_install_method", return_value="docker")
@patch("subprocess.run")
def test_cmd_update_check_direct_in_docker(mock_run, _mock_method, capsys):
    """Calling ``_cmd_update_check`` directly (no apply path) also bails."""
    with pytest.raises(SystemExit) as excinfo:
        _cmd_update_check()

    assert excinfo.value.code == 1
    assert "docker pull" in capsys.readouterr().out
    git_calls = [c for c in mock_run.call_args_list if c.args and c.args[0] and "git" in str(c.args[0][0])]
    assert git_calls == []
# ---------- Non-Docker installs unaffected ----------
@patch("hermes_cli.config.is_managed", return_value=False)
@patch("hermes_cli.config.detect_install_method", return_value="git")
@patch(
    "subprocess.run",
    return_value=SimpleNamespace(returncode=0, stdout="0\n", stderr=""),
)
def test_cmd_update_on_git_install_does_not_print_docker_message(
    _mock_run, _mock_method, _mock_managed, capsys
):
    """Source/git installs MUST NOT hit the Docker branch.

    Regression guard: an over-eager detection refactor could accidentally
    route git users through the docker-pull message. We swallow
    SystemExit / unrelated errors from the rest of the update flow;
    those don't matter for this assertion. What matters is that the
    docker text is absent.

    ``subprocess.run`` is mocked because the git path will otherwise shell
    out to ``git fetch upstream`` / ``git fetch origin``. On CI runners
    with no ``upstream`` remote configured, this can hang past the 30s
    pytest-timeout depending on git's network behaviour. The stub
    returns a successful CompletedProcess-shaped object with ``"0\\n"``
    stdout, which both keeps the flow shell-free AND parses cleanly as
    the "0 commits behind" rev-list output the check path later parses
    via ``int(rev_result.stdout.strip())``.
    """
    try:
        cmd_update(SimpleNamespace(check=True, branch=None))
    except (SystemExit, Exception):
        # Update flow may exit for unrelated reasons in a stubbed env;
        # that's fine, we only care about the banner not appearing.
        pass

    assert "doesn't apply inside the Docker container" not in capsys.readouterr().out


@patch("hermes_cli.config.detect_install_method", return_value="pip")
@patch("hermes_cli.banner.check_via_pypi", return_value=0)
def test_cmd_update_check_on_pip_install_still_uses_pypi(
    _mock_pypi, _mock_method, capsys
):
    """PyPI installs route to PyPI check, not the Docker bail-out."""
    _cmd_update_check()
    out = capsys.readouterr().out
    assert "Already up to date" in out
    assert "doesn't apply inside the Docker container" not in out
# ---------- format_docker_update_message — content lock ----------
def test_format_docker_update_message_contents():
    """Lock in the high-value content of the Docker update message.

    These are the bits a user actually needs to act on; if any of them
    disappear in a copy edit, the message has lost its value. Specific
    wording around them is free to evolve (we don't assert full text).
    """
    from hermes_cli.config import format_docker_update_message

    msg = format_docker_update_message()

    # Primary command: the entire reason this message exists.
    assert "docker pull nousresearch/hermes-agent:latest" in msg

    # The four key concepts the message must cover:
    assert "restart" in msg.lower(), "must explain that a restart is required"
    assert "--version" in msg, "must show how to verify the new version"
    assert ":latest" in msg, "must mention tag pinning caveat"
    assert "HERMES_HOME" in msg or "/opt/data" in msg, (
        "must address config persistence across upgrades"
    )

    # Acknowledges that forks exist (build-your-own-image escape hatch).
    assert "fork" in msg.lower() or "Dockerfile" in msg
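The tests above pin down a simple contract: on a Docker install, the update command prints pull guidance and exits with status 1 before any git subprocess runs. The following is a minimal sketch of such a guard, not the project's implementation. The function name `update_or_bail`, the injected `run_git_update` callable, and the message wording are all assumptions made for illustration; only the two asserted substrings and the exit code come from the tests.

```python
import sys
from typing import Callable

# Placeholder wording (assumption). The tests only require these two substrings.
DOCKER_UPDATE_MESSAGE = (
    "hermes update doesn't apply inside the Docker container.\n"
    "Pull the new image instead: docker pull nousresearch/hermes-agent:latest\n"
)


def update_or_bail(install_method: str, run_git_update: Callable[[], None]) -> None:
    """Hypothetical guard: bail out on Docker before any git work starts."""
    if install_method == "docker":
        # Flags such as --yes / --force are deliberately not consulted here:
        # a git pull cannot succeed inside the published image.
        print(DOCKER_UPDATE_MESSAGE, end="")
        sys.exit(1)
    run_git_update()
```

Placing the check ahead of every other branch is what lets the tests assert that no `git` call was recorded at all.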
+118
@@ -0,0 +1,118 @@
"""Tests for hermes_cli.dump._get_git_commit — git SHA resolution for ``hermes dump``.

``hermes dump`` prints the running commit so support bug reports identify the
exact version. Source installs resolve it live via ``git rev-parse``; the
published Docker image excludes ``.git`` and falls back to the baked SHA
written by the Dockerfile's ``HERMES_GIT_SHA`` build-arg.

These tests cover both paths plus the failure modes (no git, no baked file).
"""

from unittest.mock import MagicMock, patch


def test_get_git_commit_uses_live_git_when_available(tmp_path):
    """Source install: ``git rev-parse --short=8 HEAD`` wins; no fallback."""
    from hermes_cli import dump

    repo_dir = tmp_path / "repo"
    repo_dir.mkdir()

    git_result = MagicMock(returncode=0, stdout="deadbeef\n")
    # build_info should NOT be consulted when live git succeeds.
    with patch("hermes_cli.dump.subprocess.run", return_value=git_result) as mock_run, \
         patch("hermes_cli.build_info.get_build_sha") as mock_build:
        commit = dump._get_git_commit(repo_dir)

    assert commit == "deadbeef"
    mock_run.assert_called_once()
    mock_build.assert_not_called()


def test_get_git_commit_falls_back_to_build_sha_when_live_git_fails(tmp_path):
    """Docker image case: live git returns non-zero → use baked SHA."""
    from hermes_cli import dump

    repo_dir = tmp_path / "no-git-here"
    repo_dir.mkdir()

    failed = MagicMock(returncode=128, stdout="")
    with patch("hermes_cli.dump.subprocess.run", return_value=failed), \
         patch("hermes_cli.build_info.get_build_sha", return_value="cafef00d"):
        commit = dump._get_git_commit(repo_dir)

    assert commit == "cafef00d"


def test_get_git_commit_falls_back_when_git_returns_empty_stdout(tmp_path):
    """Edge case: git exits 0 but prints nothing; still try the baked SHA."""
    from hermes_cli import dump

    repo_dir = tmp_path / "repo"
    repo_dir.mkdir()

    empty = MagicMock(returncode=0, stdout="\n")
    with patch("hermes_cli.dump.subprocess.run", return_value=empty), \
         patch("hermes_cli.build_info.get_build_sha", return_value="abcdef12"):
        commit = dump._get_git_commit(repo_dir)

    assert commit == "abcdef12"


def test_get_git_commit_falls_back_when_git_raises(tmp_path):
    """git binary missing (e.g. minimal container w/o git) → baked SHA path."""
    from hermes_cli import dump

    repo_dir = tmp_path / "repo"
    repo_dir.mkdir()

    with patch("hermes_cli.dump.subprocess.run", side_effect=FileNotFoundError("git")), \
         patch("hermes_cli.build_info.get_build_sha", return_value="feedface"):
        commit = dump._get_git_commit(repo_dir)

    assert commit == "feedface"


def test_get_git_commit_returns_unknown_when_neither_source_available(tmp_path):
    """Pip-installed wheel: no git, no baked SHA → '(unknown)' (legacy contract)."""
    from hermes_cli import dump

    repo_dir = tmp_path / "repo"
    repo_dir.mkdir()

    failed = MagicMock(returncode=128, stdout="")
    with patch("hermes_cli.dump.subprocess.run", return_value=failed), \
         patch("hermes_cli.build_info.get_build_sha", return_value=None):
        commit = dump._get_git_commit(repo_dir)

    assert commit == "(unknown)"


def test_get_git_commit_output_format_identical_between_sources(tmp_path):
    """Regression guard: live-git and baked-SHA outputs share the same shape.

    Ben explicitly asked for identical output between Docker and source installs
    so support tooling that parses ``hermes dump`` doesn't have to special-case
    container builds. Both paths must return a bare 8-char SHA: no prefix,
    no suffix, no annotation.
    """
    from hermes_cli import dump

    repo_dir = tmp_path / "repo"
    repo_dir.mkdir()

    # Live-git path.
    git_result = MagicMock(returncode=0, stdout="b2f477a3\n")
    with patch("hermes_cli.dump.subprocess.run", return_value=git_result):
        live = dump._get_git_commit(repo_dir)

    # Baked-SHA path.
    failed = MagicMock(returncode=128, stdout="")
    with patch("hermes_cli.dump.subprocess.run", return_value=failed), \
         patch("hermes_cli.build_info.get_build_sha", return_value="b2f477a3"):
        baked = dump._get_git_commit(repo_dir)

    assert live == baked == "b2f477a3"
    # Same length, same charset; no decoration in either branch.
    assert len(live) == 8
    assert all(c in "0123456789abcdef" for c in live)
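Taken together, these tests describe a three-step resolution order: live git first, then the baked SHA, then the literal `(unknown)`. The sketch below shows one way that order could be written. It is an illustration under assumptions, not the body of `dump._get_git_commit`: the name `resolve_commit`, the injected `baked_sha` callable, the `run` parameter, and the 5-second timeout are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional


def resolve_commit(
    repo_dir: Path,
    baked_sha: Callable[[], Optional[str]],
    run: Callable[..., object] = subprocess.run,
) -> str:
    """Hypothetical resolver: live git, then baked SHA, then '(unknown)'."""
    try:
        result = run(
            ["git", "rev-parse", "--short=8", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            timeout=5,  # assumed value, chosen only to keep the sketch bounded
        )
        sha = result.stdout.strip()
        # Exit code 0 with empty output still counts as a failure.
        if result.returncode == 0 and sha:
            return sha
    except (OSError, subprocess.SubprocessError):
        # Covers a missing git binary (FileNotFoundError) and timeouts.
        pass
    return baked_sha() or "(unknown)"
```

Because both branches return the bare string, callers never see a prefix or annotation that would distinguish a container build from a source checkout.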
+218 -1
@@ -481,4 +481,221 @@ def test_uninstall_access_denied_declined_keeps_task_and_cleans_files(monkeypatc
out = capsys.readouterr().out
assert "Skipped elevation" in out
assert "UAC is Windows' admin approval prompt" in out
assert "Scheduled Task still registered" in out
# ---------------------------------------------------------------------------
# stop() drain semantics — issue #33778
#
# Background: on Windows, asyncio.add_signal_handler raises NotImplementedError,
# so the gateway's SIGTERM handler (which drains in-flight agents and writes
# resume_pending=True) never fires when `hermes gateway stop` kills the
# process. The fix: stop() writes the planned_stop_marker first, waits for
# the gateway's marker-watcher thread to drain + exit cleanly, then escalates
# to taskkill if drain times out.
# ---------------------------------------------------------------------------
def test_stop_writes_planned_stop_marker_before_killing(monkeypatch):
    """stop() must write the planned-stop marker BEFORE any kill signal.

    Without this, the gateway's drain loop never runs on Windows and
    sessions silently lose context across restarts.
    """
    pid = 99999
    events = []

    monkeypatch.setattr(gateway_windows, "_assert_windows", lambda: None)
    monkeypatch.setattr(gateway_windows, "is_task_registered", lambda: False)

    # Stub the marker write so we can record the order of operations.
    from gateway import status as status_mod

    def fake_write_marker(target_pid):
        events.append(("write_marker", target_pid))
        return True

    def fake_pid_exists(check_pid):
        # Drain succeeds: pid "exits" right after the marker write.
        return ("write_marker", pid) not in events

    monkeypatch.setattr(status_mod, "write_planned_stop_marker", fake_write_marker)
    monkeypatch.setattr(status_mod, "_pid_exists", fake_pid_exists)
    monkeypatch.setattr(status_mod, "get_running_pid", lambda: pid)

    def fake_kill(**kwargs):
        events.append(("kill", kwargs.get("force", False)))
        return 0

    monkeypatch.setattr("hermes_cli.gateway.kill_gateway_processes", fake_kill)
    monkeypatch.setattr("hermes_cli.gateway._get_restart_drain_timeout", lambda: 5.0)

    gateway_windows.stop()

    # Marker MUST be written before any kill.
    kinds = [e[0] for e in events]
    assert "write_marker" in kinds, "stop() never wrote the planned-stop marker"
    marker_idx = kinds.index("write_marker")
    kill_idx = kinds.index("kill") if "kill" in kinds else len(kinds)
    assert marker_idx < kill_idx, (
        f"stop() killed before writing the marker (events={events})"
    )


def test_stop_waits_for_graceful_drain_before_force_kill(monkeypatch):
    """When drain succeeds, stop() should NOT force-kill the gateway.

    drained=True means the gateway exited cleanly after seeing the
    marker; escalating to taskkill /F afterwards would be wasted
    work and may emit confusing "killed N processes" output.
    """
    pid = 88888
    events = []

    monkeypatch.setattr(gateway_windows, "_assert_windows", lambda: None)
    monkeypatch.setattr(gateway_windows, "is_task_registered", lambda: False)

    from gateway import status as status_mod

    monkeypatch.setattr(status_mod, "write_planned_stop_marker", lambda p: True)

    # Simulate the gateway exiting cleanly after one poll tick.
    poll_count = [0]

    def fake_pid_exists(check_pid):
        poll_count[0] += 1
        return poll_count[0] < 2  # alive on first poll, gone on second

    monkeypatch.setattr(status_mod, "_pid_exists", fake_pid_exists)
    monkeypatch.setattr(status_mod, "get_running_pid", lambda: pid)

    def fake_kill(**kwargs):
        events.append(("kill", kwargs.get("force", False)))
        return 0

    monkeypatch.setattr("hermes_cli.gateway.kill_gateway_processes", fake_kill)
    monkeypatch.setattr("hermes_cli.gateway._get_restart_drain_timeout", lambda: 5.0)

    gateway_windows.stop()

    # kill_gateway_processes is still called as the no-op sweep, but
    # NOT with force=True: drain succeeded, gateway is already gone.
    assert events == [("kill", False)], (
        f"After clean drain, force kill should be disabled (events={events})"
    )


def test_stop_escalates_to_force_kill_when_drain_times_out(monkeypatch):
    """When drain times out, stop() MUST escalate to force=True.

    Drain timeout = gateway is stuck or unresponsive. Without the
    taskkill /T /F escalation, the gateway stays alive and the next
    `hermes gateway start` fails with "another instance is running".
    """
    pid = 77777
    events = []

    monkeypatch.setattr(gateway_windows, "_assert_windows", lambda: None)
    monkeypatch.setattr(gateway_windows, "is_task_registered", lambda: False)

    from gateway import status as status_mod

    monkeypatch.setattr(status_mod, "write_planned_stop_marker", lambda p: True)
    # PID never exits, so drain times out.
    monkeypatch.setattr(status_mod, "_pid_exists", lambda check_pid: True)
    monkeypatch.setattr(status_mod, "get_running_pid", lambda: pid)

    def fake_kill(**kwargs):
        events.append(("kill", kwargs.get("force", False)))
        return 1

    monkeypatch.setattr("hermes_cli.gateway.kill_gateway_processes", fake_kill)
    # Tiny drain timeout to keep the test fast.
    monkeypatch.setattr("hermes_cli.gateway._get_restart_drain_timeout", lambda: 1.0)

    gateway_windows.stop()

    # When drain times out, kill is invoked with force=True so taskkill /T /F
    # walks the process tree.
    assert events == [("kill", True)], (
        f"After drain timeout, kill must use force=True (events={events})"
    )


def test_stop_no_running_gateway_skips_drain(monkeypatch):
    """When no gateway is running, skip the drain wait entirely."""
    events = []

    monkeypatch.setattr(gateway_windows, "_assert_windows", lambda: None)
    monkeypatch.setattr(gateway_windows, "is_task_registered", lambda: False)

    from gateway import status as status_mod

    monkeypatch.setattr(status_mod, "get_running_pid", lambda: None)

    def fake_write_marker(target_pid):
        events.append(("write_marker", target_pid))
        return True

    monkeypatch.setattr(status_mod, "write_planned_stop_marker", fake_write_marker)
    monkeypatch.setattr(status_mod, "_pid_exists", lambda check_pid: False)

    def fake_kill(**kwargs):
        events.append(("kill", kwargs.get("force", False)))
        return 0

    monkeypatch.setattr("hermes_cli.gateway.kill_gateway_processes", fake_kill)
    monkeypatch.setattr("hermes_cli.gateway._get_restart_drain_timeout", lambda: 5.0)

    gateway_windows.stop()

    # With no PID to drain, no marker is written. Kill sweep still runs
    # (defensive: covers the case where a stray gateway is alive without
    # a PID file). force=True because drained=False.
    assert ("write_marker", None) not in events
    assert all(e[0] != "write_marker" for e in events), (
        f"Should not write marker when no PID is running (events={events})"
    )
    assert events == [("kill", True)]


def test_drain_helper_handles_invalid_pid(monkeypatch):
    """_drain_gateway_pid returns False for invalid PIDs without crashing."""
    assert gateway_windows._drain_gateway_pid(0, 5.0) is False
    assert gateway_windows._drain_gateway_pid(-1, 5.0) is False


def test_drain_helper_returns_true_when_pid_exits_quickly(monkeypatch):
    """_drain_gateway_pid polls _pid_exists until it returns False."""
    pid = 66666
    poll_count = [0]

    def fake_pid_exists(check_pid):
        poll_count[0] += 1
        return poll_count[0] < 3  # alive twice, then gone

    from gateway import status as status_mod

    monkeypatch.setattr(status_mod, "write_planned_stop_marker", lambda p: True)
    monkeypatch.setattr(status_mod, "_pid_exists", fake_pid_exists)

    assert gateway_windows._drain_gateway_pid(pid, drain_timeout=5.0) is True


def test_drain_helper_returns_false_on_timeout(monkeypatch):
    """_drain_gateway_pid returns False when the PID never exits."""
    from gateway import status as status_mod

    monkeypatch.setattr(status_mod, "write_planned_stop_marker", lambda p: True)
    monkeypatch.setattr(status_mod, "_pid_exists", lambda check_pid: True)

    assert gateway_windows._drain_gateway_pid(55555, drain_timeout=1.0) is False


def test_drain_helper_still_waits_if_marker_write_fails(monkeypatch):
    """Marker-write failures are swallowed; drain still polls for PID exit.

    If the marker can't be written (disk full, permission error), the
    gateway can't drain, but the wait still happens so a slow-shutdown
    gateway from a different code path (e.g. SIGTERM working on this
    platform after all) still gets observed cleanly.
    """
    pid = 44444

    def fake_write(target_pid):
        raise OSError("disk full")

    from gateway import status as status_mod

    monkeypatch.setattr(status_mod, "write_planned_stop_marker", fake_write)
    monkeypatch.setattr(status_mod, "_pid_exists", lambda check_pid: False)

    # Returns True because _pid_exists immediately says "gone".
    assert gateway_windows._drain_gateway_pid(pid, drain_timeout=5.0) is True
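The drain semantics these tests lock in can be summarised as: write the marker, poll until the PID disappears or the timeout elapses, then run the kill sweep with `force` set only when the drain did not succeed. The sketch below restates that flow with every dependency injected. It is an assumption-laden illustration rather than the code in `gateway_windows`: the names `drain_pid` and `stop_gateway`, the parameter list, and the 50 ms poll interval are hypothetical.

```python
import time
from typing import Callable, Optional


def drain_pid(
    pid: int,
    timeout: float,
    pid_exists: Callable[[int], bool],
    poll_interval: float = 0.05,  # assumed interval, not taken from the source
) -> bool:
    """Return True if the process exits within ``timeout`` seconds."""
    if pid <= 0:
        return False  # invalid PIDs are never waited on
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not pid_exists(pid):
            return True
        time.sleep(poll_interval)
    return False


def stop_gateway(
    pid: Optional[int],
    write_marker: Callable[[int], object],
    pid_exists: Callable[[int], bool],
    kill: Callable[..., object],
    timeout: float,
) -> bool:
    """Hypothetical stop flow: marker first, wait, then escalate if needed."""
    drained = False
    if pid is not None:
        try:
            write_marker(pid)  # must happen before any kill
        except OSError:
            pass  # a failed marker write is swallowed; the wait still runs
        drained = drain_pid(pid, timeout, pid_exists)
    # The sweep always runs; it is forced only when the drain did not succeed.
    kill(force=not drained)
    return drained
```

Running the sweep unconditionally matches the tests' expectation that a stray gateway without a PID file is still cleaned up with `force=True`.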
+1 -1
@@ -237,7 +237,7 @@ class TestConfigWriting:
monkeypatch.setattr(
tools_config,
"get_nous_subscription_features",
- lambda config: SimpleNamespace(
+ lambda config, **kwargs: SimpleNamespace(
features={"image_gen": SimpleNamespace(managed_by_nous=True)}
),
)
+140
@@ -5,7 +5,9 @@ from __future__ import annotations
import concurrent.futures
import os
import sqlite3
import sys
import time
import types
import unittest.mock
from pathlib import Path
@@ -49,6 +51,43 @@ def test_init_creates_expected_tables(kanban_home):
assert {"tasks", "task_links", "task_comments", "task_events"} <= names
def test_connect_honors_kanban_busy_timeout_env(kanban_home, monkeypatch):
    """All kanban connections should use the explicit busy-timeout knob.

    A worker stampede should wait for SQLite's writer lock instead of failing
    immediately with ``database is locked`` during first-connect/WAL/schema
    setup. The timeout must be queryable via PRAGMA so CLI, gateway, and tool
    connections behave the same way.
    """
    monkeypatch.setenv("HERMES_KANBAN_BUSY_TIMEOUT_MS", "123456")

    with kb.connect() as conn:
        row = conn.execute("PRAGMA busy_timeout").fetchone()

    assert row[0] == 123456
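For readers unfamiliar with the knob being tested, the following is a minimal sketch of how an environment variable could be turned into a SQLite busy timeout that is later readable through `PRAGMA busy_timeout`. Only the variable name `HERMES_KANBAN_BUSY_TIMEOUT_MS` comes from the test; the function name, the 5000 ms default, and the handling of malformed values are assumptions.

```python
import os
import sqlite3


def connect_with_busy_timeout(path: str = ":memory:", default_ms: int = 5000) -> sqlite3.Connection:
    """Hypothetical helper: apply an env-configured busy timeout to a connection."""
    raw = os.environ.get("HERMES_KANBAN_BUSY_TIMEOUT_MS", "")
    try:
        timeout_ms = max(0, int(raw))
    except ValueError:
        timeout_ms = default_ms  # assumed fallback for unset or malformed values
    # sqlite3.connect takes seconds; the PRAGMA takes milliseconds.
    conn = sqlite3.connect(path, timeout=timeout_ms / 1000)
    conn.execute(f"PRAGMA busy_timeout = {timeout_ms}")
    return conn
```

Setting the PRAGMA explicitly, in addition to the `timeout` argument, keeps the effective value visible to any code that queries it on the connection.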


def test_cross_process_init_lock_uses_windows_byte_range_lock(tmp_path, monkeypatch):
    """Windows must use a real process lock, not a no-op sidecar open."""
    calls: list[tuple[int, int, int]] = []
    fake_msvcrt = types.SimpleNamespace(
        LK_LOCK=1,
        LK_UNLCK=2,
        locking=lambda fd, mode, nbytes: calls.append((fd, mode, nbytes)),
    )
    monkeypatch.setattr(kb, "_IS_WINDOWS", True)
    monkeypatch.setitem(sys.modules, "msvcrt", fake_msvcrt)

    db_path = tmp_path / "kanban.db"
    with kb._cross_process_init_lock(db_path):
        assert calls == [(calls[0][0], fake_msvcrt.LK_LOCK, 1)]

    assert [call[1:] for call in calls] == [
        (fake_msvcrt.LK_LOCK, 1),
        (fake_msvcrt.LK_UNLCK, 1),
    ]
def test_connect_rejects_tls_record_in_sqlite_header(tmp_path, monkeypatch):
"""Kanban should classify TLS-looking page-0 clobbers before WAL setup."""
home = tmp_path / ".hermes"
@@ -3278,6 +3317,44 @@ def test_connect_refuses_corrupt_existing_file(tmp_path):
kb.connect(db_path=db_path)
def test_repeated_corrupt_open_reuses_single_backup(tmp_path):
    """Repeated quarantines of the same corrupt bytes must not amplify disk usage.

    Regression for the gateway dispatcher's 5-min retry loop on shared kanban
    DBs across multi-profile fleets: each retry on an unchanged corrupt file
    used to create a fresh ``.corrupt.<timestamp>.bak`` until disk filled. The
    content-addressed backup name is deterministic in the DB's sha256, so
    N retries of the same bytes share one backup.
    """
    db_path = tmp_path / "kanban.db"
    original = _write_corrupt_db(db_path)

    backups: set[Path] = set()
    for _ in range(10):
        kb._INITIALIZED_PATHS.discard(str(db_path.resolve()))
        with pytest.raises(kb.KanbanDbCorruptError) as excinfo:
            kb.connect(db_path=db_path)
        assert excinfo.value.backup_path is not None
        backups.add(excinfo.value.backup_path)

    assert len(backups) == 1, f"expected 1 deterministic backup, got {len(backups)}"
    (backup,) = backups
    assert backup.exists()
    assert backup.read_bytes() == original

    # Mutate the corrupt bytes: fingerprint changes, separate backup preserved.
    with db_path.open("r+b") as f:
        f.seek(4096)
        f.write(b"\xAB" * 64)

    kb._INITIALIZED_PATHS.discard(str(db_path.resolve()))
    with pytest.raises(kb.KanbanDbCorruptError) as excinfo2:
        kb.connect(db_path=db_path)
    second_backup = excinfo2.value.backup_path
    assert second_backup is not None
    assert second_backup != backup
    assert second_backup.exists()
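The idea behind a content-addressed backup name is that the filename is derived from a hash of the file's bytes, so quarantining identical bytes repeatedly lands on the same path. The sketch below illustrates that idea only. The function name `quarantine_copy`, the 16-character digest prefix, and the exact `.corrupt.<digest>.bak` naming are assumptions; the source states only that the name is deterministic in the DB's sha256.

```python
import hashlib
import shutil
from pathlib import Path


def quarantine_copy(db_path: Path) -> Path:
    """Hypothetical helper: copy a corrupt DB to a name derived from its sha256."""
    digest = hashlib.sha256(db_path.read_bytes()).hexdigest()[:16]
    backup = db_path.with_name(f"{db_path.name}.corrupt.{digest}.bak")
    # Same bytes → same name, so repeated retries reuse the existing copy.
    if not backup.exists():
        shutil.copy2(db_path, backup)
    return backup
```

A timestamp-based name would grow by one file per retry; hashing the content bounds the number of backups by the number of distinct corrupt states.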
def test_locked_healthy_db_does_not_classify_as_corrupt(tmp_path, monkeypatch):
"""A transient lock during the probe must not produce a .corrupt backup
and must not be reported as :class:`KanbanDbCorruptError`. Raw sqlite
@@ -3805,3 +3882,66 @@ def test_dispatch_once_still_reaps_via_extracted_fn(kanban_home):
pids = kb.reap_worker_zombies()
assert pids == [99999]
# ---------------------------------------------------------------------------
# connect_closing(): context manager that actually closes the FD
# Regression coverage for #33159 (kanban.db FD leak — gateway crashes after
# ~4 days). sqlite3.Connection's built-in __exit__ commits/rollbacks but
# does NOT close, so `with kb.connect() as conn:` leaks the FD in
# long-lived processes (gateway run_slash, dashboard decompose handler).
# `connect_closing()` is the leak-safe replacement.
# ---------------------------------------------------------------------------
def test_connect_closing_closes_connection_on_exit(tmp_path):
    """The new context manager MUST actually close the underlying FD."""
    db_path = tmp_path / "kanban.db"
    kb._INITIALIZED_PATHS.discard(str(db_path.resolve()))

    with kb.connect_closing(db_path=db_path) as conn:
        conn.execute("SELECT 1").fetchone()

    # After exit, the connection MUST be closed: subsequent execute
    # should raise ProgrammingError.
    with pytest.raises(sqlite3.ProgrammingError):
        conn.execute("SELECT 1")


def test_connect_closing_closes_on_exception(tmp_path):
    """Connection closed even when the body raises."""
    db_path = tmp_path / "kanban.db"
    kb._INITIALIZED_PATHS.discard(str(db_path.resolve()))

    captured = []
    with pytest.raises(RuntimeError, match="boom"):
        with kb.connect_closing(db_path=db_path) as conn:
            captured.append(conn)
            raise RuntimeError("boom")

    with pytest.raises(sqlite3.ProgrammingError):
        captured[0].execute("SELECT 1")


def test_connect_closing_yields_usable_connection(tmp_path):
    """Smoke test: schema is initialized and basic ops work."""
    db_path = tmp_path / "kanban.db"
    kb._INITIALIZED_PATHS.discard(str(db_path.resolve()))

    with kb.connect_closing(db_path=db_path) as conn:
        tid = kb.create_task(conn, title="closing-cm test")
        task = kb.get_task(conn, tid)
        assert task is not None
        assert task.title == "closing-cm test"


def test_bare_connect_does_not_close_on_context_exit(tmp_path):
    """Document the leak that connect_closing exists to prevent.

    sqlite3.Connection's __exit__ commits/rollbacks but doesn't close.
    This is the upstream behaviour we cannot change; the regression
    guard is to make sure connect_closing() does the right thing.
    """
    db_path = tmp_path / "kanban.db"
    kb._INITIALIZED_PATHS.discard(str(db_path.resolve()))

    with kb.connect(db_path=db_path) as conn:
        pass

    # Still usable after with-block exit (the leak).
    conn.execute("SELECT 1").fetchone()
    conn.close()  # explicit close to avoid leaking THIS test
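The behaviour these tests rely on is standard-library behaviour: using a `sqlite3.Connection` as a context manager commits or rolls back the transaction but leaves the connection open. A closing wrapper can be built with `contextlib.contextmanager`, as in the sketch below. This is a generic illustration, not the body of `kb.connect_closing`; the standalone function here takes a plain path and skips any schema setup.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def connect_closing(path: str = ":memory:") -> Iterator[sqlite3.Connection]:
    """Sketch of a leak-safe wrapper: transaction handling plus a real close."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back if the body raises
            yield conn
    finally:
        conn.close()  # releases the file descriptor in every case
```

After the block exits, any further `execute` on the yielded connection raises `sqlite3.ProgrammingError`, which is the signal the tests above check for.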
+52 -23
@@ -2,6 +2,7 @@
from unittest.mock import patch, MagicMock
from hermes_cli.nous_account import NousPortalAccountInfo
from hermes_cli.models import (
OPENROUTER_MODELS, fetch_openrouter_models, model_ids, detect_provider_for_model,
is_nous_free_tier, partition_nous_models_by_tier,
@@ -308,6 +309,15 @@ class TestDetectProviderForModel:
class TestIsNousFreeTier:
    """Tests for is_nous_free_tier: account tier detection."""

    def test_paid_service_access_allowed_true_is_not_free(self):
        assert is_nous_free_tier({"paid_service_access": {"allowed": True}}) is False

    def test_paid_service_access_allowed_false_is_free(self):
        assert is_nous_free_tier({"paid_service_access": {"allowed": False}}) is True

    def test_paid_service_access_paid_access_fallback(self):
        assert is_nous_free_tier({"paid_service_access": {"paid_access": False}}) is True

    def test_paid_plus_tier(self):
        assert is_nous_free_tier({"subscription": {"plan": "Plus", "tier": 2, "monthly_charge": 20}}) is False
@@ -657,39 +667,58 @@ class TestCheckNousFreeTierCache:
     def teardown_method(self):
         _models_mod._free_tier_cache = None

-    @patch("hermes_cli.models.fetch_nous_account_tier")
-    @patch("hermes_cli.models.is_nous_free_tier", return_value=True)
-    def test_result_is_cached(self, mock_is_free, mock_fetch):
-        """Second call within TTL returns cached result without API call."""
-        mock_fetch.return_value = {"subscription": {"monthly_charge": 0}}
-        with patch("hermes_cli.auth.get_provider_auth_state", return_value={"access_token": "tok"}), \
-             patch("hermes_cli.auth.resolve_nous_runtime_credentials"):
-            result1 = check_nous_free_tier()
-            result2 = check_nous_free_tier()
+    @patch("hermes_cli.nous_account.get_nous_portal_account_info")
+    def test_result_is_cached(self, mock_account):
+        """Second call within TTL returns cached result without account lookup."""
+        mock_account.return_value = NousPortalAccountInfo(
+            logged_in=True,
+            source="jwt",
+            fresh=False,
+            paid_service_access=False,
+        )
+        result1 = check_nous_free_tier()
+        result2 = check_nous_free_tier()
         assert result1 is True
         assert result2 is True
-        assert mock_fetch.call_count == 1
+        assert mock_account.call_count == 1

-    @patch("hermes_cli.models.fetch_nous_account_tier")
-    @patch("hermes_cli.models.is_nous_free_tier", return_value=False)
-    def test_cache_expires_after_ttl(self, mock_is_free, mock_fetch):
-        """After TTL expires, the API is called again."""
-        mock_fetch.return_value = {"subscription": {"monthly_charge": 20}}
-        with patch("hermes_cli.auth.get_provider_auth_state", return_value={"access_token": "tok"}), \
-             patch("hermes_cli.auth.resolve_nous_runtime_credentials"):
-            result1 = check_nous_free_tier()
-            assert mock_fetch.call_count == 1
+    @patch("hermes_cli.nous_account.get_nous_portal_account_info")
+    def test_cache_expires_after_ttl(self, mock_account):
+        """After TTL expires, account info is resolved again."""
+        mock_account.return_value = NousPortalAccountInfo(
+            logged_in=True,
+            source="jwt",
+            fresh=False,
+            paid_service_access=True,
+        )
+        result1 = check_nous_free_tier()
+        assert mock_account.call_count == 1

-            cached_result, cached_at = _models_mod._free_tier_cache
-            _models_mod._free_tier_cache = (cached_result, cached_at - _FREE_TIER_CACHE_TTL - 1)
+        cached_result, cached_at = _models_mod._free_tier_cache
+        _models_mod._free_tier_cache = (cached_result, cached_at - _FREE_TIER_CACHE_TTL - 1)

-            result2 = check_nous_free_tier()
-            assert mock_fetch.call_count == 2
+        result2 = check_nous_free_tier()
+        assert mock_account.call_count == 2

         assert result1 is False
         assert result2 is False

+    @patch("hermes_cli.nous_account.get_nous_portal_account_info")
+    def test_force_fresh_bypasses_cache(self, mock_account):
+        mock_account.return_value = NousPortalAccountInfo(
+            logged_in=True,
+            source="account_api",
+            fresh=True,
+            paid_service_access=True,
+        )
+        assert check_nous_free_tier() is False
+        assert check_nous_free_tier(force_fresh=True) is False
+        assert mock_account.call_count == 2
+        mock_account.assert_called_with(force_fresh=True)
+
     def test_cache_ttl_is_short(self):
         """TTL should be short enough to catch upgrades quickly (<=5 min)."""
         assert _FREE_TIER_CACHE_TTL <= 300
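The cache tests exercise three behaviours: a hit within the TTL skips the lookup, an expired entry triggers a new lookup, and `force_fresh=True` bypasses the cache entirely. A compact way to model that is a `(value, cached_at)` pair compared against a clock, as sketched below. The class `TtlCache`, its injected `fetch` and `clock` callables, and the 300-second default are assumptions for illustration; the module under test stores its entry in `_free_tier_cache` and may differ in detail.

```python
import time
from typing import Callable, Optional, Tuple


class TtlCache:
    """Hypothetical single-value cache with a time-to-live and a bypass flag."""

    def __init__(
        self,
        fetch: Callable[[], bool],
        ttl: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entry: Optional[Tuple[bool, float]] = None  # (value, cached_at)

    def get(self, force_fresh: bool = False) -> bool:
        now = self._clock()
        if not force_fresh and self._entry is not None:
            value, cached_at = self._entry
            if now - cached_at < self._ttl:
                return value  # still fresh: no lookup
        value = self._fetch()
        self._entry = (value, now)  # a forced refresh also updates the cache
        return value
```

Injecting the clock makes expiry testable without sleeping, which mirrors how the tests above rewind `cached_at` instead of waiting for the TTL to pass.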
